mirror/go - go - Git Fam. Sieh

Commit Graph

Author	SHA1	Message	Date
Keith Randall	5b1fbfba1c	cmd/compile: rewrite >>c<<c to &^(1<<c-1) Fixes #54496 Change-Id: I3c2ed8cd55836d5b07c8cdec00d3b584885aca79 Reviewed-on: https://go-review.googlesource.com/c/go/+/424856 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Martin Möhrmann <martin@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <martin@golang.org>	2022-09-02 18:51:37 +00:00
Matthew Dempsky	34f0029a85	cmd/compile/internal/noder: allow OCONVNOP for identical iface conversions In go.dev/cl/421821, I included a hack to force OCONVNOP back to OCONVIFACE for conversions involving shape types and non-empty interfaces. The comment correctly noted that this was only needed for conversions between non-identical types, but the code was conservative and applied to even conversions between identical types. This CL adds an extra bool to record whether the conversion is between identical types, so we can keep OCONVNOP instead of forcing back to OCONVIFACE. This has a small improvement to generated code, because we no longer need a convI2I call (as demonstrated by codegen/ifaces.go). But more usefully, this is relevant to pruning unnecessary itab slots in runtime dictionaries (next CL). Change-Id: I94f89e961cd26629b925037fea58d283140766ff Reviewed-on: https://go-review.googlesource.com/c/go/+/427678 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-09-02 18:26:02 +00:00
Derek Parker	6605686e3b	cmd/compile: new inline heuristic for struct compares This CL changes the heuristic used to determine whether we can inline a struct equality check or if we must generate a function and call that function for equality. The old method was to count struct fields, but this can lead to poor in lining decisions. We should really be determining the cost of the equality check and use that to determine if we should inline or generate a function. The new benchmark provided in this CL returns the following when compared against tip: ``` name old time/op new time/op delta EqStruct-32 2.46ns ± 4% 0.25ns ±10% -89.72% (p=0.000 n=39+39) ``` Fixes #38494 Change-Id: Ie06b80a2b2a03a3fd0978bcaf7715f9afb66e0ab GitHub-Last-Rev: `e9a18d9389` GitHub-Pull-Request: golang/go#53326 Reviewed-on: https://go-review.googlesource.com/c/go/+/411674 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-09-02 17:48:22 +00:00
ruinan	121344ac33	cmd/compile: optimize RotateLeft8/16 on arm64 This CL optimizes RotateLeft8/16 on arm64. For 16 bits, we form a 32 bits register by duplicating two 16 bits registers, then use RORW instruction to do the rotate shift. For 8 bits, we just use LSR and LSL instead of RORW because the code is simpler. Benchmark Old ThisCL delta RotateLeft8-46 2.16 ns/op 1.73 ns/op -19.70% RotateLeft16-46 2.16 ns/op 1.54 ns/op -28.53% Change-Id: I09cde4383d12e31876a57f8cdfd3bb4f324fadb0 Reviewed-on: https://go-review.googlesource.com/c/go/+/420976 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Keith Randall <khr@golang.org>	2022-09-02 17:46:31 +00:00
ruinan	54c7bc9cff	cmd/compile: optimize shift ops on arm64 when the shift value is v&63 For the following code case: var x uint64 x >> (shift & 63) We can directly genereta `x >> shift` on arm64, since the hardware will only use the bottom 6 bits of the shift amount. Benchmark old time/op new time/op delta ShiftArithmeticRight-8 0.40ns 0.31ns -21.7% Change-Id: Id58c8a5b2f6dd5c30c3876f4a36e11b4d81e2dc9 Reviewed-on: https://go-review.googlesource.com/c/go/+/425294 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-09-02 17:44:29 +00:00
Keith Randall	33a7e5a4b4	cmd/compile: combine multiple rotate instructions Rotating by c, then by d, is the same as rotating by c+d. Change-Id: I36df82261460ff80f7c6d39bcdf0e840cef1c91a Reviewed-on: https://go-review.googlesource.com/c/go/+/424894 Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Ruinan Sun <Ruinan.Sun@arm.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-08-31 22:10:52 +00:00
Keith Randall	69aed4712d	cmd/compile: use better splitting condition for string binary search Currently we use a full cmpstring to do the comparison for each split in the binary search for a string switch. Instead, split by comparing a single byte of the input string with a constant. That will give us a much faster split (although it might be not quite as good a split). Fixes #53333 R=go1.20 Change-Id: I28c7209342314f367071e4aa1f2beb6ec9ff7123 Reviewed-on: https://go-review.googlesource.com/c/go/+/414894 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-08-31 22:08:26 +00:00
Wayne Zuo	da6556968f	cmd/compile: simplify bounded shift on riscv64 The prove pass will mark some shifts bounded, and then we can use that information to generate better code on riscv64. Change-Id: Ia22f43d0598453c9417adac7017db28d7240948b Reviewed-on: https://go-review.googlesource.com/c/go/+/422616 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-08-31 20:21:00 +00:00
Wayne Zuo	e8f0340fa4	cmd/compile: intrinsify RotateLeft{32,64} on loong64 Benchmark on crypto/sha256 (provided by Xiaodong Liu): name old time/op new time/op delta Hash8Bytes/New 1.19µs ± 0% 0.97µs ± 0% -18.75% (p=0.000 n=9+9) Hash8Bytes/Sum224 1.21µs ± 0% 0.97µs ± 0% -20.04% (p=0.000 n=9+10) Hash8Bytes/Sum256 1.21µs ± 0% 0.98µs ± 0% -19.16% (p=0.000 n=10+7) Hash1K/New 15.9µs ± 0% 12.4µs ± 0% -22.10% (p=0.000 n=10+10) Hash1K/Sum224 15.9µs ± 0% 12.4µs ± 0% -22.18% (p=0.000 n=8+10) Hash1K/Sum256 15.9µs ± 0% 12.4µs ± 0% -22.15% (p=0.000 n=10+9) Hash8K/New 119µs ± 0% 92µs ± 0% -22.40% (p=0.000 n=10+9) Hash8K/Sum224 119µs ± 0% 92µs ± 0% -22.41% (p=0.000 n=9+10) Hash8K/Sum256 119µs ± 0% 92µs ± 0% -22.40% (p=0.000 n=9+9) name old speed new speed delta Hash8Bytes/New 6.70MB/s ± 0% 8.25MB/s ± 0% +23.13% (p=0.000 n=10+10) Hash8Bytes/Sum224 6.60MB/s ± 0% 8.26MB/s ± 0% +25.06% (p=0.000 n=10+10) Hash8Bytes/Sum256 6.59MB/s ± 0% 8.15MB/s ± 0% +23.67% (p=0.000 n=10+7) Hash1K/New 64.3MB/s ± 0% 82.5MB/s ± 0% +28.36% (p=0.000 n=10+10) Hash1K/Sum224 64.3MB/s ± 0% 82.6MB/s ± 0% +28.51% (p=0.000 n=10+10) Hash1K/Sum256 64.3MB/s ± 0% 82.6MB/s ± 0% +28.46% (p=0.000 n=9+9) Hash8K/New 69.0MB/s ± 0% 89.0MB/s ± 0% +28.87% (p=0.000 n=10+8) Hash8K/Sum224 69.0MB/s ± 0% 89.0MB/s ± 0% +28.88% (p=0.000 n=9+10) Hash8K/Sum256 69.0MB/s ± 0% 88.9MB/s ± 0% +28.87% (p=0.000 n=8+9) Benchmark on crypto/sha512 (provided by Xiaodong Liu): name old time/op new time/op delta Hash8Bytes/New 1.55µs ± 0% 1.31µs ± 0% -15.67% (p=0.000 n=10+10) Hash8Bytes/Sum384 1.59µs ± 0% 1.35µs ± 0% -14.97% (p=0.000 n=10+10) Hash8Bytes/Sum512 1.62µs ± 0% 1.39µs ± 0% -14.02% (p=0.000 n=10+10) Hash1K/New 10.7µs ± 0% 8.6µs ± 0% -19.60% (p=0.000 n=8+8) Hash1K/Sum384 10.8µs ± 0% 8.7µs ± 0% -19.40% (p=0.000 n=9+9) Hash1K/Sum512 10.8µs ± 0% 8.7µs ± 0% -19.35% (p=0.000 n=9+10) Hash8K/New 74.6µs ± 0% 59.6µs ± 0% -20.08% (p=0.000 n=10+9) Hash8K/Sum384 74.7µs ± 0% 59.7µs ± 0% -20.04% (p=0.000 n=9+8) Hash8K/Sum512 74.7µs ± 0% 59.7µs ± 0% -20.01% (p=0.000 n=10+10) name old speed new speed delta Hash8Bytes/New 5.16MB/s ± 0% 6.12MB/s ± 0% +18.60% (p=0.000 n=10+8) Hash8Bytes/Sum384 5.02MB/s ± 0% 5.90MB/s ± 0% +17.56% (p=0.000 n=10+10) Hash8Bytes/Sum512 4.94MB/s ± 0% 5.74MB/s ± 0% +16.29% (p=0.000 n=10+9) Hash1K/New 95.4MB/s ± 0% 118.6MB/s ± 0% +24.38% (p=0.000 n=10+10) Hash1K/Sum384 95.0MB/s ± 0% 117.9MB/s ± 0% +24.06% (p=0.000 n=8+9) Hash1K/Sum512 94.8MB/s ± 0% 117.5MB/s ± 0% +23.99% (p=0.000 n=8+9) Hash8K/New 110MB/s ± 0% 137MB/s ± 0% +25.11% (p=0.000 n=9+6) Hash8K/Sum384 110MB/s ± 0% 137MB/s ± 0% +25.07% (p=0.000 n=9+8) Hash8K/Sum512 110MB/s ± 0% 137MB/s ± 0% +25.01% (p=0.000 n=10+10) Change-Id: I28ccfce634659305a336c8e0a3f8589f7361d661 Reviewed-on: https://go-review.googlesource.com/c/go/+/422317 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: David Chase <drchase@google.com>	2022-08-30 03:21:44 +00:00
Wayne Zuo	a6219737e3	cmd/compile: intrinsify Sub64 on riscv64 After this CL, the performance difference in crypto/elliptic benchmarks on linux/riscv64 are: name old time/op new time/op delta ScalarBaseMult/P256 1.64ms ± 1% 1.60ms ± 1% -2.36% (p=0.008 n=5+5) ScalarBaseMult/P224 1.53ms ± 1% 1.47ms ± 2% -4.24% (p=0.008 n=5+5) ScalarBaseMult/P384 5.12ms ± 2% 5.03ms ± 2% ~ (p=0.095 n=5+5) ScalarBaseMult/P521 22.3ms ± 2% 13.8ms ± 1% -37.89% (p=0.008 n=5+5) ScalarMult/P256 4.49ms ± 2% 4.26ms ± 2% -5.13% (p=0.008 n=5+5) ScalarMult/P224 4.33ms ± 1% 4.09ms ± 1% -5.59% (p=0.008 n=5+5) ScalarMult/P384 16.3ms ± 1% 15.5ms ± 2% -4.78% (p=0.008 n=5+5) ScalarMult/P521 101ms ± 0% 47ms ± 2% -53.36% (p=0.008 n=5+5) Change-Id: I31cf0506e27f9d85f576af1813630a19c20dda8a Reviewed-on: https://go-review.googlesource.com/c/go/+/420095 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-27 05:43:59 +00:00
Wayne Zuo	969f48a3a2	cmd/compile: intrinsify Add64 on riscv64 According to RISCV instruction set manual v2.2 Sec 2.4, we can implement overflowing check for unsigned addition cheaply using SLTU instructions. After this CL, the performance difference in crypto/elliptic benchmarks on linux/riscv64 are: name old time/op new time/op delta ScalarBaseMult/P256 1.93ms ± 1% 1.64ms ± 1% -14.96% (p=0.008 n=5+5) ScalarBaseMult/P224 1.80ms ± 2% 1.53ms ± 1% -14.89% (p=0.008 n=5+5) ScalarBaseMult/P384 6.15ms ± 2% 5.12ms ± 2% -16.73% (p=0.008 n=5+5) ScalarBaseMult/P521 25.9ms ± 1% 22.3ms ± 2% -13.78% (p=0.008 n=5+5) ScalarMult/P256 5.59ms ± 1% 4.49ms ± 2% -19.79% (p=0.008 n=5+5) ScalarMult/P224 5.42ms ± 1% 4.33ms ± 1% -20.01% (p=0.008 n=5+5) ScalarMult/P384 19.9ms ± 2% 16.3ms ± 1% -18.15% (p=0.008 n=5+5) ScalarMult/P521 97.3ms ± 1% 100.7ms ± 0% +3.48% (p=0.008 n=5+5) Change-Id: Ic4c82ced4b072a4a6575343fa9f29dd09b0cabc4 Reviewed-on: https://go-review.googlesource.com/c/go/+/420094 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-27 05:43:32 +00:00
Wayne Zuo	b60432df14	cmd/compile: deadcode for LoweredMuluhilo on riscv64 This is a follow up of CL 425101 on RISCV64. According to RISCV Volume 1, Unprivileged Spec v. 20191213 Chapter 7.1: If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies. So we should not split Muluhilo to separate instructions. Updates #54607 Change-Id: If47461f3aaaf00e27cd583a9990e144fb8bcdb17 Reviewed-on: https://go-review.googlesource.com/c/go/+/425203 Auto-Submit: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-08-24 18:08:33 +00:00
Keith Randall	60ad3c48f5	cmd/compile: move SSA rotate instruction detection to arch-independent rules Detect rotate instructions while still in architecture-independent form. It's easier to do here, and we don't need to repeat it in each architecture file. Change-Id: I9396954b3f3b3bfb96c160d064a02002309935bb Reviewed-on: https://go-review.googlesource.com/c/go/+/421195 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Eric Fang <eric.fang@arm.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Joedian Reid <joedian@golang.org> Reviewed-by: Ruinan Sun <Ruinan.Sun@arm.com> Run-TryBot: Keith Randall <khr@golang.org>	2022-08-23 21:24:14 +00:00
eric fang	9f0f87c806	cmd/internal/obj/arm64: remove the transition from $0 to ZR Previously we convert $0 to the ZR register for some reasons, which causes two problems: 1. Confusion, the special case of the ZR register needs to be considered when dealing with constants. For encoding, some places we encode ZR, and some places we encode $0, although we have converted $0 to ZR. 2. Unexpected instruction format. All instructions that support ZR register operands can be replaced by $0. This patch removes this conversion. Note that this patch may cause previously unintendedly supported instruction formats to no longer be supported. Change-Id: I3d8d2c06711b7614a38191397da7776417f1861c Reviewed-on: https://go-review.googlesource.com/c/go/+/404316 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-23 06:11:32 +00:00
eric fang	0a52d80666	cmd/compile/internal/ssa: optimize memory moving on arm64 This CL optimizes memory moving with LDP and STP on arm64. Benchmarks: name old time/op new time/op delta ClearFat7-160 1.08ns ± 0% 0.95ns ± 0% -11.41% (p=0.008 n=5+5) ClearFat8-160 0.84ns ± 0% 0.84ns ± 0% -0.95% (p=0.008 n=5+5) ClearFat11-160 1.08ns ± 0% 0.95ns ± 0% -11.46% (p=0.008 n=5+5) ClearFat12-160 0.95ns ± 0% 0.95ns ± 0% ~ (p=0.063 n=4+5) ClearFat13-160 1.08ns ± 0% 0.95ns ± 0% -11.45% (p=0.008 n=5+5) ClearFat14-160 1.08ns ± 0% 0.95ns ± 0% -11.47% (p=0.008 n=5+5) ClearFat15-160 1.24ns ± 0% 0.95ns ± 0% -22.98% (p=0.029 n=4+4) ClearFat16-160 0.84ns ± 0% 0.83ns ± 0% -0.11% (p=0.008 n=5+5) ClearFat24-160 2.15ns ± 0% 2.15ns ± 0% ~ (all equal) ClearFat32-160 2.86ns ± 0% 2.86ns ± 0% ~ (p=0.333 n=5+4) ClearFat40-160 2.15ns ± 0% 2.15ns ± 0% ~ (all equal) ClearFat48-160 3.32ns ± 1% 3.31ns ± 1% ~ (p=0.690 n=5+5) ClearFat56-160 2.15ns ± 0% 2.15ns ± 0% ~ (all equal) ClearFat64-160 3.25ns ± 1% 3.26ns ± 1% ~ (p=0.841 n=5+5) ClearFat72-160 2.22ns ± 0% 2.22ns ± 0% ~ (p=0.444 n=5+5) ClearFat128-160 4.03ns ± 0% 4.04ns ± 0% +0.32% (p=0.008 n=5+5) ClearFat256-160 6.44ns ± 0% 6.44ns ± 0% +0.08% (p=0.016 n=4+5) ClearFat512-160 12.2ns ± 0% 12.2ns ± 0% +0.13% (p=0.008 n=5+5) ClearFat1024-160 24.3ns ± 0% 24.3ns ± 0% ~ (p=0.167 n=5+5) ClearFat1032-160 24.5ns ± 0% 24.5ns ± 0% ~ (p=0.238 n=4+5) ClearFat1040-160 29.2ns ± 0% 29.3ns ± 0% +0.34% (p=0.008 n=5+5) CopyFat7-160 1.43ns ± 0% 1.07ns ± 0% -24.97% (p=0.008 n=5+5) CopyFat8-160 0.89ns ± 0% 0.89ns ± 0% ~ (p=0.238 n=5+5) CopyFat11-160 1.43ns ± 0% 1.07ns ± 0% -24.97% (p=0.008 n=5+5) CopyFat12-160 1.07ns ± 0% 1.07ns ± 0% ~ (p=0.238 n=5+4) CopyFat13-160 1.43ns ± 0% 1.07ns ± 0% ~ (p=0.079 n=4+5) CopyFat14-160 1.43ns ± 0% 1.07ns ± 0% -24.95% (p=0.008 n=5+5) CopyFat15-160 1.79ns ± 0% 1.07ns ± 0% ~ (p=0.079 n=4+5) CopyFat16-160 1.07ns ± 0% 1.07ns ± 0% ~ (p=0.444 n=5+5) CopyFat24-160 1.84ns ± 2% 1.67ns ± 0% -9.28% (p=0.008 n=5+5) CopyFat32-160 3.22ns ± 0% 2.92ns ± 0% -9.40% (p=0.008 n=5+5) CopyFat64-160 3.64ns ± 0% 3.57ns ± 0% -1.96% (p=0.008 n=5+5) CopyFat72-160 3.56ns ± 0% 3.11ns ± 0% -12.89% (p=0.008 n=5+5) CopyFat128-160 5.06ns ± 0% 5.06ns ± 0% +0.04% (p=0.048 n=5+5) CopyFat256-160 9.13ns ± 0% 9.13ns ± 0% ~ (p=0.659 n=5+5) CopyFat512-160 17.4ns ± 0% 17.4ns ± 0% ~ (p=0.167 n=5+5) CopyFat520-160 17.2ns ± 0% 17.3ns ± 0% +0.37% (p=0.008 n=5+5) CopyFat1024-160 34.1ns ± 0% 34.0ns ± 0% ~ (p=0.127 n=5+5) CopyFat1032-160 80.9ns ± 0% 34.2ns ± 0% -57.74% (p=0.008 n=5+5) CopyFat1040-160 94.4ns ± 0% 41.7ns ± 0% -55.78% (p=0.016 n=5+4) Change-Id: I14186f9f82b0ecf8b6c02191dc5da566b9a21e6c Reviewed-on: https://go-review.googlesource.com/c/go/+/421654 Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-23 03:09:07 +00:00
Cherry Mui	8bf9e01473	cmd/compile: split Muluhilo op on ARM64 On ARM64 we use two separate instructions to compute the hi and lo results of a 64x64->128 multiplication. Lower to two separate ops so if only one result is needed we can deadcode the other. Fixes #54607. Change-Id: Ib023e77eb2b2b0bcf467b45471cb8a294bce6f90 Reviewed-on: https://go-review.googlesource.com/c/go/+/425101 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com>	2022-08-22 21:29:31 +00:00
Keith Randall	908499adec	cmd/compile: stop using VARKILL With the introduction of stack objects, VARKILL information is no longer needed. With stack objects, an object is dead when there are no more static references to it, and the stack scanner can't find any live pointers to it. VARKILL information isn't used to establish live ranges for address-taken variables any more. In effect, the last static reference is the VARKILL, and there's an additional dynamic liveness check during stack scanning. Next CL will actually rip out the VARKILL opcodes. Change-Id: I030a2ab867445cf4e0e69397911f8a2e2f0ed07b Reviewed-on: https://go-review.googlesource.com/c/go/+/419234 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@golang.org>	2022-08-18 17:36:38 +00:00
Archana R	d09c6ac417	test/codegen: updated multiple tests to verify on ppc64,ppc64le Updated multiple tests in test/codegen: math.go, mathbits.go, shift.go and slices.go to verify on ppc64/ppc64le as well Change-Id: Id88dd41569b7097819fb4d451b615f69cf7f7a94 Reviewed-on: https://go-review.googlesource.com/c/go/+/412115 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Archana Ravindar <aravind5@in.ibm.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Ian Lance Taylor <iant@google.com>	2022-08-17 13:56:55 +00:00
Wayne Zuo	09932f95f5	cmd/compile: combine more constant stores on amd64 Fixes #53324 Change-Id: I06149d860f858b082235e9d80bf0ea494679b386 Reviewed-on: https://go-review.googlesource.com/c/go/+/411614 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>	2022-08-15 16:19:21 +00:00
eric fang	efe5929dbd	cmd/compile/internal/ssa: optimize ARM64 code with TST For signed comparisons, the following four optimization rules hold: (CMPconst [0] z:(AND x y)) && z.Uses == 1 => (TST x y) (CMPWconst [0] z:(AND x y)) && z.Uses == 1 => (TSTW x y) (CMPconst [0] x:(ANDconst [c] y)) && x.Uses == 1 => (TSTconst [c] y) (CMPWconst [0] x:(ANDconst [c] y)) && x.Uses == 1 => (TSTWconst [int32(c)] y) But currently they only apply to jump instructions, not to conditional instructions within a block, such as cset, csel, etc. This CL extends the above rules into blocks so that conditional instructions can also be optimized. name old time/op new time/op delta DivisiblePow2constI64-160 1.04ns ± 0% 0.86ns ± 0% -17.30% (p=0.008 n=5+5) DivisiblePow2constI32-160 1.04ns ± 0% 0.87ns ± 0% -16.16% (p=0.016 n=4+5) DivisiblePow2constI16-160 1.04ns ± 0% 0.87ns ± 0% -16.03% (p=0.008 n=5+5) DivisiblePow2constI8-160 1.04ns ± 0% 0.86ns ± 0% -17.15% (p=0.008 n=5+5) Change-Id: I6bc34bff30862210e8dd001e0340b8fe502fe3de Reviewed-on: https://go-review.googlesource.com/c/go/+/420434 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Eric Fang <eric.fang@arm.com>	2022-08-10 02:13:53 +00:00
Cuong Manh Le	0f8dffd0aa	all: use ":" for compiler generated symbols As it can't appear in user package paths. There is a hack for handling "go:buildid" and "type:" on windows/386. Previously, windows/386 requires underscore prefix on external symbols, but that's only applied for SHOSTOBJ/SUNDEFEXT or cgo export symbols. "go.buildid" is STEXT, "type." is STYPE, thus they are not prefixed with underscore. In external linking mode, the external linker can't resolve them as external symbols. But we are lucky that they have "." in their name, so the external linker see them as Forwarder RVA exports. See: - https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#export-address-table - https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/pe-dll.c;h=e7b82ba6ffadf74dc1b9ee71dc13d48336941e51;hb=HEAD#l972) This CL changes "." to ":" in symbols name, so theses symbols can not be found by external linker anymore. So a hacky way is adding the underscore prefix for these 2 symbols. I don't have enough knowledge to verify whether adding the underscore for all STEXT/STYPE symbols are fine, even if it could be, that would be done in future CL. Fixes #37762 Change-Id: I92eaaf24c0820926a36e0530fdb07b07af1fcc35 Reviewed-on: https://go-review.googlesource.com/c/go/+/317917 Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-09 11:28:56 +00:00
Lynn Boger	c1bfefe9d1	cmd/compile: fix confusion with ANDCCconst in PPC64 rules Currently there is a an ANDconst and an ANDCCconst op in PPC64, which is confusing since they map onto the same instruction. One of these ops sets the result of the AND operation, and the other sets the flag (condition register). This converts ANDCCconst into an op with the 2 expected results: the integer result of the AND and the flag setting. The ANDconst op has been removed. Note that in the PPC64 ISA the only variation of the 'and immediate' is the one that sets the condition bit, which probably led to the original (confusing) implementation. This also adds a few rules to improve the use of ANDCCconst with ISELB and some testcases to verify those improvements. Change-Id: I523703fa4da2098eb995dc3ba744d36fa28e41d4 Reviewed-on: https://go-review.googlesource.com/c/go/+/422015 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Paul Murphy <murp@ibm.com>	2022-08-08 20:15:55 +00:00
Keith Randall	c2a9c55823	cmd/compile: optimize unsafe.Slice generated code We don't need a multiply when the element type is size 0 or 1. The panic functions don't return, so we don't need any post-call code (register restores, etc.). Change-Id: I0dcea5df56d29d7be26554ddca966b3903c672e5 Reviewed-on: https://go-review.googlesource.com/c/go/+/419754 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2022-08-08 17:36:47 +00:00
cuiweixie	e7307034cc	cmd/compile: store combine on amd64 Fixes #54120 Change-Id: I6915b6e8d459d9becfdef4fdcba95ee4dea6af05 GitHub-Last-Rev: `03f19942c7` GitHub-Pull-Request: golang/go#54126 Reviewed-on: https://go-review.googlesource.com/c/go/+/420115 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>	2022-08-08 16:40:58 +00:00
Matthew Dempsky	ab8d7dd75e	cmd/compile: set LocalPkg.Path to -p flag Since CL 391014, cmd/compile now requires the -p flag to be set the build system. This CL changes it to initialize LocalPkg.Path to the provided path, rather than relying on writing out `"".` into object files and expecting cmd/link to substitute them. However, this actually involved a rather long tail of fixes. Many have already been submitted, but a few notable ones that have to land simultaneously with changing LocalPkg: 1. When compiling package runtime, there are really two "runtime" packages: types.LocalPkg (the source package itself) and ir.Pkgs.Runtime (the compiler's internal representation, for synthetic references). Previously, these ended up creating separate link symbols (`"".xxx` and `runtime.xxx`, respectively), but now they both end up as `runtime.xxx`, which causes lsym collisions (notably inittask and funcsyms). 2. test/codegen tests need to be updated to expect symbols to be named `command-line-arguments.xxx` rather than `"".foo`. 3. The issue20014 test case is sensitive to the sort order of field tracking symbols. In particular, the local package now sorts to its natural place in the list, rather than to the front. Thanks to David Chase for helping track down all of the fixes needed for this CL. Updates #51734. Change-Id: Iba3041cf7ad967d18c6e17922fa06ba11798b565 Reviewed-on: https://go-review.googlesource.com/c/go/+/393715 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>	2022-05-16 18:19:47 +00:00
Cherry Mui	540f8c2b50	cmd/compile: use jump table on ARM64 Following CL 357330, use jump tables on ARM64. name old time/op new time/op delta Switch8Predictable-4 3.41ns ± 0% 3.21ns ± 0% ~ (p=0.079 n=4+5) Switch8Unpredictable-4 12.0ns ± 0% 9.5ns ± 0% -21.17% (p=0.000 n=5+4) Switch32Predictable-4 3.06ns ± 0% 2.82ns ± 0% -7.78% (p=0.008 n=5+5) Switch32Unpredictable-4 13.3ns ± 0% 9.5ns ± 0% -28.87% (p=0.016 n=4+5) SwitchStringPredictable-4 3.71ns ± 0% 3.21ns ± 0% -13.43% (p=0.000 n=5+4) SwitchStringUnpredictable-4 14.8ns ± 0% 15.1ns ± 0% +2.37% (p=0.008 n=5+5) Change-Id: Ia0b85df7ca9273cf70c05eb957225c6e61822fa6 Reviewed-on: https://go-review.googlesource.com/c/go/+/403979 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>	2022-05-13 19:51:03 +00:00
Paul E. Murphy	0c43878baa	cmd/compile: lower Add64/Sub64 into ssa on PPC64 math/bits.Add64 and math/bits.Sub64 now lower and optimize directly in SSA form. The optimization of carry chains focuses around eliding XER<->GPR transfers of the CA bit when used exclusively as an input to a single carry operations, or when the CA value is known. This also adds support for handling XER spills in the assembler which could happen if carry chains contain inter-dependencies on each other (which seems very unlikely with practical usage), or a clobber happens (SRAW/SRAD/SUBFC operations clobber CA). With PPC64 Add64/Sub64 lowering into SSA and this patch, the net performance difference in crypto/elliptic benchmarks on P9/ppc64le are: name old time/op new time/op delta ScalarBaseMult/P256 46.3µs ± 0% 46.9µs ± 0% +1.34% ScalarBaseMult/P224 356µs ± 0% 209µs ± 0% -41.14% ScalarBaseMult/P384 1.20ms ± 0% 0.57ms ± 0% -52.14% ScalarBaseMult/P521 3.38ms ± 0% 1.44ms ± 0% -57.27% ScalarMult/P256 199µs ± 0% 199µs ± 0% -0.17% ScalarMult/P224 357µs ± 0% 212µs ± 0% -40.56% ScalarMult/P384 1.20ms ± 0% 0.58ms ± 0% -51.86% ScalarMult/P521 3.37ms ± 0% 1.44ms ± 0% -57.32% MarshalUnmarshal/P256/Uncompressed 2.59µs ± 0% 2.52µs ± 0% -2.63% MarshalUnmarshal/P256/Compressed 2.58µs ± 0% 2.52µs ± 0% -2.06% MarshalUnmarshal/P224/Uncompressed 1.54µs ± 0% 1.40µs ± 0% -9.42% MarshalUnmarshal/P224/Compressed 1.54µs ± 0% 1.39µs ± 0% -9.87% MarshalUnmarshal/P384/Uncompressed 2.40µs ± 0% 1.80µs ± 0% -24.93% MarshalUnmarshal/P384/Compressed 2.35µs ± 0% 1.81µs ± 0% -23.03% MarshalUnmarshal/P521/Uncompressed 3.79µs ± 0% 2.58µs ± 0% -31.81% MarshalUnmarshal/P521/Compressed 3.80µs ± 0% 2.60µs ± 0% -31.67% Note, P256 uses an asm implementation, thus, little variation is expected. Change-Id: I88a24f6bf0f4f285c649e40243b1ab69cc452b71 Reviewed-on: https://go-review.googlesource.com/c/go/+/346870 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com>	2022-05-10 20:03:53 +00:00
Jorropo	e1e056fa6a	cmd/compile: fold constants found by prove It is hit ~70k times building go. This make the go binary, 0.04% smaller. I didn't included benchmarks because this is just constant foldings and is hard to mesure objectively. For example, this enable rewriting things like: if x == 20 { return x + 30 + z } Into: if x == 20 { return 50 + z } It's not just fixing programer's code, the ssa generator generate code like this sometimes. Change-Id: I0861f342b27f7227b5f1c34d8267fa0057b1bbbc GitHub-Last-Rev: `4c2f9b5216` GitHub-Pull-Request: golang/go#52669 Reviewed-on: https://go-review.googlesource.com/c/go/+/403735 Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>	2022-05-04 20:30:17 +00:00
Paul E. Murphy	c570f0eda2	cmd/compile: combine OR + NOT into ORN on PPC64 This shows up in a few crypto functions, and other assorted places. Change-Id: I5a7f4c25ddd4a6499dc295ef693b9fe43d2448ab Reviewed-on: https://go-review.googlesource.com/c/go/+/404057 Run-TryBot: Paul Murphy <murp@ibm.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Russ Cox <rsc@golang.org>	2022-05-04 18:49:50 +00:00
Cuong Manh Le	0668e3cb1a	cmd/compile: support pointers to arrays in arrayClear Fixes #52635 Change-Id: I85f182931e30292983ef86c55a0ab6e01282395c Reviewed-on: https://go-review.googlesource.com/c/go/+/403337 Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Ian Lance Taylor <iant@google.com>	2022-05-03 05:42:48 +00:00
Keith Randall	c4b2288755	cmd/compile: add jump table codegen test Change-Id: Ic67f676f5ebe146166a0d3c1d78a802881320e49 Reviewed-on: https://go-review.googlesource.com/c/go/+/400375 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com>	2022-04-14 21:16:29 +00:00
Wayne Zuo	66f03f79da	cmd/compile: add SHLX&SHRX without load Change-Id: I79eb5e7d6bcb23f26d3a100e915efff6dae70391 Reviewed-on: https://go-review.googlesource.com/c/go/+/399061 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-04-13 17:48:36 +00:00
Wayne Zuo	517781b391	cmd/compile: add SARXQload and SARXLload Change-Id: I4e8dc5009a5b8af37d71b62f1322f94806d3e9d9 Reviewed-on: https://go-review.googlesource.com/c/go/+/399056 Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-04-13 17:48:12 +00:00
Wayne Zuo	d6320f1a58	cmd/compile: add SARX instruction for GOAMD64>=3 name old time/op new time/op delta ShiftArithmeticRight-8 0.68ns ± 5% 0.30ns ± 6% -56.14% (p=0.000 n=10+10) Change-Id: I052a0d7b9e6526d526276444e588b0cc288beff4 Reviewed-on: https://go-review.googlesource.com/c/go/+/399055 Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-04-12 12:59:27 +00:00
Wayne Zuo	32de2b0d1c	cmd/compile: add MOVBE index load/store Fixes #51724 Change-Id: I94e650a7482dc4c479d597f0162a6a89d779708d Reviewed-on: https://go-review.googlesource.com/c/go/+/395474 Reviewed-by: Keith Randall <khr@golang.org> Trust: Cherry Mui <cherryyz@google.com>	2022-04-11 15:41:56 +00:00
Wayne Zuo	615d3c3040	test: adjust load and store test In the load tests, we only want to test the assembly produced by the load operations. If we use the global variable sink, it will produce one load operation and one store operation(assign to sink). For example: func load_be64(b []byte) uint64 { sink64 = binary.BigEndian.Uint64(b) } If we compile this function with GOAMD64=v3, it may produce MOVBEQload and MOVQstore or MOVQload and MOVBEQstore, but we only want MOVBEQload. Discovered when developing CL 395474. Same for the store tests. Change-Id: I65c3c742f1eff657c3a0d2dd103f51140ae8079e Reviewed-on: https://go-review.googlesource.com/c/go/+/397875 Reviewed-by: Keith Randall <khr@golang.org> Trust: Cherry Mui <cherryyz@google.com>	2022-04-11 15:41:04 +00:00
Wayne Zuo	7fbabe8d57	cmd/compile: use shlx&shrx instruction for GOAMD64>=v3 The SHRX/SHLX instruction can take any general register as the shift count operand, and can read source from memory. This CL introduces some operators to combine load and shift to one instruction. For #47120 Change-Id: I13b48f53c7d30067a72eb2c8382242045dead36a Reviewed-on: https://go-review.googlesource.com/c/go/+/385174 Reviewed-by: Keith Randall <khr@golang.org> Trust: Cherry Mui <cherryyz@google.com>	2022-04-04 20:07:23 +00:00
Wayne Zuo	a92ca51507	cmd/compile: use LZCNT instruction for GOAMD64>=3 LZCNT is similar to BSR, but BSR(x) is undefined when x == 0, so using LZCNT can avoid a special case for zero input. Except that case, LZCNTQ(x) == 63-BSRQ(x) and LZCNTL(x) == 31-BSRL(x). And according to https://www.agner.org/optimize/instruction_tables.pdf, LZCNT instructions are much faster than BSR on AMD CPU. name old time/op new time/op delta LeadingZeros-8 0.91ns ± 1% 0.80ns ± 7% -11.68% (p=0.000 n=9+9) LeadingZeros8-8 0.98ns ±15% 0.91ns ± 1% -7.34% (p=0.000 n=9+9) LeadingZeros16-8 0.94ns ± 3% 0.92ns ± 2% -2.36% (p=0.001 n=10+10) LeadingZeros32-8 0.89ns ± 1% 0.78ns ± 2% -12.49% (p=0.000 n=10+10) LeadingZeros64-8 0.92ns ± 1% 0.78ns ± 1% -14.48% (p=0.000 n=10+10) Change-Id: I125147fe3d6994a4cfe558432780408e9a27557a Reviewed-on: https://go-review.googlesource.com/c/go/+/396794 Reviewed-by: Keith Randall <khr@golang.org> Trust: Emmanuel Odeke <emmanuel@orijtech.com> Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-04-04 04:01:17 +00:00
Wayne Zuo	ba6df85c7c	cmd/compile: add MOVBEWstore support for GOAMD64>=3 This CL add MOVBE support for 16-bit version, but MOVBEWload is excluded because it does not satisfy zero extented. For #51724 Change-Id: I3fadf20bcbb9b423f6355e6a1e340107e8e621ac Reviewed-on: https://go-review.googlesource.com/c/go/+/396617 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Trust: Emmanuel Odeke <emmanuel@orijtech.com>	2022-04-03 23:48:16 +00:00
fanzha02	8ab42a945a	cmd/compile: merge ANDconst and UBFX into UBFX on arm64 Add a new rewrite rule to merge ANDconst and UBFX into UBFX. Add test cases. Change-Id: I24d6442d0c956d7ce092c3a3858d4a3a41771670 Reviewed-on: https://go-review.googlesource.com/c/go/+/377054 Trust: Fannie Zhang <Fannie.Zhang@arm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-03-25 01:24:44 +00:00
Keith Randall	aaa3d39f27	cmd/compile: include all entries in map literal hint size Currently we only include static entries in the hint for sizing the map when allocating a map for a map literal. Change that to include all entries. This will be an overallocation if the dynamic entries in the map have equal keys, but equal keys in map literals are rare, and at worst we waste a bit of space. Fixes #43020 Change-Id: I232f82f15316bdf4ea6d657d25a0b094b77884ce Reviewed-on: https://go-review.googlesource.com/c/go/+/383634 Run-TryBot: Keith Randall <khr@golang.org> Trust: Keith Randall <khr@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-03-01 23:20:30 +00:00
Archana R	4056934e48	test/codegen: updated arithmetic tests to verify on ppc64,ppc64le Updated multiple tests in test/codegen/arithmetic.go to verify on ppc64/ppc64le as well Change-Id: I79ca9f87017ea31147a4ba16f5d42ba0fcae64e1 Reviewed-on: https://go-review.googlesource.com/c/go/+/358546 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-11-01 13:12:37 +00:00
Archana R	8b9c0d1a79	test/codegen: updated comparison test to verify on ppc64,ppc64le Updated test/codegen/comparison.go to verify memequal is inlined as implemented in CL 328291. Change-Id: If7824aed37ee1f8640e54fda0f9b7610582ba316 Reviewed-on: https://go-review.googlesource.com/c/go/+/357289 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-10-21 13:05:07 +00:00
wdvxdr	fe7df4c4d0	cmd/compile: use MOVBE instruction for GOAMD64>=v3 In CL 354670, I copied some existing rules for convenience but forgot to update the last rule which broke `GOAMD64=v3 ./make.bat` Revive CL 354670 Change-Id: Ic1e2047c603f0122482a4b293ce1ef74d806c019 Reviewed-on: https://go-review.googlesource.com/c/go/+/356810 Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Keith Randall <khr@golang.org> Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org>	2021-10-19 16:18:56 +00:00
Daniel Martí	b0351bfd7d	Revert "cmd/compile: use MOVBE instruction for GOAMD64>=v3" This reverts CL 354670. Reason for revert: broke make.bash with GOAMD64=v3. Fixes #49061. Change-Id: I7f2ed99b7c10100c4e0c1462ea91c4c9d8c609b2 Reviewed-on: https://go-review.googlesource.com/c/go/+/356790 Trust: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Koichi Shiraishi <zchee.io@gmail.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>	2021-10-19 09:49:38 +00:00
Lynn Boger	33b3260c1e	cmd/compile/internal/ssagen: set BitLen32 as intrinsic on PPC64 It was noticed through some other investigation that BitLen32 was not generating the best code and found that it wasn't recognized as an intrinsic. This corrects that and enables the test for PPC64. Change-Id: Iab496a8830c8552f507b7292649b1b660f3848b5 Reviewed-on: https://go-review.googlesource.com/c/go/+/355872 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-10-18 21:33:08 +00:00
wdvxdr	3e5cc4d6f6	cmd/compile: use MOVBE instruction for GOAMD64>=v3 encoding/binary benchmark on my laptop: name old time/op new time/op delta ReadSlice1000Int32s-8 4.42µs ± 5% 4.20µs ± 1% -4.94% (p=0.046 n=9+8) ReadStruct-8 359ns ± 8% 368ns ± 5% +2.35% (p=0.041 n=9+10) WriteStruct-8 349ns ± 1% 357ns ± 1% +2.15% (p=0.000 n=8+10) ReadInts-8 235ns ± 1% 233ns ± 1% -1.01% (p=0.005 n=10+10) WriteInts-8 265ns ± 1% 274ns ± 1% +3.45% (p=0.000 n=10+10) WriteSlice1000Int32s-8 4.61µs ± 5% 4.59µs ± 5% ~ (p=0.986 n=10+10) PutUint16-8 0.56ns ± 4% 0.57ns ± 4% ~ (p=0.101 n=10+10) PutUint32-8 0.83ns ± 2% 0.56ns ± 6% -32.91% (p=0.000 n=10+10) PutUint64-8 0.81ns ± 3% 0.62ns ± 4% -23.82% (p=0.000 n=10+10) LittleEndianPutUint16-8 0.55ns ± 4% 0.55ns ± 3% ~ (p=0.926 n=10+10) LittleEndianPutUint32-8 0.41ns ± 4% 0.42ns ± 3% ~ (p=0.148 n=10+9) LittleEndianPutUint64-8 0.55ns ± 2% 0.56ns ± 6% ~ (p=0.897 n=10+10) ReadFloats-8 60.4ns ± 4% 59.0ns ± 1% -2.25% (p=0.007 n=10+10) WriteFloats-8 72.3ns ± 2% 71.5ns ± 7% ~ (p=0.089 n=10+10) ReadSlice1000Float32s-8 4.21µs ± 3% 4.18µs ± 2% ~ (p=0.197 n=10+10) WriteSlice1000Float32s-8 4.61µs ± 2% 4.68µs ± 7% ~ (p=1.000 n=8+10) ReadSlice1000Uint8s-8 250ns ± 4% 247ns ± 4% ~ (p=0.324 n=10+10) WriteSlice1000Uint8s-8 227ns ± 5% 229ns ± 2% ~ (p=0.193 n=10+7) PutUvarint32-8 15.3ns ± 2% 15.4ns ± 4% ~ (p=0.782 n=10+10) PutUvarint64-8 38.5ns ± 1% 38.6ns ± 5% ~ (p=0.396 n=8+10) name old speed new speed delta ReadSlice1000Int32s-8 890MB/s ±17% 953MB/s ± 1% +7.00% (p=0.027 n=10+8) ReadStruct-8 209MB/s ± 8% 204MB/s ± 5% -2.42% (p=0.043 n=9+10) WriteStruct-8 214MB/s ± 3% 210MB/s ± 1% -1.75% (p=0.003 n=9+10) ReadInts-8 127MB/s ± 1% 129MB/s ± 1% +1.01% (p=0.006 n=10+10) WriteInts-8 113MB/s ± 1% 109MB/s ± 1% -3.34% (p=0.000 n=10+10) WriteSlice1000Int32s-8 868MB/s ± 5% 872MB/s ± 5% ~ (p=1.000 n=10+10) PutUint16-8 3.55GB/s ± 4% 3.50GB/s ± 4% ~ (p=0.093 n=10+10) PutUint32-8 4.83GB/s ± 2% 7.21GB/s ± 6% +49.16% (p=0.000 n=10+10) PutUint64-8 9.89GB/s ± 3% 12.99GB/s ± 4% +31.26% (p=0.000 n=10+10) LittleEndianPutUint16-8 3.65GB/s ± 4% 3.65GB/s ± 4% ~ (p=0.912 n=10+10) LittleEndianPutUint32-8 9.74GB/s ± 3% 9.63GB/s ± 3% ~ (p=0.222 n=9+9) LittleEndianPutUint64-8 14.4GB/s ± 2% 14.3GB/s ± 5% ~ (p=0.912 n=10+10) ReadFloats-8 199MB/s ± 4% 203MB/s ± 1% +2.27% (p=0.007 n=10+10) WriteFloats-8 166MB/s ± 2% 168MB/s ± 7% ~ (p=0.089 n=10+10) ReadSlice1000Float32s-8 949MB/s ± 3% 958MB/s ± 2% ~ (p=0.218 n=10+10) WriteSlice1000Float32s-8 867MB/s ± 2% 857MB/s ± 6% ~ (p=1.000 n=8+10) ReadSlice1000Uint8s-8 4.00GB/s ± 4% 4.06GB/s ± 4% ~ (p=0.353 n=10+10) WriteSlice1000Uint8s-8 4.40GB/s ± 4% 4.36GB/s ± 2% ~ (p=0.193 n=10+7) PutUvarint32-8 262MB/s ± 2% 260MB/s ± 4% ~ (p=0.739 n=10+10) PutUvarint64-8 208MB/s ± 1% 207MB/s ± 5% ~ (p=0.408 n=8+10) Updates #45453 Change-Id: Ifda0d48d54665cef45d46d3aad974062633142c4 Reviewed-on: https://go-review.googlesource.com/c/go/+/354670 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Trust: Matthew Dempsky <mdempsky@google.com>	2021-10-18 16:01:55 +00:00
Jake Ciolek	732f6fa9d5	cmd/compile: use ANDL for small immediates We can rewrite ANDQ with an immediate fitting in 32bit with an ANDL, which is shorter to encode. Looking at Go binary itself, before the change there was: ANDL: 2337 ANDQ: 4476 After the change: ANDL: 3790 ANDQ: 3024 So we got rid of 1452 ANDQs This makes the Linux x86_64 binary 0.03% smaller. There seems to be an impact on performance. Intel Cascade Lake benchmarks (with perflock): name old time/op new time/op delta BinaryTree17-8 1.91s ± 1% 1.89s ± 1% -1.22% (p=0.000 n=21+18) Fannkuch11-8 2.34s ± 0% 2.34s ± 0% ~ (p=0.052 n=20+20) FmtFprintfEmpty-8 27.7ns ± 1% 27.4ns ± 3% ~ (p=0.497 n=21+21) FmtFprintfString-8 53.2ns ± 0% 51.5ns ± 0% -3.21% (p=0.000 n=20+19) FmtFprintfInt-8 57.3ns ± 0% 55.7ns ± 0% -2.89% (p=0.000 n=19+19) FmtFprintfIntInt-8 92.3ns ± 0% 88.4ns ± 1% -4.23% (p=0.000 n=20+21) FmtFprintfPrefixedInt-8 103ns ± 0% 103ns ± 0% +0.23% (p=0.000 n=20+21) FmtFprintfFloat-8 147ns ± 0% 148ns ± 0% +0.75% (p=0.000 n=20+21) FmtManyArgs-8 384ns ± 0% 381ns ± 0% -0.63% (p=0.000 n=21+21) GobDecode-8 3.86ms ± 1% 3.88ms ± 1% +0.52% (p=0.000 n=20+21) GobEncode-8 2.77ms ± 1% 2.77ms ± 0% ~ (p=0.078 n=21+21) Gzip-8 168ms ± 1% 168ms ± 0% +0.24% (p=0.000 n=20+20) Gunzip-8 25.1ms ± 0% 24.3ms ± 0% -3.03% (p=0.000 n=21+21) HTTPClientServer-8 61.4µs ± 8% 59.1µs ±10% ~ (p=0.088 n=20+21) JSONEncode-8 6.86ms ± 0% 6.70ms ± 0% -2.29% (p=0.000 n=20+19) JSONDecode-8 30.8ms ± 1% 30.6ms ± 1% -0.82% (p=0.000 n=20+20) Mandelbrot200-8 3.85ms ± 0% 3.85ms ± 0% ~ (p=0.191 n=16+17) GoParse-8 2.61ms ± 2% 2.60ms ± 1% ~ (p=0.561 n=21+20) RegexpMatchEasy0_32-8 48.5ns ± 2% 45.9ns ± 3% -5.26% (p=0.000 n=20+21) RegexpMatchEasy0_1K-8 139ns ± 0% 139ns ± 0% +0.27% (p=0.000 n=18+20) RegexpMatchEasy1_32-8 41.3ns ± 0% 42.1ns ± 4% +1.95% (p=0.000 n=17+21) RegexpMatchEasy1_1K-8 216ns ± 2% 216ns ± 0% +0.17% (p=0.020 n=21+19) RegexpMatchMedium_32-8 790ns ± 7% 803ns ± 8% ~ (p=0.178 n=21+21) RegexpMatchMedium_1K-8 23.5µs ± 5% 23.7µs ± 5% ~ (p=0.421 n=21+21) RegexpMatchHard_32-8 1.09µs ± 1% 1.09µs ± 1% -0.53% (p=0.000 n=19+18) RegexpMatchHard_1K-8 33.0µs ± 0% 33.0µs ± 0% ~ (p=0.610 n=21+20) Revcomp-8 348ms ± 0% 353ms ± 0% +1.38% (p=0.000 n=17+18) Template-8 42.0ms ± 1% 41.9ms ± 1% -0.30% (p=0.049 n=20+20) TimeParse-8 185ns ± 0% 185ns ± 0% ~ (p=0.387 n=20+18) TimeFormat-8 237ns ± 1% 241ns ± 1% +1.57% (p=0.000 n=21+21) [Geo mean] 35.4µs 35.2µs -0.66% name old speed new speed delta GobDecode-8 199MB/s ± 1% 198MB/s ± 1% -0.52% (p=0.000 n=20+21) GobEncode-8 277MB/s ± 1% 277MB/s ± 0% ~ (p=0.075 n=21+21) Gzip-8 116MB/s ± 1% 115MB/s ± 0% -0.25% (p=0.000 n=20+20) Gunzip-8 773MB/s ± 0% 797MB/s ± 0% +3.12% (p=0.000 n=21+21) JSONEncode-8 283MB/s ± 0% 290MB/s ± 0% +2.35% (p=0.000 n=20+19) JSONDecode-8 63.0MB/s ± 1% 63.5MB/s ± 1% +0.82% (p=0.000 n=20+20) GoParse-8 22.2MB/s ± 2% 22.3MB/s ± 1% ~ (p=0.539 n=21+20) RegexpMatchEasy0_32-8 660MB/s ± 2% 697MB/s ± 3% +5.57% (p=0.000 n=20+21) RegexpMatchEasy0_1K-8 7.36GB/s ± 0% 7.34GB/s ± 0% -0.26% (p=0.000 n=18+20) RegexpMatchEasy1_32-8 775MB/s ± 0% 761MB/s ± 4% -1.88% (p=0.000 n=17+21) RegexpMatchEasy1_1K-8 4.74GB/s ± 2% 4.74GB/s ± 0% -0.18% (p=0.020 n=21+19) RegexpMatchMedium_32-8 40.6MB/s ± 7% 39.9MB/s ± 9% ~ (p=0.191 n=21+21) RegexpMatchMedium_1K-8 43.7MB/s ± 5% 43.2MB/s ± 5% ~ (p=0.435 n=21+21) RegexpMatchHard_32-8 29.3MB/s ± 1% 29.4MB/s ± 1% +0.53% (p=0.000 n=19+18) RegexpMatchHard_1K-8 31.0MB/s ± 0% 31.0MB/s ± 0% ~ (p=0.572 n=21+20) Revcomp-8 730MB/s ± 0% 720MB/s ± 0% -1.36% (p=0.000 n=17+18) Template-8 46.2MB/s ± 1% 46.3MB/s ± 1% +0.30% (p=0.041 n=20+20) [Geo mean] 204MB/s 205MB/s +0.30% Change-Id: Iac75d0ec184a515ce0e65e19559d5fe2e9840514 Reviewed-on: https://go-review.googlesource.com/c/go/+/354970 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Trust: Josh Bleecher Snyder <josharian@gmail.com> Trust: Keith Randall <khr@golang.org> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Go Bot <gobot@golang.org>	2021-10-12 22:02:39 +00:00
Alejandro García Montoro	e1c294a56d	cmd/compile: eliminate successive swaps The code generated when storing eight bytes loaded from memory in big endian introduced two successive byte swaps that did not actually modified the data. The new rules match this specific pattern both for amd64 and for arm64, eliminating the double swap. Fixes #41684 Change-Id: Icb6dc20b68e4393cef4fe6a07b33aba0d18c3ff3 Reviewed-on: https://go-review.googlesource.com/c/go/+/320073 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Trust: Daniel Martí <mvdan@mvdan.cc> Trust: Dmitri Shuralyov <dmitshur@golang.org>	2021-10-09 01:04:29 +00:00
Ruslan Andreev	810b08b8ec	cmd/compile: inline memequal(x, const, sz) for small sizes This CL adds late expanded memequal(x, const, sz) inlining for 2, 4, 8 bytes size. This PoC is using the same method as CL 248404. This optimization fires about 100 times in Go compiler (1675 occurrences reduced to 1574, so -6%). Also, added unit-tests to codegen/comparisions.go file. Updates #37275 Change-Id: Ia52808d573cb706d1da8166c5746ede26f46c5da Reviewed-on: https://go-review.googlesource.com/c/go/+/328291 Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Trust: David Chase <drchase@google.com>	2021-10-06 13:47:50 +00:00
nimelehin	097a82f54d	cmd/compile: don't emit unnecessary amd64 extension checks In case of amd64 the compiler issues checks if extensions are available on a platform. With GOAMD64 microarchitecture levels provided, some of the checks could be eliminated. Change-Id: If15c178bcae273b2ce7d3673415cb8849292e087 Reviewed-on: https://go-review.googlesource.com/c/go/+/352010 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org>	2021-10-05 18:32:12 +00:00
wdvxdr	060cd73ab9	cmd/compile: use TZCNT instruction for GOAMD64>=v3 on my Intel CoffeeLake CPU: name old time/op new time/op delta TrailingZeros-8 0.68ns ± 1% 0.64ns ± 1% -6.26% (p=0.000 n=10+10) TrailingZeros8-8 0.70ns ± 1% 0.70ns ± 1% ~ (p=0.697 n=10+10) TrailingZeros16-8 0.70ns ± 1% 0.70ns ± 1% +0.57% (p=0.043 n=10+10) TrailingZeros32-8 0.66ns ± 1% 0.64ns ± 1% -3.35% (p=0.000 n=10+10) TrailingZeros64-8 0.68ns ± 1% 0.64ns ± 1% -5.84% (p=0.000 n=9+10) Updates #45453 Change-Id: I228ff2d51df24b1306136f061432f8a12bb1d6fd Reviewed-on: https://go-review.googlesource.com/c/go/+/353249 Trust: Michael Knyszek <mknyszek@google.com> Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2021-10-05 16:06:49 +00:00
Lynn Boger	b8990ec932	test: update test/codegen/noextend.go to work with either ABI on ppc64x This updates the codegen tests in noextend.go so they are not dependent on the ABI. Change-Id: I8433bea9dc78830c143290a7e0cf901b2397d38a Reviewed-on: https://go-review.googlesource.com/c/go/+/353070 Trust: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-09-29 17:30:11 +00:00
Archana R	b35c668072	cmd/compile: add PPC64-specific inlining for runtime.memmove Add rule to PPC64.rules to inline runtime.memmove in more cases, as is done for other target architectures Updated tests in codegen/copy.go to verify changes are done on ppc64/ppc64le Updates #41662 Change-Id: Id937ce21f9b4f4047b3e66dfa3c960128ee16a2a Reviewed-on: https://go-review.googlesource.com/c/go/+/352054 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>	2021-09-29 16:55:51 +00:00
Joel Sing	fe8347b61a	cmd/compile: optimise immediate operands with constants on riscv64 Instructions with immediates can be precomputed when operating on a constant - do so for SLTI/SLTIU, SLLI/SRLI/SRAI, NEG/NEGW, ANDI, ORI and ADDI. Additionally, optimise ANDI and ORI when the immediate is all ones or all zeroes. In particular, the RISCV64 logical left and right shift rules (Lshx/RshUx) produce sequences that check if the shift amount exceeds 64 and if so returns zero. When the shift amount is a constant we can precompute and eliminate the filter entirely. Likewise the arithmetic right shift rules produce sequences that check if the shift amount exceeds 64 and if so, ensures that the lower six bits of the shift are all ones. When the shift amount is a constant we can precompute the shift value. Arithmetic right shift sequences like: 117fc: 00100513 li a0,1 11800: 04053593 sltiu a1,a0,64 11804: fff58593 addi a1,a1,-1 11808: 0015e593 ori a1,a1,1 1180c: 40b45433 sra s0,s0,a1 Are now a single srai instruction: 117fc: 40145413 srai s0,s0,0x1 Likewise for logical left shift (and logical right shift): 1d560: 01100413 li s0,17 1d564: 04043413 sltiu s0,s0,64 1d568: 40800433 neg s0,s0 1d56c: 01131493 slli s1,t1,0x11 1d570: 0084f433 and s0,s1,s0 Which are now a single slli (or srli) instruction: 1d120: 01131413 slli s0,t1,0x11 This removes more than 30,000 instructions from the Go binary and should improve performance in a variety of areas - of note runtime.makemap_small drops from 48 to 36 instructions. Similar gains exist in at least other parts of runtime and math/bits. Change-Id: I33f6f3d1fd36d9ff1bda706997162bfe4bb859b6 Reviewed-on: https://go-review.googlesource.com/c/go/+/350689 Trust: Joel Sing <joel@sing.id.au> Reviewed-by: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-09-24 10:51:48 +00:00
Joel Sing	d413908320	test/codegen: add shift tests for RISCV64 Add tests for shift by constant, masked shifts and bounded shifts. While here, sort tests by architecture and keep order of tests consistent (lsh, rshU, rsh). Change-Id: I512d64196f34df9cb2884e8c0f6adcf9dd88b0fc Reviewed-on: https://go-review.googlesource.com/c/go/+/351289 Trust: Joel Sing <joel@sing.id.au> Reviewed-by: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com>	2021-09-24 07:22:13 +00:00
Matthew Dempsky	04572fa29b	cmd/compile: use BMI1 instructions for GOAMD64=v3 and higher BMI1 includes four instructions (ANDN, BLSI, BLSMSK, BLSR) that are easy to peephole optimize, and which GCC always seems to favor using when available and applicable. Updates #45453. Change-Id: I0274184057058f5c579e5bc3ea9c414396d3cf46 Reviewed-on: https://go-review.googlesource.com/c/go/+/351130 Run-TryBot: Matthew Dempsky <mdempsky@google.com> Trust: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2021-09-22 00:15:27 +00:00
Keith Randall	6f35430faa	cmd/compile: allow rotates to be merged with logical ops on arm64 Fixes #48002 Change-Id: Ie3a157d55b291f5ac2ef4845e6ce4fefd84fc642 Reviewed-on: https://go-review.googlesource.com/c/go/+/350912 Trust: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-09-20 16:25:36 +00:00
Keith Randall	315dbd10c9	cmd/compile: fold double negate on arm64 Fixes #48467 Change-Id: I52305dbf561ee3eee6c1f053e555a3a6ec1ab892 Reviewed-on: https://go-review.googlesource.com/c/go/+/350910 Trust: Keith Randall <khr@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2021-09-19 23:50:45 +00:00
Keith Randall	83b36ffb10	cmd/compile: implement constant rotates on arm64 Explicit constant rotates work, but constant arguments to bits.RotateLeft* needed the additional rule. Fixes #48465 Change-Id: Ia7544f21d0e7587b6b6506f72421459cd769aea6 Reviewed-on: https://go-review.googlesource.com/c/go/+/350909 Trust: Keith Randall <khr@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2021-09-19 23:50:23 +00:00
Michael Munday	c69f5c0d76	cmd/compile: add support for Abs and Copysign intrinsics on riscv64 Also, add the FABSS and FABSD pseudo instructions to the assembler. The compiler could use FSGNJX[SD] directly but there doesn't seem to be much advantage to doing so and the pseudo instructions are easier to understand. Change-Id: Ie8825b8aa8773c69cc4f07a32ef04abf4061d80d Reviewed-on: https://go-review.googlesource.com/c/go/+/348989 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au>	2021-09-10 10:45:59 +00:00
fanzha02	2091bd3f26	cmd/compile: simiplify arm64 bitfield optimizations In some rewrite rules for arm64 bitfield optimizations, the bitfield lsb value and the bitfield width value are related to datasize, some of them use datasize directly to check the bitfield lsb value is valid, to get the bitfiled width value, but some of them call isARM64BFMask() and arm64BFWidth() functions. In order to be consistent, this patch changes them all to use datasize. Besides, this patch sorts the codegen test cases. Run the "toolstash-check -all" command and find one inconsistent code is as the following. new: src/math/fma.go:104 BEQ 247 master: src/math/fma.go:104 BEQ 248 The above inconsistence is due to this patch changing the range of the field lsb value in "UBFIZ" optimization rules from "lc+(32\|16\|8)<64" to "lc<64", so that the following code is generated as "UBFIZ". The logical of changed code is still correct. The code of src/math/fma.go:160: const uvinf = 0x7FF0000000000000 func FMA(a, b uint32) float64 { ps := a+b return Float64frombits(uint64(ps)<<63 \| uvinf) } The new assembly code: TEXT "".FMA(SB), LEAF\|NOFRAME\|ABIInternal, $0-16 MOVWU "".a(FP), R0 MOVWU "".b+4(FP), R1 ADD R1, R0, R0 UBFIZ $63, R0, $1, R0 ORR $9218868437227405312, R0, R0 MOVD R0, "".~r2+8(FP) RET (R30) The master assembly code: TEXT "".FMA(SB), LEAF\|NOFRAME\|ABIInternal, $0-16 MOVWU "".a(FP), R0 MOVWU "".b+4(FP), R1 ADD R1, R0, R0 MOVWU R0, R0 LSL $63, R0, R0 ORR $9218868437227405312, R0, R0 MOVD R0, "".~r2+8(FP) RET (R30) Change-Id: I9061104adfdfd3384d0525327ae1e5c8b0df5c35 Reviewed-on: https://go-review.googlesource.com/c/go/+/265038 Trust: fannie zhang <Fannie.Zhang@arm.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-09-10 02:44:36 +00:00
Michael Munday	8214257347	test/codegen: fix package name for test case The codegen tests are currently skipped (see #48247). The test added in CL 346050 did not compile because it was in the main package but did not contain a main function. Changing the package to 'codegen' fixes the issue. Updates #48247. Change-Id: I0a0eaca8e6a7d7b335606d2c76a204ac0c12e6d5 Reviewed-on: https://go-review.googlesource.com/c/go/+/348392 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org>	2021-09-08 14:51:40 +00:00
Michael Munday	1da64686f8	test/codegen: fix compilation of bitfield tests The codegen tests are currently skipped (see #48247) and the bitfield tests do not actually compile due to a duplicate function name (sbfiz5) added in CL 267602. Renaming the function fixes the issue. Updates #48247. Change-Id: I626fd5ef13732dc358e73ace9ddcc4cbb6ae5b21 Reviewed-on: https://go-review.googlesource.com/c/go/+/348391 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-09-08 14:51:31 +00:00
Michael Munday	fdc2072420	test/codegen: remove broken riscv64 test This test is not executed by default (see #48247) and does not actually pass. It was added in CL 346689. The code generation changes made in that CL only change how instructions are assembled, they do not actually affect the output of the compiler. This test is unfortunately therefore invalid and will never pass. Updates #48247. Change-Id: I0c807e4a111336e5a097fe4e3af2805f9932a87f Reviewed-on: https://go-review.googlesource.com/c/go/+/348390 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org>	2021-09-08 14:51:22 +00:00
wdvxdr	21de6bc463	cmd/compile: simplify less with non-negative number and constant 0 or 1 The most common cases: len(s) > 0 len(s) < 1 and they can be simplified to: len(s) != 0 len(s) == 0 Fixes #48054 Change-Id: I16e5b0cffcfab62a4acc2a09977a6cd3543dd000 Reviewed-on: https://go-review.googlesource.com/c/go/+/346050 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-09-07 14:53:41 +00:00
fanzha02	7b69ddc171	cmd/compile: merge sign extension and shift into SBFIZ This patch adds some rules to rewrite "(LeftShift (SignExtend x) lc)" expression as "SBFIZ". Add the test cases. Change-Id: I294c4ba09712eeb02c7a952447bb006780f1e60d Reviewed-on: https://go-review.googlesource.com/c/go/+/267602 Trust: fannie zhang <Fannie.Zhang@arm.com> Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2021-09-06 03:31:53 +00:00
fanzha02	43b05173a2	cmd/compile: merge zero/sign extensions with UBFX/SBFX on arm64 The UBFX and SBFX already zero/sign extend the result. Further zero/sign extensions are thus unnecessary as long as they leave the top bits unaltered. This patch absorbs zero/sign extensions into UBFX/SBFX. Add the related test cases. Change-Id: I7c4516c8b52d677f77bf3aaedab87c4a28056ec0 Reviewed-on: https://go-review.googlesource.com/c/go/+/265039 Trust: fannie zhang <Fannie.Zhang@arm.com> Trust: Keith Randall <khr@golang.org> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-09-06 03:31:43 +00:00
Ben Shi	37d4532867	cmd/internal/obj/riscv: simplify addition with constant This CL simplifies riscv addition (add r, imm) to (ADDI (ADDI r, imm/2), imm-imm/2) if imm is in specific ranges. (-4096 <= imm <= -2049 or 2048 <= imm <= 4094) There is little impact to the go1 benchmark, while the total size of pkg/linux_riscv64 decreased by about 11KB. Change-Id: I236eb8af3b83bb35ce9c0b318fc1d235e8ab9a4e GitHub-Last-Rev: `a2f56a0763` GitHub-Pull-Request: golang/go#48110 Reviewed-on: https://go-review.googlesource.com/c/go/+/346689 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> Trust: Michael Munday <mike.munday@lowrisc.org>	2021-09-02 14:35:10 +00:00
Michael Munday	ea51e223c2	cmd/{asm,compile}: add fused multiply-add support on riscv64 Add support to the assembler for F[N]M{ADD,SUB}[SD] instructions. Argument order is: OP RS1, RS2, RS3, RD Also, add support for the FMA intrinsic to the compiler. Automatic FMA matching is left to a future CL. Change-Id: I47166c7393b2ab6bfc2e42aa8c1a8997c3a071b3 Reviewed-on: https://go-review.googlesource.com/c/go/+/293030 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au>	2021-09-01 21:17:04 +00:00
Jake Ciolek	6cf1d5d0fa	cmd/compile: generic SSA rules for simplifying 2 and 3 operand integer arithmetic expressions This applies the following generic integer addition/subtraction transformations: x - (x + y) = -y (x - y) - x = -y y + (x - y) = x y + (z + (x - y) = x + z There's over 40 unique functions matching in Go. Hits 2 funcs in the runtime itself: runtime.stackfree() runtime.runqdrain() Go binary size reduced by 0.05% on Linux x86_64. StackCopy bench (perflocked Cascade Lake x86): name old time/op new time/op delta StackCopyPtr-8 87.3ms ± 1% 86.9ms ± 0% -0.45% (p=0.000 n=20+20) StackCopy-8 77.6ms ± 1% 77.0ms ± 0% -0.76% (p=0.000 n=20+20) StackCopyNoCache-8 2.28ms ± 2% 2.26ms ± 2% -0.93% (p=0.008 n=19+20) test/bench/go1 benchmarks (perflocked Cascade Lake x86): name old time/op new time/op delta BinaryTree17-8 1.88s ± 1% 1.88s ± 0% ~ (p=0.373 n=15+12) Fannkuch11-8 2.31s ± 0% 2.35s ± 0% +1.52% (p=0.000 n=15+14) FmtFprintfEmpty-8 26.6ns ± 0% 26.6ns ± 0% ~ (p=0.081 n=14+13) FmtFprintfString-8 48.6ns ± 0% 50.0ns ± 0% +2.86% (p=0.000 n=15+14) FmtFprintfInt-8 56.9ns ± 0% 54.8ns ± 0% -3.70% (p=0.000 n=15+15) FmtFprintfIntInt-8 90.4ns ± 0% 88.8ns ± 0% -1.78% (p=0.000 n=15+15) FmtFprintfPrefixedInt-8 104ns ± 0% 104ns ± 0% ~ (p=0.905 n=14+13) FmtFprintfFloat-8 148ns ± 0% 144ns ± 0% -2.19% (p=0.000 n=14+15) FmtManyArgs-8 389ns ± 0% 390ns ± 0% +0.35% (p=0.000 n=12+15) GobDecode-8 3.90ms ± 1% 3.88ms ± 0% -0.49% (p=0.000 n=15+14) GobEncode-8 2.73ms ± 0% 2.73ms ± 0% ~ (p=0.425 n=15+14) Gzip-8 169ms ± 0% 168ms ± 0% -0.52% (p=0.000 n=13+13) Gunzip-8 24.7ms ± 0% 24.8ms ± 0% +0.61% (p=0.000 n=15+15) HTTPClientServer-8 60.5µs ± 6% 60.4µs ± 7% ~ (p=0.595 n=15+15) JSONEncode-8 6.97ms ± 1% 6.93ms ± 0% -0.69% (p=0.000 n=14+14) JSONDecode-8 31.2ms ± 1% 30.8ms ± 1% -1.27% (p=0.000 n=14+14) Mandelbrot200-8 3.87ms ± 0% 3.87ms ± 0% ~ (p=0.652 n=15+14) GoParse-8 2.65ms ± 2% 2.64ms ± 1% ~ (p=0.202 n=15+15) RegexpMatchEasy0_32-8 45.1ns ± 0% 45.9ns ± 0% +1.68% (p=0.000 n=14+15) RegexpMatchEasy0_1K-8 140ns ± 0% 139ns ± 0% -0.44% (p=0.000 n=15+14) RegexpMatchEasy1_32-8 40.9ns ± 3% 40.5ns ± 0% -0.88% (p=0.000 n=15+13) RegexpMatchEasy1_1K-8 215ns ± 1% 220ns ± 1% +2.27% (p=0.000 n=15+15) RegexpMatchMedium_32-8 783ns ± 7% 738ns ± 0% ~ (p=0.361 n=15+15) RegexpMatchMedium_1K-8 24.1µs ± 6% 23.4µs ± 6% -2.94% (p=0.004 n=15+15) RegexpMatchHard_32-8 1.10µs ± 1% 1.09µs ± 1% -0.40% (p=0.006 n=15+14) RegexpMatchHard_1K-8 33.0µs ± 0% 33.0µs ± 0% ~ (p=0.535 n=12+14) Revcomp-8 354ms ± 0% 353ms ± 0% -0.23% (p=0.002 n=15+13) Template-8 42.0ms ± 1% 41.8ms ± 2% -0.37% (p=0.023 n=14+15) TimeParse-8 181ns ± 0% 180ns ± 1% -0.18% (p=0.014 n=12+13) TimeFormat-8 240ns ± 0% 242ns ± 1% +0.69% (p=0.000 n=12+15) [Geo mean] 35.2µs 35.1µs -0.43% name old speed new speed delta GobDecode-8 197MB/s ± 1% 198MB/s ± 0% +0.49% (p=0.000 n=15+14) GobEncode-8 281MB/s ± 0% 281MB/s ± 0% ~ (p=0.419 n=15+14) Gzip-8 115MB/s ± 0% 115MB/s ± 0% +0.52% (p=0.000 n=13+13) Gunzip-8 786MB/s ± 0% 781MB/s ± 0% -0.60% (p=0.000 n=15+15) JSONEncode-8 278MB/s ± 1% 280MB/s ± 0% +0.69% (p=0.000 n=14+14) JSONDecode-8 62.3MB/s ± 1% 63.1MB/s ± 1% +1.29% (p=0.000 n=14+14) GoParse-8 21.9MB/s ± 2% 22.0MB/s ± 1% ~ (p=0.205 n=15+15) RegexpMatchEasy0_32-8 709MB/s ± 0% 697MB/s ± 0% -1.65% (p=0.000 n=14+15) RegexpMatchEasy0_1K-8 7.34GB/s ± 0% 7.37GB/s ± 0% +0.43% (p=0.000 n=15+15) RegexpMatchEasy1_32-8 783MB/s ± 2% 790MB/s ± 0% +0.88% (p=0.000 n=15+13) RegexpMatchEasy1_1K-8 4.77GB/s ± 1% 4.66GB/s ± 1% -2.23% (p=0.000 n=15+15) RegexpMatchMedium_32-8 41.0MB/s ± 7% 43.3MB/s ± 0% ~ (p=0.360 n=15+15) RegexpMatchMedium_1K-8 42.5MB/s ± 6% 43.8MB/s ± 6% +3.07% (p=0.004 n=15+15) RegexpMatchHard_32-8 29.2MB/s ± 1% 29.3MB/s ± 1% +0.41% (p=0.006 n=15+14) RegexpMatchHard_1K-8 31.1MB/s ± 0% 31.1MB/s ± 0% ~ (p=0.495 n=12+14) Revcomp-8 718MB/s ± 0% 720MB/s ± 0% +0.23% (p=0.002 n=15+13) Template-8 46.3MB/s ± 1% 46.4MB/s ± 2% +0.38% (p=0.021 n=14+15) [Geo mean] 205MB/s 206MB/s +0.57% Change-Id: Ibd1afdf8b6c0b08087dcc3acd8f943637eb95ac0 Reviewed-on: https://go-review.googlesource.com/c/go/+/344930 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com>	2021-08-25 19:45:20 +00:00
Meng Zhuo	efd206eb40	cmd/compile: intrinsify Mul64 on riscv64 According to RISCV instruction set manual v2.2 Sec 6.1 MULHU followed by MUL will be fused into one multiply by microarchitecture Benchstat on Hifive unmatched: name old time/op new time/op delta Hash8Bytes 245ns ± 3% 186ns ± 4% -23.99% (p=0.000 n=10+10) Hash320Bytes 1.94µs ± 1% 1.31µs ± 1% -32.38% (p=0.000 n=9+10) Hash1K 5.84µs ± 0% 3.84µs ± 0% -34.20% (p=0.000 n=10+9) Hash8K 45.3µs ± 0% 29.4µs ± 0% -35.04% (p=0.000 n=10+10) name old speed new speed delta Hash8Bytes 32.7MB/s ± 3% 43.0MB/s ± 4% +31.61% (p=0.000 n=10+10) Hash320Bytes 165MB/s ± 1% 244MB/s ± 1% +47.88% (p=0.000 n=9+10) Hash1K 175MB/s ± 0% 266MB/s ± 0% +51.98% (p=0.000 n=10+9) Hash8K 181MB/s ± 0% 279MB/s ± 0% +53.94% (p=0.000 n=10+10) Change-Id: I3561495d02a4a0ad8578e9b9819bf0a4eaca5d12 Reviewed-on: https://go-review.googlesource.com/c/go/+/329970 Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Go Bot <gobot@golang.org> Trust: Meng Zhuo <mzh@golangcn.org>	2021-08-16 13:50:11 +00:00
Cherry Mui	6b1e4430bb	[dev.typeparams] cmd/compile: implement clobberdead mode on ARM64 For debugging. Change-Id: I5875ccd2413b8ffd2ec97a0ace66b5cae7893b24 Reviewed-on: https://go-review.googlesource.com/c/go/+/324765 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Than McIntosh <thanm@google.com> TryBot-Result: Go Bot <gobot@golang.org>	2021-06-03 20:00:30 +00:00
Cherry Mui	165d39a1d4	[dev.typeparams] test: adjust codegen test for register ABI on ARM64 In codegen/arithmetic.go, previously there are MOVD's that match for loads of arguments. With register ABI there are no more such loads. Remove the MOVD matches. Change-Id: I920ee2629c8c04d454f13a0c08e283d3528d9a64 Reviewed-on: https://go-review.googlesource.com/c/go/+/324251 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com>	2021-06-03 14:28:14 +00:00
Ruslan Andreev	3b321a9d12	cmd/compile: add arch-specific inlining for runtime.memmove This CL add runtime.memmove inlining for AMD64 and ARM64. According to ssa dump from testcases generic rules can't inline memmomve properly due to one of the arguments is Phi operation. But this Phi op will be optimized out by later optimization stages. As a result memmove can be inlined during arch-specific rules. The commit add new optimization rules to arch-specific rules that can inline runtime.memmove if it possible during lowering stage. Optimization fires 5 times in Go source-code using regabi. Fixes #41662 Change-Id: Iaffaf4c482d068b5f0683d141863892202cc8824 Reviewed-on: https://go-review.googlesource.com/c/go/+/289151 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Trust: David Chase <drchase@google.com>	2021-05-12 16:23:30 +00:00
Keith Randall	b211fe0058	cmd/compile: remove bit operations that modify memory directly These operations (BT{S,R,C}{Q,L}modify) are quite a bit slower than other ways of doing the same thing. Without the BTxmodify operations, there are two fallback ways the compiler performs these operations: AND/OR/XOR operations directly on memory, or load-BTx-write sequences. The compiler kinda chooses one arbitrarily depending on rewrite rule application order. Currently, it uses load-BTx-write for the Const benchmarks and AND/OR/XOR directly to memory for the non-Const benchmarks. TBD, someone might investigate which of the two fallback strategies is really better. For now, they are both better than BTx ops. name old time/op new time/op delta BitSet-8 1.09µs ± 2% 0.64µs ± 5% -41.60% (p=0.000 n=9+10) BitClear-8 1.15µs ± 3% 0.68µs ± 6% -41.00% (p=0.000 n=10+10) BitToggle-8 1.18µs ± 4% 0.73µs ± 2% -38.36% (p=0.000 n=10+8) BitSetConst-8 37.0ns ± 7% 25.8ns ± 2% -30.24% (p=0.000 n=10+10) BitClearConst-8 30.7ns ± 2% 25.0ns ±12% -18.46% (p=0.000 n=10+10) BitToggleConst-8 36.9ns ± 1% 23.8ns ± 3% -35.46% (p=0.000 n=9+10) Fixes #45790 Update #45242 Change-Id: Ie33a72dc139f261af82db15d446cd0855afb4e59 Reviewed-on: https://go-review.googlesource.com/c/go/+/318149 Trust: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Ben Shi <powerman1st@163.com>	2021-05-08 03:27:59 +00:00
Cherry Zhang	6b8e3e2d06	cmd/compile: reduce redundant register moves for regabi calls Currently, if we have AX=a and BX=b, and we want to make a call F(1, a, b), to move arguments into the desired registers it emits MOVQ AX, CX MOVL $1, AX // AX=1 MOVQ BX, DX MOVQ CX, BX // BX=a MOVQ DX, CX // CX=b This has a few redundant moves. This is because we process inputs in order. First, allocate 1 to AX, which kicks out a (in AX) to CX (a free register at the moment). Then, allocate a to BX, which kicks out b (in BX) to DX. Finally, put b to CX. Notice that if we start with allocating CX=b, then BX=a, AX=1, we will not have redundant moves. This CL reduces redundant moves by allocating them in different order: First, for inpouts that are already in place, keep them there. Then allocate free registers. Then everything else. before after cmd/compile binary size 23703888 23609680 text size 8565899 8533291 (with regabiargs enabled.) Change-Id: I69e1bdf745f2c90bb791f6d7c45b37384af1e874 Reviewed-on: https://go-review.googlesource.com/c/go/+/311371 Trust: Cherry Zhang <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Than McIntosh <thanm@google.com>	2021-04-19 16:21:39 +00:00
David Chase	4df3d0e4df	cmd/compile: rescue stmt boundaries from OpArgXXXReg and OpSelectN. Fixes this failure: go test cmd/compile/internal/ssa -run TestStmtLines -v === RUN TestStmtLines stmtlines_test.go:115: Saw too many (amd64, > 1%) lines without statement marks, total=88263, nostmt=1930 ('-run TestStmtLines -v' lists failing lines) The failure has two causes. One is that the first-line adjuster in code generation was relocating "first lines" to instructions that would either not have any code generated, or would have the statment marker removed by a different believed-good heuristic. The other was that statement boundaries were getting attached to register values (that with the old ABI were loads from the stack, hence real instructions). The register values disappear at code generation. The fixes are to (1) note that certain instructions are not good choices for "first value" and skip them, and (2) in an expandCalls post-pass, look for register valued instructions and under appropriate conditions move their statement marker to a compatible use. Also updates TestStmtLines to always log the score, for easier comparison of minor compiler changes. Updates #40724. Change-Id: I485573ce900e292d7c44574adb7629cdb4695c3f Reviewed-on: https://go-review.googlesource.com/c/go/+/309649 Trust: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2021-04-14 18:46:08 +00:00
Cherry Zhang	444d28295b	test: make codegen/memops.go work with both ABIs Following CL 309335, this fixes memops.go. Change-Id: Ia2458b5267deee9f906f76c50e70a021ea2fcb5b Reviewed-on: https://go-review.googlesource.com/c/go/+/309552 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Than McIntosh <thanm@google.com>	2021-04-13 14:07:17 +00:00
Cherry Zhang	263e13d1f7	test: make codegen tests work with both ABIs Some codegen tests were written with the assumption that arguments and results are in memory, and with a specific stack layout. With the register ABI, the assumption is no longer true. Adjust the tests to work with both cases. - For tests expecting in memory arguments/results, change to use global variables or memory-assigned argument/results. - Allow more registers. E.g. some tests expecting register names contain only letters (e.g. AX), but it can also contain numbers (e.g. R10). - Some instruction selection changes when operate on register vs. memory, e.g. ADDQ vs. LEAQ, MOVB vs. MOVL. Accept both. TODO: mathbits.go and memops.go still need fix. Change-Id: Ic5932b4b5dd3f5d30ed078d296476b641420c4c5 Reviewed-on: https://go-review.googlesource.com/c/go/+/309335 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2021-04-12 21:59:59 +00:00
Pat Gavlin	359f44910f	cmd/compile: fix long RMW bit operations on AMD64 Under certain circumstances, the existing rules for bit operations can produce code that writes beyond its intended bounds. For example, consider the following code: func repro(b []byte, addr, bit int32) { _ = b[3] v := uint32(b[0]) \| uint32(b[1])<<8 \| uint32(b[2])<<16 \| uint32(b[3])<<24 \| 1<<(bit&31) b[0] = byte(v) b[1] = byte(v >> 8) b[2] = byte(v >> 16) b[3] = byte(v >> 24) } Roughly speaking: 1. The expression `1 << (bit & 31)` is rewritten into `(SHLL 1 bit)` 2. The expression `uint32(b[0]) \| uint32(b[1])<<8 \| uint32(b[2])<<16 \| uint32(b[3])<<24` is rewritten into `(MOVLload &b[0])` 3. The statements `b[0] = byte(v) ... b[3] = byte(v >> 24)` are rewritten into `(MOVLstore &b[0], v)` 4. `(ORL (SHLL 1, bit) (MOVLload &b[0]))` is rewritten into `(BTSL (MOVLload &b[0]) bit)`. This is a valid transformation because the destination is a register: in this case, the bit offset is masked by the number of bits in the destination register. This is identical to the masking performed by `SHL`. 5. `(MOVLstore &b[0] (BTSL (MOVLload &b[0]) bit))` is rewritten into `(BTSLmodify &b[0] bit)`. This is an invalid transformation because the destination is memory: in this case, the bit offset is not masked, and the chosen instruction may write outside its intended 32-bit location. These changes fix the invalid rewrite performed in step (5) by explicitly maksing the bit offset operand to `BT(S\|R\|C)(L\|Q)modify`. In the example above, the adjusted rules produce `(BTSLmodify &b[0] (ANDLconst [31] bit))` in step (5). These changes also add several new rules to rewrite bit sets, toggles, and clears that are rooted at `(OR\|XOR\|AND)(L\|Q)modify` operators into appropriate `BT(S\|R\|C)(L\|Q)modify` operators. These rules catch cases where `MOV(L\|Q)store ((OR\|XOR\|AND)(L\|Q) ...)` is rewritten to `(OR\|XOR\|AND)(L\|Q)modify` before the `(OR\|XOR\|AND)(L\|Q) ...` can be rewritten to `BT(S\|R\|C)(L\|Q) ...`. Overall, compilecmp reports small improvements in code size on darwin/amd64 when the changes to the compiler itself are exlcuded: file before after Δ % runtime.s 536464 536412 -52 -0.010% bytes.s 32629 32593 -36 -0.110% strings.s 44565 44529 -36 -0.081% os/signal.s 7967 7959 -8 -0.100% cmd/vendor/golang.org/x/sys/unix.s 81686 81678 -8 -0.010% math/big.s 188235 188253 +18 +0.010% cmd/link/internal/loader.s 89295 89056 -239 -0.268% cmd/link/internal/ld.s 633551 633232 -319 -0.050% cmd/link/internal/arm.s 18934 18928 -6 -0.032% cmd/link/internal/arm64.s 31814 31801 -13 -0.041% cmd/link/internal/riscv64.s 7347 7345 -2 -0.027% cmd/compile/internal/ssa.s 4029173 4033066 +3893 +0.097% total 21298280 21301472 +3192 +0.015% Change-Id: I2e560548b515865129e1724e150e30540e9d29ce GitHub-Last-Rev: `9a42bd29a5` GitHub-Pull-Request: golang/go#45242 Reviewed-on: https://go-review.googlesource.com/c/go/+/304869 Reviewed-by: Keith Randall <khr@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com>	2021-03-26 19:40:37 +00:00
fanzha02	3a0061822e	cmd/compile: add arm64 rules to optimize go codes to constant 0 Optimize the following codes to constant 0. function shift (x uint32) uint64 { return uint64(x) >> 32 } Change-Id: Ida6b39d713cc119ad5a2f01fd54bfd252cf2c975 Reviewed-on: https://go-review.googlesource.com/c/go/+/303830 Trust: fannie zhang <Fannie.Zhang@arm.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2021-03-26 01:57:35 +00:00
fanzha02	b182ba7fab	cmd/compile: optimize codes with arm64 REV16 instruction Optimize some patterns into rev16/rev16w instruction. Pattern1: (c & 0xff00ff00)>>8 \| (c & 0x00ff00ff)<<8 To: rev16w c Pattern2: (c & 0xff00ff00ff00ff00)>>8 \| (c & 0x00ff00ff00ff00ff)<<8 To: rev16 c This patch is a copy of CL 239637, contributed by Alice Xu(dianhong.xu@arm.com). Change-Id: I96936c1db87618bc1903c04221c7e9b2779455b3 Reviewed-on: https://go-review.googlesource.com/c/go/+/268377 Trust: fannie zhang <Fannie.Zhang@arm.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2021-03-23 01:36:23 +00:00
Cherry Zhang	6ae3b70ef2	cmd/compile: add clobberdeadreg mode When -clobberdeadreg flag is set, the compiler inserts code that clobbers integer registers at call sites. This may be helpful for debugging register ABI. Only implemented on AMD64 for now. Change-Id: Ia203d3f891c30fd95d0103489056fe01d63a2899 Reviewed-on: https://go-review.googlesource.com/c/go/+/302809 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2021-03-19 23:21:21 +00:00
fanzha02	f5e6d3e879	cmd/compile: add rewrite rules for conditional instructions on arm64 This CL adds rewrite rules for CSETM, CSINC, CSINV, and CSNEG. By adding these rules, we can save one instruction. For example, func test(cond bool, a int) int { if cond { a++ } return a } Before: MOVD "".a+8(RSP), R0 ADD $1, R0, R1 MOVBU "".cond(RSP), R2 CMPW $0, R2 CSEL NE, R1, R0, R0 After: MOVBU "".cond(RSP), R0 CMPW $0, R0 MOVD "".a+8(RSP), R0 CSINC EQ, R0, R0, R0 This patch is a copy of CL 285694. Co-authored-by: JunchenLi <junchen.li@arm.com> Change-Id: Ic1a79e8b8ece409b533becfcb7950f11e7b76f24 Reviewed-on: https://go-review.googlesource.com/c/go/+/302231 Trust: fannie zhang <Fannie.Zhang@arm.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2021-03-18 01:46:58 +00:00
Cherry Zhang	8628bf9a97	cmd/compile: resurrect clobberdead mode This CL resurrects the clobberdead debugging mode (CL 23924). When -clobberdead flag is set (TODO: make it GOEXPERIMENT?), the compiler inserts code that clobbers all dead stack slots that contains pointers. Mark windows syscall functions cgo_unsafe_args, as the code actually does that, by taking the address of one argument and passing it to cgocall. Change-Id: Ie09a015f4bd14ae6053cc707866e30ae509b9d6f Reviewed-on: https://go-review.googlesource.com/c/go/+/301791 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Than McIntosh <thanm@google.com>	2021-03-17 17:50:50 +00:00
Meng Zhuo	afe517590c	cmd/compile: loads from readonly globals into const for mips64x Ref: CL 141118 Update #26498 Change-Id: If4ea55c080b9aa10183eefe81fefbd4072deaf3a Reviewed-on: https://go-review.googlesource.com/c/go/+/280646 Trust: Meng Zhuo <mzh@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2021-03-16 09:07:17 +00:00
erifan01	600259b099	cmd/compile: use depth first topological sort algorithm for layout The current layout algorithm tries to put consecutive blocks together, so the priority of the successor block is higher than the priority of the zero indegree block. This algorithm is beneficial for subsequent register allocation, but will result in more branch instructions. The depth-first topological sorting algorithm is a well-known layout algorithm, which has applications in many languages, and it helps to reduce branch instructions. This CL applies it to the layout pass. The test results show that it helps to reduce the code size. This CL also includes the following changes: 1, Removed the primary predecessor mechanism. The new layout algorithm is not very friendly to register allocator in some cases, in order to adapt to the new layout algorithm, a new primary predecessor selection strategy is introduced. 2, Since the new layout implementation may place non-loop blocks between loop blocks, some adaptive modifications have also been made to looprotate pass. 3, The layout also affects the results of codegen, so this CL also adjusted several codegen tests accordingly. It is inevitable that this CL will cause the code size or performance of a few functions to decrease, but the number of cases it improves is much larger than the number of cases it drops. Statistical data from compilecmp on linux/amd64 is as follow: name old time/op new time/op delta Template 382ms ± 4% 382ms ± 4% ~ (p=0.497 n=49+50) Unicode 170ms ± 9% 169ms ± 8% ~ (p=0.344 n=48+50) GoTypes 2.01s ± 4% 2.01s ± 4% ~ (p=0.628 n=50+48) Compiler 190ms ±10% 189ms ± 9% ~ (p=0.734 n=50+50) SSA 11.8s ± 2% 11.8s ± 3% ~ (p=0.877 n=50+50) Flate 241ms ± 9% 241ms ± 8% ~ (p=0.897 n=50+49) GoParser 366ms ± 3% 361ms ± 4% -1.21% (p=0.004 n=47+50) Reflect 835ms ± 3% 838ms ± 3% ~ (p=0.275 n=50+49) Tar 336ms ± 4% 335ms ± 3% ~ (p=0.454 n=48+48) XML 433ms ± 4% 431ms ± 3% ~ (p=0.071 n=49+48) LinkCompiler 706ms ± 4% 705ms ± 4% ~ (p=0.608 n=50+49) ExternalLinkCompiler 1.85s ± 3% 1.83s ± 2% -1.47% (p=0.000 n=49+48) LinkWithoutDebugCompiler 437ms ± 5% 437ms ± 6% ~ (p=0.953 n=49+50) [Geo mean] 615ms 613ms -0.37% name old alloc/op new alloc/op delta Template 38.7MB ± 1% 38.7MB ± 1% ~ (p=0.834 n=50+50) Unicode 28.1MB ± 0% 28.1MB ± 0% -0.22% (p=0.000 n=49+50) GoTypes 168MB ± 1% 168MB ± 1% ~ (p=0.054 n=47+47) Compiler 23.0MB ± 1% 23.0MB ± 1% ~ (p=0.432 n=50+50) SSA 1.54GB ± 0% 1.54GB ± 0% +0.21% (p=0.000 n=50+50) Flate 23.6MB ± 1% 23.6MB ± 1% ~ (p=0.153 n=43+46) GoParser 35.1MB ± 1% 35.1MB ± 2% ~ (p=0.202 n=50+50) Reflect 84.7MB ± 1% 84.7MB ± 1% ~ (p=0.333 n=48+49) Tar 34.5MB ± 1% 34.5MB ± 1% ~ (p=0.406 n=46+49) XML 44.3MB ± 2% 44.2MB ± 3% ~ (p=0.981 n=50+50) LinkCompiler 131MB ± 0% 128MB ± 0% -2.74% (p=0.000 n=50+50) ExternalLinkCompiler 120MB ± 0% 120MB ± 0% +0.01% (p=0.007 n=50+50) LinkWithoutDebugCompiler 77.3MB ± 0% 77.3MB ± 0% -0.02% (p=0.000 n=50+50) [Geo mean] 69.3MB 69.1MB -0.22% file before after Δ % addr2line 4104220 4043684 -60536 -1.475% api 5342502 5249678 -92824 -1.737% asm 4973785 4858257 -115528 -2.323% buildid 2667844 2625660 -42184 -1.581% cgo 4686849 4616313 -70536 -1.505% compile 23667431 23268406 -399025 -1.686% cover 4959676 4874108 -85568 -1.725% dist 3515934 3450422 -65512 -1.863% doc 3995581 3925469 -70112 -1.755% fix 3379202 3318522 -60680 -1.796% link 6743249 6629913 -113336 -1.681% nm 4047529 3991777 -55752 -1.377% objdump 4456151 4388151 -68000 -1.526% pack 2435040 2398072 -36968 -1.518% pprof 13804080 13565808 -238272 -1.726% test2json 2690043 2645987 -44056 -1.638% trace 10418492 10232716 -185776 -1.783% vet 7258259 7121259 -137000 -1.888% total 113145867 111204202 -1941665 -1.716% The situation on linux/arm64 is as follow: name old time/op new time/op delta Template 280ms ± 1% 282ms ± 1% +0.75% (p=0.000 n=46+48) Unicode 124ms ± 2% 124ms ± 2% +0.37% (p=0.045 n=50+50) GoTypes 1.69s ± 1% 1.70s ± 1% +0.56% (p=0.000 n=49+50) Compiler 122ms ± 1% 123ms ± 1% +0.93% (p=0.000 n=50+50) SSA 12.6s ± 1% 12.7s ± 0% +0.72% (p=0.000 n=50+50) Flate 170ms ± 1% 172ms ± 1% +0.97% (p=0.000 n=49+49) GoParser 262ms ± 1% 263ms ± 1% +0.39% (p=0.000 n=49+48) Reflect 639ms ± 1% 650ms ± 1% +1.63% (p=0.000 n=49+49) Tar 243ms ± 1% 245ms ± 1% +0.82% (p=0.000 n=50+50) XML 324ms ± 1% 327ms ± 1% +0.72% (p=0.000 n=50+49) LinkCompiler 597ms ± 1% 596ms ± 1% -0.27% (p=0.001 n=48+47) ExternalLinkCompiler 1.90s ± 1% 1.88s ± 1% -1.00% (p=0.000 n=50+50) LinkWithoutDebugCompiler 364ms ± 1% 363ms ± 1% ~ (p=0.220 n=49+50) [Geo mean] 485ms 488ms +0.49% name old alloc/op new alloc/op delta Template 38.7MB ± 0% 38.8MB ± 1% ~ (p=0.093 n=43+49) Unicode 28.4MB ± 0% 28.4MB ± 0% +0.03% (p=0.000 n=49+45) GoTypes 169MB ± 1% 169MB ± 1% +0.23% (p=0.010 n=50+50) Compiler 23.2MB ± 1% 23.2MB ± 1% +0.11% (p=0.000 n=40+44) SSA 1.54GB ± 0% 1.55GB ± 0% +0.45% (p=0.000 n=47+49) Flate 23.8MB ± 2% 23.8MB ± 1% ~ (p=0.543 n=50+50) GoParser 35.3MB ± 1% 35.4MB ± 1% ~ (p=0.792 n=50+50) Reflect 85.2MB ± 1% 85.2MB ± 0% ~ (p=0.055 n=50+47) Tar 34.5MB ± 1% 34.5MB ± 1% +0.06% (p=0.015 n=50+50) XML 43.8MB ± 2% 43.9MB ± 2% +0.19% (p=0.000 n=48+48) LinkCompiler 137MB ± 0% 136MB ± 0% -0.92% (p=0.000 n=50+50) ExternalLinkCompiler 127MB ± 0% 127MB ± 0% ~ (p=0.516 n=50+50) LinkWithoutDebugCompiler 84.0MB ± 0% 84.0MB ± 0% ~ (p=0.057 n=50+50) [Geo mean] 70.4MB 70.4MB +0.01% file before after Δ % addr2line 4021557 4002933 -18624 -0.463% api 5127847 5028503 -99344 -1.937% asm 5034716 4936836 -97880 -1.944% buildid 2608118 2594094 -14024 -0.538% cgo 4488592 4398320 -90272 -2.011% compile 22501129 22213592 -287537 -1.278% cover 4742301 4713573 -28728 -0.606% dist 3388071 3365311 -22760 -0.672% doc 3802250 3776082 -26168 -0.688% fix 3306147 3216939 -89208 -2.698% link 6404483 6363699 -40784 -0.637% nm 3941026 3921930 -19096 -0.485% objdump 4383330 4295122 -88208 -2.012% pack 2404547 2389515 -15032 -0.625% pprof 12996234 12856818 -139416 -1.073% test2json 2668500 2586788 -81712 -3.062% trace 9816276 9609580 -206696 -2.106% vet 6900682 6787338 -113344 -1.643% total 108535806 107056973 -1478833 -1.363% Change-Id: Iaec1cdcaacca8025e9babb0fb8a532fddb70c87d Reviewed-on: https://go-review.googlesource.com/c/go/+/255239 Reviewed-by: eric fang <eric.fang@arm.com> Reviewed-by: Keith Randall <khr@golang.org> Trust: eric fang <eric.fang@arm.com>	2021-03-16 02:44:54 +00:00
Josh Bleecher Snyder	43d5f213e2	cmd/compile: optimize multi-register shifts on amd64 amd64 can shift in bits from another register instead of filling with 0/1. This pattern is helpful when implementing 128 bit shifts or arbitrary length shifts. In the standard library, it shows up in pure Go math/big. Benchmarks results on amd64 with -tags=math_big_pure_go. name old time/op new time/op delta NonZeroShifts/1/shrVU-8 4.45ns ± 3% 4.39ns ± 1% -1.28% (p=0.000 n=30+27) NonZeroShifts/1/shlVU-8 4.13ns ± 4% 4.10ns ± 2% ~ (p=0.254 n=29+28) NonZeroShifts/2/shrVU-8 5.55ns ± 1% 5.63ns ± 2% +1.42% (p=0.000 n=28+29) NonZeroShifts/2/shlVU-8 5.70ns ± 2% 5.14ns ± 1% -9.82% (p=0.000 n=29+28) NonZeroShifts/3/shrVU-8 6.79ns ± 2% 6.35ns ± 2% -6.46% (p=0.000 n=28+29) NonZeroShifts/3/shlVU-8 6.69ns ± 1% 6.25ns ± 1% -6.60% (p=0.000 n=28+27) NonZeroShifts/4/shrVU-8 7.79ns ± 2% 7.06ns ± 2% -9.48% (p=0.000 n=30+30) NonZeroShifts/4/shlVU-8 7.82ns ± 1% 7.24ns ± 1% -7.37% (p=0.000 n=28+29) NonZeroShifts/5/shrVU-8 8.90ns ± 3% 7.93ns ± 1% -10.84% (p=0.000 n=29+26) NonZeroShifts/5/shlVU-8 8.68ns ± 1% 7.92ns ± 1% -8.76% (p=0.000 n=29+29) NonZeroShifts/10/shrVU-8 14.4ns ± 1% 12.3ns ± 2% -14.79% (p=0.000 n=28+29) NonZeroShifts/10/shlVU-8 14.1ns ± 1% 11.9ns ± 2% -15.55% (p=0.000 n=28+27) NonZeroShifts/100/shrVU-8 118ns ± 1% 96ns ± 3% -18.82% (p=0.000 n=30+29) NonZeroShifts/100/shlVU-8 120ns ± 2% 98ns ± 2% -18.46% (p=0.000 n=29+28) NonZeroShifts/1000/shrVU-8 1.10µs ± 1% 0.88µs ± 2% -19.63% (p=0.000 n=29+30) NonZeroShifts/1000/shlVU-8 1.10µs ± 2% 0.88µs ± 2% -20.28% (p=0.000 n=29+28) NonZeroShifts/10000/shrVU-8 10.9µs ± 1% 8.7µs ± 1% -19.78% (p=0.000 n=28+27) NonZeroShifts/10000/shlVU-8 10.9µs ± 2% 8.7µs ± 1% -19.64% (p=0.000 n=29+27) NonZeroShifts/100000/shrVU-8 111µs ± 2% 90µs ± 2% -19.39% (p=0.000 n=28+29) NonZeroShifts/100000/shlVU-8 113µs ± 2% 90µs ± 2% -20.43% (p=0.000 n=30+27) The assembly version is still faster, unfortunately, but the gap is narrowing. Speedup from pure Go to assembly: name old time/op new time/op delta NonZeroShifts/1/shrVU-8 4.39ns ± 1% 3.45ns ± 2% -21.36% (p=0.000 n=27+29) NonZeroShifts/1/shlVU-8 4.10ns ± 2% 3.47ns ± 3% -15.42% (p=0.000 n=28+30) NonZeroShifts/2/shrVU-8 5.63ns ± 2% 3.97ns ± 0% -29.40% (p=0.000 n=29+25) NonZeroShifts/2/shlVU-8 5.14ns ± 1% 3.77ns ± 2% -26.65% (p=0.000 n=28+26) NonZeroShifts/3/shrVU-8 6.35ns ± 2% 4.79ns ± 2% -24.52% (p=0.000 n=29+29) NonZeroShifts/3/shlVU-8 6.25ns ± 1% 4.42ns ± 1% -29.29% (p=0.000 n=27+26) NonZeroShifts/4/shrVU-8 7.06ns ± 2% 5.64ns ± 1% -20.05% (p=0.000 n=30+29) NonZeroShifts/4/shlVU-8 7.24ns ± 1% 5.34ns ± 2% -26.23% (p=0.000 n=29+29) NonZeroShifts/5/shrVU-8 7.93ns ± 1% 6.56ns ± 2% -17.26% (p=0.000 n=26+30) NonZeroShifts/5/shlVU-8 7.92ns ± 1% 6.27ns ± 1% -20.79% (p=0.000 n=29+25) NonZeroShifts/10/shrVU-8 12.3ns ± 2% 10.2ns ± 2% -17.21% (p=0.000 n=29+29) NonZeroShifts/10/shlVU-8 11.9ns ± 2% 10.5ns ± 2% -12.45% (p=0.000 n=27+29) NonZeroShifts/100/shrVU-8 95.9ns ± 3% 77.7ns ± 1% -19.00% (p=0.000 n=29+30) NonZeroShifts/100/shlVU-8 97.5ns ± 2% 66.8ns ± 2% -31.47% (p=0.000 n=28+30) NonZeroShifts/1000/shrVU-8 884ns ± 2% 705ns ± 1% -20.17% (p=0.000 n=30+28) NonZeroShifts/1000/shlVU-8 880ns ± 2% 590ns ± 1% -32.96% (p=0.000 n=28+25) NonZeroShifts/10000/shrVU-8 8.74µs ± 1% 7.34µs ± 3% -15.94% (p=0.000 n=27+30) NonZeroShifts/10000/shlVU-8 8.73µs ± 1% 6.00µs ± 1% -31.25% (p=0.000 n=27+28) NonZeroShifts/100000/shrVU-8 89.6µs ± 2% 75.5µs ± 2% -15.80% (p=0.000 n=29+29) NonZeroShifts/100000/shlVU-8 89.6µs ± 2% 68.0µs ± 3% -24.09% (p=0.000 n=27+30) Change-Id: I18f58d8f5513d737d9cdf09b8f9d14011ffe3958 Reviewed-on: https://go-review.googlesource.com/c/go/+/297050 Trust: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2021-03-11 19:11:46 +00:00
Paul E. Murphy	48ddf70128	cmd/asm,cmd/compile: support 5 operand RLWNM/RLWMI on ppc64 These instructions are actually 5 argument opcodes as specified by the ISA. Prior to this patch, the MB and ME arguments were merged into a single bitmask operand to workaround the limitations of the ppc64 assembler backend. This limitation no longer exists. Thus, we can pass operands for these opcodes without having to merge the MB and ME arguments in the assembler frontend or compiler backend. Likewise, support for 4 operand variants is unchanged. Change-Id: Ib086774f3581edeaadfd2190d652aaaa8a90daeb Reviewed-on: https://go-review.googlesource.com/c/go/+/298750 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org> Trust: Carlos Eduardo Seo <carlos.seo@linaro.org>	2021-03-09 20:35:41 +00:00
fanzha02	2b50ab2aee	cmd/compile: optimize single-precision floating point square root Add generic rule to rewrite the single-precision square root expression with one single-precision instruction. The optimization will reduce two times of precision converting between double-precision and single-precision. On arm64 flatform. previous: FCVTSD F0, F0 FSQRTD F0, F0 FCVTDS F0, F0 optimized: FSQRTS S0, S0 And this patch adds the test case to check the correctness. This patch refers to CL 241877, contributed by Alice Xu (dianhong.xu@arm.com) Change-Id: I6de5d02281c693017ac4bd4c10963dd55989bd7e Reviewed-on: https://go-review.googlesource.com/c/go/+/276873 Trust: fannie zhang <Fannie.Zhang@arm.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2021-03-02 06:38:07 +00:00
Egon Elbre	3ee32439b5	cmd/compile: ARM64 optimize []float64 and []float32 access Optimize load and store to []float64 and []float32. Previously it used LSL instead of shifted register indexed load/store. Before: LSL $3, R0, R0 FMOVD F0, (R1)(R0) After: FMOVD F0, (R1)(R0<<3) Fixes #42798 Change-Id: I0c0912140c3dce5aa6abc27097c0eb93833cc589 Reviewed-on: https://go-review.googlesource.com/c/go/+/273706 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Trust: Giovanni Bajo <rasky@develer.com>	2021-02-24 19:49:08 +00:00
Alejandro García Montoro	bf48163e8f	cmd/compile: add rule to coalesce writes The code generated when storing eight bytes loaded from memory created a series of small writes instead of a single, large one. The specific pattern of instructions generated stored 1 byte, then 2 bytes, then 4 bytes, and finally 1 byte. The new rules match this specific pattern both for amd64 and for s390x, and convert it into a single instruction to store the 8 bytes. arm64 and ppc64le already generated the right code, but the new codegen test covers also those architectures. Fixes #41663 Change-Id: Ifb9b464be2d59c2ed5034acf7b9c3e473f344030 Reviewed-on: https://go-review.googlesource.com/c/go/+/280456 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Trust: Josh Bleecher Snyder <josharian@gmail.com> Trust: Jason A. Donenfeld <Jason@zx2c4.com> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Go Bot <gobot@golang.org>	2021-02-24 19:25:49 +00:00
Keith Randall	42cd40ee74	cmd/compile: improve bit test code Some bit test instruction generation stopped triggering after the change to addressing modes. I suspect this was just because ANDQload was being generated before the rewrite rules could discover the BTQ. Fix that by decomposing the ANDQload when it is surrounded by a TESTQ (thus re-enabling the BTQ rules). Fixes #44228 Change-Id: I489b4a5a7eb01c65fc8db0753f8cec54097cadb2 Reviewed-on: https://go-review.googlesource.com/c/go/+/291749 Trust: Keith Randall <khr@golang.org> Trust: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2021-02-23 18:02:48 +00:00
Cherry Zhang	401d7e5a24	[dev.regabi] cmd/compile: reserve X15 as zero register on AMD64 In ABIInternal, reserve X15 as constant zero, and use it to zero memory. (Maybe there can be more use of it?) The register is zeroed when transition to ABIInternal from ABI0. Caveat: using X15 generates longer instructions than using X0. Maybe we want to use X0? Change-Id: I12d5ee92a01fc0b59dad4e5ab023ac71bc2a8b7d Reviewed-on: https://go-review.googlesource.com/c/go/+/288093 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2021-02-03 22:44:53 +00:00
David Chase	9a19481acb	[dev.regabi] cmd/compile: make ordering for InvertFlags more stable Current many architectures use a rule along the lines of // Canonicalize the order of arguments to comparisons - helps with CSE. ((CMP\|CMPW) x y) && x.ID > y.ID => (InvertFlags ((CMP\|CMPW) y x)) to normalize comparisons as much as possible for CSE. Replace the ID comparison with something less variable across compiler changes. This helps avoid spurious failures in some of the codegen-comparison tests (though the current choice of comparison is sensitive to Op ordering). Two tests changed to accommodate modified instruction choice. Change-Id: Ib35f450bd2bae9d4f9f7838ceaf7ec682bcf1e1a Reviewed-on: https://go-review.googlesource.com/c/go/+/280155 Trust: David Chase <drchase@google.com> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2021-01-13 02:40:43 +00:00
Lynn Boger	0ae3b7cb74	cmd/compile: fix rules regression with shifts on PPC64 Some rules for PPC64 were checking for a case where a shift followed by an 'and' of a mask could be lowered, depending on the format of the mask. The function to verify if the mask was valid for this purpose was not checking if the mask was 0 which we don't want to allow. This case can happen if previous optimizations resulted in that mask value. This fixes isPPC64ValidShiftMask to check for a mask of 0 and return false. This also adds a codegen testcase to verify it doesn't try to match the rules in the future. Fixes #42610 Change-Id: I565d94e88495f51321ab365d6388c01e791b4dbb Reviewed-on: https://go-review.googlesource.com/c/go/+/270358 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Paul Murphy <murp@ibm.com> Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org> Trust: Lynn Boger <laboger@linux.vnet.ibm.com>	2020-11-17 13:20:20 +00:00
Alberto Donizetti	7307e86afd	test/codegen: go fmt Fixes #42445 Change-Id: I9653ef094dba2a1ac2e3daaa98279d10df17a2a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/268257 Trust: Alberto Donizetti <alb.donizetti@gmail.com> Trust: Martin Möhrmann <moehrmann@google.com> Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2020-11-08 12:19:55 +00:00
Michael Munday	854e892ce1	cmd/compile: optimize shift pairs and masks on s390x Optimize combinations of left and right shifts by a constant value into a 'rotate then insert selected bits [into zero]' instruction. Use the same instruction for contiguous masks since it has some benefits over 'and immediate' (not restricted to 32-bits, does not overwrite source register). To keep the complexity of this change under control I've only implemented 64 bit operations for now. There are a lot more optimizations that can be done with this instruction family. However, since their function overlaps with other instructions we need to be somewhat careful not to break existing optimization rules by creating optimization dead ends. This is particularly true of the load/store merging rules which contain lots of zero extensions and shifts. This CL does interfere with the store merging rules when an operand is shifted left before it is stored: binary.BigEndian.PutUint64(b, x << 1) This is unfortunate but it's not critical and somewhat complex so I plan to fix that in a follow up CL. file before after Δ % addr2line 4117446 4117282 -164 -0.004% api 4945184 4942752 -2432 -0.049% asm 4998079 4991891 -6188 -0.124% buildid 2685158 2684074 -1084 -0.040% cgo 4553732 4553394 -338 -0.007% compile 19294446 19245070 -49376 -0.256% cover 4897105 4891319 -5786 -0.118% dist 3544389 3542785 -1604 -0.045% doc 3926795 3927617 +822 +0.021% fix 3302958 3293868 -9090 -0.275% link 6546274 6543456 -2818 -0.043% nm 4102021 4100825 -1196 -0.029% objdump 4542431 4548483 +6052 +0.133% pack 2482465 2416389 -66076 -2.662% pprof 13366541 13363915 -2626 -0.020% test2json 2829007 2761515 -67492 -2.386% trace 10216164 10219684 +3520 +0.034% vet 6773956 6773572 -384 -0.006% total 107124151 106917891 -206260 -0.193% Change-Id: I7591cce41e06867ba10a745daae9333513062746 Reviewed-on: https://go-review.googlesource.com/c/go/+/233317 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Trust: Michael Munday <mike.munday@ibm.com>	2020-11-06 10:45:31 +00:00
Cherry Zhang	0387bedadf	cmd/compile: remove racefuncenterfp when it is not needed We already remove racefuncenter and racefuncexit if they are not needed (i.e. the function doesn't have any other race calls). racefuncenterfp is like racefuncenter but used on LR machines. Remove unnecessary racefuncenterfp as well. Change-Id: I65edb00e19c6d9ab55a204cbbb93e9fb710559f1 Reviewed-on: https://go-review.googlesource.com/c/go/+/267099 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2020-11-02 03:03:16 +00:00

1 2 3 4 5 ...

413 Commits