mirror of https://github.com/golang/go.git
65 Commits
1a6281d950
[release-branch.go1.16] cmd/compile: ensure constant shift amounts are in range for arm
Ensure constant shift amounts are in the range [0-31]. Out-of-range shift amounts
lead to miscompilation. They occur when lowering 64-bit shifts: we take an
in-range shift s in [0-63] and calculate s-32 and 32-s, both of which might be
outside [0-31].
The constant shift operations themselves still work, but their shift
amounts get copied unchanged to operations like ORshiftLL, which use only
the low 5 bits. That turns an operation like <<100, which unconditionally
produces 0, into <<4, which doesn't.
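The failure mode described above can be reproduced in plain Go. This is an illustrative sketch, not the compiler's code: shl64 decomposes a 64-bit left shift into 32-bit operations roughly the way a 32-bit target lowers it (producing the s-32 and 32-s amounts), and shlLow5 is a hypothetical helper that models an operand shifter which only looks at the low 5 bits of its amount, turning <<100 into <<4.

```go
package main

import "fmt"

// shl64 shifts the 64-bit value hi:lo left by s (0 <= s <= 63) using only
// 32-bit operations, roughly the way a 64-bit shift is decomposed on a
// 32-bit target. Go defines a uint32 shift by 32 or more to produce 0, and
// the uint32 expressions s-32 and 32-s wrap to huge values when they would
// be negative, so the terms that do not apply drop out.
func shl64(hi, lo, s uint32) (uint32, uint32) {
	newHi := hi<<s | lo<<(s-32) | lo>>(32-s)
	newLo := lo << s
	return newHi, newLo
}

// shlLow5 is a hypothetical model of an instruction operand shifter (as
// used by forms like ORshiftLL) that only looks at the low 5 bits of its
// shift amount.
func shlLow5(x, s uint32) uint32 {
	return x << (s & 31)
}

func main() {
	x, s := uint32(1), uint32(100)
	fmt.Println(x << s)        // Go semantics: 0
	fmt.Println(shlLow5(x, s)) // low 5 bits only: 1<<4 = 16

	hi, lo := shl64(0x1, 0x80000001, 4)
	fmt.Printf("%#x %#x\n", hi, lo) // 0x18 0x10
}
```

Go's own shift semantics already give 0 for oversized amounts; the bug appears only when an out-of-range constant reaches an instruction form that silently masks it.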
Fixes #48478
Change-Id: I87363ef2b4ceaf3b2e316426064626efdfbb8ee3
Reviewed-on: https://go-review.googlesource.com/c/go/+/350969
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
(cherry picked from commit
3c85e995ef
cmd/compile: extend ssa.AuxCall to closure and interface calls
Also introduce helper methods.
Change-Id: I11a744ed002bae0ca9ebabba3206e1c14147e03d
Reviewed-on: https://go-review.googlesource.com/c/go/+/239080
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
b4ef49e527
cmd/compile: introduce special ssa Aux type for calls
This is a prerequisite to moving call expansion later into SSA, and probably a good idea anyway. Passes tests.
This is the first minimal CL that does a 1-for-1 substitution of *ssa.AuxCall for *obj.LSym. The next step (next CL) is to make this change for all calls so that additional information can be stored in AuxCall.
Change-Id: Ia3a7715648fd9fb1a176850767a726e6f5b959eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/237680
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
40ef1faabc
cmd/compile: redo flag constant ops for arm
Encode the flag results in an auxint field instead of having one opcode per flag state. This helps us handle the new *noov branches in a unified manner.
This is only for arm; arm64 is in a subsequent CL. We could extend this to other architectures as well, although it would only be cleanup, with no behavioral change.
Update #39505
Change-Id: Ia46cea596faad540d1496c5915ab1274571543f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/238077
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
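The idea of encoding a whole flag state in one integer can be sketched as follows. The flags type, its bit layout and the helper names are hypothetical illustrations, not the compiler's actual representation: the four ARM condition flags N, Z, C and V are packed into a small integer, computed from a constant comparison, and each branch condition becomes a predicate over that single encoding.

```go
package main

import "fmt"

// flags is a hypothetical packed encoding of the ARM NZCV condition flags.
// A whole flag state fits in a small integer, so it can live in an auxint
// field rather than needing one opcode per flag state.
type flags uint8

const (
	flagN flags = 1 << iota // result negative
	flagZ                   // result zero
	flagC                   // carry out (for a subtraction: no borrow)
	flagV                   // signed overflow
)

// subFlags32 computes the flags that CMP x, y would set for constants x, y.
func subFlags32(x, y int32) flags {
	r := x - y // wraps on overflow, like the hardware
	var f flags
	if r < 0 {
		f |= flagN
	}
	if r == 0 {
		f |= flagZ
	}
	if uint32(x) >= uint32(y) {
		f |= flagC
	}
	if (x^y)&(x^r) < 0 {
		f |= flagV
	}
	return f
}

func (f flags) has(b flags) bool { return f&b != 0 }

// Each branch condition is a predicate over the single encoding.
func (f flags) eq() bool  { return f.has(flagZ) }                 // EQ: Z set
func (f flags) lt() bool  { return f.has(flagN) != f.has(flagV) } // LT: N != V
func (f flags) ult() bool { return !f.has(flagC) }                // LO: C clear

func main() {
	f := subFlags32(-1, 1)
	fmt.Printf("%04b eq=%v lt=%v ult=%v\n", f, f.eq(), f.lt(), f.ult()) // 0101 eq=false lt=true ult=false
}
```

With one encoding, a new condition such as a "no overflow" variant only needs another predicate rather than another family of opcodes.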
e031318ca6
cmd/compile: ARM comparisons with 0 incorrect on overflow
Some ARM rewriting rules convert 'comparing to zero' conditions of if statements to a simplified version utilizing CMN and CMP instructions to branch over condition flags, in order to save one Add or Sub calculation. Such optimizations lead to wrong branching in case an overflow/underflow occurs when executing CMN or CMP.
Fix the issue by introducing new block opcodes that don't honor the overflow/underflow flag (Block-Op: meaning, ARM condition codes):

1. LTnoov: less than, MI
2. GEnoov: greater than or equal, PL
3. LEnoov: less than or equal, MI || EQ
4. GTnoov: greater than, NEQ & PL

The patch also adds a few test cases to cover scenarios that are specific to ARM and fine-tunes the code generation tests for 'x-const'.
For more details please refer to the previous fix on 64-bit ARM: https://go-review.googlesource.com/c/go/+/233097
Go1 perf ('old' is the non-optimized version, i.e. with all of the concerned rewriting rules removed):

| name | old time/op | new time/op | delta |
|---|---|---|---|
| BinaryTree17-8 | 7.73s ± 0% | 7.81s ± 0% | +0.97% (p=0.000 n=7+8) |
| Fannkuch11-8 | 7.06s ± 0% | 7.00s ± 0% | -0.83% (p=0.000 n=8+8) |
| FmtFprintfEmpty-8 | 181ns ± 1% | 183ns ± 1% | +1.31% (p=0.001 n=8+8) |
| FmtFprintfString-8 | 319ns ± 1% | 325ns ± 2% | +1.71% (p=0.009 n=7+8) |
| FmtFprintfInt-8 | 358ns ± 1% | 359ns ± 1% | ~ (p=0.293 n=7+7) |
| FmtFprintfIntInt-8 | 459ns ± 3% | 456ns ± 1% | ~ (p=0.869 n=8+8) |
| FmtFprintfPrefixedInt-8 | 535ns ± 4% | 538ns ± 4% | ~ (p=0.572 n=8+8) |
| FmtFprintfFloat-8 | 1.01µs ± 2% | 1.01µs ± 2% | ~ (p=0.625 n=8+8) |
| FmtManyArgs-8 | 1.93µs ± 2% | 1.93µs ± 1% | ~ (p=0.979 n=8+7) |
| GobDecode-8 | 16.1ms ± 1% | 16.5ms ± 1% | +2.32% (p=0.000 n=8+8) |
| GobEncode-8 | 15.9ms ± 0% | 15.8ms ± 1% | -1.00% (p=0.000 n=8+7) |
| Gzip-8 | 690ms ± 1% | 670ms ± 0% | -2.90% (p=0.000 n=8+8) |
| Gunzip-8 | 109ms ± 1% | 109ms ± 1% | ~ (p=0.694 n=7+8) |
| HTTPClientServer-8 | 149µs ± 3% | 146µs ± 2% | -1.70% (p=0.028 n=8+8) |
| JSONEncode-8 | 50.5ms ± 1% | 49.2ms ± 0% | -2.60% (p=0.001 n=7+7) |
| JSONDecode-8 | 135ms ± 2% | 137ms ± 1% | ~ (p=0.054 n=8+7) |
| Mandelbrot200-8 | 951ms ± 0% | 952ms ± 0% | ~ (p=0.852 n=6+8) |
| GoParse-8 | 9.47ms ± 1% | 9.66ms ± 1% | +2.01% (p=0.000 n=8+8) |
| RegexpMatchEasy0_32-8 | 288ns ± 2% | 277ns ± 2% | -3.61% (p=0.000 n=8+8) |
| RegexpMatchEasy0_1K-8 | 1.66µs ± 1% | 1.69µs ± 2% | +2.21% (p=0.001 n=7+7) |
| RegexpMatchEasy1_32-8 | 334ns ± 1% | 305ns ± 2% | -8.86% (p=0.000 n=8+8) |
| RegexpMatchEasy1_1K-8 | 2.14µs ± 2% | 2.15µs ± 0% | ~ (p=0.099 n=8+8) |
| RegexpMatchMedium_32-8 | 13.3ns ± 1% | 13.3ns ± 0% | ~ (p=1.000 n=7+7) |
| RegexpMatchMedium_1K-8 | 81.1µs ± 3% | 80.7µs ± 1% | ~ (p=0.955 n=7+8) |
| RegexpMatchHard_32-8 | 4.26µs ± 0% | 4.26µs ± 0% | ~ (p=0.933 n=7+8) |
| RegexpMatchHard_1K-8 | 124µs ± 0% | 124µs ± 0% | +0.31% (p=0.000 n=8+8) |
| Revcomp-8 | 14.7ms ± 2% | 14.5ms ± 1% | -1.66% (p=0.003 n=8+8) |
| Template-8 | 197ms ± 2% | 200ms ± 3% | +1.62% (p=0.021 n=8+8) |
| TimeParse-8 | 1.33µs ± 1% | 1.30µs ± 1% | -1.86% (p=0.002 n=8+8) |
| TimeFormat-8 | 3.04µs ± 1% | 3.02µs ± 0% | -0.60% (p=0.000 n=8+8) |

| name | old speed | new speed | delta |
|---|---|---|---|
| GobDecode-8 | 47.6MB/s ± 1% | 46.5MB/s ± 1% | -2.28% (p=0.000 n=8+8) |
| GobEncode-8 | 48.1MB/s ± 0% | 48.6MB/s ± 1% | +1.02% (p=0.000 n=8+7) |
| Gzip-8 | 28.1MB/s ± 1% | 29.0MB/s ± 0% | +2.97% (p=0.000 n=8+8) |
| Gunzip-8 | 178MB/s ± 1% | 179MB/s ± 2% | ~ (p=0.694 n=7+8) |
| JSONEncode-8 | 38.4MB/s ± 1% | 39.4MB/s ± 0% | +2.67% (p=0.001 n=7+7) |
| JSONDecode-8 | 14.3MB/s ± 2% | 14.2MB/s ± 1% | -0.81% (p=0.043 n=8+7) |
| GoParse-8 | 6.12MB/s ± 1% | 5.99MB/s ± 1% | -2.00% (p=0.000 n=8+8) |
| RegexpMatchEasy0_32-8 | 111MB/s ± 2% | 115MB/s ± 2% | +3.77% (p=0.000 n=8+8) |
| RegexpMatchEasy0_1K-8 | 618MB/s ± 1% | 604MB/s ± 2% | -2.16% (p=0.001 n=7+7) |
| RegexpMatchEasy1_32-8 | 95.7MB/s ± 1% | 105.1MB/s ± 2% | +9.76% (p=0.000 n=8+8) |
| RegexpMatchEasy1_1K-8 | 479MB/s ± 2% | 477MB/s ± 0% | ~ (p=0.105 n=8+8) |
| RegexpMatchMedium_32-8 | 75.2MB/s ± 1% | 75.2MB/s ± 0% | ~ (p=0.247 n=7+7) |
| RegexpMatchMedium_1K-8 | 12.6MB/s ± 3% | 12.7MB/s ± 1% | ~ (p=0.538 n=7+8) |
| RegexpMatchHard_32-8 | 7.52MB/s ± 0% | 7.52MB/s ± 0% | ~ (p=0.968 n=7+8) |
| RegexpMatchHard_1K-8 | 8.26MB/s ± 0% | 8.24MB/s ± 0% | -0.30% (p=0.001 n=8+8) |
| Revcomp-8 | 173MB/s ± 2% | 176MB/s ± 1% | +1.68% (p=0.003 n=8+8) |
| Template-8 | 9.85MB/s ± 2% | 9.69MB/s ± 3% | -1.59% (p=0.021 n=8+8) |

Fixes #39303
Updates #38740
Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36
Reviewed-on: https://go-review.googlesource.com/c/go/+/236637
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
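The overflow problem can be reproduced with a small flag simulation. In this sketch, cmnFlags is a hypothetical helper (not compiler code) that computes the N and V flags a CMN x, y would set. It compares the signed LT condition (N != V) with MI (N alone), which the LTnoov block uses as listed above, against what the Go expression x+c < 0 actually means under wraparound arithmetic.

```go
package main

import (
	"fmt"
	"math"
)

// cmnFlags returns the N and V flags that an ARM CMN x, y (which computes
// x + y and discards the result) would set.
func cmnFlags(x, y int32) (n, v bool) {
	r := x + y // wraps like the hardware adder
	n = r < 0
	v = (x^r)&(y^r) < 0 // signed overflow: both operands' signs differ from r
	return n, v
}

func main() {
	x, c := int32(math.MaxInt32), int32(1)

	// What the Go program asks for: the wrapped sum is MinInt32, so true.
	want := x+c < 0

	n, v := cmnFlags(x, c)
	lt := n != v // signed LT condition: wrong here, because V is set
	mi := n      // MI condition used by LTnoov: matches Go semantics

	fmt.Println(want, lt, mi) // true false true
}
```

The LT condition answers "is x < -c mathematically", while Go asks about the sign of the wrapped sum; the two only agree when V is clear, which is why the noov blocks ignore V.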
2bad2f7eba
cmd/compile: mark PanicBounds/Extend as calls
PanicBounds and PanicExtend are lowered to runtime calls (with a non-Go ABI), but are not currently marked as calls. Since liveness analysis only emits stack maps at calls in the runtime, this means these panic call sites in the runtime won't get a stack map. These almost immediately turn into throws in the runtime, but there's still a chance they'll try to grow the stack first, which would lead to a different panic.
To fix this, mark these operations as calls.
Outside the runtime, we currently emit stack maps for everything that isn't an unsafe-point, so these panic calls get stack maps by default. However, we're about to move to emitting stack maps only at call sites, at which point this will start to matter outside the runtime as well.
I confirmed that this has no effect on anything but PCDATA/FUNCDATA in runtime and net/http.
For #36365.
Change-Id: Ic5bb463fd152cc320c815dc04cf62005261ae169
Reviewed-on: https://go-review.googlesource.com/c/go/+/230539
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2ff746d7dc
runtime: add async preemption support on ARM
This CL adds support for call injection and async preemption on ARM.
An injected call, like sigpanic, has a special frame layout. Teach traceback to handle it.
Change-Id: I887e90134fbf8a676b73c26321c50b3c4762dba4
Reviewed-on: https://go-review.googlesource.com/c/go/+/202338
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
58b031949b
cmd/compile: add fma intrinsic for arm
This change introduces an arm intrinsic that generates the FMULAD instruction for the fused-multiply-add operation on systems that support it. System support is detected via cpu.ARM.HasVFPv4.
A rewrite rule translates the generic intrinsic to FMULAD.
Updates #25819.
Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/142117
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
11d7775c9f
cmd/compile: remove some nacl SSA rules
Updates golang/go#30439
Change-Id: I7ef5301fbd650d26a37a1241ddf7ca1ccd58b89d
Reviewed-on: https://go-review.googlesource.com/c/go/+/200941
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
9c2e7e8bed
cmd/compile: allow multiple SSA block control values
Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures, however, use a variable in a register as a control value.
Up until now we have managed with a single control value per block. However, some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block.
This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL.
Passes toolstash-check -all.
Results of compilebench:

| name | old time/op | new time/op | delta |
|---|---|---|---|
| Template | 208ms ± 1% | 209ms ± 1% | ~ (p=0.289 n=20+20) |
| Unicode | 83.7ms ± 1% | 83.3ms ± 3% | -0.49% (p=0.017 n=18+18) |
| GoTypes | 748ms ± 1% | 748ms ± 0% | ~ (p=0.460 n=20+18) |
| Compiler | 3.47s ± 1% | 3.48s ± 1% | ~ (p=0.070 n=19+18) |
| SSA | 11.5s ± 1% | 11.7s ± 1% | +1.64% (p=0.000 n=19+18) |
| Flate | 130ms ± 1% | 130ms ± 1% | ~ (p=0.588 n=19+20) |
| GoParser | 160ms ± 1% | 161ms ± 1% | ~ (p=0.211 n=20+20) |
| Reflect | 465ms ± 1% | 467ms ± 1% | +0.42% (p=0.007 n=20+20) |
| Tar | 184ms ± 1% | 185ms ± 2% | ~ (p=0.087 n=18+20) |
| XML | 253ms ± 1% | 253ms ± 1% | ~ (p=0.377 n=20+18) |
| LinkCompiler | 769ms ± 2% | 774ms ± 2% | ~ (p=0.070 n=19+19) |
| ExternalLinkCompiler | 3.59s ±11% | 3.68s ± 6% | ~ (p=0.072 n=20+20) |
| LinkWithoutDebugCompiler | 446ms ± 5% | 454ms ± 3% | +1.79% (p=0.002 n=19+20) |
| StdCmd | 26.0s ± 2% | 26.0s ± 2% | ~ (p=0.799 n=20+20) |

| name | old user-time/op | new user-time/op | delta |
|---|---|---|---|
| Template | 238ms ± 5% | 240ms ± 5% | ~ (p=0.142 n=20+20) |
| Unicode | 105ms ±11% | 106ms ±10% | ~ (p=0.512 n=20+20) |
| GoTypes | 876ms ± 2% | 873ms ± 4% | ~ (p=0.647 n=20+19) |
| Compiler | 4.17s ± 2% | 4.19s ± 1% | ~ (p=0.093 n=20+18) |
| SSA | 13.9s ± 1% | 14.1s ± 1% | +1.45% (p=0.000 n=18+18) |
| Flate | 145ms ±13% | 146ms ± 5% | ~ (p=0.851 n=20+18) |
| GoParser | 185ms ± 5% | 188ms ± 7% | ~ (p=0.174 n=20+20) |
| Reflect | 534ms ± 3% | 538ms ± 2% | ~ (p=0.105 n=20+18) |
| Tar | 215ms ± 4% | 211ms ± 9% | ~ (p=0.079 n=19+20) |
| XML | 295ms ± 6% | 295ms ± 5% | ~ (p=0.968 n=20+20) |
| LinkCompiler | 832ms ± 4% | 837ms ± 7% | ~ (p=0.707 n=17+20) |
| ExternalLinkCompiler | 1.58s ± 8% | 1.60s ± 4% | ~ (p=0.296 n=20+19) |
| LinkWithoutDebugCompiler | 478ms ±12% | 489ms ±10% | ~ (p=0.429 n=20+20) |

| name | old object-bytes | new object-bytes | delta |
|---|---|---|---|
| Template | 559kB ± 0% | 559kB ± 0% | ~ (all equal) |
| Unicode | 216kB ± 0% | 216kB ± 0% | ~ (all equal) |
| GoTypes | 2.03MB ± 0% | 2.03MB ± 0% | ~ (all equal) |
| Compiler | 8.07MB ± 0% | 8.07MB ± 0% | -0.06% (p=0.000 n=20+20) |
| SSA | 27.1MB ± 0% | 27.3MB ± 0% | +0.89% (p=0.000 n=20+20) |
| Flate | 343kB ± 0% | 343kB ± 0% | ~ (all equal) |
| GoParser | 441kB ± 0% | 441kB ± 0% | ~ (all equal) |
| Reflect | 1.36MB ± 0% | 1.36MB ± 0% | ~ (all equal) |
| Tar | 487kB ± 0% | 487kB ± 0% | ~ (all equal) |
| XML | 632kB ± 0% | 632kB ± 0% | ~ (all equal) |

| name | old export-bytes | new export-bytes | delta |
|---|---|---|---|
| Template | 18.5kB ± 0% | 18.5kB ± 0% | ~ (all equal) |
| Unicode | 7.92kB ± 0% | 7.92kB ± 0% | ~ (all equal) |
| GoTypes | 35.0kB ± 0% | 35.0kB ± 0% | ~ (all equal) |
| Compiler | 109kB ± 0% | 110kB ± 0% | +0.72% (p=0.000 n=20+20) |
| SSA | 137kB ± 0% | 138kB ± 0% | +0.58% (p=0.000 n=20+20) |
| Flate | 4.89kB ± 0% | 4.89kB ± 0% | ~ (all equal) |
| GoParser | 8.49kB ± 0% | 8.49kB ± 0% | ~ (all equal) |
| Reflect | 11.4kB ± 0% | 11.4kB ± 0% | ~ (all equal) |
| Tar | 10.5kB ± 0% | 10.5kB ± 0% | ~ (all equal) |
| XML | 16.7kB ± 0% | 16.7kB ± 0% | ~ (all equal) |

| name | old text-bytes | new text-bytes | delta |
|---|---|---|---|
| HelloSize | 761kB ± 0% | 761kB ± 0% | ~ (all equal) |
| CmdGoSize | 10.8MB ± 0% | 10.8MB ± 0% | ~ (all equal) |

| name | old data-bytes | new data-bytes | delta |
|---|---|---|---|
| HelloSize | 10.7kB ± 0% | 10.7kB ± 0% | ~ (all equal) |
| CmdGoSize | 312kB ± 0% | 312kB ± 0% | ~ (all equal) |

| name | old bss-bytes | new bss-bytes | delta |
|---|---|---|---|
| HelloSize | 122kB ± 0% | 122kB ± 0% | ~ (all equal) |
| CmdGoSize | 146kB ± 0% | 146kB ± 0% | ~ (all equal) |

| name | old exe-bytes | new exe-bytes | delta |
|---|---|---|---|
| HelloSize | 1.13MB ± 0% | 1.13MB ± 0% | ~ (all equal) |
| CmdGoSize | 15.1MB ± 0% | 15.1MB ± 0% | ~ (all equal) |

Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
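The shape of the change can be illustrated with toy types. Value and Block below are hypothetical stand-ins, far simpler than the real SSA types, and the block kinds are made-up labels: a flags-based branch carries one control value (the comparison result), while a compare-and-branch carries the two compared registers as its controls.

```go
package main

import "fmt"

// Value and Block are hypothetical toy types that show the shape of the
// change: a block may carry up to two control values instead of one.
type Value struct {
	Name string
}

type Block struct {
	Kind     string    // branch kind (made-up labels in this sketch)
	Controls [2]*Value // filled in order; unused slots stay nil
}

// NumControls reports how many control values the block has.
func (b *Block) NumControls() int {
	n := 0
	for _, c := range b.Controls {
		if c != nil {
			n++
		}
	}
	return n
}

// AddControl appends a control value, refusing a third one.
func (b *Block) AddControl(v *Value) {
	n := b.NumControls()
	if n == len(b.Controls) {
		panic("block already has two control values")
	}
	b.Controls[n] = v
}

func main() {
	// Flags-based branch: one control value holding a comparison result.
	flagBranch := &Block{Kind: "LT"}
	flagBranch.AddControl(&Value{Name: "cmp"})

	// Compare-and-branch: the two registers being compared are both controls.
	cmpBranch := &Block{Kind: "CompareAndBranchLT"}
	cmpBranch.AddControl(&Value{Name: "x"})
	cmpBranch.AddControl(&Value{Name: "y"})

	fmt.Println(flagBranch.NumControls(), cmpBranch.NumControls()) // 1 2
}
```

A fixed-size array keeps the common one-control case allocation-free while capping the count at two, matching the limit described in the commit message.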
3cfd003a8a
cmd/compile: optimize ARM's math.bits.RotateLeft32
This CL optimizes math/bits.RotateLeft32 to an inline "MOVW Rx@>Ry, Rd" on ARM.
The benchmark results of math/bits show some improvements.

| name | old time/op | new time/op | delta |
|---|---|---|---|
| RotateLeft-4 | 9.42ns ± 0% | 6.91ns ± 0% | -26.66% (p=0.000 n=40+33) |
| RotateLeft8-4 | 8.79ns ± 0% | 8.79ns ± 0% | -0.04% (p=0.000 n=40+31) |
| RotateLeft16-4 | 8.79ns ± 0% | 8.79ns ± 0% | -0.04% (p=0.000 n=40+32) |
| RotateLeft32-4 | 8.16ns ± 0% | 7.54ns ± 0% | -7.68% (p=0.000 n=40+40) |
| RotateLeft64-4 | 15.7ns ± 0% | 15.7ns ± 0% | ~ (all equal) |

Updates #31265
Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712
Reviewed-on: https://go-review.googlesource.com/c/go/+/188697
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
c683ab8128
cmd/compile: optimize ARM's math.Abs
This CL optimizes math.Abs to an inline ABSD instruction on ARM.
The benchmark results of src/math/ show clear improvements: Abs itself is 56.39% faster, and the geometric mean improves by 3.98%.

| name | old time/op | new time/op | delta |
|---|---|---|---|
| Acos-4 | 181ns ± 0% | 182ns ± 0% | +0.30% (p=0.000 n=40+40) |
| Acosh-4 | 202ns ± 0% | 202ns ± 0% | ~ (all equal) |
| Asin-4 | 163ns ± 0% | 163ns ± 0% | ~ (all equal) |
| Asinh-4 | 242ns ± 0% | 242ns ± 0% | ~ (all equal) |
| Atan-4 | 120ns ± 0% | 121ns ± 0% | +0.83% (p=0.000 n=40+40) |
| Atanh-4 | 202ns ± 0% | 202ns ± 0% | ~ (all equal) |
| Atan2-4 | 173ns ± 0% | 173ns ± 0% | ~ (all equal) |
| Cbrt-4 | 1.06µs ± 0% | 1.06µs ± 0% | +0.09% (p=0.000 n=39+37) |
| Ceil-4 | 72.9ns ± 0% | 72.8ns ± 0% | ~ (p=0.237 n=40+40) |
| Copysign-4 | 13.2ns ± 0% | 13.2ns ± 0% | ~ (all equal) |
| Cos-4 | 193ns ± 0% | 183ns ± 0% | -5.18% (p=0.000 n=40+40) |
| Cosh-4 | 254ns ± 0% | 239ns ± 0% | -5.91% (p=0.000 n=40+40) |
| Erf-4 | 112ns ± 0% | 112ns ± 0% | ~ (all equal) |
| Erfc-4 | 117ns ± 0% | 117ns ± 0% | ~ (all equal) |
| Erfinv-4 | 127ns ± 0% | 127ns ± 1% | ~ (p=0.492 n=40+40) |
| Erfcinv-4 | 128ns ± 0% | 128ns ± 0% | ~ (all equal) |
| Exp-4 | 212ns ± 0% | 206ns ± 0% | -3.05% (p=0.000 n=40+40) |
| ExpGo-4 | 216ns ± 0% | 209ns ± 0% | -3.24% (p=0.000 n=40+40) |
| Expm1-4 | 142ns ± 0% | 142ns ± 0% | ~ (all equal) |
| Exp2-4 | 191ns ± 0% | 184ns ± 0% | -3.45% (p=0.000 n=40+40) |
| Exp2Go-4 | 194ns ± 0% | 187ns ± 0% | -3.61% (p=0.000 n=40+40) |
| Abs-4 | 14.4ns ± 0% | 6.3ns ± 0% | -56.39% (p=0.000 n=38+39) |
| Dim-4 | 12.6ns ± 0% | 12.6ns ± 0% | ~ (all equal) |
| Floor-4 | 49.6ns ± 0% | 49.6ns ± 0% | ~ (all equal) |
| Max-4 | 27.6ns ± 0% | 27.6ns ± 0% | ~ (all equal) |
| Min-4 | 27.0ns ± 0% | 27.0ns ± 0% | ~ (all equal) |
| Mod-4 | 349ns ± 0% | 305ns ± 1% | -12.55% (p=0.000 n=33+40) |
| Frexp-4 | 54.0ns ± 0% | 47.1ns ± 0% | -12.78% (p=0.000 n=38+38) |
| Gamma-4 | 242ns ± 0% | 234ns ± 0% | -3.16% (p=0.000 n=36+40) |
| Hypot-4 | 84.8ns ± 0% | 67.8ns ± 0% | -20.05% (p=0.000 n=31+35) |
| HypotGo-4 | 88.5ns ± 0% | 71.6ns ± 0% | -19.12% (p=0.000 n=40+38) |
| Ilogb-4 | 45.8ns ± 0% | 38.9ns ± 0% | -15.12% (p=0.000 n=40+32) |
| J0-4 | 821ns ± 0% | 802ns ± 0% | -2.33% (p=0.000 n=33+40) |
| J1-4 | 816ns ± 0% | 807ns ± 0% | -1.05% (p=0.000 n=40+29) |
| Jn-4 | 1.67µs ± 0% | 1.65µs ± 0% | -1.45% (p=0.000 n=40+39) |
| Ldexp-4 | 61.5ns ± 0% | 54.6ns ± 0% | -11.27% (p=0.000 n=40+32) |
| Lgamma-4 | 188ns ± 0% | 188ns ± 0% | ~ (all equal) |
| Log-4 | 154ns ± 0% | 147ns ± 0% | -4.78% (p=0.000 n=40+40) |
| Logb-4 | 50.9ns ± 0% | 42.7ns ± 0% | -16.11% (p=0.000 n=34+39) |
| Log1p-4 | 160ns ± 0% | 159ns ± 0% | ~ (p=0.828 n=40+40) |
| Log10-4 | 173ns ± 0% | 166ns ± 0% | -4.05% (p=0.000 n=40+40) |
| Log2-4 | 65.3ns ± 0% | 58.4ns ± 0% | -10.57% (p=0.000 n=37+37) |
| Modf-4 | 36.4ns ± 0% | 36.4ns ± 0% | ~ (all equal) |
| Nextafter32-4 | 36.4ns ± 0% | 36.4ns ± 0% | ~ (all equal) |
| Nextafter64-4 | 32.7ns ± 0% | 32.6ns ± 0% | ~ (p=0.375 n=40+40) |
| PowInt-4 | 300ns ± 0% | 277ns ± 0% | -7.78% (p=0.000 n=40+40) |
| PowFrac-4 | 676ns ± 0% | 635ns ± 0% | -6.00% (p=0.000 n=40+35) |
| Pow10Pos-4 | 17.6ns ± 0% | 17.6ns ± 0% | ~ (all equal) |
| Pow10Neg-4 | 22.0ns ± 0% | 22.0ns ± 0% | ~ (all equal) |
| Round-4 | 30.1ns ± 0% | 30.1ns ± 0% | ~ (all equal) |
| RoundToEven-4 | 38.9ns ± 0% | 38.9ns ± 0% | ~ (all equal) |
| Remainder-4 | 291ns ± 0% | 263ns ± 0% | -9.62% (p=0.000 n=40+40) |
| Signbit-4 | 11.3ns ± 0% | 11.3ns ± 0% | ~ (all equal) |
| Sin-4 | 185ns ± 0% | 185ns ± 0% | ~ (all equal) |
| Sincos-4 | 230ns ± 0% | 230ns ± 0% | ~ (all equal) |
| Sinh-4 | 253ns ± 0% | 246ns ± 0% | -2.77% (p=0.000 n=39+39) |
| SqrtIndirect-4 | 41.4ns ± 0% | 41.4ns ± 0% | ~ (all equal) |
| SqrtLatency-4 | 13.8ns ± 0% | 13.8ns ± 0% | ~ (all equal) |
| SqrtIndirectLatency-4 | 37.0ns ± 0% | 37.0ns ± 0% | ~ (p=0.632 n=40+40) |
| SqrtGoLatency-4 | 911ns ± 0% | 911ns ± 0% | +0.08% (p=0.000 n=40+40) |
| SqrtPrime-4 | 13.2µs ± 0% | 13.2µs ± 0% | +0.01% (p=0.038 n=38+40) |
| Tan-4 | 205ns ± 0% | 205ns ± 0% | ~ (all equal) |
| Tanh-4 | 264ns ± 0% | 247ns ± 0% | -6.44% (p=0.000 n=39+32) |
| Trunc-4 | 45.2ns ± 0% | 45.2ns ± 0% | ~ (all equal) |
| Y0-4 | 796ns ± 0% | 792ns ± 0% | -0.55% (p=0.000 n=35+40) |
| Y1-4 | 804ns ± 0% | 797ns ± 0% | -0.82% (p=0.000 n=24+40) |
| Yn-4 | 1.64µs ± 0% | 1.62µs ± 0% | -1.27% (p=0.000 n=40+39) |
| Float64bits-4 | 8.16ns ± 0% | 8.16ns ± 0% | +0.04% (p=0.000 n=35+40) |
| Float64frombits-4 | 10.7ns ± 0% | 10.7ns ± 0% | ~ (all equal) |
| Float32bits-4 | 7.53ns ± 0% | 7.53ns ± 0% | ~ (p=0.760 n=40+40) |
| Float32frombits-4 | 6.91ns ± 0% | 6.91ns ± 0% | -0.04% (p=0.002 n=32+38) |
| [Geo mean] | 111ns | 106ns | -3.98% |

Change-Id: I54f4fd7f5160db020b430b556bde59cc0fdb996d
Reviewed-on: https://go-review.googlesource.com/c/go/+/188678
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2c423f063b
cmd/compile,runtime: provide index information on bounds check failure
A few examples (for accessing a slice of length 3):

| Expression | Error |
|---|---|
| s[-1] | runtime error: index out of range [-1] |
| s[3] | runtime error: index out of range [3] with length 3 |
| s[-1:0] | runtime error: slice bounds out of range [-1:] |
| s[3:0] | runtime error: slice bounds out of range [3:0] |
| s[3:-1] | runtime error: slice bounds out of range [:-1] |
| s[3:4] | runtime error: slice bounds out of range [:4] with capacity 3 |
| s[0:3:4] | runtime error: slice bounds out of range [::4] with capacity 3 |

Note that in cases where there are multiple things wrong with the indexes (e.g. s[3:-1]), we report one of those errors kind of arbitrarily, currently the rightmost one.
An exhaustive set of examples is in issue30116[u].out in the CL.
The message text has the same prefix as the old message text. That leads to slightly awkward phrasing but hopefully minimizes the chance that code depending on the error text will break.
This increases the size of the go binary by 0.5% (amd64). The panic functions take arguments in registers in order to keep the size of the compiled code as small as possible.
Fixes #30116
Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/161477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
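These messages can be observed from ordinary Go code by recovering the runtime panic. In this sketch, catch is a hypothetical helper; the indexes are held in variables so that the out-of-range accesses are not rejected at compile time.

```go
package main

import "fmt"

// catch runs f and returns the text of the runtime error it panics with,
// or "" if it does not panic.
func catch(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			if err, ok := r.(error); ok {
				msg = err.Error()
			}
		}
	}()
	f()
	return ""
}

func main() {
	s := []int{1, 2, 3}
	i, j, k := 3, 0, 4 // variables, so the checks happen at run time

	fmt.Println(catch(func() { _ = s[i] }))       // index out of range [3] with length 3
	fmt.Println(catch(func() { _ = s[i:j] }))     // slice bounds out of range [3:0]
	fmt.Println(catch(func() { _ = s[0:3:k] }))   // slice bounds out of range [::4] with capacity 3
}
```

Each printed message is prefixed with "runtime error: ", matching the table above.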
fee84cc905
cmd/compile: add an optimization rule for math/bits.ReverseBytes16 on arm
This CL adds two rules to turn patterns like ((x<<8) | (x>>8)) (the type of x is uint16, "|" can also be "+" or "^") to a REV16 instruction on arm v6+.
This optimization rule can be used for math/bits.ReverseBytes16.
Benchmarks on arm v6:

| name | old time/op | new time/op | delta |
|---|---|---|---|
| ReverseBytes-32 | 2.86ns ± 0% | 2.86ns ± 0% | ~ (all equal) |
| ReverseBytes16-32 | 2.86ns ± 0% | 2.86ns ± 0% | ~ (all equal) |
| ReverseBytes32-32 | 1.29ns ± 0% | 1.29ns ± 0% | ~ (all equal) |
| ReverseBytes64-32 | 1.43ns ± 0% | 1.43ns ± 0% | ~ (all equal) |

Change-Id: I819e633c9a9d308f8e476fb0c82d73fb73dd019f
Reviewed-on: https://go-review.googlesource.com/c/go/+/159019
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
096229b2ec
cmd/compile: add missing type information for some arm/arm64 rules
Some indexed load/store rules lack type information; this CL adds it.
Change-Id: Icac315ccb83a2f5bf30b056d4667d5b59eb4e5e2
Reviewed-on: https://go-review.googlesource.com/128455
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
20102594a0
cmd/compile: intrinsify runtime.getcallerpc on all link register architectures
Add a compiler intrinsic for getcallerpc on the following architectures: arm, mips, mipsle, mips64, mips64le, ppc64, ppc64le, s390x.
Change-Id: I758f3d4742fc214b206bcd07d90408622c17dbef
Reviewed-on: https://go-review.googlesource.com/110835
Run-TryBot: Wei Xiao <Wei.Xiao@arm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
8871c930be
cmd/compile: don't lower OpConvert
Currently, each architecture lowers OpConvert to an arch-specific OpXXXconvert. This is silly because OpConvert means the same thing on all architectures and is logically a no-op that exists only to keep track of conversions to and from unsafe.Pointer. Furthermore, lowering it makes it harder to recognize in other analyses, particularly liveness analysis.
This CL eliminates the lowering of OpConvert, leaving it as the generic op until code generation time.
The main complexity here is that we still need to register-allocate OpConvert operations. Currently, each arch's lowered OpConvert specifies all GP registers in its register mask. Ideally, OpConvert wouldn't affect value homing at all, and we could just copy the home of OpConvert's source, but this can potentially home an OpConvert in a LocalSlot, which neither regalloc nor stackalloc expect. Rather than try to disentangle this assumption from regalloc and stackalloc, we continue to register-allocate OpConvert, but teach regalloc that OpConvert can be allocated to any allocatable GP register.
For #24543.
Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
caa1b4afbd
cmd/compile/internal/ssa: note zero-width Ops
Add a bool to opInfo to indicate if an Op never results in any instructions. This is a conservative approximation: some operations, like Copy, may or may not generate code depending on their arguments.
I built the list by reading each arch's ssaGenValue function. Hopefully I got them all.
Change-Id: I130b251b65f18208294e129bb7ddc3f91d57d31d
Reviewed-on: https://go-review.googlesource.com/97957
Reviewed-by: Keith Randall <khr@golang.org>
1de1f316df
runtime: buffered write barrier for arm
Updates #22460.
Change-Id: I5581df7ad553237db7df3701b117ad99e0593b78
Reviewed-on: https://go-review.googlesource.com/92698
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
1ec78d1dd1
cmd/compile: optimize ARM code with CMN/TST/TEQ
CMN/TST/TEQ have been supported since ARMv4 and can be used to simplify comparisons. This patch implements the optimization, and here are the benchmark results.

1. A special test case got an 18.21% improvement.

| name | old time/op | new time/op | delta |
|---|---|---|---|
| TSTTEQ-4 | 806µs ± 1% | 659µs ± 0% | -18.21% (p=0.000 n=20+18) |

(https://github.com/benshi001/ugo1/blob/master/tstteq_test.go)

2. There is no regression in the compilecmp benchmark.

| name | old time/op | new time/op | delta |
|---|---|---|---|
| Template | 2.31s ± 1% | 2.30s ± 1% | ~ (p=0.661 n=10+9) |
| Unicode | 1.32s ± 3% | 1.32s ± 5% | ~ (p=0.280 n=10+10) |
| GoTypes | 7.69s ± 1% | 7.65s ± 0% | -0.52% (p=0.027 n=10+8) |
| Compiler | 36.5s ± 1% | 36.4s ± 1% | ~ (p=0.546 n=9+9) |
| SSA | 85.1s ± 2% | 84.9s ± 1% | ~ (p=0.529 n=10+10) |
| Flate | 1.43s ± 2% | 1.43s ± 2% | ~ (p=0.661 n=10+9) |
| GoParser | 1.81s ± 2% | 1.81s ± 1% | ~ (p=0.796 n=10+10) |
| Reflect | 5.10s ± 2% | 5.09s ± 1% | ~ (p=0.853 n=10+10) |
| Tar | 2.47s ± 1% | 2.48s ± 1% | ~ (p=0.123 n=10+10) |
| XML | 2.59s ± 1% | 2.58s ± 1% | ~ (p=0.853 n=10+10) |
| [Geo mean] | 4.78s | 4.77s | -0.17% |

| name | old user-time/op | new user-time/op | delta |
|---|---|---|---|
| Template | 2.72s ± 3% | 2.73s ± 2% | ~ (p=0.928 n=10+10) |
| Unicode | 1.58s ± 4% | 1.60s ± 1% | ~ (p=0.087 n=10+9) |
| GoTypes | 9.41s ± 2% | 9.36s ± 1% | ~ (p=0.060 n=10+10) |
| Compiler | 44.4s ± 2% | 44.2s ± 2% | ~ (p=0.289 n=10+10) |
| SSA | 110s ± 2% | 110s ± 1% | ~ (p=0.739 n=10+10) |
| Flate | 1.67s ± 2% | 1.63s ± 3% | ~ (p=0.063 n=10+10) |
| GoParser | 2.12s ± 1% | 2.12s ± 2% | ~ (p=0.840 n=10+10) |
| Reflect | 5.94s ± 1% | 5.98s ± 1% | ~ (p=0.063 n=9+10) |
| Tar | 3.01s ± 2% | 3.02s ± 2% | ~ (p=0.584 n=10+10) |
| XML | 3.04s ± 3% | 3.02s ± 2% | ~ (p=0.696 n=10+10) |
| [Geo mean] | 5.73s | 5.72s | -0.20% |

| name | metric | old | new | delta |
|---|---|---|---|---|
| HelloSize | text-bytes | 579kB ± 0% | 579kB ± 0% | ~ (all equal) |
| HelloSize | data-bytes | 5.46kB ± 0% | 5.46kB ± 0% | ~ (all equal) |
| HelloSize | bss-bytes | 72.8kB ± 0% | 72.8kB ± 0% | ~ (all equal) |
| HelloSize | exe-bytes | 1.03MB ± 0% | 1.03MB ± 0% | ~ (all equal) |

3. There is little change in the go1 benchmark (excluding the noise).

| name | old time/op | new time/op | delta |
|---|---|---|---|
| BinaryTree17-4 | 40.3s ± 1% | 40.6s ± 1% | +0.80% (p=0.000 n=30+30) |
| Fannkuch11-4 | 24.2s ± 1% | 24.1s ± 0% | ~ (p=0.093 n=30+30) |
| FmtFprintfEmpty-4 | 834ns ± 0% | 826ns ± 0% | -0.93% (p=0.000 n=29+24) |
| FmtFprintfString-4 | 1.39µs ± 1% | 1.36µs ± 0% | -2.02% (p=0.000 n=30+30) |
| FmtFprintfInt-4 | 1.43µs ± 1% | 1.44µs ± 1% | ~ (p=0.155 n=30+29) |
| FmtFprintfIntInt-4 | 2.09µs ± 0% | 2.11µs ± 0% | +1.16% (p=0.000 n=28+30) |
| FmtFprintfPrefixedInt-4 | 2.33µs ± 1% | 2.36µs ± 0% | +1.25% (p=0.000 n=30+30) |
| FmtFprintfFloat-4 | 4.27µs ± 1% | 4.32µs ± 1% | +1.27% (p=0.000 n=30+30) |
| FmtManyArgs-4 | 8.18µs ± 0% | 8.14µs ± 0% | -0.46% (p=0.000 n=25+27) |
| GobDecode-4 | 101ms ± 1% | 101ms ± 1% | ~ (p=0.182 n=29+29) |
| GobEncode-4 | 89.6ms ± 1% | 87.8ms ± 2% | -2.02% (p=0.000 n=30+29) |
| Gzip-4 | 4.07s ± 1% | 4.08s ± 1% | ~ (p=0.173 n=30+27) |
| Gunzip-4 | 602ms ± 1% | 600ms ± 1% | -0.29% (p=0.000 n=29+28) |
| HTTPClientServer-4 | 679µs ± 4% | 683µs ± 3% | ~ (p=0.197 n=30+30) |
| JSONEncode-4 | 241ms ± 1% | 239ms ± 1% | -0.84% (p=0.000 n=30+30) |
| JSONDecode-4 | 903ms ± 1% | 882ms ± 1% | -2.33% (p=0.000 n=30+30) |
| Mandelbrot200-4 | 41.8ms ± 0% | 41.8ms ± 0% | ~ (p=0.719 n=30+30) |
| GoParse-4 | 45.5ms ± 1% | 45.8ms ± 1% | +0.52% (p=0.000 n=30+30) |
| RegexpMatchEasy0_32-4 | 1.27µs ± 1% | 1.27µs ± 0% | -0.60% (p=0.000 n=30+30) |
| RegexpMatchEasy0_1K-4 | 7.77µs ± 6% | 7.69µs ± 4% | -0.96% (p=0.040 n=30+30) |
| RegexpMatchEasy1_32-4 | 1.29µs ± 1% | 1.28µs ± 1% | -0.54% (p=0.000 n=30+30) |
| RegexpMatchEasy1_1K-4 | 10.3µs ± 6% | 10.2µs ± 3% | ~ (p=0.453 n=30+27) |
| RegexpMatchMedium_32-4 | 1.98µs ± 1% | 2.00µs ± 1% | +0.85% (p=0.000 n=30+29) |
| RegexpMatchMedium_1K-4 | 503µs ± 0% | 503µs ± 1% | ~ (p=0.752 n=30+30) |
| RegexpMatchHard_32-4 | 27.1µs ± 1% | 26.5µs ± 0% | -1.96% (p=0.000 n=30+24) |
| RegexpMatchHard_1K-4 | 809µs ± 1% | 799µs ± 1% | -1.29% (p=0.000 n=29+30) |
| Revcomp-4 | 67.3ms ± 2% | 67.2ms ± 1% | ~ (p=0.265 n=29+29) |
| Template-4 | 1.08s ± 1% | 1.07s ± 0% | -1.39% (p=0.000 n=30+22) |
| TimeParse-4 | 6.93µs ± 1% | 6.96µs ± 1% | +0.40% (p=0.005 n=30+30) |
| TimeFormat-4 | 13.3µs ± 0% | 13.3µs ± 1% | ~ (p=0.734 n=30+30) |
| [Geo mean] | 709µs | 707µs | -0.32% |

| name | old speed | new speed | delta |
|---|---|---|---|
| GobDecode-4 | 7.59MB/s ± 1% | 7.57MB/s ± 1% | ~ (p=0.145 n=29+29) |
| GobEncode-4 | 8.56MB/s ± 1% | 8.74MB/s ± 1% | +2.07% (p=0.000 n=30+29) |
| Gzip-4 | 4.76MB/s ± 1% | 4.75MB/s ± 1% | -0.25% (p=0.037 n=30+30) |
| Gunzip-4 | 32.2MB/s ± 1% | 32.3MB/s ± 1% | +0.29% (p=0.000 n=29+28) |
| JSONEncode-4 | 8.04MB/s ± 1% | 8.11MB/s ± 1% | +0.85% (p=0.000 n=30+30) |
| JSONDecode-4 | 2.15MB/s ± 1% | 2.20MB/s ± 1% | +2.29% (p=0.000 n=30+30) |
| GoParse-4 | 1.27MB/s ± 1% | 1.26MB/s ± 1% | -0.73% (p=0.000 n=30+30) |
| RegexpMatchEasy0_32-4 | 25.1MB/s ± 1% | 25.3MB/s ± 0% | +0.61% (p=0.000 n=30+30) |
| RegexpMatchEasy0_1K-4 | 131MB/s ± 6% | 133MB/s ± 4% | +1.35% (p=0.009 n=28+30) |
| RegexpMatchEasy1_32-4 | 24.9MB/s ± 1% | 25.0MB/s ± 1% | +0.54% (p=0.000 n=30+30) |
| RegexpMatchEasy1_1K-4 | 99.2MB/s ± 6% | 100.2MB/s ± 3% | ~ (p=0.448 n=30+27) |
| RegexpMatchMedium_32-4 | 503kB/s ± 1% | 500kB/s ± 0% | -0.66% (p=0.002 n=30+24) |
| RegexpMatchMedium_1K-4 | 2.04MB/s ± 0% | 2.04MB/s ± 1% | ~ (p=0.358 n=30+30) |
| RegexpMatchHard_32-4 | 1.18MB/s ± 1% | 1.20MB/s ± 1% | +1.75% (p=0.000 n=30+30) |
| RegexpMatchHard_1K-4 | 1.26MB/s ± 1% | 1.28MB/s ± 1% | +1.42% (p=0.000 n=30+30) |
| Revcomp-4 | 37.8MB/s ± 2% | 37.8MB/s ± 1% | ~ (p=0.266 n=29+29) |
| Template-4 | 1.80MB/s ± 1% | 1.82MB/s ± 1% | +1.46% (p=0.000 n=30+30) |
| [Geo mean] | 6.91MB/s | 6.96MB/s | +0.70% |

fixes #21583
Change-Id: I24065a80588ccae7de3ad732a3cfb0026cf7e214
Reviewed-on: https://go-review.googlesource.com/67490
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
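The comparisons these instructions help with only need the Z flag of an intermediate result, never the result itself. The sketch below shows the three predicate shapes in Go; the function names are illustrative, and whether a particular build actually emits CMN, TST or TEQ for them depends on the compiler's rewrite rules. It also checks why equality tests are safe under wraparound: x+y == 0 holds exactly when x == -y for every int32, including math.MinInt32.

```go
package main

import (
	"fmt"
	"math"
)

// Each predicate needs only the Z flag of its intermediate result:
// CMN sets flags from x+y, TST from x&y, TEQ from x^y.
func addIsZero(x, y int32) bool { return x+y == 0 } // candidate for CMN
func andIsZero(x, y int32) bool { return x&y == 0 } // candidate for TST
func xorIsZero(x, y int32) bool { return x^y == 0 } // candidate for TEQ

func main() {
	// Equality survives wraparound: math.MinInt32 negates to itself, and
	// MinInt32 + MinInt32 wraps to 0.
	m := int32(math.MinInt32)
	fmt.Println(addIsZero(m, m), m == -m)               // true true
	fmt.Println(andIsZero(0x0f, 0xf0), xorIsZero(7, 7)) // true true
}
```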
6f3e5e637c
cmd/compile: intrinsify runtime.getcallersp
Add a compiler intrinsic for getcallersp, so that we are able to get rid of its argument (not done in this CL).
Change-Id: Ic38fda1c694f918328659ab44654198fb116668d
Reviewed-on: https://go-review.googlesource.com/69350
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
9732485851
cmd/compile: optimized ARM code with BFX/BFXU
BFX and BFXU were introduced in ARMv6T2. For bit-field extraction, a single BFX or BFXU is more efficient than a left-shift/right-shift pair. This patch implements this optimization. The benchmarks show a large improvement in special cases and little change overall.
1. There is a large improvement in a special test case.
name old time/op new time/op delta
BFX-4 665µs ± 1% 595µs ± 0% -10.61% (p=0.000 n=20+20)
(The test case: https://github.com/benshi001/ugo1/blob/master/bfx_test.go)
2. The compilecmp benchmark shows no regression.
name old time/op new time/op delta
Template 2.33s ± 2% 2.34s ± 2% ~ (p=0.356 n=9+10)
Unicode 1.32s ± 2% 1.30s ± 2% ~ (p=0.139 n=9+8)
GoTypes 7.77s ± 1% 7.76s ± 1% ~ (p=0.780 n=10+9)
Compiler 37.3s ± 1% 37.1s ± 1% ~ (p=0.211 n=10+9)
SSA 84.3s ± 2% 84.3s ± 2% ~ (p=0.842 n=10+9)
Flate 1.45s ± 1% 1.45s ± 3% ~ (p=0.853 n=10+10)
GoParser 1.83s ± 2% 1.83s ± 2% ~ (p=0.739 n=10+10)
Reflect 5.08s ± 2% 5.09s ± 2% ~ (p=0.720 n=9+10)
Tar 2.44s ± 1% 2.44s ± 2% ~ (p=0.684 n=10+10)
XML 2.62s ± 2% 2.62s ± 2% ~ (p=0.529 n=10+10)
[Geo mean] 4.80s 4.79s -0.06%
name old user-time/op new user-time/op delta
Template 2.76s ± 2% 2.75s ± 3% ~ (p=0.893 n=10+10)
Unicode 1.63s ± 1% 1.60s ± 1% -2.07% (p=0.000 n=8+9)
GoTypes 9.54s ± 1% 9.52s ± 1% ~ (p=0.215 n=10+10)
Compiler 46.0s ± 1% 46.0s ± 1% ~ (p=0.853 n=10+10)
SSA 110s ± 1% 110s ± 1% ~ (p=0.838 n=10+10)
Flate 1.69s ± 3% 1.69s ± 5% ~ (p=0.957 n=10+10)
GoParser 2.15s ± 2% 2.15s ± 2% ~ (p=0.749 n=10+10)
Reflect 6.03s ± 1% 5.99s ± 2% ~ (p=0.060 n=9+10)
Tar 3.02s ± 2% 2.99s ± 2% ~ (p=0.214 n=10+10)
XML 3.10s ± 2% 3.08s ± 2% ~ (p=0.732 n=9+10)
[Geo mean] 5.82s 5.79s -0.41%
name old text-bytes new text-bytes delta
HelloSize 589kB ± 0% 589kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 76.9kB ± 0% 76.9kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal)
3. The go1 benchmark shows little change overall (excluding noise).
name old time/op new time/op delta
BinaryTree17-4 41.5s ± 1% 41.6s ± 1% ~ (p=0.373 n=30+26)
Fannkuch11-4 23.6s ± 1% 23.6s ± 1% +0.28% (p=0.003 n=29+30)
FmtFprintfEmpty-4 826ns ± 1% 827ns ± 1% ~ (p=0.155 n=30+30)
FmtFprintfString-4 1.35µs ± 1% 1.35µs ± 1% ~ (p=0.499 n=30+30)
FmtFprintfInt-4 1.43µs ± 1% 1.41µs ± 1% -1.19% (p=0.000 n=30+30)
FmtFprintfIntInt-4 2.15µs ± 1% 2.11µs ± 1% -1.78% (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4 2.21µs ± 1% 2.21µs ± 1% ~ (p=0.881 n=30+30)
FmtFprintfFloat-4 4.41µs ± 1% 4.44µs ± 0% +0.64% (p=0.000 n=30+30)
FmtManyArgs-4 8.06µs ± 1% 8.06µs ± 0% ~ (p=0.871 n=30+30)
GobDecode-4 103ms ± 1% 104ms ± 2% +0.54% (p=0.013 n=28+29)
GobEncode-4 92.4ms ± 1% 92.6ms ± 1% ~ (p=0.447 n=30+29)
Gzip-4 4.17s ± 1% 4.06s ± 1% -2.56% (p=0.000 n=29+30)
Gunzip-4 603ms ± 1% 602ms ± 1% ~ (p=0.423 n=30+30)
HTTPClientServer-4 688µs ± 2% 674µs ± 3% -2.09% (p=0.000 n=29+30)
JSONEncode-4 237ms ± 1% 237ms ± 1% ~ (p=0.061 n=29+30)
JSONDecode-4 907ms ± 1% 910ms ± 1% ~ (p=0.061 n=30+30)
Mandelbrot200-4 41.7ms ± 0% 41.7ms ± 0% +0.19% (p=0.000 n=24+20)
GoParse-4 45.7ms ± 2% 45.5ms ± 2% -0.29% (p=0.005 n=30+30)
RegexpMatchEasy0_32-4 1.27µs ± 0% 1.27µs ± 0% +0.12% (p=0.031 n=30+30)
RegexpMatchEasy0_1K-4 7.77µs ± 4% 7.73µs ± 3% ~ (p=0.169 n=30+30)
RegexpMatchEasy1_32-4 1.29µs ± 1% 1.29µs ± 1% ~ (p=0.126 n=30+30)
RegexpMatchEasy1_1K-4 10.4µs ± 3% 10.3µs ± 2% -1.32% (p=0.004 n=30+29)
RegexpMatchMedium_32-4 2.06µs ± 0% 2.06µs ± 0% ~ (p=0.071 n=30+30)
RegexpMatchMedium_1K-4 531µs ± 1% 530µs ± 0% ~ (p=0.121 n=30+23)
RegexpMatchHard_32-4 28.7µs ± 1% 28.6µs ± 1% -0.21% (p=0.001 n=30+27)
RegexpMatchHard_1K-4 860µs ± 1% 857µs ± 1% ~ (p=0.105 n=30+27)
Revcomp-4 67.3ms ± 2% 67.3ms ± 2% ~ (p=0.805 n=29+29)
Template-4 1.08s ± 1% 1.08s ± 1% ~ (p=0.260 n=30+30)
TimeParse-4 7.04µs ± 0% 7.04µs ± 0% ~ (p=0.315 n=30+30)
TimeFormat-4 13.2µs ± 1% 13.2µs ± 1% ~ (p=0.077 n=30+30)
[Geo mean] 715µs 713µs -0.30%
name old speed new speed delta
GobDecode-4 7.42MB/s ± 1% 7.38MB/s ± 2% -0.54% (p=0.011 n=28+29)
GobEncode-4 8.30MB/s ± 1% 8.29MB/s ± 1% ~ (p=0.484 n=30+29)
Gzip-4 4.65MB/s ± 2% 4.78MB/s ± 1% +2.73% (p=0.000 n=30+30)
Gunzip-4 32.2MB/s ± 1% 32.2MB/s ± 1% ~ (p=0.357 n=30+30)
JSONEncode-4 8.18MB/s ± 1% 8.19MB/s ± 1% ~ (p=0.052 n=29+30)
JSONDecode-4 2.14MB/s ± 1% 2.13MB/s ± 1% ~ (p=0.074 n=30+29)
GoParse-4 1.27MB/s ± 1% 1.27MB/s ± 2% ~ (p=0.618 n=24+30)
RegexpMatchEasy0_32-4 25.2MB/s ± 0% 25.2MB/s ± 0% -0.12% (p=0.031 n=30+30)
RegexpMatchEasy0_1K-4 132MB/s ± 5% 132MB/s ± 2% ~ (p=0.171 n=30+30)
RegexpMatchEasy1_32-4 24.8MB/s ± 1% 24.9MB/s ± 1% ~ (p=0.106 n=30+30)
RegexpMatchEasy1_1K-4 98.4MB/s ± 3% 99.6MB/s ± 4% +1.19% (p=0.011 n=30+30)
RegexpMatchMedium_32-4 483kB/s ± 1% 484kB/s ± 1% ~ (p=0.426 n=30+30)
RegexpMatchMedium_1K-4 1.93MB/s ± 1% 1.93MB/s ± 0% ~ (p=0.157 n=30+17)
RegexpMatchHard_32-4 1.12MB/s ± 1% 1.12MB/s ± 0% +0.33% (p=0.001 n=30+24)
RegexpMatchHard_1K-4 1.19MB/s ± 1% 1.19MB/s ± 1% ~ (p=0.290 n=30+30)
Revcomp-4 37.8MB/s ± 2% 37.8MB/s ± 1% ~ (p=0.815 n=29+29)
Template-4 1.80MB/s ± 1% 1.80MB/s ± 1% ~ (p=0.586 n=30+30)
[Geo mean] 6.80MB/s 6.81MB/s +0.25%
fixes #20966
Change-Id: Idb5567bbe988c875315b8c98c128957cd474ccc5
Reviewed-on: https://go-review.googlesource.com/64950
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com> |
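As an illustration of the pattern this commit targets, the sketch below extracts an 8-bit field (bits 8..15) with a left-shift/right-shift pair, next to the equivalent shift-and-mask form, plus a signed variant. The function names are hypothetical, and whether the compiler actually emits BFXU/BFX for them depends on the Go version, the GOARM setting, and the ARM.rules in effect; reading the output of `GOARCH=arm GOARM=7 go build -gcflags=-S` is one way to check. The sketch only shows that the forms compute the same value.

```go
package main

import "fmt"

// fieldShifts isolates bits 8..15 of x with a left-shift/right-shift pair,
// the unsigned shape that a single BFXU can replace on ARMv6T2 and later.
func fieldShifts(x uint32) uint32 {
	return (x << 16) >> 24
}

// fieldMask computes the same unsigned field with one shift and a mask.
func fieldMask(x uint32) uint32 {
	return (x >> 8) & 0xff
}

// fieldSigned sign-extends bits 8..15 of x. On a signed operand the right
// shift is arithmetic, which is the shape the signed BFX covers.
func fieldSigned(x int32) int32 {
	return (x << 16) >> 24
}

func main() {
	x := uint32(0x12345678)
	fmt.Printf("%#x %#x\n", fieldShifts(x), fieldMask(x)) // prints 0x56 0x56
	fmt.Println(fieldSigned(0x8000))                      // prints -128
}
```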
|
|
|
a07176b45a |
cmd/compile: optimize ARM code with MULAF/MULSF/MULAD/MULSD
The Go compiler can generate better ARM code with these more efficient FP instructions. There is little improvement overall, but a large improvement in special cases.
1. The size of pkg/linux_arm/math.a shrinks by 2.4%.
2. There is neither improvement nor regression in the compilecmp benchmark.
name old time/op new time/op delta
Template 2.32s ± 2% 2.32s ± 1% ~ (p=1.000 n=9+10)
Unicode 1.32s ± 4% 1.32s ± 4% ~ (p=0.912 n=10+10)
GoTypes 7.76s ± 1% 7.79s ± 1% ~ (p=0.447 n=9+10)
Compiler 37.4s ± 2% 37.2s ± 2% ~ (p=0.218 n=10+10)
SSA 84.8s ± 2% 85.0s ± 1% ~ (p=0.604 n=10+9)
Flate 1.45s ± 2% 1.44s ± 2% ~ (p=0.075 n=10+10)
GoParser 1.82s ± 1% 1.81s ± 1% ~ (p=0.190 n=10+10)
Reflect 5.06s ± 1% 5.05s ± 1% ~ (p=0.315 n=10+9)
Tar 2.37s ± 1% 2.37s ± 2% ~ (p=0.912 n=10+10)
XML 2.56s ± 1% 2.58s ± 2% ~ (p=0.089 n=10+10)
[Geo mean] 4.77s 4.77s -0.08%
name old user-time/op new user-time/op delta
Template 2.74s ± 2% 2.75s ± 2% ~ (p=0.856 n=9+10)
Unicode 1.61s ± 4% 1.62s ± 3% ~ (p=0.693 n=10+10)
GoTypes 9.55s ± 1% 9.49s ± 2% ~ (p=0.056 n=9+10)
Compiler 45.9s ± 1% 45.8s ± 1% ~ (p=0.345 n=9+10)
SSA 110s ± 1% 110s ± 1% ~ (p=0.763 n=9+10)
Flate 1.68s ± 2% 1.68s ± 3% ~ (p=0.616 n=10+10)
GoParser 2.14s ± 4% 2.14s ± 1% ~ (p=0.825 n=10+9)
Reflect 5.95s ± 1% 5.97s ± 3% ~ (p=0.951 n=9+10)
Tar 2.94s ± 3% 2.93s ± 2% ~ (p=0.359 n=10+10)
XML 3.03s ± 3% 3.07s ± 6% ~ (p=0.166 n=10+10)
[Geo mean] 5.76s 5.77s +0.12%
name old text-bytes new text-bytes delta
HelloSize 588kB ± 0% 588kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal)
3. Mandelbrot200 improves by 15%, though there is little improvement overall.
name old time/op new time/op delta
BinaryTree17-4 41.7s ± 1% 41.7s ± 1% ~ (p=0.264 n=29+23)
Fannkuch11-4 24.2s ± 0% 24.1s ± 1% -0.13% (p=0.050 n=30+30)
FmtFprintfEmpty-4 826ns ± 1% 824ns ± 1% -0.24% (p=0.038 n=25+30)
FmtFprintfString-4 1.38µs ± 1% 1.38µs ± 0% -0.42% (p=0.000 n=27+25)
FmtFprintfInt-4 1.46µs ± 1% 1.46µs ± 0% ~ (p=0.060 n=30+23)
FmtFprintfIntInt-4 2.11µs ± 1% 2.08µs ± 0% -1.04% (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4 2.23µs ± 1% 2.22µs ± 1% -0.51% (p=0.000 n=30+30)
FmtFprintfFloat-4 4.49µs ± 1% 4.48µs ± 1% -0.22% (p=0.004 n=26+30)
FmtManyArgs-4 8.06µs ± 1% 8.12µs ± 1% +0.68% (p=0.000 n=25+30)
GobDecode-4 104ms ± 1% 104ms ± 2% ~ (p=0.362 n=29+29)
GobEncode-4 92.9ms ± 1% 92.8ms ± 2% ~ (p=0.786 n=30+30)
Gzip-4 4.12s ± 1% 4.12s ± 1% ~ (p=0.314 n=30+30)
Gunzip-4 602ms ± 1% 603ms ± 1% ~ (p=0.164 n=30+30)
HTTPClientServer-4 659µs ± 1% 655µs ± 2% -0.64% (p=0.006 n=25+28)
JSONEncode-4 234ms ± 1% 235ms ± 1% +0.29% (p=0.050 n=30+30)
JSONDecode-4 912ms ± 0% 911ms ± 0% ~ (p=0.385 n=18+24)
Mandelbrot200-4 49.2ms ± 0% 41.7ms ± 0% -15.35% (p=0.000 n=25+27)
GoParse-4 46.3ms ± 1% 46.3ms ± 2% ~ (p=0.572 n=30+30)
RegexpMatchEasy0_32-4 1.29µs ± 1% 1.27µs ± 0% -1.59% (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4 7.62µs ± 4% 7.71µs ± 3% ~ (p=0.074 n=30+30)
RegexpMatchEasy1_32-4 1.31µs ± 0% 1.30µs ± 1% -0.71% (p=0.000 n=23+30)
RegexpMatchEasy1_1K-4 10.3µs ± 3% 10.3µs ± 5% ~ (p=0.105 n=30+30)
RegexpMatchMedium_32-4 2.06µs ± 1% 2.06µs ± 1% ~ (p=0.100 n=30+30)
RegexpMatchMedium_1K-4 533µs ± 1% 534µs ± 1% ~ (p=0.254 n=29+30)
RegexpMatchHard_32-4 28.9µs ± 0% 28.9µs ± 0% ~ (p=0.154 n=30+30)
RegexpMatchHard_1K-4 868µs ± 1% 867µs ± 0% ~ (p=0.729 n=30+23)
Revcomp-4 66.9ms ± 1% 67.2ms ± 2% ~ (p=0.102 n=28+29)
Template-4 1.07s ± 1% 1.06s ± 1% -0.53% (p=0.000 n=30+30)
TimeParse-4 7.07µs ± 1% 7.01µs ± 0% -0.85% (p=0.000 n=30+25)
TimeFormat-4 13.1µs ± 0% 13.2µs ± 1% +0.77% (p=0.000 n=27+27)
[Geo mean] 721µs 716µs -0.70%
name old speed new speed delta
GobDecode-4 7.38MB/s ± 1% 7.37MB/s ± 2% ~ (p=0.399 n=29+29)
GobEncode-4 8.26MB/s ± 1% 8.27MB/s ± 2% ~ (p=0.790 n=30+30)
Gzip-4 4.71MB/s ± 1% 4.71MB/s ± 1% ~ (p=0.885 n=30+30)
Gunzip-4 32.2MB/s ± 1% 32.2MB/s ± 1% ~ (p=0.190 n=30+30)
JSONEncode-4 8.28MB/s ± 1% 8.25MB/s ± 1% ~ (p=0.053 n=30+30)
JSONDecode-4 2.13MB/s ± 0% 2.12MB/s ± 1% ~ (p=0.072 n=18+30)
GoParse-4 1.25MB/s ± 1% 1.25MB/s ± 2% ~ (p=0.863 n=30+30)
RegexpMatchEasy0_32-4 24.8MB/s ± 0% 25.2MB/s ± 1% +1.61% (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4 134MB/s ± 4% 133MB/s ± 3% ~ (p=0.074 n=30+30)
RegexpMatchEasy1_32-4 24.5MB/s ± 0% 24.6MB/s ± 1% +0.72% (p=0.000 n=23+30)
RegexpMatchEasy1_1K-4 99.1MB/s ± 3% 99.8MB/s ± 5% ~ (p=0.105 n=30+30)
RegexpMatchMedium_32-4 483kB/s ± 1% 487kB/s ± 1% +0.83% (p=0.002 n=30+30)
RegexpMatchMedium_1K-4 1.92MB/s ± 1% 1.92MB/s ± 1% ~ (p=0.058 n=30+30)
RegexpMatchHard_32-4 1.10MB/s ± 0% 1.11MB/s ± 0% ~ (p=0.804 n=30+30)
RegexpMatchHard_1K-4 1.18MB/s ± 0% 1.18MB/s ± 0% ~ (all equal)
Revcomp-4 38.0MB/s ± 1% 37.8MB/s ± 2% ~ (p=0.098 n=28+29)
Template-4 1.82MB/s ± 1% 1.83MB/s ± 1% +0.55% (p=0.000 n=29+29)
[Geo mean] 6.79MB/s 6.79MB/s +0.09%
Change-Id: Ia91991c2c5c59c5df712de85a83b13a21c0a554b
Reviewed-on: https://go-review.googlesource.com/63770
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com> |
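For readers unfamiliar with these instructions: MULAF/MULAD accumulate a product (a + b*c), and MULSF/MULSD subtract one (a - b*c), in a single floating-point instruction. The hypothetical functions below only show the Go source shapes that such rewrite rules are meant to match; the exact conditions under which the compiler combines them (for example, whether the product has other uses) are not described in this commit and may differ between Go versions.

```go
package main

import "fmt"

// mulAdd has the a + b*c shape that a MULAD (float64) rule can target.
func mulAdd(a, b, c float64) float64 { return a + b*c }

// mulSub has the a - b*c shape that a MULSD (float64) rule can target.
func mulSub(a, b, c float64) float64 { return a - b*c }

// mulAdd32 is the float32 variant, corresponding to MULAF.
func mulAdd32(a, b, c float32) float32 { return a + b*c }

func main() {
	fmt.Println(mulAdd(1, 2, 3), mulSub(10, 2, 3), mulAdd32(0.5, 2, 4)) // prints 7 4 8.5
}
```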
|
|
|
2899c3e8cb |
cmd/compile: optimize ARM code with NMULF/NMULD
NMULF and NMULD are efficient FP instructions, and the Go compiler can use them to generate better code. The benchmarks for this patch show no general change, but a large improvement in special cases.
1. A special test case improved by 12.6%.
https://github.com/benshi001/ugo1/blob/master/fpmul_test.go
name old time/op new time/op delta
FPMul-4 398µs ± 1% 348µs ± 1% -12.64% (p=0.000 n=40+40)
2. The compilecmp test showed little change.
name old time/op new time/op delta
Template 2.30s ± 1% 2.31s ± 1% ~ (p=0.754 n=17+19)
Unicode 1.31s ± 3% 1.32s ± 5% ~ (p=0.265 n=20+20)
GoTypes 7.73s ± 2% 7.73s ± 1% ~ (p=0.925 n=20+20)
Compiler 37.0s ± 1% 37.3s ± 2% +0.79% (p=0.002 n=19+20)
SSA 83.8s ± 4% 83.5s ± 2% ~ (p=0.964 n=20+17)
Flate 1.43s ± 2% 1.44s ± 1% ~ (p=0.602 n=20+20)
GoParser 1.82s ± 2% 1.81s ± 2% ~ (p=0.141 n=19+20)
Reflect 5.08s ± 2% 5.08s ± 3% ~ (p=0.835 n=20+19)
Tar 2.36s ± 1% 2.35s ± 1% ~ (p=0.195 n=18+17)
XML 2.57s ± 2% 2.56s ± 1% ~ (p=0.283 n=20+17)
[Geo mean] 4.74s 4.75s +0.05%
name old user-time/op new user-time/op delta
Template 2.75s ± 2% 2.75s ± 0% ~ (p=0.620 n=20+15)
Unicode 1.59s ± 4% 1.60s ± 4% ~ (p=0.479 n=20+19)
GoTypes 9.48s ± 1% 9.47s ± 1% ~ (p=0.743 n=20+20)
Compiler 45.7s ± 1% 45.7s ± 1% ~ (p=0.482 n=19+20)
SSA 109s ± 1% 109s ± 2% ~ (p=0.800 n=18+20)
Flate 1.67s ± 3% 1.67s ± 3% ~ (p=0.598 n=19+18)
GoParser 2.15s ± 4% 2.13s ± 3% ~ (p=0.153 n=20+20)
Reflect 5.95s ± 2% 5.95s ± 2% ~ (p=0.961 n=19+20)
Tar 2.93s ± 2% 2.92s ± 3% ~ (p=0.242 n=20+19)
XML 3.02s ± 3% 3.04s ± 3% ~ (p=0.233 n=19+18)
[Geo mean] 5.74s 5.74s -0.04%
name old text-bytes new text-bytes delta
HelloSize 588kB ± 0% 588kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal)
3. The go1 benchmark showed little change overall.
name old time/op new time/op delta
BinaryTree17-4 41.8s ± 1% 41.8s ± 1% ~ (p=0.388 n=40+39)
Fannkuch11-4 24.1s ± 1% 24.1s ± 1% ~ (p=0.077 n=40+40)
FmtFprintfEmpty-4 834ns ± 1% 831ns ± 1% -0.31% (p=0.002 n=40+37)
FmtFprintfString-4 1.34µs ± 1% 1.34µs ± 0% ~ (p=0.387 n=40+40)
FmtFprintfInt-4 1.44µs ± 1% 1.44µs ± 1% ~ (p=0.421 n=40+40)
FmtFprintfIntInt-4 2.09µs ± 0% 2.09µs ± 1% ~ (p=0.589 n=40+39)
FmtFprintfPrefixedInt-4 2.32µs ± 1% 2.33µs ± 1% +0.15% (p=0.001 n=40+40)
FmtFprintfFloat-4 4.51µs ± 0% 4.44µs ± 1% -1.50% (p=0.000 n=40+40)
FmtManyArgs-4 7.94µs ± 0% 7.97µs ± 0% +0.36% (p=0.001 n=32+40)
GobDecode-4 104ms ± 1% 102ms ± 2% -1.27% (p=0.000 n=39+37)
GobEncode-4 90.5ms ± 1% 90.9ms ± 2% +0.40% (p=0.006 n=37+40)
Gzip-4 4.10s ± 2% 4.08s ± 1% -0.30% (p=0.004 n=40+40)
Gunzip-4 603ms ± 0% 602ms ± 1% ~ (p=0.303 n=37+40)
HTTPClientServer-4 672µs ± 3% 658µs ± 2% -2.08% (p=0.000 n=39+37)
JSONEncode-4 238ms ± 1% 239ms ± 0% +0.26% (p=0.001 n=40+25)
JSONDecode-4 884ms ± 1% 885ms ± 1% +0.16% (p=0.012 n=40+40)
Mandelbrot200-4 49.3ms ± 0% 49.3ms ± 0% ~ (p=0.588 n=40+38)
GoParse-4 46.3ms ± 1% 46.4ms ± 2% ~ (p=0.487 n=40+40)
RegexpMatchEasy0_32-4 1.28µs ± 1% 1.28µs ± 0% +0.12% (p=0.003 n=40+40)
RegexpMatchEasy0_1K-4 7.78µs ± 5% 7.78µs ± 4% ~ (p=0.825 n=40+40)
RegexpMatchEasy1_32-4 1.29µs ± 1% 1.29µs ± 0% ~ (p=0.659 n=40+40)
RegexpMatchEasy1_1K-4 10.3µs ± 3% 10.4µs ± 2% ~ (p=0.266 n=40+40)
RegexpMatchMedium_32-4 2.05µs ± 1% 2.05µs ± 0% -0.18% (p=0.002 n=40+28)
RegexpMatchMedium_1K-4 533µs ± 1% 534µs ± 1% ~ (p=0.397 n=37+40)
RegexpMatchHard_32-4 28.9µs ± 1% 28.9µs ± 1% -0.22% (p=0.002 n=40+40)
RegexpMatchHard_1K-4 868µs ± 1% 870µs ± 1% +0.21% (p=0.015 n=40+40)
Revcomp-4 67.3ms ± 1% 67.2ms ± 2% ~ (p=0.262 n=38+39)
Template-4 1.07s ± 1% 1.07s ± 1% ~ (p=0.276 n=40+40)
TimeParse-4 7.16µs ± 1% 7.16µs ± 1% ~ (p=0.610 n=39+40)
TimeFormat-4 13.3µs ± 1% 13.3µs ± 1% ~ (p=0.617 n=38+40)
[Geo mean] 720µs 719µs -0.13%
name old speed new speed delta
GobDecode-4 7.39MB/s ± 1% 7.49MB/s ± 2% +1.25% (p=0.000 n=39+38)
GobEncode-4 8.48MB/s ± 1% 8.45MB/s ± 2% -0.40% (p=0.005 n=37+40)
Gzip-4 4.74MB/s ± 2% 4.75MB/s ± 1% +0.30% (p=0.018 n=40+40)
Gunzip-4 32.2MB/s ± 0% 32.2MB/s ± 1% ~ (p=0.272 n=36+40)
JSONEncode-4 8.15MB/s ± 1% 8.13MB/s ± 0% -0.26% (p=0.003 n=40+25)
JSONDecode-4 2.19MB/s ± 1% 2.19MB/s ± 1% ~ (p=0.676 n=40+40)
GoParse-4 1.25MB/s ± 2% 1.25MB/s ± 2% ~ (p=0.823 n=40+40)
RegexpMatchEasy0_32-4 25.1MB/s ± 1% 25.1MB/s ± 0% -0.12% (p=0.006 n=40+40)
RegexpMatchEasy0_1K-4 132MB/s ± 5% 132MB/s ± 5% ~ (p=0.821 n=40+40)
RegexpMatchEasy1_32-4 24.7MB/s ± 1% 24.7MB/s ± 0% ~ (p=0.630 n=40+40)
RegexpMatchEasy1_1K-4 99.1MB/s ± 3% 98.8MB/s ± 2% ~ (p=0.268 n=40+40)
RegexpMatchMedium_32-4 487kB/s ± 2% 490kB/s ± 0% +0.51% (p=0.001 n=40+40)
RegexpMatchMedium_1K-4 1.92MB/s ± 1% 1.92MB/s ± 1% ~ (p=0.208 n=39+40)
RegexpMatchHard_32-4 1.11MB/s ± 1% 1.11MB/s ± 0% +0.36% (p=0.000 n=40+33)
RegexpMatchHard_1K-4 1.18MB/s ± 1% 1.18MB/s ± 1% ~ (p=0.207 n=40+37)
Revcomp-4 37.8MB/s ± 1% 37.8MB/s ± 2% ~ (p=0.276 n=38+39)
Template-4 1.82MB/s ± 1% 1.81MB/s ± 1% ~ (p=0.122 n=38+40)
[Geo mean] 6.81MB/s 6.81MB/s +0.06%
fixes #19843
Change-Id: Ief3a0c2b15f59d40c7b40f2784eeb71196685b59
Reviewed-on: https://go-review.googlesource.com/61150
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> |
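NMULF and NMULD produce the negated product -(x*y) in one instruction, instead of a multiply followed by a separate negation. The hypothetical functions below show that source shape for float64 and float32. The commit does not say which other spellings (such as `(-x)*y`) the rules also match, so only the `-(x*y)` form is shown.

```go
package main

import "fmt"

// negMul is the -(x*y) shape for float64, the operation NMULD performs.
func negMul(x, y float64) float64 { return -(x * y) }

// negMul32 is the float32 variant, the operation NMULF performs.
func negMul32(x, y float32) float32 { return -(x * y) }

func main() {
	fmt.Println(negMul(2, 3), negMul32(1.5, 2)) // prints -6 -3
}
```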
|
|
|
64607dbd26 |
cmd/compile: optimize ARM with MULS
MULS was introduced in ARMv7 and corresponds to MULA. This patch duplicates all MULA-related SSA rules for MULS. Below are the test results compared against the original Go compiler. There was no improvement overall, but a large improvement in special cases.
1. A specific test case sped up by 18.62%.
(https://github.com/benshi001/ugo1/blob/master/mulsub_test.go)
name old time/op new time/op delta
MulSub-4 270µs ± 0% 219µs ± 0% -18.62% (p=0.000 n=35+40)
2. The total size of all .a files in pkg/ shrank by 0.002%.
3. The compilecmp benchmark showed no decline.
name old time/op new time/op delta
Template 2.37s ± 3% 2.36s ± 1% ~ (p=0.233 n=19+18)
Unicode 1.32s ± 2% 1.34s ± 5% +1.32% (p=0.011 n=20+18)
GoTypes 7.88s ± 1% 7.87s ± 1% ~ (p=0.758 n=20+20)
Compiler 37.5s ± 1% 37.6s ± 1% ~ (p=0.194 n=20+19)
SSA 83.7s ± 2% 83.5s ± 2% ~ (p=0.569 n=20+19)
Flate 1.46s ± 3% 1.45s ± 1% ~ (p=0.619 n=20+17)
GoParser 1.87s ± 2% 1.85s ± 1% -0.58% (p=0.048 n=20+18)
Reflect 5.10s ± 2% 5.11s ± 2% ~ (p=0.365 n=19+20)
Tar 1.78s ± 2% 1.78s ± 2% ~ (p=0.531 n=19+20)
XML 2.62s ± 1% 2.61s ± 2% ~ (p=0.057 n=17+19)
[Geo mean] 4.68s 4.67s -0.07%
name old user-time/op new user-time/op delta
Template 2.80s ± 1% 2.79s ± 2% ~ (p=0.686 n=17+20)
Unicode 1.61s ± 4% 1.63s ± 6% ~ (p=0.222 n=20+20)
GoTypes 9.59s ± 1% 9.60s ± 1% ~ (p=0.482 n=17+20)
Compiler 46.1s ± 1% 46.2s ± 1% ~ (p=0.373 n=20+18)
SSA 108s ± 1% 108s ± 2% ~ (p=0.784 n=20+20)
Flate 1.68s ± 3% 1.69s ± 3% ~ (p=0.335 n=20+19)
GoParser 2.20s ± 4% 2.19s ± 2% ~ (p=0.844 n=20+18)
Reflect 5.97s ± 3% 6.01s ± 2% ~ (p=0.184 n=20+20)
Tar 2.11s ± 2% 2.11s ± 4% ~ (p=0.961 n=19+20)
XML 3.07s ± 1% 3.07s ± 3% ~ (p=0.786 n=16+19)
[Geo mean] 5.61s 5.62s +0.19%
name old text-bytes new text-bytes delta
HelloSize 586kB ± 0% 586kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal)
4. The go1 benchmark showed no decline overall.
name old time/op new time/op delta
BinaryTree17-4 41.7s ± 1% 41.7s ± 1% ~ (p=0.966 n=40+40)
Fannkuch11-4 23.6s ± 0% 23.6s ± 1% -0.23% (p=0.000 n=40+40)
FmtFprintfEmpty-4 844ns ± 1% 834ns ± 1% -1.23% (p=0.000 n=40+40)
FmtFprintfString-4 1.39µs ± 1% 1.40µs ± 1% +0.71% (p=0.000 n=40+40)
FmtFprintfInt-4 1.44µs ± 1% 1.45µs ± 1% +0.70% (p=0.000 n=40+40)
FmtFprintfIntInt-4 2.10µs ± 1% 2.10µs ± 1% +0.30% (p=0.000 n=40+40)
FmtFprintfPrefixedInt-4 2.49µs ± 0% 2.50µs ± 1% +0.66% (p=0.000 n=32+40)
FmtFprintfFloat-4 4.42µs ± 1% 4.46µs ± 2% +0.94% (p=0.000 n=40+40)
FmtManyArgs-4 8.31µs ± 1% 8.22µs ± 1% -1.09% (p=0.000 n=40+40)
GobDecode-4 105ms ± 1% 102ms ± 1% -2.30% (p=0.000 n=39+39)
GobEncode-4 90.2ms ± 1% 88.7ms ± 1% -1.66% (p=0.000 n=40+39)
Gzip-4 4.17s ± 1% 4.16s ± 1% ~ (p=0.785 n=40+40)
Gunzip-4 608ms ± 1% 608ms ± 1% ~ (p=0.481 n=40+40)
HTTPClientServer-4 697µs ± 2% 684µs ± 3% -1.89% (p=0.000 n=37+40)
JSONEncode-4 255ms ± 1% 256ms ± 1% +0.35% (p=0.000 n=40+40)
JSONDecode-4 920ms ± 1% 926ms ± 1% +0.64% (p=0.000 n=40+39)
Mandelbrot200-4 49.3ms ± 1% 49.3ms ± 0% +0.07% (p=0.005 n=40+40)
GoParse-4 46.8ms ± 2% 46.7ms ± 1% ~ (p=1.000 n=40+40)
RegexpMatchEasy0_32-4 1.27µs ± 0% 1.27µs ± 1% ~ (p=0.057 n=40+40)
RegexpMatchEasy0_1K-4 7.97µs ± 7% 7.92µs ± 5% ~ (p=0.094 n=40+40)
RegexpMatchEasy1_32-4 1.28µs ± 1% 1.28µs ± 1% ~ (p=0.406 n=40+40)
RegexpMatchEasy1_1K-4 10.5µs ± 4% 10.5µs ± 3% ~ (p=0.855 n=40+40)
RegexpMatchMedium_32-4 2.04µs ± 0% 2.04µs ± 1% -0.22% (p=0.000 n=39+40)
RegexpMatchMedium_1K-4 541µs ± 0% 540µs ± 1% -0.25% (p=0.000 n=40+38)
RegexpMatchHard_32-4 29.3µs ± 1% 29.3µs ± 0% ~ (p=0.149 n=40+40)
RegexpMatchHard_1K-4 878µs ± 1% 880µs ± 0% +0.14% (p=0.005 n=36+35)
Revcomp-4 81.8ms ± 2% 81.4ms ± 2% -0.43% (p=0.015 n=38+39)
Template-4 1.05s ± 1% 1.05s ± 1% ~ (p=0.302 n=40+35)
TimeParse-4 7.18µs ± 1% 7.26µs ± 1% +1.05% (p=0.000 n=40+36)
TimeFormat-4 13.1µs ± 1% 13.1µs ± 1% ~ (p=0.698 n=37+40)
[Geo mean] 733µs 732µs -0.16%
name old speed new speed delta
GobDecode-4 7.34MB/s ± 1% 7.51MB/s ± 1% +2.36% (p=0.000 n=39+39)
GobEncode-4 8.51MB/s ± 1% 8.65MB/s ± 1% +1.69% (p=0.000 n=40+39)
Gzip-4 4.66MB/s ± 1% 4.66MB/s ± 1% ~ (p=0.783 n=40+40)
Gunzip-4 31.9MB/s ± 1% 31.9MB/s ± 1% ~ (p=0.466 n=40+40)
JSONEncode-4 7.61MB/s ± 1% 7.58MB/s ± 1% -0.35% (p=0.001 n=40+40)
JSONDecode-4 2.11MB/s ± 1% 2.10MB/s ± 1% -0.52% (p=0.000 n=38+39)
GoParse-4 1.24MB/s ± 2% 1.24MB/s ± 1% ~ (p=0.556 n=40+39)
RegexpMatchEasy0_32-4 25.1MB/s ± 0% 25.1MB/s ± 1% ~ (p=0.064 n=40+40)
RegexpMatchEasy0_1K-4 129MB/s ± 8% 129MB/s ± 5% ~ (p=0.094 n=40+40)
RegexpMatchEasy1_32-4 25.0MB/s ± 1% 25.1MB/s ± 1% ~ (p=0.331 n=40+40)
RegexpMatchEasy1_1K-4 97.7MB/s ± 4% 97.8MB/s ± 3% ~ (p=0.851 n=40+40)
RegexpMatchMedium_32-4 490kB/s ± 0% 490kB/s ± 0% ~ (all equal)
RegexpMatchMedium_1K-4 1.89MB/s ± 0% 1.90MB/s ± 1% +0.12% (p=0.031 n=40+40)
RegexpMatchHard_32-4 1.09MB/s ± 1% 1.09MB/s ± 1% ~ (p=0.597 n=40+40)
RegexpMatchHard_1K-4 1.16MB/s ± 1% 1.16MB/s ± 1% ~ (p=0.565 n=40+35)
Revcomp-4 31.1MB/s ± 2% 31.2MB/s ± 2% +0.44% (p=0.018 n=38+39)
Template-4 1.85MB/s ± 1% 1.85MB/s ± 1% ~ (p=0.873 n=40+40)
[Geo mean] 6.66MB/s 6.67MB/s +0.26%
Change-Id: Icc972d8a78ea06c32c3aa15733ff0537c82c2dc7
Reviewed-on: https://go-review.googlesource.com/58950
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com> |
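Where MULA computes a + b*c on 32-bit integers, MULS computes a - b*c. The hypothetical functions below show that source shape. Go integer arithmetic wraps modulo 2^32 for uint32 and int32, so the result equals the low 32 bits that the instruction produces; the second call in the sketch exercises the wraparound case.

```go
package main

import "fmt"

// mulSub computes a - b*c with uint32 wraparound: the MULS shape.
func mulSub(a, b, c uint32) uint32 { return a - b*c }

// mulSubSigned is the int32 form; with two's complement the low 32 bits
// are identical to the unsigned computation.
func mulSubSigned(a, b, c int32) int32 { return a - b*c }

func main() {
	fmt.Println(mulSub(100, 7, 9))     // 100 - 63 = 37
	fmt.Println(mulSub(0, 1, 1))       // wraps around to 4294967295
	fmt.Println(mulSubSigned(5, 2, 3)) // -1
}
```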
|
|
|
a2f22a6803 |
cmd/compile: optimize ARM with more efficient MOVB/MOVBU/MOVH/MOVHU
Like the indexed MOVW (MOVWloadidx/MOVWstoreidx) already used in the current ARM backend, the indexed MOVB/MOVBU/MOVH/MOVHU can also be used to generate better-optimized ARM code. This patch implements this optimization. Here are some test results compared against the original Go compiler.
1. The total size of all .a files in pkg/ shrinks by 0.03%.
2. The compilecmp benchmark shows a slight regression.
name old time/op new time/op delta
Template 2.35s ± 1% 2.37s ± 3% +0.94% (p=0.006 n=19+19)
Unicode 1.33s ± 3% 1.33s ± 2% ~ (p=0.158 n=20+18)
GoTypes 7.86s ± 2% 7.84s ± 1% ~ (p=0.284 n=19+18)
Compiler 37.5s ± 1% 37.7s ± 2% ~ (p=0.101 n=20+19)
SSA 83.4s ± 2% 83.6s ± 2% ~ (p=0.231 n=20+20)
Flate 1.46s ± 2% 1.45s ± 1% ~ (p=0.097 n=20+17)
GoParser 1.86s ± 2% 1.86s ± 4% ~ (p=0.738 n=20+20)
Reflect 5.10s ± 1% 5.11s ± 1% ~ (p=0.290 n=20+18)
Tar 1.78s ± 2% 1.77s ± 2% ~ (p=0.166 n=19+20)
XML 2.61s ± 2% 2.61s ± 2% ~ (p=0.665 n=19+19)
[Geo mean] 4.67s 4.68s +0.16%
name old user-time/op new user-time/op delta
Template 2.79s ± 3% 2.80s ± 2% ~ (p=0.662 n=20+20)
Unicode 1.62s ± 3% 1.64s ± 4% ~ (p=0.252 n=20+20)
GoTypes 9.58s ± 2% 9.62s ± 2% ~ (p=0.250 n=20+20)
Compiler 46.2s ± 1% 46.2s ± 1% ~ (p=0.602 n=20+19)
SSA 108s ± 1% 108s ± 2% ~ (p=0.242 n=18+20)
Flate 1.69s ± 3% 1.69s ± 4% ~ (p=0.470 n=20+20)
GoParser 2.16s ± 3% 2.20s ± 4% +1.70% (p=0.005 n=19+20)
Reflect 6.02s ± 2% 6.02s ± 2% ~ (p=0.700 n=20+17)
Tar 2.11s ± 2% 2.11s ± 3% ~ (p=0.480 n=18+20)
XML 3.07s ± 2% 3.11s ± 4% +1.50% (p=0.043 n=20+20)
[Geo mean] 5.61s 5.64s +0.55%
name old text-bytes new text-bytes delta
HelloSize 586kB ± 0% 586kB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 5.46kB ± 0% 5.46kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 72.9kB ± 0% 72.9kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.03MB ± 0% 1.03MB ± 0% ~ (all equal)
3. The go1 benchmark shows an overall improvement, including an improvement of more than 10% in Revcomp.
name old time/op new time/op delta
BinaryTree17-4 42.0s ± 1% 41.5s ± 1% -1.32% (p=0.000 n=39+40)
Fannkuch11-4 24.1s ± 1% 23.6s ± 0% -2.38% (p=0.000 n=40+40)
FmtFprintfEmpty-4 843ns ± 0% 839ns ± 1% -0.46% (p=0.000 n=33+40)
FmtFprintfString-4 1.44µs ± 1% 1.37µs ± 1% -5.48% (p=0.000 n=40+35)
FmtFprintfInt-4 1.44µs ± 1% 1.41µs ± 2% -1.50% (p=0.000 n=40+40)
FmtFprintfIntInt-4 2.07µs ± 1% 2.06µs ± 0% -0.78% (p=0.000 n=40+40)
FmtFprintfPrefixedInt-4 2.50µs ± 1% 2.33µs ± 1% -6.85% (p=0.000 n=40+40)
FmtFprintfFloat-4 4.36µs ± 1% 4.34µs ± 0% -0.39% (p=0.017 n=40+40)
FmtManyArgs-4 8.11µs ± 0% 8.00µs ± 0% -1.37% (p=0.000 n=40+40)
GobDecode-4 105ms ± 2% 103ms ± 2% -2.17% (p=0.000 n=39+39)
GobEncode-4 90.1ms ± 2% 88.6ms ± 1% -1.67% (p=0.000 n=40+39)
Gzip-4 4.18s ± 1% 4.09s ± 1% -2.03% (p=0.000 n=40+40)
Gunzip-4 608ms ± 1% 603ms ± 1% -0.86% (p=0.000 n=40+34)
HTTPClientServer-4 674µs ± 3% 661µs ± 2% -1.82% (p=0.000 n=40+39)
JSONEncode-4 256ms ± 1% 243ms ± 0% -5.11% (p=0.000 n=39+31)
JSONDecode-4 915ms ± 1% 904ms ± 1% -1.18% (p=0.000 n=40+36)
Mandelbrot200-4 49.2ms ± 0% 49.3ms ± 0% ~ (p=0.254 n=34+40)
GoParse-4 46.9ms ± 2% 46.9ms ± 1% ~ (p=0.737 n=40+39)
RegexpMatchEasy0_32-4 1.28µs ± 1% 1.27µs ± 1% -0.71% (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4 7.86µs ± 4% 7.67µs ± 4% -2.46% (p=0.000 n=38+40)
RegexpMatchEasy1_32-4 1.28µs ± 1% 1.28µs ± 1% -0.54% (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4 10.4µs ± 2% 10.3µs ± 2% -0.88% (p=0.003 n=40+39)
RegexpMatchMedium_32-4 2.05µs ± 0% 2.04µs ± 0% -0.34% (p=0.000 n=40+33)
RegexpMatchMedium_1K-4 541µs ± 1% 535µs ± 1% -1.02% (p=0.000 n=40+38)
RegexpMatchHard_32-4 29.3µs ± 1% 29.1µs ± 1% -0.51% (p=0.000 n=40+40)
RegexpMatchHard_1K-4 881µs ± 1% 871µs ± 1% -1.15% (p=0.000 n=40+40)
Revcomp-4 81.7ms ± 2% 67.5ms ± 2% -17.37% (p=0.000 n=39+39)
Template-4 1.05s ± 1% 1.08s ± 2% +3.67% (p=0.000 n=40+40)
TimeParse-4 7.24µs ± 1% 7.09µs ± 1% -2.13% (p=0.000 n=40+40)
TimeFormat-4 13.2µs ± 1% 13.1µs ± 0% -0.31% (p=0.007 n=40+31)
[Geo mean] 733µs 718µs -2.03%
name old speed new speed delta
GobDecode-4 7.28MB/s ± 2% 7.44MB/s ± 2% +2.23% (p=0.000 n=39+39)
GobEncode-4 8.52MB/s ± 2% 8.67MB/s ± 1% +1.70% (p=0.000 n=40+39)
Gzip-4 4.65MB/s ± 1% 4.74MB/s ± 1% +1.94% (p=0.000 n=37+40)
Gunzip-4 31.9MB/s ± 1% 32.2MB/s ± 1% +0.90% (p=0.000 n=40+36)
JSONEncode-4 7.57MB/s ± 1% 7.98MB/s ± 0% +5.41% (p=0.000 n=40+31)
JSONDecode-4 2.12MB/s ± 1% 2.15MB/s ± 1% +1.23% (p=0.000 n=40+40)
GoParse-4 1.23MB/s ± 1% 1.23MB/s ± 1% ~ (p=0.769 n=39+40)
RegexpMatchEasy0_32-4 25.0MB/s ± 1% 25.2MB/s ± 1% +0.71% (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4 130MB/s ± 5% 134MB/s ± 4% +2.53% (p=0.000 n=38+40)
RegexpMatchEasy1_32-4 24.9MB/s ± 1% 25.1MB/s ± 1% +0.55% (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4 98.5MB/s ± 2% 99.4MB/s ± 2% +0.88% (p=0.003 n=40+39)
RegexpMatchMedium_32-4 490kB/s ± 0% 490kB/s ± 0% ~ (all equal)
RegexpMatchMedium_1K-4 1.89MB/s ± 1% 1.91MB/s ± 1% +1.02% (p=0.000 n=40+38)
RegexpMatchHard_32-4 1.10MB/s ± 1% 1.10MB/s ± 0% +0.41% (p=0.000 n=40+33)
RegexpMatchHard_1K-4 1.16MB/s ± 1% 1.17MB/s ± 1% +1.21% (p=0.000 n=40+40)
Revcomp-4 31.1MB/s ± 2% 37.6MB/s ± 2% +21.03% (p=0.000 n=39+39)
Template-4 1.86MB/s ± 1% 1.79MB/s ± 1% -3.51% (p=0.000 n=40+38)
[Geo mean] 6.66MB/s 6.80MB/s +2.13%
fixes #21492
Change-Id: Ia26e7ca393f0a5f31de240e8ff9a220453ca7e0d
Reviewed-on: https://go-review.googlesource.com/58450
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> |
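Indexed addressing forms the address as base register plus index register, so an access like `s[i]` in a loop does not need a separate ADD to compute `&s[i]` first. The loops below are hypothetical examples of byte and halfword accesses of that shape. Which SSA ops they actually lower to (by analogy with MOVWloadidx, presumably byte and halfword `...loadidx`/`...storeidx` variants) depends on the compiler version and on rule matching, so treat the comments as a sketch of intent, not a guarantee.

```go
package main

import "fmt"

// sumBytes loads s[i] as an unsigned byte at base+index (the MOVBU shape).
func sumBytes(s []byte) uint32 {
	var sum uint32
	for i := 0; i < len(s); i++ {
		sum += uint32(s[i])
	}
	return sum
}

// sumHalfwords loads s[i] as a signed halfword (the MOVH shape).
func sumHalfwords(s []int16) int32 {
	var sum int32
	for i := 0; i < len(s); i++ {
		sum += int32(s[i])
	}
	return sum
}

// reverseInto writes src into dst in reverse order, mixing indexed byte
// loads and indexed byte stores. dst must be at least len(src) long.
func reverseInto(dst, src []byte) {
	n := len(src)
	for i := 0; i < n; i++ {
		dst[i] = src[n-1-i]
	}
}

func main() {
	fmt.Println(sumBytes([]byte{1, 2, 3, 250}))    // 256
	fmt.Println(sumHalfwords([]int16{-1, 2, -3})) // -2
	out := make([]byte, 3)
	reverseInto(out, []byte("abc"))
	fmt.Println(string(out)) // cba
}
```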
|
|
|
1e72bf6218 |
cmd/compile: experiment which clobbers all dead pointer fields
The experiment "clobberdead" clobbers all pointer fields that the compiler thinks are dead, just before and after every safepoint. Useful for debugging the generation of live pointer bitmaps. Helped find the following issues: Update #15936 Update #16026 Update #16095 Update #18860 Change-Id: Id1d12f86845e3d93bae903d968b1eac61fc461f9 Reviewed-on: https://go-review.googlesource.com/23924 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
01b1a34aac |
cmd/compile: rework handling of udiv on ARM
Instead of populating the aux symbol of CALLudiv during rewrite rules, populate it during genssa. This simplifies the rewrite rules. It also removes all remaining calls to ctxt.Lookup from any rewrite rules. This is a first step towards removing ctxt from ssa.Cache entirely, and also a first step towards converting the obj.LSym.Version field into a boolean. It should also speed up compilation. Also, move func udiv into package runtime. That's where it is anyway, and it lets udiv look and act like the rest of the runtime support functions. Change-Id: I41462a632c14fdc41f61b08049ec13cd80a87bfe Reviewed-on: https://go-review.googlesource.com/41191 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> |
|
|
|
8577f81a10 |
cmd/compile/internal: Optimization with RBIT and REV
By checking GOARM in ssa/gen/ARM.rules, each intermediate operator can be implemented with different instruction sequences. It is up to the user to choose between compatibility and efficiency via GOARM. Bswap32(x) is optimized to REV(x) when GOARM >= 6. CTZ(x) is optimized to CLZ(RBIT x) when GOARM == 7. Change-Id: Ie9ee645fa39333fa79ad84ed4d1cefac30422814 Reviewed-on: https://go-review.googlesource.com/35610 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
691755304c |
cmd/compile/internal/ssa: populate SymEffects for SSA Ops
Changes to ${GOARCH}Ops.go files were mechanically produced using
github.com/mdempsky/ssa-symops, a one-off tool that inserts
"SymEffect: X" elements by pattern matching against the Op names.
Change-Id: Ibf3e481ffd588647f2a31662d72114b740ccbfcf
Reviewed-on: https://go-review.googlesource.com/38084
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
|
|
|
|
cc71aa9ac4 |
cmd/compile/internal/ssa: make ARM's udiv like other calls
Passes toolstash-check -all. Change-Id: Id389f8158cf33a3c0fcef373615b5351e7c74b5b Reviewed-on: https://go-review.googlesource.com/38082 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
08d8d5c986 |
cmd/compile/internal/ssa: replace {Defer,Go}Call with StaticCall
Passes toolstash-check -all. Change-Id: Icf8b75364e4761a5e56567f503b2c1cb17382ed2 Reviewed-on: https://go-review.googlesource.com/38080 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
6fd5e2549a |
cmd/compile: mark MOVWF/MOVFW clobbering F15 on ARM
The assembler back end uses F15 as a temporary register in these instructions. Checked the assembler back end and made sure that this is the only case clobbering F15. Fixes #19403. Change-Id: I02b9e00fdd9229db899f501c8e9b306e02912d83 Reviewed-on: https://go-review.googlesource.com/37792 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
067bab00a8 |
all: fix misspellings
Change-Id: I429637ca91f7db4144f17621de851a548dc1ce76 Reviewed-on: https://go-review.googlesource.com/34923 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> |
|
|
|
f9238a76ff |
cmd/compile: make LR allocatable in non-leaf functions on ARM
The mechanism was initially introduced (and reviewed) in CL 30597 on S390X. Reduces the number of "spilled value remains" by 0.4% in cmd/go. Disabled on ARMv5 because LR is clobbered almost everywhere by inserted softfloat calls. Change-Id: I2934737ce2455909647ed2118fe2bd6f0aa5ac52 Reviewed-on: https://go-review.googlesource.com/32178 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
15817e409b |
cmd/compile: make link register allocatable in non-leaf functions
We save and restore the link register in non-leaf functions because it is clobbered by CALLs. It is therefore available for general purpose use. Only enabled on s390x currently. The RC4 benchmarks in particular benefit from the extra register:
name old speed new speed delta
RC4_128 243MB/s ± 2% 341MB/s ± 2% +40.46% (p=0.008 n=5+5)
RC4_1K 267MB/s ± 0% 359MB/s ± 1% +34.32% (p=0.008 n=5+5)
RC4_8K 271MB/s ± 0% 362MB/s ± 0% +33.61% (p=0.008 n=5+5)
Change-Id: Id23bff95e771da9425353da2f32668b8e34ba09f Reviewed-on: https://go-review.googlesource.com/30597 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
|
|
|
98938189a1 |
cmd/compile: remove duplicate nilchecks
Mark nil check operations as faulting if their arg is zero. This lets the late nilcheck pass remove duplicates. Fixes #17242. Change-Id: I4c9938d8a5a1e43edd85b4a66f0b34004860bcd9 Reviewed-on: https://go-review.googlesource.com/29952 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
38cd79889e |
cmd/compile: simplify div/mod on ARM
On ARM, DIV, DIVU, MOD, and MODU are pseudo-instructions that make runtime calls to _div/_udiv/_mod/_umod, which are themselves wrappers around udiv. The udiv function does the real work. Instead of generating these pseudo-instructions, call udiv directly. This removes one layer of wrappers (which have an awkward way of passing arguments), and also allows combining DIV and MOD if both results are needed. Change-Id: I118afc3986db3a1daabb5c1e6e57430888c91817 Reviewed-on: https://go-review.googlesource.com/29390 Reviewed-by: David Chase <drchase@google.com> |
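Combining DIV and MOD works because one call can yield both the quotient and the remainder. The hypothetical `udivSketch` below illustrates that contract with textbook shift-and-subtract (restoring) long division; it is not the runtime's udiv, whose algorithm this commit does not describe.

```go
package main

import "fmt"

// udivSketch returns x/y and x%y from a single pass of restoring long
// division (hypothetical; not the runtime's udiv).
func udivSketch(x, y uint32) (q, r uint32) {
	if y == 0 {
		panic("udivSketch: division by zero")
	}
	var rem uint64 // 64 bits so rem<<1 cannot overflow when y >= 1<<31
	d := uint64(y)
	for i := 31; i >= 0; i-- {
		// Bring down the next bit of x into the partial remainder.
		rem = rem<<1 | uint64(x>>uint(i)&1)
		if rem >= d {
			rem -= d
			q |= 1 << uint(i)
		}
	}
	return q, uint32(rem)
}

func main() {
	q, r := udivSketch(100, 7) // one call serves both x/y and x%y
	fmt.Println(q, r)          // 14 2
}
```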
|
|
|
3134ab3c2d |
cmd/compile: redo nil checks
Get rid of BlockCheck. Josh goaded me into it, and I went down a rabbithole making it happen. NilCheck now panics if the pointer is nil and returns void, as before. BlockCheck is gone, and NilCheck is no longer a Control value for any block. It just exists (and deadcode knows not to throw it away). I rewrote the nilcheckelim pass to handle this case. In particular, there can now be multiple NilCheck ops per block. I moved all of the arch-dependent nil check elimination done as part of ssaGenValue into its own proper pass, so we don't have to duplicate that code for every architecture. Making the arch-dependent nil check its own pass means I needed to add a bunch of flags to the opcode table so I could write the code without arch-dependent ops everywhere. Change-Id: I419f891ac9b0de313033ff09115c374163416a9f Reviewed-on: https://go-review.googlesource.com/29120 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
c345a3913f |
cmd/compile: get rid of BlockCall
No need for it; we can treat calls as (mostly) normal values that take a memory and return a memory. Lowers the number of basic blocks needed to represent a function. "go test -c net/http" uses 27% fewer basic blocks. Probably doesn't affect generated code much, but should help various passes whose running time and/or space depends on the number of basic blocks. Fixes #15631 Change-Id: I0bf21e123f835e2cfa382753955a4f8bce03dfa6 Reviewed-on: https://go-review.googlesource.com/28950 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> |
|
|
|
8ff4260777 |
cmd/compile: intrinsify Ctz, Bswap on ARM
Atomic ops on ARM are implemented with kernel calls, so they are not intrinsified. Change-Id: I0e7cc2e5526ae1a3d24b4b89be1bd13db071f8ef Reviewed-on: https://go-review.googlesource.com/28977 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
320ddcf834 |
cmd/compile: inline atomics from runtime/internal/atomic on amd64
Inline atomic reads and writes on amd64. There's no reason to pay the overhead of a call for these. To keep atomic loads from being reordered, we make them return a <value,memory> tuple. Change the meaning of resultInArg0 for tuple-generating ops to mean the first part of the result tuple, not the second. This means we can always put the store part of the tuple last, matching how arguments are laid out. This requires reordering the outputs of add32carry and sub32carry and their descendants in various architectures.
benchmark                 old ns/op  new ns/op  delta
BenchmarkAtomicLoad64-8   2.09       0.26       -87.56%
BenchmarkAtomicStore64-8  7.54       5.72       -24.14%
TBD (in a different CL): Cas, Or8, ... Change-Id: I713ea88e7da3026c44ea5bdb56ed094b20bc5207 Reviewed-on: https://go-review.googlesource.com/27641 Reviewed-by: Cherry Zhang <cherryyz@google.com> |
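Returning a `<value,memory>` tuple orders an atomic load by data dependence. Anything that must happen after the load consumes the memory state the load produced. The sketch below is a toy model of that threading, not the compiler's representation. `Mem`, `atomicLoad64`, and `atomicStore64` are hypothetical names, and the plain `*p` accesses are not actually atomic. They only stand in for the ops.

```go
package main

import "fmt"

// Mem is a hypothetical stand-in for an SSA memory state: an immutable
// record of the memory operations that have happened so far.
type Mem struct{ log []string }

// then returns a new memory state with op appended, leaving m unchanged.
func (m Mem) then(op string) Mem {
	next := make([]string, len(m.log), len(m.log)+1)
	copy(next, m.log)
	return Mem{log: append(next, op)}
}

// atomicLoad64 models a tuple-producing load: it consumes a memory state
// and yields both the loaded value and a new memory state.
func atomicLoad64(p *uint64, m Mem) (uint64, Mem) {
	return *p, m.then("AtomicLoad64")
}

// atomicStore64 consumes a memory state and yields a new one.
func atomicStore64(p *uint64, v uint64, m Mem) Mem {
	*p = v
	return m.then("AtomicStore64")
}

func main() {
	x := uint64(41)
	var m0 Mem
	v, m1 := atomicLoad64(&x, m0)    // the load's memory output...
	m2 := atomicStore64(&x, v+1, m1) // ...is the store's memory input
	fmt.Println(x, m2.log)
}
```

Because the store takes `m1` as input, no scheduler can legally move it above the load, even though the load's value and memory results are separate.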
|
|
|
0484052358 |
[dev.ssa] cmd/compile: remove flags from regMask
Reg allocator skips flag-typed values. Flag allocator uses the type and whether the op has "clobberFlags" set. Tested on AMD64, ARM, ARM64, 386. Passed 'toolstash -cmp' on AMD64. PPC64 is coded blindly. Change-Id: Ib1cc27efecef6a1bb27f7d7ed035a582660d244f Reviewed-on: https://go-review.googlesource.com/25480 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
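Once flags are handled outside regMask, a separate flag allocator has to find flag uses whose producer was overwritten by a clobbering op and regenerate the flags there. The sketch below is a hypothetical single-block model of that check. The `Op` fields mirror the idea of a flags-typed result and the "clobberFlags" property, but they are not the compiler's data structures.

```go
package main

import "fmt"

// Op is a hypothetical instruction in a straight-line block.
type Op struct {
	Name          string
	SetsFlags     bool // produces a flags value (e.g. a compare)
	ClobbersFlags bool // destroys the flags register as a side effect
	UsesFlagsOf   int  // index of the flags producer it reads, or -1
}

// staleFlagUses returns the indices of ops whose flags input was
// overwritten before the use, i.e. where a flag allocator would have to
// recompute the flags value.
func staleFlagUses(block []Op) []int {
	live := -1 // index of the op whose flags are currently in the flags register
	var stale []int
	for i, op := range block {
		if op.UsesFlagsOf >= 0 && op.UsesFlagsOf != live {
			stale = append(stale, i)
			live = op.UsesFlagsOf // assume the flags are regenerated just before this use
		}
		switch {
		case op.SetsFlags:
			live = i
		case op.ClobbersFlags:
			live = -1
		}
	}
	return stale
}

func main() {
	block := []Op{
		{Name: "CMP", SetsFlags: true, UsesFlagsOf: -1},
		{Name: "CALL", ClobbersFlags: true, UsesFlagsOf: -1},
		{Name: "CMOV", UsesFlagsOf: 0},
	}
	fmt.Println(staleFlagUses(block))
}
```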
|
|
|
114c05962c |
[dev.ssa] cmd/compile: fix possible invalid pointer spill in large Zero/Move on ARM
Instead of comparing against the address of the end of the memory to zero/copy, compare against the address of the last element, which is a valid pointer. Also unify large and unaligned Zero/Move by passing the alignment as AuxInt. Fixes #16515 for ARM. Change-Id: I19a62b31c5acf5c55c16a89bea1039c926dc91e5 Reviewed-on: https://go-review.googlesource.com/25300 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
d8181d5d75 |
[dev.ssa] cmd/compile: simplify MOVWreg on ARM
For register-register move, if there is only one use, allocate it in the same register so we don't need to emit an instruction. Updates #15365. Change-Id: Iad41843854a506c521d577ad93fcbe73e8de8065 Reviewed-on: https://go-review.googlesource.com/25059 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
7b9873b9b9 |
[dev.ssa] cmd/internal/obj, etc.: add and use NEGF, NEGD instructions on ARM
Updates #15365. Change-Id: I372a5617c2c7d91de545cac0464809b96711b63a Reviewed-on: https://go-review.googlesource.com/24646 Run-TryBot: Cherry Zhang <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
25e0a367da |
[dev.ssa] cmd/compile: clean up tuple types and selects
Make tuple types and their SelectX ops fully generic. These ops no longer need to be lowered. Regalloc understands them and their tuple-generating arguments. We can now have opcodes returning arbitrary pairs of results. (And it would be easy to move to >2 results if needed.) Update arm implementation to the new standard. Implement just enough in 386 port to do 64-bit add. Change-Id: I370ed5aacce219c82e1954c61d1f63af76c16f79 Reviewed-on: https://go-review.googlesource.com/24976 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> |
|
|
|
7d70f84f54 |
[dev.ssa] cmd/compile: add floating point optimizations in SSA for ARM
Add some simplification rules for floating point ops. cmd/internal/obj/arm supports instructions that compare an FP register to 0, but the runtime softfloat simulator does not. This CL adds these instructions to the softfloat simulator as well. Updates #15365. Change-Id: I29405b2bfcb4c8cf106cb7a1a811409fec91b170 Reviewed-on: https://go-review.googlesource.com/24790 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
8cc3f4a17e |
[dev.ssa] cmd/compile: use shifted and indexed ops in SSA for ARM
This CL implements the following optimizations for ARM:
- use shifted ops (e.g. ADD R1<<2, R2) and indexed loads/stores
- break up shift ops. Shifts used to be one SSA op that generated multiple instructions. We break them up into multiple ops, which allows constant folding and CSE for comparisons. Conditional moves are introduced for this.
- simplify zero/sign-extension ops.
Updates #15365. Change-Id: I55e262a776a7ef2a1505d75e04d1208913c35d39 Reviewed-on: https://go-review.googlesource.com/24512 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
8599fdd9b6 |
[dev.ssa] cmd/compile: add some ARM optimization rewriting rules
Mostly constant-folding rules, analogous to the AMD64 ones, along with some simplifications. Updates #15365. Change-Id: If83bc1188bb05acb982ef3a1c21704c187e3eb24 Reviewed-on: https://go-review.googlesource.com/24210 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
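To show what a constant-folding rewrite rule does, the sketch below applies three rules of that kind to a tiny expression tree. The rules are written in the spirit of the SSA `.rules` notation but are not copied from the ARM rules file. The `Value` type and `fold` function are hypothetical. Arithmetic is done in `int32` so that folding wraps like a 32-bit register.

```go
package main

import "fmt"

// Value is a hypothetical, minimal SSA value: an opcode, a 32-bit
// auxiliary constant, and arguments.
type Value struct {
	Op     string
	AuxInt int32
	Args   []*Value
}

// fold applies these illustrative rules bottom-up:
//
//	(ADDconst [c] (MOVWconst [d]))  -> (MOVWconst [c+d])
//	(ADDconst [c] (ADDconst [d] x)) -> (ADDconst [c+d] x)
//	(ADDconst [0] x)                -> x
//
// int32 addition wraps, matching 32-bit ARM registers.
func fold(v *Value) *Value {
	if v.Op != "ADDconst" {
		return v
	}
	x := fold(v.Args[0])
	switch {
	case x.Op == "MOVWconst":
		return &Value{Op: "MOVWconst", AuxInt: v.AuxInt + x.AuxInt}
	case x.Op == "ADDconst":
		return fold(&Value{Op: "ADDconst", AuxInt: v.AuxInt + x.AuxInt, Args: x.Args})
	case v.AuxInt == 0:
		return x
	}
	return &Value{Op: "ADDconst", AuxInt: v.AuxInt, Args: []*Value{x}}
}

func main() {
	c := &Value{Op: "MOVWconst", AuxInt: 40}
	got := fold(&Value{Op: "ADDconst", AuxInt: 2, Args: []*Value{c}})
	fmt.Println(got.Op, got.AuxInt)
}
```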