mirror of https://github.com/golang/go.git
165 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
fe2cfb74ba |
all: drop 387 support
My last 387 CL. So sad ... ... ... ... not!
Fixes #40255
Change-Id: I8d4ddb744b234b8adc735db2f7c3c7b6d8bbdfa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/258957
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
5c2c6d3fbf |
runtime: framepointers are no longer an experiment - hard code them
I think they no longer have experimental status. Might as well promote them to permanent.
Change-Id: Id1259601b3dd2061dd60df86ee48080bfb575d2f
Reviewed-on: https://go-review.googlesource.com/c/go/+/249857
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com> |
|
|
|
48403b268b |
cmd/compile: error if register is reused when setting edge state
When setting the edge state in register allocation we should only be setting each register once. It is not possible for a register to hold multiple values at once.
This CL converts the runtime error seen in #38195 into an internal compiler error (ICE). It is better for the compiler to fail than generate an incorrect program.
The bug reported in #38195 is now exposed as:
./parserc.go:459:11: internal compiler error: 'yaml_parser_parse_node': R5 is already set (v1074/v1241)
[stack trace]
Updates #38195.
Change-Id: Id95842fd850b95494cbd472b6fd5a55513ecacec
Reviewed-on: https://go-review.googlesource.com/c/go/+/228060
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> |
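The invariant this commit enforces (each register is assigned at most once when setting edge state) can be sketched as a small map-based check. This is an illustrative toy, not the compiler's code: the `edgeState` type, its string-keyed map, and the `set` method are all hypothetical simplifications.

```go
package main

import "fmt"

// edgeState is a hypothetical, much simplified stand-in for the register
// allocator's per-edge bookkeeping: it records which value each register
// must hold when control enters a successor block.
type edgeState struct {
	contents map[string]string // register name -> value ID
}

// set records that reg must hold val on this edge. A register can hold
// only one value, so a second assignment is reported as an error instead
// of silently overwriting the first (the compiler turns this situation
// into an internal compiler error).
func (e *edgeState) set(reg, val string) error {
	if old, ok := e.contents[reg]; ok {
		return fmt.Errorf("%s is already set (%s/%s)", reg, old, val)
	}
	e.contents[reg] = val
	return nil
}

func main() {
	e := &edgeState{contents: map[string]string{}}
	fmt.Println(e.set("R5", "v1074")) // <nil>
	fmt.Println(e.set("R5", "v1241")) // R5 is already set (v1074/v1241)
}
```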
|
|
|
382fe3e249 |
cmd/compile: fix deallocation of live value copies in regalloc
When deallocating the input register to a phi so that the phi itself could be allocated to that register, the code was also deallocating all copies of that phi input value. Those copies of the value could still be live, and if they were, the register allocator could reuse them incorrectly to hold speculative copies of other phi inputs. This causes strange bugs.
No test because this is a very obscure scenario that is hard to replicate, but CL 228060 adds an assertion to the compiler that does trigger when running the std tests on linux/s390x without this CL applied. Hopefully that assertion will prevent future regressions.
Fixes #38195.
Change-Id: Id975dadedd731c7bb21933b9ea6b17daaa5c9e1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/228061
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
1b3a1db19f |
cmd/compile: fix liveness for open-coded defer args for infinite loops
Once defined, a stack slot holding an open-coded defer arg should always be marked live, since it may be used at any time if there is a panic. These stack slots are typically kept live naturally by the open-defer code inlined at each return/exit point. However, we need to do extra work to make sure that they are kept live if a function has an infinite loop or a panic exit.
For this fix, only in the case of a function that is using open-coded defers, we compute the set of blocks (most often empty) that cannot reach a return or a BlockExit (panic) because of an infinite loop. Then, for each block b which cannot reach a return or BlockExit or is a BlockExit block, we mark each defer arg slot as live, as long as the definition of the defer arg slot dominates block b.
For this change, we had to export (*Func).sdom (-> Sdom) and SparseTree.isAncestorEq (-> IsAncestorEq).
Updates #35277
Change-Id: I7b53c9bd38ba384a3794386dd0eb94e4cbde4eb1
Reviewed-on: https://go-review.googlesource.com/c/go/+/204802
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
07b4abd62e |
all: remove the nacl port (part 2, amd64p32 + toolchain)
This is part two of the nacl removal. Part 1 was CL 199499.
This CL removes amd64p32 support, which might be useful in the future if we implement the x32 ABI. It also removes the nacl bits in the toolchain, and some remaining nacl bits.
Updates #30439
Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/200077
Reviewed-by: Ian Lance Taylor <iant@golang.org> |
|
|
|
72dc9ab191 |
cmd/compile: reuse dead register before reusing register holding constant
For commuting ops, check whether the second argument is dead before checking if the first argument is rematerializeable. Reusing the register holding a dead value is always best.
Fixes #33580
Change-Id: I7372cfc03d514e6774d2d9cc727a3e6bf6ce2657
Reviewed-on: https://go-review.googlesource.com/c/go/+/199559
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com> |
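The ordering described above can be sketched as a priority check over the two inputs of a commutative instruction. The `operand` type and the `clobberChoice` function are hypothetical illustrations, not the register allocator's actual data structures:

```go
package main

import "fmt"

// operand is a hypothetical summary of one input to an instruction, as the
// register allocator might see it.
type operand struct {
	name  string
	dead  bool // no uses remain after this instruction
	remat bool // cheap to recompute (for example, a constant)
}

// clobberChoice picks which input register a two-operand commutative
// instruction should overwrite with its result. Because the operands can be
// swapped, either register will do. A dead input is the best victim, so
// both inputs are checked for deadness before falling back to an input
// that can be rematerialized later.
func clobberChoice(a, b operand) operand {
	switch {
	case a.dead:
		return a
	case b.dead:
		return b
	case a.remat:
		return a
	case b.remat:
		return b
	}
	return a // neither: a must be copied before it is overwritten
}

func main() {
	c := operand{name: "const42", remat: true}
	x := operand{name: "x", dead: true}
	fmt.Println(clobberChoice(c, x).name) // x
}
```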
|
|
|
9c2e7e8bed |
cmd/compile: allow multiple SSA block control values
Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value.
Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block.
This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL.
Passes toolstash-check -all.
Results of compilebench:
name                      old time/op       new time/op       delta
Template                  208ms ± 1%        209ms ± 1%        ~       (p=0.289 n=20+20)
Unicode                   83.7ms ± 1%       83.3ms ± 3%       -0.49%  (p=0.017 n=18+18)
GoTypes                   748ms ± 1%        748ms ± 0%        ~       (p=0.460 n=20+18)
Compiler                  3.47s ± 1%        3.48s ± 1%        ~       (p=0.070 n=19+18)
SSA                       11.5s ± 1%        11.7s ± 1%        +1.64%  (p=0.000 n=19+18)
Flate                     130ms ± 1%        130ms ± 1%        ~       (p=0.588 n=19+20)
GoParser                  160ms ± 1%        161ms ± 1%        ~       (p=0.211 n=20+20)
Reflect                   465ms ± 1%        467ms ± 1%        +0.42%  (p=0.007 n=20+20)
Tar                       184ms ± 1%        185ms ± 2%        ~       (p=0.087 n=18+20)
XML                       253ms ± 1%        253ms ± 1%        ~       (p=0.377 n=20+18)
LinkCompiler              769ms ± 2%        774ms ± 2%        ~       (p=0.070 n=19+19)
ExternalLinkCompiler      3.59s ±11%        3.68s ± 6%        ~       (p=0.072 n=20+20)
LinkWithoutDebugCompiler  446ms ± 5%        454ms ± 3%        +1.79%  (p=0.002 n=19+20)
StdCmd                    26.0s ± 2%        26.0s ± 2%        ~       (p=0.799 n=20+20)
name                      old user-time/op  new user-time/op  delta
Template                  238ms ± 5%        240ms ± 5%        ~       (p=0.142 n=20+20)
Unicode                   105ms ±11%        106ms ±10%        ~       (p=0.512 n=20+20)
GoTypes                   876ms ± 2%        873ms ± 4%        ~       (p=0.647 n=20+19)
Compiler                  4.17s ± 2%        4.19s ± 1%        ~       (p=0.093 n=20+18)
SSA                       13.9s ± 1%        14.1s ± 1%        +1.45%  (p=0.000 n=18+18)
Flate                     145ms ±13%        146ms ± 5%        ~       (p=0.851 n=20+18)
GoParser                  185ms ± 5%        188ms ± 7%        ~       (p=0.174 n=20+20)
Reflect                   534ms ± 3%        538ms ± 2%        ~       (p=0.105 n=20+18)
Tar                       215ms ± 4%        211ms ± 9%        ~       (p=0.079 n=19+20)
XML                       295ms ± 6%        295ms ± 5%        ~       (p=0.968 n=20+20)
LinkCompiler              832ms ± 4%        837ms ± 7%        ~       (p=0.707 n=17+20)
ExternalLinkCompiler      1.58s ± 8%        1.60s ± 4%        ~       (p=0.296 n=20+19)
LinkWithoutDebugCompiler  478ms ±12%        489ms ±10%        ~       (p=0.429 n=20+20)
name                      old object-bytes  new object-bytes  delta
Template                  559kB ± 0%        559kB ± 0%        ~       (all equal)
Unicode                   216kB ± 0%        216kB ± 0%        ~       (all equal)
GoTypes                   2.03MB ± 0%       2.03MB ± 0%       ~       (all equal)
Compiler                  8.07MB ± 0%       8.07MB ± 0%       -0.06%  (p=0.000 n=20+20)
SSA                       27.1MB ± 0%       27.3MB ± 0%       +0.89%  (p=0.000 n=20+20)
Flate                     343kB ± 0%        343kB ± 0%        ~       (all equal)
GoParser                  441kB ± 0%        441kB ± 0%        ~       (all equal)
Reflect                   1.36MB ± 0%       1.36MB ± 0%       ~       (all equal)
Tar                       487kB ± 0%        487kB ± 0%        ~       (all equal)
XML                       632kB ± 0%        632kB ± 0%        ~       (all equal)
name                      old export-bytes  new export-bytes  delta
Template                  18.5kB ± 0%       18.5kB ± 0%       ~       (all equal)
Unicode                   7.92kB ± 0%       7.92kB ± 0%       ~       (all equal)
GoTypes                   35.0kB ± 0%       35.0kB ± 0%       ~       (all equal)
Compiler                  109kB ± 0%        110kB ± 0%        +0.72%  (p=0.000 n=20+20)
SSA                       137kB ± 0%        138kB ± 0%        +0.58%  (p=0.000 n=20+20)
Flate                     4.89kB ± 0%       4.89kB ± 0%       ~       (all equal)
GoParser                  8.49kB ± 0%       8.49kB ± 0%       ~       (all equal)
Reflect                   11.4kB ± 0%       11.4kB ± 0%       ~       (all equal)
Tar                       10.5kB ± 0%       10.5kB ± 0%       ~       (all equal)
XML                       16.7kB ± 0%       16.7kB ± 0%       ~       (all equal)
name                      old text-bytes    new text-bytes    delta
HelloSize                 761kB ± 0%        761kB ± 0%        ~       (all equal)
CmdGoSize                 10.8MB ± 0%       10.8MB ± 0%       ~       (all equal)
name                      old data-bytes    new data-bytes    delta
HelloSize                 10.7kB ± 0%       10.7kB ± 0%       ~       (all equal)
CmdGoSize                 312kB ± 0%        312kB ± 0%        ~       (all equal)
name                      old bss-bytes     new bss-bytes     delta
HelloSize                 122kB ± 0%        122kB ± 0%        ~       (all equal)
CmdGoSize                 146kB ± 0%        146kB ± 0%        ~       (all equal)
name                      old exe-bytes     new exe-bytes     delta
HelloSize                 1.13MB ± 0%       1.13MB ± 0%       ~       (all equal)
CmdGoSize                 15.1MB ± 0%       15.1MB ± 0%       ~       (all equal)
Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
1c50fcf853 |
cmd/compile: add 32 bit float registers/variables on wasm
Before this change, wasm only used float variables with a size of 64 bits and applied rounding to 32-bit precision where necessary. This change adds proper 32-bit float variables.
Reduces the size of pkg/js_wasm by 254 bytes.
Change-Id: Ieabe846a8cb283d66def3cdf11e2523b3b31f345
Reviewed-on: https://go-review.googlesource.com/c/go/+/195117
Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
0efbd10157 |
all: fix typos
Use the following (suboptimal) script to obtain a list of possible
typos:
#!/usr/bin/env sh
set -x
git ls-files |\
  grep -e '\.\(c\|cc\|go\)$' |\
  xargs -n 1\
  awk\
  '/\/\// { gsub(/.*\/\//, ""); print; } /\/\*/, /\*\// { gsub(/.*\/\*/, ""); gsub(/\*\/.*/, ""); print; }' |\
  hunspell -d en_US -l |\
  grep '^[[:upper:]]\{0,1\}[[:lower:]]\{1,\}$' |\
  grep -v -e '^.\{1,4\}$' -e '^.\{16,\}$' |\
  sort -f |\
  uniq -c |\
  awk '$1 == 1 { print $2; }'
Then, go through the results manually and fix the most obvious typos in
the non-vendored code.
Change-Id: I3cb5830a176850e1a0584b8a40b47bde7b260eae
Reviewed-on: https://go-review.googlesource.com/c/go/+/193848
Reviewed-by: Robert Griesemer <gri@golang.org>
|
|
|
|
8a317ebc0f |
cmd/compile: don't eliminate all registers when restricting to desired ones
We shouldn't mask to desired registers if we haven't masked out all the forbidden registers yet. In this path we haven't masked out the nospill registers yet. If the resulting mask contains only nospill registers, then allocReg fails.
This can only happen on resultNotInArgs-marked instructions, which exist only on the ARM64, MIPS, MIPS64, and PPC64 ports.
Maybe there's a better way to handle resultNotInArgs instructions. But for 1.13, this is a low-risk fix.
Fixes #33355
Change-Id: I1082f78f798d1371bde65c58cc265540480e4fa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/188178
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com> |
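Why the masking order matters can be shown with plain bit masks. This sketch assumes a hypothetical `regMask` type with one bit per register and a hypothetical `chooseMask` helper; it mirrors the ordering described in the commit (remove forbidden registers first, then narrow to the desired ones only if any survive), not the exact code in regalloc:

```go
package main

import "fmt"

// regMask is a bit set of registers, one bit per register (a simplified
// assumption modeled on how the SSA backend represents register sets).
type regMask uint64

// chooseMask narrows the set of allowed registers. Forbidden (nospill)
// registers are removed first; only then is the set restricted to the
// desired registers, and only if at least one desired register survives.
func chooseMask(allowed, nospill, desired regMask) regMask {
	mask := allowed &^ nospill
	if mask&desired != 0 {
		mask &= desired
	}
	return mask
}

func main() {
	// allowed = {r0, r1, r2}, r1 is nospill, desired = {r1}.
	fmt.Printf("%03b\n", chooseMask(0b111, 0b010, 0b010)) // 101
}
```

Restricting to the desired set first would give `0b010`, and removing the nospill register afterwards would leave an empty mask, which is the allocation failure the commit describes.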
|
|
|
4ae31dc8c5 |
cmd/compile: re-use regalloc's []valState
Updates #27739: reduces package ssa's allocated space by 3.77%.
maxrss is harder to measure, but using best-of-three-runs as reported by /usr/bin/time -l, I see a ~2% reduction in maxrss. We still have a long way to go, though; the new maxrss is still 1.1 GB.
name        old alloc/op   new alloc/op   delta
Template    38.8MB ± 0%    37.7MB ± 0%    -2.77%  (p=0.008 n=5+5)
Unicode     28.2MB ± 0%    28.1MB ± 0%    -0.20%  (p=0.008 n=5+5)
GoTypes     131MB ± 0%     127MB ± 0%     -2.94%  (p=0.008 n=5+5)
Compiler    606MB ± 0%     587MB ± 0%     -3.21%  (p=0.008 n=5+5)
SSA         2.14GB ± 0%    2.06GB ± 0%    -3.77%  (p=0.008 n=5+5)
Flate       24.0MB ± 0%    23.3MB ± 0%    -3.00%  (p=0.008 n=5+5)
GoParser    28.8MB ± 0%    28.1MB ± 0%    -2.61%  (p=0.008 n=5+5)
Reflect     83.8MB ± 0%    81.5MB ± 0%    -2.71%  (p=0.008 n=5+5)
Tar         36.4MB ± 0%    35.4MB ± 0%    -2.73%  (p=0.008 n=5+5)
XML         47.9MB ± 0%    46.7MB ± 0%    -2.49%  (p=0.008 n=5+5)
[Geo mean]  84.6MB         82.4MB         -2.65%
name        old allocs/op  new allocs/op  delta
Template    379k ± 0%      379k ± 0%      -0.05%  (p=0.008 n=5+5)
Unicode     340k ± 0%      340k ± 0%      ~       (p=0.151 n=5+5)
GoTypes     1.36M ± 0%     1.36M ± 0%     -0.06%  (p=0.008 n=5+5)
Compiler    5.49M ± 0%     5.48M ± 0%     -0.03%  (p=0.008 n=5+5)
SSA         17.5M ± 0%     17.5M ± 0%     -0.03%  (p=0.008 n=5+5)
Flate       235k ± 0%      235k ± 0%      -0.04%  (p=0.008 n=5+5)
GoParser    302k ± 0%      302k ± 0%      -0.04%  (p=0.008 n=5+5)
Reflect     976k ± 0%      975k ± 0%      -0.10%  (p=0.008 n=5+5)
Tar         352k ± 0%      352k ± 0%      -0.06%  (p=0.008 n=5+5)
XML         436k ± 0%      436k ± 0%      -0.03%  (p=0.008 n=5+5)
[Geo mean]  842k           841k           -0.04%
Change-Id: I0ab6631b5a0bb6303c291dcb0367b586a4e584fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/176221
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
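The allocation saving comes from reusing a slice's backing array across calls instead of allocating a fresh one each time. A minimal sketch of that pattern, using a hypothetical `allocator` type and placeholder `valState` fields (the real struct's fields differ):

```go
package main

import "fmt"

// valState is a hypothetical per-value record; its fields are placeholders.
type valState struct {
	regs  uint64
	spill int
}

// allocator keeps a scratch slice alive across calls so that each function
// compiled can reuse the same backing array instead of allocating anew.
type allocator struct {
	values []valState
}

// reset returns a zeroed slice of length n, reusing the previous backing
// array when its capacity is large enough.
func (a *allocator) reset(n int) []valState {
	if cap(a.values) >= n {
		a.values = a.values[:n]
		for i := range a.values {
			a.values[i] = valState{} // clear state left over from the last use
		}
	} else {
		a.values = make([]valState, n)
	}
	return a.values
}

func main() {
	var a allocator
	first := a.reset(8)
	first[0].spill = 42
	second := a.reset(4)
	fmt.Println(&first[0] == &second[0], second[0].spill) // true 0
}
```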
|
|
|
40df9cc606 |
cmd/compile: make KeepAlive work on stack object
Currently, runtime.KeepAlive applied to a stack object doesn't actually keep the stack object alive, and the heap object referenced from it could be collected. This is because the address of the stack object is rematerializeable, and we just ignored KeepAlive on rematerializeable values. This CL fixes it.
Fixes #30476.
Change-Id: Ic1f75ee54ed94ea79bd46a8ddcd9e81d01556d1d
Reviewed-on: https://go-review.googlesource.com/c/164537
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> |
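The scenario can be exercised with the real `runtime.KeepAlive` and `runtime.SetFinalizer` APIs. This is a sketch under assumptions: the `holder` and `payload` types and the `collectedEarly` helper are illustrative, and because finalizers run asynchronously, a `false` result only shows that the payload was not finalized early in that particular run. With the fix in place, the function is expected to return false.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// payload is a heap object with a finalizer attached to it.
type payload struct{ data [64]byte }

// holder is a small struct that can live on the stack and points at a payload.
type holder struct{ p *payload }

// collectedEarly reports whether the payload's finalizer ran while the
// holder was still supposed to be alive. runtime.KeepAlive(&h) keeps h,
// and therefore the payload it points to, reachable until that call.
func collectedEarly() bool {
	var finalized int32
	h := holder{p: new(payload)}
	runtime.SetFinalizer(h.p, func(*payload) { atomic.StoreInt32(&finalized, 1) })

	runtime.GC()
	time.Sleep(10 * time.Millisecond) // give a queued finalizer a chance to run
	early := atomic.LoadInt32(&finalized) == 1

	runtime.KeepAlive(&h)
	return early
}

func main() {
	fmt.Println("collected before KeepAlive:", collectedEarly())
}
```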
|
|
|
f493e55723 |
cmd/compile: document regalloc fields
Document what the fields of regalloc mean. Hopefully this will help people understand how the register allocator works.
Change-Id: Ic322ed2019cc839b812740afe8cd2cf0b61da046
Reviewed-on: https://go-review.googlesource.com/137016
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
|
|
|
bce1f12225 |
cmd/compile/internal/ssa: use math/bits in countRegs and pickReg
Makes code simpler and faster (at least on x86).
name               old time/op  new time/op  delta
CountRegs-8        7.40ns ± 1%  0.59ns ± 0%  -92.02%  (p=0.000 n=9+9)
PickReg/(1<<0)-8   2.07ns ± 0%  0.37ns ± 0%  -82.13%  (p=0.000 n=9+10)
PickReg/(1<<16)-8  11.8ns ± 0%  0.4ns ± 0%   -96.86%  (p=0.002 n=8+10)
Change-Id: Ic780b615b75c25b6e7632a0de93b16a8e9ed0f8f
Reviewed-on: https://go-review.googlesource.com/120318
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> |
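Both helpers reduce to single `math/bits` calls. A sketch, assuming `regMask` is a `uint64` bit set (a simplified stand-in for the ssa package's type of the same name); the function bodies here are illustrations, not the committed code:

```go
package main

import (
	"fmt"
	"math/bits"
)

// regMask is a set of registers, one bit per register number.
type regMask uint64

// countRegs returns the number of registers in r (a population count).
func countRegs(r regMask) int {
	return bits.OnesCount64(uint64(r))
}

// pickReg returns the lowest-numbered register in r. It panics on an
// empty mask, since there is nothing to pick.
func pickReg(r regMask) int {
	if r == 0 {
		panic("can't pick a register from an empty set")
	}
	return bits.TrailingZeros64(uint64(r))
}

func main() {
	m := regMask(1<<3 | 1<<16)
	fmt.Println(countRegs(m), pickReg(m)) // 2 3
}
```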
|
|
|
31e1c30f55 |
cmd/compile: do not allow regalloc to LoadReg G register
On architectures where G is stored in a register, it is possible for a variable to be allocated to it, and subsequently that variable may be spilled and reloaded, for example because of an intervening call. If such an allocation reaches a join point and it is the primary predecessor, it becomes the target of a reload, which is only usually right.
Fix: guard all the LoadReg ops, and spill the value in the G register (if any) before merges (in the same way that 387 FP registers are freed between blocks).
Includes test.
Fixes #25504.
Change-Id: I0482a53e20970c7315bf09c0e407ae5bba2fe05d
Reviewed-on: https://go-review.googlesource.com/114695
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
482d241936 |
cmd/compile: add wasm stack optimization
Go's SSA instructions only operate on registers. For example, an add instruction would read two registers, do the addition and then write to a register. WebAssembly's instructions, on the other hand, operate on the stack. The add instruction first pops two values from the stack, does the addition, then pushes the result to the stack.
To fulfill Go's semantics, one needs to map Go's single add instruction to 4 WebAssembly instructions:
- Push the value of local variable A to the stack
- Push the value of local variable B to the stack
- Do addition
- Write value from stack to local variable C
Now consider that B was set to the constant 42 before the addition:
- Push constant 42 to the stack
- Write value from stack to local variable B
This works, but is inefficient. Instead, the stack is used directly by inlining instructions if possible. With inlining it becomes:
- Push the value of local variable A to the stack (add)
- Push constant 42 to the stack (constant)
- Do addition (add)
- Write value from stack to local variable C (add)
Note that the two SSA instructions cannot be generated sequentially anymore, because their WebAssembly instructions are interleaved.
Design doc: https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0_Nv3OUwjEY5qVCxCup4
Updates #18892
Change-Id: Ie35e1c0bebf4985fddda0d6330eb2066f9ad6dec
Reviewed-on: https://go-review.googlesource.com/103535
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
c2c1822b12 |
cmd/compile: assign and preserve statement boundaries.
A new pass run after ssa building (before any other optimization) identifies the "first" ssa node for each statement. Other "noise" nodes are tagged as being never appropriate for a statement boundary (e.g., VarKill, VarDef, Phi).
Rewrite, deadcode, cse, and nilcheck are modified to move the statement boundaries forward whenever possible if a boundary-tagged ssa value is removed; never-boundary nodes are ignored in this search (some operations involving constants are also tagged as never-boundary and also ignored because they are likely to be moved or removed during optimization).
Code generation treats all nodes except those explicitly marked as statement boundaries as "not statement" nodes, and floats statement boundaries to the beginning of each same-line run of instructions found within a basic block.
Line number html conversion was modified to make statement boundary nodes a bit more obvious by prepending a "+".
The code in fuse.go that glued together the value slices of two blocks produced a result that depended on the former capacities (not lengths) of the two slices. This causes differences in the 386 bootstrap, and also can sometimes put values into an order that does a worse job of preserving statement boundaries when values are removed.
Portions of two delve tests that had caught problems were incorporated into ssa/debug_test.go. There are some opportunities to do better with optimized code, but the next-ing is not lying or overly jumpy.
Over 4 CLs, compilebench geomean measured binary size increase of 3.5% and compile user time increase of 3.8% (this is after optimization to reuse a sparse map instead of creating multiple maps.)
This CL worsens the optimized-debugging experience with Delve; we need to work with the delve team so that they can use the is_stmt marks that we're emitting now.
The reference output changes from time to time depending on other changes in the compiler, sometimes better, sometimes worse.
This CL now includes a test ensuring that 99+% of the lines in the Go command itself (a handy optimized binary) include is_stmt markers.
Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a
Reviewed-on: https://go-review.googlesource.com/102435
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
28edaf4584 |
cmd/compile,test: combine byte loads and stores on ppc64le
CL 74410 added rules to combine consecutive byte loads and stores when the byte order was little endian for ppc64le. This is the corresponding change for bytes that are in big endian order. These rules are all intended for a little endian target arch.
This adds new testcases in test/codegen/memcombine.go.
Fixes #22496
Updates #24242
Benchmark improvement for encoding/binary:
name                      old time/op  new time/op  delta
ReadSlice1000Int32s-16    11.0µs ± 0%  9.0µs ± 0%   -17.47%  (p=0.029 n=4+4)
ReadStruct-16             2.47µs ± 1%  2.48µs ± 0%   +0.67%   (p=0.114 n=4+4)
ReadInts-16               642ns ± 1%   630ns ± 1%   -2.02%   (p=0.029 n=4+4)
WriteInts-16              654ns ± 0%   653ns ± 1%   -0.08%   (p=0.629 n=4+4)
WriteSlice1000Int32s-16   8.75µs ± 0%  8.20µs ± 0%  -6.19%   (p=0.029 n=4+4)
PutUint16-16              1.16ns ± 0%  0.93ns ± 0%  -19.83%  (p=0.029 n=4+4)
PutUint32-16              1.16ns ± 0%  0.93ns ± 0%  -19.83%  (p=0.029 n=4+4)
PutUint64-16              1.85ns ± 0%  0.93ns ± 0%  -49.73%  (p=0.029 n=4+4)
LittleEndianPutUint16-16  1.03ns ± 0%  0.93ns ± 0%  -9.71%   (p=0.029 n=4+4)
LittleEndianPutUint32-16  0.93ns ± 0%  0.93ns ± 0%  ~        (all equal)
LittleEndianPutUint64-16  0.93ns ± 0%  0.93ns ± 0%  ~        (all equal)
PutUvarint32-16           43.0ns ± 0%  43.1ns ± 0%  +0.12%   (p=0.429 n=4+4)
PutUvarint64-16           174ns ± 0%   175ns ± 0%   +0.29%   (p=0.429 n=4+4)
Updates made to functions in gcm.go to enable their matching. An existing testcase prevents these functions from being replaced by those in encoding/binary due to import dependencies.
Change-Id: Idb3bd1e6e7b12d86cd828fb29cb095848a3e485a
Reviewed-on: https://go-review.googlesource.com/98136
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
5d9c78201f |
cmd/compile: allow R11 to be allocated on s390x
R11 is only used as a temporary by a very small set of instructions (DIV, MOD, MULH and extended MVC/XC instructions). By marking these instructions as clobbering R11 we can allocate R11 in the general case.
Change-Id: I0d4ffe80e57c164d42a5ea5ef6308756a5b0f742
Reviewed-on: https://go-review.googlesource.com/110255
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
|
|
|
3d6647d6f8 |
cmd/compile: improve regalloc live values debug printing
Before:
live values at end of each block
b1: v3 v2 v7 avoid=0
b2: v3 v13 avoid=81
b3: v19[AX] v3 avoid=81
b6: avoid=0
b7: avoid=0
b5: avoid=0
b4: v3 v18 avoid=81
After:
live values at end of each block
b1: v3 v2 v7
b2: v3 v13 avoid=AX DI
b3: v19[AX] v3 avoid=AX DI
b6:
b7:
b5:
b4: v3 v18 avoid=AX DI
Change-Id: Ibec5c76a16151832b8d49a21c640699fdc9a9d28
Reviewed-on: https://go-review.googlesource.com/109000
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
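Comparing the two outputs, `avoid=81` appears to be the mask printed in hexadecimal: 0x81 has bits 0 and 7 set, which matches `AX DI` if the amd64 general-purpose registers are numbered AX, CX, DX, BX, SP, BP, SI, DI. That numbering is an assumption here. A sketch of rendering such a mask as register names (the `maskString` helper and `amd64Regs` table are hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// amd64Regs lists general-purpose registers in an assumed numbering order.
var amd64Regs = []string{"AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"}

// maskString renders a register bit mask as space-separated register
// names, which is easier to read in debug output than a raw number.
func maskString(m uint64) string {
	var names []string
	for i, name := range amd64Regs {
		if m&(1<<uint(i)) != 0 {
			names = append(names, name)
		}
	}
	return strings.Join(names, " ")
}

func main() {
	fmt.Println(maskString(0x81)) // AX DI
}
```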
|
|
|
8871c930be |
cmd/compile: don't lower OpConvert
Currently, each architecture lowers OpConvert to an arch-specific OpXXXconvert. This is silly because OpConvert means the same thing on all architectures and is logically a no-op that exists only to keep track of conversions to and from unsafe.Pointer. Furthermore, lowering it makes it harder to recognize in other analyses, particularly liveness analysis.
This CL eliminates the lowering of OpConvert, leaving it as the generic op until code generation time.
The main complexity here is that we still need to register-allocate OpConvert operations. Currently, each arch's lowered OpConvert specifies all GP registers in its register mask. Ideally, OpConvert wouldn't affect value homing at all, and we could just copy the home of OpConvert's source, but this can potentially home an OpConvert in a LocalSlot, which neither regalloc nor stackalloc expect. Rather than try to disentangle this assumption from regalloc and stackalloc, we continue to register-allocate OpConvert, but teach regalloc that OpConvert can be allocated to any allocatable GP register.
For #24543.
Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com> |
|
|
|
b9a365681f |
cmd/compile: adjust is-statement on Pos's to improve debugging
Stores to auto tmp variables can be hoisted to places where the line numbers make debugging look "jumpy". Turning those instructions into ones with is_stmt = 0 in the DWARF (accomplished by marking ssa nodes with NotStmt) makes debugging look better while still attributing the instructions to the correct line number.
The same is true for certain register allocator spills and reloads.
Change-Id: I97a394eb522d4911cc40b4bf5bf76d3d7221f6c0
Reviewed-on: https://go-review.googlesource.com/98415
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
014a9048d4 |
cmd/compile: prefer to evict a rematerializable register
This resolves a long-standing regalloc TODO: If you must evict a register, choose to evict a register containing a rematerializable value, since that value won't need to be spilled.
Provides very minor performance and size improvements.
name                     old time/op       new time/op       delta
BinaryTree17-8           2.20s ± 3%        2.18s ± 2%        -0.77%  (p=0.000 n=45+49)
Fannkuch11-8             2.14s ± 2%        2.15s ± 2%        +0.73%  (p=0.000 n=43+44)
FmtFprintfEmpty-8        30.6ns ± 4%       30.2ns ± 3%       -1.14%  (p=0.000 n=50+48)
FmtFprintfString-8       54.5ns ± 6%       53.6ns ± 5%       -1.64%  (p=0.001 n=50+48)
FmtFprintfInt-8          58.0ns ± 7%       57.6ns ± 4%       ~       (p=0.220 n=50+50)
FmtFprintfIntInt-8       85.3ns ± 2%       84.8ns ± 3%       -0.62%  (p=0.001 n=44+47)
FmtFprintfPrefixedInt-8  93.9ns ± 6%       93.6ns ± 5%       ~       (p=0.706 n=50+48)
FmtFprintfFloat-8        178ns ± 4%        177ns ± 4%        ~       (p=0.107 n=49+50)
FmtManyArgs-8            376ns ± 4%        374ns ± 3%        -0.58%  (p=0.013 n=45+50)
GobDecode-8              4.77ms ± 2%       4.76ms ± 3%       ~       (p=0.059 n=47+46)
GobEncode-8              4.04ms ± 2%       3.99ms ± 3%       -1.13%  (p=0.000 n=49+49)
Gzip-8                   177ms ± 2%        180ms ± 3%        +1.43%  (p=0.000 n=48+48)
Gunzip-8                 28.5ms ± 6%       28.3ms ± 5%       ~       (p=0.104 n=50+49)
HTTPClientServer-8       72.1µs ± 1%       72.0µs ± 1%       -0.15%  (p=0.042 n=48+42)
JSONEncode-8             9.81ms ± 5%       10.03ms ± 6%      +2.29%  (p=0.000 n=50+49)
JSONDecode-8             39.2ms ± 3%       39.3ms ± 2%       ~       (p=0.095 n=49+49)
Mandelbrot200-8          3.48ms ± 2%       3.46ms ± 2%       -0.80%  (p=0.000 n=47+48)
GoParse-8                2.54ms ± 3%       2.51ms ± 3%       -1.35%  (p=0.000 n=49+49)
RegexpMatchEasy0_32-8    66.0ns ± 7%       65.7ns ± 8%       ~       (p=0.331 n=50+50)
RegexpMatchEasy0_1K-8    155ns ± 4%        154ns ± 4%        ~       (p=0.986 n=49+50)
RegexpMatchEasy1_32-8    62.6ns ± 8%       62.2ns ± 5%       ~       (p=0.395 n=50+49)
RegexpMatchEasy1_1K-8    260ns ± 5%        255ns ± 3%        -1.92%  (p=0.000 n=49+49)
RegexpMatchMedium_32-8   92.9ns ± 2%       91.8ns ± 2%       -1.25%  (p=0.000 n=46+48)
RegexpMatchMedium_1K-8   27.7µs ± 3%       27.0µs ± 2%       -2.59%  (p=0.000 n=49+49)
RegexpMatchHard_32-8     1.23µs ± 4%       1.21µs ± 2%       -2.16%  (p=0.000 n=49+44)
RegexpMatchHard_1K-8     36.4µs ± 2%       35.7µs ± 2%       -1.87%  (p=0.000 n=48+49)
Revcomp-8                274ms ± 2%        276ms ± 3%        +0.70%  (p=0.034 n=45+48)
Template-8               45.1ms ± 8%       45.1ms ± 8%       ~       (p=0.643 n=50+50)
TimeParse-8              223ns ± 2%        223ns ± 2%        ~       (p=0.401 n=47+47)
TimeFormat-8             245ns ± 2%        246ns ± 3%        ~       (p=0.758 n=49+50)
[Geo mean]               36.5µs            36.3µs            -0.54%
name                     old object-bytes  new object-bytes  delta
Template                 480kB ± 0%        480kB ± 0%        ~       (all equal)
Unicode                  214kB ± 0%        214kB ± 0%        ~       (all equal)
GoTypes                  1.54MB ± 0%       1.54MB ± 0%       -0.03%  (p=0.008 n=5+5)
Compiler                 5.75MB ± 0%       5.75MB ± 0%       ~       (all equal)
SSA                      14.6MB ± 0%       14.6MB ± 0%       -0.01%  (p=0.008 n=5+5)
Flate                    300kB ± 0%        300kB ± 0%        -0.01%  (p=0.008 n=5+5)
GoParser                 366kB ± 0%        366kB ± 0%        ~       (all equal)
Reflect                  1.20MB ± 0%       1.20MB ± 0%       ~       (all equal)
Tar                      413kB ± 0%        413kB ± 0%        ~       (all equal)
XML                      529kB ± 0%        528kB ± 0%        -0.13%  (p=0.008 n=5+5)
[Geo mean]               909kB             909kB             -0.02%
Change-Id: I46d37a55197683a98913f35801dc2b0d609653c8
Reviewed-on: https://go-review.googlesource.com/103240
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com> |
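The eviction preference can be sketched as a victim-selection loop: prefer a register holding a rematerializable value, since evicting it needs no spill, and otherwise fall back to a tie-breaker. The `occupant` type and `pickVictim` function are hypothetical, and the furthest-next-use tie-breaker is an assumption borrowed from classic allocators rather than something this commit states:

```go
package main

import "fmt"

// occupant is a hypothetical description of the value held in a register.
type occupant struct {
	reg     string
	remat   bool // value can be recomputed, so evicting it needs no spill
	nextUse int  // distance to the value's next use
}

// pickVictim returns the register to free. Rematerializable values are
// preferred; among equally ranked candidates, the one used furthest in the
// future wins. It returns "" if regs is empty.
func pickVictim(regs []occupant) string {
	best := -1
	for i, o := range regs {
		if best < 0 {
			best = i
			continue
		}
		b := regs[best]
		if o.remat != b.remat {
			if o.remat {
				best = i
			}
			continue
		}
		if o.nextUse > b.nextUse {
			best = i
		}
	}
	if best < 0 {
		return ""
	}
	return regs[best].reg
}

func main() {
	regs := []occupant{
		{reg: "AX", nextUse: 50},
		{reg: "BX", remat: true, nextUse: 3},
		{reg: "CX", nextUse: 10},
	}
	fmt.Println(pickVictim(regs)) // BX
}
```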
|
|
|
dafca7de0f |
cmd/compile: prefer rematerialization to copying
Fixes #24132
name                     old time/op       new time/op       delta
BinaryTree17-8           2.18s ± 2%        2.15s ± 2%        -1.28%  (p=0.000 n=25+26)
Fannkuch11-8             2.16s ± 3%        2.13s ± 3%        -1.54%  (p=0.000 n=27+30)
FmtFprintfEmpty-8        29.9ns ± 3%       29.6ns ± 3%       -1.08%  (p=0.001 n=29+26)
FmtFprintfString-8       53.6ns ± 2%       54.0ns ± 4%       ~       (p=0.193 n=28+29)
FmtFprintfInt-8          56.8ns ± 3%       57.0ns ± 3%       ~       (p=0.330 n=29+29)
FmtFprintfIntInt-8       85.3ns ± 2%       85.8ns ± 3%       +0.56%  (p=0.042 n=30+29)
FmtFprintfPrefixedInt-8  94.1ns ± 5%       99.0ns ± 8%       +5.20%  (p=0.000 n=27+30)
FmtFprintfFloat-8        183ns ± 4%        182ns ± 3%        ~       (p=0.619 n=30+26)
FmtManyArgs-8            369ns ± 2%        369ns ± 2%        ~       (p=0.748 n=27+29)
GobDecode-8              4.78ms ± 2%       4.75ms ± 1%       ~       (p=0.051 n=28+27)
GobEncode-8              4.06ms ± 3%       4.07ms ± 3%       ~       (p=0.781 n=29+30)
Gzip-8                   178ms ± 2%        177ms ± 2%        ~       (p=0.171 n=29+30)
Gunzip-8                 28.2ms ± 7%       28.0ms ± 4%       ~       (p=0.155 n=30+30)
HTTPClientServer-8       71.5µs ± 3%       71.3µs ± 1%       ~       (p=0.913 n=25+27)
JSONEncode-8             9.71ms ± 5%       9.86ms ± 4%       +1.55%  (p=0.015 n=28+30)
JSONDecode-8             38.8ms ± 2%       39.3ms ± 2%       +1.41%  (p=0.000 n=28+29)
Mandelbrot200-8          3.47ms ± 6%       3.44ms ± 3%       ~       (p=0.183 n=28+28)
GoParse-8                2.55ms ± 2%       2.54ms ± 3%       -0.58%  (p=0.003 n=27+29)
RegexpMatchEasy0_32-8    66.0ns ± 5%       65.3ns ± 4%       ~       (p=0.124 n=30+30)
RegexpMatchEasy0_1K-8    152ns ± 2%        152ns ± 3%        ~       (p=0.881 n=30+30)
RegexpMatchEasy1_32-8    62.9ns ± 9%       62.7ns ± 7%       ~       (p=0.717 n=30+30)
RegexpMatchEasy1_1K-8    263ns ± 3%        263ns ± 4%        ~       (p=0.909 n=30+29)
RegexpMatchMedium_32-8   93.4ns ± 3%       89.3ns ± 2%       -4.32%  (p=0.000 n=29+29)
RegexpMatchMedium_1K-8   27.5µs ± 3%       27.1µs ± 2%       -1.46%  (p=0.000 n=30+27)
RegexpMatchHard_32-8     1.33µs ± 3%       1.31µs ± 3%       -1.50%  (p=0.000 n=27+28)
RegexpMatchHard_1K-8     39.4µs ± 2%       39.1µs ± 2%       -0.54%  (p=0.027 n=28+28)
Revcomp-8                274ms ± 4%        276ms ± 2%        +0.67%  (p=0.048 n=29+28)
Template-8               45.1ms ± 5%       44.6ms ± 7%       -1.22%  (p=0.029 n=30+29)
TimeParse-8              227ns ± 3%        224ns ± 3%        -1.25%  (p=0.000 n=28+27)
TimeFormat-8             248ns ± 3%        245ns ± 3%        -1.33%  (p=0.002 n=30+29)
[Geo mean]               36.6µs            36.5µs            -0.32%
Change-Id: I24083f0013506b77e2d9da99c40ae2f67803285e
Reviewed-on: https://go-review.googlesource.com/101076
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
377a2cb2d2 |
cmd/compile: reduce allocations in regAllocState.regalloc
name      old time/op       new time/op       delta
Template  281ms ± 2%        282ms ± 3%        ~       (p=0.428 n=19+20)
Unicode   138ms ± 6%        138ms ± 7%        ~       (p=0.813 n=19+20)
GoTypes   901ms ± 2%        895ms ± 2%        ~       (p=0.050 n=19+20)
Compiler  4.25s ± 1%        4.23s ± 1%        -0.31%  (p=0.031 n=19+18)
SSA       9.77s ± 1%        9.78s ± 1%        ~       (p=0.512 n=20+20)
Flate     187ms ± 3%        187ms ± 4%        ~       (p=0.687 n=20+19)
GoParser  224ms ± 4%        222ms ± 3%        ~       (p=0.301 n=20+20)
Reflect   576ms ± 2%        576ms ± 2%        ~       (p=0.620 n=20+20)
Tar       262ms ± 3%        263ms ± 3%        ~       (p=0.599 n=19+18)
XML       322ms ± 4%        322ms ± 2%        ~       (p=0.512 n=20+20)
name      old user-time/op  new user-time/op  delta
Template  403ms ± 3%        399ms ± 5%        ~       (p=0.149 n=17+20)
Unicode   217ms ±12%        217ms ± 9%        ~       (p=0.883 n=20+20)
GoTypes   1.24s ± 3%        1.24s ± 3%        ~       (p=0.718 n=20+20)
Compiler  5.90s ± 3%        5.84s ± 5%        ~       (p=0.217 n=18+20)
SSA       14.0s ± 6%        14.1s ± 5%        ~       (p=0.235 n=19+20)
Flate     253ms ± 6%        254ms ± 5%        ~       (p=0.749 n=20+19)
GoParser  309ms ± 7%        307ms ± 5%        ~       (p=0.398 n=20+20)
Reflect   772ms ± 3%        771ms ± 3%        ~       (p=0.901 n=20+19)
Tar       368ms ± 5%        369ms ± 8%        ~       (p=0.429 n=20+20)
XML       435ms ± 5%        434ms ± 5%        ~       (p=0.841 n=20+20)
name      old alloc/op      new alloc/op      delta
Template  39.0MB ± 0%       38.9MB ± 0%       -0.21%  (p=0.000 n=20+19)
Unicode   29.0MB ± 0%       29.0MB ± 0%       -0.03%  (p=0.000 n=20+20)
GoTypes   116MB ± 0%        115MB ± 0%        -0.33%  (p=0.000 n=20+20)
Compiler  498MB ± 0%        496MB ± 0%        -0.37%  (p=0.000 n=19+20)
SSA       1.41GB ± 0%       1.40GB ± 0%       -0.24%  (p=0.000 n=20+20)
Flate     25.0MB ± 0%       25.0MB ± 0%       -0.22%  (p=0.000 n=20+19)
GoParser  31.0MB ± 0%       30.9MB ± 0%       -0.23%  (p=0.000 n=20+17)
Reflect   77.1MB ± 0%       77.0MB ± 0%       -0.12%  (p=0.000 n=20+20)
Tar       39.7MB ± 0%       39.6MB ± 0%       -0.17%  (p=0.000 n=20+20)
XML       44.9MB ± 0%       44.8MB ± 0%       -0.29%  (p=0.000 n=20+20)
name      old allocs/op     new allocs/op     delta
Template  386k ± 0%         385k ± 0%         -0.28%  (p=0.000 n=20+20)
Unicode   337k ± 0%         336k ± 0%         -0.07%  (p=0.000 n=20+20)
GoTypes   1.20M ± 0%        1.20M ± 0%        -0.41%  (p=0.000 n=20+20)
Compiler  4.71M ± 0%        4.68M ± 0%        -0.52%  (p=0.000 n=20+20)
SSA       11.7M ± 0%        11.6M ± 0%        -0.31%  (p=0.000 n=20+19)
Flate     238k ± 0%         237k ± 0%         -0.28%  (p=0.000 n=18+20)
GoParser  320k ± 0%         319k ± 0%         -0.34%  (p=0.000 n=20+19)
Reflect   961k ± 0%         959k ± 0%         -0.12%  (p=0.000 n=20+20)
Tar       397k ± 0%         396k ± 0%         -0.23%  (p=0.000 n=20+20)
XML       419k ± 0%         417k ± 0%         -0.39%  (p=0.000 n=20+19)
Change-Id: Ic7ec3614808d9892c1cab3991b996b7a3b8eff21
Reviewed-on: https://go-review.googlesource.com/102676
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
2ba98f1ae9 |
cmd/compile: avoid some allocations in regalloc
Compilebench:
name old time/op new time/op delta
Template 283ms ± 3% 281ms ± 4% ~ (p=0.242 n=20+20)
Unicode 137ms ± 6% 135ms ± 6% ~ (p=0.194 n=20+19)
GoTypes 890ms ± 2% 883ms ± 1% -0.74% (p=0.001 n=19+19)
Compiler 4.21s ± 2% 4.20s ± 2% -0.40% (p=0.033 n=20+19)
SSA 9.86s ± 2% 9.68s ± 1% -1.80% (p=0.000 n=20+19)
Flate 185ms ± 5% 185ms ± 7% ~ (p=0.429 n=20+20)
GoParser 222ms ± 3% 222ms ± 4% ~ (p=0.588 n=19+20)
Reflect 572ms ± 2% 570ms ± 3% ~ (p=0.113 n=19+20)
Tar 263ms ± 4% 259ms ± 2% -1.41% (p=0.013 n=20+20)
XML 321ms ± 2% 321ms ± 4% ~ (p=0.835 n=20+19)
name old user-time/op new user-time/op delta
Template 400ms ± 5% 405ms ± 5% ~ (p=0.096 n=20+20)
Unicode 217ms ± 8% 213ms ± 8% ~ (p=0.242 n=20+20)
GoTypes 1.23s ± 3% 1.22s ± 3% ~ (p=0.923 n=19+20)
Compiler 5.76s ± 6% 5.81s ± 2% ~ (p=0.687 n=20+19)
SSA 14.2s ± 4% 14.0s ± 4% ~ (p=0.121 n=20+20)
Flate 248ms ± 7% 251ms ±10% ~ (p=0.369 n=20+20)
GoParser 308ms ± 5% 305ms ± 6% ~ (p=0.336 n=19+20)
Reflect 771ms ± 2% 766ms ± 2% ~ (p=0.113 n=20+19)
Tar 370ms ± 5% 362ms ± 7% -2.06% (p=0.036 n=19+20)
XML 435ms ± 4% 432ms ± 5% ~ (p=0.369 n=20+20)
name old alloc/op new alloc/op delta
Template 39.5MB ± 0% 39.4MB ± 0% -0.20% (p=0.000 n=20+20)
Unicode 29.1MB ± 0% 29.1MB ± 0% ~ (p=0.064 n=20+20)
GoTypes 117MB ± 0% 117MB ± 0% -0.17% (p=0.000 n=20+20)
Compiler 503MB ± 0% 502MB ± 0% -0.15% (p=0.000 n=19+19)
SSA 1.42GB ± 0% 1.42GB ± 0% -0.16% (p=0.000 n=20+20)
Flate 25.3MB ± 0% 25.3MB ± 0% -0.19% (p=0.000 n=20+20)
GoParser 31.4MB ± 0% 31.3MB ± 0% -0.14% (p=0.000 n=20+18)
Reflect 78.1MB ± 0% 77.9MB ± 0% -0.34% (p=0.000 n=20+19)
Tar 40.1MB ± 0% 40.0MB ± 0% -0.17% (p=0.000 n=20+20)
XML 45.3MB ± 0% 45.2MB ± 0% -0.13% (p=0.000 n=20+20)
name old allocs/op new allocs/op delta
Template 393k ± 0% 392k ± 0% -0.21% (p=0.000 n=20+19)
Unicode 337k ± 0% 337k ± 0% -0.02% (p=0.000 n=20+20)
GoTypes 1.22M ± 0% 1.22M ± 0% -0.21% (p=0.000 n=20+20)
Compiler 4.77M ± 0% 4.76M ± 0% -0.16% (p=0.000 n=20+20)
SSA 11.8M ± 0% 11.8M ± 0% -0.12% (p=0.000 n=20+20)
Flate 242k ± 0% 241k ± 0% -0.20% (p=0.000 n=20+20)
GoParser 324k ± 0% 324k ± 0% -0.14% (p=0.000 n=20+20)
Reflect 985k ± 0% 981k ± 0% -0.38% (p=0.000 n=20+20)
Tar 403k ± 0% 402k ± 0% -0.19% (p=0.000 n=20+20)
XML 424k ± 0% 424k ± 0% -0.16% (p=0.000 n=19+20)
Change-Id: I131e382b64cd6db11a9263a477d45d80c180c499
Reviewed-on: https://go-review.googlesource.com/102421
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
|
|
|
983dcf70ba |
cmd/compile/internal/ssa: update regalloc in loops
Currently we don't lift spills out of a loop if the loop contains a call.
However, we often have code like this:
for .. {
if hard_case {
call()
}
// simple case, without call
}
So instead of checking for any call, check for an unavoidable call.
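A minimal sketch of the "unavoidable call" test, using a toy CFG. `loopBlock` and `hasUnavoidableCall` are hypothetical names, not the compiler's. The idea is that a call is unavoidable only if no call-free path leads from the loop header back to itself. The real regalloc works on SSA blocks and loop nests, so treat this only as an illustration of the condition.

```go
package main

import "fmt"

// loopBlock is a hypothetical, simplified basic block inside a loop body.
type loopBlock struct {
	succs   []int // successor block IDs
	hasCall bool  // true if the block contains a call
}

// hasUnavoidableCall reports whether every path from the loop header
// around the back edge to the header again passes through a call block.
// It searches for a call-free cycle; if one exists, the call is avoidable.
// Blocks missing from the map are treated as outside the loop.
func hasUnavoidableCall(loop map[int]*loopBlock, header int) bool {
	if loop[header].hasCall {
		return true
	}
	seen := map[int]bool{header: true}
	stack := []int{header}
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		for _, s := range loop[id].succs {
			if s == header {
				return false // back edge reached without executing a call
			}
			b, inLoop := loop[s]
			if !inLoop || b.hasCall || seen[s] {
				continue // exits the loop, calls, or already visited
			}
			seen[s] = true
			stack = append(stack, s)
		}
	}
	return true
}

func main() {
	// for { if hard_case { call() }; simple case }
	avoidable := map[int]*loopBlock{
		0: {succs: []int{1, 2}},
		1: {succs: []int{2}, hasCall: true},
		2: {succs: []int{0, 3}}, // 3 is the loop exit
	}
	// for { call() }
	unavoidable := map[int]*loopBlock{
		0: {succs: []int{1}},
		1: {succs: []int{0, 3}, hasCall: true},
	}
	fmt.Println(hasUnavoidableCall(avoidable, 0))   // false
	fmt.Println(hasUnavoidableCall(unavoidable, 0)) // true
}
```

In the first loop the simple path 0 → 2 → 0 avoids the call, so spills could still be lifted out of it.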
For #22698 cases I see:
mime/quotedprintable/Writer-6 10.9µs ± 4% 9.2µs ± 3% -15.02% (p=0.000 n=8+8)
And:
compress/flate/Encode/Twain/Huffman/1e4-6 99.4µs ± 6% 90.9µs ± 0% -8.57% (p=0.000 n=8+8)
compress/flate/Encode/Twain/Huffman/1e5-6 760µs ± 1% 725µs ± 1% -4.56% (p=0.000 n=8+8)
compress/flate/Encode/Twain/Huffman/1e6-6 7.55ms ± 0% 7.24ms ± 0% -4.07% (p=0.000 n=8+7)
There are no significant changes on go1 benchmarks.
But for cases with runtime arch checks, where we call the generic version on old hardware,
there are respectable performance gains:
math/RoundToEven-6 1.43ns ± 0% 1.25ns ± 0% -12.59% (p=0.001 n=7+7)
math/bits/OnesCount64-6 1.60ns ± 1% 1.42ns ± 1% -11.32% (p=0.000 n=8+8)
Also, on some runtime benchmarks, loops have fewer loads and higher performance:
runtime/RuneIterate/range1/ASCII-6 15.6ns ± 1% 13.9ns ± 1% -10.74% (p=0.000 n=7+8)
runtime/ArrayEqual-6 3.22ns ± 0% 2.86ns ± 2% -11.06% (p=0.000 n=7+8)
Fixes #22698
Updates #22234
Change-Id: I0ae2f19787d07a9026f064366dedbe601bf7257a
Reviewed-on: https://go-review.googlesource.com/84055
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
|
|
|
|
cd2cb6e3f5 |
cmd/compile: cache sparse maps across ssa passes
This is done for sparse sets already, but it was missing for sparse maps. This only affects deadstore and regalloc, as they're the only passes that use sparse maps.
name old time/op new time/op delta
DSEPass-4 247µs ± 0% 216µs ± 0% -12.75% (p=0.008 n=5+5)
DSEPassBlock-4 3.05ms ± 1% 2.87ms ± 1% -6.02% (p=0.002 n=6+6)
CSEPass-4 2.30ms ± 0% 2.32ms ± 0% +0.53% (p=0.026 n=6+6)
CSEPassBlock-4 23.8ms ± 0% 23.8ms ± 0% ~ (p=0.931 n=6+5)
DeadcodePass-4 51.7µs ± 1% 51.5µs ± 2% ~ (p=0.429 n=5+6)
DeadcodePassBlock-4 734µs ± 1% 742µs ± 3% ~ (p=0.394 n=6+6)
MultiPass-4 152µs ± 0% 149µs ± 2% ~ (p=0.082 n=5+6)
MultiPassBlock-4 2.67ms ± 1% 2.41ms ± 2% -9.77% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
DSEPass-4 41.2kB ± 0% 0.1kB ± 0% -99.68% (p=0.002 n=6+6)
DSEPassBlock-4 560kB ± 0% 4kB ± 0% -99.34% (p=0.026 n=5+6)
CSEPass-4 189kB ± 0% 189kB ± 0% ~ (all equal)
CSEPassBlock-4 3.10MB ± 0% 3.10MB ± 0% ~ (p=0.444 n=5+5)
DeadcodePass-4 10.5kB ± 0% 10.5kB ± 0% ~ (all equal)
DeadcodePassBlock-4 164kB ± 0% 164kB ± 0% ~ (all equal)
MultiPass-4 240kB ± 0% 199kB ± 0% -17.06% (p=0.002 n=6+6)
MultiPassBlock-4 3.60MB ± 0% 2.99MB ± 0% -17.06% (p=0.002 n=6+6)
name old allocs/op new allocs/op delta
DSEPass-4 8.00 ± 0% 4.00 ± 0% -50.00% (p=0.002 n=6+6)
DSEPassBlock-4 240 ± 0% 120 ± 0% -50.00% (p=0.002 n=6+6)
CSEPass-4 9.00 ± 0% 9.00 ± 0% ~ (all equal)
CSEPassBlock-4 1.35k ± 0% 1.35k ± 0% ~ (all equal)
DeadcodePass-4 3.00 ± 0% 3.00 ± 0% ~ (all equal)
DeadcodePassBlock-4 9.00 ± 0% 9.00 ± 0% ~ (all equal)
MultiPass-4 11.0 ± 0% 10.0 ± 0% -9.09% (p=0.002 n=6+6)
MultiPassBlock-4 165 ± 0% 150 ± 0% -9.09% (p=0.002 n=6+6)
Change-Id: I43860687c88f33605eb1415f36473c5cfe8fde4a
Reviewed-on: https://go-review.googlesource.com/98449
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> |
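A sketch of what caching a sparse map can look like. It assumes the classic Briggs-Torczon layout: a dense array of entries plus a sparse index array, so clearing is O(1). `sparseMap` and `mapCache` are hypothetical stand-ins, not the compiler's actual types. The point is that a released map's backing arrays get handed to the next pass instead of being reallocated.

```go
package main

import "fmt"

// sparseEntry is one key/value pair stored in the dense array.
type sparseEntry struct {
	key, val int32
}

// sparseMap maps small integer keys in [0, n) to int32 values.
// Lookups are O(1), and clear is O(1) because it only truncates dense.
type sparseMap struct {
	dense  []sparseEntry
	sparse []int32 // sparse[k] is k's index in dense, if k is present
}

func newSparseMap(n int) *sparseMap {
	return &sparseMap{sparse: make([]int32, n)}
}

// index returns k's position in dense, or -1 if k is absent.
// sparse may hold stale indices, so the dense entry is double-checked.
func (s *sparseMap) index(k int32) int {
	i := int(s.sparse[k])
	if i < len(s.dense) && s.dense[i].key == k {
		return i
	}
	return -1
}

func (s *sparseMap) set(k, v int32) {
	if i := s.index(k); i >= 0 {
		s.dense[i].val = v
		return
	}
	s.sparse[k] = int32(len(s.dense))
	s.dense = append(s.dense, sparseEntry{k, v})
}

func (s *sparseMap) get(k int32) (int32, bool) {
	if i := s.index(k); i >= 0 {
		return s.dense[i].val, true
	}
	return 0, false
}

func (s *sparseMap) clear() { s.dense = s.dense[:0] }

// mapCache keeps released maps so later passes (or later functions)
// can reuse them instead of allocating new backing arrays.
type mapCache struct {
	free []*sparseMap
}

func (c *mapCache) alloc(n int) *sparseMap {
	for i, m := range c.free {
		if len(m.sparse) >= n {
			c.free = append(c.free[:i], c.free[i+1:]...)
			m.clear()
			return m
		}
	}
	return newSparseMap(n)
}

func (c *mapCache) release(m *sparseMap) {
	c.free = append(c.free, m)
}

func main() {
	var c mapCache
	m := c.alloc(100)
	m.set(42, 7)
	v, ok := m.get(42)
	fmt.Println(v, ok) // 7 true
	c.release(m)
	m2 := c.alloc(50) // reuses m's arrays
	_, ok = m2.get(42)
	fmt.Println(m2 == m, ok) // true false
}
```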
|
|
|
c18ff18465 |
cmd/compile: decouple emitted block order from regalloc block order
While tinkering with different block orders for the preemptible loop experiment, I crashed the register allocator with a "bad" one (these exist). I realized that one knob was controlling two things (register allocation and branch patterns) and decided that life would be simpler if the two orders were independent.
I ran some experiments and determined that we have probably, mostly, been optimizing for register allocation effects, not branch effects. Bad block orders for register allocation are somewhat costly.
This will also allow separate experimentation with perhaps-better block orders for register allocation.
Change-Id: I6ecf2f24cca178b6f8acc0d3c4caaef043c11ed9
Reviewed-on: https://go-review.googlesource.com/47314
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
2075a9323d |
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness calculations, redo them using the Ops' clobber information. Besides saving a lot of Values, this avoids adding RegKills to blocks that would be empty otherwise, which was messing up optimizations. This does mean that it's much harder to tell whether the generation process is buggy (there's nothing to cross-check it with), and there may be disagreements with GC liveness. But the performance gain is significant, and it's nice not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing ssa.BlockDebugs, then dwarf.LocationLists, and then finally real location lists, go directly from the SSA to a (mostly) real location list. Because the SSA analysis happens before assembly, it stores encoded block/value IDs where PCs would normally go. It would be easier to do the SSA analysis after assembly, but I didn't want to retain the SSA just for that.
Generation proceeds in two phases: first, it traverses the function in CFG order, storing the state of the block at the beginning and end. End states are used to produce the start states of the successor blocks. In the second phase, it traverses in program text order and produces the location lists. The processing in the second phase is redundant, but much cheaper than storing the intermediate representation. It might be possible to combine the two phases somewhat to take advantage of cases where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry, translating each encoded block/value ID to a real PC, and adding the terminating zero entry. This probably won't work on OSX, where dsymutil will choke on the base address selection. I tried emitting CU-relative relocations for each address, and it was *very* bad for performance: it uses more memory storing all the relocations than it does for the actual location list bytes. I think I'm going to end up synthesizing the relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the debugger doesn't stop on the continue (line 88) any more, and the test's duplicate suppression kicks in. Also, dx and dy live a little longer now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com> |
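A sketch of how "encoded block/value IDs where PCs would normally go" can work. The bit layout here is an assumption: block ID in the high 32 bits, value ID in the low 32. `locEntry`, `idKey` and `finalize` are hypothetical names. The base address selection entry and the terminating zero entry mentioned above are omitted, so this shows only the ID-to-PC translation step.

```go
package main

import "fmt"

// encodeID packs a block ID and a value ID into a single uint64, so it
// can occupy the slot where a PC will go once assembly has run.
func encodeID(block, value int32) uint64 {
	return uint64(uint32(block))<<32 | uint64(uint32(value))
}

// decodeID reverses encodeID.
func decodeID(x uint64) (block, value int32) {
	return int32(x >> 32), int32(uint32(x))
}

// locEntry is a hypothetical location-list entry: a PC range and where
// the variable lives during that range.
type locEntry struct {
	start, end uint64 // encoded IDs before finalization, PCs after
	where      string // e.g. a register name or stack slot
}

// idKey identifies a (block, value) pair.
type idKey struct{ block, value int32 }

// finalize rewrites every encoded ID as the PC the assembler assigned
// to that (block, value) pair.
func finalize(list []locEntry, pcOf map[idKey]uint64) {
	for i := range list {
		b, v := decodeID(list[i].start)
		list[i].start = pcOf[idKey{b, v}]
		b, v = decodeID(list[i].end)
		list[i].end = pcOf[idKey{b, v}]
	}
}

func main() {
	list := []locEntry{{encodeID(1, 5), encodeID(1, 9), "AX"}}
	pcs := map[idKey]uint64{{1, 5}: 0x10, {1, 9}: 0x24}
	finalize(list, pcs)
	fmt.Printf("%#x-%#x in %s\n", list[0].start, list[0].end, list[0].where) // 0x10-0x24 in AX
}
```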
|
|
|
0153a4130d |
cmd/compile: fix runtime.KeepAlive
KeepAlive needs to introduce a use of the spill of the value it is keeping alive. Without that, we don't guarantee that the spill dominates the KeepAlive. This bug was probably introduced with the code to move spills down to the dominator of the restores, instead of always spilling just after the value itself (CL 34822). Fixes #22458. Change-Id: I94955a21960448ffdacc4df775fe1213967b1d4c Reviewed-on: https://go-review.googlesource.com/74210 Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
ca360c3992 |
cmd/compile: better XPos for rematerialized values and JMPs
This attempts to choose better positions (XPos) for values that are rematerialized (using the XPos of the consumer, not the original) and for unconditional branches (using the last assigned XPos in the block). The JMP branches sometimes seem to end up with a PC in the destination block, I think because of register movement or rematerialization that gets placed in predecessor blocks. This may be acceptable because (eyeball-empirically) that is often the line number of the target block, so the line number flow is correct.
Added a proper test that checks both -N -l and regular compilation. The test is also capable (for gdb now, delve soon) of tracking variable printing based on comments in the source code. There's substantial room for improvement in debugger behavior.
Updates #21098.
Change-Id: I13abd48a39141583b85576a015f561065819afd0
Reviewed-on: https://go-review.googlesource.com/50610
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
ded2c65db3 |
cmd/compile: simplify a few bits of the code
Remove an unused type, a few redundant returns and replace a few slice append loops with a single append. Change-Id: If07248180bae5631b5b152c6051d9635889997d5 Reviewed-on: https://go-review.googlesource.com/66851 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Dave Cheney <dave@cheney.net> |
|
|
|
b74b43de68 |
cmd/compile: request r12 for indirect calls on ppc64le
On ppc64le, functions compiled with -shared expect r12 to hold the function's address for indirect calls. Previously this was enforced by generating a move instruction if the address wasn't already in r12. This change avoids that extra move by requesting r12 in the CALL ops that do indirect calls. As a result of adding support for plugins on ppc64le, it was discovered that there would be more cases where this extra move was needed, so this seemed like a better solution. Updates #20756 Change-Id: I6770885a46990f78c6d2902a715dcdaa822192a1 Reviewed-on: https://go-review.googlesource.com/62890 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Crawshaw <crawshaw@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
99da8730b0 |
all: remove some double spaces from comments
Went mainly for the ones that make no sense, such as the ones mid-sentence or after commas. Change-Id: Ie245d2c19cc7428a06295635cf6a9482ade25ff0 Reviewed-on: https://go-review.googlesource.com/57293 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> |
|
|
|
770d8d8207 |
cmd/compile: free value earlier in nilcheck
When we remove a nil check, add it back to the free Value pool immediately. Fixes #18732 Change-Id: I8d644faabbfb52157d3f2d071150ff0342ac28dc Reviewed-on: https://go-review.googlesource.com/58810 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> |
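A minimal sketch of returning a removed Value to a free pool immediately. `Value`, `valuePool`, `get` and `put` are hypothetical simplifications, not the compiler's API. The idea is that once the nil check is proven redundant, its storage can be recycled right away, rather than later, for the next Value the pass allocates.

```go
package main

import "fmt"

// Value is a pared-down, hypothetical stand-in for an SSA value.
type Value struct {
	ID   int32
	Op   string
	Args []*Value
}

// valuePool is a simple free list of Values.
type valuePool struct {
	free []*Value
}

// get returns a recycled Value if one is available, else a new one.
func (p *valuePool) get() *Value {
	if n := len(p.free); n > 0 {
		v := p.free[n-1]
		p.free = p.free[:n-1]
		return v
	}
	return new(Value)
}

// put clears v (dropping references to its arguments) and makes it
// available for reuse immediately.
func (p *valuePool) put(v *Value) {
	*v = Value{}
	p.free = append(p.free, v)
}

func main() {
	var p valuePool
	check := p.get()
	check.Op = "NilCheck"
	p.put(check) // the check was proven redundant
	next := p.get()
	fmt.Println(next == check, next.Op == "") // true true
}
```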
|
|
|
bf4d8d3d05 |
cmd/compile: rename SSA Register.Name to Register.String
Just to get rid of lots of .Name() stutter in printf calls. Change-Id: I86cf00b3f7b2172387a1c6a7f189c1897fab6300 Reviewed-on: https://go-review.googlesource.com/56630 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
4c54a047c6 |
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier ones to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example:
type myint int
var a int
b := myint(a)
and
b := (*uint64)(unsafe.Pointer(&a))
don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect.
Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> |
|
|
|
2d57d94ac3 |
[dev.debug] cmd/compile: track variable decomposition in LocalSlot
When the compiler decomposes a user variable, track its origin so that it can be recomposed during DWARF generation. Change-Id: Ia71c7f8e7f4d65f0652f1c97b0dda5d9cad41936 Reviewed-on: https://go-review.googlesource.com/50878 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> |
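A sketch of tracking decomposition so a variable can be recomposed later. `LocalSlot`, `split` and `root` are hypothetical simplifications. The field names loosely echo the idea of a "split of" link but are not guaranteed to match the compiler's `ssa.LocalSlot`. The example assumes 8-byte pointers when splitting a string into pointer and length pieces.

```go
package main

import "fmt"

// LocalSlot is a simplified, hypothetical slot describing (part of) a
// user variable. SplitOf records the slot it was decomposed from.
type LocalSlot struct {
	Name        string
	Off         int64      // offset of this piece within the root variable
	SplitOf     *LocalSlot // parent slot, or nil for an undecomposed variable
	SplitOffset int64      // offset of this piece within SplitOf
}

// split creates the piece of parent at offset off and remembers where it
// came from, so DWARF generation can later recompose the variable.
func split(parent *LocalSlot, suffix string, off int64) *LocalSlot {
	return &LocalSlot{
		Name:        parent.Name + suffix,
		Off:         parent.Off + off,
		SplitOf:     parent,
		SplitOffset: off,
	}
}

// root follows SplitOf links back to the original user variable.
func root(s *LocalSlot) *LocalSlot {
	for s.SplitOf != nil {
		s = s.SplitOf
	}
	return s
}

func main() {
	s := &LocalSlot{Name: "s"} // a string variable
	ptr := split(s, ".ptr", 0)
	length := split(s, ".len", 8) // assumes 8-byte pointers
	fmt.Println(ptr.Name, length.Name, root(length).Name) // s.ptr s.len s
}
```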
|
|
|
00263a8968 |
cmd/compile: reduce debugger-worsening line number churn
Reuse the block head's or preceding instruction's line number for the register allocator's spill, fill, copy, and rematerialization instructions; and also for phis, and for no-src-pos instructions. The assembler creates the same line number tables for copy-predecessor-line and for no-src-pos, but copy-predecessor produces better-looking assembly language output with -S and with GOSSAFUNC, and does not require changes to tests of existing assembly language.
Split "copyInto" into two cases, one for register allocation, one for otherwise. This caused the test score line change count to increase by one, which may reflect legitimately useful information preserved. Without any special treatment for copyInto, the change count increases by 21 more, from 51 to 72 (i.e., quite a lot).
There is a test; using two naive "scores" for line number churn, the old numbering is 2x or 4x worse.
Fixes #18902.
Change-Id: I0a0a69659d30ee4e5d10116a0dd2b8c5df8457b1
Reviewed-on: https://go-review.googlesource.com/36207
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
46b88c9fbc |
cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc. To avoid circular dependencies, we used an interface (ssa.Type) to represent type information in SSA.
In the Go 1.9 cycle, gri extricated the Type type from package gc. As a result, we can now use it in package ssa. Now, instead of package types depending on package ssa, it is the other way around. This is a more sensible dependency tree, and helps compiler performance a bit.
Though this is a big CL, most of the changes are mechanical and uninteresting.
Interesting bits:
* Add new singleton globals to package types for the special SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types, and TTUPLE, for SSA tuple types. ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt to package types.
* We had picked the name "types" in our rules for the handy list of types provided by ssa.Config. That conflicted with the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem, and probably also some other duplicated Type methods designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types, and they were not particularly careful about types in general. Of necessity, this CL switches them to use *types.Type; it does not make them more type-accurate. Unfortunately, using types.Type means initializing a bit of the types universe. This is prime for refactoring and improvement.
This shrinks ssa.Value; it now fits in a smaller size class on 64-bit systems. This doesn't have a giant impact, though, since most Values are preallocated in a chunk.
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8)
Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10)
GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10)
GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9)
Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8)
Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10)
XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10)
[Geo mean] 40.5MB 40.3MB -0.68%
name old allocs/op new allocs/op delta
Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9)
Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10)
GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10)
Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10)
GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9)
Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8)
Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10)
XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10)
[Geo mean] 428k 428k -0.01%
Removing all the interface calls helps non-trivially with CPU, though.
name old time/op new time/op delta
Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96)
Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96)
GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96)
Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99)
GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97)
Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99)
Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94)
XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95)
[Geo mean] 178ms 173ms -2.65%
name old user-time/op new user-time/op delta
Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99)
Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95)
GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99)
Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96)
GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100)
Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92)
Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100)
XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97)
[Geo mean] 220ms 213ms -2.76%
Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
1e3570ac86 |
cmd/internal/objabi: extract shared functionality from obj
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing the assembler backends no longer requires reinstalling cmd/link or cmd/addr2line. There's also now one canonical definition of the object file format in cmd/internal/objabi/doc.go, with a warning to update all three implementations. objabi is still something of a grab bag of unrelated code (e.g., flag and environment variable handling probably belong in a separate "tool" package), but this is still progress. Fixes #15165. Fixes #20026. Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c Reviewed-on: https://go-review.googlesource.com/40972 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> |
|
|
|
8a5175df35 |
cmd/compile: improve startRegs calculation
In register allocation, we calculate what values are used in and after the current block. If a value is used only after a function call, we don't need to mark it live at the entrance of the block, because the call clobbers registers anyway. Before this CL such a value was considered live, and unnecessary copies or loads could be generated when resolving merge edges.
Fixes #14761.
On AMD64:
name old time/op new time/op delta
BinaryTree17-12 2.84s ± 1% 2.81s ± 1% -1.06% (p=0.000 n=10+9)
Fannkuch11-12 3.61s ± 0% 3.55s ± 1% -1.77% (p=0.000 n=10+9)
FmtFprintfEmpty-12 50.4ns ± 4% 50.0ns ± 1% ~ (p=0.785 n=9+8)
FmtFprintfString-12 80.0ns ± 3% 78.2ns ± 3% -2.35% (p=0.004 n=10+9)
FmtFprintfInt-12 81.3ns ± 4% 81.8ns ± 2% ~ (p=0.159 n=10+10)
FmtFprintfIntInt-12 120ns ± 4% 118ns ± 2% ~ (p=0.218 n=10+10)
FmtFprintfPrefixedInt-12 152ns ± 3% 155ns ± 2% +2.11% (p=0.026 n=10+10)
FmtFprintfFloat-12 240ns ± 1% 238ns ± 1% -0.79% (p=0.005 n=9+9)
FmtManyArgs-12 504ns ± 1% 510ns ± 1% +1.14% (p=0.000 n=8+9)
GobDecode-12 7.00ms ± 1% 6.99ms ± 0% ~ (p=0.497 n=9+10)
GobEncode-12 5.47ms ± 1% 5.48ms ± 1% ~ (p=0.218 n=10+10)
Gzip-12 258ms ± 2% 256ms ± 1% -0.96% (p=0.043 n=10+9)
Gunzip-12 38.6ms ± 0% 38.3ms ± 0% -0.64% (p=0.000 n=9+8)
HTTPClientServer-12 90.4µs ± 3% 87.2µs ±11% ~ (p=0.053 n=9+10)
JSONEncode-12 15.6ms ± 0% 15.6ms ± 1% ~ (p=0.077 n=9+9)
JSONDecode-12 55.1ms ± 1% 54.6ms ± 1% -0.85% (p=0.010 n=10+9)
Mandelbrot200-12 4.49ms ± 0% 4.47ms ± 0% -0.25% (p=0.000 n=10+8)
GoParse-12 3.38ms ± 0% 3.37ms ± 1% ~ (p=0.315 n=8+10)
RegexpMatchEasy0_32-12 82.5ns ± 4% 82.0ns ± 0% ~ (p=0.164 n=10+8)
RegexpMatchEasy0_1K-12 203ns ± 1% 202ns ± 1% -0.85% (p=0.000 n=9+10)
RegexpMatchEasy1_32-12 82.3ns ± 1% 81.1ns ± 0% -1.39% (p=0.000 n=10+8)
RegexpMatchEasy1_1K-12 357ns ± 1% 357ns ± 1% ~ (p=0.697 n=8+9)
RegexpMatchMedium_32-12 125ns ± 2% 126ns ± 2% ~ (p=0.197 n=10+10)
RegexpMatchMedium_1K-12 39.6µs ± 3% 39.6µs ± 1% ~ (p=0.971 n=10+10)
RegexpMatchHard_32-12 1.99µs ± 2% 1.99µs ± 4% ~ (p=0.891 n=10+9)
RegexpMatchHard_1K-12 60.1µs ± 3% 60.4µs ± 3% ~ (p=0.684 n=10+10)
Revcomp-12 531ms ± 6% 441ms ± 0% -16.94% (p=0.000 n=10+9)
Template-12 58.9ms ± 1% 58.7ms ± 1% ~ (p=0.315 n=10+10)
TimeParse-12 319ns ± 1% 320ns ± 4% ~ (p=0.215 n=9+9)
TimeFormat-12 345ns ± 0% 333ns ± 1% -3.36% (p=0.000 n=9+10)
[Geo mean] 52.2µs 51.6µs -1.13%
On ARM64:
name old time/op new time/op delta
BinaryTree17-8 8.53s ± 0% 8.36s ± 0% -1.89% (p=0.000 n=10+10)
Fannkuch11-8 6.15s ± 0% 6.10s ± 0% -0.67% (p=0.000 n=10+10)
FmtFprintfEmpty-8 117ns ± 0% 117ns ± 0% ~ (all equal)
FmtFprintfString-8 192ns ± 0% 192ns ± 0% ~ (all equal)
FmtFprintfInt-8 198ns ± 0% 198ns ± 0% ~ (p=0.211 n=10+10)
FmtFprintfIntInt-8 289ns ± 0% 291ns ± 0% +0.59% (p=0.000 n=7+10)
FmtFprintfPrefixedInt-8 320ns ± 2% 317ns ± 0% ~ (p=0.431 n=10+8)
FmtFprintfFloat-8 538ns ± 0% 538ns ± 0% ~ (all equal)
FmtManyArgs-8 1.17µs ± 1% 1.18µs ± 1% ~ (p=0.063 n=10+10)
GobDecode-8 17.0ms ± 1% 17.2ms ± 1% +0.83% (p=0.000 n=10+10)
GobEncode-8 14.2ms ± 0% 14.1ms ± 1% -0.78% (p=0.001 n=9+10)
Gzip-8 806ms ± 0% 797ms ± 0% -1.12% (p=0.000 n=6+9)
Gunzip-8 131ms ± 0% 130ms ± 0% -0.51% (p=0.000 n=10+9)
HTTPClientServer-8 206µs ± 9% 212µs ± 2% ~ (p=0.829 n=10+8)
JSONEncode-8 40.1ms ± 0% 40.1ms ± 0% ~ (p=0.136 n=9+9)
JSONDecode-8 157ms ± 0% 151ms ± 0% -3.32% (p=0.000 n=9+9)
Mandelbrot200-8 10.1ms ± 0% 10.1ms ± 0% -0.05% (p=0.000 n=9+8)
GoParse-8 8.43ms ± 0% 8.43ms ± 0% ~ (p=0.912 n=10+10)
RegexpMatchEasy0_32-8 228ns ± 1% 227ns ± 0% -0.26% (p=0.026 n=10+9)
RegexpMatchEasy0_1K-8 1.92µs ± 0% 1.63µs ± 0% -15.18% (p=0.001 n=7+7)
RegexpMatchEasy1_32-8 258ns ± 1% 250ns ± 0% -2.83% (p=0.000 n=10+10)
RegexpMatchEasy1_1K-8 2.39µs ± 0% 2.13µs ± 0% -10.94% (p=0.000 n=9+9)
RegexpMatchMedium_32-8 352ns ± 0% 351ns ± 0% -0.29% (p=0.004 n=9+10)
RegexpMatchMedium_1K-8 104µs ± 0% 105µs ± 0% +0.58% (p=0.000 n=8+9)
RegexpMatchHard_32-8 5.84µs ± 0% 5.82µs ± 0% -0.27% (p=0.000 n=9+10)
RegexpMatchHard_1K-8 177µs ± 0% 177µs ± 0% -0.07% (p=0.000 n=9+9)
Revcomp-8 1.57s ± 1% 1.50s ± 1% -4.60% (p=0.000 n=9+10)
Template-8 157ms ± 1% 153ms ± 1% -2.28% (p=0.000 n=10+9)
TimeParse-8 779ns ± 1% 770ns ± 1% -1.18% (p=0.013 n=10+10)
TimeFormat-8 823ns ± 2% 826ns ± 1% ~ (p=0.324 n=10+9)
[Geo mean] 144µs 142µs -1.45%
Reduce cmd/go text size by 0.5%.
Change-Id: I9288ff983c4a7cf03fc0cb35b9b1750828013117
Reviewed-on: https://go-review.googlesource.com/38457
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org> |
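A simplified sketch of the startRegs idea: only values read before (or by) the block's first call are worth having in registers on entry. `instr` and `startRegCandidates` are hypothetical names, and the real allocator works on SSA values with use distances rather than this flat list. The assumption is that a call clobbers every register, so anything used only after it must be reloaded regardless.

```go
package main

import "fmt"

// instr is a hypothetical simplified instruction: the IDs of the values
// it reads and whether it is a call that clobbers every register.
type instr struct {
	args   []int
	isCall bool
}

// startRegCandidates returns the values worth having in registers on
// entry to a block. Values read up to and including the first call
// qualify. Anything used only after a call (including values live out
// of the block) must be reloaded after the call anyway, so keeping it
// in a register on entry would only add copies or loads on merge edges.
func startRegCandidates(block []instr, liveOut []int) map[int]bool {
	want := map[int]bool{}
	for _, in := range block {
		for _, a := range in.args {
			want[a] = true
		}
		if in.isCall {
			return want // registers are clobbered from here on
		}
	}
	for _, v := range liveOut {
		want[v] = true // no call in the block: live-out values qualify too
	}
	return want
}

func main() {
	block := []instr{
		{args: []int{1, 2}},
		{isCall: true},
		{args: []int{3}}, // v3 is only used after the call
	}
	fmt.Println(startRegCandidates(block, []int{4})) // map[1:true 2:true]
}
```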
|
|
|
34975095d0 |
cmd/compile: provide pos and curfn to temp
Concurrent compilation requires providing an explicit position and curfn to temp. This implementation of tempAt temporarily continues to use the globals lineno and Curfn, so as not to collide with mdempsky's work for #19683 eliminating the Curfn dependency from func nod. Updates #15756 Updates #19683 Change-Id: Ib3149ca4b0740e9f6eea44babc6f34cdd63028a9 Reviewed-on: https://go-review.googlesource.com/38592 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> |
|
|
|
27bc723b51 |
cmd/compile: initialize loop depths
Regalloc uses loop depths - make sure they are initialized! Test to make sure we aren't pushing spills into loops. This fixes a generated-code performance bug introduced with the better spill placement change: https://go-review.googlesource.com/c/34822/ Update #19595 Change-Id: Ib9f0da6fb588503518847d7aab51e569fd3fa61e Reviewed-on: https://go-review.googlesource.com/38434 Reviewed-by: David Chase <drchase@google.com> |
|
|
|
aea3aff669 |
cmd/compile: separate ssa.Frontend and ssa.TypeSource
Prior to this CL, the ssa.Frontend field was responsible for providing types to the backend during compilation. However, the types needed by the backend are few and static. It makes more sense to use a struct for them and to hang that struct off the ssa.Config, which is the correct home for readonly data. Now that Types is a struct, we can clean up the names a bit as well. This has the added benefit of allowing early construction of all types needed by the backend. This will be useful for concurrent backend compilation. Passes toolstash-check -all. No compiler performance change. Updates #15756 Change-Id: I021658c8cf2836d6a22bbc20cc828ac38c7da08a Reviewed-on: https://go-review.googlesource.com/38336 Reviewed-by: Matthew Dempsky <mdempsky@google.com> |
|
|
|
2cdb7f118a |
cmd/compile: move Frontend field from ssa.Config to ssa.Func
Suggested by mdempsky in CL 38232. This allows us to use the Frontend field to associate frontend state and information with a function. See the following CL in the series for examples. This is a giant CL, but it is almost entirely routine refactoring. The ssa test API is starting to feel a bit unwieldy. I will clean it up separately, once the dust has settled. Passes toolstash -cmp. Updates #15756 Change-Id: I71c573bd96ff7251935fce1391b06b1f133c3caf Reviewed-on: https://go-review.googlesource.com/38327 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> |
|
|
|
a5e3cac895 |
cmd/compile: rearrange fields between ssa.Func, ssa.Cache, and ssa.Config
This makes ssa.Func, ssa.Cache, and ssa.Config fulfill the roles laid out for them in CL 38160. The only non-trivial change in this CL is how cached values and blocks get IDs. Prior to this CL, their IDs were assigned as part of resetting the cache, and only modified IDs were reset. This required knowing how many values and blocks were modified, which required a tight coupling between ssa.Func and ssa.Config. To eliminate that coupling, we now zero values and blocks during reset, and assign their IDs when they are used. Since unused values and blocks have ID == 0, we can efficiently find the last used value/block, to avoid zeroing everything. Bulk zeroing is efficient, but not efficient enough to obviate the need to avoid zeroing everything every time. As a happy side-effect, ssa.Func.Free is no longer necessary. DebugHashMatch and friends now belong in func.go. They have been left in place for clarity and review. I will move them in a subsequent CL. Passes toolstash -cmp. No compiler performance impact. No change in 'go test cmd/compile/internal/ssa' execution time. Change-Id: I2eb7af58da067ef6a36e815a6f386cfe8634d098 Reviewed-on: https://go-review.googlesource.com/38167 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
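The ID scheme described above can be sketched as follows. The sketch assumes a fixed 64-slot cache and a pared-down `Value` type, both hypothetical. IDs are assigned when a slot is handed out, and `Reset` relies on the used slots forming a prefix with nonzero IDs. Using `sort.Search` to find the end of that prefix is this sketch's own choice; it is one way to "efficiently find the last used value".

```go
package main

import (
	"fmt"
	"sort"
)

// Value is a pared-down SSA value (hypothetical).
type Value struct {
	ID int32 // 0 marks an unused cache slot; live values get IDs from 1
	Op string
}

// Cache is a pool of preallocated Values reused by successive functions.
type Cache struct {
	values [64]Value
}

// Reset zeroes every slot handed out since the last Reset. Slots are handed
// out in ID order, so the used ones form a prefix with nonzero IDs; binary
// search finds the end of that prefix without scanning the unused tail.
func (c *Cache) Reset() {
	n := sort.Search(len(c.values), func(i int) bool { return c.values[i].ID == 0 })
	for i := range c.values[:n] {
		c.values[i] = Value{}
	}
}

// Func allocates Values, taking cache slots first and falling back to the heap.
type Func struct {
	cache  *Cache
	lastID int32
}

// NewValue assigns the ID at allocation time rather than during Reset.
func (f *Func) NewValue(op string) *Value {
	f.lastID++
	id := f.lastID
	var v *Value
	if int(id) <= len(f.cache.values) {
		v = &f.cache.values[id-1] // zero from construction or the last Reset
	} else {
		v = new(Value)
	}
	v.ID = id
	v.Op = op
	return v
}

func main() {
	c := &Cache{}
	f := &Func{cache: c}
	for i := 0; i < 3; i++ {
		f.NewValue("Add64")
	}
	c.Reset()
	g := &Func{cache: c} // the next function reuses the same slots
	fmt.Println(g.NewValue("Const64").ID) // 1
}
```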
|
|
|
886e9e6065 |
cmd/compile: put spills in better places
Previously we always issued a spill right after the op
that was being spilled. This CL pushes spills farther away
from the generator, hopefully pushing them into unlikely branches.
For example:

  x = ...
  if unlikely {
    call ...
  }
  ... use x ...

Used to compile to

  x = ...
  spill x
  if unlikely {
    call ...
    restore x
  }

It now compiles to

  x = ...
  if unlikely {
    spill x
    call ...
    restore x
  }

This is particularly useful for code which appends, as the only
call is an unlikely call to growslice. It also helps for the
spills needed around write barrier calls.
The basic algorithm is to walk down the dominator tree following a
path where the block still dominates all of the restores. We're
looking for a block that:
1) dominates all restores
2) has the value being spilled in a register
3) has a loop depth no deeper than the value being spilled
The walking-down code is iterative. I was forced to limit it to
searching 100 blocks so it doesn't become O(n^2). Maybe one day
we'll find a better way.
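The walk can be sketched as below. This is a simplification under assumed inputs, not the regalloc code:

- The dominator tree is precomputed; `Children` lists the immediately dominated blocks.
- `inReg` is a hypothetical callback that reports whether the value is in a register at the start of a block.
- Dominance queries use DFS entry/exit numbers.

Sibling subtrees are disjoint, so at most one child can dominate every restore, which makes the path unique. The 100-block cap comes from the text above.

```go
package main

import "fmt"

// Block is a node of a precomputed dominator tree (hypothetical, simplified).
type Block struct {
	ID        int
	Children  []*Block // blocks immediately dominated by this one
	LoopDepth int
	entry     int // DFS interval over the dominator tree,
	exit      int // used for O(1) dominance queries
}

// number assigns DFS entry/exit numbers starting at n and returns the next free number.
func number(b *Block, n int) int {
	b.entry = n
	n++
	for _, c := range b.Children {
		n = number(c, n)
	}
	b.exit = n
	return n + 1
}

// dominates reports whether a dominates b (a block dominates itself).
func dominates(a, b *Block) bool {
	return a.entry <= b.entry && b.exit <= a.exit
}

func dominatesAll(a *Block, bs []*Block) bool {
	for _, b := range bs {
		if !dominates(a, b) {
			return false
		}
	}
	return true
}

// maxWalk bounds how far down the tree the search goes.
const maxWalk = 100

// placeSpill walks down from the defining block along the path of blocks that
// still dominate every restore. It returns the deepest block on that path that
// has the value in a register and is no deeper in a loop than the definition.
func placeSpill(def *Block, restores []*Block, inReg func(*Block) bool) *Block {
	if len(restores) == 0 {
		return nil // nothing is restored, so no spill is needed
	}
	best, b := def, def
	for walked := 0; walked < maxWalk; walked++ {
		var next *Block
		for _, c := range b.Children {
			if dominatesAll(c, restores) {
				next = c
				break
			}
		}
		if next == nil {
			break
		}
		b = next
		if inReg(b) && b.LoopDepth <= def.LoopDepth {
			best = b
		}
	}
	return best
}

func main() {
	// b1 computes x and branches; b2 is the unlikely branch (call + restore);
	// b3 is the join that uses x.
	b2 := &Block{ID: 2}
	b3 := &Block{ID: 3}
	b1 := &Block{ID: 1, Children: []*Block{b2, b3}}
	number(b1, 0)
	always := func(*Block) bool { return true }
	fmt.Println(placeSpill(b1, []*Block{b2}, always).ID) // 2
}
```

In `main`, the spill lands in block 2, the unlikely branch. This matches the before/after example earlier in this message. If block 2 were inside a loop, the depth check would keep the spill in block 1.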
I had to delete most of David's code which pushed spills out of loops.
I suspect this CL subsumes most of the cases that his code handled.
Generally positive performance improvements, but hard to tell for sure
with all the noise. (compilebench times are unchanged.)
name                     old time/op   new time/op   delta
BinaryTree17-12           2.91s ±15%    2.80s ±12%    ~     (p=0.063 n=10+10)
Fannkuch11-12             3.47s ± 0%    3.30s ± 4%  -4.91%  (p=0.000 n=9+10)
FmtFprintfEmpty-12       48.0ns ± 1%   47.4ns ± 1%  -1.32%  (p=0.002 n=9+9)
FmtFprintfString-12      85.6ns ±11%   79.4ns ± 3%  -7.27%  (p=0.005 n=10+10)
FmtFprintfInt-12         91.8ns ±10%   85.9ns ± 4%    ~     (p=0.203 n=10+9)
FmtFprintfIntInt-12       135ns ±13%    127ns ± 1%  -5.72%  (p=0.025 n=10+9)
FmtFprintfPrefixedInt-12  167ns ± 1%    168ns ± 2%    ~     (p=0.580 n=9+10)
FmtFprintfFloat-12        249ns ±11%    230ns ± 1%  -7.32%  (p=0.000 n=10+10)
FmtManyArgs-12            504ns ± 7%    506ns ± 1%    ~     (p=0.198 n=9+9)
GobDecode-12             6.95ms ± 1%   7.04ms ± 1%  +1.37%  (p=0.001 n=10+10)
GobEncode-12             6.32ms ±13%   6.04ms ± 1%    ~     (p=0.063 n=10+10)
Gzip-12                   233ms ± 1%    235ms ± 0%  +1.01%  (p=0.000 n=10+9)
Gunzip-12                40.1ms ± 1%   39.6ms ± 0%  -1.12%  (p=0.000 n=10+8)
HTTPClientServer-12       227µs ± 9%    221µs ± 5%    ~     (p=0.114 n=9+8)
JSONEncode-12            16.1ms ± 2%   15.8ms ± 1%  -2.09%  (p=0.002 n=9+8)
JSONDecode-12            61.8ms ±11%   57.9ms ± 1%  -6.30%  (p=0.000 n=10+9)
Mandelbrot200-12         4.30ms ± 3%   4.28ms ± 1%    ~     (p=0.203 n=10+8)
GoParse-12               3.18ms ± 2%   3.18ms ± 2%    ~     (p=0.579 n=10+10)
RegexpMatchEasy0_32-12   76.7ns ± 1%   77.5ns ± 1%  +0.92%  (p=0.002 n=9+8)
RegexpMatchEasy0_1K-12    239ns ± 3%    239ns ± 1%    ~     (p=0.204 n=10+10)
RegexpMatchEasy1_32-12   71.4ns ± 1%   70.6ns ± 0%  -1.15%  (p=0.000 n=10+9)
RegexpMatchEasy1_1K-12    383ns ± 2%    390ns ±10%    ~     (p=0.181 n=8+9)
RegexpMatchMedium_32-12   114ns ± 0%    113ns ± 1%  -0.88%  (p=0.000 n=9+8)
RegexpMatchMedium_1K-12  36.3µs ± 1%   36.8µs ± 1%  +1.59%  (p=0.000 n=10+8)
RegexpMatchHard_32-12    1.90µs ± 1%   1.90µs ± 1%    ~     (p=0.341 n=10+10)
RegexpMatchHard_1K-12    59.4µs ±11%   57.8µs ± 1%    ~     (p=0.968 n=10+9)
Revcomp-12                461ms ± 1%    462ms ± 1%    ~     (p=1.000 n=9+9)
Template-12              67.5ms ± 1%   66.3ms ± 1%  -1.77%  (p=0.000 n=10+8)
TimeParse-12              314ns ± 3%    309ns ± 0%  -1.56%  (p=0.000 n=9+8)
TimeFormat-12             340ns ± 2%    331ns ± 1%  -2.79%  (p=0.000 n=10+10)
The go binary is 0.2% larger. Not really sure why the size
would change.
Change-Id: Ia5116e53a3aeb025ef350ffc51c14ae5cc17871c
Reviewed-on: https://go-review.googlesource.com/34822
Reviewed-by: David Chase <drchase@google.com>
|