mirror of https://github.com/golang/go.git
10 Commits
ee330385ca |
cmd/internal/obj, runtime: preempt & restart some instruction sequences
On some architectures, for async preemption the injected call needs to clobber a register (usually REGTMP) in order to return to the preempted function. As a consequence, the PC ranges where REGTMP is live are not preemptible.

The uses of REGTMP are usually generated by the assembler, where it needs to load or materialize a large constant or offset that doesn't fit into the instruction. In those cases, REGTMP is not live at the start of the instruction sequence. Instead of giving up preemption in those cases, we could preempt it and restart the sequence when resuming the execution. Basically, this is like reissuing an interrupted instruction, except that here the "instruction" is a Prog that consists of multiple machine instructions. For this to work, we need to generate PC data to mark the start of the Prog.

Currently this is only done for ARM64.

TODO: the split-stack function prologue is currently not async preemptible. We could use this mechanism, preempt it and restart at the function entry.

Change-Id: I37cb282f8e606e7ab6f67b3edfdc6063097b4bd1
Reviewed-on: https://go-review.googlesource.com/c/go/+/208126
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

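To make the restart idea concrete, here is a minimal Go model of it. Everything here (names, PCs, the flat map standing in for the PC data stream) is invented for illustration; the real implementation lives in cmd/internal/obj and the runtime:

```go
package main

import "fmt"

// seqStart maps a PC inside a multi-instruction Prog to the PC at which
// the whole sequence begins. The PCs and encoding are hypothetical.
var seqStart = map[int]int{
	0x10: 0x10, // an ordinary, independently preemptible instruction
	0x14: 0x14, // first instruction of a "materialize large constant" Prog
	0x18: 0x14, // middle of the sequence: REGTMP is live here
	0x1c: 0x14, // last instruction of the sequence
}

// resumePC returns the PC at which to resume a goroutine preempted at pc.
// Rather than refusing to preempt while REGTMP is live, the runtime rolls
// back to the start of the sequence and reissues it, as if the whole Prog
// were a single interrupted instruction.
func resumePC(pc int) int { return seqStart[pc] }

func main() {
	fmt.Printf("%#x\n", resumePC(0x18)) // 0x14: restart the whole sequence
}
```
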
9d812cfa5c |
cmd/compile,runtime: stack maps only at calls, remove register maps
Currently, we emit stack maps and register maps at almost every instruction. This was originally intended to support non-cooperative preemption, but was only ever used for debug call injection. Now debug call injection also uses conservative frame scanning. As a result, stack maps are only needed at call sites and register maps aren't needed at all, except that we happen to also encode unsafe-point information in the register map PCDATA stream.

This CL reduces stack maps to only appear at calls, and replaces full register maps with just safe/unsafe-point information. This is all protected by the go115ReduceLiveness feature flag, which is defined in both runtime and cmd/compile.

This CL significantly reduces binary sizes and also speeds up compiles and links:

| name | old exe-bytes | new exe-bytes | delta |
|---|---|---|---|
| BinGoSize | 15.0MB ± 0% | 14.1MB ± 0% | -5.72% |

| name | old pcln-bytes | new pcln-bytes | delta |
|---|---|---|---|
| BinGoSize | 3.14MB ± 0% | 2.48MB ± 0% | -21.08% |

| name | old time/op | new time/op | delta |
|---|---|---|---|
| Template | 178ms ± 7% | 172ms ±14% | -3.59% (p=0.005 n=19+19) |
| Unicode | 71.0ms ±12% | 69.8ms ±10% | ~ (p=0.126 n=18+18) |
| GoTypes | 655ms ± 8% | 615ms ± 8% | -6.11% (p=0.000 n=19+19) |
| Compiler | 3.27s ± 6% | 3.15s ± 7% | -3.69% (p=0.001 n=20+20) |
| SSA | 7.10s ± 5% | 6.85s ± 8% | -3.53% (p=0.001 n=19+20) |
| Flate | 124ms ±15% | 116ms ±22% | -6.57% (p=0.024 n=18+19) |
| GoParser | 156ms ±26% | 147ms ±34% | ~ (p=0.070 n=19+19) |
| Reflect | 406ms ± 9% | 387ms ±21% | -4.69% (p=0.028 n=19+20) |
| Tar | 163ms ±15% | 162ms ±27% | ~ (p=0.370 n=19+19) |
| XML | 223ms ±13% | 218ms ±14% | ~ (p=0.157 n=20+20) |
| LinkCompiler | 503ms ±21% | 484ms ±23% | ~ (p=0.072 n=20+20) |
| ExternalLinkCompiler | 1.27s ± 7% | 1.22s ± 8% | -3.85% (p=0.005 n=20+19) |
| LinkWithoutDebugCompiler | 294ms ±17% | 273ms ±11% | -7.16% (p=0.001 n=19+18) |

(https://perf.golang.org/search?q=upload:20200428.8)

The binary size improvement is even slightly better when you include the CLs leading up to this. Relative to the parent of "cmd/compile: mark PanicBounds/Extend as calls":

| name | old exe-bytes | new exe-bytes | delta |
|---|---|---|---|
| BinGoSize | 15.0MB ± 0% | 14.1MB ± 0% | -6.18% |

| name | old pcln-bytes | new pcln-bytes | delta |
|---|---|---|---|
| BinGoSize | 3.22MB ± 0% | 2.48MB ± 0% | -22.92% |

(https://perf.golang.org/search?q=upload:20200428.9)

For #36365.

Change-Id: I69448e714f2a44430067ca97f6b78e08c0abed27
Reviewed-on: https://go-review.googlesource.com/c/go/+/230544
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

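Under the reduced-liveness scheme, the runtime's preemption decision simplifies considerably. A minimal sketch, with invented names and flat maps standing in for the real compressed PCDATA/FUNCDATA encodings:

```go
package main

import "fmt"

// funcInfo models one function's metadata under the reduced-liveness
// scheme: precise stack maps exist only at call sites, and a separate
// stream records which PCs are unsafe points.
type funcInfo struct {
	stackMapAt map[int][]byte // call-site PC -> pointer bitmap for the frame
	unsafeAt   map[int]bool   // PC -> is this an unsafe point?
}

// canAsyncPreempt reports whether the runtime may inject a preemption at
// pc. With stack maps only at call sites, an async preemption at any other
// PC scans the frame conservatively, so the remaining requirement is just
// that pc not be an unsafe point.
func (f *funcInfo) canAsyncPreempt(pc int) bool { return !f.unsafeAt[pc] }

func main() {
	f := &funcInfo{
		stackMapAt: map[int][]byte{0x20: {0b101}}, // precise map only at the call
		unsafeAt:   map[int]bool{0x14: true},
	}
	fmt.Println(f.canAsyncPreempt(0x14), f.canAsyncPreempt(0x18)) // false true
}
```
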
faafdf5115 |
cmd/compile: fix unsafe-points with stack maps
The compiler currently conflates whether a Value has a stack map with whether it's an unsafe point. For the most part, unsafe-points don't have stack maps, so this is mostly fine, but call instructions can be both an unsafe-point *and* have a stack map. For example, none of the instructions in a nosplit function should be preemptible, but calls must still have stack maps in case the called function grows the stack or gets preempted. Currently, the compiler can't distinguish this case, so calls in nosplit functions are marked as safe-points just because they have stack maps. This is particularly problematic if a nosplit function calls another nosplit function, since this can introduce a preemption point where there should be none.

We realized this was a problem for split-stack prologues a while back, and CL 207349 changed the encoding of unsafe-points to use the register map index instead of the stack map index so we could record both a stack map and an unsafe-point at the same instruction. But this was never extended into the compiler.

This CL fixes this problem in the compiler. We make LivenessIndex slightly more abstract by separating unsafe-point marks from stack and register map indexes. We map this to the PCDATA encoding later when producing Progs. This isn't enough to fix the whole problem for nosplit functions, because obj still adds prologues and marks those as preemptible, but it's a step in the right direction.

I checked this CL by comparing maps before and after this change in the runtime and net/http. In net/http, unsafe-points match exactly; at anything that isn't an unsafe-point, both the stack and register maps are unchanged by this CL. In the runtime, at every point that was a safe-point before this change, the stack maps agree (and mostly the runtime doesn't have register maps at all now). In both, all CALLs (except write barrier calls) have stack maps.

For #36365.

Change-Id: I066628938b02e78be5c81a6614295bcf7cc566c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/230541
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

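The shape of the fix can be sketched as follows; the field names are illustrative, not the actual cmd/compile definitions:

```go
package main

import "fmt"

const noStackMap = -1

// livenessIndex separates "has a stack map" from "is an unsafe point",
// instead of conflating the two in one index.
type livenessIndex struct {
	stackMapIndex int  // index into the stack map table, or noStackMap
	isUnsafePoint bool // tracked independently of the stack map
}

// describe mimics flattening the pair into the PCDATA encoding when Progs
// are produced. A call in a nosplit function can now carry a stack map and
// still be marked unsafe, rather than being treated as a safe point just
// because a map exists.
func describe(li livenessIndex) string {
	switch {
	case li.isUnsafePoint && li.stackMapIndex >= 0:
		return fmt.Sprintf("unsafe point, stack map %d", li.stackMapIndex)
	case li.isUnsafePoint:
		return "unsafe point, no stack map"
	case li.stackMapIndex >= 0:
		return fmt.Sprintf("safe point, stack map %d", li.stackMapIndex)
	default:
		return "no liveness info"
	}
}

func main() {
	callInNosplit := livenessIndex{stackMapIndex: 3, isUnsafePoint: true}
	fmt.Println(describe(callInNosplit)) // unsafe point, stack map 3
	fmt.Println(describe(livenessIndex{stackMapIndex: noStackMap}))
}
```
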
be64a19d99 |
cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata
Generate inline code at defer time to save the args of defer calls to unique (autotmp) stack slots, and generate inline code at exit time to check which defer calls were made and make the associated function/method/interface calls. We remember that a particular defer statement was reached by setting a bit in the deferBits variable (which is always stored on the stack). At exit time, we check the bits of the deferBits variable to determine which defer function calls to make (in reverse order). These low-cost defers are only used for functions where no defers appear in loops. In addition, we don't do these low-cost defers if there are too many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and for each defer, the stack slots where the closure and associated args have been stored. The funcdata also includes the location of the deferBits variable. Therefore, for panics, we can use this funcdata to determine exactly which defers are active, and call the appropriate functions/methods/closures with the correct arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra code segment to functions with open-coded defers that simply calls deferreturn() and returns. This segment is not reachable by the normal function, but is returned to by the runtime during recovery. We set the liveness information of this deferreturn() to be the same as the liveness at the first function call during the last defer exit code (so all return values and all stack slots needed by the defer calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer being used at each defer statement (heap-allocated, stack-allocated, or open-coded).

Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]:

- With normal (stack-allocated) defers only: 35.4 ns/op
- With open-coded defers: 5.6 ns/op
- Cost of function call alone (remove defer keyword): 4.4 ns/op

Text size increase (including funcdata) for go binary without/with open-coded defers: 0.09%. The average size increase (including funcdata) for only the functions that use open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic processing now requires a scan of the stack for open-coded defer frames. This scan is required, even if no frames are using open-coded defers.

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]:

- Without open-coded defers: 62.0 ns/op
- With open-coded defers: 255 ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers.

CGO Go-to-C-to-Go benchmark [ cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]:

- Without open-coded defers: 443 ns/op
- With open-coded defers: 347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements <austin@google.com>

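The deferBits scheme is easiest to see at the source level. The following is a hand-written approximation of what the compiler conceptually generates for a function with two defer sites; the real lowering happens inside the compiler, and this sketch omits the spilled closures and arguments:

```go
package main

import "fmt"

func a() { fmt.Println("a") }
func b() { fmt.Println("b") }

// f is written the way the compiler conceptually rewrites
//
//	func f(cond bool) {
//		defer a()
//		if cond {
//			defer b()
//		}
//	}
//
// under open-coded defers.
func f(cond bool) {
	var deferBits uint8 // one bit per defer site, always on the stack

	deferBits |= 1 << 0 // "defer a()" was reached
	if cond {
		deferBits |= 1 << 1 // "defer b()" was reached
	}

	// ... function body ...

	// Inline exit code: run the defers that were reached, in reverse order.
	if deferBits&(1<<1) != 0 {
		b()
	}
	if deferBits&(1<<0) != 0 {
		a()
	}
}

func main() { f(true) } // prints "b" then "a"
```
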
b76e6f8825 |
Revert "cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata"
This reverts CL 190098.

Reason for revert: broke several builders.

Change-Id: I69161352f9ded02537d8815f259c4d391edd9220
Reviewed-on: https://go-review.googlesource.com/c/go/+/201519
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Dan Scales <danscales@google.com>

dad616375f |
cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata
Generate inline code at defer time to save the args of defer calls to unique (autotmp) stack slots, and generate inline code at exit time to check which defer calls were made and make the associated function/method/interface calls. We remember that a particular defer statement was reached by setting a bit in the deferBits variable (which is always stored on the stack). At exit time, we check the bits of the deferBits variable to determine which defer function calls to make (in reverse order). These low-cost defers are only used for functions where no defers appear in loops. In addition, we don't do these low-cost defers if there are too many defer statements or too many exits in a function (to limit code increase).

When a function uses open-coded defers, we produce extra FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and for each defer, the stack slots where the closure and associated args have been stored. The funcdata also includes the location of the deferBits variable. Therefore, for panics, we can use this funcdata to determine exactly which defers are active, and call the appropriate functions/methods/closures with the correct arguments for each active defer.

In order to unwind the stack correctly after a recover(), we need to add an extra code segment to functions with open-coded defers that simply calls deferreturn() and returns. This segment is not reachable by the normal function, but is returned to by the runtime during recovery. We set the liveness information of this deferreturn() to be the same as the liveness at the first function call during the last defer exit code (so all return values and all stack slots needed by the defer calls will be live).

I needed to increase the stackguard constant from 880 to 896, because of a small amount of new code in deferreturn().

The -N flag disables open-coded defers. '-d defer' prints out the kind of defer being used at each defer statement (heap-allocated, stack-allocated, or open-coded).

Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]:

- With normal (stack-allocated) defers only: 35.4 ns/op
- With open-coded defers: 5.6 ns/op
- Cost of function call alone (remove defer keyword): 4.4 ns/op

Text size increase (including funcdata) for go cmd without/with open-coded defers: 0.09%. The average size increase (including funcdata) for only the functions that use open-coded defers is 1.1%.

The cost of a panic followed by a recover got noticeably slower, since panic processing now requires a scan of the stack for open-coded defer frames. This scan is required, even if no frames are using open-coded defers.

Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]:

- Without open-coded defers: 62.0 ns/op
- With open-coded defers: 255 ns/op

A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers.

CGO Go-to-C-to-Go benchmark [ cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]:

- Without open-coded defers: 443 ns/op
- With open-coded defers: 347 ns/op

Updates #14939 (defer performance)
Updates #34481 (design doc)

Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28
Reviewed-on: https://go-review.googlesource.com/c/go/+/190098
Reviewed-by: Austin Clements <austin@google.com>

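This is the original landing of the CL re-landed earlier in this list; rather than repeat the deferBits example given there, here is a sketch of the funcdata half of the design: a record describing where each defer's closure and arguments were spilled, plus the lookup the runtime would perform during a panic. The layout and names below are invented for illustration; the commit defines the real encoding:

```go
package main

import "fmt"

// openDeferSlot records where one defer site spilled its closure and
// arguments. All offsets here are hypothetical.
type openDeferSlot struct {
	closureOffset int // frame offset of the saved closure
	argsOffset    int // frame offset of the saved arguments
	argsSize      int
}

// openCodedDeferInfo is a stand-in for FUNCDATA_OpenCodedDeferInfo.
type openCodedDeferInfo struct {
	deferBitsOffset int             // frame offset of the deferBits byte
	slots           []openDeferSlot // one per defer site, in source order
}

// activeDefers returns the slots whose defer statements were actually
// reached, in the reverse order a panic would need to call them.
func activeDefers(info openCodedDeferInfo, deferBits uint8) []openDeferSlot {
	var out []openDeferSlot
	for i := len(info.slots) - 1; i >= 0; i-- {
		if deferBits&(1<<uint(i)) != 0 {
			out = append(out, info.slots[i])
		}
	}
	return out
}

func main() {
	info := openCodedDeferInfo{
		deferBitsOffset: -8,
		slots: []openDeferSlot{
			{closureOffset: -16, argsOffset: -24, argsSize: 8},
			{closureOffset: -32, argsOffset: -48, argsSize: 16},
		},
	}
	// Only the first defer site was reached before the panic.
	fmt.Println(len(activeDefers(info, 0b01))) // 1
}
```
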
4aeac68c92 |
runtime, cmd/compile: re-order PCDATA and FUNCDATA indices
The pclntab encoding supports writing only some PCDATA and FUNCDATA values. However, the encoding is dense: the max index in use determines the space used. We should thus choose a numbering in which frequently used indices are smaller.

This change re-orders the PCDATA and FUNCDATA indices using that principle, using a quick and dirty instrumentation to measure index frequency. It shrinks binaries by about 0.5%.

Updates #6853

| file | before | after | Δ | % |
|---|---|---|---|---|
| go | 14745044 | 14671316 | -73728 | -0.500% |
| addr2line | 4305128 | 4280552 | -24576 | -0.571% |
| api | 6095800 | 6058936 | -36864 | -0.605% |
| asm | 4930928 | 4906352 | -24576 | -0.498% |
| buildid | 2881520 | 2861040 | -20480 | -0.711% |
| cgo | 4896584 | 4867912 | -28672 | -0.586% |
| compile | 25868408 | 25770104 | -98304 | -0.380% |
| cover | 5319656 | 5286888 | -32768 | -0.616% |
| dist | 3654528 | 3634048 | -20480 | -0.560% |
| doc | 4719672 | 4691000 | -28672 | -0.607% |
| fix | 3418312 | 3393736 | -24576 | -0.719% |
| link | 6137952 | 6109280 | -28672 | -0.467% |
| nm | 4250536 | 4225960 | -24576 | -0.578% |
| objdump | 4665192 | 4636520 | -28672 | -0.615% |
| pack | 2297488 | 2285200 | -12288 | -0.535% |
| pprof | 14735332 | 14657508 | -77824 | -0.528% |
| test2json | 2834952 | 2818568 | -16384 | -0.578% |
| trace | 11679964 | 11618524 | -61440 | -0.526% |
| vet | 8452696 | 8403544 | -49152 | -0.581% |

Change-Id: I30665dce57ec7a52e7d3c6718560b3aa5b83dd0b
Reviewed-on: https://go-review.googlesource.com/c/go/+/171760
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

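A toy model shows why the numbering matters under a dense encoding. This is not the real pclntab encoder, only an illustration of the size rule the commit describes:

```go
package main

import "fmt"

// tableSlots returns how many funcdata slots a function must encode when
// it uses the given indices: the encoding is dense, so the max index in
// use determines the space.
func tableSlots(indices []int) int {
	max := -1
	for _, i := range indices {
		if i > max {
			max = i
		}
	}
	return max + 1
}

func main() {
	// A funcdata that nearly every function carries, placed at index 5,
	// forces those functions to encode 6 slots each...
	fmt.Println(tableSlots([]int{5})) // 6
	// ...while the same funcdata at index 1 costs only 2 slots.
	fmt.Println(tableSlots([]int{1})) // 2
}
```
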
cbafcc55e8 |
cmd/compile,runtime: implement stack objects
Rework how the compiler+runtime handles stack-allocated variables whose address is taken. Direct references to such variables work as before. References through pointers, however, use a new mechanism. The new mechanism is more precise than the old "ambiguously live" mechanism. It computes liveness at runtime based on the actual references among objects on the stack.

Each function records all of its address-taken objects in a FUNCDATA. These are called "stack objects". The runtime then uses that information while scanning a stack to find all of the stack objects on a stack. It then does a mark phase on the stack objects, using all the pointers found on the stack (and ancillary structures, like defer records) as the root set. Only stack objects which are found to be live during this mark phase will be scanned and thus retain any heap objects they point to.

A subsequent CL will remove all the "ambiguously live" logic from the compiler, so that the stack object tracing will be required. For this CL, the stack tracing is all redundant with the current ambiguously live logic.

Update #22350

Change-Id: Ide19f1f71a5b6ec8c4d54f8f66f0e9a98344772f
Reviewed-on: https://go-review.googlesource.com/c/134155
Reviewed-by: Austin Clements <austin@google.com>

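A simplified model of the stack-object mark phase, with invented types; the real version also follows pointers between stack objects transitively, which this sketch omits:

```go
package main

import "fmt"

// stackObject models one address-taken variable recorded in the funcdata.
type stackObject struct {
	offset int  // frame offset of the variable
	marked bool // set during the stack-object mark phase
}

// markStackObjects treats pointers found while scanning the stack as
// roots and marks only the stack objects they reach. Unmarked objects are
// dead: their heap pointers are never scanned, so they retain nothing.
func markStackObjects(objs []stackObject, rootPtrs []int) {
	byOffset := map[int]*stackObject{}
	for i := range objs {
		byOffset[objs[i].offset] = &objs[i]
	}
	for _, p := range rootPtrs {
		if o, ok := byOffset[p]; ok {
			o.marked = true
		}
	}
}

func main() {
	objs := []stackObject{{offset: -16}, {offset: -32}}
	markStackObjects(objs, []int{-32}) // only one object is referenced
	fmt.Println(objs[0].marked, objs[1].marked) // false true
}
```
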
9f95c9db23 |
cmd/compile, cmd/internal/obj: record register maps in binary
This adds FUNCDATA and PCDATA that record the register maps much like the existing live arguments maps and live locals maps. The register map is indexed independently from the argument and locals maps since changes in register liveness tend not to correlate with changes to argument and local liveness.

This is the final CL toward adding safe-points everywhere. The following CLs will optimize liveness analysis to bring down the cost. The effect of this CL is:

| name | old time/op | new time/op | delta |
|---|---|---|---|
| Template | 195ms ± 2% | 197ms ± 1% | ~ (p=0.136 n=9+9) |
| Unicode | 98.4ms ± 2% | 99.7ms ± 1% | +1.39% (p=0.004 n=10+10) |
| GoTypes | 685ms ± 1% | 700ms ± 1% | +2.06% (p=0.000 n=9+9) |
| Compiler | 3.28s ± 1% | 3.34s ± 0% | +1.71% (p=0.000 n=9+8) |
| SSA | 7.79s ± 1% | 7.91s ± 1% | +1.55% (p=0.000 n=10+9) |
| Flate | 133ms ± 2% | 133ms ± 2% | ~ (p=0.190 n=10+10) |
| GoParser | 161ms ± 2% | 164ms ± 3% | +1.83% (p=0.015 n=10+10) |
| Reflect | 450ms ± 1% | 457ms ± 1% | +1.62% (p=0.000 n=10+10) |
| Tar | 183ms ± 2% | 185ms ± 1% | +0.91% (p=0.008 n=9+10) |
| XML | 234ms ± 1% | 238ms ± 1% | +1.60% (p=0.000 n=9+9) |
| [Geo mean] | 411ms | 417ms | +1.40% |

| name | old exe-bytes | new exe-bytes | delta |
|---|---|---|---|
| HelloSize | 1.47M ± 0% | 1.51M ± 0% | +2.79% (p=0.000 n=10+10) |

Compared to just before "cmd/internal/obj: consolidate emitting entry stack map", the cumulative effect of adding stack maps everywhere and register maps is:

| name | old time/op | new time/op | delta |
|---|---|---|---|
| Template | 185ms ± 2% | 197ms ± 1% | +6.42% (p=0.000 n=10+9) |
| Unicode | 96.3ms ± 3% | 99.7ms ± 1% | +3.60% (p=0.000 n=10+10) |
| GoTypes | 658ms ± 0% | 700ms ± 1% | +6.37% (p=0.000 n=10+9) |
| Compiler | 3.14s ± 1% | 3.34s ± 0% | +6.53% (p=0.000 n=9+8) |
| SSA | 7.41s ± 2% | 7.91s ± 1% | +6.71% (p=0.000 n=9+9) |
| Flate | 126ms ± 1% | 133ms ± 2% | +6.15% (p=0.000 n=10+10) |
| GoParser | 153ms ± 1% | 164ms ± 3% | +6.89% (p=0.000 n=10+10) |
| Reflect | 437ms ± 1% | 457ms ± 1% | +4.59% (p=0.000 n=10+10) |
| Tar | 178ms ± 1% | 185ms ± 1% | +4.18% (p=0.000 n=10+10) |
| XML | 223ms ± 1% | 238ms ± 1% | +6.39% (p=0.000 n=10+9) |
| [Geo mean] | 394ms | 417ms | +5.78% |

| name | old alloc/op | new alloc/op | delta |
|---|---|---|---|
| Template | 34.5MB ± 0% | 38.0MB ± 0% | +10.19% (p=0.000 n=10+10) |
| Unicode | 29.3MB ± 0% | 30.3MB ± 0% | +3.56% (p=0.000 n=8+9) |
| GoTypes | 113MB ± 0% | 125MB ± 0% | +10.89% (p=0.000 n=10+10) |
| Compiler | 510MB ± 0% | 575MB ± 0% | +12.79% (p=0.000 n=10+10) |
| SSA | 1.46GB ± 0% | 1.64GB ± 0% | +12.40% (p=0.000 n=10+10) |
| Flate | 23.9MB ± 0% | 25.9MB ± 0% | +8.56% (p=0.000 n=10+10) |
| GoParser | 28.0MB ± 0% | 30.8MB ± 0% | +10.08% (p=0.000 n=10+10) |
| Reflect | 77.6MB ± 0% | 84.3MB ± 0% | +8.63% (p=0.000 n=10+10) |
| Tar | 34.1MB ± 0% | 37.0MB ± 0% | +8.44% (p=0.000 n=10+10) |
| XML | 42.7MB ± 0% | 47.2MB ± 0% | +10.75% (p=0.000 n=10+10) |
| [Geo mean] | 76.0MB | 83.3MB | +9.60% |

| name | old allocs/op | new allocs/op | delta |
|---|---|---|---|
| Template | 321k ± 0% | 337k ± 0% | +4.98% (p=0.000 n=10+10) |
| Unicode | 337k ± 0% | 340k ± 0% | +1.04% (p=0.000 n=10+9) |
| GoTypes | 1.13M ± 0% | 1.18M ± 0% | +4.85% (p=0.000 n=10+10) |
| Compiler | 4.67M ± 0% | 4.96M ± 0% | +6.25% (p=0.000 n=10+10) |
| SSA | 11.7M ± 0% | 12.3M ± 0% | +5.69% (p=0.000 n=10+10) |
| Flate | 216k ± 0% | 226k ± 0% | +4.52% (p=0.000 n=10+9) |
| GoParser | 271k ± 0% | 283k ± 0% | +4.52% (p=0.000 n=10+10) |
| Reflect | 927k ± 0% | 972k ± 0% | +4.78% (p=0.000 n=10+10) |
| Tar | 318k ± 0% | 333k ± 0% | +4.56% (p=0.000 n=10+10) |
| XML | 376k ± 0% | 395k ± 0% | +5.04% (p=0.000 n=10+10) |
| [Geo mean] | 730k | 764k | +4.61% |

| name | old exe-bytes | new exe-bytes | delta |
|---|---|---|---|
| HelloSize | 1.46M ± 0% | 1.51M ± 0% | +3.66% (p=0.000 n=10+10) |

For #24543.

Change-Id: I91e003dc64151916b384274884bf02a2d6862547
Reviewed-on: https://go-review.googlesource.com/109353
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>

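Both the register maps and the existing stack maps are PCDATA streams mapping a PC to an index into a map table. A toy, uncompressed version of that lookup (the real pclntab uses a compact delta encoding, and the table below is invented for illustration):

```go
package main

import "fmt"

// pcdataEntry is one step of a PCDATA stream in flat form.
type pcdataEntry struct {
	startPC int // value applies from this PC...
	value   int // ...until the next entry's startPC
}

// pcdataAt returns the PCDATA value in effect at pc, or -1 if none.
// Because the register-map stream is indexed independently of the
// stack-map stream, the two can change value at different PCs.
func pcdataAt(table []pcdataEntry, pc int) int {
	v := -1
	for _, e := range table {
		if e.startPC > pc {
			break
		}
		v = e.value
	}
	return v
}

func main() {
	regMaps := []pcdataEntry{{0x00, 0}, {0x10, 2}, {0x24, 1}}
	fmt.Println(pcdataAt(regMaps, 0x18)) // register map index 2
}
```
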
1e3570ac86 |
cmd/internal/objabi: extract shared functionality from obj
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing the assembler backends no longer requires reinstalling cmd/link or cmd/addr2line.

There's also now one canonical definition of the object file format in cmd/internal/objabi/doc.go, with a warning to update all three implementations.

objabi is still something of a grab bag of unrelated code (e.g., flag and environment variable handling probably belong in a separate "tool" package), but this is still progress.

Fixes #15165.
Fixes #20026.

Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c
Reviewed-on: https://go-review.googlesource.com/40972
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>