The runtime traceback code has its own definition of which functions
mark the top frame of a stack, separate from the TOPFRAME bits that
exist in the assembly and are passed along in DWARF information.
It's error-prone and redundant to have two different sources of truth.
This CL provides the actual TOPFRAME bits to the runtime, so that
the runtime can use those bits instead of reinventing its own category.
This CL also adds a new bit, SPWRITE, which marks functions that
write directly to SP (anything but adding and subtracting constants).
Such functions must stop a traceback, because the traceback has no
way to rederive the SP on entry. Again, the runtime has its own definition
which is mostly correct, but also missing some functions. During ordinary
goroutine context switches, such functions do not appear on the stack,
so the incompleteness in the runtime usually doesn't matter.
But profiling signals can arrive at any moment, and the runtime may
crash during traceback if it attempts to unwind an SP-writing frame
and gets out-of-sync with the actual stack. The runtime contains code
to try to detect likely candidates but again it is incomplete.
Deriving the SPWRITE bit automatically from the actual assembly code
provides the complete truth, and passing it to the runtime lets the
runtime use it.
This CL is part of a stack adding windows/arm64
support (#36439), intended to land in the Go 1.17 cycle.
This CL is, however, not windows/arm64-specific.
It is cleanup meant to make the port (and future ports) easier.
Change-Id: I227f53b23ac5b3dabfcc5e8ee3f00df4e113cf58
Reviewed-on: https://go-review.googlesource.com/c/go/+/288800
Trust: Russ Cox <rsc@golang.org>
Trust: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Mach-O relocation addend is signed 24-bit. When external linking,
if the addend is larger, we cannot put it directly into a Mach-O
relocation. This CL handles large addend by creating "label"
symbols at sym+0x800000, sym+(0x800000*2), etc., and emitting
Mach-O relocations that target the label symbols with a smaller
addend. The label symbols are generated late (similar to what
we do for RISC-V64).
One complexity comes from handling of carrier symbols, which does
not track its size or its inner symbols. But relocations can
target them. We track them in a side table (similar to what we
do for XCOFF, xcoffUpdateOuterSize).
Fixes#42738.
Change-Id: I8c53ab2397f8b88870d26f00e9026285e5ff5584
Reviewed-on: https://go-review.googlesource.com/c/go/+/278332
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Currently on darwin, when a symbol needs to be exported, we
export it both statically and dynamically. The dynamic export is
unnecessary for some symbols. Only export the necessary ones.
For special runtime C symbols (e.g. crosscall2), they used to be
exported dynamically, and we had a special case for pclntab to
not include those symbols (otherwise, when the dynamic linker
dedup them, the pclntab entries end up pointing out of the
module's address space). This CL changes it to not export those
symbols, and remove the special case.
Change-Id: I2ab40630742d48a09b86ee150aa5f1f7002b134d
Reviewed-on: https://go-review.googlesource.com/c/go/+/261497
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Clean up some pclntab state, specifically:
1) Remove the oldPclnState type.
2) Move a structure out of pclnState, that was holding some memory.
3) Stop passing container around everywhere and calling emitPcln. Use a
slice of function symbols instead.
Change-Id: I74e916564cd769a706750d024e55ee0d811a79da
Reviewed-on: https://go-review.googlesource.com/c/go/+/248379
Run-TryBot: Jeremy Faller <jeremy@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Move the pctables out of pclntab_old. Creates a new generator symbol,
runtime.pctab, which holds all the deduplicated pctables. Also, tightens
up some of the types in runtime.
Darwin, cmd/compile statistics:
alloc/op
Pclntab_GC 26.4MB ± 0% 13.8MB ± 0%
allocs/op
Pclntab_GC 89.9k ± 0% 86.4k ± 0%
liveB
Pclntab_GC 25.5M ± 0% 24.2M ± 0%
No significant change in binary size.
Change-Id: I1560fd4421f8a210f8d4b508fbc54e1780e338f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/248332
Run-TryBot: Jeremy Faller <jeremy@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Switch pcdata over to content addressable symbols. This is the last
step before removing these from pclntab_old.
No meaningful benchmarks changes come from this work.
Change-Id: I3f74f3d6026a278babe437c8010e22992c92bd89
Reviewed-on: https://go-review.googlesource.com/c/go/+/247399
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Creates two new symbols: runtime.cutab, and runtime.filetab, and strips
the filenames out of runtime.pclntab_old.
All stats are for cmd/compile.
Time:
Pclntab_GC 48.2ms ± 3% 45.5ms ± 9% -5.47% (p=0.004 n=9+9)
Alloc/op:
Pclntab_GC 30.0MB ± 0% 29.5MB ± 0% -1.88% (p=0.000 n=10+10)
Allocs/op:
Pclntab_GC 90.4k ± 0% 73.1k ± 0% -19.11% (p=0.000 n=10+10)
live-B:
Pclntab_GC 29.1M ± 0% 29.2M ± 0% +0.10% (p=0.000 n=10+10)
binary sizes:
NEW: 18565600
OLD: 18532768
The size differences in the binary are caused by the increased size of
the Func objects, and (less likely) some extra alignment padding needed
as a result. This is probably the maximum increase in size we'll size
from the pclntab reworking.
Change-Id: Idd95a9b159fea46f7701cfe6506813b88257fbea
Reviewed-on: https://go-review.googlesource.com/c/go/+/246497
Run-TryBot: Jeremy Faller <jeremy@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Austin Clements <austin@google.com>
In order to prevent renumbering of filenames in pclntab generation, use
the per-package file list (previously only used for DWARF generation) as
file-indices. This is the largest step to eliminate renumbering of
filenames in pclntab.
Note, this is probably not the final state of the file table within the
object file. In this form, the linker loads all filenames for all
objects. I'll move to storing the filenames as regular string
symbols,and defaulting all string symbols to using the larger hash value
to make generation of pcln simplest, and most memory friendly.
Change-Id: I23daafa3f4b4535076e23100200ae0e7163aafe0
Reviewed-on: https://go-review.googlesource.com/c/go/+/245485
Run-TryBot: Jeremy Faller <jeremy@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This preallocation is way too large, and showed up in the metrics. Just
remove it all together.
Change-Id: Ib4646b63cd0a903656ada244f15e977cde2a2c4c
Reviewed-on: https://go-review.googlesource.com/c/go/+/247177
Run-TryBot: Jeremy Faller <jeremy@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
On AIX, container symbols are handled in a weird way (unlike
other platforms): the outer symbol needs to have size (but still
no data), and the inner symbols must not be in the symbol table
(otherwise it overlaps with the outer symbol, which the system
linker doesn't like).
As of CL 241598, pclntab becomes a container symbol. We need to
follow the rule above for AIX.
Fix AIX build.
Change-Id: Ie2515a4cabbd8cf3f6d3868643a28f64ca3365a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/246479
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
Not used yet, but add the compilation unit for a function to func.
Change-Id: I7c43fa9f1da044ca63bab030062519771b9f4418
Reviewed-on: https://go-review.googlesource.com/c/go/+/244547
Run-TryBot: Jeremy Faller <jeremy@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Leaving creation of the funcID till the linker requires the linker to
load the function and file names into memory. Moving these into the
compiler/assembler prevents this.
This work is a step towards moving all func metadata into the compiler.
Change-Id: Iebffdc5a909adbd03ac263fde3f4c3d492fb1eac
Reviewed-on: https://go-review.googlesource.com/c/go/+/244024
Run-TryBot: Jeremy Faller <jeremy@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Move the function names out of runtime.pclntab_old, creating
runtime.funcnametab. There is an unfortunate artifact in this change in
that calculating the funcID still requires loading the name. Future work
will likely pull this out and put it into the object file Funcs.
ls -l cmd/compile (darwin):
before: 18524016
after: 18519952
The difference in size can be attributed to alignment in pclntab_old.
Change-Id: Ibcbb230d4632178f8fcd0667165f5335786381f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/243223
Reviewed-by: Austin Clements <austin@google.com>
Non functional change.
As runtime.pclntab breaks up, it'll be easier if we can just pass around
the pclntab state. Also, eliminate the globals in pclntab.
Change-Id: I2a5849e8f5f422a336a881e53a261e3997d11c44
Reviewed-on: https://go-review.googlesource.com/c/go/+/242599
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
As of July 2020, a fair amount of the new linker's live memory, and
runtime is spent generating pclntab. In an effort to streamline that
code, this change starts breaking up the generation of runtime.pclntab
into smaller chunks that can run later in a link. These changes are
described in an (as yet not widely distributed) document that lays out
an improved format. Largely the work consists of breaking up
runtime.pclntab into smaller pieces, stopping much of the data
rewriting, and getting runtime.pclntab into a form where we can reason
about its size and look to shrink it. This change is the first part of
that work -- just pulling out the header, and demonstrating where a
majority of that work will be.
Change-Id: I65618d0d0c780f7e5977c9df4abdbd1696fedfcb
Reviewed-on: https://go-review.googlesource.com/c/go/+/241598
Run-TryBot: Jeremy Faller <jeremy@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Rename Reloc2 to Reloc, At2 to At, Aux2 to Aux.
Change-Id: Ic98d83c080e8cd80fbe1837c8f0aa134033508ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/245578
Reviewed-by: Jeremy Faller <jeremy@golang.org>
Now that we removed the "weird thing" about runtime.etext symbol,
we can remove this special case.
Change-Id: I2e4558367758d37e898a802bcd30671c7dd6fe89
Reviewed-on: https://go-review.googlesource.com/c/go/+/240066
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
The loader's SymbolBuilder Add*/Set* methods include a call to mark
the underlying symbol as reachable (as a convenience, so that callers
would not have to set it explicitly). This code was carried over from
the corresponding sym.Symbol methods; back in the sym.Symbol world
unreachable symbols were never removed from the AllSyms slice, hence
setting and checking reachability was a good deal more important.
With the advent of the loader and the new deadcode implementation,
there is less of a need for this sort of fallback, and in addition the
implicit attr setting introduces data races in the the loader if there
are SymbolBuilder Add*/Set* method calls in parallel threads, as well
as adding overhead to the methods.
This patch gets rid of the implicit reachability setting, and instead
marks reachability in CreateSymForUpdate, as well as adding a few
explicit SetAttrReachable calls where needed.
Change-Id: I029a0c5a4a24237826a7831f9cbe5180d44cbc40
Reviewed-on: https://go-review.googlesource.com/c/go/+/237678
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
The runtime needs to find the PC of the deferreturn call in a few
places. So for functions that have defer, we record the PC of
deferreturn call in its funcdata.
For very large binaries, the deferreturn call could be made
through a trampoline. The current code of finding deferreturn PC
fails in this case. This CL handles the trampoline as well.
Fixes#39049.
Change-Id: I929be54d6ae436f5294013793217dc2a35f080d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/234105
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
This is more or less a direct translation, to get things going.
There are more things we can do to make it better, especially on
the handling of container symbols.
Change-Id: I11a0087e402be8d42b9d06869385ead531755272
Reviewed-on: https://go-review.googlesource.com/c/go/+/229125
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
When using plugins, on darwin we do weird things with
runtime.etext symbol, assigning a value for it, then clear it,
reassign a different value. This breaks the logic of writing text
address directly.
I think we should remove the weird thing with runtime.etext, if
possible. But for now, disable the optimization (this is not a
common case anyway).
Fix darwin-nocgo build.
Change-Id: Iab6a9f8519115226a5bbaaafe4a93f17042a928a
Reviewed-on: https://go-review.googlesource.com/c/go/+/229057
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
If we are internal linking a static executable, in pclntab
generation, the function addresses are known, so we can just use
them directly instead of emitting relocations.
For external linking or other build modes, we are generating a
relocatable binary so we still need to emit relocations.
Reduce some allocations: for linking cmd/compile,
name old alloc/op new alloc/op delta
Pclntab_GC 38.8MB ± 0% 36.4MB ± 0% -6.19% (p=0.008 n=5+5)
TODO: can we also do this in DWARF generation?
Change-Id: I43920d930ab1da97c205871027e01844a07a5e60
Reviewed-on: https://go-review.googlesource.com/c/go/+/228478
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Fix a bug in findfunctab when building plugin on Darwin (this is
a regression introduced by CL 227842).
Change-Id: Ic610168e45a750c0a2f2b8611d5d9154e6c2622f
Reviewed-on: https://go-review.googlesource.com/c/go/+/228137
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
The pclntab phase generates a series of "inltree.*" symbols with
inlining related pcdata; these symbols previously were given names and
enterered into the symbol lookup table, but there is no real reason to
do this, since they never need to be looked up when pcln generation is
done. Switch them over to anonymous symbols.
So as to insure that the later symtab phase picks them up correctly,
assign them a type of SGOFUNC instead of SRODATA, and change symtab to
look for this when assigning symbols to groups.
Change-Id: I38225dbb130ad7aea5d16f79cef3d8d388c61c2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/227845
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Convert the linker's findfunctab phase to use the new loader APIs.
Change-Id: Ia980a85963fe2e7c554c212c0cc89208272264bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/227842
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
If a Go symbol is cloned to external, we should preserve its Aux
symbols for FuncInfo, etc.. We already do this in
loader.FuncInfo, but not in FuncInfo.Funcdata. Do it in the
latter as well. In fact, since FuncInfo and Funcdata should use
the same set of auxs, just record the auxs and reuse.
Should fix PPC64 build.
Change-Id: Iab9020eaca15d98fe3bb41f50f0d5bdb4999e8c8
Reviewed-on: https://go-review.googlesource.com/c/go/+/227848
Reviewed-by: Jeremy Faller <jeremy@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Rework the linker's pcln phase to work with the new loader. As part of
this set of changes the handling of "go.file..." symbols has been
revised somewhat -- previously they were treated as always live in the
loader, and now we no longer do this.
The original plan had been to have the new implementation generate
nameless "inltree" symbols, however the plan now is to keep them
named for now and convert them to nameless in a subsequent patch.
Change-Id: If71c93ff1f146dbb63b6ee2546308acdc94b643c
Reviewed-on: https://go-review.googlesource.com/c/go/+/227759
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
On PPC64 when external linking, for large binaries we split the
text section to multiple sections, so the external linking may
insert trampolines between sections. These trampolines are within
the address range covered by the func table, but not known by Go.
This causes runtime.findfunc to return a wrong function if the
given PC is from such trampolines.
In this CL, we generate a marker between text sections where
there could potentially be a hole in the func table. At run time,
we skip the hole if we see such a marker.
Fixes#37216.
Change-Id: I95ab3875a84b357dbaa65a4ed339a19282257ce0
Reviewed-on: https://go-review.googlesource.com/c/go/+/219717
Reviewed-by: David Chase <drchase@google.com>
Now that the other dependent offset has been identified, we can remove the
unnecessary ADDI instruction from the riscv64 call sequence (reducing it
to AUIPC+JALR, rather than the previous AUIPC+ADDI+JALR).
Change-Id: I348c4efb686f9f71ed1dd1d25fb9142a41230b0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/216798
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The R_CALLRISCV relocation marker is on the JALR instruction, however the actual
relocation is currently two instructions previous for the AUIPC+ADDI sequence.
Adjust the platform dependent offset accordingly and re-enable open-coded defers.
Fixes#36786.
Change-Id: I71597c193c447930fbe94ce44b7355e89ae877bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/216797
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Factor out the direct CALL identification code from objabi.IsDirectJump and
use this in two places that have separately maintained lists of reloc types.
Provide an objabi.IsDirectCallOrJump function that implements the original
behaviour of objabi.IsDirectJump.
Change-Id: I48131bae92b2938fd7822110d53df0b4ffb35766
Reviewed-on: https://go-review.googlesource.com/c/go/+/196577
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Also a similar 'elapsed' function and its usages were deleted.
Fixes#19865.
Change-Id: Ib125365e69cf2eda60de64fa74290c8c7d1fd65a
Reviewed-on: https://go-review.googlesource.com/c/go/+/171730
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).
When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.
In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).
I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().
The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).
Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]
With normal (stack-allocated) defers only: 35.4 ns/op
With open-coded defers: 5.6 ns/op
Cost of function call alone (remove defer keyword): 4.4 ns/op
Text size increase (including funcdata) for go binary without/with open-coded defers: 0.09%
The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.
The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:
Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
Without open-coded defers: 62.0 ns/op
With open-coded defers: 255 ns/op
A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:
CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
Without open-coded defers: 443 ns/op
With open-coded defers: 347 ns/op
Updates #14939 (defer performance)
Updates #34481 (design doc)
Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements <austin@google.com>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).
When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.
In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).
I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().
The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).
Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]
With normal (stack-allocated) defers only: 35.4 ns/op
With open-coded defers: 5.6 ns/op
Cost of function call alone (remove defer keyword): 4.4 ns/op
Text size increase (including funcdata) for go cmd without/with open-coded defers: 0.09%
The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.
The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:
Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
Without open-coded defers: 62.0 ns/op
With open-coded defers: 255 ns/op
A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:
CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
Without open-coded defers: 443 ns/op
With open-coded defers: 347 ns/op
Updates #14939 (defer performance)
Updates #34481 (design doc)
Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28
Reviewed-on: https://go-review.googlesource.com/c/go/+/190098
Reviewed-by: Austin Clements <austin@google.com>
Add InlTree to the FuncInfo aux symbol in new object files.
In the linker, change InlinedCall.Func from a Symbol to a string,
as we only use its Name. (There was a use of Func.File, but that
use is not correct anyway.) So we don't need to create a Symbol
if not necessary.
Change-Id: I38ce568ae0934cd9cb6d0b30599f1c8d75444fc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/200098
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
I'm branching this off cl/187117, and will be reworking that diff stack.
Testing: I've run go build -toolexec 'toolstash -cmp'
Change-Id: I922a97d0f25d52ea70cd974008a063d4e7af34a7
Reviewed-on: https://go-review.googlesource.com/c/go/+/188023
Reviewed-by: Austin Clements <austin@google.com>
The logic for detecting deferreturn calls is wrong.
We used to look for a relocation whose symbol is runtime.deferreturn
and has an offset of 0. But on some architectures, the relocation
offset is not zero. These include arm (the offset is 0xebfffffe) and
s390x (the offset is 6).
This ends up setting the deferreturn offset at 0, so we end up using
the entry point live map instead of the deferreturn live map in a
frame which defers and then segfaults.
Instead, use the IsDirectCall helper to find calls.
Fixes#32477
Update #6980
Change-Id: Iecb530a7cf6eabd7233be7d0731ffa78873f3a54
Reviewed-on: https://go-review.googlesource.com/c/go/+/181258
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The existing pclntab construction took care to re-use strings:
filenames and fully qualified function names.
It did not try to deduplicate pctab information,
perhaps because the author assumed that there
wouldn't be much duplication.
This change introduces that deduplication.
The cache gets a 33% hit rate during make.bash.
This doesn't require any changes to the file format,
and shrinks binaries by about 1%.
Updates #6853
file before after Δ %
go 14659236 14515876 -143360 -0.978%
addr2line 4272424 4223272 -49152 -1.150%
api 6050808 5993464 -57344 -0.948%
asm 4906416 4869552 -36864 -0.751%
buildid 2861104 2824240 -36864 -1.288%
cgo 4859784 4810632 -49152 -1.011%
compile 25749656 25213080 -536576 -2.084%
cover 5286952 5229608 -57344 -1.085%
dist 3634192 3597328 -36864 -1.014%
doc 4691080 4641928 -49152 -1.048%
fix 3397960 3361096 -36864 -1.085%
link 6113568 6064432 -49136 -0.804%
nm 4221928 4172776 -49152 -1.164%
objdump 4636600 4587448 -49152 -1.060%
pack 2281184 2256608 -24576 -1.077%
pprof 14641204 14485556 -155648 -1.063%
test2json 2814536 2785864 -28672 -1.019%
trace 11602204 11487516 -114688 -0.989%
vet 8399528 8313512 -86016 -1.024%
Change-Id: I59c6aae522700a0d36ddd2cbca6e22ecdf17eea2
Reviewed-on: https://go-review.googlesource.com/c/go/+/172079
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Rename it to PCIter and convert it to use methods.
Set pcscale once, during construction, to make call sites clearer.
Change some ints to bools.
Use a simple iteration termination condition,
instead of the cap comparison from the c2go translation.
Instead of requiring a Pcdata, which requires one caller
to synthesize a fake Pcdata, just ask for a byte slice.
Passes toolstash-check.
Change-Id: I811da0e929cf4a806bd6d70357ccf2911cd0c737
Reviewed-on: https://go-review.googlesource.com/c/go/+/171770
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This code was written before the c2go toolchain conversion.
Replace the handwritten varint encoding routines
and the handwritten unsigned-to-signed conversions
with calls to encoding/binary.
Passes toolstash-check.
Change-Id: I30d7f408cde3772ee98a3825e83075c4e1ec96d8
Reviewed-on: https://go-review.googlesource.com/c/go/+/171769
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>