When there are both a synchronous preemption request (by
clobbering the stack guard) and an asynchronous one (by signal),
the running goroutine may observe the synchronous request first
in stack bounds check, and go to the path of calling morestack.
If the preemption signal arrives at this point before the call to
morestack, the goroutine will be asynchronously preempted,
entering the scheduler. When it is resumed, the scheduler clears
the preemption request, unclobbers the stack guard. But the
resumed goroutine will still call morestack, as it is already on
its way. morestack will, as there is no preemption request,
double the stack unnecessarily. If this happens multiple times,
the stack may grow too big, although only a small amount is
actually used.
To fix this, we mark the stack bounds check and the call to
morestack async-nonpreemptible, starting after the memory
instruction (mostly a load, on x86 CMP with memory).
Not done for Wasm as it does not support async preemption.
Fixes#35470.
Change-Id: Ibd7f3d935a3649b80f47539116ec9b9556680cf2
Reviewed-on: https://go-review.googlesource.com/c/go/+/207350
Reviewed-by: David Chase <drchase@google.com>
Currently we use stack map index -2 to mark unsafe points, i.e.
PC ranges that is not safe for async preemption. This has a
problem: it cannot mark CALL instructions, because for stack scan
a valid stack map index is needed.
This CL switches to use register map index for marking unsafe
points instead, which does not conflict with stack scan and can
be applied on CALL instructions. This is necessary as next CL
will mark call to morestack nonpreemptible.
For #35470.
Change-Id: I357bf26c996e1fee1e7eebe4e6bb07d62930d3f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/207349
Reviewed-by: David Chase <drchase@google.com>
This Patch describes NOOP in Go assembly syntax and gives Go assembly
example and corresponding GNU assembly example.
Change-Id: I9db659cc5e3dc6b1f1450f2064255af8872d4b1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/207400
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
In CL 208233 I am fixing a panic that occurs only with a specific
build mode. I want that test to run on all platforms that support that
build mode, but the logic for determining support is somewhat
involved.
For now, I am duplicating that logic into the cmd/internal/sys
package, which already reports platform support for other build flags.
We can refactor cmd/go/internal/work to use the extracted function in
a followup CL.
Updates #35759
Change-Id: Ibbaedde4d1e8f683c650beedd10849bc27e7a6e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/208457
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jay Conrod <jayconrod@google.com>
Starting with macOS 10.15 (Catalina), Apple now requires all software
distributed outside of the App Store to be notarized. Any binaries we
distribute must abide by a strict set of requirements like code-signing
and having a minimum target SDK of 10.9 (amongst others).
Apple’s notarization service will recursively inspect archives looking to
find notarization candidate binaries. If it finds a binary that does not
meet the requirements or is unable to decompress an archive, it will
reject the entire distribution. From cursory testing, it seems that the
service uses content sniffing to determine file types, so changing
the file extension will not work.
There are some binaries and archives included in our distribution that
are being detected by Apple’s service as potential candidates for
notarization or decompression. As these are files used by tests and some
are intentionally invalid, we don’t intend to ever make them compliant.
As a workaround for this, we base64-encode any binaries or archives that
Apple’s notarization service issues a warning for, as these warnings will
become errors in January 2020.
Updates #34986
Change-Id: I106fbb6227b61eb221755568f047ee11103c1680
Reviewed-on: https://go-review.googlesource.com/c/go/+/208118
Run-TryBot: Andrew Bonventre <andybons@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Handle immediates larger than 12-bits by rewriting as an LUI instruction with
the high bits, followed by the original instruction with the low bits.
Based on the riscv-go port.
Updates #27532
Change-Id: I8ed6d6e6db06fb8a27f3ab75f467ec2b7ff1f075
Reviewed-on: https://go-review.googlesource.com/c/go/+/204626
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The old recipe for making an infinite loop not be infinite
in the debugger could create an instruction (Prog) with a
line number not tied to any file (index == 0). This caused
downstream failures in DWARF processing.
So don't do that. Also adds a test, also adds a check+panic
to ensure that the next time this happens the error is less
mystifying.
Fixes#35652
Change-Id: I04f30bc94fdc4aef20dd9130561303ff84fd945e
Reviewed-on: https://go-review.googlesource.com/c/go/+/207613
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The 2-instruction TLS access sequence
MOVQ TLS, BX
MOVQ 0(BX)(TLS*1), BX
is not async preemptible, as if it is preempted and resumed on a
different thread, the TLS address may become invalid.
May fix#35349. (This is a rare failure and I haven't been able
to reproduce it.)
Change-Id: Ie1a366fd0d7d73627dc62ee2de01c0aa09365f2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/206903
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Provide initial linker support for riscv64.
Based on riscv-go port.
Updates #27532
Change-Id: I8a881ce41cd49efef0358bad9171d4d18aaf7ab2
Reviewed-on: https://go-review.googlesource.com/c/go/+/204624
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
iOS does not support SA_ONSTACK. The signal handler runs on the
G stack. Any writes below the SP may be clobbered by the signal
handler (even without call injection). So we save LR after
decrementing SP on iOS.
Updates #35439.
Change-Id: Ia6d7a0669e0bcf417b44c031d2e26675c1184165
Reviewed-on: https://go-review.googlesource.com/c/go/+/206418
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Intrinsify these functions to match other platforms. Update the
sequence of instructions used in the assembly implementations to
match the intrinsics.
Also, add a micro benchmark so we can more easily measure the
performance of these two functions:
name old time/op new time/op delta
And8-8 5.33ns ± 7% 2.55ns ± 8% -52.12% (p=0.000 n=20+20)
And8Parallel-8 7.39ns ± 5% 3.74ns ± 4% -49.34% (p=0.000 n=20+20)
Or8-8 4.84ns ±15% 2.64ns ±11% -45.50% (p=0.000 n=20+20)
Or8Parallel-8 7.27ns ± 3% 3.84ns ± 4% -47.10% (p=0.000 n=19+20)
By using a 'rotate then xor selected bits' instruction combined with
either a 'load and and' or a 'load and or' instruction we can
implement And8 and Or8 with far fewer instructions. Replacing
'compare and swap' with atomic instructions may also improve
performance when there is contention.
Change-Id: I28bb8032052b73ae8ccdf6e4c612d2877085fa01
Reviewed-on: https://go-review.googlesource.com/c/go/+/204277
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This is intended to allow IDEs to note where the optimizer
was not able to improve users' code. There may be other
applications for this, for example in studying effectiveness
of optimizer changes more quickly than running benchmarks,
or in verifying that code changes did not accidentally disable
optimizations in performance-critical code.
Logging of nilcheck (bad) for amd64 is implemented as
proof-of-concept. In general, the intent is that optimizations
that didn't happen are what will be logged, because that is
believed to be what IDE users want.
Added flag -json=version,dest
Check that version=0. (Future compilers will support a
few recent versions, I hope that version is always <=3.)
Dest is expected to be one of:
/path (or \path in Windows)
will create directory /path and fill it w/ json files
file://path
will create directory path, intended either for
I:\dont\know\enough\about\windows\paths
trustme_I_know_what_I_am_doing_probably_testing
Not passing an absolute path name usually leads to
json splattered all over source directories,
or failure when those directories are not writeable.
If you want a foot-gun, you have to ask for it.
The JSON output is directed to subdirectories of dest,
where each subdirectory is net/url.PathEscape of the
package name, and each for each foo.go in the package,
net/url.PathEscape(foo).json is created. The first line
of foo.json contains version and context information,
and subsequent lines contains LSP-conforming JSON
describing the missing optimizations.
Change-Id: Ib83176a53a8c177ee9081aefc5ae05604ccad8a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/204338
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
When the frame size is large, we generate
MOVD.P 0xf0(SP), LR
ADD $(framesize-0xf0), SP
This is problematic: after the first instruction, we have a
partial frame of size (framesize-0xf0). If we try to unwind the
stack at this point, we'll try to read the LR from the stack at
0(SP) (the new SP) as the frame size is not 0. But this slot does
not contain a valid LR.
Fix this by not changing SP in two instructions. Instead,
generate
MOVD (SP), LR
ADD $framesize, SP
This affects not only async preemption but also profiling. So we
change the generated instructions, instead of marking unsafe
point.
Change-Id: I4e78c62d50ffc4acff70ccfbfec16a5ccae17f24
Reviewed-on: https://go-review.googlesource.com/c/go/+/206057
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
For async preemption, we will be using REGTMP as a temporary
register in injected call on S390X, which will clobber it. So any
code that uses REGTMP is not safe for async preemption.
In the assembler backend, we expand a Prog to multiple machine
instructions and use REGTMP as a temporary register if necessary.
These need to be marked unsafe. Unlike ARM64 and MIPS,
instructions on S390X are variable length so we don't use the
length as a condition. Instead, we set a bit on the Prog whenever
REGTMP is used.
Change-Id: Ie5d14068a950f4c7cea51dff2c4a8bdc19ec9348
Reviewed-on: https://go-review.googlesource.com/c/go/+/204105
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
If a MOVDU instruction is used with an offset of SP, the
instruction changes SP therefore needs an SP delta, which is used
for generating the PC-SP table for stack unwinding. MOVDU is
frequently used for allocating the frame and saving the LR in the
same instruction, so this is particularly useful.
Change-Id: Icb63eb55aa01c3dc350ac4e4cff6371f4c3c5867
Reviewed-on: https://go-review.googlesource.com/c/go/+/205279
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
We'll use CTR as a scratch register for call injection. Mark code
sequences that use CTR as unsafe for async preemption. Currently
it is only used in LoweredZero and LoweredMove. It is unfortunate
that they are nonpreemptible. But I think it is still better than
using LR for call injection and marking all leaf functions
nonpreemptible.
Also mark the prologue of large frame functions nonpreemptible,
as we write below SP.
Change-Id: I05a75431499f3f4b2f23651a7b17f7fcf2afbe06
Reviewed-on: https://go-review.googlesource.com/c/go/+/203823
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
On PPC64, indirect calls can be made through LR or CTR. Currently
both are used. This CL changes it to always use LR.
For async preemption, to return from the injected call, we need
an indirect jump back to the PC we preeempted. This jump can be
made through LR or CTR. So we'll have to clobber either LR or CTR.
Currently, LR is used more frequently. In particular, for a leaf
function, LR is live throughout the function. We don't want to
make leaf functions nonpreemptible. So we choose CTR for the call
injection. For code sequences that use CTR, if it is ok to use
another register, change it to.
Plus, it is a call so it will clobber LR anyway. It doesn't need
to also clobber CTR (even without preemption).
Change-Id: I07bd0e93b94a1a3aa2be2cd465801136165d8ab8
Reviewed-on: https://go-review.googlesource.com/c/go/+/203822
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
For async preemption, we will be using REGTMP as a temporary
register in injected call on MIPS, which will clobber it. So any
code that uses REGTMP is not safe for async preemption.
In the assembler backend, we expand a Prog to multiple machine
instructions and use REGTMP as a temporary register if necessary.
These need to be marked unsafe. In fact, most of the
multi-instruction Progs use REGTMP, so we mark all of them,
except ones that are whitelisted.
Change-Id: Ic00ae5589683c2c9525abdaee076d884df6b0d1e
Reviewed-on: https://go-review.googlesource.com/c/go/+/203718
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
For async preemption, we will be using REGTMP as a temporary
register in injected call on ARM64, which will clobber it. So any
code that uses REGTMP is not safe for async preemption.
In the assembler backend, we expand a Prog to multiple machine
instructions and use REGTMP as a temporary register if necessary.
These need to be marked unsafe. In fact, most of the
multi-instruction Progs use REGTMP, so we mark all of them,
except ones that are whitelisted.
Change-Id: I6e97805a13950e3b693fb606d77834940ac3722e
Reviewed-on: https://go-review.googlesource.com/c/go/+/203460
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Factor out the direct CALL identification code from objabi.IsDirectJump and
use this in two places that have separately maintained lists of reloc types.
Provide an objabi.IsDirectCallOrJump function that implements the original
behaviour of objabi.IsDirectJump.
Change-Id: I48131bae92b2938fd7822110d53df0b4ffb35766
Reviewed-on: https://go-review.googlesource.com/c/go/+/196577
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
In the dev.link branch we implemented the new object file format
and (part of) the linker improvements described in
https://golang.org/s/better-linker
The new object file is index-based and provides random access.
The linker maps the object files into read-only memory, and
access symbols on-demand using indices, as opposed to reading
all object files sequentially into the heap with the old format.
The linker carries symbol informations using indices (as opposed
to Symbol data structure). Symbols are created after the
reachability analysis, and only created for reachable symbols.
This reduces the linker's memory usage.
Linking cmd/compile, it creates ~25% fewer Symbols, and reduces
memory usage (inuse_space) by ~15%. (More results from Than.)
Currently, both the old and new object file formats are supported.
The old format is used by default. The new format can be turned
on by using the compiler/assembler/linker's -newobj flag. Note
that the flag needs to be specified consistently to all
compilations, i.e.
go build -gcflags=all=-newobj -asmflags=all=-newobj -ldflags=-newobj
Change-Id: Ia0e35306b5b9b5b19fdc7fa7c602d4ce36fa6abd
The flag field will be used for marking unsafe points. This CL
just adds the field, not doing anything with it. The next CL will
make use of it. This is for making the diff simpler.
Change-Id: I6ff5406ba2e53ae8a882184733d88482a2ca8e2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/203938
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This CL adds experimental coverage instrumentation similar to what
github.com/dvyukov/go-fuzz produces in its -libfuzzer mode. The
coverage can be enabled by compiling with -d=libfuzzer. It's intended
to be used in conjunction with -buildmode=c-archive to produce an ELF
archive (.a) file that can be linked with libFuzzer. See #14565 for
example usage.
The coverage generates a unique 8-bit counter for each basic block in
the original source code, and emits an increment operation. These
counters are then collected into the __libfuzzer_extra_counters ELF
section for use by libFuzzer.
Updates #14565.
Change-Id: I239758cc0ceb9ca1220f2d9d3d23b9e761db9bf1
Reviewed-on: https://go-review.googlesource.com/c/go/+/202117
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
When we do a successful recover of a panic, we resume normal execution by
returning from the frame that had the deferred call that did the recover (after
executing any remaining deferred calls in that frame).
However, suppose we have called runtime.Goexit and there is a panic during one of the
deferred calls run by the Goexit. Further assume that there is a deferred call in
the frame of the Goexit or a parent frame that does a recover. Then the recovery
process will actually resume normal execution above the Goexit frame and hence
abort the Goexit. We will not terminate the thread as expected, but continue
running in the frame above the Goexit.
To fix this, we explicitly create a _panic object for a Goexit call. We then
change the "abort" behavior for Goexits, but not panics. After a recovery, if the
top-level panic is actually a Goexit that is marked to be aborted, then we return
to the Goexit defer-processing loop, so that the Goexit is not actually aborted.
Actual code changes are just panic.go, runtime2.go, and funcid.go. Adjusted the
test related to the new Goexit behavior (TestRecoverBeforePanicAfterGoexit) and
added several new tests of aborted panics (whose behavior has not changed).
Fixes#29226
Change-Id: Ib13cb0074f5acc2567a28db7ca6912cfc47eecb5
Reviewed-on: https://go-review.googlesource.com/c/go/+/200081
Run-TryBot: Dan Scales <danscales@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
From CL 199979, I noticed that there were some
instructions not covered by the test cases. Added those in this CL.
Additional tests for assembly instructions are also added
based on suggestions made during the review of this CL.
Previously, VSB and VSH are not included in asmz.go, they were also
added in this patch.
Change-Id: I6060a9813b483a161d61ad2240c30eec6de61536
Reviewed-on: https://go-review.googlesource.com/c/go/+/203721
Reviewed-by: Michael Munday <mike.munday@ibm.com>
POWER9 (ISA 3.0) introduced a new format of load/store instructions to
implement indexed load/store quadword, using an immediate value instead
of a register index.
This change adds support for this new instruction encoding and adds the
new load/store quadword instructions (lxv/stxv) to the assembler.
This change also adds the missing XX1-form loads/stores (halfword and byte)
included in ISA 3.0.
Change-Id: Ibcdf53c342d7a352d64a9403c2fe7b25be9c3b24
Reviewed-on: https://go-review.googlesource.com/c/go/+/200399
Run-TryBot: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Currently, at compile time we emit a function DIE with '"".' in
the function name, and we expand it at link time, with a really
ugly function. We can just expand it at compile time instead.
This way, we don't need to modify the symbol content at link time,
and also no need to allocate memory for that.
Keep the linker expansion, in case the compiler is invoked
without the import path.
Change-Id: Id53cd2e2d3eb61efceb8d44479c4b6ef890baa43
Reviewed-on: https://go-review.googlesource.com/c/go/+/204826
Reviewed-by: Than McIntosh <thanm@google.com>
This adds support for scanning the stack when a goroutine is stopped
at an async safe point. This is not yet lit up because asyncPreempt is
not yet injected, but prepares us for that.
This works by conservatively scanning the registers dumped in the
frame of asyncPreempt and its parent frame, which was stopped at an
asynchronous safe point.
Conservative scanning works by only marking words that are pointers to
valid, allocated heap objects. One complication is pointers to stack
objects. In this case, we can't determine if the stack object is still
"allocated" or if it was freed by an earlier GC. Hence, we need to
propagate the conservative-ness of scanning stack objects: if all
pointers found to a stack object were found via conservative scanning,
then the stack object itself needs to be scanned conservatively, since
its pointers may point to dead objects.
For #10958, #24543.
Change-Id: I7ff84b058c37cde3de8a982da07002eaba126fd6
Reviewed-on: https://go-review.googlesource.com/c/go/+/201761
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Flush the output log up to the root when a test panics. Prior to
this change, only the current test's output log was flushed to its
parent, resulting in no output when a subtest panics.
For the following test function:
func Test(t *testing.T) {
for i, test := range []int{1, 0, 2} {
t.Run(fmt.Sprintf("%v/%v", i, test), func(t *testing.T) {
_ = 1 / test
})
}
}
Output before this change:
panic: runtime error: integer divide by zero [recovered]
panic: runtime error: integer divide by zero
(stack trace follows)
Output after this change:
--- FAIL: Test (0.00s)
--- FAIL: Test/1/0 (0.00s)
panic: runtime error: integer divide by zero [recovered]
(stack trace follows)
Fixes#32121
Change-Id: Ifee07ccc005f0493a902190a8be734943123b6b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/179599
Run-TryBot: Damien Neil <dneil@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
When compiling for GOARCH=386 GOOS=android, the compiler was attaching
R_TLS_LE relocations inappropriately -- as of Go 1.13 the TLS access
recipe for Android refers to a runtime symbol and no longer needs this
type of relocation (which was causing a crash when the linker tried to
process it).
Updates #29674.
Fixes#34788.
Change-Id: Ida01875011b524586597b1f7e273aa14e11815d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/200337
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Elias Naur <mail@eliasnaur.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The asm encoder generally assumes that the lowest 5 bits of the
REG_XX constants match the machine instruction encoding, i.e.
the lowest 5 bits is the register number. This was not true for
FCR registers and M registers. Make it so.
MOV Rx, FCRy was encoded as two machine instructions. The first
is unnecessary. Remove.
Change-Id: Ib988e6b109ba8f564337cdd31019c1a6f1881f5b
Reviewed-on: https://go-review.googlesource.com/c/go/+/203717
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
The x86 assembler supports an "ADJSP" pseudo-op that compiles to an
ADD/SUB from SP. Unfortunately, while this seems perfect for an
instruction that would allow obj to continue to track the SP/FP delta,
obj currently doesn't do that. As a result, FP-relative references
won't work and, perhaps worse, the pcsp table will have the wrong
frame size.
We don't currently use this instruction in any assembly or generate it
in the compiler, but this is a perfect instruction for solving a
problem in #24543.
This CL makes ADJSP useful by incorporating it into the SP delta
logic.
One subtlety is that we do generate ADJSP in obj itself to open a
function's stack frame. Currently, when preprocess enters the loop to
compute the SP delta, it may or may not start at this ADJSP
instruction depending on various factors. We clean this up by instead
always starting the SP delta at 0 and always starting this loop at the
entry to the function.
Why not just recognize ADD/SUB of SP? The danger is that could change
the meaning of existing code. For example, walltime1 in
sys_linux_amd64.s saves SP, SUBs from it, and aligns it. Later, it
restores the saved copy and then does a few FP-relative references.
Currently obj doesn't know any of this is happening, but that's fine
once it gets to the FP-relative references. If we taught obj to
recognize the SUB, it would start to miscompile this code. An
alternative would be to recognize unknown instructions that write to
SP and refuse subsequent FP-relative references, but that's kind of
annoying.
This passes toolstash -cmp for std on both amd64 and 386.
Change-Id: Ic6c6a7cbf980bca904576676c07b44c0aaa9c82d
Reviewed-on: https://go-review.googlesource.com/c/go/+/200877
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Compiler-generated function references (e.g. call to
runtime.newobject) appear frequently. We assign special indices
for them, so they don't need to be referenced by name.
Change-Id: I2072594cbc56c9e1037a26e4aae12e68c2436e9f
Reviewed-on: https://go-review.googlesource.com/c/go/+/202085
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).
When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.
In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).
I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().
The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).
Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]
With normal (stack-allocated) defers only: 35.4 ns/op
With open-coded defers: 5.6 ns/op
Cost of function call alone (remove defer keyword): 4.4 ns/op
Text size increase (including funcdata) for go binary without/with open-coded defers: 0.09%
The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.
The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:
Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
Without open-coded defers: 62.0 ns/op
With open-coded defers: 255 ns/op
A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:
CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
Without open-coded defers: 443 ns/op
With open-coded defers: 347 ns/op
Updates #14939 (defer performance)
Updates #34481 (design doc)
Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements <austin@google.com>
In assembly we always reference symbols by name. But for static
symbols, as they are reachable only within the current file, we
can assign them local indices and use the indices to reference
them. The index is only meaningful locally, and it is fine.
Change-Id: I16e011cd41575ef703ceb6f35899e5fa58fbcf1e
Reviewed-on: https://go-review.googlesource.com/c/go/+/201997
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
When building a program that links against Go shared libraries,
it needs to reference symbols defined in the shared library. At
compile time, we don't know where the shared library boundary is.
If we reference a symbol in package p by index, and package p is
actually part of a shared library, we cannot resolve the index at
link time, as the linker doesn't see the object file of p.
So when linking against Go shared libraries, always use named
reference for now.
To do this, the compiler needs to know whether we will be linking
against Go shared libraries. The -dynlink flag kind of indicates
that (as the document says), but currently it is actually
overloaded: it is also used when building a plugin or a shared
library, which is self-contained (if -linkshared is not otherwise
specified) and could use index for symbol reference. So we
introduce another compiler flag, -linkshared, specifically for
linking against Go shared libraries. The go command will pass
this flag if its -linkshared flag is specified
("go build -linkshared").
There may be better way to handle this. For example, we can
put the symbol indices in a special section in the shared library
that the linker can read. Or we can generate some per-package
description file to include the indices. (Currently we generate
a .shlibname file for each package that is included in a shared
library, which contains the path of the library. We could
consider extending this.) That said, this CL is a stop-gap
solution. And it is no worse than the old object files.
If we were to redesign the build system so that the shared
library boundary is known at compile time, we could use indices
for symbol references that do not cross shared library boundary,
as well as doing other things better.
Change-Id: I9c02aad36518051cc4785dbe25c4b4cef8f3faeb
Reviewed-on: https://go-review.googlesource.com/c/go/+/201818
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
This patch uses symbol NOOP to support arm64 instruction NOP. In
arm64, NOP stands for that No Operation does nothing, other than
advance the value of the program counter by 4. This instruction
can be used for instruction alignment purposes. This patch uses
NOOP to support arm64 instruction NOP, because we have a generic
"NOP" instruction, which is a zero-width pseudo-instruction.
In arm64, instruction NOP is an alias of HINT #0. This patch adds
test cases for instruction HINT #0.
Change-Id: I54e6854c46516eb652b412ef9e0f73ab7f171f8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200578
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change removes the NewReader function (no longer used by objdump)
and prunes away the now unused code paths from Reader.BytesAt and
Reader.StringAt, which helps with performance. At the moment the
reader operates by always ingesting the entire object file (either via
direct read or by mmap), meaning that there will always be a slice
available for us to index into.
Change-Id: I3af7396effe19e50ed594fe8d82fd2d15465687c
Reviewed-on: https://go-review.googlesource.com/c/go/+/201437
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
Convert the object file dumper to use NewReaderFromBytes when
reading new object files, as opposed to NewReader.
Change-Id: I9f5e0356bd21c16f545cdd70262e983a2ed38bfc
Reviewed-on: https://go-review.googlesource.com/c/go/+/201441
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>