Change the definitions of selected runtime assembly routines
from ABI0 (the default) to ABIInternal. The ABIInternal def is
intended to indicate that these functions don't follow the existing Go
runtime ABI. In addition, convert the assembly reference to
runtime.main (from runtime.mainPC) to ABIInternal. Finally, for
functions such as "runtime.duffzero" that are called directly from
generated code, make sure that the compiler looks up the correct
ABI version.
This is intended to support the register abi work, however these
changes should not have any issues even when GOEXPERIMENT=regabi is
not in effect.
Updates #27539, #40724.
Change-Id: I9846f8dcaccc95718cf2e61a18b7e924a0677e4c
Reviewed-on: https://go-review.googlesource.com/c/go/+/262319
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
In CL 219131 we inserted a VZEROUPPER instruction on darwin/amd64.
The instruction is not available on pre-AVX machines. Guard it
with CPU feature.
Fixes#37459.
Change-Id: I9a064df277d091be4ee594eda5c7fd8ee323102b
Reviewed-on: https://go-review.googlesource.com/c/go/+/221057
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Apparently, the signal handling code path in darwin kernel leaves
the upper bits of Y registers in a dirty state, which causes many
SSE operations (128-bit and narrower) become much slower. Clear
the upper bits to get to a clean state.
We do it at the entry of asyncPreempt, which is immediately
following exiting from the kernel's signal handling code, if we
actually injected a call. It does not cover other exits where we
don't inject a call, e.g. failed preemption, profiling signal, or
other async signals. But it does cover an important use case of
async signals, preempting a tight numerical loop, which we
introduced in this cycle.
Running the benchmark in issue #37174:
name old time/op new time/op delta
Fast-8 90.0ns ± 1% 46.8ns ± 3% -47.97% (p=0.000 n=10+10)
Slow-8 188ns ± 5% 49ns ± 1% -73.82% (p=0.000 n=10+9)
There is no more slowdown due to preemption signals.
For #37174.
Change-Id: I8b83d083fade1cabbda09b4bc25ccbadafaf7605
Reviewed-on: https://go-review.googlesource.com/c/go/+/219131
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This adds asynchronous preemption function for amd64 and 386. These
functions spill and restore all register state that can be used by
user Go code.
For the moment we stub out the other arches.
For #10958, #24543.
Change-Id: I6f93fabe9875f4834922a5712362e79045c00aca
Reviewed-on: https://go-review.googlesource.com/c/go/+/201759
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>