Extracts changes that were submitted in other CLs to enable AVX512
detection, notably:
- https://go-review.googlesource.com/c/go/+/271521
- https://go-review.googlesource.com/c/go/+/379394
- https://go-review.googlesource.com/c/go/+/502476
This change adds properties to the cpu.X86 fields to enable runtime
detection of AVX512, and the hasAVX512F, hasAVX512BW, and hasAVX512VL
macros to support bypassing runtime checks in assembly code when
GOAMD64=v4 is set.
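As a hedged illustration (not part of this CL's diff), a runtime check
using the new fields might look like the following; copyAVX512 and
copyGeneric are hypothetical helpers:

    // Guard an AVX512 code path behind the new internal/cpu fields.
    if cpu.X86.HasAVX512F && cpu.X86.HasAVX512BW && cpu.X86.HasAVX512VL {
        copyAVX512(dst, src) // hand-written assembly fast path
    } else {
        copyGeneric(dst, src) // portable fallback
    }

With GOAMD64=v4, the hasAVX512F/BW/VL macros let assembly skip the
equivalent runtime check entirely.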
Change-Id: Ia7c3f22f1e66bf1de575aba522cb0d0a55ce791f
Reviewed-on: https://go-review.googlesource.com/c/go/+/536257
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <martin@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Martin Möhrmann <martin@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
Commit-Queue: Martin Möhrmann <martin@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
This CL adds four new time histogram metrics:
/sched/pauses/stopping/gc:seconds
/sched/pauses/stopping/other:seconds
/sched/pauses/total/gc:seconds
/sched/pauses/total/other:seconds
The "stopping" metrics measure the time taken to start a stop-the-world
pause. i.e., how long it takes stopTheWorldWithSema to stop all Ps.
This can be used to detect STW struggling to preempt Ps.
The "total" metrics measure the total duration of a stop-the-world
pause, from starting to stop-the-world until the world is started again.
This includes the time spent in the "start" phase.
The "gc" metrics are used for GC-related STW pauses. The "other" metrics
are used for all other STW pauses.
All of these metrics start timing in stopTheWorldWithSema only after
successfully acquiring sched.lock, thus excluding lock contention on
sched.lock. The reasoning behind this is that while waiting on
sched.lock the world is not stopped at all (all other Ps can run), so
the impact of this contention is primarily limited to the goroutine
attempting to stop-the-world. Additionally, we already have some
visibility into sched.lock contention via contention profiles (#57071).
/sched/pauses/total/gc:seconds is conceptually equivalent to
/gc/pauses:seconds, so the latter is marked as deprecated and returns
the same histogram as the former.
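For example, a consumer could read the new histograms with the
runtime/metrics API (a minimal sketch):

    package main

    import (
        "fmt"
        "runtime/metrics"
    )

    func main() {
        samples := []metrics.Sample{
            {Name: "/sched/pauses/stopping/gc:seconds"},
            {Name: "/sched/pauses/total/gc:seconds"},
        }
        metrics.Read(samples)
        for _, s := range samples {
            // Each value is a time histogram of pause durations.
            fmt.Println(s.Name, s.Value.Float64Histogram().Counts)
        }
    }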
In the implementation, there are a few minor differences:
* For both mark and sweep termination stops, /gc/pauses:seconds started
  timing prior to calling stopTheWorldWithSema, thus including lock
  contention.
These details are minor enough that I do not believe the slight change
in reporting will matter. For mark termination stops, moving the timing
stop into startTheWorldWithSema does have the side effect of requiring
that other GC metric calculations move outside of the STW, as they
depend on the same end time.
Fixes #63340
Change-Id: Iacd0bab11bedab85d3dcfb982361413a7d9c0d05
Reviewed-on: https://go-review.googlesource.com/c/go/+/534161
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Currently wakeableSleep has a race where, although stopTimer is called,
the timer could be queued already and fire *after* the wakeup channel is
closed.
Fix this by protecting wakeup with a lock used on the close and wake
paths and assigning the wakeup to nil on close. The wake path then
ignores a nil wakeup channel. This fixes the problem by ensuring that a
failure to stop the timer only results in the timer doing nothing,
rather than trying to send on a closed channel.
The addition of this lock requires some changes to the static lock
ranking system.
There's also a second problem here: the timer could be delayed far
enough into the future that when it fires, it observes a non-nil wakeup
if the wakeableSleep has been re-initialized and reset.
Fix this problem too by allocating the wakeableSleep on the heap and
creating a new one instead of reinitializing the old one. The GC will
make sure that the reference to the old one stays alive for the timer to
fire, but that timer firing won't cause a spurious wakeup in the new
one.
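A minimal sketch of the resulting synchronization, written with
sync.Mutex for readability (the runtime uses its own mutex plus the
static lock-ranking changes mentioned above):

    import "sync"

    type wakeableSleep struct {
        mu     sync.Mutex    // protects wakeup
        wakeup chan struct{} // nil once closed
    }

    // wake is what the timer calls when it fires. A nil channel means
    // the sleeper was closed, so a late-firing timer does nothing.
    func (s *wakeableSleep) wake() {
        s.mu.Lock()
        defer s.mu.Unlock()
        if s.wakeup != nil {
            select {
            case s.wakeup <- struct{}{}:
            default:
            }
        }
    }

    func (s *wakeableSleep) close() {
        s.mu.Lock()
        c := s.wakeup
        s.wakeup = nil // future wakes become no-ops
        s.mu.Unlock()
        close(c)
    }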
Change-Id: I2b979304e755c015d4466991f135396f6a271069
Reviewed-on: https://go-review.googlesource.com/c/go/+/542335
Reviewed-by: Michael Pratt <mpratt@google.com>
Commit-Queue: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Removes the RSA KEX based ciphers from the default list. This can be
reverted using the tlsrsakex GODEBUG.
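For example, a deployment that still depends on RSA key-exchange cipher
suites can opt back in with:

    GODEBUG=tlsrsakex=1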
Fixes #63413
Change-Id: Id221be3eb2f6c24b91039d380313f0c87d339f98
Reviewed-on: https://go-review.googlesource.com/c/go/+/541517
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Damien Neil <dneil@google.com>
Updates the default minimum TLS version from 1.0 to 1.2 for servers,
bringing it in line with clients. Adds a GODEBUG setting, tls10server,
which lets users revert this change.
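As with other TLS GODEBUGs, the old default can be restored with:

    GODEBUG=tls10server=1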
Fixes #62459
Change-Id: I2b82f85b1c2d527df1f9afefae4ab30a8f0ceb41
Reviewed-on: https://go-review.googlesource.com/c/go/+/541516
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Damien Neil <dneil@google.com>
After landing the new execution tracer, the Windows builders failed with
some new errors.
Currently the GoSyscallBegin event has no indicator that it's the target
of a ProcSteal event. This can lead to an ambiguous situation that is
unresolvable if timestamps are broken. For instance, if the tracer sees
the ProcSteal event while a goroutine has been observed to be in a
syscall (one that, say, did not actually lose its P), it will
proceed with the ProcSteal incorrectly.
This is a little abstract. For a more concrete example, see the
go122-syscall-steal-proc-ambiguous test.
This change resolves this ambiguity by interleaving GoSyscallBegin
events into how Ps are sequenced. Because a ProcSteal has a sequence
number (it has to, it's stopping a P from a distance) it necessarily
has to synchronize with a precise ProcStart event. This change basically
just extends this synchronization to GoSyscallBegin, so the ProcSteal
can't advance until _exactly the right_ syscall has been entered.
This change removes the test skip, since it and CL 541695 fix the two
main issues observed on Windows platforms.
For #60773.
Fixes #64061.
Change-Id: I069389cd7fe1ea903edf42d79912f6e2bcc23f62
Reviewed-on: https://go-review.googlesource.com/c/go/+/541696
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Most of the uses of work.pauseStart are completely useless; it could
simply be a local variable. One use passes a parameter from gcMarkDone
to gcMarkTermination, but that could simply be an argument.
Keeping this field in workType makes it seem more important than it
really is, so just drop it.
Change-Id: I2fdc0b21f8844e5e7be47148c3e10f13e49815c6
Reviewed-on: https://go-review.googlesource.com/c/go/+/542075
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This CL exports the previously unexported Alias type and
corresponding functions and methods per issue #63223.
Whether Alias types are used or not is controlled by
the gotypesalias setting with the GODEBUG environment
variable. Setting gotypesalias to "1" enables the Alias
types:
GODEBUG=gotypesalias=1
By default, gotypesalias is not set.
Adjust test cases that enable/disable the use of Alias
types to use -gotypesalias=1 or -gotypesalias=0 rather
than -alias and -alias=false for consistency and to
avoid confusion.
For #63223.
Change-Id: I51308cad3320981afac97dd8c6f6a416fdb0be55
Reviewed-on: https://go-review.googlesource.com/c/go/+/541737
Run-TryBot: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
These functions acquire the heap lock. If they're not called on the
systemstack, a stack growth could cause a self-deadlock since stack
growth may allocate memory from the page heap.
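Illustrative sketch (runtime-internal pseudocode, call shape assumed)
of keeping such a call off the growable goroutine stack:

    // Run the page-heap operation on the system stack, where the
    // stack never grows, so stack growth cannot re-enter the heap
    // lock and self-deadlock.
    systemstack(func() {
        lock(&mheap_.lock)
        // ... operate on the page heap ...
        unlock(&mheap_.lock)
    })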
This has been a problem for a while. If this is what's plaguing the
ppc64 port right now, it's very surprising (and probably just
coincidental) that it's showing up now.
For #64050.
For #64062.
Fixes #64067.
Change-Id: I2b95dc134d17be63b9fe8f7a3370fe5b5438682f
Reviewed-on: https://go-review.googlesource.com/c/go/+/541635
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
This change mostly implements the design described in #60773 and
includes a new scalable parser for the new trace format, available in
internal/trace/v2. I'll leave this commit message short because this is
clearly an enormous CL with a lot of detail.
This change does not hook up the new tracer into cmd/trace yet. A
follow-up CL will handle that.
For #60773.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race
Change-Id: I5d2aca2cc07580ed3c76a9813ac48ec96b157de0
Reviewed-on: https://go-review.googlesource.com/c/go/+/494187
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Currently the user arena code writes heap bits to the (*mspan).heapBits
space with the platform-specific byte ordering (the heap bits are
written and managed as uintptrs). However, the compiler always emits GC
metadata for types in little endian.
Because the scanning part of the code that loads through the type
pointer in the allocation header expects little endian ordering, we end
up with the wrong byte ordering in GC when trying to scan arena memory.
Fix this by writing out the user arena heap bits in little endian on big
endian platforms.
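A hedged illustration of the idea (not the runtime's actual code):
store each word of heap bits in little-endian order regardless of the
host byte order, matching what the scanner expects.

    import "encoding/binary"

    // putHeapBitsLE writes one word of pointer/scalar bits so that a
    // little-endian reader sees the same layout on any platform.
    func putHeapBitsLE(dst []byte, word uint64) {
        binary.LittleEndian.PutUint64(dst[:8], word)
    }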
This means that the space returned by (*mspan).heapBits has a different
meaning for user arenas and small object spans, which is a little odd,
so I documented it. To reduce the chance of misuse of the writeHeapBits
API, which now writes out heap bits in a different ordering than
writeSmallHeapBits on big endian platforms, this change also renames
writeHeapBits to writeUserArenaHeapBits.
Much of this can be avoided in the future if the compiler were to write
out the pointer/scalar bits as an array of uintptr values instead of
plain bytes. That's too big of a change for right now though.
This change is a no-op on little endian platforms. I confirmed it by
checking for any assembly code differences in the runtime test binary.
There were none. With this change, the arena tests pass on ppc64.
Fixes #64048.
Change-Id: If077d003872fcccf5a154ff5d8441a58582061bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/541315
Run-TryBot: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Currently tickspersecond forces a 100 millisecond sleep the first time
it's called. This isn't great for profiling short-lived programs, since
both CPU profiling and block profiling might call into it.
100 milliseconds is a long time, but it's chosen to try and capture a
decent estimate of the conversion on platforms with coarse-granularity
clocks. If the granularity is 15 ms, it'll only be 15% off at worst.
Let's try a different strategy. First, let's require 5 milliseconds of
time to have elapsed at a minimum. This should be plenty on platforms
with nanosecond time granularity from the system clock, provided the
caller of tickspersecond intends to use it for calculating durations,
not timestamps. Next, grab a timestamp as close to process start as
possible, so that we can cover some of those 5 milliseconds just during
runtime start.
Finally, this function is only ever called from normal goroutine
contexts. Let's do a regular goroutine sleep instead of a thread-level
sleep under a runtime lock, which has all sorts of nasty effects on
preemption.
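In pseudocode, the scheme looks roughly like this (a simplified sketch;
the real code caches the result and guards against overflow):

    var startTicks, startNanos int64 // captured near runtime start

    // ticksPerSecond waits, as an ordinary goroutine rather than
    // under a runtime lock, until at least 5ms of wall time has
    // elapsed since startup, then returns the measured ratio.
    func ticksPerSecond() int64 {
        for {
            t, n := cputicks(), nanotime()
            if elapsed := n - startNanos; elapsed >= 5_000_000 {
                return (t - startTicks) * 1_000_000_000 / elapsed
            }
            timeSleep(5_000_000 - (n - startNanos)) // goroutine sleep
        }
    }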
While we're here, let's also rename tickspersecond to ticksPerSecond.
Also, let's write down some explicit rules of thumb on when to use this
function. Clocks are hard, and using this for timestamp conversion is
likely to make lining up those timestamps with other clocks on the
system difficult if not impossible.
Note that while this improves ticksPerSecond on platforms with good
clocks, we still end up with a pretty coarse sleep on platforms with
coarse clocks, and a pretty coarse result. On these platforms, keep the
minimum required elapsed time at 100 ms. There's not much we can do
about these platforms except spin and try to catch the clock boundary,
but at 10+ ms of granularity, that might be a lot of spinning.
Fixes #63103.
Fixes #63078.
Change-Id: Ic32a4ba70a03bdf5c13cb80c2669c4064aa4cca2
Reviewed-on: https://go-review.googlesource.com/c/go/+/538898
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Currently dedicated GC mark workers really try to avoid getting
preempted. The one exception is for a pending STW, indicated by
sched.gcwaiting. This is currently fine because other kinds of
preemptions don't matter to the mark workers: they're intentionally
bound to their P.
With the new execution tracer we're going to want to use forEachP to get
the attention of all Ps. We may want to do this during a GC cycle.
forEachP doesn't set sched.gcwaiting, so it may end up waiting for the full
GC mark phase, burning a thread and a P in the meantime. This can mean
basically seconds of waiting and trying to preempt GC mark workers.
This change makes all mark workers yield if (*p).runSafePointFn != 0 so
that the workers actually yield somewhat promptly in response to a
forEachP attempt.
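Conceptually, the dedicated worker's preemption check in its drain loop
grows from checking gcwaiting alone to something like this (sketch):

    // Yield the dedicated mark worker if a STW is pending or if
    // someone is running forEachP and needs this P at a safe point.
    if sched.gcwaiting.Load() || pp.runSafePointFn != 0 {
        break // return to the scheduler, which runs the safe-point fn
    }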
Change-Id: I7430baf326886b9f7a868704482a224dae7c9bba
Reviewed-on: https://go-review.googlesource.com/c/go/+/537235
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Currently any thread that tries to get the attention of all Ps (e.g.
stopTheWorldWithSema and forEachP) ends up in a non-preemptible state
waiting to preempt another thread. Thing is, that other thread might
also be in a non-preemptible state, trying to preempt the first thread,
resulting in a deadlock.
This is a general problem, but in practice it only boils down to one
specific scenario: a thread in GC is blocked trying to preempt a
goroutine to scan its stack while that goroutine is blocked in a
non-preemptible state to get the attention of all Ps.
There's currently a hack in a few places in the runtime to move the
calling goroutine into _Gwaiting before it goes into a non-preemptible
state to preempt other threads. This lets the GC scan its stack because
the goroutine is trivially preemptible. The only restriction is that
forEachP and stopTheWorldWithSema absolutely cannot reference the
calling goroutine's stack. This is generally not necessary, so things
are good.
Anyway, to avoid exposing the details of this hack, this change creates
a safer wrapper around forEachP (renaming the existing function to
forEachPInternal and giving the wrapper the old forEachP name) that
performs the goroutine status change, just like stopTheWorld does. We're
going to need to use this
hack with forEachP in the new tracer, so this avoids propagating the
hack further and leaves it as an implementation detail.
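The shape of the wrapper, as a sketch (helper names per the runtime;
details elided):

    // forEachP makes the calling goroutine trivially preemptible for
    // stack scanning while it waits on all Ps, then restores it.
    func forEachP(reason waitReason, fn func(*p)) {
        systemstack(func() {
            gp := getg().m.curg
            // Neither fn nor forEachPInternal may reference gp's stack.
            casGToWaiting(gp, _Grunning, reason)
            forEachPInternal(fn)
            casgstatus(gp, _Gwaiting, _Grunning)
        })
    }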
Change-Id: I51f02e8d8e0a3172334d23787e31abefb8a129ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/533455
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Currently the execution tracer synchronizes with itself using very
heavyweight operations. As a result, it's totally fine for most of the
tracer code to look like:
    if traceEnabled() {
        traceXXX(...)
    }
However, if we want to make that synchronization more lightweight (as
issue #60773 proposes), then this is insufficient. In particular, we
need to make sure the tracer can't observe an inconsistency between g
atomicstatus and the event that would be emitted for a particular
g transition. This means making the g status change appear to happen
atomically with the corresponding trace event being written out from the
perspective of the tracer.
This requires a change in API to something more like a lock. While we're
here, we might as well make sure that trace events can *only* be emitted
while this lock is held. This change introduces such an API:
traceAcquire, which returns a value that can emit events, and
traceRelease, which requires the value that was returned by
traceAcquire. In practice, this won't be a real lock, it'll be more like
a seqlock.
For the current tracer, this API is completely overkill and the value
returned by traceAcquire basically just checks trace.enabled. But it's
necessary for the tracer described in #60773 and we can implement that
more cleanly if we do this refactoring now instead of later.
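Under the new API, the pattern above becomes something like this
(sketch; the event method is illustrative):

    // Events may only be emitted between traceAcquire and traceRelease.
    trace := traceAcquire()
    if trace.ok() {
        trace.GoUnpark(gp, skip) // emit via the returned value
        traceRelease(trace)
    }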
For #60773.
Change-Id: Ibb9ff5958376339fafc2b5180aef65cf2ba18646
Reviewed-on: https://go-review.googlesource.com/c/go/+/515635
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change replaces the 1-bit-per-word heap bitmap for most size
classes with allocation headers for objects that contain pointers. The
header consists of a single pointer to a type. All allocations with
headers are treated as implicitly containing one or more instances of
the type in the header.
As the name implies, headers are usually stored as the first word of an
object. There are two additional exceptions to where headers are stored
and how they're used.
Objects smaller than 512 bytes do not have headers. Instead, a heap
bitmap is reserved at the end of spans for objects of this size. A full
word of overhead is too much for these small objects. The bitmap is of
the same format as the old bitmap, minus the noMorePtrs bits, which are
unnecessary. All the objects <512 bytes have a bitmap less than a
pointer-word in size, and that was the granularity at which noMorePtrs
could stop scanning early anyway.
Objects that are larger than 32 KiB (which have their own span) have
their headers stored directly in the span, to allow power-of-two-sized
allocations to not spill over into an extra page.
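A hedged sketch of the common case (the real representation lives in
the runtime's bitmap code; _type is the runtime's type descriptor):

    // Layout for a pointer-containing object in most size classes:
    //
    //   [ header: *_type | object data ... ]
    //
    // so recovering the type during scanning is a single load:
    func headerType(obj unsafe.Pointer) *_type {
        return *(**_type)(obj) // the header is the object's first word
    }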
The full implementation is behind GOEXPERIMENT=allocheaders.
The purpose of this change is performance. First and foremost, with
headers we no longer have to unroll pointer/scalar data at allocation
time for most size classes. Small size classes still need some
unrolling, but their bitmaps are small so we can optimize that case
fairly well. Larger objects effectively have their pointer/scalar data
unrolled on-demand from type data, which is much more compactly
represented and results in less TLB pressure. Furthermore, since the
headers are usually right next to the object and where we're about to
start scanning, we get an additional temporal locality benefit in the
data cache when looking up type metadata. The pointer/scalar data is
now effectively unrolled on-demand, but it's also simpler to unroll than
before; that unrolled data is never written anywhere, and for arrays we
get the benefit of retreading the same data per element, as opposed to
looking it up from scratch for each pointer-word of bitmap. Lastly,
because we no longer have a heap bitmap that spans the entire heap,
there's a flat 1.5% memory use reduction. This is balanced slightly by
some objects possibly being bumped up a size class, but most objects are
not tightly optimized to size class sizes so there's some memory to
spare, making the header basically free in those cases.
See the follow-up CL which turns on this experiment by default for
benchmark results. (CL 538217.)
Change-Id: I4c9034ee200650d06d8bdecd579d5f7c1bbf1fc5
Reviewed-on: https://go-review.googlesource.com/c/go/+/437955
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This change adds the allocation headers GOEXPERIMENT, which is a no-op.
It forks two runtime files temporarily to make the GOEXPERIMENT easier
to maintain. The forked files are mbitmap.go and msize.go.
Change-Id: I60202c00e614e4517de7dd000029cf80dd0121ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/537980
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Temporarily skip the test to make the builder happy. Will work on a fix.
Updates #63938.
Change-Id: Ic9db771342108430c29774b2c3e50043791189a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/541195
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Use ADD with constants instead of ADDI. Also use SUB with a positive constant
rather than ADD with a negative constant. The resulting assembly is still the
same.
Change-Id: Ife10bf5ae4122e525f0e7d41b5e463e748236a9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/540136
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
With the introduction of runtime.Pinner, returning a pointer to a pinned
struct that then points to an unpinned Go pointer is correctly caught.
However, the error message remained as "cgo result has Go pointer",
which should be updated to acknowledge that Go pointers to pinned
memory are allowed.
This also updates the comments for cgoCheckArg and cgoCheckResult
to make the same clarification.
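For reference, the now-permitted shape looks roughly like this (a
hedged sketch; Data is a hypothetical type and the C side is elided):

    var pinner runtime.Pinner

    //export getData
    func getData() *Data {
        d := &Data{buf: new([64]byte)}
        pinner.Pin(d)     // pin the struct handed to C
        pinner.Pin(d.buf) // Go pointers to pinned memory are allowed
        return d
    }

Returning d with an unpinned d.buf would still be caught, now with an
error message that reflects the pinning rules.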
Updates #46787
Change-Id: I147bb09e87dfb70a24d6d43e4cf84e8bcc2aff48
GitHub-Last-Rev: 706facb9f2
GitHub-Pull-Request: golang/go#62606
Reviewed-on: https://go-review.googlesource.com/c/go/+/527702
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
This CL continues adding support for And/Or primitives to
more architectures, this time for arm/arm64.
For #61395
Change-Id: Icc44ea65884c825698a345299d8f9511392aceb6
GitHub-Last-Rev: 8267665a03
GitHub-Pull-Request: golang/go#62674
Reviewed-on: https://go-review.googlesource.com/c/go/+/528797
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
The writeBarrier "needed" struct member has the exact same
value as "enabled", and is used interchangeably.
I'm not sure if we plan to make a distinction between the
two at some point, but today they are effectively the same,
so dedup it and keep only "enabled".
Change-Id: I65e596f174e1e820dc471a45ff70c0ef4efbc386
GitHub-Last-Rev: f8c805a916
GitHub-Pull-Request: golang/go#63814
Reviewed-on: https://go-review.googlesource.com/c/go/+/538495
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This CL adds the atomic primitives for the And/Or operators on x86-64.
It also includes missing benchmarks for the ops.
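For example (sketch; this is the runtime-internal atomic package, and
the new primitives return the old value):

    // Atomically set a bit and learn whether we were the one to set it.
    old := atomic.Or32(&flags, 1<<idx)
    if old&(1<<idx) == 0 {
        // the bit was clear before; this goroutine set it
    }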
For #61395
Change-Id: I23ef5192866d21fc3a479d0159edeafc3aeb5c47
GitHub-Last-Rev: df800be192
GitHub-Pull-Request: golang/go#62621
Reviewed-on: https://go-review.googlesource.com/c/go/+/528315
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Previously, badmorestackg0 was never called since it was behind a g ==
R1 check, R1 holding g.m. This is clearly wrong, since we want to check
whether g == g0. Fixed by using R2, which holds the value of g0.
Fixes #63953
Change-Id: I1e2a1c3be7ad9e7ae8dbf706ef6783e664a44764
GitHub-Last-Rev: b3e92cf286
GitHub-Pull-Request: golang/go#63954
Reviewed-on: https://go-review.googlesource.com/c/go/+/539840
Reviewed-by: Austin Clements <austin@google.com>
Auto-Submit: Austin Clements <austin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
ReadMetricsSlow was updated to call the core of readMetrics on the
systemstack to prevent issues with stat skew if the stack gets moved
between readmemstats_m and readMetrics. However, readMetrics calls into
the map implementation, which has race instrumentation. The system stack
typically has no racectx set, resulting in crashes.
Donate racectx to g0 like the tracer does, so that these accesses don't
crash.
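The donation, as a sketch mirroring the tracer's pattern:

    // Lend the current goroutine's racectx to g0 for the duration of
    // the system-stack call so race-instrumented code doesn't crash.
    if raceenabled {
        getg().m.g0.racectx = getg().racectx
    }
    systemstack(func() {
        readMetrics(samplesp, len, cap)
    })
    if raceenabled {
        getg().m.g0.racectx = 0
    }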
For #60607.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-race
Change-Id: Ic0251af2d9b60361f071fe97084508223109480c
Reviewed-on: https://go-review.googlesource.com/c/go/+/539695
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Currently it's possible (and even probable, with mayMoreStackMove mode)
for a stack allocation to occur between readmemstats_m and readMetrics
in ReadMetricsSlow. This can cause tests to fail by producing metrics
that are inconsistent between the two sources.
Fix this by breaking out the critical section of readMetrics and calling
that from ReadMetricsSlow on the systemstack. Our main constraint in
calling readMetrics on the system stack is the fact that we can't
acquire the metrics semaphore from the system stack. But if we break out
the critical section, then we can acquire that semaphore before we go on
the system stack.
While we're here, add another readMetrics call before readmemstats_m.
Since we're being paranoid about ways that metrics could get skewed
between the two calls, let's eliminate all uncertainty. It's possible
for readMetrics to allocate new memory, for example for histograms, and
fail while it's reading metrics. I believe we're just getting lucky
today with the order in which the metrics are produced. Another call to
readMetrics will preallocate this data in the samples slice. One nice
thing about this second read is that now we effectively have a way to
check if readMetrics really will allocate if called a second time on the
same samples slice.
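Putting it together, ReadMetricsSlow ends up shaped roughly like this
(sketch; helper names assumed):

    // Acquire the metrics semaphore on the ordinary goroutine stack
    // (that's not allowed on the system stack), then take both
    // readings on the system stack so no stack move can happen
    // between them.
    metricsLock()
    initMetrics()
    systemstack(func() {
        readmemstats_m(&memStats)
        readMetricsLocked(samplesp, len, cap)
    })
    metricsUnlock()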
Fixes #60607.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Change-Id: If6ce666530903239ef9f02dbbc3f1cb6be71e425
Reviewed-on: https://go-review.googlesource.com/c/go/+/539117
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This was converted to a compiler intrinsic and no longer needs to exist
in assembly.
Change-Id: I7495c435d4642e0e71d8f7677d70af3a3ca2a6ba
Reviewed-on: https://go-review.googlesource.com/c/go/+/539195
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
This will make the upcoming GOEXPERIMENT easier to implement, since this
function relies on a lot of heap bitmap internals.
Change-Id: I2ab76e928e7bfd383dcdb5bfe72c9b23c2002a4e
Reviewed-on: https://go-review.googlesource.com/c/go/+/537979
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
We're going to want to fork this data in the near future for a
GOEXPERIMENT, so break it out now.
Change-Id: Ia7ded850bb693c443fe439c6b7279dcac638512c
Reviewed-on: https://go-review.googlesource.com/c/go/+/537978
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Go 1.21 and earlier do not understand this line, causing
"go mod vendor" of //go:build go1.22-tagged code that
uses this feature to fail.
The solution is to include the go/build change to skip over
the line in Go 1.22 (making "go mod vendor" from Go 1.22 onward
work with this change) and then wait to deploy the cgo change
until Go 1.23, at which point Go 1.21 and earlier will be unsupported.
For #56378.
Fixes #63293.
Change-Id: Ifa08b134eac5a6aa15d67dad0851f00e15e1e58b
Reviewed-on: https://go-review.googlesource.com/c/go/+/539235
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Change-Id: Ib89a71e20f9c6b86c97814c75cb427e9bd7075e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/538735
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
OpenBSD 6.3 is more than five years old and has not been supported for
the last four years (only 7.3 and 7.4 are currently supported). As such,
remove special handling of MAP_STACK for 6.3 and earlier.
Change-Id: I1086c910bbcade7fb3938bb1226813212794b587
Reviewed-on: https://go-review.googlesource.com/c/go/+/538458
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Aaron Bieber <aaron@bolddaemon.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Allow up to 10 standard deviations from the mean, instead of the ~5
that the current test allows.
10 standard deviations allows up to a 4500/5500 split.
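As a worked check: a 4500/5500 split implies n = 10000 fair trials,
where one standard deviation is sqrt(10000 * 0.5 * 0.5) = 50, so 10
standard deviations around the mean of 5000 is 5000 +/- 500, i.e.
exactly the 4500/5500 split above.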
Fixes #52465
Change-Id: Icb21c1d31fafbcf4723b75435ba5e98863e812c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/538815
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Make the choice of using these instructions dynamic (triggered by cpu
feature detection) rather than static (triggered by the GOARM setting).
If GOARM>=7, we know we have them.
For GOARM=5/6, dynamically dispatch based on auxv information.
Update #17082
Update #61588
Change-Id: I8a50481d942f62cf36348998a99225d0d242f8af
Reviewed-on: https://go-review.googlesource.com/c/go/+/525637
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
CL 495415 weakened the avalanche test, allowing results from 43% to
57%. Since then, the only failure we have seen was at 58%, on the
linux-386-longtest builder, so let's give the test a bit more wiggle
room: 40% to 59%.
Fixes #60170
Change-Id: I9528ebc8601975b733c3d9fd464ce41429654273
Reviewed-on: https://go-review.googlesource.com/c/go/+/538655
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
On some platforms (notably OpenBSD), stacks must be specifically allocated
and marked as being stack memory. Allocate the crash stack using stackalloc,
which ensures these requirements are met, rather than using a global Go
variable.
Fixes #63794
Change-Id: I6513575797dd69ff0a36f3bfd4e5fc3bd95cbf50
Reviewed-on: https://go-review.googlesource.com/c/go/+/538457
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This is the beginning of the math/rand/v2 package from proposal #61716.
Start by copying the old API. This CL copies math/rand/* to math/rand/v2
and updates references to math/rand to add v2 throughout.
Later CLs will make the v2 changes.
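At this stage, the only difference for callers is the import path:

    import "math/rand/v2" // was: import "math/rand"

The API itself is unchanged until the later v2 CLs land.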
For #61716.
Change-Id: I1624ccffae3dfa442d4ba2461942decbd076e11b
Reviewed-on: https://go-review.googlesource.com/c/go/+/502495
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
An error like "morestack on g0" is one of the errors that is very
hard to debug, because it often doesn't print a useful stack trace.
The runtime doesn't directly print a stack trace because the stack is
in too bad a state to call print. Sometimes the SIGABRT may trigger
a traceback, but sometimes not, especially in a cgo binary. Even if
it triggers a traceback, it often does not include the stack trace
of the bad stack.
This CL makes it explicitly print a stack trace and throw. The
idea is to have some space as an "emergency" crash stack. When the
stack is in a really bad state, we switch to the crash stack and
do a traceback.
Currently only implemented on AMD64 and ARM64.
TODO: also handle errors like "morestack on gsignal" and bad
systemstack. Also handle other architectures.
Change-Id: Ibfc397202f2bb0737c5cbe99f2763de83301c1c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/419435
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
After CL 527715, needm uses callbackUpdateSystemStack to set the stack
bounds for g0 on an M from the extra M list. Since
callbackUpdateSystemStack is also used for recursive cgocallback, it
does nothing if the stack is already in bounds.
Currently, the stack bounds in an extra M may contain stale bounds from
a previous thread that used this M and then returned it to the extra
list in dropm.
Typically a new thread will not have an overlapping stack with an old
thread, but because the old thread has exited there is a small chance
that the C memory allocator will allocate the new thread's stack
partially or fully overlapping with the old thread's stack.
If this occurs, then callbackUpdateSystemStack will not update the stack
bounds. If in addition, the overlap is partial such that SP on
cgocallback is close to the recorded stack lower bound, then Go may
quickly "overflow" the stack and crash with "morestack on g0".
Fix this by clearing the stack bounds in dropm, which ensures that
callbackUpdateSystemStack will unconditionally update the bounds in
needm.
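The fix, as a sketch of the dropm side (field names per the runtime's
g struct):

    // Return the M to the extra list with cleared g0 stack bounds, so
    // the next needm must compute fresh bounds for its new thread.
    g0 := mp.g0
    g0.stack.hi = 0
    g0.stack.lo = 0
    g0.stackguard0 = 0
    g0.stackguard1 = 0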
For #62440.
Change-Id: Ic9e2052c2090dd679ed716d1a23a86d66cbcada7
Reviewed-on: https://go-review.googlesource.com/c/go/+/537695
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Bypass: Michael Pratt <mpratt@google.com>
As spotted by staticcheck, the body did keep track of errors by sharing
a single err variable, but its last value was never used, as the function
simply finished by returning nil.
To prevent postDecode from erroring on empty profiles,
which breaks TestEmptyProfile, add a check at the top of the function.
Update the runtime/pprof test accordingly,
since the default units didn't make sense for an empty profile anyway.
Change-Id: I188cd8337434adf9169651ab5c914731b8b20f39
Reviewed-on: https://go-review.googlesource.com/c/go/+/483137
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
These primitives will be used by the new And/Or sync/atomic apis.
For #61395
Change-Id: I4062d6317e01afd94d3588f5425237723ab15ade
GitHub-Last-Rev: c0a8d8f34d
GitHub-Pull-Request: golang/go#63272
Reviewed-on: https://go-review.googlesource.com/c/go/+/531575
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
This implements the approach I described in
https://go-review.git.corp.google.com/c/go/+/494057/1#message-5c9773bded2f89b4058848cb036b860aa6716de3.
Specifically:
- Each level of test atomically records the cumulative number of races
seen as of the last race-induced test failure.
- When a subtest fails, it logs the race error, and then updates its
parents' counters so that they will not log the same error.
- We check each test or benchmark for races before it starts running
each of its subtests or sub-benchmark, before unblocking parallel
subtests, and after running any cleanup functions.
With this implementation, it should be the case that every test that
is running when a race is detected reports that race, and any race
reported for a subtest is not redundantly reported for its parent.
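The bookkeeping has roughly the following shape (an illustrative
sketch; field and helper names approximate the testing package's
internals):

    // Each test snapshots the cumulative race-error count at its last
    // race-induced failure. A race is reported only by tests whose
    // snapshot is stale, so parents don't re-report a child's race.
    func (c *common) checkRaces() (raced bool) {
        if n := int64(race.Errors()); n > c.lastRaceErrors.Load() {
            c.lastRaceErrors.Store(n)
            raced = true
            // ... log the failure, then raise the parents' snapshots
        }
        return raced
    }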
The regression tests are based on those added in CL 494057 and
CL 501895, with a few additions based on my own review of the code.
Fixes #60083.
Change-Id: I578ae929f192a7a951b31b17ecb560cbbf1ef7a1
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race,gotip-windows-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/506300
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The goroutine profile has close to three code paths for adding a
goroutine record to the goroutine profile: one for the goroutine that
requested the profile, one for every other goroutine, plus some special
handling for the finalizer goroutine. The first of those captured the
goroutine stack, but neglected to include that goroutine's labels.
Update the tests to check for the inclusion of labels for all three
types of goroutines, and include labels for the creator of the goroutine
profile.
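For example, with labels set via runtime/pprof, the goroutine that
requests the profile now reports its own labels too (sketch; assumes
the usual context/os/pprof imports):

    pprof.Do(ctx, pprof.Labels("worker", "demo"), func(ctx context.Context) {
        // With this fix, the record for this goroutine (the one
        // writing the profile) includes the labels above as well.
        pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)
    })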
For #63712
Change-Id: Id5387a5f536d3c37268c240e0b6db3d329a3d632
Reviewed-on: https://go-review.googlesource.com/c/go/+/537515
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: David Chase <drchase@google.com>
Change-Id: I3f0b7209621b39cee69566a5cc95e4343b4f1f20
GitHub-Last-Rev: af9dbbe69a
GitHub-Pull-Request: golang/go#63321
Reviewed-on: https://go-review.googlesource.com/c/go/+/531916
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>