This reverts CL 481061.
Reason for revert: When built with C TSAN, x_cgo_getstackbound triggers
race detection on `g->stacklo` because the synchronization is in Go,
which isn't instrumented.
For #51676.
For #59294.
For #59678.
Change-Id: I38afcda9fcffd6537582a39a5214bc23dc147d47
Reviewed-on: https://go-review.googlesource.com/c/go/+/485275
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
(This is a retry of CL 462035, which was reverted in CL 474976.
The only change from that CL is the aix fix SRODATA->SNOPTRDATA
at inittask.go:141.)
As described here:
https://github.com/golang/go/issues/31636#issuecomment-493271830
"Find the lexically earliest package that is not initialized yet,
but has had all its dependencies initialized, initialize that package,
and repeat."
Simplify the runtime a bit, by just computing the ordering required
in the linker and giving a list to the runtime.
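As an illustration, here is a runnable sketch of that rule over a small hypothetical import graph (initOrder and the package names are made up for this example; the linker's actual implementation differs):

    package main

    import (
    	"fmt"
    	"sort"
    )

    // initOrder repeatedly initializes the lexically earliest package
    // whose dependencies have all been initialized, per the rule quoted
    // above. deps maps each package to its imports. Go forbids import
    // cycles, so a ready candidate always exists.
    func initOrder(deps map[string][]string) []string {
    	done := map[string]bool{}
    	var order []string
    	for len(done) < len(deps) {
    		var ready []string
    		for pkg, ds := range deps {
    			if done[pkg] {
    				continue
    			}
    			ok := true
    			for _, d := range ds {
    				ok = ok && done[d]
    			}
    			if ok {
    				ready = append(ready, pkg)
    			}
    		}
    		sort.Strings(ready)
    		done[ready[0]] = true
    		order = append(order, ready[0])
    	}
    	return order
    }

    func main() {
    	deps := map[string][]string{
    		"a": {"b", "c"},
    		"b": {},
    		"c": {"b"},
    	}
    	fmt.Println(initOrder(deps)) // [b c a]
    }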
Updates #31636
Fixes #57411
RELNOTE=yes
Change-Id: I28c09451d6aa677d7394c179d23c2c02c503fc56
Reviewed-on: https://go-review.googlesource.com/c/go/+/478916
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speeds up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix, addressing C to Go calls made
after the C code has used some stack space, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M using needm while invoking a Go function from C. But needm and dropm are costly due to the signal-related syscalls.
So, we change to no longer dropm when returning to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm when the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
Store the g0 of the current m into the thread-specific value of the pthread key, only once per C thread, so that the destructor can put the extra M back onto the extra M list when the C thread exits.
When returning back to C:
Skip dropm in cgocallback when the pthread variable has been created, so that the extra M will be reused the next time a Go function is invoked from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct, and we have to keep it on systems with cgo but without pthreads, like Windows.
This optimization is significant; the exact speedup depends on the OS and CPU, but in general a simple Go function call from a C thread can be considered roughly 10x faster.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. 28x faster, from 3395 ns/op to 121 ns/op, on darwin with an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. 6.5x faster, from 1495 ns/op to 230 ns/op, on Linux with an Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
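A standalone cgo sketch of the pthread-key mechanism described above (a demonstration of pthread_key_create and its thread-exit destructor with illustrative names, not the runtime's actual code):

    package main

    /*
    #cgo LDFLAGS: -lpthread
    #include <pthread.h>
    #include <stdio.h>

    static pthread_key_t key;

    // The destructor runs when a thread that stored a non-NULL value
    // for the key exits; the runtime uses this hook to put the extra M
    // back onto the extra M list.
    static void destructor(void *v) {
    	printf("thread exiting, value=%p\n", v);
    }

    static void *threadfn(void *arg) {
    	// Once per C thread: store a thread-specific value so the
    	// destructor fires at thread exit.
    	pthread_setspecific(key, (void *)0x1);
    	return NULL;
    }

    static void demo(void) {
    	// Once per process (or shared object): create the key and
    	// register the destructor.
    	pthread_key_create(&key, destructor);
    	pthread_t t;
    	pthread_create(&t, NULL, threadfn, NULL);
    	pthread_join(t, NULL);
    }
    */
    import "C"

    func main() {
    	C.demo()
    }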
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. On some platforms this may still be a guess, as we don't
know exactly where we are in the C stack, but it is probably
better than simply assuming 32K.
Fixes #51676.
Fixes #59294.
Change-Id: I9bf1400106d5c08ce621d2ed1df3a2d9e3f55494
Reviewed-on: https://go-review.googlesource.com/c/go/+/481061
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: DeJiang Zhu (doujiang) <doujiang24@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This reverts CL 392854.
Reason for revert: caused #59294, which was derived from Google
internal tests. The attempted fix of #59294 caused more breakage.
Change-Id: I5a061561ac2740856b7ecc09725ac28bd30f8bba
Reviewed-on: https://go-review.googlesource.com/c/go/+/481060
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This reverts CL 479915.
Reason for revert: breaks a lot of Google internal tests.
Change-Id: I13a9422e810af7ba58cbf4a7e6e55f4d8cc0ca51
Reviewed-on: https://go-review.googlesource.com/c/go/+/481055
Reviewed-by: Chressie Himpel <chressie@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread. On some platforms this may still be a guess, as we don't
know exactly where we are in the C stack, but it is probably
better than simply assuming 32K.
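For illustration, a Linux-only cgo snippet showing the kind of query involved (pthread_getattr_np is glibc-specific; the runtime's actual per-platform plumbing differs, and e.g. darwin would use pthread_get_stackaddr_np instead):

    package main

    /*
    #define _GNU_SOURCE
    #include <pthread.h>

    // stackbounds asks pthread for the current thread's stack base and
    // size: the kind of information needm can use instead of assuming
    // a 32K stack.
    static void stackbounds(void **addr, size_t *size) {
    	pthread_attr_t attr;
    	pthread_getattr_np(pthread_self(), &attr);
    	pthread_attr_getstack(&attr, addr, size);
    	pthread_attr_destroy(&attr);
    }
    */
    import "C"

    import (
    	"fmt"
    	"unsafe"
    )

    func main() {
    	var addr unsafe.Pointer
    	var size C.size_t
    	C.stackbounds(&addr, &size)
    	fmt.Printf("thread stack: [%p, %p), %d bytes\n",
    		addr, unsafe.Add(addr, uintptr(size)), uintptr(size))
    }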
For #59294.
Change-Id: Ie52a8f931e0648d8753e4c1dbe45468b8748b527
Reviewed-on: https://go-review.googlesource.com/c/go/+/479915
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Introduce a new m.incgocallback field that is true while C code calls
into Go code. Use it in the tracer in order to fall back to the default
unwinder instead of frame pointer unwinding for this scenario. The
existing fields (incgo, ncgo) were not sufficient to detect the case
where a thread created in C calls into Go code.
Motivation:
1. Take advantage of a cgo symbolizer, if registered, to unwind through
C stacks without frame pointers.
2. Reduce the chance of crashes. It seems unsafe to follow frame
pointers when there could be C code that was compiled without frame
pointers.
Removing the curgp.m.incgocallback check in traceStackID shows the
following minor differences between frame pointer unwinding and the
default unwinder when there is no cgo symbolizer involved.
trace_test.go:60: "goCalledFromCThread": got stack:
main.goCalledFromCThread
/src/runtime/testdata/testprogcgo/trace.go:58
_cgoexp_45c15a3efb3a_goCalledFromCThread
_cgo_gotypes.go:694
runtime.cgocallbackg1
/src/runtime/cgocall.go:318
runtime.cgocallbackg
/src/runtime/cgocall.go:236
runtime.cgocallback
/src/runtime/asm_amd64.s:998
crosscall2
/src/runtime/cgo/asm_amd64.s:30
want stack:
main.goCalledFromCThread
/src/runtime/testdata/testprogcgo/trace.go:58
_cgoexp_45c15a3efb3a_goCalledFromCThread
_cgo_gotypes.go:694
runtime.cgocallbackg1
/src/runtime/cgocall.go:318
runtime.cgocallbackg
/src/runtime/cgocall.go:236
runtime.cgocallback
/src/runtime/asm_amd64.s:998
trace_test.go:60: "goCalledFromC": got stack:
main.goCalledFromC
/src/runtime/testdata/testprogcgo/trace.go:51
_cgoexp_45c15a3efb3a_goCalledFromC
_cgo_gotypes.go:687
runtime.cgocallbackg1
/src/runtime/cgocall.go:318
runtime.cgocallbackg
/src/runtime/cgocall.go:236
runtime.cgocallback
/src/runtime/asm_amd64.s:998
crosscall2
/src/runtime/cgo/asm_amd64.s:30
runtime.asmcgocall
/src/runtime/asm_amd64.s:848
main._Cfunc_cCalledFromGo
_cgo_gotypes.go:263
main.goCalledFromGo
/src/runtime/testdata/testprogcgo/trace.go:46
main.Trace
/src/runtime/testdata/testprogcgo/trace.go:37
main.main
/src/runtime/testdata/testprogcgo/main.go:34
want stack:
main.goCalledFromC
/src/runtime/testdata/testprogcgo/trace.go:51
_cgoexp_45c15a3efb3a_goCalledFromC
_cgo_gotypes.go:687
runtime.cgocallbackg1
/src/runtime/cgocall.go:318
runtime.cgocallbackg
/src/runtime/cgocall.go:236
runtime.cgocallback
/src/runtime/asm_amd64.s:998
runtime.systemstack_switch
/src/runtime/asm_amd64.s:463
runtime.cgocall
/src/runtime/cgocall.go:168
main._Cfunc_cCalledFromGo
_cgo_gotypes.go:263
main.goCalledFromGo
/src/runtime/testdata/testprogcgo/trace.go:46
main.Trace
/src/runtime/testdata/testprogcgo/trace.go:37
main.main
/src/runtime/testdata/testprogcgo/main.go:34
For #16638
Change-Id: I95fa27a3170c5abd923afc6eadab4eae777ced31
Reviewed-on: https://go-review.googlesource.com/c/go/+/474916
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
In a C thread, it's necessary to acquire an extra M using needm while invoking a Go function from C. But needm and dropm are costly due to the signal-related syscalls.
So, we change to no longer dropm when returning to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm when the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
Store the g0 of the current m into the thread-specific value of the pthread key, only once per C thread, so that the destructor can put the extra M back onto the extra M list when the C thread exits.
When returning back to C:
Skip dropm in cgocallback when the pthread variable has been created, so that the extra M will be reused the next time a Go function is invoked from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct, and we have to keep it on systems with cgo but without pthreads, like Windows.
This optimization is significant; the exact speedup depends on the OS and CPU, but in general a simple Go function call from a C thread can be considered roughly 10x faster.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. 28x faster, from 3395 ns/op to 121 ns/op, on darwin with an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. 6.5x faster, from 1495 ns/op to 230 ns/op, on Linux with an Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Fixes #51676
Change-Id: I380702fe2f9b6b401b2d6f04b0aba990f4b9ee6c
GitHub-Last-Rev: 93dc64ad98
GitHub-Pull-Request: golang/go#51679
Reviewed-on: https://go-review.googlesource.com/c/go/+/392854
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: thepudds <thepudds1460@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This is relatively easy using the new traceback iterator.
Ancestor tracebacks are now limited to 50 frames. We could keep that
at 100, but the fact that it used 100 before seemed arbitrary and
unnecessary.
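For reference, these are the tracebacks enabled by GODEBUG=tracebackancestors=N; a minimal program to see one (assuming the 50-frame cap applies per ancestor stack):

    // Run with GODEBUG=tracebackancestors=1: the panic output then
    // includes an "[originating from goroutine N]:" section showing
    // the stack at which the panicking goroutine was created.
    package main

    func main() {
    	done := make(chan struct{})
    	go func() {
    		panic("panic in child goroutine")
    	}()
    	<-done
    }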
Fixes #7181
Updates #54466
Change-Id: If693045881d84848f17e568df275a5105b6f1cb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/475960
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Currently, filling PC traceback buffers is one of the jobs of
gentraceback. This moves it into a new function, tracebackPCs, with a
simple API built around unwinder, and changes all callers to use this
new API.
Updates #54466.
Change-Id: Id2038bded81bf533a5a4e71178a7c014904d938c
Reviewed-on: https://go-review.googlesource.com/c/go/+/468300
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
This reverts commit ce2a609909 (aka CL 462035).
Reason for revert: this CL is causing some problems in some internal Google programs.
Change-Id: I4476b8d8d2c3d7b5703d1d85c93baebb4b4e5d26
Reviewed-on: https://go-review.googlesource.com/c/go/+/474976
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
As described here:
https://github.com/golang/go/issues/31636#issuecomment-493271830
"Find the lexically earliest package that is not initialized yet,
but has had all its dependencies initialized, initialize that package,
and repeat."
Simplify the runtime a bit, by just computing the ordering required
in the linker and giving a list to the runtime.
Updates #31636
Fixes #57411
RELNOTE=yes
Change-Id: I1e4d3878ebe6e8953527aedb730824971d722cac
Reviewed-on: https://go-review.googlesource.com/c/go/+/462035
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
These directives affect the next declaration, so the existing form is
valid, but can be confusing because it is easy to miss. Move them
directly above the declaration for improved readability.
CL 69120 previously moved the Gosched nosplit away to hide it from
documentation. Since CL 224737, directives are automatically excluded
from documentation.
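For example (a generic illustration; //go:nosplit works in any package):

    package p

    // Before: valid, since the directive applies to the next
    // declaration, but easy to miss.

    //go:nosplit

    // addBefore returns a + b.
    func addBefore(a, b int) int { return a + b }

    // After: the directive sits directly above the declaration, where
    // it is hard to miss; since CL 224737 directive lines are excluded
    // from documentation automatically.

    // addAfter returns a + b.
    //
    //go:nosplit
    func addAfter(a, b int) int { return a + b }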
Change-Id: I8ebf2d47fbb5e77c6f40ed8afdf79eaa4f4e335e
Reviewed-on: https://go-review.googlesource.com/c/go/+/472957
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Move this knob from a binary-startup thing to a build-time thing.
This will enable follow-on optimizations to the write barrier.
Change-Id: Ic3323348621c76a7dc390c09ff55016b19c43018
Reviewed-on: https://go-review.googlesource.com/c/go/+/447778
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
- Fix typo in throw error message for arena.
- Correct typos in assembly and Go comments.
- Fix log message in TestTraceCPUProfile.
Change-Id: I874c9e8cd46394448b6717bc6021aa3ecf319d16
GitHub-Last-Rev: d27fad4d3c
GitHub-Pull-Request: golang/go#58375
Reviewed-on: https://go-review.googlesource.com/c/go/+/465975
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
In the profiler, when unwinding the stack, we have special
handling for VDSO calls. Currently, the special handling is only
used when the normal unwinding fails. If the signal lands in the
function that makes the VDSO call (e.g. nanotime1), after the
stack switch, the normal unwinding doesn't fail but gets a stack
trace with exactly one frame (the nanotime1 frame); the stack
trace stops because of the stack switch. This 1-frame stack trace
is not as helpful. Instead, if vdsoSP is set, we know we are in a
VDSO call or right before or after it, so use vdsoPC and vdsoSP
for unwinding. Do the same for libcall.
Also remove _TraceTrap for VDSO unwinding, as vdsoPC and vdsoSP
correspond to a call, not an interrupted instruction.
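A small repro-style program, assuming a Linux system where time.Now's clock reads go through the VDSO: profiling signals frequently land inside the VDSO call made from nanotime1, and with this fix such samples unwind to the caller instead of yielding a one-frame stack.

    package main

    import (
    	"log"
    	"os"
    	"runtime/pprof"
    	"time"
    )

    func main() {
    	f, err := os.Create("cpu.pprof")
    	if err != nil {
    		log.Fatal(err)
    	}
    	if err := pprof.StartCPUProfile(f); err != nil {
    		log.Fatal(err)
    	}
    	defer pprof.StopCPUProfile()

    	// A loop dominated by VDSO clock reads; many profiling signals
    	// arrive during the clock call itself.
    	deadline := time.Now().Add(200 * time.Millisecond)
    	for time.Now().Before(deadline) {
    	}
    }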
Fixes #56574.
Change-Id: I799aa7644d0c1e2715ab038a9eef49481dd3a7f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/455166
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Change-Id: I69065f8adf101fdb28682c55997f503013a50e29
Reviewed-on: https://go-review.googlesource.com/c/go/+/449757
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joedian Reid <joedian@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
This change adds a new GODEBUG flag called pagetrace that writes a
low-overhead trace of how pages of memory are managed by the Go runtime.
The page tracer is kept behind a GOEXPERIMENT flag due to a potential
security risk for setuid binaries.
Change-Id: I6f4a2447d02693c25214400846a5d2832ad6e5c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/444157
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Use a single writeErrStr function. Avoid using global variables.
Use a single version of some error messages rather than duplicating
the messages in OS-specific files.
Change-Id: If259fbe78faf797f0a21337d14472160ca03efa0
Reviewed-on: https://go-review.googlesource.com/c/go/+/447055
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
runtime.bgsweep contains an infinite loop. With aggressive enough
inlining, it may not perform any CALLs on a typical iteration. If
the runtime tries to preempt this goroutine, the lack of CALLs may
prevent preemption from ever occurring.
bgsweep does happen to call goschedIfBusy. Add a preempt check there to
make sure we yield eventually.
For #55022.
Change-Id: If22eb86fd6a626094b3c56dc745c8e4243b0fb40
Reviewed-on: https://go-review.googlesource.com/c/go/+/447135
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Ms are allocated via standard heap allocation (`new(m)`), which means we
must keep them alive (i.e., reachable by the GC) until we are completely
done using them.
Ms are primarily reachable through runtime.allm. However, runtime.mexit
drops the M from allm fairly early, long before it is done using the M
structure. If that was the last reference to the M, it is now at risk of
being freed by the GC and used for some other allocation, leading to
memory corruption.
Ms with a Go-allocated stack coincidentally already keep a reference to
the M in sched.freem, so that the stack can be freed lazily. This
reference has the side effect of keeping these Ms reachable. However, Ms
with an OS stack skip this and are at risk of corruption.
Fix this lifetime by extending sched.freem use to all Ms, with the value
of mp.freeWait determining whether the stack needs to be freed or not.
Fixes #56243.
Change-Id: Ic0c01684775f5646970df507111c9abaac0ba52e
Reviewed-on: https://go-review.googlesource.com/c/go/+/443716
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Extra Ms may lead to the "no consistent ordering of events possible" error when parsing a trace file with cgo enabled, since:
1. The gs in the extra Ms may be in `_Gdead` status when tracing starts via `runtime.StartTrace`,
2. and these gs will trigger `traceEvGoSysExit` events in `runtime.exitsyscall` when Go functions are invoked from C,
3. so the events of those gs have no consistent ordering, because the preceding events are missing.
Adding two events, `traceEvGoCreate` and `traceEvGoInSyscall`, in `runtime.StartTrace` makes the trace parser happy.
Fixes #29707
Change-Id: I2fd9d1713cda22f0ddb36efe1ab351f88da10881
GitHub-Last-Rev: 7bbfddb81b
GitHub-Pull-Request: golang/go#54974
Reviewed-on: https://go-review.googlesource.com/c/go/+/429858
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: xie cui <523516579@qq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Add a new API (not public/exported) for registering a function with
the runtime that should be called when program execution terminates,
to be used in the new code coverage re-implementation. The API looks
like
func addExitHook(f func(), runOnNonZeroExit bool)
The first argument is the function to be run; the second argument controls
whether the function is invoked even if there is a call to os.Exit
with a non-zero status. Exit hooks are run in reverse order of
registration, i.e. the first hook to be registered will be the last to
run. Exit hook functions are not allowed to panic or to make calls to
os.Exit.
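A user-level sketch of those semantics (the real addExitHook is unexported inside the runtime; all names here are illustrative):

    package main

    import (
    	"fmt"
    	"os"
    )

    type exitHook struct {
    	f                func()
    	runOnNonZeroExit bool
    }

    var exitHooks []exitHook

    func addExitHook(f func(), runOnNonZeroExit bool) {
    	exitHooks = append(exitHooks, exitHook{f, runOnNonZeroExit})
    }

    // exit runs the hooks in reverse registration order, skipping hooks
    // not marked runOnNonZeroExit when the status is non-zero, then
    // terminates. Hooks must not panic or call exit themselves.
    func exit(code int) {
    	for i := len(exitHooks) - 1; i >= 0; i-- {
    		if h := exitHooks[i]; code == 0 || h.runOnNonZeroExit {
    			h.f()
    		}
    	}
    	os.Exit(code)
    }

    func main() {
    	addExitHook(func() { fmt.Println("registered first, runs last") }, false)
    	addExitHook(func() { fmt.Println("registered second, runs first") }, true)
    	exit(0)
    }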
Updates #51430.
Change-Id: I906f8c5184b7c1666f05a62cfc7833bf1a4300c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/354790
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Currently bgsweep attempts to be a low-priority background goroutine
that runs mainly when the application is mostly idle. To avoid
complicating the scheduler further, it achieves this with a simple
heuristic: call Gosched after each span swept. While this is somewhat
inefficient as there's scheduling overhead on each iteration, it's
mostly fine because it tends to just come out of idle time anyway. In a
busy system, the call to Gosched quickly puts bgsweep at the back of
scheduler queues.
However, what's problematic about this heuristic is the number of
tracing events it produces. Average span sweeping latencies have been
measured as low as 30 ns, so every 30 ns in the sweep phase, with
available idle time, there would be a few trace events emitted. This
could result in an overwhelming number, making traces much larger than
they need to be. It also pollutes other observability tools, like the
scheduling latencies runtime metric, because bgsweep stays runnable the
whole time.
This change fixes these problems with two modifications to the
heuristic:
1. Check if there are any idle Ps before yielding. If there are, don't
yield.
2. Sweep at least 10 spans before trying to yield.
(1) is doing most of the work here. This change assumes that the
presence of idle Ps means that there is available CPU time, so bgsweep
is already making use of idle time and there's no reason it should stop.
This will have the biggest impact on the aforementioned issues.
(2) is a mitigation for the case where GOMAXPROCS=1, because we won't
ever observe a zero idle P count. It does mean that bgsweep is a little
bit higher priority than before because it yields its time less often,
so it could interfere with goroutine scheduling latencies more. However,
by sweeping 10 spans before volunteering time, we directly reduce trace
event production by 90% in all cases. The impact on scheduling latencies
should be fairly minimal, as sweeping a span is already so fast, that
sweeping 10 is unlikely to make a dent in any meaningful end-to-end
latency. In fact, it may even improve application latencies overall by
freeing up spans and sweep work from goroutines allocating memory. It
may be worth considering pushing this number higher in the future.
Another reason to do (2) is to reduce contention on npidle, which will
be checked as part of (1), but this is a fairly minor concern. The main
reason is to capture the GOMAXPROCS=1 case.
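A runnable paraphrase of the revised heuristic, with illustrative stand-ins for the runtime's internals (sweepOne and idlePs here are assumptions for the sketch, not real APIs):

    package main

    import (
    	"fmt"
    	"runtime"
    )

    // backgroundSweep mimics the revised bgsweep loop: sweep in batches
    // of 10, and even then yield only when no idle capacity remains.
    func backgroundSweep(sweepOne func() bool, idlePs func() int) {
    	const batch = 10 // (2): sweep at least 10 spans between yields
    	n := 0
    	for sweepOne() {
    		n++
    		if n%batch != 0 {
    			continue
    		}
    		if idlePs() > 0 {
    			continue // (1): idle CPU available; keep using it
    		}
    		runtime.Gosched() // busy system: get to the back of the queue
    	}
    }

    func main() {
    	spans := 25
    	backgroundSweep(
    		func() bool { spans--; return spans > 0 },
    		func() int { return runtime.GOMAXPROCS(0) - runtime.NumGoroutine() },
    	)
    	fmt.Println("sweep done")
    }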
Fixes #54767.
Change-Id: I4361400f17197b8ab84c01f56203f20575b29fc6
Reviewed-on: https://go-review.googlesource.com/c/go/+/429615
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change adds a metric to the runtime/metrics package which tracks
total mutex wait time for sync.Mutex and sync.RWMutex. The purpose of
this metric is to be able to quickly get an idea of the total mutex wait
time.
The implementation of this metric piggybacks off of the existing G
runnable tracking infrastructure, as well as the wait reason set on a G
when it goes into _Gwaiting.
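Reading the new metric works like any other runtime/metrics sample; a minimal example (the metric name /sync/mutex/wait/total:seconds is the one this CL adds):

    package main

    import (
    	"fmt"
    	"runtime/metrics"
    	"sync"
    	"time"
    )

    func main() {
    	const name = "/sync/mutex/wait/total:seconds"

    	// Manufacture a little contention so the counter moves.
    	var mu sync.Mutex
    	mu.Lock()
    	done := make(chan struct{})
    	go func() {
    		mu.Lock() // blocks ~10ms; this wait is what the metric counts
    		mu.Unlock()
    		close(done)
    	}()
    	time.Sleep(10 * time.Millisecond)
    	mu.Unlock()
    	<-done

    	sample := []metrics.Sample{{Name: name}}
    	metrics.Read(sample)
    	fmt.Printf("total mutex wait: %fs\n", sample[0].Value.Float64())
    }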
Fixes #49881.
Change-Id: I4691abf64ac3574bec69b4d7d4428b1573130517
Reviewed-on: https://go-review.googlesource.com/c/go/+/427618
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Currently, wait reasons are set somewhat inconsistently. In a follow-up
CL, we're going to want to rely on the wait reason being there for
casgstatus, so the status quo isn't really going to work for that. Plus
this inconsistency means there are a whole bunch of cases where we could
be more specific about the G's status but aren't.
So, this change adds a new function, casGToWaiting which is like
casgstatus but also sets the wait reason. The goal is that by using this
API it'll be harder to forget to set a wait reason (or the lack thereof
will at least be explicit). This change then updates all casgstatus(gp,
..., _Gwaiting) calls to casGToWaiting(gp, ..., waitReasonX) instead.
For a number of these cases, we're missing a wait reason, and it
wouldn't hurt to add a wait reason for them, so this change also adds
those wait reasons.
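A user-level analogue of the API shape (the real casGToWaiting operates on runtime gs and status constants; the types below are illustrative):

    package main

    import (
    	"fmt"
    	"sync/atomic"
    )

    const (
    	gRunning uint32 = iota
    	gWaiting
    )

    type g struct {
    	atomicstatus atomic.Uint32
    	waitreason   string
    }

    // casGToWaiting is a status CAS that cannot be called without a
    // wait reason, so omitting one becomes impossible rather than a
    // silent oversight.
    func casGToWaiting(gp *g, old uint32, reason string) bool {
    	gp.waitreason = reason // recorded before the status flips
    	return gp.atomicstatus.CompareAndSwap(old, gWaiting)
    }

    func main() {
    	gp := &g{}
    	gp.atomicstatus.Store(gRunning)
    	if casGToWaiting(gp, gRunning, "GC assist wait") {
    		fmt.Println("waiting:", gp.waitreason)
    	}
    }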
For #49881.
Change-Id: Ia95e06ecb74ed17bb7bb94f1a362ebfe6bec1518
Reviewed-on: https://go-review.googlesource.com/c/go/+/427617
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Updates #54854
Change-Id: Ie18665e93e477b6f220acf4c6c070b2af4343064
Reviewed-on: https://go-review.googlesource.com/c/go/+/428157
Run-TryBot: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Use an atomic.Uint32 to represent the state of the finalizer goroutine.
fingStatus will only be changed to fingWake when not in the fingWait state,
so it is safe to set the fingRunningFinalizer status in runfinq.
name            old time/op  new time/op  delta
Finalizer-8      592µs ± 4%   561µs ± 1%  -5.22%  (p=0.000 n=10+10)
FinalizerRun-8   694ns ± 6%   675ns ± 7%     ~    (p=0.059 n=9+8)
Change-Id: I7e4da30cec98ce99f7d8cf4c97f933a8a2d1cae1
Reviewed-on: https://go-review.googlesource.com/c/go/+/400134
Reviewed-by: Joedian Reid <joedian@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Michael Pratt <mpratt@google.com>
Note that this changes some unsynchronized operations of g.atomicstatus to synchronized operations.
Updates #53821
Change-Id: If249d62420ea09fbec39b570942f96c63669c333
Reviewed-on: https://go-review.googlesource.com/c/go/+/425363
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Note that this changes the non-atomic operations in p.destroy() to atomic operations.
For #53821
Change-Id: I7bba77c9a2287ba697c87cce2c79293e4d1b3334
Reviewed-on: https://go-review.googlesource.com/c/go/+/425774
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Note that this changes a few unsynchronized operations of forcegcstate.idle to synchronized operations.
Updates #53821
Change-Id: I041654cc84a188fad45e2df7abce3a434f9a1f15
Reviewed-on: https://go-review.googlesource.com/c/go/+/425361
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
So they match the when/nextwhen fields of the timer struct.
Updates #53821
Change-Id: Iad0cceb129796745774facfbbfe5756df3a320b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/423117
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Change-Id: I79695e1cfda3b4cd911673f6e14dc316c451e2ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/423436
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Extra Ms may lead to the "no consistent ordering of events possible" error when parsing a trace file with cgo enabled, since:
1. The gs in the extra Ms may be in `_Gdead` status when tracing starts via `runtime.StartTrace`,
2. and these gs will trigger `traceEvGoSysExit` events in `runtime.exitsyscall` when Go functions are invoked from C,
3. so the events of those gs have no consistent ordering, because the preceding events are missing.
Adding two events, `traceEvGoCreate` and `traceEvGoInSyscall`, in `runtime.StartTrace` makes the trace parser happy.
Fixes #29707
Change-Id: I7cc4b80822d2c46591304a59c9da2c9fc470f1d0
GitHub-Last-Rev: 445de8eaf3
GitHub-Pull-Request: golang/go#53284
Reviewed-on: https://go-review.googlesource.com/c/go/+/411034
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
CL 310850 dropped work re-checks on non-spinning Ms to fix #43997.
This introduced a new race condition: a non-spinning M may drop its P
and then park at the same time a spinning M attempts to wake a P to
handle some new work. The spinning M fails to find an idle P (because
the non-spinning M hasn't quite made its P idle yet), and does nothing,
assuming that the system is fully loaded. This results in loss of work
conservation. In the worst case we could have a complete deadlock if
injectglist fails to wake anything just as all Ps are going idle.
sched.needspinning adds new synchronization to cover this case. If work
submission fails to find a P, it sets needspinning to indicate that a
spinning M is required. When non-spinning Ms prepare to drop their P,
they check needspinning and abort going idle to become a spinning M
instead. This addresses the race without extra spurious wakeups. In the
normal (non-racing case), an M will become spinning via the normal path
and clear the flag.
injectglist must change in addition to wakep because it is a similar
form of work submission, notably used following netpoll at a point when
we might not have a P that would guarantee the work runs.
Fixes #45867
Change-Id: Ieb623a6d4162fb8c2be7b4ff8acdebcc3a0d69a8
Reviewed-on: https://go-review.googlesource.com/c/go/+/389014
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
This converts several unsynchronized reads (reads without holding
prof.signalLock) into atomic reads.
For #53821.
For #52912.
Change-Id: I421b96a22fbe26d699bcc21010c8a9e0f4efc276
Reviewed-on: https://go-review.googlesource.com/c/go/+/420196
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
I've dropped the note that sched.timeToRun is protected by sched.lock,
as it does not seem to be true.
For #53821.
Change-Id: I03f8dc6ca0bcd4ccf3ec113010a0aa39c6f7d6ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/419449
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>