Commit Graph

Joel Sing e293c4b509 runtime: allocate crash stack via stackalloc
On some platforms (notably OpenBSD), stacks must be specifically allocated
and marked as being stack memory. Allocate the crash stack using stackalloc,
which ensures these requirements are met, rather than using a global Go
variable.
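
For illustration, a minimal standalone sketch (not the runtime's
stackalloc; the constants here are hypothetical) of obtaining memory
the OS can treat as a stack. On OpenBSD the mapping would additionally
need MAP_STACK, which is exactly what a plain Go global lacks:

    package main

    import (
        "fmt"
        "syscall"
    )

    // Minimal sketch, not the runtime's stackalloc: obtain anonymous
    // memory suitable for use as a stack. On OpenBSD the kernel requires
    // stack memory to be mapped with MAP_STACK, which a global Go
    // variable is not — hence the switch to stackalloc.
    func main() {
        const stackSize = 16 << 10
        mem, err := syscall.Mmap(-1, 0, stackSize,
            syscall.PROT_READ|syscall.PROT_WRITE,
            syscall.MAP_PRIVATE|syscall.MAP_ANON) // plus MAP_STACK on OpenBSD
        if err != nil {
            panic(err)
        }
        defer syscall.Munmap(mem)
        fmt.Printf("allocated a %d-byte stack region\n", len(mem))
    }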

Fixes #63794

Change-Id: I6513575797dd69ff0a36f3bfd4e5fc3bd95cbf50
Reviewed-on: https://go-review.googlesource.com/c/go/+/538457
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-31 16:28:14 +00:00
Cherry Mui 0262ea1ff9 runtime: print a stack trace at "morestack on g0"
Error like "morestack on g0" is one of the errors that is very
hard to debug, because often it doesn't print a useful stack trace.
The runtime doesn't directly print a stack trace because it is
a bad stack state to call print. Sometimes the SIGABRT may trigger
a traceback, but sometimes not especially in a cgo binary. Even if
it triggers a traceback it often does not include the stack trace
of the bad stack.

This CL makes the runtime explicitly print a stack trace and throw.
The idea is to set aside some space as an "emergency" crash stack. When the
stack is in a really bad state, we switch to the crash stack and
do a traceback.

Currently only implemented on AMD64 and ARM64.

TODO: also handle errors like "morestack on gsignal" and bad
systemstack. Also handle other architectures.

Change-Id: Ibfc397202f2bb0737c5cbe99f2763de83301c1c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/419435
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-10-26 18:46:50 +00:00
Michael Pratt 1af424c196 runtime: clear g0 stack bounds in dropm
After CL 527715, needm uses callbackUpdateSystemStack to set the stack
bounds for g0 on an M from the extra M list. Since
callbackUpdateSystemStack is also used for recursive cgocallback, it
does nothing if the stack is already in bounds.

Currently, an extra M may carry stale stack bounds from
a previous thread that used this M and then returned it to the extra
list in dropm.

Typically a new thread will not have an overlapping stack with an old
thread, but because the old thread has exited, there is a small chance
that the C memory allocator will allocate the new thread's stack
partially or fully overlapping with the old thread's stack.

If this occurs, then callbackUpdateSystemStack will not update the stack
bounds. If, in addition, the overlap is partial such that the SP on
cgocallback is close to the recorded stack lower bound, then Go may
quickly "overflow" the stack and crash with "morestack on g0".

Fix this by clearing the stack bounds in dropm, which ensures that
callbackUpdateSystemStack will unconditionally update the bounds in
needm.
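
A toy model of the failure and the fix, with simplified types that
only illustrate the idea (the real logic lives in the runtime's
callbackUpdateSystemStack and dropm):

    package main

    import "fmt"

    type bounds struct{ lo, hi uintptr }

    // Simplified stand-in for callbackUpdateSystemStack: keep the cached
    // bounds whenever SP already falls inside them.
    func update(b *bounds, sp uintptr) {
        if sp > b.lo && sp <= b.hi {
            return // bounds look plausible — possibly stale!
        }
        b.hi = sp + 1<<10    // guess: a little space above SP
        b.lo = b.hi - 32<<10 // guess: 32K stack
    }

    func main() {
        // Stale bounds left by an exited thread; the new C thread's stack
        // happens to overlap, with SP barely above the recorded lower bound.
        b := bounds{lo: 0xc0000, hi: 0xc8000}
        sp := uintptr(0xc0200)
        update(&b, sp)
        fmt.Printf("stale bounds kept: %+v (SP only 0x200 above lo)\n", b)

        // The fix: dropm zeroes the bounds, forcing a fresh computation.
        b = bounds{}
        update(&b, sp)
        fmt.Printf("fresh bounds:      %+v\n", b)
    }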

For #62440.

Change-Id: Ic9e2052c2090dd679ed716d1a23a86d66cbcada7
Reviewed-on: https://go-review.googlesource.com/c/go/+/537695
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Bypass: Michael Pratt <mpratt@google.com>
2023-10-26 15:17:33 +00:00
Cherry Mui fd54185a8d cmd/link, runtime: initialize packages in shared build mode
Currently, for the shared build mode, we don't generate the module
inittasks. Instead, we rely on the main executable to do the
initialization, for both the executable and the shared library.
But with the model as of CL 478916, the main executable only
has relocations to packages that it directly imports. It won't
see the dependency edges between packages within a shared library.
Therefore indirect dependencies are not included, and thus not
initialized. For example, main imports a, which imports b, but main
doesn't directly import b, and a and b are in a shared object. When
linking main, the linker sees that main depends on a, so it generates
main's inittasks to run a's init before main's; but it doesn't know
about b, so b's init never runs.

This CL makes it initialize all packages in a shared library when
the library is loaded, as any of them could potentially be
imported, directly or indirectly.

Also, in the runtime, when running the init functions, make sure
to go through the DSOs in dependency order. Otherwise packages
can be initialized in the wrong order.
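
The required ordering is just a dependency-respecting walk over the
package graph. A toy model of that rule for the main/a/b example above
(illustrative code, not the actual inittask machinery):

    package main

    import "fmt"

    // deps models the import graph from the example above.
    var deps = map[string][]string{
        "main": {"a"}, // main only directly imports a
        "a":    {"b"},
        "b":    {},
    }

    var initialized = map[string]bool{}

    // runInit initializes a package only after all of its dependencies,
    // so the indirect dependency b still runs before a and main.
    func runInit(pkg string) {
        if initialized[pkg] {
            return
        }
        for _, d := range deps[pkg] {
            runInit(d)
        }
        initialized[pkg] = true
        fmt.Println("init", pkg)
    }

    func main() {
        runInit("main") // prints: init b, init a, init main
    }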

Fixes #61973.

Change-Id: I2a090336fe9fa0d6c7e43912f3ab233c9c47e247
Reviewed-on: https://go-review.googlesource.com/c/go/+/520375
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-09-20 14:46:11 +00:00
Michael Pratt 4f9fe6d509 runtime: allow update of system stack bounds on callback from C thread
[This is a redo of CL 525455 with the test fixed on darwin by defining
_XOPEN_SOURCE, and disabled on android, musl, and openbsd, which do
not provide getcontext.]

Since CL 495855, Ms are cached for C threads calling into Go, including
the stack bounds of the system stack.

Some C libraries (e.g., coroutine libraries) do manual stack management
and may change stacks between calls to Go on the same thread.

Changing the stack if there is more Go up the stack would be
problematic. But if the calls are completely independent there is no
particular reason for Go to care about the changing stack boundary.

Thus, this CL allows the stack bounds to change in such cases. The
primary downside here (besides additional complexity) is that normal
systems that do not manipulate the stack may not notice unintentional
stack corruption as quickly as before.

Note that callbackUpdateSystemStack is written to be usable for the
initial setup in needm as well as updating the stack in cgocallbackg.

Fixes #62440.
For #62130.

Change-Id: I0fe0134f865932bbaff1fc0da377c35c013bd768
Reviewed-on: https://go-review.googlesource.com/c/go/+/527715
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-09-12 17:08:55 +00:00
Michael Pratt ea8c05508b Revert "runtime: allow update of system stack bounds on callback from C thread"
This reverts CL 525455. The test fails to build on darwin, alpine, and
android.

For #62440.

Change-Id: I39c6b1e16499bd61e0f166de6c6efe7a07961e62
Reviewed-on: https://go-review.googlesource.com/c/go/+/527317
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-09-11 16:35:56 +00:00
Michael Pratt a46b1ad357 runtime: allow update of system stack bounds on callback from C thread
Since CL 495855, Ms are cached for C threads calling into Go, including
the stack bounds of the system stack.

Some C libraries (e.g., coroutine libraries) do manual stack management
and may change stacks between calls to Go on the same thread.

Changing the stack if there is more Go up the stack would be
problematic. But if the calls are completely independent there is no
particular reason for Go to care about the changing stack boundary.

Thus, this CL allows the stack bounds to change in such cases. The
primary downside here (besides additional complexity) is that normal
systems that do not manipulate the stack may not notice unintentional
stack corruption as quickly as before.

Note that callbackUpdateSystemStack is written to be usable for the
initial setup in needm as well as updating the stack in cgocallbackg.

Fixes #62440.
For #62130.

Change-Id: I7841b056acea1111bdae3b718345a3bd3961b4a8
Reviewed-on: https://go-review.googlesource.com/c/go/+/525455
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-09-11 14:46:41 +00:00
Cherry Mui c6d550a668 runtime: increase g0 stack size in non-cgo case
Currently, for non-cgo programs, the g0 stack size is 8 KiB on
most platforms. With PGO which could cause aggressive inlining in
the runtime, the runtime stack frames are larger and could
overflow the 8 KiB g0 stack. Increase it to 16 KiB. There is only
one g0 stack per OS thread, so this shouldn't increase memory use much.

Fixes #62120.
Fixes #62489.

Change-Id: I565b154517021f1fd849424dafc3f0f26a755cac
Reviewed-on: https://go-review.googlesource.com/c/go/+/526995
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-09-08 18:40:23 +00:00
Nick Ripley 94d36fbc4a runtime: zero saved frame pointer when reusing goroutine stack on arm64
When a goroutine stack is reused on arm64, the spot on the stack where
the "caller's" frame pointer goes for the topmost frame should be
explicitly zeroed. Otherwise, the frame pointer check in adjustframe
with debugCheckBP enabled will fail on the topmost frame of a call stack
the first time a reused stack is grown.

Updates #39524, #58432

Change-Id: Ic1210dc005e3ecdbf9cd5d7b98846566e56df8f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/481636
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
2023-08-15 13:58:27 +00:00
Joel Sing 9fc3feb441 runtime,syscall: invert openbsd architecture tests
Rather than testing for architectures that use libc-based system calls,
test that it is not the single architecture on which Go still uses direct
system calls. This reduces the number of changes needed for new openbsd
ports.
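
Concretely, the inversion turns an ever-growing list of libc-based
architectures into a single negated constraint. Illustrative build
lines only (at the time, openbsd/mips64 was the one remaining
direct-syscall port):

    // Before: enumerate every libc-based architecture (must grow
    // with each new openbsd port):
    //
    //go:build openbsd && (386 || amd64 || arm || arm64)

    // After: name only the single direct-syscall architecture:
    //
    //go:build openbsd && !mips64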

Updates #36435
Updates #61546

Change-Id: I79c4597c629b8b372e9efcda79e8f6ff778b9e8e
Reviewed-on: https://go-review.googlesource.com/c/go/+/516016
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-08-05 18:04:17 +00:00
Roland Shoemaker d4dd1de19f runtime: enforce standard file descriptors open on init on unix
On Unix-like platforms, enforce that the standard file descriptors (0,
1, 2) are always open during initialization. If any of the FDs are
closed, we open them pointing at /dev/null, or fail.
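
A user-space sketch of the invariant (the runtime's own check runs far
earlier and differs in detail), relying on open(2) returning the lowest
free descriptor:

    package main

    import (
        "fmt"
        "syscall"
    )

    func main() {
        for fd := 0; fd <= 2; fd++ {
            var st syscall.Stat_t
            if err := syscall.Fstat(fd, &st); err == syscall.EBADF {
                // open returns the lowest free descriptor, which is
                // exactly the one we just found to be closed.
                nfd, err := syscall.Open("/dev/null", syscall.O_RDWR, 0)
                if err != nil || nfd != fd {
                    panic("cannot point standard fd at /dev/null")
                }
            }
        }
        fmt.Println("fds 0, 1 and 2 are open")
    }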

Fixes #60641

Change-Id: Iaab6b3f3e5ca44006ae3ba3544d47da9a613f58f
Reviewed-on: https://go-review.googlesource.com/c/go/+/509020
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Roland Shoemaker <roland@golang.org>
Auto-Submit: Roland Shoemaker <roland@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-07-25 16:33:33 +00:00
Michael Pratt ad943066f6 runtime: call wakep in gosched
goschedImpl transitions the current goroutine from _Grunning to
_Grunnable and places it on the global run queue before calling into
schedule.

It does _not_ call wakep after adding to the global run queue. I believe
the intuition behind skipping wakep is that since we are immediately
calling into the scheduler, we don't need to wake anything to run this
work. Unfortunately, this intuition is not correct, as it breaks
coordination with spinning Ms [1].

Consider this example scenario:

Initial conditions:

M0: Running P0, G0
M1: Spinning, holding P1 and looking for work

Timeline:

M1: Fails to find work; drops P
M0: newproc adds G1 to P0 runq
M0: does not wakep because there is a spinning M
M1: clear mp.spinning, decrement sched.nmspinning (now in "delicate dance")
M1: check sched.runqsize -> no global runq work
M0: gosched preempts G0; adds G0 to global runq
M0: does not wakep because gosched doesn't wakep
M0: schedules G1 from P0 runq
M1: check P0 runq -> no work
M1: no work -> park

G0 is stranded on the global runq with no M/P looking to run it. This is
a loss of work conservation.

As a result, G0 will have unbounded* scheduling delay, only getting
scheduled when G1 yields. Even once G1 yields, we still won't start
another P, so both G0 and G1 will switch back and forth sharing one P
when they should start another.

*The caveat to this is that today sysmon will preempt G1 after 10ms,
effectively capping the scheduling delay to 10ms, but not solving the P
underutilization problem. Sysmon's behavior here is theoretically
unnecessary, as our work conservation guarantee should allow sysmon to
avoid preemption if there are any idle Ps. Issue #60693 tracks changing
this behavior and the challenges involved.

[1] It would be OK if we unconditionally entered the scheduler as a
spinning M ourselves, as that would require schedule to call wakep when
it finds work in case there is more work.
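
A drastically simplified model of the fix (toy types and counts; the
real change is essentially one wakep call in goschedImpl):

    package main

    import "fmt"

    type sched struct {
        globalRunq []string
        nmspinning int
        parkedMs   int
    }

    // wakep wakes a parked M as a spinning M if none is already spinning.
    func (s *sched) wakep() {
        if s.nmspinning == 0 && s.parkedMs > 0 {
            s.parkedMs--
            s.nmspinning++
            fmt.Println("woke an M to look for work")
        }
    }

    // gosched publishes the yielding goroutine to the global run queue.
    // Without the wakep call, G0 can be stranded if the last spinning M
    // gave up concurrently (the race in the timeline above).
    func (s *sched) gosched(g string) {
        s.globalRunq = append(s.globalRunq, g)
        s.wakep() // the fix
    }

    func main() {
        s := &sched{parkedMs: 1}
        s.gosched("G0")
        fmt.Println("global runq:", s.globalRunq, "spinning Ms:", s.nmspinning)
    }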

Fixes #55160.

Change-Id: I2f44001239564b56ea30212553ab557051d22588
Reviewed-on: https://go-review.googlesource.com/c/go/+/501976
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
2023-07-20 21:39:57 +00:00
Michael Pratt dd5db4df56 runtime: check global runq during "delicate dance"
When a thread transitions from spinning to non-spinning, it must recheck
all sources of work because other threads may submit new work but skip
wakep because they see a spinning thread.

However, since the beginning of time (CL 7314062) we do not check the
global run queue, only the local per-P run queues.

The global run queue is checked just above the spinning checks while
dropping the P. I am unsure what the purpose of this check is. It
appears to simply be opportunistic since sched.lock is already held
there in order to drop the P. It is not sufficient to synchronize with
threads adding work because it occurs before decrementing
sched.nmspinning, which is what threads use to decide whether to wake a thread.

Resolve this by adding an explicit global run queue check alongside the
local per-P run queue checks.

Almost nothing happens between dropping sched.lock (after releasing the P)
and relocking sched.lock: just clearing mp.spinning and decrementing
sched.nmspinning. Thus it may be better to just hold sched.lock for this
entire period, but this is a larger change that I would prefer to avoid
in the freeze and backports.
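
The added recheck, in toy form (standalone illustration, not the
findRunnable code):

    package main

    import "fmt"

    func main() {
        globalRunq := []string{"G0"} // published without wakep while we were spinning
        perPWork := false            // all per-P run queues are empty

        // An M that just went from spinning to non-spinning rechecks
        // all work sources before parking.
        if perPWork {
            fmt.Println("run per-P work")
            return
        }
        if len(globalRunq) > 0 { // the added check
            fmt.Println("run", globalRunq[0], "from the global run queue")
            return
        }
        fmt.Println("park (before the fix, G0 was stranded here)")
    }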

For #55160.

Change-Id: Ifd88b5a4c561c063cedcfcfe1dd8ae04202d9666
Reviewed-on: https://go-review.googlesource.com/c/go/+/501975
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-07-20 21:39:53 +00:00
Ian Lance Taylor f51c55bfc3 runtime: adjust netpollWaiters after goroutines are ready
The runtime was adjusting netpollWaiters before the waiting
goroutines were marked as ready. This could cause the scheduler
to report a deadlock because there were no goroutines ready to run.
Keeping netpollWaiters non-zero ensures that at least one goroutine
will call netpoll(-1) from findRunnable.

This does mean that if a program has network activity for a while
and then never has it again, and also has no timers, then we can leave
an M stranded in a call to netpoll from which it will never return.
At least this won't be a common case. And it's not new; this has been
a potential problem for some time.
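
The ordering constraint, sketched with a toy counter (assumed names;
the real accounting lives in the runtime's netpoll integration):

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    var netpollWaiters atomic.Int32

    func ready(g string) { fmt.Println(g, "is runnable") }

    // netpollReady hands a woken goroutine to the scheduler *before*
    // dropping the waiter count, so a deadlock check observing the
    // in-between state still sees either a waiter or a runnable G.
    func netpollReady(g string) {
        ready(g)
        netpollWaiters.Add(-1)
    }

    func main() {
        netpollWaiters.Add(1) // a goroutine blocks on network I/O
        netpollReady("G1")
        fmt.Println("waiters:", netpollWaiters.Load())
    }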

Fixes #61454

Change-Id: I17c7f891c2bb1262fda12c6929664e64686463c8
Reviewed-on: https://go-review.googlesource.com/c/go/+/511455
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2023-07-20 15:45:57 +00:00
Jelle van den Hooff 48dbb6227a runtime: set raceignore to zero when starting a new goroutine
When reusing a g struct the runtime did not reset
g.raceignore. Initialize raceignore to zero when initially
setting racectx.

A goroutine can end with a non-zero raceignore if it exits
after calling runtime.RaceDisable without a matching
runtime.RaceEnable. If that goroutine's g is later reused
the race detector is in a weird state: the underlying
g.racectx is active, yet g.raceignore is non-zero, and
raceacquire/racerelease which check g.raceignore become
no-ops. This causes the race detector to report races when
there are none.
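
In miniature (a simplified struct, not the runtime's g):

    package main

    import "fmt"

    type g struct {
        racectx    uintptr
        raceignore int32 // depth of RaceDisable calls without RaceEnable
    }

    // reuse recycles a g for a new goroutine. Before the fix only racectx
    // was refreshed; a stale raceignore left raceacquire/racerelease
    // disabled for the new goroutine.
    func reuse(gp *g, newctx uintptr) {
        gp.racectx = newctx
        gp.raceignore = 0 // the fix
    }

    func main() {
        gp := &g{racectx: 1, raceignore: 1} // exited inside RaceDisable
        reuse(gp, 2)
        fmt.Printf("reused g: %+v\n", *gp)
    }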

Fixes #60934

Change-Id: Ib8e412f11badbaf69a480f03740da70891f4093f
Reviewed-on: https://go-review.googlesource.com/c/go/+/505055
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-06-23 16:46:25 +00:00
Michael Pratt 5b6e6d2b3d runtime: make GODEBUG=dontfreezetheworld=1 safer
GODEBUG=dontfreezetheworld=1 allows goroutines to continue execution
during fatal panic. This increases the chance that tracebackothers will
encounter running goroutines that it must skip, which is expected and
fine. However, it also introduces the risk that a goroutine transitions
from stopped to running in the middle of traceback, which is unsafe and
may cause traceback crashes.

Mitigate this by halting M execution if it naturally enters the
scheduler. This ensures that goroutines cannot transition from stopped
to running after freezetheworld. We simply deadlock rather than using
gcstopm to continue, keeping disturbance to scheduler state to a minimum.

Change-Id: I9aa8d84abf038ae17142f34f4384e920b1490e81
Reviewed-on: https://go-review.googlesource.com/c/go/+/501255
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-06-06 21:29:01 +00:00
Roland Shoemaker 2496653d0a runtime: implement SUID/SGID protections
On Unix platforms, the runtime previously did nothing special when a
program was run with either the SUID or SGID bits set. This can be
dangerous in certain cases, such as when dumping memory state, or
assuming the status of standard i/o file descriptors.

Taking cues from glibc, this change implements a set of protections when
a binary is run with SUID or SGID bits set (or is SUID/SGID-like). On
Linux, whether to enable these protections is determined by whether the
AT_SECURE flag is passed in the auxiliary vector. On platforms which
have the issetugid syscall (the BSDs, darwin, and Solaris/Illumos), that
is used. On the remaining platforms (currently only AIX) we check
!(getuid() == geteuid() && getgid() == getegid()).

Currently when we determine a binary is "tainted" (using the glibc
terminology), we implement two specific protections:
  1. we check if the file descriptors 0, 1, and 2 are open, and if they
     are not, we open them, pointing at /dev/null (or fail).
  2. we force GOTRACEBACK=none, and generally prevent dumping of
     tracebacks and registers when a program panics/aborts.

In the future we may add additional protections.

This change requires implementing issetugid on the platforms which
support it, and implementing getuid, geteuid, getgid, and getegid on
AIX.
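
The AIX-style fallback check, as a standalone sketch (the real runtime
prefers AT_SECURE or issetugid where available):

    package main

    import (
        "fmt"
        "syscall"
    )

    // tainted reports whether the process looks SUID/SGID-like: the
    // real and effective user or group IDs differ.
    func tainted() bool {
        return syscall.Getuid() != syscall.Geteuid() ||
            syscall.Getgid() != syscall.Getegid()
    }

    func main() {
        fmt.Println("SUID/SGID-like environment:", tainted())
    }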

Thanks to Vincent Dehors from Synacktiv for reporting this issue.

Fixes #60272
Fixes CVE-2023-29403

Change-Id: I73fc93f2b7a8933c192ce3eabbf1db359db7d5fa
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/1878434
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Roland Shoemaker <bracewell@google.com>
Reviewed-by: Russ Cox <rsc@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/501223
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-06-06 18:49:01 +00:00
Michael Pratt 7911f7c21d runtime: only increment extraMInUse when actually in use
Currently lockextra always increments extraMInUse, even if the M won't
be used (or doesn't even exist), such as in addExtraM. addExtraM fails
to decrement extraMInUse, so it stays elevated forever.

Fix this bug and simplify the model by moving extraMInUse out of
lockextra to getExtraM, where we know the M will actually be used.

While we're here, remove the nilokay argument from getExtraM, which is
always false.
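
Schematically (toy counter and stubs, not the real proc.go functions):

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    var extraMInUse atomic.Int32

    // getExtraM counts the M as in use at the point where it is actually
    // taken for use — not inside the low-level list lock, where list
    // maintenance such as addExtraM would inflate the count forever.
    func getExtraM() {
        // lockextra / list manipulation elided
        extraMInUse.Add(1)
    }

    func putExtraM() {
        extraMInUse.Add(-1)
        // unlockextra elided
    }

    func main() {
        getExtraM()
        putExtraM()
        fmt.Println("extra Ms in use:", extraMInUse.Load()) // 0 again
    }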

Fixes #60540.

Change-Id: I7a5d97456b3bc6ea1baeb06b5b2975e3b8dd96a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/499677
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-06-01 14:05:01 +00:00
Michael Anthony Knyszek 0adcc5ace8 runtime: cache inner pinner on P
This change caches the *pinner on the P to pool it and reduce the chance
that a new allocation is made. It also makes the *pinner keep its
refs array across unpin, again to avoid reallocating.

The Pinner benchmark results before and after this CL are attached at
the bottom of the commit message.

Note that these results are biased toward the current change because of
the last two benchmark changes. Reusing the pinner in the benchmark
itself achieves similar performance before this change. The benchmark
results thus basically just confirm that this change does cache the
inner pinner in a useful way. Using the previous benchmarks there's
actually a slight regression from the extra check in the cache; however,
the long pole is still setPinned itself.
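
At the API level, the pattern this change rewards is reusing a single
Pinner across iterations (runnable with Go 1.21+):

    package main

    import "runtime"

    func main() {
        var p runtime.Pinner
        for i := 0; i < 3; i++ {
            v := new(int)
            p.Pin(v)
            // ... pass v to C, knowing the GC won't move it ...
            p.Unpin() // with this CL, the refs array is kept for the next Pin
        }
    }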

name                                old time/op    new time/op    delta
PinnerPinUnpinBatch-8                 42.2µs ± 2%    41.5µs ± 1%      ~     (p=0.056 n=5+5)
PinnerPinUnpinBatchDouble-8            367µs ± 1%     350µs ± 1%    -4.67%  (p=0.008 n=5+5)
PinnerPinUnpinBatchTiny-8              108µs ± 0%     102µs ± 1%    -6.22%  (p=0.008 n=5+5)
PinnerPinUnpin-8                       592ns ± 8%      40ns ± 1%   -93.29%  (p=0.008 n=5+5)
PinnerPinUnpinTiny-8                   693ns ± 9%      39ns ± 1%   -94.31%  (p=0.008 n=5+5)
PinnerPinUnpinDouble-8                 843ns ± 5%     124ns ± 3%   -85.24%  (p=0.008 n=5+5)
PinnerPinUnpinParallel-8              1.11µs ± 5%    0.00µs ± 0%   -99.55%  (p=0.008 n=5+5)
PinnerPinUnpinParallelTiny-8          1.12µs ± 8%    0.00µs ± 1%   -99.55%  (p=0.008 n=5+5)
PinnerPinUnpinParallelDouble-8        1.79µs ± 4%    0.58µs ± 6%   -67.36%  (p=0.008 n=5+5)
PinnerIsPinnedOnPinned-8              5.78ns ± 0%    5.80ns ± 1%      ~     (p=0.548 n=5+5)
PinnerIsPinnedOnUnpinned-8            4.99ns ± 1%    4.98ns ± 0%      ~     (p=0.841 n=5+5)
PinnerIsPinnedOnPinnedParallel-8      0.71ns ± 0%    0.71ns ± 0%      ~     (p=0.175 n=5+5)
PinnerIsPinnedOnUnpinnedParallel-8    0.67ns ± 1%    0.66ns ± 0%      ~     (p=0.167 n=5+5)

name                                old alloc/op   new alloc/op   delta
PinnerPinUnpinBatch-8                 20.1kB ± 0%    20.0kB ± 0%    -0.32%  (p=0.008 n=5+5)
PinnerPinUnpinBatchDouble-8           52.7kB ± 0%    52.7kB ± 0%    -0.12%  (p=0.008 n=5+5)
PinnerPinUnpinBatchTiny-8             20.1kB ± 0%    20.0kB ± 0%    -0.32%  (p=0.008 n=5+5)
PinnerPinUnpin-8                       64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinTiny-8                   64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinDouble-8                 64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallel-8               64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallelTiny-8           64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallelDouble-8         64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerIsPinnedOnPinned-8               0.00B          0.00B           ~     (all equal)
PinnerIsPinnedOnUnpinned-8             0.00B          0.00B           ~     (all equal)
PinnerIsPinnedOnPinnedParallel-8       0.00B          0.00B           ~     (all equal)
PinnerIsPinnedOnUnpinnedParallel-8     0.00B          0.00B           ~     (all equal)

name                                old allocs/op  new allocs/op  delta
PinnerPinUnpinBatch-8                   9.00 ± 0%      8.00 ± 0%   -11.11%  (p=0.008 n=5+5)
PinnerPinUnpinBatchDouble-8             11.0 ± 0%      10.0 ± 0%    -9.09%  (p=0.008 n=5+5)
PinnerPinUnpinBatchTiny-8               9.00 ± 0%      8.00 ± 0%   -11.11%  (p=0.008 n=5+5)
PinnerPinUnpin-8                        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinTiny-8                    1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinDouble-8                  1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallel-8                1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallelTiny-8            1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallelDouble-8          1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerIsPinnedOnPinned-8                0.00           0.00           ~     (all equal)
PinnerIsPinnedOnUnpinned-8              0.00           0.00           ~     (all equal)
PinnerIsPinnedOnPinnedParallel-8        0.00           0.00           ~     (all equal)
PinnerIsPinnedOnUnpinnedParallel-8      0.00           0.00           ~     (all equal)

For #46787.

Change-Id: I0cdfad77b189c425868944a4faeff3d5b97417b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/497615
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ansiwen <ansiwen@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-05-24 16:23:08 +00:00
Michael Anthony Knyszek 6f13d0bfe4 runtime: fix usage of stale "now" value for netpolling Ms
Currently pidleget gets passed "now" from before the M goes into
netpoll, resulting in incorrect accounting of idle CPU time.
lastpoll is also stored with a stale "now"; that mistake was introduced
in the same CL that introduced it for pidleget.

Recompute "now" after returning from netpoll.

Also, start tracking idle time on js/wasm at all.

Credit to Rhys Hiltner for the test case.
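
The bug in miniature, with time.Sleep standing in for the blocking
netpoll call:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        now := time.Now()
        time.Sleep(50 * time.Millisecond) // stands in for netpoll(-1)
        fmt.Println("stale now is off by", time.Since(now).Round(time.Millisecond))

        now = time.Now() // the fix: recompute after blocking
        fmt.Println("fresh now is off by", time.Since(now).Round(time.Millisecond))
    }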

Fixes #60276.

Change-Id: I5dd677471f74c915dfcf3d01621430876c3ff307
Reviewed-on: https://go-review.googlesource.com/c/go/+/496183
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-05-23 19:24:33 +00:00
Michael Anthony Knyszek 7c91e1e568 runtime: replace raw traceEv with traceBlockReason in gopark
This change adds traceBlockReason which leaks fewer implementation
details of the tracer to the runtime. Currently, gopark is called with
an explicit trace event, but this leaks details about trace internals
throughout the runtime.

This change will make it easier to change out the trace implementation.

Change-Id: Id633e1704d2c8838c6abd1214d9695537c4ac7db
Reviewed-on: https://go-review.googlesource.com/c/go/+/494185
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-05-19 20:47:25 +00:00
Michael Anthony Knyszek b60db8f7d9 runtime: formalize the trace clock
Currently the trace clock is cputicks() with comments sprinkled in
different places as to which clock to use. Since the execution tracer
redesign will use a different clock, it seems like a good time to clean
that up.

Also, rename the start/end timestamps to be more readable (i.e.
startTime vs. timeStart).

Change-Id: If43533eddd0e5f68885bb75cdbadb38da42e7584
Reviewed-on: https://go-review.googlesource.com/c/go/+/494775
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-19 18:01:57 +00:00
Michael Anthony Knyszek b1aadd034c runtime: emit STW events for all pauses, not just those for the GC
Currently STW events are only emitted for GC STWs. There's little reason
why the trace can't contain events for every STW: they're rare, so they
don't take up much space in the trace, yet being able to see when the world
was stopped is often critical to debugging certain latency issues,
especially when they stem from user-level APIs.

This change adds new "kinds" to the EvGCSTWStart event, renames the
GCSTW events to just "STW," and lets the parser deal with unknown STW
kinds for future backwards compatibility.

But, this change must break trace compatibility, so it bumps the trace
version to Go 1.21.

This change also includes a small cleanup in the trace command, which
previously checked for STW events when deciding whether user tasks
overlapped with a GC. Looking at the source, I don't see a way for STW
events to ever enter the stream that that code looks at, so that
condition has been deleted.

Change-Id: I9a5dc144092c53e92eb6950e9a5504a790ac00cf
Reviewed-on: https://go-review.googlesource.com/c/go/+/494495
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-05-19 17:06:45 +00:00
Cherry Mui c426c87012 runtime/cgo: store M for C-created thread in pthread key
This reapplies CL 485500, with a fix drafted in CL 492987 incorporated.

CL 485500 was reverted due to #60004 and #60007. #60004 is fixed in
CL 492743. #60007 is fixed in CL 492987 (incorporated in this CL).

[Original CL 485500 description]

This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.

CL 481061, by doujiang24 <doujiang24@gmail.com>, speeds up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.

CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.

We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchronization in the stack allocator.

A regression test is added to misc/cgo/testsanitizer.

[Original CL 481061 description]

This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.

CL 392854, by doujiang24 <doujiang24@gmail.com>, speeds up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.

[Original CL 392854 description]

In a C thread, it's necessary to acquire an extra M via needm when invoking a Go function from C. But needm and dropm are costly due to the signal-related syscalls they make.
So, we change cgocallback to not dropm when returning to C, which means the extra M stays bound to the C thread until it exits, avoiding needm and dropm on each C-to-Go call.
Instead, we only dropm when the C thread exits, so the extra M won't leak.

When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specific value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list when the C thread exits.

When returning back to C:
Skip dropm in cgocallback when the pthread variable has been created, so that the extra M will be reused the next time a Go function is invoked from C.

This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.

This optimization is significant, and the specific value depends on the OS and CPU, but in general it can be considered roughly 10x faster for a simple Go function call from a C thread.

For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz

[CL 479915 description]

Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.

This CL makes needm get a more accurate stack bound from
pthread (on some platforms this may still be a guess, as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.

[CL 492987 description]

On the first call into Go from a C thread, currently we set the g0
stack's high bound imprecisely based on the SP. With CL 485500, we
keep the M and don't recompute the stack bounds when it calls into
Go again. If the first call is made when the C thread uses some
deep stack, but a subsequent call is made with a shallower stack,
the SP may be above g0.stack.hi.

This is usually okay, as we don't usually check stack.hi. One place
where we do check for stack.hi is in the signal handler, in
adjustSignalStack. In particular, C TSAN delivers signals on the
g0 stack (instead of the usual signal stack). If the SP is above
g0.stack.hi, we don't see that it is on the g0 stack, and we throw.

This CL makes it get an accurate stack upper bound with the
pthread API (on the platforms where it is available).

Also add some debug print for the "handler not on signal stack"
throw.

Fixes #51676.
Fixes #59294.
Fixes #59678.
Fixes #60007.

Change-Id: Ie51c8e81ade34ec81d69fd7bce1fe0039a470776
Reviewed-on: https://go-review.googlesource.com/c/go/+/495855
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-05-17 21:53:11 +00:00
Michael Anthony Knyszek 865179164e runtime: replace sysBlockTraced with tracedSyscallEnter
sysBlockTraced is a subtle and confusing flag.

Currently, it's only used in one place: a condition around whether to
traceGoSysExit when a goroutine is about to start running. That condition
looks like "gp.syscallsp != 0 && gp.trace.sysBlockTraced".

In every case but one, "gp.syscallsp != 0" is equivalent to
"gp.trace.sysBlockTraced."

That one case is where a goroutine is running without a P and racing
with trace start (world is stopped). It switches itself back to
_Grunnable from _Gsyscall before the trace start goroutine notices, such
that the trace start goroutine fails to emit a EvGoInSyscall event for
it (EvGoInSyscall or EvGoSysBlock must precede any EvGoSysExit event).
sysBlockTraced is set unconditionally on every syscall entry and the
trace start goroutine clears it if there was no EvGoInSyscall event
emitted (i.e. did not observe _Gsyscall on the goroutine). That way when
the goroutine-without-a-P wakes up and gets scheduled, it only emits
EvGoSysExit if the flag is set, i.e. trace start didn't _clear_ the
flag.

What makes this confusing is the fact that the flag is set
unconditionally and the code relies on it being *cleared*. Really, all
it's trying to communicate is whether the tracer is aware of a
goroutine's syscall at the point where a goroutine that lost its P
during a syscall is trying to run again.

Therefore, we can replace this flag with a less subtle one:
tracedSyscallEnter. It is set when GoSysCall is traced, indicating on
the goroutine that the tracer is aware of the syscall. Later, if
traceGoSysExit is called, the tracer knows its safe to emit an event
because the tracer is aware of the syscall.

This flag is then also set at trace start, when it emits EvGoInSyscall,
which again, lets the goroutine know the tracer is aware of its syscall.

The flag is cleared by GoSysExit to indicate that the tracer is no
longer aware of any syscalls on the goroutine. It's also cleared by
trace start. This is necessary because a syscall may have been started
while a trace was stopping. If the GoSysExit isn't emitted (because it
races with the trace end STW) then the flag will be left set at the
start of the next trace period, which will result in an erroneous
GoSysExit. Instead, the flag is cleared in the same way sysBlockTraced
is today: if the tracer doesn't notice the goroutine is in a syscall, it
makes that explicit to the goroutine.

A more direct flag to use here would be one that explicitly indicates
whether EvGoInSyscall or EvGoSysBlock specifically were already emitted
for a goroutine. The reason why we don't just do this is because setting
the flag when EvGoSysBlock is emitted would be racy: EvGoSysBlock is
emitted by whatever thread is stealing the P out from under the
syscalling goroutine, so it would need to synchronize with the goroutine
its emitting the event for.

The end result of all this is that the new flag can be managed entirely
within trace.go, hiding another implementation detail about the tracer.

Tested with `stress ./trace.test -test.run="TestTraceStressStartStop"`
which was occasionally failing before the CL in which sysBlockTraced was
added (CL 9132). I also confirmed also that this test is still sensitive
to `EvGoSysExit` by removing the one use of sysBlockTraced. The result
is about a 5% error rate. If there is something very subtly wrong about
how this CL emits `EvGoSysExit`, I would expect to see it as a test
failure. Instead:

    53m55s: 200434 runs so far, 0 failures

Change-Id: If1d24ee6b6926eec7e90cdb66039a5abac819d9b
Reviewed-on: https://go-review.googlesource.com/c/go/+/494715
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-05-17 15:50:17 +00:00
Michael Anthony Knyszek e3ada56537 runtime: hide sysExitTicks a little better
Just another step to hiding implementation details.

Change-Id: I71b7cc522d18c23f03a9bf32e428279e62b39a89
Reviewed-on: https://go-review.googlesource.com/c/go/+/494192
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-05-17 14:53:01 +00:00
Michael Anthony Knyszek c213c905a2 runtime: capture per-g trace state in a type
More tightening up of the tracer's interface.

This increases the size of each G very slightly, which isn't great, but
we stay within the same size class, so actually memory use will be
unchanged.

Change-Id: I7d1f5798edcf437c212beb1e1a2619eab833aafb
Reviewed-on: https://go-review.googlesource.com/c/go/+/494188
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-05-17 14:45:40 +00:00
Michael Anthony Knyszek c890d40d0d runtime: factor out oneNewExtraM trace code
In the interest of further cleaning up the trace.go API, move the trace
logic in oneNewExtraM into its own function.

Change-Id: I5cf478cb8cd0d301ee3b068347ed48ce768b8882
Reviewed-on: https://go-review.googlesource.com/c/go/+/494186
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-17 14:43:25 +00:00
Michael Anthony Knyszek 62bf7a4809 runtime: hide trace.shutdown behind traceShuttingDown
Change-Id: I0b123e65f40570caeee611679d80dc27034d5a52
Reviewed-on: https://go-review.googlesource.com/c/go/+/494183
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-05-11 21:27:10 +00:00
Michael Anthony Knyszek 8992bb19ad runtime: replace trace.enabled with traceEnabled
[git-generate]
cd src/runtime
grep -l 'trace\.enabled' *.go | grep -v "trace.go" | xargs sed -i 's/trace\.enabled/traceEnabled()/g'

Change-Id: I14c7821c1134690b18c8abc0edd27abcdabcad72
Reviewed-on: https://go-review.googlesource.com/c/go/+/494181
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-05-11 21:27:08 +00:00
Michael Anthony Knyszek a2737b1aab runtime: hide trace lock init details
This change is in service of hiding more execution trace implementation
details for big changes to come.

Change-Id: I49b9716a7bf285d23c86b58912a05eff4ddc2213
Reviewed-on: https://go-review.googlesource.com/c/go/+/494182
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-11 19:07:04 +00:00
Michael Pratt 734b26d4b9 runtime: exclude extra M's from debug.SetMaxThreads
The purpose of the debug.SetMaxThreads limit is to avoid an accidental
fork bomb from something like millions of goroutines blocking on system
calls, causing the runtime to create millions of threads.

By definition, the runtime doesn't create the threads that are created
in C, so they aren't a problem here, and we can exclude them from the
limit. If C wants to create tens of thousands of threads, who are we to
say no?

Fixes #60004.

Change-Id: I62b875890718b406abca42a9a4078391e25aa21b
Reviewed-on: https://go-review.googlesource.com/c/go/+/492743
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
2023-05-09 17:55:20 +00:00
Michael Pratt 77b1f23af7 runtime: clean up extra M API
There are quite a few locations that get/put Ms from the extra M list,
but the API is pretty clumsy to use. Add an easier-to-use getExtraM /
putExtraM API.

There are only two minor semantic changes:

1. dropm no longer calls setg(nil) inside the lockextra critical
   section. It is important that this thread no longer references the G
   (and in turn M) once it is published to the extra M list and another
   thread could acquire it. But there is no reason that needs to happen
   only after lockextra.

2. extraMLength (renamed from extraMCount) is no longer protected by
   lockextra and is instead simply an atomic (though writes are still in
   the critical section). The previous readers all dropped lockextra
   before using the value they read anyway.

For #60004.

Change-Id: Ifca4d6c84d605423855d89f49af400ca07de56f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/492742
Run-TryBot: Michael Pratt <mpratt@google.com>
Commit-Queue: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
2023-05-08 16:39:39 +00:00
Chressie Himpel 72c33a5ef0 Revert "runtime/cgo: store M for C-created thread in pthread key"
This reverts CL 485500.

Reason for revert: This breaks internal tests at Google, see b/280861579 and b/280820455.

Change-Id: I426278d400f7611170918fc07c524cb059b9cc55
Reviewed-on: https://go-review.googlesource.com/c/go/+/492995
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Chressie Himpel <chressie@google.com>
2023-05-05 14:37:29 +00:00
Nick Ripley 265d19ed52 runtime/trace: avoid frame pointer unwinding for events during cgocallbackg
The current mp.incgocallback() logic allows for trace events to be
recorded using frame pointer unwinding during cgocallbackg when they
shouldn't be. Specifically, mp.incgo will be false during the
reentersyscall call at the end. It's possible to crash with tracing
enabled because of this, if C code which uses the frame pointer register
for other purposes calls into Go. This can be seen, for example, by
forcing testprogcgo/trace_unix.c to write a garbage value to RBP prior
to calling into Go.

We can drop the mp.incgo check, and instead conservatively avoid doing
frame pointer unwinding if there is any C on the stack. This is the case
if mp.ncgo > 0, or if mp.isextra is true (meaning we're coming from a
thread created by C). Rename incgocallback to reflect that we're
checking if there's any C on the stack. We can also move the ncgo
increment in cgocall closer to where the transition to C happens, which
lets us use frame pointer unwinding for the entersyscall event during
the first Go-to-C call on a stack, when there isn't yet any C on the
stack.

Fixes #59830.

Change-Id: If178a705a9d38d0d2fb19589a9e669cd982d32cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/488755
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Nick Ripley <nick.ripley@datadoghq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-28 21:07:22 +00:00
Lucien Coffe ff059add10 runtime: resolve checkdead panic by refining `startm` lock handling in caller context
This change addresses a `checkdead` panic caused by a race condition between
`sysmon->startm` and `checkdead` callers, due to prematurely releasing the
scheduler lock. The solution involves allowing a `startm` caller to acquire the
scheduler lock and call `startm` in this context. A new `lockheld` bool
argument is added to `startm`, which manages all lock and unlock calls within
the function. The `startIdle` function variable in `injectglist` is updated to
call `startm` with the lock held, ensuring proper lock handling in this
specific case. This refined lock handling resolves the observed race condition
issue.
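
The general locking pattern, divorced from the scheduler (toy code,
not the actual startm):

    package main

    import (
        "fmt"
        "sync"
    )

    var mu sync.Mutex // stands in for sched.lock

    // startm lets a caller that already holds the lock say so, so the
    // function neither re-locks (deadlock) nor releases the lock out
    // from under the caller (the race this CL fixes).
    func startm(lockheld bool) {
        if !lockheld {
            mu.Lock()
            defer mu.Unlock()
        }
        fmt.Println("startm running with the scheduler lock held")
    }

    func main() {
        startm(false) // acquires and releases internally

        mu.Lock()
        startm(true) // caller retains ownership across the call
        mu.Unlock()
    }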

Fixes #59600

Change-Id: I11663a15536c10c773fc2fde291d959099aa71be
Reviewed-on: https://go-review.googlesource.com/c/go/+/487316
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
2023-04-28 15:57:36 +00:00
Michael Pratt 7b874619be runtime/cgo: store M for C-created thread in pthread key
This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.

CL 481061, by doujiang24 <doujiang24@gmail.com>, speeds up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.

[Original CL 481061 description]

This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.

CL 392854, by doujiang24 <doujiang24@gmail.com>, speeds up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.

[Original CL 392854 description]

In a C thread, it's necessary to acquire an extra M via needm when invoking a Go function from C. But needm and dropm are costly due to the signal-related syscalls they make.
So, we change cgocallback to not dropm when returning to C, which means the extra M stays bound to the C thread until it exits, avoiding needm and dropm on each C-to-Go call.
Instead, we only dropm when the C thread exits, so the extra M won't leak.

When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specific value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list when the C thread exits.

When returning back to C:
Skip dropm in cgocallback when the pthread variable has been created, so that the extra M will be reused the next time a Go function is invoked from C.

This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.

This optimization is significant, and the specific value depends on the OS and CPU, but in general it can be considered roughly 10x faster for a simple Go function call from a C thread.

For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz

[CL 479915 description]

Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.

This CL makes needm get a more accurate stack bound from
pthread (on some platforms this may still be a guess, as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.

[CL 485500 description]

CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.

We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchronization in the stack allocator.

A regression test is added to misc/cgo/testsanitizer.

Fixes #51676.
Fixes #59294.
Fixes #59678.

Change-Id: Ic62be31a06ee83568215e875a891df37084e08ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/485500
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
2023-04-26 19:25:46 +00:00
Ian Lance Taylor f00e947cdf runtime: add raceFiniLock to lock ranking
Also preserve the PC/SP in reentersyscall when doing lock ranking.
The test is TestDestructorCallbackRace with the staticlockranking
experiment enabled.

For #59711

Change-Id: I87ac1d121ec0d399de369666834891ab9e7d11b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/487955
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
2023-04-24 21:37:06 +00:00
Lucien Coffe f229787aff runtime: prevent double lock in checkdead by unlocking before throws
This change resolves an issue where checkdead could result in a double lock when schedtrace is enabled. The fix involves adding unlocks before all throws in the checkdead function to ensure the scheduler lock is properly released.

Fixes #59758

Change-Id: If3ddf9969f4582c3c88dee9b9ecc355a63958103
Reviewed-on: https://go-review.googlesource.com/c/go/+/487375
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-04-21 20:14:08 +00:00
Austin Clements 87272bd1a1 runtime: tidy _Stack* constant naming
For #59670.

Change-Id: I0efa743edc08e48dc8d906803ba45e9f641369db
Reviewed-on: https://go-review.googlesource.com/c/go/+/486977
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2023-04-21 19:29:00 +00:00
Austin Clements 2668a190ba internal/abi, runtime, cmd: merge funcFlag_* consts into internal/abi
For #59670.

Change-Id: Ie784ba4dd2701e4f455e1abde4a6bfebee4b1387
Reviewed-on: https://go-review.googlesource.com/c/go/+/485496
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Auto-Submit: Austin Clements <austin@google.com>
2023-04-21 19:28:46 +00:00
Austin Clements df777cfa15 Revert "runtime: tidy _Stack* constant naming"
This reverts commit CL 486381.

Submitted out of order and breaks bootstrap.

Change-Id: Ia472111cb966e884a48f8ee3893b3bf4b4f4f875
Reviewed-on: https://go-review.googlesource.com/c/go/+/486915
Reviewed-by: David Chase <drchase@google.com>
TryBot-Bypass: Austin Clements <austin@google.com>
2023-04-20 16:19:25 +00:00
Austin Clements ef8e3b7fa4 runtime: tidy _Stack* constant naming
For #59670.

Change-Id: I4476d6f92663e8a825d063d6e6a7fc9a2ac99d4d
Reviewed-on: https://go-review.googlesource.com/c/go/+/486381
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-04-20 16:05:23 +00:00
Michael Pratt 94850c6f79 Revert "runtime/cgo: store M for C-created thread in pthread key"
This reverts CL 481061.

Reason for revert: When built with C TSAN, x_cgo_getstackbound triggers
race detection on `g->stacklo` because the synchronization is in Go,
which isn't instrumented.

For #51676.
For #59294.
For #59678.

Change-Id: I38afcda9fcffd6537582a39a5214bc23dc147d47
Reviewed-on: https://go-review.googlesource.com/c/go/+/485275
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2023-04-17 18:47:08 +00:00
Keith Randall 2b92c39fe0 cmd/link: establish dependable package initialization order
(This is a retry of CL 462035 which was reverted at 474976.
The only change from that CL is the aix fix SRODATA->SNOPTRDATA
at inittask.go:141)

As described here:

https://github.com/golang/go/issues/31636#issuecomment-493271830

"Find the lexically earliest package that is not initialized yet,
but has had all its dependencies initialized, initialize that package,
 and repeat."

Simplify the runtime a bit, by just computing the ordering required
in the linker and giving a list to the runtime.

Update #31636
Fixes #57411

RELNOTE=yes

Change-Id: I28c09451d6aa677d7394c179d23c2c02c503fc56
Reviewed-on: https://go-review.googlesource.com/c/go/+/478916
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-14 16:55:22 +00:00
doujiang24 ccad8a9f9c runtime/cgo: store M for C-created thread in pthread key
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.

CL 392854, by doujiang24 <doujiang24@gmail.com>, speeds up C-to-Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C-to-Go calls after
the C code has used some stack, but that CL is itself buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug in CL 479915.

[Original CL 392854 description]

In a C thread, invoking a Go function from C requires acquiring an
extra M via needm, and needm and dropm are costly because of their
signal-related syscalls. So we no longer drop the M when returning
to C: the extra M stays bound to the C thread until the thread
exits, avoiding needm and dropm on every C-to-Go call. The M is
dropped only when the C thread exits, so the extra M won't leak.

When invoking a Go function from C:
Allocate a pthread key using pthread_key_create, once per shared
object, and register a thread-exit-time destructor. Then store the
g0 of the current M into the key's thread-specific value, once per
C thread, so that the destructor puts the extra M back onto the
extra M list when the C thread exits.

When returning back to C:
Skip dropm in cgocallback when the pthread key has been created,
so that the extra M will be reused the next time a Go function is
invoked from C.
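
As a rough, self-contained cgo illustration of that pattern (not
the runtime's actual code; mKey, destructor, and demo are
hypothetical names), the pthread key's destructor runs at C-thread
exit, which is where the deferred dropm would happen:

    package main

    /*
    #cgo LDFLAGS: -pthread
    #include <pthread.h>
    #include <stdio.h>

    static pthread_key_t mKey;

    // Runs when the C thread exits; in the runtime this is where
    // the extra M is put back onto the extra M list (dropm).
    static void destructor(void *val) {
        printf("thread exit: releasing %s\n", (char *)val);
    }

    static void *threadFn(void *arg) {
        // Once per C thread: a non-NULL value arms the destructor.
        pthread_setspecific(mKey, "extra M");
        return NULL;
    }

    static void demo(void) {
        pthread_key_create(&mKey, destructor); // once per shared object
        pthread_t t;
        pthread_create(&t, NULL, threadFn, NULL);
        pthread_join(t, NULL);
    }
    */
    import "C"

    func main() { C.demo() }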

This is purely a performance optimization. The old behavior, in
which needm and dropm happen on each cgo call, is still correct,
and we have to keep it on systems with cgo but without pthreads,
like Windows.

This optimization is significant; the exact speedup depends on the
OS and CPU, but in general a simple Go function call from a C
thread becomes roughly 10x faster.

For the newly added BenchmarkCGoInCThread, in which a C-created
thread calls a Go function (see the sketch below), some results:
1. 28x faster, from 3395 ns/op to 121 ns/op, on macOS with an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. 6.5x faster, from 1495 ns/op to 230 ns/op, on Linux with an Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
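
A minimal sketch of that scenario, a thread created with
pthread_create calling an exported Go function; the file layout
and names are illustrative, not the actual benchmark source:

    // main.go
    package main

    // #cgo LDFLAGS: -pthread
    // extern void startCThread(void); // defined in cthread.c
    import "C"

    import "fmt"

    //export goCalledFromCThread
    func goCalledFromCThread() {
        // The first call on a C-created thread acquires an extra M
        // via needm; with the pthread key it stays bound afterward.
        fmt.Println("Go running on a C-created thread")
    }

    func main() { C.startCThread() }

    /* cthread.c, compiled into the same package by go build: */

    #include <pthread.h>

    extern void goCalledFromCThread(void); /* the Go export above */

    static void *threadFn(void *arg) {
        goCalledFromCThread();
        return NULL;
    }

    void startCThread(void) {
        pthread_t t;
        pthread_create(&t, NULL, threadFn, NULL);
        pthread_join(t, NULL);
    }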

[CL 479915 description]

Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returned to C, we dropped the M, and the next
time C called into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0 without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.

This CL makes needm get a more accurate stack bound from pthread
(on some platforms this may still be a guess, as we don't know
exactly where we are in the C stack), but it is probably better
than simply assuming 32K.
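
For a concrete sense of what a stack bound from pthread can look
like, here is a hedged cgo sketch using glibc's pthread_getattr_np
(Linux-specific; other platforms need different calls, and this is
not the runtime's implementation):

    package main

    /*
    #cgo CFLAGS: -D_GNU_SOURCE
    #cgo LDFLAGS: -pthread
    #include <pthread.h>

    // Query the current thread's stack base and size (glibc only).
    static void stackBounds(void **addr, size_t *size) {
        pthread_attr_t attr;
        pthread_getattr_np(pthread_self(), &attr);
        pthread_attr_getstack(&attr, addr, size);
        pthread_attr_destroy(&attr);
    }
    */
    import "C"

    import (
        "fmt"
        "unsafe"
    )

    func main() {
        var addr unsafe.Pointer
        var size C.size_t
        C.stackBounds(&addr, &size)
        // addr is the lowest usable address; addr+size is the base.
        fmt.Printf("stack lo=%p size=%d bytes\n", addr, size)
    }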

Fixes #51676.
Fixes #59294.

Change-Id: I9bf1400106d5c08ce621d2ed1df3a2d9e3f55494
Reviewed-on: https://go-review.googlesource.com/c/go/+/481061
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: DeJiang Zhu (doujiang) <doujiang24@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-03 18:34:11 +00:00
Cherry Mui bfe3c678ab Revert "runtime/cgo: store M for C-created thread in pthread key"
This reverts CL 392854.

Reason for revert: caused #59294, which surfaced in Google
internal tests. The attempted fix for #59294 caused more breakage.

Change-Id: I5a061561ac2740856b7ecc09725ac28bd30f8bba
Reviewed-on: https://go-review.googlesource.com/c/go/+/481060
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-03-31 19:26:35 +00:00
Cherry Mui 63ef9059a2 Revert "runtime: get a better g0 stack bound in needm"
This reverts CL 479915.

Reason for revert: breaks a lot of Google internal tests.

Change-Id: I13a9422e810af7ba58cbf4a7e6e55f4d8cc0ca51
Reviewed-on: https://go-review.googlesource.com/c/go/+/481055
Reviewed-by: Chressie Himpel <chressie@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-03-31 18:42:48 +00:00
Cherry Mui 443eb9757c runtime: get a better g0 stack bound in needm
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returned to C, we dropped the M, and the next
time C called into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0 without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.

This CL makes needm get a more accurate stack bound from pthread
(on some platforms this may still be a guess, as we don't know
exactly where we are in the C stack), but it is probably better
than simply assuming 32K.

For #59294.

Change-Id: Ie52a8f931e0648d8753e4c1dbe45468b8748b527
Reviewed-on: https://go-review.googlesource.com/c/go/+/479915
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-03-30 23:23:55 +00:00
Felix Geisendörfer 3dd221a94d runtime/trace: use regular unwinding for cgo callbacks
Introduce a new m.incgocallback field that is true while C code
calls into Go code. Use it in the tracer to fall back to the
default unwinder instead of frame pointer unwinding in this
scenario. The existing fields (incgo, ncgo) were not sufficient to
detect the case where a thread created in C calls into Go code.
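
The resulting tracer decision amounts to a branch like the one
below; this standalone sketch is illustrative, and only the
incgocallback field name comes from this CL:

    package main

    import "fmt"

    type m struct {
        incgocallback bool // true while C code calls into Go code
    }

    func chooseUnwinder(mp *m) string {
        if mp.incgocallback {
            // C frames may lack frame pointers, and a registered
            // cgo symbolizer can handle them: use the default,
            // metadata-based unwinder.
            return "default unwinder"
        }
        // Pure Go stacks keep frame pointers on amd64/arm64.
        return "frame pointer unwinder"
    }

    func main() {
        fmt.Println(chooseUnwinder(&m{incgocallback: true}))
        fmt.Println(chooseUnwinder(&m{}))
    }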

Motivation:

1. Take advantage of a cgo symbolizer, if registered, to unwind through
   C stacks without frame pointers.
2. Reduce the chance of crashes. It seems unsafe to follow frame
   pointers when there could be C code that was compiled without frame
   pointers.

Removing the curgp.m.incgocallback check in traceStackID shows the
following minor differences between frame pointer unwinding and the
default unwinder when there is no cgo symbolizer involved.

    trace_test.go:60: "goCalledFromCThread": got stack:
        main.goCalledFromCThread
        	/src/runtime/testdata/testprogcgo/trace.go:58
        _cgoexp_45c15a3efb3a_goCalledFromCThread
        	_cgo_gotypes.go:694
        runtime.cgocallbackg1
        	/src/runtime/cgocall.go:318
        runtime.cgocallbackg
        	/src/runtime/cgocall.go:236
        runtime.cgocallback
        	/src/runtime/asm_amd64.s:998
        crosscall2
        	/src/runtime/cgo/asm_amd64.s:30

        want stack:
        main.goCalledFromCThread
        	/src/runtime/testdata/testprogcgo/trace.go:58
        _cgoexp_45c15a3efb3a_goCalledFromCThread
        	_cgo_gotypes.go:694
        runtime.cgocallbackg1
        	/src/runtime/cgocall.go:318
        runtime.cgocallbackg
        	/src/runtime/cgocall.go:236
        runtime.cgocallback
        	/src/runtime/asm_amd64.s:998

    trace_test.go:60: "goCalledFromC": got stack:
        main.goCalledFromC
        	/src/runtime/testdata/testprogcgo/trace.go:51
        _cgoexp_45c15a3efb3a_goCalledFromC
        	_cgo_gotypes.go:687
        runtime.cgocallbackg1
        	/src/runtime/cgocall.go:318
        runtime.cgocallbackg
        	/src/runtime/cgocall.go:236
        runtime.cgocallback
        	/src/runtime/asm_amd64.s:998
        crosscall2
        	/src/runtime/cgo/asm_amd64.s:30
        runtime.asmcgocall
        	/src/runtime/asm_amd64.s:848
        main._Cfunc_cCalledFromGo
        	_cgo_gotypes.go:263
        main.goCalledFromGo
        	/src/runtime/testdata/testprogcgo/trace.go:46
        main.Trace
        	/src/runtime/testdata/testprogcgo/trace.go:37
        main.main
        	/src/runtime/testdata/testprogcgo/main.go:34

        want stack:
        main.goCalledFromC
        	/src/runtime/testdata/testprogcgo/trace.go:51
        _cgoexp_45c15a3efb3a_goCalledFromC
        	_cgo_gotypes.go:687
        runtime.cgocallbackg1
        	/src/runtime/cgocall.go:318
        runtime.cgocallbackg
        	/src/runtime/cgocall.go:236
        runtime.cgocallback
        	/src/runtime/asm_amd64.s:998
        runtime.systemstack_switch
        	/src/runtime/asm_amd64.s:463
        runtime.cgocall
        	/src/runtime/cgocall.go:168
        main._Cfunc_cCalledFromGo
        	_cgo_gotypes.go:263
        main.goCalledFromGo
        	/src/runtime/testdata/testprogcgo/trace.go:46
        main.Trace
        	/src/runtime/testdata/testprogcgo/trace.go:37
        main.main
        	/src/runtime/testdata/testprogcgo/main.go:34

For #16638

Change-Id: I95fa27a3170c5abd923afc6eadab4eae777ced31
Reviewed-on: https://go-review.googlesource.com/c/go/+/474916
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
2023-03-30 19:18:12 +00:00