Commit Graph

Joel Sing e293c4b509 runtime: allocate crash stack via stackalloc
On some platforms (notably OpenBSD), stacks must be specifically allocated
and marked as being stack memory. Allocate the crash stack using stackalloc,
which ensures these requirements are met, rather than using a global Go
variable.
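
For illustration, a minimal standalone sketch (not the runtime's
stackalloc; the constants here are hypothetical) of obtaining memory
the OS can treat as a stack. On OpenBSD the mapping would additionally
need MAP_STACK, which is exactly what a plain Go global lacks:

    package main

    import (
        "fmt"
        "syscall"
    )

    // Minimal sketch, not the runtime's stackalloc: obtain anonymous
    // memory suitable for use as a stack. On OpenBSD the kernel requires
    // stack memory to be mapped with MAP_STACK, which a global Go
    // variable is not — hence the switch to stackalloc.
    func main() {
        const stackSize = 16 << 10
        mem, err := syscall.Mmap(-1, 0, stackSize,
            syscall.PROT_READ|syscall.PROT_WRITE,
            syscall.MAP_PRIVATE|syscall.MAP_ANON) // plus MAP_STACK on OpenBSD
        if err != nil {
            panic(err)
        }
        defer syscall.Munmap(mem)
        fmt.Printf("allocated a %d-byte stack region\n", len(mem))
    }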

Fixes #63794

Change-Id: I6513575797dd69ff0a36f3bfd4e5fc3bd95cbf50
Reviewed-on: https://go-review.googlesource.com/c/go/+/538457
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-31 16:28:14 +00:00
Cherry Mui 0262ea1ff9 runtime: print a stack trace at "morestack on g0"
Error like "morestack on g0" is one of the errors that is very
hard to debug, because often it doesn't print a useful stack trace.
The runtime doesn't directly print a stack trace because it is
a bad stack state to call print. Sometimes the SIGABRT may trigger
a traceback, but sometimes not especially in a cgo binary. Even if
it triggers a traceback it often does not include the stack trace
of the bad stack.

This CL makes the runtime explicitly print a stack trace and throw.
The idea is to set aside some space as an "emergency" crash stack. When the
stack is in a really bad state, we switch to the crash stack and
do a traceback.

Currently only implemented on AMD64 and ARM64.

TODO: also handle errors like "morestack on gsignal" and bad
systemstack. Also handle other architectures.

Change-Id: Ibfc397202f2bb0737c5cbe99f2763de83301c1c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/419435
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-10-26 18:46:50 +00:00
Michael Pratt 1af424c196 runtime: clear g0 stack bounds in dropm
After CL 527715, needm uses callbackUpdateSystemStack to set the stack
bounds for g0 on an M from the extra M list. Since
callbackUpdateSystemStack is also used for recursive cgocallback, it
does nothing if the stack is already in bounds.

Currently, an extra M may carry stale stack bounds from
a previous thread that used this M and then returned it to the extra
list in dropm.

Typically a new thread will not have an overlapping stack with an old
thread, but because the old thread has exited, there is a small chance
that the C memory allocator will allocate the new thread's stack
partially or fully overlapping with the old thread's stack.

If this occurs, then callbackUpdateSystemStack will not update the stack
bounds. If, in addition, the overlap is partial such that the SP on
cgocallback is close to the recorded stack lower bound, then Go may
quickly "overflow" the stack and crash with "morestack on g0".

Fix this by clearing the stack bounds in dropm, which ensures that
callbackUpdateSystemStack will unconditionally update the bounds in
needm.
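
A toy model of the failure and the fix, with simplified types that
only illustrate the idea (the real logic lives in the runtime's
callbackUpdateSystemStack and dropm):

    package main

    import "fmt"

    type bounds struct{ lo, hi uintptr }

    // Simplified stand-in for callbackUpdateSystemStack: keep the cached
    // bounds whenever SP already falls inside them.
    func update(b *bounds, sp uintptr) {
        if sp > b.lo && sp <= b.hi {
            return // bounds look plausible — possibly stale!
        }
        b.hi = sp + 1<<10    // guess: a little space above SP
        b.lo = b.hi - 32<<10 // guess: 32K stack
    }

    func main() {
        // Stale bounds left by an exited thread; the new C thread's stack
        // happens to overlap, with SP barely above the recorded lower bound.
        b := bounds{lo: 0xc0000, hi: 0xc8000}
        sp := uintptr(0xc0200)
        update(&b, sp)
        fmt.Printf("stale bounds kept: %+v (SP only 0x200 above lo)\n", b)

        // The fix: dropm zeroes the bounds, forcing a fresh computation.
        b = bounds{}
        update(&b, sp)
        fmt.Printf("fresh bounds:      %+v\n", b)
    }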

For #62440.

Change-Id: Ic9e2052c2090dd679ed716d1a23a86d66cbcada7
Reviewed-on: https://go-review.googlesource.com/c/go/+/537695
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Bypass: Michael Pratt <mpratt@google.com>
2023-10-26 15:17:33 +00:00
Cherry Mui fd54185a8d cmd/link, runtime: initialize packages in shared build mode
Currently, for the shared build mode, we don't generate the module
inittasks. Instead, we rely on the main executable to do the
initialization, for both the executable and the shared library.
But with the model as of CL 478916, the main executable only
has relocations to packages that it directly imports. It won't
see the dependency edges between packages within a shared library.
Therefore indirect dependencies are not included, and thus not
initialized. For example, main imports a, which imports b, but main
doesn't directly import b, and a and b are in a shared object. When
linking main, the linker sees that main depends on a, so it generates
main's inittasks to run a's init before main's; but it doesn't know
about b, so b's init never runs.

This CL makes it initialize all packages in a shared library when
the library is loaded, as any of them could potentially be
imported, directly or indirectly.

Also, in the runtime, when running the init functions, make sure
to go through the DSOs in dependency order. Otherwise packages
can be initialized in the wrong order.
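
The required ordering is just a dependency-respecting walk over the
package graph. A toy model of that rule for the main/a/b example above
(illustrative code, not the actual inittask machinery):

    package main

    import "fmt"

    // deps models the import graph from the example above.
    var deps = map[string][]string{
        "main": {"a"}, // main only directly imports a
        "a":    {"b"},
        "b":    {},
    }

    var initialized = map[string]bool{}

    // runInit initializes a package only after all of its dependencies,
    // so the indirect dependency b still runs before a and main.
    func runInit(pkg string) {
        if initialized[pkg] {
            return
        }
        for _, d := range deps[pkg] {
            runInit(d)
        }
        initialized[pkg] = true
        fmt.Println("init", pkg)
    }

    func main() {
        runInit("main") // prints: init b, init a, init main
    }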

Fixes #61973.

Change-Id: I2a090336fe9fa0d6c7e43912f3ab233c9c47e247
Reviewed-on: https://go-review.googlesource.com/c/go/+/520375
Reviewed-by: Than McIntosh <thanm@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-09-20 14:46:11 +00:00
Michael Pratt 4f9fe6d509 runtime: allow update of system stack bounds on callback from C thread
[This is a redo of CL 525455 with the test fixed on darwin by defining
_XOPEN_SOURCE, and disabled on android, musl, and openbsd, which do
not provide getcontext.]

Since CL 495855, Ms are cached for C threads calling into Go, including
the stack bounds of the system stack.

Some C libraries (e.g., coroutine libraries) do manual stack management
and may change stacks between calls to Go on the same thread.

Changing the stack if there is more Go up the stack would be
problematic. But if the calls are completely independent there is no
particular reason for Go to care about the changing stack boundary.

Thus, this CL allows the stack bounds to change in such cases. The
primary downside here (besides additional complexity) is that normal
systems that do not manipulate the stack may not notice unintentional
stack corruption as quickly as before.

Note that callbackUpdateSystemStack is written to be usable for the
initial setup in needm as well as updating the stack in cgocallbackg.

Fixes #62440.
For #62130.

Change-Id: I0fe0134f865932bbaff1fc0da377c35c013bd768
Reviewed-on: https://go-review.googlesource.com/c/go/+/527715
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-09-12 17:08:55 +00:00
Michael Pratt ea8c05508b Revert "runtime: allow update of system stack bounds on callback from C thread"
This reverts CL 525455. The test fails to build on darwin, alpine, and
android.

For #62440.

Change-Id: I39c6b1e16499bd61e0f166de6c6efe7a07961e62
Reviewed-on: https://go-review.googlesource.com/c/go/+/527317
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-09-11 16:35:56 +00:00
Michael Pratt a46b1ad357 runtime: allow update of system stack bounds on callback from C thread
Since CL 495855, Ms are cached for C threads calling into Go, including
the stack bounds of the system stack.

Some C libraries (e.g., coroutine libraries) do manual stack management
and may change stacks between calls to Go on the same thread.

Changing the stack if there is more Go up the stack would be
problematic. But if the calls are completely independent there is no
particular reason for Go to care about the changing stack boundary.

Thus, this CL allows the stack bounds to change in such cases. The
primary downside here (besides additional complexity) is that normal
systems that do not manipulate the stack may not notice unintentional
stack corruption as quickly as before.

Note that callbackUpdateSystemStack is written to be usable for the
initial setup in needm as well as updating the stack in cgocallbackg.

Fixes #62440.
For #62130.

Change-Id: I7841b056acea1111bdae3b718345a3bd3961b4a8
Reviewed-on: https://go-review.googlesource.com/c/go/+/525455
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-09-11 14:46:41 +00:00
Cherry Mui c6d550a668 runtime: increase g0 stack size in non-cgo case
Currently, for non-cgo programs, the g0 stack size is 8 KiB on
most platforms. With PGO which could cause aggressive inlining in
the runtime, the runtime stack frames are larger and could
overflow the 8 KiB g0 stack. Increase it to 16 KiB. There is only
one g0 stack per OS thread, so this shouldn't increase memory use much.

Fixes #62120.
Fixes #62489.

Change-Id: I565b154517021f1fd849424dafc3f0f26a755cac
Reviewed-on: https://go-review.googlesource.com/c/go/+/526995
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-09-08 18:40:23 +00:00
Nick Ripley 94d36fbc4a runtime: zero saved frame pointer when reusing goroutine stack on arm64
When a goroutine stack is reused on arm64, the spot on the stack where
the "caller's" frame pointer goes for the topmost frame should be
explicitly zeroed. Otherwise, the frame pointer check in adjustframe
with debugCheckBP enabled will fail on the topmost frame of a call stack
the first time a reused stack is grown.

Updates #39524, #58432

Change-Id: Ic1210dc005e3ecdbf9cd5d7b98846566e56df8f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/481636
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
2023-08-15 13:58:27 +00:00
Joel Sing 9fc3feb441 runtime,syscall: invert openbsd architecture tests
Rather than testing for architectures that use libc-based system calls,
test that it is not the single architecture on which Go still uses direct
system calls. This reduces the number of changes needed for new openbsd
ports.
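
Concretely, the inversion turns an ever-growing list of libc-based
architectures into a single negated constraint. Illustrative build
lines only (at the time, openbsd/mips64 was the one remaining
direct-syscall port):

    // Before: enumerate every libc-based architecture (must grow
    // with each new openbsd port):
    //
    //go:build openbsd && (386 || amd64 || arm || arm64)

    // After: name only the single direct-syscall architecture:
    //
    //go:build openbsd && !mips64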

Updates #36435
Updates #61546

Change-Id: I79c4597c629b8b372e9efcda79e8f6ff778b9e8e
Reviewed-on: https://go-review.googlesource.com/c/go/+/516016
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-08-05 18:04:17 +00:00
Roland Shoemaker d4dd1de19f runtime: enforce standard file descriptors open on init on unix
On Unix-like platforms, enforce that the standard file descriptors (0,
1, 2) are always open during initialization. If any of the FDs are
closed, we open them pointing at /dev/null, or fail.
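
A user-space sketch of the invariant (the runtime's own check runs far
earlier and differs in detail), relying on open(2) returning the lowest
free descriptor:

    package main

    import (
        "fmt"
        "syscall"
    )

    func main() {
        for fd := 0; fd <= 2; fd++ {
            var st syscall.Stat_t
            if err := syscall.Fstat(fd, &st); err == syscall.EBADF {
                // open returns the lowest free descriptor, which is
                // exactly the one we just found to be closed.
                nfd, err := syscall.Open("/dev/null", syscall.O_RDWR, 0)
                if err != nil || nfd != fd {
                    panic("cannot point standard fd at /dev/null")
                }
            }
        }
        fmt.Println("fds 0, 1 and 2 are open")
    }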

Fixes #60641

Change-Id: Iaab6b3f3e5ca44006ae3ba3544d47da9a613f58f
Reviewed-on: https://go-review.googlesource.com/c/go/+/509020
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Roland Shoemaker <roland@golang.org>
Auto-Submit: Roland Shoemaker <roland@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-07-25 16:33:33 +00:00
Michael Pratt ad943066f6 runtime: call wakep in gosched
goschedImpl transitions the current goroutine from _Grunning to
_Grunnable and places it on the global run queue before calling into
schedule.

It does _not_ call wakep after adding to the global run queue. I believe
the intuition behind skipping wakep is that since we are immediately
calling into the scheduler, we don't need to wake anything to run this
work. Unfortunately, this intuition is not correct, as it breaks
coordination with spinning Ms [1].

Consider this example scenario:

Initial conditions:

M0: Running P0, G0
M1: Spinning, holding P1 and looking for work

Timeline:

M1: Fails to find work; drops P
M0: newproc adds G1 to P0 runq
M0: does not wakep because there is a spinning M
M1: clear mp.spinning, decrement sched.nmspinning (now in "delicate dance")
M1: check sched.runqsize -> no global runq work
M0: gosched preempts G0; adds G0 to global runq
M0: does not wakep because gosched doesn't wakep
M0: schedules G1 from P0 runq
M1: check P0 runq -> no work
M1: no work -> park

G0 is stranded on the global runq with no M/P looking to run it. This is
a loss of work conservation.

As a result, G0 will have unbounded* scheduling delay, only getting
scheduled when G1 yields. Even once G1 yields, we still won't start
another P, so both G0 and G1 will switch back and forth sharing one P
when they should start another.

*The caveat to this is that today sysmon will preempt G1 after 10ms,
effectively capping the scheduling delay to 10ms, but not solving the P
underutilization problem. Sysmon's behavior here is theoretically
unnecessary, as our work conservation guarantee should allow sysmon to
avoid preemption if there are any idle Ps. Issue #60693 tracks changing
this behavior and the challenges involved.

[1] It would be OK if we unconditionally entered the scheduler as a
spinning M ourselves, as that would require schedule to call wakep when
it finds work in case there is more work.
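
A drastically simplified model of the fix (toy types and counts; the
real change is essentially one wakep call in goschedImpl):

    package main

    import "fmt"

    type sched struct {
        globalRunq []string
        nmspinning int
        parkedMs   int
    }

    // wakep wakes a parked M as a spinning M if none is already spinning.
    func (s *sched) wakep() {
        if s.nmspinning == 0 && s.parkedMs > 0 {
            s.parkedMs--
            s.nmspinning++
            fmt.Println("woke an M to look for work")
        }
    }

    // gosched publishes the yielding goroutine to the global run queue.
    // Without the wakep call, G0 can be stranded if the last spinning M
    // gave up concurrently (the race in the timeline above).
    func (s *sched) gosched(g string) {
        s.globalRunq = append(s.globalRunq, g)
        s.wakep() // the fix
    }

    func main() {
        s := &sched{parkedMs: 1}
        s.gosched("G0")
        fmt.Println("global runq:", s.globalRunq, "spinning Ms:", s.nmspinning)
    }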

Fixes #55160.

Change-Id: I2f44001239564b56ea30212553ab557051d22588
Reviewed-on: https://go-review.googlesource.com/c/go/+/501976
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
2023-07-20 21:39:57 +00:00
Michael Pratt dd5db4df56 runtime: check global runq during "delicate dance"
When a thread transitions from spinning to non-spinning, it must recheck
all sources of work because other threads may submit new work but skip
wakep because they see a spinning thread.

However, since the beginning of time (CL 7314062) we do not check the
global run queue, only the local per-P run queues.

The global run queue is checked just above the spinning checks while
dropping the P. I am unsure what the purpose of this check is. It
appears to simply be opportunistic since sched.lock is already held
there in order to drop the P. It is not sufficient to synchronize with
threads adding work because it occurs before decrementing
sched.nmspinning, which is what threads use to decide whether to wake a thread.

Resolve this by adding an explicit global run queue check alongside the
local per-P run queue checks.

Almost nothing happens between dropping sched.lock (after releasing the P)
and relocking sched.lock: just clearing mp.spinning and decrementing
sched.nmspinning. Thus it may be better to just hold sched.lock for this
entire period, but this is a larger change that I would prefer to avoid
in the freeze and backports.
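
The added recheck, in toy form (standalone illustration, not the
findRunnable code):

    package main

    import "fmt"

    func main() {
        globalRunq := []string{"G0"} // published without wakep while we were spinning
        perPWork := false            // all per-P run queues are empty

        // An M that just went from spinning to non-spinning rechecks
        // all work sources before parking.
        if perPWork {
            fmt.Println("run per-P work")
            return
        }
        if len(globalRunq) > 0 { // the added check
            fmt.Println("run", globalRunq[0], "from the global run queue")
            return
        }
        fmt.Println("park (before the fix, G0 was stranded here)")
    }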

For #55160.

Change-Id: Ifd88b5a4c561c063cedcfcfe1dd8ae04202d9666
Reviewed-on: https://go-review.googlesource.com/c/go/+/501975
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-07-20 21:39:53 +00:00
Ian Lance Taylor f51c55bfc3 runtime: adjust netpollWaiters after goroutines are ready
The runtime was adjusting netpollWaiters before the waiting
goroutines were marked as ready. This could cause the scheduler
to report a deadlock because there were no goroutines ready to run.
Keeping netpollWaiters non-zero ensures that at least one goroutine
will call netpoll(-1) from findRunnable.

This does mean that if a program has network activity for a while
and then never has it again, and also has no timers, then we can leave
an M stranded in a call to netpoll from which it will never return.
At least this won't be a common case. And it's not new; this has been
a potential problem for some time.
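
The ordering constraint, sketched with a toy counter (assumed names;
the real accounting lives in the runtime's netpoll integration):

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    var netpollWaiters atomic.Int32

    func ready(g string) { fmt.Println(g, "is runnable") }

    // netpollReady hands a woken goroutine to the scheduler *before*
    // dropping the waiter count, so a deadlock check observing the
    // in-between state still sees either a waiter or a runnable G.
    func netpollReady(g string) {
        ready(g)
        netpollWaiters.Add(-1)
    }

    func main() {
        netpollWaiters.Add(1) // a goroutine blocks on network I/O
        netpollReady("G1")
        fmt.Println("waiters:", netpollWaiters.Load())
    }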

Fixes #61454

Change-Id: I17c7f891c2bb1262fda12c6929664e64686463c8
Reviewed-on: https://go-review.googlesource.com/c/go/+/511455
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2023-07-20 15:45:57 +00:00
Jelle van den Hooff 48dbb6227a runtime: set raceignore to zero when starting a new goroutine
When reusing a g struct the runtime did not reset
g.raceignore. Initialize raceignore to zero when initially
setting racectx.

A goroutine can end with a non-zero raceignore if it exits
after calling runtime.RaceDisable without a matching
runtime.RaceEnable. If that goroutine's g is later reused
the race detector is in a weird state: the underlying
g.racectx is active, yet g.raceignore is non-zero, and
raceacquire/racerelease which check g.raceignore become
no-ops. This causes the race detector to report races when
there are none.
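
In miniature (a simplified struct, not the runtime's g):

    package main

    import "fmt"

    type g struct {
        racectx    uintptr
        raceignore int32 // depth of RaceDisable calls without RaceEnable
    }

    // reuse recycles a g for a new goroutine. Before the fix only racectx
    // was refreshed; a stale raceignore left raceacquire/racerelease
    // disabled for the new goroutine.
    func reuse(gp *g, newctx uintptr) {
        gp.racectx = newctx
        gp.raceignore = 0 // the fix
    }

    func main() {
        gp := &g{racectx: 1, raceignore: 1} // exited inside RaceDisable
        reuse(gp, 2)
        fmt.Printf("reused g: %+v\n", *gp)
    }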

Fixes #60934

Change-Id: Ib8e412f11badbaf69a480f03740da70891f4093f
Reviewed-on: https://go-review.googlesource.com/c/go/+/505055
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-06-23 16:46:25 +00:00
Michael Pratt 5b6e6d2b3d runtime: make GODEBUG=dontfreezetheworld=1 safer
GODEBUG=dontfreezetheworld=1 allows goroutines to continue execution
during fatal panic. This increases the chance that tracebackothers will
encounter running goroutines that it must skip, which is expected and
fine. However, it also introduces the risk that a goroutine transitions
from stopped to running in the middle of traceback, which is unsafe and
may cause traceback crashes.

Mitigate this by halting M execution if it naturally enters the
scheduler. This ensures that goroutines cannot transition from stopped
to running after freezetheworld. We simply deadlock rather than using
gcstopm to continue, keeping disturbance to scheduler state to a minimum.

Change-Id: I9aa8d84abf038ae17142f34f4384e920b1490e81
Reviewed-on: https://go-review.googlesource.com/c/go/+/501255
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-06-06 21:29:01 +00:00
Roland Shoemaker 2496653d0a runtime: implement SUID/SGID protections
On Unix platforms, the runtime previously did nothing special when a
program was run with either the SUID or SGID bits set. This can be
dangerous in certain cases, such as when dumping memory state, or
assuming the status of standard i/o file descriptors.

Taking cues from glibc, this change implements a set of protections when
a binary is run with SUID or SGID bits set (or is SUID/SGID-like). On
Linux, whether to enable these protections is determined by whether the
AT_SECURE flag is passed in the auxiliary vector. On platforms which
have the issetugid syscall (the BSDs, darwin, and Solaris/Illumos), that
is used. On the remaining platforms (currently only AIX) we check
!(getuid() == geteuid() && getgid() == getegid()).

Currently when we determine a binary is "tainted" (using the glibc
terminology), we implement two specific protections:
  1. we check if the file descriptors 0, 1, and 2 are open, and if they
     are not, we open them, pointing at /dev/null (or fail).
  2. we force GOTRACEBACK=none, and generally prevent dumping of
     tracebacks and registers when a program panics/aborts.

In the future we may add additional protections.

This change requires implementing issetugid on the platforms which
support it, and implementing getuid, geteuid, getgid, and getegid on
AIX.
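
The AIX-style fallback check, as a standalone sketch (the real runtime
prefers AT_SECURE or issetugid where available):

    package main

    import (
        "fmt"
        "syscall"
    )

    // tainted reports whether the process looks SUID/SGID-like: the
    // real and effective user or group IDs differ.
    func tainted() bool {
        return syscall.Getuid() != syscall.Geteuid() ||
            syscall.Getgid() != syscall.Getegid()
    }

    func main() {
        fmt.Println("SUID/SGID-like environment:", tainted())
    }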

Thanks to Vincent Dehors from Synacktiv for reporting this issue.

Fixes #60272
Fixes CVE-2023-29403

Change-Id: I73fc93f2b7a8933c192ce3eabbf1db359db7d5fa
Reviewed-on: https://team-review.git.corp.google.com/c/golang/go-private/+/1878434
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Roland Shoemaker <bracewell@google.com>
Reviewed-by: Russ Cox <rsc@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/501223
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-06-06 18:49:01 +00:00
Michael Pratt 7911f7c21d runtime: only increment extraMInUse when actually in use
Currently lockextra always increments extraMInUse, even if the M won't
be used (or doesn't even exist), such as in addExtraM. addExtraM fails
to decrement extraMInUse, so it stays elevated forever.

Fix this bug and simplify the model by moving extraMInUse out of
lockextra to getExtraM, where we know the M will actually be used.

While we're here, remove the nilokay argument from getExtraM, which is
always false.
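
Schematically (toy counter and stubs, not the real proc.go functions):

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    var extraMInUse atomic.Int32

    // getExtraM counts the M as in use at the point where it is actually
    // taken for use — not inside the low-level list lock, where list
    // maintenance such as addExtraM would inflate the count forever.
    func getExtraM() {
        // lockextra / list manipulation elided
        extraMInUse.Add(1)
    }

    func putExtraM() {
        extraMInUse.Add(-1)
        // unlockextra elided
    }

    func main() {
        getExtraM()
        putExtraM()
        fmt.Println("extra Ms in use:", extraMInUse.Load()) // 0 again
    }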

Fixes #60540.

Change-Id: I7a5d97456b3bc6ea1baeb06b5b2975e3b8dd96a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/499677
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-06-01 14:05:01 +00:00
Michael Anthony Knyszek 0adcc5ace8 runtime: cache inner pinner on P
This change caches the *pinner on the P to pool it and reduce the chance
that a new allocation is made. It also makes the *pinner keep its
refs array across unpin, again to avoid reallocating.

The Pinner benchmark results before and after this CL are attached at
the bottom of the commit message.

Note that these results are biased toward the current change because of
the last two benchmark changes. Reusing the pinner in the benchmark
itself achieves similar performance before this change. The benchmark
results thus basically just confirm that this change does cache the
inner pinner in a useful way. Using the previous benchmarks there's
actually a slight regression from the extra check in the cache; however,
the long pole is still setPinned itself.
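
At the API level, the pattern this change rewards is reusing a single
Pinner across iterations (runnable with Go 1.21+):

    package main

    import "runtime"

    func main() {
        var p runtime.Pinner
        for i := 0; i < 3; i++ {
            v := new(int)
            p.Pin(v)
            // ... pass v to C, knowing the GC won't move it ...
            p.Unpin() // with this CL, the refs array is kept for the next Pin
        }
    }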

name                                old time/op    new time/op    delta
PinnerPinUnpinBatch-8                 42.2µs ± 2%    41.5µs ± 1%      ~     (p=0.056 n=5+5)
PinnerPinUnpinBatchDouble-8            367µs ± 1%     350µs ± 1%    -4.67%  (p=0.008 n=5+5)
PinnerPinUnpinBatchTiny-8              108µs ± 0%     102µs ± 1%    -6.22%  (p=0.008 n=5+5)
PinnerPinUnpin-8                       592ns ± 8%      40ns ± 1%   -93.29%  (p=0.008 n=5+5)
PinnerPinUnpinTiny-8                   693ns ± 9%      39ns ± 1%   -94.31%  (p=0.008 n=5+5)
PinnerPinUnpinDouble-8                 843ns ± 5%     124ns ± 3%   -85.24%  (p=0.008 n=5+5)
PinnerPinUnpinParallel-8              1.11µs ± 5%    0.00µs ± 0%   -99.55%  (p=0.008 n=5+5)
PinnerPinUnpinParallelTiny-8          1.12µs ± 8%    0.00µs ± 1%   -99.55%  (p=0.008 n=5+5)
PinnerPinUnpinParallelDouble-8        1.79µs ± 4%    0.58µs ± 6%   -67.36%  (p=0.008 n=5+5)
PinnerIsPinnedOnPinned-8              5.78ns ± 0%    5.80ns ± 1%      ~     (p=0.548 n=5+5)
PinnerIsPinnedOnUnpinned-8            4.99ns ± 1%    4.98ns ± 0%      ~     (p=0.841 n=5+5)
PinnerIsPinnedOnPinnedParallel-8      0.71ns ± 0%    0.71ns ± 0%      ~     (p=0.175 n=5+5)
PinnerIsPinnedOnUnpinnedParallel-8    0.67ns ± 1%    0.66ns ± 0%      ~     (p=0.167 n=5+5)

name                                old alloc/op   new alloc/op   delta
PinnerPinUnpinBatch-8                 20.1kB ± 0%    20.0kB ± 0%    -0.32%  (p=0.008 n=5+5)
PinnerPinUnpinBatchDouble-8           52.7kB ± 0%    52.7kB ± 0%    -0.12%  (p=0.008 n=5+5)
PinnerPinUnpinBatchTiny-8             20.1kB ± 0%    20.0kB ± 0%    -0.32%  (p=0.008 n=5+5)
PinnerPinUnpin-8                       64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinTiny-8                   64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinDouble-8                 64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallel-8               64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallelTiny-8           64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallelDouble-8         64.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
PinnerIsPinnedOnPinned-8               0.00B          0.00B           ~     (all equal)
PinnerIsPinnedOnUnpinned-8             0.00B          0.00B           ~     (all equal)
PinnerIsPinnedOnPinnedParallel-8       0.00B          0.00B           ~     (all equal)
PinnerIsPinnedOnUnpinnedParallel-8     0.00B          0.00B           ~     (all equal)

name                                old allocs/op  new allocs/op  delta
PinnerPinUnpinBatch-8                   9.00 ± 0%      8.00 ± 0%   -11.11%  (p=0.008 n=5+5)
PinnerPinUnpinBatchDouble-8             11.0 ± 0%      10.0 ± 0%    -9.09%  (p=0.008 n=5+5)
PinnerPinUnpinBatchTiny-8               9.00 ± 0%      8.00 ± 0%   -11.11%  (p=0.008 n=5+5)
PinnerPinUnpin-8                        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinTiny-8                    1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinDouble-8                  1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallel-8                1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallelTiny-8            1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerPinUnpinParallelDouble-8          1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
PinnerIsPinnedOnPinned-8                0.00           0.00           ~     (all equal)
PinnerIsPinnedOnUnpinned-8              0.00           0.00           ~     (all equal)
PinnerIsPinnedOnPinnedParallel-8        0.00           0.00           ~     (all equal)
PinnerIsPinnedOnUnpinnedParallel-8      0.00           0.00           ~     (all equal)

For #46787.

Change-Id: I0cdfad77b189c425868944a4faeff3d5b97417b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/497615
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ansiwen <ansiwen@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-05-24 16:23:08 +00:00
Michael Anthony Knyszek 6f13d0bfe4 runtime: fix usage of stale "now" value for netpolling Ms
Currently pidleget gets passed "now" from before the M goes into
netpoll, resulting in incorrect accounting of idle CPU time.
lastpoll is also stored with a stale "now"; that mistake was introduced
in the same CL that introduced it for pidleget.

Recompute "now" after returning from netpoll.

Also, start tracking idle time on js/wasm at all.

Credit to Rhys Hiltner for the test case.
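
The bug in miniature, with time.Sleep standing in for the blocking
netpoll call:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        now := time.Now()
        time.Sleep(50 * time.Millisecond) // stands in for netpoll(-1)
        fmt.Println("stale now is off by", time.Since(now).Round(time.Millisecond))

        now = time.Now() // the fix: recompute after blocking
        fmt.Println("fresh now is off by", time.Since(now).Round(time.Millisecond))
    }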

Fixes #60276.

Change-Id: I5dd677471f74c915dfcf3d01621430876c3ff307
Reviewed-on: https://go-review.googlesource.com/c/go/+/496183
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-05-23 19:24:33 +00:00
Michael Anthony Knyszek 7c91e1e568 runtime: replace raw traceEv with traceBlockReason in gopark
This change adds traceBlockReason which leaks fewer implementation
details of the tracer to the runtime. Currently, gopark is called with
an explicit trace event, but this leaks details about trace internals
throughout the runtime.

This change will make it easier to change out the trace implementation.

Change-Id: Id633e1704d2c8838c6abd1214d9695537c4ac7db
Reviewed-on: https://go-review.googlesource.com/c/go/+/494185
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-05-19 20:47:25 +00:00
Michael Anthony Knyszek b60db8f7d9 runtime: formalize the trace clock
Currently the trace clock is cputicks() with comments sprinkled in
different places as to which clock to use. Since the execution tracer
redesign will use a different clock, it seems like a good time to clean
that up.

Also, rename the start/end timestamps to be more readable (i.e.
startTime vs. timeStart).

Change-Id: If43533eddd0e5f68885bb75cdbadb38da42e7584
Reviewed-on: https://go-review.googlesource.com/c/go/+/494775
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-19 18:01:57 +00:00
Michael Anthony Knyszek b1aadd034c runtime: emit STW events for all pauses, not just those for the GC
Currently STW events are only emitted for GC STWs. There's little reason
why the trace can't contain events for every STW: they're rare, so they
don't take up much space in the trace, yet being able to see when the world
was stopped is often critical to debugging certain latency issues,
especially when they stem from user-level APIs.

This change adds new "kinds" to the EvGCSTWStart event, renames the
GCSTW events to just "STW," and lets the parser deal with unknown STW
kinds for future backwards compatibility.

But, this change must break trace compatibility, so it bumps the trace
version to Go 1.21.

This change also includes a small cleanup in the trace command, which
previously checked for STW events when deciding whether user tasks
overlapped with a GC. Looking at the source, I don't see a way for STW
events to ever enter the stream that that code looks at, so that
condition has been deleted.

Change-Id: I9a5dc144092c53e92eb6950e9a5504a790ac00cf
Reviewed-on: https://go-review.googlesource.com/c/go/+/494495
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-05-19 17:06:45 +00:00
Cherry Mui c426c87012 runtime/cgo: store M for C-created thread in pthread key
This reapplies CL 485500, with a fix drafted in CL 492987 incorporated.

CL 485500 was reverted due to #60004 and #60007. #60004 is fixed in
CL 492743. #60007 is fixed in CL 492987 (incorporated in this CL).

[Original CL 485500 description]

This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.

CL 481061, by doujiang24 <doujiang24@gmail.com>, speeds up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.

CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.

We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchronization in the stack allocator.

A regression test is added to misc/cgo/testsanitizer.

[Original CL 481061 description]

This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.

CL 392854, by doujiang24 <doujiang24@gmail.com>, speeds up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.

[Original CL 392854 description]

In a C thread, it's necessary to acquire an extra M via needm when invoking a Go function from C. But needm and dropm are costly due to the signal-related syscalls they make.
So, we change cgocallback to not dropm when returning to C, which means the extra M stays bound to the C thread until it exits, avoiding needm and dropm on each C-to-Go call.
Instead, we only dropm when the C thread exits, so the extra M won't leak.

When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specific value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list when the C thread exits.

When returning back to C:
Skip dropm in cgocallback when the pthread variable has been created, so that the extra M will be reused the next time a Go function is invoked from C.

This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.

This optimization is significant, and the specific value depends on the OS and CPU, but in general it can be considered roughly 10x faster for a simple Go function call from a C thread.

For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz

[CL 479915 description]

Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.

This CL makes needm get a more accurate stack bound from
pthread (on some platforms this may still be a guess, as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.

[CL 492987 description]

On the first call into Go from a C thread, currently we set the g0
stack's high bound imprecisely based on the SP. With CL 485500, we
keep the M and don't recompute the stack bounds when it calls into
Go again. If the first call is made when the C thread uses some
deep stack, but a subsequent call is made with a shallower stack,
the SP may be above g0.stack.hi.

This is usually okay, as we don't usually check stack.hi. One place
where we do check for stack.hi is in the signal handler, in
adjustSignalStack. In particular, C TSAN delivers signals on the
g0 stack (instead of the usual signal stack). If the SP is above
g0.stack.hi, we don't see that it is on the g0 stack, and we throw.

This CL makes it get an accurate stack upper bound with the
pthread API (on the platforms where it is available).

Also add some debug print for the "handler not on signal stack"
throw.

Fixes #51676.
Fixes #59294.
Fixes #59678.
Fixes #60007.

Change-Id: Ie51c8e81ade34ec81d69fd7bce1fe0039a470776
Reviewed-on: https://go-review.googlesource.com/c/go/+/495855
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-05-17 21:53:11 +00:00
Michael Anthony Knyszek 865179164e runtime: replace sysBlockTraced with tracedSyscallEnter
sysBlockTraced is a subtle and confusing flag.

Currently, it's only used in one place: a condition around whether to
traceGoSysExit when a goroutine is about to start running. That condition
looks like "gp.syscallsp != 0 && gp.trace.sysBlockTraced".

In every case but one, "gp.syscallsp != 0" is equivalent to
"gp.trace.sysBlockTraced."

That one case is where a goroutine is running without a P and racing
with trace start (world is stopped). It switches itself back to
_Grunnable from _Gsyscall before the trace start goroutine notices, such
that the trace start goroutine fails to emit a EvGoInSyscall event for
it (EvGoInSyscall or EvGoSysBlock must precede any EvGoSysExit event).
sysBlockTraced is set unconditionally on every syscall entry and the
trace start goroutine clears it if there was no EvGoInSyscall event
emitted (i.e. did not observe _Gsyscall on the goroutine). That way when
the goroutine-without-a-P wakes up and gets scheduled, it only emits
EvGoSysExit if the flag is set, i.e. trace start didn't _clear_ the
flag.

What makes this confusing is the fact that the flag is set
unconditionally and the code relies on it being *cleared*. Really, all
it's trying to communicate is whether the tracer is aware of a
goroutine's syscall at the point where a goroutine that lost its P
during a syscall is trying to run again.

Therefore, we can replace this flag with a less subtle one:
tracedSyscallEnter. It is set when GoSysCall is traced, indicating on
the goroutine that the tracer is aware of the syscall. Later, if
traceGoSysExit is called, the tracer knows its safe to emit an event
because the tracer is aware of the syscall.

This flag is then also set at trace start, when it emits EvGoInSyscall,
which again, lets the goroutine know the tracer is aware of its syscall.

The flag is cleared by GoSysExit to indicate that the tracer is no
longer aware of any syscalls on the goroutine. It's also cleared by
trace start. This is necessary because a syscall may have been started
while a trace was stopping. If the GoSysExit isn't emitted (because it
races with the trace end STW) then the flag will be left set at the
start of the next trace period, which will result in an erroneous
GoSysExit. Instead, the flag is cleared in the same way sysBlockTraced
is today: if the tracer doesn't notice the goroutine is in a syscall, it
makes that explicit to the goroutine.

A more direct flag to use here would be one that explicitly indicates
whether EvGoInSyscall or EvGoSysBlock specifically were already emitted
for a goroutine. The reason why we don't just do this is because setting
the flag when EvGoSysBlock is emitted would be racy: EvGoSysBlock is
emitted by whatever thread is stealing the P out from under the
syscalling goroutine, so it would need to synchronize with the goroutine
its emitting the event for.

The end result of all this is that the new flag can be managed entirely
within trace.go, hiding another implementation detail about the tracer.

Tested with `stress ./trace.test -test.run="TestTraceStressStartStop"`
which was occasionally failing before the CL in which sysBlockTraced was
added (CL 9132). I also confirmed also that this test is still sensitive
to `EvGoSysExit` by removing the one use of sysBlockTraced. The result
is about a 5% error rate. If there is something very subtly wrong about
how this CL emits `EvGoSysExit`, I would expect to see it as a test
failure. Instead:

    53m55s: 200434 runs so far, 0 failures

Change-Id: If1d24ee6b6926eec7e90cdb66039a5abac819d9b
Reviewed-on: https://go-review.googlesource.com/c/go/+/494715
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-05-17 15:50:17 +00:00
Michael Anthony Knyszek e3ada56537 runtime: hide sysExitTicks a little better
Just another step to hiding implementation details.

Change-Id: I71b7cc522d18c23f03a9bf32e428279e62b39a89
Reviewed-on: https://go-review.googlesource.com/c/go/+/494192
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-05-17 14:53:01 +00:00
Michael Anthony Knyszek c213c905a2 runtime: capture per-g trace state in a type
More tightening up of the tracer's interface.

This increases the size of each G very slightly, which isn't great, but
we stay within the same size class, so actually memory use will be
unchanged.

Change-Id: I7d1f5798edcf437c212beb1e1a2619eab833aafb
Reviewed-on: https://go-review.googlesource.com/c/go/+/494188
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-05-17 14:45:40 +00:00
Michael Anthony Knyszek c890d40d0d runtime: factor out oneNewExtraM trace code
In the interest of further cleaning up the trace.go API, move the trace
logic in oneNewExtraM into its own function.

Change-Id: I5cf478cb8cd0d301ee3b068347ed48ce768b8882
Reviewed-on: https://go-review.googlesource.com/c/go/+/494186
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-17 14:43:25 +00:00
Michael Anthony Knyszek 62bf7a4809 runtime: hide trace.shutdown behind traceShuttingDown
Change-Id: I0b123e65f40570caeee611679d80dc27034d5a52
Reviewed-on: https://go-review.googlesource.com/c/go/+/494183
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-05-11 21:27:10 +00:00
Michael Anthony Knyszek 8992bb19ad runtime: replace trace.enabled with traceEnabled
[git-generate]
cd src/runtime
grep -l 'trace\.enabled' *.go | grep -v "trace.go" | xargs sed -i 's/trace\.enabled/traceEnabled()/g'

Change-Id: I14c7821c1134690b18c8abc0edd27abcdabcad72
Reviewed-on: https://go-review.googlesource.com/c/go/+/494181
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-05-11 21:27:08 +00:00
Michael Anthony Knyszek a2737b1aab runtime: hide trace lock init details
This change is in service of hiding more execution trace implementation
details for big changes to come.

Change-Id: I49b9716a7bf285d23c86b58912a05eff4ddc2213
Reviewed-on: https://go-review.googlesource.com/c/go/+/494182
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-11 19:07:04 +00:00
Michael Pratt 734b26d4b9 runtime: exclude extra M's from debug.SetMaxThreads
The purpose of the debug.SetMaxThreads limit is to avoid an accidental
fork bomb from something like millions of goroutines blocking on system
calls, causing the runtime to create millions of threads.

By definition, the runtime doesn't create the threads that are created
in C, so they aren't a problem here, and we can exclude them from the
limit. If C wants to create tens of thousands of threads, who are we to
say no?

Fixes #60004.

Change-Id: I62b875890718b406abca42a9a4078391e25aa21b
Reviewed-on: https://go-review.googlesource.com/c/go/+/492743
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
2023-05-09 17:55:20 +00:00
Michael Pratt 77b1f23af7 runtime: clean up extra M API
There are quite a few locations that get/put Ms from the extra M list,
but the API is pretty clumsy to use. Add an easier-to-use getExtraM /
putExtraM API.

There are only two minor semantic changes:

1. dropm no longer calls setg(nil) inside the lockextra critical
   section. It is important that this thread no longer references the G
   (and in turn M) once it is published to the extra M list and another
   thread could acquire it. But there is no reason that needs to happen
   only after lockextra.

2. extraMLength (renamed from extraMCount) is no longer protected by
   lockextra and is instead simply an atomic (though writes are still in
   the critical section). The previous readers all dropped lockextra
   before using the value they read anyway.

For #60004.

Change-Id: Ifca4d6c84d605423855d89f49af400ca07de56f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/492742
Run-TryBot: Michael Pratt <mpratt@google.com>
Commit-Queue: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
2023-05-08 16:39:39 +00:00
Chressie Himpel 72c33a5ef0 Revert "runtime/cgo: store M for C-created thread in pthread key"
This reverts CL 485500.

Reason for revert: This breaks internal tests at Google, see b/280861579 and b/280820455.

Change-Id: I426278d400f7611170918fc07c524cb059b9cc55
Reviewed-on: https://go-review.googlesource.com/c/go/+/492995
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Chressie Himpel <chressie@google.com>
2023-05-05 14:37:29 +00:00
Nick Ripley 265d19ed52 runtime/trace: avoid frame pointer unwinding for events during cgocallbackg
The current mp.incgocallback() logic allows for trace events to be
recorded using frame pointer unwinding during cgocallbackg when they
shouldn't be. Specifically, mp.incgo will be false during the
reentersyscall call at the end. It's possible to crash with tracing
enabled because of this, if C code which uses the frame pointer register
for other purposes calls into Go. This can be seen, for example, by
forcing testprogcgo/trace_unix.c to write a garbage value to RBP prior
to calling into Go.

We can drop the mp.incgo check, and instead conservatively avoid doing
frame pointer unwinding if there is any C on the stack. This is the case
if mp.ncgo > 0, or if mp.isextra is true (meaning we're coming from a
thread created by C). Rename incgocallback to reflect that we're
checking if there's any C on the stack. We can also move the ncgo
increment in cgocall closer to where the transition to C happens, which
lets us use frame pointer unwinding for the entersyscall event during
the first Go-to-C call on a stack, when there isn't yet any C on the
stack.

Fixes #59830.

Change-Id: If178a705a9d38d0d2fb19589a9e669cd982d32cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/488755
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Nick Ripley <nick.ripley@datadoghq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-28 21:07:22 +00:00
Lucien Coffe ff059add10 runtime: resolve checkdead panic by refining `startm` lock handling in caller context
This change addresses a `checkdead` panic caused by a race condition between
`sysmon->startm` and `checkdead` callers, due to prematurely releasing the
scheduler lock. The solution involves allowing a `startm` caller to acquire the
scheduler lock and call `startm` in this context. A new `lockheld` bool
argument is added to `startm`, which manages all lock and unlock calls within
the function. The `startIdle` function variable in `injectglist` is updated to
call `startm` with the lock held, ensuring proper lock handling in this
specific case. This refined lock handling resolves the observed race condition
issue.
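
The general locking pattern, divorced from the scheduler (toy code,
not the actual startm):

    package main

    import (
        "fmt"
        "sync"
    )

    var mu sync.Mutex // stands in for sched.lock

    // startm lets a caller that already holds the lock say so, so the
    // function neither re-locks (deadlock) nor releases the lock out
    // from under the caller (the race this CL fixes).
    func startm(lockheld bool) {
        if !lockheld {
            mu.Lock()
            defer mu.Unlock()
        }
        fmt.Println("startm running with the scheduler lock held")
    }

    func main() {
        startm(false) // acquires and releases internally

        mu.Lock()
        startm(true) // caller retains ownership across the call
        mu.Unlock()
    }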

Fixes #59600

Change-Id: I11663a15536c10c773fc2fde291d959099aa71be
Reviewed-on: https://go-review.googlesource.com/c/go/+/487316
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
2023-04-28 15:57:36 +00:00
Michael Pratt 7b874619be runtime/cgo: store M for C-created thread in pthread key
This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.

CL 481061, by doujiang24 <doujiang24@gmail.com>, speeds up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.

[Original CL 481061 description]

This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.

CL 392854, by doujiang24 <doujiang24@gmail.com>, speeds up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.

[Original CL 392854 description]

In a C thread, it's necessary to acquire an extra M via needm when invoking a Go function from C. But needm and dropm are costly due to the signal-related syscalls they make.
So, we change cgocallback to not dropm when returning to C, which means the extra M stays bound to the C thread until it exits, avoiding needm and dropm on each C-to-Go call.
Instead, we only dropm when the C thread exits, so the extra M won't leak.

When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specific value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list when the C thread exits.

When returning back to C:
Skip dropm in cgocallback when the pthread variable has been created, so that the extra M will be reused the next time a Go function is invoked from C.

This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.

This optimization is significant, and the specific value depends on the OS and CPU, but in general it can be considered roughly 10x faster for a simple Go function call from a C thread.

For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz

[CL 479915 description]

Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.

This CL makes needm get a more accurate stack bound from
pthread (on some platforms this may still be a guess, as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.

[CL 485500 description]

CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.

We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchronization in the stack allocator.

A regression test is added to misc/cgo/testsanitizer.

Fixes #51676.
Fixes #59294.
Fixes #59678.

Change-Id: Ic62be31a06ee83568215e875a891df37084e08ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/485500
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
2023-04-26 19:25:46 +00:00
Ian Lance Taylor f00e947cdf runtime: add raceFiniLock to lock ranking
Also preserve the PC/SP in reentersyscall when doing lock ranking.
The test is TestDestructorCallbackRace with the staticlockranking
experiment enabled.

For #59711

Change-Id: I87ac1d121ec0d399de369666834891ab9e7d11b0
Reviewed-on: https://go-review.googlesource.com/c/go/+/487955
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
2023-04-24 21:37:06 +00:00
Lucien Coffe f229787aff runtime: prevent double lock in checkdead by unlocking before throws
This change resolves an issue where checkdead could result in a double lock when schedtrace is enabled. The fix involves adding unlocks before all throws in the checkdead function to ensure the scheduler lock is properly released.

Fixes #59758

Change-Id: If3ddf9969f4582c3c88dee9b9ecc355a63958103
Reviewed-on: https://go-review.googlesource.com/c/go/+/487375
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-04-21 20:14:08 +00:00
Austin Clements 87272bd1a1 runtime: tidy _Stack* constant naming
For #59670.

Change-Id: I0efa743edc08e48dc8d906803ba45e9f641369db
Reviewed-on: https://go-review.googlesource.com/c/go/+/486977
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2023-04-21 19:29:00 +00:00
Austin Clements 2668a190ba internal/abi, runtime, cmd: merge funcFlag_* consts into internal/abi
For #59670.

Change-Id: Ie784ba4dd2701e4f455e1abde4a6bfebee4b1387
Reviewed-on: https://go-review.googlesource.com/c/go/+/485496
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Auto-Submit: Austin Clements <austin@google.com>
2023-04-21 19:28:46 +00:00
Austin Clements df777cfa15 Revert "runtime: tidy _Stack* constant naming"
This reverts commit CL 486381.

Submitted out of order and breaks bootstrap.

Change-Id: Ia472111cb966e884a48f8ee3893b3bf4b4f4f875
Reviewed-on: https://go-review.googlesource.com/c/go/+/486915
Reviewed-by: David Chase <drchase@google.com>
TryBot-Bypass: Austin Clements <austin@google.com>
2023-04-20 16:19:25 +00:00
Austin Clements ef8e3b7fa4 runtime: tidy _Stack* constant naming
For #59670.

Change-Id: I4476d6f92663e8a825d063d6e6a7fc9a2ac99d4d
Reviewed-on: https://go-review.googlesource.com/c/go/+/486381
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-04-20 16:05:23 +00:00
Michael Pratt 94850c6f79 Revert "runtime/cgo: store M for C-created thread in pthread key"
This reverts CL 481061.

Reason for revert: When built with C TSAN, x_cgo_getstackbound triggers
race detection on `g->stacklo` because the synchronization is in Go,
which isn't instrumented.

For #51676.
For #59294.
For #59678.

Change-Id: I38afcda9fcffd6537582a39a5214bc23dc147d47
Reviewed-on: https://go-review.googlesource.com/c/go/+/485275
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2023-04-17 18:47:08 +00:00
Keith Randall 2b92c39fe0 cmd/link: establish dependable package initialization order
(This is a retry of CL 462035 which was reverted at 474976.
The only change from that CL is the aix fix SRODATA->SNOPTRDATA
at inittask.go:141)

As described here:

https://github.com/golang/go/issues/31636#issuecomment-493271830

"Find the lexically earliest package that is not initialized yet,
but has had all its dependencies initialized, initialize that package,
 and repeat."

Simplify the runtime a bit, by just computing the ordering required
in the linker and giving a list to the runtime.

Update #31636
Fixes #57411

RELNOTE=yes

Change-Id: I28c09451d6aa677d7394c179d23c2c02c503fc56
Reviewed-on: https://go-review.googlesource.com/c/go/+/478916
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-14 16:55:22 +00:00
doujiang24 ccad8a9f9c runtime/cgo: store M for C-created thread in pthread key
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.

CL 392854, by doujiang24 <doujiang24@gmail.com>, speeds up C-to-Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C-to-Go calls after
the C code has used some stack, but that CL is itself buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug in CL 479915.

[Original CL 392854 description]

In a C thread, invoking a Go function from C requires acquiring an
extra M via needm, and needm and dropm are costly because of their
signal-related syscalls. So we no longer drop the M when returning
to C: the extra M stays bound to the C thread until the thread
exits, avoiding needm and dropm on every C-to-Go call. The M is
dropped only when the C thread exits, so the extra M won't leak.

When invoking a Go function from C:
Allocate a pthread key using pthread_key_create, once per shared
object, and register a thread-exit-time destructor. Then store the
g0 of the current M into the key's thread-specific value, once per
C thread, so that the destructor puts the extra M back onto the
extra M list when the C thread exits.

When returning back to C:
Skip dropm in cgocallback when the pthread key has been created,
so that the extra M will be reused the next time a Go function is
invoked from C.
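
As a rough, self-contained cgo illustration of that pattern (not
the runtime's actual code; mKey, destructor, and demo are
hypothetical names), the pthread key's destructor runs at C-thread
exit, which is where the deferred dropm would happen:

    package main

    /*
    #cgo LDFLAGS: -pthread
    #include <pthread.h>
    #include <stdio.h>

    static pthread_key_t mKey;

    // Runs when the C thread exits; in the runtime this is where
    // the extra M is put back onto the extra M list (dropm).
    static void destructor(void *val) {
        printf("thread exit: releasing %s\n", (char *)val);
    }

    static void *threadFn(void *arg) {
        // Once per C thread: a non-NULL value arms the destructor.
        pthread_setspecific(mKey, "extra M");
        return NULL;
    }

    static void demo(void) {
        pthread_key_create(&mKey, destructor); // once per shared object
        pthread_t t;
        pthread_create(&t, NULL, threadFn, NULL);
        pthread_join(t, NULL);
    }
    */
    import "C"

    func main() { C.demo() }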

This is purely a performance optimization. The old behavior, in
which needm and dropm happen on each cgo call, is still correct,
and we have to keep it on systems with cgo but without pthreads,
like Windows.

This optimization is significant; the exact speedup depends on the
OS and CPU, but in general a simple Go function call from a C
thread becomes roughly 10x faster.

For the newly added BenchmarkCGoInCThread, in which a C-created
thread calls a Go function (see the sketch below), some results:
1. 28x faster, from 3395 ns/op to 121 ns/op, on macOS with an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. 6.5x faster, from 1495 ns/op to 230 ns/op, on Linux with an Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
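
A minimal sketch of that scenario, a thread created with
pthread_create calling an exported Go function; the file layout
and names are illustrative, not the actual benchmark source:

    // main.go
    package main

    // #cgo LDFLAGS: -pthread
    // extern void startCThread(void); // defined in cthread.c
    import "C"

    import "fmt"

    //export goCalledFromCThread
    func goCalledFromCThread() {
        // The first call on a C-created thread acquires an extra M
        // via needm; with the pthread key it stays bound afterward.
        fmt.Println("Go running on a C-created thread")
    }

    func main() { C.startCThread() }

    /* cthread.c, compiled into the same package by go build: */

    #include <pthread.h>

    extern void goCalledFromCThread(void); /* the Go export above */

    static void *threadFn(void *arg) {
        goCalledFromCThread();
        return NULL;
    }

    void startCThread(void) {
        pthread_t t;
        pthread_create(&t, NULL, threadFn, NULL);
        pthread_join(t, NULL);
    }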

[CL 479915 description]

Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returned to C, we dropped the M, and the next
time C called into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0 without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.

This CL makes needm get a more accurate stack bound from pthread
(on some platforms this may still be a guess, as we don't know
exactly where we are in the C stack), but it is probably better
than simply assuming 32K.
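
For a concrete sense of what a stack bound from pthread can look
like, here is a hedged cgo sketch using glibc's pthread_getattr_np
(Linux-specific; other platforms need different calls, and this is
not the runtime's implementation):

    package main

    /*
    #cgo CFLAGS: -D_GNU_SOURCE
    #cgo LDFLAGS: -pthread
    #include <pthread.h>

    // Query the current thread's stack base and size (glibc only).
    static void stackBounds(void **addr, size_t *size) {
        pthread_attr_t attr;
        pthread_getattr_np(pthread_self(), &attr);
        pthread_attr_getstack(&attr, addr, size);
        pthread_attr_destroy(&attr);
    }
    */
    import "C"

    import (
        "fmt"
        "unsafe"
    )

    func main() {
        var addr unsafe.Pointer
        var size C.size_t
        C.stackBounds(&addr, &size)
        // addr is the lowest usable address; addr+size is the base.
        fmt.Printf("stack lo=%p size=%d bytes\n", addr, size)
    }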

Fixes #51676.
Fixes #59294.

Change-Id: I9bf1400106d5c08ce621d2ed1df3a2d9e3f55494
Reviewed-on: https://go-review.googlesource.com/c/go/+/481061
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: DeJiang Zhu (doujiang) <doujiang24@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-03 18:34:11 +00:00
Cherry Mui bfe3c678ab Revert "runtime/cgo: store M for C-created thread in pthread key"
This reverts CL 392854.

Reason for revert: caused #59294, which surfaced in Google
internal tests. The attempted fix for #59294 caused more breakage.

Change-Id: I5a061561ac2740856b7ecc09725ac28bd30f8bba
Reviewed-on: https://go-review.googlesource.com/c/go/+/481060
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-03-31 19:26:35 +00:00
Cherry Mui 63ef9059a2 Revert "runtime: get a better g0 stack bound in needm"
This reverts CL 479915.

Reason for revert: breaks a lot of Google internal tests.

Change-Id: I13a9422e810af7ba58cbf4a7e6e55f4d8cc0ca51
Reviewed-on: https://go-review.googlesource.com/c/go/+/481055
Reviewed-by: Chressie Himpel <chressie@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-03-31 18:42:48 +00:00
Cherry Mui 443eb9757c runtime: get a better g0 stack bound in needm
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returned to C, we dropped the M, and the next
time C called into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0 without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.

This CL makes needm get a more accurate stack bound from pthread
(on some platforms this may still be a guess, as we don't know
exactly where we are in the C stack), but it is probably better
than simply assuming 32K.

For #59294.

Change-Id: Ie52a8f931e0648d8753e4c1dbe45468b8748b527
Reviewed-on: https://go-review.googlesource.com/c/go/+/479915
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-03-30 23:23:55 +00:00
Felix Geisendörfer 3dd221a94d runtime/trace: use regular unwinding for cgo callbacks
Introduce a new m.incgocallback field that is true while C code
calls into Go code. Use it in the tracer to fall back to the
default unwinder instead of frame pointer unwinding in this
scenario. The existing fields (incgo, ncgo) were not sufficient to
detect the case where a thread created in C calls into Go code.
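
The resulting tracer decision amounts to a branch like the one
below; this standalone sketch is illustrative, and only the
incgocallback field name comes from this CL:

    package main

    import "fmt"

    type m struct {
        incgocallback bool // true while C code calls into Go code
    }

    func chooseUnwinder(mp *m) string {
        if mp.incgocallback {
            // C frames may lack frame pointers, and a registered
            // cgo symbolizer can handle them: use the default,
            // metadata-based unwinder.
            return "default unwinder"
        }
        // Pure Go stacks keep frame pointers on amd64/arm64.
        return "frame pointer unwinder"
    }

    func main() {
        fmt.Println(chooseUnwinder(&m{incgocallback: true}))
        fmt.Println(chooseUnwinder(&m{}))
    }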

Motivation:

1. Take advantage of a cgo symbolizer, if registered, to unwind through
   C stacks without frame pointers.
2. Reduce the chance of crashes. It seems unsafe to follow frame
   pointers when there could be C code that was compiled without frame
   pointers.

Removing the curgp.m.incgocallback check in traceStackID shows the
following minor differences between frame pointer unwinding and the
default unwinder when there is no cgo symbolizer involved.

    trace_test.go:60: "goCalledFromCThread": got stack:
        main.goCalledFromCThread
        	/src/runtime/testdata/testprogcgo/trace.go:58
        _cgoexp_45c15a3efb3a_goCalledFromCThread
        	_cgo_gotypes.go:694
        runtime.cgocallbackg1
        	/src/runtime/cgocall.go:318
        runtime.cgocallbackg
        	/src/runtime/cgocall.go:236
        runtime.cgocallback
        	/src/runtime/asm_amd64.s:998
        crosscall2
        	/src/runtime/cgo/asm_amd64.s:30

        want stack:
        main.goCalledFromCThread
        	/src/runtime/testdata/testprogcgo/trace.go:58
        _cgoexp_45c15a3efb3a_goCalledFromCThread
        	_cgo_gotypes.go:694
        runtime.cgocallbackg1
        	/src/runtime/cgocall.go:318
        runtime.cgocallbackg
        	/src/runtime/cgocall.go:236
        runtime.cgocallback
        	/src/runtime/asm_amd64.s:998

    trace_test.go:60: "goCalledFromC": got stack:
        main.goCalledFromC
        	/src/runtime/testdata/testprogcgo/trace.go:51
        _cgoexp_45c15a3efb3a_goCalledFromC
        	_cgo_gotypes.go:687
        runtime.cgocallbackg1
        	/src/runtime/cgocall.go:318
        runtime.cgocallbackg
        	/src/runtime/cgocall.go:236
        runtime.cgocallback
        	/src/runtime/asm_amd64.s:998
        crosscall2
        	/src/runtime/cgo/asm_amd64.s:30
        runtime.asmcgocall
        	/src/runtime/asm_amd64.s:848
        main._Cfunc_cCalledFromGo
        	_cgo_gotypes.go:263
        main.goCalledFromGo
        	/src/runtime/testdata/testprogcgo/trace.go:46
        main.Trace
        	/src/runtime/testdata/testprogcgo/trace.go:37
        main.main
        	/src/runtime/testdata/testprogcgo/main.go:34

        want stack:
        main.goCalledFromC
        	/src/runtime/testdata/testprogcgo/trace.go:51
        _cgoexp_45c15a3efb3a_goCalledFromC
        	_cgo_gotypes.go:687
        runtime.cgocallbackg1
        	/src/runtime/cgocall.go:318
        runtime.cgocallbackg
        	/src/runtime/cgocall.go:236
        runtime.cgocallback
        	/src/runtime/asm_amd64.s:998
        runtime.systemstack_switch
        	/src/runtime/asm_amd64.s:463
        runtime.cgocall
        	/src/runtime/cgocall.go:168
        main._Cfunc_cCalledFromGo
        	_cgo_gotypes.go:263
        main.goCalledFromGo
        	/src/runtime/testdata/testprogcgo/trace.go:46
        main.Trace
        	/src/runtime/testdata/testprogcgo/trace.go:37
        main.main
        	/src/runtime/testdata/testprogcgo/main.go:34

For #16638

Change-Id: I95fa27a3170c5abd923afc6eadab4eae777ced31
Reviewed-on: https://go-review.googlesource.com/c/go/+/474916
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
2023-03-30 19:18:12 +00:00