Update #40995
Change-Id: I6963ead1a7c4520092361cce80edb17010e7f436
Reviewed-on: https://go-review.googlesource.com/c/go/+/250579
Trust: Joel Sing <joel@sing.id.au>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
The g register is now in X27 (previously X4, which collided with TP usage). Remove
X27 from preempt save/restore.
Change-Id: I9dd38ec3a8222fa0710757463769dbfac8ae7d20
Reviewed-on: https://go-review.googlesource.com/c/go/+/265517
Trust: Joel Sing <joel@sing.id.au>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Following golang.org/cl/259578, findrunnable still must touch every
other P in checkTimers in order to look for timers to steal. This scales
poorly with GOMAXPROCS and potentially performs poorly by pulling remote
Ps into cache.
Add timerpMask, a bitmask that tracks whether each P may have any timers
on its timer heap.
Ideally we would update this field on any timer add / remove to always
keep it up to date. Unfortunately, updating a shared global structure is
antithetical to sharding timers by P, and doing so approximately doubles
the cost of addtimer / deltimer in microbenchmarks.
Instead we only (potentially) clear the mask when the P goes idle. This
covers the best case of avoiding looking at a P _at all_ when it is idle
and has no timers. See the comment on updateTimerPMask for more details
on the trade-off. Future CLs may be able to expand cases we can avoid
looking at the timers.
Note that the addition of idlepMask to p.init is a no-op. The zero value
of the mask is the correct init value so it is not necessary, but it is
included for clarity.
Benchmark results from WakeupParallel/syscall/pair/race/1ms (see
golang.org/cl/228577). Note that these are on top of golang.org/cl/259578:
name old msec new msec delta
Perf-task-clock-8 244 ± 4% 246 ± 4% ~ (p=0.841 n=5+5)
Perf-task-clock-16 247 ±11% 252 ± 4% ~ (p=1.000 n=5+5)
Perf-task-clock-32 270 ± 1% 268 ± 2% ~ (p=0.548 n=5+5)
Perf-task-clock-64 302 ± 3% 296 ± 1% ~ (p=0.222 n=5+5)
Perf-task-clock-128 358 ± 3% 352 ± 2% ~ (p=0.310 n=5+5)
Perf-task-clock-256 483 ± 3% 458 ± 1% -5.16% (p=0.008 n=5+5)
Perf-task-clock-512 663 ± 1% 612 ± 4% -7.61% (p=0.008 n=5+5)
Perf-task-clock-1024 1.06k ± 1% 0.95k ± 2% -10.24% (p=0.008 n=5+5)
Updates #28808
Updates #18237
Change-Id: I4239cd89f21ad16dfbbef58d81981da48acd0605
Reviewed-on: https://go-review.googlesource.com/c/go/+/264477
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Trust: Michael Pratt <mpratt@google.com>
This reverts commit 8f26b57f9a.
Reason for revert: break a bunch of code, include standard library.
Fixes#42123
Change-Id: Ife90ecbafd2cb395623d1db555fbfc9c1b0098e0
Reviewed-on: https://go-review.googlesource.com/c/go/+/264026
Trust: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Sysmon can actually get the RW lock execLock while holding the sysmon
lock (if no M is available), so there is an edge from lockRankSysmon to
lockRankRwmutexR. The stack trace is sysmon() [gets sched.sysmonlock] ->
startm() -> newm() -> newm1() -> execLock.runlock() [gets
execLock.rLock]
Change-Id: I9658659ba3899afb5219114d66b989abd50540db
Reviewed-on: https://go-review.googlesource.com/c/go/+/265721
Trust: Dan Scales <danscales@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Now that we have the G register saved, we can enable asynchronous
preemption for pure Go programs on darwin/arm64.
Updates #38485, #36365.
Change-Id: Ic654fa4dce369efe289b38d59cf1a184b358fe9e
Reviewed-on: https://go-review.googlesource.com/c/go/+/265120
Trust: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Now that we always have TLS set up, we can always save the G
register, regardless of whether cgo is used. This makes pure Go
programs signal-safe.
Updates #38485.
Change-Id: Icbc69acf0e2a5652fbcbbd074258a1a5efe87f1a
Reviewed-on: https://go-review.googlesource.com/c/go/+/265119
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Currently, on darwin/arm64 we set up TLS using cgo. TLS is not
set for pure Go programs. As we use libc for syscalls on darwin,
we need to save the G register before the libc call. Otherwise it
is not signal-safe, as a signal may land during the execution of
a libc function, where the G register may be clobbered.
This CL initializes TLS in Go, by calling the pthread functions
directly without cgo. This makes it possible to save the G
register to TLS in pure Go programs (done in a later CL).
Inspired by Elias's CL 209197. Write the logic in Go instead of
assembly.
Updates #38485, #35853.
Change-Id: I257ba2a411ad387b2f4d50d10129d37fec7a226e
Reviewed-on: https://go-review.googlesource.com/c/go/+/265118
Trust: Cherry Zhang <cherryyz@google.com>
Trust: Elias Naur <mail@eliasnaur.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Implement runtime.duffzero and runtime.duffcopy for riscv64.
Use obj.ADUFFZERO/obj.ADUFFCOPY for medium size, word aligned
zeroing/moving.
Change-Id: I42ec622055630c94cb77e286d8d33dbe7c9f846c
Reviewed-on: https://go-review.googlesource.com/c/go/+/237797
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
It requires cgo. Also, skip the test on windows and plan9.
For #42207
Change-Id: I8522773f93bc3f9826506a41a08b86a083262e31
Reviewed-on: https://go-review.googlesource.com/c/go/+/265778
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Otherwise, if a signal occurs just after we allocated the M,
we can deadlock if the signal handler needs to allocate an M
itself.
Fixes#42207
Change-Id: I76f44547f419e8b1c14cbf49bf602c6e645d8c14
Reviewed-on: https://go-review.googlesource.com/c/go/+/265759
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
It has been observed that setgid hangs when using cgo with musl.
This fix ensures that signal 34 gets handled in an appropriate way,
like signal 33 when using glibc.
Fixes#39343
Change-Id: I89565663e2c361f62cbccfe80aaedf290bd58d57
Reviewed-on: https://go-review.googlesource.com/c/go/+/236518
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Some programs have a lot of timers that they adjust both forward and
backward in time. This can cause a large number of timerModifiedEarlier
timers. In practice these timers are used for I/O deadlines and are
rarely reached. The effect is that the runtime spends a lot of time
in adjusttimers making sure that there are no timerModifiedEarlier
timers, but the effort is wasted because none of the adjusted timers
are near the top of the timer heap anyhow.
Avoid much of this extra work by keeping track of the earliest known
timerModifiedEarlier timer. This lets us skip adjusttimers if we know
that none of the timers will be ready to run anyhow. We will still
eventually run it, when we reach the deadline of the earliest known
timerModifiedEarlier, although in practice that timer has likely
been removed. When we do run adjusttimers, we will reset all of the
timerModifiedEarlier timers, and clear our notion of when we need
to run adjusttimers again.
This effect should be to significantly reduce the number of times we
walk through the timer list in adjusttimers.
Fixes#41699
Change-Id: I38eb2be611fb34e3017bb33d0a9ed40d75fb414f
Reviewed-on: https://go-review.googlesource.com/c/go/+/258303
Trust: Ian Lance Taylor <iant@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Pretty minor concern, but after auditing the compiler/runtime for
conversions from pointers to go:notinheap types to unsafe.Pointer,
this is the only remaining one I found.
Update #42076
Change-Id: I81d5b893c9ada2fc19a51c2559262f2e9ff71c35
Reviewed-on: https://go-review.googlesource.com/c/go/+/265757
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
pointers to go:notinheap types should be treated as scalars. That
means they shouldn't be stored directly in interfaces, or directly
in reflect.Value.ptr.
Also be sure to use uintpr to compare such pointers in reflect.DeepEqual.
Fixes#42076
Change-Id: I53735f6d434e9c3108d4940bd1bae14c61ef2a74
Reviewed-on: https://go-review.googlesource.com/c/go/+/264480
Trust: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Change the scheduler to treat expired timers with the same approach it
uses to steal runnable G's.
Previously the scheduler ignored timers on P's not marked for
preemption. That had the downside that any G's waiting on those expired
timers starved until the G running on their P completed or was
preempted. That could take as long as 20ms if sysmon was in a 10ms
wake up cycle.
In addition, a spinning P that ignored an expired timer and found no
other work would stop despite there being available work, missing the
opportunity for greater parallelism.
With this change the scheduler no longer ignores timers on
non-preemptable P's or relies on sysmon as a backstop to start threads
when timers expire. Instead it wakes an idle P, if needed, when
creating a new timer because it cannot predict if the current P will
have a scheduling opportunity before the new timer expires. The P it
wakes will determine how long to sleep and block on the netpoller for
the required time, potentially stealing the new timer when it wakes.
This change also eliminates a race between a spinning P transitioning
to idle concurrently with timer creation using the same pattern used
for submission of new goroutines in the same window.
Benchmark analysis:
CL 232199, which was included in Go 1.15 improved timer latency over Go
1.14 by allowing P's to steal timers from P's not marked for preemption.
The benchmarks added in this CL measure that improvement in the
ParallelTimerLatency benchmark seen below. However, Go 1.15 still relies
on sysmon to notice expired timers in some situations and sysmon can
sleep for up to 10ms before waking to check timers. This CL fixes that
shortcoming with modest regression on other benchmarks.
name \ avg-late-ns go14.time.bench go15.time.bench fix.time.bench
ParallelTimerLatency-8 17.3M ± 3% 7.9M ± 0% 0.2M ± 3%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=1-8 53.4k ±23% 50.7k ±31% 252.4k ± 9%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=2-8 204k ±14% 90k ±58% 188k ±12%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=3-8 1.17M ± 0% 0.11M ± 5% 0.11M ± 2%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=4-8 1.81M ±44% 0.10M ± 4% 0.10M ± 2%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=5-8 2.28M ±66% 0.09M ±13% 0.08M ±21%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=6-8 2.84M ±85% 0.07M ±15% 0.07M ±18%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=7-8 2.13M ±27% 0.06M ± 4% 0.06M ± 9%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=8-8 2.63M ± 6% 0.06M ±11% 0.06M ± 9%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=9-8 3.32M ±17% 0.06M ±16% 0.07M ±14%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=10-8 8.46M ±20% 4.37M ±21% 5.03M ±23%
StaggeredTickerLatency/work-dur=2ms/tickers-per-P=1-8 1.02M ± 1% 0.20M ± 2% 0.20M ± 2%
name \ max-late-ns go14.time.bench go15.time.bench fix.time.bench
ParallelTimerLatency-8 18.3M ± 1% 8.2M ± 0% 0.5M ±12%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=1-8 141k ±19% 127k ±19% 1129k ± 3%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=2-8 2.78M ± 4% 1.23M ±15% 1.26M ± 5%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=3-8 6.05M ± 5% 0.67M ±56% 0.81M ±33%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=4-8 7.93M ±20% 0.71M ±46% 0.76M ±41%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=5-8 9.41M ±30% 0.92M ±23% 0.81M ±44%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=6-8 10.8M ±42% 0.8M ±41% 0.8M ±30%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=7-8 9.62M ±24% 0.77M ±38% 0.88M ±27%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=8-8 10.6M ±10% 0.8M ±32% 0.7M ±27%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=9-8 11.9M ±36% 0.6M ±46% 0.8M ±38%
StaggeredTickerLatency/work-dur=300µs/tickers-per-P=10-8 36.8M ±21% 24.7M ±21% 27.5M ±16%
StaggeredTickerLatency/work-dur=2ms/tickers-per-P=1-8 2.12M ± 2% 1.02M ±11% 1.03M ± 7%
Other time benchmarks:
name \ time/op go14.time.bench go15.time.bench fix.time.bench
AfterFunc-8 137µs ± 4% 123µs ± 4% 131µs ± 2%
After-8 212µs ± 3% 195µs ± 4% 204µs ± 7%
Stop-8 165µs ± 6% 156µs ± 2% 151µs ±12%
SimultaneousAfterFunc-8 260µs ± 3% 248µs ± 3% 284µs ± 2%
StartStop-8 65.8µs ± 9% 64.4µs ± 7% 67.3µs ±15%
Reset-8 13.6µs ± 2% 9.6µs ± 2% 9.1µs ± 4%
Sleep-8 307µs ± 4% 306µs ± 3% 320µs ± 2%
Ticker-8 53.0µs ± 5% 54.5µs ± 5% 57.0µs ±11%
TickerReset-8 9.24µs ± 2% 9.51µs ± 3%
TickerResetNaive-8 149µs ± 5% 145µs ± 5%
Fixes#38860
Updates #25471
Updates #27707
Change-Id: If52680509b0f3b66dbd1d0c13fa574bd2d0bbd57
Reviewed-on: https://go-review.googlesource.com/c/go/+/232298
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Trust: Ian Lance Taylor <iant@golang.org>
This change modifies addrRanges.findSucc to more efficiently find the
successor range in an addrRanges by using a binary search to narrow down
large addrRanges and iterate over no more than 8 addrRanges.
This change makes the runtime more robust against systems that may
aggressively randomize the address space mappings it gives the runtime
(e.g. Fuchsia).
For #40191.
Change-Id: If529df2abd2edb1b1496d8690ddd284ecd7138c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/242679
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Since MemStats is now populated directly and some values are derived,
avoid duplicating the logic by instead populating the heap dump directly
from MemStats (external version) instead of memstats (runtime internal
version).
Change-Id: I0bec96bfa02d2ffd1b56475779c124a760e64238
Reviewed-on: https://go-review.googlesource.com/c/go/+/255817
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
For #37112.
Change-Id: I994dfe848605b95ef6aec24f53869e929247e987
Reviewed-on: https://go-review.googlesource.com/c/go/+/247049
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
For #37112.
Change-Id: Ibb0425c9c582ae3da3b2662d5bbe830d7df9079c
Reviewed-on: https://go-review.googlesource.com/c/go/+/247047
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change adds a concurrent HDR time histogram to the runtime with
tests. It also adds a function to generate boundaries for use by the
metrics package.
For #37112.
Change-Id: Ifbef8ddce8e3a965a0dcd58ccd4915c282ae2098
Reviewed-on: https://go-review.googlesource.com/c/go/+/247046
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change adds metrics for the distribution of objects allocated and
freed by size, mirroring MemStats' BySize field.
For #37112.
Change-Id: Ibaf1812da93598b37265ec97abc6669c1a5efcbf
Reviewed-on: https://go-review.googlesource.com/c/go/+/247045
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
sysMemStats are updated early on in runtime initialization, so
triggering a stack growth would be bad. Mark them nosplit.
Thank you so much to cherryyz@google.com for finding this fix!
Fixes#42218.
Change-Id: Ic62db76e6a4f829355d7eaabed1727c51adfbd0f
Reviewed-on: https://go-review.googlesource.com/c/go/+/265157
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
This change adds three new metrics: the heap goal, GC cycle count, and
forced GC count. These metrics are identical to their MemStats
counterparts.
For #37112.
Change-Id: I5a5e8dd550c0d646e5dcdbdf38274895e27cdd88
Reviewed-on: https://go-review.googlesource.com/c/go/+/247044
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
For #37112.
Change-Id: Idd3dd5c84215ddd1ab05c2e76e848aa0a4d40fb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/247043
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change adds a new benchmark to the runtime tests for measuring the
latency of the new metrics implementation, based on the
ReadMemStats latency benchmark. readMetrics will have more metrics added
to it in the future, and this benchmark will serve as a way to measure
the cost of adding additional metrics.
Change-Id: Ib05e3ed4afa49a70863fc0c418eab35b72263e24
Reviewed-on: https://go-review.googlesource.com/c/go/+/247042
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change adds support for a variety of runtime memory metrics and
contains the base implementation of Read for the runtime/metrics
package, which lives in the runtime.
It also adds testing infrastructure for the metrics package, and a bunch
of format and documentation tests.
For #37112.
Change-Id: I16a2c4781eeeb2de0abcb045c15105f1210e2d8a
Reviewed-on: https://go-review.googlesource.com/c/go/+/247041
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
This change moves the mcache-local malloc stats into the
consistentHeapStats structure so the malloc stats can be managed
consistently with the memory stats. The one exception here is
tinyAllocs for which moving that into the global stats would incur
several atomic writes on the fast path. Microbenchmarks for just one CPU
core have shown a 50% loss in throughput. Since tiny allocation counnt
isn't exposed anyway and is always blindly added to both allocs and
frees, let that stay inconsistent and flush the tiny allocation count
every so often.
Change-Id: I2a4b75f209c0e659b9c0db081a3287bf227c10ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/247039
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change replaces stacks_inuse, gcWorkBufInUse and
gcProgPtrScalarBitsInUse with their corresponding consistent stats. It
also adds checks to make sure the rest of the sharded stats line up with
existing stats in updatememstats.
Change-Id: I17d0bd181aedb5c55e09c8dff18cef5b2a3a14e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/247038
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change adds a global set of heap statistics which are similar
to existing memory statistics. The purpose of these new statistics
is to be able to read them and get a consistent result without stopping
the world. The goal is to eventually replace as many of the existing
memstats statistics with the sharded ones as possible.
The consistent memory statistics use a tailor-made synchronization
mechanism to allow writers (allocators) to proceed with minimal
synchronization by using a sequence counter and a global generation
counter to determine which set of statistics to update. Readers
increment the global generation counter to effectively grab a snapshot
of the statistics, and then iterate over all Ps using the sequence
counter to ensure that they may safely read the snapshotted statistics.
To keep statistics fresh, the reader also has a responsibility to merge
sets of statistics.
These consistent statistics are computed, but otherwise unused for now.
Upcoming changes will integrate them with the rest of the codebase and
will begin to phase out existing statistics.
Change-Id: I637a11f2439e2049d7dccb8650c5d82500733ca5
Reviewed-on: https://go-review.googlesource.com/c/go/+/247037
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change creates the runtime/metrics package and adds the initial
interface as laid out in the design document.
For #37112.
Change-Id: I202dcee08ab008dd63bf96f7a4162f5b5f813637
Reviewed-on: https://go-review.googlesource.com/c/go/+/247040
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change adds a function getMCache which returns the current P's
mcache if it's available, and otherwise tries to get mcache0 if we're
bootstrapping. This function will come in handy as we need to replicate
this behavior in multiple places in future changes.
Change-Id: I536073d6f6dc6c6390269e613ead9f8bcb6e7f98
Reviewed-on: https://go-review.googlesource.com/c/go/+/246976
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
memstats.heap_alloc is 100% a duplicate and unnecessary copy of
memstats.alloc which exists because MemStats used to be populated from
memstats via a memmove.
Change-Id: I995489f61be39786e573b8494a8ab6d4ea8bed9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/246975
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This statistic is updated in many places but for MemStats may be
computed from existing statistics. Specifically by definition
heap_idle = heap_sys - heap_inuse since heap_sys is all memory allocated
from the OS for use in the heap minus memory used for non-heap purposes.
heap_idle is almost the same (since it explicitly includes memory that
*could* be used for non-heap purposes) but also doesn't include memory
that's actually used to hold heap objects.
Although it has some utility as a sanity check, it complicates
accounting and we want fewer, orthogonal statistics for upcoming metrics
changes, so just drop it.
Change-Id: I40af54a38e335f43249f6e218f35088bfd4380d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/246974
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change breaks apart gc_sys into three distinct pieces. Two of those
pieces are pieces which come from heap_sys since they're allocated from
the page heap. The rest comes from memory mapped from e.g.
persistentalloc which better fits the purpose of a sysMemStat. Also,
rename gc_sys to gcMiscSys.
Change-Id: I098789170052511e7b31edbcdc9a53e5c24573f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/246973
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Currently MemStats is populated via an unsafe memmove from memstats, but
this places unnecessary structural restrictions on memstats, is annoying
to reason about, and tightly couples the two. Instead, just populate the
fields of MemStats explicitly.
Change-Id: I96f6a64326b1a91d4084e7b30169a4bbe6a331f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/246972
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change modifies the type of several mstats fields to be a new type:
sysMemStat. This type has the same structure as the fields used to have.
The purpose of this change is to make it very clear which stats may be
used in various functions for accounting (usually the platform-specific
sys* functions, but there are others). Currently there's an implicit
understanding that the *uint64 value passed to these functions is some
kind of statistic whose value is atomically managed. This understanding
isn't inherently problematic, but we're about to change how some stats
(which currently use mSysStatInc and mSysStatDec) work, so we want to
make it very clear what the various requirements are around "sysStat".
This change also removes mSysStatInc and mSysStatDec in favor of a
method on sysMemStat. Note that those two functions were originally
written the way they were because atomic 64-bit adds required a valid G
on ARM, but this hasn't been the case for a very long time (since
golang.org/cl/14204, but even before then it wasn't clear if mutexes
required a valid G anymore). Today we implement 64-bit adds on ARM with
a spinlock table.
Change-Id: I4e9b37cf14afc2ae20cf736e874eb0064af086d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/246971
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change modifies mheap's span allocation API to have each caller
declare a purpose, defined as a new enum called spanAllocType.
The purpose behind this change is two-fold:
1. Tight control over who gets to allocate heap memory is, generally
speaking, a good thing. Every codepath that allocates heap memory
places additional implicit restrictions on the allocator. A notable
example of a restriction is work bufs coming from heap memory: write
barriers are not allowed in allocation paths because then we could
have a situation where the allocator calls into the allocator.
2. Memory statistic updating is explicit. Instead of passing an opaque
pointer for statistic updating, which places restrictions on how that
statistic may be updated, we use the spanAllocType to determine which
statistic to update and how.
We also take this opportunity to group all the statistic updating code
together, which should make the accounting code a little easier to
follow.
Change-Id: Ic0b0898959ba2a776f67122f0e36c9d7d60e3085
Reviewed-on: https://go-review.googlesource.com/c/go/+/246970
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change renames a bunch of malloc statistics stored in the mcache
that are all named with the "local_" prefix. It also renames largeAlloc
to allocLarge to prevent a naming conflict, and next_sample because it
would be the last mcache field with the old C naming style.
Change-Id: I29695cb83b397a435ede7e9ad5c3c9be72767ea3
Reviewed-on: https://go-review.googlesource.com/c/go/+/246969
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Now that local_scan is the last mcache-based statistic that is flushed
by purgecachedstats, and heap_scan and gcController.revise may be
interacted with concurrently, we don't need to flush heap_scan at
arbitrary locations where the heap is locked, and we don't need
purgecachedstats and cachestats anymore. Instead, we can flush
local_scan at the same time we update heap_live in refill, so the two
updates may share the same revise call.
Clean up unused functions, remove code that would cause the heap to get
locked in the allocSpan when it didn't need to (other than to flush
local_scan), and flush local_scan explicitly in a few important places.
Notably we need to flush local_scan whenever we flush the other stats,
but it doesn't need to be donated anywhere, so have releaseAll do the
flushing. Also, we need to flush local_scan before we set heap_scan at
the end of a GC, which was previously handled by cachestats. Just do so
explicitly -- it's not much code and it becomes a lot more clear why we
need to do so.
Change-Id: I35ac081784df7744d515479896a41d530653692d
Reviewed-on: https://go-review.googlesource.com/c/go/+/246968
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change makes local_tinyallocs work like the rest of the malloc
stats and doesn't flush local_tinyallocs, instead making that the
source-of-truth.
Change-Id: I3e6cb5f1b3d086e432ce7d456895511a48e3617a
Reviewed-on: https://go-review.googlesource.com/c/go/+/246967
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change removes mcentral.nmalloc and adds mcache.local_nsmallalloc
which fulfills the same role but may be accessed non-atomically. It also
moves responsibility for updating heap_live and local_nsmallalloc into
mcache functions.
As a result of this change, mcache is now the sole source-of-truth for
malloc stats. It is also solely responsible for updating heap_live and
performing the various operations required as a result of updating
heap_live. The overall improvement here is in code organization:
previously malloc stats were fairly scattered, and now they have one
single home, and nearly all the required manipulations exist in a single
file.
Change-Id: I7e93fa297c1debf17e3f2a0d68aeed28a9c6af00
Reviewed-on: https://go-review.googlesource.com/c/go/+/246966
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change makes nlargealloc and largealloc into mcache fields just
like nlargefree and largefree. These local fields become the new
source-of-truth. This change also moves the accounting for these fields
out of allocSpan (which is an inappropriate place for it -- this
accounting generally happens much closer to the point of allocation) and
into largeAlloc. This move is partially possible now that we can call
gcController.revise at that point.
Furthermore, this change moves largeAlloc into mcache.go and makes it a
method of mcache. While there's a little bit of a mismatch here because
largeAlloc barely interacts with the mcache, it helps solidify the
mcache as the first allocation layer and provides a clear place to
aggregate and manage statistics.
Change-Id: I37b5e648710733bb4c04430b71e96700e438587a
Reviewed-on: https://go-review.googlesource.com/c/go/+/246965
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change makes it so that various local malloc stats (excluding
heap_scan and local_tinyallocs) are no longer written first to mheap
fields but are instead accessed directly from each mcache.
This change is part of a move toward having stats be distributed, and
cleaning up some old code related to the stats.
Note that because there's no central source-of-truth, when an mcache
dies, it must donate its stats to another mcache. It's always safe to
donate to the mcache for the 0th P, so do that.
Change-Id: I2556093dbc27357cb9621c9b97671f3c00aa1173
Reviewed-on: https://go-review.googlesource.com/c/go/+/246964
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change makes it so that the GC assist ratio (the pair of
gcControllerState fields assistBytesPerWork and assistWorkPerByte) is
updated atomically. Note that the pair of fields are not updated
together atomically, but that's OK. The code here was already racy for
some time and in practice the assist ratio moves very slowly.
The purpose of this change is so that we can document
gcController.revise to be safe for concurrent use, which will be useful
in further changes.
Change-Id: Ie25d630207c88e4f85f2b8953f6a0051ebf1b4ea
Reviewed-on: https://go-review.googlesource.com/c/go/+/246963
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
next_gc is mostly updated only during a STW, but may occasionally be
updated by calls to e.g. debug.SetGCPercent. In this case the update is
supposed to be protected by the heap lock, but in reality it's accessed
by gcController.revise which may be called without the heap lock held
(despite its documentation, which will be updated in a later change).
Change the synchronization policy on next_gc so that it's atomically
accessed when the world is not stopped to aid in making revise safe for
concurrent use.
Change-Id: I79657a72f91563f3241aaeda66e8a7757d399529
Reviewed-on: https://go-review.googlesource.com/c/go/+/246962
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
gcControllerState.scanWork's docs state that it must be accessed
atomically during a GC cycle, but gcControllerState.revise does not do
this (even when called with the heap lock held).
This change makes it so that gcControllerState.revise accesses scanWork
atomically and explicitly.
Note that we don't update gcControllerState.revise's erroneous doc
comment here because this change isn't about revise's guarantees, just
about heap_scan. The comment is updated in a later change.
Change-Id: Iafc3ad214e517190bfd8a219896d23da19f7659d
Reviewed-on: https://go-review.googlesource.com/c/go/+/246961
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Currently heap_scan is mostly protected by the heap lock, but
gcControllerState.revise sometimes accesses it without a lock. In an
effort to make gcControllerState.revise callable from more contexts (and
have its synchronization guarantees actually respected), make heap_scan
atomically read from and written to, unless the world is stopped.
Note that we don't update gcControllerState.revise's erroneous doc
comment here because this change isn't about revise's guarantees, just
about heap_scan. The comment is updated in a later change.
Change-Id: Iddbbeb954767c704c2bd1d221f36e6c4fc9948a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/246960
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
The Windows callback support accepts Go functions with arguments that
are uintptr-sized or smaller. However, it doesn't implement smaller
arguments correctly. It assumes the Windows arguments layout is
equivalent to the Go argument layout. This is often true, but because
Windows C ABIs pad arguments to word size, while Go packs arguments,
the layout is different if there are multiple sub-word-size arguments
in a row. For example, a function with two uint16 arguments will have
a two-word C argument frame, but only a 4 byte Go argument frame.
There are also subtleties surrounding floating-point register
arguments that it doesn't handle correctly.
To fix this, when constructing a callback, we examine the Go
function's signature to construct a mapping between the C argument
frame and the Go argument frame. When the callback is invoked, we use
this mapping to build the Go argument frame and copy the result back.
This adds several test cases to TestStdcallAndCDeclCallbacks that
exercise more complex function signatures. These all fail with the
current code, but work with this CL.
In addition to fixing these callback types, this is also a step toward
the Go register ABI (#40724), which is going to make the ABI
translation more complex.
Change-Id: I19fb1681b659d9fd528ffd5e88912bebb95da052
Reviewed-on: https://go-review.googlesource.com/c/go/+/263271
Trust: Austin Clements <austin@google.com>
Trust: Alex Brainman <alex.brainman@gmail.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>