Commit Graph

494 Commits

Author SHA1 Message Date
Michael Pratt 13f6be2833 runtime: use pidleget for faketime jump
In faketime mode, checkdead is responsible for jumping time forward to
the next timer expiration, and waking an M to handle the newly ready
timer.

Currently it pulls the exact P that owns the next timer off of the pidle
list. In theory this is efficient because that P is immediately eligible
to run the timer without stealing. Unfortunately it is also fraught with
peril because we are skipping all of the bookkeeping in pidleget:

* Skipped updates to timerpMask mean that our timers may not be eligible
  for stealing, as they should be.
* Skipped updates to idlepMask mean that our runq may not be eligible
  for stealing, as they should be.
* Skipped updates to sched.npidle may break tracking of spinning Ms,
  potentially resulting in lost work.
* Finally, as of CL 410122, skipped updates to p.limiterEvent may affect
  the GC limiter, or cause a fatal throw when another event occurs.

The last case has finally undercovered this issue since it quickly
results in a hard crash.

We could add all of these updates into checkdead, but it is much more
maintainable to keep this logic in one place and use pidleget here like
everywhere else in the runtime. This means we probably won't wake the
P owning the timer, meaning that the P will need to steal the timer,
which is less efficient, but faketime is not a performance-sensitive
build mode. Note that the M will automatically make itself a spinning M
to make it eligible to steal since it is the only one running.

Fixes #53294
For #52890

Change-Id: I4acc3d259b9b4d7dc02608581c8b4fd259f272e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/411119
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2022-06-08 21:56:02 +00:00
Michael Anthony Knyszek 54bd44e573 runtime: track total idle time for GC CPU limiter
Currently the GC CPU limiter only tracks idle GC work time. However, in
very undersubscribed situations, it's possible that all this extra idle
time prevents the enabling of the limiter, since it all gets account for
as mutator time. Fix this by tracking all idle time via pidleget and
pidleput. To support this, pidleget and pidleput also accept and return
"now" parameters like the timer code.

While we're here, let's clean up some incorrect assumptions that some of
the scheduling code makes about "now."

Fixes #52890.

Change-Id: I4a97893d2e5ad1e8c821f8773c2a1d449267c951
Reviewed-on: https://go-review.googlesource.com/c/go/+/410122
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-06-03 20:16:45 +00:00
Michael Anthony Knyszek b93ceefa7b runtime: use osyield in runqgrab on netbsd
NetBSD appears to have the same issue OpenBSD had in runqgrab. See
issue #52475 for more details.

For #35166.

Change-Id: Ie53192d26919b4717bc0d61cadd88d688ff38bb4
Reviewed-on: https://go-review.googlesource.com/c/go/+/407139
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2022-05-19 14:18:41 +00:00
Keith Randall 016d755213 runtime: measure stack usage; start stacks larger if needed
Measure the average stack size used by goroutines at every GC. When
starting a new goroutine, allocate an initial goroutine stack of that
average size. Intuition is that we'll waste at most 2x in stack space
because only half the goroutines can be below average. In turn, we
avoid some of the early stack growth / copying needed in the average
case.

More details in the design doc at: https://docs.google.com/document/d/1YDlGIdVTPnmUiTAavlZxBI1d9pwGQgZT7IKFKlIXohQ/edit?usp=sharing

name        old time/op  new time/op  delta
Issue18138  95.3µs ± 0%  67.3µs ±13%  -29.35%  (p=0.000 n=9+10)

Fixes #18138

Change-Id: Iba34d22ed04279da7e718bbd569bbf2734922eaa
Reviewed-on: https://go-review.googlesource.com/c/go/+/345889
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2022-05-12 22:32:42 +00:00
Rhys Hiltner 7b1f8b62be runtime: prefer curg for execution trace profile
The CPU profiler adds goroutine labels to its samples based on
getg().m.curg. That allows the profile to correctly attribute work that
the runtime does on behalf of that goroutine on the M's g0 stack via
systemstack calls, such as using runtime.Callers to record the call
stack.

Those labels also cover work on the g0 stack via mcall. When the active
goroutine calls runtime.Gosched, it will receive attribution of its
share of the scheduler work necessary to find the next runnable
goroutine.

The execution tracer's attribution of CPU samples to specific goroutines
should match. When curg is set, attribute the CPU samples to that
goroutine's ID.

Fixes #52693

Change-Id: Ic9af92e153abd8477559e48bc8ebaf3739527b94
Reviewed-on: https://go-review.googlesource.com/c/go/+/404055
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-05 18:17:08 +00:00
Rhys Hiltner 2c0a9884e0 runtime: add CPU samples to execution trace
When the CPU profiler and execution tracer are both active, report the
CPU profile samples in the execution trace data stream.

Include only samples that arrive on the threads known to the runtime,
but include them even when running g0 (such as near the scheduler) or if
there's no P (such as near syscalls).

Render them in "go tool trace" as instantaneous events.

For #16895

Change-Id: I0aa501a7b450c971e510961c0290838729033f7f
Reviewed-on: https://go-review.googlesource.com/c/go/+/400795
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03 20:49:46 +00:00
Rhys Hiltner 52bd1c4d6c runtime: decrease STW pause for goroutine profile
The goroutine profile needs to stop the world to get a consistent
snapshot of all goroutines in the app. Leaving the world stopped while
iterating over allgs leads to a pause proportional to the number of
goroutines in the app (or its high-water mark).

Instead, do only a fixed amount of bookkeeping while the world is
stopped. Install a barrier so the scheduler confirms that a goroutine
appears in the profile, with its stack recorded exactly as it was during
the stop-the-world pause, before it allows that goroutine to execute.
Iterate over allgs while the app resumes normal operations, adding each
to the profile unless they've been scheduled in the meantime (and so
have profiled themselves). Stop the world a second time to remove the
barrier and do a fixed amount of cleanup work.

This increases both the fixed overhead and per-goroutine CPU-time cost
of GoroutineProfile. It also increases the wall-clock latency of the
call to GoroutineProfile, since the scheduler may interrupt it to
execute other goroutines.

    name                                  old time/op    new time/op    delta
    GoroutineProfile/small/loaded-8         1.05ms ± 5%    4.99ms ±31%   +376.85%  (p=0.000 n=10+9)
    GoroutineProfile/sparse/loaded-8        1.04ms ± 4%    3.61ms ±27%   +246.61%  (p=0.000 n=10+10)
    GoroutineProfile/large/loaded-8         7.69ms ±17%   20.35ms ± 4%   +164.50%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle              958µs ± 0%    1820µs ±23%    +89.91%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle-8          1.00ms ± 3%    1.52ms ±17%    +51.18%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle-8           1.01ms ± 4%    1.47ms ± 7%    +45.28%  (p=0.000 n=9+9)
    GoroutineProfile/sparse/idle             980µs ± 1%    1403µs ± 2%    +43.22%  (p=0.000 n=9+10)
    GoroutineProfile/large/idle-8           7.19ms ± 8%    8.43ms ±21%    +17.22%  (p=0.011 n=10+10)
    PingPongHog                              511ns ± 8%     585ns ± 9%    +14.39%  (p=0.000 n=10+10)
    GoroutineProfile/large/idle             6.71ms ± 0%    7.58ms ± 3%    +13.08%  (p=0.000 n=8+10)
    PingPongHog-8                            469ns ± 8%     509ns ±12%     +8.62%  (p=0.010 n=9+10)
    WakeupParallelSyscall/5µs                216µs ± 4%     229µs ± 3%     +6.06%  (p=0.000 n=10+9)
    WakeupParallelSyscall/5µs-8              147µs ± 1%     149µs ± 2%     +1.12%  (p=0.009 n=10+10)
    WakeupParallelSyscall/2µs-8              140µs ± 0%     142µs ± 1%     +1.11%  (p=0.001 n=10+9)
    WakeupParallelSyscall/50µs-8             236µs ± 0%     238µs ± 1%     +1.08%  (p=0.000 n=9+10)
    WakeupParallelSyscall/1µs-8              138µs ± 0%     140µs ± 1%     +1.05%  (p=0.013 n=10+9)
    Matmult                                 8.52ns ± 1%    8.61ns ± 0%     +0.98%  (p=0.002 n=10+8)
    WakeupParallelSyscall/10µs-8             157µs ± 1%     158µs ± 1%     +0.58%  (p=0.003 n=10+8)
    CreateGoroutinesSingle-8                 328ns ± 0%     330ns ± 1%     +0.57%  (p=0.000 n=9+9)
    WakeupParallelSpinning/100µs-8           343µs ± 0%     344µs ± 1%     +0.30%  (p=0.015 n=8+8)
    WakeupParallelSyscall/20µs-8             178µs ± 0%     178µs ± 0%     +0.18%  (p=0.043 n=10+9)
    StackGrowthDeep-8                       22.8µs ± 0%    22.9µs ± 0%     +0.12%  (p=0.006 n=10+10)
    StackGrowth                             1.06µs ± 0%    1.06µs ± 0%     +0.09%  (p=0.000 n=8+9)
    WakeupParallelSpinning/0s               10.7µs ± 0%    10.7µs ± 0%     +0.08%  (p=0.000 n=9+9)
    WakeupParallelSpinning/5µs              30.7µs ± 0%    30.7µs ± 0%     +0.04%  (p=0.000 n=10+10)
    WakeupParallelSpinning/100µs             411µs ± 0%     411µs ± 0%     +0.03%  (p=0.000 n=10+9)
    WakeupParallelSpinning/2µs              18.7µs ± 0%    18.7µs ± 0%     +0.02%  (p=0.026 n=10+10)
    WakeupParallelSpinning/20µs-8           93.0µs ± 0%    93.0µs ± 0%     +0.01%  (p=0.021 n=9+10)
    StackGrowth-8                            216ns ± 0%     216ns ± 0%       ~     (p=0.209 n=10+10)
    CreateGoroutinesParallel-8              49.5ns ± 2%    49.3ns ± 1%       ~     (p=0.591 n=10+10)
    CreateGoroutinesSingle                   699ns ±20%     748ns ±19%       ~     (p=0.353 n=10+10)
    WakeupParallelSpinning/0s-8             15.9µs ± 2%    16.0µs ± 3%       ~     (p=0.315 n=10+10)
    WakeupParallelSpinning/1µs              14.6µs ± 0%    14.6µs ± 0%       ~     (p=0.513 n=10+10)
    WakeupParallelSpinning/2µs-8            24.2µs ± 3%    24.1µs ± 2%       ~     (p=0.971 n=10+10)
    WakeupParallelSpinning/10µs             50.7µs ± 0%    50.7µs ± 0%       ~     (p=0.101 n=10+10)
    WakeupParallelSpinning/20µs             90.7µs ± 0%    90.7µs ± 0%       ~     (p=0.898 n=10+10)
    WakeupParallelSpinning/50µs              211µs ± 0%     211µs ± 0%       ~     (p=0.382 n=10+10)
    WakeupParallelSyscall/0s-8               137µs ± 1%     138µs ± 0%       ~     (p=0.075 n=10+10)
    WakeupParallelSyscall/1µs                216µs ± 1%     219µs ± 3%       ~     (p=0.065 n=10+9)
    WakeupParallelSyscall/2µs                214µs ± 7%     219µs ± 1%       ~     (p=0.101 n=10+8)
    WakeupParallelSyscall/50µs               317µs ± 5%     326µs ± 4%       ~     (p=0.123 n=10+10)
    WakeupParallelSyscall/100µs              450µs ± 9%     459µs ± 8%       ~     (p=0.247 n=10+10)
    WakeupParallelSyscall/100µs-8            337µs ± 0%     338µs ± 1%       ~     (p=0.089 n=10+10)
    WakeupParallelSpinning/5µs-8            32.2µs ± 0%    32.2µs ± 0%     -0.05%  (p=0.026 n=9+10)
    WakeupParallelSpinning/50µs-8            216µs ± 0%     216µs ± 0%     -0.12%  (p=0.004 n=10+10)
    WakeupParallelSpinning/1µs-8            20.6µs ± 0%    20.5µs ± 0%     -0.22%  (p=0.014 n=10+10)
    WakeupParallelSpinning/10µs-8           54.5µs ± 0%    54.2µs ± 1%     -0.57%  (p=0.000 n=10+10)
    CreateGoroutines-8                       213ns ± 1%     211ns ± 1%     -0.86%  (p=0.002 n=10+10)
    CreateGoroutinesCapture                 1.03µs ± 0%    1.02µs ± 0%     -0.91%  (p=0.000 n=10+10)
    CreateGoroutinesCapture-8               1.32µs ± 1%    1.31µs ± 1%     -1.06%  (p=0.001 n=10+9)
    CreateGoroutines                         188ns ± 0%     186ns ± 0%     -1.06%  (p=0.000 n=9+10)
    CreateGoroutinesParallel                 188ns ± 0%     186ns ± 0%     -1.27%  (p=0.000 n=8+10)
    WakeupParallelSyscall/0s                 210µs ± 3%     207µs ± 3%     -1.60%  (p=0.043 n=10+10)
    StackGrowthDeep                          121µs ± 1%     119µs ± 1%     -1.70%  (p=0.000 n=9+10)
    Matmult-8                               1.82ns ± 3%    1.78ns ± 3%     -2.16%  (p=0.020 n=10+10)
    WakeupParallelSyscall/20µs               281µs ± 3%     269µs ± 4%     -4.44%  (p=0.000 n=10+10)
    WakeupParallelSyscall/10µs               239µs ± 3%     228µs ± 9%     -4.70%  (p=0.001 n=10+10)
    GoroutineProfile/sparse-nil/idle-8       485µs ± 2%      12µs ± 4%    -97.56%  (p=0.000 n=10+10)
    GoroutineProfile/small-nil/idle-8        484µs ± 2%      12µs ± 1%    -97.60%  (p=0.000 n=10+7)
    GoroutineProfile/small-nil/loaded-8      487µs ± 2%      11µs ± 3%    -97.68%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/loaded-8     507µs ± 4%      11µs ± 6%    -97.78%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/idle-8        709µs ± 2%      11µs ± 4%    -98.38%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/loaded-8      717µs ± 2%      11µs ± 3%    -98.43%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/idle         465µs ± 3%       1µs ± 1%    -99.84%  (p=0.000 n=10+10)
    GoroutineProfile/small-nil/idle          493µs ± 3%       1µs ± 0%    -99.85%  (p=0.000 n=10+9)
    GoroutineProfile/large-nil/idle          716µs ± 1%       1µs ± 2%    -99.89%  (p=0.000 n=7+10)

    name                                  old alloc/op   new alloc/op   delta
    CreateGoroutinesCapture                   144B ± 0%      144B ± 0%       ~     (all equal)
    CreateGoroutinesCapture-8                 144B ± 0%      144B ± 0%       ~     (all equal)

    name                                  old allocs/op  new allocs/op  delta
    CreateGoroutinesCapture                   5.00 ± 0%      5.00 ± 0%       ~     (all equal)
    CreateGoroutinesCapture-8                 5.00 ± 0%      5.00 ± 0%       ~     (all equal)

    name                                  old p50-ns     new p50-ns     delta
    GoroutineProfile/small/loaded-8          1.01M ± 3%     3.87M ±45%   +282.15%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/loaded-8         1.02M ± 3%     2.43M ±41%   +138.42%  (p=0.000 n=10+10)
    GoroutineProfile/large/loaded-8          7.43M ±16%    17.28M ± 2%   +132.43%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle               956k ± 0%     1559k ±16%    +63.03%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle-8            1.01M ± 3%     1.45M ± 7%    +44.31%  (p=0.000 n=10+9)
    GoroutineProfile/sparse/idle              977k ± 1%     1399k ± 2%    +43.20%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle-8           1.00M ± 3%     1.41M ± 3%    +40.47%  (p=0.000 n=10+10)
    GoroutineProfile/large/idle-8            6.97M ± 1%     8.41M ±25%    +20.54%  (p=0.003 n=8+10)
    GoroutineProfile/large/idle              6.71M ± 1%     7.46M ± 4%    +11.15%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/idle-8        483k ± 3%       13k ± 3%    -97.41%  (p=0.000 n=10+9)
    GoroutineProfile/small-nil/idle-8         483k ± 2%       12k ± 1%    -97.43%  (p=0.000 n=10+8)
    GoroutineProfile/small-nil/loaded-8       484k ± 3%       10k ± 2%    -97.93%  (p=0.000 n=10+8)
    GoroutineProfile/sparse-nil/loaded-8      492k ± 2%       10k ± 4%    -97.97%  (p=0.000 n=10+8)
    GoroutineProfile/large-nil/idle-8         708k ± 2%       12k ±15%    -98.36%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/loaded-8       714k ± 2%       10k ± 2%    -98.60%  (p=0.000 n=10+8)
    GoroutineProfile/sparse-nil/idle          459k ± 1%        1k ± 1%    -99.85%  (p=0.000 n=10+10)
    GoroutineProfile/small-nil/idle           477k ± 1%        1k ± 0%    -99.85%  (p=0.000 n=10+9)
    GoroutineProfile/large-nil/idle           712k ± 1%        1k ± 1%    -99.90%  (p=0.000 n=7+10)

    name                                  old p90-ns     new p90-ns     delta
    GoroutineProfile/small/loaded-8          1.13M ±10%     7.49M ±35%   +562.07%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/loaded-8         1.10M ±12%     4.58M ±31%   +318.02%  (p=0.000 n=10+9)
    GoroutineProfile/large/loaded-8          8.78M ±24%    27.83M ± 2%   +217.00%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle               967k ± 0%     2909k ±50%   +200.91%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle-8           1.02M ± 3%     1.96M ±76%    +92.99%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle-8            1.07M ±17%     1.55M ±12%    +45.23%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle              992k ± 1%     1417k ± 3%    +42.79%  (p=0.000 n=9+10)
    GoroutineProfile/large/idle              6.73M ± 0%     7.99M ± 8%    +18.80%  (p=0.000 n=8+10)
    GoroutineProfile/large/idle-8            8.20M ±25%     9.18M ±25%       ~     (p=0.315 n=10+10)
    GoroutineProfile/sparse-nil/idle-8        495k ± 3%       13k ± 1%    -97.36%  (p=0.000 n=10+9)
    GoroutineProfile/small-nil/idle-8         494k ± 2%       13k ± 3%    -97.36%  (p=0.000 n=10+10)
    GoroutineProfile/small-nil/loaded-8       496k ± 2%       13k ± 1%    -97.41%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/loaded-8      544k ±11%       13k ± 1%    -97.62%  (p=0.000 n=10+9)
    GoroutineProfile/large-nil/idle-8         724k ± 1%       13k ± 3%    -98.20%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/loaded-8       729k ± 3%       13k ± 2%    -98.23%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/idle          476k ± 4%        1k ± 1%    -99.85%  (p=0.000 n=9+10)
    GoroutineProfile/small-nil/idle           537k ±10%        1k ± 0%    -99.87%  (p=0.000 n=10+9)
    GoroutineProfile/large-nil/idle           729k ± 0%        1k ± 1%    -99.90%  (p=0.000 n=7+10)

    name                                  old p99-ns     new p99-ns     delta
    GoroutineProfile/sparse/loaded-8         1.27M ±33%    20.49M ±17%  +1514.61%  (p=0.000 n=10+10)
    GoroutineProfile/small/loaded-8          1.37M ±29%    20.48M ±23%  +1399.35%  (p=0.000 n=10+10)
    GoroutineProfile/large/loaded-8          9.76M ±23%    39.98M ±22%   +309.52%  (p=0.000 n=10+8)
    GoroutineProfile/small/idle               976k ± 1%     3367k ±55%   +244.94%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle-8           1.03M ± 3%     2.50M ±65%   +142.30%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle-8            1.17M ±34%     1.70M ±14%    +45.15%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle             1.02M ± 3%     1.45M ± 4%    +42.64%  (p=0.000 n=9+10)
    GoroutineProfile/large/idle              6.92M ± 2%     9.00M ± 7%    +29.98%  (p=0.000 n=8+9)
    GoroutineProfile/large/idle-8            8.74M ±23%     9.90M ±24%       ~     (p=0.190 n=10+10)
    GoroutineProfile/sparse-nil/idle-8        508k ± 4%       16k ± 2%    -96.90%  (p=0.000 n=10+9)
    GoroutineProfile/small-nil/idle-8         508k ± 4%       16k ± 3%    -96.91%  (p=0.000 n=10+9)
    GoroutineProfile/small-nil/loaded-8       542k ± 5%       15k ±15%    -97.15%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/loaded-8      649k ±16%       15k ±18%    -97.67%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/idle-8         738k ± 2%       16k ± 2%    -97.86%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/loaded-8       765k ± 4%       15k ±17%    -98.03%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/idle          539k ±26%        1k ±17%    -99.84%  (p=0.000 n=10+10)
    GoroutineProfile/small-nil/idle           659k ±25%        1k ± 0%    -99.84%  (p=0.000 n=10+8)
    GoroutineProfile/large-nil/idle           760k ± 2%        1k ±22%    -99.88%  (p=0.000 n=9+10)

Fixes #33250
For #50794

Change-Id: I862a2bc4e991cec485f21a6fce4fca84f2c6435b
Reviewed-on: https://go-review.googlesource.com/c/go/+/387415
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03 20:49:34 +00:00
Rhys Hiltner b9dee7e59b runtime/pprof: stress test goroutine profiler
For #33250

Change-Id: Ic7aa74b1bb5da9c4319718bac96316b236cb40b2
Reviewed-on: https://go-review.googlesource.com/c/go/+/387414
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
2022-05-03 20:49:18 +00:00
Rhys Hiltner 209942fa88 runtime/pprof: add race annotations for goroutine profiles
The race annotations for goroutine label maps covered the special type
of read necessary to create CPU profiles. Extend that to include
goroutine profiles. Annotate the copy involved in creating new
goroutines.

Fixes #50292

Change-Id: I10f69314e4f4eba85c506590fe4781f4d6b8ec2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/385660
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2022-05-03 20:49:07 +00:00
Michael Knyszek 01359b4681 runtime: add GC CPU utilization limiter
This change adds a GC CPU utilization limiter to the GC. It disables
assists to ensure GC CPU utilization remains under 50%. It uses a leaky
bucket mechanism that will only fill if GC CPU utilization exceeds 50%.
Once the bucket begins to overflow, GC assists are limited until the
bucket empties, at the risk of GC overshoot. The limiter is primarily
updated by assists. The scheduler may also update it, but only if the
GC is on and a few milliseconds have passed since the last update. This
second case exists to ensure that if the limiter is on, and no assists
are happening, we're still updating the limiter regularly.

The purpose of this limiter is to mitigate GC death spirals, opting to
use more memory instead.

This change turns the limiter on always. In practice, 50% overall GC CPU
utilization is very difficult to hit unless you're trying; even the most
allocation-heavy applications with complex heaps still need to do
something with that memory. Note that small GOGC values (i.e.
single-digit, or low teens) are more likely to trigger the limiter,
which means the GOGC tradeoff may no longer be respected. Even so, it
should still be relatively rare.

This change also introduces the feature flag for code to support the
memory limit feature.

For #48409.

Change-Id: Ia30f914e683e491a00900fd27868446c65e5d3c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/353989
Reviewed-by: Michael Pratt <mpratt@google.com>
2022-05-03 15:11:42 +00:00
Michael Pratt 4289bd365c runtime: simply user throws, expand runtime throws
This gives explicit names to the possible states of throwing (-1, 0, 1).

m.throwing is now one of:

throwTypeOff: not throwing, previously == 0
throwTypeUser: user throw, previously == -1
throwTypeRuntime: runtime throw, previously == 1

For runtime throws, we now always include frame metadata and system
goroutines regardless of GOTRACEBACK to aid in debugging the runtime.

For user throws, we no longer include frame metadata or runtime frames,
unless GOTRACEBACK=system or higher.

For #51485.

Change-Id: If252e2377a0b6385ce7756b937929be4273a56c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/390421
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2022-04-28 17:14:41 +00:00
Michael Pratt 29bbca5c2c runtime: differentiate "user" and "system" throws
"User" throws are throws due to some invariant broken by the application.
"System" throws are due to some invariant broken by the runtime,
environment, etc (i.e., not the fault of the application).

This CL sends "user" throws through the new fatal. Currently this
function is identical to throw, but with a different name to clearly
differentiate the throw type in the stack trace, and hopefully be a bit
more clear to users what it means.

This CL changes a few categories of throw to fatal:

1. Concurrent map read/write.
2. Deadlock detection.
3. Unlock of unlocked sync.Mutex.
4. Inconsistent results from syscall.AllThreadsSyscall.

"Thread exhaustion" and "out of memory" (usually address space full)
throws are additional throws that are arguably the fault of user code,
but I've left off for now because there is no specific invariant that
they have broken to get into these states.

For #51485

Change-Id: I713276a6c290fd34a6563e6e9ef378669d74ae32
Reviewed-on: https://go-review.googlesource.com/c/go/+/390420
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
2022-04-28 16:50:31 +00:00
Michael Anthony Knyszek d29f5247b8 runtime: refactor the scavenger and make it testable
This change refactors the scavenger into a type whose methods represent
the actual function and scheduling of the scavenger. It also stubs out
access to global state in order to make it testable.

This change thus also adds a test for the scavenger. In writing this
test, I discovered the lack of a behavior I expected: if the
pageAlloc.scavenge returns < the bytes requested scavenged, that means
the heap is exhausted. This has been true this whole time, but was not
documented or explicitly relied upon. This change rectifies that. In
theory this means the scavenger could spin in run() indefinitely (as
happened in the test) if shouldStop never told it to stop. In practice,
shouldStop fires long before the heap is exhausted, but for future
changes it may be important. At the very least it's good to be
intentional about these things.

While we're here, I also moved the call to stopTimer out of wake and
into sleep. There's no reason to add more operations to a context that's
already precarious (running without a P on sysmon).

Change-Id: Ib31b86379fd9df84f25ae282734437afc540da5c
Reviewed-on: https://go-review.googlesource.com/c/go/+/384734
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-26 22:15:21 +00:00
Michael Anthony Knyszek 13b18ec947 runtime: move scheduling decisions by schedule into findrunnable
This change moves several scheduling decisions made by schedule into
findrunnable. The main motivation behind this change is the fact that
stopped Ms can't become dedicated or fractional GC workers. The main
reason for this is that when a stopped M wakes up, it stays in
findrunnable until it finds work, which means it will never consider GC
work. On that note, it'll also never consider becoming the trace reader,
either.

Another way of looking at it is that this change tries to make
findrunnable aware of more sources of work than it was before. With this
change, any M in findrunnable should be capable of becoming a GC worker,
resolving #44313. While we're here, let's also make more sources of
work, such as the trace reader, visible to handoffp, which should really
be checking all sources of work. With that, we also now correctly handle
the case where StopTrace is called from the last live M that is also
locked (#39004). stoplockedm calls handoffp to start a new M and handle
the work it cannot, and once we include the trace reader in that, we
ensure that the trace reader gets scheduled.

This change attempts to preserve the exact same ordering of work
checking to reduce its impact.

One consequence of this change is that upon entering schedule, some
sources of work won't be checked twice (i.e. the local and global
runqs, and timers) as they do now, which in some sense gives them a
lower priority than they had before.

Fixes #39004.
Fixes #44313.

Change-Id: I5d8b7f63839db8d9a3e47cdda604baac1fe615ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/393880
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-26 22:09:34 +00:00
Michael Anthony Knyszek e1b5f347e7 runtime: reduce max idle mark workers during periodic GC cycles
This change reduces the maximum number of idle mark workers during
periodic (currently every 2 minutes) GC cycles to 1.

Idle mark workers soak up all available and unused Ps, up to GOMAXPROCS.
While this provides some throughput and latency benefit in general, it
can cause what appear to be massive CPU utilization spikes in otherwise
idle applications. This is mostly an issue for *very* idle applications,
ones idle enough to trigger periodic GC cycles. This spike also tends to
interact poorly with auto-scaling systems, as the system might assume
the load average is very low and suddenly see a massive burst in
activity.

The result of this change is not to bring down this 100% (of GOMAXPROCS)
CPU utilization spike to 0%, but rather

  min(25% + 1/GOMAXPROCS*100%, 100%)

Idle mark workers also do incur a small latency penalty as they must be
descheduled for other work that might pop up. Luckily the runtime is
pretty good about getting idle mark workers off of Ps, so in general
the latency benefit from shorter GC cycles outweighs this cost. But, the
cost is still non-zero and may be more significant in idle applications
that aren't invoking assists and write barriers quite as often.

We can't completely eliminate idle mark workers because they're
currently necessary for GC progress in some circumstances. Namely,
they're critical for progress when all we have is fractional workers. If
a fractional worker meets its quota, and all user goroutines are blocked
directly or indirectly on a GC cycle (via runtime.GOMAXPROCS, or
runtime.GC), the program may deadlock without GC workers, since the
fractional worker will go to sleep with nothing to wake it.

Fixes #37116.
For #44163.

Change-Id: Ib74793bb6b88d1765c52d445831310b0d11ef423
Reviewed-on: https://go-review.googlesource.com/c/go/+/393394
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-26 22:08:42 +00:00
Michael Anthony Knyszek 226346bb76 runtime: yield instead of sleeping in runqgrab on OpenBSD
OpenBSD has a coarse sleep granularity that rounds up to 10 ms
increments. This can cause significant STW delays, among other issues.
As far as I can tell, there's only 1 tightly timed sleep without an
explicit wakeup for which this actually matters.

Fixes #52475.

Change-Id: Ic69fc11096ddbbafd79b2dcdf3f912fde242db24
Reviewed-on: https://go-review.googlesource.com/c/go/+/401638
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-26 22:08:12 +00:00
Michael Pratt bb004a179a syscall: define Syscall in terms of RawSyscall on linux
This is a re-do of CL 388477, fixing #52472.

It is unsafe to call syscall.RawSyscall from syscall.Syscall with
-coverpkg=all and -race. This is because:

1. Coverage adds a sync/atomic call in RawSyscall to increment the
   coverage counter.
2. Race mode instruments sync/atomic calls with TSAN runtime calls. TSAN
   eventually calls runtime.racecallbackfunc, which expects
   getg().m.p != 0, which is no longer true after entersyscall().

cmd/go actually avoids adding coverage instrumention to package runtime
in race mode entirely to avoid these kinds of problems. Rather than also
excluding all of syscall for this one function, work around by calling
RawSyscall6 instead, which avoids coverage instrumention both by being
written in assembly and in package runtime/*.

For #51087
Fixes #52472

Change-Id: Iaffd27df03753020c4716059a455d6ca7b62f347
Reviewed-on: https://go-review.googlesource.com/c/go/+/401654
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-04-21 22:10:49 +00:00
Michael Pratt aac1d3a1b1 Revert "syscall: define Syscall in terms of RawSyscall on linux"
This reverts CL 388477, which breaks cmd/go
TestScript/cover_pkgall_runtime.

For #51087.
For #52472.

Change-Id: Id58af419a889281f15df2471c58fece011fcffbc
Reviewed-on: https://go-review.googlesource.com/c/go/+/401636
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-21 19:06:52 +00:00
Michael Pratt 64e69d3925 syscall: define Syscall in terms of RawSyscall on linux
For #51087

Change-Id: I9de7e85ccf137ae73662759382334bcbe7208150
Reviewed-on: https://go-review.googlesource.com/c/go/+/388477
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-21 18:07:14 +00:00
Russ Cox 19309779ac all: gofmt main repo
[This CL is part of a sequence implementing the proposal #51082.
The design doc is at https://go.dev/s/godocfmt-design.]

Run the updated gofmt, which reformats doc comments,
on the main repository. Vendored files are excluded.

For #51082.

Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407
Reviewed-on: https://go-review.googlesource.com/c/go/+/384268
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-04-11 16:34:30 +00:00
Russ Cox 9839668b56 all: separate doc comment from //go: directives
A future change to gofmt will rewrite

	// Doc comment.
	//go:foo

to

	// Doc comment.
	//
	//go:foo

Apply that change preemptively to all comments (not necessarily just doc comments).

For #51082.

Change-Id: Iffe0285418d1e79d34526af3520b415a12203ca9
Reviewed-on: https://go-review.googlesource.com/c/go/+/384260
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-05 17:54:15 +00:00
Russ Cox 7d87ccc860 all: fix various doc comment formatting nits
A run of lines that are indented with any number of spaces or tabs
format as a <pre> block. This commit fixes various doc comments
that format badly according to that (standard) rule.

For example, consider:

	// - List item.
	//   Second line.
	// - Another item.

Because the - lines are unindented, this is actually two paragraphs
separated by a one-line <pre> block. This CL rewrites it to:

	//  - List item.
	//    Second line.
	//  - Another item.

Today, that will format as a single <pre> block.
In a future release, we hope to format it as a bulleted list.

Various other minor fixes as well, all in preparation for reformatting.

For #51082.

Change-Id: I95cf06040d4186830e571cd50148be3bf8daf189
Reviewed-on: https://go-review.googlesource.com/c/go/+/384257
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-01 18:18:01 +00:00
Keith Randall 510ad4561f runtime: improve work stealing randomness
For certain values of GOMAXPROCS, the current code is less random than
it looks. For example with GOMAXPROCS=12, there are 4 coprimes: 1 5 7 11.
That's bad, as 12 and 4 are not relatively prime. So if pos == 2, then we
always pick 7 as the inc. We want to pick pos and inc independently
at random.

Change-Id: I5c7e4f01f9223cbc2db12a685dc0bced2cf39abf
Reviewed-on: https://go-review.googlesource.com/c/go/+/369976
Run-TryBot: Keith Randall <khr@golang.org>
Trust: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2022-03-01 18:43:08 +00:00
Michael Pratt 0a5fae2a0e runtime, syscall: reimplement AllThreadsSyscall using only signals.
In issue 50113, we see that a thread blocked in a system call can result
in a hang of AllThreadsSyscall. To resolve this, we must send a signal
to these threads to knock them out of the system call long enough to run
the per-thread syscall.

Stepping back, if we need to send signals anyway, it should be possible
to implement this entire mechanism on top of signals. This CL does so,
vastly simplifying the mechanism, both as a direct result of
newly-unnecessary code as well as some ancillary simplifications to make
things simpler to follow.

Major changes:

* The rest of the mechanism is moved to os_linux.go, with fields in mOS
  instead of m itself.
* 'Fixup' fields and functions are renamed to 'perThreadSyscall' so they
  are more precise about their purpose.
* Rather than getting passed a closure, doAllThreadsSyscall takes the
  syscall number and arguments. This avoids a lot of hairy behavior:
    * The closure may potentially only be live in fields in the M,
      hidden from the GC. Not necessary with no closure.
    * The need to loan out the race context. A direct RawSyscall6 call
      does not require any race context.
    * The closure previously conditionally panicked in strange
      locations, like a signal handler. Now we simply throw.
* All manual fixup synchronization with mPark, sysmon, templateThread,
  sigqueue, etc is gone. The core approach is much simpler:
  doAllThreadsSyscall sends a signal to every thread in allm, which
  executes the system call from the signal handler. We use (SIGRTMIN +
  1), aka SIGSETXID, the same signal used by glibc for this purpose. As
  such, we are careful to only handle this signal on non-cgo binaries.

Synchronization with thread creation is a key part of this CL. The
comment near the top of doAllThreadsSyscall describes the required
synchronization semantics and how they are achieved.

Note that current use of allocmLock protects the state mutations of allm
that are also protected by sched.lock. allocmLock is used instead of
sched.lock simply to avoid holding sched.lock for so long.

Fixes #50113

Change-Id: Ic7ea856dc66cf711731540a54996e08fc986ce84
Reviewed-on: https://go-review.googlesource.com/c/go/+/383434
Reviewed-by: Austin Clements <austin@google.com>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-02-15 15:40:35 +00:00
Michael Pratt 7a132d6f4e runtime: move doAllThreadsSyscall to os_linux.go
syscall_runtime_doAllThreadsSyscall is only used on Linux. In
preparation of a follow-up CL that will modify the function to use other
Linux-only functions, move it to os_linux.go with no changes.

For #50113.

Change-Id: I348b6130038603aa0a917be1f1debbca5a5a073f
Reviewed-on: https://go-review.googlesource.com/c/go/+/383996
Trust: Michael Pratt <mpratt@google.com>
Reviewed-by: Andrew G. Morgan <agm@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-02-15 15:40:09 +00:00
Ian Lance Taylor 5442f4d51b runtime: restore old mp.fastrand initialization
CL 337350 changed mp.fastrand from a [2]uint32 to a uint64 and changed
the initialization to a single call of int64Hash. However, int64Hash
returns uintptr, so 32-bit systems this always left the most
significant 32 bits of mp.fastrand initialized to 0. The new code also
did not protect against initializing mp.fastrand to 0, which on a
system that does not implement math.Mul64 (most 32-bit systems) would
lead fastrand to always return 0.

This CL restores the mp.fastrand initialization to what it was before
CL 337350, adjusted for the change from [2]uint32 to uint64.

Change-Id: I663b415d9424d967e8e665ce2d017604dcd5b204
Reviewed-on: https://go-review.googlesource.com/c/go/+/383916
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-02-08 22:26:40 +00:00
Michael Pratt 822dbcb7d4 Revert "runtime: normalize sigprof traceback flags"
This reverts commit CL 358900.

Adding _TraceJumpStack to cgo traceback exposed a crashing condition.
This CL was primarily a cleanup, so we revert it entirely for now
and follow-up with the VDSO and libcall parts later.

Fixes #50936.

Change-Id: Ie45c9caaa8e2ef5bc9498ba65c36c887ca821bf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/382079
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-01-31 22:38:49 +00:00
Michael Pratt 3e45eb3ce1 runtime: do not inherit labels on system goroutines
GC background mark worker goroutines are created when the first GC is
triggered (or next GC after GOMAXPROCS increases). Since the GC can be
triggered from a user goroutine, those workers will inherit any pprof
labels from the user goroutine.

That isn't meaningful, so avoid it by excluding system goroutines from
inheriting labels.

Fixes #50032

Change-Id: Ib425ae561a3466007ff5deec86b9c51829ab5507
Reviewed-on: https://go-review.googlesource.com/c/go/+/369983
Reviewed-by: Austin Clements <austin@google.com>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-01-19 16:33:05 +00:00
Felix Geisendörfer 9b0de0854d runtime: fix missing pprof labels
Use gp.m.curg instead of the gp when recording cpu profiler stack
traces. This ensures profiler labels are captured when systemstack or similar
is executing on behalf of the current goroutine.

After this there are still rare cases of samples containing the labelHog
function, so more work might be needed. This patch should fix ~99% of the
problem.

Also change testCPUProfile interface a little to allow the new test to
re-run with a longer duration if it fails during a -short run.

Fixes #48577.

Change-Id: I3dbc9fd5af3c513544e822acaa43055b2e00dfa9
Reviewed-on: https://go-review.googlesource.com/c/go/+/367200
Trust: Michael Pratt <mpratt@google.com>
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2021-12-03 16:11:38 +00:00
Austin Clements 08ecdf7c2e runtime: fix racy allgs access on weak memory architectures
Currently, markroot is very clever about accessing the allgs slice to
find stack roots. Unfortunately, on weak memory architectures, it's a
little too clever and can sometimes read a nil g, causing a fatal
panic.

Specifically, gcMarkRootPrepare snapshots the length of allgs during
STW and then markroot accesses allgs up to this length during
concurrent marking. During concurrent marking, allgadd can append to
allgs *without synchronizing with markroot*, but the argument is that
the markroot access should be safe because allgs only grows
monotonically and existing entries in allgs never change.

This reasoning is insufficient on weak memory architectures. Suppose
thread 1 calls allgadd during concurrent marking and that allgs is
already at capacity. On thread 1, append will allocate a new slice
that initially consists of all nils, then copy the old backing store
to the new slice (write A), then allgadd will publish the new slice to
the allgs global (write B). Meanwhile, on thread 2, markroot reads the
allgs slice base pointer (read A), computes an offset from that base
pointer, and reads the value at that offset (read B). On a weak memory
machine, thread 2 can observe write B *before* write A. If the order
of events from thread 2's perspective is write B, read A, read B,
write A, then markroot on thread 2 will read a nil g and then panic.

Fix this by taking a snapshot of the allgs slice header in
gcMarkRootPrepare while the world is stopped and using that snapshot
as the list of stack roots in markroot. This eliminates all read/write
concurrency around the access in markroot.

Alternatively, we could make markroot use the atomicAllGs API to
atomically access the allgs list, but in my opinion it's much less
subtle to just eliminate all of the interesting concurrency around the
allgs access.

Fixes #49686.
Fixes #48845.
Fixes #43824.
(These are all just different paths to the same ultimate issue.)

Change-Id: I472b4934a637bbe88c8a080a280aa30212acf984
Reviewed-on: https://go-review.googlesource.com/c/go/+/368134
Trust: Austin Clements <austin@google.com>
Trust: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-12-01 17:13:34 +00:00
Michael Pratt 29ec902efc runtime: get tracking time only when needed
casgstatus currently calls nanotime on every casgstatus when tracking,
even though the time is only used in some cases. For goroutines making
lots of transitions that aren't covered here, this can add a small
overhead. Switch to calling nanotime only when necessary.

Change-Id: I2617869332e8289ef33dd674d786e44dea09aaba
Reviewed-on: https://go-review.googlesource.com/c/go/+/364375
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-11-16 17:13:33 +00:00
Bryan C. Mills c6a0b6f2de Revert "runtime: fix missing pprof labels"
This reverts CL 351751.

Reason for revert: new test is failing on many builders.

Change-Id: I066211c9f25607ca9eb5299aedea2ecc5069e34f
Reviewed-on: https://go-review.googlesource.com/c/go/+/360757
Trust: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2021-11-02 16:15:22 +00:00
Felix Geisendörfer da7173a2ed runtime: fix missing pprof labels
Use gp.m.curg instead of the gp when recording cpu profiler stack
traces. This ensures profiler labels are captured when systemstack or similar
is executing on behalf of the current goroutine.

After this there are still rare cases of samples containing the labelHog
function, so more work might be needed. This patch should fix ~99% of the
problem.

Fixes #48577.

Change-Id: I27132110e3d09721ec3b3ef417122bc70d8f3279
Reviewed-on: https://go-review.googlesource.com/c/go/+/351751
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
2021-11-02 15:15:09 +00:00
fanzha02 6f327f7b88 runtime, syscall: add calls to asan functions
Add explicit address sanitizer instrumentation to the runtime and
syscall packages. The compiler does not instrument the runtime
package. It does instrument the syscall package, but we need to add
a couple of cases that it can't see.

Refer to the implementation of the asan malloc runtime library,
this patch also allocates extra memory as the redzone, around the
returned memory region, and marks the redzone as unaddressable to
detect the overflows or underflows.

Updates #44853.

Change-Id: I2753d1cc1296935a66bf521e31ce91e35fcdf798
Reviewed-on: https://go-review.googlesource.com/c/go/+/298614
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: fannie zhang <Fannie.Zhang@arm.com>
2021-11-02 05:35:11 +00:00
Michael Anthony Knyszek 9ac1ee2d46 runtime: track the amount of scannable allocated stack for the GC pacer
This change adds two fields to gcControllerState: stackScan, used for
pacing decisions, and scannableStackSize, which directly tracks the
amount of space allocated for inuse stacks that will be scanned.

scannableStackSize is not updated directly, but is instead flushed from
each P when at an least 8 KiB delta has accumulated. This helps reduce
issues with atomics contention for newly created goroutines. Stack
growth paths are largely unaffected.

StackGrowth-48			51.4ns ± 0%	51.4ns ± 0%	~	(p=0.927 n=10+10)
StackGrowthDeep-48		6.14µs ± 3%	6.25µs ± 4%	~	(p=0.090 n=10+9)
CreateGoroutines-48		273ns ± 1%	273ns ± 1%	~	(p=0.676 n=9+10)
CreateGoroutinesParallel-48	65.5ns ± 5%	66.6ns ± 7%	~	(p=0.340 n=9+9)
CreateGoroutinesCapture-48	2.06µs ± 1%	2.07µs ± 4%	~	(p=0.217 n=10+10)
CreateGoroutinesSingle-48	550ns ± 3%	563ns ± 4%	+2.41%	(p=0.034 n=8+10)

For #44167.

Change-Id: Id1800d41d3a6c211b43aeb5681c57c0dc8880daf
Reviewed-on: https://go-review.googlesource.com/c/go/+/309589
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2021-10-29 18:35:20 +00:00
Michael Pratt 2bc8ed8e9c runtime: normalize sigprof traceback flags
Each gentraceback call uses a different set of flags. Combine these into
a common variable, only adjusted as necessary.

The effective changes here are:

* cgo traceback now has _TraceJumpStack. This is a no-op since it
  already passes curg.
* libcall traceback now has _TraceJumpStack. This is a behavior change
  and will allow following stack transitions if a libcall is performed on
  g0.
* VDSO traceback drops _TraceTrap. vdsoPC is a return address, so
  _TraceTrap was not necessary.

Change-Id: I351b3cb8dc77df7466795d5fbf2bd8f30bba2d37
Reviewed-on: https://go-review.googlesource.com/c/go/+/358900
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-10-28 18:10:50 +00:00
emahiro 02a36668aa runtime: fix typo of pushBackAll
Fixes: #49081
Change-Id: Ie6742f1e7a60c2d92ce1283bcfaa3eac521440a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/357629
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Trust: Daniel Martí <mvdan@mvdan.cc>
Trust: Cherry Mui <cherryyz@google.com>
2021-10-21 17:23:58 +00:00
Michael Anthony Knyszek 75b73d68b3 runtime: use atomic.Float64 for assist ratio
Change-Id: Ie7f09a7c9545ef9dd1860b1e332c4edbcbf8165e
Reviewed-on: https://go-review.googlesource.com/c/go/+/356170
Reviewed-by: Austin Clements <austin@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
2021-10-20 20:38:59 +00:00
Meng Zhuo ecb2f231fa runtime,sync: using fastrandn instead of modulo reduction
fastrandn is ~50% faster than fastrand() % n.
`ack -v 'fastrand\(\)\s?\%'` finds all modulo on fastrand()

name              old time/op  new time/op  delta
Fastrandn/2       2.86ns ± 0%  1.59ns ± 0%  -44.35%  (p=0.000 n=9+10)
Fastrandn/3       2.87ns ± 1%  1.59ns ± 0%  -44.41%  (p=0.000 n=10+9)
Fastrandn/4       2.87ns ± 1%  1.58ns ± 1%  -45.10%  (p=0.000 n=10+10)
Fastrandn/5       2.86ns ± 1%  1.58ns ± 1%  -44.84%  (p=0.000 n=10+10)

Change-Id: Ic91f5ca9b9e3b65127bc34792b62fd64fbd13b5c
Reviewed-on: https://go-review.googlesource.com/c/go/+/353269
Trust: Meng Zhuo <mzh@golangcn.org>
Run-TryBot: Meng Zhuo <mzh@golangcn.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-10-07 14:01:52 +00:00
Josh Bleecher Snyder 3100dc1a7f cmd/link,runtime: remove relocations from stkobjs
Use an offset from go.func.* instead.
This removes the last relocation from funcdata symbols,
which lets us simplify that code.

size      before    after     Δ       %
addr2line 3683218   3680706   -2512   -0.068%
api       4951074   4944850   -6224   -0.126%
asm       4744258   4757586   +13328  +0.281%
buildid   2419986   2418546   -1440   -0.060%
cgo       4218306   4197346   -20960  -0.497%
compile   22132066  22076882  -55184  -0.249%
cover     4432834   4411362   -21472  -0.484%
dist      3111202   3091346   -19856  -0.638%
doc       3583602   3563234   -20368  -0.568%
fix       3023922   3020658   -3264   -0.108%
link      6188034   6164642   -23392  -0.378%
nm        3665826   3646818   -19008  -0.519%
objdump   4015234   4012450   -2784   -0.069%
pack      2155010   2153554   -1456   -0.068%
pprof     13044178  13011522  -32656  -0.250%
test2json 2402146   2383906   -18240  -0.759%
trace     9765410   9736514   -28896  -0.296%
vet       6681250   6655058   -26192  -0.392%
total     104217556 103926980 -290576 -0.279%

relocs    before  after   Δ       %
addr2line 25563   25066   -497    -1.944%
api       18409   17176   -1233   -6.698%
asm       18903   18271   -632    -3.343%
buildid   9513    9233    -280    -2.943%
cgo       17103   16222   -881    -5.151%
compile   64825   60421   -4404   -6.794%
cover     19464   18479   -985    -5.061%
dist      10798   10135   -663    -6.140%
doc       13503   12735   -768    -5.688%
fix       11465   10820   -645    -5.626%
link      23214   21849   -1365   -5.880%
nm        25480   24987   -493    -1.935%
objdump   26610   26057   -553    -2.078%
pack      7951    7665    -286    -3.597%
pprof     63964   60761   -3203   -5.008%
test2json 8735    8389    -346    -3.961%
trace     39639   37180   -2459   -6.203%
vet       25970   24044   -1926   -7.416%
total     431108  409489  -21619  -5.015%

Change-Id: I43c26196a008da6d1cb3a782eea2f428778bd569
Reviewed-on: https://go-review.googlesource.com/c/go/+/353138
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-10-05 19:16:08 +00:00
Rhys Hiltner 5b90958084 runtime: move sigprofNonGo
The sigprofNonGo and sigprofNonGoPC functions are only used on unix-like
platforms. In preparation for unix-specific changes to sigprofNonGo,
move it (plus its close relative) to a unix-specific file.

Updates #35057

Change-Id: I9c814127c58612ea9a9fbd28a992b04ace5c604d
Reviewed-on: https://go-review.googlesource.com/c/go/+/351790
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Trust: David Chase <drchase@google.com>
2021-09-27 18:58:36 +00:00
Meng Zhuo 483533df9e runtime: using wyrand for fastrand
For modern 64-bit CPU architecture multiplier is faster than xorshift

darwin/amd64
name              old time/op  new time/op  delta
Fastrand          2.13ns ± 1%  1.78ns ± 1%  -16.47%  (p=0.000 n=9+10)
FastrandHashiter  32.5ns ± 4%  32.1ns ± 3%     ~     (p=0.277 n=8+9)
Fastrandn/2       2.16ns ± 1%  1.99ns ± 1%   -7.53%  (p=0.000 n=10+10)
Fastrandn/3       2.13ns ± 3%  2.00ns ± 1%   -5.88%  (p=0.000 n=10+10)
Fastrandn/4       2.08ns ± 2%  1.98ns ± 2%   -4.71%  (p=0.000 n=10+10)
Fastrandn/5       2.08ns ± 2%  1.98ns ± 1%   -4.90%  (p=0.000 n=10+9)

linux/mips64le
name              old time/op  new time/op  delta
Fastrand          12.1ns ± 0%  10.8ns ± 1%  -10.58%  (p=0.000 n=8+10)
FastrandHashiter   105ns ± 1%   105ns ± 1%     ~     (p=0.138 n=10+10)
Fastrandn/2       16.9ns ± 0%  16.4ns ± 4%   -2.84%  (p=0.020 n=10+10)
Fastrandn/3       16.9ns ± 0%  16.4ns ± 3%   -3.03%  (p=0.000 n=10+10)
Fastrandn/4       16.9ns ± 0%  16.5ns ± 2%   -2.01%  (p=0.002 n=8+10)
Fastrandn/5       16.9ns ± 0%  16.4ns ± 3%   -2.70%  (p=0.000 n=8+10)

linux/riscv64
name              old time/op  new time/op  delta
Fastrand          22.7ns ± 0%  12.7ns ±19%  -44.09%  (p=0.000 n=9+10)
FastrandHashiter   255ns ± 4%   250ns ± 7%     ~     (p=0.363 n=10+10)
Fastrandn/2       31.8ns ± 2%  28.5ns ±13%  -10.45%  (p=0.000 n=10+10)
Fastrandn/3       33.0ns ± 2%  27.4ns ± 8%  -17.16%  (p=0.000 n=9+10)
Fastrandn/4       29.6ns ± 3%  28.2ns ± 5%   -4.81%  (p=0.000 n=8+9)
Fastrandn/5       33.4ns ± 3%  26.5ns ± 9%  -20.49%  (p=0.000 n=8+10)

Change-Id: I88ac69625ef923f8be66647e3361e3be986de002
Reviewed-on: https://go-review.googlesource.com/c/go/+/337350
Trust: Meng Zhuo <mzh@golangcn.org>
Run-TryBot: Meng Zhuo <mzh@golangcn.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-09-24 01:25:24 +00:00
nimelehin 03df68d3c3 runtime: fix setting of cpu features for amd64
Because of wrong case of letters, the cpu features flags were not
set properly for amd64.

Fixes #48406.

Change-Id: If19782851670e91fd31d119f4701c47373fa7e71
GitHub-Last-Rev: 91c7321ca4
GitHub-Pull-Request: golang/go#48403
Reviewed-on: https://go-review.googlesource.com/c/go/+/350151
Trust: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-09-15 20:29:43 +00:00
Andy Pan 3920d6f208 runtime: eliminate the redundant for loop in runqget()
Change-Id: If9b283bbef3ff12a64d34b07491aee3396852f05
Reviewed-on: https://go-review.googlesource.com/c/go/+/317509
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Keith Randall <khr@golang.org>
2021-08-31 13:54:19 +00:00
Martin Möhrmann c1a14781ec runtime: remove unused cpu architecture feature variables from binaries
On amd64 this reduces go binary sizes by 176 bytes due to not referencing
internal/cpu.ARM64 and internal/cpu.ARM.

Change-Id: I8e4f31e2b1939b05eec2148b44d7cff7e0aeb30e
Reviewed-on: https://go-review.googlesource.com/c/go/+/344329
Trust: Martin Möhrmann <martin@golang.org>
Run-TryBot: Martin Möhrmann <martin@golang.org>
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-08-23 11:23:58 +00:00
Tobias Klauser 6406227d71 runtime: skip sysmon workaround on NetBSD >= 9.2
Detect the NetBSD version in osinit and only enable the workaround for
the kernel bug identified in #42515 for NetBSD versions older than 9.2.

For #42515
For #46495

Change-Id: I808846c7f8e47e5f7cc0a2f869246f4bd90d8e22
Reviewed-on: https://go-review.googlesource.com/c/go/+/324472
Trust: Tobias Klauser <tobias.klauser@gmail.com>
Trust: Benny Siegert <bsiegert@gmail.com>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
2021-08-16 21:24:44 +00:00
Ian Lance Taylor 2eb4d68833 runtime: don't use systemstack for BeforeFork/AfterFork
In https://golang.org/cl/140930043 syscall.BeforeFork was changed to
call beforefork via onM. This was done because at the time BeforeFork
was written in C but was called from Go. While the runtime was being
converted to Go, calls to complex C functions used onM to ensure that
enough stack space was available.

In https://golang.org/cl/172260043 the syscall.BeforeFork and
beforefork functions were rewritten into Go. In this rewrite
syscall.BeforeFork continue to call beforefork via onM, although
because both functions were now in Go that was no longer necessary.

In https://golang.org/cl/174950043 onM was renamed to systemstack,
producing essentially the code we have today.

Therefore, the use of systemstack in syscall.BeforeFork (and
syscall.AfterFork) is a historical relic.  Remove it.

Change-Id: Ia570f556b20e8405afa6c5e707bd6f4ad18fd7ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/341335
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2021-08-13 20:50:48 +00:00
Michael Pratt 20a620fd9f runtime: drop SIGPROF while in ARM < 7 kernel helpers
On Linux ARMv6 and below runtime/internal/atomic.Cas calls into a kernel
cas helper at a fixed address. If a SIGPROF arrives while executing the
kernel helper, the sigprof lostAtomic logic will miss that we are
potentially in the spinlock critical section, which could cause
a deadlock when using atomics later in sigprof.

Fixes #47505

Change-Id: If8ba0d0fc47e45d4e6c68eca98fac4c6ed4e43c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/341889
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-08-13 14:00:23 +00:00
Matthew Dempsky a64ab8d3ec [dev.typeparams] all: merge master (46fd547) into dev.typeparams
Conflicts:

- src/go/types/check_test.go

  CL 324730 on dev.typeparams changed the directory paths in TestCheck,
  TestExamples, and TestFixedbugs and renamed checkFiles to testFiles;
  whereas CL 337529 on master added a new test case just above them and
  that used checkFiles.

Merge List:

+ 2021-08-12 46fd547d89 internal/goversion: update Version to 1.18
+ 2021-08-12 5805efc78e doc/go1.17: remove draft notice
+ 2021-08-12 39634e7dae CONTRIBUTORS: update for the Go 1.17 release
+ 2021-08-12 095bb790e1 os/exec: re-enable LookPathTest/16
+ 2021-08-11 dea23e9ca8 src/make.*: make --no-clean flag a no-op that prints a warning
+ 2021-08-11 d4c0ed26ac doc/go1.17: linker passes -I to extld as -Wl,--dynamic-linker
+ 2021-08-10 1f9c9d8530 doc: use "high address/low address" instead of "top/bottom"
+ 2021-08-09 f1dce319ff cmd/go: with -mod=vendor, don't panic if there are duplicate requirements
+ 2021-08-09 7aeaad5c86 runtime/cgo: when using msan explicitly unpoison cgoCallers
+ 2021-08-08 507cc341ec doc: add example for conversion from slice expressions to array ptr
+ 2021-08-07 891547e2d4 doc/go1.17: fix a typo introduced in CL 335135
+ 2021-08-06 8eaf4d16bc make.bash: do not overwrite GO_LDSO if already set
+ 2021-08-06 63b968f4f8 doc/go1.17: clarify Modules changes
+ 2021-08-06 70546f6404 runtime: allow arm64 SEH to be called if illegal instruction
+ 2021-08-05 fd45e267c2 runtime: warn that KeepAlive is not an unsafe.Pointer workaround
+ 2021-08-04 6e738868a7 net/http: speed up and deflake TestCancelRequestWhenSharingConnection
+ 2021-08-02 8a7ee4c51e io/fs: don't use absolute path in DirEntry.Name doc
+ 2021-07-31 b8ca6e59ed all: gofmt
+ 2021-07-30 b7a85e0003 net/http/httputil: close incoming ReverseProxy request body
+ 2021-07-29 70fd4e47d7 runtime: avoid possible preemption when returning from Go to C
+ 2021-07-28 9eee0ed439 cmd/go: fix go.mod file name printed in error messages for replacements
+ 2021-07-28 b39e0f461c runtime: don't crash on nil pointers in checkptrAlignment
+ 2021-07-27 7cd10c1149 cmd/go: use .mod instead of .zip to determine if version has go.mod file
+ 2021-07-27 c8cf0f74e4 cmd/go: add missing flag in UsageLine
+ 2021-07-27 7ba8e796c9 testing: clarify T.Name returns a distinct name of the running test
+ 2021-07-27 33ff155970 go/types: preserve untyped constants on the RHS of a shift expression
+ 2021-07-26 840e583ff3 runtime: correct variable name in comment
+ 2021-07-26 bfbb288574 runtime: remove adjustTimers counter
+ 2021-07-26 9c81fd53b3 cmd/vet: add missing copyright header

Change-Id: Ia80604d24c6f4205265683024e3100769cf32065
2021-08-12 12:43:12 -07:00
Ian Lance Taylor bfbb288574 runtime: remove adjustTimers counter
In CL 336432 we changed adjusttimers so that it no longer cleared
timerModifiedEarliest if there were no timersModifiedEarlier timers.
This caused some Google internal tests to time out, presumably due
to the increased contention on timersLock.  We can avoid that by
simply not skipping the loop in adjusttimers, which lets us safely
clear timerModifiedEarliest.  And if we don't skip the loop, then there
isn't much reason to keep the count of timerModifiedEarlier timers at all.
So remove it.

The effect will be that for programs that create some timerModifiedEarlier
timers and then remove them all, the program will do an occasional
additional loop over all the timers.  And, programs that have some
timerModifiedEarlier timers will always loop over all the timers,
without the quicker exit when they have all been seen.  But the loops
should not occur all that often, due to timerModifiedEarliest.

For #47329

Change-Id: I7b244c1244d97b169a3c7fbc8f8a8b115731ddee
Reviewed-on: https://go-review.googlesource.com/c/go/+/337309
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2021-07-26 22:15:24 +00:00