Commit Graph

363 Commits

Author SHA1 Message Date
Austin Clements d19fedd180 runtime: move checkmarks to a separate bitmap
Currently, the GC stores the object marks for checkmarks mode in the
heap bitmap using a rather complex encoding: for one word objects, the
checkmark is stored in the pointer/scalar bit since one word objects
must be pointers; for larger objects, the checkmark is stored in what
would be the scan/dead bit for the second word of the object. This
encoding made more sense when the runtime used the first scan/dead bit
as the regular mark bit, but we moved away from that long ago.

This encoding and overloading of the heap bitmap bits causes a great
deal of complexity in many parts of the allocator and garbage
collector and leads to some subtle bugs like #15903.

This CL moves the checkmarks mark bits into their own per-arena bitmap
and reclaims the second scan/dead bit as a regular scan/dead bit.

I tested this by enabling doubleCheck mode in heapBitsSetType and
running in both regular and GODEBUG=gccheckmark=1 mode.

Fixes #15903.

No performance degradation. (Very slight improvement on a few
benchmarks, but it's probably just noise.)

name                                old time/op            new time/op            delta
BiogoIgor                                      16.6s ± 1%             16.4s ± 1%  -0.94%  (p=0.000 n=25+24)
BiogoKrishna                                   19.2s ± 3%             19.2s ± 3%    ~     (p=0.638 n=23+25)
BleveIndexBatch100                             6.12s ± 5%             6.17s ± 4%    ~     (p=0.170 n=25+25)
CompileTemplate                                206ms ± 1%             205ms ± 1%  -0.43%  (p=0.005 n=24+24)
CompileUnicode                                82.2ms ± 2%            81.5ms ± 2%  -0.95%  (p=0.001 n=22+22)
CompileGoTypes                                 755ms ± 3%             754ms ± 4%    ~     (p=0.715 n=25+25)
CompileCompiler                                3.73s ± 1%             3.73s ± 1%    ~     (p=0.445 n=25+24)
CompileSSA                                     8.67s ± 1%             8.66s ± 1%    ~     (p=0.836 n=24+22)
CompileFlate                                   134ms ± 2%             133ms ± 1%  -0.66%  (p=0.001 n=24+23)
CompileGoParser                                164ms ± 1%             163ms ± 1%  -0.85%  (p=0.000 n=24+24)
CompileReflect                                 466ms ± 5%             466ms ± 3%    ~     (p=0.863 n=25+25)
CompileTar                                     182ms ± 1%             182ms ± 1%  -0.31%  (p=0.048 n=24+24)
CompileXML                                     249ms ± 1%             248ms ± 1%  -0.32%  (p=0.031 n=21+25)
CompileStdCmd                                  10.3s ± 1%             10.3s ± 1%    ~     (p=0.459 n=23+23)
FoglemanFauxGLRenderRotateBoat                 8.66s ± 1%             8.62s ± 1%  -0.47%  (p=0.000 n=23+24)
FoglemanPathTraceRenderGopherIter1             20.3s ± 3%             20.2s ± 2%    ~     (p=0.893 n=25+25)
GopherLuaKNucleotide                           29.7s ± 1%             29.8s ± 2%    ~     (p=0.421 n=24+25)
MarkdownRenderXHTML                            246ms ± 1%             247ms ± 1%    ~     (p=0.558 n=25+24)
Tile38WithinCircle100kmRequest                 779µs ± 4%             779µs ± 3%    ~     (p=0.954 n=25+25)
Tile38IntersectsCircle100kmRequest            1.02ms ± 3%            1.01ms ± 4%    ~     (p=0.658 n=25+25)
Tile38KNearestLimit100Request                  984µs ± 4%             986µs ± 4%    ~     (p=0.627 n=24+25)
[Geo mean]                                     552ms                  551ms       -0.19%

https://perf.golang.org/search?q=upload:20200723.6

Change-Id: Ic703f26a83fb034941dc6f4788fc997d56890dec
Reviewed-on: https://go-review.googlesource.com/c/go/+/244539
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2020-08-17 14:31:20 +00:00
Michael Anthony Knyszek 2491c5fd24 runtime: wake scavenger and update address on sweep done
This change modifies the semantics of waking the scavenger: rather than
wake on any update to pacing, wake when we know we will have work to do,
that is, when the sweeper is done. The current scavenger runs over the
address space just once per GC cycle, and we want to maximize the chance
that the scavenger observes the most attractive scavengable memory in
that pass (i.e. free memory with the highest address), so the timing is
important. By having the scavenger awaken and reset its search space
when the sweeper is done, we increase the chance that the scavenger will
observe the most attractive scavengable memory, because no more memory
will be freed that GC cycle (so the highest scavengable address should
now be available).

Furthermore, in applications that go idle, this means the background
scavenger will be awoken even if another GC doesn't happen, which isn't
true today.

However, we're unable to wake the scavenger directly from within the
sweeper; waking the scavenger involves modifying timers and readying
goroutines, the latter of which may trigger an allocation today (and the
sweeper may run during allocation!). Instead, we do the following:

1. Set a flag which is checked by sysmon. sysmon will clear the flag and
   wake the scavenger.
2. Wake the scavenger unconditionally at sweep termination.

The idea behind this policy is that it gets us close enough to the state
above without having to deal with the complexity of waking the scavenger
in deep parts of the runtime. If the application goes idle and sweeping
finishes (so we don't reach sweep termination), then sysmon will wake
the scavenger. sysmon has a worst-case 20 ms delay in responding to this
signal, which is probably fine if the application is completely idle
anyway, but if the application is actively allocating, then the
proportional sweeper should help ensure that sweeping ends very close to
sweep termination, so sweep termination is a perfectly reasonable time
to wake up the scavenger.

Updates #35788.

Change-Id: I84289b37816a7d595d803c72a71b7f5c59d47e6b
Reviewed-on: https://go-review.googlesource.com/c/go/+/207998
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-04-30 18:12:03 +00:00
Michael Anthony Knyszek a13691966a runtime: add new mcentral implementation
Currently mcentral is implemented as a couple of linked lists of spans
protected by a lock. Unfortunately this design leads to significant lock
contention.

The span ownership model is also confusing and complicated. In-use spans
jump between being owned by multiple sources, generally some combination
of a gcSweepBuf, a concurrent sweeper, an mcentral or an mcache.

So first to address contention, this change replaces those linked lists
with gcSweepBufs which have an atomic fast path. Then, we change up the
ownership model: a span may be simultaneously owned only by an mcentral
and the page reclaimer. Otherwise, an mcentral (which now consists of
sweep bufs), a sweeper, or an mcache are the sole owners of a span at
any given time. This dramatically simplifies reasoning about span
ownership in the runtime.

As a result of this new ownership model, sweeping is now driven by
walking over the mcentrals rather than having its own global list of
spans. Because we no longer have a global list and we traditionally
haven't used the mcentrals for large object spans, we no longer have
anywhere to put large objects. So, this change also makes it so that we
keep large object spans in the appropriate mcentral lists.

In terms of the static lock ranking, we add the spanSet spine locks in
pretty much the same place as the mcentral locks, since they have the
potential to be manipulated both on the allocation and sweep paths, like
the mcentral locks.

This new implementation is turned on by default via a feature flag
called go115NewMCentralImpl.

Benchmark results for 1 KiB allocation throughput (5 runs each):

name \ MiB/s  go113       go114       gotip       gotip+this-patch
AllocKiB-1    1.71k ± 1%  1.68k ± 1%  1.59k ± 2%      1.71k ± 1%
AllocKiB-2    2.46k ± 1%  2.51k ± 1%  2.54k ± 1%      2.93k ± 1%
AllocKiB-4    4.27k ± 1%  4.41k ± 2%  4.33k ± 1%      5.01k ± 2%
AllocKiB-8    4.38k ± 3%  5.24k ± 1%  5.46k ± 1%      8.23k ± 1%
AllocKiB-12   4.38k ± 3%  4.49k ± 1%  5.10k ± 1%     10.04k ± 0%
AllocKiB-16   4.31k ± 1%  4.14k ± 3%  4.22k ± 0%     10.42k ± 0%
AllocKiB-20   4.26k ± 1%  3.98k ± 1%  4.09k ± 1%     10.46k ± 3%
AllocKiB-24   4.20k ± 1%  3.97k ± 1%  4.06k ± 1%     10.74k ± 1%
AllocKiB-28   4.15k ± 0%  4.00k ± 0%  4.20k ± 0%     10.76k ± 1%

Fixes #37487.

Change-Id: I92d47355acacf9af2c41bf080c08a8c1638ba210
Reviewed-on: https://go-review.googlesource.com/c/go/+/221182
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-04-27 18:19:26 +00:00
Michael Anthony Knyszek 03ba6b070d runtime: prevent preemption while releasing worldsema in gcStart
Currently, as a result of us releasing worldsema now to allow STW events
during a mark phase, we release worldsema between starting the world and
having the goroutine block in STW mode. This inserts preemption points
which, if followed through, could lead to a deadlock. Specifically,
because user goroutine scheduling is disabled in STW mode, the goroutine
will block before properly releasing worldsema.

The fix here is to prevent preemption while releasing the worldsema.

Fixes #38404.
Updates #19812.

Change-Id: I8ed5b3aa108ab2e4680c38e77b0584fb75690e3d
Reviewed-on: https://go-review.googlesource.com/c/go/+/228337
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2020-04-16 02:35:01 +00:00
Dan Scales 0a820007e7 runtime: static lock ranking for the runtime (enabled by GOEXPERIMENT)
I took some of the infrastructure from Austin's lock logging CR
https://go-review.googlesource.com/c/go/+/192704 (with deadlock
detection from the logs), and developed a setup to give static lock
ranking for runtime locks.

Static lock ranking establishes a documented total ordering among locks,
and then reports an error if the total order is violated. This can
happen if a deadlock happens (by acquiring a sequence of locks in
different orders), or if just one side of a possible deadlock happens.
Lock ordering deadlocks cannot happen as long as the lock ordering is
followed.

Along the way, I found a deadlock involving the new timer code, which Ian fixed
via https://go-review.googlesource.com/c/go/+/207348, as well as two other
potential deadlocks.

See the constants at the top of runtime/lockrank.go to show the static
lock ranking that I ended up with, along with some comments. This is
great documentation of the current intended lock ordering when acquiring
multiple locks in the runtime.

I also added an array lockPartialOrder[] which shows and enforces the
current partial ordering among locks (which is embedded within the total
ordering). This is more specific about the dependencies among locks.

I don't try to check the ranking within a lock class with multiple locks
that can be acquired at the same time (i.e. check the ranking when
multiple hchan locks are acquired).

Currently, I am doing a lockInit() call to set the lock rank of most
locks. Any lock that is not otherwise initialized is assumed to be a
leaf lock (a very high rank lock), so that eliminates the need to do
anything for a bunch of locks (including all architecture-dependent
locks). For two locks, root.lock and notifyList.lock (only in the
runtime/sema.go file), it is not as easy to do lock initialization, so
instead, I am passing the lock rank with the lock calls.

For Windows compilation, I needed to increase the StackGuard size from
896 to 928 because of the new lock-rank checking functions.

Checking of the static lock ranking is enabled by setting
GOEXPERIMENT=staticlockranking before doing a run.

To make sure that the static lock ranking code has no overhead in memory
or CPU when not enabled by GOEXPERIMENT, I changed 'go build/install' so
that it defines a build tag (with the same name) whenever any experiment
has been baked into the toolchain (by checking Expstring()). This allows
me to avoid increasing the size of the 'mutex' type when static lock
ranking is not enabled.

Fixes #38029

Change-Id: I154217ff307c47051f8dae9c2a03b53081acd83a
Reviewed-on: https://go-review.googlesource.com/c/go/+/207619
Reviewed-by: Dan Scales <danscales@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-07 21:51:03 +00:00
Michael Anthony Knyszek d1ecfcc1e8 runtime: ensure minTriggerRatio never exceeds maxTriggerRatio
Currently, the capping logic for the GC trigger ratio is such that if
gcpercent is low, we may end up setting the trigger ratio far too high,
breaking the promise of SetGCPercent and GOGC has a trade-off knob (we
won't start a GC early enough, and we will use more memory).

This change modifies the capping logic for the trigger ratio by scaling
the minTriggerRatio with gcpercent the same way we scale
maxTriggerRatio.

Fixes #37927.

Change-Id: I2a048c1808fb67186333d3d5a6bee328be2f35da
Reviewed-on: https://go-review.googlesource.com/c/go/+/223937
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-03-26 16:12:18 +00:00
Michael Anthony Knyszek f1f947af28 runtime: don't hold worldsema across mark phase
This change makes it so that worldsema isn't held across the mark phase.
This means that various operations like ReadMemStats may now stop the
world during the mark phase, reducing latency on such operations.

Only three such operations are still no longer allowed to occur during
marking: GOMAXPROCS, StartTrace, and StopTrace.

For the former it's because any change to GOMAXPROCS impacts GC mark
background worker scheduling and the details there are tricky.

For the latter two it's because tracing needs to observe consistent GC
start and GC end events, and if StartTrace or StopTrace may stop the
world during marking, then it's possible for it to see a GC end event
without a start or GC start event without an end, respectively.

To ensure that GOMAXPROCS and StartTrace/StopTrace cannot proceed until
marking is complete, the runtime now holds a new semaphore, gcsema,
across the mark phase just like it used to with worldsema.

This change is being landed once more after being reverted in the Go
1.14 release cycle, since CL 215157 allows it to have a positive
effect on system performance.

For the benchmark BenchmarkReadMemStatsLatency in the runtime, which
measures ReadMemStats latencies while the GC is exercised, the tail of
these latencies reduced dramatically on an 8-core machine:

name                   old 50%tile-ns  new 50%tile-ns  delta
ReadMemStatsLatency-8      4.40M ±74%      0.12M ± 2%  -97.35%  (p=0.008 n=5+5)

name                   old 90%tile-ns  new 90%tile-ns  delta
ReadMemStatsLatency-8       102M ± 6%         0M ±14%  -99.79%  (p=0.008 n=5+5)

name                   old 99%tile-ns  new 99%tile-ns  delta
ReadMemStatsLatency-8       147M ±18%         4M ±57%  -97.43%  (p=0.008 n=5+5)

Fixes #19812.

Change-Id: If66c3c97d171524ae29f0e7af4bd33509d9fd0bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/216557
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-18 19:13:50 +00:00
Michael Knyszek 64c22b70bf Revert "runtime: don't hold worldsema across mark phase"
This reverts commit 7b294cdd8d, CL 182657.

Reason for revert: This change may be causing latency problems
for applications which call ReadMemStats, because it may cause
all goroutines to stop until the GC completes.

https://golang.org/cl/215157 fixes this problem, but it's too
late in the cycle to land that.

Updates #19812.

Change-Id: Iaa26f4dec9b06b9db2a771a44e45f58d0aa8f26d
Reviewed-on: https://go-review.googlesource.com/c/go/+/216358
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-01-24 23:27:33 +00:00
Michael Knyszek ad3cef184e Revert "runtime: release worldsema before Gosched in STW GC mode"
This reverts commit 05511a5c0a, CL 208379.

Reason for revert: So that we can cleanly revert
https://golang.org/cl/182657.

Change-Id: I4fdf4f864a093db7866b3306f0f8f856b9f4d684
Reviewed-on: https://go-review.googlesource.com/c/go/+/216357
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-01-24 23:27:22 +00:00
Michael Anthony Knyszek 05511a5c0a runtime: release worldsema before Gosched in STW GC mode
After CL 182657 we no longer hold worldsema across the GC, we hold
gcsema instead.

However in STW GC mode we don't release worldsema before calling Gosched
on the user goroutine (note that user goroutines are disabled during STW
GC) so that user goroutine holds onto it. When the GC is done and the
runtime inevitably wants to "stop the world" again (though there isn't
much to stop) it'll sit there waiting for worldsema which won't be
released until the aforementioned goroutine is scheduled, which it won't
be until the GC is done!

So, we have a deadlock.

The fix is easy: just release worldsema before calling Gosched.

Fixes #34736.

Change-Id: Ia50db22ebed3176114e7e60a7edaf82f8535c1b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/208379
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-22 17:33:48 +00:00
Michael Anthony Knyszek dac936a4ab runtime: make more page sweeper operations atomic
This change makes it so that allocation and free related page sweeper
metadata operations (e.g. pageInUse and pagesInUse) are atomic rather
than protected by the heap lock. This will help in reducing the length
of the critical path with the heap lock held in future changes.

Updates #35112.

Change-Id: Ie82bff024204dd17c4c671af63350a7a41add354
Reviewed-on: https://go-review.googlesource.com/c/go/+/196640
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 17:00:57 +00:00
Michael Knyszek 74af7fc603 runtime: place lower limit on trigger ratio
This change makes it so that the GC pacer's trigger ratio can never fall
below 0.6. Upcoming changes to the allocator make it significantly more
scalable and thus much faster in certain cases, creating a large gap
between the performance of allocation and scanning. The consequence of
this is that the trigger ratio can drop very low (0.07 was observed) in
order to drop GC utilization. A low trigger ratio like this results in a
high amount of black allocations, which causes the live heap to appear
larger, and thus the heap, and RSS, grows to a much higher stable point.

This change alleviates the problem by placing a lower bound on the
trigger ratio. The expected (and confirmed) effect of this is that
utilization in certain scenarios will no longer converge to the expected
25%, and may go higher. As a result of this artificially high trigger
ratio, more time will also be spent doing GC assists compared to
dedicated mark workers, since the GC will be on for an artifically short
fraction of time (artificial with respect to the pacer). The biggest
concern of this change is that allocation latency will suffer as a
result, since there will now be more assists. But, upcoming changes to
the allocator reduce the latency enough to outweigh the expected
increase in latency from this change, without the blowup in RSS observed
from the changes to the allocator.

Updates #35112.

Change-Id: Idd7c94fa974d0de673304c4397e716e89bfbf09b
Reviewed-on: https://go-review.googlesource.com/c/go/+/200439
Reviewed-by: Austin Clements <austin@google.com>
2019-11-04 22:52:25 +00:00
Austin Clements d16ec13756 runtime: scan stacks conservatively at async safe points
This adds support for scanning the stack when a goroutine is stopped
at an async safe point. This is not yet lit up because asyncPreempt is
not yet injected, but prepares us for that.

This works by conservatively scanning the registers dumped in the
frame of asyncPreempt and its parent frame, which was stopped at an
asynchronous safe point.

Conservative scanning works by only marking words that are pointers to
valid, allocated heap objects. One complication is pointers to stack
objects. In this case, we can't determine if the stack object is still
"allocated" or if it was freed by an earlier GC. Hence, we need to
propagate the conservative-ness of scanning stack objects: if all
pointers found to a stack object were found via conservative scanning,
then the stack object itself needs to be scanned conservatively, since
its pointers may point to dead objects.

For #10958, #24543.

Change-Id: I7ff84b058c37cde3de8a982da07002eaba126fd6
Reviewed-on: https://go-review.googlesource.com/c/go/+/201761
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-02 21:51:16 +00:00
Austin Clements 8c5861576a runtime: remove g.gcscanvalid
Currently, gcscanvalid is used to resolve a race between attempts to
scan a stack. Now that there's a clear owner of the stack scan
operation, there's no longer any danger of racing or attempting to
scan a stack more than once, so this CL eliminates gcscanvalid.

I double-checked my reasoning by first adding a throw if gcscanvalid
was set in scanstack and verifying that all.bash still passed.

For #10958, #24543.
Fixes #24363.

Change-Id: I76794a5fcda325ed7cfc2b545e2a839b8b3bc713
Reviewed-on: https://go-review.googlesource.com/c/go/+/201139
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-25 23:25:32 +00:00
Michael Anthony Knyszek 62e4156552 runtime: fix lock acquire cycles related to scavenge.lock
There are currently two edges in the lock cycle graph caused by
scavenge.lock: with sched.lock and mheap_.lock. These edges appear
because of the call to ready() and stack growths respectively.
Furthermore, there's already an invariant in the code wherein
mheap_.lock must be acquired before scavenge.lock, hence the cycle.

The fix to this is to bring scavenge.lock higher in the lock cycle
graph, such that sched.lock and mheap_.lock are only acquired once
scavenge.lock is already held.

To faciliate this change, we move scavenger waking outside of
gcSetTriggerRatio such that it doesn't have to happen with the heap
locked. Furthermore, we check scavenge generation numbers with the heap
locked by using gopark instead of goparkunlock, and specify a function
which aborts the park should there be any skew in generation count.

Fixes #34047.

Change-Id: I3519119214bac66375e2b1262b36ce376c820d12
Reviewed-on: https://go-review.googlesource.com/c/go/+/191977
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-09-26 15:44:49 +00:00
Michael Anthony Knyszek 9b30811280 runtime: redefine scavenge goal in terms of heap_inuse
This change makes it so that the scavenge goal is defined primarily in
terms of heap_inuse at the end of the last GC rather than next_gc. The
reason behind this change is that next_gc doesn't take into account
fragmentation, and we can fall into situation where the scavenger thinks
it should have work to do but there's no free and unscavenged memory
available.

In order to ensure the scavenge goal still tracks next_gc, we multiply
heap_inuse by the ratio between the current heap goal and the last heap
goal, which describes whether the heap is growing or shrinking, and by
how much.

Finally, this change updates the documentation for scavenging and
elaborates on why the scavenge goal is defined the way it is.

Fixes #34048.
Updates #32828.

Change-Id: I8deaf87620b5dc12a40ab8a90bf27932868610da
Reviewed-on: https://go-review.googlesource.com/c/go/+/193040
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-09-25 22:15:39 +00:00
Michael Knyszek aae0b5b0b2 runtime: use hard heap goal if we've done more scan work than expected
This change makes it so that if we're already finding ourselves in a
situation where we've done more scan work than expected in the
steady-state (that is, 50% of heap_scan for GOGC=100), then we fall back
on the hard heap goal instead of continuing to assume the expected case.

In some cases its possible that we're already doing more scan work than
expected, and if GC assists come in just at that window where we notice
it, they might accumulate way too much assist credit, causing undue heap
growths if GOMAXPROCS=1 (since the fractional background worker isn't
guaranteed to fire). This case seems awfully specific, and that's
because it's exactly the case for TestGcSys, which has been flaky for
some time as a result.

Fixes #28574, #27636, and #27156.

Change-Id: I771f42bed34739dbb1b84ad82cfe247f70836031
Reviewed-on: https://go-review.googlesource.com/c/go/+/184097
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-09-04 21:52:18 +00:00
Michael Anthony Knyszek 7b294cdd8d runtime: don't hold worldsema across mark phase
This change makes it so that worldsema isn't held across the mark phase.
This means that various operations like ReadMemStats may now stop the
world during the mark phase, reducing latency on such operations.

Only three such operations are still no longer allowed to occur during
marking: GOMAXPROCS, StartTrace, and StopTrace.

For the former it's because any change to GOMAXPROCS impacts GC mark
background worker scheduling and the details there are tricky.

For the latter two it's because tracing needs to observe consistent GC
start and GC end events, and if StartTrace or StopTrace may stop the
world during marking, then it's possible for it to see a GC end event
without a start or GC start event without an end, respectively.

To ensure that GOMAXPROCS and StartTrace/StopTrace cannot proceed until
marking is complete, the runtime now holds a new semaphore, gcsema,
across the mark phase just like it used to with worldsema.

Fixes #19812.

Change-Id: I15d43ed184f711b3d104e8f267fb86e335f86bf9
Reviewed-on: https://go-review.googlesource.com/c/go/+/182657
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-09-04 15:53:59 +00:00
Michael Anthony Knyszek 7ed7669c0d runtime: ensure mheap lock stack growth invariant is maintained
Currently there's an invariant in the runtime wherein the heap lock
can only be acquired on the system stack, otherwise a self-deadlock
could occur if the stack grows while the lock is held.

This invariant is upheld and documented in a number of situations (e.g.
allocManual, freeManual) but there are other places where the invariant
is either not maintained at all which risks self-deadlock (e.g.
setGCPercent, gcResetMarkState, allocmcache) or is maintained but
undocumented (e.g. gcSweep, readGCStats_m).

This change adds go:systemstack to any function that acquires the heap
lock or adds a systemstack(func() { ... }) around the critical section,
where appropriate. It also documents the invariant on (*mheap).lock
directly and updates repetitive documentation to refer to that comment.

Fixes #32105.

Change-Id: I702b1290709c118b837389c78efde25c51a2cafb
Reviewed-on: https://go-review.googlesource.com/c/go/+/177857
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2019-05-24 15:34:57 +00:00
Tamir Duberstein 9c86eae384 runtime: resolve latent TODOs
These were added in https://go-review.googlesource.com/1224; according
to austin@google.com these annotations are not valuable - resolving by
removing the TODOs.

Change-Id: Icf3f21bc385cac9673ba29f0154680e970cf91f2
Reviewed-on: https://go-review.googlesource.com/c/go/+/176899
Reviewed-by: Austin Clements <austin@google.com>
2019-05-13 19:33:29 +00:00
Michael Anthony Knyszek fe67ea32bf runtime: add background scavenger
This change adds a background scavenging goroutine whose pacing is
determined when the heap goal changes. The scavenger is paced to use
at most 1% of the mutator's time for most systems. Furthermore, the
scavenger's pacing is computed based on the estimated number of
scavengable huge pages to take advantage of optimizations provided by
the OS.

The purpose of this scavenger is to deal with a shrinking heap: if the
heap goal is falling over time, the scavenger should kick in and start
returning free pages from the heap to the OS.

Also, now that we have a pacing system, the credit system used by
scavengeLocked has become redundant. Replace it with a mechanism which
only scavenges on the allocation path if it makes sense to do so with
respect to the new pacing system.

Fixes #30333.

Change-Id: I6203f8dc84affb26c3ab04528889dd9663530edc
Reviewed-on: https://go-review.googlesource.com/c/go/+/142960
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-05-09 16:21:43 +00:00
Austin Clements 5c22842cf2 runtime: introduce effective GOGC, eliminate heap_marked hacks
Currently, the pacer assumes the goal growth ratio is always exactly
GOGC/100. But sometimes this isn't the case, like when the heap is
very small (limited by heapminimum). So to placate the pacer, we lie
about the value of heap_marked in such situations.

Right now, these two lies make a truth, but GOGC is about to get more
complicated with the introduction of heap limits.

Rather than introduce more lies into the system to handle this,
introduce the concept of an "effective GOGC", which is the GOGC we're
actually using for pacing (we'll need this concept anyway with heap
limits). This commit changes the pacer to use the effective GOGC
rather than the user-set GOGC. This way, we no longer need to lie
about heap_marked because its true value is incorporated into the
effective GOGC.

Change-Id: I5b005258f937ab184ffcb5e76053abd798d542bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/66092
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2019-03-05 23:08:18 +00:00
Austin Clements 4a7d5aa30b runtime: don't use GOGC in minimum sweep distance
Currently, the minimum sweep distance is 1MB * GOGC/100. It's been
this way since it was introduced in CL 13043 with no justification.

Since there seems to be no good reason to scale the minimum sweep
distance by GOGC, make the minimum sweep distance just 1MB.

Change-Id: I5320574a23c0eec641e346282aab08a3bbb057da
Reviewed-on: https://go-review.googlesource.com/c/go/+/66091
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2019-03-05 23:08:16 +00:00
Austin Clements 7da03b9fbb runtime: compute goal first in gcSetTriggerRatio
This slightly rearranges gcSetTriggerRatio to compute the goal before
computing the other controls. This will simplify implementing the heap
limit, which needs to control the absolute goal and flow the rest of
the control parameters from this.

For #16843.

Change-Id: I46b7c1f8b6e4edbee78930fb093b60bd1a03d75e
Reviewed-on: https://go-review.googlesource.com/c/go/+/46750
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2019-03-05 23:08:15 +00:00
Austin Clements 7ac0a8bc39 runtime: remove unused gcTriggerAlways
This was used during the implementation of concurrent runtime.GC() but
now there's nothing that triggers GC unconditionally. Remove this
trigger type and simplify (gcTrigger).test() accordingly.

Change-Id: I17a893c2ed1f661b8146d7783d529f71735c9105
Reviewed-on: https://go-review.googlesource.com/c/go/+/66090
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2019-03-05 23:08:13 +00:00
Austin Clements 95a6f112c6 runtime: work around "P has cached GC work" failures
We still don't understand what's causing there to be remaining GC work
when we enter mark termination, but in order to move forward on this
issue, this CL implements a work-around for the problem.

If debugCachedWork is false, this CL does a second check for remaining
GC work as soon as it stops the world for mark termination. If it
finds any work, it starts the world again and re-enters concurrent
mark. This will increase STW time by a small amount proportional to
GOMAXPROCS, but fixes a serious correctness issue.

This works-around #27993.

Change-Id: Ia23b85dd6c792ee8d623428bd1a3115631e387b8
Reviewed-on: https://go-review.googlesource.com/c/156140
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2019-01-04 01:24:58 +00:00
Austin Clements 8e6396115e runtime: don't spin in checkPut if non-preemptible
Currently it's possible for the runtime to deadlock if checkPut is
called in a non-preemptible context. In this case, checkPut may spin,
so it won't leave the non-preemptible context, but the thread running
gcMarkDone needs to preempt all of the goroutines before it can
release the checkPut spin loops.

Fix this by returning from checkPut if it's called under any of the
conditions that would prevent gcMarkDone from preempting it. In this
case, it leaves a note behind that this happened; if the runtime does
later detect left-over work it can at least indicate that it was
unable to catch it in the act.

For #27993.
Updates #29385 (may fix it).

Change-Id: Ic71c10701229febb4ddf8c104fb10e06d84b122e
Reviewed-on: https://go-review.googlesource.com/c/156017
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2019-01-02 20:21:01 +00:00
Austin Clements 2d00007bdb runtime: flush on every write barrier while debugging
Currently, we flush the write barrier buffer on every write barrier
once throwOnGCWork is set, but not during the mark completion
algorithm itself. As seen in recent failures like

  https://build.golang.org/log/317369853b803b4ee762b27653f367e1aa445ac1

by the time we actually catch a late gcWork put, the write barrier
buffer is full-size again.

As a result, we're probably not catching the actual problematic write
barrier, which is probably somewhere in the buffer.

Fix this by using the gcWork pause generation to also keep the write
barrier buffer small between the mark completion flushes it and when
mark completion is done.

For #27993.

Change-Id: I77618169441d42a7d562fb2a998cfaa89891edb2
Reviewed-on: https://go-review.googlesource.com/c/154638
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2018-12-18 15:17:50 +00:00
Austin Clements ccbca561ef runtime: capture pause stack for late gcWork put debugging
This captures the stack trace where mark completion observed that each
P had no work, and then dumps this if that P later discovers more
work. Hopefully this will help bound where the work was created.

For #27993.

Change-Id: I4f29202880d22c433482dc1463fb50ab693b6de6
Reviewed-on: https://go-review.googlesource.com/c/154599
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2018-12-17 21:24:19 +00:00
Michael Anthony Knyszek 578667f4b5 runtime: enable preemption of mark termination goroutine
A mark worker goroutine may attempt to preempt the mark termination
goroutine to scan its stack while the mark termination goroutine is
trying to preempt that worker to flush its work buffer, in rare
cases.

This change makes it so that, like a worker goroutine, the mark
termination goroutine stack is preemptible while it is on the
system stack, attempting to preempt others.

Fixes #28695.

Change-Id: I23bbb191f4fdad293e8a70befd51c9175f8a1171
Reviewed-on: https://go-review.googlesource.com/c/153077
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-12-07 16:42:49 +00:00
Austin Clements 438b9544a0 runtime: check more work flushing races
This adds several new checks to help debug #27993. It adds a mechanism
for freezing write barriers and gcWork puts during the mark completion
algorithm. This way, if we do detect mark completion, we can catch any
puts that happened during the completion algorithm. Based on build
dashboard failures, this seems to be the window of time when these are
happening.

This also double-checks that all work buffers are empty immediately
upon entering mark termination (much earlier than the current check).
This is unlikely to trigger based on the current failures, but is a
good safety net.

Change-Id: I03f56c48c4322069e28c50fbc3c15b2fee2130c2
Reviewed-on: https://go-review.googlesource.com/c/151797
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2018-11-29 22:08:05 +00:00
Austin Clements 9098d1d854 runtime: debug code to catch bad gcWork.puts
This adds a debug check to throw immediately if any pointers are added
to the gcWork buffer after the mark completion barrier. The intent is
to catch the source of the cached GC work that occasionally produces
"P has cached GC work at end of mark termination" failures.

The result should be that we get "throwOnGCWork" throws instead of "P
has cached GC work at end of mark termination" throws, but with useful
stack traces.

This should be reverted before the release. I've been unable to
reproduce this issue locally, but this issue appears fairly regularly
on the builders, so the intent is to catch it on the builders.

This probably slows down the GC slightly.

For #27993.

Change-Id: I5035e14058ad313bfbd3d68c41ec05179147a85c
Reviewed-on: https://go-review.googlesource.com/c/149969
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-11-21 16:28:17 +00:00
Austin Clements b8fad4b33d runtime: improve "P has cached GC work" debug info
For #27993.

Change-Id: I20127e8a9844c2c488f38e1ab1f8f5a27a5df03e
Reviewed-on: https://go-review.googlesource.com/c/149968
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-11-21 16:20:19 +00:00
Austin Clements 5333550bdc runtime: implement efficient page reclaimer
When we attempt to allocate an N page span (either for a large
allocation or when an mcentral runs dry), we first try to sweep spans
to release N pages. Currently, this can be extremely expensive:
sweeping a span to emptiness is the hardest thing to ask for and the
sweeper generally doesn't know where to even look for potentially
fruitful results. Since this is on the critical path of many
allocations, this is unfortunate.

This CL changes how we reclaim empty spans. Instead of trying lots of
spans and hoping for the best, it uses the newly introduced span marks
to efficiently find empty spans. The span marks (and in-use bits) are
in a dense bitmap, so these spans can be found with an efficient
sequential memory scan. This approach can scan for unmarked spans at
about 300 GB/ms and can free unmarked spans at about 32 MB/ms. We
could probably significantly improve the rate at which is can free
unmarked spans, but that's a separate issue.

Like the current reclaimer, this is still linear in the number of
spans that are swept, but the constant factor is now so vanishingly
small that it doesn't matter.

The benchmark in #18155 demonstrates both significant page reclaiming
delays, and object reclaiming delays. With "-retain-count=20000000
-preallocate=true -loop-count=3", the benchmark demonstrates several
page reclaiming delays on the order of 40ms. After this change, the
page reclaims are insignificant. The longest sweeps are still ~150ms,
but are object reclaiming delays. We'll address those in the next
several CLs.

Updates #18155.

Fixes #21378 by completely replacing the logic that had that bug.

Change-Id: Iad80eec11d7fc262d02c8f0761ac6998425c4064
Reviewed-on: https://go-review.googlesource.com/c/138959
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-11-15 19:27:11 +00:00
Austin Clements ba1698e963 runtime: mark span when marking any object on the span
This adds a mark bit for each span that is set if any objects on the
span are marked. This will be used for sweeping.

For #18155.

The impact of this is negligible for most benchmarks, and < 1% for
GC-heavy benchmarks.

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.18ms ± 0%  2.20ms ± 1%  +0.88%  (p=0.000 n=16+18)

(https://perf.golang.org/search?q=upload:20180928.1)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.68s ± 1%     2.68s ± 1%    ~     (p=0.707 n=17+19)
Fannkuch11-12                2.28s ± 0%     2.39s ± 0%  +4.95%  (p=0.000 n=19+18)
FmtFprintfEmpty-12          40.3ns ± 4%    39.4ns ± 2%  -2.27%  (p=0.000 n=17+18)
FmtFprintfString-12         67.9ns ± 1%    68.3ns ± 1%  +0.55%  (p=0.000 n=18+19)
FmtFprintfInt-12            75.7ns ± 1%    76.1ns ± 1%  +0.44%  (p=0.005 n=18+19)
FmtFprintfIntInt-12          123ns ± 1%     121ns ± 1%  -1.00%  (p=0.000 n=18+18)
FmtFprintfPrefixedInt-12     150ns ± 0%     148ns ± 0%  -1.33%  (p=0.000 n=16+13)
FmtFprintfFloat-12           208ns ± 0%     204ns ± 0%  -1.92%  (p=0.000 n=13+17)
FmtManyArgs-12               501ns ± 1%     498ns ± 0%  -0.55%  (p=0.000 n=19+17)
GobDecode-12                6.24ms ± 0%    6.25ms ± 1%    ~     (p=0.113 n=20+19)
GobEncode-12                5.33ms ± 0%    5.29ms ± 1%  -0.72%  (p=0.000 n=20+18)
Gzip-12                      220ms ± 1%     218ms ± 1%  -1.02%  (p=0.000 n=19+19)
Gunzip-12                   35.5ms ± 0%    35.7ms ± 0%  +0.45%  (p=0.000 n=16+18)
HTTPClientServer-12         77.9µs ± 1%    77.7µs ± 1%  -0.30%  (p=0.047 n=20+19)
JSONEncode-12               8.82ms ± 0%    8.93ms ± 0%  +1.20%  (p=0.000 n=18+17)
JSONDecode-12               47.3ms ± 0%    47.0ms ± 0%  -0.49%  (p=0.000 n=17+18)
Mandelbrot200-12            3.69ms ± 0%    3.68ms ± 0%  -0.25%  (p=0.000 n=19+18)
GoParse-12                  3.13ms ± 1%    3.13ms ± 1%    ~     (p=0.640 n=20+20)
RegexpMatchEasy0_32-12      76.2ns ± 1%    76.2ns ± 1%    ~     (p=0.818 n=20+19)
RegexpMatchEasy0_1K-12       226ns ± 0%     226ns ± 0%  -0.22%  (p=0.001 n=17+18)
RegexpMatchEasy1_32-12      71.9ns ± 1%    72.0ns ± 1%    ~     (p=0.653 n=18+18)
RegexpMatchEasy1_1K-12       355ns ± 1%     356ns ± 1%    ~     (p=0.160 n=18+19)
RegexpMatchMedium_32-12      106ns ± 1%     106ns ± 1%    ~     (p=0.325 n=17+20)
RegexpMatchMedium_1K-12     31.1µs ± 2%    31.2µs ± 0%  +0.59%  (p=0.007 n=19+15)
RegexpMatchHard_32-12       1.54µs ± 2%    1.53µs ± 2%  -0.78%  (p=0.021 n=17+18)
RegexpMatchHard_1K-12       46.0µs ± 1%    45.9µs ± 1%  -0.31%  (p=0.025 n=17+19)
Revcomp-12                   391ms ± 1%     394ms ± 2%  +0.80%  (p=0.000 n=17+19)
Template-12                 59.9ms ± 1%    59.9ms ± 1%    ~     (p=0.428 n=20+19)
TimeParse-12                 304ns ± 1%     312ns ± 0%  +2.88%  (p=0.000 n=20+17)
TimeFormat-12                318ns ± 0%     326ns ± 0%  +2.64%  (p=0.000 n=20+17)

(https://perf.golang.org/search?q=upload:20180928.2)

Change-Id: I336b9bf054113580a24103192904c8c76593e90e
Reviewed-on: https://go-review.googlesource.com/c/138958
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2018-11-15 19:27:09 +00:00
Brad Fitzpatrick 3813edf26e all: use "reports whether" consistently in the few places that didn't
Go documentation style for boolean funcs is to say:

    // Foo reports whether ...
    func Foo() bool

(rather than "returns true if")

This CL also replaces 4 uses of "iff" with the same "reports whether"
wording, which doesn't lose any meaning, and will prevent people from
sending typo fixes when they don't realize it's "if and only if". In
the past I think we've had the typo CLs updated to just say "reports
whether". So do them all at once.

(Inspired by the addition of another "returns true if" in CL 146938
in fd_plan9.go)

Created with:

$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true iff" | grep -v vendor)
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor)

Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f
Reviewed-on: https://go-review.googlesource.com/c/147037
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-11-02 22:47:58 +00:00
Austin Clements 007e8a2fbd runtime: rename gosweepdone to isSweepDone and document better
gosweepdone is another anachronism from the time when the sweeper was
implemented in C. Rename it to "isSweepDone" for the modern era.

Change-Id: I8472aa6f52478459c3f2edc8a4b2761e73c4c2dd
Reviewed-on: https://go-review.googlesource.com/c/138658
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-09 18:32:08 +00:00
Austin Clements f3bb4cbfd5 runtime: eliminate gosweepone
gosweepone just switches to the system stack and calls sweepone.
sweepone doesn't need to run on the system stack, so this is pretty
pointless.

Historically, this was necessary because the sweeper was written in C
and hence needed to run on the system stack. gosweepone was the
function that Go code (specifically, bgsweep) used to call into the C
sweeper implementation. This probably became unnecessary in 2014 with
CL golang.org/cl/167540043, which ported the sweeper to Go.

This CL changes all callers of gosweepone to call sweepone and
eliminates gosweepone.

Change-Id: I26b8ef0c7d060b4c0c5dedbb25ecfc936acc7269
Reviewed-on: https://go-review.googlesource.com/c/138657
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-09 18:20:40 +00:00
Igor Zhilianin f90e89e675 all: fix a bunch of misspellings
Change-Id: If2954bdfc551515403706b2cd0dde94e45936e08
GitHub-Last-Rev: d4cfc41a55
GitHub-Pull-Request: golang/go#28049
Reviewed-on: https://go-review.googlesource.com/c/140299
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-10-06 15:40:03 +00:00
Austin Clements 0906d648aa runtime: eliminate gchelper mechanism
Now that we do no mark work during mark termination, we no longer need
the gchelper mechanism.

Updates #26903.
Updates #17503.

Change-Id: Ie94e5c0f918cfa047e88cae1028fece106955c1b
Reviewed-on: https://go-review.googlesource.com/c/134785
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:38 +00:00
Austin Clements 550dfc8ae1 runtime: eliminate work.markrootdone and second root marking pass
Before STW and concurrent GC were unified, there could be either one
or two root marking passes per GC cycle. There were several tasks we
had to make sure happened once and only once (whether that was at the
beginning of concurrent mark for concurrent GC or during mark
termination for STW GC). We kept track of this in work.markrootdone.

Now that STW and concurrent GC both use the concurrent marking code
and we've eliminated all work done by the second root marking pass, we
only ever need a single root marking pass. Hence, we can eliminate
work.markrootdone and all of the code that's conditional on it.

Updates #26903.

Change-Id: I654a0f5e21b9322279525560a31e64b8d33b790f
Reviewed-on: https://go-review.googlesource.com/c/134784
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:37 +00:00
Austin Clements 873bd47dfb runtime: flush mcaches lazily
Currently, all mcaches are flushed during STW mark termination as a
root marking job. This is currently necessary because all spans must
be out of these caches before sweeping begins to avoid races with
allocation and to ensure the spans are in the state expected by
sweeping. We do it as a root marking job because mcache flushing is
somewhat expensive and O(GOMAXPROCS) and this parallelizes the work
across the Ps. However, it's also the last remaining root marking job
performed during mark termination.

This CL moves mcache flushing out of mark termination and performs it
lazily. We keep track of the last sweepgen at which each mcache was
flushed and as each P is woken from STW, it observes that its mcache
is out-of-date and flushes it.

The introduces a complication for spans cached in stale mcaches. These
may now be observed by background or proportional sweeping or when
attempting to add a finalizer, but aren't in a stable state. For
example, they are likely to be on the wrong mcentral list. To fix
this, this CL extends the sweepgen protocol to also capture whether a
span is cached and, if so, whether or not its cache is stale. This
protocol blocks asynchronous sweeping from touching cached spans and
makes it the responsibility of mcache flushing to sweep the flushed
spans.

This eliminates the last mark termination root marking job, which
means we can now eliminate that entire infrastructure.

Updates #26903. This implements lazy mcache flushing.

Change-Id: Iadda7aabe540b2026cffc5195da7be37d5b4125e
Reviewed-on: https://go-review.googlesource.com/c/134783
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:35 +00:00
Austin Clements 457c8f4fe9 runtime: eliminate blocking GC work drains
Now work.helperDrainBlock is always false, so we can remove it and
code paths that only ran when it was true. That means we no longer use
the gcDrainBlock mode of gcDrain, so we can eliminate that. That means
we no longer use gcWork.get, so we can eliminate that. That means we
no longer use getfull, so we can eliminate that.

Updates #26903. This is a follow-up to unifying STW GC and concurrent GC.

Change-Id: I8dbcf8ce24861df0a6149e0b7c5cd0eadb5c13f6
Reviewed-on: https://go-review.googlesource.com/c/134782
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:34 +00:00
Austin Clements 143b13ae82 runtime: clean up remaining mark work check
Now that STW GC marking is unified with concurrent marking, there
should never be mark work remaining in mark termination. Hence, we can
make that check unconditional.

Updates #26903. This is a follow-up to unifying STW GC and concurrent GC.

Change-Id: I43a21df5577635ab379c397a7405ada68d331e03
Reviewed-on: https://go-review.googlesource.com/c/134781
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:32 +00:00
Austin Clements 1678b2c580 runtime: implement STW GC in terms of concurrent GC
Currently, STW GC works very differently from concurrent GC. The
largest differences in that in concurrent GC, all marking work is done
by background mark workers during the mark phase, while in STW GC, all
marking work is done by gchelper during the mark termination phase.

This is a consequence of the evolution of Go's GC from a STW GC by
incrementally moving work from STW mark termination into concurrent
mark. However, at this point, the STW code paths exist only as a
debugging mode. Having separate code paths for this increases the
maintenance burden and complexity of the garbage collector. At the
same time, these code paths aren't tested nearly as well, making it
far more likely that they will bit-rot.

This CL reverses the relationship between STW GC, by re-implementing
STW GC in terms of concurrent GC.

This builds on the new scheduled support for disabling user goroutine
scheduling. During sweep termination, it disables user scheduling, so
when the GC starts the world again for concurrent mark, it's really
only "concurrent" with itself.

There are several code paths that were specific to STW GC that are now
vestigial. We'll remove these in the follow-up CLs.

Updates #26903.

Change-Id: Ia3883d2fcf7ab1d89bdc9c8ee54bf9bffb32c096
Reviewed-on: https://go-review.googlesource.com/c/134780
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:31 +00:00
Austin Clements 198440cc3d runtime: remove GODEBUG=gcrescanstacks=1 mode
Currently, setting GODEBUG=gcrescanstacks=1 enables a debugging mode
where the garbage collector re-scans goroutine stacks during mark
termination. This was introduced in Go 1.8 to debug the hybrid write
barrier, but I don't think we ever used it.

Now it's one of the last sources of mark work during mark termination.
This CL removes it.

Updates #26903. This is preparation for unifying STW GC and concurrent
GC.

Updates #17503.

Change-Id: I6ae04d3738aa9c448e6e206e21857a33ecd12acf
Reviewed-on: https://go-review.googlesource.com/c/134777
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:27 +00:00
Austin Clements ecc365960b runtime: avoid using STW GC mechanism for checkmarks mode
Currently, checkmarks mode uses the full STW GC infrastructure to
perform mark checking. We're about to remove that infrastructure and,
furthermore, since checkmarks is about doing the simplest thing
possible to check concurrent GC, it's valuable for it to be simpler.

Hence, this CL makes checkmarks even simpler by making it non-parallel
and divorcing it from the STW GC infrastructure (including the
gchelper mechanism).

Updates #26903. This is preparation for unifying STW GC and concurrent
GC.

Change-Id: Iad21158123e025e3f97d7986d577315e994bd43e
Reviewed-on: https://go-review.googlesource.com/c/134776
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:26 +00:00
Austin Clements 918ed88e47 runtime: remove gcStart's mode argument
This argument is always gcBackgroundMode since only
debug.gcstoptheworld can trigger a STW GC at this point. Remove the
unnecessary argument.

Updates #26903. This is preparation for unifying STW GC and concurrent
GC.

Change-Id: Icb4ba8f10f80c2b69cf51a21e04fa2c761b71c94
Reviewed-on: https://go-review.googlesource.com/c/134775
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:24 +00:00
Austin Clements e25ef35254 runtime: don't disable GC work caching during mark termination
Currently, we disable GC work caching during mark termination. This is
no longer necessary with the new mark completion detection because

1. There's no way for any of the GC mark termination helpers to have
any real work queued and,

2. Mark termination has to explicitly flush every P's buffers anyway
in order to flush Ps that didn't run a GC mark termination helper.

Hence, remove the code that disposes gcWork buffers during mark
termination.

Updates #26903. This is a follow-up to eliminating mark 2.

Change-Id: I81f002ee25d5c10f42afd39767774636519007f9
Reviewed-on: https://go-review.googlesource.com/c/134320
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:23 +00:00
Austin Clements d398dbdfc3 runtime: eliminate gcBlackenPromptly mode
Now that there is no mark 2 phase, gcBlackenPromptly is no longer
used.

Updates #26903. This is a follow-up to eliminating mark 2.

Change-Id: Ib9c534f21b36b8416fcf3cab667f186167b827f8
Reviewed-on: https://go-review.googlesource.com/c/134319
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:21 +00:00