Commit Graph

209 Commits

Austin Clements 886caba73c runtime: always mark span when marking an object
The page sweeper depends on spans being marked if any object in the
span is marked, but currently only greyobject does this.
gcmarknewobject and wbBufFlush1 also mark objects, but neither sets
span marks. As a result, if there are live objects on a span, but
they're all marked via allocation or write barriers, then the span
itself won't be marked and the page reclaimer will free the span,
ultimately leading to memory corruption when the memory for those live
allocations gets reused.

Fix this by making gcmarknewobject and wbBufFlush1 also mark pages.
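
As a rough illustration of the invariant being restored (a toy model, not
the runtime's code; heapModel and its pageMarks slice are invented for the
example), every marking path has to set the page-level mark as well as the
object's own mark bit:

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    // heapModel is a toy heap: one mark word per page. The real runtime keeps
    // a per-arena pageMarks bitmap; the names here are illustrative only.
    type heapModel struct {
        pageMarks []uint32 // nonzero => some object on that page is marked
    }

    // markObject marks an object and, crucially, also marks the page its span
    // starts on, so a page-based sweeper will not reclaim the span.
    func (h *heapModel) markObject(spanStartPage int) {
        // ... the object's own mark bit would be set here ...
        atomic.StoreUint32(&h.pageMarks[spanStartPage], 1)
    }

    // reclaimable reports whether the sweeper would free the span whose first
    // page is spanStartPage.
    func (h *heapModel) reclaimable(spanStartPage int) bool {
        return atomic.LoadUint32(&h.pageMarks[spanStartPage]) == 0
    }

    func main() {
        h := &heapModel{pageMarks: make([]uint32, 8)}
        h.markObject(3)               // e.g. an object marked by gcmarknewobject
        fmt.Println(h.reclaimable(3)) // false: the span survives the reclaimer
    }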

No test because I have no idea how to reliably (or even unreliably)
trigger this.

Fixes #39432.

Performance is a wash or very slightly worse. I benchmarked the
gcmarknewobject and wbBufFlush1 changes independently and both showed
a slight performance improvement, so I'm going to call this noise.

name                                old time/op  new time/op  delta
BiogoIgor                            15.9s ± 2%   15.9s ± 2%    ~     (p=0.758 n=25+25)
BiogoKrishna                         15.7s ± 3%   15.7s ± 3%    ~     (p=0.382 n=21+21)
BleveIndexBatch100                   4.94s ± 3%   5.07s ± 4%  +2.63%  (p=0.000 n=25+25)
CompileTemplate                      204ms ± 1%   205ms ± 1%  +0.43%  (p=0.000 n=21+23)
CompileUnicode                      77.8ms ± 1%  78.1ms ± 1%    ~     (p=0.130 n=23+23)
CompileGoTypes                       731ms ± 1%   733ms ± 1%  +0.30%  (p=0.006 n=22+22)
CompileCompiler                      3.64s ± 2%   3.65s ± 3%    ~     (p=0.179 n=24+25)
CompileSSA                           8.44s ± 1%   8.46s ± 1%  +0.30%  (p=0.003 n=22+23)
CompileFlate                         132ms ± 1%   133ms ± 1%    ~     (p=0.098 n=22+22)
CompileGoParser                      164ms ± 1%   164ms ± 1%  +0.37%  (p=0.000 n=21+23)
CompileReflect                       455ms ± 1%   457ms ± 2%  +0.50%  (p=0.002 n=20+22)
CompileTar                           182ms ± 2%   182ms ± 1%    ~     (p=0.382 n=22+22)
CompileXML                           245ms ± 3%   245ms ± 1%    ~     (p=0.070 n=21+23)
CompileStdCmd                        16.5s ± 2%   16.5s ± 3%    ~     (p=0.486 n=23+23)
FoglemanFauxGLRenderRotateBoat       12.9s ± 1%   13.0s ± 1%  +0.97%  (p=0.000 n=21+24)
FoglemanPathTraceRenderGopherIter1   18.6s ± 1%   18.7s ± 0%    ~     (p=0.083 n=23+24)
GopherLuaKNucleotide                 28.4s ± 1%   29.3s ± 1%  +2.84%  (p=0.000 n=25+25)
MarkdownRenderXHTML                  252ms ± 0%   251ms ± 1%  -0.50%  (p=0.000 n=23+24)
Tile38WithinCircle100kmRequest       516µs ± 2%   516µs ± 2%    ~     (p=0.763 n=24+25)
Tile38IntersectsCircle100kmRequest   689µs ± 2%   689µs ± 2%    ~     (p=0.617 n=24+24)
Tile38KNearestLimit100Request        608µs ± 1%   606µs ± 2%  -0.35%  (p=0.030 n=19+22)
[Geo mean]                           522ms        524ms       +0.41%

https://perf.golang.org/search?q=upload:20200606.4

Change-Id: I8b331f310dbfaba0468035f207467c8403005bf5
Reviewed-on: https://go-review.googlesource.com/c/go/+/236817
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2020-06-08 17:09:33 +00:00
Austin Clements 57d751370c runtime: use conservative scanning for debug calls
A debugger can inject a call at almost any PC, which causes
significant complications with stack scanning and growth. Currently,
the runtime solves this using precise stack maps and register maps at
nearly all PCs, but these extra maps add roughly 5% to the binary size.
These extra maps were originally considered worth this space because
they were intended to be used for non-cooperative preemption, but are
now used only for debug call injection.

This CL switches from using precise maps to instead using conservative
frame scanning, much like how non-cooperative preemption works. When a
call is injected, the runtime flushes all potential pointer registers
to the stack, and then treats that frame as well as the interrupted
frame conservatively.

The limitation of conservative frame scanning is that we cannot grow
the goroutine stack. That's acceptable because the previous CL switched to
performing debug calls on a new goroutine, where they are free to grow
the stack.

With this CL, there are no remaining uses of precise register maps
(though we still use the unsafe-point information that's encoded in
the register map PCDATA stream), and stack maps are only used at call
sites.

For #36365.

Change-Id: Ie217b6711f3741ccc437552d8ff88f961a73cee0
Reviewed-on: https://go-review.googlesource.com/c/go/+/229300
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-29 21:29:13 +00:00
Michael Anthony Knyszek eacdf76b93 runtime: add bitmap-based markrootSpans implementation
Currently markrootSpans, the scanning routine which scans span specials
(particularly finalizers) as roots, uses sweepSpans to shard work and
find spans to mark.

However, as part of a future CL to change span ownership and how
mcentral works, we want to avoid having markrootSpans use the sweep bufs
to find specials, so in this change we introduce a new mechanism.

Much like for the page reclaimer, we set up a per-page bitmap where the
first page for a span is marked if the span contains any specials, and
unmarked if it has no specials. This bitmap is updated by addspecial,
removespecial, and during sweeping.

markrootSpans then shards this bitmap into mark work and markers iterate
over the bitmap looking for spans with specials to mark. Unlike the page
reclaimer, we don't need to use the pageInUse bits because having a
special implies that a span is in-use.

While in terms of computational complexity this design is technically
worse, because it needs to iterate over the mapped heap, in practice
this iteration is very fast (we can skip over large swathes of the heap
very quickly) and we only look at spans that have any specials at all,
rather than having to touch each span.
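
As a toy sketch of why the iteration is cheap (the bitmap layout here is
invented for the example, not the runtime's arena structure): whole 64-page
words with no specials are skipped in a single comparison.

    package main

    import (
        "fmt"
        "math/bits"
    )

    // specialsBits is a toy per-page bitmap: bit i is set if the span whose
    // first page is page i carries any specials (finalizers, profiles, ...).
    type specialsBits []uint64

    func (b specialsBits) set(page int)   { b[page/64] |= 1 << (page % 64) }  // addspecial
    func (b specialsBits) clear(page int) { b[page/64] &^= 1 << (page % 64) } // removespecial

    // forEachSpecialSpan visits only pages whose bit is set, skipping entire
    // 64-page chunks at a time when the word is zero.
    func (b specialsBits) forEachSpecialSpan(visit func(page int)) {
        for i, w := range b {
            for w != 0 {
                j := bits.TrailingZeros64(w)
                visit(i*64 + j)
                w &^= 1 << j
            }
        }
    }

    func main() {
        b := make(specialsBits, 64) // 4096 pages, mostly empty
        b.set(5)
        b.set(1300)
        b.forEachSpecialSpan(func(page int) {
            fmt.Println("scan specials of span starting at page", page)
        })
    }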

This new implementation of markrootSpans is behind a feature flag called
go115NewMarkrootSpans.

Updates #37487.

Change-Id: I8ea07b6c11059f6d412fe419e0ab512d989377b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/221178
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-04-21 22:50:51 +00:00
Austin Clements d965bb6130 runtime: use divRoundUp
There are a handful of places where the runtime wants to round up the
result of a division. We just introduced a helper to do this. This CL
replaces all of the hand-coded round-ups (that I could find) with this
helper.
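
For reference, the helper is just ceiling division; a sketch of its shape
(modulo the exact comment in the source):

    // divRoundUp returns ceil(n / a). It assumes a > 0 and that n+a-1 does
    // not overflow, which holds for the size and alignment computations it
    // replaces, all of which were hand-written as (n + a - 1) / a.
    func divRoundUp(n, a uintptr) uintptr {
        return (n + a - 1) / a
    }

Call sites then read as, e.g., divRoundUp(size, pageSize) for a page count.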

Change-Id: I465d152157ff0f3cad40c0aa57491e4f2de510ad
Reviewed-on: https://go-review.googlesource.com/c/go/+/224385
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-20 16:05:35 +00:00
Michael Anthony Knyszek 79b43fa819 runtime: preempt dedicated background mark workers for STW
Currently, dedicated background mark workers are essentially always
non-preemptible.

This change makes dedicated background mark workers park when their
preemption flag is set and someone is trying to stop the world (STW),
allowing the STW to proceed.

This change prepares us for allowing a STW to happen (and happen
promptly) during GC marking in a follow-up change.

Updates #19812.

Change-Id: I67fb6085bf0f0aebd18ca500172767818a1f15e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/215157
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2020-03-18 17:59:09 +00:00
Johan Jansson cdf3db5df6 runtime: remove comment about gcCopySpans()
Remove documentation reference to gcCopySpans(), as that function was
removed in https://golang.org/cl/30537

Fixes #35683

Change-Id: I7fb7c6cc60bfb3a133a019a20eb3f9d4c7627b31
Reviewed-on: https://go-review.googlesource.com/c/go/+/209917
Reviewed-by: Austin Clements <austin@google.com>
2019-12-05 04:58:28 +00:00
Michael Anthony Knyszek a5a6f61043 runtime: fix (*gcSweepBuf).block guarantees
Currently gcSweepBuf guarantees that push operations may be performed
concurrently with each other and that block operations may be performed
concurrently with push operations as well.

Unfortunately, this isn't quite true. The existing code allows push
operations to happen concurrently with each other, but block operations
may return blocks with nil entries. The way this can happen is if two
concurrent pushers grab a slot to push to, and the first one (the one
with the earlier slot in the buffer) has not yet written its span value
when block is called. The existing code in block only checks whether the
very last value in the block is nil, when really an arbitrary number of
the last few values in the block may or may not be nil.

Today, this case can't actually happen because when push operations
happen concurrently during a GC (which is the only time block is
called), they only ever happen during an allocation with the heap lock
held, effectively serializing them. A block operation may happen
concurrently with one of these pushes, but its callers will never see a
nil mspan. Outside of a GC, this isn't a problem because although push
operations from allocations can run concurrently with push operations
from sweeping, block operations will never run.

In essence, the real concurrency guarantees provided by gcSweepBuf are
that block operations may happen concurrently with push operations, but
that push operations may not be concurrent with each other if there are
any block operations.

To fix this, and to prepare for push operations happening without the
heap lock held in a future CL, we update the documentation for block to
correctly state that there may be nil entries in the returned slice.
While we're here, make the mspan writes into the buffer atomic to avoid
a block user racing on a nil check, and document that the user should
load mspan values from the returned slice atomically. Finally, we make
all callers of block adhere to the new rules.

We choose to allow nil values rather than filter them out because the
only caller of block is markrootSpans, and if it catches a nil entry,
then there wasn't anything to mark in there anyway since the span is
just being created.
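
A toy version of the resulting discipline (fixed-size buffer and names
invented for the example; the real gcSweepBuf grows in blocks): pushers
reserve a slot with an atomic add and publish the span with an atomic
store, and a reader of a snapshot must load entries atomically and
tolerate nils.

    package main

    import (
        "fmt"
        "sync"
        "sync/atomic"
        "unsafe"
    )

    type span struct{ id int }

    // toyBuf models concurrent pushes into a shared buffer: a slot is reserved
    // with an atomic index bump and then filled with an atomic pointer store,
    // so a concurrent reader sees either nil or a fully written value.
    type toyBuf struct {
        next  uint32
        slots [64]unsafe.Pointer // each holds a *span, accessed atomically
    }

    func (b *toyBuf) push(s *span) {
        i := atomic.AddUint32(&b.next, 1) - 1
        atomic.StorePointer(&b.slots[i], unsafe.Pointer(s))
    }

    // snapshot returns the reserved slots; entries may still be nil if a
    // pusher has reserved a slot but not yet stored into it, which is exactly
    // the case block's callers must now tolerate.
    func (b *toyBuf) snapshot() []unsafe.Pointer {
        return b.slots[:atomic.LoadUint32(&b.next)]
    }

    func main() {
        var b toyBuf
        var wg sync.WaitGroup
        for i := 0; i < 10; i++ {
            wg.Add(1)
            go func(i int) { defer wg.Done(); b.push(&span{id: i}) }(i)
        }
        snap := b.snapshot()
        for i := range snap { // load each entry atomically, skip nil
            if s := (*span)(atomic.LoadPointer(&snap[i])); s != nil {
                fmt.Println("would sweep span", s.id)
            }
        }
        wg.Wait()
    }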

Updates #35112.

Change-Id: I6450aab15f51690d7a000ba5b3d529cf2ca5da1e
Reviewed-on: https://go-review.googlesource.com/c/go/+/203318
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 17:01:05 +00:00
Austin Clements 62e53b7922 runtime: use signals to preempt Gs for suspendG
This adds support for pausing a running G by sending a signal to its
M.

The main complication is that we want to target a G, but can only send
a signal to an M. Hence, the protocol we use is to simply mark the G
for preemption (which we already do) and send the M a "wake up and
look around" signal. The signal handler checks whether it's running a G
with a preemption request and, if so, stops it in the same way that
stack-check preemptions stop Gs. Since the preemption may fail (the G
could be moved or the signal could arrive at an unsafe point), we keep
a count of the number of received preemption signals. This lets suspendG
detect whether its request failed and should be retried, without an
explicit acknowledgment channel back from the signal handler.

For #10958, #24543.

Change-Id: I3e1538d5ea5200aeb434374abb5d5fdc56107e53
Reviewed-on: https://go-review.googlesource.com/c/go/+/201760
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-02 21:51:18 +00:00
Austin Clements d16ec13756 runtime: scan stacks conservatively at async safe points
This adds support for scanning the stack when a goroutine is stopped
at an async safe point. This is not yet lit up because asyncPreempt is
not yet injected, but prepares us for that.

This works by conservatively scanning the registers dumped in the
frame of asyncPreempt and its parent frame, which was stopped at an
asynchronous safe point.

Conservative scanning works by only marking words that are pointers to
valid, allocated heap objects. One complication is pointers to stack
objects. In this case, we can't determine if the stack object is still
"allocated" or if it was freed by an earlier GC. Hence, we need to
propagate the conservative-ness of scanning stack objects: if all
pointers found to a stack object were found via conservative scanning,
then the stack object itself needs to be scanned conservatively, since
its pointers may point to dead objects.
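
A toy illustration of conservative scanning (fake address ranges stand in
for spans and objects): any word in the frame whose value falls inside a
known allocated range marks that object, whether or not it was really a
pointer.

    package main

    import "fmt"

    // object is a toy allocated heap object: [start, end) plus a mark bit.
    type object struct {
        start, end uintptr
        marked     bool
    }

    // scanConservative treats every word in frame as a potential pointer: if
    // it lands inside a known object, that object is marked. Non-pointer words
    // that happen to look like addresses get marked too; that is the cost of
    // conservative scanning (false retention, never false freeing).
    func scanConservative(frame []uintptr, heap []*object) {
        for _, w := range frame {
            for _, o := range heap {
                if w >= o.start && w < o.end {
                    o.marked = true
                }
            }
        }
    }

    func main() {
        heap := []*object{{start: 0x1000, end: 0x1040}, {start: 0x2000, end: 0x2100}}
        // A "frame": spilled registers and locals, only some of which are pointers.
        frame := []uintptr{42, 0x1008, 0xdeadbeef}
        scanConservative(frame, heap)
        for _, o := range heap {
            fmt.Printf("object at %#x marked=%v\n", o.start, o.marked)
        }
    }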

For #10958, #24543.

Change-Id: I7ff84b058c37cde3de8a982da07002eaba126fd6
Reviewed-on: https://go-review.googlesource.com/c/go/+/201761
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-02 21:51:16 +00:00
Austin Clements 7de15e362b runtime: atomically set span state and use as publication barrier
When everything is working correctly, any pointer the garbage
collector encounters can only point into a fully initialized heap
span, since the span must have been initialized before that pointer
could escape the heap allocator and become visible to the GC.

However, in various cases, we try to be defensive against bad
pointers. In findObject, this is just a sanity check: we never expect
to find a bad pointer, but programming errors can lead to them. In
spanOfHeap, we don't necessarily trust the pointer and we're trying to
check if it really does point to the heap, though it should always
point to something. Conservative scanning takes this to a new level,
since it can only guess that a word may be a pointer and must then
verify that guess.

In all of these cases, we have a problem that the span lookup and
check can race with span initialization, since the span becomes
visible to lookups before it's fully initialized.

Furthermore, we're about to start initializing the span without the
heap lock held, which is going to introduce races where accesses were
previously protected by the heap lock.

To address this, this CL makes accesses to mspan.state atomic, and
ensures that the span is fully initialized before setting the state to
mSpanInUse. All loads are now atomic, and in any case where we don't
trust the pointer, it first atomically loads the span state and checks
that it's mSpanInUse, after which it will have synchronized with span
initialization and can safely check the other span fields.
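
A toy model of the publication protocol (names are invented; the real
field is mspan.state and the constant mSpanInUse): the writer initializes
every field before the atomic state store, and an untrusting reader checks
the state with an atomic load before touching anything else.

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    const (
        stateDead  uint32 = iota // toy analogue of mSpanDead
        stateInUse               // toy analogue of mSpanInUse
    )

    type toySpan struct {
        state     uint32 // accessed atomically; acts as the publication flag
        base, len uintptr
    }

    // initSpan initializes all fields *before* publishing the state, so the
    // atomic store acts as the publication barrier.
    func initSpan(s *toySpan, base, len uintptr) {
        s.base, s.len = base, len
        atomic.StoreUint32(&s.state, stateInUse)
    }

    // lookup refuses to trust the span's fields until it has observed
    // stateInUse via an atomic load, which synchronizes with initSpan.
    func lookup(s *toySpan, p uintptr) bool {
        if atomic.LoadUint32(&s.state) != stateInUse {
            return false // not (yet) a valid heap span; treat the pointer as bad
        }
        return p >= s.base && p < s.base+s.len
    }

    func main() {
        var s toySpan
        fmt.Println(lookup(&s, 0x1010)) // false: span not initialized yet
        initSpan(&s, 0x1000, 0x2000)
        fmt.Println(lookup(&s, 0x1010)) // true
    }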

For #10958, #24543, but a good fix in general.

Change-Id: I518b7c63555b02064b98aa5f802c92b758fef853
Reviewed-on: https://go-review.googlesource.com/c/go/+/203286
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2019-10-31 17:09:50 +00:00
Austin Clements 6058603471 runtime: only shrink stacks at synchronous safe points
We're about to introduce asynchronous safe points, where we won't have
precise pointer maps for all stack frames. That's okay for scanning
the stack (conservatively), but not for shrinking the stack.

Hence, this CL prepares for this by only shrinking the stack as part
of the stack scan if the goroutine is stopped at a synchronous safe
point. Otherwise, it queues up the stack shrink for the next
synchronous safe point.

We already have one condition under which we can't shrink the stack
for very similar reasons: syscalls. Currently, we just give up on
shrinking the stack if it's in a syscall. But with this mechanism, we
defer that stack shrink until the next synchronous safe point.

For #10958, #24543.

Change-Id: Ifa1dec6f33fdf30f9067be2ce3f7ab8a7f62ce38
Reviewed-on: https://go-review.googlesource.com/c/go/+/201438
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-25 23:25:35 +00:00
Austin Clements 8c5861576a runtime: remove g.gcscanvalid
Currently, gcscanvalid is used to resolve a race between attempts to
scan a stack. Now that there's a clear owner of the stack scan
operation, there's no longer any danger of racing or attempting to
scan a stack more than once, so this CL eliminates gcscanvalid.

I double-checked my reasoning by first adding a throw if gcscanvalid
was set in scanstack and verifying that all.bash still passed.

For #10958, #24543.
Fixes #24363.

Change-Id: I76794a5fcda325ed7cfc2b545e2a839b8b3bc713
Reviewed-on: https://go-review.googlesource.com/c/go/+/201139
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-25 23:25:32 +00:00
Austin Clements 3f834114ab runtime: add general suspendG/resumeG
Currently, the process of suspending a goroutine is tied to stack
scanning. In preparation for non-cooperative preemption, this CL
abstracts this into general purpose suspendG/resumeG functions.

suspendG and resumeG closely follow the existing scang and restartg
functions with one exception: the addition of a _Gpreempted status.
Currently, preemption tasks (stack scanning) are carried out by the
target goroutine if it's in _Grunning. In this new approach, the task
is always carried out by the goroutine that called suspendG. Thus, we
need a reliable way to drive the target goroutine out of _Grunning
until the requesting goroutine is ready to resume it. The new
_Gpreempted state provides the handshake: when a runnable goroutine
responds to a preemption request, it now parks itself and enters
_Gpreempted. The requesting goroutine races to put it in _Gwaiting,
which gives it ownership, but also the responsibility to start it
again.

This CL adds several TODOs about improving the synchronization on the
G status. The existing code already has these problems; we're just
taking note of them.

The next CL will remove the now-dead scang and preemptscan.

For #10958, #24543.

Change-Id: I16dbf87bea9d50399cc86719c156f48e67198f16
Reviewed-on: https://go-review.googlesource.com/c/go/+/201137
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-25 23:25:28 +00:00
Ian Lance Taylor 66e562cc52 runtime: avoid overflow in markrootBlock
In a position-independent executable, the data or BSS may be located
close to the end of memory. If it is placed closer to the end than
rootBlockBytes, then the calculations in markrootBlock would overflow,
and the test that ensures that n is not larger than n0 would fail.
This would then cause scanblock to scan data that it shouldn't,
using an effectively random ptrmask, leading to program crashes.

No test because the only way to test it is to build a PIE and convince
the kernel to put the data section near the end of memory, and I don't
know how to do that. Or perhaps we could use a linker script, but that
is painful.

The new code is algebraically identical to the original code, but
avoids the potential overflow of b+rootBlockBytes.
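
The shape of the fix, sketched with the names used above (an illustration
of the arithmetic only, not the exact diff; the real markrootBlock also
takes a ptrmask and does the scanning): compare against the bytes
remaining in the region rather than forming b+rootBlockBytes.

    // blockBytes reports how many bytes of the root region [b0, b0+n0) the
    // shard'th block should scan, with fixed-size blocks of rootBlockBytes.
    func blockBytes(b0, n0, rootBlockBytes, shard uintptr) uintptr {
        off := shard * rootBlockBytes // offset of this block within the region
        if off >= n0 {
            return 0
        }
        // Overflow-prone form: n := rootBlockBytes; if b0+off+n > b0+n0 { ... }
        // b0+off+rootBlockBytes can wrap when the region ends near the top of
        // the address space. Comparing against the remaining length cannot.
        n := rootBlockBytes
        if rem := n0 - off; rem < n {
            n = rem
        }
        return n
    }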

Change-Id: Ieb4e5465174bb762b063d2491caeaa745017345e
Reviewed-on: https://go-review.googlesource.com/c/go/+/195717
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-09-20 05:02:16 +00:00
Keith Randall 8f296f59de Revert "Revert "cmd/compile,runtime: allocate defer records on the stack""
This reverts CL 180761

Reason for revert: Reinstate the stack-allocated defer CL.

There was nothing wrong with the CL proper, but stack allocation of defers exposed two other issues.

Issue #32477: Fix has been submitted as CL 181258.
Issue #32498: Possible fix is CL 181377 (not submitted yet).

Change-Id: I32b3365d5026600069291b068bbba6cb15295eb3
Reviewed-on: https://go-review.googlesource.com/c/go/+/181378
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-06-10 16:19:39 +00:00
Keith Randall 49200e3f3e Revert "cmd/compile,runtime: allocate defer records on the stack"
This reverts commit fff4f599fe.

Reason for revert: Seems to still have issues around GC.

Fixes #32452

Change-Id: Ibe7af629f9ad6a3d5312acd7b066123f484da7f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/180761
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2019-06-05 19:50:09 +00:00
Keith Randall fff4f599fe cmd/compile,runtime: allocate defer records on the stack
When a defer is executed at most once in a function body,
we can allocate the defer record for it on the stack instead
of on the heap.

This should make defers like this (which are very common) faster.

This optimization applies to 363 out of the 370 static defer sites
in the cmd/go binary.
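
As an example of the kind of defer this covers (my reading of "executed at
most once in a function body"; treat the second case as an assumption
about what does not qualify):

    package main

    import "os"

    // closeQuietly has a single defer that runs at most once per call, so its
    // defer record can live in closeQuietly's own stack frame.
    func closeQuietly(name string) error {
        f, err := os.Open(name)
        if err != nil {
            return err
        }
        defer f.Close() // eligible for stack allocation
        // ... use f ...
        return nil
    }

    // closeAll defers inside a loop, so the defer may execute more than once
    // per function body and its records still need to be heap-allocated.
    func closeAll(names []string) {
        for _, n := range names {
            f, err := os.Open(n)
            if err != nil {
                continue
            }
            defer f.Close() // not eligible: may run once per iteration
        }
    }

    func main() {
        _ = closeQuietly("/dev/null")
        closeAll([]string{"/dev/null"})
    }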

name     old time/op  new time/op  delta
Defer-4  52.2ns ± 5%  36.2ns ± 3%  -30.70%  (p=0.000 n=10+10)

Fixes #6980
Update #14939

Change-Id: I697109dd7aeef9e97a9eeba2ef65ff53d3ee1004
Reviewed-on: https://go-review.googlesource.com/c/go/+/171758
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-06-04 17:35:20 +00:00
Tamir Duberstein 9c86eae384 runtime: resolve latent TODOs
These were added in https://go-review.googlesource.com/1224; according
to austin@google.com these annotations are not valuable, so resolve
them by removing the TODOs.

Change-Id: Icf3f21bc385cac9673ba29f0154680e970cf91f2
Reviewed-on: https://go-review.googlesource.com/c/go/+/176899
Reviewed-by: Austin Clements <austin@google.com>
2019-05-13 19:33:29 +00:00
Austin Clements 68d89bb8e0 runtime: separate stack freeing from stack shrinking
Currently, shrinkstack will free the stack if the goroutine is dead.
There are only two places that call shrinkstack: scanstack, which will
never call it if the goroutine is dead; and markrootFreeGStacks, which
only calls it on dead goroutines.

Clean this up by separating stack freeing out of shrinkstack.

Change-Id: I7d7891e620550c32a2220833923a025704986681
Reviewed-on: https://go-review.googlesource.com/c/go/+/170890
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2019-04-05 20:19:59 +00:00
Cherry Zhang 4f4c2a79d4 runtime: scan defer closure in stack scan
With stack objects, when we scan the stack, defers are scanned via
tracebackdefers, but it seems that tracebackdefers doesn't include the
func value itself, which could be a stack-allocated closure. Scan it
explicitly.

Alternatively, we can change tracebackdefers to include the func
value, which in turn needs to change the type of stkframe.

Fixes #30453.

Change-Id: I55a6e43264d6952ab2fa5c638bebb89fdc410e2b
Reviewed-on: https://go-review.googlesource.com/c/164118
Reviewed-by: Keith Randall <khr@golang.org>
2019-03-01 16:21:29 +00:00
Cherry Zhang af8f4062c2 runtime: scan gp._panic in stack scan
In runtime.gopanic, the _panic object p is stack-allocated and
referenced from gp._panic. With stack objects, p is dead on the stack
by the point preprintpanics runs. gp._panic points to p, but the stack
scan doesn't look at gp. The heap scan of gp does look at gp._panic,
but it stops and ignores the pointer because it points into the stack.
So whatever p points to may be collected and clobbered. We need to
scan gp._panic explicitly during the stack scan.

To test it reliably, we introduce a GODEBUG mode "clobberfree",
which clobbers the memory content when the GC frees an object.

Fixes #30150.

Change-Id: I11128298f03a89f817faa221421a9d332b41dced
Reviewed-on: https://go-review.googlesource.com/c/161778
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-02-13 15:49:22 +00:00
Tobias Klauser 9e277f7d55 all: use "reports whether" consistently instead of "returns whether"
Follow-up for CL 147037 and after Brad noticed the "returns whether"
pattern during the review of CL 150621.

Go documentation style for boolean funcs is to say:

    // Foo reports whether ...
    func Foo() bool

(rather than "returns whether")

Created with:

    $ perl -i -npe 's/returns whether/reports whether/' $(git grep -l "returns whether" | grep -v vendor)

Change-Id: I15fe9ff99180ad97750cd05a10eceafdb12dc0b4
Reviewed-on: https://go-review.googlesource.com/c/150918
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-12-02 15:12:26 +00:00
Austin Clements ba1698e963 runtime: mark span when marking any object on the span
This adds a mark bit for each span that is set if any objects on the
span are marked. This will be used for sweeping.

For #18155.

The impact of this is negligible for most benchmarks, and < 1% for
GC-heavy benchmarks.

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.18ms ± 0%  2.20ms ± 1%  +0.88%  (p=0.000 n=16+18)

(https://perf.golang.org/search?q=upload:20180928.1)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.68s ± 1%     2.68s ± 1%    ~     (p=0.707 n=17+19)
Fannkuch11-12                2.28s ± 0%     2.39s ± 0%  +4.95%  (p=0.000 n=19+18)
FmtFprintfEmpty-12          40.3ns ± 4%    39.4ns ± 2%  -2.27%  (p=0.000 n=17+18)
FmtFprintfString-12         67.9ns ± 1%    68.3ns ± 1%  +0.55%  (p=0.000 n=18+19)
FmtFprintfInt-12            75.7ns ± 1%    76.1ns ± 1%  +0.44%  (p=0.005 n=18+19)
FmtFprintfIntInt-12          123ns ± 1%     121ns ± 1%  -1.00%  (p=0.000 n=18+18)
FmtFprintfPrefixedInt-12     150ns ± 0%     148ns ± 0%  -1.33%  (p=0.000 n=16+13)
FmtFprintfFloat-12           208ns ± 0%     204ns ± 0%  -1.92%  (p=0.000 n=13+17)
FmtManyArgs-12               501ns ± 1%     498ns ± 0%  -0.55%  (p=0.000 n=19+17)
GobDecode-12                6.24ms ± 0%    6.25ms ± 1%    ~     (p=0.113 n=20+19)
GobEncode-12                5.33ms ± 0%    5.29ms ± 1%  -0.72%  (p=0.000 n=20+18)
Gzip-12                      220ms ± 1%     218ms ± 1%  -1.02%  (p=0.000 n=19+19)
Gunzip-12                   35.5ms ± 0%    35.7ms ± 0%  +0.45%  (p=0.000 n=16+18)
HTTPClientServer-12         77.9µs ± 1%    77.7µs ± 1%  -0.30%  (p=0.047 n=20+19)
JSONEncode-12               8.82ms ± 0%    8.93ms ± 0%  +1.20%  (p=0.000 n=18+17)
JSONDecode-12               47.3ms ± 0%    47.0ms ± 0%  -0.49%  (p=0.000 n=17+18)
Mandelbrot200-12            3.69ms ± 0%    3.68ms ± 0%  -0.25%  (p=0.000 n=19+18)
GoParse-12                  3.13ms ± 1%    3.13ms ± 1%    ~     (p=0.640 n=20+20)
RegexpMatchEasy0_32-12      76.2ns ± 1%    76.2ns ± 1%    ~     (p=0.818 n=20+19)
RegexpMatchEasy0_1K-12       226ns ± 0%     226ns ± 0%  -0.22%  (p=0.001 n=17+18)
RegexpMatchEasy1_32-12      71.9ns ± 1%    72.0ns ± 1%    ~     (p=0.653 n=18+18)
RegexpMatchEasy1_1K-12       355ns ± 1%     356ns ± 1%    ~     (p=0.160 n=18+19)
RegexpMatchMedium_32-12      106ns ± 1%     106ns ± 1%    ~     (p=0.325 n=17+20)
RegexpMatchMedium_1K-12     31.1µs ± 2%    31.2µs ± 0%  +0.59%  (p=0.007 n=19+15)
RegexpMatchHard_32-12       1.54µs ± 2%    1.53µs ± 2%  -0.78%  (p=0.021 n=17+18)
RegexpMatchHard_1K-12       46.0µs ± 1%    45.9µs ± 1%  -0.31%  (p=0.025 n=17+19)
Revcomp-12                   391ms ± 1%     394ms ± 2%  +0.80%  (p=0.000 n=17+19)
Template-12                 59.9ms ± 1%    59.9ms ± 1%    ~     (p=0.428 n=20+19)
TimeParse-12                 304ns ± 1%     312ns ± 0%  +2.88%  (p=0.000 n=20+17)
TimeFormat-12                318ns ± 0%     326ns ± 0%  +2.64%  (p=0.000 n=20+17)

(https://perf.golang.org/search?q=upload:20180928.2)

Change-Id: I336b9bf054113580a24103192904c8c76593e90e
Reviewed-on: https://go-review.googlesource.com/c/138958
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2018-11-15 19:27:09 +00:00
Michael Anthony Knyszek 44dcb5cb61 runtime: clean up MSpan* MCache* MCentral* in docs
This change cleans up references to MSpan, MCache, and MCentral in the
docs via a bunch of sed invocations to better reflect the Go names for
the equivalent structures (i.e. mspan, mcache, mcentral) and their
methods (e.g. MSpan_Sweep -> mspan.sweep).

Change-Id: Ie911ac975a24bd25200a273086dd835ab78b1711
Reviewed-on: https://go-review.googlesource.com/c/147557
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-11-05 22:56:22 +00:00
Austin Clements 1d09433ec0 runtime: undo manual inlining of mbits.setMarked
Since atomic.Or8 is now an intrinsic (and has been for some time),
markBits.setMarked is inlinable. Undo the manual inlining of it.
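
For context, a sketch of what the previously hand-inlined code collapses
back into (the struct layout mirrors the runtime's markBits and should be
read as an assumption; "atomic" here is runtime/internal/atomic, whose
Or8 the compiler intrinsifies):

    // markBits addresses a single mark bit: the byte holding it and a mask
    // selecting the bit within that byte.
    type markBits struct {
        bytep *uint8
        mask  uint8
        index uintptr
    }

    // setMarked sets the object's mark bit with one atomic OR, which is small
    // enough to inline at call sites now that Or8 is an intrinsic.
    func (m markBits) setMarked() {
        atomic.Or8(m.bytep, m.mask)
    }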

Change-Id: I8e37ccf0851ad1d3088d9c8ae0f6f0c439d7eb2d
Reviewed-on: https://go-review.googlesource.com/c/138659
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-09 16:44:45 +00:00
Keith Randall cbafcc55e8 cmd/compile,runtime: implement stack objects
Rework how the compiler+runtime handles stack-allocated variables
whose address is taken.

Direct references to such variables work as before. References through
pointers, however, use a new mechanism. The new mechanism is more
precise than the old "ambiguously live" mechanism. It computes liveness
at runtime based on the actual references among objects on the stack.

Each function records all of its address-taken objects in a FUNCDATA.
These are called "stack objects". The runtime then uses that
information while scanning a stack to find all of the stack objects on
a stack. It then does a mark phase on the stack objects, using all the
pointers found on the stack (and ancillary structures, like defer
records) as the root set. Only stack objects which are found to be
live during this mark phase will be scanned and thus retain any heap
objects they point to.

A subsequent CL will remove all the "ambiguously live" logic from
the compiler, so that the stack object tracing will be required.
For this CL, the stack tracing is all redundant with the current
ambiguously live logic.

Update #22350

Change-Id: Ide19f1f71a5b6ec8c4d54f8f66f0e9a98344772f
Reviewed-on: https://go-review.googlesource.com/c/134155
Reviewed-by: Austin Clements <austin@google.com>
2018-10-03 19:52:49 +00:00
Austin Clements 0906d648aa runtime: eliminate gchelper mechanism
Now that we do no mark work during mark termination, we no longer need
the gchelper mechanism.

Updates #26903.
Updates #17503.

Change-Id: Ie94e5c0f918cfa047e88cae1028fece106955c1b
Reviewed-on: https://go-review.googlesource.com/c/134785
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:38 +00:00
Austin Clements 550dfc8ae1 runtime: eliminate work.markrootdone and second root marking pass
Before STW and concurrent GC were unified, there could be either one
or two root marking passes per GC cycle. There were several tasks we
had to make sure happened once and only once (whether that was at the
beginning of concurrent mark for concurrent GC or during mark
termination for STW GC). We kept track of this in work.markrootdone.

Now that STW and concurrent GC both use the concurrent marking code
and we've eliminated all work done by the second root marking pass, we
only ever need a single root marking pass. Hence, we can eliminate
work.markrootdone and all of the code that's conditional on it.

Updates #26903.

Change-Id: I654a0f5e21b9322279525560a31e64b8d33b790f
Reviewed-on: https://go-review.googlesource.com/c/134784
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:37 +00:00
Austin Clements 873bd47dfb runtime: flush mcaches lazily
Currently, all mcaches are flushed during STW mark termination as a
root marking job. This is currently necessary because all spans must
be out of these caches before sweeping begins to avoid races with
allocation and to ensure the spans are in the state expected by
sweeping. We do it as a root marking job because mcache flushing is
somewhat expensive and O(GOMAXPROCS) and this parallelizes the work
across the Ps. However, it's also the last remaining root marking job
performed during mark termination.

This CL moves mcache flushing out of mark termination and performs it
lazily. We keep track of the last sweepgen at which each mcache was
flushed and as each P is woken from STW, it observes that its mcache
is out-of-date and flushes it.

This introduces a complication for spans cached in stale mcaches. These
may now be observed by background or proportional sweeping or when
attempting to add a finalizer, but aren't in a stable state. For
example, they are likely to be on the wrong mcentral list. To fix
this, this CL extends the sweepgen protocol to also capture whether a
span is cached and, if so, whether or not its cache is stale. This
protocol blocks asynchronous sweeping from touching cached spans and
makes it the responsibility of mcache flushing to sweep the flushed
spans.
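
For reference, the extended sweepgen states look roughly like this
(paraphrasing the protocol as documented on mspan.sweepgen; treat the
exact encoding as an assumption):

    // For a span s and the heap's current h.sweepgen (incremented by 2 per GC):
    //
    //     s.sweepgen == h.sweepgen - 2  span needs sweeping
    //     s.sweepgen == h.sweepgen - 1  span is currently being swept
    //     s.sweepgen == h.sweepgen      span is swept and ready to use
    //     s.sweepgen == h.sweepgen + 1  span was cached before sweeping began,
    //                                   is still cached, and needs sweeping
    //     s.sweepgen == h.sweepgen + 3  span was swept and then cached, and
    //                                   is still cached
    //
    // The +1/+3 states keep asynchronous sweepers away from cached spans and
    // make mcache flushing responsible for sweeping them.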

This eliminates the last mark termination root marking job, which
means we can now eliminate that entire infrastructure.

Updates #26903. This implements lazy mcache flushing.

Change-Id: Iadda7aabe540b2026cffc5195da7be37d5b4125e
Reviewed-on: https://go-review.googlesource.com/c/134783
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:35 +00:00
Austin Clements 457c8f4fe9 runtime: eliminate blocking GC work drains
Now work.helperDrainBlock is always false, so we can remove it and
code paths that only ran when it was true. That means we no longer use
the gcDrainBlock mode of gcDrain, so we can eliminate that. That means
we no longer use gcWork.get, so we can eliminate that. That means we
no longer use getfull, so we can eliminate that.

Updates #26903. This is a follow-up to unifying STW GC and concurrent GC.

Change-Id: I8dbcf8ce24861df0a6149e0b7c5cd0eadb5c13f6
Reviewed-on: https://go-review.googlesource.com/c/134782
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:34 +00:00
Austin Clements 198440cc3d runtime: remove GODEBUG=gcrescanstacks=1 mode
Currently, setting GODEBUG=gcrescanstacks=1 enables a debugging mode
where the garbage collector re-scans goroutine stacks during mark
termination. This was introduced in Go 1.8 to debug the hybrid write
barrier, but I don't think we ever used it.

Now it's one of the last sources of mark work during mark termination.
This CL removes it.

Updates #26903. This is preparation for unifying STW GC and concurrent
GC.

Updates #17503.

Change-Id: I6ae04d3738aa9c448e6e206e21857a33ecd12acf
Reviewed-on: https://go-review.googlesource.com/c/134777
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:27 +00:00
Austin Clements e25ef35254 runtime: don't disable GC work caching during mark termination
Currently, we disable GC work caching during mark termination. This is
no longer necessary with the new mark completion detection because

1. There's no way for any of the GC mark termination helpers to have
any real work queued and,

2. Mark termination has to explicitly flush every P's buffers anyway
in order to flush Ps that didn't run a GC mark termination helper.

Hence, remove the code that disposes gcWork buffers during mark
termination.

Updates #26903. This is a follow-up to eliminating mark 2.

Change-Id: I81f002ee25d5c10f42afd39767774636519007f9
Reviewed-on: https://go-review.googlesource.com/c/134320
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:23 +00:00
Austin Clements d398dbdfc3 runtime: eliminate gcBlackenPromptly mode
Now that there is no mark 2 phase, gcBlackenPromptly is no longer
used.

Updates #26903. This is a follow-up to eliminating mark 2.

Change-Id: Ib9c534f21b36b8416fcf3cab667f186167b827f8
Reviewed-on: https://go-review.googlesource.com/c/134319
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:21 +00:00
Austin Clements 9c634ea889 runtime: flush write barrier buffer to create work
Currently, if the gcWork runs out of work, we'll fall out of the GC
worker, even though flushing the write barrier buffer could produce
more work. While this is not a correctness issue, it can lead to
premature mark 2 or mark termination.

Fix this by flushing the write barrier buffer if the local gcWork runs
out of work and then checking the local gcWork again.

This reduces the number of premature mark terminations during all.bash
by about a factor of 10.

Updates #26903. This is preparation for eliminating mark 2.

Change-Id: I48577220b90c86bfd28d498e8603bc379a8cd617
Reviewed-on: https://go-review.googlesource.com/c/134315
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:15 +00:00
Austin Clements 5a8c11ce3e runtime: rename _MSpan* constants to mSpan*
We already aliased mSpanInUse to _MSpanInUse. The dual constants are
getting annoying, so fix all of these to use the mSpan* naming
convention.

This was done automatically with:
  sed -i -re 's/_?MSpan(Dead|InUse|Manual|Free)/mSpan\1/g' *.go
plus deleting the existing definition of mSpanInUse.

Change-Id: I09979d9d491d06c10689cea625dc57faa9cc6767
Reviewed-on: https://go-review.googlesource.com/137875
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-26 20:51:07 +00:00
Austin Clements 083938df14 runtime: use gList for injectglist
Change-Id: Id5af75eaaf41f43bc6baa6d3fe2b852a2f93bb6f
Reviewed-on: https://go-review.googlesource.com/129400
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-08-20 18:19:28 +00:00
Austin Clements 8e8cc9db0f runtime: use gList for gfree lists
Change-Id: I3d21587e02264fe5da1cc38d98779facfa09b927
Reviewed-on: https://go-review.googlesource.com/129398
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-08-20 18:19:26 +00:00
Austin Clements 3578918b66 runtime: replace manually managed G dequeues with a type
There are two manually managed G dequeues. Abstract these both into a
shared gQueue type. This also introduces a gList type, which we'll use
to replace several manually-managed G lists in follow-up CLs.

This makes the code more readable and maintainable. gcFlushBgCredit in
particular becomes much easier to follow. It also makes it easier to
introduce more G queues in the future. Finally, the gList type clearly
distinguishes between lists of Gs and individual Gs; currently both
are represented by a *g, which can easily lead to confusion and bugs.

Change-Id: Ic7798841b405d311fc8b6aa5a958ffa4c7993c6c
Reviewed-on: https://go-review.googlesource.com/129396
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-08-20 18:19:22 +00:00
Austin Clements 3080b7d0af runtime: unify fetching of locals and arguments maps
Currently we have two nearly identical copies of the code that fetches
the locals and arguments liveness maps for a frame, plus a third
that's a poor knock-off. Unify these all into a single function.

Change-Id: Ibce7926a0b0e3d23182112da4e25df899579a585
Reviewed-on: https://go-review.googlesource.com/109698
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-14 00:20:16 +00:00
Josh Bleecher Snyder 4d7cf3fedb runtime: convert g.waitreason from string to uint8
Every time I poke at #14921, the g.waitreason string
pointer writes show up.

They're not particularly important performance-wise,
but it'd be nice to clear the noise away.

And it does open up a few extra bytes in the g struct
for some future use.

This is a re-roll of CL 99078, which was rolled
back because of failures on s390x.
Those failures were apparently due to an old version of gdb.

Change-Id: Icc2c12f449b2934063fd61e272e06237625ed589
Reviewed-on: https://go-review.googlesource.com/111256
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2018-05-03 17:04:22 +00:00
Josh Bleecher Snyder d91e9705f8 runtime: avoid unnecessary scanblock calls
This is the scanstack analog of CL 104737,
which made a similar change for copystack.

name         old time/op  new time/op  delta
ScanStack-8  41.1ms ± 6%  38.9ms ± 5%  -5.52%  (p=0.000 n=50+48)

Change-Id: I7427151dea2895ed3934f8a0f61d96b568019217
Reviewed-on: https://go-review.googlesource.com/105536
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-01 13:33:17 +00:00
Austin Clements 0fd427fda7 runtime: use entry stack map at function entry
Currently, when the runtime looks up the stack map for a frame, it
uses frame.continpc - 1 unless continpc is the function entry PC, in
which case it uses frame.continpc. As a result, if continpc is the
function entry point (which happens for deferred frames), it will
actually look up the stack map *following* the first instruction.

I think, though I am not positive, that this is always okay today
because the first instruction of a function can never change the stack
map. It's usually not a CALL, so it doesn't have PCDATA. Or, if it is
a CALL, it has to have the entry stack map.

But we're about to start emitting stack maps at every instruction that
changes them, which means the first instruction can have PCDATA
(notably, in leaf functions that don't have a prologue).

To prepare for this, tweak how the runtime looks up stack map indexes
so that if continpc is the function entry point, it directly uses the
entry stack map.

For #24543.

Change-Id: I85aa818041cd26aff416f7b1fba186e9c8ca6568
Reviewed-on: https://go-review.googlesource.com/109349
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-04-29 00:03:04 +00:00
Josh Bleecher Snyder 6cb064c9c4 Revert "runtime: convert g.waitreason from string to uint8"
This reverts commit 4eea887fd4.

Reason for revert: broke s390x build

Change-Id: Id6c2b6a7319273c4d21f613d4cdd38b00d49f847
Reviewed-on: https://go-review.googlesource.com/100375
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-03-13 15:21:21 +00:00
Josh Bleecher Snyder 4eea887fd4 runtime: convert g.waitreason from string to uint8
Every time I poke at #14921, the g.waitreason string
pointer writes show up.

They're not particularly important performance-wise,
but it'd be nice to clear the noise away.

And it does open up a few extra bytes in the g struct
for some future use.

Change-Id: I7ffbd52fbc2a286931a2218038fda52ed6473cc9
Reviewed-on: https://go-review.googlesource.com/99078
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-03-12 21:56:50 +00:00
Jerrin Shaji George 5b3cd56038 runtime: fix a few typos in comments
Change-Id: I07a1eb02ffc621c5696b49491181300bf411f822
Reviewed-on: https://go-review.googlesource.com/96475
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-23 00:17:20 +00:00
Austin Clements 45ffeab549 runtime: eliminate most uses of mheap_.arena_*
This replaces all uses of the mheap_.arena_* fields outside of
mallocinit and sysAlloc. These fields fundamentally assume a
contiguous heap between two bounds, so eliminating these is necessary
for a sparse heap.

Many of these are replaced with checks for non-nil spans at the test
address (which in turn checks for a non-nil entry in the heap arena
array). Some of them are just for debugging and somewhat meaningless
with a sparse heap, so those we just delete.

Updates #10460.

Change-Id: I8345b95ffc610aed694f08f74633b3c63506a41f
Reviewed-on: https://go-review.googlesource.com/85886
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:22 +00:00
Austin Clements 4de468621a runtime: use spanOf* more widely
The logic in the spanOf* functions is open-coded in a lot of places
right now. Replace these with calls to the spanOf* functions.

Change-Id: I3cc996aceb9a529b60fea7ec6fef22008c012978
Reviewed-on: https://go-review.googlesource.com/85880
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:15 +00:00
Austin Clements 058bb7ea27 runtime: split object finding out of heapBitsForObject
heapBitsForObject does two things: it finds the base of the object and
it creates the heapBits for the base of the object. There are several
places where we just care about the base of the object. Furthermore,
greyobject only needs the heapBits in the checkmark path and can
easily compute them only when needed. Once we eliminate passing the
heap bits to greyobject, almost all uses of heapBitsForObject don't
need the heap bits.

Hence, this splits heapBitsForObject into findObject and
heapBitsForAddr (the latter already exists), removes the hbits
argument to greyobject, and replaces all heapBitsForObject calls with
calls to findObject.
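
In outline, the call-site change looks like this (identifiers follow the
runtime; the snippet is illustrative rather than the exact diff):

    // Before: base, hbits, s, objIndex := heapBitsForObject(p, refBase, refOff)
    // After:
    base, s, objIndex := findObject(p, refBase, refOff)
    if s != nil && useCheckmark {
        // Only the checkmark path still needs the bitmap, so compute it on
        // demand instead of on every lookup.
        hbits := heapBitsForAddr(base)
        _ = hbits
    }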

In addition to making things cleaner overall, heapBitsForAddr is going
to get more expensive shortly, so it's important that we don't do it
needlessly.

Note that there's an interesting performance pitfall here. I had
originally moved findObject to mheap.go, since it made more sense
there. However, that leads to a ~2% slowdown and a whopping 11%
increase in L1 icache misses on both the x/garbage and compilebench
benchmarks. This suggests we may want to be more principled about
this, but, for now, let's just leave findObject in mbitmap.go.

(I tried to make findObject small enough to inline by splitting out
the error case, but, sadly, wasn't quite able to get it under the
inlining budget.)

Change-Id: I7bcb92f383ade565d22a9f2494e4c66fd513fb10
Reviewed-on: https://go-review.googlesource.com/85878
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:13 +00:00
Austin Clements e9079a69f3 runtime: buffered write barrier implementation
This implements runtime support for buffered write barriers on amd64.
The buffered write barrier has a fast path that simply enqueues
pointers in a per-P buffer. Unlike the current write barrier, this
fast path is *not* a normal Go call and does not require the compiler
to spill general-purpose registers or put arguments on the stack. When
the buffer fills up, the write barrier takes the slow path, which
spills all general purpose registers and flushes the buffer. We don't
allow safe-points or stack splits while this frame is active, so it
doesn't matter that we have no type information for the spilled
registers in this frame.
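
A toy model of the fast path/slow path split (buffer size, types, and the
flush callback are invented for the example, not the runtime's values):

    package main

    import "fmt"

    // wbBuf is a toy per-P write barrier buffer: the fast path only appends a
    // pointer; the slow path hands the whole batch to the marker and resets.
    type wbBuf struct {
        buf  []uintptr
        next int
    }

    func newWbBuf(size int) *wbBuf { return &wbBuf{buf: make([]uintptr, size)} }

    // record is the fast path: one store and a bounds check, with no mark work
    // done inline and no GC state spilled at the write site.
    func (b *wbBuf) record(ptr uintptr, flush func([]uintptr)) {
        b.buf[b.next] = ptr
        b.next++
        if b.next == len(b.buf) {
            flush(b.buf[:b.next]) // slow path: flush the buffer
            b.next = 0
        }
    }

    func main() {
        marked := 0
        b := newWbBuf(4) // a cgocheck-style debug mode would use a minimal size
        for p := uintptr(0x1000); p < 0x1000+10*8; p += 8 {
            b.record(p, func(batch []uintptr) { marked += len(batch) })
        }
        fmt.Println("flushed pointers:", marked) // 8 flushed, 2 still buffered
    }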

One minor complication is cgocheck=2 mode, which uses the write
barrier to detect Go pointers being written to non-Go memory. We
obviously can't buffer this, so instead we set the buffer to its
minimum size, forcing the write barrier into the slow path on every
call. For this specific case, we pass additional information as
arguments to the flush function. This also requires enabling the cgo
write barrier slightly later during runtime initialization, after Ps
(and the per-P write barrier buffers) have been initialized.

The code in this CL is not yet active. The next CL will modify the
compiler to generate calls to the new write barrier.

This reduces the average cost of the write barrier by roughly a factor
of 4, which will pay for the cost of having it enabled more of the
time after we make the GC pacer less aggressive. (Benchmarks will be
in the next CL.)

Updates #14951.
Updates #22460.

Change-Id: I396b5b0e2c5e5c4acfd761a3235fd15abadc6cb1
Reviewed-on: https://go-review.googlesource.com/73711
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-10-30 18:12:44 +00:00
Austin Clements 3beaf26e4f runtime: remove write barriers from newstack, gogo
Currently, newstack and gogo have write barriers for maintaining the
context register saved in g.sched.ctxt. This is troublesome, because
newstack can be called from go:nowritebarrierrec places that can't
allow write barriers. It happens to be benign because g.sched.ctxt
will always be nil on entry to newstack *and* it so happens the
incoming ctxt will also always be nil in these contexts (I
think/hope), but this is playing with fire. It's also desirable to
mark newstack go:nowritebarrierrec to prevent any other, non-benign
write barriers from creeping in, but we can't do that right now
because of this one write barrier.

Fix all of this by observing that g.sched.ctxt is really just a saved
live pointer register. Hence, we can shade it when we scan g's stack
and otherwise move it back and forth between the actual context
register and g.sched.ctxt without write barriers. This means we can
save it in morestack along with all of the other g.sched, eliminate
the save from newstack along with its troublesome write barrier, and
eliminate the shenanigans in gogo to invoke the write barrier when
restoring it.

Once we've done all of this, we can mark newstack
go:nowritebarrierrec.

Fixes #22385.
For #22460.

Change-Id: I43c24958e3f6785b53c1350e1e83c2844e0d1522
Reviewed-on: https://go-review.googlesource.com/72553
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-29 17:56:08 +00:00