Under the scavenge lock it's possible to ready a goroutine (or now
injectglist, which has mostly the same effect) which could cause an
unpark trace event to be emitted. If there's no active trace buffer for
the P, then we might acquire the trace buffer lock. The total order between
these two locks is correct, but there's no partial order edge between them. Add in the
edge.
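A hypothetical sketch of what the added edge looks like, assuming the lock-rank machinery described later in this log (rank names follow runtime/lockrank.go; the table shape here is illustrative, not the real code):

    package sketch

    type lockRank int

    const (
        lockRankSysmon lockRank = iota
        lockRankScavenge
        lockRankTraceBuf
    )

    // lockPartialOrder[r] lists the ranks that may legally be held
    // when acquiring a lock of rank r. This change adds the scavenge
    // lock to the trace buffer lock's allowed predecessors.
    var lockPartialOrder = map[lockRank][]lockRank{
        lockRankTraceBuf: {lockRankSysmon, lockRankScavenge},
    }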
Change-Id: I3fc5d86a3b6bdd0b5648181fb76b5ebc90c3d69f
Reviewed-on: https://go-review.googlesource.com/c/go/+/231197
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dan Scales <danscales@google.com>
This change modifies the semantics of waking the scavenger: rather than
wake on any update to pacing, wake when we know we will have work to do,
that is, when the sweeper is done. The current scavenger runs over the
address space just once per GC cycle, and we want to maximize the chance
that the scavenger observes the most attractive scavengable memory in
that pass (i.e. free memory with the highest address), so the timing is
important. By having the scavenger awaken and reset its search space
when the sweeper is done, we increase the chance that the scavenger will
observe the most attractive scavengable memory, because no more memory
will be freed that GC cycle (so the highest scavengable address should
now be available).
Furthermore, in applications that go idle, this means the background
scavenger will be awoken even if another GC doesn't happen, which isn't
true today.
However, we're unable to wake the scavenger directly from within the
sweeper; waking the scavenger involves modifying timers and readying
goroutines, the latter of which may trigger an allocation today (and the
sweeper may run during allocation!). Instead, we do the following:
1. Set a flag which is checked by sysmon. sysmon will clear the flag and
wake the scavenger.
2. Wake the scavenger unconditionally at sweep termination.
The idea behind this policy is that it gets us close enough to the state
above without having to deal with the complexity of waking the scavenger
in deep parts of the runtime. If the application goes idle and sweeping
finishes (so we don't reach sweep termination), then sysmon will wake
the scavenger. sysmon has a worst-case 20 ms delay in responding to this
signal, which is probably fine if the application is completely idle
anyway, but if the application is actively allocating, then the
proportional sweeper should help ensure that sweeping ends very close to
sweep termination, so sweep termination is a perfectly reasonable time
to wake up the scavenger.
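A minimal sketch of this policy, with illustrative names (the real runtime uses its own atomics and plumbing):

    package sketch

    import "sync/atomic"

    // Set by the sweeper when sweeping completes; the sweeper must not
    // ready goroutines itself because it may run during allocation.
    var scavengerWake uint32

    func lastSweepDone() { atomic.StoreUint32(&scavengerWake, 1) }

    // Checked by sysmon on each tick (worst-case ~20ms later) and,
    // unconditionally, at sweep termination.
    func maybeWakeScavenger(wake func()) {
        if atomic.CompareAndSwapUint32(&scavengerWake, 1, 0) {
            wake()
        }
    }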
Updates #35788.
Change-Id: I84289b37816a7d595d803c72a71b7f5c59d47e6b
Reviewed-on: https://go-review.googlesource.com/c/go/+/207998
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change adds two bits of logic to the scavenger's pacing. Firstly,
it checks that we scavenged at least one physical page whenever we
released a non-zero amount of memory. If we try to release less than one
physical page, most systems will release the whole page, which could
lead to memory corruption down the road, so failing this check is a
signal that we're in this situation.
Secondly, the scavenger's pacing logic now checks to see if the time a
scavenging operation takes is measured to be exactly zero or negative.
The exact zero case can happen if time update granularity is too large
to effectively capture the time the scavenging operation took, like on
Windows where the OS timer frequency is generally 1ms. The negative case
should not happen, but we're being defensive (against kernel bugs, bugs
in the runtime, etc.). If either of these cases happen, we fall back to
Go 1.13 behavior: assume the scavenge operation took around 10µs per
physical page. We ignore huge pages in this case because we're in
unknown territory, so we choose to be conservative about pacing (huge
pages could only increase the rate of scavenging).
Currently, the scavenger is broken on Windows because the granularity of
time measurement is around 1 ms, which is too coarse to measure how fast
we're scavenging, so we often end up with a scavenging time of zero,
followed by NaNs and garbage values in the pacing logic, which usually
leads to the scavenger sleeping forever.
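A sketch of the fallback, under assumed names and constants (the ~10µs-per-page figure is from the text; physPageSize stands in for the platform's value):

    package sketch

    const physPageSize = 4096 // illustrative

    // scavengeCost returns the time (ns) to charge for a scavenge that
    // released 'released' bytes and measured 'elapsed' ns.
    func scavengeCost(elapsed int64, released uintptr) float64 {
        if elapsed <= 0 {
            // Timer too coarse (e.g. ~1ms granularity on Windows) or a
            // defensive negative case: assume ~10µs per physical page,
            // ignoring huge pages to stay conservative.
            return float64(released/physPageSize) * 10e3
        }
        return float64(elapsed)
    }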
Fixes #38617.
Change-Id: Iaaa2a4cbb21338e1258d010f7362ed58b7db1af7
Reviewed-on: https://go-review.googlesource.com/c/go/+/229997
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Austin Clements <austin@google.com>
All five calls to wakep are protected by the same check of nmidle and
nmspinning. Move this check into wakep.
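A sketch of the refactor, with stand-ins for the scheduler fields (the real check lives in runtime/proc.go):

    package sketch

    import "sync/atomic"

    var (
        npidle     int32 // idle Ps (stand-in)
        nmspinning int32 // spinning Ms (stand-in)
    )

    // wakep now performs the guard itself instead of each caller:
    // only wake an M if there is an idle P and nobody is spinning.
    func wakep(startm func()) {
        if atomic.LoadInt32(&npidle) == 0 || atomic.LoadInt32(&nmspinning) != 0 {
            return
        }
        startm()
    }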
Change-Id: I2094eec211ce551e462e87614578f37f1896ba38
Reviewed-on: https://go-review.googlesource.com/c/go/+/230757
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Currently, we emit stack maps and register maps at almost every
instruction. This was originally intended to support non-cooperative
preemption, but was only ever used for debug call injection. Now debug
call injection also uses conservative frame scanning. As a result,
stack maps are only needed at call sites and register maps aren't
needed at all except that we happen to also encode unsafe-point
information in the register map PCDATA stream.
This CL reduces stack maps to only appear at calls, and replaces full
register maps with just safe/unsafe-point information.
This is all protected by the go115ReduceLiveness feature flag, which
is defined in both runtime and cmd/compile.
This CL significantly reduces binary sizes and also speeds up compiles
and links:
name old exe-bytes new exe-bytes delta
BinGoSize 15.0MB ± 0% 14.1MB ± 0% -5.72%
name old pcln-bytes new pcln-bytes delta
BinGoSize 3.14MB ± 0% 2.48MB ± 0% -21.08%
name old time/op new time/op delta
Template 178ms ± 7% 172ms ±14% -3.59% (p=0.005 n=19+19)
Unicode 71.0ms ±12% 69.8ms ±10% ~ (p=0.126 n=18+18)
GoTypes 655ms ± 8% 615ms ± 8% -6.11% (p=0.000 n=19+19)
Compiler 3.27s ± 6% 3.15s ± 7% -3.69% (p=0.001 n=20+20)
SSA 7.10s ± 5% 6.85s ± 8% -3.53% (p=0.001 n=19+20)
Flate 124ms ±15% 116ms ±22% -6.57% (p=0.024 n=18+19)
GoParser 156ms ±26% 147ms ±34% ~ (p=0.070 n=19+19)
Reflect 406ms ± 9% 387ms ±21% -4.69% (p=0.028 n=19+20)
Tar 163ms ±15% 162ms ±27% ~ (p=0.370 n=19+19)
XML 223ms ±13% 218ms ±14% ~ (p=0.157 n=20+20)
LinkCompiler 503ms ±21% 484ms ±23% ~ (p=0.072 n=20+20)
ExternalLinkCompiler 1.27s ± 7% 1.22s ± 8% -3.85% (p=0.005 n=20+19)
LinkWithoutDebugCompiler 294ms ±17% 273ms ±11% -7.16% (p=0.001 n=19+18)
(https://perf.golang.org/search?q=upload:20200428.8)
The binary size improvement is even slightly better when you include
the CLs leading up to this. Relative to the parent of "cmd/compile:
mark PanicBounds/Extend as calls":
name old exe-bytes new exe-bytes delta
BinGoSize 15.0MB ± 0% 14.1MB ± 0% -6.18%
name old pcln-bytes new pcln-bytes delta
BinGoSize 3.22MB ± 0% 2.48MB ± 0% -22.92%
(https://perf.golang.org/search?q=upload:20200428.9)
For #36365.
Change-Id: I69448e714f2a44430067ca97f6b78e08c0abed27
Reviewed-on: https://go-review.googlesource.com/c/go/+/230544
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
A debugger can inject a call at almost any PC, which causes
significant complications with stack scanning and growth. Currently,
the runtime solves this using precise stack maps and register maps at
nearly all PCs, but these extra maps require roughly 5% of the binary.
These extra maps were originally considered worth this space because
they were intended to be used for non-cooperative preemption, but are
now used only for debug call injection.
This CL switches from using precise maps to instead using conservative
frame scanning, much like how non-cooperative preemption works. When a
call is injected, the runtime flushes all potential pointer registers
to the stack, and then treats that frame as well as the interrupted
frame conservatively.
The limitation of conservative frame scanning is that we cannot grow
the goroutine stack. That's doable because the previous CL switched to
performing debug calls on a new goroutine, where they are free to grow
the stack.
With this CL, there are no remaining uses of precise register maps
(though we still use the unsafe-point information that's encoded in
the register map PCDATA stream), and stack maps are only used at call
sites.
For #36365.
Change-Id: Ie217b6711f3741ccc437552d8ff88f961a73cee0
Reviewed-on: https://go-review.googlesource.com/c/go/+/229300
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently, when a debugger injects a call, that call happens on the
goroutine where the debugger injected it. However, this requires
significant runtime complexity that we're about to remove.
To prepare for this, this CL switches to a different approach that
leaves the interrupted goroutine parked and runs the debug call on a
new goroutine. When the debug call returns, it resumes the original
goroutine.
This should be essentially transparent to debuggers. It follows the
exact same call injection protocol and ensures the whole protocol
executes indivisibly on a single OS thread. The only difference is
that the current G and stack now change part way through the protocol.
For #36365.
Change-Id: I68463bfd73cbee06cfc49999606410a59dd8f653
Reviewed-on: https://go-review.googlesource.com/c/go/+/229299
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently, newproc1 allocates, initializes, and schedules a new
goroutine. We're about to change debug call injection in a way that
will need to create a new goroutine without immediately scheduling it.
To prepare for that, make scheduling the responsibility of newproc1's
caller. Currently, there's exactly one caller (newproc), so this
simply shifts that responsibility.
For #36365.
Change-Id: Idacd06b63e738982e840fe995d891bfd377ce23b
Reviewed-on: https://go-review.googlesource.com/c/go/+/229298
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Rusage.Maxrss is in bytes on Darwin but in KiB on Linux. Fix this
discrepancy so it's always in bytes.
Change-Id: Ic714abc3276566b8fe5e30565072092212610854
Reviewed-on: https://go-review.googlesource.com/c/go/+/230979
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
The span set data structure may leak blocks due to a race in the logic
to check whether it's safe to free a block. The simplest example of this
race is between two poppers:
1. Popper A claims slot spanSetEntries-2.
2. Popper B claims slot spanSetEntries-1.
3. Popper A gets descheduled before it subtracts from block.used.
4. Popper B subtracts from block.used, sees that it claimed
spanSetEntries-1, but also that block.used != 0, so it returns.
5. Popper A comes back and subtracts from block.used, but it didn't
claim spanSetEntries-1 so it also returns.
The spine is left with a stale block pointer and the block later gets
overwritten by pushes, never to be re-used again.
The problem here is that we designate the claimer of slot
spanSetEntries-1 to be the one who frees the block, but that may not be
the thread that actually does the last subtraction from block.used.
Fixing this problem is tricky, and the fundamental problem there is that
block.used is not stable: it may be observed to be zero, but that
doesn't necessarily mean you're the last popper!
Do something simpler: keep a counter of how many pops have happened to a
given block instead of block.used. This counter monotonically increases
when a pop is _completely done_. Because this counter is monotonically
increasing, and only increases when a popper is done, then we know for
sure whichever popper is the last to increase it (i.e. its value is
spanSetBlockEntries) is also the last popper in the block. Because the
race described above still exists, the last popper may not be the one
which claimed the last slot in the block, but we know for certain nobody
else is popping from that block anymore so we can safely free it.
Finally, because pops serialize with pushes to the same slot, we need
not worry about concurrent pushers at all.
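A sketch of the counter scheme (spanSetBlockEntries matches the shape of the real constant; the rest is illustrative):

    package sketch

    import "sync/atomic"

    const spanSetBlockEntries = 512

    type spanSetBlock struct {
        popped uint32 // pops fully completed on this block
        spans  [spanSetBlockEntries]uintptr
    }

    // finishPop runs only after a popper is completely done with its
    // slot. The popper that drives popped to spanSetBlockEntries is
    // provably the last one touching the block, so it frees the block
    // even if it wasn't the one that claimed the last slot.
    func finishPop(b *spanSetBlock, free func(*spanSetBlock)) {
        if atomic.AddUint32(&b.popped, 1) == spanSetBlockEntries {
            free(b)
        }
    }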
Updates #37487.
Change-Id: I6697219372774c8ca7d8ee6895eaa230a64ce9e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/230497
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently mcaches are flushed to mcentral after a bunch of memstats have
already been read. This is not safe (in the sense that it doesn't ensure
consistent memstats) since memstats may in general change when mcentral
data structures are manipulated.
Note that prior to the new mcentral implementation this was not a
problem because mcentral operations happened to never modify certain
memstats. As of the new mcentral implementation, we might for example
persistentalloc when uncaching a span, which would change memstats. This
can cause a skew between the value of sys (which currently is calculated
before mcaches are flushed) and the value of gc_sys and other_sys.
Fix this by moving mcache flushing to the very top of updatememstats.
Also leave a comment explaining that this must be done first, in
general, because mcentrals make no guarantee that they will not
influence memstats (and doing so would be unnecessarily restrictive).
Fixes #38712.
Change-Id: I15bacb313c54a46e380a945a71bb75db67169c1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/230498
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Currently allocToCache assumes it can move the search address past the
block it allocated the cache from, which violates the property that
searchAddr should always point to mapped memory (i.e. memory represented
by pageAlloc.inUse).
This bug was already fixed once for pageAlloc.alloc in the Go 1.14
release via CL 216697, but that change failed to take into account
allocToCache.
Fixes #38605.
Change-Id: Id08180aa10d19dc0f9f551a1d9e327a295560dff
Reviewed-on: https://go-review.googlesource.com/c/go/+/229577
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Several new ones came from my testing (long, repeated runs) and one (assistQueue ->
spine) came from the staticlockranking builder (filed as issue 38441).
Fixes #38441
Change-Id: I4268da0d8b8cc51251eba6bd936110c8ab4c4e61
Reviewed-on: https://go-review.googlesource.com/c/go/+/229480
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Currently, the small object sweeper will sweep until it finds a free
slot or there are no more spans of that size class to sweep. In dense
heaps, this can cause sweeping for a given size class to take
unbounded time, and gets worse with larger heaps.
This CL limits the small object sweeper to try at most 100 spans
before giving up and allocating a fresh span. Since it's already shown
that 100 spans are completely full at that point, the space overhead
of this fresh span is at most 1%.
This CL is based on an experimental CL by Austin Clements (CL 187817)
and is updated to be part of the mcentral implementation, gated by
go115NewMCentralImpl.
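A sketch of that bounded sweep loop (helper functions are placeholders for the mcentral machinery):

    package sketch

    type span struct{ hasFree bool }

    // cacheSpan sweeps at most 100 spans before giving up; if all 100
    // turned out to be full, a fresh span costs at most ~1% extra space.
    func cacheSpan(popUnswept func() *span, sweep func(*span), grow func() *span) *span {
        for budget := 100; budget > 0; budget-- {
            s := popUnswept()
            if s == nil {
                break // nothing left to sweep
            }
            sweep(s)
            if s.hasFree {
                return s // found a free slot
            }
        }
        return grow() // allocate a fresh span instead
    }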
Updates #18155.
Change-Id: I37a72c2dcc61dd6f802d1d0eac3683e6642b6ef8
Reviewed-on: https://go-review.googlesource.com/c/go/+/229998
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Currently mcentral is implemented as a couple of linked lists of spans
protected by a lock. Unfortunately this design leads to significant lock
contention.
The span ownership model is also confusing and complicated. In-use spans
jump between being owned by multiple sources, generally some combination
of a gcSweepBuf, a concurrent sweeper, an mcentral or an mcache.
So first to address contention, this change replaces those linked lists
with gcSweepBufs which have an atomic fast path. Then, we change up the
ownership model: a span may be simultaneously owned only by an mcentral
and the page reclaimer. Otherwise, an mcentral (which now consists of
sweep bufs), a sweeper, or an mcache are the sole owners of a span at
any given time. This dramatically simplifies reasoning about span
ownership in the runtime.
As a result of this new ownership model, sweeping is now driven by
walking over the mcentrals rather than having its own global list of
spans. Because we no longer have a global list and we traditionally
haven't used the mcentrals for large object spans, we no longer have
anywhere to put large objects. So, this change also makes it so that we
keep large object spans in the appropriate mcentral lists.
In terms of the static lock ranking, we add the spanSet spine locks in
pretty much the same place as the mcentral locks, since they have the
potential to be manipulated both on the allocation and sweep paths, like
the mcentral locks.
This new implementation is turned on by default via a feature flag
called go115NewMCentralImpl.
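A sketch of the resulting mcentral shape (field names approximate runtime/mcentral.go):

    package sketch

    type spanSet struct{ /* lock-free push/pop with an atomic fast path */ }

    type mcentral struct {
        spanclass uint8
        // Two sets each, indexed by sweepgen parity and swapped every
        // GC cycle: one holds swept spans, the other unswept spans.
        partial [2]spanSet // spans with at least one free object
        full    [2]spanSet // spans with no free objects
    }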
Benchmark results for 1 KiB allocation throughput (5 runs each):
name \ MiB/s go113 go114 gotip gotip+this-patch
AllocKiB-1 1.71k ± 1% 1.68k ± 1% 1.59k ± 2% 1.71k ± 1%
AllocKiB-2 2.46k ± 1% 2.51k ± 1% 2.54k ± 1% 2.93k ± 1%
AllocKiB-4 4.27k ± 1% 4.41k ± 2% 4.33k ± 1% 5.01k ± 2%
AllocKiB-8 4.38k ± 3% 5.24k ± 1% 5.46k ± 1% 8.23k ± 1%
AllocKiB-12 4.38k ± 3% 4.49k ± 1% 5.10k ± 1% 10.04k ± 0%
AllocKiB-16 4.31k ± 1% 4.14k ± 3% 4.22k ± 0% 10.42k ± 0%
AllocKiB-20 4.26k ± 1% 3.98k ± 1% 4.09k ± 1% 10.46k ± 3%
AllocKiB-24 4.20k ± 1% 3.97k ± 1% 4.06k ± 1% 10.74k ± 1%
AllocKiB-28 4.15k ± 0% 4.00k ± 0% 4.20k ± 0% 10.76k ± 1%
Fixes#37487.
Change-Id: I92d47355acacf9af2c41bf080c08a8c1638ba210
Reviewed-on: https://go-review.googlesource.com/c/go/+/221182
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change implements the spanSet data structure which is based off of
the gcSweepBuf data structure. While the general idea is the same (one
has two of these which one switches between every GC cycle; one to push
to and one to pop from), there are some key differences.
Firstly, we never have a need to iterate over this data structure, so we
delete numBlocks and block. Secondly, we want to be able to pop from the
front of the structure concurrently with pushes to the back. As a result
we need to maintain both a head and a tail and this change introduces an
atomic headTail structure similar to the one used by sync.Pool. It also
implements popfirst in a similar way.
As a result of this headTail, we need to be able to explicitly reset the
length, head, and tail when it goes empty at the end of sweep
termination, so add a reset method.
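A sketch of the packed head/tail index (analogous to sync.Pool's poolDequeue; names are illustrative):

    package sketch

    import "sync/atomic"

    // Both indices live in one uint64 (head in the high 32 bits) so a
    // popper can observe and advance them atomically together.
    type headTail struct{ v uint64 }

    func (ht *headTail) load() (head, tail uint32) {
        v := atomic.LoadUint64(&ht.v)
        return uint32(v >> 32), uint32(v)
    }

    // incHead claims the frontmost slot for a popper; it fails if the
    // set is empty. Pushers, analogously, advance the tail.
    func (ht *headTail) incHead() (head uint32, ok bool) {
        for {
            v := atomic.LoadUint64(&ht.v)
            hd, tl := uint32(v>>32), uint32(v)
            if hd >= tl {
                return 0, false // empty
            }
            if atomic.CompareAndSwapUint64(&ht.v, v, v+1<<32) {
                return hd, true
            }
        }
    }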
Updates #37487.
Change-Id: I5b8ad290ec32d591e3c8c05e496c5627018074f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/221181
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change adds a global pool of spanSetBlocks to the spanSet data
structure and adds support for eagerly freeing these blocks back to the
pool if the block goes empty.
This change prepares us to use this data structure in more places in the
runtime by allowing reuse of spanSetBlock.
Updates #37487.
Change-Id: I0752226e3667a9e3e1d87c9b66edaedeae1ac23f
Reviewed-on: https://go-review.googlesource.com/c/go/+/221180
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change copies the gcSweepBuf data structure into a new file and
renames it spanSet. It will serve as the basis for a heavily modified
version of the gcSweepBuf data structure for the new mcentral
implementation.
We move it into a separate file now for two reasons:
1. We will need both implementations as they will coexist simultaneously
for a time.
2. By creating it now in a new change, we make future changes which
modify it easier to review (rather than introducing the new file then).
Updates #37487.
Change-Id: If80603cab6e813a1ee2e5ecd49dcde5d8045a6c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/221179
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Since we're sleeping rather than waiting for the goroutines,
let the goroutines run forever.
Fixes #38595
Change-Id: I4cd611fd7565f6e8d91e50c9273d91c514825314
Reviewed-on: https://go-review.googlesource.com/c/go/+/229484
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Converted some Println() statements (used to make sure that certain variables were
kept alive and not optimized out) to assignments into global variables, so the
tests don't produce extraneous output when there is a failure.
Fixes #38594
Change-Id: I7eb41bb02b2b1e78afd7849676b5c85bc11c759c
Reviewed-on: https://go-review.googlesource.com/c/go/+/229538
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Goroutines are directly associated with labels. It's relatively easy to
plumb those through without creating goroutine-locals in the wild.
This is accomplished by splitting out most of the code from the public
`runtime.GoroutineProfile` into a new unexported
`runtime.goroutineProfileWithLabels`, which then has a thin wrapper
linked into the `runtime/pprof` package as
`runtime_goroutineProfileWithLabels`. (mirroring the way labels get
associated with the `g` for a goroutine in the first place)
Per #6104, OS-thread creation profiles are a bit useless, as `M`s tend
to be created by a background goroutine. As such, I decided
not to add support for capturing the labels at `M`-creation-time, since
the stack-traces seem to always come out `nil` for my simple test
binaries.
This change currently provides labels for debug=0 and debug=1, as
debug=2 is currently entirely generated by the runtime package and I
don't see a clean way of getting the `labelMap` type handled properly
within the `runtime` package.
Update the comment added in cl/131275 to mention goroutine support for
labels.
Updates #23458
Change-Id: Ia4b558893d7d10156b77121cd9b70c4ccd9e1889
Reviewed-on: https://go-review.googlesource.com/c/go/+/189318
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Clean up the code a little bit to make it clearer:
Don't check throwsplit for a SI_USER signal.
If throwsplit is set for a SigPanic signal, always throw;
discard any other flags.
Fixes #36420
Change-Id: Ic9dcd1108603d241f71c040504dfdc6e528f9767
Reviewed-on: https://go-review.googlesource.com/c/go/+/228900
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Currently markrootSpans, the scanning routine which scans span specials
(particularly finalizers) as roots, uses sweepSpans to shard work and
find spans to mark.
However, as part of a future CL to change span ownership and how
mcentral works, we want to avoid having markrootSpans use the sweep bufs
to find specials, so in this change we introduce a new mechanism.
Much like for the page reclaimer, we set up a per-page bitmap where the
first page for a span is marked if the span contains any specials, and
unmarked if it has no specials. This bitmap is updated by addspecial,
removespecial, and during sweeping.
markrootSpans then shards this bitmap into mark work and markers iterate
over the bitmap looking for spans with specials to mark. Unlike the page
reclaimer, we don't need to use the pageInUse bits because having a
special implies that a span is in-use.
While in terms of computational complexity this design is technically
worse, because it needs to iterate over the mapped heap, in practice
this iteration is very fast (we can skip over large swathes of the heap
very quickly) and we only look at spans that have any specials at all,
rather than having to touch each span.
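A sketch of the bitmap and its iteration (the chunk granularity and names are illustrative):

    package sketch

    import "math/bits"

    const pagesPerChunk = 512

    // One bit per page; the bit for a span's first page is set iff the
    // span has specials. Updated by addspecial, removespecial, and
    // sweeping.
    type specialsBits [pagesPerChunk / 32]uint32

    func (b *specialsBits) set(page uint)   { b[page/32] |= 1 << (page % 32) }
    func (b *specialsBits) clear(page uint) { b[page/32] &^= 1 << (page % 32) }

    // visit skips whole zero words, which is what makes scanning the
    // mapped heap fast in practice despite the worse complexity.
    func (b *specialsBits) visit(f func(page uint)) {
        for i, w := range b {
            for w != 0 {
                j := uint(bits.TrailingZeros32(w))
                f(uint(i)*32 + j)
                w &^= 1 << j
            }
        }
    }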
This new implementation of markrootSpans is behind a feature flag called
go115NewMarkrootSpans.
Updates #37487.
Change-Id: I8ea07b6c11059f6d412fe419e0ab512d989377b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/221178
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
During schedinit, these lock orderings may occur in:
mProf_Malloc
  stkbucket
    newBucket
      persistentalloc
        persistentalloc1
mProf_Malloc
  setprofilebucket
    fixalloc.alloc
      persistentalloc
        persistentalloc1
These seem to be legitimate lock orderings.
Additionally, mheap.speciallock had a defined rank, but it was never
actually used. That is fixed now.
Updates #38474
Change-Id: I0f6e981852eac66dafb72159f426476509620a65
Reviewed-on: https://go-review.googlesource.com/c/go/+/228786
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dan Scales <danscales@google.com>
This reverts commit 1f0738c157.
Reason for revert: This may have caused issue 38567.
Change-Id: I2afa6a9d42cb29cfad09e706fb465c57e3774abd
Reviewed-on: https://go-review.googlesource.com/c/go/+/229301
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
The core CPU profiling loop contains a 100ms sleep.
This is important to reduce overhead.
However, it means that it takes 200ms to shut down a program
with CPU profiling enabled. When trying to collect many samples
by running a short-lived program many times, this adds up.
This change cuts the shutdown penalty in half by skipping
the sleep whenever possible.
Change-Id: Ic3177f8e1a2d331fe1a1ecd7c8c06f50beb42535
Reviewed-on: https://go-review.googlesource.com/c/go/+/228886
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
At least as far as I can tell, this file never explicitly states whether
locks with higher or lower rank should be taken first. It is implied in
some comments, and clear from the code, of course.
Add an explicit comment to make things more clear and hopefully reduce
the chance of new locks being added in the wrong spot.
Change-Id: I17c6fd5fc216954e5f3550cf91f17e25139f1587
Reviewed-on: https://go-review.googlesource.com/c/go/+/228785
Reviewed-by: Dan Scales <danscales@google.com>
When the seconds param is given, the block and mutex profile endpoints
report the difference between two measurements collected the given
seconds apart. Historically, the block and mutex profiles have reported
the cumulative counts since the process start, and it turned out they
are more useful when interpreted along with the time duration.
Note: cpu profile and trace endpoints already accept the "seconds"
parameter. With this CL, the block and mutex profile endpoints will
accept the "seconds" parameter. Providing the "seconds" parameter
to other types of profiles is an error.
This change moves runtime/pprof/internal/profile to internal/profile and
adds part of merge logic from github.com/google/pprof/profile/merge.go to
internal/profile, in order to allow both net/http/pprof and runtime/pprof
to access it.
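A usage sketch (a hypothetical program; only the query strings come from this change):

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof"
    )

    func main() {
        // With this CL, fetching
        //   /debug/pprof/block?seconds=30
        //   /debug/pprof/mutex?seconds=30
        // reports the delta accumulated over those 30 seconds rather
        // than the cumulative counts since process start.
        log.Fatal(http.ListenAndServe("localhost:6060", nil))
    }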
Fixes #23401
Change-Id: Ie2486f1a63eb8ff210d7d3bc2de683e9335fd5cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/147598
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
A case that I missed in CL 205239: profilealloc can be called at
program startup if GOMAXPROCS is large enough.
Fixes #38474
Change-Id: I2f089fc6ec00c376680e1c0b8a2557b62789dd7f
Reviewed-on: https://go-review.googlesource.com/c/go/+/228420
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
All platforms now support pushCall, hence remove the now unnecessary
pushCallSupported flag/guard.
Change-Id: I99e4be73839da68a742f3c239bae9ce2f8764624
Reviewed-on: https://go-review.googlesource.com/c/go/+/228497
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The Context object we pass to GetThreadContext on Windows must be 16
byte-aligned. We also can't allocate in the contexts where we create
these, so they must be stack-allocated. There's no great way to do
this, but this CL makes the code at least a little clearer, and makes
profilem and preemptM more consistent with each other.
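A sketch of the stack-alignment trick (context stands in for the Windows CONTEXT record):

    package sketch

    import "unsafe"

    type context struct{ regs [128]uint64 } // placeholder layout

    // Over-allocate by 15 bytes on the stack, then round the pointer
    // down from &buf[15] to a 16-byte boundary; the result is both
    // aligned and guaranteed to lie within buf.
    func withAlignedContext(f func(*context)) {
        var buf [unsafe.Sizeof(context{}) + 15]byte
        c := (*context)(unsafe.Pointer(uintptr(unsafe.Pointer(&buf[15])) &^ 15))
        f(c)
    }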
Change-Id: I5ec47a27d7580ed6003030bf953e668e8cae2cef
Reviewed-on: https://go-review.googlesource.com/c/go/+/207967
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
This CL adds support for call injection and async preemption on
riscv64. We also clobber REG_TMP for the injected call; unsafe
points related to REG_TMP access have been marked in previous commits.
Fixes #36711.
Change-Id: I1a1df5b7fc23eaafc34a6a6448fcc3c91054496e
GitHub-Last-Rev: f6110d4707
GitHub-Pull-Request: golang/go#38146
Reviewed-on: https://go-review.googlesource.com/c/go/+/226206
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently, as a result of us releasing worldsema now to allow STW events
during a mark phase, we release worldsema between starting the world and
having the goroutine block in STW mode. This inserts preemption points
which, if followed through, could lead to a deadlock. Specifically,
because user goroutine scheduling is disabled in STW mode, the goroutine
will block before properly releasing worldsema.
The fix here is to prevent preemption while releasing the worldsema.
Fixes #38404.
Updates #19812.
Change-Id: I8ed5b3aa108ab2e4680c38e77b0584fb75690e3d
Reviewed-on: https://go-review.googlesource.com/c/go/+/228337
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Following CL 226818, the compiler will allow inlining a single cycle in
an inline chain. Immediately-recursive functions are still disallowed,
which is what this heuristic refers to.
Add a regression test for this case.
Note that in addition to this check, if the compiler were to inline
multiple cycles via a loop (i.e., rather than appending duplicate code),
much more work would be required here to handle a single address
appearing in multiple different inline frames.
Updates #29737
Change-Id: I88de15cfbeabb9c04381e1c12cc36778623132a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/227346
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dan Scales <danscales@google.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
TestTryAdd is particularly brittle because it tests some real cases by
constructing fake sample stack frames. If those frames don't correctly
represent what the runtime would generate then they may fail to catch
regressions.
Instead, call runtime.Callers at the bottom of real function calls to
generate real frames as a base for truncation, etc in tests. Several of
these tests still have to fake parts of the frames to test the right
thing, but this is a bit less fragile.
This change is equivalent to the original
0dfb0513ec (golang.org/cl/227484), except
that the test skips if the test functions aren't inlined (e.g., noopt
builders).
Change-Id: Ie9e32b5660cfe28a924f9cfcddcd887ea2effd66
Reviewed-on: https://go-review.googlesource.com/c/go/+/227922
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
There's no need for netpollWakeSig to use a uintptr type; a uint32 is enough.
Relevant CL: CL 212737
Change-Id: Ide24478b217a02bad62f7e000a9680c26a8c5366
Reviewed-on: https://go-review.googlesource.com/c/go/+/227798
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Tweak the runtime's GDB python test to try to reduce flake failures.
Background: the intent of the testpoint in question is to make sure
that python-supported commands like "info goroutines" or "goroutine 1
backtrace" work properly. The Go code being run under the debugger as
part of the test is single-threaded, but the test is written assuming
that in addition to the primary goroutine there will be other
background goroutines available (owned by the runtime). The flakiness
seems to crop up the most when requesting a backtrace for one of these
background goroutines; the speculation is that if we catch a
runtime-owned goroutine in an odd state, this could interfere with the
test.
The change in this patch is to explicitly start an additional
goroutine from the main thread, so that when the debugger stops the
main thread we can be sure that there is some other non-main goroutine
in a known state.
This change was authored by Josh Bleecher Snyder <josharian@gmail.com>.
Updates #24616.
Change-Id: I45682323d5898e5187c0adada7c5d117e92f403b
Reviewed-on: https://go-review.googlesource.com/c/go/+/226558
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
For maps with a hint larger than BUCKETSIZE, makemap ignores the
allocated bucket and allocates buckets itself. So do not allocate a
bucket in this case, saving us the cost of zeroing and assigning the bucket.
name old time/op new time/op delta
NewEmptyMap-12 3.89ns ± 4% 3.88ns ± 2% ~ (p=0.939 n=19+20)
NewSmallMap-12 23.3ns ± 3% 23.1ns ± 2% ~ (p=0.307 n=18+17)
NewEmptyMapHintLessThan8-12 6.43ns ± 3% 6.31ns ± 2% -1.72% (p=0.000 n=19+18)
NewEmptyMapHintGreaterThan8-12 159ns ± 2% 150ns ± 1% -5.79% (p=0.000 n=20+18)
Benchmark run with commit ab7c174 reverted, see #38314.
Fixes #20184
Change-Id: Ic021f57454c3a0dd50601d73bbd77b8faf8d93b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/227458
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Some runtime calls accept a slice, but only use ptr and len.
This change modifies most such routines to accept only ptr and len.
After this change, the only runtime calls that accept an unnecessary
cap arg are concatstrings and slicerunetostring.
Neither is particularly common, and both are complicated to modify.
Negligible compiler performance impact. Shrinks binaries a little.
There are only a few regressions; the one I investigated was
due to register allocation fluctuation.
Passes 'go test -race std cmd', modulo #38265 and #38266.
Wow, does that take a long time to run.
Updates #36890
file before after Δ %
compile 19655024 19655152 +128 +0.001%
cover 5244840 5236648 -8192 -0.156%
dist 3662376 3658280 -4096 -0.112%
link 6680056 6675960 -4096 -0.061%
pprof 14789844 14777556 -12288 -0.083%
test2json 2824744 2820648 -4096 -0.145%
trace 11647876 11639684 -8192 -0.070%
vet 8260472 8256376 -4096 -0.050%
total 115163736 115118808 -44928 -0.039%
Change-Id: Idb29fa6a81d6a82bfd3b65740b98cf3275ca0a78
Reviewed-on: https://go-review.googlesource.com/c/go/+/227163
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The offset is always a multiple of the pointer size.
Change-Id: I790e087e89a081044a3ec35d99880533a4c929bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/227540
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Make typedmemmove, typedmemclr, typedmemclrpartial look more like other
callers of bulkBarrierPreWrite.
Change-Id: Ic47030d88bf07d290f91198b7810ffc16d9769e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/227541
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This removes all conditions and conditional code (that I could find)
that depended on darwin/386.
Fixes #37610.
Change-Id: I630d9ea13613fb7c0bcdb981e8367facff250ba0
Reviewed-on: https://go-review.googlesource.com/c/go/+/227582
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This removes all files that are only used on darwin/386 and cleans up
build tags in files that are still used on other platforms.
Updates #37610.
Change-Id: If246642476c12d15f59a474e2b91a29c0c02fe75
Reviewed-on: https://go-review.googlesource.com/c/go/+/227581
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This removes all conditions and conditional code (that I could find)
that depended on darwin/arm.
Fixes #35439 (since that only happened on darwin/arm)
Fixes #37611.
Change-Id: Ia4c32a5a4368ed75231075832b0b5bfb1ad11986
Reviewed-on: https://go-review.googlesource.com/c/go/+/227198
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This removes all files that are only used on darwin/arm and cleans up
build tags in files that are still used on other platforms.
Updates #37611.
Change-Id: Ic9490cf0edfc157c6276a7ca950c1768b34a998f
Reviewed-on: https://go-review.googlesource.com/c/go/+/227197
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TestPingPongHog tests properties of the scheduler.
But the race detector intentionally does randomized scheduling,
so the test is not applicable.
Fixes #38266
Change-Id: Ib06aa317b2776cb1faa641c4e038e2599cf70b2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/227344
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TestTryAdd is particularly brittle because it tests some real cases by
constructing fake sample stack frames. If those frames don't correctly
represent what the runtime would generate then they may fail to catch
regressions.
Instead, call runtime.Callers at the bottom of real function calls to
generate real frames as a base for truncation, etc in tests. Several of
these tests still have to fake parts of the frames to test the right
thing, but this is a bit less fragile.
Change-Id: I62522a9ded5544b06d1bf28550af5400f3af667b
Reviewed-on: https://go-review.googlesource.com/c/go/+/227484
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
This change adds CFA information to the assembly function 'crosscall1'
and reorganizes its code to establish a well-formed prologue and epilogue.
It will fix an infinite callstack issue when debugging cgo program with
GDB on arm64.
Brief root cause analysis:
GDB's aarch64 unwinder parses prologue to determine current frame's size
and previous PC&SP if CFA information is not available.
The unwinder parses the prologue of 'crosscall1' to determine a frame size
of 0x10, then turns to its next frame trying to compute its previous PC&SP
as they are not saved on current frame's stack as per its 'traditional frame
unwind' rules, which ends up getting an endless frame chain like:
[callee] : pc:<pc0>, sp:<sp0>
crosscall1: pc:<pc1>, sp:<sp0>+0x10
[caller] : pc:<pc1>, sp:<sp0>+0x10+0x10
[caller] : pc:<pc1>, sp:<sp0>+0x10+0x10+0x10
...
GDB fails to detect that the 'caller' frame is the same as 'crosscall1',
and never terminates unwinding since SP increases every time.
Fixes #37238
Change-Id: Ia6bd8555828541a3a61f7dc9b94dfa00775ec52a
Reviewed-on: https://go-review.googlesource.com/c/go/+/226999
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
I took some of the infrastructure from Austin's lock logging CR
https://go-review.googlesource.com/c/go/+/192704 (with deadlock
detection from the logs), and developed a setup to give static lock
ranking for runtime locks.
Static lock ranking establishes a documented total ordering among locks,
and then reports an error if the total order is violated. This can
happen if a deadlock happens (by acquiring a sequence of locks in
different orders), or if just one side of a possible deadlock happens.
Lock ordering deadlocks cannot happen as long as the lock ordering is
followed.
Along the way, I found a deadlock involving the new timer code, which Ian fixed
via https://go-review.googlesource.com/c/go/+/207348, as well as two other
potential deadlocks.
See the constants at the top of runtime/lockrank.go to show the static
lock ranking that I ended up with, along with some comments. This is
great documentation of the current intended lock ordering when acquiring
multiple locks in the runtime.
I also added an array lockPartialOrder[] which shows and enforces the
current partial ordering among locks (which is embedded within the total
ordering). This is more specific about the dependencies among locks.
I don't try to check the ranking within a lock class with multiple locks
that can be acquired at the same time (i.e. check the ranking when
multiple hchan locks are acquired).
Currently, I am doing a lockInit() call to set the lock rank of most
locks. Any lock that is not otherwise initialized is assumed to be a
leaf lock (a very high rank lock), so that eliminates the need to do
anything for a bunch of locks (including all architecture-dependent
locks). For two locks, root.lock and notifyList.lock (only in the
runtime/sema.go file), it is not as easy to do lock initialization, so
instead, I am passing the lock rank with the lock calls.
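A sketch of the two initialization styles described above (simplified; the real checks live behind the GOEXPERIMENT guard):

    package sketch

    type lockRank int

    type mutex struct{ rank lockRank }

    // Most locks get a rank once, near creation:
    func lockInit(l *mutex, rank lockRank) { l.rank = rank }

    // Locks that are awkward to initialize (e.g. the sema root and
    // notifyList locks) supply the rank at each lock call instead:
    func lockWithRank(l *mutex, rank lockRank) {
        // check rank against the locks already held, then lock l
    }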
For Windows compilation, I needed to increase the StackGuard size from
896 to 928 because of the new lock-rank checking functions.
Checking of the static lock ranking is enabled by setting
GOEXPERIMENT=staticlockranking before doing a run.
To make sure that the static lock ranking code has no overhead in memory
or CPU when not enabled by GOEXPERIMENT, I changed 'go build/install' so
that it defines a build tag (with the same name) whenever any experiment
has been baked into the toolchain (by checking Expstring()). This allows
me to avoid increasing the size of the 'mutex' type when static lock
ranking is not enabled.
Fixes #38029
Change-Id: I154217ff307c47051f8dae9c2a03b53081acd83a
Reviewed-on: https://go-review.googlesource.com/c/go/+/207619
Reviewed-by: Dan Scales <danscales@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Previously we stopped the timer and then reset it. With the current
timer implementation that is no longer required.
Change-Id: Ie7aba61ad53ce835f6fcd0b6bce7fe0a15b10e24
Reviewed-on: https://go-review.googlesource.com/c/go/+/227180
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
The arguments of the Xchg(64) and Xadd(64) functions need to be loaded
into registers only once.
Change-Id: Iaf0a695ec9c6a221dfa755855edb68c476978a5b
Reviewed-on: https://go-review.googlesource.com/c/go/+/227001
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Update race detector syso files for some platforms.
There's still 2 more to do, but they might take a while so I'm
mailing the ones I have now.
Note: some arm64 tests did not complete successfully due to out
of memory errors, but I suspect the .syso is correct.
Update #14481
Update #37485 (I think?)
Update #37355
Change-Id: I7e7e707a1fd7574855a538ba89dc11acc999c760
Reviewed-on: https://go-review.googlesource.com/c/go/+/226981
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
We can initialize the runtime sigqueue package on first use, so we
don't require an explicit initialization step. Remove it.
Change-Id: I484e02dc2c67395fd5584f35ecda2e28b37168df
Reviewed-on: https://go-review.googlesource.com/c/go/+/226540
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Although duffcopy is not used on PPC64, duff_ppc64x.s and
mkduff.go don't match. Make them match.
Fixes #38188.
Change-Id: Ic6c08e335795ea407880efd449f4229696af7744
Reviewed-on: https://go-review.googlesource.com/c/go/+/226719
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This reverts CL 212767.
Reason for revert: new test is persistently failing on freebsd-arm64-dmgk builder.
Change-Id: Ifd1227628e0e747688ddb4dc580170b2a103a89e
Reviewed-on: https://go-review.googlesource.com/c/go/+/226597
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Use explicit names for the error code returned by pollReset
and pollWait, rather than just 0, 1, 2, 3.
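The resulting names (mirroring the constants added in runtime/netpoll.go):

    package sketch

    // Error codes returned by pollReset and pollWait.
    const (
        pollNoError        = 0 // no error
        pollErrClosing     = 1 // descriptor is closed
        pollErrTimeout     = 2 // I/O timeout
        pollErrNotPollable = 3 // general error polling descriptor
    )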
Change-Id: I0ab12cae57693deab7cca9cdd2fadd597e23a956
Reviewed-on: https://go-review.googlesource.com/c/go/+/226537
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Recently, the gVisor project has needed an instruction's address
aligned to 128 bytes and a function's start address aligned to
2048 bytes to fit the architecture requirement for its
interrupt table.
This patch allows aligning the address of an instruction to a
specific value (2^n and not higher than 2048) and the address
of a function to 2048 bytes.
The main changes include:
1. Adds ALIGN2048 flag to align a function's address with
2048 bytes.
e.g. "TEXT ·Add(SB),NOSPLIT|ALIGN2048" indicates that the
address of Add function should be aligned to 2048 bytes.
2. Adds a new element in the FuncInfo structure defined in
cmd/internal/obj/link.go file to record the alignment
information.
3. Adds a new element in the Func structure defined in
cmd/internal/goobj/read.go file to read the alignment
information.
4. Because go introduces a new object file format, also add
a new element in the FuncInfo structure defined in
cmd/internal/goobj2/funcinfo.go to record the alignment
information.
5. Adds the assembler support to align an instruction's offset
with a specific value (2^n and not higher than 2048).
e.g. "PCALIGN $256" indicates that the next instruction should
be aligned to 256 bytes.
This CL also adds a test.
Change-Id: I31cfa6fb5bc35dee2c44bf65913e90cddfcb492a
Reviewed-on: https://go-review.googlesource.com/c/go/+/212767
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently there are a few sanity checks in the page allocator which
should fail immediately, but because they check for a negative number
on a uint, they're actually dead code.
If there's a bug in the page allocator which would cause the sanity
check to fail, this could cause memory corruption by returning an
invalid address (more precisely, one might either see a segfault, or
span overlap).
This change fixes these sanity checks to check the correct condition.
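The shape of the problem and fix, sketched:

    package sketch

    // Dead code: a uint can never be negative, so this check never
    // fires even when the index has wrapped around.
    func badCheck(idx uint) {
        if idx < 0 {
            panic("bad index")
        }
    }

    // The fix tests the condition in a form that can actually fail,
    // e.g. against the bound before any subtraction that could wrap
    // (illustrative):
    func goodCheck(idx, bound uint) {
        if idx > bound {
            panic("bad index")
        }
    }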
Fixes #38130.
Change-Id: Ia19786cece783d39f26df24dec8788833a6a3f21
Reviewed-on: https://go-review.googlesource.com/c/go/+/226297
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Instead of calling netpollwakeup, just do the write in netpollBreak.
Use the same signaling we now use in other netpollBreak instances.
Change-Id: I53a65c22862ecc8484aee91d0e1ffb21a9e62d8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/226199
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
This reverts CL 225618.
This is causing TestNetpollBreak to fail on AIX more often than not.
Change-Id: Ia3c24041ead4b320202f7f5b17a6b286f639a689
Reviewed-on: https://go-review.googlesource.com/c/go/+/226198
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
netpollBreak calls netpollwakeup, and netpollwakeup expects the mtxpoll
lock to be held, so that it has exclusive access to pendingUpdates.
Not acquiring the lock was a mistake in CL 171824. Fortunately it
rarely matters in practice.
Change-Id: I32962ec2575c846ef3d6a91a4d821b2ff02d983c
Reviewed-on: https://go-review.googlesource.com/c/go/+/225618
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gentraceback generates PCs which usually follow the CALL
instruction. For those that don't, it fixes up the PCs so that
functions processing the output can unconditionally decrement the PC.
runtime_expandInlineFrames does this unconditional decrement when
looking up the function. However, the fake stack frame generated for
overflow records fails to meet the contract, and decrementing the PC
results in a PC in the previous function. If that function contains
inlined call, runtime_expandInlineFrames will not short-circuit and will
panic trying to look up a PC that doesn't exist.
Note that the added test does not fail at HEAD. It will only fail (with
a panic) if the function preceding lostProfileEvent contains inlined
function calls. At the moment (on linux/amd64), that is
runtime/pprof.addMaxRSS, which does not.
Fixes #38096
Change-Id: Iad0819f23c566011c920fd9a5b1254719228da0b
Reviewed-on: https://go-review.googlesource.com/c/go/+/225661
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
There might be concurrent (maybe not truly concurrent, just sequential
within a short time window) duplicate calls to `netpollBreak`, each
trying to wake up a net-poller. If one caller has already called
`netpollBreak` and that wake-up event hasn't yet been received by
epollwait/kevent/..., then subsequent calls of `netpollBreak` ought to
be ignored; in other words, these calls should be converged into one.
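A sketch of the convergence (mirroring the netpollWakeSig pattern referenced by later commits in this log):

    package sketch

    import "sync/atomic"

    var netpollWakeSig uint32

    // Only the caller that flips the flag pays for the wakeup write;
    // duplicate calls become no-ops until the poller consumes the
    // event and clears the flag.
    func netpollBreak(wakeup func()) {
        if atomic.CompareAndSwapUint32(&netpollWakeSig, 0, 1) {
            wakeup() // e.g. write to the wakeup pipe
        }
    }

    func onWakeupConsumed() { atomic.StoreUint32(&netpollWakeSig, 0) }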
Benchmarks go1.13.5 darwin/amd64:
benchmark-func time/op (old) time/op (new) delta
BenchmarkNetpollBreak-4 29668ns ±1% 3131ns ±2% -89.45%
mem/B (old) mem/B (new) delta
154B ±13% 0B ±0% -100%
Change-Id: I3cf757a5d6edc5a99adad7aea3baee4b7f2a8f5c
GitHub-Last-Rev: 15bcfbab8a
GitHub-Pull-Request: golang/go#36294
Reviewed-on: https://go-review.googlesource.com/c/go/+/212737
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Currently, the capping logic for the GC trigger ratio is such that if
gcpercent is low, we may end up setting the trigger ratio far too high,
breaking the promise that SetGCPercent and GOGC act as a trade-off knob (we
won't start a GC early enough, and we will use more memory).
This change modifies the capping logic for the trigger ratio by scaling
the minTriggerRatio with gcpercent the same way we scale
maxTriggerRatio.
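A sketch of the capping (the 0.95 and 0.6 factors are assumptions standing in for the runtime's max/min scaling constants):

    package sketch

    // capTriggerRatio clamps the trigger ratio; with this change both
    // bounds scale with gcpercent (GOGC) rather than only the maximum.
    func capTriggerRatio(triggerRatio float64, gcpercent int32) float64 {
        if gcpercent >= 0 {
            scale := float64(gcpercent) / 100
            if hi := 0.95 * scale; triggerRatio > hi {
                triggerRatio = hi
            }
            if lo := 0.6 * scale; triggerRatio < lo {
                triggerRatio = lo // previously a fixed minimum
            }
        }
        return triggerRatio
    }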
Fixes #37927.
Change-Id: I2a048c1808fb67186333d3d5a6bee328be2f35da
Reviewed-on: https://go-review.googlesource.com/c/go/+/223937
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
The variable 'procs' used to calculate the threshold of overuse in
TestPhysicalMemoryUtilization should be updated if GOMAXPROCS
gets changed; otherwise the threshold could be a large number,
making the test meaningless.
Change-Id: I876cbf11457529f56bae77af1e35f4538a721f95
Reviewed-on: https://go-review.googlesource.com/c/go/+/210297
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Currently if a goroutine is preempted while owning a timer in the
timerModifying state, it could self-deadlock. When the goroutine is
preempted and calls into the scheduler, it could call checkTimers. If
checkTimers encounters the timerModifying timer and calls runtimer on
it, then runtimer will spin, waiting for that timer to leave the
timerModifying state, which it never will.
So far we got lucky that, for the most part, there were no preemption
points while timerModifying is happening; however, CL 221077 seems to
have introduced one, leading to sporadic self-deadlocks.
This change disables preemption explicitly while a goroutine holds a
timer in timerModifying. Since only checkTimers (and thus runtimer) is
called from the scheduler, this is sufficient to prevent
preemption-based self-deadlocks.
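A sketch of the fix (acquirem/releasem stand in for the runtime's preemption-disabling primitives; status constants are illustrative):

    package sketch

    import "sync/atomic"

    const (
        timerWaiting   = 0
        timerModifying = 1
    )

    type timer struct{ status uint32 }

    func acquirem() struct{} { return struct{}{} } // disable preemption (stand-in)
    func releasem(struct{})  {}                    // re-enable preemption (stand-in)

    func modtimer(t *timer) {
        mp := acquirem() // cannot be preempted while owning timerModifying
        for !atomic.CompareAndSwapUint32(&t.status, timerWaiting, timerModifying) {
            // another owner; spinning is now safe
        }
        // ... modify the timer ...
        atomic.StoreUint32(&t.status, timerWaiting)
        releasem(mp)
    }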
Fixes #38070.
Updates #37894.
Change-Id: Idbfac310889c92773023733ff7e2ff87e9896f0c
Reviewed-on: https://go-review.googlesource.com/c/go/+/225497
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
newdefer() actually adds the new defer to the current g's defer chain. That
happens even if we are on the system stack, in which case the g will be
g0. For open-coded defers, we call newdefer() (only during panic processing)
while on the system stack, so the new defer is unintentionally added to the
g0._defer defer list. The code later correctly adds the defer to the user g's
defer list.
The g0._defer list is never used. However, that pointer on the g0._defer list can
keep a defer struct alive that is intended to be garbage-collected (smaller defers
use a defer pool, but larger-sized defer records are just GC'ed). freedefer() does
not zero out pointers when it intends that a defer become garbage-collected. So,
we can have the pointers in a defer that is held alive by g0._defer become invalid
(in particular d.link). This is the cause of the bad pointer bug in this issue.
The fix is to change newdefer (only used in two places) to not add the new defer
to the gp._defer list. We just do it after the call with the correct gp pointer.
(As mentioned above, this code was already there after the newdefer in
addOneOpenDeferFrame.) That ensures that defers will be correctly
garbage-collected and eliminate the bad pointer.
This fix definitely fixes the original repro. I added a test and tried hard to
reproduce the bug (based on the original repro code), but wasn't actually able to
cause the bug. However, the test is still an interesting mix of heap-allocated,
stack-allocated, and open-coded defers.
Fixes #37688
Change-Id: I1a481b9d9e9b9ba4e8726ef718a1f4512a2d6faf
Reviewed-on: https://go-review.googlesource.com/c/go/+/224581
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
We used to fall back to GetQueuedCompletionStatus if
GetQueuedCompletionStatusEx was not available, but as of Go 1.11 we
require Windows 7 or later, so GetQueuedCompletionStatusEx is always
available.
Fixes #37957
Change-Id: I7d8d49a92ab7b1f5afdc54a442f696aaf4a5168e
Reviewed-on: https://go-review.googlesource.com/c/go/+/225059
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reduce the length of time that other timer functions can see timerModifying.
In particular, avoid system calls.
Fixes #38023
Change-Id: I1b61229c668e6085d9ee6dca9488a90055386c36
Reviewed-on: https://go-review.googlesource.com/c/go/+/224902
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
The callers expect negative errno values, so negate them when necessary.
No test because there is no reasonable way to make pipe/pipe2 fail.
This was reported on a system on which pipe2 returned ENOSYS.
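For illustration, the shape of the fix is roughly the following sketch,
where rawPipe2 is a hypothetical stand-in for the raw syscall wrapper:
package main

import "fmt"

const _ENOSYS = 38 // illustrative errno value

// rawPipe2 is a hypothetical stand-in for the raw syscall, which
// reports failure as a positive errno; here it simulates a kernel
// without pipe2.
func rawPipe2(flags int32) (r, w, errno int32) {
	return -1, -1, _ENOSYS
}

// pipe2 sketches the fix: callers of these helpers expect negative
// errno values, so negate the positive errno before returning.
func pipe2(flags int32) (r, w, errno int32) {
	r, w, e := rawPipe2(flags)
	if e != 0 {
		return -1, -1, -e
	}
	return r, w, 0
}

func main() {
	_, _, e := pipe2(0)
	fmt.Println(e) // -38, as callers expect
}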
Fixes #37997
Change-Id: I3ad6cbbc2521cf495f8df6ec991a3f781122b508
Reviewed-on: https://go-review.googlesource.com/c/go/+/224592
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
This change modifies the implementation of (*mspan).countAlloc by
using OnesCount64 (which on many systems is intrinsified). It does so by
using an unsafe pointer cast, but in this case we don't care about
endianness because we're just counting bits set.
This change means we no longer need the popcnt table which was redundant
in the runtime anyway. We can also simplify the logic here significantly
by observing that mark bits allocations are always 8-byte aligned, so we
don't need to handle any edge-cases due to the fact that OnesCount64
operates on 64 bits at a time: all irrelevant bits will be zero.
Overall, this implementation is significantly faster than the old one on
amd64, and should be similarly faster (or better!) on other systems
which support the intrinsic. On systems which do not, it should be
roughly the same performance because OnesCount64 is implemented using a
table in the general case.
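A minimal sketch of the technique (not the runtime's exact code): cast
the 8-byte-aligned mark bitmap to 64-bit words and sum OnesCount64 over
them; bits past the last object are zero, so no masking is needed.
package main

import (
	"fmt"
	"math/bits"
)

// countAlloc counts set bits in an 8-byte-aligned mark bitmap.
// Bits beyond the span's objects are assumed zero, so they cannot
// inflate the count.
func countAlloc(bitmap []uint64) int {
	count := 0
	for _, word := range bitmap {
		count += bits.OnesCount64(word) // intrinsified on many systems
	}
	return count
}

func main() {
	fmt.Println(countAlloc([]uint64{0xFF, 0b1011, 0})) // 8 + 3 + 0 = 11
}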
Results on linux/amd64:
name old time/op new time/op delta
MSpanCountAlloc/bits=64-4 16.8ns ± 0% 12.7ns ± 0% -24.40% (p=0.000 n=5+4)
MSpanCountAlloc/bits=128-4 23.5ns ± 0% 12.8ns ± 0% -45.70% (p=0.000 n=4+5)
MSpanCountAlloc/bits=256-4 43.5ns ± 0% 12.8ns ± 0% -70.67% (p=0.000 n=4+5)
MSpanCountAlloc/bits=512-4 59.5ns ± 0% 15.4ns ± 0% -74.12% (p=0.008 n=5+5)
MSpanCountAlloc/bits=1024-4 116ns ± 1% 23ns ± 0% -79.84% (p=0.000 n=5+4)
Change-Id: Id4c994be22224653af5333683a69b0937130ed04
Reviewed-on: https://go-review.googlesource.com/c/go/+/216558
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
This change adds a small microbenchmark for (*mspan).countAlloc, which
we're about to replace. Admittedly this isn't a critical piece of code,
but the benchmark was useful in understanding the performance change.
Change-Id: Iea93c00f571ee95534a42f2ef2ab026b382242b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/224438
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
It appears that PowerRegisterSuspendResumeNotification is not supported
when running inside Docker - see issues #35447, #36557 and #37149.
Our current code relies on the error number to detect a Docker
environment. But we have already seen
PowerRegisterSuspendResumeNotification return ERROR_FILE_NOT_FOUND,
ERROR_INVALID_PARAMETERS and ERROR_ACCESS_DENIED (see the issues above),
so this approach is not sustainable.
Just ignore any error returned by PowerRegisterSuspendResumeNotification.
Fixes #37149
Change-Id: I2beba9d45cdb8c1efac5e974e747827a6261915a
Reviewed-on: https://go-review.googlesource.com/c/go/+/219657
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
There are a handful of places where the runtime wants to round up the
result of a division. We just introduced a helper to do this. This CL
replaces all of the hand-coded round-ups (that I could find) with this
helper.
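The helper follows the usual (n + a - 1) / a pattern; a sketch (treat
the exact signature and constraints here as assumptions):
package main

import "fmt"

// divRoundUp returns ceil(n / a), assuming a > 0 and that n+a-1 does
// not overflow. The runtime's helper has this shape; details here are
// illustrative.
func divRoundUp(n, a uintptr) uintptr {
	return (n + a - 1) / a
}

func main() {
	fmt.Println(divRoundUp(8193, 8192)) // 2: one byte past a page needs two pages
}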
Change-Id: I465d152157ff0f3cad40c0aa57491e4f2de510ad
Reviewed-on: https://go-review.googlesource.com/c/go/+/224385
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
materializeGCProg allocates a temporary buffer for unrolling a GC
program. Unfortunately, when computing the size of the buffer, it
rounds *down* the number of bytes needed to store the bitmap before
rounding up the number of pages needed to store those bytes. The fact
that it rounds up to pages usually mitigates the rounding down, but
the type from #37470 exists right on the boundary where this doesn't
work:
type Sequencer struct {
	htable [1 << 17]uint32
	buf    []byte
}
On 64-bit, this GC bitmap is exactly 8 KiB of zeros, followed by three
one bits. Hence, this needs 8193 bytes of storage, but the current
math in materializeGCProg rounds *down* the three one bits to 8192
bytes. Since this is exactly pageSize, the next step of rounding up to
the page size doesn't mitigate this error, and materializeGCProg
allocates a buffer that is one byte too small. runGCProg then writes
one byte past the end of this buffer, causing either a segfault (if
you're lucky!) or memory corruption.
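The arithmetic, spelled out (hypothetical helper names; pageSize is
8192 here):
package main

import "fmt"

const pageSize = 8192

// buggySize models the old math: round bits DOWN to bytes, then
// round bytes up to pages.
func buggySize(nbits uintptr) uintptr {
	bytes := nbits / 8 // 65539 bits -> 8192 bytes (drops the 3 bits!)
	return (bytes + pageSize - 1) / pageSize * pageSize
}

// fixedSize rounds bits UP to bytes first.
func fixedSize(nbits uintptr) uintptr {
	bytes := (nbits + 7) / 8 // 65539 bits -> 8193 bytes
	return (bytes + pageSize - 1) / pageSize * pageSize
}

func main() {
	const nbits = 8192*8 + 3      // 8 KiB of zeros plus three one bits
	fmt.Println(buggySize(nbits)) // 8192: one byte too small
	fmt.Println(fixedSize(nbits)) // 16384: two pages, as required
}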
Fixes #37470.
Change-Id: Iad24c463c501cd9b1dc1924bc2ad007991a094a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/221197
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Add a test to ensure that the race detector sees that closing a
channel synchronizes with a read from that channel.
This test case failed when CL 181543 was in the tree.
CL 181543 was reverted in CL 216158; this adds a test to make
sure that we don't re-introduce the problem at a later date.
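The property being tested, in miniature (a hedged sketch, not the
actual test): the close of a channel happens before a receive that
observes the close, so this program is race-free.
package main

import "fmt"

func main() {
	c := make(chan struct{})
	var x int
	go func() {
		x = 42   // write is sequenced before close(c)
		close(c) // close happens before the receive below completes
	}()
	<-c            // receive observes the close
	fmt.Println(x) // race-free: prints 42
}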
For #32529
For #36714
Change-Id: I5a40f744c67c3f8191d6ad822710c180880a7375
Reviewed-on: https://go-review.googlesource.com/c/go/+/216099
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
In Go 1.4 we renamed READY to pdReady and WAIT to pdWait as part of
rewriting netpoll from C to Go. Finish updating the comments to use
the new names.
Change-Id: I6cefc698b46c58211fd6be1489bdd70419454962
Reviewed-on: https://go-review.googlesource.com/c/go/+/223998
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This change makes it so that worldsema isn't held across the mark phase.
This means that various operations like ReadMemStats may now stop the
world during the mark phase, reducing latency on such operations.
Only three such operations are still not allowed to occur during
marking: GOMAXPROCS, StartTrace, and StopTrace.
For the former it's because any change to GOMAXPROCS impacts GC mark
background worker scheduling and the details there are tricky.
For the latter two it's because tracing needs to observe consistent GC
start and GC end events, and if StartTrace or StopTrace may stop the
world during marking, then it's possible for it to see a GC end event
without a start or GC start event without an end, respectively.
To ensure that GOMAXPROCS and StartTrace/StopTrace cannot proceed until
marking is complete, the runtime now holds a new semaphore, gcsema,
across the mark phase just like it used to with worldsema.
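A toy model of the two-semaphore scheme (sync.Mutex standing in for
runtime semaphores; names mirror the text but this is illustrative
only):
package main

import "sync"

var (
	worldsema sync.Mutex // held while stopping the world
	gcsema    sync.Mutex // additionally held across the whole mark phase
)

// readMemStats models operations that may now STW during marking.
func readMemStats() {
	worldsema.Lock() // does not wait for marking to end
	// ... stop the world, collect stats, start the world ...
	worldsema.Unlock()
}

// setGOMAXPROCS models the three operations that must wait for the
// mark phase to finish before stopping the world.
func setGOMAXPROCS() {
	gcsema.Lock() // blocks until mark termination releases gcsema
	worldsema.Lock()
	// ... stop the world, adjust Ps, start the world ...
	worldsema.Unlock()
	gcsema.Unlock()
}

func main() {
	gcsema.Lock() // the GC holds gcsema for the whole mark phase
	done := make(chan struct{})
	go func() { readMemStats(); close(done) }() // proceeds during marking
	<-done
	gcsema.Unlock() // mark termination
	setGOMAXPROCS() // only now can this proceed
}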
This change is being landed once more after being reverted in the Go
1.14 release cycle, since CL 215157 allows it to have a positive
effect on system performance.
For the benchmark BenchmarkReadMemStatsLatency in the runtime, which
measures ReadMemStats latencies while the GC is exercised, the tail of
these latencies reduced dramatically on an 8-core machine:
name old 50%tile-ns new 50%tile-ns delta
ReadMemStatsLatency-8 4.40M ±74% 0.12M ± 2% -97.35% (p=0.008 n=5+5)
name old 90%tile-ns new 90%tile-ns delta
ReadMemStatsLatency-8 102M ± 6% 0M ±14% -99.79% (p=0.008 n=5+5)
name old 99%tile-ns new 99%tile-ns delta
ReadMemStatsLatency-8 147M ±18% 4M ±57% -97.43% (p=0.008 n=5+5)
Fixes #19812.
Change-Id: If66c3c97d171524ae29f0e7af4bd33509d9fd0bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/216557
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This change adds a benchmark to the runtime which measures ReadMemStats
latencies. It generates allocations with lots of pointers to keep the GC
busy while hitting ReadMemStats and measuring the time it takes to
complete.
Updates #19812.
Change-Id: I7a76aaf497ba5324d3c7a7b3df32461b3e6c3ac8
Reviewed-on: https://go-review.googlesource.com/c/go/+/220177
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Currently, dedicated background mark workers are essentially always
non-preemptible.
This change makes it so that dedicated background mark workers park if
their preemption flag is set while someone is trying to STW, allowing
the STW to proceed.
This change prepares us for allowing a STW to happen (and happen
promptly) during GC marking in a follow-up change.
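In sketch form (illustrative flags, not runtime APIs), the worker's
loop changes shape roughly like this:
package main

// Illustrative stand-ins for the worker's preemption flag and a
// pending stop-the-world request.
var preempt, stwWanted bool

// markWorker models the new loop shape: rather than running
// non-preemptibly to completion, the worker parks (here: returns)
// when its preemption flag is set while an STW is pending, and is
// rescheduled after the world restarts.
func markWorker(units int) int {
	for units > 0 {
		if preempt && stwWanted {
			return units // "park": let the STW proceed promptly
		}
		units-- // mark one unit of work
	}
	return 0
}

func main() {
	preempt, stwWanted = true, true
	_ = markWorker(3) // parks immediately, leaving 3 units for later
}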
Updates #19812.
Change-Id: I67fb6085bf0f0aebd18ca500172767818a1f15e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/215157
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
If multiple threads call preemptone to preempt the same M, they may
send so many signals to the same M that it can hardly make progress,
causing a live-lock problem. Only send a signal if there
isn't already one pending.
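The guard has the shape of a compare-and-swap on a per-M pending flag
(a hedged sketch; treat the names here as illustrative rather than the
runtime's exact code):
package main

import (
	"fmt"
	"sync/atomic"
)

// m models an OS thread with a flag recording whether a preemption
// signal is already in flight.
type m struct {
	signalPending uint32
}

// preemptM sends at most one signal at a time: callers that lose the
// CAS rely on the already-pending signal, so a flood of preemptone
// calls can no longer live-lock the target M.
func preemptM(mp *m) {
	if atomic.CompareAndSwapUint32(&mp.signalPending, 0, 1) {
		fmt.Println("sending preemption signal")
		// The signal handler clears signalPending on delivery.
	}
}

func main() {
	mp := &m{}
	preemptM(mp) // sends
	preemptM(mp) // no-op: a signal is already pending
}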
Fixes #37741.
Change-Id: Id94adb0b95acbd18b23abe637a8dcd81ab41b452
Reviewed-on: https://go-review.googlesource.com/c/go/+/223737
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>