Commit Graph

360 Commits

Author SHA1 Message Date
Michael Anthony Knyszek 42019613df runtime: make distributed/local malloc stats the source-of-truth
This change makes it so that various local malloc stats (excluding
heap_scan and local_tinyallocs) are no longer written first to mheap
fields but are instead accessed directly from each mcache.

This change is part of a move toward having stats be distributed, and
cleaning up some old code related to the stats.

Note that because there's no central source-of-truth, when an mcache
dies, it must donate its stats to another mcache. It's always safe to
donate to the mcache for the 0th P, so do that.
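
As a rough illustration of the donation idea only (the type and field
names below are invented, not the runtime's), merging a dying cache's
counters into a surviving cache looks like this:

    package main

    import "fmt"

    // cacheStats is an illustrative stand-in for per-mcache malloc counters.
    type cacheStats struct {
        smallAllocCount uint64
        largeAllocCount uint64
    }

    // donate merges a dying cache's counters into a surviving cache so no
    // counts are lost now that there is no central source-of-truth.
    func (dst *cacheStats) donate(src *cacheStats) {
        dst.smallAllocCount += src.smallAllocCount
        dst.largeAllocCount += src.largeAllocCount
        *src = cacheStats{}
    }

    func main() {
        survivor := &cacheStats{smallAllocCount: 10}
        dying := &cacheStats{smallAllocCount: 3, largeAllocCount: 1}
        survivor.donate(dying)
        fmt.Println(survivor.smallAllocCount, survivor.largeAllocCount) // 13 1
    }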

Change-Id: I2556093dbc27357cb9621c9b97671f3c00aa1173
Reviewed-on: https://go-review.googlesource.com/c/go/+/246964
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 17:26:08 +00:00
Michael Anthony Knyszek ce46f197b6 runtime: access the assist ratio atomically
This change makes it so that the GC assist ratio (the pair of
gcControllerState fields assistBytesPerWork and assistWorkPerByte) is
updated atomically. Note that the pair of fields are not updated
together atomically, but that's OK. The code here was already racy for
some time and in practice the assist ratio moves very slowly.
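
A minimal sketch of this pattern outside the runtime (the names below are
illustrative): publish each float64 ratio as its bit pattern so readers
can load it without a lock, while the pair is still not updated together
atomically.

    package main

    import (
        "fmt"
        "math"
        "sync/atomic"
    )

    // atomicFloat64 stores a float64 as its IEEE-754 bit pattern so it can
    // be read and written with atomic operations.
    type atomicFloat64 struct{ bits uint64 }

    func (f *atomicFloat64) store(v float64) { atomic.StoreUint64(&f.bits, math.Float64bits(v)) }
    func (f *atomicFloat64) load() float64   { return math.Float64frombits(atomic.LoadUint64(&f.bits)) }

    func main() {
        var assistWorkPerByte, assistBytesPerWork atomicFloat64
        assistWorkPerByte.store(0.5)
        assistBytesPerWork.store(2.0)
        fmt.Println(assistWorkPerByte.load(), assistBytesPerWork.load())
    }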

The purpose of this change is so that we can document
gcController.revise to be safe for concurrent use, which will be useful
in further changes.

Change-Id: Ie25d630207c88e4f85f2b8953f6a0051ebf1b4ea
Reviewed-on: https://go-review.googlesource.com/c/go/+/246963
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 17:26:01 +00:00
Austin Clements c91dffbc9a runtime: tidy cgocallback
On amd64 and 386, we have a very roundabout way of remembering that we
need to dropm on return that currently involves saving a zero to
needm's argument slot and later bringing it back. Just store the zero.

This also makes amd64 and 386 more consistent with cgocallback on all
other platforms: rather than saving the old M to the G stack, they now
save it to a named slot on the G0 stack.

The needm function no longer needs a dummy argument to get the SP, so
we drop that.

Change-Id: I7e84bb4a5ff9552de70dcf41d8accf02310535e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/263268
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-26 14:50:34 +00:00
Austin Clements 30c1887873 runtime,cmd/cgo: simplify C -> Go call path
This redesigns the way calls work from C to exported Go functions. It
removes several steps from the call path, makes cmd/cgo no longer
sensitive to the Go calling convention, and eliminates the use of
reflectcall from cgo.

In order to avoid generating a large amount of FFI glue between the C
and Go ABIs, the cgo tool has long depended on generating a C function
that marshals the arguments into a struct, and then the actual ABI
switch happens in functions with fixed signatures that simply take a
pointer to this struct. In a way, this CL simply pushes this idea
further.

Currently, the cgo tool generates this argument struct in the exact
layout of the Go stack frame and depends on reflectcall to unpack it
into the appropriate Go call (even though it's actually
reflectcall'ing a function generated by cgo).

In this CL, we decouple this struct from the Go stack layout. Instead,
cgo generates a Go function that takes the struct, unpacks it, and
calls the exported function. Since this generated function has a
generic signature (like the rest of the call path), we don't need
reflectcall and can instead depend on the Go compiler itself to
implement the call to the exported Go function.

One complication is that syscall.NewCallback on Windows, which
converts a Go function into a C function pointer, depends on
cgocallback's current dynamic calling approach since the signatures of
the callbacks aren't known statically. For this specific case, we
continue to depend on reflectcall. Really, the current approach makes
some overly simplistic assumptions about translating the C ABI to the
Go ABI. Now we're at least in a much better position to do a proper
ABI translation.

For comparison, the current cgo call path looks like:

    GoF (generated C function) ->
    crosscall2 (in cgo/asm_*.s) ->
    _cgoexp_GoF (generated Go function) ->
    cgocallback (in asm_*.s) ->
    cgocallback_gofunc (in asm_*.s) ->
    cgocallbackg (in cgocall.go) ->
    cgocallbackg1 (in cgocall.go) ->
    reflectcall (in asm_*.s) ->
    _cgoexpwrap_GoF (generated Go function) ->
    p.GoF

Now the call path looks like:

    GoF (generated C function) ->
    crosscall2 (in cgo/asm_*.s) ->
    cgocallback (in asm_*.s) ->
    cgocallbackg (in cgocall.go) ->
    cgocallbackg1 (in cgocall.go) ->
    _cgoexp_GoF (generated Go function) ->
    p.GoF

Notably:

1. We combine _cgoexp_GoF and _cgoexpwrap_GoF and move the combined
operation to the end of the sequence. This combined function also
handles reflectcall's previous role.

2. We combined cgocallback and cgocallback_gofunc since the only
purpose of having both was to convert a raw PC into a Go function
value. We instead construct the Go function value in cgocallbackg1.

3. cgocallbackg1 no longer reaches backwards through the stack to get
the arguments to cgocallback_gofunc. Instead, we just pass the
arguments down.

4. Currently, we need an explicit msanwrite to mark the results struct
as written because reflectcall doesn't do this. Now, the results are
written by regular Go assignments, so the Go compiler generates the
necessary MSAN annotations. This also means we no longer need to track
the size of the arguments frame.
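
For context, a minimal exported function of the kind served by this call
path might look like the sketch below (GoAdd is an illustrative name;
build it with a c-archive or c-shared buildmode and call GoAdd from C):

    package main

    import "C"

    //export GoAdd
    func GoAdd(a, b C.int) C.int {
        // A C caller reaches this function through the generated GoAdd C
        // wrapper, then crosscall2 -> cgocallback -> cgocallbackg ->
        // cgocallbackg1 -> _cgoexp_GoAdd, which finally calls GoAdd.
        return a + b
    }

    func main() {}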

Updates #40724, since now we don't need to teach cgo about the
register ABI or change how it uses reflectcall.

Change-Id: I7840489a2597962aeb670e0c1798a16a7359c94f
Reviewed-on: https://go-review.googlesource.com/c/go/+/258938
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-26 14:50:32 +00:00
Tiwei Bie bc0b198bd7 runtime: dump the status of lockedg on error
dumpgstatus() will dump the current g's status anyway. When lockedg's
status is bad, it is more helpful to also dump lockedg's status than to
dump the current g's status twice.

Change-Id: If5248cb94b9cdcbf4ceea07562237e1d6ee28489
GitHub-Last-Rev: da814c51ff
GitHub-Pull-Request: golang/go#40248
Reviewed-on: https://go-review.googlesource.com/c/go/+/243097
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Emmanuel Odeke <emm.odeke@gmail.com>
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
2020-10-24 03:15:23 +00:00
Andrew G. Morgan d1b1145cac syscall: support POSIX semantics for Linux syscalls
This change adds two new methods for invoking system calls
under Linux: syscall.AllThreadsSyscall() and
syscall.AllThreadsSyscall6().

These system call wrappers ensure that all OSThreads mirror
a common system call. The wrappers serialize execution of the
runtime to ensure no race conditions where any Go code observes
a non-atomic OS state change. As such, the syscalls have
higher runtime overhead than regular system calls, and only
need to be used where such thread (or 'm' in the parlance
of the runtime sources) consistency is required.

The new support is used to enable these functions under Linux:

  syscall.Setegid(), syscall.Seteuid(), syscall.Setgroups(),
  syscall.Setgid(), syscall.Setregid(), syscall.Setreuid(),
  syscall.Setresgid(), syscall.Setresuid() and syscall.Setuid().

They work identically to their glibc counterparts.
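
A minimal usage sketch (the UID value is illustrative and the process
needs the privilege to change it): after this change the call below
applies to every thread in the process rather than only the calling one.

    package main

    import (
        "fmt"
        "syscall"
    )

    func main() {
        // With POSIX semantics, the credential change is mirrored on all
        // runtime threads, so every goroutine observes the new UID.
        if err := syscall.Setuid(1000); err != nil {
            fmt.Println("Setuid:", err)
            return
        }
        fmt.Println("uid:", syscall.Getuid())
    }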

Extensive discussion of the background issue addressed in this
patch can be found here:

   https://github.com/golang/go/issues/1435

In the case where cgo is used, the C runtime can launch pthreads that
are not managed by the Go runtime. As such, the added
syscall.AllThreadsSyscall*() functions return ENOTSUP when cgo is enabled.
However, for the 9 syscall.Set*() functions listed above, when cgo is
active, these functions redirect to invoke their C.set*() equivalents
in glibc, which wraps the raw system calls with a nptl:setxid fixup
mechanism. This achieves POSIX semantics for these functions in the
combined Go and C runtime.

As a side note, the glibc/nptl:setxid support (as of 2019-11-30) does not
extend to all security-related system calls under Linux, so using native
Go (CGO_ENABLED=0) and these AllThreadsSyscall*()s where needed will
yield more well-defined and consistent behavior across all threads of a
Go program. That is, use the syscall.AllThreadsSyscall*() wrappers for
things like setting state through SYS_PRCTL, SYS_CAPSET, etc.

Fixes #1435

Change-Id: Ib1a3e16b9180f64223196a32fc0f9dce14d9105c
Reviewed-on: https://go-review.googlesource.com/c/go/+/210639
Trust: Emmanuel Odeke <emm.odeke@gmail.com>
Trust: Ian Lance Taylor <iant@golang.org>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2020-10-23 20:53:14 +00:00
Michael Pratt 4a2cc73f87 runtime: don't attempt to steal from idle Ps
Work stealing is a scalability bottleneck in the scheduler. Since each P
has a work queue, work stealing must look at every P to determine if
there is any work. The number of Ps scales linearly with GOMAXPROCS
(i.e., the number of Ps _is_ GOMAXPROCS), thus this work scales linearly
with GOMAXPROCS.

Work stealing is a later attempt by a P to find work before it goes
idle. Since the P has no work of its own, extra costs here tend not to
directly affect application-level benchmarks. Where they show up is
extra CPU usage by the process as a whole. These costs get particularly
expensive for applications that transition between blocked and running
frequently.

Long term, we need a more scalable approach in general, but for now we
can make a simple observation: idle Ps ([1]) cannot possibly have
anything in their runq, so we need not bother checking at all.

We track idle Ps via a new global bitmap, updated in pidleput/pidleget.
This is already a slow path (requires sched.lock), so we don't expect
high contention there.

Using a single bitmap avoids the need to touch every P to read p.status.
Currently, the bitmap approach is not significantly better than reading
p.status. However, in a future CL I'd like to apply a similar
optimization to timers. Once done, findrunnable would not touch most Ps
at all (in mostly idle programs), which will avoid memory latency to
pull those Ps into cache.
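
A rough sketch of the bitmap idea (the type and sizes below are invented
for illustration, not the runtime's implementation): one bit per P, set
and cleared under sched.lock in pidleput/pidleget, and read without a
lock by would-be stealers.

    package main

    import "fmt"

    // idleMask holds one bit per P; bit i set means P i is idle and
    // therefore cannot have anything in its runq, so stealing skips it.
    type idleMask []uint64

    func (m idleMask) set(id int)       { m[id/64] |= 1 << (id % 64) }
    func (m idleMask) clear(id int)     { m[id/64] &^= 1 << (id % 64) }
    func (m idleMask) idle(id int) bool { return m[id/64]&(1<<(id%64)) != 0 }

    func main() {
        m := make(idleMask, 1) // room for 64 Ps
        m.set(3)
        fmt.Println(m.idle(3), m.idle(5)) // true false
    }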

When reading this bitmap, we are racing with Ps going in and out of
idle, so there are a few cases to consider:

1. _Prunning -> _Pidle: Running P goes idle after we check the bitmap.
In this case, we will try to steal (and find nothing) so there is no
harm.

2. _Pidle -> _Prunning while spinning: A P that starts running may queue
new work that we miss. This is OK: (a) that P cannot go back to sleep
without completing its work, and (b) more fundamentally, we will recheck
after we drop our P.

3. _Pidle -> _Prunning after spinning: After spinning, we really can
miss work from a newly woken P. (a) above still applies here as well,
but this is also the same delicate dance case described in findrunnable:
if nothing is spinning anymore, the other P will unpark a thread to run
the work it submits.

Benchmark results from WakeupParallel/syscall/pair/race/1ms (see
golang.org/cl/228577):

name                            old msec          new msec   delta
Perf-task-clock-8               250 ± 1%          247 ± 4%     ~     (p=0.690 n=5+5)
Perf-task-clock-16              258 ± 2%          259 ± 2%     ~     (p=0.841 n=5+5)
Perf-task-clock-32              284 ± 2%          270 ± 4%   -4.94%  (p=0.032 n=5+5)
Perf-task-clock-64              326 ± 3%          303 ± 2%   -6.92%  (p=0.008 n=5+5)
Perf-task-clock-128             407 ± 2%          363 ± 5%  -10.69%  (p=0.008 n=5+5)
Perf-task-clock-256             561 ± 1%          481 ± 1%  -14.20%  (p=0.016 n=4+5)
Perf-task-clock-512             840 ± 5%          683 ± 2%  -18.70%  (p=0.008 n=5+5)
Perf-task-clock-1024          1.38k ±14%        1.07k ± 2%  -21.85%  (p=0.008 n=5+5)

[1] "Idle Ps" here refers to _Pidle Ps in the sched.pidle list. In other
contexts, Ps may temporarily transition through _Pidle (e.g., in
handoffp); those Ps may have work.

Updates #28808
Updates #18237

Change-Id: Ieeb958bd72e7d8fb375b0b1f414e8d7378b14e29
Reviewed-on: https://go-review.googlesource.com/c/go/+/259578
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Trust: Michael Pratt <mpratt@google.com>
2020-10-23 14:18:27 +00:00
Ian Lance Taylor 05739d6f17 runtime: wait for preemption signals before syscall.Exec
Fixes #41702
Fixes #42023

Change-Id: If07f40b1d73b8f276ee28ffb8b7214175e56c24d
Reviewed-on: https://go-review.googlesource.com/c/go/+/262817
Trust: Ian Lance Taylor <iant@golang.org>
Trust: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-16 23:50:26 +00:00
Martin Möhrmann 7c58ef732e runtime: implement GODEBUG=inittrace=1 support
Setting inittrace=1 causes the runtime to emit a single line to standard error for
each package with init work, summarizing the execution time and memory allocation.

The emitted debug information for init functions can be used to find bottlenecks
or regressions in Go startup performance.

Packages with no init function work (user-defined or compiler-generated) are omitted.

Tracing plugin inits is not supported as they can execute concurrently. This would
make the implementation of tracing more complex while adding support for a very rare
use case. Plugin inits can be traced separately by testing a main package
that imports the plugin package's imports explicitly.

$ GODEBUG=inittrace=1 go test
init internal/bytealg @0.008 ms, 0 ms clock, 0 bytes, 0 allocs
init runtime @0.059 ms, 0.026 ms clock, 0 bytes, 0 allocs
init math @0.19 ms, 0.001 ms clock, 0 bytes, 0 allocs
init errors @0.22 ms, 0.004 ms clock, 0 bytes, 0 allocs
init strconv @0.24 ms, 0.002 ms clock, 32 bytes, 2 allocs
init sync @0.28 ms, 0.003 ms clock, 16 bytes, 1 allocs
init unicode @0.44 ms, 0.11 ms clock, 23328 bytes, 24 allocs
...

Inspired by stapelberg@google.com who instrumented doInit
in a prototype to measure init times with GDB.

Fixes #41378

Change-Id: Ic37c6a0cfc95488de9e737f5e346b8dbb39174e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/254659
Trust: Martin Möhrmann <moehrmann@google.com>
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-10-14 05:34:32 +00:00
Cherry Zhang a413908dd0 all: add GOOS=ios
Introduce GOOS=ios for iOS systems. GOOS=ios matches "darwin"
build tag, like GOOS=android matches "linux" and GOOS=illumos
matches "solaris". Only ios/arm64 is supported (ios/amd64 is
not).

GOOS=ios and GOOS=darwin remain essentially the same at this
point. They will diverge at later time, to differentiate macOS
and iOS.

Uses of GOOS=="darwin" are changed to (GOOS=="darwin" || GOOS=="ios"),
except if it clearly means macOS (e.g. GOOS=="darwin" && GOARCH=="amd64"),
it remains GOOS=="darwin".

Updates #38485.

Change-Id: I4faacdc1008f42434599efb3c3ad90763a83b67c
Reviewed-on: https://go-review.googlesource.com/c/go/+/254740
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-09-23 18:12:59 +00:00
Michael Pratt d42b32e321 runtime: add sched.lock assertions
Functions that require holding sched.lock now have an assertion.

A few places with missing locks have been fixed in this CL:

Additionally, locking is added around the call to procresize in
schedinit. This doesn't technically need a lock since the program is
still starting (thus no concurrency) when this is called, but lock held
checking doesn't know that.
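
An illustrative sketch of the assertion idea, not the runtime's actual
helper: the lock records whether it is held so functions that require it
can assert that precondition.

    package main

    import "sync"

    // checkedLock is a simplified stand-in: it tracks whether it is held so
    // code that requires the lock can assert it. The runtime's version is
    // integrated with its own mutex and lock-rank machinery.
    type checkedLock struct {
        mu   sync.Mutex
        held bool
    }

    func (l *checkedLock) lock()   { l.mu.Lock(); l.held = true }
    func (l *checkedLock) unlock() { l.held = false; l.mu.Unlock() }

    func (l *checkedLock) assertHeld() {
        if !l.held {
            panic("required lock is not held")
        }
    }

    func main() {
        var schedLock checkedLock
        schedLock.lock()
        schedLock.assertHeld() // ok: lock is held
        schedLock.unlock()
    }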

Updates #40677

Change-Id: I198d3cbaa727f7088e4d55ba8fa989cf1ee8f9cf
Reviewed-on: https://go-review.googlesource.com/c/go/+/250261
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2020-09-22 15:14:09 +00:00
Michael Pratt 02ff8b8ce4 runtime: expand gopark documentation
unlockf is called after the G is put into _Gwaiting, meaning another G
may have readied this one before unlockf is called.

This is implied by the current doc, but add additional notes to call out
this behavior, as it can be quite surprising.

Updates #40641

Change-Id: I60b1ccc6a4dd9ced8ad2aa1f729cb2e973100b59
Reviewed-on: https://go-review.googlesource.com/c/go/+/256058
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2020-09-21 15:08:44 +00:00
Paschalis Tsilias 331614c4da runtime: improve error messages after allocating a stack that is too big
In the current implementation, we can observe crashes after calling
debug.SetMaxStack and allocating a stack larger than 4GB since
stackalloc works with 32-bit sizes. To avoid this, we define an upper
limit as the largest feasible point we can grow a stack to and provide a
better error message when we get a stack overflow.
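
A minimal sketch of the API involved (the 8 GB figure is illustrative and
assumes a 64-bit platform):

    package main

    import (
        "fmt"
        "runtime/debug"
    )

    func main() {
        // SetMaxStack returns the previous per-goroutine stack limit.
        // Growing a stack beyond 4GB now reports a clear stack overflow
        // instead of crashing.
        old := debug.SetMaxStack(8 << 30) // 8 GB
        fmt.Println("previous stack limit:", old)
    }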

Fixes #41228

Change-Id: I55fb0a824f47ed9fb1fcc2445a4dfd57da9ef8d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/255997
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-09-20 09:54:44 +00:00
Ian Lance Taylor 2556eb76c8 runtime: ignore SIGPROF if profiling disabled for thread
This avoids a deadlock on prof.signalLock between setcpuprofilerate
and cpuprof.add if a SIGPROF is delivered to the thread between the
call to setThreadCPUProfiler and acquiring prof.signalLock.

Fixes #41014

Change-Id: Ie825e8594f93a19fb1a6320ed640f4e631553596
Reviewed-on: https://go-review.googlesource.com/c/go/+/253758
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-09-09 18:04:10 +00:00
Keith Randall 5c2c6d3fbf runtime: framepointers are no longer an experiment - hard code them
I think they are no longer experimental. Might as well promote
them to permanent.

Change-Id: Id1259601b3dd2061dd60df86ee48080bfb575d2f
Reviewed-on: https://go-review.googlesource.com/c/go/+/249857
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2020-08-27 21:15:47 +00:00
Cholerae Hu 613388315e runtime: reduce critical path in injectglist
Change-Id: Ia3fb30ac9add39c803f11f69d967c6604fdeacf8
Reviewed-on: https://go-review.googlesource.com/c/go/+/233217
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-08-18 04:22:33 +00:00
liu-xuewen ba97be4b58 runtime: remove tracebackinit and unused skipPC
CL 152537 (https://go-review.googlesource.com/c/go/+/152537/) changed the way inlined frames are represented in tracebacks to no longer use skipPC.

Change-Id: I42386fdcc5cf72f3c122e789b6af9cbd0c6bed4b
GitHub-Last-Rev: 79c26dcd53
GitHub-Pull-Request: golang/go#39829
Reviewed-on: https://go-review.googlesource.com/c/go/+/239701
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-08-17 21:05:19 +00:00
Austin Clements 7bbd5ca5a6 runtime: replace index and contains with bytealg calls
The runtime has its own implementation of string indexing. To reduce
code duplication and cognitive load, replace this with calls to the
internal/bytealg package. We can't do this on Plan 9 because it needs
string indexing in a note handler (which isn't allowed to use the
optimized bytealg version because it uses SSE), so we can't just
eliminate the index function, but this CL does down-scope it to make
it clear it's only for note handlers on Plan 9.

Change-Id: Ie1a142678262048515c481e8c26313b80c5875df
Reviewed-on: https://go-review.googlesource.com/c/go/+/244537
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-08-17 13:20:03 +00:00
Michael Anthony Knyszek 6b4dcf19fa runtime: hold sched.lock over globrunqputbatch in runqputbatch
globrunqputbatch should never be called without sched.lock held.
runqputbatch's documentation even says it may acquire sched.lock in
order to call it.

Fixes #40457.

Change-Id: I5421b64f1da3a6087dfebbef7203db0c95d213a8
Reviewed-on: https://go-review.googlesource.com/c/go/+/245377
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-07-30 15:46:39 +00:00
Michael Pratt 85afa2eb19 runtime: ensure startm new M is consistently visible to checkdead
If no M is available, startm first grabs an idle P, then drops
sched.lock and calls newm to start a new M to run that P.

Unfortunately, that leaves a window in which a G (e.g., returning from a
syscall) may find no idle P, add itself to the global runq, and then in stopm
discover that there are no running M's, a condition that should be
impossible with runnable G's.

To avoid this condition, we pre-allocate the new M ID in startm before
dropping sched.lock. This ensures that checkdead will see the M as
running, and since that new M must eventually run the scheduler, it will
handle any pending work as necessary.

Outside of startm, most other calls to newm/allocm don't have a P at
all. The only exception is startTheWorldWithSema, which always has an M
if there is 1 P (i.e., the currently running M), and if there is >1 P
the findrunnable spinning dance ensures the problem never occurs.

This has been tested with strategically placed sleeps in the runtime to
help induce the correct race ordering, but the timing on this is too
narrow for a test that can be checked in.

Fixes #40368

Change-Id: If5e0293a430cc85154b7ed55bc6dadf9b340abe2
Reviewed-on: https://go-review.googlesource.com/c/go/+/245018
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-07-28 16:59:04 +00:00
Ian Lance Taylor 9b90491e4a runtime: steal timers from running P's
Previously we did not steal timers from running P's, because that P
should be responsible for running its own timers. However, if the P
is running a CPU-bound G, this can cause measurable delays in running
ready timers. Also, in CL 214185 we avoided taking the timer lock of a P
with no ready timers, which reduces the chances of timer lock contention.

So, if we can't find any ready timers on sleeping P's, try stealing
them from running P's.

Fixes #38860

Change-Id: I0bf1d5dc56258838bdacccbf89493524e23d7fed
Reviewed-on: https://go-review.googlesource.com/c/go/+/232199
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2020-06-03 05:33:54 +00:00
Richard Musiol 0452f9460f runtime: fix race condition between timer and event handler
This change fixes a race condition between beforeIdle waking up the
innermost event handler and a timer causing a different goroutine to
wake up at the exact same moment. This messes up the wasm event handling
and leads to memory corruption. The solution is to make beforeIdle
return the goroutine that must run next and have findrunnable pick
this goroutine without considering timers again.

Fixes #38093
Fixes #38574

Change-Id: Iffbe99411d25c2730953d1c8b0741fd892f8e540
Reviewed-on: https://go-review.googlesource.com/c/go/+/230178
Run-TryBot: Richard Musiol <neelance@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-05-31 18:35:04 +00:00
Michael Pratt 11b3730a02 runtime: disable preemption in startTemplateThread
When a locked M wants to start a new M, it hands off to the template
thread to actually call clone and start the thread. The template thread
is lazily created the first time a thread is locked (or if cgo is in
use).

stoplockedm will release the P (_Pidle), then call handoffp to give the
P to another M. In the case of a pending STW, one of two things can
happen:

1. handoffp starts an M, which does acquirep followed by schedule, which
will finally enter _Pgcstop.

2. handoffp immediately enters _Pgcstop. This only occurs if the P has
no local work, GC work, and no spinning M is required.

If handoffp starts an M, and must create a new M to do so, then newm
will simply queue the M on newmHandoff for the template thread to do the
clone.

When a stop-the-world is required, stopTheWorldWithSema will start the
stop and then wait for all Ps to enter _Pgcstop. If the template thread
is not fully created because startTemplateThread gets stopped, then
another stoplockedm may queue an M that will never get created, and the
handoff P will never leave _Pidle. Thus stopTheWorldWithSema will wait
forever.

A sequence to trigger this hang when STW occurs can be visualized with
two threads:

  T1                                 T2
-------------------------------   -----------------------------

LockOSThread                      LockOSThread
  haveTemplateThread == 0
  startTemplateThread
    haveTemplateThread = 1
    newm                            haveTemplateThread == 1
      preempt -> schedule           g.m.lockedExt++
        gcstopm -> _Pgcstop         g.m.lockedg = ...
        park                        g.lockedm = ...
                                    return

                                 ... (any code)
                                   preempt -> schedule
                                     stoplockedm
                                       releasep -> _Pidle
                                       handoffp
                                         startm (first 3 handoffp cases)
                                          newm
                                            g.m.lockedExt != 0
                                            Add to newmHandoff, return
                                       park

Note that the P in T2 is stuck sitting in _Pidle. Since the template
thread isn't running, the new M will not be started to complete the
transition to _Pgcstop.

To resolve this, we disable preemption around the assignment of
haveTemplateThread and the creation of the template thread in order to
guarantee that if haveTemplateThread is set then the template thread
will eventually exist, in the presence of stops.

Fixes #38931

Change-Id: I50535fbbe2f328f47b18e24d9030136719274191
Reviewed-on: https://go-review.googlesource.com/c/go/+/232978
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-05-21 21:01:39 +00:00
Michael Anthony Knyszek c847589ad0 runtime: synchronize StartTrace and StopTrace with sysmon
Currently sysmon is not stopped when the world is stopped, which is
in general a difficult thing to do. The result of this is that when
tracing starts and the value of trace.enabled changes, it's possible
for sysmon to fail to emit an event when it really should. This leads to
traces which the execution trace parser deems inconsistent.

Fix this by putting all of sysmon's work behind a new lock sysmonlock.
StartTrace and StopTrace both acquire this lock after stopping the world
but before performing any work in order to ensure sysmon sees the
required state change in tracing. This change is expected to slow down
StartTrace and StopTrace, but will help ensure consistent traces are
generated.

Updates #29707.
Fixes #38794.

Change-Id: I64c58e7c3fd173cd5281ffc208d6db24ff6c0284
Reviewed-on: https://go-review.googlesource.com/c/go/+/234617
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-05-21 14:48:50 +00:00
Dan Scales f9640b88c7 runtime: incorporate Gscan acquire/release into lock ranking order
I added routines that can acquire/release a particular rank without
acquiring/releasing an associated lock. I added lockRankGscan as a rank
for acquiring/releasing the Gscan bit.

castogscanstatus() and casGtoPreemptScan() are acquires of the Gscan
bit. casfrom_Gscanstatus() is a release of the Gscan bit. casgstatus()
is like an acquire and release of the Gscan bit, since it will wait if
Gscan bit is currently set.

We have a cycle between hchan and Gscan. The acquisition of Gscan and
then hchan only happens in syncadjustsudogs() when the G is suspended,
so the main normal ordering (get hchan, then get Gscan) can't be
happening. So, I added a new rank lockRankHchanLeaf that is used when
acquiring hchan locks in syncadjustsudogs. This ranking is set so no
other locks can be acquired except other hchan locks.

Fixes #38922

Change-Id: I58ce526a74ba856cb42078f7b9901f2832e1d45c
Reviewed-on: https://go-review.googlesource.com/c/go/+/228417
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-05-07 20:45:42 +00:00
徐志强 85162292af runtime: call osyield directly in lockextra
The `yield := osyield` line doesn't serve any purpose; it was committed in 2015, so it's time to delete that line. :)

Change-Id: I382d4d32cf320f054f011f3b6684c868cbcb0ff2
GitHub-Last-Rev: 7a0aa25e55
GitHub-Pull-Request: golang/go#36078
Reviewed-on: https://go-review.googlesource.com/c/go/+/210837
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-05-07 04:59:13 +00:00
geedchin 01a9cf8487 runtime: correct waitReasonForceGGIdle to waitReasonForceGCIdle
Change-Id: I211db915ce2e98555c58f4320ca58e91536f8f3d
GitHub-Last-Rev: 40a7430f88
GitHub-Pull-Request: golang/go#38852
Reviewed-on: https://go-review.googlesource.com/c/go/+/232037
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-05-05 02:38:39 +00:00
Michael Anthony Knyszek 2491c5fd24 runtime: wake scavenger and update address on sweep done
This change modifies the semantics of waking the scavenger: rather than
wake on any update to pacing, wake when we know we will have work to do,
that is, when the sweeper is done. The current scavenger runs over the
address space just once per GC cycle, and we want to maximize the chance
that the scavenger observes the most attractive scavengable memory in
that pass (i.e. free memory with the highest address), so the timing is
important. By having the scavenger awaken and reset its search space
when the sweeper is done, we increase the chance that the scavenger will
observe the most attractive scavengable memory, because no more memory
will be freed that GC cycle (so the highest scavengable address should
now be available).

Furthermore, in applications that go idle, this means the background
scavenger will be awoken even if another GC doesn't happen, which isn't
true today.

However, we're unable to wake the scavenger directly from within the
sweeper; waking the scavenger involves modifying timers and readying
goroutines, the latter of which may trigger an allocation today (and the
sweeper may run during allocation!). Instead, we do the following:

1. Set a flag which is checked by sysmon. sysmon will clear the flag and
   wake the scavenger.
2. Wake the scavenger unconditionally at sweep termination.

The idea behind this policy is that it gets us close enough to the state
above without having to deal with the complexity of waking the scavenger
in deep parts of the runtime. If the application goes idle and sweeping
finishes (so we don't reach sweep termination), then sysmon will wake
the scavenger. sysmon has a worst-case 20 ms delay in responding to this
signal, which is probably fine if the application is completely idle
anyway, but if the application is actively allocating, then the
proportional sweeper should help ensure that sweeping ends very close to
sweep termination, so sweep termination is a perfectly reasonable time
to wake up the scavenger.

Updates #35788.

Change-Id: I84289b37816a7d595d803c72a71b7f5c59d47e6b
Reviewed-on: https://go-review.googlesource.com/c/go/+/207998
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-04-30 18:12:03 +00:00
Austin Clements 4e00b4c366 runtime: move condition into wakep
All five calls to wakep are protected by the same check of nmidle and
nmspinning. Move this check into wakep.

Change-Id: I2094eec211ce551e462e87614578f37f1896ba38
Reviewed-on: https://go-review.googlesource.com/c/go/+/230757
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2020-04-30 00:42:35 +00:00
Austin Clements b3863fbbc2 runtime: make newproc1 not start the goroutine
Currently, newproc1 allocates, initializes, and schedules a new
goroutine. We're about to change debug call injection in a way that
will need to create a new goroutine without immediately scheduling it.
To prepare for that, make scheduling the responsibility of newproc1's
caller. Currently, there's exactly one caller (newproc), so this
simply shifts that responsibility.

For #36365.

Change-Id: Idacd06b63e738982e840fe995d891bfd377ce23b
Reviewed-on: https://go-review.googlesource.com/c/go/+/229298
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-29 21:29:08 +00:00
Dan Scales 0a820007e7 runtime: static lock ranking for the runtime (enabled by GOEXPERIMENT)
I took some of the infrastructure from Austin's lock logging CR
https://go-review.googlesource.com/c/go/+/192704 (with deadlock
detection from the logs), and developed a setup to give static lock
ranking for runtime locks.

Static lock ranking establishes a documented total ordering among locks,
and then reports an error if the total order is violated. This can
happen if a deadlock happens (by acquiring a sequence of locks in
different orders), or if just one side of a possible deadlock happens.
Lock ordering deadlocks cannot happen as long as the lock ordering is
followed.

Along the way, I found a deadlock involving the new timer code, which Ian fixed
via https://go-review.googlesource.com/c/go/+/207348, as well as two other
potential deadlocks.

See the constants at the top of runtime/lockrank.go to show the static
lock ranking that I ended up with, along with some comments. This is
great documentation of the current intended lock ordering when acquiring
multiple locks in the runtime.

I also added an array lockPartialOrder[] which shows and enforces the
current partial ordering among locks (which is embedded within the total
ordering). This is more specific about the dependencies among locks.
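
A toy sketch of the ranking check (the ranks and names below are
invented; the real ordering lives in runtime/lockrank.go): acquiring a
lock whose rank is not strictly greater than every rank already held is
reported as an ordering violation.

    package main

    import "fmt"

    type lockRank int

    const (
        rankSched lockRank = iota + 1
        rankTimers
        rankHchan
    )

    // held models the ranks currently held by one M (illustrative only).
    var held []lockRank

    func acquire(r lockRank) {
        for _, h := range held {
            if h >= r {
                panic(fmt.Sprintf("lock ordering violation: acquiring rank %d while holding rank %d", r, h))
            }
        }
        held = append(held, r)
    }

    func release(r lockRank) {
        for i := len(held) - 1; i >= 0; i-- {
            if held[i] == r {
                held = append(held[:i], held[i+1:]...)
                return
            }
        }
    }

    func main() {
        acquire(rankSched)
        acquire(rankTimers) // ok: ranks increase
        release(rankTimers)
        release(rankSched)
    }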

I don't try to check the ranking within a lock class with multiple locks
that can be acquired at the same time (i.e. check the ranking when
multiple hchan locks are acquired).

Currently, I am doing a lockInit() call to set the lock rank of most
locks. Any lock that is not otherwise initialized is assumed to be a
leaf lock (a very high rank lock), so that eliminates the need to do
anything for a bunch of locks (including all architecture-dependent
locks). For two locks, root.lock and notifyList.lock (only in the
runtime/sema.go file), it is not as easy to do lock initialization, so
instead, I am passing the lock rank with the lock calls.

For Windows compilation, I needed to increase the StackGuard size from
896 to 928 because of the new lock-rank checking functions.

Checking of the static lock ranking is enabled by setting
GOEXPERIMENT=staticlockranking before doing a run.

To make sure that the static lock ranking code has no overhead in memory
or CPU when not enabled by GOEXPERIMENT, I changed 'go build/install' so
that it defines a build tag (with the same name) whenever any experiment
has been baked into the toolchain (by checking Expstring()). This allows
me to avoid increasing the size of the 'mutex' type when static lock
ranking is not enabled.

Fixes #38029

Change-Id: I154217ff307c47051f8dae9c2a03b53081acd83a
Reviewed-on: https://go-review.googlesource.com/c/go/+/207619
Reviewed-by: Dan Scales <danscales@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-07 21:51:03 +00:00
Michael Anthony Knyszek f1f947af28 runtime: don't hold worldsema across mark phase
This change makes it so that worldsema isn't held across the mark phase.
This means that various operations like ReadMemStats may now stop the
world during the mark phase, reducing latency on such operations.
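
For reference, the latency-sensitive operation here is an ordinary
runtime.ReadMemStats call, as in this minimal sketch:

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        var ms runtime.MemStats
        // With worldsema no longer held across marking, this brief
        // stop-the-world can proceed even while the GC is in its mark phase.
        runtime.ReadMemStats(&ms)
        fmt.Println("heap in use:", ms.HeapInuse)
    }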

Only three such operations are still no longer allowed to occur during
marking: GOMAXPROCS, StartTrace, and StopTrace.

For the former it's because any change to GOMAXPROCS impacts GC mark
background worker scheduling and the details there are tricky.

For the latter two it's because tracing needs to observe consistent GC
start and GC end events, and if StartTrace or StopTrace may stop the
world during marking, then it's possible for it to see a GC end event
without a start or GC start event without an end, respectively.

To ensure that GOMAXPROCS and StartTrace/StopTrace cannot proceed until
marking is complete, the runtime now holds a new semaphore, gcsema,
across the mark phase just like it used to with worldsema.

This change is being landed once more after being reverted in the Go
1.14 release cycle, since CL 215157 allows it to have a positive
effect on system performance.

For the benchmark BenchmarkReadMemStatsLatency in the runtime, which
measures ReadMemStats latencies while the GC is exercised, the tail of
these latencies reduced dramatically on an 8-core machine:

name                   old 50%tile-ns  new 50%tile-ns  delta
ReadMemStatsLatency-8      4.40M ±74%      0.12M ± 2%  -97.35%  (p=0.008 n=5+5)

name                   old 90%tile-ns  new 90%tile-ns  delta
ReadMemStatsLatency-8       102M ± 6%         0M ±14%  -99.79%  (p=0.008 n=5+5)

name                   old 99%tile-ns  new 99%tile-ns  delta
ReadMemStatsLatency-8       147M ±18%         4M ±57%  -97.43%  (p=0.008 n=5+5)

Fixes #19812.

Change-Id: If66c3c97d171524ae29f0e7af4bd33509d9fd0bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/216557
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-18 19:13:50 +00:00
Ian Lance Taylor 2fbca94db7 runtime: add goroutines returned by poller to local run queue
In Go 1.13, when the network poller found a list of ready goroutines,
they were added to the global run queue. The timer goroutine would
typically sleep in a futex with a timeout, and when the timeout
expired the timer goroutine would either be handed off to an idle P
or added to the global run queue. The effect was that on a busy system
with no idle P's goroutines waiting for timeouts and goroutines waiting
for the network would start at the same priority.

That changed on tip with the new timer code. Now timer functions are
invoked directly from a P, and it happens that the functions used
by time.Sleep and time.After and time.Ticker add the newly ready
goroutines to the local run queue. When a P looks for work it will
prefer goroutines on the local run queue; in fact it will only
occasionally look at the global run queue, and even when it does it
will just pull one goroutine off. So on a busy system with both active
timers and active network connections the system can noticeably prefer
to run goroutines waiting for timers rather than goroutines waiting
for the network.

This CL undoes that change by, when possible, adding goroutines
waiting for the network to the local run queue of the P that checked.
This doesn't affect network poller checks done by sysmon, but it
does affect network poller checks done as each P enters the scheduler.

This CL also makes injecting a list into either the local or global run
queue more efficient, using bulk operations rather than individual ones.

Change-Id: I85a66ad74e4fc3b458256fb7ab395d06f0d2ffac
Reviewed-on: https://go-review.googlesource.com/c/go/+/216198
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2020-03-16 22:31:39 +00:00
Ian Lance Taylor 3093959ee1 runtime: remove mcache field from m
Having an mcache field in both m and p is confusing, so remove it from m.
Always use mcache field from p. Use new variable mcache0 during bootstrap.

Change-Id: If2cba9f8bb131d911d512b61fd883a86cf62cc98
Reviewed-on: https://go-review.googlesource.com/c/go/+/205239
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-02-24 16:39:52 +00:00
Michael Knyszek 64c22b70bf Revert "runtime: don't hold worldsema across mark phase"
This reverts commit 7b294cdd8d, CL 182657.

Reason for revert: This change may be causing latency problems
for applications which call ReadMemStats, because it may cause
all goroutines to stop until the GC completes.

https://golang.org/cl/215157 fixes this problem, but it's too
late in the cycle to land that.

Updates #19812.

Change-Id: Iaa26f4dec9b06b9db2a771a44e45f58d0aa8f26d
Reviewed-on: https://go-review.googlesource.com/c/go/+/216358
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-01-24 23:27:33 +00:00
Ian Lance Taylor 895b7c85ad runtime: don't skip checkTimers if we would clear deleted timers
The timers code used to have a problem: if code started and stopped a
lot of timers, as would happen with, for example, lots of calls to
context.WithTimeout, then it would steadily use memory holding timers
that had stopped but not been removed from the timer heap.
That problem was fixed by CL 214299, which would remove all deleted
timers whenever they got to be more than 1/4 of the total number of
timers on the heap.
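
A sketch of the kind of workload described above: each iteration starts a
timer and immediately stops it, leaving a deleted timer on the P's heap
until cleanup runs.

    package main

    import (
        "context"
        "time"
    )

    func main() {
        ctx := context.Background()
        for i := 0; i < 100000; i++ {
            // WithTimeout starts a timer; cancel stops it. The stopped
            // timer stays on the P's timer heap as "deleted" until the
            // runtime removes it.
            c, cancel := context.WithTimeout(ctx, time.Hour)
            _ = c
            cancel()
        }
    }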

The timers code had a different problem: if there were some idle P's,
the running P's would have lock contention trying to steal their timers.
That problem was fixed by CL 214185, which only acquired the timer lock
if the next timer was ready to run or there were some timers to adjust.

Unfortunately, CL 214185 partially undid 214299, in that we could now
accumulate an increasing number of deleted timers while there were no
timers ready to run. This CL restores the 214299 behavior, by checking
whether there are lots of deleted timers without acquiring the lock.

This is a performance issue to consider for the 1.14 release.

Change-Id: I13c980efdcc2a46eb84882750c39e3f7c5b2e7c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/215722
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-01-22 18:10:42 +00:00
Ian Lance Taylor cfe3cd903f runtime: keep P's first timer when in new atomically accessed field
This reduces lock contention when only a few P's are running and
checking for whether they need to run timers on the sleeping P's.
Without this change the running P's would get lock contention
while looking at the sleeping P's timers. With this change a single
atomic load suffices to determine whether there are any ready timers.

Change-Id: Ie843782bd56df49867a01ecf19c47498ec827452
Reviewed-on: https://go-review.googlesource.com/c/go/+/214185
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
2020-01-14 19:54:20 +00:00
Ian Lance Taylor 641e61db57 runtime: don't let P's timer heap get clogged with deleted timers
Whenever more than 1/4 of the timers on a P's heap are deleted,
remove them from the heap.

Change-Id: Iff63ed3d04e6f33ffc5c834f77f645c52c007e52
Reviewed-on: https://go-review.googlesource.com/c/go/+/214299
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2020-01-10 23:03:06 +00:00
Rhys Hiltner a4c579e8f7 runtime: emit trace event in direct semaphore handoff
When a goroutine yields the remainder of its time to another goroutine
during direct semaphore handoff (as in an Unlock of a sync.Mutex in
starvation mode), it needs to signal that change to the execution
tracer. The discussion in CL 200577 didn't reach consensus on how best
to describe that, but pointed out that "traceEvGoSched / goroutine calls
Gosched" could be confusing.

Emit a "traceEvGoPreempt / goroutine is preempted" event in this case,
to allow the execution tracer to find a consistent event ordering
without being both specific and inaccurate about why the active
goroutine has changed.

Fixes #36186

Change-Id: Ic4ade19325126db2599aff6aba7cba028bb0bee9
Reviewed-on: https://go-review.googlesource.com/c/go/+/211797
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-01-02 20:13:03 +00:00
Ian Lance Taylor 580337e268 runtime, time: remove old timer code
Updates #6239
Updates #27707

Change-Id: I65e6471829c9de4677d3ac78ef6cd7aa0a1fc4cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/171884
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
2019-11-19 15:30:58 +00:00
Ian Lance Taylor 2d8c1995b9 runtime: release timersLock while running timer
Dan Scales pointed out a theoretical deadlock in the runtime.

The timer code runs timer functions while holding the timers lock for a P.
The scavenger queues up a timer function that calls wakeScavenger,
which acquires the scavenger lock.

The scavengeSleep function acquires the scavenger lock,
then calls resetTimer which can call addInitializedTimer
which acquires the timers lock for the current P.

So there is a potential deadlock, in that the scavenger lock and
the timers lock for some P may both be acquired in different order.
It's not clear to me whether this deadlock can ever actually occur.

Issue 35532 describes another possible deadlock.

The pollSetDeadline function acquires pd.lock for some poll descriptor,
and in some cases calls resettimer which can in some cases acquire
the timers lock for the current P.

The timer code runs timer functions while holding the timers lock for a P.
The timer function for poll descriptors winds up in netpolldeadlineimpl
which acquires pd.lock.

So again there is a potential deadlock, in that the pd lock for some
poll descriptor and the timers lock for some P may both be acquired in
different order. I think this can happen if we change the deadline
for a network connection exactly as the former deadline expires.

Looking at the code, I don't see any reason why we have to hold
the timers lock while running a timer function.
This CL implements that change.

Updates #6239
Updates #27707
Fixes #35532

Change-Id: I17792f5a0120e01ea07cf1b2de8434d5c10704dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/207348
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2019-11-19 02:41:53 +00:00
Ian Lance Taylor e762378c42 runtime: acquire timersLocks around moveTimers
In the discussion of CL 171828 we decided that it was not necessary to
acquire timersLock around the call to moveTimers, because the world is
stopped. However, that is not correct, as sysmon runs even when the world
is stopped, and it calls timeSleepUntil which looks through the timers.
timeSleepUntil acquires timersLock, but that doesn't help if moveTimers
is running at the same time.

Updates #6239
Updates #27707
Updates #35462

Change-Id: I346c5bde594c4aff9955ae430b37c2b6fc71567f
Reviewed-on: https://go-review.googlesource.com/c/go/+/206938
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2019-11-13 18:03:37 +00:00
Carlo Alberto Ferraris 97d0505334 runtime: consistently seed fastrand state across archs
Some, but not all, architectures mix in OS-provided random seeds when
initializing the fastrand state. The others have TODOs saying we need
to do the same. Lift that logic up in the architecture-independent
part, and use memhash to mix the seed instead of a simple addition.
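
An illustrative sketch of the mixing idea, using a generic 64-bit integer
hash in place of the runtime's memhash: hashing the per-thread value
together with the OS-provided seed decorrelates the per-thread states in
a way a simple addition does not.

    package main

    import "fmt"

    // mix64 is a splitmix64-style finalizer standing in for memhash here.
    func mix64(x uint64) uint64 {
        x ^= x >> 30
        x *= 0xbf58476d1ce4e5b9
        x ^= x >> 27
        x *= 0x94d049bb133111eb
        x ^= x >> 31
        return x
    }

    func main() {
        osSeed := uint64(0x9e3779b97f4a7c15) // pretend OS-provided entropy
        for tid := uint64(0); tid < 4; tid++ {
            // Mixing rather than adding makes nearby thread IDs produce
            // uncorrelated initial states.
            fmt.Printf("thread %d: %#016x\n", tid, mix64(osSeed^tid))
        }
    }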

Previously, dumping the fastrand state at initialization would yield
something like the following on linux-amd64, where the values in the
first column do not change between runs (as thread IDs are sequential
and always start at 0), and the values in the second column, while
changing every run, are pretty correlated:

first run:

0x0 0x44d82f1c
0x5f356495 0x44f339de
0xbe6ac92a 0x44f91cd8
0x1da02dbf 0x44fd91bc
0x7cd59254 0x44fee8a4
0xdc0af6e9 0x4547a1e0
0x3b405b7e 0x474c76fc
0x9a75c013 0x475309dc
0xf9ab24a8 0x4bffd075

second run:

0x0 0xa63fc3eb
0x5f356495 0xa6648dc2
0xbe6ac92a 0xa66c1c59
0x1da02dbf 0xa671bce8
0x7cd59254 0xa70e8287
0xdc0af6e9 0xa7129d2e
0x3b405b7e 0xa7379e2d
0x9a75c013 0xa7e4c64c
0xf9ab24a8 0xa7ecce07

With this change, we get initial states that appear to be much more
unpredictable, both within the same run as well as between runs:

0x11bddad7 0x97241c63
0x553dacc6 0x2bcd8523
0x62c01085 0x16413d92
0x6f40e9e6 0x7a138de6
0xa4898053 0x70d816f0
0x5ca5b433 0x188a395b
0x62778ca9 0xd462c3b5
0xd6e160e4 0xac9b4bd
0xb9571d65 0x597a981d

Change-Id: Ib22c530157d74200df0083f830e0408fd4aaea58
Reviewed-on: https://go-review.googlesource.com/c/go/+/203439
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-12 21:40:12 +00:00
Rhys Hiltner 7148478f1b sync: yield to the waiter when unlocking a starving mutex
When we have already assigned the semaphore ticket to a specific
waiter, we want to get the waiter running as fast as possible since
no other G waiting on the semaphore can acquire it optimistically.

The net effect is that, when a sync.Mutex is contended, the code in
the critical section guarded by the Mutex gets a priority boost.

Fixes #33747

The original work was done in CL 200577 by Carlo Alberto Ferraris. The
change was reverted in CL 205817 because it broke the linux-arm64-packet
and solaris-amd64-oraclerel builders.

Change-Id: I76d79b1d63fd206ed1c57fe6900cb7ae9e4d46cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/206180
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-11-09 19:31:32 +00:00
Michael Anthony Knyszek a2cd2bd55d runtime: add per-p page allocation cache
This change adds a per-p free page cache which the page allocator may
allocate out of without a lock. The change also introduces a completely
lockless page allocator fast path.

Although the cache contains at most 64 pages (and usually less), the
vast majority (85%+) of page allocations are exactly 1 page in size.

Updates #35112.

Change-Id: I170bf0a9375873e7e3230845eb1df7e5cf741b78
Reviewed-on: https://go-review.googlesource.com/c/go/+/195701
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 18:00:54 +00:00
Michael Anthony Knyszek 4517c02f28 runtime: add per-p mspan cache
This change adds a per-p mspan object cache similar to the sudog cache.
Unfortunately this cache can't quite operate like the sudog cache, since
it is used in contexts where write barriers are disallowed (i.e.
allocation codepaths), so rather than managing an array and a slice,
it's just an array and a length. A little bit more unsafe, but avoids
any write barriers.

The purpose of this change is to reduce the number of operations which
require the heap lock in allocation, paving the way for a lockless fast
path.

Updates #35112.

Change-Id: I32cfdcd8528fb7be985640e4f3a13cb98ffb7865
Reviewed-on: https://go-review.googlesource.com/c/go/+/196642
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 17:01:32 +00:00
Bryan C. Mills 73d57bf80f Revert "sync: yield to the waiter when unlocking a starving mutex"
This reverts CL 200577.

Reason for revert: broke linux-arm64-packet and solaris-amd64-oraclerel builders

Fixes #35424
Updates #33747

Change-Id: I2575fd84d37995d458183caae54704f15d8b8426
Reviewed-on: https://go-review.googlesource.com/c/go/+/205817
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-11-07 15:04:03 +00:00
Carlo Alberto Ferraris a8f57f4ada sync: yield to the waiter when unlocking a starving mutex
When we have already assigned the semaphore ticket to a specific
waiter, we want to get the waiter running as fast as possible since
no other G waiting on the semaphore can acquire it optimistically.

The net effect is that, when a sync.Mutex is contended, the code in
the critical section guarded by the Mutex gets a priority boost.

Fixes #33747

Change-Id: I9967f0f763c25504010651bdd7f944ee0189cd45
Reviewed-on: https://go-review.googlesource.com/c/go/+/200577
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-11-07 05:59:33 +00:00
Ian Lance Taylor b50fcc88e9 runtime: don't hold scheduler lock when calling timeSleepUntil
Otherwise, we can get into a deadlock: sysmon takes the scheduler lock
and calls timeSleepUntil which takes each P's timer lock. Simultaneously,
some P calls runtimer (holding the P's own timer lock) which wakes up
the scavenger, calling goready, calling wakep, calling startm, getting
the scheduler lock. Now the sysmon thread is holding the scheduler lock
and trying to get a P's timer lock, while some other thread running on
that P is holding the P's timer lock and trying to get the scheduler lock.

So change sysmon to call timeSleepUntil without holding the scheduler
lock, and change timeSleepUntil to use allpLock, which is only held for
limited periods of time and should never compete with timer locks.

This hopefully

Fixes #35375

At least it should fix the linux-arm64-packet builder problems,
which occurred more reliably as that system has GOMAXPROCS == 96,
giving a lot more scope for this deadlock.

Change-Id: I7a7917daf7a4882e0b27ca416e4f6300cfaaa774
Reviewed-on: https://go-review.googlesource.com/c/go/+/205558
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2019-11-06 15:46:26 +00:00
Ian Lance Taylor d80ab3e85a runtime: wake netpoller when dropping P, don't sleep too long in sysmon
When dropping a P, if it has any timers, and if some thread is
sleeping in the netpoller, wake the netpoller to run the P's timers.
This mitigates races between the netpoller deciding how long to sleep
and a new timer being added.

In sysmon, if all P's are idle, check the timers to decide how long to sleep.
This avoids oversleeping if no thread is using the netpoller.
This can happen in particular if some threads use runtime.LockOSThread,
as those threads do not block in the netpoller.

Also, print the number of timers per P for GODEBUG=scheddetail=1.

Before this CL, TestLockedDeadlock2 would fail about 1% of the time.
With this CL, I ran it 150,000 times with no failures.

Updates #6239
Updates #27707
Fixes #35274
Fixes #35288

Change-Id: I7e5193e6c885e567f0b1ee023664aa3e2902fcd1
Reviewed-on: https://go-review.googlesource.com/c/go/+/204800
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2019-11-04 21:37:08 +00:00