In Go 1.13, when the network poller found a list of ready goroutines,
they were added to the global run queue. The timer goroutine would
typically sleep in a futex with a timeout, and when the timeout
expired the timer goroutine would either be handed off to an idle P
or added to the global run queue. The effect was that on a busy system
with no idle P's, goroutines waiting for timeouts and goroutines waiting
for the network would start at the same priority.
That changed on tip with the new timer code. Now timer functions are
invoked directly from a P, and it happens that the functions used
by time.Sleep and time.After and time.Ticker add the newly ready
goroutines to the local run queue. When a P looks for work it will
prefer goroutines on the local run queue; in fact it will only
occasionally look at the global run queue, and even when it does it
will just pull one goroutine off. So on a busy system with both active
timers and active network connections, the system can noticeably prefer
to run goroutines waiting for timers rather than goroutines waiting
for the network.
This CL undoes that change by, when possible, adding goroutines
waiting for the network to the local run queue of the P that checked.
This doesn't affect network poller checks done by sysmon, but it
does affect network poller checks done as each P enters the scheduler.
This CL also makes injecting a list into either the local or global run
queue more efficient, using bulk operations rather than individual ones.
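To make the imbalance concrete, here is a hedged, user-level sketch of the
kind of workload being described (illustrative only, not code from this CL;
all names and durations are made up): goroutines that become runnable via
timers mixed with goroutines that become runnable via the network poller.

package main

import (
    "net"
    "sync"
    "time"
)

func main() {
    ln, err := net.Listen("tcp", "127.0.0.1:0")
    if err != nil {
        panic(err)
    }
    defer ln.Close()

    // Server: write one byte back after a short delay, so that client
    // goroutines park in the network poller in the meantime.
    go func() {
        for {
            c, err := ln.Accept()
            if err != nil {
                return
            }
            go func(c net.Conn) {
                defer c.Close()
                time.Sleep(5 * time.Millisecond)
                c.Write([]byte{1})
            }(c)
        }
    }()

    var wg sync.WaitGroup
    for i := 0; i < 50; i++ {
        wg.Add(2)
        // Becomes runnable when the network poller reports the read ready.
        go func() {
            defer wg.Done()
            c, err := net.Dial("tcp", ln.Addr().String())
            if err != nil {
                return
            }
            defer c.Close()
            var buf [1]byte
            c.Read(buf[:])
        }()
        // Becomes runnable when its timer fires.
        go func() {
            defer wg.Done()
            time.Sleep(5 * time.Millisecond)
        }()
    }
    wg.Wait()
}

With this CL, both kinds of wakeups can land on a P's local run queue, so
neither class of goroutine is systematically favored.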
Change-Id: I85a66ad74e4fc3b458256fb7ab395d06f0d2ffac
Reviewed-on: https://go-review.googlesource.com/c/go/+/216198
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Use instructions in place of currently used defines.
Updates #36765
Change-Id: I00bb59e77b1aace549d7857cc9721ba2cb4ac6ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/220541
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Add support for Load-Reserved (LR) and Store-Conditional (SC) instructions.
Use instructions in place of currently used defines.
Updates #36765
Change-Id: I77e660639802293ece40cfde4865ac237e3308d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/220540
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Also remove #define's that were previously in use.
Updates #36765
Change-Id: I90b6a8629c78f549012f3f6c5f3b325336182712
Reviewed-on: https://go-review.googlesource.com/c/go/+/220539
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Instead, note that mlock has failed, start trying the mitigation of
touching the signal stack before sending a preemption signal, and,
if the program crashes, mention the possible problem and a wiki page
describing the issue (https://golang.org/wiki/LinuxKernelSignalVectorBug).
Tested on a kernel in the buggy version range, but with the patch,
by using `ulimit -l 0`.
Fixes #37436
Change-Id: I072aadb2101496dffd655e442fa5c367dad46ce8
Reviewed-on: https://go-review.googlesource.com/c/go/+/223121
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
This commit extends the -spectre flag to cmd/asm and adds
a new Spectre mitigation mode "ret", which enables the use
of retpolines.
Retpolines prevent speculation about the target of an indirect
jump or call and are described in more detail here:
https://support.google.com/faqs/answer/7625886
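As a usage note (hedged: the exact invocation is not spelled out in this CL,
and assumes the usual -gcflags/-asmflags plumbing for passing flags to the
compiler and assembler), building with retpolines enabled would look roughly
like:

go build -gcflags=all=-spectre=ret -asmflags=all=-spectre=ret ./...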
Change-Id: I4f2cb982fa94e44d91e49bd98974fd125619c93a
Reviewed-on: https://go-review.googlesource.com/c/go/+/222661
Reviewed-by: Keith Randall <khr@golang.org>
Update error messages for pointer alignment checks and pointer
arithmetic checks so that each type of error has a unique error
message.
Fixes #37488
Change-Id: Ida2c2fa3f041a3307d665879a463f9e8f2c1fd03
Reviewed-on: https://go-review.googlesource.com/c/go/+/223037
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The cleantimers function can run for a while in some unlikely cases.
If the GC is trying to preempt the G, it is forced to wait as the
G is holding timersLock. To avoid introducing a GC delay,
return from cleantimers if the G has a preemption request.
Fixes #37779
Change-Id: Id9a567f991e26668e2292eefc39e2edc56efa4e0
Reviewed-on: https://go-review.googlesource.com/c/go/+/223122
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This test is flaky, and the cause is suspected to be an OpenBSD kernel bug.
Since there is no obvious workaround on the Go side, skip the test on
builders whose versions are known to be affected.
Fixes #17496
Change-Id: Ifa70061eb429e1d949f0fa8a9e25d177afc5c488
Reviewed-on: https://go-review.googlesource.com/c/go/+/222856
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
In the open-code defer implementation, we add defer struct entries to the defer
chain on-the-fly at panic time to represent stack frames that contain open-coded
defers. This allows us to process non-open-coded and open-coded defers in the
correct order. Also, we need somewhere to be able to store the 'started' state of
open-coded defers. However, if a recover succeeds, defers will now be processed
inline again (unless another panic happens). Any defer entry representing a frame
with open-coded defers will become stale once we run the corresponding defers
inline and exit the associated stack frame. So, we need to remove all entries for
open-coded defers at recover time.
The current code was only removing the top-most open-coded defer from the defer
chain during recovery. However, with recursive functions that do repeated
panic-recover-repanic, multiple stale entries can accumulate on the chain. So, we
just adjust the loop to process the entire chain. Since this is on the
panic/recover path, it is fine to scan through the entire chain (which should
usually have few elements in it, since most defers are open-coded).
The added test fails with a SEGV without the fix, because it tries to run a
stale open-coded defer entry (and the stack has changed).
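The failure pattern can be illustrated with a hedged sketch along these lines
(this is not the exact test added by the CL): a function that recursively
panics, recovers in an open-coded defer, and re-panics, so that recover runs
repeatedly while frames with open-coded defers are unwound.

package main

import "fmt"

func recurse(n int) {
    defer func() {
        if r := recover(); r != nil {
            fmt.Println("recovered at depth", n, ":", r)
            if n > 0 {
                panic(r) // re-panic so the caller's frame recovers too
            }
        }
    }()
    if n < 3 {
        recurse(n + 1)
        return
    }
    panic("boom")
}

func main() {
    recurse(0)
}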
Fixes #37664.
Change-Id: I8e3da5d610b5e607411451b66881dea887f7484d
Reviewed-on: https://go-review.googlesource.com/c/go/+/222420
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
If typehash (used by reflect) does not match the built-in map's hash,
then problems occur. If a map is built using reflect, and then
assigned to a variable of map type, the hash function can change. That
causes very bad things.
This issue is rare. MapOf consults a cache of all types that occur in
the binary before making a new one. To make a true new map type (with
a hash function derived from typehash) that map type must not occur in
the binary anywhere. But to cause the bug, we need a variable of that
type in order to assign to it. The only way to make that work is to
use a named map type for the variable, so it is distinct from the
unnamed version that MapOf looks for.
Fixes #37716
Change-Id: I3537bfceca8cbfa1af84202f432f3c06953fe0ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/222357
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
When generating stacks, the runtime automatically expands inline
functions to inline all inline frames in the stack. However, due to the
stack size limit, the final frame may be truncated in the middle of
several inline frames at the same location.
As-is, we assume that the final frame is a normal function, and emit and
cache a Location for it. If we later receive a complete stack frame, we
will first use the cached Location for the inlined function and then
generate a new Location for the "caller" frame, in violation of the
pprof requirement to merge inlined functions into the same Location.
As a result, we:
1. Nondeterministically may generate a profile with the different stacks
combined or split, depending on which is encountered first. This is
particularly problematic when performing a diff of profiles.
2. When split stacks are generated, we lose the inlining information.
We avoid both of these problems by performing a second expansion of the
last stack frame to recover additional inline frames that may have been
lost. This expansion is a bit simpler than the one done by the runtime
because we don't have to handle skipping, and we know that the last
emitted frame is not an elided wrapper, since it by definition is
already included in the stack.
Fixes #37446
Change-Id: If3ca2af25b21d252cf457cc867dd932f107d4c61
Reviewed-on: https://go-review.googlesource.com/c/go/+/221577
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
This was the last remaining use of staticbytes, so we can now
delete it.
The new code appears slightly faster on amd64:
name old time/op new time/op delta
SliceByteToString/1-4 6.29ns ± 2% 5.89ns ± 1% -6.46% (p=0.000 n=14+14)
This may not be the case on the big-endian architectures, since they have
to do an extra addition.
Updates #37612
Change-Id: Icb84c5911ba025f798de152849992a55be99e4f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/221979
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
There are still two places in src/runtime/string.go that use
staticbytes, so we cannot delete it just yet.
There is a new codegen test to verify that the index calculation
is constant-folded, at least on amd64. ppc64, mips[64] and s390x
cannot currently do that.
There is also a new runtime benchmark to ensure that this does not
slow down performance (tested against parent commit):
name old time/op new time/op delta
ConvT2EByteSized/bool-4 1.07ns ± 1% 1.07ns ± 1% ~ (p=0.060 n=14+15)
ConvT2EByteSized/uint8-4 1.06ns ± 1% 1.07ns ± 1% ~ (p=0.095 n=14+15)
Updates #37612
Change-Id: I5ec30738edaa48cda78dfab4a78e24a32fa7fd6a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221957
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Make panics more useful by printing values, if their
underlying kind is printable, instead of just their memory address.
Thus now, given any custom type derived from any of:
complex*, float*, int*, string, uint*
if we panic with such a value, its value will be printed.
For example, given any of:
type MyComplex128 complex128
type MyFloat64 float64
type MyString string
type MyUintptr uintptr
panic(MyComplex128(32.1 + 10i))
panic(MyFloat64(-93.7))
panic(MyString("This one"))
panic(MyUintptr(93))
They will now print in the panic:
panic: main.MyComplex128(+3.210000e+001+1.000000e+001i)
panic: main.MyFloat64(-9.370000e+001)
panic: main.MyString("This one")
panic: main.MyUintptr(93)
instead of:
panic: (main.MyComplex128) (0xe0100,0x138cc0)
panic: (main.MyFloat64) (0xe0100,0x138068)
panic: (main.MyString) (0x48aa00,0x4c0840)
panic: (main.MyUintptr) (0xe0100,0x137e58)
and anything else will be printed as in the past with:
panic: (main.MyStruct) (0xe4ee0,0x40a0e0)
Also while here, updated the Go 1.15 release notes.
Fixes #37531
Change-Id: Ia486424344a386014f2869ab3483e42a9ef48ac4
Reviewed-on: https://go-review.googlesource.com/c/go/+/221779
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Print the bytes of the instruction that generated a SIGILL.
This should help us respond to bug reports without having to
go back-and-forth with the reporter to get the instruction involved.
Might also help with SIGILL problems that are difficult to reproduce.
Update #37513
Change-Id: I33059b1dbfc97bce16142a843f32a88a6547e280
Reviewed-on: https://go-review.googlesource.com/c/go/+/221431
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The CGO traceback function is called whenever CGO code is executing and
a signal is received. This occurs much more frequently now that SIGURG
is used for preemption.
Disable signal preemption to significantly increase the likelihood that
a signal results in a profile sample during the test.
Updates #37201
Change-Id: Icb1a33ab0754d1a74882a4ee265b4026abe30bdc
Reviewed-on: https://go-review.googlesource.com/c/go/+/219417
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The error message for an unrecognized type in decodeField was using
string(i) for an int type i. It was recently changed (by me) to
string(rune(i)), but that just avoided a vet warning without fixing
the problem. This CL fixes the problem by using fmt.Errorf.
We also change the message to "unknown wire type" to match the master
copy of this code in github.com/google/pprof/profile/proto.go.
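A hedged sketch of the difference (the helper names here are hypothetical,
not the CL's code): fmt.Errorf formats the numeric wire type as digits,
whereas converting the integer to a string yields a single, usually
unprintable, rune.

package main

import (
    "errors"
    "fmt"
)

// newStyle formats the wire type as a number, as the fix does.
func newStyle(wireType int) error {
    return fmt.Errorf("unknown wire type: %d", wireType)
}

// oldStyle shows the problematic form being replaced: the conversion
// produces the rune with that code point, not its decimal digits.
func oldStyle(wireType int) error {
    return errors.New("unknown type: " + string(rune(wireType)))
}

func main() {
    fmt.Println(newStyle(7)) // unknown wire type: 7
    fmt.Println(oldStyle(7)) // unknown type: <bell character>
}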
Updates #32479
Change-Id: Ia91ea6d5edbd7cd946225d1ee96bb7623b52bb44
Reviewed-on: https://go-review.googlesource.com/c/go/+/221384
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
If we see a racy use of timers, as in concurrent calls to Timer.Reset,
do the operations in an unpredictable order, rather than crashing.
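For illustration, a hedged sketch of the kind of misuse in question: two
goroutines calling Reset on the same *time.Timer without synchronization.
This remains an incorrect use of the API; the change only means the runtime
tolerates it instead of crashing.

package main

import (
    "sync"
    "time"
)

func main() {
    t := time.NewTimer(time.Hour)
    var wg sync.WaitGroup
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // Unsynchronized Reset calls racing with each other.
            for j := 0; j < 1000; j++ {
                t.Reset(time.Millisecond)
            }
        }()
    }
    wg.Wait()
    t.Stop()
}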
Fixes #37400
Change-Id: Idbac295df2dfd551b6d762909d5040fc532c1b34
Reviewed-on: https://go-review.googlesource.com/c/go/+/221077
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Instead use string(r) where r has type rune.
This is in preparation for a vet warning for string(i).
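A small example of the pattern being adopted (the vet behavior mentioned is
the planned warning, not something this CL itself adds):

package main

import "fmt"

func main() {
    i := 65
    r := rune(i)
    fmt.Println(string(r)) // "A": the conversion from rune is explicit and intentional
    // fmt.Println(string(i)) // the form vet will warn about: it yields a one-rune string, not "65"
    fmt.Println(fmt.Sprint(i)) // "65": use this (or strconv.Itoa) to format the number itself
}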
Updates #32479
Change-Id: Ic205269bba1bd41723950219ecfb67ce17a7aa79
Reviewed-on: https://go-review.googlesource.com/c/go/+/220844
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Akhil Indurti <aindurti@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Toshihiro Shiino <shiino.toshihiro@gmail.com>
In CL 219131 we inserted a VZEROUPPER instruction on darwin/amd64.
The instruction is not available on pre-AVX machines. Guard it
with a CPU feature check.
Fixes #37459.
Change-Id: I9a064df277d091be4ee594eda5c7fd8ee323102b
Reviewed-on: https://go-review.googlesource.com/c/go/+/221057
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
In rare circumstances, this helps report a race which would
otherwise go undetected.
Fixes #36794
Change-Id: I8a3c9bd6fc34efa51516393f7ee72531c34fb073
Reviewed-on: https://go-review.googlesource.com/c/go/+/220685
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
This reverts CL 220722.
Reason for revert: rolling forward with fix.
Fixes #37392
Change-Id: Iba8b0c645267777fbb7019976292d691a10b906a
Reviewed-on: https://go-review.googlesource.com/c/go/+/220898
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The wrapper takes a pointer to the argument, not the argument itself.
Fixes #36705
Change-Id: I566d4457d00bf5b84e4a8315a26516975f0d7e10
Reviewed-on: https://go-review.googlesource.com/c/go/+/215942
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Having an mcache field in both m and p is confusing, so remove it from m.
Always use the mcache field from p. Use the new variable mcache0 during bootstrap.
Change-Id: If2cba9f8bb131d911d512b61fd883a86cf62cc98
Reviewed-on: https://go-review.googlesource.com/c/go/+/205239
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
The runtime implementation of select has an upper limit on the number of
select cases that are supported in order to maintain low stack memory
usage. Rather than support an arbitrary number of select cases, we've
opted to panic early with a useful message pointing the user directly
at the problem.
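In practice the only way to approach such a large number of cases is
reflect.Select, since a select statement's cases are fixed at compile time.
A hedged sketch (the exact limit is an internal constant, assumed here to be
on the order of 65536):

package main

import "reflect"

func main() {
    // Assumed to exceed the runtime's select-case limit.
    cases := make([]reflect.SelectCase, 1<<16+1)
    for i := range cases {
        cases[i] = reflect.SelectCase{
            Dir:  reflect.SelectRecv,
            Chan: reflect.ValueOf(make(chan struct{})),
        }
    }
    // Expected to panic early with a message about too many select
    // cases, rather than misbehaving later.
    reflect.Select(cases)
}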
Fixes #37350
Change-Id: Id129ba281ae120387e681ef96be8adcf89725840
Reviewed-on: https://go-review.googlesource.com/c/go/+/220583
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This constant is only used on libc-based platforms (aix, darwin,
solaris).
Change-Id: Ic57d1fe3b1501c5b552eddb9aba11f1e02510082
Reviewed-on: https://go-review.googlesource.com/c/go/+/220421
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This CL implements the Ticker.Reset method in the time package.
Benchmark:
name time/op
TickerReset-12 6.41µs ±10%
TickerResetNaive-12 95.7µs ±12%
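Typical usage of the new method (the naive alternative benchmarked above is
stopping the ticker and allocating a replacement):

package main

import (
    "fmt"
    "time"
)

func main() {
    t := time.NewTicker(50 * time.Millisecond)
    defer t.Stop()

    <-t.C
    fmt.Println("ticked at the 50ms interval")

    // Change the interval in place rather than replacing the ticker.
    t.Reset(10 * time.Millisecond)
    <-t.C
    fmt.Println("ticked at the 10ms interval")
}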
Fixes #33184
Change-Id: I4cbd31796efa012b2a297bb342158f11a4a31fef
Reviewed-on: https://go-review.googlesource.com/c/go/+/220424
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This reverts CL 217362 (commit 6e5652bebede2d53484a872f6d1dfeb498b0b50c).
Reason for revert: Causing failures on arm64 bots. See #33184 for more info.
Change-Id: I72ba40047e4138767d95aaa68842893c3508c52f
Reviewed-on: https://go-review.googlesource.com/c/go/+/220638
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This CL implements the Ticker.Reset method in the time package.
Benchmark:
name time/op
TickerReset-12 6.41µs ±10%
TickerResetNaive-12 95.7µs ±12%
Fixes #33184
Change-Id: I12c651f81e452541bcbbc748b45f038aae1f8dae
Reviewed-on: https://go-review.googlesource.com/c/go/+/217362
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The code has a comment saying that it waited for the goroutines,
but it didn't actually do so.
Change-Id: Icaeb40613711053a9f443cc34143835560427dda
Reviewed-on: https://go-review.googlesource.com/c/go/+/219277
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
On PPC64, when externally linking, for large binaries we split the
text section into multiple sections, so the external linker may
insert trampolines between sections. These trampolines are within
the address range covered by the func table, but not known to Go.
This causes runtime.findfunc to return the wrong function if the
given PC is from such a trampoline.
In this CL, we generate a marker between text sections where
there could potentially be a hole in the func table. At run time,
we skip the hole if we see such a marker.
Fixes #37216.
Change-Id: I95ab3875a84b357dbaa65a4ed339a19282257ce0
Reviewed-on: https://go-review.googlesource.com/c/go/+/219717
Reviewed-by: David Chase <drchase@google.com>
In walltime1/nanotime1, we save the caller's PC and SP for stack
unwinding. The code that does so assumed a zero frame size. Now that
the frame size is not zero, correct the offset. Rewrite it in a
way that doesn't depend on a hard-coded frame size.
May fix #37127.
Change-Id: I47d6d54fc3499d7d5946c3f6a2dbd24fbd679de1
Reviewed-on: https://go-review.googlesource.com/c/go/+/219118
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Apparently, the signal handling code path in the darwin kernel leaves
the upper bits of the Y registers in a dirty state, which causes many
SSE operations (128-bit and narrower) to become much slower. Clear
the upper bits to get back to a clean state.
We do this at the entry of asyncPreempt, which immediately follows
the exit from the kernel's signal handling code, if we actually
injected a call. It does not cover other exits where we
don't inject a call, e.g. failed preemption, profiling signal, or
other async signals. But it does cover an important use case of
async signals, preempting a tight numerical loop, which we
introduced in this cycle.
Running the benchmark in issue #37174:
name old time/op new time/op delta
Fast-8 90.0ns ± 1% 46.8ns ± 3% -47.97% (p=0.000 n=10+10)
Slow-8 188ns ± 5% 49ns ± 1% -73.82% (p=0.000 n=10+9)
There is no more slowdown due to preemption signals.
For #37174.
Change-Id: I8b83d083fade1cabbda09b4bc25ccbadafaf7605
Reviewed-on: https://go-review.googlesource.com/c/go/+/219131
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
We were using the fallback hash unconditionally. Oops.
Fixes #37212
Change-Id: Id37d4f5c08806fdda12a3148ba4dbc46676eeb54
Reviewed-on: https://go-review.googlesource.com/c/go/+/219337
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Otherwise we can see
- goroutine 1 calls netpollBreak, the atomic.Cas succeeds, then suspends
- goroutine 2 calls noteclear, sets netpollBroken to 0
- goroutine 3 calls netpollBreak, the atomic.Cas succeeds, calls notewakeup
- goroutine 1 wakes up, calls notewakeup, crashes due to double wakeup
This doesn't happen on Plan 9 because it only runs one thread at a time.
But Fuchsia wants to use this code too.
Change-Id: Ib636e4f327bb15e44a2c40fd681aae9a91073a30
Reviewed-on: https://go-review.googlesource.com/c/go/+/218537
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
This commit changes the wording of a comment in malloc.go that describes
how span objects are zeroed to make it more clear.
Change-Id: I07722df1e101af3cbf8680ad07437d4a230b0168
GitHub-Last-Rev: 0e909898c7
GitHub-Pull-Request: golang/go#37008
Reviewed-on: https://go-review.googlesource.com/c/go/+/217618
Reviewed-by: Austin Clements <austin@google.com>
It's possible for the scheduler to try to preempt a goroutine running
on a thread created by C code just as the goroutine returns from Go code
to C code. If that happens, the thread will have a nil g,
which would normally cause us to enter the badsignal code.
The badsignal code will allocate an M, reset the signal handler,
and raise the signal. This is all wasted work for SIGURG,
as the default behavior is for the kernel to ignore the signal.
It also means that there is a period of time when preemption requests
are ignored, because the signal handler is reset to the default.
And, finally, it triggers a bug on 386 OpenBSD 6.2. So stop doing it.
No test because there is no real change in behavior (other than on OpenBSD);
the new code is just more efficient.
Fixes #36996
Change-Id: I8c1cb9bc09f5ef890cab567924417e2423fc71f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/217617
Reviewed-by: Austin Clements <austin@google.com>
Now that the other dependent offset has been identified, we can remove the
unnecessary ADDI instruction from the riscv64 call sequence (reducing it
to AUIPC+JALR, rather than the previous AUIPC+ADDI+JALR).
Change-Id: I348c4efb686f9f71ed1dd1d25fb9142a41230b0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/216798
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change formalizes an assumption made by the page allocator, which
is that (*pageAlloc).searchAddr should never refer to memory that is not
represented by (*pageAlloc).inUse. The portion of address space covered
by (*pageAlloc).inUse reflects the parts of the summary arrays which are
guaranteed to be mapped, and so looking at any summary which is not
reflected there may cause a segfault.
In fact, this can happen today. This change thus also removes a
micro-optimization which is the only case which may cause
(*pageAlloc).searchAddr to point outside of any region covered by
(*pageAlloc).inUse, and adds a test verifying that the current segfault
can no longer occur.
Change-Id: I98b534f0ffba8656d3bd6d782f6fc22549ddf1c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/216697
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This reverts commit 7b294cdd8d, CL 182657.
Reason for revert: This change may be causing latency problems
for applications which call ReadMemStats, because it may cause
all goroutines to stop until the GC completes.
https://golang.org/cl/215157 fixes this problem, but it's too
late in the cycle to land that.
Updates #19812.
Change-Id: Iaa26f4dec9b06b9db2a771a44e45f58d0aa8f26d
Reviewed-on: https://go-review.googlesource.com/c/go/+/216358
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This reverts commit 05511a5c0a, CL 208379.
Reason for revert: So that we can cleanly revert
https://golang.org/cl/182657.
Change-Id: I4fdf4f864a093db7866b3306f0f8f856b9f4d684
Reviewed-on: https://go-review.googlesource.com/c/go/+/216357
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Unlike C's memmove, Go's memmove must be careful to do indivisible
writes of pointer values because it may be racing with the garbage
collector reading the heap.
We've had various bugs related to this over the years (#36101, #13160,
#12552). Indeed, memmove is a great target for optimization and it's
easy to forget the special requirements of Go's memmove.
The CL documents these (currently unwritten!) requirements. We're also
adding a test that should hopefully keep everyone honest going
forward, though it's hard to be sure we're hitting all cases of
memmove.
Change-Id: I2f59f8d8d6fb42d2f10006b55d605b5efd8ddc24
Reviewed-on: https://go-review.googlesource.com/c/go/+/213418
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The timers code used to have a problem: if code started and stopped a
lot of timers, as would happen with, for example, lots of calls to
context.WithTimeout, then it would steadily use memory holding timers
that had stopped but not been removed from the timer heap.
That problem was fixed by CL 214299, which would remove all deleted
timers whenever they got to be more than 1/4 of the total number of
timers on the heap.
The timers code had a different problem: if there were some idle P's,
the running P's would have lock contention trying to steal their timers.
That problem was fixed by CL 214185, which only acquired the timer lock
if the next timer was ready to run or there were some timers to adjust.
Unfortunately, CL 214185 partially undid 214299, in that we could now
accumulate an increasing number of deleted timers while there were no
timers ready to run. This CL restores the 214299 behavior, by checking
whether there are lots of deleted timers without acquiring the lock.
This is a performance issue to consider for the 1.14 release.
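A hedged sketch of the usage pattern in question: many timeouts that are
created and cancelled before they ever fire, each of which leaves a deleted
timer on some P's heap until it is removed.

package main

import (
    "context"
    "time"
)

func handle(ctx context.Context) {
    ctx, cancel := context.WithTimeout(ctx, time.Minute)
    defer cancel() // stops the underlying timer long before it would fire
    _ = ctx        // ... do work bounded by ctx ...
}

func main() {
    for i := 0; i < 1000000; i++ {
        handle(context.Background())
    }
}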
Change-Id: I13c980efdcc2a46eb84882750c39e3f7c5b2e7c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/215722
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
There is a wrong offset when getting the results of a clock_gettime
syscall. Although the syscall path is never taken on native ppc64x (the
VDSO is used instead), QEMU doesn't implement the VDSO, so the syscall is
used there and returns wrong values.
Fixes #36592
Change-Id: Icf838075228dcdd62cf2c1279aa983e5993d66ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/215397
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Based on the riscv-go port.
Updates #27532
Change-Id: If522807a382130be3c8d40f4b4c1131d1de7c9e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/204632
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Go 1.14 will drop support for macOS 10.10; see #23011.
This reverts CL 155097.
Updates #26475
Updates #29340
Change-Id: I64d0275141407313b73068436ee81d13eacc4c76
Reviewed-on: https://go-review.googlesource.com/c/go/+/214058
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This reduces lock contention when only a few P's are running and
checking for whether they need to run timers on the sleeping P's.
Without this change the running P's would get lock contention
while looking at the sleeping P's timers. With this change a single
atomic load suffices to determine whether there are any ready timers.
Change-Id: Ie843782bd56df49867a01ecf19c47498ec827452
Reviewed-on: https://go-review.googlesource.com/c/go/+/214185
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
Currently, the scavenger is paced according to how long it takes to
scavenge one runtime page's worth of memory. However, this pacing
doesn't take into account the additional cost of actually using a
scavenged page. This operation, "sysUsed," is a counterpart to the
scavenging operation "sysUnused." On most systems this operation is a
no-op, but on some systems like Darwin and Windows we actually make a
syscall. Even on systems where it's a no-op, the cost is implicit: a
more expensive page fault when re-using the page.
On Darwin in particular the cost of "sysUnused" is fairly close to the
cost of "sysUsed", which skews the pacing to be too fast. A lot of
soon-to-be-allocated memory ends up scavenged, resulting in many more
expensive "sysUsed" operations, ultimately slowing down the application.
The way to fix this problem is to include the future cost of "sysUsed"
on a page in the scavenging cost. However, measuring the "sysUsed" cost
directly (like we do with "sysUnused") on most systems is infeasible
because we would have to measure the cost of the first access.
Instead, this change applies a multiplicative constant to the measured
scavenging time which is based on a per-system ratio of "sysUnused" to
"sysUsed" costs in the worst case (on systems where it's a no-op, we
measure the cost of the first access). This ultimately slows down the
scavenger to a more reasonable pace, limiting its impact on performance
but still retaining the memory footprint improvements from the previous
release.
Fixes #36507.
Change-Id: I050659cd8cdfa5a32f5cc0b56622716ea0fa5407
Reviewed-on: https://go-review.googlesource.com/c/go/+/214517
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Whenever more than 1/4 of the timers on a P's heap are deleted,
remove them from the heap.
Change-Id: Iff63ed3d04e6f33ffc5c834f77f645c52c007e52
Reviewed-on: https://go-review.googlesource.com/c/go/+/214299
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
We had a few test cases to make sure checkptr didn't have certain
false positives, but none to test for any true positives. This CL
fixes that.
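As an illustration of the kind of true positive meant here (a hedged example,
not the CL's test): converting an unsafe.Pointer to a pointer type whose
alignment the address cannot satisfy. The check runs when the program is
built with -race, or with -gcflags=all=-d=checkptr.

package main

import "unsafe"

func main() {
    b := make([]byte, 16)
    // &b[1] is not suitably aligned for int64, so this conversion
    // violates checkptr's alignment rule and aborts the program
    // when the check is enabled.
    p := (*int64)(unsafe.Pointer(&b[1]))
    _ = p
}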
Updates #22218.
Change-Id: I24c02e469a4af43b1748829a9df325ce510f7cc4
Reviewed-on: https://go-review.googlesource.com/c/go/+/214238
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
When adjustTimers sees a timerModifiedEarlier or timerModifiedLater,
it removes it from the heap, leaving a new timer at that position
in the heap. We were accidentally skipping that new timer in our loop.
In some unlikely cases this could cause adjustTimers to look at more
timers than necessary.
Change-Id: Ic71e54c175ab7d86a7fa46f1497aca71ed1c43cc
Reviewed-on: https://go-review.googlesource.com/c/go/+/214338
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently, scavenging information is printed if the gctrace debug
variable is >0. Scavenging information is also printed naively, for
every page scavenged, resulting in a lot of noise when the typical
expectation for GC trace is one line per GC.
This change adds a new GODEBUG flag called scavtrace which prints
scavenge information roughly once per GC cycle and removes any scavenge
information from gctrace. The exception is debug.FreeOSMemory, which may
force an additional line to be printed.
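For example, running a program with GODEBUG=scavtrace=1 now emits roughly one
scavenger summary line per GC cycle, and scavenging details no longer appear
in GODEBUG=gctrace=1 output.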
Fixes #32952.
Change-Id: I4177dcb85fe3f9653fd74297ea93c97c389c1811
Reviewed-on: https://go-review.googlesource.com/c/go/+/212640
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
On Windows, we implement asynchronous preemption using SuspendThread
to suspend other threads in our process. However, SuspendThread is
itself actually asynchronous (it enqueues a kernel "asynchronous
procedure call" and returns). Unfortunately, Windows' ExitProcess API
kills all threads except the calling one and then runs APCs. As a
result, if SuspendThread and ExitProcess are called simultaneously,
the exiting thread can be suspended and the suspending thread can be
exited, leaving behind a ghost process consisting of a single thread
that's suspended.
We've already protected against the runtime's own calls to
ExitProcess, but if Go code calls external code, there's nothing
stopping that code from calling ExitProcess. For example, in #35775,
our own call to racefini leads to C code calling ExitProcess and
occasionally causing a deadlock.
This CL fixes this by introducing synchronization between calling
external code on Windows and preemption. It adds an atomic field to
the M that participates in a simple CAS-based synchronization protocol
to prevent suspending a thread running external code. We use this to
protect cgocall (which is used for both cgo calls and system calls on
Windows) and racefini.
Tested by running the flag package's TestParse test compiled in race
mode in a loop. Before this change, this would reliably deadlock after
a few minutes.
Fixes #35775.
Updates #10958, #24543.
Change-Id: I50d847abcdc2688b4f71eee6a75eca0f2fee892c
Reviewed-on: https://go-review.googlesource.com/c/go/+/213837
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
On Windows, it might be possible that SuspendThread suspends a
thread right between when an exception happens and when the
exception handler runs. (This is my guess. I don't know the
implementation detail of Windows exceptions to be sure.) In this
case, we may inject a call to asyncPreempt before the exception
handler runs. The exception handler will inject a sigpanic call,
which will make the stack trace look like
sigpanic
asyncPreempt
actual panicking function
i.e. it appears asyncPreempt panicked.
Instead, just overwrite the PC, without pushing another frame.
Fixes #35773.
Change-Id: Ief4e964dcb7f45670b5f93c4dcf285cc1c737514
Reviewed-on: https://go-review.googlesource.com/c/go/+/213879
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This is more readable and less error-prone than using RSP offsets.
Suggested during review of CL 212765.
Change-Id: I070190abeeac8eae5dbd414407602619d9d57422
Reviewed-on: https://go-review.googlesource.com/c/go/+/213577
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Correct the pipe and pipe2 implementations by using the correct RSP offsets,
used to store and return the file descriptor array.
Fix setNonblock by using the correct immediate value for O_NONBLOCK and
replace EOR (exclusive OR) with ORR.
Also correct the write1 implementation, which has a uintptr value for the fd
argument.
Change-Id: Ibca77af44b649e8bb330ca54f9c36a7a8b0f9cea
Reviewed-on: https://go-review.googlesource.com/c/go/+/212765
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The current code uses EOR (exclusive OR), which will result in the O_NONBLOCK
flag being toggled rather than being set. Other implementations use OR, hence
this is likely a bug.
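The difference is easy to see in Go terms (a minimal illustration with
assumed flag values, not the arm64 assembly itself): OR sets the flag
idempotently, while exclusive OR toggles it, clearing O_NONBLOCK if the
descriptor already had it set.

package main

import "fmt"

const (
    oRDWR     = 0x2 // illustrative values only
    oNONBLOCK = 0x4 // O_NONBLOCK on darwin
)

func main() {
    flags := oRDWR | oNONBLOCK // descriptor is already non-blocking

    orr := flags | oNONBLOCK // OR: flag stays set (correct)
    eor := flags ^ oNONBLOCK // XOR: flag toggled off (the bug)

    fmt.Println("ORR: non-blocking =", orr&oNONBLOCK != 0) // true
    fmt.Println("EOR: non-blocking =", eor&oNONBLOCK != 0) // false
}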
Change-Id: I5dafa9c572452070bd37789c8a731ad6d04a86cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/212766
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
In reclaimChunk, the runtime is calling traceGCSweepDone() while holding the mheap
lock. traceGCSweepDone() can call traceEvent() and traceFlush(). These functions
not only can get various trace locks, but they may also do memory allocations
(runtime.newobject) that may end up getting the mheap lock. So, there may be
either a self-deadlock or a possible deadlock between multiple threads.
It seems better to release the mheap lock before calling traceGCSweepDone(). It is
fine to release the lock, since the operations to get the index of the chunk of
work to do are atomic. We already release the lock to call sweep, so there is no
new behavior for any of the callers of reclaimChunk.
With this change, mheap is a leaf lock (no other lock is ever acquired while it
is held).
Testing: besides normal all.bash, also ran all.bash with --long enabled, since
it does longer tests of runtime/trace.
Change-Id: I4f8cb66c24bb8d424f24d6c2305b4b8387409248
Reviewed-on: https://go-review.googlesource.com/c/go/+/207846
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
In the previous CL we ensured that memmove writes pointers
atomically, so the concurrent GC won't observe a partially
updated pointer. This CL adds a test.
Change-Id: Icd1124bf3a15ef25bac20c7fb8933f1a642d897c
Reviewed-on: https://go-review.googlesource.com/c/go/+/212627
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
R11 a.k.a. REGTMP is the temp register used by the assembler. It
may be clobbered if the assembler needs to synthesize
instructions. In particular, in nanotime1/walltime1, the load of
global variable runtime.iscgo clobbers it. So, avoid using R11
to hold a long-lived value.
Fixes #36309.
Change-Id: Iec2ab9d664532cad8fbf58da17f580e64a744f62
Reviewed-on: https://go-review.googlesource.com/c/go/+/212641
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Andrew G. Morgan <agm@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
If a pointer write is not atomic, if the GC is running
concurrently, it may observe a partially updated pointer, which
may point to unallocated or already dead memory. Most pointer
writes, like the store instructions generated by the compiler,
are already atomic. But we still need to be careful in places
like memmove. In memmove, we don't know which bits are pointers
(or too expensive to query), so we ensure that all aligned
pointer-sized units are written atomically.
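A hedged sketch of the scenario being guarded against (illustrative only; the
real regression tests live in the runtime): a plain copy of a pointer-containing
slice ultimately performs a memmove of pointer-containing memory, and it may
run while the garbage collector is concurrently scanning the destination. If a
pointer were written in two halves, the GC could observe a half-updated,
invalid pointer.

package main

import (
    "runtime"
    "sync"
)

func main() {
    src := make([]*int, 1<<16)
    dst := make([]*int, 1<<16)
    for i := range src {
        v := i
        src[i] = &v
    }

    var wg sync.WaitGroup
    wg.Add(2)
    go func() {
        defer wg.Done()
        for i := 0; i < 100; i++ {
            copy(dst, src) // typed slice copy; ends in a memmove of pointers
        }
    }()
    go func() {
        defer wg.Done()
        for i := 0; i < 100; i++ {
            runtime.GC() // the GC scans dst while the copies are running
        }
    }()
    wg.Wait()
}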
Fixes #36101.
Change-Id: I1b3ca24c6b1ac8a8aaf9ee470115e9a89ec1b00b
Reviewed-on: https://go-review.googlesource.com/c/go/+/212626
Reviewed-by: Austin Clements <austin@google.com>
When a goroutine yields the remainder of its time to another goroutine
during direct semaphore handoff (as in an Unlock of a sync.Mutex in
starvation mode), it needs to signal that change to the execution
tracer. The discussion in CL 200577 didn't reach consensus on how best
to describe that, but pointed out that "traceEvGoSched / goroutine calls
Gosched" could be confusing.
Emit a "traceEvGoPreempt / goroutine is preempted" event in this case,
to allow the execution tracer to find a consistent event ordering
without being both specific and inaccurate about why the active
goroutine has changed.
Fixes #36186
Change-Id: Ic4ade19325126db2599aff6aba7cba028bb0bee9
Reviewed-on: https://go-review.googlesource.com/c/go/+/211797
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change makes it so that we check whether scavAddr is actually
mapped before trying to look at the summary for the fast path, since we
may segfault if that part of the summary is not mapped in.
Previously this wasn't a problem because we would conservatively map
all memory for the summaries between the lowest mapped heap address and
the highest one.
This change also adds a test for this case.
Change-Id: I2b1d89b5e044dce81745964dfaba829f4becdc57
Reviewed-on: https://go-review.googlesource.com/c/go/+/212637
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This change disables pageAlloc tests on OpenBSD in short mode because
pageAlloc holds relatively large virtual memory reservations and we make
two during the pageAlloc tests. The runtime may also be carrying one
such reservation making the virtual memory requirement for testing the
Go runtime three times as much as just running a Go binary.
This causes problems for folks who just want to build and test Go
(all.bash) on OpenBSD but either don't have machines with at least 4ish
GiB of RAM (per-process virtual memory limits are capped at some
constant factor times the amount of physical memory) or their
per-process virtual memory limits are low for other reasons.
Fixes #36210.
Change-Id: I8d89cfde448d4cd2fefff4ad6ffed90de63dd527
Reviewed-on: https://go-review.googlesource.com/c/go/+/212177
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Patch up runtime testing to use the libc fcntl function on Darwin,
which is what we should be doing anyhow. This is similar to how
we handle fcntl on AIX and Solaris.
Fixes #36211
Change-Id: I47ad87e11df043ce21496a0d59523dad28960f76
Reviewed-on: https://go-review.googlesource.com/c/go/+/212299
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
nanotime1 and walltime1 do not preserve BP on linux amd64. Previously, this
did not cause a problem, because nanotime/walltime do preserve the BP. But now
with mid-stack inlining, nanotime/walltime are usually inlined, so BP is not
preserved. So, the BP is now wrong in any function after a call to
nanotime()/walltime() on amd64. That means the frame pointer on the stack can
be wrong for any further function call made after the nanotime() call (notably
runtime.main and various GC functions). [386 doesn't use framepointer.]
Fix is to set a frame size of 8 for nanotime1 and walltime1, which means the
standard prolog/epilog that saves/restore BP in the stack frame is added.
I noticed this while investigating issue 16638 (use frame pointers for
runtime.Callers). This change would needed for progress on that issue (which
doesn't have a high priority). Verified that this fix works/is useful for issue
16638.
Change-Id: I19e19ef2c1a517d737a34928baae034f2eb0b2c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/212079
Run-TryBot: Dan Scales <danscales@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
We don't asynchronously preempt if we are in the runtime. We do
this by checking the function name. However, it failed to take
inlining into account. If a runtime function gets inlined into
a non-runtime function, it can be preempted, and bad things can
happen. One instance of this is dounlockOSThread inlined into
UnlockOSThread which is in turn inlined into a non-runtime
function.
Fix this by using the innermost frame's function name.
Change-Id: Ifa036ce1320700aaaefd829b4bee0d04d05c395d
Reviewed-on: https://go-review.googlesource.com/c/go/+/211978
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Systems where PowerRegisterSuspendResumeNotification returns ERROR_
FILE_NOT_FOUND are also systems where nanotime() is on "program time"
rather than "real time". The chain for this is:
powrprof.dll!PowerRegisterSuspendResumeNotification ->
umpdc.dll!PdcPortOpen ->
ntdll.dll!ZwAlpcConnectPort("\\PdcPort") ->
syscall -> ntoskrnl.exe!AlpcpConnectPort
Opening \\.\PdcPort fails with STATUS_OBJECT_NAME_NOT_FOUND when pdc.sys
hasn't been initialized. Pdc.sys also provides the various hooks for
sleep resumption events, which means if it's not loaded, then our "real
time" timer is actually on "program time". Finally STATUS_OBJECT_NAME_
NOT_FOUND is passed through RtlNtStatusToDosError, which returns ERROR_
FILE_NOT_FOUND. Therefore, in the case where the function returns ERROR_
FILE_NOT_FOUND, we don't mind, since the timer we're using will
correspond fine with the lack of sleep resumption notifications. This
applies, for example, to Docker users.
Fixes #35447
Fixes #35482
Change-Id: I9e1ce5bbc54b9da55ff7a3918b5da28112647eee
Reviewed-on: https://go-review.googlesource.com/c/go/+/208317
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Jason A. Donenfeld <Jason@zx2c4.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
When growing the address ranges, the new length is the old length + 1.
Fixes #36113.
Change-Id: I1b425f78e473cfa3cbdfe6113e166663f41fc9f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/211157
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
If the defer function pointer is nil, force the seg fault to happen in deferreturn
rather than in jmpdefer. jmpdefer is used fairly infrequently now because most
functions have open-coded defers.
The open-coded defer implementation calls gentraceback() with a callback when
looking for the first open-coded defer frame. gentraceback() throws an error if it
is called with a callback on an LR architecture and jmpdefer is on the stack,
because the stack trace can be incorrect in that case - see issue #8153. So, we
want to make sure that we don't have a seg fault in jmpdefer.
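A hedged example of the situation described above: deferring a nil function
value. The call is only made when the deferred function runs, so the panic
happens at function return, which this change keeps inside deferreturn rather
than jmpdefer.

package main

import "fmt"

func run() {
    defer func() {
        if r := recover(); r != nil {
            fmt.Println("recovered:", r) // nil pointer dereference from calling f
        }
    }()
    var f func()
    defer f() // legal at the defer statement; panics when f is invoked at return
}

func main() {
    run()
}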
Fixes #36050
Change-Id: Ie25e6f015d8eb170b40248dedeb26a37b7f9b38d
Reviewed-on: https://go-review.googlesource.com/c/go/+/210978
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Prior to this change, if the heap was very discontiguous (such as in
TestArenaCollision) it's possible we could map a large amount of memory
as R/W and commit it. We would use only the start and end to track what
should be mapped, and we would extend that mapping as needed to
accommodate a potentially fragmented address space.
After this change, we only map exactly the part of the summary arrays
that we need by using the inUse ranges from the previous change. This
reduces the GCSys footprint of TestArenaCollision from 300 MiB to 18
MiB.
Because summaries are no longer mapped contiguously, this means the
scavenger can no longer iterate directly. This change also updates the
scavenger to borrow ranges out of inUse and iterate over only the
parts of the heap which are actually currently in use. This is both an
optimization and necessary for correctness.
Fixes #35514.
Change-Id: I96bf0c73ed0d2d89a00202ece7b9d089a53bac90
Reviewed-on: https://go-review.googlesource.com/c/go/+/207758
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change adds a new inUse field to the allocator which tracks ranges
of addresses that are owned by the heap. It is updated on each heap
growth.
These ranges are tracked in an array which is kept sorted. In practice
this array shouldn't exceed its initial allocation except in rare cases
and thus should be small (ideally exactly 1 element in size).
In a hypothetical worst-case scenario wherein we have a 1 TiB heap and 4
MiB arenas (note that the address ranges will never be at a smaller
granularity than an arena, since arenas are always allocated
contiguously), inUse would use at most 4 MiB of memory if the heap
mappings were completely discontiguous (highly unlikely) with an
additional 2 MiB leaked from previous allocations. Furthermore, the
copies that are done to keep the inUse array sorted will copy at most 4
MiB of memory in such a scenario, which, assuming a conservative copying
rate of 5 GiB/s, amounts to about 800µs.
However, note that in practice:
1) Most 64-bit platforms have 64 MiB arenas.
2) The copies should incur little-to-no page faults, meaning a copy rate
closer to 25-50 GiB/s is expected.
3) Go heaps are almost always mostly contiguous.
Updates #35514.
Change-Id: I3ad07f1c2b5b9340acf59ecc3b9ae09e884814fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/207757
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
The syscall_forkx function returns the value of errno even on success.
This can be a problem when using cgo, where an atfork handler might be
registered; if the atfork handler does something which causes errno to
be set, the caller of syscall_forkx can be misled into thinking the
fork has failed. This causes the various exec functions in the runtime
package to hang.
Change-Id: Ia1842179226078a0cbbea33d541aa1187dc47f68
GitHub-Last-Rev: 4dc4db75c8
GitHub-Pull-Request: golang/go#36076
Reviewed-on: https://go-review.googlesource.com/c/go/+/210742
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Support for these was added in CL 189577.
Change-Id: Iaf2a774b141995cbbdfb3888aea67ae9c7f928b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/210677
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
After golang.org/cl/210124, I wondered if the same error had gone
unnoticed elsewhere. I quickly spotted another dozen mistakes after
reading through the output of:
git grep '\<[Aa]n [bcdfgjklmnpqrtvwyz][a-z]'
Many results are false positives for acronyms like "an mtime", since
it's pronounced "an em-time". However, the total amount of output isn't
that large given how simple the grep pattern is.
Change-Id: Iaa2ca69e42f4587a9e3137d6c5ed758887906ca6
Reviewed-on: https://go-review.googlesource.com/c/go/+/210678
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Zach Jones <zachj1@gmail.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
AIX doesn't allow mmapping an already mmapped address. The previous way
to deal with this behavior was to munmap before calling mmap again.
However, the mprotect syscall is able to change protections on a memory
range. Thus, memory mapped by sysReserve can be remapped using it. Note
that sysMap is always called with a non-nil pointer so mprotect is
always possible.
Updates #35451
Change-Id: I1fd1e1363d9ed9eb5a8aa7c8242549bd6dad8cd0
Reviewed-on: https://go-review.googlesource.com/c/go/+/207237
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Some Linux distributions will continue to provide 5.3.x kernels for a
while rather than 5.4.x.
Updates #35777
Change-Id: I493ef8338d94475f4fb1402ffb9040152832b0fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/210299
Reviewed-by: Austin Clements <austin@google.com>
CL 209899 worked around an issue that corrupts vector registers in
recent versions of the Linux kernel by mlocking the top page of every
signal stack on amd64. However, the underlying issue also affects the
XMM registers on 386. This CL applies the mlock fix to both amd64 and
386.
Fixes #35777 (again).
Change-Id: I9886f2dc4c23625421296bd5518d5fd3288bfe48
Reviewed-on: https://go-review.googlesource.com/c/go/+/210345
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently, we're ignoring failures to mlock signal stacks in the
workaround for #35777. This means if your mlock limit is low, you'll
instead get random memory corruption, which seems like the wrong
trade-off.
This CL checks for mlock failures and panics with useful guidance.
Updates #35777.
Change-Id: I15f02d3a1fceade79f6ca717500ca5b86d5bd570
Reviewed-on: https://go-review.googlesource.com/c/go/+/210098
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Give the runtime more of a chance to do other work in a tight loop.
Fixes #34693
Change-Id: I8df6173d2c93ecaccecf4520a6913b495787df78
Reviewed-on: https://go-review.googlesource.com/c/go/+/210217
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently, a block's control instruction gets the liveness info
of the last Value in the block. However, for an empty block, the
control instruction gets the invalid liveness info and is therefore
not preemptible. One example is an empty infinite loop, which has
only a control instruction. The control instruction being non-
preemptible makes the whole loop non-preemptible.
Fix this by using a different, preemptible liveness info for
empty block's control. We can choose an arbitrary preemptible
liveness info, as at run time we don't really use the liveness
map at that instruction.
As before, if the last Value in the block is non-preemptible, so
is the block control. For example, the conditional branch in the
write barrier test block is still non-preemptible.
Also, only update liveness info if we are actually emitting
instructions. So zero-width Values' liveness info (which are
always invalid) won't affect the block control's liveness info.
For example, if the last Values in a block are a tuple-generating
operation and a Select, the block control instruction is still
preemptible.
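A minimal example of the construct in question (hedged: whether the loop is
actually preempted also depends on asynchronous preemption being supported on
the platform): an empty infinite loop whose block contains only the control
instruction.

package main

import (
    "runtime"
    "time"
)

func main() {
    runtime.GOMAXPROCS(1)
    go func() {
        for {
        } // empty block: no Values, just a control instruction
    }()
    time.Sleep(10 * time.Millisecond)
    println("the spinning goroutine was preempted, so main ran again")
}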
Fixes #35923.
Change-Id: Ic5225f3254b07e4955f7905329b544515907642b
Reviewed-on: https://go-review.googlesource.com/c/go/+/209659
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Linux 5.2 introduced a bug that can corrupt vector registers on return
from a signal if the signal stack isn't faulted in:
https://bugzilla.kernel.org/show_bug.cgi?id=205663
This CL works around this by mlocking the top page of all Go signal
stacks on the affected kernels.
Fixes #35326, #35777
Change-Id: I77c80a2baa4780827633f92f464486caa222295d
Reviewed-on: https://go-review.googlesource.com/c/go/+/209899
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This will be used to parse the Linux kernel versions, but this code is
generic and can be tested on its own.
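A hedged sketch of what such a parser might look like (the function name and
exact semantics are assumptions for illustration, not the CL's code): pull
major.minor.patch out of a uname release string such as "5.3.0-42-generic".

package main

import "fmt"

func parseRelease(rel string) (major, minor, patch int, ok bool) {
    next := func(i int) (int, int) {
        n := 0
        for ; i < len(rel) && rel[i] >= '0' && rel[i] <= '9'; i++ {
            n = n*10 + int(rel[i]-'0')
        }
        return n, i
    }
    var i int
    major, i = next(0)
    if i >= len(rel) || rel[i] != '.' {
        return major, 0, 0, i > 0
    }
    minor, i = next(i + 1)
    if i >= len(rel) || rel[i] != '.' {
        return major, minor, 0, true
    }
    patch, _ = next(i + 1)
    return major, minor, patch, true
}

func main() {
    fmt.Println(parseRelease("5.3.0-42-generic")) // 5 3 0 true
}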
For #35777.
Change-Id: If1df48d07250e5855dde45bc9d57c66f777b9fb4
Reviewed-on: https://go-review.googlesource.com/c/go/+/209597
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Currently the page allocator bitmap is implemented as a single giant
memory mapping which is reserved at init time and committed as needed.
This causes problems on systems that don't handle large uncommitted
mappings well, or institute low virtual address space defaults as a
memory limiting mechanism.
This change modifies the implementation of the page allocator bitmap
away from a directly-mapped set of bytes to a sparse array in the same vein
as mheap.arenas. This will hurt performance a little but the biggest
gains are from the lockless allocation possible with the page allocator,
so the impact of this extra layer of indirection should be minimal.
In fact, this is exactly what we see:
https://perf.golang.org/search?q=upload:20191125.5
This reduces the amount of mapped (PROT_NONE) memory needed on systems
with 48-bit address spaces to ~600 MiB down from almost 9 GiB. The bulk
of this remaining memory is used by the summaries.
Go processes with 32-bit address spaces now always commit to 128 KiB of
memory for the bitmap. Previously it would only commit the pages in the
bitmap which represented the range of addresses (lowest address to
highest address, even if there are unused regions in that range) used by
the heap.
Updates #35568.
Updates #35451.
Change-Id: I0ff10380156568642b80c366001eefd0a4e6c762
Reviewed-on: https://go-review.googlesource.com/c/go/+/207497
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
We were using the race context of the P that held the timer,
but since we unlock the P's timers while executing a timer
that could lead to a race on the race context itself.
Updates #6239
Updates #27707
Fixes#35906
Change-Id: I5f9d5f52d8e28dffb88c3327301071b16ed1a913
Reviewed-on: https://go-review.googlesource.com/c/go/+/209580
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Plan 9 doesn't have a way to reserve virtual memory, so the
implementation of sysReserve allocates memory space (which won't
be backed with real pages until the virtual pages are referenced).
If the space is then freed with sysFree, it's not returned to
the OS (because Plan 9 doesn't allow shrinking a shared address
space), but it must be cleared to zeroes in case it's reallocated
subsequently.
This interacts badly with the way mallocinit on 64-bit machines
sets up the heap, calling sysReserve repeatedly for a very large
(64MB?) arena with a non-nil address hint, and then freeing the space
again because it doesn't have the expected alignment. The
repeated clearing of multiple megabytes adds significant startup
time to every go program.
We correct this by restricting sysReserve to allocate memory only
when the caller doesn't provide an address hint. If a hint is
provided, sysReserve will now return nil instead of allocating memory
at a different address.
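A rough sketch of the new behavior, with hypothetical helper names (this
is not the runtime's actual Plan 9 code):

package sketch

import "unsafe"

// sysReserveSketch declines hinted reservations instead of allocating
// memory at some other address that the caller would immediately free
// (and that a later reallocation would then have to re-zero).
func sysReserveSketch(v unsafe.Pointer, n uintptr, grow func(uintptr) unsafe.Pointer) unsafe.Pointer {
	if v != nil {
		return nil // hint given: don't allocate at a different address
	}
	return grow(n) // no hint: allocate address space as before
}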
Fixes#27744
Change-Id: Iae5a950adefe4274c4bc64dd9c740d19afe4ed1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/207917
Run-TryBot: David du Colombier <0intro@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David du Colombier <0intro@gmail.com>
This change makes it so that waking up the scavenger readies its
goroutine without "next" set, so that it doesn't interfere with the
application's use of the runnext feature in the scheduler which helps
fairness.
As of CL 201763 the scavenger began waking up much more often, and in
TestPingPongHog this meant that it would sometimes supersede either a
hog or light goroutine in runnext, leading to a skew in the results and
ultimately a test flake.
This change thus re-enables the TestPingPongHog test on the builders.
Fixes#35271.
Change-Id: Iace08576912e8940554dd7de6447e458ad0d201d
Reviewed-on: https://go-review.googlesource.com/c/go/+/208380
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Currently scavengeAll (which is called by debug.FreeOSMemory) doesn't
reset the scavenge address before scavenging, meaning it could miss
large portions of the heap. Fix this by resetting the address before
scavenging, which will ensure it is able to walk over the entire heap.
Fixes#35858.
Change-Id: I4a7408050b8e134318ff94428f98cb96a1795aa9
Reviewed-on: https://go-review.googlesource.com/c/go/+/208960
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Print the current SP and (old) stack bounds when the stack grows
too large. This helps to identify the problem: whether a large
stack is used, or something else goes wrong.
For #35470.
Change-Id: I34a4064d5c7280978391d835e171b90d06f87222
Reviewed-on: https://go-review.googlesource.com/c/go/+/207351
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Currently we use stack map index -2 to mark unsafe points, i.e.
PC ranges that are not safe for async preemption. This has a
problem: it cannot mark CALL instructions, because for stack scan
a valid stack map index is needed.
This CL switches to use register map index for marking unsafe
points instead, which does not conflict with stack scan and can
be applied on CALL instructions. This is necessary as next CL
will mark call to morestack nonpreemptible.
For #35470.
Change-Id: I357bf26c996e1fee1e7eebe4e6bb07d62930d3f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/207349
Reviewed-by: David Chase <drchase@google.com>
On darwin, we use libc calls, and cgo is required on ARM and
ARM64 so we have TLS set up to save/restore G during C calls. If
cgo is absent, we cannot save/restore G in TLS, and if a signal
is received during C execution we cannot get the G. Therefore
don't send signals (and hope that we won't receive any signal
during C execution).
This can only happen in the go_bootstrap program (otherwise cgo
is required).
Fixes#35800.
Change-Id: I6c02a9378af02c19d32749a42db45165b578188d
Reviewed-on: https://go-review.googlesource.com/c/go/+/208818
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
cgo_mmap.go:mmap() is called by mem_linux.go:sysAlloc(), a low-level memory
allocation function. mmap() should be nosplit, since it is called in a lot of
low-level parts of the runtime and callers often assume it won't acquire any
locks.
As an example there is a potential deadlock involving two threads if mmap is not nosplit:
trace.bufLock acquired, then stackpool[order].item.mu, then mheap_.lock
- can happen for traceEvents that are not invoked on the system stack and cause
a traceFlush, which causes a sysAlloc, which calls mmap(), which may cause a
stack split.
mheap_.lock acquired, then trace.bufLock
- can happen when doing a trace in reclaimChunk (which holds the mheap_ lock)
Also, sysAlloc() has a comment that it is nosplit because it may be invoked
without a valid G, in which case its callee mmap() should also be nosplit.
Similarly, sys_darwin.go:mmap() is called by mem_darwin.go:sysAlloc(), and should
be nosplit for the same reasons.
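The mechanism involved is the //go:nosplit directive. A hedged
illustration of the property being relied on (the function below is a
stand-in, not the runtime's mmap):

package sketch

// A nosplit function must not grow its stack, so calling it can never
// trigger stack growth and therefore can never take the locks that a
// stack split would need. That is exactly the assumption low-level
// callers like sysAlloc make about mmap.
//
//go:nosplit
func lowLevelCall(addr, n uintptr) uintptr {
	return addr + n
}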
Extra gomote testing: linux/arm64, darwin/amd64
Change-Id: Ia4d10cec5cf1e186a0fe5aab2858c6e0e5b80fdc
Reviewed-on: https://go-review.googlesource.com/c/go/+/207844
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Because of concurrent goroutines it is possible for multiple event
handlers to return at the same time. This was not properly supported
and caused the wrong goroutine to continue, which in turn caused
memory corruption.
This change adds a stack of events so it is always clear which is the
innermost event that needs to return next.
Fixes#35256
Change-Id: Ia527da3b91673bc14e84174cdc407f5c9d5a3d09
Reviewed-on: https://go-review.googlesource.com/c/go/+/204662
Run-TryBot: Richard Musiol <neelance@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
After CL 182657 we no longer hold worldsema across the GC, we hold
gcsema instead.
However in STW GC mode we don't release worldsema before calling Gosched
on the user goroutine (note that user goroutines are disabled during STW
GC), so the user goroutine holds onto it. When the GC is done and the
runtime inevitably wants to "stop the world" again (though there isn't
much to stop) it'll sit there waiting for worldsema which won't be
released until the aforementioned goroutine is scheduled, which it won't
be until the GC is done!
So, we have a deadlock.
The fix is easy: just release worldsema before calling Gosched.
Fixes#34736.
Change-Id: Ia50db22ebed3176114e7e60a7edaf82f8535c1b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/208379
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TestPhysicalMemoryUtilization occasionally fails on some platforms by
only a small margin. The reason for this is that it assumes the
scavenger will always be able to scavenge all the memory that's released
by sweeping, but because of the page cache, there could be free and
unscavenged memory held onto by a P which the scavenger simply cannot
get to.
As a result, if the page cache gets filled completely (512 KiB of free
and unscavenged memory) this could skew a test which expects to
scavenge roughly 8 MiB of memory. More specifically, this is 512 KiB of
memory per P, and if a system is more inclined to bounce around
between Ps (even if there's only one goroutine), this memory can get
"stuck".
Through some experimentation, I found that failures correlated highly
with relatively large amounts of memory ending up in some page cache
(like 60 or 64 pages) on at least one P.
This change changes the test's threshold such that it accounts for the
page cache, and scales up with GOMAXPROCS. Because the test constants
themselves don't change, however, the test must now also bound
GOMAXPROCS such that the threshold doesn't get too high (at which point
the test becomes meaningless).
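As rough arithmetic (using the 512 KiB-per-P figure above; names are
illustrative, not the test's actual code), the worst-case memory the
scavenger cannot reach scales with GOMAXPROCS:

package sketch

// Each P's page cache can hold up to 64 pages (512 KiB) of free,
// unscavenged memory that the scavenger cannot get to.
const pageCachePerP = 512 << 10 // bytes

// maxStuck is the worst-case amount of memory stuck in page caches,
// which the test's threshold now leaves room for.
func maxStuck(gomaxprocs int) int {
	return gomaxprocs * pageCachePerP
}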
Fixes#35580.
Change-Id: I6bdb70706de991966a9d28347da830be4a19d3a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/208377
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The profile proto message builder maintains a location entry cache
that maps a location (possibly involving multiple user frames
that represent inlined function calls) to the location id. We have
been using the first pc of the inlined call sequence as the key of
the cached location entry assuming that, for a given pc, the sequence
of frames representing the inlined call stack is deterministic and
stable. Then, when analyzing the new stack trace, we expected the
exact number of pcs to be present in the captured stack trace upon
the cache hit.
This assumption does not hold, however, in the presence of the stack
trace truncation in the runtime during profiling, and also with the
potential bugs in runtime.
A better fix is to use all the pcs of the inlined call sequence as
the key instead of the first pc. But that is a bigger code change.
This CL avoids the crash assuming the trace was truncated.
Fixes#35538
Change-Id: I8c6bae98bc8b178ee51523c7316f56b1cce6df16
Reviewed-on: https://go-review.googlesource.com/c/go/+/207609
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
In TestAsyncPreempt, the function being tested for preemption,
although still asynchronously preemptible, may have only small
ranges of PCs that are preemptible. In an unlucky run, it may
take quite a while for a signal to land on a preemptible
instruction. The test case is kind of extreme. Relax it to
make it more preemptible.
In the original version, the first closure has more work to do,
and it is not a leaf function, and the second test case is a
frameless leaf function. In the current version, the first one
is also a frameless leaf function (the atomic is intrinsified).
Add some calls to it. It is still not preemptible without async
preemption.
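For reference, a hedged illustration (not the test's exact code) of the
shape of loop involved: a tight loop around an intrinsified atomic is a
frameless leaf with very few preemptible PCs, and adding ordinary
(non-inlined) calls gives the signal more instructions to land on while
still requiring async preemption:

package sketch

import "sync/atomic"

var stop uint32

//go:noinline
func doWork(x int) int { return x + 1 } // a real call site inside the loop

func spinUntilStopped() {
	n := 0
	for atomic.LoadUint32(&stop) == 0 {
		n = doWork(n)
	}
	_ = n
}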
Fixes#35608.
Change-Id: Ia4f857f2afc55501c6568d7507b517e3b4db191c
Reviewed-on: https://go-review.googlesource.com/c/go/+/208221
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This implements preemptM on Windows using SuspendThread and
ResumeThread.
Unlike on POSIX platforms, preemptM on Windows happens synchronously.
This means we need to make a few other tweaks to suspendG:
1. We need to CAS the G back to _Grunning before doing the preemptM,
or there's a good chance we'll just catch the G spinning on its
status in the runtime, which won't be preemptible.
2. We need to rate-limit preemptM attempts. Otherwise, if the first
attempt catches the G at a non-preemptible point, the busy loop in
suspendG may hammer it so hard that it never makes it past that
non-preemptible point.
Updates #10958, #24543.
Change-Id: Ie53b098811096f7e45d864afd292dc9e999ce226
Reviewed-on: https://go-review.googlesource.com/c/go/+/204340
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
On Windows, there is currently a race between unminit closing the
thread's handle and profileloop1 suspending the thread using its
handle. If another handle reuses the same handle value, this can lead
to unpredictable results.
To fix this, we protect the thread handle with a lock and duplicate it
under this lock in profileloop1 before using it.
This is going to become a much bigger problem with non-cooperative
preemption (#10958, #24543), which uses the same basic mechanism as
profileloop1.
Change-Id: I9d62b83051df8c03f3363344438e37781a69ce16
Reviewed-on: https://go-review.googlesource.com/c/go/+/207779
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
This field is only used on Windows.
Change-Id: I12d4df09261f8e7ad54c2abd7beda669af28c8e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/207778
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Since the introduction of the new page allocator, GDB on AIX has trouble
running Go programs. It does work, but it can be really slow. Therefore,
the GDB tests are disabled when tests are run with -short.
Updates: #35710
Change-Id: Ibfc4bd2cd9714268f1fe172aaf32a73612e262d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/207919
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Dan Scales pointed out a theoretical deadlock in the runtime.
The timer code runs timer functions while holding the timers lock for a P.
The scavenger queues up a timer function that calls wakeScavenger,
which acquires the scavenger lock.
The scavengeSleep function acquires the scavenger lock,
then calls resetTimer which can call addInitializedTimer
which acquires the timers lock for the current P.
So there is a potential deadlock, in that the scavenger lock and
the timers lock for some P may both be acquired in different order.
It's not clear to me whether this deadlock can ever actually occur.
Issue 35532 describes another possible deadlock.
The pollSetDeadline function acquires pd.lock for some poll descriptor,
and in some cases calls resettimer which can in some cases acquire
the timers lock for the current P.
The timer code runs timer functions while holding the timers lock for a P.
The timer function for poll descriptors winds up in netpolldeadlineimpl
which acquires pd.lock.
So again there is a potential deadlock, in that the pd lock for some
poll descriptor and the timers lock for some P may both be acquired in
different order. I think this can happen if we change the deadline
for a network connection exactly as the former deadline expires.
Looking at the code, I don't see any reason why we have to hold
the timers lock while running a timer function.
This CL implements that change.
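The hazard is a classic lock-order inversion. A generic sketch
(placeholder names, not runtime types) of why taking the two locks in
different orders can deadlock:

package sketch

import "sync"

var timersLock, scavengerLock sync.Mutex

// Path one: a timer function fires with the timers lock held and then
// needs the scavenger lock (as wakeScavenger does).
func timerFires() {
	timersLock.Lock()
	defer timersLock.Unlock()
	scavengerLock.Lock()
	defer scavengerLock.Unlock()
}

// Path two: scavengeSleep holds the scavenger lock and then resets a
// timer, which needs the timers lock. If both paths run at once, each
// can end up waiting for the lock the other holds.
func scavengeSleeps() {
	scavengerLock.Lock()
	defer scavengerLock.Unlock()
	timersLock.Lock()
	defer timersLock.Unlock()
}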
Updates #6239
Updates #27707
Fixes#35532
Change-Id: I17792f5a0120e01ea07cf1b2de8434d5c10704dd
Reviewed-on: https://go-review.googlesource.com/c/go/+/207348
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
When initializing an M, we set up its signal stack to the gsignal
stack if an alternate signal stack is not already set. On Android,
an alternate signal stack is always set, even if cgo is not used.
This breaks the logic of saving/fetching G on the signal stack
during VDSO, which assumes the signal stack is allocated by Go if
cgo is not used (if cgo is used, we use TLS for saving G).
When cgo is not used, we can always use the Go signal stack, even
if an alternate signal stack is already set. Since cgo is not
used, no one other than the Go runtime will care.
Fixes#35554.
Change-Id: Ia9d84cd55cb35097f3df46f37996589c86f10e0f
Reviewed-on: https://go-review.googlesource.com/c/go/+/207445
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
CL 206078 introduced a stray errno check that was always false. This CL removes it.
Updates #35276
Change-Id: I6996bb595d347fe81752786a3d83d3432735c9cb
GitHub-Last-Rev: e026e71b16
GitHub-Pull-Request: golang/go#35650
Reviewed-on: https://go-review.googlesource.com/c/go/+/207577
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
On AIX, addresses returned by mmap are between 0x0a00000000000000
and 0x0afffffffffffff. The previous solution to handle these large
addresses was to increase the arena size up to 60 bits addresses,
cf CL 138736.
However, with the new page allocator, the 60-bit heap addresses cause
huge memory allocations, especially by (s *pageAlloc).init. The mmap
and munmap syscalls dealing with these allocations reduce the
performance of every Go program.
In order to avoid these allocations, arenaBaseOffset is set to
0x0a00000000000000 and heap addresses use 48 bits, as on other operating
systems.
Updates: #35451
Change-Id: Ice916b8578f76703428ec12a82024147a7592bc0
Reviewed-on: https://go-review.googlesource.com/c/go/+/206841
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
In scavengeOne's fast path, we currently don't check the summary for the
chunk that scavAddr points to, which means that we might accidentally
scavenge unused address space if the previous scavenge moves the
scavAddr into that space. The result of this today is a crash.
This change makes it so that scavengeOne's fast path only happens after
the check, following the comment in mpagealloc.go. It also adds a test
for this case.
Fixes#35465.
Updates #35112.
Change-Id: I861d44ee75e42a0e1f5aaec243bc449228273903
Reviewed-on: https://go-review.googlesource.com/c/go/+/206978
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
When we receive a signal, if G is nil we call badsignal, which
calls needm. When cgo is not used, there is no extra M, so needm
will just hang. In this situation, even GOTRACEBACK=crash cannot
get a stack trace, as we're in the signal handler and cannot
receive another signal (SIGQUIT).
Instead, just crash.
For #35554.
Updates #34391.
Change-Id: I061ac43fc0ac480435c050083096d126b149d21f
Reviewed-on: https://go-review.googlesource.com/c/go/+/206959
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
tryAdd shouldn't succeed (and accept the new frame) if the last
existing frame on the deck is not an inlined frame.
For example, when we see the following stack
[300656 300664 300655 300664]
with each PC corresponding to
[{PC:300656 Func:nil Function:runtime.nanotime File:/workdir/go/src/runtime/time_nofake.go Line:19 Entry:300416 {0x28dac8 0x386c80}}]
[{PC:300664 Func:0x28dac8 Function:runtime.checkTimers File:/workdir/go/src/runtime/proc.go Line:2623 Entry:300416 {0x28dac8 0x386c80}}]
[{PC:300655 Func:nil Function:runtime.nanotime File:/workdir/go/src/runtime/time_nofake.go Line:19 Entry:300416 {0x28dac8 0x386c80}}]
[{PC:300664 Func:0x28dac8 Function:runtime.checkTimers File:/workdir/go/src/runtime/proc.go Line:2623 Entry:300416 {0x28dac8 0x386c80}}]
PC:300656 and PC:300664 belong to a single location entry,
but the bug in the current tryAdd logic placed the entire stack into one
location entry.
Also adds tests - this crash is a tricky case to reproduce with normal
Go code, so the new TestTryAdd simulates it by using fake call
sequences. The test crashed without the fix.
Update #35538
Change-Id: I6d3483f757abf4c429ab91616e4def90832fc04a
Reviewed-on: https://go-review.googlesource.com/c/go/+/206958
Reviewed-by: Keith Randall <khr@golang.org>
In the discussion of CL 171828 we decided that it was not necessary to
acquire timersLock around the call to moveTimers, because the world is
stopped. However, that is not correct, as sysmon runs even when the world
is stopped, and it calls timeSleepUntil which looks through the timers.
timeSleepUntil acquires timersLock, but that doesn't help if moveTimers
is running at the same time.
Updates #6239
Updates #27707
Updates #35462
Change-Id: I346c5bde594c4aff9955ae430b37c2b6fc71567f
Reviewed-on: https://go-review.googlesource.com/c/go/+/206938
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
The problem should be fixed by the previous CL. Reenable async
preemption on darwin/arm64.
Updates #35439.
Change-Id: I93e8c4702b4d8fe6abaa6fc9c27def5c8aed1b59
Reviewed-on: https://go-review.googlesource.com/c/go/+/206419
Reviewed-by: Keith Randall <khr@golang.org>
Some, but not all, architectures mix in OS-provided random seeds when
initializing the fastrand state. The others have TODOs saying we need
to do the same. Lift that logic up in the architecture-independent
part, and use memhash to mix the seed instead of a simple addition.
Previously, dumping the fastrand state at initialization would yield
something like the following on linux-amd64, where the values in the
first column do not change between runs (as thread IDs are sequential
and always start at 0), and the values in the second column, while
changing every run, are pretty correlated:
first run:
0x0 0x44d82f1c
0x5f356495 0x44f339de
0xbe6ac92a 0x44f91cd8
0x1da02dbf 0x44fd91bc
0x7cd59254 0x44fee8a4
0xdc0af6e9 0x4547a1e0
0x3b405b7e 0x474c76fc
0x9a75c013 0x475309dc
0xf9ab24a8 0x4bffd075
second run:
0x0 0xa63fc3eb
0x5f356495 0xa6648dc2
0xbe6ac92a 0xa66c1c59
0x1da02dbf 0xa671bce8
0x7cd59254 0xa70e8287
0xdc0af6e9 0xa7129d2e
0x3b405b7e 0xa7379e2d
0x9a75c013 0xa7e4c64c
0xf9ab24a8 0xa7ecce07
With this change, we get initial states that appear to be much more
unpredictable, both within the same run as well as between runs:
0x11bddad7 0x97241c63
0x553dacc6 0x2bcd8523
0x62c01085 0x16413d92
0x6f40e9e6 0x7a138de6
0xa4898053 0x70d816f0
0x5ca5b433 0x188a395b
0x62778ca9 0xd462c3b5
0xd6e160e4 0xac9b4bd
0xb9571d65 0x597a981d
Change-Id: Ib22c530157d74200df0083f830e0408fd4aaea58
Reviewed-on: https://go-review.googlesource.com/c/go/+/203439
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
At least on Darwin notewakeup is not async-signal-safe.
Fixes#35276
Change-Id: I1d7523715e8e77dbd7f21d9b1ed131e52d46cc41
Reviewed-on: https://go-review.googlesource.com/c/go/+/206078
Reviewed-by: Austin Clements <austin@google.com>
Before this CL, if max > min and max was unaligned to min, then the
function could return an unaligned (unaligned to min) region to
scavenge. On most platforms, this leads to some kind of crash.
Fix this by explicitly aligning max to the next multiple of min.
Fixes#35445.
Updates #35112.
Change-Id: I0af42d4a307b48a97e47ed152c619d77b0298291
Reviewed-on: https://go-review.googlesource.com/c/go/+/206277
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Intrinsify these functions to match other platforms. Update the
sequence of instructions used in the assembly implementations to
match the intrinsics.
Also, add a micro benchmark so we can more easily measure the
performance of these two functions:
name old time/op new time/op delta
And8-8 5.33ns ± 7% 2.55ns ± 8% -52.12% (p=0.000 n=20+20)
And8Parallel-8 7.39ns ± 5% 3.74ns ± 4% -49.34% (p=0.000 n=20+20)
Or8-8 4.84ns ±15% 2.64ns ±11% -45.50% (p=0.000 n=20+20)
Or8Parallel-8 7.27ns ± 3% 3.84ns ± 4% -47.10% (p=0.000 n=19+20)
By using a 'rotate then xor selected bits' instruction combined with
either a 'load and and' or a 'load and or' instruction we can
implement And8 and Or8 with far fewer instructions. Replacing
'compare and swap' with atomic instructions may also improve
performance when there is contention.
Change-Id: I28bb8032052b73ae8ccdf6e4c612d2877085fa01
Reviewed-on: https://go-review.googlesource.com/c/go/+/204277
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
On some platforms (currently ARM and ARM64), when calling into
VDSO we store the G to the gsignal stack, if there is one, so if
we receive a signal during VDSO we can find the G.
If we receive a signal during VDSO, and within the signal handler
we call nanotime again (e.g. when handling profiling signal),
we'll save/clear the G slot on the gsignal stack again, which
clobbers the original saved G. If we receive a second signal
during the same VDSO execution, we will fetch a nil G, which will
lead to bad things such as deadlock.
Don't save G if we're calling VDSO code from the gsignal stack.
Saving G is not necessary as we won't receive a nested signal.
Fixes#35473.
Change-Id: Ibfd8587a3c70c2f1533908b056e81b94d75d65a5
Reviewed-on: https://go-review.googlesource.com/c/go/+/206397
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
This change makes the test addresses start at 1 GiB instead of 2 GiB to
support mips and mipsle, which only have 31-bit address spaces.
It also changes some tests to use smaller offsets for the chunk index to
avoid jumping too far ahead in the address space to support 31-bit
address spaces. The tests don't require such large jumps for what
they're testing anyway.
Updates #35112.
Fixes#35440.
Change-Id: Ic68ff2b0a1f10ef37ac00d4bb5b910ddcdc76f2e
Reviewed-on: https://go-review.googlesource.com/c/go/+/205938
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
When we have already assigned the semaphore ticket to a specific
waiter, we want to get the waiter running as fast as possible since
no other G waiting on the semaphore can acquire it optimistically.
The net effect is that, when a sync.Mutex is contended, the code in
the critical section guarded by the Mutex gets a priority boost.
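The effect is easiest to picture with an ordinary contended sync.Mutex;
a minimal, illustrative usage sketch (not code from the CL):

package sketch

import "sync"

var (
	mu      sync.Mutex
	counter int
)

// With many goroutines hammering mu, the waiter that has already been
// handed the semaphore ticket is now run as soon as possible, so the
// critical section below effectively gets a scheduling boost relative
// to the other waiters.
func increment() {
	mu.Lock()
	counter++
	mu.Unlock()
}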
Fixes#33747
The original work was done in CL 200577 by Carlo Alberto Ferraris. The
change was reverted in CL 205817 because it broke the linux-arm64-packet
and solaris-amd64-oraclerel builders.
Change-Id: I76d79b1d63fd206ed1c57fe6900cb7ae9e4d46cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/206180
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
CL 201765 activated calls from the runtime to functions in math/bits.
When coverage and race detection were simultaneously enabled,
this caused a crash when the covered+race-checked code in
math/bits was called from the runtime before there was even a P.
PS Win for gdlv in helping sort this out.
TODO - next CL intrinsifies the new functions in
runtime/internal/sys
TODO/Would-be-nice - Ctz64 and TrailingZeros64 are the same
function; 386.s is intrinsified; clean all that up.
Fixes#35461.
Updates #35112.
Change-Id: I750a54dba493130ad3e68a06530ede7687d41e1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/206199
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Unlike function calls, when processing instructions that directly
fault we must not subtract 1 from the pc before looking up the
file/line information.
Since the file/line lookup unconditionally subtracts 1, add 1 to
the faulting instruction PCs to compensate.
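A hedged sketch of the adjustment (not the actual runtime code): the
lookup always subtracts 1 to turn a return address back into its call
instruction, so a PC that is itself the faulting instruction is bumped
by 1 first so the net lookup lands on the right instruction:

package sketch

// adjustPC returns the PC to feed into a file/line lookup that
// unconditionally subtracts 1. Return addresses want that subtraction;
// faulting instruction PCs do not, so we pre-compensate.
func adjustPC(pc uintptr, isFaultingInstruction bool) uintptr {
	if isFaultingInstruction {
		pc++
	}
	return pc
}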
Fixes#34123
Change-Id: Ie7361e3d2f84a0d4f48d97e5a9e74f6291ba7a8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/196962
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
This adds pipe/pipe2 on Solaris as they exist on other Unix systems.
They were not added previously because Solaris does not need them
for netpollBreak. They are added now in preparation for using pipes
in TestSignalM.
Updates #35276
Change-Id: I53dfdf077430153155f0a79715af98b0972a841c
Reviewed-on: https://go-review.googlesource.com/c/go/+/206077
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Without this CL, one of the TestDebugCall tests would fail 1% to 2% of
the time on the android-amd64-emu gomote. With this CL, I ran the
tests for 1000 iterations with no failures.
Fixes#32985
Change-Id: I541268a2a0c10d0cd7604f0b2dbd15c1d18e5730
Reviewed-on: https://go-review.googlesource.com/c/go/+/205248
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
This change adds a per-p free page cache which the page allocator may
allocate out of without a lock. The change also introduces a completely
lockless page allocator fast path.
Although the cache contains at most 64 pages (and usually less), the
vast majority (85%+) of page allocations are exactly 1 page in size.
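A hedged sketch of the idea (field names, the page size, and the exact
layout are assumptions for illustration): the cache is an aligned
64-page window of the address space plus a bitmap of which of those
pages are free, so a one-page allocation is a couple of bit operations
with no lock:

package sketch

import "math/bits"

const pageSize = 8 << 10 // assumed page size for illustration

type pageCache struct {
	base  uintptr // base address of an aligned 64-page window
	cache uint64  // bitmap: set bits are free pages owned by this P
}

// allocPage hands out one free page without taking any lock, or
// returns 0 if the cache is empty and must be refilled from the
// page allocator.
func (c *pageCache) allocPage() uintptr {
	if c.cache == 0 {
		return 0
	}
	i := uintptr(bits.TrailingZeros64(c.cache))
	c.cache &^= 1 << i
	return c.base + i*pageSize
}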
Updates #35112.
Change-Id: I170bf0a9375873e7e3230845eb1df7e5cf741b78
Reviewed-on: https://go-review.googlesource.com/c/go/+/195701
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
This change adds a page cache structure which owns a chunk of free pages
at a given base address. It also adds code to allocate to this cache
from the page allocator. Finally, it adds tests for both.
Notably this change does not yet integrate the code into the runtime,
just into runtime tests.
Updates #35112.
Change-Id: Ibe121498d5c3be40390fab58a3816295601670df
Reviewed-on: https://go-review.googlesource.com/c/go/+/196643
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change adds a per-p mspan object cache similar to the sudog cache.
Unfortunately this cache can't quite operate like the sudog cache, since
it is used in contexts where write barriers are disallowed (i.e.
allocation codepaths), so rather than managing an array and a slice,
it's just an array and a length. A little bit more unsafe, but avoids
any write barriers.
The purpose of this change is to reduce the number of operations which
require the heap lock in allocation, paving the way for a lockless fast
path.
Updates #35112.
Change-Id: I32cfdcd8528fb7be985640e4f3a13cb98ffb7865
Reviewed-on: https://go-review.googlesource.com/c/go/+/196642
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change combines the functionality of allocSpanLocked, allocManual,
and alloc_m into a new method called allocSpan. While these methods'
abstraction boundaries are OK when the heap lock is held throughout,
they start to break down when we want finer-grained locking in the page
allocator.
allocSpan does just that, and only locks the heap when it absolutely has
to. Piggy-backing off of work in previous CLs to make more of span
initialization lockless, this change makes span initialization entirely
lockless as part of the reorganization.
Ultimately this change will enable us to add a lockless fast path to
allocSpan.
Updates #35112.
Change-Id: I99875939d75fb4e958a67ac99e4a7cda44f06864
Reviewed-on: https://go-review.googlesource.com/c/go/+/196641
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Currently gcSweepBuf guarantees that push operations may be performed
concurrently with each other and that block operations may be performed
concurrently with push operations as well.
Unfortunately, this isn't quite true. The existing code allows push
operations to happen concurrently with each other, but block operations
may return blocks with nil entries. The way this can happen is if two
concurrent pushers grab a slot to push to, and the first one (the one
with the earlier slot in the buffer) hasn't yet written its span value
when block is called. The existing code in block only checks if the
very last value in the block is nil, when really an arbitrary number of
the last few values in the block may or may not be nil.
Today, this case can't actually happen because when push operations
happen concurrently during a GC (which is the only time block is
called), they only ever happen during an allocation with the heap lock
held, effectively serializing them. A block operation may happen
concurrently with one of these pushes, but its callers will never see a
nil mspan. Outside of a GC, this isn't a problem because although push
operations from allocations can run concurrently with push operations
from sweeping, block operations will never run.
In essence, the real concurrency guarantees provided by gcSweepBuf are
that block operations may happen concurrently with push operations, but
that push operations may not be concurrent with each other if there are
any block operations.
To fix this, and to prepare for push operations happening without the
heap lock held in a future CL, we update the documentation for block to
correctly state that there may be nil entries in the returned slice.
While we're here, make the mspan writes into the buffer atomic to avoid
a block user racing on a nil check, and document that the user should
load mspan values from the returned slice atomically. Finally, we make
all callers of block adhere to the new rules.
We choose to allow nil values rather than filter them out because the
only caller of block is markrootSpans, and if it catches a nil entry,
then there wasn't anything to mark in there anyway since the span is
just being created.
Updates #35112.
Change-Id: I6450aab15f51690d7a000ba5b3d529cf2ca5da1e
Reviewed-on: https://go-review.googlesource.com/c/go/+/203318
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change makes it so that allocation and free related page sweeper
metadata operations (e.g. pageInUse and pagesInUse) are atomic rather
than protected by the heap lock. This will help in reducing the length
of the critical path with the heap lock held in future changes.
Updates #35112.
Change-Id: Ie82bff024204dd17c4c671af63350a7a41add354
Reviewed-on: https://go-review.googlesource.com/c/go/+/196640
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This CL adds support of call injection and async preemption on
PPC64.
For the injected call to return to the preempted PC, we have to
clobber either LR or CTR. For reasons mentioned in previous CLs,
we choose CTR. Previous CLs have marked code sequences that use
CTR async-nonpreemptible.
Change-Id: Ia642b5f06a890dd52476f45023b2a830c522eee0
Reviewed-on: https://go-review.googlesource.com/c/go/+/203824
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
mheap_.alloc currently accepts both a spanClass and a "large" parameter
indicating whether the allocation is large. These are redundant, since
spanClass.sizeclass() == 0 is an equivalent way to determine this and is
already used in mheap_.alloc. There are no places in the runtime where
the size class could be non-zero and large == true.
Updates #35112.
Change-Id: Ie66facf8f0faca6f4cd3d20a8ac4bc259e11823d
Reviewed-on: https://go-review.googlesource.com/c/go/+/196639
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change defines a maximum supported physical and huge page size in
the runtime based on the new page allocator's implementation, and uses
them where appropriate.
Furthermore, if the system exceeds the maximum supported huge page
size, we simply ignore it silently.
It also fixes a huge-page-related test which is only triggered by a
condition which is definitely wrong.
Finally, it adds a few TODOs related to code clean-up and supporting
larger huge page sizes.
Updates #35112.
Fixes#35431.
Change-Id: Ie4348afb6bf047cce2c1433576d1514720d8230f
Reviewed-on: https://go-review.googlesource.com/c/go/+/205937
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
For the most part, heap memstats are already updated atomically when
passed down to OS-level memory functions (e.g. sysMap). Elsewhere,
however, they're updated with the heap lock.
In order to facilitate holding the heap lock for less time during
allocation paths, this change more consistently makes the update of
these statistics atomic by calling mSysStat{Inc,Dec} appropriately
instead of simply adding or subtracting. It also ensures these values
are loaded atomically.
Furthermore, an undocumented but safe update condition for these
memstats is during STW, at which point using atomics is unnecessary.
This change also documents this condition in mstats.go.
Updates #35112.
Change-Id: I87d0b6c27b98c88099acd2563ea23f8da1239b66
Reviewed-on: https://go-review.googlesource.com/c/go/+/196638
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change removes useless additional heap_objects accounting for large
objects. heap_objects is computed from scratch at ReadMemStats time
(which stops the world) by using nlargealloc and nlargefree, so mutating
heap_objects turns out to be pointless.
As a result, the "large" parameter on "mheap_.freeSpan" is no longer
necessary and so this change cleans that up too.
Change-Id: I7d6b486d9b57c018e3db46221d81b55fe4c1b021
Reviewed-on: https://go-review.googlesource.com/c/go/+/196637
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
In preparation for a lockless fast path in the page allocator, this
change makes it so that checking if an allocation needs to be zeroed may
be done atomically.
Unfortunately, this means there is a CAS-loop to ensure monotonicity of
the zeroedBase value in heapArena. This CAS-loop exits if an allocator
acquiring memory further on in the arena wins or if it succeeds. The
CAS-loop should have a relatively small amount of contention because of
this monotonicity, though it would be ideal if we could just have
CAS-ers with the greatest value always win. The CAS-loop is unnecessary
in the steady-state, but should bring some start-up performance gains as
it's likely cheaper than the additional zeroing required, especially for
large allocations.
For very large allocations that span arenas, the CAS-loop should be
completely uncontended for most of the arenas it touches, it may only
encounter contention on the first and last arena.
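A hedged sketch of the CAS-loop shape described above (not the runtime's
exact code): zeroedBase only ever moves forward, and a CAS-er that
observes a larger value than its own simply stops:

package sketch

import "sync/atomic"

// advanceZeroedBase records that memory up to newBase has now been
// handed out and is therefore no longer known to be zero. It returns
// false if another allocator already advanced past newBase, in which
// case there is nothing left for this caller to do.
func advanceZeroedBase(zeroedBase *uintptr, newBase uintptr) bool {
	for {
		old := atomic.LoadUintptr(zeroedBase)
		if old >= newBase {
			return false
		}
		if atomic.CompareAndSwapUintptr(zeroedBase, old, newBase) {
			return true
		}
	}
}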
Updates #35112.
Change-Id: If3d19198b33f1b1387b71e1ce5902d39a5c0f98e
Reviewed-on: https://go-review.googlesource.com/c/go/+/203859
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This test is failing consistently in the longtest builders,
potentially masking regressions in other packages.
Updates #35271
Change-Id: Idc03171c0109b5c8d4913e0af2078c1115666897
Reviewed-on: https://go-review.googlesource.com/c/go/+/206098
Reviewed-by: Carlos Amedee <carlos@golang.org>
The pprof profile proto message expects inlined functions of a PC
to be encoded in one Location entry using multiple Line entries.
https://github.com/google/pprof/blob/5e96527/proto/profile.proto#L177-L184
runtime/pprof has encoded the symbolization information by creating
a Location for each PC found in the stack trace and including info
from all the frames expanded from the PC using runtime.CallersFrames.
This assumes inlined functions are represented as a single PC in the
stack trace. (https://go-review.googlesource.com/41256)
In the recent years, behavior around inlining and the traceback
changed significantly (e.g. https://golang.org/cl/152537,
https://golang.org/issue/29582, and many changes). Now the PCs
in the stack trace represent user frames even including inline
marks. As a result, the profile proto started to allocate a Location
entry for each user frame, lose the inline information (so pprof
presented incorrect results when inlined functions are involved),
and confuse the pprof tool with those PCs made up for inline marks.
This CL attempts to detect inlined call frames from the stack traces
of CPU profiles, and organize the Location information as intended.
Currently, the runtime does not provide a reliable and convenient way to
detect inlined call frames and expand user frames from given externally
recognizable PCs. So we use heuristics to recover the groups:
- inlined call frames have nil Func field
- inlined call frames will have the same Entry point
- but must be careful with recursive functions that have the
same Entry point by definition, and non-Go functions that
may lack most of the fields of Frame.
The followup CL will address the issue with other profile types.
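For reference on the public API involved, runtime.CallersFrames is what
expands one PC into the frames of an inlined call stack: the inlined
frames come back with a nil Func but share the physical function's
Entry, which is what the heuristics above key on. A small, hedged usage
example:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	pcs := make([]uintptr, 32)
	n := runtime.Callers(0, pcs)
	frames := runtime.CallersFrames(pcs[:n])
	for {
		f, more := frames.Next()
		// Inlined frames report f.Func == nil and share f.Entry with
		// the physical function that contains them.
		fmt.Println(f.Function, f.Func == nil, f.Entry)
		if !more {
			break
		}
	}
}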
Change-Id: I0c9667ab016a3e898d648f31c3f82d84c15398db
Reviewed-on: https://go-review.googlesource.com/c/go/+/204636
Reviewed-by: Keith Randall <khr@golang.org>
This change removes the old page allocator from the runtime.
Updates #35112.
Change-Id: Ib20e1c030f869b6318cd6f4288a9befdbae1b771
Reviewed-on: https://go-review.googlesource.com/c/go/+/195700
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change flips the oldPageAllocator constant enabling the new page
allocator in the Go runtime.
Updates #35112.
Change-Id: I7fc8332af9fd0e43ce28dd5ebc1c1ce519ce6d0c
Reviewed-on: https://go-review.googlesource.com/c/go/+/201765
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This CL adds support of call injection and async preemption on
MIPS and MIPS64.
Like ARM64, we need to clobber one register (REGTMP) for
returning from the injected call. Previous CLs have marked code
sequences that use REGTMP async-nonpreemptible.
It seems on MIPS/MIPS64, a CALL instruction is not "atomic" (!).
If a signal is delivered right at the CALL instruction, we may
see an updated LR with a not-yet-updated PC. In some cases this
may lead to failed stack unwinding. Don't preempt in this case.
Change-Id: I99437b2d05869ded5c0c8cb55265dbfc933aedab
Reviewed-on: https://go-review.googlesource.com/c/go/+/203720
Reviewed-by: Keith Randall <khr@golang.org>
This change adds the allocNeedZero method to mheap which uses the new
heapArena field zeroedBase to determine whether a new allocation needs
zeroing. The purpose of this work is to avoid zeroing memory that is
fresh from the OS in the context of the new allocator, where we no
longer have the concept of a free span to track this information.
The new field in heapArena, zeroedBase, is small, which runs counter to
the advice in the doc comment for heapArena. Since heapArenas are
already not a multiple of the system page size, this advice seems stale,
and we're OK with using an extra physical page for a heapArena. So, this
change also deletes the comment with that advice.
Updates #35112.
Change-Id: I688cd9fd3c57a98a6d43c45cf699543ce16697e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/203858
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This CL adds support of call injection and async preemption on
S390X.
Like ARM64, we need to clobber one register (REGTMP) for
returning from the injected call. Previous CLs have marked code
sequences that use REGTMP async-nonpreemptible.
Change-Id: I78adbc5fd70ca245da390f6266623385b45c9dfc
Reviewed-on: https://go-review.googlesource.com/c/go/+/204106
Reviewed-by: Keith Randall <khr@golang.org>
This change integrates all the bits and pieces of the new page allocator
into the runtime, behind a global constant.
Updates #35112.
Change-Id: I6696bde7bab098a498ab37ed2a2caad2a05d30ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/201764
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Currently the runtime background scavenger is paced externally,
controlled by a collection of variables which together describe a line
that we'd like to stay under.
However, the line to stay under is computed as a function of the number
of free and unscavenged huge pages in the heap at the end of the last
GC. Aside from this number being inaccurate (which is still acceptable),
the scavenging system also makes an order-of-magnitude assumption as to
how expensive scavenging a single page actually is.
This change simplifies the scavenger in preparation for making it
operate on bitmaps. It makes it so that the scavenger paces itself, by
measuring the amount of time it takes to scavenge a single page. The
scavenging methods on mheap already avoid breaking huge pages, so if we
scavenge a real huge page, then we'll have paced correctly, otherwise
we'll sleep for longer to avoid using more than scavengePercent wall
clock time.
Unfortunately, all this involves measuring time, which is quite tricky.
Currently we don't directly account for long process sleeps or OS-level
context switches (which is quite difficult to do in general), but we do
account for Go scheduler overhead and variations in it by maintaining an
EWMA of the ratio of time spent scavenging to the time spent sleeping.
This ratio, as well as the sleep time, are bounded in order to deal with
the aforementioned OS-related anomalies.
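A minimal sketch of the core pacing arithmetic, assuming a fixed
percentage budget (the EWMA correction and the bounds are omitted here):

package sketch

import "time"

// nextSleep returns how long to sleep after spending tScav scavenging a
// page so that scavenging stays at roughly budgetPercent of wall-clock
// time. The real pacing additionally stretches this by an EWMA of
// observed scheduling overhead and clamps both quantities.
func nextSleep(tScav time.Duration, budgetPercent int) time.Duration {
	return tScav * time.Duration(100-budgetPercent) / time.Duration(budgetPercent)
}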
Updates #35112.
Change-Id: Ieca8b088fdfca2bebb06bcde25ef14a42fd5216b
Reviewed-on: https://go-review.googlesource.com/c/go/+/201763
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This CL adds support of call injection and async preemption on
ARM64.
There seems to be no way to return from the injected call without
clobbering *any* register. So we have to clobber one, which is
chosen to be REGTMP. Previous CLs have marked code sequences
that use REGTMP async-nonpreemptible.
Change-Id: Ieca4e3ba5557adf3d0f5d923bce5f1769b58e30b
Reviewed-on: https://go-review.googlesource.com/c/go/+/203461
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change adds a "locked" parameter to scavenge() and scavengeone()
which allows these methods to be run with the heap lock acquired, and
synchronously with respect to others which acquire the heap lock.
This mode is necessary for both heap-growth scavenging (multiple
asynchronous scavengers here could be problematic) and
debug.FreeOSMemory.
Updates #35112.
Change-Id: I24eea8e40f971760999c980981893676b4c9b666
Reviewed-on: https://go-review.googlesource.com/c/go/+/195699
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
This change makes it so that the new page allocator returns the number
of pages that are scavenged in a new allocation so that mheap can update
memstats appropriately.
The accounting could be embedded into pageAlloc, but that would make
the new allocator more difficult to test.
Updates #35112.
Change-Id: I0f94f563d7af2458e6d534f589d2e7dd6af26d12
Reviewed-on: https://go-review.googlesource.com/c/go/+/195698
Reviewed-by: Austin Clements <austin@google.com>
This change adds a scavenger for the new page allocator along with
tests. The scavenger walks over the heap backwards once per GC, looking
for memory to scavenge. It walks across the heap without any lock held,
searching optimistically. If it finds what appears to be a scavenging
candidate it acquires the heap lock and attempts to verify it. Upon
verification it then scavenges.
Notably, unlike the old scavenger, it doesn't show any preference for
huge pages and instead follows a more strict last-page-first policy.
Updates #35112.
Change-Id: I0621ef73c999a471843eab2d1307ae5679dd18d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/195697
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change adds a new bitmap-based allocator to the runtime with tests.
It does not yet integrate the page allocator into the runtime and thus
this change is almost purely additive.
Updates #35112.
Change-Id: Ic3d024c28abee8be8797d3918116a80f901cc2bf
Reviewed-on: https://go-review.googlesource.com/c/go/+/190622
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change ensures js-wasm returns page-aligned memory. While today
its lack of alignment doesn't cause problems, this is an invariant of
sysAlloc which is documented in HACKING.md but isn't upheld by js-wasm.
Any code that calls sysAlloc directly for small structures expects a
certain alignment (e.g. debuglog, tracebufs) but this is not maintained
by js-wasm's sysAlloc.
Where sysReserve comes into play is that sysAlloc is implemented in
terms of sysReserve on js-wasm. Also, the documentation of sysReserve
says that the returned memory is "OS-aligned" which on most platforms
means page-aligned, but the "OS-alignment" on js-wasm is effectively 1,
which doesn't seem right either.
The expected impact of this change is increased memory use on wasm,
since there's no way to decommit memory, and any small structures
allocated with sysAlloc won't be packed quite as tightly. However, any
memory increase should be minimal. Most calls to sysReserve and sysAlloc
already aligned their request to physPageSize before calling it; there
are only a few circumstances where this is not true, and they involve
allocating an amount of memory returned by unsafe.Sizeof where it's
actually quite important that we get the alignment right.
Updates #35112.
Change-Id: I9ca171e507ff3bd186326ccf611b35b9ebea1bfe
Reviewed-on: https://go-review.googlesource.com/c/go/+/205277
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Richard Musiol <neelance@gmail.com>
This change adds the concept of summaries and of summarizing a set of
pallocBits, a core concept in the new page allocator. These summaries
are really just three integers packed into a uint64. This change also
adds tests and a benchmark for generating these summaries.
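A hedged sketch of the packing (the 21-bit field width and the field
names are assumptions for illustration): three small counts describing
free pages at the start of a region, the largest run anywhere within
it, and free pages at its end, packed into one uint64:

package sketch

type summary uint64

const (
	sumFieldBits = 21
	sumFieldMask = 1<<sumFieldBits - 1
)

// packSummary packs the three counts into a single word so a whole
// region can be summarized and compared with one load.
func packSummary(start, max, end uint64) summary {
	return summary(start&sumFieldMask |
		(max&sumFieldMask)<<sumFieldBits |
		(end&sumFieldMask)<<(2*sumFieldBits))
}

func (s summary) start() uint64 { return uint64(s) & sumFieldMask }
func (s summary) max() uint64   { return (uint64(s) >> sumFieldBits) & sumFieldMask }
func (s summary) end() uint64   { return (uint64(s) >> (2 * sumFieldBits)) & sumFieldMask }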
Updates #35112.
Change-Id: I69686316086c820c792b7a54235859c2105e5fee
Reviewed-on: https://go-review.googlesource.com/c/go/+/190621
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This change adds a per-chunk bitmap for page allocation called
pallocBits with algorithms for allocating and freeing pages out of the
bitmap. This change also adds tests for pallocBits, but does not yet
integrate it into the runtime.
Updates #35112.
Change-Id: I479006ed9f1609c80eedfff0580d5426b064b0ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/190620
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change is the first of a series of changes which replace the
current page allocator (which is based on the contents of mgclarge.go
and some of mheap.go) with one based on free/used bitmaps.
It adds in the key constants for the page allocator as well as a comment
describing the implementation.
Updates #35112.
Change-Id: I839d3a07f46842ad379701d27aa691885afdba63
Reviewed-on: https://go-review.googlesource.com/c/go/+/190619
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change makes it so that sysReserve, which creates a PROT_NONE
mapping, maps that memory as NORESERVE. Before this change, relatively
large PROT_NONE mappings could cause fork to fail with ENOMEM, reported
as "not enough space". Presumably this refers to swap space, since
adding this flag causes the failures to go away.
This helps unblock page allocator work, since it allows us to make large
PROT_NONE mappings on solaris safely.
Updates #35112.
Change-Id: Ic3cba310c626e93d5db0f27269e2569bb7bc393e
Reviewed-on: https://go-review.googlesource.com/c/go/+/205759
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
When we have already assigned the semaphore ticket to a specific
waiter, we want to get the waiter running as fast as possible since
no other G waiting on the semaphore can acquire it optimistically.
The net effect is that, when a sync.Mutex is contended, the code in
the critical section guarded by the Mutex gets a priority boost.
Fixes#33747
Change-Id: I9967f0f763c25504010651bdd7f944ee0189cd45
Reviewed-on: https://go-review.googlesource.com/c/go/+/200577
Reviewed-by: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
On iOS, the address space is not 48 bits as one might believe, since
it's arm64 hardware. In fact, all pointers are truncated to 33 bits, and
the OS only gives applications access to the range [1<<32, 2<<32).
While today this has no effect on the Go runtime, future changes which
care about address space size need this to be correct.
Updates #35112.
Change-Id: Id518a2298080f7e3d31cf7d909506a37748cc49a
Reviewed-on: https://go-review.googlesource.com/c/go/+/205758
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
This change removes a hack which was added to deal with Darwin 10.10's
weird ignorance of mapping hints which would cause race mode to fail
since it requires the heap to live within a certain address range.
We no longer support 10.10, and this is potentially causing problems
related to the page allocator, so drop this code.
Updates #26475.
Updates #35112.
Change-Id: I0e1c6f8c924afe715a2aceb659a969d7c7b6f749
Reviewed-on: https://go-review.googlesource.com/c/go/+/205757
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The test deliberately constructs an invalid pointer, so don't check it.
Fixes#35379
Change-Id: Ifeff3484740786b0470de3a4d2d4103d91e06f5d
Reviewed-on: https://go-review.googlesource.com/c/go/+/205717
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Otherwise, we can get into a deadlock: sysmon takes the scheduler lock
and calls timeSleepUntil which takes each P's timer lock. Simultaneously,
some P calls runtimer (holding the P's own timer lock) which wakes up
the scavenger, calling goready, calling wakep, calling startm, getting
the scheduler lock. Now the sysmon thread is holding the scheduler lock
and trying to get a P's timer lock, while some other thread running on
that P is holding the P's timer lock and trying to get the scheduler lock.
So change sysmon to call timeSleepUntil without holding the scheduler
lock, and change timeSleepUntil to use allpLock, which is only held for
limited periods of time and should never compete with timer locks.
This hopefully
Fixes#35375
At least it should fix the linux-arm64-packet builder problems,
which occurred more reliably as that system has GOMAXPROCS == 96,
giving a lot more scope for this deadlock.
Change-Id: I7a7917daf7a4882e0b27ca416e4f6300cfaaa774
Reviewed-on: https://go-review.googlesource.com/c/go/+/205558
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
WaitForSigusr1 registers a callback to be called on SIGUSR1 directly
from the runtime signal handler. Currently, this callback has a write
barrier in it, which can crash with a nil P if the GC is active and
the signal arrives on an M that doesn't have a P.
Fix this by recording the ID of the M that receives the signal instead
of the M itself, since that's all we needed anyway. To make sure there
are no other problems, this also lifts the callback into a package
function and marks it "go:nowritebarrierrec".
Fixes#35248.
Updates #35276, since in principle a write barrier at exactly the
wrong time while entering the scheduler could cause issues, though I
suspect that bug is unrelated.
Change-Id: I47b4bc73782efbb613785a93e381d8aaf6850826
Reviewed-on: https://go-review.googlesource.com/c/go/+/204620
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The check is not relevant anymore.
The comment claims that go run does not rebuild packages,
but this is not true. And we use go build anyway.
We may have added the check because without caching
rebuilding everything starting from runtime for each test
takes a while. But now we have caching.
So from every side this check just adds code and pain.
Change-Id: Ifbbb643724100622e5f9db884339b67cde4ba729
Reviewed-on: https://go-review.googlesource.com/c/go/+/202450
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The hash is used in type switches. However, the compiler statically generates itabs
for all interface/type pairs used in switches (which are added to itabTable
in itabsinit). The dynamically-generated itabs never participate in type switches,
and thus the hash is irrelevant.
Change-Id: I4f6e37be31b8f5605cca7a1806cb04708e948cea
Reviewed-on: https://go-review.googlesource.com/c/go/+/202448
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Keith Randall <khr@golang.org>