On PPC64 when external linking, for large binaries we split the
text section to multiple sections, so the external linking may
insert trampolines between sections. These trampolines are within
the address range covered by the func table, but not known by Go.
This causes runtime.findfunc to return a wrong function if the
given PC is from such trampolines.
In this CL, we generate a marker between text sections where
there could potentially be a hole in the func table. At run time,
we skip the hole if we see such a marker.
Fixes#37216.
Change-Id: I95ab3875a84b357dbaa65a4ed339a19282257ce0
Reviewed-on: https://go-review.googlesource.com/c/go/+/219717
Reviewed-by: David Chase <drchase@google.com>
In walltime1/nanotime1, we save the caller's PC and SP for stack
unwinding. The code does that assumed zero frame size. Now that
the frame size is not zero, correct the offset. Rewrite it in a
way that doesn't depend on hard-coded frame size.
May fix#37127.
Change-Id: I47d6d54fc3499d7d5946c3f6a2dbd24fbd679de1
Reviewed-on: https://go-review.googlesource.com/c/go/+/219118
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Apparently, the signal handling code path in darwin kernel leaves
the upper bits of Y registers in a dirty state, which causes many
SSE operations (128-bit and narrower) become much slower. Clear
the upper bits to get to a clean state.
We do it at the entry of asyncPreempt, which is immediately
following exiting from the kernel's signal handling code, if we
actually injected a call. It does not cover other exits where we
don't inject a call, e.g. failed preemption, profiling signal, or
other async signals. But it does cover an important use case of
async signals, preempting a tight numerical loop, which we
introduced in this cycle.
Running the benchmark in issue #37174:
name old time/op new time/op delta
Fast-8 90.0ns ± 1% 46.8ns ± 3% -47.97% (p=0.000 n=10+10)
Slow-8 188ns ± 5% 49ns ± 1% -73.82% (p=0.000 n=10+9)
There is no more slowdown due to preemption signals.
For #37174.
Change-Id: I8b83d083fade1cabbda09b4bc25ccbadafaf7605
Reviewed-on: https://go-review.googlesource.com/c/go/+/219131
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
We were using the fallback hash unconditionally. Oops.
Fixes#37212
Change-Id: Id37d4f5c08806fdda12a3148ba4dbc46676eeb54
Reviewed-on: https://go-review.googlesource.com/c/go/+/219337
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Otherwise we can see
- goroutine 1 calls netpollBreak, the atomic.Cas succeeds, then suspends
- goroutine 2 calls noteclear, sets netpollBroken to 0
- goroutine 3 calls netpollBreak, the atomic.Cas succeeds, calls notewakeup
- goroutine 1 wakes up calls notewakeup, crashes due to double wakeup
This doesn't happen on Plan 9 because it only runs one thread at a time.
But Fuschia wants to use this code too.
Change-Id: Ib636e4f327bb15e44a2c40fd681aae9a91073a30
Reviewed-on: https://go-review.googlesource.com/c/go/+/218537
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
This commit changes the wording of a comment in malloc.go that describes
how span objects are zeroed to make it more clear.
Change-Id: I07722df1e101af3cbf8680ad07437d4a230b0168
GitHub-Last-Rev: 0e909898c7
GitHub-Pull-Request: golang/go#37008
Reviewed-on: https://go-review.googlesource.com/c/go/+/217618
Reviewed-by: Austin Clements <austin@google.com>
It's possible for the scheduler to try to preempt a goroutine running
on a thread created by C code just as the goroutine returns from Go code
to C code. If that happens, the goroutine will have a nil g,
which would normally cause us to enter the badsignal code.
The badsignal code will allocate an M, reset the signal handler,
and raise the signal. This is all wasted work for SIGURG,
as the default behavior is for the kernel to ignore the signal.
It also means that there is a period of time when preemption requests
are ignored, because the signal handler is reset to the default.
And, finally, it triggers a bug on 386 OpenBSD 6.2. So stop doing it.
No test because there is no real change in behavior (other than on OpenBSD),
the new code is just more efficient
Fixes#36996
Change-Id: I8c1cb9bc09f5ef890cab567924417e2423fc71f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/217617
Reviewed-by: Austin Clements <austin@google.com>
Now that the other dependent offset has been identified, we can remove the
unnecessary ADDI instruction from the riscv64 call sequence (reducing it
to AUIPC+JALR, rather than the previous AUIPC+ADDI+JALR).
Change-Id: I348c4efb686f9f71ed1dd1d25fb9142a41230b0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/216798
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change formalizes an assumption made by the page allocator, which
is that (*pageAlloc).searchAddr should never refer to memory that is not
represented by (*pageAlloc).inUse. The portion of address space covered
by (*pageAlloc).inUse reflects the parts of the summary arrays which are
guaranteed to mapped, and so looking at any summary which is not
reflected there may cause a segfault.
In fact, this can happen today. This change thus also removes a
micro-optimization which is the only case which may cause
(*pageAlloc).searchAddr to point outside of any region covered by
(*pageAlloc).inUse, and adds a test verifying that the current segfault
can no longer occur.
Change-Id: I98b534f0ffba8656d3bd6d782f6fc22549ddf1c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/216697
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This reverts commit 7b294cdd8d, CL 182657.
Reason for revert: This change may be causing latency problems
for applications which call ReadMemStats, because it may cause
all goroutines to stop until the GC completes.
https://golang.org/cl/215157 fixes this problem, but it's too
late in the cycle to land that.
Updates #19812.
Change-Id: Iaa26f4dec9b06b9db2a771a44e45f58d0aa8f26d
Reviewed-on: https://go-review.googlesource.com/c/go/+/216358
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This reverts commit 05511a5c0a, CL 208379.
Reason for revert: So that we can cleanly revert
https://golang.org/cl/182657.
Change-Id: I4fdf4f864a093db7866b3306f0f8f856b9f4d684
Reviewed-on: https://go-review.googlesource.com/c/go/+/216357
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Unlike C's memmove, Go's memmove must be careful to do indivisible
writes of pointer values because it may be racing with the garbage
collector reading the heap.
We've had various bugs related to this over the years (#36101, #13160,
#12552). Indeed, memmove is a great target for optimization and it's
easy to forget the special requirements of Go's memmove.
The CL documents these (currently unwritten!) requirements. We're also
adding a test that should hopefully keep everyone honest going
forward, though it's hard to be sure we're hitting all cases of
memmove.
Change-Id: I2f59f8d8d6fb42d2f10006b55d605b5efd8ddc24
Reviewed-on: https://go-review.googlesource.com/c/go/+/213418
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The timers code used to have a problem: if code started and stopped a
lot of timers, as would happen with, for example, lots of calls to
context.WithTimeout, then it would steadily use memory holding timers
that had stopped but not been removed from the timer heap.
That problem was fixed by CL 214299, which would remove all deleted
timers whenever they got to be more than 1/4 of the total number of
timers on the heap.
The timers code had a different problem: if there were some idle P's,
the running P's would have lock contention trying to steal their timers.
That problem was fixed by CL 214185, which only acquired the timer lock
if the next timer was ready to run or there were some timers to adjust.
Unfortunately, CL 214185 partially undid 214299, in that we could now
accumulate an increasing number of deleted timers while there were no
timers ready to run. This CL restores the 214299 behavior, by checking
whether there are lots of deleted timers without acquiring the lock.
This is a performance issue to consider for the 1.14 release.
Change-Id: I13c980efdcc2a46eb84882750c39e3f7c5b2e7c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/215722
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
There is a wrong offset when getting the results of a clock_gettime
syscall. Although the syscall will never be called in native ppc64x,
QEMU doesn't implement VDSO, so it will return wrong values.
Fixes#36592
Change-Id: Icf838075228dcdd62cf2c1279aa983e5993d66ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/215397
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Based on riscv-go port.
Updates #27532
Change-Id: If522807a382130be3c8d40f4b4c1131d1de7c9e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/204632
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Go 1.14 will drop support for macOS 10.10, see #23011
This reverts CL 155097
Updates #26475
Updates #29340
Change-Id: I64d0275141407313b73068436ee81d13eacc4c76
Reviewed-on: https://go-review.googlesource.com/c/go/+/214058
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This reduces lock contention when only a few P's are running and
checking for whether they need to run timers on the sleeping P's.
Without this change the running P's would get lock contention
while looking at the sleeping P's timers. With this change a single
atomic load suffices to determine whether there are any ready timers.
Change-Id: Ie843782bd56df49867a01ecf19c47498ec827452
Reviewed-on: https://go-review.googlesource.com/c/go/+/214185
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
Currently, the scavenger is paced according to how long it takes to
scavenge one runtime page's worth of memory. However, this pacing
doesn't take into account the additional cost of actually using a
scavenged page. This operation, "sysUsed," is a counterpart to the
scavenging operation "sysUnused." On most systems this operation is a
no-op, but on some systems like Darwin and Windows we actually make a
syscall. Even on systems where it's a no-op, the cost is implicit: a
more expensive page fault when re-using the page.
On Darwin in particular the cost of "sysUnused" is fairly close to the
cost of "sysUsed", which skews the pacing to be too fast. A lot of
soon-to-be-allocated memory ends up scavenged, resulting in many more
expensive "sysUsed" operations, ultimately slowing down the application.
The way to fix this problem is to include the future cost of "sysUsed"
on a page in the scavenging cost. However, measuring the "sysUsed" cost
directly (like we do with "sysUnused") on most systems is infeasible
because we would have to measure the cost of the first access.
Instead, this change applies a multiplicative constant to the measured
scavenging time which is based on a per-system ratio of "sysUnused" to
"sysUsed" costs in the worst case (on systems where it's a no-op, we
measure the cost of the first access). This ultimately slows down the
scavenger to a more reasonable pace, limiting its impact on performance
but still retaining the memory footprint improvements from the previous
release.
Fixes#36507.
Change-Id: I050659cd8cdfa5a32f5cc0b56622716ea0fa5407
Reviewed-on: https://go-review.googlesource.com/c/go/+/214517
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Whenever more than 1/4 of the timers on a P's heap are deleted,
remove them from the heap.
Change-Id: Iff63ed3d04e6f33ffc5c834f77f645c52c007e52
Reviewed-on: https://go-review.googlesource.com/c/go/+/214299
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
We had a few test cases to make sure checkptr didn't have certain
false positives, but none to test for any true positives. This CL
fixes that.
Updates #22218.
Change-Id: I24c02e469a4af43b1748829a9df325ce510f7cc4
Reviewed-on: https://go-review.googlesource.com/c/go/+/214238
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
When adjustTimers sees a timerModifiedEarlier or timerModifiedLater,
it removes it from the heap, leaving a new timer at that position
in the heap. We were accidentally skipping that new timer in our loop.
In some unlikely cases this could cause adjustTimers to look at more
timers than necessary.
Change-Id: Ic71e54c175ab7d86a7fa46f1497aca71ed1c43cc
Reviewed-on: https://go-review.googlesource.com/c/go/+/214338
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently, scavenging information is printed if the gctrace debug
variable is >0. Scavenging information is also printed naively, for
every page scavenged, resulting in a lot of noise when the typical
expectation for GC trace is one line per GC.
This change adds a new GODEBUG flag called scavtrace which prints
scavenge information roughly once per GC cycle and removes any scavenge
information from gctrace. The exception is debug.FreeOSMemory, which may
force an additional line to be printed.
Fixes#32952.
Change-Id: I4177dcb85fe3f9653fd74297ea93c97c389c1811
Reviewed-on: https://go-review.googlesource.com/c/go/+/212640
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
On Windows, we implement asynchronous preemption using SuspendThread
to suspend other threads in our process. However, SuspendThread is
itself actually asynchronous (it enqueues a kernel "asynchronous
procedure call" and returns). Unfortunately, Windows' ExitProcess API
kills all threads except the calling one and then runs APCs. As a
result, if SuspendThread and ExitProcess are called simultaneously,
the exiting thread can be suspended and the suspending thread can be
exited, leaving behind a ghost process consisting of a single thread
that's suspended.
We've already protected against the runtime's own calls to
ExitProcess, but if Go code calls external code, there's nothing
stopping that code from calling ExitProcess. For example, in #35775,
our own call to racefini leads to C code calling ExitProcess and
occasionally causing a deadlock.
This CL fixes this by introducing synchronization between calling
external code on Windows and preemption. It adds an atomic field to
the M that participates in a simple CAS-based synchronization protocol
to prevent suspending a thread running external code. We use this to
protect cgocall (which is used for both cgo calls and system calls on
Windows) and racefini.
Tested by running the flag package's TestParse test compiled in race
mode in a loop. Before this change, this would reliably deadlock after
a few minutes.
Fixes#35775.
Updates #10958, #24543.
Change-Id: I50d847abcdc2688b4f71eee6a75eca0f2fee892c
Reviewed-on: https://go-review.googlesource.com/c/go/+/213837
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
On Windows, it might be possible that SuspendThread suspends a
thread right between when an exception happens and when the
exception handler runs. (This is my guess. I don't know the
implementation detail of Windows exceptions to be sure.) In this
case, we may inject a call to asyncPreempt before the exception
handler runs. The exception handler will inject a sigpanic call,
which will make the stack trace looks like
sigpanic
asyncPreempt
actual panicking function
i.e. it appears asyncPreempt panicked.
Instead, just overwrite the PC, without pushing another frame.
Fixes#35773.
Change-Id: Ief4e964dcb7f45670b5f93c4dcf285cc1c737514
Reviewed-on: https://go-review.googlesource.com/c/go/+/213879
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This is more readable and less error-prone than using RSP offsets.
Suggested during review of CL 212765.
Change-Id: I070190abeeac8eae5dbd414407602619d9d57422
Reviewed-on: https://go-review.googlesource.com/c/go/+/213577
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Correct the pipe and pipe2 implementations by using the correct RSP offsets,
used to store and return the file descriptor array.
Fix setNonblock by using the correct immediate value for O_NONBLOCK and
replace EOR (exclusive OR) with ORR.
Also correct the write1 implementation, which has a uintptr value for the fd
argument.
Change-Id: Ibca77af44b649e8bb330ca54f9c36a7a8b0f9cea
Reviewed-on: https://go-review.googlesource.com/c/go/+/212765
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The current code uses EOR (exclusive OR), which will result in the O_NONBLOCK
flag being toggled rather than being set. Other implementations use OR, hence
this is likely a bug.
Change-Id: I5dafa9c572452070bd37789c8a731ad6d04a86cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/212766
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
In reclaimChunk, the runtime is calling traceGCSweepDone() while holding the mheap
lock. traceGCSweepDone() can call traceEvent() and traceFlush(). These functions
not only can get various trace locks, but they may also do memory allocations
(runtime.newobject) that may end up getting the mheap lock. So, there may be
either a self-deadlock or a possible deadlock between multiple threads.
It seems better to release the mheap lock before calling traceGCSweepDone(). It is
fine to release the lock, since the operations to get the index of the chunk of
work to do are atomic. We already release the lock to call sweep, so there is no
new behavior for any of the callers of reclaimChunk.
With this change, mheap is a leaf lock (no other lock is ever acquired while it
is held).
Testing: besides normal all.bash, also ran all.bash with --long enabled, since
it does longer tests of runtime/trace.
Change-Id: I4f8cb66c24bb8d424f24d6c2305b4b8387409248
Reviewed-on: https://go-review.googlesource.com/c/go/+/207846
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
In the previous CL we ensures that memmove writes pointers
atomically, so the concurrent GC won't observe a partially
updated pointer. This CL adds a test.
Change-Id: Icd1124bf3a15ef25bac20c7fb8933f1a642d897c
Reviewed-on: https://go-review.googlesource.com/c/go/+/212627
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
R11 a.k.a. REGTMP is the temp register used by the assembler. It
may be clobbered if the assembler needs to synthesize
instructions. In particular, in nanotime1/walltime1, the load of
global variable runtime.iscgo clobbers it. So, avoid using R11
to hold a long-lived value.
Fixes#36309.
Change-Id: Iec2ab9d664532cad8fbf58da17f580e64a744f62
Reviewed-on: https://go-review.googlesource.com/c/go/+/212641
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Andrew G. Morgan <agm@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
If a pointer write is not atomic, if the GC is running
concurrently, it may observe a partially updated pointer, which
may point to unallocated or already dead memory. Most pointer
writes, like the store instructions generated by the compiler,
are already atomic. But we still need to be careful in places
like memmove. In memmove, we don't know which bits are pointers
(or too expensive to query), so we ensure that all aligned
pointer-sized units are written atomically.
Fixes#36101.
Change-Id: I1b3ca24c6b1ac8a8aaf9ee470115e9a89ec1b00b
Reviewed-on: https://go-review.googlesource.com/c/go/+/212626
Reviewed-by: Austin Clements <austin@google.com>
When a goroutine yields the remainder of its time to another goroutine
during direct semaphore handoff (as in an Unlock of a sync.Mutex in
starvation mode), it needs to signal that change to the execution
tracer. The discussion in CL 200577 didn't reach consensus on how best
to describe that, but pointed out that "traceEvGoSched / goroutine calls
Gosched" could be confusing.
Emit a "traceEvGoPreempt / goroutine is preempted" event in this case,
to allow the execution tracer to find a consistent event ordering
without being both specific and inaccurate about why the active
goroutine has changed.
Fixes#36186
Change-Id: Ic4ade19325126db2599aff6aba7cba028bb0bee9
Reviewed-on: https://go-review.googlesource.com/c/go/+/211797
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change makes it so that we check whether scavAddr is actually
mapped before trying to look at the summary for the fast path, since we
may segfault if that that part of the summary is not mapped in.
Previously this wasn't a problem because we would conservatively map
all memory for the summaries between the lowest mapped heap address and
the highest one.
This change also adds a test for this case.
Change-Id: I2b1d89b5e044dce81745964dfaba829f4becdc57
Reviewed-on: https://go-review.googlesource.com/c/go/+/212637
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This change disables pageAlloc tests on OpenBSD in short mode because
pageAlloc holds relatively large virtual memory reservations and we make
two during the pageAlloc tests. The runtime may also be carrying one
such reservation making the virtual memory requirement for testing the
Go runtime three times as much as just running a Go binary.
This causes problems for folks who just want to build and test Go
(all.bash) on OpenBSD but either don't have machines with at least 4ish
GiB of RAM (per-process virtual memory limits are capped at some
constant factor times the amount of physical memory) or their
per-process virtual memory limits are low for other reasons.
Fixes#36210.
Change-Id: I8d89cfde448d4cd2fefff4ad6ffed90de63dd527
Reviewed-on: https://go-review.googlesource.com/c/go/+/212177
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Patch up runtime testing to use the libc fcntl function on Darwin,
which is what we should be doing anyhow. This is similar to how
we handle fcntl on AIX and Solaris.
Fixes#36211
Change-Id: I47ad87e11df043ce21496a0d59523dad28960f76
Reviewed-on: https://go-review.googlesource.com/c/go/+/212299
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
nanotime1 and walltime1 do not preserve BP on linux amd64. Previously, this
did not cause a problem, because nanotime/walltime do preserve the BP. But now
with mid-stack inlining, nanotime/walltime are usually inlined, so BP is not
preserved. So, the BP is now wrong in any function after a call to
nanotime()/walltime() on amd64. That means the frame pointer on the stack can
be wrong for any further function call made after the nanotime() call (notably
runtime.main and various GC functions). [386 doesn't use framepointer.]
Fix is to set a frame size of 8 for nanotime1 and walltime1, which means the
standard prolog/epilog that saves/restore BP in the stack frame is added.
I noticed this while investigating issue 16638 (use frame pointers for
runtime.Callers). This change would needed for progress on that issue (which
doesn't have a high priority). Verified that this fix works/is useful for issue
16638.
Change-Id: I19e19ef2c1a517d737a34928baae034f2eb0b2c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/212079
Run-TryBot: Dan Scales <danscales@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
We don't asynchronously preempt if we are in the runtime. We do
this by checking the function name. However, it failed to take
inlining into account. If a runtime function gets inlined into
a non-runtime function, it can be preempted, and bad things can
happen. One instance of this is dounlockOSThread inlined into
UnlockOSThread which is in turn inlined into a non-runtime
function.
Fix this by using the innermost frame's function name.
Change-Id: Ifa036ce1320700aaaefd829b4bee0d04d05c395d
Reviewed-on: https://go-review.googlesource.com/c/go/+/211978
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Systems where PowerRegisterSuspendResumeNotification returns ERROR_
FILE_NOT_FOUND are also systems where nanotime() is on "program time"
rather than "real time". The chain for this is:
powrprof.dll!PowerRegisterSuspendResumeNotification ->
umpdc.dll!PdcPortOpen ->
ntdll.dll!ZwAlpcConnectPort("\\PdcPort") ->
syscall -> ntoskrnl.exe!AlpcpConnectPort
Opening \\.\PdcPort fails with STATUS_OBJECT_NAME_NOT_FOUND when pdc.sys
hasn't been initialized. Pdc.sys also provides the various hooks for
sleep resumption events, which means if it's not loaded, then our "real
time" timer is actually on "program time". Finally STATUS_OBJECT_NAME_
NOT_FOUND is passed through RtlNtStatusToDosError, which returns ERROR_
FILE_NOT_FOUND. Therefore, in the case where the function returns ERROR_
FILE_NOT_FOUND, we don't mind, since the timer we're using will
correspond fine with the lack of sleep resumption notifications. This
applies, for example, to Docker users.
Fixes#35447Fixes#35482
Change-Id: I9e1ce5bbc54b9da55ff7a3918b5da28112647eee
Reviewed-on: https://go-review.googlesource.com/c/go/+/208317
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Jason A. Donenfeld <Jason@zx2c4.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
When growing the address ranges, the new length is the old length + 1.
Fixes#36113.
Change-Id: I1b425f78e473cfa3cbdfe6113e166663f41fc9f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/211157
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
If the defer function pointer is nil, force the seg fault to happen in deferreturn
rather than in jmpdefer. jmpdefer is used fairly infrequently now because most
functions have open-coded defers.
The open-coded defer implementation calls gentraceback() with a callback when
looking for the first open-coded defer frame. gentraceback() throws an error if it
is called with a callback on an LR architecture and jmpdefer is on the stack,
because the stack trace can be incorrect in that case - see issue #8153. So, we
want to make sure that we don't have a seg fault in jmpdefer.
Fixes#36050
Change-Id: Ie25e6f015d8eb170b40248dedeb26a37b7f9b38d
Reviewed-on: https://go-review.googlesource.com/c/go/+/210978
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Prior to this change, if the heap was very discontiguous (such as in
TestArenaCollision) it's possible we could map a large amount of memory
as R/W and commit it. We would use only the start and end to track what
should be mapped, and we would extend that mapping as needed to
accomodate a potentially fragmented address space.
After this change, we only map exactly the part of the summary arrays
that we need by using the inUse ranges from the previous change. This
reduces the GCSys footprint of TestArenaCollision from 300 MiB to 18
MiB.
Because summaries are no longer mapped contiguously, this means the
scavenger can no longer iterate directly. This change also updates the
scavenger to borrow ranges out of inUse and iterate over only the
parts of the heap which are actually currently in use. This is both an
optimization and necessary for correctness.
Fixes#35514.
Change-Id: I96bf0c73ed0d2d89a00202ece7b9d089a53bac90
Reviewed-on: https://go-review.googlesource.com/c/go/+/207758
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change adds a new inUse field to the allocator which tracks ranges
of addresses that are owned by the heap. It is updated on each heap
growth.
These ranges are tracked in an array which is kept sorted. In practice
this array shouldn't exceed its initial allocation except in rare cases
and thus should be small (ideally exactly 1 element in size).
In a hypothetical worst-case scenario wherein we have a 1 TiB heap and 4
MiB arenas (note that the address ranges will never be at a smaller
granularity than an arena, since arenas are always allocated
contiguously), inUse would use at most 4 MiB of memory if the heap
mappings were completely discontiguous (highly unlikely) with an
additional 2 MiB leaked from previous allocations. Furthermore, the
copies that are done to keep the inUse array sorted will copy at most 4
MiB of memory in such a scenario, which, assuming a conservative copying
rate of 5 GiB/s, amounts to about 800µs.
However, note that in practice:
1) Most 64-bit platforms have 64 MiB arenas.
2) The copies should incur little-to-no page faults, meaning a copy rate
closer to 25-50 GiB/s is expected.
3) Go heaps are almost always mostly contiguous.
Updates #35514.
Change-Id: I3ad07f1c2b5b9340acf59ecc3b9ae09e884814fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/207757
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>