Commit Graph

118 Commits

Author SHA1 Message Date
Keith Randall 052d402985 runtime: clear mspan.largeType more carefully in the case of arenas
The pointer stored in mspan.largeType is an invalid pointer when
the span is an arena. We need to make sure that pointer isn't seen
by the garbage collector, as it might barf on it. Make sure we
zero the pointer using a uintptr write so the old value isn't picked
up by the write barrier.

The mspan.largeType field itself is in a NotInHeap struct, so a heap
scan won't find it. The only way we find it is when writing it, or
when reading it and putting it in a GC-reachable location. I think we
might need to audit the runtime to make sure these pointers aren't
being passed in places where the GC might (non-conservatively) scan a
stack frame it lives in. (It might be ok, many such places are either
systemstack or nosplit.)

Change-Id: Ie059d054e0da4d48a4c4b3be88b8e1e46ffa7d10
Reviewed-on: https://go-review.googlesource.com/c/go/+/548535
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-12-11 17:15:22 +00:00
Michael Anthony Knyszek f119abb65d runtime: refactor runtime->tracer API to appear more like a lock
Currently the execution tracer synchronizes with itself using very
heavyweight operations. As a result, it's totally fine for most of the
tracer code to look like:

    if traceEnabled() {
	traceXXX(...)
    }

However, if we want to make that synchronization more lightweight (as
issue #60773 proposes), then this is insufficient. In particular, we
need to make sure the tracer can't observe an inconsistency between g
atomicstatus and the event that would be emitted for a particular
g transition. This means making the g status change appear to happen
atomically with the corresponding trace event being written out from the
perspective of the tracer.

This requires a change in API to something more like a lock. While we're
here, we might as well make sure that trace events can *only* be emitted
while this lock is held. This change introduces such an API:
traceAcquire, which returns a value that can emit events, and
traceRelease, which requires the value that was returned by
traceAcquire. In practice, this won't be a real lock, it'll be more like
a seqlock.

For the current tracer, this API is completely overkill and the value
returned by traceAcquire basically just checks trace.enabled. But it's
necessary for the tracer described in #60773 and we can implement that
more cleanly if we do this refactoring now instead of later.

For #60773.

Change-Id: Ibb9ff5958376339fafc2b5180aef65cf2ba18646
Reviewed-on: https://go-review.googlesource.com/c/go/+/515635
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-11-09 22:34:25 +00:00
Michael Anthony Knyszek 38ac7c41aa runtime: implement experiment to replace heap bitmap with alloc headers
This change replaces the 1-bit-per-word heap bitmap for most size
classes with allocation headers for objects that contain pointers. The
header consists of a single pointer to a type. All allocations with
headers are treated as implicitly containing one or more instances of
the type in the header.

As the name implies, headers are usually stored as the first word of an
object. There are two additional exceptions to where headers are stored
and how they're used.

Objects smaller than 512 bytes do not have headers. Instead, a heap
bitmap is reserved at the end of spans for objects of this size. A full
word of overhead is too much for these small objects. The bitmap is of
the same format of the old bitmap, minus the noMorePtrs bits which are
unnecessary. All the objects <512 bytes have a bitmap less than a
pointer-word in size, and that was the granularity at which noMorePtrs
could stop scanning early anyway.

Objects that are larger than 32 KiB (which have their own span) have
their headers stored directly in the span, to allow power-of-two-sized
allocations to not spill over into an extra page.

The full implementation is behind GOEXPERIMENT=allocheaders.

The purpose of this change is performance. First and foremost, with
headers we no longer have to unroll pointer/scalar data at allocation
time for most size classes. Small size classes still need some
unrolling, but their bitmaps are small so we can optimize that case
fairly well. Larger objects effectively have their pointer/scalar data
unrolled on-demand from type data, which is much more compactly
represented and results in less TLB pressure. Furthermore, since the
headers are usually right next to the object and where we're about to
start scanning, we get an additional temporal locality benefit in the
data cache when looking up type metadata. The pointer/scalar data is
now effectively unrolled on-demand, but it's also simpler to unroll than
before; that unrolled data is never written anywhere, and for arrays we
get the benefit of retreading the same data per element, as opposed to
looking it up from scratch for each pointer-word of bitmap. Lastly,
because we no longer have a heap bitmap that spans the entire heap,
there's a flat 1.5% memory use reduction. This is balanced slightly by
some objects possibly being bumped up a size class, but most objects are
not tightly optimized to size class sizes so there's some memory to
spare, making the header basically free in those cases.

See the follow-up CL which turns on this experiment by default for
benchmark results. (CL 538217.)

Change-Id: I4c9034ee200650d06d8bdecd579d5f7c1bbf1fc5
Reviewed-on: https://go-review.googlesource.com/c/go/+/437955
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-09 19:58:08 +00:00
Michael Pratt 7241fee9b0 runtime: remove write-only sweepdata fields
Change-Id: Ia238889a704812473b838b20efedfe9d24b1e26f
Reviewed-on: https://go-review.googlesource.com/c/go/+/534160
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-10-11 16:55:22 +00:00
Cherry Mui 340a4f55c4 runtime: use smaller fields for mspan.freeindex and nelems
mspan.freeindex and nelems can fit into uint16 for all possible
values. Use uint16 instead of uintptr.

Change-Id: Ifce20751e81d5022be1f6b5cbb5fbe4fd1728b1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/451359
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-02 20:39:21 +00:00
Sven Anderson 697644070c runtime: improve Pinner with gcBits
This change replaces the statically sized pinnerBits with gcBits
based ones, that are copied in each GC cycle if they exist.  The
pinnerBits now include a second bit per object, that indicates if a
pinner counter for multi-pins exists, in order to avoid unnecessary
specials iterations.

This is a follow-up to CL 367296.

Change-Id: I82e38cecd535e18c3b3ae54b5cc67d3aeeaafcfd
Reviewed-on: https://go-review.googlesource.com/c/go/+/493275
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2023-05-19 23:21:57 +00:00
Michael Anthony Knyszek 7c91e1e568 runtime: replace raw traceEv with traceBlockReason in gopark
This change adds traceBlockReason which leaks fewer implementation
details of the tracer to the runtime. Currently, gopark is called with
an explicit trace event, but this leaks details about trace internals
throughout the runtime.

This change will make it easier to change out the trace implementation.

Change-Id: Id633e1704d2c8838c6abd1214d9695537c4ac7db
Reviewed-on: https://go-review.googlesource.com/c/go/+/494185
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-05-19 20:47:25 +00:00
Michael Anthony Knyszek a3e90dc377 runtime: add eager scavenging details to GODEBUG=scavtrace=1
Also, clean up atomics on released-per-cycle while we're here.

For #57069.

Change-Id: I14026e8281f01dea1e8c8de6aa8944712b7b24d9
Reviewed-on: https://go-review.googlesource.com/c/go/+/495916
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-19 13:38:43 +00:00
Michael Anthony Knyszek b7e767b022 runtime: capture per-p trace state in a type
More tightening up of the tracer's interface.

Change-Id: I992141c7f30e5c2d5d77d1fcd6817d35bc6e5f6d
Reviewed-on: https://go-review.googlesource.com/c/go/+/494191
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-17 14:46:57 +00:00
Michael Anthony Knyszek 8992bb19ad runtime: replace trace.enabled with traceEnabled
[git-generate]
cd src/runtime
grep -l 'trace\.enabled' *.go | grep -v "trace.go" | xargs sed -i 's/trace\.enabled/traceEnabled()/g'

Change-Id: I14c7821c1134690b18c8abc0edd27abcdabcad72
Reviewed-on: https://go-review.googlesource.com/c/go/+/494181
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-05-11 21:27:08 +00:00
Michael Anthony Knyszek 8fa9e3beee runtime: manage huge pages explicitly
This change makes it so that on Linux the Go runtime explicitly marks
page heap memory as either available to be backed by hugepages or not
using heuristics based on density.

The motivation behind this change is twofold:
1. In default Linux configurations, khugepaged can recoalesce hugepages
   even after the scavenger breaks them up, resulting in significant
   overheads for small heaps when their heaps shrink.
2. The Go runtime already has some heuristics about this, but those
   heuristics appear to have bit-rotted and result in haphazard
   hugepage management. Unlucky (but otherwise fairly dense) regions of
   memory end up not backed by huge pages while sparse regions end up
   accidentally marked MADV_HUGEPAGE and are not later broken up by the
   scavenger, because it already got the memory it needed from more
   dense sections (this is more likely to happen with small heaps that
   go idle).

In this change, the runtime uses a new policy:

1. Mark all new memory MADV_HUGEPAGE.
2. Track whether each page chunk (4 MiB) became dense during the GC
   cycle. Mark those MADV_HUGEPAGE, and hide them from the scavenger.
3. If a chunk is not dense for 1 full GC cycle, make it visible to the
   scavenger.
4. The scavenger marks a chunk MADV_NOHUGEPAGE before it scavenges it.

This policy is intended to try and back memory that is a good candidate
for huge pages (high occupancy) with huge pages, and give memory that is
not (low occupancy) to the scavenger. Occupancy is defined not just by
occupancy at any instant of time, but also occupancy in the near future.
It's generally true that by the end of a GC cycle the heap gets quite
dense (from the perspective of the page allocator).

Because we want scavenging and huge page management to happen together
(the right time to MADV_NOHUGEPAGE is just before scavenging in order to
break up huge pages and keep them that way) and the cost of applying
MADV_HUGEPAGE and MADV_NOHUGEPAGE is somewhat high, the scavenger avoids
releasing memory in dense page chunks. All this together means the
scavenger will now more generally release memory on a ~1 GC cycle delay.

Notably this has implications for scavenging to maintain the memory
limit and the runtime/debug.FreeOSMemory API. This change makes it so
that in these cases all memory is visible to the scavenger regardless of
sparseness and delays the page allocator in re-marking this memory with
MADV_NOHUGEPAGE for around 1 GC cycle to mitigate churn.

The end result of this change should be little-to-no performance
difference for dense heaps (MADV_HUGEPAGE works a lot like the default
unmarked state) but should allow the scavenger to more effectively take
back fragments of huge pages. The main risk here is churn, because
MADV_HUGEPAGE usually forces the kernel to immediately back memory with
a huge page. That's the reason for the large amount of hysteresis (1
full GC cycle) and why the definition of high density is 96% occupancy.

Fixes #55328.

Change-Id: I8da7998f1a31b498a9cc9bc662c1ae1a6bf64630
Reviewed-on: https://go-review.googlesource.com/c/go/+/436395
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-19 14:30:00 +00:00
cui fliter af9f21289f runtime: fix function name in comments
Change-Id: I18bb87bfdea8b6d7994091ced5134aa2549f221e
Reviewed-on: https://go-review.googlesource.com/c/go/+/472476
Run-TryBot: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-03-01 22:20:32 +00:00
Michael Anthony Knyszek a7b75972f2 runtime: check for overflow in sweep assist
The sweep assist computation is intentionally racy for performance,
since the specifics of sweep assist aren't super sensitive to error.
However, if overflow occurs when computing the live heap delta, we can
end up with a massive sweep target that causes the sweep assist to sweep
until sweep termination, causing severe latency issues. In fact, because
heapLive doesn't always increase monotonically then anything that
flushes mcaches will cause _all_ allocating goroutines to inevitably get
stuck in sweeping.

Consider the following scenario:
1. SetGCPercent is called, updating sweepHeapLiveBasis to heapLive.
2. Very shortly after, ReadMemStats is called, flushing mcaches and
   decreasing heapLive below the value sweepHeapLiveBasis was set to.
3. Every allocating goroutine goes to refill its mcache, calls into
   deductSweepCredit for sweep assist, and gets stuck sweeping until
   the sweep phase ends.

Fix this by just checking for overflow in the delta live heap calculation
and if it would overflow, pick a small delta live heap. This probably
means that no sweeping will happen at all, but that's OK. This is a
transient state and the runtime will recover as soon as heapLive
increases again.

Note that deductSweepCredit doesn't check overflow on other operations
but that's OK: those operations are signed and extremely unlikely to
overflow. The subtraction targeted by this CL is only a problem because
it's unsigned. An alternative fix would be to make the operation signed,
but being explicit about the overflow situation seems worthwhile.

Fixes #57523.

Change-Id: Ib18f71f53468e913548aac6e5358830c72ef0215
Reviewed-on: https://go-review.googlesource.com/c/go/+/460376
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-02-14 23:13:03 +00:00
Cherry Mui febe7b8e2a runtime: make GC see object as allocated after it is initialized
When the GC is scanning some memory (possibly conservatively),
finding a pointer, while concurrently another goroutine is
allocating an object at the same address as the found pointer, the
GC may see the pointer before the object and/or the heap bits are
initialized. This may cause the GC to see bad pointers and
possibly crash.

To prevent this, we make it that the scanner can only see the
object as allocated after the object and the heap bits are
initialized. Currently the allocator uses freeindex to find the
next available slot, and that code is coupled with updating the
free index to a new slot past it. The scanner also uses the
freeindex to determine if an object is allocated. This is somewhat
racy. This CL makes the scanner use a different field, which is
only updated after the object initialization (and a memory
barrier).

Fixes #54596.

Change-Id: I2a57a226369926e7192c253dd0d21d3faf22297c
Reviewed-on: https://go-review.googlesource.com/c/go/+/449017
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-11-15 02:55:24 +00:00
Jakub Ciolek 2df6c1abce runtime: remove the started field from sweepdata
This bool doesn't seem to be used anymore. Remove it.

Change-Id: Ic73346a98513c392d89482c5e1d818a90d713516
Reviewed-on: https://go-review.googlesource.com/c/go/+/419654
Run-TryBot: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
2022-11-08 21:31:58 +00:00
Michael Anthony Knyszek 7866538d25 runtime: add safe arena support to the runtime
This change adds an API to the runtime for arenas. A later CL can
potentially export it as an experimental API, but for now, just the
runtime implementation will suffice.

The purpose of arenas is to improve efficiency, primarily by allowing
for an application to manually free memory, thereby delaying garbage
collection. It comes with other potential performance benefits, such as
better locality, a better allocation strategy, and better handling of
interior pointers by the GC.

This implementation is based on one by danscales@google.com with a few
significant differences:
* The implementation lives entirely in the runtime (all layers).
* Arena chunks are the minimum of 8 MiB or the heap arena size. This
  choice is made because in practice 64 MiB appears to be way too large
  of an area for most real-world use-cases.
* Arena chunks are not unmapped, instead they're placed on an evacuation
  list and when there are no pointers left pointing into them, they're
  allowed to be reused.
* Reusing partially-used arena chunks no longer tries to find one used
  by the same P first; it just takes the first one available.
* In order to ensure worst-case fragmentation is never worse than 25%,
  only types and slice backing stores whose sizes are 1/4th the size of
  a chunk or less may be used. Previously larger sizes, up to the size
  of the chunk, were allowed.
* ASAN, MSAN, and the race detector are fully supported.
* Sets arena chunks to fault that were deferred at the end of mark
  termination (a non-public patch once did this; I don't see a reason
  not to continue that).

For #51317.

Change-Id: I83b1693a17302554cb36b6daa4e9249a81b1644f
Reviewed-on: https://go-review.googlesource.com/c/go/+/423359
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-10-12 20:23:30 +00:00
Michael Anthony Knyszek 8451529e9a runtime: tweak bgsweep "low-priority" heuristic
Currently bgsweep attempts to be a low-priority background goroutine
that runs mainly when the application is mostly idle. To avoid
complicating the scheduler further, it achieves this with a simple
heuristic: call Gosched after each span swept. While this is somewhat
inefficient as there's scheduling overhead on each iteration, it's
mostly fine because it tends to just come out of idle time anyway. In a
busy system, the call to Gosched quickly puts bgsweep at the back of
scheduler queues.

However, what's problematic about this heuristic is the number of
tracing events it produces. Average span sweeping latencies have been
measured as low as 30 ns, so every 30 ns in the sweep phase, with
available idle time, there would be a few trace events emitted. This
could result in an overwhelming number, making traces much larger than
they need to be. It also pollutes other observability tools, like the
scheduling latencies runtime metric, because bgsweep stays runnable the
whole time.

This change fixes these problems with two modifications to the
heursitic:

1. Check if there are any idle Ps before yielding. If there are, don't
   yield.
2. Sweep at least 10 spans before trying to yield.

(1) is doing most of the work here. This change assumes that the
presence of idle Ps means that there is available CPU time, so bgsweep
is already making use of idle time and there's no reason it should stop.
This will have the biggest impact on the aforementioned issues.

(2) is a mitigation for the case where GOMAXPROCS=1, because we won't
ever observe a zero idle P count. It does mean that bgsweep is a little
bit higher priority than before because it yields its time less often,
so it could interfere with goroutine scheduling latencies more. However,
by sweeping 10 spans before volunteering time, we directly reduce trace
event production by 90% in all cases. The impact on scheduling latencies
should be fairly minimal, as sweeping a span is already so fast, that
sweeping 10 is unlikely to make a dent in any meaningful end-to-end
latency. In fact, it may even improve application latencies overall by
freeing up spans and sweep work from goroutines allocating memory. It
may be worth considering pushing this number higher in the future.

Another reason to do (2) is to reduce contention on npidle, which will
be checked as part of (1), but this is a fairly minor concern. The main
reason is to capture the GOMAXPROCS=1 case.

Fixes #54767.

Change-Id: I4361400f17197b8ab84c01f56203f20575b29fc6
Reviewed-on: https://go-review.googlesource.com/c/go/+/429615
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2022-09-16 16:33:11 +00:00
Michael Pratt 3a9281ff61 runtime: convert gcController.heapLive to atomic type
Atomic operations are used even during STW for consistency.

For #53821.

Change-Id: Ibe7afe5cf893b1288ce24fc96b7691b1f81754ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/417775
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-08 14:11:09 +00:00
Michael Pratt 0e18cf6d09 runtime: trivial replacements of _g_ in GC files
Change-Id: Iedf10558d9a1d3b80a151927b99660b688ed9ccb
Reviewed-on: https://go-review.googlesource.com/c/go/+/418585
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2022-08-02 18:51:53 +00:00
Michael Anthony Knyszek 7c404d59db runtime: store consistent total allocation stats as uint64
Currently the consistent total allocation stats are managed as uintptrs,
which means they can easily overflow on 32-bit systems. Fix this by
storing these stats as uint64s. This will cause some minor performance
degradation on 32-bit systems, but there really isn't a way around this,
and it affects the correctness of the metrics we export.

Fixes #52680.

Change-Id: I7e6ca44047d46b4bd91c6f87c2d29f730e0d6191
Reviewed-on: https://go-review.googlesource.com/c/go/+/403758
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2022-05-03 19:58:15 +00:00
Michael Anthony Knyszek 91f863013e runtime: redesign scavenging algorithm
Currently the runtime's scavenging algorithm involves running from the
top of the heap address space to the bottom (or as far as it gets) once
per GC cycle. Once it treads some ground, it doesn't tread it again
until the next GC cycle.

This works just fine for the background scavenger, for heap-growth
scavenging, and for debug.FreeOSMemory. However, it breaks down in the
face of a memory limit for small heaps in the tens of MiB. Basically,
because the scavenger never retreads old ground, it's completely
oblivious to new memory it could scavenge, and that it really *should*
in the face of a memory limit.

Also, every time some thread goes to scavenge in the runtime, it
reserves what could be a considerable amount of address space, hiding it
from other scavengers.

This change modifies and simplifies the implementation overall. It's
less code with complexities that are much better encapsulated. The
current implementation iterates optimistically over the address space
looking for memory to scavenge, keeping track of what it last saw. The
new implementation does the same, but instead of directly iterating over
pages, it iterates over chunks. It maintains an index of chunks (as a
bitmap over the address space) that indicate which chunks may contain
scavenge work. The page allocator populates this index, while scavengers
consume it and iterate over it optimistically.

This has a two key benefits:
1. Scavenging is much simpler: find a candidate chunk, and check it,
   essentially just using the scavengeOne fast path. There's no need for
   the complexity of iterating beyond one chunk, because the index is
   lock-free and already maintains that information.
2. If pages are freed to the page allocator (always guaranteed to be
   unscavenged), the page allocator immediately notifies all scavengers
   of the new source of work, avoiding the hiding issues of the old
   implementation.

One downside of the new implementation, however, is that it's
potentially more expensive to find pages to scavenge. In the past, if
a single page would become free high up in the address space, the
runtime's scavengers would ignore it. Now that scavengers won't, one or
more scavengers may need to iterate potentially across the whole heap to
find the next source of work. For the background scavenger, this just
means a potentially less reactive scavenger -- overall it should still
use the same amount of CPU. It means worse overheads for memory limit
scavenging, but that's not exactly something with a baseline yet.

In practice, this shouldn't be too bad, hopefully since the chunk index
is extremely compact. For a 48-bit address space, the index is only 8
MiB in size at worst, but even just one physical page in the index is
able to support up to 128 GiB heaps, provided they aren't terribly
sparse. On 32-bit platforms, the index is only 128 bytes in size.

For #48409.

Change-Id: I72b7e74365046b18c64a6417224c5d85511194fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/399474
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03 15:13:53 +00:00
Michael Anthony Knyszek 375d696ddf runtime: move inconsistent memstats into gcController
Fundamentally, all of these memstats exist to serve the runtime in
managing memory. For the sake of simpler testing, couple these stats
more tightly with the GC.

This CL was mostly done automatically. The fields had to be moved
manually, but the references to the fields were updated via

    gofmt -w -r 'memstats.<field> -> gcController.<field>' *.go

For #48409.

Change-Id: Ic036e875c98138d9a11e1c35f8c61b784c376134
Reviewed-on: https://go-review.googlesource.com/c/go/+/397678
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03 15:12:38 +00:00
Michael Anthony Knyszek 85d466493d runtime: maintain a direct count of total allocs and frees
This will be used by the memory limit computation to determine
overheads.

For #48409.

Change-Id: Iaa4e26e1e6e46f88d10ba8ebb6b001be876dc5cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/394220
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03 15:12:12 +00:00
Michael Anthony Knyszek d29f5247b8 runtime: refactor the scavenger and make it testable
This change refactors the scavenger into a type whose methods represent
the actual function and scheduling of the scavenger. It also stubs out
access to global state in order to make it testable.

This change thus also adds a test for the scavenger. In writing this
test, I discovered the lack of a behavior I expected: if the
pageAlloc.scavenge returns < the bytes requested scavenged, that means
the heap is exhausted. This has been true this whole time, but was not
documented or explicitly relied upon. This change rectifies that. In
theory this means the scavenger could spin in run() indefinitely (as
happened in the test) if shouldStop never told it to stop. In practice,
shouldStop fires long before the heap is exhausted, but for future
changes it may be important. At the very least it's good to be
intentional about these things.

While we're here, I also moved the call to stopTimer out of wake and
into sleep. There's no reason to add more operations to a context that's
already precarious (running without a P on sysmon).

Change-Id: Ib31b86379fd9df84f25ae282734437afc540da5c
Reviewed-on: https://go-review.googlesource.com/c/go/+/384734
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-26 22:15:21 +00:00
Daniel Martí 3e7ffb862f all: consistently use US spelling of present participles
It has been agreed that we should prefer the US spelling of words like
"canceling" over "cancelling"; for example, see https://go.dev/cl/14526.

Fix a few occurrences of the "canceling" inconsistency, as well as:

* signaling
* tunneling
* marshaling

Change-Id: I99f3ba0a700a9f0292bc6c1b110af31dd05f1ff0
Reviewed-on: https://go-review.googlesource.com/c/go/+/398734
Trust: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-04-08 13:44:41 +00:00
Russ Cox 9839668b56 all: separate doc comment from //go: directives
A future change to gofmt will rewrite

	// Doc comment.
	//go:foo

to

	// Doc comment.
	//
	//go:foo

Apply that change preemptively to all comments (not necessarily just doc comments).

For #51082.

Change-Id: Iffe0285418d1e79d34526af3520b415a12203ca9
Reviewed-on: https://go-review.googlesource.com/c/go/+/384260
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-05 17:54:15 +00:00
“kinggo” 4dfbb89f58 runtime: typo fix cyle -> cycle
Change-Id: I213fa8aa9b9c2537a189677394ddd30c62312518
GitHub-Last-Rev: ccafdee944
GitHub-Pull-Request: golang/go#50268
Reviewed-on: https://go-review.googlesource.com/c/go/+/373336
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Zhuo Meng <mzh@golangcn.org>
2021-12-21 01:51:23 +00:00
fanzha02 6f327f7b88 runtime, syscall: add calls to asan functions
Add explicit address sanitizer instrumentation to the runtime and
syscall packages. The compiler does not instrument the runtime
package. It does instrument the syscall package, but we need to add
a couple of cases that it can't see.

Refer to the implementation of the asan malloc runtime library,
this patch also allocates extra memory as the redzone, around the
returned memory region, and marks the redzone as unaddressable to
detect the overflows or underflows.

Updates #44853.

Change-Id: I2753d1cc1296935a66bf521e31ce91e35fcdf798
Reviewed-on: https://go-review.googlesource.com/c/go/+/298614
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: fannie zhang <Fannie.Zhang@arm.com>
2021-11-02 05:35:11 +00:00
Michael Anthony Knyszek 413672fc84 runtime: detangle sweeper pacing from GC pacing
The sweeper's pacing state is global, so detangle it from the GC pacer's
state updates so that the GC pacer can be tested.

For #44167.

Change-Id: Ibcea989cd435b73c5891f777d9f95f9604e03bd1
Reviewed-on: https://go-review.googlesource.com/c/go/+/309273
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2021-10-29 18:23:03 +00:00
Michael Anthony Knyszek 3aecb3a8f7 runtime: fix sweep termination condition
Currently, there is a chance that the sweep termination condition could
flap, causing e.g. runtime.GC to return before all sweep work has not
only been drained, but also completed. CL 307915 and CL 307916 attempted
to fix this problem, but it is still possible that mheap_.sweepDrained is
marked before any outstanding sweepers are accounted for in
mheap_.sweepers, leaving a window in which a thread could observe
isSweepDone as true before it actually was (and after some time it would
revert to false, then true again, depending on the number of outstanding
sweepers at that point).

This change fixes the sweep termination condition by merging
mheap_.sweepers and mheap_.sweepDrained into a single atomic value.

This value is updated such that a new potential sweeper will increment
the oustanding sweeper count iff there are still outstanding spans to be
swept without an outstanding sweeper to pick them up. This design
simplifies the sweep termination condition into a single atomic load and
comparison and ensures the condition never flaps.

Updates #46500.
Fixes #45315.

Change-Id: I6d69aff156b8d48428c4cc8cfdbf28be346dbf04
Reviewed-on: https://go-review.googlesource.com/c/go/+/333389
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2021-10-29 17:12:47 +00:00
Michael Anthony Knyszek 016d5eea11 runtime: retype mheap.reclaimCredit as atomic.Uintptr
[git-generate]
cd src/runtime
mv export_test.go export.go
GOROOT=$(dirname $(dirname $PWD)) rf '
  add mheap.reclaimCredit \
// reclaimCredit is spare credit for extra pages swept. Since \
// the page reclaimer works in large chunks, it may reclaim \
// more than requested. Any spare pages released go to this \
// credit pool. \
reclaimCredit_ atomic.Uintptr
  ex {
    import "runtime/internal/atomic"

    var t mheap
    var v, w uintptr
    var d uintptr

    t.reclaimCredit -> t.reclaimCredit_.Load()
    t.reclaimCredit = v -> t.reclaimCredit_.Store(v)
    atomic.Loaduintptr(&t.reclaimCredit) -> t.reclaimCredit_.Load()
    atomic.LoadAcquintptr(&t.reclaimCredit) -> t.reclaimCredit_.LoadAcquire()
    atomic.Storeuintptr(&t.reclaimCredit, v) -> t.reclaimCredit_.Store(v)
    atomic.StoreReluintptr(&t.reclaimCredit, v) -> t.reclaimCredit_.StoreRelease(v)
    atomic.Casuintptr(&t.reclaimCredit, v, w) -> t.reclaimCredit_.CompareAndSwap(v, w)
    atomic.Xchguintptr(&t.reclaimCredit, v) -> t.reclaimCredit_.Swap(v)
    atomic.Xadduintptr(&t.reclaimCredit, d) -> t.reclaimCredit_.Add(d)
  }
  rm mheap.reclaimCredit
  mv mheap.reclaimCredit_ mheap.reclaimCredit
'
mv export.go export_test.go

Change-Id: I2c567781a28f5d8c2275ff18f2cf605b82f22d09
Reviewed-on: https://go-review.googlesource.com/c/go/+/356712
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2021-10-20 20:39:36 +00:00
Michael Anthony Knyszek 1dff8f0a05 runtime: retype mheap.pagesSweptBasis as atomic.Uint64
[git-generate]
cd src/runtime
mv export_test.go export.go
GOROOT=$(dirname $(dirname $PWD)) rf '
  add mheap.pagesSweptBasis pagesSweptBasis_ atomic.Uint64 // pagesSwept to use as the origin of the sweep ratio
  ex {
    import "runtime/internal/atomic"

    var t mheap
    var v, w uint64
    var d int64

    t.pagesSweptBasis -> t.pagesSweptBasis_.Load()
    t.pagesSweptBasis = v -> t.pagesSweptBasis_.Store(v)
    atomic.Load64(&t.pagesSweptBasis) -> t.pagesSweptBasis_.Load()
    atomic.LoadAcq64(&t.pagesSweptBasis) -> t.pagesSweptBasis_.LoadAcquire()
    atomic.Store64(&t.pagesSweptBasis, v) -> t.pagesSweptBasis_.Store(v)
    atomic.StoreRel64(&t.pagesSweptBasis, v) -> t.pagesSweptBasis_.StoreRelease(v)
    atomic.Cas64(&t.pagesSweptBasis, v, w) -> t.pagesSweptBasis_.CompareAndSwap(v, w)
    atomic.Xchg64(&t.pagesSweptBasis, v) -> t.pagesSweptBasis_.Swap(v)
    atomic.Xadd64(&t.pagesSweptBasis, d) -> t.pagesSweptBasis_.Add(d)
  }
  rm mheap.pagesSweptBasis
  mv mheap.pagesSweptBasis_ mheap.pagesSweptBasis
'
mv export.go export_test.go

Change-Id: Id9438184b9bd06d96894c02376385bad45dee154
Reviewed-on: https://go-review.googlesource.com/c/go/+/356710
Reviewed-by: Austin Clements <austin@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
2021-10-20 20:39:29 +00:00
Michael Anthony Knyszek e90492882a runtime: retype mheap.pagesSwept as atomic.Uint64
[git-generate]
cd src/runtime
mv export_test.go export.go
GOROOT=$(dirname $(dirname $PWD)) rf '
  add mheap.pagesSwept pagesSwept_ atomic.Uint64 // pages swept this cycle
  ex {
    import "runtime/internal/atomic"

    var t mheap
    var v, w uint64
    var d int64

    t.pagesSwept -> t.pagesSwept_.Load()
    t.pagesSwept = v -> t.pagesSwept_.Store(v)
    atomic.Load64(&t.pagesSwept) -> t.pagesSwept_.Load()
    atomic.LoadAcq64(&t.pagesSwept) -> t.pagesSwept_.LoadAcquire()
    atomic.Store64(&t.pagesSwept, v) -> t.pagesSwept_.Store(v)
    atomic.StoreRel64(&t.pagesSwept, v) -> t.pagesSwept_.StoreRelease(v)
    atomic.Cas64(&t.pagesSwept, v, w) -> t.pagesSwept_.CompareAndSwap(v, w)
    atomic.Xchg64(&t.pagesSwept, v) -> t.pagesSwept_.Swap(v)
    atomic.Xadd64(&t.pagesSwept, d) -> t.pagesSwept_.Add(d)
  }
  rm mheap.pagesSwept
  mv mheap.pagesSwept_ mheap.pagesSwept
'
mv export.go export_test.go

Change-Id: Ife99893d90a339655f604bc3a64ee3decec645ea
Reviewed-on: https://go-review.googlesource.com/c/go/+/356709
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2021-10-20 20:39:25 +00:00
Cherry Mui 3298c749ac [dev.typeparams] runtime: undo go'd closure argument workaround
In CL 298669 we added defer/go wrapping, and, as it is not
allowed for closures to escape when compiling runtime, we worked
around it by rewriting go'd closures to argumentless
non-capturing closures, so it is not a real closure and so not
needed to escape.

Previous CL removes the restriction. Now we can undo the
workaround.

Updates #40724.

Change-Id: Ic7bf129da4aee7b7fdb7157414eca943a6a27264
Reviewed-on: https://go-review.googlesource.com/c/go/+/325110
Trust: Cherry Mui <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-06-04 17:00:32 +00:00
Michael Anthony Knyszek f2d5bd1ad3 runtime: move internal GC statistics from memstats to gcController
This change moves certain important but internal-only GC statistics from
memstats into gcController. These statistics are mainly used in pacing
the GC, so it makes sense to keep them in the pacer's state.

This CL was mostly generated via

rf '
    ex . {
	memstats.gc_trigger -> gcController.trigger
	memstats.triggerRatio -> gcController.triggerRatio
	memstats.heap_marked -> gcController.heapMarked
	memstats.heap_live -> gcController.heapLive
	memstats.heap_scan -> gcController.heapScan
    }
'

except for a few special cases, like updating names in comments and when
these fields are used within gcControllerState methods (at which point
they're accessed through the reciever).

For #44167.

Change-Id: I6bd1602585aeeb80818ded24c07d8e6fec992b93
Reviewed-on: https://go-review.googlesource.com/c/go/+/306598
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2021-04-13 23:42:29 +00:00
Austin Clements 1b736b3c19 runtime: consolidate "is sweep done" conditions
The runtime currently has two different notions of sweep completion:

1. All spans are either swept or have begun sweeping.

2. The sweeper has *finished* sweeping all spans.

Having both is confusing (it doesn't help that the documentation is
often unclear or wrong). Condition 2 is stronger and the theoretical
slight optimization that condition 1 could impact is never actually
useful. Hence, this CL consolidates both conditions down to condition 2.

Updates #45315.

Change-Id: I55c84d767d74eb31a004a5619eaba2e351162332
Reviewed-on: https://go-review.googlesource.com/c/go/+/307916
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-04-12 19:22:52 +00:00
Austin Clements a25a77aed2 runtime: block sweep completion on all sweep paths
The runtime currently has two different notions of sweep completion:

1. All spans are either swept or have begun sweeping.

2. The sweeper has *finished* sweeping all spans.

Most things depend on condition 1. Notably, GC correctness depends on
condition 1, but since all sweep operations a non-preemptible, the STW
at the beginning of GC forces condition 1 to become condition 2.

runtime.GC(), however, depends on condition 2, since the intent is to
complete a complete GC cycle, and also update the heap profile (which
can only be done after sweeping is complete).

However, the way we compute condition 2 is racy right now and may in
fact only indicate condition 1. Specifically, sweepone blocks
condition 2 until all sweepone calls are done, but there are many
other ways to enter the sweeper that don't block this. Hence, sweepone
may see that there are no more spans in the sweep list and see that
it's the last sweepone and declare sweeping done, while there's some
other sweeper still working on a span.

Fix this by making sure every entry to the sweeper participates in the
protocol that blocks condition 2. To make sure we get this right, this
CL introduces a type to track sweep blocking and (lightly) enforces
span sweep ownership via the type system. This has the nice
side-effect of abstracting the pattern of acquiring sweep ownership
that's currently repeated in many different places.

Fixes #45315.

Change-Id: I7fab30170c5ae14c8b2f10998628735b8be6d901
Reviewed-on: https://go-review.googlesource.com/c/go/+/307915
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-04-12 19:22:50 +00:00
Austin Clements 4e1bf8ed38 runtime: add GC testing helpers for regabi signature fuzzer
This CL adds a set of helper functions for testing GC interactions.
These are intended for use in the regabi signature fuzzer, but are
generally useful for GC tests, so we make them generally available to
runtime tests.

These provide:

1. An easy way to force stack movement, for testing stack copying.

2. A simple and robust way to check the reachability of a set of
pointers.

3. A way to check what general category of memory a pointer points to,
mostly so tests can make sure they're testing what they mean to.

For #40724, but generally useful.

Change-Id: I15d33ccb3f5a792c0472a19c2cc9a8b4a9356a66
Reviewed-on: https://go-review.googlesource.com/c/go/+/305330
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-03-29 21:50:16 +00:00
Austin Clements 1ef114d12c runtime: abstract specials list iteration
The specials processing loop in mspan.sweep is about to get more
complicated and I'm too allergic to list manipulation to open code
more of it there.

Change-Id: I767a0889739da85fb2878fc06a5c55b73bf2ba7d
Reviewed-on: https://go-review.googlesource.com/c/go/+/305551
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2021-03-29 21:50:14 +00:00
Than McIntosh 769d4b68ef cmd/compile: wrap/desugar defer calls for register abi
Adds code to the compiler's "order" phase to rewrite go and defer
statements to always be argument-less. E.g.

 defer f(x,y)       =>     x1, y1 := x, y
			   defer func() { f(x1, y1) }

This transformation is not beneficial on its own, but it helps
simplify runtime defer handling for the new register ABI (when
invoking deferred functions on the panic path, the runtime doesn't
need to manage the complexity of determining which args to pass in
register vs memory).

This feature is currently enabled by default if GOEXPERIMENT=regabi or
GOEXPERIMENT=regabidefer is in effect.

Included in this CL are some workarounds in the runtime to insure that
"go" statement targets in the runtime are argument-less already (since
wrapping them can potentially introduce heap-allocated closures, which
are currently not allowed). The expectation is that these workarounds
will be temporary, and can go away once we either A) change the rules
about heap-allocated closures, or B) implement some other scheme for
handling go statements.

Change-Id: I01060d79a6b140c6f0838d6e6813f807ccdca319
Reviewed-on: https://go-review.googlesource.com/c/go/+/298669
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2021-03-23 23:08:19 +00:00
Michael Anthony Knyszek 39a5ee52b9 runtime: decouple consistent stats from mcache and allow P-less update
This change modifies the consistent stats implementation to keep the
per-P sequence counter on each P instead of each mcache. A valid mcache
is not available everywhere that we want to call e.g. allocSpan, as per
issue #42339. By decoupling these two, we can add a mechanism to allow
contexts without a P to update stats consistently.

In this CL, we achieve that with a mutex. In practice, it will be very
rare for an M to update these stats without a P. Furthermore, the stats
reader also only needs to hold the mutex across the update to "gen"
since once that changes, writers are free to continue updating the new
stats generation. Contention could thus only arise between writers
without a P, and as mentioned earlier, those should be rare.

A nice side-effect of this change is that the consistent stats acquire
and release API becomes simpler.

Fixes #42339.

Change-Id: Ied74ab256f69abd54b550394c8ad7c4c40a5fe34
Reviewed-on: https://go-review.googlesource.com/c/go/+/267158
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-11-02 21:21:46 +00:00
Michael Pratt 6abbfc17c2 runtime: add world-stopped assertions
Stopping the world is an implicit lock for many operations, so we should
assert the world is stopped in functions that require it.

This is enabled along with the rest of lock ranking, though it is a bit
orthogonal and likely cheap enough to enable all the time should we
choose.

Requiring a lock _or_ world stop is common, so that can be expressed as
well.

Updates #40677

Change-Id: If0a58544f4251d367f73c4120c9d39974c6cd091
Reviewed-on: https://go-review.googlesource.com/c/go/+/248577
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Trust: Michael Pratt <mpratt@google.com>
2020-10-30 20:20:58 +00:00
Michael Anthony Knyszek 79781e8dd3 runtime: move malloc stats into consistentHeapStats
This change moves the mcache-local malloc stats into the
consistentHeapStats structure so the malloc stats can be managed
consistently with the memory stats. The one exception here is
tinyAllocs for which moving that into the global stats would incur
several atomic writes on the fast path. Microbenchmarks for just one CPU
core have shown a 50% loss in throughput. Since tiny allocation counnt
isn't exposed anyway and is always blindly added to both allocs and
frees, let that stay inconsistent and flush the tiny allocation count
every so often.

Change-Id: I2a4b75f209c0e659b9c0db081a3287bf227c10ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/247039
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 18:28:56 +00:00
Michael Anthony Knyszek c863849800 runtime: rename mcache fields to match Go style
This change renames a bunch of malloc statistics stored in the mcache
that are all named with the "local_" prefix. It also renames largeAlloc
to allocLarge to prevent a naming conflict, and next_sample because it
would be the last mcache field with the old C naming style.

Change-Id: I29695cb83b397a435ede7e9ad5c3c9be72767ea3
Reviewed-on: https://go-review.googlesource.com/c/go/+/246969
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 17:26:48 +00:00
Michael Anthony Knyszek e6d0bd2b89 runtime: clean up old mcentral code
This change deletes the old mcentral implementation from the code base
and the newMCentralImpl feature flag along with it.

Updates #37487.

Change-Id: Ibca8f722665f0865051f649ffe699cbdbfdcfcf2
Reviewed-on: https://go-review.googlesource.com/c/go/+/221184
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-08-17 20:06:49 +00:00
Michael Anthony Knyszek 260dff3ca3 runtime: clean up old markrootSpans
This change removes the old markrootSpans implementation and deletes the
feature flag.

Updates #37487.

Change-Id: Idb5a2559abcc3be5a7da6f2ccce1a86e1d7634e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/221183
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2020-08-17 20:06:41 +00:00
Austin Clements ea2de3346f runtime: detect and report zombie slots during sweeping
A zombie slot is a slot that is marked, but isn't allocated. This can
indicate a bug in the GC, or a bad use of unsafe.Pointer. Currently,
the sweeper has best-effort detection for zombie slots: if there are
more marked slots than allocated slots, then there must have been a
zombie slot. However, this is imprecise since it only compares totals
and it reports almost no information that may be helpful to debug the
issue.

Add a precise check that compares the mark and allocation bitmaps and
reports detailed information if it detects a zombie slot.

No appreciable effect on performance as measured by the sweet
benchmarks:

name                                old time/op  new time/op  delta
BiogoIgor                            15.8s ± 2%   15.8s ± 2%    ~     (p=0.421 n=24+25)
BiogoKrishna                         15.6s ± 2%   15.8s ± 5%    ~     (p=0.082 n=22+23)
BleveIndexBatch100                   4.90s ± 3%   4.88s ± 2%    ~     (p=0.627 n=25+24)
CompileTemplate                      204ms ± 1%   205ms ± 0%  +0.22%  (p=0.010 n=24+23)
CompileUnicode                      77.8ms ± 2%  78.0ms ± 1%    ~     (p=0.236 n=25+24)
CompileGoTypes                       729ms ± 0%   731ms ± 0%  +0.26%  (p=0.000 n=24+24)
CompileCompiler                      3.52s ± 0%   3.52s ± 1%    ~     (p=0.152 n=25+25)
CompileSSA                           8.06s ± 1%   8.05s ± 0%    ~     (p=0.192 n=25+24)
CompileFlate                         132ms ± 1%   132ms ± 1%    ~     (p=0.373 n=24+24)
CompileGoParser                      163ms ± 1%   164ms ± 1%  +0.32%  (p=0.003 n=24+25)
CompileReflect                       453ms ± 1%   455ms ± 1%  +0.39%  (p=0.000 n=22+22)
CompileTar                           181ms ± 1%   181ms ± 1%  +0.20%  (p=0.029 n=24+21)
CompileXML                           244ms ± 1%   244ms ± 1%    ~     (p=0.065 n=24+24)
CompileStdCmd                        15.8s ± 2%   15.7s ± 2%    ~     (p=0.059 n=23+24)
FoglemanFauxGLRenderRotateBoat       13.4s ±11%   12.8s ± 0%    ~     (p=0.377 n=25+24)
FoglemanPathTraceRenderGopherIter1   18.6s ± 0%   18.6s ± 0%    ~     (p=0.696 n=23+24)
GopherLuaKNucleotide                 28.7s ± 4%   28.6s ± 5%    ~     (p=0.700 n=25+25)
MarkdownRenderXHTML                  250ms ± 1%   248ms ± 1%  -1.01%  (p=0.000 n=24+24)
[Geo mean]                           1.60s        1.60s       -0.11%

(https://perf.golang.org/search?q=upload:20200517.6)

For #38702.

Change-Id: I8af1fefd5fbf7b9cb665b98f9c4b73d1d08eea81
Reviewed-on: https://go-review.googlesource.com/c/go/+/234100
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-05-21 21:36:40 +00:00
Michael Anthony Knyszek 55ec5182d7 runtime: remove scavAddr in favor of address ranges
This change removes the concept of s.scavAddr in favor of explicitly
reserving and unreserving address ranges. s.scavAddr has several
problems with raciness that can cause the scavenger to miss updates, or
move it back unnecessarily, forcing future scavenge calls to iterate
over searched address space unnecessarily.

This change achieves this by replacing scavAddr with a second addrRanges
which is cloned from s.inUse at the end of each sweep phase. Ranges from
this second addrRanges are then reserved by scavengers (with the
reservation size proportional to the heap size) who are then able to
safely iterate over those ranges without worry of another scavenger
coming in.

Fixes #35788.

Change-Id: Ief01ae170384174875118742f6c26b2a41cbb66d
Reviewed-on: https://go-review.googlesource.com/c/go/+/208378
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2020-05-08 16:24:40 +00:00
Michael Anthony Knyszek 2491c5fd24 runtime: wake scavenger and update address on sweep done
This change modifies the semantics of waking the scavenger: rather than
wake on any update to pacing, wake when we know we will have work to do,
that is, when the sweeper is done. The current scavenger runs over the
address space just once per GC cycle, and we want to maximize the chance
that the scavenger observes the most attractive scavengable memory in
that pass (i.e. free memory with the highest address), so the timing is
important. By having the scavenger awaken and reset its search space
when the sweeper is done, we increase the chance that the scavenger will
observe the most attractive scavengable memory, because no more memory
will be freed that GC cycle (so the highest scavengable address should
now be available).

Furthermore, in applications that go idle, this means the background
scavenger will be awoken even if another GC doesn't happen, which isn't
true today.

However, we're unable to wake the scavenger directly from within the
sweeper; waking the scavenger involves modifying timers and readying
goroutines, the latter of which may trigger an allocation today (and the
sweeper may run during allocation!). Instead, we do the following:

1. Set a flag which is checked by sysmon. sysmon will clear the flag and
   wake the scavenger.
2. Wake the scavenger unconditionally at sweep termination.

The idea behind this policy is that it gets us close enough to the state
above without having to deal with the complexity of waking the scavenger
in deep parts of the runtime. If the application goes idle and sweeping
finishes (so we don't reach sweep termination), then sysmon will wake
the scavenger. sysmon has a worst-case 20 ms delay in responding to this
signal, which is probably fine if the application is completely idle
anyway, but if the application is actively allocating, then the
proportional sweeper should help ensure that sweeping ends very close to
sweep termination, so sweep termination is a perfectly reasonable time
to wake up the scavenger.

Updates #35788.

Change-Id: I84289b37816a7d595d803c72a71b7f5c59d47e6b
Reviewed-on: https://go-review.googlesource.com/c/go/+/207998
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-04-30 18:12:03 +00:00
Michael Anthony Knyszek a13691966a runtime: add new mcentral implementation
Currently mcentral is implemented as a couple of linked lists of spans
protected by a lock. Unfortunately this design leads to significant lock
contention.

The span ownership model is also confusing and complicated. In-use spans
jump between being owned by multiple sources, generally some combination
of a gcSweepBuf, a concurrent sweeper, an mcentral or an mcache.

So first to address contention, this change replaces those linked lists
with gcSweepBufs which have an atomic fast path. Then, we change up the
ownership model: a span may be simultaneously owned only by an mcentral
and the page reclaimer. Otherwise, an mcentral (which now consists of
sweep bufs), a sweeper, or an mcache are the sole owners of a span at
any given time. This dramatically simplifies reasoning about span
ownership in the runtime.

As a result of this new ownership model, sweeping is now driven by
walking over the mcentrals rather than having its own global list of
spans. Because we no longer have a global list and we traditionally
haven't used the mcentrals for large object spans, we no longer have
anywhere to put large objects. So, this change also makes it so that we
keep large object spans in the appropriate mcentral lists.

In terms of the static lock ranking, we add the spanSet spine locks in
pretty much the same place as the mcentral locks, since they have the
potential to be manipulated both on the allocation and sweep paths, like
the mcentral locks.

This new implementation is turned on by default via a feature flag
called go115NewMCentralImpl.

Benchmark results for 1 KiB allocation throughput (5 runs each):

name \ MiB/s  go113       go114       gotip       gotip+this-patch
AllocKiB-1    1.71k ± 1%  1.68k ± 1%  1.59k ± 2%      1.71k ± 1%
AllocKiB-2    2.46k ± 1%  2.51k ± 1%  2.54k ± 1%      2.93k ± 1%
AllocKiB-4    4.27k ± 1%  4.41k ± 2%  4.33k ± 1%      5.01k ± 2%
AllocKiB-8    4.38k ± 3%  5.24k ± 1%  5.46k ± 1%      8.23k ± 1%
AllocKiB-12   4.38k ± 3%  4.49k ± 1%  5.10k ± 1%     10.04k ± 0%
AllocKiB-16   4.31k ± 1%  4.14k ± 3%  4.22k ± 0%     10.42k ± 0%
AllocKiB-20   4.26k ± 1%  3.98k ± 1%  4.09k ± 1%     10.46k ± 3%
AllocKiB-24   4.20k ± 1%  3.97k ± 1%  4.06k ± 1%     10.74k ± 1%
AllocKiB-28   4.15k ± 0%  4.00k ± 0%  4.20k ± 0%     10.76k ± 1%

Fixes #37487.

Change-Id: I92d47355acacf9af2c41bf080c08a8c1638ba210
Reviewed-on: https://go-review.googlesource.com/c/go/+/221182
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-04-27 18:19:26 +00:00