Commit Graph

319 Commits

Lénaïc Huard 52eaed6633 runtime: decorate anonymous memory mappings
Leverage the prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ...) API to name
the anonymous memory areas.

This API was introduced in Linux 5.17 to decorate the anonymous
memory areas shown in /proc/<pid>/maps.

This is already used by glibc. See:
* https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c;h=27dfd1eb907f4615b70c70237c42c552bb4f26a8;hb=HEAD#l2434
* https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/setvmaname.c;h=ea93a5ffbebc9e5a7e32a297138f465724b4725f;hb=HEAD#l63
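
For illustration only (this is not the runtime's implementation; the
prctl constants and the helper below are written out by hand), a
standalone Go sketch of the underlying call looks roughly like this:

        package main

        import (
                "fmt"
                "os"
                "syscall"
                "unsafe"
        )

        const (
                prSetVMA         = 0x53564d41 // PR_SET_VMA
                prSetVMAAnonName = 0          // PR_SET_VMA_ANON_NAME
        )

        // nameAnonMapping labels an anonymous mapping. On kernels older than
        // 5.17 the prctl fails and the mapping simply stays unnamed.
        func nameAnonMapping(p []byte, name string) error {
                cname := append([]byte(name), 0)
                _, _, errno := syscall.Syscall6(syscall.SYS_PRCTL,
                        prSetVMA, prSetVMAAnonName,
                        uintptr(unsafe.Pointer(&p[0])), uintptr(len(p)),
                        uintptr(unsafe.Pointer(&cname[0])), 0)
                if errno != 0 {
                        return errno
                }
                return nil
        }

        func main() {
                mem, err := syscall.Mmap(-1, 0, 4096,
                        syscall.PROT_READ|syscall.PROT_WRITE,
                        syscall.MAP_ANON|syscall.MAP_PRIVATE)
                if err != nil {
                        panic(err)
                }
                if err := nameAnonMapping(mem, "example: demo buffer"); err != nil {
                        fmt.Fprintln(os.Stderr, "prctl:", err)
                }
                data, _ := os.ReadFile("/proc/self/maps")
                fmt.Println(string(data))
        }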

This can be useful when investigating the memory consumption of a
multi-language program.
For a 100% Go program, the pprof profiler can be used to profile the
program's memory consumption. But pprof is only aware of what happens
within the Go world.

In a multi-language program, there may be doubt about whether suspicious
extra memory consumption comes from the Go part or from the native part.

With this change, the following Go program:

        package main

        import (
                "fmt"
                "log"
                "os"
        )

        /*
        #include <stdlib.h>

        void f(void)
        {
          (void)malloc(1024*1024*1024);
        }
        */
        import "C"

        func main() {
                C.f()

                data, err := os.ReadFile("/proc/self/maps")
                if err != nil {
                        log.Fatal(err)
                }
                fmt.Println(string(data))
        }

produces this output:

        $ GLIBC_TUNABLES=glibc.mem.decorate_maps=1 ~/doc/devel/open-source/go/bin/go run .
        00400000-00402000 r--p 00000000 00:21 28451768                           /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name
        00402000-004a4000 r-xp 00002000 00:21 28451768                           /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name
        004a4000-00574000 r--p 000a4000 00:21 28451768                           /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name
        00574000-00575000 r--p 00173000 00:21 28451768                           /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name
        00575000-00580000 rw-p 00174000 00:21 28451768                           /home/lenaic/.cache/go-build/9f/9f25a17baed5a80d03eb080a2ce2a5ff49c17f9a56e28330f0474a2bb74a30a0-d/test_vma_name
        00580000-005a4000 rw-p 00000000 00:00 0
        2e075000-2e096000 rw-p 00000000 00:00 0                                  [heap]
        c000000000-c000400000 rw-p 00000000 00:00 0                              [anon: Go: heap]
        c000400000-c004000000 ---p 00000000 00:00 0                              [anon: Go: heap reservation]
        777f40000000-777f40021000 rw-p 00000000 00:00 0                          [anon: glibc: malloc arena]
        777f40021000-777f44000000 ---p 00000000 00:00 0
        777f44000000-777f44021000 rw-p 00000000 00:00 0                          [anon: glibc: malloc arena]
        777f44021000-777f48000000 ---p 00000000 00:00 0
        777f48000000-777f48021000 rw-p 00000000 00:00 0                          [anon: glibc: malloc arena]
        777f48021000-777f4c000000 ---p 00000000 00:00 0
        777f4c000000-777f4c021000 rw-p 00000000 00:00 0                          [anon: glibc: malloc arena]
        777f4c021000-777f50000000 ---p 00000000 00:00 0
        777f50000000-777f50021000 rw-p 00000000 00:00 0                          [anon: glibc: malloc arena]
        777f50021000-777f54000000 ---p 00000000 00:00 0
        777f55afb000-777f55afc000 ---p 00000000 00:00 0
        777f55afc000-777f562fc000 rw-p 00000000 00:00 0                          [anon: glibc: pthread stack: 216378]
        777f562fc000-777f562fd000 ---p 00000000 00:00 0
        777f562fd000-777f56afd000 rw-p 00000000 00:00 0                          [anon: glibc: pthread stack: 216377]
        777f56afd000-777f56afe000 ---p 00000000 00:00 0
        777f56afe000-777f572fe000 rw-p 00000000 00:00 0                          [anon: glibc: pthread stack: 216376]
        777f572fe000-777f572ff000 ---p 00000000 00:00 0
        777f572ff000-777f57aff000 rw-p 00000000 00:00 0                          [anon: glibc: pthread stack: 216375]
        777f57aff000-777f57b00000 ---p 00000000 00:00 0
        777f57b00000-777f58300000 rw-p 00000000 00:00 0                          [anon: glibc: pthread stack: 216374]
        777f58300000-777f58400000 rw-p 00000000 00:00 0                          [anon: Go: page alloc index]
        777f58400000-777f5a400000 rw-p 00000000 00:00 0                          [anon: Go: heap index]
        777f5a400000-777f6a580000 ---p 00000000 00:00 0                          [anon: Go: scavenge index]
        777f6a580000-777f6a581000 rw-p 00000000 00:00 0                          [anon: Go: scavenge index]
        777f6a581000-777f7a400000 ---p 00000000 00:00 0                          [anon: Go: scavenge index]
        777f7a400000-777f8a580000 ---p 00000000 00:00 0                          [anon: Go: page summary]
        777f8a580000-777f8a581000 rw-p 00000000 00:00 0                          [anon: Go: page alloc]
        777f8a581000-777f9c430000 ---p 00000000 00:00 0                          [anon: Go: page summary]
        777f9c430000-777f9c431000 rw-p 00000000 00:00 0                          [anon: Go: page alloc]
        777f9c431000-777f9e806000 ---p 00000000 00:00 0                          [anon: Go: page summary]
        777f9e806000-777f9e807000 rw-p 00000000 00:00 0                          [anon: Go: page alloc]
        777f9e807000-777f9ec00000 ---p 00000000 00:00 0                          [anon: Go: page summary]
        777f9ec36000-777f9ecb6000 rw-p 00000000 00:00 0                          [anon: Go: immortal metadata]
        777f9ecb6000-777f9ecc6000 rw-p 00000000 00:00 0                          [anon: Go: gc bits]
        777f9ecc6000-777f9ecd6000 rw-p 00000000 00:00 0                          [anon: Go: allspans array]
        777f9ecd6000-777f9ece7000 rw-p 00000000 00:00 0                          [anon: Go: immortal metadata]
        777f9ece7000-777f9ed67000 ---p 00000000 00:00 0                          [anon: Go: page summary]
        777f9ed67000-777f9ed68000 rw-p 00000000 00:00 0                          [anon: Go: page alloc]
        777f9ed68000-777f9ede7000 ---p 00000000 00:00 0                          [anon: Go: page summary]
        777f9ede7000-777f9ee07000 rw-p 00000000 00:00 0                          [anon: Go: page alloc]
        777f9ee07000-777f9ee0a000 rw-p 00000000 00:00 0                          [anon: glibc: loader malloc]
        777f9ee0a000-777f9ee2e000 r--p 00000000 00:21 48158213                   /usr/lib/libc.so.6
        777f9ee2e000-777f9ef9f000 r-xp 00024000 00:21 48158213                   /usr/lib/libc.so.6
        777f9ef9f000-777f9efee000 r--p 00195000 00:21 48158213                   /usr/lib/libc.so.6
        777f9efee000-777f9eff2000 r--p 001e3000 00:21 48158213                   /usr/lib/libc.so.6
        777f9eff2000-777f9eff4000 rw-p 001e7000 00:21 48158213                   /usr/lib/libc.so.6
        777f9eff4000-777f9effc000 rw-p 00000000 00:00 0
        777f9effc000-777f9effe000 rw-p 00000000 00:00 0                          [anon: glibc: loader malloc]
        777f9f00a000-777f9f04a000 rw-p 00000000 00:00 0                          [anon: Go: immortal metadata]
        777f9f04a000-777f9f04c000 r--p 00000000 00:00 0                          [vvar]
        777f9f04c000-777f9f04e000 r--p 00000000 00:00 0                          [vvar_vclock]
        777f9f04e000-777f9f050000 r-xp 00000000 00:00 0                          [vdso]
        777f9f050000-777f9f051000 r--p 00000000 00:21 48158204                   /usr/lib/ld-linux-x86-64.so.2
        777f9f051000-777f9f07a000 r-xp 00001000 00:21 48158204                   /usr/lib/ld-linux-x86-64.so.2
        777f9f07a000-777f9f085000 r--p 0002a000 00:21 48158204                   /usr/lib/ld-linux-x86-64.so.2
        777f9f085000-777f9f087000 r--p 00034000 00:21 48158204                   /usr/lib/ld-linux-x86-64.so.2
        777f9f087000-777f9f088000 rw-p 00036000 00:21 48158204                   /usr/lib/ld-linux-x86-64.so.2
        777f9f088000-777f9f089000 rw-p 00000000 00:00 0
        7ffc7bfa7000-7ffc7bfc8000 rw-p 00000000 00:00 0                          [stack]
        ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0                  [vsyscall]

The anonymous memory areas are now labelled so that we can see which
ones have been allocated by the Go runtime versus which ones have been
allocated by glibc.

Fixes #71546

Change-Id: I304e8b4dd7f2477a6da794fd44e9a7a5354e4bf4
Reviewed-on: https://go-review.googlesource.com/c/go/+/646095
Auto-Submit: Alan Donovan <adonovan@google.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-03-04 11:22:33 -08:00
Michael Anthony Knyszek 6d7cb594b3 weak: accept linker-allocated objects to Make
Currently Make panics when passed a linker-allocated object. This is
inconsistent with both runtime.AddCleanup and runtime.SetFinalizer. Not
panicking in this case is important so that all pointers can be treated
equally by these APIs. Libraries should not have to worry where a
pointer came from to still make weak pointers.

Supporting this behavior is a bit complex for weak pointers versus
finalizers and cleanups. For the latter two, it means a function is
never called, so we can just drop everything on the floor. For weak
pointers, we still need to produce pointers that compare as per the API.
To do this, copy the tiny lock-free trace map implementation and use it
to store weak handles for "immortal" objects. These paths in the
runtime should be rare, so it's OK if the map isn't incredibly fast, but
we should keep the memory footprint relatively low (at least no worse
than specials), so this change tweaks the map implementation a little
bit to ensure that's the case.
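
A minimal sketch of the user-visible effect (assuming the Go 1.24 weak
package; the variable names are illustrative):

        package main

        import (
                "fmt"
                "weak"
        )

        var global int // linker-allocated, never on the Go heap

        func main() {
                a := weak.Make(&global)
                b := weak.Make(&global)
                fmt.Println(a == b)               // true: same object yields equal weak pointers
                fmt.Println(a.Value() == &global) // true: the object is immortal, never cleared
        }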

Fixes #71726.

Change-Id: I0c87c9d90656d81659ac8d70f511773d0093ce27
Reviewed-on: https://go-review.googlesource.com/c/go/+/649460
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-19 14:25:00 -08:00
Michael Anthony Knyszek b25b5f3ff4 runtime: fix GODEBUG=gccheckmark=1 and add smoke test
This change fixes GODEBUG=gccheckmark=1 which seems to have bit-rotted.
Because the root jobs weren't being reset, it wasn't doing anything.
Then, it turned out that checkmark mode would queue up noscan objects in
workbufs, which caused it to fail. Then it turned out checkmark mode was
broken with user arenas, since their heap arenas are not registered
anywhere. Then, it turned out that checkmark mode could just not run
properly if the goroutine's preemption flag was set (since
sched.gcwaiting is true during the STW). And lastly, it turned out that
async preemption could cause erroneous checkmark failures.

This change fixes all these issues and adds a simple smoke test to dist
to run the runtime tests under gccheckmark, which exercises all of these
issues.
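
For reference, the same verification can be enabled by hand for any test
run, e.g.:

        $ GODEBUG=gccheckmark=1 go test ./...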

Fixes #69074.
Fixes #69377.
Fixes #69376.

Change-Id: Iaa0bb7b9e63ed4ba34d222b47510d6292ce168bc
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/608915
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-02-03 08:21:09 -08:00
Michael Anthony Knyszek d87878c62b runtime: make special offset a uintptr
Currently specials try to save on space by only encoding the offset from
the base of the span in a uint16. This worked fine up until Go 1.24.
- Most specials have an offset of 0 (mem profile, finalizers, etc.)
- Cleanups do not care about the offset at all, so even if it's wrong,
  it's OK.
- Weak pointers *do* care, but the unique package always makes a new
  allocation, so the weak pointer handle offset it makes is always zero.

With Go 1.24 and general weak pointers now available, nothing is
stopping someone from just creating a weak pointer that is >64 KiB
offset from the start of an object, and this weak pointer must be
distinct from others.

Fix this problem by just increasing the size of a special and making the
offset a uintptr, to capture all possible offsets. Since we're in the
freeze, this is the safest thing to do. Specials aren't so common that I
expect a substantial memory increase from this change. In a future
release (or if there is a problem) we can almost certainly pack the
special's kind and offset together. There was already a bunch of wasted
space due to padding, so this would bring us back to the same memory
footprint as before this change.

Also, add tests for equality of basic weak interior pointers. This
works, but we really should've had tests for it.
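
An illustrative sketch of the kind of case this covers (the struct layout
here is made up; it only serves to push interior pointers more than
64 KiB past the base of the object):

        package main

        import (
                "fmt"
                "runtime"
                "weak"
        )

        type big struct {
                pad  [1 << 17]byte // 128 KiB, far beyond what a uint16 offset can express
                tail [2]int
        }

        func main() {
                b := new(big)
                w1 := weak.Make(&b.tail[0])
                w2 := weak.Make(&b.tail[1])
                fmt.Println(w1 == w2)                 // false: distinct interior pointers
                fmt.Println(w1.Value() == &b.tail[0]) // true while b is still reachable
                runtime.KeepAlive(b)
        }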

Fixes #70739.

Change-Id: Ib49a7f8f0f1ec3db4571a7afb0f4d94c8a93aa40
Reviewed-on: https://go-review.googlesource.com/c/go/+/634598
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Commit-Queue: Michael Knyszek <mknyszek@google.com>
2024-12-09 21:38:18 +00:00
Carlos Amedee 7f049eac1b runtime: properly search for cleanups in cleanup.stop
This change modifies the logic which searches for existing cleanups.
The existing search logic sets the next node to the current node
in certain conditions. This would cause future searches to loop
endlessly. The existing loop could convert non-cleanup specials into
cleanups and cause data corruption.

This also changes where we release the m while we are adding a
cleanup. We are currently holding onto a P-specific gcWork after
releasing the m.

Change-Id: I0ac0b304f40910549c8df114e523c89d9f0d7a75
Reviewed-on: https://go-review.googlesource.com/c/go/+/630278
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-11-22 20:28:23 +00:00
Michael Anthony Knyszek d69e6f63c3 runtime: keep cleanup closure alive across adding the cleanup special
This is similar to the weak handle bug in #70455. In short, there's a
window where a heap-allocated value is only visible through a special
that has not been made visible to the GC yet.

For #70455.

Change-Id: Ic2bb2c60d422a5bc5dab8d971cfc26ff6d7622bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/630277
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-11-20 19:50:09 +00:00
Michael Anthony Knyszek 88cfad0c89 runtime: explicitly keep handle alive during getOrAddWeakHandle
getOrAddWeakHandle is very careful about keeping its input alive across
the operation, but not very careful about keeping the heap-allocated
handle it creates alive. In fact, there's a window in this function
where it is *only* visible via the special. Specifically, the window of
time between when the handle is stored in the special and when the
special actually becomes visible to the GC.

(If we fail to add the special because it already exists, that case is
fine. We don't even use the same handle value, but the one we obtain
from the attached GC-visible special, *and* we return that value, so it
remains live.)

Fixes #70455.

Change-Id: Iadaff0cfb93bcaf61ba2b05be7fa0519c481de82
Reviewed-on: https://go-review.googlesource.com/c/go/+/630315
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-11-20 19:46:18 +00:00
Michael Anthony Knyszek a65f1a467f weak: move internal/weak to weak, and update according to proposal
The updates are:
- API documentation changes.
- Removal of the old package documentation discouraging linkname.
- Addition of new package documentation with some advice.
- Renaming of weak.Pointer.Strong -> weak.Pointer.Value.
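
A short usage sketch of the package after the move and rename (assuming
Go 1.24, where the weak package landed):

        package main

        import (
                "fmt"
                "runtime"
                "weak"
        )

        func main() {
                v := new(string)
                *v = "hello"
                w := weak.Make(v)
                if p := w.Value(); p != nil { // Value was previously named Strong
                        fmt.Println(*p)
                }
                runtime.KeepAlive(v)
        }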

Fixes #67552.

Change-Id: Ifad7e629b6d339dacaf2ca37b459d7f903e31bf8
Reviewed-on: https://go-review.googlesource.com/c/go/+/628455
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-11-18 22:29:23 +00:00
Carlos Amedee 6a2fb15475 runtime: implement Stop for AddCleanup
This change adds the implementation of AddCleanup.Stop. It allows the
caller to cancel execution of the cleanup. The cleanup will not be
stopped if it has already been queued for execution.
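
A brief sketch of the resulting behavior (assuming the Go 1.24 shape of
the runtime.AddCleanup API; the message string is illustrative):

        package main

        import (
                "fmt"
                "runtime"
        )

        func main() {
                obj := new(int)
                c := runtime.AddCleanup(obj, func(msg string) { fmt.Println(msg) }, "cleanup ran")
                c.Stop() // cancel: the cleanup function will never be called
                runtime.KeepAlive(obj)
                fmt.Println("cleanup stopped")
        }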

For #67535

Change-Id: I494b77d344e54d772c41489d172286773c3814e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/627975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
2024-11-16 03:54:51 +00:00
Carlos Amedee 0531768b30 runtime: implement AddCleanup
This change introduces AddCleanup to the runtime package. AddCleanup attaches
a cleanup function to a pointer to an object.

The Stop method on Cleanups will be implemented in a followup CL.

AddCleanup is intended to be an incremental improvement over
SetFinalizer and will result in SetFinalizer being deprecated.
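
A minimal usage sketch (assuming the final Go 1.24 shape of the API): the
cleanup function receives only the extra argument, never the object
itself, so the object can actually become unreachable.

        package main

        import (
                "fmt"
                "runtime"
                "time"
        )

        func main() {
                obj := &struct{ name string }{name: "example"}
                runtime.AddCleanup(obj, func(name string) {
                        fmt.Println("cleaned up:", name)
                }, obj.name)

                obj = nil                          // drop the only reference
                runtime.GC()                       // let the GC queue the cleanup
                time.Sleep(100 * time.Millisecond) // crude wait; cleanups run asynchronously
        }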

For #67535

Change-Id: I99645152e3fdcee85fcf42a4f312c6917e8aecb1
Reviewed-on: https://go-review.googlesource.com/c/go/+/627695
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-11-16 03:26:04 +00:00
Michael Anthony Knyszek 80d306da50 runtime: prevent weak->strong conversions during mark termination
Currently it's possible for weak->strong conversions to create more GC
work during mark termination. When a weak->strong conversion happens
during the mark phase, we need to mark the newly-strong pointer, since
it may now be the only pointer to that object. In other words, the
object could be white.

But queueing new white objects creates GC work, and if this happens
during mark termination, we could end up violating mark termination
invariants. In the parlance of the mark termination algorithm, the
weak->strong conversion is a non-monotonic source of GC work, unlike the
write barriers (which will eventually only see black objects).

This change fixes the problem by forcing weak->strong conversions to
block during mark termination. We can do this efficiently by setting a
global flag before the ragged barrier that is checked at each
weak->strong conversion. If the flag is set, then the conversions block.
The ragged barrier ensures that all Ps have observed the flag and that
any weak->strong conversions which completed before the ragged barrier
have their newly-minted strong pointers visible in GC work queues if
necessary. We later unset the flag and wake all the blocked goroutines
during the mark termination STW.

There are a few subtleties that we need to account for. For one, it's
possible that a goroutine which blocked in a weak->strong conversion
wakes up only to find it's mark termination time again, so we need to
recheck the global flag on wake. We should also stay non-preemptible
while performing the check, so that if the check *does* appear as true,
it cannot switch back to false while we're actively trying to block. If
it switches to false while we try to block, then we'll be stuck in the
queue until the following GC.

All-in-all, this CL is more complicated than I would have liked, but
it's the only idea so far that is clearly correct to me at a high level.

This change adds a test which is somewhat invasive as it manipulates
mark termination, but hopefully that infrastructure will be useful for
debugging, fixing, and regression testing mark termination whenever we
do fix it.

Fixes #69803.

Change-Id: Ie314e6fd357c9e2a07a9be21f217f75f7aba8c4a
Reviewed-on: https://go-review.googlesource.com/c/go/+/623615
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-11-13 18:52:12 +00:00
Michael Anthony Knyszek 79fd633632 internal/weak: shade pointer in weak-to-strong conversion
There's a bug in the weak-to-strong conversion in that creating the
*only* strong pointer to some weakly-held object during the mark phase
may result in that object not being properly marked.

The exact mechanism for this is that the new strong pointer will always
point to a white object (because it was only weakly referenced up until
this point) and it can then be stored in a blackened stack, hiding it
from the garbage collector.

This "hide a white pointer in the stack" problem is pretty much exactly
what the Yuasa part of the hybrid write barrier is trying to catch, so
we need to do the same thing the write barrier would do: shade the
pointer.

Added a test and confirmed that it fails with high probability if the
pointer shading is missing.

Fixes #69210.

Change-Id: Iaae64ae95ea7e975c2f2c3d4d1960e74e1bd1c3f
Reviewed-on: https://go-review.googlesource.com/c/go/+/610396
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-09-04 18:13:24 +00:00
Michael Anthony Knyszek e76353d5a9 runtime: allow the tracer to be reentrant
This change allows the tracer to be reentrant by restructuring the
internals such that writing an event is atomic with respect to stack
growth. Essentially, a core set of functions that are involved in
acquiring a trace buffer and writing to it are all marked nosplit.

Stack growth is currently the only hidden place where the tracer may be
accidentally reentrant, preventing the tracer from being used
everywhere. It already lacks write barriers, lacks allocations, and is
non-preemptible. This change thus makes the tracer fully reentrant,
since the only reentry case it needs to handle is stack growth.

Since the invariants needed to attain this are subtle, this change also
extends the debugTraceReentrancy debug mode to check these invariants as
well. Specifically, the invariants are checked by setting the throwsplit
flag.

A side benefit of this change is it simplifies the trace event writing
API a good bit: there's no need to actually thread the event writer
through things, and most callsites look a bit simpler.

Change-Id: I7c329fb7a6cb936bd363c44cf882ea0a925132f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/587599
Reviewed-by: Austin Clements <austin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-07-25 14:38:21 +00:00
David Chase fc5073bc15 runtime,internal: move runtime/internal/sys to internal/runtime/sys
Cleanup and friction reduction

For #65355.

Change-Id: Ia14c9dc584a529a35b97801dd3e95b9acc99a511
Reviewed-on: https://go-review.googlesource.com/c/go/+/600436
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2024-07-23 19:05:35 +00:00
Michael Anthony Knyszek ca1d2ead5d runtime: skip tracing events that would cause reentrancy
Some of the new experimental events added have a problem in that they
might be emitted during stack growth. This is, to my knowledge, the only
restriction on the tracer, because the tracer otherwise prevents
preemption, avoids allocation, and avoids write barriers. However, the
stack can grow from within the tracer. This leads to
tracing-during-tracing which can result in lost buffers and broken event
streams. (There's a debug mode to get a nice error message, but it's
disabled by default.)

This change resolves the problem by skipping writing out these new
events. This results in the new events sometimes being broken (alloc
without a free, free without an alloc) but for now that's OK. Before the
freeze begins we just want to fix broken tests; tools interpreting these
events will be totally in-house to begin with, and if they have to be a
little bit smarter about missing information, that's OK. In the future
we'll have a more robust fix for this, but it appears that it's going to
require making the tracer fully reentrant. (This is not too hard; either
we force flushing all buffers when going reentrant (which is actually
somewhat subtle with respect to event ordering) or we isolate down just
the actual event writing to be atomic with respect to stack growth. Both
are just bigger changes on shared codepaths that are scary to land this
late in the release cycle.)

Fixes #67379.

Change-Id: I46bb7e470e61c64ff54ac5aec5554b828c1ca4be
Reviewed-on: https://go-review.googlesource.com/c/go/+/587597
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-22 22:31:00 +00:00
Michael Anthony Knyszek 97c13cfb25 runtime: delete pagetrace GOEXPERIMENT
The page tracer's functionality is now captured by the regular execution
tracer as an experimental GODEBUG variable. This is a lot more usable
and maintainable than the page tracer, which is likely to have bitrotted
by this point. There's also no tooling available for the page tracer.

Change-Id: I2408394555e01dde75a522e9a489b7e55cf12c8e
Reviewed-on: https://go-review.googlesource.com/c/go/+/583379
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-08 17:48:45 +00:00
Michael Anthony Knyszek 724bab1505 runtime: add traceallocfree GODEBUG for alloc/free events in traces
This change adds expensive alloc/free events to traces, guarded by a
GODEBUG that can be set at run time by mutating the GODEBUG environment
variable. This supersedes the alloc/free trace deleted in a previous CL.

There are two parts to this CL.

The first part is adding a mechanism for exposing experimental events
through the tracer and trace parser. This boils down to a new
ExperimentalEvent event type in the parser API which simply reveals the
raw event data for the event. Each experimental event can also be
associated with "experimental data" which is associated with a
particular generation. This experimental data is just exposed as a bag
of bytes that supplements the experimental events.

In the runtime, this CL organizes experimental events by experiment.
An experiment is defined by a set of experimental events and a single
special batch type. Batches of this special type are exposed through the
parser's API as the aforementioned "experimental data".

The second part of this CL is defining the AllocFree experiment, which
defines 9 new experimental events covering heap object alloc/frees, span
alloc/frees, and goroutine stack alloc/frees. It also generates special
batches that contain a type table: a mapping of IDs to type information.
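
A typical way to collect a trace with these events enabled might look
like this (the GODEBUG name is the one this change introduces):

        $ GODEBUG=traceallocfree=1 go test -trace=trace.out some/pkg
        $ go tool trace trace.out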

Change-Id: I965c00e3dcfdf5570f365ff89d0f70d8aeca219c
Reviewed-on: https://go-review.googlesource.com/c/go/+/583377
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-08 17:47:01 +00:00
Michael Anthony Knyszek dfc86e922c internal/weak: add package implementing weak pointers
This change adds the internal/weak package, which exposes GC-supported
weak pointers to the standard library. This is for the upcoming weak
package, but may be useful for other future constructs.

For #62483.

Change-Id: I4aa8fa9400110ad5ea022a43c094051699ccab9d
Reviewed-on: https://go-review.googlesource.com/c/go/+/576297
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-04-18 21:25:11 +00:00
Michael Anthony Knyszek 9f3f4c64db runtime: remove the allocheaders GOEXPERIMENT
This change removes the allocheaders GOEXPERIMENT, deleting all the old code and
merging mbitmap_allocheaders.go back into mbitmap.go.

This change also deletes the SetType benchmarks which were already
broken in the new GOEXPERIMENT (it's harder to set up than before). We
weren't really watching these benchmarks at all, and they don't provide
additional test coverage.

Change-Id: I135497201c3259087c5cd3722ed3fbe24791d25d
Reviewed-on: https://go-review.googlesource.com/c/go/+/567200
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2024-04-09 04:07:57 +00:00
Andy Pan 4c2b1e0feb runtime: migrate internal/atomic to internal/runtime
For #65355

Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be
Reviewed-on: https://go-review.googlesource.com/c/go/+/560155
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
2024-03-25 19:53:03 +00:00
qiulaidongfeng 2f706871b9 runtime: delete todo of the list field for mspan
Change-Id: I10a3308c19da08d2ff0c8077bb74ad888ee04fea
GitHub-Last-Rev: 3e95b71384
GitHub-Pull-Request: golang/go#64077
Reviewed-on: https://go-review.googlesource.com/c/go/+/541755
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
2023-12-14 18:21:15 +00:00
Michael Anthony Knyszek f119abb65d runtime: refactor runtime->tracer API to appear more like a lock
Currently the execution tracer synchronizes with itself using very
heavyweight operations. As a result, it's totally fine for most of the
tracer code to look like:

    if traceEnabled() {
	traceXXX(...)
    }

However, if we want to make that synchronization more lightweight (as
issue #60773 proposes), then this is insufficient. In particular, we
need to make sure the tracer can't observe an inconsistency between g
atomicstatus and the event that would be emitted for a particular
g transition. This means making the g status change appear to happen
atomically with the corresponding trace event being written out from the
perspective of the tracer.

This requires a change in API to something more like a lock. While we're
here, we might as well make sure that trace events can *only* be emitted
while this lock is held. This change introduces such an API:
traceAcquire, which returns a value that can emit events, and
traceRelease, which requires the value that was returned by
traceAcquire. In practice, this won't be a real lock, it'll be more like
a seqlock.

For the current tracer, this API is completely overkill and the value
returned by traceAcquire basically just checks trace.enabled. But it's
necessary for the tracer described in #60773 and we can implement that
more cleanly if we do this refactoring now instead of later.

For #60773.

Change-Id: Ibb9ff5958376339fafc2b5180aef65cf2ba18646
Reviewed-on: https://go-review.googlesource.com/c/go/+/515635
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-11-09 22:34:25 +00:00
Michael Anthony Knyszek 38ac7c41aa runtime: implement experiment to replace heap bitmap with alloc headers
This change replaces the 1-bit-per-word heap bitmap for most size
classes with allocation headers for objects that contain pointers. The
header consists of a single pointer to a type. All allocations with
headers are treated as implicitly containing one or more instances of
the type in the header.

As the name implies, headers are usually stored as the first word of an
object. There are two additional exceptions to where headers are stored
and how they're used.

Objects smaller than 512 bytes do not have headers. Instead, a heap
bitmap is reserved at the end of spans for objects of this size. A full
word of overhead is too much for these small objects. The bitmap is of
the same format as the old bitmap, minus the noMorePtrs bits which are
unnecessary. All the objects <512 bytes have a bitmap less than a
pointer-word in size, and that was the granularity at which noMorePtrs
could stop scanning early anyway.

Objects that are larger than 32 KiB (which have their own span) have
their headers stored directly in the span, to allow power-of-two-sized
allocations to not spill over into an extra page.

The full implementation is behind GOEXPERIMENT=allocheaders.
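
For reference, the experiment is enabled like any other GOEXPERIMENT,
e.g.:

        $ GOEXPERIMENT=allocheaders go build ./...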

The purpose of this change is performance. First and foremost, with
headers we no longer have to unroll pointer/scalar data at allocation
time for most size classes. Small size classes still need some
unrolling, but their bitmaps are small so we can optimize that case
fairly well. Larger objects effectively have their pointer/scalar data
unrolled on-demand from type data, which is much more compactly
represented and results in less TLB pressure. Furthermore, since the
headers are usually right next to the object and where we're about to
start scanning, we get an additional temporal locality benefit in the
data cache when looking up type metadata. The pointer/scalar data is
now effectively unrolled on-demand, but it's also simpler to unroll than
before; that unrolled data is never written anywhere, and for arrays we
get the benefit of retreading the same data per element, as opposed to
looking it up from scratch for each pointer-word of bitmap. Lastly,
because we no longer have a heap bitmap that spans the entire heap,
there's a flat 1.5% memory use reduction. This is balanced slightly by
some objects possibly being bumped up a size class, but most objects are
not tightly optimized to size class sizes so there's some memory to
spare, making the header basically free in those cases.

See the follow-up CL which turns on this experiment by default for
benchmark results. (CL 538217.)

Change-Id: I4c9034ee200650d06d8bdecd579d5f7c1bbf1fc5
Reviewed-on: https://go-review.googlesource.com/c/go/+/437955
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-11-09 19:58:08 +00:00
Michael Anthony Knyszek 7a606fef66 runtime: split out pointer/scalar metadata from heapArena
We're going to want to fork this data in the near future for a
GOEXPERIMENT, so break it out now.

Change-Id: Ia7ded850bb693c443fe439c6b7279dcac638512c
Reviewed-on: https://go-review.googlesource.com/c/go/+/537978
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-11-02 17:24:39 +00:00
Cherry Mui 340a4f55c4 runtime: use smaller fields for mspan.freeindex and nelems
mspan.freeindex and nelems can fit into uint16 for all possible
values. Use uint16 instead of uintptr.

Change-Id: Ifce20751e81d5022be1f6b5cbb5fbe4fd1728b1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/451359
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-02 20:39:21 +00:00
Jes Cok f4e7675d11 all: clean unnecessary casts
Run 'unconvert -safe -apply' (https://github.com/mdempsky/unconvert)

Change-Id: I24b7cd7d286cddce86431d8470d15c5f3f0d1106
GitHub-Last-Rev: 022e75384c
GitHub-Pull-Request: golang/go#62662
Reviewed-on: https://go-review.googlesource.com/c/go/+/528696
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-09-18 20:01:34 +00:00
Li Gang b45b00162b runtime: resolve false sharing for frequent memory allocate workloads
False sharing was observed inside the mheap struct, between the arenas
field and the preceding variables. Pad mheap.arenas from the preceding
variables to avoid the false sharing.

This false sharing gets worse and hurts performance on multi-core
systems with allocation-heavy workloads. While running MinIO on a
2-socket system (56 cores per socket) with GOGC=1000, we observed
HITM > 8% (perf c2c) on this cache line.

After resolving this false-sharing issue, we measured a 17% performance
improvement.

Improvement verified on MinIO:
Server: https://github.com/minio/minio
Client: https://github.com/minio/warp
Config: single-node MinIO server with 6 ramdisks, TLS disabled,
        warp GET requests, 128 KB objects, 512 concurrent connections

Fixes #62472

Signed-off-by: Li Gang <gang.g.li@intel.com>
Change-Id: I9a4a3c97f5bc8cd014c627f92d59d9187ebaaab5
Reviewed-on: https://go-review.googlesource.com/c/go/+/525955
Reviewed-by: Heschi Kreinick <heschi@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-09-13 18:36:09 +00:00
Sven Anderson 697644070c runtime: improve Pinner with gcBits
This change replaces the statically sized pinnerBits with gcBits-based
ones, which are copied in each GC cycle if they exist.  The pinnerBits
now include a second bit per object that indicates whether a pinner
counter for multi-pins exists, in order to avoid unnecessary iterations
over specials.

This is a follow-up to CL 367296.

Change-Id: I82e38cecd535e18c3b3ae54b5cc67d3aeeaafcfd
Reviewed-on: https://go-review.googlesource.com/c/go/+/493275
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2023-05-19 23:21:57 +00:00
Sven Anderson 251daf46fb runtime: implement Pinner API for object pinning
Some C APIs require the use of structures that contain pointers to
buffers (iovec, io_uring, ...).  The pointer passing rules would
require that these buffers be allocated in C memory, and to process
this data with Go libraries it would need to be copied.

In order to provide a zero-copy way to use these C APIs, this CL
implements a Pinner API that allows Go objects to be pinned, which
guarantees that the garbage collector does not move these objects
while they are pinned.  This allows the pointer passing rules to be
relaxed so that pinned pointers can be stored in C-allocated memory or
can be contained in Go memory that is passed to C functions.

The Pin() method accepts pointers to objects of any type and
unsafe.Pointer.  Slices and arrays can be pinned by calling Pin()
with the pointer to the first element.  Pinning of maps is not
supported.
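
A brief usage sketch of the API (the C side that the pinned buffer would
be handed to is left out):

        package main

        import (
                "fmt"
                "runtime"
        )

        func main() {
                buf := make([]byte, 64)

                var p runtime.Pinner
                p.Pin(&buf[0]) // pin the backing array: the GC will not move or free it
                // &buf[0] may now be stored in C-allocated memory (e.g. an iovec)
                // or inside Go memory that is passed to a C function.
                fmt.Printf("pinned %p\n", &buf[0])
                p.Unpin() // release every pin held by p
        }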

If the GC collects an unreachable Pinner that still holds pinned
objects, it panics.  If Pin() is called with other, non-pointer types,
it panics as well.

Performance considerations: This change has no impact on execution
time on existing code, because checks are only done in code paths,
that would panic otherwise.  The memory footprint on existing code is
one pointer per memory span.

Fixes: #46787

Signed-off-by: Sven Anderson <sven@anderson.de>
Change-Id: I110031fe789b92277ae45a9455624687bd1c54f2
Reviewed-on: https://go-review.googlesource.com/c/go/+/367296
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-05-19 14:59:14 +00:00
Michael Anthony Knyszek a3e90dc377 runtime: add eager scavenging details to GODEBUG=scavtrace=1
Also, clean up atomics on released-per-cycle while we're here.

For #57069.

Change-Id: I14026e8281f01dea1e8c8de6aa8944712b7b24d9
Reviewed-on: https://go-review.googlesource.com/c/go/+/495916
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-19 13:38:43 +00:00
Michael Anthony Knyszek 8992bb19ad runtime: replace trace.enabled with traceEnabled
[git-generate]
cd src/runtime
grep -l 'trace\.enabled' *.go | grep -v "trace.go" | xargs sed -i 's/trace\.enabled/traceEnabled()/g'

Change-Id: I14c7821c1134690b18c8abc0edd27abcdabcad72
Reviewed-on: https://go-review.googlesource.com/c/go/+/494181
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-05-11 21:27:08 +00:00
Michael Anthony Knyszek 8fa9e3beee runtime: manage huge pages explicitly
This change makes it so that on Linux the Go runtime explicitly marks
page heap memory as either available to be backed by hugepages or not
using heuristics based on density.

The motivation behind this change is twofold:
1. In default Linux configurations, khugepaged can recoalesce hugepages
   even after the scavenger breaks them up, resulting in significant
   overheads for small heaps when their heaps shrink.
2. The Go runtime already has some heuristics about this, but those
   heuristics appear to have bit-rotted and result in haphazard
   hugepage management. Unlucky (but otherwise fairly dense) regions of
   memory end up not backed by huge pages while sparse regions end up
   accidentally marked MADV_HUGEPAGE and are not later broken up by the
   scavenger, because it already got the memory it needed from more
   dense sections (this is more likely to happen with small heaps that
   go idle).

In this change, the runtime uses a new policy:

1. Mark all new memory MADV_HUGEPAGE.
2. Track whether each page chunk (4 MiB) became dense during the GC
   cycle. Mark those MADV_HUGEPAGE, and hide them from the scavenger.
3. If a chunk is not dense for 1 full GC cycle, make it visible to the
   scavenger.
4. The scavenger marks a chunk MADV_NOHUGEPAGE before it scavenges it.

This policy is intended to try and back memory that is a good candidate
for huge pages (high occupancy) with huge pages, and give memory that is
not (low occupancy) to the scavenger. Occupancy is defined not just by
occupancy at any instant of time, but also occupancy in the near future.
It's generally true that by the end of a GC cycle the heap gets quite
dense (from the perspective of the page allocator).

Because we want scavenging and huge page management to happen together
(the right time to MADV_NOHUGEPAGE is just before scavenging in order to
break up huge pages and keep them that way) and the cost of applying
MADV_HUGEPAGE and MADV_NOHUGEPAGE is somewhat high, the scavenger avoids
releasing memory in dense page chunks. All this together means the
scavenger will now more generally release memory on a ~1 GC cycle delay.

Notably this has implications for scavenging to maintain the memory
limit and the runtime/debug.FreeOSMemory API. This change makes it so
that in these cases all memory is visible to the scavenger regardless of
sparseness and delays the page allocator in re-marking this memory with
MADV_NOHUGEPAGE for around 1 GC cycle to mitigate churn.

The end result of this change should be little-to-no performance
difference for dense heaps (MADV_HUGEPAGE works a lot like the default
unmarked state) but should allow the scavenger to more effectively take
back fragments of huge pages. The main risk here is churn, because
MADV_HUGEPAGE usually forces the kernel to immediately back memory with
a huge page. That's the reason for the large amount of hysteresis (1
full GC cycle) and why the definition of high density is 96% occupancy.

Fixes #55328.

Change-Id: I8da7998f1a31b498a9cc9bc662c1ae1a6bf64630
Reviewed-on: https://go-review.googlesource.com/c/go/+/436395
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-19 14:30:00 +00:00
Michael Anthony Knyszek 1f9d80e331 runtime: disable huge pages for GC metadata for small heaps
For #55328.

Change-Id: I8792161f09906c08d506cc0ace9d07e76ec6baa6
Reviewed-on: https://go-review.googlesource.com/c/go/+/460316
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-19 14:27:27 +00:00
Oleksandr Redko 1a09d57de5 runtime: correct typos
- Fix typo in throw error message for arena.
- Correct typos in assembly and Go comments.
- Fix log message in TestTraceCPUProfile.

Change-Id: I874c9e8cd46394448b6717bc6021aa3ecf319d16
GitHub-Last-Rev: d27fad4d3c
GitHub-Pull-Request: golang/go#58375
Reviewed-on: https://go-review.googlesource.com/c/go/+/465975
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-08 14:52:12 +00:00
Keith Randall 79edd1d19d runtime: remove go119MemoryLimitSupport flag
Change-Id: I207480d991c6242a1610795605c5ec6a3b3c59de
Reviewed-on: https://go-review.googlesource.com/c/go/+/463225
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-01-28 20:46:29 +00:00
Michael Knyszek e4435cb844 runtime: add page tracer
This change adds a new GODEBUG flag called pagetrace that writes a
low-overhead trace of how pages of memory are managed by the Go runtime.

The page tracer is kept behind a GOEXPERIMENT flag due to a potential
security risk for setuid binaries.

Change-Id: I6f4a2447d02693c25214400846a5d2832ad6e5c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/444157
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-11-18 03:45:30 +00:00
Cherry Mui febe7b8e2a runtime: make GC see object as allocated after it is initialized
When the GC is scanning some memory (possibly conservatively) and finds
a pointer while, concurrently, another goroutine is allocating an object
at the same address as the found pointer, the
GC may see the pointer before the object and/or the heap bits are
initialized. This may cause the GC to see bad pointers and
possibly crash.

To prevent this, we make it so that the scanner can only see the
object as allocated after the object and the heap bits are
initialized. Currently the allocator uses freeindex to find the
next available slot, and that code is coupled with updating the
free index to a new slot past it. The scanner also uses the
freeindex to determine if an object is allocated. This is somewhat
racy. This CL makes the scanner use a different field, which is
only updated after the object initialization (and a memory
barrier).

Fixes #54596.

Change-Id: I2a57a226369926e7192c253dd0d21d3faf22297c
Reviewed-on: https://go-review.googlesource.com/c/go/+/449017
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-11-15 02:55:24 +00:00
Michael Anthony Knyszek 7866538d25 runtime: add safe arena support to the runtime
This change adds an API to the runtime for arenas. A later CL can
potentially export it as an experimental API, but for now, just the
runtime implementation will suffice.

The purpose of arenas is to improve efficiency, primarily by allowing
for an application to manually free memory, thereby delaying garbage
collection. It comes with other potential performance benefits, such as
better locality, a better allocation strategy, and better handling of
interior pointers by the GC.

This implementation is based on one by danscales@google.com with a few
significant differences:
* The implementation lives entirely in the runtime (all layers).
* Arena chunks are the minimum of 8 MiB or the heap arena size. This
  choice is made because in practice 64 MiB appears to be way too large
  of an area for most real-world use-cases.
* Arena chunks are not unmapped, instead they're placed on an evacuation
  list and when there are no pointers left pointing into them, they're
  allowed to be reused.
* Reusing partially-used arena chunks no longer tries to find one used
  by the same P first; it just takes the first one available.
* In order to ensure worst-case fragmentation is never worse than 25%,
  only types and slice backing stores whose sizes are 1/4th the size of
  a chunk or less may be used. Previously larger sizes, up to the size
  of the chunk, were allowed.
* ASAN, MSAN, and the race detector are fully supported.
* Sets arena chunks to fault that were deferred at the end of mark
  termination (a non-public patch once did this; I don't see a reason
  not to continue that).

For #51317.

Change-Id: I83b1693a17302554cb36b6daa4e9249a81b1644f
Reviewed-on: https://go-review.googlesource.com/c/go/+/423359
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-10-12 20:23:30 +00:00
Michael Anthony Knyszek 4c383951b9 runtime: make (*mheap).sysAlloc more general
This change makes (*mheap).sysAlloc take an explicit list of hints and a
boolean as to whether or not any newly-created heapArenas should be
registered in the full arena list.

This is a refactoring in preparation for arenas.

For #51317.

Change-Id: I0584a033fce3fcb60c5d0bc033d5fb8bd23b2378
Reviewed-on: https://go-review.googlesource.com/c/go/+/432078
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-10-12 20:23:24 +00:00
Michael Anthony Knyszek 69fc74f3ee runtime: factor out mheap span initialization
This change refactors span heap initialization. This change should just
be a no-op and just prepares for adding support for arenas.

For #51317.

Change-Id: Ie6f877ca10f86d26e7b6c4857b223589a351e253
Reviewed-on: https://go-review.googlesource.com/c/go/+/423364
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-10-12 20:23:11 +00:00
Than McIntosh 506e690a26 runtime: mark arenaIdx.l1 and arenaIdx.l2 methods as nosplit
Mark the "l1" and "l2" methods on "arenaIdx" with //go:nosplit, since
these methods are called from a nosplit context (for example, from
"spanOf").

Fixes #56044.
Updates #21314.

Change-Id: I48c7aa756b59a13162c89ef21066f83371ae50f1
Reviewed-on: https://go-review.googlesource.com/c/go/+/441859
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-10-10 19:56:54 +00:00
Keith Randall f1b7b2fc52 runtime: make mSpanStateBox accessors nosplit
get, at least, is called from typedmemclr which must not be interruptible.
These were previously nosplit by accident before CL 424395 (the only
call they had was an intrinsic, so they were leaf functions, so they had
no prologue). After CL 424395 they contained a call (in noinline builds),
thus had a prologue, thus had a suspension point.

I have no idea how we might test this.

This is another motivating use case for having a nosplitrec directive
in the runtime.

Fixes #55156
Fixes #54779
Fixes #54906
Fixes #54907

Change-Id: I851d733d71bda7172c4c96e027657e22b499ee00
Reviewed-on: https://go-review.googlesource.com/c/go/+/431919
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-09-19 21:57:04 +00:00
Michael Anthony Knyszek b7c28f484d runtime/metrics: add CPU stats
This change adds a breakdown of estimated CPU usage by time. These
estimates are not based on real on-CPU counters, so each metric has a
disclaimer explaining so. They can, however, be more reasonably
compared to a total CPU time metric that this change also adds.
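
A small sketch of reading one of the new metrics (the metric name below
is taken from the runtime/metrics catalog this change extends; see
metrics.All for the exact set):

        package main

        import (
                "fmt"
                "runtime/metrics"
        )

        func main() {
                samples := []metrics.Sample{{Name: "/cpu/classes/total:cpu-seconds"}}
                metrics.Read(samples)
                if samples[0].Value.Kind() == metrics.KindFloat64 {
                        fmt.Println("total CPU seconds:", samples[0].Value.Float64())
                }
        }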

Fixes #47216.

Change-Id: I125006526be9f8e0d609200e193da5a78d9935be
Reviewed-on: https://go-review.googlesource.com/c/go/+/404307
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Josh MacDonald <jmacd@lightstep.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-09-16 16:32:20 +00:00
Michael Anthony Knyszek 5a37965495 runtime: make mheap.pagesInUse an atomic.Uintptr
This change fixes an old TODO that made it a uint64 because it would
make alignment within mheap more complicated. Now that we don't have to
worry about it since we're using atomic types as much as possible,
switch to using a Uintptr. This likely will improve performance a tiny
bit on 32-bit platforms, but really it's mostly cleanup.

Change-Id: Ie705799a111ccad977fc1f43de8b50cf611be303
Reviewed-on: https://go-review.googlesource.com/c/go/+/429221
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-09-08 16:06:01 +00:00
Michael Anthony Knyszek e28cc362a8 runtime: remove alignment padding in mheap and pageAlloc
All subfields use atomic types to ensure alignment, so there's no more
need for these fields.

Change-Id: Iada4253f352a074073ce603f1f6b07cbd5b7c58a
Reviewed-on: https://go-review.googlesource.com/c/go/+/429220
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2022-09-08 16:06:00 +00:00
Ludi Rehak f15761b50b runtime: fix formula for computing number of padding bytes
In order to prevent false sharing of cache lines, structs are
padded with some number of bytes. These bytes are unused, serving
only to make the size of the struct a multiple of the size of the
cache line.

The current calculation of how much to pad is an overestimation,
when the struct size is already a multiple of the cache line size
without padding. For these cases, no padding is necessary, and
the size of the inner pad field should be 0. The bug is that the
pad field is sized to a whole 'nother cache line, wasting space.

Here is the current formula that can never return 0:
cpu.CacheLinePadSize - unsafe.Sizeof(myStruct{})%cpu.CacheLinePadSize

This change simply mods that calculation by cpu.CacheLinePadSize,
so that 0 will be returned instead of cpu.CacheLinePadSize.
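
A small worked illustration of the difference, with cacheLineSize
standing in for cpu.CacheLinePadSize and structSize for
unsafe.Sizeof(myStruct{}):

        package main

        import "fmt"

        // Old (buggy): cacheLineSize - structSize%cacheLineSize returns a whole
        // extra cache line when structSize is already aligned. New: take that
        // result mod cacheLineSize, so an aligned struct gets 0 bytes of padding.
        func pad(structSize, cacheLineSize uintptr) uintptr {
                return (cacheLineSize - structSize%cacheLineSize) % cacheLineSize
        }

        func main() {
                fmt.Println(pad(128, 64)) // 0: already a multiple of the cache line size
                fmt.Println(pad(120, 64)) // 8: pad up to the next multiple
        }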

Change-Id: I26a2b287171bf47a3b9121873b2722f728381b5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/414214
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-19 16:04:12 +00:00
Cuong Manh Le a719a78c1b runtime: add and use runtime/internal/sys.NotInHeap
Updates #46731

Change-Id: Ic2208c8bb639aa1e390be0d62e2bd799ecf20654
Reviewed-on: https://go-review.googlesource.com/c/go/+/421878
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2022-08-19 00:29:18 +00:00
Cuong Manh Le 014f0e8205 runtime: convert mSpanStateBox.s to atomic type
Updates #53821

Change-Id: I02f31a7a8295deb3e840565412abf10ff776c2c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/424395
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2022-08-17 16:26:24 +00:00
Keith Randall 6a9c674a09 runtime: redo heap bitmap
[this is a retry of CL 407035 + its revert CL 422395. The content is unchanged]

Use just 1 bit per word to record the ptr/nonptr bitmap.
Use word-sized operations to manipulate the bitmap, so we can operate
on up to 64 ptr/nonptr bits at a time.

Use a separate bitmap, one bit per word of the ptr/nonptr bitmap,
to encode a no-more-pointers signal. Since we can check 64 ptr/nonptr
bits at once, knowing the exact last pointer location is not necessary.

As a followon CL, we should make the gcdata bitmap an array of
uintptr instead of an array of byte, so we can load 64 bits of it at once.
Similarly for the processing of gc programs.

Change-Id: Ica5eb622f5b87e647be64f471d67b02732ef8be6
Reviewed-on: https://go-review.googlesource.com/c/go/+/422634
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2022-08-16 20:39:36 +00:00
Keith Randall cb13022a24 runtime: ensure that we don't scan noscan objects
We claim to not maintain pointer bits for noscan objects. But in fact
we do, since whenever we switch a page from scannable to noscan, we
call heapBits.initSpan which zeroes the heap bits.

Switch to ensure that we never scan noscan objects. This ensures that
we don't depend on the ptrbits for noscan objects. That fixes a bug
in the 1-bit bitmap CL which depended on that fact.

Change-Id: I4e66f582605b53732f8fca310c1f6bd2892963cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/422435
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-09 22:28:42 +00:00