mirror/go - go - Git Fam. Sieh

Commit Graph

Author	SHA1	Message	Date
Michael Anthony Knyszek	67c2dcbc59	runtime: use OnesCount64 to count allocated objects in a span This change modifies the implementation of (*mspan).countAlloc by using OnesCount64 (which on many systems is intrinsified). It does so by using an unsafe pointer cast, but in this case we don't care about endianness because we're just counting bits set. This change means we no longer need the popcnt table which was redundant in the runtime anyway. We can also simplify the logic here significantly by observing that mark bits allocations are always 8-byte aligned, so we don't need to handle any edge-cases due to the fact that OnesCount64 operates on 64 bits at a time: all irrelevant bits will be zero. Overall, this implementation is significantly faster than the old one on amd64, and should be similarly faster (or better!) on other systems which support the intrinsic. On systems which do not, it should be roughly the same performance because OnesCount64 is implemented using a table in the general case. Results on linux/amd64: name old time/op new time/op delta MSpanCountAlloc/bits=64-4 16.8ns ± 0% 12.7ns ± 0% -24.40% (p=0.000 n=5+4) MSpanCountAlloc/bits=128-4 23.5ns ± 0% 12.8ns ± 0% -45.70% (p=0.000 n=4+5) MSpanCountAlloc/bits=256-4 43.5ns ± 0% 12.8ns ± 0% -70.67% (p=0.000 n=4+5) MSpanCountAlloc/bits=512-4 59.5ns ± 0% 15.4ns ± 0% -74.12% (p=0.008 n=5+5) MSpanCountAlloc/bits=1024-4 116ns ± 1% 23ns ± 0% -79.84% (p=0.000 n=5+4) Change-Id: Id4c994be22224653af5333683a69b0937130ed04 Reviewed-on: https://go-review.googlesource.com/c/go/+/216558 Run-TryBot: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2020-03-23 17:07:22 +00:00
Austin Clements	ab5a40c5e3	runtime: fix rounding in materializeGCProg materializeGCProg allocates a temporary buffer for unrolling a GC program. Unfortunately, when computing the size of the buffer, it rounds down the number of bytes needed to store bitmap before rounding up the number of pages needed to store those bytes. The fact that it rounds up to pages usually mitigates the rounding down, but the type from #37470 exists right on the boundary where this doesn't work: type Sequencer struct { htable [1 << 17]uint32 buf []byte } On 64-bit, this GC bitmap is exactly 8 KiB of zeros, followed by three one bits. Hence, this needs 8193 bytes of storage, but the current math in materializeGCProg rounds down the three one bits to 8192 bytes. Since this is exactly pageSize, the next step of rounding up to the page size doesn't mitigate this error, and materializeGCProg allocates a buffer that is one byte too small. runGCProg then writes one byte past the end of this buffer, causing either a segfault (if you're lucky!) or memory corruption. Fixes #37470. Change-Id: Iad24c463c501cd9b1dc1924bc2ad007991a094a0 Reviewed-on: https://go-review.googlesource.com/c/go/+/221197 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2020-03-20 16:05:33 +00:00
Austin Clements	7de15e362b	runtime: atomically set span state and use as publication barrier When everything is working correctly, any pointer the garbage collector encounters can only point into a fully initialized heap span, since the span must have been initialized before that pointer could escape the heap allocator and become visible to the GC. However, in various cases, we try to be defensive against bad pointers. In findObject, this is just a sanity check: we never expect to find a bad pointer, but programming errors can lead to them. In spanOfHeap, we don't necessarily trust the pointer and we're trying to check if it really does point to the heap, though it should always point to something. Conservative scanning takes this to a new level, since it can only guess that a word may be a pointer and verify this. In all of these cases, we have a problem that the span lookup and check can race with span initialization, since the span becomes visible to lookups before it's fully initialized. Furthermore, we're about to start initializing the span without the heap lock held, which is going to introduce races where accesses were previously protected by the heap lock. To address this, this CL makes accesses to mspan.state atomic, and ensures that the span is fully initialized before setting the state to mSpanInUse. All loads are now atomic, and in any case where we don't trust the pointer, it first atomically loads the span state and checks that it's mSpanInUse, after which it will have synchronized with span initialization and can safely check the other span fields. For #10958, #24543, but a good fix in general. Change-Id: I518b7c63555b02064b98aa5f802c92b758fef853 Reviewed-on: https://go-review.googlesource.com/c/go/+/203286 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2019-10-31 17:09:50 +00:00
Austin Clements	a9b37ae026	runtime: fully initialize span in alloc_m Currently, several important fields of a heap span are set by heapBits.initSpan, which happens after the span has already been published and returned from the locked region of alloc_m. In particular, allocBits is set very late, which makes mspan.isFree unsafe even if you were to lock the heap because it tries to access allocBits. This CL fixes this by populating these fields in alloc_m. The next CL builds on this to only publish the span once it is fully initialized. Together, they'll make it safe to check allocBits even if there is a race with alloc_m. For #10958, #24543, but a good fix in general. Change-Id: I7fde90023af0f497e826b637efa4d19c32840c08 Reviewed-on: https://go-review.googlesource.com/c/go/+/203285 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2019-10-31 17:09:48 +00:00
Cuong Manh Le	66f78e9d88	runtime: mark findObject nosplit findObject takes the pointer argument as uintptr. If the pointer is to the local stack and calling findObject happens to require the stack to be reallocated, then spanOf is called for the old pointer. Marking findObject as nosplit fixes the issue. Fixes #35068 Change-Id: I029d36f9c23f91812f18f98839edf02e0ba4082e Reviewed-on: https://go-review.googlesource.com/c/go/+/202798 Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2019-10-26 00:05:49 +00:00
Cuong Manh Le	813d8e8862	runtime: factor out debug.invalidptr case in findObject This helps keep findObject's frame small. Updates #35068 Change-Id: I1b8c1fcc5831944c86f1a30ed2f2d867a5f2b242 Reviewed-on: https://go-review.googlesource.com/c/go/+/202797 Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2019-10-26 00:05:43 +00:00
Cuong Manh Le	80315322f3	runtime: simplify findObject bad pointer checking condition Factor out case s == nil, make the code cleaner and easier to read. Change-Id: I63f52e14351c0a0d20a611b1fe10fdc0d4947d96 Reviewed-on: https://go-review.googlesource.com/c/go/+/202498 Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2019-10-26 00:05:37 +00:00
Iskander Sharipov	12e63226b9	all: remove commented-out print statements Those print statements are not good debug helpers and only clutter the code. Change-Id: Ifbf450a04e6fa538af68e6352c016728edb4119a Reviewed-on: https://go-review.googlesource.com/c/go/+/160537 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2019-05-05 08:09:30 +00:00
Keith Randall	db16de9203	runtime: remove kindNoPointers We already have the ptrdata field in a type, which encodes exactly the same information that kindNoPointers does. My problem with kindNoPointers is that it often leads to double-negative code like: t.kind & kindNoPointers != 0 Much clearer is: t.ptrdata == 0 Update #27167 Change-Id: I92307d7f018a6bbe3daca4a4abb4225e359349b1 Reviewed-on: https://go-review.googlesource.com/c/go/+/169157 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2019-03-25 20:46:35 +00:00
Tobias Klauser	9e277f7d55	all: use "reports whether" consistently instead of "returns whether" Follow-up for CL 147037 and after Brad noticed the "returns whether" pattern during the review of CL 150621. Go documentation style for boolean funcs is to say: // Foo reports whether ... func Foo() bool (rather than "returns whether") Created with: $ perl -i -npe 's/returns whether/reports whether/' $(git grep -l "returns whether" | grep -v vendor) Change-Id: I15fe9ff99180ad97750cd05a10eceafdb12dc0b4 Reviewed-on: https://go-review.googlesource.com/c/150918 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-12-02 15:12:26 +00:00
Brad Fitzpatrick	3813edf26e	all: use "reports whether" consistently in the few places that didn't Go documentation style for boolean funcs is to say: // Foo reports whether ... func Foo() bool (rather than "returns true if") This CL also replaces 4 uses of "iff" with the same "reports whether" wording, which doesn't lose any meaning, and will prevent people from sending typo fixes when they don't realize it's "if and only if". In the past I think we've had the typo CLs updated to just say "reports whether". So do them all at once. (Inspired by the addition of another "returns true if" in CL 146938 in fd_plan9.go) Created with: $ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true iff" | grep -v vendor) $ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor) Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f Reviewed-on: https://go-review.googlesource.com/c/147037 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2018-11-02 22:47:58 +00:00
Austin Clements	1d09433ec0	runtime: undo manual inlining of mbits.setMarked Since atomic.Or8 is now an intrinsic (and has been for some time), markBits.setMarked is inlinable. Undo the manual inlining of it. Change-Id: I8e37ccf0851ad1d3088d9c8ae0f6f0c439d7eb2d Reviewed-on: https://go-review.googlesource.com/c/138659 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-10-09 16:44:45 +00:00
Keith Randall	9a8372f8bd	cmd/compile,runtime: remove ambiguously live logic The previous CL introduced stack objects. This CL removes the old ambiguously live liveness analysis. After this CL we're relying on stack objects exclusively. Update a bunch of liveness tests to reflect the new world. Fixes #22350 Change-Id: I739b26e015882231011ce6bc1a7f426049e59f31 Reviewed-on: https://go-review.googlesource.com/c/134156 Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-10-03 19:54:16 +00:00
Keith Randall	cbafcc55e8	cmd/compile,runtime: implement stack objects Rework how the compiler+runtime handles stack-allocated variables whose address is taken. Direct references to such variables work as before. References through pointers, however, use a new mechanism. The new mechanism is more precise than the old "ambiguously live" mechanism. It computes liveness at runtime based on the actual references among objects on the stack. Each function records all of its address-taken objects in a FUNCDATA. These are called "stack objects". The runtime then uses that information while scanning a stack to find all of the stack objects on a stack. It then does a mark phase on the stack objects, using all the pointers found on the stack (and ancillary structures, like defer records) as the root set. Only stack objects which are found to be live during this mark phase will be scanned and thus retain any heap objects they point to. A subsequent CL will remove all the "ambiguously live" logic from the compiler, so that the stack object tracing will be required. For this CL, the stack tracing is all redundant with the current ambiguously live logic. Update #22350 Change-Id: Ide19f1f71a5b6ec8c4d54f8f66f0e9a98344772f Reviewed-on: https://go-review.googlesource.com/c/134155 Reviewed-by: Austin Clements <austin@google.com>	2018-10-03 19:52:49 +00:00
Austin Clements	5a8c11ce3e	runtime: rename _MSpan* constants to mSpan* We already aliased mSpanInUse to _MSpanInUse. The dual constants are getting annoying, so fix all of these to use the mSpan* naming convention. This was done automatically with: sed -i -re 's/_?MSpan(Dead|InUse|Manual|Free)/mSpan\1/g' *.go plus deleting the existing definition of mSpanInUse. Change-Id: I09979d9d491d06c10689cea625dc57faa9cc6767 Reviewed-on: https://go-review.googlesource.com/137875 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-09-26 20:51:07 +00:00
Martin Möhrmann	4363c98f62	runtime: do not execute write barrier on newly allocated slice in growslice The new slice created in growslice is cleared during malloc for element types containing pointers and therefore can only contain nil pointers. This change avoids executing write barriers for these nil pointers by adding and using a special bulkBarrierPreWriteSrcOnly function that does not enqueue pointers to slots in dst to the write barrier buffer. Change-Id: If9b18248bfeeb6a874b0132d19520adea593bfc4 Reviewed-on: https://go-review.googlesource.com/115996 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-08-24 08:13:47 +00:00
Austin Clements	b88276da66	runtime: fix bitmap copying corner-cases When an object spans heap arenas, its bitmap is discontiguous, so heapBitsSetType unrolls the bitmap into the object itself and then copies it out to the real heap bitmap. Unfortunately, since this code path is rare, it had two unnoticed bugs related to the head and tail of the bitmap: 1. At the head of the object, we were using hbitp as the destination bitmap pointer rather than h.bitp, but hbitp points into the temporary bitmap space (that is, the object itself), so we were failing to copy the partial bitmap byte at the head of an object. 2. The core copying loop copied all of the full bitmap bytes, but always drove the remaining word count down to 0, even if there was a partial bitmap byte for the tail of the object. As a result, we never wrote partial bitmap bytes at the tail of an object. I found these by enabling out-of-place unrolling all the time. To improve our chances of detecting these sorts of bugs in the future, this CL mimics this by enabling out-of-place mode 50% of the time when doubleCheck is enabled so that we test both in-place and out-of-place mode. Change-Id: I69e5d829fb3444be4cf11f4c6d8462c26dc467e8 Reviewed-on: https://go-review.googlesource.com/110995 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2018-05-21 18:42:00 +00:00
Austin Clements	3080b7d0af	runtime: unify fetching of locals and arguments maps Currently we have two nearly identical copies of the code that fetches the locals and arguments liveness maps for a frame, plus a third that's a poor knock-off. Unify these all into a single function. Change-Id: Ibce7926a0b0e3d23182112da4e25df899579a585 Reviewed-on: https://go-review.googlesource.com/109698 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-05-14 00:20:16 +00:00
Josh Bleecher Snyder	5af0b28a73	runtime: iterate over set bits in adjustpointers There are several things combined in this change. First, eliminate the gobitvector type in favor of adding a ptrbit method to bitvector. In non-performance-critical code, use that method. In performance critical code, though, load the bitvector data one byte at a time and iterate only over set bits. To support that, add and use sys.Ctz8. name old time/op new time/op delta StackCopyPtr-8 81.8ms ± 5% 78.9ms ± 3% -3.58% (p=0.000 n=97+96) StackCopy-8 65.9ms ± 3% 62.8ms ± 3% -4.67% (p=0.000 n=96+92) StackCopyNoCache-8 105ms ± 3% 102ms ± 3% -3.38% (p=0.000 n=96+95) Change-Id: I00b80f45612708bd440b1a411a57fa6dfa24aa74 Reviewed-on: https://go-review.googlesource.com/109716 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2018-04-29 05:24:44 +00:00
Austin Clements	0fd427fda7	runtime: use entry stack map at function entry Currently, when the runtime looks up the stack map for a frame, it uses frame.continpc - 1 unless continpc is the function entry PC, in which case it uses frame.continpc. As a result, if continpc is the function entry point (which happens for deferred frames), it will actually look up the stack map following the first instruction. I think, though I am not positive, that this is always okay today because the first instruction of a function can never change the stack map. It's usually not a CALL, so it doesn't have PCDATA. Or, if it is a CALL, it has to have the entry stack map. But we're about to start emitting stack maps at every instruction that changes them, which means the first instruction can have PCDATA (notably, in leaf functions that don't have a prologue). To prepare for this, tweak how the runtime looks up stack map indexes so that if continpc is the function entry point, it directly uses the entry stack map. For #24543. Change-Id: I85aa818041cd26aff416f7b1fba186e9c8ca6568 Reviewed-on: https://go-review.googlesource.com/109349 Reviewed-by: Rick Hudson <rlh@golang.org>	2018-04-29 00:03:04 +00:00
Zhou Peng	3412baaa02	runtime: fix comment typo This was a typo, judging by the if condition and runtime/mheap.go:323. Change-Id: Id046d4afbfe0ea43cb29e1a9f400e1f130de221d Reviewed-on: https://go-review.googlesource.com/102575 Reviewed-by: Austin Clements <austin@google.com>	2018-03-26 17:40:46 +00:00
Zhou Peng	b77aad0891	runtime: fix typo, func comments should start with function name Change-Id: I289af4884583537639800e37928c22814d38cba9 Reviewed-on: https://go-review.googlesource.com/98115 Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>	2018-03-02 12:03:30 +00:00
Austin Clements	ec25210564	runtime: support a two-level arena map Currently, the heap arena map is a single, large array that covers every possible arena frame in the entire address space. This is practical up to about 48 bits of address space with 64 MB arenas. However, there are two problems with this: 1. mips64, ppc64, and s390x support full 64-bit address spaces (though on Linux only s390x has kernel support for 64-bit address spaces). On these platforms, it would be good to support these larger address spaces. 2. On Windows, processes are charged for untouched memory, so for processes with small heaps, the mostly-untouched 32 MB arena map plus a 64 MB arena are significant overhead. Hence, it would be good to reduce both the arena map size and the arena size, but with a single-level arena, these are inversely proportional. This CL adds support for a two-level arena map. Arena frame numbers are now divided into arenaL1Bits of L1 index and arenaL2Bits of L2 index. At the moment, arenaL1Bits is always 0, so we effectively have a single level map. We do a few things so that this has no cost beyond the current single-level map: 1. We embed the L2 array directly in mheap, so if there's a single entry in the L2 array, the representation is identical to the current representation and there's no extra level of indirection. 2. Hot code that accesses the arena map is structured so that it optimizes to nearly the same machine code as it does currently. 3. We make some small tweaks to hot code paths and to the inliner itself to keep some important functions inlined despite their now-larger ASTs. In particular, this is necessary for heapBitsForAddr and heapBits.next. Possibly as a result of some of the tweaks, this actually slightly improves the performance of the x/benchmarks garbage benchmark: name old time/op new time/op delta Garbage/benchmem-MB=64-12 2.28ms ± 1% 2.26ms ± 1% -1.07% (p=0.000 n=17+19) (https://perf.golang.org/search?q=upload:20180223.2) For #23900. Change-Id: If5164e0961754f97eb9eca58f837f36d759505ff Reviewed-on: https://go-review.googlesource.com/96779 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2018-02-23 21:59:50 +00:00
Austin Clements	33b76920ec	runtime: rename "arena index" to "arena map" There are too many places where I want to talk about "indexing into the arena index". Make this less awkward and ambiguous by calling it the "arena map" instead. Change-Id: I726b0667bb2139dbc006175a0ec09a871cdf73f9 Reviewed-on: https://go-review.googlesource.com/96777 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Rick Hudson <rlh@golang.org>	2018-02-23 21:59:48 +00:00
Austin Clements	ea8d7a370d	runtime: clarify address space limit constants and comments Now that we support the full non-contiguous virtual address space of amd64 hardware, some of the comments and constants related to this are out of date. This renames memLimitBits to heapAddrBits because 1<<memLimitBits is no longer the limit of the address space and rewrites the comment to focus first on hardware limits (which span OSes) and then discuss kernel limits. Second, this eliminates the memLimit constant because there's no longer a meaningful "highest possible heap pointer value" on amd64. Updates #23862. Change-Id: I44b32033d2deb6b69248fb8dda14fc0e65c47f11 Reviewed-on: https://go-review.googlesource.com/95498 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2018-02-21 20:32:36 +00:00
Austin Clements	e9db7b9dd1	runtime: abstract indexing of arena index Accessing the arena index is about to get slightly more complicated. Abstract this away into a set of functions for going back and forth between addresses and arena slice indexes. For #23862. Change-Id: I0b20e74ef47a07b78ed0cf0a6128afe6f6e40f4b Reviewed-on: https://go-review.googlesource.com/95496 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2018-02-21 20:32:34 +00:00
Austin Clements	3e214e5693	runtime: simplify bulkBarrierPreWrite Currently, bulkBarrierPreWrite uses inheap to decide whether the destination is in the heap or whether to check for stack or global data. However, this isn't the best question to ask. Instead, get the span directly and query its state. This lets us directly determine whether this might be a global, or is stack memory, or is heap memory. At this point, inheap is no longer used in the hot path, so drop it from the must-be-inlined list and substitute spanOf. This will help in a circuitous way with #23862, since fixing that is going to push inheap very slightly over the inline-able threshold on a few platforms. Change-Id: I5360fc1181183598502409f12979899e1e4d45f7 Reviewed-on: https://go-review.googlesource.com/95495 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2018-02-21 20:32:33 +00:00
Austin Clements	45ffeab549	runtime: eliminate most uses of mheap_.arena_* This replaces all uses of the mheap_.arena_* fields outside of mallocinit and sysAlloc. These fields fundamentally assume a contiguous heap between two bounds, so eliminating these is necessary for a sparse heap. Many of these are replaced with checks for non-nil spans at the test address (which in turn checks for a non-nil entry in the heap arena array). Some of them are just for debugging and somewhat meaningless with a sparse heap, so those we just delete. Updates #10460. Change-Id: I8345b95ffc610aed694f08f74633b3c63506a41f Reviewed-on: https://go-review.googlesource.com/85886 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2018-02-15 21:12:22 +00:00
Austin Clements	c0392d2e7f	runtime: make the heap bitmap sparse This splits the heap bitmap into separate chunks for every 64MB of the heap and introduces an index mapping from virtual address to metadata. It modifies the heapBits abstraction to use this two-level structure. Finally, it modifies heapBitsSetType to unroll the bitmap into the object itself and then copy it out if the bitmap would span discontiguous bitmap chunks. This is a step toward supporting general sparse heaps, which will eliminate address space conflict failures as well as the limit on the heap size. It's also advantageous for 32-bit. 32-bit already supports discontiguous heaps by always starting the arena at address 0. However, as a result, with a contiguous bitmap, if the kernel chooses a high address (near 2GB) for a heap mapping, the runtime is forced to map up to 128MB of heap bitmap. Now the runtime can map sections of the bitmap for just the parts of the address space used by the heap. Updates #10460. This slightly slows down the x/garbage and compilebench benchmarks. However, I think the slowdown is acceptably small. name old time/op new time/op delta Template 178ms ± 1% 180ms ± 1% +0.78% (p=0.029 n=10+10) Unicode 85.7ms ± 2% 86.5ms ± 2% ~ (p=0.089 n=10+10) GoTypes 594ms ± 0% 599ms ± 1% +0.70% (p=0.000 n=9+9) Compiler 2.86s ± 0% 2.87s ± 0% +0.40% (p=0.001 n=9+9) SSA 7.23s ± 2% 7.29s ± 2% +0.94% (p=0.029 n=10+10) Flate 116ms ± 1% 117ms ± 1% +0.99% (p=0.000 n=9+9) GoParser 146ms ± 1% 146ms ± 0% ~ (p=0.193 n=10+7) Reflect 399ms ± 0% 403ms ± 1% +0.89% (p=0.001 n=10+10) Tar 173ms ± 1% 174ms ± 1% +0.91% (p=0.013 n=10+9) XML 208ms ± 1% 210ms ± 1% +0.93% (p=0.000 n=10+10) [Geo mean] 368ms 371ms +0.79% name old time/op new time/op delta Garbage/benchmem-MB=64-12 2.17ms ± 1% 2.21ms ± 1% +2.15% (p=0.000 n=20+20) Change-Id: I037fd283221976f4f61249119d6b97b100bcbc66 Reviewed-on: https://go-review.googlesource.com/85883 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2018-02-15 21:12:18 +00:00
Austin Clements	f61057c497	runtime: fix various contiguous bitmap assumptions There are various places that assume the heap bitmap is contiguous and scan it sequentially. We're about to split up the heap bitmap. This commit modifies all of these except heapBitsSetType to use the heapBits abstractions so they can transparently switch to a discontiguous bitmap. Updates #10460. This is a step toward supporting sparse heaps. Change-Id: I2f3994a5785e4dccb66602fb3950bbd290d9392c Reviewed-on: https://go-review.googlesource.com/85882 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2018-02-15 21:12:17 +00:00
Austin Clements	29e9c4d4a4	runtime: lay out heap bitmap forward in memory Currently the heap bitmap is laid out in reverse order in memory relative to the heap itself. This was originally done out of "excessive cleverness" so that computing a bitmap pointer could load only the arena_start field and so that heaps could be more contiguous by growing the arena and the bitmap out from a common center point. However, this appears to have no actual performance benefit, it complicates nearly every use of the bitmap, and it makes already confusing code more confusing. Furthermore, it's still possible to use a single field (the new bitmap_delta) for the bitmap pointer computation by employing slightly different excessive cleverness. Hence, this CL puts the bitmap into forward order. This is a (very) updated version of CL 9404. Change-Id: I743587cc626c4ecd81e660658bad85b54584108c Reviewed-on: https://go-review.googlesource.com/85881 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2018-02-15 21:12:16 +00:00
Austin Clements	4de468621a	runtime: use spanOf* more widely The logic in the spanOf* functions is open-coded in a lot of places right now. Replace these with calls to the spanOf* functions. Change-Id: I3cc996aceb9a529b60fea7ec6fef22008c012978 Reviewed-on: https://go-review.googlesource.com/85880 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2018-02-15 21:12:15 +00:00
Austin Clements	058bb7ea27	runtime: split object finding out of heapBitsForObject heapBitsForObject does two things: it finds the base of the object and it creates the heapBits for the base of the object. There are several places where we just care about the base of the object. Furthermore, greyobject only needs the heapBits in the checkmark path and can easily compute them only when needed. Once we eliminate passing the heap bits to greyobject, almost all uses of heapBitsForObject don't need the heap bits. Hence, this splits heapBitsForObject into findObject and heapBitsForAddr (the latter already exists), removes the hbits argument to greyobject, and replaces all heapBitsForObject calls with calls to findObject. In addition to making things cleaner overall, heapBitsForAddr is going to get more expensive shortly, so it's important that we don't do it needlessly. Note that there's an interesting performance pitfall here. I had originally moved findObject to mheap.go, since it made more sense there. However, that leads to a ~2% slow down and a whopping 11% increase in L1 icache misses on both the x/garbage and compilebench benchmarks. This suggests we may want to be more principled about this, but, for now, let's just leave findObject in mbitmap.go. (I tried to make findObject small enough to inline by splitting out the error case, but, sadly, wasn't quite able to get it under the inlining budget.) Change-Id: I7bcb92f383ade565d22a9f2494e4c66fd513fb10 Reviewed-on: https://go-review.googlesource.com/85878 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2018-02-15 21:12:13 +00:00
Austin Clements	41e6abdc61	runtime: replace mlookup and findObject with heapBitsForObject These functions all serve essentially the same purpose. mlookup is used in only one place and findObject in only three. Use heapBitsForObject instead, which is the most optimized implementation. (This may seem slightly silly because none of these uses care about the heap bits, but we're about to split up the functionality of heapBitsForObject anyway. At that point, findObject will rise from the ashes.) Change-Id: I906468c972be095dd23cf2404a7d4434e802f250 Reviewed-on: https://go-review.googlesource.com/85877 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2018-02-15 21:12:12 +00:00
Austin Clements	245310883d	runtime: eliminate all writebarrierptr* calls Calls to writebarrierptr can simply be actual pointer writes. Calls to writebarrierptr_prewrite need to go through the write barrier buffer. Updates #22460. Change-Id: I92cee4da98c5baa499f1977563757c76f95bf0ca Reviewed-on: https://go-review.googlesource.com/92704 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Rick Hudson <rlh@golang.org>	2018-02-13 16:34:45 +00:00
Austin Clements	3675bff55d	runtime: mark heapBits.bits nosplit heapBits.bits is used during bulkBarrierPreWrite via heapBits.isPointer, which means it must not be preempted. If it is preempted, several bad things can happen: 1. This could allow a GC phase change, and the resulting shear between the barriers and the memory writes could result in a lost pointer. 2. Since bulkBarrierPreWrite uses the P's local write barrier buffer, if it also migrates to a different P, it could try to append to the write barrier buffer concurrently with another write barrier. This can result in the buffer's next pointer skipping over its end pointer, which results in a buffer overflow that can corrupt arbitrary other fields in the Ps (or anything in the heap, really, but it'll probably crash from the corrupted P quickly). Fix this by marking heapBits.bits go:nosplit. This would be the perfect use for a recursive no-preempt annotation (#21314). This doesn't actually affect any binaries because this function was always inlined anyway. (I discovered it when I was modifying heapBits and made h.bits() no longer inline, which led to rampant crashes from problem 2 above.) Updates #22987 and #22988 (but doesn't fix because it doesn't actually change the generated code). Change-Id: I60ebb928b1233b0613361ac3d0558d7b1cb65610 Reviewed-on: https://go-review.googlesource.com/83015 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Rick Hudson <rlh@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-12-11 14:51:36 +00:00
Austin Clements	877387e38a	runtime: use buffered write barrier for bulkBarrierPreWrite This modifies bulkBarrierPreWrite to use the buffered write barrier instead of the eager write barrier. This reduces the number of system stack switches and sanity checks by a factor of the buffer size (currently 256). This affects both typedmemmove and typedmemclr. Since this is purely a runtime change, it applies to all arches (unlike the pointer write barrier). name old time/op new time/op delta BulkWriteBarrier-12 7.33ns ± 6% 4.46ns ± 9% -39.10% (p=0.000 n=20+19) Updates #22460. Change-Id: I6a686a63bbf08be02b9b97250e37163c5a90cdd8 Reviewed-on: https://go-review.googlesource.com/73832 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2017-10-30 18:12:54 +00:00
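The "factor of the buffer size" claim in the commit above can be illustrated with a minimal sketch. This is not the runtime's code: `wbBuf`, `put`, and `flush` are hypothetical names, and `flush` merely counts how often the expensive step (the system stack switch and sanity checks, in the commit's terms) would run. Only the buffer size of 256 comes from the commit message.

```go
package main

import "fmt"

const bufSize = 256 // buffer size quoted in the commit message

// wbBuf is a hypothetical stand-in for a per-P write barrier buffer.
type wbBuf struct {
	buf     [bufSize]uintptr
	next    int
	flushes int // number of times the expensive flush step ran
}

// put records one pointer and flushes only when the buffer fills.
func (b *wbBuf) put(p uintptr) {
	b.buf[b.next] = p
	b.next++
	if b.next == len(b.buf) {
		b.flush()
	}
}

// flush stands in for the expensive work that an eager barrier
// would perform on every single pointer write.
func (b *wbBuf) flush() {
	b.flushes++
	b.next = 0
}

func main() {
	var b wbBuf
	for i := 0; i < 1024; i++ {
		b.put(uintptr(i))
	}
	fmt.Println(b.flushes) // 4 flushes for 1024 records
}
```

An eager barrier would pay the expensive step 1024 times for the same workload; the buffered version pays it once per 256 records.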
Martin Möhrmann	7045e6f6c4	runtime: remove unused prefetch functions The only non-test user of the assembler prefetch functions is the heapBits.prefetch function which is itself unused. The runtime prefetch functions have no functionality on most platforms and are not inlineable since they are written in assembler. The function call overhead eliminates the performance gains that could be achieved with prefetching and would degrade performance for platforms where the functions are no-ops. If prefetch functions are needed again later they can be improved by avoiding the function call overhead and implementing them as intrinsics. Change-Id: I52c553cf3607ffe09f0441c6e7a0a818cb21117d Reviewed-on: https://go-review.googlesource.com/44370 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-08-08 06:43:49 +00:00
Ilya Tocar	a4ee95c805	runtime: avoid division in gc Replace int division with (cheaper) byte division in heapBitsSetType. Provides noticeable speed-up: GrowSlicePtr-6 181ns ± 3% 169ns ± 3% -6.85% (p=0.000 n=10+10) Change-Id: I4064bb72e8e692023783b8f58d19491844c39382 Reviewed-on: https://go-review.googlesource.com/42290 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2017-06-05 16:32:02 +00:00
Austin Clements	29e88d5130	runtime: print debug info on "base out of range" This adds debugging information when we panic with "heapBitsForSpan: base out of range". Updates #20259. Change-Id: I0dc1a106aa9e9531051c7d08867ace5ef230eb3f Reviewed-on: https://go-review.googlesource.com/43310 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-05-12 14:28:15 +00:00
Austin Clements	8e25d4ccef	runtime: eliminate heapBitsSetTypeNoScan It's no longer necessary to maintain the bitmap of noscan objects since we now use the span metadata to determine that they're noscan instead of the bitmap. The combined effect of segregating noscan spans and the follow-on optimizations is roughly a 1% improvement in performance across the go1 benchmarks and the x/benchmarks, with no increase in heap size. Benchmark details: https://perf.golang.org/search?q=upload:20170420.1 name old time/op new time/op delta Garbage/benchmem-MB=64-12 2.27ms ± 0% 2.25ms ± 1% -0.96% (p=0.000 n=15+18) name old time/op new time/op delta BinaryTree17-12 2.53s ± 2% 2.55s ± 1% +0.68% (p=0.001 n=17+16) Fannkuch11-12 3.02s ± 0% 3.01s ± 0% -0.15% (p=0.000 n=16+16) FmtFprintfEmpty-12 47.1ns ± 7% 47.0ns ± 5% ~ (p=0.886 n=20+17) FmtFprintfString-12 73.6ns ± 3% 73.8ns ± 1% +0.30% (p=0.026 n=19+17) FmtFprintfInt-12 80.3ns ± 2% 80.2ns ± 1% ~ (p=0.994 n=20+18) FmtFprintfIntInt-12 124ns ± 0% 124ns ± 0% ~ (all samples are equal) FmtFprintfPrefixedInt-12 172ns ± 1% 171ns ± 1% -0.72% (p=0.003 n=20+18) FmtFprintfFloat-12 217ns ± 1% 216ns ± 1% -0.27% (p=0.019 n=18+19) FmtManyArgs-12 490ns ± 1% 488ns ± 0% -0.36% (p=0.014 n=18+18) GobDecode-12 6.71ms ± 1% 6.73ms ± 1% +0.42% (p=0.000 n=20+20) GobEncode-12 5.25ms ± 0% 5.24ms ± 0% -0.20% (p=0.001 n=18+20) Gzip-12 227ms ± 0% 226ms ± 1% ~ (p=0.107 n=20+19) Gunzip-12 38.8ms ± 0% 38.8ms ± 0% ~ (p=0.221 n=19+18) HTTPClientServer-12 75.4µs ± 1% 76.3µs ± 1% +1.26% (p=0.000 n=20+19) JSONEncode-12 14.7ms ± 0% 14.7ms ± 1% -0.14% (p=0.002 n=18+17) JSONDecode-12 57.6ms ± 0% 55.2ms ± 0% -4.13% (p=0.000 n=19+19) Mandelbrot200-12 3.73ms ± 0% 3.73ms ± 0% -0.09% (p=0.000 n=19+17) GoParse-12 3.18ms ± 1% 3.15ms ± 1% -0.90% (p=0.000 n=18+20) RegexpMatchEasy0_32-12 73.3ns ± 2% 73.2ns ± 1% ~ (p=0.994 n=20+18) RegexpMatchEasy0_1K-12 236ns ± 2% 234ns ± 1% -0.70% (p=0.002 n=19+17) RegexpMatchEasy1_32-12 69.7ns ± 2% 69.9ns ± 2% ~ (p=0.416 n=20+20) RegexpMatchEasy1_1K-12 366ns ± 1% 365ns ± 1% ~ (p=0.376 n=19+17) RegexpMatchMedium_32-12 109ns ± 1% 108ns ± 1% ~ (p=0.461 n=17+18) RegexpMatchMedium_1K-12 35.2µs ± 1% 35.2µs ± 3% ~ (p=0.238 n=19+20) RegexpMatchHard_32-12 1.77µs ± 1% 1.77µs ± 1% +0.33% (p=0.007 n=17+16) RegexpMatchHard_1K-12 53.2µs ± 0% 53.3µs ± 0% +0.26% (p=0.001 n=17+17) Revcomp-12 1.13s ±117% 0.87s ±184% ~ (p=0.813 n=20+19) Template-12 63.9ms ± 1% 64.6ms ± 1% +1.18% (p=0.000 n=19+20) TimeParse-12 313ns ± 5% 312ns ± 0% ~ (p=0.114 n=20+19) TimeFormat-12 336ns ± 0% 333ns ± 0% -0.97% (p=0.000 n=18+16) [Geo mean] 50.6µs 50.1µs -1.04% This is a cherry-pick of dev.garbage commit `edb54c300f`, with updated benchmark results. Change-Id: Ic77faaa15cdac3bfbbb0032dde5c204e05a0fd8e Reviewed-on: https://go-review.googlesource.com/41253 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2017-04-28 22:50:37 +00:00
Austin Clements	c44d031bf0	runtime: eliminate heapBits.hasPointers This is no longer necessary now that we can more efficiently consult the span's noscan bit. This is a cherry-pick of dev.garbage commit `312aa09996`. Change-Id: Id0b00b278533660973f45eb6efa5b00f373d58af Reviewed-on: https://go-review.googlesource.com/41252 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2017-04-28 22:50:34 +00:00
Austin Clements	1a033b1a70	runtime: separate spans of noscan objects Currently, we mix objects with pointers and objects without pointers ("noscan" objects) together in memory. As a result, for every object we grey, we have to check that object's heap bits to find out if it's noscan, which adds to the per-object cost of GC. This also hurts the TLB footprint of the garbage collector because it decreases the density of scannable objects at the page level. This commit improves the situation by using separate spans for noscan objects. This will allow a much simpler noscan check (in a follow-up CL), eliminate the need to clear the bitmap of noscan objects (in a follow-up CL), and improve TLB footprint by increasing the density of scannable objects. This is also a step toward eliminating dead bits, since the current noscan check depends on checking the dead bit of the first word. This has no effect on the heap size of the garbage benchmark. We'll measure the performance change of this after the follow-up optimizations. This is a cherry-pick from dev.garbage commit `d491e550c3`. The only non-trivial merge conflict was in updatememstats in mstats.go, where we now have to separate the per-spanclass stats from the per-sizeclass stats. Change-Id: I13bdc4869538ece5649a8d2a41c6605371618e40 Reviewed-on: https://go-review.googlesource.com/41251 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2017-04-28 22:50:31 +00:00
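One way to picture the "per-spanclass" versus "per-sizeclass" distinction the commit above mentions is a span class that carries both a size class and a noscan flag. The sketch below is an illustration only: the bit layout (size class in the upper bits, noscan in the low bit) and the helper names are assumptions, not taken from the commit message.

```go
package main

import "fmt"

// spanClass packs a size class and a noscan flag into one value.
// The encoding used here is an assumption for illustration.
type spanClass uint8

// makeSpanClass combines a size class (must be < 128 in this sketch)
// with a flag saying whether objects in the span contain no pointers.
func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
	sc := spanClass(sizeclass << 1)
	if noscan {
		sc |= 1
	}
	return sc
}

// sizeclass recovers the size class, ignoring the noscan flag.
func (sc spanClass) sizeclass() uint8 { return uint8(sc >> 1) }

// noscan reports whether the span holds pointer-free objects, without
// consulting any per-object bitmap.
func (sc spanClass) noscan() bool { return sc&1 != 0 }

func main() {
	withPtrs := makeSpanClass(5, false)
	noPtrs := makeSpanClass(5, true)
	// Same size class, but the two kinds of object land in different spans.
	fmt.Println(withPtrs != noPtrs, withPtrs.sizeclass() == noPtrs.sizeclass())
	fmt.Println(noPtrs.noscan())
}
```

With such a scheme, the noscan check becomes a single bit test on span metadata rather than a heap bitmap lookup per object.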
Austin Clements	42c1214762	runtime: eliminate write barriers from alloc/mark bitmaps This introduces a new type, gcBits, to use for alloc/mark bitmap allocations instead of uint8. This type is marked go:notinheap, so uses of it correctly eliminate write barriers. Since we now have a type, this also extracts some common operations to methods both for convenience and to avoid (*uint8) casts at most use sites. For #19325. Change-Id: Id51f734fb2e96b8b7715caa348c8dcd4aef0696a Reviewed-on: https://go-review.googlesource.com/38580 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2017-04-13 18:20:42 +00:00
Austin Clements	8fbaa4f70b	runtime: rename _MSpanStack -> _MSpanManual We're about to generalize _MSpanStack to be used for other forms of in-heap manual memory management in the runtime. This is an automated rename of _MSpanStack to _MSpanManual plus some comment fix-ups. For #19325. Change-Id: I1e20a57bb3b87a0d324382f92a3e294ffc767395 Reviewed-on: https://go-review.googlesource.com/38574 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2017-04-13 18:20:30 +00:00
Austin Clements	6c6f455f88	runtime: consolidate changes to arena_used Changing mheap_.arena_used requires several steps that are currently repeated multiple times in mheap_.sysAlloc. Consolidate these into a single function. In the future, this will also make it easier to add other auxiliary VM structures. Change-Id: Ie68837d2612e1f4ba4904acb1b6b832b15431d56 Reviewed-on: https://go-review.googlesource.com/40151 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-04-11 01:35:47 +00:00
Keith Randall	d5dc490519	cmd/compile: intrinsics for math/bits.TrailingZerosX Implement math/bits.TrailingZerosX using intrinsics. Generally reorganize the intrinsic spec a bit. The intrinsics data structure is now built at init time. This will make doing the other functions in math/bits easier. Update sys.CtzX to return int instead of uint{64,32} so it matches math/bits.TrailingZerosX. Improve the intrinsics a bit for amd64. We don't need the CMOV for <64 bit versions. Update #18616 Change-Id: Ic1c5339c943f961d830ae56f12674d7b29d4ff39 Reviewed-on: https://go-review.googlesource.com/38155 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-03-16 02:44:16 +00:00
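For readers unfamiliar with the functions named in the commit above, here is a short usage sketch of the public math/bits API. It is not code from the commit; `lowestSetBit` is a hypothetical helper. Note that the functions return `int`, the return type the commit says sys.CtzX was changed to match.

```go
package main

import (
	"fmt"
	"math/bits"
)

// lowestSetBit is a hypothetical helper: it reports the index of the
// lowest set bit in x, or -1 if x is zero.
func lowestSetBit(x uint64) int {
	if x == 0 {
		return -1
	}
	return bits.TrailingZeros64(x)
}

func main() {
	fmt.Println(lowestSetBit(8))         // 3: binary 1000 has three trailing zeros
	fmt.Println(bits.TrailingZeros32(0)) // 32: the result for zero is the bit width
}
```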
Austin Clements	b992c2649e	runtime: print SP/FP on bad pointer crashes If the bad pointer is on a stack, this makes it possible to find the frame containing the bad pointer. Change-Id: Ieda44e054aa9ebf22d15d184457c7610b056dded Reviewed-on: https://go-review.googlesource.com/37858 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-03-07 20:46:54 +00:00
Austin Clements	0efc8b2188	runtime: avoid repeated findmoduledatap calls Currently almost every function that deals with a _func has to first look up the moduledata for the module containing the function's entry point. This means we almost always do at least two identical module lookups whenever we deal with a _func (one to get the _func and another to get something from its module data) and sometimes several more. Fix this by making findfunc return a new funcInfo type that embeds _func, but also includes the moduledata, and making all of the functions that currently take a _func instead take a funcInfo and use the already-found moduledata. This transformation is trivial for the most part, since the *_func type is usually inferred. The annoying part is that we can no longer use nil to indicate failure, so this introduces a funcInfo.valid() method and replaces nil checks with calls to valid. Change-Id: I9b8075ef1c31185c1943596d96dec45c7ab5100f Reviewed-on: https://go-review.googlesource.com/37331 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>	2017-03-06 19:17:24 +00:00
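The shape of the change described in the commit above can be sketched as follows. This is a simplified model, not the runtime's code: `fn` and `module` are hypothetical stand-ins for `_func` and `moduledata`, and the linear search is only a placeholder for the real lookup. What it illustrates is the pattern the message describes: a value type that embeds the function pointer, carries the module alongside it, and uses `valid()` where callers previously compared against nil.

```go
package main

import "fmt"

// fn and module are simplified stand-ins for the runtime's _func and
// moduledata; their fields are hypothetical.
type fn struct {
	entry uintptr
	name  string
}

type module struct {
	name  string
	funcs []fn
}

// funcInfo pairs a function record with the module it was found in,
// so callers never need a second module lookup.
type funcInfo struct {
	*fn
	mod *module
}

// valid replaces the nil check callers used on the bare pointer.
func (f funcInfo) valid() bool { return f.fn != nil }

// findfunc searches the modules once and returns both results together.
func findfunc(mods []*module, pc uintptr) funcInfo {
	for _, m := range mods {
		for i := range m.funcs {
			if m.funcs[i].entry == pc {
				return funcInfo{&m.funcs[i], m}
			}
		}
	}
	return funcInfo{} // zero value signals failure
}

func main() {
	mods := []*module{{name: "main", funcs: []fn{{entry: 0x1000, name: "main.main"}}}}
	if f := findfunc(mods, 0x1000); f.valid() {
		// Embedded fields are promoted, so f.name works as it did on *fn.
		fmt.Println(f.name, "in module", f.mod.name)
	}
	fmt.Println(findfunc(mods, 0x2000).valid())
}
```

Because the embedded pointer's fields are promoted, most call sites that read fields keep compiling unchanged, which fits the commit's remark that the transformation is mostly trivial.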
Austin Clements	b50b728587	runtime: simplify sweep allocation counting Currently sweep counts the number of allocated objects, computes the number of free objects from that, then re-computes the number of allocated objects from that. Simplify and clean this up by skipping these intermediate steps. Change-Id: I3ed98e371eb54bbcab7c8530466c4ab5fde35f0a Reviewed-on: https://go-review.googlesource.com/34935 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com> Reviewed-by: Rick Hudson <rlh@golang.org>	2017-03-03 17:02:16 +00:00

139 Commits