Commit Graph

62 Commits

Author SHA1 Message Date
Andy Pan 4c2b1e0feb runtime: migrate internal/atomic to internal/runtime
For #65355

Change-Id: I65dd090fb99de9b231af2112c5ccb0eb635db2be
Reviewed-on: https://go-review.googlesource.com/c/go/+/560155
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ibrahim Bazoka <ibrahimbazoka729@gmail.com>
Auto-Submit: Emmanuel Odeke <emmanuel@orijtech.com>
2024-03-25 19:53:03 +00:00
Michael Anthony Knyszek f119abb65d runtime: refactor runtime->tracer API to appear more like a lock
Currently the execution tracer synchronizes with itself using very
heavyweight operations. As a result, it's totally fine for most of the
tracer code to look like:

    if traceEnabled() {
	traceXXX(...)
    }

However, if we want to make that synchronization more lightweight (as
issue #60773 proposes), then this is insufficient. In particular, we
need to make sure the tracer can't observe an inconsistency between g
atomicstatus and the event that would be emitted for a particular
g transition. This means making the g status change appear to happen
atomically with the corresponding trace event being written out from the
perspective of the tracer.

This requires a change in API to something more like a lock. While we're
here, we might as well make sure that trace events can *only* be emitted
while this lock is held. This change introduces such an API:
traceAcquire, which returns a value that can emit events, and
traceRelease, which requires the value that was returned by
traceAcquire. In practice, this won't be a real lock, it'll be more like
a seqlock.

For the current tracer, this API is completely overkill and the value
returned by traceAcquire basically just checks trace.enabled. But it's
necessary for the tracer described in #60773 and we can implement that
more cleanly if we do this refactoring now instead of later.
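
For illustration, the resulting usage pattern looks roughly like the
following; treat it as a hedged sketch (variable and event method
names are illustrative, only traceAcquire/traceRelease come from this
change):

    if tl := traceAcquire(); tl.ok() {
        tl.GoUnpark(gp, 0) // emit the event while the trace "lock" is held
        traceRelease(tl)
    }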

For #60773.

Change-Id: Ibb9ff5958376339fafc2b5180aef65cf2ba18646
Reviewed-on: https://go-review.googlesource.com/c/go/+/515635
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-11-09 22:34:25 +00:00
Cherry Mui 340a4f55c4 runtime: use smaller fields for mspan.freeindex and nelems
mspan.freeindex and nelems can fit into uint16 for all possible
values. Use uint16 instead of uintptr.

Change-Id: Ifce20751e81d5022be1f6b5cbb5fbe4fd1728b1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/451359
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-02 20:39:21 +00:00
Michael Anthony Knyszek 8992bb19ad runtime: replace trace.enabled with traceEnabled
[git-generate]
cd src/runtime
grep -l 'trace\.enabled' *.go | grep -v "trace.go" | xargs sed -i 's/trace\.enabled/traceEnabled()/g'

Change-Id: I14c7821c1134690b18c8abc0edd27abcdabcad72
Reviewed-on: https://go-review.googlesource.com/c/go/+/494181
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-05-11 21:27:08 +00:00
Michael Knyszek 2b59307ac2 Revert "runtime: delay incrementing freeindex in malloc"
This reverts commit bed2b7cf41.

Reason for revert: I clicked submit by accident on the wrong CL.

Change-Id: Iddf128cb62f289d472510eb30466e515068271b2
Reviewed-on: https://go-review.googlesource.com/c/go/+/449501
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-11-14 22:09:16 +00:00
Cherry Mui bed2b7cf41 runtime: delay incrementing freeindex in malloc
When the GC is scanning some memory (possibly conservatively) and
finds a pointer while, concurrently, another goroutine is allocating
an object at that same address, the GC may see the pointer before the
object and/or the heap bits are initialized. This may cause the GC to
see bad pointers and possibly crash.

To prevent this, we ensure that the scanner can only see the object
as allocated after the object and the heap bits are initialized. As
the scanner uses the freeindex to determine whether an object is
allocated, we delay the increment of freeindex until after the
initialization.

Currently, in some code paths, finding the next free index and
updating freeindex to a slot past it are coupled, so this needs a
small refactoring. In the new code, mspan.nextFreeIndex returns the
next free index but does not update it (although allocCache is
updated); mallocgc updates freeindex at a later time.
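
A rough sketch of the resulting order in mallocgc (not the actual
code; the helper below is illustrative):

    idx := span.nextFreeIndex()      // find a free slot; allocCache advances, freeindex does not
    x := span.base() + uintptr(idx)*span.elemsize
    publishObject(x)                 // zero the object and write its heap bits first...
    span.freeindex = idx + 1         // ...only now can the scanner observe the slot as allocated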

Fixes #54596.

Change-Id: I6dd5ccf743f2d2c46a1ed67c6a8237fe09a71260
Reviewed-on: https://go-review.googlesource.com/c/go/+/427619
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2022-11-11 18:33:54 +00:00
Michael Anthony Knyszek 7866538d25 runtime: add safe arena support to the runtime
This change adds an API to the runtime for arenas. A later CL can
potentially export it as an experimental API, but for now, just the
runtime implementation will suffice.

The purpose of arenas is to improve efficiency, primarily by allowing
for an application to manually free memory, thereby delaying garbage
collection. It comes with other potential performance benefits, such as
better locality, a better allocation strategy, and better handling of
interior pointers by the GC.
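
For context, the idea is roughly what the later experimental arena
package (GOEXPERIMENT=arenas) exposes; this is only an illustration,
not the runtime-internal API added here:

    import "arena"

    func demo() {
        a := arena.NewArena()
        x := arena.New[int](a) // allocate inside the arena
        *x = 42
        a.Free() // manually free the whole arena, ahead of the GC
    }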

This implementation is based on one by danscales@google.com with a few
significant differences:
* The implementation lives entirely in the runtime (all layers).
* Arena chunks are the minimum of 8 MiB or the heap arena size. This
  choice is made because, in practice, 64 MiB appears to be far too
  large an area for most real-world use cases.
* Arena chunks are not unmapped, instead they're placed on an evacuation
  list and when there are no pointers left pointing into them, they're
  allowed to be reused.
* Reusing partially-used arena chunks no longer tries to find one used
  by the same P first; it just takes the first one available.
* In order to ensure worst-case fragmentation is never worse than 25%,
  only types and slice backing stores whose sizes are 1/4th the size of
  a chunk or less may be used. Previously larger sizes, up to the size
  of the chunk, were allowed.
* ASAN, MSAN, and the race detector are fully supported.
* Arena chunks whose faulting was deferred are set to fault at the end
  of mark termination (a non-public patch once did this; I don't see a
  reason not to continue that).

For #51317.

Change-Id: I83b1693a17302554cb36b6daa4e9249a81b1644f
Reviewed-on: https://go-review.googlesource.com/c/go/+/423359
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-10-12 20:23:30 +00:00
Cuong Manh Le a719a78c1b runtime: add and use runtime/internal/sys.NotInHeap
Updates #46731

Change-Id: Ic2208c8bb639aa1e390be0d62e2bd799ecf20654
Reviewed-on: https://go-review.googlesource.com/c/go/+/421878
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2022-08-19 00:29:18 +00:00
Keith Randall 6a9c674a09 runtime: redo heap bitmap
[this is a retry of CL 407035 + its revert CL 422395. The content is unchanged]

Use just 1 bit per word to record the ptr/nonptr bitmap.
Use word-sized operations to manipulate the bitmap, so we can operate
on up to 64 ptr/nonptr bits at a time.

Use a separate bitmap, one bit per word of the ptr/nonptr bitmap,
to encode a no-more-pointers signal. Since we can check 64 ptr/nonptr
bits at once, knowing the exact last pointer location is not necessary.
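
A sketch of what the word-sized operations buy: one 64-bit load of
the ptr/nonptr bitmap describes 64 heap words, and its set bits can
be walked cheaply (the helper below is illustrative, not runtime
code):

    import "math/bits"

    // ptrMask holds one ptr/nonptr bit per heap word for a 64-word region.
    func scanRegion(base uintptr, ptrMask uint64, scan func(addr uintptr)) {
        for b := ptrMask; b != 0; b &= b - 1 {
            i := bits.TrailingZeros64(b) // index of the next pointer-containing word
            scan(base + uintptr(i)*8)    // 8 = pointer size on 64-bit platforms
        }
    }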

As a followon CL, we should make the gcdata bitmap an array of
uintptr instead of an array of byte, so we can load 64 bits of it at once.
Similarly for the processing of gc programs.

Change-Id: Ica5eb622f5b87e647be64f471d67b02732ef8be6
Reviewed-on: https://go-review.googlesource.com/c/go/+/422634
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2022-08-16 20:39:36 +00:00
Keith Randall ad0287f496 Revert "runtime: redo heap bitmap"
This reverts commit b589208c8c.

Reason for revert: Bug somewhere in this code, causing wasm and maybe linux/386 to fail.

Change-Id: I5e1e501d839584e0219271bb937e94348f83c11f
Reviewed-on: https://go-review.googlesource.com/c/go/+/422395
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-09 16:10:10 +00:00
Keith Randall b589208c8c runtime: redo heap bitmap
Use just 1 bit per word to record the ptr/nonptr bitmap.
Use word-sized operations to manipulate the bitmap, so we can operate
on up to 64 ptr/nonptr bits at a time.

Use a separate bitmap, one bit per word of the ptr/nonptr bitmap,
to encode a no-more-pointers signal. Since we can check 64 ptr/nonptr
bits at once, knowing the exact last pointer location is not necessary.

This cleans up the bitmap implementation significantly, which will
hopefully make it faster. TODO: measure

As a followon CL, we should make the gcdata bitmap an array of
uintptr instead of an array of byte, so we can load 64 bits of it at once.
Similarly for the processing of gc programs.

Change-Id: I18151b1876d9543599800dec51e2a1b19df97d49
Reviewed-on: https://go-review.googlesource.com/c/go/+/407035
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2022-08-08 16:57:33 +00:00
Michael Anthony Knyszek 903f313d8b runtime: fix unclosed GCSweepStart trace event
CL 333389 erroneously moved traceGCSweepDone inside the sl.valid block
that it introduced in mcentral.cacheSpan, when it should have left it
outside that scope, because the trace event is created unconditionally
at the top of the method.

Fixes #49231.

Change-Id: If719c05063d4f4d330a320da6dc00d1e9d8102e4
Reviewed-on: https://go-review.googlesource.com/c/go/+/359775
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-10-29 18:34:24 +00:00
Michael Anthony Knyszek d8fc7f785e runtime: clean up allocation zeroing
Currently, the runtime zeroes allocations in several ways. First, small
object spans are always zeroed if they come from mheap, and their slots
are zeroed later in mallocgc if needed. Second, large object spans
(objects that have their own spans) plumb the need for zeroing down into
mheap. Third, large objects that have no pointers have their zeroing
delayed until after preemption is reenabled, but before returning in
mallocgc.

All of this has two consequences:
1. Spans for small objects that come from mheap are sometimes
   unnecessarily zeroed, even if the mallocgc call that created them
   doesn't need the object slot to be zeroed.
2. This is all messy and difficult to reason about.

This CL simplifies this code, resolving both (1) and (2). First, it
recognizes that zeroing in mheap is unnecessary for small object spans;
mallocgc and its callees in mcache and mcentral, by design, are *always*
able to deal with non-zeroed spans. They must, for they deal with
recycled spans all the time. Once this fact is made clear, the only
remaining use of zeroing in mheap is for large objects.

As a result, this CL lifts mheap zeroing for large objects into
mallocgc, to parallel all the other codepaths in mallocgc. This makes
the large object allocation code less surprising.

Next, this CL sets the flag for the delayed zeroing explicitly in the one
case where it matters, and inverts and renames the flag from isZeroed to
delayZeroing.

Finally, it adds a check to make sure that only pointer-free allocations
take the delayed zeroing codepath, as an extra safety measure.
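
A rough sketch of the resulting shape at the end of mallocgc
(abridged; variable names are illustrative, not the actual code):

    if delayZeroing {
        if !noscan {
            throw("delayed zeroing on data that may contain pointers") // the extra safety check
        }
        // Preemption has already been re-enabled here, so a huge clear
        // no longer holds up the scheduler.
        memclrNoHeapPointers(x, size)
    }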

Benchmark results: https://perf.golang.org/search?q=upload:20211028.8

Inspired by tapir.liu@gmail.com's CL 343470.

Change-Id: I7e1296adc19ce8a02c8d93a0a5082aefb2673e8f
Reviewed-on: https://go-review.googlesource.com/c/go/+/359477
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
2021-10-29 17:44:15 +00:00
Michael Anthony Knyszek 3aecb3a8f7 runtime: fix sweep termination condition
Currently, there is a chance that the sweep termination condition could
flap, causing e.g. runtime.GC to return before all sweep work has been
not only drained but also completed. CL 307915 and CL 307916 attempted
to fix this problem, but it is still possible that mheap_.sweepDrained is
marked before any outstanding sweepers are accounted for in
mheap_.sweepers, leaving a window in which a thread could observe
isSweepDone as true before it actually was (and after some time it would
revert to false, then true again, depending on the number of outstanding
sweepers at that point).

This change fixes the sweep termination condition by merging
mheap_.sweepers and mheap_.sweepDrained into a single atomic value.

This value is updated such that a new potential sweeper will increment
the outstanding sweeper count iff there are still outstanding spans to be
swept without an outstanding sweeper to pick them up. This design
simplifies the sweep termination condition into a single atomic load and
comparison and ensures the condition never flaps.
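
A sketch of the merged state word (bit layout and names assumed, not
the actual declarations):

    import "sync/atomic"

    // The top bit records "sweep list drained"; the low bits count
    // outstanding sweepers.
    const sweepDrainedMask = 1 << 31

    // beginSweep registers a sweeper only if sweeping isn't drained yet.
    func beginSweep(state *atomic.Uint32) bool {
        for {
            old := state.Load()
            if old&sweepDrainedMask != 0 {
                return false // nothing left to pick up
            }
            if state.CompareAndSwap(old, old+1) {
                return true
            }
        }
    }

    // isSweepDone is then a single load: drained, with no outstanding sweepers.
    func isSweepDone(state *atomic.Uint32) bool {
        return state.Load() == sweepDrainedMask
    }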

Updates #46500.
Fixes #45315.

Change-Id: I6d69aff156b8d48428c4cc8cfdbf28be346dbf04
Reviewed-on: https://go-review.googlesource.com/c/go/+/333389
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2021-10-29 17:12:47 +00:00
David Chase 0bbfc5c31e runtime: break up large calls to memclrNoHeapPointers to allow preemption
If something "huge" is allocated, and the zeroing is trivial (no
pointers involved), then zero it in chunks in a loop so that
preemption can occur, rather than all in a single non-preemptible
call.

Benchmarking suggests that 256K is the best chunk size.
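
A minimal sketch of the chunked clearing loop (memclrNoHeapPointers is
the existing runtime routine; everything else is illustrative):

    import "unsafe"

    const chunkBytes = 256 << 10 // 256 KiB, per the benchmarking above

    func clearHugeNoHeapPointers(base unsafe.Pointer, size uintptr) {
        for off := uintptr(0); off < size; off += chunkBytes {
            n := uintptr(chunkBytes)
            if size-off < n {
                n = size - off
            }
            // Preemption can occur between these short calls.
            memclrNoHeapPointers(unsafe.Add(base, off), n)
        }
    }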

Updates #42642.

Change-Id: I94015e467eaa098c59870e479d6d83bc88efbfb4
Reviewed-on: https://go-review.googlesource.com/c/go/+/270943
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-04-30 19:41:02 +00:00
Austin Clements a25a77aed2 runtime: block sweep completion on all sweep paths
The runtime currently has two different notions of sweep completion:

1. All spans are either swept or have begun sweeping.

2. The sweeper has *finished* sweeping all spans.

Most things depend on condition 1. Notably, GC correctness depends on
condition 1, but since all sweep operations are non-preemptible, the STW
at the beginning of GC forces condition 1 to become condition 2.

runtime.GC(), however, depends on condition 2, since the intent is to
complete a full GC cycle, and also update the heap profile (which
can only be done after sweeping is complete).

However, the way we compute condition 2 is racy right now and may in
fact only indicate condition 1. Specifically, sweepone blocks
condition 2 until all sweepone calls are done, but there are many
other ways to enter the sweeper that don't block this. Hence, sweepone
may see that there are no more spans in the sweep list and see that
it's the last sweepone and declare sweeping done, while there's some
other sweeper still working on a span.

Fix this by making sure every entry to the sweeper participates in the
protocol that blocks condition 2. To make sure we get this right, this
CL introduces a type to track sweep blocking and (lightly) enforces
span sweep ownership via the type system. This has the nice
side-effect of abstracting the pattern of acquiring sweep ownership
that's currently repeated in many different places.
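
A rough sketch of what lightly enforcing sweep ownership via the type
system can look like (declarations are illustrative, not copied from
this CL):

    import "sync/atomic"

    type mspan struct{ sweepgen uint32 } // stub; the real mspan has many more fields

    // sweepLocker records a participant in the sweep protocol for one generation.
    type sweepLocker struct{ sweepGen uint32 }

    // sweepLocked wraps a span whose sweeping has been acquired; only its
    // holder may sweep it, so every sweep path goes through tryAcquire.
    type sweepLocked struct{ *mspan }

    func (l sweepLocker) tryAcquire(s *mspan) (sweepLocked, bool) {
        // Claim the span for sweeping in this generation.
        if !atomic.CompareAndSwapUint32(&s.sweepgen, l.sweepGen-2, l.sweepGen-1) {
            return sweepLocked{}, false // already swept or claimed elsewhere
        }
        return sweepLocked{s}, true
    }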

Fixes #45315.

Change-Id: I7fab30170c5ae14c8b2f10998628735b8be6d901
Reviewed-on: https://go-review.googlesource.com/c/go/+/307915
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-04-12 19:22:50 +00:00
Matthew Dempsky 4662029264 runtime: simplify divmagic for span calculations
It's both simpler and faster to just unconditionally do two 32-bit
multiplies rather than a bunch of branching to try to avoid them.
This is safe thanks to the tight bounds derived in [1] and verified
during mksizeclasses.go.
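
A sketch of the resulting computation (the magic constant is derived
per [1] in mksizeclasses.go; names are assumed):

    // divMul is roughly ceil(2^32 / elemsize), precomputed per size class.
    // For n in the verified range, n / elemsize reduces to a multiply and
    // a shift, with no branches.
    func divideByElemSize(n uintptr, divMul uint32) uintptr {
        return uintptr((uint64(n) * uint64(divMul)) >> 32)
    }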

Benchstat results below for compilebench benchmarks on my P920. See
also [2] for micro benchmarks comparing the new functions against the
originals (as well as several basic attempts at optimizing them).

name                      old time/op       new time/op       delta
Template                        295ms ± 3%        290ms ± 1%  -1.95%  (p=0.000 n=20+20)
Unicode                         113ms ± 3%        110ms ± 2%  -2.32%  (p=0.000 n=21+17)
GoTypes                         1.78s ± 1%        1.76s ± 1%  -1.23%  (p=0.000 n=21+20)
Compiler                        119ms ± 2%        117ms ± 4%  -1.53%  (p=0.007 n=20+20)
SSA                             14.3s ± 1%        13.8s ± 1%  -3.12%  (p=0.000 n=17+20)
Flate                           173ms ± 2%        170ms ± 1%  -1.64%  (p=0.000 n=20+19)
GoParser                        278ms ± 2%        273ms ± 2%  -1.92%  (p=0.000 n=20+19)
Reflect                         686ms ± 3%        671ms ± 3%  -2.18%  (p=0.000 n=19+20)
Tar                             255ms ± 2%        248ms ± 2%  -2.90%  (p=0.000 n=20+20)
XML                             335ms ± 3%        327ms ± 2%  -2.34%  (p=0.000 n=20+20)
LinkCompiler                    799ms ± 1%        799ms ± 1%    ~     (p=0.925 n=20+20)
ExternalLinkCompiler            1.90s ± 1%        1.90s ± 0%    ~     (p=0.327 n=20+20)
LinkWithoutDebugCompiler        385ms ± 1%        386ms ± 1%    ~     (p=0.251 n=18+20)
[Geo mean]                      512ms             504ms       -1.61%

name                      old user-time/op  new user-time/op  delta
Template                        286ms ± 4%        282ms ± 4%  -1.42%  (p=0.025 n=21+20)
Unicode                         104ms ± 9%        102ms ±14%    ~     (p=0.294 n=21+20)
GoTypes                         1.75s ± 3%        1.72s ± 2%  -1.36%  (p=0.000 n=21+20)
Compiler                        109ms ±11%        108ms ± 8%    ~     (p=0.187 n=21+19)
SSA                             14.0s ± 1%        13.5s ± 2%  -3.25%  (p=0.000 n=16+20)
Flate                           166ms ± 4%        164ms ± 4%  -1.34%  (p=0.032 n=19+19)
GoParser                        268ms ± 4%        263ms ± 4%  -1.71%  (p=0.011 n=18+20)
Reflect                         666ms ± 3%        654ms ± 4%  -1.77%  (p=0.002 n=18+20)
Tar                             245ms ± 5%        236ms ± 6%  -3.34%  (p=0.000 n=20+20)
XML                             320ms ± 4%        314ms ± 3%  -2.01%  (p=0.001 n=19+18)
LinkCompiler                    744ms ± 4%        747ms ± 3%    ~     (p=0.627 n=20+19)
ExternalLinkCompiler            1.71s ± 3%        1.72s ± 2%    ~     (p=0.149 n=20+20)
LinkWithoutDebugCompiler        345ms ± 6%        342ms ± 8%    ~     (p=0.355 n=20+20)
[Geo mean]                      484ms             477ms       -1.50%

[1] Daniel Lemire, Owen Kaser, Nathan Kurz. 2019. "Faster Remainder by
Direct Computation: Applications to Compilers and Software Libraries."
https://arxiv.org/abs/1902.01961

[2] https://github.com/mdempsky/benchdivmagic

Change-Id: Ie4d214e7a908b0d979c878f2d404bd56bdf374f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/300994
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Trust: Matthew Dempsky <mdempsky@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-03-12 18:02:59 +00:00
Cherry Zhang 9933f66555 runtime: remove residual !go115NewMCentralImpl fields
Change-Id: I1685721c82be4ac3c854084592e5e0f182b367ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/266858
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2020-10-31 17:29:05 +00:00
Michael Anthony Knyszek a5088e76f1 runtime: remove mcentral.nmalloc and add mcache.local_nsmallalloc
This change removes mcentral.nmalloc and adds mcache.local_nsmallalloc,
which fulfills the same role but may be accessed non-atomically. It also
moves responsibility for updating heap_live and local_nsmallalloc into
mcache functions.

As a result of this change, mcache is now the sole source-of-truth for
malloc stats. It is also solely responsible for updating heap_live and
performing the various operations required as a result of updating
heap_live. The overall improvement here is in code organization:
previously malloc stats were fairly scattered, and now they have one
single home, and nearly all the required manipulations exist in a single
file.

Change-Id: I7e93fa297c1debf17e3f2a0d68aeed28a9c6af00
Reviewed-on: https://go-review.googlesource.com/c/go/+/246966
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 17:26:23 +00:00
Michael Anthony Knyszek e6d0bd2b89 runtime: clean up old mcentral code
This change deletes the old mcentral implementation from the code base
and the newMCentralImpl feature flag along with it.

Updates #37487.

Change-Id: Ibca8f722665f0865051f649ffe699cbdbfdcfcf2
Reviewed-on: https://go-review.googlesource.com/c/go/+/221184
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-08-17 20:06:49 +00:00
Michael Anthony Knyszek 08bf64a81e runtime: bound small object sweeping to 100 spans when allocating
Currently, the small object sweeper will sweep until it finds a free
slot or there are no more spans of that size class to sweep. In dense
heaps, this can cause sweeping for a given size class to take
unbounded time, and gets worse with larger heaps.

This CL limits the small object sweeper to try at most 100 spans
before giving up and allocating a fresh span. Since it's already shown
that 100 spans are completely full at that point, the space overhead
of this fresh span is at most 1%.

This CL is based on an experimental CL by Austin Clements (CL 187817)
and is updated to be part of the mcentral implementation, gated by
go115NewMCentralImpl.

Updates #18155.

Change-Id: I37a72c2dcc61dd6f802d1d0eac3683e6642b6ef8
Reviewed-on: https://go-review.googlesource.com/c/go/+/229998
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2020-04-27 18:40:56 +00:00
Michael Anthony Knyszek a13691966a runtime: add new mcentral implementation
Currently mcentral is implemented as a couple of linked lists of spans
protected by a lock. Unfortunately this design leads to significant lock
contention.

The span ownership model is also confusing and complicated. In-use spans
jump between being owned by multiple sources, generally some combination
of a gcSweepBuf, a concurrent sweeper, an mcentral or an mcache.

So first to address contention, this change replaces those linked lists
with gcSweepBufs which have an atomic fast path. Then, we change up the
ownership model: a span may be simultaneously owned only by an mcentral
and the page reclaimer. Otherwise, an mcentral (which now consists of
sweep bufs), a sweeper, or an mcache are the sole owners of a span at
any given time. This dramatically simplifies reasoning about span
ownership in the runtime.
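
A rough sketch of the resulting shape of mcentral (types stubbed;
field roles follow the description above):

    type spanClass uint8

    // spanSet is a set of *mspan with an atomic fast path for push/pop;
    // its internals are omitted here.
    type spanSet struct{}

    type mcentral struct {
        spanclass spanClass
        partial   [2]spanSet // spans with at least one free object slot
        full      [2]spanSet // spans with no free slots
        // One set of each pair holds swept spans of the current sweep
        // generation, the other holds unswept spans; the roles swap
        // every GC cycle.
    }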

As a result of this new ownership model, sweeping is now driven by
walking over the mcentrals rather than having its own global list of
spans. Because we no longer have a global list and we traditionally
haven't used the mcentrals for large object spans, we no longer have
anywhere to put large objects. So, this change also makes it so that we
keep large object spans in the appropriate mcentral lists.

In terms of the static lock ranking, we add the spanSet spine locks in
pretty much the same place as the mcentral locks, since they have the
potential to be manipulated both on the allocation and sweep paths, like
the mcentral locks.

This new implementation is turned on by default via a feature flag
called go115NewMCentralImpl.

Benchmark results for 1 KiB allocation throughput (5 runs each):

name \ MiB/s  go113       go114       gotip       gotip+this-patch
AllocKiB-1    1.71k ± 1%  1.68k ± 1%  1.59k ± 2%      1.71k ± 1%
AllocKiB-2    2.46k ± 1%  2.51k ± 1%  2.54k ± 1%      2.93k ± 1%
AllocKiB-4    4.27k ± 1%  4.41k ± 2%  4.33k ± 1%      5.01k ± 2%
AllocKiB-8    4.38k ± 3%  5.24k ± 1%  5.46k ± 1%      8.23k ± 1%
AllocKiB-12   4.38k ± 3%  4.49k ± 1%  5.10k ± 1%     10.04k ± 0%
AllocKiB-16   4.31k ± 1%  4.14k ± 3%  4.22k ± 0%     10.42k ± 0%
AllocKiB-20   4.26k ± 1%  3.98k ± 1%  4.09k ± 1%     10.46k ± 3%
AllocKiB-24   4.20k ± 1%  3.97k ± 1%  4.06k ± 1%     10.74k ± 1%
AllocKiB-28   4.15k ± 0%  4.00k ± 0%  4.20k ± 0%     10.76k ± 1%

Fixes #37487.

Change-Id: I92d47355acacf9af2c41bf080c08a8c1638ba210
Reviewed-on: https://go-review.googlesource.com/c/go/+/221182
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-04-27 18:19:26 +00:00
Dan Scales 0a820007e7 runtime: static lock ranking for the runtime (enabled by GOEXPERIMENT)
I took some of the infrastructure from Austin's lock logging CR
https://go-review.googlesource.com/c/go/+/192704 (with deadlock
detection from the logs), and developed a setup to give static lock
ranking for runtime locks.

Static lock ranking establishes a documented total ordering among locks,
and then reports an error if the total order is violated. This can
happen if a deadlock happens (by acquiring a sequence of locks in
different orders), or if just one side of a possible deadlock happens.
Lock ordering deadlocks cannot happen as long as the lock ordering is
followed.

Along the way, I found a deadlock involving the new timer code, which Ian fixed
via https://go-review.googlesource.com/c/go/+/207348, as well as two other
potential deadlocks.

See the constants at the top of runtime/lockrank.go to show the static
lock ranking that I ended up with, along with some comments. This is
great documentation of the current intended lock ordering when acquiring
multiple locks in the runtime.

I also added an array lockPartialOrder[] which shows and enforces the
current partial ordering among locks (which is embedded within the total
ordering). This is more specific about the dependencies among locks.

I don't try to check the ranking within a lock class with multiple locks
that can be acquired at the same time (i.e. check the ranking when
multiple hchan locks are acquired).

Currently, I am doing a lockInit() call to set the lock rank of most
locks. Any lock that is not otherwise initialized is assumed to be a
leaf lock (a very high rank lock), so that eliminates the need to do
anything for a bunch of locks (including all architecture-dependent
locks). For two locks, root.lock and notifyList.lock (only in the
runtime/sema.go file), it is not as easy to do lock initialization, so
instead, I am passing the lock rank with the lock calls.
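
For illustration, the common patterns look roughly like this (a
hedged sketch; the rank constants are examples from lockrank.go):

    lockInit(&sched.lock, lockRankSched) // most locks get a rank once, at init
    lockInit(&allpLock, lockRankAllp)

    lock(&sched.lock) // each acquisition is checked against the ranks already held
    lock(&allpLock)   // ok only if the partial order allows allpLock after sched.lock

    lockWithRank(&root.lock, lockRankRoot) // sema locks pass their rank at each call instead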

For Windows compilation, I needed to increase the StackGuard size from
896 to 928 because of the new lock-rank checking functions.

Checking of the static lock ranking is enabled by setting
GOEXPERIMENT=staticlockranking before doing a run.

To make sure that the static lock ranking code has no overhead in memory
or CPU when not enabled by GOEXPERIMENT, I changed 'go build/install' so
that it defines a build tag (with the same name) whenever any experiment
has been baked into the toolchain (by checking Expstring()). This allows
me to avoid increasing the size of the 'mutex' type when static lock
ranking is not enabled.

Fixes #38029

Change-Id: I154217ff307c47051f8dae9c2a03b53081acd83a
Reviewed-on: https://go-review.googlesource.com/c/go/+/207619
Reviewed-by: Dan Scales <danscales@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-07 21:51:03 +00:00
Michael Anthony Knyszek 7f574e476a runtime: remove unnecessary large parameter to mheap_.alloc
mheap_.alloc currently accepts both a spanClass and a "large" parameter
indicating whether the allocation is large. These are redundant, since
spanClass.sizeclass() == 0 is an equivalent way to determine this and is
already used in mheap_.alloc. There are no places in the runtime where
the size class could be non-zero and large == true.

Updates #35112.

Change-Id: Ie66facf8f0faca6f4cd3d20a8ac4bc259e11823d
Reviewed-on: https://go-review.googlesource.com/c/go/+/196639
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 16:44:33 +00:00
Michael Anthony Knyszek 814c5058bb runtime: remove useless heap_objects accounting
This change removes useless additional heap_objects accounting for large
objects. heap_objects is computed from scratch at ReadMemStats time
(which stops the world) by using nlargealloc and nlargefree, so mutating
heap_objects turns out to be pointless.

As a result, the "large" parameter on "mheap_.freeSpan" is no longer
necessary and so this change cleans that up too.

Change-Id: I7d6b486d9b57c018e3db46221d81b55fe4c1b021
Reviewed-on: https://go-review.googlesource.com/c/go/+/196637
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 16:20:27 +00:00
Martin Möhrmann 6ca51f7897 runtime: replace division by span element size by multiply and shifts
Divisions are generally slow. The compiler can optimize a division
to use a sequence of faster multiplies and shifts (magic constants)
if the divisor is known at compile time.

The value of the span element size in mcentral.grow is not known at
compile time but magic constants to compute n / span.elementsize
are already stored in class_to_divmagic and mspan.
They however need to be adjusted to work for
(0 <= n <= span.npages * pagesize) instead of
(0 <= n <  span.npages * pagesize).

Change-Id: Ieea59f1c94525a88d012f2557d43691967900deb
Reviewed-on: https://go-review.googlesource.com/c/go/+/148057
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-03-18 14:05:54 +00:00
Michael Anthony Knyszek 44dcb5cb61 runtime: clean up MSpan* MCache* MCentral* in docs
This change cleans up references to MSpan, MCache, and MCentral in the
docs via a bunch of sed invocations to better reflect the Go names for
the equivalent structures (i.e. mspan, mcache, mcentral) and their
methods (i.e. MSpan_Sweep -> mspan.sweep).

Change-Id: Ie911ac975a24bd25200a273086dd835ab78b1711
Reviewed-on: https://go-review.googlesource.com/c/147557
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-11-05 22:56:22 +00:00
Brad Fitzpatrick 3813edf26e all: use "reports whether" consistently in the few places that didn't
Go documentation style for boolean funcs is to say:

    // Foo reports whether ...
    func Foo() bool

(rather than "returns true if")

This CL also replaces 4 uses of "iff" with the same "reports whether"
wording, which doesn't lose any meaning, and will prevent people from
sending typo fixes when they don't realize it's "if and only if". In
the past I think we've had the typo CLs updated to just say "reports
whether". So do them all at once.

(Inspired by the addition of another "returns true if" in CL 146938
in fd_plan9.go)

Created with:

$ perl -i -npe 's/returns true iff/reports whether/' $(git grep -l "returns true iff" | grep -v vendor)
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor)

Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f
Reviewed-on: https://go-review.googlesource.com/c/147037
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-11-02 22:47:58 +00:00
Austin Clements 416804f3e2 runtime: simplify free count calculation in (un)cacheSpan
For unclear reasons, cacheSpan and uncacheSpan compute the number of
elements in a span by dividing its size by the element size. This
number is simply available in the mspan structure, so just use it.

Change-Id: If2e5de6ecec39befd3324bf1da4a275ad000932f
Reviewed-on: https://go-review.googlesource.com/c/138656
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-09 18:20:23 +00:00
Austin Clements 70d7e96c78 runtime: avoid tracking spans with no objects with mcentral
Lazy mcache flushing (golang.org/cl/134783) made it so that moving a
span from an mcache to an mcentral was sometimes responsible for
sweeping the span. However, it did a "preserving" sweep, which meant
it retained ownership, even if the sweeper swept all objects in the
span. As a result, we could put a completely unused span back in the
mcentral.

Fix this by first taking back ownership of the span into the mcentral
and moving it to the right mcentral list, and then doing a
non-preserving sweep. The non-preserving sweep will move the span to
the heap if it sweeps all objects.

Change-Id: I244b1893b44b8c00264f0928ac9239449775f617
Reviewed-on: https://go-review.googlesource.com/c/140597
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2018-10-09 18:20:13 +00:00
Austin Clements 3f86d7cc67 runtime: tidy mheap.freeSpan
freeSpan currently takes a mysterious "acct int32" argument. This is
really just a boolean and actually just needs to match the "large"
argument to alloc in order to balance out accounting.

To make this clearer, replace acct with a "large bool" argument that
must match the call to mheap.alloc.

Change-Id: Ibc81faefdf9f0583114e1953fcfb362e9c3c76de
Reviewed-on: https://go-review.googlesource.com/c/138655
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-09 16:43:18 +00:00
Austin Clements 873bd47dfb runtime: flush mcaches lazily
Currently, all mcaches are flushed during STW mark termination as a
root marking job. This is currently necessary because all spans must
be out of these caches before sweeping begins to avoid races with
allocation and to ensure the spans are in the state expected by
sweeping. We do it as a root marking job because mcache flushing is
somewhat expensive and O(GOMAXPROCS) and this parallelizes the work
across the Ps. However, it's also the last remaining root marking job
performed during mark termination.

This CL moves mcache flushing out of mark termination and performs it
lazily. We keep track of the last sweepgen at which each mcache was
flushed and as each P is woken from STW, it observes that its mcache
is out-of-date and flushes it.

This introduces a complication for spans cached in stale mcaches. These
may now be observed by background or proportional sweeping or when
attempting to add a finalizer, but aren't in a stable state. For
example, they are likely to be on the wrong mcentral list. To fix
this, this CL extends the sweepgen protocol to also capture whether a
span is cached and, if so, whether or not its cache is stale. This
protocol blocks asynchronous sweeping from touching cached spans and
makes it the responsibility of mcache flushing to sweep the flushed
spans.
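
For reference, a sketch of the extended sweepgen states (this mirrors
how mheap.go later documents them; h->sweepgen is the global value):

    // if sweepgen == h->sweepgen - 2, the span needs sweeping
    // if sweepgen == h->sweepgen - 1, the span is currently being swept
    // if sweepgen == h->sweepgen,     the span is swept and ready to use
    // if sweepgen == h->sweepgen + 1, the span was cached before sweeping
    //                                 began and still needs sweeping
    // if sweepgen == h->sweepgen + 3, the span was swept and then cached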

This eliminates the last mark termination root marking job, which
means we can now eliminate that entire infrastructure.

Updates #26903. This implements lazy mcache flushing.

Change-Id: Iadda7aabe540b2026cffc5195da7be37d5b4125e
Reviewed-on: https://go-review.googlesource.com/c/134783
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:35 +00:00
Austin Clements 45ffeab549 runtime: eliminate most uses of mheap_.arena_*
This replaces all uses of the mheap_.arena_* fields outside of
mallocinit and sysAlloc. These fields fundamentally assume a
contiguous heap between two bounds, so eliminating these is necessary
for a sparse heap.

Many of these are replaced with checks for non-nil spans at the test
address (which in turn checks for a non-nil entry in the heap arena
array). Some of them are just for debugging and somewhat meaningless
with a sparse heap, so those we just delete.

Updates #10460.

Change-Id: I8345b95ffc610aed694f08f74633b3c63506a41f
Reviewed-on: https://go-review.googlesource.com/85886
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:22 +00:00
Austin Clements 1a033b1a70 runtime: separate spans of noscan objects
Currently, we mix objects with pointers and objects without pointers
("noscan" objects) together in memory. As a result, for every object
we grey, we have to check that object's heap bits to find out if it's
noscan, which adds to the per-object cost of GC. This also hurts the
TLB footprint of the garbage collector because it decreases the
density of scannable objects at the page level.

This commit improves the situation by using separate spans for noscan
objects. This will allow a much simpler noscan check (in a follow up
CL), eliminate the need to clear the bitmap of noscan objects (in a
follow up CL), and improve TLB footprint by increasing the density of
scannable objects.
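
A sketch of how the noscan bit can be folded into the span class
itself, which is what makes the follow-up check cheap (this mirrors
the runtime's spanClass, but treat it as illustrative):

    type spanClass uint8

    func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
        sc := spanClass(sizeclass) << 1
        if noscan {
            sc |= 1
        }
        return sc
    }

    func (sc spanClass) sizeclass() int8 { return int8(sc >> 1) }
    func (sc spanClass) noscan() bool    { return sc&1 != 0 }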

This is also a step toward eliminating dead bits, since the current
noscan check depends on checking the dead bit of the first word.

This has no effect on the heap size of the garbage benchmark.

We'll measure the performance change of this after the follow-up
optimizations.

This is a cherry-pick from dev.garbage commit d491e550c3. The only
non-trivial merge conflict was in updatememstats in mstats.go, where
we now have to separate the per-spanclass stats from the per-sizeclass
stats.

Change-Id: I13bdc4869538ece5649a8d2a41c6605371618e40
Reviewed-on: https://go-review.googlesource.com/41251
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-28 22:50:31 +00:00
Austin Clements a5eb3dceaf runtime: drive proportional sweep directly off heap_live
Currently, proportional sweep maintains its own count of how many
bytes have been allocated since the beginning of the sweep cycle so it
can compute how many pages need to be swept for a given allocation.

However, this requires a somewhat complex reimbursement scheme since
proportional sweep must be done before a span is allocated, but we
don't know how many bytes to charge until we've allocated a span. This
means that the allocated byte count used by proportional sweep can go
up and down, which has led to underflow bugs in the past (#18043) and
is going to interfere with adjusting sweep pacing on-the-fly (for #19076).

This approach also means we're maintaining a statistic that is very
closely related to heap_live, but has a different 0 value. This is
particularly confusing because the sweep ratio is computed based on
heap_live, so you have to understand that these two statistics are
very closely related.

Replace all of this and compute the sweep debt directly from the
current value of heap_live. To make this work, we simply save the
value of heap_live when the sweep ratio is computed to use as a
"basis" for later computing the sweep debt.

This eliminates the need for reimbursement as well as the code for
maintaining the sweeper's version of the live heap size.

For #19076.

Coincidentally fixes #18043, since this eliminates sweep reimbursement
entirely.

Change-Id: I1f931ddd6e90c901a3972c7506874c899251dc2a
Reviewed-on: https://go-review.googlesource.com/39832
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:41:57 +00:00
Austin Clements 79c56addb6 runtime: make sweep trace events encompass entire sweep loop
Currently, each individual span sweep emits an event to the trace. But
sweeps are generally done in loops until some condition is satisfied,
so this tracing is lower-level than anyone really wants and hides the
fact that no other work is being accomplished between adjacent sweep
events. This is also high overhead: enabling tracing significantly
impacts sweep latency.

Replace this by instead tracing around the sweep loops used for
allocation. This is slightly tricky because sweep loops don't
generally know if any sweeping will happen in them. Hence, we make the
tracing lazy by recording in the P that we would like to start tracing
the sweep *if* one happens, and then only closing the sweep event if
we started it.

This does mean we don't get tracing on every sweep path, which are
legion. However, we get much more informative tracing on the paths
that block allocation, which are the paths that matter.

Change-Id: I73e14fbb250acb0c9d92e3648bddaa5e7d7e271c
Reviewed-on: https://go-review.googlesource.com/40810
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-19 18:31:11 +00:00
Austin Clements 4a7cf960c3 runtime: make ReadMemStats STW for < 25µs
Currently ReadMemStats stops the world for ~1.7 ms/GB of heap because
it collects statistics from every single span. For large heaps, this
can be quite costly. This is particularly unfortunate because many
production infrastructures call this function regularly to collect and
report statistics.

Fix this by tracking the necessary cumulative statistics in the
mcaches. ReadMemStats still has to stop the world to stabilize these
statistics, but there are only O(GOMAXPROCS) mcaches to collect
statistics from, so this pause is only 25µs even at GOMAXPROCS=100.

Fixes #13613.

Change-Id: I3c0a4e14833f4760dab675efc1916e73b4c0032a
Reviewed-on: https://go-review.googlesource.com/34937
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-04 02:56:37 +00:00
Austin Clements 1bc6be6423 runtime: mark several types go:notinheap
This covers basically all sysAlloc'd, persistentalloc'd, and
fixalloc'd types.

Change-Id: I0487c887c2a0ade5e33d4c4c12d837e97468e66b
Reviewed-on: https://go-review.googlesource.com/30941
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-10-15 17:58:20 +00:00
Austin Clements 38f674687a [dev.garbage] runtime: reintroduce no-zeroing optimization
Currently we always zero objects when we allocate them. We used to
have an optimization that would not zero objects that had not been
allocated since the whole span was last zeroed (either by getting it
from the system or by getting it from the heap, which does a bulk
zero), but this depended on the sweeper clobbering the first two words
of each object. Hence, we lost this optimization when the bitmap
sweeper went away.

Re-introduce this optimization using a different mechanism. Each span
already keeps a flag indicating that it just came from the OS or was
just bulk zeroed by the mheap. We can simply use this flag to know
when we don't need to zero an object. This is slightly less efficient
than the old optimization: if a span gets allocated and partially
used, then GC happens and the span gets returned to the mcentral, then
the span gets re-acquired, the old optimization knew that it only had
to re-zero the objects that had been reclaimed, whereas this
optimization will re-zero everything. However, in this case, you're
already paying for the garbage collection, and you've only wasted one
zeroing of the span, so in practice there seems to be little
difference. (If we did want to revive the full optimization, each span
could keep track of a frontier beyond which all free slots are zeroed.
I prototyped this and it didn't obviously do any better than the much
simpler approach in this commit.)

This significantly improves BinaryTree17, which is allocation-heavy
(and runs first, so most pages are already zeroed), and slightly
improves everything else.

name              old time/op  new time/op  delta
XBenchGarbage-12  2.15ms ± 1%  2.14ms ± 1%  -0.80%  (p=0.000 n=17+17)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.71s ± 1%     2.56s ± 1%  -5.73%        (p=0.000 n=18+19)
DivconstI64-12              1.70ns ± 1%    1.70ns ± 1%    ~           (p=0.562 n=18+18)
DivconstU64-12              1.74ns ± 2%    1.74ns ± 1%    ~           (p=0.394 n=20+20)
DivconstI32-12              1.74ns ± 0%    1.74ns ± 0%    ~     (all samples are equal)
DivconstU32-12              1.66ns ± 1%    1.66ns ± 0%    ~           (p=0.516 n=15+16)
DivconstI16-12              1.84ns ± 0%    1.84ns ± 0%    ~     (all samples are equal)
DivconstU16-12              1.82ns ± 0%    1.82ns ± 0%    ~     (all samples are equal)
DivconstI8-12               1.79ns ± 0%    1.79ns ± 0%    ~     (all samples are equal)
DivconstU8-12               1.60ns ± 0%    1.60ns ± 1%    ~           (p=0.603 n=17+19)
Fannkuch11-12                2.11s ± 1%     2.11s ± 0%    ~           (p=0.333 n=16+19)
FmtFprintfEmpty-12          45.1ns ± 4%    45.4ns ± 5%    ~           (p=0.111 n=20+20)
FmtFprintfString-12          134ns ± 0%     129ns ± 0%  -3.45%        (p=0.000 n=18+16)
FmtFprintfInt-12             131ns ± 1%     129ns ± 1%  -1.54%        (p=0.000 n=16+18)
FmtFprintfIntInt-12          205ns ± 2%     203ns ± 0%  -0.56%        (p=0.014 n=20+18)
FmtFprintfPrefixedInt-12     200ns ± 2%     197ns ± 1%  -1.48%        (p=0.000 n=20+18)
FmtFprintfFloat-12           256ns ± 1%     256ns ± 0%  -0.21%        (p=0.008 n=18+20)
FmtManyArgs-12               805ns ± 0%     804ns ± 0%  -0.19%        (p=0.001 n=18+18)
GobDecode-12                7.21ms ± 1%    7.14ms ± 1%  -0.92%        (p=0.000 n=19+20)
GobEncode-12                5.88ms ± 1%    5.88ms ± 1%    ~           (p=0.641 n=18+19)
Gzip-12                      218ms ± 1%     218ms ± 1%    ~           (p=0.271 n=19+18)
Gunzip-12                   37.1ms ± 0%    36.9ms ± 0%  -0.29%        (p=0.000 n=18+17)
HTTPClientServer-12         78.1µs ± 2%    77.4µs ± 2%    ~           (p=0.070 n=19+19)
JSONEncode-12               15.5ms ± 1%    15.5ms ± 0%    ~           (p=0.063 n=20+18)
JSONDecode-12               56.1ms ± 0%    55.4ms ± 1%  -1.18%        (p=0.000 n=19+18)
Mandelbrot200-12            4.05ms ± 0%    4.06ms ± 0%  +0.29%        (p=0.001 n=18+18)
GoParse-12                  3.28ms ± 1%    3.21ms ± 1%  -2.30%        (p=0.000 n=20+20)
RegexpMatchEasy0_32-12      69.4ns ± 2%    69.3ns ± 1%    ~           (p=0.205 n=18+16)
RegexpMatchEasy0_1K-12       239ns ± 0%     239ns ± 0%    ~     (all samples are equal)
RegexpMatchEasy1_32-12      69.4ns ± 1%    69.4ns ± 1%    ~           (p=0.620 n=15+18)
RegexpMatchEasy1_1K-12       370ns ± 1%     369ns ± 2%    ~           (p=0.088 n=20+20)
RegexpMatchMedium_32-12      108ns ± 0%     108ns ± 0%    ~     (all samples are equal)
RegexpMatchMedium_1K-12     33.6µs ± 3%    33.5µs ± 3%    ~           (p=0.718 n=20+20)
RegexpMatchHard_32-12       1.68µs ± 1%    1.67µs ± 2%    ~           (p=0.316 n=20+20)
RegexpMatchHard_1K-12       50.5µs ± 3%    50.4µs ± 3%    ~           (p=0.659 n=20+20)
Revcomp-12                   381ms ± 1%     381ms ± 1%    ~           (p=0.916 n=19+18)
Template-12                 66.5ms ± 1%    65.8ms ± 2%  -1.08%        (p=0.000 n=20+20)
TimeParse-12                 317ns ± 0%     319ns ± 0%  +0.48%        (p=0.000 n=19+12)
TimeFormat-12                338ns ± 0%     338ns ± 0%    ~           (p=0.124 n=19+18)
[Geo mean]                  5.99µs         5.96µs       -0.54%

Change-Id: I638ffd9d9f178835bbfa499bac20bd7224f1a907
Reviewed-on: https://go-review.googlesource.com/22591
Reviewed-by: Rick Hudson <rlh@golang.org>
2016-04-29 15:08:13 +00:00
Austin Clements 15744c92de [dev.garbage] runtime: remove unused head/end arguments from freeSpan
These used to be used for the list of newly freed objects, but that's
no longer a thing.

Change-Id: I5a4503137b74ec0eae5372ca271b1aa0b32df074
Reviewed-on: https://go-review.googlesource.com/22557
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-29 03:53:08 +00:00
Rick Hudson f8d0d4fd59 [dev.garbage] runtime: cleanup and optimize span.base()
Prior to this CL the base of a span was calculated in various
places using shifts or calls to base(). This CL now
always calls base() which has been optimized to calculate the
base of the span when the span is initialized and store that
value in the span structure.

Change-Id: I661f2bfa21e3748a249cdf049ef9062db6e78100
Reviewed-on: https://go-review.googlesource.com/20703
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:59 +00:00
Rick Hudson 4093481523 [dev.garbage] runtime: add bit and cache ctz64 (count trailing zero)
Add to each span a 64 bit cache (allocCache) of the allocBits
at freeindex. allocCache is shifted such that the lowest bit
corresponds to the bit at freeindex. allocBits uses a 0 to
indicate an object is free; allocCache, on the other hand,
uses a 1 to indicate an object is free. This facilitates
ctz64 (count trailing zeros), which counts the number of 0s
trailing the least significant 1. This is also the index of
the least significant 1.

Each span maintains a freeindex indicating the boundary
between allocated objects and unallocated objects. allocCache
is shifted as freeindex is incremented such that the low bit
in allocCache corresponds to the bit at freeindex in the
allocBits array.
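
A rough sketch of the fast path this enables (it resembles the
runtime's nextFreeFast, but names and details are illustrative):

    func nextFreeFast(s *mspan) uintptr {
        theBit := ctz64(s.allocCache) // offset of the lowest 1 (free) bit
        if theBit >= 64 {
            return 0 // no free slot visible in the cache
        }
        result := s.freeindex + uintptr(theBit)
        if result >= s.nelems {
            return 0
        }
        s.allocCache >>= uint(theBit + 1) // keep the cache aligned with freeindex
        s.freeindex = result + 1
        return s.base() + result*s.elemsize
    }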

Currently ctz64 is written in Go using a for loop so it is
not very efficient. Use of the hardware instruction will
follow. With this in mind comparisons of the garbage
benchmark are as follows.

1.6 release        2.8 seconds
dev:garbage branch 3.1 seconds.

Profiling shows the Go implementation of ctz64 takes up
1% of the total time.

Change-Id: If084ed9c3b1eda9f3c6ab2e794625cb870b8167f
Reviewed-on: https://go-review.googlesource.com/20200
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:54 +00:00
Rick Hudson e4ac2d4acc [dev.garbage] runtime: replace ref with allocCount
This is a renaming of the field ref to the
more appropriate allocCount. The field
holds the number of objects in the span
that are currently allocated. Some throw
strings were adjusted to more accurately
convey the meaning of allocCount.

Change-Id: I10daf44e3e9cc24a10912638c7de3c1984ef8efe
Reviewed-on: https://go-review.googlesource.com/19518
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:49 +00:00
Rick Hudson 3479b065d4 [dev.garbage] runtime: allocate directly from GC mark bits
Instead of building a freelist from the mark bits generated
by the GC, this CL allocates directly from the mark bits.

The approach moves the mark bits from the pointer/no pointer
heap structures into their own per span data structures. The
mark/allocation vectors consist of a single mark bit per
object. Two vectors are maintained, one for allocation and
one for the GC's mark phase. During the GC cycle's sweep
phase the interpretation of the vectors is swapped. The
mark vector becomes the allocation vector and the old
allocation vector is cleared and becomes the mark vector that
the next GC cycle will use.
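
A sketch of the swap at the start of a span's sweep (field names
follow the description; the code is illustrative):

    // Marked objects are exactly the live ones, so the mark vector becomes
    // the allocation vector, and a fresh cleared vector becomes the next
    // mark vector.
    s.allocBits = s.gcmarkBits
    s.gcmarkBits = newMarkBits(s.nelems) // illustrative allocator of a cleared bit vector
    s.freeindex = 0                      // allocation scanning restarts at the span start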

Marked entries in the allocation vector indicate that the
object is not free. Each allocation vector maintains a boundary
between areas of the span already allocated from and areas
not yet allocated from. As objects are allocated this boundary
is moved until it reaches the end of the span. At this point
further allocations will be done from another span.

Since we no longer sweep a span inspecting each freed object,
maintaining the pointer/scalar bits in the heapBitMap is now
the responsibility of the routines doing the actual allocation.

This CL is functionally complete and ready for performance
tuning.

Change-Id: I336e0fc21eef1066e0b68c7067cc71b9f3d50e04
Reviewed-on: https://go-review.googlesource.com/19470
Reviewed-by: Austin Clements <austin@google.com>
2016-04-27 21:54:47 +00:00
Rick Hudson dc65a82eff [dev.garbage] runtime: mark/allocation helper functions
The gcmarkBits is a bit vector used by the GC to mark
reachable objects. Once a GC cycle is complete, the gcmarkBits
swaps places with the allocBits. allocBits is then used directly
by malloc to locate free objects, thus avoiding the
construction of a linked free list. This CL introduces a set
of helper functions for manipulating gcmarkBits and allocBits
that will be used by later CLs to realize the actual
algorithm. Minimal attempts have been made to optimize these
helper routines.

Change-Id: I55ad6240ca32cd456e8ed4973c6970b3b882dd34
Reviewed-on: https://go-review.googlesource.com/19420
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-27 21:54:44 +00:00
Brad Fitzpatrick 5fea2ccc77 all: single space after period.
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedent.

This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:

$ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
$ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02 00:13:47 +00:00
Austin Clements 0d26efb12a runtime: remove unnecessary clears of the heap bitmap
Currently we clear the heap bitmap of a span both when we allocate
that span *and* when we free it. There's no point in doing both, and
we definitely have to write the heap bitmap when we allocate a span
for pointer-sized objects, so switch to clearing only when we allocate
a span.

This results in a slight overall performance improvement; however,
most of the benchmarks that get slower are very short, while the
longer benchmarks generally got faster.

name              old time/op  new time/op  delta
XBenchGarbage-12  2.48ms ± 1%  2.47ms ± 1%  -0.58%  (p=0.000 n=91+91)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.85s ± 2%     2.85s ± 2%    ~     (p=0.550 n=20+19)
Fannkuch11-12                2.54s ± 0%     2.47s ± 1%  -2.72%  (p=0.000 n=19+18)
FmtFprintfEmpty-12          51.3ns ± 4%    51.0ns ± 3%    ~     (p=0.223 n=20+20)
FmtFprintfString-12          169ns ± 0%     167ns ± 0%  -1.18%  (p=0.000 n=17+16)
FmtFprintfInt-12             160ns ± 0%     161ns ± 0%  +0.63%  (p=0.000 n=16+15)
FmtFprintfIntInt-12          267ns ± 0%     269ns ± 1%  +0.62%  (p=0.000 n=17+20)
FmtFprintfPrefixedInt-12     234ns ± 1%     240ns ± 0%  +2.80%  (p=0.000 n=20+20)
FmtFprintfFloat-12           316ns ± 0%     313ns ± 0%  -0.76%  (p=0.000 n=20+19)
FmtManyArgs-12              1.04µs ± 0%    1.05µs ± 0%  +0.45%  (p=0.000 n=19+16)
GobDecode-12                7.90ms ± 1%    7.81ms ± 0%  -1.10%  (p=0.000 n=18+18)
GobEncode-12                6.61ms ± 1%    6.58ms ± 0%  -0.46%  (p=0.000 n=20+15)
Gzip-12                      320ms ± 1%     322ms ± 1%  +0.47%  (p=0.030 n=20+20)
Gunzip-12                   42.4ms ± 1%    42.6ms ± 0%  +0.37%  (p=0.000 n=20+20)
HTTPClientServer-12         70.7µs ± 1%    70.6µs ± 2%    ~     (p=0.784 n=18+20)
JSONEncode-12               16.9ms ± 1%    16.8ms ± 0%  -0.64%  (p=0.000 n=20+20)
JSONDecode-12               60.8ms ± 0%    58.6ms ± 1%  -3.50%  (p=0.000 n=17+18)
Mandelbrot200-12            3.92ms ± 0%    3.91ms ± 0%  -0.25%  (p=0.000 n=19+19)
GoParse-12                  3.65ms ± 0%    3.68ms ± 1%  +0.67%  (p=0.000 n=17+16)
RegexpMatchEasy0_32-12       102ns ± 1%     102ns ± 2%  +0.67%  (p=0.009 n=19+19)
RegexpMatchEasy0_1K-12       350ns ± 0%     351ns ± 1%  +0.34%  (p=0.002 n=20+20)
RegexpMatchEasy1_32-12      84.1ns ± 2%    84.2ns ± 2%    ~     (p=0.799 n=20+18)
RegexpMatchEasy1_1K-12       510ns ± 1%     508ns ± 1%  -0.45%  (p=0.000 n=20+17)
RegexpMatchMedium_32-12      132ns ± 1%     134ns ± 1%  +0.85%  (p=0.000 n=20+19)
RegexpMatchMedium_1K-12     40.0µs ± 1%    39.9µs ± 1%  -0.29%  (p=0.014 n=19+18)
RegexpMatchHard_32-12       2.09µs ± 1%    2.05µs ± 0%  -1.76%  (p=0.000 n=20+18)
RegexpMatchHard_1K-12       62.7µs ± 1%    61.8µs ± 1%  -1.39%  (p=0.000 n=20+19)
Revcomp-12                   541ms ± 1%     534ms ± 0%  -1.16%  (p=0.000 n=19+20)
Template-12                 71.1ms ± 0%    69.1ms ± 0%  -2.83%  (p=0.000 n=18+19)
TimeParse-12                 356ns ± 0%     357ns ± 0%  +0.36%  (p=0.000 n=17+19)
TimeFormat-12                358ns ± 0%     372ns ± 1%  +3.74%  (p=0.000 n=15+18)
[Geo mean]                  62.6µs         62.5µs       -0.25%

Change-Id: Ied190b77c7a4d91ec7b2218c592fc31cf7acf362
Reviewed-on: https://go-review.googlesource.com/19633
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-25 23:37:19 +00:00
Austin Clements 87d939dee8 runtime: fix (sometimes major) underestimation of heap_live
Currently, we update memstats.heap_live from mcache.local_cachealloc
whenever we lock the heap (e.g., to obtain a fresh span or to release
an unused span). However, under the right circumstances,
local_cachealloc can accumulate allocations up to the size of
the *entire heap* without flushing them to heap_live. Specifically,
since span allocations from an mcentral don't lock the heap, if a
large number of pages are held in an mcentral and the application
continues to use and free objects of that size class (e.g., the
BinaryTree17 benchmark), local_cachealloc won't be flushed until the
mcentral runs out of spans.

This is a problem because, unlike many of the memory statistics that
are purely informative, heap_live is used to determine when the
garbage collector should start and how hard it should work.

This commit eliminates local_cachealloc, instead atomically updating
heap_live directly. To control contention, we do this only when
obtaining a span from an mcentral. Furthermore, we make heap_live
conservative: allocating a span assumes that all free slots in that
span will be used and accounts for these when the span is
allocated, *before* the objects themselves are. This is important
because 1) it triggers the GC earlier than necessary rather than
potentially too late and 2) it leads to a conservative GC rate
rather than one that is potentially too low.

Alternatively, we could have flushed local_cachealloc when it passed
some threshold, but this would require determining a threshold and
would cause heap_live to underestimate the true value rather than
overestimate.
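
A minimal sketch of the conservative accounting described above, using
invented names (heapLive, cacheSpan) rather than the runtime's
internals: the live-heap estimate is charged the full span size when a
span is acquired from the central list, so the pacer can only err on
the early side.

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    // heapLive is a stand-in for memstats.heap_live: the pacer's
    // estimate of live/allocated bytes.
    var heapLive atomic.Uint64

    // cacheSpan models taking a span from a central list for a size
    // class. Instead of tracking each object in a per-P counter and
    // flushing it later, the entire span is charged up front: every free
    // slot is assumed to be used, so the GC can only start earlier than
    // strictly necessary, never later.
    func cacheSpan(spanBytes uint64) {
        heapLive.Add(spanBytes)
    }

    func main() {
        cacheSpan(8192)              // acquire an 8 KiB span
        fmt.Println(heapLive.Load()) // 8192, before any object is allocated
    }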

Fixes #12199.

name                      old time/op    new time/op    delta
BinaryTree17-12              2.88s ± 4%     2.88s ± 1%    ~     (p=0.470 n=19+19)
Fannkuch11-12                2.48s ± 1%     2.48s ± 1%    ~     (p=0.243 n=16+19)
FmtFprintfEmpty-12          50.9ns ± 2%    50.7ns ± 1%    ~     (p=0.238 n=15+14)
FmtFprintfString-12          175ns ± 1%     171ns ± 1%  -2.48%  (p=0.000 n=18+18)
FmtFprintfInt-12             159ns ± 1%     158ns ± 1%  -0.78%  (p=0.000 n=19+18)
FmtFprintfIntInt-12          270ns ± 1%     265ns ± 2%  -1.67%  (p=0.000 n=18+18)
FmtFprintfPrefixedInt-12     235ns ± 1%     234ns ± 0%    ~     (p=0.362 n=18+19)
FmtFprintfFloat-12           309ns ± 1%     308ns ± 1%  -0.41%  (p=0.001 n=18+19)
FmtManyArgs-12              1.10µs ± 1%    1.08µs ± 0%  -1.96%  (p=0.000 n=19+18)
GobDecode-12                7.81ms ± 1%    7.80ms ± 1%    ~     (p=0.425 n=18+19)
GobEncode-12                6.53ms ± 1%    6.53ms ± 1%    ~     (p=0.817 n=19+19)
Gzip-12                      312ms ± 1%     312ms ± 2%    ~     (p=0.967 n=19+20)
Gunzip-12                   42.0ms ± 1%    41.9ms ± 1%    ~     (p=0.172 n=19+19)
HTTPClientServer-12         63.7µs ± 1%    63.8µs ± 1%    ~     (p=0.639 n=19+19)
JSONEncode-12               16.4ms ± 1%    16.4ms ± 1%    ~     (p=0.954 n=19+19)
JSONDecode-12               58.5ms ± 1%    57.8ms ± 1%  -1.27%  (p=0.000 n=18+19)
Mandelbrot200-12            3.86ms ± 1%    3.88ms ± 0%  +0.44%  (p=0.000 n=18+18)
GoParse-12                  3.67ms ± 2%    3.66ms ± 1%  -0.52%  (p=0.001 n=18+19)
RegexpMatchEasy0_32-12       100ns ± 1%     100ns ± 0%    ~     (p=0.257 n=19+18)
RegexpMatchEasy0_1K-12       347ns ± 1%     347ns ± 1%    ~     (p=0.527 n=18+18)
RegexpMatchEasy1_32-12      83.7ns ± 2%    83.1ns ± 2%    ~     (p=0.096 n=18+19)
RegexpMatchEasy1_1K-12       509ns ± 1%     505ns ± 1%  -0.75%  (p=0.000 n=18+19)
RegexpMatchMedium_32-12      130ns ± 2%     129ns ± 1%    ~     (p=0.962 n=20+20)
RegexpMatchMedium_1K-12     39.5µs ± 2%    39.4µs ± 1%    ~     (p=0.376 n=20+19)
RegexpMatchHard_32-12       2.04µs ± 0%    2.04µs ± 1%    ~     (p=0.195 n=18+17)
RegexpMatchHard_1K-12       61.4µs ± 1%    61.4µs ± 1%    ~     (p=0.885 n=19+19)
Revcomp-12                   540ms ± 2%     542ms ± 4%    ~     (p=0.552 n=19+17)
Template-12                 69.6ms ± 1%    71.2ms ± 1%  +2.39%  (p=0.000 n=20+20)
TimeParse-12                 357ns ± 1%     357ns ± 1%    ~     (p=0.883 n=18+20)
TimeFormat-12                379ns ± 1%     362ns ± 1%  -4.53%  (p=0.000 n=18+19)
[Geo mean]                  62.0µs         61.8µs       -0.44%

name              old time/op  new time/op  delta
XBenchGarbage-12  5.89ms ± 2%  5.81ms ± 2%  -1.41%  (p=0.000 n=19+18)

Change-Id: I96b31cca6ae77c30693a891cff3fe663fa2447a0
Reviewed-on: https://go-review.googlesource.com/17748
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-12-15 16:16:08 +00:00
Austin Clements 6383fb6152 runtime: deduct correct sweep credit
deductSweepCredit expects the size in bytes of the span being
allocated, but mCentral_CacheSpan passes the size of a single object
in the span. As a result, we don't sweep enough on that call and when
mCentral_CacheSpan later calls reimburseSweepCredit, it's very likely
to underflow mheap_.spanBytesAlloc, which causes the next call to
deductSweepCredit to think it owes a huge number of pages and finish
off the whole sweep.

In addition to causing the occasional allocation that triggers the
full sweep to be potentially extremely expensive relative to other
allocations, this can indirectly slow down many other allocations.
deductSweepCredit uses sweepone to sweep spans, which returns
fully-unused spans to the heap, where these spans are freed and
coalesced with neighboring free spans. On the other hand, when
mCentral_CacheSpan sweeps a span, it does so with the intent to
immediately reuse that span and, as a result, will not return the span
to the heap even if it is fully unused. This saves on the cost of
locking the heap, finding a span, and initializing that span. For
example, before this change, with GOMAXPROCS=1 (or the background
sweeper disabled) BinaryTree17 returned roughly 220K spans to the heap
and allocated new spans from the heap roughly 232K times. After this
change, it returns 1.3K spans to the heap and allocates new spans from
the heap 39K times. (With background sweeping these numbers are
effectively unchanged because the background sweeper sweeps almost all
of the spans with sweepone; however, parallel sweeping saves more than
the cost of allocating spans from the heap.)
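
For illustration only, a stripped-down sketch of the bug class
(hypothetical counter and function names, not the runtime code):
deducting a per-object size but reimbursing a per-span size leaves the
credit counter short, so the reimbursement underflows it.

    package main

    import "fmt"

    const pageSize = 8192

    // spanBytesAlloc mimics a counter of span bytes charged for sweeping.
    var spanBytesAlloc uint64

    // deductSweepCredit expects the size in bytes of the span being
    // allocated and charges the counter by that amount.
    func deductSweepCredit(spanBytes uint64) {
        spanBytesAlloc += spanBytes
        // ...sweep proportionally to spanBytes...
    }

    // reimburseSweepCredit returns the charge for a span obtained without
    // allocating fresh pages; here the underflow is made explicit.
    func reimburseSweepCredit(spanBytes uint64) {
        if spanBytes > spanBytesAlloc {
            panic("underflow: reimbursing more than was deducted")
        }
        spanBytesAlloc -= spanBytes
    }

    func main() {
        const objSize = 64
        spanBytes := uint64(pageSize) // a one-page span

        // Buggy pattern: charge a single object's size...
        deductSweepCredit(objSize)
        // ...but reimburse the whole span: the counter underflows.
        func() {
            defer func() { fmt.Println("recovered:", recover()) }()
            reimburseSweepCredit(spanBytes)
        }()

        // Fixed pattern: charge and reimburse the same span-sized amount.
        spanBytesAlloc = 0
        deductSweepCredit(spanBytes)
        reimburseSweepCredit(spanBytes)
        fmt.Println("balanced:", spanBytesAlloc) // 0
    }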

Fixes #13535.
Fixes #13589.

name                      old time/op    new time/op    delta
BinaryTree17-12              3.03s ± 1%     2.86s ± 4%  -5.61%  (p=0.000 n=18+20)
Fannkuch11-12                2.48s ± 1%     2.49s ± 1%    ~     (p=0.060 n=17+20)
FmtFprintfEmpty-12          50.7ns ± 1%    50.9ns ± 1%  +0.43%  (p=0.025 n=15+16)
FmtFprintfString-12          174ns ± 2%     174ns ± 2%    ~     (p=0.539 n=19+20)
FmtFprintfInt-12             158ns ± 1%     158ns ± 1%    ~     (p=0.300 n=18+20)
FmtFprintfIntInt-12          269ns ± 2%     269ns ± 2%    ~     (p=0.784 n=20+18)
FmtFprintfPrefixedInt-12     233ns ± 1%     234ns ± 1%    ~     (p=0.389 n=18+18)
FmtFprintfFloat-12           309ns ± 1%     310ns ± 1%  +0.25%  (p=0.048 n=18+18)
FmtManyArgs-12              1.10µs ± 1%    1.10µs ± 1%    ~     (p=0.259 n=18+19)
GobDecode-12                7.81ms ± 1%    7.72ms ± 1%  -1.17%  (p=0.000 n=19+19)
GobEncode-12                6.56ms ± 0%    6.55ms ± 1%    ~     (p=0.433 n=17+19)
Gzip-12                      318ms ± 2%     317ms ± 1%    ~     (p=0.578 n=19+18)
Gunzip-12                   42.1ms ± 2%    42.0ms ± 0%  -0.45%  (p=0.007 n=18+16)
HTTPClientServer-12         63.9µs ± 1%    64.0µs ± 1%    ~     (p=0.146 n=17+19)
JSONEncode-12               16.4ms ± 1%    16.4ms ± 1%    ~     (p=0.271 n=19+19)
JSONDecode-12               58.1ms ± 1%    58.0ms ± 1%    ~     (p=0.152 n=18+18)
Mandelbrot200-12            3.85ms ± 0%    3.85ms ± 0%    ~     (p=0.126 n=19+18)
GoParse-12                  3.71ms ± 1%    3.64ms ± 1%  -1.86%  (p=0.000 n=20+18)
RegexpMatchEasy0_32-12       100ns ± 2%     100ns ± 1%    ~     (p=0.588 n=20+20)
RegexpMatchEasy0_1K-12       346ns ± 1%     347ns ± 1%  +0.27%  (p=0.014 n=17+20)
RegexpMatchEasy1_32-12      82.9ns ± 3%    83.5ns ± 3%    ~     (p=0.096 n=19+20)
RegexpMatchEasy1_1K-12       506ns ± 1%     506ns ± 1%    ~     (p=0.530 n=19+19)
RegexpMatchMedium_32-12      129ns ± 2%     129ns ± 1%    ~     (p=0.566 n=20+19)
RegexpMatchMedium_1K-12     39.4µs ± 1%    39.4µs ± 1%    ~     (p=0.713 n=19+20)
RegexpMatchHard_32-12       2.05µs ± 1%    2.06µs ± 1%  +0.36%  (p=0.008 n=18+20)
RegexpMatchHard_1K-12       61.6µs ± 1%    61.7µs ± 1%    ~     (p=0.286 n=19+20)
Revcomp-12                   538ms ± 1%     541ms ± 2%    ~     (p=0.081 n=18+19)
Template-12                 71.5ms ± 2%    71.6ms ± 1%    ~     (p=0.513 n=20+19)
TimeParse-12                 357ns ± 1%     357ns ± 1%    ~     (p=0.935 n=19+18)
TimeFormat-12                352ns ± 1%     352ns ± 1%    ~     (p=0.293 n=19+20)
[Geo mean]                  62.0µs         61.9µs       -0.21%

name              old time/op  new time/op  delta
XBenchGarbage-12  5.83ms ± 2%  5.86ms ± 3%    ~     (p=0.247 n=19+20)

Change-Id: I790bb530adace27ccf25d372f24a11954b88443c
Reviewed-on: https://go-review.googlesource.com/17745
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-12-15 16:15:38 +00:00
Matthew Dempsky c17c42e8a5 runtime: rewrite lots of foo_Bar(f, ...) into f.bar(...)
Applies to types fixAlloc, mCache, mCentral, mHeap, mSpan, and
mSpanList.

Two special cases:

1. mHeap_Scavenge() previously didn't take an *mheap parameter, so it
was specially handled in this CL.

2. mHeap_Free() would have collided with mheap's "free" field, so it's
been renamed to (*mheap).freeSpan to parallel its underlying
(*mheap).freeSpanLocked method.
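
The mechanical shape of the rewrite, shown on an invented approximation
rather than the actual runtime code: a package-level foo_Bar function
whose first parameter is the receiver becomes a method on that type,
and call sites change accordingly.

    package main

    import "fmt"

    type mCache struct {
        tinyoffset uintptr
    }

    // Before: the receiver is the first parameter of a package-level
    // function, called as mCache_Refill(c, ...).
    func mCache_Refill(c *mCache, sizeclass int) {
        c.tinyoffset = 0
        fmt.Println("refill size class", sizeclass)
    }

    // After: the same logic as a method, called as c.refill(...).
    func (c *mCache) refill(sizeclass int) {
        c.tinyoffset = 0
        fmt.Println("refill size class", sizeclass)
    }

    func main() {
        c := &mCache{}
        mCache_Refill(c, 5) // old style
        c.refill(5)         // new style
    }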

Change-Id: I325938554cca432c166fe9d9d689af2bbd68de4b
Reviewed-on: https://go-review.googlesource.com/16221
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-12 00:34:58 +00:00