mirror/go - go - Git Fam. Sieh

Commit Graph

Author	SHA1	Message	Date
Austin Clements	15744c92de	[dev.garbage] runtime: remove unused head/end arguments from freeSpan These used to be used for the list of newly freed objects, but that's no longer a thing. Change-Id: I5a4503137b74ec0eae5372ca271b1aa0b32df074 Reviewed-on: https://go-review.googlesource.com/22557 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-04-29 03:53:08 +00:00
Rick Hudson	2063d5d903	[dev.garbage] runtime: restructure alloc and mark bits Two changes are included here that are dependent on the other. The first is that allocBits and gcamrkBits are changed to a *uint8 which points to the first byte of that span's mark and alloc bits. Several places were altered to perform pointer arithmetic to locate the byte corresponding to an object in the span. The actual bit corresponding to an object is indexed in the byte by using the lower three bits of the objects index. The second change avoids the redundant calculation of an object's index. The index is returned from heapBitsForObject and then used by the functions indexing allocBits and gcmarkBits. Finally we no longer allocate the gc bits in the span structures. Instead we use an arena based allocation scheme that allows for a more compact bit map as well as recycling and bulk clearing of the mark bits. Change-Id: If4d04b2021c092ec39a4caef5937a8182c64dfef Reviewed-on: https://go-review.googlesource.com/20705 Reviewed-by: Austin Clements <austin@google.com>	2016-04-29 00:00:47 +00:00
Rick Hudson	f8d0d4fd59	[dev.garbage] runtime: cleanup and optimize span.base() Prior to this CL the base of a span was calculated in various places using shifts or calls to base(). This CL now always calls base() which has been optimized to calculate the base of the span when the span is initialized and store that value in the span structure. Change-Id: I661f2bfa21e3748a249cdf049ef9062db6e78100 Reviewed-on: https://go-review.googlesource.com/20703 Reviewed-by: Austin Clements <austin@google.com>	2016-04-27 21:54:59 +00:00
Rick Hudson	8dda1c4c08	[dev.garbage] runtime: remove heapBitsSweepSpan Prior to this CL the sweep phase was responsible for locating all objects that were about to be freed and calling a function to process the object. This was done by the function heapBitsSweepSpan. Part of processing included calls to tracefree and msanfree as well as counting how many objects were freed. The calls to tracefree and msanfree have been moved into the gcmalloc routine and called when the object is about to be reallocated. The counting of free objects has been optimized using an array based popcnt algorithm and if all the objects in a span are free then span is freed. Similarly the code to locate the next free object has been optimized to use an array based ctz (count trailing zero). Various hot paths in the allocation logic have been optimized. At this point the garbage benchmark is within 3% of the 1.6 release. Change-Id: I00643c442e2ada1685c010c3447e4ea8537d2dfa Reviewed-on: https://go-review.googlesource.com/20201 Reviewed-by: Austin Clements <austin@google.com>	2016-04-27 21:54:57 +00:00
Rick Hudson	4093481523	[dev.garbage] runtime: add bit and cache ctz64 (count trailing zero) Add to each span a 64 bit cache (allocCache) of the allocBits at freeindex. allocCache is shifted such that the lowest bit corresponds to the bit freeindex. allocBits uses a 0 to indicate an object is free, on the other hand allocCache uses a 1 to indicate an object is free. This facilitates ctz64 (count trailing zero) which counts the number of 0s trailing the least significant 1. This is also the index of the least significant 1. Each span maintains a freeindex indicating the boundary between allocated objects and unallocated objects. allocCache is shifted as freeindex is incremented such that the low bit in allocCache corresponds to the bit a freeindex in the allocBits array. Currently ctz64 is written in Go using a for loop so it is not very efficient. Use of the hardware instruction will follow. With this in mind comparisons of the garbage benchmark are as follows. 1.6 release 2.8 seconds dev:garbage branch 3.1 seconds. Profiling shows the go implementation of ctz64 takes up 1% of the total time. Change-Id: If084ed9c3b1eda9f3c6ab2e794625cb870b8167f Reviewed-on: https://go-review.googlesource.com/20200 Reviewed-by: Austin Clements <austin@google.com>	2016-04-27 21:54:54 +00:00
Rick Hudson	3479b065d4	[dev.garbage] runtime: allocate directly from GC mark bits Instead of building a freelist from the mark bits generated by the GC this CL allocates directly from the mark bits. The approach moves the mark bits from the pointer/no pointer heap structures into their own per span data structures. The mark/allocation vectors consist of a single mark bit per object. Two vectors are maintained, one for allocation and one for the GC's mark phase. During the GC cycle's sweep phase the interpretation of the vectors is swapped. The mark vector becomes the allocation vector and the old allocation vector is cleared and becomes the mark vector that the next GC cycle will use. Marked entries in the allocation vector indicate that the object is not free. Each allocation vector maintains a boundary between areas of the span already allocated from and areas not yet allocated from. As objects are allocated this boundary is moved until it reaches the end of the span. At this point further allocations will be done from another span. Since we no longer sweep a span inspecting each freed object the responsibility for maintaining pointer/scalar bits in the heapBitMap containing is now the responsibility of the the routines doing the actual allocation. This CL is functionally complete and ready for performance tuning. Change-Id: I336e0fc21eef1066e0b68c7067cc71b9f3d50e04 Reviewed-on: https://go-review.googlesource.com/19470 Reviewed-by: Austin Clements <austin@google.com>	2016-04-27 21:54:47 +00:00
Austin Clements	17f6e5396b	runtime: print sweep ratio if gcpacertrace>0 Change-Id: I5217bf4b75e110ca2946e1abecac6310ed84dad5 Reviewed-on: https://go-review.googlesource.com/21205 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Rick Hudson <rlh@golang.org>	2016-03-30 02:27:58 +00:00
Austin Clements	0d26efb12a	runtime: remove unnecessary clears of the heap bitmap Currently we clear the heap bitmap of a span both when we allocate that span and when we free it. There's no point in doing both, and we definitely have to write the heap bitmap when we allocate a span for pointer-sized objects, so switch to clearing only when we allocate a span. This results in a slight overall performance improvement; however, most of the benchmarks that get slower are very short, while the longer benchmarks generally got faster. name old time/op new time/op delta XBenchGarbage-12 2.48ms ± 1% 2.47ms ± 1% -0.58% (p=0.000 n=91+91) name old time/op new time/op delta BinaryTree17-12 2.85s ± 2% 2.85s ± 2% ~ (p=0.550 n=20+19) Fannkuch11-12 2.54s ± 0% 2.47s ± 1% -2.72% (p=0.000 n=19+18) FmtFprintfEmpty-12 51.3ns ± 4% 51.0ns ± 3% ~ (p=0.223 n=20+20) FmtFprintfString-12 169ns ± 0% 167ns ± 0% -1.18% (p=0.000 n=17+16) FmtFprintfInt-12 160ns ± 0% 161ns ± 0% +0.63% (p=0.000 n=16+15) FmtFprintfIntInt-12 267ns ± 0% 269ns ± 1% +0.62% (p=0.000 n=17+20) FmtFprintfPrefixedInt-12 234ns ± 1% 240ns ± 0% +2.80% (p=0.000 n=20+20) FmtFprintfFloat-12 316ns ± 0% 313ns ± 0% -0.76% (p=0.000 n=20+19) FmtManyArgs-12 1.04µs ± 0% 1.05µs ± 0% +0.45% (p=0.000 n=19+16) GobDecode-12 7.90ms ± 1% 7.81ms ± 0% -1.10% (p=0.000 n=18+18) GobEncode-12 6.61ms ± 1% 6.58ms ± 0% -0.46% (p=0.000 n=20+15) Gzip-12 320ms ± 1% 322ms ± 1% +0.47% (p=0.030 n=20+20) Gunzip-12 42.4ms ± 1% 42.6ms ± 0% +0.37% (p=0.000 n=20+20) HTTPClientServer-12 70.7µs ± 1% 70.6µs ± 2% ~ (p=0.784 n=18+20) JSONEncode-12 16.9ms ± 1% 16.8ms ± 0% -0.64% (p=0.000 n=20+20) JSONDecode-12 60.8ms ± 0% 58.6ms ± 1% -3.50% (p=0.000 n=17+18) Mandelbrot200-12 3.92ms ± 0% 3.91ms ± 0% -0.25% (p=0.000 n=19+19) GoParse-12 3.65ms ± 0% 3.68ms ± 1% +0.67% (p=0.000 n=17+16) RegexpMatchEasy0_32-12 102ns ± 1% 102ns ± 2% +0.67% (p=0.009 n=19+19) RegexpMatchEasy0_1K-12 350ns ± 0% 351ns ± 1% +0.34% (p=0.002 n=20+20) RegexpMatchEasy1_32-12 84.1ns ± 2% 84.2ns ± 2% ~ (p=0.799 n=20+18) RegexpMatchEasy1_1K-12 510ns ± 1% 508ns ± 1% -0.45% (p=0.000 n=20+17) RegexpMatchMedium_32-12 132ns ± 1% 134ns ± 1% +0.85% (p=0.000 n=20+19) RegexpMatchMedium_1K-12 40.0µs ± 1% 39.9µs ± 1% -0.29% (p=0.014 n=19+18) RegexpMatchHard_32-12 2.09µs ± 1% 2.05µs ± 0% -1.76% (p=0.000 n=20+18) RegexpMatchHard_1K-12 62.7µs ± 1% 61.8µs ± 1% -1.39% (p=0.000 n=20+19) Revcomp-12 541ms ± 1% 534ms ± 0% -1.16% (p=0.000 n=19+20) Template-12 71.1ms ± 0% 69.1ms ± 0% -2.83% (p=0.000 n=18+19) TimeParse-12 356ns ± 0% 357ns ± 0% +0.36% (p=0.000 n=17+19) TimeFormat-12 358ns ± 0% 372ns ± 1% +3.74% (p=0.000 n=15+18) [Geo mean] 62.6µs 62.5µs -0.25% Change-Id: Ied190b77c7a4d91ec7b2218c592fc31cf7acf362 Reviewed-on: https://go-review.googlesource.com/19633 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-02-25 23:37:19 +00:00
Austin Clements	4ad64cadf8	runtime: trace sweep completion in gcpacertrace mode Change-Id: I7991612e4d064c15492a39c19f753df1db926203 Reviewed-on: https://go-review.googlesource.com/17747 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Russ Cox <rsc@golang.org>	2015-12-15 16:15:59 +00:00
Austin Clements	c1cbe5b577	runtime: check for spanBytesAlloc underflow Change-Id: I5e6739ff0c6c561195ed9891fb90f933b81e7750 Reviewed-on: https://go-review.googlesource.com/17746 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Russ Cox <rsc@golang.org>	2015-12-15 16:15:47 +00:00
Michael Matloob	432cb66f16	runtime: break out system-specific constants into package sys runtime/internal/sys will hold system-, architecture- and config- specific constants. Updates #11647 Change-Id: I6db29c312556087a42e8d2bdd9af40d157c56b54 Reviewed-on: https://go-review.googlesource.com/16817 Reviewed-by: Russ Cox <rsc@golang.org>	2015-11-12 17:04:45 +00:00
Matthew Dempsky	c17c42e8a5	runtime: rewrite lots of foo_Bar(f, ...) into f.bar(...) Applies to types fixAlloc, mCache, mCentral, mHeap, mSpan, and mSpanList. Two special cases: 1. mHeap_Scavenge() previously didn't take an mheap parameter, so it was specially handled in this CL. 2. mHeap_Free() would have collided with mheap's "free" field, so it's been renamed to (mheap).freeSpan to parallel its underlying (*mheap).freeSpanLocked method. Change-Id: I325938554cca432c166fe9d9d689af2bbd68de4b Reviewed-on: https://go-review.googlesource.com/16221 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-11-12 00:34:58 +00:00
Austin Clements	7d1d642956	runtime: fix use of xadd64 Commit `7407d8e` was rebased over the switch to runtime/internal/atomic and introduced a call to xadd64, which no longer exists. Fix that call. Change-Id: I99c93469794c16504ae4a8ffe3066ac382c66a3a Reviewed-on: https://go-review.googlesource.com/16816 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-11-11 15:26:24 +00:00
Austin Clements	7407d8e582	runtime: fix over-aggressive proportional sweep Currently, sweeping is performed before allocating a span by charging for the entire size of the span requested, rather than the number of bytes actually available for allocation from the returned span. That is, if the returned span is 8K, but already has 6K in use, the mutator is charged for 8K of heap allocation even though it can only allocate 2K more from the span. As a result, proportional sweep is over-aggressive and tends to finish much earlier than it needs to. This effect is more amplified by fragmented heaps. Fix this by reimbursing the mutator for the used space in a span once it has allocated that span. We still have to charge up-front for the worst-case because we don't know which span the mutator will get, but at least we can correct the over-charge once it has a span, which will go toward later span allocations. This has negligible effect on the throughput of the go1 benchmarks and the garbage benchmark. Fixes #12040. Change-Id: I0e23e7a4ccf126cca000fed5067b20017028dd6b Reviewed-on: https://go-review.googlesource.com/16515 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-11-11 15:21:32 +00:00
Michael Matloob	67faca7d9c	runtime: break atomics out into package runtime/internal/atomic This change breaks out most of the atomics functions in the runtime into package runtime/internal/atomic. It adds some basic support in the toolchain for runtime packages, and also modifies linux/arm atomics to remove the dependency on the runtime's mutex. The mutexes have been replaced with spinlocks. all trybots are happy! In addition to the trybots, I've tested on the darwin/arm64 builder, on the darwin/arm builder, and on a ppc64le machine. Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f Reviewed-on: https://go-review.googlesource.com/14204 Run-TryBot: Michael Matloob <matloob@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-11-10 17:38:04 +00:00
Dmitry Vyukov	ee0305e036	runtime: remove dead code runtime.free has long gone. Change-Id: I058f69e6481b8fa008e1951c29724731a8a3d081 Reviewed-on: https://go-review.googlesource.com/16593 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com>	2015-11-03 19:20:21 +00:00
Dmitry Vyukov	bf606094ee	runtime: fix finalization and profiling of tiny allocations Handling of special records for tiny allocations has two problems: 1. Once we queue a finalizer we mark the object. As the result any subsequent finalizers for the same object will not be queued during this GC cycle. If we have 16 finalizers setup (the worst case), finalization will take 16 GC cycles. This is what caused misbehave of tinyfin.go. The actual flakiness was caused by the fact that fing is asynchronous and don't always run before the check. 2. If a tiny block has both finalizer and profile specials, it is possible that we both queue finalizer, preserve the object live and free the profile record. As the result heap profile can be skewed. Fix both issues by analyzing all special records for a single object at once. Also, make tinyfin test stricter and remove reliance on real time. Also, add a test for the problem 2. Currently heap profile missed about a half of live memory. Fixes #13100 Change-Id: I9ae4dc1c44893724138a4565ca5cae29f2e97544 Reviewed-on: https://go-review.googlesource.com/16591 Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Dmitry Vyukov <dvyukov@google.com>	2015-11-03 18:57:18 +00:00
Ian Lance Taylor	73f329f472	runtime, syscall: add calls to msan functions Add explicit memory sanitizer instrumentation to the runtime and syscall packages. The compiler does not instrument the runtime package. It does instrument the syscall package, but we need to add a couple of cases that it can't see. Change-Id: I2d66073f713fe67e33a6720460d2bb8f72f31394 Reviewed-on: https://go-review.googlesource.com/16164 Reviewed-by: David Crawshaw <crawshaw@golang.org>	2015-10-21 19:17:46 +00:00
Austin Clements	9a31d38f65	runtime: remove sweep wait loop in finishsweep_m In general, finishsweep_m must block until any spans that are concurrently being swept have been swept. It accomplishes this by looping over all spans, which, as in the previous commit, takes ~1ms/heap GB. Unfortunately, we do this during the STW sweep termination phase, so multi-gigabyte heaps can push our STW time past 10ms. However, there's no need to do this wait if the world is stopped because, in effect, stopping the world already had to wait for anything that was sweeping (and if it didn't, the wait in finishsweep_m would deadlock). Hence, we can simply skip this loop if the world is stopped, such as during sweep termination. In fact, currently all calls to finishsweep_m are STW, but this hasn't always been the case and may not be the case in the future, so we keep the logic around. For 24GB heaps, this reduces max pause time by 75% relative to tip and by 90% relative to Go 1.5. Notably, all pauses are now well under 10ms. Here are the results for the garbage benchmark: ------------- max pause ------------ Heap Procs after change before change 1.5.1 24GB 12 3.8ms 16ms 37ms 24GB 4 3.7ms 16ms 37ms 4GB 4 3.7ms 3ms 6.9ms In the 4GB/4P case, it seems the "before change" run got lucky: the max went up, but the 99%ile pause time went down from 3ms to 2.04ms. Change-Id: Ica22189559f231d408ef2815019c9dbb5f38bf31 Reviewed-on: https://go-review.googlesource.com/15071 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-10-02 19:56:01 +00:00
Austin Clements	70462f90ec	runtime: simplify mSpan_Sweep This is a cleanup following `cc8f544`, which was a minimal change to fix issue #11617. This consolidates the two places in mSpan_Sweep that update sweepgen. Previously this was necessary because sweepgen must be updated before freeing the span, but we freed large spans early. Now we free large spans later, so there's no need to duplicate the sweepgen update. This also means large spans can take advantage of the sweepgen sanity checking performed for other spans. Change-Id: I23b79dbd9ec81d08575cd307cdc0fa6b20831768 Reviewed-on: https://go-review.googlesource.com/12451 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-09-14 18:29:58 +00:00
Austin Clements	fc9ca85f4c	runtime: make sweep proportional to spans bytes allocated Proportional concurrent sweep is currently based on a ratio of spans to be swept per bytes of object allocation. However, proportional sweeping is performed during span allocation, not object allocation, in order to minimize contention and overhead. Since objects are allocated from spans after those spans are allocated, the system tends to operate in debt, which means when the next GC cycle starts, there is often sweep debt remaining, so GC has to finish the sweep, which delays the start of the cycle and delays enabling mutator assists. For example, it's quite likely that many Ps will simultaneously refill their span caches immediately after a GC cycle (because GC flushes the span caches), but at this point, there has been very little object allocation since the end of GC, so very little sweeping is done. The Ps then allocate objects from these cached spans, which drives up the bytes of object allocation, but since these allocations are coming from cached spans, nothing considers whether more sweeping has to happen. If the sweep ratio is high enough (which can happen if the next GC trigger is very close to the retained heap size), this can easily represent a sweep debt of thousands of pages. Fix this by making proportional sweep proportional to the number of bytes of spans allocated, rather than the number of bytes of objects allocated. Prior to allocating a span, both the small object path and the large object path ensure credit for allocating that span, so the system operates in the black, rather than in the red. Combined with the previous commit, this should eliminate all sweeping from GC start up. On the stress test in issue #11911, this reduces the time spent sweeping during GC (and delaying start up) by several orders of magnitude: mean 99%ile max pre fix 1 ms 11 ms 144 ms post fix 270 ns 735 ns 916 ns Updates #11911. Change-Id: I89223712883954c9d6ec2a7a51ecb97172097df3 Reviewed-on: https://go-review.googlesource.com/13044 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-08-04 18:54:44 +00:00
Austin Clements	e33d6b3d4d	runtime: remove out-of-date comment An out-of-date comment snuck in to `cc8f544`. Remove it. Change-Id: I5bc7c17e737d1cabe57b88de06d7579c60ca28ff Reviewed-on: https://go-review.googlesource.com/12328 Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2015-07-17 16:52:32 +00:00
Austin Clements	cc8f544198	runtime: don't free large spans until heapBitsSweepSpan returns This fixes a race between 1) sweeping and freeing an unmarked large span and 2) reusing that span and allocating from it. This race arises because mSpan_Sweep returns spans for large objects to the heap before heapBitsSweepSpan clears the mark bit on the object in the span. Specifically, the following sequence of events can lead to an incorrectly zeroed bitmap byte, which causes the garbage collector to not trace any pointers in that object (the pointer bits for the first four words are cleared, and the scan bits are also cleared, so it looks like a no-scan object). 1) P0 calls mSpan_Sweep on a large span S0 with an unmarked object on it. 2) mSpan_Sweep calls heapBitsSweepSpan, which invokes the callback for the one (unmarked) object on the span. 3) The callback calls mHeap_Free, which makes span S0 available for allocation, but this is too early. 4) P1 grabs this S0 from the heap to use for allocation. 5) P1 allocates an object on this span and writes that object's type bits to the bitmap. 6) P0 returns from the callback to heapBitsSweepSpan. heapBitsSweepSpan clears the byte containing the mark, even though this span is now owned by P1 and this byte contains important bitmap information. This fixes this problem by simply delaying the mHeap_Free until after the heapBitsSweepSpan. I think the overall logic of mSpan_Sweep could be simplified now, but this seems like the minimal change. Fixes #11617. Change-Id: I6b1382c7e7cc35f81984467c0772fe9848b7522a Reviewed-on: https://go-review.googlesource.com/12320 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Rob Pike <r@golang.org>	2015-07-17 03:34:11 +00:00
Austin Clements	5250279eb9	runtime: detect and print corrupted free lists Issues #10240, #10541, #10941, #11023, #11027 and possibly others are indicating memory corruption in the runtime. One of the easiest places to both get corruption and detect it is in the allocator's free lists since they appear throughout memory and follow strict invariants. This commit adds a check when sweeping a span that its free list is sane and, if not, it prints the corrupted free list and panics. Hopefully this will help us collect more information on these failures. Change-Id: I6d417bcaeedf654943a5e068bd76b58bb02d4a64 Reviewed-on: https://go-review.googlesource.com/10713 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com>	2015-06-16 21:17:47 +00:00
Austin Clements	24a7252e25	runtime: finish sweeping before concurrent GC starts Currently, the concurrent sweep follows a 1:1 rule: when allocation needs a span, it sweeps a span (likewise, when a large allocation needs N pages, it sweeps until it frees N pages). This rule worked well for the STW collector (especially when GOGC==100) because it did no more sweeping than necessary to keep the heap from growing, would generally finish sweeping just before GC, and ensured good temporal locality between sweeping a page and allocating from it. It doesn't work well with concurrent GC. Since concurrent GC requires starting GC earlier (sometimes much earlier), the sweep often won't be done when GC starts. Unfortunately, the first thing GC has to do is finish the sweep. In the mean time, the mutator can continue allocating, pushing the heap size even closer to the goal size. This worked okay with the 7/8ths trigger, but it gets into a vicious cycle with the GC trigger controller: if the mutator is allocating quickly and driving the trigger lower, more and more sweep work will be left to GC; this both causes GC to take longer (allowing the mutator to allocate more during GC) and delays the start of the concurrent mark phase, which throws off the GC controller's statistics and generally causes it to push the trigger even lower. As an example of a particularly bad case, the garbage benchmark with GOMAXPROCS=4 and -benchmem 512 (MB) spends the first 0.4-0.8 seconds of each GC cycle sweeping, during which the heap grows by between 109MB and 252MB. To fix this, this change replaces the 1:1 sweep rule with a proportional sweep rule. At the end of GC, GC knows exactly how much heap allocation will occur before the next concurrent GC as well as how many span pages must be swept. This change computes this "sweep ratio" and when the mallocgc asks for a span, the mcentral sweeps enough spans to bring the swept span count into ratio with the allocated byte count. On the benchmark from above, this entirely eliminates sweeping at the beginning of GC, which reduces the time between startGC readying the GC goroutine and GC stopping the world for sweep termination to ~100µs during which the heap grows at most 134KB. Change-Id: I35422d6bba0c2310d48bb1f8f30a72d29e98c1af Reviewed-on: https://go-review.googlesource.com/8921 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-21 15:35:46 +00:00
Austin Clements	bedb6f8aef	runtime: remove unnecessary traceNextGC Commit `d7e0ad4` removed the next_gc manipulation from mSpan_Sweep, but left in the traceNextGC() for recording the updated next_gc value. Remove this now unnecessary call. Change-Id: I28e0de071661199be9810d7bdcc81ce50b5a58ae Reviewed-on: https://go-review.googlesource.com/8894 Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-04-14 20:54:23 +00:00
Austin Clements	d7e0ad4b82	runtime: introduce heap_live; replace use of heap_alloc in GC Currently there are two main consumers of memstats.heap_alloc: updatememstats (aka ReadMemStats) and shouldtriggergc. updatememstats recomputes heap_alloc from the ground up, so we don't need to keep heap_alloc up to date for it. shouldtriggergc wants to know how many bytes were marked by the previous GC plus how many bytes have been allocated since then, but this isn't what heap_alloc tracks. heap_alloc also includes objects that are not marked and haven't yet been swept. Introduce a new memstat called heap_live that actually tracks what shouldtriggergc wants to know and stop keeping heap_alloc up to date. Unlike heap_alloc, heap_live follows a simple sawtooth that drops during each mark termination and increases monotonically between GCs. heap_alloc, on the other hand, has much more complicated behavior: it may drop during sweep termination, slowly decreases from background sweeping between GCs, is roughly unaffected by allocation as long as there are unswept spans (because we sweep and allocate at the same rate), and may go up after background sweeping is done depending on the GC trigger. heap_live simplifies computing next_gc and using it to figure out when to trigger garbage collection. Currently, we guess next_gc at the end of a cycle and update it as we sweep and get a better idea of how much heap was marked. Now, since we're directly tracking how much heap is marked, we can directly compute next_gc. This also corrects bugs that could cause us to trigger GC early. Currently, in any case where sweep termination actually finds spans to sweep, heap_alloc is an overestimation of live heap, so we'll trigger GC too early. heap_live, on the other hand, is unaffected by sweeping. Change-Id: I1f96807b6ed60d4156e8173a8e68745ffc742388 Reviewed-on: https://go-review.googlesource.com/8389 Reviewed-by: Russ Cox <rsc@golang.org>	2015-04-06 21:28:13 +00:00
Dmitry Vyukov	919fd24884	runtime: remove runtime frames from stacks in traces Stip uninteresting bottom and top frames from trace stacks. This makes both binary and json trace files smaller, and also makes stacks shorter and more readable in the viewer. Change-Id: Ib9c80ccc280504f0e235f867f53f1d2652c41583 Reviewed-on: https://go-review.googlesource.com/5523 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Dmitry Vyukov <dvyukov@google.com>	2015-03-10 14:46:15 +00:00
Russ Cox	5789b28525	runtime: start GC background sweep eagerly Starting it lazily causes a memory allocation (for the goroutine) during GC. First use of channels for runtime implementation. Change-Id: I9cd24dcadbbf0ee5070ee6d0ed7ea415504f316c Reviewed-on: https://go-review.googlesource.com/6960 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-03-05 21:41:55 +00:00
Rick Hudson	122384e489	runtime: Remove boundary bit logic. This is an experiment to see if removing the boundary bit logic will lead to fewer cache misses and improved performance. Instead of using boundary bits we use the span information to get element size and use some bit whacking to get the boundary without having to touch the random heap bits which cause cache misses. Furthermore once the boundary bit is removed we can either use that bit for a simpler checkmark routine or we can reduce the number of bits in the GC bitmap to 2 bits per pointer sized work. For example the 2 bits at the boundary can be used for marking and pointer/scalar differentiation. Since we don't need the mark bit except at the boundary nibble of the object other nibbles can use this bit as a noscan bit to indicate that there are no more pointers in the object. Currently the changed included in this CL slows down the garbage benchmark. With the boundary bits garbage gives 5.78 and without (this CL) it gives 5.88 which is a 2% slowdown. Change-Id: Id68f831ad668176f7dc9f7b57b339e4ebb6dc4c2 Reviewed-on: https://go-review.googlesource.com/6665 Reviewed-by: Austin Clements <austin@google.com>	2015-03-04 20:55:55 +00:00
Russ Cox	89a091de24	runtime: split gc_m into gcMark and gcSweep This is a nice split but more importantly it provides a better way to fit the checkmark phase into the sequencing. Also factor out common span copying into gcSpanCopy. Change-Id: Ia058644974e4ed4ac3cf4b017a3446eb2284d053 Reviewed-on: https://go-review.googlesource.com/5333 Reviewed-by: Austin Clements <austin@google.com>	2015-02-20 17:00:39 +00:00
Russ Cox	484f801ff4	runtime: reorganize memory code Move code from malloc1.go, malloc2.go, mem.go, mgc0.go into appropriate locations. Factor mgc.go into mgc.go, mgcmark.go, mgcsweep.go, mstats.go. A lot of this code was in certain files because the right place was in a C file but it was written in Go, or vice versa. This is one step toward making things actually well-organized again. Change-Id: I6741deb88a7cfb1c17ffe0bcca3989e10207968f Reviewed-on: https://go-review.googlesource.com/5300 Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-02-19 20:17:01 +00:00

1 2

82 Commits