mirror/go - go - Git Fam. Sieh

Commit Graph

Author	SHA1	Message	Date
Ian Lance Taylor	be1ef46775	runtime: add optional expensive check for invalid cgo pointer passing If you set GODEBUG=cgocheck=2 the runtime package will use the write barrier to detect cases where a Go program writes a Go pointer into non-Go memory. In conjunction with the existing cgo checks, and the not-yet-implemented cgo check for exported functions, this should reliably detect all cases (that do not import the unsafe package) in which a Go pointer is incorrectly shared with C code. This check is optional because it turns on the write barrier at all times, which is known to be expensive. Update #12416. Change-Id: I549d8b2956daa76eac853928e9280e615d6365f4 Reviewed-on: https://go-review.googlesource.com/16899 Reviewed-by: Russ Cox <rsc@golang.org>	2015-11-16 18:39:06 +00:00
Michael Matloob	432cb66f16	runtime: break out system-specific constants into package sys runtime/internal/sys will hold system-, architecture- and config- specific constants. Updates #11647 Change-Id: I6db29c312556087a42e8d2bdd9af40d157c56b54 Reviewed-on: https://go-review.googlesource.com/16817 Reviewed-by: Russ Cox <rsc@golang.org>	2015-11-12 17:04:45 +00:00
Matthew Dempsky	c17c42e8a5	runtime: rewrite lots of foo_Bar(f, ...) into f.bar(...) Applies to types fixAlloc, mCache, mCentral, mHeap, mSpan, and mSpanList. Two special cases: 1. mHeap_Scavenge() previously didn't take an mheap parameter, so it was specially handled in this CL. 2. mHeap_Free() would have collided with mheap's "free" field, so it's been renamed to (mheap).freeSpan to parallel its underlying (*mheap).freeSpanLocked method. Change-Id: I325938554cca432c166fe9d9d689af2bbd68de4b Reviewed-on: https://go-review.googlesource.com/16221 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-11-12 00:34:58 +00:00
Michael Matloob	67faca7d9c	runtime: break atomics out into package runtime/internal/atomic This change breaks out most of the atomics functions in the runtime into package runtime/internal/atomic. It adds some basic support in the toolchain for runtime packages, and also modifies linux/arm atomics to remove the dependency on the runtime's mutex. The mutexes have been replaced with spinlocks. all trybots are happy! In addition to the trybots, I've tested on the darwin/arm64 builder, on the darwin/arm builder, and on a ppc64le machine. Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f Reviewed-on: https://go-review.googlesource.com/14204 Run-TryBot: Michael Matloob <matloob@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-11-10 17:38:04 +00:00
Austin Clements	beedb1ec33	runtime: add pcvalue cache to improve stack scan speed The cost of scanning large stacks is currently dominated by the time spent looking up and decoding the pcvalue table. However, large stacks are usually large not because they contain calls to many different functions, but because they contain many calls to the same, small set of recursive functions. Hence, walking large stacks tends to make the same pcvalue queries many times. Based on this observation, this commit adds a small, very simple, and fast cache in front of pcvalue lookup. We thread this cache down from operations that make many pcvalue calls, such as gentraceback, stack scanning, and stack adjusting. This simple cache works well because it has minimal overhead when it's not effective. I also tried a hashed direct-map cache, CLOCK-based replacement, round-robin replacement, and round-robin with lookups disabled until there had been at least 16 probes, but none of these approaches had obvious wins over the random replacement policy in this commit. This nearly doubles the overall performance of the deep stack test program from issue #10898: name old time/op new time/op delta Issue10898 16.5s ±12% 9.2s ±12% -44.37% (p=0.008 n=5+5) It's a very slight win on the garbage benchmark: name old time/op new time/op delta XBenchGarbage-12 4.92ms ± 1% 4.89ms ± 1% -0.75% (p=0.000 n=18+19) It's a wash (but doesn't harm performance) on the go1 benchmarks, which don't have particularly deep stacks: name old time/op new time/op delta BinaryTree17-12 3.11s ± 2% 3.20s ± 3% +2.83% (p=0.000 n=17+20) Fannkuch11-12 2.51s ± 1% 2.51s ± 1% -0.22% (p=0.034 n=19+18) FmtFprintfEmpty-12 50.8ns ± 3% 50.6ns ± 2% ~ (p=0.793 n=20+20) FmtFprintfString-12 174ns ± 0% 174ns ± 1% +0.17% (p=0.048 n=15+20) FmtFprintfInt-12 177ns ± 0% 165ns ± 1% -6.99% (p=0.000 n=17+19) FmtFprintfIntInt-12 283ns ± 1% 284ns ± 0% +0.22% (p=0.000 n=18+15) FmtFprintfPrefixedInt-12 243ns ± 1% 244ns ± 1% +0.40% (p=0.000 n=20+19) FmtFprintfFloat-12 318ns ± 0% 319ns ± 0% +0.27% (p=0.001 n=19+20) FmtManyArgs-12 1.12µs ± 0% 1.14µs ± 0% +1.74% (p=0.000 n=19+20) GobDecode-12 8.69ms ± 0% 8.73ms ± 1% +0.46% (p=0.000 n=18+18) GobEncode-12 6.64ms ± 1% 6.61ms ± 1% -0.46% (p=0.000 n=20+20) Gzip-12 323ms ± 2% 319ms ± 1% -1.11% (p=0.000 n=20+20) Gunzip-12 42.8ms ± 0% 42.9ms ± 0% ~ (p=0.158 n=18+20) HTTPClientServer-12 63.3µs ± 1% 63.1µs ± 1% -0.35% (p=0.011 n=20+20) JSONEncode-12 16.9ms ± 1% 17.3ms ± 1% +2.84% (p=0.000 n=19+20) JSONDecode-12 59.7ms ± 0% 58.5ms ± 0% -2.05% (p=0.000 n=19+17) Mandelbrot200-12 3.92ms ± 0% 3.91ms ± 0% -0.16% (p=0.003 n=19+19) GoParse-12 3.79ms ± 2% 3.75ms ± 2% -0.91% (p=0.005 n=20+20) RegexpMatchEasy0_32-12 102ns ± 1% 101ns ± 1% -0.80% (p=0.001 n=14+20) RegexpMatchEasy0_1K-12 337ns ± 1% 346ns ± 1% +2.90% (p=0.000 n=20+19) RegexpMatchEasy1_32-12 84.4ns ± 2% 84.3ns ± 2% ~ (p=0.743 n=20+20) RegexpMatchEasy1_1K-12 502ns ± 1% 505ns ± 0% +0.64% (p=0.000 n=20+20) RegexpMatchMedium_32-12 133ns ± 1% 132ns ± 1% -0.85% (p=0.000 n=20+19) RegexpMatchMedium_1K-12 40.1µs ± 1% 39.8µs ± 1% -0.77% (p=0.000 n=18+18) RegexpMatchHard_32-12 2.08µs ± 1% 2.07µs ± 1% -0.55% (p=0.001 n=18+19) RegexpMatchHard_1K-12 62.4µs ± 1% 62.0µs ± 1% -0.74% (p=0.000 n=19+19) Revcomp-12 545ms ± 2% 545ms ± 3% ~ (p=0.771 n=19+20) Template-12 73.7ms ± 1% 72.0ms ± 0% -2.33% (p=0.000 n=20+18) TimeParse-12 358ns ± 1% 351ns ± 1% -2.07% (p=0.000 n=20+20) TimeFormat-12 369ns ± 1% 356ns ± 0% -3.53% (p=0.000 n=20+18) [Geo mean] 63.5µs 63.2µs -0.41% name old speed new speed delta GobDecode-12 88.3MB/s ± 0% 87.9MB/s ± 0% -0.43% (p=0.000 n=18+17) GobEncode-12 116MB/s ± 1% 116MB/s ± 1% +0.47% (p=0.000 n=20+20) Gzip-12 60.2MB/s ± 2% 60.8MB/s ± 1% +1.13% (p=0.000 n=20+20) Gunzip-12 453MB/s ± 0% 453MB/s ± 0% ~ (p=0.160 n=18+20) JSONEncode-12 115MB/s ± 1% 112MB/s ± 1% -2.76% (p=0.000 n=19+20) JSONDecode-12 32.5MB/s ± 0% 33.2MB/s ± 0% +2.09% (p=0.000 n=19+17) GoParse-12 15.3MB/s ± 2% 15.4MB/s ± 2% +0.92% (p=0.004 n=20+20) RegexpMatchEasy0_32-12 311MB/s ± 1% 314MB/s ± 1% +0.78% (p=0.000 n=15+19) RegexpMatchEasy0_1K-12 3.04GB/s ± 1% 2.95GB/s ± 1% -2.90% (p=0.000 n=19+19) RegexpMatchEasy1_32-12 379MB/s ± 2% 380MB/s ± 2% ~ (p=0.779 n=20+20) RegexpMatchEasy1_1K-12 2.04GB/s ± 1% 2.02GB/s ± 0% -0.62% (p=0.000 n=20+20) RegexpMatchMedium_32-12 7.46MB/s ± 1% 7.53MB/s ± 1% +0.86% (p=0.000 n=20+19) RegexpMatchMedium_1K-12 25.5MB/s ± 1% 25.7MB/s ± 1% +0.78% (p=0.000 n=18+18) RegexpMatchHard_32-12 15.4MB/s ± 1% 15.5MB/s ± 1% +0.62% (p=0.000 n=19+19) RegexpMatchHard_1K-12 16.4MB/s ± 1% 16.5MB/s ± 1% +0.82% (p=0.000 n=20+19) Revcomp-12 466MB/s ± 2% 466MB/s ± 3% ~ (p=0.765 n=19+20) Template-12 26.3MB/s ± 1% 27.0MB/s ± 0% +2.38% (p=0.000 n=20+18) [Geo mean] 97.8MB/s 98.0MB/s +0.23% Change-Id: I281044ae0b24990ba46487cacbc1069493274bc4 Reviewed-on: https://go-review.googlesource.com/13614 Reviewed-by: Keith Randall <khr@golang.org>	2015-10-22 17:48:13 +00:00
Matthew Dempsky	84afa1be76	runtime: make iface/eface handling more type safe Change compiler-invoked interface functions to directly take iface/eface parameters instead of fInterface/interface{} to avoid needing to always convert. For the handful of functions that legitimately need to take an interface{} parameter, add efaceOf to type-safely convert interface{} to eface. Change-Id: I8928761a12fd3c771394f36adf93d3006a9fcf39 Reviewed-on: https://go-review.googlesource.com/16166 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-10-21 23:08:22 +00:00
Austin Clements	b307910b6e	runtime: fix offset in invalidptr panic message Change-Id: I00e1eebbf5e1a01c8fad5ca5324aa8eec1e4d731 Reviewed-on: https://go-review.googlesource.com/14792 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-09-22 16:55:17 +00:00
Austin Clements	b7c55ba496	runtime: improve invalid pointer error message By default, the runtime panics if it detects a pointer to an unallocated span. At this point, this usually catches bad uses of unsafe or cgo in user code (though it could also catch runtime bugs). Unfortunately, the rather cryptic error misleads users, offers users little help with debugging their own problem, and offers the Go developers little help with root-causing. Improve the error message in various ways. First, the wording is improved to make it clearer what condition was detected and to suggest that this may be the result of incorrect use of unsafe or cgo. Second, we add a dump of the object containing the bad pointer so that there's at least some hope of figuring out why a bad pointer was stored in the Go heap. Change-Id: I57b91b12bc3cb04476399d7706679e096ce594b9 Reviewed-on: https://go-review.googlesource.com/14763 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-09-18 22:23:11 +00:00
Russ Cox	3ae17043f7	runtime: make sure heapBitsBulkBarrier cannot be preempted Changes the torture test in #12068 from failing about 1/10 times to not failing in almost 2,000 runs. This was only happening in -race mode because functions are bigger in -race mode, so a few of the helpers for heapBitsBulkBarrier were not being inlined, and they were not marked nosplit, so (only in -race mode) the write barrier was being preempted by GC, causing missed pointer updates. Filed issue #12069 for diagnosis of any other similar errors. Fixes #12068. Change-Id: Ic174d9b050ba278b18b08ab0d85a73c33bd5b175 Reviewed-on: https://go-review.googlesource.com/13364 Reviewed-by: Austin Clements <austin@google.com>	2015-08-07 17:55:26 +00:00
Russ Cox	d3ffc975f3	runtime: set invalidptr=1 by default, as documented Also make invalidptr control the recently added GC pointer check, as documented. Change-Id: Iccfdf49480219d12be8b33b8f03d8312d8ceabed Reviewed-on: https://go-review.googlesource.com/12857 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Rob Pike <r@golang.org>	2015-07-29 23:50:20 +00:00
Russ Cox	4addec3aaa	runtime: reenable bad pointer check in GC The last time we tried this, linux/arm64 broke. The series of CLs leading to this one fixes that problem. Let's try again. Fixes #9880. Change-Id: I67bc1d959175ec972d4dcbe4aa6f153790f74251 Reviewed-on: https://go-review.googlesource.com/12849 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-07-29 21:37:55 +00:00
Russ Cox	a93e5b4ff9	Revert "runtime: diagnose invalid pointers during GC" Broke arm64. Update #9880. This reverts commit `38d9b2a3a9`. Change-Id: I35fa21005af2183828a9d8b195ebcfbe45ec5138 Reviewed-on: https://go-review.googlesource.com/12247 Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-16 01:49:58 +00:00
Russ Cox	38d9b2a3a9	runtime: diagnose invalid pointers during GC For #9880. Let's see what breaks. Change-Id: Ic8b99a604e60177a448af5f7173595feed607875 Reviewed-on: https://go-review.googlesource.com/10818 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com>	2015-07-15 05:42:06 +00:00
Austin Clements	d231cb8249	runtime: repeat bitmap for slice of GCprog n-1 times, not n times Currently, to write out the bitmap of a slice of a type with a GCprog, we construct a new GCprog that executes the underlying type's GCprog to write out the bitmap once and then repeats those bits n more times. This results in n+1 repetitions of the bitmap, which is one more repetition than it should be. This corrupts the bitmap of the heap following the slice and may write past the mapped bitmap memory and segfault. Fix this by repeating the bitmap only n-1 more times. Fixes #11430. Change-Id: Ic24854363bffc5a755b66f257339f9309ada3aa5 Reviewed-on: https://go-review.googlesource.com/11570 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-06-26 21:52:51 +00:00
Austin Clements	a8ae93fd26	runtime: fix heap bitmap repeating with large scalar tails When heapBitsSetType repeats a source bitmap with a scalar tail (typ.ptrdata < typ.size), it lays out the tail upon reaching the end of the source bitmap by simply increasing the number of bits claimed to be in the incoming bit buffer. This causes later iterations to read the appropriate number of zeros out of the bit buffer before starting on the next repeat of the source bitmap. Currently, however, later iterations of the loop continue to read bits from the source bitmap regardless of the number of bits currently in the bit buffer. The bit buffer can only hold 32 or 64 bits, so if the scalar tail is large and the padding bits exceed the size of the bit buffer, the read from the source bitmap on the next iteration will shift the incoming bits into oblivion when it attempts to put them in the bit buffer. When the buffer does eventually shift down to where these bits were supposed to be, it will contain zeros. As a result, words that should be marked as pointers on later repetitions are marked as scalars, so the garbage collector does not trace them. If this is the only reference to an object, it will be incorrectly freed. Fix this by adding logic to drain the bit buffer down if it is large instead of reading more bits from the source bitmap. Fixes #11286. Change-Id: I964432c4b9f1cec334fc8c3da0ff16460203feb6 Reviewed-on: https://go-review.googlesource.com/11360 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-23 18:37:17 +00:00
Rick Hudson	1ab9176e54	runtime: remove race and increase precision in pointer validation. This CL removes the single and racy use of mheap.arena_end outside of the bookkeeping done in mHeap_init and mHeap_Alloc. There should be no way for heapBitsForSpan to see a pointer to an invalid span. This CL makes the check for this more precise by checking that the pointer is between mheap_.arena_start and mheap_.arena_used instead of mheap_.arena_end. Change-Id: I1200b54353ee1eda002d92645fd8d26048600ceb Reviewed-on: https://go-review.googlesource.com/11342 Reviewed-by: Austin Clements <austin@google.com>	2015-06-22 20:37:23 +00:00
Russ Cox	80ec711755	runtime: use type-based write barrier for remote stack write during chansend A send on an unbuffered channel to a blocked receiver is the only case in the runtime where one goroutine writes directly to the stack of another. The garbage collector assumes that if a goroutine is blocked, its stack contains no new pointers since the last time it ran. The send on an unbuffered channel violates this, so it needs an explicit write barrier. It has an explicit write barrier, but not one that can handle a write to another stack. Use one that can (based on type bitmap instead of heap bitmap). To make this work, raise the limit for type bitmaps so that they are used for all types up to 64 kB in size (256 bytes of bitmap). (The runtime already imposes a limit of 64 kB for a channel element size.) I have been unable to reproduce this problem in a simple test program. Could help #11035. Change-Id: I06ad994032d8cff3438c9b3eaa8d853915128af5 Reviewed-on: https://go-review.googlesource.com/10815 Reviewed-by: Austin Clements <austin@google.com>	2015-06-15 16:50:30 +00:00
Russ Cox	d57c889ae8	runtime: wait to update arena_used until after mapping bitmap This avoids a race with gcmarkwb_m that was leading to faults. Fixes #10212. Change-Id: I6fcf8d09f2692227063ce29152cb57366ea22487 Reviewed-on: https://go-review.googlesource.com/10816 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-06-11 18:15:21 +00:00
Ainar Garipov	7f9f70e5b6	all: fix misprints in comments These were found by grepping the comments from the go code and feeding the output to aspell. Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395 Reviewed-on: https://go-review.googlesource.com/10941 Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-06-11 14:18:57 +00:00
Russ Cox	433c0bc769	runtime: avoid fault in heapBitsBulkBarrier Change-Id: I0512e461de1f25cb2a1cb7f23e7a77d00700667c Reviewed-on: https://go-review.googlesource.com/10803 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-08 20:24:00 +00:00
Austin Clements	306f8f11ad	runtime: unwind stack barriers when writing above the current frame Stack barriers assume that writes through pointers to frames above the current frame will get write barriers, and hence these frames do not need to be re-scanned to pick up these changes. For normal writes, this is true. However, there are places in the runtime that use typedmemmove to potentially write through pointers to higher frames (such as mapassign1). Currently, typedmemmove does not execute write barriers if the destination is on the stack. If there's a stack barrier between the current frame and the frame being modified with typedmemmove, and the stack barrier is not otherwise hit, it's possible that the garbage collector will never see the updated pointer and incorrectly reclaim the object. Fix this by making heapBitsBulkBarrier (which lies behind typedmemmove and its variants) detect when the destination is in the stack and unwind stack barriers up to the point, forcing mark termination to later rescan the effected frame and collect these pointers. Fixes #11084. Might be related to #10240, #10541, #10941, #11023, #11027 and possibly others. Change-Id: I323d6cd0f1d29fa01f8fc946f4b90e04ef210efd Reviewed-on: https://go-review.googlesource.com/10791 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-07 17:57:47 +00:00
Russ Cox	001438bdfe	runtime: fix callwritebarrier Given a call frame F of size N where the return values start at offset R, callwritebarrier was instructing heapBitsBulkBarrier to scan the block of memory [F+R, F+R+N). It should only scan [F+R, F+N). The extra N-R bytes scanned might lead into the next allocated block in memory. Because the scan was consulting the heap bitmap for type information, scanning into the next block normally "just worked" in the sense of not crashing. Scanning the extra N-R bytes of memory is a problem mainly because it causes the GC to consider pointers that might otherwise not be considered, leading it to retain objects that should actually be freed. This is very difficult to detect. Luckily, juju turned up a case where the heap bitmap and the memory were out of sync for the block immediately after the call frame, so that heapBitsBulkBarrier saw an obvious non-pointer where it expected a pointer, causing a loud crash. Why is there a non-pointer in memory that the heap bitmap records as a pointer? That is more difficult to answer. At least one way that it could happen is that allocations containing no pointers at all do not update the heap bitmap. So if heapBitsBulkBarrier walked out of the current object and into a no-pointer object and consulted those bitmap bits, it would be misled. This doesn't happen in general because all the paths to heapBitsBulkBarrier first check for the no-pointer case. This may or may not be what happened, but it's the only scenario I've been able to construct. I tried for quite a while to write a simple test for this and could not. It does fix the juju crash, and it is clearly an improvement over the old code. Fixes #10844. Change-Id: I53982c93ef23ef93155c4086bbd95a4c4fdaac9a Reviewed-on: https://go-review.googlesource.com/10317 Reviewed-by: Austin Clements <austin@google.com>	2015-05-21 19:14:03 +00:00
Russ Cox	512f75e8df	runtime: replace GC programs with simpler encoding, faster decoder Small types record the location of pointers in their memory layout by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries, and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using a bitmap for a large type containing arrays does not make sense: if someone refers to the type [1<<28]*byte in a program in such a way that the type information makes it into the binary, it would be a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB (for 1-bit entries) bitmap full of 1s into the binary or even to keep one in memory during the execution of the program. For large types containing arrays, it is much more compact to describe the locations of pointers using a notation that can express repetition than to lay out a bitmap of pointers. Go 1.4 included such a notation, called ``GC programs'' but it was complex, required recursion during decoding, and was generally slow. Dmitriy measured the execution of these programs writing directly to the heap bitmap as being 7x slower than copying from a preunrolled 4-bit mask (and frankly that code was not terribly fast either). For some tests, unrollgcprog1 was seen costing as much as 3x more than the rest of malloc combined. This CL introduces a different form for the GC programs. They use a simple Lempel-Ziv-style encoding of the 1-bit pointer information, in which the only operations are (1) emit the following n bits and (2) repeat the last n bits c more times. This encoding can be generated directly from the Go type information (using repetition only for arrays or large runs of non-pointer data) and it can be decoded very efficiently. In particular the decoding requires little state and no recursion, so that the entire decoding can run without any memory accesses other than the reads of the encoding and the writes of the decoded form to the heap bitmap. For recursive types like arrays of arrays of arrays, the inner instructions are only executed once, not n times, so that large repetitions run at full speed. (In contrast, large repetitions in the old programs repeated the individual bit-level layout of the inner data over and over.) The result is as much as 25x faster decoding compared to the old form. Because the old decoder was so slow, Go 1.4 had three (or so) cases for how to set the heap bitmap bits for an allocation of a given type: (1) If the type had an even number of words up to 32 words, then the 4-bit pointer mask for the type fit in no more than 16 bytes; store the 4-bit pointer mask directly in the binary and copy from it. (1b) If the type had an odd number of words up to 15 words, then the 4-bit pointer mask for the type, doubled to end on a byte boundary, fit in no more than 16 bytes; store that doubled mask directly in the binary and copy from it. (2) If the type had an even number of words up to 128 words, or an odd number of words up to 63 words (again due to doubling), then the 4-bit pointer mask would fit in a 64-byte unrolled mask. Store a GC program in the binary, but leave space in the BSS for the unrolled mask. Execute the GC program to construct the mask the first time it is needed, and thereafter copy from the mask. (3) Otherwise, store a GC program and execute it to write directly to the heap bitmap each time an object of that type is allocated. (This is the case that was 7x slower than the other two.) Because the new pointer masks store 1-bit entries instead of 4-bit entries and because using the decoder no longer carries a significant overhead, after this CL (that is, for Go 1.5) there are only two cases: (1) If the type is 128 words or less (no condition about odd or even), store the 1-bit pointer mask directly in the binary and use it to initialize the heap bitmap during malloc. (Implemented in CL 9702.) (2) There is no case 2 anymore. (3) Otherwise, store a GC program and execute it to write directly to the heap bitmap each time an object of that type is allocated. Executing the GC program directly into the heap bitmap (case (3) above) was disabled for the Go 1.5 dev cycle, both to avoid needing to use GC programs for typedmemmove and to avoid updating that code as the heap bitmap format changed. Typedmemmove no longer uses this type information; as of CL 9886 it uses the heap bitmap directly. Now that the heap bitmap format is stable, we reintroduce GC programs and their space savings. Benchmarks for heapBitsSetType, before this CL vs this CL: name old mean new mean delta SetTypePtr 7.59ns × (0.99,1.02) 5.16ns × (1.00,1.00) -32.05% (p=0.000) SetTypePtr8 21.0ns × (0.98,1.05) 21.4ns × (1.00,1.00) ~ (p=0.179) SetTypePtr16 24.1ns × (0.99,1.01) 24.6ns × (1.00,1.00) +2.41% (p=0.001) SetTypePtr32 31.2ns × (0.99,1.01) 32.4ns × (0.99,1.02) +3.72% (p=0.001) SetTypePtr64 45.2ns × (1.00,1.00) 47.2ns × (1.00,1.00) +4.42% (p=0.000) SetTypePtr126 75.8ns × (0.99,1.01) 79.1ns × (1.00,1.00) +4.25% (p=0.000) SetTypePtr128 74.3ns × (0.99,1.01) 77.6ns × (1.00,1.01) +4.55% (p=0.000) SetTypePtrSlice 726ns × (1.00,1.01) 712ns × (1.00,1.00) -1.95% (p=0.001) SetTypeNode1 20.0ns × (0.99,1.01) 20.7ns × (1.00,1.00) +3.71% (p=0.000) SetTypeNode1Slice 112ns × (1.00,1.00) 113ns × (0.99,1.00) ~ (p=0.070) SetTypeNode8 23.9ns × (1.00,1.00) 24.7ns × (1.00,1.01) +3.18% (p=0.000) SetTypeNode8Slice 294ns × (0.99,1.02) 287ns × (0.99,1.01) -2.38% (p=0.015) SetTypeNode64 52.8ns × (0.99,1.03) 51.8ns × (0.99,1.01) ~ (p=0.069) SetTypeNode64Slice 1.13µs × (0.99,1.05) 1.14µs × (0.99,1.00) ~ (p=0.767) SetTypeNode64Dead 36.0ns × (1.00,1.01) 32.5ns × (0.99,1.00) -9.67% (p=0.000) SetTypeNode64DeadSlice 1.43µs × (0.99,1.01) 1.40µs × (1.00,1.00) -2.39% (p=0.001) SetTypeNode124 75.7ns × (1.00,1.01) 79.0ns × (1.00,1.00) +4.44% (p=0.000) SetTypeNode124Slice 1.94µs × (1.00,1.01) 2.04µs × (0.99,1.01) +4.98% (p=0.000) SetTypeNode126 75.4ns × (1.00,1.01) 77.7ns × (0.99,1.01) +3.11% (p=0.000) SetTypeNode126Slice 1.95µs × (0.99,1.01) 2.03µs × (1.00,1.00) +3.74% (p=0.000) SetTypeNode128 85.4ns × (0.99,1.01) 122.0ns × (1.00,1.00) +42.89% (p=0.000) SetTypeNode128Slice 2.20µs × (1.00,1.01) 2.36µs × (0.98,1.02) +7.48% (p=0.001) SetTypeNode130 83.3ns × (1.00,1.00) 123.0ns × (1.00,1.00) +47.61% (p=0.000) SetTypeNode130Slice 2.30µs × (0.99,1.01) 2.40µs × (0.98,1.01) +4.37% (p=0.000) SetTypeNode1024 498ns × (1.00,1.00) 537ns × (1.00,1.00) +7.96% (p=0.000) SetTypeNode1024Slice 15.5µs × (0.99,1.01) 17.8µs × (1.00,1.00) +15.27% (p=0.000) The above compares always using a cached pointer mask (and the corresponding waste of memory) against using the programs directly. Some slowdown is expected, in exchange for having a better general algorithm. The GC programs kick in for SetTypeNode128, SetTypeNode130, SetTypeNode1024, along with the slice variants of those. It is possible that the cutoff of 128 words (bits) should be raised in a followup CL, but even with this low cutoff the GC programs are faster than Go 1.4's "fast path" non-GC program case. Benchmarks for heapBitsSetType, Go 1.4 vs this CL: name old mean new mean delta SetTypePtr 6.89ns × (1.00,1.00) 5.17ns × (1.00,1.00) -25.02% (p=0.000) SetTypePtr8 25.8ns × (0.97,1.05) 21.5ns × (1.00,1.00) -16.70% (p=0.000) SetTypePtr16 39.8ns × (0.97,1.02) 24.7ns × (0.99,1.01) -37.81% (p=0.000) SetTypePtr32 68.8ns × (0.98,1.01) 32.2ns × (1.00,1.01) -53.18% (p=0.000) SetTypePtr64 130ns × (1.00,1.00) 47ns × (1.00,1.00) -63.67% (p=0.000) SetTypePtr126 241ns × (0.99,1.01) 79ns × (1.00,1.01) -67.25% (p=0.000) SetTypePtr128 2.07µs × (1.00,1.00) 0.08µs × (1.00,1.00) -96.27% (p=0.000) SetTypePtrSlice 1.05µs × (0.99,1.01) 0.72µs × (0.99,1.02) -31.70% (p=0.000) SetTypeNode1 16.0ns × (0.99,1.01) 20.8ns × (0.99,1.03) +29.91% (p=0.000) SetTypeNode1Slice 184ns × (0.99,1.01) 112ns × (0.99,1.01) -39.26% (p=0.000) SetTypeNode8 29.5ns × (0.97,1.02) 24.6ns × (1.00,1.00) -16.50% (p=0.000) SetTypeNode8Slice 624ns × (0.98,1.02) 285ns × (1.00,1.00) -54.31% (p=0.000) SetTypeNode64 135ns × (0.96,1.08) 52ns × (0.99,1.02) -61.32% (p=0.000) SetTypeNode64Slice 3.83µs × (1.00,1.00) 1.14µs × (0.99,1.01) -70.16% (p=0.000) SetTypeNode64Dead 134ns × (0.99,1.01) 32ns × (1.00,1.01) -75.74% (p=0.000) SetTypeNode64DeadSlice 3.83µs × (0.99,1.00) 1.40µs × (1.00,1.01) -63.42% (p=0.000) SetTypeNode124 240ns × (0.99,1.01) 79ns × (1.00,1.01) -67.05% (p=0.000) SetTypeNode124Slice 7.27µs × (1.00,1.00) 2.04µs × (1.00,1.00) -71.95% (p=0.000) SetTypeNode126 2.06µs × (0.99,1.01) 0.08µs × (0.99,1.01) -96.23% (p=0.000) SetTypeNode126Slice 64.4µs × (1.00,1.00) 2.0µs × (1.00,1.00) -96.85% (p=0.000) SetTypeNode128 2.09µs × (1.00,1.01) 0.12µs × (1.00,1.00) -94.15% (p=0.000) SetTypeNode128Slice 65.4µs × (1.00,1.00) 2.4µs × (0.99,1.03) -96.39% (p=0.000) SetTypeNode130 2.11µs × (1.00,1.00) 0.12µs × (1.00,1.00) -94.18% (p=0.000) SetTypeNode130Slice 66.3µs × (1.00,1.00) 2.4µs × (0.97,1.08) -96.34% (p=0.000) SetTypeNode1024 16.0µs × (1.00,1.01) 0.5µs × (1.00,1.00) -96.65% (p=0.000) SetTypeNode1024Slice 512µs × (1.00,1.00) 18µs × (0.98,1.04) -96.45% (p=0.000) SetTypeNode124 uses a 124 data + 2 ptr = 126-word allocation. Both Go 1.4 and this CL are using pointer bitmaps for this case, so that's an overall 3x speedup for using pointer bitmaps. SetTypeNode128 uses a 128 data + 2 ptr = 130-word allocation. Both Go 1.4 and this CL are running the GC program for this case, so that's an overall 17x speedup when using GC programs (and I've seen >20x on other systems). Comparing Go 1.4's SetTypeNode124 (pointer bitmap) against this CL's SetTypeNode128 (GC program), the slow path in the code in this CL is 2x faster than the fast path in Go 1.4. The Go 1 benchmarks are basically unaffected compared to just before this CL. Go 1 benchmarks, before this CL vs this CL: name old mean new mean delta BinaryTree17 5.87s × (0.97,1.04) 5.91s × (0.96,1.04) ~ (p=0.306) Fannkuch11 4.38s × (1.00,1.00) 4.37s × (1.00,1.01) -0.22% (p=0.006) FmtFprintfEmpty 90.7ns × (0.97,1.10) 89.3ns × (0.96,1.09) ~ (p=0.280) FmtFprintfString 282ns × (0.98,1.04) 287ns × (0.98,1.07) +1.72% (p=0.039) FmtFprintfInt 269ns × (0.99,1.03) 282ns × (0.97,1.04) +4.87% (p=0.000) FmtFprintfIntInt 478ns × (0.99,1.02) 481ns × (0.99,1.02) +0.61% (p=0.048) FmtFprintfPrefixedInt 399ns × (0.98,1.03) 400ns × (0.98,1.05) ~ (p=0.533) FmtFprintfFloat 563ns × (0.99,1.01) 570ns × (1.00,1.01) +1.37% (p=0.000) FmtManyArgs 1.89µs × (0.99,1.01) 1.92µs × (0.99,1.02) +1.88% (p=0.000) GobDecode 15.2ms × (0.99,1.01) 15.2ms × (0.98,1.05) ~ (p=0.609) GobEncode 11.6ms × (0.98,1.03) 11.9ms × (0.98,1.04) +2.17% (p=0.000) Gzip 648ms × (0.99,1.01) 648ms × (1.00,1.01) ~ (p=0.835) Gunzip 142ms × (1.00,1.00) 143ms × (1.00,1.01) ~ (p=0.169) HTTPClientServer 90.5µs × (0.98,1.03) 91.5µs × (0.98,1.04) +1.04% (p=0.045) JSONEncode 31.5ms × (0.98,1.03) 31.4ms × (0.98,1.03) ~ (p=0.549) JSONDecode 111ms × (0.99,1.01) 107ms × (0.99,1.01) -3.21% (p=0.000) Mandelbrot200 6.01ms × (1.00,1.00) 6.01ms × (1.00,1.00) ~ (p=0.878) GoParse 6.54ms × (0.99,1.02) 6.61ms × (0.99,1.03) +1.08% (p=0.004) RegexpMatchEasy0_32 160ns × (1.00,1.01) 161ns × (1.00,1.00) +0.40% (p=0.000) RegexpMatchEasy0_1K 560ns × (0.99,1.01) 559ns × (0.99,1.01) ~ (p=0.088) RegexpMatchEasy1_32 138ns × (0.99,1.01) 138ns × (1.00,1.00) ~ (p=0.380) RegexpMatchEasy1_1K 877ns × (1.00,1.00) 878ns × (1.00,1.00) ~ (p=0.157) RegexpMatchMedium_32 251ns × (0.99,1.00) 251ns × (1.00,1.01) +0.28% (p=0.021) RegexpMatchMedium_1K 72.6µs × (1.00,1.00) 72.6µs × (1.00,1.00) ~ (p=0.539) RegexpMatchHard_32 3.84µs × (1.00,1.00) 3.84µs × (1.00,1.00) ~ (p=0.378) RegexpMatchHard_1K 117µs × (1.00,1.00) 117µs × (1.00,1.00) ~ (p=0.067) Revcomp 904ms × (0.99,1.02) 904ms × (0.99,1.01) ~ (p=0.943) Template 125ms × (0.99,1.02) 127ms × (0.99,1.01) +1.79% (p=0.000) TimeParse 627ns × (0.99,1.01) 622ns × (0.99,1.01) -0.88% (p=0.000) TimeFormat 655ns × (0.99,1.02) 655ns × (0.99,1.02) ~ (p=0.976) For the record, Go 1 benchmarks, Go 1.4 vs this CL: name old mean new mean delta BinaryTree17 4.61s × (0.97,1.05) 5.91s × (0.98,1.03) +28.35% (p=0.000) Fannkuch11 4.40s × (0.99,1.03) 4.41s × (0.99,1.01) ~ (p=0.212) FmtFprintfEmpty 102ns × (0.99,1.01) 84ns × (0.99,1.02) -18.38% (p=0.000) FmtFprintfString 302ns × (0.98,1.01) 303ns × (0.99,1.02) ~ (p=0.203) FmtFprintfInt 313ns × (0.97,1.05) 270ns × (0.99,1.01) -13.69% (p=0.000) FmtFprintfIntInt 524ns × (0.98,1.02) 477ns × (0.99,1.00) -8.87% (p=0.000) FmtFprintfPrefixedInt 424ns × (0.98,1.02) 386ns × (0.99,1.01) -8.96% (p=0.000) FmtFprintfFloat 652ns × (0.98,1.02) 594ns × (0.97,1.05) -8.97% (p=0.000) FmtManyArgs 2.13µs × (0.99,1.02) 1.94µs × (0.99,1.01) -8.92% (p=0.000) GobDecode 17.1ms × (0.99,1.02) 14.9ms × (0.98,1.03) -13.07% (p=0.000) GobEncode 13.5ms × (0.98,1.03) 11.5ms × (0.98,1.03) -15.25% (p=0.000) Gzip 656ms × (0.99,1.02) 647ms × (0.99,1.01) -1.29% (p=0.000) Gunzip 143ms × (0.99,1.02) 144ms × (0.99,1.01) ~ (p=0.204) HTTPClientServer 88.2µs × (0.98,1.02) 90.8µs × (0.98,1.01) +2.93% (p=0.000) JSONEncode 32.2ms × (0.98,1.02) 30.9ms × (0.97,1.04) -4.06% (p=0.001) JSONDecode 121ms × (0.98,1.02) 110ms × (0.98,1.05) -8.95% (p=0.000) Mandelbrot200 6.06ms × (0.99,1.01) 6.11ms × (0.98,1.04) ~ (p=0.184) GoParse 6.76ms × (0.97,1.04) 6.58ms × (0.98,1.05) -2.63% (p=0.003) RegexpMatchEasy0_32 195ns × (1.00,1.01) 155ns × (0.99,1.01) -20.43% (p=0.000) RegexpMatchEasy0_1K 479ns × (0.98,1.03) 535ns × (0.99,1.02) +11.59% (p=0.000) RegexpMatchEasy1_32 169ns × (0.99,1.02) 131ns × (0.99,1.03) -22.44% (p=0.000) RegexpMatchEasy1_1K 1.53µs × (0.99,1.01) 0.87µs × (0.99,1.02) -43.07% (p=0.000) RegexpMatchMedium_32 334ns × (0.99,1.01) 242ns × (0.99,1.01) -27.53% (p=0.000) RegexpMatchMedium_1K 125µs × (1.00,1.01) 72µs × (0.99,1.03) -42.53% (p=0.000) RegexpMatchHard_32 6.03µs × (0.99,1.01) 3.79µs × (0.99,1.01) -37.12% (p=0.000) RegexpMatchHard_1K 189µs × (0.99,1.02) 115µs × (0.99,1.01) -39.20% (p=0.000) Revcomp 935ms × (0.96,1.03) 926ms × (0.98,1.02) ~ (p=0.083) Template 146ms × (0.97,1.05) 119ms × (0.99,1.01) -18.37% (p=0.000) TimeParse 660ns × (0.99,1.01) 624ns × (0.99,1.02) -5.43% (p=0.000) TimeFormat 670ns × (0.98,1.02) 710ns × (1.00,1.01) +5.97% (p=0.000) This CL is a bit larger than I would like, but the compiler, linker, runtime, and package reflect all need to be in sync about the format of these programs, so there is no easy way to split this into independent changes (at least while keeping the build working at each change). Fixes #9625. Fixes #10524. Change-Id: I9e3e20d6097099d0f8532d1cb5b1af528804989a Reviewed-on: https://go-review.googlesource.com/9888 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Russ Cox <rsc@golang.org>	2015-05-16 00:38:17 +00:00
Russ Cox	c3c047a6a3	runtime: test and fix heap bitmap for 1-pointer allocation on 32-bit system Change-Id: Ic064fe7c6bd3304dcc8c3f7b3b5393870b5387c2 Reviewed-on: https://go-review.googlesource.com/10119 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Austin Clements <austin@google.com>	2015-05-15 18:47:00 +00:00
Russ Cox	65c4d7beab	runtime: optimize heapBitsBulkBarrier a tiny amount This may be mostly noise but: name old mean new mean delta BinaryTree17 6.03s × (0.98,1.02) 5.98s × (0.97,1.03) ~ (p=0.306) Fannkuch11 4.42s × (0.99,1.01) 4.34s × (0.99,1.02) -1.83% (p=0.000) FmtFprintfEmpty 84.7ns × (0.99,1.01) 84.4ns × (1.00,1.00) ~ (p=0.138) FmtFprintfString 289ns × (0.98,1.02) 289ns × (1.00,1.01) ~ (p=0.509) FmtFprintfInt 280ns × (0.97,1.03) 272ns × (0.98,1.03) -2.64% (p=0.003) FmtFprintfIntInt 484ns × (0.98,1.02) 482ns × (0.98,1.03) ~ (p=0.606) FmtFprintfPrefixedInt 397ns × (0.98,1.03) 393ns × (0.99,1.02) ~ (p=0.064) FmtFprintfFloat 573ns × (0.99,1.01) 569ns × (0.99,1.01) -0.69% (p=0.023) FmtManyArgs 1.89µs × (0.99,1.02) 1.91µs × (0.98,1.02) ~ (p=0.219) GobDecode 15.4ms × (0.99,1.02) 15.1ms × (0.99,1.01) -2.05% (p=0.000) GobEncode 12.0ms × (0.97,1.04) 11.9ms × (0.97,1.03) ~ (p=0.458) Gzip 652ms × (0.99,1.01) 653ms × (0.99,1.01) ~ (p=0.743) Gunzip 144ms × (0.99,1.01) 143ms × (0.99,1.01) ~ (p=0.134) HTTPClientServer 91.6µs × (0.99,1.01) 91.8µs × (0.99,1.03) ~ (p=0.678) JSONEncode 31.9ms × (1.00,1.00) 32.0ms × (0.99,1.01) ~ (p=0.334) JSONDecode 110ms × (0.99,1.01) 110ms × (0.99,1.01) ~ (p=0.315) Mandelbrot200 6.04ms × (0.99,1.01) 6.04ms × (1.00,1.01) ~ (p=0.596) GoParse 6.72ms × (0.98,1.03) 6.74ms × (0.99,1.03) ~ (p=0.577) RegexpMatchEasy0_32 161ns × (0.99,1.01) 160ns × (1.00,1.00) -0.83% (p=0.002) RegexpMatchEasy0_1K 542ns × (0.99,1.02) 541ns × (0.99,1.01) ~ (p=0.396) RegexpMatchEasy1_32 140ns × (0.98,1.01) 137ns × (1.00,1.00) -2.12% (p=0.000) RegexpMatchEasy1_1K 892ns × (0.99,1.01) 891ns × (1.00,1.01) ~ (p=0.631) RegexpMatchMedium_32 255ns × (0.99,1.01) 253ns × (0.99,1.01) -0.76% (p=0.008) RegexpMatchMedium_1K 73.1µs × (1.00,1.01) 72.9µs × (1.00,1.00) ~ (p=0.229) RegexpMatchHard_32 3.86µs × (1.00,1.01) 3.85µs × (1.00,1.00) ~ (p=0.341) RegexpMatchHard_1K 117µs × (1.00,1.01) 117µs × (0.99,1.00) ~ (p=0.955) Revcomp 954ms × (0.97,1.03) 955ms × (0.98,1.02) ~ (p=0.894) Template 133ms × (0.97,1.05) 129ms × (0.99,1.02) -2.50% (p=0.014) TimeParse 629ns × (0.99,1.01) 626ns × (0.99,1.01) ~ (p=0.106) TimeFormat 663ns × (0.99,1.01) 660ns × (0.99,1.02) ~ (p=0.231) Change-Id: I580e03ed01b0629cb5eae4c4637618f20127f924 Reviewed-on: https://go-review.googlesource.com/9994 Reviewed-by: Austin Clements <austin@google.com>	2015-05-15 13:52:00 +00:00
Russ Cox	ecfe42cab0	runtime: keep pointer bits set always in 1-word spans It's dumb to clear them in initSpan, set them in heapBitsSetType, clear them in heapBitsSweepSpan, set them again in heapBitsSetType, clear them again in heapBitsSweepSpan, and so on. Set them in initSpan and be done with it (until the span is reused for objects of a different size). This avoids an atomic operation in a common case (one-word allocation). Suggested by rlh. name old mean new mean delta BinaryTree17 5.87s × (0.97,1.03) 5.93s × (0.98,1.04) ~ (p=0.056) Fannkuch11 4.34s × (1.00,1.01) 4.41s × (1.00,1.00) +1.42% (p=0.000) FmtFprintfEmpty 86.1ns × (0.98,1.03) 88.9ns × (0.95,1.14) ~ (p=0.066) FmtFprintfString 292ns × (0.97,1.04) 284ns × (0.98,1.03) -2.64% (p=0.000) FmtFprintfInt 271ns × (0.98,1.06) 274ns × (0.98,1.05) ~ (p=0.148) FmtFprintfIntInt 478ns × (0.98,1.05) 487ns × (0.98,1.03) +1.85% (p=0.004) FmtFprintfPrefixedInt 397ns × (0.98,1.05) 394ns × (0.98,1.02) ~ (p=0.184) FmtFprintfFloat 553ns × (0.99,1.02) 543ns × (0.99,1.01) -1.71% (p=0.000) FmtManyArgs 1.90µs × (0.98,1.05) 1.88µs × (0.99,1.01) -0.97% (p=0.037) GobDecode 15.1ms × (0.99,1.01) 15.3ms × (0.99,1.01) +0.78% (p=0.001) GobEncode 11.7ms × (0.98,1.05) 11.6ms × (0.99,1.02) -1.39% (p=0.009) Gzip 646ms × (1.00,1.01) 647ms × (1.00,1.01) ~ (p=0.120) Gunzip 142ms × (1.00,1.00) 142ms × (1.00,1.00) ~ (p=0.068) HTTPClientServer 89.7µs × (0.99,1.01) 90.1µs × (0.98,1.03) ~ (p=0.224) JSONEncode 31.3ms × (0.99,1.01) 31.2ms × (0.99,1.02) ~ (p=0.149) JSONDecode 113ms × (0.99,1.01) 111ms × (0.99,1.01) -1.25% (p=0.000) Mandelbrot200 6.01ms × (1.00,1.00) 6.01ms × (1.00,1.00) +0.09% (p=0.015) GoParse 6.63ms × (0.98,1.03) 6.55ms × (0.99,1.02) -1.10% (p=0.006) RegexpMatchEasy0_32 161ns × (1.00,1.00) 161ns × (1.00,1.00) (sample has zero variance) RegexpMatchEasy0_1K 539ns × (0.99,1.01) 563ns × (0.99,1.01) +4.51% (p=0.000) RegexpMatchEasy1_32 140ns × (0.99,1.01) 141ns × (0.99,1.01) +1.34% (p=0.000) RegexpMatchEasy1_1K 886ns × (1.00,1.01) 888ns × (1.00,1.00) +0.20% (p=0.003) RegexpMatchMedium_32 252ns × (1.00,1.02) 255ns × (0.99,1.01) +1.32% (p=0.000) RegexpMatchMedium_1K 72.7µs × (1.00,1.00) 72.6µs × (1.00,1.00) ~ (p=0.296) RegexpMatchHard_32 3.84µs × (1.00,1.01) 3.84µs × (1.00,1.00) ~ (p=0.339) RegexpMatchHard_1K 117µs × (1.00,1.01) 117µs × (1.00,1.00) -0.28% (p=0.022) Revcomp 914ms × (0.99,1.01) 909ms × (0.99,1.01) -0.49% (p=0.031) Template 128ms × (0.99,1.01) 127ms × (0.99,1.01) -1.10% (p=0.000) TimeParse 628ns × (0.99,1.01) 639ns × (0.99,1.01) +1.69% (p=0.000) TimeFormat 660ns × (0.99,1.01) 662ns × (0.99,1.02) ~ (p=0.287) Change-Id: I3127b0ab89708267c74aa7d0eae1db1a1bcdfda5 Reviewed-on: https://go-review.googlesource.com/9884 Reviewed-by: Austin Clements <austin@google.com>	2015-05-14 15:58:54 +00:00
Russ Cox	94934f843e	runtime: rewrite addb/subtractb to be simpler to compile; introduce add1, subtract1 This reduces the depth of the inlining at a particular call site. The inliner introduces many temporary variables, and the compiler can do a better job with fewer. Being verbose in the bodies of these helper functions seems like a reasonable tradeoff: the uses are still just as readable, and they run faster in some important cases. Change-Id: I5323976ed3704d0acd18fb31176cfbf5ba23a89c Reviewed-on: https://go-review.googlesource.com/9883 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-05-14 15:55:42 +00:00
Russ Cox	5b3739357a	runtime: skip atomics in heapBitsSetType when GC is not running Suggested by Rick during code review of this code, but separated out for easier diagnosis in case it causes problems (and also easier rollback). name old mean new mean delta SetTypePtr 13.9ns × (0.98,1.05) 6.2ns × (0.99,1.01) -55.18% (p=0.000) SetTypePtr8 15.5ns × (0.95,1.10) 15.5ns × (0.99,1.05) ~ (p=0.952) SetTypePtr16 17.8ns × (0.99,1.05) 18.0ns × (1.00,1.00) ~ (p=0.157) SetTypePtr32 25.2ns × (0.99,1.01) 24.3ns × (0.99,1.01) -3.86% (p=0.000) SetTypePtr64 42.2ns × (0.93,1.13) 40.8ns × (0.99,1.01) ~ (p=0.239) SetTypePtr126 67.3ns × (1.00,1.00) 67.5ns × (0.99,1.02) ~ (p=0.365) SetTypePtr128 67.6ns × (1.00,1.01) 70.1ns × (0.97,1.10) ~ (p=0.063) SetTypePtrSlice 575ns × (0.98,1.06) 543ns × (0.95,1.17) -5.54% (p=0.034) SetTypeNode1 12.4ns × (0.98,1.09) 12.8ns × (0.99,1.01) +3.40% (p=0.021) SetTypeNode1Slice 97.1ns × (0.97,1.09) 89.5ns × (1.00,1.00) -7.78% (p=0.000) SetTypeNode8 29.8ns × (1.00,1.01) 17.7ns × (1.00,1.01) -40.74% (p=0.000) SetTypeNode8Slice 204ns × (0.99,1.04) 190ns × (0.97,1.06) -6.96% (p=0.000) SetTypeNode64 42.8ns × (0.99,1.01) 44.0ns × (0.95,1.12) ~ (p=0.163) SetTypeNode64Slice 1.00µs × (0.95,1.09) 0.98µs × (0.96,1.08) ~ (p=0.356) SetTypeNode64Dead 12.2ns × (0.99,1.04) 12.7ns × (1.00,1.01) +4.34% (p=0.000) SetTypeNode64DeadSlice 1.14µs × (0.94,1.11) 0.99µs × (0.99,1.03) -13.74% (p=0.000) SetTypeNode124 67.9ns × (0.99,1.03) 70.4ns × (0.95,1.15) ~ (p=0.115) SetTypeNode124Slice 1.76µs × (0.99,1.04) 1.88µs × (0.91,1.23) ~ (p=0.096) SetTypeNode126 67.7ns × (1.00,1.01) 68.2ns × (0.99,1.02) +0.72% (p=0.014) SetTypeNode126Slice 1.76µs × (1.00,1.01) 1.87µs × (0.93,1.15) +6.15% (p=0.035) SetTypeNode1024 462ns × (0.96,1.10) 451ns × (0.99,1.05) ~ (p=0.224) SetTypeNode1024Slice 14.4µs × (0.95,1.15) 14.2µs × (0.97,1.19) ~ (p=0.676) name old mean new mean delta BinaryTree17 5.87s × (0.98,1.04) 5.87s × (0.98,1.03) ~ (p=0.993) Fannkuch11 4.39s × (0.99,1.01) 4.34s × (1.00,1.01) -1.22% (p=0.000) FmtFprintfEmpty 90.6ns × (0.97,1.06) 89.4ns × (0.97,1.03) ~ (p=0.070) FmtFprintfString 305ns × (0.98,1.02) 296ns × (0.99,1.02) -2.94% (p=0.000) FmtFprintfInt 276ns × (0.97,1.04) 270ns × (0.98,1.03) -2.17% (p=0.001) FmtFprintfIntInt 490ns × (0.97,1.05) 473ns × (0.99,1.02) -3.59% (p=0.000) FmtFprintfPrefixedInt 402ns × (0.99,1.02) 397ns × (0.99,1.01) -1.15% (p=0.000) FmtFprintfFloat 577ns × (0.99,1.01) 549ns × (0.99,1.01) -4.78% (p=0.000) FmtManyArgs 1.89µs × (0.99,1.02) 1.87µs × (0.99,1.01) -1.43% (p=0.000) GobDecode 15.2ms × (0.99,1.01) 14.7ms × (0.99,1.02) -3.55% (p=0.000) GobEncode 11.7ms × (0.98,1.04) 11.5ms × (0.99,1.02) -1.63% (p=0.002) Gzip 647ms × (0.99,1.01) 647ms × (1.00,1.01) ~ (p=0.486) Gunzip 142ms × (1.00,1.00) 143ms × (1.00,1.00) ~ (p=0.234) HTTPClientServer 90.7µs × (0.99,1.01) 90.4µs × (0.98,1.04) ~ (p=0.331) JSONEncode 31.9ms × (0.97,1.06) 31.6ms × (0.98,1.02) ~ (p=0.206) JSONDecode 110ms × (0.99,1.01) 112ms × (0.99,1.02) +1.48% (p=0.000) Mandelbrot200 6.00ms × (1.00,1.00) 6.01ms × (1.00,1.00) ~ (p=0.058) GoParse 6.63ms × (0.98,1.03) 6.61ms × (0.98,1.02) ~ (p=0.353) RegexpMatchEasy0_32 162ns × (0.99,1.01) 161ns × (1.00,1.00) -0.33% (p=0.004) RegexpMatchEasy0_1K 539ns × (0.99,1.01) 540ns × (0.99,1.02) ~ (p=0.222) RegexpMatchEasy1_32 139ns × (0.99,1.01) 140ns × (0.97,1.03) ~ (p=0.054) RegexpMatchEasy1_1K 886ns × (1.00,1.00) 887ns × (1.00,1.00) +0.18% (p=0.001) RegexpMatchMedium_32 252ns × (1.00,1.01) 252ns × (1.00,1.00) +0.21% (p=0.010) RegexpMatchMedium_1K 72.7µs × (1.00,1.01) 72.6µs × (1.00,1.00) ~ (p=0.060) RegexpMatchHard_32 3.84µs × (1.00,1.00) 3.84µs × (1.00,1.00) ~ (p=0.065) RegexpMatchHard_1K 117µs × (1.00,1.00) 117µs × (1.00,1.00) -0.27% (p=0.000) Revcomp 916ms × (0.98,1.04) 909ms × (0.99,1.01) ~ (p=0.054) Template 126ms × (0.99,1.01) 128ms × (0.99,1.02) +1.43% (p=0.000) TimeParse 632ns × (0.99,1.01) 625ns × (1.00,1.01) -1.05% (p=0.000) TimeFormat 655ns × (0.99,1.02) 669ns × (0.99,1.02) +2.01% (p=0.000) Change-Id: I9477b7c9489c6fa98e860c190ce06cd73c53c6a1 Reviewed-on: https://go-review.googlesource.com/9829 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-05-14 15:54:53 +00:00
Russ Cox	4212a3c3d9	runtime: use heap bitmap for typedmemmove The current implementation of typedmemmove walks the ptrmask in the type to find out where pointers are. This led to turning off GC programs for the Go 1.5 dev cycle, so that there would always be a ptrmask. Instead of also interpreting the GC programs, interpret the heap bitmap, which we know must be available and up to date. (There is no point to write barriers when writing outside the heap.) This CL is only about correctness. The next CL will optimize the code. Change-Id: Id1305c7c071fd2734ab96634b0e1c745b23fa793 Reviewed-on: https://go-review.googlesource.com/9886 Reviewed-by: Austin Clements <austin@google.com>	2015-05-11 16:38:21 +00:00
Russ Cox	266a842f55	runtime: zero entire bitmap for object, even past dead marker We want typedmemmove to use the heap bitmap to determine where pointers are, instead of reinterpreting the type information. The heap bitmap is simpler to access. In general, typedmemmove will need to be able to look up the bits for any word and find valid pointer information, so fill even after the dead marker. Not filling after the dead marker was an optimization I introduced only a few days ago, when reintroducing the dead marker code. At the time I said it probably wouldn't last, and it didn't. Change-Id: I6ba01bff17ddee1ff429f454abe29867ec60606e Reviewed-on: https://go-review.googlesource.com/9885 Reviewed-by: Austin Clements <austin@google.com>	2015-05-11 16:37:46 +00:00
Russ Cox	e375ca2a25	runtime: reorder bits in heap bitmap bytes The runtime deals with 1-bit pointer bitmaps and 2-bit heap bitmaps that have entries for both pointers and mark bits. Each byte in a 1-bit pointer bitmap looks like pppppppp (all pointer bits). Each byte in a 2-bit heap bitmap looks like mpmpmpmp (mark, pointer, ...). This means that when converting from 1-bit to 2-bit, as we do during malloc, we have to pick up 4 bits in pppp form and use shifts to create the mpmpmpmp form. This CL changes the 2-bit heap bitmap form to mmmmpppp, so that 4 bits picked up in 1-bit form can be used directly in the low bits of the heap bitmap byte, without expansion. This simplifies the code, and it also happens to be faster. name old mean new mean delta SetTypePtr 14.0ns × (0.98,1.09) 14.0ns × (0.98,1.08) ~ (p=0.966) SetTypePtr8 16.5ns × (0.99,1.05) 15.3ns × (0.96,1.16) -6.86% (p=0.012) SetTypePtr16 21.3ns × (0.98,1.05) 18.8ns × (0.94,1.14) -11.49% (p=0.000) SetTypePtr32 34.6ns × (0.93,1.22) 27.7ns × (0.91,1.26) -20.08% (p=0.001) SetTypePtr64 55.7ns × (0.97,1.11) 41.6ns × (0.98,1.04) -25.30% (p=0.000) SetTypePtr126 98.0ns × (1.00,1.00) 67.7ns × (0.99,1.05) -30.88% (p=0.000) SetTypePtr128 98.6ns × (1.00,1.01) 68.6ns × (0.99,1.03) -30.44% (p=0.000) SetTypePtrSlice 781ns × (0.99,1.01) 571ns × (0.99,1.04) -26.93% (p=0.000) SetTypeNode1 13.1ns × (0.99,1.01) 12.1ns × (0.99,1.01) -7.45% (p=0.000) SetTypeNode1Slice 113ns × (0.99,1.01) 94ns × (1.00,1.00) -16.35% (p=0.000) SetTypeNode8 32.7ns × (1.00,1.00) 29.8ns × (0.99,1.01) -8.97% (p=0.000) SetTypeNode8Slice 266ns × (1.00,1.00) 204ns × (1.00,1.00) -23.40% (p=0.000) SetTypeNode64 58.0ns × (0.98,1.08) 42.8ns × (1.00,1.01) -26.24% (p=0.000) SetTypeNode64Slice 1.55µs × (0.99,1.02) 0.96µs × (1.00,1.00) -37.84% (p=0.000) SetTypeNode64Dead 13.1ns × (0.99,1.01) 12.1ns × (1.00,1.00) -7.33% (p=0.000) SetTypeNode64DeadSlice 1.52µs × (1.00,1.01) 1.08µs × (1.00,1.01) -28.95% (p=0.000) SetTypeNode124 97.9ns × (1.00,1.00) 67.1ns × (1.00,1.01) -31.49% (p=0.000) SetTypeNode124Slice 2.87µs × (0.99,1.02) 1.75µs × (1.00,1.01) -39.15% (p=0.000) SetTypeNode126 98.4ns × (1.00,1.01) 68.1ns × (1.00,1.01) -30.79% (p=0.000) SetTypeNode126Slice 2.91µs × (0.99,1.01) 1.77µs × (0.99,1.01) -39.09% (p=0.000) SetTypeNode1024 732ns × (1.00,1.00) 511ns × (0.87,1.42) -30.14% (p=0.000) SetTypeNode1024Slice 23.1µs × (1.00,1.00) 13.9µs × (0.99,1.02) -39.83% (p=0.000) Change-Id: I12e3b850a4e6fa6c8146b8635ff728f3ef658819 Reviewed-on: https://go-review.googlesource.com/9828 Reviewed-by: Austin Clements <austin@google.com>	2015-05-11 16:37:36 +00:00
Russ Cox	54af9a3ba5	runtime: reintroduce ``dead'' space during GC scan Reintroduce an optimization discarded during the initial conversion from 4-bit heap bitmaps to 2-bit heap bitmaps: when we reach the place in the bitmap where there are no more pointers, mark that position for the GC so that it can avoid scanning past that place. During heapBitsSetType we can also avoid initializing heap bitmap beyond that location, which gives a bit of a win compared to Go 1.4. This particular optimization (not initializing the heap bitmap) may not last: we might change typedmemmove to use the heap bitmap, in which case it would all need to be initialized. The early stop in the GC scan will stay no matter what. Compared to Go 1.4 (github.com/rsc/go, branch go14bench): name old mean new mean delta SetTypeNode64 80.7ns × (1.00,1.01) 57.4ns × (1.00,1.01) -28.83% (p=0.000) SetTypeNode64Dead 80.5ns × (1.00,1.01) 13.1ns × (0.99,1.02) -83.77% (p=0.000) SetTypeNode64Slice 2.16µs × (1.00,1.01) 1.54µs × (1.00,1.01) -28.75% (p=0.000) SetTypeNode64DeadSlice 2.16µs × (1.00,1.01) 1.52µs × (1.00,1.00) -29.74% (p=0.000) Compared to previous CL: name old mean new mean delta SetTypeNode64 56.7ns × (1.00,1.00) 57.4ns × (1.00,1.01) +1.19% (p=0.000) SetTypeNode64Dead 57.2ns × (1.00,1.00) 13.1ns × (0.99,1.02) -77.15% (p=0.000) SetTypeNode64Slice 1.56µs × (1.00,1.01) 1.54µs × (1.00,1.01) -0.89% (p=0.000) SetTypeNode64DeadSlice 1.55µs × (1.00,1.01) 1.52µs × (1.00,1.00) -2.23% (p=0.000) This is the last CL in the sequence converting from the 4-bit heap to the 2-bit heap, with all the same optimizations reenabled. Compared to before that process began (compared to CL 9701 patch set 1): name old mean new mean delta BinaryTree17 5.87s × (0.94,1.09) 5.91s × (0.96,1.06) ~ (p=0.578) Fannkuch11 4.32s × (1.00,1.00) 4.32s × (1.00,1.00) ~ (p=0.474) FmtFprintfEmpty 89.1ns × (0.95,1.16) 89.0ns × (0.93,1.10) ~ (p=0.942) FmtFprintfString 283ns × (0.98,1.02) 298ns × (0.98,1.06) +5.33% (p=0.000) FmtFprintfInt 284ns × (0.98,1.04) 286ns × (0.98,1.03) ~ (p=0.208) FmtFprintfIntInt 486ns × (0.98,1.03) 498ns × (0.97,1.06) +2.48% (p=0.000) FmtFprintfPrefixedInt 400ns × (0.99,1.02) 408ns × (0.98,1.02) +2.23% (p=0.000) FmtFprintfFloat 566ns × (0.99,1.01) 587ns × (0.98,1.01) +3.69% (p=0.000) FmtManyArgs 1.91µs × (0.99,1.02) 1.94µs × (0.99,1.02) +1.81% (p=0.000) GobDecode 15.5ms × (0.98,1.05) 15.8ms × (0.98,1.03) +1.94% (p=0.002) GobEncode 11.9ms × (0.97,1.03) 12.0ms × (0.96,1.09) ~ (p=0.263) Gzip 648ms × (0.99,1.01) 648ms × (0.99,1.01) ~ (p=0.992) Gunzip 143ms × (1.00,1.00) 143ms × (1.00,1.01) ~ (p=0.585) HTTPClientServer 89.2µs × (0.99,1.02) 90.3µs × (0.98,1.01) +1.24% (p=0.000) JSONEncode 32.3ms × (0.97,1.06) 31.6ms × (0.99,1.01) -2.29% (p=0.000) JSONDecode 106ms × (0.99,1.01) 107ms × (1.00,1.01) +0.62% (p=0.000) Mandelbrot200 6.02ms × (1.00,1.00) 6.03ms × (1.00,1.01) ~ (p=0.250) GoParse 6.57ms × (0.97,1.06) 6.53ms × (0.99,1.03) ~ (p=0.243) RegexpMatchEasy0_32 162ns × (1.00,1.00) 161ns × (1.00,1.01) -0.80% (p=0.000) RegexpMatchEasy0_1K 561ns × (0.99,1.02) 541ns × (0.99,1.01) -3.67% (p=0.000) RegexpMatchEasy1_32 145ns × (0.95,1.04) 138ns × (1.00,1.00) -5.04% (p=0.000) RegexpMatchEasy1_1K 864ns × (0.99,1.04) 887ns × (0.99,1.01) +2.57% (p=0.000) RegexpMatchMedium_32 255ns × (0.99,1.04) 253ns × (0.99,1.01) -1.05% (p=0.012) RegexpMatchMedium_1K 73.9µs × (0.98,1.04) 72.8µs × (1.00,1.00) -1.51% (p=0.005) RegexpMatchHard_32 3.92µs × (0.98,1.04) 3.85µs × (1.00,1.01) -1.88% (p=0.002) RegexpMatchHard_1K 120µs × (0.98,1.04) 117µs × (1.00,1.01) -2.02% (p=0.001) Revcomp 936ms × (0.95,1.08) 922ms × (0.97,1.08) ~ (p=0.234) Template 130ms × (0.98,1.04) 126ms × (0.99,1.01) -2.99% (p=0.000) TimeParse 638ns × (0.98,1.05) 628ns × (0.99,1.01) -1.54% (p=0.004) TimeFormat 674ns × (0.99,1.01) 668ns × (0.99,1.01) -0.80% (p=0.001) The slowdown of the first few benchmarks seems to be due to the new atomic operations for certain small size allocations. But the larger benchmarks mostly improve, probably due to the decreased memory pressure from having half as much heap bitmap. CL 9706, which removes the (never used anymore) wbshadow mode, gets back what is lost in the early microbenchmarks. Change-Id: I37423a209e8ec2a2e92538b45cac5422a6acd32d Reviewed-on: https://go-review.googlesource.com/9705 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-05-11 14:51:40 +00:00
Russ Cox	feb8a3b616	runtime: optimize heapBitsSetType For the conversion of the heap bitmap from 4-bit to 2-bit fields, I replaced heapBitsSetType with the dumbest thing that could possibly work: two atomic operations (atomicand8+atomicor8) per 2-bit field. This CL replaces that code with a proper implementation that avoids the atomics whenever possible. Benchmarks vs base CL (before the conversion to 2-bit heap bitmap) and vs Go 1.4 below. Compared to Go 1.4, SetTypePtr (a 1-pointer allocation) is 10ns slower because a race against the concurrent GC requires the use of an atomicor8 that used to be an ordinary write. This slowdown was present even in the base CL. Compared to both Go 1.4 and base, SetTypeNode8 (a 10-word allocation) is 10ns slower because it too needs a new atomic, because with the denser representation, the byte on the end of the allocation is now shared with the object next to it; this was not true with the 4-bit representation. Excluding these two (fundamental) slowdowns due to the use of atomics, the new code is noticeably faster than both Go 1.4 and the base CL. The next CL will reintroduce the ``typeDead'' optimization. Stats are from 5 runs on a MacBookPro10,2 (late 2012 Core i5). Compared to base CL ( = new atomic) name old mean new mean delta SetTypePtr 14.1ns × (0.99,1.02) 14.7ns × (0.93,1.10) ~ (p=0.175) SetTypePtr8 18.4ns × (1.00,1.01) 18.6ns × (0.81,1.21) ~ (p=0.866) SetTypePtr16 28.7ns × (1.00,1.00) 22.4ns × (0.90,1.27) -21.88% (p=0.015) SetTypePtr32 52.3ns × (1.00,1.00) 33.8ns × (0.93,1.24) -35.37% (p=0.001) SetTypePtr64 79.2ns × (1.00,1.00) 55.1ns × (1.00,1.01) -30.43% (p=0.000) SetTypePtr126 118ns × (1.00,1.00) 100ns × (1.00,1.00) -15.97% (p=0.000) SetTypePtr128 130ns × (0.92,1.19) 98ns × (1.00,1.00) -24.36% (p=0.008) SetTypePtrSlice 726ns × (0.96,1.08) 760ns × (1.00,1.00) ~ (p=0.152) SetTypeNode1 14.1ns × (0.94,1.15) 12.0ns × (1.00,1.01) -14.60% (p=0.020) SetTypeNode1Slice 135ns × (0.96,1.07) 88ns × (1.00,1.00) -34.53% (p=0.000) SetTypeNode8 20.9ns × (1.00,1.01) 32.6ns × (1.00,1.00) +55.37% (p=0.000) SetTypeNode8Slice 414ns × (0.99,1.02) 244ns × (1.00,1.00) -41.09% (p=0.000) SetTypeNode64 80.0ns × (1.00,1.00) 57.4ns × (1.00,1.00) -28.23% (p=0.000) SetTypeNode64Slice 2.15µs × (1.00,1.01) 1.56µs × (1.00,1.00) -27.43% (p=0.000) SetTypeNode124 119ns × (0.99,1.00) 100ns × (1.00,1.00) -16.11% (p=0.000) SetTypeNode124Slice 3.40µs × (1.00,1.00) 2.93µs × (1.00,1.00) -13.80% (p=0.000) SetTypeNode126 120ns × (1.00,1.01) 98ns × (1.00,1.00) -18.19% (p=0.000) SetTypeNode126Slice 3.53µs × (0.98,1.08) 3.02µs × (1.00,1.00) -14.49% (p=0.002) SetTypeNode1024 726ns × (0.97,1.09) 740ns × (1.00,1.00) ~ (p=0.451) SetTypeNode1024Slice 24.9µs × (0.89,1.37) 23.1µs × (1.00,1.00) ~ (p=0.476) Compared to Go 1.4 ( = new atomic) name old mean new mean delta SetTypePtr 5.71ns × (0.89,1.19) 14.68ns × (0.93,1.10) +157.24% (p=0.000) SetTypePtr8 19.3ns × (0.96,1.10) 18.6ns × (0.81,1.21) ~ (p=0.638) SetTypePtr16 30.7ns × (0.99,1.03) 22.4ns × (0.90,1.27) -26.88% (p=0.005) SetTypePtr32 51.5ns × (1.00,1.00) 33.8ns × (0.93,1.24) -34.40% (p=0.001) SetTypePtr64 83.6ns × (0.94,1.12) 55.1ns × (1.00,1.01) -34.12% (p=0.001) SetTypePtr126 137ns × (0.87,1.26) 100ns × (1.00,1.00) -27.10% (p=0.028) SetTypePtrSlice 865ns × (0.80,1.23) 760ns × (1.00,1.00) ~ (p=0.243) SetTypeNode1 15.2ns × (0.88,1.12) 12.0ns × (1.00,1.01) -20.89% (p=0.014) SetTypeNode1Slice 156ns × (0.93,1.16) 88ns × (1.00,1.00) -43.57% (p=0.001) SetTypeNode8 23.8ns × (0.90,1.18) 32.6ns × (1.00,1.00) +36.76% (p=0.003) ** SetTypeNode8Slice 502ns × (0.92,1.10) 244ns × (1.00,1.00) -51.46% (p=0.000) SetTypeNode64 85.6ns × (0.94,1.11) 57.4ns × (1.00,1.00) -32.89% (p=0.001) SetTypeNode64Slice 2.36µs × (0.91,1.14) 1.56µs × (1.00,1.00) -33.96% (p=0.002) SetTypeNode124 130ns × (0.91,1.12) 100ns × (1.00,1.00) -23.49% (p=0.004) SetTypeNode124Slice 3.81µs × (0.90,1.22) 2.93µs × (1.00,1.00) -23.09% (p=0.025) There are fewer benchmarks vs Go 1.4 because unrolling directly into the heap bitmap is not yet implemented, so those would not be meaningful comparisons. These benchmarks were not present in Go 1.4 as distributed. The backport to Go 1.4 is in github.com/rsc/go's go14bench branch, commit 71d5ee5. Change-Id: I95ed05a22bf484b0fc9efad549279e766c98d2b6 Reviewed-on: https://go-review.googlesource.com/9704 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-05-11 14:51:20 +00:00
Russ Cox	0234dfd493	runtime: use 2-bit heap bitmap (in place of 4-bit) Previous CLs changed the representation of the non-heap type bitmaps to be 1-bit bitmaps (pointer or not). Before this CL, the heap bitmap stored a 2-bit type for each word and a mark bit and checkmark bit for the first word of the object. (There used to be additional per-word bits.) Reduce heap bitmap to 2-bit, with 1 dedicated to pointer or not, and the other used for mark, checkmark, and "keep scanning forward to find pointers in this object." See comments for details. This CL replaces heapBitsSetType with very slow but obviously correct code. A followup CL will optimize it. (Spoiler: the new code is faster than Go 1.4 was.) Change-Id: I999577a133f3cfecacebdec9cdc3573c235c7fb9 Reviewed-on: https://go-review.googlesource.com/9703 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-05-11 14:43:45 +00:00
Russ Cox	6d8a147bef	runtime: use 1-bit pointer bitmaps in type representation The type information in reflect.Type and the GC programs is now 1 bit per word, down from 2 bits. The in-memory unrolled type bitmap representation are now 1 bit per word, down from 4 bits. The conversion from the unrolled (now 1-bit) bitmap to the heap bitmap (still 4-bit) is not optimized. A followup CL will work on that, after the heap bitmap has been converted to 2-bit. The typeDead optimization, in which a special value denotes that there are no more pointers anywhere in the object, is lost in this CL. A followup CL will bring it back in the final form of heapBitsSetType. Change-Id: If61e67950c16a293b0b516a6fd9a1c755b6d5549 Reviewed-on: https://go-review.googlesource.com/9702 Reviewed-by: Austin Clements <austin@google.com>	2015-05-11 14:43:33 +00:00
Russ Cox	4d0f3a1c95	cmd/internal/gc, runtime: use 1-bit bitmap for stack frames, data, bss The bitmaps were 2 bits per pointer because we needed to distinguish scalar, pointer, multiword, and we used the leftover value to distinguish uninitialized from scalar, even though the garbage collector (GC) didn't care. Now that there are no multiword structures from the GC's point of view, cut the bitmaps down to 1 bit per pointer, recording just live pointer vs not. The GC assumes the same layout for stack frames and for the maps describing the global data and bss sections, so change them all in one CL. The code still refers to 4-bit heap bitmaps and 2-bit "type bitmaps", since the 2-bit representation lives (at least for now) in some of the reflect data. Because these stack frame bitmaps are stored directly in the rodata in the binary, this CL reduces the size of the 6g binary by about 1.1%. Performance change is basically a wash, but using less memory, and smaller binaries, and enables other bitmap reductions. name old mean new mean delta BenchmarkBinaryTree17 13.2s × (0.97,1.03) 13.0s × (0.99,1.01) -0.93% (p=0.005) BenchmarkBinaryTree17-2 9.69s × (0.96,1.05) 9.51s × (0.96,1.03) -1.86% (p=0.001) BenchmarkBinaryTree17-4 10.1s × (0.97,1.05) 10.0s × (0.96,1.05) ~ (p=0.141) BenchmarkFannkuch11 4.35s × (0.99,1.01) 4.43s × (0.98,1.04) +1.75% (p=0.001) BenchmarkFannkuch11-2 4.31s × (0.99,1.03) 4.32s × (1.00,1.00) ~ (p=0.095) BenchmarkFannkuch11-4 4.32s × (0.99,1.02) 4.38s × (0.98,1.04) +1.38% (p=0.008) BenchmarkFmtFprintfEmpty 83.5ns × (0.97,1.10) 87.3ns × (0.92,1.11) +4.55% (p=0.014) BenchmarkFmtFprintfEmpty-2 81.8ns × (0.98,1.04) 82.5ns × (0.97,1.08) ~ (p=0.364) BenchmarkFmtFprintfEmpty-4 80.9ns × (0.99,1.01) 82.6ns × (0.97,1.08) +2.12% (p=0.010) BenchmarkFmtFprintfString 320ns × (0.95,1.04) 322ns × (0.97,1.05) ~ (p=0.368) BenchmarkFmtFprintfString-2 303ns × (0.97,1.04) 304ns × (0.97,1.04) ~ (p=0.484) BenchmarkFmtFprintfString-4 305ns × (0.97,1.05) 306ns × (0.98,1.05) ~ (p=0.543) BenchmarkFmtFprintfInt 311ns × (0.98,1.03) 319ns × (0.97,1.03) +2.63% (p=0.000) BenchmarkFmtFprintfInt-2 297ns × (0.98,1.04) 301ns × (0.97,1.04) +1.19% (p=0.023) BenchmarkFmtFprintfInt-4 302ns × (0.98,1.02) 304ns × (0.97,1.03) ~ (p=0.126) BenchmarkFmtFprintfIntInt 554ns × (0.96,1.05) 554ns × (0.97,1.03) ~ (p=0.975) BenchmarkFmtFprintfIntInt-2 520ns × (0.98,1.03) 517ns × (0.98,1.02) ~ (p=0.153) BenchmarkFmtFprintfIntInt-4 524ns × (0.98,1.02) 525ns × (0.98,1.03) ~ (p=0.597) BenchmarkFmtFprintfPrefixedInt 433ns × (0.97,1.06) 434ns × (0.97,1.06) ~ (p=0.804) BenchmarkFmtFprintfPrefixedInt-2 413ns × (0.98,1.04) 413ns × (0.98,1.03) ~ (p=0.881) BenchmarkFmtFprintfPrefixedInt-4 420ns × (0.97,1.03) 421ns × (0.97,1.03) ~ (p=0.561) BenchmarkFmtFprintfFloat 620ns × (0.99,1.03) 636ns × (0.97,1.03) +2.57% (p=0.000) BenchmarkFmtFprintfFloat-2 601ns × (0.98,1.02) 617ns × (0.98,1.03) +2.58% (p=0.000) BenchmarkFmtFprintfFloat-4 613ns × (0.98,1.03) 626ns × (0.98,1.02) +2.15% (p=0.000) BenchmarkFmtManyArgs 2.19µs × (0.96,1.04) 2.23µs × (0.97,1.02) +1.65% (p=0.000) BenchmarkFmtManyArgs-2 2.08µs × (0.98,1.03) 2.10µs × (0.99,1.02) +0.79% (p=0.019) BenchmarkFmtManyArgs-4 2.10µs × (0.98,1.02) 2.13µs × (0.98,1.02) +1.72% (p=0.000) BenchmarkGobDecode 21.3ms × (0.97,1.05) 21.1ms × (0.97,1.04) -1.36% (p=0.025) BenchmarkGobDecode-2 20.0ms × (0.97,1.03) 19.2ms × (0.97,1.03) -4.00% (p=0.000) BenchmarkGobDecode-4 19.5ms × (0.99,1.02) 19.0ms × (0.99,1.01) -2.39% (p=0.000) BenchmarkGobEncode 18.3ms × (0.95,1.07) 18.1ms × (0.96,1.08) ~ (p=0.305) BenchmarkGobEncode-2 16.8ms × (0.97,1.02) 16.4ms × (0.98,1.02) -2.79% (p=0.000) BenchmarkGobEncode-4 15.4ms × (0.98,1.02) 15.4ms × (0.98,1.02) ~ (p=0.465) BenchmarkGzip 650ms × (0.98,1.03) 655ms × (0.97,1.04) ~ (p=0.075) BenchmarkGzip-2 652ms × (0.98,1.03) 655ms × (0.98,1.02) ~ (p=0.337) BenchmarkGzip-4 656ms × (0.98,1.04) 653ms × (0.98,1.03) ~ (p=0.291) BenchmarkGunzip 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.507) BenchmarkGunzip-2 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.313) BenchmarkGunzip-4 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.312) BenchmarkHTTPClientServer 110µs × (0.98,1.03) 109µs × (0.99,1.02) -1.40% (p=0.000) BenchmarkHTTPClientServer-2 154µs × (0.90,1.08) 149µs × (0.90,1.08) -3.43% (p=0.007) BenchmarkHTTPClientServer-4 138µs × (0.97,1.04) 138µs × (0.96,1.04) ~ (p=0.670) BenchmarkJSONEncode 40.2ms × (0.98,1.02) 40.2ms × (0.98,1.05) ~ (p=0.828) BenchmarkJSONEncode-2 35.1ms × (0.99,1.02) 35.2ms × (0.98,1.03) ~ (p=0.392) BenchmarkJSONEncode-4 35.3ms × (0.98,1.03) 35.3ms × (0.98,1.02) ~ (p=0.813) BenchmarkJSONDecode 119ms × (0.97,1.02) 117ms × (0.98,1.02) -1.80% (p=0.000) BenchmarkJSONDecode-2 115ms × (0.99,1.02) 114ms × (0.98,1.02) -1.18% (p=0.000) BenchmarkJSONDecode-4 116ms × (0.98,1.02) 114ms × (0.98,1.02) -1.43% (p=0.000) BenchmarkMandelbrot200 6.03ms × (1.00,1.01) 6.03ms × (1.00,1.01) ~ (p=0.985) BenchmarkMandelbrot200-2 6.03ms × (1.00,1.01) 6.02ms × (1.00,1.01) ~ (p=0.320) BenchmarkMandelbrot200-4 6.03ms × (1.00,1.01) 6.03ms × (1.00,1.01) ~ (p=0.799) BenchmarkGoParse 8.63ms × (0.89,1.10) 8.58ms × (0.93,1.09) ~ (p=0.667) BenchmarkGoParse-2 8.20ms × (0.97,1.04) 8.37ms × (0.97,1.04) +1.96% (p=0.001) BenchmarkGoParse-4 8.00ms × (0.98,1.02) 8.14ms × (0.99,1.02) +1.75% (p=0.000) BenchmarkRegexpMatchEasy0_32 162ns × (1.00,1.01) 164ns × (0.98,1.04) +1.35% (p=0.011) BenchmarkRegexpMatchEasy0_32-2 161ns × (1.00,1.01) 161ns × (1.00,1.00) ~ (p=0.185) BenchmarkRegexpMatchEasy0_32-4 161ns × (1.00,1.00) 161ns × (1.00,1.00) -0.19% (p=0.001) BenchmarkRegexpMatchEasy0_1K 540ns × (0.99,1.02) 566ns × (0.98,1.04) +4.98% (p=0.000) BenchmarkRegexpMatchEasy0_1K-2 540ns × (0.99,1.01) 557ns × (0.99,1.01) +3.21% (p=0.000) BenchmarkRegexpMatchEasy0_1K-4 541ns × (0.99,1.01) 559ns × (0.99,1.01) +3.26% (p=0.000) BenchmarkRegexpMatchEasy1_32 139ns × (0.98,1.04) 139ns × (0.99,1.03) ~ (p=0.979) BenchmarkRegexpMatchEasy1_32-2 139ns × (0.99,1.04) 139ns × (0.99,1.02) ~ (p=0.777) BenchmarkRegexpMatchEasy1_32-4 139ns × (0.98,1.04) 139ns × (0.99,1.04) ~ (p=0.771) BenchmarkRegexpMatchEasy1_1K 890ns × (0.99,1.03) 885ns × (1.00,1.01) -0.50% (p=0.004) BenchmarkRegexpMatchEasy1_1K-2 888ns × (0.99,1.01) 885ns × (0.99,1.01) -0.37% (p=0.004) BenchmarkRegexpMatchEasy1_1K-4 890ns × (0.99,1.02) 884ns × (1.00,1.00) -0.70% (p=0.000) BenchmarkRegexpMatchMedium_32 252ns × (0.99,1.01) 251ns × (0.99,1.01) ~ (p=0.081) BenchmarkRegexpMatchMedium_32-2 254ns × (0.99,1.04) 252ns × (0.99,1.01) -0.78% (p=0.027) BenchmarkRegexpMatchMedium_32-4 253ns × (0.99,1.04) 252ns × (0.99,1.01) -0.70% (p=0.022) BenchmarkRegexpMatchMedium_1K 72.9µs × (0.99,1.01) 72.7µs × (1.00,1.00) ~ (p=0.064) BenchmarkRegexpMatchMedium_1K-2 74.1µs × (0.98,1.05) 72.9µs × (1.00,1.01) -1.61% (p=0.001) BenchmarkRegexpMatchMedium_1K-4 73.6µs × (0.99,1.05) 72.8µs × (1.00,1.00) -1.13% (p=0.007) BenchmarkRegexpMatchHard_32 3.88µs × (0.99,1.03) 3.92µs × (0.98,1.05) ~ (p=0.143) BenchmarkRegexpMatchHard_32-2 3.89µs × (0.99,1.03) 3.93µs × (0.98,1.09) ~ (p=0.278) BenchmarkRegexpMatchHard_32-4 3.90µs × (0.99,1.05) 3.93µs × (0.98,1.05) ~ (p=0.252) BenchmarkRegexpMatchHard_1K 118µs × (0.99,1.01) 117µs × (0.99,1.02) -0.54% (p=0.003) BenchmarkRegexpMatchHard_1K-2 118µs × (0.99,1.01) 118µs × (0.99,1.03) ~ (p=0.581) BenchmarkRegexpMatchHard_1K-4 118µs × (0.99,1.02) 117µs × (0.99,1.01) -0.54% (p=0.002) BenchmarkRevcomp 991ms × (0.95,1.10) 989ms × (0.94,1.08) ~ (p=0.879) BenchmarkRevcomp-2 978ms × (0.95,1.11) 962ms × (0.96,1.08) ~ (p=0.257) BenchmarkRevcomp-4 979ms × (0.96,1.07) 974ms × (0.96,1.11) ~ (p=0.678) BenchmarkTemplate 141ms × (0.99,1.02) 145ms × (0.99,1.02) +2.75% (p=0.000) BenchmarkTemplate-2 135ms × (0.98,1.02) 138ms × (0.99,1.02) +2.34% (p=0.000) BenchmarkTemplate-4 136ms × (0.98,1.02) 140ms × (0.99,1.02) +2.71% (p=0.000) BenchmarkTimeParse 640ns × (0.99,1.01) 622ns × (0.99,1.01) -2.88% (p=0.000) BenchmarkTimeParse-2 640ns × (0.99,1.01) 622ns × (1.00,1.00) -2.81% (p=0.000) BenchmarkTimeParse-4 640ns × (1.00,1.01) 622ns × (0.99,1.01) -2.82% (p=0.000) BenchmarkTimeFormat 730ns × (0.98,1.02) 731ns × (0.98,1.03) ~ (p=0.767) BenchmarkTimeFormat-2 709ns × (0.99,1.02) 707ns × (0.99,1.02) ~ (p=0.347) BenchmarkTimeFormat-4 717ns × (0.98,1.01) 718ns × (0.98,1.02) ~ (p=0.793) Change-Id: Ie779c47e912bf80eb918bafa13638bd8dfd6c2d9 Reviewed-on: https://go-review.googlesource.com/9406 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-05-01 18:44:36 +00:00
Rick Hudson	ada8cdb9f6	runtime: Fix bug due to elided return. A previous change to mbitmap.go dropped a return on a path the seems not to be excersized. This was a mistake that this CL fixes. Change-Id: I715ee4ef08f5bf8d9f53cee84e8fb31a237e2d43 Reviewed-on: https://go-review.googlesource.com/9295 Reviewed-by: Austin Clements <austin@google.com>	2015-04-24 21:52:30 +00:00
Rick Hudson	899a4ad47e	runtime: Speed up heapBitsForObject Optimized heapBitsForObject by special casing objects whose size is a power of two. When a span holding such objects is initialized I added a mask that when &ed with an interior pointer results in the base of the pointer. For the garbage benchmark this resulted in CPU_CLK_UNHALTED in heapBitsForObject going from 7.7% down to 5.9% of the total, INST_RETIRED went from 12.2 -> 8.7. Here are the benchmarks that were at lease plus or minus 1%. benchmark old ns/op new ns/op delta BenchmarkFmtFprintfString 249 221 -11.24% BenchmarkFmtFprintfInt 247 223 -9.72% BenchmarkFmtFprintfEmpty 76.5 69.6 -9.02% BenchmarkBinaryTree17 4106631412 3744550160 -8.82% BenchmarkFmtFprintfFloat 424 399 -5.90% BenchmarkGoParse 4484421 4242115 -5.40% BenchmarkGobEncode 8803668 8449107 -4.03% BenchmarkFmtManyArgs 1494 1436 -3.88% BenchmarkGobDecode 10431051 10032606 -3.82% BenchmarkFannkuch11 2591306713 2517400464 -2.85% BenchmarkTimeParse 361 371 +2.77% BenchmarkJSONDecode 70620492 68830357 -2.53% BenchmarkRegexpMatchMedium_1K 54693 53343 -2.47% BenchmarkTemplate 90008879 91929940 +2.13% BenchmarkTimeFormat 380 387 +1.84% BenchmarkRegexpMatchEasy1_32 111 113 +1.80% BenchmarkJSONEncode 21359159 21007583 -1.65% BenchmarkRegexpMatchEasy1_1K 603 613 +1.66% BenchmarkRegexpMatchEasy0_32 127 129 +1.57% BenchmarkFmtFprintfIntInt 399 393 -1.50% BenchmarkRegexpMatchEasy0_1K 373 378 +1.34% Change-Id: I78e297161026f8b5cc7507c965fd3e486f81ed29 Reviewed-on: https://go-review.googlesource.com/8980 Reviewed-by: Austin Clements <austin@google.com>	2015-04-20 21:39:06 +00:00
Michael Hudson-Doyle	a1f57598cc	runtime, cmd/internal/ld: rename themoduledata to firstmoduledata 'themoduledata' doesn't really make sense now we support multiple moduledata objects. Change-Id: I8263045d8f62a42cb523502b37289b0fba054f62 Reviewed-on: https://go-review.googlesource.com/8521 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-04-10 05:11:49 +00:00
Michael Hudson-Doyle	fae4a128cb	runtime, reflect: support multiple moduledata objects This changes all the places that consult themoduledata to consult a linked list of moduledata objects, as will be necessary for -linkshared to work. Obviously, as there is as yet no way of adding moduledata objects to this list, all this change achieves right now is wasting a few instructions here and there. Change-Id: I397af7f60d0849b76aaccedf72238fe664867051 Reviewed-on: https://go-review.googlesource.com/8231 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-04-10 04:51:42 +00:00
Michael Hudson-Doyle	67426a8a9e	runtime, cmd/internal/ld: change runtime to use a single linker symbol In preparation for being able to run a go program that has code in several objects, this changes from having several linker symbols used by the runtime into having one linker symbol that points at a structure containing the needed data. Multiple object support will construct a linked list of such structures. A follow up will initialize the slices in the themoduledata structure directly from the linker but I was aiming for a minimal diff for now. Change-Id: I613cce35309801cf265a1d5ae5aaca8d689c5cbf Reviewed-on: https://go-review.googlesource.com/7441 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-03-31 22:45:07 +00:00
Rick Hudson	122384e489	runtime: Remove boundary bit logic. This is an experiment to see if removing the boundary bit logic will lead to fewer cache misses and improved performance. Instead of using boundary bits we use the span information to get element size and use some bit whacking to get the boundary without having to touch the random heap bits which cause cache misses. Furthermore once the boundary bit is removed we can either use that bit for a simpler checkmark routine or we can reduce the number of bits in the GC bitmap to 2 bits per pointer sized work. For example the 2 bits at the boundary can be used for marking and pointer/scalar differentiation. Since we don't need the mark bit except at the boundary nibble of the object other nibbles can use this bit as a noscan bit to indicate that there are no more pointers in the object. Currently the changed included in this CL slows down the garbage benchmark. With the boundary bits garbage gives 5.78 and without (this CL) it gives 5.88 which is a 2% slowdown. Change-Id: Id68f831ad668176f7dc9f7b57b339e4ebb6dc4c2 Reviewed-on: https://go-review.googlesource.com/6665 Reviewed-by: Austin Clements <austin@google.com>	2015-03-04 20:55:55 +00:00
Russ Cox	9feb24f3ed	runtime: use multiply instead of divide in heapBitsForObject These benchmarks show the effect of the combination of this change and Rick's pending CL 6665. Code with interior pointers is helped much more than code without, but even code without doesn't suffer too badly. benchmark old ns/op new ns/op delta BenchmarkBinaryTree17 6989407768 6851728175 -1.97% BenchmarkFannkuch11 4416250775 4405762558 -0.24% BenchmarkFmtFprintfEmpty 134 130 -2.99% BenchmarkFmtFprintfString 491 402 -18.13% BenchmarkFmtFprintfInt 430 420 -2.33% BenchmarkFmtFprintfIntInt 748 663 -11.36% BenchmarkFmtFprintfPrefixedInt 602 534 -11.30% BenchmarkFmtFprintfFloat 728 699 -3.98% BenchmarkFmtManyArgs 2528 2507 -0.83% BenchmarkGobDecode 17448191 17749756 +1.73% BenchmarkGobEncode 14579824 14370183 -1.44% BenchmarkGzip 656489990 652669348 -0.58% BenchmarkGunzip 141254147 141099278 -0.11% BenchmarkHTTPClientServer 94111 93738 -0.40% BenchmarkJSONEncode 36305013 36696440 +1.08% BenchmarkJSONDecode 124652000 128176454 +2.83% BenchmarkMandelbrot200 6009333 5997093 -0.20% BenchmarkGoParse 7651583 7623494 -0.37% BenchmarkRegexpMatchEasy0_32 213 213 +0.00% BenchmarkRegexpMatchEasy0_1K 511 494 -3.33% BenchmarkRegexpMatchEasy1_32 186 187 +0.54% BenchmarkRegexpMatchEasy1_1K 1834 1827 -0.38% BenchmarkRegexpMatchMedium_32 427 412 -3.51% BenchmarkRegexpMatchMedium_1K 154841 153086 -1.13% BenchmarkRegexpMatchHard_32 7473 7478 +0.07% BenchmarkRegexpMatchHard_1K 233587 232272 -0.56% BenchmarkRevcomp 918797689 944528032 +2.80% BenchmarkTemplate 167665081 167773121 +0.06% BenchmarkTimeParse 631 636 +0.79% BenchmarkTimeFormat 672 666 -0.89% Change-Id: Ia923de3cdb3993b640fe0a02cbe2c7babc16f32c Reviewed-on: https://go-review.googlesource.com/6782 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-03-04 17:46:47 +00:00
Austin Clements	da4874cba4	runtime: trivial clean ups to greyobject Previously, the typeDead check in greyobject was under a separate !useCheckmark conditional. Put it with the rest of the !useCheckmark code. Also move a comment about atomic update of the marked bit to where we actually do that update now. Change-Id: Ief5f16401a25739ad57d959607b8d81ffe0bc211 Reviewed-on: https://go-review.googlesource.com/6271 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-02-27 19:39:57 +00:00
Keith Randall	6d1ebeb527	runtime: handle holes in the heap We need to distinguish pointers to free spans, which indicate bugs in our pointer analysis, from pointers to never-in-the-heap spans, which can legitimately arise from sysAlloc/mmap/etc. This normally isn't a problem because the heap is contiguous, but in some situations (32 bit, particularly) the heap must grow around an already allocated region. The bad pointer test is disabled so this fix doesn't actually do anything, but it removes one barrier from reenabling it. Fixes #9872. Change-Id: I0a92db4d43b642c58d2b40af69c906a8d9777f88 Reviewed-on: https://go-review.googlesource.com/5780 Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-02-25 21:07:10 +00:00
Austin Clements	bceb18e498	runtime: eliminate unnecessary assumption in heapBitsForObject The slow path of heapBitsForObjects somewhat subtly assumes that the pointer will not point to the first word of the object and will round the pointer wrong if this assumption is violated. This assumption is safe because the fast path should always take care of this case, but there's no benefit to making this assumption, it makes the code more difficult to experiment with than necessary, and it's trivial to eliminate. Change-Id: Iedd336f7d529a27d3abeb83e77dfb32a285ea73a Reviewed-on: https://go-review.googlesource.com/5636 Reviewed-by: Russ Cox <rsc@golang.org>	2015-02-23 21:49:27 +00:00
Russ Cox	484f801ff4	runtime: reorganize memory code Move code from malloc1.go, malloc2.go, mem.go, mgc0.go into appropriate locations. Factor mgc.go into mgc.go, mgcmark.go, mgcsweep.go, mstats.go. A lot of this code was in certain files because the right place was in a C file but it was written in Go, or vice versa. This is one step toward making things actually well-organized again. Change-Id: I6741deb88a7cfb1c17ffe0bcca3989e10207968f Reviewed-on: https://go-review.googlesource.com/5300 Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Rick Hudson <rlh@golang.org>	2015-02-19 20:17:01 +00:00
Russ Cox	3965d7508e	runtime: factor out bitmap, finalizer code from malloc/mgc The code in mfinal.go is moved from malloc.go and mgc.go and substantially unchanged. The code in mbitmap.go is also moved from those files, but cleaned up so that it can be called from those files (in most cases the code being moved was not already a standalone function). I also renamed the constants and wrote comments describing the format. The result is a significant cleanup and isolation of the bitmap code, but, roughly speaking, it should be treated and reviewed as new code. The other files changed only as much as necessary to support this code movement. This CL does NOT change the semantics of the heap or type bitmaps at all, although there are now some obvious opportunities to do so in followup CLs. Change-Id: I41b8d5de87ad1d3cd322709931ab25e659dbb21d Reviewed-on: https://go-review.googlesource.com/2991 Reviewed-by: Keith Randall <khr@golang.org>	2015-01-19 16:26:51 +00:00

48 Commits