Commit Graph

48 Commits

Author SHA1 Message Date
Ian Lance Taylor be1ef46775 runtime: add optional expensive check for invalid cgo pointer passing
If you set GODEBUG=cgocheck=2 the runtime package will use the write
barrier to detect cases where a Go program writes a Go pointer into
non-Go memory.  In conjunction with the existing cgo checks, and the
not-yet-implemented cgo check for exported functions, this should
reliably detect all cases (that do not import the unsafe package) in
which a Go pointer is incorrectly shared with C code.  This check is
optional because it turns on the write barrier at all times, which is
known to be expensive.

Update #12416.

Change-Id: I549d8b2956daa76eac853928e9280e615d6365f4
Reviewed-on: https://go-review.googlesource.com/16899
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-16 18:39:06 +00:00
Michael Matloob 432cb66f16 runtime: break out system-specific constants into package sys
runtime/internal/sys will hold system-, architecture- and config-
specific constants.

Updates #11647

Change-Id: I6db29c312556087a42e8d2bdd9af40d157c56b54
Reviewed-on: https://go-review.googlesource.com/16817
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-12 17:04:45 +00:00
Matthew Dempsky c17c42e8a5 runtime: rewrite lots of foo_Bar(f, ...) into f.bar(...)
Applies to types fixAlloc, mCache, mCentral, mHeap, mSpan, and
mSpanList.

Two special cases:

1. mHeap_Scavenge() previously didn't take an *mheap parameter, so it
was specially handled in this CL.

2. mHeap_Free() would have collided with mheap's "free" field, so it's
been renamed to (*mheap).freeSpan to parallel its underlying
(*mheap).freeSpanLocked method.

Change-Id: I325938554cca432c166fe9d9d689af2bbd68de4b
Reviewed-on: https://go-review.googlesource.com/16221
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-11-12 00:34:58 +00:00
Michael Matloob 67faca7d9c runtime: break atomics out into package runtime/internal/atomic
This change breaks out most of the atomics functions in the runtime
into package runtime/internal/atomic. It adds some basic support
in the toolchain for runtime packages, and also modifies linux/arm
atomics to remove the dependency on the runtime's mutex. The mutexes
have been replaced with spinlocks.

all trybots are happy!
In addition to the trybots, I've tested on the darwin/arm64 builder,
on the darwin/arm builder, and on a ppc64le machine.

Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f
Reviewed-on: https://go-review.googlesource.com/14204
Run-TryBot: Michael Matloob <matloob@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-11-10 17:38:04 +00:00
Austin Clements beedb1ec33 runtime: add pcvalue cache to improve stack scan speed
The cost of scanning large stacks is currently dominated by the time
spent looking up and decoding the pcvalue table. However, large stacks
are usually large not because they contain calls to many different
functions, but because they contain many calls to the same, small set
of recursive functions. Hence, walking large stacks tends to make the
same pcvalue queries many times.

Based on this observation, this commit adds a small, very simple, and
fast cache in front of pcvalue lookup. We thread this cache down from
operations that make many pcvalue calls, such as gentraceback, stack
scanning, and stack adjusting.

This simple cache works well because it has minimal overhead when it's
not effective. I also tried a hashed direct-map cache, CLOCK-based
replacement, round-robin replacement, and round-robin with lookups
disabled until there had been at least 16 probes, but none of these
approaches had obvious wins over the random replacement policy in this
commit.

This nearly doubles the overall performance of the deep stack test
program from issue #10898:

name        old time/op  new time/op  delta
Issue10898   16.5s ±12%    9.2s ±12%  -44.37%  (p=0.008 n=5+5)

It's a very slight win on the garbage benchmark:

name              old time/op  new time/op  delta
XBenchGarbage-12  4.92ms ± 1%  4.89ms ± 1%  -0.75%  (p=0.000 n=18+19)

It's a wash (but doesn't harm performance) on the go1 benchmarks,
which don't have particularly deep stacks:

name                      old time/op    new time/op    delta
BinaryTree17-12              3.11s ± 2%     3.20s ± 3%  +2.83%  (p=0.000 n=17+20)
Fannkuch11-12                2.51s ± 1%     2.51s ± 1%  -0.22%  (p=0.034 n=19+18)
FmtFprintfEmpty-12          50.8ns ± 3%    50.6ns ± 2%    ~     (p=0.793 n=20+20)
FmtFprintfString-12          174ns ± 0%     174ns ± 1%  +0.17%  (p=0.048 n=15+20)
FmtFprintfInt-12             177ns ± 0%     165ns ± 1%  -6.99%  (p=0.000 n=17+19)
FmtFprintfIntInt-12          283ns ± 1%     284ns ± 0%  +0.22%  (p=0.000 n=18+15)
FmtFprintfPrefixedInt-12     243ns ± 1%     244ns ± 1%  +0.40%  (p=0.000 n=20+19)
FmtFprintfFloat-12           318ns ± 0%     319ns ± 0%  +0.27%  (p=0.001 n=19+20)
FmtManyArgs-12              1.12µs ± 0%    1.14µs ± 0%  +1.74%  (p=0.000 n=19+20)
GobDecode-12                8.69ms ± 0%    8.73ms ± 1%  +0.46%  (p=0.000 n=18+18)
GobEncode-12                6.64ms ± 1%    6.61ms ± 1%  -0.46%  (p=0.000 n=20+20)
Gzip-12                      323ms ± 2%     319ms ± 1%  -1.11%  (p=0.000 n=20+20)
Gunzip-12                   42.8ms ± 0%    42.9ms ± 0%    ~     (p=0.158 n=18+20)
HTTPClientServer-12         63.3µs ± 1%    63.1µs ± 1%  -0.35%  (p=0.011 n=20+20)
JSONEncode-12               16.9ms ± 1%    17.3ms ± 1%  +2.84%  (p=0.000 n=19+20)
JSONDecode-12               59.7ms ± 0%    58.5ms ± 0%  -2.05%  (p=0.000 n=19+17)
Mandelbrot200-12            3.92ms ± 0%    3.91ms ± 0%  -0.16%  (p=0.003 n=19+19)
GoParse-12                  3.79ms ± 2%    3.75ms ± 2%  -0.91%  (p=0.005 n=20+20)
RegexpMatchEasy0_32-12       102ns ± 1%     101ns ± 1%  -0.80%  (p=0.001 n=14+20)
RegexpMatchEasy0_1K-12       337ns ± 1%     346ns ± 1%  +2.90%  (p=0.000 n=20+19)
RegexpMatchEasy1_32-12      84.4ns ± 2%    84.3ns ± 2%    ~     (p=0.743 n=20+20)
RegexpMatchEasy1_1K-12       502ns ± 1%     505ns ± 0%  +0.64%  (p=0.000 n=20+20)
RegexpMatchMedium_32-12      133ns ± 1%     132ns ± 1%  -0.85%  (p=0.000 n=20+19)
RegexpMatchMedium_1K-12     40.1µs ± 1%    39.8µs ± 1%  -0.77%  (p=0.000 n=18+18)
RegexpMatchHard_32-12       2.08µs ± 1%    2.07µs ± 1%  -0.55%  (p=0.001 n=18+19)
RegexpMatchHard_1K-12       62.4µs ± 1%    62.0µs ± 1%  -0.74%  (p=0.000 n=19+19)
Revcomp-12                   545ms ± 2%     545ms ± 3%    ~     (p=0.771 n=19+20)
Template-12                 73.7ms ± 1%    72.0ms ± 0%  -2.33%  (p=0.000 n=20+18)
TimeParse-12                 358ns ± 1%     351ns ± 1%  -2.07%  (p=0.000 n=20+20)
TimeFormat-12                369ns ± 1%     356ns ± 0%  -3.53%  (p=0.000 n=20+18)
[Geo mean]                  63.5µs         63.2µs       -0.41%

name                      old speed      new speed      delta
GobDecode-12              88.3MB/s ± 0%  87.9MB/s ± 0%  -0.43%  (p=0.000 n=18+17)
GobEncode-12               116MB/s ± 1%   116MB/s ± 1%  +0.47%  (p=0.000 n=20+20)
Gzip-12                   60.2MB/s ± 2%  60.8MB/s ± 1%  +1.13%  (p=0.000 n=20+20)
Gunzip-12                  453MB/s ± 0%   453MB/s ± 0%    ~     (p=0.160 n=18+20)
JSONEncode-12              115MB/s ± 1%   112MB/s ± 1%  -2.76%  (p=0.000 n=19+20)
JSONDecode-12             32.5MB/s ± 0%  33.2MB/s ± 0%  +2.09%  (p=0.000 n=19+17)
GoParse-12                15.3MB/s ± 2%  15.4MB/s ± 2%  +0.92%  (p=0.004 n=20+20)
RegexpMatchEasy0_32-12     311MB/s ± 1%   314MB/s ± 1%  +0.78%  (p=0.000 n=15+19)
RegexpMatchEasy0_1K-12    3.04GB/s ± 1%  2.95GB/s ± 1%  -2.90%  (p=0.000 n=19+19)
RegexpMatchEasy1_32-12     379MB/s ± 2%   380MB/s ± 2%    ~     (p=0.779 n=20+20)
RegexpMatchEasy1_1K-12    2.04GB/s ± 1%  2.02GB/s ± 0%  -0.62%  (p=0.000 n=20+20)
RegexpMatchMedium_32-12   7.46MB/s ± 1%  7.53MB/s ± 1%  +0.86%  (p=0.000 n=20+19)
RegexpMatchMedium_1K-12   25.5MB/s ± 1%  25.7MB/s ± 1%  +0.78%  (p=0.000 n=18+18)
RegexpMatchHard_32-12     15.4MB/s ± 1%  15.5MB/s ± 1%  +0.62%  (p=0.000 n=19+19)
RegexpMatchHard_1K-12     16.4MB/s ± 1%  16.5MB/s ± 1%  +0.82%  (p=0.000 n=20+19)
Revcomp-12                 466MB/s ± 2%   466MB/s ± 3%    ~     (p=0.765 n=19+20)
Template-12               26.3MB/s ± 1%  27.0MB/s ± 0%  +2.38%  (p=0.000 n=20+18)
[Geo mean]                97.8MB/s       98.0MB/s       +0.23%

Change-Id: I281044ae0b24990ba46487cacbc1069493274bc4
Reviewed-on: https://go-review.googlesource.com/13614
Reviewed-by: Keith Randall <khr@golang.org>
2015-10-22 17:48:13 +00:00
Matthew Dempsky 84afa1be76 runtime: make iface/eface handling more type safe
Change compiler-invoked interface functions to directly take
iface/eface parameters instead of fInterface/interface{} to avoid
needing to always convert.

For the handful of functions that legitimately need to take an
interface{} parameter, add efaceOf to type-safely convert *interface{}
to *eface.

Change-Id: I8928761a12fd3c771394f36adf93d3006a9fcf39
Reviewed-on: https://go-review.googlesource.com/16166
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-10-21 23:08:22 +00:00
Austin Clements b307910b6e runtime: fix offset in invalidptr panic message
Change-Id: I00e1eebbf5e1a01c8fad5ca5324aa8eec1e4d731
Reviewed-on: https://go-review.googlesource.com/14792
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-22 16:55:17 +00:00
Austin Clements b7c55ba496 runtime: improve invalid pointer error message
By default, the runtime panics if it detects a pointer to an
unallocated span. At this point, this usually catches bad uses of
unsafe or cgo in user code (though it could also catch runtime bugs).
Unfortunately, the rather cryptic error misleads users, offers users
little help with debugging their own problem, and offers the Go
developers little help with root-causing.

Improve the error message in various ways. First, the wording is
improved to make it clearer what condition was detected and to suggest
that this may be the result of incorrect use of unsafe or cgo. Second,
we add a dump of the object containing the bad pointer so that there's
at least some hope of figuring out why a bad pointer was stored in the
Go heap.

Change-Id: I57b91b12bc3cb04476399d7706679e096ce594b9
Reviewed-on: https://go-review.googlesource.com/14763
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-09-18 22:23:11 +00:00
Russ Cox 3ae17043f7 runtime: make sure heapBitsBulkBarrier cannot be preempted
Changes the torture test in #12068 from failing about 1/10 times
to not failing in almost 2,000 runs.

This was only happening in -race mode because functions are
bigger in -race mode, so a few of the helpers for heapBitsBulkBarrier
were not being inlined, and they were not marked nosplit,
so (only in -race mode) the write barrier was being preempted by GC,
causing missed pointer updates.

Filed issue #12069 for diagnosis of any other similar errors.

Fixes #12068.

Change-Id: Ic174d9b050ba278b18b08ab0d85a73c33bd5b175
Reviewed-on: https://go-review.googlesource.com/13364
Reviewed-by: Austin Clements <austin@google.com>
2015-08-07 17:55:26 +00:00
Russ Cox d3ffc975f3 runtime: set invalidptr=1 by default, as documented
Also make invalidptr control the recently added GC pointer check,
as documented.

Change-Id: Iccfdf49480219d12be8b33b8f03d8312d8ceabed
Reviewed-on: https://go-review.googlesource.com/12857
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2015-07-29 23:50:20 +00:00
Russ Cox 4addec3aaa runtime: reenable bad pointer check in GC
The last time we tried this, linux/arm64 broke.
The series of CLs leading to this one fixes that problem.
Let's try again.

Fixes #9880.

Change-Id: I67bc1d959175ec972d4dcbe4aa6f153790f74251
Reviewed-on: https://go-review.googlesource.com/12849
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 21:37:55 +00:00
Russ Cox a93e5b4ff9 Revert "runtime: diagnose invalid pointers during GC"
Broke arm64. Update #9880.

This reverts commit 38d9b2a3a9.

Change-Id: I35fa21005af2183828a9d8b195ebcfbe45ec5138
Reviewed-on: https://go-review.googlesource.com/12247
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-16 01:49:58 +00:00
Russ Cox 38d9b2a3a9 runtime: diagnose invalid pointers during GC
For #9880. Let's see what breaks.

Change-Id: Ic8b99a604e60177a448af5f7173595feed607875
Reviewed-on: https://go-review.googlesource.com/10818
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
2015-07-15 05:42:06 +00:00
Austin Clements d231cb8249 runtime: repeat bitmap for slice of GCprog n-1 times, not n times
Currently, to write out the bitmap of a slice of a type with a GCprog,
we construct a new GCprog that executes the underlying type's GCprog
to write out the bitmap once and then repeats those bits n more times.
This results in n+1 repetitions of the bitmap, which is one more
repetition than it should be. This corrupts the bitmap of the heap
following the slice and may write past the mapped bitmap memory and
segfault.

Fix this by repeating the bitmap only n-1 more times.

Fixes #11430.

Change-Id: Ic24854363bffc5a755b66f257339f9309ada3aa5
Reviewed-on: https://go-review.googlesource.com/11570
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-26 21:52:51 +00:00
Austin Clements a8ae93fd26 runtime: fix heap bitmap repeating with large scalar tails
When heapBitsSetType repeats a source bitmap with a scalar tail
(typ.ptrdata < typ.size), it lays out the tail upon reaching the end
of the source bitmap by simply increasing the number of bits claimed
to be in the incoming bit buffer. This causes later iterations to read
the appropriate number of zeros out of the bit buffer before starting
on the next repeat of the source bitmap.

Currently, however, later iterations of the loop continue to read bits
from the source bitmap *regardless of the number of bits currently in
the bit buffer*. The bit buffer can only hold 32 or 64 bits, so if the
scalar tail is large and the padding bits exceed the size of the bit
buffer, the read from the source bitmap on the next iteration will
shift the incoming bits into oblivion when it attempts to put them in
the bit buffer. When the buffer does eventually shift down to where
these bits were supposed to be, it will contain zeros. As a result,
words that should be marked as pointers on later repetitions are
marked as scalars, so the garbage collector does not trace them. If
this is the only reference to an object, it will be incorrectly freed.

Fix this by adding logic to drain the bit buffer down if it is large
instead of reading more bits from the source bitmap.

Fixes #11286.

Change-Id: I964432c4b9f1cec334fc8c3da0ff16460203feb6
Reviewed-on: https://go-review.googlesource.com/11360
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-23 18:37:17 +00:00
Rick Hudson 1ab9176e54 runtime: remove race and increase precision in pointer validation.
This CL removes the single and racy use of mheap.arena_end outside
of the bookkeeping done in mHeap_init and mHeap_Alloc.
There should be no way for heapBitsForSpan to see a pointer to
an invalid span. This CL makes the check for this more precise by
checking that the pointer is between mheap_.arena_start and
mheap_.arena_used instead of mheap_.arena_end.

Change-Id: I1200b54353ee1eda002d92645fd8d26048600ceb
Reviewed-on: https://go-review.googlesource.com/11342
Reviewed-by: Austin Clements <austin@google.com>
2015-06-22 20:37:23 +00:00
Russ Cox 80ec711755 runtime: use type-based write barrier for remote stack write during chansend
A send on an unbuffered channel to a blocked receiver is the only
case in the runtime where one goroutine writes directly to the stack
of another. The garbage collector assumes that if a goroutine is
blocked, its stack contains no new pointers since the last time it ran.
The send on an unbuffered channel violates this, so it needs an
explicit write barrier. It has an explicit write barrier, but not one that
can handle a write to another stack. Use one that can (based on type bitmap
instead of heap bitmap).

To make this work, raise the limit for type bitmaps so that they are
used for all types up to 64 kB in size (256 bytes of bitmap).
(The runtime already imposes a limit of 64 kB for a channel element size.)

I have been unable to reproduce this problem in a simple test program.

Could help #11035.

Change-Id: I06ad994032d8cff3438c9b3eaa8d853915128af5
Reviewed-on: https://go-review.googlesource.com/10815
Reviewed-by: Austin Clements <austin@google.com>
2015-06-15 16:50:30 +00:00
Russ Cox d57c889ae8 runtime: wait to update arena_used until after mapping bitmap
This avoids a race with gcmarkwb_m that was leading to faults.

Fixes #10212.

Change-Id: I6fcf8d09f2692227063ce29152cb57366ea22487
Reviewed-on: https://go-review.googlesource.com/10816
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-06-11 18:15:21 +00:00
Ainar Garipov 7f9f70e5b6 all: fix misprints in comments
These were found by grepping the comments from the go code and feeding
the output to aspell.

Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395
Reviewed-on: https://go-review.googlesource.com/10941
Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-11 14:18:57 +00:00
Russ Cox 433c0bc769 runtime: avoid fault in heapBitsBulkBarrier
Change-Id: I0512e461de1f25cb2a1cb7f23e7a77d00700667c
Reviewed-on: https://go-review.googlesource.com/10803
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-08 20:24:00 +00:00
Austin Clements 306f8f11ad runtime: unwind stack barriers when writing above the current frame
Stack barriers assume that writes through pointers to frames above the
current frame will get write barriers, and hence these frames do not
need to be re-scanned to pick up these changes. For normal writes,
this is true. However, there are places in the runtime that use
typedmemmove to potentially write through pointers to higher frames
(such as mapassign1). Currently, typedmemmove does not execute write
barriers if the destination is on the stack. If there's a stack
barrier between the current frame and the frame being modified with
typedmemmove, and the stack barrier is not otherwise hit, it's
possible that the garbage collector will never see the updated pointer
and incorrectly reclaim the object.

Fix this by making heapBitsBulkBarrier (which lies behind typedmemmove
and its variants) detect when the destination is in the stack and
unwind stack barriers up to the point, forcing mark termination to
later rescan the effected frame and collect these pointers.

Fixes #11084. Might be related to #10240, #10541, #10941, #11023,
 #11027 and possibly others.

Change-Id: I323d6cd0f1d29fa01f8fc946f4b90e04ef210efd
Reviewed-on: https://go-review.googlesource.com/10791
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-07 17:57:47 +00:00
Russ Cox 001438bdfe runtime: fix callwritebarrier
Given a call frame F of size N where the return values start at offset R,
callwritebarrier was instructing heapBitsBulkBarrier to scan the block
of memory [F+R, F+R+N). It should only scan [F+R, F+N). The extra N-R
bytes scanned might lead into the next allocated block in memory.
Because the scan was consulting the heap bitmap for type information,
scanning into the next block normally "just worked" in the sense of
not crashing.

Scanning the extra N-R bytes of memory is a problem mainly because
it causes the GC to consider pointers that might otherwise not be
considered, leading it to retain objects that should actually be freed.
This is very difficult to detect.

Luckily, juju turned up a case where the heap bitmap and the memory
were out of sync for the block immediately after the call frame, so that
heapBitsBulkBarrier saw an obvious non-pointer where it expected a
pointer, causing a loud crash.

Why is there a non-pointer in memory that the heap bitmap records as
a pointer? That is more difficult to answer. At least one way that it
could happen is that allocations containing no pointers at all do not
update the heap bitmap. So if heapBitsBulkBarrier walked out of the
current object and into a no-pointer object and consulted those bitmap
bits, it would be misled. This doesn't happen in general because all
the paths to heapBitsBulkBarrier first check for the no-pointer case.
This may or may not be what happened, but it's the only scenario
I've been able to construct.

I tried for quite a while to write a simple test for this and could not.
It does fix the juju crash, and it is clearly an improvement over the
old code.

Fixes #10844.

Change-Id: I53982c93ef23ef93155c4086bbd95a4c4fdaac9a
Reviewed-on: https://go-review.googlesource.com/10317
Reviewed-by: Austin Clements <austin@google.com>
2015-05-21 19:14:03 +00:00
Russ Cox 512f75e8df runtime: replace GC programs with simpler encoding, faster decoder
Small types record the location of pointers in their memory layout
by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries,
and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using
a bitmap for a large type containing arrays does not make sense:
if someone refers to the type [1<<28]*byte in a program in such
a way that the type information makes it into the binary, it would be
a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB
(for 1-bit entries) bitmap full of 1s into the binary or even to keep
one in memory during the execution of the program.

For large types containing arrays, it is much more compact to describe
the locations of pointers using a notation that can express repetition
than to lay out a bitmap of pointers. Go 1.4 included such a notation,
called ``GC programs'' but it was complex, required recursion during
decoding, and was generally slow. Dmitriy measured the execution of
these programs writing directly to the heap bitmap as being 7x slower
than copying from a preunrolled 4-bit mask (and frankly that code was
not terribly fast either). For some tests, unrollgcprog1 was seen costing
as much as 3x more than the rest of malloc combined.

This CL introduces a different form for the GC programs. They use a
simple Lempel-Ziv-style encoding of the 1-bit pointer information,
in which the only operations are (1) emit the following n bits
and (2) repeat the last n bits c more times. This encoding can be
generated directly from the Go type information (using repetition
only for arrays or large runs of non-pointer data) and it can be decoded
very efficiently. In particular the decoding requires little state and
no recursion, so that the entire decoding can run without any memory
accesses other than the reads of the encoding and the writes of the
decoded form to the heap bitmap. For recursive types like arrays of
arrays of arrays, the inner instructions are only executed once, not
n times, so that large repetitions run at full speed. (In contrast, large
repetitions in the old programs repeated the individual bit-level layout
of the inner data over and over.) The result is as much as 25x faster
decoding compared to the old form.

Because the old decoder was so slow, Go 1.4 had three (or so) cases
for how to set the heap bitmap bits for an allocation of a given type:

(1) If the type had an even number of words up to 32 words, then
the 4-bit pointer mask for the type fit in no more than 16 bytes;
store the 4-bit pointer mask directly in the binary and copy from it.

(1b) If the type had an odd number of words up to 15 words, then
the 4-bit pointer mask for the type, doubled to end on a byte boundary,
fit in no more than 16 bytes; store that doubled mask directly in the
binary and copy from it.

(2) If the type had an even number of words up to 128 words,
or an odd number of words up to 63 words (again due to doubling),
then the 4-bit pointer mask would fit in a 64-byte unrolled mask.
Store a GC program in the binary, but leave space in the BSS for
the unrolled mask. Execute the GC program to construct the mask the
first time it is needed, and thereafter copy from the mask.

(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.
(This is the case that was 7x slower than the other two.)

Because the new pointer masks store 1-bit entries instead of 4-bit
entries and because using the decoder no longer carries a significant
overhead, after this CL (that is, for Go 1.5) there are only two cases:

(1) If the type is 128 words or less (no condition about odd or even),
store the 1-bit pointer mask directly in the binary and use it to
initialize the heap bitmap during malloc. (Implemented in CL 9702.)

(2) There is no case 2 anymore.

(3) Otherwise, store a GC program and execute it to write directly to
the heap bitmap each time an object of that type is allocated.

Executing the GC program directly into the heap bitmap (case (3) above)
was disabled for the Go 1.5 dev cycle, both to avoid needing to use
GC programs for typedmemmove and to avoid updating that code as
the heap bitmap format changed. Typedmemmove no longer uses this
type information; as of CL 9886 it uses the heap bitmap directly.
Now that the heap bitmap format is stable, we reintroduce GC programs
and their space savings.

Benchmarks for heapBitsSetType, before this CL vs this CL:

name                    old mean               new mean              delta
SetTypePtr              7.59ns × (0.99,1.02)   5.16ns × (1.00,1.00)  -32.05% (p=0.000)
SetTypePtr8             21.0ns × (0.98,1.05)   21.4ns × (1.00,1.00)     ~    (p=0.179)
SetTypePtr16            24.1ns × (0.99,1.01)   24.6ns × (1.00,1.00)   +2.41% (p=0.001)
SetTypePtr32            31.2ns × (0.99,1.01)   32.4ns × (0.99,1.02)   +3.72% (p=0.001)
SetTypePtr64            45.2ns × (1.00,1.00)   47.2ns × (1.00,1.00)   +4.42% (p=0.000)
SetTypePtr126           75.8ns × (0.99,1.01)   79.1ns × (1.00,1.00)   +4.25% (p=0.000)
SetTypePtr128           74.3ns × (0.99,1.01)   77.6ns × (1.00,1.01)   +4.55% (p=0.000)
SetTypePtrSlice          726ns × (1.00,1.01)    712ns × (1.00,1.00)   -1.95% (p=0.001)
SetTypeNode1            20.0ns × (0.99,1.01)   20.7ns × (1.00,1.00)   +3.71% (p=0.000)
SetTypeNode1Slice        112ns × (1.00,1.00)    113ns × (0.99,1.00)     ~    (p=0.070)
SetTypeNode8            23.9ns × (1.00,1.00)   24.7ns × (1.00,1.01)   +3.18% (p=0.000)
SetTypeNode8Slice        294ns × (0.99,1.02)    287ns × (0.99,1.01)   -2.38% (p=0.015)
SetTypeNode64           52.8ns × (0.99,1.03)   51.8ns × (0.99,1.01)     ~    (p=0.069)
SetTypeNode64Slice      1.13µs × (0.99,1.05)   1.14µs × (0.99,1.00)     ~    (p=0.767)
SetTypeNode64Dead       36.0ns × (1.00,1.01)   32.5ns × (0.99,1.00)   -9.67% (p=0.000)
SetTypeNode64DeadSlice  1.43µs × (0.99,1.01)   1.40µs × (1.00,1.00)   -2.39% (p=0.001)
SetTypeNode124          75.7ns × (1.00,1.01)   79.0ns × (1.00,1.00)   +4.44% (p=0.000)
SetTypeNode124Slice     1.94µs × (1.00,1.01)   2.04µs × (0.99,1.01)   +4.98% (p=0.000)
SetTypeNode126          75.4ns × (1.00,1.01)   77.7ns × (0.99,1.01)   +3.11% (p=0.000)
SetTypeNode126Slice     1.95µs × (0.99,1.01)   2.03µs × (1.00,1.00)   +3.74% (p=0.000)
SetTypeNode128          85.4ns × (0.99,1.01)  122.0ns × (1.00,1.00)  +42.89% (p=0.000)
SetTypeNode128Slice     2.20µs × (1.00,1.01)   2.36µs × (0.98,1.02)   +7.48% (p=0.001)
SetTypeNode130          83.3ns × (1.00,1.00)  123.0ns × (1.00,1.00)  +47.61% (p=0.000)
SetTypeNode130Slice     2.30µs × (0.99,1.01)   2.40µs × (0.98,1.01)   +4.37% (p=0.000)
SetTypeNode1024          498ns × (1.00,1.00)    537ns × (1.00,1.00)   +7.96% (p=0.000)
SetTypeNode1024Slice    15.5µs × (0.99,1.01)   17.8µs × (1.00,1.00)  +15.27% (p=0.000)

The above compares always using a cached pointer mask (and the
corresponding waste of memory) against using the programs directly.
Some slowdown is expected, in exchange for having a better general algorithm.
The GC programs kick in for SetTypeNode128, SetTypeNode130, SetTypeNode1024,
along with the slice variants of those.
It is possible that the cutoff of 128 words (bits) should be raised
in a followup CL, but even with this low cutoff the GC programs are
faster than Go 1.4's "fast path" non-GC program case.

Benchmarks for heapBitsSetType, Go 1.4 vs this CL:

name                    old mean              new mean              delta
SetTypePtr              6.89ns × (1.00,1.00)  5.17ns × (1.00,1.00)  -25.02% (p=0.000)
SetTypePtr8             25.8ns × (0.97,1.05)  21.5ns × (1.00,1.00)  -16.70% (p=0.000)
SetTypePtr16            39.8ns × (0.97,1.02)  24.7ns × (0.99,1.01)  -37.81% (p=0.000)
SetTypePtr32            68.8ns × (0.98,1.01)  32.2ns × (1.00,1.01)  -53.18% (p=0.000)
SetTypePtr64             130ns × (1.00,1.00)    47ns × (1.00,1.00)  -63.67% (p=0.000)
SetTypePtr126            241ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.25% (p=0.000)
SetTypePtr128           2.07µs × (1.00,1.00)  0.08µs × (1.00,1.00)  -96.27% (p=0.000)
SetTypePtrSlice         1.05µs × (0.99,1.01)  0.72µs × (0.99,1.02)  -31.70% (p=0.000)
SetTypeNode1            16.0ns × (0.99,1.01)  20.8ns × (0.99,1.03)  +29.91% (p=0.000)
SetTypeNode1Slice        184ns × (0.99,1.01)   112ns × (0.99,1.01)  -39.26% (p=0.000)
SetTypeNode8            29.5ns × (0.97,1.02)  24.6ns × (1.00,1.00)  -16.50% (p=0.000)
SetTypeNode8Slice        624ns × (0.98,1.02)   285ns × (1.00,1.00)  -54.31% (p=0.000)
SetTypeNode64            135ns × (0.96,1.08)    52ns × (0.99,1.02)  -61.32% (p=0.000)
SetTypeNode64Slice      3.83µs × (1.00,1.00)  1.14µs × (0.99,1.01)  -70.16% (p=0.000)
SetTypeNode64Dead        134ns × (0.99,1.01)    32ns × (1.00,1.01)  -75.74% (p=0.000)
SetTypeNode64DeadSlice  3.83µs × (0.99,1.00)  1.40µs × (1.00,1.01)  -63.42% (p=0.000)
SetTypeNode124           240ns × (0.99,1.01)    79ns × (1.00,1.01)  -67.05% (p=0.000)
SetTypeNode124Slice     7.27µs × (1.00,1.00)  2.04µs × (1.00,1.00)  -71.95% (p=0.000)
SetTypeNode126          2.06µs × (0.99,1.01)  0.08µs × (0.99,1.01)  -96.23% (p=0.000)
SetTypeNode126Slice     64.4µs × (1.00,1.00)   2.0µs × (1.00,1.00)  -96.85% (p=0.000)
SetTypeNode128          2.09µs × (1.00,1.01)  0.12µs × (1.00,1.00)  -94.15% (p=0.000)
SetTypeNode128Slice     65.4µs × (1.00,1.00)   2.4µs × (0.99,1.03)  -96.39% (p=0.000)
SetTypeNode130          2.11µs × (1.00,1.00)  0.12µs × (1.00,1.00)  -94.18% (p=0.000)
SetTypeNode130Slice     66.3µs × (1.00,1.00)   2.4µs × (0.97,1.08)  -96.34% (p=0.000)
SetTypeNode1024         16.0µs × (1.00,1.01)   0.5µs × (1.00,1.00)  -96.65% (p=0.000)
SetTypeNode1024Slice     512µs × (1.00,1.00)    18µs × (0.98,1.04)  -96.45% (p=0.000)

SetTypeNode124 uses a 124 data + 2 ptr = 126-word allocation.
Both Go 1.4 and this CL are using pointer bitmaps for this case,
so that's an overall 3x speedup for using pointer bitmaps.

SetTypeNode128 uses a 128 data + 2 ptr = 130-word allocation.
Both Go 1.4 and this CL are running the GC program for this case,
so that's an overall 17x speedup when using GC programs (and
I've seen >20x on other systems).

Comparing Go 1.4's SetTypeNode124 (pointer bitmap) against
this CL's SetTypeNode128 (GC program), the slow path in the
code in this CL is 2x faster than the fast path in Go 1.4.

The Go 1 benchmarks are basically unaffected compared to just before this CL.

Go 1 benchmarks, before this CL vs this CL:

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.97,1.04)   5.91s × (0.96,1.04)    ~    (p=0.306)
Fannkuch11              4.38s × (1.00,1.00)   4.37s × (1.00,1.01)  -0.22% (p=0.006)
FmtFprintfEmpty        90.7ns × (0.97,1.10)  89.3ns × (0.96,1.09)    ~    (p=0.280)
FmtFprintfString        282ns × (0.98,1.04)   287ns × (0.98,1.07)  +1.72% (p=0.039)
FmtFprintfInt           269ns × (0.99,1.03)   282ns × (0.97,1.04)  +4.87% (p=0.000)
FmtFprintfIntInt        478ns × (0.99,1.02)   481ns × (0.99,1.02)  +0.61% (p=0.048)
FmtFprintfPrefixedInt   399ns × (0.98,1.03)   400ns × (0.98,1.05)    ~    (p=0.533)
FmtFprintfFloat         563ns × (0.99,1.01)   570ns × (1.00,1.01)  +1.37% (p=0.000)
FmtManyArgs            1.89µs × (0.99,1.01)  1.92µs × (0.99,1.02)  +1.88% (p=0.000)
GobDecode              15.2ms × (0.99,1.01)  15.2ms × (0.98,1.05)    ~    (p=0.609)
GobEncode              11.6ms × (0.98,1.03)  11.9ms × (0.98,1.04)  +2.17% (p=0.000)
Gzip                    648ms × (0.99,1.01)   648ms × (1.00,1.01)    ~    (p=0.835)
Gunzip                  142ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.169)
HTTPClientServer       90.5µs × (0.98,1.03)  91.5µs × (0.98,1.04)  +1.04% (p=0.045)
JSONEncode             31.5ms × (0.98,1.03)  31.4ms × (0.98,1.03)    ~    (p=0.549)
JSONDecode              111ms × (0.99,1.01)   107ms × (0.99,1.01)  -3.21% (p=0.000)
Mandelbrot200          6.01ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.878)
GoParse                6.54ms × (0.99,1.02)  6.61ms × (0.99,1.03)  +1.08% (p=0.004)
RegexpMatchEasy0_32     160ns × (1.00,1.01)   161ns × (1.00,1.00)  +0.40% (p=0.000)
RegexpMatchEasy0_1K     560ns × (0.99,1.01)   559ns × (0.99,1.01)    ~    (p=0.088)
RegexpMatchEasy1_32     138ns × (0.99,1.01)   138ns × (1.00,1.00)    ~    (p=0.380)
RegexpMatchEasy1_1K     877ns × (1.00,1.00)   878ns × (1.00,1.00)    ~    (p=0.157)
RegexpMatchMedium_32    251ns × (0.99,1.00)   251ns × (1.00,1.01)  +0.28% (p=0.021)
RegexpMatchMedium_1K   72.6µs × (1.00,1.00)  72.6µs × (1.00,1.00)    ~    (p=0.539)
RegexpMatchHard_32     3.84µs × (1.00,1.00)  3.84µs × (1.00,1.00)    ~    (p=0.378)
RegexpMatchHard_1K      117µs × (1.00,1.00)   117µs × (1.00,1.00)    ~    (p=0.067)
Revcomp                 904ms × (0.99,1.02)   904ms × (0.99,1.01)    ~    (p=0.943)
Template                125ms × (0.99,1.02)   127ms × (0.99,1.01)  +1.79% (p=0.000)
TimeParse               627ns × (0.99,1.01)   622ns × (0.99,1.01)  -0.88% (p=0.000)
TimeFormat              655ns × (0.99,1.02)   655ns × (0.99,1.02)    ~    (p=0.976)

For the record, Go 1 benchmarks, Go 1.4 vs this CL:

name                   old mean              new mean              delta
BinaryTree17            4.61s × (0.97,1.05)   5.91s × (0.98,1.03)  +28.35% (p=0.000)
Fannkuch11              4.40s × (0.99,1.03)   4.41s × (0.99,1.01)     ~    (p=0.212)
FmtFprintfEmpty         102ns × (0.99,1.01)    84ns × (0.99,1.02)  -18.38% (p=0.000)
FmtFprintfString        302ns × (0.98,1.01)   303ns × (0.99,1.02)     ~    (p=0.203)
FmtFprintfInt           313ns × (0.97,1.05)   270ns × (0.99,1.01)  -13.69% (p=0.000)
FmtFprintfIntInt        524ns × (0.98,1.02)   477ns × (0.99,1.00)   -8.87% (p=0.000)
FmtFprintfPrefixedInt   424ns × (0.98,1.02)   386ns × (0.99,1.01)   -8.96% (p=0.000)
FmtFprintfFloat         652ns × (0.98,1.02)   594ns × (0.97,1.05)   -8.97% (p=0.000)
FmtManyArgs            2.13µs × (0.99,1.02)  1.94µs × (0.99,1.01)   -8.92% (p=0.000)
GobDecode              17.1ms × (0.99,1.02)  14.9ms × (0.98,1.03)  -13.07% (p=0.000)
GobEncode              13.5ms × (0.98,1.03)  11.5ms × (0.98,1.03)  -15.25% (p=0.000)
Gzip                    656ms × (0.99,1.02)   647ms × (0.99,1.01)   -1.29% (p=0.000)
Gunzip                  143ms × (0.99,1.02)   144ms × (0.99,1.01)     ~    (p=0.204)
HTTPClientServer       88.2µs × (0.98,1.02)  90.8µs × (0.98,1.01)   +2.93% (p=0.000)
JSONEncode             32.2ms × (0.98,1.02)  30.9ms × (0.97,1.04)   -4.06% (p=0.001)
JSONDecode              121ms × (0.98,1.02)   110ms × (0.98,1.05)   -8.95% (p=0.000)
Mandelbrot200          6.06ms × (0.99,1.01)  6.11ms × (0.98,1.04)     ~    (p=0.184)
GoParse                6.76ms × (0.97,1.04)  6.58ms × (0.98,1.05)   -2.63% (p=0.003)
RegexpMatchEasy0_32     195ns × (1.00,1.01)   155ns × (0.99,1.01)  -20.43% (p=0.000)
RegexpMatchEasy0_1K     479ns × (0.98,1.03)   535ns × (0.99,1.02)  +11.59% (p=0.000)
RegexpMatchEasy1_32     169ns × (0.99,1.02)   131ns × (0.99,1.03)  -22.44% (p=0.000)
RegexpMatchEasy1_1K    1.53µs × (0.99,1.01)  0.87µs × (0.99,1.02)  -43.07% (p=0.000)
RegexpMatchMedium_32    334ns × (0.99,1.01)   242ns × (0.99,1.01)  -27.53% (p=0.000)
RegexpMatchMedium_1K    125µs × (1.00,1.01)    72µs × (0.99,1.03)  -42.53% (p=0.000)
RegexpMatchHard_32     6.03µs × (0.99,1.01)  3.79µs × (0.99,1.01)  -37.12% (p=0.000)
RegexpMatchHard_1K      189µs × (0.99,1.02)   115µs × (0.99,1.01)  -39.20% (p=0.000)
Revcomp                 935ms × (0.96,1.03)   926ms × (0.98,1.02)     ~    (p=0.083)
Template                146ms × (0.97,1.05)   119ms × (0.99,1.01)  -18.37% (p=0.000)
TimeParse               660ns × (0.99,1.01)   624ns × (0.99,1.02)   -5.43% (p=0.000)
TimeFormat              670ns × (0.98,1.02)   710ns × (1.00,1.01)   +5.97% (p=0.000)

This CL is a bit larger than I would like, but the compiler, linker, runtime,
and package reflect all need to be in sync about the format of these programs,
so there is no easy way to split this into independent changes (at least
while keeping the build working at each change).

Fixes #9625.
Fixes #10524.

Change-Id: I9e3e20d6097099d0f8532d1cb5b1af528804989a
Reviewed-on: https://go-review.googlesource.com/9888
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
2015-05-16 00:38:17 +00:00
Russ Cox c3c047a6a3 runtime: test and fix heap bitmap for 1-pointer allocation on 32-bit system
Change-Id: Ic064fe7c6bd3304dcc8c3f7b3b5393870b5387c2
Reviewed-on: https://go-review.googlesource.com/10119
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-15 18:47:00 +00:00
Russ Cox 65c4d7beab runtime: optimize heapBitsBulkBarrier a tiny amount
This may be mostly noise but:

name                   old mean              new mean              delta
BinaryTree17            6.03s × (0.98,1.02)   5.98s × (0.97,1.03)    ~    (p=0.306)
Fannkuch11              4.42s × (0.99,1.01)   4.34s × (0.99,1.02)  -1.83% (p=0.000)
FmtFprintfEmpty        84.7ns × (0.99,1.01)  84.4ns × (1.00,1.00)    ~    (p=0.138)
FmtFprintfString        289ns × (0.98,1.02)   289ns × (1.00,1.01)    ~    (p=0.509)
FmtFprintfInt           280ns × (0.97,1.03)   272ns × (0.98,1.03)  -2.64% (p=0.003)
FmtFprintfIntInt        484ns × (0.98,1.02)   482ns × (0.98,1.03)    ~    (p=0.606)
FmtFprintfPrefixedInt   397ns × (0.98,1.03)   393ns × (0.99,1.02)    ~    (p=0.064)
FmtFprintfFloat         573ns × (0.99,1.01)   569ns × (0.99,1.01)  -0.69% (p=0.023)
FmtManyArgs            1.89µs × (0.99,1.02)  1.91µs × (0.98,1.02)    ~    (p=0.219)
GobDecode              15.4ms × (0.99,1.02)  15.1ms × (0.99,1.01)  -2.05% (p=0.000)
GobEncode              12.0ms × (0.97,1.04)  11.9ms × (0.97,1.03)    ~    (p=0.458)
Gzip                    652ms × (0.99,1.01)   653ms × (0.99,1.01)    ~    (p=0.743)
Gunzip                  144ms × (0.99,1.01)   143ms × (0.99,1.01)    ~    (p=0.134)
HTTPClientServer       91.6µs × (0.99,1.01)  91.8µs × (0.99,1.03)    ~    (p=0.678)
JSONEncode             31.9ms × (1.00,1.00)  32.0ms × (0.99,1.01)    ~    (p=0.334)
JSONDecode              110ms × (0.99,1.01)   110ms × (0.99,1.01)    ~    (p=0.315)
Mandelbrot200          6.04ms × (0.99,1.01)  6.04ms × (1.00,1.01)    ~    (p=0.596)
GoParse                6.72ms × (0.98,1.03)  6.74ms × (0.99,1.03)    ~    (p=0.577)
RegexpMatchEasy0_32     161ns × (0.99,1.01)   160ns × (1.00,1.00)  -0.83% (p=0.002)
RegexpMatchEasy0_1K     542ns × (0.99,1.02)   541ns × (0.99,1.01)    ~    (p=0.396)
RegexpMatchEasy1_32     140ns × (0.98,1.01)   137ns × (1.00,1.00)  -2.12% (p=0.000)
RegexpMatchEasy1_1K     892ns × (0.99,1.01)   891ns × (1.00,1.01)    ~    (p=0.631)
RegexpMatchMedium_32    255ns × (0.99,1.01)   253ns × (0.99,1.01)  -0.76% (p=0.008)
RegexpMatchMedium_1K   73.1µs × (1.00,1.01)  72.9µs × (1.00,1.00)    ~    (p=0.229)
RegexpMatchHard_32     3.86µs × (1.00,1.01)  3.85µs × (1.00,1.00)    ~    (p=0.341)
RegexpMatchHard_1K      117µs × (1.00,1.01)   117µs × (0.99,1.00)    ~    (p=0.955)
Revcomp                 954ms × (0.97,1.03)   955ms × (0.98,1.02)    ~    (p=0.894)
Template                133ms × (0.97,1.05)   129ms × (0.99,1.02)  -2.50% (p=0.014)
TimeParse               629ns × (0.99,1.01)   626ns × (0.99,1.01)    ~    (p=0.106)
TimeFormat              663ns × (0.99,1.01)   660ns × (0.99,1.02)    ~    (p=0.231)

Change-Id: I580e03ed01b0629cb5eae4c4637618f20127f924
Reviewed-on: https://go-review.googlesource.com/9994
Reviewed-by: Austin Clements <austin@google.com>
2015-05-15 13:52:00 +00:00
Russ Cox ecfe42cab0 runtime: keep pointer bits set always in 1-word spans
It's dumb to clear them in initSpan, set them in heapBitsSetType,
clear them in heapBitsSweepSpan, set them again in heapBitsSetType,
clear them again in heapBitsSweepSpan, and so on.

Set them in initSpan and be done with it (until the span is reused
for objects of a different size).

This avoids an atomic operation in a common case (one-word allocation).
Suggested by rlh.

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.97,1.03)   5.93s × (0.98,1.04)              ~    (p=0.056)
Fannkuch11              4.34s × (1.00,1.01)   4.41s × (1.00,1.00)            +1.42% (p=0.000)
FmtFprintfEmpty        86.1ns × (0.98,1.03)  88.9ns × (0.95,1.14)              ~    (p=0.066)
FmtFprintfString        292ns × (0.97,1.04)   284ns × (0.98,1.03)            -2.64% (p=0.000)
FmtFprintfInt           271ns × (0.98,1.06)   274ns × (0.98,1.05)              ~    (p=0.148)
FmtFprintfIntInt        478ns × (0.98,1.05)   487ns × (0.98,1.03)            +1.85% (p=0.004)
FmtFprintfPrefixedInt   397ns × (0.98,1.05)   394ns × (0.98,1.02)              ~    (p=0.184)
FmtFprintfFloat         553ns × (0.99,1.02)   543ns × (0.99,1.01)            -1.71% (p=0.000)
FmtManyArgs            1.90µs × (0.98,1.05)  1.88µs × (0.99,1.01)            -0.97% (p=0.037)
GobDecode              15.1ms × (0.99,1.01)  15.3ms × (0.99,1.01)            +0.78% (p=0.001)
GobEncode              11.7ms × (0.98,1.05)  11.6ms × (0.99,1.02)            -1.39% (p=0.009)
Gzip                    646ms × (1.00,1.01)   647ms × (1.00,1.01)              ~    (p=0.120)
Gunzip                  142ms × (1.00,1.00)   142ms × (1.00,1.00)              ~    (p=0.068)
HTTPClientServer       89.7µs × (0.99,1.01)  90.1µs × (0.98,1.03)              ~    (p=0.224)
JSONEncode             31.3ms × (0.99,1.01)  31.2ms × (0.99,1.02)              ~    (p=0.149)
JSONDecode              113ms × (0.99,1.01)   111ms × (0.99,1.01)            -1.25% (p=0.000)
Mandelbrot200          6.01ms × (1.00,1.00)  6.01ms × (1.00,1.00)            +0.09% (p=0.015)
GoParse                6.63ms × (0.98,1.03)  6.55ms × (0.99,1.02)            -1.10% (p=0.006)
RegexpMatchEasy0_32     161ns × (1.00,1.00)   161ns × (1.00,1.00)  (sample has zero variance)
RegexpMatchEasy0_1K     539ns × (0.99,1.01)   563ns × (0.99,1.01)            +4.51% (p=0.000)
RegexpMatchEasy1_32     140ns × (0.99,1.01)   141ns × (0.99,1.01)            +1.34% (p=0.000)
RegexpMatchEasy1_1K     886ns × (1.00,1.01)   888ns × (1.00,1.00)            +0.20% (p=0.003)
RegexpMatchMedium_32    252ns × (1.00,1.02)   255ns × (0.99,1.01)            +1.32% (p=0.000)
RegexpMatchMedium_1K   72.7µs × (1.00,1.00)  72.6µs × (1.00,1.00)              ~    (p=0.296)
RegexpMatchHard_32     3.84µs × (1.00,1.01)  3.84µs × (1.00,1.00)              ~    (p=0.339)
RegexpMatchHard_1K      117µs × (1.00,1.01)   117µs × (1.00,1.00)            -0.28% (p=0.022)
Revcomp                 914ms × (0.99,1.01)   909ms × (0.99,1.01)            -0.49% (p=0.031)
Template                128ms × (0.99,1.01)   127ms × (0.99,1.01)            -1.10% (p=0.000)
TimeParse               628ns × (0.99,1.01)   639ns × (0.99,1.01)            +1.69% (p=0.000)
TimeFormat              660ns × (0.99,1.01)   662ns × (0.99,1.02)              ~    (p=0.287)

Change-Id: I3127b0ab89708267c74aa7d0eae1db1a1bcdfda5
Reviewed-on: https://go-review.googlesource.com/9884
Reviewed-by: Austin Clements <austin@google.com>
2015-05-14 15:58:54 +00:00
Russ Cox 94934f843e runtime: rewrite addb/subtractb to be simpler to compile; introduce add1, subtract1
This reduces the depth of the inlining at a particular call site.
The inliner introduces many temporary variables, and the compiler can do
a better job with fewer. Being verbose in the bodies of these helper functions
seems like a reasonable tradeoff: the uses are still just as readable, and
they run faster in some important cases.

Change-Id: I5323976ed3704d0acd18fb31176cfbf5ba23a89c
Reviewed-on: https://go-review.googlesource.com/9883
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-14 15:55:42 +00:00
Russ Cox 5b3739357a runtime: skip atomics in heapBitsSetType when GC is not running
Suggested by Rick during code review of this code,
but separated out for easier diagnosis in case it causes
problems (and also easier rollback).

name                    old mean              new mean              delta
SetTypePtr              13.9ns × (0.98,1.05)   6.2ns × (0.99,1.01)  -55.18% (p=0.000)
SetTypePtr8             15.5ns × (0.95,1.10)  15.5ns × (0.99,1.05)     ~    (p=0.952)
SetTypePtr16            17.8ns × (0.99,1.05)  18.0ns × (1.00,1.00)     ~    (p=0.157)
SetTypePtr32            25.2ns × (0.99,1.01)  24.3ns × (0.99,1.01)   -3.86% (p=0.000)
SetTypePtr64            42.2ns × (0.93,1.13)  40.8ns × (0.99,1.01)     ~    (p=0.239)
SetTypePtr126           67.3ns × (1.00,1.00)  67.5ns × (0.99,1.02)     ~    (p=0.365)
SetTypePtr128           67.6ns × (1.00,1.01)  70.1ns × (0.97,1.10)     ~    (p=0.063)
SetTypePtrSlice          575ns × (0.98,1.06)   543ns × (0.95,1.17)   -5.54% (p=0.034)
SetTypeNode1            12.4ns × (0.98,1.09)  12.8ns × (0.99,1.01)   +3.40% (p=0.021)
SetTypeNode1Slice       97.1ns × (0.97,1.09)  89.5ns × (1.00,1.00)   -7.78% (p=0.000)
SetTypeNode8            29.8ns × (1.00,1.01)  17.7ns × (1.00,1.01)  -40.74% (p=0.000)
SetTypeNode8Slice        204ns × (0.99,1.04)   190ns × (0.97,1.06)   -6.96% (p=0.000)
SetTypeNode64           42.8ns × (0.99,1.01)  44.0ns × (0.95,1.12)     ~    (p=0.163)
SetTypeNode64Slice      1.00µs × (0.95,1.09)  0.98µs × (0.96,1.08)     ~    (p=0.356)
SetTypeNode64Dead       12.2ns × (0.99,1.04)  12.7ns × (1.00,1.01)   +4.34% (p=0.000)
SetTypeNode64DeadSlice  1.14µs × (0.94,1.11)  0.99µs × (0.99,1.03)  -13.74% (p=0.000)
SetTypeNode124          67.9ns × (0.99,1.03)  70.4ns × (0.95,1.15)     ~    (p=0.115)
SetTypeNode124Slice     1.76µs × (0.99,1.04)  1.88µs × (0.91,1.23)     ~    (p=0.096)
SetTypeNode126          67.7ns × (1.00,1.01)  68.2ns × (0.99,1.02)   +0.72% (p=0.014)
SetTypeNode126Slice     1.76µs × (1.00,1.01)  1.87µs × (0.93,1.15)   +6.15% (p=0.035)
SetTypeNode1024          462ns × (0.96,1.10)   451ns × (0.99,1.05)     ~    (p=0.224)
SetTypeNode1024Slice    14.4µs × (0.95,1.15)  14.2µs × (0.97,1.19)     ~    (p=0.676)

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.98,1.04)   5.87s × (0.98,1.03)    ~    (p=0.993)
Fannkuch11              4.39s × (0.99,1.01)   4.34s × (1.00,1.01)  -1.22% (p=0.000)
FmtFprintfEmpty        90.6ns × (0.97,1.06)  89.4ns × (0.97,1.03)    ~    (p=0.070)
FmtFprintfString        305ns × (0.98,1.02)   296ns × (0.99,1.02)  -2.94% (p=0.000)
FmtFprintfInt           276ns × (0.97,1.04)   270ns × (0.98,1.03)  -2.17% (p=0.001)
FmtFprintfIntInt        490ns × (0.97,1.05)   473ns × (0.99,1.02)  -3.59% (p=0.000)
FmtFprintfPrefixedInt   402ns × (0.99,1.02)   397ns × (0.99,1.01)  -1.15% (p=0.000)
FmtFprintfFloat         577ns × (0.99,1.01)   549ns × (0.99,1.01)  -4.78% (p=0.000)
FmtManyArgs            1.89µs × (0.99,1.02)  1.87µs × (0.99,1.01)  -1.43% (p=0.000)
GobDecode              15.2ms × (0.99,1.01)  14.7ms × (0.99,1.02)  -3.55% (p=0.000)
GobEncode              11.7ms × (0.98,1.04)  11.5ms × (0.99,1.02)  -1.63% (p=0.002)
Gzip                    647ms × (0.99,1.01)   647ms × (1.00,1.01)    ~    (p=0.486)
Gunzip                  142ms × (1.00,1.00)   143ms × (1.00,1.00)    ~    (p=0.234)
HTTPClientServer       90.7µs × (0.99,1.01)  90.4µs × (0.98,1.04)    ~    (p=0.331)
JSONEncode             31.9ms × (0.97,1.06)  31.6ms × (0.98,1.02)    ~    (p=0.206)
JSONDecode              110ms × (0.99,1.01)   112ms × (0.99,1.02)  +1.48% (p=0.000)
Mandelbrot200          6.00ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.058)
GoParse                6.63ms × (0.98,1.03)  6.61ms × (0.98,1.02)    ~    (p=0.353)
RegexpMatchEasy0_32     162ns × (0.99,1.01)   161ns × (1.00,1.00)  -0.33% (p=0.004)
RegexpMatchEasy0_1K     539ns × (0.99,1.01)   540ns × (0.99,1.02)    ~    (p=0.222)
RegexpMatchEasy1_32     139ns × (0.99,1.01)   140ns × (0.97,1.03)    ~    (p=0.054)
RegexpMatchEasy1_1K     886ns × (1.00,1.00)   887ns × (1.00,1.00)  +0.18% (p=0.001)
RegexpMatchMedium_32    252ns × (1.00,1.01)   252ns × (1.00,1.00)  +0.21% (p=0.010)
RegexpMatchMedium_1K   72.7µs × (1.00,1.01)  72.6µs × (1.00,1.00)    ~    (p=0.060)
RegexpMatchHard_32     3.84µs × (1.00,1.00)  3.84µs × (1.00,1.00)    ~    (p=0.065)
RegexpMatchHard_1K      117µs × (1.00,1.00)   117µs × (1.00,1.00)  -0.27% (p=0.000)
Revcomp                 916ms × (0.98,1.04)   909ms × (0.99,1.01)    ~    (p=0.054)
Template                126ms × (0.99,1.01)   128ms × (0.99,1.02)  +1.43% (p=0.000)
TimeParse               632ns × (0.99,1.01)   625ns × (1.00,1.01)  -1.05% (p=0.000)
TimeFormat              655ns × (0.99,1.02)   669ns × (0.99,1.02)  +2.01% (p=0.000)

Change-Id: I9477b7c9489c6fa98e860c190ce06cd73c53c6a1
Reviewed-on: https://go-review.googlesource.com/9829
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-14 15:54:53 +00:00
Russ Cox 4212a3c3d9 runtime: use heap bitmap for typedmemmove
The current implementation of typedmemmove walks the ptrmask
in the type to find out where pointers are. This led to turning off
GC programs for the Go 1.5 dev cycle, so that there would always
be a ptrmask. Instead of also interpreting the GC programs,
interpret the heap bitmap, which we know must be available and
up to date. (There is no point to write barriers when writing outside
the heap.)

This CL is only about correctness. The next CL will optimize the code.

Change-Id: Id1305c7c071fd2734ab96634b0e1c745b23fa793
Reviewed-on: https://go-review.googlesource.com/9886
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 16:38:21 +00:00
Russ Cox 266a842f55 runtime: zero entire bitmap for object, even past dead marker
We want typedmemmove to use the heap bitmap to determine
where pointers are, instead of reinterpreting the type information.
The heap bitmap is simpler to access.

In general, typedmemmove will need to be able to look up the bits
for any word and find valid pointer information, so fill even after the
dead marker. Not filling after the dead marker was an optimization
I introduced only a few days ago, when reintroducing the dead marker
code. At the time I said it probably wouldn't last, and it didn't.

Change-Id: I6ba01bff17ddee1ff429f454abe29867ec60606e
Reviewed-on: https://go-review.googlesource.com/9885
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 16:37:46 +00:00
Russ Cox e375ca2a25 runtime: reorder bits in heap bitmap bytes
The runtime deals with 1-bit pointer bitmaps and 2-bit heap bitmaps
that have entries for both pointers and mark bits.

Each byte in a 1-bit pointer bitmap looks like pppppppp (all pointer bits).
Each byte in a 2-bit heap bitmap looks like mpmpmpmp (mark, pointer, ...).
This means that when converting from 1-bit to 2-bit, as we do
during malloc, we have to pick up 4 bits in pppp form and use
shifts to create the mpmpmpmp form.

This CL changes the 2-bit heap bitmap form to mmmmpppp,
so that 4 bits picked up in 1-bit form can be used directly in
the low bits of the heap bitmap byte, without expansion.
This simplifies the code, and it also happens to be faster.

name                    old mean              new mean              delta
SetTypePtr              14.0ns × (0.98,1.09)  14.0ns × (0.98,1.08)     ~    (p=0.966)
SetTypePtr8             16.5ns × (0.99,1.05)  15.3ns × (0.96,1.16)   -6.86% (p=0.012)
SetTypePtr16            21.3ns × (0.98,1.05)  18.8ns × (0.94,1.14)  -11.49% (p=0.000)
SetTypePtr32            34.6ns × (0.93,1.22)  27.7ns × (0.91,1.26)  -20.08% (p=0.001)
SetTypePtr64            55.7ns × (0.97,1.11)  41.6ns × (0.98,1.04)  -25.30% (p=0.000)
SetTypePtr126           98.0ns × (1.00,1.00)  67.7ns × (0.99,1.05)  -30.88% (p=0.000)
SetTypePtr128           98.6ns × (1.00,1.01)  68.6ns × (0.99,1.03)  -30.44% (p=0.000)
SetTypePtrSlice          781ns × (0.99,1.01)   571ns × (0.99,1.04)  -26.93% (p=0.000)
SetTypeNode1            13.1ns × (0.99,1.01)  12.1ns × (0.99,1.01)   -7.45% (p=0.000)
SetTypeNode1Slice        113ns × (0.99,1.01)    94ns × (1.00,1.00)  -16.35% (p=0.000)
SetTypeNode8            32.7ns × (1.00,1.00)  29.8ns × (0.99,1.01)   -8.97% (p=0.000)
SetTypeNode8Slice        266ns × (1.00,1.00)   204ns × (1.00,1.00)  -23.40% (p=0.000)
SetTypeNode64           58.0ns × (0.98,1.08)  42.8ns × (1.00,1.01)  -26.24% (p=0.000)
SetTypeNode64Slice      1.55µs × (0.99,1.02)  0.96µs × (1.00,1.00)  -37.84% (p=0.000)
SetTypeNode64Dead       13.1ns × (0.99,1.01)  12.1ns × (1.00,1.00)   -7.33% (p=0.000)
SetTypeNode64DeadSlice  1.52µs × (1.00,1.01)  1.08µs × (1.00,1.01)  -28.95% (p=0.000)
SetTypeNode124          97.9ns × (1.00,1.00)  67.1ns × (1.00,1.01)  -31.49% (p=0.000)
SetTypeNode124Slice     2.87µs × (0.99,1.02)  1.75µs × (1.00,1.01)  -39.15% (p=0.000)
SetTypeNode126          98.4ns × (1.00,1.01)  68.1ns × (1.00,1.01)  -30.79% (p=0.000)
SetTypeNode126Slice     2.91µs × (0.99,1.01)  1.77µs × (0.99,1.01)  -39.09% (p=0.000)
SetTypeNode1024          732ns × (1.00,1.00)   511ns × (0.87,1.42)  -30.14% (p=0.000)
SetTypeNode1024Slice    23.1µs × (1.00,1.00)  13.9µs × (0.99,1.02)  -39.83% (p=0.000)

Change-Id: I12e3b850a4e6fa6c8146b8635ff728f3ef658819
Reviewed-on: https://go-review.googlesource.com/9828
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 16:37:36 +00:00
Russ Cox 54af9a3ba5 runtime: reintroduce ``dead'' space during GC scan
Reintroduce an optimization discarded during the initial conversion
from 4-bit heap bitmaps to 2-bit heap bitmaps: when we reach the
place in the bitmap where there are no more pointers, mark that position
for the GC so that it can avoid scanning past that place.

During heapBitsSetType we can also avoid initializing heap bitmap
beyond that location, which gives a bit of a win compared to Go 1.4.
This particular optimization (not initializing the heap bitmap) may not last:
we might change typedmemmove to use the heap bitmap, in which
case it would all need to be initialized. The early stop in the GC scan
will stay no matter what.

Compared to Go 1.4 (github.com/rsc/go, branch go14bench):
name                    old mean              new mean              delta
SetTypeNode64           80.7ns × (1.00,1.01)  57.4ns × (1.00,1.01)  -28.83% (p=0.000)
SetTypeNode64Dead       80.5ns × (1.00,1.01)  13.1ns × (0.99,1.02)  -83.77% (p=0.000)
SetTypeNode64Slice      2.16µs × (1.00,1.01)  1.54µs × (1.00,1.01)  -28.75% (p=0.000)
SetTypeNode64DeadSlice  2.16µs × (1.00,1.01)  1.52µs × (1.00,1.00)  -29.74% (p=0.000)

Compared to previous CL:
name                    old mean              new mean              delta
SetTypeNode64           56.7ns × (1.00,1.00)  57.4ns × (1.00,1.01)   +1.19% (p=0.000)
SetTypeNode64Dead       57.2ns × (1.00,1.00)  13.1ns × (0.99,1.02)  -77.15% (p=0.000)
SetTypeNode64Slice      1.56µs × (1.00,1.01)  1.54µs × (1.00,1.01)   -0.89% (p=0.000)
SetTypeNode64DeadSlice  1.55µs × (1.00,1.01)  1.52µs × (1.00,1.00)   -2.23% (p=0.000)

This is the last CL in the sequence converting from the 4-bit heap
to the 2-bit heap, with all the same optimizations reenabled.
Compared to before that process began (compared to CL 9701 patch set 1):

name                    old mean              new mean              delta
BinaryTree17             5.87s × (0.94,1.09)   5.91s × (0.96,1.06)    ~    (p=0.578)
Fannkuch11               4.32s × (1.00,1.00)   4.32s × (1.00,1.00)    ~    (p=0.474)
FmtFprintfEmpty         89.1ns × (0.95,1.16)  89.0ns × (0.93,1.10)    ~    (p=0.942)
FmtFprintfString         283ns × (0.98,1.02)   298ns × (0.98,1.06)  +5.33% (p=0.000)
FmtFprintfInt            284ns × (0.98,1.04)   286ns × (0.98,1.03)    ~    (p=0.208)
FmtFprintfIntInt         486ns × (0.98,1.03)   498ns × (0.97,1.06)  +2.48% (p=0.000)
FmtFprintfPrefixedInt    400ns × (0.99,1.02)   408ns × (0.98,1.02)  +2.23% (p=0.000)
FmtFprintfFloat          566ns × (0.99,1.01)   587ns × (0.98,1.01)  +3.69% (p=0.000)
FmtManyArgs             1.91µs × (0.99,1.02)  1.94µs × (0.99,1.02)  +1.81% (p=0.000)
GobDecode               15.5ms × (0.98,1.05)  15.8ms × (0.98,1.03)  +1.94% (p=0.002)
GobEncode               11.9ms × (0.97,1.03)  12.0ms × (0.96,1.09)    ~    (p=0.263)
Gzip                     648ms × (0.99,1.01)   648ms × (0.99,1.01)    ~    (p=0.992)
Gunzip                   143ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.585)
HTTPClientServer        89.2µs × (0.99,1.02)  90.3µs × (0.98,1.01)  +1.24% (p=0.000)
JSONEncode              32.3ms × (0.97,1.06)  31.6ms × (0.99,1.01)  -2.29% (p=0.000)
JSONDecode               106ms × (0.99,1.01)   107ms × (1.00,1.01)  +0.62% (p=0.000)
Mandelbrot200           6.02ms × (1.00,1.00)  6.03ms × (1.00,1.01)    ~    (p=0.250)
GoParse                 6.57ms × (0.97,1.06)  6.53ms × (0.99,1.03)    ~    (p=0.243)
RegexpMatchEasy0_32      162ns × (1.00,1.00)   161ns × (1.00,1.01)  -0.80% (p=0.000)
RegexpMatchEasy0_1K      561ns × (0.99,1.02)   541ns × (0.99,1.01)  -3.67% (p=0.000)
RegexpMatchEasy1_32      145ns × (0.95,1.04)   138ns × (1.00,1.00)  -5.04% (p=0.000)
RegexpMatchEasy1_1K      864ns × (0.99,1.04)   887ns × (0.99,1.01)  +2.57% (p=0.000)
RegexpMatchMedium_32     255ns × (0.99,1.04)   253ns × (0.99,1.01)  -1.05% (p=0.012)
RegexpMatchMedium_1K    73.9µs × (0.98,1.04)  72.8µs × (1.00,1.00)  -1.51% (p=0.005)
RegexpMatchHard_32      3.92µs × (0.98,1.04)  3.85µs × (1.00,1.01)  -1.88% (p=0.002)
RegexpMatchHard_1K       120µs × (0.98,1.04)   117µs × (1.00,1.01)  -2.02% (p=0.001)
Revcomp                  936ms × (0.95,1.08)   922ms × (0.97,1.08)    ~    (p=0.234)
Template                 130ms × (0.98,1.04)   126ms × (0.99,1.01)  -2.99% (p=0.000)
TimeParse                638ns × (0.98,1.05)   628ns × (0.99,1.01)  -1.54% (p=0.004)
TimeFormat               674ns × (0.99,1.01)   668ns × (0.99,1.01)  -0.80% (p=0.001)

The slowdown of the first few benchmarks seems to be due to the new
atomic operations for certain small size allocations. But the larger
benchmarks mostly improve, probably due to the decreased memory
pressure from having half as much heap bitmap.

CL 9706, which removes the (never used anymore) wbshadow mode,
gets back what is lost in the early microbenchmarks.

Change-Id: I37423a209e8ec2a2e92538b45cac5422a6acd32d
Reviewed-on: https://go-review.googlesource.com/9705
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-11 14:51:40 +00:00
Russ Cox feb8a3b616 runtime: optimize heapBitsSetType
For the conversion of the heap bitmap from 4-bit to 2-bit fields,
I replaced heapBitsSetType with the dumbest thing that could possibly work:
two atomic operations (atomicand8+atomicor8) per 2-bit field.

This CL replaces that code with a proper implementation that
avoids the atomics whenever possible. Benchmarks vs base CL
(before the conversion to 2-bit heap bitmap) and vs Go 1.4 below.

Compared to Go 1.4, SetTypePtr (a 1-pointer allocation)
is 10ns slower because a race against the concurrent GC requires the
use of an atomicor8 that used to be an ordinary write. This slowdown
was present even in the base CL.

Compared to both Go 1.4 and base, SetTypeNode8 (a 10-word allocation)
is 10ns slower because it too needs a new atomic, because with the
denser representation, the byte on the end of the allocation is now shared
with the object next to it; this was not true with the 4-bit representation.

Excluding these two (fundamental) slowdowns due to the use of atomics,
the new code is noticeably faster than both Go 1.4 and the base CL.

The next CL will reintroduce the ``typeDead'' optimization.

Stats are from 5 runs on a MacBookPro10,2 (late 2012 Core i5).

Compared to base CL (** = new atomic)
name                  old mean              new mean              delta
SetTypePtr            14.1ns × (0.99,1.02)  14.7ns × (0.93,1.10)     ~    (p=0.175)
SetTypePtr8           18.4ns × (1.00,1.01)  18.6ns × (0.81,1.21)     ~    (p=0.866)
SetTypePtr16          28.7ns × (1.00,1.00)  22.4ns × (0.90,1.27)  -21.88% (p=0.015)
SetTypePtr32          52.3ns × (1.00,1.00)  33.8ns × (0.93,1.24)  -35.37% (p=0.001)
SetTypePtr64          79.2ns × (1.00,1.00)  55.1ns × (1.00,1.01)  -30.43% (p=0.000)
SetTypePtr126          118ns × (1.00,1.00)   100ns × (1.00,1.00)  -15.97% (p=0.000)
SetTypePtr128          130ns × (0.92,1.19)    98ns × (1.00,1.00)  -24.36% (p=0.008)
SetTypePtrSlice        726ns × (0.96,1.08)   760ns × (1.00,1.00)     ~    (p=0.152)
SetTypeNode1          14.1ns × (0.94,1.15)  12.0ns × (1.00,1.01)  -14.60% (p=0.020)
SetTypeNode1Slice      135ns × (0.96,1.07)    88ns × (1.00,1.00)  -34.53% (p=0.000)
SetTypeNode8          20.9ns × (1.00,1.01)  32.6ns × (1.00,1.00)  +55.37% (p=0.000) **
SetTypeNode8Slice      414ns × (0.99,1.02)   244ns × (1.00,1.00)  -41.09% (p=0.000)
SetTypeNode64         80.0ns × (1.00,1.00)  57.4ns × (1.00,1.00)  -28.23% (p=0.000)
SetTypeNode64Slice    2.15µs × (1.00,1.01)  1.56µs × (1.00,1.00)  -27.43% (p=0.000)
SetTypeNode124         119ns × (0.99,1.00)   100ns × (1.00,1.00)  -16.11% (p=0.000)
SetTypeNode124Slice   3.40µs × (1.00,1.00)  2.93µs × (1.00,1.00)  -13.80% (p=0.000)
SetTypeNode126         120ns × (1.00,1.01)    98ns × (1.00,1.00)  -18.19% (p=0.000)
SetTypeNode126Slice   3.53µs × (0.98,1.08)  3.02µs × (1.00,1.00)  -14.49% (p=0.002)
SetTypeNode1024        726ns × (0.97,1.09)   740ns × (1.00,1.00)     ~    (p=0.451)
SetTypeNode1024Slice  24.9µs × (0.89,1.37)  23.1µs × (1.00,1.00)     ~    (p=0.476)

Compared to Go 1.4 (** = new atomic)
name                  old mean               new mean              delta
SetTypePtr            5.71ns × (0.89,1.19)  14.68ns × (0.93,1.10)  +157.24% (p=0.000) **
SetTypePtr8           19.3ns × (0.96,1.10)   18.6ns × (0.81,1.21)      ~    (p=0.638)
SetTypePtr16          30.7ns × (0.99,1.03)   22.4ns × (0.90,1.27)   -26.88% (p=0.005)
SetTypePtr32          51.5ns × (1.00,1.00)   33.8ns × (0.93,1.24)   -34.40% (p=0.001)
SetTypePtr64          83.6ns × (0.94,1.12)   55.1ns × (1.00,1.01)   -34.12% (p=0.001)
SetTypePtr126          137ns × (0.87,1.26)    100ns × (1.00,1.00)   -27.10% (p=0.028)
SetTypePtrSlice        865ns × (0.80,1.23)    760ns × (1.00,1.00)      ~    (p=0.243)
SetTypeNode1          15.2ns × (0.88,1.12)   12.0ns × (1.00,1.01)   -20.89% (p=0.014)
SetTypeNode1Slice      156ns × (0.93,1.16)     88ns × (1.00,1.00)   -43.57% (p=0.001)
SetTypeNode8          23.8ns × (0.90,1.18)   32.6ns × (1.00,1.00)   +36.76% (p=0.003) **
SetTypeNode8Slice      502ns × (0.92,1.10)    244ns × (1.00,1.00)   -51.46% (p=0.000)
SetTypeNode64         85.6ns × (0.94,1.11)   57.4ns × (1.00,1.00)   -32.89% (p=0.001)
SetTypeNode64Slice    2.36µs × (0.91,1.14)   1.56µs × (1.00,1.00)   -33.96% (p=0.002)
SetTypeNode124         130ns × (0.91,1.12)    100ns × (1.00,1.00)   -23.49% (p=0.004)
SetTypeNode124Slice   3.81µs × (0.90,1.22)   2.93µs × (1.00,1.00)   -23.09% (p=0.025)

There are fewer benchmarks vs Go 1.4 because unrolling directly
into the heap bitmap is not yet implemented, so those would not
be meaningful comparisons.

These benchmarks were not present in Go 1.4 as distributed.
The backport to Go 1.4 is in github.com/rsc/go's go14bench branch,
commit 71d5ee5.

Change-Id: I95ed05a22bf484b0fc9efad549279e766c98d2b6
Reviewed-on: https://go-review.googlesource.com/9704
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-11 14:51:20 +00:00
Russ Cox 0234dfd493 runtime: use 2-bit heap bitmap (in place of 4-bit)
Previous CLs changed the representation of the non-heap type bitmaps
to be 1-bit bitmaps (pointer or not). Before this CL, the heap bitmap
stored a 2-bit type for each word and a mark bit and checkmark bit
for the first word of the object. (There used to be additional per-word bits.)

Reduce heap bitmap to 2-bit, with 1 dedicated to pointer or not,
and the other used for mark, checkmark, and "keep scanning forward
to find pointers in this object." See comments for details.

This CL replaces heapBitsSetType with very slow but obviously correct code.
A followup CL will optimize it. (Spoiler: the new code is faster than Go 1.4 was.)

Change-Id: I999577a133f3cfecacebdec9cdc3573c235c7fb9
Reviewed-on: https://go-review.googlesource.com/9703
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 14:43:45 +00:00
Russ Cox 6d8a147bef runtime: use 1-bit pointer bitmaps in type representation
The type information in reflect.Type and the GC programs is now
1 bit per word, down from 2 bits.

The in-memory unrolled type bitmap representation are now
1 bit per word, down from 4 bits.

The conversion from the unrolled (now 1-bit) bitmap to the
heap bitmap (still 4-bit) is not optimized. A followup CL will
work on that, after the heap bitmap has been converted to 2-bit.

The typeDead optimization, in which a special value denotes
that there are no more pointers anywhere in the object, is lost
in this CL. A followup CL will bring it back in the final form of
heapBitsSetType.

Change-Id: If61e67950c16a293b0b516a6fd9a1c755b6d5549
Reviewed-on: https://go-review.googlesource.com/9702
Reviewed-by: Austin Clements <austin@google.com>
2015-05-11 14:43:33 +00:00
Russ Cox 4d0f3a1c95 cmd/internal/gc, runtime: use 1-bit bitmap for stack frames, data, bss
The bitmaps were 2 bits per pointer because we needed to distinguish
scalar, pointer, multiword, and we used the leftover value to distinguish
uninitialized from scalar, even though the garbage collector (GC) didn't care.

Now that there are no multiword structures from the GC's point of view,
cut the bitmaps down to 1 bit per pointer, recording just live pointer vs not.

The GC assumes the same layout for stack frames and for the maps
describing the global data and bss sections, so change them all in one CL.

The code still refers to 4-bit heap bitmaps and 2-bit "type bitmaps", since
the 2-bit representation lives (at least for now) in some of the reflect data.

Because these stack frame bitmaps are stored directly in the rodata in
the binary, this CL reduces the size of the 6g binary by about 1.1%.

Performance change is basically a wash, but using less memory,
and smaller binaries, and enables other bitmap reductions.

name                                      old mean                new mean        delta
BenchmarkBinaryTree17                13.2s × (0.97,1.03)     13.0s × (0.99,1.01)  -0.93% (p=0.005)
BenchmarkBinaryTree17-2              9.69s × (0.96,1.05)     9.51s × (0.96,1.03)  -1.86% (p=0.001)
BenchmarkBinaryTree17-4              10.1s × (0.97,1.05)     10.0s × (0.96,1.05)  ~ (p=0.141)
BenchmarkFannkuch11                  4.35s × (0.99,1.01)     4.43s × (0.98,1.04)  +1.75% (p=0.001)
BenchmarkFannkuch11-2                4.31s × (0.99,1.03)     4.32s × (1.00,1.00)  ~ (p=0.095)
BenchmarkFannkuch11-4                4.32s × (0.99,1.02)     4.38s × (0.98,1.04)  +1.38% (p=0.008)
BenchmarkFmtFprintfEmpty            83.5ns × (0.97,1.10)    87.3ns × (0.92,1.11)  +4.55% (p=0.014)
BenchmarkFmtFprintfEmpty-2          81.8ns × (0.98,1.04)    82.5ns × (0.97,1.08)  ~ (p=0.364)
BenchmarkFmtFprintfEmpty-4          80.9ns × (0.99,1.01)    82.6ns × (0.97,1.08)  +2.12% (p=0.010)
BenchmarkFmtFprintfString            320ns × (0.95,1.04)     322ns × (0.97,1.05)  ~ (p=0.368)
BenchmarkFmtFprintfString-2          303ns × (0.97,1.04)     304ns × (0.97,1.04)  ~ (p=0.484)
BenchmarkFmtFprintfString-4          305ns × (0.97,1.05)     306ns × (0.98,1.05)  ~ (p=0.543)
BenchmarkFmtFprintfInt               311ns × (0.98,1.03)     319ns × (0.97,1.03)  +2.63% (p=0.000)
BenchmarkFmtFprintfInt-2             297ns × (0.98,1.04)     301ns × (0.97,1.04)  +1.19% (p=0.023)
BenchmarkFmtFprintfInt-4             302ns × (0.98,1.02)     304ns × (0.97,1.03)  ~ (p=0.126)
BenchmarkFmtFprintfIntInt            554ns × (0.96,1.05)     554ns × (0.97,1.03)  ~ (p=0.975)
BenchmarkFmtFprintfIntInt-2          520ns × (0.98,1.03)     517ns × (0.98,1.02)  ~ (p=0.153)
BenchmarkFmtFprintfIntInt-4          524ns × (0.98,1.02)     525ns × (0.98,1.03)  ~ (p=0.597)
BenchmarkFmtFprintfPrefixedInt       433ns × (0.97,1.06)     434ns × (0.97,1.06)  ~ (p=0.804)
BenchmarkFmtFprintfPrefixedInt-2     413ns × (0.98,1.04)     413ns × (0.98,1.03)  ~ (p=0.881)
BenchmarkFmtFprintfPrefixedInt-4     420ns × (0.97,1.03)     421ns × (0.97,1.03)  ~ (p=0.561)
BenchmarkFmtFprintfFloat             620ns × (0.99,1.03)     636ns × (0.97,1.03)  +2.57% (p=0.000)
BenchmarkFmtFprintfFloat-2           601ns × (0.98,1.02)     617ns × (0.98,1.03)  +2.58% (p=0.000)
BenchmarkFmtFprintfFloat-4           613ns × (0.98,1.03)     626ns × (0.98,1.02)  +2.15% (p=0.000)
BenchmarkFmtManyArgs                2.19µs × (0.96,1.04)    2.23µs × (0.97,1.02)  +1.65% (p=0.000)
BenchmarkFmtManyArgs-2              2.08µs × (0.98,1.03)    2.10µs × (0.99,1.02)  +0.79% (p=0.019)
BenchmarkFmtManyArgs-4              2.10µs × (0.98,1.02)    2.13µs × (0.98,1.02)  +1.72% (p=0.000)
BenchmarkGobDecode                  21.3ms × (0.97,1.05)    21.1ms × (0.97,1.04)  -1.36% (p=0.025)
BenchmarkGobDecode-2                20.0ms × (0.97,1.03)    19.2ms × (0.97,1.03)  -4.00% (p=0.000)
BenchmarkGobDecode-4                19.5ms × (0.99,1.02)    19.0ms × (0.99,1.01)  -2.39% (p=0.000)
BenchmarkGobEncode                  18.3ms × (0.95,1.07)    18.1ms × (0.96,1.08)  ~ (p=0.305)
BenchmarkGobEncode-2                16.8ms × (0.97,1.02)    16.4ms × (0.98,1.02)  -2.79% (p=0.000)
BenchmarkGobEncode-4                15.4ms × (0.98,1.02)    15.4ms × (0.98,1.02)  ~ (p=0.465)
BenchmarkGzip                        650ms × (0.98,1.03)     655ms × (0.97,1.04)  ~ (p=0.075)
BenchmarkGzip-2                      652ms × (0.98,1.03)     655ms × (0.98,1.02)  ~ (p=0.337)
BenchmarkGzip-4                      656ms × (0.98,1.04)     653ms × (0.98,1.03)  ~ (p=0.291)
BenchmarkGunzip                      143ms × (1.00,1.01)     143ms × (1.00,1.01)  ~ (p=0.507)
BenchmarkGunzip-2                    143ms × (1.00,1.01)     143ms × (1.00,1.01)  ~ (p=0.313)
BenchmarkGunzip-4                    143ms × (1.00,1.01)     143ms × (1.00,1.01)  ~ (p=0.312)
BenchmarkHTTPClientServer            110µs × (0.98,1.03)     109µs × (0.99,1.02)  -1.40% (p=0.000)
BenchmarkHTTPClientServer-2          154µs × (0.90,1.08)     149µs × (0.90,1.08)  -3.43% (p=0.007)
BenchmarkHTTPClientServer-4          138µs × (0.97,1.04)     138µs × (0.96,1.04)  ~ (p=0.670)
BenchmarkJSONEncode                 40.2ms × (0.98,1.02)    40.2ms × (0.98,1.05)  ~ (p=0.828)
BenchmarkJSONEncode-2               35.1ms × (0.99,1.02)    35.2ms × (0.98,1.03)  ~ (p=0.392)
BenchmarkJSONEncode-4               35.3ms × (0.98,1.03)    35.3ms × (0.98,1.02)  ~ (p=0.813)
BenchmarkJSONDecode                  119ms × (0.97,1.02)     117ms × (0.98,1.02)  -1.80% (p=0.000)
BenchmarkJSONDecode-2                115ms × (0.99,1.02)     114ms × (0.98,1.02)  -1.18% (p=0.000)
BenchmarkJSONDecode-4                116ms × (0.98,1.02)     114ms × (0.98,1.02)  -1.43% (p=0.000)
BenchmarkMandelbrot200              6.03ms × (1.00,1.01)    6.03ms × (1.00,1.01)  ~ (p=0.985)
BenchmarkMandelbrot200-2            6.03ms × (1.00,1.01)    6.02ms × (1.00,1.01)  ~ (p=0.320)
BenchmarkMandelbrot200-4            6.03ms × (1.00,1.01)    6.03ms × (1.00,1.01)  ~ (p=0.799)
BenchmarkGoParse                    8.63ms × (0.89,1.10)    8.58ms × (0.93,1.09)  ~ (p=0.667)
BenchmarkGoParse-2                  8.20ms × (0.97,1.04)    8.37ms × (0.97,1.04)  +1.96% (p=0.001)
BenchmarkGoParse-4                  8.00ms × (0.98,1.02)    8.14ms × (0.99,1.02)  +1.75% (p=0.000)
BenchmarkRegexpMatchEasy0_32         162ns × (1.00,1.01)     164ns × (0.98,1.04)  +1.35% (p=0.011)
BenchmarkRegexpMatchEasy0_32-2       161ns × (1.00,1.01)     161ns × (1.00,1.00)  ~ (p=0.185)
BenchmarkRegexpMatchEasy0_32-4       161ns × (1.00,1.00)     161ns × (1.00,1.00)  -0.19% (p=0.001)
BenchmarkRegexpMatchEasy0_1K         540ns × (0.99,1.02)     566ns × (0.98,1.04)  +4.98% (p=0.000)
BenchmarkRegexpMatchEasy0_1K-2       540ns × (0.99,1.01)     557ns × (0.99,1.01)  +3.21% (p=0.000)
BenchmarkRegexpMatchEasy0_1K-4       541ns × (0.99,1.01)     559ns × (0.99,1.01)  +3.26% (p=0.000)
BenchmarkRegexpMatchEasy1_32         139ns × (0.98,1.04)     139ns × (0.99,1.03)  ~ (p=0.979)
BenchmarkRegexpMatchEasy1_32-2       139ns × (0.99,1.04)     139ns × (0.99,1.02)  ~ (p=0.777)
BenchmarkRegexpMatchEasy1_32-4       139ns × (0.98,1.04)     139ns × (0.99,1.04)  ~ (p=0.771)
BenchmarkRegexpMatchEasy1_1K         890ns × (0.99,1.03)     885ns × (1.00,1.01)  -0.50% (p=0.004)
BenchmarkRegexpMatchEasy1_1K-2       888ns × (0.99,1.01)     885ns × (0.99,1.01)  -0.37% (p=0.004)
BenchmarkRegexpMatchEasy1_1K-4       890ns × (0.99,1.02)     884ns × (1.00,1.00)  -0.70% (p=0.000)
BenchmarkRegexpMatchMedium_32        252ns × (0.99,1.01)     251ns × (0.99,1.01)  ~ (p=0.081)
BenchmarkRegexpMatchMedium_32-2      254ns × (0.99,1.04)     252ns × (0.99,1.01)  -0.78% (p=0.027)
BenchmarkRegexpMatchMedium_32-4      253ns × (0.99,1.04)     252ns × (0.99,1.01)  -0.70% (p=0.022)
BenchmarkRegexpMatchMedium_1K       72.9µs × (0.99,1.01)    72.7µs × (1.00,1.00)  ~ (p=0.064)
BenchmarkRegexpMatchMedium_1K-2     74.1µs × (0.98,1.05)    72.9µs × (1.00,1.01)  -1.61% (p=0.001)
BenchmarkRegexpMatchMedium_1K-4     73.6µs × (0.99,1.05)    72.8µs × (1.00,1.00)  -1.13% (p=0.007)
BenchmarkRegexpMatchHard_32         3.88µs × (0.99,1.03)    3.92µs × (0.98,1.05)  ~ (p=0.143)
BenchmarkRegexpMatchHard_32-2       3.89µs × (0.99,1.03)    3.93µs × (0.98,1.09)  ~ (p=0.278)
BenchmarkRegexpMatchHard_32-4       3.90µs × (0.99,1.05)    3.93µs × (0.98,1.05)  ~ (p=0.252)
BenchmarkRegexpMatchHard_1K          118µs × (0.99,1.01)     117µs × (0.99,1.02)  -0.54% (p=0.003)
BenchmarkRegexpMatchHard_1K-2        118µs × (0.99,1.01)     118µs × (0.99,1.03)  ~ (p=0.581)
BenchmarkRegexpMatchHard_1K-4        118µs × (0.99,1.02)     117µs × (0.99,1.01)  -0.54% (p=0.002)
BenchmarkRevcomp                     991ms × (0.95,1.10)     989ms × (0.94,1.08)  ~ (p=0.879)
BenchmarkRevcomp-2                   978ms × (0.95,1.11)     962ms × (0.96,1.08)  ~ (p=0.257)
BenchmarkRevcomp-4                   979ms × (0.96,1.07)     974ms × (0.96,1.11)  ~ (p=0.678)
BenchmarkTemplate                    141ms × (0.99,1.02)     145ms × (0.99,1.02)  +2.75% (p=0.000)
BenchmarkTemplate-2                  135ms × (0.98,1.02)     138ms × (0.99,1.02)  +2.34% (p=0.000)
BenchmarkTemplate-4                  136ms × (0.98,1.02)     140ms × (0.99,1.02)  +2.71% (p=0.000)
BenchmarkTimeParse                   640ns × (0.99,1.01)     622ns × (0.99,1.01)  -2.88% (p=0.000)
BenchmarkTimeParse-2                 640ns × (0.99,1.01)     622ns × (1.00,1.00)  -2.81% (p=0.000)
BenchmarkTimeParse-4                 640ns × (1.00,1.01)     622ns × (0.99,1.01)  -2.82% (p=0.000)
BenchmarkTimeFormat                  730ns × (0.98,1.02)     731ns × (0.98,1.03)  ~ (p=0.767)
BenchmarkTimeFormat-2                709ns × (0.99,1.02)     707ns × (0.99,1.02)  ~ (p=0.347)
BenchmarkTimeFormat-4                717ns × (0.98,1.01)     718ns × (0.98,1.02)  ~ (p=0.793)

Change-Id: Ie779c47e912bf80eb918bafa13638bd8dfd6c2d9
Reviewed-on: https://go-review.googlesource.com/9406
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-01 18:44:36 +00:00
Rick Hudson ada8cdb9f6 runtime: Fix bug due to elided return.
A previous change to mbitmap.go dropped a return on a
path the seems not to be excersized. This was a mistake that
this CL fixes.

Change-Id: I715ee4ef08f5bf8d9f53cee84e8fb31a237e2d43
Reviewed-on: https://go-review.googlesource.com/9295
Reviewed-by: Austin Clements <austin@google.com>
2015-04-24 21:52:30 +00:00
Rick Hudson 899a4ad47e runtime: Speed up heapBitsForObject
Optimized heapBitsForObject by special casing
objects whose size is a power of two. When a
span holding such objects is initialized I
added a mask that when &ed with an interior pointer
results in the base of the pointer. For the garbage
benchmark this resulted in CPU_CLK_UNHALTED in
heapBitsForObject going from 7.7% down to 5.9%
of the total, INST_RETIRED went from 12.2 -> 8.7.

Here are the benchmarks that were at lease plus or minus 1%.

benchmark                          old ns/op      new ns/op      delta
BenchmarkFmtFprintfString          249            221            -11.24%
BenchmarkFmtFprintfInt             247            223            -9.72%
BenchmarkFmtFprintfEmpty           76.5           69.6           -9.02%
BenchmarkBinaryTree17              4106631412     3744550160     -8.82%
BenchmarkFmtFprintfFloat           424            399            -5.90%
BenchmarkGoParse                   4484421        4242115        -5.40%
BenchmarkGobEncode                 8803668        8449107        -4.03%
BenchmarkFmtManyArgs               1494           1436           -3.88%
BenchmarkGobDecode                 10431051       10032606       -3.82%
BenchmarkFannkuch11                2591306713     2517400464     -2.85%
BenchmarkTimeParse                 361            371            +2.77%
BenchmarkJSONDecode                70620492       68830357       -2.53%
BenchmarkRegexpMatchMedium_1K      54693          53343          -2.47%
BenchmarkTemplate                  90008879       91929940       +2.13%
BenchmarkTimeFormat                380            387            +1.84%
BenchmarkRegexpMatchEasy1_32       111            113            +1.80%
BenchmarkJSONEncode                21359159       21007583       -1.65%
BenchmarkRegexpMatchEasy1_1K       603            613            +1.66%
BenchmarkRegexpMatchEasy0_32       127            129            +1.57%
BenchmarkFmtFprintfIntInt          399            393            -1.50%
BenchmarkRegexpMatchEasy0_1K       373            378            +1.34%

Change-Id: I78e297161026f8b5cc7507c965fd3e486f81ed29
Reviewed-on: https://go-review.googlesource.com/8980
Reviewed-by: Austin Clements <austin@google.com>
2015-04-20 21:39:06 +00:00
Michael Hudson-Doyle a1f57598cc runtime, cmd/internal/ld: rename themoduledata to firstmoduledata
'themoduledata' doesn't really make sense now we support multiple moduledata
objects.

Change-Id: I8263045d8f62a42cb523502b37289b0fba054f62
Reviewed-on: https://go-review.googlesource.com/8521
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-10 05:11:49 +00:00
Michael Hudson-Doyle fae4a128cb runtime, reflect: support multiple moduledata objects
This changes all the places that consult themoduledata to consult a
linked list of moduledata objects, as will be necessary for
-linkshared to work.

Obviously, as there is as yet no way of adding moduledata objects to
this list, all this change achieves right now is wasting a few
instructions here and there.

Change-Id: I397af7f60d0849b76aaccedf72238fe664867051
Reviewed-on: https://go-review.googlesource.com/8231
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-10 04:51:42 +00:00
Michael Hudson-Doyle 67426a8a9e runtime, cmd/internal/ld: change runtime to use a single linker symbol
In preparation for being able to run a go program that has code
in several objects, this changes from having several linker
symbols used by the runtime into having one linker symbol that
points at a structure containing the needed data.  Multiple
object support will construct a linked list of such structures.

A follow up will initialize the slices in the themoduledata
structure directly from the linker but I was aiming for a minimal
diff for now.

Change-Id: I613cce35309801cf265a1d5ae5aaca8d689c5cbf
Reviewed-on: https://go-review.googlesource.com/7441
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-03-31 22:45:07 +00:00
Rick Hudson 122384e489 runtime: Remove boundary bit logic.
This is an experiment to see if removing the boundary bit logic will
lead to fewer cache misses and improved performance. Instead of using
boundary bits we use the span information to get element size and use
some bit whacking to get the boundary without having to touch the
random heap bits which cause cache misses.

Furthermore once the boundary bit is removed we can either use that
bit for a simpler checkmark routine or we can reduce the number of
bits in the GC bitmap to 2 bits per pointer sized work. For example
the 2 bits at the boundary can be used for marking and pointer/scalar
differentiation. Since we don't need the mark bit except at the
boundary nibble of the object other nibbles can use this bit
as a noscan bit to indicate that there are no more pointers in
the object.

Currently the changed included in this CL slows down the garbage
benchmark. With the boundary bits garbage gives 5.78 and without
(this CL) it gives 5.88 which is a 2% slowdown.

Change-Id: Id68f831ad668176f7dc9f7b57b339e4ebb6dc4c2
Reviewed-on: https://go-review.googlesource.com/6665
Reviewed-by: Austin Clements <austin@google.com>
2015-03-04 20:55:55 +00:00
Russ Cox 9feb24f3ed runtime: use multiply instead of divide in heapBitsForObject
These benchmarks show the effect of the combination of this change
and Rick's pending CL 6665. Code with interior pointers is helped
much more than code without, but even code without doesn't suffer
too badly.

benchmark                          old ns/op      new ns/op      delta
BenchmarkBinaryTree17              6989407768     6851728175     -1.97%
BenchmarkFannkuch11                4416250775     4405762558     -0.24%
BenchmarkFmtFprintfEmpty           134            130            -2.99%
BenchmarkFmtFprintfString          491            402            -18.13%
BenchmarkFmtFprintfInt             430            420            -2.33%
BenchmarkFmtFprintfIntInt          748            663            -11.36%
BenchmarkFmtFprintfPrefixedInt     602            534            -11.30%
BenchmarkFmtFprintfFloat           728            699            -3.98%
BenchmarkFmtManyArgs               2528           2507           -0.83%
BenchmarkGobDecode                 17448191       17749756       +1.73%
BenchmarkGobEncode                 14579824       14370183       -1.44%
BenchmarkGzip                      656489990      652669348      -0.58%
BenchmarkGunzip                    141254147      141099278      -0.11%
BenchmarkHTTPClientServer          94111          93738          -0.40%
BenchmarkJSONEncode                36305013       36696440       +1.08%
BenchmarkJSONDecode                124652000      128176454      +2.83%
BenchmarkMandelbrot200             6009333        5997093        -0.20%
BenchmarkGoParse                   7651583        7623494        -0.37%
BenchmarkRegexpMatchEasy0_32       213            213            +0.00%
BenchmarkRegexpMatchEasy0_1K       511            494            -3.33%
BenchmarkRegexpMatchEasy1_32       186            187            +0.54%
BenchmarkRegexpMatchEasy1_1K       1834           1827           -0.38%
BenchmarkRegexpMatchMedium_32      427            412            -3.51%
BenchmarkRegexpMatchMedium_1K      154841         153086         -1.13%
BenchmarkRegexpMatchHard_32        7473           7478           +0.07%
BenchmarkRegexpMatchHard_1K        233587         232272         -0.56%
BenchmarkRevcomp                   918797689      944528032      +2.80%
BenchmarkTemplate                  167665081      167773121      +0.06%
BenchmarkTimeParse                 631            636            +0.79%
BenchmarkTimeFormat                672            666            -0.89%

Change-Id: Ia923de3cdb3993b640fe0a02cbe2c7babc16f32c
Reviewed-on: https://go-review.googlesource.com/6782
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2015-03-04 17:46:47 +00:00
Austin Clements da4874cba4 runtime: trivial clean ups to greyobject
Previously, the typeDead check in greyobject was under a separate
!useCheckmark conditional.  Put it with the rest of the !useCheckmark
code.  Also move a comment about atomic update of the marked bit to
where we actually do that update now.

Change-Id: Ief5f16401a25739ad57d959607b8d81ffe0bc211
Reviewed-on: https://go-review.googlesource.com/6271
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-02-27 19:39:57 +00:00
Keith Randall 6d1ebeb527 runtime: handle holes in the heap
We need to distinguish pointers to free spans, which indicate bugs in
our pointer analysis, from pointers to never-in-the-heap spans, which
can legitimately arise from sysAlloc/mmap/etc.  This normally isn't a
problem because the heap is contiguous, but in some situations (32
bit, particularly) the heap must grow around an already allocated
region.

The bad pointer test is disabled so this fix doesn't actually do
anything, but it removes one barrier from reenabling it.

Fixes #9872.

Change-Id: I0a92db4d43b642c58d2b40af69c906a8d9777f88
Reviewed-on: https://go-review.googlesource.com/5780
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-02-25 21:07:10 +00:00
Austin Clements bceb18e498 runtime: eliminate unnecessary assumption in heapBitsForObject
The slow path of heapBitsForObjects somewhat subtly assumes that the
pointer will not point to the first word of the object and will round
the pointer wrong if this assumption is violated.  This assumption is
safe because the fast path should always take care of this case, but
there's no benefit to making this assumption, it makes the code more
difficult to experiment with than necessary, and it's trivial to
eliminate.

Change-Id: Iedd336f7d529a27d3abeb83e77dfb32a285ea73a
Reviewed-on: https://go-review.googlesource.com/5636
Reviewed-by: Russ Cox <rsc@golang.org>
2015-02-23 21:49:27 +00:00
Russ Cox 484f801ff4 runtime: reorganize memory code
Move code from malloc1.go, malloc2.go, mem.go, mgc0.go into
appropriate locations.

Factor mgc.go into mgc.go, mgcmark.go, mgcsweep.go, mstats.go.

A lot of this code was in certain files because the right place was in
a C file but it was written in Go, or vice versa. This is one step toward
making things actually well-organized again.

Change-Id: I6741deb88a7cfb1c17ffe0bcca3989e10207968f
Reviewed-on: https://go-review.googlesource.com/5300
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-02-19 20:17:01 +00:00
Russ Cox 3965d7508e runtime: factor out bitmap, finalizer code from malloc/mgc
The code in mfinal.go is moved from malloc*.go and mgc*.go
and substantially unchanged.

The code in mbitmap.go is also moved from those files, but
cleaned up so that it can be called from those files (in most cases
the code being moved was not already a standalone function).
I also renamed the constants and wrote comments describing
the format. The result is a significant cleanup and isolation of
the bitmap code, but, roughly speaking, it should be treated
and reviewed as new code.

The other files changed only as much as necessary to support
this code movement.

This CL does NOT change the semantics of the heap or type
bitmaps at all, although there are now some obvious opportunities
to do so in followup CLs.

Change-Id: I41b8d5de87ad1d3cd322709931ab25e659dbb21d
Reviewed-on: https://go-review.googlesource.com/2991
Reviewed-by: Keith Randall <khr@golang.org>
2015-01-19 16:26:51 +00:00