In order to move malloc to Go, we need to have a
separate stack allocator. If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.
Stacks are the last remaining FlagNoGC objects in the
GC heap. Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)
Fixes#7468Fixes#7424
LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
Remove GC bitmap backward scanning.
This was already done once in https://golang.org/cl/5530074/
Still makes GC a bit faster.
On the garbage benchmark, before:
gc-pause-one=237345195
gc-pause-total=4746903
cputime=32427775
time=32458208
after:
gc-pause-one=235484019
gc-pause-total=4709680
cputime=31861965
time=31877772
Also prepares mgc0.c for future changes.
R=golang-codereviews, khr, khr
CC=golang-codereviews, rsc
https://golang.org/cl/105380043
newproc takes two extra pointers, not two extra registers.
On amd64p32 (nacl) they are different.
We diagnosed this before the 1.3 cut but the tree was frozen.
I believe this is causing the random problems on the builder.
Fixes#8199.
TBR=r
CC=golang-codereviews
https://golang.org/cl/102710043
Output number of spinning threads,
this is useful to understanding whether the scheduler
is in a steady state or not.
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/103540045
Say when a goroutine is locked to OS thread in crash reports
and goroutine profiles.
It can be useful to understand what goroutines consume OS threads
(syscall and locked), e.g. if you forget to call UnlockOSThread
or leak locked goroutines.
R=golang-codereviews
CC=golang-codereviews, rsc
https://golang.org/cl/94170043
The runtime has historically held two dedicated values g (current goroutine)
and m (current thread) in 'extern register' slots (TLS on x86, real registers
backed by TLS on ARM).
This CL removes the extern register m; code now uses g->m.
On ARM, this frees up the register that formerly held m (R9).
This is important for NaCl, because NaCl ARM code cannot use R9 at all.
The Go 1 macrobenchmarks (those with per-op times >= 10 µs) are unaffected:
BenchmarkBinaryTree17 5491374955 5471024381 -0.37%
BenchmarkFannkuch11 4357101311 4275174828 -1.88%
BenchmarkGobDecode 11029957 11364184 +3.03%
BenchmarkGobEncode 6852205 6784822 -0.98%
BenchmarkGzip 650795967 650152275 -0.10%
BenchmarkGunzip 140962363 141041670 +0.06%
BenchmarkHTTPClientServer 71581 73081 +2.10%
BenchmarkJSONEncode 31928079 31913356 -0.05%
BenchmarkJSONDecode 117470065 113689916 -3.22%
BenchmarkMandelbrot200 6008923 5998712 -0.17%
BenchmarkGoParse 6310917 6327487 +0.26%
BenchmarkRegexpMatchMedium_1K 114568 114763 +0.17%
BenchmarkRegexpMatchHard_1K 168977 169244 +0.16%
BenchmarkRevcomp 935294971 914060918 -2.27%
BenchmarkTemplate 145917123 148186096 +1.55%
Minux previous reported larger variations, but these were caused by
run-to-run noise, not repeatable slowdowns.
Actual code changes by Minux.
I only did the docs and the benchmarking.
LGTM=dvyukov, iant, minux
R=minux, josharian, iant, dave, bradfitz, dvyukov
CC=golang-codereviews
https://golang.org/cl/109050043
MOV with SSE registers seems faster than REP MOVSQ if the
size being copied is less than about 2K. Previously we
didn't use MOV if the memory region is larger than 256
byte. This patch improves the performance of 257 ~ 2048
byte non-overlapping copy by using MOV.
Here is the benchmark result on Intel Xeon 3.5GHz (Nehalem).
benchmark old ns/op new ns/op delta
BenchmarkMemmove16 4 4 +0.42%
BenchmarkMemmove32 5 5 -0.20%
BenchmarkMemmove64 6 6 -0.81%
BenchmarkMemmove128 7 7 -0.82%
BenchmarkMemmove256 10 10 +1.92%
BenchmarkMemmove512 29 16 -44.90%
BenchmarkMemmove1024 37 25 -31.55%
BenchmarkMemmove2048 55 44 -19.46%
BenchmarkMemmove4096 92 91 -0.76%
benchmark old MB/s new MB/s speedup
BenchmarkMemmove16 3370.61 3356.88 1.00x
BenchmarkMemmove32 6368.68 6386.99 1.00x
BenchmarkMemmove64 10367.37 10462.62 1.01x
BenchmarkMemmove128 17551.16 17713.48 1.01x
BenchmarkMemmove256 24692.81 24142.99 0.98x
BenchmarkMemmove512 17428.70 31687.72 1.82x
BenchmarkMemmove1024 27401.82 40009.45 1.46x
BenchmarkMemmove2048 36884.86 45766.98 1.24x
BenchmarkMemmove4096 44295.91 44627.86 1.01x
LGTM=khr
R=golang-codereviews, gobot, khr
CC=golang-codereviews
https://golang.org/cl/90500043
This requires minimal changes to the runtime hooks. In particular,
synchronization events must be done only on valid addresses now,
so I've added the additional checks to race.c.
LGTM=iant
R=iant
CC=golang-codereviews
https://golang.org/cl/101000046
Afterprologue check was required when did not know
about return arguments of functions and/or they were not zeroed.
Now 100% precision is required for stacks due to stack copying,
so it must work w/o afterprologue one way or another.
I can limit this change for 1.3 to merely adding a TODO,
but this check is super confusing so I don't want this knowledge to get lost.
LGTM=rsc
R=golang-codereviews, gobot, rsc, khr
CC=golang-codereviews, khr, rsc
https://golang.org/cl/96580045
Also implement go:nosplit annotation. Not really needed
for now, but we'll definitely need it for other conversions.
benchmark old ns/op new ns/op delta
BenchmarkRuneIterate 534 474 -11.24%
BenchmarkRuneIterate2 535 470 -12.15%
LGTM=bradfitz
R=golang-codereviews, dave, bradfitz, minux
CC=golang-codereviews
https://golang.org/cl/93380044
Reportedly in the Linux 3.16 kernel the VDSO will not have
section headers or a normal symbol table.
Too late for 1.3 but perhaps for 1.3.1, if there is one.
Fixes#8197.
LGTM=rsc
R=golang-codereviews, mattn.jp, rsc
CC=golang-codereviews
https://golang.org/cl/101260044
It appears that something about Go on Windows
cannot handle the fault cause by a jump to address 0.
The way Go represents and calls functions, this
never happened at all, until CL 105140044.
This CL changes the code added in CL 105140044
to make jump to 0 impossible once again.
Fixes#8047. (again, on Windows)
TBR=bradfitz
R=golang-codereviews, dave
CC=adg, golang-codereviews, iant, r
https://golang.org/cl/105120044
jmpdefer modifies PC, SP, and LR, and not atomically,
so walking past jmpdefer will often end up in a state
where the three are not a consistent execution snapshot.
This was causing warning messages a few frames later
when the traceback realized it was confused, but given
the right memory it could easily crash instead.
Update #8153
LGTM=minux, iant
R=golang-codereviews, minux, iant
CC=golang-codereviews, r
https://golang.org/cl/107970043
A runtime.Goexit during a panic-invoked deferred call
left the panic stack intact even though all the stack frames
are gone when the goroutine is torn down.
The next goroutine to reuse that struct will have a
bogus panic stack and can cause the traceback routines
to walk into garbage.
Most likely to happen during tests, because t.Fatal might
be called during a deferred func and uses runtime.Goexit.
This "not enough cleared in Goexit" failure mode has
happened to us multiple times now. Clear all the pointers
that don't make sense to keep, not just gp->panic.
Fixes#8158.
LGTM=iant, dvyukov
R=iant, dvyukov
CC=golang-codereviews
https://golang.org/cl/102220043
The 1-byte write was silently clearing a byte on the stack.
If there was another function call with more arguments
in the same stack frame, no harm done.
Otherwise, if the variable at that location was already zero,
no harm done.
Otherwise, problems.
Fixes#8139.
LGTM=dsymonds
R=golang-codereviews, dsymonds
CC=golang-codereviews, iant, r
https://golang.org/cl/100940043
We were requiring that the defer stack and the panic stack
be completely processed, thinking that if any were left over
the stack scan and the defer stack/panic stack must be out
of sync. It turns out that the panic stack may well have
leftover entries in some situations, and that's okay.
Fixes#8132.
LGTM=minux, r
R=golang-codereviews, minux, r
CC=golang-codereviews, iant, khr
https://golang.org/cl/100900044
C globals are conservatively scanned. This helps
avoid false retention, especially for 32 bit.
LGTM=rsc
R=golang-codereviews, khr, rsc
CC=golang-codereviews
https://golang.org/cl/102040043
The 'continuation pc' is where the frame will continue
execution, if anywhere. For a frame that stopped execution
due to a CALL instruction, the continuation pc is immediately
after the CALL. But for a frame that stopped execution due to
a fault, the continuation pc is the pc after the most recent CALL
to deferproc in that frame, or else 0. That is where execution
will continue, if anywhere.
The liveness information is only recorded for CALL instructions.
This change makes sure that we never look for liveness information
except for CALL instructions.
Using a valid PC fixes crashes when a garbage collection or
stack copying tries to process a stack frame that has faulted.
Record continuation pc in heapdump (format change).
Fixes#8048.
LGTM=iant, khr
R=khr, iant, dvyukov
CC=golang-codereviews, r
https://golang.org/cl/100870044
Update #2675
The code here was using the error check for Linux/386,
not the one for FreeBSD/386. Most of the time it worked.
Thanks to Neel Natu (FreeBSD developer) for finding this.
The s/JCC/JAE/ a few lines later is a no-op but makes the
test match the rest of the file. Why we write JAE instead of JCC
I don't know, but the two are equivalent and the file might
as well be consistent.
LGTM=bradfitz, minux
R=golang-codereviews, bradfitz, minux
CC=golang-codereviews
https://golang.org/cl/99680044
The rtype struct is meant to be a copy of reflect.rtype. The
zero field was added to reflect.rtype in 18495:6e50725ac753.
LGTM=rsc
R=khr, rsc
CC=golang-codereviews
https://golang.org/cl/93660045
Add nacl.bash, the NaCl version of all.bash.
It's a separate script because it builds a variant of package syscall
with a large zip file embedded in it, containing all the input files
needed for tests.
Disable various tests new since the last round, mostly the ones using os/exec.
Fixes#7945.
LGTM=dave
R=golang-codereviews, remyoudompheng, dave, bradfitz
CC=golang-codereviews
https://golang.org/cl/100590044
The move from 4kB to 8kB in Go 1.2 was to eliminate many stack split hot spots.
The move back to 4kB was predicated on copying stacks eliminating
the potential for hot spots.
Unfortunately, the fact that stacks do not copy 100% of the time means
that hot spots can still happen under the right conditions, and the slowdown
is worse now than it was in Go 1.2. There is a real program in issue 8030 that
sees about a 30x slowdown: it has a reflect call near the top of the stack
which inhibits any stack copying on that segment.
Go back to 8kB until stack copying can be used 100% of the time.
Fixes#8030.
LGTM=khr, dave, iant
R=iant, khr, r, bradfitz, dave
CC=golang-codereviews
https://golang.org/cl/92540043
Currently freeOSMemory makes only marking phase of GC, but not sweeping phase.
So recently memory is not released after freeOSMemory.
Do both marking and sweeping during freeOSMemory.
Fixes#8019.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/97550043
The GC program describing a data structure sometimes trusts the
pointer base type and other times does not (if not, the garbage collector
must fall back on per-allocation type information stored in the heap).
Make the scanning of a pointer in an interface do the same.
This fixes a crash in a particular use of reflect.SliceHeader.
Fixes#8004.
LGTM=khr
R=golang-codereviews, khr
CC=0xe2.0x9a.0x9b, golang-codereviews, iant, r
https://golang.org/cl/100470045
mstats.last_gc is unix time now, it is compared with abstract monotonic time.
On my machine GC is forced every 5 mins regardless of last_gc.
LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, iant, rsc
https://golang.org/cl/91350045
I have no test case for this at tip.
The original report included a program crashing at revision 88ac7297d2fa.
I tested this code at that revision and it does fix the crash.
However, at tip the reported code no longer crashes, presumably
because some allocation patterns have changed. I believe the
bug is still present at tip and that this code still fixes it.
Fixes#7143.
LGTM=alex.brainman
R=golang-codereviews, alex.brainman
CC=dvyukov, golang-codereviews
https://golang.org/cl/96300046
If it's not used (such as on other systems or if softfloat
is disabled) the linker will discard it.
The alternative is to teach cmd/go that every binary
depends on math implicitly on arm. I started down that
path but it's too scary. If we're going to get dependencies
right we should get dependencies right.
Fixes#6994.
LGTM=bradfitz, dave
R=golang-codereviews, bradfitz, dave
CC=golang-codereviews
https://golang.org/cl/95290043
<enter reason for undo>
««« original CL description
runtime/race: fix the link for the race detector.
LGTM=bradfitz
R=golang-dev, bradfitz
CC=golang-codereviews
https://golang.org/cl/100330043
»»»
TBR=minux
R=minux.ma
CC=golang-codereviews
https://golang.org/cl/96200044