Number of lost samples was overcounted (never reset).
Also remove unused variable (it's trivial to restore it for debugging if needed).
LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews, rsc
https://golang.org/cl/96060043
Where the spelling changed from British to
US norm (e.g., optimise -> optimize) it follows
the style in that file.
LGTM=adonovan
R=golang-codereviews, adonovan
CC=golang-codereviews
https://golang.org/cl/96980043
Because gotraceback is called early and often, its cache commits to the value of getenv("GOTRACEBACK") before getenv is even ready. So now we reset its cache once getenv becomes ready. Panicking programs now dump core again.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/97800045
If slice append is the only place where a program allocates,
then it will consume all available memory w/o triggering GC.
This was demonstrated in the issue.
Fixes#7922.
LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews, iant, khr
https://golang.org/cl/91010048
The monotonic clock patch changed all runtime times
to abstract monotonic time. As the result user-visible
MemStats.LastGC become monotonic time as well.
Restore Unix time for LastGC.
This is the simplest way to expose time.now to runtime that I found.
Another option would be to change time.now to C called
int64 runtime.unixnanotime() and then express time.now in terms of it.
But this would require to introduce 2 64-bit divisions into time.now.
Another option would be to change time.now to C called
void runtime.unixnanotime1(struct {int64 sec, int32 nsec} *now)
and then express both time.now and runtime.unixnanotime in terms of it.
Fixes#7852.
LGTM=minux.ma, iant
R=minux.ma, rsc, iant
CC=golang-codereviews
https://golang.org/cl/93720045
The backing memory for >1 word interfaces was being scanned
conservatively.
LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/94000043
Use a real type for Gs instead of scanning them conservatively.
Zero the schedlink pointer when it is dead.
Update #7820
LGTM=rsc
R=rsc, dvyukov
CC=golang-codereviews
https://golang.org/cl/89360043
This has typically crashed in the past, although usually with
an 'all goroutines are asleep - deadlock!' message that shows
no goroutines (because there aren't any).
Previous discussion at:
https://groups.google.com/d/msg/golang-nuts/uCT_7WxxopQ/BoSBlLFzUTkJhttps://groups.google.com/d/msg/golang-dev/KUojayEr20I/u4fp_Ej5PdUJhttp://golang.org/issue/7711
There is general agreement that runtime.Goexit terminates the
main goroutine, so that main cannot return, so the program does
not exit.
The interpretation that all other goroutines exiting causes an
exit(0) is relatively new and was not part of those discussions.
That is what this CL changes.
Thankfully, even though the exit(0) has been there for a while,
some other accounting bugs made it very difficult to trigger,
so it is reasonable to replace. In particular, see golang.org/issue/7711#c10
for an examination of the behavior across past releases.
Fixes#7711.
LGTM=iant, r
R=golang-codereviews, iant, dvyukov, r
CC=golang-codereviews
https://golang.org/cl/88210044
Having the pointers means you can grub around in the
binary finding out more about them.
This helped with issue 7748.
LGTM=minux.ma, bradfitz
R=golang-codereviews, minux.ma, bradfitz
CC=golang-codereviews
https://golang.org/cl/88090045
When I did the original 386 ports on Linux and OS X, I chose to
define GS-relative expressions like 4(GS) as relative to the actual
thread-local storage base, which was usually GS but might not be
(it might be FS, or it might be a different constant offset from GS or FS).
The original scope was limited but since then the rewrites have
gotten out of control. Sometimes GS is rewritten, sometimes FS.
Some ports do other rewrites to enable shared libraries and
other linking. At no point in the code is it clear whether you are
looking at the real GS/FS or some synthesized thing that will be
rewritten. The code manipulating all these is duplicated in many
places.
The first step to fixing issue 7719 is to make the code intelligible
again.
This CL adds an explicit TLS pseudo-register to the 386 and amd64.
As a register, TLS refers to the thread-local storage base, and it
can only be loaded into another register:
MOVQ TLS, AX
An offset from the thread-local storage base is written off(reg)(TLS*1).
Semantically it is off(reg), but the (TLS*1) annotation marks this as
indexing from the loaded TLS base. This emits a relocation so that
if the linker needs to adjust the offset, it can. For example:
MOVQ TLS, AX
MOVQ 8(AX)(TLS*1), CX // load m into CX
On systems that support direct access to the TLS memory, this
pair of instructions can be reduced to a direct TLS memory reference:
MOVQ 8(TLS), CX // load m into CX
The 2-instruction and 1-instruction forms correspond roughly to
ELF TLS initial exec mode and ELF TLS local exec mode, respectively.
Liblink applies this rewrite on systems that support the 1-instruction form.
The decision is made using only the operating system (and probably
the -shared flag, eventually), not the link mode. If some link modes
on a particular operating system require the 2-instruction form,
then all builds for that operating system will use the 2-instruction
form, so that the link mode decision can be delayed to link time.
Obviously it is late to be making changes like this, but I despair
of correcting issue 7719 and issue 7164 without it. To make sure
I am not changing existing behavior, I built a "hello world" program
for every GOOS/GOARCH combination we have and then worked
to make sure that the rewrite generates exactly the same binaries,
byte for byte. There are a handful of TODOs in the code marking
kludges to get the byte-for-byte property, but at least now I can
explain exactly how each binary is handled.
The targets I tested this way are:
darwin-386
darwin-amd64
dragonfly-386
dragonfly-amd64
freebsd-386
freebsd-amd64
freebsd-arm
linux-386
linux-amd64
linux-arm
nacl-386
nacl-amd64p32
netbsd-386
netbsd-amd64
openbsd-386
openbsd-amd64
plan9-386
plan9-amd64
solaris-amd64
windows-386
windows-amd64
There were four exceptions to the byte-for-byte goal:
windows-386 and windows-amd64 have a time stamp
at bytes 137 and 138 of the header.
darwin-386 and plan9-386 have five or six modified
bytes in the middle of the Go symbol table, caused by
editing comments in runtime/sys_{darwin,plan9}_386.s.
Fixes#7164.
LGTM=iant
R=iant, aram, minux.ma, dave
CC=golang-codereviews
https://golang.org/cl/87920043
Do not consider idle finalizer/bgsweep/timer goroutines as doing something useful.
We can't simply set isbackground for the whole lifetime of the goroutines,
because when finalizer goroutine calls user function, we do want to consider it
as doing something useful.
This is borken due to timers for quite some time.
With background sweep is become even more broken.
Fixes#7784.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/87960044
Currently Pool can cache up to 15 elements per P, and these elements are not accesible to other Ps.
If a Pool caches large objects, say 2MB, and GOMAXPROCS is set to a large value, say 32,
then the Pool can waste up to 960MB.
The new caching policy caches at most 1 per-P element, the rest is shared between Ps.
Get/Put performance is unchanged. Nested Get/Put performance is 57% worse.
However, overall scalability of nested Get/Put is significantly improved,
so the new policy starts winning under contention.
benchmark old ns/op new ns/op delta
BenchmarkPool 27.4 26.7 -2.55%
BenchmarkPool-4 6.63 6.59 -0.60%
BenchmarkPool-16 1.98 1.87 -5.56%
BenchmarkPool-64 1.93 1.86 -3.63%
BenchmarkPoolOverlflow 3970 6235 +57.05%
BenchmarkPoolOverlflow-4 10935 1668 -84.75%
BenchmarkPoolOverlflow-16 13419 520 -96.12%
BenchmarkPoolOverlflow-64 10295 380 -96.31%
LGTM=rsc
R=rsc
CC=golang-codereviews, khr
https://golang.org/cl/86020043
It looks like maybe on slower builders 4 seconds is not enough.
Trying to get rid of the flaky failures.
TBR=iant
CC=golang-codereviews
https://golang.org/cl/86870044
It runs too long in -short mode.
Disable the one in init, because it doesn't respect -short.
Make the part that claims to test execution in a finalizer
actually execute the test in the finalizer.
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=aram.h, golang-codereviews, iant, khr
https://golang.org/cl/86550045
We originally decided to skip this test in short mode
to prevent the parallel runtime test to timeout on the
Plan 9 builder. This should no longer be required since
the issue was fixed in CL 86210043.
LGTM=dave, bradfitz
R=dvyukov, dave, bradfitz
CC=golang-codereviews, rsc
https://golang.org/cl/84790044
If you pass ns = 100,000 to this function, timediv will
return ms = 0. tsemacquire in /sys/src/9/port/sysproc.c
will return immediately when ms == 0 and the semaphore
cannot be acquired immediately - it doesn't sleep - so
notetsleep will spin, chewing cpu and repeatedly reading
the time, until the 100us have passed.
Thanks to the time reads it won't take too many iterations,
but whatever we are waiting for does not get a chance to
run. Eventually the notetsleep spin loop returns and we
end up in the stoptheworld spin loop - actually a sleep
loop but we're not doing a good job of sleeping.
After 100ms or so of this, the kernel says enough and
schedules a different thread. That thread manages to do
whatever we're waiting for, and the spinning in the other
thread stops. If tsemacquire had actually slept, this
would have happened much quicker.
Many thanks to Russ Cox for help debugging.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/86210043
Cuts the number of calls from 6 to 2 in the non-debug case.
LGTM=iant
R=golang-codereviews, iant
CC=0intro, aram, golang-codereviews, khr
https://golang.org/cl/86040043
Getenv() should not call malloc when called from
gotraceback(). Instead, we return a static buffer
in this case, with enough room to hold the longest
value.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/85680043
On Plan 9 gotraceback calls getenv calls malloc, and we gotraceback
on every call to gentraceback, which happens during garbage collection.
Honestly I don't even know how this works on Plan 9.
I suspect it does not, and that we are getting by because
no one has tried to run with $GOTRACEBACK set at all.
This will speed up all the other systems by epsilon, since they
won't call getenv and atoi repeatedly.
LGTM=bradfitz
R=golang-codereviews, bradfitz, 0intro
CC=golang-codereviews
https://golang.org/cl/85430046
Given
type Outer struct {
*Inner
...
}
the compiler generates the implementation of (*Outer).M dispatching to
the embedded Inner. The implementation is logically:
func (p *Outer) M() {
(p.Inner).M()
}
but since the only change here is the replacement of one pointer
receiver with another, the actual generated code overwrites the
original receiver with the p.Inner pointer and then jumps to the M
method expecting the *Inner receiver.
During reflect.Value.Call, we create an argument frame and the
associated data structures to describe it to the garbage collector,
populate the frame, call reflect.call to run a function call using
that frame, and then copy the results back out of the frame. The
reflect.call function does a memmove of the frame structure onto the
stack (to set up the inputs), runs the call, and the memmoves the
stack back to the frame structure (to preserve the outputs).
Originally reflect.call did not distinguish inputs from outputs: both
memmoves were for the full stack frame. However, in the case where the
called function was one of these wrappers, the rewritten receiver is
almost certainly a different type than the original receiver. This is
not a problem on the stack, where we use the program counter to
determine the type information and understand that during (*Outer).M
the receiver is an *Outer while during (*Inner).M the receiver in the
same memory word is now an *Inner. But in the statically typed
argument frame created by reflect, the receiver is always an *Outer.
Copying the modified receiver pointer off the stack into the frame
will store an *Inner there, and then if a garbage collection happens
to scan that argument frame before it is discarded, it will scan the
*Inner memory as if it were an *Outer. If the two have different
memory layouts, the collection will intepret the memory incorrectly.
Fix by only copying back the results.
Fixes#7725.
LGTM=khr
R=khr
CC=dave, golang-codereviews
https://golang.org/cl/85180043
It turns out there is a relatively common pattern that relies on
inverted channel semaphore:
gate := make(chan bool, N)
for ... {
// limit concurrency
gate <- true
go func() {
foo(...)
<-gate
}()
}
// join all goroutines
for i := 0; i < N; i++ {
gate <- true
}
So handle synchronization on inverted semaphores with cap>1.
Fixes#7718.
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/84880046
Defers generated from cgo lie to us about their argument layout.
Mark those defers as not copyable.
CL 83820043 contains an additional test for this code and should be
checked in (and enabled) after this change is in.
Fixes bug 7695.
LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/84740043
Iterate the right number of times in arrays and channels.
Handle channels with zero-sized objects in them.
Output longer type names if we have them.
Compute argument offset correctly.
LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/82980043
I have no idea what this code is for, but it pretty
clearly needs to be uint64, not uint32.
LGTM=aram
R=0intro, aram
CC=golang-codereviews
https://golang.org/cl/84410043
Trying to make GODEBUG=gcdead=1 work with liveness
and in particular ambiguously live variables.
1. In the liveness computation, mark all ambiguously live
variables as live for the entire function, except the entry.
They are zeroed directly after entry, and we need them not
to be poisoned thereafter.
2. In the liveness computation, compute liveness (and deadness)
for all parameters, not just pointer-containing parameters.
Otherwise gcdead poisons untracked scalar parameters and results.
3. Fix liveness debugging print for -live=2 to use correct bitmaps.
(Was not updated for compaction during compaction CL.)
4. Correct varkill during map literal initialization.
Was killing the map itself instead of the inserted value temp.
5. Disable aggressive varkill cleanup for call arguments if
the call appears in a defer or go statement.
6. In the garbage collector, avoid bug scanning empty
strings. An empty string is two zeros. The multiword
code only looked at the first zero and then interpreted
the next two bits in the bitmap as an ordinary word bitmap.
For a string the bits are 11 00, so if a live string was zero
length with a 0 base pointer, the poisoning code treated
the length as an ordinary word with code 00, meaning it
needed poisoning, turning the string into a poison-length
string with base pointer 0. By the same logic I believe that
a live nil slice (bits 11 01 00) will have its cap poisoned.
Always scan full multiword struct.
7. In the runtime, treat both poison words (PoisonGC and
PoisonStack) as invalid pointers that warrant crashes.
Manual testing as follows:
- Create a script called gcdead on your PATH containing:
#!/bin/bash
GODEBUG=gcdead=1 GOGC=10 GOTRACEBACK=2 exec "$@"
- Now you can build a test and then run 'gcdead ./foo.test'.
- More importantly, you can run 'go test -short -exec gcdead std'
to run all the tests.
Fixes#7676.
While here, enable the precise scanning of slices, since that was
disabled due to bugs like these. That now works, both with and
without gcdead.
Fixes#7549.
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83410044
The garbage collector poison pointers
(0x6969696969696969 and 0x6868686868686868)
are malformed addresses on amd64.
That is, they are not 48-bit addresses sign extended
to 64 bits. This causes a different kind of hardware fault
than the usual 'unmapped page' when accessing such
an address, and OS X 10.9.2 sends the resulting SIGSEGV
incorrectly, making it look like it was user-generated
rather than kernel-generated and does not include the
faulting address. This means that in GODEBUG=gcdead=1
mode, if there is a bug and something tries to dereference
a poisoned pointer, the runtime delivers the SIGSEGV to
os/signal and returns to the faulting code, which faults
again, causing the process to hang instead of crashing.
Fix by rewriting "user-generated" SIGSEGV on OS X to
look like a kernel-generated SIGSEGV with fault address
0xb01dfacedebac1e.
I chose that address because (1) when printed in hex
during a crash, it is obviously spelling out English text,
(2) there are no current Google hits for that pointer,
which will make its origin easy to find once this CL
is indexed, and (3) it is not an altogether inaccurate
description of the situation.
Add a test. Maybe other systems will break too.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, iant, ken
https://golang.org/cl/83270049
Delaying the runtime.throw until here will print more information.
In particular it will print the signal and code values, which means
it will show the fault address.
The canpanic checks were added recently, in CL 75320043.
They were just not added in exactly the right place.
LGTM=iant
R=dvyukov, iant
CC=golang-codereviews
https://golang.org/cl/83980043
Brad has been asking for this for a while.
I have resisted because I wanted to find a more general way to
do this, one that would keep the performance of code introducing
variables the same as the performance of code that did not.
(See golang.org/issue/3512#c20).
I have not found the more general way, and recent changes to
remove ambiguously live temporaries have blown away the
property I was trying to preserve, so that's no longer a reason
not to make the change.
Fixes#3512.
LGTM=iant
R=iant
CC=bradfitz, golang-codereviews, khr, r
https://golang.org/cl/83740044
The software floating point runs with m->locks++
to avoid being preempted; recognize this case in panic
and undo it so that m->locks is maintained correctly
when panicking.
Fixes#7553.
LGTM=dvyukov
R=golang-codereviews, dvyukov
CC=golang-codereviews
https://golang.org/cl/84030043
The old limit of 5 was chosen because we didn't actually know how
many bytes of arguments there were; 5 was a halfway point between
printing some useful information and looking ridiculous.
Now we know how many bytes of arguments there are, and we stop
the printing when we reach that point, so the "looking ridiculous" case
doesn't happen anymore: we only print actual argument words.
The cutoff now serves only to truncate very long (but real) argument lists.
In multiple debugging sessions recently (completely unrelated bugs)
I have been frustrated by not seeing more of the long argument lists:
5 words is only 2.5 interface values or strings, and not even 2 slices.
Double the max amount we'll show.
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews, iant, r
https://golang.org/cl/83850043
Reduce footprint of liveness bitmaps by about 5x.
1. Mark all liveness bitmap symbols as 4-byte aligned
(they were aligned to a larger size by default).
2. The bitmap data is a bitmap count n followed by n bitmaps.
Each bitmap begins with its own count m giving the number
of bits. All the m's are the same for the n bitmaps.
Emit this bitmap length once instead of n times.
3. Many bitmaps within a function have the same bit values,
but each call site was given a distinct bitmap. Merge duplicate
bitmaps so that no bitmap is written more than once.
4. Many functions end up with the same aggregate bitmap data.
We used to name the bitmap data funcname.gcargs and funcname.gclocals.
Instead, name it gclocals.<md5 of data> and mark it dupok so
that the linker coalesces duplicate sets. This cut the bitmap
data remaining after step 3 by 40%; I was not expecting it to
be quite so dramatic.
Applied to "go build -ldflags -w code.google.com/p/go.tools/cmd/godoc":
bitmaps pclntab binary on disk
before this CL 1326600 1985854 12738268
4-byte align 1154288 (0.87x) 1985854 (1.00x) 12566236 (0.99x)
one bitmap len 782528 (0.54x) 1985854 (1.00x) 12193500 (0.96x)
dedup bitmap 414748 (0.31x) 1948478 (0.98x) 11787996 (0.93x)
dedup bitmap set 245580 (0.19x) 1948478 (0.98x) 11620060 (0.91x)
While here, remove various dead blocks of code from plive.c.
Fixes#6929.
Fixes#7568.
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83630044
REP MOVSQ and REP STOSQ have a really high startup overhead.
Use a Duff's device to do the repetition instead.
benchmark old ns/op new ns/op delta
BenchmarkClearFat32 7.20 1.60 -77.78%
BenchmarkCopyFat32 6.88 2.38 -65.41%
BenchmarkClearFat64 7.15 3.20 -55.24%
BenchmarkCopyFat64 6.88 3.44 -50.00%
BenchmarkClearFat128 9.53 5.34 -43.97%
BenchmarkCopyFat128 9.27 5.56 -40.02%
BenchmarkClearFat256 13.8 9.53 -30.94%
BenchmarkCopyFat256 13.5 10.3 -23.70%
BenchmarkClearFat512 22.3 18.0 -19.28%
BenchmarkCopyFat512 22.0 19.7 -10.45%
BenchmarkCopyFat1024 36.5 38.4 +5.21%
BenchmarkClearFat1024 35.1 35.0 -0.28%
TODO: use for stack frame zeroing
TODO: REP prefixes are still used for "reverse" copying when src/dst
regions overlap. Might be worth fixing.
LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews, r
https://golang.org/cl/81370046
The old code was using the PC of the instruction after the CALL.
Variables live during the call but not live when it returns would
not be seen as live during the stack copy, which might lead to
corruption. The correct PC to use is the one just before the
return address. After this CL the lookup matches what mgc0.c does.
The only time this matters is if you have back to back CALL instructions:
CALL f1 // x live here
CALL f2 // x no longer live
If a stack copy occurs during the execution of f1, the old code will
use the liveness bitmap intended for the execution of f2 and will not
treat x as live.
The only way this situation can arise and cause a problem in a stack copy
is if x lives on the stack has had its address taken but the compiler knows
enough about the context to know that x is no longer needed once f1
returns. The compiler has never known that much, so using the f2 context
cannot currently cause incorrect execution. For the same reason, it is not
possible to write a test for this today.
CL 83090046 will make the compiler precise enough in some cases
that this distinction will start mattering. The existing stack growth tests
in package runtime will fail if that CL is submitted without this one.
While we're here, print the frame PC in debug mode and update the
bitmap interpretation strings.
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83250043
GODEBUG=allocfreetrace=1:
The allocfreetrace=1 mode prints a stack trace for each block
allocated and freed, and also a stack trace for each garbage collection.
It was implemented by reusing the heap profiling support: if allocfreetrace=1
then the heap profile was effectively running at 1 sample per 1 byte allocated
(always sample). The stack being shown at allocation was the stack gathered
for profiling, meaning it was derived only from the program counters and
did not include information about function arguments or frame pointers.
The stack being shown at free was the allocation stack, not the free stack.
If you are generating this log, you can find the allocation stack yourself, but
it can be useful to see exactly the sequence that led to freeing the block:
was it the garbage collector or an explicit free? Now that the garbage collector
runs on an m0 stack, the stack trace for the garbage collector was never interesting.
Fix all these problems:
1. Decouple allocfreetrace=1 from heap profiling.
2. Print the standard goroutine stack traces instead of a custom format.
3. Print the stack trace at time of allocation for an allocation,
and print the stack trace at time of free (not the allocation trace again)
for a free.
4. Print all goroutine stacks at garbage collection. Having all the stacks
means that you can see the exact point at which each goroutine was
preempted, which is often useful for identifying liveness-related errors.
GODEBUG=gcdead=1:
This mode overwrites dead pointers with a poison value.
Detect the poison value as an invalid pointer during collection,
the same way that small integers are invalid pointers.
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/81670043