1902 Commits

Author SHA1 Message Date
Keith Randall 711d1ad7ee runtime: be a lot more lenient on smhasher avalanche test.
Fixes #7943

LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/98170043
2014-05-09 15:50:57 -07:00
Keith Randall 65c63dc4aa runtime: write memory profile statistics to the heap dump.
LGTM=rsc
R=rsc, khr
CC=golang-codereviews
https://golang.org/cl/97010043
2014-05-08 08:35:49 -07:00
Keith Randall 51b72d94de runtime: use duff zero and copy to initialize memory
benchmark                 old ns/op     new ns/op     delta
BenchmarkCopyFat512       1307          329           -74.83%
BenchmarkCopyFat256       666           169           -74.62%
BenchmarkCopyFat1024      2617          671           -74.36%
BenchmarkCopyFat128       343           89.0          -74.05%
BenchmarkCopyFat64        182           48.9          -73.13%
BenchmarkCopyFat32        103           28.8          -72.04%
BenchmarkClearFat128      102           46.6          -54.31%
BenchmarkClearFat512      344           167           -51.45%
BenchmarkClearFat64       50.5          26.5          -47.52%
BenchmarkClearFat256      147           87.2          -40.68%
BenchmarkClearFat32       22.7          16.4          -27.75%
BenchmarkClearFat1024     511           662           +29.55%

Fixes #7624

LGTM=rsc
R=golang-codereviews, khr, bradfitz, josharian, dave, rsc
CC=golang-codereviews
https://golang.org/cl/92760044
2014-05-07 13:17:10 -07:00
Dmitriy Vyukov acb03b8028 runtime: optimize markspan
Increases throughput by 2x for a memory-hungry program on an 8-node NUMA machine.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/100230043
2014-05-07 19:32:34 +04:00
Dmitriy Vyukov c0bf96e6b1 runtime: fix bug in cpu profiler
Number of lost samples was overcounted (never reset).
Also remove unused variable (it's trivial to restore it for debugging if needed).

LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews, rsc
https://golang.org/cl/96060043
2014-05-07 18:48:14 +04:00
Robert Griesemer f3913624a7 std lib: fix various typos in comments
Where the spelling changed from British to
US norm (e.g., optimise -> optimize) it follows
the style in that file.

LGTM=adonovan
R=golang-codereviews, adonovan
CC=golang-codereviews
https://golang.org/cl/96980043
2014-05-02 13:17:55 -07:00
Alan Donovan 28c515f40f runtime: fix bug in GOTRACEBACK=crash causing suppression of core dumps.
Because gotraceback is called early and often, its cache commits to the value of getenv("GOTRACEBACK") before getenv is even ready.  So now we reset its cache once getenv becomes ready.  Panicking programs now dump core again.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/97800045
2014-05-02 13:06:58 -04:00
Dmitriy Vyukov 8afa086ce6 runtime: do not set m->locks around memory allocation
If slice append is the only place where a program allocates,
then it will consume all available memory w/o triggering GC.
This was demonstrated in the issue.
Fixes #7922.

LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews, iant, khr
https://golang.org/cl/91010048
2014-05-02 17:39:25 +01:00
Dmitriy Vyukov 350a8fcde1 runtime: make MemStats.LastGC Unix time again
The monotonic clock patch changed all runtime times
to abstract monotonic time. As a result, the user-visible
MemStats.LastGC became monotonic time as well.
Restore Unix time for LastGC.

This is the simplest way to expose time.now to runtime that I found.
Another option would be to change time.now to C called
int64 runtime.unixnanotime() and then express time.now in terms of it.
But this would require introducing two 64-bit divisions into time.now.
Another option would be to change time.now to C called
void runtime.unixnanotime1(struct {int64 sec, int32 nsec} *now)
and then express both time.now and runtime.unixnanotime in terms of it.

Fixes #7852.

LGTM=minux.ma, iant
R=minux.ma, rsc, iant
CC=golang-codereviews
https://golang.org/cl/93720045
2014-05-02 17:32:42 +01:00
Keith Randall e9977dad45 runtime: correctly type interface data.
The backing memory for >1 word interfaces was being scanned
conservatively.

LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/94000043
2014-05-01 09:37:55 -04:00
Keith Randall 29d1b211fd runtime: clean up scanning of Gs
Use a real type for Gs instead of scanning them conservatively.
Zero the schedlink pointer when it is dead.

Update #7820

LGTM=rsc
R=rsc, dvyukov
CC=golang-codereviews
https://golang.org/cl/89360043
2014-04-28 12:47:09 -04:00
Keith Randall 573cfe9561 runtime: heapdump - make sure spans are swept before dumping.
LGTM=rsc
R=golang-codereviews, adonovan, rsc
CC=golang-codereviews
https://golang.org/cl/90440043
2014-04-28 12:45:00 -04:00
Mark Zavislak 800d8adf35 runtime: fix typo in error message
LGTM=robert.hencke, iant
R=golang-codereviews, robert.hencke, iant
CC=golang-codereviews
https://golang.org/cl/89760043
2014-04-21 08:55:23 -07:00
Rémy Oudompheng 1332eb5b62 runtime/race: add test for issue 7561.
LGTM=dvyukov
R=rsc, iant, khr, dvyukov
CC=golang-codereviews
https://golang.org/cl/76520045
2014-04-21 17:21:09 +02:00
Shenghou Ma 1bb4f37fce runtime, go/build: re-enable cgo on FreeBSD.
Fixes #7331.

LGTM=dave, iant
R=golang-codereviews, dave, gobot, iant
CC=golang-codereviews
https://golang.org/cl/89150043
2014-04-21 00:09:22 -04:00
Shenghou Ma d31d19765b runtime, cmd/ld, cmd/5l, run.bash: enable external linking on FreeBSD/ARM.
Update #7331

LGTM=dave, iant
R=golang-codereviews, dave, gobot, iant
CC=golang-codereviews
https://golang.org/cl/89520043
2014-04-21 00:08:59 -04:00
Alex Brainman 6e8c7f5bb2 cmd/nm: print symbol sizes for windows pe executables
Fixes #6973

LGTM=r
R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/88820043
2014-04-19 14:47:20 +10:00
Russ Cox ade6bc68b0 runtime: crash when func main calls Goexit and all other goroutines exit
This has typically crashed in the past, although usually with
an 'all goroutines are asleep - deadlock!' message that shows
no goroutines (because there aren't any).

Previous discussion at:
https://groups.google.com/d/msg/golang-nuts/uCT_7WxxopQ/BoSBlLFzUTkJ
https://groups.google.com/d/msg/golang-dev/KUojayEr20I/u4fp_Ej5PdUJ
http://golang.org/issue/7711

There is general agreement that runtime.Goexit terminates the
main goroutine, so that main cannot return, so the program does
not exit.

The interpretation that all other goroutines exiting causes an
exit(0) is relatively new and was not part of those discussions.
That is what this CL changes.

Thankfully, even though the exit(0) has been there for a while,
some other accounting bugs made it very difficult to trigger,
so it is reasonable to replace. In particular, see golang.org/issue/7711#c10
for an examination of the behavior across past releases.

Fixes #7711.

LGTM=iant, r
R=golang-codereviews, iant, dvyukov, r
CC=golang-codereviews
https://golang.org/cl/88210044
2014-04-16 13:12:18 -04:00
Russ Cox a5b1530557 runtime: adjust GC debug print to include source pointers
Having the pointers means you can grub around in the
binary finding out more about them.

This helped with issue 7748.

LGTM=minux.ma, bradfitz
R=golang-codereviews, minux.ma, bradfitz
CC=golang-codereviews
https://golang.org/cl/88090045
2014-04-16 11:39:43 -04:00
Russ Cox 90093f0634 liblink: introduce TLS register on 386 and amd64
When I did the original 386 ports on Linux and OS X, I chose to
define GS-relative expressions like 4(GS) as relative to the actual
thread-local storage base, which was usually GS but might not be
(it might be FS, or it might be a different constant offset from GS or FS).

The original scope was limited but since then the rewrites have
gotten out of control. Sometimes GS is rewritten, sometimes FS.
Some ports do other rewrites to enable shared libraries and
other linking. At no point in the code is it clear whether you are
looking at the real GS/FS or some synthesized thing that will be
rewritten. The code manipulating all these is duplicated in many
places.

The first step to fixing issue 7719 is to make the code intelligible
again.

This CL adds an explicit TLS pseudo-register to the 386 and amd64.
As a register, TLS refers to the thread-local storage base, and it
can only be loaded into another register:

        MOVQ TLS, AX

An offset from the thread-local storage base is written off(reg)(TLS*1).
Semantically it is off(reg), but the (TLS*1) annotation marks this as
indexing from the loaded TLS base. This emits a relocation so that
if the linker needs to adjust the offset, it can. For example:

        MOVQ TLS, AX
        MOVQ 8(AX)(TLS*1), CX // load m into CX

On systems that support direct access to the TLS memory, this
pair of instructions can be reduced to a direct TLS memory reference:

        MOVQ 8(TLS), CX // load m into CX

The 2-instruction and 1-instruction forms correspond roughly to
ELF TLS initial exec mode and ELF TLS local exec mode, respectively.

Liblink applies this rewrite on systems that support the 1-instruction form.
The decision is made using only the operating system (and probably
the -shared flag, eventually), not the link mode. If some link modes
on a particular operating system require the 2-instruction form,
then all builds for that operating system will use the 2-instruction
form, so that the link mode decision can be delayed to link time.

Obviously it is late to be making changes like this, but I despair
of correcting issue 7719 and issue 7164 without it. To make sure
I am not changing existing behavior, I built a "hello world" program
for every GOOS/GOARCH combination we have and then worked
to make sure that the rewrite generates exactly the same binaries,
byte for byte. There are a handful of TODOs in the code marking
kludges to get the byte-for-byte property, but at least now I can
explain exactly how each binary is handled.

The targets I tested this way are:

        darwin-386
        darwin-amd64
        dragonfly-386
        dragonfly-amd64
        freebsd-386
        freebsd-amd64
        freebsd-arm
        linux-386
        linux-amd64
        linux-arm
        nacl-386
        nacl-amd64p32
        netbsd-386
        netbsd-amd64
        openbsd-386
        openbsd-amd64
        plan9-386
        plan9-amd64
        solaris-amd64
        windows-386
        windows-amd64

There were four exceptions to the byte-for-byte goal:

windows-386 and windows-amd64 have a time stamp
at bytes 137 and 138 of the header.

darwin-386 and plan9-386 have five or six modified
bytes in the middle of the Go symbol table, caused by
editing comments in runtime/sys_{darwin,plan9}_386.s.

Fixes #7164.

LGTM=iant
R=iant, aram, minux.ma, dave
CC=golang-codereviews
https://golang.org/cl/87920043
2014-04-15 13:45:39 -04:00
Dmitriy Vyukov 55e0f36fb4 runtime: fix program termination when main goroutine calls Goexit
Do not consider idle finalizer/bgsweep/timer goroutines as doing something useful.
We can't simply set isbackground for the whole lifetime of the goroutines,
because when finalizer goroutine calls user function, we do want to consider it
as doing something useful.
This has been broken due to timers for quite some time.
With background sweep it became even more broken.
Fixes #7784.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/87960044
2014-04-15 19:48:17 +04:00
Dmitriy Vyukov 8fc6ed4c89 sync: less aggressive local caching in Pool
Currently Pool can cache up to 15 elements per P, and these elements are not accessible to other Ps.
If a Pool caches large objects, say 2MB, and GOMAXPROCS is set to a large value, say 32,
then the Pool can waste up to 960MB.
The new caching policy caches at most 1 element per P; the rest are shared between Ps.

Get/Put performance is unchanged. Nested Get/Put performance is 57% worse.
However, overall scalability of nested Get/Put is significantly improved,
so the new policy starts winning under contention.

benchmark                     old ns/op     new ns/op     delta
BenchmarkPool                 27.4          26.7          -2.55%
BenchmarkPool-4               6.63          6.59          -0.60%
BenchmarkPool-16              1.98          1.87          -5.56%
BenchmarkPool-64              1.93          1.86          -3.63%
BenchmarkPoolOverlflow        3970          6235          +57.05%
BenchmarkPoolOverlflow-4      10935         1668          -84.75%
BenchmarkPoolOverlflow-16     13419         520           -96.12%
BenchmarkPoolOverlflow-64     10295         380           -96.31%
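
For readers unfamiliar with the Get/Put pattern being benchmarked, the following is a minimal sketch of ordinary sync.Pool usage. The names bufPool and greet are made up for illustration, and the sketch says nothing about the per-P caching internals changed by this CL.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool (hypothetical) hands out reusable buffers. New is called
// only when the pool has nothing cached to return.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// greet borrows a buffer, uses it, and returns it to the pool.
func greet(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold old contents
	defer bufPool.Put(buf)
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so reuse of buf is safe
}

func main() {
	var wg sync.WaitGroup
	for _, name := range []string{"gopher", "plan9", "runtime"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			fmt.Println(greet(name))
		}(name)
	}
	wg.Wait()
}
```

Callers must not assume that Get returns a previously Put value; the pool is free to drop cached items at any time.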

LGTM=rsc
R=rsc
CC=golang-codereviews, khr
https://golang.org/cl/86020043
2014-04-14 21:13:32 +04:00
Russ Cox 72185093f6 runtime: increase timeout in TestStackGrowth
It looks like maybe on slower builders 4 seconds is not enough.
Trying to get rid of the flaky failures.

TBR=iant
CC=golang-codereviews
https://golang.org/cl/86870044
2014-04-13 20:19:10 -04:00
Russ Cox 9d81ade223 runtime: make stack growth test shorter
It runs too long in -short mode.

Disable the one in init, because it doesn't respect -short.

Make the part that claims to test execution in a finalizer
actually execute the test in the finalizer.

LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=aram.h, golang-codereviews, iant, khr
https://golang.org/cl/86550045
2014-04-11 00:08:07 -04:00
Russ Cox 5539ef02b6 runtime: make times in GODEBUG=gctrace=1 output clearer
TBR=0intro
CC=golang-codereviews
https://golang.org/cl/86620043
2014-04-10 14:34:48 -04:00
David du Colombier d7ac73c869 runtime: no longer skip stack growth test in short mode
We originally decided to skip this test in short mode
to prevent the parallel runtime test from timing out on the
Plan 9 builder. This should no longer be required since
the issue was fixed in CL 86210043.

LGTM=dave, bradfitz
R=dvyukov, dave, bradfitz
CC=golang-codereviews, rsc
https://golang.org/cl/84790044
2014-04-10 06:37:30 +02:00
David du Colombier 5a51306170 runtime: fix semasleep on Plan 9
If you pass ns = 100,000 to this function, timediv will
return ms = 0. tsemacquire in /sys/src/9/port/sysproc.c
will return immediately when ms == 0 and the semaphore
cannot be acquired immediately - it doesn't sleep - so
notetsleep will spin, chewing cpu and repeatedly reading
the time, until the 100us have passed.

Thanks to the time reads it won't take too many iterations,
but whatever we are waiting for does not get a chance to
run. Eventually the notetsleep spin loop returns and we
end up in the stoptheworld spin loop - actually a sleep
loop but we're not doing a good job of sleeping.

After 100ms or so of this, the kernel says enough and
schedules a different thread. That thread manages to do
whatever we're waiting for, and the spinning in the other
thread stops. If tsemacquire had actually slept, this
would have happened much quicker.
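
The arithmetic behind the problem can be sketched as follows. This message does not spell out the fix itself, so the rounding-up in nsToMs below is an assumption about one plausible remedy, and both function names are hypothetical.

```go
package main

import "fmt"

// truncatedMs mirrors the problematic conversion: integer division
// turns any wait shorter than one millisecond into 0.
func truncatedMs(ns int64) int32 {
	return int32(ns / 1000000)
}

// nsToMs (assumed remedy, not taken from the CL) rounds a positive
// sub-millisecond wait up to 1 ms so that the caller really sleeps.
func nsToMs(ns int64) int32 {
	ms := truncatedMs(ns)
	if ms == 0 && ns > 0 {
		ms = 1
	}
	return ms
}

func main() {
	fmt.Println(truncatedMs(100000)) // 0: tsemacquire would return at once
	fmt.Println(nsToMs(100000))      // 1: shortest wait that still sleeps
	fmt.Println(nsToMs(5000000))     // 5: longer waits are unchanged
}
```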

Many thanks to Russ Cox for help debugging.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/86210043
2014-04-10 06:36:20 +02:00
Russ Cox 95ee7d6414 runtime: use 3x fewer nanotime calls in garbage collection
Cuts the number of calls from 6 to 2 in the non-debug case.

LGTM=iant
R=golang-codereviews, iant
CC=0intro, aram, golang-codereviews, khr
https://golang.org/cl/86040043
2014-04-09 10:38:12 -04:00
Russ Cox e688e7128d runtime: fix flaky linux/386 build
TBR=iant
CC=golang-codereviews
https://golang.org/cl/86030043
2014-04-09 10:02:55 -04:00
David du Colombier a07f6adda8 runtime: fix GOTRACEBACK on Plan 9
Getenv() should not call malloc when called from
gotraceback(). Instead, we return a static buffer
in this case, with enough room to hold the longest
value.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/85680043
2014-04-09 06:41:14 +02:00
Russ Cox 5556bfa9c7 runtime: cache gotraceback setting
On Plan 9, gotraceback calls getenv, which calls malloc, and we call gotraceback
on every call to gentraceback, which happens during garbage collection.
Honestly I don't even know how this works on Plan 9.
I suspect it does not, and that we are getting by because
no one has tried to run with $GOTRACEBACK set at all.

This will speed up all the other systems by epsilon, since they
won't call getenv and atoi repeatedly.

LGTM=bradfitz
R=golang-codereviews, bradfitz, 0intro
CC=golang-codereviews
https://golang.org/cl/85430046
2014-04-08 22:35:41 -04:00
Russ Cox 72c5d5e756 reflect, runtime: fix crash in GC due to reflect.call + precise GC
Given
        type Outer struct {
                *Inner
                ...
        }
the compiler generates the implementation of (*Outer).M dispatching to
the embedded Inner. The implementation is logically:
        func (p *Outer) M() {
                (p.Inner).M()
        }
but since the only change here is the replacement of one pointer
receiver with another, the actual generated code overwrites the
original receiver with the p.Inner pointer and then jumps to the M
method expecting the *Inner receiver.

During reflect.Value.Call, we create an argument frame and the
associated data structures to describe it to the garbage collector,
populate the frame, call reflect.call to run a function call using
that frame, and then copy the results back out of the frame. The
reflect.call function does a memmove of the frame structure onto the
stack (to set up the inputs), runs the call, and then memmoves the
stack back to the frame structure (to preserve the outputs).

Originally reflect.call did not distinguish inputs from outputs: both
memmoves were for the full stack frame. However, in the case where the
called function was one of these wrappers, the rewritten receiver is
almost certainly a different type than the original receiver. This is
not a problem on the stack, where we use the program counter to
determine the type information and understand that during (*Outer).M
the receiver is an *Outer while during (*Inner).M the receiver in the
same memory word is now an *Inner. But in the statically typed
argument frame created by reflect, the receiver is always an *Outer.
Copying the modified receiver pointer off the stack into the frame
will store an *Inner there, and then if a garbage collection happens
to scan that argument frame before it is discarded, it will scan the
*Inner memory as if it were an *Outer. If the two have different
memory layouts, the collection will interpret the memory incorrectly.

Fix by only copying back the results.
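
A minimal sketch of the kind of program that exercises this path: a method promoted from an embedded *Inner, invoked through reflect.Value.Call. The field names and the callM helper are illustrative only, and the sketch does not attempt to reproduce the crash.

```go
package main

import (
	"fmt"
	"reflect"
)

type Inner struct{ n int }

// M has an *Inner receiver; the compiler generates a wrapper so that
// (*Outer).M dispatches to it.
func (in *Inner) M() int { return in.n }

type Outer struct {
	pad [4]int // makes Outer's memory layout differ from Inner's
	*Inner
}

// callM (hypothetical helper) invokes method M on v via reflection.
func callM(v interface{}) int {
	m := reflect.ValueOf(v).MethodByName("M")
	out := m.Call(nil) // builds an argument frame and runs the call
	return int(out[0].Int())
}

func main() {
	o := &Outer{Inner: &Inner{n: 42}}
	fmt.Println(o.M())    // direct call through the wrapper
	fmt.Println(callM(o)) // same call through reflect
}
```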

Fixes #7725.

LGTM=khr
R=khr
CC=dave, golang-codereviews
https://golang.org/cl/85180043
2014-04-08 11:11:35 -04:00
Dmitriy Vyukov 9e1cadad0f runtime/race: more precise handling of channel synchronization
It turns out there is a relatively common pattern that relies on
an inverted channel semaphore:

gate := make(chan bool, N)
for ... {
        // limit concurrency
        gate <- true
        go func() {
                foo(...)
                <-gate
        }()
}
// join all goroutines
for i := 0; i < N; i++ {
        gate <- true
}

So handle synchronization on inverted semaphores with cap>1.
Fixes #7718.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/84880046
2014-04-08 10:18:20 +04:00
Keith Randall fc6753c7cd runtime: make sure associated defers are copyable before trying to copy a stack.
Defers generated from cgo lie to us about their argument layout.
Mark those defers as not copyable.

CL 83820043 contains an additional test for this code and should be
checked in (and enabled) after this change is in.

Fixes bug 7695.

LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/84740043
2014-04-07 17:40:00 -07:00
Keith Randall af923df89e runtime: fix heapdump bugs.
Iterate the right number of times in arrays and channels.
Handle channels with zero-sized objects in them.
Output longer type names if we have them.
Compute argument offset correctly.

LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/82980043
2014-04-07 17:35:44 -07:00
Keith Randall 1daa2520bf runtime: fix plan9 warning.
I have no idea what this code is for, but it pretty
clearly needs to be uint64, not uint32.

LGTM=aram
R=0intro, aram
CC=golang-codereviews
https://golang.org/cl/84410043
2014-04-04 08:15:27 -07:00
Russ Cox 28f1868fed cmd/gc, runtime: make GODEBUG=gcdead=1 mode work with liveness
Trying to make GODEBUG=gcdead=1 work with liveness
and in particular ambiguously live variables.

1. In the liveness computation, mark all ambiguously live
variables as live for the entire function, except the entry.
They are zeroed directly after entry, and we need them not
to be poisoned thereafter.

2. In the liveness computation, compute liveness (and deadness)
for all parameters, not just pointer-containing parameters.
Otherwise gcdead poisons untracked scalar parameters and results.

3. Fix liveness debugging print for -live=2 to use correct bitmaps.
(Was not updated for compaction during compaction CL.)

4. Correct varkill during map literal initialization.
Was killing the map itself instead of the inserted value temp.

5. Disable aggressive varkill cleanup for call arguments if
the call appears in a defer or go statement.

6. In the garbage collector, avoid bug scanning empty
strings. An empty string is two zeros. The multiword
code only looked at the first zero and then interpreted
the next two bits in the bitmap as an ordinary word bitmap.
For a string the bits are 11 00, so if a live string was zero
length with a 0 base pointer, the poisoning code treated
the length as an ordinary word with code 00, meaning it
needed poisoning, turning the string into a poison-length
string with base pointer 0. By the same logic I believe that
a live nil slice (bits 11 01 00) will have its cap poisoned.
Always scan full multiword struct.

7. In the runtime, treat both poison words (PoisonGC and
PoisonStack) as invalid pointers that warrant crashes.

Manual testing as follows:

- Create a script called gcdead on your PATH containing:

        #!/bin/bash
        GODEBUG=gcdead=1 GOGC=10 GOTRACEBACK=2 exec "$@"
- Now you can build a test and then run 'gcdead ./foo.test'.
- More importantly, you can run 'go test -short -exec gcdead std'
   to run all the tests.

Fixes #7676.

While here, enable the precise scanning of slices, since that was
disabled due to bugs like these. That now works, both with and
without gcdead.

Fixes #7549.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83410044
2014-04-03 20:33:25 -04:00
Russ Cox 17f9423e75 runtime: test malformed address fault and fix on OS X
The garbage collector poison pointers
(0x6969696969696969 and 0x6868686868686868)
are malformed addresses on amd64.
That is, they are not 48-bit addresses sign extended
to 64 bits. This causes a different kind of hardware fault
than the usual 'unmapped page' when accessing such
an address, and OS X 10.9.2 sends the resulting SIGSEGV
incorrectly, making it look like it was user-generated
rather than kernel-generated and does not include the
faulting address. This means that in GODEBUG=gcdead=1
mode, if there is a bug and something tries to dereference
a poisoned pointer, the runtime delivers the SIGSEGV to
os/signal and returns to the faulting code, which faults
again, causing the process to hang instead of crashing.
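
The "48-bit address sign extended to 64 bits" condition can be checked with a couple of shifts. The sketch below is illustrative only; isCanonical is a made-up name, not a runtime function.

```go
package main

import "fmt"

// isCanonical reports whether addr is a 48-bit address sign extended
// to 64 bits, that is, whether bits 63..47 are all equal.
func isCanonical(addr uint64) bool {
	// Shift the top 16 bits out, then shift back arithmetically so
	// that bit 47 is copied into bits 48..63.
	return int64(addr<<16)>>16 == int64(addr)
}

func main() {
	for _, a := range []uint64{
		0x6969696969696969, // GC poison value
		0x6868686868686868, // GC poison value
		0x00007fffffffffff, // highest canonical low-half address
		0xffff800000000000, // lowest canonical high-half address
	} {
		fmt.Printf("%#016x canonical=%v\n", a, isCanonical(a))
	}
}
```

Both poison values fail the check, which is why dereferencing them raises a different kind of fault than an unmapped page does.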

Fix by rewriting "user-generated" SIGSEGV on OS X to
look like a kernel-generated SIGSEGV with fault address
0xb01dfacedebac1e.

I chose that address because (1) when printed in hex
during a crash, it is obviously spelling out English text,
(2) there are no current Google hits for that pointer,
which will make its origin easy to find once this CL
is indexed, and (3) it is not an altogether inaccurate
description of the situation.

Add a test. Maybe other systems will break too.

LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, iant, ken
https://golang.org/cl/83270049
2014-04-03 19:07:33 -04:00
Russ Cox 4110271501 runtime: handle fault during runtime more like unexpected fault address
Delaying the runtime.throw until here will print more information.
In particular it will print the signal and code values, which means
it will show the fault address.

The canpanic checks were added recently, in CL 75320043.
They were just not added in exactly the right place.

LGTM=iant
R=dvyukov, iant
CC=golang-codereviews
https://golang.org/cl/83980043
2014-04-03 19:05:59 -04:00
Russ Cox f5f5a8b620 cmd/gc, runtime: optimize map[string] lookup from []byte key
Brad has been asking for this for a while.
I have resisted because I wanted to find a more general way to
do this, one that would keep the performance of code introducing
variables the same as the performance of code that did not.
(See golang.org/issue/3512#c20).

I have not found the more general way, and recent changes to
remove ambiguously live temporaries have blown away the
property I was trying to preserve, so that's no longer a reason
not to make the change.
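
The message does not show the code shape involved; the sketch below assumes it is a lookup written directly as m[string(b)], where the conversion appears inside the index expression. The function and variable names are illustrative.

```go
package main

import "fmt"

// countWord looks up a []byte key in a map[string]int. Writing the
// conversion directly in the index expression is the form assumed
// here to benefit from the optimization.
func countWord(counts map[string]int, word []byte) int {
	return counts[string(word)]
}

func main() {
	counts := map[string]int{"go": 3, "plan9": 1}
	fmt.Println(countWord(counts, []byte("go")))   // 3
	fmt.Println(countWord(counts, []byte("rust"))) // 0
}
```

Storing the converted string in a variable first and indexing with that variable is the "code introducing variables" case the message refers to.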

Fixes #3512.

LGTM=iant
R=iant
CC=bradfitz, golang-codereviews, khr, r
https://golang.org/cl/83740044
2014-04-03 19:05:17 -04:00
Russ Cox 0e1b6bb547 runtime: use mincore correctly in addrspace_free
Fixes #7476.

LGTM=iant
R=iant
CC=golang-codereviews
https://golang.org/cl/84000043
2014-04-03 19:04:47 -04:00
Russ Cox b2cbf49343 runtime: fix fault during arm software floating point
The software floating point runs with m->locks++
to avoid being preempted; recognize this case in panic
and undo it so that m->locks is maintained correctly
when panicking.

Fixes #7553.

LGTM=dvyukov
R=golang-codereviews, dvyukov
CC=golang-codereviews
https://golang.org/cl/84030043
2014-04-03 15:39:48 -04:00
Russ Cox c40480ddd9 runtime: print up to 10 words of arguments
The old limit of 5 was chosen because we didn't actually know how
many bytes of arguments there were; 5 was a halfway point between
printing some useful information and looking ridiculous.

Now we know how many bytes of arguments there are, and we stop
the printing when we reach that point, so the "looking ridiculous" case
doesn't happen anymore: we only print actual argument words.
The cutoff now serves only to truncate very long (but real) argument lists.

In multiple debugging sessions recently (completely unrelated bugs)
I have been frustrated by not seeing more of the long argument lists:
5 words is only 2.5 interface values or strings, and not even 2 slices.
Double the max amount we'll show.
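
The word counts quoted above can be checked with unsafe.Sizeof. This sketch assumes the gc toolchain's layouts (two words for a string or interface value, three for a slice); the helper names are made up.

```go
package main

import (
	"fmt"
	"unsafe"
)

// wordSize is the size of one machine word in bytes.
const wordSize = unsafe.Sizeof(uintptr(0))

func stringWords() uintptr { var s string; return unsafe.Sizeof(s) / wordSize }
func ifaceWords() uintptr  { var i interface{}; return unsafe.Sizeof(i) / wordSize }
func sliceWords() uintptr  { var b []byte; return unsafe.Sizeof(b) / wordSize }

func main() {
	fmt.Println("string words:   ", stringWords())
	fmt.Println("interface words:", ifaceWords())
	fmt.Println("slice words:    ", sliceWords())

	// Whole values that fit under the old and new printing limits.
	for _, limit := range []uintptr{5, 10} {
		fmt.Printf("limit %d: %d strings, %d slices\n",
			limit, limit/stringWords(), limit/sliceWords())
	}
}
```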

LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews, iant, r
https://golang.org/cl/83850043
2014-04-02 23:00:40 -04:00
Dave Cheney 9121e7e4df runtime: check that new slice cap doesn't overflow
Fixes #7550.

LGTM=iant
R=golang-codereviews, iant, josharian
CC=golang-codereviews
https://golang.org/cl/83520043
2014-04-03 13:44:44 +11:00
Russ Cox 81bc9b3ffd runtime: revert change to PoisonPtr value
Submitted accidentally in CL 83630044.
Fixes various builds.

TBR=khr
CC=golang-codereviews
https://golang.org/cl/83100047
2014-04-02 16:55:30 -04:00
Russ Cox 4676fae525 cmd/gc, cmd/ld, runtime: compact liveness bitmaps
Reduce footprint of liveness bitmaps by about 5x.

1. Mark all liveness bitmap symbols as 4-byte aligned
(they were aligned to a larger size by default).

2. The bitmap data is a bitmap count n followed by n bitmaps.
Each bitmap begins with its own count m giving the number
of bits. All the m's are the same for the n bitmaps.
Emit this bitmap length once instead of n times.

3. Many bitmaps within a function have the same bit values,
but each call site was given a distinct bitmap. Merge duplicate
bitmaps so that no bitmap is written more than once.

4. Many functions end up with the same aggregate bitmap data.
We used to name the bitmap data funcname.gcargs and funcname.gclocals.
Instead, name it gclocals.<md5 of data> and mark it dupok so
that the linker coalesces duplicate sets. This cut the bitmap
data remaining after step 3 by 40%; I was not expecting it to
be quite so dramatic.
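
Step 4 amounts to content-addressed naming. A minimal sketch, assuming a hex-encoded digest in the symbol name (the exact encoding used by the compiler is not stated here) and using made-up helper names:

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// symbolName derives a name from the bitmap contents alone, so that
// identical data always maps to the same symbol.
func symbolName(data []byte) string {
	return fmt.Sprintf("gclocals.%x", md5.Sum(data))
}

// dedupe keeps one copy of each distinct bitmap set, keyed by name.
// This stands in for the linker coalescing dupok symbols.
func dedupe(bitmaps [][]byte) map[string][]byte {
	out := make(map[string][]byte)
	for _, b := range bitmaps {
		out[symbolName(b)] = b
	}
	return out
}

func main() {
	sets := [][]byte{{1, 0}, {1, 0}, {0, 1}, {1, 0}}
	unique := dedupe(sets)
	fmt.Printf("%d sets in, %d symbols out\n", len(sets), len(unique))
}
```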

Applied to "go build -ldflags -w code.google.com/p/go.tools/cmd/godoc":

                bitmaps           pclntab           binary on disk
before this CL  1326600           1985854           12738268
4-byte align    1154288 (0.87x)   1985854 (1.00x)   12566236 (0.99x)
one bitmap len   782528 (0.54x)   1985854 (1.00x)   12193500 (0.96x)
dedup bitmap     414748 (0.31x)   1948478 (0.98x)   11787996 (0.93x)
dedup bitmap set 245580 (0.19x)   1948478 (0.98x)   11620060 (0.91x)

While here, remove various dead blocks of code from plive.c.

Fixes #6929.
Fixes #7568.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83630044
2014-04-02 16:49:27 -04:00
Dmitriy Vyukov f4ef6977ff runtime: ignore pointers to global objects in SetFinalizer
Update #7656

LGTM=rsc
R=rsc, iant
CC=golang-codereviews
https://golang.org/cl/82560043
2014-04-02 10:19:28 +04:00
Keith Randall 6c7cbf086c runtime: get rid of most uses of REP for copying/zeroing.
REP MOVSQ and REP STOSQ have a really high startup overhead.
Use a Duff's device to do the repetition instead.

benchmark                 old ns/op     new ns/op     delta
BenchmarkClearFat32       7.20          1.60          -77.78%
BenchmarkCopyFat32        6.88          2.38          -65.41%
BenchmarkClearFat64       7.15          3.20          -55.24%
BenchmarkCopyFat64        6.88          3.44          -50.00%
BenchmarkClearFat128      9.53          5.34          -43.97%
BenchmarkCopyFat128       9.27          5.56          -40.02%
BenchmarkClearFat256      13.8          9.53          -30.94%
BenchmarkCopyFat256       13.5          10.3          -23.70%
BenchmarkClearFat512      22.3          18.0          -19.28%
BenchmarkCopyFat512       22.0          19.7          -10.45%
BenchmarkCopyFat1024      36.5          38.4          +5.21%
BenchmarkClearFat1024     35.1          35.0          -0.28%
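
The idea of a Duff's device, entering an unrolled run of stores part-way through instead of looping, can be imitated in Go with switch and fallthrough. This is only an analogy for the generated machine code; zeroTail and its fixed 8-word buffer are illustrative assumptions.

```go
package main

import "fmt"

// zeroTail clears the last n words of buf (0 <= n <= 8) by jumping
// into the middle of an unrolled sequence of stores. There is no
// loop counter and no per-iteration branch.
func zeroTail(buf *[8]uint64, n int) {
	switch 8 - n {
	case 0:
		buf[0] = 0
		fallthrough
	case 1:
		buf[1] = 0
		fallthrough
	case 2:
		buf[2] = 0
		fallthrough
	case 3:
		buf[3] = 0
		fallthrough
	case 4:
		buf[4] = 0
		fallthrough
	case 5:
		buf[5] = 0
		fallthrough
	case 6:
		buf[6] = 0
		fallthrough
	case 7:
		buf[7] = 0
	}
}

func main() {
	buf := [8]uint64{1, 2, 3, 4, 5, 6, 7, 8}
	zeroTail(&buf, 3)
	fmt.Println(buf) // [1 2 3 4 5 0 0 0]
}
```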

TODO: use for stack frame zeroing
TODO: REP prefixes are still used for "reverse" copying when src/dst
regions overlap.  Might be worth fixing.

LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews, r
https://golang.org/cl/81370046
2014-04-01 12:51:02 -07:00
Russ Cox cfb347fc0a runtime: use correct pc to obtain liveness info during stack copy
The old code was using the PC of the instruction after the CALL.
Variables live during the call but not live when it returns would
not be seen as live during the stack copy, which might lead to
corruption. The correct PC to use is the one just before the
return address. After this CL the lookup matches what mgc0.c does.

The only time this matters is if you have back to back CALL instructions:

        CALL f1 // x live here
        CALL f2 // x no longer live

If a stack copy occurs during the execution of f1, the old code will
use the liveness bitmap intended for the execution of f2 and will not
treat x as live.
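
The off-by-one can be modelled with a toy PC-indexed table. Everything in this sketch (the addresses, the pcRange type, and the lookup helpers) is hypothetical and far simpler than the runtime's real pcln tables.

```go
package main

import "fmt"

// pcRange says which variables are live for PCs in [start, end).
type pcRange struct {
	start, end uintptr
	live       []string
}

// Two back-to-back 5-byte CALL instructions at made-up addresses.
var table = []pcRange{
	{0x100, 0x105, []string{"x"}}, // CALL f1: x live here
	{0x105, 0x10a, nil},           // CALL f2: x no longer live
}

func liveAt(pc uintptr) []string {
	for _, r := range table {
		if pc >= r.start && pc < r.end {
			return r.live
		}
	}
	return nil
}

// oldLookup uses the return address itself, which already belongs
// to the next instruction.
func oldLookup(retaddr uintptr) []string { return liveAt(retaddr) }

// newLookup uses the PC just before the return address, which still
// falls inside the CALL being executed.
func newLookup(retaddr uintptr) []string { return liveAt(retaddr - 1) }

func main() {
	const retaddr = 0x105 // return address pushed by CALL f1
	fmt.Println("old:", oldLookup(retaddr)) // old: []
	fmt.Println("new:", newLookup(retaddr)) // new: [x]
}
```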

The only way this situation can arise and cause a problem in a stack copy
is if x lives on the stack and has had its address taken but the compiler knows
enough about the context to know that x is no longer needed once f1
returns. The compiler has never known that much, so using the f2 context
cannot currently cause incorrect execution. For the same reason, it is not
possible to write a test for this today.

CL 83090046 will make the compiler precise enough in some cases
that this distinction will start mattering. The existing stack growth tests
in package runtime will fail if that CL is submitted without this one.

While we're here, print the frame PC in debug mode and update the
bitmap interpretation strings.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83250043
2014-04-01 14:57:58 -04:00
Russ Cox 1ec4d5e9e7 runtime: adjust GODEBUG=allocfreetrace=1 and GODEBUG=gcdead=1
GODEBUG=allocfreetrace=1:

The allocfreetrace=1 mode prints a stack trace for each block
allocated and freed, and also a stack trace for each garbage collection.

It was implemented by reusing the heap profiling support: if allocfreetrace=1
then the heap profile was effectively running at 1 sample per 1 byte allocated
(always sample). The stack being shown at allocation was the stack gathered
for profiling, meaning it was derived only from the program counters and
did not include information about function arguments or frame pointers.
The stack being shown at free was the allocation stack, not the free stack.
If you are generating this log, you can find the allocation stack yourself, but
it can be useful to see exactly the sequence that led to freeing the block:
was it the garbage collector or an explicit free? Now that the garbage collector
runs on an m0 stack, the stack trace for the garbage collector was never interesting.

Fix all these problems:

1. Decouple allocfreetrace=1 from heap profiling.
2. Print the standard goroutine stack traces instead of a custom format.
3. Print the stack trace at time of allocation for an allocation,
   and print the stack trace at time of free (not the allocation trace again)
   for a free.
4. Print all goroutine stacks at garbage collection. Having all the stacks
   means that you can see the exact point at which each goroutine was
   preempted, which is often useful for identifying liveness-related errors.

GODEBUG=gcdead=1:

This mode overwrites dead pointers with a poison value.
Detect the poison value as an invalid pointer during collection,
the same way that small integers are invalid pointers.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/81670043
2014-04-01 13:30:10 -04:00