Commit Graph

333 Commits

Author SHA1 Message Date
Keith Randall 3cf83c182a undo CL 104200047 / 318b04f28372
Breaks windows and race detector.
TBR=rsc

««« original CL description
runtime: stack allocator, separate from mallocgc

In order to move malloc to Go, we need to have a
separate stack allocator.  If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.

Stacks are the last remaining FlagNoGC objects in the
GC heap.  Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)

Fixes #7468
Fixes #7424

LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
»»»

TBR=rsc
CC=golang-codereviews
https://golang.org/cl/101570044
2014-06-30 19:48:08 -07:00
Keith Randall 7c13860cd0 runtime: stack allocator, separate from mallocgc
In order to move malloc to Go, we need to have a
separate stack allocator.  If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.

Stacks are the last remaining FlagNoGC objects in the
GC heap.  Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)

Fixes #7468
Fixes #7424

LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
2014-06-30 18:59:24 -07:00
Russ Cox 89f185fe8a all: remove 'extern register M *m' from runtime
The runtime has historically held two dedicated values g (current goroutine)
and m (current thread) in 'extern register' slots (TLS on x86, real registers
backed by TLS on ARM).

This CL removes the extern register m; code now uses g->m.

On ARM, this frees up the register that formerly held m (R9).
This is important for NaCl, because NaCl ARM code cannot use R9 at all.

The Go 1 macrobenchmarks (those with per-op times >= 10 µs) are unaffected:

BenchmarkBinaryTree17              5491374955     5471024381     -0.37%
BenchmarkFannkuch11                4357101311     4275174828     -1.88%
BenchmarkGobDecode                 11029957       11364184       +3.03%
BenchmarkGobEncode                 6852205        6784822        -0.98%
BenchmarkGzip                      650795967      650152275      -0.10%
BenchmarkGunzip                    140962363      141041670      +0.06%
BenchmarkHTTPClientServer          71581          73081          +2.10%
BenchmarkJSONEncode                31928079       31913356       -0.05%
BenchmarkJSONDecode                117470065      113689916      -3.22%
BenchmarkMandelbrot200             6008923        5998712        -0.17%
BenchmarkGoParse                   6310917        6327487        +0.26%
BenchmarkRegexpMatchMedium_1K      114568         114763         +0.17%
BenchmarkRegexpMatchHard_1K        168977         169244         +0.16%
BenchmarkRevcomp                   935294971      914060918      -2.27%
BenchmarkTemplate                  145917123      148186096      +1.55%

Minux previous reported larger variations, but these were caused by
run-to-run noise, not repeatable slowdowns.

Actual code changes by Minux.
I only did the docs and the benchmarking.

LGTM=dvyukov, iant, minux
R=minux, josharian, iant, dave, bradfitz, dvyukov
CC=golang-codereviews
https://golang.org/cl/109050043
2014-06-26 11:54:39 -04:00
Russ Cox 14d2ee1d00 runtime: make continuation pc available to stack walk
The 'continuation pc' is where the frame will continue
execution, if anywhere. For a frame that stopped execution
due to a CALL instruction, the continuation pc is immediately
after the CALL. But for a frame that stopped execution due to
a fault, the continuation pc is the pc after the most recent CALL
to deferproc in that frame, or else 0. That is where execution
will continue, if anywhere.

The liveness information is only recorded for CALL instructions.
This change makes sure that we never look for liveness information
except for CALL instructions.

Using a valid PC fixes crashes when a garbage collection or
stack copying tries to process a stack frame that has faulted.

Record continuation pc in heapdump (format change).

Fixes #8048.

LGTM=iant, khr
R=khr, iant, dvyukov
CC=golang-codereviews, r
https://golang.org/cl/100870044
2014-05-31 10:10:12 -04:00
Shenghou Ma a68b9be935 runtime: fix empty heap dump bug on windows.
Fixes #8119.

LGTM=khr, rsc
R=alex.brainman, khr, bradfitz, rsc
CC=golang-codereviews
https://golang.org/cl/93640044
2014-05-31 01:09:48 -07:00
Dmitriy Vyukov a12661329b runtime: fix triggering of forced GC
mstats.last_gc is unix time now, it is compared with abstract monotonic time.
On my machine GC is forced every 5 mins regardless of last_gc.

LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, iant, rsc
https://golang.org/cl/91350045
2014-05-13 09:53:03 +04:00
Russ Cox 90093f0634 liblink: introduce TLS register on 386 and amd64
When I did the original 386 ports on Linux and OS X, I chose to
define GS-relative expressions like 4(GS) as relative to the actual
thread-local storage base, which was usually GS but might not be
(it might be FS, or it might be a different constant offset from GS or FS).

The original scope was limited but since then the rewrites have
gotten out of control. Sometimes GS is rewritten, sometimes FS.
Some ports do other rewrites to enable shared libraries and
other linking. At no point in the code is it clear whether you are
looking at the real GS/FS or some synthesized thing that will be
rewritten. The code manipulating all these is duplicated in many
places.

The first step to fixing issue 7719 is to make the code intelligible
again.

This CL adds an explicit TLS pseudo-register to the 386 and amd64.
As a register, TLS refers to the thread-local storage base, and it
can only be loaded into another register:

        MOVQ TLS, AX

An offset from the thread-local storage base is written off(reg)(TLS*1).
Semantically it is off(reg), but the (TLS*1) annotation marks this as
indexing from the loaded TLS base. This emits a relocation so that
if the linker needs to adjust the offset, it can. For example:

        MOVQ TLS, AX
        MOVQ 8(AX)(TLS*1), CX // load m into CX

On systems that support direct access to the TLS memory, this
pair of instructions can be reduced to a direct TLS memory reference:

        MOVQ 8(TLS), CX // load m into CX

The 2-instruction and 1-instruction forms correspond roughly to
ELF TLS initial exec mode and ELF TLS local exec mode, respectively.

Liblink applies this rewrite on systems that support the 1-instruction form.
The decision is made using only the operating system (and probably
the -shared flag, eventually), not the link mode. If some link modes
on a particular operating system require the 2-instruction form,
then all builds for that operating system will use the 2-instruction
form, so that the link mode decision can be delayed to link time.

Obviously it is late to be making changes like this, but I despair
of correcting issue 7719 and issue 7164 without it. To make sure
I am not changing existing behavior, I built a "hello world" program
for every GOOS/GOARCH combination we have and then worked
to make sure that the rewrite generates exactly the same binaries,
byte for byte. There are a handful of TODOs in the code marking
kludges to get the byte-for-byte property, but at least now I can
explain exactly how each binary is handled.

The targets I tested this way are:

        darwin-386
        darwin-amd64
        dragonfly-386
        dragonfly-amd64
        freebsd-386
        freebsd-amd64
        freebsd-arm
        linux-386
        linux-amd64
        linux-arm
        nacl-386
        nacl-amd64p32
        netbsd-386
        netbsd-amd64
        openbsd-386
        openbsd-amd64
        plan9-386
        plan9-amd64
        solaris-amd64
        windows-386
        windows-amd64

There were four exceptions to the byte-for-byte goal:

windows-386 and windows-amd64 have a time stamp
at bytes 137 and 138 of the header.

darwin-386 and plan9-386 have five or six modified
bytes in the middle of the Go symbol table, caused by
editing comments in runtime/sys_{darwin,plan9}_386.s.

Fixes #7164.

LGTM=iant
R=iant, aram, minux.ma, dave
CC=golang-codereviews
https://golang.org/cl/87920043
2014-04-15 13:45:39 -04:00
Russ Cox 72c5d5e756 reflect, runtime: fix crash in GC due to reflect.call + precise GC
Given
        type Outer struct {
                *Inner
                ...
        }
the compiler generates the implementation of (*Outer).M dispatching to
the embedded Inner. The implementation is logically:
        func (p *Outer) M() {
                (p.Inner).M()
        }
but since the only change here is the replacement of one pointer
receiver with another, the actual generated code overwrites the
original receiver with the p.Inner pointer and then jumps to the M
method expecting the *Inner receiver.

During reflect.Value.Call, we create an argument frame and the
associated data structures to describe it to the garbage collector,
populate the frame, call reflect.call to run a function call using
that frame, and then copy the results back out of the frame. The
reflect.call function does a memmove of the frame structure onto the
stack (to set up the inputs), runs the call, and the memmoves the
stack back to the frame structure (to preserve the outputs).

Originally reflect.call did not distinguish inputs from outputs: both
memmoves were for the full stack frame. However, in the case where the
called function was one of these wrappers, the rewritten receiver is
almost certainly a different type than the original receiver. This is
not a problem on the stack, where we use the program counter to
determine the type information and understand that during (*Outer).M
the receiver is an *Outer while during (*Inner).M the receiver in the
same memory word is now an *Inner. But in the statically typed
argument frame created by reflect, the receiver is always an *Outer.
Copying the modified receiver pointer off the stack into the frame
will store an *Inner there, and then if a garbage collection happens
to scan that argument frame before it is discarded, it will scan the
*Inner memory as if it were an *Outer. If the two have different
memory layouts, the collection will intepret the memory incorrectly.

Fix by only copying back the results.

Fixes #7725.

LGTM=khr
R=khr
CC=dave, golang-codereviews
https://golang.org/cl/85180043
2014-04-08 11:11:35 -04:00
Russ Cox b2cbf49343 runtime: fix fault during arm software floating point
The software floating point runs with m->locks++
to avoid being preempted; recognize this case in panic
and undo it so that m->locks is maintained correctly
when panicking.

Fixes #7553.

LGTM=dvyukov
R=golang-codereviews, dvyukov
CC=golang-codereviews
https://golang.org/cl/84030043
2014-04-03 15:39:48 -04:00
Russ Cox 5a23a7e52c runtime: enable 'bad pointer' check during garbage collection of Go stack frames
This is the same check we use during stack copying.
The check cannot be applied to C stack frames, even
though we do emit pointer bitmaps for the arguments,
because (1) the pointer bitmaps assume all arguments
are always live, not true of outputs during the prologue,
and (2) the pointer bitmaps encode interface values as
pointer pairs, not true of interfaces holding integers.

For the rest of the frames, however, we should hold ourselves
to the rule that a pointer marked live really is initialized.
The interface scanning already implicitly checks this
because it interprets the type word  as a valid type pointer.

This may slow things down a little because of the extra loads.
Or it may speed things up because we don't bother enqueuing
nil pointers anymore. Enough of the rest of the system is slow
right now that we can't measure it meaningfully.
Enable for now, even if it is slow, to shake out bugs in the
liveness bitmaps, and then decide whether to turn it off
for the Go 1.3 release (issue 7650 reminds us to do this).

The new m->traceback field lets us force printing of fp=
values on all goroutine stack traces when we detect a
bad pointer. This makes it easier to understand exactly
where in the frame the bad pointer is, so that we can trace
it back to a specific variable and determine what is wrong.

Update #7650

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/80860044
2014-03-27 14:06:15 -04:00
Keith Randall fff63c2448 runtime: WriteHeapDump dumps the heap to a file.
See http://golang.org/s/go13heapdump for the file format.

LGTM=rsc
R=rsc, bradfitz, dvyukov, khr
CC=golang-codereviews
https://golang.org/cl/37540043
2014-03-25 15:09:49 -07:00
Keith Randall 1b45cc45e3 runtime: redo stack map entries to avoid false retention
Change two-bit stack map entries to encode:
0 = dead
1 = scalar
2 = pointer
3 = multiword

If multiword, the two-bit entry for the following word encodes:
0 = string
1 = slice
2 = iface
3 = eface

That way, during stack scanning we can check if a string
is zero length or a slice has zero capacity.  We can avoid
following the contained pointer in those cases.  It is safe
to do so because it can never be dereferenced, and it is
desirable to do so because it may cause false retention
of the following block in memory.

Slice feature turned off until issue 7564 is fixed.

Update #7549

LGTM=rsc
R=golang-codereviews, bradfitz, rsc
CC=golang-codereviews
https://golang.org/cl/76380043
2014-03-25 14:11:34 -07:00
Russ Cox 3750904a7e runtime: use VEH, not SEH, for windows/386 exception handling
Structured Exception Handling (SEH) was the first way to handle
exceptions (memory faults, divides by zero) on Windows.
The S might as well stand for "stack-based": the implementation
interprets stack addresses in a few different ways, and it gets
subtly confused by Go's management of stacks. It's also something
that requires active maintenance during cgo switches, and we've
had bugs in that maintenance in the past.

We have recently come to believe that SEH cannot work with
Go's stack usage. See http://golang.org/issue/7325 for details.

Vectored Exception Handling (VEH) is more like a Unix signal
handler: you set it once for the whole process and forget about it.

This CL drops all the SEH code and replaces it with VEH code.
Many special cases and 7 #ifdefs disappear.

VEH was introduced in Windows XP, so Go on windows/386 will
now require Windows XP or later. The previous requirement was
Windows 2000 or later. Windows 2000 immediately preceded
Windows XP, so Windows 2000 is the only affected version.
Microsoft stopped supporting Windows 2000 in 2010.
See http://golang.org/s/win2000-golang-nuts for details.

Fixes #7325.

LGTM=alex.brainman, r
R=golang-codereviews, alex.brainman, stephen.gutekanst, dave
CC=golang-codereviews, iant, r
https://golang.org/cl/74790043
2014-03-24 21:22:16 -04:00
Aram Hăvărneanu 199e703083 runtime: fix use after close race in Solaris network poller
The Solaris network poller uses event ports, which are
level-triggered. As such, it has to re-arm itself after each
wakeup. The arming mechanism (which runs in its own thread) raced
with the closing of a file descriptor happening in a different
thread. When a network file descriptor is about to be closed,
the network poller is awaken to give it a chance to remove its
association with the file descriptor. Because the poller always
re-armed itself, it raced with code that closed the descriptor.

This change makes the network poller check before re-arming if
the file descriptor is about to be closed, in which case it will
ignore the re-arming request. It uses the per-PollDesc lock in
order to serialize access to the PollDesc.

This change also adds extensive documentation describing the
Solaris implementation of the network poller.

Fixes #7410.

LGTM=dvyukov, iant
R=golang-codereviews, bradfitz, iant, dvyukov, aram.h, gobot
CC=golang-codereviews
https://golang.org/cl/69190044
2014-03-14 17:53:05 +04:00
Anthony Martin 41aa887be5 runtime: fix signal handling on Plan 9
LGTM=rsc
R=rsc, 0intro, aram, jeremyjackins, iant
CC=golang-codereviews, lucio.dere, minux.ma, paurea, r
https://golang.org/cl/9796043
2014-03-13 09:00:12 -07:00
Dmitriy Vyukov e678ab4e37 runtime: detect stack split after fork
This check would allowed to easily prevent issue 7511.
Update #7511

LGTM=rsc
R=rsc, aram
CC=golang-codereviews
https://golang.org/cl/75260043
2014-03-13 17:41:08 +04:00
Dmitriy Vyukov 1569628725 runtime: harden conditions when runtime panics on crash
This is especially important for SetPanicOnCrash,
but also useful for e.g. nil deref in mallocgc.
Panics on such crashes can't lead to anything useful,
only to deadlocks, hangs and obscure crashes.
This is a copy of broken but already LGTMed
https://golang.org/cl/68540043/

TBR=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/75320043
2014-03-13 13:25:59 +04:00
Dave Cheney 6431be3fe4 runtime: more Native Client fixes
Thanks to Ian for spotting these.

runtime.h: define uintreg correctly.
stack.c: address warning caused by the type of uintreg being 32 bits on amd64p32.

Commentary (mainly for my own use)

nacl/amd64p32 defines a machine with 64bit registers, but address space is limited to a 4gb window (the window is placed randomly inside the full 48 bit virtual address space of a process). To cope with this 6c defines _64BIT and _64BITREG.

_64BITREG is always defined by 6c, so both GOARCH=amd64 and GOARCH=amd64p32 use 64bit wide registers.

However _64BIT itself is only defined when 6c is compiling for amd64 targets. The definition is elided for amd64p32 environments causing int, uint and other arch specific types to revert to their 32bit definitions.

LGTM=iant
R=iant, rsc, remyoudompheng
CC=golang-codereviews
https://golang.org/cl/72860046
2014-03-11 14:43:10 +11:00
Shenghou Ma 84570aa9a1 runtime: round stack size to power of 2.
Fixes build on windows/386 and plan9/386.
Fixes #7487.

LGTM=mattn.jp, dvyukov, rsc
R=golang-codereviews, mattn.jp, dvyukov, 0intro, rsc
CC=golang-codereviews
https://golang.org/cl/72360043
2014-03-07 15:11:16 -05:00
Dmitriy Vyukov 1a89e6388c runtime: refactor and fix stack management code
There are at least 3 bugs:
1. g->stacksize accounting is broken during copystack/shrinkstack
2. stktop->free is not properly maintained during copystack/shrinkstack
3. stktop->free logic is broken:
        we can have stktop->free==FixedStack,
        and we will free it into stack cache,
        but it actually comes from heap as the result of non-copying segment shrink
This shows as at least spurious races on race builders (maybe something else as well I don't know).

The idea behind the refactoring is to consolidate stacksize and
segment origin logic in stackalloc/stackfree.

Fixes #7490.

LGTM=rsc, khr
R=golang-codereviews, rsc, khr
CC=golang-codereviews
https://golang.org/cl/72440043
2014-03-07 20:52:29 +04:00
Dmitriy Vyukov f946a7ca09 runtime: fix memory corruption and leak in recursive panic handling
Recursive panics leave dangling Panic structs in g->panic stack.
At best it leads to a Defer leak and incorrect output on a subsequent panic.
At worst it arbitrary corrupts heap.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/72480043
2014-03-07 20:50:30 +04:00
Dmitriy Vyukov a1695d2ea3 runtime: use custom thunks for race calls instead of cgo
Implement custom assembly thunks for hot race calls (memory accesses and function entry/exit).
The thunks extract caller pc, verify that the address is in heap or global and switch to g0 stack.

Before:
ok  	regexp	3.692s
ok  	compress/bzip2	9.461s
ok  	encoding/json	6.380s
After:
ok  	regexp	2.229s (-40%)
ok  	compress/bzip2	4.703s (-50%)
ok  	encoding/json	3.629s (-43%)

For comparison, normal non-race build:
ok  	regexp	0.348s
ok  	compress/bzip2	0.304s
ok  	encoding/json	0.661s
Race build:
ok  	regexp	2.229s (+540%)
ok  	compress/bzip2	4.703s (+1447%)
ok  	encoding/json	3.629s (+449%)

Also removes some race-related special cases from cgocall and scheduler.
In long-term it will allow to remove cyclic runtime/race dependency on cmd/cgo.

Fixes #4249.
Fixes #7460.
Update #6508
Update #6688

R=iant, rsc, bradfitz
CC=golang-codereviews
https://golang.org/cl/55100044
2014-03-06 23:48:30 +04:00
Russ Cox 1249d3a518 runtime: handle Go calls C calls Go panic correctly on windows/386
32-bit Windows uses "structured exception handling" (SEH) to
handle hardware faults: that there is a per-thread linked list
of fault handlers maintained in user space instead of
something like Unix's signal handlers. The structures in the
linked list are required to live on the OS stack, and the
usual discipline is that the function that pushes a record
(allocated from the current stack frame) onto the list pops
that record before returning. Not to pop the entry before
returning creates a dangling pointer error: the list head
points to a stack frame that no longer exists.

Go pushes an SEH record in the top frame of every OS thread,
and that record suffices for all Go execution on that thread,
at least until cgo gets involved.

If we call into C using cgo, that called C code may push its
own SEH records, but by the convention it must pop them before
returning back to the Go code. We assume it does, and that's
fine.

If the C code calls back into Go, we want the Go SEH handler
to become active again, not whatever C has set up. So
runtime.callbackasm1, which handles a call from C back into
Go, pushes a new SEH record before calling the Go code and
pops it when the Go code returns. That's also fine.

It can happen that when Go calls C calls Go like this, the
inner Go code panics. We allow a defer in the outer Go to
recover the panic, effectively wiping not only the inner Go
frames but also the C calls. This sequence was not popping the
SEH stack up to what it was before the cgo calls, so it was
creating the dangling pointer warned about above. When
eventually the m stack was used enough to overwrite the
dangling SEH records, the SEH chain was lost, and any future
panic would not end up in Go's handler.

The bug in TestCallbackPanic and friends was thus creating a
situation where TestSetPanicOnFault - which causes a hardware
fault - would not find the Go fault handler and instead crash
the binary.

Add checks to TestCallbackPanicLocked to diagnose the mistake
in that test instead of leaving a bad state for another test
case to stumble over.

Fix bug by restoring SEH chain during deferred "endcgo"
cleanup.

This bug is likely present in Go 1.2.1, but since it depends
on Go calling C calling Go, with the inner Go panicking and
the outer Go recovering the panic, it seems not important
enough to bother fixing before Go 1.3. Certainly no one has
complained.

Fixes #7470.

LGTM=alex.brainman
R=golang-codereviews, alex.brainman
CC=golang-codereviews, iant, khr
https://golang.org/cl/71440043
2014-03-05 11:10:40 -05:00
Keith Randall 1665b006a5 runtime: grow stack by copying
On stack overflow, if all frames on the stack are
copyable, we copy the frames to a new stack twice
as large as the old one.  During GC, if a G is using
less than 1/4 of its stack, copy the stack to a stack
half its size.

TODO
- Do something about C frames.  When a C frame is in the
  stack segment, it isn't copyable.  We allocate a new segment
  in this case.
  - For idempotent C code, we can abort it, copy the stack,
    then retry.  I'm working on a separate CL for this.
  - For other C code, we can raise the stackguard
    to the lowest Go frame so the next call that Go frame
    makes triggers a copy, which will then succeed.
- Pick a starting stack size?

The plan is that eventually we reach a point where the
stack contains only copyable frames.

LGTM=rsc
R=dvyukov, rsc
CC=golang-codereviews
https://golang.org/cl/54650044
2014-02-26 23:28:44 -08:00
Keith Randall 3b5278fca6 runtime: get rid of the settype buffer and lock.
MCaches	now hold a MSpan for each sizeclass which they have
exclusive access to allocate from, so no lock is needed.

Modifying the heap bitmaps also no longer requires a cas.

runtime.free gets more expensive.  But we don't use it
much any more.

It's not much faster on 1 processor, but it's a lot
faster on multiple processors.

benchmark                 old ns/op    new ns/op    delta
BenchmarkSetTypeNoPtr1           24           23   -0.42%
BenchmarkSetTypeNoPtr2           33           34   +0.89%
BenchmarkSetTypePtr1             51           49   -3.72%
BenchmarkSetTypePtr2             55           54   -1.98%

benchmark                old ns/op    new ns/op    delta
BenchmarkAllocation          52739        50770   -3.73%
BenchmarkAllocation-2        33957        34141   +0.54%
BenchmarkAllocation-3        33326        29015  -12.94%
BenchmarkAllocation-4        38105        25795  -32.31%
BenchmarkAllocation-5        68055        24409  -64.13%
BenchmarkAllocation-6        71544        23488  -67.17%
BenchmarkAllocation-7        68374        23041  -66.30%
BenchmarkAllocation-8        70117        20758  -70.40%

LGTM=rsc, dvyukov
R=dvyukov, bradfitz, khr, rsc
CC=golang-codereviews
https://golang.org/cl/46810043
2014-02-26 15:52:58 -08:00
Russ Cox e8fe1cce66 runtime, net: fixes from CL 68490043 review
These are mistakes in the first big NaCl CL.

LGTM=minux.ma, iant
R=golang-codereviews, minux.ma, iant
CC=golang-codereviews
https://golang.org/cl/69200043
2014-02-26 12:21:31 -05:00
Dave Cheney 7c8280c9ef all: merge NaCl branch (part 1)
See golang.org/s/go13nacl for design overview.

This CL is the mostly mechanical changes from rsc's Go 1.2 based NaCl branch, specifically 39cb35750369 to 500771b477cf from https://code.google.com/r/rsc-go13nacl. This CL does not include working NaCl support, there are probably two or three more large merges to come.

CL 15750044 is not included as it involves more invasive changes to the linker which will need to be merged separately.

The exact change lists included are

15050047: syscall: support for Native Client
15360044: syscall: unzip implementation for Native Client
15370044: syscall: Native Client SRPC implementation
15400047: cmd/dist, cmd/go, go/build, test: support for Native Client
15410048: runtime: support for Native Client
15410049: syscall: file descriptor table for Native Client
15410050: syscall: in-memory file system for Native Client
15440048: all: update +build lines for Native Client port
15540045: cmd/6g, cmd/8g, cmd/gc: support for Native Client
15570045: os: support for Native Client
15680044: crypto/..., hash/crc32, reflect, sync/atomic: support for amd64p32
15690044: net: support for Native Client
15690048: runtime: support for fake time like on Go Playground
15690051: build: disable various tests on Native Client

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/68150047
2014-02-25 09:47:42 -05:00
Aram Hăvărneanu 50df136483 runtime, net: add support for GOOS=solaris
LGTM=dave, rsc
R=golang-codereviews, minux.ma, mikioh.mikioh, dave, iant, rsc
CC=golang-codereviews
https://golang.org/cl/36030043
2014-02-24 22:31:01 -05:00
Russ Cox e56c6e7535 runtime/debug: add SetPanicOnFault
SetPanicOnFault allows recovery from unexpected memory faults.
This can be useful if you are using a memory-mapped file
or probing the address space of the current program.

LGTM=r
R=r
CC=golang-codereviews
https://golang.org/cl/66590044
2014-02-20 16:18:05 -05:00
Russ Cox 67c83db60d runtime: use goc2c as much as possible
Package runtime's C functions written to be called from Go
started out written in C using carefully constructed argument
lists and the FLUSH macro to write a result back to memory.

For some functions, the appropriate parameter list ended up
being architecture-dependent due to differences in alignment,
so we added 'goc2c', which takes a .goc file containing Go func
declarations but C bodies, rewrites the Go func declaration to
equivalent C declarations for the target architecture, adds the
needed FLUSH statements, and writes out an equivalent C file.
That C file is compiled as part of package runtime.

Native Client's x86-64 support introduces the most complex
alignment rules yet, breaking many functions that could until
now be portably written in C. Using goc2c for those avoids the
breakage.

Separately, Keith's work on emitting stack information from
the C compiler would require the hand-written functions
to add #pragmas specifying how many arguments are result
parameters. Using goc2c for those avoids maintaining #pragmas.

For both reasons, use goc2c for as many Go-called C functions
as possible.

This CL is a replay of the bulk of CL 15400047 and CL 15790043,
both of which were reviewed as part of the NaCl port and are
checked in to the NaCl branch. This CL is part of bringing the
NaCl code into the main tree.

No new code here, just reformatting and occasional movement
into .h files.

LGTM=r
R=dave, alex.brainman, r
CC=golang-codereviews
https://golang.org/cl/65220044
2014-02-20 15:58:47 -05:00
Dmitriy Vyukov 5e72fae9b2 runtime: improve cpu profiles for GC/syscalls/cgo
Current "System->etext" is not very informative.
Add parent "GC" frame.
Replace un-unwindable syscall/cgo frames with Go stack that leads to the call.

LGTM=rsc
R=rsc, alex.brainman, ality
CC=golang-codereviews
https://golang.org/cl/61270043
2014-02-12 22:31:36 +04:00
Dmitriy Vyukov 2ea859a779 runtime: refactor level-triggered IO support
Remove GOOS_solaris ifdef from netpoll code,
instead introduce runtime edge/level triggered IO flag.
Replace armread/armwrite with a single arm(mode) function,
that's how all other interfaces look like and these functions
will need to do roughly the same thing anyway.

LGTM=rsc
R=golang-codereviews, dave, rsc
CC=golang-codereviews
https://golang.org/cl/55500044
2014-02-12 22:24:29 +04:00
Dmitriy Vyukov e1ee04828d runtime: refactor chan code
1. Make internal chan functions static.
2. Move selgen local variable instead of a member of G struct.
3. Change "bool *pres/selected" parameter of chansend/chanrecv to "bool block",
   which is simpler, faster and less code.
-37 lines total.

LGTM=rsc
R=golang-codereviews, dave, gobot, rsc
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/58610043
2014-02-12 22:21:38 +04:00
Dmitriy Vyukov 0229dc6dbe runtime: do not cpu profile idle threads on windows
Currently this leads to a significant skew towards 'etext' entry,
since all idle threads are profiled every tick.
Before:
Total: 66608 samples
   63188  94.9%  94.9%    63188  94.9% etext
     278   0.4%  95.3%      278   0.4% sweepspan
     216   0.3%  95.6%      448   0.7% runtime.mallocgc
     122   0.2%  95.8%      122   0.2% scanblock
     113   0.2%  96.0%      113   0.2% net/textproto.canonicalMIMEHeaderKey
After:
Total: 8008 samples
    3949  49.3%  49.3%     3949  49.3% etext
     231   2.9%  52.2%      231   2.9% scanblock
     211   2.6%  54.8%      211   2.6% runtime.cas64
     182   2.3%  57.1%      408   5.1% runtime.mallocgc
     178   2.2%  59.3%      178   2.2% runtime.atomicload64

LGTM=alex.brainman
R=golang-codereviews, alex.brainman
CC=golang-codereviews
https://golang.org/cl/61250043
2014-02-10 15:40:55 +04:00
Dmitriy Vyukov 86a3a54284 runtime: fix windows build
Currently windows crashes because early allocs in schedinit
try to allocate tiny memory blocks, but m->p is not yet setup.
I've considered calling procresize(1) earlier in schedinit,
but this refactoring is better and must fix the issue as well.
Fixes #7218.

R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54570045
2014-01-28 00:26:56 +04:00
Dmitriy Vyukov 1fa7029425 runtime: combine small NoScan allocations
Combine NoScan allocations < 16 bytes into a single memory block.
Reduces number of allocations on json/garbage benchmarks by 10+%.

json-1
allocated                 8039872      7949194      -1.13%
allocs                     105774        93776     -11.34%
cputime                 156200000    100700000     -35.53%
gc-pause-one              4908873      3814853     -22.29%
gc-pause-total            2748969      2899288      +5.47%
rss                      52674560     43560960     -17.30%
sys-gc                    3796976      3256304     -14.24%
sys-heap                 43843584     35192832     -19.73%
sys-other                 5589312      5310784      -4.98%
sys-stack                  393216       393216      +0.00%
sys-total                53623088     44153136     -17.66%
time                    156193436    100886714     -35.41%
virtual-mem             256548864    256540672      -0.00%

garbage-1
allocated                 2996885      2932982      -2.13%
allocs                      62904        55200     -12.25%
cputime                  17470000     17400000      -0.40%
gc-pause-one            932757485    925806143      -0.75%
gc-pause-total            4663787      4629030      -0.75%
rss                    1151074304   1133670400      -1.51%
sys-gc                   66068352     65085312      -1.49%
sys-heap               1039728640   1024065536      -1.51%
sys-other                38038208     37485248      -1.45%
sys-stack                 8650752      8781824      +1.52%
sys-total              1152485952   1135417920      -1.48%
time                     17478088     17418005      -0.34%
virtual-mem            1343709184   1324204032      -1.45%

LGTM=iant, bradfitz
R=golang-codereviews, dave, iant, rsc, bradfitz
CC=golang-codereviews, khr
https://golang.org/cl/38750047
2014-01-24 22:35:11 +04:00
Keith Randall be5d2d4432 runtime: Print elision message if we skipped frames on traceback.
Fixes bug 7180

R=golang-codereviews, dvyukov
CC=golang-codereviews, gri
https://golang.org/cl/55810044
2014-01-23 12:47:30 -08:00
Dmitriy Vyukov 9cbd2fb1aa runtime: remove locks from netpoll hotpaths
Introduces two-phase goroutine parking mechanism -- prepare to park, commit park.
This mechanism does not require backing mutex to protect wait predicate.
Use it in netpoll. See comment in netpoll.goc for details.
This slightly reduces contention between reader, writer and read/write io notifications;
and just eliminates a bunch of mutex operations from hotpaths, thus making then faster.

benchmark                             old ns/op    new ns/op    delta
BenchmarkTCP4ConcurrentReadWrite           2109         1945   -7.78%
BenchmarkTCP4ConcurrentReadWrite-2         1162         1113   -4.22%
BenchmarkTCP4ConcurrentReadWrite-4          798          755   -5.39%
BenchmarkTCP4ConcurrentReadWrite-8          803          748   -6.85%
BenchmarkTCP4Persistent                    9411         9240   -1.82%
BenchmarkTCP4Persistent-2                  5888         5813   -1.27%
BenchmarkTCP4Persistent-4                  4016         3968   -1.20%
BenchmarkTCP4Persistent-8                  3943         3857   -2.18%

R=golang-codereviews, mikioh.mikioh, gobot, iant, rsc
CC=golang-codereviews, khr
https://golang.org/cl/45700043
2014-01-22 11:27:16 +04:00
Dmitriy Vyukov 98b50b89a8 runtime: allocate goroutine ids in batches
Helps reduce contention on sched.goidgen.

benchmark                               old ns/op    new ns/op    delta
BenchmarkCreateGoroutines-16                  259          237   -8.49%
BenchmarkCreateGoroutinesParallel-16          127           43  -66.06%

R=golang-codereviews, dave, bradfitz, khr
CC=golang-codereviews, rsc
https://golang.org/cl/46970043
2014-01-22 10:34:36 +04:00
Dmitriy Vyukov 8a3c587dc1 runtime: fix and improve CPU profiling
- do not lose profiling signals when we have no mcache (possible for syscalls/cgo)
- do not lose any profiling signals on windows
- fix profiling of cgo programs on windows (they had no m->thread setup)
- properly setup tls in cgo programs on windows
- check _beginthread return value

Fixes #6417.
Fixes #6986.

R=alex.brainman, rsc
CC=golang-codereviews
https://golang.org/cl/44820047
2014-01-22 10:30:10 +04:00
Dmitriy Vyukov cb133c6607 runtime: do not collect GC roots explicitly
Currently we collect (add) all roots into a global array in a single-threaded GC phase.
This hinders parallelism.
With this change we just kick off parallel for for number_of_goroutines+5 iterations.
Then parallel for callback decides whether it needs to scan stack of a goroutine
scan data segment, scan finalizers, etc. This eliminates the single-threaded phase entirely.
This requires to store all goroutines in an array instead of a linked list
(to allow direct indexing).
This CL also removes DebugScan functionality. It is broken because it uses
unbounded stack, so it can not run on g0. When it was working, I've found
it helpless for debugging issues because the two algorithms are too different now.
This change would require updating the DebugScan, so it's simpler to just delete it.

With 8 threads this change reduces GC pause by ~6%, while keeping cputime roughly the same.

garbage-8
allocated                 2987886      2989221      +0.04%
allocs                      62885        62887      +0.00%
cputime                  21286000     21272000      -0.07%
gc-pause-one             26633247     24885421      -6.56%
gc-pause-total             873570       811264      -7.13%
rss                     242089984    242515968      +0.18%
sys-gc                   13934336     13869056      -0.47%
sys-heap                205062144    205062144      +0.00%
sys-other                12628288     12628288      +0.00%
sys-stack                11534336     11927552      +3.41%
sys-total               243159104    243487040      +0.13%
time                      2809477      2740795      -2.44%

R=golang-codereviews, rsc
CC=cshapiro, golang-codereviews, khr
https://golang.org/cl/46860043
2014-01-21 13:06:57 +04:00
Dmitriy Vyukov 1ba04c171a runtime: per-P defer pool
Instead of a per-goroutine stack of defers for all sizes,
introduce per-P defer pool for argument sizes 8, 24, 40, 56, 72 bytes.

For a program that starts 1e6 goroutines and then joins then:
old: rss=6.6g virtmem=10.2g time=4.85s
new: rss=4.5g virtmem= 8.2g time=3.48s

R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/42750044
2014-01-21 11:20:23 +04:00
Aram Hăvărneanu a46b434931 runtime: add support for GOOS=solaris
R=alex.brainman, dave, jsing, gobot, rsc
CC=golang-codereviews
https://golang.org/cl/35990043
2014-01-17 17:58:10 +13:00
Dmitriy Vyukov c0b9e6218c runtime: output how long goroutines are blocked
Example of output:

goroutine 4 [sleep for 3 min]:
time.Sleep(0x34630b8a000)
        src/pkg/runtime/time.goc:31 +0x31
main.func·002()
        block.go:16 +0x2c
created by main.main
        block.go:17 +0x33

Full program and output are here:
http://play.golang.org/p/NEZdADI3Td

Fixes #6809.

R=golang-codereviews, khr, kamil.kisiel, bradfitz, rsc
CC=golang-codereviews
https://golang.org/cl/50420043
2014-01-16 12:54:46 +04:00
Dmitriy Vyukov 4722b1cbd3 runtime: use lock-free ring for work queues
Use lock-free fixed-size ring for work queues
instead of an unbounded mutex-protected array.
The ring has single producer and multiple consumers.
If the ring overflows, work is put onto global queue.

benchmark              old ns/op    new ns/op    delta
BenchmarkMatmult               7            5  -18.12%
BenchmarkMatmult-4             2            2  -18.98%
BenchmarkMatmult-16            1            0  -12.84%

BenchmarkCreateGoroutines                     105           88  -16.10%
BenchmarkCreateGoroutines-4                   376          219  -41.76%
BenchmarkCreateGoroutines-16                  241          174  -27.80%
BenchmarkCreateGoroutinesParallel             103           87  -14.66%
BenchmarkCreateGoroutinesParallel-4           169          143  -15.38%
BenchmarkCreateGoroutinesParallel-16          158          151   -4.43%

R=golang-codereviews, rsc
CC=ddetlefs, devon.odell, golang-codereviews
https://golang.org/cl/46170044
2014-01-16 12:17:00 +04:00
Keith Randall 020b39c3f3 runtime: use special records hung off the MSpan to
record finalizers and heap profile info.  Enables
removing the special bit from the heap bitmap.  Also
provides a generic mechanism for annotating occasional
heap objects.

finalizers
        overhead      per obj
old	680 B	      80 B avg
new	16 B/span     48 B

profile
        overhead      per obj
old	32KB	      24 B + hash tables
new	16 B/span     24 B

R=cshapiro, khr, dvyukov, gobot
CC=golang-codereviews
https://golang.org/cl/13314053
2014-01-07 13:45:50 -08:00
Alex Brainman 7f8a5057dd syscall: add NewCallbackCDecl again
Fixes #6338

R=golang-dev, kin.wilson.za, rsc
CC=golang-dev
https://golang.org/cl/36180044
2013-12-19 14:38:50 +11:00
Carl Shapiro 76c54c1193 runtime: add GODEBUG option for an electric fence like heap mode
When enabled this new debugging mode will allocate objects on
their own page and never recycle memory addresses.  This is an
essential tool to root cause a broad class of heap corruption.

R=golang-dev, dave, daniel.morsing, dvyukov, rsc, iant, cshapiro
CC=golang-dev
https://golang.org/cl/22060046
2013-12-06 14:40:45 -08:00
Carl Shapiro 48279bd567 runtime: add an allocation and free tracing for gc debugging
Output for an allocation and free (sweep) follows

MProf_Malloc(p=0xc2100210a0, size=0x50, type=0x0 <single object>)
        #0 0x46ee15 runtime.mallocgc /usr/local/google/home/cshapiro/go/src/pkg/runtime/malloc.goc:141
        #1 0x47004f runtime.settype_flush /usr/local/google/home/cshapiro/go/src/pkg/runtime/malloc.goc:612
        #2 0x45f92c gc /usr/local/google/home/cshapiro/go/src/pkg/runtime/mgc0.c:2071
        #3 0x45f89e mgc /usr/local/google/home/cshapiro/go/src/pkg/runtime/mgc0.c:2050
        #4 0x45258b runtime.mcall /usr/local/google/home/cshapiro/go/src/pkg/runtime/asm_amd64.s:179

MProf_Free(p=0xc2100210a0, size=0x50)
        #0 0x46ee15 runtime.mallocgc /usr/local/google/home/cshapiro/go/src/pkg/runtime/malloc.goc:141
        #1 0x47004f runtime.settype_flush /usr/local/google/home/cshapiro/go/src/pkg/runtime/malloc.goc:612
        #2 0x45f92c gc /usr/local/google/home/cshapiro/go/src/pkg/runtime/mgc0.c:2071
        #3 0x45f89e mgc /usr/local/google/home/cshapiro/go/src/pkg/runtime/mgc0.c:2050
        #4 0x45258b runtime.mcall /usr/local/google/home/cshapiro/go/src/pkg/runtime/asm_amd64.s:179

R=golang-dev, dvyukov, rsc, cshapiro
CC=golang-dev
https://golang.org/cl/21990045
2013-12-03 14:42:38 -08:00
Keith Randall 3278dc158e runtime: pass key/value to map accessors by reference, not by value.
This change is part of the plan to get rid of all vararg C calls
which are a pain for getting exact stack scanning.

We allocate a chunk of zero memory to return a pointer to when a
map access doesn't find the key.  This is simpler than returning nil
and fixing things up in the caller.  Linker magic allocates a single
zero memory area that is shared by all (non-reflect-generated) map
types.

Passing things by reference gets rid of some copies, so it speeds
up code with big keys/values.

benchmark             old ns/op    new ns/op    delta
BenchmarkBigKeyMap           34           31   -8.48%
BenchmarkBigValMap           37           30  -18.62%
BenchmarkSmallKeyMap         26           23  -11.28%

R=golang-dev, dvyukov, khr, rsc
CC=golang-dev
https://golang.org/cl/14794043
2013-12-02 13:05:04 -08:00