Also fix a bunch of bugs:
1. Accesses to last_gc must be atomic (it's int64).
2. last_gc still can be 0 during first checks in sysmon, check for 0.
3. forcegc.g can be unitialized when sysmon accesses it:
forcegc.g is initialized by main goroutine (forcegc.g = newproc1(...)),
and main goroutine is unsynchronized with both sysmon and forcegc goroutine.
Initialize forcegc.g in the forcegc goroutine itself instead.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/136770043
Mutex is consistent with package sync, and when in the
unexported Go form it avoids having a conflcit between
the type (now mutex) and the function (lock).
LGTM=iant
R=golang-codereviews, iant
CC=dvyukov, golang-codereviews, r
https://golang.org/cl/133140043
uintptr or uint64 in the runtime C were turning into uint in the Go,
bool was turning into uint8, and so on. Fix that.
Also delete Go wrappers for C functions.
The C functions can be called directly now
(but still eventually need to be converted to Go).
LGTM=bradfitz, minux, iant
R=golang-codereviews, bradfitz, iant, minux
CC=golang-codereviews, khr, r
https://golang.org/cl/138740043
Every change to g->atomicstatus is now done atomically so that we can
ensure that all gs pass through a gc safepoint on demand. This allows
the GC to move from one phase to the next safely. In some phases the
stack will be scanned. This CL only deals with the infrastructure that
allows g->atomicstatus to go from one state to another. Future CLs
will deal with scanning and monitoring what phase the GC is in.
The major change was to moving to using a Gscan bit to indicate that
the status is in a scan state. The only bug fix was in oldstack where
I wasn't moving to a Gcopystack state in order to block scanning until
the new stack was in place. The proc.go file is waiting for an atomic
load instruction.
LGTM=rsc
R=golang-codereviews, dvyukov, josharian, rsc
CC=golang-codereviews, khr
https://golang.org/cl/132960044
Once and for all.
Broken in cl/108640043.
I've messed it before. To test scavenger-related changes
one needs to alter the constants during final testing.
And then it's very easy to submit with the altered constants.
LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, rsc
https://golang.org/cl/136720044
Before, a slice with cap=0 or a string with len=0 might have its
base pointer pointing beyond the actual slice/string data into
the next block. The collector had to ignore slices and strings with
cap=0 in order to avoid misinterpreting the base pointer.
Now, a slice with cap=0 or a string with len=0 still has a base
pointer pointing into the actual slice/string data, no matter what.
The collector can now always scan the pointer, which means
strings and slices are no longer special.
Fixes#8404.
LGTM=khr, josharian
R=josharian, khr, dvyukov
CC=golang-codereviews
https://golang.org/cl/112570044
A whole thread is too much for background scavenger that sleeps all the time anyway.
We already have sysmon thread that can do this work.
Also remove g->isbackground and simplify enter/exitsyscall.
LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews, khr, rlh
https://golang.org/cl/108640043
These are required for chans, semaphores, timers, etc.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rlh, rsc
https://golang.org/cl/123640043
Half the code in the garbage collector accesses the bitmap
as an array of bytes instead of as an array of uintptrs.
This is tricky to do correctly in a portable fashion,
it breaks on big-endian systems.
Make the bitmap a byte array.
Simplifies markallocated, scanblock and span sweep along the way,
as we don't need to recalculate bitmap position for each word.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rlh, rsc
https://golang.org/cl/125250043
Eliminating use of this extension makes it easier to port the Go runtime
to other compilers. This CL also disables the extension in cc to prevent
accidental use.
LGTM=rsc, khr
R=rsc, aram, khr, dvyukov
CC=axwalk, golang-codereviews
https://golang.org/cl/106790044
FlagNoGC is unused now.
FlagNoInvokeGC is unneeded as we don't invoke GC
on g0 and when holding locks anyway.
mal/malloc have very few uses and you never remember
the exact set of flags they use and the difference between them.
Moreover, eventually we need to give exact types to all allocations,
something what mal/malloc do not support.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/117580043
The implementation 'return 0' results in too many collisions.
LGTM=khr
R=golang-codereviews, adonovan, khr
CC=golang-codereviews, iant, khr, r
https://golang.org/cl/125720044
Create proper closures so hash functions can be called
directly from Go. Rearrange calling convention so return
value is directly accessible.
LGTM=dvyukov
R=golang-codereviews, dvyukov, dave, khr
CC=golang-codereviews
https://golang.org/cl/119360043
Several reasons:
1. Significantly simplifies runtime.
2. This code proved to be buggy.
3. Free is incompatible with bump-the-pointer allocation.
4. We want to write runtime in Go, Go does not have free.
5. Too much code to free env strings on startup.
LGTM=khr
R=golang-codereviews, josharian, tracey.brendan, khr
CC=bradfitz, golang-codereviews, r, rlh, rsc
https://golang.org/cl/116390043
This change introduces gomallocgc, a Go clone of mallocgc.
Only a few uses have been moved over, so there are still
lots of uses from C. Many of these C uses will be moved
over to Go (e.g. in slice.goc), but probably not all.
What should remain of C's mallocgc is an open question.
LGTM=rsc, dvyukov
R=rsc, khr, dave, bradfitz, dvyukov
CC=golang-codereviews
https://golang.org/cl/108840046
Implement the design described in:
https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbKcg/pub
Summary of the changes:
GC uses "2-bits per word" pointer type info embed directly into bitmap.
Scanning of stacks/data/heap is unified.
The old spans types go away.
Compiler generates "sparse" 4-bits type info for GC (directly for GC bitmap).
Linker generates "dense" 2-bits type info for data/bss (the same as stacks use).
Summary of results:
-1680 lines of code total (-1000+ in mgc0.c only)
-25% memory consumption
-3-7% binary size
-15% GC pause reduction
-7% run time reduction
LGTM=khr
R=golang-codereviews, rsc, christoph, khr
CC=golang-codereviews, rlh
https://golang.org/cl/106260045
redo stack allocation. This is mostly the same as
the original CL with a few bug fixes.
1. add racemalloc() for stack allocations
2. fix poolalloc/poolfree to terminate free lists correctly.
3. adjust span ref count correctly.
4. don't use cache for sizes >= StackCacheSize.
Should fix bugs and memory leaks in original changelist.
««« original CL description
undo CL 104200047 / 318b04f28372
Breaks windows and race detector.
TBR=rsc
««« original CL description
runtime: stack allocator, separate from mallocgc
In order to move malloc to Go, we need to have a
separate stack allocator. If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.
Stacks are the last remaining FlagNoGC objects in the
GC heap. Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)
Fixes#7468Fixes#7424
LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
»»»
TBR=rsc
CC=golang-codereviews
https://golang.org/cl/101570044
»»»
LGTM=dvyukov
R=dvyukov, dave, khr, alex.brainman
CC=golang-codereviews
https://golang.org/cl/112240044
The code in GC that handles gp->gobuf.ctxt is wrong,
because it does not mark the ctxt object itself,
if just queues the ctxt object for scanning.
So the ctxt object can be collected as garbage.
However, Gobuf.ctxt is void*, so it's always marked and
scanned through G.
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, khr, rsc
https://golang.org/cl/105490044
Breaks windows and race detector.
TBR=rsc
««« original CL description
runtime: stack allocator, separate from mallocgc
In order to move malloc to Go, we need to have a
separate stack allocator. If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.
Stacks are the last remaining FlagNoGC objects in the
GC heap. Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)
Fixes#7468Fixes#7424
LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
»»»
TBR=rsc
CC=golang-codereviews
https://golang.org/cl/101570044
In order to move malloc to Go, we need to have a
separate stack allocator. If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.
Stacks are the last remaining FlagNoGC objects in the
GC heap. Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)
Fixes#7468Fixes#7424
LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
The runtime has historically held two dedicated values g (current goroutine)
and m (current thread) in 'extern register' slots (TLS on x86, real registers
backed by TLS on ARM).
This CL removes the extern register m; code now uses g->m.
On ARM, this frees up the register that formerly held m (R9).
This is important for NaCl, because NaCl ARM code cannot use R9 at all.
The Go 1 macrobenchmarks (those with per-op times >= 10 µs) are unaffected:
BenchmarkBinaryTree17 5491374955 5471024381 -0.37%
BenchmarkFannkuch11 4357101311 4275174828 -1.88%
BenchmarkGobDecode 11029957 11364184 +3.03%
BenchmarkGobEncode 6852205 6784822 -0.98%
BenchmarkGzip 650795967 650152275 -0.10%
BenchmarkGunzip 140962363 141041670 +0.06%
BenchmarkHTTPClientServer 71581 73081 +2.10%
BenchmarkJSONEncode 31928079 31913356 -0.05%
BenchmarkJSONDecode 117470065 113689916 -3.22%
BenchmarkMandelbrot200 6008923 5998712 -0.17%
BenchmarkGoParse 6310917 6327487 +0.26%
BenchmarkRegexpMatchMedium_1K 114568 114763 +0.17%
BenchmarkRegexpMatchHard_1K 168977 169244 +0.16%
BenchmarkRevcomp 935294971 914060918 -2.27%
BenchmarkTemplate 145917123 148186096 +1.55%
Minux previous reported larger variations, but these were caused by
run-to-run noise, not repeatable slowdowns.
Actual code changes by Minux.
I only did the docs and the benchmarking.
LGTM=dvyukov, iant, minux
R=minux, josharian, iant, dave, bradfitz, dvyukov
CC=golang-codereviews
https://golang.org/cl/109050043
The 'continuation pc' is where the frame will continue
execution, if anywhere. For a frame that stopped execution
due to a CALL instruction, the continuation pc is immediately
after the CALL. But for a frame that stopped execution due to
a fault, the continuation pc is the pc after the most recent CALL
to deferproc in that frame, or else 0. That is where execution
will continue, if anywhere.
The liveness information is only recorded for CALL instructions.
This change makes sure that we never look for liveness information
except for CALL instructions.
Using a valid PC fixes crashes when a garbage collection or
stack copying tries to process a stack frame that has faulted.
Record continuation pc in heapdump (format change).
Fixes#8048.
LGTM=iant, khr
R=khr, iant, dvyukov
CC=golang-codereviews, r
https://golang.org/cl/100870044
mstats.last_gc is unix time now, it is compared with abstract monotonic time.
On my machine GC is forced every 5 mins regardless of last_gc.
LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, iant, rsc
https://golang.org/cl/91350045
When I did the original 386 ports on Linux and OS X, I chose to
define GS-relative expressions like 4(GS) as relative to the actual
thread-local storage base, which was usually GS but might not be
(it might be FS, or it might be a different constant offset from GS or FS).
The original scope was limited but since then the rewrites have
gotten out of control. Sometimes GS is rewritten, sometimes FS.
Some ports do other rewrites to enable shared libraries and
other linking. At no point in the code is it clear whether you are
looking at the real GS/FS or some synthesized thing that will be
rewritten. The code manipulating all these is duplicated in many
places.
The first step to fixing issue 7719 is to make the code intelligible
again.
This CL adds an explicit TLS pseudo-register to the 386 and amd64.
As a register, TLS refers to the thread-local storage base, and it
can only be loaded into another register:
MOVQ TLS, AX
An offset from the thread-local storage base is written off(reg)(TLS*1).
Semantically it is off(reg), but the (TLS*1) annotation marks this as
indexing from the loaded TLS base. This emits a relocation so that
if the linker needs to adjust the offset, it can. For example:
MOVQ TLS, AX
MOVQ 8(AX)(TLS*1), CX // load m into CX
On systems that support direct access to the TLS memory, this
pair of instructions can be reduced to a direct TLS memory reference:
MOVQ 8(TLS), CX // load m into CX
The 2-instruction and 1-instruction forms correspond roughly to
ELF TLS initial exec mode and ELF TLS local exec mode, respectively.
Liblink applies this rewrite on systems that support the 1-instruction form.
The decision is made using only the operating system (and probably
the -shared flag, eventually), not the link mode. If some link modes
on a particular operating system require the 2-instruction form,
then all builds for that operating system will use the 2-instruction
form, so that the link mode decision can be delayed to link time.
Obviously it is late to be making changes like this, but I despair
of correcting issue 7719 and issue 7164 without it. To make sure
I am not changing existing behavior, I built a "hello world" program
for every GOOS/GOARCH combination we have and then worked
to make sure that the rewrite generates exactly the same binaries,
byte for byte. There are a handful of TODOs in the code marking
kludges to get the byte-for-byte property, but at least now I can
explain exactly how each binary is handled.
The targets I tested this way are:
darwin-386
darwin-amd64
dragonfly-386
dragonfly-amd64
freebsd-386
freebsd-amd64
freebsd-arm
linux-386
linux-amd64
linux-arm
nacl-386
nacl-amd64p32
netbsd-386
netbsd-amd64
openbsd-386
openbsd-amd64
plan9-386
plan9-amd64
solaris-amd64
windows-386
windows-amd64
There were four exceptions to the byte-for-byte goal:
windows-386 and windows-amd64 have a time stamp
at bytes 137 and 138 of the header.
darwin-386 and plan9-386 have five or six modified
bytes in the middle of the Go symbol table, caused by
editing comments in runtime/sys_{darwin,plan9}_386.s.
Fixes#7164.
LGTM=iant
R=iant, aram, minux.ma, dave
CC=golang-codereviews
https://golang.org/cl/87920043
Given
type Outer struct {
*Inner
...
}
the compiler generates the implementation of (*Outer).M dispatching to
the embedded Inner. The implementation is logically:
func (p *Outer) M() {
(p.Inner).M()
}
but since the only change here is the replacement of one pointer
receiver with another, the actual generated code overwrites the
original receiver with the p.Inner pointer and then jumps to the M
method expecting the *Inner receiver.
During reflect.Value.Call, we create an argument frame and the
associated data structures to describe it to the garbage collector,
populate the frame, call reflect.call to run a function call using
that frame, and then copy the results back out of the frame. The
reflect.call function does a memmove of the frame structure onto the
stack (to set up the inputs), runs the call, and the memmoves the
stack back to the frame structure (to preserve the outputs).
Originally reflect.call did not distinguish inputs from outputs: both
memmoves were for the full stack frame. However, in the case where the
called function was one of these wrappers, the rewritten receiver is
almost certainly a different type than the original receiver. This is
not a problem on the stack, where we use the program counter to
determine the type information and understand that during (*Outer).M
the receiver is an *Outer while during (*Inner).M the receiver in the
same memory word is now an *Inner. But in the statically typed
argument frame created by reflect, the receiver is always an *Outer.
Copying the modified receiver pointer off the stack into the frame
will store an *Inner there, and then if a garbage collection happens
to scan that argument frame before it is discarded, it will scan the
*Inner memory as if it were an *Outer. If the two have different
memory layouts, the collection will intepret the memory incorrectly.
Fix by only copying back the results.
Fixes#7725.
LGTM=khr
R=khr
CC=dave, golang-codereviews
https://golang.org/cl/85180043
The software floating point runs with m->locks++
to avoid being preempted; recognize this case in panic
and undo it so that m->locks is maintained correctly
when panicking.
Fixes#7553.
LGTM=dvyukov
R=golang-codereviews, dvyukov
CC=golang-codereviews
https://golang.org/cl/84030043
This is the same check we use during stack copying.
The check cannot be applied to C stack frames, even
though we do emit pointer bitmaps for the arguments,
because (1) the pointer bitmaps assume all arguments
are always live, not true of outputs during the prologue,
and (2) the pointer bitmaps encode interface values as
pointer pairs, not true of interfaces holding integers.
For the rest of the frames, however, we should hold ourselves
to the rule that a pointer marked live really is initialized.
The interface scanning already implicitly checks this
because it interprets the type word as a valid type pointer.
This may slow things down a little because of the extra loads.
Or it may speed things up because we don't bother enqueuing
nil pointers anymore. Enough of the rest of the system is slow
right now that we can't measure it meaningfully.
Enable for now, even if it is slow, to shake out bugs in the
liveness bitmaps, and then decide whether to turn it off
for the Go 1.3 release (issue 7650 reminds us to do this).
The new m->traceback field lets us force printing of fp=
values on all goroutine stack traces when we detect a
bad pointer. This makes it easier to understand exactly
where in the frame the bad pointer is, so that we can trace
it back to a specific variable and determine what is wrong.
Update #7650
LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/80860044
Change two-bit stack map entries to encode:
0 = dead
1 = scalar
2 = pointer
3 = multiword
If multiword, the two-bit entry for the following word encodes:
0 = string
1 = slice
2 = iface
3 = eface
That way, during stack scanning we can check if a string
is zero length or a slice has zero capacity. We can avoid
following the contained pointer in those cases. It is safe
to do so because it can never be dereferenced, and it is
desirable to do so because it may cause false retention
of the following block in memory.
Slice feature turned off until issue 7564 is fixed.
Update #7549
LGTM=rsc
R=golang-codereviews, bradfitz, rsc
CC=golang-codereviews
https://golang.org/cl/76380043
Structured Exception Handling (SEH) was the first way to handle
exceptions (memory faults, divides by zero) on Windows.
The S might as well stand for "stack-based": the implementation
interprets stack addresses in a few different ways, and it gets
subtly confused by Go's management of stacks. It's also something
that requires active maintenance during cgo switches, and we've
had bugs in that maintenance in the past.
We have recently come to believe that SEH cannot work with
Go's stack usage. See http://golang.org/issue/7325 for details.
Vectored Exception Handling (VEH) is more like a Unix signal
handler: you set it once for the whole process and forget about it.
This CL drops all the SEH code and replaces it with VEH code.
Many special cases and 7 #ifdefs disappear.
VEH was introduced in Windows XP, so Go on windows/386 will
now require Windows XP or later. The previous requirement was
Windows 2000 or later. Windows 2000 immediately preceded
Windows XP, so Windows 2000 is the only affected version.
Microsoft stopped supporting Windows 2000 in 2010.
See http://golang.org/s/win2000-golang-nuts for details.
Fixes#7325.
LGTM=alex.brainman, r
R=golang-codereviews, alex.brainman, stephen.gutekanst, dave
CC=golang-codereviews, iant, r
https://golang.org/cl/74790043
The Solaris network poller uses event ports, which are
level-triggered. As such, it has to re-arm itself after each
wakeup. The arming mechanism (which runs in its own thread) raced
with the closing of a file descriptor happening in a different
thread. When a network file descriptor is about to be closed,
the network poller is awaken to give it a chance to remove its
association with the file descriptor. Because the poller always
re-armed itself, it raced with code that closed the descriptor.
This change makes the network poller check before re-arming if
the file descriptor is about to be closed, in which case it will
ignore the re-arming request. It uses the per-PollDesc lock in
order to serialize access to the PollDesc.
This change also adds extensive documentation describing the
Solaris implementation of the network poller.
Fixes#7410.
LGTM=dvyukov, iant
R=golang-codereviews, bradfitz, iant, dvyukov, aram.h, gobot
CC=golang-codereviews
https://golang.org/cl/69190044
This is especially important for SetPanicOnCrash,
but also useful for e.g. nil deref in mallocgc.
Panics on such crashes can't lead to anything useful,
only to deadlocks, hangs and obscure crashes.
This is a copy of broken but already LGTMed
https://golang.org/cl/68540043/
TBR=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/75320043
Thanks to Ian for spotting these.
runtime.h: define uintreg correctly.
stack.c: address warning caused by the type of uintreg being 32 bits on amd64p32.
Commentary (mainly for my own use)
nacl/amd64p32 defines a machine with 64bit registers, but address space is limited to a 4gb window (the window is placed randomly inside the full 48 bit virtual address space of a process). To cope with this 6c defines _64BIT and _64BITREG.
_64BITREG is always defined by 6c, so both GOARCH=amd64 and GOARCH=amd64p32 use 64bit wide registers.
However _64BIT itself is only defined when 6c is compiling for amd64 targets. The definition is elided for amd64p32 environments causing int, uint and other arch specific types to revert to their 32bit definitions.
LGTM=iant
R=iant, rsc, remyoudompheng
CC=golang-codereviews
https://golang.org/cl/72860046