Commit Graph

134 Commits

Author SHA1 Message Date
Russ Cox 0316dafda2 runtime: rename SysAlloc to sysAlloc for Go
Renaming the C SysAlloc will let Go define a prototype without exporting it.
For use in cpuprof.goc's translation to Go.

LGTM=mdempsky
R=golang-codereviews, mdempsky
CC=golang-codereviews, iant
https://golang.org/cl/140060043
2014-08-30 00:54:40 -04:00
Dmitriy Vyukov 9f38b6c9e5 runtime: clean up GC code
Remove C version of GC.
Convert freeOSMemory to Go.
Restore g0 check in GC.
Remove unknownGCPercent check in GC,
it's initialized explicitly now.

LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews, khr
https://golang.org/cl/139910043
2014-08-29 18:44:38 +04:00
Keith Randall c46bcd4d13 runtime: move finalizer thread to Go.
LGTM=dvyukov
R=golang-codereviews, dvyukov, khr
CC=golang-codereviews
https://golang.org/cl/124630043
2014-08-28 13:23:10 -07:00
Russ Cox 8ecb9a765e runtime: rename Lock to Mutex
Mutex is consistent with package sync, and when in the
unexported Go form it avoids having a conflcit between
the type (now mutex) and the function (lock).

LGTM=iant
R=golang-codereviews, iant
CC=dvyukov, golang-codereviews, r
https://golang.org/cl/133140043
2014-08-27 23:32:49 -04:00
Russ Cox d21638b5ec cmd/cc, runtime: preserve C runtime type names in generated Go
uintptr or uint64 in the runtime C were turning into uint in the Go,
bool was turning into uint8, and so on. Fix that.

Also delete Go wrappers for C functions.
The C functions can be called directly now
(but still eventually need to be converted to Go).

LGTM=bradfitz, minux, iant
R=golang-codereviews, bradfitz, iant, minux
CC=golang-codereviews, khr, r
https://golang.org/cl/138740043
2014-08-27 21:59:49 -04:00
Russ Cox 613383c765 cmd/gc, runtime: treat slices and strings like pointers in garbage collection
Before, a slice with cap=0 or a string with len=0 might have its
base pointer pointing beyond the actual slice/string data into
the next block. The collector had to ignore slices and strings with
cap=0 in order to avoid misinterpreting the base pointer.

Now, a slice with cap=0 or a string with len=0 still has a base
pointer pointing into the actual slice/string data, no matter what.
The collector can now always scan the pointer, which means
strings and slices are no longer special.

Fixes #8404.

LGTM=khr, josharian
R=josharian, khr, dvyukov
CC=golang-codereviews
https://golang.org/cl/112570044
2014-08-25 14:38:19 -04:00
Dmitriy Vyukov b83d8bd0b8 runtime: remove dedicated scavenger thread
A whole thread is too much for background scavenger that sleeps all the time anyway.
We already have sysmon thread that can do this work.
Also remove g->isbackground and simplify enter/exitsyscall.

LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews, khr, rlh
https://golang.org/cl/108640043
2014-08-25 20:59:52 +04:00
Sanjay Menakuru ef50462378 runtime,runtime/debug: Converted some functions from goc to Go.
LGTM=khr
R=golang-codereviews, khr
CC=dvyukov, golang-codereviews
https://golang.org/cl/131010044
2014-08-24 20:27:00 -07:00
Dmitriy Vyukov 7f223e3b3b runtime: fix races on mheap.allspans
This is based on the crash dump provided by Alan
and on mental experiments:

sweep 0 74
fatal error: gc: unswept span
runtime stack:
runtime.throw(0x9df60d)
markroot(0xc208002000, 0x3)
runtime.parfordo(0xc208002000)
runtime.gchelper()

I think that when we moved all stacks into heap,
we introduced a bunch of bad data races. This was later
worsened by parallel stack shrinking.

Observation 1: exitsyscall can allocate a stack from heap at any time (including during STW).
Observation 2: parallel stack shrinking can (surprisingly) grow heap during marking.
Consider that we steadily grow stacks of a number of goroutines from 8K to 16K.
And during GC they all can be shrunk back to 8K. Shrinking will allocate lots of 8K
stacks, and we do not necessary have that many in heap at this moment. So shrinking
can grow heap as well.

Consequence: any access to mheap.allspans in GC (and otherwise) must take heap lock.
This is not true in several places.

Fix this by protecting accesses to mheap.allspans and introducing allspans cache for marking,
similar to what we use for sweeping.

LGTM=rsc
R=golang-codereviews, rsc
CC=adonovan, golang-codereviews, khr, rlh
https://golang.org/cl/126510043
2014-08-24 12:05:07 +04:00
Dmitriy Vyukov 684de04118 runtime: convert common scheduler functions to Go
These are required for chans, semaphores, timers, etc.

LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rlh, rsc
https://golang.org/cl/123640043
2014-08-21 20:41:09 +04:00
Dmitriy Vyukov 30ef2c7deb runtime: fix memstats
Newly allocated memory is subtracted from inuse, while it was never added to inuse.
Span leftovers are subtracted from both inuse and idle,
while they were never added.
Fixes #8544.
Fixes #8430.

LGTM=khr, cookieo9
R=golang-codereviews, khr, cookieo9
CC=golang-codereviews, rlh, rsc
https://golang.org/cl/130200044
2014-08-19 11:46:05 +04:00
Dmitriy Vyukov e0df11d57e runtime: implement transfer cache
Currently we do the following dance after sweeping a span:
1. lock mcentral
2. remove the span from a list
3. unlock mcentral
4. unmark span
5. lock mheap
6. insert the span into heap
7. unlock mheap
8. lock mcentral
9. observe empty list
10. unlock mcentral
11. lock mheap
12. grab the span
13. unlock mheap
14. mark span
15. lock mcentral
16. insert the span into empty list
17. unlock mcentral

This change short-circuits this sequence to nothing,
that is, we just cache and use the span after sweeping.

This gives us functionality similar (even better) to tcmalloc's transfer cache.

benchmark            old ns/op     new ns/op     delta
BenchmarkMalloc8     22.2          19.5          -12.16%
BenchmarkMalloc16    31.0          26.6          -14.19%

LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rlh, rsc
https://golang.org/cl/119550043
2014-08-18 16:52:31 +04:00
Dmitriy Vyukov 1837419f30 runtime: remove FlagNoProfile
Turns out to be unused as well.

LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews, khr
https://golang.org/cl/127170044
2014-08-13 01:03:32 +04:00
Dmitriy Vyukov 44106a0880 runtime: bump MaxGcprocs to 32
There was a number of improvements related to GC parallelization:
1. Parallel roots/stacks scanning.
2. Parallel stack shrinking.
3. Per-thread workbuf caches.
4. Workset reduction.
Currently 32 threads work well.
go.benchmarks:garbage benchmark on 2 x Intel Xeon E5-2690 (16 HT cores)

1 thread/1 processor:
time=16405255
cputime=16386223
gc-pause-one=546793975
gc-pause-total=3280763

2 threads/1 processor:
time=9043497
cputime=18075822
gc-pause-one=331116489
gc-pause-total=2152257

4 threads/1 processor:
time=4882030
cputime=19421337
gc-pause-one=174543105
gc-pause-total=1134530

8 threads/1 processor:
time=4134757
cputime=20097075
gc-pause-one=158680588
gc-pause-total=1015555

16 threads/1 processor + HT:
time=2006706
cputime=31960509
gc-pause-one=75425744
gc-pause-total=460097

16 threads/2 processors:
time=1513373
cputime=23805571
gc-pause-one=56630946
gc-pause-total=345448

32 threads/2 processors + HT:
time=1199312
cputime=37592764
gc-pause-one=48945064
gc-pause-total=278986

LGTM=rlh
R=golang-codereviews, tracey.brendan, rlh
CC=golang-codereviews, khr, rsc
https://golang.org/cl/123920043
2014-08-08 20:52:11 +04:00
Peter Collingbourne e03bce158f cmd/cc, runtime: eliminate use of the unnamed substructure C extension
Eliminating use of this extension makes it easier to port the Go runtime
to other compilers. This CL also disables the extension in cc to prevent
accidental use.

LGTM=rsc, khr
R=rsc, aram, khr, dvyukov
CC=axwalk, golang-codereviews
https://golang.org/cl/106790044
2014-08-07 09:00:02 -04:00
Dmitriy Vyukov aac7f1a0d6 runtime: convert markallocated from C to Go
benchmark                      old ns/op     new ns/op     delta
BenchmarkMalloc8               28.7          22.4          -21.95%
BenchmarkMalloc16              44.8          33.8          -24.55%
BenchmarkMallocTypeInfo8       49.0          32.9          -32.86%
BenchmarkMallocTypeInfo16      46.7          35.8          -23.34%
BenchmarkMallocLargeStruct     907           901           -0.66%
BenchmarkGobDecode             13235542      12036851      -9.06%
BenchmarkGobEncode             10639699      9539155       -10.34%
BenchmarkJSONEncode            25193036      21898922      -13.08%
BenchmarkJSONDecode            96104044      89464904      -6.91%

Fixes #8452.

LGTM=khr
R=golang-codereviews, bradfitz, rsc, dave, khr
CC=golang-codereviews
https://golang.org/cl/122090043
2014-08-07 13:34:30 +04:00
Dmitriy Vyukov cd2f8356ce runtime: remove mal/malloc/FlagNoGC/FlagNoInvokeGC
FlagNoGC is unused now.
FlagNoInvokeGC is unneeded as we don't invoke GC
on g0 and when holding locks anyway.
mal/malloc have very few uses and you never remember
the exact set of flags they use and the difference between them.
Moreover, eventually we need to give exact types to all allocations,
something what mal/malloc do not support.

LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, rsc
https://golang.org/cl/117580043
2014-08-07 13:04:04 +04:00
Dmitriy Vyukov f3dd6df6b1 runtime: simplify code
Full spans can't be passed to UncacheSpan since we get rid of free.

LGTM=rsc
R=golang-codereviews
CC=golang-codereviews, khr, rsc
https://golang.org/cl/119490044
2014-08-06 18:36:48 +04:00
Dmitriy Vyukov f6f2f77142 runtime: cache one GC workbuf in thread-local storage
We call scanblock for lots of small root pieces
e.g. for every stack frame args and locals area.
Every scanblock invocation calls getempty/putempty,
which accesses lock-free stack shared among all worker threads.
One-element local cache allows most scanblock calls
to proceed without accessing the shared stack.

LGTM=rsc
R=golang-codereviews, rlh
CC=golang-codereviews, khr, rsc
https://golang.org/cl/121250043
2014-08-06 01:50:37 +04:00
Dmitriy Vyukov cecca43804 runtime: get rid of free
Several reasons:
1. Significantly simplifies runtime.
2. This code proved to be buggy.
3. Free is incompatible with bump-the-pointer allocation.
4. We want to write runtime in Go, Go does not have free.
5. Too much code to free env strings on startup.

LGTM=khr
R=golang-codereviews, josharian, tracey.brendan, khr
CC=bradfitz, golang-codereviews, r, rlh, rsc
https://golang.org/cl/116390043
2014-07-31 12:55:40 +04:00
Keith Randall 4aa50434e1 runtime: rewrite malloc in Go.
This change introduces gomallocgc, a Go clone of mallocgc.
Only a few uses have been moved over, so there are still
lots of uses from C. Many of these C uses will be moved
over to Go (e.g. in slice.goc), but probably not all.
What should remain of C's mallocgc is an open question.

LGTM=rsc, dvyukov
R=rsc, khr, dave, bradfitz, dvyukov
CC=golang-codereviews
https://golang.org/cl/108840046
2014-07-30 09:01:52 -07:00
Dmitriy Vyukov cd17a717f9 runtime: simpler and faster GC
Implement the design described in:
https://docs.google.com/document/d/1v4Oqa0WwHunqlb8C3ObL_uNQw3DfSY-ztoA-4wWbKcg/pub

Summary of the changes:
GC uses "2-bits per word" pointer type info embed directly into bitmap.
Scanning of stacks/data/heap is unified.
The old spans types go away.
Compiler generates "sparse" 4-bits type info for GC (directly for GC bitmap).
Linker generates "dense" 2-bits type info for data/bss (the same as stacks use).

Summary of results:
-1680 lines of code total (-1000+ in mgc0.c only)
-25% memory consumption
-3-7% binary size
-15% GC pause reduction
-7% run time reduction

LGTM=khr
R=golang-codereviews, rsc, christoph, khr
CC=golang-codereviews, rlh
https://golang.org/cl/106260045
2014-07-29 11:01:02 +04:00
Keith Randall f378f30034 undo CL 101570044 / 2c57aaea79c4
redo stack allocation.  This is mostly the same as
the original CL with a few bug fixes.

1. add racemalloc() for stack allocations
2. fix poolalloc/poolfree to terminate free lists correctly.
3. adjust span ref count correctly.
4. don't use cache for sizes >= StackCacheSize.

Should fix bugs and memory leaks in original changelist.

««« original CL description
undo CL 104200047 / 318b04f28372

Breaks windows and race detector.
TBR=rsc

««« original CL description
runtime: stack allocator, separate from mallocgc

In order to move malloc to Go, we need to have a
separate stack allocator.  If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.

Stacks are the last remaining FlagNoGC objects in the
GC heap.  Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)

Fixes #7468
Fixes #7424

LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
»»»

TBR=rsc
CC=golang-codereviews
https://golang.org/cl/101570044
»»»

LGTM=dvyukov
R=dvyukov, dave, khr, alex.brainman
CC=golang-codereviews
https://golang.org/cl/112240044
2014-07-17 14:41:46 -07:00
Keith Randall 0c6b55e76b runtime: convert map implementation to Go.
It's a bit slower, but not painfully so.  There is still room for
improvement (saving space so we can use nosplit, and removing the
requirement for hash/eq stubs).

benchmark                              old ns/op     new ns/op     delta
BenchmarkMegMap                        23.5          24.2          +2.98%
BenchmarkMegOneMap                     14.9          15.7          +5.37%
BenchmarkMegEqMap                      71668         72234         +0.79%
BenchmarkMegEmptyMap                   4.05          4.93          +21.73%
BenchmarkSmallStrMap                   21.9          22.5          +2.74%
BenchmarkMapStringKeysEight_16         23.1          26.3          +13.85%
BenchmarkMapStringKeysEight_32         21.9          25.0          +14.16%
BenchmarkMapStringKeysEight_64         21.9          25.1          +14.61%
BenchmarkMapStringKeysEight_1M         21.9          25.0          +14.16%
BenchmarkIntMap                        21.8          12.5          -42.66%
BenchmarkRepeatedLookupStrMapKey32     39.3          30.2          -23.16%
BenchmarkRepeatedLookupStrMapKey1M     322353        322675        +0.10%
BenchmarkNewEmptyMap                   129           136           +5.43%
BenchmarkMapIter                       137           107           -21.90%
BenchmarkMapIterEmpty                  7.14          8.71          +21.99%
BenchmarkSameLengthMap                 5.24          6.82          +30.15%
BenchmarkBigKeyMap                     34.5          35.3          +2.32%
BenchmarkBigValMap                     36.1          36.1          +0.00%
BenchmarkSmallKeyMap                   26.9          26.7          -0.74%

LGTM=rsc
R=golang-codereviews, dave, dvyukov, rsc, gobot, khr
CC=golang-codereviews
https://golang.org/cl/99380043
2014-07-16 14:16:19 -07:00
Keith Randall 3cf83c182a undo CL 104200047 / 318b04f28372
Breaks windows and race detector.
TBR=rsc

««« original CL description
runtime: stack allocator, separate from mallocgc

In order to move malloc to Go, we need to have a
separate stack allocator.  If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.

Stacks are the last remaining FlagNoGC objects in the
GC heap.  Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)

Fixes #7468
Fixes #7424

LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
»»»

TBR=rsc
CC=golang-codereviews
https://golang.org/cl/101570044
2014-06-30 19:48:08 -07:00
Keith Randall 7c13860cd0 runtime: stack allocator, separate from mallocgc
In order to move malloc to Go, we need to have a
separate stack allocator.  If we run out of stack
during malloc, malloc will not be available
to allocate a new stack.

Stacks are the last remaining FlagNoGC objects in the
GC heap.  Once they are out, we can get rid of the
distinction between the allocated/blockboundary bits.
(This will be in a separate change.)

Fixes #7468
Fixes #7424

LGTM=rsc, dvyukov
R=golang-codereviews, dvyukov, khr, dave, rsc
CC=golang-codereviews
https://golang.org/cl/104200047
2014-06-30 18:59:24 -07:00
Keith Randall 65c63dc4aa runtime: write memory profile statistics to the heap dump.
LGTM=rsc
R=rsc, khr
CC=golang-codereviews
https://golang.org/cl/97010043
2014-05-08 08:35:49 -07:00
Keith Randall 29d1b211fd runtime: clean up scanning of Gs
Use a real type for Gs instead of scanning them conservatively.
Zero the schedlink pointer when it is dead.

Update #7820

LGTM=rsc
R=rsc, dvyukov
CC=golang-codereviews
https://golang.org/cl/89360043
2014-04-28 12:47:09 -04:00
Russ Cox 28f1868fed cmd/gc, runtime: make GODEBUG=gcdead=1 mode work with liveness
Trying to make GODEBUG=gcdead=1 work with liveness
and in particular ambiguously live variables.

1. In the liveness computation, mark all ambiguously live
variables as live for the entire function, except the entry.
They are zeroed directly after entry, and we need them not
to be poisoned thereafter.

2. In the liveness computation, compute liveness (and deadness)
for all parameters, not just pointer-containing parameters.
Otherwise gcdead poisons untracked scalar parameters and results.

3. Fix liveness debugging print for -live=2 to use correct bitmaps.
(Was not updated for compaction during compaction CL.)

4. Correct varkill during map literal initialization.
Was killing the map itself instead of the inserted value temp.

5. Disable aggressive varkill cleanup for call arguments if
the call appears in a defer or go statement.

6. In the garbage collector, avoid bug scanning empty
strings. An empty string is two zeros. The multiword
code only looked at the first zero and then interpreted
the next two bits in the bitmap as an ordinary word bitmap.
For a string the bits are 11 00, so if a live string was zero
length with a 0 base pointer, the poisoning code treated
the length as an ordinary word with code 00, meaning it
needed poisoning, turning the string into a poison-length
string with base pointer 0. By the same logic I believe that
a live nil slice (bits 11 01 00) will have its cap poisoned.
Always scan full multiword struct.

7. In the runtime, treat both poison words (PoisonGC and
PoisonStack) as invalid pointers that warrant crashes.

Manual testing as follows:

- Create a script called gcdead on your PATH containing:

        #!/bin/bash
        GODEBUG=gcdead=1 GOGC=10 GOTRACEBACK=2 exec "$@"
- Now you can build a test and then run 'gcdead ./foo.test'.
- More importantly, you can run 'go test -short -exec gcdead std'
   to run all the tests.

Fixes #7676.

While here, enable the precise scanning of slices, since that was
disabled due to bugs like these. That now works, both with and
without gcdead.

Fixes #7549.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83410044
2014-04-03 20:33:25 -04:00
Russ Cox 81bc9b3ffd runtime: revert change to PoisonPtr value
Submitted accidentally in CL 83630044.
Fixes various builds.

TBR=khr
CC=golang-codereviews
https://golang.org/cl/83100047
2014-04-02 16:55:30 -04:00
Russ Cox 4676fae525 cmd/gc, cmd/ld, runtime: compact liveness bitmaps
Reduce footprint of liveness bitmaps by about 5x.

1. Mark all liveness bitmap symbols as 4-byte aligned
(they were aligned to a larger size by default).

2. The bitmap data is a bitmap count n followed by n bitmaps.
Each bitmap begins with its own count m giving the number
of bits. All the m's are the same for the n bitmaps.
Emit this bitmap length once instead of n times.

3. Many bitmaps within a function have the same bit values,
but each call site was given a distinct bitmap. Merge duplicate
bitmaps so that no bitmap is written more than once.

4. Many functions end up with the same aggregate bitmap data.
We used to name the bitmap data funcname.gcargs and funcname.gclocals.
Instead, name it gclocals.<md5 of data> and mark it dupok so
that the linker coalesces duplicate sets. This cut the bitmap
data remaining after step 3 by 40%; I was not expecting it to
be quite so dramatic.

Applied to "go build -ldflags -w code.google.com/p/go.tools/cmd/godoc":

                bitmaps           pclntab           binary on disk
before this CL  1326600           1985854           12738268
4-byte align    1154288 (0.87x)   1985854 (1.00x)   12566236 (0.99x)
one bitmap len   782528 (0.54x)   1985854 (1.00x)   12193500 (0.96x)
dedup bitmap     414748 (0.31x)   1948478 (0.98x)   11787996 (0.93x)
dedup bitmap set 245580 (0.19x)   1948478 (0.98x)   11620060 (0.91x)

While here, remove various dead blocks of code from plive.c.

Fixes #6929.
Fixes #7568.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/83630044
2014-04-02 16:49:27 -04:00
Russ Cox 1ec4d5e9e7 runtime: adjust GODEBUG=allocfreetrace=1 and GODEBUG=gcdead=1
GODEBUG=allocfreetrace=1:

The allocfreetrace=1 mode prints a stack trace for each block
allocated and freed, and also a stack trace for each garbage collection.

It was implemented by reusing the heap profiling support: if allocfreetrace=1
then the heap profile was effectively running at 1 sample per 1 byte allocated
(always sample). The stack being shown at allocation was the stack gathered
for profiling, meaning it was derived only from the program counters and
did not include information about function arguments or frame pointers.
The stack being shown at free was the allocation stack, not the free stack.
If you are generating this log, you can find the allocation stack yourself, but
it can be useful to see exactly the sequence that led to freeing the block:
was it the garbage collector or an explicit free? Now that the garbage collector
runs on an m0 stack, the stack trace for the garbage collector was never interesting.

Fix all these problems:

1. Decouple allocfreetrace=1 from heap profiling.
2. Print the standard goroutine stack traces instead of a custom format.
3. Print the stack trace at time of allocation for an allocation,
   and print the stack trace at time of free (not the allocation trace again)
   for a free.
4. Print all goroutine stacks at garbage collection. Having all the stacks
   means that you can see the exact point at which each goroutine was
   preempted, which is often useful for identifying liveness-related errors.

GODEBUG=gcdead=1:

This mode overwrites dead pointers with a poison value.
Detect the poison value as an invalid pointer during collection,
the same way that small integers are invalid pointers.

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/81670043
2014-04-01 13:30:10 -04:00
Dmitriy Vyukov f8c350873c runtime: fix yet another race in bgsweep
Currently it's possible that bgsweep finishes before all spans
have been swept (we only know that sweeping of all spans has *started*).
In such case bgsweep may fail wake up runfinq goroutine when it needs to.
finq may still be nil at this point, but some finalizers may be queued later.
Make bgsweep to wait for sweeping to *complete*, then it can decide
whether it needs to wake up runfinq for sure.
Update #7533

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/75960043
2014-03-26 15:11:36 +04:00
Keith Randall fff63c2448 runtime: WriteHeapDump dumps the heap to a file.
See http://golang.org/s/go13heapdump for the file format.

LGTM=rsc
R=rsc, bradfitz, dvyukov, khr
CC=golang-codereviews
https://golang.org/cl/37540043
2014-03-25 15:09:49 -07:00
Keith Randall 1b45cc45e3 runtime: redo stack map entries to avoid false retention
Change two-bit stack map entries to encode:
0 = dead
1 = scalar
2 = pointer
3 = multiword

If multiword, the two-bit entry for the following word encodes:
0 = string
1 = slice
2 = iface
3 = eface

That way, during stack scanning we can check if a string
is zero length or a slice has zero capacity.  We can avoid
following the contained pointer in those cases.  It is safe
to do so because it can never be dereferenced, and it is
desirable to do so because it may cause false retention
of the following block in memory.

Slice feature turned off until issue 7564 is fixed.

Update #7549

LGTM=rsc
R=golang-codereviews, bradfitz, rsc
CC=golang-codereviews
https://golang.org/cl/76380043
2014-03-25 14:11:34 -07:00
Ian Lance Taylor 4ebfa83199 runtime: accurately record whether heap memory is reserved
The existing code did not have a clear notion of whether
memory has been actually reserved.  It checked based on
whether in 32-bit mode or 64-bit mode and (on GNU/Linux) the
requested address, but it confused the requested address and
the returned address.

LGTM=rsc
R=rsc, dvyukov
CC=golang-codereviews, michael.hudson
https://golang.org/cl/79610043
2014-03-25 13:22:19 -07:00
Keith Randall f4359afa7f runtime: shrink bigger stacks without any copying.
Instead, split the underlying storage in half and
free just half of it.

Shrinking without copying lets us reclaim storage used
by a previously profligate Go routine that has now blocked
inside some C code.

To shrink in place, we need all stacks to be a power of 2 in size.

LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/69580044
2014-03-06 16:03:43 -08:00
Russ Cox da1bea0ef0 runtime: fix malloc page alignment + efence
Two memory allocator bug fixes.

- efence is not maintaining the proper heap metadata
  to make eventual memory reuse safe, so use SysFault.

- now that our heap PageSize is 8k but most hardware
  uses 4k pages, SysAlloc and SysReserve results must be
  explicitly aligned. Do that in a few more call sites and
  document this fact in malloc.h.

Fixes #7448.

LGTM=iant
R=golang-codereviews, josharian, iant
CC=dvyukov, golang-codereviews
https://golang.org/cl/71750048
2014-03-06 18:34:29 -05:00
Keith Randall 1665b006a5 runtime: grow stack by copying
On stack overflow, if all frames on the stack are
copyable, we copy the frames to a new stack twice
as large as the old one.  During GC, if a G is using
less than 1/4 of its stack, copy the stack to a stack
half its size.

TODO
- Do something about C frames.  When a C frame is in the
  stack segment, it isn't copyable.  We allocate a new segment
  in this case.
  - For idempotent C code, we can abort it, copy the stack,
    then retry.  I'm working on a separate CL for this.
  - For other C code, we can raise the stackguard
    to the lowest Go frame so the next call that Go frame
    makes triggers a copy, which will then succeed.
- Pick a starting stack size?

The plan is that eventually we reach a point where the
stack contains only copyable frames.

LGTM=rsc
R=dvyukov, rsc
CC=golang-codereviews
https://golang.org/cl/54650044
2014-02-26 23:28:44 -08:00
Keith Randall 3b5278fca6 runtime: get rid of the settype buffer and lock.
MCaches	now hold a MSpan for each sizeclass which they have
exclusive access to allocate from, so no lock is needed.

Modifying the heap bitmaps also no longer requires a cas.

runtime.free gets more expensive.  But we don't use it
much any more.

It's not much faster on 1 processor, but it's a lot
faster on multiple processors.

benchmark                 old ns/op    new ns/op    delta
BenchmarkSetTypeNoPtr1           24           23   -0.42%
BenchmarkSetTypeNoPtr2           33           34   +0.89%
BenchmarkSetTypePtr1             51           49   -3.72%
BenchmarkSetTypePtr2             55           54   -1.98%

benchmark                old ns/op    new ns/op    delta
BenchmarkAllocation          52739        50770   -3.73%
BenchmarkAllocation-2        33957        34141   +0.54%
BenchmarkAllocation-3        33326        29015  -12.94%
BenchmarkAllocation-4        38105        25795  -32.31%
BenchmarkAllocation-5        68055        24409  -64.13%
BenchmarkAllocation-6        71544        23488  -67.17%
BenchmarkAllocation-7        68374        23041  -66.30%
BenchmarkAllocation-8        70117        20758  -70.40%

LGTM=rsc, dvyukov
R=dvyukov, bradfitz, khr, rsc
CC=golang-codereviews
https://golang.org/cl/46810043
2014-02-26 15:52:58 -08:00
Russ Cox 67c83db60d runtime: use goc2c as much as possible
Package runtime's C functions written to be called from Go
started out written in C using carefully constructed argument
lists and the FLUSH macro to write a result back to memory.

For some functions, the appropriate parameter list ended up
being architecture-dependent due to differences in alignment,
so we added 'goc2c', which takes a .goc file containing Go func
declarations but C bodies, rewrites the Go func declaration to
equivalent C declarations for the target architecture, adds the
needed FLUSH statements, and writes out an equivalent C file.
That C file is compiled as part of package runtime.

Native Client's x86-64 support introduces the most complex
alignment rules yet, breaking many functions that could until
now be portably written in C. Using goc2c for those avoids the
breakage.

Separately, Keith's work on emitting stack information from
the C compiler would require the hand-written functions
to add #pragmas specifying how many arguments are result
parameters. Using goc2c for those avoids maintaining #pragmas.

For both reasons, use goc2c for as many Go-called C functions
as possible.

This CL is a replay of the bulk of CL 15400047 and CL 15790043,
both of which were reviewed as part of the NaCl port and are
checked in to the NaCl branch. This CL is part of bringing the
NaCl code into the main tree.

No new code here, just reformatting and occasional movement
into .h files.

LGTM=r
R=dave, alex.brainman, r
CC=golang-codereviews
https://golang.org/cl/65220044
2014-02-20 15:58:47 -05:00
Russ Cox e8ecd9f67a runtime: update malloc comment for MSpan.needzero
Missed this suggestion in CL 57680046.

LGTM=iant
R=iant
CC=golang-codereviews
https://golang.org/cl/63390043
2014-02-13 14:31:48 -05:00
Russ Cox 86e3cb8da5 runtime: introduce MSpan.needzero instead of writing to span data
This cleans up the code significantly, and it avoids any
possible problems with madvise zeroing out some but
not all of the data.

Fixes #6400.

LGTM=dave
R=dvyukov, dave
CC=golang-codereviews
https://golang.org/cl/57680046
2014-02-13 11:10:31 -05:00
Dmitriy Vyukov 3c3be62201 runtime: concurrent GC sweep
Moves sweep phase out of stoptheworld by adding
background sweeper goroutine and lazy on-demand sweeping.

It turned out to be somewhat trickier than I expected,
because there is no point in time when we know size of live heap
nor consistent number of mallocs and frees.
So everything related to next_gc, mprof, memstats, etc becomes trickier.

At the end of GC next_gc is conservatively set to heap_alloc*GOGC,
which is much larger than real value. But after every sweep
next_gc is decremented by freed*GOGC. So when everything is swept
next_gc becomes what it should be.

For mprof I had to introduce 3-generation scheme (allocs, revent_allocs, prev_allocs),
because by the end of GC we know number of frees for the *previous* GC.

Significant caution is required to not cross yet-unknown real value of next_gc.
This is achieved by 2 means:
1. Whenever I allocate a span from MCentral, I sweep a span in that MCentral.
2. Whenever I allocate N pages from MHeap, I sweep until at least N pages are
returned to heap.
This provides quite strong guarantees that heap does not grow when it should now.

http-1
allocated                    7036         7033      -0.04%
allocs                         60           60      +0.00%
cputime                     51050        46700      -8.52%
gc-pause-one             34060569      1777993     -94.78%
gc-pause-total               2554          133     -94.79%
latency-50                 178448       170926      -4.22%
latency-95                 284350       198294     -30.26%
latency-99                 345191       220652     -36.08%
rss                     101564416    101007360      -0.55%
sys-gc                    6606832      6541296      -0.99%
sys-heap                 88801280     87752704      -1.18%
sys-other                 7334208      7405928      +0.98%
sys-stack                  524288       524288      +0.00%
sys-total               103266608    102224216      -1.01%
time                        50339        46533      -7.56%
virtual-mem             292990976    293728256      +0.25%

garbage-1
allocated                 2983818      2990889      +0.24%
allocs                      62880        62902      +0.03%
cputime                  16480000     16190000      -1.76%
gc-pause-one            828462467    487875135     -41.11%
gc-pause-total            4142312      2439375     -41.11%
rss                    1151709184   1153712128      +0.17%
sys-gc                   66068352     66068352      +0.00%
sys-heap               1039728640   1039728640      +0.00%
sys-other                37776064     40770176      +7.93%
sys-stack                 8781824      8781824      +0.00%
sys-total              1152354880   1155348992      +0.26%
time                     16496998     16199876      -1.80%
virtual-mem            1409564672   1402281984      -0.52%

LGTM=rsc
R=golang-codereviews, sameer, rsc, iant, jeremyjackins, gobot
CC=golang-codereviews, khr
https://golang.org/cl/46430043
2014-02-12 22:16:42 +04:00
Dmitriy Vyukov e48751e217 runtime: increase page size to 8K
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
Only Chromium uses 4K pages today (in "slow but small" configuration).
The general tendency is to increase page size, because it reduces
metadata size and DTLB pressure.
This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated                 8037492      8038689      +0.01%
allocs                     105762       105573      -0.18%
cputime                 158400000    155800000      -1.64%
gc-pause-one              4412234      4135702      -6.27%
gc-pause-total            2647340      2398707      -9.39%
rss                      54923264     54525952      -0.72%
sys-gc                    3952624      3928048      -0.62%
sys-heap                 46399488     46006272      -0.85%
sys-other                 5597504      5290304      -5.49%
sys-stack                  393216       393216      +0.00%
sys-total                56342832     55617840      -1.29%
time                    158478890    156046916      -1.53%
virtual-mem             256548864    256593920      +0.02%

garbage-1
allocated                 2991113      2986259      -0.16%
allocs                      62844        62652      -0.31%
cputime                  16330000     15860000      -2.88%
gc-pause-one            789108229    725555211      -8.05%
gc-pause-total            3945541      3627776      -8.05%
rss                    1143660544   1132253184      -1.00%
sys-gc                   65609600     65806208      +0.30%
sys-heap               1032388608   1035599872      +0.31%
sys-other                37501632     22777664     -39.26%
sys-stack                 8650752      8781824      +1.52%
sys-total              1144150592   1132965568      -0.98%
time                     16364602     15891994      -2.89%
virtual-mem            1327296512   1313746944      -1.02%

This is the exact reincarnation of already LGTMed:
https://golang.org/cl/45770044
which must not break darwin/freebsd after:
https://golang.org/cl/56630043
TBR=iant

LGTM=khr, iant
R=iant, khr
CC=golang-codereviews
https://golang.org/cl/58230043
2014-01-30 13:28:19 +04:00
Dmitriy Vyukov 86a3a54284 runtime: fix windows build
Currently windows crashes because early allocs in schedinit
try to allocate tiny memory blocks, but m->p is not yet setup.
I've considered calling procresize(1) earlier in schedinit,
but this refactoring is better and must fix the issue as well.
Fixes #7218.

R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54570045
2014-01-28 00:26:56 +04:00
Dmitriy Vyukov bace9523ee runtime: smarter slice grow
When growing slice take into account size of the allocated memory block.
Also apply the same optimization to string->[]byte conversion.
Fixes #6307.

benchmark                    old ns/op    new ns/op    delta
BenchmarkAppendGrowByte        4541036      4434108   -2.35%
BenchmarkAppendGrowString     59885673     44813604  -25.17%

LGTM=khr
R=khr
CC=golang-codereviews, iant, rsc
https://golang.org/cl/53340044
2014-01-27 15:11:12 +04:00
Dmitriy Vyukov 1fa7029425 runtime: combine small NoScan allocations
Combine NoScan allocations < 16 bytes into a single memory block.
Reduces number of allocations on json/garbage benchmarks by 10+%.

json-1
allocated                 8039872      7949194      -1.13%
allocs                     105774        93776     -11.34%
cputime                 156200000    100700000     -35.53%
gc-pause-one              4908873      3814853     -22.29%
gc-pause-total            2748969      2899288      +5.47%
rss                      52674560     43560960     -17.30%
sys-gc                    3796976      3256304     -14.24%
sys-heap                 43843584     35192832     -19.73%
sys-other                 5589312      5310784      -4.98%
sys-stack                  393216       393216      +0.00%
sys-total                53623088     44153136     -17.66%
time                    156193436    100886714     -35.41%
virtual-mem             256548864    256540672      -0.00%

garbage-1
allocated                 2996885      2932982      -2.13%
allocs                      62904        55200     -12.25%
cputime                  17470000     17400000      -0.40%
gc-pause-one            932757485    925806143      -0.75%
gc-pause-total            4663787      4629030      -0.75%
rss                    1151074304   1133670400      -1.51%
sys-gc                   66068352     65085312      -1.49%
sys-heap               1039728640   1024065536      -1.51%
sys-other                38038208     37485248      -1.45%
sys-stack                 8650752      8781824      +1.52%
sys-total              1152485952   1135417920      -1.48%
time                     17478088     17418005      -0.34%
virtual-mem            1343709184   1324204032      -1.45%

LGTM=iant, bradfitz
R=golang-codereviews, dave, iant, rsc, bradfitz
CC=golang-codereviews, khr
https://golang.org/cl/38750047
2014-01-24 22:35:11 +04:00
Dmitriy Vyukov 8371b0142e undo CL 45770044 / d795425bfa18
Breaks darwin and freebsd.

««« original CL description
runtime: increase page size to 8K
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
Only Chromium uses 4K pages today (in "slow but small" configuration).
The general tendency is to increase page size, because it reduces
metadata size and DTLB pressure.
This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated                 8037492      8038689      +0.01%
allocs                     105762       105573      -0.18%
cputime                 158400000    155800000      -1.64%
gc-pause-one              4412234      4135702      -6.27%
gc-pause-total            2647340      2398707      -9.39%
rss                      54923264     54525952      -0.72%
sys-gc                    3952624      3928048      -0.62%
sys-heap                 46399488     46006272      -0.85%
sys-other                 5597504      5290304      -5.49%
sys-stack                  393216       393216      +0.00%
sys-total                56342832     55617840      -1.29%
time                    158478890    156046916      -1.53%
virtual-mem             256548864    256593920      +0.02%

garbage-1
allocated                 2991113      2986259      -0.16%
allocs                      62844        62652      -0.31%
cputime                  16330000     15860000      -2.88%
gc-pause-one            789108229    725555211      -8.05%
gc-pause-total            3945541      3627776      -8.05%
rss                    1143660544   1132253184      -1.00%
sys-gc                   65609600     65806208      +0.30%
sys-heap               1032388608   1035599872      +0.31%
sys-other                37501632     22777664     -39.26%
sys-stack                 8650752      8781824      +1.52%
sys-total              1144150592   1132965568      -0.98%
time                     16364602     15891994      -2.89%
virtual-mem            1327296512   1313746944      -1.02%

R=golang-codereviews, dave, khr, rsc, khr
CC=golang-codereviews
https://golang.org/cl/45770044
»»»

R=golang-codereviews
CC=golang-codereviews
https://golang.org/cl/56060043
2014-01-23 19:56:59 +04:00
Dmitriy Vyukov 6d603af6dc runtime: increase page size to 8K
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
Only Chromium uses 4K pages today (in "slow but small" configuration).
The general tendency is to increase page size, because it reduces
metadata size and DTLB pressure.
This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated                 8037492      8038689      +0.01%
allocs                     105762       105573      -0.18%
cputime                 158400000    155800000      -1.64%
gc-pause-one              4412234      4135702      -6.27%
gc-pause-total            2647340      2398707      -9.39%
rss                      54923264     54525952      -0.72%
sys-gc                    3952624      3928048      -0.62%
sys-heap                 46399488     46006272      -0.85%
sys-other                 5597504      5290304      -5.49%
sys-stack                  393216       393216      +0.00%
sys-total                56342832     55617840      -1.29%
time                    158478890    156046916      -1.53%
virtual-mem             256548864    256593920      +0.02%

garbage-1
allocated                 2991113      2986259      -0.16%
allocs                      62844        62652      -0.31%
cputime                  16330000     15860000      -2.88%
gc-pause-one            789108229    725555211      -8.05%
gc-pause-total            3945541      3627776      -8.05%
rss                    1143660544   1132253184      -1.00%
sys-gc                   65609600     65806208      +0.30%
sys-heap               1032388608   1035599872      +0.31%
sys-other                37501632     22777664     -39.26%
sys-stack                 8650752      8781824      +1.52%
sys-total              1144150592   1132965568      -0.98%
time                     16364602     15891994      -2.89%
virtual-mem            1327296512   1313746944      -1.02%

R=golang-codereviews, dave, khr, rsc, khr
CC=golang-codereviews
https://golang.org/cl/45770044
2014-01-23 18:59:43 +04:00