On stack overflow, if all frames on the stack are
copyable, we copy the frames to a new stack twice
as large as the old one. During GC, if a G is using
less than 1/4 of its stack, we copy the stack to a
new stack half its size.
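Below is a minimal, self-contained sketch of that grow/shrink policy.
It is only a toy model: the type and field names are illustrative, not
the runtime's actual identifiers.

	package main

	import "fmt"

	// goroutineStack is a toy model of a goroutine's stack; the real
	// runtime tracks this differently, this only illustrates the
	// grow-on-overflow / shrink-during-GC decisions described above.
	type goroutineStack struct {
		size int // total stack size in bytes
		used int // bytes currently in use
	}

	// grow doubles the stack, as done on overflow when all frames are copyable.
	func (s *goroutineStack) grow() {
		s.size *= 2
	}

	// maybeShrink halves the stack during GC if less than 1/4 of it is in use.
	func (s *goroutineStack) maybeShrink() {
		if s.used < s.size/4 {
			s.size /= 2
		}
	}

	func main() {
		s := &goroutineStack{size: 8192, used: 1024}
		s.grow()
		fmt.Println("after grow:", s.size) // 16384
		s.maybeShrink()
		fmt.Println("after shrink:", s.size) // 8192
	}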
TODO
- Do something about C frames. When a C frame is in the
stack segment, it isn't copyable. We allocate a new segment
in this case.
- For idempotent C code, we can abort it, copy the stack,
then retry. I'm working on a separate CL for this.
- For other C code, we can raise the stackguard
to the lowest Go frame so the next call that Go frame
makes triggers a copy, which will then succeed.
- Pick a starting stack size?
The plan is that eventually we reach a point where the
stack contains only copyable frames.
LGTM=rsc
R=dvyukov, rsc
CC=golang-codereviews
https://golang.org/cl/54650044
MCaches now hold an MSpan for each size class, which they have
exclusive access to allocate from, so no lock is needed.
Modifying the heap bitmaps also no longer requires a CAS.
runtime.free gets more expensive, but we don't use it
much any more.
It's not much faster on 1 processor, but it's a lot
faster on multiple processors.
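The core of the change can be pictured with a small self-contained
model; the names below (cache, span, numSizeClasses) are placeholders,
not the runtime's actual identifiers, and the refill path is only
hinted at.

	package main

	import "fmt"

	const numSizeClasses = 67 // illustrative; the real count differs by version

	// span is a toy stand-in for an MSpan: a run of pages carved into
	// equally sized objects, with a list of the free ones.
	type span struct {
		free []uintptr // addresses of free objects in this span
	}

	// cache is a toy stand-in for an MCache: it owns one span per size
	// class exclusively, so allocating from it needs no lock.
	type cache struct {
		spans [numSizeClasses]*span
	}

	// alloc pops an object for the given size class. Only the owning P
	// touches this cache; a real allocator would refill the span from
	// MCentral (which does take a lock, but only rarely) when it is empty.
	func (c *cache) alloc(sizeclass int) (uintptr, bool) {
		s := c.spans[sizeclass]
		if s == nil || len(s.free) == 0 {
			return 0, false // would refill from MCentral here
		}
		obj := s.free[len(s.free)-1]
		s.free = s.free[:len(s.free)-1]
		return obj, true
	}

	func main() {
		c := &cache{}
		c.spans[5] = &span{free: []uintptr{0x1000, 0x1040, 0x1080}}
		obj, ok := c.alloc(5)
		fmt.Printf("got object at %#x (ok=%v)\n", obj, ok)
	}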
benchmark old ns/op new ns/op delta
BenchmarkSetTypeNoPtr1 24 23 -0.42%
BenchmarkSetTypeNoPtr2 33 34 +0.89%
BenchmarkSetTypePtr1 51 49 -3.72%
BenchmarkSetTypePtr2 55 54 -1.98%
benchmark old ns/op new ns/op delta
BenchmarkAllocation 52739 50770 -3.73%
BenchmarkAllocation-2 33957 34141 +0.54%
BenchmarkAllocation-3 33326 29015 -12.94%
BenchmarkAllocation-4 38105 25795 -32.31%
BenchmarkAllocation-5 68055 24409 -64.13%
BenchmarkAllocation-6 71544 23488 -67.17%
BenchmarkAllocation-7 68374 23041 -66.30%
BenchmarkAllocation-8 70117 20758 -70.40%
LGTM=rsc, dvyukov
R=dvyukov, bradfitz, khr, rsc
CC=golang-codereviews
https://golang.org/cl/46810043
Package runtime's C functions written to be called from Go
started out written in C using carefully constructed argument
lists and the FLUSH macro to write a result back to memory.
For some functions, the appropriate parameter list ended up
being architecture-dependent due to differences in alignment,
so we added 'goc2c', which takes a .goc file containing Go func
declarations but C bodies, rewrites the Go func declaration to
equivalent C declarations for the target architecture, adds the
needed FLUSH statements, and writes out an equivalent C file.
That C file is compiled as part of package runtime.
Native Client's x86-64 support introduces the most complex
alignment rules yet, breaking many functions that could until
now be portably written in C. Using goc2c for those avoids the
breakage.
Separately, Keith's work on emitting stack information from
the C compiler would require the hand-written functions
to add #pragmas specifying how many arguments are result
parameters. Using goc2c for those avoids maintaining #pragmas.
For both reasons, use goc2c for as many Go-called C functions
as possible.
This CL is a replay of the bulk of CL 15400047 and CL 15790043,
both of which were reviewed as part of the NaCl port and are
checked in to the NaCl branch. This CL is part of bringing the
NaCl code into the main tree.
No new code here, just reformatting and occasional movement
into .h files.
LGTM=r
R=dave, alex.brainman, r
CC=golang-codereviews
https://golang.org/cl/65220044
This cleans up the code significantly, and it avoids any
possible problems with madvise zeroing out some but
not all of the data.
Fixes #6400.
LGTM=dave
R=dvyukov, dave
CC=golang-codereviews
https://golang.org/cl/57680046
Moves the sweep phase out of stoptheworld by adding a
background sweeper goroutine and lazy on-demand sweeping.
It turned out to be somewhat trickier than I expected,
because there is no point in time when we know the size of the live heap
or have a consistent count of mallocs and frees.
So everything related to next_gc, mprof, memstats, etc. becomes trickier.
At the end of GC, next_gc is conservatively set to heap_alloc*GOGC,
which is much larger than the real value. But after every sweep
next_gc is decremented by freed*GOGC, so when everything is swept
next_gc becomes what it should be.
For mprof I had to introduce a 3-generation scheme (allocs, recent_allocs, prev_allocs),
because by the end of GC we know the number of frees only for the *previous* GC.
Significant caution is required not to cross the yet-unknown real value of next_gc.
This is achieved by 2 means:
1. Whenever I allocate a span from MCentral, I sweep a span in that MCentral.
2. Whenever I allocate N pages from MHeap, I sweep until at least N pages are
returned to the heap.
This provides quite strong guarantees that the heap does not grow when it should not.
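The next_gc bookkeeping can be illustrated with a small model; gcFactor
stands in for GOGC expressed as a growth multiplier, and none of these
names are the runtime's own.

	package main

	import "fmt"

	// heapState is a toy model of the next_gc accounting described above.
	type heapState struct {
		heapAlloc uint64  // allocated bytes, including not-yet-swept garbage
		nextGC    uint64  // heap size that will trigger the next GC
		gcFactor  float64 // growth multiplier; 2.0 roughly corresponds to GOGC=100
	}

	// endGC conservatively derives next_gc from the still-unswept heap size.
	func (h *heapState) endGC() {
		h.nextGC = uint64(float64(h.heapAlloc) * h.gcFactor)
	}

	// sweptSpan lowers next_gc as garbage is freed, so that once everything
	// is swept next_gc ends up proportional to the live heap.
	func (h *heapState) sweptSpan(freed uint64) {
		h.heapAlloc -= freed
		h.nextGC -= uint64(float64(freed) * h.gcFactor)
	}

	func main() {
		h := &heapState{heapAlloc: 100 << 20, gcFactor: 2.0}
		h.endGC()
		fmt.Println("conservative next_gc (MB):", h.nextGC>>20) // 200
		h.sweptSpan(60 << 20) // 60 MB turned out to be garbage
		fmt.Println("next_gc after sweeping (MB):", h.nextGC>>20) // 80 = live(40)*2
	}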
http-1
allocated 7036 7033 -0.04%
allocs 60 60 +0.00%
cputime 51050 46700 -8.52%
gc-pause-one 34060569 1777993 -94.78%
gc-pause-total 2554 133 -94.79%
latency-50 178448 170926 -4.22%
latency-95 284350 198294 -30.26%
latency-99 345191 220652 -36.08%
rss 101564416 101007360 -0.55%
sys-gc 6606832 6541296 -0.99%
sys-heap 88801280 87752704 -1.18%
sys-other 7334208 7405928 +0.98%
sys-stack 524288 524288 +0.00%
sys-total 103266608 102224216 -1.01%
time 50339 46533 -7.56%
virtual-mem 292990976 293728256 +0.25%
garbage-1
allocated 2983818 2990889 +0.24%
allocs 62880 62902 +0.03%
cputime 16480000 16190000 -1.76%
gc-pause-one 828462467 487875135 -41.11%
gc-pause-total 4142312 2439375 -41.11%
rss 1151709184 1153712128 +0.17%
sys-gc 66068352 66068352 +0.00%
sys-heap 1039728640 1039728640 +0.00%
sys-other 37776064 40770176 +7.93%
sys-stack 8781824 8781824 +0.00%
sys-total 1152354880 1155348992 +0.26%
time 16496998 16199876 -1.80%
virtual-mem 1409564672 1402281984 -0.52%
LGTM=rsc
R=golang-codereviews, sameer, rsc, iant, jeremyjackins, gobot
CC=golang-codereviews, khr
https://golang.org/cl/46430043
Currently Windows crashes because early allocations in schedinit
try to allocate tiny memory blocks, but m->p is not yet set up.
I've considered calling procresize(1) earlier in schedinit,
but this refactoring is better and should fix the issue as well.
Fixes #7218.
R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54570045
When growing a slice, take into account the size of the allocated memory block.
Also apply the same optimization to string->[]byte conversion.
Fixes #6307.
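A quick way to see the effect (capacities are size-class dependent and
may vary across Go versions, so treat the numbers printed as illustrative):

	package main

	import "fmt"

	func main() {
		// With this change, the capacity reported after growth reflects the
		// full size of the block the allocator actually handed out, not just
		// the doubled length, so fewer reallocations are needed.
		b := make([]byte, 0)
		prevCap := cap(b)
		for i := 0; i < 2000; i++ {
			b = append(b, byte(i))
			if cap(b) != prevCap {
				fmt.Printf("len=%4d grew to cap=%4d\n", len(b), cap(b))
				prevCap = cap(b)
			}
		}

		// The same applies to string->[]byte conversion: the resulting
		// slice's capacity is the allocated block size, which later appends
		// can use without reallocating.
		s := []byte("hello, world")
		fmt.Println("len:", len(s), "cap:", cap(s))
	}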
benchmark old ns/op new ns/op delta
BenchmarkAppendGrowByte 4541036 4434108 -2.35%
BenchmarkAppendGrowString 59885673 44813604 -25.17%
LGTM=khr
R=khr
CC=golang-codereviews, iant, rsc
https://golang.org/cl/53340044
Record finalizers and heap profile info. This enables
removing the special bit from the heap bitmap, and also
provides a generic mechanism for annotating occasional
heap objects.
finalizers   overhead     per obj
  old        680 B        80 B avg
  new        16 B/span    48 B

profile      overhead     per obj
  old        32KB         24 B + hash tables
  new        16 B/span    24 B
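A rough sketch of the per-span annotation mechanism (names and layout
are illustrative only, not the runtime's actual data structures):

	package main

	import "fmt"

	// special is an annotation attached to one object in a span, e.g. a
	// finalizer or a heap-profile record.
	type special struct {
		offset uint16 // object offset within the span
		kind   byte   // which kind of annotation this is
		next   *special
	}

	// span keeps a small linked list of specials; since only the occasional
	// object carries one, the per-span overhead stays at a few words.
	type span struct {
		specials *special
	}

	// addSpecial attaches an annotation to the object at the given offset.
	func (s *span) addSpecial(offset uint16, kind byte) {
		s.specials = &special{offset: offset, kind: kind, next: s.specials}
	}

	func main() {
		var s span
		s.addSpecial(128, 1) // pretend: a finalizer for the object at offset 128
		fmt.Printf("special at offset %d, kind %d\n", s.specials.offset, s.specials.kind)
	}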
R=cshapiro, khr, dvyukov, gobot
CC=golang-codereviews
https://golang.org/cl/13314053
On the plus side, we don't need to change the bits when mallocing
pointerless objects. On the other hand, we need to mark objects in the
free lists during GC. But the free lists are small at GC time, so it
should be a net win.
benchmark old ns/op new ns/op delta
BenchmarkMalloc8 40 33 -17.65%
BenchmarkMalloc16 45 38 -15.72%
BenchmarkMallocTypeInfo8 58 59 +0.85%
BenchmarkMallocTypeInfo16 63 64 +1.10%
R=golang-dev, rsc, dvyukov
CC=cshapiro, golang-dev
https://golang.org/cl/41040043
Currently lots of sys allocations are not accounted in any of the XxxSys stats,
including the GC bitmap, spans table, GC roots blocks, GC finalizer blocks,
iface table, netpoll descriptors and more. Up to ~20% can go unaccounted.
This change introduces 2 new stats: GCSys and OtherSys, for GC metadata
and all other miscellaneous allocations, respectively.
It also ensures that all XxxSys stats indeed sum up to Sys. All sys memory allocation
functions require the stat for accounting, so that it's impossible to miss something.
Also fix updating of mcache_sys/inuse; they were not updated after deallocation.
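The accounting discipline can be sketched like this (the signatures and
stat names below are illustrative, not the runtime's real API):

	package main

	import (
		"fmt"
		"sync/atomic"
	)

	type memStats struct {
		GCSys    uint64 // GC metadata (bitmap, spans table, ...)
		OtherSys uint64 // everything else (iface table, netpoll descriptors, ...)
	}

	// sysAlloc stands in for a low-level allocation helper: the caller must
	// pass the stat to charge, so no allocation can go unaccounted.
	func sysAlloc(n uint64, stat *uint64) []byte {
		atomic.AddUint64(stat, n)
		return make([]byte, n) // stand-in for an OS memory mapping
	}

	func main() {
		var stats memStats
		_ = sysAlloc(1<<20, &stats.GCSys)     // e.g. part of the GC bitmap
		_ = sysAlloc(64<<10, &stats.OtherSys) // e.g. netpoll descriptors
		fmt.Println("GCSys:", stats.GCSys, "OtherSys:", stats.OtherSys)
	}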
test/bench/garbage/parser before:
Sys 670064344
HeapSys 610271232
StackSys 65536
MSpanSys 14204928
MCacheSys 16384
BuckHashSys 1439992
after:
Sys 670064344
HeapSys 610271232
StackSys 65536
MSpanSys 14188544
MCacheSys 16384
BuckHashSys 3194304
GCSys 39198688
OtherSys 3129656
Fixes #5799.
R=rsc, dave, alex.brainman
CC=golang-dev
https://golang.org/cl/12946043
the use of the flag, especially for objects that actually do have
pointers but that we don't want the GC to scan.
R=golang-dev, cshapiro
CC=golang-dev
https://golang.org/cl/13181045
Originally the requirement was f(x) where f's argument is
exactly x's type.
CL 11858043 relaxed the requirement in a non-standard
way: f's argument must be exactly x's type or interface{}.
If we're going to relax the requirement, it should be done
in a way consistent with the rest of Go. This CL allows f's
argument to have any type for which x is assignable;
that's the same requirement the compiler would impose
if compiling f(x) directly.
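For example, with the relaxed rule a finalizer can take an interface
type that the object's type satisfies (finalizers run asynchronously,
so the print below is not guaranteed to happen before exit):

	package main

	import (
		"fmt"
		"runtime"
		"time"
	)

	type file struct{ name string }

	func (f *file) Close() { fmt.Println("finalizer closed", f.name) }

	type closer interface{ Close() }

	func main() {
		f := &file{name: "data.txt"}

		// The finalizer's parameter need not be exactly *file: any type that
		// *file is assignable to, such as the closer interface, is accepted,
		// the same requirement the compiler would impose for a direct call.
		runtime.SetFinalizer(f, func(c closer) { c.Close() })

		f = nil
		runtime.GC()
		time.Sleep(100 * time.Millisecond) // give the finalizer goroutine a chance to run
	}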
Fixes #5368.
R=dvyukov, bradfitz, pieter
CC=golang-dev
https://golang.org/cl/12895043
Make it accept the type and combine the flags.
There are several reasons for the change:
1. mallocgc and settype must be atomic wrt GC
2. settype is called from only one place now
3. it will help performance (eventually settype
functionality must be combined with markallocated)
4. flags are easier to read now (no mallocgc(sz, 0, 1, 0) anymore)
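A hypothetical sketch of what the combined interface looks like from
the caller's side; the flag names and signature here are made up for
illustration and are not the runtime's actual ones.

	package main

	import (
		"fmt"
		"unsafe"
	)

	// Illustrative flag bits; a bitmask replaces the old positional int arguments.
	const (
		flagNoScan = 1 << iota // object contains no pointers
		flagNoZero             // caller will initialize the memory itself
	)

	// mallocgcSketch models an allocation entry point that takes the type and
	// a combined flag word. A real implementation would allocate and then,
	// atomically with respect to GC, record the type so the collector can
	// scan the object (reason 1 above).
	func mallocgcSketch(size uintptr, typ unsafe.Pointer, flags int) {
		fmt.Printf("alloc %d bytes, type=%p, noscan=%v, nozero=%v\n",
			size, typ, flags&flagNoScan != 0, flags&flagNoZero != 0)
	}

	func main() {
		// Reads much better than the old positional form mallocgc(sz, 0, 1, 0).
		mallocgcSketch(64, nil, flagNoScan|flagNoZero)
	}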
R=golang-dev, iant, nightlyone, rsc, dave, khr, bradfitz, r
CC=golang-dev
https://golang.org/cl/10136043
Also reduce FixAlloc allocation granularity from 128k to 16k;
small programs do not need that much memory for MCaches and MSpans.
R=golang-dev, khr
CC=golang-dev
https://golang.org/cl/10140044
Count only the number of frees; everything else is derivable
and does not need to be counted on every malloc.
benchmark old ns/op new ns/op delta
BenchmarkMalloc8 68 66 -3.07%
BenchmarkMalloc16 75 70 -6.48%
BenchmarkMallocTypeInfo8 102 97 -4.80%
BenchmarkMallocTypeInfo16 108 105 -2.78%
R=golang-dev, dave, rsc
CC=golang-dev
https://golang.org/cl/9776043
It is a caching wrapper around SysAlloc() that can allocate small chunks.
Use it for symtab allocations. This reduces the number of symtab walks from 4 to 3
(reduces buildfuncs time from 10ms to 7.5ms on a large binary,
and reduces the initial heap size by 680K on the same binary).
It can also be used for type info allocation and itab allocation.
There are also several places in GC where we do the same thing;
they can be changed to use persistentalloc().
It can also be used in FixAlloc, because each instance of FixAlloc allocates
in 128K regions, which is too eager.
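The shape of such a wrapper, as a self-contained sketch (chunk size,
names and locking are illustrative; blocks handed out are never freed):

	package main

	import (
		"fmt"
		"sync"
	)

	const chunkSize = 256 << 10 // fetch memory from the system in 256K chunks (illustrative)

	// persistent bump-allocates small, never-freed blocks out of larger
	// chunks, so tiny metadata allocations do not each pay for a separate
	// system allocation.
	var persistent struct {
		mu  sync.Mutex
		buf []byte // current chunk
		off int    // bump pointer within the chunk
	}

	func persistentAlloc(n int) []byte {
		persistent.mu.Lock()
		defer persistent.mu.Unlock()
		if persistent.off+n > len(persistent.buf) {
			persistent.buf = make([]byte, chunkSize) // stand-in for SysAlloc()
			persistent.off = 0
		}
		p := persistent.buf[persistent.off : persistent.off+n : persistent.off+n]
		persistent.off += n
		return p
	}

	func main() {
		a := persistentAlloc(100) // e.g. a symtab record
		b := persistentAlloc(200) // e.g. an itab
		fmt.Println(len(a), len(b))
	}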
Reincarnation of committed and rolled back https://golang.org/cl/9805043
The latent bugs that it revealed are fixed:
https://golang.org/cl/9837049
https://golang.org/cl/9778048
R=golang-dev, khr
CC=golang-dev
https://golang.org/cl/9778049
This depends on: 9791044: runtime: allocate page table lazily
Once the page table is moved out of the heap, the heap structure becomes small.
This removes unnecessary dereferences during heap access.
No logical changes.
R=golang-dev, khr
CC=golang-dev
https://golang.org/cl/9802043
This removes the 256MB memory allocation at startup,
which conflicts with ulimit.
It will also allow eliminating an unnecessary memory dereference in GC,
because the page table is usually mapped at a known address.
Update #5049.
Update #5236.
R=golang-dev, khr, r, khr, rsc
CC=golang-dev
https://golang.org/cl/9791044
multiple failures on amd64
««« original CL description
runtime: introduce helper persistentalloc() function
It is a caching wrapper around SysAlloc() that can allocate small chunks.
Use it for symtab allocations. Reduces number of symtab walks from 4 to 3
(reduces buildfuncs time from 10ms to 7.5ms on a large binary,
reduces initial heap size by 680K on the same binary).
Also can be used for type info allocation, itab allocation.
There are also several places in GC where we do the same thing,
they can be changed to use persistentalloc().
Also can be used in FixAlloc, because each instance of FixAlloc allocates
in 128K regions, which is too eager.
R=golang-dev, daniel.morsing, khr
CC=golang-dev
https://golang.org/cl/9805043
»»»
R=golang-dev
CC=golang-dev
https://golang.org/cl/9822043
It is a caching wrapper around SysAlloc() that can allocate small chunks.
Use it for symtab allocations. This reduces the number of symtab walks from 4 to 3
(reduces buildfuncs time from 10ms to 7.5ms on a large binary,
and reduces the initial heap size by 680K on the same binary).
It can also be used for type info allocation and itab allocation.
There are also several places in GC where we do the same thing;
they can be changed to use persistentalloc().
It can also be used in FixAlloc, because each instance of FixAlloc allocates
in 128K regions, which is too eager.
R=golang-dev, daniel.morsing, khr
CC=golang-dev
https://golang.org/cl/9805043
Currently per-sizeclass stats are lost for destroyed MCaches. This patch fixes that.
Also, only update mstats.heap_alloc on heap operations, because that's the only
stat that needs to be promptly updated; everything else needs to be up-to-date only in ReadMemStats().
R=golang-dev, remyoudompheng, dave, iant
CC=golang-dev
https://golang.org/cl/9207047
The nlistmin/size thresholds are copied from tcmalloc,
but are unnecessary for Go malloc. We do not do explicit
frees into MCache. For the rare cases when we do (mainly hashmap),
simpler logic will do.
R=rsc, dave, iant
CC=gobot, golang-dev, r, remyoudompheng
https://golang.org/cl/9373043
Finer-grained transfers were relevant with per-M caches;
with per-P caches they are no longer relevant and are harmful for performance.
For the few small size classes where it makes a difference,
it's fine to grab the whole span (4K).
benchmark old ns/op new ns/op delta
BenchmarkMalloc 42 40 -4.45%
R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/9374043
Also change the table type from int32[] to int8[] to save space in the L1 cache.
benchmark old ns/op new ns/op delta
BenchmarkMalloc 42 40 -4.68%
R=golang-dev, bradfitz, r
CC=golang-dev
https://golang.org/cl/9199044
Before, the mheap structure was in the bss,
but it's quite large (today, 256 MB, much of
which is never actually paged in), and it makes
Go binaries run afoul of exec-time bss size
limits on some BSD systems.
Fixes #4447.
R=golang-dev, dave, minux.ma, remyoudompheng, iant
CC=golang-dev
https://golang.org/cl/7307122