Currently small and large (size>rate) objects are merged into a single entry.
But rate adjusting is required only for small objects.
As a result pprof either incorrectly adjusts large objects
or does not adjust small objects.
With this change objects of different sizes are stored in different buckets.
LGTM=rsc
R=golang-codereviews, gobot, rsc
CC=golang-codereviews
https://golang.org/cl/59220049
Currently it periodically fails with the following message.
The immediate cause is the wrong base register when obtaining g
in sys_windows_amd64/386.s.
But there are several secondary problems as well.
runtime: unknown pc 0x0 after stack split
panic: invalid memory address or nil pointer dereference
fatal error: panic during malloc
[signal 0xc0000005 code=0x0 addr=0x60 pc=0x42267a]
runtime stack:
runtime.panic(0x7914c0, 0xc862af)
c:/src/perfer/work/windows-amd64-a15f344a9efa/go/src/pkg/runtime/panic.c:217 +0x2c
runtime: unexpected return pc for runtime.externalthreadhandler called from 0x0
R=rsc, alex.brainman
CC=golang-codereviews
https://golang.org/cl/63310043
This cleans up the code significantly, and it avoids any
possible problems with madvise zeroing out some but
not all of the data.
Fixes#6400.
LGTM=dave
R=dvyukov, dave
CC=golang-codereviews
https://golang.org/cl/57680046
The issue was that one of the MSpan_Sweep callers
was doing sweep with preemption enabled.
Additional checks are added.
LGTM=rsc
R=rsc, dave
CC=golang-codereviews
https://golang.org/cl/62990043
State of the world:
CL 46430043 introduced a new concurrent sweep but is broken.
CL 62360043 made the new sweep non-concurrent
to try to fix the world while we understand what's wrong with
the concurrent version.
This CL fixes the non-concurrent form to run finalizers.
This CL is just a band-aid to get the build green again.
Dmitriy is working on understanding and then fixing what's
wrong with the concurrent sweep.
TBR=dvyukov
CC=golang-codereviews
https://golang.org/cl/62370043
Better sampling of objects that are close in size to sampling rate.
See the comment for details.
LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/43830043
Current "System->etext" is not very informative.
Add parent "GC" frame.
Replace un-unwindable syscall/cgo frames with Go stack that leads to the call.
LGTM=rsc
R=rsc, alex.brainman, ality
CC=golang-codereviews
https://golang.org/cl/61270043
Remove GOOS_solaris ifdef from netpoll code,
instead introduce runtime edge/level triggered IO flag.
Replace armread/armwrite with a single arm(mode) function,
that's how all other interfaces look like and these functions
will need to do roughly the same thing anyway.
LGTM=rsc
R=golang-codereviews, dave, rsc
CC=golang-codereviews
https://golang.org/cl/55500044
1. Make internal chan functions static.
2. Move selgen local variable instead of a member of G struct.
3. Change "bool *pres/selected" parameter of chansend/chanrecv to "bool block",
which is simpler, faster and less code.
-37 lines total.
LGTM=rsc
R=golang-codereviews, dave, gobot, rsc
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/58610043
Moves sweep phase out of stoptheworld by adding
background sweeper goroutine and lazy on-demand sweeping.
It turned out to be somewhat trickier than I expected,
because there is no point in time when we know size of live heap
nor consistent number of mallocs and frees.
So everything related to next_gc, mprof, memstats, etc becomes trickier.
At the end of GC next_gc is conservatively set to heap_alloc*GOGC,
which is much larger than real value. But after every sweep
next_gc is decremented by freed*GOGC. So when everything is swept
next_gc becomes what it should be.
For mprof I had to introduce 3-generation scheme (allocs, revent_allocs, prev_allocs),
because by the end of GC we know number of frees for the *previous* GC.
Significant caution is required to not cross yet-unknown real value of next_gc.
This is achieved by 2 means:
1. Whenever I allocate a span from MCentral, I sweep a span in that MCentral.
2. Whenever I allocate N pages from MHeap, I sweep until at least N pages are
returned to heap.
This provides quite strong guarantees that heap does not grow when it should now.
http-1
allocated 7036 7033 -0.04%
allocs 60 60 +0.00%
cputime 51050 46700 -8.52%
gc-pause-one 34060569 1777993 -94.78%
gc-pause-total 2554 133 -94.79%
latency-50 178448 170926 -4.22%
latency-95 284350 198294 -30.26%
latency-99 345191 220652 -36.08%
rss 101564416 101007360 -0.55%
sys-gc 6606832 6541296 -0.99%
sys-heap 88801280 87752704 -1.18%
sys-other 7334208 7405928 +0.98%
sys-stack 524288 524288 +0.00%
sys-total 103266608 102224216 -1.01%
time 50339 46533 -7.56%
virtual-mem 292990976 293728256 +0.25%
garbage-1
allocated 2983818 2990889 +0.24%
allocs 62880 62902 +0.03%
cputime 16480000 16190000 -1.76%
gc-pause-one 828462467 487875135 -41.11%
gc-pause-total 4142312 2439375 -41.11%
rss 1151709184 1153712128 +0.17%
sys-gc 66068352 66068352 +0.00%
sys-heap 1039728640 1039728640 +0.00%
sys-other 37776064 40770176 +7.93%
sys-stack 8781824 8781824 +0.00%
sys-total 1152354880 1155348992 +0.26%
time 16496998 16199876 -1.80%
virtual-mem 1409564672 1402281984 -0.52%
LGTM=rsc
R=golang-codereviews, sameer, rsc, iant, jeremyjackins, gobot
CC=golang-codereviews, khr
https://golang.org/cl/46430043
There is frequently a thread hanging on GQCS,
currently it skews profiles towards netpoll,
but it is not bad and is not consuming any resources.
R=alex.brainman
CC=golang-codereviews
https://golang.org/cl/61560043
mp->mcache can be concurrently modified by runtime·helpgc.
In such case sigprof can remember mcache=nil, then helpgc sets it to non-nil,
then sigprof restores it back to nil, GC crashes with nil mcache.
R=rsc
CC=golang-codereviews
https://golang.org/cl/58860044
This CL is in preparation to make cgo work on freebsd/arm.
How to generate defs-files on freebsd/arm in the bootstrapping phase:
1. run freebsd on appropriate arm-eabi platforms
2. both syscall z-files and runtime def-files in the current tree are
broken about EABI padding, fix them by hand
3. run make.bash again to build $GOTOOLDIR/cgo
4. use $GOTOOLDIR/cgo directly
LGTM=minux.ma, iant
R=iant, minux.ma, dave
CC=golang-codereviews
https://golang.org/cl/59580045
Add benchmarks for:
1. non-blocking failing receive (polling of "stop" chan)
2. channel-based semaphore (gate pattern)
3. select-based producer/consumer (pass data through a channel, but also wait on "stop" and "timeout" channels)
LGTM=r
R=golang-codereviews, r
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/59040043
On 32-bits one can arrange make(chan) params so that
the chan buffer gives you access to whole memory.
LGTM=r
R=golang-codereviews, r
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/50250045
Tiny alloc memory block is shared by different goroutines running on the same thread.
We call racemalloc after enabling preemption in mallocgc,
as the result another goroutine can act on not yet race-cleared tiny block.
Call racemalloc before enabling preemption.
Fixes#7224.
LGTM=dave
R=golang-codereviews, dave
CC=golang-codereviews
https://golang.org/cl/57730043
Although debug.Stack is deprecated, it should still return the correct result.
Output before this CL (using a trivial library in $GOPATH/test.com/a):
/home/vince/src/test.com/a/lib.go:9 (0x42311e)
com/a.ShowStack: os.Stdout.Write(debug.Stack())
Output with this CL applied:
/home/vince/src/test.com/a/lib.go:9 (0x42311e)
ShowStack: os.Stdout.Write(debug.Stack())
LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/57330043
Currently windows crashes because early allocs in schedinit
try to allocate tiny memory blocks, but m->p is not yet setup.
I've considered calling procresize(1) earlier in schedinit,
but this refactoring is better and must fix the issue as well.
Fixes#7218.
R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54570045
When GOMAXPROCS>1 the last P in syscall is never retaken
(because there are already idle P's -- npidle>0).
This prevents sysmon thread from sleeping.
On a darwin machine the program from issue 6673 constantly
consumes ~0.2% CPU. With this change it stably consumes 0.0% CPU.
Fixes#6673.
R=golang-codereviews, r
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/56990045
Use the smaller read-only bytes.NewReader/strings.NewReader instead
of a bytes.Buffer when possible.
LGTM=r
R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54660045
On 32-bits n*sizeof(r[0]) can overflow.
Or it can become 1<<32-eps, and mallocgc will "successfully"
allocate 0 pages for it, there are no checks downstream
and MHeap_Grow just does:
npage = (npage+15)&~15;
ask = npage<<PageShift;
LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews
https://golang.org/cl/54760045
When growing slice take into account size of the allocated memory block.
Also apply the same optimization to string->[]byte conversion.
Fixes#6307.
benchmark old ns/op new ns/op delta
BenchmarkAppendGrowByte 4541036 4434108 -2.35%
BenchmarkAppendGrowString 59885673 44813604 -25.17%
LGTM=khr
R=khr
CC=golang-codereviews, iant, rsc
https://golang.org/cl/53340044
On top of "tiny allocator" (cl/38750047), reduces number of allocs by 1% on json.
No code must rely on zero termination. So will also make debugging simpler,
by uncovering issues earlier.
json-1
allocated 7949686 7915766 -0.43%
allocs 93778 92790 -1.05%
time 100957795 97250949 -3.67%
rest of the metrics are too noisy.
LGTM=r
R=golang-codereviews, r, bradfitz, iant
CC=golang-codereviews
https://golang.org/cl/40370061
There is more zeroing than I would like right now -
temporaries used for the new map and channel runtime
calls need to be eliminated - but it will do for now.
This CL only has an effect if you are building with
GOEXPERIMENT=precisestack ./all.bash
(or make.bash). It costs about 5% in the overall time
spent in all.bash. That number will come down before
we make it on by default, but this should be enough for
Keith to try using the precise maps for copying stacks.
amd64 only (and it's not really great generated code).
TBR=khr, iant
CC=golang-codereviews
https://golang.org/cl/56430043
The addition of TLS to ARM rewrote the MRC instruction
differently depending on whether we were using internal
or external linking mode. That's clearly not okay, since we
don't know that during compilation, which is when we now
generate the code. Also, because the change did not introduce
a real MRC instruction but instead just macro-expanded it
in the assembler, liblink is rewriting a WORD instruction that
may actually be looking for that specific constant, which would
lead to very unexpected results. It was also using one value
that happened to be 8 where a different value that also
happened to be 8 belonged. So the code was correct for those
values but not correct in general, and very confusing.
Throw it all away.
Replace with the following. There is a linker-provided symbol
runtime.tlsgm with a value (address) set to the offset from the
hardware-provided TLS base register to the g and m storage.
Any reference to that name emits an appropriate TLS relocation
to be resolved by either the internal linker or the external linker,
depending on the link mode. The relocation has exactly the
semantics of the R_ARM_TLS_LE32 relocation, which is what
the external linker provides.
This symbol is only used in two routines, runtime.load_gm and
runtime.save_gm. In both cases it is now used like this:
MRC 15, 0, R0, C13, C0, 3 // fetch TLS base pointer
MOVW $runtime·tlsgm(SB), R2
ADD R2, R0 // now R0 points at thread-local g+m storage
It is likely that this change breaks the generation of shared libraries
on ARM, because the MOVW needs to be rewritten to use the global
offset table and a different relocation type. But let's get the supported
functionality working again before we worry about unsupported
functionality.
LGTM=dave, iant
R=iant, dave
CC=golang-codereviews
https://golang.org/cl/56120043
The typo was introduced by one of Dmitriy's CLs this morning.
The fix makes the ARM build compile again; it still won't pass
its tests, but one thing at a time.
TBR=dvyukov
CC=golang-codereviews
https://golang.org/cl/55770044
Introduces two-phase goroutine parking mechanism -- prepare to park, commit park.
This mechanism does not require backing mutex to protect wait predicate.
Use it in netpoll. See comment in netpoll.goc for details.
This slightly reduces contention between reader, writer and read/write io notifications;
and just eliminates a bunch of mutex operations from hotpaths, thus making then faster.
benchmark old ns/op new ns/op delta
BenchmarkTCP4ConcurrentReadWrite 2109 1945 -7.78%
BenchmarkTCP4ConcurrentReadWrite-2 1162 1113 -4.22%
BenchmarkTCP4ConcurrentReadWrite-4 798 755 -5.39%
BenchmarkTCP4ConcurrentReadWrite-8 803 748 -6.85%
BenchmarkTCP4Persistent 9411 9240 -1.82%
BenchmarkTCP4Persistent-2 5888 5813 -1.27%
BenchmarkTCP4Persistent-4 4016 3968 -1.20%
BenchmarkTCP4Persistent-8 3943 3857 -2.18%
R=golang-codereviews, mikioh.mikioh, gobot, iant, rsc
CC=golang-codereviews, khr
https://golang.org/cl/45700043
- do not lose profiling signals when we have no mcache (possible for syscalls/cgo)
- do not lose any profiling signals on windows
- fix profiling of cgo programs on windows (they had no m->thread setup)
- properly setup tls in cgo programs on windows
- check _beginthread return value
Fixes#6417.
Fixes#6986.
R=alex.brainman, rsc
CC=golang-codereviews
https://golang.org/cl/44820047