Commit Graph

1742 Commits

Author SHA1 Message Date
Dmitriy Vyukov e71d147750 runtime: fix mem profile when both large and small objects are allocated at the same stack
Currently small and large (size>rate) objects are merged into a single entry.
But rate adjusting is required only for small objects.
As a result pprof either incorrectly adjusts large objects
or does not adjust small objects.
With this change objects of different sizes are stored in different buckets.

LGTM=rsc
R=golang-codereviews, gobot, rsc
CC=golang-codereviews
https://golang.org/cl/59220049
2014-02-14 13:20:41 +04:00
Dmitriy Vyukov eca55f5ac0 runtime: fix windows cpu profiler
Currently it periodically fails with the following message.
The immediate cause is the wrong base register when obtaining g
in sys_windows_amd64/386.s.
But there are several secondary problems as well.

runtime: unknown pc 0x0 after stack split
panic: invalid memory address or nil pointer dereference
fatal error: panic during malloc
[signal 0xc0000005 code=0x0 addr=0x60 pc=0x42267a]

runtime stack:
runtime.panic(0x7914c0, 0xc862af)
        c:/src/perfer/work/windows-amd64-a15f344a9efa/go/src/pkg/runtime/panic.c:217 +0x2c
runtime: unexpected return pc for runtime.externalthreadhandler called from 0x0

R=rsc, alex.brainman
CC=golang-codereviews
https://golang.org/cl/63310043
2014-02-14 09:20:51 +04:00
Russ Cox e8ecd9f67a runtime: update malloc comment for MSpan.needzero
Missed this suggestion in CL 57680046.

LGTM=iant
R=iant
CC=golang-codereviews
https://golang.org/cl/63390043
2014-02-13 14:31:48 -05:00
Russ Cox 86e3cb8da5 runtime: introduce MSpan.needzero instead of writing to span data
This cleans up the code significantly, and it avoids any
possible problems with madvise zeroing out some but
not all of the data.

Fixes #6400.

LGTM=dave
R=dvyukov, dave
CC=golang-codereviews
https://golang.org/cl/57680046
2014-02-13 11:10:31 -05:00
Dmitriy Vyukov f8e4a2ef94 runtime: fix concurrent GC sweep
The issue was that one of the MSpan_Sweep callers
was doing sweep with preemption enabled.
Additional checks are added.

LGTM=rsc
R=rsc, dave
CC=golang-codereviews
https://golang.org/cl/62990043
2014-02-13 19:36:45 +04:00
Russ Cox 39067c79f3 runtime/pprof: fix arm build after CL 61270043
TBR=dvyukov
CC=golang-codereviews
https://golang.org/cl/62960043
2014-02-13 01:16:20 -05:00
Russ Cox 73a304356b runtime: fix non-concurrent sweep
State of the world:

CL 46430043 introduced a new concurrent sweep but is broken.

CL 62360043 made the new sweep non-concurrent
to try to fix the world while we understand what's wrong with
the concurrent version.

This CL fixes the non-concurrent form to run finalizers.
This CL is just a band-aid to get the build green again.

Dmitriy is working on understanding and then fixing what's
wrong with the concurrent sweep.

TBR=dvyukov
CC=golang-codereviews
https://golang.org/cl/62370043
2014-02-12 15:54:21 -05:00
Dmitriy Vyukov 3cac829ff4 runtime: temporary disable concurrent GC sweep
We see failures on builders, e.g.:
http://build.golang.org/log/70bb28cd6bcf8c4f49810a011bb4337a61977bf4

LGTM=rsc, dave
R=rsc, dave
CC=golang-codereviews
https://golang.org/cl/62360043
2014-02-13 00:03:27 +04:00
Dmitriy Vyukov bf0d71af29 runtime: more precise mprof sampling
Better sampling of objects that are close in size to sampling rate.
See the comment for details.

LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/43830043
2014-02-12 22:36:45 +04:00
Dmitriy Vyukov 5e72fae9b2 runtime: improve cpu profiles for GC/syscalls/cgo
Current "System->etext" is not very informative.
Add parent "GC" frame.
Replace un-unwindable syscall/cgo frames with Go stack that leads to the call.

LGTM=rsc
R=rsc, alex.brainman, ality
CC=golang-codereviews
https://golang.org/cl/61270043
2014-02-12 22:31:36 +04:00
Dmitriy Vyukov 2ea859a779 runtime: refactor level-triggered IO support
Remove GOOS_solaris ifdef from netpoll code,
instead introduce runtime edge/level triggered IO flag.
Replace armread/armwrite with a single arm(mode) function,
that's how all other interfaces look like and these functions
will need to do roughly the same thing anyway.

LGTM=rsc
R=golang-codereviews, dave, rsc
CC=golang-codereviews
https://golang.org/cl/55500044
2014-02-12 22:24:29 +04:00
Dmitriy Vyukov e1ee04828d runtime: refactor chan code
1. Make internal chan functions static.
2. Move selgen local variable instead of a member of G struct.
3. Change "bool *pres/selected" parameter of chansend/chanrecv to "bool block",
   which is simpler, faster and less code.
-37 lines total.

LGTM=rsc
R=golang-codereviews, dave, gobot, rsc
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/58610043
2014-02-12 22:21:38 +04:00
Dmitriy Vyukov 3c3be62201 runtime: concurrent GC sweep
Moves sweep phase out of stoptheworld by adding
background sweeper goroutine and lazy on-demand sweeping.

It turned out to be somewhat trickier than I expected,
because there is no point in time when we know size of live heap
nor consistent number of mallocs and frees.
So everything related to next_gc, mprof, memstats, etc becomes trickier.

At the end of GC next_gc is conservatively set to heap_alloc*GOGC,
which is much larger than real value. But after every sweep
next_gc is decremented by freed*GOGC. So when everything is swept
next_gc becomes what it should be.

For mprof I had to introduce 3-generation scheme (allocs, revent_allocs, prev_allocs),
because by the end of GC we know number of frees for the *previous* GC.

Significant caution is required to not cross yet-unknown real value of next_gc.
This is achieved by 2 means:
1. Whenever I allocate a span from MCentral, I sweep a span in that MCentral.
2. Whenever I allocate N pages from MHeap, I sweep until at least N pages are
returned to heap.
This provides quite strong guarantees that heap does not grow when it should now.

http-1
allocated                    7036         7033      -0.04%
allocs                         60           60      +0.00%
cputime                     51050        46700      -8.52%
gc-pause-one             34060569      1777993     -94.78%
gc-pause-total               2554          133     -94.79%
latency-50                 178448       170926      -4.22%
latency-95                 284350       198294     -30.26%
latency-99                 345191       220652     -36.08%
rss                     101564416    101007360      -0.55%
sys-gc                    6606832      6541296      -0.99%
sys-heap                 88801280     87752704      -1.18%
sys-other                 7334208      7405928      +0.98%
sys-stack                  524288       524288      +0.00%
sys-total               103266608    102224216      -1.01%
time                        50339        46533      -7.56%
virtual-mem             292990976    293728256      +0.25%

garbage-1
allocated                 2983818      2990889      +0.24%
allocs                      62880        62902      +0.03%
cputime                  16480000     16190000      -1.76%
gc-pause-one            828462467    487875135     -41.11%
gc-pause-total            4142312      2439375     -41.11%
rss                    1151709184   1153712128      +0.17%
sys-gc                   66068352     66068352      +0.00%
sys-heap               1039728640   1039728640      +0.00%
sys-other                37776064     40770176      +7.93%
sys-stack                 8781824      8781824      +0.00%
sys-total              1152354880   1155348992      +0.26%
time                     16496998     16199876      -1.80%
virtual-mem            1409564672   1402281984      -0.52%

LGTM=rsc
R=golang-codereviews, sameer, rsc, iant, jeremyjackins, gobot
CC=golang-codereviews, khr
https://golang.org/cl/46430043
2014-02-12 22:16:42 +04:00
Dmitriy Vyukov e5a4211b36 runtime: do not profile blocked netpoll on windows
There is frequently a thread hanging on GQCS,
currently it skews profiles towards netpoll,
but it is not bad and is not consuming any resources.

R=alex.brainman
CC=golang-codereviews
https://golang.org/cl/61560043
2014-02-11 13:41:46 +04:00
David du Colombier 120218afeb runtime: homogenize panic strings on Plan 9
LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/61410046
2014-02-11 09:34:43 +01:00
David du Colombier 76dcb9b346 runtime: handle "sys: trap: divide error" note on Plan 9
Fixes #7286.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/61410044
2014-02-10 21:47:52 +01:00
Dmitriy Vyukov 373e1e94d8 runtime: fix crash during cpu profiling
mp->mcache can be concurrently modified by runtime·helpgc.
In such case sigprof can remember mcache=nil, then helpgc sets it to non-nil,
then sigprof restores it back to nil, GC crashes with nil mcache.

R=rsc
CC=golang-codereviews
https://golang.org/cl/58860044
2014-02-10 20:24:47 +04:00
Dmitriy Vyukov 0229dc6dbe runtime: do not cpu profile idle threads on windows
Currently this leads to a significant skew towards 'etext' entry,
since all idle threads are profiled every tick.
Before:
Total: 66608 samples
   63188  94.9%  94.9%    63188  94.9% etext
     278   0.4%  95.3%      278   0.4% sweepspan
     216   0.3%  95.6%      448   0.7% runtime.mallocgc
     122   0.2%  95.8%      122   0.2% scanblock
     113   0.2%  96.0%      113   0.2% net/textproto.canonicalMIMEHeaderKey
After:
Total: 8008 samples
    3949  49.3%  49.3%     3949  49.3% etext
     231   2.9%  52.2%      231   2.9% scanblock
     211   2.6%  54.8%      211   2.6% runtime.cas64
     182   2.3%  57.1%      408   5.1% runtime.mallocgc
     178   2.2%  59.3%      178   2.2% runtime.atomicload64

LGTM=alex.brainman
R=golang-codereviews, alex.brainman
CC=golang-codereviews
https://golang.org/cl/61250043
2014-02-10 15:40:55 +04:00
Keith Randall da7cf0ba5d runtime: faster memclr on x86.
Use explicit SSE writes instead of REP STOSQ.

benchmark               old ns/op    new ns/op    delta
BenchmarkMemclr5               22            5  -73.62%
BenchmarkMemclr16              27            5  -78.49%
BenchmarkMemclr64              28            6  -76.43%
BenchmarkMemclr256             34            8  -74.94%
BenchmarkMemclr4096           112           84  -24.73%
BenchmarkMemclr65536         1902         1920   +0.95%

LGTM=dvyukov
R=golang-codereviews, dvyukov
CC=golang-codereviews
https://golang.org/cl/60090044
2014-02-06 17:43:22 -08:00
Mikio Hara 61fe7d8308 runtime/cgo: fix build on freebsd/arm
This CL is in preparation to make cgo work on freebsd/arm.

LGTM=iant
R=iant
CC=golang-codereviews
https://golang.org/cl/60500044
2014-02-07 10:22:34 +09:00
Mikio Hara 8e56eb8b57 runtime: fix build on freebsd/arm
This CL is in preparation to make cgo work on freebsd/arm.

How to generate defs-files on freebsd/arm in the bootstrapping phase:
1. run freebsd on appropriate arm-eabi platforms
2. both syscall z-files and runtime def-files in the current tree are
   broken about EABI padding, fix them by hand
3. run make.bash again to build $GOTOOLDIR/cgo
4. use $GOTOOLDIR/cgo directly

LGTM=minux.ma, iant
R=iant, minux.ma, dave
CC=golang-codereviews
https://golang.org/cl/59580045
2014-02-07 10:22:13 +09:00
Dmitriy Vyukov 3baceaa151 runtime: add more chan benchmarks
Add benchmarks for:
1. non-blocking failing receive (polling of "stop" chan)
2. channel-based semaphore (gate pattern)
3. select-based producer/consumer (pass data through a channel, but also wait on "stop" and "timeout" channels)

LGTM=r
R=golang-codereviews, r
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/59040043
2014-02-04 09:41:48 +04:00
Dmitriy Vyukov 1b2e435b15 runtime: fix typos in test
I don't know what is n, but it exists somewhere there.

LGTM=dave
R=golang-codereviews, dave
CC=golang-codereviews
https://golang.org/cl/58710043
2014-01-31 18:09:53 +04:00
Dmitriy Vyukov e48751e217 runtime: increase page size to 8K
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
Only Chromium uses 4K pages today (in "slow but small" configuration).
The general tendency is to increase page size, because it reduces
metadata size and DTLB pressure.
This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated                 8037492      8038689      +0.01%
allocs                     105762       105573      -0.18%
cputime                 158400000    155800000      -1.64%
gc-pause-one              4412234      4135702      -6.27%
gc-pause-total            2647340      2398707      -9.39%
rss                      54923264     54525952      -0.72%
sys-gc                    3952624      3928048      -0.62%
sys-heap                 46399488     46006272      -0.85%
sys-other                 5597504      5290304      -5.49%
sys-stack                  393216       393216      +0.00%
sys-total                56342832     55617840      -1.29%
time                    158478890    156046916      -1.53%
virtual-mem             256548864    256593920      +0.02%

garbage-1
allocated                 2991113      2986259      -0.16%
allocs                      62844        62652      -0.31%
cputime                  16330000     15860000      -2.88%
gc-pause-one            789108229    725555211      -8.05%
gc-pause-total            3945541      3627776      -8.05%
rss                    1143660544   1132253184      -1.00%
sys-gc                   65609600     65806208      +0.30%
sys-heap               1032388608   1035599872      +0.31%
sys-other                37501632     22777664     -39.26%
sys-stack                 8650752      8781824      +1.52%
sys-total              1144150592   1132965568      -0.98%
time                     16364602     15891994      -2.89%
virtual-mem            1327296512   1313746944      -1.02%

This is the exact reincarnation of already LGTMed:
https://golang.org/cl/45770044
which must not break darwin/freebsd after:
https://golang.org/cl/56630043
TBR=iant

LGTM=khr, iant
R=iant, khr
CC=golang-codereviews
https://golang.org/cl/58230043
2014-01-30 13:28:19 +04:00
Dmitriy Vyukov 327e431057 runtime: prepare for 8K pages
Ensure than heap is PageSize aligned.

LGTM=iant
R=iant, dave, gobot
CC=golang-codereviews
https://golang.org/cl/56630043
2014-01-29 18:18:46 +04:00
Dmitriy Vyukov d62379eef5 runtime: more chan tests
LGTM=bradfitz
R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/57390043
2014-01-28 22:45:14 +04:00
Dmitriy Vyukov d176e3d7c5 runtime: prefetch next block in mallocgc
json-1
cputime                  99600000     98600000      -1.00%
time                    100005493     98859693      -1.15%

garbage-1
cputime                  15760000     15440000      -2.03%
time                     15791759     15471701      -2.03%

LGTM=khr
R=golang-codereviews, gobot, khr, dave
CC=bradfitz, golang-codereviews, iant
https://golang.org/cl/57310043
2014-01-28 22:38:39 +04:00
Dmitriy Vyukov d409e44cfb runtime: fix buffer overflow in make(chan)
On 32-bits one can arrange make(chan) params so that
the chan buffer gives you access to whole memory.

LGTM=r
R=golang-codereviews, r
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/50250045
2014-01-28 22:37:35 +04:00
Dmitriy Vyukov ce884036d2 runtime: adjust malloc race instrumentation for tiny allocs
Tiny alloc memory block is shared by different goroutines running on the same thread.
We call racemalloc after enabling preemption in mallocgc,
as the result another goroutine can act on not yet race-cleared tiny block.
Call racemalloc before enabling preemption.
Fixes #7224.

LGTM=dave
R=golang-codereviews, dave
CC=golang-codereviews
https://golang.org/cl/57730043
2014-01-28 22:34:32 +04:00
Vincent Vanackere d7c14655a9 runtime/debug: fix incorrect Stack output if package path contains a dot
Although debug.Stack is deprecated, it should still return the correct result.
Output before this CL (using a trivial library in $GOPATH/test.com/a):
/home/vince/src/test.com/a/lib.go:9 (0x42311e)
        com/a.ShowStack: os.Stdout.Write(debug.Stack())

Output with this CL applied:
/home/vince/src/test.com/a/lib.go:9 (0x42311e)
        ShowStack: os.Stdout.Write(debug.Stack())

LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews
https://golang.org/cl/57330043
2014-01-27 14:00:00 -08:00
Dmitriy Vyukov 86a3a54284 runtime: fix windows build
Currently windows crashes because early allocs in schedinit
try to allocate tiny memory blocks, but m->p is not yet setup.
I've considered calling procresize(1) earlier in schedinit,
but this refactoring is better and must fix the issue as well.
Fixes #7218.

R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54570045
2014-01-28 00:26:56 +04:00
Dmitriy Vyukov 179d41fecc runtime: tune P retake logic
When GOMAXPROCS>1 the last P in syscall is never retaken
(because there are already idle P's -- npidle>0).
This prevents sysmon thread from sleeping.
On a darwin machine the program from issue 6673 constantly
consumes ~0.2% CPU. With this change it stably consumes 0.0% CPU.
Fixes #6673.

R=golang-codereviews, r
CC=bradfitz, golang-codereviews, iant, khr
https://golang.org/cl/56990045
2014-01-27 23:17:46 +04:00
Brad Fitzpatrick a18f4ab569 all: use {bytes,strings}.NewReader instead of bytes.Buffers
Use the smaller read-only bytes.NewReader/strings.NewReader instead
of a bytes.Buffer when possible.

LGTM=r
R=golang-codereviews, r
CC=golang-codereviews
https://golang.org/cl/54660045
2014-01-27 11:05:01 -08:00
Dmitriy Vyukov e1a91c5b89 runtime: fix buffer overflow in stringtoslicerune
On 32-bits n*sizeof(r[0]) can overflow.
Or it can become 1<<32-eps, and mallocgc will "successfully"
allocate 0 pages for it, there are no checks downstream
and MHeap_Grow just does:
npage = (npage+15)&~15;
ask = npage<<PageShift;

LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews
https://golang.org/cl/54760045
2014-01-27 20:29:21 +04:00
Dmitriy Vyukov bace9523ee runtime: smarter slice grow
When growing slice take into account size of the allocated memory block.
Also apply the same optimization to string->[]byte conversion.
Fixes #6307.

benchmark                    old ns/op    new ns/op    delta
BenchmarkAppendGrowByte        4541036      4434108   -2.35%
BenchmarkAppendGrowString     59885673     44813604  -25.17%

LGTM=khr
R=khr
CC=golang-codereviews, iant, rsc
https://golang.org/cl/53340044
2014-01-27 15:11:12 +04:00
Jeff Sickel 03e4f25849 runtime/pprof: plan9 fails the TestGoroutineSwitch, skip for now.
LGTM=r
R=golang-codereviews, 0intro, r
CC=golang-codereviews
https://golang.org/cl/55430043
2014-01-25 10:09:08 -08:00
Dmitriy Vyukov 1fa7029425 runtime: combine small NoScan allocations
Combine NoScan allocations < 16 bytes into a single memory block.
Reduces number of allocations on json/garbage benchmarks by 10+%.

json-1
allocated                 8039872      7949194      -1.13%
allocs                     105774        93776     -11.34%
cputime                 156200000    100700000     -35.53%
gc-pause-one              4908873      3814853     -22.29%
gc-pause-total            2748969      2899288      +5.47%
rss                      52674560     43560960     -17.30%
sys-gc                    3796976      3256304     -14.24%
sys-heap                 43843584     35192832     -19.73%
sys-other                 5589312      5310784      -4.98%
sys-stack                  393216       393216      +0.00%
sys-total                53623088     44153136     -17.66%
time                    156193436    100886714     -35.41%
virtual-mem             256548864    256540672      -0.00%

garbage-1
allocated                 2996885      2932982      -2.13%
allocs                      62904        55200     -12.25%
cputime                  17470000     17400000      -0.40%
gc-pause-one            932757485    925806143      -0.75%
gc-pause-total            4663787      4629030      -0.75%
rss                    1151074304   1133670400      -1.51%
sys-gc                   66068352     65085312      -1.49%
sys-heap               1039728640   1024065536      -1.51%
sys-other                38038208     37485248      -1.45%
sys-stack                 8650752      8781824      +1.52%
sys-total              1152485952   1135417920      -1.48%
time                     17478088     17418005      -0.34%
virtual-mem            1343709184   1324204032      -1.45%

LGTM=iant, bradfitz
R=golang-codereviews, dave, iant, rsc, bradfitz
CC=golang-codereviews, khr
https://golang.org/cl/38750047
2014-01-24 22:35:11 +04:00
Dmitriy Vyukov f8e0057bb7 sync: scalable Pool
Introduce fixed-size P-local caches.
When local caches overflow/underflow a batch of items
is transferred to/from global mutex-protected cache.

benchmark                    old ns/op    new ns/op    delta
BenchmarkPool                    50554        22423  -55.65%
BenchmarkPool-4                 400359         5904  -98.53%
BenchmarkPool-16                403311         1598  -99.60%
BenchmarkPool-32                367310         1526  -99.58%

BenchmarkPoolOverlflow            5214         3633  -30.32%
BenchmarkPoolOverlflow-4         42663         9539  -77.64%
BenchmarkPoolOverlflow-8         46919        11385  -75.73%
BenchmarkPoolOverlflow-16        39454        13048  -66.93%

BenchmarkSprintfEmpty                    84           63  -25.68%
BenchmarkSprintfEmpty-2                 371           32  -91.13%
BenchmarkSprintfEmpty-4                 465           22  -95.25%
BenchmarkSprintfEmpty-8                 565           12  -97.77%
BenchmarkSprintfEmpty-16                498            5  -98.87%
BenchmarkSprintfEmpty-32                492            4  -99.04%

BenchmarkSprintfString                  259          229  -11.58%
BenchmarkSprintfString-2                574          144  -74.91%
BenchmarkSprintfString-4                651           77  -88.05%
BenchmarkSprintfString-8                868           47  -94.48%
BenchmarkSprintfString-16               825           33  -95.96%
BenchmarkSprintfString-32               825           30  -96.28%

BenchmarkSprintfInt                     213          188  -11.74%
BenchmarkSprintfInt-2                   448          138  -69.20%
BenchmarkSprintfInt-4                   624           52  -91.63%
BenchmarkSprintfInt-8                   691           31  -95.43%
BenchmarkSprintfInt-16                  724           18  -97.46%
BenchmarkSprintfInt-32                  718           16  -97.70%

BenchmarkSprintfIntInt                  311          282   -9.32%
BenchmarkSprintfIntInt-2                333          145  -56.46%
BenchmarkSprintfIntInt-4                642          110  -82.87%
BenchmarkSprintfIntInt-8                832           42  -94.90%
BenchmarkSprintfIntInt-16               817           24  -97.00%
BenchmarkSprintfIntInt-32               805           22  -97.17%

BenchmarkSprintfPrefixedInt             309          269  -12.94%
BenchmarkSprintfPrefixedInt-2           245          168  -31.43%
BenchmarkSprintfPrefixedInt-4           598           99  -83.36%
BenchmarkSprintfPrefixedInt-8           770           67  -91.23%
BenchmarkSprintfPrefixedInt-16          829           54  -93.49%
BenchmarkSprintfPrefixedInt-32          824           50  -93.83%

BenchmarkSprintfFloat                   418          398   -4.78%
BenchmarkSprintfFloat-2                 295          203  -31.19%
BenchmarkSprintfFloat-4                 585          128  -78.12%
BenchmarkSprintfFloat-8                 873           60  -93.13%
BenchmarkSprintfFloat-16                884           33  -96.24%
BenchmarkSprintfFloat-32                881           29  -96.62%

BenchmarkManyArgs                      1097         1069   -2.55%
BenchmarkManyArgs-2                     705          567  -19.57%
BenchmarkManyArgs-4                     792          319  -59.72%
BenchmarkManyArgs-8                     963          172  -82.14%
BenchmarkManyArgs-16                   1115          103  -90.76%
BenchmarkManyArgs-32                   1133           90  -92.03%

LGTM=rsc
R=golang-codereviews, bradfitz, minux.ma, gobot, rsc
CC=golang-codereviews
https://golang.org/cl/46010043
2014-01-24 22:29:53 +04:00
Dmitriy Vyukov 9fa9613e0b runtime: do not zero terminate strings
On top of "tiny allocator" (cl/38750047), reduces number of allocs by 1% on json.
No code must rely on zero termination. So will also make debugging simpler,
by uncovering issues earlier.

json-1
allocated                 7949686      7915766      -0.43%
allocs                      93778        92790      -1.05%
time                    100957795     97250949      -3.67%
rest of the metrics are too noisy.

LGTM=r
R=golang-codereviews, r, bradfitz, iant
CC=golang-codereviews
https://golang.org/cl/40370061
2014-01-24 22:29:01 +04:00
Russ Cox a81692e265 cmd/gc: add zeroing to enable precise stack accounting
There is more zeroing than I would like right now -
temporaries used for the new map and channel runtime
calls need to be eliminated - but it will do for now.

This CL only has an effect if you are building with

        GOEXPERIMENT=precisestack ./all.bash

(or make.bash). It costs about 5% in the overall time
spent in all.bash. That number will come down before
we make it on by default, but this should be enough for
Keith to try using the precise maps for copying stacks.

amd64 only (and it's not really great generated code).

TBR=khr, iant
CC=golang-codereviews
https://golang.org/cl/56430043
2014-01-23 23:11:04 -05:00
Russ Cox b377c9c6a9 liblink, runtime: fix cgo on arm
The addition of TLS to ARM rewrote the MRC instruction
differently depending on whether we were using internal
or external linking mode. That's clearly not okay, since we
don't know that during compilation, which is when we now
generate the code. Also, because the change did not introduce
a real MRC instruction but instead just macro-expanded it
in the assembler, liblink is rewriting a WORD instruction that
may actually be looking for that specific constant, which would
lead to very unexpected results. It was also using one value
that happened to be 8 where a different value that also
happened to be 8 belonged. So the code was correct for those
values but not correct in general, and very confusing.

Throw it all away.

Replace with the following. There is a linker-provided symbol
runtime.tlsgm with a value (address) set to the offset from the
hardware-provided TLS base register to the g and m storage.
Any reference to that name emits an appropriate TLS relocation
to be resolved by either the internal linker or the external linker,
depending on the link mode. The relocation has exactly the
semantics of the R_ARM_TLS_LE32 relocation, which is what
the external linker provides.

This symbol is only used in two routines, runtime.load_gm and
runtime.save_gm. In both cases it is now used like this:

        MRC		15, 0, R0, C13, C0, 3 // fetch TLS base pointer
        MOVW	$runtime·tlsgm(SB), R2
        ADD	R2, R0 // now R0 points at thread-local g+m storage

It is likely that this change breaks the generation of shared libraries
on ARM, because the MOVW needs to be rewritten to use the global
offset table and a different relocation type. But let's get the supported
functionality working again before we worry about unsupported
functionality.

LGTM=dave, iant
R=iant, dave
CC=golang-codereviews
https://golang.org/cl/56120043
2014-01-23 22:51:39 -05:00
Keith Randall be5d2d4432 runtime: Print elision message if we skipped frames on traceback.
Fixes bug 7180

R=golang-codereviews, dvyukov
CC=golang-codereviews, gri
https://golang.org/cl/55810044
2014-01-23 12:47:30 -08:00
Dmitriy Vyukov 8371b0142e undo CL 45770044 / d795425bfa18
Breaks darwin and freebsd.

««« original CL description
runtime: increase page size to 8K
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
Only Chromium uses 4K pages today (in "slow but small" configuration).
The general tendency is to increase page size, because it reduces
metadata size and DTLB pressure.
This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated                 8037492      8038689      +0.01%
allocs                     105762       105573      -0.18%
cputime                 158400000    155800000      -1.64%
gc-pause-one              4412234      4135702      -6.27%
gc-pause-total            2647340      2398707      -9.39%
rss                      54923264     54525952      -0.72%
sys-gc                    3952624      3928048      -0.62%
sys-heap                 46399488     46006272      -0.85%
sys-other                 5597504      5290304      -5.49%
sys-stack                  393216       393216      +0.00%
sys-total                56342832     55617840      -1.29%
time                    158478890    156046916      -1.53%
virtual-mem             256548864    256593920      +0.02%

garbage-1
allocated                 2991113      2986259      -0.16%
allocs                      62844        62652      -0.31%
cputime                  16330000     15860000      -2.88%
gc-pause-one            789108229    725555211      -8.05%
gc-pause-total            3945541      3627776      -8.05%
rss                    1143660544   1132253184      -1.00%
sys-gc                   65609600     65806208      +0.30%
sys-heap               1032388608   1035599872      +0.31%
sys-other                37501632     22777664     -39.26%
sys-stack                 8650752      8781824      +1.52%
sys-total              1144150592   1132965568      -0.98%
time                     16364602     15891994      -2.89%
virtual-mem            1327296512   1313746944      -1.02%

R=golang-codereviews, dave, khr, rsc, khr
CC=golang-codereviews
https://golang.org/cl/45770044
»»»

R=golang-codereviews
CC=golang-codereviews
https://golang.org/cl/56060043
2014-01-23 19:56:59 +04:00
Dmitriy Vyukov 6d603af6dc runtime: increase page size to 8K
Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages.
Only Chromium uses 4K pages today (in "slow but small" configuration).
The general tendency is to increase page size, because it reduces
metadata size and DTLB pressure.
This change reduces GC pause by ~10% and slightly improves other metrics.

json-1
allocated                 8037492      8038689      +0.01%
allocs                     105762       105573      -0.18%
cputime                 158400000    155800000      -1.64%
gc-pause-one              4412234      4135702      -6.27%
gc-pause-total            2647340      2398707      -9.39%
rss                      54923264     54525952      -0.72%
sys-gc                    3952624      3928048      -0.62%
sys-heap                 46399488     46006272      -0.85%
sys-other                 5597504      5290304      -5.49%
sys-stack                  393216       393216      +0.00%
sys-total                56342832     55617840      -1.29%
time                    158478890    156046916      -1.53%
virtual-mem             256548864    256593920      +0.02%

garbage-1
allocated                 2991113      2986259      -0.16%
allocs                      62844        62652      -0.31%
cputime                  16330000     15860000      -2.88%
gc-pause-one            789108229    725555211      -8.05%
gc-pause-total            3945541      3627776      -8.05%
rss                    1143660544   1132253184      -1.00%
sys-gc                   65609600     65806208      +0.30%
sys-heap               1032388608   1035599872      +0.31%
sys-other                37501632     22777664     -39.26%
sys-stack                 8650752      8781824      +1.52%
sys-total              1144150592   1132965568      -0.98%
time                     16364602     15891994      -2.89%
virtual-mem            1327296512   1313746944      -1.02%

R=golang-codereviews, dave, khr, rsc, khr
CC=golang-codereviews
https://golang.org/cl/45770044
2014-01-23 18:59:43 +04:00
Russ Cox f7245c0626 runtime: fix typo in ARM code
The typo was introduced by one of Dmitriy's CLs this morning.
The fix makes the ARM build compile again; it still won't pass
its tests, but one thing at a time.

TBR=dvyukov
CC=golang-codereviews
https://golang.org/cl/55770044
2014-01-22 16:39:39 -05:00
Dmitriy Vyukov b7b93a7154 runtime: fix code formatting
Place && at the end of line.
Offset expression continuation.

R=golang-codereviews, bradfitz
CC=golang-codereviews
https://golang.org/cl/55380044
2014-01-22 13:30:12 +04:00
Dmitriy Vyukov 9cbd2fb1aa runtime: remove locks from netpoll hotpaths
Introduces two-phase goroutine parking mechanism -- prepare to park, commit park.
This mechanism does not require backing mutex to protect wait predicate.
Use it in netpoll. See comment in netpoll.goc for details.
This slightly reduces contention between reader, writer and read/write io notifications;
and just eliminates a bunch of mutex operations from hotpaths, thus making then faster.

benchmark                             old ns/op    new ns/op    delta
BenchmarkTCP4ConcurrentReadWrite           2109         1945   -7.78%
BenchmarkTCP4ConcurrentReadWrite-2         1162         1113   -4.22%
BenchmarkTCP4ConcurrentReadWrite-4          798          755   -5.39%
BenchmarkTCP4ConcurrentReadWrite-8          803          748   -6.85%
BenchmarkTCP4Persistent                    9411         9240   -1.82%
BenchmarkTCP4Persistent-2                  5888         5813   -1.27%
BenchmarkTCP4Persistent-4                  4016         3968   -1.20%
BenchmarkTCP4Persistent-8                  3943         3857   -2.18%

R=golang-codereviews, mikioh.mikioh, gobot, iant, rsc
CC=golang-codereviews, khr
https://golang.org/cl/45700043
2014-01-22 11:27:16 +04:00
Dmitriy Vyukov cb86d86786 runtime/race: race instrument reads/writes in select cases
The new select tests currently fail (the race is not detected).

R=khr
CC=golang-codereviews
https://golang.org/cl/54220043
2014-01-22 10:36:17 +04:00
Dmitriy Vyukov 98b50b89a8 runtime: allocate goroutine ids in batches
Helps reduce contention on sched.goidgen.

benchmark                               old ns/op    new ns/op    delta
BenchmarkCreateGoroutines-16                  259          237   -8.49%
BenchmarkCreateGoroutinesParallel-16          127           43  -66.06%

R=golang-codereviews, dave, bradfitz, khr
CC=golang-codereviews, rsc
https://golang.org/cl/46970043
2014-01-22 10:34:36 +04:00
Dmitriy Vyukov 8a3c587dc1 runtime: fix and improve CPU profiling
- do not lose profiling signals when we have no mcache (possible for syscalls/cgo)
- do not lose any profiling signals on windows
- fix profiling of cgo programs on windows (they had no m->thread setup)
- properly setup tls in cgo programs on windows
- check _beginthread return value

Fixes #6417.
Fixes #6986.

R=alex.brainman, rsc
CC=golang-codereviews
https://golang.org/cl/44820047
2014-01-22 10:30:10 +04:00