mirror/go - go - Git Fam. Sieh

Commit Graph

Author	SHA1	Message	Date
Dmitriy Vyukov	e71d147750	runtime: fix mem profile when both large and small objects are allocated at the same stack Currently small and large (size>rate) objects are merged into a single entry. But rate adjusting is required only for small objects. As a result pprof either incorrectly adjusts large objects or does not adjust small objects. With this change objects of different sizes are stored in different buckets. LGTM=rsc R=golang-codereviews, gobot, rsc CC=golang-codereviews https://golang.org/cl/59220049	2014-02-14 13:20:41 +04:00
Dmitriy Vyukov	eca55f5ac0	runtime: fix windows cpu profiler Currently it periodically fails with the following message. The immediate cause is the wrong base register when obtaining g in sys_windows_amd64/386.s. But there are several secondary problems as well. runtime: unknown pc 0x0 after stack split panic: invalid memory address or nil pointer dereference fatal error: panic during malloc [signal 0xc0000005 code=0x0 addr=0x60 pc=0x42267a] runtime stack: runtime.panic(0x7914c0, 0xc862af) c:/src/perfer/work/windows-amd64-a15f344a9efa/go/src/pkg/runtime/panic.c:217 +0x2c runtime: unexpected return pc for runtime.externalthreadhandler called from 0x0 R=rsc, alex.brainman CC=golang-codereviews https://golang.org/cl/63310043	2014-02-14 09:20:51 +04:00
Russ Cox	e8ecd9f67a	runtime: update malloc comment for MSpan.needzero Missed this suggestion in CL 57680046. LGTM=iant R=iant CC=golang-codereviews https://golang.org/cl/63390043	2014-02-13 14:31:48 -05:00
Russ Cox	86e3cb8da5	runtime: introduce MSpan.needzero instead of writing to span data This cleans up the code significantly, and it avoids any possible problems with madvise zeroing out some but not all of the data. Fixes #6400. LGTM=dave R=dvyukov, dave CC=golang-codereviews https://golang.org/cl/57680046	2014-02-13 11:10:31 -05:00
Dmitriy Vyukov	f8e4a2ef94	runtime: fix concurrent GC sweep The issue was that one of the MSpan_Sweep callers was doing sweep with preemption enabled. Additional checks are added. LGTM=rsc R=rsc, dave CC=golang-codereviews https://golang.org/cl/62990043	2014-02-13 19:36:45 +04:00
Russ Cox	39067c79f3	runtime/pprof: fix arm build after CL 61270043 TBR=dvyukov CC=golang-codereviews https://golang.org/cl/62960043	2014-02-13 01:16:20 -05:00
Russ Cox	73a304356b	runtime: fix non-concurrent sweep State of the world: CL 46430043 introduced a new concurrent sweep but is broken. CL 62360043 made the new sweep non-concurrent to try to fix the world while we understand what's wrong with the concurrent version. This CL fixes the non-concurrent form to run finalizers. This CL is just a band-aid to get the build green again. Dmitriy is working on understanding and then fixing what's wrong with the concurrent sweep. TBR=dvyukov CC=golang-codereviews https://golang.org/cl/62370043	2014-02-12 15:54:21 -05:00
Dmitriy Vyukov	3cac829ff4	runtime: temporarily disable concurrent GC sweep We see failures on builders, e.g.: http://build.golang.org/log/70bb28cd6bcf8c4f49810a011bb4337a61977bf4 LGTM=rsc, dave R=rsc, dave CC=golang-codereviews https://golang.org/cl/62360043	2014-02-13 00:03:27 +04:00
Dmitriy Vyukov	bf0d71af29	runtime: more precise mprof sampling Better sampling of objects that are close in size to sampling rate. See the comment for details. LGTM=rsc R=golang-codereviews, rsc CC=golang-codereviews https://golang.org/cl/43830043	2014-02-12 22:36:45 +04:00
Dmitriy Vyukov	5e72fae9b2	runtime: improve cpu profiles for GC/syscalls/cgo Current "System->etext" is not very informative. Add parent "GC" frame. Replace un-unwindable syscall/cgo frames with Go stack that leads to the call. LGTM=rsc R=rsc, alex.brainman, ality CC=golang-codereviews https://golang.org/cl/61270043	2014-02-12 22:31:36 +04:00
Dmitriy Vyukov	2ea859a779	runtime: refactor level-triggered IO support Remove GOOS_solaris ifdef from netpoll code, instead introduce runtime edge/level triggered IO flag. Replace armread/armwrite with a single arm(mode) function, that's how all other interfaces look like and these functions will need to do roughly the same thing anyway. LGTM=rsc R=golang-codereviews, dave, rsc CC=golang-codereviews https://golang.org/cl/55500044	2014-02-12 22:24:29 +04:00
Dmitriy Vyukov	e1ee04828d	runtime: refactor chan code 1. Make internal chan functions static. 2. Move selgen local variable instead of a member of G struct. 3. Change "bool *pres/selected" parameter of chansend/chanrecv to "bool block", which is simpler, faster and less code. -37 lines total. LGTM=rsc R=golang-codereviews, dave, gobot, rsc CC=bradfitz, golang-codereviews, iant, khr https://golang.org/cl/58610043	2014-02-12 22:21:38 +04:00
Dmitriy Vyukov	3c3be62201	runtime: concurrent GC sweep Moves sweep phase out of stoptheworld by adding background sweeper goroutine and lazy on-demand sweeping. It turned out to be somewhat trickier than I expected, because there is no point in time when we know size of live heap nor consistent number of mallocs and frees. So everything related to next_gc, mprof, memstats, etc becomes trickier. At the end of GC next_gc is conservatively set to heap_alloc*GOGC, which is much larger than real value. But after every sweep next_gc is decremented by freed*GOGC. So when everything is swept next_gc becomes what it should be. For mprof I had to introduce 3-generation scheme (allocs, recent_allocs, prev_allocs), because by the end of GC we know number of frees for the previous GC. Significant caution is required to not cross yet-unknown real value of next_gc. This is achieved by 2 means: 1. Whenever I allocate a span from MCentral, I sweep a span in that MCentral. 2. Whenever I allocate N pages from MHeap, I sweep until at least N pages are returned to heap. This provides quite strong guarantees that heap does not grow when it should not. http-1 allocated 7036 7033 -0.04% allocs 60 60 +0.00% cputime 51050 46700 -8.52% gc-pause-one 34060569 1777993 -94.78% gc-pause-total 2554 133 -94.79% latency-50 178448 170926 -4.22% latency-95 284350 198294 -30.26% latency-99 345191 220652 -36.08% rss 101564416 101007360 -0.55% sys-gc 6606832 6541296 -0.99% sys-heap 88801280 87752704 -1.18% sys-other 7334208 7405928 +0.98% sys-stack 524288 524288 +0.00% sys-total 103266608 102224216 -1.01% time 50339 46533 -7.56% virtual-mem 292990976 293728256 +0.25% garbage-1 allocated 2983818 2990889 +0.24% allocs 62880 62902 +0.03% cputime 16480000 16190000 -1.76% gc-pause-one 828462467 487875135 -41.11% gc-pause-total 4142312 2439375 -41.11% rss 1151709184 1153712128 +0.17% sys-gc 66068352 66068352 +0.00% sys-heap 1039728640 1039728640 +0.00% sys-other 37776064 40770176 +7.93% sys-stack 8781824 8781824 +0.00% sys-total 1152354880 1155348992 +0.26% time 16496998 16199876 -1.80% virtual-mem 1409564672 1402281984 -0.52% LGTM=rsc R=golang-codereviews, sameer, rsc, iant, jeremyjackins, gobot CC=golang-codereviews, khr https://golang.org/cl/46430043	2014-02-12 22:16:42 +04:00
Dmitriy Vyukov	e5a4211b36	runtime: do not profile blocked netpoll on windows There is frequently a thread hanging on GQCS, currently it skews profiles towards netpoll, but it is not bad and is not consuming any resources. R=alex.brainman CC=golang-codereviews https://golang.org/cl/61560043	2014-02-11 13:41:46 +04:00
David du Colombier	120218afeb	runtime: homogenize panic strings on Plan 9 LGTM=rsc R=rsc CC=golang-codereviews https://golang.org/cl/61410046	2014-02-11 09:34:43 +01:00
David du Colombier	76dcb9b346	runtime: handle "sys: trap: divide error" note on Plan 9 Fixes #7286. LGTM=rsc R=rsc CC=golang-codereviews https://golang.org/cl/61410044	2014-02-10 21:47:52 +01:00
Dmitriy Vyukov	373e1e94d8	runtime: fix crash during cpu profiling mp->mcache can be concurrently modified by runtime·helpgc. In such case sigprof can remember mcache=nil, then helpgc sets it to non-nil, then sigprof restores it back to nil, GC crashes with nil mcache. R=rsc CC=golang-codereviews https://golang.org/cl/58860044	2014-02-10 20:24:47 +04:00
Dmitriy Vyukov	0229dc6dbe	runtime: do not cpu profile idle threads on windows Currently this leads to a significant skew towards 'etext' entry, since all idle threads are profiled every tick. Before: Total: 66608 samples 63188 94.9% 94.9% 63188 94.9% etext 278 0.4% 95.3% 278 0.4% sweepspan 216 0.3% 95.6% 448 0.7% runtime.mallocgc 122 0.2% 95.8% 122 0.2% scanblock 113 0.2% 96.0% 113 0.2% net/textproto.canonicalMIMEHeaderKey After: Total: 8008 samples 3949 49.3% 49.3% 3949 49.3% etext 231 2.9% 52.2% 231 2.9% scanblock 211 2.6% 54.8% 211 2.6% runtime.cas64 182 2.3% 57.1% 408 5.1% runtime.mallocgc 178 2.2% 59.3% 178 2.2% runtime.atomicload64 LGTM=alex.brainman R=golang-codereviews, alex.brainman CC=golang-codereviews https://golang.org/cl/61250043	2014-02-10 15:40:55 +04:00
Keith Randall	da7cf0ba5d	runtime: faster memclr on x86. Use explicit SSE writes instead of REP STOSQ. benchmark old ns/op new ns/op delta BenchmarkMemclr5 22 5 -73.62% BenchmarkMemclr16 27 5 -78.49% BenchmarkMemclr64 28 6 -76.43% BenchmarkMemclr256 34 8 -74.94% BenchmarkMemclr4096 112 84 -24.73% BenchmarkMemclr65536 1902 1920 +0.95% LGTM=dvyukov R=golang-codereviews, dvyukov CC=golang-codereviews https://golang.org/cl/60090044	2014-02-06 17:43:22 -08:00
Mikio Hara	61fe7d8308	runtime/cgo: fix build on freebsd/arm This CL is in preparation to make cgo work on freebsd/arm. LGTM=iant R=iant CC=golang-codereviews https://golang.org/cl/60500044	2014-02-07 10:22:34 +09:00
Mikio Hara	8e56eb8b57	runtime: fix build on freebsd/arm This CL is in preparation to make cgo work on freebsd/arm. How to generate defs-files on freebsd/arm in the bootstrapping phase: 1. run freebsd on appropriate arm-eabi platforms 2. both syscall z-files and runtime def-files in the current tree are broken about EABI padding, fix them by hand 3. run make.bash again to build $GOTOOLDIR/cgo 4. use $GOTOOLDIR/cgo directly LGTM=minux.ma, iant R=iant, minux.ma, dave CC=golang-codereviews https://golang.org/cl/59580045	2014-02-07 10:22:13 +09:00
Dmitriy Vyukov	3baceaa151	runtime: add more chan benchmarks Add benchmarks for: 1. non-blocking failing receive (polling of "stop" chan) 2. channel-based semaphore (gate pattern) 3. select-based producer/consumer (pass data through a channel, but also wait on "stop" and "timeout" channels) LGTM=r R=golang-codereviews, r CC=bradfitz, golang-codereviews, iant, khr https://golang.org/cl/59040043	2014-02-04 09:41:48 +04:00
Dmitriy Vyukov	1b2e435b15	runtime: fix typos in test I don't know what is n, but it exists somewhere there. LGTM=dave R=golang-codereviews, dave CC=golang-codereviews https://golang.org/cl/58710043	2014-01-31 18:09:53 +04:00
Dmitriy Vyukov	e48751e217	runtime: increase page size to 8K Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages. Only Chromium uses 4K pages today (in "slow but small" configuration). The general tendency is to increase page size, because it reduces metadata size and DTLB pressure. This change reduces GC pause by ~10% and slightly improves other metrics. json-1 allocated 8037492 8038689 +0.01% allocs 105762 105573 -0.18% cputime 158400000 155800000 -1.64% gc-pause-one 4412234 4135702 -6.27% gc-pause-total 2647340 2398707 -9.39% rss 54923264 54525952 -0.72% sys-gc 3952624 3928048 -0.62% sys-heap 46399488 46006272 -0.85% sys-other 5597504 5290304 -5.49% sys-stack 393216 393216 +0.00% sys-total 56342832 55617840 -1.29% time 158478890 156046916 -1.53% virtual-mem 256548864 256593920 +0.02% garbage-1 allocated 2991113 2986259 -0.16% allocs 62844 62652 -0.31% cputime 16330000 15860000 -2.88% gc-pause-one 789108229 725555211 -8.05% gc-pause-total 3945541 3627776 -8.05% rss 1143660544 1132253184 -1.00% sys-gc 65609600 65806208 +0.30% sys-heap 1032388608 1035599872 +0.31% sys-other 37501632 22777664 -39.26% sys-stack 8650752 8781824 +1.52% sys-total 1144150592 1132965568 -0.98% time 16364602 15891994 -2.89% virtual-mem 1327296512 1313746944 -1.02% This is the exact reincarnation of already LGTMed: https://golang.org/cl/45770044 which must not break darwin/freebsd after: https://golang.org/cl/56630043 TBR=iant LGTM=khr, iant R=iant, khr CC=golang-codereviews https://golang.org/cl/58230043	2014-01-30 13:28:19 +04:00
Dmitriy Vyukov	327e431057	runtime: prepare for 8K pages Ensure that heap is PageSize aligned. LGTM=iant R=iant, dave, gobot CC=golang-codereviews https://golang.org/cl/56630043	2014-01-29 18:18:46 +04:00
Dmitriy Vyukov	d62379eef5	runtime: more chan tests LGTM=bradfitz R=golang-codereviews, bradfitz CC=golang-codereviews https://golang.org/cl/57390043	2014-01-28 22:45:14 +04:00
Dmitriy Vyukov	d176e3d7c5	runtime: prefetch next block in mallocgc json-1 cputime 99600000 98600000 -1.00% time 100005493 98859693 -1.15% garbage-1 cputime 15760000 15440000 -2.03% time 15791759 15471701 -2.03% LGTM=khr R=golang-codereviews, gobot, khr, dave CC=bradfitz, golang-codereviews, iant https://golang.org/cl/57310043	2014-01-28 22:38:39 +04:00
Dmitriy Vyukov	d409e44cfb	runtime: fix buffer overflow in make(chan) On 32-bits one can arrange make(chan) params so that the chan buffer gives you access to whole memory. LGTM=r R=golang-codereviews, r CC=bradfitz, golang-codereviews, iant, khr https://golang.org/cl/50250045	2014-01-28 22:37:35 +04:00
Dmitriy Vyukov	ce884036d2	runtime: adjust malloc race instrumentation for tiny allocs Tiny alloc memory block is shared by different goroutines running on the same thread. We call racemalloc after enabling preemption in mallocgc, as the result another goroutine can act on not yet race-cleared tiny block. Call racemalloc before enabling preemption. Fixes #7224. LGTM=dave R=golang-codereviews, dave CC=golang-codereviews https://golang.org/cl/57730043	2014-01-28 22:34:32 +04:00
Vincent Vanackere	d7c14655a9	runtime/debug: fix incorrect Stack output if package path contains a dot Although debug.Stack is deprecated, it should still return the correct result. Output before this CL (using a trivial library in $GOPATH/test.com/a): /home/vince/src/test.com/a/lib.go:9 (0x42311e) com/a.ShowStack: os.Stdout.Write(debug.Stack()) Output with this CL applied: /home/vince/src/test.com/a/lib.go:9 (0x42311e) ShowStack: os.Stdout.Write(debug.Stack()) LGTM=iant R=golang-codereviews, iant CC=golang-codereviews https://golang.org/cl/57330043	2014-01-27 14:00:00 -08:00
Dmitriy Vyukov	86a3a54284	runtime: fix windows build Currently windows crashes because early allocs in schedinit try to allocate tiny memory blocks, but m->p is not yet setup. I've considered calling procresize(1) earlier in schedinit, but this refactoring is better and must fix the issue as well. Fixes #7218. R=golang-codereviews, r CC=golang-codereviews https://golang.org/cl/54570045	2014-01-28 00:26:56 +04:00
Dmitriy Vyukov	179d41fecc	runtime: tune P retake logic When GOMAXPROCS>1 the last P in syscall is never retaken (because there are already idle P's -- npidle>0). This prevents sysmon thread from sleeping. On a darwin machine the program from issue 6673 constantly consumes ~0.2% CPU. With this change it stably consumes 0.0% CPU. Fixes #6673. R=golang-codereviews, r CC=bradfitz, golang-codereviews, iant, khr https://golang.org/cl/56990045	2014-01-27 23:17:46 +04:00
Brad Fitzpatrick	a18f4ab569	all: use {bytes,strings}.NewReader instead of bytes.Buffers Use the smaller read-only bytes.NewReader/strings.NewReader instead of a bytes.Buffer when possible. LGTM=r R=golang-codereviews, r CC=golang-codereviews https://golang.org/cl/54660045	2014-01-27 11:05:01 -08:00
Dmitriy Vyukov	e1a91c5b89	runtime: fix buffer overflow in stringtoslicerune On 32-bits n*sizeof(r[0]) can overflow. Or it can become 1<<32-eps, and mallocgc will "successfully" allocate 0 pages for it, there are no checks downstream and MHeap_Grow just does: npage = (npage+15)&~15; ask = npage<<PageShift; LGTM=khr R=golang-codereviews, khr CC=golang-codereviews https://golang.org/cl/54760045	2014-01-27 20:29:21 +04:00
Dmitriy Vyukov	bace9523ee	runtime: smarter slice grow When growing slice take into account size of the allocated memory block. Also apply the same optimization to string->[]byte conversion. Fixes #6307. benchmark old ns/op new ns/op delta BenchmarkAppendGrowByte 4541036 4434108 -2.35% BenchmarkAppendGrowString 59885673 44813604 -25.17% LGTM=khr R=khr CC=golang-codereviews, iant, rsc https://golang.org/cl/53340044	2014-01-27 15:11:12 +04:00
Jeff Sickel	03e4f25849	runtime/pprof: plan9 fails the TestGoroutineSwitch, skip for now. LGTM=r R=golang-codereviews, 0intro, r CC=golang-codereviews https://golang.org/cl/55430043	2014-01-25 10:09:08 -08:00
Dmitriy Vyukov	1fa7029425	runtime: combine small NoScan allocations Combine NoScan allocations < 16 bytes into a single memory block. Reduces number of allocations on json/garbage benchmarks by 10+%. json-1 allocated 8039872 7949194 -1.13% allocs 105774 93776 -11.34% cputime 156200000 100700000 -35.53% gc-pause-one 4908873 3814853 -22.29% gc-pause-total 2748969 2899288 +5.47% rss 52674560 43560960 -17.30% sys-gc 3796976 3256304 -14.24% sys-heap 43843584 35192832 -19.73% sys-other 5589312 5310784 -4.98% sys-stack 393216 393216 +0.00% sys-total 53623088 44153136 -17.66% time 156193436 100886714 -35.41% virtual-mem 256548864 256540672 -0.00% garbage-1 allocated 2996885 2932982 -2.13% allocs 62904 55200 -12.25% cputime 17470000 17400000 -0.40% gc-pause-one 932757485 925806143 -0.75% gc-pause-total 4663787 4629030 -0.75% rss 1151074304 1133670400 -1.51% sys-gc 66068352 65085312 -1.49% sys-heap 1039728640 1024065536 -1.51% sys-other 38038208 37485248 -1.45% sys-stack 8650752 8781824 +1.52% sys-total 1152485952 1135417920 -1.48% time 17478088 17418005 -0.34% virtual-mem 1343709184 1324204032 -1.45% LGTM=iant, bradfitz R=golang-codereviews, dave, iant, rsc, bradfitz CC=golang-codereviews, khr https://golang.org/cl/38750047	2014-01-24 22:35:11 +04:00
Dmitriy Vyukov	f8e0057bb7	sync: scalable Pool Introduce fixed-size P-local caches. When local caches overflow/underflow a batch of items is transferred to/from global mutex-protected cache. benchmark old ns/op new ns/op delta BenchmarkPool 50554 22423 -55.65% BenchmarkPool-4 400359 5904 -98.53% BenchmarkPool-16 403311 1598 -99.60% BenchmarkPool-32 367310 1526 -99.58% BenchmarkPoolOverlflow 5214 3633 -30.32% BenchmarkPoolOverlflow-4 42663 9539 -77.64% BenchmarkPoolOverlflow-8 46919 11385 -75.73% BenchmarkPoolOverlflow-16 39454 13048 -66.93% BenchmarkSprintfEmpty 84 63 -25.68% BenchmarkSprintfEmpty-2 371 32 -91.13% BenchmarkSprintfEmpty-4 465 22 -95.25% BenchmarkSprintfEmpty-8 565 12 -97.77% BenchmarkSprintfEmpty-16 498 5 -98.87% BenchmarkSprintfEmpty-32 492 4 -99.04% BenchmarkSprintfString 259 229 -11.58% BenchmarkSprintfString-2 574 144 -74.91% BenchmarkSprintfString-4 651 77 -88.05% BenchmarkSprintfString-8 868 47 -94.48% BenchmarkSprintfString-16 825 33 -95.96% BenchmarkSprintfString-32 825 30 -96.28% BenchmarkSprintfInt 213 188 -11.74% BenchmarkSprintfInt-2 448 138 -69.20% BenchmarkSprintfInt-4 624 52 -91.63% BenchmarkSprintfInt-8 691 31 -95.43% BenchmarkSprintfInt-16 724 18 -97.46% BenchmarkSprintfInt-32 718 16 -97.70% BenchmarkSprintfIntInt 311 282 -9.32% BenchmarkSprintfIntInt-2 333 145 -56.46% BenchmarkSprintfIntInt-4 642 110 -82.87% BenchmarkSprintfIntInt-8 832 42 -94.90% BenchmarkSprintfIntInt-16 817 24 -97.00% BenchmarkSprintfIntInt-32 805 22 -97.17% BenchmarkSprintfPrefixedInt 309 269 -12.94% BenchmarkSprintfPrefixedInt-2 245 168 -31.43% BenchmarkSprintfPrefixedInt-4 598 99 -83.36% BenchmarkSprintfPrefixedInt-8 770 67 -91.23% BenchmarkSprintfPrefixedInt-16 829 54 -93.49% BenchmarkSprintfPrefixedInt-32 824 50 -93.83% BenchmarkSprintfFloat 418 398 -4.78% BenchmarkSprintfFloat-2 295 203 -31.19% BenchmarkSprintfFloat-4 585 128 -78.12% BenchmarkSprintfFloat-8 873 60 -93.13% BenchmarkSprintfFloat-16 884 33 -96.24% BenchmarkSprintfFloat-32 881 29 -96.62% BenchmarkManyArgs 1097 1069 -2.55% BenchmarkManyArgs-2 705 567 -19.57% BenchmarkManyArgs-4 792 319 -59.72% BenchmarkManyArgs-8 963 172 -82.14% BenchmarkManyArgs-16 1115 103 -90.76% BenchmarkManyArgs-32 1133 90 -92.03% LGTM=rsc R=golang-codereviews, bradfitz, minux.ma, gobot, rsc CC=golang-codereviews https://golang.org/cl/46010043	2014-01-24 22:29:53 +04:00
Dmitriy Vyukov	9fa9613e0b	runtime: do not zero terminate strings On top of "tiny allocator" (cl/38750047), reduces number of allocs by 1% on json. No code must rely on zero termination. So will also make debugging simpler, by uncovering issues earlier. json-1 allocated 7949686 7915766 -0.43% allocs 93778 92790 -1.05% time 100957795 97250949 -3.67% rest of the metrics are too noisy. LGTM=r R=golang-codereviews, r, bradfitz, iant CC=golang-codereviews https://golang.org/cl/40370061	2014-01-24 22:29:01 +04:00
Russ Cox	a81692e265	cmd/gc: add zeroing to enable precise stack accounting There is more zeroing than I would like right now - temporaries used for the new map and channel runtime calls need to be eliminated - but it will do for now. This CL only has an effect if you are building with GOEXPERIMENT=precisestack ./all.bash (or make.bash). It costs about 5% in the overall time spent in all.bash. That number will come down before we make it on by default, but this should be enough for Keith to try using the precise maps for copying stacks. amd64 only (and it's not really great generated code). TBR=khr, iant CC=golang-codereviews https://golang.org/cl/56430043	2014-01-23 23:11:04 -05:00
Russ Cox	b377c9c6a9	liblink, runtime: fix cgo on arm The addition of TLS to ARM rewrote the MRC instruction differently depending on whether we were using internal or external linking mode. That's clearly not okay, since we don't know that during compilation, which is when we now generate the code. Also, because the change did not introduce a real MRC instruction but instead just macro-expanded it in the assembler, liblink is rewriting a WORD instruction that may actually be looking for that specific constant, which would lead to very unexpected results. It was also using one value that happened to be 8 where a different value that also happened to be 8 belonged. So the code was correct for those values but not correct in general, and very confusing. Throw it all away. Replace with the following. There is a linker-provided symbol runtime.tlsgm with a value (address) set to the offset from the hardware-provided TLS base register to the g and m storage. Any reference to that name emits an appropriate TLS relocation to be resolved by either the internal linker or the external linker, depending on the link mode. The relocation has exactly the semantics of the R_ARM_TLS_LE32 relocation, which is what the external linker provides. This symbol is only used in two routines, runtime.load_gm and runtime.save_gm. In both cases it is now used like this: MRC 15, 0, R0, C13, C0, 3 // fetch TLS base pointer MOVW $runtime·tlsgm(SB), R2 ADD R2, R0 // now R0 points at thread-local g+m storage It is likely that this change breaks the generation of shared libraries on ARM, because the MOVW needs to be rewritten to use the global offset table and a different relocation type. But let's get the supported functionality working again before we worry about unsupported functionality. LGTM=dave, iant R=iant, dave CC=golang-codereviews https://golang.org/cl/56120043	2014-01-23 22:51:39 -05:00
Keith Randall	be5d2d4432	runtime: Print elision message if we skipped frames on traceback. Fixes bug 7180 R=golang-codereviews, dvyukov CC=golang-codereviews, gri https://golang.org/cl/55810044	2014-01-23 12:47:30 -08:00
Dmitriy Vyukov	8371b0142e	undo CL 45770044 / d795425bfa18 Breaks darwin and freebsd. ««« original CL description runtime: increase page size to 8K Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages. Only Chromium uses 4K pages today (in "slow but small" configuration). The general tendency is to increase page size, because it reduces metadata size and DTLB pressure. This change reduces GC pause by ~10% and slightly improves other metrics. json-1 allocated 8037492 8038689 +0.01% allocs 105762 105573 -0.18% cputime 158400000 155800000 -1.64% gc-pause-one 4412234 4135702 -6.27% gc-pause-total 2647340 2398707 -9.39% rss 54923264 54525952 -0.72% sys-gc 3952624 3928048 -0.62% sys-heap 46399488 46006272 -0.85% sys-other 5597504 5290304 -5.49% sys-stack 393216 393216 +0.00% sys-total 56342832 55617840 -1.29% time 158478890 156046916 -1.53% virtual-mem 256548864 256593920 +0.02% garbage-1 allocated 2991113 2986259 -0.16% allocs 62844 62652 -0.31% cputime 16330000 15860000 -2.88% gc-pause-one 789108229 725555211 -8.05% gc-pause-total 3945541 3627776 -8.05% rss 1143660544 1132253184 -1.00% sys-gc 65609600 65806208 +0.30% sys-heap 1032388608 1035599872 +0.31% sys-other 37501632 22777664 -39.26% sys-stack 8650752 8781824 +1.52% sys-total 1144150592 1132965568 -0.98% time 16364602 15891994 -2.89% virtual-mem 1327296512 1313746944 -1.02% R=golang-codereviews, dave, khr, rsc, khr CC=golang-codereviews https://golang.org/cl/45770044 »»» R=golang-codereviews CC=golang-codereviews https://golang.org/cl/56060043	2014-01-23 19:56:59 +04:00
Dmitriy Vyukov	6d603af6dc	runtime: increase page size to 8K Tcmalloc uses 8K, 32K and 64K pages, and in custom setups 256K pages. Only Chromium uses 4K pages today (in "slow but small" configuration). The general tendency is to increase page size, because it reduces metadata size and DTLB pressure. This change reduces GC pause by ~10% and slightly improves other metrics. json-1 allocated 8037492 8038689 +0.01% allocs 105762 105573 -0.18% cputime 158400000 155800000 -1.64% gc-pause-one 4412234 4135702 -6.27% gc-pause-total 2647340 2398707 -9.39% rss 54923264 54525952 -0.72% sys-gc 3952624 3928048 -0.62% sys-heap 46399488 46006272 -0.85% sys-other 5597504 5290304 -5.49% sys-stack 393216 393216 +0.00% sys-total 56342832 55617840 -1.29% time 158478890 156046916 -1.53% virtual-mem 256548864 256593920 +0.02% garbage-1 allocated 2991113 2986259 -0.16% allocs 62844 62652 -0.31% cputime 16330000 15860000 -2.88% gc-pause-one 789108229 725555211 -8.05% gc-pause-total 3945541 3627776 -8.05% rss 1143660544 1132253184 -1.00% sys-gc 65609600 65806208 +0.30% sys-heap 1032388608 1035599872 +0.31% sys-other 37501632 22777664 -39.26% sys-stack 8650752 8781824 +1.52% sys-total 1144150592 1132965568 -0.98% time 16364602 15891994 -2.89% virtual-mem 1327296512 1313746944 -1.02% R=golang-codereviews, dave, khr, rsc, khr CC=golang-codereviews https://golang.org/cl/45770044	2014-01-23 18:59:43 +04:00
Russ Cox	f7245c0626	runtime: fix typo in ARM code The typo was introduced by one of Dmitriy's CLs this morning. The fix makes the ARM build compile again; it still won't pass its tests, but one thing at a time. TBR=dvyukov CC=golang-codereviews https://golang.org/cl/55770044	2014-01-22 16:39:39 -05:00
Dmitriy Vyukov	b7b93a7154	runtime: fix code formatting Place && at the end of line. Offset expression continuation. R=golang-codereviews, bradfitz CC=golang-codereviews https://golang.org/cl/55380044	2014-01-22 13:30:12 +04:00
Dmitriy Vyukov	9cbd2fb1aa	runtime: remove locks from netpoll hotpaths Introduces two-phase goroutine parking mechanism -- prepare to park, commit park. This mechanism does not require backing mutex to protect wait predicate. Use it in netpoll. See comment in netpoll.goc for details. This slightly reduces contention between reader, writer and read/write io notifications; and just eliminates a bunch of mutex operations from hotpaths, thus making them faster. benchmark old ns/op new ns/op delta BenchmarkTCP4ConcurrentReadWrite 2109 1945 -7.78% BenchmarkTCP4ConcurrentReadWrite-2 1162 1113 -4.22% BenchmarkTCP4ConcurrentReadWrite-4 798 755 -5.39% BenchmarkTCP4ConcurrentReadWrite-8 803 748 -6.85% BenchmarkTCP4Persistent 9411 9240 -1.82% BenchmarkTCP4Persistent-2 5888 5813 -1.27% BenchmarkTCP4Persistent-4 4016 3968 -1.20% BenchmarkTCP4Persistent-8 3943 3857 -2.18% R=golang-codereviews, mikioh.mikioh, gobot, iant, rsc CC=golang-codereviews, khr https://golang.org/cl/45700043	2014-01-22 11:27:16 +04:00
Dmitriy Vyukov	cb86d86786	runtime/race: race instrument reads/writes in select cases The new select tests currently fail (the race is not detected). R=khr CC=golang-codereviews https://golang.org/cl/54220043	2014-01-22 10:36:17 +04:00
Dmitriy Vyukov	98b50b89a8	runtime: allocate goroutine ids in batches Helps reduce contention on sched.goidgen. benchmark old ns/op new ns/op delta BenchmarkCreateGoroutines-16 259 237 -8.49% BenchmarkCreateGoroutinesParallel-16 127 43 -66.06% R=golang-codereviews, dave, bradfitz, khr CC=golang-codereviews, rsc https://golang.org/cl/46970043	2014-01-22 10:34:36 +04:00
Dmitriy Vyukov	8a3c587dc1	runtime: fix and improve CPU profiling - do not lose profiling signals when we have no mcache (possible for syscalls/cgo) - do not lose any profiling signals on windows - fix profiling of cgo programs on windows (they had no m->thread setup) - properly setup tls in cgo programs on windows - check _beginthread return value Fixes #6417. Fixes #6986. R=alex.brainman, rsc CC=golang-codereviews https://golang.org/cl/44820047	2014-01-22 10:30:10 +04:00
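
Commit f8e0057bb7 above ("sync: scalable Pool") changes how sync.Pool caches items internally; callers see the same Get/Put API. As a rough illustration of the usage pattern that the Sprintf benchmarks exercise, here is a minimal sketch. The names bufPool and render are hypothetical and do not come from the commit, and the code shows only the public API, not the P-local caching described in the message.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool is a hypothetical pool of reusable buffers. New is called
// only when the pool has nothing cached for the caller.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render formats a greeting into a pooled buffer and returns a copy
// of the result as a string.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a pooled buffer may hold old data
	defer bufPool.Put(buf) // hand the buffer back for reuse
	fmt.Fprintf(buf, "hello, %s", name)
	return buf.String()
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = render(fmt.Sprint("worker-", i))
		}(i)
	}
	wg.Wait()
	fmt.Println(render("gopher")) // hello, gopher
}
```

Because Get may return either a fresh or a previously used value, the sketch resets the buffer before every use.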

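Commit 3baceaa151 above ("runtime: add more chan benchmarks") names two channel idioms: a non-blocking failing receive on a "stop" channel, and a channel-based semaphore. The sketch below shows what those patterns can look like in user code. The identifiers stopped and gate are hypothetical, and this is not the benchmark code from the commit.

```go
package main

import "fmt"

// stopped polls the stop channel without blocking: it reports true
// only if a value is ready or the channel has been closed.
func stopped(stop <-chan struct{}) bool {
	select {
	case <-stop:
		return true
	default:
		return false
	}
}

// gate is a counting semaphore built on a buffered channel; the
// channel capacity is the number of holders allowed at once.
type gate chan struct{}

func (g gate) enter() { g <- struct{}{} } // blocks when the gate is full
func (g gate) leave() { <-g }

func main() {
	stop := make(chan struct{})
	fmt.Println(stopped(stop)) // false
	close(stop)
	fmt.Println(stopped(stop)) // true

	g := make(gate, 2)
	g.enter()
	g.enter()
	fmt.Println(len(g)) // 2
	g.leave()
	fmt.Println(len(g)) // 1
}
```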

1742 Commits
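
Commits e1a91c5b89 ("fix buffer overflow in stringtoslicerune") and d409e44cfb ("fix buffer overflow in make(chan)") both concern a size computation of the form n*elemsize wrapping around on 32-bit targets. The sketch below illustrates that class of bug and one common guard, a division-based bound check done before the multiplication. The names allocSize32, maxMem32 and errTooLarge are hypothetical; this is not the runtime's actual code, which was written in C at the time.

```go
package main

import (
	"errors"
	"fmt"
)

// maxMem32 is an assumed upper bound for a 32-bit size.
const maxMem32 = 1<<32 - 1

var errTooLarge = errors.New("allocation size out of range")

// allocSize32 returns n*elemSize, rejecting requests whose product
// would not fit in 32 bits. Without the check, 1<<30 elements of
// 4 bytes each would wrap around to a size of 0.
func allocSize32(n, elemSize uint32) (uint32, error) {
	if elemSize != 0 && n > maxMem32/elemSize {
		return 0, errTooLarge
	}
	return n * elemSize, nil
}

func main() {
	fmt.Println(allocSize32(10, 4))    // 40 <nil>
	fmt.Println(allocSize32(1<<30, 4)) // 0 allocation size out of range
}
```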