mirror of https://github.com/golang/go.git
22 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
2a331ca8bb |
runtime: document relaxed access to arena_used
The unsynchronized accesses to mheap_.arena_used in the concurrent part of the garbage collector look like a problem waiting to happen. In fact, they are safe, but the reason is somewhat subtle and undocumented. This commit documents this reasoning. Related to issue #9984. Change-Id: Icdbf2329c1aa11dbe2396a71eb5fc2a85bd4afd5 Reviewed-on: https://go-review.googlesource.com/11254 Reviewed-by: Dmitry Vyukov <dvyukov@google.com> |
|
|
|
b0532a96a8 |
runtime: fix write-barrier-enabled phase list in gcmarkwb_m
Commit
|
|
|
|
faa7a7e8ae |
runtime: implement GC stack barriers
This commit implements stack barriers to minimize the amount of stack re-scanning that must be done during mark termination.

Currently the GC scans stacks of active goroutines twice during every GC cycle: once at the beginning during root discovery and once at the end during mark termination. The second scan happens while the world is stopped and guarantees that we've seen all of the roots (since there are no write barriers on writes to local stack variables). However, this means pause time is proportional to stack size. In particularly recursive programs, this can drive pause time up past our 10ms goal (e.g., it takes about 150ms to scan a 50MB heap).

Re-scanning the entire stack is rarely necessary, especially for large stacks, because usually most of the frames on the stack were not active between the first and second scans and hence any changes to these frames (via non-escaping pointers passed down the stack) were tracked by write barriers.

To efficiently track how far a stack has been unwound since the first scan (and, hence, how much needs to be re-scanned), this commit introduces stack barriers. During the first scan, at exponentially spaced points in each stack, the scan overwrites return PCs with the PC of the stack barrier function. When "returned" to, the stack barrier function records how far the stack has unwound and jumps to the original return PC for that point in the stack. Then the second scan only needs to proceed as far as the lowest barrier that hasn't been hit.

For deeply recursive programs, this substantially reduces mark termination time (and hence pause time). For the goscheme example linked in issue #10898, prior to this change, mark termination times were typically between 100 and 500ms; with this change, mark termination times are typically between 10 and 20ms. As a result of the reduced stack scanning work, this reduces overall execution time of the goscheme example by 20%.

Fixes #10898.

The effect of this on programs that are not deeply recursive is minimal:

name                   old time/op  new time/op  delta
BinaryTree17           3.16s ± 2%   3.26s ± 1%   +3.31% (p=0.000 n=19+19)
Fannkuch11             2.42s ± 1%   2.48s ± 1%   +2.24% (p=0.000 n=17+19)
FmtFprintfEmpty        50.0ns ± 3%  49.8ns ± 1%  ~ (p=0.534 n=20+19)
FmtFprintfString       173ns ± 0%   175ns ± 0%   +1.49% (p=0.000 n=16+19)
FmtFprintfInt          170ns ± 1%   175ns ± 1%   +2.97% (p=0.000 n=20+19)
FmtFprintfIntInt       288ns ± 0%   295ns ± 0%   +2.73% (p=0.000 n=16+19)
FmtFprintfPrefixedInt  242ns ± 1%   252ns ± 1%   +4.13% (p=0.000 n=18+18)
FmtFprintfFloat        324ns ± 0%   323ns ± 0%   -0.36% (p=0.000 n=20+19)
FmtManyArgs            1.14µs ± 0%  1.12µs ± 1%  -1.01% (p=0.000 n=18+19)
GobDecode              8.88ms ± 1%  8.87ms ± 0%  ~ (p=0.480 n=19+18)
GobEncode              6.80ms ± 1%  6.85ms ± 0%  +0.82% (p=0.000 n=20+18)
Gzip                   363ms ± 1%   363ms ± 1%   ~ (p=0.077 n=18+20)
Gunzip                 90.6ms ± 0%  90.0ms ± 1%  -0.71% (p=0.000 n=17+18)
HTTPClientServer       51.5µs ± 1%  50.8µs ± 1%  -1.32% (p=0.000 n=18+18)
JSONEncode             17.0ms ± 0%  17.1ms ± 0%  +0.40% (p=0.000 n=18+17)
JSONDecode             61.8ms ± 0%  63.8ms ± 1%  +3.11% (p=0.000 n=18+17)
Mandelbrot200          3.84ms ± 0%  3.84ms ± 1%  ~ (p=0.583 n=19+19)
GoParse                3.71ms ± 1%  3.72ms ± 1%  ~ (p=0.159 n=18+19)
RegexpMatchEasy0_32    100ns ± 0%   100ns ± 1%   -0.19% (p=0.033 n=17+19)
RegexpMatchEasy0_1K    342ns ± 1%   331ns ± 0%   -3.41% (p=0.000 n=19+19)
RegexpMatchEasy1_32    82.5ns ± 0%  81.7ns ± 0%  -0.98% (p=0.000 n=18+18)
RegexpMatchEasy1_1K    505ns ± 0%   494ns ± 1%   -2.16% (p=0.000 n=18+18)
RegexpMatchMedium_32   137ns ± 1%   137ns ± 1%   -0.24% (p=0.048 n=20+18)
RegexpMatchMedium_1K   41.6µs ± 0%  41.3µs ± 1%  -0.57% (p=0.004 n=18+20)
RegexpMatchHard_32     2.11µs ± 0%  2.11µs ± 1%  +0.20% (p=0.037 n=17+19)
RegexpMatchHard_1K     63.9µs ± 2%  63.3µs ± 0%  -0.99% (p=0.000 n=20+17)
Revcomp                560ms ± 1%   522ms ± 0%   -6.87% (p=0.000 n=18+16)
Template               75.0ms ± 0%  75.1ms ± 1%  +0.18% (p=0.013 n=18+19)
TimeParse              358ns ± 1%   364ns ± 0%   +1.74% (p=0.000 n=20+15)
TimeFormat             360ns ± 0%   372ns ± 0%   +3.55% (p=0.000 n=20+18)

Change-Id: If8a9bfae6c128d15a4f405e02bcfa50129df82a2
Reviewed-on: https://go-review.googlesource.com/10314
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org> |
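The exponential spacing described in the stack barriers message can be modeled in a few lines. This is only a sketch of the idea, not the runtime's code: `barrierOffsets`, `rescanLimit`, and the 1024-byte first offset are hypothetical names and values chosen for illustration, and the model works on byte offsets rather than real return PCs.

```go
package main

import "fmt"

// barrierOffsets returns the byte offsets, measured from the top of the
// stack at the time of the first scan, at which a barrier is installed.
// The spacing doubles each time. firstOffset is a made-up tuning value.
func barrierOffsets(stackSize, firstOffset int) []int {
	var offs []int
	if firstOffset <= 0 {
		return offs
	}
	for off := firstOffset; off < stackSize; off *= 2 {
		offs = append(offs, off)
	}
	return offs
}

// rescanLimit returns how deep (in bytes from the original top of stack)
// the second scan must go when the goroutine has returned through the
// first `hit` barriers: up to the nearest barrier still in place, or the
// whole stack if none is left.
func rescanLimit(offs []int, hit, stackSize int) int {
	if hit < len(offs) {
		return offs[hit]
	}
	return stackSize
}

func main() {
	const stackSize = 50 << 20 // 50MB, matching the figure in the message
	offs := barrierOffsets(stackSize, 1024)
	fmt.Println("barriers installed:", len(offs))
	for _, hit := range []int{0, 3, len(offs)} {
		fmt.Printf("barriers hit: %2d, re-scan limit: %d bytes\n",
			hit, rescanLimit(offs, hit, stackSize))
	}
}
```

Doubling the spacing keeps the number of barriers logarithmic in the stack size, while a goroutine that stays near the top of its stack leaves almost all barriers untouched and so needs only a small re-scan.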
|
|
|
001438bdfe |
runtime: fix callwritebarrier
Given a call frame F of size N where the return values start at offset R, callwritebarrier was instructing heapBitsBulkBarrier to scan the block of memory [F+R, F+R+N). It should only scan [F+R, F+N). The extra R bytes scanned might lead into the next allocated block in memory. Because the scan was consulting the heap bitmap for type information, scanning into the next block normally "just worked" in the sense of not crashing.

Scanning the extra R bytes of memory is a problem mainly because it causes the GC to consider pointers that might otherwise not be considered, leading it to retain objects that should actually be freed. This is very difficult to detect.

Luckily, juju turned up a case where the heap bitmap and the memory were out of sync for the block immediately after the call frame, so that heapBitsBulkBarrier saw an obvious non-pointer where it expected a pointer, causing a loud crash.

Why is there a non-pointer in memory that the heap bitmap records as a pointer? That is more difficult to answer. At least one way that it could happen is that allocations containing no pointers at all do not update the heap bitmap. So if heapBitsBulkBarrier walked out of the current object and into a no-pointer object and consulted those bitmap bits, it would be misled. This doesn't happen in general because all the paths to heapBitsBulkBarrier first check for the no-pointer case. This may or may not be what happened, but it's the only scenario I've been able to construct.

I tried for quite a while to write a simple test for this and could not. It does fix the juju crash, and it is clearly an improvement over the old code.

Fixes #10844.

Change-Id: I53982c93ef23ef93155c4086bbd95a4c4fdaac9a
Reviewed-on: https://go-review.googlesource.com/10317
Reviewed-by: Austin Clements <austin@google.com> |
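The two ranges in the callwritebarrier message are easy to compare with concrete numbers. The sketch below is not the runtime's code: `scanRange` and `buggyScanRange` are hypothetical helpers, and the frame address and sizes are invented for illustration.

```go
package main

import "fmt"

// scanRange returns the half-open range [F+R, F+N) that the message says
// should be scanned for a frame at address f of size n whose return
// values start at offset r.
func scanRange(f, n, r uintptr) (start, end uintptr) {
	return f + r, f + n
}

// buggyScanRange returns the range [F+R, F+R+N) that the message says
// was being scanned before the fix.
func buggyScanRange(f, n, r uintptr) (start, end uintptr) {
	return f + r, f + r + n
}

func main() {
	const F, N, R = 0x1000, 64, 24 // illustrative values only
	s, e := scanRange(F, N, R)
	bs, be := buggyScanRange(F, N, R)
	fmt.Printf("fixed: [%#x, %#x), %d bytes\n", s, e, e-s)
	fmt.Printf("buggy: [%#x, %#x), %d bytes\n", bs, be, be-bs)
	fmt.Printf("bytes scanned past the end of the frame: %d\n", be-e)
}
```

With these values the fixed range covers N-R bytes and ends exactly at the end of the frame, while the old range covers N bytes and runs R bytes past it.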
|
|
|
197aa9e64d |
runtime: remove unused quiesce code
This is dead code. If you want to quiesce the system the
preferred way is to use forEachP(func(*p){}).
Change-Id: Ic7677a5dd55e3639b99e78ddeb2c71dd1dd091fa
Reviewed-on: https://go-review.googlesource.com/10267
Reviewed-by: Austin Clements <austin@google.com>
|
|
|
|
497970f421 |
runtime: use memmove during slice append
The effect of this CL: name old mean new mean delta BinaryTree17 5.97s × (0.96,1.04) 5.95s × (0.98,1.02) ~ (p=0.697) Fannkuch11 4.39s × (1.00,1.01) 4.41s × (1.00,1.01) +0.52% (p=0.015) FmtFprintfEmpty 90.8ns × (0.97,1.05) 89.4ns × (0.94,1.13) ~ (p=0.571) FmtFprintfString 305ns × (0.99,1.01) 292ns × (0.98,1.05) -4.35% (p=0.000) FmtFprintfInt 278ns × (0.96,1.03) 279ns × (0.98,1.04) ~ (p=0.741) FmtFprintfIntInt 489ns × (0.99,1.02) 482ns × (0.98,1.03) -1.43% (p=0.024) FmtFprintfPrefixedInt 402ns × (0.98,1.02) 395ns × (0.98,1.03) -1.67% (p=0.014) FmtFprintfFloat 578ns × (1.00,1.00) 569ns × (0.99,1.01) -1.48% (p=0.000) FmtManyArgs 1.88µs × (0.99,1.01) 1.88µs × (1.00,1.01) ~ (p=0.055) GobDecode 15.3ms × (0.99,1.01) 15.2ms × (1.00,1.01) -0.61% (p=0.007) GobEncode 11.8ms × (0.98,1.05) 11.6ms × (0.99,1.01) ~ (p=0.075) Gzip 647ms × (0.99,1.01) 647ms × (1.00,1.00) ~ (p=0.790) Gunzip 143ms × (1.00,1.00) 142ms × (1.00,1.00) ~ (p=0.370) HTTPClientServer 91.2µs × (0.99,1.01) 91.7µs × (0.99,1.02) ~ (p=0.233) JSONEncode 31.5ms × (0.98,1.01) 31.8ms × (0.99,1.02) +1.09% (p=0.015) JSONDecode 110ms × (0.99,1.01) 110ms × (0.99,1.02) ~ (p=0.577) Mandelbrot200 6.00ms × (1.00,1.00) 6.02ms × (1.00,1.00) +0.24% (p=0.001) GoParse 6.68ms × (0.98,1.02) 6.61ms × (0.99,1.01) -1.10% (p=0.027) RegexpMatchEasy0_32 162ns × (1.00,1.00) 161ns × (1.00,1.01) -0.66% (p=0.001) RegexpMatchEasy0_1K 539ns × (1.00,1.00) 539ns × (0.99,1.01) ~ (p=0.509) RegexpMatchEasy1_32 140ns × (0.99,1.02) 139ns × (0.99,1.02) ~ (p=0.163) RegexpMatchEasy1_1K 886ns × (1.00,1.00) 887ns × (1.00,1.00) ~ (p=0.408) RegexpMatchMedium_32 252ns × (1.00,1.00) 255ns × (0.99,1.01) +1.01% (p=0.000) RegexpMatchMedium_1K 72.6µs × (1.00,1.00) 72.6µs × (1.00,1.00) ~ (p=0.176) RegexpMatchHard_32 3.84µs × (1.00,1.00) 3.84µs × (1.00,1.00) ~ (p=0.403) RegexpMatchHard_1K 117µs × (1.00,1.00) 117µs × (1.00,1.00) ~ (p=0.351) Revcomp 926ms × (0.99,1.01) 925ms × (0.99,1.01) ~ (p=0.541) Template 126ms × (0.99,1.02) 130ms × (0.99,1.01) +3.42% (p=0.000) 
TimeParse 632ns × (0.99,1.01) 626ns × (1.00,1.00) -0.88% (p=0.000) TimeFormat 658ns × (0.99,1.01) 662ns × (0.99,1.02) ~ (p=0.111) The effect of this CL combined with CL 9886: name old mean new mean delta BinaryTree17 5.90s × (0.98,1.03) 5.95s × (0.98,1.02) ~ (p=0.175) Fannkuch11 4.34s × (1.00,1.00) 4.41s × (1.00,1.01) +1.69% (p=0.000) FmtFprintfEmpty 87.3ns × (0.97,1.17) 89.4ns × (0.94,1.13) ~ (p=0.499) FmtFprintfString 288ns × (0.98,1.04) 292ns × (0.98,1.05) ~ (p=0.292) FmtFprintfInt 290ns × (0.98,1.05) 279ns × (0.98,1.04) -3.76% (p=0.001) FmtFprintfIntInt 493ns × (0.98,1.04) 482ns × (0.98,1.03) -2.27% (p=0.017) FmtFprintfPrefixedInt 399ns × (0.98,1.02) 395ns × (0.98,1.03) ~ (p=0.159) FmtFprintfFloat 569ns × (1.00,1.00) 569ns × (0.99,1.01) ~ (p=0.847) FmtManyArgs 1.90µs × (0.99,1.03) 1.88µs × (1.00,1.01) -1.14% (p=0.009) GobDecode 15.2ms × (1.00,1.01) 15.2ms × (1.00,1.01) ~ (p=0.170) GobEncode 11.8ms × (0.99,1.02) 11.6ms × (0.99,1.01) -1.47% (p=0.003) Gzip 649ms × (0.99,1.00) 647ms × (1.00,1.00) ~ (p=0.200) Gunzip 144ms × (0.99,1.01) 142ms × (1.00,1.00) -1.04% (p=0.000) HTTPClientServer 91.1µs × (0.98,1.03) 91.7µs × (0.99,1.02) ~ (p=0.345) JSONEncode 31.5ms × (0.99,1.01) 31.8ms × (0.99,1.02) +0.98% (p=0.021) JSONDecode 110ms × (1.00,1.01) 110ms × (0.99,1.02) ~ (p=0.259) Mandelbrot200 6.02ms × (1.00,1.01) 6.02ms × (1.00,1.00) ~ (p=0.500) GoParse 6.68ms × (1.00,1.01) 6.61ms × (0.99,1.01) -1.17% (p=0.001) RegexpMatchEasy0_32 161ns × (1.00,1.00) 161ns × (1.00,1.01) -0.39% (p=0.033) RegexpMatchEasy0_1K 539ns × (1.00,1.00) 539ns × (0.99,1.01) ~ (p=0.445) RegexpMatchEasy1_32 138ns × (1.00,1.01) 139ns × (0.99,1.02) ~ (p=0.281) RegexpMatchEasy1_1K 887ns × (1.00,1.01) 887ns × (1.00,1.00) ~ (p=0.610) RegexpMatchMedium_32 251ns × (1.00,1.02) 255ns × (0.99,1.01) +1.42% (p=0.000) RegexpMatchMedium_1K 72.7µs × (1.00,1.00) 72.6µs × (1.00,1.00) ~ (p=0.097) RegexpMatchHard_32 3.85µs × (1.00,1.00) 3.84µs × (1.00,1.00) -0.31% (p=0.000) RegexpMatchHard_1K 117µs × (1.00,1.00) 117µs × 
(1.00,1.00) ~ (p=0.704) Revcomp 923ms × (0.98,1.02) 925ms × (0.99,1.01) ~ (p=0.574) Template 126ms × (0.98,1.03) 130ms × (0.99,1.01) +3.28% (p=0.000) TimeParse 631ns × (0.99,1.02) 626ns × (1.00,1.00) ~ (p=0.053) TimeFormat 660ns × (0.99,1.01) 662ns × (0.99,1.02) ~ (p=0.398) Change-Id: I59c03d329fe7bc178a31477c6f1f01062b881041 Reviewed-on: https://go-review.googlesource.com/9993 Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
4212a3c3d9 |
runtime: use heap bitmap for typedmemmove
The current implementation of typedmemmove walks the ptrmask in the type to find out where pointers are. This led to turning off GC programs for the Go 1.5 dev cycle, so that there would always be a ptrmask. Instead of also interpreting the GC programs, interpret the heap bitmap, which we know must be available and up to date. (There is no point to write barriers when writing outside the heap.) This CL is only about correctness. The next CL will optimize the code. Change-Id: Id1305c7c071fd2734ab96634b0e1c745b23fa793 Reviewed-on: https://go-review.googlesource.com/9886 Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
1635ab7dfe |
runtime: remove wbshadow mode
The write barrier shadow heap was very useful for developing the write barriers initially, but it's no longer used, clunky, and dragging the rest of the implementation down. The gccheckmark mode will find bugs due to missed barriers when they result in missed marks; wbshadow mode found the missed barriers more aggressively, but it required an entire separate copy of the heap. The gccheckmark mode requires no extra memory, making it more useful in practice. Compared to previous CL: name old mean new mean delta BinaryTree17 5.91s × (0.96,1.06) 5.72s × (0.97,1.03) -3.12% (p=0.000) Fannkuch11 4.32s × (1.00,1.00) 4.36s × (1.00,1.00) +0.91% (p=0.000) FmtFprintfEmpty 89.0ns × (0.93,1.10) 86.6ns × (0.96,1.11) ~ (p=0.077) FmtFprintfString 298ns × (0.98,1.06) 283ns × (0.99,1.04) -4.90% (p=0.000) FmtFprintfInt 286ns × (0.98,1.03) 283ns × (0.98,1.04) -1.09% (p=0.032) FmtFprintfIntInt 498ns × (0.97,1.06) 480ns × (0.99,1.02) -3.65% (p=0.000) FmtFprintfPrefixedInt 408ns × (0.98,1.02) 396ns × (0.99,1.01) -3.00% (p=0.000) FmtFprintfFloat 587ns × (0.98,1.01) 562ns × (0.99,1.01) -4.34% (p=0.000) FmtManyArgs 1.94µs × (0.99,1.02) 1.89µs × (0.99,1.01) -2.85% (p=0.000) GobDecode 15.8ms × (0.98,1.03) 15.7ms × (0.99,1.02) ~ (p=0.251) GobEncode 12.0ms × (0.96,1.09) 11.8ms × (0.98,1.03) -1.87% (p=0.024) Gzip 648ms × (0.99,1.01) 647ms × (0.99,1.01) ~ (p=0.688) Gunzip 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.203) HTTPClientServer 90.3µs × (0.98,1.01) 89.1µs × (0.99,1.02) -1.30% (p=0.000) JSONEncode 31.6ms × (0.99,1.01) 31.7ms × (0.98,1.02) ~ (p=0.219) JSONDecode 107ms × (1.00,1.01) 111ms × (0.99,1.01) +3.58% (p=0.000) Mandelbrot200 6.03ms × (1.00,1.01) 6.01ms × (1.00,1.00) ~ (p=0.077) GoParse 6.53ms × (0.99,1.03) 6.54ms × (0.99,1.02) ~ (p=0.585) RegexpMatchEasy0_32 161ns × (1.00,1.01) 161ns × (0.98,1.05) ~ (p=0.948) RegexpMatchEasy0_1K 541ns × (0.99,1.01) 559ns × (0.98,1.01) +3.32% (p=0.000) RegexpMatchEasy1_32 138ns × (1.00,1.00) 137ns × (0.99,1.01) -0.55% (p=0.001) RegexpMatchEasy1_1K 
887ns × (0.99,1.01) 878ns × (0.99,1.01) -0.98% (p=0.000) RegexpMatchMedium_32 253ns × (0.99,1.01) 252ns × (0.99,1.01) -0.39% (p=0.001) RegexpMatchMedium_1K 72.8µs × (1.00,1.00) 72.7µs × (1.00,1.00) ~ (p=0.485) RegexpMatchHard_32 3.85µs × (1.00,1.01) 3.85µs × (1.00,1.01) ~ (p=0.283) RegexpMatchHard_1K 117µs × (1.00,1.01) 117µs × (1.00,1.00) ~ (p=0.175) Revcomp 922ms × (0.97,1.08) 903ms × (0.98,1.05) -2.15% (p=0.021) Template 126ms × (0.99,1.01) 126ms × (0.99,1.01) ~ (p=0.943) TimeParse 628ns × (0.99,1.01) 634ns × (0.99,1.01) +0.92% (p=0.000) TimeFormat 668ns × (0.99,1.01) 698ns × (0.98,1.03) +4.53% (p=0.000) It's nice that the microbenchmarks are the ones helped the most, because those were the ones hurt the most by the conversion from 4-bit to 2-bit heap bitmaps. This CL brings the overall effect of that process to (compared to CL 9706 patch set 1): name old mean new mean delta BinaryTree17 5.87s × (0.94,1.09) 5.72s × (0.97,1.03) -2.57% (p=0.011) Fannkuch11 4.32s × (1.00,1.00) 4.36s × (1.00,1.00) +0.87% (p=0.000) FmtFprintfEmpty 89.1ns × (0.95,1.16) 86.6ns × (0.96,1.11) ~ (p=0.090) FmtFprintfString 283ns × (0.98,1.02) 283ns × (0.99,1.04) ~ (p=0.681) FmtFprintfInt 284ns × (0.98,1.04) 283ns × (0.98,1.04) ~ (p=0.620) FmtFprintfIntInt 486ns × (0.98,1.03) 480ns × (0.99,1.02) -1.27% (p=0.002) FmtFprintfPrefixedInt 400ns × (0.99,1.02) 396ns × (0.99,1.01) -0.84% (p=0.001) FmtFprintfFloat 566ns × (0.99,1.01) 562ns × (0.99,1.01) -0.80% (p=0.000) FmtManyArgs 1.91µs × (0.99,1.02) 1.89µs × (0.99,1.01) -1.10% (p=0.000) GobDecode 15.5ms × (0.98,1.05) 15.7ms × (0.99,1.02) +1.55% (p=0.005) GobEncode 11.9ms × (0.97,1.03) 11.8ms × (0.98,1.03) -0.97% (p=0.048) Gzip 648ms × (0.99,1.01) 647ms × (0.99,1.01) ~ (p=0.627) Gunzip 143ms × (1.00,1.00) 143ms × (1.00,1.01) ~ (p=0.482) HTTPClientServer 89.2µs × (0.99,1.02) 89.1µs × (0.99,1.02) ~ (p=0.740) JSONEncode 32.3ms × (0.97,1.06) 31.7ms × (0.98,1.02) -1.95% (p=0.002) JSONDecode 106ms × (0.99,1.01) 111ms × (0.99,1.01) +4.22% (p=0.000) 
Mandelbrot200 6.02ms × (1.00,1.00) 6.01ms × (1.00,1.00) ~ (p=0.417) GoParse 6.57ms × (0.97,1.06) 6.54ms × (0.99,1.02) ~ (p=0.404) RegexpMatchEasy0_32 162ns × (1.00,1.00) 161ns × (0.98,1.05) ~ (p=0.088) RegexpMatchEasy0_1K 561ns × (0.99,1.02) 559ns × (0.98,1.01) -0.47% (p=0.034) RegexpMatchEasy1_32 145ns × (0.95,1.04) 137ns × (0.99,1.01) -5.56% (p=0.000) RegexpMatchEasy1_1K 864ns × (0.99,1.04) 878ns × (0.99,1.01) +1.57% (p=0.000) RegexpMatchMedium_32 255ns × (0.99,1.04) 252ns × (0.99,1.01) -1.43% (p=0.001) RegexpMatchMedium_1K 73.9µs × (0.98,1.04) 72.7µs × (1.00,1.00) -1.55% (p=0.004) RegexpMatchHard_32 3.92µs × (0.98,1.04) 3.85µs × (1.00,1.01) -1.80% (p=0.003) RegexpMatchHard_1K 120µs × (0.98,1.04) 117µs × (1.00,1.00) -2.13% (p=0.001) Revcomp 936ms × (0.95,1.08) 903ms × (0.98,1.05) -3.58% (p=0.002) Template 130ms × (0.98,1.04) 126ms × (0.99,1.01) -2.98% (p=0.000) TimeParse 638ns × (0.98,1.05) 634ns × (0.99,1.01) ~ (p=0.198) TimeFormat 674ns × (0.99,1.01) 698ns × (0.98,1.03) +3.69% (p=0.000) Change-Id: Ia0e9b50b1d75a3c0c7556184cd966305574fe07c Reviewed-on: https://go-review.googlesource.com/9706 Reviewed-by: Rick Hudson <rlh@golang.org> |
|
|
|
6d8a147bef |
runtime: use 1-bit pointer bitmaps in type representation
The type information in reflect.Type and the GC programs is now 1 bit per word, down from 2 bits. The in-memory unrolled type bitmap representation is now 1 bit per word, down from 4 bits. The conversion from the unrolled (now 1-bit) bitmap to the heap bitmap (still 4-bit) is not optimized. A followup CL will work on that, after the heap bitmap has been converted to 2-bit. The typeDead optimization, in which a special value denotes that there are no more pointers anywhere in the object, is lost in this CL. A followup CL will bring it back in the final form of heapBitsSetType. Change-Id: If61e67950c16a293b0b516a6fd9a1c755b6d5549 Reviewed-on: https://go-review.googlesource.com/9702 Reviewed-by: Austin Clements <austin@google.com> |
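A 1-bit-per-word pointer bitmap of the kind this message describes can be sketched as follows. The helper names `ptrMask` and `hasPointer` and the bit order (lowest bit first within each byte) are assumptions made for this example, not a statement about the runtime's actual layout.

```go
package main

import "fmt"

// ptrMask builds a bitmap with one bit per word of an object:
// bit i is set when word i holds a pointer.
func ptrMask(isPtr []bool) []byte {
	mask := make([]byte, (len(isPtr)+7)/8)
	for i, p := range isPtr {
		if p {
			mask[i/8] |= 1 << (uint(i) % 8)
		}
	}
	return mask
}

// hasPointer reports whether the given word is marked as a pointer.
func hasPointer(mask []byte, word int) bool {
	return mask[word/8]>>(uint(word)%8)&1 != 0
}

func main() {
	// A made-up 9-word object with pointers in words 0, 2, 3 and 8.
	layout := []bool{true, false, true, true, false, false, false, false, true}
	mask := ptrMask(layout)
	fmt.Printf("mask bytes: %08b\n", mask)
	for w := range layout {
		fmt.Printf("word %d pointer: %v\n", w, hasPointer(mask, w))
	}
}
```

At one bit per word, a 9-word object needs two mask bytes; a 2-bit or 4-bit encoding of the same object would need three or five.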
|
|
|
32d6fbcb4f |
runtime: replace needwb() with writeBarrierEnabled
Reduce the write barrier check to a single load and compare so that it can be inlined into write barrier use sites. Makes the standard write barrier a little faster too.

name                            old                  new                  delta
BenchmarkBinaryTree17           17.9s × (0.99,1.01)  17.9s × (1.00,1.01)  ~
BenchmarkFannkuch11             4.35s × (1.00,1.00)  4.43s × (1.00,1.00)  +1.81%
BenchmarkFmtFprintfEmpty        120ns × (0.93,1.06)  110ns × (1.00,1.06)  -7.92%
BenchmarkFmtFprintfString       479ns × (0.99,1.00)  487ns × (0.99,1.00)  +1.67%
BenchmarkFmtFprintfInt          452ns × (0.99,1.02)  450ns × (0.99,1.00)  ~
BenchmarkFmtFprintfIntInt       766ns × (0.99,1.01)  762ns × (1.00,1.00)  ~
BenchmarkFmtFprintfPrefixedInt  576ns × (0.98,1.01)  584ns × (0.99,1.01)  ~
BenchmarkFmtFprintfFloat        730ns × (1.00,1.01)  738ns × (1.00,1.00)  +1.16%
BenchmarkFmtManyArgs            2.84µs × (0.99,1.00)  2.80µs × (1.00,1.01)  -1.22%
BenchmarkGobDecode              39.3ms × (0.98,1.01)  39.0ms × (0.99,1.00)  ~
BenchmarkGobEncode              39.5ms × (0.99,1.01)  37.8ms × (0.98,1.01)  -4.33%
BenchmarkGzip                   663ms × (1.00,1.01)  661ms × (0.99,1.01)  ~
BenchmarkGunzip                 143ms × (1.00,1.00)  142ms × (1.00,1.00)  ~
BenchmarkHTTPClientServer       132µs × (0.99,1.01)  132µs × (0.99,1.01)  ~
BenchmarkJSONEncode             57.4ms × (0.99,1.01)  56.3ms × (0.99,1.01)  -1.96%
BenchmarkJSONDecode             139ms × (0.99,1.00)  138ms × (0.99,1.01)  ~
BenchmarkMandelbrot200          6.03ms × (1.00,1.00)  6.01ms × (1.00,1.00)  ~
BenchmarkGoParse                10.3ms × (0.89,1.14)  10.2ms × (0.87,1.05)  ~
BenchmarkRegexpMatchEasy0_32    209ns × (1.00,1.00)  208ns × (1.00,1.00)  ~
BenchmarkRegexpMatchEasy0_1K    591ns × (0.99,1.00)  588ns × (1.00,1.00)  ~
BenchmarkRegexpMatchEasy1_32    184ns × (0.99,1.02)  182ns × (0.99,1.01)  ~
BenchmarkRegexpMatchEasy1_1K    1.01µs × (1.00,1.00)  0.99µs × (1.00,1.01)  -2.33%
BenchmarkRegexpMatchMedium_32   330ns × (1.00,1.00)  323ns × (1.00,1.01)  -2.12%
BenchmarkRegexpMatchMedium_1K   92.6µs × (1.00,1.00)  89.9µs × (1.00,1.00)  -2.92%
BenchmarkRegexpMatchHard_32     4.80µs × (0.95,1.00)  4.72µs × (0.95,1.01)  ~
BenchmarkRegexpMatchHard_1K     136µs × (1.00,1.00)  133µs × (1.00,1.01)  -1.86%
BenchmarkRevcomp                900ms × (0.99,1.04)  900ms × (1.00,1.05)  ~
BenchmarkTemplate               172ms × (1.00,1.00)  168ms × (0.99,1.01)  -2.07%
BenchmarkTimeParse              637ns × (1.00,1.00)  637ns × (1.00,1.00)  ~
BenchmarkTimeFormat             744ns × (1.00,1.01)  738ns × (1.00,1.00)  -0.67%

Change-Id: I4ecc925805da1f5ee264377f1f7574f54ee575e7
Reviewed-on: https://go-review.googlesource.com/9321
Reviewed-by: Austin Clements <austin@google.com> |
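The "single load and compare" shape of the writeBarrierEnabled check can be illustrated in user-level Go. This is a toy model under stated assumptions: the `node` type, the `shaded` list, and `writePointer` are invented for the example, and the flag here is an ordinary package variable standing in for the runtime's.

```go
package main

import "fmt"

// writeBarrierEnabled stands in for the runtime flag named in the message.
// In this model it is a plain package-level bool.
var writeBarrierEnabled bool

type node struct{ next *node }

// shaded records the pointers the toy barrier has seen (hypothetical).
var shaded []*node

// shade is a placeholder for the collector's marking work.
func shade(p *node) {
	if p != nil {
		shaded = append(shaded, p)
	}
}

// writePointer stores ptr into *slot. The fast path is one load of the
// flag and one compare, cheap enough to inline at each write site.
func writePointer(slot **node, ptr *node) {
	if writeBarrierEnabled {
		shade(ptr)
	}
	*slot = ptr
}

func main() {
	a, b := &node{}, &node{}
	writePointer(&a.next, b) // barrier off: plain store
	fmt.Println("shaded with barrier off:", len(shaded))

	writeBarrierEnabled = true
	writePointer(&b.next, a) // barrier on: store plus shade
	fmt.Println("shaded with barrier on:", len(shaded))
}
```

Compared with calling a predicate function at every write site, reading one boolean keeps the common barrier-off case to a test and a branch.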
|
|
|
181e26b9fa |
runtime: replace func-based write barrier skipping with type-based
This CL revises CL 7504 to use explicitly uintptr types for the struct fields that are going to be updated sometimes without write barriers. The result is that the fields are now updated *always* without write barriers. This approach has two important properties: 1) Now the GC never looks at the field, so if the missing reference could cause a problem, it will do so all the time, not just when the write barrier is missed at just the right moment. 2) Now a write barrier never happens for the field, avoiding the (correct) detection of inconsistent write barriers when GODEBUG=wbshadow=1. Change-Id: Iebd3962c727c0046495cc08914a8dc0808460e0e Reviewed-on: https://go-review.googlesource.com/9019 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> |
|
|
|
ab4df700b8 |
runtime: merge slice and sliceStruct
By removing type slice, renaming type sliceStruct to type slice and whacking until it compiles. Has a pleasing net reduction of conversions. Fixes #10188 Change-Id: I77202b8df637185b632fd7875a1fdd8d52c7a83c Reviewed-on: https://go-review.googlesource.com/8770 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> |
|
|
|
a1f57598cc |
runtime, cmd/internal/ld: rename themoduledata to firstmoduledata
'themoduledata' doesn't really make sense now that we support multiple moduledata objects. Change-Id: I8263045d8f62a42cb523502b37289b0fba054f62 Reviewed-on: https://go-review.googlesource.com/8521 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> |
|
|
|
fae4a128cb |
runtime, reflect: support multiple moduledata objects
This changes all the places that consult themoduledata to consult a linked list of moduledata objects, as will be necessary for -linkshared to work. Obviously, as there is as yet no way of adding moduledata objects to this list, all this change achieves right now is wasting a few instructions here and there. Change-Id: I397af7f60d0849b76aaccedf72238fe664867051 Reviewed-on: https://go-review.googlesource.com/8231 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> |
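Consulting a linked list of moduledata objects instead of a single global amounts to replacing a direct field access with a short loop. The sketch below assumes a simplified struct: the `name` field, the address ranges, and `findModule` are hypothetical, and only the list-walking pattern is the point.

```go
package main

import "fmt"

// moduledata is a simplified stand-in for the runtime structure.
// The fields here are illustrative, not the real layout.
type moduledata struct {
	name        string  // hypothetical label for the example
	text, etext uintptr // half-open code address range [text, etext)
	next        *moduledata
}

// themoduledata is the head of the list, as named at this commit.
var themoduledata = moduledata{name: "main", text: 0x1000, etext: 0x2000}

// findModule walks the list and returns the module whose code range
// contains pc, or nil if no module claims it.
func findModule(pc uintptr) *moduledata {
	for md := &themoduledata; md != nil; md = md.next {
		if md.text <= pc && pc < md.etext {
			return md
		}
	}
	return nil
}

func main() {
	// Pretend a second module was linked in (made-up addresses).
	themoduledata.next = &moduledata{name: "shared", text: 0x8000, etext: 0x9000}

	for _, pc := range []uintptr{0x1234, 0x8800, 0x5000} {
		if md := findModule(pc); md != nil {
			fmt.Printf("pc %#x is in module %s\n", pc, md.name)
		} else {
			fmt.Printf("pc %#x is in no module\n", pc)
		}
	}
}
```

With only one element in the list, the loop runs once and returns the same answer as before, which is the "few instructions" of overhead the message mentions.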
|
|
|
67426a8a9e |
runtime, cmd/internal/ld: change runtime to use a single linker symbol
In preparation for being able to run a go program that has code in several objects, this changes from having several linker symbols used by the runtime into having one linker symbol that points at a structure containing the needed data. Multiple object support will construct a linked list of such structures. A follow up will initialize the slices in the themoduledata structure directly from the linker but I was aiming for a minimal diff for now. Change-Id: I613cce35309801cf265a1d5ae5aaca8d689c5cbf Reviewed-on: https://go-review.googlesource.com/7441 Reviewed-by: Ian Lance Taylor <iant@golang.org> |
|
|
|
41dbcc19ef |
runtime: Remove write barriers during STW.
The GC assumes that there will be no asynchronous write barriers when the world is stopped. This keeps the synchronization between write barriers and the GC simple. However, currently, there are a few places in runtime code where this assumption does not hold.

The GC stops the world by collecting all Ps, which stops all user Go code, but small parts of the runtime can run without a P. For example, the code that releases a P must still deschedule its G onto a runnable queue before stopping. Similarly, when a G returns from a long-running syscall, it must run code to reacquire a P. Currently, this code can contain write barriers. This can lead to the GC collecting reachable objects if something like the following sequence of events happens:

1. GC stops the world by collecting all Ps.
2. G #1 returns from a syscall (for example), tries to install a pointer to object X, and calls greyobject on X.
3. greyobject on G #1 marks X, but does not yet add it to a work buffer. At this point, X is effectively black, not grey, even though it may point to white objects.
4. GC reaches X through some other path and calls greyobject on X, but greyobject does nothing because X is already marked.
5. GC completes.
6. greyobject on G #1 adds X to a work buffer, but it's too late.
7. Objects that were reachable only through X are incorrectly collected.

To fix this, we check the invariant that no asynchronous write barriers happen when the world is stopped by checking that write barriers always have a P, and modify all currently known sources of these writes to disable the write barrier. In all modified cases this is safe because the object in question will always be reachable via some other path.

Some of the trace code was turned off, in particular the code that traces returning from a syscall. The GC assumes that as far as the heap is concerned the thread is stopped when it is in a syscall. Upon returning the trace code must not do any heap writes for the same reasons discussed above.

Fixes #10098
Fixes #9953
Fixes #9951
Fixes #9884
May relate to #9610 #9771

Change-Id: Ic2e70b7caffa053e56156838eb8d89503e3c0c8a
Reviewed-on: https://go-review.googlesource.com/7504
Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
2e7f0a00c3 |
runtime: fix comment
IRIW requires 4 threads: first writes x, second writes y, third reads x and y, fourth reads y and x. This is the Peterson/Dekker mutual exclusion algorithm, based on critical store-load sequences: http://en.wikipedia.org/wiki/Dekker's_algorithm http://en.wikipedia.org/wiki/Peterson%27s_algorithm Change-Id: I30a00865afbe895f7617feed4559018f81ff4528 Reviewed-on: https://go-review.googlesource.com/7561 Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Rick Hudson <rlh@golang.org> |
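The store-load pattern at the heart of Dekker's and Peterson's algorithms involves only two threads, each storing to its own variable and then loading the other's. The sketch below shows that pattern with `sync/atomic`; it is not the runtime code the comment belongs to, and `trial` is a name made up for the example. It relies on Go's atomics behaving as sequentially consistent, as the Go memory model documents.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// trial runs one round of the store-load pattern: each goroutine stores 1
// to its own flag and then loads the other goroutine's flag.
func trial() (r1, r2 int32) {
	var x, y int32
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		atomic.StoreInt32(&x, 1)
		r1 = atomic.LoadInt32(&y)
	}()
	go func() {
		defer wg.Done()
		atomic.StoreInt32(&y, 1)
		r2 = atomic.LoadInt32(&x)
	}()
	wg.Wait()
	return r1, r2
}

func main() {
	bothZero := 0
	for i := 0; i < 10000; i++ {
		if r1, r2 := trial(); r1 == 0 && r2 == 0 {
			bothZero++
		}
	}
	// With sequentially consistent atomics, at least one goroutine must
	// observe the other's store, so this count stays at zero.
	fmt.Println("rounds where both loads saw 0:", bothZero)
}
```

The outcome the pattern must rule out is both loads returning 0. If the store and the following load could be reordered, both goroutines could conclude the other had not announced itself, which is exactly the failure mutual exclusion has to prevent.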
|
|
|
ed8cc5cf9b |
runtime: fix race instrumentation of append
typedslicecopy is another write barrier that is not understood by racewalk. It seems quite complex to handle it in the compiler, so instead just instrument it in runtime. Update #9796 Change-Id: I0eb6abf3a2cd2491a338fab5f7da22f01bf7e89b Reviewed-on: https://go-review.googlesource.com/4370 Reviewed-by: Russ Cox <rsc@golang.org> |
|
|
|
c9321f3fb1 |
runtime: fix nosplit stack overflow
The overflow happens only with -gcflags="-N -l" and can be reproduced with:

$ go test -gcflags="-N -l" -a -run=none net

runtime.cgocall: nosplit stack overflow
504 assumed on entry to runtime.cgocall
480 after runtime.cgocall uses 24
472 on entry to runtime.cgocall_errno
408 after runtime.cgocall_errno uses 64
400 on entry to runtime.exitsyscall
288 after runtime.exitsyscall uses 112
280 on entry to runtime.exitsyscallfast
152 after runtime.exitsyscallfast uses 128
144 on entry to runtime.writebarrierptr
88 after runtime.writebarrierptr uses 56
80 on entry to runtime.writebarrierptr_nostore1
24 after runtime.writebarrierptr_nostore1 uses 56
16 on entry to runtime.acquirem
-24 after runtime.acquirem uses 40

Move closure creation into separate function so that frames of writebarrierptr_shadow and writebarrierptr_nostore1 are overlapped.

Fixes #9721

Change-Id: I40851f0786763ee964af34814edbc3e3d73cf4e7
Reviewed-on: https://go-review.googlesource.com/3418
Reviewed-by: Russ Cox <rsc@golang.org> |
|
|
|
d94192180f |
runtime: fix wbshadow mode
Half of the tests currently crash with GODEBUG=wbshadow. _PageSize is set to 8192, so data can be extended outside of the actually mapped region during rounding, which leads to a crash during the initial copy to the shadow. Use _PhysPageSize instead. Change-Id: Iaa89992bd57f86dafa16b092b53fdc0606213acb Reviewed-on: https://go-review.googlesource.com/3286 Reviewed-by: Russ Cox <rsc@golang.org> |
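The rounding problem behind the wbshadow fix can be shown with plain arithmetic. This sketch assumes a 4096-byte physical page (an assumption, since the message does not give the value) and uses a made-up end address; `roundUp` is a hypothetical helper, not a runtime function.

```go
package main

import "fmt"

// roundUp rounds addr up to the next multiple of pageSize,
// which must be a power of two.
func roundUp(addr, pageSize uintptr) uintptr {
	return (addr + pageSize - 1) &^ (pageSize - 1)
}

func main() {
	const (
		logicalPage  = 8192 // the _PageSize value stated in the message
		physicalPage = 4096 // assumed physical page size for this example
	)
	// Hypothetical end of a data region that the OS mapped in
	// physical-page units, so it ends on a 4096-byte boundary.
	end := uintptr(0x5000)

	fmt.Printf("rounded to physical page: %#x\n", roundUp(end, physicalPage))
	fmt.Printf("rounded to logical page:  %#x\n", roundUp(end, logicalPage))
}
```

Rounding to the physical page size leaves the end where the mapping ends, while rounding to 8192 moves it one physical page past the mapping, so a copy up to the rounded end would touch unmapped memory.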
|
|
|
3965d7508e |
runtime: factor out bitmap, finalizer code from malloc/mgc
The code in mfinal.go is moved from malloc*.go and mgc*.go and substantially unchanged. The code in mbitmap.go is also moved from those files, but cleaned up so that it can be called from those files (in most cases the code being moved was not already a standalone function). I also renamed the constants and wrote comments describing the format. The result is a significant cleanup and isolation of the bitmap code, but, roughly speaking, it should be treated and reviewed as new code. The other files changed only as much as necessary to support this code movement. This CL does NOT change the semantics of the heap or type bitmaps at all, although there are now some obvious opportunities to do so in followup CLs. Change-Id: I41b8d5de87ad1d3cd322709931ab25e659dbb21d Reviewed-on: https://go-review.googlesource.com/2991 Reviewed-by: Keith Randall <khr@golang.org> |
|
|
|
4d226dfee9 |
runtime: move write barrier code into mbarrier.go
I also added new comments at the top of mbarrier.go, but the rest of the code is just copy-and-paste. Change-Id: Iaeb2b12f8b1eaa33dbff5c2de676ca902bfddf2e Reviewed-on: https://go-review.googlesource.com/2990 Reviewed-by: Austin Clements <austin@google.com> |