mirror of https://github.com/golang/go.git
1183 Commits
232331f0c7
runtime: add blank assignment to defeat "declared but not used" error from go/types
gc should ideally consider this an error too; see golang/go#8560.
Change-Id: Ieee71c4ecaff493d7f83e15ba8c8a04ee90a4cf1
Reviewed-on: https://go-review.googlesource.com/10757
Reviewed-by: Robert Griesemer <gri@golang.org>
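As a hedged aside (illustrative, not code from the CL itself): the idiom the commit adds relies on the fact that assigning a variable to the blank identifier counts as a use, which satisfies the "declared but not used" check in go/types.

```go
package main

func compute() int { return 42 }

func main() {
	v := compute()
	// The blank assignment counts as a use of v, so the
	// "declared but not used" check in go/types (and gc) is satisfied
	// even though v is never otherwise read.
	_ = v
}
```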
7529314ed3
runtime: use correct SP when installing stack barriers
Currently the stack barriers are installed at the next frame boundary after gp.sched.sp + 1024*2^n for n=0,1,2,... However, when a G is in a system call, we set gp.sched.sp to 0, which causes stack barriers to be installed at *every* frame. This easily overflows the slice we've reserved for storing the stack barrier information, and causes a "slice bounds out of range" panic in gcInstallStackBarrier.
Fix this by using gp.syscallsp instead of gp.sched.sp if it's non-zero. This is the same logic that gentraceback uses to determine the current SP.
Fixes #11049.
Change-Id: Ie40eeee5bec59b7c1aa715a7c17aa63b1f1cf4e8
Reviewed-on: https://go-review.googlesource.com/10755
Reviewed-by: Russ Cox <rsc@golang.org>
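For illustration only, a sketch of the SP-selection rule the fix describes; the parameter names stand in for the runtime's gp.sched.sp and gp.syscallsp fields, and this is not the actual gcInstallStackBarrier code.

```go
package main

import "fmt"

// barrierBaseSP sketches the SP selection described above: while a goroutine
// is blocked in a system call, sched.sp is 0, so the SP recorded at syscall
// entry is used instead (the same choice gentraceback makes).
func barrierBaseSP(schedSP, syscallSP uintptr) uintptr {
	if syscallSP != 0 {
		return syscallSP // G is in a syscall; sched.sp is 0
	}
	return schedSP
}

func main() {
	fmt.Printf("%#x\n", barrierBaseSP(0, 0x7fff1000))  // G blocked in a syscall
	fmt.Printf("%#x\n", barrierBaseSP(0x7fff2000, 0))  // ordinary running G
}
```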
3ffcbb633e
runtime: default GOMAXPROCS to NumCPU(), not 1
See golang.org/s/go15gomaxprocs for details.
Change-Id: I8de5df34fa01d31d78f0194ec78a2474c281243c
Reviewed-on: https://go-review.googlesource.com/10668
Reviewed-by: Rob Pike <r@golang.org>
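A small, runnable illustration of the new default (assuming the Go 1.5 behavior this CL introduces): GOMAXPROCS now tracks runtime.NumCPU() unless the environment variable or an explicit call overrides it.

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// With this change, GOMAXPROCS defaults to the number of CPUs rather
	// than 1, unless GOMAXPROCS in the environment or an explicit call
	// overrides it.
	fmt.Println("NumCPU:    ", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) // 0 queries without changing it
}
```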
5353cde080
runtime, cmd/internal/obj/arm: improve arm function prologue
When stack growth is not needed, as it usually is not,
execute only a single conditional branch
rather than three conditional instructions.
This adds 4 bytes to every function,
but might speed up execution in the common case.
Sample disassembly for
func f() {
_ = [128]byte{}
}
Before:
TEXT main.f(SB) x.go
x.go:3 0x2000 e59a1008 MOVW 0x8(R10), R1
x.go:3 0x2004 e59fb028 MOVW 0x28(R15), R11
x.go:3 0x2008 e08d200b ADD R11, R13, R2
x.go:3 0x200c e1520001 CMP R1, R2
x.go:3 0x2010 91a0300e MOVW.LS R14, R3
x.go:3 0x2014 9b0118a9 BL.LS runtime.morestack_noctxt(SB)
x.go:3 0x2018 9afffff8 B.LS main.f(SB)
x.go:3 0x201c e52de084 MOVW.W R14, -0x84(R13)
x.go:4 0x2020 e28d1004 ADD $4, R13, R1
x.go:4 0x2024 e3a00000 MOVW $0, R0
x.go:4 0x2028 eb012255 BL 0x4a984
x.go:5 0x202c e49df084 RET #132
x.go:5 0x2030 eafffffe B 0x2030
x.go:5 0x2034 ffffff7c ?
After:
TEXT main.f(SB) x.go
x.go:3 0x2000 e59a1008 MOVW 0x8(R10), R1
x.go:3 0x2004 e59fb02c MOVW 0x2c(R15), R11
x.go:3 0x2008 e08d200b ADD R11, R13, R2
x.go:3 0x200c e1520001 CMP R1, R2
x.go:3 0x2010 9a000004 B.LS 0x2028
x.go:3 0x2014 e52de084 MOVW.W R14, -0x84(R13)
x.go:4 0x2018 e28d1004 ADD $4, R13, R1
x.go:4 0x201c e3a00000 MOVW $0, R0
x.go:4 0x2020 eb0124dc BL 0x4b398
x.go:5 0x2024 e49df084 RET #132
x.go:5 0x2028 e1a0300e MOVW R14, R3
x.go:5 0x202c eb011b0d BL runtime.morestack_noctxt(SB)
x.go:5 0x2030 eafffff2 B main.f(SB)
x.go:5 0x2034 eafffffe B 0x2034
x.go:5 0x2038 ffffff7c ?
Updates #10587.
package sort benchmarks on an iPhone 6:
name old time/op new time/op delta
SortString1K 569µs ± 0% 565µs ± 1% -0.75% (p=0.000 n=23+24)
StableString1K 872µs ± 1% 870µs ± 1% -0.16% (p=0.009 n=23+24)
SortInt1K 317µs ± 2% 316µs ± 2% ~ (p=0.410 n=26+26)
StableInt1K 343µs ± 1% 339µs ± 1% -1.07% (p=0.000 n=22+23)
SortInt64K 30.0ms ± 1% 30.0ms ± 1% ~ (p=0.091 n=25+24)
StableInt64K 30.2ms ± 0% 30.0ms ± 0% -0.69% (p=0.000 n=22+22)
Sort1e2 147µs ± 1% 146µs ± 0% -0.48% (p=0.000 n=25+24)
Stable1e2 290µs ± 1% 286µs ± 1% -1.30% (p=0.000 n=23+24)
Sort1e4 29.5ms ± 2% 29.7ms ± 1% +0.71% (p=0.000 n=23+23)
Stable1e4 88.7ms ± 4% 88.6ms ± 8% -0.07% (p=0.022 n=26+26)
Sort1e6 4.81s ± 7% 4.83s ± 7% ~ (p=0.192 n=26+26)
Stable1e6 18.3s ± 1% 18.1s ± 1% -0.76% (p=0.000 n=25+23)
SearchWrappers 318ns ± 1% 344ns ± 1% +8.14% (p=0.000 n=23+26)
package sort benchmarks on a first generation rpi:
name old time/op new time/op delta
SearchWrappers 4.13µs ± 0% 3.95µs ± 0% -4.42% (p=0.000 n=15+13)
SortString1K 5.81ms ± 1% 5.82ms ± 2% ~ (p=0.400 n=14+15)
StableString1K 9.69ms ± 1% 9.73ms ± 0% ~ (p=0.121 n=15+11)
SortInt1K 3.30ms ± 2% 3.66ms ±19% +10.82% (p=0.000 n=15+14)
StableInt1K 5.97ms ±15% 4.17ms ± 8% -30.05% (p=0.000 n=15+15)
SortInt64K 319ms ± 1% 295ms ± 1% -7.65% (p=0.000 n=15+15)
StableInt64K 343ms ± 0% 332ms ± 0% -3.26% (p=0.000 n=12+13)
Sort1e2 3.36ms ± 2% 3.22ms ± 4% -4.10% (p=0.000 n=15+15)
Stable1e2 6.74ms ± 1% 6.43ms ± 2% -4.67% (p=0.000 n=15+15)
Sort1e4 247ms ± 1% 247ms ± 1% ~ (p=0.331 n=15+14)
Stable1e4 864ms ± 0% 820ms ± 0% -5.15% (p=0.000 n=14+15)
Sort1e6 41.2s ± 0% 41.2s ± 0% +0.15% (p=0.000 n=13+14)
Stable1e6 192s ± 0% 182s ± 0% -5.07% (p=0.000 n=14+14)
Change-Id: I8a9db77e1d4ea1956575895893bc9d04bd81204b
Reviewed-on: https://go-review.googlesource.com/10497
Reviewed-by: Russ Cox <rsc@golang.org>
03410f6758
runtime: fix TestFixedGOROOT to properly restore the GOROOT env var after test
Otherwise subsequent tests won't see any modified GOROOT.
With this CL I can move my GOROOT, set GOROOT to the new location, and
the runtime tests pass. Previously the crash_tests would instead look
for the GOROOT baked into the binary, instead of the env var:
--- FAIL: TestGcSys (0.01s)
crash_test.go:92: building source: exit status 2
go: cannot find GOROOT directory: /home/bradfitz/go
--- FAIL: TestGCFairness (0.01s)
crash_test.go:92: building source: exit status 2
go: cannot find GOROOT directory: /home/bradfitz/go
--- FAIL: TestGdbPython (0.07s)
runtime-gdb_test.go:64: building source exit status 2
go: cannot find GOROOT directory: /home/bradfitz/go
--- FAIL: TestLargeStringConcat (0.01s)
crash_test.go:92: building source: exit status 2
go: cannot find GOROOT directory: /home/bradfitz/go
Update #10029
Change-Id: If91be0f04d3acdcf39a9e773a4e7905a446bc477
Reviewed-on: https://go-review.googlesource.com/10685
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
10083d8007
runtime: print start of GC cycle in gctrace, rather than end
Currently the GODEBUG=gctrace=1 trace line includes "@n.nnns" to indicate the time that the GC cycle ended relative to the time the program started. This was meant to be consistent with the utilization as of the end of the cycle, which is printed next on the trace line, but it winds up just being confusing and unexpected. Change the trace line to include the time that the GC cycle started relative to the time the program started.
Change-Id: I7d64580cd696eb17540716d3e8a74a9d6ae50650
Reviewed-on: https://go-review.googlesource.com/10634
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
faa7a7e8ae
runtime: implement GC stack barriers
This commit implements stack barriers to minimize the amount of stack re-scanning that must be done during mark termination. Currently the GC scans stacks of active goroutines twice during every GC cycle: once at the beginning during root discovery and once at the end during mark termination. The second scan happens while the world is stopped and guarantees that we've seen all of the roots (since there are no write barriers on writes to local stack variables). However, this means pause time is proportional to stack size. In particularly recursive programs, this can drive pause time up past our 10ms goal (e.g., it takes about 150ms to scan a 50MB heap). Re-scanning the entire stack is rarely necessary, especially for large stacks, because usually most of the frames on the stack were not active between the first and second scans and hence any changes to these frames (via non-escaping pointers passed down the stack) were tracked by write barriers. To efficiently track how far a stack has been unwound since the first scan (and, hence, how much needs to be re-scanned), this commit introduces stack barriers. During the first scan, at exponentially spaced points in each stack, the scan overwrites return PCs with the PC of the stack barrier function. When "returned" to, the stack barrier function records how far the stack has unwound and jumps to the original return PC for that point in the stack. Then the second scan only needs to proceed as far as the lowest barrier that hasn't been hit. For deeply recursive programs, this substantially reduces mark termination time (and hence pause time). For the goscheme example linked in issue #10898, prior to this change, mark termination times were typically between 100 and 500ms; with this change, mark termination times are typically between 10 and 20ms. As a result of the reduced stack scanning work, this reduces overall execution time of the goscheme example by 20%. Fixes #10898. 
The effect of this on programs that are not deeply recursive is minimal: name old time/op new time/op delta BinaryTree17 3.16s ± 2% 3.26s ± 1% +3.31% (p=0.000 n=19+19) Fannkuch11 2.42s ± 1% 2.48s ± 1% +2.24% (p=0.000 n=17+19) FmtFprintfEmpty 50.0ns ± 3% 49.8ns ± 1% ~ (p=0.534 n=20+19) FmtFprintfString 173ns ± 0% 175ns ± 0% +1.49% (p=0.000 n=16+19) FmtFprintfInt 170ns ± 1% 175ns ± 1% +2.97% (p=0.000 n=20+19) FmtFprintfIntInt 288ns ± 0% 295ns ± 0% +2.73% (p=0.000 n=16+19) FmtFprintfPrefixedInt 242ns ± 1% 252ns ± 1% +4.13% (p=0.000 n=18+18) FmtFprintfFloat 324ns ± 0% 323ns ± 0% -0.36% (p=0.000 n=20+19) FmtManyArgs 1.14µs ± 0% 1.12µs ± 1% -1.01% (p=0.000 n=18+19) GobDecode 8.88ms ± 1% 8.87ms ± 0% ~ (p=0.480 n=19+18) GobEncode 6.80ms ± 1% 6.85ms ± 0% +0.82% (p=0.000 n=20+18) Gzip 363ms ± 1% 363ms ± 1% ~ (p=0.077 n=18+20) Gunzip 90.6ms ± 0% 90.0ms ± 1% -0.71% (p=0.000 n=17+18) HTTPClientServer 51.5µs ± 1% 50.8µs ± 1% -1.32% (p=0.000 n=18+18) JSONEncode 17.0ms ± 0% 17.1ms ± 0% +0.40% (p=0.000 n=18+17) JSONDecode 61.8ms ± 0% 63.8ms ± 1% +3.11% (p=0.000 n=18+17) Mandelbrot200 3.84ms ± 0% 3.84ms ± 1% ~ (p=0.583 n=19+19) GoParse 3.71ms ± 1% 3.72ms ± 1% ~ (p=0.159 n=18+19) RegexpMatchEasy0_32 100ns ± 0% 100ns ± 1% -0.19% (p=0.033 n=17+19) RegexpMatchEasy0_1K 342ns ± 1% 331ns ± 0% -3.41% (p=0.000 n=19+19) RegexpMatchEasy1_32 82.5ns ± 0% 81.7ns ± 0% -0.98% (p=0.000 n=18+18) RegexpMatchEasy1_1K 505ns ± 0% 494ns ± 1% -2.16% (p=0.000 n=18+18) RegexpMatchMedium_32 137ns ± 1% 137ns ± 1% -0.24% (p=0.048 n=20+18) RegexpMatchMedium_1K 41.6µs ± 0% 41.3µs ± 1% -0.57% (p=0.004 n=18+20) RegexpMatchHard_32 2.11µs ± 0% 2.11µs ± 1% +0.20% (p=0.037 n=17+19) RegexpMatchHard_1K 63.9µs ± 2% 63.3µs ± 0% -0.99% (p=0.000 n=20+17) Revcomp 560ms ± 1% 522ms ± 0% -6.87% (p=0.000 n=18+16) Template 75.0ms ± 0% 75.1ms ± 1% +0.18% (p=0.013 n=18+19) TimeParse 358ns ± 1% 364ns ± 0% +1.74% (p=0.000 n=20+15) TimeFormat 360ns ± 0% 372ns ± 0% +3.55% (p=0.000 n=20+18) Change-Id: If8a9bfae6c128d15a4f405e02bcfa50129df82a2 Reviewed-on: https://go-review.googlesource.com/10314 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
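A rough sketch of the exponential spacing idea, using the 1024*2^n offsets mentioned in the "use correct SP" commit above; this is illustrative arithmetic, not the runtime's barrier-installation code.

```go
package main

import "fmt"

// barrierOffsets sketches the spacing described above: starting from the
// current SP, candidate stack-barrier positions sit at offsets of 1024*2^n
// bytes (n = 0, 1, 2, ...) through the used portion of the stack, so the
// number of barriers grows only logarithmically with stack depth.
func barrierOffsets(usedStackBytes uintptr) []uintptr {
	var offs []uintptr
	for off := uintptr(1024); off < usedStackBytes; off *= 2 {
		offs = append(offs, off)
	}
	return offs
}

func main() {
	// A goroutine using 1 MB of stack needs only ~10 barriers, which is what
	// bounds the re-scan work during mark termination.
	fmt.Println(barrierOffsets(1 << 20))
}
```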
724f8298a8
runtime: avoid double-scanning of stacks
Currently there's a race between stopg scanning another G's stack and the G reaching a preemption point and scanning its own stack. When this race occurs, the G's stack is scanned twice. Currently this is okay, so this race is benign. However, we will shortly be adding stack barriers during the first stack scan, so scanning will no longer be idempotent. To prepare for this, this change ensures that each stack is scanned only once during each GC phase by checking the flag that indicates that the stack has been scanned in this phase before scanning the stack.
Change-Id: Id9f4d5e2e5b839bc3f200ec1723a4a12dd677ab4
Reviewed-on: https://go-review.googlesource.com/10458
Reviewed-by: Rick Hudson <rlh@golang.org>
3f6e69aca5
runtime: steal space for stack barrier tracking from stack
The stack barrier code will need a bookkeeping structure to keep track of the overwritten return PCs. This commit introduces and allocates this structure, but does not yet use the structure.
We don't want to allocate space for this structure during garbage collection, so this commit allocates it along with the allocation of the corresponding stack. However, we can't do a regular allocation in newstack because mallocgc may itself grow the stack (which would lead to a recursive allocation). Hence, this commit makes the bookkeeping structure part of the stack allocation itself by stealing the necessary space from the top of the stack allocation. Since the size of this bookkeeping structure is logarithmic in the size of the stack, this has minimal impact on stack behavior.
Change-Id: Ia14408be06aafa9ca4867f4e70bddb3fe0e96665
Reviewed-on: https://go-review.googlesource.com/10313
Reviewed-by: Russ Cox <rsc@golang.org>
e610c25df0
runtime: decouple stack bounds and stack allocation size
Currently the runtime assumes that the allocation for the stack is exactly [stack.lo, stack.hi). We're about to steal a small part of this allocation for per-stack GC metadata. To prepare for this, this commit adds a field to the G for the allocated size of the stack. With this change, stack.lo and stack.hi continue to act as the true bounds on the stack, but are no longer also used as the bounds on the stack allocation.
(I also tried this the other way around, where stack.lo and stack.hi remained the allocation bounds and I introduced a new top of stack. However, there are far more places that assume stack.hi is the true top of the stack than there are places that assume it's the top of the allocation.)
Change-Id: Ifa9d956753be53d286d09cbc73d47fb34a18c0c6
Reviewed-on: https://go-review.googlesource.com/10312
Reviewed-by: Russ Cox <rsc@golang.org>
c02b8911d8
runtime: clean up signalstack API
Currently signalstack takes a lower limit and a length and all calls hard-code the passed length. Change the API to take a *stack and compute the lower limit and length from the passed stack. This will make it easier for the runtime to steal some space from the top of the stack since it eliminates the hard-coded stack sizes.
Change-Id: I7d2a9f45894b221f4e521628c2165530bbc57d53
Reviewed-on: https://go-review.googlesource.com/10311
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
cc6a7fce53
runtime: increase precision of gctrace times
Currently we truncate gctrace clock and CPU times to millisecond precision. As a result, many phases are typically printed as 0, which is fine for user consumption, but makes gathering statistics and reports over GC traces difficult. In 1.4, the gctrace line printed times in microseconds. This was better for statistics, but not as easy for users to read or interpret, and it generally made the trace lines longer.
This change strikes a balance between these extremes by printing milliseconds, but including the decimal part to two significant figures down to microsecond precision. This remains easy to read and interpret, but includes more precision when it's useful.
For example, where the code currently prints,
gc #29 @1.629s 0%: 0+2+0+12+0 ms clock, 0+2+0+0/12/0+0 ms cpu, 4->4->2 MB, 4 MB goal, 1 P
this prints,
gc #29 @1.629s 0%: 0.005+2.1+0+12+0.29 ms clock, 0.005+2.1+0+0/12/0+0.29 ms cpu, 4->4->2 MB, 4 MB goal, 1 P
Fixes #10970.
Change-Id: I249624779433927cd8b0947b986df9060c289075
Reviewed-on: https://go-review.googlesource.com/10554
Reviewed-by: Russ Cox <rsc@golang.org>
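An illustrative formatter (not the runtime's print code) that reproduces the style shown in the example line: milliseconds with roughly two significant figures of fractional precision, floored at microsecond resolution.

```go
package main

import (
	"fmt"
	"time"
)

// fmtMS renders a duration in milliseconds in the style of the gctrace
// example above (0.005+2.1+0+12+0.29 ms): about two significant figures of
// fractional precision, but never finer than a microsecond.
func fmtMS(d time.Duration) string {
	ms := float64(d) / float64(time.Millisecond)
	switch {
	case ms == 0:
		return "0"
	case ms >= 10:
		return fmt.Sprintf("%.0f", ms)
	case ms >= 1:
		return fmt.Sprintf("%.1f", ms)
	case ms >= 0.1:
		return fmt.Sprintf("%.2f", ms)
	default:
		return fmt.Sprintf("%.3f", ms) // microsecond floor
	}
}

func main() {
	for _, d := range []time.Duration{
		5 * time.Microsecond, 2100 * time.Microsecond, 0,
		12 * time.Millisecond, 290 * time.Microsecond,
	} {
		fmt.Print(fmtMS(d), " ")
	}
	fmt.Println() // prints: 0.005 2.1 0 12 0.29
}
```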
1fa0a8cec5
runtime: fix data race in BenchmarkChanPopular
Fixes #11014.
Change-Id: I9a18dacd10564d3eaa1fea4d77f1a48e08e79f53
Reviewed-on: https://go-review.googlesource.com/10563
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
df2809f04e
runtime: document that runtime.GC() blocks until GC is complete
runtime.GC() is intentionally very weakly specified. However, it is so weakly specified that it's difficult to know that it's being used correctly for its one intended use case: to ensure garbage collection has run in a test that is garbage-sensitive. In particular, it is unclear whether it is synchronous or asynchronous. In the old STW collector this was essentially self-evident; short of queuing up a garbage collection to run later, it had to be synchronous. However, with the concurrent collector, there's evidence that people are inferring that it may be asynchronous (e.g., issue #10986), as this is both unclear in the documentation and possible in the implementation.
In fact, runtime.GC() runs a fully synchronous STW collection. We probably don't want to commit to this exact behavior. But we can commit to the essential property that tests rely on: that runtime.GC() does not return until the GC has finished.
Change-Id: Ifc3045a505e1898ecdbe32c1f7e80e2e9ffacb5b
Reviewed-on: https://go-review.googlesource.com/10488
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
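A minimal sketch of the intended use case, assuming only the documented guarantee that runtime.GC() returns after the collection has completed.

```go
package main

import (
	"fmt"
	"runtime"
)

func allocateGarbage() {
	buf := make([]byte, 16<<20)
	_ = buf // the only reference is dropped when this function returns
}

func main() {
	allocateGarbage()

	// Per the documentation change above, runtime.GC() does not return until
	// the collection has finished, so stats read afterwards reflect a
	// completed cycle.
	runtime.GC()

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("heap in use after GC: %d bytes\n", ms.HeapInuse)
}
```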
f2c3957ed8
runtime: disable GC around TestGoroutineParallelism
TestGoroutineParallelism can deadlock if the GC runs during the test. Currently it tries to prevent this by forcing a GC before the test, but this is best effort and fails completely if GOGC is very low for testing. This change replaces this best-effort fix with simply setting GOGC to off for the duration of the test.
Change-Id: I8229310833f241b149ebcd32845870c1cb14e9f8
Reviewed-on: https://go-review.googlesource.com/10454
Reviewed-by: Russ Cox <rsc@golang.org>
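The test uses runtime-internal GOGC plumbing; the equivalent user-facing knob is runtime/debug.SetGCPercent, sketched here.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// A negative percent disables the collector, the same effect as GOGC=off;
	// SetGCPercent returns the previous setting so it can be restored.
	old := debug.SetGCPercent(-1)
	defer debug.SetGCPercent(old)

	fmt.Println("GC disabled; previous GOGC was", old)
	// ... GC-sensitive work runs here without a collection starting ...
}
```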
4a1957d0aa
runtime: use stripped test environment for TestGdbPython
Most runtime tests that invoke the compiler to build a sub-test binary do so with a special environment constructed by testEnv that strips out environment variables that should apply to the test but not to the build. Fix TestGdbPython to use this test environment when invoking go build, like other tests do.
Change-Id: Iafdf89d4765c587cbebc427a5d61cb8a7e71b326
Reviewed-on: https://go-review.googlesource.com/10455
Reviewed-by: Russ Cox <rsc@golang.org>
8017ace496
runtime: don't always block all signals on OpenBSD
Implement the changes from CL 10173 on OpenBSD.
Change-Id: I2db1cd8141fd392a34753a1b8113e2e0401173b9
Reviewed-on: https://go-review.googlesource.com/10342
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
84cfba17c2
runtime: don't always unblock all signals
Ian proposed an improved way of handling signals masks in Go, motivated by a problem where the Android java runtime expects certain signals to be blocked for all JVM threads. Discussion here https://groups.google.com/forum/#!topic/golang-dev/_TSCkQHJt6g Ian's text is used in the following: A Go program always needs to have the synchronous signals enabled. These are the signals for which _SigPanic is set in sigtable, namely SIGSEGV, SIGBUS, SIGFPE. A Go program that uses the os/signal package, and calls signal.Notify, needs to have at least one thread which is not blocking that signal, but it doesn't matter much which one. Unix programs do not change signal mask across execve. They inherit signal masks across fork. The shell uses this fact to some extent; for example, the job control signals (SIGTTIN, SIGTTOU, SIGTSTP) are blocked for commands run due to backquote quoting or $(). Our current position on signal masks was not thought out. We wandered into step by step, e.g., http://golang.org/cl/7323067 . This CL does the following: Introduce a new platform hook, msigsave, that saves the signal mask of the current thread to m.sigsave. Call msigsave from needm and newm. In minit grab set up the signal mask from m.sigsave and unblock the essential synchronous signals, and SIGILL, SIGTRAP, SIGPROF, SIGSTKFLT (for systems that have it). In unminit, restore the signal mask from m.sigsave. The first time that os/signal.Notify is called, start a new thread whose only purpose is to update its signal mask to make sure signals for signal.Notify are unblocked on at least one thread. The effect on Go programs will be that if they are invoked with some non-synchronous signals blocked, those signals will normally be ignored. Previously, those signals would mostly be ignored. A change in behaviour will occur for programs started with any of these signals blocked, if they receive the signal: SIGHUP, SIGINT, SIGQUIT, SIGABRT, SIGTERM. Previously those signals would always cause a crash (unless using the os/signal package); with this change, they will be ignored if the program is started with the signal blocked (and does not use the os/signal package). ./all.bash completes successfully on linux/amd64. OpenBSD is missing the implementation. Change-Id: I188098ba7eb85eae4c14861269cc466f2aa40e8c Reviewed-on: https://go-review.googlesource.com/10173 Reviewed-by: Ian Lance Taylor <iant@golang.org> |
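For reference, the user-facing side of this behavior: per the commit, the first call to os/signal.Notify is what starts the dedicated thread with the requested signals unblocked. A minimal example:

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	c := make(chan os.Signal, 1)
	// The first Notify call guarantees at least one thread has the requested
	// signals unblocked, even if the process was started with them blocked.
	signal.Notify(c, syscall.SIGINT, syscall.SIGTERM)

	fmt.Println("waiting for SIGINT or SIGTERM")
	fmt.Println("got:", <-c)
}
```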
001438bdfe
runtime: fix callwritebarrier
Given a call frame F of size N where the return values start at offset R, callwritebarrier was instructing heapBitsBulkBarrier to scan the block of memory [F+R, F+R+N). It should only scan [F+R, F+N). The extra N-R bytes scanned might lead into the next allocated block in memory. Because the scan was consulting the heap bitmap for type information, scanning into the next block normally "just worked" in the sense of not crashing. Scanning the extra N-R bytes of memory is a problem mainly because it causes the GC to consider pointers that might otherwise not be considered, leading it to retain objects that should actually be freed. This is very difficult to detect. Luckily, juju turned up a case where the heap bitmap and the memory were out of sync for the block immediately after the call frame, so that heapBitsBulkBarrier saw an obvious non-pointer where it expected a pointer, causing a loud crash. Why is there a non-pointer in memory that the heap bitmap records as a pointer? That is more difficult to answer. At least one way that it could happen is that allocations containing no pointers at all do not update the heap bitmap. So if heapBitsBulkBarrier walked out of the current object and into a no-pointer object and consulted those bitmap bits, it would be misled. This doesn't happen in general because all the paths to heapBitsBulkBarrier first check for the no-pointer case. This may or may not be what happened, but it's the only scenario I've been able to construct. I tried for quite a while to write a simple test for this and could not. It does fix the juju crash, and it is clearly an improvement over the old code. Fixes #10844. Change-Id: I53982c93ef23ef93155c4086bbd95a4c4fdaac9a Reviewed-on: https://go-review.googlesource.com/10317 Reviewed-by: Austin Clements <austin@google.com> |
a5c3bbe0b4
runtime: eliminate write barrier from adjustpointers
Currently adjustpointers invokes a write barrier for every stack slot it updates. This is safe---the write barrier always does nothing because the new value is never a heap pointer---but it's unnecessary overhead in performance and complexity. Fix this by rewriting adjustpointers to work with *uintptrs instead of *unsafe.Pointers. As an added bonus, this makes the code cleaner. name old mean new mean delta BinaryTree17 3.35s × (0.98,1.01) 3.33s × (0.99,1.02) ~ (p=0.095 n=20+19) Fannkuch11 2.49s × (1.00,1.01) 2.52s × (0.99,1.01) +1.23% (p=0.000 n=19+20) FmtFprintfEmpty 52.2ns × (0.99,1.02) 52.2ns × (0.99,1.02) ~ (p=0.766 n=19+19) FmtFprintfString 181ns × (0.99,1.02) 179ns × (0.99,1.01) -1.06% (p=0.000 n=20+19) FmtFprintfInt 177ns × (0.99,1.01) 173ns × (0.99,1.02) -2.26% (p=0.000 n=17+20) FmtFprintfIntInt 300ns × (0.99,1.01) 302ns × (0.99,1.01) +0.76% (p=0.000 n=19+20) FmtFprintfPrefixedInt 253ns × (0.99,1.02) 256ns × (0.99,1.01) +0.96% (p=0.000 n=20+19) FmtFprintfFloat 334ns × (0.99,1.02) 334ns × (1.00,1.01) ~ (p=0.243 n=20+19) FmtManyArgs 1.16µs × (0.99,1.01) 1.17µs × (0.99,1.02) +0.88% (p=0.000 n=20+20) GobDecode 9.16ms × (0.99,1.02) 9.18ms × (1.00,1.00) +0.21% (p=0.048 n=20+17) GobEncode 7.03ms × (0.99,1.01) 7.05ms × (0.99,1.01) ~ (p=0.091 n=19+19) Gzip 374ms × (0.99,1.01) 372ms × (0.99,1.02) -0.50% (p=0.008 n=18+20) Gunzip 92.9ms × (0.99,1.01) 92.5ms × (1.00,1.01) -0.47% (p=0.002 n=19+19) HTTPClientServer 53.1µs × (0.98,1.01) 52.5µs × (0.99,1.01) -0.98% (p=0.000 n=20+19) JSONEncode 17.4ms × (0.99,1.02) 17.5ms × (0.99,1.01) ~ (p=0.061 n=19+20) JSONDecode 66.0ms × (0.99,1.02) 64.7ms × (0.99,1.01) -1.87% (p=0.000 n=20+20) Mandelbrot200 3.94ms × (1.00,1.01) 3.95ms × (1.00,1.01) ~ (p=0.799 n=18+19) GoParse 3.89ms × (0.99,1.02) 3.86ms × (0.99,1.01) -0.70% (p=0.016 n=20+19) RegexpMatchEasy0_32 102ns × (0.99,1.02) 102ns × (1.00,1.01) ~ (p=0.557 n=20+18) RegexpMatchEasy0_1K 353ns × (0.99,1.02) 341ns × (0.99,1.01) -3.38% (p=0.000 n=20+20) RegexpMatchEasy1_32 85.0ns × (0.99,1.02) 85.0ns × (0.99,1.01) ~ (p=0.851 n=19+20) RegexpMatchEasy1_1K 521ns × (0.99,1.02) 506ns × (1.00,1.01) -2.85% (p=0.000 n=20+18) RegexpMatchMedium_32 142ns × (0.99,1.02) 141ns × (1.00,1.01) -1.17% (p=0.000 n=20+19) RegexpMatchMedium_1K 42.8µs × (0.99,1.01) 42.3µs × (0.99,1.01) -1.07% (p=0.000 n=20+19) RegexpMatchHard_32 2.17µs × (0.99,1.01) 2.16µs × (1.00,1.01) -0.51% (p=0.042 n=20+18) RegexpMatchHard_1K 65.6µs × (0.99,1.01) 64.8µs × (1.00,1.00) -1.21% (p=0.000 n=20+17) Revcomp 581ms × (0.99,1.04) 536ms × (1.00,1.01) -7.71% (p=0.000 n=20+18) Template 77.2ms × (0.99,1.01) 76.8ms × (0.99,1.01) ~ (p=0.426 n=20+18) TimeParse 369ns × (0.99,1.02) 371ns × (1.00,1.01) ~ (p=0.117 n=20+19) TimeFormat 371ns × (0.99,1.02) 391ns × (0.99,1.01) +5.33% (p=0.000 n=20+19) Change-Id: I5b952ba577ac4365c8c87db837c5804a1e30b7be Reviewed-on: https://go-review.googlesource.com/10293 Reviewed-by: Russ Cox <rsc@golang.org> |
5b66e5d0d8
runtime: turn work buffer tracing off by default
During development we ran with monitoring code turned on by default. This CL turns the work buffer monitoring off. Performance change on most go1 benchmarks is small or insignificant. name old mean new mean delta BinaryTree17 3.35s × (0.99,1.01) 3.35s × (0.99,1.01) ~ (p=0.841 n=5+5) Fannkuch11 2.59s × (1.00,1.01) 2.55s × (1.00,1.00) -1.65% (p=0.008 n=5+5) FmtFprintfEmpty 52.5ns × (0.99,1.02) 53.2ns × (0.98,1.01) ~ (p=0.063 n=5+5) FmtFprintfString 181ns × (1.00,1.00) 180ns × (1.00,1.00) -0.55% (p=0.029 n=4+4) FmtFprintfInt 176ns × (1.00,1.01) 174ns × (1.00,1.00) -0.91% (p=0.000 n=5+4) FmtFprintfIntInt 298ns × (1.00,1.00) 299ns × (1.00,1.00) ~ (p=0.143 n=4+4) FmtFprintfPrefixedInt 250ns × (1.00,1.01) 246ns × (1.00,1.00) -1.68% (p=0.000 n=5+4) FmtFprintfFloat 340ns × (1.00,1.00) 340ns × (1.00,1.01) ~ (p=0.643 n=5+5) FmtManyArgs 1.16µs × (1.00,1.00) 1.15µs × (1.00,1.00) -0.47% (p=0.016 n=5+5) GobDecode 9.22ms × (1.00,1.00) 9.23ms × (1.00,1.00) ~ (p=0.841 n=5+5) GobEncode 7.00ms × (1.00,1.01) 7.09ms × (0.99,1.01) +1.26% (p=0.016 n=5+5) Gzip 387ms × (1.00,1.00) 389ms × (0.99,1.02) ~ (p=1.000 n=5+5) Gunzip 97.8ms × (1.00,1.00) 98.3ms × (1.00,1.00) +0.51% (p=0.016 n=5+4) HTTPClientServer 52.6µs × (1.00,1.01) 52.7µs × (1.00,1.01) ~ (p=1.000 n=5+5) JSONEncode 18.0ms × (0.99,1.02) 17.9ms × (1.00,1.00) ~ (p=0.310 n=5+5) JSONDecode 64.8ms × (0.99,1.02) 63.6ms × (1.00,1.00) -1.94% (p=0.008 n=5+5) Mandelbrot200 4.05ms × (1.00,1.00) 4.05ms × (1.00,1.00) ~ (p=0.421 n=5+5) GoParse 3.86ms × (1.00,1.01) 3.84ms × (0.99,1.01) ~ (p=0.421 n=5+5) RegexpMatchEasy0_32 101ns × (1.00,1.00) 102ns × (0.99,1.02) ~ (p=0.238 n=4+5) RegexpMatchEasy0_1K 346ns × (1.00,1.01) 345ns × (1.00,1.00) ~ (p=0.333 n=5+4) RegexpMatchEasy1_32 87.3ns × (0.99,1.02) 87.4ns × (1.00,1.00) ~ (p=0.190 n=5+4) RegexpMatchEasy1_1K 520ns × (1.00,1.00) 520ns × (1.00,1.01) ~ (p=1.000 n=4+5) RegexpMatchMedium_32 143ns × (1.00,1.00) 142ns × (1.00,1.00) -0.70% (p=0.029 n=4+4) RegexpMatchMedium_1K 43.2µs × (1.00,1.01) 43.2µs × (1.00,1.00) ~ (p=0.841 n=5+5) RegexpMatchHard_32 2.24µs × (1.00,1.01) 2.23µs × (1.00,1.01) -0.63% (p=0.048 n=5+5) RegexpMatchHard_1K 68.7µs × (1.00,1.00) 68.3µs × (1.00,1.00) -0.56% (p=0.008 n=5+5) Revcomp 577ms × (1.00,1.01) 579ms × (1.00,1.00) ~ (p=0.151 n=5+5) Template 74.9ms × (1.00,1.00) 76.5ms × (1.00,1.00) +2.11% (p=0.008 n=5+5) TimeParse 359ns × (1.00,1.00) 362ns × (1.00,1.00) +0.72% (p=0.008 n=5+5) TimeFormat 369ns × (1.00,1.00) 371ns × (1.00,1.01) ~ (p=0.071 n=5+5) Change-Id: I4206a3f77a3d1450966b7a62ea7597aec44cb72f Reviewed-on: https://go-review.googlesource.com/10294 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Austin Clements <austin@google.com> |
719efc70eb
runtime: make runtime.callers walk calling G, not g0
Currently runtime.callers invokes gentraceback with the pc and sp of the G it is called from, but always passes g0 even if it was called from a regular g. Right now this has no ill effects because runtime.callers does not use either callback argument or the _TraceJumpStack flag, but it makes the code fragile and will break some upcoming changes. Fix this by lifting the getg() call outside of the systemstack in runtime.callers.
Change-Id: I4e1e927961c0e0cd4dcf28693be47df7bae9e122
Reviewed-on: https://go-review.googlesource.com/10292
Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
197aa9e64d
runtime: remove unused quiesce code
This is dead code. If you want to quiesce the system the
preferred way is to use forEachP(func(*p){}).
Change-Id: Ic7677a5dd55e3639b99e78ddeb2c71dd1dd091fa
Reviewed-on: https://go-review.googlesource.com/10267
Reviewed-by: Austin Clements <austin@google.com>
913db7685e
runtime: run background mark helpers only if work is available
Prior to this CL, whenever GC marking was enabled and a P was looking for work, we supplied a G to help the GC do its marking tasks. Once this G finished all the marking available, it would release the P to find another available G. In the case where there was no work, the P would drop into findrunnable, which would execute the mark helper G, which would immediately return, and the P would drop into findrunnable again, repeating the process. Since the P was always given a G to run, it never blocked. This CL first checks if the GC mark helper G has available work and, if not, the P immediately falls through to its blocking logic.
Fixes #10901
Change-Id: I94ac9646866ba64b7892af358888bc9950de23b5
Reviewed-on: https://go-review.googlesource.com/10189
Reviewed-by: Austin Clements <austin@google.com>
f4d51eb2f5
runtime: minor clean up to heapminimum
Currently setGCPercent sets heapminimum to heapminimum*GOGC/100. The real intent is to set heapminimum to a scaled multiple of a fixed default heap minimum, not to scale heapminimum based on its current value. This turns out to be okay because setGCPercent is only called once and heapminimum is initially set to this default heap minimum. However, the code as written is confusing, especially since setGCPercent is otherwise written so it could be called again to change GOGC.
Fix this by introducing a defaultHeapMinimum constant and using this instead of the current value of heapminimum to compute the scaled heap minimum. As part of this, this commit improves the documentation on heapminimum.
Change-Id: I4eb82c73dc2eb44a6e5a17c780a747a2e73d7493
Reviewed-on: https://go-review.googlesource.com/10181
Reviewed-by: Russ Cox <rsc@golang.org>
8903b3db0e
runtime: add fast check for self-loop pointer in scanobject
Addresses a problem reported on the mailing list. This will come up mainly in programs custom allocators that batch allocations, but it still helps in our programs, which mainly do not have such allocations. name old mean new mean delta BinaryTree17 5.95s × (0.97,1.03) 5.93s × (0.97,1.04) ~ (p=0.613) Fannkuch11 4.46s × (0.98,1.04) 4.33s × (0.99,1.01) -2.93% (p=0.000) FmtFprintfEmpty 86.6ns × (0.98,1.03) 86.8ns × (0.98,1.02) ~ (p=0.523) FmtFprintfString 290ns × (0.98,1.05) 287ns × (0.98,1.03) ~ (p=0.061) FmtFprintfInt 271ns × (0.98,1.04) 286ns × (0.99,1.01) +5.54% (p=0.000) FmtFprintfIntInt 495ns × (0.98,1.04) 489ns × (0.99,1.01) -1.24% (p=0.015) FmtFprintfPrefixedInt 391ns × (0.99,1.02) 407ns × (0.99,1.01) +4.00% (p=0.000) FmtFprintfFloat 578ns × (0.99,1.01) 559ns × (0.99,1.01) -3.35% (p=0.000) FmtManyArgs 1.96µs × (0.98,1.05) 1.94µs × (0.99,1.01) -1.33% (p=0.030) GobDecode 15.9ms × (0.97,1.05) 15.7ms × (0.99,1.01) -1.35% (p=0.044) GobEncode 11.4ms × (0.97,1.05) 11.3ms × (0.98,1.03) ~ (p=0.141) Gzip 658ms × (0.98,1.05) 648ms × (0.99,1.01) -1.59% (p=0.009) Gunzip 144ms × (0.99,1.03) 144ms × (0.99,1.01) ~ (p=0.867) HTTPClientServer 92.1µs × (0.97,1.05) 90.3µs × (0.99,1.01) -1.89% (p=0.005) JSONEncode 31.0ms × (0.96,1.07) 30.2ms × (0.98,1.03) -2.66% (p=0.001) JSONDecode 110ms × (0.97,1.04) 107ms × (0.99,1.01) -2.59% (p=0.000) Mandelbrot200 6.15ms × (0.98,1.04) 6.07ms × (0.99,1.02) -1.32% (p=0.045) GoParse 6.79ms × (0.97,1.04) 6.74ms × (0.97,1.04) ~ (p=0.242) RegexpMatchEasy0_32 158ns × (0.98,1.05) 155ns × (0.99,1.01) -1.64% (p=0.010) RegexpMatchEasy0_1K 548ns × (0.97,1.04) 540ns × (0.99,1.01) -1.34% (p=0.042) RegexpMatchEasy1_32 133ns × (0.97,1.04) 132ns × (0.97,1.05) ~ (p=0.466) RegexpMatchEasy1_1K 899ns × (0.96,1.05) 878ns × (0.99,1.01) -2.32% (p=0.002) RegexpMatchMedium_32 250ns × (0.96,1.03) 243ns × (0.99,1.01) -2.90% (p=0.000) RegexpMatchMedium_1K 73.4µs × (0.98,1.04) 73.0µs × (0.98,1.04) ~ (p=0.411) RegexpMatchHard_32 3.87µs × (0.97,1.07) 3.84µs × (0.98,1.04) ~ (p=0.273) RegexpMatchHard_1K 120µs × (0.97,1.08) 117µs × (0.99,1.01) -2.06% (p=0.010) Revcomp 940ms × (0.96,1.07) 924ms × (0.97,1.07) ~ (p=0.071) Template 128ms × (0.96,1.05) 128ms × (0.99,1.01) ~ (p=0.502) TimeParse 632ns × (0.96,1.07) 616ns × (0.99,1.01) -2.58% (p=0.001) TimeFormat 671ns × (0.97,1.06) 657ns × (0.99,1.02) -2.10% (p=0.002) In contrast to the one in test/bench/go1 (above), the binarytree program on the shootout site uses more goroutines, batches allocations, and sets GOMAXPROCS to runtime.NumCPU()*2. Using that version, before vs after: name old mean new mean delta BinaryTree20 18.6s × (0.96,1.05) 11.3s × (0.98,1.02) -39.46% (p=0.000) And Go 1.4 vs after: name old mean new mean delta BinaryTree20 13.0s × (0.97,1.02) 11.3s × (0.98,1.02) -13.21% (p=0.000) There is still a scheduling problem - the raw run times are hiding the fact that this chews up 2x the CPU - but we'll take care of that separately. Change-Id: I3f5da879b24ae73a0d06745381ffb88c3744948b Reviewed-on: https://go-review.googlesource.com/10220 Reviewed-by: Austin Clements <austin@google.com> |
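A loose sketch of the fast check, with hypothetical names (base, mark); it is not the runtime's scanobject, only the shape of the shortcut described above.

```go
package main

import "fmt"

// scanWords skips a pointer that refers back to the object currently being
// scanned: such a self-loop pointer cannot add new mark work, so the more
// expensive mark step can be avoided. base and mark are illustrative names.
func scanWords(base uintptr, words []uintptr, mark func(uintptr)) {
	for _, p := range words {
		if p == base {
			continue // self-loop pointer: this object is already being scanned
		}
		mark(p)
	}
}

func main() {
	base := uintptr(0x1000)
	scanWords(base, []uintptr{0x1000, 0x2000}, func(p uintptr) {
		fmt.Printf("would mark %#x\n", p)
	})
}
```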
79986e24e0
runtime/pprof: write heap statistics to heap profile always
This is a duplicate of CL 9491. That CL broke the build due to pprof shortcomings and was reverted in CL 9565. CL 9623 fixed pprof, so this can go in again.
Fixes #10659.
Change-Id: If470fc90b3db2ade1d161b4417abd2f5c6c330b8
Reviewed-on: https://go-review.googlesource.com/10212
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
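For context, the user-facing API whose output now always carries heap statistics; a minimal example using runtime/pprof:

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
)

func main() {
	f, err := os.Create("heap.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// With this change, the written heap profile always includes the heap
	// statistics, not just the allocation samples.
	if err := pprof.WriteHeapProfile(f); err != nil {
		log.Fatal(err)
	}
}
```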
f0dd002895
runtime: use separate count and note for forEachP
Currently, forEachP reuses the stopwait and stopnote fields from
stopTheWorld to track how many Ps have not responded to the safe-point
request and to sleep until all Ps have responded.
It was assumed this was safe because both stopTheWorld and forEachP
must occur under the worldsema and hence stopwait and stopnote cannot
be used for both purposes simultaneously and callers could always
determine the appropriate use based on sched.gcwaiting (which is only
set by stopTheWorld). However, this is not the case, since it's
possible for there to be a window between when an M observes that
gcwaiting is set and when it checks stopwait during which stopwait
could have changed meanings. When this happens, the M decrements
stopwait and may wakeup stopnote, but does not otherwise participate
in the forEachP protocol. As a result, stopwait is decremented too
many times, so it may reach zero before all Ps have run the safe-point
function, causing forEachP to wake up early. It will then either
observe that some P has not run the safe-point function and panic with
"P did not run fn", or the remaining P (or Ps) will run the safe-point
function before it wakes up and it will observe that stopwait is
negative and panic with "not stopped".
Fix this problem by giving forEachP its own safePointWait and
safePointNote fields.
One known sequence of events that can cause this race is as
follows. It involves three actors:
G1 is running on M1 on P1. P1 has an empty run queue.
G2/M2 is in a blocked syscall and has lost its P. (The details of this
don't matter, it just needs to be in a position where it needs to grab
an idle P.)
GC just started on G3/M3/P3. (These aren't very involved, they just
have to be separate from the other G's, M's, and P's.)
1. GC calls stopTheWorld(), which sets sched.gcwaiting to 1.
Now G1/M1 begins to enter a syscall:
2. G1/M1 invokes reentersyscall, which sets the P1's status to
_Psyscall.
3. G1/M1's reentersyscall observes gcwaiting != 0 and calls
entersyscall_gcwait.
4. G1/M1's entersyscall_gcwait blocks acquiring sched.lock.
Back on GC:
5. stopTheWorld cas's P1's status to _Pgcstop, does other stuff, and
returns.
6. GC does stuff and then calls startTheWorld().
7. startTheWorld() calls procresize(), which sets P1's status to
_Pidle and puts P1 on the idle list.
Now G2/M2 returns from its syscall and takes over P1:
8. G2/M2 returns from its blocked syscall and gets P1 from the idle
list.
9. G2/M2 acquires P1, which sets P1's status to _Prunning.
10. G2/M2 starts a new syscall and invokes reentersyscall, which sets
P1's status to _Psyscall.
Back on G1/M1:
11. G1/M1 finally acquires sched.lock in entersyscall_gcwait.
At this point, G1/M1 still thinks it's running on P1. P1's status is
_Psyscall, which is consistent with what G1/M1 is doing, but it's
_Psyscall because *G2/M2* put it into _Psyscall, not G1/M1. This is
basically an ABA race on P1's status.
Because forEachP currently shares stopwait with stopTheWorld, G1/M1's
entersyscall_gcwait observes the non-zero stopwait set by forEachP,
but mistakes it for a stopTheWorld. It cas's P1's status from
_Psyscall (set by G2/M2) to _Pgcstop and proceeds to decrement
stopwait one more time than forEachP was expecting.
Fixes #10618. (See the issue for details on why the above race is safe
when forEachP is not involved.)
Prior to this commit, the command
stress ./runtime.test -test.run TestFutexsleep\|TestGoroutineProfile
would reliably fail after a few hundred runs. With this commit, it
ran for over 2 million runs and never crashed.
Change-Id: I9a91ea20035b34b6e5f07ef135b144115f281f30
Reviewed-on: https://go-review.googlesource.com/10157
Reviewed-by: Russ Cox <rsc@golang.org>
277acca286
runtime: hold worldsema while starting the world
Currently, startTheWorld releases worldsema before starting the world. Since startTheWorld can change gomaxprocs after allowing Ps to run, this means that gomaxprocs can change while another P holds worldsema. Unfortunately, the garbage collector and forEachP assume that holding worldsema protects against changes in gomaxprocs (which it *almost* does). In particular, this is causing somewhat frequent "P did not run fn" crashes in forEachP in the runtime tests because gomaxprocs is changing between the several loops that forEachP does over all the Ps.
Fix this by only releasing worldsema after the world is started.
This relates to issue #10618. forEachP still fails under stress testing, but much less frequently.
Change-Id: I085d627b70cca9ebe9af28fe73b9872f1bb224ff
Reviewed-on: https://go-review.googlesource.com/10156
Reviewed-by: Russ Cox <rsc@golang.org>
9c44a41dd5
runtime: disallow preemption during startTheWorld
Currently, startTheWorld clears preemptoff for the current M before starting the world. A few callers increment m.locks around startTheWorld, presumably to prevent preemption any time during starting the world. This is almost certainly pointless (none of the other callers do this), but there's no harm in making startTheWorld keep preemption disabled until it's all done, which definitely lets us drop these m.locks manipulations.
Change-Id: I8a93658abd0c72276c9bafa3d2c7848a65b4691a
Reviewed-on: https://go-review.googlesource.com/10155
Reviewed-by: Russ Cox <rsc@golang.org>
a1da255aa0
runtime: factor stoptheworld/starttheworld pattern
There are several steps to stopping and starting the world and currently they're open-coded in several places. The garbage collector is the only thing that needs to stop and start the world in a non-trivial pattern. Replace all other uses with calls to higher-level functions that implement the entire pattern necessary to stop and start the world. This is a pure refactoring and should not change any code semantics. In the following commits, we'll make changes that are easier to do with this abstraction in place.
This commit renames the old starttheworld to startTheWorldWithSema. This is a slight misnomer right now because the callers release worldsema just before calling this. However, a later commit will swap these and I don't want to think of another name in the mean time.
Change-Id: I5dc97f87b44fb98963c49c777d7053653974c911
Reviewed-on: https://go-review.googlesource.com/10154
Reviewed-by: Russ Cox <rsc@golang.org>
5f7060afd2
runtime: don't start GC if preemptoff is set
In order to avoid deadlocks, startGC avoids kicking off GC if locks are held by the calling M. However, it currently fails to check preemptoff, which is the other way to disable preemption. Fix this by adding a check for preemptoff.
Change-Id: Ie1083166e5ba4af5c9d6c5a42efdfaaef41ca997
Reviewed-on: https://go-review.googlesource.com/10153
Reviewed-by: Russ Cox <rsc@golang.org>
e544bee1dd
runtime: correct exception stack trace output
It is misleading when the stack trace says "signal arrived during cgo execution" but we are not in a cgo call.
Change-Id: I627e2f2bdc7755074677f77f21befc070a101914
Reviewed-on: https://go-review.googlesource.com/9190
Reviewed-by: Russ Cox <rsc@golang.org>
a0fc306023
runtime: eliminate runqvictims and a copy from runqsteal
Currently, runqsteal steals Gs from another P into an intermediate buffer and then copies those Gs into the current P's run queue. This intermediate buffer itself was moved from the stack to the P in commit
512f75e8df
runtime: replace GC programs with simpler encoding, faster decoder
Small types record the location of pointers in their memory layout by using a simple bitmap. In Go 1.4 the bitmap held 4-bit entries, and in Go 1.5 the bitmap holds 1-bit entries, but in both cases using a bitmap for a large type containing arrays does not make sense: if someone refers to the type [1<<28]*byte in a program in such a way that the type information makes it into the binary, it would be a waste of space to write a 128 MB (for 4-bit entries) or even 32 MB (for 1-bit entries) bitmap full of 1s into the binary or even to keep one in memory during the execution of the program. For large types containing arrays, it is much more compact to describe the locations of pointers using a notation that can express repetition than to lay out a bitmap of pointers. Go 1.4 included such a notation, called ``GC programs'' but it was complex, required recursion during decoding, and was generally slow. Dmitriy measured the execution of these programs writing directly to the heap bitmap as being 7x slower than copying from a preunrolled 4-bit mask (and frankly that code was not terribly fast either). For some tests, unrollgcprog1 was seen costing as much as 3x more than the rest of malloc combined. This CL introduces a different form for the GC programs. They use a simple Lempel-Ziv-style encoding of the 1-bit pointer information, in which the only operations are (1) emit the following n bits and (2) repeat the last n bits c more times. This encoding can be generated directly from the Go type information (using repetition only for arrays or large runs of non-pointer data) and it can be decoded very efficiently. In particular the decoding requires little state and no recursion, so that the entire decoding can run without any memory accesses other than the reads of the encoding and the writes of the decoded form to the heap bitmap. For recursive types like arrays of arrays of arrays, the inner instructions are only executed once, not n times, so that large repetitions run at full speed. (In contrast, large repetitions in the old programs repeated the individual bit-level layout of the inner data over and over.) The result is as much as 25x faster decoding compared to the old form. Because the old decoder was so slow, Go 1.4 had three (or so) cases for how to set the heap bitmap bits for an allocation of a given type: (1) If the type had an even number of words up to 32 words, then the 4-bit pointer mask for the type fit in no more than 16 bytes; store the 4-bit pointer mask directly in the binary and copy from it. (1b) If the type had an odd number of words up to 15 words, then the 4-bit pointer mask for the type, doubled to end on a byte boundary, fit in no more than 16 bytes; store that doubled mask directly in the binary and copy from it. (2) If the type had an even number of words up to 128 words, or an odd number of words up to 63 words (again due to doubling), then the 4-bit pointer mask would fit in a 64-byte unrolled mask. Store a GC program in the binary, but leave space in the BSS for the unrolled mask. Execute the GC program to construct the mask the first time it is needed, and thereafter copy from the mask. (3) Otherwise, store a GC program and execute it to write directly to the heap bitmap each time an object of that type is allocated. (This is the case that was 7x slower than the other two.) 
Because the new pointer masks store 1-bit entries instead of 4-bit entries and because using the decoder no longer carries a significant overhead, after this CL (that is, for Go 1.5) there are only two cases: (1) If the type is 128 words or less (no condition about odd or even), store the 1-bit pointer mask directly in the binary and use it to initialize the heap bitmap during malloc. (Implemented in CL 9702.) (2) There is no case 2 anymore. (3) Otherwise, store a GC program and execute it to write directly to the heap bitmap each time an object of that type is allocated. Executing the GC program directly into the heap bitmap (case (3) above) was disabled for the Go 1.5 dev cycle, both to avoid needing to use GC programs for typedmemmove and to avoid updating that code as the heap bitmap format changed. Typedmemmove no longer uses this type information; as of CL 9886 it uses the heap bitmap directly. Now that the heap bitmap format is stable, we reintroduce GC programs and their space savings. Benchmarks for heapBitsSetType, before this CL vs this CL: name old mean new mean delta SetTypePtr 7.59ns × (0.99,1.02) 5.16ns × (1.00,1.00) -32.05% (p=0.000) SetTypePtr8 21.0ns × (0.98,1.05) 21.4ns × (1.00,1.00) ~ (p=0.179) SetTypePtr16 24.1ns × (0.99,1.01) 24.6ns × (1.00,1.00) +2.41% (p=0.001) SetTypePtr32 31.2ns × (0.99,1.01) 32.4ns × (0.99,1.02) +3.72% (p=0.001) SetTypePtr64 45.2ns × (1.00,1.00) 47.2ns × (1.00,1.00) +4.42% (p=0.000) SetTypePtr126 75.8ns × (0.99,1.01) 79.1ns × (1.00,1.00) +4.25% (p=0.000) SetTypePtr128 74.3ns × (0.99,1.01) 77.6ns × (1.00,1.01) +4.55% (p=0.000) SetTypePtrSlice 726ns × (1.00,1.01) 712ns × (1.00,1.00) -1.95% (p=0.001) SetTypeNode1 20.0ns × (0.99,1.01) 20.7ns × (1.00,1.00) +3.71% (p=0.000) SetTypeNode1Slice 112ns × (1.00,1.00) 113ns × (0.99,1.00) ~ (p=0.070) SetTypeNode8 23.9ns × (1.00,1.00) 24.7ns × (1.00,1.01) +3.18% (p=0.000) SetTypeNode8Slice 294ns × (0.99,1.02) 287ns × (0.99,1.01) -2.38% (p=0.015) SetTypeNode64 52.8ns × (0.99,1.03) 51.8ns × (0.99,1.01) ~ (p=0.069) SetTypeNode64Slice 1.13µs × (0.99,1.05) 1.14µs × (0.99,1.00) ~ (p=0.767) SetTypeNode64Dead 36.0ns × (1.00,1.01) 32.5ns × (0.99,1.00) -9.67% (p=0.000) SetTypeNode64DeadSlice 1.43µs × (0.99,1.01) 1.40µs × (1.00,1.00) -2.39% (p=0.001) SetTypeNode124 75.7ns × (1.00,1.01) 79.0ns × (1.00,1.00) +4.44% (p=0.000) SetTypeNode124Slice 1.94µs × (1.00,1.01) 2.04µs × (0.99,1.01) +4.98% (p=0.000) SetTypeNode126 75.4ns × (1.00,1.01) 77.7ns × (0.99,1.01) +3.11% (p=0.000) SetTypeNode126Slice 1.95µs × (0.99,1.01) 2.03µs × (1.00,1.00) +3.74% (p=0.000) SetTypeNode128 85.4ns × (0.99,1.01) 122.0ns × (1.00,1.00) +42.89% (p=0.000) SetTypeNode128Slice 2.20µs × (1.00,1.01) 2.36µs × (0.98,1.02) +7.48% (p=0.001) SetTypeNode130 83.3ns × (1.00,1.00) 123.0ns × (1.00,1.00) +47.61% (p=0.000) SetTypeNode130Slice 2.30µs × (0.99,1.01) 2.40µs × (0.98,1.01) +4.37% (p=0.000) SetTypeNode1024 498ns × (1.00,1.00) 537ns × (1.00,1.00) +7.96% (p=0.000) SetTypeNode1024Slice 15.5µs × (0.99,1.01) 17.8µs × (1.00,1.00) +15.27% (p=0.000) The above compares always using a cached pointer mask (and the corresponding waste of memory) against using the programs directly. Some slowdown is expected, in exchange for having a better general algorithm. The GC programs kick in for SetTypeNode128, SetTypeNode130, SetTypeNode1024, along with the slice variants of those. It is possible that the cutoff of 128 words (bits) should be raised in a followup CL, but even with this low cutoff the GC programs are faster than Go 1.4's "fast path" non-GC program case. 
Benchmarks for heapBitsSetType, Go 1.4 vs this CL: name old mean new mean delta SetTypePtr 6.89ns × (1.00,1.00) 5.17ns × (1.00,1.00) -25.02% (p=0.000) SetTypePtr8 25.8ns × (0.97,1.05) 21.5ns × (1.00,1.00) -16.70% (p=0.000) SetTypePtr16 39.8ns × (0.97,1.02) 24.7ns × (0.99,1.01) -37.81% (p=0.000) SetTypePtr32 68.8ns × (0.98,1.01) 32.2ns × (1.00,1.01) -53.18% (p=0.000) SetTypePtr64 130ns × (1.00,1.00) 47ns × (1.00,1.00) -63.67% (p=0.000) SetTypePtr126 241ns × (0.99,1.01) 79ns × (1.00,1.01) -67.25% (p=0.000) SetTypePtr128 2.07µs × (1.00,1.00) 0.08µs × (1.00,1.00) -96.27% (p=0.000) SetTypePtrSlice 1.05µs × (0.99,1.01) 0.72µs × (0.99,1.02) -31.70% (p=0.000) SetTypeNode1 16.0ns × (0.99,1.01) 20.8ns × (0.99,1.03) +29.91% (p=0.000) SetTypeNode1Slice 184ns × (0.99,1.01) 112ns × (0.99,1.01) -39.26% (p=0.000) SetTypeNode8 29.5ns × (0.97,1.02) 24.6ns × (1.00,1.00) -16.50% (p=0.000) SetTypeNode8Slice 624ns × (0.98,1.02) 285ns × (1.00,1.00) -54.31% (p=0.000) SetTypeNode64 135ns × (0.96,1.08) 52ns × (0.99,1.02) -61.32% (p=0.000) SetTypeNode64Slice 3.83µs × (1.00,1.00) 1.14µs × (0.99,1.01) -70.16% (p=0.000) SetTypeNode64Dead 134ns × (0.99,1.01) 32ns × (1.00,1.01) -75.74% (p=0.000) SetTypeNode64DeadSlice 3.83µs × (0.99,1.00) 1.40µs × (1.00,1.01) -63.42% (p=0.000) SetTypeNode124 240ns × (0.99,1.01) 79ns × (1.00,1.01) -67.05% (p=0.000) SetTypeNode124Slice 7.27µs × (1.00,1.00) 2.04µs × (1.00,1.00) -71.95% (p=0.000) SetTypeNode126 2.06µs × (0.99,1.01) 0.08µs × (0.99,1.01) -96.23% (p=0.000) SetTypeNode126Slice 64.4µs × (1.00,1.00) 2.0µs × (1.00,1.00) -96.85% (p=0.000) SetTypeNode128 2.09µs × (1.00,1.01) 0.12µs × (1.00,1.00) -94.15% (p=0.000) SetTypeNode128Slice 65.4µs × (1.00,1.00) 2.4µs × (0.99,1.03) -96.39% (p=0.000) SetTypeNode130 2.11µs × (1.00,1.00) 0.12µs × (1.00,1.00) -94.18% (p=0.000) SetTypeNode130Slice 66.3µs × (1.00,1.00) 2.4µs × (0.97,1.08) -96.34% (p=0.000) SetTypeNode1024 16.0µs × (1.00,1.01) 0.5µs × (1.00,1.00) -96.65% (p=0.000) SetTypeNode1024Slice 512µs × (1.00,1.00) 18µs × (0.98,1.04) -96.45% (p=0.000) SetTypeNode124 uses a 124 data + 2 ptr = 126-word allocation. Both Go 1.4 and this CL are using pointer bitmaps for this case, so that's an overall 3x speedup for using pointer bitmaps. SetTypeNode128 uses a 128 data + 2 ptr = 130-word allocation. Both Go 1.4 and this CL are running the GC program for this case, so that's an overall 17x speedup when using GC programs (and I've seen >20x on other systems). Comparing Go 1.4's SetTypeNode124 (pointer bitmap) against this CL's SetTypeNode128 (GC program), the slow path in the code in this CL is 2x faster than the fast path in Go 1.4. The Go 1 benchmarks are basically unaffected compared to just before this CL. 
Go 1 benchmarks, before this CL vs this CL: name old mean new mean delta BinaryTree17 5.87s × (0.97,1.04) 5.91s × (0.96,1.04) ~ (p=0.306) Fannkuch11 4.38s × (1.00,1.00) 4.37s × (1.00,1.01) -0.22% (p=0.006) FmtFprintfEmpty 90.7ns × (0.97,1.10) 89.3ns × (0.96,1.09) ~ (p=0.280) FmtFprintfString 282ns × (0.98,1.04) 287ns × (0.98,1.07) +1.72% (p=0.039) FmtFprintfInt 269ns × (0.99,1.03) 282ns × (0.97,1.04) +4.87% (p=0.000) FmtFprintfIntInt 478ns × (0.99,1.02) 481ns × (0.99,1.02) +0.61% (p=0.048) FmtFprintfPrefixedInt 399ns × (0.98,1.03) 400ns × (0.98,1.05) ~ (p=0.533) FmtFprintfFloat 563ns × (0.99,1.01) 570ns × (1.00,1.01) +1.37% (p=0.000) FmtManyArgs 1.89µs × (0.99,1.01) 1.92µs × (0.99,1.02) +1.88% (p=0.000) GobDecode 15.2ms × (0.99,1.01) 15.2ms × (0.98,1.05) ~ (p=0.609) GobEncode 11.6ms × (0.98,1.03) 11.9ms × (0.98,1.04) +2.17% (p=0.000) Gzip 648ms × (0.99,1.01) 648ms × (1.00,1.01) ~ (p=0.835) Gunzip 142ms × (1.00,1.00) 143ms × (1.00,1.01) ~ (p=0.169) HTTPClientServer 90.5µs × (0.98,1.03) 91.5µs × (0.98,1.04) +1.04% (p=0.045) JSONEncode 31.5ms × (0.98,1.03) 31.4ms × (0.98,1.03) ~ (p=0.549) JSONDecode 111ms × (0.99,1.01) 107ms × (0.99,1.01) -3.21% (p=0.000) Mandelbrot200 6.01ms × (1.00,1.00) 6.01ms × (1.00,1.00) ~ (p=0.878) GoParse 6.54ms × (0.99,1.02) 6.61ms × (0.99,1.03) +1.08% (p=0.004) RegexpMatchEasy0_32 160ns × (1.00,1.01) 161ns × (1.00,1.00) +0.40% (p=0.000) RegexpMatchEasy0_1K 560ns × (0.99,1.01) 559ns × (0.99,1.01) ~ (p=0.088) RegexpMatchEasy1_32 138ns × (0.99,1.01) 138ns × (1.00,1.00) ~ (p=0.380) RegexpMatchEasy1_1K 877ns × (1.00,1.00) 878ns × (1.00,1.00) ~ (p=0.157) RegexpMatchMedium_32 251ns × (0.99,1.00) 251ns × (1.00,1.01) +0.28% (p=0.021) RegexpMatchMedium_1K 72.6µs × (1.00,1.00) 72.6µs × (1.00,1.00) ~ (p=0.539) RegexpMatchHard_32 3.84µs × (1.00,1.00) 3.84µs × (1.00,1.00) ~ (p=0.378) RegexpMatchHard_1K 117µs × (1.00,1.00) 117µs × (1.00,1.00) ~ (p=0.067) Revcomp 904ms × (0.99,1.02) 904ms × (0.99,1.01) ~ (p=0.943) Template 125ms × (0.99,1.02) 127ms × (0.99,1.01) +1.79% (p=0.000) TimeParse 627ns × (0.99,1.01) 622ns × (0.99,1.01) -0.88% (p=0.000) TimeFormat 655ns × (0.99,1.02) 655ns × (0.99,1.02) ~ (p=0.976) For the record, Go 1 benchmarks, Go 1.4 vs this CL: name old mean new mean delta BinaryTree17 4.61s × (0.97,1.05) 5.91s × (0.98,1.03) +28.35% (p=0.000) Fannkuch11 4.40s × (0.99,1.03) 4.41s × (0.99,1.01) ~ (p=0.212) FmtFprintfEmpty 102ns × (0.99,1.01) 84ns × (0.99,1.02) -18.38% (p=0.000) FmtFprintfString 302ns × (0.98,1.01) 303ns × (0.99,1.02) ~ (p=0.203) FmtFprintfInt 313ns × (0.97,1.05) 270ns × (0.99,1.01) -13.69% (p=0.000) FmtFprintfIntInt 524ns × (0.98,1.02) 477ns × (0.99,1.00) -8.87% (p=0.000) FmtFprintfPrefixedInt 424ns × (0.98,1.02) 386ns × (0.99,1.01) -8.96% (p=0.000) FmtFprintfFloat 652ns × (0.98,1.02) 594ns × (0.97,1.05) -8.97% (p=0.000) FmtManyArgs 2.13µs × (0.99,1.02) 1.94µs × (0.99,1.01) -8.92% (p=0.000) GobDecode 17.1ms × (0.99,1.02) 14.9ms × (0.98,1.03) -13.07% (p=0.000) GobEncode 13.5ms × (0.98,1.03) 11.5ms × (0.98,1.03) -15.25% (p=0.000) Gzip 656ms × (0.99,1.02) 647ms × (0.99,1.01) -1.29% (p=0.000) Gunzip 143ms × (0.99,1.02) 144ms × (0.99,1.01) ~ (p=0.204) HTTPClientServer 88.2µs × (0.98,1.02) 90.8µs × (0.98,1.01) +2.93% (p=0.000) JSONEncode 32.2ms × (0.98,1.02) 30.9ms × (0.97,1.04) -4.06% (p=0.001) JSONDecode 121ms × (0.98,1.02) 110ms × (0.98,1.05) -8.95% (p=0.000) Mandelbrot200 6.06ms × (0.99,1.01) 6.11ms × (0.98,1.04) ~ (p=0.184) GoParse 6.76ms × (0.97,1.04) 6.58ms × (0.98,1.05) -2.63% (p=0.003) RegexpMatchEasy0_32 195ns × (1.00,1.01) 155ns × (0.99,1.01) 
-20.43% (p=0.000) RegexpMatchEasy0_1K 479ns × (0.98,1.03) 535ns × (0.99,1.02) +11.59% (p=0.000) RegexpMatchEasy1_32 169ns × (0.99,1.02) 131ns × (0.99,1.03) -22.44% (p=0.000) RegexpMatchEasy1_1K 1.53µs × (0.99,1.01) 0.87µs × (0.99,1.02) -43.07% (p=0.000) RegexpMatchMedium_32 334ns × (0.99,1.01) 242ns × (0.99,1.01) -27.53% (p=0.000) RegexpMatchMedium_1K 125µs × (1.00,1.01) 72µs × (0.99,1.03) -42.53% (p=0.000) RegexpMatchHard_32 6.03µs × (0.99,1.01) 3.79µs × (0.99,1.01) -37.12% (p=0.000) RegexpMatchHard_1K 189µs × (0.99,1.02) 115µs × (0.99,1.01) -39.20% (p=0.000) Revcomp 935ms × (0.96,1.03) 926ms × (0.98,1.02) ~ (p=0.083) Template 146ms × (0.97,1.05) 119ms × (0.99,1.01) -18.37% (p=0.000) TimeParse 660ns × (0.99,1.01) 624ns × (0.99,1.02) -5.43% (p=0.000) TimeFormat 670ns × (0.98,1.02) 710ns × (1.00,1.01) +5.97% (p=0.000) This CL is a bit larger than I would like, but the compiler, linker, runtime, and package reflect all need to be in sync about the format of these programs, so there is no easy way to split this into independent changes (at least while keeping the build working at each change). Fixes #9625. Fixes #10524. Change-Id: I9e3e20d6097099d0f8532d1cb5b1af528804989a Reviewed-on: https://go-review.googlesource.com/9888 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Russ Cox <rsc@golang.org> |
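A toy decoder for the two operations the commit describes (emit the following n bits; repeat the last n bits c more times). The instruction layout here is invented for the sketch and is not the runtime's actual byte-level GC-program format.

```go
package main

import "fmt"

// op is a made-up instruction covering the two operations described above.
type op struct {
	repeat bool   // false: emit bits; true: repeat previously emitted bits
	bits   []byte // pointer bits to emit (emit op)
	n      int    // how many trailing bits to repeat (repeat op)
	count  int    // how many extra times to repeat them (repeat op)
}

// run expands a program into a flat 1-bit pointer mask, with no recursion
// and no state beyond the output slice.
func run(prog []op) []byte {
	var mask []byte
	for _, o := range prog {
		if !o.repeat {
			mask = append(mask, o.bits...)
			continue
		}
		tail := append([]byte(nil), mask[len(mask)-o.n:]...)
		for i := 0; i < o.count; i++ {
			mask = append(mask, tail...)
		}
	}
	return mask
}

func main() {
	// Pointer mask for something like struct{ p *byte; pad [6]uintptr }:
	// one pointer word followed by six scalar words, written with one repeat.
	prog := []op{
		{bits: []byte{1, 0}},           // emit: pointer, scalar
		{repeat: true, n: 1, count: 5}, // repeat the last bit 5 more times
	}
	fmt.Println(run(prog)) // [1 0 0 0 0 0 0]
}
```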
d820d5f3ab
runtime: make mapzero not crash on arm
Change-Id: I40e8a4a2e62253233b66f6a2e61e222437292c31
Reviewed-on: https://go-review.googlesource.com/10151
Reviewed-by: Minux Ma <minux@golang.org>
c3c047a6a3
runtime: test and fix heap bitmap for 1-pointer allocation on 32-bit system
Change-Id: Ic064fe7c6bd3304dcc8c3f7b3b5393870b5387c2 Reviewed-on: https://go-review.googlesource.com/10119 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
7e26a2d9a8 |
runtime: allocate map element zero values for reflect-created types on demand
Preallocating them in reflect means that (1) if you say _ = PtrTo(ArrayOf(1000000000, reflect.TypeOf(byte(0)))), you just allocated 1GB of data (2) if you say it again, that's *another* GB of data. The only use of t.zero in the runtime is for map elements. Delay the allocation until the creation of a map with that element type, and share the zeros. The one downside of the shared zero is that it's not garbage collected, but it's also never written, so the OS should be able to handle it fairly efficiently. Change-Id: I56b098a091abf3ac0945de28ebef9a6c08e76614 Reviewed-on: https://go-review.googlesource.com/10111 Reviewed-by: Keith Randall <khr@golang.org> |
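For readers skimming the log, here is a minimal sketch of the lazy, shared-zero idea this commit describes. The names (needZero, zeroBuf) and the growth policy are illustrative assumptions, not the runtime's actual identifiers:

```go
package zerodemo

import (
	"sync"
	"unsafe"
)

// A single zero buffer shared by all map element types, grown only when
// a map with a larger element type is actually created. It is never
// written, so the OS can back it cheaply.
var (
	zeroMu  sync.Mutex
	zeroBuf = make([]byte, 1024)
)

// needZero is what a hypothetical map-construction path might call: the
// zero value for an element type is not preallocated per type; it is a
// pointer into the shared buffer, grown on demand.
func needZero(elemSize uintptr) unsafe.Pointer {
	zeroMu.Lock()
	defer zeroMu.Unlock()
	for uintptr(len(zeroBuf)) < elemSize {
		zeroBuf = make([]byte, 2*len(zeroBuf)) // read-only zeros
	}
	return unsafe.Pointer(&zeroBuf[0])
}
```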
|
|
|
65c4d7beab |
runtime: optimize heapBitsBulkBarrier a tiny amount
This may be mostly noise but: name old mean new mean delta BinaryTree17 6.03s × (0.98,1.02) 5.98s × (0.97,1.03) ~ (p=0.306) Fannkuch11 4.42s × (0.99,1.01) 4.34s × (0.99,1.02) -1.83% (p=0.000) FmtFprintfEmpty 84.7ns × (0.99,1.01) 84.4ns × (1.00,1.00) ~ (p=0.138) FmtFprintfString 289ns × (0.98,1.02) 289ns × (1.00,1.01) ~ (p=0.509) FmtFprintfInt 280ns × (0.97,1.03) 272ns × (0.98,1.03) -2.64% (p=0.003) FmtFprintfIntInt 484ns × (0.98,1.02) 482ns × (0.98,1.03) ~ (p=0.606) FmtFprintfPrefixedInt 397ns × (0.98,1.03) 393ns × (0.99,1.02) ~ (p=0.064) FmtFprintfFloat 573ns × (0.99,1.01) 569ns × (0.99,1.01) -0.69% (p=0.023) FmtManyArgs 1.89µs × (0.99,1.02) 1.91µs × (0.98,1.02) ~ (p=0.219) GobDecode 15.4ms × (0.99,1.02) 15.1ms × (0.99,1.01) -2.05% (p=0.000) GobEncode 12.0ms × (0.97,1.04) 11.9ms × (0.97,1.03) ~ (p=0.458) Gzip 652ms × (0.99,1.01) 653ms × (0.99,1.01) ~ (p=0.743) Gunzip 144ms × (0.99,1.01) 143ms × (0.99,1.01) ~ (p=0.134) HTTPClientServer 91.6µs × (0.99,1.01) 91.8µs × (0.99,1.03) ~ (p=0.678) JSONEncode 31.9ms × (1.00,1.00) 32.0ms × (0.99,1.01) ~ (p=0.334) JSONDecode 110ms × (0.99,1.01) 110ms × (0.99,1.01) ~ (p=0.315) Mandelbrot200 6.04ms × (0.99,1.01) 6.04ms × (1.00,1.01) ~ (p=0.596) GoParse 6.72ms × (0.98,1.03) 6.74ms × (0.99,1.03) ~ (p=0.577) RegexpMatchEasy0_32 161ns × (0.99,1.01) 160ns × (1.00,1.00) -0.83% (p=0.002) RegexpMatchEasy0_1K 542ns × (0.99,1.02) 541ns × (0.99,1.01) ~ (p=0.396) RegexpMatchEasy1_32 140ns × (0.98,1.01) 137ns × (1.00,1.00) -2.12% (p=0.000) RegexpMatchEasy1_1K 892ns × (0.99,1.01) 891ns × (1.00,1.01) ~ (p=0.631) RegexpMatchMedium_32 255ns × (0.99,1.01) 253ns × (0.99,1.01) -0.76% (p=0.008) RegexpMatchMedium_1K 73.1µs × (1.00,1.01) 72.9µs × (1.00,1.00) ~ (p=0.229) RegexpMatchHard_32 3.86µs × (1.00,1.01) 3.85µs × (1.00,1.00) ~ (p=0.341) RegexpMatchHard_1K 117µs × (1.00,1.01) 117µs × (0.99,1.00) ~ (p=0.955) Revcomp 954ms × (0.97,1.03) 955ms × (0.98,1.02) ~ (p=0.894) Template 133ms × (0.97,1.05) 129ms × (0.99,1.02) -2.50% (p=0.014) TimeParse 629ns × (0.99,1.01) 626ns × (0.99,1.01) ~ (p=0.106) TimeFormat 663ns × (0.99,1.01) 660ns × (0.99,1.02) ~ (p=0.231) Change-Id: I580e03ed01b0629cb5eae4c4637618f20127f924 Reviewed-on: https://go-review.googlesource.com/9994 Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
497970f421 |
runtime: use memmove during slice append
The effect of this CL: name old mean new mean delta BinaryTree17 5.97s × (0.96,1.04) 5.95s × (0.98,1.02) ~ (p=0.697) Fannkuch11 4.39s × (1.00,1.01) 4.41s × (1.00,1.01) +0.52% (p=0.015) FmtFprintfEmpty 90.8ns × (0.97,1.05) 89.4ns × (0.94,1.13) ~ (p=0.571) FmtFprintfString 305ns × (0.99,1.01) 292ns × (0.98,1.05) -4.35% (p=0.000) FmtFprintfInt 278ns × (0.96,1.03) 279ns × (0.98,1.04) ~ (p=0.741) FmtFprintfIntInt 489ns × (0.99,1.02) 482ns × (0.98,1.03) -1.43% (p=0.024) FmtFprintfPrefixedInt 402ns × (0.98,1.02) 395ns × (0.98,1.03) -1.67% (p=0.014) FmtFprintfFloat 578ns × (1.00,1.00) 569ns × (0.99,1.01) -1.48% (p=0.000) FmtManyArgs 1.88µs × (0.99,1.01) 1.88µs × (1.00,1.01) ~ (p=0.055) GobDecode 15.3ms × (0.99,1.01) 15.2ms × (1.00,1.01) -0.61% (p=0.007) GobEncode 11.8ms × (0.98,1.05) 11.6ms × (0.99,1.01) ~ (p=0.075) Gzip 647ms × (0.99,1.01) 647ms × (1.00,1.00) ~ (p=0.790) Gunzip 143ms × (1.00,1.00) 142ms × (1.00,1.00) ~ (p=0.370) HTTPClientServer 91.2µs × (0.99,1.01) 91.7µs × (0.99,1.02) ~ (p=0.233) JSONEncode 31.5ms × (0.98,1.01) 31.8ms × (0.99,1.02) +1.09% (p=0.015) JSONDecode 110ms × (0.99,1.01) 110ms × (0.99,1.02) ~ (p=0.577) Mandelbrot200 6.00ms × (1.00,1.00) 6.02ms × (1.00,1.00) +0.24% (p=0.001) GoParse 6.68ms × (0.98,1.02) 6.61ms × (0.99,1.01) -1.10% (p=0.027) RegexpMatchEasy0_32 162ns × (1.00,1.00) 161ns × (1.00,1.01) -0.66% (p=0.001) RegexpMatchEasy0_1K 539ns × (1.00,1.00) 539ns × (0.99,1.01) ~ (p=0.509) RegexpMatchEasy1_32 140ns × (0.99,1.02) 139ns × (0.99,1.02) ~ (p=0.163) RegexpMatchEasy1_1K 886ns × (1.00,1.00) 887ns × (1.00,1.00) ~ (p=0.408) RegexpMatchMedium_32 252ns × (1.00,1.00) 255ns × (0.99,1.01) +1.01% (p=0.000) RegexpMatchMedium_1K 72.6µs × (1.00,1.00) 72.6µs × (1.00,1.00) ~ (p=0.176) RegexpMatchHard_32 3.84µs × (1.00,1.00) 3.84µs × (1.00,1.00) ~ (p=0.403) RegexpMatchHard_1K 117µs × (1.00,1.00) 117µs × (1.00,1.00) ~ (p=0.351) Revcomp 926ms × (0.99,1.01) 925ms × (0.99,1.01) ~ (p=0.541) Template 126ms × (0.99,1.02) 130ms × (0.99,1.01) +3.42% (p=0.000) TimeParse 632ns × (0.99,1.01) 626ns × (1.00,1.00) -0.88% (p=0.000) TimeFormat 658ns × (0.99,1.01) 662ns × (0.99,1.02) ~ (p=0.111) The effect of this CL combined with CL 9886: name old mean new mean delta BinaryTree17 5.90s × (0.98,1.03) 5.95s × (0.98,1.02) ~ (p=0.175) Fannkuch11 4.34s × (1.00,1.00) 4.41s × (1.00,1.01) +1.69% (p=0.000) FmtFprintfEmpty 87.3ns × (0.97,1.17) 89.4ns × (0.94,1.13) ~ (p=0.499) FmtFprintfString 288ns × (0.98,1.04) 292ns × (0.98,1.05) ~ (p=0.292) FmtFprintfInt 290ns × (0.98,1.05) 279ns × (0.98,1.04) -3.76% (p=0.001) FmtFprintfIntInt 493ns × (0.98,1.04) 482ns × (0.98,1.03) -2.27% (p=0.017) FmtFprintfPrefixedInt 399ns × (0.98,1.02) 395ns × (0.98,1.03) ~ (p=0.159) FmtFprintfFloat 569ns × (1.00,1.00) 569ns × (0.99,1.01) ~ (p=0.847) FmtManyArgs 1.90µs × (0.99,1.03) 1.88µs × (1.00,1.01) -1.14% (p=0.009) GobDecode 15.2ms × (1.00,1.01) 15.2ms × (1.00,1.01) ~ (p=0.170) GobEncode 11.8ms × (0.99,1.02) 11.6ms × (0.99,1.01) -1.47% (p=0.003) Gzip 649ms × (0.99,1.00) 647ms × (1.00,1.00) ~ (p=0.200) Gunzip 144ms × (0.99,1.01) 142ms × (1.00,1.00) -1.04% (p=0.000) HTTPClientServer 91.1µs × (0.98,1.03) 91.7µs × (0.99,1.02) ~ (p=0.345) JSONEncode 31.5ms × (0.99,1.01) 31.8ms × (0.99,1.02) +0.98% (p=0.021) JSONDecode 110ms × (1.00,1.01) 110ms × (0.99,1.02) ~ (p=0.259) Mandelbrot200 6.02ms × (1.00,1.01) 6.02ms × (1.00,1.00) ~ (p=0.500) GoParse 6.68ms × (1.00,1.01) 6.61ms × (0.99,1.01) -1.17% (p=0.001) RegexpMatchEasy0_32 161ns × (1.00,1.00) 161ns × (1.00,1.01) -0.39% (p=0.033) RegexpMatchEasy0_1K 539ns × (1.00,1.00) 539ns × 
(0.99,1.01) ~ (p=0.445) RegexpMatchEasy1_32 138ns × (1.00,1.01) 139ns × (0.99,1.02) ~ (p=0.281) RegexpMatchEasy1_1K 887ns × (1.00,1.01) 887ns × (1.00,1.00) ~ (p=0.610) RegexpMatchMedium_32 251ns × (1.00,1.02) 255ns × (0.99,1.01) +1.42% (p=0.000) RegexpMatchMedium_1K 72.7µs × (1.00,1.00) 72.6µs × (1.00,1.00) ~ (p=0.097) RegexpMatchHard_32 3.85µs × (1.00,1.00) 3.84µs × (1.00,1.00) -0.31% (p=0.000) RegexpMatchHard_1K 117µs × (1.00,1.00) 117µs × (1.00,1.00) ~ (p=0.704) Revcomp 923ms × (0.98,1.02) 925ms × (0.99,1.01) ~ (p=0.574) Template 126ms × (0.98,1.03) 130ms × (0.99,1.01) +3.28% (p=0.000) TimeParse 631ns × (0.99,1.02) 626ns × (1.00,1.00) ~ (p=0.053) TimeFormat 660ns × (0.99,1.01) 662ns × (0.99,1.02) ~ (p=0.398) Change-Id: I59c03d329fe7bc178a31477c6f1f01062b881041 Reviewed-on: https://go-review.googlesource.com/9993 Reviewed-by: Austin Clements <austin@google.com> |
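As a rough illustration of what "use memmove during slice append" means at the source level, the sketch below shows an append-style growth path in which the existing elements are moved with one bulk copy (Go's copy, which lowers to a memmove) rather than element by element. The function and its growth policy are ours, not the runtime's growslice:

```go
// growAppend appends extra to dst, growing the backing array if needed.
func growAppend(dst, extra []byte) []byte {
	if len(dst)+len(extra) <= cap(dst) {
		return append(dst, extra...)
	}
	grown := make([]byte, len(dst), 2*cap(dst)+len(extra))
	copy(grown, dst) // one bulk move of the old elements
	return append(grown, extra...)
}
```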
|
|
|
30aacd4ce2 |
runtime: add Node128, Node130 benchmarks
Change-Id: I815a7ceeea48cc652b3c8568967665af39b02834 Reviewed-on: https://go-review.googlesource.com/10045 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
|
|
|
ecfe42cab0 |
runtime: keep pointer bits set always in 1-word spans
It's dumb to clear them in initSpan, set them in heapBitsSetType, clear them in heapBitsSweepSpan, set them again in heapBitsSetType, clear them again in heapBitsSweepSpan, and so on. Set them in initSpan and be done with it (until the span is reused for objects of a different size). This avoids an atomic operation in a common case (one-word allocation). Suggested by rlh. name old mean new mean delta BinaryTree17 5.87s × (0.97,1.03) 5.93s × (0.98,1.04) ~ (p=0.056) Fannkuch11 4.34s × (1.00,1.01) 4.41s × (1.00,1.00) +1.42% (p=0.000) FmtFprintfEmpty 86.1ns × (0.98,1.03) 88.9ns × (0.95,1.14) ~ (p=0.066) FmtFprintfString 292ns × (0.97,1.04) 284ns × (0.98,1.03) -2.64% (p=0.000) FmtFprintfInt 271ns × (0.98,1.06) 274ns × (0.98,1.05) ~ (p=0.148) FmtFprintfIntInt 478ns × (0.98,1.05) 487ns × (0.98,1.03) +1.85% (p=0.004) FmtFprintfPrefixedInt 397ns × (0.98,1.05) 394ns × (0.98,1.02) ~ (p=0.184) FmtFprintfFloat 553ns × (0.99,1.02) 543ns × (0.99,1.01) -1.71% (p=0.000) FmtManyArgs 1.90µs × (0.98,1.05) 1.88µs × (0.99,1.01) -0.97% (p=0.037) GobDecode 15.1ms × (0.99,1.01) 15.3ms × (0.99,1.01) +0.78% (p=0.001) GobEncode 11.7ms × (0.98,1.05) 11.6ms × (0.99,1.02) -1.39% (p=0.009) Gzip 646ms × (1.00,1.01) 647ms × (1.00,1.01) ~ (p=0.120) Gunzip 142ms × (1.00,1.00) 142ms × (1.00,1.00) ~ (p=0.068) HTTPClientServer 89.7µs × (0.99,1.01) 90.1µs × (0.98,1.03) ~ (p=0.224) JSONEncode 31.3ms × (0.99,1.01) 31.2ms × (0.99,1.02) ~ (p=0.149) JSONDecode 113ms × (0.99,1.01) 111ms × (0.99,1.01) -1.25% (p=0.000) Mandelbrot200 6.01ms × (1.00,1.00) 6.01ms × (1.00,1.00) +0.09% (p=0.015) GoParse 6.63ms × (0.98,1.03) 6.55ms × (0.99,1.02) -1.10% (p=0.006) RegexpMatchEasy0_32 161ns × (1.00,1.00) 161ns × (1.00,1.00) (sample has zero variance) RegexpMatchEasy0_1K 539ns × (0.99,1.01) 563ns × (0.99,1.01) +4.51% (p=0.000) RegexpMatchEasy1_32 140ns × (0.99,1.01) 141ns × (0.99,1.01) +1.34% (p=0.000) RegexpMatchEasy1_1K 886ns × (1.00,1.01) 888ns × (1.00,1.00) +0.20% (p=0.003) RegexpMatchMedium_32 252ns × (1.00,1.02) 255ns × (0.99,1.01) +1.32% (p=0.000) RegexpMatchMedium_1K 72.7µs × (1.00,1.00) 72.6µs × (1.00,1.00) ~ (p=0.296) RegexpMatchHard_32 3.84µs × (1.00,1.01) 3.84µs × (1.00,1.00) ~ (p=0.339) RegexpMatchHard_1K 117µs × (1.00,1.01) 117µs × (1.00,1.00) -0.28% (p=0.022) Revcomp 914ms × (0.99,1.01) 909ms × (0.99,1.01) -0.49% (p=0.031) Template 128ms × (0.99,1.01) 127ms × (0.99,1.01) -1.10% (p=0.000) TimeParse 628ns × (0.99,1.01) 639ns × (0.99,1.01) +1.69% (p=0.000) TimeFormat 660ns × (0.99,1.01) 662ns × (0.99,1.02) ~ (p=0.287) Change-Id: I3127b0ab89708267c74aa7d0eae1db1a1bcdfda5 Reviewed-on: https://go-review.googlesource.com/9884 Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
94934f843e |
runtime: rewrite addb/subtractb to be simpler to compile; introduce add1, subtract1
This reduces the depth of the inlining at a particular call site. The inliner introduces many temporary variables, and the compiler can do a better job with fewer. Being verbose in the bodies of these helper functions seems like a reasonable tradeoff: the uses are still just as readable, and they run faster in some important cases. Change-Id: I5323976ed3704d0acd18fb31176cfbf5ba23a89c Reviewed-on: https://go-review.googlesource.com/9883 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com> |
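A hedged sketch of what such helpers look like; the real runtime versions live in the heap bitmap code and may differ in detail, but the shape is plain pointer arithmetic, with the fixed-offset variants spelled out so the inlining stays shallow:

```go
package bitptr

import "unsafe"

// addb returns the byte pointer p+n.
func addb(p *byte, n uintptr) *byte {
	return (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) + n))
}

// subtractb returns the byte pointer p-n.
func subtractb(p *byte, n uintptr) *byte {
	return (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) - n))
}

// add1 and subtract1 write out the arithmetic directly instead of
// calling addb/subtractb, which keeps the inlining depth low at the
// call sites that matter.
func add1(p *byte) *byte {
	return (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) + 1))
}

func subtract1(p *byte) *byte {
	return (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) - 1))
}
```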
|
|
|
5b3739357a |
runtime: skip atomics in heapBitsSetType when GC is not running
Suggested by Rick during code review of this code, but separated out for easier diagnosis in case it causes problems (and also easier rollback). name old mean new mean delta SetTypePtr 13.9ns × (0.98,1.05) 6.2ns × (0.99,1.01) -55.18% (p=0.000) SetTypePtr8 15.5ns × (0.95,1.10) 15.5ns × (0.99,1.05) ~ (p=0.952) SetTypePtr16 17.8ns × (0.99,1.05) 18.0ns × (1.00,1.00) ~ (p=0.157) SetTypePtr32 25.2ns × (0.99,1.01) 24.3ns × (0.99,1.01) -3.86% (p=0.000) SetTypePtr64 42.2ns × (0.93,1.13) 40.8ns × (0.99,1.01) ~ (p=0.239) SetTypePtr126 67.3ns × (1.00,1.00) 67.5ns × (0.99,1.02) ~ (p=0.365) SetTypePtr128 67.6ns × (1.00,1.01) 70.1ns × (0.97,1.10) ~ (p=0.063) SetTypePtrSlice 575ns × (0.98,1.06) 543ns × (0.95,1.17) -5.54% (p=0.034) SetTypeNode1 12.4ns × (0.98,1.09) 12.8ns × (0.99,1.01) +3.40% (p=0.021) SetTypeNode1Slice 97.1ns × (0.97,1.09) 89.5ns × (1.00,1.00) -7.78% (p=0.000) SetTypeNode8 29.8ns × (1.00,1.01) 17.7ns × (1.00,1.01) -40.74% (p=0.000) SetTypeNode8Slice 204ns × (0.99,1.04) 190ns × (0.97,1.06) -6.96% (p=0.000) SetTypeNode64 42.8ns × (0.99,1.01) 44.0ns × (0.95,1.12) ~ (p=0.163) SetTypeNode64Slice 1.00µs × (0.95,1.09) 0.98µs × (0.96,1.08) ~ (p=0.356) SetTypeNode64Dead 12.2ns × (0.99,1.04) 12.7ns × (1.00,1.01) +4.34% (p=0.000) SetTypeNode64DeadSlice 1.14µs × (0.94,1.11) 0.99µs × (0.99,1.03) -13.74% (p=0.000) SetTypeNode124 67.9ns × (0.99,1.03) 70.4ns × (0.95,1.15) ~ (p=0.115) SetTypeNode124Slice 1.76µs × (0.99,1.04) 1.88µs × (0.91,1.23) ~ (p=0.096) SetTypeNode126 67.7ns × (1.00,1.01) 68.2ns × (0.99,1.02) +0.72% (p=0.014) SetTypeNode126Slice 1.76µs × (1.00,1.01) 1.87µs × (0.93,1.15) +6.15% (p=0.035) SetTypeNode1024 462ns × (0.96,1.10) 451ns × (0.99,1.05) ~ (p=0.224) SetTypeNode1024Slice 14.4µs × (0.95,1.15) 14.2µs × (0.97,1.19) ~ (p=0.676) name old mean new mean delta BinaryTree17 5.87s × (0.98,1.04) 5.87s × (0.98,1.03) ~ (p=0.993) Fannkuch11 4.39s × (0.99,1.01) 4.34s × (1.00,1.01) -1.22% (p=0.000) FmtFprintfEmpty 90.6ns × (0.97,1.06) 89.4ns × (0.97,1.03) ~ (p=0.070) FmtFprintfString 305ns × (0.98,1.02) 296ns × (0.99,1.02) -2.94% (p=0.000) FmtFprintfInt 276ns × (0.97,1.04) 270ns × (0.98,1.03) -2.17% (p=0.001) FmtFprintfIntInt 490ns × (0.97,1.05) 473ns × (0.99,1.02) -3.59% (p=0.000) FmtFprintfPrefixedInt 402ns × (0.99,1.02) 397ns × (0.99,1.01) -1.15% (p=0.000) FmtFprintfFloat 577ns × (0.99,1.01) 549ns × (0.99,1.01) -4.78% (p=0.000) FmtManyArgs 1.89µs × (0.99,1.02) 1.87µs × (0.99,1.01) -1.43% (p=0.000) GobDecode 15.2ms × (0.99,1.01) 14.7ms × (0.99,1.02) -3.55% (p=0.000) GobEncode 11.7ms × (0.98,1.04) 11.5ms × (0.99,1.02) -1.63% (p=0.002) Gzip 647ms × (0.99,1.01) 647ms × (1.00,1.01) ~ (p=0.486) Gunzip 142ms × (1.00,1.00) 143ms × (1.00,1.00) ~ (p=0.234) HTTPClientServer 90.7µs × (0.99,1.01) 90.4µs × (0.98,1.04) ~ (p=0.331) JSONEncode 31.9ms × (0.97,1.06) 31.6ms × (0.98,1.02) ~ (p=0.206) JSONDecode 110ms × (0.99,1.01) 112ms × (0.99,1.02) +1.48% (p=0.000) Mandelbrot200 6.00ms × (1.00,1.00) 6.01ms × (1.00,1.00) ~ (p=0.058) GoParse 6.63ms × (0.98,1.03) 6.61ms × (0.98,1.02) ~ (p=0.353) RegexpMatchEasy0_32 162ns × (0.99,1.01) 161ns × (1.00,1.00) -0.33% (p=0.004) RegexpMatchEasy0_1K 539ns × (0.99,1.01) 540ns × (0.99,1.02) ~ (p=0.222) RegexpMatchEasy1_32 139ns × (0.99,1.01) 140ns × (0.97,1.03) ~ (p=0.054) RegexpMatchEasy1_1K 886ns × (1.00,1.00) 887ns × (1.00,1.00) +0.18% (p=0.001) RegexpMatchMedium_32 252ns × (1.00,1.01) 252ns × (1.00,1.00) +0.21% (p=0.010) RegexpMatchMedium_1K 72.7µs × (1.00,1.01) 72.6µs × (1.00,1.00) ~ (p=0.060) RegexpMatchHard_32 3.84µs × (1.00,1.00) 3.84µs × (1.00,1.00) ~ (p=0.065) 
RegexpMatchHard_1K 117µs × (1.00,1.00) 117µs × (1.00,1.00) -0.27% (p=0.000) Revcomp 916ms × (0.98,1.04) 909ms × (0.99,1.01) ~ (p=0.054) Template 126ms × (0.99,1.01) 128ms × (0.99,1.02) +1.43% (p=0.000) TimeParse 632ns × (0.99,1.01) 625ns × (1.00,1.01) -1.05% (p=0.000) TimeFormat 655ns × (0.99,1.02) 669ns × (0.99,1.02) +2.01% (p=0.000) Change-Id: I9477b7c9489c6fa98e860c190ce06cd73c53c6a1 Reviewed-on: https://go-review.googlesource.com/9829 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com> |
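The idea behind the CL above is easy to state in a few lines: an atomic read-modify-write of a shared bitmap word is only needed while a concurrent GC might be touching neighbouring bits; otherwise a plain OR suffices. A sketch under that assumption (gcRunning and orBits are illustrative names; the real runtime checks its internal GC phase, not a package variable):

```go
package heapbits

import "sync/atomic"

// gcRunning stands in for "a concurrent GC phase is active".
var gcRunning uint32

// orBits sets mask bits in a shared bitmap word.
func orBits(word *uint32, mask uint32) {
	if atomic.LoadUint32(&gcRunning) == 0 {
		*word |= mask // fast path: nothing else touches these bits now
		return
	}
	for { // slow path: emulate an atomic OR with a CAS loop
		old := atomic.LoadUint32(word)
		if atomic.CompareAndSwapUint32(word, old, old|mask) {
			return
		}
	}
}
```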
|
|
|
6f2c0f1585 |
runtime: add check for malloc in a signal handler
Change-Id: Ic8ebbe81eb788626c01bfab238d54236e6e5ef2b Reviewed-on: https://go-review.googlesource.com/9964 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org> |
|
|
|
c4fe503119 |
runtime: reduce thrashing of gs between ps
One important use case is a pipeline computation that passes values
from one Goroutine to the next and then exits or is placed in a
wait state. If GOMAXPROCS > 1 a Goroutine running on P1 will enable
another Goroutine and then immediately make P1 available to execute
it. We need to prevent other Ps from stealing the G that P1 is about
to execute. Otherwise the Gs can thrash between Ps causing unneeded
synchronization and slowing down throughput.
Fix this by changing the stealing logic so that when a P attempts to
steal the only G on some other P's run queue, it will pause
momentarily to allow the victim P to schedule the G.
As part of optimizing stealing we also use a per-P victim queue to
move stolen gs. This eliminates the zeroing of a stack-local victim
queue, which turned out to be expensive.
This CL is a necessary but not sufficient prerequisite to changing
the default value of GOMAXPROCS to something > 1 which is another
CL/discussion.
For highly serialized programs, such as GoroutineRing below, this can
make a large difference. For larger and more parallel programs such
as the x/benchmarks there is no noticeable detriment.
~/work/code/src/rsc.io/benchstat/benchstat old.txt new.txt
name old mean new mean delta
GoroutineRing 30.2µs × (0.98,1.01) 30.1µs × (0.97,1.04) ~ (p=0.941)
GoroutineRing-2 113µs × (0.91,1.07) 30µs × (0.98,1.03) -73.17% (p=0.004)
GoroutineRing-4 144µs × (0.98,1.02) 32µs × (0.98,1.01) -77.69% (p=0.000)
GoroutineRingBuf 32.7µs × (0.97,1.03) 32.5µs × (0.97,1.02) ~ (p=0.795)
GoroutineRingBuf-2 120µs × (0.92,1.08) 33µs × (1.00,1.00) -72.48% (p=0.004)
GoroutineRingBuf-4 138µs × (0.92,1.06) 33µs × (1.00,1.00) -76.21% (p=0.003)
The bench benchmarks show little impact.
old new
garbage 7032879 7011696
httpold 25509 25301
splayold 1022073 1019499
jsonold 28230624 28081433
Change-Id: I228c48fed8d85c9bbef16a7edc53ab7898506f50
Reviewed-on: https://go-review.googlesource.com/9872
Reviewed-by: Austin Clements <austin@google.com>
|
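A conceptual sketch of the stealing rule described in the message above. The types and the trySteal/osYield names are ours; the real scheduler works on fixed-size per-P run queues and a short spin/yield, but the decision it makes is the one shown:

```go
package sched

type g struct{ id int }

type p struct {
	runq []*g // goroutines queued to run on this P
}

// trySteal decides whether a work-starved P should take work from victim.
func trySteal(victim *p) *g {
	if len(victim.runq) == 1 {
		// The victim has exactly one queued G and is about to run it;
		// stealing it now would just bounce the G to another P. Back
		// off briefly and let the victim schedule it.
		osYield()
		return nil
	}
	if n := len(victim.runq); n > 0 {
		stolen := victim.runq[n-1]
		victim.runq = victim.runq[:n-1]
		return stolen
	}
	return nil
}

func osYield() {} // placeholder for a brief pause before retrying
```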
|
|
|
350fd548b3 |
runtime: don't run runq tests on the system stack
Running these tests on the system stack is problematic because they allocate Ps, which are large enough to overflow the system stack if they are stack-allocated. It used to be necessary to run these tests on the system stack because they were written in C, but since this is no longer the case, we can fix this problem by simply not running the tests on the system stack. This also means we no longer need the hack in one of these tests that forces the allocated Ps to escape to the heap, so eliminate that as well. Change-Id: I9064f5f8fd7f7b446ff39a22a70b172cfcb2dc57 Reviewed-on: https://go-review.googlesource.com/9923 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> |
|
|
|
7de86a1b1c |
runtime: terminate exit status buffer on Plan 9
The status buffer built by the exit function was not nil-terminated. Fixes #10789. Change-Id: I2d34ac50a19d138176c4b47393497ba7070d5b61 Reviewed-on: https://go-review.googlesource.com/9953 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> |
|
|
|
f85a05581e |
runtime: fix signal handling on Plan 9
Once added to the signal queue, the pointer passed to the signal handler could no longer be valid. Instead of passing the pointer to the note string, we recopy the value of the note string to a static array in the signal queue. Fixes #10784. Change-Id: Iddd6837b58a14dfaa16b069308ae28a7b8e0965b Reviewed-on: https://go-review.googlesource.com/9950 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
|
|
|
77fc03f4cd |
cmd/internal/ld, runtime: abort on shared library ABI mismatch
This: 1) Defines the ABI hash of a package (as the SHA1 of the __.PKGDEF) 2) Defines the ABI hash of a shared library (sort the packages by import path, concatenate the hashes of the packages and SHA1 that) 3) When building a shared library, compute the above value and define a global symbol that points to a Go string that has the hash as its value. 4) When linking against a shared library, read the ABI hash from the library and put both the value seen at link time and a reference to the global symbol into the moduledata. 5) During runtime initialization, check that the hash seen at link time still matches the hash the global symbol points to. Change-Id: Iaa54c783790e6dde3057a2feadc35473d49614a5 Reviewed-on: https://go-review.googlesource.com/8773 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com> |
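Steps (1) and (2) amount to a small, deterministic hashing scheme. A sketch of how a tool might compute the library hash, assuming the export data (the __.PKGDEF contents) for each package is already available; the function name and map argument are illustrative:

```go
package abihash

import (
	"crypto/sha1"
	"fmt"
	"sort"
)

// libraryABIHash hashes each package's export data with SHA1, then
// hashes the per-package digests concatenated in import path order.
func libraryABIHash(pkgdefByImportPath map[string][]byte) string {
	paths := make([]string, 0, len(pkgdefByImportPath))
	for p := range pkgdefByImportPath {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	lib := sha1.New()
	for _, p := range paths {
		pkgHash := sha1.Sum(pkgdefByImportPath[p]) // ABI hash of one package
		lib.Write(pkgHash[:])
	}
	return fmt.Sprintf("%x", lib.Sum(nil)) // ABI hash of the whole library
}
```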
|
|
|
be0cb9224b |
runtime: fix addmoduledata to follow the platform ABI
addmoduledata is called from a .init_array function and needs to follow the platform ABI. It contains accesses to global data, which are rewritten to use R15 by the assembler, and since R15 is callee-save we need to save it. Change-Id: I03893efb1576aed4f102f2465421f256f3bb0f30 Reviewed-on: https://go-review.googlesource.com/9941 Reviewed-by: Ian Lance Taylor <iant@golang.org> |
|
|
|
4212a3c3d9 |
runtime: use heap bitmap for typedmemmove
The current implementation of typedmemmove walks the ptrmask in the type to find out where pointers are. This led to turning off GC programs for the Go 1.5 dev cycle, so that there would always be a ptrmask. Instead of also interpreting the GC programs, interpret the heap bitmap, which we know must be available and up to date. (There is no point to write barriers when writing outside the heap.) This CL is only about correctness. The next CL will optimize the code. Change-Id: Id1305c7c071fd2734ab96634b0e1c745b23fa793 Reviewed-on: https://go-review.googlesource.com/9886 Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
266a842f55 |
runtime: zero entire bitmap for object, even past dead marker
We want typedmemmove to use the heap bitmap to determine where pointers are, instead of reinterpreting the type information. The heap bitmap is simpler to access. In general, typedmemmove will need to be able to look up the bits for any word and find valid pointer information, so fill even after the dead marker. Not filling after the dead marker was an optimization I introduced only a few days ago, when reintroducing the dead marker code. At the time I said it probably wouldn't last, and it didn't. Change-Id: I6ba01bff17ddee1ff429f454abe29867ec60606e Reviewed-on: https://go-review.googlesource.com/9885 Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
e375ca2a25 |
runtime: reorder bits in heap bitmap bytes
The runtime deals with 1-bit pointer bitmaps and 2-bit heap bitmaps that have entries for both pointers and mark bits. Each byte in a 1-bit pointer bitmap looks like pppppppp (all pointer bits). Each byte in a 2-bit heap bitmap looks like mpmpmpmp (mark, pointer, ...). This means that when converting from 1-bit to 2-bit, as we do during malloc, we have to pick up 4 bits in pppp form and use shifts to create the mpmpmpmp form. This CL changes the 2-bit heap bitmap form to mmmmpppp, so that 4 bits picked up in 1-bit form can be used directly in the low bits of the heap bitmap byte, without expansion. This simplifies the code, and it also happens to be faster. name old mean new mean delta SetTypePtr 14.0ns × (0.98,1.09) 14.0ns × (0.98,1.08) ~ (p=0.966) SetTypePtr8 16.5ns × (0.99,1.05) 15.3ns × (0.96,1.16) -6.86% (p=0.012) SetTypePtr16 21.3ns × (0.98,1.05) 18.8ns × (0.94,1.14) -11.49% (p=0.000) SetTypePtr32 34.6ns × (0.93,1.22) 27.7ns × (0.91,1.26) -20.08% (p=0.001) SetTypePtr64 55.7ns × (0.97,1.11) 41.6ns × (0.98,1.04) -25.30% (p=0.000) SetTypePtr126 98.0ns × (1.00,1.00) 67.7ns × (0.99,1.05) -30.88% (p=0.000) SetTypePtr128 98.6ns × (1.00,1.01) 68.6ns × (0.99,1.03) -30.44% (p=0.000) SetTypePtrSlice 781ns × (0.99,1.01) 571ns × (0.99,1.04) -26.93% (p=0.000) SetTypeNode1 13.1ns × (0.99,1.01) 12.1ns × (0.99,1.01) -7.45% (p=0.000) SetTypeNode1Slice 113ns × (0.99,1.01) 94ns × (1.00,1.00) -16.35% (p=0.000) SetTypeNode8 32.7ns × (1.00,1.00) 29.8ns × (0.99,1.01) -8.97% (p=0.000) SetTypeNode8Slice 266ns × (1.00,1.00) 204ns × (1.00,1.00) -23.40% (p=0.000) SetTypeNode64 58.0ns × (0.98,1.08) 42.8ns × (1.00,1.01) -26.24% (p=0.000) SetTypeNode64Slice 1.55µs × (0.99,1.02) 0.96µs × (1.00,1.00) -37.84% (p=0.000) SetTypeNode64Dead 13.1ns × (0.99,1.01) 12.1ns × (1.00,1.00) -7.33% (p=0.000) SetTypeNode64DeadSlice 1.52µs × (1.00,1.01) 1.08µs × (1.00,1.01) -28.95% (p=0.000) SetTypeNode124 97.9ns × (1.00,1.00) 67.1ns × (1.00,1.01) -31.49% (p=0.000) SetTypeNode124Slice 2.87µs × (0.99,1.02) 1.75µs × (1.00,1.01) -39.15% (p=0.000) SetTypeNode126 98.4ns × (1.00,1.01) 68.1ns × (1.00,1.01) -30.79% (p=0.000) SetTypeNode126Slice 2.91µs × (0.99,1.01) 1.77µs × (0.99,1.01) -39.09% (p=0.000) SetTypeNode1024 732ns × (1.00,1.00) 511ns × (0.87,1.42) -30.14% (p=0.000) SetTypeNode1024Slice 23.1µs × (1.00,1.00) 13.9µs × (0.99,1.02) -39.83% (p=0.000) Change-Id: I12e3b850a4e6fa6c8146b8635ff728f3ef658819 Reviewed-on: https://go-review.googlesource.com/9828 Reviewed-by: Austin Clements <austin@google.com> |
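The win is easiest to see as bit twiddling: under the old interleaved layout each of the four 1-bit pointer bits had to be shifted into place, while under the new layout they land in the low nibble unchanged. A sketch matching the description above (the function names are ours):

```go
package heapbits

// oldHeapByte spreads four pointer bits p3p2p1p0 into the interleaved
// mpmpmpmp layout (pointer bits at the even positions), which needs a
// shift per bit.
func oldHeapByte(pppp uint8) uint8 {
	var b uint8
	for i := uint(0); i < 4; i++ {
		b |= ((pppp >> i) & 1) << (2 * i)
	}
	return b
}

// newHeapByte produces the mmmmpppp layout: the four pointer bits from
// the 1-bit bitmap are used directly as the low nibble, no shifting.
func newHeapByte(pppp uint8) uint8 {
	return pppp & 0x0f
}
```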
|
|
|
363fd1dd1b |
runtime: move a few atomic fields up
Moving them up makes them properly aligned on 32-bit systems. There are some odd fields above them right now (like fixalloc and mutex maybe). Change-Id: I57851a5bbb2e7cc339712f004f99bb6c0cce0ca5 Reviewed-on: https://go-review.googlesource.com/9889 Reviewed-by: Austin Clements <austin@google.com> |
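The alignment rule being relied on is the one documented for sync/atomic: on 32-bit platforms, 64-bit words accessed atomically must be 64-bit aligned, and the simplest way to guarantee that for struct fields is to place them first. An illustrative struct (not a runtime type):

```go
package stats

import "sync/atomic"

// counters puts its atomically updated 64-bit fields first so they are
// 64-bit aligned even on 32-bit platforms, where fields that follow
// oddly sized ones may not be.
type counters struct {
	bytesAllocated uint64 // updated with sync/atomic
	bytesFreed     uint64 // updated with sync/atomic
	label          string // non-atomic fields follow
	enabled        bool
}

func (c *counters) addAlloc(n uint64) {
	atomic.AddUint64(&c.bytesAllocated, n)
}
```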
|
|
|
8f037fa1ab |
runtime: fix TestLFStack on 386
The new(uint64) was moving to the stack, which may not be aligned. Change-Id: Iad070964202001b52029494d43e299fed980f939 Reviewed-on: https://go-review.googlesource.com/9787 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: David Chase <drchase@google.com> |
|
|
|
1635ab7dfe |
runtime: remove wbshadow mode
The write barrier shadow heap was very useful for developing the write barriers initially, but it's no longer used, clunky, and dragging the rest of the implementation down. The gccheckmark mode will find bugs due to missed barriers when they result in missed marks; wbshadow mode found the missed barriers more aggressively, but it required an entire separate copy of the heap. The gccheckmark mode requires no extra memory, making it more useful in practice. Compared to previous CL: name old mean new mean delta BinaryTree17 5.91s × (0.96,1.06) 5.72s × (0.97,1.03) -3.12% (p=0.000) Fannkuch11 4.32s × (1.00,1.00) 4.36s × (1.00,1.00) +0.91% (p=0.000) FmtFprintfEmpty 89.0ns × (0.93,1.10) 86.6ns × (0.96,1.11) ~ (p=0.077) FmtFprintfString 298ns × (0.98,1.06) 283ns × (0.99,1.04) -4.90% (p=0.000) FmtFprintfInt 286ns × (0.98,1.03) 283ns × (0.98,1.04) -1.09% (p=0.032) FmtFprintfIntInt 498ns × (0.97,1.06) 480ns × (0.99,1.02) -3.65% (p=0.000) FmtFprintfPrefixedInt 408ns × (0.98,1.02) 396ns × (0.99,1.01) -3.00% (p=0.000) FmtFprintfFloat 587ns × (0.98,1.01) 562ns × (0.99,1.01) -4.34% (p=0.000) FmtManyArgs 1.94µs × (0.99,1.02) 1.89µs × (0.99,1.01) -2.85% (p=0.000) GobDecode 15.8ms × (0.98,1.03) 15.7ms × (0.99,1.02) ~ (p=0.251) GobEncode 12.0ms × (0.96,1.09) 11.8ms × (0.98,1.03) -1.87% (p=0.024) Gzip 648ms × (0.99,1.01) 647ms × (0.99,1.01) ~ (p=0.688) Gunzip 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.203) HTTPClientServer 90.3µs × (0.98,1.01) 89.1µs × (0.99,1.02) -1.30% (p=0.000) JSONEncode 31.6ms × (0.99,1.01) 31.7ms × (0.98,1.02) ~ (p=0.219) JSONDecode 107ms × (1.00,1.01) 111ms × (0.99,1.01) +3.58% (p=0.000) Mandelbrot200 6.03ms × (1.00,1.01) 6.01ms × (1.00,1.00) ~ (p=0.077) GoParse 6.53ms × (0.99,1.03) 6.54ms × (0.99,1.02) ~ (p=0.585) RegexpMatchEasy0_32 161ns × (1.00,1.01) 161ns × (0.98,1.05) ~ (p=0.948) RegexpMatchEasy0_1K 541ns × (0.99,1.01) 559ns × (0.98,1.01) +3.32% (p=0.000) RegexpMatchEasy1_32 138ns × (1.00,1.00) 137ns × (0.99,1.01) -0.55% (p=0.001) RegexpMatchEasy1_1K 887ns × (0.99,1.01) 878ns × (0.99,1.01) -0.98% (p=0.000) RegexpMatchMedium_32 253ns × (0.99,1.01) 252ns × (0.99,1.01) -0.39% (p=0.001) RegexpMatchMedium_1K 72.8µs × (1.00,1.00) 72.7µs × (1.00,1.00) ~ (p=0.485) RegexpMatchHard_32 3.85µs × (1.00,1.01) 3.85µs × (1.00,1.01) ~ (p=0.283) RegexpMatchHard_1K 117µs × (1.00,1.01) 117µs × (1.00,1.00) ~ (p=0.175) Revcomp 922ms × (0.97,1.08) 903ms × (0.98,1.05) -2.15% (p=0.021) Template 126ms × (0.99,1.01) 126ms × (0.99,1.01) ~ (p=0.943) TimeParse 628ns × (0.99,1.01) 634ns × (0.99,1.01) +0.92% (p=0.000) TimeFormat 668ns × (0.99,1.01) 698ns × (0.98,1.03) +4.53% (p=0.000) It's nice that the microbenchmarks are the ones helped the most, because those were the ones hurt the most by the conversion from 4-bit to 2-bit heap bitmaps. 
This CL brings the overall effect of that process to (compared to CL 9706 patch set 1): name old mean new mean delta BinaryTree17 5.87s × (0.94,1.09) 5.72s × (0.97,1.03) -2.57% (p=0.011) Fannkuch11 4.32s × (1.00,1.00) 4.36s × (1.00,1.00) +0.87% (p=0.000) FmtFprintfEmpty 89.1ns × (0.95,1.16) 86.6ns × (0.96,1.11) ~ (p=0.090) FmtFprintfString 283ns × (0.98,1.02) 283ns × (0.99,1.04) ~ (p=0.681) FmtFprintfInt 284ns × (0.98,1.04) 283ns × (0.98,1.04) ~ (p=0.620) FmtFprintfIntInt 486ns × (0.98,1.03) 480ns × (0.99,1.02) -1.27% (p=0.002) FmtFprintfPrefixedInt 400ns × (0.99,1.02) 396ns × (0.99,1.01) -0.84% (p=0.001) FmtFprintfFloat 566ns × (0.99,1.01) 562ns × (0.99,1.01) -0.80% (p=0.000) FmtManyArgs 1.91µs × (0.99,1.02) 1.89µs × (0.99,1.01) -1.10% (p=0.000) GobDecode 15.5ms × (0.98,1.05) 15.7ms × (0.99,1.02) +1.55% (p=0.005) GobEncode 11.9ms × (0.97,1.03) 11.8ms × (0.98,1.03) -0.97% (p=0.048) Gzip 648ms × (0.99,1.01) 647ms × (0.99,1.01) ~ (p=0.627) Gunzip 143ms × (1.00,1.00) 143ms × (1.00,1.01) ~ (p=0.482) HTTPClientServer 89.2µs × (0.99,1.02) 89.1µs × (0.99,1.02) ~ (p=0.740) JSONEncode 32.3ms × (0.97,1.06) 31.7ms × (0.98,1.02) -1.95% (p=0.002) JSONDecode 106ms × (0.99,1.01) 111ms × (0.99,1.01) +4.22% (p=0.000) Mandelbrot200 6.02ms × (1.00,1.00) 6.01ms × (1.00,1.00) ~ (p=0.417) GoParse 6.57ms × (0.97,1.06) 6.54ms × (0.99,1.02) ~ (p=0.404) RegexpMatchEasy0_32 162ns × (1.00,1.00) 161ns × (0.98,1.05) ~ (p=0.088) RegexpMatchEasy0_1K 561ns × (0.99,1.02) 559ns × (0.98,1.01) -0.47% (p=0.034) RegexpMatchEasy1_32 145ns × (0.95,1.04) 137ns × (0.99,1.01) -5.56% (p=0.000) RegexpMatchEasy1_1K 864ns × (0.99,1.04) 878ns × (0.99,1.01) +1.57% (p=0.000) RegexpMatchMedium_32 255ns × (0.99,1.04) 252ns × (0.99,1.01) -1.43% (p=0.001) RegexpMatchMedium_1K 73.9µs × (0.98,1.04) 72.7µs × (1.00,1.00) -1.55% (p=0.004) RegexpMatchHard_32 3.92µs × (0.98,1.04) 3.85µs × (1.00,1.01) -1.80% (p=0.003) RegexpMatchHard_1K 120µs × (0.98,1.04) 117µs × (1.00,1.00) -2.13% (p=0.001) Revcomp 936ms × (0.95,1.08) 903ms × (0.98,1.05) -3.58% (p=0.002) Template 130ms × (0.98,1.04) 126ms × (0.99,1.01) -2.98% (p=0.000) TimeParse 638ns × (0.98,1.05) 634ns × (0.99,1.01) ~ (p=0.198) TimeFormat 674ns × (0.99,1.01) 698ns × (0.98,1.03) +3.69% (p=0.000) Change-Id: Ia0e9b50b1d75a3c0c7556184cd966305574fe07c Reviewed-on: https://go-review.googlesource.com/9706 Reviewed-by: Rick Hudson <rlh@golang.org> |
|
|
|
54af9a3ba5 |
runtime: reintroduce ``dead'' space during GC scan
Reintroduce an optimization discarded during the initial conversion from 4-bit heap bitmaps to 2-bit heap bitmaps: when we reach the place in the bitmap where there are no more pointers, mark that position for the GC so that it can avoid scanning past that place. During heapBitsSetType we can also avoid initializing heap bitmap beyond that location, which gives a bit of a win compared to Go 1.4. This particular optimization (not initializing the heap bitmap) may not last: we might change typedmemmove to use the heap bitmap, in which case it would all need to be initialized. The early stop in the GC scan will stay no matter what. Compared to Go 1.4 (github.com/rsc/go, branch go14bench): name old mean new mean delta SetTypeNode64 80.7ns × (1.00,1.01) 57.4ns × (1.00,1.01) -28.83% (p=0.000) SetTypeNode64Dead 80.5ns × (1.00,1.01) 13.1ns × (0.99,1.02) -83.77% (p=0.000) SetTypeNode64Slice 2.16µs × (1.00,1.01) 1.54µs × (1.00,1.01) -28.75% (p=0.000) SetTypeNode64DeadSlice 2.16µs × (1.00,1.01) 1.52µs × (1.00,1.00) -29.74% (p=0.000) Compared to previous CL: name old mean new mean delta SetTypeNode64 56.7ns × (1.00,1.00) 57.4ns × (1.00,1.01) +1.19% (p=0.000) SetTypeNode64Dead 57.2ns × (1.00,1.00) 13.1ns × (0.99,1.02) -77.15% (p=0.000) SetTypeNode64Slice 1.56µs × (1.00,1.01) 1.54µs × (1.00,1.01) -0.89% (p=0.000) SetTypeNode64DeadSlice 1.55µs × (1.00,1.01) 1.52µs × (1.00,1.00) -2.23% (p=0.000) This is the last CL in the sequence converting from the 4-bit heap to the 2-bit heap, with all the same optimizations reenabled. Compared to before that process began (compared to CL 9701 patch set 1): name old mean new mean delta BinaryTree17 5.87s × (0.94,1.09) 5.91s × (0.96,1.06) ~ (p=0.578) Fannkuch11 4.32s × (1.00,1.00) 4.32s × (1.00,1.00) ~ (p=0.474) FmtFprintfEmpty 89.1ns × (0.95,1.16) 89.0ns × (0.93,1.10) ~ (p=0.942) FmtFprintfString 283ns × (0.98,1.02) 298ns × (0.98,1.06) +5.33% (p=0.000) FmtFprintfInt 284ns × (0.98,1.04) 286ns × (0.98,1.03) ~ (p=0.208) FmtFprintfIntInt 486ns × (0.98,1.03) 498ns × (0.97,1.06) +2.48% (p=0.000) FmtFprintfPrefixedInt 400ns × (0.99,1.02) 408ns × (0.98,1.02) +2.23% (p=0.000) FmtFprintfFloat 566ns × (0.99,1.01) 587ns × (0.98,1.01) +3.69% (p=0.000) FmtManyArgs 1.91µs × (0.99,1.02) 1.94µs × (0.99,1.02) +1.81% (p=0.000) GobDecode 15.5ms × (0.98,1.05) 15.8ms × (0.98,1.03) +1.94% (p=0.002) GobEncode 11.9ms × (0.97,1.03) 12.0ms × (0.96,1.09) ~ (p=0.263) Gzip 648ms × (0.99,1.01) 648ms × (0.99,1.01) ~ (p=0.992) Gunzip 143ms × (1.00,1.00) 143ms × (1.00,1.01) ~ (p=0.585) HTTPClientServer 89.2µs × (0.99,1.02) 90.3µs × (0.98,1.01) +1.24% (p=0.000) JSONEncode 32.3ms × (0.97,1.06) 31.6ms × (0.99,1.01) -2.29% (p=0.000) JSONDecode 106ms × (0.99,1.01) 107ms × (1.00,1.01) +0.62% (p=0.000) Mandelbrot200 6.02ms × (1.00,1.00) 6.03ms × (1.00,1.01) ~ (p=0.250) GoParse 6.57ms × (0.97,1.06) 6.53ms × (0.99,1.03) ~ (p=0.243) RegexpMatchEasy0_32 162ns × (1.00,1.00) 161ns × (1.00,1.01) -0.80% (p=0.000) RegexpMatchEasy0_1K 561ns × (0.99,1.02) 541ns × (0.99,1.01) -3.67% (p=0.000) RegexpMatchEasy1_32 145ns × (0.95,1.04) 138ns × (1.00,1.00) -5.04% (p=0.000) RegexpMatchEasy1_1K 864ns × (0.99,1.04) 887ns × (0.99,1.01) +2.57% (p=0.000) RegexpMatchMedium_32 255ns × (0.99,1.04) 253ns × (0.99,1.01) -1.05% (p=0.012) RegexpMatchMedium_1K 73.9µs × (0.98,1.04) 72.8µs × (1.00,1.00) -1.51% (p=0.005) RegexpMatchHard_32 3.92µs × (0.98,1.04) 3.85µs × (1.00,1.01) -1.88% (p=0.002) RegexpMatchHard_1K 120µs × (0.98,1.04) 117µs × (1.00,1.01) -2.02% (p=0.001) Revcomp 936ms × (0.95,1.08) 922ms × (0.97,1.08) ~ (p=0.234) Template 
130ms × (0.98,1.04) 126ms × (0.99,1.01) -2.99% (p=0.000) TimeParse 638ns × (0.98,1.05) 628ns × (0.99,1.01) -1.54% (p=0.004) TimeFormat 674ns × (0.99,1.01) 668ns × (0.99,1.01) -0.80% (p=0.001) The slowdown of the first few benchmarks seems to be due to the new atomic operations for certain small size allocations. But the larger benchmarks mostly improve, probably due to the decreased memory pressure from having half as much heap bitmap. CL 9706, which removes the (never used anymore) wbshadow mode, gets back what is lost in the early microbenchmarks. Change-Id: I37423a209e8ec2a2e92538b45cac5422a6acd32d Reviewed-on: https://go-review.googlesource.com/9705 Reviewed-by: Rick Hudson <rlh@golang.org> |
|
|
|
feb8a3b616 |
runtime: optimize heapBitsSetType
For the conversion of the heap bitmap from 4-bit to 2-bit fields, I replaced heapBitsSetType with the dumbest thing that could possibly work: two atomic operations (atomicand8+atomicor8) per 2-bit field. This CL replaces that code with a proper implementation that avoids the atomics whenever possible. Benchmarks vs base CL (before the conversion to 2-bit heap bitmap) and vs Go 1.4 below. Compared to Go 1.4, SetTypePtr (a 1-pointer allocation) is 10ns slower because a race against the concurrent GC requires the use of an atomicor8 that used to be an ordinary write. This slowdown was present even in the base CL. Compared to both Go 1.4 and base, SetTypeNode8 (a 10-word allocation) is 10ns slower because it too needs a new atomic, because with the denser representation, the byte on the end of the allocation is now shared with the object next to it; this was not true with the 4-bit representation. Excluding these two (fundamental) slowdowns due to the use of atomics, the new code is noticeably faster than both Go 1.4 and the base CL. The next CL will reintroduce the ``typeDead'' optimization. Stats are from 5 runs on a MacBookPro10,2 (late 2012 Core i5). Compared to base CL (** = new atomic) name old mean new mean delta SetTypePtr 14.1ns × (0.99,1.02) 14.7ns × (0.93,1.10) ~ (p=0.175) SetTypePtr8 18.4ns × (1.00,1.01) 18.6ns × (0.81,1.21) ~ (p=0.866) SetTypePtr16 28.7ns × (1.00,1.00) 22.4ns × (0.90,1.27) -21.88% (p=0.015) SetTypePtr32 52.3ns × (1.00,1.00) 33.8ns × (0.93,1.24) -35.37% (p=0.001) SetTypePtr64 79.2ns × (1.00,1.00) 55.1ns × (1.00,1.01) -30.43% (p=0.000) SetTypePtr126 118ns × (1.00,1.00) 100ns × (1.00,1.00) -15.97% (p=0.000) SetTypePtr128 130ns × (0.92,1.19) 98ns × (1.00,1.00) -24.36% (p=0.008) SetTypePtrSlice 726ns × (0.96,1.08) 760ns × (1.00,1.00) ~ (p=0.152) SetTypeNode1 14.1ns × (0.94,1.15) 12.0ns × (1.00,1.01) -14.60% (p=0.020) SetTypeNode1Slice 135ns × (0.96,1.07) 88ns × (1.00,1.00) -34.53% (p=0.000) SetTypeNode8 20.9ns × (1.00,1.01) 32.6ns × (1.00,1.00) +55.37% (p=0.000) ** SetTypeNode8Slice 414ns × (0.99,1.02) 244ns × (1.00,1.00) -41.09% (p=0.000) SetTypeNode64 80.0ns × (1.00,1.00) 57.4ns × (1.00,1.00) -28.23% (p=0.000) SetTypeNode64Slice 2.15µs × (1.00,1.01) 1.56µs × (1.00,1.00) -27.43% (p=0.000) SetTypeNode124 119ns × (0.99,1.00) 100ns × (1.00,1.00) -16.11% (p=0.000) SetTypeNode124Slice 3.40µs × (1.00,1.00) 2.93µs × (1.00,1.00) -13.80% (p=0.000) SetTypeNode126 120ns × (1.00,1.01) 98ns × (1.00,1.00) -18.19% (p=0.000) SetTypeNode126Slice 3.53µs × (0.98,1.08) 3.02µs × (1.00,1.00) -14.49% (p=0.002) SetTypeNode1024 726ns × (0.97,1.09) 740ns × (1.00,1.00) ~ (p=0.451) SetTypeNode1024Slice 24.9µs × (0.89,1.37) 23.1µs × (1.00,1.00) ~ (p=0.476) Compared to Go 1.4 (** = new atomic) name old mean new mean delta SetTypePtr 5.71ns × (0.89,1.19) 14.68ns × (0.93,1.10) +157.24% (p=0.000) ** SetTypePtr8 19.3ns × (0.96,1.10) 18.6ns × (0.81,1.21) ~ (p=0.638) SetTypePtr16 30.7ns × (0.99,1.03) 22.4ns × (0.90,1.27) -26.88% (p=0.005) SetTypePtr32 51.5ns × (1.00,1.00) 33.8ns × (0.93,1.24) -34.40% (p=0.001) SetTypePtr64 83.6ns × (0.94,1.12) 55.1ns × (1.00,1.01) -34.12% (p=0.001) SetTypePtr126 137ns × (0.87,1.26) 100ns × (1.00,1.00) -27.10% (p=0.028) SetTypePtrSlice 865ns × (0.80,1.23) 760ns × (1.00,1.00) ~ (p=0.243) SetTypeNode1 15.2ns × (0.88,1.12) 12.0ns × (1.00,1.01) -20.89% (p=0.014) SetTypeNode1Slice 156ns × (0.93,1.16) 88ns × (1.00,1.00) -43.57% (p=0.001) SetTypeNode8 23.8ns × (0.90,1.18) 32.6ns × (1.00,1.00) +36.76% (p=0.003) ** SetTypeNode8Slice 502ns × (0.92,1.10) 244ns × (1.00,1.00) -51.46% 
(p=0.000) SetTypeNode64 85.6ns × (0.94,1.11) 57.4ns × (1.00,1.00) -32.89% (p=0.001) SetTypeNode64Slice 2.36µs × (0.91,1.14) 1.56µs × (1.00,1.00) -33.96% (p=0.002) SetTypeNode124 130ns × (0.91,1.12) 100ns × (1.00,1.00) -23.49% (p=0.004) SetTypeNode124Slice 3.81µs × (0.90,1.22) 2.93µs × (1.00,1.00) -23.09% (p=0.025) There are fewer benchmarks vs Go 1.4 because unrolling directly into the heap bitmap is not yet implemented, so those would not be meaningful comparisons. These benchmarks were not present in Go 1.4 as distributed. The backport to Go 1.4 is in github.com/rsc/go's go14bench branch, commit 71d5ee5. Change-Id: I95ed05a22bf484b0fc9efad549279e766c98d2b6 Reviewed-on: https://go-review.googlesource.com/9704 Reviewed-by: Rick Hudson <rlh@golang.org> |
|
|
|
0234dfd493 |
runtime: use 2-bit heap bitmap (in place of 4-bit)
Previous CLs changed the representation of the non-heap type bitmaps to be 1-bit bitmaps (pointer or not). Before this CL, the heap bitmap stored a 2-bit type for each word and a mark bit and checkmark bit for the first word of the object. (There used to be additional per-word bits.) Reduce heap bitmap to 2-bit, with 1 dedicated to pointer or not, and the other used for mark, checkmark, and "keep scanning forward to find pointers in this object." See comments for details. This CL replaces heapBitsSetType with very slow but obviously correct code. A followup CL will optimize it. (Spoiler: the new code is faster than Go 1.4 was.) Change-Id: I999577a133f3cfecacebdec9cdc3573c235c7fb9 Reviewed-on: https://go-review.googlesource.com/9703 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
6d8a147bef |
runtime: use 1-bit pointer bitmaps in type representation
The type information in reflect.Type and the GC programs is now 1 bit per word, down from 2 bits. The in-memory unrolled type bitmap representation is now 1 bit per word, down from 4 bits. The conversion from the unrolled (now 1-bit) bitmap to the heap bitmap (still 4-bit) is not optimized. A followup CL will work on that, after the heap bitmap has been converted to 2-bit. The typeDead optimization, in which a special value denotes that there are no more pointers anywhere in the object, is lost in this CL. A followup CL will bring it back in the final form of heapBitsSetType. Change-Id: If61e67950c16a293b0b516a6fd9a1c755b6d5549 Reviewed-on: https://go-review.googlesource.com/9702 Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
7d9e16abc6 |
runtime: add benchmark of heapBitsSetType
There was an old benchmark that measured this indirectly via allocation, but I don't understand how to factor out the allocation cost when interpreting the numbers. Replace with a benchmark that only calls heapBitsSetType, that does not allocate. This was not possible when the benchmark was first written, because heapBitsSetType had not been factored out of mallocgc. Change-Id: I30f0f02362efab3465a50769398be859832e6640 Reviewed-on: https://go-review.googlesource.com/9701 Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
db6f88a84b |
runtime: enable profiling on g0
Since we now have stack information for code running on the systemstack, we can traceback over it. To make cpu profiles useful, add a case in gentraceback to jump over systemstack switches. Fixes #10609. Change-Id: I21f47fcc802c07c5d4a1ada56374314e388a6dc7 Reviewed-on: https://go-review.googlesource.com/9506 Reviewed-by: Dmitry Vyukov <dvyukov@google.com> |
|
|
|
fd392ee52b |
cmd/internal/ld: generate correct .debug_frames on RISC architectures
With this patch, gdb seems to be able to correctly backtrace Go
processes on at least linux/{arm,arm64,ppc64}.
Change-Id: Ic40a2a70e71a19c4a92e4655710f38a807b67e9a
Reviewed-on: https://go-review.googlesource.com/9822
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
|
|
|
|
0211d7d7b0 |
runtime: turn off checkmark by default
Change-Id: Ic8cb8b1ed8715d6d5a53ec3cac385c0e93883514 Reviewed-on: https://go-review.googlesource.com/9825 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
9626561030 |
runtime: fix gccheckmark mode and enable by default
It was testing the mark bits on what roots pointed at, but not the remainder of the live heap, because in CL 2991 I accidentally inverted this check during refactoring. The next CL will turn it back off by default again, but I want one run on the builders with the full checkmark checks. Change-Id: Ic166458cea25c0a56e5387fc527cb166ff2e5ada Reviewed-on: https://go-review.googlesource.com/9824 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
b6e178ed7e |
runtime: set heap minimum default based on GOGC
Currently the heap minimum is set to 4MB, which prevents collecting at every allocation by setting GOGC=0. This adjusts the heap minimum to 4MB*GOGC/100, thus re-enabling collection at every allocation. Fixes #10681. Change-Id: I912d027dac4b14ae535597e8beefa9ac3fb8ad94 Reviewed-on: https://go-review.googlesource.com/9814 Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Russ Cox <rsc@golang.org> |
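The new sizing rule is a one-liner. A sketch of the scaling (the function is ours; the runtime applies it to its internal heap minimum): GOGC=0 gives a zero floor, so every allocation can trigger a collection again, while GOGC=100 keeps the old 4MB floor.

```go
// heapMinimum returns the smallest heap size at which GC triggers,
// assuming gogc is the non-negative GOGC percentage.
func heapMinimum(gogc int) uint64 {
	const base = 4 << 20 // the former fixed 4MB minimum
	return base * uint64(gogc) / 100
}
```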
|
|
|
fa896733b5 |
runtime: check consistency of all module data objects
Current code just checks the consistency (that the functab is correctly sorted by PC, etc) of the moduledata object that the runtime belongs to. Change to check all of them. Change-Id: I544a44c5de7445fff87d3cdb4840ff04c5e2bf75 Reviewed-on: https://go-review.googlesource.com/9773 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> |
|
|
|
a52dc9fcbd |
runtime: fix comments that mention g status values
Makes searching in source code easier. Change-Id: Ie2e85934d23920ac0bc01d28168bcfbbdc465580 Reviewed-on: https://go-review.googlesource.com/9774 Reviewed-by: Daniel Morsing <daniel.morsing@gmail.com> Reviewed-by: Minux Ma <minux@golang.org> |
|
|
|
17db6e0420 |
runtime: use heap scan size as estimate of GC scan work
Currently, the GC uses a moving average of recent scan work ratios to estimate the total scan work required by this cycle. This is in turn used to compute how much scan work should be done by mutators when they allocate in order to perform all expected scan work by the time the allocated heap reaches the heap goal. However, our current scan work estimate can be arbitrarily wrong if the heap topography changes significantly from one cycle to the next. For example, in the go1 benchmarks, at the beginning of each benchmark, the heap is dominated by a 256MB no-scan object, so the GC learns that the scan density of the heap is very low. In benchmarks that then rapidly allocate pointer-dense objects, by the time of the next GC cycle, our estimate of the scan work can be too low by a large factor. This in turn lets the mutator allocate faster than the GC can collect, allowing it to get arbitrarily far ahead of the scan work estimate, which leads to very long GC cycles with very little mutator assist that can overshoot the heap goal by large margins. This is particularly easy to demonstrate with BinaryTree17: $ GODEBUG=gctrace=1 ./go1.test -test.bench BinaryTree17 gc #1 @0.017s 2%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 4->262->262 MB, 4 MB goal, 1 P gc #2 @0.026s 3%: 0+0+0+0+0 ms clock, 0+0+0+0/0/0+0 ms cpu, 262->262->262 MB, 524 MB goal, 1 P testing: warning: no tests to run PASS BenchmarkBinaryTree17 gc #3 @1.906s 0%: 0+0+0+0+7 ms clock, 0+0+0+0/0/0+7 ms cpu, 325->325->287 MB, 325 MB goal, 1 P (forced) gc #4 @12.203s 20%: 0+0+0+10067+10 ms clock, 0+0+0+0/2523/852+10 ms cpu, 430->2092->1950 MB, 574 MB goal, 1 P 1 9150447353 ns/op Change this estimate to instead use the *current* scannable heap size. This has the advantage of being based solely on the current state of the heap, not on past densities or reachable heap sizes, so it isn't susceptible to falling behind during these sorts of phase changes. This is strictly an over-estimate, but it's better to over-estimate and get more assist than necessary than it is to under-estimate and potentially spiral out of control. Experiments with scaling this estimate back showed no obvious benefit for mutator utilization, heap size, or assist time. This new estimate has little effect for most benchmarks, including most go1 benchmarks, x/benchmarks, and the 6g benchmark. 
It has a huge effect for benchmarks that triggered the bad pacer behavior: name old mean new mean delta BinaryTree17 10.0s × (1.00,1.00) 3.5s × (0.98,1.01) -64.93% (p=0.000) Fannkuch11 2.74s × (1.00,1.01) 2.65s × (1.00,1.00) -3.52% (p=0.000) FmtFprintfEmpty 56.4ns × (0.99,1.00) 57.8ns × (1.00,1.01) +2.43% (p=0.000) FmtFprintfString 187ns × (0.99,1.00) 185ns × (0.99,1.01) -1.19% (p=0.010) FmtFprintfInt 184ns × (1.00,1.00) 183ns × (1.00,1.00) (no variance) FmtFprintfIntInt 321ns × (1.00,1.00) 315ns × (1.00,1.00) -1.80% (p=0.000) FmtFprintfPrefixedInt 266ns × (1.00,1.00) 263ns × (1.00,1.00) -1.22% (p=0.000) FmtFprintfFloat 353ns × (1.00,1.00) 353ns × (1.00,1.00) -0.13% (p=0.035) FmtManyArgs 1.21µs × (1.00,1.00) 1.19µs × (1.00,1.00) -1.33% (p=0.000) GobDecode 9.69ms × (1.00,1.00) 9.59ms × (1.00,1.00) -1.07% (p=0.000) GobEncode 7.89ms × (0.99,1.01) 7.74ms × (1.00,1.00) -1.92% (p=0.000) Gzip 391ms × (1.00,1.00) 392ms × (1.00,1.00) ~ (p=0.522) Gunzip 97.1ms × (1.00,1.00) 97.0ms × (1.00,1.00) -0.10% (p=0.000) HTTPClientServer 55.7µs × (0.99,1.01) 56.7µs × (0.99,1.01) +1.81% (p=0.001) JSONEncode 19.1ms × (1.00,1.00) 19.0ms × (1.00,1.00) -0.85% (p=0.000) JSONDecode 66.8ms × (1.00,1.00) 66.9ms × (1.00,1.00) ~ (p=0.288) Mandelbrot200 4.13ms × (1.00,1.00) 4.12ms × (1.00,1.00) -0.08% (p=0.000) GoParse 3.97ms × (1.00,1.01) 4.01ms × (1.00,1.00) +0.99% (p=0.000) RegexpMatchEasy0_32 114ns × (1.00,1.00) 115ns × (0.99,1.00) ~ (p=0.070) RegexpMatchEasy0_1K 376ns × (1.00,1.00) 376ns × (1.00,1.00) ~ (p=0.900) RegexpMatchEasy1_32 94.9ns × (1.00,1.00) 96.3ns × (1.00,1.01) +1.53% (p=0.001) RegexpMatchEasy1_1K 568ns × (1.00,1.00) 567ns × (1.00,1.00) -0.22% (p=0.001) RegexpMatchMedium_32 159ns × (1.00,1.00) 159ns × (1.00,1.00) ~ (p=0.178) RegexpMatchMedium_1K 46.4µs × (1.00,1.00) 46.6µs × (1.00,1.00) +0.29% (p=0.000) RegexpMatchHard_32 2.37µs × (1.00,1.00) 2.37µs × (1.00,1.00) ~ (p=0.722) RegexpMatchHard_1K 71.1µs × (1.00,1.00) 71.2µs × (1.00,1.00) ~ (p=0.229) Revcomp 565ms × (1.00,1.00) 562ms × (1.00,1.00) -0.52% (p=0.000) Template 81.0ms × (1.00,1.00) 80.2ms × (1.00,1.00) -0.97% (p=0.000) TimeParse 380ns × (1.00,1.00) 380ns × (1.00,1.00) ~ (p=0.148) TimeFormat 405ns × (0.99,1.00) 385ns × (0.99,1.00) -5.00% (p=0.000) Change-Id: I11274158bf3affaf62662e02de7af12d5fb789e4 Reviewed-on: https://go-review.googlesource.com/9696 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com> |
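In pacing terms the change above is small: the expected scan work for the cycle is taken to be the scannable heap right now, rather than an extrapolation from past cycles, and that estimate feeds the usual assist computation. A hedged sketch with illustrative names (the real controller carries more state and guards):

```go
// expectedScanWork is the new estimate: simply the scannable bytes
// currently in the live heap, a deliberate over-estimate.
func expectedScanWork(heapScan uint64) uint64 {
	return heapScan
}

// assistWorkPerByte is how much scan work each allocated byte must pay
// for so that the expected work finishes before the heap reaches its goal.
func assistWorkPerByte(scanWorkExpected, heapGoal, heapLive uint64) float64 {
	if heapLive >= heapGoal {
		return float64(scanWorkExpected) // already at the goal: maximum assist
	}
	return float64(scanWorkExpected) / float64(heapGoal-heapLive)
}
```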
|
|
|
3be3cbd548 |
runtime: track "scannable" bytes of heap
This tracks the number of scannable bytes in the allocated heap. That is, bytes that the garbage collector must scan before reaching the last pointer field in each object. This will be used to compute a more robust estimate of the GC scan work. Change-Id: I1eecd45ef9cdd65b69d2afb5db5da885c80086bb Reviewed-on: https://go-review.googlesource.com/9695 Reviewed-by: Russ Cox <rsc@golang.org> |
|
|
|
53c53984e7 |
runtime: include scalar slots in GC scan work metric
The garbage collector predicts how much "scan work" must be done in a cycle to determine how much work should be done by mutators when they allocate. Most code doesn't care what units the scan work is in: it simply knows that a certain amount of scan work has to be done in the cycle. Currently, the GC uses the number of pointer slots scanned as the scan work on the theory that this is the bulk of the time spent in the garbage collector and hence reflects real CPU resource usage. However, this metric is difficult to estimate at the beginning of a cycle. Switch to counting the total number of bytes scanned, including both pointer and scalar slots. This is still less than the total marked heap since it omits no-scan objects and no-scan tails of objects. This metric may not reflect absolute performance as well as the count of scanned pointer slots (though it still takes time to scan scalar fields), but it will be much easier to estimate robustly, which is more important. Change-Id: Ie3a5eeeb0384a1ca566f61b2f11e9ff3a75ca121 Reviewed-on: https://go-review.googlesource.com/9694 Reviewed-by: Russ Cox <rsc@golang.org> |
|
|
|
c4931a8433 |
runtime: dispose gcWork caches before updating controller state
Currently, we only flush the per-P gcWork caches in gcMark, at the
beginning of mark termination. This is necessary to ensure that no
work is held up in these caches.
However, this flush happens after we update the GC controller state,
which depends on statistics about marked heap size and scan work that
are only updated by this flush. Hence, the controller is missing the
bulk of heap marking and scan work. This bug was introduced in commit
|
|
|
|
1845314560 |
runtime: remove unused GC timers
During development some tracing routines were added that are not needed in the release. These included GCstarttimes, GCendtimes, and GCprinttimes. Fixes #10462 Change-Id: I0788e6409d61038571a5ae0cbbab793102df0a65 Reviewed-on: https://go-review.googlesource.com/9689 Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
fe5ef5c9d7 |
runtime, syscall: link Solaris binaries directly instead of using dlopen/dlsym
Before CL 8214 (use .plt instead of .got on Solaris) Solaris used a dynamic linking scheme that didn't permit lazy binding. To speed program startup, Go binaries only used it for a small number of symbols required by the runtime. Other symbols were resolved on demand on first use, and were cached for subsequent use. This required some moderately complex code in the syscall package. CL 8214 changed the way dynamic linking is implemented, and now lazy binding is supported. As now all symbols are resolved lazily by the dynamic loader, there is no need for the complex code in the syscall package that did the same. This CL makes Go programs link directly with the necessary shared libraries and deletes the lazy-loading code implemented in Go. Change-Id: Ifd7275db72de61b70647242e7056dd303b1aee9e Reviewed-on: https://go-review.googlesource.com/9184 Reviewed-by: Minux Ma <minux@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> |
|
|
|
121489cbfd |
runtime/cgo: add cgo support for solaris/amd64
Change-Id: Ic9744c7716cdd53f27c6e5874230963e5fff0333 Reviewed-on: https://go-review.googlesource.com/8260 Reviewed-by: Minux Ma <minux@golang.org> |
|
|
|
c94f1f791b |
runtime: always load address of libcFunc on Solaris
The linker always uses .plt for externals, so libcFunc is now an actual external symbol instead of a pointer to one. Fixes most of the breakage introduced in the previous CL. Change-Id: I64b8c96f93127f2d13b5289b024677fd3ea7dbea Reviewed-on: https://go-review.googlesource.com/8215 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Minux Ma <minux@golang.org> |
|
|
|
ceefebd795 |
runtime: rename ptrsize to ptrdata
I forgot there is already a ptrSize constant. Rename field to avoid some confusion. Change-Id: I098fdcc8afc947d6c02c41c6e6de24624cc1c8ff Reviewed-on: https://go-review.googlesource.com/9700 Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
5a828cfcde |
runtime: let freezetheworld work even when gomaxprocs=1
Freezetheworld still has stuff to do when gomaxprocs=1. In particular, signals can come in on other Ms (like the GC M, say) and the single user M is still running. Fixes #10546 Change-Id: I2f07f17d1c81e93cf905df2cb087112d436ca7e7 Reviewed-on: https://go-review.googlesource.com/9551 Reviewed-by: Dmitry Vyukov <dvyukov@google.com> |
|
|
|
102436e800 |
runtime: fix software FP regs corruption when emulating SQRT on ARM
When emulating the ARM FSQRT instruction, the sqrt function itself must not use any floating-point arithmetic, otherwise it will clobber the user's software FP registers. Fortunately, the sqrt function only uses floating-point instructions to test for corner cases, so it's easy to make that function do all its work using pure integer arithmetic. I've verified that after this change, runtime.stepflt and runtime.sqrt don't contain any call to _sfloat. (Perhaps we should add //go:nosfloat to make the compiler enforce this?) Fixes #10641. Change-Id: Ida4742c49000fae4fea4649f28afde630ce4c576 Signed-off-by: Shenghou Ma <minux@golang.org> Reviewed-on: https://go-review.googlesource.com/9570 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Keith Randall <khr@golang.org> |
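The corner-case handling referred to above can be done entirely on the IEEE-754 bit pattern, which is the trick that keeps floating-point instructions out of the function. A sketch (not the runtime's code) of classifying the special cases from the raw bits:

```go
package softsqrt

import "math"

// sqrtSpecialCase reports whether the float64 with the given bits is a
// special case for sqrt, and if so what the result bits are, using only
// integer operations on the bit pattern.
func sqrtSpecialCase(bits uint64) (handled bool, result uint64) {
	const (
		signMask = 0x8000000000000000
		expMask  = 0x7ff0000000000000
		fracMask = 0x000fffffffffffff
	)
	switch {
	case bits&^signMask == 0: // ±0: sqrt(±0) = ±0
		return true, bits
	case bits&expMask == expMask && bits&fracMask != 0: // NaN in, NaN out
		return true, bits
	case bits&signMask != 0: // negative and non-zero: result is NaN
		return true, math.Float64bits(math.NaN())
	case bits == expMask: // +Inf: sqrt(+Inf) = +Inf
		return true, bits
	}
	return false, 0
}
```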
|
|
|
98a9d36837 |
runtime: add pointer size to type structure
This adds a field to the runtime type structure that records the size of the prefix of objects of that type containing pointers. Any data after this offset is scalar data. This is necessary for shrinking the type bitmaps to 1 bit and will help the garbage collector efficiently estimate the amount of heap that needs to be scanned. Change-Id: I1318d79e6360dca0ac980245016c562e61f52ff5 Reviewed-on: https://go-review.googlesource.com/9691 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> |
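The new field is easy to picture with a toy type descriptor; the struct below is illustrative, not the runtime's _type:

```go
package typeinfo

// typeDesc is an illustrative type descriptor: ptrdata is the size of
// the prefix of a value that can contain pointers; bytes past that
// offset are guaranteed to be scalar data.
type typeDesc struct {
	size    uintptr // total size of a value of this type
	ptrdata uintptr // pointer-containing prefix length (0 = no pointers)
}

// bytesToScan is how much of an object of type t a garbage collector
// needs to examine when looking for pointers.
func bytesToScan(t *typeDesc) uintptr {
	return t.ptrdata
}
```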
|
|
|
b86e71f5aa |
runtime: Reduce calls to shouldtriggergc
shouldtriggergc is slightly expensive due to the call overhead and the use of an atomic. This CL reduces the number of times one checks whether a GC should be done from once at each allocation to once when a span is allocated. Since shouldtriggergc is an important abstraction, simply hand-inlining it along with its atomic instruction would lose the abstraction. Change-Id: Ia3210655b4b3d433f77064a21ecb54e4d9d435f7 Reviewed-on: https://go-review.googlesource.com/9403 Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
031c3bc9ae |
runtime: fix stackDebug comment
Change-Id: Ia9191bd7ecdf7bd5ee7d69ae23aa71760f379aa8 Reviewed-on: https://go-review.googlesource.com/9590 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
|
|
|
dc870d5f4b |
runtime: detailed debug output of controller state
This adds a detailed debug dump of the state of the GC controller and a GODEBUG flag to enable it. Change-Id: I562fed7981691a84ddf0f9e6fcd9f089f497ac13 Reviewed-on: https://go-review.googlesource.com/9640 Reviewed-by: Russ Cox <rsc@golang.org> |
|
|
|
4fffc50c26 |
runtime: correct accounting of scan work and bytes marked
(1) Count pointer-free objects found during scanning roots as marked bytes, by not zeroing the mark total after scanning roots. (2) Don't count the bytes for the roots themselves, by not adding them to the mark total in scanblock (the zeroing removed by (1) was aimed at that add but hitting more). Combined, (1) and (2) fix the calculation of the marked heap size. This makes the GC trigger much less often in the Go 1 benchmarks, which have a global []byte pointing at 256 MB of data. That 256 MB allocation was not being included in the heap size in the current code, but was included in Go 1.4. This is the source of much of the relative slowdown in that directory. (3) Count the bytes for the roots as scanned work, by not zeroing the scan total after scanning roots. There is no strict justification for this, and it probably doesn't matter much either way, but it was always combined with another buggy zeroing (removed in (1)), so guilty by association. Austin noticed this. name old mean new mean delta BenchmarkBinaryTree17 13.1s × (0.97,1.03) 5.9s × (0.97,1.05) -55.19% (p=0.000) BenchmarkFannkuch11 4.35s × (0.99,1.01) 4.37s × (1.00,1.01) +0.47% (p=0.032) BenchmarkFmtFprintfEmpty 84.6ns × (0.95,1.14) 85.7ns × (0.94,1.05) ~ (p=0.521) BenchmarkFmtFprintfString 320ns × (0.95,1.06) 283ns × (0.99,1.02) -11.48% (p=0.000) BenchmarkFmtFprintfInt 311ns × (0.98,1.03) 288ns × (0.99,1.02) -7.26% (p=0.000) BenchmarkFmtFprintfIntInt 554ns × (0.96,1.05) 478ns × (0.99,1.02) -13.70% (p=0.000) BenchmarkFmtFprintfPrefixedInt 434ns × (0.96,1.06) 393ns × (0.98,1.04) -9.60% (p=0.000) BenchmarkFmtFprintfFloat 620ns × (0.99,1.03) 584ns × (0.99,1.01) -5.73% (p=0.000) BenchmarkFmtManyArgs 2.19µs × (0.98,1.03) 1.94µs × (0.99,1.01) -11.62% (p=0.000) BenchmarkGobDecode 21.2ms × (0.97,1.06) 15.2ms × (0.99,1.01) -28.17% (p=0.000) BenchmarkGobEncode 18.1ms × (0.94,1.06) 11.8ms × (0.99,1.01) -35.00% (p=0.000) BenchmarkGzip 650ms × (0.98,1.01) 649ms × (0.99,1.02) ~ (p=0.802) BenchmarkGunzip 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.438) BenchmarkHTTPClientServer 110µs × (0.98,1.04) 101µs × (0.98,1.02) -8.79% (p=0.000) BenchmarkJSONEncode 40.3ms × (0.97,1.03) 31.8ms × (0.98,1.03) -20.92% (p=0.000) BenchmarkJSONDecode 119ms × (0.97,1.02) 108ms × (0.99,1.02) -9.15% (p=0.000) BenchmarkMandelbrot200 6.03ms × (1.00,1.01) 6.03ms × (0.99,1.01) ~ (p=0.750) BenchmarkGoParse 8.58ms × (0.89,1.10) 6.80ms × (1.00,1.00) -20.71% (p=0.000) BenchmarkRegexpMatchEasy0_32 162ns × (1.00,1.01) 162ns × (0.99,1.02) ~ (p=0.131) BenchmarkRegexpMatchEasy0_1K 540ns × (0.99,1.02) 559ns × (0.99,1.02) +3.58% (p=0.000) BenchmarkRegexpMatchEasy1_32 139ns × (0.98,1.04) 139ns × (1.00,1.00) ~ (p=0.466) BenchmarkRegexpMatchEasy1_1K 889ns × (0.99,1.01) 885ns × (0.99,1.01) -0.50% (p=0.022) BenchmarkRegexpMatchMedium_32 252ns × (0.99,1.02) 252ns × (0.99,1.01) ~ (p=0.469) BenchmarkRegexpMatchMedium_1K 72.9µs × (0.99,1.01) 73.6µs × (0.99,1.03) ~ (p=0.168) BenchmarkRegexpMatchHard_32 3.87µs × (1.00,1.01) 3.86µs × (1.00,1.00) ~ (p=0.055) BenchmarkRegexpMatchHard_1K 118µs × (0.99,1.01) 117µs × (0.99,1.00) ~ (p=0.133) BenchmarkRevcomp 995ms × (0.94,1.10) 949ms × (0.99,1.01) -4.64% (p=0.000) BenchmarkTemplate 141ms × (0.97,1.02) 127ms × (0.99,1.01) -10.00% (p=0.000) BenchmarkTimeParse 641ns × (0.99,1.01) 623ns × (0.99,1.01) -2.79% (p=0.000) BenchmarkTimeFormat 729ns × (0.98,1.03) 679ns × (0.99,1.00) -6.93% (p=0.000) Change-Id: I839bd7356630d18377989a0748763414e15ed057 Reviewed-on: https://go-review.googlesource.com/9602 Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
4d0f3a1c95 |
cmd/internal/gc, runtime: use 1-bit bitmap for stack frames, data, bss
The bitmaps were 2 bits per pointer because we needed to distinguish scalar, pointer, multiword, and we used the leftover value to distinguish uninitialized from scalar, even though the garbage collector (GC) didn't care. Now that there are no multiword structures from the GC's point of view, cut the bitmaps down to 1 bit per pointer, recording just live pointer vs not. The GC assumes the same layout for stack frames and for the maps describing the global data and bss sections, so change them all in one CL. The code still refers to 4-bit heap bitmaps and 2-bit "type bitmaps", since the 2-bit representation lives (at least for now) in some of the reflect data. Because these stack frame bitmaps are stored directly in the rodata in the binary, this CL reduces the size of the 6g binary by about 1.1%. Performance change is basically a wash, but using less memory, and smaller binaries, and enables other bitmap reductions. name old mean new mean delta BenchmarkBinaryTree17 13.2s × (0.97,1.03) 13.0s × (0.99,1.01) -0.93% (p=0.005) BenchmarkBinaryTree17-2 9.69s × (0.96,1.05) 9.51s × (0.96,1.03) -1.86% (p=0.001) BenchmarkBinaryTree17-4 10.1s × (0.97,1.05) 10.0s × (0.96,1.05) ~ (p=0.141) BenchmarkFannkuch11 4.35s × (0.99,1.01) 4.43s × (0.98,1.04) +1.75% (p=0.001) BenchmarkFannkuch11-2 4.31s × (0.99,1.03) 4.32s × (1.00,1.00) ~ (p=0.095) BenchmarkFannkuch11-4 4.32s × (0.99,1.02) 4.38s × (0.98,1.04) +1.38% (p=0.008) BenchmarkFmtFprintfEmpty 83.5ns × (0.97,1.10) 87.3ns × (0.92,1.11) +4.55% (p=0.014) BenchmarkFmtFprintfEmpty-2 81.8ns × (0.98,1.04) 82.5ns × (0.97,1.08) ~ (p=0.364) BenchmarkFmtFprintfEmpty-4 80.9ns × (0.99,1.01) 82.6ns × (0.97,1.08) +2.12% (p=0.010) BenchmarkFmtFprintfString 320ns × (0.95,1.04) 322ns × (0.97,1.05) ~ (p=0.368) BenchmarkFmtFprintfString-2 303ns × (0.97,1.04) 304ns × (0.97,1.04) ~ (p=0.484) BenchmarkFmtFprintfString-4 305ns × (0.97,1.05) 306ns × (0.98,1.05) ~ (p=0.543) BenchmarkFmtFprintfInt 311ns × (0.98,1.03) 319ns × (0.97,1.03) +2.63% (p=0.000) BenchmarkFmtFprintfInt-2 297ns × (0.98,1.04) 301ns × (0.97,1.04) +1.19% (p=0.023) BenchmarkFmtFprintfInt-4 302ns × (0.98,1.02) 304ns × (0.97,1.03) ~ (p=0.126) BenchmarkFmtFprintfIntInt 554ns × (0.96,1.05) 554ns × (0.97,1.03) ~ (p=0.975) BenchmarkFmtFprintfIntInt-2 520ns × (0.98,1.03) 517ns × (0.98,1.02) ~ (p=0.153) BenchmarkFmtFprintfIntInt-4 524ns × (0.98,1.02) 525ns × (0.98,1.03) ~ (p=0.597) BenchmarkFmtFprintfPrefixedInt 433ns × (0.97,1.06) 434ns × (0.97,1.06) ~ (p=0.804) BenchmarkFmtFprintfPrefixedInt-2 413ns × (0.98,1.04) 413ns × (0.98,1.03) ~ (p=0.881) BenchmarkFmtFprintfPrefixedInt-4 420ns × (0.97,1.03) 421ns × (0.97,1.03) ~ (p=0.561) BenchmarkFmtFprintfFloat 620ns × (0.99,1.03) 636ns × (0.97,1.03) +2.57% (p=0.000) BenchmarkFmtFprintfFloat-2 601ns × (0.98,1.02) 617ns × (0.98,1.03) +2.58% (p=0.000) BenchmarkFmtFprintfFloat-4 613ns × (0.98,1.03) 626ns × (0.98,1.02) +2.15% (p=0.000) BenchmarkFmtManyArgs 2.19µs × (0.96,1.04) 2.23µs × (0.97,1.02) +1.65% (p=0.000) BenchmarkFmtManyArgs-2 2.08µs × (0.98,1.03) 2.10µs × (0.99,1.02) +0.79% (p=0.019) BenchmarkFmtManyArgs-4 2.10µs × (0.98,1.02) 2.13µs × (0.98,1.02) +1.72% (p=0.000) BenchmarkGobDecode 21.3ms × (0.97,1.05) 21.1ms × (0.97,1.04) -1.36% (p=0.025) BenchmarkGobDecode-2 20.0ms × (0.97,1.03) 19.2ms × (0.97,1.03) -4.00% (p=0.000) BenchmarkGobDecode-4 19.5ms × (0.99,1.02) 19.0ms × (0.99,1.01) -2.39% (p=0.000) BenchmarkGobEncode 18.3ms × (0.95,1.07) 18.1ms × (0.96,1.08) ~ (p=0.305) BenchmarkGobEncode-2 16.8ms × (0.97,1.02) 16.4ms × (0.98,1.02) -2.79% (p=0.000) BenchmarkGobEncode-4 
15.4ms × (0.98,1.02) 15.4ms × (0.98,1.02) ~ (p=0.465) BenchmarkGzip 650ms × (0.98,1.03) 655ms × (0.97,1.04) ~ (p=0.075) BenchmarkGzip-2 652ms × (0.98,1.03) 655ms × (0.98,1.02) ~ (p=0.337) BenchmarkGzip-4 656ms × (0.98,1.04) 653ms × (0.98,1.03) ~ (p=0.291) BenchmarkGunzip 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.507) BenchmarkGunzip-2 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.313) BenchmarkGunzip-4 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.312) BenchmarkHTTPClientServer 110µs × (0.98,1.03) 109µs × (0.99,1.02) -1.40% (p=0.000) BenchmarkHTTPClientServer-2 154µs × (0.90,1.08) 149µs × (0.90,1.08) -3.43% (p=0.007) BenchmarkHTTPClientServer-4 138µs × (0.97,1.04) 138µs × (0.96,1.04) ~ (p=0.670) BenchmarkJSONEncode 40.2ms × (0.98,1.02) 40.2ms × (0.98,1.05) ~ (p=0.828) BenchmarkJSONEncode-2 35.1ms × (0.99,1.02) 35.2ms × (0.98,1.03) ~ (p=0.392) BenchmarkJSONEncode-4 35.3ms × (0.98,1.03) 35.3ms × (0.98,1.02) ~ (p=0.813) BenchmarkJSONDecode 119ms × (0.97,1.02) 117ms × (0.98,1.02) -1.80% (p=0.000) BenchmarkJSONDecode-2 115ms × (0.99,1.02) 114ms × (0.98,1.02) -1.18% (p=0.000) BenchmarkJSONDecode-4 116ms × (0.98,1.02) 114ms × (0.98,1.02) -1.43% (p=0.000) BenchmarkMandelbrot200 6.03ms × (1.00,1.01) 6.03ms × (1.00,1.01) ~ (p=0.985) BenchmarkMandelbrot200-2 6.03ms × (1.00,1.01) 6.02ms × (1.00,1.01) ~ (p=0.320) BenchmarkMandelbrot200-4 6.03ms × (1.00,1.01) 6.03ms × (1.00,1.01) ~ (p=0.799) BenchmarkGoParse 8.63ms × (0.89,1.10) 8.58ms × (0.93,1.09) ~ (p=0.667) BenchmarkGoParse-2 8.20ms × (0.97,1.04) 8.37ms × (0.97,1.04) +1.96% (p=0.001) BenchmarkGoParse-4 8.00ms × (0.98,1.02) 8.14ms × (0.99,1.02) +1.75% (p=0.000) BenchmarkRegexpMatchEasy0_32 162ns × (1.00,1.01) 164ns × (0.98,1.04) +1.35% (p=0.011) BenchmarkRegexpMatchEasy0_32-2 161ns × (1.00,1.01) 161ns × (1.00,1.00) ~ (p=0.185) BenchmarkRegexpMatchEasy0_32-4 161ns × (1.00,1.00) 161ns × (1.00,1.00) -0.19% (p=0.001) BenchmarkRegexpMatchEasy0_1K 540ns × (0.99,1.02) 566ns × (0.98,1.04) +4.98% (p=0.000) BenchmarkRegexpMatchEasy0_1K-2 540ns × (0.99,1.01) 557ns × (0.99,1.01) +3.21% (p=0.000) BenchmarkRegexpMatchEasy0_1K-4 541ns × (0.99,1.01) 559ns × (0.99,1.01) +3.26% (p=0.000) BenchmarkRegexpMatchEasy1_32 139ns × (0.98,1.04) 139ns × (0.99,1.03) ~ (p=0.979) BenchmarkRegexpMatchEasy1_32-2 139ns × (0.99,1.04) 139ns × (0.99,1.02) ~ (p=0.777) BenchmarkRegexpMatchEasy1_32-4 139ns × (0.98,1.04) 139ns × (0.99,1.04) ~ (p=0.771) BenchmarkRegexpMatchEasy1_1K 890ns × (0.99,1.03) 885ns × (1.00,1.01) -0.50% (p=0.004) BenchmarkRegexpMatchEasy1_1K-2 888ns × (0.99,1.01) 885ns × (0.99,1.01) -0.37% (p=0.004) BenchmarkRegexpMatchEasy1_1K-4 890ns × (0.99,1.02) 884ns × (1.00,1.00) -0.70% (p=0.000) BenchmarkRegexpMatchMedium_32 252ns × (0.99,1.01) 251ns × (0.99,1.01) ~ (p=0.081) BenchmarkRegexpMatchMedium_32-2 254ns × (0.99,1.04) 252ns × (0.99,1.01) -0.78% (p=0.027) BenchmarkRegexpMatchMedium_32-4 253ns × (0.99,1.04) 252ns × (0.99,1.01) -0.70% (p=0.022) BenchmarkRegexpMatchMedium_1K 72.9µs × (0.99,1.01) 72.7µs × (1.00,1.00) ~ (p=0.064) BenchmarkRegexpMatchMedium_1K-2 74.1µs × (0.98,1.05) 72.9µs × (1.00,1.01) -1.61% (p=0.001) BenchmarkRegexpMatchMedium_1K-4 73.6µs × (0.99,1.05) 72.8µs × (1.00,1.00) -1.13% (p=0.007) BenchmarkRegexpMatchHard_32 3.88µs × (0.99,1.03) 3.92µs × (0.98,1.05) ~ (p=0.143) BenchmarkRegexpMatchHard_32-2 3.89µs × (0.99,1.03) 3.93µs × (0.98,1.09) ~ (p=0.278) BenchmarkRegexpMatchHard_32-4 3.90µs × (0.99,1.05) 3.93µs × (0.98,1.05) ~ (p=0.252) BenchmarkRegexpMatchHard_1K 118µs × (0.99,1.01) 117µs × (0.99,1.02) -0.54% (p=0.003) BenchmarkRegexpMatchHard_1K-2 
118µs × (0.99,1.01) 118µs × (0.99,1.03) ~ (p=0.581) BenchmarkRegexpMatchHard_1K-4 118µs × (0.99,1.02) 117µs × (0.99,1.01) -0.54% (p=0.002) BenchmarkRevcomp 991ms × (0.95,1.10) 989ms × (0.94,1.08) ~ (p=0.879) BenchmarkRevcomp-2 978ms × (0.95,1.11) 962ms × (0.96,1.08) ~ (p=0.257) BenchmarkRevcomp-4 979ms × (0.96,1.07) 974ms × (0.96,1.11) ~ (p=0.678) BenchmarkTemplate 141ms × (0.99,1.02) 145ms × (0.99,1.02) +2.75% (p=0.000) BenchmarkTemplate-2 135ms × (0.98,1.02) 138ms × (0.99,1.02) +2.34% (p=0.000) BenchmarkTemplate-4 136ms × (0.98,1.02) 140ms × (0.99,1.02) +2.71% (p=0.000) BenchmarkTimeParse 640ns × (0.99,1.01) 622ns × (0.99,1.01) -2.88% (p=0.000) BenchmarkTimeParse-2 640ns × (0.99,1.01) 622ns × (1.00,1.00) -2.81% (p=0.000) BenchmarkTimeParse-4 640ns × (1.00,1.01) 622ns × (0.99,1.01) -2.82% (p=0.000) BenchmarkTimeFormat 730ns × (0.98,1.02) 731ns × (0.98,1.03) ~ (p=0.767) BenchmarkTimeFormat-2 709ns × (0.99,1.02) 707ns × (0.99,1.02) ~ (p=0.347) BenchmarkTimeFormat-4 717ns × (0.98,1.01) 718ns × (0.98,1.02) ~ (p=0.793) Change-Id: Ie779c47e912bf80eb918bafa13638bd8dfd6c2d9 Reviewed-on: https://go-review.googlesource.com/9406 Reviewed-by: Rick Hudson <rlh@golang.org> |
|
|
|
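A hedged sketch of what a 1-bit-per-word frame bitmap looks like to its consumer (the layout and helper name are invented for illustration): each bit now just answers "is this word a live pointer", where the old 2-bit encoding also distinguished scalar, pointer, and multiword slots.

```go
package main

import "fmt"

// isPtrWord reports whether word i of a stack frame holds a live pointer,
// reading a bitmap that spends exactly one bit per pointer-sized word.
func isPtrWord(bitmap []byte, i int) bool {
	return bitmap[i/8]>>(uint(i)%8)&1 != 0
}

func main() {
	// Hypothetical frame: word 0 = pointer, words 1..2 = scalars, word 3 = pointer.
	bitmap := []byte{0b1001}
	for i := 0; i < 4; i++ {
		fmt.Printf("word %d: pointer=%v\n", i, isPtrWord(bitmap, i))
	}
}
```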
7bebccb972 |
Revert "runtime/pprof: write heap statistics to heap profile always"
This reverts commit
|
|
|
|
a55b131393 |
cmd/dist, runtime: Make stack guard larger for non-optimized builds
Kind of a hack, but makes the non-optimized builds pass. Fixes #10079 Change-Id: I26f41c546867f8f3f16d953dc043e784768f2aff Reviewed-on: https://go-review.googlesource.com/9552 Reviewed-by: Russ Cox <rsc@golang.org> |
|
|
|
7fbb1b36c3 |
cmd/internal/gc: improve flow of input params to output params
This includes the following information in the per-function summary:
outK = paramJ encoded in outK bits for paramJ
outK = *paramJ encoded in outK bits for paramJ
heap = paramJ EscHeap
heap = *paramJ EscContentEscapes
Note that (currently) if the address of a parameter is taken and
returned, necessarily a heap allocation occurred to contain that
reference, and the heap can never refer to stack, therefore the
parameter and everything downstream from it escapes to the heap.
The per-function summary information now has a tuneable number of bits
(2 is probably noticeably better than 1, 3 is likely overkill, but it
is now easy to check and the -m debugging output includes information
that allows you to figure out if more would be better.)
A new test was added to check pointer flow through struct-typed and
*struct-typed parameters and returns; some of these are sensitive to
the number of summary bits, and ought to yield better results with a
more competent escape analysis algorithm. Another new test checks
(some) correctness with array parameters, results, and operations.
The old analysis inferred that a piece of the plan9 runtime was non-escaping by
counteracting overconservative analysis with buggy analysis; with the
bug fixed, the result was too conservative (and it's not easy to fix
in this framework) so the source code was tweaked to get the desired
result. A test was added against the discovered bug.
The escape analysis was further improved by splitting the "level" into
3 parts, one tracking the conventional "level" and the other two
computing the highest-level-suffix-from-copy, which is used to
generally model the cancelling effect of indirection applied to
address-of.
With the improved escape analysis enabled, it was necessary to
modify one of the runtime tests because it now attempts to allocate
too much on the (small, fixed-size) G0 (system) stack and this
failed the test.
Compiling src/std after touching src/runtime/*.go with -m logging
turned on shows 420 fewer heap allocation sites (10538 vs 10968).
Profiling allocations in src/html/template with
for i in {1..5} ;
do go tool 6g -memprofile=mastx.${i}.prof -memprofilerate=1 *.go;
go tool pprof -alloc_objects -text mastx.${i}.prof ;
done
showed a 15% reduction in allocations performed by the compiler.
Update #3753
Update #4720
Fixes #10466
Change-Id: I0fd97d5f5ac527b45f49e2218d158a6e89951432
Reviewed-on: https://go-review.googlesource.com/8202
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
|
|
|
|
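The opening paragraphs above note that taking and returning a parameter's address forces a heap allocation; here is a small example of the kind of code the analysis has to classify (compile with `go build -gcflags=-m` to see the escape decisions; the exact diagnostic wording varies by compiler version).

```go
package main

// leak returns the address of its parameter, so the compiler must move x to
// the heap: a stack slot cannot outlive the call, and the heap never points
// into a stack. With -m this typically reports something like "moved to heap: x".
func leak(x int) *int {
	return &x
}

// noLeak only reads through its parameter, so the pointed-to value can stay
// wherever the caller put it; nothing here needs to escape.
func noLeak(p *int) int {
	return *p + 1
}

func main() {
	p := leak(42)
	v := 7
	_ = noLeak(&v)
	_ = p
}
```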
4044adedf7 |
runtime/cgo, cmd/dist: turn off exc_bad_access handler by default
App Store policy requires that programs not reference the exc_server symbol. (Some public forum threads show that Unity ran into this several years ago and that it is a hard policy rule.) While some research suggests that I could write my own version of exc_server, the expedient course is to disable the exception handler by default. Go programs only need it when running under lldb, which is primarily used by tests. So enable the exception handler in cmd/dist when we are running the tests. Fixes #10646 Change-Id: I853905254894b5367edb8abd381d45585a78ee8b Reviewed-on: https://go-review.googlesource.com/9549 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
|
|
|
5f69e739d3 |
runtime: adjust traceTickDiv for non-x86 architectures
Fixes #10554. Fixes #10623. Change-Id: I90fbaa34e3d55c8758178f8d2e7fa41ff1194a1b Signed-off-by: Shenghou Ma <minux@golang.org> Reviewed-on: https://go-review.googlesource.com/9247 Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Dave Cheney <dave@cheney.net> |
|
|
|
79a990b845 |
runtime: schedule GC work more aggressively
Schedule the work as early as possible, while still respecting the utilization percentage on average. The old code tried never to go above the utilization percentage. The new code is willing to go above the utilization percentage by one time slice (but of course after doing that it must wait until the percentage drops back down to the target before it gets another time slice). The effect is that for concurrent GCs that can run in a small number of time slices, the time during which write barriers are enabled is reduced by one mutator + GC time slice round (possibly 30 ms per GC). This only affects the fractional GC processor (the remainder of GOMAXPROCS/4), so it matters most in GOMAXPROCS=1, a bit in GOMAXPROCS=2, and not at all in GOMAXPROCS=4. GOMAXPROCS=1 name old mean new mean delta BenchmarkBinaryTree17 12.4s × (0.98,1.03) 13.5s × (0.97,1.04) +8.84% (p=0.000) BenchmarkFannkuch11 4.38s × (1.00,1.01) 4.38s × (1.00,1.01) ~ (p=0.343) BenchmarkFmtFprintfEmpty 88.9ns × (0.97,1.10) 90.1ns × (0.93,1.14) ~ (p=0.224) BenchmarkFmtFprintfString 356ns × (0.94,1.05) 321ns × (0.94,1.12) -9.77% (p=0.000) BenchmarkFmtFprintfInt 344ns × (0.98,1.03) 325ns × (0.96,1.03) -5.46% (p=0.000) BenchmarkFmtFprintfIntInt 622ns × (0.97,1.03) 571ns × (0.95,1.05) -8.09% (p=0.000) BenchmarkFmtFprintfPrefixedInt 462ns × (0.96,1.04) 431ns × (0.95,1.05) -6.81% (p=0.000) BenchmarkFmtFprintfFloat 653ns × (0.98,1.03) 621ns × (0.99,1.03) -4.90% (p=0.000) BenchmarkFmtManyArgs 2.32µs × (0.97,1.03) 2.19µs × (0.98,1.02) -5.43% (p=0.000) BenchmarkGobDecode 27.0ms × (0.96,1.04) 20.0ms × (0.97,1.04) -26.06% (p=0.000) BenchmarkGobEncode 26.6ms × (0.99,1.01) 17.8ms × (0.95,1.05) -33.19% (p=0.000) BenchmarkGzip 659ms × (0.98,1.03) 650ms × (0.99,1.01) -1.34% (p=0.000) BenchmarkGunzip 145ms × (0.98,1.04) 143ms × (1.00,1.01) -1.47% (p=0.000) BenchmarkHTTPClientServer 111µs × (0.97,1.04) 110µs × (0.96,1.03) -1.30% (p=0.000) BenchmarkJSONEncode 52.0ms × (0.97,1.03) 40.8ms × (0.97,1.03) -21.47% (p=0.000) BenchmarkJSONDecode 127ms × (0.98,1.04) 120ms × (0.98,1.02) -5.55% (p=0.000) BenchmarkMandelbrot200 6.04ms × (0.99,1.04) 6.02ms × (1.00,1.01) ~ (p=0.176) BenchmarkGoParse 8.62ms × (0.96,1.08) 8.55ms × (0.93,1.09) ~ (p=0.302) BenchmarkRegexpMatchEasy0_32 164ns × (0.98,1.05) 165ns × (0.98,1.07) ~ (p=0.293) BenchmarkRegexpMatchEasy0_1K 546ns × (0.98,1.06) 547ns × (0.97,1.07) ~ (p=0.741) BenchmarkRegexpMatchEasy1_32 142ns × (0.97,1.09) 141ns × (0.97,1.05) ~ (p=0.231) BenchmarkRegexpMatchEasy1_1K 904ns × (0.97,1.07) 900ns × (0.98,1.04) ~ (p=0.294) BenchmarkRegexpMatchMedium_32 256ns × (0.98,1.06) 256ns × (0.97,1.04) ~ (p=0.530) BenchmarkRegexpMatchMedium_1K 74.2µs × (0.98,1.05) 73.8µs × (0.98,1.04) ~ (p=0.334) BenchmarkRegexpMatchHard_32 3.94µs × (0.98,1.07) 3.92µs × (0.98,1.05) ~ (p=0.356) BenchmarkRegexpMatchHard_1K 119µs × (0.98,1.07) 119µs × (0.98,1.06) ~ (p=0.467) BenchmarkRevcomp 978ms × (0.96,1.09) 984ms × (0.95,1.07) ~ (p=0.448) BenchmarkTemplate 151ms × (0.96,1.03) 142ms × (0.95,1.04) -5.55% (p=0.000) BenchmarkTimeParse 628ns × (0.99,1.01) 628ns × (0.99,1.01) ~ (p=0.855) BenchmarkTimeFormat 729ns × (0.98,1.06) 734ns × (0.97,1.05) ~ (p=0.149) GOMAXPROCS=2 name old mean new mean delta BenchmarkBinaryTree17-2 9.80s × (0.97,1.03) 9.85s × (0.99,1.02) ~ (p=0.444) BenchmarkFannkuch11-2 4.35s × (0.99,1.01) 4.40s × (0.98,1.05) ~ (p=0.099) BenchmarkFmtFprintfEmpty-2 86.7ns × (0.97,1.05) 85.9ns × (0.98,1.04) ~ (p=0.409) BenchmarkFmtFprintfString-2 297ns × (0.98,1.01) 297ns × (0.99,1.01) ~ (p=0.743) BenchmarkFmtFprintfInt-2 309ns × (0.98,1.02) 
310ns × (0.99,1.01) ~ (p=0.464) BenchmarkFmtFprintfIntInt-2 525ns × (0.97,1.05) 518ns × (0.99,1.01) ~ (p=0.151) BenchmarkFmtFprintfPrefixedInt-2 408ns × (0.98,1.02) 408ns × (0.98,1.03) ~ (p=0.797) BenchmarkFmtFprintfFloat-2 603ns × (0.99,1.01) 604ns × (0.98,1.02) ~ (p=0.588) BenchmarkFmtManyArgs-2 2.07µs × (0.98,1.02) 2.05µs × (0.99,1.01) ~ (p=0.091) BenchmarkGobDecode-2 19.1ms × (0.97,1.01) 19.3ms × (0.97,1.04) ~ (p=0.195) BenchmarkGobEncode-2 16.2ms × (0.97,1.03) 16.4ms × (0.99,1.01) ~ (p=0.069) BenchmarkGzip-2 652ms × (0.99,1.01) 651ms × (0.99,1.01) ~ (p=0.705) BenchmarkGunzip-2 143ms × (1.00,1.01) 143ms × (1.00,1.00) ~ (p=0.665) BenchmarkHTTPClientServer-2 149µs × (0.92,1.11) 149µs × (0.91,1.08) ~ (p=0.862) BenchmarkJSONEncode-2 34.6ms × (0.98,1.02) 37.2ms × (0.99,1.01) +7.56% (p=0.000) BenchmarkJSONDecode-2 117ms × (0.99,1.01) 117ms × (0.99,1.01) ~ (p=0.858) BenchmarkMandelbrot200-2 6.10ms × (0.99,1.03) 6.03ms × (1.00,1.00) ~ (p=0.083) BenchmarkGoParse-2 8.25ms × (0.98,1.01) 8.21ms × (0.99,1.02) ~ (p=0.307) BenchmarkRegexpMatchEasy0_32-2 162ns × (0.99,1.02) 162ns × (0.99,1.01) ~ (p=0.857) BenchmarkRegexpMatchEasy0_1K-2 541ns × (0.99,1.01) 540ns × (1.00,1.00) ~ (p=0.530) BenchmarkRegexpMatchEasy1_32-2 138ns × (1.00,1.00) 141ns × (0.98,1.04) +1.88% (p=0.038) BenchmarkRegexpMatchEasy1_1K-2 887ns × (0.99,1.01) 894ns × (0.99,1.01) ~ (p=0.087) BenchmarkRegexpMatchMedium_32-2 252ns × (0.99,1.01) 252ns × (0.99,1.01) ~ (p=0.954) BenchmarkRegexpMatchMedium_1K-2 73.4µs × (0.99,1.02) 72.8µs × (1.00,1.01) -0.87% (p=0.029) BenchmarkRegexpMatchHard_32-2 3.95µs × (0.97,1.05) 3.87µs × (1.00,1.01) -2.11% (p=0.035) BenchmarkRegexpMatchHard_1K-2 117µs × (0.99,1.01) 117µs × (0.99,1.01) ~ (p=0.669) BenchmarkRevcomp-2 980ms × (0.95,1.03) 993ms × (0.94,1.09) ~ (p=0.527) BenchmarkTemplate-2 136ms × (0.98,1.01) 135ms × (0.99,1.01) ~ (p=0.200) BenchmarkTimeParse-2 630ns × (1.00,1.01) 630ns × (1.00,1.00) ~ (p=0.634) BenchmarkTimeFormat-2 705ns × (0.99,1.01) 710ns × (0.98,1.02) ~ (p=0.174) GOMAXPROCS=4 BenchmarkBinaryTree17-4 9.87s × (0.96,1.04) 9.75s × (0.96,1.03) ~ (p=0.178) BenchmarkFannkuch11-4 4.35s × (1.00,1.01) 4.40s × (0.99,1.04) ~ (p=0.071) BenchmarkFmtFprintfEmpty-4 85.8ns × (0.98,1.06) 85.6ns × (0.98,1.04) ~ (p=0.858) BenchmarkFmtFprintfString-4 306ns × (0.99,1.03) 304ns × (0.97,1.02) ~ (p=0.470) BenchmarkFmtFprintfInt-4 317ns × (0.98,1.01) 315ns × (0.98,1.02) -0.92% (p=0.044) BenchmarkFmtFprintfIntInt-4 527ns × (0.99,1.01) 525ns × (0.98,1.01) ~ (p=0.164) BenchmarkFmtFprintfPrefixedInt-4 421ns × (0.98,1.03) 417ns × (0.99,1.02) ~ (p=0.092) BenchmarkFmtFprintfFloat-4 623ns × (0.98,1.02) 618ns × (0.98,1.03) ~ (p=0.172) BenchmarkFmtManyArgs-4 2.09µs × (0.98,1.02) 2.09µs × (0.98,1.02) ~ (p=0.679) BenchmarkGobDecode-4 18.6ms × (0.99,1.01) 18.6ms × (0.98,1.03) ~ (p=0.595) BenchmarkGobEncode-4 15.0ms × (0.98,1.02) 15.1ms × (0.99,1.01) ~ (p=0.301) BenchmarkGzip-4 659ms × (0.98,1.04) 660ms × (0.97,1.02) ~ (p=0.724) BenchmarkGunzip-4 145ms × (0.98,1.04) 144ms × (0.99,1.04) ~ (p=0.671) BenchmarkHTTPClientServer-4 139µs × (0.97,1.02) 138µs × (0.99,1.02) ~ (p=0.392) BenchmarkJSONEncode-4 35.0ms × (0.99,1.02) 35.1ms × (0.98,1.02) ~ (p=0.777) BenchmarkJSONDecode-4 119ms × (0.98,1.01) 118ms × (0.98,1.02) ~ (p=0.710) BenchmarkMandelbrot200-4 6.02ms × (1.00,1.00) 6.02ms × (1.00,1.00) ~ (p=0.289) BenchmarkGoParse-4 7.96ms × (0.99,1.01) 7.96ms × (0.99,1.01) ~ (p=0.884) BenchmarkRegexpMatchEasy0_32-4 164ns × (0.98,1.04) 166ns × (0.97,1.04) ~ (p=0.221) BenchmarkRegexpMatchEasy0_1K-4 540ns × (0.99,1.01) 552ns × (0.97,1.04) 
+2.10% (p=0.018) BenchmarkRegexpMatchEasy1_32-4 140ns × (0.99,1.04) 142ns × (0.97,1.04) ~ (p=0.226) BenchmarkRegexpMatchEasy1_1K-4 896ns × (0.99,1.03) 907ns × (0.97,1.04) ~ (p=0.155) BenchmarkRegexpMatchMedium_32-4 255ns × (0.99,1.04) 255ns × (0.98,1.04) ~ (p=0.904) BenchmarkRegexpMatchMedium_1K-4 73.4µs × (0.99,1.04) 73.8µs × (0.98,1.04) ~ (p=0.560) BenchmarkRegexpMatchHard_32-4 3.93µs × (0.98,1.04) 3.95µs × (0.98,1.04) ~ (p=0.571) BenchmarkRegexpMatchHard_1K-4 117µs × (1.00,1.01) 119µs × (0.98,1.04) +1.48% (p=0.048) BenchmarkRevcomp-4 990ms × (0.94,1.08) 989ms × (0.94,1.10) ~ (p=0.957) BenchmarkTemplate-4 137ms × (0.98,1.02) 137ms × (0.99,1.01) ~ (p=0.996) BenchmarkTimeParse-4 629ns × (1.00,1.00) 629ns × (0.99,1.01) ~ (p=0.924) BenchmarkTimeFormat-4 710ns × (0.99,1.01) 716ns × (0.98,1.02) +0.84% (p=0.033) Change-Id: I43a04e0f6ad5e3ba9847dddf12e13222561f9cf4 Reviewed-on: https://go-review.googlesource.com/9543 Reviewed-by: Austin Clements <austin@google.com> |
|
|
|
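A hedged sketch of the pacing policy described above, using a deliberately simplified time-slice model (constants and names are illustrative, not the runtime's pacer): the old rule refuses any slice that would push average utilization over the target, while the new rule starts a slice whenever utilization is still at or below the target and pays back the overshoot by skipping later slices.

```go
package main

import "fmt"

const (
	target = 0.25 // desired fraction of CPU time given to the fractional GC worker
	slice  = 10.0 // length of one scheduling time slice, in milliseconds
)

// canRunOld refuses any slice that would push average utilization above the target.
func canRunOld(gcTime, elapsed float64) bool {
	return (gcTime+slice)/(elapsed+slice) <= target
}

// canRunNew starts a slice whenever utilization so far is at or below the
// target, even if finishing the slice overshoots it; the overshoot is paid
// back by skipping later slices until the average drops again.
func canRunNew(gcTime, elapsed float64) bool {
	return elapsed == 0 || gcTime/elapsed <= target
}

func main() {
	gcOld, gcNew := 0.0, 0.0
	for t := 0.0; t < 100; t += slice {
		if canRunOld(gcOld, t) {
			gcOld += slice
		}
		if canRunNew(gcNew, t) {
			gcNew += slice
		}
	}
	// The new policy front-loads its GC work (it runs in the very first slice),
	// which is what shortens the window during which write barriers are enabled.
	fmt.Printf("old policy: %.0f ms of GC, new policy: %.0f ms of GC (out of 100 ms)\n", gcOld, gcNew)
}
```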
3ca20218c1 |
runtime: fix gcDumpObject on non-heap pointers
gcDumpObject is used to print the source and destination objects when checkmark finds a missing mark. However, gcDumpObject currently assumes the given pointer will point to a heap object. This is not true of the source object during root marking and may not even be true of the destination object in the limited situations where the heap points back into the stack. If the pointer isn't a heap object, gcDumpObject will attempt an out-of-bounds access to h_spans. This will cause a panicslice, which will attempt to construct a useful panic message. This will cause a string allocation, which will cause mallocgc to panic because the GC is in mark termination (checkmark only happens during mark termination). Fix this by checking that the pointer points into the heap arena before attempting to use it as an arena pointer. Change-Id: I09da600c380d4773f1f8f38e45b82cb229ea6382 Reviewed-on: https://go-review.googlesource.com/9498 Reviewed-by: Rick Hudson <rlh@golang.org> |
|
|
|
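A hedged sketch of the defensive pattern described above (the arena layout and names are invented): validate that a pointer actually falls inside the heap arena before using it to index the span table, so the diagnostic code can neither fault nor allocate while reporting a bad pointer.

```go
package main

import "fmt"

// arena models the [start, used) range of a toy heap arena, plus a span table
// indexed by page number (a stand-in for h_spans).
type arena struct {
	start, used uintptr
	pageSize    uintptr
	spans       []string
}

// dumpObject prints what it can about p without ever indexing spans out of
// bounds: if p is not an arena pointer (it might be a stack or data address),
// it says so and returns instead of faulting.
func (a *arena) dumpObject(p uintptr) {
	if p < a.start || p >= a.used {
		fmt.Printf("%#x is not a heap pointer (stack/data/bss?)\n", p)
		return
	}
	idx := (p - a.start) / a.pageSize
	fmt.Printf("%#x: in %q\n", p, a.spans[idx])
}

func main() {
	a := &arena{start: 0x10000, used: 0x12000, pageSize: 0x1000,
		spans: []string{"span A", "span B"}}
	a.dumpObject(0x10fff) // inside the arena
	a.dumpObject(0x00042) // a non-heap address, handled gracefully
}
```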
4b78c9575d |
runtime: print stack of G during a signal
Sequence of operations:
- Go code does a systemstack call
- during the systemstack call, receive a signal
- the signal requests a traceback of all goroutines
The original G is still marked as _Grunning, so the traceback code refuses to print its stack. Fix by allowing traceback of Gs whose caller is on the same M as the G itself; the G can't be modifying its stack if that is the case. Fixes #10546 Change-Id: I2bcea48c0197fbf78ab6fa080027cd80181083ad Reviewed-on: https://go-review.googlesource.com/9435 Reviewed-by: Ian Lance Taylor <iant@golang.org> |
|
|
|
4d1ab2d8d1 |
runtime: re-enable TestNewProc0 on android/arm and fix heap corruption
The problem is not actually specific to android/arm. Linux/ARM's runtime.clone sets the stack pointer to child_stk-4 before calling fn. Then, when fn returns, it tries to write to 4(R13) to provide the argument for runtime.exit, which is just beyond the allocated child stack, so it will corrupt the heap randomly or trigger a segfault if that memory happens to be unmapped. While we're here, shorten the test polling interval to 0.1s to speed up the test (it was only checking at a 1s interval, which means the test takes at least 1s). Fixes #10548. Change-Id: I57cd63232022b113b6cd61e987b0684ebcce930a Reviewed-on: https://go-review.googlesource.com/9457 Reviewed-by: David Crawshaw <crawshaw@golang.org> |
|
|
|
c26fc88d56 |
runtime/pprof: write heap statistics to heap profile always
The heap statistics were written only when a profile with debug > 0 was requested, but that form also prints a stack trace for each profile line, which is comparatively much noisier. The statistics are short enough and separate enough (they only appear at the end) and useful enough that we can print them always. This means that people using -test.memprofile in tests will get a memory profile with statistics included now. Pprof won't care, but if people care to look, the numbers will be there. This avoids the need for hacks like using -memprofilerate=1 to find the number of allocations. Change-Id: I10a4f593403d0315aad11b37c6e554b734caa73f Reviewed-on: https://go-review.googlesource.com/9491 Reviewed-by: David Chase <drchase@google.com> |
|
|
|
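For context, this is the standard way a program or test asks for a heap profile; runtime/pprof and the calls below are real APIs, while whether the MemStats summary appears in the debug=0 output depends on the change above.

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
)

func main() {
	f, err := os.Create("heap.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Shorthand for pprof.Lookup("heap").WriteTo(f, 0): the debug=0 form that
	// tools such as `go test -memprofile` and pprof consume.
	if err := pprof.WriteHeapProfile(f); err != nil {
		log.Fatal(err)
	}

	// The debug>0 form is human-readable and has always printed the runtime
	// statistics at the end of the profile.
	if err := pprof.Lookup("heap").WriteTo(os.Stderr, 1); err != nil {
		log.Fatal(err)
	}
}
```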
c526f3ac10 |
runtime: tail call into memeq/cmp body implementations
There's no need to call/ret to the body implementation. It can write the result to the right place. Just jump to it and have it return to our caller.
Old:
  call body implementation
  compute result
  put result in a register
  return
  write register to result location
  return
New:
  load address of result location into a register
  jump to body implementation
  compute result
  write result to passed-in address
  return
It's a bit tricky on 386 because there is no free register with which to pass the result location. Free up a register by keeping around blen-alen instead of both alen and blen. Change-Id: If2cf0682a5bf1cc592bdda7c126ed4eee8944fba Reviewed-on: https://go-review.googlesource.com/9202 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> |
|
|
|
7e49c8193c |
runtime: skip gdb goroutine backtrace test on non-x86
Gdb is not able to backtrace our non-standard stack frames on RISC architectures without a frame pointer. Change-Id: Id62a566ce2d743602ded2da22ff77b9ae34bc5ae Signed-off-by: Shenghou Ma <minux@golang.org> Reviewed-on: https://go-review.googlesource.com/9456 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> |
|
|
|
da11a9dda3 |
cmd/internal/ld, runtime: unify stack reservation in PE header and runtime
With a 128KB stack reservation, on 32-bit Windows, the maximum number of threads is ~9000. The original 65535-byte stack commit causes a problem on Windows XP, where it makes the stack reservation 1MB despite the fact that the runtime specified 128KB. While we're here, also fix the extra spacing in the "unable to create more OS thread" error message: println inserts a space between each argument. See #9457 for more information. Change-Id: I3a82f7d9717d3d55211b6eb1c34b00b0eaad83ed Reviewed-on: https://go-review.googlesource.com/2237 Reviewed-by: Alex Brainman <alex.brainman@gmail.com> Run-TryBot: Minux Ma <minux@golang.org> |
|
|
|
e7dd28891e |
cmd/internal/gc, cmd/[56789]g: rename stackcopy to blockcopy
To avoid confusion with the runtime concept of copying stack. Change-Id: I33442377b71012c2482c2d0ddd561492c71e70d0 Reviewed-on: https://go-review.googlesource.com/8639 Reviewed-by: Dave Cheney <dave@cheney.net> Reviewed-by: Russ Cox <rsc@golang.org> |