go_android_exec is looking for "exitcode=" to decide the result
of running a test. The heap dump test nondeterministically prints
"finalized" right at the end of the test. When the timing is just
right, we print "finalizedexitcode=0" and confuse go_android_exec.
This failure happens occasionally on the android builders.
Change-Id: I4f73a4db05d8f40047ecd3ef3a881a4ae3741e26
Reviewed-on: https://go-review.googlesource.com/23861
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Any defer in a shared object crashed when GOARCH=386. This turns out to be two
bugs:
1) Calls to morestack were not processed to be PIC safe (must have been
possible to trigger this another way too)
2) jmpdefer needs to rewind the return address of the deferred function past
the instructions that load the GOT pointer into BX, not just past the call
Bug 2) requires re-introducing the a way for .s files to know when they are
being compiled for dynamic linking but I've tried to do that in as minimal
a way as possible.
Fixes#15916
Change-Id: Ia0d09b69ec272a176934176b8eaef5f3bfcacf04
Reviewed-on: https://go-review.googlesource.com/23623
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
When doing a backtrace from a signal that occurs in C code compiled
without using -fasynchronous-unwind-tables, we have to rely on frame
pointers. In order to do that, the traceback function needs the signal
context to reliably pick up the frame pointer.
Change-Id: I7b45930fced01685c337d108e0f146057928f876
Reviewed-on: https://go-review.googlesource.com/23494
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The code has moved from code.google.com to github.com.
Change-Id: I0cc9eb69b3fedc9e916417bc7695759632f2391f
Reviewed-on: https://go-review.googlesource.com/23523
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Add TSAN acquire/release calls to runtime/cgo to match the ones
generated by cgo. This avoids a false positive race around the malloc
memory used in runtime/cgo when other goroutines are simultaneously
calling malloc and free from cgo.
These new calls will only be used when building with CGO_CFLAGS and
CGO_LDFLAGS set to -fsanitize=thread, which becomes a requirement to
avoid all false positives when using TSAN. These are needed not just
for runtime/cgo, but also for any runtime package that uses cgo (such as
net and os/user).
Add an unused attribute to the _cgo_tsan_acquire and _cgo_tsan_release
functions, in case there are no actual cgo function calls.
Add a test that checks that setting CGO_CFLAGS/CGO_LDFLAGS avoids a
false positive report when using os/user.
Change-Id: I0905c644ff7f003b6718aac782393fa219514c48
Reviewed-on: https://go-review.googlesource.com/23492
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
In order to support pprof for position independent executables, pprof
needs to adjust the PC addresses stored in the profile by the address at
which the program is loaded. The legacy profiling support which we use
already supports recording the GNU/Linux /proc/self/maps data
immediately after the CPU samples, so do that. Also change the pprof
symbolizer to use the information, if available, when looking up
addresses in the Go pcline data.
Fixes#15714.
Change-Id: I4bf679210ef7c51d85cf873c968ce82db8898e3e
Reviewed-on: https://go-review.googlesource.com/23525
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Also adds missing copyright notice.
Updates #15603.
Change-Id: Icf4bb45ba5edec891491fe5f0039a8a25125d168
Reviewed-on: https://go-review.googlesource.com/23501
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently when the garbage collector frees stacks of dead goroutines
in markrootFreeGStacks, it calls stackfree on a regular user stack.
This is a problem, since stackfree manipulates the stack cache in the
per-P mcache, so if it grows the stack or gets preempted in the middle
of manipulating the stack cache (which are both possible since it's on
a user stack), it can easily corrupt the stack cache.
Fix this by calling markrootFreeGStacks on the system stack, so that
all calls to stackfree happen on the system stack. To prevent this bug
in the future, mark stack functions that manipulate the mcache as
go:systemstack.
Fixes#15853.
Change-Id: Ic0d1c181efb342f134285a152560c3a074f14a3d
Reviewed-on: https://go-review.googlesource.com/23511
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This has a minor performance cost, but far less than is being gained by SSA.
As an experiment, enable it during the Go 1.7 beta.
Having frame pointers on by default makes Linux's perf, Intel VTune,
and other profilers much more useful, because it lets them gather a
stack trace efficiently on profiling events.
(It doesn't help us that much, since when we walk the stack we usually
need to look up PC-specific information as well.)
Fixes#15840.
Change-Id: I4efd38412a0de4a9c87b1b6e5d11c301e63f1a2a
Reviewed-on: https://go-review.googlesource.com/23451
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The irregular calling convention for defers currently incorrectly
manages the BP if frame pointers are enabled. Specifically, jmpdefer
manipulates the SP as if its own caller, deferreturn, had returned.
However, it does not manipulate the BP to match. As a result, when a
BP-based traceback happens during a deferred function call, it unwinds
to the function that performed the defer and then thinks that function
called itself in an infinite regress.
Fix this by making jmpdefer manipulate the BP as if deferreturn had
actually returned.
Fixes#12968.
Updates #15840.
Change-Id: Ic9cc7c863baeaf977883ed0c25a7e80e592cf066
Reviewed-on: https://go-review.googlesource.com/23457
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
A few other architectures have already defined a NOFRAME flag.
Use it to disable frame pointer code on a few very low-level functions
that must behave like Windows code.
Makes the failing os/signal test pass on a Windows gomote.
Change-Id: I982365f2c59a0aa302b4428c970846c61027cf3e
Reviewed-on: https://go-review.googlesource.com/23456
Reviewed-by: Austin Clements <austin@google.com>
I have been running this patch inside Google against Go 1.6 for the last month.
The new tests will probably break the builders but let's see
exactly how they break.
Change-Id: Ia65cf7d3faecffeeb4b06e9b80875c0e57d86d9e
Reviewed-on: https://go-review.googlesource.com/23452
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Acquire and release the TSAN synchronization point when calling malloc,
just as we do when calling any other C function. If we don't do this,
TSAN will report false positive errors about races calling malloc and
free.
We used to have a special code path for malloc and free, going through
the runtime functions cmalloc and cfree. The special code path for cfree
was no longer used even before this CL. This CL stops using the special
code path for malloc, because there is no place along that path where we
could conditionally insert the TSAN synchronization. This CL removes
the support for the special code path for both functions.
Instead, cgo now automatically generates the malloc function as though
it were referenced as C.malloc. We need to automatically generate it
even if C.malloc is not called, even if malloc and size_t are not
declared, to support cgo-provided functions like C.CString.
Change-Id: I829854ec0787a80f33fa0a8a0dc2ee1d617830e2
Reviewed-on: https://go-review.googlesource.com/23260
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This makes GOEXPERIMENT=framepointer, GOOS=darwin, and buildmode=carchive coexist.
Change-Id: I9f6fb2f0f06f27df683e5b51f2fa55cd21872453
Reviewed-on: https://go-review.googlesource.com/23454
Reviewed-by: Austin Clements <austin@google.com>
Currently scanstack obtains its own gcWork from the P for the duration
of the stack scan and then, if called during mark termination,
disposes the gcWork.
However, this means that the number of workbufs allocated will be at
least the number of stacks scanned during mark termination, which may
be very high (especially during a STW GC). This happens because, in
steady state, each scanstack will obtain a fresh workbuf (either from
the empty list or by allocating it), fill it with the scan results,
and then dispose it to the full list. Nothing is consuming from the
full list during this (and hence nothing is recycling them to the
empty list), so the length of the full list by the time mark
termination starts draining it is at least the number of stacks
scanned.
Fix this by pushing the gcWork acquisition up the stack to either the
gcDrain that calls markroot that calls scanstack (which batches across
many stack scans and is the path taken during STW GC) or to newstack
(which is still a single scanstack call, but this is roughly bounded
by the number of Ps).
This fix reduces the workbuf allocation for the test program from
issue #15319 from 213 MB (roughly 2KB * 1e5 goroutines) to 10 MB.
Fixes#15319.
Note that there's potentially a similar issue in write barriers during
mark 2. Fixing that will be more difficult since there's no broader
non-preemptible context, but it should also be less of a problem since
the full list is being drained during mark 2.
Some overall improvements in the go1 benchmarks, plus the usual noise.
No significant change in the garbage benchmark (time/op or GC memory).
name old time/op new time/op delta
BinaryTree17-12 2.54s ± 1% 2.51s ± 1% -1.09% (p=0.000 n=20+19)
Fannkuch11-12 2.12s ± 0% 2.17s ± 0% +2.18% (p=0.000 n=19+18)
FmtFprintfEmpty-12 45.1ns ± 1% 45.2ns ± 0% ~ (p=0.078 n=19+18)
FmtFprintfString-12 127ns ± 0% 128ns ± 0% +1.08% (p=0.000 n=19+16)
FmtFprintfInt-12 125ns ± 0% 122ns ± 1% -2.71% (p=0.000 n=14+18)
FmtFprintfIntInt-12 196ns ± 0% 190ns ± 1% -2.91% (p=0.000 n=12+20)
FmtFprintfPrefixedInt-12 196ns ± 0% 194ns ± 1% -0.94% (p=0.000 n=13+18)
FmtFprintfFloat-12 253ns ± 1% 251ns ± 1% -0.86% (p=0.000 n=19+20)
FmtManyArgs-12 807ns ± 1% 784ns ± 1% -2.85% (p=0.000 n=20+20)
GobDecode-12 7.13ms ± 1% 7.12ms ± 1% ~ (p=0.351 n=19+20)
GobEncode-12 5.89ms ± 0% 5.95ms ± 0% +0.94% (p=0.000 n=19+19)
Gzip-12 219ms ± 1% 221ms ± 1% +1.35% (p=0.000 n=18+20)
Gunzip-12 37.5ms ± 1% 37.4ms ± 0% ~ (p=0.057 n=20+19)
HTTPClientServer-12 81.4µs ± 4% 81.9µs ± 3% ~ (p=0.118 n=17+18)
JSONEncode-12 15.7ms ± 1% 15.8ms ± 1% +0.73% (p=0.000 n=17+18)
JSONDecode-12 57.9ms ± 1% 57.2ms ± 1% -1.34% (p=0.000 n=19+19)
Mandelbrot200-12 4.12ms ± 1% 4.10ms ± 0% -0.33% (p=0.000 n=19+17)
GoParse-12 3.22ms ± 2% 3.25ms ± 1% +0.72% (p=0.000 n=18+20)
RegexpMatchEasy0_32-12 70.6ns ± 1% 71.1ns ± 2% +0.63% (p=0.005 n=19+20)
RegexpMatchEasy0_1K-12 240ns ± 0% 239ns ± 1% -0.59% (p=0.000 n=19+20)
RegexpMatchEasy1_32-12 71.3ns ± 1% 71.3ns ± 1% ~ (p=0.844 n=17+17)
RegexpMatchEasy1_1K-12 384ns ± 2% 371ns ± 1% -3.45% (p=0.000 n=19+20)
RegexpMatchMedium_32-12 109ns ± 1% 108ns ± 2% -0.48% (p=0.029 n=19+19)
RegexpMatchMedium_1K-12 34.3µs ± 1% 34.5µs ± 2% ~ (p=0.160 n=18+20)
RegexpMatchHard_32-12 1.79µs ± 9% 1.72µs ± 2% -3.83% (p=0.000 n=19+19)
RegexpMatchHard_1K-12 53.3µs ± 4% 51.8µs ± 1% -2.82% (p=0.000 n=19+20)
Revcomp-12 386ms ± 0% 388ms ± 0% +0.72% (p=0.000 n=17+20)
Template-12 62.9ms ± 1% 62.5ms ± 1% -0.57% (p=0.010 n=18+19)
TimeParse-12 325ns ± 0% 331ns ± 0% +1.84% (p=0.000 n=18+19)
TimeFormat-12 338ns ± 0% 343ns ± 0% +1.34% (p=0.000 n=18+20)
[Geo mean] 52.7µs 52.5µs -0.42%
Change-Id: Ib2d34736c4ae2ec329605b0fbc44636038d8d018
Reviewed-on: https://go-review.googlesource.com/23391
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Also mark it go:systemstack and explain why.
Change-Id: I88baf22741c04012ba2588d8e03dd3801d19b5c0
Reviewed-on: https://go-review.googlesource.com/23390
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Names for Append?Bytes are slightly changed in addition to adding a slash.
Change-Id: I0291aa29c693f9040fd01368eaad9766259677df
Reviewed-on: https://go-review.googlesource.com/23426
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Names of sub-benchmarks are preserved, short of the additional slash.
Change-Id: I9b3f82964f9a44b0d28724413320afd091ed3106
Reviewed-on: https://go-review.googlesource.com/23425
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Other GOARCHs already handle their callee-saved FP registers, but
arm was missing. Without this change, code using Cgo and floating
point code might fail in mysterious and hard to debug ways.
There are no floating point registers when GOARM=5, so skip the
registers when runtime.goarm < 6.
darwin/arm doesn't support GOARM=5, so the check is left out of
rt0_darwin_arm.s.
Fixes#14876
Change-Id: I6bcb90a76df3664d8ba1f33123a74b1eb2c9f8b2
Reviewed-on: https://go-review.googlesource.com/23140
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
When gentraceback starts on a system stack in sigprof, it is
configured to jump to the user stack when it reaches the end of the
system stack. Currently this updates the current frame's FP, but not
its SP. This is okay on non-LR machines (x86) because frame.sp is only
used to find defers, which the bottom-most frame of the user stack
will never have.
However, on LR machines, we use frame.sp to find the saved LR. We then
use to resolve the function of the next frame, which is used to
resolved the size of the next frame. Since we're not updating frame.sp
on a stack jump, we read the saved LR from the system stack instead of
the user stack and wind up resolving the wrong function and hence the
wrong frame size for the next frame.
This has had remarkably few ill effects (though the resulting profiles
must be wrong). We noticed it because of a bad interaction with stack
barriers. Specifically, once we get the next frame size wrong, we also
get the location of its LR wrong. If we happen to get a stack slot
that contains a stale stack barrier LR (for a stack barrier we already
hit) and hasn't been overwritten with something else as we re-grew the
stack, gentraceback will fail with a "found next stack barrier at ..."
error, pointing at the slot that it thinks is an LR, but isn't.
Fixes#15138.
Updates #15313 (might fix it).
Change-Id: I13cfa322b44c0c2f23ac2b3d03e12631e4a6406b
Reviewed-on: https://go-review.googlesource.com/23291
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently it's possible for user code to exploit the high scheduler
priority of the GC worker in conjunction with the runnext optimization
to elevate a user goroutine to high priority so it will always run
even if there are other runnable goroutines.
For example, if a goroutine is in a tight allocation loop, the
following can happen:
1. Goroutine 1 allocates, triggering a GC.
2. G 1 attempts an assist, but fails and blocks.
3. The scheduler runs the GC worker, since it is high priority.
Note that this also starts a new scheduler quantum.
4. The GC worker does enough work to satisfy the assist.
5. The GC worker readies G 1, putting it in runnext.
6. GC finishes and the scheduler runs G 1 from runnext, giving it
the rest of the GC worker's quantum.
7. Go to 1.
Even if there are other goroutines on the run queue, they never get a
chance to run in the above sequence. This requires a confluence of
circumstances that make it unlikely, though not impossible, that it
would happen in "real" code. In the test added by this commit, we
force this confluence by setting GOMAXPROCS to 1 and GOGC to 1 so it's
easy for the test to repeated trigger GC and wake from a blocked
assist.
We fix this by making GC always put user goroutines at the end of the
run queue, instead of in runnext. This makes it so user code can't
piggy-back on the GC's high priority to make a user goroutine act like
it has high priority. The only other situation where GC wakes user
goroutines is waking all blocked assists at the end, but this uses the
global run queue and hence doesn't have this problem.
Fixes#15706.
Change-Id: I1589dee4b7b7d0c9c8575ed3472226084dfce8bc
Reviewed-on: https://go-review.googlesource.com/23172
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently ready always puts the readied goroutine in runnext. We're
going to have to change this for some uses, so add a flag for whether
or not to use runnext.
For now we always pass true so this is a no-op change.
For #15706.
Change-Id: Iaa66d8355ccfe4bbe347570cc1b1878c70fa25df
Reviewed-on: https://go-review.googlesource.com/23171
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
OpenBSD 6.0 (due out November 2016) will support PT_TLS, which will
allow for the OpenBSD cgo pthread_create() workaround to be removed.
However, in order for Go to continue working on supported OpenBSD
releases (the current release and the previous release - 5.9 and 6.0,
once 6.0 is released), we cannot enable PT_TLS immediately. Instead,
adjust the existing code so that it works with the previous TCB
allocation and the new TIB allocation. This allows the same Go
runtime to work on 5.8, 5.9 and later 6.0.
Once OpenBSD 5.9 is no longer supported (May 2017, when 6.1 is
released), PT_TLS can be enabled and the additional cgo runtime
code removed.
Change-Id: I3eed5ec593d80eea78c6656cb12557004b2c0c9a
Reviewed-on: https://go-review.googlesource.com/23197
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The test case in #15639 somehow causes an invalid syscall frame. The
failure is obscured because the throw occurs when throwsplit == true,
which causes a "stack split at bad time" error when trying to print the
throw message.
This CL fixes the "stack split at bad time" by using systemstack. No
test because there shouldn't be any way to trigger this error anyhow.
Update #15639.
Change-Id: I4240f3fd01bdc3c112f3ffd1316b68504222d9e1
Reviewed-on: https://go-review.googlesource.com/23153
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
On some systems, gdb is set to: "startup-with-shell on". This
breaks runtime_test. This just make sure gdb does not start by
spawning a shell.
Fixes#15354
Change-Id: Ia040931c61dea22f4fdd79665ab9f84835ecaa70
Reviewed-on: https://go-review.googlesource.com/23142
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The signal might get delivered to a different thread, and that thread
might not run again before the currently running thread returns and
exits. Sleep to give the other thread time to pick up the signal and
crash.
Not tested for all cases, but, optimistically:
Fixes#14063.
Change-Id: Iff58669ac6185ad91cce85e0e86f17497a3659fd
Reviewed-on: https://go-review.googlesource.com/23203
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Mikio Hara <mikioh.mikioh@gmail.com>
racefini calls __tsan_fini which is C code and at the end of it
invoked the standard C library exit(3) call. This has undefined
behavior if invoked more than once. Specifically in C++ programs
it caused static destructors to run twice. At least on glibc
impls it also means the at_exit handlers list (where those are
stored) also free's a list entry when it completes these. So invoking
twice results in a double free at exit which trips debug memory
allocation tracking.
Fix all of this by using an atomic as a boolean barrier around
calls to racefini being invoked > 1 time.
Fixes#15578
Change-Id: I49222aa9b8ded77160931f46434c61a8379570fc
Reviewed-on: https://go-review.googlesource.com/22882
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Issue #15613 points out that the darwin builders have been getting
regular failures in which a process that should exit with a SIGPIPE
signal is instead exiting with exit status 2. The code calls
runtime.raise. On most systems runtime.raise is the equivalent of
pthread_kill(gettid(), sig); that is, it kills the thread with the
signal, which should ensure that the program does not keep going. On
darwin, however, runtime.raise is actually kill(getpid(), sig); that is,
it sends a signal to the entire process. If the process decides to
deliver the signal to a different thread, then it is possible that in
some cases the thread that calls raise is able to execute the next
system call before the signal is actually delivered. That would cause
the observed error.
I have not been able to recreate the problem myself, so I don't know
whether this actually fixes it. But, optimistically:
Fixed#15613.
Change-Id: I60c0a9912aae2f46143ca1388fd85e9c3fa9df1f
Reviewed-on: https://go-review.googlesource.com/23152
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This should help with debugging failures.
For #15138 and #15477.
Change-Id: I77db2b6375d8b4403d3edf5527899d076291e02c
Reviewed-on: https://go-review.googlesource.com/23134
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
The convention for writing something like "64 kB" is 64<<10, since
this is easier to read than 1<<16. Update gcBitsChunkBytes to follow
this convention.
Change-Id: I5b5a3f726dcf482051ba5b1814db247ff3b8bb2f
Reviewed-on: https://go-review.googlesource.com/23132
Reviewed-by: Rick Hudson <rlh@golang.org>
The 17-31 byte code is broken. Disabled it.
Added a bunch of tests to at least cover the cases
in indexShortStr. I'll channel Brad and wonder why
this CL ever got in without any tests.
Fixes#15679
Change-Id: I84a7b283a74107db865b9586c955dcf5f2d60161
Reviewed-on: https://go-review.googlesource.com/23106
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently the heapBitsSetType documentation says that there are no
races on the heap bitmap, but that isn't exactly true. There are no
*write-write* races, but there are read-write races. Expand the
documentation to explain this and why it's okay.
Change-Id: Ibd92b69bcd6524a40a9dd4ec82422b50831071ed
Reviewed-on: https://go-review.googlesource.com/23092
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently we only execute a publication barrier for scan objects (and
skip it for noscan objects). This used to be okay because GC would
never consult the object itself (so it wouldn't observe uninitialized
memory even if it found a pointer to a noscan object), and the heap
bitmap was pre-initialized to noscan.
However, now we explicitly initialize the heap bitmap for noscan
objects when we allocate them. While the GC will still never consult
the contents of a noscan object, it does need to see the initialized
heap bitmap. Hence, we need to execute a publication barrier to make
the bitmap visible before user code can expose a pointer to the newly
allocated object even for noscan objects.
Change-Id: Ie4133c638db0d9055b4f7a8061a634d970627153
Reviewed-on: https://go-review.googlesource.com/23043
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
In future releases of OpenBSD, the sigreturn syscall will no longer
exist. As such, stop using sigreturn on openbsd/386 and just return
from the signal trampoline (as we already do for openbsd/amd64 and
openbsd/arm).
Change-Id: Ic4de1795bbfbfb062a685832aea0d597988c6985
Reviewed-on: https://go-review.googlesource.com/23024
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Noticed and fix by Alex Brainman.
Tested in https://golang.org/cl/23005 (which makes all compiler
warnings fatal during development)
Fixes#15623
Change-Id: Ic19999fce8bb8640d963965cc328574efadd7855
Reviewed-on: https://go-review.googlesource.com/23010
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
The test sometimes fails on builders.
The test uses sleeps to establish the necessary goroutine
execution order. If sleeps undersleep/oversleep
the race is still reported, but it can be reported when the
main test goroutine returns. In such case test driver
can't match the race with the test and reports failure.
Wait for both test goroutines to ensure that the race
is reported in the test scope.
Fixes#15579
Change-Id: I0b9bec0ebfb0c127d83eb5325a7fe19ef9545050
Reviewed-on: https://go-review.googlesource.com/22951
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
In issue #13992, Russ mentioned that the heap bitmap footprint was
halved but that the bitmap size calculation hadn't been updated. This
presents the opportunity to either halve the bitmap size or double
the addressable virtual space. This CL doubles the addressable virtual
space. On 32 bit this can be tweaked further to allow the bitmap to
cover the entire 4GB virtual address space, removing a failure mode
if the kernel hands out memory with a too low address.
First, fix the calculation and double _MaxArena32 to cover 4GB virtual
memory space with the same bitmap size (256 MB).
Then, allow the fallback mode for the initial memory reservation
on 32 bit (or 64 bit with too little available virtual memory) to not
include space for the arena. mheap.sysAlloc will automatically reserve
additional space when the existing arena is full.
Finally, set arena_start to 0 in 32 bit mode, so that any address is
acceptable for subsequent (additional) reservations.
Before, the bitmap was always located just before arena_start, so
fix the two places relying on that assumption: Point the otherwise unused
mheap.bitmap to one byte after the end of the bitmap, and use it for
bitmap addressing instead of arena_start.
With arena_start set to 0 on 32 bit, the cgoInRange check is no longer a
sufficient check for Go pointers. Introduce and call inHeapOrStack to
check whether a pointer is to the Go heap or stack.
While we're here, remove sysReserveHigh which seems to be unused.
Fixes#13992
Change-Id: I592b513148a50b9d3967b5c5d94b86b3ec39acc2
Reviewed-on: https://go-review.googlesource.com/20471
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Change-Id: Ice9c234960adc7857c8370b777a0b18e29d59281
Reviewed-on: https://go-review.googlesource.com/22853
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
I meant to delete these in CL 22850, actually.
Change-Id: I0c286efd2b9f1caf0221aa88e3bcc03649c89517
Reviewed-on: https://go-review.googlesource.com/22851
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This can only happen when profiling and there is foreign code
at the top of the g0 stack but we're not in cgo.
That in turn only happens with the race detector.
Fixes#13568.
Change-Id: I23775132c9c1a3a3aaae191b318539f368adf25e
Reviewed-on: https://go-review.googlesource.com/18322
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
CL/19862 (f79b50b8d5) recently introduced the constants
SeekStart, SeekCurrent, and SeekEnd to the io package. We should use these constants
consistently throughout the code base.
Updates #15269
Change-Id: If7fcaca7676e4a51f588528f5ced28220d9639a2
Reviewed-on: https://go-review.googlesource.com/22097
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Joe Tsai <joetsai@digital-static.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Negative-case conversion code was wrong for minimum int32,
used negate-then-widen instead of widen-then-negate.
Test already exists; this fixes the failure.
Fixes#15563.
Change-Id: I4b0b3ae8f2c9714bdcc405d4d0b1502ccfba2b40
Reviewed-on: https://go-review.googlesource.com/22830
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
So we can start working on other architectures here.
Change is a dummy to keep git happy.
Change-Id: I1caa62a242790601810a1ff72af7ea9773d4da76
Reviewed-on: https://go-review.googlesource.com/22822
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Adds a small function signame that infers a signal name
from the signal table, otherwise will fallback to using
hex(sig) as previously. No signal table is present for
Windows hence it will always print the hex value.
Sample code and new result:
```go
package main
import (
"fmt"
"time"
)
func main() {
defer func() {
if err := recover(); err != nil {
fmt.Printf("err=%v\n", err)
}
}()
ticker := time.Tick(1e9)
for {
<-ticker
}
}
```
```shell
$ go run main.go &
$ kill -11 <pid>
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0xb01dfacedebac1e
pc=0xc71db]
...
```
Fixes#13969
Change-Id: Ie6be312eb766661f1cea9afec352b73270f27f9d
Reviewed-on: https://go-review.googlesource.com/22753
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The following performance improvements have been made to the
low-level atomic functions for ppc64le & ppc64:
- For those cases containing a lwarx and stwcx (or other sizes):
sync, lwarx, maybe something, stwcx, loop to sync, sync, isync
The sync is moved before (outside) the lwarx/stwcx loop, and the
sync after is removed, so it becomes:
sync, lwarx, maybe something, stwcx, loop to lwarx, isync
- For the Or8 and And8, the shifting and manipulation of the
address to the word aligned version were removed and the
instructions were changed to use lbarx, stbcx instead of
register shifting, xor, then lwarx, stwcx.
- New instructions LWSYNC, LBAR, STBCC were tested and added.
runtime/atomic_ppc64x.s was changed to use the LWSYNC opcode
instead of the WORD encoding.
Fixes#15469
Ran some of the benchmarks in the runtime and sync directories.
Some results varied from run to run but the trend was improvement
based on best times for base and new:
runtime.test:
BenchmarkChanNonblocking-128 0.88 0.89 +1.14%
BenchmarkChanUncontended-128 569 511 -10.19%
BenchmarkChanContended-128 63110 53231 -15.65%
BenchmarkChanSync-128 691 598 -13.46%
BenchmarkChanSyncWork-128 11355 11649 +2.59%
BenchmarkChanProdCons0-128 2402 2090 -12.99%
BenchmarkChanProdCons10-128 1348 1363 +1.11%
BenchmarkChanProdCons100-128 1002 746 -25.55%
BenchmarkChanProdConsWork0-128 2554 2720 +6.50%
BenchmarkChanProdConsWork10-128 1909 1804 -5.50%
BenchmarkChanProdConsWork100-128 1624 1580 -2.71%
BenchmarkChanCreation-128 237 212 -10.55%
BenchmarkChanSem-128 705 667 -5.39%
BenchmarkChanPopular-128 5081190 4497566 -11.49%
BenchmarkCreateGoroutines-128 532 473 -11.09%
BenchmarkCreateGoroutinesParallel-128 35.0 34.7 -0.86%
BenchmarkCreateGoroutinesCapture-128 4923 4200 -14.69%
sync.test:
BenchmarkUncontendedSemaphore-128 112 94.2 -15.89%
BenchmarkContendedSemaphore-128 133 128 -3.76%
BenchmarkMutexUncontended-128 1.90 1.67 -12.11%
BenchmarkMutex-128 353 310 -12.18%
BenchmarkMutexSlack-128 304 283 -6.91%
BenchmarkMutexWork-128 554 541 -2.35%
BenchmarkMutexWorkSlack-128 567 556 -1.94%
BenchmarkMutexNoSpin-128 275 242 -12.00%
BenchmarkMutexSpin-128 1129 1030 -8.77%
BenchmarkOnce-128 1.08 0.96 -11.11%
BenchmarkPool-128 29.8 27.4 -8.05%
BenchmarkPoolOverflow-128 40564 36583 -9.81%
BenchmarkSemaUncontended-128 3.14 2.63 -16.24%
BenchmarkSemaSyntNonblock-128 1087 1069 -1.66%
BenchmarkSemaSyntBlock-128 897 893 -0.45%
BenchmarkSemaWorkNonblock-128 1034 1028 -0.58%
BenchmarkSemaWorkBlock-128 949 886 -6.64%
Change-Id: I4403fb29d3cd5254b7b1ce87a216bd11b391079e
Reviewed-on: https://go-review.googlesource.com/22549
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Minux Ma <minux@golang.org>
Builder is too slow. This test passed on builder machines but took
15+ min.
Change-Id: Ief9d67ea47671a57e954e402751043bc1ce09451
Reviewed-on: https://go-review.googlesource.com/22798
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Since tracebackctxt.go uses //export functions, the C functions can't be
externally visible in the C comment. The code was using attributes to
work around that, but that failed on Windows.
Change-Id: If4449fd8209a8998b4f6855ea89e5db1471b2981
Reviewed-on: https://go-review.googlesource.com/22786
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
If we collected a cgo traceback when entering the SIGPROF signal
handler, record it as part of the profiling stack trace.
This serves as the promised test for https://golang.org/cl/21055 .
Change-Id: I5f60cd6cea1d9b7c3932211483a6bfab60ed21d2
Reviewed-on: https://go-review.googlesource.com/22650
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Race runtime also needs local malloc caches and currently uses
a mix of per-OS-thread and per-goroutine caches. This leads to
increased memory consumption. But more importantly cache of
synchronization objects is per-goroutine and we don't always
have goroutine context when feeing memory in GC. As the result
synchronization object descriptors leak (more precisely, they
can be reused if another synchronization object is recreated
at the same address, but it does not always help). For example,
the added BenchmarkSyncLeak has effectively runaway memory
consumption (based on a real long running server).
This change updates race runtime with support for per-P contexts.
BenchmarkSyncLeak now stabilizes at ~1GB memory consumption.
Long term, this will allow us to remove race runtime dependency
on glibc (as malloc is the main cornerstone).
I've also implemented a different scheme to pass P context to
race runtime: scheduler notified race runtime about association
between G and P by calling procwire(g, p)/procunwire(g, p).
But it turned out to be very messy as we have lots of places
where the association changes (e.g. syscalls). So I dropped it
in favor of the current scheme: race runtime asks scheduler
about the current P.
Fixes#14533
Change-Id: Iad10d2f816a44affae1b9fed446b3580eafd8c69
Reviewed-on: https://go-review.googlesource.com/19970
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Runqempty is a critical predicate for scheduler. If runqempty spuriously
returns true, then scheduler can fail to schedule arbitrary number of
runnable goroutines on idle Ps for arbitrary long time. With the addition
of runnext runqempty predicate become broken (can spuriously return true).
Consider that runnext is not nil and the main array is empty. Runqempty
observes that the array is empty, then it is descheduled for some time.
Then queue owner pushes another element to the queue evicting runnext
into the array. Then queue owner pops runnext. Then runqempty resumes
and observes runnext is nil and returns true. But there were no point
in time when the queue was empty.
Fix runqempty predicate to not return true spuriously.
Change-Id: Ifb7d75a699101f3ff753c4ce7c983cf08befd31e
Reviewed-on: https://go-review.googlesource.com/20858
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This updates some comments that became out of date when we moved the
mark bit out of the heap bitmap and started using the high bit for the
first word as a scan/dead bit.
Change-Id: I4a572d16db6114cadff006825466c1f18359f2db
Reviewed-on: https://go-review.googlesource.com/22662
Reviewed-by: Rick Hudson <rlh@golang.org>
MIPS N64 ABI passes arguments in registers R4-R11, return value in R2.
R16-R23, R28, R30 and F24-F31 are callee-save. gcc PIC code expects
to be called with indirect call through R25.
Change-Id: I24f582b4b58e1891ba9fd606509990f95cca8051
Reviewed-on: https://go-review.googlesource.com/19805
Reviewed-by: Minux Ma <minux@golang.org>
SB register (R28) is introduced for access external addresses with shorter
instruction sequences. It is loaded at entry points. External data within
2G of SB can be accessed this way.
cmd/internal/obj: relocaltion R_ADDRMIPS is split into two relocations
R_ADDRMIPS and R_ADDRMIPSU, handling the low 16 bits and the "upper" 16
bits of external addresses, respectively, since the instructios may not
be adjacent. It might be better if relocation Variant could be used.
cmd/link/internal/mips64: support new relocations.
cmd/compile/internal/mips64: reserve SB register.
runtime: initialize SB register at entry points.
Change-Id: I5f34868f88c5a9698c042a8a1f12f76806c187b9
Reviewed-on: https://go-review.googlesource.com/19802
Reviewed-by: Minux Ma <minux@golang.org>
Leave R28 to SB register, which will be introduced in CL 19802.
Change-Id: I1cf7a789695c5de664267ec8086bfb0b043ebc14
Reviewed-on: https://go-review.googlesource.com/19863
Reviewed-by: Minux Ma <minux@golang.org>
With the switch to separate mark bitmaps, the scan/dead bit for the
first word of each object is now unused. Reclaim this bit and use it
as a scan/dead bit, just like words three and on. The second word is
still used for checkmark.
This dramatically simplifies heapBitsSetTypeNoScan and hasPointers,
since they no longer need different cases for 1, 2, and 3+ word
objects. They can instead just manipulate the heap bitmap for the
first word and be done with it.
In order to enable this, we change heapBitsSetType and runGCProg to
always set the scan/dead bit to scan for the first word on every code
path. Since these functions only apply to types that have pointers,
there's no need to do this conditionally: it's *always* necessary to
set the scan bit in the first word.
We also change every place that scans an object and checks if there
are more pointers. Rather than only checking morePointers if the word
is >= 2, we now check morePointers if word != 1 (since that's the
checkmark word).
Looking forward, we should probably reclaim the checkmark bit, too,
but that's going to be quite a bit more work.
Tested by setting doubleCheck in heapBitsSetType and running all.bash
on both linux/amd64 and linux/386, and by running GOGC=10 all.bash.
This particularly improves the FmtFprintf* go1 benchmarks, since they
do a large amount of noscan allocation.
name old time/op new time/op delta
BinaryTree17-12 2.34s ± 1% 2.38s ± 1% +1.70% (p=0.000 n=17+19)
Fannkuch11-12 2.09s ± 0% 2.09s ± 1% ~ (p=0.276 n=17+16)
FmtFprintfEmpty-12 44.9ns ± 2% 44.8ns ± 2% ~ (p=0.340 n=19+18)
FmtFprintfString-12 127ns ± 0% 125ns ± 0% -1.57% (p=0.000 n=16+15)
FmtFprintfInt-12 128ns ± 0% 122ns ± 1% -4.45% (p=0.000 n=15+20)
FmtFprintfIntInt-12 207ns ± 1% 193ns ± 0% -6.55% (p=0.000 n=19+14)
FmtFprintfPrefixedInt-12 197ns ± 1% 191ns ± 0% -2.93% (p=0.000 n=17+18)
FmtFprintfFloat-12 263ns ± 0% 248ns ± 1% -5.88% (p=0.000 n=15+19)
FmtManyArgs-12 794ns ± 0% 779ns ± 1% -1.90% (p=0.000 n=18+18)
GobDecode-12 7.14ms ± 2% 7.11ms ± 1% ~ (p=0.072 n=20+20)
GobEncode-12 5.85ms ± 1% 5.82ms ± 1% -0.49% (p=0.000 n=20+20)
Gzip-12 218ms ± 1% 215ms ± 1% -1.22% (p=0.000 n=19+19)
Gunzip-12 36.8ms ± 0% 36.7ms ± 0% -0.18% (p=0.006 n=18+20)
HTTPClientServer-12 77.1µs ± 4% 77.1µs ± 3% ~ (p=0.945 n=19+20)
JSONEncode-12 15.6ms ± 1% 15.9ms ± 1% +1.68% (p=0.000 n=18+20)
JSONDecode-12 55.2ms ± 1% 53.6ms ± 1% -2.93% (p=0.000 n=17+19)
Mandelbrot200-12 4.05ms ± 1% 4.05ms ± 0% ~ (p=0.306 n=17+17)
GoParse-12 3.14ms ± 1% 3.10ms ± 1% -1.31% (p=0.000 n=19+18)
RegexpMatchEasy0_32-12 69.3ns ± 1% 70.0ns ± 0% +0.89% (p=0.000 n=19+17)
RegexpMatchEasy0_1K-12 237ns ± 1% 236ns ± 0% -0.62% (p=0.000 n=19+16)
RegexpMatchEasy1_32-12 69.5ns ± 1% 70.3ns ± 1% +1.14% (p=0.000 n=18+17)
RegexpMatchEasy1_1K-12 377ns ± 1% 366ns ± 1% -3.03% (p=0.000 n=15+19)
RegexpMatchMedium_32-12 107ns ± 1% 107ns ± 2% ~ (p=0.318 n=20+19)
RegexpMatchMedium_1K-12 33.8µs ± 3% 33.5µs ± 1% -1.04% (p=0.001 n=20+19)
RegexpMatchHard_32-12 1.68µs ± 1% 1.73µs ± 0% +2.50% (p=0.000 n=20+18)
RegexpMatchHard_1K-12 50.8µs ± 1% 52.0µs ± 1% +2.50% (p=0.000 n=19+18)
Revcomp-12 381ms ± 1% 385ms ± 1% +1.00% (p=0.000 n=17+18)
Template-12 64.9ms ± 3% 62.6ms ± 1% -3.55% (p=0.000 n=19+18)
TimeParse-12 324ns ± 0% 328ns ± 1% +1.25% (p=0.000 n=18+18)
TimeFormat-12 345ns ± 0% 334ns ± 0% -3.31% (p=0.000 n=15+17)
[Geo mean] 52.1µs 51.5µs -1.00%
Change-Id: I13e74da3193a7f80794c654f944d1f0d60817049
Reviewed-on: https://go-review.googlesource.com/22632
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This makes this code better self-documenting and makes it easier to
find these places in the future.
Change-Id: I31dc5598ae67f937fb9ef26df92fd41d01e983c3
Reviewed-on: https://go-review.googlesource.com/22631
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
heapBits.bits is carefully written to produce good machine code. Use
it in heapBits.morePointers and heapBits.isPointer to get good machine
code there, too.
Change-Id: I208c7d0d38697e7a22cad67f692162589b75f1e2
Reviewed-on: https://go-review.googlesource.com/22630
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Fix issues introduced in 5f9a870.
Change-Id: Ia75945ef563956613bf88bbe57800a96455c265d
Reviewed-on: https://go-review.googlesource.com/22661
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Add support for the context function set by runtime.SetCgoTraceback.
The context function was added in CL 17761, without support.
This CL is the support.
This CL has not been tested for real C code, as a working context
function for C code requires unwind support that does not seem to exist.
I wanted to get the CL out before the freeze.
I apologize for the length of this CL. It's mostly plumbing, but
unfortunately the plumbing is processor-specific.
Change-Id: I8ce11a0de9b3dafcc29efd2649d776e93bff0e90
Reviewed-on: https://go-review.googlesource.com/22508
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This commit moves the GC from free list allocation to
bit mark allocation. Instead of using the bitmaps
generated during the mark phases to generate free
list and then using the free lists for allocation we
allocate directly from the bitmaps.
The change in the garbage benchmark
name old time/op new time/op delta
XBenchGarbage-12 2.22ms ± 1% 2.13ms ± 1% -3.90% (p=0.000 n=18+18)
Change-Id: I17f57233336f0ca5ef5404c3be4ecb443ab622aa
nextFreeFast is currently not inlined by the compiler due
to its size and complexity. This CL simplifies
nextFreeFast by letting the slow path handle (nextFree)
handle a corner cases.
Change-Id: Ia9c5d1a7912bcb4bec072f5fd240f0e0bafb20e4
Reviewed-on: https://go-review.googlesource.com/22598
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
sweep used to skip mcental.freeSpan (and its locking) if it didn't
find any new free objects. We lost that optimization when the
freed-object counting changed in dad83f7 to count total free objects
instead of newly freed objects.
The previous commit brings back counting of newly freed objects, so we
can easily revive this optimization by checking that count (like we
used to) instead of the total free objects count.
Change-Id: I43658707a1c61674d0366124d5976b00d98741a9
Reviewed-on: https://go-review.googlesource.com/22596
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Commit 8dda1c4 changed the meaning of "nfree" in sweep from the number
of newly freed objects to the total number of free objects in the
span, but didn't update where sweep added nfree to c.local_nsmallfree.
Hence, we're over-accounting the number of frees. This is causing
TestArrayHash to fail with "too many allocs NNN - hash not balanced".
Fix this by computing the number of newly freed objects and adding
that to c.local_nsmallfree, so it behaves like it used to. Computing
this requires a small tweak to mallocgc: apparently we've never set
s.allocCount when allocating a large object; fix this by setting it to
1 so sweep doesn't get confused.
Change-Id: I31902ffd310110da4ffd807c5c06f1117b872dc8
Reviewed-on: https://go-review.googlesource.com/22595
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
We broke tracing of freed objects in GODEBUG=allocfreetrace=1 mode
when we removed the sweep over the mark bitmap. Fix it by
re-introducing the sweep over the bitmap specifically if we're in
allocfreetrace mode. This doesn't have to be even remotely efficient,
since the overhead of allocfreetrace is huge anyway, so we can keep
the code for this down to just a few lines.
Change-Id: I9e176b3b04c73608a0ea3068d5d0cd30760ebd40
Reviewed-on: https://go-review.googlesource.com/22592
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently we always zero objects when we allocate them. We used to
have an optimization that would not zero objects that had not been
allocated since the whole span was last zeroed (either by getting it
from the system or by getting it from the heap, which does a bulk
zero), but this depended on the sweeper clobbering the first two words
of each object. Hence, we lost this optimization when the bitmap
sweeper went away.
Re-introduce this optimization using a different mechanism. Each span
already keeps a flag indicating that it just came from the OS or was
just bulk zeroed by the mheap. We can simply use this flag to know
when we don't need to zero an object. This is slightly less efficient
than the old optimization: if a span gets allocated and partially
used, then GC happens and the span gets returned to the mcentral, then
the span gets re-acquired, the old optimization knew that it only had
to re-zero the objects that had been reclaimed, whereas this
optimization will re-zero everything. However, in this case, you're
already paying for the garbage collection, and you've only wasted one
zeroing of the span, so in practice there seems to be little
difference. (If we did want to revive the full optimization, each span
could keep track of a frontier beyond which all free slots are zeroed.
I prototyped this and it didn't obvious do any better than the much
simpler approach in this commit.)
This significantly improves BinaryTree17, which is allocation-heavy
(and runs first, so most pages are already zeroed), and slightly
improves everything else.
name old time/op new time/op delta
XBenchGarbage-12 2.15ms ± 1% 2.14ms ± 1% -0.80% (p=0.000 n=17+17)
name old time/op new time/op delta
BinaryTree17-12 2.71s ± 1% 2.56s ± 1% -5.73% (p=0.000 n=18+19)
DivconstI64-12 1.70ns ± 1% 1.70ns ± 1% ~ (p=0.562 n=18+18)
DivconstU64-12 1.74ns ± 2% 1.74ns ± 1% ~ (p=0.394 n=20+20)
DivconstI32-12 1.74ns ± 0% 1.74ns ± 0% ~ (all samples are equal)
DivconstU32-12 1.66ns ± 1% 1.66ns ± 0% ~ (p=0.516 n=15+16)
DivconstI16-12 1.84ns ± 0% 1.84ns ± 0% ~ (all samples are equal)
DivconstU16-12 1.82ns ± 0% 1.82ns ± 0% ~ (all samples are equal)
DivconstI8-12 1.79ns ± 0% 1.79ns ± 0% ~ (all samples are equal)
DivconstU8-12 1.60ns ± 0% 1.60ns ± 1% ~ (p=0.603 n=17+19)
Fannkuch11-12 2.11s ± 1% 2.11s ± 0% ~ (p=0.333 n=16+19)
FmtFprintfEmpty-12 45.1ns ± 4% 45.4ns ± 5% ~ (p=0.111 n=20+20)
FmtFprintfString-12 134ns ± 0% 129ns ± 0% -3.45% (p=0.000 n=18+16)
FmtFprintfInt-12 131ns ± 1% 129ns ± 1% -1.54% (p=0.000 n=16+18)
FmtFprintfIntInt-12 205ns ± 2% 203ns ± 0% -0.56% (p=0.014 n=20+18)
FmtFprintfPrefixedInt-12 200ns ± 2% 197ns ± 1% -1.48% (p=0.000 n=20+18)
FmtFprintfFloat-12 256ns ± 1% 256ns ± 0% -0.21% (p=0.008 n=18+20)
FmtManyArgs-12 805ns ± 0% 804ns ± 0% -0.19% (p=0.001 n=18+18)
GobDecode-12 7.21ms ± 1% 7.14ms ± 1% -0.92% (p=0.000 n=19+20)
GobEncode-12 5.88ms ± 1% 5.88ms ± 1% ~ (p=0.641 n=18+19)
Gzip-12 218ms ± 1% 218ms ± 1% ~ (p=0.271 n=19+18)
Gunzip-12 37.1ms ± 0% 36.9ms ± 0% -0.29% (p=0.000 n=18+17)
HTTPClientServer-12 78.1µs ± 2% 77.4µs ± 2% ~ (p=0.070 n=19+19)
JSONEncode-12 15.5ms ± 1% 15.5ms ± 0% ~ (p=0.063 n=20+18)
JSONDecode-12 56.1ms ± 0% 55.4ms ± 1% -1.18% (p=0.000 n=19+18)
Mandelbrot200-12 4.05ms ± 0% 4.06ms ± 0% +0.29% (p=0.001 n=18+18)
GoParse-12 3.28ms ± 1% 3.21ms ± 1% -2.30% (p=0.000 n=20+20)
RegexpMatchEasy0_32-12 69.4ns ± 2% 69.3ns ± 1% ~ (p=0.205 n=18+16)
RegexpMatchEasy0_1K-12 239ns ± 0% 239ns ± 0% ~ (all samples are equal)
RegexpMatchEasy1_32-12 69.4ns ± 1% 69.4ns ± 1% ~ (p=0.620 n=15+18)
RegexpMatchEasy1_1K-12 370ns ± 1% 369ns ± 2% ~ (p=0.088 n=20+20)
RegexpMatchMedium_32-12 108ns ± 0% 108ns ± 0% ~ (all samples are equal)
RegexpMatchMedium_1K-12 33.6µs ± 3% 33.5µs ± 3% ~ (p=0.718 n=20+20)
RegexpMatchHard_32-12 1.68µs ± 1% 1.67µs ± 2% ~ (p=0.316 n=20+20)
RegexpMatchHard_1K-12 50.5µs ± 3% 50.4µs ± 3% ~ (p=0.659 n=20+20)
Revcomp-12 381ms ± 1% 381ms ± 1% ~ (p=0.916 n=19+18)
Template-12 66.5ms ± 1% 65.8ms ± 2% -1.08% (p=0.000 n=20+20)
TimeParse-12 317ns ± 0% 319ns ± 0% +0.48% (p=0.000 n=19+12)
TimeFormat-12 338ns ± 0% 338ns ± 0% ~ (p=0.124 n=19+18)
[Geo mean] 5.99µs 5.96µs -0.54%
Change-Id: I638ffd9d9f178835bbfa499bac20bd7224f1a907
Reviewed-on: https://go-review.googlesource.com/22591
Reviewed-by: Rick Hudson <rlh@golang.org>
This converts all remaining uses of mspan.start to instead use
mspan.base(). In many cases, this actually reduces the complexity of
the code.
Change-Id: If113840e00d3345a6cf979637f6a152e6344aee7
Reviewed-on: https://go-review.googlesource.com/22590
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Currently we have lots of (s.start << _PageShift) and variants. We now
have an s.base() function that returns this. It's faster and more
readable, so use it.
Change-Id: I888060a9dae15ea75ca8cc1c2b31c905e71b452b
Reviewed-on: https://go-review.googlesource.com/22559
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
These used to be used for the list of newly freed objects, but that's
no longer a thing.
Change-Id: I5a4503137b74ec0eae5372ca271b1aa0b32df074
Reviewed-on: https://go-review.googlesource.com/22557
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Our compilers now provides instrinsics including
sys.Ctz64 that support CTZ (count trailing zero)
instructions. This CL replaces the Go versions
of CTZ with the compiler intrinsic.
Count trailing zeros CTZ finds the least
significant 1 in a word and returns the number
of less significant 0s in the word.
Allocation uses the bitmap created by the garbage
collector to locate an unmarked object. The logic
takes a word of the bitmap, complements, and then
caches it. It then uses CTZ to locate an available
unmarked object. It then shifts marked bits out of
the bitmap word preparing it for the next search.
Once all the unmarked objects are used in the
cached work the bitmap gets another word and
repeats the process.
Change-Id: Id2fc42d1d4b9893efaa2e1bd01896985b7e42f82
Reviewed-on: https://go-review.googlesource.com/21366
Reviewed-by: Austin Clements <austin@google.com>
Two changes are included here that are dependent on the other.
The first is that allocBits and gcamrkBits are changed to
a *uint8 which points to the first byte of that span's
mark and alloc bits. Several places were altered to
perform pointer arithmetic to locate the byte corresponding
to an object in the span. The actual bit corresponding
to an object is indexed in the byte by using the lower three
bits of the objects index.
The second change avoids the redundant calculation of an
object's index. The index is returned from heapBitsForObject
and then used by the functions indexing allocBits
and gcmarkBits.
Finally we no longer allocate the gc bits in the span
structures. Instead we use an arena based allocation scheme
that allows for a more compact bit map as well as recycling
and bulk clearing of the mark bits.
Change-Id: If4d04b2021c092ec39a4caef5937a8182c64dfef
Reviewed-on: https://go-review.googlesource.com/20705
Reviewed-by: Austin Clements <austin@google.com>
The _SigUnblock flag was appended to SIGSYS slot of runtime signal table
for Linux in https://go-review.googlesource.com/22202, but there is
still no concrete opinion on whether SIGSYS must be an unblocked signal
for runtime.
This change removes _SigUnblock flag from SIGSYS on Linux for
consistency in runtime signal handling and adds a reference to #15204 to
runtime signal table for FreeBSD.
Updates #15204.
Change-Id: I42992b1d852c2ab5dd37d6dbb481dba46929f665
Reviewed-on: https://go-review.googlesource.com/22537
Reviewed-by: Ian Lance Taylor <iant@golang.org>
It wasn't rendering as HTML nicely.
Change-Id: I5408ec22932a05e85c210c0faa434bd19dce5650
Reviewed-on: https://go-review.googlesource.com/22532
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The complexity of the GC work buffers put and tryGet
prevented them from being inlined. This CL simplifies
the fast path thus enabling inlining. If the fast
path does not succeed the previous put and tryGet
functions are called.
Change-Id: I6da6495d0dadf42bd0377c110b502274cc01acf5
Reviewed-on: https://go-review.googlesource.com/20704
Reviewed-by: Austin Clements <austin@google.com>
Prior to this CL the base of a span was calculated in various
places using shifts or calls to base(). This CL now
always calls base() which has been optimized to calculate the
base of the span when the span is initialized and store that
value in the span structure.
Change-Id: I661f2bfa21e3748a249cdf049ef9062db6e78100
Reviewed-on: https://go-review.googlesource.com/20703
Reviewed-by: Austin Clements <austin@google.com>
Prior to this CL the sweep phase was responsible for locating
all objects that were about to be freed and calling a function
to process the object. This was done by the function
heapBitsSweepSpan. Part of processing included calls to
tracefree and msanfree as well as counting how many objects
were freed.
The calls to tracefree and msanfree have been moved into the
gcmalloc routine and called when the object is about to be
reallocated. The counting of free objects has been optimized
using an array based popcnt algorithm and if all the objects
in a span are free then span is freed.
Similarly the code to locate the next free object has been
optimized to use an array based ctz (count trailing zero).
Various hot paths in the allocation logic have been optimized.
At this point the garbage benchmark is within 3% of the 1.6
release.
Change-Id: I00643c442e2ada1685c010c3447e4ea8537d2dfa
Reviewed-on: https://go-review.googlesource.com/20201
Reviewed-by: Austin Clements <austin@google.com>
Add to each span a 64 bit cache (allocCache) of the allocBits
at freeindex. allocCache is shifted such that the lowest bit
corresponds to the bit freeindex. allocBits uses a 0 to
indicate an object is free, on the other hand allocCache
uses a 1 to indicate an object is free. This facilitates
ctz64 (count trailing zero) which counts the number of 0s
trailing the least significant 1. This is also the index of
the least significant 1.
Each span maintains a freeindex indicating the boundary
between allocated objects and unallocated objects. allocCache
is shifted as freeindex is incremented such that the low bit
in allocCache corresponds to the bit a freeindex in the
allocBits array.
Currently ctz64 is written in Go using a for loop so it is
not very efficient. Use of the hardware instruction will
follow. With this in mind comparisons of the garbage
benchmark are as follows.
1.6 release 2.8 seconds
dev:garbage branch 3.1 seconds.
Profiling shows the go implementation of ctz64 takes up
1% of the total time.
Change-Id: If084ed9c3b1eda9f3c6ab2e794625cb870b8167f
Reviewed-on: https://go-review.googlesource.com/20200
Reviewed-by: Austin Clements <austin@google.com>
Most (all?) processors that Go supports supply a hardware
instruction that takes a byte and returns the number
of zeros trailing the first 1 encountered, or 8
if no ones are found. This is the index within the
byte of the first 1 encountered. CTZ should improve the
performance of the nextFreeIndex function.
Since nextFreeIndex wants the next unmarked (0) bit
a bit-wise complement is needed before calling ctz.
Furthermore unmarked bits associated with previously
allocated objects need to be ignored. Instead of writing
a 1 as we allocate the code masks all bits less than the
freeindex after loading the byte.
While this CL does not actual execute a CTZ instruction
it supplies a ctz function with the appropiate signature
along with the logic to execute it.
Change-Id: I5c55ce0ed48ca22c21c4dd9f969b0819b4eadaa7
Reviewed-on: https://go-review.googlesource.com/20169
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This is a renaming of the field ref to the
more appropriate allocCount. The field
holds the number of objects in the span
that are currently allocated. Some throws
strings were adjusted to more accurately
convey the meaning of allocCount.
Change-Id: I10daf44e3e9cc24a10912638c7de3c1984ef8efe
Reviewed-on: https://go-review.googlesource.com/19518
Reviewed-by: Austin Clements <austin@google.com>
Instead of building a freelist from the mark bits generated
by the GC this CL allocates directly from the mark bits.
The approach moves the mark bits from the pointer/no pointer
heap structures into their own per span data structures. The
mark/allocation vectors consist of a single mark bit per
object. Two vectors are maintained, one for allocation and
one for the GC's mark phase. During the GC cycle's sweep
phase the interpretation of the vectors is swapped. The
mark vector becomes the allocation vector and the old
allocation vector is cleared and becomes the mark vector that
the next GC cycle will use.
Marked entries in the allocation vector indicate that the
object is not free. Each allocation vector maintains a boundary
between areas of the span already allocated from and areas
not yet allocated from. As objects are allocated this boundary
is moved until it reaches the end of the span. At this point
further allocations will be done from another span.
Since we no longer sweep a span inspecting each freed object
the responsibility for maintaining pointer/scalar bits in
the heapBitMap containing is now the responsibility of the
the routines doing the actual allocation.
This CL is functionally complete and ready for performance
tuning.
Change-Id: I336e0fc21eef1066e0b68c7067cc71b9f3d50e04
Reviewed-on: https://go-review.googlesource.com/19470
Reviewed-by: Austin Clements <austin@google.com>
The gcmarkBits is a bit vector used by the GC to mark
reachable objects. Once a GC cycle is complete the gcmarkBits
swap places with the allocBits. allocBits is then used directly
by malloc to locate free objects, thus avoiding the
construction of a linked free list. This CL introduces a set
of helper functions for manipulating gcmarkBits and allocBits
that will be used by later CLs to realize the actual
algorithm. Minimal attempts have been made to optimize these
helper routines.
Change-Id: I55ad6240ca32cd456e8ed4973c6970b3b882dd34
Reviewed-on: https://go-review.googlesource.com/19420
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
In preparation for changing how the next free object is chosen
refactor and consolidate code into a single function.
Change-Id: I6836cd88ed7cbf0b2df87abd7c1c3b9fabc1cbd8
Reviewed-on: https://go-review.googlesource.com/19317
Reviewed-by: Austin Clements <austin@google.com>
The freelist for normal objects and the freelist
for stacks share the same mspan field for holding
the list head but are operated on by different code
sequences. This overloading complicates the use of bit
vectors for allocation of normal objects. This change
refactors the use of the stackfreelist out from the
use of freelist.
Change-Id: I5b155b5b8a1fcd8e24c12ee1eb0800ad9b6b4fa0
Reviewed-on: https://go-review.googlesource.com/19315
Reviewed-by: Austin Clements <austin@google.com>
The bitmap allocation data structure prototypes. Before
this is released these underlying data structures need
to be more performant but the signatures of helper
functions utilizing these structures will remain stable.
Change-Id: I5ace12f2fb512a7038a52bbde2bfb7e98783bcbe
Reviewed-on: https://go-review.googlesource.com/19221
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
These are used at the bottom level of various GC operations that must
not be preempted. To be on the safe side, mark them all nosplit.
Change-Id: I8f7360e79c9852bd044df71413b8581ad764380c
Reviewed-on: https://go-review.googlesource.com/22504
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Fixes#15468
Change-Id: I8723171f87774a98d5e80e7832ebb96dd1fbea74
Reviewed-on: https://go-review.googlesource.com/22524
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
Now it is possible to build a c-archive as PIC on darwin/arm (this is
now the default). Then the system linker can link the binary using
the archive as PIE.
Fixes#12896.
Change-Id: Iad84131572422190f5fa036e7d71910dc155f155
Reviewed-on: https://go-review.googlesource.com/22461
Reviewed-by: David Crawshaw <crawshaw@golang.org>
TestNoRaceIOHttp does all kinds of bad things:
1. Binds to a fixed port, so concurrent tests fail.
2. Registers HTTP handler multiple times, so repeated tests fail.
3. Relies on sleep to wait for listen.
Fix all of that.
Change-Id: I1210b7797ef5e92465b37dc407246d92a2a24fe8
Reviewed-on: https://go-review.googlesource.com/19953
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Currently we clear gcscanvalid in both casgstatus and
casfrom_Gscanstatus if the new status is _Grunning. This is very
important to do in casgstatus. However, this is potentially wrong in
casfrom_Gscanstatus because in this case the caller doesn't own gp and
hence the write is racy. Unlike the other _Gscan statuses, during
_Gscanrunning, the G is still running. This does not indicate that
it's transitioning into a running state. The scan simply hasn't
happened yet, so it's neither valid nor invalid.
Conveniently, this also means clearing gcscanvalid is unnecessary in
this case because the G was already in _Grunning, so we can simply
remove this code. What will happen instead is that the G will be
preempted to scan itself, that scan will set gcscanvalid to true, and
then the G will return to _Grunning via casgstatus, clearing
gcscanvalid.
This fix will become necessary shortly when we start keeping track of
the set of G's with dirty stacks, since it will no longer be
idempotent to simply set gcscanvalid to false.
Change-Id: I688c82e6fbf00d5dbbbff49efa66acb99ee86785
Reviewed-on: https://go-review.googlesource.com/20669
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This adds a best-effort pass to remove stack barriers immediately
after the end of mark termination. This isn't necessary for the Go
runtime, but should help external tools that perform stack walks but
aren't aware of Go's stack barriers such as GDB, perf, and VTune.
(Though clearly they'll still have trouble unwinding stacks during
mark.)
Change-Id: I66600fae1f03ee36b5459d2b00dcc376269af18e
Reviewed-on: https://go-review.googlesource.com/20668
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently we remove stack barriers during STW mark termination, which
has a non-trivial per-goroutine cost and means that we have to touch
even clean stacks during mark termination. However, there's no problem
with leaving them in during the sweep phase. They just have to be out
by the time we install new stack barriers immediately prior to
scanning the stack such as during the mark phase of the next GC cycle
or during mark termination in a STW GC.
Hence, move the gcRemoveStackBarriers from STW mark termination to
just before we install new stack barriers during concurrent mark. This
removes the cost from STW. Furthermore, this combined with concurrent
stack shrinking means that the mark termination scan of a clean stack
is a complete no-op, which will make it possible to skip clean stacks
entirely during mark termination.
This has the downside that it will mess up anything outside of Go that
tries to walk Go stacks all the time instead of just some of the time.
This includes tools like GDB, perf, and VTune. We'll improve the
situation shortly.
Change-Id: Ia40baad8f8c16aeefac05425e00b0cf478137097
Reviewed-on: https://go-review.googlesource.com/20667
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently we enqueue span root mark jobs during both concurrent mark
and mark termination, but we make the job a no-op during mark
termination.
This is silly. Instead of queueing them up just to not do them, don't
queue them up in the first place.
Change-Id: Ie1d36de884abfb17dd0db6f0449a2b7c997affab
Reviewed-on: https://go-review.googlesource.com/20666
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently we free cached stacks of dead Gs during STW stack root
marking. We do this during STW because there's no way to take
ownership of a particular dead G, so attempting to free a dead G's
stack during concurrent stack root marking could race with reusing
that G.
However, we can do this concurrently if we take a completely different
approach. One way to prevent reuse of a dead G is to remove it from
the free G list. Hence, this adds a new fixed root marking task that
simply removes all Gs from the list of dead Gs with cached stacks,
frees their stacks, and then adds them to the list of dead Gs without
cached stacks.
This is also a necessary step toward rescanning only dirty stacks,
since it eliminates another task from STW stack marking.
Change-Id: Iefbad03078b284a2e7bf30fba397da4ca87fe095
Reviewed-on: https://go-review.googlesource.com/20665
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently all free Gs are added to one list. Split this into two
lists: one for free Gs with cached stacks and one for Gs without
cached stacks.
This lets us preferentially allocate Gs that already have a stack, but
more importantly, it sets us up to free cached G stacks concurrently.
Change-Id: Idbe486f708997e1c9d166662995283f02d1eeb3c
Reviewed-on: https://go-review.googlesource.com/20664
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Also adds TestGdbBacktrace to the runtime package.
Dwarf modifications written by Bryan Chan (@bryanpkc) who is also
at IBM and covered by the same CLA.
Fixes#14628
Change-Id: I106a1f704c3745a31f29cdadb0032e3905829850
Reviewed-on: https://go-review.googlesource.com/20193
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The code sequence for large-offset floating-point stores
includes adding the base pointer to r11. Make sure we
can interpret that instruction correctly.
Fixes build.
Fixes#15440
Change-Id: I7fe5a4a57e08682967052bf77c54e0ec47fcb53e
Reviewed-on: https://go-review.googlesource.com/22440
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Zero the entire buffer so we don't need to
lower its capacity upon return. This lets callers
do some appending without allocation.
Zeroing is cheap, the byte buffer requires only
4 extra instructions.
Fixes#14235
Change-Id: I970d7badcef047dafac75ac17130030181f18fe2
Reviewed-on: https://go-review.googlesource.com/22424
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
On some processors cputicks (used to generate trace timestamps)
produce non-monotonic timestamps. It is important that the parser
distinguishes logically inconsistent traces (e.g. missing, excessive
or misordered events) from broken timestamps. The former is a bug
in tracer, the latter is a machine issue.
Test that (1) parser does not return a logical error in case of
broken timestamps and (2) broken timestamps are eventually detected
and reported.
Change-Id: Ib4b1eb43ce128b268e754400ed8b5e8def04bd78
Reviewed-on: https://go-review.googlesource.com/21608
Reviewed-by: Austin Clements <austin@google.com>
Currently tracer uses global sequencer and it introduces
significant slowdown on parallel machines (up to 10x).
Replace the global sequencer with per-goroutine sequencer.
If we assign per-goroutine sequence numbers to only 3 types
of events (start, unblock and syscall exit), it is enough to
restore consistent partial ordering of all events. Even these
events don't need sequence numbers all the time (if goroutine
starts on the same P where it was unblocked, then start does
not need sequence number).
The burden of restoring the order is put on trace parser.
Details of the algorithm are described in the comments.
On http benchmark with GOMAXPROCS=48:
no tracing: 5026 ns/op
tracing: 27803 ns/op (+453%)
with this change: 6369 ns/op (+26%, mostly for traceback)
Also trace size is reduced by ~22%. Average event size before: 4.63
bytes/event, after: 3.62 bytes/event.
Besides running trace tests, I've also tested with manually broken
cputicks (random skew for each event, per-P skew and episodic random skew).
In all cases broken timestamps were detected and no test failures.
Change-Id: I078bde421ccc386a66f6c2051ab207bcd5613efa
Reviewed-on: https://go-review.googlesource.com/21512
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The special case was because PPC did not support external linking, but
now it does.
Fixes#10410.
Change-Id: I9b024686e0f03da7a44c1c59b41c529802f16ab0
Reviewed-on: https://go-review.googlesource.com/22372
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently when we compute the trigger for the next GC, we do it based
on an estimate of the reachable heap size at the start of the GC
cycle, which is itself based on an estimate of the floating garbage.
This was introduced by 4655aad to fix a bad feedback loop that allowed
the heap to grow to many times the true reachable size.
However, this estimate gets easily confused by rapidly allocating
applications, and, worse it's different than the heap size the trigger
controller uses to compute the trigger itself. This results in the
trigger controller often thinking that GC finished before it started.
Since this would be a pretty great outcome from it's perspective, it
sets the trigger for the next cycle as close to the next goal as
possible (which is limited to 95% of the goal).
Furthermore, the bad feedback loop this estimate originally fixed
seems not to happen any more, suggesting it was fixed more correctly
by some other change in the mean time. Finally, with the change to
allocate black, it shouldn't even be theoretically possible for this
bad feedback loop to occur.
Hence, eliminate the floating garbage estimate and simply consider the
reachable heap to be the marked heap. This harms overall throughput
slightly for allocation-heavy benchmarks, but significantly improves
mutator availability.
Fixes#12204. This brings the average trigger in this benchmark from
0.95 (the cap) to 0.7 and the active GC utilization from ~90% to ~45%.
Updates #14951. This makes the trigger controller much better behaved,
so it pulls the trigger lower if assists are consuming a lot of CPU
like it's supposed to, increasing mutator availability.
name old time/op new time/op delta
XBenchGarbage-12 2.21ms ± 1% 2.28ms ± 3% +3.29% (p=0.000 n=17+17)
Some of this slow down we paid for in earlier commits. Relative to the
start of the series to switch to allocate-black (the parent of "count
black allocations toward scan work"), the garbage benchmark is 2.62%
slower.
name old time/op new time/op delta
BinaryTree17-12 2.53s ± 3% 2.53s ± 3% ~ (p=0.708 n=20+19)
Fannkuch11-12 2.08s ± 0% 2.08s ± 0% -0.22% (p=0.002 n=19+18)
FmtFprintfEmpty-12 45.3ns ± 2% 45.2ns ± 3% ~ (p=0.505 n=20+20)
FmtFprintfString-12 129ns ± 0% 131ns ± 2% +1.80% (p=0.000 n=16+19)
FmtFprintfInt-12 121ns ± 2% 121ns ± 2% ~ (p=0.768 n=19+19)
FmtFprintfIntInt-12 186ns ± 1% 188ns ± 3% +0.99% (p=0.000 n=19+19)
FmtFprintfPrefixedInt-12 188ns ± 1% 188ns ± 1% ~ (p=0.947 n=18+16)
FmtFprintfFloat-12 254ns ± 1% 255ns ± 1% +0.30% (p=0.002 n=19+17)
FmtManyArgs-12 763ns ± 0% 770ns ± 0% +0.92% (p=0.000 n=18+18)
GobDecode-12 7.00ms ± 1% 7.04ms ± 1% +0.61% (p=0.049 n=20+20)
GobEncode-12 5.88ms ± 1% 5.88ms ± 0% ~ (p=0.641 n=18+19)
Gzip-12 214ms ± 1% 215ms ± 1% +0.43% (p=0.002 n=18+19)
Gunzip-12 37.6ms ± 0% 37.6ms ± 0% +0.11% (p=0.015 n=17+18)
HTTPClientServer-12 76.9µs ± 2% 78.1µs ± 2% +1.44% (p=0.000 n=20+18)
JSONEncode-12 15.2ms ± 2% 15.1ms ± 1% ~ (p=0.271 n=19+18)
JSONDecode-12 53.1ms ± 1% 53.3ms ± 0% +0.49% (p=0.000 n=18+19)
Mandelbrot200-12 4.04ms ± 1% 4.03ms ± 0% -0.33% (p=0.005 n=18+18)
GoParse-12 3.29ms ± 1% 3.28ms ± 1% ~ (p=0.146 n=16+17)
RegexpMatchEasy0_32-12 69.9ns ± 3% 69.5ns ± 1% ~ (p=0.785 n=20+19)
RegexpMatchEasy0_1K-12 237ns ± 0% 237ns ± 0% ~ (p=1.000 n=18+18)
RegexpMatchEasy1_32-12 69.5ns ± 1% 69.2ns ± 1% -0.44% (p=0.020 n=16+19)
RegexpMatchEasy1_1K-12 372ns ± 1% 371ns ± 2% ~ (p=0.086 n=20+19)
RegexpMatchMedium_32-12 108ns ± 3% 107ns ± 1% -1.00% (p=0.004 n=19+14)
RegexpMatchMedium_1K-12 34.2µs ± 4% 34.0µs ± 2% ~ (p=0.380 n=19+20)
RegexpMatchHard_32-12 1.77µs ± 4% 1.76µs ± 3% ~ (p=0.558 n=18+20)
RegexpMatchHard_1K-12 53.4µs ± 4% 52.8µs ± 2% -1.10% (p=0.020 n=18+20)
Revcomp-12 359ms ± 4% 377ms ± 0% +5.19% (p=0.000 n=20+18)
Template-12 63.7ms ± 2% 62.9ms ± 2% -1.27% (p=0.005 n=18+20)
TimeParse-12 316ns ± 2% 313ns ± 1% ~ (p=0.059 n=20+16)
TimeFormat-12 329ns ± 0% 331ns ± 0% +0.39% (p=0.000 n=16+18)
[Geo mean] 51.6µs 51.7µs +0.18%
Change-Id: I1dce4640c8205d41717943b021039fffea863c57
Reviewed-on: https://go-review.googlesource.com/21324
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently we allocate white for most of concurrent marking. This is
based on the classical argument that it produces less floating
garbage, since allocations during GC may not get linked into the heap
and allocating white lets us reclaim these. However, it's not clear
how often this actually happens, especially since our write barrier
shades any pointer as soon as it's installed in the heap regardless of
the color of the slot.
On the other hand, allocating black has several advantages that seem
to significantly outweigh this downside.
1) It naturally bounds the total scan work to the live heap size at
the start of a GC cycle. Allocating white does not, and thus depends
entirely on assists to prevent the heap from growing faster than it
can be scanned.
2) It reduces the total amount of scan work per GC cycle by the size
of newly allocated objects that are linked into the heap graph, since
objects allocated black never need to be scanned.
3) It reduces total write barrier work since more objects will already
be black when they are linked into the heap graph.
This gives a slight overall improvement in benchmarks.
name old time/op new time/op delta
XBenchGarbage-12 2.24ms ± 0% 2.21ms ± 1% -1.32% (p=0.000 n=18+17)
name old time/op new time/op delta
BinaryTree17-12 2.60s ± 3% 2.53s ± 3% -2.56% (p=0.000 n=20+20)
Fannkuch11-12 2.08s ± 1% 2.08s ± 0% ~ (p=0.452 n=19+19)
FmtFprintfEmpty-12 45.1ns ± 2% 45.3ns ± 2% ~ (p=0.367 n=19+20)
FmtFprintfString-12 131ns ± 3% 129ns ± 0% -1.60% (p=0.000 n=20+16)
FmtFprintfInt-12 122ns ± 0% 121ns ± 2% -0.86% (p=0.000 n=16+19)
FmtFprintfIntInt-12 187ns ± 1% 186ns ± 1% ~ (p=0.514 n=18+19)
FmtFprintfPrefixedInt-12 189ns ± 0% 188ns ± 1% -0.54% (p=0.000 n=16+18)
FmtFprintfFloat-12 256ns ± 0% 254ns ± 1% -0.43% (p=0.000 n=17+19)
FmtManyArgs-12 769ns ± 0% 763ns ± 0% -0.72% (p=0.000 n=18+18)
GobDecode-12 7.08ms ± 2% 7.00ms ± 1% -1.22% (p=0.000 n=20+20)
GobEncode-12 5.88ms ± 0% 5.88ms ± 1% ~ (p=0.406 n=18+18)
Gzip-12 214ms ± 0% 214ms ± 1% ~ (p=0.103 n=17+18)
Gunzip-12 37.6ms ± 0% 37.6ms ± 0% ~ (p=0.563 n=17+17)
HTTPClientServer-12 77.2µs ± 3% 76.9µs ± 2% ~ (p=0.606 n=20+20)
JSONEncode-12 15.1ms ± 1% 15.2ms ± 2% ~ (p=0.138 n=19+19)
JSONDecode-12 53.3ms ± 1% 53.1ms ± 1% -0.33% (p=0.000 n=19+18)
Mandelbrot200-12 4.04ms ± 1% 4.04ms ± 1% ~ (p=0.075 n=19+18)
GoParse-12 3.30ms ± 1% 3.29ms ± 1% -0.57% (p=0.000 n=18+16)
RegexpMatchEasy0_32-12 69.5ns ± 1% 69.9ns ± 3% ~ (p=0.822 n=18+20)
RegexpMatchEasy0_1K-12 237ns ± 1% 237ns ± 0% ~ (p=0.398 n=19+18)
RegexpMatchEasy1_32-12 69.8ns ± 2% 69.5ns ± 1% ~ (p=0.090 n=20+16)
RegexpMatchEasy1_1K-12 371ns ± 1% 372ns ± 1% ~ (p=0.178 n=19+20)
RegexpMatchMedium_32-12 108ns ± 2% 108ns ± 3% ~ (p=0.124 n=20+19)
RegexpMatchMedium_1K-12 33.9µs ± 2% 34.2µs ± 4% ~ (p=0.309 n=20+19)
RegexpMatchHard_32-12 1.75µs ± 2% 1.77µs ± 4% +1.28% (p=0.018 n=19+18)
RegexpMatchHard_1K-12 52.7µs ± 1% 53.4µs ± 4% +1.23% (p=0.013 n=15+18)
Revcomp-12 354ms ± 1% 359ms ± 4% +1.27% (p=0.043 n=20+20)
Template-12 63.6ms ± 2% 63.7ms ± 2% ~ (p=0.654 n=20+18)
TimeParse-12 313ns ± 1% 316ns ± 2% +0.80% (p=0.014 n=17+20)
TimeFormat-12 332ns ± 0% 329ns ± 0% -0.66% (p=0.000 n=16+16)
[Geo mean] 51.7µs 51.6µs -0.09%
Change-Id: I2214a6a0e4f544699ea166073249a8efdf080dc0
Reviewed-on: https://go-review.googlesource.com/21323
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently allocating black switches to the system stack (which is
probably a historical accident) and atomically updates the global
bytes marked stat. Since we're about to depend on this much more,
optimize it a bit by putting it back on the regular stack and updating
the per-P bytes marked stat, which gets lazily folded into the global
bytes marked stat.
Change-Id: Ibbe16e5382d3fd2256e4381f88af342bf7020b04
Reviewed-on: https://go-review.googlesource.com/22170
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently we count black allocations toward the scannable heap size,
but not toward the scan work we've done so far. This is clearly
inconsistent (we have, in effect, scanned these allocations and since
they're already black, we're not going to scan them again). Worse, it
means we don't count black allocations toward the scannable heap size
as of the *next* GC because this is based on the amount of scan work
we did in this cycle.
Fix this by counting black allocations as scan work. Currently the GC
spends very little time in allocate-black mode, so this probably
hasn't been a problem, but this will become important when we switch
to always allocating black.
Change-Id: If6ff693b070c385b65b6ecbbbbf76283a0f9d990
Reviewed-on: https://go-review.googlesource.com/22119
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Consistently use type int for the size argument of
runtime.newarray, runtime.reflect_unsafe_NewArray
and reflect.unsafe_NewArray.
Change-Id: Ic77bf2dde216c92ca8c49462f8eedc0385b6314e
Reviewed-on: https://go-review.googlesource.com/22311
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
mapaccess{1,2} returns a pointer to the value. When the key
is not in the map, it returns a pointer to zeroed memory.
Currently, for large map values we have a complicated scheme which
dynamically allocates zeroed memory for this purpose. It is ugly
code and requires an atomic.Load in a bunch of places we'd rather
not have it.
Switch to a scheme where callsites of mapaccess{1,2} which expect
large return values pass in a pointer to zeroed memory that
mapaccess can return if the key is not found. This avoids the
atomic.Load on all map accesses with a few extra instructions only
for the large value acccesses, plus a bit of bss space.
There was a time (1.4 & 1.5?) where we did something like this but
all the tricks to make the right size zero value were done by the
linker. That scheme broke in the presence of dyamic linking.
The scheme in this CL works even when dynamic linking.
Fixes#12337
Change-Id: Ic2d0319944af33bbb59785938d9ab80958d1b4b1
Reviewed-on: https://go-review.googlesource.com/22221
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
mallocgc can calculate noscan itself. The only remaining
flag argument is needzero, so we just make that a boolean arg.
Fixes#15379
Change-Id: I839a70790b2a0c9dbcee2600052bfbd6c8148e20
Reviewed-on: https://go-review.googlesource.com/22290
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
No point in passing the slice type to these functions.
All they need is the element type. One less indirection,
maybe a few less []T type descriptors in the binary.
Change-Id: Ib0b83b5f14ca21d995ecc199ce8ac00c4eb375e6
Reviewed-on: https://go-review.googlesource.com/22275
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
The extra checks provided by newarray are
redundant in these cases.
This shrinks by one frame the call stack expected
by the pprof test.
name old time/op new time/op delta
MakeSlice-8 34.3ns ± 2% 30.5ns ± 3% -11.03% (p=0.000 n=24+22)
GrowSlicePtr-8 134ns ± 2% 129ns ± 3% -3.25% (p=0.000 n=25+24)
Change-Id: Icd828655906b921c732701fd9d61da3fa217b0af
Reviewed-on: https://go-review.googlesource.com/22276
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
On GNU/Linux, SIGSYS is specified to cause the process to terminate
without a core dump. In https://codereview.appspot.com/3749041 , it
appears that Golang accidentally introduced incorrect behavior for
this signal, which caused Golang processes to keep running after
receiving SIGSYS. This change reverts it to the old/correct behavior.
Updates #15204
Change-Id: I3aa48a9499c1bc36fa5d3f40c088fdd7599e0db5
Reviewed-on: https://go-review.googlesource.com/22202
Reviewed-by: Ian Lance Taylor <iant@golang.org>
We now inline type to interface conversions when the type
is pointer-shaped. No need to keep code to handle that in
convT2{I,E}.
Change-Id: I3a6668259556077cbb2986a9e8fe42a625d506c9
Reviewed-on: https://go-review.googlesource.com/22249
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Introduce and start using nameOff for two encoded names. This pair
of changes is best done together because the linker's method decoder
expects the method layouts to match.
Precursor to converting all existing name and *string fields to
nameOff.
linux/amd64:
cmd/go: -45KB (0.5%)
jujud: -389KB (0.6%)
linux/amd64 PIE:
cmd/go: -170KB (1.4%)
jujud: -1.5MB (1.8%)
For #6853.
Change-Id: Ia044423f010fb987ce070b94c46a16fc78666ff6
Reviewed-on: https://go-review.googlesource.com/21396
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Currently the scavenger marks memory unused in multiples of the
allocator page size (8K). This is safe as long as the true physical
page size is 4K (or 8K), as it is on many platforms. However, on
ARM64, PPC64x, and MIPS64, the physical page size is larger than 8K,
so if we attempt to mark memory unused, the kernel will round the
boundaries of the region *out* to all pages covered by the requested
region, and we'll release a larger region of memory than intended. As
a result, the scavenger is currently disabled on these platforms.
Fix this by first rounding the region to be marked unused *in* to
multiples of the physical page size, so that when we ask the kernel to
mark it unused, it releases exactly the requested region.
Fixes#9993.
Change-Id: I96d5fdc2f77f9d69abadcea29bcfe55e68288cb1
Reviewed-on: https://go-review.googlesource.com/22066
Reviewed-by: Rick Hudson <rlh@golang.org>
If sysUnused is passed an address or length that is not aligned to the
physical page boundary, the kernel will unmap more memory than the
caller wanted. Add a check for this.
For #9993.
Change-Id: I68ff03032e7b65cf0a853fe706ce21dc7f2aaaf8
Reviewed-on: https://go-review.googlesource.com/22065
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
The runtime hard-codes an assumed physical page size. If this is
smaller than the kernel's page size or not a multiple of it, sysUnused
may incorrectly release more memory to the system than intended.
Add a runtime startup check that the runtime's assumed physical page
is compatible with the kernel's physical page size.
For #9993.
Change-Id: Ida9d07f93c00ca9a95dd55fc59bf0d8a607f6728
Reviewed-on: https://go-review.googlesource.com/22064
Reviewed-by: Rick Hudson <rlh@golang.org>
archauxv no longer does anything on 386, so remove it.
Change-Id: I94545238e40fa6a6832a7c3b40aedfc6c1f6a97b
Reviewed-on: https://go-review.googlesource.com/22063
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The Linux kernel provides 16 bytes of random data via the auxv vector
at startup. Currently we consume this separately on 386, amd64, arm,
and arm64. Now that we have a common auxv parser, handle _AT_RANDOM in
the common path.
Change-Id: Ib69549a1d37e2d07a351cf0f44007bcd24f0d20d
Reviewed-on: https://go-review.googlesource.com/22062
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Currently several different Linux architectures have separate copies
of the auxv parser. Bring these all together into a single copy of the
parser that calls out to a per-arch handler for each tag/value pair.
This is in preparation for handling common auxv tags in one place.
For #9993.
Change-Id: Iceebc3afad6b4133b70fca7003561ae370445c10
Reviewed-on: https://go-review.googlesource.com/22061
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
https://golang.org/cl/10173 intrduced msigsave, ensureSigM and
_SigUnblock but didn't enable the new signal save/restore mechanism for
SIG{HUP,INT,QUIT,ABRT,TERM} on DragonFly BSD, FreeBSD and OpenBSD.
At present, it looks like they have the implementation. This change
enables the new mechanism on DragonFly BSD, FreeBSD and OpenBSD the same
as Darwin, NetBSD.
Change-Id: Ifb4b4743b3b4f50bfcdc7cf1fe1b59c377fa2a41
Reviewed-on: https://go-review.googlesource.com/18657
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
sync/atomic.StorePointer (which is implemented in
runtime/atomic_pointer.go) writes the pointer twice (through two
completely different code paths, no less). Fix it to only write once.
Change-Id: Id3b2aef9aa9081c2cf096833e001b93d3dd1f5da
Reviewed-on: https://go-review.googlesource.com/21999
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
SwapPointer is declared as
func SwapPointer(addr *unsafe.Pointer, new unsafe.Pointer) (old unsafe.Pointer)
in sync/atomic, but defined in the runtime (where it's actually
implemented) as
func sync_atomic_SwapPointer(ptr unsafe.Pointer, new unsafe.Pointer) unsafe.Pointer
Make ptr a *unsafe.Pointer in the runtime definition to match the type
in sync/atomic.
Change-Id: I99bab651b995001bbe54f9e790fdef2417ef0e9e
Reviewed-on: https://go-review.googlesource.com/21998
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
Use deBruijn sequences to count low-order zeros.
Reorg bswap to not use &^, it takes another instruction on x86.
Change-Id: I4a5ed9fd16ee6a279d88c067e8a2ba11de821156
Reviewed-on: https://go-review.googlesource.com/22084
Reviewed-by: David Chase <drchase@google.com>
These comments were left behind after runtime.h was converted
from C to Go. I examined the original code and tried to move these
to the places that the most sense.
Change-Id: I8769d60234c0113d682f9de3bd8d6c34c450c188
Reviewed-on: https://go-review.googlesource.com/21969
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
By replacing the *string used to represent pkgPath with a
reflect.name everywhere, the embedded *string for package paths
inside the reflect.name can be replaced by an offset, nameOff.
This reduces the number of pointers in the type information.
This also moves all reflect.name types into the same section, making
it possible to use nameOff more widely in later CLs.
No significant binary size change for normal binaries, but:
linux/amd64 PIE:
cmd/go: -440KB (3.7%)
jujud: -2.6MB (3.2%)
For #6853.
Change-Id: I3890b132a784a1090b1b72b32febfe0bea77eaee
Reviewed-on: https://go-review.googlesource.com/21395
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Merge them together into os1_darwin.go. A future CL will rename it.
Change-Id: Ia4380d3296ebd5ce210908ce3582ff184566f692
Reviewed-on: https://go-review.googlesource.com/22004
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Make it clear that the point of this function stores a pointer
*without* a write barrier.
sed -i -e 's/Storep1/StorepNoWB/' $(git grep -l Storep1)
Updates #15270.
Change-Id: Ifad7e17815e51a738070655fe3b178afdadaecf6
Reviewed-on: https://go-review.googlesource.com/21994
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Michael Matloob <matloob@golang.org>
atomic.Storep1 is not supposed to invoke a write barrier (that's what
atomicstorep is for), but currently does on s390x. This causes a panic
in runtime.mapzero when it tries to use atomic.Storep1 to store what's
actually a scalar.
Fix this by eliminating the write barrier from atomic.Storep1 on
s390x. Also add some documentation to atomicstorep to explain the
difference between these.
Fixes#15270.
Change-Id: I291846732d82f090a218df3ef6351180aff54e81
Reviewed-on: https://go-review.googlesource.com/21993
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Michael Munday <munday@ca.ibm.com>
This CL introduces the typeOff type and a lookup method of the same
name that can turn a typeOff offset into an *rtype.
In a typical Go binary (built with buildmode=exe, pie, c-archive, or
c-shared), there is one moduledata and all typeOff values are offsets
relative to firstmoduledata.types. This makes computing the pointer
cheap in typical programs.
With buildmode=shared (and one day, buildmode=plugin) there are
multiple modules whose relative offset is determined at runtime.
We identify a type in the general case by the pair of the original
*rtype that references it and its typeOff value. We determine
the module from the original pointer, and then use the typeOff from
there to compute the final *rtype.
To ensure there is only one *rtype representing each type, the
runtime initializes a typemap for each module, using any identical
type from an earlier module when resolving that offset. This means
that types computed from an offset match the type mapped by the
pointer dynamic relocations.
A series of followup CLs will replace other *rtype values with typeOff
(and name/*string with nameOff).
For types created at runtime by reflect, type offsets are treated as
global IDs and reference into a reflect offset map kept by the runtime.
darwin/amd64:
cmd/go: -57KB (0.6%)
jujud: -557KB (0.8%)
linux/amd64 PIE:
cmd/go: -361KB (3.0%)
jujud: -3.5MB (4.2%)
For #6853.
Change-Id: Icf096fd884a0a0cb9f280f46f7a26c70a9006c96
Reviewed-on: https://go-review.googlesource.com/21285
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
No need to acquire the M just to change G's paniconfault flag, and the
original C implementation of SetPanicOnFault did not. The M
acquisition logic is an artifact of golang.org/cl/131010044, which was
started before golang.org/cl/123640043 (which introduced the current
"getg" function) was submitted.
Change-Id: I6d1939008660210be46904395cf5f5bbc2c8f754
Reviewed-on: https://go-review.googlesource.com/21935
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This is the first in a series of CLs to replace the use of pointers
in binary read-only data with offsets.
In standard Go binaries these CLs have a small effect, shrinking
8-byte pointers to 4-bytes. In position-independent code, it also
saves the dynamic relocation for the pointer. This has a significant
effect on the binary size when building as PIE, c-archive, or
c-shared.
darwin/amd64:
cmd/go: -12KB (0.1%)
jujud: -82KB (0.1%)
linux/amd64 PIE:
cmd/go: -86KB (0.7%)
jujud: -569KB (0.7%)
For #6853.
Change-Id: Iad5625bbeba58dabfd4d334dbee3fcbfe04b2dcf
Reviewed-on: https://go-review.googlesource.com/21284
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Load and store instructions are atomic on the s390x.
Change-Id: I0031ed2fba43f33863bca114d0fdec2e7d1ce807
Reviewed-on: https://go-review.googlesource.com/20938
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The previous cleanup was done with a buggy tool, missing some potential
rewrites.
Change-Id: I333467036e355f999a6a493e8de87e084f374e26
Reviewed-on: https://go-review.googlesource.com/21378
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Merge the amd64 lfstack implementation into the general 64 bit
implementation.
Change-Id: Id9ed61b90d2e3bc3b0246294c03eb2c92803b6ca
Reviewed-on: https://go-review.googlesource.com/21707
Run-TryBot: Dave Cheney <dave@cheney.net>
Reviewed-by: Minux Ma <minux@golang.org>
After mdempsky's recent changes, these are the only references to
"TheChar" left in the Go tree. Without the context, and without
knowing the history, this is confusing.
Also rename sys.TheGoos and sys.TheGoarch to sys.GOOS
and sys.GOARCH.
Also change the heap dump format to include sys.GOARCH
rather than TheChar, which is no longer a concept.
Updates #15169 (changes heapdump format)
Change-Id: I3e99eeeae00ed55d7d01e6ed503d958c6e931dca
Reviewed-on: https://go-review.googlesource.com/21647
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Only compute the number of maximum allowed elements per slice once.
name old time/op new time/op delta
MakeSlice-2 55.5ns ± 1% 45.6ns ± 2% -17.88% (p=0.000 n=99+100)
Change-Id: I951feffda5d11910a75e55d7e978d306d14da2c5
Reviewed-on: https://go-review.googlesource.com/21801
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This reverts commit ab4c9298b8.
Sysmon critically depends on system timer resolution for retaking
of Ps blocked in system calls. See #14790 for an example
of a program where execution time goes from 2ms to 30ms if
timeBeginPeriod(1) is not used.
We can remove timeBeginPeriod(1) when we support UMS (#7876).
Update #14790
Change-Id: I362b56154359b2c52d47f9f2468fe012b481cf6d
Reviewed-on: https://go-review.googlesource.com/20834
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
This makes traces self-contained and simplifies trace workflow
in modern cloud environments where it is simpler to reach
a service via HTTP than to obtain the binary.
Change-Id: I6ff3ca694dc698270f1e29da37d5efaf4e843a0d
Reviewed-on: https://go-review.googlesource.com/21732
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
This broke solaris, which apparently does use the upper 17 bits of the address space.
This reverts commit 3b02c5b1b6.
Change-Id: Iedfe54abd0384960845468205f20191a97751c0b
Reviewed-on: https://go-review.googlesource.com/21652
Reviewed-by: Dave Cheney <dave@cheney.net>
The test for profiling of channel blocking is timing dependent,
and in particular the blockSelectRecvAsync case can fail on a
slow builder (plan9_arm) when many tests are run in parallel.
The child goroutine sleeps for a fixed period so the parent
can be observed to block in a select call reading from the
child; but if the OS process running the parent goroutine is
delayed long enough, the child may wake again before the
parent has reached the blocking point. By repeating the test
three times, the likelihood of a blocking event is increased.
Fixes#15096
Change-Id: I2ddb9576a83408d06b51ded682bf8e71e53ce59e
Reviewed-on: https://go-review.googlesource.com/21604
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Merge the remaining lfstack{Pack,Unpack} implemetations into one file.
unsafe.Sizeof(uintptr(0)) == 4 is a constant comparison so this branch
folds away at compile time.
Dmitry confirmed that the upper 17 bits of an address will be zero for a
user mode pointer, so there is no need to sign extend on amd64 during
unpack, so we can reuse the same implementation as all othe 64 bit
archs.
Change-Id: I99f589416d8b181ccde5364c9c2e78e4a5efc7f1
Reviewed-on: https://go-review.googlesource.com/21597
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
So that all Go processes do not die on startup on a system with >256 CPUs.
I tested this by hacking osinit to set ncpu to 1000.
Updates #15131
Change-Id: I52e061a0de97be41d684dd8b748fa9087d6f1aef
Reviewed-on: https://go-review.googlesource.com/21599
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
None of the two places that call lfstackUnpack use the second argument.
This simplifies a followup CL that merges the lfstack{Pack,Unpack}
implementations.
Change-Id: I3c93f6259da99e113d94f8c8027584da79c1ac2c
Reviewed-on: https://go-review.googlesource.com/21595
Run-TryBot: Dave Cheney <dave@cheney.net>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Flaky tests are a distraction and cover up real problems.
File bugs instead and mark them as flaky.
This moves the net/http flaky test flagging mechanism to internal/testenv.
Updates #15156
Updates #15157
Updates #15158
Change-Id: I0e561cd2a09c0dec369cd4ed93bc5a2b40233dfe
Reviewed-on: https://go-review.googlesource.com/21614
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Merge all the 64bit lfstack impls into one file, adjust build tags to
match.
Merge all the comments on the various lfstack implementations for
posterity.
lfstack_amd64.go can probably be merged, but it is slightly different so
that will happen in a followup.
Change-Id: I5362d5e127daa81c9cb9d4fa8a0cc5c5e5c2707c
Reviewed-on: https://go-review.googlesource.com/21591
Run-TryBot: Dave Cheney <dave@cheney.net>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
A future CL will rename os1_windows.go to os_windows.go.
Change-Id: I223e76002dd1e9c9d1798fb0beac02c7d3bf4812
Reviewed-on: https://go-review.googlesource.com/21564
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Missed a case for closure calls (OCALLFUNC && indirect) in
esc.go:esccall.
Cleanup to runtime code for windows to more thoroughly hide
a technical escape. Also made code pickier about failing
to late non-optional kernel32.dll.
Fixes#14409.
Change-Id: Ie75486a2c8626c4583224e02e4872c2875f7bca5
Reviewed-on: https://go-review.googlesource.com/20102
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Usleep(100) in runqgrab negatively affects latency and throughput
of parallel application. We are sleeping instead of doing useful work.
This is effect is particularly visible on windows where minimal
sleep duration is 1-15ms.
Reduce sleep from 100us to 3us and use osyield on windows.
Sync chan send/recv takes ~50ns, so 3us gives us ~50x overshoot.
benchmark old ns/op new ns/op delta
BenchmarkChanSync-12 216 217 +0.46%
BenchmarkChanSyncWork-12 27213 25816 -5.13%
CPU consumption goes up from 106% to 108% in the first case,
and from 107% to 125% in the second case.
Test case from #14790 on windows:
BenchmarkDefaultResolution-8 4583372 29720 -99.35%
Benchmark1ms-8 992056 30701 -96.91%
99-th latency percentile for HTTP request serving is improved by up to 15%
(see http://golang.org/cl/20835 for details).
The following benchmarks are from the change that originally added this sleep
(see https://golang.org/s/go15gomaxprocs):
name old time/op new time/op delta
Chain 22.6µs ± 2% 22.7µs ± 6% ~ (p=0.905 n=9+10)
ChainBuf 22.4µs ± 3% 22.5µs ± 4% ~ (p=0.780 n=9+10)
Chain-2 23.5µs ± 4% 24.9µs ± 1% +5.66% (p=0.000 n=10+9)
ChainBuf-2 23.7µs ± 1% 24.4µs ± 1% +3.31% (p=0.000 n=9+10)
Chain-4 24.2µs ± 2% 25.1µs ± 3% +3.70% (p=0.000 n=9+10)
ChainBuf-4 24.4µs ± 5% 25.0µs ± 2% +2.37% (p=0.023 n=10+10)
Powser 2.37s ± 1% 2.37s ± 1% ~ (p=0.423 n=8+9)
Powser-2 2.48s ± 2% 2.57s ± 2% +3.74% (p=0.000 n=10+9)
Powser-4 2.66s ± 1% 2.75s ± 1% +3.40% (p=0.000 n=10+10)
Sieve 13.3s ± 2% 13.3s ± 2% ~ (p=1.000 n=10+9)
Sieve-2 7.00s ± 2% 7.44s ±16% ~ (p=0.408 n=8+10)
Sieve-4 4.13s ±21% 3.85s ±22% ~ (p=0.113 n=9+9)
Fixes#14790
Change-Id: Ie7c6a1c4f9c8eb2f5d65ab127a3845386d6f8b5d
Reviewed-on: https://go-review.googlesource.com/20835
Reviewed-by: Austin Clements <austin@google.com>
When we grow the heap, we create a temporary "in use" span for the
memory acquired from the OS and then free that span to link it into
the heap. Hence, we (1) increase pagesInUse when we make the temporary
span so that (2) freeing the span will correctly decrease it.
However, currently step (1) increases pagesInUse by the number of
pages requested from the heap, while step (2) decreases it by the
number of pages requested from the OS (the size of the temporary
span). These aren't necessarily the same, since we round up the number
of pages we request from the OS, so steps 1 and 2 don't necessarily
cancel out like they're supposed to. Over time, this can add up and
cause pagesInUse to underflow and wrap around to 2^64. The garbage
collector computes the sweep ratio from this, so if this happens, the
sweep ratio becomes effectively infinite, causing the first allocation
on each P in a sweep cycle to sweep the entire heap. This makes
sweeping effectively STW.
Fix this by increasing pagesInUse in step 1 by the number of pages
requested from the OS, so that the two steps correctly cancel out. We
add a test that checks that the running total matches the actual state
of the heap.
Fixes#15022. For 1.6.x.
Change-Id: Iefd9d6abe37d0d447cbdbdf9941662e4f18eeffc
Reviewed-on: https://go-review.googlesource.com/21280
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
It appears that windows osyield is just 15ms sleep on my computer
(see benchmarks below). Replace NtWaitForSingleObject in osyield
with SwitchToThread (as suggested by Dmitry).
Also add issue #14790 related benchmarks, so we can track perfomance
changes in CL 20834 and CL 20835 and beyond.
Update #14790
benchmark old ns/op new ns/op delta
BenchmarkChanToSyscallPing1ms 1953200 1953000 -0.01%
BenchmarkChanToSyscallPing15ms 31562904 31248400 -1.00%
BenchmarkSyscallToSyscallPing1ms 5247 4202 -19.92%
BenchmarkSyscallToSyscallPing15ms 5260 4374 -16.84%
BenchmarkChanToChanPing1ms 474 494 +4.22%
BenchmarkChanToChanPing15ms 468 489 +4.49%
BenchmarkOsYield1ms 980018 75.5 -99.99%
BenchmarkOsYield15ms 15625200 75.8 -100.00%
Change-Id: I1b4cc7caca784e2548ee3c846ca07ef152ebedce
Reviewed-on: https://go-review.googlesource.com/21294
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Add supporting code for runtime initialization, including both
32- and 64-bit x86 architectures.
Add .ctors section on Windows to PE .o files, and INITENTRY to .ctors
section to plug in to the GCC C/C++ startup initialization mechanism.
This allows the Go runtime to initialize itself. Add .text section
symbol for .ctor relocations. Note: This is unlikely to be useful for
MSVC-based toolchains.
Fixes#13494
Change-Id: I4286a96f70e5f5228acae88eef46e2bed95813f3
Reviewed-on: https://go-review.googlesource.com/18057
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Change-Id: I91873aaebf79bdf1c00d38aacc1a1fb8d79656a7
Reviewed-on: https://go-review.googlesource.com/21433
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Make sure that for any DLL that Go uses itself, we only look for the
DLL in the Windows System32 directory, guarding against DLL preloading
attacks.
(Unless the Windows version is ancient and LoadLibraryEx is
unavailable, in which case the user probably has bigger security
problems anyway.)
This does not change the behavior of syscall.LoadLibrary or NewLazyDLL
if the DLL name is something unused by Go itself.
This change also intentionally does not add any new API surface. Instead,
x/sys is updated with a LoadLibraryEx function and LazyDLL.Flags in:
https://golang.org/cl/21388
Updates #14959
Change-Id: I8d29200559cc19edf8dcf41dbdd39a389cd6aeb9
Reviewed-on: https://go-review.googlesource.com/21140
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Fixes a problem when using the external linker on Solaris. The Solaris
external linker still doesn't work due to issue #14957.
The problem is, for example, with `go test cmd/objdump`:
objdump_test.go:71: go build fmthello.go: exit status 2
# command-line-arguments
/var/gcc/iant/go/pkg/tool/solaris_amd64/link: running gcc failed: exit status 1
Undefined first referenced
symbol in file
x_cgo_callers /tmp/go-link-355600608/go.o
ld: fatal: symbol referencing errors
collect2: error: ld returned 1 exit status
Change-Id: I54917cfd5c288ee77ea25c439489bd2c9124fe73
Reviewed-on: https://go-review.googlesource.com/21392
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
The new function runtime.SetCgoTraceback may be used to register stack
traceback and symbolizer functions, written in C, to do a stack
traceback from cgo code.
There is a sample implementation of runtime.SetCgoSymbolizer at
github.com/ianlancetaylor/cgosymbolizer. Just importing that package is
sufficient to get symbolic C backtraces.
Currently only supported on linux/amd64.
Change-Id: If96ee2eb41c6c7379d407b9561b87557bfe47341
Reviewed-on: https://go-review.googlesource.com/17761
Reviewed-by: Austin Clements <austin@google.com>
Previously, cmd/compile rejected constant int->string conversions if
the integer value did not fit into an "int" value. Also, runtime
incorrectly truncated 64-bit values to 32-bit before checking if
they're a valid Unicode code point. According to the Go spec, both of
these cases should instead yield "\uFFFD".
Fixes#15039.
Change-Id: I3c8a3ad9a0780c0a8dc1911386a523800fec9764
Reviewed-on: https://go-review.googlesource.com/21344
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Only use REP;MOVSB if:
1) The CPUID flag says it is fast, and
2) The pointers are unaligned
Otherwise, use REP;MOVSQ.
Update #14630
Change-Id: I946b28b87880c08e5eed1ce2945016466c89db66
Reviewed-on: https://go-review.googlesource.com/21300
Reviewed-by: Nigel Tao <nigeltao@golang.org>
See #14874
This change tells the linker to collect all the itablink symbols and
collect them so that moduledata can have a slice of all compiler
generated itabs.
The logic is shamelessly adapted from what is done with typelink symbols.
Change-Id: Ie93b59acf0fcba908a876d506afbf796f222dbac
Reviewed-on: https://go-review.googlesource.com/20889
Reviewed-by: Keith Randall <khr@golang.org>
See #14874
This change tells the compiler to emit itab and itablink symbols in
situations where they could be useful; however the compiled code does
not actually make use of the new symbols yet.
Change-Id: I0db3e6ec0cb1f3b7cebd4c60229e4a48372fe586
Reviewed-on: https://go-review.googlesource.com/20888
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Michel Lespinasse <walken@google.com>
See #14874
This change makes the runtime register all compiler generated itabs
(as obtained from the moduledata) during init.
Change-Id: I9969a0985b99b8bda820a631f7fe4c78f1174cdf
Reviewed-on: https://go-review.googlesource.com/20900
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Michel Lespinasse <walken@google.com>
linux/386 depends on modify_ldt system call, but recent Linux kernels
can disable this system call. Any Go programs built as linux/386
crash with the message 'Trace/breakpoint trap'.
The kernel config CONFIG_MODIFY_LDT_SYSCALL, which control
enable/disable modify_ldt, is disabled on Amazon Linux 2016.03.
This fixes this problem by using set_thread_area instead of modify_ldt
on linux/386.
Fixes#14795.
Change-Id: I0cc5139e40e9e5591945164156a77b6bdff2c7f1
Reviewed-on: https://go-review.googlesource.com/21190
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Minux Ma <minux@golang.org>
One intrinsic was needed to help get the very best
performance out of a future GC; as long as that one was
being added, I also added Bswap since that is sometimes
a handy thing to have. I had intended to fill out the
bit-scan intrinsic family, but the mismatch between the
"scan forward" instruction and "count leading zeroes"
was large enough to cause me to leave it out -- it poses
a dilemma that I'd rather dodge right now.
These intrinsics are not exposed for general use.
That's a separate issue requiring an API proposal change
( https://github.com/golang/proposal )
All intrinsics are tested, both that they are substituted
on the appropriate architecture, and that they produce the
expected result.
Change-Id: I5848037cfd97de4f75bdc33bdd89bba00af4a8ee
Reviewed-on: https://go-review.googlesource.com/20564
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
For #14876.
Change-Id: I0992859264cbaf9c9b691fad53345bbb01b4cf3b
Reviewed-on: https://go-review.googlesource.com/21085
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
For #14876.
Change-Id: I33947f74e8058437a784862f1f064974afc99250
Reviewed-on: https://go-review.googlesource.com/21084
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This is a follow-up of https://go-review.googlesource.com/#/c/20653/
Special case computation for slices with elements of byte size or
pointer size.
name old time/op new time/op delta
GrowSliceBytes-4 86.2ns ± 3% 75.4ns ± 2% -12.50% (p=0.000 n=20+20)
GrowSliceInts-4 161ns ± 3% 136ns ± 3% -15.59% (p=0.000 n=19+19)
GrowSlicePtr-4 239ns ± 2% 233ns ± 2% -2.52% (p=0.000 n=20+20)
GrowSliceStruct24Bytes-4 258ns ± 3% 256ns ± 3% ~ (p=0.134 n=20+20)
Change-Id: Ice5fa648058fe9d7fa89dee97ca359966f671128
Reviewed-on: https://go-review.googlesource.com/21101
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
There's a race between runtime.goexitsall killing all OS processes
of a go program in order to exit, and runtime.newosproc forking a
new one. If the new process has been created but not yet stored
its pid in m.procid, it will not be killed by goexitsall and
deadlock results.
This CL prevents the race by making the newly forked process
check whether the program is exiting. It also prevents a
potential "shoot-out" if multiple goroutines call Exit at
the same time, which could possibly lead to two processes
killing each other and leaving the rest deadlocked.
Change-Id: I3170b4a62d2461f6b029b3d6aad70373714ed53e
Reviewed-on: https://go-review.googlesource.com/21135
Run-TryBot: David du Colombier <0intro@gmail.com>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David du Colombier <0intro@gmail.com>
During random stealing we steal 4*GOMAXPROCS times from random procs.
One would expect that most of the time we check all procs this way,
but due to low quality PRNG we actually miss procs with frightening
probability. Below are modelling experiment results for 1e6 tries:
GOMAXPROCS = 2 : missed 1 procs 7944 times
GOMAXPROCS = 3 : missed 1 procs 101620 times
GOMAXPROCS = 3 : missed 2 procs 3571 times
GOMAXPROCS = 4 : missed 1 procs 63916 times
GOMAXPROCS = 4 : missed 2 procs 61 times
GOMAXPROCS = 4 : missed 3 procs 16 times
GOMAXPROCS = 5 : missed 1 procs 133136 times
GOMAXPROCS = 5 : missed 2 procs 1025 times
GOMAXPROCS = 5 : missed 3 procs 101 times
GOMAXPROCS = 5 : missed 4 procs 15 times
GOMAXPROCS = 8 : missed 1 procs 151765 times
GOMAXPROCS = 8 : missed 2 procs 5057 times
GOMAXPROCS = 8 : missed 3 procs 1726 times
GOMAXPROCS = 8 : missed 4 procs 68 times
GOMAXPROCS = 12 : missed 1 procs 199081 times
GOMAXPROCS = 12 : missed 2 procs 27489 times
GOMAXPROCS = 12 : missed 3 procs 3113 times
GOMAXPROCS = 12 : missed 4 procs 233 times
GOMAXPROCS = 12 : missed 5 procs 9 times
GOMAXPROCS = 16 : missed 1 procs 237477 times
GOMAXPROCS = 16 : missed 2 procs 30037 times
GOMAXPROCS = 16 : missed 3 procs 9466 times
GOMAXPROCS = 16 : missed 4 procs 1334 times
GOMAXPROCS = 16 : missed 5 procs 192 times
GOMAXPROCS = 16 : missed 6 procs 5 times
GOMAXPROCS = 16 : missed 7 procs 1 times
GOMAXPROCS = 16 : missed 8 procs 1 times
A missed proc won't lead to underutilization because we check all procs
again after dropping P. But it can lead to an unpleasant situation
when we miss a proc, drop P, check all procs, discover work, acquire P,
miss the proc again, repeat.
Improve stealing logic to cover all procs.
Also don't enter spinning mode and try to steal when there is nobody around.
Change-Id: Ibb6b122cc7fb836991bad7d0639b77c807aab4c2
Reviewed-on: https://go-review.googlesource.com/20836
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Marvin Stenger <marvin.stenger94@gmail.com>
Create a byte encoding designed for static Go names.
It is intended to be a compact representation of a name
and optional tag data that can be turned into a Go string
without allocating, and describes whether or not it is
exported without unicode table.
The encoding is described in reflect/type.go:
// The first byte is a bit field containing:
//
// 1<<0 the name is exported
// 1<<1 tag data follows the name
// 1<<2 pkgPath *string follow the name and tag
//
// The next two bytes are the data length:
//
// l := uint16(data[1])<<8 | uint16(data[2])
//
// Bytes [3:3+l] are the string data.
//
// If tag data follows then bytes 3+l and 3+l+1 are the tag length,
// with the data following.
//
// If the import path follows, then ptrSize bytes at the end of
// the data form a *string. The import path is only set for concrete
// methods that are defined in a different package than their type.
Shrinks binary sizes:
cmd/go: 164KB (1.6%)
jujud: 1.0MB (1.5%)
For #6853.
Change-Id: I46b6591015b17936a443c9efb5009de8dfe8b609
Reviewed-on: https://go-review.googlesource.com/20968
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The current runtime attempts to forward signals generated by non-Go
code to the original signal handler. If it can't call the original
handler directly, it currently attempts to re-raise the signal after
resetting the handler. In this case, the original context is lost.
This fix prevents that problem by simply returning from the go signal
handler after resetting the original handler. It only does this when
the original handler is the system default handler, which in all cases
is known to not recover. The signal is not reset, so it is retriggered
and the original handler takes over with the proper context.
Fixes#14899
Change-Id: Ib1c19dfa4b50d9732d7a453de3784c8141e1cbb3
Reviewed-on: https://go-review.googlesource.com/21006
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Fixes#14938.
Additionally some simplifications along the way.
Change-Id: I2c5fb7e32dcc6fab68fff36a49cb72e715756abe
Reviewed-on: https://go-review.googlesource.com/21046
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
For darwin/arm{,64} a non-Go thread is created to convert
EXC_BAD_ACCESS to panics. However, the Go signal handler refuse to
handle signals that would otherwise be ignored if they arrive at
non-Go threads.
Block all (posix) signals to that thread, making sure that
no unexpected signals arrive to it. At least one test, TestStop in
os/signal, depends on signals not arriving on any non-Go threads.
For #14318
Change-Id: I901467fb53bdadb0d03b0f1a537116c7f4754423
Reviewed-on: https://go-review.googlesource.com/21047
Reviewed-by: David Crawshaw <crawshaw@golang.org>
The existing implementation for Equal and similar
functions in the bytes package operate on one byte at
at time. This performs poorly on ppc64/ppc64le especially
when the byte buffers are large. This change improves
those functions by loading and comparing double words where
possible. The common code has been moved to a function
that can be shared by the other functions in this
file which perform the same type of comparison.
Further optimizations are done for the case where
>= 32 bytes are being compared. The new function
memeqbody is used by memeq_varlen, Equal, and eqstring.
When running the bytes test with -test.bench=Equal
benchmark old MB/s new MB/s speedup
BenchmarkEqual1 164.83 129.49 0.79x
BenchmarkEqual6 563.51 445.47 0.79x
BenchmarkEqual9 656.15 1099.00 1.67x
BenchmarkEqual15 591.93 1024.30 1.73x
BenchmarkEqual16 613.25 1914.12 3.12x
BenchmarkEqual20 682.37 1687.04 2.47x
BenchmarkEqual32 807.96 3843.29 4.76x
BenchmarkEqual4K 1076.25 23280.51 21.63x
BenchmarkEqual4M 1079.30 13120.14 12.16x
BenchmarkEqual64M 1073.28 10876.92 10.13x
It was determined that the degradation in the smaller byte tests
were due to unfavorable code alignment of the single byte loop.
Fixes#14368
Change-Id: I0dd87382c28887c70f4fbe80877a8ba03c31d7cd
Reviewed-on: https://go-review.googlesource.com/20249
Reviewed-by: Minux Ma <minux@golang.org>
MOVSB is quite a bit faster for unaligned moves.
Possibly we should use MOVSB all of the time, but Intel folks
say it might be a bit faster to use MOVSQ on some processors
(but not any I have access to at the moment).
benchmark old ns/op new ns/op delta
BenchmarkMemmove4096-8 93.9 93.2 -0.75%
BenchmarkMemmoveUnalignedDst4096-8 256 151 -41.02%
BenchmarkMemmoveUnalignedSrc4096-8 175 90.5 -48.29%
Fixes#14630
Change-Id: I568e6d6590eb3615e6a699fb474020596be665ff
Reviewed-on: https://go-review.googlesource.com/20293
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Missed this in review of 20812
Change-Id: I01e220499dcd58e1a7205e2a577dd9630a8b7174
Reviewed-on: https://go-review.googlesource.com/20819
Reviewed-by: Keith Randall <khr@golang.org>
We're about to add another root marking job that needs to happen only
during the first markroot pass (whether that's concurrent or STW),
just like finalizer scanning. Rather than introducing another flag
that has the same value as finalizersDone, just rename finalizersDone
to markrootDone.
Change-Id: I535356c6ea1f3734cb5b6add264cb7bf48de95e8
Reviewed-on: https://go-review.googlesource.com/20043
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently shinkstack is only safe during STW because it adjusts
channel-related stack pointers and moves send/receive stack slots
without synchronizing with the channel code. Make it safe to use when
the world isn't stopped by:
1) Locking all channels the G is blocked on while adjusting the sudogs
and copying the area of the stack that may contain send/receive
slots.
2) For any stack frames that may contain send/receive slot, using an
atomic CAS to adjust pointers to prevent races between adjusting a
pointer in a receive slot and a concurrent send writing to that
receive slot.
In principle, the synchronization could be finer-grained. For example,
we considered synchronizing around the sudogs, which would allow
channel operations involving other Gs to continue if the G being
shrunk was far enough down the send/receive queue. However, using the
channel lock means no additional locks are necessary in the channel
code. Furthermore, the stack shrinking code holds the channel lock for
a very short time (much less than the time required to shrink the
stack).
This does not yet make stack shrinking concurrent; it merely makes
doing so safe.
This has negligible effect on the go1 and garbage benchmarks.
For #12967.
Change-Id: Ia49df3a8a7be4b36e365aac4155a2416b94b988c
Reviewed-on: https://go-review.googlesource.com/20042
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Currently, locking a G's stack by setting its status to _Gcopystack or
_Gscan is unordered with respect to channel locks. However, when we
make stack shrinking concurrent, stack shrinking will need to lock the
G and then acquire channel locks, which imposes an order on these.
Document this lock ordering and fix closechan to respect it.
Everything else already happens to respect it.
For #12967.
Change-Id: I4dd02675efffb3e7daa5285cf75bf24f987d90d4
Reviewed-on: https://go-review.googlesource.com/20041
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently sudog.elem is never accessed concurrently, so in several
cases we drop the channel lock just before reading/writing the
sent/received value from/to sudog.elem. However, concurrent stack
shrinking is going to have to adjust sudog.elem to point to the new
stack, which means it needs a way to synchronize with accesses to
sudog.elem. Hence, add sudog.elem to the fields protected by
hchan.lock and scoot the unlocks down past the uses of sudog.elem.
While we're here, better document the channel synchronization rules.
For #12967.
Change-Id: I3ad0ca71f0a74b0716c261aef21b2f7f13f74917
Reviewed-on: https://go-review.googlesource.com/20040
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
With concurrent stack shrinking, the stack can move the instant after
a G enters _Gwaiting. There are only two places that put a G into
_Gwaiting: gopark and newstack. We fixed uses of gopark. This commit
fixes newstack by simplifying its G transitions and, in particular,
eliminating or narrowing the transient _Gwaiting states it passes
through so it's clear nothing in the G is accessed while in _Gwaiting.
For #12967.
Change-Id: I2440ead411d2bc61beb1e2ab020ebe3cb3481af9
Reviewed-on: https://go-review.googlesource.com/20039
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
gopark calls the unlock function after setting the G to _Gwaiting.
This means it's generally unsafe to access the G's stack from the
unlock function because the G may start running on another P. Once we
start shrinking stacks concurrently, a stack shrink could also move
the stack the moment after it enters _Gwaiting and before the unlock
function is called.
Document this restriction and fix the two places where we currently
violate it.
This is unlikely to be a problem in practice for these two places
right now, but they're already skating on thin ice. For example, the
following sequence could in principle cause corruption, deadlock, or a
panic in the select code:
On M1/P1:
1. G1 selects on channels A and B.
2. selectgoImpl calls gopark.
3. gopark puts G1 in _Gwaiting.
4. gopark calls selparkcommit.
5. selparkcommit releases the lock on channel A.
On M2/P2:
6. G2 sends to channel A.
7. The send puts G1 in _Grunnable and puts it on P2's run queue.
8. The scheduler runs, selects G1, puts it in _Grunning, and resumes G1.
9. On G1, the sellock immediately following the gopark gets called.
10. sellock grows and moves the stack.
On M1/P1:
11. selparkcommit continues to scan the lock order for the next
channel to unlock, but it's now reading from a freed (and possibly
reused) stack.
This shouldn't happen in practice because step 10 isn't the first call
to sellock, so the stack should already be big enough. However, once
we start shrinking stacks concurrently, this reasoning won't work any
more.
For #12967.
Change-Id: I3660c5be37e5be9f87433cb8141bdfdf37fadc4c
Reviewed-on: https://go-review.googlesource.com/20038
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently the g.waiting list created by a select is in poll order.
However, nothing depends on this, and we're going to need access to
the channel lock order in other places shortly, so modify select to
put the waiting list in channel lock order.
For #12967.
Change-Id: If0d38816216ecbb37a36624d9b25dd96e0a775ec
Reviewed-on: https://go-review.googlesource.com/20037
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Currently the select lock order is a []*hchan. We're going to need to
refer to things other than the channel itself in lock order shortly,
so switch this to a []uint16 of indexes into the select cases. This
parallels the existing representation for the poll order.
Change-Id: I89262223fe20b4ddf5321592655ba9eac489cda1
Reviewed-on: https://go-review.googlesource.com/20036
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Given a G, there's currently no way to find the channel it's blocking
on. We'll need this information to fix a (probably theoretical) bug in
select and to implement concurrent stack shrinking, so record the
channel in the sudog.
For #12967.
Change-Id: If8fb63a140f1d07175818824d08c0ebeec2bdf66
Reviewed-on: https://go-review.googlesource.com/20035
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
gcMarkRootCheck is too expensive to do during mark termination.
However, since it's a useful check and it complements checkmark mode
nicely, enable it during mark termination is checkmark is enabled.
Change-Id: Icd9039e85e6e9d22747454441b50f1cdd1412202
Reviewed-on: https://go-review.googlesource.com/20663
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Change Cond implementation to use a notification list such that waiters
can first register for a notification, release the lock, then actually
wait. Signalers never have to park anymore.
This is intended to address an issue in the previous implementation
where Broadcast could fail to signal all waiters.
Results of the existing benchmark are below.
Original New Diff
BenchmarkCond1-48 2000000 745 ns/op 755 +1.3%
BenchmarkCond2-48 1000000 1545 ns/op 1532 -0.8%
BenchmarkCond4-48 300000 3833 ns/op 3896 +1.6%
BenchmarkCond8-48 200000 10049 ns/op 10257 +2.1%
BenchmarkCond16-48 100000 21123 ns/op 21236 +0.5%
BenchmarkCond32-48 30000 40393 ns/op 41097 +1.7%
Fixes#14064
Change-Id: I083466d61593a791a034df61f5305adfb8f1c7f9
Reviewed-on: https://go-review.googlesource.com/18892
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Caleb Spare <cespare@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The type information for a method includes two variants: a func
without the receiver, and a func with the receiver as the first
parameter. The former is used as part of the dynamic interface
checks, but the latter is only returned as a type in the
reflect.Method struct.
Instead of computing it at compile time, construct it at run time
with reflect.FuncOf.
Using cl/20701 as a baseline,
cmd/go: -480KB, (4.4%)
jujud: -5.6MB, (7.8%)
For #6853.
Change-Id: I1b8c73f3ab894735f53d00cb9c0b506d84d54e92
Reviewed-on: https://go-review.googlesource.com/20709
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
CL 14603 attempted to preserve the callee-save registers for
the darwin/arm runtime initialization routine, but I believe it
wasn't sufficient and resulted in the crash reported in issue
Saving and restoring the registers on the stack the same way
linux/arm does seems more obvious and fixes#14778, so do that.
Even though #14778 is not reproducible on darwin/arm64, I applied
a similar change there, and to linux/arm64 which obeys the same
calling convention.
Finally, this CL is a candidate for a 1.6 minor release for the same
reason CL 14603 was in a 1.5 minor release (as CL 16968). It is
small and only touches the iOS platforms and gomobile on darwin/arm
is currently useless without it.
Fixes#14778Fixes#12590 (again)
Change-Id: I7401daf0bbd7c579a7e84761384a7b763651752a
Reviewed-on: https://go-review.googlesource.com/20621
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
In particular, write down the rules for stack ownership because the
details of this are about to get very important with concurrent stack
shrinking. (Interestingly, the details don't actually change, but
anything that's currently skating on thin ice is likely to fall
through.)
Fox #12967.
Change-Id: I561e2610e864295e9faba07717a934aabefcaab9
Reviewed-on: https://go-review.googlesource.com/20034
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently copystack adjusts pointers in the old stack and then copies
the adjusted stack to the new stack. In addition to being generally
confusing, this is going to make concurrent stack shrinking harder.
Switch this around so that we first copy the stack and then adjust
pointers on the new stack (never writing to the old stack).
This reprises CL 15996, but takes a different and simpler approach. CL
15996 still walked the old stack while adjusting pointers on the new
stack. In this CL, we adjust auxiliary structures before walking the
stack, so we can just walk the new stack.
For #12967.
Change-Id: I94fa86f823ba9ee478e73b2ba509eed3361c43df
Reviewed-on: https://go-review.googlesource.com/20033
Reviewed-by: Rick Hudson <rlh@golang.org>
In particular, document that *sel is on the stack no matter what.
Change-Id: I1c264215e026c27721b13eedae73ec845066cdec
Reviewed-on: https://go-review.googlesource.com/20032
Reviewed-by: Rick Hudson <rlh@golang.org>
Only compute the number of maximum allowed elements per slice once.
Special case newcap computation for slices with byte sized elements.
name old time/op new time/op delta
GrowSliceBytes-2 61.1ns ± 1% 43.4ns ± 1% -29.00% (p=0.000 n=20+20)
GrowSliceInts-2 85.9ns ± 1% 75.7ns ± 1% -11.80% (p=0.000 n=20+20)
Change-Id: I5d9c0d5987cdd108ac29dc32e31912dcefa2324d
Reviewed-on: https://go-review.googlesource.com/20653
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Move functions testSchedLocalQueueLocal and testSchedLocalQueueSteal
from proc.go to export_test.go, the only site that they are used.
Fixes#14796
Change-Id: I16b6fa4a13835eab33f66a2c2e87a5f5c79b7bd3
Reviewed-on: https://go-review.googlesource.com/20640
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
In addition to reflect.Value.Call, exported methods can be invoked
by the Func value in the reflect.Method struct. This CL has the
compiler track what functions get access to a legitimate reflect.Method
struct by looking for interface calls to either of:
Method(int) reflect.Method
MethodByName(string) (reflect.Method, bool)
This is a little overly conservative. If a user implements a type
with one of these methods without using the underlying calls on
reflect.Type, the linker will assume the worst and include all
exported methods. But it's cheap.
No change to any of the binary sizes reported in cl/20483.
For #14740
Change-Id: Ie17786395d0453ce0384d8b240ecb043b7726137
Reviewed-on: https://go-review.googlesource.com/20489
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Still fails about 20% of the time on my laptop.
Fixes#14766.
Change-Id: I169ab728c6022dceeb91188f5ad466ed6413c062
Reviewed-on: https://go-review.googlesource.com/20590
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
On Plan 9, there's no "kill all threads" system call, so exit is done
by sending a "go: exit" note to each OS process. If concurrent GC
occurs during this loop, deadlock sometimes results. Prevent this by
incrementing m.locks before sending notes.
Change-Id: I31aa15134ff6e42d9a82f9f8a308620b3ad1b1b1
Reviewed-on: https://go-review.googlesource.com/20477
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Alternative to golang.org/cl/19852. This memory layout doesn't have
an easy type representation, but it is noticeably smaller than the
current funcType, and saves significant extra space.
Some notes on the layout are in reflect/type.go:
// A *rtype for each in and out parameter is stored in an array that
// directly follows the funcType (and possibly its uncommonType). So
// a function type with one method, one input, and one output is:
//
// struct {
// funcType
// uncommonType
// [2]*rtype // [0] is in, [1] is out
// uncommonTypeSliceContents
// }
There are three arbitrary limits introduced by this CL:
1. No more than 65535 function input parameters.
2. No more than 32767 function output parameters.
3. reflect.FuncOf is limited to 128 parameters.
I don't think these are limits in practice, but are worth noting.
Reduces godoc binary size by 2.4%, 330KB.
For #6853.
Change-Id: I225c0a0516ebdbe92d41dfdf43f716da42dfe347
Reviewed-on: https://go-review.googlesource.com/19916
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Instead of a pointer on every rtype, use a bit flag to indicate that
the contents of uncommonType directly follows the rtype value when it
is needed.
This requires a bit of juggling in the compiler's rtype encoder. The
backing arrays for fields in the rtype are presently encoded directly
after the slice header. This packing requires separating the encoding
of the uncommonType slice headers from their backing arrays.
Reduces binary size of godoc by ~180KB (1.5%).
No measurable change in all.bash time.
For #6853.
Change-Id: I60205948ceb5c0abba76fdf619652da9c465a597
Reviewed-on: https://go-review.googlesource.com/19790
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Before, those C files might have been intended for the Plan 9 C compiler,
but that option was removed in Go 1.5. We can simplify the maintenance
of cgo packages now if we assume C files (and C++ and M and SWIG files)
should only be considered when cgo is enabled.
Also remove newly unnecessary build tags in runtime/cgo's C files.
Fixes#14123
Change-Id: Ia5a7fe62b9469965aa7c3547fe43c6c9292b8205
Reviewed-on: https://go-review.googlesource.com/19613
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently work.finalizersDone is reset only at the beginning of
gcStart. As a result, it will be set when checkmark runs, so checkmark
will skip scanning finalizers. Hence, if there are any bugs that cause
the regular scan of finalizers to miss pointers, checkmark will also
miss them and fail to detect the missed pointer.
Fix this by resetting finalizersDone in gcResetMarkState. This way it
gets reset before any full mark, which is exactly what we want.
Change-Id: I4ddb5eba5b3b97e52aaf3e08fd9aa692bda32b20
Reviewed-on: https://go-review.googlesource.com/20332
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Implementation more or less follows plan9_386 version.
Revised 7 March to correct a bug in runtime.seek and
tidy whitespace for 8-column tabs.
Change-Id: I2e921558b5816502e8aafe330530c5a48a6c7537
Reviewed-on: https://go-review.googlesource.com/18966
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Plan 9 trap/signal handling differs on ARM from other architectures
because ARM has a link register. Also trap message syntax varies
between different architectures (historical accident?).
Revised 7 March to clarify a comment.
Change-Id: Ib6485f82857a2f9a0d6b2c375cf0aaa230b83656
Reviewed-on: https://go-review.googlesource.com/18969
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
We used to start background mark workers and assists at different
times, so we needed to keep track of these separately. They're now set
to exactly the same time, so clean things up by merging them in to one
value, markStartTime.
Change-Id: I17c9843c3ed2d6f07b4c8cd0b2c438fc6de23b53
Reviewed-on: https://go-review.googlesource.com/20143
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Current runtime.WriteHeapProfile() doesn't print correct
EnableGC. Even if GOGC=off, the result file has below line:
# EnableGC = true
It is hard to print correct status of the variable because of corner
cases e.g. initialization. For avoiding confusion, this commit removes
the print.
Change-Id: Ia792454a6c650bdc50a06fbaff4df7b6330ae08a
Reviewed-on: https://go-review.googlesource.com/18600
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
gcMarkRootCheck takes ~10ns per goroutine. This is just a debugging
check, so disable it (plus, if something is going to go wrong, it's
more likely to go wrong during concurrent mark).
We may be able to re-enable this later, or move it to after we've
started the world again. (But not for 1.6.x.)
For 1.6.x.
Fixes#14419.
name / 95%ile-time/markTerm old new delta
500kIdleGs-12 24.0ms ± 0% 18.9ms ± 6% -21.46% (p=0.000 n=15+20)
Change-Id: Idb2a2b1771449de772c159ef95920d6df1090666
Reviewed-on: https://go-review.googlesource.com/20148
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently we reset the mark state during STW sweep termination. This
involves looping over all of the goroutines. Each iteration of this
loop takes ~25ns, so at around 400k goroutines, we'll exceed our 10ms
pause goal.
However, it's safe to do this before we stop the world for sweep
termination because nothing is consuming this state yet. Hence, move
the reset to just before STW.
This isn't perfect: a long reset can still delay allocating goroutines
that block on GC starting. But it's certainly better to block some
things eventually than to block everything immediately.
For 1.6.x.
Fixes#14420.
name \ 95%ile-time/sweepTerm old new delta
500kIdleGs-12 11312µs ± 6% 18.9µs ± 6% -99.83% (p=0.000 n=16+20)
Change-Id: I9815c4d8d9b0d3c3e94dfdab78049cefe0dcc93c
Reviewed-on: https://go-review.googlesource.com/20147
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Also fix compiler-invoked panics to avoid a confusing "malloc deadlock"
crash if they are invoked while executing the runtime.
Fixes#14599.
Change-Id: I89436abcbf3587901909abbdca1973301654a76e
Reviewed-on: https://go-review.googlesource.com/20219
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
go test github.com/onsi/gomega/gbytes now passes at tip, and tests
added to the reflect package.
Fixes#14645
Change-Id: I16216c1a86211a1103d913237fe6bca5000cf885
Reviewed-on: https://go-review.googlesource.com/20221
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This change consolidates functions and methods related to TCPAddr,
TCPConn and TCPListener for maintenance purpose, especially for
documentation. Also refactors Dial error code paths.
The followup changes will update comments and examples.
Updates #10624.
Change-Id: I3333ee218ebcd08928f9e2826cd1984d15ea153e
Reviewed-on: https://go-review.googlesource.com/20009
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Mikio Hara <mikioh.mikioh@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.
This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:
$ perl -i -npe 's,^(\s*// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.) +([A-Z])')
$ go test go/doc -update
Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This is a subset of https://golang.org/cl/20022 with only the copyright
header lines, so the next CL will be smaller and more reviewable.
Go policy has been single space after periods in comments for some time.
The copyright header template at:
https://golang.org/doc/contribute.html#copyright
also uses a single space.
Make them all consistent.
Change-Id: Icc26c6b8495c3820da6b171ca96a74701b4a01b0
Reviewed-on: https://go-review.googlesource.com/20111
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Avoid targeting a partial register with load;
ensure source of load (writebarrier) is aligned.
Better yet would be "CMPB $1,writebarrier" but that requires
wrestling with flagalloc (mem operand complicates moving
instruction around).
Didn't see a change in time for
benchcmd -n 10 Build go build net/http
Verified that we clean the code up properly:
0x20a8 <main.main+104>: mov 0xc30a2(%rip),%eax
# 0xc5150 <runtime.writeBarrier>
0x20ae <main.main+110>: test %al,%al
Change-Id: Id5fb8c260eaec27bd727cb0ae1476c60343b0986
Reviewed-on: https://go-review.googlesource.com/19998
Reviewed-by: Keith Randall <khr@golang.org>
Commit a5c3bbe modified adjustpointers to use *uintptrs instead of
*unsafe.Pointers for manipulating stack pointers for clarity and to
eliminate the unnecessary write barrier when writing the updated stack
pointer.
This commit makes the equivalent change to adjustpointer.
Change-Id: I6dc309590b298bdd86ecdc9737db848d6786c3f7
Reviewed-on: https://go-review.googlesource.com/17148
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently dropg does not unwire locked g/m.
This is unnecessary distiction between locked and non-locked g/m.
We always restart goroutines with execute which re-wires g/m.
First, this produces false sense that this distinction is necessary.
Second, it can confuse some sanity and cross checks. For example,
if we check that g/m are unwired before we wire them in execute,
the check will fail for locked g/m. I've hit this while doing some
race detector changes, When we deschedule a goroutine and run
scheduler code, m.curg is generally nil, but not for locked ms.
Remove the distinction.
Change-Id: I3b87a28ff343baa1d564aab1f821b582a84dee07
Reviewed-on: https://go-review.googlesource.com/19950
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
REP-prefixed instructions have a large startup cost.
Avoid them like the plague.
benchmark old ns/op new ns/op delta
BenchmarkIndexByte10-8 22.4 5.34 -76.16%
Fixes#13983
Change-Id: I857e956e240fc9681d053f2584ccf24c1b272bb3
Reviewed-on: https://go-review.googlesource.com/18703
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
_Genqueue and _Gscanenqueue were introduced as part of the GC quiesce
code. The quiesce code was removed by 197aa9e, but these states and
some associated code stuck around. Remove them.
Change-Id: I69df81881602d4a431556513dac2959668d27c20
Reviewed-on: https://go-review.googlesource.com/19638
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently most uses of gcWork use the per-P gcWork, but there are two
places that still use a stack-based gcWork. Simplify things by making
these instead use the per-P gcWork.
Change-Id: I712d012cce9dd5757c8541824e9641ac1c2a329c
Reviewed-on: https://go-review.googlesource.com/19636
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>