TestFreeOSMemory was disabled on many arches because of issue #9993.
Since that's been fixed, enable the test everywhere.
Change-Id: I298c38c3e04128d9c8a1f558980939d5699bea03
Reviewed-on: https://go-review.googlesource.com/27403
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
syscall.Getpagesize currently returns hard-coded page sizes on all
architectures (some of which are probably always wrong, and some of
which are definitely not always right). The runtime now has this
information, queried from the OS during runtime init, so make
syscall.Getpagesize return the page size that the runtime knows.
Updates #10180.
Change-Id: I4daa6fbc61a2193eb8fa9e7878960971205ac346
Reviewed-on: https://go-review.googlesource.com/25051
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Now that the runtime fetches the true physical page size from the OS,
make the physical page size used by heap growth a variable instead of
a constant. This isn't used in any performance-critical paths, so it
shouldn't be an issue.
sys.PhysPageSize is also renamed to sys.DefaultPhysPageSize to make it
clear that it's not necessarily the true page size. There are no uses
of this constant any more, but we'll keep it around for now.
Updates #12480 and #10180.
Change-Id: I6c23b9df860db309c38c8287a703c53817754f03
Reviewed-on: https://go-review.googlesource.com/25022
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently the physical page size assumed by the runtime is hard-coded.
On Linux the runtime at least fetches the OS page size during init and
sanity checks against the hard-coded value, but they may still differ.
On other OSes we wouldn't even notice.
Add support on all OSes to fetch the actual OS physical page size
during runtime init and lift the sanity check of PhysPageSize from the
Linux init code to general malloc init. Currently this is the only use
of the retrieved page size, but we'll add more shortly.
Updates #12480 and #10180.
Change-Id: I065f2834bc97c71d3208edc17fd990ec9058b6da
Reviewed-on: https://go-review.googlesource.com/25050
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently we assume the physical page size on ARM is 4kB. While this
is usually true, the architecture also supports 16kB and 64kB physical
pages, and Linux (and possibly other OSes) can be configured to use
these larger page sizes.
With Go 1.6, such a configuration could potentially run, but generally
resulted in memory corruption or random panics. With current master,
this configuration will cause the runtime to panic during init on
Linux when it checks the true physical page size (and will still cause
corruption or panics on other OSes).
However, the assumed physical page size only has to be a multiple of
the true physical page size, the scavenger can now deal with large
physical page sizes, and the rest of the runtime can deal with a
larger assumed physical page size than the true size. Hence, there's
little disadvantage to conservatively setting the assumed physical
page size to 64kB on ARM.
This may result in some extra memory use, since we can only return
memory at multiples of the assumed physical page size. However, it is
a simple change that should make Go run on systems configured for
larger page sizes. The following commits will make the runtime query
the actual physical page size from the OS, but this is a simple step
there.
Updates #12480.
Change-Id: I851829595bc9e0c76235c847a7b5f62ad82b5302
Reviewed-on: https://go-review.googlesource.com/25021
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
Currently the time spent in scanobject is proportional to the size of
the object being scanned. Since scanobject is non-preemptible, large
objects can cause significant goroutine (and even whole application)
delays through several means:
1. If a GC assist picks up a large object, the allocating goroutine is
blocked for the whole scan, even if that scan well exceeds that
goroutine's debt.
2. Since the scheduler does not run on the P performing a large object
scan, goroutines in that P's run queue do not run unless they are
stolen by another P (which can take some time). If there are a few
large objects, all of the Ps may get tied up so the scheduler
doesn't run anywhere.
3. Even if a large object is scanned by a background worker and other
Ps are still running the scheduler, the large object scan doesn't
flush background credit until the whole scan is done. This can
easily cause all allocations to block in assists, waiting for
credit, causing an effective STW.
Fix this by splitting large objects into 128 KB "oblets" and scanning
at most one oblet at a time. Since we can scan 1–2 MB/ms, this equates
to bounding scanobject at roughly 100 µs. This improves assist
behavior both because assists can no longer get "unlucky" and be stuck
scanning a large object, and because it causes the background worker
to flush credit and unblock assists more frequently when scanning
large objects. This also improves GC parallelism if the heap consists
primarily of a small number of very large objects by letting multiple
workers scan a large objects in parallel.
Fixes#10345. Fixes#16293.
This substantially improves goroutine latency in the benchmark from
issue #16293, which exercises several forms of very large objects:
name old max-latency new max-latency delta
SliceNoPointer-12 154µs ± 1% 155µs ± 2% ~ (p=0.087 n=13+12)
SlicePointer-12 314ms ± 1% 5.94ms ±138% -98.11% (p=0.000 n=19+20)
SliceLivePointer-12 1148ms ± 0% 4.72ms ±167% -99.59% (p=0.000 n=19+20)
MapNoPointer-12 72509µs ± 1% 408µs ±325% -99.44% (p=0.000 n=19+18)
ChanPointer-12 313ms ± 0% 4.74ms ±140% -98.49% (p=0.000 n=18+20)
ChanLivePointer-12 1147ms ± 0% 3.30ms ±149% -99.71% (p=0.000 n=19+20)
name old P99.9-latency new P99.9-latency delta
SliceNoPointer-12 113µs ±25% 107µs ±12% ~ (p=0.153 n=20+18)
SlicePointer-12 309450µs ± 0% 133µs ±23% -99.96% (p=0.000 n=20+20)
SliceLivePointer-12 961ms ± 0% 1.35ms ±27% -99.86% (p=0.000 n=20+20)
MapNoPointer-12 448µs ±288% 119µs ±18% -73.34% (p=0.000 n=18+20)
ChanPointer-12 309450µs ± 0% 134µs ±23% -99.96% (p=0.000 n=20+19)
ChanLivePointer-12 961ms ± 0% 1.35ms ±27% -99.86% (p=0.000 n=20+20)
This has negligible effect on all metrics from the garbage, JSON, and
HTTP x/benchmarks.
It shows slight improvement on some of the go1 benchmarks,
particularly Revcomp, which uses some multi-megabyte buffers:
name old time/op new time/op delta
BinaryTree17-12 2.46s ± 1% 2.47s ± 1% +0.32% (p=0.012 n=20+20)
Fannkuch11-12 2.82s ± 0% 2.81s ± 0% -0.61% (p=0.000 n=17+20)
FmtFprintfEmpty-12 50.8ns ± 5% 50.5ns ± 2% ~ (p=0.197 n=17+19)
FmtFprintfString-12 131ns ± 1% 132ns ± 0% +0.57% (p=0.000 n=20+16)
FmtFprintfInt-12 117ns ± 0% 116ns ± 0% -0.47% (p=0.000 n=15+20)
FmtFprintfIntInt-12 180ns ± 0% 179ns ± 1% -0.78% (p=0.000 n=16+20)
FmtFprintfPrefixedInt-12 186ns ± 1% 185ns ± 1% -0.55% (p=0.000 n=19+20)
FmtFprintfFloat-12 263ns ± 1% 271ns ± 0% +2.84% (p=0.000 n=18+20)
FmtManyArgs-12 741ns ± 1% 742ns ± 1% ~ (p=0.190 n=19+19)
GobDecode-12 7.44ms ± 0% 7.35ms ± 1% -1.21% (p=0.000 n=20+20)
GobEncode-12 6.22ms ± 1% 6.21ms ± 1% ~ (p=0.336 n=20+19)
Gzip-12 220ms ± 1% 219ms ± 1% ~ (p=0.130 n=19+19)
Gunzip-12 37.9ms ± 0% 37.9ms ± 1% ~ (p=1.000 n=20+19)
HTTPClientServer-12 82.5µs ± 3% 82.6µs ± 3% ~ (p=0.776 n=20+19)
JSONEncode-12 16.4ms ± 1% 16.5ms ± 2% +0.49% (p=0.003 n=18+19)
JSONDecode-12 53.7ms ± 1% 54.1ms ± 1% +0.71% (p=0.000 n=19+18)
Mandelbrot200-12 4.19ms ± 1% 4.20ms ± 1% ~ (p=0.452 n=19+19)
GoParse-12 3.38ms ± 1% 3.37ms ± 1% ~ (p=0.123 n=19+19)
RegexpMatchEasy0_32-12 72.1ns ± 1% 71.8ns ± 1% ~ (p=0.397 n=19+17)
RegexpMatchEasy0_1K-12 242ns ± 0% 242ns ± 0% ~ (p=0.168 n=17+20)
RegexpMatchEasy1_32-12 72.1ns ± 1% 72.1ns ± 1% ~ (p=0.538 n=18+19)
RegexpMatchEasy1_1K-12 385ns ± 1% 384ns ± 1% ~ (p=0.388 n=20+20)
RegexpMatchMedium_32-12 112ns ± 1% 112ns ± 3% ~ (p=0.539 n=20+20)
RegexpMatchMedium_1K-12 34.4µs ± 2% 34.4µs ± 2% ~ (p=0.628 n=18+18)
RegexpMatchHard_32-12 1.80µs ± 1% 1.80µs ± 1% ~ (p=0.522 n=18+19)
RegexpMatchHard_1K-12 54.0µs ± 1% 54.1µs ± 1% ~ (p=0.647 n=20+19)
Revcomp-12 387ms ± 1% 369ms ± 5% -4.89% (p=0.000 n=17+19)
Template-12 62.3ms ± 1% 62.0ms ± 0% -0.48% (p=0.002 n=20+17)
TimeParse-12 314ns ± 1% 314ns ± 0% ~ (p=1.011 n=20+13)
TimeFormat-12 358ns ± 0% 354ns ± 0% -1.12% (p=0.000 n=17+20)
[Geo mean] 53.5µs 53.3µs -0.23%
Change-Id: I2a0a179d1d6bf7875dd054b7693dd12d2a340132
Reviewed-on: https://go-review.googlesource.com/23540
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Commit 59877bf renamed bitMarked to bitScan, since the bitmap is no
longer used for marking. However, there were several other references
to this strewn about comments and in some other constant names. Fix
these up, too.
Change-Id: I4183d28c6b01977f1d75a99ad55b150f2211772d
Reviewed-on: https://go-review.googlesource.com/28450
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
The previous if condition already checks the same expression and doesn't
have side effects.
Change-Id: Ieaf30a786572b608d0a883052b45fd3f04bc6147
Reviewed-on: https://go-review.googlesource.com/28475
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
We reset global buffer only if its pos != 0.
We ought to do it always, but queue it only if pos != 0.
This is a latent bug. Currently it does not fire because
whenever we create a global buffer, we increment pos.
Change-Id: I01e28ae88ce9a5412497c524391b8b7cb443ffd9
Reviewed-on: https://go-review.googlesource.com/25574
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Currently this message says "invalid stack pointer", which could be
interpreted as the value of SP being invalid. Change it to "invalid
pointer found on stack" to emphasize that it's a pointer on the stack
that's invalid.
Updates #16948.
Change-Id: I753624f8cc7e08cf13d3ea5d9c790cc4af9fa372
Reviewed-on: https://go-review.googlesource.com/28430
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
This reverts commit 3607c5f4f1.
This was causing failures on amd64 machines without AVX.
Fixes#16939
Change-Id: I70080fbb4e7ae791857334f2bffd847d08cb25fa
Reviewed-on: https://go-review.googlesource.com/28274
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
On ARM64, MIPS64, and PPC64, some floating point registers were
reserved for constants 0, 1, 2, 0.5, etc. This CL removes them.
On ARM64, they are never used. On MIPS64 and PPC64, the only use
case is a multiplication-by-2 in the old backend of the compiler,
which is replaced with an addition.
Change-Id: I737cbf43283756e3408964fc88c567a938c57036
Reviewed-on: https://go-review.googlesource.com/28095
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change makes sure that tests are run with the correct
version of the go tool. The correct version is the one that
we invoked with "go test", not the one that is first in our path.
Fixes#16577
Change-Id: If22c8f8c3ec9e7c35d094362873819f2fbb8559b
Reviewed-on: https://go-review.googlesource.com/28089
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Generate a for loop for ranging over strings that only needs to call
the runtime function charntorune for non ASCII characters.
This provides faster iteration over ASCII characters and slightly
faster iteration for other characters.
The runtime function charntorune is changed to take an index from where
to start decoding and returns the index after the last byte belonging
to the decoded rune.
All call sites of charntorune in the runtime are replaced by a for loop
that will be transformed by the compiler instead of calling the charntorune
function directly.
go binary size decreases by 80 bytes.
godoc binary size increases by around 4 kilobytes.
runtime:
name old time/op new time/op delta
RuneIterate/range/ASCII-4 43.7ns ± 3% 10.3ns ± 4% -76.33% (p=0.000 n=44+45)
RuneIterate/range/Japanese-4 72.5ns ± 2% 62.8ns ± 2% -13.41% (p=0.000 n=49+50)
RuneIterate/range1/ASCII-4 43.5ns ± 2% 10.4ns ± 3% -76.18% (p=0.000 n=50+50)
RuneIterate/range1/Japanese-4 72.5ns ± 2% 62.9ns ± 2% -13.26% (p=0.000 n=50+49)
RuneIterate/range2/ASCII-4 43.5ns ± 3% 10.3ns ± 2% -76.22% (p=0.000 n=48+47)
RuneIterate/range2/Japanese-4 72.4ns ± 2% 62.7ns ± 2% -13.47% (p=0.000 n=50+50)
strings:
name old time/op new time/op delta
IndexRune-4 64.7ns ± 5% 22.4ns ± 3% -65.43% (p=0.000 n=25+21)
MapNoChanges-4 269ns ± 2% 157ns ± 2% -41.46% (p=0.000 n=23+24)
Fields-4 23.0ms ± 2% 19.7ms ± 2% -14.35% (p=0.000 n=25+25)
FieldsFunc-4 23.1ms ± 2% 19.6ms ± 2% -14.94% (p=0.000 n=25+24)
name old speed new speed delta
Fields-4 45.6MB/s ± 2% 53.2MB/s ± 2% +16.87% (p=0.000 n=24+25)
FieldsFunc-4 45.5MB/s ± 2% 53.5MB/s ± 2% +17.57% (p=0.000 n=25+24)
Updates #13162
Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701
Reviewed-on: https://go-review.googlesource.com/27853
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Martin Möhrmann <martisch@uos.de>
noescape is now 0 instructions with the SSA backend.
fast atomics are no longer a TODO (at least for amd64).
Change-Id: Ib6e06f7471bef282a47ba236d8ce95404bb60a42
Reviewed-on: https://go-review.googlesource.com/28087
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The current padding in the 'p' struct is hardcoded at 64 bytes. It should be the
cache line size. On ppc64x, the current value is only okay because sys.CacheLineSize
is wrong at 64 bytes. This change fixes that by making the padding equal to the
cache line size. It also fixes the cache line size for ppc64/ppc64le to 128 bytes.
Fixes#16477
Change-Id: Ib7ec5195685116eb11ba312a064f41920373d4a3
Reviewed-on: https://go-review.googlesource.com/25370
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Where possible generate calls to runtime makeslice with int arguments
during compile time instead of makeslice with int64 arguments.
This eliminates converting arguments for calls to makeslice with
int64 arguments for platforms where int64 values do not fit into
arguments of type int.
godoc 386 binary shrinks by approximately 12 kilobyte.
amd64:
name old time/op new time/op delta
MakeSlice-2 29.8ns ± 1% 29.8ns ± 1% ~ (p=1.000 n=24+24)
386:
name old time/op new time/op delta
MakeSlice-2 52.3ns ± 0% 45.9ns ± 0% -12.17% (p=0.000 n=25+22)
Fixes #15357
Change-Id: Icb8701bb63c5a83877d26c8a4b78e782ba76de7c
Reviewed-on: https://go-review.googlesource.com/27851
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Fixes#16911.
Fix obsolete inferno-os links, since code.google.com shutdown.
This CL points to the right files by replacing
http://code.google.com/p/inferno-os/source/browse
with
https://bitbucket.org/inferno-os/inferno-os/src/default
To implement the change I wrote and ran this script in the root:
$ grep -Rn 'http://code.google.com/p/inferno-os/source/browse' * \
| cut -d":" -f1 | while read F;do perl -pi -e \
's/http:\/\/code.google.com\/p\/inferno-os\/source\/browse/https:\/\/bitbucket.org\/inferno-os\/inferno-os\/src\/default/g'
$F;done
I excluded any cmd/vendor changes from the commit.
Change-Id: Iaaf828ac8f6fc949019fd01832989d00b29b6749
Reviewed-on: https://go-review.googlesource.com/27994
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
For reasons I have forgotten typelinksinit processed modules backwards.
(I suspect this was an attempt to process types in the executing
binary first.)
It does not appear to be necessary, and it is not the order we want
when a module can be loaded at an arbitrary point during a program's
execution as a plugin. So reverse the order.
While here, make it safe to call typelinksinit multiple times.
Change-Id: Ie10587c55c8e5efa0542981efb6eb3c12dd59e8c
Reviewed-on: https://go-review.googlesource.com/27822
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Inline atomic reads and writes on amd64. There's no reason
to pay the overhead of a call for these.
To keep atomic loads from being reordered, we make them
return a <value,memory> tuple.
Change the meaning of resultInArg0 for tuple-generating ops
to mean the first part of the result tuple, not the second.
This means we can always put the store part of the tuple last,
matching how arguments are laid out. This requires reordering
the outputs of add32carry and sub32carry and their descendents
in various architectures.
benchmark old ns/op new ns/op delta
BenchmarkAtomicLoad64-8 2.09 0.26 -87.56%
BenchmarkAtomicStore64-8 7.54 5.72 -24.14%
TBD (in a different CL): Cas, Or8, ...
Change-Id: I713ea88e7da3026c44ea5bdb56ed094b20bc5207
Reviewed-on: https://go-review.googlesource.com/27641
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Add missing function prototypes.
Fix function prototypes.
Use FP references instead of SP references.
Fix variable names.
Update comments.
Clean up whitespace. (Not for vet.)
All fairly minor fixes to make vet happy.
Updates #11041
Change-Id: Ifab2cdf235ff61cdc226ab1d84b8467b5ac9446c
Reviewed-on: https://go-review.googlesource.com/27713
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The type sigtabtt was introduced by an automated tool in
https://golang.org/cl/167550043. It was the Go version of the C type
SigTab. However, when the C code using SigTab was converted to Go in
https://golang.org/cl/168500044 it was rewritten to use a different Go
type, sigTabT, rather than sigtabtt (the difference being that sigTabT
uses string where sigtabtt uses *int8 from the C type char*). So this is
just a dreg from the conversion that was never actually used.
Change-Id: I2ec6eb4b25613bf5e5ad1dbba1f4b5ff20f80f55
Reviewed-on: https://go-review.googlesource.com/27691
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This should improve the precision of time.now() from microseconds
to nanoseconds.
Also, modify runtime.nanotime to keep it consistent with cleanup
done to time.now.
Updates #11222 for s390x.
Change-Id: I27864115ea1fee7299360d9003cd3a8355f624d3
Reviewed-on: https://go-review.googlesource.com/27710
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Now that we have ops that can return 2 results, have BSF return a result
and flags. We can then get rid of the redundant comparison and use CMOV
instead of CMOVconst ops.
Get rid of a bunch of the ops we don't use. Ctz{8,16}, plus all the Clzs,
and CMOVNEs. I don't think we'll ever use them, and they would be easy
to add back if needed.
Change-Id: I8858a1d017903474ea7e4002fc76a6a86e7bd487
Reviewed-on: https://go-review.googlesource.com/27630
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Tell msan that the arguments to the traceback functions are initialized,
in case the traceback functions are compiled with -fsanitize=memory.
Change-Id: I3ab0816604906c6cd7086245e6ae2e7fa62fe354
Reviewed-on: https://go-review.googlesource.com/24856
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Add missing race and msan checks to reflect.typedmmemove and
reflect.typedslicecopy. Missing these checks caused the race detector
to miss races and caused msan to issue false positive errors.
Fixes#16281.
Change-Id: I500b5f92bd68dc99dd5d6f297827fd5d2609e88b
Reviewed-on: https://go-review.googlesource.com/24760
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Fetch the current time in nanoseconds, not microseconds, by using
clock_gettime rather than gettimeofday.
Updates #11222
Change-Id: I1c2c1b88f80ae82002518359436e19099061c6fb
Reviewed-on: https://go-review.googlesource.com/26790
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Minux Ma <minux@golang.org>
Found by vet.
Updates #11041
Change-Id: I5217b3e20c6af435d7500d6bb487b9895efe6605
Reviewed-on: https://go-review.googlesource.com/27493
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
In StartTrace we emit EvGoCreate for all existing goroutines.
This includes stack unwind to obtain current stack.
Real Go programs can contain hundreds of thousands of blocked goroutines.
For such programs StartTrace can take up to a second (few ms per goroutine).
Obtain current stack ID once and use it for all EvGoCreate events.
This speeds up StartTrace with 10K blocked goroutines from 20ms to 4 ms
(win for StartTrace called from net/http/pprof hander will be bigger
as stack is deeper).
Change-Id: I9e5ff9468331a840f8fdcdd56c5018c2cfde61fc
Reviewed-on: https://go-review.googlesource.com/25573
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
They are unused, and vet wants them to have
a function prototype.
Updates #11041
Change-Id: Idedc96ddd3c3cf1b1d2ab6d98796367eab29f032
Reviewed-on: https://go-review.googlesource.com/27492
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The asmdecl check had hand-rolled code that
calculated the size and offset of parameters
based only on the AST.
It included a list of known named types.
This CL changes asmdecl to use go/types instead.
This allows us to easily handle named types.
It also adds support for structs, arrays,
and complex parameters.
It improves the default names given to unnamed
parameters. Previously, all anonymous arguments were
called "unnamed", and the first anonymous return
argument was called "ret".
Anonymous arguments are now called arg, arg1, arg2,
etc., depending on the index in the argument list.
Return arguments are ret, ret1, ret2.
This CL also fixes a bug in the printing of
composite data type sizes.
Updates #11041
Change-Id: I1085116a26fe6199480b680eff659eb9ab31769b
Reviewed-on: https://go-review.googlesource.com/27150
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Go will have already cleared the structs (the original C wouldn't
have).
Change-Id: I4a5a0cfd73953181affc158d188aae2ce281bb33
Reviewed-on: https://go-review.googlesource.com/27435
Run-TryBot: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The previous fix for this, commit 336dad2a, had everything right in
the commit message, but reversed the test in the code. Fix the test in
the code.
This reversal effectively disabled the scavenger on large page systems
*except* in the rare cases where this code was originally wrong, which
is why it didn't obviously show up in testing.
Fixes#16644. Again. :(
Change-Id: I27cce4aea13de217197db4b628f17860f27ce83e
Reviewed-on: https://go-review.googlesource.com/27402
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The transition from mark 1 to mark 2 no longer enqueues new root
marking jobs, but some of the comments still refer to this. Fix these
comments.
Change-Id: I3f98628dba32c5afe30495ab495da42b32291e9e
Reviewed-on: https://go-review.googlesource.com/24965
Reviewed-by: Rick Hudson <rlh@golang.org>
This time with the cherry-pick from the proper patch of
the old CL.
Stack size increased.
Corrected NaN-comparison glitches.
Marked g register as clobbered by calls.
Fixed shared libraries.
live_ssa.go still disabled because of differences.
Presumably turning on more optimization will fix
both the stack size and the live_ssa.go glitches.
Enhanced debugging output for shared libs test.
Rebased onto master.
Updates #16010.
Change-Id: I40864faf1ef32c118fb141b7ef8e854498e6b2c4
Reviewed-on: https://go-review.googlesource.com/27159
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
sysUnused (e.g., madvise MADV_FREE) is only sensible to call on
physical page boundaries, so scavengelist rounds in the bounds of the
region being released to the nearest physical page boundaries.
However, if the region is smaller than a physical page and neither the
start nor end fall on a boundary, then rounding the start up to a page
boundary and the end down to a page boundary will result in end < start.
Currently, we only give up on the region if start == end, so if we
encounter end < start, we'll call madvise with a negative length and
the madvise will fail.
Issue #16644 gives a concrete example of this:
start = 0x1285ac000
end = 0x1285ae000 (1 8K page)
This leads to the rounded values
start = 0x1285b0000
end = 0x1285a0000
which leads to len = -65536.
Fix this by giving up on the region if end <= start, not just if
end == start.
Fixes#16644.
Change-Id: I8300db492dbadc82ac1ad878318b36bcb7c39524
Reviewed-on: https://go-review.googlesource.com/27230
Reviewed-by: Keith Randall <khr@golang.org>
We should check whether there is a concurrent writer at the
start of every mapiternext, not just in mapaccessK (which is
only called during certain map growth situations).
Tests turned off by default because they are inherently flaky.
Fixes#16278
Change-Id: I8b72cab1b8c59d1923bec6fa3eabc932e4e91542
Reviewed-on: https://go-review.googlesource.com/24749
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
We already inlined
_, ok = e.(T)
_, ok = i.(E)
_, ok = e.(E)
The only ok-only variants not inlined are now
_, ok = i.(I)
_, ok = e.(I)
These call getitab, so are non-trivial.
Change-Id: Ie45fd8933ee179a679b92ce925079b94cff0ee12
Reviewed-on: https://go-review.googlesource.com/26658
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Merging from dev.ssa back into master.
Contains complete SSA backends for arm, arm64, 386, amd64p32.
Work in progress for PPC64.
Change-Id: Ifd7075e3ec6f88f776e29f8c7fd55830328897fd
Last part of the 386 SSA port.
Modify the x86 backend to simulate SSE registers and
instructions with 387 registers and instructions.
The simulation isn't terribly performant, but it works,
and the old implementation wasn't very performant either.
Leaving to people who care about 387 to optimize if they want.
Turn on SSA backend for 386 by default.
Fixes#16358
Change-Id: I678fb59132620b2c47e993c1c10c4c21135f70c0
Reviewed-on: https://go-review.googlesource.com/25271
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Fixes#16618.
Change-Id: Iffada12e8672bbdbcf2e787782c497e2c45701b1
Reviewed-on: https://go-review.googlesource.com/25550
Run-TryBot: Minux Ma <minux@golang.org>
Reviewed-by: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Fixes#16570 on iOS.
Thanks Daniel Burhans for reporting the bug and testing the fix.
Change-Id: I43ae7b78c8f85a131ed3d93ea59da9f32a02cd8f
Reviewed-on: https://go-review.googlesource.com/25481
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
When compiling with -buildmode=shared, a map[int32]*_type is created for
each extra module mapping duplicate types back to a canonical object.
This is done in the function typelinksinit, which is called before the
init function that sets up the hash functions for the map
implementation. The result is typemap becomes unusable after
runtime initialization.
The fix in this CL is to move algorithm init before typelinksinit in
the runtime setup process. (For 1.8, we may want to turn typemap into
a sorted slice of types and use binary search.)
Manually tested on GOOS=linux with:
GOHOSTARCH=386 GOARCH=386 ./make.bash && \
go install -buildmode=shared std && \
cd ../test && \
go run run.go -linkshared
Fixes#16590
Change-Id: Idc08c50cc70d20028276fbf564509d2cd5405210
Reviewed-on: https://go-review.googlesource.com/25469
Run-TryBot: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
macOS Sierra beta4 changed the kernel interface for getting time.
DX now optionally points to an address for additional info.
Set it to zero to avoid corrupting memory.
Fixes#16570
Change-Id: I9f537e552682045325cdbb68b7d0b4ddafade14a
Reviewed-on: https://go-review.googlesource.com/25400
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Quentin Smith <quentin@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Mutator goroutines that allocate memory during the concurrent mark
phase are required to spend some time assisting the garbage
collector. The magnitude of this mandatory assistance is proportional
to the goroutine's allocation debt and subject to the assistance
ratio as calculated by the pacer.
When assisting the garbage collector, a mutator goroutine will go
beyond paying off its allocation debt. It will build up extra credit
to amortize the overhead of the assist.
In fast-allocating applications with high assist ratios, building up
this credit can take the affected goroutine's entire time slice.
Reduce the penalty on each goroutine being selected to assist the GC
in two ways, to spread the responsibility more evenly.
First, do a consistent amount of extra scan work without regard for
the pacer's assistance ratio. Second, reduce the magnitude of the
extra scan work so it can be completed within a few hundred
microseconds.
Commentary on gcOverAssistWork is by Austin Clements, originally in
https://golang.org/cl/24704
Updates #14812Fixes#16432
Change-Id: I436f899e778c20daa314f3e9f0e2a1bbd53b43e1
Reviewed-on: https://go-review.googlesource.com/25155
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Chris Broadfoot <cbro@golang.org>
Currently the pprof package gives almost no guidance for how to use it
and, despite the standard boilerplate used to create CPU and memory
profiles, this boilerplate appears nowhere in the pprof documentation.
Update the pprof package documentation to give the standard
boilerplate in a form people can copy, paste, and tweak. This
boilerplate is based on rsc's 2011 blog post on profiling Go programs
at https://blog.golang.org/profiling-go-programs, which is where I
always go when I need to copy-paste the boilerplate.
Change-Id: I74021e494ea4dcc6b56d6fb5e59829ad4bb7b0be
Reviewed-on: https://go-review.googlesource.com/25182
Reviewed-by: Rick Hudson <rlh@golang.org>
GOARCH=386 SSATEST=1 ./all.bash passes
Caveat: still needs changes to test/ files to use *_ssa.go versions. I
won't check those changes in with this CL because the builders will
complain as they don't have SSATEST=1.
Mostly minor fixes.
Implement float <-> uint32 in assembly. It seems the simplest option
for now.
GO386=387 does not work. That's why I can't make SSA the default for
386 yet.
Change-Id: Ic4d4402104d32bcfb1fd612f5bb6539f9acb8ae0
Reviewed-on: https://go-review.googlesource.com/25119
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The omission of this instruction could confuse the traceback code if a
SIGPROF occurred during a signal handler. The traceback code would
trace up to sigtramp, but would then get confused because it would see a
PC address that did not appear to be in the function.
Fixes#16453.
Change-Id: I2b3d53e0b272fb01d9c2cb8add22bad879d3eebc
Reviewed-on: https://go-review.googlesource.com/25104
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Most operations need an upper bound on the physical page size, which
is what sys.PhysPageSize is for (this is checked at runtime init on
Linux). However, a few operations need a *lower* bound on the physical
page size. Introduce a "minPhysPageSize" constant to act as this lower
bound and use it where it makes sense:
1) In addrspace_free, we have to query each page in the given range.
Currently we increment by the upper bound on the physical page
size, which means we may skip over pages if the true size is
smaller. Worse, we currently pass a result buffer that only has
enough room for one page. If there are actually multiple pages in
the range passed to mincore, the kernel will overflow this buffer.
Fix these problems by incrementing by the lower-bound on the
physical page size and by passing "1" for the length, which the
kernel will round up to the true physical page size.
2) In the write barrier, the bad pointer check tests for pointers to
the first physical page, which are presumably small integers
masquerading as pointers. However, if physical pages are smaller
than we think, we may have legitimate pointers below
sys.PhysPageSize. Hence, use minPhysPageSize for this test since
pointers should never fall below that.
In particular, this applies to ARM64 and MIPS. The runtime is
configured to use 64kB pages on ARM64, but by default Linux uses 4kB
pages. Similarly, the runtime assumes 16kB pages on MIPS, but both 4kB
and 16kB kernel configurations are common. This also applies to ARM on
systems where the runtime is recompiled to deal with a larger page
size. It is also a step toward making the runtime use only a
dynamically-queried page size.
Change-Id: I1fdfd18f6e7cbca170cc100354b9faa22fde8a69
Reviewed-on: https://go-review.googlesource.com/25020
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Austin Clements <austin@google.com>
When a non-Go thread calls into Go, the runtime needs an M to run the Go
code. The runtime keeps a list of extra M's available. When the last
extra M is allocated, the needextram field is set to tell it to allocate
a new extra M as soon as it is running in Go. This ensures that an extra
M will always be available for the next thread.
However, if many threads need an extra M at the same time, this
serializes them all. One thread will get an extra M with the needextram
field set. All the other threads will see that there is no M available
and will go to sleep. The one thread that succeeded will create a new
extra M. One lucky thread will get it. All the other threads will see
that there is no M available and will go to sleep. The effect is
thundering herd, as all the threads looking for an extra M go through
the process one by one. This seems to have a particularly bad effect on
the FreeBSD scheduler for some reason.
With this change, we track the number of threads waiting for an M, and
create all of them as soon as one thread gets through. This still means
that all the threads will fight for the lock to pick up the next M. But
at least each thread that gets the lock will succeed, instead of going
to sleep only to fight again.
This smooths out the performance greatly on FreeBSD, reducing the
average wall time of `testprogcgo CgoCallbackGC` by 74%. On GNU/Linux
the average wall time goes down by 9%.
Fixes#13926Fixes#16396
Change-Id: I6dc42a4156085a7ed4e5334c60b39db8f8ef8fea
Reviewed-on: https://go-review.googlesource.com/25047
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Add some simplification rules for floating point ops.
cmd/internal/obj/arm supports instructions that compare FP register
to 0, but runtime softfloat simulator does not. This CL adds these
instructions to softfloat simulator as well.
Updates #15365.
Change-Id: I29405b2bfcb4c8cf106cb7a1a811409fec91b170
Reviewed-on: https://go-review.googlesource.com/24790
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This fixes erroneous handling of the more result parameter of
runtime.Frames.Next.
Fixes#16349.
Change-Id: I4f1c0263dafbb883294b31dbb8922b9d3e650200
Reviewed-on: https://go-review.googlesource.com/24911
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The cgocallback function picked up a ctxt parameter in CL 22508.
That CL updated the assembler implementation, but there are a few
mentions in Go code that were not updated. This CL fixes that.
Fixes#16326
Change-Id: I5f68e23565c6a0b11057aff476d13990bff54a66
Reviewed-on: https://go-review.googlesource.com/24848
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
In the beta version of the macOS Sierra (10.12) release, the
gettimeofday system call changed on x86. Previously it always returned
the time in the AX/DX registers. Now, if AX is returned as 0, it means
that the system call has stored the values into the memory pointed to by
the first argument, just as the libc gettimeofday function does. The
libc function handles both cases, and we need to do so as well.
Fixes#16272.
Change-Id: Ibe5ad50a2c5b125e92b5a4e787db4b5179f6b723
Reviewed-on: https://go-review.googlesource.com/24812
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The shrinkstack code locks all the channels a goroutine is waiting for,
but didn't handle the case of the same channel appearing in the list
multiple times. This led to a deadlock. The channels are sorted so it's
easy to avoid locking the same channel twice.
Fixes#16286.
Change-Id: Ie514805d0532f61c942e85af5b7b8ac405e2ff65
Reviewed-on: https://go-review.googlesource.com/24815
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
The assembly is broken: it does `MOVQ g(R12), R14` expecting that
R12 contains tls address, but it does not do get_tls(R12) before.
This magically works on linux: `MOVQ g(R12), R14` is compiled to
`mov %fs:0xfffffffffffffff8,%r14` which does not use R12.
But it crashes on windows.
Add explicit `get_tls(R12)`.
Fixes#16206
Change-Id: Ic1f21a6fef2473bcf9147de6646929781c9c1e98
Reviewed-on: https://go-review.googlesource.com/24590
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
When the blocked field was first introduced back in
https://golang.org/cl/61250043 the scheduler trace code incorrectly used
m->blocked instead of mp->blocked. That has carried through the
conversion to Go. This CL fixes it.
Change-Id: Id81907b625221895aa5c85b9853f7c185efd8f4b
Reviewed-on: https://go-review.googlesource.com/24571
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
If creating a new thread fails with EAGAIN, point the user at ulimit.
Fixes#15476.
Change-Id: Ib36519614b5c72776ea7f218a0c62df1dd91a8ea
Reviewed-on: https://go-review.googlesource.com/24570
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Several minor changes that remove a good chunk of the overhead added
to the reflect Name method over the 1.7 cycle, as seen from the
non-SSA architectures.
In particular, there are ~20 fewer instructions in reflect.name.name
on 386, and the method now qualifies for inlining.
The simple JSON decoding benchmark on darwin/386:
name old time/op new time/op delta
CodeDecoder-8 49.2ms ± 0% 48.9ms ± 1% -0.77% (p=0.000 n=10+9)
name old speed new speed delta
CodeDecoder-8 39.4MB/s ± 0% 39.7MB/s ± 1% +0.77% (p=0.000 n=10+9)
On darwin/amd64 the effect is less pronounced:
name old time/op new time/op delta
CodeDecoder-8 38.9ms ± 0% 38.7ms ± 1% -0.38% (p=0.005 n=10+10)
name old speed new speed delta
CodeDecoder-8 49.9MB/s ± 0% 50.1MB/s ± 1% +0.38% (p=0.006 n=10+10)
Counterintuitively, I get much more useful benchmark data out of my
MacBook Pro than a linux workstation with more expensive Intel chips.
While the laptop has fewer cores and an active GUI, the single-threaded
performance is significantly better (nearly 1.5x decoding throughput)
so the differences are more pronounced.
For #16117.
Change-Id: I4e0cc1cc2d271d47d5127b1ee1ca926faf34cabf
Reviewed-on: https://go-review.googlesource.com/24510
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This modifies a recent performance improvement to the
And8 and Or8 atomic functions which required both ppc64le
and ppc64 to use power8 instructions. Since then it was
decided that ppc64 (BE) should work for power5 and later.
This change uses instructions compatible with power5 for
ppc64 and uses power8 for ppc64le.
Fixes#16004
Change-Id: I623c75e8e6fd1fa063a53d250d86cdc9d0890dc7
Reviewed-on: https://go-review.googlesource.com/24181
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Andrew Gerrand <adg@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
In the comments for this file there is a reference to gperftools
for more info on pprof. pprof now live on its own repo on github,
and the version in gperftools is deprecated.
Change-Id: I8a188f129534f73edd132ef4e5a2d566e69df7e9
Reviewed-on: https://go-review.googlesource.com/24502
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This was removed in CL 19695 but it slows down reflect.New, which ends
up on the hot path of things like JSON decoding.
There is no immediate cost in binary size, but it will make it harder to
further shrink run time type information in Go 1.8.
Before
BenchmarkNew-40 30000000 36.3 ns/op
After
BenchmarkNew-40 50000000 29.5 ns/op
Fixes#16161
Updates #16117
Change-Id: If7cb7f3e745d44678f3f5cf3a5338c59847529d2
Reviewed-on: https://go-review.googlesource.com/24400
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The test is in the runtime package because there are other tests of
pprof there. At some point we should probably move them all into a pprof
testsuite.
Fixes#16128.
Change-Id: Ieefa40c61cf3edde11fe0cf04da1debfd8b3d7c0
Reviewed-on: https://go-review.googlesource.com/24274
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
A straight conversion from a type T to an interface type I, where T does
not implement I, should always panic with an interface conversion error
that shows the missing method. This was not happening if the conversion
was done once using the comma-ok form (the result would not be OK) and
then again in a straight conversion. Due to an error in the runtime
package the second conversion was failing with a nil pointer
dereference.
Fixes#16130.
Change-Id: I8b9fca0f1bb635a6181b8b76de8c2385bb7ac2d2
Reviewed-on: https://go-review.googlesource.com/24284
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
We haven't used poisonStack since we switched to 1-bit stack maps
(4d0f3a1), but the checks are still there. However, nothing prevents
us from genuinely allocating an object at this address on 32-bit and
causing the runtime to crash claiming that it's found a bad pointer.
Since we're not using poisonStack anyway, just pull it out.
Fixes#15831.
Change-Id: Ia6ef604675b8433f75045e369f5acd4644a5bb38
Reviewed-on: https://go-review.googlesource.com/24211
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
I verified that the test fails if I undo the change that it tests for.
Updates #14732.
Change-Id: Ib30352580236adefae946450ddd6cd65a62b7cdf
Reviewed-on: https://go-review.googlesource.com/24151
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Mikio Hara <mikioh.mikioh@gmail.com>
This is an attempt to get more information for #14809, which seems to
occur rarely.
Updates #14809.
Change-Id: Idbeb136ceb57993644e03266622eb699d2685d02
Reviewed-on: https://go-review.googlesource.com/24110
Reviewed-by: Mikio Hara <mikioh.mikioh@gmail.com>
Reviewed-by: Austin Clements <austin@google.com>
This adds 8 bytes of binary size to every type that has methods. It is
the smallest change I could come up with for 1.7.
Fixes#16037
Change-Id: Ibe15c3165854a21768596967757864b880dbfeed
Reviewed-on: https://go-review.googlesource.com/24070
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Instead of doing:
x = input
one round of aes on x
x ^= seed
two rounds of aes on x
Do:
x = input
x ^= seed
three rounds of aes on x
This change provides some additional seed-dependent scrambling
which should help prevent collisions.
Change-Id: I02c774d09c2eb6917cf861513816a1024a9b65d7
Reviewed-on: https://go-review.googlesource.com/23577
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
When setting $pc, gdb does a backtrace using the current value of $sp,
and it may complain if $sp does not match that $pc (although the
assignment went through successfully).
This happens with ARM SSA backend: when setting $pc it prints
> Cannot access memory at address 0x0
As well as occasionally on MIPS64:
> warning: GDB can't find the start of the function at 0xc82003fe07.
> ...
Setting $sp before setting $pc makes it happy.
Change-Id: Idd96dbef3e9b698829da553c6d71d5b4c6d492db
Reviewed-on: https://go-review.googlesource.com/23940
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
R13 needs to be set to g because C code may have clobbered R13.
Fixes#16006.
Change-Id: I66311fe28440e85e589a1695fa1c42416583b4c6
Reviewed-on: https://go-review.googlesource.com/23910
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
go_android_exec is looking for "exitcode=" to decide the result
of running a test. The heap dump test nondeterministically prints
"finalized" right at the end of the test. When the timing is just
right, we print "finalizedexitcode=0" and confuse go_android_exec.
This failure happens occasionally on the android builders.
Change-Id: I4f73a4db05d8f40047ecd3ef3a881a4ae3741e26
Reviewed-on: https://go-review.googlesource.com/23861
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Any defer in a shared object crashed when GOARCH=386. This turns out to be two
bugs:
1) Calls to morestack were not processed to be PIC safe (must have been
possible to trigger this another way too)
2) jmpdefer needs to rewind the return address of the deferred function past
the instructions that load the GOT pointer into BX, not just past the call
Bug 2) requires re-introducing the a way for .s files to know when they are
being compiled for dynamic linking but I've tried to do that in as minimal
a way as possible.
Fixes#15916
Change-Id: Ia0d09b69ec272a176934176b8eaef5f3bfcacf04
Reviewed-on: https://go-review.googlesource.com/23623
Run-TryBot: Michael Hudson-Doyle <michael.hudson@canonical.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
When doing a backtrace from a signal that occurs in C code compiled
without using -fasynchronous-unwind-tables, we have to rely on frame
pointers. In order to do that, the traceback function needs the signal
context to reliably pick up the frame pointer.
Change-Id: I7b45930fced01685c337d108e0f146057928f876
Reviewed-on: https://go-review.googlesource.com/23494
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The code has moved from code.google.com to github.com.
Change-Id: I0cc9eb69b3fedc9e916417bc7695759632f2391f
Reviewed-on: https://go-review.googlesource.com/23523
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Add TSAN acquire/release calls to runtime/cgo to match the ones
generated by cgo. This avoids a false positive race around the malloc
memory used in runtime/cgo when other goroutines are simultaneously
calling malloc and free from cgo.
These new calls will only be used when building with CGO_CFLAGS and
CGO_LDFLAGS set to -fsanitize=thread, which becomes a requirement to
avoid all false positives when using TSAN. These are needed not just
for runtime/cgo, but also for any runtime package that uses cgo (such as
net and os/user).
Add an unused attribute to the _cgo_tsan_acquire and _cgo_tsan_release
functions, in case there are no actual cgo function calls.
Add a test that checks that setting CGO_CFLAGS/CGO_LDFLAGS avoids a
false positive report when using os/user.
Change-Id: I0905c644ff7f003b6718aac782393fa219514c48
Reviewed-on: https://go-review.googlesource.com/23492
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
In order to support pprof for position independent executables, pprof
needs to adjust the PC addresses stored in the profile by the address at
which the program is loaded. The legacy profiling support which we use
already supports recording the GNU/Linux /proc/self/maps data
immediately after the CPU samples, so do that. Also change the pprof
symbolizer to use the information, if available, when looking up
addresses in the Go pcline data.
Fixes#15714.
Change-Id: I4bf679210ef7c51d85cf873c968ce82db8898e3e
Reviewed-on: https://go-review.googlesource.com/23525
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Also adds missing copyright notice.
Updates #15603.
Change-Id: Icf4bb45ba5edec891491fe5f0039a8a25125d168
Reviewed-on: https://go-review.googlesource.com/23501
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently when the garbage collector frees stacks of dead goroutines
in markrootFreeGStacks, it calls stackfree on a regular user stack.
This is a problem, since stackfree manipulates the stack cache in the
per-P mcache, so if it grows the stack or gets preempted in the middle
of manipulating the stack cache (which are both possible since it's on
a user stack), it can easily corrupt the stack cache.
Fix this by calling markrootFreeGStacks on the system stack, so that
all calls to stackfree happen on the system stack. To prevent this bug
in the future, mark stack functions that manipulate the mcache as
go:systemstack.
Fixes#15853.
Change-Id: Ic0d1c181efb342f134285a152560c3a074f14a3d
Reviewed-on: https://go-review.googlesource.com/23511
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This has a minor performance cost, but far less than is being gained by SSA.
As an experiment, enable it during the Go 1.7 beta.
Having frame pointers on by default makes Linux's perf, Intel VTune,
and other profilers much more useful, because it lets them gather a
stack trace efficiently on profiling events.
(It doesn't help us that much, since when we walk the stack we usually
need to look up PC-specific information as well.)
Fixes#15840.
Change-Id: I4efd38412a0de4a9c87b1b6e5d11c301e63f1a2a
Reviewed-on: https://go-review.googlesource.com/23451
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The irregular calling convention for defers currently incorrectly
manages the BP if frame pointers are enabled. Specifically, jmpdefer
manipulates the SP as if its own caller, deferreturn, had returned.
However, it does not manipulate the BP to match. As a result, when a
BP-based traceback happens during a deferred function call, it unwinds
to the function that performed the defer and then thinks that function
called itself in an infinite regress.
Fix this by making jmpdefer manipulate the BP as if deferreturn had
actually returned.
Fixes#12968.
Updates #15840.
Change-Id: Ic9cc7c863baeaf977883ed0c25a7e80e592cf066
Reviewed-on: https://go-review.googlesource.com/23457
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
A few other architectures have already defined a NOFRAME flag.
Use it to disable frame pointer code on a few very low-level functions
that must behave like Windows code.
Makes the failing os/signal test pass on a Windows gomote.
Change-Id: I982365f2c59a0aa302b4428c970846c61027cf3e
Reviewed-on: https://go-review.googlesource.com/23456
Reviewed-by: Austin Clements <austin@google.com>
I have been running this patch inside Google against Go 1.6 for the last month.
The new tests will probably break the builders but let's see
exactly how they break.
Change-Id: Ia65cf7d3faecffeeb4b06e9b80875c0e57d86d9e
Reviewed-on: https://go-review.googlesource.com/23452
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Acquire and release the TSAN synchronization point when calling malloc,
just as we do when calling any other C function. If we don't do this,
TSAN will report false positive errors about races calling malloc and
free.
We used to have a special code path for malloc and free, going through
the runtime functions cmalloc and cfree. The special code path for cfree
was no longer used even before this CL. This CL stops using the special
code path for malloc, because there is no place along that path where we
could conditionally insert the TSAN synchronization. This CL removes
the support for the special code path for both functions.
Instead, cgo now automatically generates the malloc function as though
it were referenced as C.malloc. We need to automatically generate it
even if C.malloc is not called, even if malloc and size_t are not
declared, to support cgo-provided functions like C.CString.
Change-Id: I829854ec0787a80f33fa0a8a0dc2ee1d617830e2
Reviewed-on: https://go-review.googlesource.com/23260
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This makes GOEXPERIMENT=framepointer, GOOS=darwin, and buildmode=carchive coexist.
Change-Id: I9f6fb2f0f06f27df683e5b51f2fa55cd21872453
Reviewed-on: https://go-review.googlesource.com/23454
Reviewed-by: Austin Clements <austin@google.com>
Currently scanstack obtains its own gcWork from the P for the duration
of the stack scan and then, if called during mark termination,
disposes the gcWork.
However, this means that the number of workbufs allocated will be at
least the number of stacks scanned during mark termination, which may
be very high (especially during a STW GC). This happens because, in
steady state, each scanstack will obtain a fresh workbuf (either from
the empty list or by allocating it), fill it with the scan results,
and then dispose it to the full list. Nothing is consuming from the
full list during this (and hence nothing is recycling them to the
empty list), so the length of the full list by the time mark
termination starts draining it is at least the number of stacks
scanned.
Fix this by pushing the gcWork acquisition up the stack to either the
gcDrain that calls markroot that calls scanstack (which batches across
many stack scans and is the path taken during STW GC) or to newstack
(which is still a single scanstack call, but this is roughly bounded
by the number of Ps).
This fix reduces the workbuf allocation for the test program from
issue #15319 from 213 MB (roughly 2KB * 1e5 goroutines) to 10 MB.
Fixes#15319.
Note that there's potentially a similar issue in write barriers during
mark 2. Fixing that will be more difficult since there's no broader
non-preemptible context, but it should also be less of a problem since
the full list is being drained during mark 2.
Some overall improvements in the go1 benchmarks, plus the usual noise.
No significant change in the garbage benchmark (time/op or GC memory).
name old time/op new time/op delta
BinaryTree17-12 2.54s ± 1% 2.51s ± 1% -1.09% (p=0.000 n=20+19)
Fannkuch11-12 2.12s ± 0% 2.17s ± 0% +2.18% (p=0.000 n=19+18)
FmtFprintfEmpty-12 45.1ns ± 1% 45.2ns ± 0% ~ (p=0.078 n=19+18)
FmtFprintfString-12 127ns ± 0% 128ns ± 0% +1.08% (p=0.000 n=19+16)
FmtFprintfInt-12 125ns ± 0% 122ns ± 1% -2.71% (p=0.000 n=14+18)
FmtFprintfIntInt-12 196ns ± 0% 190ns ± 1% -2.91% (p=0.000 n=12+20)
FmtFprintfPrefixedInt-12 196ns ± 0% 194ns ± 1% -0.94% (p=0.000 n=13+18)
FmtFprintfFloat-12 253ns ± 1% 251ns ± 1% -0.86% (p=0.000 n=19+20)
FmtManyArgs-12 807ns ± 1% 784ns ± 1% -2.85% (p=0.000 n=20+20)
GobDecode-12 7.13ms ± 1% 7.12ms ± 1% ~ (p=0.351 n=19+20)
GobEncode-12 5.89ms ± 0% 5.95ms ± 0% +0.94% (p=0.000 n=19+19)
Gzip-12 219ms ± 1% 221ms ± 1% +1.35% (p=0.000 n=18+20)
Gunzip-12 37.5ms ± 1% 37.4ms ± 0% ~ (p=0.057 n=20+19)
HTTPClientServer-12 81.4µs ± 4% 81.9µs ± 3% ~ (p=0.118 n=17+18)
JSONEncode-12 15.7ms ± 1% 15.8ms ± 1% +0.73% (p=0.000 n=17+18)
JSONDecode-12 57.9ms ± 1% 57.2ms ± 1% -1.34% (p=0.000 n=19+19)
Mandelbrot200-12 4.12ms ± 1% 4.10ms ± 0% -0.33% (p=0.000 n=19+17)
GoParse-12 3.22ms ± 2% 3.25ms ± 1% +0.72% (p=0.000 n=18+20)
RegexpMatchEasy0_32-12 70.6ns ± 1% 71.1ns ± 2% +0.63% (p=0.005 n=19+20)
RegexpMatchEasy0_1K-12 240ns ± 0% 239ns ± 1% -0.59% (p=0.000 n=19+20)
RegexpMatchEasy1_32-12 71.3ns ± 1% 71.3ns ± 1% ~ (p=0.844 n=17+17)
RegexpMatchEasy1_1K-12 384ns ± 2% 371ns ± 1% -3.45% (p=0.000 n=19+20)
RegexpMatchMedium_32-12 109ns ± 1% 108ns ± 2% -0.48% (p=0.029 n=19+19)
RegexpMatchMedium_1K-12 34.3µs ± 1% 34.5µs ± 2% ~ (p=0.160 n=18+20)
RegexpMatchHard_32-12 1.79µs ± 9% 1.72µs ± 2% -3.83% (p=0.000 n=19+19)
RegexpMatchHard_1K-12 53.3µs ± 4% 51.8µs ± 1% -2.82% (p=0.000 n=19+20)
Revcomp-12 386ms ± 0% 388ms ± 0% +0.72% (p=0.000 n=17+20)
Template-12 62.9ms ± 1% 62.5ms ± 1% -0.57% (p=0.010 n=18+19)
TimeParse-12 325ns ± 0% 331ns ± 0% +1.84% (p=0.000 n=18+19)
TimeFormat-12 338ns ± 0% 343ns ± 0% +1.34% (p=0.000 n=18+20)
[Geo mean] 52.7µs 52.5µs -0.42%
Change-Id: Ib2d34736c4ae2ec329605b0fbc44636038d8d018
Reviewed-on: https://go-review.googlesource.com/23391
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Also mark it go:systemstack and explain why.
Change-Id: I88baf22741c04012ba2588d8e03dd3801d19b5c0
Reviewed-on: https://go-review.googlesource.com/23390
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Names for Append?Bytes are slightly changed in addition to adding a slash.
Change-Id: I0291aa29c693f9040fd01368eaad9766259677df
Reviewed-on: https://go-review.googlesource.com/23426
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Names of sub-benchmarks are preserved, short of the additional slash.
Change-Id: I9b3f82964f9a44b0d28724413320afd091ed3106
Reviewed-on: https://go-review.googlesource.com/23425
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Other GOARCHs already handle their callee-saved FP registers, but
arm was missing. Without this change, code using Cgo and floating
point code might fail in mysterious and hard to debug ways.
There are no floating point registers when GOARM=5, so skip the
registers when runtime.goarm < 6.
darwin/arm doesn't support GOARM=5, so the check is left out of
rt0_darwin_arm.s.
Fixes#14876
Change-Id: I6bcb90a76df3664d8ba1f33123a74b1eb2c9f8b2
Reviewed-on: https://go-review.googlesource.com/23140
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
When gentraceback starts on a system stack in sigprof, it is
configured to jump to the user stack when it reaches the end of the
system stack. Currently this updates the current frame's FP, but not
its SP. This is okay on non-LR machines (x86) because frame.sp is only
used to find defers, which the bottom-most frame of the user stack
will never have.
However, on LR machines, we use frame.sp to find the saved LR. We then
use to resolve the function of the next frame, which is used to
resolved the size of the next frame. Since we're not updating frame.sp
on a stack jump, we read the saved LR from the system stack instead of
the user stack and wind up resolving the wrong function and hence the
wrong frame size for the next frame.
This has had remarkably few ill effects (though the resulting profiles
must be wrong). We noticed it because of a bad interaction with stack
barriers. Specifically, once we get the next frame size wrong, we also
get the location of its LR wrong. If we happen to get a stack slot
that contains a stale stack barrier LR (for a stack barrier we already
hit) and hasn't been overwritten with something else as we re-grew the
stack, gentraceback will fail with a "found next stack barrier at ..."
error, pointing at the slot that it thinks is an LR, but isn't.
Fixes#15138.
Updates #15313 (might fix it).
Change-Id: I13cfa322b44c0c2f23ac2b3d03e12631e4a6406b
Reviewed-on: https://go-review.googlesource.com/23291
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently it's possible for user code to exploit the high scheduler
priority of the GC worker in conjunction with the runnext optimization
to elevate a user goroutine to high priority so it will always run
even if there are other runnable goroutines.
For example, if a goroutine is in a tight allocation loop, the
following can happen:
1. Goroutine 1 allocates, triggering a GC.
2. G 1 attempts an assist, but fails and blocks.
3. The scheduler runs the GC worker, since it is high priority.
Note that this also starts a new scheduler quantum.
4. The GC worker does enough work to satisfy the assist.
5. The GC worker readies G 1, putting it in runnext.
6. GC finishes and the scheduler runs G 1 from runnext, giving it
the rest of the GC worker's quantum.
7. Go to 1.
Even if there are other goroutines on the run queue, they never get a
chance to run in the above sequence. This requires a confluence of
circumstances that make it unlikely, though not impossible, that it
would happen in "real" code. In the test added by this commit, we
force this confluence by setting GOMAXPROCS to 1 and GOGC to 1 so it's
easy for the test to repeated trigger GC and wake from a blocked
assist.
We fix this by making GC always put user goroutines at the end of the
run queue, instead of in runnext. This makes it so user code can't
piggy-back on the GC's high priority to make a user goroutine act like
it has high priority. The only other situation where GC wakes user
goroutines is waking all blocked assists at the end, but this uses the
global run queue and hence doesn't have this problem.
Fixes#15706.
Change-Id: I1589dee4b7b7d0c9c8575ed3472226084dfce8bc
Reviewed-on: https://go-review.googlesource.com/23172
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently ready always puts the readied goroutine in runnext. We're
going to have to change this for some uses, so add a flag for whether
or not to use runnext.
For now we always pass true so this is a no-op change.
For #15706.
Change-Id: Iaa66d8355ccfe4bbe347570cc1b1878c70fa25df
Reviewed-on: https://go-review.googlesource.com/23171
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
OpenBSD 6.0 (due out November 2016) will support PT_TLS, which will
allow for the OpenBSD cgo pthread_create() workaround to be removed.
However, in order for Go to continue working on supported OpenBSD
releases (the current release and the previous release - 5.9 and 6.0,
once 6.0 is released), we cannot enable PT_TLS immediately. Instead,
adjust the existing code so that it works with the previous TCB
allocation and the new TIB allocation. This allows the same Go
runtime to work on 5.8, 5.9 and later 6.0.
Once OpenBSD 5.9 is no longer supported (May 2017, when 6.1 is
released), PT_TLS can be enabled and the additional cgo runtime
code removed.
Change-Id: I3eed5ec593d80eea78c6656cb12557004b2c0c9a
Reviewed-on: https://go-review.googlesource.com/23197
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The test case in #15639 somehow causes an invalid syscall frame. The
failure is obscured because the throw occurs when throwsplit == true,
which causes a "stack split at bad time" error when trying to print the
throw message.
This CL fixes the "stack split at bad time" by using systemstack. No
test because there shouldn't be any way to trigger this error anyhow.
Update #15639.
Change-Id: I4240f3fd01bdc3c112f3ffd1316b68504222d9e1
Reviewed-on: https://go-review.googlesource.com/23153
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
On some systems, gdb is set to: "startup-with-shell on". This
breaks runtime_test. This just make sure gdb does not start by
spawning a shell.
Fixes#15354
Change-Id: Ia040931c61dea22f4fdd79665ab9f84835ecaa70
Reviewed-on: https://go-review.googlesource.com/23142
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The signal might get delivered to a different thread, and that thread
might not run again before the currently running thread returns and
exits. Sleep to give the other thread time to pick up the signal and
crash.
Not tested for all cases, but, optimistically:
Fixes#14063.
Change-Id: Iff58669ac6185ad91cce85e0e86f17497a3659fd
Reviewed-on: https://go-review.googlesource.com/23203
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Mikio Hara <mikioh.mikioh@gmail.com>
racefini calls __tsan_fini which is C code and at the end of it
invoked the standard C library exit(3) call. This has undefined
behavior if invoked more than once. Specifically in C++ programs
it caused static destructors to run twice. At least on glibc
impls it also means the at_exit handlers list (where those are
stored) also free's a list entry when it completes these. So invoking
twice results in a double free at exit which trips debug memory
allocation tracking.
Fix all of this by using an atomic as a boolean barrier around
calls to racefini being invoked > 1 time.
Fixes#15578
Change-Id: I49222aa9b8ded77160931f46434c61a8379570fc
Reviewed-on: https://go-review.googlesource.com/22882
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Issue #15613 points out that the darwin builders have been getting
regular failures in which a process that should exit with a SIGPIPE
signal is instead exiting with exit status 2. The code calls
runtime.raise. On most systems runtime.raise is the equivalent of
pthread_kill(gettid(), sig); that is, it kills the thread with the
signal, which should ensure that the program does not keep going. On
darwin, however, runtime.raise is actually kill(getpid(), sig); that is,
it sends a signal to the entire process. If the process decides to
deliver the signal to a different thread, then it is possible that in
some cases the thread that calls raise is able to execute the next
system call before the signal is actually delivered. That would cause
the observed error.
I have not been able to recreate the problem myself, so I don't know
whether this actually fixes it. But, optimistically:
Fixed#15613.
Change-Id: I60c0a9912aae2f46143ca1388fd85e9c3fa9df1f
Reviewed-on: https://go-review.googlesource.com/23152
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This should help with debugging failures.
For #15138 and #15477.
Change-Id: I77db2b6375d8b4403d3edf5527899d076291e02c
Reviewed-on: https://go-review.googlesource.com/23134
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
The convention for writing something like "64 kB" is 64<<10, since
this is easier to read than 1<<16. Update gcBitsChunkBytes to follow
this convention.
Change-Id: I5b5a3f726dcf482051ba5b1814db247ff3b8bb2f
Reviewed-on: https://go-review.googlesource.com/23132
Reviewed-by: Rick Hudson <rlh@golang.org>
The 17-31 byte code is broken. Disabled it.
Added a bunch of tests to at least cover the cases
in indexShortStr. I'll channel Brad and wonder why
this CL ever got in without any tests.
Fixes#15679
Change-Id: I84a7b283a74107db865b9586c955dcf5f2d60161
Reviewed-on: https://go-review.googlesource.com/23106
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently the heapBitsSetType documentation says that there are no
races on the heap bitmap, but that isn't exactly true. There are no
*write-write* races, but there are read-write races. Expand the
documentation to explain this and why it's okay.
Change-Id: Ibd92b69bcd6524a40a9dd4ec82422b50831071ed
Reviewed-on: https://go-review.googlesource.com/23092
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently we only execute a publication barrier for scan objects (and
skip it for noscan objects). This used to be okay because GC would
never consult the object itself (so it wouldn't observe uninitialized
memory even if it found a pointer to a noscan object), and the heap
bitmap was pre-initialized to noscan.
However, now we explicitly initialize the heap bitmap for noscan
objects when we allocate them. While the GC will still never consult
the contents of a noscan object, it does need to see the initialized
heap bitmap. Hence, we need to execute a publication barrier to make
the bitmap visible before user code can expose a pointer to the newly
allocated object even for noscan objects.
Change-Id: Ie4133c638db0d9055b4f7a8061a634d970627153
Reviewed-on: https://go-review.googlesource.com/23043
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
In future releases of OpenBSD, the sigreturn syscall will no longer
exist. As such, stop using sigreturn on openbsd/386 and just return
from the signal trampoline (as we already do for openbsd/amd64 and
openbsd/arm).
Change-Id: Ic4de1795bbfbfb062a685832aea0d597988c6985
Reviewed-on: https://go-review.googlesource.com/23024
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Noticed and fix by Alex Brainman.
Tested in https://golang.org/cl/23005 (which makes all compiler
warnings fatal during development)
Fixes#15623
Change-Id: Ic19999fce8bb8640d963965cc328574efadd7855
Reviewed-on: https://go-review.googlesource.com/23010
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
The test sometimes fails on builders.
The test uses sleeps to establish the necessary goroutine
execution order. If sleeps undersleep/oversleep
the race is still reported, but it can be reported when the
main test goroutine returns. In such case test driver
can't match the race with the test and reports failure.
Wait for both test goroutines to ensure that the race
is reported in the test scope.
Fixes#15579
Change-Id: I0b9bec0ebfb0c127d83eb5325a7fe19ef9545050
Reviewed-on: https://go-review.googlesource.com/22951
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
In issue #13992, Russ mentioned that the heap bitmap footprint was
halved but that the bitmap size calculation hadn't been updated. This
presents the opportunity to either halve the bitmap size or double
the addressable virtual space. This CL doubles the addressable virtual
space. On 32 bit this can be tweaked further to allow the bitmap to
cover the entire 4GB virtual address space, removing a failure mode
if the kernel hands out memory with a too low address.
First, fix the calculation and double _MaxArena32 to cover 4GB virtual
memory space with the same bitmap size (256 MB).
Then, allow the fallback mode for the initial memory reservation
on 32 bit (or 64 bit with too little available virtual memory) to not
include space for the arena. mheap.sysAlloc will automatically reserve
additional space when the existing arena is full.
Finally, set arena_start to 0 in 32 bit mode, so that any address is
acceptable for subsequent (additional) reservations.
Before, the bitmap was always located just before arena_start, so
fix the two places relying on that assumption: Point the otherwise unused
mheap.bitmap to one byte after the end of the bitmap, and use it for
bitmap addressing instead of arena_start.
With arena_start set to 0 on 32 bit, the cgoInRange check is no longer a
sufficient check for Go pointers. Introduce and call inHeapOrStack to
check whether a pointer is to the Go heap or stack.
While we're here, remove sysReserveHigh which seems to be unused.
Fixes#13992
Change-Id: I592b513148a50b9d3967b5c5d94b86b3ec39acc2
Reviewed-on: https://go-review.googlesource.com/20471
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Change-Id: Ice9c234960adc7857c8370b777a0b18e29d59281
Reviewed-on: https://go-review.googlesource.com/22853
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
I meant to delete these in CL 22850, actually.
Change-Id: I0c286efd2b9f1caf0221aa88e3bcc03649c89517
Reviewed-on: https://go-review.googlesource.com/22851
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This can only happen when profiling and there is foreign code
at the top of the g0 stack but we're not in cgo.
That in turn only happens with the race detector.
Fixes#13568.
Change-Id: I23775132c9c1a3a3aaae191b318539f368adf25e
Reviewed-on: https://go-review.googlesource.com/18322
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
CL/19862 (f79b50b8d5) recently introduced the constants
SeekStart, SeekCurrent, and SeekEnd to the io package. We should use these constants
consistently throughout the code base.
Updates #15269
Change-Id: If7fcaca7676e4a51f588528f5ced28220d9639a2
Reviewed-on: https://go-review.googlesource.com/22097
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Joe Tsai <joetsai@digital-static.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Negative-case conversion code was wrong for minimum int32,
used negate-then-widen instead of widen-then-negate.
Test already exists; this fixes the failure.
Fixes#15563.
Change-Id: I4b0b3ae8f2c9714bdcc405d4d0b1502ccfba2b40
Reviewed-on: https://go-review.googlesource.com/22830
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
So we can start working on other architectures here.
Change is a dummy to keep git happy.
Change-Id: I1caa62a242790601810a1ff72af7ea9773d4da76
Reviewed-on: https://go-review.googlesource.com/22822
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Adds a small function signame that infers a signal name
from the signal table, otherwise will fallback to using
hex(sig) as previously. No signal table is present for
Windows hence it will always print the hex value.
Sample code and new result:
```go
package main
import (
"fmt"
"time"
)
func main() {
defer func() {
if err := recover(); err != nil {
fmt.Printf("err=%v\n", err)
}
}()
ticker := time.Tick(1e9)
for {
<-ticker
}
}
```
```shell
$ go run main.go &
$ kill -11 <pid>
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0xb01dfacedebac1e
pc=0xc71db]
...
```
Fixes#13969
Change-Id: Ie6be312eb766661f1cea9afec352b73270f27f9d
Reviewed-on: https://go-review.googlesource.com/22753
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The following performance improvements have been made to the
low-level atomic functions for ppc64le & ppc64:
- For those cases containing a lwarx and stwcx (or other sizes):
sync, lwarx, maybe something, stwcx, loop to sync, sync, isync
The sync is moved before (outside) the lwarx/stwcx loop, and the
sync after is removed, so it becomes:
sync, lwarx, maybe something, stwcx, loop to lwarx, isync
- For the Or8 and And8, the shifting and manipulation of the
address to the word aligned version were removed and the
instructions were changed to use lbarx, stbcx instead of
register shifting, xor, then lwarx, stwcx.
- New instructions LWSYNC, LBAR, STBCC were tested and added.
runtime/atomic_ppc64x.s was changed to use the LWSYNC opcode
instead of the WORD encoding.
Fixes#15469
Ran some of the benchmarks in the runtime and sync directories.
Some results varied from run to run but the trend was improvement
based on best times for base and new:
runtime.test:
BenchmarkChanNonblocking-128 0.88 0.89 +1.14%
BenchmarkChanUncontended-128 569 511 -10.19%
BenchmarkChanContended-128 63110 53231 -15.65%
BenchmarkChanSync-128 691 598 -13.46%
BenchmarkChanSyncWork-128 11355 11649 +2.59%
BenchmarkChanProdCons0-128 2402 2090 -12.99%
BenchmarkChanProdCons10-128 1348 1363 +1.11%
BenchmarkChanProdCons100-128 1002 746 -25.55%
BenchmarkChanProdConsWork0-128 2554 2720 +6.50%
BenchmarkChanProdConsWork10-128 1909 1804 -5.50%
BenchmarkChanProdConsWork100-128 1624 1580 -2.71%
BenchmarkChanCreation-128 237 212 -10.55%
BenchmarkChanSem-128 705 667 -5.39%
BenchmarkChanPopular-128 5081190 4497566 -11.49%
BenchmarkCreateGoroutines-128 532 473 -11.09%
BenchmarkCreateGoroutinesParallel-128 35.0 34.7 -0.86%
BenchmarkCreateGoroutinesCapture-128 4923 4200 -14.69%
sync.test:
BenchmarkUncontendedSemaphore-128 112 94.2 -15.89%
BenchmarkContendedSemaphore-128 133 128 -3.76%
BenchmarkMutexUncontended-128 1.90 1.67 -12.11%
BenchmarkMutex-128 353 310 -12.18%
BenchmarkMutexSlack-128 304 283 -6.91%
BenchmarkMutexWork-128 554 541 -2.35%
BenchmarkMutexWorkSlack-128 567 556 -1.94%
BenchmarkMutexNoSpin-128 275 242 -12.00%
BenchmarkMutexSpin-128 1129 1030 -8.77%
BenchmarkOnce-128 1.08 0.96 -11.11%
BenchmarkPool-128 29.8 27.4 -8.05%
BenchmarkPoolOverflow-128 40564 36583 -9.81%
BenchmarkSemaUncontended-128 3.14 2.63 -16.24%
BenchmarkSemaSyntNonblock-128 1087 1069 -1.66%
BenchmarkSemaSyntBlock-128 897 893 -0.45%
BenchmarkSemaWorkNonblock-128 1034 1028 -0.58%
BenchmarkSemaWorkBlock-128 949 886 -6.64%
Change-Id: I4403fb29d3cd5254b7b1ce87a216bd11b391079e
Reviewed-on: https://go-review.googlesource.com/22549
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Minux Ma <minux@golang.org>
Builder is too slow. This test passed on builder machines but took
15+ min.
Change-Id: Ief9d67ea47671a57e954e402751043bc1ce09451
Reviewed-on: https://go-review.googlesource.com/22798
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Minux Ma <minux@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Since tracebackctxt.go uses //export functions, the C functions can't be
externally visible in the C comment. The code was using attributes to
work around that, but that failed on Windows.
Change-Id: If4449fd8209a8998b4f6855ea89e5db1471b2981
Reviewed-on: https://go-review.googlesource.com/22786
Reviewed-by: Minux Ma <minux@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
If we collected a cgo traceback when entering the SIGPROF signal
handler, record it as part of the profiling stack trace.
This serves as the promised test for https://golang.org/cl/21055 .
Change-Id: I5f60cd6cea1d9b7c3932211483a6bfab60ed21d2
Reviewed-on: https://go-review.googlesource.com/22650
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Race runtime also needs local malloc caches and currently uses
a mix of per-OS-thread and per-goroutine caches. This leads to
increased memory consumption. But more importantly cache of
synchronization objects is per-goroutine and we don't always
have goroutine context when feeing memory in GC. As the result
synchronization object descriptors leak (more precisely, they
can be reused if another synchronization object is recreated
at the same address, but it does not always help). For example,
the added BenchmarkSyncLeak has effectively runaway memory
consumption (based on a real long running server).
This change updates race runtime with support for per-P contexts.
BenchmarkSyncLeak now stabilizes at ~1GB memory consumption.
Long term, this will allow us to remove race runtime dependency
on glibc (as malloc is the main cornerstone).
I've also implemented a different scheme to pass P context to
race runtime: scheduler notified race runtime about association
between G and P by calling procwire(g, p)/procunwire(g, p).
But it turned out to be very messy as we have lots of places
where the association changes (e.g. syscalls). So I dropped it
in favor of the current scheme: race runtime asks scheduler
about the current P.
Fixes#14533
Change-Id: Iad10d2f816a44affae1b9fed446b3580eafd8c69
Reviewed-on: https://go-review.googlesource.com/19970
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Runqempty is a critical predicate for scheduler. If runqempty spuriously
returns true, then scheduler can fail to schedule arbitrary number of
runnable goroutines on idle Ps for arbitrary long time. With the addition
of runnext runqempty predicate become broken (can spuriously return true).
Consider that runnext is not nil and the main array is empty. Runqempty
observes that the array is empty, then it is descheduled for some time.
Then queue owner pushes another element to the queue evicting runnext
into the array. Then queue owner pops runnext. Then runqempty resumes
and observes runnext is nil and returns true. But there were no point
in time when the queue was empty.
Fix runqempty predicate to not return true spuriously.
Change-Id: Ifb7d75a699101f3ff753c4ce7c983cf08befd31e
Reviewed-on: https://go-review.googlesource.com/20858
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This updates some comments that became out of date when we moved the
mark bit out of the heap bitmap and started using the high bit for the
first word as a scan/dead bit.
Change-Id: I4a572d16db6114cadff006825466c1f18359f2db
Reviewed-on: https://go-review.googlesource.com/22662
Reviewed-by: Rick Hudson <rlh@golang.org>
MIPS N64 ABI passes arguments in registers R4-R11, return value in R2.
R16-R23, R28, R30 and F24-F31 are callee-save. gcc PIC code expects
to be called with indirect call through R25.
Change-Id: I24f582b4b58e1891ba9fd606509990f95cca8051
Reviewed-on: https://go-review.googlesource.com/19805
Reviewed-by: Minux Ma <minux@golang.org>
SB register (R28) is introduced for access external addresses with shorter
instruction sequences. It is loaded at entry points. External data within
2G of SB can be accessed this way.
cmd/internal/obj: relocaltion R_ADDRMIPS is split into two relocations
R_ADDRMIPS and R_ADDRMIPSU, handling the low 16 bits and the "upper" 16
bits of external addresses, respectively, since the instructios may not
be adjacent. It might be better if relocation Variant could be used.
cmd/link/internal/mips64: support new relocations.
cmd/compile/internal/mips64: reserve SB register.
runtime: initialize SB register at entry points.
Change-Id: I5f34868f88c5a9698c042a8a1f12f76806c187b9
Reviewed-on: https://go-review.googlesource.com/19802
Reviewed-by: Minux Ma <minux@golang.org>
Leave R28 to SB register, which will be introduced in CL 19802.
Change-Id: I1cf7a789695c5de664267ec8086bfb0b043ebc14
Reviewed-on: https://go-review.googlesource.com/19863
Reviewed-by: Minux Ma <minux@golang.org>
With the switch to separate mark bitmaps, the scan/dead bit for the
first word of each object is now unused. Reclaim this bit and use it
as a scan/dead bit, just like words three and on. The second word is
still used for checkmark.
This dramatically simplifies heapBitsSetTypeNoScan and hasPointers,
since they no longer need different cases for 1, 2, and 3+ word
objects. They can instead just manipulate the heap bitmap for the
first word and be done with it.
In order to enable this, we change heapBitsSetType and runGCProg to
always set the scan/dead bit to scan for the first word on every code
path. Since these functions only apply to types that have pointers,
there's no need to do this conditionally: it's *always* necessary to
set the scan bit in the first word.
We also change every place that scans an object and checks if there
are more pointers. Rather than only checking morePointers if the word
is >= 2, we now check morePointers if word != 1 (since that's the
checkmark word).
Looking forward, we should probably reclaim the checkmark bit, too,
but that's going to be quite a bit more work.
Tested by setting doubleCheck in heapBitsSetType and running all.bash
on both linux/amd64 and linux/386, and by running GOGC=10 all.bash.
This particularly improves the FmtFprintf* go1 benchmarks, since they
do a large amount of noscan allocation.
name old time/op new time/op delta
BinaryTree17-12 2.34s ± 1% 2.38s ± 1% +1.70% (p=0.000 n=17+19)
Fannkuch11-12 2.09s ± 0% 2.09s ± 1% ~ (p=0.276 n=17+16)
FmtFprintfEmpty-12 44.9ns ± 2% 44.8ns ± 2% ~ (p=0.340 n=19+18)
FmtFprintfString-12 127ns ± 0% 125ns ± 0% -1.57% (p=0.000 n=16+15)
FmtFprintfInt-12 128ns ± 0% 122ns ± 1% -4.45% (p=0.000 n=15+20)
FmtFprintfIntInt-12 207ns ± 1% 193ns ± 0% -6.55% (p=0.000 n=19+14)
FmtFprintfPrefixedInt-12 197ns ± 1% 191ns ± 0% -2.93% (p=0.000 n=17+18)
FmtFprintfFloat-12 263ns ± 0% 248ns ± 1% -5.88% (p=0.000 n=15+19)
FmtManyArgs-12 794ns ± 0% 779ns ± 1% -1.90% (p=0.000 n=18+18)
GobDecode-12 7.14ms ± 2% 7.11ms ± 1% ~ (p=0.072 n=20+20)
GobEncode-12 5.85ms ± 1% 5.82ms ± 1% -0.49% (p=0.000 n=20+20)
Gzip-12 218ms ± 1% 215ms ± 1% -1.22% (p=0.000 n=19+19)
Gunzip-12 36.8ms ± 0% 36.7ms ± 0% -0.18% (p=0.006 n=18+20)
HTTPClientServer-12 77.1µs ± 4% 77.1µs ± 3% ~ (p=0.945 n=19+20)
JSONEncode-12 15.6ms ± 1% 15.9ms ± 1% +1.68% (p=0.000 n=18+20)
JSONDecode-12 55.2ms ± 1% 53.6ms ± 1% -2.93% (p=0.000 n=17+19)
Mandelbrot200-12 4.05ms ± 1% 4.05ms ± 0% ~ (p=0.306 n=17+17)
GoParse-12 3.14ms ± 1% 3.10ms ± 1% -1.31% (p=0.000 n=19+18)
RegexpMatchEasy0_32-12 69.3ns ± 1% 70.0ns ± 0% +0.89% (p=0.000 n=19+17)
RegexpMatchEasy0_1K-12 237ns ± 1% 236ns ± 0% -0.62% (p=0.000 n=19+16)
RegexpMatchEasy1_32-12 69.5ns ± 1% 70.3ns ± 1% +1.14% (p=0.000 n=18+17)
RegexpMatchEasy1_1K-12 377ns ± 1% 366ns ± 1% -3.03% (p=0.000 n=15+19)
RegexpMatchMedium_32-12 107ns ± 1% 107ns ± 2% ~ (p=0.318 n=20+19)
RegexpMatchMedium_1K-12 33.8µs ± 3% 33.5µs ± 1% -1.04% (p=0.001 n=20+19)
RegexpMatchHard_32-12 1.68µs ± 1% 1.73µs ± 0% +2.50% (p=0.000 n=20+18)
RegexpMatchHard_1K-12 50.8µs ± 1% 52.0µs ± 1% +2.50% (p=0.000 n=19+18)
Revcomp-12 381ms ± 1% 385ms ± 1% +1.00% (p=0.000 n=17+18)
Template-12 64.9ms ± 3% 62.6ms ± 1% -3.55% (p=0.000 n=19+18)
TimeParse-12 324ns ± 0% 328ns ± 1% +1.25% (p=0.000 n=18+18)
TimeFormat-12 345ns ± 0% 334ns ± 0% -3.31% (p=0.000 n=15+17)
[Geo mean] 52.1µs 51.5µs -1.00%
Change-Id: I13e74da3193a7f80794c654f944d1f0d60817049
Reviewed-on: https://go-review.googlesource.com/22632
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This makes this code better self-documenting and makes it easier to
find these places in the future.
Change-Id: I31dc5598ae67f937fb9ef26df92fd41d01e983c3
Reviewed-on: https://go-review.googlesource.com/22631
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
heapBits.bits is carefully written to produce good machine code. Use
it in heapBits.morePointers and heapBits.isPointer to get good machine
code there, too.
Change-Id: I208c7d0d38697e7a22cad67f692162589b75f1e2
Reviewed-on: https://go-review.googlesource.com/22630
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Fix issues introduced in 5f9a870.
Change-Id: Ia75945ef563956613bf88bbe57800a96455c265d
Reviewed-on: https://go-review.googlesource.com/22661
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Add support for the context function set by runtime.SetCgoTraceback.
The context function was added in CL 17761, without support.
This CL is the support.
This CL has not been tested for real C code, as a working context
function for C code requires unwind support that does not seem to exist.
I wanted to get the CL out before the freeze.
I apologize for the length of this CL. It's mostly plumbing, but
unfortunately the plumbing is processor-specific.
Change-Id: I8ce11a0de9b3dafcc29efd2649d776e93bff0e90
Reviewed-on: https://go-review.googlesource.com/22508
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This commit moves the GC from free list allocation to
bit mark allocation. Instead of using the bitmaps
generated during the mark phases to generate free
list and then using the free lists for allocation we
allocate directly from the bitmaps.
The change in the garbage benchmark
name old time/op new time/op delta
XBenchGarbage-12 2.22ms ± 1% 2.13ms ± 1% -3.90% (p=0.000 n=18+18)
Change-Id: I17f57233336f0ca5ef5404c3be4ecb443ab622aa
nextFreeFast is currently not inlined by the compiler due
to its size and complexity. This CL simplifies
nextFreeFast by letting the slow path handle (nextFree)
handle a corner cases.
Change-Id: Ia9c5d1a7912bcb4bec072f5fd240f0e0bafb20e4
Reviewed-on: https://go-review.googlesource.com/22598
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
sweep used to skip mcental.freeSpan (and its locking) if it didn't
find any new free objects. We lost that optimization when the
freed-object counting changed in dad83f7 to count total free objects
instead of newly freed objects.
The previous commit brings back counting of newly freed objects, so we
can easily revive this optimization by checking that count (like we
used to) instead of the total free objects count.
Change-Id: I43658707a1c61674d0366124d5976b00d98741a9
Reviewed-on: https://go-review.googlesource.com/22596
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Commit 8dda1c4 changed the meaning of "nfree" in sweep from the number
of newly freed objects to the total number of free objects in the
span, but didn't update where sweep added nfree to c.local_nsmallfree.
Hence, we're over-accounting the number of frees. This is causing
TestArrayHash to fail with "too many allocs NNN - hash not balanced".
Fix this by computing the number of newly freed objects and adding
that to c.local_nsmallfree, so it behaves like it used to. Computing
this requires a small tweak to mallocgc: apparently we've never set
s.allocCount when allocating a large object; fix this by setting it to
1 so sweep doesn't get confused.
Change-Id: I31902ffd310110da4ffd807c5c06f1117b872dc8
Reviewed-on: https://go-review.googlesource.com/22595
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
We broke tracing of freed objects in GODEBUG=allocfreetrace=1 mode
when we removed the sweep over the mark bitmap. Fix it by
re-introducing the sweep over the bitmap specifically if we're in
allocfreetrace mode. This doesn't have to be even remotely efficient,
since the overhead of allocfreetrace is huge anyway, so we can keep
the code for this down to just a few lines.
Change-Id: I9e176b3b04c73608a0ea3068d5d0cd30760ebd40
Reviewed-on: https://go-review.googlesource.com/22592
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently we always zero objects when we allocate them. We used to
have an optimization that would not zero objects that had not been
allocated since the whole span was last zeroed (either by getting it
from the system or by getting it from the heap, which does a bulk
zero), but this depended on the sweeper clobbering the first two words
of each object. Hence, we lost this optimization when the bitmap
sweeper went away.
Re-introduce this optimization using a different mechanism. Each span
already keeps a flag indicating that it just came from the OS or was
just bulk zeroed by the mheap. We can simply use this flag to know
when we don't need to zero an object. This is slightly less efficient
than the old optimization: if a span gets allocated and partially
used, then GC happens and the span gets returned to the mcentral, then
the span gets re-acquired, the old optimization knew that it only had
to re-zero the objects that had been reclaimed, whereas this
optimization will re-zero everything. However, in this case, you're
already paying for the garbage collection, and you've only wasted one
zeroing of the span, so in practice there seems to be little
difference. (If we did want to revive the full optimization, each span
could keep track of a frontier beyond which all free slots are zeroed.
I prototyped this and it didn't obvious do any better than the much
simpler approach in this commit.)
This significantly improves BinaryTree17, which is allocation-heavy
(and runs first, so most pages are already zeroed), and slightly
improves everything else.
name old time/op new time/op delta
XBenchGarbage-12 2.15ms ± 1% 2.14ms ± 1% -0.80% (p=0.000 n=17+17)
name old time/op new time/op delta
BinaryTree17-12 2.71s ± 1% 2.56s ± 1% -5.73% (p=0.000 n=18+19)
DivconstI64-12 1.70ns ± 1% 1.70ns ± 1% ~ (p=0.562 n=18+18)
DivconstU64-12 1.74ns ± 2% 1.74ns ± 1% ~ (p=0.394 n=20+20)
DivconstI32-12 1.74ns ± 0% 1.74ns ± 0% ~ (all samples are equal)
DivconstU32-12 1.66ns ± 1% 1.66ns ± 0% ~ (p=0.516 n=15+16)
DivconstI16-12 1.84ns ± 0% 1.84ns ± 0% ~ (all samples are equal)
DivconstU16-12 1.82ns ± 0% 1.82ns ± 0% ~ (all samples are equal)
DivconstI8-12 1.79ns ± 0% 1.79ns ± 0% ~ (all samples are equal)
DivconstU8-12 1.60ns ± 0% 1.60ns ± 1% ~ (p=0.603 n=17+19)
Fannkuch11-12 2.11s ± 1% 2.11s ± 0% ~ (p=0.333 n=16+19)
FmtFprintfEmpty-12 45.1ns ± 4% 45.4ns ± 5% ~ (p=0.111 n=20+20)
FmtFprintfString-12 134ns ± 0% 129ns ± 0% -3.45% (p=0.000 n=18+16)
FmtFprintfInt-12 131ns ± 1% 129ns ± 1% -1.54% (p=0.000 n=16+18)
FmtFprintfIntInt-12 205ns ± 2% 203ns ± 0% -0.56% (p=0.014 n=20+18)
FmtFprintfPrefixedInt-12 200ns ± 2% 197ns ± 1% -1.48% (p=0.000 n=20+18)
FmtFprintfFloat-12 256ns ± 1% 256ns ± 0% -0.21% (p=0.008 n=18+20)
FmtManyArgs-12 805ns ± 0% 804ns ± 0% -0.19% (p=0.001 n=18+18)
GobDecode-12 7.21ms ± 1% 7.14ms ± 1% -0.92% (p=0.000 n=19+20)
GobEncode-12 5.88ms ± 1% 5.88ms ± 1% ~ (p=0.641 n=18+19)
Gzip-12 218ms ± 1% 218ms ± 1% ~ (p=0.271 n=19+18)
Gunzip-12 37.1ms ± 0% 36.9ms ± 0% -0.29% (p=0.000 n=18+17)
HTTPClientServer-12 78.1µs ± 2% 77.4µs ± 2% ~ (p=0.070 n=19+19)
JSONEncode-12 15.5ms ± 1% 15.5ms ± 0% ~ (p=0.063 n=20+18)
JSONDecode-12 56.1ms ± 0% 55.4ms ± 1% -1.18% (p=0.000 n=19+18)
Mandelbrot200-12 4.05ms ± 0% 4.06ms ± 0% +0.29% (p=0.001 n=18+18)
GoParse-12 3.28ms ± 1% 3.21ms ± 1% -2.30% (p=0.000 n=20+20)
RegexpMatchEasy0_32-12 69.4ns ± 2% 69.3ns ± 1% ~ (p=0.205 n=18+16)
RegexpMatchEasy0_1K-12 239ns ± 0% 239ns ± 0% ~ (all samples are equal)
RegexpMatchEasy1_32-12 69.4ns ± 1% 69.4ns ± 1% ~ (p=0.620 n=15+18)
RegexpMatchEasy1_1K-12 370ns ± 1% 369ns ± 2% ~ (p=0.088 n=20+20)
RegexpMatchMedium_32-12 108ns ± 0% 108ns ± 0% ~ (all samples are equal)
RegexpMatchMedium_1K-12 33.6µs ± 3% 33.5µs ± 3% ~ (p=0.718 n=20+20)
RegexpMatchHard_32-12 1.68µs ± 1% 1.67µs ± 2% ~ (p=0.316 n=20+20)
RegexpMatchHard_1K-12 50.5µs ± 3% 50.4µs ± 3% ~ (p=0.659 n=20+20)
Revcomp-12 381ms ± 1% 381ms ± 1% ~ (p=0.916 n=19+18)
Template-12 66.5ms ± 1% 65.8ms ± 2% -1.08% (p=0.000 n=20+20)
TimeParse-12 317ns ± 0% 319ns ± 0% +0.48% (p=0.000 n=19+12)
TimeFormat-12 338ns ± 0% 338ns ± 0% ~ (p=0.124 n=19+18)
[Geo mean] 5.99µs 5.96µs -0.54%
Change-Id: I638ffd9d9f178835bbfa499bac20bd7224f1a907
Reviewed-on: https://go-review.googlesource.com/22591
Reviewed-by: Rick Hudson <rlh@golang.org>
This converts all remaining uses of mspan.start to instead use
mspan.base(). In many cases, this actually reduces the complexity of
the code.
Change-Id: If113840e00d3345a6cf979637f6a152e6344aee7
Reviewed-on: https://go-review.googlesource.com/22590
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Currently we have lots of (s.start << _PageShift) and variants. We now
have an s.base() function that returns this. It's faster and more
readable, so use it.
Change-Id: I888060a9dae15ea75ca8cc1c2b31c905e71b452b
Reviewed-on: https://go-review.googlesource.com/22559
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
These used to be used for the list of newly freed objects, but that's
no longer a thing.
Change-Id: I5a4503137b74ec0eae5372ca271b1aa0b32df074
Reviewed-on: https://go-review.googlesource.com/22557
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Our compilers now provides instrinsics including
sys.Ctz64 that support CTZ (count trailing zero)
instructions. This CL replaces the Go versions
of CTZ with the compiler intrinsic.
Count trailing zeros CTZ finds the least
significant 1 in a word and returns the number
of less significant 0s in the word.
Allocation uses the bitmap created by the garbage
collector to locate an unmarked object. The logic
takes a word of the bitmap, complements, and then
caches it. It then uses CTZ to locate an available
unmarked object. It then shifts marked bits out of
the bitmap word preparing it for the next search.
Once all the unmarked objects are used in the
cached work the bitmap gets another word and
repeats the process.
Change-Id: Id2fc42d1d4b9893efaa2e1bd01896985b7e42f82
Reviewed-on: https://go-review.googlesource.com/21366
Reviewed-by: Austin Clements <austin@google.com>
Two changes are included here that are dependent on the other.
The first is that allocBits and gcamrkBits are changed to
a *uint8 which points to the first byte of that span's
mark and alloc bits. Several places were altered to
perform pointer arithmetic to locate the byte corresponding
to an object in the span. The actual bit corresponding
to an object is indexed in the byte by using the lower three
bits of the objects index.
The second change avoids the redundant calculation of an
object's index. The index is returned from heapBitsForObject
and then used by the functions indexing allocBits
and gcmarkBits.
Finally we no longer allocate the gc bits in the span
structures. Instead we use an arena based allocation scheme
that allows for a more compact bit map as well as recycling
and bulk clearing of the mark bits.
Change-Id: If4d04b2021c092ec39a4caef5937a8182c64dfef
Reviewed-on: https://go-review.googlesource.com/20705
Reviewed-by: Austin Clements <austin@google.com>
The _SigUnblock flag was appended to SIGSYS slot of runtime signal table
for Linux in https://go-review.googlesource.com/22202, but there is
still no concrete opinion on whether SIGSYS must be an unblocked signal
for runtime.
This change removes _SigUnblock flag from SIGSYS on Linux for
consistency in runtime signal handling and adds a reference to #15204 to
runtime signal table for FreeBSD.
Updates #15204.
Change-Id: I42992b1d852c2ab5dd37d6dbb481dba46929f665
Reviewed-on: https://go-review.googlesource.com/22537
Reviewed-by: Ian Lance Taylor <iant@golang.org>