Commit Graph

229 Commits

Author SHA1 Message Date
Cherry Zhang a413908dd0 all: add GOOS=ios
Introduce GOOS=ios for iOS systems. GOOS=ios matches "darwin"
build tag, like GOOS=android matches "linux" and GOOS=illumos
matches "solaris". Only ios/arm64 is supported (ios/amd64 is
not).

GOOS=ios and GOOS=darwin remain essentially the same at this
point. They will diverge at later time, to differentiate macOS
and iOS.

Uses of GOOS=="darwin" are changed to (GOOS=="darwin" || GOOS=="ios"),
except if it clearly means macOS (e.g. GOOS=="darwin" && GOARCH=="amd64"),
it remains GOOS=="darwin".

Updates #38485.

Change-Id: I4faacdc1008f42434599efb3c3ad90763a83b67c
Reviewed-on: https://go-review.googlesource.com/c/go/+/254740
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-09-23 18:12:59 +00:00
Michael Anthony Knyszek e6d0bd2b89 runtime: clean up old mcentral code
This change deletes the old mcentral implementation from the code base
and the newMCentralImpl feature flag along with it.

Updates #37487.

Change-Id: Ibca8f722665f0865051f649ffe699cbdbfdcfcf2
Reviewed-on: https://go-review.googlesource.com/c/go/+/221184
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-08-17 20:06:49 +00:00
Keith Randall 12c01f7698 runtime: ensure arenaBaseOffset makes it into DWARF (for viewcore)
This constant does not make it into DWARF because it is an ideal
constant larger than maxint (1<<63-1). DWARF has no way to represent
signed values that large. Define a different typed constant that
is unsigned and so can represent this constant properly.

Viewcore needs this constant to interrogate the heap data structures.
In addition, the sign of arenaBaseOffset changed in 1.15, and providing
a new name lets viewcore detect the sign change easily.

Change-Id: I4274a2f6e79ebbf1411e85d64758fac1672fb96b
Reviewed-on: https://go-review.googlesource.com/c/go/+/240198
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2020-06-27 02:49:01 +00:00
Austin Clements 886caba73c runtime: always mark span when marking an object
The page sweeper depends on spans being marked if any object in the
span is marked, but currently only greyobject does this.
gcmarknewobject and wbBufFlush1 also mark objects, but neither set
span marks. As a result, if there are live objects on a span, but
they're all marked via allocation or write barriers, then the span
itself won't be marked and the page reclaimer will free the span,
ultimately leading to memory corruption when the memory for those live
allocations gets reused.

Fix this by making gcmarknewobject and wbBufFlush1 also mark pages.

No test because I have no idea how to reliably (or even unreliably)
trigger this.

Fixes #39432.

Performance is a wash or very slightly worse. I benchmarked the
gcmarknewobject and wbBufFlush1 changes independently and both showed
a slight performance improvement, so I'm going to call this noise.

name                                old time/op  new time/op  delta
BiogoIgor                            15.9s ± 2%   15.9s ± 2%    ~     (p=0.758 n=25+25)
BiogoKrishna                         15.7s ± 3%   15.7s ± 3%    ~     (p=0.382 n=21+21)
BleveIndexBatch100                   4.94s ± 3%   5.07s ± 4%  +2.63%  (p=0.000 n=25+25)
CompileTemplate                      204ms ± 1%   205ms ± 1%  +0.43%  (p=0.000 n=21+23)
CompileUnicode                      77.8ms ± 1%  78.1ms ± 1%    ~     (p=0.130 n=23+23)
CompileGoTypes                       731ms ± 1%   733ms ± 1%  +0.30%  (p=0.006 n=22+22)
CompileCompiler                      3.64s ± 2%   3.65s ± 3%    ~     (p=0.179 n=24+25)
CompileSSA                           8.44s ± 1%   8.46s ± 1%  +0.30%  (p=0.003 n=22+23)
CompileFlate                         132ms ± 1%   133ms ± 1%    ~     (p=0.098 n=22+22)
CompileGoParser                      164ms ± 1%   164ms ± 1%  +0.37%  (p=0.000 n=21+23)
CompileReflect                       455ms ± 1%   457ms ± 2%  +0.50%  (p=0.002 n=20+22)
CompileTar                           182ms ± 2%   182ms ± 1%    ~     (p=0.382 n=22+22)
CompileXML                           245ms ± 3%   245ms ± 1%    ~     (p=0.070 n=21+23)
CompileStdCmd                        16.5s ± 2%   16.5s ± 3%    ~     (p=0.486 n=23+23)
FoglemanFauxGLRenderRotateBoat       12.9s ± 1%   13.0s ± 1%  +0.97%  (p=0.000 n=21+24)
FoglemanPathTraceRenderGopherIter1   18.6s ± 1%   18.7s ± 0%    ~     (p=0.083 n=23+24)
GopherLuaKNucleotide                 28.4s ± 1%   29.3s ± 1%  +2.84%  (p=0.000 n=25+25)
MarkdownRenderXHTML                  252ms ± 0%   251ms ± 1%  -0.50%  (p=0.000 n=23+24)
Tile38WithinCircle100kmRequest       516µs ± 2%   516µs ± 2%    ~     (p=0.763 n=24+25)
Tile38IntersectsCircle100kmRequest   689µs ± 2%   689µs ± 2%    ~     (p=0.617 n=24+24)
Tile38KNearestLimit100Request        608µs ± 1%   606µs ± 2%  -0.35%  (p=0.030 n=19+22)
[Geo mean]                           522ms        524ms       +0.41%

https://perf.golang.org/search?q=upload:20200606.4

Change-Id: I8b331f310dbfaba0468035f207467c8403005bf5
Reviewed-on: https://go-review.googlesource.com/c/go/+/236817
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2020-06-08 17:09:33 +00:00
Michael Anthony Knyszek 796786cd0c runtime: make maxOffAddr reflect the actual address space upper bound
Currently maxOffAddr is defined in terms of the whole 64-bit address
space, assuming that it's all supported, by using ^uintptr(0) as the
maximal address in the offset space. In reality, the maximal address in
the offset space is (1<<heapAddrBits)-1 because we don't have more than
that actually available to us on a given platform.

On most platforms this is fine, because arenaBaseOffset is just
connecting two segments of address space, but on AIX we use it as an
actual offset for the starting address of the available address space,
which is limited. This means using ^uintptr(0) as the maximal address in
the offset address space causes wrap-around, especially when we just
want to represent a range approximately like [addr, infinity), which
today we do by using maxOffAddr.

To fix this, we define maxOffAddr more appropriately, in terms of
(1<<heapAddrBits)-1.

This change also redefines arenaBaseOffset to not be the negation of the
virtual address corresponding to address zero in the virtual address
space, but instead directly as the virtual address corresponding to
zero. This matches the existing documentation more closely and makes the
logic around arenaBaseOffset decidedly simpler, especially when trying
to reason about its use on AIX.

Fixes #38966.

Change-Id: I1336e5036a39de846f64cc2d253e8536dee57611
Reviewed-on: https://go-review.googlesource.com/c/go/+/233497
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-05-14 16:20:19 +00:00
Michael Anthony Knyszek 2e455ec2eb runtime: avoid overflow from linearAlloc
Currently linearAlloc manages an exclusive "end" address for the top of
its reserved space. While unlikely for a linearAlloc to be allocated
with an "end" address hitting the top of the address space, it is
possible and could lead to overflow.

Avoid overflow by chopping off the last byte from the linearAlloc if
it's bumping up against the top of the address space defensively. In
practice, this means that if 32-bit platforms map the top of the address
space and use the linearAlloc to acquire arenas, the top arena will not
be usable.

Fixes #35954.

Change-Id: I512cddcd34fd1ab15cb6ca92bbf899fc1ef22ff6
Reviewed-on: https://go-review.googlesource.com/c/go/+/231338
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-05-07 21:40:09 +00:00
Michael Anthony Knyszek a13691966a runtime: add new mcentral implementation
Currently mcentral is implemented as a couple of linked lists of spans
protected by a lock. Unfortunately this design leads to significant lock
contention.

The span ownership model is also confusing and complicated. In-use spans
jump between being owned by multiple sources, generally some combination
of a gcSweepBuf, a concurrent sweeper, an mcentral or an mcache.

So first to address contention, this change replaces those linked lists
with gcSweepBufs which have an atomic fast path. Then, we change up the
ownership model: a span may be simultaneously owned only by an mcentral
and the page reclaimer. Otherwise, an mcentral (which now consists of
sweep bufs), a sweeper, or an mcache are the sole owners of a span at
any given time. This dramatically simplifies reasoning about span
ownership in the runtime.

As a result of this new ownership model, sweeping is now driven by
walking over the mcentrals rather than having its own global list of
spans. Because we no longer have a global list and we traditionally
haven't used the mcentrals for large object spans, we no longer have
anywhere to put large objects. So, this change also makes it so that we
keep large object spans in the appropriate mcentral lists.

In terms of the static lock ranking, we add the spanSet spine locks in
pretty much the same place as the mcentral locks, since they have the
potential to be manipulated both on the allocation and sweep paths, like
the mcentral locks.

This new implementation is turned on by default via a feature flag
called go115NewMCentralImpl.

Benchmark results for 1 KiB allocation throughput (5 runs each):

name \ MiB/s  go113       go114       gotip       gotip+this-patch
AllocKiB-1    1.71k ± 1%  1.68k ± 1%  1.59k ± 2%      1.71k ± 1%
AllocKiB-2    2.46k ± 1%  2.51k ± 1%  2.54k ± 1%      2.93k ± 1%
AllocKiB-4    4.27k ± 1%  4.41k ± 2%  4.33k ± 1%      5.01k ± 2%
AllocKiB-8    4.38k ± 3%  5.24k ± 1%  5.46k ± 1%      8.23k ± 1%
AllocKiB-12   4.38k ± 3%  4.49k ± 1%  5.10k ± 1%     10.04k ± 0%
AllocKiB-16   4.31k ± 1%  4.14k ± 3%  4.22k ± 0%     10.42k ± 0%
AllocKiB-20   4.26k ± 1%  3.98k ± 1%  4.09k ± 1%     10.46k ± 3%
AllocKiB-24   4.20k ± 1%  3.97k ± 1%  4.06k ± 1%     10.74k ± 1%
AllocKiB-28   4.15k ± 0%  4.00k ± 0%  4.20k ± 0%     10.76k ± 1%

Fixes #37487.

Change-Id: I92d47355acacf9af2c41bf080c08a8c1638ba210
Reviewed-on: https://go-review.googlesource.com/c/go/+/221182
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-04-27 18:19:26 +00:00
Michael Anthony Knyszek eacdf76b93 runtime: add bitmap-based markrootSpans implementation
Currently markrootSpans, the scanning routine which scans span specials
(particularly finalizers) as roots, uses sweepSpans to shard work and
find spans to mark.

However, as part of a future CL to change span ownership and how
mcentral works, we want to avoid having markrootSpans use the sweep bufs
to find specials, so in this change we introduce a new mechanism.

Much like for the page reclaimer, we set up a per-page bitmap where the
first page for a span is marked if the span contains any specials, and
unmarked if it has no specials. This bitmap is updated by addspecial,
removespecial, and during sweeping.

markrootSpans then shards this bitmap into mark work and markers iterate
over the bitmap looking for spans with specials to mark. Unlike the page
reclaimer, we don't need to use the pageInUse bits because having a
special implies that a span is in-use.

While in terms of computational complexity this design is technically
worse, because it needs to iterate over the mapped heap, in practice
this iteration is very fast (we can skip over large swathes of the heap
very quickly) and we only look at spans that have any specials at all,
rather than having to touch each span.

This new implementation of markrootSpans is behind a feature flag called
go115NewMarkrootSpans.

Updates #37487.

Change-Id: I8ea07b6c11059f6d412fe419e0ab512d989377b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/221178
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-04-21 22:50:51 +00:00
Ian Lance Taylor 7ea40f6594 runtime: use mcache0 if no P in profilealloc
A case that I missed in CL 205239: profilealloc can be called at
program startup if GOMAXPROCS is large enough.

Fixes #38474

Change-Id: I2f089fc6ec00c376680e1c0b8a2557b62789dd7f
Reviewed-on: https://go-review.googlesource.com/c/go/+/228420
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-04-17 00:45:52 +00:00
Dan Scales 0a820007e7 runtime: static lock ranking for the runtime (enabled by GOEXPERIMENT)
I took some of the infrastructure from Austin's lock logging CR
https://go-review.googlesource.com/c/go/+/192704 (with deadlock
detection from the logs), and developed a setup to give static lock
ranking for runtime locks.

Static lock ranking establishes a documented total ordering among locks,
and then reports an error if the total order is violated. This can
happen if a deadlock happens (by acquiring a sequence of locks in
different orders), or if just one side of a possible deadlock happens.
Lock ordering deadlocks cannot happen as long as the lock ordering is
followed.

Along the way, I found a deadlock involving the new timer code, which Ian fixed
via https://go-review.googlesource.com/c/go/+/207348, as well as two other
potential deadlocks.

See the constants at the top of runtime/lockrank.go to show the static
lock ranking that I ended up with, along with some comments. This is
great documentation of the current intended lock ordering when acquiring
multiple locks in the runtime.

I also added an array lockPartialOrder[] which shows and enforces the
current partial ordering among locks (which is embedded within the total
ordering). This is more specific about the dependencies among locks.

I don't try to check the ranking within a lock class with multiple locks
that can be acquired at the same time (i.e. check the ranking when
multiple hchan locks are acquired).

Currently, I am doing a lockInit() call to set the lock rank of most
locks. Any lock that is not otherwise initialized is assumed to be a
leaf lock (a very high rank lock), so that eliminates the need to do
anything for a bunch of locks (including all architecture-dependent
locks). For two locks, root.lock and notifyList.lock (only in the
runtime/sema.go file), it is not as easy to do lock initialization, so
instead, I am passing the lock rank with the lock calls.

For Windows compilation, I needed to increase the StackGuard size from
896 to 928 because of the new lock-rank checking functions.

Checking of the static lock ranking is enabled by setting
GOEXPERIMENT=staticlockranking before doing a run.

To make sure that the static lock ranking code has no overhead in memory
or CPU when not enabled by GOEXPERIMENT, I changed 'go build/install' so
that it defines a build tag (with the same name) whenever any experiment
has been baked into the toolchain (by checking Expstring()). This allows
me to avoid increasing the size of the 'mutex' type when static lock
ranking is not enabled.

Fixes #38029

Change-Id: I154217ff307c47051f8dae9c2a03b53081acd83a
Reviewed-on: https://go-review.googlesource.com/c/go/+/207619
Reviewed-by: Dan Scales <danscales@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-07 21:51:03 +00:00
Austin Clements d965bb6130 runtime: use divRoundUp
There are a handful of places where the runtime wants to round up the
result of a division. We just introduced a helper to do this. This CL
replaces all of the hand-coded round-ups (that I could find) with this
helper.

Change-Id: I465d152157ff0f3cad40c0aa57491e4f2de510ad
Reviewed-on: https://go-review.googlesource.com/c/go/+/224385
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-20 16:05:35 +00:00
Ian Lance Taylor 3093959ee1 runtime: remove mcache field from m
Having an mcache field in both m and p is confusing, so remove it from m.
Always use mcache field from p. Use new variable mcache0 during bootstrap.

Change-Id: If2cba9f8bb131d911d512b61fd883a86cf62cc98
Reviewed-on: https://go-review.googlesource.com/c/go/+/205239
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-02-24 16:39:52 +00:00
Jerrin Shaji George 921ceadd29 runtime: rewrite a comment in malloc.go
This commit changes the wording of a comment in malloc.go that describes
how span objects are zeroed to make it more clear.

Change-Id: I07722df1e101af3cbf8680ad07437d4a230b0168
GitHub-Last-Rev: 0e909898c7
GitHub-Pull-Request: golang/go#37008
Reviewed-on: https://go-review.googlesource.com/c/go/+/217618
Reviewed-by: Austin Clements <austin@google.com>
2020-02-05 21:19:43 +00:00
Joel Sing 8e0be05ec7 runtime: add support for linux/riscv64
Based on riscv-go port.

Updates #27532

Change-Id: If522807a382130be3c8d40f4b4c1131d1de7c9e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/204632
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-01-19 14:04:09 +00:00
Clément Chigot 5042317d69 runtime: add arenaBaseOffset on aix/ppc64
On AIX, addresses returned by mmap are between 0x0a00000000000000
and 0x0afffffffffffff. The previous solution to handle these large
addresses was to increase the arena size up to 60 bits addresses,
cf CL 138736.

However, with the new page allocator, the 60bit heap addresses are
causing huge memory allocations, especially by (s *pageAlloc).init. mmap
and munmap syscalls dealing with these allocations are reducing
performances of every Go programs.

In order to avoid these allocations, arenaBaseOffset is set to
0x0a00000000000000 and heap addresses are on 48bit, as others operating
systems.

Updates: #35451

Change-Id: Ice916b8578f76703428ec12a82024147a7592bc0
Reviewed-on: https://go-review.googlesource.com/c/go/+/206841
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2019-11-16 00:02:02 +00:00
Michael Anthony Knyszek 7f574e476a runtime: remove unnecessary large parameter to mheap_.alloc
mheap_.alloc currently accepts both a spanClass and a "large" parameter
indicating whether the allocation is large. These are redundant, since
spanClass.sizeclass() == 0 is an equivalent way to determine this and is
already used in mheap_.alloc. There are no places in the runtime where
the size class could be non-zero and large == true.

Updates #35112.

Change-Id: Ie66facf8f0faca6f4cd3d20a8ac4bc259e11823d
Reviewed-on: https://go-review.googlesource.com/c/go/+/196639
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 16:44:33 +00:00
Michael Anthony Knyszek ffb5646fe0 runtime: define maximum supported physical page and huge page sizes
This change defines a maximum supported physical and huge page size in
the runtime based on the new page allocator's implementation, and uses
them where appropriate.

Furthemore, if the system exceeds the maximum supported huge page
size, we simply ignore it silently.

It also fixes a huge-page-related test which is only triggered by a
condition which is definitely wrong.

Finally, it adds a few TODOs related to code clean-up and supporting
larger huge page sizes.

Updates #35112.
Fixes #35431.

Change-Id: Ie4348afb6bf047cce2c1433576d1514720d8230f
Reviewed-on: https://go-review.googlesource.com/c/go/+/205937
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-08 16:35:48 +00:00
Michael Anthony Knyszek 33dfd3529b runtime: remove old page allocator
This change removes the old page allocator from the runtime.

Updates #35112.

Change-Id: Ib20e1c030f869b6318cd6f4288a9befdbae1b771
Reviewed-on: https://go-review.googlesource.com/c/go/+/195700
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 00:07:43 +00:00
Michael Anthony Knyszek e6135c2768 runtime: switch to new page allocator
This change flips the oldPageAllocator constant enabling the new page
allocator in the Go runtime.

Updates #35112.

Change-Id: I7fc8332af9fd0e43ce28dd5ebc1c1ce519ce6d0c
Reviewed-on: https://go-review.googlesource.com/c/go/+/201765
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-07 23:55:56 +00:00
Michael Anthony Knyszek 689f6f77f0 runtime: integrate new page allocator into runtime
This change integrates all the bits and pieces of the new page allocator
into the runtime, behind a global constant.

Updates #35112.

Change-Id: I6696bde7bab098a498ab37ed2a2caad2a05d30ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/201764
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-07 20:14:02 +00:00
Michael Anthony Knyszek 73317080e1 runtime: add scavenging code for new page allocator
This change adds a scavenger for the new page allocator along with
tests. The scavenger walks over the heap backwards once per GC, looking
for memory to scavenge. It walks across the heap without any lock held,
searching optimistically. If it finds what appears to be a scavenging
candidate it acquires the heap lock and attempts to verify it. Upon
verification it then scavenges.

Notably, unlike the old scavenger, it doesn't show any preference for
huge pages and instead follows a more strict last-page-first policy.

Updates #35112.

Change-Id: I0621ef73c999a471843eab2d1307ae5679dd18d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/195697
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-07 19:14:27 +00:00
Michael Anthony Knyszek 198f0452b0 runtime: define darwin/arm64's address space as 33 bits
On iOS, the address space is not 48 bits as one might believe, since
it's arm64 hardware. In fact, all pointers are truncated to 33 bits, and
the OS only gives applications access to the range [1<<32, 2<<32).

While today this has no effect on the Go runtime, future changes which
care about address space size need this to be correct.

Updates #35112.

Change-Id: Id518a2298080f7e3d31cf7d909506a37748cc49a
Reviewed-on: https://go-review.googlesource.com/c/go/+/205758
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-07 02:33:31 +00:00
Michael Anthony Knyszek 383b447e0d runtime: clean up power-of-two rounding code with align functions
This change renames the "round" function to the more appropriately named
"alignUp" which rounds an integer up to the next multiple of a power of
two.

This change also adds the alignDown function, which is almost like
alignUp but rounds down to the previous multiple of a power of two.

With these two functions, we also go and replace manual rounding code
with it where we can.

Change-Id: Ie1487366280484dcb2662972b01b4f7135f72fec
Reviewed-on: https://go-review.googlesource.com/c/go/+/190618
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-04 23:41:34 +00:00
Michael Anthony Knyszek a41ebe6e25 runtime: add physHugePageShift
This change adds physHugePageShift which is defined such that
1 << physHugePageShift == physHugePageSize. The purpose of this variable
is to avoid doing expensive divisions in key functions, such as
(*mspan).hugePages.

This change also does a sweep of any place we might do a division or mod
operation with physHugePageSize and turns it into bit shifts and other
bitwise operations.

Finally, this change adds a check to mallocinit which ensures that
physHugePageSize is always a power of two. osinit might choose to ignore
non-powers-of-two for the value and replace it with zero, but mallocinit
will fail if it's not a power of two (or zero). It also derives
physHugePageShift from physHugePageSize.

This change helps improve the performance of most applications because
of how often (*mspan).hugePages is called.

Updates #32828.

Change-Id: I1a6db113d52d563f59ae8fd4f0e130858859e68f
Reviewed-on: https://go-review.googlesource.com/c/go/+/186598
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-07-30 18:44:52 +00:00
Keith Randall 01d137262a runtime: use uintptr instead of int32 for counting to next heap profile sample
Overflow of the comparison caused very large (>=1<<32) allocations to
sometimes not get sampled at all. Use uintptr so the comparison will
never overflow.

Fixes #33342

Tested on the example in 33342. I don't want to check a test in that
needs that much memory, however.

Change-Id: I51fe77a9117affed8094da93c0bc5f445ac2d3d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/188017
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-07-29 21:07:49 +00:00
Austin Clements 5b15510d96 runtime: align allocations harder in GODEBUG=sbrk=1 mode
Currently, GODEBUG=sbrk=1 mode aligns allocations by their type's
alignment. You would think this would be the right thing to do, but
because 64-bit fields are only 4-byte aligned right now (see #599),
this can cause a 64-bit field of an allocated object to be 4-byte
aligned, but not 8-byte aligned. If there is an atomic access to that
unaligned 64-bit field, it will crash.

This doesn't happen in normal allocation mode because the
size-segregated allocation and the current size classes will cause any
types larger than 8 bytes to be 8 byte aligned.

We fix this by making sbrk=1 mode use alignment based on the type's
size rather than its declared alignment. This matches how the tiny
allocator aligns allocations.

This was tested with

  GOARCH=386 GODEBUG=sbrk=1 go test sync/atomic

This crashes with an unaligned access before this change, and passes
with this change.

This should be reverted when/if we fix #599.

Fixes #33159.

Change-Id: Ifc52c72c6b99c5d370476685271baa43ad907565
Reviewed-on: https://go-review.googlesource.com/c/go/+/186919
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-07-19 15:04:08 +00:00
Michael Anthony Knyszek 4e7bef84c1 runtime: mark newly-mapped memory as scavenged
On most platforms newly-mapped memory is untouched, meaning the pages
backing the region haven't been faulted in yet. However, we mark this
memory as unscavenged which means the background scavenger
aggressively "returns" this memory to the OS if the heap is small.

The only platform where newly-mapped memory is actually unscavenged (and
counts toward the application's RSS) is on Windows, since
(*mheap).sysAlloc commits the reservation. Instead of making a special
case for Windows, I change the requirements a bit for a sysReserve'd
region. It must now be both sysMap'd and sysUsed'd, with sysMap being a
no-op on Windows. Comments about memory allocation have been updated to
include a more up-to-date mental model of which states a region of memory
may be in (at a very low level) and how to transition between these
states.

Now this means we can correctly mark newly-mapped heap memory as
scavenged on every platform, reducing the load on the background
scavenger early on in the application for small heaps. As a result,
heap-growth scavenging is no longer necessary, since any actual RSS
growth will be accounted for on the allocation codepath.

Finally, this change also cleans up grow a little bit to avoid
pretending that it's freeing an in-use span and just does the necessary
operations directly.

Fixes #32012.
Fixes #31966.
Updates #26473.

Change-Id: Ie06061eb638162e0560cdeb0b8993d94cfb4d290
Reviewed-on: https://go-review.googlesource.com/c/go/+/177097
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-05-16 22:00:47 +00:00
Michael Anthony Knyszek 1033065ee3 runtime: add physHugePageSize
This change adds the global physHugePageSize which is initialized in
osinit(). physHugePageSize contains the system's transparent huge page
(or superpage) size in bytes.

For #30333.

Change-Id: I2f0198c40729dbbe6e6f2676cef1d57dd107562c
Reviewed-on: https://go-review.googlesource.com/c/go/+/170858
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2019-05-03 18:41:45 +00:00
Richard Musiol 460f9c6068 runtime, cmd/link: optimize memory allocation on wasm
WebAssembly's memory is contiguous. Allocating memory at a high address
also allocates all memory up to that address. This change reduces
the initial memory allocated on wasm from 1GB to 16MB by using multiple
heap arenas and reducing the size of a heap arena.

Fixes #27462.

Change-Id: Ic941e6edcadd411e65a14cb2f9fd6c8eae02fc7a
Reviewed-on: https://go-review.googlesource.com/c/go/+/170950
Run-TryBot: Richard Musiol <neelance@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-16 15:03:34 +00:00
Keith Randall db16de9203 runtime: remove kindNoPointers
We already have the ptrdata field in a type, which encodes exactly
the same information that kindNoPointers does.

My problem with kindNoPointers is that it often leads to
double-negative code like:

   t.kind & kindNoPointers != 0

Much clearer is:

   t.ptrdata == 0

Update #27167

Change-Id: I92307d7f018a6bbe3daca4a4abb4225e359349b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/169157
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-25 20:46:35 +00:00
Ian Lance Taylor c5babcc485 runtime: align first persistentalloc chunk as requested
Change-Id: Ib391e019b1a7513d234fb1c8ff802efe8fa7c950
Reviewed-on: https://go-review.googlesource.com/c/go/+/163859
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-03-06 21:33:41 +00:00
Marcel van Lohuizen 9650726e79 internal/reflectlite: lite version of reflect package
to be used by errors package for checking assignability
and setting error values in As.

Updates #29934.

Change-Id: I8c1d02a2c6efa0919d54b286cfe8b4edc26da059
Reviewed-on: https://go-review.googlesource.com/c/161759
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2019-02-27 18:27:01 +00:00
Raul Silvera dc889025c7 runtime: sample large heap allocations correctly
Remove an unnecessary check on the heap sampling code that forced sampling
of all heap allocations larger than the sampling rate. This need to follow
a poisson process so that they can be correctly unsampled. Maintain a check
for MemProfileRate==1 to provide a mechanism for full sampling, as
documented in https://golang.org/pkg/runtime/#pkg-variables.

Additional testing for this change is on cl/129117.

Fixes #26618

Change-Id: I7802bde2afc655cf42cffac34af9bafeb3361957
GitHub-Last-Rev: 471f747af8
GitHub-Pull-Request: golang/go#29791
Reviewed-on: https://go-review.googlesource.com/c/158337
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2019-01-18 15:29:32 +00:00
Ian Lance Taylor 0d6a2d5f9a runtime: skip writes to persistent memory in cgo checker
Fixes #23899
Fixes #28458

Change-Id: Ie177f2d4c399445d8d5e1a327f2419c7866cb45e
Reviewed-on: https://go-review.googlesource.com/c/155697
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-01-04 02:40:56 +00:00
Clément Chigot 041526c6ef runtime: handle 64bits addresses for AIX
This commit allows the runtime to handle 64bits addresses returned by
mmap syscall on AIX.

Mmap syscall returns addresses on 59bits on AIX. But the Arena
implementation only allows addresses with less than 48 bits.
This commit increases the arena size up to 1<<60 for aix/ppc64.

Update: #25893

Change-Id: Iea72e8a944d10d4f00be915785e33ae82dd6329e
Reviewed-on: https://go-review.googlesource.com/c/138736
Reviewed-by: Austin Clements <austin@google.com>
2018-11-26 14:06:28 +00:00
Austin Clements e500ffd88c runtime: track all heap arenas in a slice
Currently, there's no efficient way to iterate over the Go heap. We're
going to need this for fast free page sweeping, so this CL adds a
slice of all allocated heap arenas. This will also be useful for
generational GC.

For #18155.

Change-Id: I58d126cfb9c3f61b3125d80b74ccb1b2169efbcc
Reviewed-on: https://go-review.googlesource.com/c/138076
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-11-15 19:14:10 +00:00
Martin Möhrmann 1e50dd02a9 runtime: use multiplication with overflow check for newarray
This improves performance for e.g. maps with a bucket size
(key+value*8 bytes) larger than 32 bytes and removes loading
a value from the maxElems array for smaller bucket sizes.

name                old time/op  new time/op  delta
MakeMap/[Byte]Byte  95.5ns ± 1%  94.7ns ± 1%  -0.78%  (p=0.013 n=9+9)
MakeMap/[Int]Int     128ns ± 0%   121ns ± 2%  -5.63%  (p=0.000 n=6+10)

Updates #21588

Change-Id: I7d9eb7d49150c399c15dcab675e24bc97ff97852
Reviewed-on: https://go-review.googlesource.com/c/143997
Reviewed-by: Keith Randall <khr@golang.org>
2018-10-23 16:48:01 +00:00
Michael Anthony Knyszek 07e738ec32 runtime: use only treaps for tracking spans
Currently, mheap tracks spans in both mSpanLists and mTreaps, but
mSpanLists, while they tend to be smaller, complicate the
implementation. Here we simplify the implementation by removing
free and busy from mheap and renaming freelarge -> free and busylarge
-> busy.

This change also slightly changes the reclamation policy. Previously,
for allocations under 1MB we would attempt to find a small span of the
right size. Now, we just try to find any number of spans totaling the
right size. This may increase heap fragmentation, but that will be dealt
with using virtual memory tricks in follow-up CLs.

For #14045.

Garbage-heavy benchmarks show very little change, except what appears
to be a decrease in STW times and peak RSS.

name                      old STW-ns/GC       new STW-ns/GC       delta
Garbage/benchmem-MB=64-8           263k ±64%           217k ±24%  -17.66%  (p=0.028 n=25+23)

name                      old STW-ns/op       new STW-ns/op       delta
Garbage/benchmem-MB=64-8          9.39k ±65%          7.80k ±24%  -16.88%  (p=0.037 n=25+23)

name                      old peak-RSS-bytes  new peak-RSS-bytes  delta
Garbage/benchmem-MB=64-8           281M ± 0%           249M ± 4%  -11.40%  (p=0.000 n=19+18)

https://perf.golang.org/search?q=upload:20181005.1

Go1 benchmarks perform roughly the same, the most notable regression
being the JSON encode/decode benchmark with worsens by ~2%.

name                     old time/op    new time/op    delta
BinaryTree17-8              3.02s ± 2%     2.99s ± 2%  -1.18%  (p=0.000 n=25+24)
Fannkuch11-8                3.05s ± 1%     3.02s ± 2%  -1.20%  (p=0.000 n=25+25)
FmtFprintfEmpty-8          43.6ns ± 5%    43.4ns ± 3%    ~     (p=0.528 n=25+25)
FmtFprintfString-8         74.9ns ± 3%    73.4ns ± 1%  -2.03%  (p=0.001 n=25+24)
FmtFprintfInt-8            79.3ns ± 3%    77.9ns ± 1%  -1.73%  (p=0.003 n=25+25)
FmtFprintfIntInt-8          119ns ± 6%     116ns ± 0%  -2.68%  (p=0.000 n=25+18)
FmtFprintfPrefixedInt-8     134ns ± 4%     132ns ± 1%  -1.52%  (p=0.004 n=25+25)
FmtFprintfFloat-8           240ns ± 1%     241ns ± 1%    ~     (p=0.403 n=24+23)
FmtManyArgs-8               543ns ± 1%     537ns ± 1%  -1.00%  (p=0.000 n=25+25)
GobDecode-8                6.88ms ± 1%    6.92ms ± 4%    ~     (p=0.088 n=24+22)
GobEncode-8                5.92ms ± 1%    5.93ms ± 1%    ~     (p=0.898 n=25+24)
Gzip-8                      267ms ± 2%     266ms ± 2%    ~     (p=0.213 n=25+24)
Gunzip-8                   35.4ms ± 1%    35.6ms ± 1%  +0.70%  (p=0.000 n=25+25)
HTTPClientServer-8          104µs ± 2%     104µs ± 2%    ~     (p=0.686 n=25+25)
JSONEncode-8               9.67ms ± 1%    9.80ms ± 4%  +1.32%  (p=0.000 n=25+25)
JSONDecode-8               47.7ms ± 1%    48.8ms ± 5%  +2.33%  (p=0.000 n=25+25)
Mandelbrot200-8            4.87ms ± 1%    4.91ms ± 1%  +0.79%  (p=0.000 n=25+25)
GoParse-8                  3.59ms ± 4%    3.55ms ± 1%    ~     (p=0.199 n=25+24)
RegexpMatchEasy0_32-8      90.3ns ± 1%    89.9ns ± 1%  -0.47%  (p=0.000 n=25+21)
RegexpMatchEasy0_1K-8       204ns ± 1%     204ns ± 1%    ~     (p=0.914 n=25+24)
RegexpMatchEasy1_32-8      84.9ns ± 0%    84.6ns ± 1%  -0.36%  (p=0.000 n=24+25)
RegexpMatchEasy1_1K-8       350ns ± 1%     348ns ± 3%  -0.59%  (p=0.007 n=25+25)
RegexpMatchMedium_32-8      122ns ± 1%     121ns ± 0%  -1.08%  (p=0.000 n=25+18)
RegexpMatchMedium_1K-8     36.1µs ± 1%    34.6µs ± 1%  -4.02%  (p=0.000 n=25+25)
RegexpMatchHard_32-8       1.69µs ± 2%    1.65µs ± 1%  -2.38%  (p=0.000 n=25+25)
RegexpMatchHard_1K-8       50.8µs ± 1%    49.4µs ± 1%  -2.69%  (p=0.000 n=25+24)
Revcomp-8                   453ms ± 2%     449ms ± 3%  -0.74%  (p=0.022 n=25+24)
Template-8                 63.2ms ± 2%    63.4ms ± 1%    ~     (p=0.127 n=25+24)
TimeParse-8                 313ns ± 1%     315ns ± 3%    ~     (p=0.924 n=24+25)
TimeFormat-8                294ns ± 1%     292ns ± 2%  -0.65%  (p=0.004 n=23+24)
[Geo mean]                 49.9µs         49.6µs       -0.65%

name                     old speed      new speed      delta
GobDecode-8               112MB/s ± 1%   110MB/s ± 4%  -1.00%  (p=0.036 n=24+24)
GobEncode-8               130MB/s ± 1%   129MB/s ± 1%    ~     (p=0.894 n=25+24)
Gzip-8                   72.7MB/s ± 2%  73.0MB/s ± 2%    ~     (p=0.208 n=25+24)
Gunzip-8                  549MB/s ± 1%   545MB/s ± 1%  -0.70%  (p=0.000 n=25+25)
JSONEncode-8              201MB/s ± 1%   198MB/s ± 3%  -1.29%  (p=0.000 n=25+25)
JSONDecode-8             40.7MB/s ± 1%  39.8MB/s ± 5%  -2.23%  (p=0.000 n=25+25)
GoParse-8                16.2MB/s ± 4%  16.3MB/s ± 1%    ~     (p=0.211 n=25+24)
RegexpMatchEasy0_32-8     354MB/s ± 1%   356MB/s ± 1%  +0.47%  (p=0.000 n=25+21)
RegexpMatchEasy0_1K-8    5.00GB/s ± 0%  4.99GB/s ± 1%    ~     (p=0.588 n=24+24)
RegexpMatchEasy1_32-8     377MB/s ± 1%   378MB/s ± 1%  +0.39%  (p=0.000 n=25+25)
RegexpMatchEasy1_1K-8    2.92GB/s ± 1%  2.94GB/s ± 3%  +0.65%  (p=0.008 n=25+25)
RegexpMatchMedium_32-8   8.14MB/s ± 1%  8.22MB/s ± 1%  +0.98%  (p=0.000 n=25+24)
RegexpMatchMedium_1K-8   28.4MB/s ± 1%  29.6MB/s ± 1%  +4.19%  (p=0.000 n=25+25)
RegexpMatchHard_32-8     18.9MB/s ± 2%  19.4MB/s ± 1%  +2.43%  (p=0.000 n=25+25)
RegexpMatchHard_1K-8     20.2MB/s ± 1%  20.7MB/s ± 1%  +2.76%  (p=0.000 n=25+24)
Revcomp-8                 561MB/s ± 2%   566MB/s ± 3%  +0.75%  (p=0.021 n=25+24)
Template-8               30.7MB/s ± 2%  30.6MB/s ± 1%    ~     (p=0.131 n=25+24)
[Geo mean]                120MB/s        121MB/s       +0.48%

https://perf.golang.org/search?q=upload:20181004.6

Change-Id: I97f9fee34577961a116a8ddd445c6272253f0f95
Reviewed-on: https://go-review.googlesource.com/c/139837
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-10-17 16:26:10 +00:00
Austin Clements 4c1c839a3d runtime: clarify table of arena sizes
Currently the table of arena sizes mixes the number of entries in the
L1 with the size of the L2. While the size of the L2 is important,
this makes it hard to see what's actually going on because there's an
implicit factor of sys.PtrSize.

This changes the L2 column to say both the number of entries and the
size that results in. This should hopefully make the relations between
the columns of the table clearer, since they can now be plugged
directly into the given formula.

Change-Id: Ie677adaef763b893a2f620bd4fc3b8db314b3a1e
Reviewed-on: https://go-review.googlesource.com/c/139697
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-04 16:14:44 +00:00
Austin Clements 918ed88e47 runtime: remove gcStart's mode argument
This argument is always gcBackgroundMode since only
debug.gcstoptheworld can trigger a STW GC at this point. Remove the
unnecessary argument.

Updates #26903. This is preparation for unifying STW GC and concurrent
GC.

Change-Id: Icb4ba8f10f80c2b69cf51a21e04fa2c761b71c94
Reviewed-on: https://go-review.googlesource.com/c/134775
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:24 +00:00
Austin Clements 01e6cfc2a0 runtime: don't call mcache.refill on systemstack
mcache.refill doesn't need to run on the system stack; it just needs
to be non-preemptible. Its only caller, mcache.nextFree, also needs to
be non-preemptible, so we can remove the unnecessary systemstack
switch.

Change-Id: Iba5b3f4444855f1dc134485ba588efff3b54c426
Reviewed-on: https://go-review.googlesource.com/138196
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-28 18:39:49 +00:00
Austin Clements 5a8c11ce3e runtime: rename _MSpan* constants to mSpan*
We already aliased mSpanInUse to _MSpanInUse. The dual constants are
getting annoying, so fix all of these to use the mSpan* naming
convention.

This was done automatically with:
  sed -i -re 's/_?MSpan(Dead|InUse|Manual|Free)/mSpan\1/g' *.go
plus deleting the existing definition of mSpanInUse.

Change-Id: I09979d9d491d06c10689cea625dc57faa9cc6767
Reviewed-on: https://go-review.googlesource.com/137875
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-26 20:51:07 +00:00
Tobias Klauser c7ac645d28 runtime: fix reference to sys{Fault,Free,Reserve,Unused,Used} in comments
Change-Id: Icbaedc49c810c63c51d56ae394d2f70e4d64b3e0
Reviewed-on: https://go-review.googlesource.com/136495
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-09-25 06:22:24 +00:00
Austin Clements 2d4c1ff739 runtime: don't create heap hints outside TSAN's supported heap
TSAN for Go only supports heap address in the range [0x00c000000000,
0x00e000000000). However, we currently create heap hints of the form
0xXXc000000000 for XX between 0x00 and 0x7f. Even for XX=0x01, this
hint is outside TSAN's supported heap address range.

Fix this by creating a slightly different set of hints in race mode,
all of which fall inside TSAN's heap address range.

This should fix TestArenaCollision flakes. That test forces the
runtime to use later heap hints. Currently, this always results in
TSAN "failed to allocate" failures on Windows (which happens to have a
slightly more constrained TSAN layout than non-Windows). Most of the
time we don't notice these failures, but sometimes it crashes TSAN,
leading to a test failure.

Fixes #25698.

Change-Id: I8926cd61f0ee5ee00efa77b283f7b809c555be46
Reviewed-on: https://go-review.googlesource.com/123780
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-07-19 18:12:51 +00:00
Richard Musiol 35ea62468b runtime: add js/wasm architecture
This commit adds the js/wasm architecture to the runtime package.
Currently WebAssembly has no support for threads yet, see
https://github.com/WebAssembly/design/issues/1073. Because of that,
there is no preemption of goroutines and no sysmon goroutine.

Design doc: https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0_Nv3OUwjEY5qVCxCup4
About WebAssembly assembly files: https://docs.google.com/document/d/1GRmy3rA4DiYtBlX-I1Jr_iHykbX8EixC3Mq0TCYqbKc

Updates #18892

Change-Id: I7f12d21b5180500d55ae9fd2f7e926a1731db391
Reviewed-on: https://go-review.googlesource.com/103877
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-08 00:17:34 +00:00
Austin Clements 4946d9e87b runtime: stop when we run out of hints in race mode
Currently, the runtime falls back to asking for any address the OS can
offer for the heap when it runs out of hint addresses. However, the
race detector assumes the heap lives in [0x00c000000000,
0x00e000000000), and will fail in a non-obvious way if we go outside
this region.

Fix this by actively throwing a useful error if we run out of heap
hints in race mode.

This problem is currently being triggered by TestArenaCollision, which
intentionally triggers this fallback behavior. Fix the test to look
for the new panic message in race mode.

Fixes #24670.
Updates #24133.

Change-Id: I57de6d17a3495dc1f1f84afc382cd18a6efc2bf7
Reviewed-on: https://go-review.googlesource.com/104717
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-04 18:08:04 +00:00
Austin Clements 788464724c runtime: reduce arena size to 4MB on 64-bit Windows
Currently, we use 64MB heap arenas on 64-bit platforms. This works
well on UNIX-like OSes because they treat untouched pages as
essentially free. However, on Windows, committed memory is charged
against a process whether or not it has demand-faulted physical pages
in. Hence, on Windows, even a process with a tiny heap will commit
64MB for one heap arena, plus another 32MB for the arena map. Things
are much worse under the race detector, which increases the heap
commitment by a factor of 5.5X, leading to 384MB of committed memory
at runtime init.

Fix this by reducing the heap arena size to 4MB on Windows.

To counterbalance the effect of increasing the arena map size by a
factor of 16, and to further reduce the impact of the commitment for
the arena map, we switch from a single entry L1 arena map to a 64
entry L1 arena map.

Compared to the original arena design, this slows down the
x/benchmarks garbage benchmark by 0.49% (the slow down of this commit
alone is 1.59%, but the previous commit bought us a 1% speed-up):

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.28ms ± 1%  2.29ms ± 1%  +0.49%  (p=0.000 n=17+18)

(https://perf.golang.org/search?q=upload:20180223.1)

(This was measured on linux/amd64 by modifying its arena configuration
as above.)

Fixes #23900.

Change-Id: I6b7fa5ecebee2947bf20cfeb78c248809469c6b1
Reviewed-on: https://go-review.googlesource.com/96780
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-23 21:59:51 +00:00
Austin Clements ec25210564 runtime: support a two-level arena map
Currently, the heap arena map is a single, large array that covers
every possible arena frame in the entire address space. This is
practical up to about 48 bits of address space with 64 MB arenas.

However, there are two problems with this:

1. mips64, ppc64, and s390x support full 64-bit address spaces (though
   on Linux only s390x has kernel support for 64-bit address spaces).
   On these platforms, it would be good to support these larger
   address spaces.

2. On Windows, processes are charged for untouched memory, so for
   processes with small heaps, the mostly-untouched 32 MB arena map
   plus a 64 MB arena are significant overhead. Hence, it would be
   good to reduce both the arena map size and the arena size, but with
   a single-level arena, these are inversely proportional.

This CL adds support for a two-level arena map. Arena frame numbers
are now divided into arenaL1Bits of L1 index and arenaL2Bits of L2
index.

At the moment, arenaL1Bits is always 0, so we effectively have a
single level map. We do a few things so that this has no cost beyond
the current single-level map:

1. We embed the L2 array directly in mheap, so if there's a single
   entry in the L2 array, the representation is identical to the
   current representation and there's no extra level of indirection.

2. Hot code that accesses the arena map is structured so that it
   optimizes to nearly the same machine code as it does currently.

3. We make some small tweaks to hot code paths and to the inliner
   itself to keep some important functions inlined despite their
   now-larger ASTs. In particular, this is necessary for
   heapBitsForAddr and heapBits.next.

Possibly as a result of some of the tweaks, this actually slightly
improves the performance of the x/benchmarks garbage benchmark:

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.28ms ± 1%  2.26ms ± 1%  -1.07%  (p=0.000 n=17+19)

(https://perf.golang.org/search?q=upload:20180223.2)

For #23900.

Change-Id: If5164e0961754f97eb9eca58f837f36d759505ff
Reviewed-on: https://go-review.googlesource.com/96779
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-23 21:59:50 +00:00
Austin Clements 33b76920ec runtime: rename "arena index" to "arena map"
There are too many places where I want to talk about "indexing into
the arena index". Make this less awkward and ambiguous by calling it
the "arena map" instead.

Change-Id: I726b0667bb2139dbc006175a0ec09a871cdf73f9
Reviewed-on: https://go-review.googlesource.com/96777
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-23 21:59:48 +00:00
Austin Clements ea8d7a370d runtime: clarify address space limit constants and comments
Now that we support the full non-contiguous virtual address space of
amd64 hardware, some of the comments and constants related to this are
out of date.

This renames memLimitBits to heapAddrBits because 1<<memLimitBits is
no longer the limit of the address space and rewrites the comment to
focus first on hardware limits (which span OSes) and then discuss
kernel limits.

Second, this eliminates the memLimit constant because there's no
longer a meaningful "highest possible heap pointer value" on amd64.

Updates #23862.

Change-Id: I44b32033d2deb6b69248fb8dda14fc0e65c47f11
Reviewed-on: https://go-review.googlesource.com/95498
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-21 20:32:36 +00:00