Introduce GOOS=ios for iOS systems. GOOS=ios matches the "darwin"
build tag, just as GOOS=android matches "linux" and GOOS=illumos
matches "solaris". Only ios/arm64 is supported (ios/amd64 is
not).
GOOS=ios and GOOS=darwin remain essentially the same at this
point. They will diverge at a later time, to differentiate macOS
and iOS.
Uses of GOOS=="darwin" are changed to (GOOS=="darwin" || GOOS=="ios"),
except where the condition clearly means macOS (e.g. GOOS=="darwin" &&
GOARCH=="amd64"), in which case it remains GOOS=="darwin".
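As a rough sketch of the rewrite pattern (the helper names below are made up
for illustration and are not from this CL):

    // isApplePlatform reports whether goos names an Apple OS under the new scheme.
    func isApplePlatform(goos string) bool {
        return goos == "darwin" || goos == "ios"
    }

    // isMacOS keeps the darwin-only check where the intent is clearly macOS,
    // e.g. when paired with GOARCH == "amd64".
    func isMacOS(goos, goarch string) bool {
        return goos == "darwin" && goarch == "amd64"
    }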
Updates #38485.
Change-Id: I4faacdc1008f42434599efb3c3ad90763a83b67c
Reviewed-on: https://go-review.googlesource.com/c/go/+/254740
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change deletes the old mcentral implementation from the code base
and the newMCentralImpl feature flag along with it.
Updates #37487.
Change-Id: Ibca8f722665f0865051f649ffe699cbdbfdcfcf2
Reviewed-on: https://go-review.googlesource.com/c/go/+/221184
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This constant does not make it into DWARF because it is an ideal
constant larger than maxint (1<<63-1). DWARF has no way to represent
signed values that large. Define a different typed constant that
is unsigned and so can represent this constant properly.
Viewcore needs this constant to interrogate the heap data structures.
In addition, the sign of arenaBaseOffset changed in 1.15, and providing
a new name lets viewcore detect the sign change easily.
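A minimal sketch of the idea; the value shown is the amd64 offset and both
names follow the description above, so treat the exact spelling as an
assumption:

    // As an ideal (untyped) constant, this exceeds maxint64, so it has no
    // DWARF representation.
    const arenaBaseOffset = 0xffff800000000000

    // A typed, unsigned twin of the same value that DWARF can represent
    // and tools like viewcore can read.
    const arenaBaseOffsetUintptr = uintptr(arenaBaseOffset)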
Change-Id: I4274a2f6e79ebbf1411e85d64758fac1672fb96b
Reviewed-on: https://go-review.googlesource.com/c/go/+/240198
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
The page sweeper depends on spans being marked if any object in the
span is marked, but currently only greyobject does this.
gcmarknewobject and wbBufFlush1 also mark objects, but neither set
span marks. As a result, if there are live objects on a span, but
they're all marked via allocation or write barriers, then the span
itself won't be marked and the page reclaimer will free the span,
ultimately leading to memory corruption when the memory for those live
allocations gets reused.
Fix this by making gcmarknewobject and wbBufFlush1 also mark pages.
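Roughly, the fix adds a span (page) mark next to the object mark in both
paths; a sketch, assuming the pageMarks bitmap and pageIndexOf helper
already used by greyobject and the page reclaimer:

    // After marking the object itself in gcmarknewobject or wbBufFlush1:
    arena, pageIdx, pageMask := pageIndexOf(span.base())
    if arena.pageMarks[pageIdx]&pageMask == 0 {
        atomic.Or8(&arena.pageMarks[pageIdx], pageMask)
    }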
No test because I have no idea how to reliably (or even unreliably)
trigger this.
Fixes #39432.
Performance is a wash or very slightly worse. I benchmarked the
gcmarknewobject and wbBufFlush1 changes independently and both showed
a slight performance improvement, so I'm going to call this noise.
name old time/op new time/op delta
BiogoIgor 15.9s ± 2% 15.9s ± 2% ~ (p=0.758 n=25+25)
BiogoKrishna 15.7s ± 3% 15.7s ± 3% ~ (p=0.382 n=21+21)
BleveIndexBatch100 4.94s ± 3% 5.07s ± 4% +2.63% (p=0.000 n=25+25)
CompileTemplate 204ms ± 1% 205ms ± 1% +0.43% (p=0.000 n=21+23)
CompileUnicode 77.8ms ± 1% 78.1ms ± 1% ~ (p=0.130 n=23+23)
CompileGoTypes 731ms ± 1% 733ms ± 1% +0.30% (p=0.006 n=22+22)
CompileCompiler 3.64s ± 2% 3.65s ± 3% ~ (p=0.179 n=24+25)
CompileSSA 8.44s ± 1% 8.46s ± 1% +0.30% (p=0.003 n=22+23)
CompileFlate 132ms ± 1% 133ms ± 1% ~ (p=0.098 n=22+22)
CompileGoParser 164ms ± 1% 164ms ± 1% +0.37% (p=0.000 n=21+23)
CompileReflect 455ms ± 1% 457ms ± 2% +0.50% (p=0.002 n=20+22)
CompileTar 182ms ± 2% 182ms ± 1% ~ (p=0.382 n=22+22)
CompileXML 245ms ± 3% 245ms ± 1% ~ (p=0.070 n=21+23)
CompileStdCmd 16.5s ± 2% 16.5s ± 3% ~ (p=0.486 n=23+23)
FoglemanFauxGLRenderRotateBoat 12.9s ± 1% 13.0s ± 1% +0.97% (p=0.000 n=21+24)
FoglemanPathTraceRenderGopherIter1 18.6s ± 1% 18.7s ± 0% ~ (p=0.083 n=23+24)
GopherLuaKNucleotide 28.4s ± 1% 29.3s ± 1% +2.84% (p=0.000 n=25+25)
MarkdownRenderXHTML 252ms ± 0% 251ms ± 1% -0.50% (p=0.000 n=23+24)
Tile38WithinCircle100kmRequest 516µs ± 2% 516µs ± 2% ~ (p=0.763 n=24+25)
Tile38IntersectsCircle100kmRequest 689µs ± 2% 689µs ± 2% ~ (p=0.617 n=24+24)
Tile38KNearestLimit100Request 608µs ± 1% 606µs ± 2% -0.35% (p=0.030 n=19+22)
[Geo mean] 522ms 524ms +0.41%
https://perf.golang.org/search?q=upload:20200606.4
Change-Id: I8b331f310dbfaba0468035f207467c8403005bf5
Reviewed-on: https://go-review.googlesource.com/c/go/+/236817
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Currently maxOffAddr is defined in terms of the whole 64-bit address
space, assuming that it's all supported, by using ^uintptr(0) as the
maximal address in the offset space. In reality, the maximal address in
the offset space is (1<<heapAddrBits)-1 because we don't have more than
that actually available to us on a given platform.
On most platforms this is fine, because arenaBaseOffset is just
connecting two segments of address space, but on AIX we use it as an
actual offset for the starting address of the available address space,
which is limited. This means using ^uintptr(0) as the maximal address in
the offset address space causes wrap-around, especially when we just
want to represent a range approximately like [addr, infinity), which
today we do by using maxOffAddr.
To fix this, we define maxOffAddr more appropriately, in terms of
(1<<heapAddrBits)-1.
This change also redefines arenaBaseOffset to not be the negation of the
virtual address corresponding to address zero in the virtual address
space, but instead directly as the virtual address corresponding to
zero. This matches the existing documentation more closely and makes the
logic around arenaBaseOffset decidedly simpler, especially when trying
to reason about its use on AIX.
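A sketch of the new definition in terms of the offset address space (names
as described above; treat the exact expression as an approximation):

    // maxOffAddr is the largest address in the offset address space:
    // (1<<heapAddrBits)-1 translated by arenaBaseOffset, truncated to uintptr.
    var maxOffAddr = offAddr{(((1 << heapAddrBits) - 1) + arenaBaseOffset) & uintptrMask}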
Fixes #38966.
Change-Id: I1336e5036a39de846f64cc2d253e8536dee57611
Reviewed-on: https://go-review.googlesource.com/c/go/+/233497
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Currently linearAlloc manages an exclusive "end" address for the top of
its reserved space. While unlikely for a linearAlloc to be allocated
with an "end" address hitting the top of the address space, it is
possible and could lead to overflow.
Avoid overflow by defensively chopping off the last byte of the
linearAlloc if it bumps up against the top of the address space. In
practice, this means that if 32-bit platforms map the top of the address
space and use the linearAlloc to acquire arenas, the top arena will not
be usable.
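A sketch of what the defensive check might look like in linearAlloc's init
(simplified; field names assumed):

    func (l *linearAlloc) init(base, size uintptr) {
        if base+size < base {
            // Chop off the last byte so the exclusive end address
            // cannot wrap around the top of the address space.
            size -= 1
        }
        l.next, l.mapped = base, base
        l.end = base + size
    }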
Fixes #35954.
Change-Id: I512cddcd34fd1ab15cb6ca92bbf899fc1ef22ff6
Reviewed-on: https://go-review.googlesource.com/c/go/+/231338
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Currently mcentral is implemented as a couple of linked lists of spans
protected by a lock. Unfortunately this design leads to significant lock
contention.
The span ownership model is also confusing and complicated. In-use spans
jump between being owned by multiple sources, generally some combination
of a gcSweepBuf, a concurrent sweeper, an mcentral or an mcache.
First, to address contention, this change replaces those linked lists
with gcSweepBufs, which have an atomic fast path. Then, we change the
ownership model: a span may be simultaneously owned only by an mcentral
and the page reclaimer. Otherwise, an mcentral (which now consists of
sweep bufs), a sweeper, or an mcache is the sole owner of a span at
any given time. This dramatically simplifies reasoning about span
ownership in the runtime.
As a result of this new ownership model, sweeping is now driven by
walking over the mcentrals rather than having its own global list of
spans. Because we no longer have a global list and we traditionally
haven't used the mcentrals for large object spans, we no longer have
anywhere to put large objects. So, this change also keeps large object
spans in the appropriate mcentral lists.
In terms of the static lock ranking, we add the spanSet spine locks in
pretty much the same place as the mcentral locks, since they have the
potential to be manipulated both on the allocation and sweep paths, like
the mcentral locks.
This new implementation is turned on by default via a feature flag
called go115NewMCentralImpl.
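A sketch of the new mcentral layout (the old linked lists remain alongside
it until the flag and old implementation are removed; the spanSet type is
the one referenced in the lock-ranking note above):

    type mcentral struct {
        spanclass spanClass

        // partial and full hold spans with and without free objects,
        // split into swept and unswept sets by sweepgen parity.
        partial [2]spanSet
        full    [2]spanSet
    }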
Benchmark results for 1 KiB allocation throughput (5 runs each):
name \ MiB/s go113 go114 gotip gotip+this-patch
AllocKiB-1 1.71k ± 1% 1.68k ± 1% 1.59k ± 2% 1.71k ± 1%
AllocKiB-2 2.46k ± 1% 2.51k ± 1% 2.54k ± 1% 2.93k ± 1%
AllocKiB-4 4.27k ± 1% 4.41k ± 2% 4.33k ± 1% 5.01k ± 2%
AllocKiB-8 4.38k ± 3% 5.24k ± 1% 5.46k ± 1% 8.23k ± 1%
AllocKiB-12 4.38k ± 3% 4.49k ± 1% 5.10k ± 1% 10.04k ± 0%
AllocKiB-16 4.31k ± 1% 4.14k ± 3% 4.22k ± 0% 10.42k ± 0%
AllocKiB-20 4.26k ± 1% 3.98k ± 1% 4.09k ± 1% 10.46k ± 3%
AllocKiB-24 4.20k ± 1% 3.97k ± 1% 4.06k ± 1% 10.74k ± 1%
AllocKiB-28 4.15k ± 0% 4.00k ± 0% 4.20k ± 0% 10.76k ± 1%
Fixes #37487.
Change-Id: I92d47355acacf9af2c41bf080c08a8c1638ba210
Reviewed-on: https://go-review.googlesource.com/c/go/+/221182
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Currently markrootSpans, the scanning routine which scans span specials
(particularly finalizers) as roots, uses sweepSpans to shard work and
find spans to mark.
However, as part of a future CL to change span ownership and how
mcentral works, we want to avoid having markrootSpans use the sweep bufs
to find specials, so in this change we introduce a new mechanism.
Much like for the page reclaimer, we set up a per-page bitmap where the
first page for a span is marked if the span contains any specials, and
unmarked if it has no specials. This bitmap is updated by addspecial,
removespecial, and during sweeping.
markrootSpans then shards this bitmap into mark work and markers iterate
over the bitmap looking for spans with specials to mark. Unlike the page
reclaimer, we don't need to use the pageInUse bits because having a
special implies that a span is in-use.
While in terms of computational complexity this design is technically
worse, because it needs to iterate over the mapped heap, in practice
this iteration is very fast (we can skip over large swathes of the heap
very quickly) and we only look at spans that have any specials at all,
rather than having to touch each span.
This new implementation of markrootSpans is behind a feature flag called
go115NewMarkrootSpans.
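A sketch of the bitmap update when a special is added (assuming a per-arena
pageSpecials bitmap analogous to the reclaimer's pageInUse/pageMarks):

    // Called from addspecial; cleared symmetrically from removespecial
    // and during sweeping.
    func spanHasSpecials(s *mspan) {
        arenaPage := (s.base() / pageSize) % pagesPerArena
        ai := arenaIndex(s.base())
        ha := mheap_.arenas[ai.l1()][ai.l2()]
        atomic.Or8(&ha.pageSpecials[arenaPage/8], uint8(1)<<(arenaPage%8))
    }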
Updates #37487.
Change-Id: I8ea07b6c11059f6d412fe419e0ab512d989377b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/221178
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
A case that I missed in CL 205239: profilealloc can be called at
program startup if GOMAXPROCS is large enough.
Fixes #38474
Change-Id: I2f089fc6ec00c376680e1c0b8a2557b62789dd7f
Reviewed-on: https://go-review.googlesource.com/c/go/+/228420
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
I took some of the infrastructure from Austin's lock logging CR
https://go-review.googlesource.com/c/go/+/192704 (with deadlock
detection from the logs), and developed a setup to give static lock
ranking for runtime locks.
Static lock ranking establishes a documented total ordering among locks,
and then reports an error if the total order is violated. A violation can
occur when an actual deadlock happens (by acquiring a sequence of locks in
different orders), or when just one side of a possible deadlock occurs.
Lock ordering deadlocks cannot happen as long as the lock ordering is
followed.
Along the way, I found a deadlock involving the new timer code, which Ian fixed
via https://go-review.googlesource.com/c/go/+/207348, as well as two other
potential deadlocks.
See the constants at the top of runtime/lockrank.go to show the static
lock ranking that I ended up with, along with some comments. This is
great documentation of the current intended lock ordering when acquiring
multiple locks in the runtime.
I also added an array lockPartialOrder[] which shows and enforces the
current partial ordering among locks (which is embedded within the total
ordering). This is more specific about the dependencies among locks.
I don't try to check the ranking within a lock class with multiple locks
that can be acquired at the same time (e.g. the ranking when multiple
hchan locks are acquired).
Currently, I am doing a lockInit() call to set the lock rank of most
locks. Any lock that is not otherwise initialized is assumed to be a
leaf lock (a very high rank lock), so that eliminates the need to do
anything for a bunch of locks (including all architecture-dependent
locks). For two locks, root.lock and notifyList.lock (only in the
runtime/sema.go file), it is not as easy to do lock initialization, so
instead, I am passing the lock rank with the lock calls.
For Windows compilation, I needed to increase the StackGuard size from
896 to 928 because of the new lock-rank checking functions.
Checking of the static lock ranking is enabled by setting
GOEXPERIMENT=staticlockranking before doing a run.
To make sure that the static lock ranking code has no overhead in memory
or CPU when not enabled by GOEXPERIMENT, I changed 'go build/install' so
that it defines a build tag (with the same name) whenever any experiment
has been baked into the toolchain (by checking Expstring()). This allows
me to avoid increasing the size of the 'mutex' type when static lock
ranking is not enabled.
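A sketch of the two ways a rank gets attached to a lock, using names in the
spirit of runtime/lockrank.go (the exact rank identifiers are assumptions):

    // Most locks: rank assigned once at initialization.
    lockInit(&sched.lock, lockRankSched)

    // sema root and notifyList locks: rank passed at the call site instead.
    lockWithRank(&root.lock, lockRankRoot)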
Fixes #38029
Change-Id: I154217ff307c47051f8dae9c2a03b53081acd83a
Reviewed-on: https://go-review.googlesource.com/c/go/+/207619
Reviewed-by: Dan Scales <danscales@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
There are a handful of places where the runtime wants to round up the
result of a division. We just introduced a helper to do this. This CL
replaces all of the hand-coded round-ups (that I could find) with this
helper.
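The helper in question looks roughly like this:

    // divRoundUp returns ceil(n / a).
    func divRoundUp(n, a uintptr) uintptr {
        // a is generally a power of two, so the compiler can turn the
        // division into a shift.
        return (n + a - 1) / a
    }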
Change-Id: I465d152157ff0f3cad40c0aa57491e4f2de510ad
Reviewed-on: https://go-review.googlesource.com/c/go/+/224385
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Having an mcache field in both m and p is confusing, so remove it from m.
Always use the mcache field from p. Use the new variable mcache0 during
bootstrap.
Change-Id: If2cba9f8bb131d911d512b61fd883a86cf62cc98
Reviewed-on: https://go-review.googlesource.com/c/go/+/205239
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This commit changes the wording of a comment in malloc.go that describes
how span objects are zeroed, to make it clearer.
Change-Id: I07722df1e101af3cbf8680ad07437d4a230b0168
GitHub-Last-Rev: 0e909898c7
GitHub-Pull-Request: golang/go#37008
Reviewed-on: https://go-review.googlesource.com/c/go/+/217618
Reviewed-by: Austin Clements <austin@google.com>
Based on the riscv-go port.
Updates #27532
Change-Id: If522807a382130be3c8d40f4b4c1131d1de7c9e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/204632
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
On AIX, addresses returned by mmap are between 0x0a00000000000000
and 0x0affffffffffffff. The previous solution to handle these large
addresses was to increase the arena size up to 60-bit addresses,
cf. CL 138736.
However, with the new page allocator, the 60-bit heap addresses are
causing huge memory allocations, especially by (s *pageAlloc).init. The
mmap and munmap syscalls dealing with these allocations reduce the
performance of every Go program.
In order to avoid these allocations, arenaBaseOffset is set to
0x0a00000000000000 and heap addresses are limited to 48 bits, as on other
operating systems.
Updates #35451
Change-Id: Ice916b8578f76703428ec12a82024147a7592bc0
Reviewed-on: https://go-review.googlesource.com/c/go/+/206841
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
mheap_.alloc currently accepts both a spanClass and a "large" parameter
indicating whether the allocation is large. These are redundant, since
spanClass.sizeclass() == 0 is an equivalent way to determine this and is
already used in mheap_.alloc. There are no places in the runtime where
the size class could be non-zero and large == true.
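In other words, callers can recover the flag from the span class itself,
e.g.:

    // A size class of 0 is reserved for large objects.
    large := spc.sizeclass() == 0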
Updates #35112.
Change-Id: Ie66facf8f0faca6f4cd3d20a8ac4bc259e11823d
Reviewed-on: https://go-review.googlesource.com/c/go/+/196639
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change defines a maximum supported physical page size and huge page
size in the runtime, based on the new page allocator's implementation, and
uses them where appropriate.
Furthermore, if the system exceeds the maximum supported huge page
size, we simply ignore it silently.
It also fixes a huge-page-related test that is only triggered by a
condition which is definitely wrong.
Finally, it adds a few TODOs related to code clean-up and supporting
larger huge page sizes.
Updates #35112.
Fixes #35431.
Change-Id: Ie4348afb6bf047cce2c1433576d1514720d8230f
Reviewed-on: https://go-review.googlesource.com/c/go/+/205937
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This change removes the old page allocator from the runtime.
Updates #35112.
Change-Id: Ib20e1c030f869b6318cd6f4288a9befdbae1b771
Reviewed-on: https://go-review.googlesource.com/c/go/+/195700
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change flips the oldPageAllocator constant, enabling the new page
allocator in the Go runtime.
Updates #35112.
Change-Id: I7fc8332af9fd0e43ce28dd5ebc1c1ce519ce6d0c
Reviewed-on: https://go-review.googlesource.com/c/go/+/201765
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change integrates all the bits and pieces of the new page allocator
into the runtime, behind a global constant.
Updates #35112.
Change-Id: I6696bde7bab098a498ab37ed2a2caad2a05d30ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/201764
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change adds a scavenger for the new page allocator along with
tests. The scavenger walks over the heap backwards once per GC, looking
for memory to scavenge. It walks across the heap without any lock held,
searching optimistically. If it finds what appears to be a scavenging
candidate it acquires the heap lock and attempts to verify it. Upon
verification it then scavenges.
Notably, unlike the old scavenger, it doesn't show any preference for
huge pages and instead follows a more strict last-page-first policy.
Updates #35112.
Change-Id: I0621ef73c999a471843eab2d1307ae5679dd18d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/195697
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
On iOS, the address space is not 48 bits as one might believe, since
it's arm64 hardware. In fact, all pointers are truncated to 33 bits, and
the OS only gives applications access to the range [1<<32, 2<<32).
While today this has no effect on the Go runtime, future changes which
care about address space size need this to be correct.
Updates #35112.
Change-Id: Id518a2298080f7e3d31cf7d909506a37748cc49a
Reviewed-on: https://go-review.googlesource.com/c/go/+/205758
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
This change renames the "round" function to the more appropriately named
"alignUp" which rounds an integer up to the next multiple of a power of
two.
This change also adds the alignDown function, which is almost like
alignUp but rounds down to the previous multiple of a power of two.
With these two functions, we also replace manual rounding code with
them where we can.
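The two helpers are small; roughly:

    // alignUp rounds n up to a multiple of a. a must be a power of 2.
    func alignUp(n, a uintptr) uintptr {
        return (n + a - 1) &^ (a - 1)
    }

    // alignDown rounds n down to a multiple of a. a must be a power of 2.
    func alignDown(n, a uintptr) uintptr {
        return n &^ (a - 1)
    }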
Change-Id: Ie1487366280484dcb2662972b01b4f7135f72fec
Reviewed-on: https://go-review.googlesource.com/c/go/+/190618
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
This change adds physHugePageShift which is defined such that
1 << physHugePageShift == physHugePageSize. The purpose of this variable
is to avoid doing expensive divisions in key functions, such as
(*mspan).hugePages.
This change also does a sweep of any place we might do a division or mod
operation with physHugePageSize and turns it into bit shifts and other
bitwise operations.
Finally, this change adds a check to mallocinit which ensures that
physHugePageSize is always a power of two. osinit might choose to ignore
non-powers-of-two for the value and replace it with zero, but mallocinit
will fail if it's not a power of two (or zero). It also derives
physHugePageShift from physHugePageSize.
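A sketch of the mallocinit derivation and of the kind of rewrite applied
elsewhere (exact call sites and messages vary):

    // In mallocinit: physHugePageSize must be 0 or a power of two, and
    // physHugePageShift is derived from it.
    if physHugePageSize&(physHugePageSize-1) != 0 {
        throw("physHugePageSize not a power of 2")
    }
    if physHugePageSize != 0 {
        for 1<<physHugePageShift != physHugePageSize {
            physHugePageShift++
        }
    }

    // Elsewhere, divisions and mods become shifts and masks:
    //   n / physHugePageSize  ->  n >> physHugePageShift
    //   n % physHugePageSize  ->  n & (physHugePageSize - 1)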
This change helps improve the performance of most applications because
of how often (*mspan).hugePages is called.
Updates #32828.
Change-Id: I1a6db113d52d563f59ae8fd4f0e130858859e68f
Reviewed-on: https://go-review.googlesource.com/c/go/+/186598
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Overflow of the comparison caused very large (>=1<<32) allocations to
sometimes not get sampled at all. Use uintptr so the comparison will
never overflow.
Fixes #33342
Tested on the example in #33342. I don't want to check in a test that
needs that much memory, however.
Change-Id: I51fe77a9117affed8094da93c0bc5f445ac2d3d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/188017
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Currently, GODEBUG=sbrk=1 mode aligns allocations by their type's
alignment. You would think this would be the right thing to do, but
because 64-bit fields are only 4-byte aligned right now (see #599),
this can cause a 64-bit field of an allocated object to be 4-byte
aligned, but not 8-byte aligned. If there is an atomic access to that
unaligned 64-bit field, it will crash.
This doesn't happen in normal allocation mode because the
size-segregated allocation and the current size classes will cause any
types larger than 8 bytes to be 8-byte aligned.
We fix this by making sbrk=1 mode use alignment based on the type's
size rather than its declared alignment. This matches how the tiny
allocator aligns allocations.
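A sketch of the size-based alignment choice in the sbrk=1 path of mallocgc,
assuming the allocation is then handed to persistentalloc as before:

    if debug.sbrk != 0 {
        align := uintptr(16)
        if typ != nil {
            // Align by size, like the tiny allocator, rather than by
            // typ.align, which can under-align 64-bit fields (#599).
            if size&7 == 0 {
                align = 8
            } else if size&3 == 0 {
                align = 4
            } else if size&1 == 0 {
                align = 2
            } else {
                align = 1
            }
        }
        return persistentalloc(size, align, &memstats.other_sys)
    }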
This was tested with
GOARCH=386 GODEBUG=sbrk=1 go test sync/atomic
This crashes with an unaligned access before this change, and passes
with this change.
This should be reverted when/if we fix #599.
Fixes #33159.
Change-Id: Ifc52c72c6b99c5d370476685271baa43ad907565
Reviewed-on: https://go-review.googlesource.com/c/go/+/186919
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
On most platforms newly-mapped memory is untouched, meaning the pages
backing the region haven't been faulted in yet. However, we mark this
memory as unscavenged, which means the background scavenger
aggressively "returns" this memory to the OS if the heap is small.
The only platform where newly-mapped memory is actually unscavenged (and
counts toward the application's RSS) is Windows, since
(*mheap).sysAlloc commits the reservation. Instead of making a special
case for Windows, I change the requirements a bit for a sysReserve'd
region. It must now be both sysMap'd and sysUsed'd, with sysMap being a
no-op on Windows. Comments about memory allocation have been updated to
include a more up-to-date mental model of which states a region of memory
may be in (at a very low level) and how to transition between these
states.
Now this means we can correctly mark newly-mapped heap memory as
scavenged on every platform, reducing the load on the background
scavenger early on in the application for small heaps. As a result,
heap-growth scavenging is no longer necessary, since any actual RSS
growth will be accounted for on the allocation codepath.
Finally, this change also cleans up grow a little bit to avoid
pretending that it's freeing an in-use span and just does the necessary
operations directly.
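To summarize, the updated mental model of states and transitions looks
roughly like this (state names as described in the new comments; treat them
as the intended terminology rather than exact identifiers):

    // None     --sysReserve--> Reserved
    // Reserved --sysMap------> Prepared   (sysMap is a no-op on Windows)
    // Prepared --sysUsed-----> Ready
    // Ready    --sysUnused---> Prepared
    // any      --sysFree-----> None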
Fixes #32012.
Fixes #31966.
Updates #26473.
Change-Id: Ie06061eb638162e0560cdeb0b8993d94cfb4d290
Reviewed-on: https://go-review.googlesource.com/c/go/+/177097
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change adds the global physHugePageSize which is initialized in
osinit(). physHugePageSize contains the system's transparent huge page
(or superpage) size in bytes.
For #30333.
Change-Id: I2f0198c40729dbbe6e6f2676cef1d57dd107562c
Reviewed-on: https://go-review.googlesource.com/c/go/+/170858
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
WebAssembly's memory is contiguous. Allocating memory at a high address
also allocates all memory up to that address. This change reduces
the initial memory allocated on wasm from 1GB to 16MB by using multiple
heap arenas and reducing the size of a heap arena.
Fixes #27462.
Change-Id: Ic941e6edcadd411e65a14cb2f9fd6c8eae02fc7a
Reviewed-on: https://go-review.googlesource.com/c/go/+/170950
Run-TryBot: Richard Musiol <neelance@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
We already have the ptrdata field in a type, which encodes exactly
the same information that kindNoPointers does.
My problem with kindNoPointers is that it often leads to
double-negative code like:
t.kind & kindNoPointers != 0
Much clearer is:
t.ptrdata == 0
Updates #27167
Change-Id: I92307d7f018a6bbe3daca4a4abb4225e359349b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/169157
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
To be used by the errors package for checking assignability
and setting error values in As.
Updates #29934.
Change-Id: I8c1d02a2c6efa0919d54b286cfe8b4edc26da059
Reviewed-on: https://go-review.googlesource.com/c/161759
Run-TryBot: Marcel van Lohuizen <mpvl@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Remove an unnecessary check in the heap sampling code that forced sampling
of all heap allocations larger than the sampling rate. These need to follow
a Poisson process so that they can be correctly unsampled. Maintain a check
for MemProfileRate==1 to provide a mechanism for full sampling, as
documented in https://golang.org/pkg/runtime/#pkg-variables.
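A sketch of the resulting check on the allocation path (field names
approximate):

    if rate := MemProfileRate; rate > 0 {
        // Only rate == 1 forces every allocation to be sampled; all other
        // sizes, however large, go through the Poisson sampling point.
        if rate != 1 && size < c.nextSample {
            c.nextSample -= size
        } else {
            profilealloc(mp, x, size)
        }
    }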
Additional testing for this change is on cl/129117.
Fixes #26618
Change-Id: I7802bde2afc655cf42cffac34af9bafeb3361957
GitHub-Last-Rev: 471f747af8
GitHub-Pull-Request: golang/go#29791
Reviewed-on: https://go-review.googlesource.com/c/158337
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
This commit allows the runtime to handle 64-bit addresses returned by
the mmap syscall on AIX.
The mmap syscall returns 59-bit addresses on AIX, but the arena
implementation only allows addresses with fewer than 48 bits.
This commit increases the arena size up to 1<<60 for aix/ppc64.
Updates #25893
Change-Id: Iea72e8a944d10d4f00be915785e33ae82dd6329e
Reviewed-on: https://go-review.googlesource.com/c/138736
Reviewed-by: Austin Clements <austin@google.com>
Currently, there's no efficient way to iterate over the Go heap. We're
going to need this for fast free page sweeping, so this CL adds a
slice of all allocated heap arenas. This will also be useful for
generational GC.
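A sketch of the new field (assuming the allArenas name), alongside the
existing sparse arena map in mheap:

    // allArenas is the arenaIndex of every mapped arena. It can be used
    // to iterate over all mapped arenas without walking the sparse
    // arenas map.
    allArenas []arenaIdx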
For #18155.
Change-Id: I58d126cfb9c3f61b3125d80b74ccb1b2169efbcc
Reviewed-on: https://go-review.googlesource.com/c/138076
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This improves performance for e.g. maps with a bucket size
(key+value*8 bytes) larger than 32 bytes and removes loading
a value from the maxElems array for smaller bucket sizes.
name old time/op new time/op delta
MakeMap/[Byte]Byte 95.5ns ± 1% 94.7ns ± 1% -0.78% (p=0.013 n=9+9)
MakeMap/[Int]Int 128ns ± 0% 121ns ± 2% -5.63% (p=0.000 n=6+10)
Updates #21588
Change-Id: I7d9eb7d49150c399c15dcab675e24bc97ff97852
Reviewed-on: https://go-review.googlesource.com/c/143997
Reviewed-by: Keith Randall <khr@golang.org>
Currently the table of arena sizes mixes the number of entries in the
L1 with the size of the L2. While the size of the L2 is important,
this makes it hard to see what's actually going on because there's an
implicit factor of sys.PtrSize.
This changes the L2 column to give both the number of entries and the
resulting size. This should hopefully make the relations between
the columns of the table clearer, since they can now be plugged
directly into the given formula.
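For example, with the */64-bit row of the table, the columns now multiply
out directly (an illustrative check, not a new row):

    L1 entries * L2 entries * arena size = 2^(address bits)
    1 * 4M * 64MB = 2^22 * 2^26 = 2^48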
Change-Id: Ie677adaef763b893a2f620bd4fc3b8db314b3a1e
Reviewed-on: https://go-review.googlesource.com/c/139697
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This argument is always gcBackgroundMode since only
debug.gcstoptheworld can trigger a STW GC at this point. Remove the
unnecessary argument.
Updates #26903. This is preparation for unifying STW GC and concurrent
GC.
Change-Id: Icb4ba8f10f80c2b69cf51a21e04fa2c761b71c94
Reviewed-on: https://go-review.googlesource.com/c/134775
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
mcache.refill doesn't need to run on the system stack; it just needs
to be non-preemptible. Its only caller, mcache.nextFree, also needs to
be non-preemptible, so we can remove the unnecessary systemstack
switch.
Change-Id: Iba5b3f4444855f1dc134485ba588efff3b54c426
Reviewed-on: https://go-review.googlesource.com/138196
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
We already aliased mSpanInUse to _MSpanInUse. The dual constants are
getting annoying, so fix all of these to use the mSpan* naming
convention.
This was done automatically with:
sed -i -re 's/_?MSpan(Dead|InUse|Manual|Free)/mSpan\1/g' *.go
plus deleting the existing definition of mSpanInUse.
Change-Id: I09979d9d491d06c10689cea625dc57faa9cc6767
Reviewed-on: https://go-review.googlesource.com/137875
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
TSAN for Go only supports heap addresses in the range [0x00c000000000,
0x00e000000000). However, we currently create heap hints of the form
0xXXc000000000 for XX between 0x00 and 0x7f. Even for XX=0x01, this
hint is outside TSAN's supported heap address range.
Fix this by creating a slightly different set of hints in race mode,
all of which fall inside TSAN's heap address range.
This should fix TestArenaCollision flakes. That test forces the
runtime to use later heap hints. Currently, this always results in
TSAN "failed to allocate" failures on Windows (which happens to have a
slightly more constrained TSAN layout than non-Windows). Most of the
time we don't notice these failures, but sometimes it crashes TSAN,
leading to a test failure.
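A sketch of the race-mode hint selection in mallocinit (masking details
approximate):

    if raceenabled {
        // The TSAN runtime requires the heap to be in the range
        // [0x00c000000000, 0x00e000000000).
        p = uintptr(i)<<32 | uintptrMask&(0x00c0<<32)
        if p >= uintptrMask&0x00e000000000 {
            continue
        }
    }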
Fixes #25698.
Change-Id: I8926cd61f0ee5ee00efa77b283f7b809c555be46
Reviewed-on: https://go-review.googlesource.com/123780
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, the runtime falls back to asking for any address the OS can
offer for the heap when it runs out of hint addresses. However, the
race detector assumes the heap lives in [0x00c000000000,
0x00e000000000), and will fail in a non-obvious way if we go outside
this region.
Fix this by actively throwing a useful error if we run out of heap
hints in race mode.
This problem is currently being triggered by TestArenaCollision, which
intentionally triggers this fallback behavior. Fix the test to look
for the new panic message in race mode.
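Roughly, the new failure path is a plain throw when the hints run out in
race mode (message text approximate):

    if raceenabled {
        // The race detector assumes the heap lives in
        // [0x00c000000000, 0x00e000000000), but we just ran out of
        // hints in this region. Give a clear failure instead of an
        // obscure one later.
        throw("too many address space collisions for -race mode")
    }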
Fixes #24670.
Updates #24133.
Change-Id: I57de6d17a3495dc1f1f84afc382cd18a6efc2bf7
Reviewed-on: https://go-review.googlesource.com/104717
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently, we use 64MB heap arenas on 64-bit platforms. This works
well on UNIX-like OSes because they treat untouched pages as
essentially free. However, on Windows, committed memory is charged
against a process whether or not it has demand-faulted physical pages
in. Hence, on Windows, even a process with a tiny heap will commit
64MB for one heap arena, plus another 32MB for the arena map. Things
are much worse under the race detector, which increases the heap
commitment by a factor of 5.5X, leading to 384MB of committed memory
at runtime init.
Fix this by reducing the heap arena size to 4MB on Windows.
To counterbalance the effect of increasing the arena map size by a
factor of 16, and to further reduce the impact of the commitment for
the arena map, we switch from a single entry L1 arena map to a 64
entry L1 arena map.
Compared to the original arena design, this slows down the
x/benchmarks garbage benchmark by 0.49% (the slowdown of this commit
alone is 1.59%, but the previous commit bought us a 1% speed-up):
name old time/op new time/op delta
Garbage/benchmem-MB=64-12 2.28ms ± 1% 2.29ms ± 1% +0.49% (p=0.000 n=17+18)
(https://perf.golang.org/search?q=upload:20180223.1)
(This was measured on linux/amd64 by modifying its arena configuration
as above.)
Fixes #23900.
Change-Id: I6b7fa5ecebee2947bf20cfeb78c248809469c6b1
Reviewed-on: https://go-review.googlesource.com/96780
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, the heap arena map is a single, large array that covers
every possible arena frame in the entire address space. This is
practical up to about 48 bits of address space with 64 MB arenas.
However, there are two problems with this:
1. mips64, ppc64, and s390x support full 64-bit address spaces (though
on Linux only s390x has kernel support for 64-bit address spaces).
On these platforms, it would be good to support these larger
address spaces.
2. On Windows, processes are charged for untouched memory, so for
processes with small heaps, the mostly-untouched 32 MB arena map
plus a 64 MB arena are significant overhead. Hence, it would be
good to reduce both the arena map size and the arena size, but with
a single-level arena, these are inversely proportional.
This CL adds support for a two-level arena map. Arena frame numbers
are now divided into arenaL1Bits of L1 index and arenaL2Bits of L2
index.
At the moment, arenaL1Bits is always 0, so we effectively have a
single level map. We do a few things so that this has no cost beyond
the current single-level map:
1. We embed the L2 array directly in mheap, so if there's a single
entry in the L2 array, the representation is identical to the
current representation and there's no extra level of indirection.
2. Hot code that accesses the arena map is structured so that it
optimizes to nearly the same machine code as it does currently.
3. We make some small tweaks to hot code paths and to the inliner
itself to keep some important functions inlined despite their
now-larger ASTs. In particular, this is necessary for
heapBitsForAddr and heapBits.next.
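A sketch of how an arena index might split into the two levels (per point 1,
the single-entry case compiles down to the old direct lookup):

    type arenaIdx uint

    func (i arenaIdx) l1() uint {
        if arenaL1Bits == 0 {
            // With no L1 map, the compiler optimizes this away.
            return 0
        }
        return uint(i) >> arenaL1Shift
    }

    func (i arenaIdx) l2() uint {
        if arenaL1Bits == 0 {
            return uint(i)
        }
        return uint(i) & (1<<arenaL2Bits - 1)
    }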
Possibly as a result of some of the tweaks, this actually slightly
improves the performance of the x/benchmarks garbage benchmark:
name old time/op new time/op delta
Garbage/benchmem-MB=64-12 2.28ms ± 1% 2.26ms ± 1% -1.07% (p=0.000 n=17+19)
(https://perf.golang.org/search?q=upload:20180223.2)
For #23900.
Change-Id: If5164e0961754f97eb9eca58f837f36d759505ff
Reviewed-on: https://go-review.googlesource.com/96779
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
There are too many places where I want to talk about "indexing into
the arena index". Make this less awkward and ambiguous by calling it
the "arena map" instead.
Change-Id: I726b0667bb2139dbc006175a0ec09a871cdf73f9
Reviewed-on: https://go-review.googlesource.com/96777
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Now that we support the full non-contiguous virtual address space of
amd64 hardware, some of the comments and constants related to this are
out of date.
This renames memLimitBits to heapAddrBits because 1<<memLimitBits is
no longer the limit of the address space and rewrites the comment to
focus first on hardware limits (which span OSes) and then discuss
kernel limits.
Second, this eliminates the memLimit constant because there's no
longer a meaningful "highest possible heap pointer value" on amd64.
Updates #23862.
Change-Id: I44b32033d2deb6b69248fb8dda14fc0e65c47f11
Reviewed-on: https://go-review.googlesource.com/95498
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>