A few of these are copied from the memory model doc.
Many are entirely new, following discussion on #47141.
See https://research.swtch.com/gomm for background.
The rule we are establishing is that each type that is meant
to help synchronize a Go program should document its
happens-before guarantees.
For #50859.
Change-Id: I947c40639b263abe67499fa74f68711a97873a39
Reviewed-on: https://go-review.googlesource.com/c/go/+/381316
Auto-Submit: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
When the procUnpin is placed after shared.pushHead, there is
no need for x as a flag to indicate the previous process.
This CL can make the logic clear, and at the same time reduce
a redundant judgment.
Change-Id: I34ec9ba4cb5b5dbdf13a8f158b90481fed248cf5
Reviewed-on: https://go-review.googlesource.com/c/go/+/360059
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
And then revert the bootstrap cmd directories and certain testdata.
And adjust tests as needed.
Not reverting the changes in std that are bootstrapped,
because some of those changes would appear in API docs,
and we want to use any consistently.
Instead, rewrite 'any' to 'interface{}' in cmd/dist for those directories
when preparing the bootstrap copy.
A few files changed as a result of running gofmt -w
not because of interface{} -> any but because they
hadn't been updated for the new //go:build lines.
Fixes#49884.
Change-Id: Ie8045cba995f65bd79c694ec77a1b3d1fe01bb09
Reviewed-on: https://go-review.googlesource.com/c/go/+/368254
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
This CL provide abilty to randomly select P to steal object from its
shared queue. In order to provide such ability randomOrder structure
was copied from runtime/proc.go.
It should reduce contention in firsts Ps and improve balance of object
stealing across all Ps. Also, the patch provides new benchmark
PoolStarvation which force Ps to steal objects.
Benchmarks:
name old time/op new time/op delta
Pool-8 2.16ns ±14% 2.14ns ±16% ~ (p=0.425 n=10+10)
PoolOverflow-8 489ns ± 0% 489ns ± 0% ~ (p=0.719 n=9+10)
PoolStarvation-8 7.00µs ± 4% 6.59µs ± 2% -5.86% (p=0.000 n=10+10)
PoolSTW-8 15.1µs ± 1% 15.2µs ± 1% +0.99% (p=0.001 n=10+10)
PoolExpensiveNew-8 1.25ms ±10% 1.31ms ± 9% ~ (p=0.143 n=10+10)
[Geo mean] 2.68µs 2.68µs -0.28%
name old p50-ns/STW new p50-ns/STW delta
PoolSTW-8 15.0k ± 1% 15.1k ± 1% +0.92% (p=0.000 n=10+10)
name old p95-ns/STW new p95-ns/STW delta
PoolSTW-8 16.2k ± 3% 16.4k ± 2% ~ (p=0.143 n=10+10)
name old GCs/op new GCs/op delta
PoolExpensiveNew-8 0.29 ± 2% 0.30 ± 1% +2.84% (p=0.000 n=8+10)
name old New/op new New/op delta
PoolExpensiveNew-8 8.07 ±11% 8.49 ±10% ~ (p=0.123 n=10+10)
Change-Id: I3ca1d0bf1f358b1148c58e64740fb2d5bfc0bc02
Reviewed-on: https://go-review.googlesource.com/c/go/+/303949
Reviewed-by: David Chase <drchase@google.com>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
CL 253748 introduced a special case in cmd/go to allow sync to import
runtime/internal/atomic. Besides introducing unnecessary complexity
into cmd/go, this breaks other packages (like gopls) that understand
how imports work, but don't understand this special case.
Fix this by using the more standard linkname-based approach to pull
the necessary functions from runtime/internal/atomic into sync. Since
these are compiler intrinsics, we also have to tell the compiler that
the linknamed symbols are intrinsics to get this optimization in sync.
Fixes#42196.
Change-Id: I1f91498c255c91583950886a89c3c9adc39a32f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/265124
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Paul Murphy <murp@ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Currently, every Pool is cleared completely at the start of each GC.
This is a problem for heavy users of Pool because it causes an
allocation spike immediately after Pools are clear, which impacts both
throughput and latency.
This CL fixes this by introducing a victim cache mechanism. Instead of
clearing Pools, the victim cache is dropped and the primary cache is
moved to the victim cache. As a result, in steady-state, there are
(roughly) no new allocations, but if Pool usage drops, objects will
still be collected within two GCs (as opposed to one).
This victim cache approach also improves Pool's impact on GC dynamics.
The current approach causes all objects in Pools to be short lived.
However, if an application is in steady state and is just going to
repopulate its Pools, then these objects impact the live heap size *as
if* they were long lived. Since Pooled objects count as short lived
when computing the GC trigger and goal, but act as long lived objects
in the live heap, this causes GC to trigger too frequently. If Pooled
objects are a non-trivial portion of an application's heap, this
increases the CPU overhead of GC. The victim cache lets Pooled objects
affect the GC trigger and goal as long-lived objects.
This has no impact on Get/Put performance, but substantially reduces
the impact to the Pool user when a GC happens. PoolExpensiveNew
demonstrates this in the substantially reduction in the rate at which
the "New" function is called.
name old time/op new time/op delta
Pool-12 2.21ns ±36% 2.00ns ± 0% ~ (p=0.070 n=19+16)
PoolOverflow-12 587ns ± 1% 583ns ± 1% -0.77% (p=0.000 n=18+18)
PoolSTW-12 5.57µs ± 3% 4.52µs ± 4% -18.82% (p=0.000 n=20+19)
PoolExpensiveNew-12 3.69ms ± 7% 1.25ms ± 5% -66.25% (p=0.000 n=20+19)
name old p50-ns/STW new p50-ns/STW delta
PoolSTW-12 5.48k ± 2% 4.53k ± 2% -17.32% (p=0.000 n=20+20)
name old p95-ns/STW new p95-ns/STW delta
PoolSTW-12 6.69k ± 4% 5.13k ± 3% -23.31% (p=0.000 n=19+18)
name old GCs/op new GCs/op delta
PoolExpensiveNew-12 0.39 ± 1% 0.32 ± 2% -17.95% (p=0.000 n=18+20)
name old New/op new New/op delta
PoolExpensiveNew-12 40.0 ± 6% 12.4 ± 6% -68.91% (p=0.000 n=20+19)
(https://perf.golang.org/search?q=upload:20190311.2)
Fixes#22950.
Change-Id: If2e183d948c650417283076aacc20739682cdd70
Reviewed-on: https://go-review.googlesource.com/c/go/+/166961
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Currently, Pool stores each per-P shard's overflow in a slice
protected by a Mutex. In order to store to the overflow or steal from
another shard, a P must lock that shard's Mutex. This allows for
simple synchronization between Put and Get, but has unfortunate
consequences for clearing pools.
Pools are cleared during STW sweep termination, and hence rely on
pinning a goroutine to its P to synchronize between Get/Put and
clearing. This makes the Get/Put fast path extremely fast because it
can rely on quiescence-style coordination, which doesn't even require
atomic writes, much less locking.
The catch is that a goroutine cannot acquire a Mutex while pinned to
its P (as this could deadlock). Hence, it must drop the pin on the
slow path. But this means the slow path is not synchronized with
clearing. As a result,
1) It's difficult to reason about races between clearing and the slow
path. Furthermore, this reasoning often depends on unspecified nuances
of where preemption points can occur.
2) Clearing must zero out the pointer to every object in every Pool to
prevent a concurrent slow path from causing all objects to be
retained. Since this happens during STW, this has an O(# objects in
Pools) effect on STW time.
3) We can't implement a victim cache without making clearing even
slower.
This CL solves these problems by replacing the locked overflow slice
with a lock-free structure. This allows Gets and Puts to be pinned the
whole time they're manipulating the shards slice (Pool.local), which
eliminates the races between Get/Put and clearing. This, in turn,
eliminates the need to zero all object pointers, reducing clearing to
O(# of Pools) during STW.
In addition to significantly reducing STW impact, this also happens to
speed up the Get/Put fast-path and the slow path. It somewhat
increases the cost of PoolExpensiveNew, but we'll fix that in the next
CL.
name old time/op new time/op delta
Pool-12 3.00ns ± 0% 2.21ns ±36% -26.32% (p=0.000 n=18+19)
PoolOverflow-12 600ns ± 1% 587ns ± 1% -2.21% (p=0.000 n=16+18)
PoolSTW-12 71.0µs ± 2% 5.6µs ± 3% -92.15% (p=0.000 n=20+20)
PoolExpensiveNew-12 3.14ms ± 5% 3.69ms ± 7% +17.67% (p=0.000 n=19+20)
name old p50-ns/STW new p50-ns/STW delta
PoolSTW-12 70.7k ± 1% 5.5k ± 2% -92.25% (p=0.000 n=20+20)
name old p95-ns/STW new p95-ns/STW delta
PoolSTW-12 73.1k ± 2% 6.7k ± 4% -90.86% (p=0.000 n=18+19)
name old GCs/op new GCs/op delta
PoolExpensiveNew-12 0.38 ± 1% 0.39 ± 1% +2.07% (p=0.000 n=20+18)
name old New/op new New/op delta
PoolExpensiveNew-12 33.9 ± 6% 40.0 ± 6% +17.97% (p=0.000 n=19+20)
(https://perf.golang.org/search?q=upload:20190311.1)
Fixes#22331.
For #22950.
Change-Id: Ic5cd826e25e218f3f8256dbc4d22835c1fecb391
Reviewed-on: https://go-review.googlesource.com/c/go/+/166960
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Make poolLocal size multiple of 128, so it aligns to CPU cache line
on the most common architectures.
This also has the following benefits:
- It may help compiler substituting integer multiplication
by bit shift inside indexLocal.
- It shrinks poolLocal size from 176 bytes to 128 bytes on amd64,
so now it fits two cache lines (or a single cache line on certain
Intel CPUs - see https://software.intel.com/en-us/articles/optimizing-application-performance-on-intel-coret-microarchitecture-using-hardware-implemented-prefetchers).
No measurable performance changes on linux/amd64 and linux/386.
Change-Id: I11df0f064718a662e77a85d88b8a15a8919f25e9
Reviewed-on: https://go-review.googlesource.com/40918
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
cmd and runtime were handled separately, and I'm intentionally skipped
syscall. This is the rest of the standard library.
CL generated mechanically with github.com/mdempsky/unconvert.
Change-Id: I9e0eff886974dedc37adb93f602064b83e469122
Reviewed-on: https://go-review.googlesource.com/22104
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
You can not use cannot, but you cannot spell cannot can not.
Change-Id: I2f0971481a460804de96fd8c9e46a9cc62a3fc5b
Reviewed-on: https://go-review.googlesource.com/19772
Reviewed-by: Rob Pike <r@golang.org>
Factor out duplicated race thunks from sync, syscall net
and fmt packages into a separate package and use it.
Fixes#8593
Change-Id: I156869c50946277809f6b509463752e7f7d28cdb
Reviewed-on: https://go-review.googlesource.com/14870
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Pool memory was only being released during the first GC after the first Put.
Put assumes that p.local != nil means p is on the allPools list.
poolCleanup (called during each GC) removed each pool from allPools
but did not clear p.local, so each pool was cleared by exactly one GC
and then never cleared again.
This bug was introduced late in the Go 1.3 release cycle.
Fixes#8979.
LGTM=rsc
R=golang-codereviews, bradfitz, r, rsc
CC=golang-codereviews, khr
https://golang.org/cl/162980043