This CL provides the ability to randomly select a P to steal objects
from its shared queue. To provide this ability, the randomOrder
structure was copied from runtime/proc.go.
It should reduce contention on the first Ps and improve the balance of
object stealing across all Ps. The patch also adds a new benchmark,
PoolStarvation, which forces Ps to steal objects.
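
For reference, a condensed sketch of the copied structure (this mirrors
randomOrder/randomEnum in runtime/proc.go; the coprime-stride trick
guarantees every P is visited exactly once per enumeration):

    // randomOrder enumerates 0..count-1 in a pseudo-random order:
    // stepping by an increment that is coprime to count visits every
    // index exactly once before repeating.
    type randomOrder struct {
        count    uint32
        coprimes []uint32
    }

    type randomEnum struct {
        i, count, pos, inc uint32
    }

    func (ord *randomOrder) reset(count uint32) {
        ord.count = count
        ord.coprimes = ord.coprimes[:0]
        for i := uint32(1); i <= count; i++ {
            if gcd(i, count) == 1 {
                ord.coprimes = append(ord.coprimes, i)
            }
        }
    }

    func (ord *randomOrder) start(i uint32) randomEnum {
        return randomEnum{
            count: ord.count,
            pos:   i % ord.count,
            inc:   ord.coprimes[i%uint32(len(ord.coprimes))],
        }
    }

    func (enum *randomEnum) done() bool       { return enum.i == enum.count }
    func (enum *randomEnum) position() uint32 { return enum.pos }

    func (enum *randomEnum) next() {
        enum.i++
        enum.pos = (enum.pos + enum.inc) % enum.count
    }

    func gcd(a, b uint32) uint32 {
        for b != 0 {
            a, b = b, a%b
        }
        return a
    }

A steal loop starts the enumeration from a random seed and visits each
P's shared queue in the resulting order, instead of always scanning
from the first P.
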
Benchmarks:
name                 old time/op     new time/op     delta
Pool-8               2.16ns ±14%     2.14ns ±16%       ~     (p=0.425 n=10+10)
PoolOverflow-8        489ns ± 0%      489ns ± 0%       ~     (p=0.719 n=9+10)
PoolStarvation-8     7.00µs ± 4%     6.59µs ± 2%     -5.86%  (p=0.000 n=10+10)
PoolSTW-8            15.1µs ± 1%     15.2µs ± 1%     +0.99%  (p=0.001 n=10+10)
PoolExpensiveNew-8   1.25ms ±10%     1.31ms ± 9%       ~     (p=0.143 n=10+10)
[Geo mean]           2.68µs          2.68µs          -0.28%

name                 old p50-ns/STW  new p50-ns/STW  delta
PoolSTW-8            15.0k ± 1%      15.1k ± 1%      +0.92%  (p=0.000 n=10+10)

name                 old p95-ns/STW  new p95-ns/STW  delta
PoolSTW-8            16.2k ± 3%      16.4k ± 2%        ~     (p=0.143 n=10+10)

name                 old GCs/op      new GCs/op      delta
PoolExpensiveNew-8   0.29 ± 2%       0.30 ± 1%       +2.84%  (p=0.000 n=8+10)

name                 old New/op      new New/op      delta
PoolExpensiveNew-8   8.07 ±11%       8.49 ±10%         ~     (p=0.123 n=10+10)
Change-Id: I3ca1d0bf1f358b1148c58e64740fb2d5bfc0bc02
Reviewed-on: https://go-review.googlesource.com/c/go/+/303949
Reviewed-by: David Chase <drchase@google.com>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>

Make all our package sources use Go 1.17 gofmt format
(adding //go:build lines).
Part of //go:build change (#41184).
See https://golang.org/design/draft-gobuild
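
For example, a file constrained to linux/amd64 now carries both forms,
with the new //go:build line directly above the old // +build line
(illustrative file, not one from this CL):

    //go:build linux && amd64
    // +build linux,amd64

    package demo

The // +build line is kept so that pre-1.17 toolchains continue to
apply the constraint.
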
Change-Id: Ia0534360e4957e58cd9a18429c39d0e32a6addb4
Reviewed-on: https://go-review.googlesource.com/c/go/+/294430
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

In TestPoolDequeue it's surprisingly common for the queue to stay
nearly empty the whole time and for a racing PopTail to happen in the
window between the producer doing a PushHead and doing a PopHead. In
short mode, there are only 100 PopTail attempts. On linux/amd64, it's
not uncommon for this to fail 50% of the time. On linux/arm64, it's
not uncommon for this to fail 100% of the time, causing the test to
fail.
This CL fixes this by only checking for a successful PopTail in long
mode. Long mode makes 200,000 PopTail attempts, and has never been
observed to fail.
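
Roughly, the guard now has this shape (identifier names here are
illustrative, not the test's actual ones):

    import (
        "sync/atomic"
        "testing"
    )

    var popTailWins int64 // bumped by consumers on each successful PopTail

    // checkPopTail sketches the relaxed assertion: only long mode, with
    // its 200,000 PopTail attempts, must observe at least one success.
    func checkPopTail(t *testing.T) {
        if !testing.Short() && atomic.LoadInt64(&popTailWins) == 0 {
            t.Errorf("popTail never succeeded")
        }
    }
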
Fixes #31422.
Change-Id: If464d55eb94fcb0b8d78fbc441d35be9f056a290
Reviewed-on: https://go-review.googlesource.com/c/go/+/183981
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>

TestPoolDequeue creates P-1 consumer goroutines and 1 producer
goroutine. Currently, if a consumer goroutine pops the last value from
the dequeue, it sets a flag that stops all consumers, but the producer
also periodically pops from the dequeue and doesn't set this flag.
Hence, if the producer were to pop the last element, the consumers
will continue to run and the test won't terminate. This CL fixes this
by also setting the termination flag in the producer.
I believe it's impossible for this to happen right now because the
producer only pops after pushing an element for which j%10==0 and the
last element is either 999 or 1999999, which means it should never try
to pop after pushing the last element. However, we shouldn't depend on
this reasoning.
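
A sketch of the fix's shape (names are illustrative; PoolDequeue stands
in for the test's exported dequeue wrapper):

    import "sync/atomic"

    // PoolDequeue matches the shape of the dequeue under test.
    type PoolDequeue interface {
        PushHead(val interface{}) bool
        PopHead() (interface{}, bool)
        PopTail() (interface{}, bool)
    }

    // producerPop: if the producer's periodic PopHead happens to remove
    // the final value, it now raises the shared stop flag exactly as the
    // consumers do. Without this, the consumers could spin forever.
    func producerPop(d PoolDequeue, lastValue int, done *int32) {
        if val, ok := d.PopHead(); ok && val.(int) == lastValue {
            atomic.StoreInt32(done, 1)
        }
    }
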
Change-Id: Icd2bc8d7cb9a79ebfcec99e367c8a2ba76e027d8
Reviewed-on: https://go-review.googlesource.com/c/go/+/183980
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>

TestPoolDequeue in long mode does a little more than 1<<21 pushes.
This was originally because the head and tail indexes were 21 bits and
the intent was to test wrap-around behavior. However, in the final
version they were both 32 bits, so the test no longer tested
wrap-around.
It would take too long to reach 32-bit wrap around in a test, so
instead we initialize the poolDequeue with indexes that are already
nearly at their limit. This keeps the knowledge of the maximum index
in one place, and lets us test wrap-around even in short mode.
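
A sketch of the idea (field and constant names follow the poolDequeue
sketch later in this log; the real test constructor may differ in
detail):

    const dequeueBits = 32

    // pack combines the two 32-bit indexes into the single word that
    // poolDequeue updates atomically.
    func (d *poolDequeue) pack(head, tail uint32) uint64 {
        const mask = 1<<dequeueBits - 1
        return uint64(head)<<dequeueBits | uint64(tail&mask)
    }

    // newWrappingDequeue starts head and tail a few hundred operations
    // below the 32-bit limit, so even short mode crosses wrap-around
    // instead of needing 2^32 pushes.
    func newWrappingDequeue(n int) *poolDequeue {
        d := &poolDequeue{vals: make([]eface, n)}
        d.headTail = d.pack(1<<dequeueBits-500, 1<<dequeueBits-500)
        return d
    }
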
Change-Id: Ib9b8d85b6d9b5be235849c2b32e81c809e806579
Reviewed-on: https://go-review.googlesource.com/c/go/+/183979
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>

Shorten some of the longest tests that run during all.bash.
Removes 7r 50u 21s (real/user/sys seconds) from all.bash.
After this change, all.bash is under 5 minutes again on my laptop.
For #26473.
Change-Id: Ie0460aa935808d65460408feaed210fbaa1d5d79
Reviewed-on: https://go-review.googlesource.com/c/go/+/177559
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

Currently, every Pool is cleared completely at the start of each GC.
This is a problem for heavy users of Pool because it causes an
allocation spike immediately after Pools are cleared, which impacts
both throughput and latency.
This CL fixes this by introducing a victim cache mechanism. Instead of
clearing Pools, the victim cache is dropped and the primary cache is
moved to the victim cache. As a result, in steady-state, there are
(roughly) no new allocations, but if Pool usage drops, objects will
still be collected within two GCs (as opposed to one).
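
In code, the cleanup becomes a two-step rotation instead of a wholesale
clear (a sketch that mirrors the shape of poolCleanup after this CL):

    func poolCleanup() {
        // Called with the world stopped, at the beginning of a GC.

        // Drop victim caches from all pools: anything that sat unused
        // for a full GC cycle is released now.
        for _, p := range oldPools {
            p.victim = nil
            p.victimSize = 0
        }

        // Move primary cache to victim cache: current contents get one
        // more GC cycle to prove they are still needed.
        for _, p := range allPools {
            p.victim = p.local
            p.victimSize = p.localSize
            p.local = nil
            p.localSize = 0
        }

        oldPools, allPools = allPools, nil
    }

Get consults the victim cache only after failing to find or steal an
object from the primary caches, so the hot path is unchanged.
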
This victim cache approach also improves Pool's impact on GC dynamics.
The current approach causes all objects in Pools to be short lived.
However, if an application is in steady state and is just going to
repopulate its Pools, then these objects impact the live heap size *as
if* they were long lived. Since Pooled objects count as short lived
when computing the GC trigger and goal, but act as long lived objects
in the live heap, this causes GC to trigger too frequently. If Pooled
objects are a non-trivial portion of an application's heap, this
increases the CPU overhead of GC. The victim cache lets Pooled objects
affect the GC trigger and goal as long-lived objects.
This has no impact on Get/Put performance, but substantially reduces
the impact on the Pool user when a GC happens. PoolExpensiveNew
demonstrates this in the substantial reduction in the rate at which
the "New" function is called.
name                 old time/op     new time/op     delta
Pool-12              2.21ns ±36%     2.00ns ± 0%        ~     (p=0.070 n=19+16)
PoolOverflow-12       587ns ± 1%      583ns ± 1%      -0.77%  (p=0.000 n=18+18)
PoolSTW-12           5.57µs ± 3%     4.52µs ± 4%     -18.82%  (p=0.000 n=20+19)
PoolExpensiveNew-12  3.69ms ± 7%     1.25ms ± 5%     -66.25%  (p=0.000 n=20+19)

name                 old p50-ns/STW  new p50-ns/STW  delta
PoolSTW-12           5.48k ± 2%      4.53k ± 2%      -17.32%  (p=0.000 n=20+20)

name                 old p95-ns/STW  new p95-ns/STW  delta
PoolSTW-12           6.69k ± 4%      5.13k ± 3%      -23.31%  (p=0.000 n=19+18)

name                 old GCs/op      new GCs/op      delta
PoolExpensiveNew-12  0.39 ± 1%       0.32 ± 2%       -17.95%  (p=0.000 n=18+20)

name                 old New/op      new New/op      delta
PoolExpensiveNew-12  40.0 ± 6%       12.4 ± 6%       -68.91%  (p=0.000 n=20+19)
(https://perf.golang.org/search?q=upload:20190311.2)
Fixes #22950.
Change-Id: If2e183d948c650417283076aacc20739682cdd70
Reviewed-on: https://go-review.googlesource.com/c/go/+/166961
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>

This adds two benchmarks that will highlight two problems in Pool that
we're about to address.
The first benchmark measures the impact of large Pools on GC STW time.
Currently, STW time is O(# of items in Pools), and this benchmark
demonstrates 70µs STW times.
The second benchmark measures the impact of fully clearing all Pools
on each GC. Typically this is a problem in heavily-loaded systems
because it causes a spike in allocation. This benchmark stresses this
by simulating an expensive "New" function, so the cost of creating new
objects is reflected in the ns/op of the benchmark.
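
The expensive-New pattern looks roughly like this (an illustrative
reconstruction, not the benchmark's exact source):

    import (
        "sync"
        "testing"
        "time"
    )

    // New is made artificially slow, so every pool miss (for example,
    // right after a GC clears the pool) is visible in ns/op.
    func BenchmarkPoolExpensiveNewSketch(b *testing.B) {
        p := sync.Pool{
            New: func() interface{} {
                time.Sleep(time.Millisecond) // stand-in for a costly constructor
                return new([64 << 10]byte)
            },
        }
        b.RunParallel(func(pb *testing.PB) {
            for pb.Next() {
                p.Put(p.Get())
            }
        })
    }
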
For #22950, #22331.
Change-Id: I0c8853190d23144026fa11837b6bf42adc461722
Reviewed-on: https://go-review.googlesource.com/c/go/+/166959
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>

This adds a dynamically sized, lock-free, single-producer,
multi-consumer queue that will be used in the new Pool stealing
implementation. It's built on top of the fixed-size queue added in the
previous CL.
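
The structure, in sketch form (matching the description; details follow
the eventual implementation):

    type poolChainElt struct {
        poolDequeue // one fixed-size ring from the previous CL

        // next and prev link to the newer and older dequeues in the
        // chain; they are accessed atomically.
        next, prev *poolChainElt
    }

    type poolChain struct {
        // head is the dequeue to push to. Only the single producer
        // touches it, so no synchronization is needed.
        head *poolChainElt

        // tail is the dequeue to pop from. Consumers race on it, so
        // reads and writes are atomic.
        tail *poolChainElt
    }

When the head dequeue fills up, the producer allocates a new one of
twice the size and links it in; that is what makes the chain
dynamically sized while each element stays fixed-size and lock-free.
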
For #22950, #22331.
Change-Id: Ifc0ca3895bec7e7f9289ba9fb7dd0332bf96ba5a
Reviewed-on: https://go-review.googlesource.com/c/go/+/166958
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>

This is the first step toward fixing multiple issues with sync.Pool.
This adds a fixed size, lock-free, single-producer, multi-consumer
queue that will be used in the new Pool stealing implementation.
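
In sketch form (following the eventual implementation's layout):

    import "unsafe"

    // poolDequeue is a fixed-size, lock-free, single-producer,
    // multi-consumer ring buffer.
    type poolDequeue struct {
        // headTail packs head (next slot to fill; owned by the single
        // producer) and tail (oldest element; consumers CAS it
        // forward) into one atomically updated 64-bit word.
        headTail uint64

        // vals is the ring buffer of stored values.
        vals []eface
    }

    // eface mirrors the runtime layout of the interface{} values
    // stored in the slots.
    type eface struct {
        typ, val unsafe.Pointer
    }

Packing both indexes into one word lets the producer reserve a slot and
detect racing consumers with a single compare-and-swap.
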
For #22950, #22331.
Change-Id: I50e85e3cb83a2ee71f611ada88e7f55996504bb5
Reviewed-on: https://go-review.googlesource.com/c/go/+/166957
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>

Prevent possible goroutine rescheduling to another P between Put and
Get calls by locking the goroutine to an OS thread.
Inspired by CL 42770.
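
An illustrative fragment of the pattern (not the test's exact code):

    import (
        "runtime"
        "sync"
        "testing"
    )

    func testPoolPinned(t *testing.T) {
        // Pin the goroutine so it cannot migrate to another P between
        // Put and Get; Get then finds the value Put just cached.
        runtime.LockOSThread()
        defer runtime.UnlockOSThread()

        var p sync.Pool
        x := 42
        p.Put(x)
        if got := p.Get(); got != x {
            t.Fatalf("got %v; want %v", got, x)
        }
    }
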
Fixes #20198.
Change-Id: I18e24fcad1630658713e6b9d80d90d7941f604be
Reviewed-on: https://go-review.googlesource.com/44310
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Austin Clements <austin@google.com>

Pool memory was only being released during the first GC after the first Put.
Put assumes that p.local != nil means p is on the allPools list.
poolCleanup (called during each GC) removed each pool from allPools
but did not clear p.local, so each pool was cleared by exactly one GC
and then never cleared again.
This bug was introduced late in the Go 1.3 release cycle.
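
A simplified sketch of the corrected cleanup (Go 1.3-era layout; the
real p.local is an unsafe.Pointer to a per-P array, read through
indexLocal):

    func poolCleanup() {
        for i, p := range allPools {
            allPools[i] = nil
            for j := 0; j < int(p.localSize); j++ {
                l := indexLocal(p.local, j)
                l.private = nil
                l.shared = nil
            }
            // The fix: clear p.local too, so the next Put sees
            // p.local == nil and re-registers the pool on allPools
            // for cleanup by future GCs.
            p.local = nil
            p.localSize = 0
        }
        allPools = []*Pool{}
    }
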
Fixes #8979.
LGTM=rsc
R=golang-codereviews, bradfitz, r, rsc
CC=golang-codereviews, khr
https://golang.org/cl/162980043