I noticed in pprof that acquirem() was a bit of a hotspot. It turns out
that we can use the same trick that runtime.rand() does, and only
acquirem if we're doing something non-nosplit -- in this case, getting a
new state -- but otherwise just do getg().m, which is safe because we're
inside runtime and don't call split functions.
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
│ sec/op │ sec/op vs base │
ParallelGetRandom-16 2.651n ± 4% 2.416n ± 7% -8.87% (p=0.001 n=10)
│ B/s │ B/s vs base │
ParallelGetRandom-16 1.406Gi ± 4% 1.542Gi ± 6% +9.72% (p=0.001 n=10)
Change-Id: Iae075f4e298b923e499cd01adfabacab725a8684
Reviewed-on: https://go-review.googlesource.com/c/go/+/616738
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Updates #66779
Updates #69577
Change-Id: I0dea5a30aab87aaa443e7e6646c1d07aa865ac1c
GitHub-Last-Rev: 1cea46deb3
GitHub-Pull-Request: golang/go#69719
Reviewed-on: https://go-review.googlesource.com/c/go/+/616696
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit-Queue: Ian Lance Taylor <iant@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
This prevents false sharing, which makes a large difference on machines
with several NUMA nodes, such as this dual socket server:
cpu: Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz
│ sec/op │ sec/op vs base │
ParallelGetRandom-128 0.7944n ± 5% 0.4503n ± 0% -43.31% (p=0.000 n=10)
│ B/s │ B/s vs base │
ParallelGetRandom-128 4.690Gi ± 5% 8.272Gi ± 0% +76.38% (p=0.000 n=10)
Change-Id: Id4421e9a4c190b38aff0be4c59e9067b0a38ccd7
Reviewed-on: https://go-review.googlesource.com/c/go/+/616535
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Jason Donenfeld <Jason@zx2c4.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Linux 6.11 supports calling getrandom() from the vDSO. It operates on a
thread-local opaque state allocated with mmap using flags specified by
the vDSO.
Opaque states are allocated in chunks, ideally ncpu at a time as a hint,
rounding up to as many fit in a complete page. On first use, a state is
assigned to an m, which owns that state, until the m exits, at which
point it is given back to the pool.
Performance appears to be quite good:
│ sec/op │ sec/op vs base │
Read/4-16 222.45n ± 3% 27.13n ± 6% -87.80% (p=0.000 n=10)
│ B/s │ B/s vs base │
Read/4-16 17.15Mi ± 3% 140.61Mi ± 6% +719.82% (p=0.000 n=10)
Fixes#69577.
Change-Id: Ib6f44e8f2f3940c94d970eaada0eb566ec297dc7
Reviewed-on: https://go-review.googlesource.com/c/go/+/614835
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Jason Donenfeld <Jason@zx2c4.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>