Commit Graph

710 Commits

Author SHA1 Message Date
apocelipes c841ba3a3e math/big: use built-in clear to simplify code
Change-Id: I07c3a498ce1e462c3d1703d77e7d7824e9334651
GitHub-Last-Rev: 2ba8c4c705
GitHub-Pull-Request: golang/go#66312
Reviewed-on: https://go-review.googlesource.com/c/go/+/571636
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-14 17:02:38 +00:00
Oleksandr Redko 806ea41fce math/rand, math/rand/v2: rename receiver variables
According to the https://go.dev/wiki/CodeReviewComments#receiver-names

Change-Id: Ib8bc57cf6a680e5c75d7346b74e77847945f6939
Reviewed-on: https://go-review.googlesource.com/c/go/+/568635
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2024-03-04 17:32:49 +00:00
Meng Zhuo ad377e906a math: add round assembly implementations on riscv64
goos: linux
goarch: riscv64
pkg: math
            │ floor_old.bench │           floor_new.bench           │
            │     sec/op      │   sec/op     vs base                │
Ceil              54.12n ± 0%   22.05n ± 0%  -59.26% (p=0.000 n=10)
Floor             40.80n ± 0%   22.05n ± 0%  -45.96% (p=0.000 n=10)
Round             20.73n ± 0%   20.74n ± 0%        ~ (p=0.441 n=10)
RoundToEven       24.07n ± 0%   24.07n ± 0%        ~ (p=1.000 n=10)
Trunc             38.73n ± 0%   22.05n ± 0%  -43.07% (p=0.000 n=10)
geomean           33.58n        22.17n       -33.98%

Change-Id: I24fb9e3bbf8146da253b6791b21377bea1afbd16
Reviewed-on: https://go-review.googlesource.com/c/go/+/504737
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: M Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: M Zhuo <mengzhuo1203@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
2024-02-23 08:34:12 +00:00
Daniel Martí 5c92f43c51 math/rand/v2: use a doc link for crypto/rand
It's easier to go look at its documentation when there's a link.

Change-Id: Iad6c1aa1a3f4b9127dc526b4db473239329780d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/563255
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2024-02-19 08:55:25 +00:00
Lynn Boger 4fde3ef2ac math/big,crypto/internal/bigmod: unroll loop in addMulVVW for ppc64x
This updates the assembly implementation of AddMulVVW to
unroll the main loop to do 64 bytes at a time.

The code for addMulVVWx is based on the same code and has
also been updated to improve performance.

goos: linux
goarch: ppc64le
pkg: crypto/internal/bigmod
cpu: POWER10
               │ bg.orig.out │               bg.out               │
               │   sec/op    │   sec/op     vs base               │
ModAdd           116.3n ± 0%   116.9n ± 0%   +0.52% (p=0.002 n=6)
ModSub           111.5n ± 0%   111.5n ± 0%    0.00% (p=0.273 n=6)
MontgomeryRepr   2.195µ ± 0%   1.944µ ± 0%  -11.44% (p=0.002 n=6)
MontgomeryMul    2.195µ ± 0%   1.943µ ± 0%  -11.48% (p=0.002 n=6)
ModMul           4.418µ ± 0%   3.900µ ± 0%  -11.72% (p=0.002 n=6)
ExpBig           5.736m ± 0%   5.117m ± 0%  -10.78% (p=0.002 n=6)
Exp              5.891m ± 0%   5.237m ± 0%  -11.11% (p=0.002 n=6)
geomean          9.901µ        9.094µ        -8.15%


goos: linux
goarch: ppc64le
pkg: math/big
cpu: POWER10
                 │ am.orig.out  │               am.out               │
                 │    sec/op    │   sec/op     vs base               │
AddMulVVW/1         4.456n ± 1%   3.565n ± 0%  -20.00% (p=0.002 n=6)
AddMulVVW/2         4.875n ± 1%   5.938n ± 1%  +21.79% (p=0.002 n=6)
AddMulVVW/3         5.484n ± 0%   5.693n ± 0%   +3.80% (p=0.002 n=6)
AddMulVVW/4         6.370n ± 0%   6.065n ± 0%   -4.79% (p=0.002 n=6)
AddMulVVW/5         7.321n ± 0%   7.188n ± 0%   -1.82% (p=0.002 n=6)
AddMulVVW/10        12.26n ± 8%   11.41n ± 0%   -6.97% (p=0.002 n=6)
AddMulVVW/100      100.70n ± 0%   93.58n ± 0%   -7.08% (p=0.002 n=6)
AddMulVVW/1000      938.6n ± 0%   845.5n ± 0%   -9.92% (p=0.002 n=6)
AddMulVVW/10000     9.459µ ± 0%   8.415µ ± 0%  -11.04% (p=0.002 n=6)
AddMulVVW/100000    94.57µ ± 0%   84.01µ ± 0%  -11.16% (p=0.002 n=6)
geomean             75.17n        71.21n        -5.27%


Change-Id: Idd79f5f02387564f4c2cc28d50b1c12bcd9a400f
Reviewed-on: https://go-review.googlesource.com/c/go/+/557915
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Paul Murphy <murp@ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
2024-01-25 19:32:43 +00:00
Robert Griesemer aba18d5b67 math/big: fix uint64 overflow in nat.mulRange
Compute median as a + (b-a)/2 instead of (a + b)/2.
Add additional test cases.

Fixes #65025.

Change-Id: Ib716a1036c17f8f33f51e33cedab13512eb7e0be
Reviewed-on: https://go-review.googlesource.com/c/go/+/554617
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Robert Griesemer <gri@google.com>
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-01-09 15:29:36 +00:00
Danil Timerbulatov 527829a7cb all: remove newline characters after return statements
This commit is aimed at improving the readability and consistency
of the code base. Extraneous newline characters were present after
some return statements, creating unnecessary separation in the code.

Fixes #64610

Change-Id: Ic1b05bf11761c4dff22691c2f1c3755f66d341f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/548316
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2023-12-14 17:22:18 +00:00
Russ Cox c29444ef39 math/rand, math/rand/v2: use ChaCha8 for global rand
Move ChaCha8 code into internal/chacha8rand and use it to implement
runtime.rand, which is used for the unseeded global source for
both math/rand and math/rand/v2. This also affects the calculation of
the start point for iteration over very very large maps (when the
32-bit fastrand is not big enough).

The benefit is that misuse of the global random number generators
in math/rand and math/rand/v2 in contexts where non-predictable
randomness is important for security reasons is no longer a
security problem, removing a common mistake among programmers
who are unaware of the different kinds of randomness.

The cost is an extra 304 bytes per thread stored in the m struct
plus 2-3ns more per random uint64 due to the more sophisticated
algorithm. Using PCG looks like it would cost about the same,
although I haven't benchmarked that.

Before this, the math/rand and math/rand/v2 global generator
was wyrand (https://github.com/wangyi-fudan/wyhash).
For math/rand, using wyrand instead of the Mitchell/Reeds/Thompson
ALFG was justifiable, since the latter was not any better.
But for math/rand/v2, the global generator really should be
at least as good as one of the well-studied, specific algorithms
provided directly by the package, and it's not.

(Wyrand is still reasonable for scheduling and cache decisions.)

Good randomness does have a cost: about twice wyrand.

Also rationalize the various runtime rand references.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ bbb48afeb7.amd64 │           5cf807d1ea.amd64           │
                        │      sec/op      │    sec/op     vs base                │
ChaCha8-32                     1.862n ± 2%    1.861n ± 2%        ~ (p=0.825 n=20)
PCG_DXSM-32                    1.471n ± 1%    1.460n ± 2%        ~ (p=0.153 n=20)
SourceUint64-32                1.636n ± 2%    1.582n ± 1%   -3.30% (p=0.000 n=20)
GlobalInt64-32                 2.087n ± 1%    3.663n ± 1%  +75.54% (p=0.000 n=20)
GlobalInt64Parallel-32        0.1042n ± 1%   0.2026n ± 1%  +94.48% (p=0.000 n=20)
GlobalUint64-32                2.263n ± 2%    3.724n ± 1%  +64.57% (p=0.000 n=20)
GlobalUint64Parallel-32       0.1019n ± 1%   0.1973n ± 1%  +93.67% (p=0.000 n=20)
Int64-32                       1.771n ± 1%    1.774n ± 1%        ~ (p=0.449 n=20)
Uint64-32                      1.863n ± 2%    1.866n ± 1%        ~ (p=0.364 n=20)
GlobalIntN1000-32              3.134n ± 3%    4.730n ± 2%  +50.95% (p=0.000 n=20)
IntN1000-32                    2.489n ± 1%    2.489n ± 1%        ~ (p=0.683 n=20)
Int64N1000-32                  2.521n ± 1%    2.516n ± 1%        ~ (p=0.394 n=20)
Int64N1e8-32                   2.479n ± 1%    2.478n ± 2%        ~ (p=0.743 n=20)
Int64N1e9-32                   2.530n ± 2%    2.514n ± 2%        ~ (p=0.193 n=20)
Int64N2e9-32                   2.501n ± 1%    2.494n ± 1%        ~ (p=0.616 n=20)
Int64N1e18-32                  3.227n ± 1%    3.205n ± 1%        ~ (p=0.101 n=20)
Int64N2e18-32                  3.647n ± 1%    3.599n ± 1%        ~ (p=0.019 n=20)
Int64N4e18-32                  5.135n ± 1%    5.069n ± 2%        ~ (p=0.034 n=20)
Int32N1000-32                  2.657n ± 1%    2.637n ± 1%        ~ (p=0.180 n=20)
Int32N1e8-32                   2.636n ± 1%    2.636n ± 1%        ~ (p=0.763 n=20)
Int32N1e9-32                   2.660n ± 2%    2.638n ± 1%        ~ (p=0.358 n=20)
Int32N2e9-32                   2.662n ± 2%    2.618n ± 2%        ~ (p=0.064 n=20)
Float32-32                     2.272n ± 2%    2.239n ± 2%        ~ (p=0.194 n=20)
Float64-32                     2.272n ± 1%    2.286n ± 2%        ~ (p=0.763 n=20)
ExpFloat64-32                  3.762n ± 1%    3.744n ± 1%        ~ (p=0.171 n=20)
NormFloat64-32                 3.706n ± 1%    3.655n ± 2%        ~ (p=0.066 n=20)
Perm3-32                       32.93n ± 3%    34.62n ± 1%   +5.13% (p=0.000 n=20)
Perm30-32                      202.9n ± 1%    204.0n ± 1%        ~ (p=0.482 n=20)
Perm30ViaShuffle-32            115.0n ± 1%    114.9n ± 1%        ~ (p=0.358 n=20)
ShuffleOverhead-32             112.8n ± 1%    112.7n ± 1%        ~ (p=0.692 n=20)
Concurrent-32                  2.107n ± 0%    3.725n ± 1%  +76.75% (p=0.000 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
                       │ bbb48afeb7.arm64 │           5cf807d1ea.arm64            │
                       │      sec/op      │    sec/op     vs base                 │
ChaCha8-8                     2.480n ± 0%    2.429n ± 0%    -2.04% (p=0.000 n=20)
PCG_DXSM-8                    2.531n ± 0%    2.530n ± 0%         ~ (p=0.877 n=20)
SourceUint64-8                2.534n ± 0%    2.533n ± 0%         ~ (p=0.732 n=20)
GlobalInt64-8                 2.172n ± 1%    4.794n ± 0%  +120.67% (p=0.000 n=20)
GlobalInt64Parallel-8        0.4320n ± 0%   0.9605n ± 0%  +122.32% (p=0.000 n=20)
GlobalUint64-8                2.182n ± 0%    4.770n ± 0%  +118.58% (p=0.000 n=20)
GlobalUint64Parallel-8       0.4307n ± 0%   0.9583n ± 0%  +122.51% (p=0.000 n=20)
Int64-8                       4.107n ± 0%    4.104n ± 0%         ~ (p=0.416 n=20)
Uint64-8                      4.080n ± 0%    4.080n ± 0%         ~ (p=0.052 n=20)
GlobalIntN1000-8              2.814n ± 2%    5.643n ± 0%  +100.50% (p=0.000 n=20)
IntN1000-8                    4.141n ± 0%    4.139n ± 0%         ~ (p=0.140 n=20)
Int64N1000-8                  4.140n ± 0%    4.140n ± 0%         ~ (p=0.313 n=20)
Int64N1e8-8                   4.140n ± 0%    4.139n ± 0%         ~ (p=0.103 n=20)
Int64N1e9-8                   4.139n ± 0%    4.140n ± 0%         ~ (p=0.761 n=20)
Int64N2e9-8                   4.140n ± 0%    4.140n ± 0%         ~ (p=0.636 n=20)
Int64N1e18-8                  5.266n ± 0%    5.326n ± 1%    +1.14% (p=0.001 n=20)
Int64N2e18-8                  6.052n ± 0%    6.167n ± 0%    +1.90% (p=0.000 n=20)
Int64N4e18-8                  8.826n ± 0%    9.051n ± 0%    +2.55% (p=0.000 n=20)
Int32N1000-8                  4.127n ± 0%    4.132n ± 0%    +0.12% (p=0.000 n=20)
Int32N1e8-8                   4.126n ± 0%    4.131n ± 0%    +0.12% (p=0.000 n=20)
Int32N1e9-8                   4.127n ± 0%    4.132n ± 0%    +0.12% (p=0.000 n=20)
Int32N2e9-8                   4.132n ± 0%    4.131n ± 0%         ~ (p=0.017 n=20)
Float32-8                     4.109n ± 0%    4.105n ± 0%         ~ (p=0.379 n=20)
Float64-8                     4.107n ± 0%    4.106n ± 0%         ~ (p=0.867 n=20)
ExpFloat64-8                  5.339n ± 0%    5.383n ± 0%    +0.82% (p=0.000 n=20)
NormFloat64-8                 5.735n ± 0%    5.737n ± 1%         ~ (p=0.856 n=20)
Perm3-8                       26.65n ± 0%    26.80n ± 1%    +0.58% (p=0.000 n=20)
Perm30-8                      194.8n ± 1%    197.0n ± 0%    +1.18% (p=0.000 n=20)
Perm30ViaShuffle-8            156.6n ± 0%    157.6n ± 1%    +0.61% (p=0.000 n=20)
ShuffleOverhead-8             124.9n ± 0%    125.5n ± 0%    +0.52% (p=0.000 n=20)
Concurrent-8                  2.434n ± 3%    5.066n ± 0%  +108.09% (p=0.000 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ bbb48afeb7.386 │            5cf807d1ea.386             │
                        │     sec/op     │    sec/op     vs base                 │
ChaCha8-32                  11.295n ± 1%    4.748n ± 2%   -57.96% (p=0.000 n=20)
PCG_DXSM-32                  7.693n ± 1%    7.738n ± 2%         ~ (p=0.542 n=20)
SourceUint64-32              7.658n ± 2%    7.622n ± 2%         ~ (p=0.344 n=20)
GlobalInt64-32               3.473n ± 2%    7.526n ± 2%  +116.73% (p=0.000 n=20)
GlobalInt64Parallel-32      0.3198n ± 0%   0.5444n ± 0%   +70.22% (p=0.000 n=20)
GlobalUint64-32              3.612n ± 0%    7.575n ± 1%  +109.69% (p=0.000 n=20)
GlobalUint64Parallel-32     0.3168n ± 0%   0.5403n ± 0%   +70.51% (p=0.000 n=20)
Int64-32                     7.673n ± 2%    7.789n ± 1%         ~ (p=0.122 n=20)
Uint64-32                    7.773n ± 1%    7.827n ± 2%         ~ (p=0.920 n=20)
GlobalIntN1000-32            6.268n ± 1%    9.581n ± 1%   +52.87% (p=0.000 n=20)
IntN1000-32                  10.33n ± 2%    10.45n ± 1%         ~ (p=0.233 n=20)
Int64N1000-32                10.98n ± 2%    11.01n ± 1%         ~ (p=0.401 n=20)
Int64N1e8-32                 11.19n ± 2%    10.97n ± 1%         ~ (p=0.033 n=20)
Int64N1e9-32                 11.06n ± 1%    11.08n ± 1%         ~ (p=0.498 n=20)
Int64N2e9-32                 11.10n ± 1%    11.01n ± 2%         ~ (p=0.995 n=20)
Int64N1e18-32                15.23n ± 2%    15.04n ± 1%         ~ (p=0.973 n=20)
Int64N2e18-32                15.89n ± 1%    15.85n ± 1%         ~ (p=0.409 n=20)
Int64N4e18-32                18.96n ± 2%    19.34n ± 2%         ~ (p=0.048 n=20)
Int32N1000-32                10.46n ± 2%    10.44n ± 2%         ~ (p=0.480 n=20)
Int32N1e8-32                 10.46n ± 2%    10.49n ± 2%         ~ (p=0.951 n=20)
Int32N1e9-32                 10.28n ± 2%    10.26n ± 1%         ~ (p=0.431 n=20)
Int32N2e9-32                 10.50n ± 2%    10.44n ± 2%         ~ (p=0.249 n=20)
Float32-32                   13.80n ± 2%    13.80n ± 2%         ~ (p=0.751 n=20)
Float64-32                   23.55n ± 2%    23.87n ± 0%         ~ (p=0.408 n=20)
ExpFloat64-32                15.36n ± 1%    15.29n ± 2%         ~ (p=0.316 n=20)
NormFloat64-32               13.57n ± 1%    13.79n ± 1%    +1.66% (p=0.005 n=20)
Perm3-32                     45.70n ± 2%    46.99n ± 2%    +2.81% (p=0.001 n=20)
Perm30-32                    399.0n ± 1%    403.8n ± 1%    +1.19% (p=0.006 n=20)
Perm30ViaShuffle-32          349.0n ± 1%    350.4n ± 1%         ~ (p=0.909 n=20)
ShuffleOverhead-32           322.3n ± 1%    323.8n ± 1%         ~ (p=0.410 n=20)
Concurrent-32                3.331n ± 1%    7.312n ± 1%  +119.50% (p=0.000 n=20)

For #61716.

Change-Id: Ibdddeed85c34d9ae397289dc899e04d4845f9ed2
Reviewed-on: https://go-review.googlesource.com/c/go/+/516860
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-12-05 20:34:30 +00:00
Russ Cox d92434935f math/rand/v2: add ChaCha8
This is a replay of CL 516859, after its rollback in CL 543895,
with big-endian systems fixed and the tests disabled on RISC-V
since the compiler is broken there (#64285).

ChaCha8 provides a cryptographically strong generator
alongside PCG, so that people who want stronger randomness
have access to that. On systems with 128-bit vector math
assembly (amd64 and arm64), ChaCha8 runs at about the same
speed as PCG (25% slower on amd64, 2% faster on arm64).

Fixes #64284.

Change-Id: I6290bb8ace28e1aff9a61f805dbe380ccdf25b94
Reviewed-on: https://go-review.googlesource.com/c/go/+/546020
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-12-05 20:32:54 +00:00
Michael Knyszek 82fc03f9c9 Revert "math/rand/v2: add ChaCha8"
This reverts commit 6382893890.

Reason for revert: Causes failures on big endian platforms and riscv64.
Possibly a bug in the generic implementation.

For #64284.
For #64285.

Change-Id: Ic1bb8533d9641fae28d0337b36d434b9a575cd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/543895
Reviewed-by: Heschi Kreinick <heschi@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
2023-11-20 18:39:03 +00:00
Ludi Rehak ee6b34797b all: add floating point option for ARM targets
This change introduces new options to set the floating point
mode on ARM targets. The GOARM version number can optionally be
followed by ',hardfloat' or ',softfloat' to select whether to
use hardware instructions or software emulation for floating
point computations, respectively. For example,
GOARM=7,softfloat.

Previously, software floating point support was limited to
GOARM=5. With these options, software floating point is now
extended to all ARM versions, including GOARM=6 and 7. This
change also extends hardware floating point to GOARM=5.

GOARM=5 defaults to softfloat and GOARM=6 and 7 default to
hardfloat.

For #61588

Change-Id: I23dc86fbd0733b262004a2ed001e1032cf371e94
Reviewed-on: https://go-review.googlesource.com/c/go/+/514907
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
2023-11-20 17:19:36 +00:00
Russ Cox 6382893890 math/rand/v2: add ChaCha8
ChaCha8 provides a cryptographically strong generator
alongside PCG, so that people who want stronger randomness
have access to that. On systems with 128-bit vector math
assembly (amd64 and arm64), ChaCha8 runs at about the same
speed as PCG (25% slower on amd64, 2% faster on arm64).

Obviously all the claimed benchmark variation other than the
new ChaCha8 benchmark is a lie.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ afa459a2f0.amd64 │          bbb48afeb7.amd64           │
                        │      sec/op      │    sec/op     vs base               │
PCG_DXSM-32                    1.488n ± 2%    1.492n ± 2%       ~ (p=0.309 n=20)
ChaCha8-32                                    1.861n ± 2%
SourceUint64-32                1.450n ± 3%    1.590n ± 2%  +9.69% (p=0.000 n=20)
GlobalInt64-32                 2.067n ± 2%    2.061n ± 1%       ~ (p=0.952 n=20)
GlobalInt64Parallel-32        0.1044n ± 2%   0.1041n ± 1%       ~ (p=0.498 n=20)
GlobalUint64-32                2.085n ± 0%    2.256n ± 2%  +8.23% (p=0.000 n=20)
GlobalUint64Parallel-32       0.1008n ± 1%   0.1018n ± 1%       ~ (p=0.041 n=20)
Int64-32                       1.779n ± 1%    1.779n ± 1%       ~ (p=0.410 n=20)
Uint64-32                      1.854n ± 2%    1.882n ± 1%       ~ (p=0.044 n=20)
GlobalIntN1000-32              3.140n ± 3%    3.115n ± 3%       ~ (p=0.673 n=20)
IntN1000-32                    2.496n ± 1%    2.509n ± 1%       ~ (p=0.171 n=20)
Int64N1000-32                  2.510n ± 2%    2.493n ± 1%       ~ (p=0.804 n=20)
Int64N1e8-32                   2.471n ± 2%    2.521n ± 1%  +1.98% (p=0.003 n=20)
Int64N1e9-32                   2.488n ± 2%    2.506n ± 1%       ~ (p=0.663 n=20)
Int64N2e9-32                   2.478n ± 2%    2.482n ± 2%       ~ (p=0.533 n=20)
Int64N1e18-32                  3.088n ± 1%    3.216n ± 1%  +4.15% (p=0.000 n=20)
Int64N2e18-32                  3.493n ± 1%    3.635n ± 2%  +4.05% (p=0.000 n=20)
Int64N4e18-32                  5.060n ± 2%    5.122n ± 1%  +1.22% (p=0.000 n=20)
Int32N1000-32                  2.620n ± 1%    2.672n ± 1%  +2.00% (p=0.002 n=20)
Int32N1e8-32                   2.652n ± 0%    2.646n ± 1%       ~ (p=0.743 n=20)
Int32N1e9-32                   2.644n ± 1%    2.660n ± 2%       ~ (p=0.163 n=20)
Int32N2e9-32                   2.619n ± 2%    2.652n ± 1%       ~ (p=0.132 n=20)
Float32-32                     2.261n ± 1%    2.267n ± 1%       ~ (p=0.516 n=20)
Float64-32                     2.241n ± 2%    2.276n ± 1%       ~ (p=0.080 n=20)
ExpFloat64-32                  3.716n ± 1%    3.779n ± 1%  +1.68% (p=0.007 n=20)
NormFloat64-32                 3.718n ± 1%    3.747n ± 1%       ~ (p=0.011 n=20)
Perm3-32                       34.11n ± 2%    34.23n ± 2%       ~ (p=0.779 n=20)
Perm30-32                      200.6n ± 0%    202.3n ± 2%       ~ (p=0.055 n=20)
Perm30ViaShuffle-32            109.7n ± 1%    115.5n ± 2%  +5.34% (p=0.000 n=20)
ShuffleOverhead-32             107.2n ± 1%    113.3n ± 1%  +5.74% (p=0.000 n=20)
Concurrent-32                  2.108n ± 6%    2.107n ± 1%       ~ (p=0.448 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ afa459a2f0.arm64 │          bbb48afeb7.arm64           │
                       │      sec/op      │    sec/op     vs base               │
PCG_DXSM-8                    2.531n ± 0%    2.529n ± 0%       ~ (p=0.586 n=20)
ChaCha8-8                                    2.480n ± 0%
SourceUint64-8                2.531n ± 0%    2.534n ± 0%       ~ (p=0.227 n=20)
GlobalInt64-8                 2.177n ± 1%    2.173n ± 1%       ~ (p=0.733 n=20)
GlobalInt64Parallel-8        0.4319n ± 0%   0.4304n ± 0%  -0.32% (p=0.003 n=20)
GlobalUint64-8                2.185n ± 1%    2.185n ± 0%       ~ (p=0.541 n=20)
GlobalUint64Parallel-8       0.4295n ± 1%   0.4294n ± 0%       ~ (p=0.203 n=20)
Int64-8                       4.104n ± 0%    4.107n ± 0%       ~ (p=0.193 n=20)
Uint64-8                      4.080n ± 0%    4.081n ± 0%       ~ (p=0.053 n=20)
GlobalIntN1000-8              2.814n ± 1%    2.814n ± 0%       ~ (p=0.879 n=20)
IntN1000-8                    4.140n ± 0%    4.141n ± 0%       ~ (p=0.428 n=20)
Int64N1000-8                  4.139n ± 0%    4.140n ± 0%       ~ (p=0.114 n=20)
Int64N1e8-8                   4.140n ± 0%    4.140n ± 0%       ~ (p=0.898 n=20)
Int64N1e9-8                   4.139n ± 0%    4.140n ± 0%       ~ (p=0.593 n=20)
Int64N2e9-8                   4.140n ± 0%    4.139n ± 0%       ~ (p=0.158 n=20)
Int64N1e18-8                  5.273n ± 0%    5.274n ± 0%       ~ (p=0.308 n=20)
Int64N2e18-8                  6.059n ± 0%    6.058n ± 0%       ~ (p=0.053 n=20)
Int64N4e18-8                  8.803n ± 0%    8.800n ± 0%       ~ (p=0.673 n=20)
Int32N1000-8                  4.131n ± 0%    4.131n ± 0%       ~ (p=0.342 n=20)
Int32N1e8-8                   4.131n ± 0%    4.131n ± 0%       ~ (p=0.091 n=20)
Int32N1e9-8                   4.131n ± 0%    4.131n ± 0%       ~ (p=0.273 n=20)
Int32N2e9-8                   4.131n ± 0%    4.131n ± 0%       ~ (p=0.425 n=20)
Float32-8                     4.110n ± 0%    4.112n ± 0%       ~ (p=0.203 n=20)
Float64-8                     4.104n ± 0%    4.106n ± 0%       ~ (p=0.409 n=20)
ExpFloat64-8                  5.338n ± 0%    5.339n ± 0%       ~ (p=0.037 n=20)
NormFloat64-8                 5.731n ± 0%    5.733n ± 0%       ~ (p=0.692 n=20)
Perm3-8                       26.62n ± 0%    26.65n ± 0%  +0.09% (p=0.000 n=20)
Perm30-8                      194.6n ± 2%    194.9n ± 0%       ~ (p=0.141 n=20)
Perm30ViaShuffle-8            156.4n ± 0%    156.5n ± 0%  +0.06% (p=0.000 n=20)
ShuffleOverhead-8             125.8n ± 0%    125.0n ± 0%  -0.64% (p=0.000 n=20)
Concurrent-8                  2.654n ± 6%    2.441n ± 6%  -8.06% (p=0.009 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ afa459a2f0.386 │            bbb48afeb7.386            │
                        │     sec/op     │    sec/op      vs base               │
PCG_DXSM-32                  7.793n ± 2%    7.647n ±  1%       ~ (p=0.021 n=20)
ChaCha8-32                                  11.48n ±  2%
SourceUint64-32              7.680n ± 1%    7.714n ±  1%       ~ (p=0.713 n=20)
GlobalInt64-32               3.474n ± 3%    3.491n ± 28%       ~ (p=0.337 n=20)
GlobalInt64Parallel-32      0.3253n ± 0%   0.3194n ±  0%  -1.81% (p=0.000 n=20)
GlobalUint64-32              3.433n ± 2%    3.610n ±  2%  +5.14% (p=0.000 n=20)
GlobalUint64Parallel-32     0.3156n ± 0%   0.3164n ±  0%       ~ (p=0.073 n=20)
Int64-32                     7.707n ± 1%    7.824n ±  0%  +1.52% (p=0.005 n=20)
Uint64-32                    7.714n ± 1%    7.732n ±  2%       ~ (p=0.441 n=20)
GlobalIntN1000-32            6.236n ± 1%    6.176n ±  2%       ~ (p=0.499 n=20)
IntN1000-32                  10.41n ± 1%    10.31n ±  2%       ~ (p=0.782 n=20)
Int64N1000-32                10.97n ± 2%    11.22n ±  2%  +2.19% (p=0.002 n=20)
Int64N1e8-32                 10.98n ± 1%    11.07n ±  1%       ~ (p=0.056 n=20)
Int64N1e9-32                 10.95n ± 0%    11.15n ±  2%       ~ (p=0.016 n=20)
Int64N2e9-32                 11.11n ± 1%    11.00n ±  1%       ~ (p=0.654 n=20)
Int64N1e18-32                15.18n ± 2%    14.97n ±  2%       ~ (p=0.387 n=20)
Int64N2e18-32                15.61n ± 1%    15.91n ±  1%  +1.92% (p=0.003 n=20)
Int64N4e18-32                19.23n ± 2%    18.98n ±  1%       ~ (p=1.000 n=20)
Int32N1000-32                10.35n ± 1%    10.31n ±  2%       ~ (p=0.081 n=20)
Int32N1e8-32                 10.33n ± 1%    10.38n ±  1%       ~ (p=0.335 n=20)
Int32N1e9-32                 10.35n ± 1%    10.37n ±  1%       ~ (p=0.497 n=20)
Int32N2e9-32                 10.35n ± 1%    10.41n ±  1%       ~ (p=0.605 n=20)
Float32-32                   13.57n ± 1%    13.78n ±  2%       ~ (p=0.047 n=20)
Float64-32                   22.95n ± 4%    23.43n ±  3%       ~ (p=0.218 n=20)
ExpFloat64-32                15.23n ± 2%    15.46n ±  1%       ~ (p=0.095 n=20)
NormFloat64-32               13.78n ± 1%    13.73n ±  2%       ~ (p=0.031 n=20)
Perm3-32                     46.62n ± 2%    47.46n ±  2%  +1.82% (p=0.004 n=20)
Perm30-32                    400.7n ± 1%    403.5n ±  1%       ~ (p=0.098 n=20)
Perm30ViaShuffle-32          350.5n ± 1%    348.1n ±  2%       ~ (p=0.703 n=20)
ShuffleOverhead-32           326.0n ± 2%    326.2n ±  2%       ~ (p=0.440 n=20)
Concurrent-32                3.290n ± 0%    3.297n ±  4%       ~ (p=0.189 n=20)

For #61716.

Change-Id: Id2a7e1c1db0beb81f563faaefba65fe292497269
Reviewed-on: https://go-review.googlesource.com/c/go/+/516859
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2023-11-19 22:05:54 +00:00
Robert Griesemer 22278e3835 math/big: faster FloatPrec implementation
Based on observations by Cherry Mui (see comments in CL 539299).
Add new benchmark FloatPrecMixed.

For #50489.

name                         old time/op  new time/op  delta
FloatPrecExact/1-12           129ns ± 0%   105ns ±11%  -18.51%  (p=0.008 n=5+5)
FloatPrecExact/10-12          317ns ± 2%   283ns ± 1%  -10.65%  (p=0.008 n=5+5)
FloatPrecExact/100-12        1.80µs ±15%  1.35µs ± 0%  -25.09%  (p=0.008 n=5+5)
FloatPrecExact/1000-12       9.48µs ±14%  8.32µs ± 1%  -12.25%  (p=0.008 n=5+5)
FloatPrecExact/10000-12       195µs ± 1%   191µs ± 0%   -1.73%  (p=0.008 n=5+5)
FloatPrecExact/100000-12     7.31ms ± 1%  7.24ms ± 1%   -0.99%  (p=0.032 n=5+5)
FloatPrecExact/1000000-12     301ms ± 3%   302ms ± 2%     ~     (p=0.841 n=5+5)
FloatPrecMixed/1-12           141ns ± 0%   110ns ± 3%  -21.88%  (p=0.008 n=5+5)
FloatPrecMixed/10-12          767ns ± 0%   739ns ± 5%     ~     (p=0.151 n=5+5)
FloatPrecMixed/100-12        4.93µs ± 2%  3.73µs ± 1%  -24.33%  (p=0.008 n=5+5)
FloatPrecMixed/1000-12       90.9µs ±11%  70.3µs ± 2%  -22.66%  (p=0.008 n=5+5)
FloatPrecMixed/10000-12      2.30ms ± 0%  1.92ms ± 1%  -16.41%  (p=0.008 n=5+5)
FloatPrecMixed/100000-12     87.1ms ± 1%  68.5ms ± 1%  -21.42%  (p=0.008 n=5+5)
FloatPrecMixed/1000000-12     4.09s ± 1%   3.58s ± 1%  -12.35%  (p=0.008 n=5+5)
FloatPrecInexact/1-12        92.4ns ± 0%  66.1ns ± 5%  -28.41%  (p=0.008 n=5+5)
FloatPrecInexact/10-12        118ns ± 0%    91ns ± 1%  -23.14%  (p=0.016 n=5+4)
FloatPrecInexact/100-12       310ns ±10%   244ns ± 1%  -21.32%  (p=0.008 n=5+5)
FloatPrecInexact/1000-12      952ns ± 1%   828ns ± 1%  -12.96%  (p=0.016 n=4+5)
FloatPrecInexact/10000-12    6.71µs ± 1%  6.25µs ± 3%   -6.83%  (p=0.008 n=5+5)
FloatPrecInexact/100000-12   66.1µs ± 1%  61.2µs ± 1%   -7.45%  (p=0.008 n=5+5)
FloatPrecInexact/1000000-12   635µs ± 2%   584µs ± 1%   -7.97%  (p=0.008 n=5+5)

Change-Id: I3aa67b49a042814a3286ee8306fbed36709cbb6e
Reviewed-on: https://go-review.googlesource.com/c/go/+/542756
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
2023-11-15 22:16:34 +00:00
Robert Griesemer e14b96cb51 math/big: update comment in the implementation of FloatPrec
Follow-up on CL 539299: missed to incorporate the updated
comment per feedback on that CL.

For #50489.

Change-Id: Ib035400038b1d11532f62055b5cdb382ab75654c
Reviewed-on: https://go-review.googlesource.com/c/go/+/542115
Run-TryBot: Robert Griesemer <gri@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-11-14 16:45:59 +00:00
Robert Griesemer dd88f23a20 math/big: implement Rat.FloatPrec
goos: darwin
goarch: amd64
pkg: math/big
cpu: Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
BenchmarkFloatPrecExact/1-12             9380685          125.0 ns/op
BenchmarkFloatPrecExact/10-12            3780493          321.2 ns/op
BenchmarkFloatPrecExact/100-12            698272         1679 ns/op
BenchmarkFloatPrecExact/1000-12           117975         9113 ns/op
BenchmarkFloatPrecExact/10000-12            5913       192768 ns/op
BenchmarkFloatPrecExact/100000-12            164      7401817 ns/op
BenchmarkFloatPrecExact/1000000-12             4    293568523 ns/op

BenchmarkFloatPrecInexact/1-12          12836612           91.26 ns/op
BenchmarkFloatPrecInexact/10-12         10144908          114.9 ns/op
BenchmarkFloatPrecInexact/100-12         4121931          297.3 ns/op
BenchmarkFloatPrecInexact/1000-12        1275886          927.7 ns/op
BenchmarkFloatPrecInexact/10000-12        170392         6546 ns/op
BenchmarkFloatPrecInexact/100000-12        18307        65232 ns/op
BenchmarkFloatPrecInexact/1000000-12        1701       621412 ns/op

Fixes #50489.

Change-Id: Ic952f00e35d42f2470ecab53df712721997eac94
Reviewed-on: https://go-review.googlesource.com/c/go/+/539299
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Robert Griesemer <gri@google.com>
Run-TryBot: Robert Griesemer <gri@google.com>
Reviewed-by: Robert Griesemer <gri@google.com>
2023-11-14 00:44:42 +00:00
Russ Cox 8abde68f19 math/rand/v2: delete Mitchell/Reeds source
These slowdowns are because we are now using PCG instead of the
Mitchell/Reeds LFSR for the benchmarks. PCG is in fact a bit slower
(but generates statically far better random numbers).

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 01ff938549.amd64 │           afa459a2f0.amd64           │
                        │      sec/op      │    sec/op     vs base                │
PCG_DXSM-32                    1.490n ± 0%    1.488n ± 2%        ~ (p=0.408 n=20)
SourceUint64-32                1.352n ± 1%    1.450n ± 3%   +7.21% (p=0.000 n=20)
GlobalInt64-32                 2.083n ± 0%    2.067n ± 2%        ~ (p=0.223 n=20)
GlobalInt64Parallel-32        0.1035n ± 1%   0.1044n ± 2%        ~ (p=0.010 n=20)
GlobalUint64-32                2.038n ± 1%    2.085n ± 0%   +2.28% (p=0.000 n=20)
GlobalUint64Parallel-32       0.1006n ± 1%   0.1008n ± 1%        ~ (p=0.733 n=20)
Int64-32                       1.687n ± 2%    1.779n ± 1%   +5.48% (p=0.000 n=20)
Uint64-32                      1.674n ± 2%    1.854n ± 2%  +10.69% (p=0.000 n=20)
GlobalIntN1000-32              3.135n ± 1%    3.140n ± 3%        ~ (p=0.794 n=20)
IntN1000-32                    2.478n ± 1%    2.496n ± 1%   +0.73% (p=0.006 n=20)
Int64N1000-32                  2.455n ± 1%    2.510n ± 2%   +2.22% (p=0.000 n=20)
Int64N1e8-32                   2.467n ± 2%    2.471n ± 2%        ~ (p=0.050 n=20)
Int64N1e9-32                   2.454n ± 1%    2.488n ± 2%   +1.39% (p=0.000 n=20)
Int64N2e9-32                   2.482n ± 1%    2.478n ± 2%        ~ (p=0.066 n=20)
Int64N1e18-32                  3.349n ± 2%    3.088n ± 1%   -7.81% (p=0.000 n=20)
Int64N2e18-32                  3.537n ± 1%    3.493n ± 1%   -1.24% (p=0.002 n=20)
Int64N4e18-32                  4.917n ± 0%    5.060n ± 2%   +2.91% (p=0.000 n=20)
Int32N1000-32                  2.386n ± 1%    2.620n ± 1%   +9.76% (p=0.000 n=20)
Int32N1e8-32                   2.366n ± 1%    2.652n ± 0%  +12.11% (p=0.000 n=20)
Int32N1e9-32                   2.355n ± 2%    2.644n ± 1%  +12.32% (p=0.000 n=20)
Int32N2e9-32                   2.371n ± 1%    2.619n ± 2%  +10.48% (p=0.000 n=20)
Float32-32                     2.245n ± 2%    2.261n ± 1%        ~ (p=0.625 n=20)
Float64-32                     2.235n ± 1%    2.241n ± 2%        ~ (p=0.393 n=20)
ExpFloat64-32                  3.813n ± 3%    3.716n ± 1%   -2.53% (p=0.000 n=20)
NormFloat64-32                 3.652n ± 2%    3.718n ± 1%   +1.79% (p=0.006 n=20)
Perm3-32                       33.12n ± 3%    34.11n ± 2%        ~ (p=0.021 n=20)
Perm30-32                      205.1n ± 1%    200.6n ± 0%   -2.17% (p=0.000 n=20)
Perm30ViaShuffle-32            110.8n ± 1%    109.7n ± 1%   -0.99% (p=0.002 n=20)
ShuffleOverhead-32             113.0n ± 1%    107.2n ± 1%   -5.09% (p=0.000 n=20)
Concurrent-32                  2.100n ± 0%    2.108n ± 6%        ~ (p=0.103 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
                       │ 01ff938549.arm64 │           afa459a2f0.arm64           │
                       │      sec/op      │    sec/op     vs base                │
PCG_DXSM-8                    2.531n ± 0%    2.531n ± 0%        ~ (p=0.763 n=20)
SourceUint64-8                2.258n ± 1%    2.531n ± 0%  +12.09% (p=0.000 n=20)
GlobalInt64-8                 2.167n ± 0%    2.177n ± 1%        ~ (p=0.213 n=20)
GlobalInt64Parallel-8        0.4310n ± 0%   0.4319n ± 0%        ~ (p=0.027 n=20)
GlobalUint64-8                2.182n ± 1%    2.185n ± 1%        ~ (p=0.683 n=20)
GlobalUint64Parallel-8       0.4297n ± 0%   0.4295n ± 1%        ~ (p=0.941 n=20)
Int64-8                       2.472n ± 1%    4.104n ± 0%  +66.00% (p=0.000 n=20)
Uint64-8                      2.449n ± 1%    4.080n ± 0%  +66.60% (p=0.000 n=20)
GlobalIntN1000-8              2.814n ± 2%    2.814n ± 1%        ~ (p=0.972 n=20)
IntN1000-8                    2.998n ± 2%    4.140n ± 0%  +38.09% (p=0.000 n=20)
Int64N1000-8                  2.949n ± 2%    4.139n ± 0%  +40.35% (p=0.000 n=20)
Int64N1e8-8                   2.953n ± 2%    4.140n ± 0%  +40.22% (p=0.000 n=20)
Int64N1e9-8                   2.950n ± 0%    4.139n ± 0%  +40.32% (p=0.000 n=20)
Int64N2e9-8                   2.946n ± 2%    4.140n ± 0%  +40.53% (p=0.000 n=20)
Int64N1e18-8                  3.779n ± 1%    5.273n ± 0%  +39.52% (p=0.000 n=20)
Int64N2e18-8                  4.370n ± 1%    6.059n ± 0%  +38.65% (p=0.000 n=20)
Int64N4e18-8                  6.544n ± 1%    8.803n ± 0%  +34.52% (p=0.000 n=20)
Int32N1000-8                  2.950n ± 0%    4.131n ± 0%  +40.06% (p=0.000 n=20)
Int32N1e8-8                   2.950n ± 2%    4.131n ± 0%  +40.03% (p=0.000 n=20)
Int32N1e9-8                   2.951n ± 2%    4.131n ± 0%  +39.99% (p=0.000 n=20)
Int32N2e9-8                   2.950n ± 2%    4.131n ± 0%  +40.03% (p=0.000 n=20)
Float32-8                     3.441n ± 0%    4.110n ± 0%  +19.44% (p=0.000 n=20)
Float64-8                     3.442n ± 0%    4.104n ± 0%  +19.24% (p=0.000 n=20)
ExpFloat64-8                  4.481n ± 0%    5.338n ± 0%  +19.11% (p=0.000 n=20)
NormFloat64-8                 4.725n ± 0%    5.731n ± 0%  +21.28% (p=0.000 n=20)
Perm3-8                       26.55n ± 0%    26.62n ± 0%   +0.28% (p=0.000 n=20)
Perm30-8                      181.9n ± 0%    194.6n ± 2%   +6.98% (p=0.000 n=20)
Perm30ViaShuffle-8            142.9n ± 0%    156.4n ± 0%   +9.45% (p=0.000 n=20)
ShuffleOverhead-8             120.8n ± 2%    125.8n ± 0%   +4.10% (p=0.000 n=20)
Concurrent-8                  2.421n ± 6%    2.654n ± 6%   +9.67% (p=0.002 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 01ff938549.386 │            afa459a2f0.386             │
                        │     sec/op     │    sec/op     vs base                 │
PCG_DXSM-32                  7.613n ± 1%    7.793n ± 2%    +2.38% (p=0.000 n=20)
SourceUint64-32              2.069n ± 0%    7.680n ± 1%  +271.19% (p=0.000 n=20)
GlobalInt64-32               3.456n ± 1%    3.474n ± 3%         ~ (p=0.654 n=20)
GlobalInt64Parallel-32      0.3252n ± 0%   0.3253n ± 0%         ~ (p=0.952 n=20)
GlobalUint64-32              3.573n ± 1%    3.433n ± 2%    -3.92% (p=0.000 n=20)
GlobalUint64Parallel-32     0.3159n ± 0%   0.3156n ± 0%         ~ (p=0.223 n=20)
Int64-32                     2.562n ± 2%    7.707n ± 1%  +200.74% (p=0.000 n=20)
Uint64-32                    2.592n ± 0%    7.714n ± 1%  +197.65% (p=0.000 n=20)
GlobalIntN1000-32            6.266n ± 2%    6.236n ± 1%         ~ (p=0.039 n=20)
IntN1000-32                  4.724n ± 2%   10.410n ± 1%  +120.39% (p=0.000 n=20)
Int64N1000-32                5.490n ± 2%   10.975n ± 2%   +99.89% (p=0.000 n=20)
Int64N1e8-32                 5.513n ± 2%   10.980n ± 1%   +99.15% (p=0.000 n=20)
Int64N1e9-32                 5.476n ± 1%   10.950n ± 0%   +99.96% (p=0.000 n=20)
Int64N2e9-32                 5.501n ± 2%   11.110n ± 1%  +101.96% (p=0.000 n=20)
Int64N1e18-32                9.043n ± 2%   15.180n ± 2%   +67.86% (p=0.000 n=20)
Int64N2e18-32                9.601n ± 2%   15.610n ± 1%   +62.60% (p=0.000 n=20)
Int64N4e18-32                12.00n ± 1%    19.23n ± 2%   +60.14% (p=0.000 n=20)
Int32N1000-32                4.829n ± 2%   10.345n ± 1%  +114.25% (p=0.000 n=20)
Int32N1e8-32                 4.825n ± 2%   10.330n ± 1%  +114.09% (p=0.000 n=20)
Int32N1e9-32                 4.830n ± 2%   10.350n ± 1%  +114.26% (p=0.000 n=20)
Int32N2e9-32                 4.750n ± 2%   10.345n ± 1%  +117.81% (p=0.000 n=20)
Float32-32                   10.89n ± 4%    13.57n ± 1%   +24.61% (p=0.000 n=20)
Float64-32                   19.60n ± 4%    22.95n ± 4%   +17.12% (p=0.000 n=20)
ExpFloat64-32                12.96n ± 3%    15.23n ± 2%   +17.47% (p=0.000 n=20)
NormFloat64-32               7.516n ± 1%   13.780n ± 1%   +83.34% (p=0.000 n=20)
Perm3-32                     36.78n ± 2%    46.62n ± 2%   +26.72% (p=0.000 n=20)
Perm30-32                    238.9n ± 2%    400.7n ± 1%   +67.73% (p=0.000 n=20)
Perm30ViaShuffle-32          189.7n ± 2%    350.5n ± 1%   +84.79% (p=0.000 n=20)
ShuffleOverhead-32           159.8n ± 1%    326.0n ± 2%  +104.01% (p=0.000 n=20)
Concurrent-32                3.286n ± 1%    3.290n ± 0%         ~ (p=0.743 n=20)

On the other hand, compared to the original "update benchmarks" CL,
the cleanups we've made more than compensate for PCG being a bit
slower than LFSR, at least on 64-bit x86. ARM64 (Apple M1) is a bit
slower: perhaps the 64x64→128 multiply is slower there for some reason.
386 is noticeably slower, but it's also a non-SSA backend.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.amd64 │            afa459a2f0.amd64            │
                        │      sec/op      │    sec/op     vs base                  │
SourceUint64-32                1.555n ± 1%    1.450n ± 3%   -6.78% (p=0.000 n=20)
GlobalInt64-32                 2.071n ± 1%    2.067n ± 2%        ~ (p=0.673 n=20)
GlobalInt63Parallel-32        0.1023n ± 1%
GlobalInt64Parallel-32                       0.1044n ± 2%
GlobalUint64-32                5.193n ± 1%    2.085n ± 0%  -59.86% (p=0.000 n=20)
GlobalUint64Parallel-32       0.2341n ± 0%   0.1008n ± 1%  -56.93% (p=0.000 n=20)
Int64-32                       2.056n ± 2%    1.779n ± 1%  -13.47% (p=0.000 n=20)
Uint64-32                      2.077n ± 2%    1.854n ± 2%  -10.74% (p=0.000 n=20)
GlobalIntN1000-32              4.077n ± 2%    3.140n ± 3%  -22.98% (p=0.000 n=20)
IntN1000-32                    3.476n ± 2%    2.496n ± 1%  -28.19% (p=0.000 n=20)
Int64N1000-32                  3.059n ± 1%    2.510n ± 2%  -17.96% (p=0.000 n=20)
Int64N1e8-32                   2.942n ± 1%    2.471n ± 2%  -15.98% (p=0.000 n=20)
Int64N1e9-32                   2.932n ± 1%    2.488n ± 2%  -15.14% (p=0.000 n=20)
Int64N2e9-32                   2.925n ± 1%    2.478n ± 2%  -15.30% (p=0.000 n=20)
Int64N1e18-32                  3.116n ± 1%    3.088n ± 1%        ~ (p=0.013 n=20)
Int64N2e18-32                  4.067n ± 1%    3.493n ± 1%  -14.11% (p=0.000 n=20)
Int64N4e18-32                  4.054n ± 1%    5.060n ± 2%  +24.80% (p=0.000 n=20)
Int32N1000-32                  2.951n ± 1%    2.620n ± 1%  -11.22% (p=0.000 n=20)
Int32N1e8-32                   3.102n ± 1%    2.652n ± 0%  -14.50% (p=0.000 n=20)
Int32N1e9-32                   3.535n ± 1%    2.644n ± 1%  -25.20% (p=0.000 n=20)
Int32N2e9-32                   3.514n ± 1%    2.619n ± 2%  -25.47% (p=0.000 n=20)
Float32-32                     2.760n ± 1%    2.261n ± 1%  -18.06% (p=0.000 n=20)
Float64-32                     2.284n ± 1%    2.241n ± 2%        ~ (p=0.016 n=20)
ExpFloat64-32                  3.757n ± 1%    3.716n ± 1%        ~ (p=0.034 n=20)
NormFloat64-32                 3.837n ± 1%    3.718n ± 1%   -3.09% (p=0.000 n=20)
Perm3-32                       35.23n ± 2%    34.11n ± 2%   -3.19% (p=0.000 n=20)
Perm30-32                      208.8n ± 1%    200.6n ± 0%   -3.93% (p=0.000 n=20)
Perm30ViaShuffle-32            111.7n ± 1%    109.7n ± 1%   -1.84% (p=0.000 n=20)
ShuffleOverhead-32             101.1n ± 1%    107.2n ± 1%   +6.03% (p=0.000 n=20)
Concurrent-32                  2.108n ± 7%    2.108n ± 6%        ~ (p=0.644 n=20)
PCG_DXSM-32                                   1.488n ± 2%

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 220860f76f.arm64 │            afa459a2f0.arm64            │
                       │      sec/op      │    sec/op     vs base                  │
SourceUint64-8                2.316n ± 1%    2.531n ± 0%   +9.33% (p=0.000 n=20)
GlobalInt64-8                 2.183n ± 1%    2.177n ± 1%        ~ (p=0.533 n=20)
GlobalInt63Parallel-8        0.4331n ± 0%
GlobalInt64Parallel-8                       0.4319n ± 0%
GlobalUint64-8                4.377n ± 2%    2.185n ± 1%  -50.07% (p=0.000 n=20)
GlobalUint64Parallel-8       0.9237n ± 0%   0.4295n ± 1%  -53.50% (p=0.000 n=20)
Int64-8                       2.538n ± 1%    4.104n ± 0%  +61.68% (p=0.000 n=20)
Uint64-8                      2.604n ± 1%    4.080n ± 0%  +56.68% (p=0.000 n=20)
GlobalIntN1000-8              3.857n ± 2%    2.814n ± 1%  -27.04% (p=0.000 n=20)
IntN1000-8                    3.822n ± 2%    4.140n ± 0%   +8.32% (p=0.000 n=20)
Int64N1000-8                  3.318n ± 0%    4.139n ± 0%  +24.74% (p=0.000 n=20)
Int64N1e8-8                   3.349n ± 1%    4.140n ± 0%  +23.64% (p=0.000 n=20)
Int64N1e9-8                   3.317n ± 2%    4.139n ± 0%  +24.80% (p=0.000 n=20)
Int64N2e9-8                   3.317n ± 2%    4.140n ± 0%  +24.81% (p=0.000 n=20)
Int64N1e18-8                  3.542n ± 1%    5.273n ± 0%  +48.85% (p=0.000 n=20)
Int64N2e18-8                  5.087n ± 0%    6.059n ± 0%  +19.12% (p=0.000 n=20)
Int64N4e18-8                  5.084n ± 0%    8.803n ± 0%  +73.16% (p=0.000 n=20)
Int32N1000-8                  3.208n ± 2%    4.131n ± 0%  +28.79% (p=0.000 n=20)
Int32N1e8-8                   3.610n ± 1%    4.131n ± 0%  +14.43% (p=0.000 n=20)
Int32N1e9-8                   4.235n ± 0%    4.131n ± 0%   -2.44% (p=0.000 n=20)
Int32N2e9-8                   4.229n ± 1%    4.131n ± 0%   -2.33% (p=0.000 n=20)
Float32-8                     3.468n ± 0%    4.110n ± 0%  +18.50% (p=0.000 n=20)
Float64-8                     3.447n ± 0%    4.104n ± 0%  +19.05% (p=0.000 n=20)
ExpFloat64-8                  4.567n ± 0%    5.338n ± 0%  +16.86% (p=0.000 n=20)
NormFloat64-8                 4.821n ± 0%    5.731n ± 0%  +18.89% (p=0.000 n=20)
Perm3-8                       28.89n ± 0%    26.62n ± 0%   -7.84% (p=0.000 n=20)
Perm30-8                      175.7n ± 0%    194.6n ± 2%  +10.76% (p=0.000 n=20)
Perm30ViaShuffle-8            153.5n ± 0%    156.4n ± 0%   +1.86% (p=0.000 n=20)
ShuffleOverhead-8             119.8n ± 1%    125.8n ± 0%   +4.97% (p=0.000 n=20)
Concurrent-8                  2.433n ± 3%    2.654n ± 6%   +9.13% (p=0.001 n=20)
PCG_DXSM-8                                   2.531n ± 0%

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.386 │             afa459a2f0.386              │
                        │     sec/op     │    sec/op     vs base                   │
SourceUint64-32             2.370n ±  1%    7.680n ± 1%  +224.05% (p=0.000 n=20)
GlobalInt64-32              3.569n ±  1%    3.474n ± 3%    -2.66% (p=0.001 n=20)
GlobalInt63Parallel-32     0.3221n ±  1%
GlobalInt64Parallel-32                     0.3253n ± 0%
GlobalUint64-32             8.797n ± 10%    3.433n ± 2%   -60.98% (p=0.000 n=20)
GlobalUint64Parallel-32    0.6351n ±  0%   0.3156n ± 0%   -50.31% (p=0.000 n=20)
Int64-32                    2.612n ±  2%    7.707n ± 1%  +195.04% (p=0.000 n=20)
Uint64-32                   3.350n ±  1%    7.714n ± 1%  +130.25% (p=0.000 n=20)
GlobalIntN1000-32           5.892n ±  1%    6.236n ± 1%    +5.82% (p=0.000 n=20)
IntN1000-32                 4.546n ±  1%   10.410n ± 1%  +128.97% (p=0.000 n=20)
Int64N1000-32               14.59n ±  1%    10.97n ± 2%   -24.75% (p=0.000 n=20)
Int64N1e8-32                14.76n ±  2%    10.98n ± 1%   -25.58% (p=0.000 n=20)
Int64N1e9-32                16.57n ±  1%    10.95n ± 0%   -33.90% (p=0.000 n=20)
Int64N2e9-32                14.54n ±  1%    11.11n ± 1%   -23.62% (p=0.000 n=20)
Int64N1e18-32               16.14n ±  1%    15.18n ± 2%    -5.95% (p=0.000 n=20)
Int64N2e18-32               18.10n ±  1%    15.61n ± 1%   -13.73% (p=0.000 n=20)
Int64N4e18-32               18.65n ±  1%    19.23n ± 2%    +3.08% (p=0.000 n=20)
Int32N1000-32               3.560n ±  1%   10.345n ± 1%  +190.55% (p=0.000 n=20)
Int32N1e8-32                3.770n ±  2%   10.330n ± 1%  +174.01% (p=0.000 n=20)
Int32N1e9-32                4.098n ±  0%   10.350n ± 1%  +152.53% (p=0.000 n=20)
Int32N2e9-32                4.179n ±  1%   10.345n ± 1%  +147.52% (p=0.000 n=20)
Float32-32                  21.18n ±  4%    13.57n ± 1%   -35.93% (p=0.000 n=20)
Float64-32                  20.60n ±  2%    22.95n ± 4%   +11.41% (p=0.000 n=20)
ExpFloat64-32               13.07n ±  0%    15.23n ± 2%   +16.48% (p=0.000 n=20)
NormFloat64-32              7.738n ±  2%   13.780n ± 1%   +78.08% (p=0.000 n=20)
Perm3-32                    36.73n ±  1%    46.62n ± 2%   +26.91% (p=0.000 n=20)
Perm30-32                   211.9n ±  1%    400.7n ± 1%   +89.05% (p=0.000 n=20)
Perm30ViaShuffle-32         165.2n ±  1%    350.5n ± 1%  +112.20% (p=0.000 n=20)
ShuffleOverhead-32          133.9n ±  1%    326.0n ± 2%  +143.37% (p=0.000 n=20)
Concurrent-32               3.287n ±  2%    3.290n ± 0%         ~ (p=0.365 n=20)
PCG_DXSM-32                                 7.793n ± 2%

For #61716.

Change-Id: I4e9c0525b5f84a2ac46f23da9e365495e2d05777
Reviewed-on: https://go-review.googlesource.com/c/go/+/502506
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 17:09:26 +00:00
Russ Cox 8631fcbf31 math/rand/v2: add PCG-DXSM
For the original math/rand, we ported Plan 9's random number
generator, which was a refinement by Ken Thompson of an algorithm
by Don Mitchell and Jim Reeds, which Mitchell in turn recalls as
having been derived from an algorithm by Marsaglia. At its core,
it is an additive lagged Fibonacci generator (ALFG).

Whatever the details of the history, this generator is nowhere
near the current state of the art for simple, pseudo-random
generators.

This CL adds an implementation of Melissa O'Neill's PCG, specifically
the variant PCG-DXSM, which she defined after writing the PCG paper
and which is now the default in Numpy. The update is slightly slower
(a few multiplies and adds, instead of a few adds), but the state
is dramatically smaller (2 words instead of 607). The statistical
output properties are better too.

A followup CL will delete the old generator.

PCG is the only change here, so no benchmarks should be affected.
Including them anyway as further evidence for caution.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 8993506f2f.amd64 │           01ff938549.amd64           │
                        │      sec/op      │    sec/op     vs base                │
SourceUint64-32                1.325n ± 1%    1.352n ± 1%   +2.00% (p=0.000 n=20)
GlobalInt64-32                 2.240n ± 1%    2.083n ± 0%   -7.03% (p=0.000 n=20)
GlobalInt64Parallel-32        0.1041n ± 1%   0.1035n ± 1%        ~ (p=0.064 n=20)
GlobalUint64-32                2.072n ± 3%    2.038n ± 1%        ~ (p=0.089 n=20)
GlobalUint64Parallel-32       0.1008n ± 1%   0.1006n ± 1%        ~ (p=0.804 n=20)
Int64-32                       1.716n ± 1%    1.687n ± 2%        ~ (p=0.045 n=20)
Uint64-32                      1.665n ± 1%    1.674n ± 2%        ~ (p=0.878 n=20)
GlobalIntN1000-32              3.335n ± 1%    3.135n ± 1%   -6.00% (p=0.000 n=20)
IntN1000-32                    2.484n ± 1%    2.478n ± 1%        ~ (p=0.085 n=20)
Int64N1000-32                  2.502n ± 2%    2.455n ± 1%   -1.88% (p=0.002 n=20)
Int64N1e8-32                   2.484n ± 2%    2.467n ± 2%        ~ (p=0.048 n=20)
Int64N1e9-32                   2.502n ± 0%    2.454n ± 1%   -1.92% (p=0.000 n=20)
Int64N2e9-32                   2.502n ± 0%    2.482n ± 1%   -0.76% (p=0.000 n=20)
Int64N1e18-32                  3.201n ± 1%    3.349n ± 2%   +4.62% (p=0.000 n=20)
Int64N2e18-32                  3.504n ± 1%    3.537n ± 1%        ~ (p=0.185 n=20)
Int64N4e18-32                  4.873n ± 1%    4.917n ± 0%   +0.90% (p=0.000 n=20)
Int32N1000-32                  2.639n ± 1%    2.386n ± 1%   -9.57% (p=0.000 n=20)
Int32N1e8-32                   2.686n ± 2%    2.366n ± 1%  -11.91% (p=0.000 n=20)
Int32N1e9-32                   2.636n ± 1%    2.355n ± 2%  -10.70% (p=0.000 n=20)
Int32N2e9-32                   2.660n ± 1%    2.371n ± 1%  -10.88% (p=0.000 n=20)
Float32-32                     2.261n ± 1%    2.245n ± 2%        ~ (p=0.752 n=20)
Float64-32                     2.280n ± 1%    2.235n ± 1%   -1.97% (p=0.007 n=20)
ExpFloat64-32                  3.891n ± 1%    3.813n ± 3%        ~ (p=0.087 n=20)
NormFloat64-32                 3.711n ± 1%    3.652n ± 2%        ~ (p=0.021 n=20)
Perm3-32                       32.60n ± 2%    33.12n ± 3%        ~ (p=0.107 n=20)
Perm30-32                      204.2n ± 0%    205.1n ± 1%        ~ (p=0.358 n=20)
Perm30ViaShuffle-32            121.7n ± 2%    110.8n ± 1%   -8.96% (p=0.000 n=20)
ShuffleOverhead-32             106.2n ± 2%    113.0n ± 1%   +6.36% (p=0.000 n=20)
Concurrent-32                  2.190n ± 5%    2.100n ± 0%   -4.13% (p=0.001 n=20)
PCG_DXSM-32                                   1.490n ± 0%

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 8993506f2f.arm64 │           01ff938549.arm64           │
                       │      sec/op      │    sec/op     vs base                │
SourceUint64-8                2.271n ± 0%    2.258n ± 1%        ~ (p=0.167 n=20)
GlobalInt64-8                 2.161n ± 1%    2.167n ± 0%        ~ (p=0.693 n=20)
GlobalInt64Parallel-8        0.4303n ± 0%   0.4310n ± 0%        ~ (p=0.051 n=20)
GlobalUint64-8                2.164n ± 1%    2.182n ± 1%        ~ (p=0.042 n=20)
GlobalUint64Parallel-8       0.4287n ± 0%   0.4297n ± 0%        ~ (p=0.082 n=20)
Int64-8                       2.478n ± 1%    2.472n ± 1%        ~ (p=0.151 n=20)
Uint64-8                      2.460n ± 1%    2.449n ± 1%        ~ (p=0.013 n=20)
GlobalIntN1000-8              2.814n ± 2%    2.814n ± 2%        ~ (p=0.821 n=20)
IntN1000-8                    3.003n ± 2%    2.998n ± 2%        ~ (p=0.024 n=20)
Int64N1000-8                  2.954n ± 0%    2.949n ± 2%        ~ (p=0.192 n=20)
Int64N1e8-8                   2.956n ± 0%    2.953n ± 2%        ~ (p=0.109 n=20)
Int64N1e9-8                   3.325n ± 0%    2.950n ± 0%  -11.26% (p=0.000 n=20)
Int64N2e9-8                   2.956n ± 2%    2.946n ± 2%        ~ (p=0.027 n=20)
Int64N1e18-8                  3.780n ± 1%    3.779n ± 1%        ~ (p=0.815 n=20)
Int64N2e18-8                  4.385n ± 0%    4.370n ± 1%        ~ (p=0.402 n=20)
Int64N4e18-8                  6.527n ± 0%    6.544n ± 1%        ~ (p=0.140 n=20)
Int32N1000-8                  2.964n ± 1%    2.950n ± 0%   -0.47% (p=0.002 n=20)
Int32N1e8-8                   2.964n ± 1%    2.950n ± 2%        ~ (p=0.013 n=20)
Int32N1e9-8                   2.963n ± 2%    2.951n ± 2%        ~ (p=0.062 n=20)
Int32N2e9-8                   2.961n ± 2%    2.950n ± 2%   -0.37% (p=0.002 n=20)
Float32-8                     3.442n ± 0%    3.441n ± 0%        ~ (p=0.211 n=20)
Float64-8                     3.442n ± 0%    3.442n ± 0%        ~ (p=0.067 n=20)
ExpFloat64-8                  4.472n ± 0%    4.481n ± 0%   +0.20% (p=0.000 n=20)
NormFloat64-8                 4.734n ± 0%    4.725n ± 0%   -0.19% (p=0.003 n=20)
Perm3-8                       26.55n ± 0%    26.55n ± 0%        ~ (p=0.833 n=20)
Perm30-8                      181.9n ± 0%    181.9n ± 0%   -0.03% (p=0.004 n=20)
Perm30ViaShuffle-8            143.1n ± 0%    142.9n ± 0%        ~ (p=0.204 n=20)
ShuffleOverhead-8             120.6n ± 1%    120.8n ± 2%        ~ (p=0.102 n=20)
Concurrent-8                  2.357n ± 2%    2.421n ± 6%        ~ (p=0.016 n=20)
PCG_DXSM-8                                   2.531n ± 0%

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 8993506f2f.386 │           01ff938549.386            │
                        │     sec/op     │    sec/op     vs base               │
SourceUint64-32              2.102n ± 2%    2.069n ± 0%       ~ (p=0.021 n=20)
GlobalInt64-32               3.542n ± 2%    3.456n ± 1%  -2.44% (p=0.001 n=20)
GlobalInt64Parallel-32      0.3202n ± 0%   0.3252n ± 0%  +1.56% (p=0.000 n=20)
GlobalUint64-32              3.507n ± 1%    3.573n ± 1%  +1.87% (p=0.000 n=20)
GlobalUint64Parallel-32     0.3170n ± 1%   0.3159n ± 0%       ~ (p=0.167 n=20)
Int64-32                     2.516n ± 1%    2.562n ± 2%       ~ (p=0.016 n=20)
Uint64-32                    2.544n ± 1%    2.592n ± 0%  +1.85% (p=0.000 n=20)
GlobalIntN1000-32            6.237n ± 1%    6.266n ± 2%       ~ (p=0.268 n=20)
IntN1000-32                  4.670n ± 2%    4.724n ± 2%       ~ (p=0.644 n=20)
Int64N1000-32                5.412n ± 1%    5.490n ± 2%       ~ (p=0.159 n=20)
Int64N1e8-32                 5.414n ± 2%    5.513n ± 2%       ~ (p=0.129 n=20)
Int64N1e9-32                 5.473n ± 1%    5.476n ± 1%       ~ (p=0.723 n=20)
Int64N2e9-32                 5.487n ± 1%    5.501n ± 2%       ~ (p=0.481 n=20)
Int64N1e18-32                8.901n ± 2%    9.043n ± 2%       ~ (p=0.330 n=20)
Int64N2e18-32                9.521n ± 1%    9.601n ± 2%       ~ (p=0.703 n=20)
Int64N4e18-32                11.92n ± 1%    12.00n ± 1%       ~ (p=0.489 n=20)
Int32N1000-32                4.785n ± 1%    4.829n ± 2%       ~ (p=0.402 n=20)
Int32N1e8-32                 4.748n ± 1%    4.825n ± 2%       ~ (p=0.218 n=20)
Int32N1e9-32                 4.810n ± 1%    4.830n ± 2%       ~ (p=0.794 n=20)
Int32N2e9-32                 4.812n ± 1%    4.750n ± 2%       ~ (p=0.057 n=20)
Float32-32                   10.48n ± 4%    10.89n ± 4%       ~ (p=0.162 n=20)
Float64-32                   19.79n ± 3%    19.60n ± 4%       ~ (p=0.668 n=20)
ExpFloat64-32                12.91n ± 3%    12.96n ± 3%       ~ (p=1.000 n=20)
NormFloat64-32               7.462n ± 1%    7.516n ± 1%       ~ (p=0.051 n=20)
Perm3-32                     35.98n ± 2%    36.78n ± 2%       ~ (p=0.033 n=20)
Perm30-32                    241.5n ± 1%    238.9n ± 2%       ~ (p=0.126 n=20)
Perm30ViaShuffle-32          187.3n ± 2%    189.7n ± 2%       ~ (p=0.387 n=20)
ShuffleOverhead-32           160.2n ± 1%    159.8n ± 1%       ~ (p=0.256 n=20)
Concurrent-32                3.308n ± 3%    3.286n ± 1%       ~ (p=0.038 n=20)
PCG_DXSM-32                                 7.613n ± 1%

For #61716.

Change-Id: Icb274ca1f782504d658305a40159b4ae6a2f3f1d
Reviewed-on: https://go-review.googlesource.com/c/go/+/502505
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rob Pike <r@golang.org>
2023-10-30 17:09:23 +00:00
Russ Cox f2e2637227 math/rand/v2: simplify Perm
The compiler says Perm is being inlined into BenchmarkPerm,
and yet BenchmarkPerm30ViaShuffle, which you'd think is the
same code, still runs significantly faster.

The benchmarks are mystifying but this is clearly still a step in
the right direction, since BenchmarkPerm30ViaShuffle is still
the fastest and we avoid having two copies of that logic.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ e1bbe739fb.amd64 │           8993506f2f.amd64           │
                        │      sec/op      │    sec/op     vs base                │
SourceUint64-32                1.316n ± 2%    1.325n ± 1%        ~ (p=0.208 n=20)
GlobalInt64-32                 2.048n ± 1%    2.240n ± 1%   +9.38% (p=0.000 n=20)
GlobalInt64Parallel-32        0.1037n ± 1%   0.1041n ± 1%        ~ (p=0.774 n=20)
GlobalUint64-32                2.039n ± 2%    2.072n ± 3%        ~ (p=0.115 n=20)
GlobalUint64Parallel-32       0.1013n ± 1%   0.1008n ± 1%        ~ (p=0.417 n=20)
Int64-32                       1.692n ± 2%    1.716n ± 1%        ~ (p=0.122 n=20)
Uint64-32                      1.643n ± 2%    1.665n ± 1%        ~ (p=0.062 n=20)
GlobalIntN1000-32              3.287n ± 1%    3.335n ± 1%        ~ (p=0.147 n=20)
IntN1000-32                    2.678n ± 2%    2.484n ± 1%   -7.24% (p=0.000 n=20)
Int64N1000-32                  2.684n ± 2%    2.502n ± 2%   -6.80% (p=0.000 n=20)
Int64N1e8-32                   2.663n ± 2%    2.484n ± 2%   -6.76% (p=0.000 n=20)
Int64N1e9-32                   2.633n ± 1%    2.502n ± 0%   -4.98% (p=0.000 n=20)
Int64N2e9-32                   2.657n ± 1%    2.502n ± 0%   -5.87% (p=0.000 n=20)
Int64N1e18-32                  3.125n ± 2%    3.201n ± 1%   +2.43% (p=0.000 n=20)
Int64N2e18-32                  3.476n ± 1%    3.504n ± 1%   +0.83% (p=0.009 n=20)
Int64N4e18-32                  4.795n ± 1%    4.873n ± 1%        ~ (p=0.106 n=20)
Int32N1000-32                  2.485n ± 2%    2.639n ± 1%   +6.20% (p=0.000 n=20)
Int32N1e8-32                   2.457n ± 1%    2.686n ± 2%   +9.34% (p=0.000 n=20)
Int32N1e9-32                   2.452n ± 1%    2.636n ± 1%   +7.52% (p=0.000 n=20)
Int32N2e9-32                   2.453n ± 1%    2.660n ± 1%   +8.44% (p=0.000 n=20)
Float32-32                     2.254n ± 1%    2.261n ± 1%        ~ (p=0.888 n=20)
Float64-32                     2.262n ± 1%    2.280n ± 1%        ~ (p=0.040 n=20)
ExpFloat64-32                  3.777n ± 2%    3.891n ± 1%   +3.03% (p=0.000 n=20)
NormFloat64-32                 3.606n ± 1%    3.711n ± 1%   +2.91% (p=0.000 n=20)
Perm3-32                       33.12n ± 2%    32.60n ± 2%        ~ (p=0.045 n=20)
Perm30-32                      176.1n ± 1%    204.2n ± 0%  +15.96% (p=0.000 n=20)
Perm30ViaShuffle-32            109.3n ± 1%    121.7n ± 2%  +11.30% (p=0.000 n=20)
ShuffleOverhead-32             112.5n ± 1%    106.2n ± 2%   -5.56% (p=0.000 n=20)
Concurrent-32                  2.099n ± 0%    2.190n ± 5%   +4.36% (p=0.001 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ e1bbe739fb.arm64 │           8993506f2f.arm64           │
                       │      sec/op      │    sec/op     vs base                │
SourceUint64-8                2.290n ± 1%    2.271n ± 0%        ~ (p=0.015 n=20)
GlobalInt64-8                 2.180n ± 1%    2.161n ± 1%        ~ (p=0.180 n=20)
GlobalInt64Parallel-8        0.4294n ± 0%   0.4303n ± 0%   +0.19% (p=0.001 n=20)
GlobalUint64-8                2.170n ± 1%    2.164n ± 1%        ~ (p=0.673 n=20)
GlobalUint64Parallel-8       0.4283n ± 0%   0.4287n ± 0%        ~ (p=0.128 n=20)
Int64-8                       2.481n ± 1%    2.478n ± 1%        ~ (p=0.867 n=20)
Uint64-8                      2.464n ± 1%    2.460n ± 1%        ~ (p=0.763 n=20)
GlobalIntN1000-8              2.814n ± 0%    2.814n ± 2%        ~ (p=0.969 n=20)
IntN1000-8                    2.934n ± 2%    3.003n ± 2%   +2.35% (p=0.000 n=20)
Int64N1000-8                  2.957n ± 1%    2.954n ± 0%        ~ (p=0.285 n=20)
Int64N1e8-8                   2.935n ± 2%    2.956n ± 0%   +0.73% (p=0.002 n=20)
Int64N1e9-8                   2.935n ± 2%    3.325n ± 0%  +13.29% (p=0.000 n=20)
Int64N2e9-8                   2.933n ± 4%    2.956n ± 2%        ~ (p=0.163 n=20)
Int64N1e18-8                  3.781n ± 1%    3.780n ± 1%        ~ (p=0.805 n=20)
Int64N2e18-8                  4.362n ± 0%    4.385n ± 0%        ~ (p=0.077 n=20)
Int64N4e18-8                  6.576n ± 1%    6.527n ± 0%        ~ (p=0.024 n=20)
Int32N1000-8                  2.942n ± 2%    2.964n ± 1%        ~ (p=0.073 n=20)
Int32N1e8-8                   2.941n ± 1%    2.964n ± 1%        ~ (p=0.058 n=20)
Int32N1e9-8                   2.938n ± 2%    2.963n ± 2%   +0.87% (p=0.003 n=20)
Int32N2e9-8                   2.982n ± 2%    2.961n ± 2%        ~ (p=0.056 n=20)
Float32-8                     3.441n ± 0%    3.442n ± 0%        ~ (p=0.030 n=20)
Float64-8                     3.441n ± 0%    3.442n ± 0%   +0.03% (p=0.001 n=20)
ExpFloat64-8                  4.472n ± 0%    4.472n ± 0%        ~ (p=0.877 n=20)
NormFloat64-8                 4.716n ± 0%    4.734n ± 0%   +0.38% (p=0.000 n=20)
Perm3-8                       26.66n ± 0%    26.55n ± 0%   -0.39% (p=0.000 n=20)
Perm30-8                      143.3n ± 0%    181.9n ± 0%  +26.97% (p=0.000 n=20)
Perm30ViaShuffle-8            142.9n ± 0%    143.1n ± 0%        ~ (p=0.669 n=20)
ShuffleOverhead-8             121.1n ± 1%    120.6n ± 1%   -0.41% (p=0.004 n=20)
Concurrent-8                  2.379n ± 2%    2.357n ± 2%        ~ (p=0.337 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ e1bbe739fb.386 │            8993506f2f.386            │
                        │     sec/op     │    sec/op     vs base                │
SourceUint64-32              2.087n ± 1%    2.102n ± 2%        ~ (p=0.507 n=20)
GlobalInt64-32               3.538n ± 2%    3.542n ± 2%        ~ (p=0.425 n=20)
GlobalInt64Parallel-32      0.3207n ± 1%   0.3202n ± 0%        ~ (p=0.963 n=20)
GlobalUint64-32              3.543n ± 1%    3.507n ± 1%        ~ (p=0.034 n=20)
GlobalUint64Parallel-32     0.3170n ± 0%   0.3170n ± 1%        ~ (p=0.920 n=20)
Int64-32                     2.548n ± 1%    2.516n ± 1%        ~ (p=0.139 n=20)
Uint64-32                    2.565n ± 2%    2.544n ± 1%        ~ (p=0.394 n=20)
GlobalIntN1000-32            6.300n ± 1%    6.237n ± 1%        ~ (p=0.029 n=20)
IntN1000-32                  4.750n ± 0%    4.670n ± 2%        ~ (p=0.034 n=20)
Int64N1000-32                5.515n ± 2%    5.412n ± 1%   -1.86% (p=0.009 n=20)
Int64N1e8-32                 5.527n ± 0%    5.414n ± 2%   -2.05% (p=0.002 n=20)
Int64N1e9-32                 5.531n ± 2%    5.473n ± 1%        ~ (p=0.047 n=20)
Int64N2e9-32                 5.514n ± 2%    5.487n ± 1%        ~ (p=0.298 n=20)
Int64N1e18-32                9.059n ± 1%    8.901n ± 2%        ~ (p=0.037 n=20)
Int64N2e18-32                9.594n ± 1%    9.521n ± 1%        ~ (p=0.051 n=20)
Int64N4e18-32                12.05n ± 2%    11.92n ± 1%        ~ (p=0.357 n=20)
Int32N1000-32                4.840n ± 2%    4.785n ± 1%        ~ (p=0.189 n=20)
Int32N1e8-32                 4.832n ± 2%    4.748n ± 1%        ~ (p=0.042 n=20)
Int32N1e9-32                 4.815n ± 2%    4.810n ± 1%        ~ (p=0.878 n=20)
Int32N2e9-32                 4.813n ± 1%    4.812n ± 1%        ~ (p=0.542 n=20)
Float32-32                   10.90n ± 2%    10.48n ± 4%   -3.85% (p=0.007 n=20)
Float64-32                   20.32n ± 4%    19.79n ± 3%        ~ (p=0.553 n=20)
ExpFloat64-32                12.95n ± 3%    12.91n ± 3%        ~ (p=0.909 n=20)
NormFloat64-32               7.570n ± 1%    7.462n ± 1%   -1.44% (p=0.004 n=20)
Perm3-32                     37.80n ± 2%    35.98n ± 2%   -4.79% (p=0.000 n=20)
Perm30-32                    214.0n ± 1%    241.5n ± 1%  +12.85% (p=0.000 n=20)
Perm30ViaShuffle-32          188.7n ± 2%    187.3n ± 2%        ~ (p=0.029 n=20)
ShuffleOverhead-32           160.8n ± 1%    160.2n ± 1%        ~ (p=0.180 n=20)
Concurrent-32                3.288n ± 0%    3.308n ± 3%        ~ (p=0.037 n=20)

For #61716.

Change-Id: I342b611456c3569520d3c91c849d29eba325d87e
Reviewed-on: https://go-review.googlesource.com/c/go/+/502504
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Rob Pike <r@golang.org>
2023-10-30 17:09:21 +00:00
Branden Brown 488e2a56b9 math/rand/v2: remove bias in ExpFloat64 and NormFloat64
The original implementation of the ziggurat algorithm was designed for
32-bit random integer inputs. This necessitated reusing some low-order
bits for the slice selection and the random coordinate, which introduces
statistical bias. The result is that PractRand consistently fails the
math/rand normal and exponential sequences (transformed to uniform)
within 2 GB of variates.

This change adjusts the ziggurat procedures to use 63-bit random inputs,
so that there is no need to reuse bits between the slice and coordinate.
This is sufficient for the normal sequence to survive to 256 GB of
PractRand testing.

An alternative technique is to recalculate the ziggurats to use 1024
rather than 128 or 256 slices to make full use of 64-bit inputs. This
improves the survival of the normal sequence to far beyond 256 GB and
additionally provides a 6% performance improvement due to the improved
rejection procedure efficiency. However, doing so increases the total
size of the ziggurat tables from 4.5 kB to 48 kB.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 2703446c2e.amd64 │           e1bbe739fb.amd64           │
                        │      sec/op      │    sec/op     vs base                │
SourceUint64-32                1.337n ± 1%    1.316n ± 2%        ~ (p=0.024 n=20)
GlobalInt64-32                 2.225n ± 2%    2.048n ± 1%   -7.93% (p=0.000 n=20)
GlobalInt64Parallel-32        0.1043n ± 2%   0.1037n ± 1%        ~ (p=0.587 n=20)
GlobalUint64-32                2.058n ± 1%    2.039n ± 2%        ~ (p=0.030 n=20)
GlobalUint64Parallel-32       0.1009n ± 1%   0.1013n ± 1%        ~ (p=0.984 n=20)
Int64-32                       1.719n ± 2%    1.692n ± 2%        ~ (p=0.085 n=20)
Uint64-32                      1.669n ± 1%    1.643n ± 2%        ~ (p=0.049 n=20)
GlobalIntN1000-32              3.321n ± 2%    3.287n ± 1%        ~ (p=0.298 n=20)
IntN1000-32                    2.479n ± 1%    2.678n ± 2%   +8.01% (p=0.000 n=20)
Int64N1000-32                  2.477n ± 1%    2.684n ± 2%   +8.38% (p=0.000 n=20)
Int64N1e8-32                   2.490n ± 1%    2.663n ± 2%   +6.99% (p=0.000 n=20)
Int64N1e9-32                   2.458n ± 1%    2.633n ± 1%   +7.12% (p=0.000 n=20)
Int64N2e9-32                   2.486n ± 2%    2.657n ± 1%   +6.90% (p=0.000 n=20)
Int64N1e18-32                  3.215n ± 2%    3.125n ± 2%   -2.78% (p=0.000 n=20)
Int64N2e18-32                  3.588n ± 2%    3.476n ± 1%   -3.15% (p=0.000 n=20)
Int64N4e18-32                  4.938n ± 2%    4.795n ± 1%   -2.91% (p=0.000 n=20)
Int32N1000-32                  2.673n ± 2%    2.485n ± 2%   -7.02% (p=0.000 n=20)
Int32N1e8-32                   2.631n ± 2%    2.457n ± 1%   -6.63% (p=0.000 n=20)
Int32N1e9-32                   2.628n ± 2%    2.452n ± 1%   -6.70% (p=0.000 n=20)
Int32N2e9-32                   2.684n ± 2%    2.453n ± 1%   -8.61% (p=0.000 n=20)
Float32-32                     2.240n ± 2%    2.254n ± 1%        ~ (p=0.878 n=20)
Float64-32                     2.253n ± 1%    2.262n ± 1%        ~ (p=0.963 n=20)
ExpFloat64-32                  3.677n ± 1%    3.777n ± 2%   +2.71% (p=0.004 n=20)
NormFloat64-32                 3.761n ± 1%    3.606n ± 1%   -4.15% (p=0.000 n=20)
Perm3-32                       33.55n ± 2%    33.12n ± 2%        ~ (p=0.402 n=20)
Perm30-32                      173.2n ± 1%    176.1n ± 1%   +1.67% (p=0.000 n=20)
Perm30ViaShuffle-32            115.9n ± 1%    109.3n ± 1%   -5.69% (p=0.000 n=20)
ShuffleOverhead-32             101.9n ± 1%    112.5n ± 1%  +10.35% (p=0.000 n=20)
Concurrent-32                  2.107n ± 6%    2.099n ± 0%        ~ (p=0.051 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 2703446c2e.arm64 │          e1bbe739fb.arm64           │
                       │      sec/op      │    sec/op     vs base               │
SourceUint64-8                2.275n ± 0%    2.290n ± 1%       ~ (p=0.044 n=20)
GlobalInt64-8                 2.154n ± 1%    2.180n ± 1%       ~ (p=0.068 n=20)
GlobalInt64Parallel-8        0.4298n ± 0%   0.4294n ± 0%       ~ (p=0.079 n=20)
GlobalUint64-8                2.160n ± 1%    2.170n ± 1%       ~ (p=0.129 n=20)
GlobalUint64Parallel-8       0.4286n ± 0%   0.4283n ± 0%       ~ (p=0.350 n=20)
Int64-8                       2.491n ± 1%    2.481n ± 1%       ~ (p=0.330 n=20)
Uint64-8                      2.458n ± 0%    2.464n ± 1%       ~ (p=0.351 n=20)
GlobalIntN1000-8              2.814n ± 2%    2.814n ± 0%       ~ (p=0.325 n=20)
IntN1000-8                    2.933n ± 0%    2.934n ± 2%       ~ (p=0.079 n=20)
Int64N1000-8                  2.962n ± 1%    2.957n ± 1%       ~ (p=0.259 n=20)
Int64N1e8-8                   2.960n ± 1%    2.935n ± 2%       ~ (p=0.276 n=20)
Int64N1e9-8                   2.935n ± 2%    2.935n ± 2%       ~ (p=0.984 n=20)
Int64N2e9-8                   2.934n ± 0%    2.933n ± 4%       ~ (p=0.463 n=20)
Int64N1e18-8                  3.777n ± 1%    3.781n ± 1%       ~ (p=0.516 n=20)
Int64N2e18-8                  4.359n ± 1%    4.362n ± 0%       ~ (p=0.256 n=20)
Int64N4e18-8                  6.536n ± 1%    6.576n ± 1%       ~ (p=0.224 n=20)
Int32N1000-8                  2.937n ± 0%    2.942n ± 2%       ~ (p=0.312 n=20)
Int32N1e8-8                   2.937n ± 1%    2.941n ± 1%       ~ (p=0.463 n=20)
Int32N1e9-8                   2.936n ± 0%    2.938n ± 2%       ~ (p=0.044 n=20)
Int32N2e9-8                   2.938n ± 2%    2.982n ± 2%       ~ (p=0.174 n=20)
Float32-8                     3.441n ± 0%    3.441n ± 0%       ~ (p=0.064 n=20)
Float64-8                     3.441n ± 0%    3.441n ± 0%       ~ (p=0.826 n=20)
ExpFloat64-8                  4.486n ± 0%    4.472n ± 0%  -0.31% (p=0.000 n=20)
NormFloat64-8                 4.721n ± 0%    4.716n ± 0%       ~ (p=0.051 n=20)
Perm3-8                       26.65n ± 0%    26.66n ± 0%       ~ (p=0.080 n=20)
Perm30-8                      143.2n ± 0%    143.3n ± 0%  +0.10% (p=0.000 n=20)
Perm30ViaShuffle-8            143.0n ± 0%    142.9n ± 0%       ~ (p=0.642 n=20)
ShuffleOverhead-8             120.6n ± 1%    121.1n ± 1%  +0.41% (p=0.010 n=20)
Concurrent-8                  2.399n ± 5%    2.379n ± 2%       ~ (p=0.365 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 2703446c2e.386 │           e1bbe739fb.386            │
                        │     sec/op     │    sec/op     vs base               │
SourceUint64-32             2.072n ±  2%    2.087n ± 1%       ~ (p=0.440 n=20)
GlobalInt64-32              3.546n ± 27%    3.538n ± 2%       ~ (p=0.101 n=20)
GlobalInt64Parallel-32     0.3211n ±  0%   0.3207n ± 1%       ~ (p=0.753 n=20)
GlobalUint64-32             3.522n ±  2%    3.543n ± 1%       ~ (p=0.071 n=20)
GlobalUint64Parallel-32    0.3172n ±  0%   0.3170n ± 0%       ~ (p=0.507 n=20)
Int64-32                    2.520n ±  2%    2.548n ± 1%       ~ (p=0.267 n=20)
Uint64-32                   2.581n ±  1%    2.565n ± 2%       ~ (p=0.143 n=20)
GlobalIntN1000-32           6.171n ±  1%    6.300n ± 1%       ~ (p=0.037 n=20)
IntN1000-32                 4.752n ±  2%    4.750n ± 0%       ~ (p=0.984 n=20)
Int64N1000-32               5.429n ±  1%    5.515n ± 2%       ~ (p=0.292 n=20)
Int64N1e8-32                5.469n ±  2%    5.527n ± 0%       ~ (p=0.013 n=20)
Int64N1e9-32                5.489n ±  2%    5.531n ± 2%       ~ (p=0.256 n=20)
Int64N2e9-32                5.492n ±  2%    5.514n ± 2%       ~ (p=0.606 n=20)
Int64N1e18-32               8.927n ±  1%    9.059n ± 1%       ~ (p=0.229 n=20)
Int64N2e18-32               9.622n ±  1%    9.594n ± 1%       ~ (p=0.703 n=20)
Int64N4e18-32               12.03n ±  1%    12.05n ± 2%       ~ (p=0.733 n=20)
Int32N1000-32               4.817n ±  1%    4.840n ± 2%       ~ (p=0.941 n=20)
Int32N1e8-32                4.801n ±  1%    4.832n ± 2%       ~ (p=0.228 n=20)
Int32N1e9-32                4.798n ±  1%    4.815n ± 2%       ~ (p=0.560 n=20)
Int32N2e9-32                4.840n ±  1%    4.813n ± 1%       ~ (p=0.015 n=20)
Float32-32                  10.51n ±  4%    10.90n ± 2%  +3.71% (p=0.007 n=20)
Float64-32                  20.33n ±  3%    20.32n ± 4%       ~ (p=0.566 n=20)
ExpFloat64-32               12.59n ±  2%    12.95n ± 3%  +2.86% (p=0.002 n=20)
NormFloat64-32              7.350n ±  2%    7.570n ± 1%  +2.99% (p=0.007 n=20)
Perm3-32                    39.29n ±  2%    37.80n ± 2%  -3.79% (p=0.000 n=20)
Perm30-32                   219.1n ±  2%    214.0n ± 1%  -2.33% (p=0.002 n=20)
Perm30ViaShuffle-32         189.8n ±  2%    188.7n ± 2%       ~ (p=0.147 n=20)
ShuffleOverhead-32          158.9n ±  2%    160.8n ± 1%       ~ (p=0.176 n=20)
Concurrent-32               3.306n ±  3%    3.288n ± 0%  -0.54% (p=0.005 n=20)

For #61716.

Change-Id: I4c5fe710b310dc075ae21c97d1805bcc20db5050
Reviewed-on: https://go-review.googlesource.com/c/go/+/516275
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
2023-10-30 17:08:47 +00:00
Russ Cox ecda959b99 math/rand/v2: optimize Float32, Float64
We realized too late after Go 1 that float64(r.Uint64())/(1<<64)
is not a correct implementation: it occasionally rounds to 1.
The correct implementation is float64(r.Uint64()&(1<<53-1))/(1<<53)
but we couldn't change the implementation for compatibility, so we
changed it to retry only in the "round to 1" cases.

The change to v2 lets us update the algorithm to the simpler,
faster one.

Note that this implementation cannot generate 2⁻⁵⁴, nor 2⁻¹⁰⁰,
nor any of the other numbers between 0 and 2⁻⁵³. A slower algorithm
could shift some of the probability of generating these two boundary
values over to the values in between, but that would be much slower
and not necessarily be better. In particular, the current
implementation has the property that there are uniform gaps between
the possible returned floats, which might help stability. Also, the
result is often scaled and shifted, like Float64()*X+Y. Multiplying by
X>1 would open new gaps, and adding most Y would erase all the
distinctions that were introduced.

The only changes to benchmarks should be in Float32 and Float64.
The other changes remain a cautionary tale.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 4d84a369d1.amd64 │           2703446c2e.amd64           │
                        │      sec/op      │    sec/op     vs base                │
SourceUint64-32                1.348n ± 2%    1.337n ± 1%        ~ (p=0.662 n=20)
GlobalInt64-32                 2.082n ± 2%    2.225n ± 2%   +6.87% (p=0.000 n=20)
GlobalInt64Parallel-32        0.1036n ± 1%   0.1043n ± 2%        ~ (p=0.171 n=20)
GlobalUint64-32                2.077n ± 2%    2.058n ± 1%        ~ (p=0.560 n=20)
GlobalUint64Parallel-32       0.1012n ± 1%   0.1009n ± 1%        ~ (p=0.995 n=20)
Int64-32                       1.750n ± 0%    1.719n ± 2%   -1.74% (p=0.000 n=20)
Uint64-32                      1.707n ± 2%    1.669n ± 1%   -2.20% (p=0.000 n=20)
GlobalIntN1000-32              3.192n ± 1%    3.321n ± 2%   +4.04% (p=0.000 n=20)
IntN1000-32                    2.462n ± 2%    2.479n ± 1%        ~ (p=0.417 n=20)
Int64N1000-32                  2.470n ± 1%    2.477n ± 1%        ~ (p=0.664 n=20)
Int64N1e8-32                   2.503n ± 2%    2.490n ± 1%        ~ (p=0.245 n=20)
Int64N1e9-32                   2.487n ± 1%    2.458n ± 1%        ~ (p=0.032 n=20)
Int64N2e9-32                   2.487n ± 1%    2.486n ± 2%        ~ (p=0.507 n=20)
Int64N1e18-32                  3.006n ± 2%    3.215n ± 2%   +6.94% (p=0.000 n=20)
Int64N2e18-32                  3.368n ± 1%    3.588n ± 2%   +6.55% (p=0.000 n=20)
Int64N4e18-32                  4.763n ± 1%    4.938n ± 2%   +3.69% (p=0.000 n=20)
Int32N1000-32                  2.403n ± 1%    2.673n ± 2%  +11.19% (p=0.000 n=20)
Int32N1e8-32                   2.405n ± 1%    2.631n ± 2%   +9.42% (p=0.000 n=20)
Int32N1e9-32                   2.402n ± 2%    2.628n ± 2%   +9.41% (p=0.000 n=20)
Int32N2e9-32                   2.384n ± 1%    2.684n ± 2%  +12.56% (p=0.000 n=20)
Float32-32                     2.641n ± 2%    2.240n ± 2%  -15.18% (p=0.000 n=20)
Float64-32                     2.483n ± 1%    2.253n ± 1%   -9.26% (p=0.000 n=20)
ExpFloat64-32                  3.486n ± 2%    3.677n ± 1%   +5.49% (p=0.000 n=20)
NormFloat64-32                 3.648n ± 1%    3.761n ± 1%   +3.11% (p=0.000 n=20)
Perm3-32                       33.04n ± 1%    33.55n ± 2%        ~ (p=0.180 n=20)
Perm30-32                      171.9n ± 1%    173.2n ± 1%        ~ (p=0.050 n=20)
Perm30ViaShuffle-32            100.3n ± 1%    115.9n ± 1%  +15.55% (p=0.000 n=20)
ShuffleOverhead-32             102.5n ± 1%    101.9n ± 1%        ~ (p=0.266 n=20)
Concurrent-32                  2.101n ± 0%    2.107n ± 6%        ~ (p=0.212 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 4d84a369d1.arm64 │          2703446c2e.arm64           │
                       │      sec/op      │    sec/op     vs base               │
SourceUint64-8                2.261n ± 1%    2.275n ± 0%       ~ (p=0.082 n=20)
GlobalInt64-8                 2.160n ± 1%    2.154n ± 1%       ~ (p=0.490 n=20)
GlobalInt64Parallel-8        0.4299n ± 0%   0.4298n ± 0%       ~ (p=0.663 n=20)
GlobalUint64-8                2.169n ± 1%    2.160n ± 1%       ~ (p=0.292 n=20)
GlobalUint64Parallel-8       0.4293n ± 1%   0.4286n ± 0%       ~ (p=0.155 n=20)
Int64-8                       2.473n ± 1%    2.491n ± 1%       ~ (p=0.317 n=20)
Uint64-8                      2.453n ± 1%    2.458n ± 0%       ~ (p=0.941 n=20)
GlobalIntN1000-8              2.814n ± 2%    2.814n ± 2%       ~ (p=0.972 n=20)
IntN1000-8                    2.933n ± 2%    2.933n ± 0%       ~ (p=0.287 n=20)
Int64N1000-8                  2.934n ± 2%    2.962n ± 1%       ~ (p=0.062 n=20)
Int64N1e8-8                   2.935n ± 2%    2.960n ± 1%       ~ (p=0.183 n=20)
Int64N1e9-8                   2.934n ± 2%    2.935n ± 2%       ~ (p=0.367 n=20)
Int64N2e9-8                   2.935n ± 2%    2.934n ± 0%       ~ (p=0.455 n=20)
Int64N1e18-8                  3.778n ± 1%    3.777n ± 1%       ~ (p=0.995 n=20)
Int64N2e18-8                  4.359n ± 1%    4.359n ± 1%       ~ (p=0.122 n=20)
Int64N4e18-8                  6.546n ± 1%    6.536n ± 1%       ~ (p=0.920 n=20)
Int32N1000-8                  2.940n ± 2%    2.937n ± 0%       ~ (p=0.149 n=20)
Int32N1e8-8                   2.937n ± 2%    2.937n ± 1%       ~ (p=0.620 n=20)
Int32N1e9-8                   2.938n ± 0%    2.936n ± 0%       ~ (p=0.046 n=20)
Int32N2e9-8                   2.938n ± 2%    2.938n ± 2%       ~ (p=0.455 n=20)
Float32-8                     3.486n ± 0%    3.441n ± 0%  -1.28% (p=0.000 n=20)
Float64-8                     3.480n ± 0%    3.441n ± 0%  -1.13% (p=0.000 n=20)
ExpFloat64-8                  4.533n ± 0%    4.486n ± 0%  -1.03% (p=0.000 n=20)
NormFloat64-8                 4.764n ± 0%    4.721n ± 0%  -0.90% (p=0.000 n=20)
Perm3-8                       26.66n ± 0%    26.65n ± 0%       ~ (p=0.019 n=20)
Perm30-8                      143.4n ± 0%    143.2n ± 0%  -0.17% (p=0.000 n=20)
Perm30ViaShuffle-8            142.9n ± 0%    143.0n ± 0%       ~ (p=0.522 n=20)
ShuffleOverhead-8             120.7n ± 0%    120.6n ± 1%       ~ (p=0.488 n=20)
Concurrent-8                  2.360n ± 2%    2.399n ± 5%       ~ (p=0.062 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 4d84a369d1.386 │            2703446c2e.386             │
                        │     sec/op     │    sec/op      vs base                │
SourceUint64-32              2.101n ± 2%    2.072n ±  2%        ~ (p=0.273 n=20)
GlobalInt64-32               3.518n ± 2%    3.546n ± 27%   +0.78% (p=0.007 n=20)
GlobalInt64Parallel-32      0.3206n ± 0%   0.3211n ±  0%        ~ (p=0.386 n=20)
GlobalUint64-32              3.538n ± 1%    3.522n ±  2%        ~ (p=0.331 n=20)
GlobalUint64Parallel-32     0.3231n ± 0%   0.3172n ±  0%   -1.84% (p=0.000 n=20)
Int64-32                     2.554n ± 2%    2.520n ±  2%        ~ (p=0.465 n=20)
Uint64-32                    2.575n ± 2%    2.581n ±  1%        ~ (p=0.213 n=20)
GlobalIntN1000-32            6.292n ± 1%    6.171n ±  1%        ~ (p=0.015 n=20)
IntN1000-32                  4.735n ± 1%    4.752n ±  2%        ~ (p=0.635 n=20)
Int64N1000-32                5.489n ± 2%    5.429n ±  1%        ~ (p=0.324 n=20)
Int64N1e8-32                 5.528n ± 2%    5.469n ±  2%        ~ (p=0.013 n=20)
Int64N1e9-32                 5.438n ± 2%    5.489n ±  2%        ~ (p=0.984 n=20)
Int64N2e9-32                 5.474n ± 1%    5.492n ±  2%        ~ (p=0.616 n=20)
Int64N1e18-32                9.053n ± 1%    8.927n ±  1%        ~ (p=0.037 n=20)
Int64N2e18-32                9.685n ± 2%    9.622n ±  1%        ~ (p=0.449 n=20)
Int64N4e18-32                12.18n ± 1%    12.03n ±  1%        ~ (p=0.013 n=20)
Int32N1000-32                4.862n ± 1%    4.817n ±  1%   -0.94% (p=0.002 n=20)
Int32N1e8-32                 4.758n ± 2%    4.801n ±  1%        ~ (p=0.597 n=20)
Int32N1e9-32                 4.772n ± 1%    4.798n ±  1%        ~ (p=0.774 n=20)
Int32N2e9-32                 4.847n ± 0%    4.840n ±  1%        ~ (p=0.867 n=20)
Float32-32                   22.18n ± 4%    10.51n ±  4%  -52.61% (p=0.000 n=20)
Float64-32                   21.21n ± 3%    20.33n ±  3%   -4.17% (p=0.000 n=20)
ExpFloat64-32                12.39n ± 2%    12.59n ±  2%        ~ (p=0.139 n=20)
NormFloat64-32               7.422n ± 1%    7.350n ±  2%        ~ (p=0.208 n=20)
Perm3-32                     38.00n ± 2%    39.29n ±  2%   +3.38% (p=0.000 n=20)
Perm30-32                    212.7n ± 1%    219.1n ±  2%   +3.03% (p=0.001 n=20)
Perm30ViaShuffle-32          187.5n ± 2%    189.8n ±  2%        ~ (p=0.457 n=20)
ShuffleOverhead-32           159.7n ± 1%    158.9n ±  2%        ~ (p=0.920 n=20)
Concurrent-32                3.470n ± 0%    3.306n ±  3%   -4.71% (p=0.000 n=20)

For #61716.

Change-Id: I1933f1f9efd7e6e832d83e7fa5d84398f67d41f5
Reviewed-on: https://go-review.googlesource.com/c/go/+/502503
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 17:08:40 +00:00
Russ Cox c266587846 math/rand/v2: add, optimize N, UintN, Uint32N, Uint64N
Now that we can break the value stream, we can take advantage
of better algorithms that have been suggested since the original
code was written.

Also optimizes IntN, Int32N, Int64N, Perm (indirectly).

All the N variants (IntN, Int32N, Int64N, UintN, N, etc) now
return the same values given a Source and parameter n, so that
for example uint(r.IntN(10)) and r.UintN(10) and r.N(uint(10))
are completely interchangeable.

Int64N4e18 gets slower but that is a near worst case for
the algorithm and is extremely unlikely in practice.

32-bit Int32N variants got slower too, by 15-30%, in exchange
for speeding up everything on 64-bit systems and consistency
across the N functions.

Also rename previously missed benchmark
GlobalInt63Parallel to GlobalInt64Parallel.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 11ad9fdddc.amd64 │            4d84a369d1.amd64            │
                        │      sec/op      │    sec/op     vs base                  │
SourceUint64-32                1.335n ± 1%    1.348n ± 2%        ~ (p=0.335 n=20)
GlobalInt64-32                 2.046n ± 1%    2.082n ± 2%        ~ (p=0.310 n=20)
GlobalInt63Parallel-32        0.1037n ± 1%
GlobalInt64Parallel-32                       0.1036n ± 1%
GlobalUint64-32                2.075n ± 0%    2.077n ± 2%        ~ (p=0.228 n=20)
GlobalUint64Parallel-32       0.1013n ± 1%   0.1012n ± 1%        ~ (p=0.878 n=20)
Int64-32                       1.726n ± 2%    1.750n ± 0%   +1.39% (p=0.000 n=20)
Uint64-32                      1.673n ± 1%    1.707n ± 2%   +2.03% (p=0.002 n=20)
GlobalIntN1000-32              3.895n ± 2%    3.192n ± 1%  -18.05% (p=0.000 n=20)
IntN1000-32                    3.403n ± 1%    2.462n ± 2%  -27.65% (p=0.000 n=20)
Int64N1000-32                  3.053n ± 2%    2.470n ± 1%  -19.11% (p=0.000 n=20)
Int64N1e8-32                   2.718n ± 1%    2.503n ± 2%   -7.91% (p=0.000 n=20)
Int64N1e9-32                   2.712n ± 1%    2.487n ± 1%   -8.31% (p=0.000 n=20)
Int64N2e9-32                   2.690n ± 1%    2.487n ± 1%   -7.57% (p=0.000 n=20)
Int64N1e18-32                  3.084n ± 2%    3.006n ± 2%   -2.53% (p=0.000 n=20)
Int64N2e18-32                  4.026n ± 1%    3.368n ± 1%  -16.33% (p=0.000 n=20)
Int64N4e18-32                  4.049n ± 2%    4.763n ± 1%  +17.62% (p=0.000 n=20)
Int32N1000-32                  2.730n ± 0%    2.403n ± 1%  -11.94% (p=0.000 n=20)
Int32N1e8-32                   2.916n ± 2%    2.405n ± 1%  -17.53% (p=0.000 n=20)
Int32N1e9-32                   3.375n ± 1%    2.402n ± 2%  -28.83% (p=0.000 n=20)
Int32N2e9-32                   3.292n ± 1%    2.384n ± 1%  -27.58% (p=0.000 n=20)
Float32-32                     2.673n ± 1%    2.641n ± 2%        ~ (p=0.147 n=20)
Float64-32                     2.485n ± 1%    2.483n ± 1%        ~ (p=0.804 n=20)
ExpFloat64-32                  3.577n ± 2%    3.486n ± 2%   -2.57% (p=0.000 n=20)
NormFloat64-32                 3.797n ± 2%    3.648n ± 1%   -3.92% (p=0.000 n=20)
Perm3-32                       35.79n ± 2%    33.04n ± 1%   -7.68% (p=0.000 n=20)
Perm30-32                      205.1n ± 1%    171.9n ± 1%  -16.14% (p=0.000 n=20)
Perm30ViaShuffle-32            111.2n ± 2%    100.3n ± 1%   -9.76% (p=0.000 n=20)
ShuffleOverhead-32             100.5n ± 2%    102.5n ± 1%   +1.99% (p=0.007 n=20)
Concurrent-32                  2.188n ± 5%    2.101n ± 0%        ~ (p=0.013 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 11ad9fdddc.arm64 │            4d84a369d1.arm64            │
                       │      sec/op      │    sec/op     vs base                  │
SourceUint64-8                2.272n ± 1%    2.261n ± 1%        ~ (p=0.172 n=20)
GlobalInt64-8                 2.155n ± 1%    2.160n ± 1%        ~ (p=0.482 n=20)
GlobalInt63Parallel-8        0.4352n ± 0%
GlobalInt64Parallel-8                       0.4299n ± 0%
GlobalUint64-8                2.173n ± 1%    2.169n ± 1%        ~ (p=0.262 n=20)
GlobalUint64Parallel-8       0.4340n ± 0%   0.4293n ± 1%   -1.08% (p=0.000 n=20)
Int64-8                       2.544n ± 1%    2.473n ± 1%   -2.83% (p=0.000 n=20)
Uint64-8                      2.552n ± 1%    2.453n ± 1%   -3.90% (p=0.000 n=20)
GlobalIntN1000-8              3.856n ± 0%    2.814n ± 2%  -27.02% (p=0.000 n=20)
IntN1000-8                    3.820n ± 0%    2.933n ± 2%  -23.22% (p=0.000 n=20)
Int64N1000-8                  3.219n ± 2%    2.934n ± 2%   -8.85% (p=0.000 n=20)
Int64N1e8-8                   3.221n ± 2%    2.935n ± 2%   -8.91% (p=0.000 n=20)
Int64N1e9-8                   3.276n ± 2%    2.934n ± 2%  -10.44% (p=0.000 n=20)
Int64N2e9-8                   3.217n ± 0%    2.935n ± 2%   -8.78% (p=0.000 n=20)
Int64N1e18-8                  3.502n ± 2%    3.778n ± 1%   +7.91% (p=0.000 n=20)
Int64N2e18-8                  4.968n ± 1%    4.359n ± 1%  -12.26% (p=0.000 n=20)
Int64N4e18-8                  4.963n ± 0%    6.546n ± 1%  +31.92% (p=0.000 n=20)
Int32N1000-8                  3.189n ± 1%    2.940n ± 2%   -7.81% (p=0.000 n=20)
Int32N1e8-8                   3.514n ± 1%    2.937n ± 2%  -16.41% (p=0.000 n=20)
Int32N1e9-8                   4.133n ± 0%    2.938n ± 0%  -28.91% (p=0.000 n=20)
Int32N2e9-8                   4.137n ± 0%    2.938n ± 2%  -28.97% (p=0.000 n=20)
Float32-8                     3.468n ± 1%    3.486n ± 0%   +0.52% (p=0.000 n=20)
Float64-8                     3.478n ± 0%    3.480n ± 0%        ~ (p=0.063 n=20)
ExpFloat64-8                  4.563n ± 0%    4.533n ± 0%   -0.67% (p=0.000 n=20)
NormFloat64-8                 4.768n ± 0%    4.764n ± 0%   -0.07% (p=0.001 n=20)
Perm3-8                       28.94n ± 0%    26.66n ± 0%   -7.88% (p=0.000 n=20)
Perm30-8                      175.9n ± 0%    143.4n ± 0%  -18.50% (p=0.000 n=20)
Perm30ViaShuffle-8            152.6n ± 1%    142.9n ± 0%   -6.29% (p=0.000 n=20)
ShuffleOverhead-8             119.6n ± 1%    120.7n ± 0%   +0.96% (p=0.000 n=20)
Concurrent-8                  2.452n ± 3%    2.360n ± 2%   -3.73% (p=0.007 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 11ad9fdddc.386 │             4d84a369d1.386             │
                        │     sec/op     │    sec/op     vs base                  │
SourceUint64-32              2.091n ± 1%    2.101n ± 2%        ~ (p=0.672 n=20)
GlobalInt64-32               3.514n ± 2%    3.518n ± 2%        ~ (p=0.723 n=20)
GlobalInt63Parallel-32      0.3197n ± 0%
GlobalInt64Parallel-32                     0.3206n ± 0%
GlobalUint64-32              3.542n ± 1%    3.538n ± 1%        ~ (p=0.304 n=20)
GlobalUint64Parallel-32     0.3218n ± 0%   0.3231n ± 0%        ~ (p=0.071 n=20)
Int64-32                     2.552n ± 2%    2.554n ± 2%        ~ (p=0.693 n=20)
Uint64-32                    2.566n ± 1%    2.575n ± 2%        ~ (p=0.606 n=20)
GlobalIntN1000-32            5.965n ± 2%    6.292n ± 1%   +5.46% (p=0.000 n=20)
IntN1000-32                  4.652n ± 1%    4.735n ± 1%   +1.77% (p=0.000 n=20)
Int64N1000-32               14.485n ± 1%    5.489n ± 2%  -62.11% (p=0.000 n=20)
Int64N1e8-32                14.675n ± 1%    5.528n ± 2%  -62.33% (p=0.000 n=20)
Int64N1e9-32                16.805n ± 2%    5.438n ± 2%  -67.64% (p=0.000 n=20)
Int64N2e9-32                14.515n ± 1%    5.474n ± 1%  -62.28% (p=0.000 n=20)
Int64N1e18-32               16.165n ± 1%    9.053n ± 1%  -44.00% (p=0.000 n=20)
Int64N2e18-32               17.945n ± 2%    9.685n ± 2%  -46.03% (p=0.000 n=20)
Int64N4e18-32                18.35n ± 2%    12.18n ± 1%  -33.62% (p=0.000 n=20)
Int32N1000-32                3.608n ± 1%    4.862n ± 1%  +34.77% (p=0.000 n=20)
Int32N1e8-32                 3.767n ± 1%    4.758n ± 2%  +26.31% (p=0.000 n=20)
Int32N1e9-32                 4.130n ± 2%    4.772n ± 1%  +15.54% (p=0.000 n=20)
Int32N2e9-32                 4.206n ± 1%    4.847n ± 0%  +15.24% (p=0.000 n=20)
Float32-32                   22.18n ± 4%    22.18n ± 4%        ~ (p=0.195 n=20)
Float64-32                   20.75n ± 4%    21.21n ± 3%        ~ (p=0.394 n=20)
ExpFloat64-32                12.58n ± 3%    12.39n ± 2%        ~ (p=0.032 n=20)
NormFloat64-32               7.920n ± 3%    7.422n ± 1%   -6.29% (p=0.000 n=20)
Perm3-32                     40.27n ± 1%    38.00n ± 2%   -5.65% (p=0.000 n=20)
Perm30-32                    213.2n ± 2%    212.7n ± 1%        ~ (p=0.995 n=20)
Perm30ViaShuffle-32          164.2n ± 2%    187.5n ± 2%  +14.22% (p=0.000 n=20)
ShuffleOverhead-32           134.7n ± 2%    159.7n ± 1%  +18.52% (p=0.000 n=20)
Concurrent-32                3.301n ± 2%    3.470n ± 0%   +5.10% (p=0.000 n=20)

For #61716.

Change-Id: Id1481b04202883cd0b23e21bb58d1bca4e482bd3
Reviewed-on: https://go-review.googlesource.com/c/go/+/502500
Reviewed-by: Rob Pike <r@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 17:08:37 +00:00
Russ Cox c7dddb02d3 math/rand/v2: change Source to use uint64
This should make Uint64-using functions faster and leave
other things alone. It is a mystery why so much got faster.
A good cautionary tale not to read too much into minor
jitter in the benchmarks.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.amd64 │           11ad9fdddc.amd64           │
                        │      sec/op      │    sec/op     vs base                │
SourceUint64-32                1.555n ± 1%    1.335n ± 1%  -14.15% (p=0.000 n=20)
GlobalInt64-32                 2.071n ± 1%    2.046n ± 1%        ~ (p=0.016 n=20)
GlobalInt63Parallel-32        0.1023n ± 1%   0.1037n ± 1%   +1.37% (p=0.002 n=20)
GlobalUint64-32                5.193n ± 1%    2.075n ± 0%  -60.06% (p=0.000 n=20)
GlobalUint64Parallel-32       0.2341n ± 0%   0.1013n ± 1%  -56.74% (p=0.000 n=20)
Int64-32                       2.056n ± 2%    1.726n ± 2%  -16.10% (p=0.000 n=20)
Uint64-32                      2.077n ± 2%    1.673n ± 1%  -19.46% (p=0.000 n=20)
GlobalIntN1000-32              4.077n ± 2%    3.895n ± 2%   -4.45% (p=0.000 n=20)
IntN1000-32                    3.476n ± 2%    3.403n ± 1%   -2.10% (p=0.000 n=20)
Int64N1000-32                  3.059n ± 1%    3.053n ± 2%        ~ (p=0.131 n=20)
Int64N1e8-32                   2.942n ± 1%    2.718n ± 1%   -7.60% (p=0.000 n=20)
Int64N1e9-32                   2.932n ± 1%    2.712n ± 1%   -7.50% (p=0.000 n=20)
Int64N2e9-32                   2.925n ± 1%    2.690n ± 1%   -8.03% (p=0.000 n=20)
Int64N1e18-32                  3.116n ± 1%    3.084n ± 2%        ~ (p=0.425 n=20)
Int64N2e18-32                  4.067n ± 1%    4.026n ± 1%   -1.02% (p=0.007 n=20)
Int64N4e18-32                  4.054n ± 1%    4.049n ± 2%        ~ (p=0.204 n=20)
Int32N1000-32                  2.951n ± 1%    2.730n ± 0%   -7.49% (p=0.000 n=20)
Int32N1e8-32                   3.102n ± 1%    2.916n ± 2%   -6.03% (p=0.000 n=20)
Int32N1e9-32                   3.535n ± 1%    3.375n ± 1%   -4.54% (p=0.000 n=20)
Int32N2e9-32                   3.514n ± 1%    3.292n ± 1%   -6.30% (p=0.000 n=20)
Float32-32                     2.760n ± 1%    2.673n ± 1%   -3.13% (p=0.000 n=20)
Float64-32                     2.284n ± 1%    2.485n ± 1%   +8.80% (p=0.000 n=20)
ExpFloat64-32                  3.757n ± 1%    3.577n ± 2%   -4.78% (p=0.000 n=20)
NormFloat64-32                 3.837n ± 1%    3.797n ± 2%        ~ (p=0.204 n=20)
Perm3-32                       35.23n ± 2%    35.79n ± 2%        ~ (p=0.298 n=20)
Perm30-32                      208.8n ± 1%    205.1n ± 1%   -1.82% (p=0.000 n=20)
Perm30ViaShuffle-32            111.7n ± 1%    111.2n ± 2%        ~ (p=0.273 n=20)
ShuffleOverhead-32             101.1n ± 1%    100.5n ± 2%        ~ (p=0.878 n=20)
Concurrent-32                  2.108n ± 7%    2.188n ± 5%        ~ (p=0.417 n=20)

goos: darwin
goarch: arm64
pkg: math/rand/v2
                       │ 220860f76f.arm64 │           11ad9fdddc.arm64           │
                       │      sec/op      │    sec/op     vs base                │
SourceUint64-8                2.316n ± 1%    2.272n ± 1%   -1.86% (p=0.000 n=20)
GlobalInt64-8                 2.183n ± 1%    2.155n ± 1%        ~ (p=0.122 n=20)
GlobalInt63Parallel-8        0.4331n ± 0%   0.4352n ± 0%   +0.48% (p=0.000 n=20)
GlobalUint64-8                4.377n ± 2%    2.173n ± 1%  -50.35% (p=0.000 n=20)
GlobalUint64Parallel-8       0.9237n ± 0%   0.4340n ± 0%  -53.02% (p=0.000 n=20)
Int64-8                       2.538n ± 1%    2.544n ± 1%        ~ (p=0.189 n=20)
Uint64-8                      2.604n ± 1%    2.552n ± 1%   -1.98% (p=0.000 n=20)
GlobalIntN1000-8              3.857n ± 2%    3.856n ± 0%        ~ (p=0.051 n=20)
IntN1000-8                    3.822n ± 2%    3.820n ± 0%   -0.05% (p=0.001 n=20)
Int64N1000-8                  3.318n ± 0%    3.219n ± 2%   -2.98% (p=0.000 n=20)
Int64N1e8-8                   3.349n ± 1%    3.221n ± 2%   -3.79% (p=0.000 n=20)
Int64N1e9-8                   3.317n ± 2%    3.276n ± 2%   -1.24% (p=0.001 n=20)
Int64N2e9-8                   3.317n ± 2%    3.217n ± 0%   -3.01% (p=0.000 n=20)
Int64N1e18-8                  3.542n ± 1%    3.502n ± 2%   -1.16% (p=0.001 n=20)
Int64N2e18-8                  5.087n ± 0%    4.968n ± 1%   -2.33% (p=0.000 n=20)
Int64N4e18-8                  5.084n ± 0%    4.963n ± 0%   -2.39% (p=0.000 n=20)
Int32N1000-8                  3.208n ± 2%    3.189n ± 1%   -0.58% (p=0.001 n=20)
Int32N1e8-8                   3.610n ± 1%    3.514n ± 1%   -2.67% (p=0.000 n=20)
Int32N1e9-8                   4.235n ± 0%    4.133n ± 0%   -2.40% (p=0.000 n=20)
Int32N2e9-8                   4.229n ± 1%    4.137n ± 0%   -2.19% (p=0.000 n=20)
Float32-8                     3.468n ± 0%    3.468n ± 1%        ~ (p=0.350 n=20)
Float64-8                     3.447n ± 0%    3.478n ± 0%   +0.90% (p=0.000 n=20)
ExpFloat64-8                  4.567n ± 0%    4.563n ± 0%   -0.10% (p=0.002 n=20)
NormFloat64-8                 4.821n ± 0%    4.768n ± 0%   -1.09% (p=0.000 n=20)
Perm3-8                       28.89n ± 0%    28.94n ± 0%   +0.17% (p=0.000 n=20)
Perm30-8                      175.7n ± 0%    175.9n ± 0%   +0.14% (p=0.000 n=20)
Perm30ViaShuffle-8            153.5n ± 0%    152.6n ± 1%        ~ (p=0.010 n=20)
ShuffleOverhead-8             119.8n ± 1%    119.6n ± 1%        ~ (p=0.147 n=20)
Concurrent-8                  2.433n ± 3%    2.452n ± 3%        ~ (p=0.616 n=20)

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.386 │            11ad9fdddc.386            │
                        │     sec/op     │    sec/op     vs base                │
SourceUint64-32             2.370n ±  1%    2.091n ± 1%  -11.75% (p=0.000 n=20)
GlobalInt64-32              3.569n ±  1%    3.514n ± 2%   -1.56% (p=0.000 n=20)
GlobalInt63Parallel-32     0.3221n ±  1%   0.3197n ± 0%   -0.76% (p=0.000 n=20)
GlobalUint64-32             8.797n ± 10%    3.542n ± 1%  -59.74% (p=0.000 n=20)
GlobalUint64Parallel-32    0.6351n ±  0%   0.3218n ± 0%  -49.33% (p=0.000 n=20)
Int64-32                    2.612n ±  2%    2.552n ± 2%   -2.30% (p=0.000 n=20)
Uint64-32                   3.350n ±  1%    2.566n ± 1%  -23.42% (p=0.000 n=20)
GlobalIntN1000-32           5.892n ±  1%    5.965n ± 2%        ~ (p=0.082 n=20)
IntN1000-32                 4.546n ±  1%    4.652n ± 1%   +2.33% (p=0.000 n=20)
Int64N1000-32               14.59n ±  1%    14.48n ± 1%        ~ (p=0.652 n=20)
Int64N1e8-32                14.76n ±  2%    14.67n ± 1%        ~ (p=0.836 n=20)
Int64N1e9-32                16.57n ±  1%    16.80n ± 2%        ~ (p=0.016 n=20)
Int64N2e9-32                14.54n ±  1%    14.52n ± 1%        ~ (p=0.533 n=20)
Int64N1e18-32               16.14n ±  1%    16.16n ± 1%        ~ (p=0.606 n=20)
Int64N2e18-32               18.10n ±  1%    17.95n ± 2%        ~ (p=0.062 n=20)
Int64N4e18-32               18.65n ±  1%    18.35n ± 2%   -1.61% (p=0.010 n=20)
Int32N1000-32               3.560n ±  1%    3.608n ± 1%   +1.33% (p=0.001 n=20)
Int32N1e8-32                3.770n ±  2%    3.767n ± 1%        ~ (p=0.155 n=20)
Int32N1e9-32                4.098n ±  0%    4.130n ± 2%        ~ (p=0.016 n=20)
Int32N2e9-32                4.179n ±  1%    4.206n ± 1%        ~ (p=0.011 n=20)
Float32-32                  21.18n ±  4%    22.18n ± 4%   +4.70% (p=0.003 n=20)
Float64-32                  20.60n ±  2%    20.75n ± 4%   +0.73% (p=0.000 n=20)
ExpFloat64-32               13.07n ±  0%    12.58n ± 3%   -3.82% (p=0.000 n=20)
NormFloat64-32              7.738n ±  2%    7.920n ± 3%        ~ (p=0.066 n=20)
Perm3-32                    36.73n ±  1%    40.27n ± 1%   +9.65% (p=0.000 n=20)
Perm30-32                   211.9n ±  1%    213.2n ± 2%        ~ (p=0.262 n=20)
Perm30ViaShuffle-32         165.2n ±  1%    164.2n ± 2%        ~ (p=0.029 n=20)
ShuffleOverhead-32          133.9n ±  1%    134.7n ± 2%        ~ (p=0.551 n=20)
Concurrent-32               3.287n ±  2%    3.301n ± 2%        ~ (p=0.330 n=20)

For #61716.

Change-Id: I8d2f73f87dd3603a0c2ff069988938e0957b6904
Reviewed-on: https://go-review.googlesource.com/c/go/+/502499
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
2023-10-30 17:08:34 +00:00
Russ Cox 1f4db9dbd6 math/rand/v2: update benchmarks
Change the benchmarks to use the result of the calls,
as I found that in certain cases inlining resulted in
discarding part of the computation in the benchmark loop.
Add various benchmarks that will be relevant in future CLs.

goos: linux
goarch: amd64
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.amd64 │
                        │      sec/op      │
SourceUint64-32                1.555n ± 1%
GlobalInt64-32                 2.071n ± 1%
GlobalInt63Parallel-32        0.1023n ± 1%
GlobalUint64-32                5.193n ± 1%
GlobalUint64Parallel-32       0.2341n ± 0%
Int64-32                       2.056n ± 2%
Uint64-32                      2.077n ± 2%
GlobalIntN1000-32              4.077n ± 2%
IntN1000-32                    3.476n ± 2%
Int64N1000-32                  3.059n ± 1%
Int64N1e8-32                   2.942n ± 1%
Int64N1e9-32                   2.932n ± 1%
Int64N2e9-32                   2.925n ± 1%
Int64N1e18-32                  3.116n ± 1%
Int64N2e18-32                  4.067n ± 1%
Int64N4e18-32                  4.054n ± 1%
Int32N1000-32                  2.951n ± 1%
Int32N1e8-32                   3.102n ± 1%
Int32N1e9-32                   3.535n ± 1%
Int32N2e9-32                   3.514n ± 1%
Float32-32                     2.760n ± 1%
Float64-32                     2.284n ± 1%
ExpFloat64-32                  3.757n ± 1%
NormFloat64-32                 3.837n ± 1%
Perm3-32                       35.23n ± 2%
Perm30-32                      208.8n ± 1%
Perm30ViaShuffle-32            111.7n ± 1%
ShuffleOverhead-32             101.1n ± 1%
Concurrent-32                  2.108n ± 7%

goos: darwin
goarch: arm64
pkg: math/rand/v2
cpu: Apple M1
                       │ 220860f76f.arm64 │
                       │      sec/op      │
SourceUint64-8                2.316n ± 1%
GlobalInt64-8                 2.183n ± 1%
GlobalInt63Parallel-8        0.4331n ± 0%
GlobalUint64-8                4.377n ± 2%
GlobalUint64Parallel-8       0.9237n ± 0%
Int64-8                       2.538n ± 1%
Uint64-8                      2.604n ± 1%
GlobalIntN1000-8              3.857n ± 2%
IntN1000-8                    3.822n ± 2%
Int64N1000-8                  3.318n ± 0%
Int64N1e8-8                   3.349n ± 1%
Int64N1e9-8                   3.317n ± 2%
Int64N2e9-8                   3.317n ± 2%
Int64N1e18-8                  3.542n ± 1%
Int64N2e18-8                  5.087n ± 0%
Int64N4e18-8                  5.084n ± 0%
Int32N1000-8                  3.208n ± 2%
Int32N1e8-8                   3.610n ± 1%
Int32N1e9-8                   4.235n ± 0%
Int32N2e9-8                   4.229n ± 1%
Float32-8                     3.468n ± 0%
Float64-8                     3.447n ± 0%
ExpFloat64-8                  4.567n ± 0%
NormFloat64-8                 4.821n ± 0%
Perm3-8                       28.89n ± 0%
Perm30-8                      175.7n ± 0%
Perm30ViaShuffle-8            153.5n ± 0%
ShuffleOverhead-8             119.8n ± 1%
Concurrent-8                  2.433n ± 3%

goos: linux
goarch: 386
pkg: math/rand/v2
cpu: AMD Ryzen 9 7950X 16-Core Processor
                        │ 220860f76f.386 │
                        │     sec/op     │
SourceUint64-32             2.370n ±  1%
GlobalInt64-32              3.569n ±  1%
GlobalInt63Parallel-32     0.3221n ±  1%
GlobalUint64-32             8.797n ± 10%
GlobalUint64Parallel-32    0.6351n ±  0%
Int64-32                    2.612n ±  2%
Uint64-32                   3.350n ±  1%
GlobalIntN1000-32           5.892n ±  1%
IntN1000-32                 4.546n ±  1%
Int64N1000-32               14.59n ±  1%
Int64N1e8-32                14.76n ±  2%
Int64N1e9-32                16.57n ±  1%
Int64N2e9-32                14.54n ±  1%
Int64N1e18-32               16.14n ±  1%
Int64N2e18-32               18.10n ±  1%
Int64N4e18-32               18.65n ±  1%
Int32N1000-32               3.560n ±  1%
Int32N1e8-32                3.770n ±  2%
Int32N1e9-32                4.098n ±  0%
Int32N2e9-32                4.179n ±  1%
Float32-32                  21.18n ±  4%
Float64-32                  20.60n ±  2%
ExpFloat64-32               13.07n ±  0%
NormFloat64-32              7.738n ±  2%
Perm3-32                    36.73n ±  1%
Perm30-32                   211.9n ±  1%
Perm30ViaShuffle-32         165.2n ±  1%
ShuffleOverhead-32          133.9n ±  1%
Concurrent-32               3.287n ±  2%

For #61716.

Change-Id: I2f0938eae4b7bf736a8cd899a99783e731bf2179
Reviewed-on: https://go-review.googlesource.com/c/go/+/502496
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 14:32:20 +00:00
Russ Cox 1cc5b34d28 math/rand/v2: remove Rand.Seed
Removing Rand.Seed lets us remove lockedSource as well,
along with the ambiguity in globalRand about which source
to use.

For #61716.

Change-Id: Ibe150520dd1e7dd87165eacaebe9f0c2daeaedfd
Reviewed-on: https://go-review.googlesource.com/c/go/+/502498
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Rob Pike <r@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
2023-10-30 14:31:46 +00:00
Russ Cox 48bd1fc93b math/rand/v2: clean up regression test
Add more test cases.
Replace -printgolden with -update,
which rewrites the files for us.

For #61716.

Change-Id: I7c4c900ee896042429135a21971a56ebe16b6a66
Reviewed-on: https://go-review.googlesource.com/c/go/+/516858
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 14:30:24 +00:00
Russ Cox d6c1ef52ad math/rand/v2: remove Read
In math/rand, Read is deprecated. Remove in v2.
People should use crypto/rand if they need long strings.

For #61716.

Change-Id: Ib254b7e1844616e96db60a3a7abb572b0dcb1583
Reviewed-on: https://go-review.googlesource.com/c/go/+/502497
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 14:30:14 +00:00
Russ Cox d42750b17c math/rand/v2: rename various functions
Int31 -> Int32
Int31n -> Int32N
Int63 -> Int64
Int63n -> Int64N
Intn -> IntN

The 31 and 63 are pedantic and confusing: the functions should
be named for the type they return, same as all the others.

The lower-case n is inconsistent with Go's usual CamelCase
and especially problematic because we plan to add 'func N'.
Capitalize the n.

For #61716.

Change-Id: Idb1a005a82f353677450d47fb612ade7a41fde69
Reviewed-on: https://go-review.googlesource.com/c/go/+/516857
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-30 14:29:37 +00:00
Russ Cox 59f0ab4036 math/rand/v2: start of new API
This is the beginning of the math/rand/v2 package from proposal #61716.
Start by copying old API. This CL copies math/rand/* to math/rand/v2
and updates references to math/rand to add v2 throughout.
Later CLs will make the v2 changes.

For #61716.

Change-Id: I1624ccffae3dfa442d4ba2461942decbd076e11b
Reviewed-on: https://go-review.googlesource.com/c/go/+/502495
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2023-10-30 14:29:30 +00:00
Dmitri Shuralyov bf97e724b5 all: drop old +build lines
Running 'go fix' on the cmd+std packages handled much of this change.

Also update code generators to use only the new go:build lines,
not the old +build ones.

For #41184.
For #60268.

Change-Id: If35532abe3012e7357b02c79d5992ff5ac37ca23
Cq-Include-Trybots: luci.golang.try:gotip-linux-386-longtest,gotip-linux-amd64-longtest,gotip-windows-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/536237
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-19 23:33:27 +00:00
cui fliter d57303e65f math: add available godoc link
Change-Id: I4a6c2ef6fd21355952ab7d8eaad883646a95d364
Reviewed-on: https://go-review.googlesource.com/c/go/+/535087
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2023-10-19 11:59:09 +00:00
Oleksandr Redko da8f406f06 all: simplify bool conditions
Change-Id: Id2079f7012392dea8dfe2386bb9fb1ea3f487a4a
Reviewed-on: https://go-review.googlesource.com/c/go/+/526015
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
2023-09-20 18:06:13 +00:00
Dmitri Shuralyov 0dfb22ed70 all: use ^TestName$ regular pattern for invoking a single test
Use ^ and $ in the -run flag regular expression value when the intention
is to invoke a single named test. This removes the reliance on there not
being another similarly named test to achieve the intended result.

In particular, package syscall has tests named TestUnshareMountNameSpace
and TestUnshareMountNameSpaceChroot that both trigger themselves setting
GO_WANT_HELPER_PROCESS=1 to run alternate code in a helper process. As a
consequence of overlap in their test names, the former was inadvertently
triggering one too many helpers.

Spotted while reviewing CL 525196. Apply the same change in other places
to make it easier for code readers to see that said tests aren't running
extraneous tests. The unlikely cases of -run=TestSomething intentionally
being used to run all tests that have the TestSomething substring in the
name can be better written as -run=^.*TestSomething.*$ or with a comment
so it is clear it wasn't an oversight.

Change-Id: Iba208aba3998acdbf8c6708e5d23ab88938bfc1e
Reviewed-on: https://go-review.googlesource.com/c/go/+/524948
Reviewed-by: Tobias Klauser <tobias.klauser@gmail.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Kirill Kolyshkin <kolyshkin@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-09-05 23:35:29 +00:00
chanxuehong aaa384cf3a math/big, math/rand: use the built-in max function
Change-Id: I71a38dd20bfaf2b1aed18892d54eeb017d3d7d66
GitHub-Last-Rev: 8da43b2cbd
GitHub-Pull-Request: golang/go#61955
Reviewed-on: https://go-review.googlesource.com/c/go/+/518595
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
2023-08-17 16:42:19 +00:00
qiulaidongfeng 1d3a77e5e6 math/big: using the min built-in function
Change-Id: I9e95806116a8547ec782f66226d1b1382c6156de

Change-Id: I9e95806116a8547ec782f66226d1b1382c6156de
GitHub-Last-Rev: 5b4ce994c1
GitHub-Pull-Request: golang/go#61829
Reviewed-on: https://go-review.googlesource.com/c/go/+/516895
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
2023-08-11 02:52:49 +00:00
Srinivas Pokala a37da52d75 math: enable huge argument tests on s390x
new s390x assembly implementation of Sin/Cos/SinCos/Tan handle huge argument
test's.

Updates #29240

Change-Id: I9f22d9714528ef2af52c749079f3727250089baf
Reviewed-on: https://go-review.googlesource.com/c/go/+/509675
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-07-31 16:23:41 +00:00
Srinivas Pokala 20ea988421 math: huge argument handling for sin/cos in s390x
Currently s390x, sin/cos assembly implementation not handling huge
arguments. This change reverts assembly routine to native go implementation
for huge arguments. Implementing the changes in assembly giving better
performance than native go changes in terms of execution/cycles.

name                                         Go_changes     Asm_changes
Sin/input_size_(0.5)-8                      11.85ns ± 0%   5.32ns ± 1%
Sin/input_size_(1<<20)-8                    15.32ns ± 0%   9.75ns ± 3%
Sin/input_size_(1<<_40)-8                   17.9ns  ± 0%   10.3ns ± 6%
Sin/input_size_(1<<50)-8                    16.33ns ± 0%   9.75ns ± 6%
Sin/input_size_(1<<60)-8                    33.0ns  ± 1%   29.1ns ± 0%
Sin/input_size_(1<<80)-8                    29.9ns  ± 0%   27.2ns ± 2%
Sin/input_size_(1<<200)-8                   31.5ns  ± 1%   28.3ns ± 0%
Sin/input_size_(1<<480)-8                   29.4ns  ± 1%   28.0ns ± 1%
Sin/input_size_(1234567891234567_<<_180)-8  29.3ns  ± 1%   28.0ns ± 0%
Cos/input_size_(0.5)-8                      10.33ns ± 0%   5.69ns ± 1%
Cos/input_size_(1<<20)-8                    16.67ns ± 0%   9.18ns ± 0%
Cos/input_size_(1<<_40)-8                   18.50ns ± 0%   9.45ns ± 3%
Cos/input_size_(1<<50)-8                    16.67ns ± 0%   9.18ns ± 1%
Cos/input_size_(1<<60)-8                    31.6ns  ± 1%   26.7ns ± 2%
Cos/input_size_(1<<80)-8                    31.3ns  ± 0%   25.5ns ± 1%
Cos/input_size_(1<<200)-8                   30.0ns  ± 0%   26.7ns ± 1%
Cos/input_size_(1<<480)-8                   31.9ns  ±2%   27.0ns ± 0%
Cos/input_size_(1234567891234567_<<_180)-8  31.8ns  ± 0%   26.9ns ± 0%

Fixes #29240

Change-Id: Id2ebcfa113926f27510d527e80daaddad925a707
Reviewed-on: https://go-review.googlesource.com/c/go/+/469635
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bill O'Farrell <billotosyr@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-07-31 04:25:54 +00:00
root a8a6f90a23 math: support to handle huge arguments in tan function on s390x
Currently on s390x, tan assembly implementation is not handling huge arguments at all. This change is to check for large arguments and revert back to native go implantation from assembly code in case of huge arguments.

The changes are implemented in assembly code to get better performance over native go implementation.

Benchmark details of tan function with table driven inputs are updated as part of the issue link.

Fixes #37854

Change-Id: I4e5321e65c27b7ce8c497fc9d3991ca8604753d2
Reviewed-on: https://go-review.googlesource.com/c/go/+/470595
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2023-07-27 23:30:00 +00:00
Keith Randall 9b33543339 math: test large negative values as args for trig functions
Sin/Tan are odd, Cos is even, so it is easy to compute the correct
result from the positive argument case.

Change-Id: If851d00fc7f515ece8199cf56d21186ced51e94f
Reviewed-on: https://go-review.googlesource.com/c/go/+/509815
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Srinivas Pokala <Pokala.Srinivas@ibm.com>
Reviewed-by: Robert Griesemer <gri@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-07-17 21:05:34 +00:00
Michael Munday 158d11196f math: add test that covers riscv64 fnm{add,sub} codegen
Adds a test that triggers the RISC-V fused multiply-add code
generation bug fixed by CL 506575.

Change-Id: Ia3a55a68b48c5cc6beac4e5235975dea31f3faf2
Reviewed-on: https://go-review.googlesource.com/c/go/+/507035
Auto-Submit: M Zhuo <mzh@golangcn.org>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2023-07-07 17:39:26 +00:00
Michael Munday c8dad424bf math: fix portable FMA when x*y < 0 and x*y == -z
When x*y == -z the portable implementation of FMA copied the sign
bit from x*y into the result. This meant that when x*y == -z and
x*y < 0 the result was -0 which is incorrect.

Fixes #61130.

Change-Id: Ib93a568b7bdb9031e2aedfa1bdfa9bddde90851d
Reviewed-on: https://go-review.googlesource.com/c/go/+/507376
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joedian Reid <joedian@golang.org>
2023-07-05 22:05:30 +00:00
Ian Lance Taylor 65db95d0ed math: document that Min/Max differ from min/max
For #59488
Fixes #60616

Change-Id: Idf9f42d7d868999664652dd7b478684a474f1d96
Reviewed-on: https://go-review.googlesource.com/c/go/+/501355
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Rob Pike <r@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2023-06-15 19:45:12 +00:00
Alexander Yastrebov 8ffc931eae all: fix spelling errors
Fix spelling errors discovered using https://github.com/codespell-project/codespell. Errors in data files and vendored packages are ignored.

Change-Id: I83c7818222f2eea69afbd270c15b7897678131dc
GitHub-Last-Rev: 3491615b1b
GitHub-Pull-Request: golang/go#60758
Reviewed-on: https://go-review.googlesource.com/c/go/+/502576
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-06-14 00:03:57 +00:00
cui fliter 8b53c2d2fc all: fix mismatched symbols
There are some symbol mismatches in the comments, this commit attempts to fix them

Change-Id: I5c9075e5218defe9233c075744d243b26ff68496
Reviewed-on: https://go-review.googlesource.com/c/go/+/492996
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: shuang cui <imcusg@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
2023-06-13 20:02:49 +00:00
Alan Donovan fdbc66d6dd math/big: rename Int.ToFloat64 to Float64
The "To" prefix was a relic of the first draft
that I failed to make consistent with the unprefixed
name used in the proposal. Fortunately iant spotted
it during the API audit.

Updates #56984
Updates #60560

Change-Id: Ifa6eeddf6dd5f0637c0568e383f9a4bef88b10f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/500116
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Alan Donovan <adonovan@google.com>
2023-06-02 14:22:24 +00:00
Ian Lance Taylor bf14663943 Revert "math: add Compare and Compare32"
This reverts CL 467515. Now that we have cmp.Compare,
we don't need math.Compare or math.Compare32 after all.

For #56491
Fixes #60519

Change-Id: I8ed33464adfc6d69bd6b328edb26aa2ee3d234d9
Reviewed-on: https://go-review.googlesource.com/c/go/+/499416
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Eli Bendersky <eliben@google.com>
2023-05-31 21:19:39 +00:00
Egon Elbre 74af79bcf6 fmt,math/big,net/url: fixes to old Benchmarks
b.ResetTimer used to also stop the timer, however it does not anymore.
These benchmarks hadn't been fixed and as a result ended up measuring
some additional things.

Also, make some for loops more conventional.

Change-Id: I76ca68456d85eec51722a80587e5b2c9f5d836a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/496996
Run-TryBot: Damien Neil <dneil@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Auto-Submit: Damien Neil <dneil@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
2023-05-23 20:25:13 +00:00
cui fliter 57e3189821 all: fix a lot of comments
Fix comments, including duplicate is, wrong phrases and articles, misspellings, etc.

Change-Id: I8bfea53b9b275e649757cc4bee6a8a026ed9c7a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/493035
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: shuang cui <imcusg@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
2023-05-10 12:59:20 +00:00
Lynn Boger e23322e2cc cmd/internal/obj/ppc64: modify PCALIGN to ensure alignment
The initial purpose of PCALIGN was to identify code
where it would be beneficial to align code for performance,
but avoid cases where too many NOPs were added. On p10, it
is now necessary to enforce a certain alignment in some
cases, so the behavior of PCALIGN needs to be slightly
different.  Code will now be aligned to the value specified
on the PCALIGN instruction regardless of number of NOPs added,
which is more intuitive and consistent with power assembler
alignment directives.

This also adds 64 as a possible alignment value.

The existing values used in PCALIGN were modified according to
the new behavior.

A testcase was updated and performance testing was done to
verify that this does not adversely affect performance.

Change-Id: Iad1cf5ff112e5bfc0514f0805be90e24095e932b
Reviewed-on: https://go-review.googlesource.com/c/go/+/485056
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
2023-04-21 16:47:45 +00:00
Ian Lance Taylor 9a0c506a4e all: re-run stringer
Re-run all go:generate stringer commands. This mostly adds checks
that the constant values did not change, but does add new strings
for the debug/dwarf and internal/pkgbits packages.

Change-Id: I5fc41f20da47338152c183d45d5ae65074e2fccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/483717
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2023-04-11 20:24:07 +00:00
Ian Lance Taylor 6d2cac12db math/rand: clarify Seed deprecation note
Fixes #59331

Change-Id: I62156be2f2758c59349c3b02db6cf9140429c9e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/481915
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Bypass: Ian Lance Taylor <iant@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2023-04-04 20:18:09 +00:00