Commit Graph

589 Commits

Author SHA1 Message Date
Ruixin(Peter) Bao 3a37fd4010 math/big: rewrite subVW to use fast path on s390x
This CL replaces the original subVW implementation with a implementation
that uses a similar idea as CL 164968.

When we know the borrow bit is zero, we can copy the rest of words as
they will not be updated. Also, since we are copying vector of a words,
a faster implementation of copy is written in this CL to copy a word or
multiple words at a time.

Benchmarks:
name             old time/op    new time/op     delta
SubVW/1-18         4.43ns ± 0%     3.82ns ± 0%   -13.85%  (p=0.000 n=20+20)
SubVW/2-18         5.39ns ± 0%     4.25ns ± 0%   -21.23%  (p=0.000 n=20+20)
SubVW/3-18         6.29ns ± 0%     4.65ns ± 0%   -26.07%  (p=0.000 n=16+19)
SubVW/4-18         6.08ns ± 2%     4.84ns ± 0%   -20.43%  (p=0.000 n=20+20)
SubVW/5-18         7.06ns ± 1%     4.93ns ± 0%   -30.18%  (p=0.000 n=20+20)
SubVW/10-18        10.3ns ± 2%      7.2ns ± 0%   -30.35%  (p=0.000 n=20+19)
SubVW/100-18       48.0ns ± 4%     17.6ns ± 0%   -63.32%  (p=0.000 n=18+19)
SubVW/1000-18       448ns ±10%      236ns ± 1%   -47.24%  (p=0.000 n=20+20)
SubVW/10000-18     4.83µs ± 5%     2.96µs ± 0%   -38.73%  (p=0.000 n=20+19)
SubVW/100000-18    46.6µs ± 3%     30.6µs ± 1%   -34.30%  (p=0.000 n=20+20)
[Geo mean]         56.3ns          37.0ns        -34.24%

name             old speed      new speed       delta
SubVW/1-18       1.80GB/s ± 0%   2.10GB/s ± 0%   +16.16%  (p=0.000 n=20+20)
SubVW/2-18       2.97GB/s ± 0%   3.77GB/s ± 0%   +26.95%  (p=0.000 n=20+20)
SubVW/3-18       3.82GB/s ± 0%   5.16GB/s ± 0%   +35.26%  (p=0.000 n=20+19)
SubVW/4-18       5.26GB/s ± 1%   6.61GB/s ± 0%   +25.59%  (p=0.000 n=20+20)
SubVW/5-18       5.67GB/s ± 1%   8.11GB/s ± 0%   +43.12%  (p=0.000 n=20+20)
SubVW/10-18      7.79GB/s ± 2%  11.17GB/s ± 0%   +43.52%  (p=0.000 n=20+19)
SubVW/100-18     16.7GB/s ± 4%   45.5GB/s ± 0%  +172.61%  (p=0.000 n=18+20)
SubVW/1000-18    17.9GB/s ± 9%   33.9GB/s ± 1%   +89.25%  (p=0.000 n=20+20)
SubVW/10000-18   16.6GB/s ± 5%   27.0GB/s ± 0%   +63.08%  (p=0.000 n=20+19)
SubVW/100000-18  17.2GB/s ± 2%   26.1GB/s ± 1%   +52.18%  (p=0.000 n=20+20)
[Geo mean]       7.25GB/s       11.03GB/s        +52.01%

Change-Id: I32e99cbab3260054a96231d02b87049c833ab77e
Reviewed-on: https://go-review.googlesource.com/c/go/+/227297
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-24 14:50:59 +00:00
Ruixin(Peter) Bao ee8972cd12 math/big: rewrite addVW to use fast path on s390x
Rewrite addVW to use a fast path and remove the original
vector and non vector implementation of addVW in assembly. This CL uses
a similar idea as CL 164968, where we copy the rest of words when we
know carry bit is zero.

In addition, since we are copying vector of words, a faster
implementation of copy is written in this CL to copy a word or multiple
words at a time.

Benchmarks:
name             old time/op    new time/op     delta
AddVW/1-18         4.56ns ± 0%     4.01ns ± 6%   -12.14%  (p=0.000 n=18+20)
AddVW/2-18         5.54ns ± 0%     4.42ns ± 5%   -20.20%  (p=0.000 n=18+20)
AddVW/3-18         6.55ns ± 0%     4.61ns ± 0%   -29.62%  (p=0.000 n=16+18)
AddVW/4-18         6.11ns ± 2%     5.12ns ± 6%   -16.19%  (p=0.000 n=20+20)
AddVW/5-18         7.32ns ± 4%     5.14ns ± 0%   -29.77%  (p=0.000 n=20+19)
AddVW/10-18        10.6ns ± 2%      7.2ns ± 1%   -31.47%  (p=0.000 n=20+20)
AddVW/100-18       49.6ns ± 2%     18.0ns ± 0%   -63.63%  (p=0.000 n=20+20)
AddVW/1000-18       465ns ± 3%      244ns ± 0%   -47.54%  (p=0.000 n=20+20)
AddVW/10000-18     4.99µs ± 4%     2.97µs ± 0%   -40.54%  (p=0.000 n=20+20)
AddVW/100000-18    48.3µs ± 3%     30.8µs ± 1%   -36.29%  (p=0.000 n=20+20)
[Geo mean]         58.1ns          38.0ns        -34.57%

name             old speed      new speed       delta
AddVW/1-18       1.76GB/s ± 0%   2.00GB/s ± 6%   +14.04%  (p=0.000 n=20+20)
AddVW/2-18       2.89GB/s ± 0%   3.63GB/s ± 5%   +25.55%  (p=0.000 n=18+20)
AddVW/3-18       3.66GB/s ± 0%   5.21GB/s ± 0%   +42.25%  (p=0.000 n=18+19)
AddVW/4-18       5.24GB/s ± 2%   6.27GB/s ± 6%   +19.61%  (p=0.000 n=20+20)
AddVW/5-18       5.47GB/s ± 4%   7.78GB/s ± 0%   +42.28%  (p=0.000 n=20+18)
AddVW/10-18      7.55GB/s ± 2%  11.04GB/s ± 1%   +46.09%  (p=0.000 n=20+20)
AddVW/100-18     16.1GB/s ± 2%   44.3GB/s ± 0%  +174.77%  (p=0.000 n=20+20)
AddVW/1000-18    17.2GB/s ± 3%   32.8GB/s ± 1%   +90.58%  (p=0.000 n=20+20)
AddVW/10000-18   16.0GB/s ± 4%   26.9GB/s ± 0%   +68.11%  (p=0.000 n=20+20)
AddVW/100000-18  16.6GB/s ± 3%   26.0GB/s ± 1%   +56.94%  (p=0.000 n=20+20)
[Geo mean]       7.03GB/s       10.75GB/s        +52.93%

Change-Id: Idbb73f3178311bd2b18a93bdc1e48f26869d2f6a
Reviewed-on: https://go-review.googlesource.com/c/go/+/209679
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-24 13:26:34 +00:00
Michael Munday ab7a65f283 cmd/compile: clean up codegen for branch-on-carry on s390x
This CL optimizes code that uses a carry from a function such as
bits.Add64 as the condition in an if statement. For example:

    x, c := bits.Add64(a, b, 0)
    if c != 0 {
        panic("overflow")
    }

Rather than converting the carry into a 0 or a 1 value and using
that as an input to a comparison instruction the carry flag is now
used as the input to a conditional branch directly. This typically
removes an ADD LOGICAL WITH CARRY instruction when user code is
doing overflow detection and is closer to the code that a user
would expect to generate.

Change-Id: I950431270955ab72f1b5c6db873b6abe769be0da
Reviewed-on: https://go-review.googlesource.com/c/go/+/219757
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-22 20:11:06 +00:00
Ruixin(Peter) Bao 0329c915a0 math/big: clean up whitespace in arith_s390x.s file
This CL looks big but it only does formatting changes to arith_s390x.s.
The file was formatted using asmfmt(https://github.com/klauspost/asmfmt)
, so there should not be any functional impact. I verified that the
generated assembly of big.test file is identical.

Change-Id: I8b4035ef082a4d0357881869327e25253f2d8be1
Reviewed-on: https://go-review.googlesource.com/c/go/+/229302
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-22 15:40:55 +00:00
Alberto Donizetti 813f8eae27 math/big: remove Direct Sqrt computation
The Float.Sqrt method switches (for performance reasons) between
direct (uses Quo) and inverse (doesn't) computation, depending on the
precision, with threshold 128.

Unfortunately the implementation of recursive division in CL 172018
made Quo slightly slower exactly in the range around and below the
threshold Sqrt is using, so this strategy is no longer profitable.

The new division algorithm allocates more, and this has increased the
amount of allocations performed by Sqrt when using the direct method;
on low precisions the computation is fast, so additional allocations
have an negative impact on performance.

Interestingly, only using the inverse method doesn't just reverse the
effects of the Quo algorithm change, but it seems to make performances
better overall for small precisions:

name                 old time/op    new time/op    delta
FloatSqrt/64-4          643ns ± 1%     635ns ± 1%   -1.24%  (p=0.000 n=10+10)
FloatSqrt/128-4        1.44µs ± 1%    1.02µs ± 1%  -29.25%  (p=0.000 n=10+10)
FloatSqrt/256-4        1.49µs ± 1%    1.49µs ± 1%     ~     (p=0.752 n=10+10)
FloatSqrt/1000-4       3.71µs ± 1%    3.74µs ± 1%   +0.87%  (p=0.001 n=10+10)
FloatSqrt/10000-4      35.3µs ± 1%    35.6µs ± 1%   +0.82%  (p=0.002 n=10+9)
FloatSqrt/100000-4      844µs ± 1%     844µs ± 0%     ~     (p=0.549 n=10+9)
FloatSqrt/1000000-4    69.5ms ± 0%    69.6ms ± 0%     ~     (p=0.222 n=9+9)

name                 old alloc/op   new alloc/op   delta
FloatSqrt/64-4           280B ± 0%      200B ± 0%  -28.57%  (p=0.000 n=10+10)
FloatSqrt/128-4          504B ± 0%      248B ± 0%  -50.79%  (p=0.000 n=10+10)
FloatSqrt/256-4          344B ± 0%      344B ± 0%     ~     (all equal)
FloatSqrt/1000-4       1.30kB ± 0%    1.30kB ± 0%     ~     (all equal)
FloatSqrt/10000-4      13.5kB ± 0%    13.5kB ± 0%     ~     (p=0.237 n=10+10)
FloatSqrt/100000-4      123kB ± 0%     123kB ± 0%     ~     (p=0.247 n=10+10)
FloatSqrt/1000000-4    1.83MB ± 1%    1.83MB ± 3%     ~     (p=0.779 n=8+10)

name                 old allocs/op  new allocs/op  delta
FloatSqrt/64-4           8.00 ± 0%      5.00 ± 0%  -37.50%  (p=0.000 n=10+10)
FloatSqrt/128-4          11.0 ± 0%       5.0 ± 0%  -54.55%  (p=0.000 n=10+10)
FloatSqrt/256-4          5.00 ± 0%      5.00 ± 0%     ~     (all equal)
FloatSqrt/1000-4         6.00 ± 0%      6.00 ± 0%     ~     (all equal)
FloatSqrt/10000-4        6.00 ± 0%      6.00 ± 0%     ~     (all equal)
FloatSqrt/100000-4       6.00 ± 0%      6.00 ± 0%     ~     (all equal)
FloatSqrt/1000000-4      10.3 ±13%      10.3 ±13%     ~     (p=1.000 n=10+10)

For example, 1.02µs for FloatSqrt/128 is actually better than what I
was getting on the same machine before the Quo changes.

The .8% slowdown on /1000 and /10000 appears to be real and it is
quite baffling (that codepath was not touched at all); it may be
caused by code alignment changes.

Change-Id: Ib03761cdc1055674bc7526d4f3a23d7a25094029
Reviewed-on: https://go-review.googlesource.com/c/go/+/228062
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-04-15 16:37:53 +00:00
Brad Fitzpatrick 8f53fad035 math/big: add test that linker is able to remove unused code
(Follow-up to CL 228108.)

Change-Id: Ia6d119ee19c7aa923cdeead06d3cee87a1751105
Reviewed-on: https://go-review.googlesource.com/c/go/+/228109
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-04-15 03:25:21 +00:00
Hanjun Kim 5a447c0ae9 math/big: fix typo in documentation for Int.Exp
Fixes #38304

Also change `If m > 0, y < 0, ...` to `If m != 0, y < 0, ...` since `Exp` will return `nil`
whatever `m`'s sign is.

Change-Id: I17d7337ccd1404318cea5d42a8de904ad185fd00
GitHub-Last-Rev: 2399510300
GitHub-Pull-Request: golang/go#38390
Reviewed-on: https://go-review.googlesource.com/c/go/+/228000
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-04-15 00:32:18 +00:00
Brad Fitzpatrick a55645fa34 math/big: don't use Float in init to help linker discard 162 KiB
Removes 162 KiB from binaries that don't use math/big.Float:

-rwxr-xr-x 1 bradfitz bradfitz 1916590 Apr 14 12:21 x.after
-rwxr-xr-x 1 bradfitz bradfitz 2082575 Apr 14 12:21 x.before

No change in deps (this package already used sync).

No change in benchmarks:

name                 old time/op    new time/op    delta
FloatSqrt/64-8         1.06µs ±10%    1.03µs ± 6%   ~     (p=0.133 n=10+9)
FloatSqrt/128-8        2.26µs ± 9%    2.28µs ± 9%   ~     (p=0.460 n=10+8)
FloatSqrt/256-8        2.29µs ± 5%    2.31µs ± 3%   ~     (p=0.214 n=9+9)
FloatSqrt/1000-8       5.82µs ± 3%    5.87µs ± 7%   ~     (p=0.666 n=9+9)
FloatSqrt/10000-8      56.4µs ± 5%    57.0µs ± 6%   ~     (p=0.436 n=10+10)
FloatSqrt/100000-8     1.34ms ± 8%    1.31ms ± 3%   ~     (p=0.447 n=10+9)
FloatSqrt/1000000-8     106ms ± 5%     107ms ± 7%   ~     (p=0.315 n=10+10)

name                 old alloc/op   new alloc/op   delta
FloatSqrt/64-8           280B ± 0%      280B ± 0%   ~     (all equal)
FloatSqrt/128-8          504B ± 0%      504B ± 0%   ~     (all equal)
FloatSqrt/256-8          344B ± 0%      344B ± 0%   ~     (all equal)
FloatSqrt/1000-8       1.30kB ± 0%    1.30kB ± 0%   ~     (all equal)
FloatSqrt/10000-8      13.5kB ± 0%    13.5kB ± 0%   ~     (p=0.403 n=10+10)
FloatSqrt/100000-8      123kB ± 0%     123kB ± 0%   ~     (p=0.393 n=10+10)
FloatSqrt/1000000-8    1.84MB ± 7%    1.84MB ± 5%   ~     (p=0.739 n=10+10)

name                 old allocs/op  new allocs/op  delta
FloatSqrt/64-8           8.00 ± 0%      8.00 ± 0%   ~     (all equal)
FloatSqrt/128-8          11.0 ± 0%      11.0 ± 0%   ~     (all equal)
FloatSqrt/256-8          5.00 ± 0%      5.00 ± 0%   ~     (all equal)
FloatSqrt/1000-8         6.00 ± 0%      6.00 ± 0%   ~     (all equal)
FloatSqrt/10000-8        6.00 ± 0%      6.00 ± 0%   ~     (all equal)
FloatSqrt/100000-8       6.00 ± 0%      6.00 ± 0%   ~     (all equal)
FloatSqrt/1000000-8      10.9 ±10%      10.8 ±17%   ~     (p=0.974 n=10+10)

Change-Id: I3337f1f531bf7b4fae192b9d90cd24ff2be14fea
Reviewed-on: https://go-review.googlesource.com/c/go/+/228108
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-14 20:50:19 +00:00
Rémy Oudompheng ac1fd419b6 math/big: correct off-by-one access in divBasic
The divBasic function computes the quotient of big nats u/v word by word.
It estimates each word qhat by performing a long division (top 2 words of u
divided by top word of v), looks at the next word to correct the estimate,
then perform a full multiplication (qhat*v) to catch any inaccuracy in the
estimate.

In the latter case, "negative" values appear temporarily and carries
must be carefully managed, and the recursive division refactoring
introduced a case where qhat*v has the same length as v, triggering an
out-of-bounds write in the case it happens when computing the top word
of the quotient.

Fixes #37499

Change-Id: I15089da4a4027beda43af497bf6de261eb792f94
Reviewed-on: https://go-review.googlesource.com/c/go/+/221980
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-04-08 20:53:58 +00:00
Brian Kessler 6b6414cab4 math: correct Atan2(±y,+∞) = ±0 on s390x
The s390x assembly implementation was previously only handling this
case correctly for x = -Pi.  Update the special case handling for
any y.

Fixes #35446

Change-Id: I355575e9ec8c7ce8bd9db10d74f42a22f39a2f38
Reviewed-on: https://go-review.googlesource.com/c/go/+/223420
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-03-25 04:06:34 +00:00
Alberto Donizetti c5058652fd math/big: document that Sqrt doesn't set Accuracy
Document that the Float.Sqrt method does not set the receiver's
Accuracy field.

Updates #37915

Change-Id: Ief1dcac07eacc0ef02f86bfac9044501477bca1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/224497
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-03-20 19:24:37 +00:00
Brian Kessler d774d979dd math/cmplx: disable TanHuge test on s390x
s390x has inaccurate range reduction for the assembly routines
in math so these tests are diabled until these are corrected.

Updates #37854

Change-Id: I1e26acd6d09ae3e592a3dd90aec73a6844f5c6fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/223457
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2020-03-14 07:03:15 +00:00
Brian Kessler 70dc28f766 math/cmplx: implement Payne-Hanek range reduction
Tan has poles along the real axis. In order to accurately calculate
the value near these poles, a range reduction by Pi is performed and
the result calculated via a Taylor series.  The prior implementation
of range reduction used Cody-Waite range reduction in three parts.
This fails when x is too large to accurately calculate the partial
products in the summation accurately.  Above this threshold, Payne-Hanek
range reduction using a multiple precision value of 1/Pi is required.

Additionally, the threshold used in math/trig_reduce.go for Payne-Hanek
range reduction was not set conservatively enough. The prior threshold
ensured that catastrophic failure did not occur where the argument x
would not actually be reduced below Pi/4. However, errors in reduction
begin to occur at values much lower when z = ((x - y*PI4A) - y*PI4B) - y*PI4C
is not exact because y*PI4A cannot be exactly represented as a float64.
reduceThreshold is lowered to the proper value.

Fixes #31566

Change-Id: I0f39a4171a5be44f64305f18dc57f6c29f19dba7
Reviewed-on: https://go-review.googlesource.com/c/go/+/172838
Reviewed-by: Rob Pike <r@golang.org>
2020-03-14 04:12:41 +00:00
Joel Sing fe70838598 math/big: initial vector arithmetic in riscv64 assembly
Provide an assembly implementation of mulWW - for now all others run the
Go code.

Change-Id: Icb594c31048255f131bdea8d64f56784fc9db4d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/220919
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-02-25 16:47:02 +00:00
Joel Sing 89f249a40d math: implement Sqrt in assembly for riscv64
Change-Id: I9a5dc33271434e58335f5562a30cc131c6a8332c
Reviewed-on: https://go-review.googlesource.com/c/go/+/220918
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-02-25 16:43:26 +00:00
Filippo Valsorda 88ae4ccefb math/big: reintroduce pre-Go 1.14 mention in GCD docs
It was removed in CL 217302 but was intentionally added in CL 217104.

Change-Id: I1a478d80ad1ec4f0a0184bfebf8f1a5e352cfe8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/217941
Reviewed-by: Robert Griesemer <gri@golang.org>
2020-02-05 20:54:27 +00:00
Filippo Valsorda cdb7fd6b06 math/big: simplify GCD docs
We don't usually document past behavior (like "As of Go 1.14 ...") and
in isolation the current docs made it sound like a and b could only be
negative or zero.

Change-Id: I0d3c2b8579a9c01159ce528a3128b1478e99042a
Reviewed-on: https://go-review.googlesource.com/c/go/+/217302
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-01-31 23:32:37 +00:00
Robert Griesemer 9bb40ed8ec math/big: update comment on Int.GCD
Per the suggestion https://golang.org/cl/216200/2/doc/go1.14.html#423.

Updates #28878.

Change-Id: I654d2d114409624219a0041916f0a4030efc7573
Reviewed-on: https://go-review.googlesource.com/c/go/+/217104
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2020-01-30 20:37:01 +00:00
Joel Sing 5a3a5d3525 math, math/big: add support for riscv64
Based on riscv-go port.

Updates #27532

Change-Id: Id8ae7d851c393ec3702e4176c363accb0a42587f
Reviewed-on: https://go-review.googlesource.com/c/go/+/204633
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2020-01-15 18:49:52 +00:00
Brad Fitzpatrick 7d1d944626 math/rand: update comment to avoid use of ^ for exponentiation
Fixes #35920

Change-Id: I1a4d26c5f7f3fbd4de13fc337de482667d83c47f
Reviewed-on: https://go-review.googlesource.com/c/go/+/209758
Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
2019-12-04 21:14:24 +00:00
Ville Skyttä 440f7d6404 all: fix a bunch of misspellings
Change-Id: I5b909df0fd048cd66c5a27fca1b06466d3bcaac7
GitHub-Last-Rev: 778c5d2131
GitHub-Pull-Request: golang/go#35624
Reviewed-on: https://go-review.googlesource.com/c/go/+/207421
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-11-15 21:04:43 +00:00
Rémy Oudompheng 7ad27481f8 math/big: fix out-of-bounds panic in divRecursive
The bounds in the last carry branch were wrong as there
is no reason for len(u) >= n+n/2 to always hold true.

We also adjust test to avoid using a remainder of 1
(in which case, the last step of the algorithm computes
(qhatv+1) - qhatv which rarely produces a carry).

Change-Id: I69fbab9c5e19d0db1c087fbfcd5b89352c2d26fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/206839
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-11-13 19:15:27 +00:00
Robert Griesemer 1fe33e3cb2 math/big: ensure correct test input
There is a (theoretical, but possible) chance that the
random number values a, b used for TestDiv are 0 or 1,
in which case the test would fail.

This CL makes sure that a >= 1 and b >= 2 at all times.

Fixes #35523.

Change-Id: I6451feb94241249516a821cd0066e95a0c65b0ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/206818
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-11-12 18:52:52 +00:00
Rémy Oudompheng 194ae3236d math/big: implement recursive algorithm for division
The current division algorithm produces one word of result at a time,
using 2-word division to compute the top word and mulAddVWW to compute
the remainder. The top word may need to be adjusted by 1 or 2 units.

The recursive version, based on Burnikel, Ziegler, "Fast Recursive Division",
uses the same principles, but in a multi-word setting, so that
multiplication benefits from the Karatsuba algorithm (and possibly later
improvements).

benchmark                             old ns/op        new ns/op      delta
BenchmarkDiv/20/10-4                  38.2             38.3           +0.26%
BenchmarkDiv/40/20-4                  38.7             38.5           -0.52%
BenchmarkDiv/100/50-4                 62.5             62.6           +0.16%
BenchmarkDiv/200/100-4                238              259            +8.82%
BenchmarkDiv/400/200-4                311              338            +8.68%
BenchmarkDiv/1000/500-4               604              649            +7.45%
BenchmarkDiv/2000/1000-4              1214             1278           +5.27%
BenchmarkDiv/20000/10000-4            38279            36510          -4.62%
BenchmarkDiv/200000/100000-4          3022057          1359615        -55.01%
BenchmarkDiv/2000000/1000000-4        310827664        54012939       -82.62%
BenchmarkDiv/20000000/10000000-4      33272829421      1965401359     -94.09%
BenchmarkString/10/Base10-4           158              156            -1.27%
BenchmarkString/100/Base10-4          797              792            -0.63%
BenchmarkString/1000/Base10-4         3677             3814           +3.73%
BenchmarkString/10000/Base10-4        16633            17116          +2.90%
BenchmarkString/100000/Base10-4       5779029          1793808        -68.96%
BenchmarkString/1000000/Base10-4      889840820        85524031       -90.39%
BenchmarkString/10000000/Base10-4     134338236860     4935657026     -96.33%

Fixes #21960
Updates #30943

Change-Id: I134c6f81a47870c688ca95b6081eb9211def15a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/172018
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-11-12 05:18:25 +00:00
Ian Lance Taylor f5e89c2214 Revert "math/cmplx: handle special cases"
This reverts CL 169501.

Reason for revert: The new tests fail at least on s390x and MIPS. This is likely a minor bug in the compiler or runtime. But this point in the release cycle is not the time to debug these details, which are unlikely to be new. Let's try again for 1.15.

Updates #29320
Fixes #35443

Change-Id: I2218b2083f8974b57d528e3742524393fc72b355
Reviewed-on: https://go-review.googlesource.com/c/go/+/206037
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-11-08 03:07:30 +00:00
Brian Kessler 68dce4296e math/cmplx: handle special cases
Implement special case handling and testing to ensure
conformance with the C99 standard annex G.6 Complex arithmetic.

Fixes #29320

Change-Id: Ieb0527191dd7fdea5b1aecb42b9e23aae3f74260
Reviewed-on: https://go-review.googlesource.com/c/go/+/169501
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-11-07 19:48:30 +00:00
Russ Cox e8f01d591f math: test portable FMA even on system with hardware FMA
This makes it a little less likely the portable FMA will be
broken without realizing it.

Change-Id: I7f7f4509b35160a9709f8b8a0e494c09ea6e410a
Reviewed-on: https://go-review.googlesource.com/c/go/+/205337
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-07 14:53:38 +00:00
Russ Cox 543c6d2e0d math, cmd/compile: rename Fma to FMA
This API was added for #25819, where it was discussed as math.FMA.
The commit adding it used math.Fma, presumably for consistency
with the rest of the unusual names in package math
(Sincos, Acosh, Erfcinv, Float32bits, etc).

I believe that using an idiomatic Go name is more important here
than consistency with these other names, most of which are historical
baggage from C's standard library.

Early additions like Float32frombits happened before "uppercase for export"
(so they were originally like "float32frombits") and they were not properly
reconsidered when we uppercased the symbols to export them.
That's a mistake we live with.

The names of functions we have added since then, and even a few
that were legacy, are more properly Go-cased, such as IsNaN, IsInf,
and RoundToEven, rather than Isnan, Isinf, and Roundtoeven.
And also constants like MaxFloat32.

For new API, we should keep using proper Go-cased symbols
instead of minimally-upper-cased-C symbols.

So math.FMA, not math.Fma.

This API has not yet been released, so this change does not break
the compatibility promise.

This CL also modifies cmd/compile, since the compiler knows
the name of the function. I could have stopped at changing the
string constants, but it seemed to make more sense to use a
consistent casing everywhere.

Change-Id: I0f6f3407f41e99bfa8239467345c33945088896e
Reviewed-on: https://go-review.googlesource.com/c/go/+/205317
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-11-07 14:51:06 +00:00
Brian Kessler f5949b6067 math/big: allow all values for GCD
Allow the inputs a and b to be zero or negative to GCD
with the following definitions.

If x or y are not nil, GCD sets their value such that z = a*x + b*y.
Regardless of the signs of a and b, z is always >= 0.
If a == b == 0, GCD sets z = x = y = 0.
If a == 0 and b != 0, GCD sets z = |b|, x = 0, y = sign(b) * 1.
If a != 0 and b == 0, GCD sets z = |a|, x = sign(a) * 1, y = 0.

Fixes #28878

Change-Id: Ia83fce66912a96545c95cd8df0549bfd852652f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/164972
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-11-07 06:58:44 +00:00
Rémy Oudompheng 8f30d25168 math/big: use nat pool to reduce allocations in mul and sqr
This notably allows to reuse temporaries across
the karatsubaSqr recursion.

benchmark                    old ns/op     new ns/op     delta
BenchmarkNatMul/10-4         227           228           +0.44%
BenchmarkNatMul/100-4        8339          8589          +3.00%
BenchmarkNatMul/1000-4       313796        312272        -0.49%
BenchmarkNatMul/10000-4      11924720      11873589      -0.43%
BenchmarkNatMul/100000-4     503813354     503839058     +0.01%
BenchmarkNatSqr/20-4         549           513           -6.56%
BenchmarkNatSqr/30-4         945           874           -7.51%
BenchmarkNatSqr/50-4         1993          1832          -8.08%
BenchmarkNatSqr/80-4         4096          3874          -5.42%
BenchmarkNatSqr/100-4        6192          5712          -7.75%
BenchmarkNatSqr/200-4        20388         19543         -4.14%
BenchmarkNatSqr/300-4        38735         36715         -5.21%
BenchmarkNatSqr/500-4        99562         93542         -6.05%
BenchmarkNatSqr/800-4        195554        184907        -5.44%
BenchmarkNatSqr/1000-4       286302        275053        -3.93%
BenchmarkNatSqr/10000-4      9817057       9441641       -3.82%
BenchmarkNatSqr/100000-4     390713416     379696789     -2.82%

benchmark                    old allocs     new allocs     delta
BenchmarkNatMul/10-4         1              1              +0.00%
BenchmarkNatMul/100-4        1              1              +0.00%
BenchmarkNatMul/1000-4       2              1              -50.00%
BenchmarkNatMul/10000-4      2              1              -50.00%
BenchmarkNatMul/100000-4     9              11             +22.22%
BenchmarkNatSqr/20-4         2              1              -50.00%
BenchmarkNatSqr/30-4         2              1              -50.00%
BenchmarkNatSqr/50-4         2              1              -50.00%
BenchmarkNatSqr/80-4         2              1              -50.00%
BenchmarkNatSqr/100-4        2              1              -50.00%
BenchmarkNatSqr/200-4        2              1              -50.00%
BenchmarkNatSqr/300-4        4              1              -75.00%
BenchmarkNatSqr/500-4        4              1              -75.00%
BenchmarkNatSqr/800-4        10             1              -90.00%
BenchmarkNatSqr/1000-4       10             1              -90.00%
BenchmarkNatSqr/10000-4      731            1              -99.86%
BenchmarkNatSqr/100000-4     19687          6              -99.97%

benchmark                    old bytes     new bytes     delta
BenchmarkNatMul/10-4         192           192           +0.00%
BenchmarkNatMul/100-4        4864          4864          +0.00%
BenchmarkNatMul/1000-4       57344         49224         -14.16%
BenchmarkNatMul/10000-4      565248        498772        -11.76%
BenchmarkNatMul/100000-4     5749504       7263720       +26.34%
BenchmarkNatSqr/20-4         672           352           -47.62%
BenchmarkNatSqr/30-4         992           512           -48.39%
BenchmarkNatSqr/50-4         1792          896           -50.00%
BenchmarkNatSqr/80-4         2688          1408          -47.62%
BenchmarkNatSqr/100-4        3584          1792          -50.00%
BenchmarkNatSqr/200-4        6656          3456          -48.08%
BenchmarkNatSqr/300-4        24448         16387         -32.97%
BenchmarkNatSqr/500-4        36864         24591         -33.29%
BenchmarkNatSqr/800-4        69760         40981         -41.25%
BenchmarkNatSqr/1000-4       86016         49180         -42.82%
BenchmarkNatSqr/10000-4      2524800       487368        -80.70%
BenchmarkNatSqr/100000-4     68599808      5876581       -91.43%

Change-Id: I8e6e409ae1cb48be9d5aa9b5f428d6cbe487673a
Reviewed-on: https://go-review.googlesource.com/c/go/+/172017
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-10-25 03:14:39 +00:00
Robert Griesemer a52c0a1992 math/big: make Rat.Denom side-effect free
A Rat is represented via a quotient a/b where a and b are Int values.
To make it possible to use an uninitialized Rat value (with a and b
uninitialized and thus == 0), the implementation treats a 0 denominator
as 1.

Rat.Num and Rat.Denom return pointers to these values a and b. Because
b may be 0, Rat.Denom used to first initialize it to 1 and thus produce
an undesirable side-effect (by changing the Rat's denominator).

This CL changes Denom to return a new (not shared) *Int with value 1
in the rare case where the Rat was not initialized. This eliminates
the side effect and returns the correct denominator value.

While this is changing behavior of the API, the impact should now be
minor because together with (prior) CL https://golang.org/cl/202997,
which initializes Rats ASAP, Denom is unlikely used to access the
denominator of an uninitialized (and thus 0) Rat. Any operation that
will somehow set a Rat value will ensure that the denominator is not 0.

Fixes #33792.
Updates #3521.

Change-Id: I0bf15ac60513cf52162bfb62440817ba36f0c3fc
Reviewed-on: https://go-review.googlesource.com/c/go/+/203059
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-10-24 03:34:24 +00:00
Robert Griesemer 4412181e7c math/big: normalize unitialized denominators ASAP
A Rat is represented via a quotient a/b where a and b are Int values.
To make it possible to use an uninitialized Rat value (with a and b
uninitialized and thus == 0), the implementation treats a 0 denominator
as 1.

For each operation we check if the denominator is 0, and then treat
it as 1 (if necessary). Operations that create a new Rat result,
normalize that value such that a result denominator 1 is represened
as 0 again.

This CL changes this behavior slightly: 0 denominators are still
interpreted as 1, but whenever we (safely) can, we set an uninitialized
0 denominator to 1. This simplifies the code overall.

Also: Improved some doc strings.

Preparation for addressing issue #33792.

Updates #33792.

Change-Id: I3040587c8d0dad2e840022f96ca027d8470878a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/202997
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-10-24 03:34:11 +00:00
Akhil Indurti 93a601dd2a math: add guaranteed-precision FMA implementation
Currently, the precision of the float64 multiply-add operation
(x * y) + z varies across architectures. While generated code for
ppc64, s390x, and arm64 can guarantee that there is no intermediate
rounding on those platforms, other architectures like x86, mips, and
arm will exhibit different behavior depending on available instruction
set. Consequently, applications cannot rely on results being identical
across GOARCH-dependent codepaths.

This CL introduces a software implementation that performs an IEEE 754
double-precision fused-multiply-add operation. The only supported
rounding mode is round-to-nearest ties-to-even. Separate CLs include
hardware implementations when available. Otherwise, this software
fallback is given as the default implementation.

Specifically,
    - arm64, ppc64, s390x: Uses the FMA instruction provided by all
      of these ISAs.
    - mips[64][le]: Falls back to this software implementation. Only
      release 6 of the ISA includes a strict FMA instruction with
      MADDF.D (not implementation defined). Because the number of R6
      processors in the wild is scarce, the assembly implementation
      is left as a future optimization.
    - x86: Guards the use of VFMADD213SD by checking cpu.X86.HasFMA.
    - arm: Guards the use of VFMA by checking cpu.ARM.HasVFPv4.
    - software fallback: Uses mostly integer arithmetic except
      for input that involves Inf, NaN, or zero.

Updates #25819.

Change-Id: Iadadff2219638bacc9fec78d3ab885393fea4a08
Reviewed-on: https://go-review.googlesource.com/c/go/+/127458
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 16:15:30 +00:00
Alberto Donizetti 57c63e0fb2 math/bits: add Rem, Rem32, Rem64
The Div functions in math/bits (Div, Div32, and Div64) compute both
quotients and remainders, but they panic if the quotients do not not
fit a 32/64 uint.

Since, on the other hand, the remainder will always fit the size of
the divisor, it is useful to have Div variants that only compute the
remainder, and don't panic on a quotient overflow.

This change adds to the math/bits package three new functions:

  Rem(hi, lo, y uint) uint
  Rem32(hi, lo, y uint32) uint32
  Rem64(hi, lo, y uint64) uint64

which can be used to compute (hi,lo)%y even when the quotient
overflows the uint size.

Fixes #28970

Change-Id: I119948429f737670c5e5ceb8756121e6a738dbdc
Reviewed-on: https://go-review.googlesource.com/c/go/+/197838
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-18 13:47:36 +00:00
Robert Griesemer 898f9db81f math/big: make Rat accessors safe for concurrent use
Do not modify the underlying Rat denominator when calling
one of the accessors Float32, Float64; verify that we don't
modify the Rat denominator when calling Inv, Sign, IsInt, Num.

Fixes #34919.
Reopens #33792.

Change-Id: Ife6d1252373f493a597398ee51e7b5695b708df5
Reviewed-on: https://go-review.googlesource.com/c/go/+/201205
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-10-15 23:31:58 +00:00
Brad Fitzpatrick 03ef105dae all: remove nacl (part 3, more amd64p32)
Part 1: CL 199499 (GOOS nacl)
Part 2: CL 200077 (amd64p32 files, toolchain)
Part 3: stuff that arguably should've been part of Part 2, but I forgot
        one of my grep patterns when splitting the original CL up into
        two parts.

This one might also have interesting stuff to resurrect for any future
x32 ABI support.

Updates #30439

Change-Id: I2b4143374a253a003666f3c69e776b7e456bdb9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200318
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-10-10 22:38:38 +00:00
Brad Fitzpatrick 07b4abd62e all: remove the nacl port (part 2, amd64p32 + toolchain)
This is part two if the nacl removal. Part 1 was CL 199499.

This CL removes amd64p32 support, which might be useful in the future
if we implement the x32 ABI. It also removes the nacl bits in the
toolchain, and some remaining nacl bits.

Updates #30439

Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/200077
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-10-09 22:34:34 +00:00
Carlo Alberto Ferraris 5f1aeaeb77 math/rand: devirtualize interface call in Read
This allows to inline the common case in which the Source is a
rngSource. On linux/amd64 in a VM:

name        old time/op  new time/op  delta
Read3-4     33.8ns ± 8%  18.5ns ± 8%  -45.38%  (p=0.000 n=10+10)
Read64-4     371ns ± 8%    70ns ± 7%  -81.00%  (p=0.000 n=10+10)
Read1000-4  5.33µs ± 5%  0.86µs ± 3%  -83.85%  (p=0.000 n=9+9)

Change-Id: Ibf47b0e9ecdfe62ffcb66d6a92f191800bdc740e
Reviewed-on: https://go-review.googlesource.com/c/go/+/191539
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-09-30 15:43:34 +00:00
Carlo Alberto Ferraris 931365763a math/rand: devirtualize interface in lockedSource
Avoid interface calls, enable inlining, and store the rngSource close to the
Mutex to exploit better memory locality.

Also add a benchmark to properly measure the threadsafe nature of globalRand.

On a linux/amd64 VM:

name                       old time/op  new time/op  delta
Int63Threadsafe-4          36.4ns ±12%  20.6ns ±11%  -43.52%  (p=0.000 n=30+30)
Int63ThreadsafeParallel-4  79.3ns ± 5%  56.5ns ± 5%  -28.69%  (p=0.000 n=29+30)

Change-Id: I6ab912c1a1e9afc7bacd8e72c82d4d50d546a510
Reviewed-on: https://go-review.googlesource.com/c/go/+/191538
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-09-29 14:27:05 +00:00
Robert Griesemer 770fac4586 math/big: avoid MinExp exponent wrap-around in 'x' Text format
Fixes #34343.

Change-Id: I74240c8f431f6596338633a86a7a5ee1fce70a65
Reviewed-on: https://go-review.googlesource.com/c/go/+/196057
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-09-18 04:48:34 +00:00
Javier 5cc64141e7 math: Add examples for Copysign, Dim, Exp* and Trunc
Change-Id: I95921a8a55b243600aaec24ddca74b7040107dca
Reviewed-on: https://go-review.googlesource.com/c/go/+/195203
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-09-17 23:05:29 +00:00
Ainar Garipov 0efbd10157 all: fix typos
Use the following (suboptimal) script to obtain a list of possible
typos:

  #!/usr/bin/env sh

  set -x

  git ls-files |\
    grep -e '\.\(c\|cc\|go\)$' |\
    xargs -n 1\
    awk\
    '/\/\// { gsub(/.*\/\//, ""); print; } /\/\*/, /\*\// { gsub(/.*\/\*/, ""); gsub(/\*\/.*/, ""); }' |\
    hunspell -d en_US -l |\
    grep '^[[:upper:]]\{0,1\}[[:lower:]]\{1,\}$' |\
    grep -v -e '^.\{1,4\}$' -e '^.\{16,\}$' |\
    sort -f |\
    uniq -c |\
    awk '$1 == 1 { print $2; }'

Then, go through the results manually and fix the most obvious typos in
the non-vendored code.

Change-Id: I3cb5830a176850e1a0584b8a40b47bde7b260eae
Reviewed-on: https://go-review.googlesource.com/c/go/+/193848
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-09-08 17:28:20 +00:00
peter zhang d5fe73393c math/big: fix a duplicate "the" in a comment
Change-Id: Ib637381ab8a12aeb798576b781e1b3c458ba812d
GitHub-Last-Rev: 12994496b6
GitHub-Pull-Request: golang/go#34017
Reviewed-on: https://go-review.googlesource.com/c/go/+/192877
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2019-09-02 11:42:47 +00:00
Eric Lagergren 9dfa4cb026 math/big: document that Rat.Denom might modify the receiver
Fixes #33792

Change-Id: I306a95883c3db2d674d3294a6feb50adc50ee5d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/192017
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-08-28 16:20:00 +00:00
Illya Yalovyy be452cea42 math/big: fast path for Cmp if same
math/big.Int Cmp method does not have a fast path for the case if x and y are the same.

Fixes #30856

Change-Id: Ia9a5b5f72db9d73af1b13ed6ac39ecff87d10393
Reviewed-on: https://go-review.googlesource.com/c/go/+/178957
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-08-28 15:08:00 +00:00
Robert Griesemer 41b9e99d5b cmd/gofmt: fix normalization of imaginary number literals
The old code only normalized decimal integer imaginary number
literals. But with the generalized imaginary number syntax,
the number value may be decimal, binary, octal, or hexadecimal,
integer or floating-point.

The new code only looks at the number pattern. Only for decimal
integer imaginary literals do we need to strip leading zeroes.
The remaining normalization code simply ignore the 'i' suffix.
As a result, the new code is both simpler and shorter.

Fixes #32718.

Change-Id: If43fc962a48ed62002e65d5c81fddbb9bd283984
Reviewed-on: https://go-review.googlesource.com/c/go/+/183378
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-06-21 17:24:29 +00:00
Michael Brandenburg 18107ed9fb math: add examples for Log, Log2, Mod, and Abs
Change-Id: I5f57acd5e970b3fec5f33cfceee179235cbf739f
Reviewed-on: https://go-review.googlesource.com/c/go/+/182877
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-06-19 07:09:13 +00:00
Russ Cox 06b0babf31 all: shorten some tests
Shorten some of the longest tests that run during all.bash.
Removes 7r 50u 21s from all.bash.

After this change, all.bash is under 5 minutes again on my laptop.

For #26473.

Change-Id: Ie0460aa935808d65460408feaed210fbaa1d5d79
Reviewed-on: https://go-review.googlesource.com/c/go/+/177559
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-05-22 12:54:00 +00:00
Filippo Valsorda 41329c07f9 math/bits: document that Add, Sub, Mul, RotateLeft, ReverseBytes are constant time
Fixes #31267

Change-Id: I91e4aa8cf9d797689cb9612d0fe3bf1bb3ad15a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/178177
Reviewed-by: Keith Randall <khr@golang.org>
2019-05-21 20:15:52 +00:00
adarsh ravichandran 776e1709e5 math/bits: add example for OnesCount function
Change-Id: Id87db9bed5e8715d554c1bf95c063d7d0a03c3e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/178117
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-05-20 18:16:09 +00:00
smasher164 5ca44dc403 math/bits: make Add and Sub fallbacks constant time
Make the extended precision add-with-carry and sub-with-carry operations
take a constant amount of time to execute, regardless of input.

name             old time/op  new time/op  delta
Add-4            1.16ns ±11%  1.51ns ± 5%  +30.52%  (p=0.008 n=5+5)
Add32-4          1.08ns ± 0%  1.03ns ± 1%   -4.86%  (p=0.029 n=4+4)
Add64-4          1.09ns ± 1%  1.95ns ± 3%  +79.23%  (p=0.008 n=5+5)
Add64multiple-4  4.03ns ± 1%  4.55ns ±11%  +13.07%  (p=0.008 n=5+5)
Sub-4            1.08ns ± 1%  1.50ns ± 0%  +38.17%  (p=0.016 n=5+4)
Sub32-4          1.09ns ± 2%  1.53ns ±10%  +40.26%  (p=0.008 n=5+5)
Sub64-4          1.10ns ± 1%  1.47ns ± 1%  +33.39%  (p=0.008 n=5+5)
Sub64multiple-4  4.30ns ± 2%  4.08ns ± 4%   -5.07%  (p=0.032 n=5+5)

Fixes #31267

Change-Id: I1824b1b3ab8f09902ce8b5fef84ce2fdb8847ed9
Reviewed-on: https://go-review.googlesource.com/c/go/+/170758
Reviewed-by: Filippo Valsorda <filippo@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-05-20 01:13:27 +00:00
JT Olio 5a2da5624a math/big: stack allocate scaleDenom return value
benchmark             old ns/op     new ns/op     delta
BenchmarkRatCmp-4     154           77.9          -49.42%

Change-Id: I932710ad8b6905879e232168b1777927f86ba22a
Reviewed-on: https://go-review.googlesource.com/c/go/+/175460
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-05-08 17:11:57 +00:00
erifan01 503e6ccd74 math/big: fix the bug in assembly implementation of shlVU on arm64
For the case where the addresses of parameter z and x of the function
shlVU overlap and the address of z is greater than x, x (input value)
can be polluted during the calculation when the high words of x are
overlapped with the low words of z (output value).

Fixes #31084

Change-Id: I9bb0266a1d7856b8faa9a9b1975d6f57dece0479
Reviewed-on: https://go-review.googlesource.com/c/go/+/169780
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-05-08 01:29:00 +00:00
Brian Kessler 689ee112df math/big: document Int.String
Int.String had no documentation and the documentation for Int.Text
did not mention the handling of the nil pointer case.

Change-Id: I9f21921e431c948545b7cabc7829e4b4e574bbe9
Reviewed-on: https://go-review.googlesource.com/c/go/+/175118
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-05-03 03:25:26 +00:00
Michael Munday f0fdbb1e8b math: consolidate assembly stub implementations
Where assembly functions are just jumps to the Go implementation
put them into a stubs_<arch>.s file. This reduces the number of
files considerably and makes it easier to see what is really
implemented in assembly.

I've also run the stubs files through asmfmt to format them in
a more consistent way.

Eventually we should replace these 'stub' assembly files with
a pure Go implementation now that we have mid-stack inlining
(see #31362).

Change-Id: If5b2022dcc23e1299f1b7ba79884f1b1263d0f7f
Reviewed-on: https://go-review.googlesource.com/c/go/+/173398
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-23 14:50:16 +00:00
erifan01 d17d41e58d math/big: optimize mulAddVWW on arm64 for better performance
Unroll the cycle 4 times to reduce load overhead.

Benchmarks:
name                old time/op    new time/op    delta
MulAddVWW/1-8         15.9ns ± 0%    11.9ns ± 0%  -24.92%  (p=0.000 n=8+8)
MulAddVWW/2-8         16.1ns ± 0%    13.9ns ± 1%  -13.82%  (p=0.000 n=8+8)
MulAddVWW/3-8         18.9ns ± 0%    17.3ns ± 0%   -8.47%  (p=0.000 n=8+8)
MulAddVWW/4-8         21.7ns ± 0%    19.5ns ± 0%  -10.14%  (p=0.000 n=8+8)
MulAddVWW/5-8         25.1ns ± 0%    22.5ns ± 0%  -10.27%  (p=0.000 n=8+8)
MulAddVWW/10-8        41.6ns ± 0%    40.0ns ± 0%   -3.79%  (p=0.000 n=8+8)
MulAddVWW/100-8        368ns ± 0%     363ns ± 0%   -1.36%  (p=0.000 n=8+8)
MulAddVWW/1000-8      3.52µs ± 0%    3.52µs ± 0%   -0.14%  (p=0.000 n=8+8)
MulAddVWW/10000-8     35.1µs ± 0%    35.1µs ± 0%   -0.01%  (p=0.000 n=7+6)
MulAddVWW/100000-8     351µs ± 0%     351µs ± 0%   +0.15%  (p=0.038 n=8+8)

Change-Id: I052a4db286ac6e4f3293289c7e9a82027da0405e
Reviewed-on: https://go-review.googlesource.com/c/go/+/155780
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-04-22 14:45:16 +00:00
Josh Bleecher Snyder 5781df421e all: s/cancelation/cancellation/
Though there is variation in the spelling of canceled,
cancellation is always spelled with a double l.

Reference: https://www.grammarly.com/blog/canceled-vs-cancelled/

Change-Id: I240f1a297776c8e27e74f3eca566d2bc4c856f2f
Reviewed-on: https://go-review.googlesource.com/c/go/+/170060
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-16 20:27:15 +00:00
Michael Munday 64dc4ba73f math: use new mnemonics for 'rotate then insert' on s390x
Mnemonics for these instructions were added to the assembler in
CL 159357.

Change-Id: Ie11c45ecc9cead9a8850fcc929b0211cfd980fe5
Reviewed-on: https://go-review.googlesource.com/c/go/+/160157
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-16 15:34:41 +00:00
Robert Griesemer f8c6f986fd math/big: don't clobber shared underlying array in pow5 computation
Rearranged code slightly to make lifetime of underlying array of
pow5 more explicit in code.

Fixes #31184.

Change-Id: I063081f0e54097c499988d268a23813746592654
Reviewed-on: https://go-review.googlesource.com/c/go/+/170641
Reviewed-by: Filippo Valsorda <filippo@golang.org>
2019-04-15 17:48:21 +00:00
Neven Sajko 7756a72b35 all: change the old assembly style AX:CX to CX, AX
Assembly files with "/vendor/" or "testdata" in their paths were ignored.

Change-Id: I3882ff07eb4426abb9f8ee96f82dff73c81cd61f
GitHub-Last-Rev: 51ae8c324d
GitHub-Pull-Request: golang/go#31166
Reviewed-on: https://go-review.googlesource.com/c/go/+/170197
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-04-09 00:22:03 +00:00
Filippo Valsorda ead895688d math/big: do not panic in Exp when y < 0 and x doesn't have an inverse
If x does not have an inverse modulo m, and a negative exponent is used,
return nil just like ModInverse does now.

Change-Id: I8fa72f7a851e8cf77c5fab529ede88408740626f
Reviewed-on: https://go-review.googlesource.com/c/go/+/170757
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-04-04 23:02:09 +00:00
Than McIntosh bead358611 math/bits: add gccgo-friendly code for compiler bootstrap
When building as part of the bootstrap process, avoid
use of "go:linkname" applied to variables, since this
feature is ill-defined/unsupported for gccgo.

Updates #30771.

Change-Id: Id44d01b5c98d292702e5075674117518cb59e2d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/170737
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-04-04 18:17:30 +00:00
Neven Sajko 964fe4b80f math/big: simplify shlVU_g and shrVU_g
Rewrote a few lines to be more idiomatic/less assembly-ish.

Benchmarked with `go test -bench Float -tags math_big_pure_go`:

name                  old time/op    new time/op    delta
FloatString/100-8        751ns ± 0%     746ns ± 1%  -0.71%  (p=0.000 n=10+10)
FloatString/1000-8      22.9µs ± 0%    22.9µs ± 0%    ~     (p=0.271 n=10+10)
FloatString/10000-8     1.89ms ± 0%    1.89ms ± 0%    ~     (p=0.481 n=10+10)
FloatString/100000-8     184ms ± 0%     184ms ± 0%    ~     (p=0.094 n=9+9)
FloatAdd/10-8           56.4ns ± 1%    56.5ns ± 0%    ~     (p=0.170 n=9+9)
FloatAdd/100-8          59.7ns ± 0%    59.3ns ± 0%  -0.70%  (p=0.000 n=8+9)
FloatAdd/1000-8          101ns ± 0%      99ns ± 0%  -1.89%  (p=0.000 n=8+8)
FloatAdd/10000-8         553ns ± 0%     536ns ± 0%  -3.00%  (p=0.000 n=9+10)
FloatAdd/100000-8       4.94µs ± 0%    4.74µs ± 0%  -3.94%  (p=0.000 n=9+10)
FloatSub/10-8           50.3ns ± 0%    50.5ns ± 0%  +0.52%  (p=0.000 n=8+8)
FloatSub/100-8          52.0ns ± 0%    52.2ns ± 1%  +0.46%  (p=0.012 n=8+10)
FloatSub/1000-8         77.9ns ± 0%    77.3ns ± 0%  -0.80%  (p=0.000 n=7+8)
FloatSub/10000-8         371ns ± 0%     362ns ± 0%  -2.67%  (p=0.000 n=10+10)
FloatSub/100000-8       3.20µs ± 0%    3.10µs ± 0%  -3.16%  (p=0.000 n=10+10)
ParseFloatSmallExp-8    7.84µs ± 0%    7.82µs ± 0%  -0.17%  (p=0.037 n=9+9)
ParseFloatLargeExp-8    29.3µs ± 1%    29.5µs ± 0%    ~     (p=0.059 n=9+8)
FloatSqrt/64-8           516ns ± 0%     519ns ± 0%  +0.54%  (p=0.000 n=9+9)
FloatSqrt/128-8         1.07µs ± 0%    1.07µs ± 0%    ~     (p=0.109 n=8+9)
FloatSqrt/256-8         1.23µs ± 0%    1.23µs ± 0%  +0.50%  (p=0.000 n=9+9)
FloatSqrt/1000-8        3.43µs ± 0%    3.44µs ± 0%  +0.53%  (p=0.000 n=9+8)
FloatSqrt/10000-8       40.9µs ± 0%    40.7µs ± 0%  -0.39%  (p=0.000 n=9+8)
FloatSqrt/100000-8      1.07ms ± 0%    1.07ms ± 0%  -0.10%  (p=0.017 n=10+9)
FloatSqrt/1000000-8     89.3ms ± 0%    89.2ms ± 0%  -0.07%  (p=0.015 n=9+8)

Change-Id: Ibf07c6142719d11bc7f329246957d87a9f3ba3d2
GitHub-Last-Rev: 870a041ab7
GitHub-Pull-Request: golang/go#31220
Reviewed-on: https://go-review.googlesource.com/c/go/+/170449
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-04-04 00:26:24 +00:00
Robert Griesemer 4dce6dbb1e math/big: temporarily disable buggy shlVU assembly for arm64
This addresses the failures we have seen in #31084. The correct
fix is to find the actual bug in the assembly code.

Updates #31084.

Change-Id: I437780c53d0c4423d742e2e3b650b899ce845372
Reviewed-on: https://go-review.googlesource.com/c/go/+/169721
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-27 23:40:12 +00:00
Brian Kessler 5ee2290420 math/big: implement Rat.SetUint64
Implemented via the underlying Int.SetUint64.
Added tests for Rat.SetInt64 and Rat.SetUint64.

Fixes #29579

Change-Id: I03faaffc93e36873b202b58ae72b139dea5c40f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/160682
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-03-27 15:20:28 +00:00
Neven Sajko 270de1c110 math: use Sincos instead of Sin and Cos in Jn and Yn
Change-Id: I0da3857013f1d4e90820fb043314d78924113a27
GitHub-Last-Rev: 7c3d813c6e
GitHub-Pull-Request: golang/go#31019
Reviewed-on: https://go-review.googlesource.com/c/go/+/169078
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-03-25 22:41:37 +00:00
Robert Griesemer e4ba40030f math/big: accept non-decimal floats with Rat.SetString
This fixes an old oversight. Rat.SetString already permitted
fractions a/b where both a and b could independently specify
a base prefix. With this CL, it now also accepts non-decimal
floating-point numbers.

Fixes #29799.

Change-Id: I9cc65666a5cebb00f0202da2e4fc5654a02e3234
Reviewed-on: https://go-review.googlesource.com/c/go/+/168237
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
2019-03-25 22:29:26 +00:00
Vladimir Kovpak 9a8979deb0 math/rand: add example for Intn func
Change-Id: I831ffb5c3fa2872d71def8d8461f0adbd4ae2c1a
GitHub-Last-Rev: 2adfcd2d5a
GitHub-Pull-Request: golang/go#30706
Reviewed-on: https://go-review.googlesource.com/c/go/+/166426
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-25 16:05:37 +00:00
erifan01 d0cbf9bf53 cmd/compile: follow up intrinsifying math/bits.Add64 for arm64
This CL deals with the additional comments of CL 159017.

Change-Id: I4ad3c60c834646d58dc0c544c741b92bfe83fb8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/168857
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-22 15:09:47 +00:00
erifan01 5714c91b53 cmd/compile: intrinsify math/bits.Add64 for arm64
This CL instrinsifies Add64 with arm64 instruction sequence ADDS, ADCS
and ADC, and optimzes the case of carry chains.The CL also changes the
test code so that the intrinsic implementation can be tested.

Benchmarks:
name               old time/op       new time/op       delta
Add-224            2.500000ns +- 0%  2.090000ns +- 4%  -16.40%  (p=0.000 n=9+10)
Add32-224          2.500000ns +- 0%  2.500000ns +- 0%     ~     (all equal)
Add64-224          2.500000ns +- 0%  1.577778ns +- 2%  -36.89%  (p=0.000 n=10+9)
Add64multiple-224  6.000000ns +- 0%  2.000000ns +- 0%  -66.67%  (p=0.000 n=10+10)

Change-Id: I6ee91c9a85c16cc72ade5fd94868c579f16c7615
Reviewed-on: https://go-review.googlesource.com/c/go/+/159017
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-20 05:39:49 +00:00
David Chase 6535791385 math: fix math.Remainder(-x,x) (for Inf > x > 0)
Modify the |x| == |y| case to return -0 when x < 0.

Fixes #30814.

Change-Id: Ic4cd48001e0e894a12b5b813c6a1ddc3a055610b
Reviewed-on: https://go-review.googlesource.com/c/go/+/167479
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-03-15 14:52:51 +00:00
Robert Griesemer cfa93ba51f math/big: add support for underscores '_' in numbers
The primary change is in nat.scan which now accepts underscores for base 0.
While at it, streamlined error handling in that function as well.
Also, improved the corresponding test significantly by checking the
expected result values also in case of scan errors.

The second major change is in scanExponent which now accepts underscores when
the new sepOk argument is set. While at it, essentially rewrote that
function to match error and underscore handling of nat.scan more closely.
Added a new test for scanExponent which until now was only tested
indirectly.

Finally, updated the documentation for several functions and added many
new test cases to clients of nat.scan.

A major portion of this CL is due to much better test coverage.

Updates #28493.

Change-Id: I7f17b361b633fbe6c798619d891bd5e0a045b5c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/166157
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
2019-03-12 22:58:58 +00:00
Brian Kessler ef891e1c83 math/big: implement Int.TrailingZeroBits
Implemented via the underlying nat.trailingZeroBits.

Fixes #29578

Change-Id: If9876c5a74b107cbabceb7547bef4e44501f6745
Reviewed-on: https://go-review.googlesource.com/c/go/+/160681
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-12 13:18:27 +00:00
Josh Bleecher Snyder 4d10aba35e math/big: add fast path for amd64 addVW for large z
This matches the pure Go fast path added in the previous commit.

I will leave other architectures to those with ready access to hardware.

name            old time/op    new time/op    delta
AddVW/1-8         3.60ns ± 3%    3.59ns ± 1%      ~     (p=0.147 n=91+86)
AddVW/2-8         3.92ns ± 1%    3.91ns ± 2%    -0.36%  (p=0.000 n=86+92)
AddVW/3-8         4.33ns ± 5%    4.46ns ± 5%    +2.94%  (p=0.000 n=96+97)
AddVW/4-8         4.76ns ± 5%    4.82ns ± 5%    +1.28%  (p=0.000 n=95+92)
AddVW/5-8         5.40ns ± 1%    5.42ns ± 0%    +0.47%  (p=0.000 n=76+71)
AddVW/10-8        8.03ns ± 1%    7.80ns ± 5%    -2.90%  (p=0.000 n=73+96)
AddVW/100-8       43.8ns ± 5%    17.9ns ± 1%   -59.12%  (p=0.000 n=94+81)
AddVW/1000-8       428ns ± 4%      85ns ± 6%   -80.20%  (p=0.000 n=96+99)
AddVW/10000-8     4.22µs ± 2%    1.80µs ± 3%   -57.32%  (p=0.000 n=69+92)
AddVW/100000-8    44.8µs ± 8%    31.5µs ± 3%   -29.76%  (p=0.000 n=99+90)

name            old time/op    new time/op    delta
SubVW/1-8         3.53ns ± 2%    3.63ns ± 5%    +2.97%  (p=0.000 n=94+93)
SubVW/2-8         4.33ns ± 5%    4.01ns ± 2%    -7.36%  (p=0.000 n=90+85)
SubVW/3-8         4.32ns ± 2%    4.32ns ± 5%      ~     (p=0.084 n=87+97)
SubVW/4-8         4.70ns ± 2%    4.83ns ± 6%    +2.77%  (p=0.000 n=85+96)
SubVW/5-8         5.84ns ± 1%    5.35ns ± 1%    -8.35%  (p=0.000 n=87+87)
SubVW/10-8        8.01ns ± 4%    7.54ns ± 4%    -5.84%  (p=0.000 n=98+97)
SubVW/100-8       43.9ns ± 5%    17.9ns ± 1%   -59.20%  (p=0.000 n=98+76)
SubVW/1000-8       426ns ± 2%      85ns ± 3%   -80.13%  (p=0.000 n=90+98)
SubVW/10000-8     4.24µs ± 2%    1.81µs ± 3%   -57.28%  (p=0.000 n=74+91)
SubVW/100000-8    44.5µs ± 4%    31.5µs ± 2%   -29.33%  (p=0.000 n=84+91)

Change-Id: I10dd361cbaca22197c27e7734c0f50065292afbb
Reviewed-on: https://go-review.googlesource.com/c/go/+/164969
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-03-09 20:34:40 +00:00
Josh Bleecher Snyder fe24837c4d math/big: add fast path for pure Go addVW for large z
In the normal case, only a few words have to be updated when adding a word to a vector.
When that happens, we can simply copy the rest of the words, which is much faster.
However, the overhead of that makes it prohibitive for small vectors,
so we check the size at the beginning.

The implementation is a bit weird to allow addVW to continued to be inlined; see #30548.

The AddVW benchmarks are surprising, but fully repeatable.
The SubVW benchmarks are more or less as expected.
I expect that removing the indirect function call will
help both and make them a bit more normal.

name            old time/op    new time/op     delta
AddVW/1-8         4.27ns ± 2%     3.81ns ± 3%   -10.83%  (p=0.000 n=89+90)
AddVW/2-8         4.91ns ± 2%     4.34ns ± 1%   -11.60%  (p=0.000 n=83+90)
AddVW/3-8         5.77ns ± 4%     5.76ns ± 2%      ~     (p=0.365 n=91+87)
AddVW/4-8         6.03ns ± 1%     6.03ns ± 1%      ~     (p=0.392 n=80+76)
AddVW/5-8         6.48ns ± 2%     6.63ns ± 1%    +2.27%  (p=0.000 n=76+74)
AddVW/10-8        9.56ns ± 2%     9.56ns ± 1%    -0.02%  (p=0.002 n=69+76)
AddVW/100-8       90.6ns ± 0%     18.1ns ± 4%   -79.99%  (p=0.000 n=72+94)
AddVW/1000-8       865ns ± 0%       85ns ± 6%   -90.14%  (p=0.000 n=66+96)
AddVW/10000-8     8.57µs ± 2%     1.82µs ± 3%   -78.73%  (p=0.000 n=99+94)
AddVW/100000-8    84.4µs ± 2%     31.8µs ± 4%   -62.29%  (p=0.000 n=93+98)

name            old time/op    new time/op     delta
SubVW/1-8         3.90ns ± 2%     4.13ns ± 4%    +6.02%  (p=0.000 n=92+95)
SubVW/2-8         4.15ns ± 1%     5.20ns ± 1%   +25.22%  (p=0.000 n=83+85)
SubVW/3-8         5.50ns ± 2%     6.22ns ± 6%   +13.21%  (p=0.000 n=91+97)
SubVW/4-8         5.99ns ± 1%     6.63ns ± 1%   +10.63%  (p=0.000 n=79+61)
SubVW/5-8         6.75ns ± 4%     6.88ns ± 2%    +1.82%  (p=0.000 n=98+73)
SubVW/10-8        9.57ns ± 1%     9.56ns ± 1%    -0.13%  (p=0.000 n=77+64)
SubVW/100-8       90.3ns ± 1%     18.1ns ± 2%   -80.00%  (p=0.000 n=75+94)
SubVW/1000-8       860ns ± 4%       85ns ± 7%   -90.14%  (p=0.000 n=97+99)
SubVW/10000-8     8.51µs ± 3%     1.77µs ± 6%   -79.21%  (p=0.000 n=100+97)
SubVW/100000-8    84.4µs ± 3%     31.5µs ± 3%   -62.66%  (p=0.000 n=92+92)

Change-Id: I721d7031d40f245b4a284f5bdd93e7bb85e7e937
Reviewed-on: https://go-review.googlesource.com/c/go/+/164968
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-03-09 20:33:46 +00:00
Josh Bleecher Snyder 4c227a091e math/big: remove bounds checks in pure Go implementations
These routines are quite sensitive to BCE.

This change eliminates bounds checks from loops.
It does so at the cost of a bit of safety:
malformed input will now return incorrect answers
instead of panicking.

This isn't as bad as it sounds: math/big has very good
test coverage, and the alternative implementations are in
assembly, which could do much worse things with malformed input.

If the compiler's BCE improves, so could these routines.

Notable BCE improvements for these routines would be:

* Allowing and propagating more cross-slice length hints.
  Then hints like _ = y[:len(z)] would eliminate bounds checks for y[i].

* Propagating enough information so that we could do
  n := len(x)
  if len(z) < n {
    n = len(z)
  }
  and then have i < n eliminate the same bounds checks as
  i < len(x) && i < len(z) currently does.

* Providing some way to do BCE for unrolled loops.
  Now that we have math/bits implementations,
  it is possible to write things like ADC chains in
  pure Go, if you can reasonably unroll loops.

Benchmarks below are for amd64, using -tags=math_big_pure_go.

name            old time/op    new time/op    delta
AddVV/1-8         5.15ns ± 3%    4.65ns ± 4%   -9.81%  (p=0.000 n=93+86)
AddVV/2-8         6.40ns ± 2%    5.58ns ± 4%  -12.78%  (p=0.000 n=90+95)
AddVV/3-8         7.07ns ± 2%    6.66ns ± 2%   -5.88%  (p=0.000 n=87+83)
AddVV/4-8         7.94ns ± 5%    7.41ns ± 4%   -6.65%  (p=0.000 n=94+98)
AddVV/5-8         8.55ns ± 1%    8.80ns ± 0%   +2.92%  (p=0.000 n=87+92)
AddVV/10-8        12.7ns ± 1%    12.3ns ± 1%   -3.12%  (p=0.000 n=83+71)
AddVV/100-8        119ns ± 5%     117ns ± 4%   -1.60%  (p=0.000 n=93+90)
AddVV/1000-8      1.14µs ± 4%    1.14µs ± 5%     ~     (p=0.812 n=95+91)
AddVV/10000-8     11.4µs ± 5%    11.3µs ± 5%     ~     (p=0.503 n=97+96)
AddVV/100000-8     114µs ± 4%     113µs ± 5%   -0.98%  (p=0.002 n=97+90)

name            old time/op    new time/op    delta
SubVV/1-8         5.23ns ± 5%    4.65ns ± 3%  -11.18%  (p=0.000 n=89+91)
SubVV/2-8         6.49ns ± 5%    5.58ns ± 3%  -14.04%  (p=0.000 n=92+94)
SubVV/3-8         7.10ns ± 3%    6.65ns ± 2%   -6.28%  (p=0.000 n=87+80)
SubVV/4-8         8.04ns ± 1%    7.44ns ± 5%   -7.49%  (p=0.000 n=83+98)
SubVV/5-8         8.55ns ± 2%    8.32ns ± 1%   -2.75%  (p=0.000 n=84+92)
SubVV/10-8        12.7ns ± 1%    12.3ns ± 1%   -3.09%  (p=0.000 n=80+75)
SubVV/100-8        119ns ± 0%     116ns ± 3%   -1.83%  (p=0.000 n=87+98)
SubVV/1000-8      1.13µs ± 5%    1.13µs ± 3%     ~     (p=0.082 n=96+98)
SubVV/10000-8     11.2µs ± 1%    11.3µs ± 3%   +0.76%  (p=0.000 n=87+97)
SubVV/100000-8     112µs ± 2%     113µs ± 3%   +0.55%  (p=0.000 n=76+88)

name            old time/op    new time/op    delta
AddVW/1-8         4.30ns ± 4%    3.96ns ± 6%  -8.02%  (p=0.000 n=89+97)
AddVW/2-8         5.15ns ± 2%    4.91ns ± 1%  -4.56%  (p=0.000 n=87+80)
AddVW/3-8         5.59ns ± 3%    5.75ns ± 2%  +2.91%  (p=0.000 n=91+88)
AddVW/4-8         6.20ns ± 1%    6.03ns ± 1%  -2.71%  (p=0.000 n=75+90)
AddVW/5-8         6.93ns ± 3%    6.49ns ± 2%  -6.35%  (p=0.000 n=100+82)
AddVW/10-8        10.0ns ± 7%     9.6ns ± 0%  -4.02%  (p=0.000 n=98+74)
AddVW/100-8       91.1ns ± 1%    90.6ns ± 1%  -0.55%  (p=0.000 n=84+80)
AddVW/1000-8       866ns ± 1%     856ns ± 4%  -1.06%  (p=0.000 n=69+96)
AddVW/10000-8     8.64µs ± 1%    8.53µs ± 4%  -1.25%  (p=0.000 n=67+99)
AddVW/100000-8    84.3µs ± 2%    85.4µs ± 4%  +1.22%  (p=0.000 n=89+99)

name            old time/op    new time/op    delta
SubVW/1-8         4.28ns ± 2%    3.82ns ± 3%  -10.63%  (p=0.000 n=91+89)
SubVW/2-8         4.61ns ± 1%    4.48ns ± 3%   -2.67%  (p=0.000 n=94+96)
SubVW/3-8         5.54ns ± 1%    5.81ns ± 4%   +4.87%  (p=0.000 n=92+97)
SubVW/4-8         6.20ns ± 1%    6.08ns ± 2%   -1.99%  (p=0.000 n=71+88)
SubVW/5-8         6.91ns ± 3%    6.64ns ± 1%   -3.90%  (p=0.000 n=97+70)
SubVW/10-8        9.85ns ± 2%    9.62ns ± 0%   -2.31%  (p=0.000 n=82+62)
SubVW/100-8       91.1ns ± 1%    90.9ns ± 3%   -0.14%  (p=0.010 n=71+93)
SubVW/1000-8       859ns ± 3%     867ns ± 1%   +0.98%  (p=0.000 n=99+78)
SubVW/10000-8     8.54µs ± 5%    8.57µs ± 2%   +0.38%  (p=0.007 n=98+92)
SubVW/100000-8    84.5µs ± 3%    84.6µs ± 3%     ~     (p=0.334 n=95+94)

name                old time/op    new time/op    delta
AddMulVVW/1-8         5.43ns ± 3%    4.36ns ± 2%  -19.67%  (p=0.000 n=95+94)
AddMulVVW/2-8         6.56ns ± 4%    6.11ns ± 1%   -6.90%  (p=0.000 n=91+91)
AddMulVVW/3-8         8.00ns ± 1%    7.80ns ± 4%   -2.52%  (p=0.000 n=83+95)
AddMulVVW/4-8         9.81ns ± 2%    9.53ns ± 1%   -2.86%  (p=0.000 n=77+64)
AddMulVVW/5-8         11.4ns ± 3%    11.3ns ± 5%   -0.89%  (p=0.000 n=95+97)
AddMulVVW/10-8        18.9ns ± 5%    19.1ns ± 5%   +0.89%  (p=0.000 n=91+94)
AddMulVVW/100-8        165ns ± 5%     165ns ± 4%     ~     (p=0.427 n=97+98)
AddMulVVW/1000-8      1.56µs ± 3%    1.56µs ± 4%     ~     (p=0.167 n=98+96)
AddMulVVW/10000-8     15.7µs ± 5%    15.6µs ± 5%   -0.31%  (p=0.044 n=95+97)
AddMulVVW/100000-8     156µs ± 3%     157µs ± 8%     ~     (p=0.373 n=72+99)

Change-Id: Ibc720785d5b95f6a797103b1363843205f4d56bf
Reviewed-on: https://go-review.googlesource.com/c/go/+/164966
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-03-09 20:33:13 +00:00
Robert Griesemer 129c6e4496 math/big: support new octal prefixes 0o and 0O
This CL extends the various SetString and Parse methods for
Ints, Rats, and Floats to accept the new octal prefixes.

The main change is in natconv.go, all other changes are
documentation and test updates.

Finally, this CL also fixes TestRatSetString which silently
dropped certain failures.

Updates #12711.

Change-Id: I5ee5879e25013ba1e6eda93ff280915f25ab5d55
Reviewed-on: https://go-review.googlesource.com/c/go/+/165898
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
2019-03-07 21:13:57 +00:00
Josh Bleecher Snyder d5edbcac98 math/big: rewrite pure Go implementations to use math/bits
While we're here, delete addWW_g and subWW_g, per the TODO.
They are now obsolete.

Benchmarks on amd64 with -tags=math_big_pure_go.

name                old time/op    new time/op     delta
AddVV/1-8             5.24ns ± 2%     5.12ns ± 1%    -2.11%  (p=0.000 n=82+87)
AddVV/2-8             6.44ns ± 1%     6.33ns ± 2%    -1.82%  (p=0.000 n=77+82)
AddVV/3-8             7.89ns ± 8%     6.97ns ± 4%   -11.71%  (p=0.000 n=100+96)
AddVV/4-8             8.60ns ± 0%     7.72ns ± 4%   -10.24%  (p=0.000 n=90+96)
AddVV/5-8             10.3ns ± 4%      8.5ns ± 1%   -17.02%  (p=0.000 n=96+91)
AddVV/10-8            16.2ns ± 5%     12.8ns ± 1%   -21.11%  (p=0.000 n=97+86)
AddVV/100-8            148ns ± 1%      117ns ± 5%   -21.07%  (p=0.000 n=66+98)
AddVV/1000-8          1.41µs ± 4%     1.13µs ± 3%   -19.90%  (p=0.000 n=97+97)
AddVV/10000-8         14.2µs ± 5%     11.2µs ± 1%   -20.82%  (p=0.000 n=99+84)
AddVV/100000-8         142µs ± 4%      113µs ± 4%   -20.40%  (p=0.000 n=91+92)
SubVV/1-8             5.29ns ± 1%     5.11ns ± 0%    -3.30%  (p=0.000 n=87+88)
SubVV/2-8             6.36ns ± 4%     6.33ns ± 2%    -0.56%  (p=0.002 n=98+73)
SubVV/3-8             7.58ns ± 5%     6.98ns ± 4%    -8.01%  (p=0.000 n=97+91)
SubVV/4-8             8.61ns ± 3%     7.98ns ± 2%    -7.31%  (p=0.000 n=95+83)
SubVV/5-8             10.6ns ± 2%      8.5ns ± 1%   -19.56%  (p=0.000 n=79+89)
SubVV/10-8            16.3ns ± 4%     12.7ns ± 1%   -21.97%  (p=0.000 n=98+82)
SubVV/100-8            124ns ± 1%      118ns ± 1%    -4.83%  (p=0.000 n=85+81)
SubVV/1000-8          1.14µs ± 5%     1.12µs ± 2%    -1.17%  (p=0.000 n=97+81)
SubVV/10000-8         11.6µs ±10%     11.2µs ± 1%    -3.39%  (p=0.000 n=100+84)
SubVV/100000-8         114µs ± 6%      114µs ± 5%      ~     (p=0.396 n=83+94)
AddVW/1-8             4.04ns ± 4%     4.34ns ± 4%    +7.57%  (p=0.000 n=96+98)
AddVW/2-8             4.34ns ± 5%     4.40ns ± 5%    +1.40%  (p=0.000 n=99+98)
AddVW/3-8             5.43ns ± 0%     5.54ns ± 2%    +1.97%  (p=0.000 n=85+94)
AddVW/4-8             6.23ns ± 1%     6.18ns ± 2%    -0.66%  (p=0.000 n=77+78)
AddVW/5-8             6.78ns ± 2%     6.90ns ± 4%    +1.77%  (p=0.000 n=80+99)
AddVW/10-8            10.5ns ± 4%      9.9ns ± 1%    -5.77%  (p=0.000 n=97+69)
AddVW/100-8            114ns ± 3%       91ns ± 0%   -20.38%  (p=0.000 n=98+77)
AddVW/1000-8          1.12µs ± 1%     0.87µs ± 1%   -22.80%  (p=0.000 n=82+68)
AddVW/10000-8         11.2µs ± 2%      8.5µs ± 5%   -23.85%  (p=0.000 n=85+100)
AddVW/100000-8         112µs ± 2%       85µs ± 5%   -24.22%  (p=0.000 n=71+96)
SubVW/1-8             4.09ns ± 2%     4.18ns ± 4%    +2.32%  (p=0.000 n=78+96)
SubVW/2-8             4.59ns ± 5%     4.52ns ± 7%    -1.54%  (p=0.000 n=98+94)
SubVW/3-8             5.41ns ±10%     5.55ns ± 1%    +2.48%  (p=0.000 n=100+89)
SubVW/4-8             6.51ns ± 2%     6.19ns ± 0%    -4.85%  (p=0.000 n=97+81)
SubVW/5-8             7.25ns ± 3%     6.90ns ± 4%    -4.93%  (p=0.000 n=97+96)
SubVW/10-8            10.6ns ± 4%      9.8ns ± 2%    -7.32%  (p=0.000 n=95+96)
SubVW/100-8           90.4ns ± 0%     90.8ns ± 0%    +0.43%  (p=0.000 n=83+78)
SubVW/1000-8           853ns ± 4%      857ns ± 2%    +0.42%  (p=0.000 n=100+98)
SubVW/10000-8         8.52µs ± 4%     8.53µs ± 2%      ~     (p=0.061 n=99+97)
SubVW/100000-8        84.8µs ± 5%     84.2µs ± 2%    -0.78%  (p=0.000 n=99+93)
AddMulVVW/1-8         8.73ns ± 0%     5.33ns ± 3%   -38.91%  (p=0.000 n=91+96)
AddMulVVW/2-8         14.8ns ± 3%      6.5ns ± 2%   -56.33%  (p=0.000 n=100+79)
AddMulVVW/3-8         18.6ns ± 2%      7.8ns ± 5%   -57.84%  (p=0.000 n=89+96)
AddMulVVW/4-8         24.0ns ± 2%      9.8ns ± 0%   -59.09%  (p=0.000 n=95+67)
AddMulVVW/5-8         29.0ns ± 2%     11.5ns ± 5%   -60.44%  (p=0.000 n=90+97)
AddMulVVW/10-8        54.1ns ± 0%     18.8ns ± 1%   -65.37%  (p=0.000 n=82+84)
AddMulVVW/100-8        508ns ± 2%      165ns ± 4%   -67.62%  (p=0.000 n=72+98)
AddMulVVW/1000-8      4.96µs ± 3%     1.55µs ± 1%   -68.86%  (p=0.000 n=99+91)
AddMulVVW/10000-8     50.0µs ± 4%     15.5µs ± 4%   -68.95%  (p=0.000 n=97+97)
AddMulVVW/100000-8     491µs ± 1%      156µs ± 8%   -68.22%  (p=0.000 n=79+95)

Change-Id: I4c6ae0b4065f371aea8103f6a85d9e9274bf01d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/164965
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-03-04 20:49:12 +00:00
Josh Bleecher Snyder 87cc56718a math/big: optimize shlVU_g and shrVU_g
Special case shifts by zero.
Provide hints to the compiler that shifts are bounded.

There are no existing benchmarks for shifts,
but the Float implementation uses shifts,
so we can use those.

Benchmarks on amd64 with -tags=math_big_pure_go.

name                  old time/op    new time/op    delta
FloatString/100-8        869ns ± 3%     872ns ± 4%   +0.40%  (p=0.001 n=94+83)
FloatString/1000-8      26.5µs ± 1%    26.4µs ± 1%   -0.46%  (p=0.000 n=87+96)
FloatString/10000-8     2.18ms ± 2%    2.18ms ± 2%     ~     (p=0.687 n=90+89)
FloatString/100000-8     200ms ± 7%     197ms ± 5%   -1.47%  (p=0.000 n=100+90)
FloatAdd/10-8           65.9ns ± 4%    64.0ns ± 4%   -2.94%  (p=0.000 n=92+93)
FloatAdd/100-8          71.3ns ± 4%    67.4ns ± 4%   -5.51%  (p=0.000 n=96+93)
FloatAdd/1000-8          128ns ± 1%     121ns ± 0%   -5.69%  (p=0.000 n=91+80)
FloatAdd/10000-8         718ns ± 4%     626ns ± 4%  -12.83%  (p=0.000 n=99+99)
FloatAdd/100000-8       6.43µs ± 3%    5.50µs ± 1%  -14.50%  (p=0.000 n=98+83)
FloatSub/10-8           57.7ns ± 2%    57.0ns ± 4%   -1.20%  (p=0.000 n=89+96)
FloatSub/100-8          59.9ns ± 3%    58.7ns ± 4%   -2.10%  (p=0.000 n=100+98)
FloatSub/1000-8         94.5ns ± 1%    88.6ns ± 0%   -6.16%  (p=0.000 n=74+70)
FloatSub/10000-8         456ns ± 1%     416ns ± 5%   -8.83%  (p=0.000 n=87+95)
FloatSub/100000-8       4.00µs ± 1%    3.57µs ± 1%  -10.87%  (p=0.000 n=68+85)
FloatSqrt/64-8           585ns ± 1%     579ns ± 1%   -0.99%  (p=0.000 n=92+90)
FloatSqrt/128-8         1.26µs ± 1%    1.23µs ± 2%   -2.42%  (p=0.000 n=91+81)
FloatSqrt/256-8         1.45µs ± 3%    1.40µs ± 1%   -3.61%  (p=0.000 n=96+90)
FloatSqrt/1000-8        4.03µs ± 1%    3.91µs ± 1%   -3.05%  (p=0.000 n=90+93)
FloatSqrt/10000-8       48.0µs ± 0%    47.3µs ± 1%   -1.55%  (p=0.000 n=90+90)
FloatSqrt/100000-8      1.23ms ± 3%    1.22ms ± 4%   -1.00%  (p=0.000 n=99+99)
FloatSqrt/1000000-8     96.7ms ± 4%    98.0ms ±10%     ~     (p=0.322 n=89+99)

Change-Id: I0f941c05b7c324256d7f0674559b6ba906e92ba8
Reviewed-on: https://go-review.googlesource.com/c/go/+/164967
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-03-04 19:30:57 +00:00
Juraj Sukop 1d992f2e36 math/big: better initial guess for nat.sqrt
The proposed change introduces a better initial guess which is closer to the final value and therefore converges in fewer steps. Consider for example sqrt(8): previously the guess was 8, whereas now it is 4 (and the result is 2). All this change does is it computes the division by two more accurately while it keeps the guess ≥ √x.

Change-Id: I917248d734a7b0488d14a647a063f674e56c4e30
GitHub-Last-Rev: c06d9d4876
GitHub-Pull-Request: golang/go#28981
Reviewed-on: https://go-review.googlesource.com/c/163866
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-02-27 18:48:56 +00:00
Bryan C. Mills bd98628676 math/cmplx: avoid panic in Pow(x, NaN())
Fixes #30088

Change-Id: I08cec17feddc86bd08532e6b135807e3c8f4c1b2
Reviewed-on: https://go-review.googlesource.com/c/161197
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-02-27 14:01:03 +00:00
Brian Kessler a73abca37b math/big: handle alias of cofactor inputs in GCD
If the variables passed in to the cofactor arguments of GCD (x, y)
aliased the input arguments (a, b), the previous implementation would
result in incorrect results for y.  This change reorganizes the calculation
so that the only case that need to be handled is when y aliases b, which
can be handled with a simple check.

Tests were added for all of the alias cases for input arguments and and
and irrelevant test case for a previous binary GCD calculation was dropped.

Fixes #30217

Change-Id: Ibe6137f09b3e1ae3c29e3c97aba85b67f33dc169
Reviewed-on: https://go-review.googlesource.com/c/162517
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-02-27 00:11:17 +00:00
Russ Cox d6311ff1e4 math/big: add %#b and %O integer formats
Matching fmt, %#b now prints an 0b prefix,
and %O prints octal with an 0o prefix.

See golang.org/design/19308-number-literals for background.

For #19308.
For #12711.

Change-Id: I139c5a9a1dfae15415621601edfa13c6a5f19cfc
Reviewed-on: https://go-review.googlesource.com/c/160250
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-02-26 19:39:19 +00:00
Russ Cox 675503c507 math/big: add %x float format
big.Float already had %p for printing hex format,
but that format normalizes differently from fmt's %x
and ignores precision entirely.

This CL adds %x to big.Float, matching fmt's behavior:
the verb is spelled 'x' not 'p', the mantissa is normalized
to [1, 2), and precision is respected.

See golang.org/design/19308-number-literals for background.

For #29008.

Change-Id: I9c1b9612107094856797e5b0b584c556c1914895
Reviewed-on: https://go-review.googlesource.com/c/160249
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-02-26 19:39:11 +00:00
Michael Munday 42a82ce1a7 math/bits: optimize Reverse32 and Reverse64
Use ReverseBytes32 and ReverseBytes64 to speed up these functions.
The byte reversal functions are intrinsics on most platforms and
generally compile to a single instruction.

name       old time/op  new time/op  delta
Reverse32  2.41ns ± 1%  1.94ns ± 3%  -19.60%  (p=0.000 n=20+19)
Reverse64  3.85ns ± 1%  2.56ns ± 1%  -33.32%  (p=0.000 n=17+19)

Change-Id: I160bf59a0c7bd5db94114803ec5a59fae448f096
Reviewed-on: https://go-review.googlesource.com/c/159358
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-02-26 17:52:08 +00:00
Robert Griesemer fae44a2be3 src, misc: apply gofmt
This applies the new gofmt literal normalizations to the library.

Change-Id: I8c1e8ef62eb556fc568872c9f77a31ef236348e7
Reviewed-on: https://go-review.googlesource.com/c/162539
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-02-19 20:38:28 +00:00
Russ Cox e2d87f2ca5 strconv: format hex floats
This CL updates FormatFloat to format
standard hexadecimal floating-point constants,
using the 'x' and 'X' verbs.

See golang.org/design/19308-number-literals for background.

For #29008.

Change-Id: I540b8f71d492cfdb7c58af533d357a564591f28b
Reviewed-on: https://go-review.googlesource.com/c/160242
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-02-12 14:48:22 +00:00
Robert Griesemer 7bc2aa670f math/big: permit upper-case 'P' binary exponent (not just 'p')
The current implementation accepted binary exponents but restricted
them to 'p'. This change permits both 'p' and 'P'.

R=Go1.13

Updates #29008.

Change-Id: I7a89ccb86af4438f17b0422be7cb630ffcf43272
Reviewed-on: https://go-review.googlesource.com/c/159297
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
2019-02-11 23:22:35 +00:00
Robert Griesemer 33caf3be83 math/big: document that Rat.SetString accepts _decimal_ float representations
Updates #29799.

Change-Id: I267c2c3ba3964e96903954affc248d0c52c4916c
Reviewed-on: https://go-review.googlesource.com/c/158397
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-01-17 23:04:06 +00:00
Brian Kessler 649294d0a5 math: fix ternary correction statement in Log1p
The original port of Log1p incorrectly translated a ternary statement
so that a correction was only applied to one of the branches.

Fixes #29488

Change-Id: I035b2fc741f76fe7c0154c63da6e298b575e08a4
Reviewed-on: https://go-review.googlesource.com/c/156120
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Katie Hockman <katie@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-01-07 18:57:45 +00:00
Will Beason bfaf11c158 math/big: fix incorrect comment variable reference
Fix comment as w&1 is the parity of 'x', not of 'n'.

Change-Id: Ia0e448f7e5896412ff9b164459ce15561ab624cc
GitHub-Last-Rev: 54ba08ab10
GitHub-Pull-Request: golang/go#29419
Reviewed-on: https://go-review.googlesource.com/c/155743
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-12-26 05:21:41 +00:00
Robert Griesemer 9ce38f570f math: don't run huge argument tests on s390x
The s390x implementations for Sin/Cos/SinCos/Tan use assembly
routines which don't reduce arguments accurately enough for
huge inputs.

Fixes #29221.

Change-Id: I340f576899d67bb52a553c3ab22e6464172c936d
Reviewed-on: https://go-review.googlesource.com/c/154119
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-12-13 22:13:57 +00:00
Brian Kessler 02ad841dd8 math: correct mPi4 comment
The previous comment mis-stated the number of bits in mPi4.
The correct value is 19*64 + 1 == 1217 bits.

Change-Id: Ife971ff6936ce2d5b81ce663ce48044749d592a0
Reviewed-on: https://go-review.googlesource.com/c/154017
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-12-13 20:35:45 +00:00
Robert Griesemer 944a9c7a4f math: use constant rather than variable for exported test threshold
This is a minor follow-up on https://golang.org/cl/153059.

TBR=iant

Updates #6794.

Change-Id: I03657dafc572959d46a03f86bbeb280825bc969d
Reviewed-on: https://go-review.googlesource.com/c/153845
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-12-13 06:33:18 +00:00
Brian Kessler 98521a5a8f math: implement trignometric range reduction for huge arguments
This change implements Payne-Hanek range reduction by Pi/4
to properly calculate trigonometric functions of huge arguments.

The implementation is based on:

"ARGUMENT REDUCTION FOR HUGE ARGUMENTS: Good to the Last Bit"
K. C. Ng et al, March 24, 1992

The major difference with the reference is that the simulated
multi-precision calculation of x*B is implemented using 64-bit
integer arithmetic rather than floating point to ease extraction
of the relevant bits of 4/Pi.

The assembly implementations for 386 were removed since the trigonometric
instructions only use a 66-bit representation of Pi internally for
reduction.  It is not possible to use these instructions and maintain
accuracy without a prior accurate reduction in software as recommended
by Intel.

Fixes #6794

Change-Id: I31bf1369e0578891d738c5473447fe9b10560196
Reviewed-on: https://go-review.googlesource.com/c/153059
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-12-13 06:01:42 +00:00
Alberto Donizetti 11ce6eabd6 math/bits: remove named return in TrailingZeros16
TrailingZeros16 is the only one of the TrailingZeros functions with a
named return value in the signature. This creates a sligthly
unpleasant effect in the godoc listing:

  func TrailingZeros(x uint) int
  func TrailingZeros16(x uint16) (n int)
  func TrailingZeros32(x uint32) int
  func TrailingZeros64(x uint64) int
  func TrailingZeros8(x uint8) int

Since the named return value is not even used, remove it.

Change-Id: I15c5aedb6157003911b6e0685c357ce56e466c0e
Reviewed-on: https://go-review.googlesource.com/c/153340
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-12-09 14:27:56 +00:00
Robert Griesemer 276870d6e0 math: document sign bit correspondence for floating-point/bits conversions
Fixes #27736.

Change-Id: Ibda7da7ec6e731626fc43abf3e8c1190117f7885
Reviewed-on: https://go-review.googlesource.com/c/153057
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-12-06 22:27:54 +00:00
Josh Bleecher Snyder bfc54bb6f3 math/big: allocate less for single-Word nats
For many uses of math/big, most numbers are small in practice.
Prior to this change, big.NewInt allocated a minimum of five Words:
one to hold the value, and four as extra capacity.
In most cases, this extra capacity is waste.
Worse, allocating a single Word uses a fast malloc path for tiny allocs;
allocating five Words is more expensive in CPU as well as memory.

This change is a simple fix: Treat a request for one Word at its word.
I experimented with more complicated fixes and did not find anything
that outperformed this easy fix.

On some real world programs, this is a clear win.

The compiler:

name        old alloc/op      new alloc/op      delta
Template         37.1MB ± 0%       37.0MB ± 0%  -0.23%  (p=0.008 n=5+5)
Unicode          29.2MB ± 0%       28.5MB ± 0%  -2.48%  (p=0.008 n=5+5)
GoTypes           133MB ± 0%        133MB ± 0%  -0.05%  (p=0.008 n=5+5)
Compiler          628MB ± 0%        628MB ± 0%  -0.06%  (p=0.008 n=5+5)
SSA              2.04GB ± 0%       2.03GB ± 0%  -0.14%  (p=0.008 n=5+5)
Flate            24.7MB ± 0%       24.6MB ± 0%  -0.23%  (p=0.008 n=5+5)
GoParser         29.6MB ± 0%       29.6MB ± 0%  -0.07%  (p=0.008 n=5+5)
Reflect          82.3MB ± 0%       82.2MB ± 0%  -0.05%  (p=0.008 n=5+5)
Tar              36.2MB ± 0%       36.2MB ± 0%  -0.12%  (p=0.008 n=5+5)
XML              49.5MB ± 0%       49.4MB ± 0%  -0.23%  (p=0.008 n=5+5)
[Geo mean]       85.1MB            84.8MB       -0.37%

name        old allocs/op     new allocs/op     delta
Template           364k ± 0%         364k ± 0%    ~     (p=0.476 n=5+5)
Unicode            341k ± 0%         341k ± 0%    ~     (p=0.690 n=5+5)
GoTypes           1.37M ± 0%        1.37M ± 0%    ~     (p=0.444 n=5+5)
Compiler          5.50M ± 0%        5.50M ± 0%  +0.02%  (p=0.008 n=5+5)
SSA               16.0M ± 0%        16.0M ± 0%  +0.01%  (p=0.008 n=5+5)
Flate              238k ± 0%         238k ± 0%    ~     (p=0.222 n=5+5)
GoParser           305k ± 0%         305k ± 0%    ~     (p=0.841 n=5+5)
Reflect            976k ± 0%         976k ± 0%    ~     (p=0.222 n=5+5)
Tar                354k ± 0%         354k ± 0%    ~     (p=0.103 n=5+5)
XML                450k ± 0%         450k ± 0%    ~     (p=0.151 n=5+5)
[Geo mean]         837k              837k       +0.01%

go.skylark.net (at ea6d2813de75ded8d157b9540bc3d3ad0b688623):

name                     old alloc/op   new alloc/op   delta
Hashtable-8                 456kB ± 0%     299kB ± 0%  -34.33%  (p=0.000 n=9+9)
/bench_builtin_method-8     220kB ± 0%     190kB ± 0%  -13.55%  (p=0.000 n=9+10)

name                     old allocs/op  new allocs/op  delta
Hashtable-8                 7.84k ± 0%     7.84k ± 0%     ~     (all equal)
/bench_builtin_method-8     7.49k ± 0%     7.49k ± 0%     ~     (all equal)

The math/big benchmarks are messy, which is predictable, since they
naturally exercise the bigger-than-one-word code more.
Also worth noting is that many of the benchmarks have very high variance.
I've omitted the opVV and opVW benchmarks, as they are unrelated.

name                              old time/op    new time/op    delta
DecimalConversion-8                 92.5µs ± 1%    90.6µs ± 0%   -2.12%  (p=0.000 n=17+19)
FloatString/100-8                    867ns ± 0%     871ns ± 0%   +0.50%  (p=0.000 n=18+18)
FloatString/1000-8                  26.4µs ± 1%    26.5µs ± 1%     ~     (p=0.396 n=20+19)
FloatString/10000-8                 2.15ms ± 2%    2.16ms ± 2%     ~     (p=0.089 n=19+20)
FloatString/100000-8                 209ms ± 1%     209ms ± 1%     ~     (p=0.583 n=19+19)
FloatAdd/10-8                       63.5ns ± 2%    64.1ns ± 6%     ~     (p=0.389 n=19+19)
FloatAdd/100-8                      66.0ns ± 2%    65.8ns ± 2%     ~     (p=0.825 n=20+20)
FloatAdd/1000-8                     93.9ns ± 1%    94.3ns ± 1%     ~     (p=0.273 n=19+20)
FloatAdd/10000-8                     347ns ± 2%     342ns ± 1%   -1.50%  (p=0.000 n=18+18)
FloatAdd/100000-8                   2.78µs ± 1%    2.78µs ± 2%     ~     (p=0.961 n=20+19)
FloatSub/10-8                       56.9ns ± 2%    57.8ns ± 3%   +1.59%  (p=0.001 n=19+19)
FloatSub/100-8                      58.2ns ± 2%    58.9ns ± 2%   +1.25%  (p=0.004 n=20+20)
FloatSub/1000-8                     74.9ns ± 1%    74.4ns ± 1%   -0.76%  (p=0.000 n=19+20)
FloatSub/10000-8                     223ns ± 1%     220ns ± 2%   -1.29%  (p=0.000 n=16+20)
FloatSub/100000-8                   1.66µs ± 1%    1.66µs ± 2%     ~     (p=0.147 n=20+20)
ParseFloatSmallExp-8                8.38µs ± 0%    8.59µs ± 0%   +2.48%  (p=0.000 n=19+19)
ParseFloatLargeExp-8                31.1µs ± 0%    32.0µs ± 0%   +3.04%  (p=0.000 n=16+17)
GCD10x10/WithoutXY-8                 115ns ± 1%      99ns ± 3%  -14.07%  (p=0.000 n=20+20)
GCD10x10/WithXY-8                    322ns ± 0%     312ns ± 0%   -3.11%  (p=0.000 n=18+13)
GCD10x100/WithoutXY-8                233ns ± 1%     219ns ± 1%   -5.73%  (p=0.000 n=19+17)
GCD10x100/WithXY-8                   709ns ± 0%     759ns ± 0%   +7.04%  (p=0.000 n=19+19)
GCD10x1000/WithoutXY-8               653ns ± 1%     642ns ± 1%   -1.69%  (p=0.000 n=17+20)
GCD10x1000/WithXY-8                 1.35µs ± 0%    1.35µs ± 1%     ~     (p=0.255 n=20+16)
GCD10x10000/WithoutXY-8             4.57µs ± 1%    4.61µs ± 1%   +0.95%  (p=0.000 n=18+17)
GCD10x10000/WithXY-8                6.82µs ± 0%    6.84µs ± 0%   +0.27%  (p=0.000 n=16+17)
GCD10x100000/WithoutXY-8            43.9µs ± 1%    44.0µs ± 1%   +0.28%  (p=0.000 n=18+17)
GCD10x100000/WithXY-8               60.6µs ± 0%    60.6µs ± 0%     ~     (p=0.907 n=18+18)
GCD100x100/WithoutXY-8              1.13µs ± 0%    1.21µs ± 0%   +6.39%  (p=0.000 n=19+19)
GCD100x100/WithXY-8                 1.82µs ± 0%    1.92µs ± 0%   +5.24%  (p=0.000 n=19+17)
GCD100x1000/WithoutXY-8             2.00µs ± 0%    2.03µs ± 1%   +1.61%  (p=0.000 n=18+16)
GCD100x1000/WithXY-8                3.22µs ± 0%    3.20µs ± 1%   -0.83%  (p=0.000 n=19+19)
GCD100x10000/WithoutXY-8            9.28µs ± 1%    9.17µs ± 1%   -1.25%  (p=0.000 n=18+19)
GCD100x10000/WithXY-8               13.5µs ± 0%    13.3µs ± 0%   -1.12%  (p=0.000 n=18+19)
GCD100x100000/WithoutXY-8           80.4µs ± 0%    78.6µs ± 0%   -2.25%  (p=0.000 n=19+19)
GCD100x100000/WithXY-8               114µs ± 0%     112µs ± 0%   -1.46%  (p=0.000 n=19+17)
GCD1000x1000/WithoutXY-8            12.9µs ± 1%    12.9µs ± 2%   -0.50%  (p=0.014 n=20+19)
GCD1000x1000/WithXY-8               19.6µs ± 1%    19.6µs ± 2%   -0.28%  (p=0.040 n=17+18)
GCD1000x10000/WithoutXY-8           22.4µs ± 0%    22.4µs ± 2%     ~     (p=0.220 n=19+19)
GCD1000x10000/WithXY-8              57.0µs ± 0%    56.5µs ± 0%   -0.87%  (p=0.000 n=20+20)
GCD1000x100000/WithoutXY-8           116µs ± 0%     115µs ± 0%   -0.49%  (p=0.000 n=18+19)
GCD1000x100000/WithXY-8              410µs ± 0%     411µs ± 0%     ~     (p=0.052 n=19+19)
GCD10000x10000/WithoutXY-8           247µs ± 1%     244µs ± 1%   -0.92%  (p=0.000 n=19+19)
GCD10000x10000/WithXY-8              476µs ± 1%     473µs ± 1%   -0.48%  (p=0.009 n=19+19)
GCD10000x100000/WithoutXY-8          573µs ± 1%     571µs ± 1%   -0.45%  (p=0.012 n=20+20)
GCD10000x100000/WithXY-8            3.35ms ± 1%    3.35ms ± 1%     ~     (p=0.444 n=20+19)
GCD100000x100000/WithoutXY-8        12.0ms ± 2%    11.9ms ± 2%     ~     (p=0.276 n=18+20)
GCD100000x100000/WithXY-8           27.3ms ± 1%    27.3ms ± 1%     ~     (p=0.792 n=20+19)
Hilbert-8                            672µs ± 0%     611µs ± 0%   -9.02%  (p=0.000 n=19+19)
Binomial-8                          1.40µs ± 0%    1.18µs ± 0%  -15.69%  (p=0.000 n=16+14)
QuoRem-8                            2.20µs ± 1%    2.17µs ± 1%   -1.13%  (p=0.000 n=19+19)
Exp-8                               4.10ms ± 1%    4.11ms ± 1%     ~     (p=0.296 n=20+19)
Exp2-8                              4.11ms ± 1%    4.12ms ± 1%     ~     (p=0.429 n=20+20)
Bitset-8                            8.67ns ± 6%    8.74ns ± 4%     ~     (p=0.139 n=19+17)
BitsetNeg-8                         43.6ns ± 1%    43.8ns ± 2%   +0.61%  (p=0.036 n=20+20)
BitsetOrig-8                        77.5ns ± 1%    68.4ns ± 1%  -11.77%  (p=0.000 n=19+20)
BitsetNegOrig-8                      145ns ± 1%     141ns ± 1%   -2.87%  (p=0.000 n=19+20)
ModSqrt225_Tonelli-8                 324µs ± 1%     324µs ± 1%     ~     (p=0.409 n=18+20)
ModSqrt225_3Mod4-8                  98.9µs ± 1%    99.1µs ± 1%     ~     (p=0.298 n=19+18)
ModSqrt231_Tonelli-8                 337µs ± 1%     337µs ± 1%     ~     (p=0.718 n=20+18)
ModSqrt231_5Mod8-8                   115µs ± 1%     114µs ± 1%   -0.22%  (p=0.050 n=20+20)
ModInverse-8                         895ns ± 0%     869ns ± 1%   -2.83%  (p=0.000 n=17+17)
Sqrt-8                              28.1µs ± 1%    28.1µs ± 0%   -0.28%  (p=0.000 n=16+20)
IntSqr/1-8                          10.8ns ± 3%    10.5ns ± 3%   -2.51%  (p=0.000 n=19+17)
IntSqr/2-8                          30.5ns ± 2%    30.3ns ± 4%   -0.71%  (p=0.035 n=18+18)
IntSqr/3-8                          40.1ns ± 1%    40.1ns ± 1%     ~     (p=0.710 n=20+17)
IntSqr/5-8                          65.3ns ± 1%    65.4ns ± 2%     ~     (p=0.744 n=19+19)
IntSqr/8-8                           101ns ± 1%     102ns ± 0%     ~     (p=0.234 n=19+20)
IntSqr/10-8                          138ns ± 0%     138ns ± 2%     ~     (p=0.827 n=18+18)
IntSqr/20-8                          378ns ± 1%     378ns ± 1%     ~     (p=0.479 n=18+18)
IntSqr/30-8                          637ns ± 0%     638ns ± 1%     ~     (p=0.051 n=18+20)
IntSqr/50-8                         1.34µs ± 2%    1.34µs ± 1%     ~     (p=0.970 n=18+19)
IntSqr/80-8                         2.78µs ± 0%    2.78µs ± 1%   -0.18%  (p=0.006 n=19+17)
IntSqr/100-8                        3.98µs ± 0%    3.98µs ± 0%     ~     (p=0.057 n=17+19)
IntSqr/200-8                        13.5µs ± 0%    13.5µs ± 1%   -0.33%  (p=0.000 n=19+17)
IntSqr/300-8                        25.3µs ± 1%    25.3µs ± 1%     ~     (p=0.361 n=19+20)
IntSqr/500-8                        62.9µs ± 0%    62.9µs ± 1%     ~     (p=0.899 n=17+17)
IntSqr/800-8                         128µs ± 1%     127µs ± 1%   -0.32%  (p=0.016 n=18+20)
IntSqr/1000-8                        192µs ± 0%     192µs ± 1%     ~     (p=0.916 n=17+18)
Div/20/10-8                         34.9ns ± 2%    35.6ns ± 1%   +2.01%  (p=0.000 n=20+20)
Div/200/100-8                        218ns ± 1%     215ns ± 2%   -1.43%  (p=0.000 n=18+18)
Div/2000/1000-8                     1.16µs ± 1%    1.15µs ± 1%   -1.04%  (p=0.000 n=19+20)
Div/20000/10000-8                   35.7µs ± 1%    35.4µs ± 1%   -0.69%  (p=0.000 n=19+18)
Div/200000/100000-8                 2.89ms ± 1%    2.88ms ± 1%   -0.62%  (p=0.007 n=19+20)
Mul-8                               9.28ms ± 1%    9.27ms ± 1%     ~     (p=0.563 n=18+18)
ZeroShifts/Shl-8                     712ns ± 6%     716ns ± 7%     ~     (p=0.597 n=20+20)
ZeroShifts/ShlSame-8                4.00ns ± 1%    4.06ns ± 5%     ~     (p=0.162 n=18+20)
ZeroShifts/Shr-8                     714ns ±10%   1285ns ±156%     ~     (p=0.250 n=20+20)
ZeroShifts/ShrSame-8                4.00ns ± 1%    4.09ns ±10%   +2.34%  (p=0.048 n=16+19)
Exp3Power/0x10-8                     154ns ± 0%     159ns ±13%     ~     (p=0.197 n=14+20)
Exp3Power/0x40-8                     171ns ± 1%     175ns ± 8%     ~     (p=0.058 n=16+19)
Exp3Power/0x100-8                    287ns ± 0%     316ns ± 4%  +10.03%  (p=0.000 n=17+19)
Exp3Power/0x400-8                    698ns ± 1%     801ns ± 6%  +14.75%  (p=0.000 n=19+20)
Exp3Power/0x1000-8                  2.87µs ± 0%    3.65µs ± 6%  +27.24%  (p=0.000 n=18+18)
Exp3Power/0x4000-8                  21.9µs ± 1%    28.7µs ± 8%  +31.09%  (p=0.000 n=18+20)
Exp3Power/0x10000-8                  204µs ± 0%     267µs ± 9%  +30.81%  (p=0.000 n=20+20)
Exp3Power/0x40000-8                 1.86ms ± 0%    2.26ms ± 5%  +21.68%  (p=0.000 n=18+19)
Exp3Power/0x100000-8                17.5ms ± 1%    20.7ms ± 7%  +18.39%  (p=0.000 n=19+20)
Exp3Power/0x400000-8                 156ms ± 0%     172ms ± 6%  +10.54%  (p=0.000 n=19+20)
Fibo-8                              26.9ms ± 1%    27.5ms ± 3%   +2.32%  (p=0.000 n=19+19)
NatSqr/1-8                          31.0ns ± 4%    39.5ns ±29%  +27.25%  (p=0.000 n=20+19)
NatSqr/2-8                          54.1ns ± 1%    69.0ns ±28%  +27.52%  (p=0.000 n=20+20)
NatSqr/3-8                          66.6ns ± 1%    83.0ns ±25%  +24.59%  (p=0.000 n=20+20)
NatSqr/5-8                          97.1ns ± 1%   119.9ns ±12%  +23.50%  (p=0.000 n=16+20)
NatSqr/8-8                           138ns ± 1%     171ns ± 9%  +24.20%  (p=0.000 n=19+20)
NatSqr/10-8                          182ns ± 0%     225ns ± 9%  +23.50%  (p=0.000 n=16+20)
NatSqr/20-8                          447ns ± 1%     624ns ± 6%  +39.64%  (p=0.000 n=19+19)
NatSqr/30-8                          736ns ± 2%     986ns ± 9%  +33.94%  (p=0.000 n=19+20)
NatSqr/50-8                         1.51µs ± 2%    1.97µs ± 9%  +30.42%  (p=0.000 n=20+20)
NatSqr/80-8                         3.03µs ± 1%    3.67µs ± 7%  +21.08%  (p=0.000 n=20+20)
NatSqr/100-8                        4.31µs ± 1%    5.20µs ± 7%  +20.52%  (p=0.000 n=19+20)
NatSqr/200-8                        14.2µs ± 0%    16.3µs ± 4%  +14.92%  (p=0.000 n=19+20)
NatSqr/300-8                        27.8µs ± 1%    33.2µs ± 7%  +19.28%  (p=0.000 n=20+18)
NatSqr/500-8                        66.6µs ± 1%    74.5µs ± 3%  +11.87%  (p=0.000 n=18+18)
NatSqr/800-8                         135µs ± 1%     165µs ± 7%  +22.33%  (p=0.000 n=20+20)
NatSqr/1000-8                        200µs ± 0%     228µs ± 3%  +14.39%  (p=0.000 n=19+20)
NatSetBytes/8-8                     8.87ns ± 4%    8.77ns ± 2%   -1.17%  (p=0.020 n=20+16)
NatSetBytes/24-8                    38.6ns ± 3%    49.5ns ±29%  +28.32%  (p=0.000 n=18+19)
NatSetBytes/128-8                   75.2ns ± 1%   120.7ns ±29%  +60.60%  (p=0.000 n=17+20)
NatSetBytes/7-8                     16.2ns ± 2%    16.5ns ± 2%   +1.76%  (p=0.000 n=20+20)
NatSetBytes/23-8                    46.5ns ± 1%    60.2ns ±24%  +29.59%  (p=0.000 n=20+20)
NatSetBytes/127-8                   83.1ns ± 1%   118.2ns ±20%  +42.33%  (p=0.000 n=18+20)
ScanPi-8                            89.1µs ± 1%   117.4µs ±12%  +31.75%  (p=0.000 n=18+20)
StringPiParallel-8                  35.1µs ± 9%    40.2µs ±12%  +14.53%  (p=0.000 n=20+20)
Scan/10/Base2-8                      410ns ±14%     429ns ±10%   +4.47%  (p=0.018 n=19+20)
Scan/100/Base2-8                    3.05µs ±20%    2.97µs ±14%     ~     (p=0.449 n=20+20)
Scan/1000/Base2-8                   29.3µs ± 8%    30.1µs ±23%     ~     (p=0.355 n=20+20)
Scan/10000/Base2-8                   402µs ±13%     395µs ±14%     ~     (p=0.355 n=20+20)
Scan/100000/Base2-8                 11.8ms ±10%    11.6ms ± 1%     ~     (p=0.245 n=17+18)
Scan/10/Base8-8                      194ns ± 6%     196ns ±12%     ~     (p=0.829 n=20+19)
Scan/100/Base8-8                    1.11µs ±15%    1.11µs ±12%     ~     (p=0.743 n=20+20)
Scan/1000/Base8-8                   11.7µs ±10%    11.7µs ±12%     ~     (p=0.904 n=20+20)
Scan/10000/Base8-8                   209µs ± 7%     210µs ± 8%     ~     (p=0.478 n=20+20)
Scan/100000/Base8-8                 10.6ms ± 7%    10.4ms ± 6%     ~     (p=0.112 n=20+18)
Scan/10/Base10-8                     182ns ±12%     188ns ±11%   +3.52%  (p=0.044 n=20+20)
Scan/100/Base10-8                   1.01µs ± 8%    1.00µs ±13%     ~     (p=0.588 n=20+20)
Scan/1000/Base10-8                  10.7µs ±20%    10.6µs ±14%     ~     (p=0.560 n=20+20)
Scan/10000/Base10-8                  195µs ±10%     194µs ± 9%     ~     (p=0.883 n=20+20)
Scan/100000/Base10-8                10.6ms ± 2%    10.6ms ± 2%     ~     (p=0.495 n=20+20)
Scan/10/Base16-8                     166ns ±10%     174ns ±17%     ~     (p=0.072 n=20+20)
Scan/100/Base16-8                    836ns ±10%     826ns ±12%     ~     (p=0.562 n=20+17)
Scan/1000/Base16-8                  8.96µs ±13%    8.65µs ± 9%     ~     (p=0.203 n=20+18)
Scan/10000/Base16-8                  198µs ± 3%     198µs ± 5%     ~     (p=0.718 n=20+20)
Scan/100000/Base16-8                11.1ms ± 3%    11.0ms ± 4%     ~     (p=0.512 n=20+20)
String/10/Base2-8                   88.1ns ± 7%    94.1ns ±11%   +6.80%  (p=0.000 n=19+20)
String/100/Base2-8                   577ns ± 4%     598ns ± 5%   +3.72%  (p=0.000 n=20+20)
String/1000/Base2-8                 5.25µs ± 2%    5.62µs ± 5%   +7.04%  (p=0.000 n=19+20)
String/10000/Base2-8                55.6µs ± 1%    60.1µs ± 2%   +8.12%  (p=0.000 n=19+19)
String/100000/Base2-8                519µs ± 2%     560µs ± 2%   +7.91%  (p=0.000 n=18+17)
String/10/Base8-8                   52.2ns ± 8%    53.3ns ±12%     ~     (p=0.188 n=20+18)
String/100/Base8-8                   218ns ± 3%     232ns ±10%   +6.66%  (p=0.000 n=20+20)
String/1000/Base8-8                 1.84µs ± 3%    1.94µs ± 4%   +5.07%  (p=0.000 n=20+18)
String/10000/Base8-8                18.1µs ± 2%    19.1µs ± 3%   +5.84%  (p=0.000 n=20+19)
String/100000/Base8-8                184µs ± 2%     197µs ± 1%   +7.15%  (p=0.000 n=19+19)
String/10/Base10-8                   158ns ± 7%     146ns ± 6%   -7.65%  (p=0.000 n=20+19)
String/100/Base10-8                  807ns ± 2%     845ns ± 4%   +4.79%  (p=0.000 n=20+19)
String/1000/Base10-8                3.99µs ± 3%    3.99µs ± 7%     ~     (p=0.920 n=20+20)
String/10000/Base10-8               20.8µs ± 6%    22.1µs ±10%   +6.11%  (p=0.000 n=19+20)
String/100000/Base10-8              5.60ms ± 2%    5.59ms ± 2%     ~     (p=0.749 n=20+19)
String/10/Base16-8                  49.0ns ±13%    49.3ns ±16%     ~     (p=0.581 n=19+20)
String/100/Base16-8                  173ns ± 5%     185ns ± 6%   +6.63%  (p=0.000 n=20+18)
String/1000/Base16-8                1.38µs ± 3%    1.49µs ±10%   +8.27%  (p=0.000 n=19+20)
String/10000/Base16-8               13.5µs ± 2%    14.5µs ± 3%   +7.08%  (p=0.000 n=20+20)
String/100000/Base16-8               138µs ± 4%     148µs ± 4%   +7.57%  (p=0.000 n=19+20)
LeafSize/0-8                        2.74ms ± 1%    2.79ms ± 2%   +2.00%  (p=0.000 n=19+19)
LeafSize/1-8                        24.8µs ± 4%    26.1µs ± 8%   +5.33%  (p=0.000 n=18+19)
LeafSize/2-8                        24.9µs ± 7%    25.0µs ± 8%     ~     (p=0.989 n=20+19)
LeafSize/3-8                        97.6µs ± 3%   100.2µs ± 5%   +2.66%  (p=0.001 n=20+19)
LeafSize/4-8                        25.2µs ± 5%    25.4µs ± 5%     ~     (p=0.173 n=19+20)
LeafSize/5-8                         118µs ± 2%     119µs ± 5%     ~     (p=0.478 n=20+20)
LeafSize/6-8                        97.6µs ± 3%   100.1µs ± 8%   +2.65%  (p=0.021 n=20+19)
LeafSize/7-8                        65.6µs ± 5%    67.5µs ± 6%   +2.92%  (p=0.003 n=20+19)
LeafSize/8-8                        25.5µs ± 5%    25.6µs ± 6%     ~     (p=0.461 n=19+20)
LeafSize/9-8                         134µs ± 4%     136µs ± 5%     ~     (p=0.194 n=19+20)
LeafSize/10-8                        119µs ± 3%     122µs ± 3%   +2.52%  (p=0.000 n=20+19)
LeafSize/11-8                        115µs ± 5%     116µs ± 5%     ~     (p=0.158 n=20+19)
LeafSize/12-8                       97.4µs ± 4%   100.3µs ± 5%   +2.91%  (p=0.003 n=19+20)
LeafSize/13-8                       93.1µs ± 4%    93.0µs ± 6%     ~     (p=0.698 n=20+20)
LeafSize/14-8                       67.0µs ± 3%    69.7µs ± 6%   +4.10%  (p=0.000 n=20+20)
LeafSize/15-8                       48.3µs ± 2%    49.3µs ± 6%   +1.91%  (p=0.014 n=19+20)
LeafSize/16-8                       25.6µs ± 5%    25.6µs ± 6%     ~     (p=0.947 n=20+20)
LeafSize/32-8                       30.1µs ± 4%    30.3µs ± 5%     ~     (p=0.685 n=18+19)
LeafSize/64-8                       53.4µs ± 2%    54.0µs ± 3%     ~     (p=0.053 n=19+19)
ProbablyPrime/n=0-8                 3.59ms ± 1%    3.55ms ± 1%   -1.12%  (p=0.000 n=20+18)
ProbablyPrime/n=1-8                 4.21ms ± 2%    4.17ms ± 2%   -0.73%  (p=0.018 n=20+19)
ProbablyPrime/n=5-8                 6.74ms ± 1%    6.72ms ± 1%     ~     (p=0.102 n=20+20)
ProbablyPrime/n=10-8                9.91ms ± 1%    9.89ms ± 2%     ~     (p=0.322 n=19+20)
ProbablyPrime/n=20-8                16.2ms ± 1%    16.1ms ± 2%   -0.52%  (p=0.006 n=19+19)
ProbablyPrime/Lucas-8               2.94ms ± 1%    2.95ms ± 1%   +0.52%  (p=0.002 n=18+19)
ProbablyPrime/MillerRabinBase2-8     641µs ± 2%     640µs ± 2%     ~     (p=0.607 n=19+20)
FloatSqrt/64-8                       653ns ± 5%     704ns ± 5%   +7.82%  (p=0.000 n=19+20)
FloatSqrt/128-8                     1.32µs ± 3%    1.42µs ± 5%   +7.29%  (p=0.000 n=18+20)
FloatSqrt/256-8                     1.44µs ± 2%    1.45µs ± 4%     ~     (p=0.089 n=19+19)
FloatSqrt/1000-8                    3.36µs ± 3%    3.42µs ± 5%   +1.82%  (p=0.012 n=20+20)
FloatSqrt/10000-8                   25.5µs ± 2%    27.5µs ± 7%   +7.91%  (p=0.000 n=18+19)
FloatSqrt/100000-8                   629µs ± 6%     663µs ± 9%   +5.32%  (p=0.000 n=18+20)
FloatSqrt/1000000-8                 46.4ms ± 2%    46.6ms ± 5%     ~     (p=0.351 n=20+19)
[Geo mean]                          9.60µs        10.01µs        +4.28%

name                              old alloc/op   new alloc/op   delta
DecimalConversion-8                 54.0kB ± 0%    43.6kB ± 0%  -19.40%  (p=0.000 n=20+20)
FloatString/100-8                     400B ± 0%      400B ± 0%     ~     (all equal)
FloatString/1000-8                  3.10kB ± 0%    3.10kB ± 0%     ~     (all equal)
FloatString/10000-8                 52.1kB ± 0%    52.1kB ± 0%     ~     (p=0.153 n=20+20)
FloatString/100000-8                 582kB ± 0%     582kB ± 0%     ~     (all equal)
FloatAdd/10-8                        0.00B          0.00B          ~     (all equal)
FloatAdd/100-8                       0.00B          0.00B          ~     (all equal)
FloatAdd/1000-8                      0.00B          0.00B          ~     (all equal)
FloatAdd/10000-8                     0.00B          0.00B          ~     (all equal)
FloatAdd/100000-8                    0.00B          0.00B          ~     (all equal)
FloatSub/10-8                        0.00B          0.00B          ~     (all equal)
FloatSub/100-8                       0.00B          0.00B          ~     (all equal)
FloatSub/1000-8                      0.00B          0.00B          ~     (all equal)
FloatSub/10000-8                     0.00B          0.00B          ~     (all equal)
FloatSub/100000-8                    0.00B          0.00B          ~     (all equal)
ParseFloatSmallExp-8                4.18kB ± 0%    3.60kB ± 0%  -13.79%  (p=0.000 n=20+20)
ParseFloatLargeExp-8                18.9kB ± 0%    19.3kB ± 0%   +2.25%  (p=0.000 n=20+20)
GCD10x10/WithoutXY-8                 96.0B ± 0%     16.0B ± 0%  -83.33%  (p=0.000 n=20+20)
GCD10x10/WithXY-8                     240B ± 0%       88B ± 0%  -63.33%  (p=0.000 n=20+20)
GCD10x100/WithoutXY-8                 192B ± 0%      112B ± 0%  -41.67%  (p=0.000 n=20+20)
GCD10x100/WithXY-8                    464B ± 0%      424B ± 0%   -8.62%  (p=0.000 n=20+20)
GCD10x1000/WithoutXY-8                416B ± 0%      336B ± 0%  -19.23%  (p=0.000 n=20+20)
GCD10x1000/WithXY-8                 1.25kB ± 0%    1.10kB ± 0%  -12.18%  (p=0.000 n=20+20)
GCD10x10000/WithoutXY-8             2.91kB ± 0%    2.83kB ± 0%   -2.75%  (p=0.000 n=20+20)
GCD10x10000/WithXY-8                8.70kB ± 0%    8.55kB ± 0%   -1.76%  (p=0.000 n=16+16)
GCD10x100000/WithoutXY-8            27.2kB ± 0%    27.2kB ± 0%   -0.29%  (p=0.000 n=20+20)
GCD10x100000/WithXY-8               82.4kB ± 0%    82.3kB ± 0%   -0.17%  (p=0.000 n=20+19)
GCD100x100/WithoutXY-8                288B ± 0%      384B ± 0%  +33.33%  (p=0.000 n=20+20)
GCD100x100/WithXY-8                   464B ± 0%      576B ± 0%  +24.14%  (p=0.000 n=20+20)
GCD100x1000/WithoutXY-8               640B ± 0%      688B ± 0%   +7.50%  (p=0.000 n=20+20)
GCD100x1000/WithXY-8                1.52kB ± 0%    1.46kB ± 0%   -3.68%  (p=0.000 n=20+20)
GCD100x10000/WithoutXY-8            4.24kB ± 0%    4.29kB ± 0%   +1.13%  (p=0.000 n=20+20)
GCD100x10000/WithXY-8               11.1kB ± 0%    11.0kB ± 0%   -0.51%  (p=0.000 n=15+20)
GCD100x100000/WithoutXY-8           40.9kB ± 0%    40.9kB ± 0%   +0.12%  (p=0.000 n=20+19)
GCD100x100000/WithXY-8               110kB ± 0%     109kB ± 0%   -0.08%  (p=0.000 n=20+20)
GCD1000x1000/WithoutXY-8            1.22kB ± 0%    1.06kB ± 0%  -13.16%  (p=0.000 n=20+20)
GCD1000x1000/WithXY-8               2.37kB ± 0%    2.11kB ± 0%  -10.83%  (p=0.000 n=20+20)
GCD1000x10000/WithoutXY-8           4.71kB ± 0%    4.63kB ± 0%   -1.70%  (p=0.000 n=20+19)
GCD1000x10000/WithXY-8              28.2kB ± 0%    28.0kB ± 0%   -0.43%  (p=0.000 n=20+15)
GCD1000x100000/WithoutXY-8          41.3kB ± 0%    41.2kB ± 0%   -0.20%  (p=0.000 n=20+16)
GCD1000x100000/WithXY-8              301kB ± 0%     301kB ± 0%   -0.13%  (p=0.000 n=20+20)
GCD10000x10000/WithoutXY-8          8.64kB ± 0%    8.48kB ± 0%   -1.85%  (p=0.000 n=20+20)
GCD10000x10000/WithXY-8             57.2kB ± 0%    57.7kB ± 0%   +0.80%  (p=0.000 n=20+20)
GCD10000x100000/WithoutXY-8         43.8kB ± 0%    43.7kB ± 0%   -0.19%  (p=0.000 n=20+18)
GCD10000x100000/WithXY-8            2.08MB ± 0%    2.08MB ± 0%   -0.02%  (p=0.000 n=15+19)
GCD100000x100000/WithoutXY-8        81.6kB ± 0%    81.4kB ± 0%   -0.20%  (p=0.000 n=20+20)
GCD100000x100000/WithXY-8           4.32MB ± 0%    4.33MB ± 0%   +0.12%  (p=0.000 n=20+20)
Hilbert-8                            653kB ± 0%     313kB ± 0%  -52.13%  (p=0.000 n=19+20)
Binomial-8                          1.82kB ± 0%    1.02kB ± 0%  -43.86%  (p=0.000 n=20+20)
QuoRem-8                             0.00B          0.00B          ~     (all equal)
Exp-8                               11.1kB ± 0%    11.0kB ± 0%   -0.34%  (p=0.000 n=19+20)
Exp2-8                              11.3kB ± 0%    11.3kB ± 0%   -0.35%  (p=0.000 n=19+20)
Bitset-8                             0.00B          0.00B          ~     (all equal)
BitsetNeg-8                          0.00B          0.00B          ~     (all equal)
BitsetOrig-8                          103B ± 0%       63B ± 0%  -38.83%  (p=0.000 n=20+20)
BitsetNegOrig-8                       215B ± 0%      175B ± 0%  -18.60%  (p=0.000 n=20+20)
ModSqrt225_Tonelli-8                11.3kB ± 0%    11.0kB ± 0%   -2.76%  (p=0.000 n=20+17)
ModSqrt225_3Mod4-8                  3.57kB ± 0%    3.53kB ± 0%   -1.12%  (p=0.000 n=20+20)
ModSqrt231_Tonelli-8                11.0kB ± 0%    10.7kB ± 0%   -2.55%  (p=0.000 n=20+20)
ModSqrt231_5Mod8-8                  4.21kB ± 0%    4.09kB ± 0%   -2.85%  (p=0.000 n=16+20)
ModInverse-8                        1.44kB ± 0%    1.28kB ± 0%  -11.11%  (p=0.000 n=20+20)
Sqrt-8                              6.00kB ± 0%    6.00kB ± 0%     ~     (all equal)
IntSqr/1-8                           0.00B          0.00B          ~     (all equal)
IntSqr/2-8                           0.00B          0.00B          ~     (all equal)
IntSqr/3-8                           0.00B          0.00B          ~     (all equal)
IntSqr/5-8                           0.00B          0.00B          ~     (all equal)
IntSqr/8-8                           0.00B          0.00B          ~     (all equal)
IntSqr/10-8                          0.00B          0.00B          ~     (all equal)
IntSqr/20-8                           320B ± 0%      320B ± 0%     ~     (all equal)
IntSqr/30-8                           480B ± 0%      480B ± 0%     ~     (all equal)
IntSqr/50-8                           896B ± 0%      896B ± 0%     ~     (all equal)
IntSqr/80-8                         1.28kB ± 0%    1.28kB ± 0%     ~     (all equal)
IntSqr/100-8                        1.79kB ± 0%    1.79kB ± 0%     ~     (all equal)
IntSqr/200-8                        3.20kB ± 0%    3.20kB ± 0%     ~     (all equal)
IntSqr/300-8                        8.06kB ± 0%    8.06kB ± 0%     ~     (all equal)
IntSqr/500-8                        12.3kB ± 0%    12.3kB ± 0%     ~     (all equal)
IntSqr/800-8                        28.8kB ± 0%    28.8kB ± 0%     ~     (all equal)
IntSqr/1000-8                       36.9kB ± 0%    36.9kB ± 0%     ~     (all equal)
Div/20/10-8                          0.00B          0.00B          ~     (all equal)
Div/200/100-8                        0.00B          0.00B          ~     (all equal)
Div/2000/1000-8                      0.00B          0.00B          ~     (all equal)
Div/20000/10000-8                    0.00B          0.00B          ~     (all equal)
Div/200000/100000-8                   690B ± 0%      690B ± 0%     ~     (all equal)
Mul-8                                565kB ± 0%     565kB ± 0%     ~     (all equal)
ZeroShifts/Shl-8                    6.53kB ± 0%    6.53kB ± 0%     ~     (all equal)
ZeroShifts/ShlSame-8                 0.00B          0.00B          ~     (all equal)
ZeroShifts/Shr-8                    6.53kB ± 0%    6.53kB ± 0%     ~     (all equal)
ZeroShifts/ShrSame-8                 0.00B          0.00B          ~     (all equal)
Exp3Power/0x10-8                      192B ± 0%      112B ± 0%  -41.67%  (p=0.000 n=20+20)
Exp3Power/0x40-8                      192B ± 0%      112B ± 0%  -41.67%  (p=0.000 n=20+20)
Exp3Power/0x100-8                     288B ± 0%      208B ± 0%  -27.78%  (p=0.000 n=20+20)
Exp3Power/0x400-8                     672B ± 0%      592B ± 0%  -11.90%  (p=0.000 n=20+20)
Exp3Power/0x1000-8                  3.33kB ± 0%    3.25kB ± 0%   -2.40%  (p=0.000 n=20+20)
Exp3Power/0x4000-8                  13.8kB ± 0%    13.7kB ± 0%   -0.58%  (p=0.000 n=20+20)
Exp3Power/0x10000-8                  117kB ± 0%     117kB ± 0%   -0.07%  (p=0.000 n=20+20)
Exp3Power/0x40000-8                  755kB ± 0%     755kB ± 0%   -0.01%  (p=0.000 n=19+20)
Exp3Power/0x100000-8                5.22MB ± 0%    5.22MB ± 0%   -0.00%  (p=0.000 n=20+20)
Exp3Power/0x400000-8                39.8MB ± 0%    39.8MB ± 0%   -0.00%  (p=0.000 n=20+19)
Fibo-8                              3.09MB ± 0%    3.08MB ± 0%   -0.28%  (p=0.000 n=20+16)
NatSqr/1-8                           48.0B ± 0%     48.0B ± 0%     ~     (all equal)
NatSqr/2-8                           64.0B ± 0%     64.0B ± 0%     ~     (all equal)
NatSqr/3-8                           80.0B ± 0%     80.0B ± 0%     ~     (all equal)
NatSqr/5-8                            112B ± 0%      112B ± 0%     ~     (all equal)
NatSqr/8-8                            160B ± 0%      160B ± 0%     ~     (all equal)
NatSqr/10-8                           192B ± 0%      192B ± 0%     ~     (all equal)
NatSqr/20-8                           672B ± 0%      672B ± 0%     ~     (all equal)
NatSqr/30-8                           992B ± 0%      992B ± 0%     ~     (all equal)
NatSqr/50-8                         1.79kB ± 0%    1.79kB ± 0%     ~     (all equal)
NatSqr/80-8                         2.69kB ± 0%    2.69kB ± 0%     ~     (all equal)
NatSqr/100-8                        3.58kB ± 0%    3.58kB ± 0%     ~     (all equal)
NatSqr/200-8                        6.66kB ± 0%    6.66kB ± 0%     ~     (all equal)
NatSqr/300-8                        24.4kB ± 0%    24.4kB ± 0%     ~     (all equal)
NatSqr/500-8                        36.9kB ± 0%    36.9kB ± 0%     ~     (all equal)
NatSqr/800-8                        69.8kB ± 0%    69.8kB ± 0%     ~     (all equal)
NatSqr/1000-8                       86.0kB ± 0%    86.0kB ± 0%     ~     (all equal)
NatSetBytes/8-8                      0.00B          0.00B          ~     (all equal)
NatSetBytes/24-8                     64.0B ± 0%     64.0B ± 0%     ~     (all equal)
NatSetBytes/128-8                     160B ± 0%      160B ± 0%     ~     (all equal)
NatSetBytes/7-8                      0.00B          0.00B          ~     (all equal)
NatSetBytes/23-8                     64.0B ± 0%     64.0B ± 0%     ~     (all equal)
NatSetBytes/127-8                     160B ± 0%      160B ± 0%     ~     (all equal)
ScanPi-8                            75.4kB ± 0%    75.7kB ± 0%   +0.41%  (p=0.000 n=20+20)
StringPiParallel-8                  20.4kB ± 0%    20.4kB ± 0%     ~     (p=0.223 n=20+20)
Scan/10/Base2-8                      48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/100/Base2-8                     48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/1000/Base2-8                    48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/10000/Base2-8                   48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/100000/Base2-8                  48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/10/Base8-8                      48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/100/Base8-8                     48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/1000/Base8-8                    48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/10000/Base8-8                   48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/100000/Base8-8                  48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/10/Base10-8                     48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/100/Base10-8                    48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/1000/Base10-8                   48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/10000/Base10-8                  48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/100000/Base10-8                 48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/10/Base16-8                     48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/100/Base16-8                    48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/1000/Base16-8                   48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/10000/Base16-8                  48.0B ± 0%     48.0B ± 0%     ~     (all equal)
Scan/100000/Base16-8                 48.0B ± 0%     48.0B ± 0%     ~     (all equal)
String/10/Base2-8                    48.0B ± 0%     48.0B ± 0%     ~     (all equal)
String/100/Base2-8                    352B ± 0%      352B ± 0%     ~     (all equal)
String/1000/Base2-8                 3.46kB ± 0%    3.46kB ± 0%     ~     (all equal)
String/10000/Base2-8                41.0kB ± 0%    41.0kB ± 0%     ~     (all equal)
String/100000/Base2-8                336kB ± 0%     336kB ± 0%     ~     (all equal)
String/10/Base8-8                    16.0B ± 0%     16.0B ± 0%     ~     (all equal)
String/100/Base8-8                    112B ± 0%      112B ± 0%     ~     (all equal)
String/1000/Base8-8                 1.15kB ± 0%    1.15kB ± 0%     ~     (all equal)
String/10000/Base8-8                12.3kB ± 0%    12.3kB ± 0%     ~     (all equal)
String/100000/Base8-8                115kB ± 0%     115kB ± 0%     ~     (all equal)
String/10/Base10-8                   64.0B ± 0%     24.0B ± 0%  -62.50%  (p=0.000 n=20+20)
String/100/Base10-8                   192B ± 0%      192B ± 0%     ~     (all equal)
String/1000/Base10-8                1.95kB ± 0%    1.95kB ± 0%     ~     (all equal)
String/10000/Base10-8               20.0kB ± 0%    20.0kB ± 0%     ~     (p=0.983 n=19+20)
String/100000/Base10-8               210kB ± 1%     211kB ± 1%   +0.82%  (p=0.000 n=19+20)
String/10/Base16-8                   16.0B ± 0%     16.0B ± 0%     ~     (all equal)
String/100/Base16-8                  96.0B ± 0%     96.0B ± 0%     ~     (all equal)
String/1000/Base16-8                  896B ± 0%      896B ± 0%     ~     (all equal)
String/10000/Base16-8               9.47kB ± 0%    9.47kB ± 0%     ~     (all equal)
String/100000/Base16-8              90.1kB ± 0%    90.1kB ± 0%     ~     (all equal)
LeafSize/0-8                        16.9kB ± 0%    16.8kB ± 0%   -0.44%  (p=0.000 n=20+20)
LeafSize/1-8                        22.4kB ± 0%    22.3kB ± 0%   -0.34%  (p=0.000 n=20+19)
LeafSize/2-8                        22.4kB ± 0%    22.3kB ± 0%   -0.34%  (p=0.000 n=20+19)
LeafSize/3-8                        22.4kB ± 0%    22.3kB ± 0%   -0.34%  (p=0.000 n=20+17)
LeafSize/4-8                        22.4kB ± 0%    22.3kB ± 0%   -0.34%  (p=0.000 n=20+19)
LeafSize/5-8                        22.4kB ± 0%    22.3kB ± 0%   -0.33%  (p=0.000 n=20+20)
LeafSize/6-8                        22.3kB ± 0%    22.2kB ± 0%   -0.34%  (p=0.000 n=20+20)
LeafSize/7-8                        22.3kB ± 0%    22.2kB ± 0%   -0.35%  (p=0.000 n=20+20)
LeafSize/8-8                        22.3kB ± 0%    22.2kB ± 0%   -0.34%  (p=0.000 n=16+20)
LeafSize/9-8                        22.3kB ± 0%    22.2kB ± 0%   -0.33%  (p=0.000 n=20+20)
LeafSize/10-8                       22.3kB ± 0%    22.2kB ± 0%   -0.33%  (p=0.000 n=20+20)
LeafSize/11-8                       22.3kB ± 0%    22.2kB ± 0%   -0.33%  (p=0.000 n=20+20)
LeafSize/12-8                       22.3kB ± 0%    22.2kB ± 0%   -0.33%  (p=0.000 n=20+20)
LeafSize/13-8                       22.3kB ± 0%    22.2kB ± 0%   -0.34%  (p=0.000 n=20+15)
LeafSize/14-8                       22.3kB ± 0%    22.2kB ± 0%   -0.33%  (p=0.000 n=20+20)
LeafSize/15-8                       22.3kB ± 0%    22.2kB ± 0%   -0.33%  (p=0.000 n=20+20)
LeafSize/16-8                       22.3kB ± 0%    22.2kB ± 0%   -0.33%  (p=0.000 n=19+20)
LeafSize/32-8                       22.3kB ± 0%    22.2kB ± 0%   -0.32%  (p=0.000 n=20+20)
LeafSize/64-8                       21.8kB ± 0%    21.7kB ± 0%   -0.33%  (p=0.000 n=18+19)
ProbablyPrime/n=0-8                 15.3kB ± 0%    14.9kB ± 0%   -2.35%  (p=0.000 n=20+20)
ProbablyPrime/n=1-8                 21.0kB ± 0%    20.7kB ± 0%   -1.71%  (p=0.000 n=20+20)
ProbablyPrime/n=5-8                 43.4kB ± 0%    42.9kB ± 0%   -1.20%  (p=0.000 n=20+20)
ProbablyPrime/n=10-8                71.5kB ± 0%    70.7kB ± 0%   -1.01%  (p=0.000 n=19+20)
ProbablyPrime/n=20-8                 127kB ± 0%     126kB ± 0%   -0.88%  (p=0.000 n=20+20)
ProbablyPrime/Lucas-8               3.07kB ± 0%    2.79kB ± 0%   -9.12%  (p=0.000 n=20+20)
ProbablyPrime/MillerRabinBase2-8    12.1kB ± 0%    12.0kB ± 0%   -0.66%  (p=0.000 n=20+20)
FloatSqrt/64-8                        416B ± 0%      360B ± 0%  -13.46%  (p=0.000 n=20+20)
FloatSqrt/128-8                       640B ± 0%      584B ± 0%   -8.75%  (p=0.000 n=20+20)
FloatSqrt/256-8                       512B ± 0%      472B ± 0%   -7.81%  (p=0.000 n=20+20)
FloatSqrt/1000-8                    1.47kB ± 0%    1.43kB ± 0%   -2.72%  (p=0.000 n=20+20)
FloatSqrt/10000-8                   18.2kB ± 0%    18.1kB ± 0%   -0.22%  (p=0.000 n=20+20)
FloatSqrt/100000-8                   204kB ± 0%     204kB ± 0%   -0.02%  (p=0.000 n=20+20)
FloatSqrt/1000000-8                 6.37MB ± 0%    6.37MB ± 0%   -0.00%  (p=0.000 n=19+20)
[Geo mean]                          3.42kB         3.24kB        -5.33%

name                              old allocs/op  new allocs/op  delta
DecimalConversion-8                  1.65k ± 0%     1.65k ± 0%     ~     (all equal)
FloatString/100-8                     8.00 ± 0%      8.00 ± 0%     ~     (all equal)
FloatString/1000-8                    9.00 ± 0%      9.00 ± 0%     ~     (all equal)
FloatString/10000-8                   22.0 ± 0%      22.0 ± 0%     ~     (all equal)
FloatString/100000-8                   136 ± 0%       136 ± 0%     ~     (all equal)
FloatAdd/10-8                         0.00           0.00          ~     (all equal)
FloatAdd/100-8                        0.00           0.00          ~     (all equal)
FloatAdd/1000-8                       0.00           0.00          ~     (all equal)
FloatAdd/10000-8                      0.00           0.00          ~     (all equal)
FloatAdd/100000-8                     0.00           0.00          ~     (all equal)
FloatSub/10-8                         0.00           0.00          ~     (all equal)
FloatSub/100-8                        0.00           0.00          ~     (all equal)
FloatSub/1000-8                       0.00           0.00          ~     (all equal)
FloatSub/10000-8                      0.00           0.00          ~     (all equal)
FloatSub/100000-8                     0.00           0.00          ~     (all equal)
ParseFloatSmallExp-8                   110 ± 0%       130 ± 0%  +18.18%  (p=0.000 n=20+20)
ParseFloatLargeExp-8                   319 ± 0%       371 ± 0%  +16.30%  (p=0.000 n=20+20)
GCD10x10/WithoutXY-8                  2.00 ± 0%      2.00 ± 0%     ~     (all equal)
GCD10x10/WithXY-8                     5.00 ± 0%      6.00 ± 0%  +20.00%  (p=0.000 n=20+20)
GCD10x100/WithoutXY-8                 4.00 ± 0%      4.00 ± 0%     ~     (all equal)
GCD10x100/WithXY-8                    9.00 ± 0%     12.00 ± 0%  +33.33%  (p=0.000 n=20+20)
GCD10x1000/WithoutXY-8                4.00 ± 0%      4.00 ± 0%     ~     (all equal)
GCD10x1000/WithXY-8                   11.0 ± 0%      12.0 ± 0%   +9.09%  (p=0.000 n=20+20)
GCD10x10000/WithoutXY-8               4.00 ± 0%      4.00 ± 0%     ~     (all equal)
GCD10x10000/WithXY-8                  11.0 ± 0%      12.0 ± 0%   +9.09%  (p=0.000 n=20+20)
GCD10x100000/WithoutXY-8              4.00 ± 0%      4.00 ± 0%     ~     (all equal)
GCD10x100000/WithXY-8                 11.0 ± 0%      12.0 ± 0%   +9.09%  (p=0.000 n=20+20)
GCD100x100/WithoutXY-8                6.00 ± 0%     10.00 ± 0%  +66.67%  (p=0.000 n=20+20)
GCD100x100/WithXY-8                   9.00 ± 0%     15.00 ± 0%  +66.67%  (p=0.000 n=20+20)
GCD100x1000/WithoutXY-8               6.00 ± 0%      8.00 ± 0%  +33.33%  (p=0.000 n=20+20)
GCD100x1000/WithXY-8                  12.0 ± 0%      13.0 ± 0%   +8.33%  (p=0.000 n=20+20)
GCD100x10000/WithoutXY-8              6.00 ± 0%      8.00 ± 0%  +33.33%  (p=0.000 n=20+20)
GCD100x10000/WithXY-8                 12.0 ± 0%      13.0 ± 0%   +8.33%  (p=0.000 n=20+20)
GCD100x100000/WithoutXY-8             6.00 ± 0%      8.00 ± 0%  +33.33%  (p=0.000 n=20+20)
GCD100x100000/WithXY-8                12.0 ± 0%      13.0 ± 0%   +8.33%  (p=0.000 n=20+20)
GCD1000x1000/WithoutXY-8              10.0 ± 0%      10.0 ± 0%     ~     (all equal)
GCD1000x1000/WithXY-8                 19.0 ± 0%      20.0 ± 0%   +5.26%  (p=0.000 n=20+20)
GCD1000x10000/WithoutXY-8             8.00 ± 0%      8.00 ± 0%     ~     (all equal)
GCD1000x10000/WithXY-8                26.0 ± 0%      26.0 ± 0%     ~     (all equal)
GCD1000x100000/WithoutXY-8            8.00 ± 0%      8.00 ± 0%     ~     (all equal)
GCD1000x100000/WithXY-8               27.0 ± 0%      27.0 ± 0%     ~     (all equal)
GCD10000x10000/WithoutXY-8            10.0 ± 0%      10.0 ± 0%     ~     (all equal)
GCD10000x10000/WithXY-8               76.0 ± 0%      78.0 ± 0%   +2.63%  (p=0.000 n=20+20)
GCD10000x100000/WithoutXY-8           8.00 ± 0%      8.00 ± 0%     ~     (all equal)
GCD10000x100000/WithXY-8               174 ± 0%       174 ± 0%     ~     (all equal)
GCD100000x100000/WithoutXY-8          10.0 ± 0%      10.0 ± 0%     ~     (all equal)
GCD100000x100000/WithXY-8              645 ± 0%       647 ± 0%   +0.31%  (p=0.000 n=20+20)
Hilbert-8                            14.1k ± 0%     14.3k ± 0%   +0.92%  (p=0.000 n=20+20)
Binomial-8                            38.0 ± 0%      38.0 ± 0%     ~     (all equal)
QuoRem-8                              0.00           0.00          ~     (all equal)
Exp-8                                 21.0 ± 0%      21.0 ± 0%     ~     (all equal)
Exp2-8                                22.0 ± 0%      22.0 ± 0%     ~     (all equal)
Bitset-8                              0.00           0.00          ~     (all equal)
BitsetNeg-8                           0.00           0.00          ~     (all equal)
BitsetOrig-8                          1.00 ± 0%      1.00 ± 0%     ~     (all equal)
BitsetNegOrig-8                       2.00 ± 0%      2.00 ± 0%     ~     (all equal)
ModSqrt225_Tonelli-8                  85.0 ± 0%      86.0 ± 0%   +1.18%  (p=0.000 n=20+20)
ModSqrt225_3Mod4-8                    25.0 ± 0%      25.0 ± 0%     ~     (all equal)
ModSqrt231_Tonelli-8                  80.0 ± 0%      80.0 ± 0%     ~     (all equal)
ModSqrt231_5Mod8-8                    32.0 ± 0%      32.0 ± 0%     ~     (all equal)
ModInverse-8                          11.0 ± 0%      11.0 ± 0%     ~     (all equal)
Sqrt-8                                13.0 ± 0%      13.0 ± 0%     ~     (all equal)
IntSqr/1-8                            0.00           0.00          ~     (all equal)
IntSqr/2-8                            0.00           0.00          ~     (all equal)
IntSqr/3-8                            0.00           0.00          ~     (all equal)
IntSqr/5-8                            0.00           0.00          ~     (all equal)
IntSqr/8-8                            0.00           0.00          ~     (all equal)
IntSqr/10-8                           0.00           0.00          ~     (all equal)
IntSqr/20-8                           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
IntSqr/30-8                           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
IntSqr/50-8                           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
IntSqr/80-8                           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
IntSqr/100-8                          1.00 ± 0%      1.00 ± 0%     ~     (all equal)
IntSqr/200-8                          1.00 ± 0%      1.00 ± 0%     ~     (all equal)
IntSqr/300-8                          3.00 ± 0%      3.00 ± 0%     ~     (all equal)
IntSqr/500-8                          3.00 ± 0%      3.00 ± 0%     ~     (all equal)
IntSqr/800-8                          9.00 ± 0%      9.00 ± 0%     ~     (all equal)
IntSqr/1000-8                         9.00 ± 0%      9.00 ± 0%     ~     (all equal)
Div/20/10-8                           0.00           0.00          ~     (all equal)
Div/200/100-8                         0.00           0.00          ~     (all equal)
Div/2000/1000-8                       0.00           0.00          ~     (all equal)
Div/20000/10000-8                     0.00           0.00          ~     (all equal)
Div/200000/100000-8                   0.00           0.00          ~     (all equal)
Mul-8                                 2.00 ± 0%      2.00 ± 0%     ~     (all equal)
ZeroShifts/Shl-8                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ZeroShifts/ShlSame-8                  0.00           0.00          ~     (all equal)
ZeroShifts/Shr-8                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ZeroShifts/ShrSame-8                  0.00           0.00          ~     (all equal)
Exp3Power/0x10-8                      4.00 ± 0%      4.00 ± 0%     ~     (all equal)
Exp3Power/0x40-8                      4.00 ± 0%      4.00 ± 0%     ~     (all equal)
Exp3Power/0x100-8                     5.00 ± 0%      5.00 ± 0%     ~     (all equal)
Exp3Power/0x400-8                     7.00 ± 0%      7.00 ± 0%     ~     (all equal)
Exp3Power/0x1000-8                    11.0 ± 0%      11.0 ± 0%     ~     (all equal)
Exp3Power/0x4000-8                    15.0 ± 0%      15.0 ± 0%     ~     (all equal)
Exp3Power/0x10000-8                   29.0 ± 0%      29.0 ± 0%     ~     (all equal)
Exp3Power/0x40000-8                    140 ± 0%       140 ± 0%     ~     (all equal)
Exp3Power/0x100000-8                 1.12k ± 0%     1.12k ± 0%     ~     (all equal)
Exp3Power/0x400000-8                 9.88k ± 0%     9.88k ± 0%     ~     (p=0.747 n=17+19)
Fibo-8                                 739 ± 0%       743 ± 0%   +0.54%  (p=0.000 n=20+20)
NatSqr/1-8                            1.00 ± 0%      1.00 ± 0%     ~     (all equal)
NatSqr/2-8                            1.00 ± 0%      1.00 ± 0%     ~     (all equal)
NatSqr/3-8                            1.00 ± 0%      1.00 ± 0%     ~     (all equal)
NatSqr/5-8                            1.00 ± 0%      1.00 ± 0%     ~     (all equal)
NatSqr/8-8                            1.00 ± 0%      1.00 ± 0%     ~     (all equal)
NatSqr/10-8                           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
NatSqr/20-8                           2.00 ± 0%      2.00 ± 0%     ~     (all equal)
NatSqr/30-8                           2.00 ± 0%      2.00 ± 0%     ~     (all equal)
NatSqr/50-8                           2.00 ± 0%      2.00 ± 0%     ~     (all equal)
NatSqr/80-8                           2.00 ± 0%      2.00 ± 0%     ~     (all equal)
NatSqr/100-8                          2.00 ± 0%      2.00 ± 0%     ~     (all equal)
NatSqr/200-8                          2.00 ± 0%      2.00 ± 0%     ~     (all equal)
NatSqr/300-8                          4.00 ± 0%      4.00 ± 0%     ~     (all equal)
NatSqr/500-8                          4.00 ± 0%      4.00 ± 0%     ~     (all equal)
NatSqr/800-8                          10.0 ± 0%      10.0 ± 0%     ~     (all equal)
NatSqr/1000-8                         10.0 ± 0%      10.0 ± 0%     ~     (all equal)
NatSetBytes/8-8                       0.00           0.00          ~     (all equal)
NatSetBytes/24-8                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
NatSetBytes/128-8                     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
NatSetBytes/7-8                       0.00           0.00          ~     (all equal)
NatSetBytes/23-8                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
NatSetBytes/127-8                     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
ScanPi-8                              60.0 ± 0%      61.0 ± 0%   +1.67%  (p=0.000 n=20+20)
StringPiParallel-8                    24.0 ± 0%      24.0 ± 0%     ~     (all equal)
Scan/10/Base2-8                       1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/100/Base2-8                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/1000/Base2-8                     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/10000/Base2-8                    1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/100000/Base2-8                   1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/10/Base8-8                       1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/100/Base8-8                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/1000/Base8-8                     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/10000/Base8-8                    1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/100000/Base8-8                   1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/10/Base10-8                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/100/Base10-8                     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/1000/Base10-8                    1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/10000/Base10-8                   1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/100000/Base10-8                  1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/10/Base16-8                      1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/100/Base16-8                     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/1000/Base16-8                    1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/10000/Base16-8                   1.00 ± 0%      1.00 ± 0%     ~     (all equal)
Scan/100000/Base16-8                  1.00 ± 0%      1.00 ± 0%     ~     (all equal)
String/10/Base2-8                     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
String/100/Base2-8                    1.00 ± 0%      1.00 ± 0%     ~     (all equal)
String/1000/Base2-8                   1.00 ± 0%      1.00 ± 0%     ~     (all equal)
String/10000/Base2-8                  1.00 ± 0%      1.00 ± 0%     ~     (all equal)
String/100000/Base2-8                 1.00 ± 0%      1.00 ± 0%     ~     (all equal)
String/10/Base8-8                     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
String/100/Base8-8                    1.00 ± 0%      1.00 ± 0%     ~     (all equal)
String/1000/Base8-8                   1.00 ± 0%      1.00 ± 0%     ~     (all equal)
String/10000/Base8-8                  1.00 ± 0%      1.00 ± 0%     ~     (all equal)
String/100000/Base8-8                 1.00 ± 0%      1.00 ± 0%     ~     (all equal)
String/10/Base10-8                    2.00 ± 0%      2.00 ± 0%     ~     (all equal)
String/100/Base10-8                   2.00 ± 0%      2.00 ± 0%     ~     (all equal)
String/1000/Base10-8                  3.00 ± 0%      3.00 ± 0%     ~     (all equal)
String/10000/Base10-8                 3.00 ± 0%      3.00 ± 0%     ~     (all equal)
String/100000/Base10-8                3.00 ± 0%      3.00 ± 0%     ~     (all equal)
String/10/Base16-8                    1.00 ± 0%      1.00 ± 0%     ~     (all equal)
String/100/Base16-8                   1.00 ± 0%      1.00 ± 0%     ~     (all equal)
String/1000/Base16-8                  1.00 ± 0%      1.00 ± 0%     ~     (all equal)
String/10000/Base16-8                 1.00 ± 0%      1.00 ± 0%     ~     (all equal)
String/100000/Base16-8                1.00 ± 0%      1.00 ± 0%     ~     (all equal)
LeafSize/0-8                          10.0 ± 0%      10.0 ± 0%     ~     (all equal)
LeafSize/1-8                          13.0 ± 0%      13.0 ± 0%     ~     (all equal)
LeafSize/2-8                          13.0 ± 0%      13.0 ± 0%     ~     (all equal)
LeafSize/3-8                          13.0 ± 0%      13.0 ± 0%     ~     (all equal)
LeafSize/4-8                          13.0 ± 0%      13.0 ± 0%     ~     (all equal)
LeafSize/5-8                          13.0 ± 0%      13.0 ± 0%     ~     (all equal)
LeafSize/6-8                          12.0 ± 0%      12.0 ± 0%     ~     (all equal)
LeafSize/7-8                          12.0 ± 0%      12.0 ± 0%     ~     (all equal)
LeafSize/8-8                          12.0 ± 0%      12.0 ± 0%     ~     (all equal)
LeafSize/9-8                          12.0 ± 0%      12.0 ± 0%     ~     (all equal)
LeafSize/10-8                         12.0 ± 0%      12.0 ± 0%     ~     (all equal)
LeafSize/11-8                         12.0 ± 0%      12.0 ± 0%     ~     (all equal)
LeafSize/12-8                         12.0 ± 0%      12.0 ± 0%     ~     (all equal)
LeafSize/13-8                         12.0 ± 0%      12.0 ± 0%     ~     (all equal)
LeafSize/14-8                         12.0 ± 0%      12.0 ± 0%     ~     (all equal)
LeafSize/15-8                         12.0 ± 0%      12.0 ± 0%     ~     (all equal)
LeafSize/16-8                         12.0 ± 0%      12.0 ± 0%     ~     (all equal)
LeafSize/32-8                         12.0 ± 0%      12.0 ± 0%     ~     (all equal)
LeafSize/64-8                         11.0 ± 0%      11.0 ± 0%     ~     (all equal)
ProbablyPrime/n=0-8                   52.0 ± 0%      52.0 ± 0%     ~     (all equal)
ProbablyPrime/n=1-8                   73.0 ± 0%      73.0 ± 0%     ~     (all equal)
ProbablyPrime/n=5-8                    157 ± 0%       157 ± 0%     ~     (all equal)
ProbablyPrime/n=10-8                   262 ± 0%       262 ± 0%     ~     (all equal)
ProbablyPrime/n=20-8                   472 ± 0%       472 ± 0%     ~     (all equal)
ProbablyPrime/Lucas-8                 22.0 ± 0%      22.0 ± 0%     ~     (all equal)
ProbablyPrime/MillerRabinBase2-8      29.0 ± 0%      29.0 ± 0%     ~     (all equal)
FloatSqrt/64-8                        9.00 ± 0%     10.00 ± 0%  +11.11%  (p=0.000 n=20+20)
FloatSqrt/128-8                       12.0 ± 0%      13.0 ± 0%   +8.33%  (p=0.000 n=20+20)
FloatSqrt/256-8                       8.00 ± 0%      8.00 ± 0%     ~     (all equal)
FloatSqrt/1000-8                      9.00 ± 0%      9.00 ± 0%     ~     (all equal)
FloatSqrt/10000-8                     14.0 ± 0%      14.0 ± 0%     ~     (all equal)
FloatSqrt/100000-8                    33.0 ± 0%      33.0 ± 0%     ~     (all equal)
FloatSqrt/1000000-8                  1.16k ± 0%     1.16k ± 0%     ~     (all equal)
[Geo mean]                            6.62           6.76        +2.09%

Change-Id: Id9df4157cac1e07721e35cff7fcdefe60703873a
Reviewed-on: https://go-review.googlesource.com/c/150999
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-11-28 17:38:46 +00:00
Brian Kessler 979d9027ae math/bits: define Div to panic when y<=hi
Div panics when y<=hi because either the quotient overflows
the size of the output or division by zero occurs when y==0.
This provides a uniform behavior for all implementations.

Fixes #28316

Change-Id: If23aeb10e0709ee1a60b7d614afc9103d674a980
Reviewed-on: https://go-review.googlesource.com/c/149517
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-11-27 05:04:33 +00:00
Brian Kessler ead5d1e316 math/bits: panic when y<=hi in Div
Explicitly check for divide-by-zero/overflow and panic with the appropriate
runtime error.  The additional checks have basically no effect on performance
since the branch is easily predicted.

name     old time/op  new time/op  delta
Div-4    53.9ns ± 1%  53.0ns ± 1%  -1.59%  (p=0.016 n=4+5)
Div32-4  17.9ns ± 0%  18.4ns ± 0%  +2.56%  (p=0.008 n=5+5)
Div64-4  53.5ns ± 0%  53.3ns ± 0%    ~     (p=0.095 n=5+5)

Updates #28316

Change-Id: I36297ee9946cbbc57fefb44d1730283b049ecf57
Reviewed-on: https://go-review.googlesource.com/c/144377
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-11-27 05:04:14 +00:00