mirror/go - go - Git Fam. Sieh

Commit Graph

Author	SHA1	Message	Date
smasher164	5ca44dc403	math/bits: make Add and Sub fallbacks constant time Make the extended precision add-with-carry and sub-with-carry operations take a constant amount of time to execute, regardless of input. name old time/op new time/op delta Add-4 1.16ns ±11% 1.51ns ± 5% +30.52% (p=0.008 n=5+5) Add32-4 1.08ns ± 0% 1.03ns ± 1% -4.86% (p=0.029 n=4+4) Add64-4 1.09ns ± 1% 1.95ns ± 3% +79.23% (p=0.008 n=5+5) Add64multiple-4 4.03ns ± 1% 4.55ns ±11% +13.07% (p=0.008 n=5+5) Sub-4 1.08ns ± 1% 1.50ns ± 0% +38.17% (p=0.016 n=5+4) Sub32-4 1.09ns ± 2% 1.53ns ±10% +40.26% (p=0.008 n=5+5) Sub64-4 1.10ns ± 1% 1.47ns ± 1% +33.39% (p=0.008 n=5+5) Sub64multiple-4 4.30ns ± 2% 4.08ns ± 4% -5.07% (p=0.032 n=5+5) Fixes #31267 Change-Id: I1824b1b3ab8f09902ce8b5fef84ce2fdb8847ed9 Reviewed-on: https://go-review.googlesource.com/c/go/+/170758 Reviewed-by: Filippo Valsorda <filippo@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Filippo Valsorda <filippo@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-05-20 01:13:27 +00:00
JT Olio	5a2da5624a	math/big: stack allocate scaleDenom return value benchmark old ns/op new ns/op delta BenchmarkRatCmp-4 154 77.9 -49.42% Change-Id: I932710ad8b6905879e232168b1777927f86ba22a Reviewed-on: https://go-review.googlesource.com/c/go/+/175460 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-05-08 17:11:57 +00:00
erifan01	503e6ccd74	math/big: fix the bug in assembly implementation of shlVU on arm64 For the case where the addresses of parameter z and x of the function shlVU overlap and the address of z is greater than x, x (input value) can be polluted during the calculation when the high words of x are overlapped with the low words of z (output value). Fixes #31084 Change-Id: I9bb0266a1d7856b8faa9a9b1975d6f57dece0479 Reviewed-on: https://go-review.googlesource.com/c/go/+/169780 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-05-08 01:29:00 +00:00
Brian Kessler	689ee112df	math/big: document Int.String Int.String had no documentation and the documentation for Int.Text did not mention the handling of the nil pointer case. Change-Id: I9f21921e431c948545b7cabc7829e4b4e574bbe9 Reviewed-on: https://go-review.googlesource.com/c/go/+/175118 Reviewed-by: Robert Griesemer <gri@golang.org>	2019-05-03 03:25:26 +00:00
Michael Munday	f0fdbb1e8b	math: consolidate assembly stub implementations Where assembly functions are just jumps to the Go implementation put them into a stubs_<arch>.s file. This reduces the number of files considerably and makes it easier to see what is really implemented in assembly. I've also run the stubs files through asmfmt to format them in a more consistent way. Eventually we should replace these 'stub' assembly files with a pure Go implementation now that we have mid-stack inlining (see #31362). Change-Id: If5b2022dcc23e1299f1b7ba79884f1b1263d0f7f Reviewed-on: https://go-review.googlesource.com/c/go/+/173398 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2019-04-23 14:50:16 +00:00
erifan01	d17d41e58d	math/big: optimize mulAddVWW on arm64 for better performance Unroll the cycle 4 times to reduce load overhead. Benchmarks: name old time/op new time/op delta MulAddVWW/1-8 15.9ns ± 0% 11.9ns ± 0% -24.92% (p=0.000 n=8+8) MulAddVWW/2-8 16.1ns ± 0% 13.9ns ± 1% -13.82% (p=0.000 n=8+8) MulAddVWW/3-8 18.9ns ± 0% 17.3ns ± 0% -8.47% (p=0.000 n=8+8) MulAddVWW/4-8 21.7ns ± 0% 19.5ns ± 0% -10.14% (p=0.000 n=8+8) MulAddVWW/5-8 25.1ns ± 0% 22.5ns ± 0% -10.27% (p=0.000 n=8+8) MulAddVWW/10-8 41.6ns ± 0% 40.0ns ± 0% -3.79% (p=0.000 n=8+8) MulAddVWW/100-8 368ns ± 0% 363ns ± 0% -1.36% (p=0.000 n=8+8) MulAddVWW/1000-8 3.52µs ± 0% 3.52µs ± 0% -0.14% (p=0.000 n=8+8) MulAddVWW/10000-8 35.1µs ± 0% 35.1µs ± 0% -0.01% (p=0.000 n=7+6) MulAddVWW/100000-8 351µs ± 0% 351µs ± 0% +0.15% (p=0.038 n=8+8) Change-Id: I052a4db286ac6e4f3293289c7e9a82027da0405e Reviewed-on: https://go-review.googlesource.com/c/go/+/155780 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-04-22 14:45:16 +00:00
Josh Bleecher Snyder	5781df421e	all: s/cancelation/cancellation/ Though there is variation in the spelling of canceled, cancellation is always spelled with a double l. Reference: https://www.grammarly.com/blog/canceled-vs-cancelled/ Change-Id: I240f1a297776c8e27e74f3eca566d2bc4c856f2f Reviewed-on: https://go-review.googlesource.com/c/go/+/170060 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2019-04-16 20:27:15 +00:00
Michael Munday	64dc4ba73f	math: use new mnemonics for 'rotate then insert' on s390x Mnemonics for these instructions were added to the assembler in CL 159357. Change-Id: Ie11c45ecc9cead9a8850fcc929b0211cfd980fe5 Reviewed-on: https://go-review.googlesource.com/c/go/+/160157 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2019-04-16 15:34:41 +00:00
Robert Griesemer	f8c6f986fd	math/big: don't clobber shared underlying array in pow5 computation Rearranged code slightly to make lifetime of underlying array of pow5 more explicit in code. Fixes #31184. Change-Id: I063081f0e54097c499988d268a23813746592654 Reviewed-on: https://go-review.googlesource.com/c/go/+/170641 Reviewed-by: Filippo Valsorda <filippo@golang.org>	2019-04-15 17:48:21 +00:00
Neven Sajko	7756a72b35	all: change the old assembly style AX:CX to CX, AX Assembly files with "/vendor/" or "testdata" in their paths were ignored. Change-Id: I3882ff07eb4426abb9f8ee96f82dff73c81cd61f GitHub-Last-Rev: `51ae8c324d` GitHub-Pull-Request: golang/go#31166 Reviewed-on: https://go-review.googlesource.com/c/go/+/170197 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-04-09 00:22:03 +00:00
Filippo Valsorda	ead895688d	math/big: do not panic in Exp when y < 0 and x doesn't have an inverse If x does not have an inverse modulo m, and a negative exponent is used, return nil just like ModInverse does now. Change-Id: I8fa72f7a851e8cf77c5fab529ede88408740626f Reviewed-on: https://go-review.googlesource.com/c/go/+/170757 Run-TryBot: Filippo Valsorda <filippo@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-04-04 23:02:09 +00:00
Than McIntosh	bead358611	math/bits: add gccgo-friendly code for compiler bootstrap When building as part of the bootstrap process, avoid use of "go:linkname" applied to variables, since this feature is ill-defined/unsupported for gccgo. Updates #30771. Change-Id: Id44d01b5c98d292702e5075674117518cb59e2d0 Reviewed-on: https://go-review.googlesource.com/c/go/+/170737 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-04-04 18:17:30 +00:00
Neven Sajko	964fe4b80f	math/big: simplify shlVU_g and shrVU_g Rewrote a few lines to be more idiomatic/less assembly-ish. Benchmarked with `go test -bench Float -tags math_big_pure_go`: name old time/op new time/op delta FloatString/100-8 751ns ± 0% 746ns ± 1% -0.71% (p=0.000 n=10+10) FloatString/1000-8 22.9µs ± 0% 22.9µs ± 0% ~ (p=0.271 n=10+10) FloatString/10000-8 1.89ms ± 0% 1.89ms ± 0% ~ (p=0.481 n=10+10) FloatString/100000-8 184ms ± 0% 184ms ± 0% ~ (p=0.094 n=9+9) FloatAdd/10-8 56.4ns ± 1% 56.5ns ± 0% ~ (p=0.170 n=9+9) FloatAdd/100-8 59.7ns ± 0% 59.3ns ± 0% -0.70% (p=0.000 n=8+9) FloatAdd/1000-8 101ns ± 0% 99ns ± 0% -1.89% (p=0.000 n=8+8) FloatAdd/10000-8 553ns ± 0% 536ns ± 0% -3.00% (p=0.000 n=9+10) FloatAdd/100000-8 4.94µs ± 0% 4.74µs ± 0% -3.94% (p=0.000 n=9+10) FloatSub/10-8 50.3ns ± 0% 50.5ns ± 0% +0.52% (p=0.000 n=8+8) FloatSub/100-8 52.0ns ± 0% 52.2ns ± 1% +0.46% (p=0.012 n=8+10) FloatSub/1000-8 77.9ns ± 0% 77.3ns ± 0% -0.80% (p=0.000 n=7+8) FloatSub/10000-8 371ns ± 0% 362ns ± 0% -2.67% (p=0.000 n=10+10) FloatSub/100000-8 3.20µs ± 0% 3.10µs ± 0% -3.16% (p=0.000 n=10+10) ParseFloatSmallExp-8 7.84µs ± 0% 7.82µs ± 0% -0.17% (p=0.037 n=9+9) ParseFloatLargeExp-8 29.3µs ± 1% 29.5µs ± 0% ~ (p=0.059 n=9+8) FloatSqrt/64-8 516ns ± 0% 519ns ± 0% +0.54% (p=0.000 n=9+9) FloatSqrt/128-8 1.07µs ± 0% 1.07µs ± 0% ~ (p=0.109 n=8+9) FloatSqrt/256-8 1.23µs ± 0% 1.23µs ± 0% +0.50% (p=0.000 n=9+9) FloatSqrt/1000-8 3.43µs ± 0% 3.44µs ± 0% +0.53% (p=0.000 n=9+8) FloatSqrt/10000-8 40.9µs ± 0% 40.7µs ± 0% -0.39% (p=0.000 n=9+8) FloatSqrt/100000-8 1.07ms ± 0% 1.07ms ± 0% -0.10% (p=0.017 n=10+9) FloatSqrt/1000000-8 89.3ms ± 0% 89.2ms ± 0% -0.07% (p=0.015 n=9+8) Change-Id: Ibf07c6142719d11bc7f329246957d87a9f3ba3d2 GitHub-Last-Rev: `870a041ab7` GitHub-Pull-Request: golang/go#31220 Reviewed-on: https://go-review.googlesource.com/c/go/+/170449 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-04-04 00:26:24 +00:00
Robert Griesemer	4dce6dbb1e	math/big: temporarily disable buggy shlVU assembly for arm64 This addresses the failures we have seen in #31084. The correct fix is to find the actual bug in the assembly code. Updates #31084. Change-Id: I437780c53d0c4423d742e2e3b650b899ce845372 Reviewed-on: https://go-review.googlesource.com/c/go/+/169721 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-03-27 23:40:12 +00:00
Brian Kessler	5ee2290420	math/big: implement Rat.SetUint64 Implemented via the underlying Int.SetUint64. Added tests for Rat.SetInt64 and Rat.SetUint64. Fixes #29579 Change-Id: I03faaffc93e36873b202b58ae72b139dea5c40f9 Reviewed-on: https://go-review.googlesource.com/c/go/+/160682 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-03-27 15:20:28 +00:00
Neven Sajko	270de1c110	math: use Sincos instead of Sin and Cos in Jn and Yn Change-Id: I0da3857013f1d4e90820fb043314d78924113a27 GitHub-Last-Rev: `7c3d813c6e` GitHub-Pull-Request: golang/go#31019 Reviewed-on: https://go-review.googlesource.com/c/go/+/169078 Reviewed-by: Robert Griesemer <gri@golang.org>	2019-03-25 22:41:37 +00:00
Robert Griesemer	e4ba40030f	math/big: accept non-decimal floats with Rat.SetString This fixes an old oversight. Rat.SetString already permitted fractions a/b where both a and b could independently specify a base prefix. With this CL, it now also accepts non-decimal floating-point numbers. Fixes #29799. Change-Id: I9cc65666a5cebb00f0202da2e4fc5654a02e3234 Reviewed-on: https://go-review.googlesource.com/c/go/+/168237 Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>	2019-03-25 22:29:26 +00:00
Vladimir Kovpak	9a8979deb0	math/rand: add example for Intn func Change-Id: I831ffb5c3fa2872d71def8d8461f0adbd4ae2c1a GitHub-Last-Rev: `2adfcd2d5a` GitHub-Pull-Request: golang/go#30706 Reviewed-on: https://go-review.googlesource.com/c/go/+/166426 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2019-03-25 16:05:37 +00:00
erifan01	d0cbf9bf53	cmd/compile: follow up intrinsifying math/bits.Add64 for arm64 This CL deals with the additional comments of CL 159017. Change-Id: I4ad3c60c834646d58dc0c544c741b92bfe83fb8b Reviewed-on: https://go-review.googlesource.com/c/go/+/168857 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-03-22 15:09:47 +00:00
erifan01	5714c91b53	cmd/compile: intrinsify math/bits.Add64 for arm64 This CL instrinsifies Add64 with arm64 instruction sequence ADDS, ADCS and ADC, and optimzes the case of carry chains.The CL also changes the test code so that the intrinsic implementation can be tested. Benchmarks: name old time/op new time/op delta Add-224 2.500000ns +- 0% 2.090000ns +- 4% -16.40% (p=0.000 n=9+10) Add32-224 2.500000ns +- 0% 2.500000ns +- 0% ~ (all equal) Add64-224 2.500000ns +- 0% 1.577778ns +- 2% -36.89% (p=0.000 n=10+9) Add64multiple-224 6.000000ns +- 0% 2.000000ns +- 0% -66.67% (p=0.000 n=10+10) Change-Id: I6ee91c9a85c16cc72ade5fd94868c579f16c7615 Reviewed-on: https://go-review.googlesource.com/c/go/+/159017 Run-TryBot: Ben Shi <powerman1st@163.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2019-03-20 05:39:49 +00:00
David Chase	6535791385	math: fix math.Remainder(-x,x) (for Inf > x > 0) Modify the \|x\| == \|y\| case to return -0 when x < 0. Fixes #30814. Change-Id: Ic4cd48001e0e894a12b5b813c6a1ddc3a055610b Reviewed-on: https://go-review.googlesource.com/c/go/+/167479 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-03-15 14:52:51 +00:00
Robert Griesemer	cfa93ba51f	math/big: add support for underscores '_' in numbers The primary change is in nat.scan which now accepts underscores for base 0. While at it, streamlined error handling in that function as well. Also, improved the corresponding test significantly by checking the expected result values also in case of scan errors. The second major change is in scanExponent which now accepts underscores when the new sepOk argument is set. While at it, essentially rewrote that function to match error and underscore handling of nat.scan more closely. Added a new test for scanExponent which until now was only tested indirectly. Finally, updated the documentation for several functions and added many new test cases to clients of nat.scan. A major portion of this CL is due to much better test coverage. Updates #28493. Change-Id: I7f17b361b633fbe6c798619d891bd5e0a045b5c5 Reviewed-on: https://go-review.googlesource.com/c/go/+/166157 Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>	2019-03-12 22:58:58 +00:00
Brian Kessler	ef891e1c83	math/big: implement Int.TrailingZeroBits Implemented via the underlying nat.trailingZeroBits. Fixes #29578 Change-Id: If9876c5a74b107cbabceb7547bef4e44501f6745 Reviewed-on: https://go-review.googlesource.com/c/go/+/160681 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-03-12 13:18:27 +00:00
Josh Bleecher Snyder	4d10aba35e	math/big: add fast path for amd64 addVW for large z This matches the pure Go fast path added in the previous commit. I will leave other architectures to those with ready access to hardware. name old time/op new time/op delta AddVW/1-8 3.60ns ± 3% 3.59ns ± 1% ~ (p=0.147 n=91+86) AddVW/2-8 3.92ns ± 1% 3.91ns ± 2% -0.36% (p=0.000 n=86+92) AddVW/3-8 4.33ns ± 5% 4.46ns ± 5% +2.94% (p=0.000 n=96+97) AddVW/4-8 4.76ns ± 5% 4.82ns ± 5% +1.28% (p=0.000 n=95+92) AddVW/5-8 5.40ns ± 1% 5.42ns ± 0% +0.47% (p=0.000 n=76+71) AddVW/10-8 8.03ns ± 1% 7.80ns ± 5% -2.90% (p=0.000 n=73+96) AddVW/100-8 43.8ns ± 5% 17.9ns ± 1% -59.12% (p=0.000 n=94+81) AddVW/1000-8 428ns ± 4% 85ns ± 6% -80.20% (p=0.000 n=96+99) AddVW/10000-8 4.22µs ± 2% 1.80µs ± 3% -57.32% (p=0.000 n=69+92) AddVW/100000-8 44.8µs ± 8% 31.5µs ± 3% -29.76% (p=0.000 n=99+90) name old time/op new time/op delta SubVW/1-8 3.53ns ± 2% 3.63ns ± 5% +2.97% (p=0.000 n=94+93) SubVW/2-8 4.33ns ± 5% 4.01ns ± 2% -7.36% (p=0.000 n=90+85) SubVW/3-8 4.32ns ± 2% 4.32ns ± 5% ~ (p=0.084 n=87+97) SubVW/4-8 4.70ns ± 2% 4.83ns ± 6% +2.77% (p=0.000 n=85+96) SubVW/5-8 5.84ns ± 1% 5.35ns ± 1% -8.35% (p=0.000 n=87+87) SubVW/10-8 8.01ns ± 4% 7.54ns ± 4% -5.84% (p=0.000 n=98+97) SubVW/100-8 43.9ns ± 5% 17.9ns ± 1% -59.20% (p=0.000 n=98+76) SubVW/1000-8 426ns ± 2% 85ns ± 3% -80.13% (p=0.000 n=90+98) SubVW/10000-8 4.24µs ± 2% 1.81µs ± 3% -57.28% (p=0.000 n=74+91) SubVW/100000-8 44.5µs ± 4% 31.5µs ± 2% -29.33% (p=0.000 n=84+91) Change-Id: I10dd361cbaca22197c27e7734c0f50065292afbb Reviewed-on: https://go-review.googlesource.com/c/go/+/164969 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-03-09 20:34:40 +00:00
Josh Bleecher Snyder	fe24837c4d	math/big: add fast path for pure Go addVW for large z In the normal case, only a few words have to be updated when adding a word to a vector. When that happens, we can simply copy the rest of the words, which is much faster. However, the overhead of that makes it prohibitive for small vectors, so we check the size at the beginning. The implementation is a bit weird to allow addVW to continued to be inlined; see #30548. The AddVW benchmarks are surprising, but fully repeatable. The SubVW benchmarks are more or less as expected. I expect that removing the indirect function call will help both and make them a bit more normal. name old time/op new time/op delta AddVW/1-8 4.27ns ± 2% 3.81ns ± 3% -10.83% (p=0.000 n=89+90) AddVW/2-8 4.91ns ± 2% 4.34ns ± 1% -11.60% (p=0.000 n=83+90) AddVW/3-8 5.77ns ± 4% 5.76ns ± 2% ~ (p=0.365 n=91+87) AddVW/4-8 6.03ns ± 1% 6.03ns ± 1% ~ (p=0.392 n=80+76) AddVW/5-8 6.48ns ± 2% 6.63ns ± 1% +2.27% (p=0.000 n=76+74) AddVW/10-8 9.56ns ± 2% 9.56ns ± 1% -0.02% (p=0.002 n=69+76) AddVW/100-8 90.6ns ± 0% 18.1ns ± 4% -79.99% (p=0.000 n=72+94) AddVW/1000-8 865ns ± 0% 85ns ± 6% -90.14% (p=0.000 n=66+96) AddVW/10000-8 8.57µs ± 2% 1.82µs ± 3% -78.73% (p=0.000 n=99+94) AddVW/100000-8 84.4µs ± 2% 31.8µs ± 4% -62.29% (p=0.000 n=93+98) name old time/op new time/op delta SubVW/1-8 3.90ns ± 2% 4.13ns ± 4% +6.02% (p=0.000 n=92+95) SubVW/2-8 4.15ns ± 1% 5.20ns ± 1% +25.22% (p=0.000 n=83+85) SubVW/3-8 5.50ns ± 2% 6.22ns ± 6% +13.21% (p=0.000 n=91+97) SubVW/4-8 5.99ns ± 1% 6.63ns ± 1% +10.63% (p=0.000 n=79+61) SubVW/5-8 6.75ns ± 4% 6.88ns ± 2% +1.82% (p=0.000 n=98+73) SubVW/10-8 9.57ns ± 1% 9.56ns ± 1% -0.13% (p=0.000 n=77+64) SubVW/100-8 90.3ns ± 1% 18.1ns ± 2% -80.00% (p=0.000 n=75+94) SubVW/1000-8 860ns ± 4% 85ns ± 7% -90.14% (p=0.000 n=97+99) SubVW/10000-8 8.51µs ± 3% 1.77µs ± 6% -79.21% (p=0.000 n=100+97) SubVW/100000-8 84.4µs ± 3% 31.5µs ± 3% -62.66% (p=0.000 n=92+92) Change-Id: I721d7031d40f245b4a284f5bdd93e7bb85e7e937 Reviewed-on: https://go-review.googlesource.com/c/go/+/164968 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-03-09 20:33:46 +00:00
Josh Bleecher Snyder	4c227a091e	math/big: remove bounds checks in pure Go implementations These routines are quite sensitive to BCE. This change eliminates bounds checks from loops. It does so at the cost of a bit of safety: malformed input will now return incorrect answers instead of panicking. This isn't as bad as it sounds: math/big has very good test coverage, and the alternative implementations are in assembly, which could do much worse things with malformed input. If the compiler's BCE improves, so could these routines. Notable BCE improvements for these routines would be: * Allowing and propagating more cross-slice length hints. Then hints like _ = y[:len(z)] would eliminate bounds checks for y[i]. * Propagating enough information so that we could do n := len(x) if len(z) < n { n = len(z) } and then have i < n eliminate the same bounds checks as i < len(x) && i < len(z) currently does. * Providing some way to do BCE for unrolled loops. Now that we have math/bits implementations, it is possible to write things like ADC chains in pure Go, if you can reasonably unroll loops. Benchmarks below are for amd64, using -tags=math_big_pure_go. name old time/op new time/op delta AddVV/1-8 5.15ns ± 3% 4.65ns ± 4% -9.81% (p=0.000 n=93+86) AddVV/2-8 6.40ns ± 2% 5.58ns ± 4% -12.78% (p=0.000 n=90+95) AddVV/3-8 7.07ns ± 2% 6.66ns ± 2% -5.88% (p=0.000 n=87+83) AddVV/4-8 7.94ns ± 5% 7.41ns ± 4% -6.65% (p=0.000 n=94+98) AddVV/5-8 8.55ns ± 1% 8.80ns ± 0% +2.92% (p=0.000 n=87+92) AddVV/10-8 12.7ns ± 1% 12.3ns ± 1% -3.12% (p=0.000 n=83+71) AddVV/100-8 119ns ± 5% 117ns ± 4% -1.60% (p=0.000 n=93+90) AddVV/1000-8 1.14µs ± 4% 1.14µs ± 5% ~ (p=0.812 n=95+91) AddVV/10000-8 11.4µs ± 5% 11.3µs ± 5% ~ (p=0.503 n=97+96) AddVV/100000-8 114µs ± 4% 113µs ± 5% -0.98% (p=0.002 n=97+90) name old time/op new time/op delta SubVV/1-8 5.23ns ± 5% 4.65ns ± 3% -11.18% (p=0.000 n=89+91) SubVV/2-8 6.49ns ± 5% 5.58ns ± 3% -14.04% (p=0.000 n=92+94) SubVV/3-8 7.10ns ± 3% 6.65ns ± 2% -6.28% (p=0.000 n=87+80) SubVV/4-8 8.04ns ± 1% 7.44ns ± 5% -7.49% (p=0.000 n=83+98) SubVV/5-8 8.55ns ± 2% 8.32ns ± 1% -2.75% (p=0.000 n=84+92) SubVV/10-8 12.7ns ± 1% 12.3ns ± 1% -3.09% (p=0.000 n=80+75) SubVV/100-8 119ns ± 0% 116ns ± 3% -1.83% (p=0.000 n=87+98) SubVV/1000-8 1.13µs ± 5% 1.13µs ± 3% ~ (p=0.082 n=96+98) SubVV/10000-8 11.2µs ± 1% 11.3µs ± 3% +0.76% (p=0.000 n=87+97) SubVV/100000-8 112µs ± 2% 113µs ± 3% +0.55% (p=0.000 n=76+88) name old time/op new time/op delta AddVW/1-8 4.30ns ± 4% 3.96ns ± 6% -8.02% (p=0.000 n=89+97) AddVW/2-8 5.15ns ± 2% 4.91ns ± 1% -4.56% (p=0.000 n=87+80) AddVW/3-8 5.59ns ± 3% 5.75ns ± 2% +2.91% (p=0.000 n=91+88) AddVW/4-8 6.20ns ± 1% 6.03ns ± 1% -2.71% (p=0.000 n=75+90) AddVW/5-8 6.93ns ± 3% 6.49ns ± 2% -6.35% (p=0.000 n=100+82) AddVW/10-8 10.0ns ± 7% 9.6ns ± 0% -4.02% (p=0.000 n=98+74) AddVW/100-8 91.1ns ± 1% 90.6ns ± 1% -0.55% (p=0.000 n=84+80) AddVW/1000-8 866ns ± 1% 856ns ± 4% -1.06% (p=0.000 n=69+96) AddVW/10000-8 8.64µs ± 1% 8.53µs ± 4% -1.25% (p=0.000 n=67+99) AddVW/100000-8 84.3µs ± 2% 85.4µs ± 4% +1.22% (p=0.000 n=89+99) name old time/op new time/op delta SubVW/1-8 4.28ns ± 2% 3.82ns ± 3% -10.63% (p=0.000 n=91+89) SubVW/2-8 4.61ns ± 1% 4.48ns ± 3% -2.67% (p=0.000 n=94+96) SubVW/3-8 5.54ns ± 1% 5.81ns ± 4% +4.87% (p=0.000 n=92+97) SubVW/4-8 6.20ns ± 1% 6.08ns ± 2% -1.99% (p=0.000 n=71+88) SubVW/5-8 6.91ns ± 3% 6.64ns ± 1% -3.90% (p=0.000 n=97+70) SubVW/10-8 9.85ns ± 2% 9.62ns ± 0% -2.31% (p=0.000 n=82+62) SubVW/100-8 91.1ns ± 1% 90.9ns ± 3% -0.14% (p=0.010 n=71+93) SubVW/1000-8 859ns ± 3% 867ns ± 1% +0.98% (p=0.000 n=99+78) SubVW/10000-8 8.54µs ± 5% 8.57µs ± 2% +0.38% (p=0.007 n=98+92) SubVW/100000-8 84.5µs ± 3% 84.6µs ± 3% ~ (p=0.334 n=95+94) name old time/op new time/op delta AddMulVVW/1-8 5.43ns ± 3% 4.36ns ± 2% -19.67% (p=0.000 n=95+94) AddMulVVW/2-8 6.56ns ± 4% 6.11ns ± 1% -6.90% (p=0.000 n=91+91) AddMulVVW/3-8 8.00ns ± 1% 7.80ns ± 4% -2.52% (p=0.000 n=83+95) AddMulVVW/4-8 9.81ns ± 2% 9.53ns ± 1% -2.86% (p=0.000 n=77+64) AddMulVVW/5-8 11.4ns ± 3% 11.3ns ± 5% -0.89% (p=0.000 n=95+97) AddMulVVW/10-8 18.9ns ± 5% 19.1ns ± 5% +0.89% (p=0.000 n=91+94) AddMulVVW/100-8 165ns ± 5% 165ns ± 4% ~ (p=0.427 n=97+98) AddMulVVW/1000-8 1.56µs ± 3% 1.56µs ± 4% ~ (p=0.167 n=98+96) AddMulVVW/10000-8 15.7µs ± 5% 15.6µs ± 5% -0.31% (p=0.044 n=95+97) AddMulVVW/100000-8 156µs ± 3% 157µs ± 8% ~ (p=0.373 n=72+99) Change-Id: Ibc720785d5b95f6a797103b1363843205f4d56bf Reviewed-on: https://go-review.googlesource.com/c/go/+/164966 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-03-09 20:33:13 +00:00
Robert Griesemer	129c6e4496	math/big: support new octal prefixes 0o and 0O This CL extends the various SetString and Parse methods for Ints, Rats, and Floats to accept the new octal prefixes. The main change is in natconv.go, all other changes are documentation and test updates. Finally, this CL also fixes TestRatSetString which silently dropped certain failures. Updates #12711. Change-Id: I5ee5879e25013ba1e6eda93ff280915f25ab5d55 Reviewed-on: https://go-review.googlesource.com/c/go/+/165898 Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>	2019-03-07 21:13:57 +00:00
Josh Bleecher Snyder	d5edbcac98	math/big: rewrite pure Go implementations to use math/bits While we're here, delete addWW_g and subWW_g, per the TODO. They are now obsolete. Benchmarks on amd64 with -tags=math_big_pure_go. name old time/op new time/op delta AddVV/1-8 5.24ns ± 2% 5.12ns ± 1% -2.11% (p=0.000 n=82+87) AddVV/2-8 6.44ns ± 1% 6.33ns ± 2% -1.82% (p=0.000 n=77+82) AddVV/3-8 7.89ns ± 8% 6.97ns ± 4% -11.71% (p=0.000 n=100+96) AddVV/4-8 8.60ns ± 0% 7.72ns ± 4% -10.24% (p=0.000 n=90+96) AddVV/5-8 10.3ns ± 4% 8.5ns ± 1% -17.02% (p=0.000 n=96+91) AddVV/10-8 16.2ns ± 5% 12.8ns ± 1% -21.11% (p=0.000 n=97+86) AddVV/100-8 148ns ± 1% 117ns ± 5% -21.07% (p=0.000 n=66+98) AddVV/1000-8 1.41µs ± 4% 1.13µs ± 3% -19.90% (p=0.000 n=97+97) AddVV/10000-8 14.2µs ± 5% 11.2µs ± 1% -20.82% (p=0.000 n=99+84) AddVV/100000-8 142µs ± 4% 113µs ± 4% -20.40% (p=0.000 n=91+92) SubVV/1-8 5.29ns ± 1% 5.11ns ± 0% -3.30% (p=0.000 n=87+88) SubVV/2-8 6.36ns ± 4% 6.33ns ± 2% -0.56% (p=0.002 n=98+73) SubVV/3-8 7.58ns ± 5% 6.98ns ± 4% -8.01% (p=0.000 n=97+91) SubVV/4-8 8.61ns ± 3% 7.98ns ± 2% -7.31% (p=0.000 n=95+83) SubVV/5-8 10.6ns ± 2% 8.5ns ± 1% -19.56% (p=0.000 n=79+89) SubVV/10-8 16.3ns ± 4% 12.7ns ± 1% -21.97% (p=0.000 n=98+82) SubVV/100-8 124ns ± 1% 118ns ± 1% -4.83% (p=0.000 n=85+81) SubVV/1000-8 1.14µs ± 5% 1.12µs ± 2% -1.17% (p=0.000 n=97+81) SubVV/10000-8 11.6µs ±10% 11.2µs ± 1% -3.39% (p=0.000 n=100+84) SubVV/100000-8 114µs ± 6% 114µs ± 5% ~ (p=0.396 n=83+94) AddVW/1-8 4.04ns ± 4% 4.34ns ± 4% +7.57% (p=0.000 n=96+98) AddVW/2-8 4.34ns ± 5% 4.40ns ± 5% +1.40% (p=0.000 n=99+98) AddVW/3-8 5.43ns ± 0% 5.54ns ± 2% +1.97% (p=0.000 n=85+94) AddVW/4-8 6.23ns ± 1% 6.18ns ± 2% -0.66% (p=0.000 n=77+78) AddVW/5-8 6.78ns ± 2% 6.90ns ± 4% +1.77% (p=0.000 n=80+99) AddVW/10-8 10.5ns ± 4% 9.9ns ± 1% -5.77% (p=0.000 n=97+69) AddVW/100-8 114ns ± 3% 91ns ± 0% -20.38% (p=0.000 n=98+77) AddVW/1000-8 1.12µs ± 1% 0.87µs ± 1% -22.80% (p=0.000 n=82+68) AddVW/10000-8 11.2µs ± 2% 8.5µs ± 5% -23.85% (p=0.000 n=85+100) AddVW/100000-8 112µs ± 2% 85µs ± 5% -24.22% (p=0.000 n=71+96) SubVW/1-8 4.09ns ± 2% 4.18ns ± 4% +2.32% (p=0.000 n=78+96) SubVW/2-8 4.59ns ± 5% 4.52ns ± 7% -1.54% (p=0.000 n=98+94) SubVW/3-8 5.41ns ±10% 5.55ns ± 1% +2.48% (p=0.000 n=100+89) SubVW/4-8 6.51ns ± 2% 6.19ns ± 0% -4.85% (p=0.000 n=97+81) SubVW/5-8 7.25ns ± 3% 6.90ns ± 4% -4.93% (p=0.000 n=97+96) SubVW/10-8 10.6ns ± 4% 9.8ns ± 2% -7.32% (p=0.000 n=95+96) SubVW/100-8 90.4ns ± 0% 90.8ns ± 0% +0.43% (p=0.000 n=83+78) SubVW/1000-8 853ns ± 4% 857ns ± 2% +0.42% (p=0.000 n=100+98) SubVW/10000-8 8.52µs ± 4% 8.53µs ± 2% ~ (p=0.061 n=99+97) SubVW/100000-8 84.8µs ± 5% 84.2µs ± 2% -0.78% (p=0.000 n=99+93) AddMulVVW/1-8 8.73ns ± 0% 5.33ns ± 3% -38.91% (p=0.000 n=91+96) AddMulVVW/2-8 14.8ns ± 3% 6.5ns ± 2% -56.33% (p=0.000 n=100+79) AddMulVVW/3-8 18.6ns ± 2% 7.8ns ± 5% -57.84% (p=0.000 n=89+96) AddMulVVW/4-8 24.0ns ± 2% 9.8ns ± 0% -59.09% (p=0.000 n=95+67) AddMulVVW/5-8 29.0ns ± 2% 11.5ns ± 5% -60.44% (p=0.000 n=90+97) AddMulVVW/10-8 54.1ns ± 0% 18.8ns ± 1% -65.37% (p=0.000 n=82+84) AddMulVVW/100-8 508ns ± 2% 165ns ± 4% -67.62% (p=0.000 n=72+98) AddMulVVW/1000-8 4.96µs ± 3% 1.55µs ± 1% -68.86% (p=0.000 n=99+91) AddMulVVW/10000-8 50.0µs ± 4% 15.5µs ± 4% -68.95% (p=0.000 n=97+97) AddMulVVW/100000-8 491µs ± 1% 156µs ± 8% -68.22% (p=0.000 n=79+95) Change-Id: I4c6ae0b4065f371aea8103f6a85d9e9274bf01d0 Reviewed-on: https://go-review.googlesource.com/c/go/+/164965 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-03-04 20:49:12 +00:00
Josh Bleecher Snyder	87cc56718a	math/big: optimize shlVU_g and shrVU_g Special case shifts by zero. Provide hints to the compiler that shifts are bounded. There are no existing benchmarks for shifts, but the Float implementation uses shifts, so we can use those. Benchmarks on amd64 with -tags=math_big_pure_go. name old time/op new time/op delta FloatString/100-8 869ns ± 3% 872ns ± 4% +0.40% (p=0.001 n=94+83) FloatString/1000-8 26.5µs ± 1% 26.4µs ± 1% -0.46% (p=0.000 n=87+96) FloatString/10000-8 2.18ms ± 2% 2.18ms ± 2% ~ (p=0.687 n=90+89) FloatString/100000-8 200ms ± 7% 197ms ± 5% -1.47% (p=0.000 n=100+90) FloatAdd/10-8 65.9ns ± 4% 64.0ns ± 4% -2.94% (p=0.000 n=92+93) FloatAdd/100-8 71.3ns ± 4% 67.4ns ± 4% -5.51% (p=0.000 n=96+93) FloatAdd/1000-8 128ns ± 1% 121ns ± 0% -5.69% (p=0.000 n=91+80) FloatAdd/10000-8 718ns ± 4% 626ns ± 4% -12.83% (p=0.000 n=99+99) FloatAdd/100000-8 6.43µs ± 3% 5.50µs ± 1% -14.50% (p=0.000 n=98+83) FloatSub/10-8 57.7ns ± 2% 57.0ns ± 4% -1.20% (p=0.000 n=89+96) FloatSub/100-8 59.9ns ± 3% 58.7ns ± 4% -2.10% (p=0.000 n=100+98) FloatSub/1000-8 94.5ns ± 1% 88.6ns ± 0% -6.16% (p=0.000 n=74+70) FloatSub/10000-8 456ns ± 1% 416ns ± 5% -8.83% (p=0.000 n=87+95) FloatSub/100000-8 4.00µs ± 1% 3.57µs ± 1% -10.87% (p=0.000 n=68+85) FloatSqrt/64-8 585ns ± 1% 579ns ± 1% -0.99% (p=0.000 n=92+90) FloatSqrt/128-8 1.26µs ± 1% 1.23µs ± 2% -2.42% (p=0.000 n=91+81) FloatSqrt/256-8 1.45µs ± 3% 1.40µs ± 1% -3.61% (p=0.000 n=96+90) FloatSqrt/1000-8 4.03µs ± 1% 3.91µs ± 1% -3.05% (p=0.000 n=90+93) FloatSqrt/10000-8 48.0µs ± 0% 47.3µs ± 1% -1.55% (p=0.000 n=90+90) FloatSqrt/100000-8 1.23ms ± 3% 1.22ms ± 4% -1.00% (p=0.000 n=99+99) FloatSqrt/1000000-8 96.7ms ± 4% 98.0ms ±10% ~ (p=0.322 n=89+99) Change-Id: I0f941c05b7c324256d7f0674559b6ba906e92ba8 Reviewed-on: https://go-review.googlesource.com/c/go/+/164967 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-03-04 19:30:57 +00:00
Juraj Sukop	1d992f2e36	math/big: better initial guess for nat.sqrt The proposed change introduces a better initial guess which is closer to the final value and therefore converges in fewer steps. Consider for example sqrt(8): previously the guess was 8, whereas now it is 4 (and the result is 2). All this change does is it computes the division by two more accurately while it keeps the guess ≥ √x. Change-Id: I917248d734a7b0488d14a647a063f674e56c4e30 GitHub-Last-Rev: `c06d9d4876` GitHub-Pull-Request: golang/go#28981 Reviewed-on: https://go-review.googlesource.com/c/163866 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-02-27 18:48:56 +00:00
Bryan C. Mills	bd98628676	math/cmplx: avoid panic in Pow(x, NaN()) Fixes #30088 Change-Id: I08cec17feddc86bd08532e6b135807e3c8f4c1b2 Reviewed-on: https://go-review.googlesource.com/c/161197 Run-TryBot: Bryan C. Mills <bcmills@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-02-27 14:01:03 +00:00
Brian Kessler	a73abca37b	math/big: handle alias of cofactor inputs in GCD If the variables passed in to the cofactor arguments of GCD (x, y) aliased the input arguments (a, b), the previous implementation would result in incorrect results for y. This change reorganizes the calculation so that the only case that need to be handled is when y aliases b, which can be handled with a simple check. Tests were added for all of the alias cases for input arguments and and and irrelevant test case for a previous binary GCD calculation was dropped. Fixes #30217 Change-Id: Ibe6137f09b3e1ae3c29e3c97aba85b67f33dc169 Reviewed-on: https://go-review.googlesource.com/c/162517 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2019-02-27 00:11:17 +00:00
Russ Cox	d6311ff1e4	math/big: add %#b and %O integer formats Matching fmt, %#b now prints an 0b prefix, and %O prints octal with an 0o prefix. See golang.org/design/19308-number-literals for background. For #19308. For #12711. Change-Id: I139c5a9a1dfae15415621601edfa13c6a5f19cfc Reviewed-on: https://go-review.googlesource.com/c/160250 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-02-26 19:39:19 +00:00
Russ Cox	675503c507	math/big: add %x float format big.Float already had %p for printing hex format, but that format normalizes differently from fmt's %x and ignores precision entirely. This CL adds %x to big.Float, matching fmt's behavior: the verb is spelled 'x' not 'p', the mantissa is normalized to [1, 2), and precision is respected. See golang.org/design/19308-number-literals for background. For #29008. Change-Id: I9c1b9612107094856797e5b0b584c556c1914895 Reviewed-on: https://go-review.googlesource.com/c/160249 Reviewed-by: Robert Griesemer <gri@golang.org>	2019-02-26 19:39:11 +00:00
Michael Munday	42a82ce1a7	math/bits: optimize Reverse32 and Reverse64 Use ReverseBytes32 and ReverseBytes64 to speed up these functions. The byte reversal functions are intrinsics on most platforms and generally compile to a single instruction. name old time/op new time/op delta Reverse32 2.41ns ± 1% 1.94ns ± 3% -19.60% (p=0.000 n=20+19) Reverse64 3.85ns ± 1% 2.56ns ± 1% -33.32% (p=0.000 n=17+19) Change-Id: I160bf59a0c7bd5db94114803ec5a59fae448f096 Reviewed-on: https://go-review.googlesource.com/c/159358 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-02-26 17:52:08 +00:00
Robert Griesemer	fae44a2be3	src, misc: apply gofmt This applies the new gofmt literal normalizations to the library. Change-Id: I8c1e8ef62eb556fc568872c9f77a31ef236348e7 Reviewed-on: https://go-review.googlesource.com/c/162539 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2019-02-19 20:38:28 +00:00
Russ Cox	e2d87f2ca5	strconv: format hex floats This CL updates FormatFloat to format standard hexadecimal floating-point constants, using the 'x' and 'X' verbs. See golang.org/design/19308-number-literals for background. For #29008. Change-Id: I540b8f71d492cfdb7c58af533d357a564591f28b Reviewed-on: https://go-review.googlesource.com/c/160242 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-02-12 14:48:22 +00:00
Robert Griesemer	7bc2aa670f	math/big: permit upper-case 'P' binary exponent (not just 'p') The current implementation accepted binary exponents but restricted them to 'p'. This change permits both 'p' and 'P'. R=Go1.13 Updates #29008. Change-Id: I7a89ccb86af4438f17b0422be7cb630ffcf43272 Reviewed-on: https://go-review.googlesource.com/c/159297 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>	2019-02-11 23:22:35 +00:00
Robert Griesemer	33caf3be83	math/big: document that Rat.SetString accepts _decimal_ float representations Updates #29799. Change-Id: I267c2c3ba3964e96903954affc248d0c52c4916c Reviewed-on: https://go-review.googlesource.com/c/158397 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2019-01-17 23:04:06 +00:00
Brian Kessler	649294d0a5	math: fix ternary correction statement in Log1p The original port of Log1p incorrectly translated a ternary statement so that a correction was only applied to one of the branches. Fixes #29488 Change-Id: I035b2fc741f76fe7c0154c63da6e298b575e08a4 Reviewed-on: https://go-review.googlesource.com/c/156120 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Katie Hockman <katie@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2019-01-07 18:57:45 +00:00
Will Beason	bfaf11c158	math/big: fix incorrect comment variable reference Fix comment as w&1 is the parity of 'x', not of 'n'. Change-Id: Ia0e448f7e5896412ff9b164459ce15561ab624cc GitHub-Last-Rev: `54ba08ab10` GitHub-Pull-Request: golang/go#29419 Reviewed-on: https://go-review.googlesource.com/c/155743 Reviewed-by: Robert Griesemer <gri@golang.org>	2018-12-26 05:21:41 +00:00
Robert Griesemer	9ce38f570f	math: don't run huge argument tests on s390x The s390x implementations for Sin/Cos/SinCos/Tan use assembly routines which don't reduce arguments accurately enough for huge inputs. Fixes #29221. Change-Id: I340f576899d67bb52a553c3ab22e6464172c936d Reviewed-on: https://go-review.googlesource.com/c/154119 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-12-13 22:13:57 +00:00
Brian Kessler	02ad841dd8	math: correct mPi4 comment The previous comment mis-stated the number of bits in mPi4. The correct value is 19*64 + 1 == 1217 bits. Change-Id: Ife971ff6936ce2d5b81ce663ce48044749d592a0 Reviewed-on: https://go-review.googlesource.com/c/154017 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-12-13 20:35:45 +00:00
Robert Griesemer	944a9c7a4f	math: use constant rather than variable for exported test threshold This is a minor follow-up on https://golang.org/cl/153059. TBR=iant Updates #6794. Change-Id: I03657dafc572959d46a03f86bbeb280825bc969d Reviewed-on: https://go-review.googlesource.com/c/153845 Reviewed-by: Robert Griesemer <gri@golang.org>	2018-12-13 06:33:18 +00:00
Brian Kessler	98521a5a8f	math: implement trignometric range reduction for huge arguments This change implements Payne-Hanek range reduction by Pi/4 to properly calculate trigonometric functions of huge arguments. The implementation is based on: "ARGUMENT REDUCTION FOR HUGE ARGUMENTS: Good to the Last Bit" K. C. Ng et al, March 24, 1992 The major difference with the reference is that the simulated multi-precision calculation of x*B is implemented using 64-bit integer arithmetic rather than floating point to ease extraction of the relevant bits of 4/Pi. The assembly implementations for 386 were removed since the trigonometric instructions only use a 66-bit representation of Pi internally for reduction. It is not possible to use these instructions and maintain accuracy without a prior accurate reduction in software as recommended by Intel. Fixes #6794 Change-Id: I31bf1369e0578891d738c5473447fe9b10560196 Reviewed-on: https://go-review.googlesource.com/c/153059 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-12-13 06:01:42 +00:00
Alberto Donizetti	11ce6eabd6	math/bits: remove named return in TrailingZeros16 TrailingZeros16 is the only one of the TrailingZeros functions with a named return value in the signature. This creates a sligthly unpleasant effect in the godoc listing: func TrailingZeros(x uint) int func TrailingZeros16(x uint16) (n int) func TrailingZeros32(x uint32) int func TrailingZeros64(x uint64) int func TrailingZeros8(x uint8) int Since the named return value is not even used, remove it. Change-Id: I15c5aedb6157003911b6e0685c357ce56e466c0e Reviewed-on: https://go-review.googlesource.com/c/153340 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-12-09 14:27:56 +00:00
Robert Griesemer	276870d6e0	math: document sign bit correspondence for floating-point/bits conversions Fixes #27736. Change-Id: Ibda7da7ec6e731626fc43abf3e8c1190117f7885 Reviewed-on: https://go-review.googlesource.com/c/153057 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2018-12-06 22:27:54 +00:00
Josh Bleecher Snyder	bfc54bb6f3	math/big: allocate less for single-Word nats For many uses of math/big, most numbers are small in practice. Prior to this change, big.NewInt allocated a minimum of five Words: one to hold the value, and four as extra capacity. In most cases, this extra capacity is waste. Worse, allocating a single Word uses a fast malloc path for tiny allocs; allocating five Words is more expensive in CPU as well as memory. This change is a simple fix: Treat a request for one Word at its word. I experimented with more complicated fixes and did not find anything that outperformed this easy fix. On some real world programs, this is a clear win. The compiler: name old alloc/op new alloc/op delta Template 37.1MB ± 0% 37.0MB ± 0% -0.23% (p=0.008 n=5+5) Unicode 29.2MB ± 0% 28.5MB ± 0% -2.48% (p=0.008 n=5+5) GoTypes 133MB ± 0% 133MB ± 0% -0.05% (p=0.008 n=5+5) Compiler 628MB ± 0% 628MB ± 0% -0.06% (p=0.008 n=5+5) SSA 2.04GB ± 0% 2.03GB ± 0% -0.14% (p=0.008 n=5+5) Flate 24.7MB ± 0% 24.6MB ± 0% -0.23% (p=0.008 n=5+5) GoParser 29.6MB ± 0% 29.6MB ± 0% -0.07% (p=0.008 n=5+5) Reflect 82.3MB ± 0% 82.2MB ± 0% -0.05% (p=0.008 n=5+5) Tar 36.2MB ± 0% 36.2MB ± 0% -0.12% (p=0.008 n=5+5) XML 49.5MB ± 0% 49.4MB ± 0% -0.23% (p=0.008 n=5+5) [Geo mean] 85.1MB 84.8MB -0.37% name old allocs/op new allocs/op delta Template 364k ± 0% 364k ± 0% ~ (p=0.476 n=5+5) Unicode 341k ± 0% 341k ± 0% ~ (p=0.690 n=5+5) GoTypes 1.37M ± 0% 1.37M ± 0% ~ (p=0.444 n=5+5) Compiler 5.50M ± 0% 5.50M ± 0% +0.02% (p=0.008 n=5+5) SSA 16.0M ± 0% 16.0M ± 0% +0.01% (p=0.008 n=5+5) Flate 238k ± 0% 238k ± 0% ~ (p=0.222 n=5+5) GoParser 305k ± 0% 305k ± 0% ~ (p=0.841 n=5+5) Reflect 976k ± 0% 976k ± 0% ~ (p=0.222 n=5+5) Tar 354k ± 0% 354k ± 0% ~ (p=0.103 n=5+5) XML 450k ± 0% 450k ± 0% ~ (p=0.151 n=5+5) [Geo mean] 837k 837k +0.01% go.skylark.net (at ea6d2813de75ded8d157b9540bc3d3ad0b688623): name old alloc/op new alloc/op delta Hashtable-8 456kB ± 0% 299kB ± 0% -34.33% (p=0.000 n=9+9) /bench_builtin_method-8 220kB ± 0% 190kB ± 0% -13.55% (p=0.000 n=9+10) name old allocs/op new allocs/op delta Hashtable-8 7.84k ± 0% 7.84k ± 0% ~ (all equal) /bench_builtin_method-8 7.49k ± 0% 7.49k ± 0% ~ (all equal) The math/big benchmarks are messy, which is predictable, since they naturally exercise the bigger-than-one-word code more. Also worth noting is that many of the benchmarks have very high variance. I've omitted the opVV and opVW benchmarks, as they are unrelated. name old time/op new time/op delta DecimalConversion-8 92.5µs ± 1% 90.6µs ± 0% -2.12% (p=0.000 n=17+19) FloatString/100-8 867ns ± 0% 871ns ± 0% +0.50% (p=0.000 n=18+18) FloatString/1000-8 26.4µs ± 1% 26.5µs ± 1% ~ (p=0.396 n=20+19) FloatString/10000-8 2.15ms ± 2% 2.16ms ± 2% ~ (p=0.089 n=19+20) FloatString/100000-8 209ms ± 1% 209ms ± 1% ~ (p=0.583 n=19+19) FloatAdd/10-8 63.5ns ± 2% 64.1ns ± 6% ~ (p=0.389 n=19+19) FloatAdd/100-8 66.0ns ± 2% 65.8ns ± 2% ~ (p=0.825 n=20+20) FloatAdd/1000-8 93.9ns ± 1% 94.3ns ± 1% ~ (p=0.273 n=19+20) FloatAdd/10000-8 347ns ± 2% 342ns ± 1% -1.50% (p=0.000 n=18+18) FloatAdd/100000-8 2.78µs ± 1% 2.78µs ± 2% ~ (p=0.961 n=20+19) FloatSub/10-8 56.9ns ± 2% 57.8ns ± 3% +1.59% (p=0.001 n=19+19) FloatSub/100-8 58.2ns ± 2% 58.9ns ± 2% +1.25% (p=0.004 n=20+20) FloatSub/1000-8 74.9ns ± 1% 74.4ns ± 1% -0.76% (p=0.000 n=19+20) FloatSub/10000-8 223ns ± 1% 220ns ± 2% -1.29% (p=0.000 n=16+20) FloatSub/100000-8 1.66µs ± 1% 1.66µs ± 2% ~ (p=0.147 n=20+20) ParseFloatSmallExp-8 8.38µs ± 0% 8.59µs ± 0% +2.48% (p=0.000 n=19+19) ParseFloatLargeExp-8 31.1µs ± 0% 32.0µs ± 0% +3.04% (p=0.000 n=16+17) GCD10x10/WithoutXY-8 115ns ± 1% 99ns ± 3% -14.07% (p=0.000 n=20+20) GCD10x10/WithXY-8 322ns ± 0% 312ns ± 0% -3.11% (p=0.000 n=18+13) GCD10x100/WithoutXY-8 233ns ± 1% 219ns ± 1% -5.73% (p=0.000 n=19+17) GCD10x100/WithXY-8 709ns ± 0% 759ns ± 0% +7.04% (p=0.000 n=19+19) GCD10x1000/WithoutXY-8 653ns ± 1% 642ns ± 1% -1.69% (p=0.000 n=17+20) GCD10x1000/WithXY-8 1.35µs ± 0% 1.35µs ± 1% ~ (p=0.255 n=20+16) GCD10x10000/WithoutXY-8 4.57µs ± 1% 4.61µs ± 1% +0.95% (p=0.000 n=18+17) GCD10x10000/WithXY-8 6.82µs ± 0% 6.84µs ± 0% +0.27% (p=0.000 n=16+17) GCD10x100000/WithoutXY-8 43.9µs ± 1% 44.0µs ± 1% +0.28% (p=0.000 n=18+17) GCD10x100000/WithXY-8 60.6µs ± 0% 60.6µs ± 0% ~ (p=0.907 n=18+18) GCD100x100/WithoutXY-8 1.13µs ± 0% 1.21µs ± 0% +6.39% (p=0.000 n=19+19) GCD100x100/WithXY-8 1.82µs ± 0% 1.92µs ± 0% +5.24% (p=0.000 n=19+17) GCD100x1000/WithoutXY-8 2.00µs ± 0% 2.03µs ± 1% +1.61% (p=0.000 n=18+16) GCD100x1000/WithXY-8 3.22µs ± 0% 3.20µs ± 1% -0.83% (p=0.000 n=19+19) GCD100x10000/WithoutXY-8 9.28µs ± 1% 9.17µs ± 1% -1.25% (p=0.000 n=18+19) GCD100x10000/WithXY-8 13.5µs ± 0% 13.3µs ± 0% -1.12% (p=0.000 n=18+19) GCD100x100000/WithoutXY-8 80.4µs ± 0% 78.6µs ± 0% -2.25% (p=0.000 n=19+19) GCD100x100000/WithXY-8 114µs ± 0% 112µs ± 0% -1.46% (p=0.000 n=19+17) GCD1000x1000/WithoutXY-8 12.9µs ± 1% 12.9µs ± 2% -0.50% (p=0.014 n=20+19) GCD1000x1000/WithXY-8 19.6µs ± 1% 19.6µs ± 2% -0.28% (p=0.040 n=17+18) GCD1000x10000/WithoutXY-8 22.4µs ± 0% 22.4µs ± 2% ~ (p=0.220 n=19+19) GCD1000x10000/WithXY-8 57.0µs ± 0% 56.5µs ± 0% -0.87% (p=0.000 n=20+20) GCD1000x100000/WithoutXY-8 116µs ± 0% 115µs ± 0% -0.49% (p=0.000 n=18+19) GCD1000x100000/WithXY-8 410µs ± 0% 411µs ± 0% ~ (p=0.052 n=19+19) GCD10000x10000/WithoutXY-8 247µs ± 1% 244µs ± 1% -0.92% (p=0.000 n=19+19) GCD10000x10000/WithXY-8 476µs ± 1% 473µs ± 1% -0.48% (p=0.009 n=19+19) GCD10000x100000/WithoutXY-8 573µs ± 1% 571µs ± 1% -0.45% (p=0.012 n=20+20) GCD10000x100000/WithXY-8 3.35ms ± 1% 3.35ms ± 1% ~ (p=0.444 n=20+19) GCD100000x100000/WithoutXY-8 12.0ms ± 2% 11.9ms ± 2% ~ (p=0.276 n=18+20) GCD100000x100000/WithXY-8 27.3ms ± 1% 27.3ms ± 1% ~ (p=0.792 n=20+19) Hilbert-8 672µs ± 0% 611µs ± 0% -9.02% (p=0.000 n=19+19) Binomial-8 1.40µs ± 0% 1.18µs ± 0% -15.69% (p=0.000 n=16+14) QuoRem-8 2.20µs ± 1% 2.17µs ± 1% -1.13% (p=0.000 n=19+19) Exp-8 4.10ms ± 1% 4.11ms ± 1% ~ (p=0.296 n=20+19) Exp2-8 4.11ms ± 1% 4.12ms ± 1% ~ (p=0.429 n=20+20) Bitset-8 8.67ns ± 6% 8.74ns ± 4% ~ (p=0.139 n=19+17) BitsetNeg-8 43.6ns ± 1% 43.8ns ± 2% +0.61% (p=0.036 n=20+20) BitsetOrig-8 77.5ns ± 1% 68.4ns ± 1% -11.77% (p=0.000 n=19+20) BitsetNegOrig-8 145ns ± 1% 141ns ± 1% -2.87% (p=0.000 n=19+20) ModSqrt225_Tonelli-8 324µs ± 1% 324µs ± 1% ~ (p=0.409 n=18+20) ModSqrt225_3Mod4-8 98.9µs ± 1% 99.1µs ± 1% ~ (p=0.298 n=19+18) ModSqrt231_Tonelli-8 337µs ± 1% 337µs ± 1% ~ (p=0.718 n=20+18) ModSqrt231_5Mod8-8 115µs ± 1% 114µs ± 1% -0.22% (p=0.050 n=20+20) ModInverse-8 895ns ± 0% 869ns ± 1% -2.83% (p=0.000 n=17+17) Sqrt-8 28.1µs ± 1% 28.1µs ± 0% -0.28% (p=0.000 n=16+20) IntSqr/1-8 10.8ns ± 3% 10.5ns ± 3% -2.51% (p=0.000 n=19+17) IntSqr/2-8 30.5ns ± 2% 30.3ns ± 4% -0.71% (p=0.035 n=18+18) IntSqr/3-8 40.1ns ± 1% 40.1ns ± 1% ~ (p=0.710 n=20+17) IntSqr/5-8 65.3ns ± 1% 65.4ns ± 2% ~ (p=0.744 n=19+19) IntSqr/8-8 101ns ± 1% 102ns ± 0% ~ (p=0.234 n=19+20) IntSqr/10-8 138ns ± 0% 138ns ± 2% ~ (p=0.827 n=18+18) IntSqr/20-8 378ns ± 1% 378ns ± 1% ~ (p=0.479 n=18+18) IntSqr/30-8 637ns ± 0% 638ns ± 1% ~ (p=0.051 n=18+20) IntSqr/50-8 1.34µs ± 2% 1.34µs ± 1% ~ (p=0.970 n=18+19) IntSqr/80-8 2.78µs ± 0% 2.78µs ± 1% -0.18% (p=0.006 n=19+17) IntSqr/100-8 3.98µs ± 0% 3.98µs ± 0% ~ (p=0.057 n=17+19) IntSqr/200-8 13.5µs ± 0% 13.5µs ± 1% -0.33% (p=0.000 n=19+17) IntSqr/300-8 25.3µs ± 1% 25.3µs ± 1% ~ (p=0.361 n=19+20) IntSqr/500-8 62.9µs ± 0% 62.9µs ± 1% ~ (p=0.899 n=17+17) IntSqr/800-8 128µs ± 1% 127µs ± 1% -0.32% (p=0.016 n=18+20) IntSqr/1000-8 192µs ± 0% 192µs ± 1% ~ (p=0.916 n=17+18) Div/20/10-8 34.9ns ± 2% 35.6ns ± 1% +2.01% (p=0.000 n=20+20) Div/200/100-8 218ns ± 1% 215ns ± 2% -1.43% (p=0.000 n=18+18) Div/2000/1000-8 1.16µs ± 1% 1.15µs ± 1% -1.04% (p=0.000 n=19+20) Div/20000/10000-8 35.7µs ± 1% 35.4µs ± 1% -0.69% (p=0.000 n=19+18) Div/200000/100000-8 2.89ms ± 1% 2.88ms ± 1% -0.62% (p=0.007 n=19+20) Mul-8 9.28ms ± 1% 9.27ms ± 1% ~ (p=0.563 n=18+18) ZeroShifts/Shl-8 712ns ± 6% 716ns ± 7% ~ (p=0.597 n=20+20) ZeroShifts/ShlSame-8 4.00ns ± 1% 4.06ns ± 5% ~ (p=0.162 n=18+20) ZeroShifts/Shr-8 714ns ±10% 1285ns ±156% ~ (p=0.250 n=20+20) ZeroShifts/ShrSame-8 4.00ns ± 1% 4.09ns ±10% +2.34% (p=0.048 n=16+19) Exp3Power/0x10-8 154ns ± 0% 159ns ±13% ~ (p=0.197 n=14+20) Exp3Power/0x40-8 171ns ± 1% 175ns ± 8% ~ (p=0.058 n=16+19) Exp3Power/0x100-8 287ns ± 0% 316ns ± 4% +10.03% (p=0.000 n=17+19) Exp3Power/0x400-8 698ns ± 1% 801ns ± 6% +14.75% (p=0.000 n=19+20) Exp3Power/0x1000-8 2.87µs ± 0% 3.65µs ± 6% +27.24% (p=0.000 n=18+18) Exp3Power/0x4000-8 21.9µs ± 1% 28.7µs ± 8% +31.09% (p=0.000 n=18+20) Exp3Power/0x10000-8 204µs ± 0% 267µs ± 9% +30.81% (p=0.000 n=20+20) Exp3Power/0x40000-8 1.86ms ± 0% 2.26ms ± 5% +21.68% (p=0.000 n=18+19) Exp3Power/0x100000-8 17.5ms ± 1% 20.7ms ± 7% +18.39% (p=0.000 n=19+20) Exp3Power/0x400000-8 156ms ± 0% 172ms ± 6% +10.54% (p=0.000 n=19+20) Fibo-8 26.9ms ± 1% 27.5ms ± 3% +2.32% (p=0.000 n=19+19) NatSqr/1-8 31.0ns ± 4% 39.5ns ±29% +27.25% (p=0.000 n=20+19) NatSqr/2-8 54.1ns ± 1% 69.0ns ±28% +27.52% (p=0.000 n=20+20) NatSqr/3-8 66.6ns ± 1% 83.0ns ±25% +24.59% (p=0.000 n=20+20) NatSqr/5-8 97.1ns ± 1% 119.9ns ±12% +23.50% (p=0.000 n=16+20) NatSqr/8-8 138ns ± 1% 171ns ± 9% +24.20% (p=0.000 n=19+20) NatSqr/10-8 182ns ± 0% 225ns ± 9% +23.50% (p=0.000 n=16+20) NatSqr/20-8 447ns ± 1% 624ns ± 6% +39.64% (p=0.000 n=19+19) NatSqr/30-8 736ns ± 2% 986ns ± 9% +33.94% (p=0.000 n=19+20) NatSqr/50-8 1.51µs ± 2% 1.97µs ± 9% +30.42% (p=0.000 n=20+20) NatSqr/80-8 3.03µs ± 1% 3.67µs ± 7% +21.08% (p=0.000 n=20+20) NatSqr/100-8 4.31µs ± 1% 5.20µs ± 7% +20.52% (p=0.000 n=19+20) NatSqr/200-8 14.2µs ± 0% 16.3µs ± 4% +14.92% (p=0.000 n=19+20) NatSqr/300-8 27.8µs ± 1% 33.2µs ± 7% +19.28% (p=0.000 n=20+18) NatSqr/500-8 66.6µs ± 1% 74.5µs ± 3% +11.87% (p=0.000 n=18+18) NatSqr/800-8 135µs ± 1% 165µs ± 7% +22.33% (p=0.000 n=20+20) NatSqr/1000-8 200µs ± 0% 228µs ± 3% +14.39% (p=0.000 n=19+20) NatSetBytes/8-8 8.87ns ± 4% 8.77ns ± 2% -1.17% (p=0.020 n=20+16) NatSetBytes/24-8 38.6ns ± 3% 49.5ns ±29% +28.32% (p=0.000 n=18+19) NatSetBytes/128-8 75.2ns ± 1% 120.7ns ±29% +60.60% (p=0.000 n=17+20) NatSetBytes/7-8 16.2ns ± 2% 16.5ns ± 2% +1.76% (p=0.000 n=20+20) NatSetBytes/23-8 46.5ns ± 1% 60.2ns ±24% +29.59% (p=0.000 n=20+20) NatSetBytes/127-8 83.1ns ± 1% 118.2ns ±20% +42.33% (p=0.000 n=18+20) ScanPi-8 89.1µs ± 1% 117.4µs ±12% +31.75% (p=0.000 n=18+20) StringPiParallel-8 35.1µs ± 9% 40.2µs ±12% +14.53% (p=0.000 n=20+20) Scan/10/Base2-8 410ns ±14% 429ns ±10% +4.47% (p=0.018 n=19+20) Scan/100/Base2-8 3.05µs ±20% 2.97µs ±14% ~ (p=0.449 n=20+20) Scan/1000/Base2-8 29.3µs ± 8% 30.1µs ±23% ~ (p=0.355 n=20+20) Scan/10000/Base2-8 402µs ±13% 395µs ±14% ~ (p=0.355 n=20+20) Scan/100000/Base2-8 11.8ms ±10% 11.6ms ± 1% ~ (p=0.245 n=17+18) Scan/10/Base8-8 194ns ± 6% 196ns ±12% ~ (p=0.829 n=20+19) Scan/100/Base8-8 1.11µs ±15% 1.11µs ±12% ~ (p=0.743 n=20+20) Scan/1000/Base8-8 11.7µs ±10% 11.7µs ±12% ~ (p=0.904 n=20+20) Scan/10000/Base8-8 209µs ± 7% 210µs ± 8% ~ (p=0.478 n=20+20) Scan/100000/Base8-8 10.6ms ± 7% 10.4ms ± 6% ~ (p=0.112 n=20+18) Scan/10/Base10-8 182ns ±12% 188ns ±11% +3.52% (p=0.044 n=20+20) Scan/100/Base10-8 1.01µs ± 8% 1.00µs ±13% ~ (p=0.588 n=20+20) Scan/1000/Base10-8 10.7µs ±20% 10.6µs ±14% ~ (p=0.560 n=20+20) Scan/10000/Base10-8 195µs ±10% 194µs ± 9% ~ (p=0.883 n=20+20) Scan/100000/Base10-8 10.6ms ± 2% 10.6ms ± 2% ~ (p=0.495 n=20+20) Scan/10/Base16-8 166ns ±10% 174ns ±17% ~ (p=0.072 n=20+20) Scan/100/Base16-8 836ns ±10% 826ns ±12% ~ (p=0.562 n=20+17) Scan/1000/Base16-8 8.96µs ±13% 8.65µs ± 9% ~ (p=0.203 n=20+18) Scan/10000/Base16-8 198µs ± 3% 198µs ± 5% ~ (p=0.718 n=20+20) Scan/100000/Base16-8 11.1ms ± 3% 11.0ms ± 4% ~ (p=0.512 n=20+20) String/10/Base2-8 88.1ns ± 7% 94.1ns ±11% +6.80% (p=0.000 n=19+20) String/100/Base2-8 577ns ± 4% 598ns ± 5% +3.72% (p=0.000 n=20+20) String/1000/Base2-8 5.25µs ± 2% 5.62µs ± 5% +7.04% (p=0.000 n=19+20) String/10000/Base2-8 55.6µs ± 1% 60.1µs ± 2% +8.12% (p=0.000 n=19+19) String/100000/Base2-8 519µs ± 2% 560µs ± 2% +7.91% (p=0.000 n=18+17) String/10/Base8-8 52.2ns ± 8% 53.3ns ±12% ~ (p=0.188 n=20+18) String/100/Base8-8 218ns ± 3% 232ns ±10% +6.66% (p=0.000 n=20+20) String/1000/Base8-8 1.84µs ± 3% 1.94µs ± 4% +5.07% (p=0.000 n=20+18) String/10000/Base8-8 18.1µs ± 2% 19.1µs ± 3% +5.84% (p=0.000 n=20+19) String/100000/Base8-8 184µs ± 2% 197µs ± 1% +7.15% (p=0.000 n=19+19) String/10/Base10-8 158ns ± 7% 146ns ± 6% -7.65% (p=0.000 n=20+19) String/100/Base10-8 807ns ± 2% 845ns ± 4% +4.79% (p=0.000 n=20+19) String/1000/Base10-8 3.99µs ± 3% 3.99µs ± 7% ~ (p=0.920 n=20+20) String/10000/Base10-8 20.8µs ± 6% 22.1µs ±10% +6.11% (p=0.000 n=19+20) String/100000/Base10-8 5.60ms ± 2% 5.59ms ± 2% ~ (p=0.749 n=20+19) String/10/Base16-8 49.0ns ±13% 49.3ns ±16% ~ (p=0.581 n=19+20) String/100/Base16-8 173ns ± 5% 185ns ± 6% +6.63% (p=0.000 n=20+18) String/1000/Base16-8 1.38µs ± 3% 1.49µs ±10% +8.27% (p=0.000 n=19+20) String/10000/Base16-8 13.5µs ± 2% 14.5µs ± 3% +7.08% (p=0.000 n=20+20) String/100000/Base16-8 138µs ± 4% 148µs ± 4% +7.57% (p=0.000 n=19+20) LeafSize/0-8 2.74ms ± 1% 2.79ms ± 2% +2.00% (p=0.000 n=19+19) LeafSize/1-8 24.8µs ± 4% 26.1µs ± 8% +5.33% (p=0.000 n=18+19) LeafSize/2-8 24.9µs ± 7% 25.0µs ± 8% ~ (p=0.989 n=20+19) LeafSize/3-8 97.6µs ± 3% 100.2µs ± 5% +2.66% (p=0.001 n=20+19) LeafSize/4-8 25.2µs ± 5% 25.4µs ± 5% ~ (p=0.173 n=19+20) LeafSize/5-8 118µs ± 2% 119µs ± 5% ~ (p=0.478 n=20+20) LeafSize/6-8 97.6µs ± 3% 100.1µs ± 8% +2.65% (p=0.021 n=20+19) LeafSize/7-8 65.6µs ± 5% 67.5µs ± 6% +2.92% (p=0.003 n=20+19) LeafSize/8-8 25.5µs ± 5% 25.6µs ± 6% ~ (p=0.461 n=19+20) LeafSize/9-8 134µs ± 4% 136µs ± 5% ~ (p=0.194 n=19+20) LeafSize/10-8 119µs ± 3% 122µs ± 3% +2.52% (p=0.000 n=20+19) LeafSize/11-8 115µs ± 5% 116µs ± 5% ~ (p=0.158 n=20+19) LeafSize/12-8 97.4µs ± 4% 100.3µs ± 5% +2.91% (p=0.003 n=19+20) LeafSize/13-8 93.1µs ± 4% 93.0µs ± 6% ~ (p=0.698 n=20+20) LeafSize/14-8 67.0µs ± 3% 69.7µs ± 6% +4.10% (p=0.000 n=20+20) LeafSize/15-8 48.3µs ± 2% 49.3µs ± 6% +1.91% (p=0.014 n=19+20) LeafSize/16-8 25.6µs ± 5% 25.6µs ± 6% ~ (p=0.947 n=20+20) LeafSize/32-8 30.1µs ± 4% 30.3µs ± 5% ~ (p=0.685 n=18+19) LeafSize/64-8 53.4µs ± 2% 54.0µs ± 3% ~ (p=0.053 n=19+19) ProbablyPrime/n=0-8 3.59ms ± 1% 3.55ms ± 1% -1.12% (p=0.000 n=20+18) ProbablyPrime/n=1-8 4.21ms ± 2% 4.17ms ± 2% -0.73% (p=0.018 n=20+19) ProbablyPrime/n=5-8 6.74ms ± 1% 6.72ms ± 1% ~ (p=0.102 n=20+20) ProbablyPrime/n=10-8 9.91ms ± 1% 9.89ms ± 2% ~ (p=0.322 n=19+20) ProbablyPrime/n=20-8 16.2ms ± 1% 16.1ms ± 2% -0.52% (p=0.006 n=19+19) ProbablyPrime/Lucas-8 2.94ms ± 1% 2.95ms ± 1% +0.52% (p=0.002 n=18+19) ProbablyPrime/MillerRabinBase2-8 641µs ± 2% 640µs ± 2% ~ (p=0.607 n=19+20) FloatSqrt/64-8 653ns ± 5% 704ns ± 5% +7.82% (p=0.000 n=19+20) FloatSqrt/128-8 1.32µs ± 3% 1.42µs ± 5% +7.29% (p=0.000 n=18+20) FloatSqrt/256-8 1.44µs ± 2% 1.45µs ± 4% ~ (p=0.089 n=19+19) FloatSqrt/1000-8 3.36µs ± 3% 3.42µs ± 5% +1.82% (p=0.012 n=20+20) FloatSqrt/10000-8 25.5µs ± 2% 27.5µs ± 7% +7.91% (p=0.000 n=18+19) FloatSqrt/100000-8 629µs ± 6% 663µs ± 9% +5.32% (p=0.000 n=18+20) FloatSqrt/1000000-8 46.4ms ± 2% 46.6ms ± 5% ~ (p=0.351 n=20+19) [Geo mean] 9.60µs 10.01µs +4.28% name old alloc/op new alloc/op delta DecimalConversion-8 54.0kB ± 0% 43.6kB ± 0% -19.40% (p=0.000 n=20+20) FloatString/100-8 400B ± 0% 400B ± 0% ~ (all equal) FloatString/1000-8 3.10kB ± 0% 3.10kB ± 0% ~ (all equal) FloatString/10000-8 52.1kB ± 0% 52.1kB ± 0% ~ (p=0.153 n=20+20) FloatString/100000-8 582kB ± 0% 582kB ± 0% ~ (all equal) FloatAdd/10-8 0.00B 0.00B ~ (all equal) FloatAdd/100-8 0.00B 0.00B ~ (all equal) FloatAdd/1000-8 0.00B 0.00B ~ (all equal) FloatAdd/10000-8 0.00B 0.00B ~ (all equal) FloatAdd/100000-8 0.00B 0.00B ~ (all equal) FloatSub/10-8 0.00B 0.00B ~ (all equal) FloatSub/100-8 0.00B 0.00B ~ (all equal) FloatSub/1000-8 0.00B 0.00B ~ (all equal) FloatSub/10000-8 0.00B 0.00B ~ (all equal) FloatSub/100000-8 0.00B 0.00B ~ (all equal) ParseFloatSmallExp-8 4.18kB ± 0% 3.60kB ± 0% -13.79% (p=0.000 n=20+20) ParseFloatLargeExp-8 18.9kB ± 0% 19.3kB ± 0% +2.25% (p=0.000 n=20+20) GCD10x10/WithoutXY-8 96.0B ± 0% 16.0B ± 0% -83.33% (p=0.000 n=20+20) GCD10x10/WithXY-8 240B ± 0% 88B ± 0% -63.33% (p=0.000 n=20+20) GCD10x100/WithoutXY-8 192B ± 0% 112B ± 0% -41.67% (p=0.000 n=20+20) GCD10x100/WithXY-8 464B ± 0% 424B ± 0% -8.62% (p=0.000 n=20+20) GCD10x1000/WithoutXY-8 416B ± 0% 336B ± 0% -19.23% (p=0.000 n=20+20) GCD10x1000/WithXY-8 1.25kB ± 0% 1.10kB ± 0% -12.18% (p=0.000 n=20+20) GCD10x10000/WithoutXY-8 2.91kB ± 0% 2.83kB ± 0% -2.75% (p=0.000 n=20+20) GCD10x10000/WithXY-8 8.70kB ± 0% 8.55kB ± 0% -1.76% (p=0.000 n=16+16) GCD10x100000/WithoutXY-8 27.2kB ± 0% 27.2kB ± 0% -0.29% (p=0.000 n=20+20) GCD10x100000/WithXY-8 82.4kB ± 0% 82.3kB ± 0% -0.17% (p=0.000 n=20+19) GCD100x100/WithoutXY-8 288B ± 0% 384B ± 0% +33.33% (p=0.000 n=20+20) GCD100x100/WithXY-8 464B ± 0% 576B ± 0% +24.14% (p=0.000 n=20+20) GCD100x1000/WithoutXY-8 640B ± 0% 688B ± 0% +7.50% (p=0.000 n=20+20) GCD100x1000/WithXY-8 1.52kB ± 0% 1.46kB ± 0% -3.68% (p=0.000 n=20+20) GCD100x10000/WithoutXY-8 4.24kB ± 0% 4.29kB ± 0% +1.13% (p=0.000 n=20+20) GCD100x10000/WithXY-8 11.1kB ± 0% 11.0kB ± 0% -0.51% (p=0.000 n=15+20) GCD100x100000/WithoutXY-8 40.9kB ± 0% 40.9kB ± 0% +0.12% (p=0.000 n=20+19) GCD100x100000/WithXY-8 110kB ± 0% 109kB ± 0% -0.08% (p=0.000 n=20+20) GCD1000x1000/WithoutXY-8 1.22kB ± 0% 1.06kB ± 0% -13.16% (p=0.000 n=20+20) GCD1000x1000/WithXY-8 2.37kB ± 0% 2.11kB ± 0% -10.83% (p=0.000 n=20+20) GCD1000x10000/WithoutXY-8 4.71kB ± 0% 4.63kB ± 0% -1.70% (p=0.000 n=20+19) GCD1000x10000/WithXY-8 28.2kB ± 0% 28.0kB ± 0% -0.43% (p=0.000 n=20+15) GCD1000x100000/WithoutXY-8 41.3kB ± 0% 41.2kB ± 0% -0.20% (p=0.000 n=20+16) GCD1000x100000/WithXY-8 301kB ± 0% 301kB ± 0% -0.13% (p=0.000 n=20+20) GCD10000x10000/WithoutXY-8 8.64kB ± 0% 8.48kB ± 0% -1.85% (p=0.000 n=20+20) GCD10000x10000/WithXY-8 57.2kB ± 0% 57.7kB ± 0% +0.80% (p=0.000 n=20+20) GCD10000x100000/WithoutXY-8 43.8kB ± 0% 43.7kB ± 0% -0.19% (p=0.000 n=20+18) GCD10000x100000/WithXY-8 2.08MB ± 0% 2.08MB ± 0% -0.02% (p=0.000 n=15+19) GCD100000x100000/WithoutXY-8 81.6kB ± 0% 81.4kB ± 0% -0.20% (p=0.000 n=20+20) GCD100000x100000/WithXY-8 4.32MB ± 0% 4.33MB ± 0% +0.12% (p=0.000 n=20+20) Hilbert-8 653kB ± 0% 313kB ± 0% -52.13% (p=0.000 n=19+20) Binomial-8 1.82kB ± 0% 1.02kB ± 0% -43.86% (p=0.000 n=20+20) QuoRem-8 0.00B 0.00B ~ (all equal) Exp-8 11.1kB ± 0% 11.0kB ± 0% -0.34% (p=0.000 n=19+20) Exp2-8 11.3kB ± 0% 11.3kB ± 0% -0.35% (p=0.000 n=19+20) Bitset-8 0.00B 0.00B ~ (all equal) BitsetNeg-8 0.00B 0.00B ~ (all equal) BitsetOrig-8 103B ± 0% 63B ± 0% -38.83% (p=0.000 n=20+20) BitsetNegOrig-8 215B ± 0% 175B ± 0% -18.60% (p=0.000 n=20+20) ModSqrt225_Tonelli-8 11.3kB ± 0% 11.0kB ± 0% -2.76% (p=0.000 n=20+17) ModSqrt225_3Mod4-8 3.57kB ± 0% 3.53kB ± 0% -1.12% (p=0.000 n=20+20) ModSqrt231_Tonelli-8 11.0kB ± 0% 10.7kB ± 0% -2.55% (p=0.000 n=20+20) ModSqrt231_5Mod8-8 4.21kB ± 0% 4.09kB ± 0% -2.85% (p=0.000 n=16+20) ModInverse-8 1.44kB ± 0% 1.28kB ± 0% -11.11% (p=0.000 n=20+20) Sqrt-8 6.00kB ± 0% 6.00kB ± 0% ~ (all equal) IntSqr/1-8 0.00B 0.00B ~ (all equal) IntSqr/2-8 0.00B 0.00B ~ (all equal) IntSqr/3-8 0.00B 0.00B ~ (all equal) IntSqr/5-8 0.00B 0.00B ~ (all equal) IntSqr/8-8 0.00B 0.00B ~ (all equal) IntSqr/10-8 0.00B 0.00B ~ (all equal) IntSqr/20-8 320B ± 0% 320B ± 0% ~ (all equal) IntSqr/30-8 480B ± 0% 480B ± 0% ~ (all equal) IntSqr/50-8 896B ± 0% 896B ± 0% ~ (all equal) IntSqr/80-8 1.28kB ± 0% 1.28kB ± 0% ~ (all equal) IntSqr/100-8 1.79kB ± 0% 1.79kB ± 0% ~ (all equal) IntSqr/200-8 3.20kB ± 0% 3.20kB ± 0% ~ (all equal) IntSqr/300-8 8.06kB ± 0% 8.06kB ± 0% ~ (all equal) IntSqr/500-8 12.3kB ± 0% 12.3kB ± 0% ~ (all equal) IntSqr/800-8 28.8kB ± 0% 28.8kB ± 0% ~ (all equal) IntSqr/1000-8 36.9kB ± 0% 36.9kB ± 0% ~ (all equal) Div/20/10-8 0.00B 0.00B ~ (all equal) Div/200/100-8 0.00B 0.00B ~ (all equal) Div/2000/1000-8 0.00B 0.00B ~ (all equal) Div/20000/10000-8 0.00B 0.00B ~ (all equal) Div/200000/100000-8 690B ± 0% 690B ± 0% ~ (all equal) Mul-8 565kB ± 0% 565kB ± 0% ~ (all equal) ZeroShifts/Shl-8 6.53kB ± 0% 6.53kB ± 0% ~ (all equal) ZeroShifts/ShlSame-8 0.00B 0.00B ~ (all equal) ZeroShifts/Shr-8 6.53kB ± 0% 6.53kB ± 0% ~ (all equal) ZeroShifts/ShrSame-8 0.00B 0.00B ~ (all equal) Exp3Power/0x10-8 192B ± 0% 112B ± 0% -41.67% (p=0.000 n=20+20) Exp3Power/0x40-8 192B ± 0% 112B ± 0% -41.67% (p=0.000 n=20+20) Exp3Power/0x100-8 288B ± 0% 208B ± 0% -27.78% (p=0.000 n=20+20) Exp3Power/0x400-8 672B ± 0% 592B ± 0% -11.90% (p=0.000 n=20+20) Exp3Power/0x1000-8 3.33kB ± 0% 3.25kB ± 0% -2.40% (p=0.000 n=20+20) Exp3Power/0x4000-8 13.8kB ± 0% 13.7kB ± 0% -0.58% (p=0.000 n=20+20) Exp3Power/0x10000-8 117kB ± 0% 117kB ± 0% -0.07% (p=0.000 n=20+20) Exp3Power/0x40000-8 755kB ± 0% 755kB ± 0% -0.01% (p=0.000 n=19+20) Exp3Power/0x100000-8 5.22MB ± 0% 5.22MB ± 0% -0.00% (p=0.000 n=20+20) Exp3Power/0x400000-8 39.8MB ± 0% 39.8MB ± 0% -0.00% (p=0.000 n=20+19) Fibo-8 3.09MB ± 0% 3.08MB ± 0% -0.28% (p=0.000 n=20+16) NatSqr/1-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) NatSqr/2-8 64.0B ± 0% 64.0B ± 0% ~ (all equal) NatSqr/3-8 80.0B ± 0% 80.0B ± 0% ~ (all equal) NatSqr/5-8 112B ± 0% 112B ± 0% ~ (all equal) NatSqr/8-8 160B ± 0% 160B ± 0% ~ (all equal) NatSqr/10-8 192B ± 0% 192B ± 0% ~ (all equal) NatSqr/20-8 672B ± 0% 672B ± 0% ~ (all equal) NatSqr/30-8 992B ± 0% 992B ± 0% ~ (all equal) NatSqr/50-8 1.79kB ± 0% 1.79kB ± 0% ~ (all equal) NatSqr/80-8 2.69kB ± 0% 2.69kB ± 0% ~ (all equal) NatSqr/100-8 3.58kB ± 0% 3.58kB ± 0% ~ (all equal) NatSqr/200-8 6.66kB ± 0% 6.66kB ± 0% ~ (all equal) NatSqr/300-8 24.4kB ± 0% 24.4kB ± 0% ~ (all equal) NatSqr/500-8 36.9kB ± 0% 36.9kB ± 0% ~ (all equal) NatSqr/800-8 69.8kB ± 0% 69.8kB ± 0% ~ (all equal) NatSqr/1000-8 86.0kB ± 0% 86.0kB ± 0% ~ (all equal) NatSetBytes/8-8 0.00B 0.00B ~ (all equal) NatSetBytes/24-8 64.0B ± 0% 64.0B ± 0% ~ (all equal) NatSetBytes/128-8 160B ± 0% 160B ± 0% ~ (all equal) NatSetBytes/7-8 0.00B 0.00B ~ (all equal) NatSetBytes/23-8 64.0B ± 0% 64.0B ± 0% ~ (all equal) NatSetBytes/127-8 160B ± 0% 160B ± 0% ~ (all equal) ScanPi-8 75.4kB ± 0% 75.7kB ± 0% +0.41% (p=0.000 n=20+20) StringPiParallel-8 20.4kB ± 0% 20.4kB ± 0% ~ (p=0.223 n=20+20) Scan/10/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/1000/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10000/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100000/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/1000/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10000/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100000/Base8-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/1000/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10000/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100000/Base10-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/1000/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/10000/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) Scan/100000/Base16-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) String/10/Base2-8 48.0B ± 0% 48.0B ± 0% ~ (all equal) String/100/Base2-8 352B ± 0% 352B ± 0% ~ (all equal) String/1000/Base2-8 3.46kB ± 0% 3.46kB ± 0% ~ (all equal) String/10000/Base2-8 41.0kB ± 0% 41.0kB ± 0% ~ (all equal) String/100000/Base2-8 336kB ± 0% 336kB ± 0% ~ (all equal) String/10/Base8-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) String/100/Base8-8 112B ± 0% 112B ± 0% ~ (all equal) String/1000/Base8-8 1.15kB ± 0% 1.15kB ± 0% ~ (all equal) String/10000/Base8-8 12.3kB ± 0% 12.3kB ± 0% ~ (all equal) String/100000/Base8-8 115kB ± 0% 115kB ± 0% ~ (all equal) String/10/Base10-8 64.0B ± 0% 24.0B ± 0% -62.50% (p=0.000 n=20+20) String/100/Base10-8 192B ± 0% 192B ± 0% ~ (all equal) String/1000/Base10-8 1.95kB ± 0% 1.95kB ± 0% ~ (all equal) String/10000/Base10-8 20.0kB ± 0% 20.0kB ± 0% ~ (p=0.983 n=19+20) String/100000/Base10-8 210kB ± 1% 211kB ± 1% +0.82% (p=0.000 n=19+20) String/10/Base16-8 16.0B ± 0% 16.0B ± 0% ~ (all equal) String/100/Base16-8 96.0B ± 0% 96.0B ± 0% ~ (all equal) String/1000/Base16-8 896B ± 0% 896B ± 0% ~ (all equal) String/10000/Base16-8 9.47kB ± 0% 9.47kB ± 0% ~ (all equal) String/100000/Base16-8 90.1kB ± 0% 90.1kB ± 0% ~ (all equal) LeafSize/0-8 16.9kB ± 0% 16.8kB ± 0% -0.44% (p=0.000 n=20+20) LeafSize/1-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+19) LeafSize/2-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+19) LeafSize/3-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+17) LeafSize/4-8 22.4kB ± 0% 22.3kB ± 0% -0.34% (p=0.000 n=20+19) LeafSize/5-8 22.4kB ± 0% 22.3kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/6-8 22.3kB ± 0% 22.2kB ± 0% -0.34% (p=0.000 n=20+20) LeafSize/7-8 22.3kB ± 0% 22.2kB ± 0% -0.35% (p=0.000 n=20+20) LeafSize/8-8 22.3kB ± 0% 22.2kB ± 0% -0.34% (p=0.000 n=16+20) LeafSize/9-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/10-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/11-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/12-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/13-8 22.3kB ± 0% 22.2kB ± 0% -0.34% (p=0.000 n=20+15) LeafSize/14-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/15-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=20+20) LeafSize/16-8 22.3kB ± 0% 22.2kB ± 0% -0.33% (p=0.000 n=19+20) LeafSize/32-8 22.3kB ± 0% 22.2kB ± 0% -0.32% (p=0.000 n=20+20) LeafSize/64-8 21.8kB ± 0% 21.7kB ± 0% -0.33% (p=0.000 n=18+19) ProbablyPrime/n=0-8 15.3kB ± 0% 14.9kB ± 0% -2.35% (p=0.000 n=20+20) ProbablyPrime/n=1-8 21.0kB ± 0% 20.7kB ± 0% -1.71% (p=0.000 n=20+20) ProbablyPrime/n=5-8 43.4kB ± 0% 42.9kB ± 0% -1.20% (p=0.000 n=20+20) ProbablyPrime/n=10-8 71.5kB ± 0% 70.7kB ± 0% -1.01% (p=0.000 n=19+20) ProbablyPrime/n=20-8 127kB ± 0% 126kB ± 0% -0.88% (p=0.000 n=20+20) ProbablyPrime/Lucas-8 3.07kB ± 0% 2.79kB ± 0% -9.12% (p=0.000 n=20+20) ProbablyPrime/MillerRabinBase2-8 12.1kB ± 0% 12.0kB ± 0% -0.66% (p=0.000 n=20+20) FloatSqrt/64-8 416B ± 0% 360B ± 0% -13.46% (p=0.000 n=20+20) FloatSqrt/128-8 640B ± 0% 584B ± 0% -8.75% (p=0.000 n=20+20) FloatSqrt/256-8 512B ± 0% 472B ± 0% -7.81% (p=0.000 n=20+20) FloatSqrt/1000-8 1.47kB ± 0% 1.43kB ± 0% -2.72% (p=0.000 n=20+20) FloatSqrt/10000-8 18.2kB ± 0% 18.1kB ± 0% -0.22% (p=0.000 n=20+20) FloatSqrt/100000-8 204kB ± 0% 204kB ± 0% -0.02% (p=0.000 n=20+20) FloatSqrt/1000000-8 6.37MB ± 0% 6.37MB ± 0% -0.00% (p=0.000 n=19+20) [Geo mean] 3.42kB 3.24kB -5.33% name old allocs/op new allocs/op delta DecimalConversion-8 1.65k ± 0% 1.65k ± 0% ~ (all equal) FloatString/100-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) FloatString/1000-8 9.00 ± 0% 9.00 ± 0% ~ (all equal) FloatString/10000-8 22.0 ± 0% 22.0 ± 0% ~ (all equal) FloatString/100000-8 136 ± 0% 136 ± 0% ~ (all equal) FloatAdd/10-8 0.00 0.00 ~ (all equal) FloatAdd/100-8 0.00 0.00 ~ (all equal) FloatAdd/1000-8 0.00 0.00 ~ (all equal) FloatAdd/10000-8 0.00 0.00 ~ (all equal) FloatAdd/100000-8 0.00 0.00 ~ (all equal) FloatSub/10-8 0.00 0.00 ~ (all equal) FloatSub/100-8 0.00 0.00 ~ (all equal) FloatSub/1000-8 0.00 0.00 ~ (all equal) FloatSub/10000-8 0.00 0.00 ~ (all equal) FloatSub/100000-8 0.00 0.00 ~ (all equal) ParseFloatSmallExp-8 110 ± 0% 130 ± 0% +18.18% (p=0.000 n=20+20) ParseFloatLargeExp-8 319 ± 0% 371 ± 0% +16.30% (p=0.000 n=20+20) GCD10x10/WithoutXY-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) GCD10x10/WithXY-8 5.00 ± 0% 6.00 ± 0% +20.00% (p=0.000 n=20+20) GCD10x100/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) GCD10x100/WithXY-8 9.00 ± 0% 12.00 ± 0% +33.33% (p=0.000 n=20+20) GCD10x1000/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) GCD10x1000/WithXY-8 11.0 ± 0% 12.0 ± 0% +9.09% (p=0.000 n=20+20) GCD10x10000/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) GCD10x10000/WithXY-8 11.0 ± 0% 12.0 ± 0% +9.09% (p=0.000 n=20+20) GCD10x100000/WithoutXY-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) GCD10x100000/WithXY-8 11.0 ± 0% 12.0 ± 0% +9.09% (p=0.000 n=20+20) GCD100x100/WithoutXY-8 6.00 ± 0% 10.00 ± 0% +66.67% (p=0.000 n=20+20) GCD100x100/WithXY-8 9.00 ± 0% 15.00 ± 0% +66.67% (p=0.000 n=20+20) GCD100x1000/WithoutXY-8 6.00 ± 0% 8.00 ± 0% +33.33% (p=0.000 n=20+20) GCD100x1000/WithXY-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20) GCD100x10000/WithoutXY-8 6.00 ± 0% 8.00 ± 0% +33.33% (p=0.000 n=20+20) GCD100x10000/WithXY-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20) GCD100x100000/WithoutXY-8 6.00 ± 0% 8.00 ± 0% +33.33% (p=0.000 n=20+20) GCD100x100000/WithXY-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20) GCD1000x1000/WithoutXY-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) GCD1000x1000/WithXY-8 19.0 ± 0% 20.0 ± 0% +5.26% (p=0.000 n=20+20) GCD1000x10000/WithoutXY-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) GCD1000x10000/WithXY-8 26.0 ± 0% 26.0 ± 0% ~ (all equal) GCD1000x100000/WithoutXY-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) GCD1000x100000/WithXY-8 27.0 ± 0% 27.0 ± 0% ~ (all equal) GCD10000x10000/WithoutXY-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) GCD10000x10000/WithXY-8 76.0 ± 0% 78.0 ± 0% +2.63% (p=0.000 n=20+20) GCD10000x100000/WithoutXY-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) GCD10000x100000/WithXY-8 174 ± 0% 174 ± 0% ~ (all equal) GCD100000x100000/WithoutXY-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) GCD100000x100000/WithXY-8 645 ± 0% 647 ± 0% +0.31% (p=0.000 n=20+20) Hilbert-8 14.1k ± 0% 14.3k ± 0% +0.92% (p=0.000 n=20+20) Binomial-8 38.0 ± 0% 38.0 ± 0% ~ (all equal) QuoRem-8 0.00 0.00 ~ (all equal) Exp-8 21.0 ± 0% 21.0 ± 0% ~ (all equal) Exp2-8 22.0 ± 0% 22.0 ± 0% ~ (all equal) Bitset-8 0.00 0.00 ~ (all equal) BitsetNeg-8 0.00 0.00 ~ (all equal) BitsetOrig-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) BitsetNegOrig-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) ModSqrt225_Tonelli-8 85.0 ± 0% 86.0 ± 0% +1.18% (p=0.000 n=20+20) ModSqrt225_3Mod4-8 25.0 ± 0% 25.0 ± 0% ~ (all equal) ModSqrt231_Tonelli-8 80.0 ± 0% 80.0 ± 0% ~ (all equal) ModSqrt231_5Mod8-8 32.0 ± 0% 32.0 ± 0% ~ (all equal) ModInverse-8 11.0 ± 0% 11.0 ± 0% ~ (all equal) Sqrt-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) IntSqr/1-8 0.00 0.00 ~ (all equal) IntSqr/2-8 0.00 0.00 ~ (all equal) IntSqr/3-8 0.00 0.00 ~ (all equal) IntSqr/5-8 0.00 0.00 ~ (all equal) IntSqr/8-8 0.00 0.00 ~ (all equal) IntSqr/10-8 0.00 0.00 ~ (all equal) IntSqr/20-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/30-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/50-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/80-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/100-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/200-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) IntSqr/300-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) IntSqr/500-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) IntSqr/800-8 9.00 ± 0% 9.00 ± 0% ~ (all equal) IntSqr/1000-8 9.00 ± 0% 9.00 ± 0% ~ (all equal) Div/20/10-8 0.00 0.00 ~ (all equal) Div/200/100-8 0.00 0.00 ~ (all equal) Div/2000/1000-8 0.00 0.00 ~ (all equal) Div/20000/10000-8 0.00 0.00 ~ (all equal) Div/200000/100000-8 0.00 0.00 ~ (all equal) Mul-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) ZeroShifts/Shl-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) ZeroShifts/ShlSame-8 0.00 0.00 ~ (all equal) ZeroShifts/Shr-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) ZeroShifts/ShrSame-8 0.00 0.00 ~ (all equal) Exp3Power/0x10-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) Exp3Power/0x40-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) Exp3Power/0x100-8 5.00 ± 0% 5.00 ± 0% ~ (all equal) Exp3Power/0x400-8 7.00 ± 0% 7.00 ± 0% ~ (all equal) Exp3Power/0x1000-8 11.0 ± 0% 11.0 ± 0% ~ (all equal) Exp3Power/0x4000-8 15.0 ± 0% 15.0 ± 0% ~ (all equal) Exp3Power/0x10000-8 29.0 ± 0% 29.0 ± 0% ~ (all equal) Exp3Power/0x40000-8 140 ± 0% 140 ± 0% ~ (all equal) Exp3Power/0x100000-8 1.12k ± 0% 1.12k ± 0% ~ (all equal) Exp3Power/0x400000-8 9.88k ± 0% 9.88k ± 0% ~ (p=0.747 n=17+19) Fibo-8 739 ± 0% 743 ± 0% +0.54% (p=0.000 n=20+20) NatSqr/1-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/3-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/5-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSqr/20-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/30-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/50-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/80-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/100-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/200-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) NatSqr/300-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) NatSqr/500-8 4.00 ± 0% 4.00 ± 0% ~ (all equal) NatSqr/800-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) NatSqr/1000-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) NatSetBytes/8-8 0.00 0.00 ~ (all equal) NatSetBytes/24-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSetBytes/128-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSetBytes/7-8 0.00 0.00 ~ (all equal) NatSetBytes/23-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) NatSetBytes/127-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) ScanPi-8 60.0 ± 0% 61.0 ± 0% +1.67% (p=0.000 n=20+20) StringPiParallel-8 24.0 ± 0% 24.0 ± 0% ~ (all equal) Scan/10/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/1000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/1000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/1000/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10000/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100000/Base10-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/1000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/10000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) Scan/100000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/1000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100000/Base2-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/1000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100000/Base8-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10/Base10-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) String/100/Base10-8 2.00 ± 0% 2.00 ± 0% ~ (all equal) String/1000/Base10-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) String/10000/Base10-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) String/100000/Base10-8 3.00 ± 0% 3.00 ± 0% ~ (all equal) String/10/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/1000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/10000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) String/100000/Base16-8 1.00 ± 0% 1.00 ± 0% ~ (all equal) LeafSize/0-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) LeafSize/1-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) LeafSize/2-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) LeafSize/3-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) LeafSize/4-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) LeafSize/5-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) LeafSize/6-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/7-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/8-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/9-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/10-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/11-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/12-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/13-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/14-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/15-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/16-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/32-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) LeafSize/64-8 11.0 ± 0% 11.0 ± 0% ~ (all equal) ProbablyPrime/n=0-8 52.0 ± 0% 52.0 ± 0% ~ (all equal) ProbablyPrime/n=1-8 73.0 ± 0% 73.0 ± 0% ~ (all equal) ProbablyPrime/n=5-8 157 ± 0% 157 ± 0% ~ (all equal) ProbablyPrime/n=10-8 262 ± 0% 262 ± 0% ~ (all equal) ProbablyPrime/n=20-8 472 ± 0% 472 ± 0% ~ (all equal) ProbablyPrime/Lucas-8 22.0 ± 0% 22.0 ± 0% ~ (all equal) ProbablyPrime/MillerRabinBase2-8 29.0 ± 0% 29.0 ± 0% ~ (all equal) FloatSqrt/64-8 9.00 ± 0% 10.00 ± 0% +11.11% (p=0.000 n=20+20) FloatSqrt/128-8 12.0 ± 0% 13.0 ± 0% +8.33% (p=0.000 n=20+20) FloatSqrt/256-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) FloatSqrt/1000-8 9.00 ± 0% 9.00 ± 0% ~ (all equal) FloatSqrt/10000-8 14.0 ± 0% 14.0 ± 0% ~ (all equal) FloatSqrt/100000-8 33.0 ± 0% 33.0 ± 0% ~ (all equal) FloatSqrt/1000000-8 1.16k ± 0% 1.16k ± 0% ~ (all equal) [Geo mean] 6.62 6.76 +2.09% Change-Id: Id9df4157cac1e07721e35cff7fcdefe60703873a Reviewed-on: https://go-review.googlesource.com/c/150999 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Alan Donovan <adonovan@google.com> Reviewed-by: Robert Griesemer <gri@golang.org>	2018-11-28 17:38:46 +00:00
Brian Kessler	979d9027ae	math/bits: define Div to panic when y<=hi Div panics when y<=hi because either the quotient overflows the size of the output or division by zero occurs when y==0. This provides a uniform behavior for all implementations. Fixes #28316 Change-Id: If23aeb10e0709ee1a60b7d614afc9103d674a980 Reviewed-on: https://go-review.googlesource.com/c/149517 Reviewed-by: Robert Griesemer <gri@golang.org>	2018-11-27 05:04:33 +00:00
Brian Kessler	ead5d1e316	math/bits: panic when y<=hi in Div Explicitly check for divide-by-zero/overflow and panic with the appropriate runtime error. The additional checks have basically no effect on performance since the branch is easily predicted. name old time/op new time/op delta Div-4 53.9ns ± 1% 53.0ns ± 1% -1.59% (p=0.016 n=4+5) Div32-4 17.9ns ± 0% 18.4ns ± 0% +2.56% (p=0.008 n=5+5) Div64-4 53.5ns ± 0% 53.3ns ± 0% ~ (p=0.095 n=5+5) Updates #28316 Change-Id: I36297ee9946cbbc57fefb44d1730283b049ecf57 Reviewed-on: https://go-review.googlesource.com/c/144377 Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2018-11-27 05:04:14 +00:00
Brad Fitzpatrick	3813edf26e	all: use "reports whether" consistently in the few places that didn't Go documentation style for boolean funcs is to say: // Foo reports whether ... func Foo() bool (rather than "returns true if") This CL also replaces 4 uses of "iff" with the same "reports whether" wording, which doesn't lose any meaning, and will prevent people from sending typo fixes when they don't realize it's "if and only if". In the past I think we've had the typo CLs updated to just say "reports whether". So do them all at once. (Inspired by the addition of another "returns true if" in CL 146938 in fd_plan9.go) Created with: $ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true iff" \| grep -v vendor) $ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" \| grep -v vendor) Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f Reviewed-on: https://go-review.googlesource.com/c/147037 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2018-11-02 22:47:58 +00:00
Robert Griesemer	c86d464734	math/big: shallow copies of Int/Rat/Float are not supported (documentation) Fixes #28423. Change-Id: Ie57ade565d0407a4bffaa86fb4475ff083168e79 Reviewed-on: https://go-review.googlesource.com/c/145537 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2018-10-29 18:23:31 +00:00
hearot	f28191340e	math/big: fix a formula used as documentation The function documentation was wrong, it was using a wrong parameter. This change replaces it with the right parameter. The wrong formula was: q = (u1<<_W + u0 - r)/y The function has got a parameter "v" (of type Word), not a parameter "y". So, the right formula is: q = (u1<<_W + u0 - r)/v Fixes #28444 Change-Id: I82e57ba014735a9fdb6262874ddf498754d30d33 Reviewed-on: https://go-review.googlesource.com/c/145280 Reviewed-by: Robert Griesemer <gri@golang.org>	2018-10-28 16:58:20 +00:00
Keith Randall	899f3a2892	cmd/compile: intrinsify math/bits.Add on amd64 name old time/op new time/op delta Add-8 1.11ns ± 0% 1.18ns ± 0% +6.31% (p=0.029 n=4+4) Add32-8 1.02ns ± 0% 1.02ns ± 1% ~ (p=0.333 n=4+5) Add64-8 1.11ns ± 1% 1.17ns ± 0% +5.79% (p=0.008 n=5+5) Add64multiple-8 4.35ns ± 1% 0.86ns ± 0% -80.22% (p=0.000 n=5+4) The individual ops are a bit slower (but still very fast). Using the ops in carry chains is very fast. Update #28273 Change-Id: Id975f76df2b930abf0e412911d327b6c5b1befe5 Reviewed-on: https://go-review.googlesource.com/c/144257 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-10-25 19:47:00 +00:00
Brian Kessler	127c51e48c	math/bits: correct BenchmarkSub64 Previously, the benchmark was measuring Add64 instead of Sub64. Change-Id: I0cf30935c8a4728bead9868834377aae0b34f008 Reviewed-on: https://go-review.googlesource.com/c/144380 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-10-24 14:53:19 +00:00
Igor Zhilianin	f90e89e675	all: fix a bunch of misspellings Change-Id: If2954bdfc551515403706b2cd0dde94e45936e08 GitHub-Last-Rev: `d4cfc41a55` GitHub-Pull-Request: golang/go#28049 Reviewed-on: https://go-review.googlesource.com/c/140299 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-10-06 15:40:03 +00:00
Plekhanov Maxim	47e71f3b69	math: use Abs in Pow rather than if x < 0 { x = -x } name old time/op new time/op delta PowInt 55.7ns ± 1% 53.4ns ± 2% -4.15% (p=0.000 n=9+9) PowFrac 133ns ± 1% 133ns ± 2% ~ (p=0.587 n=8+9) Change-Id: Ica0f4c2cbd554f2195c6d1762ed26742ff8e3924 Reviewed-on: https://go-review.googlesource.com/c/85375 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-10-04 17:33:04 +00:00
Plekhanov Maxim	497d24178f	math: use Abs in Mod rather than if x < 0 { x = -x} goos: linux goarch: amd64 pkg: math name old time/op new time/op delta Mod 64.7ns ± 2% 63.7ns ± 2% -1.52% (p=0.003 n=8+10) Change-Id: I851bec0fd6c223dab73e4a680b7393d49e81a0e8 Reviewed-on: https://go-review.googlesource.com/c/85095 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-10-04 17:32:44 +00:00
Zhou Peng	b8ac64a581	all: this big patch remove whitespace from assembly files Don't worry, this patch just remove trailing whitespace from assembly files, and does not touch any logical changes. Change-Id: Ia724ac0b1abf8bc1e41454bdc79289ef317c165d Reviewed-on: https://go-review.googlesource.com/c/113595 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-10-03 15:28:51 +00:00
fanzha02	a19a83c8ef	cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on arm64 Use float <-> int register moves without conversion instead of stores and loads to move float <-> int values. Math package benchmark results. name old time/op new time/op delta Acosh 153ns ± 0% 147ns ± 0% -3.92% (p=0.000 n=10+10) Asinh 183ns ± 0% 177ns ± 0% -3.28% (p=0.000 n=10+10) Atanh 157ns ± 0% 155ns ± 0% -1.27% (p=0.000 n=10+10) Atan2 118ns ± 0% 117ns ± 1% -0.59% (p=0.003 n=10+10) Cbrt 119ns ± 0% 114ns ± 0% -4.20% (p=0.000 n=10+10) Copysign 7.51ns ± 0% 6.51ns ± 0% -13.32% (p=0.000 n=9+10) Cos 73.1ns ± 0% 70.6ns ± 0% -3.42% (p=0.000 n=10+10) Cosh 119ns ± 0% 121ns ± 0% +1.68% (p=0.000 n=10+9) ExpGo 154ns ± 0% 149ns ± 0% -3.05% (p=0.000 n=9+10) Expm1 101ns ± 0% 99ns ± 0% -1.88% (p=0.000 n=10+10) Exp2Go 150ns ± 0% 146ns ± 0% -2.67% (p=0.000 n=10+10) Abs 7.01ns ± 0% 6.01ns ± 0% -14.27% (p=0.000 n=10+9) Mod 234ns ± 0% 212ns ± 0% -9.40% (p=0.000 n=9+10) Frexp 34.5ns ± 0% 30.0ns ± 0% -13.04% (p=0.000 n=10+10) Gamma 112ns ± 0% 111ns ± 0% -0.89% (p=0.000 n=10+10) Hypot 73.6ns ± 0% 68.6ns ± 0% -6.79% (p=0.000 n=10+10) HypotGo 77.1ns ± 0% 72.1ns ± 0% -6.49% (p=0.000 n=10+10) Ilogb 31.0ns ± 0% 28.0ns ± 0% -9.68% (p=0.000 n=10+10) J0 437ns ± 0% 434ns ± 0% -0.62% (p=0.000 n=10+10) J1 433ns ± 0% 431ns ± 0% -0.46% (p=0.000 n=10+10) Jn 927ns ± 0% 922ns ± 0% -0.54% (p=0.000 n=10+10) Ldexp 41.5ns ± 0% 37.0ns ± 0% -10.84% (p=0.000 n=9+10) Log 124ns ± 0% 118ns ± 0% -4.84% (p=0.000 n=10+9) Logb 34.0ns ± 0% 32.0ns ± 0% -5.88% (p=0.000 n=10+10) Log1p 110ns ± 0% 108ns ± 0% -1.82% (p=0.000 n=10+10) Log10 136ns ± 0% 132ns ± 0% -2.94% (p=0.000 n=10+10) Log2 51.6ns ± 0% 47.1ns ± 0% -8.72% (p=0.000 n=10+10) Nextafter32 33.0ns ± 0% 30.5ns ± 0% -7.58% (p=0.000 n=10+10) Nextafter64 29.0ns ± 0% 26.5ns ± 0% -8.62% (p=0.000 n=10+10) PowInt 169ns ± 0% 160ns ± 0% -5.33% (p=0.000 n=10+10) PowFrac 375ns ± 0% 361ns ± 0% -3.73% (p=0.000 n=10+10) RoundToEven 14.0ns ± 0% 12.5ns ± 0% -10.71% (p=0.000 n=10+10) Remainder 206ns ± 0% 192ns ± 0% -6.80% (p=0.000 n=10+9) Signbit 6.01ns ± 0% 5.51ns ± 0% -8.32% (p=0.000 n=10+9) Sin 70.1ns ± 0% 69.6ns ± 0% -0.71% (p=0.000 n=10+10) Sincos 99.1ns ± 0% 99.6ns ± 0% +0.50% (p=0.000 n=9+10) SqrtGoLatency 178ns ± 0% 146ns ± 0% -17.70% (p=0.000 n=8+10) SqrtPrime 9.19µs ± 0% 9.20µs ± 0% +0.01% (p=0.000 n=9+9) Tanh 125ns ± 1% 127ns ± 0% +1.36% (p=0.000 n=10+10) Y0 428ns ± 0% 426ns ± 0% -0.47% (p=0.000 n=10+10) Y1 431ns ± 0% 429ns ± 0% -0.46% (p=0.000 n=10+9) Yn 906ns ± 0% 901ns ± 0% -0.55% (p=0.000 n=10+10) Float64bits 4.50ns ± 0% 3.50ns ± 0% -22.22% (p=0.000 n=10+10) Float64frombits 4.00ns ± 0% 3.50ns ± 0% -12.50% (p=0.000 n=10+9) Float32bits 4.50ns ± 0% 3.50ns ± 0% -22.22% (p=0.002 n=8+10) Float32frombits 4.00ns ± 0% 3.50ns ± 0% -12.50% (p=0.000 n=10+10) Change-Id: Iba829e15d5624962fe0c699139ea783efeefabc2 Reviewed-on: https://go-review.googlesource.com/129715 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-09-17 20:49:04 +00:00
Brian Kessler	13de5e7f7f	math/bits: add extended precision Add, Sub, Mul, Div Port math/big pure go versions of add-with-carry, subtract-with-borrow, full-width multiply, and full-width divide. Updates #24813 Change-Id: Ifae5d2f6ee4237137c9dcba931f69c91b80a4b1c Reviewed-on: https://go-review.googlesource.com/123157 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-09-11 17:54:20 +00:00
Eric Ponce	ded9411580	math: add Round and RoundToEven examples Change-Id: Ibef5f96ea588d17eac1c96ee3992e01943ba0fef Reviewed-on: https://go-review.googlesource.com/131496 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2018-08-28 05:22:41 +00:00
Brian Kessler	a3381faf81	math/big: streamline divLarge initialization The divLarge code contained "todo"s about avoiding alias and clear calls in the initialization of variables. By rearranging the order of initialization and always using an auxiliary variable for the shifted divisor, all of these calls can be safely avoided. On average, normalizing the divisor (shift>0) is required 31/32 or 63/64 of the time. If one always performs the shift into an auxiliary variable first, this avoids the need to check for aliasing of vIn in the output variables u and z. The remainder u is initialized via a left shift of uIn and thus needs no alias check against uIn. Since uIn and vIn were both used, z needs no alias checks except against u which is used for storage of the remainder. This change has a minimal impact on performance (see below), but cleans up the initialization code and eliminates the "todo"s. name old time/op new time/op delta Div/20/10-4 86.7ns ± 6% 85.7ns ± 5% ~ (p=0.841 n=5+5) Div/200/100-4 523ns ± 5% 502ns ± 3% -4.13% (p=0.024 n=5+5) Div/2000/1000-4 2.55µs ± 3% 2.59µs ± 5% ~ (p=0.548 n=5+5) Div/20000/10000-4 80.4µs ± 4% 80.0µs ± 2% ~ (p=1.000 n=5+5) Div/200000/100000-4 6.43ms ± 6% 6.35ms ± 4% ~ (p=0.548 n=5+5) Fixes #22928 Change-Id: I30d8498ef1cf8b69b0f827165c517bc25a5c32d7 Reviewed-on: https://go-review.googlesource.com/130775 Reviewed-by: Robert Griesemer <gri@golang.org>	2018-08-22 22:54:01 +00:00
Brian Kessler	3fd62ce910	math/big: optimize multiplication by 2 and 1/2 in float Sqrt The Sqrt code previously used explicit constants for 2 and 1/2. This change replaces multiplication by these constants with increment and decrement of the floating point exponent directly. This improves performance by ~7-10% for small inputs and minimal improvement for large inputs. name old time/op new time/op delta FloatSqrt/64-4 1.39µs ± 0% 1.29µs ± 3% -7.01% (p=0.016 n=4+5) FloatSqrt/128-4 2.84µs ± 0% 2.60µs ± 1% -8.33% (p=0.008 n=5+5) FloatSqrt/256-4 3.24µs ± 1% 2.91µs ± 2% -10.00% (p=0.008 n=5+5) FloatSqrt/1000-4 7.42µs ± 1% 6.74µs ± 0% -9.16% (p=0.008 n=5+5) FloatSqrt/10000-4 65.9µs ± 1% 65.3µs ± 4% ~ (p=0.310 n=5+5) FloatSqrt/100000-4 1.57ms ± 8% 1.52ms ± 1% ~ (p=0.111 n=5+4) FloatSqrt/1000000-4 127ms ± 1% 126ms ± 1% ~ (p=0.690 n=5+5) Change-Id: Id81ac842a9d64981e001c4ca3ff129eebd227593 Reviewed-on: https://go-review.googlesource.com/130835 Reviewed-by: Robert Griesemer <gri@golang.org>	2018-08-22 21:02:21 +00:00
Ian Lance Taylor	1ae2eed0b2	math: test for pos/neg zero return of Ceil/Floor/Trunc Ceil and Trunc of -0.2 return -0, not +0, but we didn't test that. Updates #23647 Change-Id: Idbd4699376abfb4ca93f16c73c114d610d86a9f2 Reviewed-on: https://go-review.googlesource.com/91335 Run-TryBot: Ian Lance Taylor <iant@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-08-21 22:31:37 +00:00
Iskander Sharipov	0fbaf6ca8b	math,net: omit explicit true tag expr in switch Performed `switch true {}` => `switch {}` replacement. Found using https://go-critic.github.io/overview.html#switchTrue-ref Change-Id: Ib39ea98531651966a5a56b7bd729b46e4eeb7f7c Reviewed-on: https://go-review.googlesource.com/123378 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-08-20 22:15:59 +00:00
Michael Munday	edae0ff8c1	math: use s390x mnemonics rather than binary encodings TMLL, LGDR and LDGR have all been added to the Go assembler previously, so we don't need to encode them using WORD and BYTE directives anymore. This is purely a cosmetic change, it does not change the contents of any object files. Change-Id: I93f815b91be310858297d8a0dc9e6d8e3f09dd65 Reviewed-on: https://go-review.googlesource.com/129895 Run-TryBot: Michael Munday <mike.munday@ibm.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-08-20 17:42:08 +00:00
Benjamin Cable	669ac1228a	math/rand: improve package documentation Notify readers that interval notation is used. Fixes: #26765 Change-Id: Id02a7fcffbf41699e85631badeee083f5d4b2201 Reviewed-on: https://go-review.googlesource.com/127549 Reviewed-by: Rob Pike <r@golang.org>	2018-08-03 23:08:42 +00:00
Keith Randall	51ddeb9965	math: add tests for erf and erfc Test large but not infinite arguments. This CL adds a test which breaks s390x. Don't submit until a fix for that is figured out. Update #26477 Change-Id: Ic86739fe3554e87d7f8e15482875c198fcf1d59c Reviewed-on: https://go-review.googlesource.com/125641 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-08-03 03:38:52 +00:00
bill_ofarrell	f04a002e5a	math: ensure Erfc is not called with out-of-expected-range arguments on s390x The existing implementation produces correct results with a wide range of inputs, but invalid results asymptotically. With this change we ensure correct asymptotic results on s390x Fixes #26477 Change-Id: I760c1f8177f7cab2d7622ab9a926dfb1f8113b49 Reviewed-on: https://go-review.googlesource.com/127119 Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-08-03 01:21:31 +00:00
Brian Kessler	30b045d4d1	math/big: handle negative exponents in Exp For modular exponentiation, negative exponents can be handled using the following relation. for y < 0: xy mod m == (x(-1))**\|y\| mod m First compute ModInverse(x, m) and then compute the exponentiation with the absolute value of the exponent. Non-modular exponentiation with a negative exponent still returns 1. Fixes #25865 Change-Id: I2a35986a24794b48e549c8de935ac662d217d8a0 Reviewed-on: https://go-review.googlesource.com/118562 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2018-06-14 22:26:30 +00:00
Brian Kessler	d31cad7ca5	math/big: round x + (-x) to -0 for mode ToNegativeInf Handling of sign bit as defined by IEEE 754-2008, section 6.3: When the sum of two operands with opposite signs (or the difference of two operands with like signs) is exactly zero, the sign of that sum (or difference) shall be +0 in all rounding-direction attributes except roundTowardNegative; under that attribute, the sign of an exact zero sum (or difference) shall be −0. However, x+x = x−(−x) retains the same sign as x even when x is zero. This change handles the special case of Add/Sub resulting in exactly zero when the rounding mode is ToNegativeInf setting the sign bit accordingly. Fixes #25798 Change-Id: I4d0715fa3c3e4a3d8a4d7861dc1d6423c8b1c68c Reviewed-on: https://go-review.googlesource.com/117495 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2018-06-14 21:07:01 +00:00
Andrii Soldatenko	efddc161d2	math: add examples to Ceil, Floor, Pow, Pow10 functions Change-Id: I9154df128b349c102854bb0f21e4c313685dd0e6 Reviewed-on: https://go-review.googlesource.com/118659 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-06-13 22:01:28 +00:00
Tim Cooper	161874da2a	all: update comment URLs from HTTP to HTTPS, where possible Each URL was manually verified to ensure it did not serve up incorrect content. Change-Id: I4dc846227af95a73ee9a3074d0c379ff0fa955df Reviewed-on: https://go-review.googlesource.com/115798 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org>	2018-06-01 21:52:00 +00:00
Brian Kessler	85f4051731	math/big: implement Atkin's ModSqrt for 5 mod 8 primes For primes congruent to 5 mod 8 there is a simple deterministic method for calculating the modular square root due to Atkin, using one exponentiation and 4 multiplications. A. Atkin. Probabilistic primality testing, summary by F. Morain. Research Report 1779, INRIA, pages 159–163, 1992. This increases the speed of modular square roots for these primes considerably. name old time/op new time/op delta ModSqrt231_5Mod8-4 1.03ms ± 2% 0.36ms ± 5% -65.06% (p=0.008 n=5+5) Change-Id: I024f6e514bbca8d634218983117db2afffe615fe Reviewed-on: https://go-review.googlesource.com/99615 Reviewed-by: Robert Griesemer <gri@golang.org>	2018-05-30 16:28:46 +00:00
Tim Cooper	555eb70db2	all: regenerate stringer files Change-Id: I34838320047792c4719837591e848b87ccb7f5ab Reviewed-on: https://go-review.googlesource.com/115058 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-05-29 20:35:41 +00:00
Alexander Döring	1a5d0f83c9	math/big: reduce allocations in Karatsuba case of sqr For #23221. Change-Id: If55dcf2e0706d6658f4a0863e3740437e008706c Reviewed-on: https://go-review.googlesource.com/114335 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2018-05-23 23:01:10 +00:00
Alexander Döring	1dc20e9124	math/big: specialize Karatsuba implementation for squaring Currently we use three different algorithms for squaring: 1. basic multiplication for small numbers 2. basic squaring for medium numbers 3. Karatsuba multiplication for large numbers Change 3. to a version of Karatsuba multiplication specialized for x == y. Increasing the performance of 3. lets us lower the threshold between 2. and 3. Adapt TestCalibrate to the change that 3. isn't independent of the threshold between 1. and 2. any more. Fixes #23221. benchstat old.txt new.txt name old time/op new time/op delta NatSqr/1-4 29.6ns ± 7% 29.5ns ± 5% ~ (p=0.103 n=50+50) NatSqr/2-4 51.9ns ± 1% 51.9ns ± 1% ~ (p=0.693 n=42+49) NatSqr/3-4 64.3ns ± 1% 64.1ns ± 0% -0.26% (p=0.000 n=46+43) NatSqr/5-4 93.5ns ± 2% 93.1ns ± 1% -0.39% (p=0.000 n=48+49) NatSqr/8-4 131ns ± 1% 131ns ± 1% ~ (p=0.870 n=46+49) NatSqr/10-4 175ns ± 1% 175ns ± 1% +0.38% (p=0.000 n=49+47) NatSqr/20-4 426ns ± 1% 429ns ± 1% +0.84% (p=0.000 n=46+48) NatSqr/30-4 702ns ± 2% 699ns ± 1% -0.38% (p=0.011 n=46+44) NatSqr/50-4 1.44µs ± 2% 1.43µs ± 1% -0.54% (p=0.010 n=48+48) NatSqr/80-4 2.85µs ± 1% 2.87µs ± 1% +0.68% (p=0.000 n=47+47) NatSqr/100-4 4.06µs ± 1% 4.07µs ± 1% +0.29% (p=0.000 n=46+45) NatSqr/200-4 13.4µs ± 1% 13.5µs ± 1% +0.73% (p=0.000 n=48+48) NatSqr/300-4 28.5µs ± 1% 28.2µs ± 1% -1.22% (p=0.000 n=46+48) NatSqr/500-4 81.9µs ± 1% 67.0µs ± 1% -18.25% (p=0.000 n=48+48) NatSqr/800-4 161µs ± 1% 140µs ± 1% -13.29% (p=0.000 n=47+48) NatSqr/1000-4 245µs ± 1% 207µs ± 1% -15.17% (p=0.000 n=49+49) go test -v -calibrate --run TestCalibrate ... Calibrating threshold between basicSqr(x) and karatsubaSqr(x) Looking for a timing difference for x between 200 - 500 words by 10 step words = 200 deltaT = -980ns ( -7%) is karatsubaSqr(x) better: false words = 210 deltaT = -773ns ( -5%) is karatsubaSqr(x) better: false words = 220 deltaT = -695ns ( -4%) is karatsubaSqr(x) better: false words = 230 deltaT = -570ns ( -3%) is karatsubaSqr(x) better: false words = 240 deltaT = -458ns ( -2%) is karatsubaSqr(x) better: false words = 250 deltaT = -63ns ( 0%) is karatsubaSqr(x) better: false words = 260 deltaT = 118ns ( 0%) is karatsubaSqr(x) better: true threshold found words = 270 deltaT = 377ns ( 1%) is karatsubaSqr(x) better: true words = 280 deltaT = 765ns ( 3%) is karatsubaSqr(x) better: true words = 290 deltaT = 673ns ( 2%) is karatsubaSqr(x) better: true words = 300 deltaT = 502ns ( 1%) is karatsubaSqr(x) better: true words = 310 deltaT = 629ns ( 2%) is karatsubaSqr(x) better: true words = 320 deltaT = 1.011µs ( 3%) is karatsubaSqr(x) better: true words = 330 deltaT = 1.36µs ( 4%) is karatsubaSqr(x) better: true words = 340 deltaT = 3.001µs ( 8%) is karatsubaSqr(x) better: true words = 350 deltaT = 3.178µs ( 8%) is karatsubaSqr(x) better: true ... Change-Id: I6f13c23d94d042539ac28e77fd2618cdc37a429e Reviewed-on: https://go-review.googlesource.com/105075 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2018-05-23 20:46:02 +00:00
Agniva De Sarker	f94b5a8105	math/rand: clarify documentation for Seed example Fixes #25325 Change-Id: I101641be99a820722edb7272918e04e8d2e1646c Reviewed-on: https://go-review.googlesource.com/112775 Reviewed-by: Rob Pike <r@golang.org>	2018-05-12 06:21:01 +00:00
Brian Kessler	50649a967c	math/big: implement Lehmer's extended GCD algorithm Updates #15833 The extended GCD algorithm can be implemented using Lehmer's algorithm with additional updates for the cosequences following Algorithm 10.45 from Cohen et al. "Handbook of Elliptic and Hyperelliptic Curve Cryptography" pp 192. This brings the speed of the extended GCD calculation within ~2x of the base GCD calculation. There is a slight degradation in the non-extended GCD speed for small inputs (1-2 words) due to the additional code to handle the extended updates. name old time/op new time/op delta GCD10x10/WithoutXY-4 262ns ± 1% 266ns ± 2% ~ (p=0.333 n=5+5) GCD10x10/WithXY-4 1.42µs ± 2% 0.74µs ± 3% -47.90% (p=0.008 n=5+5) GCD10x100/WithoutXY-4 520ns ± 2% 539ns ± 1% +3.81% (p=0.008 n=5+5) GCD10x100/WithXY-4 2.32µs ± 1% 1.67µs ± 0% -27.80% (p=0.008 n=5+5) GCD10x1000/WithoutXY-4 1.40µs ± 1% 1.45µs ± 2% +3.26% (p=0.016 n=4+5) GCD10x1000/WithXY-4 4.78µs ± 1% 3.43µs ± 1% -28.37% (p=0.008 n=5+5) GCD10x10000/WithoutXY-4 10.0µs ± 0% 10.2µs ± 3% +1.80% (p=0.008 n=5+5) GCD10x10000/WithXY-4 20.9µs ± 3% 17.9µs ± 1% -14.20% (p=0.008 n=5+5) GCD10x100000/WithoutXY-4 96.8µs ± 0% 96.3µs ± 1% ~ (p=0.310 n=5+5) GCD10x100000/WithXY-4 196µs ± 3% 159µs ± 2% -18.61% (p=0.008 n=5+5) GCD100x100/WithoutXY-4 2.53µs ±15% 2.34µs ± 0% -7.35% (p=0.008 n=5+5) GCD100x100/WithXY-4 19.3µs ± 0% 3.9µs ± 1% -79.58% (p=0.008 n=5+5) GCD100x1000/WithoutXY-4 4.23µs ± 0% 4.17µs ± 3% ~ (p=0.127 n=5+5) GCD100x1000/WithXY-4 22.8µs ± 1% 7.5µs ±10% -67.00% (p=0.008 n=5+5) GCD100x10000/WithoutXY-4 19.1µs ± 0% 19.0µs ± 0% ~ (p=0.095 n=5+5) GCD100x10000/WithXY-4 75.1µs ± 2% 30.5µs ± 2% -59.38% (p=0.008 n=5+5) GCD100x100000/WithoutXY-4 170µs ± 5% 167µs ± 1% ~ (p=1.000 n=5+5) GCD100x100000/WithXY-4 542µs ± 2% 267µs ± 2% -50.79% (p=0.008 n=5+5) GCD1000x1000/WithoutXY-4 28.0µs ± 0% 27.1µs ± 0% -3.29% (p=0.008 n=5+5) GCD1000x1000/WithXY-4 329µs ± 0% 42µs ± 1% -87.12% (p=0.008 n=5+5) GCD1000x10000/WithoutXY-4 47.2µs ± 0% 46.4µs ± 0% -1.65% (p=0.016 n=5+4) GCD1000x10000/WithXY-4 607µs ± 9% 123µs ± 1% -79.70% (p=0.008 n=5+5) GCD1000x100000/WithoutXY-4 260µs ±17% 245µs ± 0% ~ (p=0.056 n=5+5) GCD1000x100000/WithXY-4 3.64ms ± 1% 0.93ms ± 1% -74.41% (p=0.016 n=4+5) GCD10000x10000/WithoutXY-4 513µs ± 0% 507µs ± 0% -1.22% (p=0.008 n=5+5) GCD10000x10000/WithXY-4 7.44ms ± 1% 1.00ms ± 0% -86.58% (p=0.008 n=5+5) GCD10000x100000/WithoutXY-4 1.23ms ± 0% 1.23ms ± 1% ~ (p=0.056 n=5+5) GCD10000x100000/WithXY-4 37.3ms ± 0% 7.3ms ± 1% -80.45% (p=0.008 n=5+5) GCD100000x100000/WithoutXY-4 24.2ms ± 0% 24.2ms ± 0% ~ (p=0.841 n=5+5) GCD100000x100000/WithXY-4 505ms ± 1% 56ms ± 1% -88.92% (p=0.008 n=5+5) Change-Id: I25f42ab8c55033acb83cc32bb03c12c1963925e8 Reviewed-on: https://go-review.googlesource.com/78755 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-05-08 17:24:36 +00:00
Richard Musiol	b00f72e08a	math, math/big: add wasm architecture This commit adds the wasm architecture to the math package. Updates #18892 Change-Id: I5cc38552a31b193d35fb81ae87600a76b8b9e9b5 Reviewed-on: https://go-review.googlesource.com/106996 Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-05-08 13:29:22 +00:00
Martin Möhrmann	8c4170b2c9	math/bits: move tests into their own package This makes math/bits not have any explicit imports even when compiling tests and thereby avoids import cycles when dependencies of testing want to import math/bits. Change-Id: I95eccae2f5c4310e9b18124abfa85212dfbd9daa Reviewed-on: https://go-review.googlesource.com/110479 Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-05-01 15:33:01 +00:00
Brian Kessler	6f7ec484f6	math/big: handle negative moduli in ModInverse Currently, there is no check for a negative modulus in ModInverse. Negative moduli are passed internally to GCD, which returns 0 for negative arguments. Mod is symmetric with respect to negative moduli, so the calculation can be done by just negating the modulus before passing the arguments to GCD. Fixes #24949 Change-Id: Ifd1e64c9b2343f0489c04ab65504e73a623378c7 Reviewed-on: https://go-review.googlesource.com/108115 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org>	2018-05-01 00:04:28 +00:00
Brian Kessler	4d44a87243	math/big: return nil for nonexistent ModInverse Currently, the behavior of z.ModInverse(g, n) is undefined when g and n are not relatively prime. In that case, no ModInverse exists which can be easily checked during the computation of the ModInverse. Because the ModInverse does not indicate whether the inverse exists, there are reimplementations of a "checked" ModInverse in crypto/rsa. This change removes the undefined behavior. If the ModInverse does not exist, the receiver z is unchanged and the return value is nil. This matches the behavior of ModSqrt for the case where the square root does not exist. name old time/op new time/op delta ModInverse-4 2.40µs ± 4% 2.22µs ± 0% -7.74% (p=0.016 n=5+4) name old alloc/op new alloc/op delta ModInverse-4 1.36kB ± 0% 1.17kB ± 0% -14.12% (p=0.008 n=5+5) name old allocs/op new allocs/op delta ModInverse-4 10.0 ± 0% 9.0 ± 0% -10.00% (p=0.008 n=5+5) Fixes #24922 Change-Id: If7f9d491858450bdb00f1e317152f02493c9c8a8 Reviewed-on: https://go-review.googlesource.com/108996 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2018-04-30 23:45:27 +00:00
erifan01	6d5ebc7022	math: add a testcase for Mod and Remainder respectively One might try to implement the Mod or Remainder function with the expression x - TRUNC(x/y + 0.5)*y, but in fact this method is wrong, because the rounding of (x/y + 0.5) to initialize the argument of TRUNC may lose too much precision. However, the current test cases can not detect this error. This CL adds two test cases to prevent people from continuing to do such attempts. Change-Id: I6690f5cffb21bf8ae06a314b7a45cafff8bcee13 Reviewed-on: https://go-review.googlesource.com/84275 Reviewed-by: Robert Griesemer <gri@golang.org>	2018-04-17 03:17:22 +00:00
weeellz	a2ffe3e625	math/rand: refactor rng.go Made constant names more idiomatic, moved some constants to function seedrand, and found better name for _M. Change-Id: I192172f398378bef486a5bbceb6ba86af48ebcc9 Reviewed-on: https://go-review.googlesource.com/107135 Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-04-16 22:22:14 +00:00
Cherry Zhang	b08a9b7ecc	all: use new softfloat on GOARM=5 Use the new softfloat support in the compiler, originally added for softfloat on MIPS. This support is portable, so we can just use it for softfloat on ARM. In the old softfloat support on ARM, the compiler generates floating point instructions, then the assembler inserts calls to _sfloat before FP instructions. _sfloat decodes the following FP instructions and simulates them. In the new scheme, the compiler generates runtime calls to do FP operations at a higher level. It doesn't generate FP instructions, and therefore the assembler won't insert _sfloat calls, i.e. the old mechanism is automatically suppressed. The old method may be still be triggered with assembly code using FP instructions. In the standard library, the only occurance is math/sqrt_arm.s, which is rewritten to call to the Go implementation instead. Some significant speedups for code using floating points: name old time/op new time/op delta BinaryTree17-4 37.1s ± 2% 37.3s ± 1% ~ (p=0.105 n=10+10) Fannkuch11-4 13.0s ± 0% 13.1s ± 0% +0.46% (p=0.000 n=10+10) FmtFprintfEmpty-4 700ns ± 4% 734ns ± 6% +4.84% (p=0.009 n=10+10) FmtFprintfString-4 1.22µs ± 3% 1.22µs ± 4% ~ (p=0.897 n=10+10) FmtFprintfInt-4 1.27µs ± 2% 1.30µs ± 1% +1.91% (p=0.001 n=10+9) FmtFprintfIntInt-4 1.83µs ± 2% 1.81µs ± 3% ~ (p=0.149 n=10+10) FmtFprintfPrefixedInt-4 1.80µs ± 3% 1.81µs ± 2% ~ (p=0.421 n=10+8) FmtFprintfFloat-4 6.89µs ± 3% 3.59µs ± 2% -47.93% (p=0.000 n=10+10) FmtManyArgs-4 6.39µs ± 1% 6.09µs ± 1% -4.61% (p=0.000 n=10+9) GobDecode-4 109ms ± 2% 81ms ± 2% -25.99% (p=0.000 n=9+10) GobEncode-4 109ms ± 2% 76ms ± 2% -29.88% (p=0.000 n=10+9) Gzip-4 3.61s ± 1% 3.59s ± 1% ~ (p=0.247 n=10+10) Gunzip-4 449ms ± 4% 450ms ± 1% ~ (p=0.230 n=10+7) HTTPClientServer-4 1.55ms ± 3% 1.53ms ± 2% ~ (p=0.400 n=9+10) JSONEncode-4 356ms ± 1% 183ms ± 1% -48.73% (p=0.000 n=10+10) JSONDecode-4 1.12s ± 2% 0.87s ± 1% -21.88% (p=0.000 n=10+10) Mandelbrot200-4 5.49s ± 1% 2.55s ± 1% -53.45% (p=0.000 n=9+10) GoParse-4 49.6ms ± 2% 47.5ms ± 1% -4.08% (p=0.000 n=10+9) RegexpMatchEasy0_32-4 1.13µs ± 4% 1.20µs ± 4% +6.42% (p=0.000 n=10+10) RegexpMatchEasy0_1K-4 4.41µs ± 2% 4.44µs ± 2% ~ (p=0.128 n=10+10) RegexpMatchEasy1_32-4 1.15µs ± 5% 1.20µs ± 5% +4.85% (p=0.002 n=10+10) RegexpMatchEasy1_1K-4 6.21µs ± 2% 6.37µs ± 4% +2.62% (p=0.001 n=9+10) RegexpMatchMedium_32-4 1.58µs ± 5% 1.65µs ± 3% +4.85% (p=0.000 n=10+10) RegexpMatchMedium_1K-4 341µs ± 3% 351µs ± 7% ~ (p=0.573 n=8+10) RegexpMatchHard_32-4 21.4µs ± 3% 21.5µs ± 5% ~ (p=0.931 n=9+9) RegexpMatchHard_1K-4 626µs ± 2% 626µs ± 1% ~ (p=0.645 n=8+8) Revcomp-4 46.4ms ± 2% 47.4ms ± 2% +2.07% (p=0.000 n=10+10) Template-4 1.31s ± 3% 1.23s ± 4% -6.13% (p=0.000 n=10+10) TimeParse-4 4.49µs ± 1% 4.41µs ± 2% -1.81% (p=0.000 n=10+9) TimeFormat-4 9.31µs ± 1% 9.32µs ± 2% ~ (p=0.561 n=9+9) Change-Id: Iaeeff6c9a09c1b2c064d06e09dd88101dc02bfa4 Reviewed-on: https://go-review.googlesource.com/106735 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-04-13 16:39:39 +00:00
Brian Kessler	7818b82fc8	math/big: clean up z.div(z, x, y) calls Updates #22830 Due to not checking if the output slices alias in divLarge, calls of the form z.div(z, x, y) caused the slice z to attempt to be used to store both the quotient and the remainder of the division. CL 78995 applies an alias check to correct that error. This CL cleans up the additional div calls that attempt to supply the same slice to hold both the quotient and remainder. Note that the call in expNN was responsible for the reported error in r.Exp(x, 1, m) when r was initialized to a non-zero value. The second instance in expNNMontgomery did not result in an error due to the size of the arguments. // RR = 2*(2_Wlen(m)) mod m RR := nat(nil).setWord(1) zz := nat(nil).shl(RR, uint(2numWords_W)) _, RR = RR.div(RR, zz, m) Specifically, cap(RR) == 5 after setWord(1) due to const e = 4 in z.make(1) len(zz) == 2len(m) + 1 after shifting left, numWords = len(m) Reusing the backing array for z and z2 in div was only triggered if cap(RR) >= len(zz) + 1 and len(m) > 1 so that divLarge was called. But, 5 < 2*len(m) + 2 if len(m) > 1, so new arrays were allocated and the error was never triggered in this case. Change-Id: Iedac80dbbde13216c94659e84d28f6f4be3aaf24 Reviewed-on: https://go-review.googlesource.com/81055 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2018-04-05 22:02:33 +00:00
Carlos Eduardo Seo	fc8967e384	math/big: improve performance on ppc64x by unrolling loops This change improves performance of addVV, subVV and mulAddVWW by unrolling the loops, with improvements up to 1.45x. benchmark old ns/op new ns/op delta BenchmarkAddVV/1-16 5.79 5.85 +1.04% BenchmarkAddVV/2-16 6.41 6.62 +3.28% BenchmarkAddVV/3-16 6.89 7.35 +6.68% BenchmarkAddVV/4-16 7.47 8.26 +10.58% BenchmarkAddVV/5-16 8.04 8.18 +1.74% BenchmarkAddVV/10-16 10.9 11.2 +2.75% BenchmarkAddVV/100-16 81.7 57.0 -30.23% BenchmarkAddVV/1000-16 714 500 -29.97% BenchmarkAddVV/10000-16 7088 4946 -30.22% BenchmarkAddVV/100000-16 71514 49364 -30.97% BenchmarkSubVV/1-16 5.94 5.89 -0.84% BenchmarkSubVV/2-16 12.9 6.82 -47.13% BenchmarkSubVV/3-16 7.03 7.34 +4.41% BenchmarkSubVV/4-16 7.58 8.23 +8.58% BenchmarkSubVV/5-16 8.15 8.19 +0.49% BenchmarkSubVV/10-16 11.2 11.4 +1.79% BenchmarkSubVV/100-16 82.4 57.0 -30.83% BenchmarkSubVV/1000-16 715 499 -30.21% BenchmarkSubVV/10000-16 7089 4947 -30.22% BenchmarkSubVV/100000-16 71568 49378 -31.01% benchmark old MB/s new MB/s speedup BenchmarkAddVV/1-16 11048.49 10939.92 0.99x BenchmarkAddVV/2-16 19973.41 19323.60 0.97x BenchmarkAddVV/3-16 27847.09 26123.06 0.94x BenchmarkAddVV/4-16 34276.46 30976.54 0.90x BenchmarkAddVV/5-16 39781.92 39140.68 0.98x BenchmarkAddVV/10-16 58559.29 56894.68 0.97x BenchmarkAddVV/100-16 78354.88 112243.69 1.43x BenchmarkAddVV/1000-16 89592.74 127889.04 1.43x BenchmarkAddVV/10000-16 90292.39 129387.06 1.43x BenchmarkAddVV/100000-16 89492.92 129647.78 1.45x BenchmarkSubVV/1-16 10781.03 10861.22 1.01x BenchmarkSubVV/2-16 9949.27 18760.21 1.89x BenchmarkSubVV/3-16 27319.40 26166.01 0.96x BenchmarkSubVV/4-16 33764.35 31123.02 0.92x BenchmarkSubVV/5-16 39272.40 39050.31 0.99x BenchmarkSubVV/10-16 57262.87 56206.33 0.98x BenchmarkSubVV/100-16 77641.78 112280.86 1.45x BenchmarkSubVV/1000-16 89486.27 128064.08 1.43x BenchmarkSubVV/10000-16 90274.37 129356.59 1.43x BenchmarkSubVV/100000-16 89424.42 129610.50 1.45x Change-Id: I2795a82134d1e3b75e2634c76b8ca165a723ec7b Reviewed-on: https://go-review.googlesource.com/103495 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2018-04-05 20:00:30 +00:00
Robert Griesemer	542ea5ad91	go/printer, gofmt: tuned table alignment for better results The go/printer (and thus gofmt) uses a heuristic to determine whether to break alignment between elements of an expression list which is spread across multiple lines. The heuristic only kicked in if the entry sizes (character length) was above a certain threshold (20) and the ratio between the previous and current entry size was above a certain value (4). This heuristic worked reasonably most of the time, but also led to unfortunate breaks in many cases where a single entry was suddenly much smaller (or larger) then the previous one. The behavior of gofmt was sufficiently mysterious in some of these situations that many issues were filed against it. The simplest solution to address this problem is to remove the heuristic altogether and have a programmer introduce empty lines to force different alignments if it improves readability. The problem with that approach is that the places where it really matters, very long tables with many (hundreds, or more) entries, may be machine-generated and not "post-processed" by a human (e.g., unicode/utf8/tables.go). If a single one of those entries is overlong, the result would be that the alignment would force all comments or values in key:value pairs to be adjusted to that overlong value, making the table hard to read (e.g., that entry may not even be visible on screen and all other entries seem spaced out too wide). Instead, we opted for a slightly improved heuristic that behaves much better for "normal", human-written code. 1) The threshold is increased from 20 to 40. This disables the heuristic for many common cases yet even if the alignment is not "ideal", 40 is not that many characters per line with todays screens, making it very likely that the entire line remains "visible" in an editor. 2) Changed the heuristic to not simply look at the size ratio between current and previous line, but instead considering the geometric mean of the sizes of the previous (aligned) lines. This emphasizes the "overall picture" of the previous lines, rather than a single one (which might be an outlier). 3) Changed the ratio from 4 to 2.5. Now that we ignore sizes below 40, a ratio of 4 would mean that a new entry would have to be 4 times bigger (160) or smaller (10) before alignment would be broken. A ratio of 2.5 seems more sensible. Applied updated gofmt to all of src and misc. Also tested against several former issues that complained about this and verified that the output for the given examples is satisfactory (added respective test cases). Some of the files changed because they were not gofmt-ed in the first place. For #644. For #7335. For #10392. (and probably more related issues) Fixes #22852. Change-Id: I5e48b3d3b157a5cf2d649833b7297b33f43a6f6e	2018-04-04 13:39:34 -07:00
isharipo	02952ad7a8	math/big: remove "else" from if with block that ends with return That "else" was needed due to gc DCE limitations. Now it's not the case and we can avoid go lint complaints. (See #23521 and https://golang.org/cl/91056.) There is inlining test for bigEndianWord, so if test is passing, no performance regression should occur. Change-Id: Id84d63f361e5e51a52293904ff042966c83c16e9 Reviewed-on: https://go-review.googlesource.com/104555 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>	2018-04-03 19:55:04 +00:00
Michael Munday	32e6461dc6	cmd/asm, math: add s390x floating point test instructions Floating point test instructions allow special cases (NaN, ±∞ and a few other useful properties) to be checked directly. This CL adds the following instructions to the assembler: * LTEBR - load and test (float32) * LTDBR - load and test (float64) * TCEB - test data class (float32) * TCDB - test data class (float64) Note that I have only added immediate versions of the 'test data class' instructions for now as that's the only case I think the compiler will use. Change-Id: I3398aab2b3a758bf909bd158042234030c8af582 Reviewed-on: https://go-review.googlesource.com/104457 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-04-03 16:08:04 +00:00
Wèi Cōngruì	4b265fb747	math: fix Ldexp when result is below ldexp(2, -1075) Before this change, the smallest result Ldexp can handle was ldexp(2, -1075), which is SmallestNonzeroFloat64. There are some numbers below it should also be rounded to SmallestNonzeroFloat64. The change fixes this. Fixes #23407 Change-Id: I76f4cb005a6e9ccdd95b5e5c734079fd5d29e4aa Reviewed-on: https://go-review.googlesource.com/87338 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2018-03-29 23:14:13 +00:00
Wèi Cōngruì	7fe2f549cc	math: handle denormals in AMD64 Exp Fixes #23164 Change-Id: I6e8c6443f3ef91df71e117cce1cfa1faba647dd7 Reviewed-on: https://go-review.googlesource.com/87337 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-29 07:03:09 +00:00
erifan01	711a373cc3	math: optimize Exp and Exp2 on arm64 This CL implements Exp and Exp2 with arm64 assembly. By inlining Ldexp and using fused instructions(fmadd, fmsub, fnmsub), this CL helps to improve the performance of functions Exp, Exp2, Sinh, Cosh and Tanh. Benchmarks: name old time/op new time/op delta Cosh-8 138ns ± 0% 96ns ± 0% -30.72% (p=0.008 n=5+5) Exp-8 105ns ± 0% 58ns ± 0% -45.24% (p=0.000 n=5+4) Exp2-8 100ns ± 0% 57ns ± 0% -43.21% (p=0.008 n=5+5) Sinh-8 139ns ± 0% 102ns ± 0% -26.62% (p=0.008 n=5+5) Tanh-8 134ns ± 0% 100ns ± 0% -25.67% (p=0.008 n=5+5) Change-Id: I7483a3333062a1d3525cedf3de56db78d79031c6 Reviewed-on: https://go-review.googlesource.com/86615 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2018-03-27 19:55:02 +00:00
Carlos Eduardo Seo	a44c72823c	math/big: improve performance of addVW/subVW for ppc64x This change adds a better implementation in asm for addVW/subVW for ppc64x, with speedups up to 3.11x. benchmark old ns/op new ns/op delta BenchmarkAddVW/1-16 6.87 5.71 -16.89% BenchmarkAddVW/2-16 7.72 5.94 -23.06% BenchmarkAddVW/3-16 8.74 6.56 -24.94% BenchmarkAddVW/4-16 9.66 7.26 -24.84% BenchmarkAddVW/5-16 10.8 7.26 -32.78% BenchmarkAddVW/10-16 17.4 9.97 -42.70% BenchmarkAddVW/100-16 164 56.0 -65.85% BenchmarkAddVW/1000-16 1638 524 -68.01% BenchmarkAddVW/10000-16 16421 5201 -68.33% BenchmarkAddVW/100000-16 165762 53324 -67.83% BenchmarkSubVW/1-16 6.76 5.62 -16.86% BenchmarkSubVW/2-16 7.69 6.02 -21.72% BenchmarkSubVW/3-16 8.85 6.61 -25.31% BenchmarkSubVW/4-16 10.0 7.34 -26.60% BenchmarkSubVW/5-16 11.3 7.33 -35.13% BenchmarkSubVW/10-16 19.5 18.7 -4.10% BenchmarkSubVW/100-16 153 55.9 -63.46% BenchmarkSubVW/1000-16 1502 519 -65.45% BenchmarkSubVW/10000-16 15005 5165 -65.58% BenchmarkSubVW/100000-16 150620 53124 -64.73% benchmark old MB/s new MB/s speedup BenchmarkAddVW/1-16 1165.12 1400.76 1.20x BenchmarkAddVW/2-16 2071.39 2693.25 1.30x BenchmarkAddVW/3-16 2744.72 3656.92 1.33x BenchmarkAddVW/4-16 3311.63 4407.34 1.33x BenchmarkAddVW/5-16 3700.52 5512.48 1.49x BenchmarkAddVW/10-16 4605.63 8026.37 1.74x BenchmarkAddVW/100-16 4856.15 14296.76 2.94x BenchmarkAddVW/1000-16 4883.96 15264.21 3.13x BenchmarkAddVW/10000-16 4871.52 15380.78 3.16x BenchmarkAddVW/100000-16 4826.17 15002.48 3.11x BenchmarkSubVW/1-16 1183.20 1423.03 1.20x BenchmarkSubVW/2-16 2081.92 2657.44 1.28x BenchmarkSubVW/3-16 2711.52 3632.30 1.34x BenchmarkSubVW/4-16 3198.30 4360.30 1.36x BenchmarkSubVW/5-16 3534.43 5460.40 1.54x BenchmarkSubVW/10-16 4106.34 4273.51 1.04x BenchmarkSubVW/100-16 5213.48 14306.32 2.74x BenchmarkSubVW/1000-16 5324.27 15391.21 2.89x BenchmarkSubVW/10000-16 5331.33 15486.57 2.90x BenchmarkSubVW/100000-16 5311.35 15059.01 2.84x Change-Id: Ibaa5b9b38d63fba8e01a9c327eb8bef1e6e908c1 Reviewed-on: https://go-review.googlesource.com/101975 Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2018-03-27 15:06:53 +00:00
Vlad Krasnov	26d74e8b65	math/big: reduce amount of copying in Montgomery multiplication Instead shifting the accumulator every iteration of the loop, shift once in the end. This significantly improves performance on arm64. On arm64: name old time/op new time/op delta RSA2048Decrypt 3.33ms ± 0% 2.63ms ± 0% -20.94% (p=0.000 n=11+11) RSA2048Sign 4.22ms ± 0% 3.55ms ± 0% -15.89% (p=0.000 n=11+11) 3PrimeRSA2048Decrypt 1.95ms ± 0% 1.59ms ± 0% -18.59% (p=0.000 n=11+11) On Skylake: name old time/op new time/op delta RSA2048Decrypt-8 1.73ms ± 2% 1.55ms ± 2% -10.19% (p=0.000 n=10+10) RSA2048Sign-8 2.17ms ± 2% 2.00ms ± 2% -7.93% (p=0.000 n=10+10) 3PrimeRSA2048Decrypt-8 1.10ms ± 2% 0.96ms ± 2% -13.03% (p=0.000 n=10+9) Change-Id: I5786191a1a09e4217fdb1acfd90880d35c5855f7 Reviewed-on: https://go-review.googlesource.com/99838 Run-TryBot: Vlad Krasnov <vlad@cloudflare.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Adam Langley <agl@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2018-03-19 21:40:56 +00:00
Alberto Donizetti	168cc7ff9c	math/big: add 0 shift fastpath to shl and shr One could expect calls like z.mant.shl(z.mant, shiftAmount) (or higher-level-functions calls that use lhs/rhs) to be almost free when shiftAmount = 0; and expect calls like z.mant.shl(x.mant, 0) to have the same cost of a x.mant -> z.mant copy. Neither of this things are currently true. For an 800 words nat, the first kind of calls cost ~800ns for rigth shifts and ~3.5µs for left shift; while the second kind of calls are doing more work than necessary by calling shlVU/shrVU. This change makes the first kind of calls ({Shl,Shr}Same) almost free, and the second kind of calls ({Shl,Shr}) about 30% faster. name old time/op new time/op delta ZeroShifts/Shl-4 3.64µs ± 3% 2.49µs ± 1% -31.55% (p=0.000 n=10+10) ZeroShifts/ShlSame-4 3.65µs ± 1% 0.01µs ± 1% -99.85% (p=0.000 n=9+9) ZeroShifts/Shr-4 3.65µs ± 1% 2.49µs ± 1% -31.91% (p=0.000 n=10+10) ZeroShifts/ShrSame-4 825ns ± 0% 6ns ± 1% -99.33% (p=0.000 n=9+10) During go test math/big, the shl zeroshift fastpath is triggered 1380 times; while the shr fastpath is triggered 153334 times(!). Change-Id: I5f92b304a40638bd8453a86c87c58e54b337bcdf Reviewed-on: https://go-review.googlesource.com/87660 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2018-03-19 18:01:37 +00:00
Robert Griesemer	7d4d2cb686	math/big: add comment about internal assumptions on nat values Change-Id: I7ed40507a019c0bf521ba748fc22c03d74bb17b7 Reviewed-on: https://go-review.googlesource.com/100719 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2018-03-14 18:31:05 +00:00
erifan01	140bfe9cfe	math/big: optimize shlVU and shrVU on arm64 This CL implements shlVU and shrVU with arm64 HW instructions "LDP" and "STP" to reduce load cost, it also removes unnecessary checks on the number of shifts for better performance. Benchmarks: name old time/op new time/op delta AddVV/1-8 21.6ns ± 1% 21.6ns ± 1% ~ (p=0.683 n=5+5) AddVV/2-8 13.5ns ± 0% 13.5ns ± 0% ~ (all equal) AddVV/3-8 15.5ns ± 0% 15.5ns ± 0% ~ (all equal) AddVV/4-8 17.5ns ± 0% 17.5ns ± 0% ~ (all equal) AddVV/5-8 19.5ns ± 0% 19.5ns ± 0% ~ (all equal) AddVV/10-8 29.5ns ± 0% 29.5ns ± 0% ~ (all equal) AddVV/100-8 217ns ± 0% 217ns ± 0% ~ (all equal) AddVV/1000-8 2.02µs ± 0% 2.03µs ± 0% +0.73% (p=0.008 n=5+5) AddVV/10000-8 20.5µs ± 0% 20.5µs ± 0% -0.01% (p=0.008 n=5+5) AddVV/100000-8 246µs ± 5% 250µs ± 4% ~ (p=0.548 n=5+5) AddVW/1-8 9.26ns ± 0% 9.32ns ± 0% +0.65% (p=0.016 n=4+5) AddVW/2-8 19.8ns ± 3% 19.8ns ± 0% ~ (p=0.143 n=5+5) AddVW/3-8 11.5ns ± 0% 11.5ns ± 0% ~ (all equal) AddVW/4-8 13.0ns ± 0% 13.0ns ± 0% ~ (all equal) AddVW/5-8 14.5ns ± 0% 14.5ns ± 0% ~ (all equal) AddVW/10-8 22.0ns ± 0% 22.0ns ± 0% ~ (all equal) AddVW/100-8 167ns ± 0% 166ns ± 0% -0.60% (p=0.000 n=5+4) AddVW/1000-8 1.52µs ± 0% 1.52µs ± 0% ~ (all equal) AddVW/10000-8 15.1µs ± 0% 15.1µs ± 0% +0.01% (p=0.008 n=5+5) AddVW/100000-8 163µs ± 4% 153µs ± 3% -5.97% (p=0.016 n=5+5) AddMulVVW/1-8 32.4ns ± 1% 33.0ns ± 1% +1.73% (p=0.040 n=5+5) AddMulVVW/2-8 56.4ns ± 2% 55.9ns ± 1% ~ (p=0.135 n=5+5) AddMulVVW/3-8 85.4ns ± 1% 85.1ns ± 0% ~ (p=0.079 n=5+5) AddMulVVW/4-8 129ns ± 1% 129ns ± 0% ~ (p=0.397 n=5+5) AddMulVVW/5-8 148ns ± 0% 148ns ± 0% ~ (all equal) AddMulVVW/10-8 270ns ± 0% 268ns ± 0% -0.74% (p=0.029 n=4+4) AddMulVVW/100-8 2.75µs ± 0% 2.75µs ± 0% -0.09% (p=0.008 n=5+5) AddMulVVW/1000-8 26.0µs ± 0% 26.0µs ± 0% -0.06% (p=0.024 n=5+5) AddMulVVW/10000-8 312µs ± 0% 312µs ± 0% -0.09% (p=0.008 n=5+5) AddMulVVW/100000-8 2.89ms ± 0% 2.89ms ± 0% +0.14% (p=0.016 n=5+5) DecimalConversion-8 315µs ± 1% 312µs ± 0% ~ (p=0.095 n=5+5) FloatString/100-8 2.56µs ± 1% 2.52µs ± 1% -1.31% (p=0.016 n=5+5) FloatString/1000-8 58.6µs ± 0% 58.2µs ± 0% -0.75% (p=0.008 n=5+5) FloatString/10000-8 4.59ms ± 0% 4.59ms ± 0% ~ (p=0.056 n=5+5) FloatString/100000-8 446ms ± 0% 446ms ± 0% -0.04% (p=0.008 n=5+5) FloatAdd/10-8 184ns ± 0% 178ns ± 0% -3.48% (p=0.008 n=5+5) FloatAdd/100-8 189ns ± 3% 178ns ± 2% -6.02% (p=0.008 n=5+5) FloatAdd/1000-8 371ns ± 0% 267ns ± 0% -27.99% (p=0.000 n=5+4) FloatAdd/10000-8 1.87µs ± 0% 1.03µs ± 0% -44.74% (p=0.008 n=5+5) FloatAdd/100000-8 17.1µs ± 0% 8.8µs ± 0% -48.71% (p=0.016 n=5+4) FloatSub/10-8 148ns ± 0% 146ns ± 0% -1.35% (p=0.000 n=5+4) FloatSub/100-8 148ns ± 0% 140ns ± 0% -5.41% (p=0.008 n=5+5) FloatSub/1000-8 242ns ± 0% 191ns ± 0% -21.24% (p=0.008 n=5+5) FloatSub/10000-8 1.07µs ± 0% 0.64µs ± 1% -39.89% (p=0.016 n=4+5) FloatSub/100000-8 9.48µs ± 0% 5.32µs ± 0% -43.87% (p=0.008 n=5+5) ParseFloatSmallExp-8 29.3µs ± 3% 28.6µs ± 1% ~ (p=0.310 n=5+5) ParseFloatLargeExp-8 125µs ± 1% 123µs ± 0% -1.99% (p=0.008 n=5+5) GCD10x10/WithoutXY-8 278ns ± 4% 289ns ± 5% +3.96% (p=0.040 n=5+5) GCD10x10/WithXY-8 2.12µs ± 2% 2.15µs ± 2% ~ (p=0.095 n=5+5) GCD10x100/WithoutXY-8 615ns ± 1% 629ns ± 5% ~ (p=0.135 n=5+5) GCD10x100/WithXY-8 3.42µs ± 1% 3.53µs ± 2% +3.38% (p=0.008 n=5+5) GCD10x1000/WithoutXY-8 1.39µs ± 1% 1.38µs ± 1% ~ (p=0.460 n=5+5) GCD10x1000/WithXY-8 7.47µs ± 2% 7.49µs ± 3% ~ (p=1.000 n=5+5) GCD10x10000/WithoutXY-8 8.71µs ± 1% 8.71µs ± 0% ~ (p=0.841 n=5+5) GCD10x10000/WithXY-8 28.4µs ± 2% 27.2µs ± 2% -4.24% (p=0.008 n=5+5) GCD10x100000/WithoutXY-8 78.9µs ± 1% 79.1µs ± 0% ~ (p=0.222 n=5+5) GCD10x100000/WithXY-8 240µs ± 1% 228µs ± 1% -4.98% (p=0.008 n=5+5) GCD100x100/WithoutXY-8 1.87µs ± 2% 1.89µs ± 1% ~ (p=0.095 n=5+5) GCD100x100/WithXY-8 26.6µs ± 1% 26.3µs ± 0% -1.14% (p=0.032 n=5+5) GCD100x1000/WithoutXY-8 4.44µs ± 2% 4.47µs ± 2% ~ (p=0.444 n=5+5) GCD100x1000/WithXY-8 36.7µs ± 1% 36.0µs ± 1% -1.96% (p=0.008 n=5+5) GCD100x10000/WithoutXY-8 22.8µs ± 1% 22.3µs ± 1% -2.52% (p=0.008 n=5+5) GCD100x10000/WithXY-8 145µs ± 1% 142µs ± 0% -1.88% (p=0.008 n=5+5) GCD100x100000/WithoutXY-8 198µs ± 1% 190µs ± 0% -4.06% (p=0.008 n=5+5) GCD100x100000/WithXY-8 1.11ms ± 0% 1.09ms ± 0% -1.87% (p=0.008 n=5+5) GCD1000x1000/WithoutXY-8 25.4µs ± 1% 25.0µs ± 1% -1.34% (p=0.008 n=5+5) GCD1000x1000/WithXY-8 515µs ± 1% 485µs ± 0% -5.85% (p=0.008 n=5+5) GCD1000x10000/WithoutXY-8 57.3µs ± 1% 56.2µs ± 1% -1.95% (p=0.008 n=5+5) GCD1000x10000/WithXY-8 1.21ms ± 0% 1.18ms ± 0% -2.65% (p=0.008 n=5+5) GCD1000x100000/WithoutXY-8 358µs ± 0% 352µs ± 1% -1.71% (p=0.008 n=5+5) GCD1000x100000/WithXY-8 8.72ms ± 0% 8.66ms ± 0% -0.71% (p=0.008 n=5+5) GCD10000x10000/WithoutXY-8 690µs ± 0% 687µs ± 1% ~ (p=0.095 n=5+5) GCD10000x10000/WithXY-8 16.0ms ± 0% 12.5ms ± 0% -22.01% (p=0.008 n=5+5) GCD10000x100000/WithoutXY-8 2.09ms ± 0% 2.07ms ± 0% -0.58% (p=0.008 n=5+5) GCD10000x100000/WithXY-8 86.8ms ± 0% 83.4ms ± 0% -3.95% (p=0.008 n=5+5) GCD100000x100000/WithoutXY-8 51.2ms ± 0% 51.2ms ± 0% ~ (p=0.548 n=5+5) GCD100000x100000/WithXY-8 1.25s ± 0% 0.89s ± 0% -28.98% (p=0.008 n=5+5) Hilbert-8 2.46ms ± 2% 2.53ms ± 1% +2.89% (p=0.032 n=5+5) Binomial-8 5.15µs ± 4% 4.92µs ± 1% -4.43% (p=0.032 n=5+5) QuoRem-8 7.10µs ± 0% 7.05µs ± 0% -0.59% (p=0.008 n=5+5) Exp-8 161ms ± 0% 161ms ± 0% -0.24% (p=0.008 n=5+5) Exp2-8 161ms ± 0% 161ms ± 0% -0.30% (p=0.016 n=4+5) Bitset-8 40.4ns ± 0% 40.3ns ± 0% ~ (p=0.159 n=5+5) BitsetNeg-8 158ns ± 4% 155ns ± 2% ~ (p=0.183 n=5+5) BitsetOrig-8 374ns ± 0% 383ns ± 1% +2.35% (p=0.008 n=5+5) BitsetNegOrig-8 620ns ± 1% 663ns ± 2% +7.00% (p=0.008 n=5+5) ModSqrt225_Tonelli-8 7.26ms ± 0% 7.27ms ± 0% ~ (p=0.841 n=5+5) ModSqrt224_3Mod4-8 2.24ms ± 0% 2.24ms ± 0% ~ (p=0.690 n=5+5) ModSqrt5430_Tonelli-8 62.3s ± 0% 62.4s ± 0% +0.15% (p=0.008 n=5+5) ModSqrt5430_3Mod4-8 20.8s ± 0% 20.8s ± 0% ~ (p=0.151 n=5+5) Sqrt-8 101µs ± 0% 97µs ± 0% -3.99% (p=0.008 n=5+5) IntSqr/1-8 32.7ns ± 1% 32.5ns ± 1% ~ (p=0.325 n=5+5) IntSqr/2-8 161ns ± 4% 160ns ± 4% ~ (p=0.659 n=5+5) IntSqr/3-8 296ns ± 7% 297ns ± 6% ~ (p=0.841 n=5+5) IntSqr/5-8 752ns ± 7% 755ns ± 6% ~ (p=0.889 n=5+5) IntSqr/8-8 1.91µs ± 3% 1.90µs ± 3% ~ (p=0.746 n=5+5) IntSqr/10-8 2.99µs ± 4% 3.00µs ± 4% ~ (p=0.516 n=5+5) IntSqr/20-8 6.29µs ± 2% 6.19µs ± 2% ~ (p=0.151 n=5+5) IntSqr/30-8 14.0µs ± 1% 13.8µs ± 2% ~ (p=0.056 n=5+5) IntSqr/50-8 38.1µs ± 3% 37.9µs ± 3% ~ (p=0.548 n=5+5) IntSqr/80-8 95.1µs ± 1% 94.7µs ± 1% ~ (p=0.310 n=5+5) IntSqr/100-8 148µs ± 1% 148µs ± 1% ~ (p=0.548 n=5+5) IntSqr/200-8 587µs ± 1% 587µs ± 1% ~ (p=1.000 n=5+5) IntSqr/300-8 1.31ms ± 1% 1.32ms ± 1% ~ (p=0.151 n=5+5) IntSqr/500-8 2.48ms ± 0% 2.49ms ± 0% ~ (p=0.310 n=5+5) IntSqr/800-8 4.68ms ± 0% 4.67ms ± 0% ~ (p=0.548 n=5+5) IntSqr/1000-8 7.57ms ± 0% 7.56ms ± 0% ~ (p=0.421 n=5+5) Mul-8 311ms ± 0% 311ms ± 0% ~ (p=0.151 n=5+5) Exp3Power/0x10-8 584ns ± 2% 573ns ± 1% ~ (p=0.190 n=5+5) Exp3Power/0x40-8 646ns ± 2% 649ns ± 1% ~ (p=0.690 n=5+5) Exp3Power/0x100-8 1.42µs ± 2% 1.45µs ± 1% +2.03% (p=0.032 n=5+5) Exp3Power/0x400-8 8.28µs ± 1% 8.39µs ± 0% +1.33% (p=0.008 n=5+5) Exp3Power/0x1000-8 60.1µs ± 0% 59.8µs ± 0% -0.44% (p=0.008 n=5+5) Exp3Power/0x4000-8 818µs ± 0% 816µs ± 0% -0.23% (p=0.008 n=5+5) Exp3Power/0x10000-8 7.79ms ± 0% 7.78ms ± 0% ~ (p=0.690 n=5+5) Exp3Power/0x40000-8 73.4ms ± 0% 73.3ms ± 0% ~ (p=0.151 n=5+5) Exp3Power/0x100000-8 665ms ± 0% 664ms ± 0% -0.16% (p=0.016 n=4+5) Exp3Power/0x400000-8 5.99s ± 0% 5.97s ± 0% -0.24% (p=0.008 n=5+5) Fibo-8 116ms ± 0% 117ms ± 0% +0.42% (p=0.008 n=5+5) NatSqr/1-8 113ns ± 2% 112ns ± 1% ~ (p=0.190 n=5+5) NatSqr/2-8 249ns ± 2% 250ns ± 2% ~ (p=0.365 n=5+5) NatSqr/3-8 379ns ± 1% 381ns ± 2% ~ (p=0.127 n=5+5) NatSqr/5-8 838ns ± 3% 841ns ± 5% ~ (p=0.754 n=5+5) NatSqr/8-8 1.97µs ± 3% 1.97µs ± 4% ~ (p=1.000 n=5+5) NatSqr/10-8 3.04µs ± 4% 3.04µs ± 4% ~ (p=1.000 n=5+5) NatSqr/20-8 6.49µs ± 3% 6.50µs ± 2% ~ (p=0.841 n=5+5) NatSqr/30-8 14.3µs ± 2% 14.2µs ± 2% ~ (p=0.548 n=5+5) NatSqr/50-8 38.5µs ± 3% 38.3µs ± 3% ~ (p=0.421 n=5+5) NatSqr/80-8 96.3µs ± 1% 96.1µs ± 1% ~ (p=0.421 n=5+5) NatSqr/100-8 149µs ± 1% 148µs ± 1% ~ (p=0.310 n=5+5) NatSqr/200-8 591µs ± 1% 592µs ± 1% ~ (p=0.690 n=5+5) NatSqr/300-8 1.31ms ± 1% 1.32ms ± 0% ~ (p=0.190 n=5+4) NatSqr/500-8 2.49ms ± 0% 2.49ms ± 0% ~ (p=0.095 n=5+5) NatSqr/800-8 4.70ms ± 0% 4.69ms ± 0% ~ (p=0.222 n=5+5) NatSqr/1000-8 7.60ms ± 0% 7.58ms ± 0% ~ (p=0.222 n=5+5) ScanPi-8 326µs ± 0% 327µs ± 1% ~ (p=0.222 n=5+5) StringPiParallel-8 71.4µs ± 5% 67.7µs ± 4% ~ (p=0.095 n=5+5) Scan/10/Base2-8 1.09µs ± 0% 1.10µs ± 1% ~ (p=0.810 n=5+5) Scan/100/Base2-8 7.79µs ± 0% 7.83µs ± 0% +0.53% (p=0.008 n=5+5) Scan/1000/Base2-8 78.9µs ± 0% 79.0µs ± 0% ~ (p=0.151 n=5+5) Scan/10000/Base2-8 1.22ms ± 0% 1.23ms ± 1% ~ (p=0.690 n=5+5) Scan/100000/Base2-8 55.1ms ± 0% 55.1ms ± 0% +0.10% (p=0.008 n=5+5) Scan/10/Base8-8 512ns ± 1% 534ns ± 1% +4.34% (p=0.008 n=5+5) Scan/100/Base8-8 2.90µs ± 1% 2.92µs ± 0% +0.67% (p=0.024 n=5+5) Scan/1000/Base8-8 31.0µs ± 0% 31.1µs ± 0% +0.27% (p=0.008 n=5+5) Scan/10000/Base8-8 741µs ± 0% 744µs ± 1% ~ (p=0.310 n=5+5) Scan/100000/Base8-8 50.5ms ± 0% 50.7ms ± 0% +0.23% (p=0.016 n=5+4) Scan/10/Base10-8 485ns ± 0% 510ns ± 1% +5.15% (p=0.008 n=5+5) Scan/100/Base10-8 2.68µs ± 0% 2.70µs ± 0% +0.84% (p=0.008 n=5+5) Scan/1000/Base10-8 28.7µs ± 0% 28.8µs ± 0% +0.34% (p=0.008 n=5+5) Scan/10000/Base10-8 717µs ± 0% 720µs ± 1% ~ (p=0.238 n=5+5) Scan/100000/Base10-8 50.3ms ± 0% 50.3ms ± 0% +0.02% (p=0.016 n=4+5) Scan/10/Base16-8 439ns ± 0% 461ns ± 1% +5.06% (p=0.008 n=5+5) Scan/100/Base16-8 2.48µs ± 0% 2.49µs ± 0% +0.59% (p=0.024 n=5+5) Scan/1000/Base16-8 27.2µs ± 0% 27.3µs ± 0% ~ (p=0.063 n=5+5) Scan/10000/Base16-8 722µs ± 0% 725µs ± 1% ~ (p=0.421 n=5+5) Scan/100000/Base16-8 52.7ms ± 0% 52.7ms ± 0% ~ (p=0.686 n=4+4) String/10/Base2-8 248ns ± 1% 248ns ± 1% ~ (p=0.802 n=5+5) String/100/Base2-8 1.51µs ± 0% 1.51µs ± 0% -0.54% (p=0.024 n=5+5) String/1000/Base2-8 13.6µs ± 0% 13.6µs ± 0% ~ (p=0.548 n=5+5) String/10000/Base2-8 135µs ± 1% 135µs ± 2% ~ (p=0.421 n=5+5) String/100000/Base2-8 1.32ms ± 1% 1.33ms ± 1% ~ (p=0.310 n=5+5) String/10/Base8-8 169ns ± 0% 170ns ± 0% ~ (p=0.079 n=5+5) String/100/Base8-8 635ns ± 1% 633ns ± 1% ~ (p=0.595 n=5+5) String/1000/Base8-8 5.33µs ± 0% 5.30µs ± 0% ~ (p=0.063 n=5+5) String/10000/Base8-8 50.7µs ± 1% 50.7µs ± 1% ~ (p=1.000 n=5+5) String/100000/Base8-8 499µs ± 1% 500µs ± 1% ~ (p=1.000 n=5+5) String/10/Base10-8 517ns ± 1% 512ns ± 1% -1.01% (p=0.032 n=5+5) String/100/Base10-8 1.97µs ± 0% 2.01µs ± 1% +2.13% (p=0.008 n=5+5) String/1000/Base10-8 12.6µs ± 1% 12.1µs ± 1% -4.16% (p=0.008 n=5+5) String/10000/Base10-8 57.9µs ± 1% 54.8µs ± 1% -5.46% (p=0.008 n=5+5) String/100000/Base10-8 25.6ms ± 0% 25.6ms ± 0% -0.12% (p=0.008 n=5+5) String/10/Base16-8 149ns ± 0% 149ns ± 1% ~ (p=1.000 n=5+5) String/100/Base16-8 514ns ± 0% 514ns ± 1% ~ (p=0.825 n=5+5) String/1000/Base16-8 4.01µs ± 0% 4.01µs ± 0% ~ (p=0.595 n=5+5) String/10000/Base16-8 37.7µs ± 0% 37.8µs ± 1% ~ (p=0.222 n=5+5) String/100000/Base16-8 373µs ± 1% 372µs ± 0% ~ (p=1.000 n=5+5) LeafSize/0-8 6.64ms ± 0% 6.66ms ± 0% +0.32% (p=0.008 n=5+5) LeafSize/1-8 74.0µs ± 1% 71.2µs ± 1% -3.75% (p=0.008 n=5+5) LeafSize/2-8 74.1µs ± 0% 70.7µs ± 1% -4.53% (p=0.008 n=5+5) LeafSize/3-8 379µs ± 0% 374µs ± 0% -1.25% (p=0.008 n=5+5) LeafSize/4-8 72.7µs ± 0% 69.2µs ± 0% -4.79% (p=0.008 n=5+5) LeafSize/5-8 471µs ± 0% 466µs ± 0% -1.05% (p=0.008 n=5+5) LeafSize/6-8 377µs ± 0% 373µs ± 0% -1.16% (p=0.008 n=5+5) LeafSize/7-8 245µs ± 0% 241µs ± 0% -1.65% (p=0.008 n=5+5) LeafSize/8-8 73.1µs ± 0% 69.4µs ± 0% -5.10% (p=0.008 n=5+5) LeafSize/9-8 538µs ± 0% 532µs ± 0% -1.01% (p=0.008 n=5+5) LeafSize/10-8 472µs ± 0% 467µs ± 0% -1.07% (p=0.008 n=5+5) LeafSize/11-8 460µs ± 0% 454µs ± 0% -1.22% (p=0.008 n=5+5) LeafSize/12-8 378µs ± 0% 373µs ± 0% -1.34% (p=0.008 n=5+5) LeafSize/13-8 344µs ± 0% 338µs ± 0% -1.61% (p=0.008 n=5+5) LeafSize/14-8 247µs ± 0% 243µs ± 0% -1.62% (p=0.008 n=5+5) LeafSize/15-8 169µs ± 0% 165µs ± 0% -2.71% (p=0.008 n=5+5) LeafSize/16-8 73.3µs ± 1% 69.5µs ± 0% -5.11% (p=0.008 n=5+5) LeafSize/32-8 82.7µs ± 0% 79.2µs ± 0% -4.24% (p=0.008 n=5+5) LeafSize/64-8 135µs ± 0% 132µs ± 0% -2.20% (p=0.008 n=5+5) ProbablyPrime/n=0-8 44.2ms ± 0% 43.9ms ± 0% -0.69% (p=0.008 n=5+5) ProbablyPrime/n=1-8 64.8ms ± 0% 64.4ms ± 0% -0.60% (p=0.008 n=5+5) ProbablyPrime/n=5-8 147ms ± 0% 147ms ± 0% -0.34% (p=0.008 n=5+5) ProbablyPrime/n=10-8 250ms ± 0% 249ms ± 0% -0.29% (p=0.008 n=5+5) ProbablyPrime/n=20-8 456ms ± 0% 455ms ± 0% -0.29% (p=0.008 n=5+5) ProbablyPrime/Lucas-8 23.6ms ± 0% 23.2ms ± 0% -1.44% (p=0.008 n=5+5) ProbablyPrime/MillerRabinBase2-8 20.6ms ± 0% 20.6ms ± 0% -0.31% (p=0.008 n=5+5) FloatSqrt/64-8 2.27µs ± 1% 2.11µs ± 1% -7.02% (p=0.008 n=5+5) FloatSqrt/128-8 4.93µs ± 1% 4.40µs ± 1% -10.73% (p=0.008 n=5+5) FloatSqrt/256-8 13.6µs ± 0% 6.6µs ± 1% -51.40% (p=0.008 n=5+5) FloatSqrt/1000-8 69.8µs ± 0% 31.2µs ± 0% -55.27% (p=0.008 n=5+5) FloatSqrt/10000-8 1.91ms ± 0% 0.59ms ± 0% -69.17% (p=0.008 n=5+5) FloatSqrt/100000-8 55.4ms ± 0% 17.8ms ± 0% -67.79% (p=0.008 n=5+5) FloatSqrt/1000000-8 4.56s ± 0% 1.52s ± 0% -66.59% (p=0.008 n=5+5) Change-Id: Icce52c69668f564490c69b908338b21a2288e116 Reviewed-on: https://go-review.googlesource.com/79355 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-12 14:42:11 +00:00
Alberto Donizetti	010579c237	math/big: allocate less in Float.Sqrt The Newton sqrtInverse procedure we use to compute Float.Sqrt should not allocate a number of times proportional to the number of Newton iterations we need to reach the desired precision. At the beginning the function the target precision is known, so even if we do want to perform the early steps at low precisions (to save time), it's still possible to pre-allocate larger backing arrays, both for the temp variables in the loop and the variable that'll hold the final result. There's one complication. At the following line: u.Sub(three, u) the Sub method will allocate, because the receiver aliases one of the arguments, and the large backing array we initially allocated for u will be replaced by a smaller one allocated by Sub. We can work around this by introducing a second temp variable u2 that we use to hold the Sub call result. Overall, the sqrtInverse procedure still allocates a number of times proportional to the number of Newton steps, because unfortunately a few of the Mul calls in the Newton function allocate; but at least we allocate less in the function itself. FloatSqrt/256-4 1.97µs ± 1% 1.84µs ± 1% -6.61% (p=0.000 n=8+8) FloatSqrt/1000-4 4.80µs ± 3% 4.28µs ± 1% -10.78% (p=0.000 n=8+8) FloatSqrt/10000-4 40.0µs ± 1% 38.3µs ± 1% -4.15% (p=0.000 n=8+8) FloatSqrt/100000-4 955µs ± 1% 932µs ± 0% -2.49% (p=0.000 n=8+7) FloatSqrt/1000000-4 79.8ms ± 1% 79.4ms ± 1% ~ (p=0.105 n=8+8) name old alloc/op new alloc/op delta FloatSqrt/256-4 816B ± 0% 512B ± 0% -37.25% (p=0.000 n=8+8) FloatSqrt/1000-4 2.50kB ± 0% 1.47kB ± 0% -41.03% (p=0.000 n=8+8) FloatSqrt/10000-4 23.5kB ± 0% 18.2kB ± 0% -22.62% (p=0.000 n=8+8) FloatSqrt/100000-4 251kB ± 0% 173kB ± 0% -31.26% (p=0.000 n=8+8) FloatSqrt/1000000-4 4.61MB ± 0% 2.86MB ± 0% -37.90% (p=0.000 n=8+8) name old allocs/op new allocs/op delta FloatSqrt/256-4 12.0 ± 0% 8.0 ± 0% -33.33% (p=0.000 n=8+8) FloatSqrt/1000-4 19.0 ± 0% 9.0 ± 0% -52.63% (p=0.000 n=8+8) FloatSqrt/10000-4 35.0 ± 0% 14.0 ± 0% -60.00% (p=0.000 n=8+8) FloatSqrt/100000-4 55.0 ± 0% 23.0 ± 0% -58.18% (p=0.000 n=8+8) FloatSqrt/1000000-4 122 ± 0% 75 ± 0% -38.52% (p=0.000 n=8+8) Change-Id: I950dbf61a40267a6cca82ae72524c3024bcb149c Reviewed-on: https://go-review.googlesource.com/87659 Reviewed-by: Robert Griesemer <gri@golang.org>	2018-03-08 19:12:35 +00:00
isharipo	d2a5263a9c	math/big: speedup nat.setBytes for bigger slices Set up to _S (number of bytes in Uint) bytes at time by using BigEndian.Uint32 and BigEndian.Uint64. The performance improves for slices bigger than _S bytes. This is the case for 128/256bit arith that initializes it's objects from bytes. name old time/op new time/op delta NatSetBytes/8-4 29.8ns ± 1% 11.4ns ± 0% -61.63% (p=0.000 n=9+8) NatSetBytes/24-4 109ns ± 1% 56ns ± 0% -48.75% (p=0.000 n=9+8) NatSetBytes/128-4 420ns ± 2% 110ns ± 1% -73.83% (p=0.000 n=10+10) NatSetBytes/7-4 26.2ns ± 1% 21.3ns ± 2% -18.63% (p=0.000 n=8+9) NatSetBytes/23-4 106ns ± 1% 67ns ± 1% -36.93% (p=0.000 n=9+10) NatSetBytes/127-4 410ns ± 2% 121ns ± 0% -70.46% (p=0.000 n=9+8) Found this optimization opportunity by looking at ethereum_corevm community benchmark cpuprofile. name old time/op new time/op delta OpDiv256-4 715ns ± 1% 596ns ± 1% -16.57% (p=0.008 n=5+5) OpDiv128-4 373ns ± 1% 314ns ± 1% -15.83% (p=0.008 n=5+5) OpDiv64-4 301ns ± 0% 285ns ± 1% -5.12% (p=0.008 n=5+5) Change-Id: I8e5a680ae6284c8233d8d7431d51253a8a740b57 Reviewed-on: https://go-review.googlesource.com/98775 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> Reviewed-by: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-08 18:50:10 +00:00
erifan01	0585d41c87	math/big: optimize addVW and subVW on arm64 The biggest hot spot of the existing implementation is "load" operations, which lead to poor performance. By unrolling the cycle 4 times and 2 times, and using "LDP", "STP" instructions, this CL can reduce the "load" cost and improve performance. Benchmarks: name old time/op new time/op delta AddVV/1-8 21.5ns ± 0% 21.5ns ± 0% ~ (all equal) AddVV/2-8 13.5ns ± 0% 13.5ns ± 0% ~ (all equal) AddVV/3-8 15.5ns ± 0% 15.5ns ± 0% ~ (all equal) AddVV/4-8 17.5ns ± 0% 17.5ns ± 0% ~ (all equal) AddVV/5-8 19.5ns ± 0% 19.5ns ± 0% ~ (all equal) AddVV/10-8 29.5ns ± 0% 29.5ns ± 0% ~ (all equal) AddVV/100-8 217ns ± 0% 217ns ± 0% ~ (all equal) AddVV/1000-8 2.02µs ± 0% 2.02µs ± 0% ~ (all equal) AddVV/10000-8 20.3µs ± 0% 20.3µs ± 0% ~ (p=0.603 n=5+5) AddVV/100000-8 223µs ± 7% 228µs ± 8% ~ (p=0.548 n=5+5) AddVW/1-8 9.32ns ± 0% 9.26ns ± 0% -0.64% (p=0.008 n=5+5) AddVW/2-8 19.8ns ± 3% 10.5ns ± 0% -46.92% (p=0.008 n=5+5) AddVW/3-8 11.5ns ± 0% 11.0ns ± 0% -4.35% (p=0.008 n=5+5) AddVW/4-8 13.0ns ± 0% 12.0ns ± 0% -7.69% (p=0.008 n=5+5) AddVW/5-8 14.5ns ± 0% 12.5ns ± 0% -13.79% (p=0.008 n=5+5) AddVW/10-8 22.0ns ± 0% 15.5ns ± 0% -29.55% (p=0.008 n=5+5) AddVW/100-8 167ns ± 0% 81ns ± 0% -51.44% (p=0.008 n=5+5) AddVW/1000-8 1.52µs ± 0% 0.64µs ± 0% -57.58% (p=0.008 n=5+5) AddVW/10000-8 15.1µs ± 0% 7.2µs ± 0% -52.55% (p=0.008 n=5+5) AddVW/100000-8 150µs ± 0% 71µs ± 0% -52.95% (p=0.008 n=5+5) SubVW/1-8 9.32ns ± 0% 9.26ns ± 0% -0.64% (p=0.008 n=5+5) SubVW/2-8 19.7ns ± 2% 10.5ns ± 0% -46.70% (p=0.008 n=5+5) SubVW/3-8 11.5ns ± 0% 11.0ns ± 0% -4.35% (p=0.008 n=5+5) SubVW/4-8 13.0ns ± 0% 12.0ns ± 0% -7.69% (p=0.008 n=5+5) SubVW/5-8 14.5ns ± 0% 12.5ns ± 0% -13.79% (p=0.008 n=5+5) SubVW/10-8 22.0ns ± 0% 15.5ns ± 0% -29.55% (p=0.008 n=5+5) SubVW/100-8 167ns ± 0% 81ns ± 0% -51.44% (p=0.008 n=5+5) SubVW/1000-8 1.52µs ± 0% 0.64µs ± 0% -57.58% (p=0.008 n=5+5) SubVW/10000-8 15.1µs ± 0% 7.2µs ± 0% -52.49% (p=0.008 n=5+5) SubVW/100000-8 150µs ± 0% 71µs ± 0% -52.91% (p=0.008 n=5+5) AddMulVVW/1-8 32.4ns ± 1% 32.6ns ± 1% ~ (p=0.119 n=5+5) AddMulVVW/2-8 57.0ns ± 0% 57.0ns ± 0% ~ (p=0.643 n=5+5) AddMulVVW/3-8 90.8ns ± 0% 90.7ns ± 0% ~ (p=0.524 n=5+5) AddMulVVW/4-8 118ns ± 0% 118ns ± 1% ~ (p=1.000 n=4+5) AddMulVVW/5-8 144ns ± 1% 144ns ± 0% ~ (p=0.794 n=5+4) AddMulVVW/10-8 294ns ± 1% 296ns ± 0% +0.48% (p=0.040 n=5+5) AddMulVVW/100-8 2.73µs ± 0% 2.73µs ± 0% ~ (p=0.278 n=5+5) AddMulVVW/1000-8 26.0µs ± 0% 26.5µs ± 0% +2.14% (p=0.008 n=5+5) AddMulVVW/10000-8 297µs ± 0% 297µs ± 0% +0.24% (p=0.008 n=5+5) AddMulVVW/100000-8 3.15ms ± 1% 3.13ms ± 0% ~ (p=0.690 n=5+5) DecimalConversion-8 311µs ± 2% 309µs ± 2% ~ (p=0.310 n=5+5) FloatString/100-8 2.55µs ± 2% 2.54µs ± 2% ~ (p=1.000 n=5+5) FloatString/1000-8 58.1µs ± 0% 58.1µs ± 0% ~ (p=0.151 n=5+5) FloatString/10000-8 4.59ms ± 0% 4.59ms ± 0% ~ (p=0.151 n=5+5) FloatString/100000-8 446ms ± 0% 446ms ± 0% +0.01% (p=0.016 n=5+5) FloatAdd/10-8 183ns ± 0% 183ns ± 0% ~ (p=0.333 n=4+5) FloatAdd/100-8 187ns ± 1% 192ns ± 2% ~ (p=0.056 n=5+5) FloatAdd/1000-8 369ns ± 0% 371ns ± 0% +0.54% (p=0.016 n=4+5) FloatAdd/10000-8 1.88µs ± 0% 1.88µs ± 0% -0.14% (p=0.000 n=4+5) FloatAdd/100000-8 17.2µs ± 0% 17.1µs ± 0% -0.37% (p=0.008 n=5+5) FloatSub/10-8 147ns ± 0% 147ns ± 0% ~ (all equal) FloatSub/100-8 145ns ± 0% 146ns ± 0% ~ (p=0.238 n=5+4) FloatSub/1000-8 241ns ± 0% 241ns ± 0% ~ (p=0.333 n=5+4) FloatSub/10000-8 1.06µs ± 0% 1.06µs ± 0% ~ (p=0.444 n=5+5) FloatSub/100000-8 9.50µs ± 0% 9.48µs ± 0% -0.14% (p=0.008 n=5+5) ParseFloatSmallExp-8 28.4µs ± 2% 28.5µs ± 1% ~ (p=0.690 n=5+5) ParseFloatLargeExp-8 125µs ± 1% 124µs ± 1% ~ (p=0.095 n=5+5) GCD10x10/WithoutXY-8 277ns ± 2% 278ns ± 3% ~ (p=0.937 n=5+5) GCD10x10/WithXY-8 2.08µs ± 3% 2.15µs ± 3% ~ (p=0.056 n=5+5) GCD10x100/WithoutXY-8 592ns ± 3% 613ns ± 4% ~ (p=0.056 n=5+5) GCD10x100/WithXY-8 3.40µs ± 2% 3.42µs ± 4% ~ (p=0.841 n=5+5) GCD10x1000/WithoutXY-8 1.37µs ± 2% 1.35µs ± 3% ~ (p=0.460 n=5+5) GCD10x1000/WithXY-8 7.34µs ± 2% 7.33µs ± 4% ~ (p=0.841 n=5+5) GCD10x10000/WithoutXY-8 8.52µs ± 0% 8.51µs ± 1% ~ (p=0.421 n=5+5) GCD10x10000/WithXY-8 27.5µs ± 2% 27.2µs ± 1% ~ (p=0.151 n=5+5) GCD10x100000/WithoutXY-8 78.3µs ± 1% 78.5µs ± 1% ~ (p=0.690 n=5+5) GCD10x100000/WithXY-8 231µs ± 0% 229µs ± 1% -1.11% (p=0.016 n=5+5) GCD100x100/WithoutXY-8 1.86µs ± 2% 1.86µs ± 2% ~ (p=0.881 n=5+5) GCD100x100/WithXY-8 27.1µs ± 2% 27.2µs ± 1% ~ (p=0.421 n=5+5) GCD100x1000/WithoutXY-8 4.44µs ± 2% 4.41µs ± 1% ~ (p=0.310 n=5+5) GCD100x1000/WithXY-8 36.3µs ± 1% 36.2µs ± 1% ~ (p=0.310 n=5+5) GCD100x10000/WithoutXY-8 22.6µs ± 2% 22.5µs ± 1% ~ (p=0.690 n=5+5) GCD100x10000/WithXY-8 145µs ± 1% 145µs ± 1% ~ (p=1.000 n=5+5) GCD100x100000/WithoutXY-8 195µs ± 0% 196µs ± 1% ~ (p=0.548 n=5+5) GCD100x100000/WithXY-8 1.10ms ± 0% 1.10ms ± 0% -0.30% (p=0.016 n=5+5) GCD1000x1000/WithoutXY-8 25.0µs ± 1% 25.2µs ± 2% ~ (p=0.222 n=5+5) GCD1000x1000/WithXY-8 520µs ± 0% 520µs ± 1% ~ (p=0.151 n=5+5) GCD1000x10000/WithoutXY-8 57.0µs ± 1% 56.9µs ± 1% ~ (p=0.690 n=5+5) GCD1000x10000/WithXY-8 1.21ms ± 0% 1.21ms ± 1% ~ (p=0.881 n=5+5) GCD1000x100000/WithoutXY-8 358µs ± 0% 359µs ± 1% ~ (p=0.548 n=5+5) GCD1000x100000/WithXY-8 8.73ms ± 0% 8.73ms ± 0% ~ (p=0.548 n=5+5) GCD10000x10000/WithoutXY-8 686µs ± 0% 687µs ± 0% ~ (p=0.548 n=5+5) GCD10000x10000/WithXY-8 15.9ms ± 0% 15.9ms ± 0% ~ (p=0.841 n=5+5) GCD10000x100000/WithoutXY-8 2.08ms ± 0% 2.08ms ± 0% ~ (p=1.000 n=5+5) GCD10000x100000/WithXY-8 86.7ms ± 0% 86.7ms ± 0% ~ (p=1.000 n=5+5) GCD100000x100000/WithoutXY-8 51.1ms ± 0% 51.0ms ± 0% ~ (p=0.151 n=5+5) GCD100000x100000/WithXY-8 1.23s ± 0% 1.23s ± 0% ~ (p=0.841 n=5+5) Hilbert-8 2.41ms ± 1% 2.42ms ± 2% ~ (p=0.690 n=5+5) Binomial-8 4.86µs ± 1% 4.86µs ± 1% ~ (p=0.889 n=5+5) QuoRem-8 7.09µs ± 0% 7.08µs ± 0% -0.09% (p=0.024 n=5+5) Exp-8 161ms ± 0% 161ms ± 0% -0.08% (p=0.032 n=5+5) Exp2-8 161ms ± 0% 161ms ± 0% ~ (p=1.000 n=5+5) Bitset-8 40.7ns ± 0% 40.6ns ± 0% ~ (p=0.095 n=4+5) BitsetNeg-8 159ns ± 4% 148ns ± 0% -6.92% (p=0.016 n=5+4) BitsetOrig-8 378ns ± 1% 378ns ± 1% ~ (p=0.937 n=5+5) BitsetNegOrig-8 647ns ± 5% 647ns ± 4% ~ (p=1.000 n=5+5) ModSqrt225_Tonelli-8 7.26ms ± 0% 7.27ms ± 0% ~ (p=1.000 n=5+5) ModSqrt224_3Mod4-8 2.24ms ± 0% 2.24ms ± 0% ~ (p=0.690 n=5+5) ModSqrt5430_Tonelli-8 62.8s ± 1% 62.5s ± 0% ~ (p=0.063 n=5+4) ModSqrt5430_3Mod4-8 20.8s ± 0% 20.8s ± 0% ~ (p=0.310 n=5+5) Sqrt-8 101µs ± 1% 101µs ± 0% -0.35% (p=0.032 n=5+5) IntSqr/1-8 32.3ns ± 1% 32.5ns ± 1% ~ (p=0.421 n=5+5) IntSqr/2-8 157ns ± 5% 156ns ± 5% ~ (p=0.651 n=5+5) IntSqr/3-8 292ns ± 2% 291ns ± 3% ~ (p=0.881 n=5+5) IntSqr/5-8 738ns ± 6% 740ns ± 5% ~ (p=0.841 n=5+5) IntSqr/8-8 1.82µs ± 4% 1.83µs ± 4% ~ (p=0.730 n=5+5) IntSqr/10-8 2.92µs ± 1% 2.93µs ± 1% ~ (p=0.643 n=5+5) IntSqr/20-8 6.28µs ± 2% 6.28µs ± 2% ~ (p=1.000 n=5+5) IntSqr/30-8 13.8µs ± 2% 13.9µs ± 3% ~ (p=1.000 n=5+5) IntSqr/50-8 37.8µs ± 4% 37.9µs ± 4% ~ (p=0.690 n=5+5) IntSqr/80-8 95.9µs ± 1% 95.8µs ± 1% ~ (p=0.841 n=5+5) IntSqr/100-8 148µs ± 1% 148µs ± 1% ~ (p=0.310 n=5+5) IntSqr/200-8 586µs ± 1% 586µs ± 1% ~ (p=0.841 n=5+5) IntSqr/300-8 1.32ms ± 0% 1.31ms ± 0% ~ (p=0.222 n=5+5) IntSqr/500-8 2.48ms ± 0% 2.48ms ± 0% ~ (p=0.556 n=5+4) IntSqr/800-8 4.68ms ± 0% 4.68ms ± 0% ~ (p=0.548 n=5+5) IntSqr/1000-8 7.57ms ± 0% 7.56ms ± 0% ~ (p=0.421 n=5+5) Mul-8 311ms ± 0% 311ms ± 0% ~ (p=0.548 n=5+5) Exp3Power/0x10-8 559ns ± 1% 560ns ± 1% ~ (p=0.984 n=5+5) Exp3Power/0x40-8 641ns ± 1% 634ns ± 1% ~ (p=0.063 n=5+5) Exp3Power/0x100-8 1.39µs ± 2% 1.40µs ± 2% ~ (p=0.381 n=5+5) Exp3Power/0x400-8 8.27µs ± 1% 8.26µs ± 0% ~ (p=0.571 n=5+5) Exp3Power/0x1000-8 59.9µs ± 0% 59.7µs ± 0% -0.23% (p=0.008 n=5+5) Exp3Power/0x4000-8 816µs ± 0% 816µs ± 0% ~ (p=1.000 n=5+5) Exp3Power/0x10000-8 7.77ms ± 0% 7.77ms ± 0% ~ (p=0.841 n=5+5) Exp3Power/0x40000-8 73.4ms ± 0% 73.4ms ± 0% ~ (p=0.690 n=5+5) Exp3Power/0x100000-8 665ms ± 0% 664ms ± 0% -0.14% (p=0.008 n=5+5) Exp3Power/0x400000-8 5.98s ± 0% 5.98s ± 0% -0.09% (p=0.008 n=5+5) Fibo-8 116ms ± 0% 116ms ± 0% -0.25% (p=0.008 n=5+5) NatSqr/1-8 115ns ± 3% 116ns ± 2% ~ (p=0.238 n=5+5) NatSqr/2-8 237ns ± 1% 237ns ± 1% ~ (p=0.683 n=5+5) NatSqr/3-8 367ns ± 3% 368ns ± 3% ~ (p=0.817 n=5+5) NatSqr/5-8 807ns ± 3% 812ns ± 3% ~ (p=0.913 n=5+5) NatSqr/8-8 1.93µs ± 2% 1.93µs ± 3% ~ (p=0.651 n=5+5) NatSqr/10-8 2.98µs ± 2% 2.99µs ± 2% ~ (p=0.690 n=5+5) NatSqr/20-8 6.49µs ± 2% 6.46µs ± 2% ~ (p=0.548 n=5+5) NatSqr/30-8 14.4µs ± 2% 14.3µs ± 2% ~ (p=0.690 n=5+5) NatSqr/50-8 38.6µs ± 2% 38.7µs ± 2% ~ (p=0.841 n=5+5) NatSqr/80-8 96.1µs ± 2% 95.8µs ± 2% ~ (p=0.548 n=5+5) NatSqr/100-8 149µs ± 1% 149µs ± 1% ~ (p=0.841 n=5+5) NatSqr/200-8 593µs ± 1% 590µs ± 1% ~ (p=0.421 n=5+5) NatSqr/300-8 1.32ms ± 0% 1.32ms ± 1% ~ (p=0.222 n=5+5) NatSqr/500-8 2.49ms ± 0% 2.49ms ± 0% ~ (p=0.690 n=5+5) NatSqr/800-8 4.69ms ± 0% 4.69ms ± 0% ~ (p=1.000 n=5+5) NatSqr/1000-8 7.59ms ± 0% 7.58ms ± 0% ~ (p=0.841 n=5+5) ScanPi-8 322µs ± 0% 321µs ± 0% ~ (p=0.095 n=5+5) StringPiParallel-8 71.4µs ± 5% 68.8µs ± 4% ~ (p=0.151 n=5+5) Scan/10/Base2-8 1.10µs ± 0% 1.09µs ± 0% -0.36% (p=0.032 n=5+5) Scan/100/Base2-8 7.78µs ± 0% 7.79µs ± 0% +0.14% (p=0.008 n=5+5) Scan/1000/Base2-8 78.8µs ± 0% 79.0µs ± 0% +0.24% (p=0.008 n=5+5) Scan/10000/Base2-8 1.22ms ± 0% 1.22ms ± 0% ~ (p=0.056 n=5+5) Scan/100000/Base2-8 55.1ms ± 0% 55.0ms ± 0% -0.15% (p=0.008 n=5+5) Scan/10/Base8-8 514ns ± 0% 515ns ± 0% ~ (p=0.079 n=5+5) Scan/100/Base8-8 2.89µs ± 0% 2.89µs ± 0% +0.15% (p=0.008 n=5+5) Scan/1000/Base8-8 31.0µs ± 0% 31.1µs ± 0% +0.12% (p=0.008 n=5+5) Scan/10000/Base8-8 740µs ± 0% 740µs ± 0% ~ (p=0.222 n=5+5) Scan/100000/Base8-8 50.6ms ± 0% 50.5ms ± 0% -0.06% (p=0.016 n=4+5) Scan/10/Base10-8 492ns ± 1% 490ns ± 1% ~ (p=0.310 n=5+5) Scan/100/Base10-8 2.67µs ± 0% 2.67µs ± 0% ~ (p=0.056 n=5+5) Scan/1000/Base10-8 28.7µs ± 0% 28.7µs ± 0% ~ (p=1.000 n=5+5) Scan/10000/Base10-8 717µs ± 0% 716µs ± 0% ~ (p=0.222 n=5+5) Scan/100000/Base10-8 50.2ms ± 0% 50.3ms ± 0% +0.05% (p=0.008 n=5+5) Scan/10/Base16-8 442ns ± 1% 442ns ± 0% ~ (p=0.468 n=5+5) Scan/100/Base16-8 2.46µs ± 0% 2.45µs ± 0% ~ (p=0.159 n=5+5) Scan/1000/Base16-8 27.2µs ± 0% 27.2µs ± 0% ~ (p=0.841 n=5+5) Scan/10000/Base16-8 721µs ± 0% 722µs ± 0% ~ (p=0.548 n=5+5) Scan/100000/Base16-8 52.6ms ± 0% 52.6ms ± 0% +0.07% (p=0.008 n=5+5) String/10/Base2-8 244ns ± 1% 242ns ± 1% ~ (p=0.103 n=5+5) String/100/Base2-8 1.48µs ± 0% 1.48µs ± 1% ~ (p=0.786 n=5+5) String/1000/Base2-8 13.3µs ± 1% 13.3µs ± 0% ~ (p=0.222 n=5+5) String/10000/Base2-8 132µs ± 1% 132µs ± 1% ~ (p=1.000 n=5+5) String/100000/Base2-8 1.30ms ± 1% 1.30ms ± 1% ~ (p=1.000 n=5+5) String/10/Base8-8 167ns ± 1% 168ns ± 1% ~ (p=0.135 n=5+5) String/100/Base8-8 623ns ± 1% 626ns ± 1% ~ (p=0.151 n=5+5) String/1000/Base8-8 5.24µs ± 1% 5.24µs ± 0% ~ (p=1.000 n=5+5) String/10000/Base8-8 50.0µs ± 1% 50.0µs ± 1% ~ (p=1.000 n=5+5) String/100000/Base8-8 492µs ± 1% 489µs ± 1% ~ (p=0.056 n=5+5) String/10/Base10-8 503ns ± 1% 501ns ± 0% ~ (p=0.183 n=5+5) String/100/Base10-8 1.96µs ± 0% 1.97µs ± 0% ~ (p=0.389 n=5+5) String/1000/Base10-8 12.4µs ± 1% 12.4µs ± 1% ~ (p=0.841 n=5+5) String/10000/Base10-8 56.7µs ± 1% 56.6µs ± 0% ~ (p=1.000 n=5+5) String/100000/Base10-8 25.6ms ± 0% 25.6ms ± 0% ~ (p=0.222 n=5+5) String/10/Base16-8 147ns ± 0% 148ns ± 2% ~ (p=1.000 n=4+5) String/100/Base16-8 505ns ± 0% 505ns ± 1% ~ (p=0.778 n=5+5) String/1000/Base16-8 3.94µs ± 0% 3.94µs ± 0% ~ (p=0.841 n=5+5) String/10000/Base16-8 37.4µs ± 1% 37.2µs ± 1% ~ (p=0.095 n=5+5) String/100000/Base16-8 367µs ± 1% 367µs ± 0% ~ (p=1.000 n=5+5) LeafSize/0-8 6.64ms ± 0% 6.65ms ± 0% ~ (p=0.690 n=5+5) LeafSize/1-8 72.5µs ± 1% 72.4µs ± 1% ~ (p=0.841 n=5+5) LeafSize/2-8 72.6µs ± 1% 72.6µs ± 1% ~ (p=1.000 n=5+5) LeafSize/3-8 377µs ± 0% 377µs ± 0% ~ (p=0.421 n=5+5) LeafSize/4-8 71.2µs ± 1% 71.3µs ± 0% ~ (p=0.278 n=5+5) LeafSize/5-8 469µs ± 0% 469µs ± 0% ~ (p=0.310 n=5+5) LeafSize/6-8 376µs ± 0% 376µs ± 0% ~ (p=0.841 n=5+5) LeafSize/7-8 244µs ± 0% 244µs ± 0% ~ (p=0.841 n=5+5) LeafSize/8-8 71.9µs ± 1% 72.1µs ± 1% ~ (p=0.548 n=5+5) LeafSize/9-8 536µs ± 0% 536µs ± 0% ~ (p=0.151 n=5+5) LeafSize/10-8 470µs ± 0% 471µs ± 0% +0.10% (p=0.032 n=5+5) LeafSize/11-8 458µs ± 0% 458µs ± 0% ~ (p=0.881 n=5+5) LeafSize/12-8 376µs ± 0% 376µs ± 0% ~ (p=0.548 n=5+5) LeafSize/13-8 341µs ± 0% 342µs ± 0% ~ (p=0.222 n=5+5) LeafSize/14-8 246µs ± 0% 245µs ± 0% ~ (p=0.167 n=5+5) LeafSize/15-8 168µs ± 0% 168µs ± 0% ~ (p=0.548 n=5+5) LeafSize/16-8 72.1µs ± 1% 72.2µs ± 1% ~ (p=0.690 n=5+5) LeafSize/32-8 81.5µs ± 1% 81.4µs ± 1% ~ (p=1.000 n=5+5) LeafSize/64-8 133µs ± 1% 134µs ± 1% ~ (p=0.690 n=5+5) ProbablyPrime/n=0-8 44.3ms ± 0% 44.2ms ± 0% -0.28% (p=0.008 n=5+5) ProbablyPrime/n=1-8 64.8ms ± 0% 64.7ms ± 0% -0.15% (p=0.008 n=5+5) ProbablyPrime/n=5-8 147ms ± 0% 147ms ± 0% -0.11% (p=0.008 n=5+5) ProbablyPrime/n=10-8 250ms ± 0% 250ms ± 0% ~ (p=0.056 n=5+5) ProbablyPrime/n=20-8 456ms ± 0% 455ms ± 0% -0.05% (p=0.008 n=5+5) ProbablyPrime/Lucas-8 23.6ms ± 0% 23.5ms ± 0% -0.29% (p=0.008 n=5+5) ProbablyPrime/MillerRabinBase2-8 20.6ms ± 0% 20.6ms ± 0% ~ (p=0.690 n=5+5) FloatSqrt/64-8 2.01µs ± 1% 2.02µs ± 1% ~ (p=0.421 n=5+5) FloatSqrt/128-8 4.43µs ± 2% 4.38µs ± 2% ~ (p=0.222 n=5+5) FloatSqrt/256-8 6.64µs ± 1% 6.68µs ± 2% ~ (p=0.516 n=5+5) FloatSqrt/1000-8 31.9µs ± 0% 31.8µs ± 0% ~ (p=0.095 n=5+5) FloatSqrt/10000-8 595µs ± 0% 594µs ± 0% ~ (p=0.056 n=5+5) FloatSqrt/100000-8 17.9ms ± 0% 17.9ms ± 0% ~ (p=0.151 n=5+5) FloatSqrt/1000000-8 1.52s ± 0% 1.52s ± 0% ~ (p=0.841 n=5+5) name old speed new speed delta AddVV/1-8 2.97GB/s ± 0% 2.97GB/s ± 0% ~ (p=0.971 n=4+4) AddVV/2-8 9.47GB/s ± 0% 9.47GB/s ± 0% +0.01% (p=0.016 n=5+5) AddVV/3-8 12.4GB/s ± 0% 12.4GB/s ± 0% ~ (p=0.548 n=5+5) AddVV/4-8 14.6GB/s ± 0% 14.6GB/s ± 0% ~ (p=1.000 n=5+5) AddVV/5-8 16.4GB/s ± 0% 16.4GB/s ± 0% ~ (p=1.000 n=5+5) AddVV/10-8 21.7GB/s ± 0% 21.7GB/s ± 0% ~ (p=0.548 n=5+5) AddVV/100-8 29.4GB/s ± 0% 29.4GB/s ± 0% ~ (p=1.000 n=5+5) AddVV/1000-8 31.7GB/s ± 0% 31.7GB/s ± 0% ~ (p=0.524 n=5+4) AddVV/10000-8 31.5GB/s ± 0% 31.5GB/s ± 0% ~ (p=0.690 n=5+5) AddVV/100000-8 28.8GB/s ± 7% 28.1GB/s ± 8% ~ (p=0.548 n=5+5) AddVW/1-8 859MB/s ± 0% 864MB/s ± 0% +0.61% (p=0.008 n=5+5) AddVW/2-8 809MB/s ± 2% 1520MB/s ± 0% +87.78% (p=0.008 n=5+5) AddVW/3-8 2.08GB/s ± 0% 2.18GB/s ± 0% +4.54% (p=0.008 n=5+5) AddVW/4-8 2.46GB/s ± 0% 2.66GB/s ± 0% +8.33% (p=0.016 n=4+5) AddVW/5-8 2.76GB/s ± 0% 3.20GB/s ± 0% +16.03% (p=0.008 n=5+5) AddVW/10-8 3.63GB/s ± 0% 5.15GB/s ± 0% +41.83% (p=0.008 n=5+5) AddVW/100-8 4.79GB/s ± 0% 9.87GB/s ± 0% +106.12% (p=0.008 n=5+5) AddVW/1000-8 5.27GB/s ± 0% 12.42GB/s ± 0% +135.74% (p=0.008 n=5+5) AddVW/10000-8 5.31GB/s ± 0% 11.19GB/s ± 0% +110.71% (p=0.008 n=5+5) AddVW/100000-8 5.32GB/s ± 0% 11.32GB/s ± 0% +112.56% (p=0.008 n=5+5) SubVW/1-8 859MB/s ± 0% 864MB/s ± 0% +0.61% (p=0.008 n=5+5) SubVW/2-8 812MB/s ± 2% 1520MB/s ± 0% +87.09% (p=0.008 n=5+5) SubVW/3-8 2.08GB/s ± 0% 2.18GB/s ± 0% +4.55% (p=0.008 n=5+5) SubVW/4-8 2.46GB/s ± 0% 2.66GB/s ± 0% +8.33% (p=0.008 n=5+5) SubVW/5-8 2.75GB/s ± 0% 3.20GB/s ± 0% +16.03% (p=0.008 n=5+5) SubVW/10-8 3.63GB/s ± 0% 5.15GB/s ± 0% +41.82% (p=0.008 n=5+5) SubVW/100-8 4.79GB/s ± 0% 9.87GB/s ± 0% +106.13% (p=0.008 n=5+5) SubVW/1000-8 5.27GB/s ± 0% 12.42GB/s ± 0% +135.74% (p=0.008 n=5+5) SubVW/10000-8 5.31GB/s ± 0% 11.17GB/s ± 0% +110.44% (p=0.008 n=5+5) SubVW/100000-8 5.32GB/s ± 0% 11.31GB/s ± 0% +112.35% (p=0.008 n=5+5) AddMulVVW/1-8 1.97GB/s ± 1% 1.96GB/s ± 1% ~ (p=0.151 n=5+5) AddMulVVW/2-8 2.24GB/s ± 0% 2.25GB/s ± 0% ~ (p=0.095 n=5+5) AddMulVVW/3-8 2.11GB/s ± 0% 2.12GB/s ± 0% ~ (p=0.548 n=5+5) AddMulVVW/4-8 2.17GB/s ± 1% 2.17GB/s ± 1% ~ (p=0.548 n=5+5) AddMulVVW/5-8 2.22GB/s ± 1% 2.21GB/s ± 1% ~ (p=0.421 n=5+5) AddMulVVW/10-8 2.17GB/s ± 1% 2.16GB/s ± 0% ~ (p=0.095 n=5+5) AddMulVVW/100-8 2.35GB/s ± 0% 2.35GB/s ± 0% ~ (p=0.421 n=5+5) AddMulVVW/1000-8 2.47GB/s ± 0% 2.41GB/s ± 0% -2.09% (p=0.008 n=5+5) AddMulVVW/10000-8 2.16GB/s ± 0% 2.15GB/s ± 0% -0.23% (p=0.008 n=5+5) AddMulVVW/100000-8 2.03GB/s ± 1% 2.04GB/s ± 0% ~ (p=0.690 n=5+5) name old alloc/op new alloc/op delta FloatString/100-8 400B ± 0% 400B ± 0% ~ (all equal) FloatString/1000-8 3.22kB ± 0% 3.22kB ± 0% ~ (all equal) FloatString/10000-8 55.6kB ± 0% 55.5kB ± 0% ~ (p=0.206 n=5+5) FloatString/100000-8 627kB ± 0% 627kB ± 0% ~ (all equal) FloatAdd/10-8 0.00B 0.00B ~ (all equal) FloatAdd/100-8 0.00B 0.00B ~ (all equal) FloatAdd/1000-8 0.00B 0.00B ~ (all equal) FloatAdd/10000-8 0.00B 0.00B ~ (all equal) FloatAdd/100000-8 0.00B 0.00B ~ (all equal) FloatSub/10-8 0.00B 0.00B ~ (all equal) FloatSub/100-8 0.00B 0.00B ~ (all equal) FloatSub/1000-8 0.00B 0.00B ~ (all equal) FloatSub/10000-8 0.00B 0.00B ~ (all equal) FloatSub/100000-8 0.00B 0.00B ~ (all equal) FloatSqrt/64-8 416B ± 0% 416B ± 0% ~ (all equal) FloatSqrt/128-8 720B ± 0% 720B ± 0% ~ (all equal) FloatSqrt/256-8 816B ± 0% 816B ± 0% ~ (all equal) FloatSqrt/1000-8 2.50kB ± 0% 2.50kB ± 0% ~ (all equal) FloatSqrt/10000-8 23.5kB ± 0% 23.5kB ± 0% ~ (all equal) FloatSqrt/100000-8 251kB ± 0% 251kB ± 0% ~ (all equal) FloatSqrt/1000000-8 4.61MB ± 0% 4.61MB ± 0% ~ (all equal) name old allocs/op new allocs/op delta FloatString/100-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) FloatString/1000-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) FloatString/10000-8 42.0 ± 0% 42.0 ± 0% ~ (all equal) FloatString/100000-8 346 ± 0% 346 ± 0% ~ (all equal) FloatAdd/10-8 0.00 0.00 ~ (all equal) FloatAdd/100-8 0.00 0.00 ~ (all equal) FloatAdd/1000-8 0.00 0.00 ~ (all equal) FloatAdd/10000-8 0.00 0.00 ~ (all equal) FloatAdd/100000-8 0.00 0.00 ~ (all equal) FloatSub/10-8 0.00 0.00 ~ (all equal) FloatSub/100-8 0.00 0.00 ~ (all equal) FloatSub/1000-8 0.00 0.00 ~ (all equal) FloatSub/10000-8 0.00 0.00 ~ (all equal) FloatSub/100000-8 0.00 0.00 ~ (all equal) FloatSqrt/64-8 9.00 ± 0% 9.00 ± 0% ~ (all equal) FloatSqrt/128-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) FloatSqrt/256-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) FloatSqrt/1000-8 19.0 ± 0% 19.0 ± 0% ~ (all equal) FloatSqrt/10000-8 35.0 ± 0% 35.0 ± 0% ~ (all equal) FloatSqrt/100000-8 55.0 ± 0% 55.0 ± 0% ~ (all equal) FloatSqrt/1000000-8 122 ± 0% 122 ± 0% ~ (all equal) Change-Id: I6888d84c037d91f9e2199f3492ea3f6a0ed77b24 Reviewed-on: https://go-review.googlesource.com/77832 Reviewed-by: Vlad Krasnov <vlad@cloudflare.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-08 15:31:37 +00:00
Vlad Krasnov	fd3d27938a	math/big: implement addMulVVW on arm64 The lack of proper addMulVVW implementation for arm64 hurts RSA performance. This assembly implementation is optimized for arm64 based servers. name old time/op new time/op delta pkg:math/big goos:linux goarch:arm64 AddMulVVW/1 55.2ns ± 0% 11.9ns ± 1% -78.37% (p=0.000 n=8+10) AddMulVVW/2 67.0ns ± 0% 11.2ns ± 0% -83.28% (p=0.000 n=7+10) AddMulVVW/3 93.2ns ± 0% 13.2ns ± 0% -85.84% (p=0.000 n=10+10) AddMulVVW/4 126ns ± 0% 13ns ± 1% -89.82% (p=0.000 n=10+10) AddMulVVW/5 151ns ± 0% 17ns ± 0% -88.87% (p=0.000 n=10+9) AddMulVVW/10 323ns ± 0% 25ns ± 0% -92.20% (p=0.000 n=10+10) AddMulVVW/100 3.28µs ± 0% 0.14µs ± 0% -95.82% (p=0.000 n=10+10) AddMulVVW/1000 31.7µs ± 0% 1.3µs ± 0% -96.00% (p=0.000 n=10+8) AddMulVVW/10000 313µs ± 0% 13µs ± 0% -95.98% (p=0.000 n=10+10) AddMulVVW/100000 3.24ms ± 0% 0.13ms ± 1% -96.13% (p=0.000 n=9+9) pkg:crypto/rsa goos:linux goarch:arm64 RSA2048Decrypt 44.7ms ± 0% 4.0ms ± 6% -91.08% (p=0.000 n=8+10) RSA2048Sign 46.3ms ± 0% 5.0ms ± 0% -89.29% (p=0.000 n=9+10) 3PrimeRSA2048Decrypt 22.3ms ± 0% 2.4ms ± 0% -89.26% (p=0.000 n=10+10) Change-Id: I295f0bd5c51a4442d02c44ece1f6026d30dff0bc Reviewed-on: https://go-review.googlesource.com/76270 Reviewed-by: Vlad Krasnov <vlad@cloudflare.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Vlad Krasnov <vlad@cloudflare.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-07 23:04:38 +00:00
Cherry Zhang	084143d844	math/big: don't use R18 in ARM64 assembly R18 seems reserved on Apple platforms. May fix darwin/arm64 build. Change-Id: Ia2c1de550a64827c85a64affa53b94c62aacce8e Reviewed-on: https://go-review.googlesource.com/98896 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Elias Naur <elias.naur@gmail.com>	2018-03-06 15:34:00 +00:00
erifan01	c4f3fe95c6	math/big: optimize addVV and subVV on arm64 The biggest hot spot of the existing implementation is "load" operations, which lead to poor performance. By unrolling the cycle 4x and 2x, and using "LDP", "STP" instructions, this CL can reduce the "load" cost and improve performance. Benchmarks: name old time/op new time/op delta AddVV/1-8 21.5ns ± 0% 11.5ns ± 0% -46.51% (p=0.008 n=5+5) AddVV/2-8 13.5ns ± 0% 12.0ns ± 0% -11.11% (p=0.008 n=5+5) AddVV/3-8 15.5ns ± 0% 13.0ns ± 0% -16.13% (p=0.008 n=5+5) AddVV/4-8 17.5ns ± 0% 13.5ns ± 0% -22.86% (p=0.008 n=5+5) AddVV/5-8 19.5ns ± 0% 14.5ns ± 0% -25.64% (p=0.008 n=5+5) AddVV/10-8 29.5ns ± 0% 18.0ns ± 0% -38.98% (p=0.008 n=5+5) AddVV/100-8 217ns ± 0% 94ns ± 0% -56.64% (p=0.008 n=5+5) AddVV/1000-8 2.02µs ± 0% 1.03µs ± 0% -48.85% (p=0.008 n=5+5) AddVV/10000-8 20.5µs ± 0% 11.3µs ± 0% -44.70% (p=0.008 n=5+5) AddVV/100000-8 247µs ± 3% 154µs ± 0% -37.52% (p=0.008 n=5+5) SubVV/1-8 21.5ns ± 0% 11.5ns ± 0% ~ (p=0.079 n=4+5) SubVV/2-8 13.5ns ± 0% 12.0ns ± 0% -11.11% (p=0.008 n=5+5) SubVV/3-8 15.5ns ± 0% 13.0ns ± 0% -16.13% (p=0.008 n=5+5) SubVV/4-8 17.5ns ± 0% 13.5ns ± 0% -22.86% (p=0.008 n=5+5) SubVV/5-8 19.5ns ± 0% 14.5ns ± 0% -25.64% (p=0.008 n=5+5) SubVV/10-8 29.5ns ± 0% 18.0ns ± 0% -38.98% (p=0.008 n=5+5) SubVV/100-8 217ns ± 0% 94ns ± 0% -56.64% (p=0.008 n=5+5) SubVV/1000-8 2.02µs ± 0% 0.80µs ± 0% -60.50% (p=0.008 n=5+5) SubVV/10000-8 20.5µs ± 0% 11.3µs ± 0% -44.99% (p=0.008 n=5+5) SubVV/100000-8 221µs ±11% 223µs ±16% ~ (p=0.690 n=5+5) AddVW/1-8 9.32ns ± 0% 9.32ns ± 0% ~ (all equal) AddVW/2-8 19.7ns ± 1% 19.7ns ± 0% ~ (p=0.381 n=5+4) AddVW/3-8 11.5ns ± 0% 11.5ns ± 0% ~ (all equal) AddVW/4-8 13.0ns ± 0% 13.0ns ± 0% ~ (all equal) AddVW/5-8 14.5ns ± 0% 14.5ns ± 0% ~ (all equal) AddVW/10-8 22.0ns ± 0% 22.0ns ± 0% ~ (all equal) AddVW/100-8 167ns ± 0% 167ns ± 0% ~ (all equal) AddVW/1000-8 1.52µs ± 0% 1.52µs ± 0% +0.40% (p=0.008 n=5+5) AddVW/10000-8 15.1µs ± 0% 15.1µs ± 0% ~ (p=0.556 n=5+4) AddVW/100000-8 152µs ± 1% 152µs ± 1% ~ (p=0.690 n=5+5) AddMulVVW/1-8 33.3ns ± 0% 32.7ns ± 1% -1.86% (p=0.008 n=5+5) AddMulVVW/2-8 59.3ns ± 1% 56.9ns ± 1% -4.15% (p=0.008 n=5+5) AddMulVVW/3-8 80.5ns ± 1% 85.4ns ± 3% +6.19% (p=0.008 n=5+5) AddMulVVW/4-8 127ns ± 0% 111ns ± 1% -13.19% (p=0.008 n=5+5) AddMulVVW/5-8 144ns ± 0% 149ns ± 0% +3.47% (p=0.016 n=4+5) AddMulVVW/10-8 298ns ± 1% 283ns ± 0% -4.77% (p=0.008 n=5+5) AddMulVVW/100-8 3.06µs ± 0% 2.99µs ± 0% -2.21% (p=0.008 n=5+5) AddMulVVW/1000-8 31.3µs ± 0% 26.9µs ± 0% -14.17% (p=0.008 n=5+5) AddMulVVW/10000-8 316µs ± 0% 305µs ± 0% -3.51% (p=0.008 n=5+5) AddMulVVW/100000-8 3.17ms ± 0% 3.17ms ± 1% ~ (p=0.690 n=5+5) DecimalConversion-8 316µs ± 1% 313µs ± 2% ~ (p=0.095 n=5+5) FloatString/100-8 2.53µs ± 1% 2.56µs ± 2% ~ (p=0.222 n=5+5) FloatString/1000-8 58.4µs ± 0% 58.5µs ± 0% ~ (p=0.206 n=5+5) FloatString/10000-8 4.59ms ± 0% 4.58ms ± 0% -0.31% (p=0.008 n=5+5) FloatString/100000-8 446ms ± 0% 444ms ± 0% -0.31% (p=0.008 n=5+5) FloatAdd/10-8 184ns ± 0% 172ns ± 0% -6.30% (p=0.008 n=5+5) FloatAdd/100-8 189ns ± 2% 191ns ± 4% ~ (p=0.381 n=5+5) FloatAdd/1000-8 371ns ± 0% 347ns ± 1% -6.42% (p=0.008 n=5+5) FloatAdd/10000-8 1.87µs ± 0% 1.68µs ± 0% -10.16% (p=0.008 n=5+5) FloatAdd/100000-8 17.1µs ± 0% 15.6µs ± 0% -8.74% (p=0.016 n=5+4) FloatSub/10-8 152ns ± 0% 138ns ± 0% -9.47% (p=0.000 n=4+5) FloatSub/100-8 148ns ± 0% 142ns ± 0% -4.05% (p=0.000 n=5+4) FloatSub/1000-8 245ns ± 1% 217ns ± 0% -11.28% (p=0.000 n=5+4) FloatSub/10000-8 1.07µs ± 0% 0.88µs ± 1% -18.14% (p=0.008 n=5+5) FloatSub/100000-8 9.58µs ± 0% 7.96µs ± 0% -16.84% (p=0.008 n=5+5) ParseFloatSmallExp-8 28.8µs ± 1% 29.0µs ± 1% ~ (p=0.095 n=5+5) ParseFloatLargeExp-8 126µs ± 1% 126µs ± 1% ~ (p=0.841 n=5+5) GCD10x10/WithoutXY-8 277ns ± 2% 281ns ± 4% ~ (p=0.746 n=5+5) GCD10x10/WithXY-8 2.10µs ± 1% 2.12µs ± 3% ~ (p=0.548 n=5+5) GCD10x100/WithoutXY-8 615ns ± 3% 607ns ± 2% ~ (p=0.135 n=5+5) GCD10x100/WithXY-8 3.50µs ± 2% 3.62µs ± 5% ~ (p=0.151 n=5+5) GCD10x1000/WithoutXY-8 1.39µs ± 2% 1.39µs ± 3% ~ (p=0.690 n=5+5) GCD10x1000/WithXY-8 7.39µs ± 1% 7.34µs ± 2% ~ (p=0.135 n=5+5) GCD10x10000/WithoutXY-8 8.66µs ± 1% 8.68µs ± 1% ~ (p=0.421 n=5+5) GCD10x10000/WithXY-8 28.1µs ± 2% 27.0µs ± 2% -3.81% (p=0.008 n=5+5) GCD10x100000/WithoutXY-8 79.3µs ± 1% 79.3µs ± 1% ~ (p=0.841 n=5+5) GCD10x100000/WithXY-8 238µs ± 0% 227µs ± 1% -4.74% (p=0.008 n=5+5) GCD100x100/WithoutXY-8 1.89µs ± 1% 1.88µs ± 2% ~ (p=0.968 n=5+5) GCD100x100/WithXY-8 26.7µs ± 1% 27.0µs ± 1% +1.44% (p=0.032 n=5+5) GCD100x1000/WithoutXY-8 4.48µs ± 1% 4.45µs ± 2% ~ (p=0.341 n=5+5) GCD100x1000/WithXY-8 36.3µs ± 1% 35.1µs ± 1% -3.27% (p=0.008 n=5+5) GCD100x10000/WithoutXY-8 22.8µs ± 0% 22.7µs ± 1% ~ (p=0.056 n=5+5) GCD100x10000/WithXY-8 145µs ± 1% 133µs ± 1% -8.33% (p=0.008 n=5+5) GCD100x100000/WithoutXY-8 198µs ± 0% 195µs ± 0% -1.56% (p=0.008 n=5+5) GCD100x100000/WithXY-8 1.11ms ± 0% 1.00ms ± 0% -10.04% (p=0.008 n=5+5) GCD1000x1000/WithoutXY-8 25.2µs ± 1% 24.8µs ± 1% -1.63% (p=0.016 n=5+5) GCD1000x1000/WithXY-8 513µs ± 0% 517µs ± 2% ~ (p=0.421 n=5+5) GCD1000x10000/WithoutXY-8 57.0µs ± 0% 52.7µs ± 1% -7.56% (p=0.008 n=5+5) GCD1000x10000/WithXY-8 1.20ms ± 0% 1.10ms ± 0% -8.70% (p=0.008 n=5+5) GCD1000x100000/WithoutXY-8 358µs ± 0% 318µs ± 1% -11.03% (p=0.008 n=5+5) GCD1000x100000/WithXY-8 8.71ms ± 0% 7.65ms ± 0% -12.19% (p=0.008 n=5+5) GCD10000x10000/WithoutXY-8 690µs ± 0% 630µs ± 0% -8.71% (p=0.008 n=5+5) GCD10000x10000/WithXY-8 16.0ms ± 1% 14.9ms ± 0% -6.85% (p=0.008 n=5+5) GCD10000x100000/WithoutXY-8 2.09ms ± 0% 1.75ms ± 0% -16.09% (p=0.016 n=5+4) GCD10000x100000/WithXY-8 86.8ms ± 0% 76.3ms ± 0% -12.09% (p=0.008 n=5+5) GCD100000x100000/WithoutXY-8 51.1ms ± 0% 46.0ms ± 0% -9.97% (p=0.008 n=5+5) GCD100000x100000/WithXY-8 1.25s ± 0% 1.15s ± 0% -7.92% (p=0.008 n=5+5) Hilbert-8 2.45ms ± 1% 2.49ms ± 1% +1.99% (p=0.008 n=5+5) Binomial-8 4.98µs ± 3% 4.90µs ± 2% ~ (p=0.421 n=5+5) QuoRem-8 7.10µs ± 0% 6.21µs ± 0% -12.55% (p=0.016 n=5+4) Exp-8 161ms ± 0% 161ms ± 0% ~ (p=0.421 n=5+5) Exp2-8 161ms ± 0% 161ms ± 0% ~ (p=0.151 n=5+5) Bitset-8 40.4ns ± 0% 40.3ns ± 0% ~ (p=0.190 n=5+5) BitsetNeg-8 163ns ± 3% 137ns ± 2% -15.91% (p=0.008 n=5+5) BitsetOrig-8 377ns ± 1% 372ns ± 1% -1.22% (p=0.024 n=5+5) BitsetNegOrig-8 631ns ± 1% 605ns ± 1% -4.09% (p=0.008 n=5+5) ModSqrt225_Tonelli-8 7.26ms ± 0% 7.26ms ± 0% ~ (p=0.548 n=5+5) ModSqrt224_3Mod4-8 2.24ms ± 0% 2.24ms ± 0% ~ (p=1.000 n=5+5) ModSqrt5430_Tonelli-8 62.4s ± 0% 62.4s ± 0% ~ (p=0.841 n=5+5) ModSqrt5430_3Mod4-8 20.8s ± 0% 20.7s ± 0% ~ (p=0.056 n=5+5) Sqrt-8 101µs ± 0% 89µs ± 0% -12.17% (p=0.008 n=5+5) IntSqr/1-8 32.5ns ± 1% 32.7ns ± 1% ~ (p=0.056 n=5+5) IntSqr/2-8 160ns ± 5% 158ns ± 0% ~ (p=0.397 n=5+4) IntSqr/3-8 298ns ± 4% 296ns ± 4% ~ (p=0.667 n=5+5) IntSqr/5-8 737ns ± 5% 761ns ± 3% +3.34% (p=0.016 n=5+5) IntSqr/8-8 1.87µs ± 4% 1.90µs ± 3% ~ (p=0.222 n=5+5) IntSqr/10-8 2.96µs ± 4% 2.92µs ± 6% ~ (p=0.310 n=5+5) IntSqr/20-8 6.28µs ± 3% 6.21µs ± 2% ~ (p=0.310 n=5+5) IntSqr/30-8 14.0µs ± 2% 13.9µs ± 2% ~ (p=0.548 n=5+5) IntSqr/50-8 37.7µs ± 3% 38.3µs ± 2% ~ (p=0.095 n=5+5) IntSqr/80-8 95.9µs ± 2% 95.1µs ± 1% ~ (p=0.310 n=5+5) IntSqr/100-8 148µs ± 1% 148µs ± 1% ~ (p=0.841 n=5+5) IntSqr/200-8 586µs ± 1% 587µs ± 1% ~ (p=1.000 n=5+5) IntSqr/300-8 1.32ms ± 0% 1.31ms ± 1% -0.73% (p=0.032 n=5+5) IntSqr/500-8 2.48ms ± 0% 2.45ms ± 0% -1.15% (p=0.008 n=5+5) IntSqr/800-8 4.68ms ± 0% 4.62ms ± 0% -1.23% (p=0.008 n=5+5) IntSqr/1000-8 7.57ms ± 0% 7.50ms ± 0% -0.84% (p=0.008 n=5+5) Mul-8 311ms ± 0% 308ms ± 0% -0.81% (p=0.008 n=5+5) Exp3Power/0x10-8 574ns ± 1% 578ns ± 2% ~ (p=0.500 n=5+5) Exp3Power/0x40-8 640ns ± 1% 646ns ± 0% ~ (p=0.056 n=5+5) Exp3Power/0x100-8 1.42µs ± 1% 1.42µs ± 1% ~ (p=0.246 n=5+5) Exp3Power/0x400-8 8.30µs ± 1% 8.29µs ± 1% ~ (p=0.802 n=5+5) Exp3Power/0x1000-8 60.0µs ± 0% 59.9µs ± 0% -0.24% (p=0.016 n=5+5) Exp3Power/0x4000-8 817µs ± 0% 816µs ± 0% -0.17% (p=0.008 n=5+5) Exp3Power/0x10000-8 7.80ms ± 1% 7.70ms ± 0% -1.23% (p=0.008 n=5+5) Exp3Power/0x40000-8 73.4ms ± 0% 72.5ms ± 0% -1.28% (p=0.008 n=5+5) Exp3Power/0x100000-8 665ms ± 0% 656ms ± 0% -1.34% (p=0.008 n=5+5) Exp3Power/0x400000-8 5.99s ± 0% 5.90s ± 0% -1.40% (p=0.008 n=5+5) Fibo-8 116ms ± 0% 50ms ± 0% -57.09% (p=0.008 n=5+5) NatSqr/1-8 112ns ± 4% 112ns ± 2% ~ (p=0.968 n=5+5) NatSqr/2-8 251ns ± 2% 250ns ± 1% ~ (p=0.571 n=5+5) NatSqr/3-8 378ns ± 2% 379ns ± 2% ~ (p=0.794 n=5+5) NatSqr/5-8 829ns ± 3% 827ns ± 2% ~ (p=1.000 n=5+5) NatSqr/8-8 1.97µs ± 2% 1.95µs ± 2% ~ (p=0.310 n=5+5) NatSqr/10-8 3.02µs ± 2% 2.99µs ± 2% ~ (p=0.421 n=5+5) NatSqr/20-8 6.51µs ± 2% 6.49µs ± 1% ~ (p=0.841 n=5+5) NatSqr/30-8 14.1µs ± 2% 14.0µs ± 2% ~ (p=0.841 n=5+5) NatSqr/50-8 38.1µs ± 2% 38.3µs ± 3% ~ (p=0.690 n=5+5) NatSqr/80-8 95.5µs ± 2% 96.0µs ± 1% ~ (p=0.421 n=5+5) NatSqr/100-8 150µs ± 1% 148µs ± 2% ~ (p=0.095 n=5+5) NatSqr/200-8 588µs ± 1% 590µs ± 1% ~ (p=0.421 n=5+5) NatSqr/300-8 1.32ms ± 1% 1.31ms ± 1% ~ (p=0.841 n=5+5) NatSqr/500-8 2.50ms ± 0% 2.47ms ± 0% -1.03% (p=0.008 n=5+5) NatSqr/800-8 4.70ms ± 0% 4.64ms ± 0% -1.31% (p=0.008 n=5+5) NatSqr/1000-8 7.60ms ± 0% 7.52ms ± 0% -1.01% (p=0.008 n=5+5) ScanPi-8 326µs ± 0% 326µs ± 0% ~ (p=0.841 n=5+5) StringPiParallel-8 70.3µs ± 5% 63.8µs ±10% ~ (p=0.056 n=5+5) Scan/10/Base2-8 1.09µs ± 0% 1.09µs ± 0% ~ (p=0.317 n=5+5) Scan/100/Base2-8 7.79µs ± 0% 7.78µs ± 0% ~ (p=0.063 n=5+5) Scan/1000/Base2-8 79.0µs ± 0% 78.9µs ± 0% -0.18% (p=0.008 n=5+5) Scan/10000/Base2-8 1.22ms ± 0% 1.22ms ± 0% -0.15% (p=0.008 n=5+5) Scan/100000/Base2-8 55.1ms ± 0% 55.2ms ± 0% +0.20% (p=0.008 n=5+5) Scan/10/Base8-8 512ns ± 0% 512ns ± 1% ~ (p=0.810 n=5+5) Scan/100/Base8-8 2.89µs ± 0% 2.89µs ± 0% ~ (p=0.810 n=5+5) Scan/1000/Base8-8 31.0µs ± 0% 31.0µs ± 0% ~ (p=0.151 n=5+5) Scan/10000/Base8-8 740µs ± 0% 741µs ± 0% +0.10% (p=0.008 n=5+5) Scan/100000/Base8-8 50.6ms ± 0% 50.6ms ± 0% +0.08% (p=0.008 n=5+5) Scan/10/Base10-8 487ns ± 0% 487ns ± 0% ~ (p=0.571 n=5+5) Scan/100/Base10-8 2.67µs ± 0% 2.67µs ± 0% ~ (p=0.810 n=5+5) Scan/1000/Base10-8 28.7µs ± 0% 28.7µs ± 0% +0.06% (p=0.008 n=5+5) Scan/10000/Base10-8 716µs ± 0% 717µs ± 0% ~ (p=0.222 n=5+5) Scan/100000/Base10-8 50.3ms ± 0% 50.3ms ± 0% +0.10% (p=0.008 n=5+5) Scan/10/Base16-8 438ns ± 0% 437ns ± 1% ~ (p=0.786 n=5+5) Scan/100/Base16-8 2.47µs ± 0% 2.47µs ± 0% -0.19% (p=0.048 n=5+5) Scan/1000/Base16-8 27.2µs ± 0% 27.3µs ± 0% ~ (p=0.087 n=5+5) Scan/10000/Base16-8 722µs ± 0% 722µs ± 0% +0.11% (p=0.008 n=5+5) Scan/100000/Base16-8 52.6ms ± 0% 52.7ms ± 0% +0.15% (p=0.008 n=5+5) String/10/Base2-8 247ns ± 2% 248ns ± 1% ~ (p=0.437 n=5+5) String/100/Base2-8 1.51µs ± 0% 1.51µs ± 0% -0.37% (p=0.024 n=5+5) String/1000/Base2-8 13.6µs ± 1% 13.5µs ± 0% ~ (p=0.095 n=5+5) String/10000/Base2-8 135µs ± 0% 135µs ± 1% ~ (p=0.841 n=5+5) String/100000/Base2-8 1.32ms ± 1% 1.32ms ± 1% ~ (p=0.690 n=5+5) String/10/Base8-8 169ns ± 1% 169ns ± 1% ~ (p=1.000 n=5+5) String/100/Base8-8 636ns ± 0% 634ns ± 1% ~ (p=0.413 n=5+5) String/1000/Base8-8 5.33µs ± 1% 5.32µs ± 0% ~ (p=0.222 n=5+5) String/10000/Base8-8 50.9µs ± 1% 50.7µs ± 0% ~ (p=0.151 n=5+5) String/100000/Base8-8 500µs ± 1% 497µs ± 0% ~ (p=0.421 n=5+5) String/10/Base10-8 516ns ± 1% 513ns ± 0% -0.62% (p=0.016 n=5+4) String/100/Base10-8 1.97µs ± 0% 1.96µs ± 0% ~ (p=0.667 n=4+5) String/1000/Base10-8 12.5µs ± 0% 11.5µs ± 0% -7.92% (p=0.008 n=5+5) String/10000/Base10-8 57.7µs ± 0% 52.5µs ± 0% -8.93% (p=0.008 n=5+5) String/100000/Base10-8 25.6ms ± 0% 21.6ms ± 0% -15.94% (p=0.008 n=5+5) String/10/Base16-8 150ns ± 1% 149ns ± 0% ~ (p=0.413 n=5+4) String/100/Base16-8 514ns ± 1% 514ns ± 1% ~ (p=0.849 n=5+5) String/1000/Base16-8 4.01µs ± 0% 4.01µs ± 0% ~ (p=0.421 n=5+5) String/10000/Base16-8 37.8µs ± 1% 37.8µs ± 1% ~ (p=0.841 n=5+5) String/100000/Base16-8 373µs ± 2% 373µs ± 0% ~ (p=0.421 n=5+5) LeafSize/0-8 6.63ms ± 0% 6.63ms ± 0% ~ (p=0.730 n=4+5) LeafSize/1-8 74.0µs ± 0% 67.7µs ± 1% -8.53% (p=0.008 n=5+5) LeafSize/2-8 74.2µs ± 0% 68.3µs ± 1% -7.99% (p=0.008 n=5+5) LeafSize/3-8 379µs ± 0% 309µs ± 0% -18.52% (p=0.008 n=5+5) LeafSize/4-8 72.7µs ± 1% 66.7µs ± 0% -8.37% (p=0.008 n=5+5) LeafSize/5-8 471µs ± 0% 384µs ± 0% -18.55% (p=0.008 n=5+5) LeafSize/6-8 378µs ± 0% 308µs ± 0% -18.59% (p=0.008 n=5+5) LeafSize/7-8 245µs ± 0% 204µs ± 1% -16.75% (p=0.008 n=5+5) LeafSize/8-8 73.4µs ± 0% 66.9µs ± 1% -8.79% (p=0.008 n=5+5) LeafSize/9-8 538µs ± 0% 437µs ± 0% -18.75% (p=0.008 n=5+5) LeafSize/10-8 472µs ± 0% 396µs ± 1% -16.01% (p=0.008 n=5+5) LeafSize/11-8 460µs ± 0% 374µs ± 0% -18.58% (p=0.008 n=5+5) LeafSize/12-8 378µs ± 0% 308µs ± 0% -18.38% (p=0.008 n=5+5) LeafSize/13-8 343µs ± 0% 284µs ± 0% -17.30% (p=0.008 n=5+5) LeafSize/14-8 248µs ± 0% 206µs ± 0% -16.94% (p=0.008 n=5+5) LeafSize/15-8 169µs ± 0% 144µs ± 0% -14.69% (p=0.008 n=5+5) LeafSize/16-8 72.9µs ± 0% 66.8µs ± 1% -8.27% (p=0.008 n=5+5) LeafSize/32-8 82.5µs ± 0% 76.7µs ± 0% -7.04% (p=0.008 n=5+5) LeafSize/64-8 134µs ± 0% 129µs ± 0% -3.80% (p=0.008 n=5+5) ProbablyPrime/n=0-8 44.2ms ± 0% 43.4ms ± 0% -1.95% (p=0.008 n=5+5) ProbablyPrime/n=1-8 64.9ms ± 0% 64.0ms ± 0% -1.27% (p=0.008 n=5+5) ProbablyPrime/n=5-8 147ms ± 0% 146ms ± 0% -0.58% (p=0.008 n=5+5) ProbablyPrime/n=10-8 250ms ± 0% 249ms ± 0% -0.35% (p=0.008 n=5+5) ProbablyPrime/n=20-8 456ms ± 0% 455ms ± 0% -0.18% (p=0.008 n=5+5) ProbablyPrime/Lucas-8 23.6ms ± 0% 22.7ms ± 0% -3.74% (p=0.008 n=5+5) ProbablyPrime/MillerRabinBase2-8 20.7ms ± 0% 20.6ms ± 0% ~ (p=0.421 n=5+5) FloatSqrt/64-8 2.25µs ± 1% 2.29µs ± 0% +1.48% (p=0.008 n=5+5) FloatSqrt/128-8 4.86µs ± 1% 4.92µs ± 1% +1.21% (p=0.032 n=5+5) FloatSqrt/256-8 13.6µs ± 0% 13.7µs ± 1% +1.31% (p=0.032 n=5+5) FloatSqrt/1000-8 70.0µs ± 1% 70.1µs ± 0% ~ (p=0.690 n=5+5) FloatSqrt/10000-8 1.92ms ± 0% 1.90ms ± 0% -0.59% (p=0.008 n=5+5) FloatSqrt/100000-8 55.3ms ± 0% 54.8ms ± 0% -1.01% (p=0.008 n=5+5) FloatSqrt/1000000-8 4.56s ± 0% 4.50s ± 0% -1.28% (p=0.008 n=5+5) name old speed new speed delta AddVV/1-8 2.97GB/s ± 0% 5.56GB/s ± 0% +86.85% (p=0.008 n=5+5) AddVV/2-8 9.47GB/s ± 0% 10.66GB/s ± 0% +12.50% (p=0.008 n=5+5) AddVV/3-8 12.4GB/s ± 0% 14.7GB/s ± 0% +19.10% (p=0.008 n=5+5) AddVV/4-8 14.6GB/s ± 0% 18.9GB/s ± 0% +29.63% (p=0.016 n=4+5) AddVV/5-8 16.4GB/s ± 0% 22.0GB/s ± 0% +34.47% (p=0.016 n=5+4) AddVV/10-8 21.7GB/s ± 0% 35.5GB/s ± 0% +63.89% (p=0.008 n=5+5) AddVV/100-8 29.4GB/s ± 0% 68.0GB/s ± 0% +131.38% (p=0.008 n=5+5) AddVV/1000-8 31.7GB/s ± 0% 61.9GB/s ± 0% +95.43% (p=0.008 n=5+5) AddVV/10000-8 31.2GB/s ± 0% 56.4GB/s ± 0% +80.83% (p=0.008 n=5+5) AddVV/100000-8 25.9GB/s ± 3% 41.4GB/s ± 0% +59.98% (p=0.008 n=5+5) SubVV/1-8 2.97GB/s ± 0% 5.56GB/s ± 0% +86.97% (p=0.016 n=4+5) SubVV/2-8 9.47GB/s ± 0% 10.66GB/s ± 0% +12.51% (p=0.008 n=5+5) SubVV/3-8 12.4GB/s ± 0% 14.8GB/s ± 0% +19.23% (p=0.016 n=4+5) SubVV/4-8 14.6GB/s ± 0% 18.9GB/s ± 0% +29.56% (p=0.008 n=5+5) SubVV/5-8 16.4GB/s ± 0% 22.0GB/s ± 0% +34.47% (p=0.016 n=4+5) SubVV/10-8 21.7GB/s ± 0% 35.5GB/s ± 0% +63.89% (p=0.008 n=5+5) SubVV/100-8 29.4GB/s ± 0% 68.0GB/s ± 0% +131.38% (p=0.008 n=5+5) SubVV/1000-8 31.6GB/s ± 0% 80.1GB/s ± 0% +153.08% (p=0.008 n=5+5) SubVV/10000-8 31.2GB/s ± 0% 56.7GB/s ± 0% +81.79% (p=0.008 n=5+5) SubVV/100000-8 29.1GB/s ±10% 29.0GB/s ±18% ~ (p=0.690 n=5+5) AddVW/1-8 859MB/s ± 0% 859MB/s ± 0% -0.01% (p=0.008 n=5+5) AddVW/2-8 811MB/s ± 1% 814MB/s ± 0% ~ (p=0.413 n=5+4) AddVW/3-8 2.08GB/s ± 0% 2.08GB/s ± 0% ~ (p=0.206 n=5+5) AddVW/4-8 2.46GB/s ± 0% 2.46GB/s ± 0% ~ (p=0.056 n=5+5) AddVW/5-8 2.75GB/s ± 0% 2.75GB/s ± 0% ~ (p=0.508 n=5+5) AddVW/10-8 3.63GB/s ± 0% 3.63GB/s ± 0% ~ (p=0.214 n=5+5) AddVW/100-8 4.79GB/s ± 0% 4.79GB/s ± 0% ~ (p=0.500 n=5+5) AddVW/1000-8 5.27GB/s ± 0% 5.25GB/s ± 0% -0.43% (p=0.008 n=5+5) AddVW/10000-8 5.30GB/s ± 0% 5.30GB/s ± 0% ~ (p=0.397 n=5+5) AddVW/100000-8 5.27GB/s ± 1% 5.25GB/s ± 1% ~ (p=0.690 n=5+5) AddMulVVW/1-8 1.92GB/s ± 0% 1.96GB/s ± 1% +1.95% (p=0.008 n=5+5) AddMulVVW/2-8 2.16GB/s ± 1% 2.25GB/s ± 1% +4.32% (p=0.008 n=5+5) AddMulVVW/3-8 2.39GB/s ± 1% 2.25GB/s ± 3% -5.79% (p=0.008 n=5+5) AddMulVVW/4-8 2.00GB/s ± 0% 2.31GB/s ± 1% +15.31% (p=0.008 n=5+5) AddMulVVW/5-8 2.22GB/s ± 0% 2.14GB/s ± 0% -3.86% (p=0.008 n=5+5) AddMulVVW/10-8 2.15GB/s ± 1% 2.25GB/s ± 0% +5.03% (p=0.008 n=5+5) AddMulVVW/100-8 2.09GB/s ± 0% 2.14GB/s ± 0% +2.25% (p=0.008 n=5+5) AddMulVVW/1000-8 2.04GB/s ± 0% 2.38GB/s ± 0% +16.52% (p=0.008 n=5+5) AddMulVVW/10000-8 2.03GB/s ± 0% 2.10GB/s ± 0% +3.64% (p=0.008 n=5+5) AddMulVVW/100000-8 2.02GB/s ± 0% 2.02GB/s ± 1% ~ (p=0.690 n=5+5) Change-Id: Ie482d67a7dbb5af6f5d81af2b3d9d14bd66336db Reviewed-on: https://go-review.googlesource.com/77831 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-06 00:22:08 +00:00
Ilya Tocar	c15984c6c6	math: remove unused variable useSSE41 was used inside asm implementation of floor to select between base and ss4 code path. We intrinsified floor and left asm functions as a backup for non-sse4 systems. This made variable unused, so remove it. Change-Id: Ia2633de7c7cb1ef1d5b15a2366b523e481b722d9 Reviewed-on: https://go-review.googlesource.com/97935 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-01 18:51:44 +00:00
erifan01	ed6c6c9c11	math: optimize sinh and cosh Improve performance by reducing unnecessary function calls Benchmarks: Tme old time/op new time/op delta Cosh-8 229ns ± 0% 138ns ± 0% -39.74% (p=0.008 n=5+5) Sinh-8 231ns ± 0% 139ns ± 0% -39.83% (p=0.008 n=5+5) Change-Id: Icab5485849bbfaafca8429d06b67c558101f4f3c Reviewed-on: https://go-review.googlesource.com/85477 Reviewed-by: Robert Griesemer <gri@golang.org>	2018-02-27 04:34:37 +00:00
Ilya Tocar	c3935c08d2	math/big: speed-up addMulVVW on amd64 Use MULX/ADOX/ADCX instructions to speed-up addMulVVW, when they are available. addMulVVW is a hotspot in rsa. This is faster than ADD/ADC/IMUL version, because ADOX/ADCX only modify carry/overflow flag, so they can be interleaved with each other and with MULX, which doesn't modify flags at all. Increasing unroll factor to e. g. 16 makes rsa 1% faster, but 3PrimeRSA2048Decrypt performance falls back to baseline. Updates #20058 AddMulVVW/1-8 3.28ns ± 2% 3.26ns ± 3% ~ (p=0.107 n=10+10) AddMulVVW/2-8 4.26ns ± 2% 4.24ns ± 3% ~ (p=0.327 n=9+9) AddMulVVW/3-8 5.07ns ± 2% 5.26ns ± 2% +3.73% (p=0.000 n=10+10) AddMulVVW/4-8 6.40ns ± 2% 6.50ns ± 2% +1.61% (p=0.000 n=10+10) AddMulVVW/5-8 6.77ns ± 2% 6.86ns ± 1% +1.38% (p=0.001 n=9+9) AddMulVVW/10-8 12.2ns ± 2% 10.6ns ± 3% -13.65% (p=0.000 n=10+10) AddMulVVW/100-8 79.7ns ± 2% 52.4ns ± 1% -34.17% (p=0.000 n=10+10) AddMulVVW/1000-8 695ns ± 1% 491ns ± 2% -29.39% (p=0.000 n=9+10) AddMulVVW/10000-8 7.26µs ± 2% 5.92µs ± 6% -18.42% (p=0.000 n=10+10) AddMulVVW/100000-8 72.6µs ± 2% 62.2µs ± 2% -14.31% (p=0.000 n=10+10) crypto/rsa speed-up is smaller, but stil noticeable: RSA2048Decrypt-8 1.61ms ± 1% 1.38ms ± 1% -14.13% (p=0.000 n=10+10) RSA2048Sign-8 1.93ms ± 1% 1.70ms ± 1% -11.86% (p=0.000 n=10+10) 3PrimeRSA2048Decrypt-8 932µs ± 0% 828µs ± 0% -11.15% (p=0.000 n=10+10) Results on crypto/tls: HandshakeServer/RSA-8 901µs ± 1% 777µs ± 0% -13.70% (p=0.000 n=10+8) HandshakeServer/ECDHE-P256-RSA-8 1.01ms ± 1% 0.90ms ± 0% -11.53% (p=0.000 n=10+9) Full math/big benchmarks: name old time/op new time/op delta AddVV/1-8 3.74ns ± 6% 3.55ns ± 2% ~ (p=0.082 n=10+8) AddVV/2-8 3.96ns ± 2% 3.98ns ± 5% ~ (p=0.794 n=10+9) AddVV/3-8 4.97ns ± 2% 4.94ns ± 1% ~ (p=0.081 n=10+9) AddVV/4-8 5.59ns ± 2% 5.59ns ± 2% ~ (p=0.809 n=10+10) AddVV/5-8 6.63ns ± 1% 6.62ns ± 1% ~ (p=0.560 n=9+10) AddVV/10-8 8.11ns ± 1% 8.11ns ± 2% ~ (p=0.402 n=10+10) AddVV/100-8 46.9ns ± 2% 46.8ns ± 1% ~ (p=0.809 n=10+10) AddVV/1000-8 389ns ± 1% 391ns ± 4% ~ (p=0.809 n=10+10) AddVV/10000-8 5.05µs ± 5% 4.98µs ± 2% ~ (p=0.113 n=9+10) AddVV/100000-8 55.3µs ± 3% 55.2µs ± 3% ~ (p=0.796 n=10+10) AddVW/1-8 3.04ns ± 3% 3.02ns ± 3% ~ (p=0.538 n=10+10) AddVW/2-8 3.57ns ± 2% 3.61ns ± 2% +1.12% (p=0.032 n=9+9) AddVW/3-8 3.77ns ± 1% 3.79ns ± 2% ~ (p=0.719 n=10+10) AddVW/4-8 4.69ns ± 1% 4.69ns ± 2% ~ (p=0.920 n=10+9) AddVW/5-8 4.58ns ± 1% 4.58ns ± 1% ~ (p=0.812 n=10+10) AddVW/10-8 7.62ns ± 2% 7.63ns ± 1% ~ (p=0.926 n=10+10) AddVW/100-8 41.1ns ± 2% 42.4ns ± 3% +3.34% (p=0.000 n=10+10) AddVW/1000-8 386ns ± 2% 389ns ± 4% ~ (p=0.514 n=10+10) AddVW/10000-8 3.88µs ± 3% 3.87µs ± 3% ~ (p=0.448 n=10+10) AddVW/100000-8 41.2µs ± 3% 41.7µs ± 3% ~ (p=0.148 n=10+10) AddMulVVW/1-8 3.28ns ± 2% 3.26ns ± 3% ~ (p=0.107 n=10+10) AddMulVVW/2-8 4.26ns ± 2% 4.24ns ± 3% ~ (p=0.327 n=9+9) AddMulVVW/3-8 5.07ns ± 2% 5.26ns ± 2% +3.73% (p=0.000 n=10+10) AddMulVVW/4-8 6.40ns ± 2% 6.50ns ± 2% +1.61% (p=0.000 n=10+10) AddMulVVW/5-8 6.77ns ± 2% 6.86ns ± 1% +1.38% (p=0.001 n=9+9) AddMulVVW/10-8 12.2ns ± 2% 10.6ns ± 3% -13.65% (p=0.000 n=10+10) AddMulVVW/100-8 79.7ns ± 2% 52.4ns ± 1% -34.17% (p=0.000 n=10+10) AddMulVVW/1000-8 695ns ± 1% 491ns ± 2% -29.39% (p=0.000 n=9+10) AddMulVVW/10000-8 7.26µs ± 2% 5.92µs ± 6% -18.42% (p=0.000 n=10+10) AddMulVVW/100000-8 72.6µs ± 2% 62.2µs ± 2% -14.31% (p=0.000 n=10+10) DecimalConversion-8 108µs ±19% 104µs ± 4% ~ (p=0.460 n=10+8) FloatString/100-8 926ns ±14% 908ns ± 5% ~ (p=0.398 n=9+9) FloatString/1000-8 25.7µs ± 1% 25.7µs ± 1% ~ (p=0.739 n=10+10) FloatString/10000-8 2.13ms ± 1% 2.12ms ± 1% ~ (p=0.353 n=10+10) FloatString/100000-8 207ms ± 1% 206ms ± 2% ~ (p=0.912 n=10+10) FloatAdd/10-8 61.3ns ± 3% 61.9ns ± 3% ~ (p=0.183 n=10+10) FloatAdd/100-8 62.0ns ± 2% 62.9ns ± 4% ~ (p=0.118 n=10+10) FloatAdd/1000-8 84.7ns ± 2% 84.4ns ± 1% ~ (p=0.591 n=10+10) FloatAdd/10000-8 305ns ± 2% 306ns ± 1% ~ (p=0.443 n=10+10) FloatAdd/100000-8 2.45µs ± 1% 2.46µs ± 1% ~ (p=0.782 n=10+10) FloatSub/10-8 56.8ns ± 4% 56.5ns ± 5% ~ (p=0.423 n=10+10) FloatSub/100-8 57.3ns ± 4% 57.1ns ± 5% ~ (p=0.540 n=10+10) FloatSub/1000-8 66.8ns ± 4% 66.6ns ± 1% ~ (p=0.868 n=10+10) FloatSub/10000-8 199ns ± 1% 198ns ± 1% ~ (p=0.287 n=10+9) FloatSub/100000-8 1.47µs ± 2% 1.47µs ± 2% ~ (p=0.920 n=10+9) ParseFloatSmallExp-8 8.74µs ±10% 9.48µs ±10% +8.51% (p=0.010 n=9+10) ParseFloatLargeExp-8 39.2µs ±25% 39.6µs ±12% ~ (p=0.529 n=10+10) GCD10x10/WithoutXY-8 173ns ±23% 177ns ±20% ~ (p=0.698 n=10+10) GCD10x10/WithXY-8 736ns ±12% 728ns ±16% ~ (p=0.838 n=10+10) GCD10x100/WithoutXY-8 325ns ±16% 326ns ±14% ~ (p=0.912 n=10+10) GCD10x100/WithXY-8 1.14µs ±13% 1.16µs ± 6% ~ (p=0.287 n=10+9) GCD10x1000/WithoutXY-8 851ns ±25% 820ns ±12% ~ (p=0.592 n=10+10) GCD10x1000/WithXY-8 2.89µs ±17% 2.85µs ± 5% ~ (p=1.000 n=10+9) GCD10x10000/WithoutXY-8 6.66µs ±12% 6.82µs ±19% ~ (p=0.529 n=10+10) GCD10x10000/WithXY-8 18.0µs ± 5% 17.2µs ±19% ~ (p=0.315 n=7+10) GCD10x100000/WithoutXY-8 77.8µs ±18% 73.3µs ±11% ~ (p=0.315 n=10+9) GCD10x100000/WithXY-8 186µs ±14% 204µs ±29% ~ (p=0.218 n=10+10) GCD100x100/WithoutXY-8 1.09µs ± 1% 1.09µs ± 2% ~ (p=0.117 n=9+10) GCD100x100/WithXY-8 7.93µs ± 1% 7.97µs ± 1% +0.52% (p=0.006 n=10+10) GCD100x1000/WithoutXY-8 2.00µs ± 3% 2.04µs ± 6% ~ (p=0.053 n=9+10) GCD100x1000/WithXY-8 9.23µs ± 1% 9.29µs ± 1% +0.63% (p=0.009 n=10+10) GCD100x10000/WithoutXY-8 10.2µs ±11% 9.7µs ± 6% ~ (p=0.278 n=10+9) GCD100x10000/WithXY-8 33.3µs ± 4% 33.6µs ± 4% ~ (p=0.481 n=10+10) GCD100x100000/WithoutXY-8 106µs ±17% 105µs ±13% ~ (p=0.853 n=10+10) GCD100x100000/WithXY-8 289µs ±17% 276µs ± 8% ~ (p=0.353 n=10+10) GCD1000x1000/WithoutXY-8 12.2µs ± 1% 12.1µs ± 1% -0.45% (p=0.007 n=10+10) GCD1000x1000/WithXY-8 131µs ± 1% 132µs ± 0% +0.93% (p=0.000 n=9+7) GCD1000x10000/WithoutXY-8 20.6µs ± 2% 20.6µs ± 1% ~ (p=0.326 n=10+9) GCD1000x10000/WithXY-8 238µs ± 1% 237µs ± 1% ~ (p=0.356 n=9+10) GCD1000x100000/WithoutXY-8 117µs ± 8% 114µs ±11% ~ (p=0.190 n=10+10) GCD1000x100000/WithXY-8 1.51ms ± 1% 1.50ms ± 1% ~ (p=0.053 n=9+10) GCD10000x10000/WithoutXY-8 220µs ± 1% 218µs ± 1% -0.86% (p=0.000 n=10+10) GCD10000x10000/WithXY-8 3.04ms ± 0% 3.05ms ± 0% +0.33% (p=0.001 n=9+10) GCD10000x100000/WithoutXY-8 513µs ± 0% 511µs ± 0% -0.38% (p=0.000 n=10+10) GCD10000x100000/WithXY-8 15.1ms ± 0% 15.0ms ± 0% ~ (p=0.053 n=10+9) GCD100000x100000/WithoutXY-8 10.4ms ± 1% 10.4ms ± 2% ~ (p=0.258 n=9+9) GCD100000x100000/WithXY-8 205ms ± 1% 205ms ± 1% ~ (p=0.481 n=10+10) Hilbert-8 1.25ms ±15% 1.24ms ±17% ~ (p=0.853 n=10+10) Binomial-8 3.03µs ±24% 2.90µs ±16% ~ (p=0.481 n=10+10) QuoRem-8 1.95µs ± 1% 1.95µs ± 2% ~ (p=0.117 n=9+10) Exp-8 5.12ms ± 2% 3.99ms ± 1% -22.02% (p=0.000 n=10+9) Exp2-8 5.14ms ± 2% 3.98ms ± 0% -22.55% (p=0.000 n=10+9) Bitset-8 16.4ns ± 2% 16.5ns ± 2% ~ (p=0.311 n=9+10) BitsetNeg-8 46.3ns ± 4% 45.8ns ± 4% ~ (p=0.272 n=10+10) BitsetOrig-8 250ns ±19% 247ns ±14% ~ (p=0.671 n=10+10) BitsetNegOrig-8 416ns ±14% 429ns ±14% ~ (p=0.353 n=10+10) ModSqrt225_Tonelli-8 400µs ± 0% 320µs ± 0% -19.88% (p=0.000 n=9+7) ModSqrt224_3Mod4-8 123µs ± 1% 97µs ± 0% -21.21% (p=0.000 n=9+10) ModSqrt5430_Tonelli-8 1.87s ± 0% 1.39s ± 1% -25.70% (p=0.000 n=9+10) ModSqrt5430_3Mod4-8 630ms ± 2% 465ms ± 1% -26.12% (p=0.000 n=10+10) Sqrt-8 25.8µs ± 1% 25.9µs ± 0% +0.66% (p=0.002 n=10+8) IntSqr/1-8 11.3ns ± 1% 11.3ns ± 2% ~ (p=0.360 n=9+10) IntSqr/2-8 26.6ns ± 1% 27.4ns ± 2% +2.87% (p=0.000 n=8+9) IntSqr/3-8 36.5ns ± 6% 36.6ns ± 5% ~ (p=0.589 n=10+10) IntSqr/5-8 57.2ns ± 2% 57.8ns ± 1% +0.92% (p=0.045 n=10+9) IntSqr/8-8 112ns ± 1% 93ns ± 1% -16.60% (p=0.000 n=10+10) IntSqr/10-8 148ns ± 1% 129ns ± 5% -12.85% (p=0.000 n=10+10) IntSqr/20-8 642ns ±28% 692ns ±21% ~ (p=0.105 n=10+10) IntSqr/30-8 1.03µs ±18% 1.06µs ±15% ~ (p=0.422 n=10+8) IntSqr/50-8 2.33µs ±14% 2.14µs ±20% ~ (p=0.063 n=10+10) IntSqr/80-8 4.06µs ±13% 3.72µs ±14% -8.31% (p=0.029 n=10+10) IntSqr/100-8 5.79µs ±10% 5.20µs ±18% -10.15% (p=0.004 n=10+10) IntSqr/200-8 17.1µs ± 1% 12.9µs ± 3% -24.44% (p=0.000 n=10+10) IntSqr/300-8 35.9µs ± 0% 26.6µs ± 1% -25.75% (p=0.000 n=10+10) IntSqr/500-8 84.9µs ± 0% 71.7µs ± 1% -15.49% (p=0.000 n=10+10) IntSqr/800-8 170µs ± 1% 142µs ± 2% -16.73% (p=0.000 n=10+10) IntSqr/1000-8 258µs ± 1% 218µs ± 1% -15.65% (p=0.000 n=10+10) Mul-8 10.4ms ± 1% 8.3ms ± 0% -20.05% (p=0.000 n=10+9) Exp3Power/0x10-8 311ns ±15% 321ns ±24% ~ (p=0.447 n=10+10) Exp3Power/0x40-8 358ns ±21% 346ns ±37% ~ (p=0.591 n=10+10) Exp3Power/0x100-8 611ns ±19% 570ns ±27% ~ (p=0.393 n=10+10) Exp3Power/0x400-8 1.31µs ±26% 1.34µs ±19% ~ (p=0.853 n=10+10) Exp3Power/0x1000-8 6.76µs ±23% 6.22µs ±16% ~ (p=0.095 n=10+9) Exp3Power/0x4000-8 37.6µs ±14% 36.4µs ±21% ~ (p=0.247 n=10+10) Exp3Power/0x10000-8 345µs ±14% 310µs ±11% -9.99% (p=0.005 n=10+10) Exp3Power/0x40000-8 2.77ms ± 1% 2.34ms ± 1% -15.47% (p=0.000 n=10+10) Exp3Power/0x100000-8 25.1ms ± 1% 21.3ms ± 1% -15.26% (p=0.000 n=10+10) Exp3Power/0x400000-8 225ms ± 1% 190ms ± 1% -15.61% (p=0.000 n=10+10) Fibo-8 23.4ms ± 1% 23.3ms ± 0% ~ (p=0.052 n=10+10) NatSqr/1-8 58.4ns ±24% 59.8ns ±38% ~ (p=0.739 n=10+10) NatSqr/2-8 122ns ±21% 122ns ±16% ~ (p=0.896 n=10+10) NatSqr/3-8 140ns ±28% 148ns ±30% ~ (p=0.288 n=10+10) NatSqr/5-8 193ns ±29% 210ns ±34% ~ (p=0.469 n=10+10) NatSqr/8-8 317ns ±21% 296ns ±25% ~ (p=0.393 n=10+10) NatSqr/10-8 362ns ± 8% 373ns ±30% ~ (p=0.617 n=9+10) NatSqr/20-8 1.24µs ±16% 1.06µs ±29% -14.57% (p=0.019 n=10+10) NatSqr/30-8 1.90µs ±32% 1.71µs ±10% ~ (p=0.176 n=10+9) NatSqr/50-8 4.22µs ±19% 3.67µs ± 7% -13.03% (p=0.017 n=10+9) NatSqr/80-8 7.33µs ±20% 6.50µs ±15% -11.26% (p=0.009 n=10+10) NatSqr/100-8 9.84µs ±18% 9.33µs ± 8% ~ (p=0.280 n=10+10) NatSqr/200-8 21.4µs ± 7% 20.0µs ±14% ~ (p=0.075 n=10+10) NatSqr/300-8 38.0µs ± 2% 31.3µs ±10% -17.63% (p=0.000 n=10+10) NatSqr/500-8 102µs ± 5% 101µs ± 4% ~ (p=0.780 n=9+10) NatSqr/800-8 190µs ± 3% 166µs ± 6% -12.29% (p=0.000 n=10+10) NatSqr/1000-8 277µs ± 2% 245µs ± 6% -11.64% (p=0.000 n=10+10) ScanPi-8 144µs ±23% 149µs ±24% ~ (p=0.579 n=10+10) StringPiParallel-8 25.6µs ± 0% 25.8µs ± 0% +0.69% (p=0.000 n=9+10) Scan/10/Base2-8 305ns ± 1% 309ns ± 1% +1.32% (p=0.000 n=10+9) Scan/100/Base2-8 1.95µs ± 1% 1.98µs ± 1% +1.10% (p=0.000 n=10+10) Scan/1000/Base2-8 19.5µs ± 1% 19.7µs ± 1% +1.39% (p=0.000 n=10+10) Scan/10000/Base2-8 270µs ± 1% 272µs ± 1% +0.58% (p=0.024 n=9+9) Scan/100000/Base2-8 10.3ms ± 0% 10.3ms ± 0% +0.16% (p=0.022 n=9+10) Scan/10/Base8-8 146ns ± 4% 154ns ± 4% +5.57% (p=0.000 n=9+9) Scan/100/Base8-8 748ns ± 1% 759ns ± 1% +1.51% (p=0.000 n=9+10) Scan/1000/Base8-8 7.88µs ± 1% 8.00µs ± 1% +1.64% (p=0.000 n=10+10) Scan/10000/Base8-8 155µs ± 1% 155µs ± 1% ~ (p=0.968 n=10+9) Scan/100000/Base8-8 9.11ms ± 0% 9.11ms ± 0% ~ (p=0.604 n=9+10) Scan/10/Base10-8 140ns ± 5% 149ns ± 5% +6.39% (p=0.000 n=9+10) Scan/100/Base10-8 680ns ± 0% 688ns ± 1% +1.08% (p=0.000 n=9+10) Scan/1000/Base10-8 7.09µs ± 1% 7.16µs ± 1% +0.98% (p=0.019 n=10+10) Scan/10000/Base10-8 149µs ± 3% 150µs ± 3% ~ (p=0.143 n=10+10) Scan/100000/Base10-8 9.16ms ± 0% 9.16ms ± 0% ~ (p=0.661 n=10+9) Scan/10/Base16-8 134ns ± 5% 135ns ± 3% ~ (p=0.505 n=9+9) Scan/100/Base16-8 560ns ± 1% 563ns ± 0% +0.67% (p=0.000 n=10+8) Scan/1000/Base16-8 6.28µs ± 1% 6.26µs ± 1% ~ (p=0.448 n=10+10) Scan/10000/Base16-8 161µs ± 1% 162µs ± 1% +0.74% (p=0.008 n=9+9) Scan/100000/Base16-8 9.64ms ± 0% 9.64ms ± 0% ~ (p=0.436 n=10+10) String/10/Base2-8 116ns ±12% 118ns ±13% ~ (p=0.645 n=10+10) String/100/Base2-8 871ns ±23% 860ns ±22% ~ (p=0.699 n=10+10) String/1000/Base2-8 10.0µs ±20% 10.0µs ±23% ~ (p=0.853 n=10+10) String/10000/Base2-8 110µs ±21% 120µs ±25% ~ (p=0.436 n=10+10) String/100000/Base2-8 768µs ±11% 733µs ±16% ~ (p=0.393 n=10+10) String/10/Base8-8 51.3ns ± 1% 51.0ns ± 3% ~ (p=0.286 n=9+9) String/100/Base8-8 284ns ± 9% 272ns ±12% ~ (p=0.267 n=9+10) String/1000/Base8-8 3.06µs ± 9% 3.04µs ±10% ~ (p=0.739 n=10+10) String/10000/Base8-8 36.1µs ±14% 35.1µs ± 9% ~ (p=0.447 n=10+9) String/100000/Base8-8 371µs ±12% 373µs ±16% ~ (p=0.739 n=10+10) String/10/Base10-8 167ns ±11% 165ns ± 9% ~ (p=0.781 n=10+10) String/100/Base10-8 727ns ± 1% 740ns ± 2% +1.70% (p=0.001 n=10+10) String/1000/Base10-8 5.30µs ±18% 5.37µs ±14% ~ (p=0.631 n=10+10) String/10000/Base10-8 45.0µs ±14% 44.6µs ±10% ~ (p=0.720 n=9+10) String/100000/Base10-8 5.10ms ± 1% 5.05ms ± 3% ~ (p=0.211 n=9+10) String/10/Base16-8 47.7ns ± 6% 47.7ns ± 6% ~ (p=0.985 n=10+10) String/100/Base16-8 221ns ±10% 234ns ±27% ~ (p=0.541 n=10+10) String/1000/Base16-8 2.23µs ±11% 2.12µs ± 8% -4.81% (p=0.029 n=9+8) String/10000/Base16-8 28.3µs ±21% 28.5µs ±14% ~ (p=0.796 n=10+10) String/100000/Base16-8 291µs ±16% 293µs ±15% ~ (p=0.931 n=9+9) LeafSize/0-8 2.43ms ± 1% 2.49ms ± 1% +2.56% (p=0.000 n=10+10) LeafSize/1-8 49.7µs ± 9% 46.3µs ±16% -6.78% (p=0.017 n=10+9) LeafSize/2-8 48.4µs ±18% 46.3µs ±19% ~ (p=0.436 n=10+10) LeafSize/3-8 81.7µs ± 3% 80.9µs ± 3% ~ (p=0.278 n=10+9) LeafSize/4-8 47.0µs ± 7% 47.9µs ±13% ~ (p=0.905 n=9+10) LeafSize/5-8 96.8µs ± 1% 97.3µs ± 2% ~ (p=0.515 n=8+10) LeafSize/6-8 82.5µs ± 4% 80.9µs ± 2% -1.92% (p=0.019 n=10+10) LeafSize/7-8 67.2µs ±13% 66.6µs ± 9% ~ (p=0.842 n=10+9) LeafSize/8-8 46.0µs ±28% 45.1µs ±12% ~ (p=0.739 n=10+10) LeafSize/9-8 111µs ± 1% 111µs ± 1% ~ (p=0.739 n=10+10) LeafSize/10-8 98.8µs ± 4% 97.9µs ± 3% ~ (p=0.278 n=10+9) LeafSize/11-8 96.8µs ± 1% 96.4µs ± 1% ~ (p=0.211 n=9+10) LeafSize/12-8 81.0µs ± 4% 81.3µs ± 3% ~ (p=0.579 n=10+10) LeafSize/13-8 79.7µs ± 5% 79.2µs ± 3% ~ (p=0.661 n=10+9) LeafSize/14-8 67.6µs ±12% 65.8µs ± 7% ~ (p=0.447 n=10+9) LeafSize/15-8 63.9µs ±17% 66.3µs ±14% ~ (p=0.481 n=10+10) LeafSize/16-8 44.0µs ±28% 46.0µs ±27% ~ (p=0.481 n=10+10) LeafSize/32-8 46.2µs ±13% 43.5µs ±18% ~ (p=0.156 n=9+10) LeafSize/64-8 53.3µs ±10% 53.0µs ±19% ~ (p=0.730 n=9+9) ProbablyPrime/n=0-8 3.60ms ± 1% 3.39ms ± 1% -5.87% (p=0.000 n=10+9) ProbablyPrime/n=1-8 4.42ms ± 1% 4.08ms ± 1% -7.69% (p=0.000 n=10+10) ProbablyPrime/n=5-8 7.57ms ± 2% 6.79ms ± 1% -10.24% (p=0.000 n=10+10) ProbablyPrime/n=10-8 11.6ms ± 2% 10.2ms ± 1% -11.69% (p=0.000 n=10+10) ProbablyPrime/n=20-8 19.4ms ± 2% 16.9ms ± 2% -12.89% (p=0.000 n=10+10) ProbablyPrime/Lucas-8 2.81ms ± 2% 2.72ms ± 1% -3.22% (p=0.000 n=10+9) ProbablyPrime/MillerRabinBase2-8 797µs ± 1% 680µs ± 1% -14.64% (p=0.000 n=10+10) name old speed new speed delta AddVV/1-8 17.1GB/s ± 6% 18.0GB/s ± 2% ~ (p=0.122 n=10+8) AddVV/2-8 32.4GB/s ± 2% 32.2GB/s ± 4% ~ (p=0.661 n=10+9) AddVV/3-8 38.6GB/s ± 2% 38.9GB/s ± 1% ~ (p=0.113 n=10+9) AddVV/4-8 45.8GB/s ± 2% 45.8GB/s ± 2% ~ (p=0.796 n=10+10) AddVV/5-8 48.1GB/s ± 2% 48.3GB/s ± 1% ~ (p=0.315 n=10+10) AddVV/10-8 78.9GB/s ± 1% 78.9GB/s ± 2% ~ (p=0.353 n=10+10) AddVV/100-8 136GB/s ± 2% 137GB/s ± 1% ~ (p=0.971 n=10+10) AddVV/1000-8 164GB/s ± 1% 164GB/s ± 4% ~ (p=0.853 n=10+10) AddVV/10000-8 126GB/s ± 6% 129GB/s ± 2% ~ (p=0.063 n=10+10) AddVV/100000-8 116GB/s ± 3% 116GB/s ± 3% ~ (p=0.796 n=10+10) AddVW/1-8 2.64GB/s ± 3% 2.64GB/s ± 3% ~ (p=0.579 n=10+10) AddVW/2-8 4.49GB/s ± 2% 4.44GB/s ± 2% -1.09% (p=0.040 n=9+9) AddVW/3-8 6.36GB/s ± 1% 6.34GB/s ± 2% ~ (p=0.684 n=10+10) AddVW/4-8 6.83GB/s ± 1% 6.82GB/s ± 2% ~ (p=0.905 n=10+9) AddVW/5-8 8.75GB/s ± 1% 8.73GB/s ± 1% ~ (p=0.796 n=10+10) AddVW/10-8 10.5GB/s ± 2% 10.5GB/s ± 1% ~ (p=0.971 n=10+10) AddVW/100-8 19.5GB/s ± 2% 18.9GB/s ± 2% -3.22% (p=0.000 n=10+10) AddVW/1000-8 20.7GB/s ± 2% 20.6GB/s ± 4% ~ (p=0.631 n=10+10) AddVW/10000-8 20.6GB/s ± 3% 20.7GB/s ± 3% ~ (p=0.481 n=10+10) AddVW/100000-8 19.4GB/s ± 2% 19.2GB/s ± 3% ~ (p=0.165 n=10+10) AddMulVVW/1-8 19.5GB/s ± 2% 19.7GB/s ± 3% ~ (p=0.123 n=10+10) AddMulVVW/2-8 30.1GB/s ± 2% 30.2GB/s ± 3% ~ (p=0.297 n=9+9) AddMulVVW/3-8 37.9GB/s ± 2% 36.5GB/s ± 2% -3.63% (p=0.000 n=10+10) AddMulVVW/4-8 40.0GB/s ± 2% 39.4GB/s ± 2% -1.58% (p=0.001 n=10+10) AddMulVVW/5-8 47.3GB/s ± 2% 46.6GB/s ± 1% -1.35% (p=0.001 n=9+9) AddMulVVW/10-8 52.3GB/s ± 2% 60.6GB/s ± 3% +15.76% (p=0.000 n=10+10) AddMulVVW/100-8 80.3GB/s ± 2% 122.1GB/s ± 1% +51.92% (p=0.000 n=10+10) AddMulVVW/1000-8 92.0GB/s ± 1% 130.3GB/s ± 2% +41.61% (p=0.000 n=9+10) AddMulVVW/10000-8 88.2GB/s ± 2% 108.2GB/s ± 5% +22.66% (p=0.000 n=10+10) AddMulVVW/100000-8 88.2GB/s ± 2% 102.9GB/s ± 2% +16.69% (p=0.000 n=10+10) Change-Id: Ic98e30c91d437d845fed03e07e976c3fdbf02b36 Reviewed-on: https://go-review.googlesource.com/74851 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Adam Langley <agl@golang.org>	2018-02-24 00:13:03 +00:00
Shawn Smith	d3beea8c52	all: fix misspellings GitHub-Last-Rev: `468df242d0` GitHub-Pull-Request: golang/go#23935 Change-Id: If751ce3ffa3a4d5e00a3138211383d12cb6b23fc Reviewed-on: https://go-review.googlesource.com/95577 Run-TryBot: Andrew Bonventre <andybons@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Andrew Bonventre <andybons@golang.org>	2018-02-20 21:02:58 +00:00
Alberto Donizetti	331092c58f	math/big: fix %s verbs in Float tests error messages Fatalf calls in two Float tests use the %s verb with Floats values, which is not allowed and results in failure messages that look like this: float_test.go:1385: i = 0, prec = 1, ToZero: %!s(big.Float=1) [0] / %!s(big.Float=1) [0] = %!s(big.Float=0.0625) want %!s(big.Float=1) Switch to %v. Change-Id: Ifdc80bf19c91ca1b190f6551a6d0a51b42ed5919 Reviewed-on: https://go-review.googlesource.com/87199 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-02-14 09:50:19 +00:00
Thanabodee Charoenpiriyakij	1124fa300b	math: use Abs rather than if x < 0 { x = -x } This is the benchmark result base on darwin with amd64 architecture: name old time/op new time/op delta Cos 10.2ns ± 2% 10.3ns ± 3% +1.18% (p=0.032 n=10+10) Cosh 25.3ns ± 3% 24.6ns ± 2% -3.00% (p=0.000 n=10+10) Hypot 6.40ns ± 2% 6.19ns ± 3% -3.36% (p=0.000 n=10+10) HypotGo 7.16ns ± 3% 6.54ns ± 2% -8.66% (p=0.000 n=10+10) J0 66.0ns ± 2% 63.7ns ± 1% -3.42% (p=0.000 n=9+10) Fixes #21812 Change-Id: I2b88fbdfc250cd548f8f08b44ce2eb172dcacf43 Reviewed-on: https://go-review.googlesource.com/84437 Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-02-13 20:12:23 +00:00
Paul PISCUC	3526c40979	math/rand: typo fixed in documentation of seedPos In the comment of seedPost, the word: condiiton was changed to: condition Change-Id: I8967cc0e9f5d37776bada96cc1443c8bf46e1117 Reviewed-on: https://go-review.googlesource.com/86156 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-01-04 20:27:29 +00:00
Brian Kessler	5305bdd86b	math: correct result for Pow(x, ±.5) Fixes #23224 The previous Pow code had an optimization for powers equal to ±0.5 that used Sqrt for increased accuracy/speed. This caused special cases involving powers of ±0.5 to disagree with the Pow spec. This change places the Sqrt optimization after all of the special case handling. Change-Id: I6bf757f6248256b29cc21725a84e27705d855369 Reviewed-on: https://go-review.googlesource.com/85660 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-01-02 18:10:43 +00:00
Ilya Tocar	1992893307	math: remove asm version of Dim Dim performance has regressed by 14% vs 1.9 on amd64. Current pure go version of Dim is faster and, what is even more important for performance, is inlinable, so instead of tweaking asm implementation, just remove it. I had to update BenchmarkDim, because it was simply reloading constant(answer) in a loop. Perf data below: name old time/op new time/op delta Dim-6 6.79ns ± 0% 1.60ns ± 1% -76.39% (p=0.000 n=7+10) If I modify benchmark to be the same as in this CL results are even better: name old time/op new time/op delta Dim-6 10.2ns ± 0% 1.6ns ± 1% -84.27% (p=0.000 n=8+10) Updates #21913 Change-Id: I00e23c8affc293531e1d9f0e0e49f3a525634f53 Reviewed-on: https://go-review.googlesource.com/80695 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-11-30 21:00:33 +00:00
Alberto Donizetti	ff534e2130	math/big: protect against aliasing in nat.divLarge In nat.divLarge (having signature (z nat).divLarge(u, uIn, v nat)), we check whether z aliases uIn or v, but aliasing is currently not checked for the u parameter. Unfortunately, z and u aliasing each other can in some cases cause errors in the computation. The q return parameter (which will hold the result's quotient), is unconditionally initialized as q = z.make(m + 1) When cap(z) ≥ m+1, z.make() will reuse z's backing array, causing q and z to share the same backing array. If then z aliases u, setting q during the quotient computation will then corrupt u, which at that point already holds computation state. To fix this, we add an alias(z, u) check at the beginning of the function, taking care of aliasing the same way we already do for uIn and v. Fixes #22830 Change-Id: I3ab81120d5af6db7772a062bb1dfc011de91f7ad Reviewed-on: https://go-review.googlesource.com/78995 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-11-30 20:36:54 +00:00
Vladimir Stefanovic	2708da0dc1	runtime/cgo, math: don't use FP instructions for soft-float mips{,le} Updates #18162 Change-Id: I591fcf71a02678a99a56a6487da9689d3c9b1bb6 Reviewed-on: https://go-review.googlesource.com/37955 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-11-30 17:12:32 +00:00
Brian Kessler	802a8f88a3	math/cmplx: use signed zero to correct branch cuts Branch cuts for the elementary complex functions along real or imaginary axes should be resolved in floating point calculations by one-sided continuity with signed zero as described in: "Branch Cuts for Complex Elementary Functions or Much Ado About Nothing's Sign Bit" W. Kahan Available at: https://people.freebsd.org/~das/kahan86branch.pdf And as described in the C99 standard which is claimed as the original cephes source. Sqrt did not return the correct branch when imag(x) == 0. The branch is now determined by sign(imag(x)). This incorrect branch choice was affecting the behavior of the Trigonometric/Hyperbolic functions that use Sqrt in intermediate calculations. Asin, Asinh and Atan had spurious domain checks, whereas the functions should be valid over the whole complex plane with appropriate branch cuts. Fixes #6888 Change-Id: I9b1278af54f54bfb4208276ae345bbd3ddf3ec83 Reviewed-on: https://go-review.googlesource.com/46492 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2017-11-27 07:44:00 +00:00
Russ Cox	d9a198c7d0	Revert "math/rand: make Perm match Shuffle" This reverts CL 55972. Reason for revert: this changes Perm's behavior unnecessarily. I asked for this change originally but I now regret it. Reverting so that I don't have to justify it in Go 1.10 release notes. Edited to keep the change to rand_test.go, which seems to have been mostly unrelated. Fixes #22744. Change-Id: If8bb1bcde3ced0db2fdcd0aa65ab128613686c66 Reviewed-on: https://go-review.googlesource.com/78195 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com> Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-11-16 16:24:30 +00:00
Daniel Martí	a265f2e90e	go/printer: indent lone comments in composite lits If a composite literal contains any comments on their own lines without any elements, the printer would unindent the comments. The comments in this edge case are written when the closing '}' is written. Indent and outdent first so that the indentation is interspersed before the comment is written. Also note that the go/printer golden tests don't show the exact same behaviour that gofmt does. Added a TODO to figure this out in a separate CL. While at it, ensure that the tree conforms to gofmt. The changes are unrelated to this indentation fix, however. Fixes #22355. Change-Id: I5ac25ac6de95a236f1e123479127cc4dd71e93fe Reviewed-on: https://go-review.googlesource.com/74232 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-11-15 18:48:48 +00:00
Brian Kessler	2955a8a6cc	math/big: clarify comment on lehmerGCD overflow A clarifying comment was added to indicate that overflow of a single Word is not possible in the single digit calculation. Lehmer's paper includes a proof of the bounds on the size of the cosequences (u0, u1, u2, v0, v1, v2). Change-Id: I98127a07aa8f8fe44814b74b2bc6ff720805194b Reviewed-on: https://go-review.googlesource.com/77451 Reviewed-by: Robert Griesemer <gri@golang.org>	2017-11-14 17:32:39 +00:00
Filippo Valsorda	ef0e2af7b0	math/big: add security warning to (*Int).Rand Change-Id: I22a67733aa2d07298e124077654c9b1473802100 Reviewed-on: https://go-review.googlesource.com/76012 Reviewed-by: Aliaksandr Valialkin <valyala@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-11-06 15:55:31 +00:00
Tobias Klauser	89bcbf40b8	math/bits: add examples for right rotation Right rotation is achieved using negative k in RotateLeft*(x, k). Add examples demonstrating that functionality. Change-Id: I15dab159accd2937cb18d3fa8ca32da8501567d3 Reviewed-on: https://go-review.googlesource.com/75371 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-11-03 20:12:07 +00:00
Lynn Boger	3860478b42	math: implement asm modf for ppc64x This change adds an asm implementations modf for ppc64x. Improvements: BenchmarkModf-16 7.48 6.26 -16.31% Updates: #21390 Change-Id: I9c4f3213688e3e8842d050840dc04fc9c0bf6ce4 Reviewed-on: https://go-review.googlesource.com/74411 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Munday <mike.munday@ibm.com>	2017-11-02 13:24:32 +00:00
griesemer	85c32c3744	math/big: implement CmpAbs Fixes #22473. Change-Id: Ie886dfc8b5510970d6d63ca6472c73325f6f2276 Reviewed-on: https://go-review.googlesource.com/74971 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2017-11-01 18:17:49 +00:00
Alberto Donizetti	856dccb175	math/big: avoid unnecessary Newton iteration in Float.Sqrt An initial draft of the Newton code for Float.Sqrt was structured like this: for condition // do Newton iteration.. prec = 2 since prec, at the end of the loop, was double the precision used in the last Newton iteration, the termination condition was set to 2limit. The code was later rewritten in the form for condition prec = 2 // do Newton iteration.. but condition was not updated, and it's still 2limit, which is about double what we actually need, and is triggering the execution of an additional, and unnecessary, Newton iteration. This change adjusts the Newton termination condition to the (correct) value of z.prec, plus 32 guard bits as a safety margin. name old time/op new time/op delta FloatSqrt/64-4 798ns ± 3% 802ns ± 3% ~ (p=0.458 n=8+8) FloatSqrt/128-4 1.65µs ± 1% 1.65µs ± 1% ~ (p=0.290 n=8+8) FloatSqrt/256-4 3.10µs ± 1% 2.10µs ± 0% -32.32% (p=0.000 n=8+7) FloatSqrt/1000-4 8.83µs ± 1% 4.91µs ± 2% -44.39% (p=0.000 n=8+8) FloatSqrt/10000-4 107µs ± 1% 40µs ± 1% -62.68% (p=0.000 n=8+8) FloatSqrt/100000-4 2.91ms ± 1% 0.96ms ± 1% -67.13% (p=0.000 n=8+8) FloatSqrt/1000000-4 240ms ± 1% 80ms ± 1% -66.66% (p=0.000 n=8+8) name old alloc/op new alloc/op delta FloatSqrt/64-4 416B ± 0% 416B ± 0% ~ (all equal) FloatSqrt/128-4 720B ± 0% 720B ± 0% ~ (all equal) FloatSqrt/256-4 1.34kB ± 0% 0.82kB ± 0% -39.29% (p=0.000 n=8+8) FloatSqrt/1000-4 5.09kB ± 0% 2.50kB ± 0% -50.94% (p=0.000 n=8+8) FloatSqrt/10000-4 45.9kB ± 0% 23.5kB ± 0% -48.81% (p=0.000 n=8+8) FloatSqrt/100000-4 533kB ± 0% 251kB ± 0% -52.90% (p=0.000 n=8+8) FloatSqrt/1000000-4 9.21MB ± 0% 4.61MB ± 0% -49.98% (p=0.000 n=8+8) name old allocs/op new allocs/op delta FloatSqrt/64-4 9.00 ± 0% 9.00 ± 0% ~ (all equal) FloatSqrt/128-4 13.0 ± 0% 13.0 ± 0% ~ (all equal) FloatSqrt/256-4 15.0 ± 0% 12.0 ± 0% -20.00% (p=0.000 n=8+8) FloatSqrt/1000-4 24.0 ± 0% 19.0 ± 0% -20.83% (p=0.000 n=8+8) FloatSqrt/10000-4 40.0 ± 0% 35.0 ± 0% -12.50% (p=0.000 n=8+8) FloatSqrt/100000-4 66.0 ± 0% 55.0 ± 0% -16.67% (p=0.000 n=8+8) FloatSqrt/1000000-4 143 ± 0% 122 ± 0% -14.69% (p=0.000 n=8+8) Change-Id: I4868adb7f8960f2ca20e7792734c2e6211669fc0 Reviewed-on: https://go-review.googlesource.com/75010 Reviewed-by: Robert Griesemer <gri@golang.org>	2017-11-01 16:21:06 +00:00
Alberto Donizetti	2c783dc038	math/big: save one subtraction per iteration in Float.Sqrt The Sqrt Newton method computes g(t) = f(t)/f'(t) and then iterates t2 = t1 - g(t1) We can save one operation by including the final subtraction in g(t) and evaluating the resulting expression symbolically. For example, for the direct method, g(t) = ½(t² - x)/t and we use 2 multiplications, 1 division and 1 subtraction in g(), plus 1 final subtraction; but if we compute t - g(t) = t - ½(t² - x)/t = ½(t² + x)/t we only use 2 multiplications, 1 division and 1 addition. A similar simplification can be done for the inverse method. name old time/op new time/op delta FloatSqrt/64-4 889ns ± 4% 790ns ± 1% -11.19% (p=0.000 n=8+7) FloatSqrt/128-4 1.82µs ± 0% 1.64µs ± 1% -10.07% (p=0.001 n=6+8) FloatSqrt/256-4 3.56µs ± 4% 3.10µs ± 3% -12.96% (p=0.000 n=7+8) FloatSqrt/1000-4 9.06µs ± 3% 8.86µs ± 1% -2.20% (p=0.001 n=7+7) FloatSqrt/10000-4 109µs ± 1% 107µs ± 1% -1.56% (p=0.000 n=8+8) FloatSqrt/100000-4 2.91ms ± 0% 2.89ms ± 2% -0.68% (p=0.026 n=7+7) FloatSqrt/1000000-4 237ms ± 1% 239ms ± 1% +0.72% (p=0.021 n=8+8) name old alloc/op new alloc/op delta FloatSqrt/64-4 448B ± 0% 416B ± 0% -7.14% (p=0.000 n=8+8) FloatSqrt/128-4 752B ± 0% 720B ± 0% -4.26% (p=0.000 n=8+8) FloatSqrt/256-4 2.05kB ± 0% 1.34kB ± 0% -34.38% (p=0.000 n=8+8) FloatSqrt/1000-4 6.91kB ± 0% 5.09kB ± 0% -26.39% (p=0.000 n=8+8) FloatSqrt/10000-4 60.5kB ± 0% 45.9kB ± 0% -24.17% (p=0.000 n=8+8) FloatSqrt/100000-4 617kB ± 0% 533kB ± 0% -13.57% (p=0.000 n=8+8) FloatSqrt/1000000-4 10.3MB ± 0% 9.2MB ± 0% -10.85% (p=0.000 n=8+8) name old allocs/op new allocs/op delta FloatSqrt/64-4 9.00 ± 0% 9.00 ± 0% ~ (all equal) FloatSqrt/128-4 13.0 ± 0% 13.0 ± 0% ~ (all equal) FloatSqrt/256-4 20.0 ± 0% 15.0 ± 0% -25.00% (p=0.000 n=8+8) FloatSqrt/1000-4 31.0 ± 0% 24.0 ± 0% -22.58% (p=0.000 n=8+8) FloatSqrt/10000-4 50.0 ± 0% 40.0 ± 0% -20.00% (p=0.000 n=8+8) FloatSqrt/100000-4 76.0 ± 0% 66.0 ± 0% -13.16% (p=0.000 n=8+8) FloatSqrt/1000000-4 146 ± 0% 143 ± 0% -2.05% (p=0.000 n=8+8) Change-Id: I271c00de1ca9740e585bf2af7bcd87b18c1fa68e Reviewed-on: https://go-review.googlesource.com/73879 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-11-01 11:24:02 +00:00
Ilya Tocar	94484d8ed5	cmd/compile: intrinsify math.{Trunc/Ceil/Floor} on amd64 This significantly speed-ups Trunc. Ceil/Floor are using the same instruction, so do them too. name old time/op new time/op delta Floor-6 3.33ns ± 1% 3.22ns ± 0% -3.39% (p=0.000 n=10+10) Ceil-6 3.33ns ± 1% 3.22ns ± 0% -3.16% (p=0.000 n=10+7) Trunc-6 4.83ns ± 0% 3.22ns ± 0% -33.36% (p=0.000 n=6+8) Change-Id: If848790e458eedfe38a6a0407bb4f589c68ac254 Reviewed-on: https://go-review.googlesource.com/68630 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-10-31 19:30:54 +00:00
Michael Munday	c280126557	cmd/asm, cmd/internal/obj/s390x, math: add "test under mask" instructions Adds the following s390x test under mask (immediate) instructions: TMHH TMHL TMLH TMLL These are useful for testing bits and are already used in the math package. Change-Id: Idffb3f83b238dba76ac1e42ac6b0bf7f1d11bea2 Reviewed-on: https://go-review.googlesource.com/41092 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-10-30 23:55:14 +00:00
Michael Munday	b97688d112	math: optimize dim and remove s390x assembly implementation By calculating dim directly, rather than calling max, we can simplify the generated code significantly. The compiler now reports that dim is easily inlineable, but it can't be inlined because there is still an assembly stub for Dim. Since dim is now very simple I no longer think it is worth having assembly implementations of it. I have therefore removed the s390x assembly. Removing the other assembly for Dim is #21913. name old time/op new time/op delta Dim 4.29ns ± 0% 3.53ns ± 0% -17.62% (p=0.000 n=9+8) Change-Id: Ic38a6b51603cbc661dcdb868ecf2b1947e9f399e Reviewed-on: https://go-review.googlesource.com/64194 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-10-30 19:05:51 +00:00
Alberto Donizetti	bd48d37e30	math/big: add (Float).Sqrt This change adds a Square root method to the big.Float type, with signature (z Float) Sqrt(x Float) Float Fixes #20460 Change-Id: I050aaed0615fe0894e11c800744600648343c223 Reviewed-on: https://go-review.googlesource.com/67830 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-10-26 17:29:27 +00:00
Brian Kessler	1643d4f33a	math/big: implement Lehmer's GCD algorithm Updates #15833 Lehmer's GCD algorithm uses single precision calculations to simulate several steps of multiple precision calculations in Euclid's GCD algorithm which leads to a considerable speed up. This implementation uses Collins' simplified testing condition on the single digit cosequences which requires only one quotient and avoids any possibility of overflow. name old time/op new time/op delta GCD10x10/WithoutXY-4 1.82µs ±24% 0.28µs ± 6% -84.40% (p=0.008 n=5+5) GCD10x10/WithXY-4 1.69µs ± 6% 1.71µs ± 6% ~ (p=0.595 n=5+5) GCD10x100/WithoutXY-4 1.87µs ± 2% 0.56µs ± 4% -70.13% (p=0.008 n=5+5) GCD10x100/WithXY-4 2.61µs ± 2% 2.65µs ± 4% ~ (p=0.635 n=5+5) GCD10x1000/WithoutXY-4 2.75µs ± 2% 1.48µs ± 1% -46.06% (p=0.008 n=5+5) GCD10x1000/WithXY-4 5.29µs ± 2% 5.25µs ± 2% ~ (p=0.548 n=5+5) GCD10x10000/WithoutXY-4 10.7µs ± 2% 10.3µs ± 0% -4.38% (p=0.008 n=5+5) GCD10x10000/WithXY-4 22.3µs ± 6% 22.1µs ± 1% ~ (p=1.000 n=5+5) GCD10x100000/WithoutXY-4 93.7µs ± 2% 99.4µs ± 2% +6.09% (p=0.008 n=5+5) GCD10x100000/WithXY-4 196µs ± 2% 199µs ± 2% ~ (p=0.222 n=5+5) GCD100x100/WithoutXY-4 10.1µs ± 2% 2.5µs ± 2% -74.84% (p=0.008 n=5+5) GCD100x100/WithXY-4 21.4µs ± 2% 21.3µs ± 7% ~ (p=0.548 n=5+5) GCD100x1000/WithoutXY-4 11.3µs ± 2% 4.4µs ± 4% -60.87% (p=0.008 n=5+5) GCD100x1000/WithXY-4 24.7µs ± 3% 23.9µs ± 1% ~ (p=0.056 n=5+5) GCD100x10000/WithoutXY-4 26.6µs ± 1% 20.0µs ± 2% -24.82% (p=0.008 n=5+5) GCD100x10000/WithXY-4 78.7µs ± 2% 78.2µs ± 2% ~ (p=0.690 n=5+5) GCD100x100000/WithoutXY-4 174µs ± 2% 171µs ± 1% ~ (p=0.056 n=5+5) GCD100x100000/WithXY-4 563µs ± 4% 561µs ± 2% ~ (p=1.000 n=5+5) GCD1000x1000/WithoutXY-4 120µs ± 5% 29µs ± 3% -75.71% (p=0.008 n=5+5) GCD1000x1000/WithXY-4 355µs ± 4% 358µs ± 2% ~ (p=0.841 n=5+5) GCD1000x10000/WithoutXY-4 140µs ± 2% 49µs ± 2% -65.07% (p=0.008 n=5+5) GCD1000x10000/WithXY-4 626µs ± 3% 628µs ± 9% ~ (p=0.690 n=5+5) GCD1000x100000/WithoutXY-4 340µs ± 4% 259µs ± 6% -23.79% (p=0.008 n=5+5) GCD1000x100000/WithXY-4 3.76ms ± 4% 3.82ms ± 5% ~ (p=0.310 n=5+5) GCD10000x10000/WithoutXY-4 3.11ms ± 3% 0.54ms ± 2% -82.74% (p=0.008 n=5+5) GCD10000x10000/WithXY-4 7.96ms ± 3% 7.69ms ± 3% ~ (p=0.151 n=5+5) GCD10000x100000/WithoutXY-4 3.88ms ± 1% 1.27ms ± 2% -67.21% (p=0.008 n=5+5) GCD10000x100000/WithXY-4 38.1ms ± 2% 38.8ms ± 1% ~ (p=0.095 n=5+5) GCD100000x100000/WithoutXY-4 208ms ± 1% 25ms ± 4% -88.07% (p=0.008 n=5+5) GCD100000x100000/WithXY-4 533ms ± 5% 525ms ± 4% ~ (p=0.548 n=5+5) Change-Id: Ic1e007eb807b93e75f4752e968e98c1f0cb90e43 Reviewed-on: https://go-review.googlesource.com/59450 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-10-24 22:42:43 +00:00
Mark Pulford	a5c44f3e3f	math: add RoundToEven function Rounding ties to even is statistically useful for some applications. This implementation completes IEEE float64 rounding mode support (in addition to Round, Ceil, Floor, Trunc). This function avoids subtle faults found in ad-hoc implementations, and is simple enough to be inlined by the compiler. Fixes #21748 Change-Id: I09415df2e42435f9e7dabe3bdc0148e9b9ebd609 Reviewed-on: https://go-review.googlesource.com/61211 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-10-24 22:33:09 +00:00
Matthew Dempsky	a5868a47c6	cmd/internal/obj/x86: move MOV->XOR rewriting into compiler Fixes #20986. Change-Id: Ic3cf5c0ab260f259ecff7b92cfdf5f4ae432aef3 Reviewed-on: https://go-review.googlesource.com/73072 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2017-10-24 21:32:17 +00:00
Filippo Valsorda	f75158c365	math/big: fix ModSqrt optimized path for x = z name old time/op new time/op delta ModSqrt224_3Mod4-4 153µs ± 2% 154µs ± 1% ~ (p=0.548 n=5+5) ModSqrt5430_3Mod4-4 776ms ± 2% 791ms ± 2% ~ (p=0.222 n=5+5) Fixes #22265 Change-Id: If233542716e04341990a45a1c2b7118da6d233f7 Reviewed-on: https://go-review.googlesource.com/70832 Run-TryBot: Filippo Valsorda <hi@filippo.io> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-10-16 21:41:44 +00:00
griesemer	51cfe6849a	math/big: provide support for conversion bases up to 62 Increase MaxBase from 36 to 62 and extend the conversion alphabet with the upper-case letters 'A' to 'Z'. For int conversions with bases <= 36, the letters 'A' to 'Z' have the same values (10 to 35) as the corresponding lower-case letters. For conversion bases > 36 up to 62, the upper-case letters have the values 36 to 61. Added MaxBase to api/except.txt: Clients should not make assumptions about the value of MaxBase being constant. The core of the change is in natconv.go. The remaining changes are adjusted tests and documentation. Fixes #21558. Change-Id: I5f74da633caafca03993e13f32ac9546c572cc84 Reviewed-on: https://go-review.googlesource.com/65970 Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2017-10-06 17:46:15 +00:00
griesemer	2ddd07138d	math/bits: complete examples Change-Id: Icbe6885ffd3aa4e77441ab03a2b9a04a9276d5eb Reviewed-on: https://go-review.googlesource.com/68311 Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2017-10-06 16:58:03 +00:00
Marvin Stenger	90d71fe99e	all: revert "all: prefer strings.IndexByte over strings.Index" This reverts https://golang.org/cl/65930. Fixes #22148 Change-Id: Ie0712621ed89c43bef94417fc32de9af77607760 Reviewed-on: https://go-review.googlesource.com/68430 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-10-05 23:19:10 +00:00
Marvin Stenger	aa00c607e1	math/big: remove []byte/string conversions This removes some of the []byte/string conversions currently existing in the (un)marshaling methods of Int and Rat. For Int we introduce a new function (Int).setFromScanner() essentially implementing the SetString method being given an io.ByteScanner instead of a string. So we can handle the string case in (Int).SetString with a strings.Reader and the []byte case in (Int).UnmarshalText() with a bytes.Reader now avoiding the []byte/string conversion here. For Rat we introduce a new function (Rat).marshal() essentially implementing the String method outputting []byte instead of string. Using this new function and the same formatting rules as in (Rat).RatString we can implement (Rat).MarshalText() without the []byte/string conversion it used to have. Change-Id: Ic5ef246c1582c428a40f214b95a16671ef0a06d9 Reviewed-on: https://go-review.googlesource.com/65950 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-10-04 17:16:52 +00:00
Marvin Stenger	f22ba1f247	all: prefer strings.IndexByte over strings.Index strings.IndexByte was introduced in go1.2 and it can be used effectively wherever the second argument to strings.Index is exactly one byte long. This avoids generating unnecessary string symbols and saves a few calls to strings.Index. Change-Id: I1ab5edb7c4ee9058084cfa57cbcc267c2597e793 Reviewed-on: https://go-review.googlesource.com/65930 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-09-25 17:35:41 +00:00
Marvin Stenger	7a5d76fa62	math/big: delete solved TODO The TODO is no longer needed as it was solved by a previous CL. See https://go-review.googlesource.com/14995. Change-Id: If62d1b296f35758ad3d18d28c8fbb95e797f4464 Reviewed-on: https://go-review.googlesource.com/65232 Reviewed-by: Robert Griesemer <gri@golang.org>	2017-09-22 10:21:46 +00:00
Agniva De Sarker	d2f317218b	math: implement fast path for Exp - using FMA and AVX instructions if available to speed-up Exp calculation on amd64 - using a data table instead of #define'ed constants because these instructions do not support loading floating point immediates. One has to use a memory operand / register. - Benchmark results on Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz: Original vs New (non-FMA path) name old time/op new time/op delta Exp 16.0ns ± 1% 16.1ns ± 3% ~ (p=0.308 n=9+10) Original vs New (FMA path) name old time/op new time/op delta Exp 16.0ns ± 1% 13.7ns ± 2% -14.80% (p=0.000 n=9+10) Change-Id: I3d8986925d82b39b95ee979ae06f59d7e591d02e Reviewed-on: https://go-review.googlesource.com/62590 Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-09-20 21:43:00 +00:00
Burak Guven	9cc170f9a5	math/rand: fix comment for Shuffle Shuffle panics if n < 0, not n <= 0. The comment for the (*Rand).Shuffle function is already accurate. Change-Id: I073049310bca9632e50e9ca3ff79eec402122793 Reviewed-on: https://go-review.googlesource.com/63750 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-09-14 03:41:35 +00:00
Michael Munday	ffb4708d1b	math: fix Abs, Copysign and Signbit benchmarks CL 62250 makes constant folding a bit more aggressive and these benchmarks were optimized away. This CL adds some indirection to the function arguments to stop them being folded. The Copysign benchmark is a bit faster because I've left one argument as a constant and it can be partially folded. old CL 62250 this CL Copysign 1.24ns ± 0% 0.34ns ± 2% 1.02ns ± 2% Abs 0.67ns ± 0% 0.35ns ± 3% 0.67ns ± 0% Signbit 0.87ns ± 0% 0.35ns ± 2% 0.87ns ± 1% Change-Id: I9604465a87d7aa29f4bd6009839c8ee354be3cd7 Reviewed-on: https://go-review.googlesource.com/62450 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-09-09 16:52:16 +00:00
Ian Lance Taylor	44f7fd030f	math/rand: change http to https in comment Change-Id: I19c1b0e1b238dda82e69bd47459528ed06b55840 Reviewed-on: https://go-review.googlesource.com/62310 Reviewed-by: David Crawshaw <crawshaw@golang.org>	2017-09-08 22:11:55 +00:00
Josh Bleecher Snyder	caae0917bf	math/rand: make Perm match Shuffle Perm and Shuffle are fundamentally doing the same work. This change makes Perm's algorithm match Shuffle's. In addition to allowing developers to switch more easily between the two methods, it affords a nice speed-up: name old time/op new time/op delta Perm3-8 75.7ns ± 1% 51.8ns ± 1% -31.59% (p=0.000 n=9+8) Perm30-8 610ns ± 1% 405ns ± 1% -33.67% (p=0.000 n=9+9) This change alters the output from Perm, given the same Source and seed. This is a change from Go 1.0 behavior. This necessitates updating the regression test. This also changes the number of calls made to the Source during Perm, which changes the output of the math/rand examples. This also slightly perturbs the output of Perm, nudging it out of the range currently accepted by TestUniformFactorial. However, it is complete unclear that the helpers relied on by TestUniformFactorial are correct. That is #21211. This change updates checkSimilarDistribution to respect closeEnough for standard deviations, which makes the test pass. The whole situation is muddy; see #21211 for details. There is an alternative implementation of Perm that avoids initializing m, which is more similar to the existing implementation, plus some optimizations: func (r *Rand) Perm(n int) []int { m := make([]int, n) max31 := n if n > 1<<31-1-1 { max31 = 1<<31 - 1 - 1 } i := 1 for ; i < max31; i++ { j := r.int31n(int32(i + 1)) m[i] = m[j] m[j] = i } for ; i < n; i++ { j := r.Int63n(int64(i + 1)) m[i] = m[j] m[j] = i } return m } This is a tiny bit faster than the implementation actually used in this change: name old time/op new time/op delta Perm3-8 51.8ns ± 1% 50.3ns ± 1% -2.83% (p=0.000 n=8+9) Perm30-8 405ns ± 1% 394ns ± 1% -2.66% (p=0.000 n=9+8) However, 3% in performance doesn't seem worth having the two algorithms diverge, nor the reduced readability of this alternative. Updates #16213. Change-Id: I11a7441ff8837ee9c241b4c88f7aa905348be781 Reviewed-on: https://go-review.googlesource.com/55972 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rob Pike <r@golang.org>	2017-09-08 13:26:20 +00:00
Josh Bleecher Snyder	a2dfe5d278	math/rand: add Shuffle Shuffle uses the Fisher-Yates algorithm. Since this is new API, it affords us the opportunity to use a much faster Int31n implementation that mostly avoids division. As a result, BenchmarkPerm30ViaShuffle is about 30% faster than BenchmarkPerm30, despite requiring a separate initialization loop and using function calls to swap elements. Fixes #20480 Updates #16213 Updates #21211 Change-Id: Ib8956c4bebed9d84f193eb98282ec16ee7c2b2d5 Reviewed-on: https://go-review.googlesource.com/51891 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-09-08 13:03:02 +00:00
Mark Pulford	03c3bb5f84	math: Add Round function (ties away from zero) This function avoids subtle faults found in many ad-hoc implementations, and is simple enough to be inlined by the compiler. Fixes #20100 Change-Id: Ib320254e9b1f1f798c6ef906b116f63bc29e8d08 Reviewed-on: https://go-review.googlesource.com/43652 Reviewed-by: Robert Griesemer <gri@golang.org>	2017-09-02 21:00:08 +00:00
griesemer	f7cb5bca1a	math/big: fix internal comment Change-Id: Id003e2dbecad7b3c249a747f8b4032135dfbe34f Reviewed-on: https://go-review.googlesource.com/60670 Reviewed-by: Marcel van Lohuizen <mpvl@golang.org>	2017-08-31 13:05:11 +00:00
jaredculp	dc42ffff59	math: add examples for trig functions Change-Id: Ic3ce2f3c055f2636ec8fc9cec8592e596b18dc05 Reviewed-on: https://go-review.googlesource.com/54771 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-08-25 20:26:19 +00:00

1 2 3 4 5 ...

589 Commits