mirror/go - go - Git Fam. Sieh

Commit Graph

Author	SHA1	Message	Date
Alberto Donizetti	010579c237	math/big: allocate less in Float.Sqrt The Newton sqrtInverse procedure we use to compute Float.Sqrt should not allocate a number of times proportional to the number of Newton iterations we need to reach the desired precision. At the beginning of the function the target precision is known, so even if we do want to perform the early steps at low precisions (to save time), it's still possible to pre-allocate larger backing arrays, both for the temp variables in the loop and the variable that'll hold the final result. There's one complication. At the following line: u.Sub(three, u) the Sub method will allocate, because the receiver aliases one of the arguments, and the large backing array we initially allocated for u will be replaced by a smaller one allocated by Sub. We can work around this by introducing a second temp variable u2 that we use to hold the Sub call result. Overall, the sqrtInverse procedure still allocates a number of times proportional to the number of Newton steps, because unfortunately a few of the Mul calls in the Newton function allocate; but at least we allocate less in the function itself. 
FloatSqrt/256-4 1.97µs ± 1% 1.84µs ± 1% -6.61% (p=0.000 n=8+8) FloatSqrt/1000-4 4.80µs ± 3% 4.28µs ± 1% -10.78% (p=0.000 n=8+8) FloatSqrt/10000-4 40.0µs ± 1% 38.3µs ± 1% -4.15% (p=0.000 n=8+8) FloatSqrt/100000-4 955µs ± 1% 932µs ± 0% -2.49% (p=0.000 n=8+7) FloatSqrt/1000000-4 79.8ms ± 1% 79.4ms ± 1% ~ (p=0.105 n=8+8) name old alloc/op new alloc/op delta FloatSqrt/256-4 816B ± 0% 512B ± 0% -37.25% (p=0.000 n=8+8) FloatSqrt/1000-4 2.50kB ± 0% 1.47kB ± 0% -41.03% (p=0.000 n=8+8) FloatSqrt/10000-4 23.5kB ± 0% 18.2kB ± 0% -22.62% (p=0.000 n=8+8) FloatSqrt/100000-4 251kB ± 0% 173kB ± 0% -31.26% (p=0.000 n=8+8) FloatSqrt/1000000-4 4.61MB ± 0% 2.86MB ± 0% -37.90% (p=0.000 n=8+8) name old allocs/op new allocs/op delta FloatSqrt/256-4 12.0 ± 0% 8.0 ± 0% -33.33% (p=0.000 n=8+8) FloatSqrt/1000-4 19.0 ± 0% 9.0 ± 0% -52.63% (p=0.000 n=8+8) FloatSqrt/10000-4 35.0 ± 0% 14.0 ± 0% -60.00% (p=0.000 n=8+8) FloatSqrt/100000-4 55.0 ± 0% 23.0 ± 0% -58.18% (p=0.000 n=8+8) FloatSqrt/1000000-4 122 ± 0% 75 ± 0% -38.52% (p=0.000 n=8+8) Change-Id: I950dbf61a40267a6cca82ae72524c3024bcb149c Reviewed-on: https://go-review.googlesource.com/87659 Reviewed-by: Robert Griesemer <gri@golang.org>	2018-03-08 19:12:35 +00:00
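The aliasing issue described in the Float.Sqrt commit above can be observed with a small program. This is only a sketch under stated assumptions, not the code from the CL: the helpers `newFloat`, `aliased` and `separate` and the precision constant are hypothetical names chosen for illustration, and the allocation counts it prints may differ between Go versions.

```go
package main

import (
	"fmt"
	"math/big"
	"testing"
)

// prec is an arbitrary target precision in bits, chosen for illustration.
const prec = 10000

// newFloat returns a Float holding x at the target precision (hypothetical helper).
func newFloat(x float64) *big.Float {
	return new(big.Float).SetPrec(prec).SetFloat64(x)
}

// aliased computes 3 - u into u itself, so the receiver aliases an argument.
func aliased(three, u *big.Float) *big.Float {
	return u.Sub(three, u)
}

// separate computes 3 - u into a second temporary u2 and leaves u untouched.
func separate(three, u, u2 *big.Float) *big.Float {
	return u2.Sub(three, u)
}

func main() {
	three := newFloat(3)
	u := newFloat(0.7)
	u2 := newFloat(0)

	// testing.AllocsPerRun reports the average number of allocations per call.
	sep := testing.AllocsPerRun(100, func() { separate(three, u, u2) })
	ali := testing.AllocsPerRun(100, func() { aliased(three, u) })
	fmt.Printf("separate: %.0f allocs/op, aliased: %.0f allocs/op\n", sep, ali)
}
```

Both helpers compute the same value; the only difference is which variable receives the result, which is the point the commit message makes about introducing u2.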
isharipo	d2a5263a9c	math/big: speedup nat.setBytes for bigger slices Set up to _S (number of bytes in Uint) bytes at a time by using BigEndian.Uint32 and BigEndian.Uint64. The performance improves for slices bigger than _S bytes. This is the case for 128/256-bit arithmetic that initializes its objects from bytes. name old time/op new time/op delta NatSetBytes/8-4 29.8ns ± 1% 11.4ns ± 0% -61.63% (p=0.000 n=9+8) NatSetBytes/24-4 109ns ± 1% 56ns ± 0% -48.75% (p=0.000 n=9+8) NatSetBytes/128-4 420ns ± 2% 110ns ± 1% -73.83% (p=0.000 n=10+10) NatSetBytes/7-4 26.2ns ± 1% 21.3ns ± 2% -18.63% (p=0.000 n=8+9) NatSetBytes/23-4 106ns ± 1% 67ns ± 1% -36.93% (p=0.000 n=9+10) NatSetBytes/127-4 410ns ± 2% 121ns ± 0% -70.46% (p=0.000 n=9+8) Found this optimization opportunity by looking at the ethereum_corevm community benchmark cpuprofile. name old time/op new time/op delta OpDiv256-4 715ns ± 1% 596ns ± 1% -16.57% (p=0.008 n=5+5) OpDiv128-4 373ns ± 1% 314ns ± 1% -15.83% (p=0.008 n=5+5) OpDiv64-4 301ns ± 0% 285ns ± 1% -5.12% (p=0.008 n=5+5) Change-Id: I8e5a680ae6284c8233d8d7431d51253a8a740b57 Reviewed-on: https://go-review.googlesource.com/98775 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> Reviewed-by: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-08 18:50:10 +00:00
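The word-at-a-time idea in the nat.setBytes commit above can be sketched as follows. This is a minimal illustration, not the code from the CL: it assumes 64-bit words, the function name `wordsFromBytes` is hypothetical, and it skips the normalization (stripping of leading zero words) that a real big-integer type would need.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// wordsFromBytes interprets buf as a big-endian unsigned integer and returns
// its 64-bit words, least significant word first. Hypothetical helper that
// assumes a word size of 8 bytes.
func wordsFromBytes(buf []byte) []uint64 {
	const wordSize = 8
	z := make([]uint64, (len(buf)+wordSize-1)/wordSize)

	i := len(buf)
	k := 0
	// Full words: decode 8 bytes at a time, starting from the tail of buf.
	for ; i >= wordSize; i -= wordSize {
		z[k] = binary.BigEndian.Uint64(buf[i-wordSize : i])
		k++
	}
	// Leftover leading bytes (fewer than 8): assemble them one byte at a time.
	if i > 0 {
		var d uint64
		for s := uint(0); i > 0; s += 8 {
			d |= uint64(buf[i-1]) << s
			i--
		}
		z[k] = d
	}
	return z
}

func main() {
	// Nine bytes: one full word plus a single leading byte.
	fmt.Println(wordsFromBytes([]byte{0x01, 0, 0, 0, 0, 0, 0, 0, 0x02}))
}
```

Only slices with at least one full word take the fast path, which is consistent with the commit's remark that the gain shows up for slices bigger than _S bytes.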
erifan01	0585d41c87	math/big: optimize addVW and subVW on arm64 The biggest hot spot of the existing implementation is "load" operations, which lead to poor performance. By unrolling the cycle 4 times and 2 times, and using "LDP", "STP" instructions, this CL can reduce the "load" cost and improve performance. Benchmarks: name old time/op new time/op delta AddVV/1-8 21.5ns ± 0% 21.5ns ± 0% ~ (all equal) AddVV/2-8 13.5ns ± 0% 13.5ns ± 0% ~ (all equal) AddVV/3-8 15.5ns ± 0% 15.5ns ± 0% ~ (all equal) AddVV/4-8 17.5ns ± 0% 17.5ns ± 0% ~ (all equal) AddVV/5-8 19.5ns ± 0% 19.5ns ± 0% ~ (all equal) AddVV/10-8 29.5ns ± 0% 29.5ns ± 0% ~ (all equal) AddVV/100-8 217ns ± 0% 217ns ± 0% ~ (all equal) AddVV/1000-8 2.02µs ± 0% 2.02µs ± 0% ~ (all equal) AddVV/10000-8 20.3µs ± 0% 20.3µs ± 0% ~ (p=0.603 n=5+5) AddVV/100000-8 223µs ± 7% 228µs ± 8% ~ (p=0.548 n=5+5) AddVW/1-8 9.32ns ± 0% 9.26ns ± 0% -0.64% (p=0.008 n=5+5) AddVW/2-8 19.8ns ± 3% 10.5ns ± 0% -46.92% (p=0.008 n=5+5) AddVW/3-8 11.5ns ± 0% 11.0ns ± 0% -4.35% (p=0.008 n=5+5) AddVW/4-8 13.0ns ± 0% 12.0ns ± 0% -7.69% (p=0.008 n=5+5) AddVW/5-8 14.5ns ± 0% 12.5ns ± 0% -13.79% (p=0.008 n=5+5) AddVW/10-8 22.0ns ± 0% 15.5ns ± 0% -29.55% (p=0.008 n=5+5) AddVW/100-8 167ns ± 0% 81ns ± 0% -51.44% (p=0.008 n=5+5) AddVW/1000-8 1.52µs ± 0% 0.64µs ± 0% -57.58% (p=0.008 n=5+5) AddVW/10000-8 15.1µs ± 0% 7.2µs ± 0% -52.55% (p=0.008 n=5+5) AddVW/100000-8 150µs ± 0% 71µs ± 0% -52.95% (p=0.008 n=5+5) SubVW/1-8 9.32ns ± 0% 9.26ns ± 0% -0.64% (p=0.008 n=5+5) SubVW/2-8 19.7ns ± 2% 10.5ns ± 0% -46.70% (p=0.008 n=5+5) SubVW/3-8 11.5ns ± 0% 11.0ns ± 0% -4.35% (p=0.008 n=5+5) SubVW/4-8 13.0ns ± 0% 12.0ns ± 0% -7.69% (p=0.008 n=5+5) SubVW/5-8 14.5ns ± 0% 12.5ns ± 0% -13.79% (p=0.008 n=5+5) SubVW/10-8 22.0ns ± 0% 15.5ns ± 0% -29.55% (p=0.008 n=5+5) SubVW/100-8 167ns ± 0% 81ns ± 0% -51.44% (p=0.008 n=5+5) SubVW/1000-8 1.52µs ± 0% 0.64µs ± 0% -57.58% (p=0.008 n=5+5) SubVW/10000-8 15.1µs ± 0% 7.2µs ± 0% -52.49% (p=0.008 n=5+5) SubVW/100000-8 150µs ± 0% 71µs ± 
0% -52.91% (p=0.008 n=5+5) AddMulVVW/1-8 32.4ns ± 1% 32.6ns ± 1% ~ (p=0.119 n=5+5) AddMulVVW/2-8 57.0ns ± 0% 57.0ns ± 0% ~ (p=0.643 n=5+5) AddMulVVW/3-8 90.8ns ± 0% 90.7ns ± 0% ~ (p=0.524 n=5+5) AddMulVVW/4-8 118ns ± 0% 118ns ± 1% ~ (p=1.000 n=4+5) AddMulVVW/5-8 144ns ± 1% 144ns ± 0% ~ (p=0.794 n=5+4) AddMulVVW/10-8 294ns ± 1% 296ns ± 0% +0.48% (p=0.040 n=5+5) AddMulVVW/100-8 2.73µs ± 0% 2.73µs ± 0% ~ (p=0.278 n=5+5) AddMulVVW/1000-8 26.0µs ± 0% 26.5µs ± 0% +2.14% (p=0.008 n=5+5) AddMulVVW/10000-8 297µs ± 0% 297µs ± 0% +0.24% (p=0.008 n=5+5) AddMulVVW/100000-8 3.15ms ± 1% 3.13ms ± 0% ~ (p=0.690 n=5+5) DecimalConversion-8 311µs ± 2% 309µs ± 2% ~ (p=0.310 n=5+5) FloatString/100-8 2.55µs ± 2% 2.54µs ± 2% ~ (p=1.000 n=5+5) FloatString/1000-8 58.1µs ± 0% 58.1µs ± 0% ~ (p=0.151 n=5+5) FloatString/10000-8 4.59ms ± 0% 4.59ms ± 0% ~ (p=0.151 n=5+5) FloatString/100000-8 446ms ± 0% 446ms ± 0% +0.01% (p=0.016 n=5+5) FloatAdd/10-8 183ns ± 0% 183ns ± 0% ~ (p=0.333 n=4+5) FloatAdd/100-8 187ns ± 1% 192ns ± 2% ~ (p=0.056 n=5+5) FloatAdd/1000-8 369ns ± 0% 371ns ± 0% +0.54% (p=0.016 n=4+5) FloatAdd/10000-8 1.88µs ± 0% 1.88µs ± 0% -0.14% (p=0.000 n=4+5) FloatAdd/100000-8 17.2µs ± 0% 17.1µs ± 0% -0.37% (p=0.008 n=5+5) FloatSub/10-8 147ns ± 0% 147ns ± 0% ~ (all equal) FloatSub/100-8 145ns ± 0% 146ns ± 0% ~ (p=0.238 n=5+4) FloatSub/1000-8 241ns ± 0% 241ns ± 0% ~ (p=0.333 n=5+4) FloatSub/10000-8 1.06µs ± 0% 1.06µs ± 0% ~ (p=0.444 n=5+5) FloatSub/100000-8 9.50µs ± 0% 9.48µs ± 0% -0.14% (p=0.008 n=5+5) ParseFloatSmallExp-8 28.4µs ± 2% 28.5µs ± 1% ~ (p=0.690 n=5+5) ParseFloatLargeExp-8 125µs ± 1% 124µs ± 1% ~ (p=0.095 n=5+5) GCD10x10/WithoutXY-8 277ns ± 2% 278ns ± 3% ~ (p=0.937 n=5+5) GCD10x10/WithXY-8 2.08µs ± 3% 2.15µs ± 3% ~ (p=0.056 n=5+5) GCD10x100/WithoutXY-8 592ns ± 3% 613ns ± 4% ~ (p=0.056 n=5+5) GCD10x100/WithXY-8 3.40µs ± 2% 3.42µs ± 4% ~ (p=0.841 n=5+5) GCD10x1000/WithoutXY-8 1.37µs ± 2% 1.35µs ± 3% ~ (p=0.460 n=5+5) GCD10x1000/WithXY-8 7.34µs ± 2% 7.33µs ± 4% ~ (p=0.841 n=5+5) 
GCD10x10000/WithoutXY-8 8.52µs ± 0% 8.51µs ± 1% ~ (p=0.421 n=5+5) GCD10x10000/WithXY-8 27.5µs ± 2% 27.2µs ± 1% ~ (p=0.151 n=5+5) GCD10x100000/WithoutXY-8 78.3µs ± 1% 78.5µs ± 1% ~ (p=0.690 n=5+5) GCD10x100000/WithXY-8 231µs ± 0% 229µs ± 1% -1.11% (p=0.016 n=5+5) GCD100x100/WithoutXY-8 1.86µs ± 2% 1.86µs ± 2% ~ (p=0.881 n=5+5) GCD100x100/WithXY-8 27.1µs ± 2% 27.2µs ± 1% ~ (p=0.421 n=5+5) GCD100x1000/WithoutXY-8 4.44µs ± 2% 4.41µs ± 1% ~ (p=0.310 n=5+5) GCD100x1000/WithXY-8 36.3µs ± 1% 36.2µs ± 1% ~ (p=0.310 n=5+5) GCD100x10000/WithoutXY-8 22.6µs ± 2% 22.5µs ± 1% ~ (p=0.690 n=5+5) GCD100x10000/WithXY-8 145µs ± 1% 145µs ± 1% ~ (p=1.000 n=5+5) GCD100x100000/WithoutXY-8 195µs ± 0% 196µs ± 1% ~ (p=0.548 n=5+5) GCD100x100000/WithXY-8 1.10ms ± 0% 1.10ms ± 0% -0.30% (p=0.016 n=5+5) GCD1000x1000/WithoutXY-8 25.0µs ± 1% 25.2µs ± 2% ~ (p=0.222 n=5+5) GCD1000x1000/WithXY-8 520µs ± 0% 520µs ± 1% ~ (p=0.151 n=5+5) GCD1000x10000/WithoutXY-8 57.0µs ± 1% 56.9µs ± 1% ~ (p=0.690 n=5+5) GCD1000x10000/WithXY-8 1.21ms ± 0% 1.21ms ± 1% ~ (p=0.881 n=5+5) GCD1000x100000/WithoutXY-8 358µs ± 0% 359µs ± 1% ~ (p=0.548 n=5+5) GCD1000x100000/WithXY-8 8.73ms ± 0% 8.73ms ± 0% ~ (p=0.548 n=5+5) GCD10000x10000/WithoutXY-8 686µs ± 0% 687µs ± 0% ~ (p=0.548 n=5+5) GCD10000x10000/WithXY-8 15.9ms ± 0% 15.9ms ± 0% ~ (p=0.841 n=5+5) GCD10000x100000/WithoutXY-8 2.08ms ± 0% 2.08ms ± 0% ~ (p=1.000 n=5+5) GCD10000x100000/WithXY-8 86.7ms ± 0% 86.7ms ± 0% ~ (p=1.000 n=5+5) GCD100000x100000/WithoutXY-8 51.1ms ± 0% 51.0ms ± 0% ~ (p=0.151 n=5+5) GCD100000x100000/WithXY-8 1.23s ± 0% 1.23s ± 0% ~ (p=0.841 n=5+5) Hilbert-8 2.41ms ± 1% 2.42ms ± 2% ~ (p=0.690 n=5+5) Binomial-8 4.86µs ± 1% 4.86µs ± 1% ~ (p=0.889 n=5+5) QuoRem-8 7.09µs ± 0% 7.08µs ± 0% -0.09% (p=0.024 n=5+5) Exp-8 161ms ± 0% 161ms ± 0% -0.08% (p=0.032 n=5+5) Exp2-8 161ms ± 0% 161ms ± 0% ~ (p=1.000 n=5+5) Bitset-8 40.7ns ± 0% 40.6ns ± 0% ~ (p=0.095 n=4+5) BitsetNeg-8 159ns ± 4% 148ns ± 0% -6.92% (p=0.016 n=5+4) BitsetOrig-8 378ns ± 1% 378ns ± 1% ~ (p=0.937 
n=5+5) BitsetNegOrig-8 647ns ± 5% 647ns ± 4% ~ (p=1.000 n=5+5) ModSqrt225_Tonelli-8 7.26ms ± 0% 7.27ms ± 0% ~ (p=1.000 n=5+5) ModSqrt224_3Mod4-8 2.24ms ± 0% 2.24ms ± 0% ~ (p=0.690 n=5+5) ModSqrt5430_Tonelli-8 62.8s ± 1% 62.5s ± 0% ~ (p=0.063 n=5+4) ModSqrt5430_3Mod4-8 20.8s ± 0% 20.8s ± 0% ~ (p=0.310 n=5+5) Sqrt-8 101µs ± 1% 101µs ± 0% -0.35% (p=0.032 n=5+5) IntSqr/1-8 32.3ns ± 1% 32.5ns ± 1% ~ (p=0.421 n=5+5) IntSqr/2-8 157ns ± 5% 156ns ± 5% ~ (p=0.651 n=5+5) IntSqr/3-8 292ns ± 2% 291ns ± 3% ~ (p=0.881 n=5+5) IntSqr/5-8 738ns ± 6% 740ns ± 5% ~ (p=0.841 n=5+5) IntSqr/8-8 1.82µs ± 4% 1.83µs ± 4% ~ (p=0.730 n=5+5) IntSqr/10-8 2.92µs ± 1% 2.93µs ± 1% ~ (p=0.643 n=5+5) IntSqr/20-8 6.28µs ± 2% 6.28µs ± 2% ~ (p=1.000 n=5+5) IntSqr/30-8 13.8µs ± 2% 13.9µs ± 3% ~ (p=1.000 n=5+5) IntSqr/50-8 37.8µs ± 4% 37.9µs ± 4% ~ (p=0.690 n=5+5) IntSqr/80-8 95.9µs ± 1% 95.8µs ± 1% ~ (p=0.841 n=5+5) IntSqr/100-8 148µs ± 1% 148µs ± 1% ~ (p=0.310 n=5+5) IntSqr/200-8 586µs ± 1% 586µs ± 1% ~ (p=0.841 n=5+5) IntSqr/300-8 1.32ms ± 0% 1.31ms ± 0% ~ (p=0.222 n=5+5) IntSqr/500-8 2.48ms ± 0% 2.48ms ± 0% ~ (p=0.556 n=5+4) IntSqr/800-8 4.68ms ± 0% 4.68ms ± 0% ~ (p=0.548 n=5+5) IntSqr/1000-8 7.57ms ± 0% 7.56ms ± 0% ~ (p=0.421 n=5+5) Mul-8 311ms ± 0% 311ms ± 0% ~ (p=0.548 n=5+5) Exp3Power/0x10-8 559ns ± 1% 560ns ± 1% ~ (p=0.984 n=5+5) Exp3Power/0x40-8 641ns ± 1% 634ns ± 1% ~ (p=0.063 n=5+5) Exp3Power/0x100-8 1.39µs ± 2% 1.40µs ± 2% ~ (p=0.381 n=5+5) Exp3Power/0x400-8 8.27µs ± 1% 8.26µs ± 0% ~ (p=0.571 n=5+5) Exp3Power/0x1000-8 59.9µs ± 0% 59.7µs ± 0% -0.23% (p=0.008 n=5+5) Exp3Power/0x4000-8 816µs ± 0% 816µs ± 0% ~ (p=1.000 n=5+5) Exp3Power/0x10000-8 7.77ms ± 0% 7.77ms ± 0% ~ (p=0.841 n=5+5) Exp3Power/0x40000-8 73.4ms ± 0% 73.4ms ± 0% ~ (p=0.690 n=5+5) Exp3Power/0x100000-8 665ms ± 0% 664ms ± 0% -0.14% (p=0.008 n=5+5) Exp3Power/0x400000-8 5.98s ± 0% 5.98s ± 0% -0.09% (p=0.008 n=5+5) Fibo-8 116ms ± 0% 116ms ± 0% -0.25% (p=0.008 n=5+5) NatSqr/1-8 115ns ± 3% 116ns ± 2% ~ (p=0.238 n=5+5) NatSqr/2-8 237ns ± 
1% 237ns ± 1% ~ (p=0.683 n=5+5) NatSqr/3-8 367ns ± 3% 368ns ± 3% ~ (p=0.817 n=5+5) NatSqr/5-8 807ns ± 3% 812ns ± 3% ~ (p=0.913 n=5+5) NatSqr/8-8 1.93µs ± 2% 1.93µs ± 3% ~ (p=0.651 n=5+5) NatSqr/10-8 2.98µs ± 2% 2.99µs ± 2% ~ (p=0.690 n=5+5) NatSqr/20-8 6.49µs ± 2% 6.46µs ± 2% ~ (p=0.548 n=5+5) NatSqr/30-8 14.4µs ± 2% 14.3µs ± 2% ~ (p=0.690 n=5+5) NatSqr/50-8 38.6µs ± 2% 38.7µs ± 2% ~ (p=0.841 n=5+5) NatSqr/80-8 96.1µs ± 2% 95.8µs ± 2% ~ (p=0.548 n=5+5) NatSqr/100-8 149µs ± 1% 149µs ± 1% ~ (p=0.841 n=5+5) NatSqr/200-8 593µs ± 1% 590µs ± 1% ~ (p=0.421 n=5+5) NatSqr/300-8 1.32ms ± 0% 1.32ms ± 1% ~ (p=0.222 n=5+5) NatSqr/500-8 2.49ms ± 0% 2.49ms ± 0% ~ (p=0.690 n=5+5) NatSqr/800-8 4.69ms ± 0% 4.69ms ± 0% ~ (p=1.000 n=5+5) NatSqr/1000-8 7.59ms ± 0% 7.58ms ± 0% ~ (p=0.841 n=5+5) ScanPi-8 322µs ± 0% 321µs ± 0% ~ (p=0.095 n=5+5) StringPiParallel-8 71.4µs ± 5% 68.8µs ± 4% ~ (p=0.151 n=5+5) Scan/10/Base2-8 1.10µs ± 0% 1.09µs ± 0% -0.36% (p=0.032 n=5+5) Scan/100/Base2-8 7.78µs ± 0% 7.79µs ± 0% +0.14% (p=0.008 n=5+5) Scan/1000/Base2-8 78.8µs ± 0% 79.0µs ± 0% +0.24% (p=0.008 n=5+5) Scan/10000/Base2-8 1.22ms ± 0% 1.22ms ± 0% ~ (p=0.056 n=5+5) Scan/100000/Base2-8 55.1ms ± 0% 55.0ms ± 0% -0.15% (p=0.008 n=5+5) Scan/10/Base8-8 514ns ± 0% 515ns ± 0% ~ (p=0.079 n=5+5) Scan/100/Base8-8 2.89µs ± 0% 2.89µs ± 0% +0.15% (p=0.008 n=5+5) Scan/1000/Base8-8 31.0µs ± 0% 31.1µs ± 0% +0.12% (p=0.008 n=5+5) Scan/10000/Base8-8 740µs ± 0% 740µs ± 0% ~ (p=0.222 n=5+5) Scan/100000/Base8-8 50.6ms ± 0% 50.5ms ± 0% -0.06% (p=0.016 n=4+5) Scan/10/Base10-8 492ns ± 1% 490ns ± 1% ~ (p=0.310 n=5+5) Scan/100/Base10-8 2.67µs ± 0% 2.67µs ± 0% ~ (p=0.056 n=5+5) Scan/1000/Base10-8 28.7µs ± 0% 28.7µs ± 0% ~ (p=1.000 n=5+5) Scan/10000/Base10-8 717µs ± 0% 716µs ± 0% ~ (p=0.222 n=5+5) Scan/100000/Base10-8 50.2ms ± 0% 50.3ms ± 0% +0.05% (p=0.008 n=5+5) Scan/10/Base16-8 442ns ± 1% 442ns ± 0% ~ (p=0.468 n=5+5) Scan/100/Base16-8 2.46µs ± 0% 2.45µs ± 0% ~ (p=0.159 n=5+5) Scan/1000/Base16-8 27.2µs ± 0% 27.2µs ± 0% ~ 
(p=0.841 n=5+5) Scan/10000/Base16-8 721µs ± 0% 722µs ± 0% ~ (p=0.548 n=5+5) Scan/100000/Base16-8 52.6ms ± 0% 52.6ms ± 0% +0.07% (p=0.008 n=5+5) String/10/Base2-8 244ns ± 1% 242ns ± 1% ~ (p=0.103 n=5+5) String/100/Base2-8 1.48µs ± 0% 1.48µs ± 1% ~ (p=0.786 n=5+5) String/1000/Base2-8 13.3µs ± 1% 13.3µs ± 0% ~ (p=0.222 n=5+5) String/10000/Base2-8 132µs ± 1% 132µs ± 1% ~ (p=1.000 n=5+5) String/100000/Base2-8 1.30ms ± 1% 1.30ms ± 1% ~ (p=1.000 n=5+5) String/10/Base8-8 167ns ± 1% 168ns ± 1% ~ (p=0.135 n=5+5) String/100/Base8-8 623ns ± 1% 626ns ± 1% ~ (p=0.151 n=5+5) String/1000/Base8-8 5.24µs ± 1% 5.24µs ± 0% ~ (p=1.000 n=5+5) String/10000/Base8-8 50.0µs ± 1% 50.0µs ± 1% ~ (p=1.000 n=5+5) String/100000/Base8-8 492µs ± 1% 489µs ± 1% ~ (p=0.056 n=5+5) String/10/Base10-8 503ns ± 1% 501ns ± 0% ~ (p=0.183 n=5+5) String/100/Base10-8 1.96µs ± 0% 1.97µs ± 0% ~ (p=0.389 n=5+5) String/1000/Base10-8 12.4µs ± 1% 12.4µs ± 1% ~ (p=0.841 n=5+5) String/10000/Base10-8 56.7µs ± 1% 56.6µs ± 0% ~ (p=1.000 n=5+5) String/100000/Base10-8 25.6ms ± 0% 25.6ms ± 0% ~ (p=0.222 n=5+5) String/10/Base16-8 147ns ± 0% 148ns ± 2% ~ (p=1.000 n=4+5) String/100/Base16-8 505ns ± 0% 505ns ± 1% ~ (p=0.778 n=5+5) String/1000/Base16-8 3.94µs ± 0% 3.94µs ± 0% ~ (p=0.841 n=5+5) String/10000/Base16-8 37.4µs ± 1% 37.2µs ± 1% ~ (p=0.095 n=5+5) String/100000/Base16-8 367µs ± 1% 367µs ± 0% ~ (p=1.000 n=5+5) LeafSize/0-8 6.64ms ± 0% 6.65ms ± 0% ~ (p=0.690 n=5+5) LeafSize/1-8 72.5µs ± 1% 72.4µs ± 1% ~ (p=0.841 n=5+5) LeafSize/2-8 72.6µs ± 1% 72.6µs ± 1% ~ (p=1.000 n=5+5) LeafSize/3-8 377µs ± 0% 377µs ± 0% ~ (p=0.421 n=5+5) LeafSize/4-8 71.2µs ± 1% 71.3µs ± 0% ~ (p=0.278 n=5+5) LeafSize/5-8 469µs ± 0% 469µs ± 0% ~ (p=0.310 n=5+5) LeafSize/6-8 376µs ± 0% 376µs ± 0% ~ (p=0.841 n=5+5) LeafSize/7-8 244µs ± 0% 244µs ± 0% ~ (p=0.841 n=5+5) LeafSize/8-8 71.9µs ± 1% 72.1µs ± 1% ~ (p=0.548 n=5+5) LeafSize/9-8 536µs ± 0% 536µs ± 0% ~ (p=0.151 n=5+5) LeafSize/10-8 470µs ± 0% 471µs ± 0% +0.10% (p=0.032 n=5+5) LeafSize/11-8 458µs ± 0% 
458µs ± 0% ~ (p=0.881 n=5+5) LeafSize/12-8 376µs ± 0% 376µs ± 0% ~ (p=0.548 n=5+5) LeafSize/13-8 341µs ± 0% 342µs ± 0% ~ (p=0.222 n=5+5) LeafSize/14-8 246µs ± 0% 245µs ± 0% ~ (p=0.167 n=5+5) LeafSize/15-8 168µs ± 0% 168µs ± 0% ~ (p=0.548 n=5+5) LeafSize/16-8 72.1µs ± 1% 72.2µs ± 1% ~ (p=0.690 n=5+5) LeafSize/32-8 81.5µs ± 1% 81.4µs ± 1% ~ (p=1.000 n=5+5) LeafSize/64-8 133µs ± 1% 134µs ± 1% ~ (p=0.690 n=5+5) ProbablyPrime/n=0-8 44.3ms ± 0% 44.2ms ± 0% -0.28% (p=0.008 n=5+5) ProbablyPrime/n=1-8 64.8ms ± 0% 64.7ms ± 0% -0.15% (p=0.008 n=5+5) ProbablyPrime/n=5-8 147ms ± 0% 147ms ± 0% -0.11% (p=0.008 n=5+5) ProbablyPrime/n=10-8 250ms ± 0% 250ms ± 0% ~ (p=0.056 n=5+5) ProbablyPrime/n=20-8 456ms ± 0% 455ms ± 0% -0.05% (p=0.008 n=5+5) ProbablyPrime/Lucas-8 23.6ms ± 0% 23.5ms ± 0% -0.29% (p=0.008 n=5+5) ProbablyPrime/MillerRabinBase2-8 20.6ms ± 0% 20.6ms ± 0% ~ (p=0.690 n=5+5) FloatSqrt/64-8 2.01µs ± 1% 2.02µs ± 1% ~ (p=0.421 n=5+5) FloatSqrt/128-8 4.43µs ± 2% 4.38µs ± 2% ~ (p=0.222 n=5+5) FloatSqrt/256-8 6.64µs ± 1% 6.68µs ± 2% ~ (p=0.516 n=5+5) FloatSqrt/1000-8 31.9µs ± 0% 31.8µs ± 0% ~ (p=0.095 n=5+5) FloatSqrt/10000-8 595µs ± 0% 594µs ± 0% ~ (p=0.056 n=5+5) FloatSqrt/100000-8 17.9ms ± 0% 17.9ms ± 0% ~ (p=0.151 n=5+5) FloatSqrt/1000000-8 1.52s ± 0% 1.52s ± 0% ~ (p=0.841 n=5+5) name old speed new speed delta AddVV/1-8 2.97GB/s ± 0% 2.97GB/s ± 0% ~ (p=0.971 n=4+4) AddVV/2-8 9.47GB/s ± 0% 9.47GB/s ± 0% +0.01% (p=0.016 n=5+5) AddVV/3-8 12.4GB/s ± 0% 12.4GB/s ± 0% ~ (p=0.548 n=5+5) AddVV/4-8 14.6GB/s ± 0% 14.6GB/s ± 0% ~ (p=1.000 n=5+5) AddVV/5-8 16.4GB/s ± 0% 16.4GB/s ± 0% ~ (p=1.000 n=5+5) AddVV/10-8 21.7GB/s ± 0% 21.7GB/s ± 0% ~ (p=0.548 n=5+5) AddVV/100-8 29.4GB/s ± 0% 29.4GB/s ± 0% ~ (p=1.000 n=5+5) AddVV/1000-8 31.7GB/s ± 0% 31.7GB/s ± 0% ~ (p=0.524 n=5+4) AddVV/10000-8 31.5GB/s ± 0% 31.5GB/s ± 0% ~ (p=0.690 n=5+5) AddVV/100000-8 28.8GB/s ± 7% 28.1GB/s ± 8% ~ (p=0.548 n=5+5) AddVW/1-8 859MB/s ± 0% 864MB/s ± 0% +0.61% (p=0.008 n=5+5) AddVW/2-8 809MB/s ± 2% 1520MB/s ± 0% 
+87.78% (p=0.008 n=5+5) AddVW/3-8 2.08GB/s ± 0% 2.18GB/s ± 0% +4.54% (p=0.008 n=5+5) AddVW/4-8 2.46GB/s ± 0% 2.66GB/s ± 0% +8.33% (p=0.016 n=4+5) AddVW/5-8 2.76GB/s ± 0% 3.20GB/s ± 0% +16.03% (p=0.008 n=5+5) AddVW/10-8 3.63GB/s ± 0% 5.15GB/s ± 0% +41.83% (p=0.008 n=5+5) AddVW/100-8 4.79GB/s ± 0% 9.87GB/s ± 0% +106.12% (p=0.008 n=5+5) AddVW/1000-8 5.27GB/s ± 0% 12.42GB/s ± 0% +135.74% (p=0.008 n=5+5) AddVW/10000-8 5.31GB/s ± 0% 11.19GB/s ± 0% +110.71% (p=0.008 n=5+5) AddVW/100000-8 5.32GB/s ± 0% 11.32GB/s ± 0% +112.56% (p=0.008 n=5+5) SubVW/1-8 859MB/s ± 0% 864MB/s ± 0% +0.61% (p=0.008 n=5+5) SubVW/2-8 812MB/s ± 2% 1520MB/s ± 0% +87.09% (p=0.008 n=5+5) SubVW/3-8 2.08GB/s ± 0% 2.18GB/s ± 0% +4.55% (p=0.008 n=5+5) SubVW/4-8 2.46GB/s ± 0% 2.66GB/s ± 0% +8.33% (p=0.008 n=5+5) SubVW/5-8 2.75GB/s ± 0% 3.20GB/s ± 0% +16.03% (p=0.008 n=5+5) SubVW/10-8 3.63GB/s ± 0% 5.15GB/s ± 0% +41.82% (p=0.008 n=5+5) SubVW/100-8 4.79GB/s ± 0% 9.87GB/s ± 0% +106.13% (p=0.008 n=5+5) SubVW/1000-8 5.27GB/s ± 0% 12.42GB/s ± 0% +135.74% (p=0.008 n=5+5) SubVW/10000-8 5.31GB/s ± 0% 11.17GB/s ± 0% +110.44% (p=0.008 n=5+5) SubVW/100000-8 5.32GB/s ± 0% 11.31GB/s ± 0% +112.35% (p=0.008 n=5+5) AddMulVVW/1-8 1.97GB/s ± 1% 1.96GB/s ± 1% ~ (p=0.151 n=5+5) AddMulVVW/2-8 2.24GB/s ± 0% 2.25GB/s ± 0% ~ (p=0.095 n=5+5) AddMulVVW/3-8 2.11GB/s ± 0% 2.12GB/s ± 0% ~ (p=0.548 n=5+5) AddMulVVW/4-8 2.17GB/s ± 1% 2.17GB/s ± 1% ~ (p=0.548 n=5+5) AddMulVVW/5-8 2.22GB/s ± 1% 2.21GB/s ± 1% ~ (p=0.421 n=5+5) AddMulVVW/10-8 2.17GB/s ± 1% 2.16GB/s ± 0% ~ (p=0.095 n=5+5) AddMulVVW/100-8 2.35GB/s ± 0% 2.35GB/s ± 0% ~ (p=0.421 n=5+5) AddMulVVW/1000-8 2.47GB/s ± 0% 2.41GB/s ± 0% -2.09% (p=0.008 n=5+5) AddMulVVW/10000-8 2.16GB/s ± 0% 2.15GB/s ± 0% -0.23% (p=0.008 n=5+5) AddMulVVW/100000-8 2.03GB/s ± 1% 2.04GB/s ± 0% ~ (p=0.690 n=5+5) name old alloc/op new alloc/op delta FloatString/100-8 400B ± 0% 400B ± 0% ~ (all equal) FloatString/1000-8 3.22kB ± 0% 3.22kB ± 0% ~ (all equal) FloatString/10000-8 55.6kB ± 0% 55.5kB ± 0% ~ 
(p=0.206 n=5+5) FloatString/100000-8 627kB ± 0% 627kB ± 0% ~ (all equal) FloatAdd/10-8 0.00B 0.00B ~ (all equal) FloatAdd/100-8 0.00B 0.00B ~ (all equal) FloatAdd/1000-8 0.00B 0.00B ~ (all equal) FloatAdd/10000-8 0.00B 0.00B ~ (all equal) FloatAdd/100000-8 0.00B 0.00B ~ (all equal) FloatSub/10-8 0.00B 0.00B ~ (all equal) FloatSub/100-8 0.00B 0.00B ~ (all equal) FloatSub/1000-8 0.00B 0.00B ~ (all equal) FloatSub/10000-8 0.00B 0.00B ~ (all equal) FloatSub/100000-8 0.00B 0.00B ~ (all equal) FloatSqrt/64-8 416B ± 0% 416B ± 0% ~ (all equal) FloatSqrt/128-8 720B ± 0% 720B ± 0% ~ (all equal) FloatSqrt/256-8 816B ± 0% 816B ± 0% ~ (all equal) FloatSqrt/1000-8 2.50kB ± 0% 2.50kB ± 0% ~ (all equal) FloatSqrt/10000-8 23.5kB ± 0% 23.5kB ± 0% ~ (all equal) FloatSqrt/100000-8 251kB ± 0% 251kB ± 0% ~ (all equal) FloatSqrt/1000000-8 4.61MB ± 0% 4.61MB ± 0% ~ (all equal) name old allocs/op new allocs/op delta FloatString/100-8 8.00 ± 0% 8.00 ± 0% ~ (all equal) FloatString/1000-8 10.0 ± 0% 10.0 ± 0% ~ (all equal) FloatString/10000-8 42.0 ± 0% 42.0 ± 0% ~ (all equal) FloatString/100000-8 346 ± 0% 346 ± 0% ~ (all equal) FloatAdd/10-8 0.00 0.00 ~ (all equal) FloatAdd/100-8 0.00 0.00 ~ (all equal) FloatAdd/1000-8 0.00 0.00 ~ (all equal) FloatAdd/10000-8 0.00 0.00 ~ (all equal) FloatAdd/100000-8 0.00 0.00 ~ (all equal) FloatSub/10-8 0.00 0.00 ~ (all equal) FloatSub/100-8 0.00 0.00 ~ (all equal) FloatSub/1000-8 0.00 0.00 ~ (all equal) FloatSub/10000-8 0.00 0.00 ~ (all equal) FloatSub/100000-8 0.00 0.00 ~ (all equal) FloatSqrt/64-8 9.00 ± 0% 9.00 ± 0% ~ (all equal) FloatSqrt/128-8 13.0 ± 0% 13.0 ± 0% ~ (all equal) FloatSqrt/256-8 12.0 ± 0% 12.0 ± 0% ~ (all equal) FloatSqrt/1000-8 19.0 ± 0% 19.0 ± 0% ~ (all equal) FloatSqrt/10000-8 35.0 ± 0% 35.0 ± 0% ~ (all equal) FloatSqrt/100000-8 55.0 ± 0% 55.0 ± 0% ~ (all equal) FloatSqrt/1000000-8 122 ± 0% 122 ± 0% ~ (all equal) Change-Id: I6888d84c037d91f9e2199f3492ea3f6a0ed77b24 Reviewed-on: https://go-review.googlesource.com/77832 Reviewed-by: Vlad 
Krasnov <vlad@cloudflare.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-08 15:31:37 +00:00
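For readers unfamiliar with the operation benchmarked in the addVW/subVW commit above, here is a portable Go sketch of a vector-plus-word addition with a 4x unrolled main loop. It is an assumption-laden illustration, not the arm64 assembly from the CL: it uses 64-bit words, relies on math/bits for carries, and handles the tail one word at a time rather than with the additional 2x unrolled step the commit mentions.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVW sets z = x + y, where y is a single word, and returns the carry out.
// It assumes len(z) >= len(x).
func addVW(z, x []uint64, y uint64) (c uint64) {
	c = y
	i := 0
	// Main loop: process four words per iteration.
	for ; i+4 <= len(x); i += 4 {
		z[i], c = bits.Add64(x[i], c, 0)
		z[i+1], c = bits.Add64(x[i+1], c, 0)
		z[i+2], c = bits.Add64(x[i+2], c, 0)
		z[i+3], c = bits.Add64(x[i+3], c, 0)
	}
	// Tail: the remaining zero to three words.
	for ; i < len(x); i++ {
		z[i], c = bits.Add64(x[i], c, 0)
	}
	return c
}

func main() {
	x := []uint64{1, 2, 3, 4, 5}
	z := make([]uint64, len(x))
	carry := addVW(z, x, 10)
	fmt.Println(z, carry)
}
```

In Go source the unrolling mainly documents the structure; the speedup reported in the commit comes from the paired LDP/STP loads and stores in the assembly.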
Vlad Krasnov	fd3d27938a	math/big: implement addMulVVW on arm64 The lack of a proper addMulVVW implementation for arm64 hurts RSA performance. This assembly implementation is optimized for arm64-based servers. name old time/op new time/op delta pkg:math/big goos:linux goarch:arm64 AddMulVVW/1 55.2ns ± 0% 11.9ns ± 1% -78.37% (p=0.000 n=8+10) AddMulVVW/2 67.0ns ± 0% 11.2ns ± 0% -83.28% (p=0.000 n=7+10) AddMulVVW/3 93.2ns ± 0% 13.2ns ± 0% -85.84% (p=0.000 n=10+10) AddMulVVW/4 126ns ± 0% 13ns ± 1% -89.82% (p=0.000 n=10+10) AddMulVVW/5 151ns ± 0% 17ns ± 0% -88.87% (p=0.000 n=10+9) AddMulVVW/10 323ns ± 0% 25ns ± 0% -92.20% (p=0.000 n=10+10) AddMulVVW/100 3.28µs ± 0% 0.14µs ± 0% -95.82% (p=0.000 n=10+10) AddMulVVW/1000 31.7µs ± 0% 1.3µs ± 0% -96.00% (p=0.000 n=10+8) AddMulVVW/10000 313µs ± 0% 13µs ± 0% -95.98% (p=0.000 n=10+10) AddMulVVW/100000 3.24ms ± 0% 0.13ms ± 1% -96.13% (p=0.000 n=9+9) pkg:crypto/rsa goos:linux goarch:arm64 RSA2048Decrypt 44.7ms ± 0% 4.0ms ± 6% -91.08% (p=0.000 n=8+10) RSA2048Sign 46.3ms ± 0% 5.0ms ± 0% -89.29% (p=0.000 n=9+10) 3PrimeRSA2048Decrypt 22.3ms ± 0% 2.4ms ± 0% -89.26% (p=0.000 n=10+10) Change-Id: I295f0bd5c51a4442d02c44ece1f6026d30dff0bc Reviewed-on: https://go-review.googlesource.com/76270 Reviewed-by: Vlad Krasnov <vlad@cloudflare.com> Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Vlad Krasnov <vlad@cloudflare.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-07 23:04:38 +00:00
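The operation that the addMulVVW commit above implements in assembly can be written in portable Go roughly as follows. This is a reference-style sketch assuming 64-bit words and using math/bits; it is not the arm64 code from the CL and makes no claim about its performance.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addMulVVW computes z += x*y word by word, where y is a single word, and
// returns the final carry. It assumes len(z) >= len(x).
func addMulVVW(z, x []uint64, y uint64) (carry uint64) {
	for i := range x {
		// 128-bit product of one word of x with y.
		hi, lo := bits.Mul64(x[i], y)
		// Add the existing word of z to the low half.
		lo, c := bits.Add64(lo, z[i], 0)
		hi += c
		// Add the carry from the previous position.
		lo, c = bits.Add64(lo, carry, 0)
		hi += c
		// x[i]*y + z[i] + carry always fits in 128 bits, so hi cannot overflow.
		z[i] = lo
		carry = hi
	}
	return carry
}

func main() {
	z := []uint64{1, 1}
	x := []uint64{2, 3}
	carry := addMulVVW(z, x, 4)
	fmt.Println(z, carry)
}
```

This multiply-accumulate loop is the inner step of multi-word multiplication, which is why the commit reports its effect on the crypto/rsa benchmarks as well as on AddMulVVW itself.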
Cherry Zhang	084143d844	math/big: don't use R18 in ARM64 assembly R18 seems reserved on Apple platforms. May fix darwin/arm64 build. Change-Id: Ia2c1de550a64827c85a64affa53b94c62aacce8e Reviewed-on: https://go-review.googlesource.com/98896 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Elias Naur <elias.naur@gmail.com>	2018-03-06 15:34:00 +00:00
erifan01	c4f3fe95c6	math/big: optimize addVV and subVV on arm64 The biggest hot spot of the existing implementation is "load" operations, which lead to poor performance. By unrolling the cycle 4x and 2x, and using "LDP", "STP" instructions, this CL can reduce the "load" cost and improve performance. Benchmarks: name old time/op new time/op delta AddVV/1-8 21.5ns ± 0% 11.5ns ± 0% -46.51% (p=0.008 n=5+5) AddVV/2-8 13.5ns ± 0% 12.0ns ± 0% -11.11% (p=0.008 n=5+5) AddVV/3-8 15.5ns ± 0% 13.0ns ± 0% -16.13% (p=0.008 n=5+5) AddVV/4-8 17.5ns ± 0% 13.5ns ± 0% -22.86% (p=0.008 n=5+5) AddVV/5-8 19.5ns ± 0% 14.5ns ± 0% -25.64% (p=0.008 n=5+5) AddVV/10-8 29.5ns ± 0% 18.0ns ± 0% -38.98% (p=0.008 n=5+5) AddVV/100-8 217ns ± 0% 94ns ± 0% -56.64% (p=0.008 n=5+5) AddVV/1000-8 2.02µs ± 0% 1.03µs ± 0% -48.85% (p=0.008 n=5+5) AddVV/10000-8 20.5µs ± 0% 11.3µs ± 0% -44.70% (p=0.008 n=5+5) AddVV/100000-8 247µs ± 3% 154µs ± 0% -37.52% (p=0.008 n=5+5) SubVV/1-8 21.5ns ± 0% 11.5ns ± 0% ~ (p=0.079 n=4+5) SubVV/2-8 13.5ns ± 0% 12.0ns ± 0% -11.11% (p=0.008 n=5+5) SubVV/3-8 15.5ns ± 0% 13.0ns ± 0% -16.13% (p=0.008 n=5+5) SubVV/4-8 17.5ns ± 0% 13.5ns ± 0% -22.86% (p=0.008 n=5+5) SubVV/5-8 19.5ns ± 0% 14.5ns ± 0% -25.64% (p=0.008 n=5+5) SubVV/10-8 29.5ns ± 0% 18.0ns ± 0% -38.98% (p=0.008 n=5+5) SubVV/100-8 217ns ± 0% 94ns ± 0% -56.64% (p=0.008 n=5+5) SubVV/1000-8 2.02µs ± 0% 0.80µs ± 0% -60.50% (p=0.008 n=5+5) SubVV/10000-8 20.5µs ± 0% 11.3µs ± 0% -44.99% (p=0.008 n=5+5) SubVV/100000-8 221µs ±11% 223µs ±16% ~ (p=0.690 n=5+5) AddVW/1-8 9.32ns ± 0% 9.32ns ± 0% ~ (all equal) AddVW/2-8 19.7ns ± 1% 19.7ns ± 0% ~ (p=0.381 n=5+4) AddVW/3-8 11.5ns ± 0% 11.5ns ± 0% ~ (all equal) AddVW/4-8 13.0ns ± 0% 13.0ns ± 0% ~ (all equal) AddVW/5-8 14.5ns ± 0% 14.5ns ± 0% ~ (all equal) AddVW/10-8 22.0ns ± 0% 22.0ns ± 0% ~ (all equal) AddVW/100-8 167ns ± 0% 167ns ± 0% ~ (all equal) AddVW/1000-8 1.52µs ± 0% 1.52µs ± 0% +0.40% (p=0.008 n=5+5) AddVW/10000-8 15.1µs ± 0% 15.1µs ± 0% ~ (p=0.556 n=5+4) AddVW/100000-8 152µs ± 1% 
152µs ± 1% ~ (p=0.690 n=5+5) AddMulVVW/1-8 33.3ns ± 0% 32.7ns ± 1% -1.86% (p=0.008 n=5+5) AddMulVVW/2-8 59.3ns ± 1% 56.9ns ± 1% -4.15% (p=0.008 n=5+5) AddMulVVW/3-8 80.5ns ± 1% 85.4ns ± 3% +6.19% (p=0.008 n=5+5) AddMulVVW/4-8 127ns ± 0% 111ns ± 1% -13.19% (p=0.008 n=5+5) AddMulVVW/5-8 144ns ± 0% 149ns ± 0% +3.47% (p=0.016 n=4+5) AddMulVVW/10-8 298ns ± 1% 283ns ± 0% -4.77% (p=0.008 n=5+5) AddMulVVW/100-8 3.06µs ± 0% 2.99µs ± 0% -2.21% (p=0.008 n=5+5) AddMulVVW/1000-8 31.3µs ± 0% 26.9µs ± 0% -14.17% (p=0.008 n=5+5) AddMulVVW/10000-8 316µs ± 0% 305µs ± 0% -3.51% (p=0.008 n=5+5) AddMulVVW/100000-8 3.17ms ± 0% 3.17ms ± 1% ~ (p=0.690 n=5+5) DecimalConversion-8 316µs ± 1% 313µs ± 2% ~ (p=0.095 n=5+5) FloatString/100-8 2.53µs ± 1% 2.56µs ± 2% ~ (p=0.222 n=5+5) FloatString/1000-8 58.4µs ± 0% 58.5µs ± 0% ~ (p=0.206 n=5+5) FloatString/10000-8 4.59ms ± 0% 4.58ms ± 0% -0.31% (p=0.008 n=5+5) FloatString/100000-8 446ms ± 0% 444ms ± 0% -0.31% (p=0.008 n=5+5) FloatAdd/10-8 184ns ± 0% 172ns ± 0% -6.30% (p=0.008 n=5+5) FloatAdd/100-8 189ns ± 2% 191ns ± 4% ~ (p=0.381 n=5+5) FloatAdd/1000-8 371ns ± 0% 347ns ± 1% -6.42% (p=0.008 n=5+5) FloatAdd/10000-8 1.87µs ± 0% 1.68µs ± 0% -10.16% (p=0.008 n=5+5) FloatAdd/100000-8 17.1µs ± 0% 15.6µs ± 0% -8.74% (p=0.016 n=5+4) FloatSub/10-8 152ns ± 0% 138ns ± 0% -9.47% (p=0.000 n=4+5) FloatSub/100-8 148ns ± 0% 142ns ± 0% -4.05% (p=0.000 n=5+4) FloatSub/1000-8 245ns ± 1% 217ns ± 0% -11.28% (p=0.000 n=5+4) FloatSub/10000-8 1.07µs ± 0% 0.88µs ± 1% -18.14% (p=0.008 n=5+5) FloatSub/100000-8 9.58µs ± 0% 7.96µs ± 0% -16.84% (p=0.008 n=5+5) ParseFloatSmallExp-8 28.8µs ± 1% 29.0µs ± 1% ~ (p=0.095 n=5+5) ParseFloatLargeExp-8 126µs ± 1% 126µs ± 1% ~ (p=0.841 n=5+5) GCD10x10/WithoutXY-8 277ns ± 2% 281ns ± 4% ~ (p=0.746 n=5+5) GCD10x10/WithXY-8 2.10µs ± 1% 2.12µs ± 3% ~ (p=0.548 n=5+5) GCD10x100/WithoutXY-8 615ns ± 3% 607ns ± 2% ~ (p=0.135 n=5+5) GCD10x100/WithXY-8 3.50µs ± 2% 3.62µs ± 5% ~ (p=0.151 n=5+5) GCD10x1000/WithoutXY-8 1.39µs ± 2% 1.39µs ± 3% ~ (p=0.690 
n=5+5) GCD10x1000/WithXY-8 7.39µs ± 1% 7.34µs ± 2% ~ (p=0.135 n=5+5) GCD10x10000/WithoutXY-8 8.66µs ± 1% 8.68µs ± 1% ~ (p=0.421 n=5+5) GCD10x10000/WithXY-8 28.1µs ± 2% 27.0µs ± 2% -3.81% (p=0.008 n=5+5) GCD10x100000/WithoutXY-8 79.3µs ± 1% 79.3µs ± 1% ~ (p=0.841 n=5+5) GCD10x100000/WithXY-8 238µs ± 0% 227µs ± 1% -4.74% (p=0.008 n=5+5) GCD100x100/WithoutXY-8 1.89µs ± 1% 1.88µs ± 2% ~ (p=0.968 n=5+5) GCD100x100/WithXY-8 26.7µs ± 1% 27.0µs ± 1% +1.44% (p=0.032 n=5+5) GCD100x1000/WithoutXY-8 4.48µs ± 1% 4.45µs ± 2% ~ (p=0.341 n=5+5) GCD100x1000/WithXY-8 36.3µs ± 1% 35.1µs ± 1% -3.27% (p=0.008 n=5+5) GCD100x10000/WithoutXY-8 22.8µs ± 0% 22.7µs ± 1% ~ (p=0.056 n=5+5) GCD100x10000/WithXY-8 145µs ± 1% 133µs ± 1% -8.33% (p=0.008 n=5+5) GCD100x100000/WithoutXY-8 198µs ± 0% 195µs ± 0% -1.56% (p=0.008 n=5+5) GCD100x100000/WithXY-8 1.11ms ± 0% 1.00ms ± 0% -10.04% (p=0.008 n=5+5) GCD1000x1000/WithoutXY-8 25.2µs ± 1% 24.8µs ± 1% -1.63% (p=0.016 n=5+5) GCD1000x1000/WithXY-8 513µs ± 0% 517µs ± 2% ~ (p=0.421 n=5+5) GCD1000x10000/WithoutXY-8 57.0µs ± 0% 52.7µs ± 1% -7.56% (p=0.008 n=5+5) GCD1000x10000/WithXY-8 1.20ms ± 0% 1.10ms ± 0% -8.70% (p=0.008 n=5+5) GCD1000x100000/WithoutXY-8 358µs ± 0% 318µs ± 1% -11.03% (p=0.008 n=5+5) GCD1000x100000/WithXY-8 8.71ms ± 0% 7.65ms ± 0% -12.19% (p=0.008 n=5+5) GCD10000x10000/WithoutXY-8 690µs ± 0% 630µs ± 0% -8.71% (p=0.008 n=5+5) GCD10000x10000/WithXY-8 16.0ms ± 1% 14.9ms ± 0% -6.85% (p=0.008 n=5+5) GCD10000x100000/WithoutXY-8 2.09ms ± 0% 1.75ms ± 0% -16.09% (p=0.016 n=5+4) GCD10000x100000/WithXY-8 86.8ms ± 0% 76.3ms ± 0% -12.09% (p=0.008 n=5+5) GCD100000x100000/WithoutXY-8 51.1ms ± 0% 46.0ms ± 0% -9.97% (p=0.008 n=5+5) GCD100000x100000/WithXY-8 1.25s ± 0% 1.15s ± 0% -7.92% (p=0.008 n=5+5) Hilbert-8 2.45ms ± 1% 2.49ms ± 1% +1.99% (p=0.008 n=5+5) Binomial-8 4.98µs ± 3% 4.90µs ± 2% ~ (p=0.421 n=5+5) QuoRem-8 7.10µs ± 0% 6.21µs ± 0% -12.55% (p=0.016 n=5+4) Exp-8 161ms ± 0% 161ms ± 0% ~ (p=0.421 n=5+5) Exp2-8 161ms ± 0% 161ms ± 0% ~ (p=0.151 n=5+5) 
Bitset-8 40.4ns ± 0% 40.3ns ± 0% ~ (p=0.190 n=5+5) BitsetNeg-8 163ns ± 3% 137ns ± 2% -15.91% (p=0.008 n=5+5) BitsetOrig-8 377ns ± 1% 372ns ± 1% -1.22% (p=0.024 n=5+5) BitsetNegOrig-8 631ns ± 1% 605ns ± 1% -4.09% (p=0.008 n=5+5) ModSqrt225_Tonelli-8 7.26ms ± 0% 7.26ms ± 0% ~ (p=0.548 n=5+5) ModSqrt224_3Mod4-8 2.24ms ± 0% 2.24ms ± 0% ~ (p=1.000 n=5+5) ModSqrt5430_Tonelli-8 62.4s ± 0% 62.4s ± 0% ~ (p=0.841 n=5+5) ModSqrt5430_3Mod4-8 20.8s ± 0% 20.7s ± 0% ~ (p=0.056 n=5+5) Sqrt-8 101µs ± 0% 89µs ± 0% -12.17% (p=0.008 n=5+5) IntSqr/1-8 32.5ns ± 1% 32.7ns ± 1% ~ (p=0.056 n=5+5) IntSqr/2-8 160ns ± 5% 158ns ± 0% ~ (p=0.397 n=5+4) IntSqr/3-8 298ns ± 4% 296ns ± 4% ~ (p=0.667 n=5+5) IntSqr/5-8 737ns ± 5% 761ns ± 3% +3.34% (p=0.016 n=5+5) IntSqr/8-8 1.87µs ± 4% 1.90µs ± 3% ~ (p=0.222 n=5+5) IntSqr/10-8 2.96µs ± 4% 2.92µs ± 6% ~ (p=0.310 n=5+5) IntSqr/20-8 6.28µs ± 3% 6.21µs ± 2% ~ (p=0.310 n=5+5) IntSqr/30-8 14.0µs ± 2% 13.9µs ± 2% ~ (p=0.548 n=5+5) IntSqr/50-8 37.7µs ± 3% 38.3µs ± 2% ~ (p=0.095 n=5+5) IntSqr/80-8 95.9µs ± 2% 95.1µs ± 1% ~ (p=0.310 n=5+5) IntSqr/100-8 148µs ± 1% 148µs ± 1% ~ (p=0.841 n=5+5) IntSqr/200-8 586µs ± 1% 587µs ± 1% ~ (p=1.000 n=5+5) IntSqr/300-8 1.32ms ± 0% 1.31ms ± 1% -0.73% (p=0.032 n=5+5) IntSqr/500-8 2.48ms ± 0% 2.45ms ± 0% -1.15% (p=0.008 n=5+5) IntSqr/800-8 4.68ms ± 0% 4.62ms ± 0% -1.23% (p=0.008 n=5+5) IntSqr/1000-8 7.57ms ± 0% 7.50ms ± 0% -0.84% (p=0.008 n=5+5) Mul-8 311ms ± 0% 308ms ± 0% -0.81% (p=0.008 n=5+5) Exp3Power/0x10-8 574ns ± 1% 578ns ± 2% ~ (p=0.500 n=5+5) Exp3Power/0x40-8 640ns ± 1% 646ns ± 0% ~ (p=0.056 n=5+5) Exp3Power/0x100-8 1.42µs ± 1% 1.42µs ± 1% ~ (p=0.246 n=5+5) Exp3Power/0x400-8 8.30µs ± 1% 8.29µs ± 1% ~ (p=0.802 n=5+5) Exp3Power/0x1000-8 60.0µs ± 0% 59.9µs ± 0% -0.24% (p=0.016 n=5+5) Exp3Power/0x4000-8 817µs ± 0% 816µs ± 0% -0.17% (p=0.008 n=5+5) Exp3Power/0x10000-8 7.80ms ± 1% 7.70ms ± 0% -1.23% (p=0.008 n=5+5) Exp3Power/0x40000-8 73.4ms ± 0% 72.5ms ± 0% -1.28% (p=0.008 n=5+5) Exp3Power/0x100000-8 665ms ± 0% 656ms ± 0% 
-1.34% (p=0.008 n=5+5) Exp3Power/0x400000-8 5.99s ± 0% 5.90s ± 0% -1.40% (p=0.008 n=5+5) Fibo-8 116ms ± 0% 50ms ± 0% -57.09% (p=0.008 n=5+5) NatSqr/1-8 112ns ± 4% 112ns ± 2% ~ (p=0.968 n=5+5) NatSqr/2-8 251ns ± 2% 250ns ± 1% ~ (p=0.571 n=5+5) NatSqr/3-8 378ns ± 2% 379ns ± 2% ~ (p=0.794 n=5+5) NatSqr/5-8 829ns ± 3% 827ns ± 2% ~ (p=1.000 n=5+5) NatSqr/8-8 1.97µs ± 2% 1.95µs ± 2% ~ (p=0.310 n=5+5) NatSqr/10-8 3.02µs ± 2% 2.99µs ± 2% ~ (p=0.421 n=5+5) NatSqr/20-8 6.51µs ± 2% 6.49µs ± 1% ~ (p=0.841 n=5+5) NatSqr/30-8 14.1µs ± 2% 14.0µs ± 2% ~ (p=0.841 n=5+5) NatSqr/50-8 38.1µs ± 2% 38.3µs ± 3% ~ (p=0.690 n=5+5) NatSqr/80-8 95.5µs ± 2% 96.0µs ± 1% ~ (p=0.421 n=5+5) NatSqr/100-8 150µs ± 1% 148µs ± 2% ~ (p=0.095 n=5+5) NatSqr/200-8 588µs ± 1% 590µs ± 1% ~ (p=0.421 n=5+5) NatSqr/300-8 1.32ms ± 1% 1.31ms ± 1% ~ (p=0.841 n=5+5) NatSqr/500-8 2.50ms ± 0% 2.47ms ± 0% -1.03% (p=0.008 n=5+5) NatSqr/800-8 4.70ms ± 0% 4.64ms ± 0% -1.31% (p=0.008 n=5+5) NatSqr/1000-8 7.60ms ± 0% 7.52ms ± 0% -1.01% (p=0.008 n=5+5) ScanPi-8 326µs ± 0% 326µs ± 0% ~ (p=0.841 n=5+5) StringPiParallel-8 70.3µs ± 5% 63.8µs ±10% ~ (p=0.056 n=5+5) Scan/10/Base2-8 1.09µs ± 0% 1.09µs ± 0% ~ (p=0.317 n=5+5) Scan/100/Base2-8 7.79µs ± 0% 7.78µs ± 0% ~ (p=0.063 n=5+5) Scan/1000/Base2-8 79.0µs ± 0% 78.9µs ± 0% -0.18% (p=0.008 n=5+5) Scan/10000/Base2-8 1.22ms ± 0% 1.22ms ± 0% -0.15% (p=0.008 n=5+5) Scan/100000/Base2-8 55.1ms ± 0% 55.2ms ± 0% +0.20% (p=0.008 n=5+5) Scan/10/Base8-8 512ns ± 0% 512ns ± 1% ~ (p=0.810 n=5+5) Scan/100/Base8-8 2.89µs ± 0% 2.89µs ± 0% ~ (p=0.810 n=5+5) Scan/1000/Base8-8 31.0µs ± 0% 31.0µs ± 0% ~ (p=0.151 n=5+5) Scan/10000/Base8-8 740µs ± 0% 741µs ± 0% +0.10% (p=0.008 n=5+5) Scan/100000/Base8-8 50.6ms ± 0% 50.6ms ± 0% +0.08% (p=0.008 n=5+5) Scan/10/Base10-8 487ns ± 0% 487ns ± 0% ~ (p=0.571 n=5+5) Scan/100/Base10-8 2.67µs ± 0% 2.67µs ± 0% ~ (p=0.810 n=5+5) Scan/1000/Base10-8 28.7µs ± 0% 28.7µs ± 0% +0.06% (p=0.008 n=5+5) Scan/10000/Base10-8 716µs ± 0% 717µs ± 0% ~ (p=0.222 n=5+5) 
Scan/100000/Base10-8 50.3ms ± 0% 50.3ms ± 0% +0.10% (p=0.008 n=5+5) Scan/10/Base16-8 438ns ± 0% 437ns ± 1% ~ (p=0.786 n=5+5) Scan/100/Base16-8 2.47µs ± 0% 2.47µs ± 0% -0.19% (p=0.048 n=5+5) Scan/1000/Base16-8 27.2µs ± 0% 27.3µs ± 0% ~ (p=0.087 n=5+5) Scan/10000/Base16-8 722µs ± 0% 722µs ± 0% +0.11% (p=0.008 n=5+5) Scan/100000/Base16-8 52.6ms ± 0% 52.7ms ± 0% +0.15% (p=0.008 n=5+5) String/10/Base2-8 247ns ± 2% 248ns ± 1% ~ (p=0.437 n=5+5) String/100/Base2-8 1.51µs ± 0% 1.51µs ± 0% -0.37% (p=0.024 n=5+5) String/1000/Base2-8 13.6µs ± 1% 13.5µs ± 0% ~ (p=0.095 n=5+5) String/10000/Base2-8 135µs ± 0% 135µs ± 1% ~ (p=0.841 n=5+5) String/100000/Base2-8 1.32ms ± 1% 1.32ms ± 1% ~ (p=0.690 n=5+5) String/10/Base8-8 169ns ± 1% 169ns ± 1% ~ (p=1.000 n=5+5) String/100/Base8-8 636ns ± 0% 634ns ± 1% ~ (p=0.413 n=5+5) String/1000/Base8-8 5.33µs ± 1% 5.32µs ± 0% ~ (p=0.222 n=5+5) String/10000/Base8-8 50.9µs ± 1% 50.7µs ± 0% ~ (p=0.151 n=5+5) String/100000/Base8-8 500µs ± 1% 497µs ± 0% ~ (p=0.421 n=5+5) String/10/Base10-8 516ns ± 1% 513ns ± 0% -0.62% (p=0.016 n=5+4) String/100/Base10-8 1.97µs ± 0% 1.96µs ± 0% ~ (p=0.667 n=4+5) String/1000/Base10-8 12.5µs ± 0% 11.5µs ± 0% -7.92% (p=0.008 n=5+5) String/10000/Base10-8 57.7µs ± 0% 52.5µs ± 0% -8.93% (p=0.008 n=5+5) String/100000/Base10-8 25.6ms ± 0% 21.6ms ± 0% -15.94% (p=0.008 n=5+5) String/10/Base16-8 150ns ± 1% 149ns ± 0% ~ (p=0.413 n=5+4) String/100/Base16-8 514ns ± 1% 514ns ± 1% ~ (p=0.849 n=5+5) String/1000/Base16-8 4.01µs ± 0% 4.01µs ± 0% ~ (p=0.421 n=5+5) String/10000/Base16-8 37.8µs ± 1% 37.8µs ± 1% ~ (p=0.841 n=5+5) String/100000/Base16-8 373µs ± 2% 373µs ± 0% ~ (p=0.421 n=5+5) LeafSize/0-8 6.63ms ± 0% 6.63ms ± 0% ~ (p=0.730 n=4+5) LeafSize/1-8 74.0µs ± 0% 67.7µs ± 1% -8.53% (p=0.008 n=5+5) LeafSize/2-8 74.2µs ± 0% 68.3µs ± 1% -7.99% (p=0.008 n=5+5) LeafSize/3-8 379µs ± 0% 309µs ± 0% -18.52% (p=0.008 n=5+5) LeafSize/4-8 72.7µs ± 1% 66.7µs ± 0% -8.37% (p=0.008 n=5+5) LeafSize/5-8 471µs ± 0% 384µs ± 0% -18.55% (p=0.008 n=5+5) 
LeafSize/6-8 378µs ± 0% 308µs ± 0% -18.59% (p=0.008 n=5+5) LeafSize/7-8 245µs ± 0% 204µs ± 1% -16.75% (p=0.008 n=5+5) LeafSize/8-8 73.4µs ± 0% 66.9µs ± 1% -8.79% (p=0.008 n=5+5) LeafSize/9-8 538µs ± 0% 437µs ± 0% -18.75% (p=0.008 n=5+5) LeafSize/10-8 472µs ± 0% 396µs ± 1% -16.01% (p=0.008 n=5+5) LeafSize/11-8 460µs ± 0% 374µs ± 0% -18.58% (p=0.008 n=5+5) LeafSize/12-8 378µs ± 0% 308µs ± 0% -18.38% (p=0.008 n=5+5) LeafSize/13-8 343µs ± 0% 284µs ± 0% -17.30% (p=0.008 n=5+5) LeafSize/14-8 248µs ± 0% 206µs ± 0% -16.94% (p=0.008 n=5+5) LeafSize/15-8 169µs ± 0% 144µs ± 0% -14.69% (p=0.008 n=5+5) LeafSize/16-8 72.9µs ± 0% 66.8µs ± 1% -8.27% (p=0.008 n=5+5) LeafSize/32-8 82.5µs ± 0% 76.7µs ± 0% -7.04% (p=0.008 n=5+5) LeafSize/64-8 134µs ± 0% 129µs ± 0% -3.80% (p=0.008 n=5+5) ProbablyPrime/n=0-8 44.2ms ± 0% 43.4ms ± 0% -1.95% (p=0.008 n=5+5) ProbablyPrime/n=1-8 64.9ms ± 0% 64.0ms ± 0% -1.27% (p=0.008 n=5+5) ProbablyPrime/n=5-8 147ms ± 0% 146ms ± 0% -0.58% (p=0.008 n=5+5) ProbablyPrime/n=10-8 250ms ± 0% 249ms ± 0% -0.35% (p=0.008 n=5+5) ProbablyPrime/n=20-8 456ms ± 0% 455ms ± 0% -0.18% (p=0.008 n=5+5) ProbablyPrime/Lucas-8 23.6ms ± 0% 22.7ms ± 0% -3.74% (p=0.008 n=5+5) ProbablyPrime/MillerRabinBase2-8 20.7ms ± 0% 20.6ms ± 0% ~ (p=0.421 n=5+5) FloatSqrt/64-8 2.25µs ± 1% 2.29µs ± 0% +1.48% (p=0.008 n=5+5) FloatSqrt/128-8 4.86µs ± 1% 4.92µs ± 1% +1.21% (p=0.032 n=5+5) FloatSqrt/256-8 13.6µs ± 0% 13.7µs ± 1% +1.31% (p=0.032 n=5+5) FloatSqrt/1000-8 70.0µs ± 1% 70.1µs ± 0% ~ (p=0.690 n=5+5) FloatSqrt/10000-8 1.92ms ± 0% 1.90ms ± 0% -0.59% (p=0.008 n=5+5) FloatSqrt/100000-8 55.3ms ± 0% 54.8ms ± 0% -1.01% (p=0.008 n=5+5) FloatSqrt/1000000-8 4.56s ± 0% 4.50s ± 0% -1.28% (p=0.008 n=5+5) name old speed new speed delta AddVV/1-8 2.97GB/s ± 0% 5.56GB/s ± 0% +86.85% (p=0.008 n=5+5) AddVV/2-8 9.47GB/s ± 0% 10.66GB/s ± 0% +12.50% (p=0.008 n=5+5) AddVV/3-8 12.4GB/s ± 0% 14.7GB/s ± 0% +19.10% (p=0.008 n=5+5) AddVV/4-8 14.6GB/s ± 0% 18.9GB/s ± 0% +29.63% (p=0.016 n=4+5) AddVV/5-8 16.4GB/s ± 0% 
22.0GB/s ± 0% +34.47% (p=0.016 n=5+4) AddVV/10-8 21.7GB/s ± 0% 35.5GB/s ± 0% +63.89% (p=0.008 n=5+5) AddVV/100-8 29.4GB/s ± 0% 68.0GB/s ± 0% +131.38% (p=0.008 n=5+5) AddVV/1000-8 31.7GB/s ± 0% 61.9GB/s ± 0% +95.43% (p=0.008 n=5+5) AddVV/10000-8 31.2GB/s ± 0% 56.4GB/s ± 0% +80.83% (p=0.008 n=5+5) AddVV/100000-8 25.9GB/s ± 3% 41.4GB/s ± 0% +59.98% (p=0.008 n=5+5) SubVV/1-8 2.97GB/s ± 0% 5.56GB/s ± 0% +86.97% (p=0.016 n=4+5) SubVV/2-8 9.47GB/s ± 0% 10.66GB/s ± 0% +12.51% (p=0.008 n=5+5) SubVV/3-8 12.4GB/s ± 0% 14.8GB/s ± 0% +19.23% (p=0.016 n=4+5) SubVV/4-8 14.6GB/s ± 0% 18.9GB/s ± 0% +29.56% (p=0.008 n=5+5) SubVV/5-8 16.4GB/s ± 0% 22.0GB/s ± 0% +34.47% (p=0.016 n=4+5) SubVV/10-8 21.7GB/s ± 0% 35.5GB/s ± 0% +63.89% (p=0.008 n=5+5) SubVV/100-8 29.4GB/s ± 0% 68.0GB/s ± 0% +131.38% (p=0.008 n=5+5) SubVV/1000-8 31.6GB/s ± 0% 80.1GB/s ± 0% +153.08% (p=0.008 n=5+5) SubVV/10000-8 31.2GB/s ± 0% 56.7GB/s ± 0% +81.79% (p=0.008 n=5+5) SubVV/100000-8 29.1GB/s ±10% 29.0GB/s ±18% ~ (p=0.690 n=5+5) AddVW/1-8 859MB/s ± 0% 859MB/s ± 0% -0.01% (p=0.008 n=5+5) AddVW/2-8 811MB/s ± 1% 814MB/s ± 0% ~ (p=0.413 n=5+4) AddVW/3-8 2.08GB/s ± 0% 2.08GB/s ± 0% ~ (p=0.206 n=5+5) AddVW/4-8 2.46GB/s ± 0% 2.46GB/s ± 0% ~ (p=0.056 n=5+5) AddVW/5-8 2.75GB/s ± 0% 2.75GB/s ± 0% ~ (p=0.508 n=5+5) AddVW/10-8 3.63GB/s ± 0% 3.63GB/s ± 0% ~ (p=0.214 n=5+5) AddVW/100-8 4.79GB/s ± 0% 4.79GB/s ± 0% ~ (p=0.500 n=5+5) AddVW/1000-8 5.27GB/s ± 0% 5.25GB/s ± 0% -0.43% (p=0.008 n=5+5) AddVW/10000-8 5.30GB/s ± 0% 5.30GB/s ± 0% ~ (p=0.397 n=5+5) AddVW/100000-8 5.27GB/s ± 1% 5.25GB/s ± 1% ~ (p=0.690 n=5+5) AddMulVVW/1-8 1.92GB/s ± 0% 1.96GB/s ± 1% +1.95% (p=0.008 n=5+5) AddMulVVW/2-8 2.16GB/s ± 1% 2.25GB/s ± 1% +4.32% (p=0.008 n=5+5) AddMulVVW/3-8 2.39GB/s ± 1% 2.25GB/s ± 3% -5.79% (p=0.008 n=5+5) AddMulVVW/4-8 2.00GB/s ± 0% 2.31GB/s ± 1% +15.31% (p=0.008 n=5+5) AddMulVVW/5-8 2.22GB/s ± 0% 2.14GB/s ± 0% -3.86% (p=0.008 n=5+5) AddMulVVW/10-8 2.15GB/s ± 1% 2.25GB/s ± 0% +5.03% (p=0.008 n=5+5) AddMulVVW/100-8 2.09GB/s ± 0% 
2.14GB/s ± 0% +2.25% (p=0.008 n=5+5) AddMulVVW/1000-8 2.04GB/s ± 0% 2.38GB/s ± 0% +16.52% (p=0.008 n=5+5) AddMulVVW/10000-8 2.03GB/s ± 0% 2.10GB/s ± 0% +3.64% (p=0.008 n=5+5) AddMulVVW/100000-8 2.02GB/s ± 0% 2.02GB/s ± 1% ~ (p=0.690 n=5+5) Change-Id: Ie482d67a7dbb5af6f5d81af2b3d9d14bd66336db Reviewed-on: https://go-review.googlesource.com/77831 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-06 00:22:08 +00:00
Ilya Tocar	c15984c6c6	math: remove unused variable useSSE41 was used inside the asm implementation of floor to select between the base and SSE4 code paths. We intrinsified floor and left the asm functions as a backup for non-SSE4 systems. This made the variable unused, so remove it. Change-Id: Ia2633de7c7cb1ef1d5b15a2366b523e481b722d9 Reviewed-on: https://go-review.googlesource.com/97935 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-03-01 18:51:44 +00:00
erifan01	ed6c6c9c11	math: optimize sinh and cosh Improve performance by reducing unnecessary function calls. Benchmarks: name old time/op new time/op delta Cosh-8 229ns ± 0% 138ns ± 0% -39.74% (p=0.008 n=5+5) Sinh-8 231ns ± 0% 139ns ± 0% -39.83% (p=0.008 n=5+5) Change-Id: Icab5485849bbfaafca8429d06b67c558101f4f3c Reviewed-on: https://go-review.googlesource.com/85477 Reviewed-by: Robert Griesemer <gri@golang.org>	2018-02-27 04:34:37 +00:00
Ilya Tocar	c3935c08d2	math/big: speed-up addMulVVW on amd64 Use MULX/ADOX/ADCX instructions to speed up addMulVVW when they are available. addMulVVW is a hotspot in rsa. This is faster than the ADD/ADC/IMUL version, because ADOX/ADCX only modify the carry/overflow flags, so they can be interleaved with each other and with MULX, which doesn't modify flags at all. Increasing the unroll factor to, e.g., 16 makes rsa 1% faster, but 3PrimeRSA2048Decrypt performance falls back to baseline. Updates #20058 AddMulVVW/1-8 3.28ns ± 2% 3.26ns ± 3% ~ (p=0.107 n=10+10) AddMulVVW/2-8 4.26ns ± 2% 4.24ns ± 3% ~ (p=0.327 n=9+9) AddMulVVW/3-8 5.07ns ± 2% 5.26ns ± 2% +3.73% (p=0.000 n=10+10) AddMulVVW/4-8 6.40ns ± 2% 6.50ns ± 2% +1.61% (p=0.000 n=10+10) AddMulVVW/5-8 6.77ns ± 2% 6.86ns ± 1% +1.38% (p=0.001 n=9+9) AddMulVVW/10-8 12.2ns ± 2% 10.6ns ± 3% -13.65% (p=0.000 n=10+10) AddMulVVW/100-8 79.7ns ± 2% 52.4ns ± 1% -34.17% (p=0.000 n=10+10) AddMulVVW/1000-8 695ns ± 1% 491ns ± 2% -29.39% (p=0.000 n=9+10) AddMulVVW/10000-8 7.26µs ± 2% 5.92µs ± 6% -18.42% (p=0.000 n=10+10) AddMulVVW/100000-8 72.6µs ± 2% 62.2µs ± 2% -14.31% (p=0.000 n=10+10) crypto/rsa speed-up is smaller, but still noticeable: RSA2048Decrypt-8 1.61ms ± 1% 1.38ms ± 1% -14.13% (p=0.000 n=10+10) RSA2048Sign-8 1.93ms ± 1% 1.70ms ± 1% -11.86% (p=0.000 n=10+10) 3PrimeRSA2048Decrypt-8 932µs ± 0% 828µs ± 0% -11.15% (p=0.000 n=10+10) Results on crypto/tls: HandshakeServer/RSA-8 901µs ± 1% 777µs ± 0% -13.70% (p=0.000 n=10+8) HandshakeServer/ECDHE-P256-RSA-8 1.01ms ± 1% 0.90ms ± 0% -11.53% (p=0.000 n=10+9) Full math/big benchmarks: name old time/op new time/op delta AddVV/1-8 3.74ns ± 6% 3.55ns ± 2% ~ (p=0.082 n=10+8) AddVV/2-8 3.96ns ± 2% 3.98ns ± 5% ~ (p=0.794 n=10+9) AddVV/3-8 4.97ns ± 2% 4.94ns ± 1% ~ (p=0.081 n=10+9) AddVV/4-8 5.59ns ± 2% 5.59ns ± 2% ~ (p=0.809 n=10+10) AddVV/5-8 6.63ns ± 1% 6.62ns ± 1% ~ (p=0.560 n=9+10) AddVV/10-8 8.11ns ± 1% 8.11ns ± 2% ~ (p=0.402 n=10+10) AddVV/100-8 46.9ns ± 2% 46.8ns ± 1% ~ (p=0.809 n=10+10)
AddVV/1000-8 389ns ± 1% 391ns ± 4% ~ (p=0.809 n=10+10) AddVV/10000-8 5.05µs ± 5% 4.98µs ± 2% ~ (p=0.113 n=9+10) AddVV/100000-8 55.3µs ± 3% 55.2µs ± 3% ~ (p=0.796 n=10+10) AddVW/1-8 3.04ns ± 3% 3.02ns ± 3% ~ (p=0.538 n=10+10) AddVW/2-8 3.57ns ± 2% 3.61ns ± 2% +1.12% (p=0.032 n=9+9) AddVW/3-8 3.77ns ± 1% 3.79ns ± 2% ~ (p=0.719 n=10+10) AddVW/4-8 4.69ns ± 1% 4.69ns ± 2% ~ (p=0.920 n=10+9) AddVW/5-8 4.58ns ± 1% 4.58ns ± 1% ~ (p=0.812 n=10+10) AddVW/10-8 7.62ns ± 2% 7.63ns ± 1% ~ (p=0.926 n=10+10) AddVW/100-8 41.1ns ± 2% 42.4ns ± 3% +3.34% (p=0.000 n=10+10) AddVW/1000-8 386ns ± 2% 389ns ± 4% ~ (p=0.514 n=10+10) AddVW/10000-8 3.88µs ± 3% 3.87µs ± 3% ~ (p=0.448 n=10+10) AddVW/100000-8 41.2µs ± 3% 41.7µs ± 3% ~ (p=0.148 n=10+10) AddMulVVW/1-8 3.28ns ± 2% 3.26ns ± 3% ~ (p=0.107 n=10+10) AddMulVVW/2-8 4.26ns ± 2% 4.24ns ± 3% ~ (p=0.327 n=9+9) AddMulVVW/3-8 5.07ns ± 2% 5.26ns ± 2% +3.73% (p=0.000 n=10+10) AddMulVVW/4-8 6.40ns ± 2% 6.50ns ± 2% +1.61% (p=0.000 n=10+10) AddMulVVW/5-8 6.77ns ± 2% 6.86ns ± 1% +1.38% (p=0.001 n=9+9) AddMulVVW/10-8 12.2ns ± 2% 10.6ns ± 3% -13.65% (p=0.000 n=10+10) AddMulVVW/100-8 79.7ns ± 2% 52.4ns ± 1% -34.17% (p=0.000 n=10+10) AddMulVVW/1000-8 695ns ± 1% 491ns ± 2% -29.39% (p=0.000 n=9+10) AddMulVVW/10000-8 7.26µs ± 2% 5.92µs ± 6% -18.42% (p=0.000 n=10+10) AddMulVVW/100000-8 72.6µs ± 2% 62.2µs ± 2% -14.31% (p=0.000 n=10+10) DecimalConversion-8 108µs ±19% 104µs ± 4% ~ (p=0.460 n=10+8) FloatString/100-8 926ns ±14% 908ns ± 5% ~ (p=0.398 n=9+9) FloatString/1000-8 25.7µs ± 1% 25.7µs ± 1% ~ (p=0.739 n=10+10) FloatString/10000-8 2.13ms ± 1% 2.12ms ± 1% ~ (p=0.353 n=10+10) FloatString/100000-8 207ms ± 1% 206ms ± 2% ~ (p=0.912 n=10+10) FloatAdd/10-8 61.3ns ± 3% 61.9ns ± 3% ~ (p=0.183 n=10+10) FloatAdd/100-8 62.0ns ± 2% 62.9ns ± 4% ~ (p=0.118 n=10+10) FloatAdd/1000-8 84.7ns ± 2% 84.4ns ± 1% ~ (p=0.591 n=10+10) FloatAdd/10000-8 305ns ± 2% 306ns ± 1% ~ (p=0.443 n=10+10) FloatAdd/100000-8 2.45µs ± 1% 2.46µs ± 1% ~ (p=0.782 n=10+10) FloatSub/10-8 56.8ns ± 4% 
56.5ns ± 5% ~ (p=0.423 n=10+10) FloatSub/100-8 57.3ns ± 4% 57.1ns ± 5% ~ (p=0.540 n=10+10) FloatSub/1000-8 66.8ns ± 4% 66.6ns ± 1% ~ (p=0.868 n=10+10) FloatSub/10000-8 199ns ± 1% 198ns ± 1% ~ (p=0.287 n=10+9) FloatSub/100000-8 1.47µs ± 2% 1.47µs ± 2% ~ (p=0.920 n=10+9) ParseFloatSmallExp-8 8.74µs ±10% 9.48µs ±10% +8.51% (p=0.010 n=9+10) ParseFloatLargeExp-8 39.2µs ±25% 39.6µs ±12% ~ (p=0.529 n=10+10) GCD10x10/WithoutXY-8 173ns ±23% 177ns ±20% ~ (p=0.698 n=10+10) GCD10x10/WithXY-8 736ns ±12% 728ns ±16% ~ (p=0.838 n=10+10) GCD10x100/WithoutXY-8 325ns ±16% 326ns ±14% ~ (p=0.912 n=10+10) GCD10x100/WithXY-8 1.14µs ±13% 1.16µs ± 6% ~ (p=0.287 n=10+9) GCD10x1000/WithoutXY-8 851ns ±25% 820ns ±12% ~ (p=0.592 n=10+10) GCD10x1000/WithXY-8 2.89µs ±17% 2.85µs ± 5% ~ (p=1.000 n=10+9) GCD10x10000/WithoutXY-8 6.66µs ±12% 6.82µs ±19% ~ (p=0.529 n=10+10) GCD10x10000/WithXY-8 18.0µs ± 5% 17.2µs ±19% ~ (p=0.315 n=7+10) GCD10x100000/WithoutXY-8 77.8µs ±18% 73.3µs ±11% ~ (p=0.315 n=10+9) GCD10x100000/WithXY-8 186µs ±14% 204µs ±29% ~ (p=0.218 n=10+10) GCD100x100/WithoutXY-8 1.09µs ± 1% 1.09µs ± 2% ~ (p=0.117 n=9+10) GCD100x100/WithXY-8 7.93µs ± 1% 7.97µs ± 1% +0.52% (p=0.006 n=10+10) GCD100x1000/WithoutXY-8 2.00µs ± 3% 2.04µs ± 6% ~ (p=0.053 n=9+10) GCD100x1000/WithXY-8 9.23µs ± 1% 9.29µs ± 1% +0.63% (p=0.009 n=10+10) GCD100x10000/WithoutXY-8 10.2µs ±11% 9.7µs ± 6% ~ (p=0.278 n=10+9) GCD100x10000/WithXY-8 33.3µs ± 4% 33.6µs ± 4% ~ (p=0.481 n=10+10) GCD100x100000/WithoutXY-8 106µs ±17% 105µs ±13% ~ (p=0.853 n=10+10) GCD100x100000/WithXY-8 289µs ±17% 276µs ± 8% ~ (p=0.353 n=10+10) GCD1000x1000/WithoutXY-8 12.2µs ± 1% 12.1µs ± 1% -0.45% (p=0.007 n=10+10) GCD1000x1000/WithXY-8 131µs ± 1% 132µs ± 0% +0.93% (p=0.000 n=9+7) GCD1000x10000/WithoutXY-8 20.6µs ± 2% 20.6µs ± 1% ~ (p=0.326 n=10+9) GCD1000x10000/WithXY-8 238µs ± 1% 237µs ± 1% ~ (p=0.356 n=9+10) GCD1000x100000/WithoutXY-8 117µs ± 8% 114µs ±11% ~ (p=0.190 n=10+10) GCD1000x100000/WithXY-8 1.51ms ± 1% 1.50ms ± 1% ~ (p=0.053 n=9+10) 
GCD10000x10000/WithoutXY-8 220µs ± 1% 218µs ± 1% -0.86% (p=0.000 n=10+10) GCD10000x10000/WithXY-8 3.04ms ± 0% 3.05ms ± 0% +0.33% (p=0.001 n=9+10) GCD10000x100000/WithoutXY-8 513µs ± 0% 511µs ± 0% -0.38% (p=0.000 n=10+10) GCD10000x100000/WithXY-8 15.1ms ± 0% 15.0ms ± 0% ~ (p=0.053 n=10+9) GCD100000x100000/WithoutXY-8 10.4ms ± 1% 10.4ms ± 2% ~ (p=0.258 n=9+9) GCD100000x100000/WithXY-8 205ms ± 1% 205ms ± 1% ~ (p=0.481 n=10+10) Hilbert-8 1.25ms ±15% 1.24ms ±17% ~ (p=0.853 n=10+10) Binomial-8 3.03µs ±24% 2.90µs ±16% ~ (p=0.481 n=10+10) QuoRem-8 1.95µs ± 1% 1.95µs ± 2% ~ (p=0.117 n=9+10) Exp-8 5.12ms ± 2% 3.99ms ± 1% -22.02% (p=0.000 n=10+9) Exp2-8 5.14ms ± 2% 3.98ms ± 0% -22.55% (p=0.000 n=10+9) Bitset-8 16.4ns ± 2% 16.5ns ± 2% ~ (p=0.311 n=9+10) BitsetNeg-8 46.3ns ± 4% 45.8ns ± 4% ~ (p=0.272 n=10+10) BitsetOrig-8 250ns ±19% 247ns ±14% ~ (p=0.671 n=10+10) BitsetNegOrig-8 416ns ±14% 429ns ±14% ~ (p=0.353 n=10+10) ModSqrt225_Tonelli-8 400µs ± 0% 320µs ± 0% -19.88% (p=0.000 n=9+7) ModSqrt224_3Mod4-8 123µs ± 1% 97µs ± 0% -21.21% (p=0.000 n=9+10) ModSqrt5430_Tonelli-8 1.87s ± 0% 1.39s ± 1% -25.70% (p=0.000 n=9+10) ModSqrt5430_3Mod4-8 630ms ± 2% 465ms ± 1% -26.12% (p=0.000 n=10+10) Sqrt-8 25.8µs ± 1% 25.9µs ± 0% +0.66% (p=0.002 n=10+8) IntSqr/1-8 11.3ns ± 1% 11.3ns ± 2% ~ (p=0.360 n=9+10) IntSqr/2-8 26.6ns ± 1% 27.4ns ± 2% +2.87% (p=0.000 n=8+9) IntSqr/3-8 36.5ns ± 6% 36.6ns ± 5% ~ (p=0.589 n=10+10) IntSqr/5-8 57.2ns ± 2% 57.8ns ± 1% +0.92% (p=0.045 n=10+9) IntSqr/8-8 112ns ± 1% 93ns ± 1% -16.60% (p=0.000 n=10+10) IntSqr/10-8 148ns ± 1% 129ns ± 5% -12.85% (p=0.000 n=10+10) IntSqr/20-8 642ns ±28% 692ns ±21% ~ (p=0.105 n=10+10) IntSqr/30-8 1.03µs ±18% 1.06µs ±15% ~ (p=0.422 n=10+8) IntSqr/50-8 2.33µs ±14% 2.14µs ±20% ~ (p=0.063 n=10+10) IntSqr/80-8 4.06µs ±13% 3.72µs ±14% -8.31% (p=0.029 n=10+10) IntSqr/100-8 5.79µs ±10% 5.20µs ±18% -10.15% (p=0.004 n=10+10) IntSqr/200-8 17.1µs ± 1% 12.9µs ± 3% -24.44% (p=0.000 n=10+10) IntSqr/300-8 35.9µs ± 0% 26.6µs ± 1% -25.75% (p=0.000 
n=10+10) IntSqr/500-8 84.9µs ± 0% 71.7µs ± 1% -15.49% (p=0.000 n=10+10) IntSqr/800-8 170µs ± 1% 142µs ± 2% -16.73% (p=0.000 n=10+10) IntSqr/1000-8 258µs ± 1% 218µs ± 1% -15.65% (p=0.000 n=10+10) Mul-8 10.4ms ± 1% 8.3ms ± 0% -20.05% (p=0.000 n=10+9) Exp3Power/0x10-8 311ns ±15% 321ns ±24% ~ (p=0.447 n=10+10) Exp3Power/0x40-8 358ns ±21% 346ns ±37% ~ (p=0.591 n=10+10) Exp3Power/0x100-8 611ns ±19% 570ns ±27% ~ (p=0.393 n=10+10) Exp3Power/0x400-8 1.31µs ±26% 1.34µs ±19% ~ (p=0.853 n=10+10) Exp3Power/0x1000-8 6.76µs ±23% 6.22µs ±16% ~ (p=0.095 n=10+9) Exp3Power/0x4000-8 37.6µs ±14% 36.4µs ±21% ~ (p=0.247 n=10+10) Exp3Power/0x10000-8 345µs ±14% 310µs ±11% -9.99% (p=0.005 n=10+10) Exp3Power/0x40000-8 2.77ms ± 1% 2.34ms ± 1% -15.47% (p=0.000 n=10+10) Exp3Power/0x100000-8 25.1ms ± 1% 21.3ms ± 1% -15.26% (p=0.000 n=10+10) Exp3Power/0x400000-8 225ms ± 1% 190ms ± 1% -15.61% (p=0.000 n=10+10) Fibo-8 23.4ms ± 1% 23.3ms ± 0% ~ (p=0.052 n=10+10) NatSqr/1-8 58.4ns ±24% 59.8ns ±38% ~ (p=0.739 n=10+10) NatSqr/2-8 122ns ±21% 122ns ±16% ~ (p=0.896 n=10+10) NatSqr/3-8 140ns ±28% 148ns ±30% ~ (p=0.288 n=10+10) NatSqr/5-8 193ns ±29% 210ns ±34% ~ (p=0.469 n=10+10) NatSqr/8-8 317ns ±21% 296ns ±25% ~ (p=0.393 n=10+10) NatSqr/10-8 362ns ± 8% 373ns ±30% ~ (p=0.617 n=9+10) NatSqr/20-8 1.24µs ±16% 1.06µs ±29% -14.57% (p=0.019 n=10+10) NatSqr/30-8 1.90µs ±32% 1.71µs ±10% ~ (p=0.176 n=10+9) NatSqr/50-8 4.22µs ±19% 3.67µs ± 7% -13.03% (p=0.017 n=10+9) NatSqr/80-8 7.33µs ±20% 6.50µs ±15% -11.26% (p=0.009 n=10+10) NatSqr/100-8 9.84µs ±18% 9.33µs ± 8% ~ (p=0.280 n=10+10) NatSqr/200-8 21.4µs ± 7% 20.0µs ±14% ~ (p=0.075 n=10+10) NatSqr/300-8 38.0µs ± 2% 31.3µs ±10% -17.63% (p=0.000 n=10+10) NatSqr/500-8 102µs ± 5% 101µs ± 4% ~ (p=0.780 n=9+10) NatSqr/800-8 190µs ± 3% 166µs ± 6% -12.29% (p=0.000 n=10+10) NatSqr/1000-8 277µs ± 2% 245µs ± 6% -11.64% (p=0.000 n=10+10) ScanPi-8 144µs ±23% 149µs ±24% ~ (p=0.579 n=10+10) StringPiParallel-8 25.6µs ± 0% 25.8µs ± 0% +0.69% (p=0.000 n=9+10) Scan/10/Base2-8 305ns ± 
1% 309ns ± 1% +1.32% (p=0.000 n=10+9) Scan/100/Base2-8 1.95µs ± 1% 1.98µs ± 1% +1.10% (p=0.000 n=10+10) Scan/1000/Base2-8 19.5µs ± 1% 19.7µs ± 1% +1.39% (p=0.000 n=10+10) Scan/10000/Base2-8 270µs ± 1% 272µs ± 1% +0.58% (p=0.024 n=9+9) Scan/100000/Base2-8 10.3ms ± 0% 10.3ms ± 0% +0.16% (p=0.022 n=9+10) Scan/10/Base8-8 146ns ± 4% 154ns ± 4% +5.57% (p=0.000 n=9+9) Scan/100/Base8-8 748ns ± 1% 759ns ± 1% +1.51% (p=0.000 n=9+10) Scan/1000/Base8-8 7.88µs ± 1% 8.00µs ± 1% +1.64% (p=0.000 n=10+10) Scan/10000/Base8-8 155µs ± 1% 155µs ± 1% ~ (p=0.968 n=10+9) Scan/100000/Base8-8 9.11ms ± 0% 9.11ms ± 0% ~ (p=0.604 n=9+10) Scan/10/Base10-8 140ns ± 5% 149ns ± 5% +6.39% (p=0.000 n=9+10) Scan/100/Base10-8 680ns ± 0% 688ns ± 1% +1.08% (p=0.000 n=9+10) Scan/1000/Base10-8 7.09µs ± 1% 7.16µs ± 1% +0.98% (p=0.019 n=10+10) Scan/10000/Base10-8 149µs ± 3% 150µs ± 3% ~ (p=0.143 n=10+10) Scan/100000/Base10-8 9.16ms ± 0% 9.16ms ± 0% ~ (p=0.661 n=10+9) Scan/10/Base16-8 134ns ± 5% 135ns ± 3% ~ (p=0.505 n=9+9) Scan/100/Base16-8 560ns ± 1% 563ns ± 0% +0.67% (p=0.000 n=10+8) Scan/1000/Base16-8 6.28µs ± 1% 6.26µs ± 1% ~ (p=0.448 n=10+10) Scan/10000/Base16-8 161µs ± 1% 162µs ± 1% +0.74% (p=0.008 n=9+9) Scan/100000/Base16-8 9.64ms ± 0% 9.64ms ± 0% ~ (p=0.436 n=10+10) String/10/Base2-8 116ns ±12% 118ns ±13% ~ (p=0.645 n=10+10) String/100/Base2-8 871ns ±23% 860ns ±22% ~ (p=0.699 n=10+10) String/1000/Base2-8 10.0µs ±20% 10.0µs ±23% ~ (p=0.853 n=10+10) String/10000/Base2-8 110µs ±21% 120µs ±25% ~ (p=0.436 n=10+10) String/100000/Base2-8 768µs ±11% 733µs ±16% ~ (p=0.393 n=10+10) String/10/Base8-8 51.3ns ± 1% 51.0ns ± 3% ~ (p=0.286 n=9+9) String/100/Base8-8 284ns ± 9% 272ns ±12% ~ (p=0.267 n=9+10) String/1000/Base8-8 3.06µs ± 9% 3.04µs ±10% ~ (p=0.739 n=10+10) String/10000/Base8-8 36.1µs ±14% 35.1µs ± 9% ~ (p=0.447 n=10+9) String/100000/Base8-8 371µs ±12% 373µs ±16% ~ (p=0.739 n=10+10) String/10/Base10-8 167ns ±11% 165ns ± 9% ~ (p=0.781 n=10+10) String/100/Base10-8 727ns ± 1% 740ns ± 2% +1.70% (p=0.001 
n=10+10) String/1000/Base10-8 5.30µs ±18% 5.37µs ±14% ~ (p=0.631 n=10+10) String/10000/Base10-8 45.0µs ±14% 44.6µs ±10% ~ (p=0.720 n=9+10) String/100000/Base10-8 5.10ms ± 1% 5.05ms ± 3% ~ (p=0.211 n=9+10) String/10/Base16-8 47.7ns ± 6% 47.7ns ± 6% ~ (p=0.985 n=10+10) String/100/Base16-8 221ns ±10% 234ns ±27% ~ (p=0.541 n=10+10) String/1000/Base16-8 2.23µs ±11% 2.12µs ± 8% -4.81% (p=0.029 n=9+8) String/10000/Base16-8 28.3µs ±21% 28.5µs ±14% ~ (p=0.796 n=10+10) String/100000/Base16-8 291µs ±16% 293µs ±15% ~ (p=0.931 n=9+9) LeafSize/0-8 2.43ms ± 1% 2.49ms ± 1% +2.56% (p=0.000 n=10+10) LeafSize/1-8 49.7µs ± 9% 46.3µs ±16% -6.78% (p=0.017 n=10+9) LeafSize/2-8 48.4µs ±18% 46.3µs ±19% ~ (p=0.436 n=10+10) LeafSize/3-8 81.7µs ± 3% 80.9µs ± 3% ~ (p=0.278 n=10+9) LeafSize/4-8 47.0µs ± 7% 47.9µs ±13% ~ (p=0.905 n=9+10) LeafSize/5-8 96.8µs ± 1% 97.3µs ± 2% ~ (p=0.515 n=8+10) LeafSize/6-8 82.5µs ± 4% 80.9µs ± 2% -1.92% (p=0.019 n=10+10) LeafSize/7-8 67.2µs ±13% 66.6µs ± 9% ~ (p=0.842 n=10+9) LeafSize/8-8 46.0µs ±28% 45.1µs ±12% ~ (p=0.739 n=10+10) LeafSize/9-8 111µs ± 1% 111µs ± 1% ~ (p=0.739 n=10+10) LeafSize/10-8 98.8µs ± 4% 97.9µs ± 3% ~ (p=0.278 n=10+9) LeafSize/11-8 96.8µs ± 1% 96.4µs ± 1% ~ (p=0.211 n=9+10) LeafSize/12-8 81.0µs ± 4% 81.3µs ± 3% ~ (p=0.579 n=10+10) LeafSize/13-8 79.7µs ± 5% 79.2µs ± 3% ~ (p=0.661 n=10+9) LeafSize/14-8 67.6µs ±12% 65.8µs ± 7% ~ (p=0.447 n=10+9) LeafSize/15-8 63.9µs ±17% 66.3µs ±14% ~ (p=0.481 n=10+10) LeafSize/16-8 44.0µs ±28% 46.0µs ±27% ~ (p=0.481 n=10+10) LeafSize/32-8 46.2µs ±13% 43.5µs ±18% ~ (p=0.156 n=9+10) LeafSize/64-8 53.3µs ±10% 53.0µs ±19% ~ (p=0.730 n=9+9) ProbablyPrime/n=0-8 3.60ms ± 1% 3.39ms ± 1% -5.87% (p=0.000 n=10+9) ProbablyPrime/n=1-8 4.42ms ± 1% 4.08ms ± 1% -7.69% (p=0.000 n=10+10) ProbablyPrime/n=5-8 7.57ms ± 2% 6.79ms ± 1% -10.24% (p=0.000 n=10+10) ProbablyPrime/n=10-8 11.6ms ± 2% 10.2ms ± 1% -11.69% (p=0.000 n=10+10) ProbablyPrime/n=20-8 19.4ms ± 2% 16.9ms ± 2% -12.89% (p=0.000 n=10+10) ProbablyPrime/Lucas-8 2.81ms ± 
2% 2.72ms ± 1% -3.22% (p=0.000 n=10+9) ProbablyPrime/MillerRabinBase2-8 797µs ± 1% 680µs ± 1% -14.64% (p=0.000 n=10+10) name old speed new speed delta AddVV/1-8 17.1GB/s ± 6% 18.0GB/s ± 2% ~ (p=0.122 n=10+8) AddVV/2-8 32.4GB/s ± 2% 32.2GB/s ± 4% ~ (p=0.661 n=10+9) AddVV/3-8 38.6GB/s ± 2% 38.9GB/s ± 1% ~ (p=0.113 n=10+9) AddVV/4-8 45.8GB/s ± 2% 45.8GB/s ± 2% ~ (p=0.796 n=10+10) AddVV/5-8 48.1GB/s ± 2% 48.3GB/s ± 1% ~ (p=0.315 n=10+10) AddVV/10-8 78.9GB/s ± 1% 78.9GB/s ± 2% ~ (p=0.353 n=10+10) AddVV/100-8 136GB/s ± 2% 137GB/s ± 1% ~ (p=0.971 n=10+10) AddVV/1000-8 164GB/s ± 1% 164GB/s ± 4% ~ (p=0.853 n=10+10) AddVV/10000-8 126GB/s ± 6% 129GB/s ± 2% ~ (p=0.063 n=10+10) AddVV/100000-8 116GB/s ± 3% 116GB/s ± 3% ~ (p=0.796 n=10+10) AddVW/1-8 2.64GB/s ± 3% 2.64GB/s ± 3% ~ (p=0.579 n=10+10) AddVW/2-8 4.49GB/s ± 2% 4.44GB/s ± 2% -1.09% (p=0.040 n=9+9) AddVW/3-8 6.36GB/s ± 1% 6.34GB/s ± 2% ~ (p=0.684 n=10+10) AddVW/4-8 6.83GB/s ± 1% 6.82GB/s ± 2% ~ (p=0.905 n=10+9) AddVW/5-8 8.75GB/s ± 1% 8.73GB/s ± 1% ~ (p=0.796 n=10+10) AddVW/10-8 10.5GB/s ± 2% 10.5GB/s ± 1% ~ (p=0.971 n=10+10) AddVW/100-8 19.5GB/s ± 2% 18.9GB/s ± 2% -3.22% (p=0.000 n=10+10) AddVW/1000-8 20.7GB/s ± 2% 20.6GB/s ± 4% ~ (p=0.631 n=10+10) AddVW/10000-8 20.6GB/s ± 3% 20.7GB/s ± 3% ~ (p=0.481 n=10+10) AddVW/100000-8 19.4GB/s ± 2% 19.2GB/s ± 3% ~ (p=0.165 n=10+10) AddMulVVW/1-8 19.5GB/s ± 2% 19.7GB/s ± 3% ~ (p=0.123 n=10+10) AddMulVVW/2-8 30.1GB/s ± 2% 30.2GB/s ± 3% ~ (p=0.297 n=9+9) AddMulVVW/3-8 37.9GB/s ± 2% 36.5GB/s ± 2% -3.63% (p=0.000 n=10+10) AddMulVVW/4-8 40.0GB/s ± 2% 39.4GB/s ± 2% -1.58% (p=0.001 n=10+10) AddMulVVW/5-8 47.3GB/s ± 2% 46.6GB/s ± 1% -1.35% (p=0.001 n=9+9) AddMulVVW/10-8 52.3GB/s ± 2% 60.6GB/s ± 3% +15.76% (p=0.000 n=10+10) AddMulVVW/100-8 80.3GB/s ± 2% 122.1GB/s ± 1% +51.92% (p=0.000 n=10+10) AddMulVVW/1000-8 92.0GB/s ± 1% 130.3GB/s ± 2% +41.61% (p=0.000 n=9+10) AddMulVVW/10000-8 88.2GB/s ± 2% 108.2GB/s ± 5% +22.66% (p=0.000 n=10+10) AddMulVVW/100000-8 88.2GB/s ± 2% 102.9GB/s ± 2% +16.69% 
(p=0.000 n=10+10) Change-Id: Ic98e30c91d437d845fed03e07e976c3fdbf02b36 Reviewed-on: https://go-review.googlesource.com/74851 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Adam Langley <agl@golang.org>	2018-02-24 00:13:03 +00:00
Shawn Smith	d3beea8c52	all: fix misspellings GitHub-Last-Rev: `468df242d0` GitHub-Pull-Request: golang/go#23935 Change-Id: If751ce3ffa3a4d5e00a3138211383d12cb6b23fc Reviewed-on: https://go-review.googlesource.com/95577 Run-TryBot: Andrew Bonventre <andybons@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Andrew Bonventre <andybons@golang.org>	2018-02-20 21:02:58 +00:00
Alberto Donizetti	331092c58f	math/big: fix %s verbs in Float tests error messages Fatalf calls in two Float tests use the %s verb with Float values, which is not allowed and results in failure messages that look like this: float_test.go:1385: i = 0, prec = 1, ToZero: %!s(*big.Float=1) [0] / %!s(*big.Float=1) [0] = %!s(*big.Float=0.0625) want %!s(*big.Float=1) Switch to %v. Change-Id: Ifdc80bf19c91ca1b190f6551a6d0a51b42ed5919 Reviewed-on: https://go-review.googlesource.com/87199 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-02-14 09:50:19 +00:00
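A minimal sketch of the formatting problem described in this commit, assuming a current Go toolchain. The helper names `formatFloat` and `isBadVerb` are hypothetical and exist only for illustration; the sketch shows that `%s` applied to a `*big.Float` yields a `%!s(...)` error marker, while `%v` prints the value.

```go
package main

import (
	"fmt"
	"math/big"
	"strings"
)

// formatFloat formats x as a *big.Float using the given verb, e.g. "%s" or "%v".
// (Hypothetical helper, not part of math/big.)
func formatFloat(verb string, x float64) string {
	return fmt.Sprintf(verb, big.NewFloat(x))
}

// isBadVerb reports whether the output starts with fmt's "%!" marker, which
// signals that the verb was not accepted for the operand.
func isBadVerb(out string) bool {
	return strings.HasPrefix(out, "%!")
}

func main() {
	fmt.Println(formatFloat("%s", 0.0625)) // rejected verb: prints a %!s(...) marker
	fmt.Println(formatFloat("%v", 0.0625)) // accepted verb: prints the value
}
```

In test code the same distinction applies to `t.Fatalf` format strings, which is why the commit switches the verbs to `%v`.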
Thanabodee Charoenpiriyakij	1124fa300b	math: use Abs rather than if x < 0 { x = -x } This is the benchmark result based on darwin with the amd64 architecture: name old time/op new time/op delta Cos 10.2ns ± 2% 10.3ns ± 3% +1.18% (p=0.032 n=10+10) Cosh 25.3ns ± 3% 24.6ns ± 2% -3.00% (p=0.000 n=10+10) Hypot 6.40ns ± 2% 6.19ns ± 3% -3.36% (p=0.000 n=10+10) HypotGo 7.16ns ± 3% 6.54ns ± 2% -8.66% (p=0.000 n=10+10) J0 66.0ns ± 2% 63.7ns ± 1% -3.42% (p=0.000 n=9+10) Fixes #21812 Change-Id: I2b88fbdfc250cd548f8f08b44ce2eb172dcacf43 Reviewed-on: https://go-review.googlesource.com/84437 Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-02-13 20:12:23 +00:00
Paul PISCUC	3526c40979	math/rand: typo fixed in documentation of seedPos In the comment of seedPos, the word: condiiton was changed to: condition Change-Id: I8967cc0e9f5d37776bada96cc1443c8bf46e1117 Reviewed-on: https://go-review.googlesource.com/86156 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-01-04 20:27:29 +00:00
Brian Kessler	5305bdd86b	math: correct result for Pow(x, ±.5) Fixes #23224 The previous Pow code had an optimization for powers equal to ±0.5 that used Sqrt for increased accuracy/speed. This caused special cases involving powers of ±0.5 to disagree with the Pow spec. This change places the Sqrt optimization after all of the special case handling. Change-Id: I6bf757f6248256b29cc21725a84e27705d855369 Reviewed-on: https://go-review.googlesource.com/85660 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-01-02 18:10:43 +00:00
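To illustrate why a Sqrt shortcut placed ahead of the special-case handling disagrees with the Pow spec, here is a small sketch, assuming a Go release that includes this fix. The helper `sqrtAndPowHalf` and the variables `negInf` and `negZero` are names chosen for this example only. It compares `math.Sqrt(x)` with `math.Pow(x, 0.5)` at two special inputs where the documented results differ.

```go
package main

import (
	"fmt"
	"math"
)

// Special inputs where Sqrt and Pow(x, 0.5) are documented to differ.
var (
	negInf  = math.Inf(-1)
	negZero = math.Copysign(0, -1)
)

// sqrtAndPowHalf returns math.Sqrt(x) and math.Pow(x, 0.5) side by side.
// (Hypothetical helper for this illustration.)
func sqrtAndPowHalf(x float64) (sqrt, pow float64) {
	return math.Sqrt(x), math.Pow(x, 0.5)
}

func main() {
	s, p := sqrtAndPowHalf(negInf)
	fmt.Println(s, p) // Sqrt(-Inf) is NaN, while Pow(-Inf, 0.5) is +Inf

	s, p = sqrtAndPowHalf(negZero)
	// Sqrt keeps the sign of zero; Pow(-0, 0.5) is specified as +0.
	fmt.Println(math.Signbit(s), math.Signbit(p))
}
```

Substituting Sqrt before these cases are checked would return NaN or -0 where the Pow documentation promises +Inf or +0.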
Ilya Tocar	1992893307	math: remove asm version of Dim Dim performance has regressed by 14% vs 1.9 on amd64. The current pure Go version of Dim is faster and, more importantly for performance, inlinable, so instead of tweaking the asm implementation, just remove it. I had to update BenchmarkDim because it was simply reloading a constant (the answer) in a loop. Perf data below: name old time/op new time/op delta Dim-6 6.79ns ± 0% 1.60ns ± 1% -76.39% (p=0.000 n=7+10) If I modify the benchmark to be the same as in this CL, the results are even better: name old time/op new time/op delta Dim-6 10.2ns ± 0% 1.6ns ± 1% -84.27% (p=0.000 n=8+10) Updates #21913 Change-Id: I00e23c8affc293531e1d9f0e0e49f3a525634f53 Reviewed-on: https://go-review.googlesource.com/80695 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-11-30 21:00:33 +00:00
Alberto Donizetti	ff534e2130	math/big: protect against aliasing in nat.divLarge In nat.divLarge (having signature (z nat).divLarge(u, uIn, v nat)), we check whether z aliases uIn or v, but aliasing is currently not checked for the u parameter. Unfortunately, z and u aliasing each other can in some cases cause errors in the computation. The q return parameter (which will hold the result's quotient), is unconditionally initialized as q = z.make(m + 1) When cap(z) ≥ m+1, z.make() will reuse z's backing array, causing q and z to share the same backing array. If then z aliases u, setting q during the quotient computation will then corrupt u, which at that point already holds computation state. To fix this, we add an alias(z, u) check at the beginning of the function, taking care of aliasing the same way we already do for uIn and v. Fixes #22830 Change-Id: I3ab81120d5af6db7772a062bb1dfc011de91f7ad Reviewed-on: https://go-review.googlesource.com/78995 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-11-30 20:36:54 +00:00
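The aliasing hazard described in this commit can be sketched with plain slices. This is a toy model, not the math/big code: `reuse`, `alias`, and `zeroResult` are hypothetical stand-ins for a make-style helper that recycles the receiver's backing array, an alias check, and a routine that writes its result while an input is still needed.

```go
package main

import "fmt"

// reuse mimics a make-style helper (hypothetical, not the math/big code):
// it hands back z's own backing array whenever that array is large enough.
func reuse(z []uint, n int) []uint {
	if n <= cap(z) {
		return z[:n]
	}
	return make([]uint, n)
}

// alias reports whether x and y share a backing array by comparing the
// address of the last element within each slice's capacity.
func alias(x, y []uint) bool {
	return cap(x) > 0 && cap(y) > 0 &&
		&x[0:cap(x)][cap(x)-1] == &y[0:cap(y)][cap(y)-1]
}

// zeroResult builds a zeroed result of len(u) on top of z while u is still
// needed as input. With guard set, an aliased z is discarded first.
func zeroResult(z, u []uint, guard bool) []uint {
	if guard && alias(z, u) {
		z = nil // force a fresh allocation
	}
	q := reuse(z, len(u))
	for i := range q {
		q[i] = 0 // overwrites u when q and u share storage
	}
	return q
}

func main() {
	u := []uint{1, 2, 3}
	zeroResult(u, u, false)
	fmt.Println(u) // u has been overwritten

	u = []uint{1, 2, 3}
	zeroResult(u, u, true)
	fmt.Println(u) // u is intact
}
```

The guard trades one extra allocation for correctness, which matches the approach the commit message describes for the u parameter.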
Vladimir Stefanovic	2708da0dc1	runtime/cgo, math: don't use FP instructions for soft-float mips{,le} Updates #18162 Change-Id: I591fcf71a02678a99a56a6487da9689d3c9b1bb6 Reviewed-on: https://go-review.googlesource.com/37955 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-11-30 17:12:32 +00:00
Brian Kessler	802a8f88a3	math/cmplx: use signed zero to correct branch cuts Branch cuts for the elementary complex functions along real or imaginary axes should be resolved in floating point calculations by one-sided continuity with signed zero as described in: "Branch Cuts for Complex Elementary Functions or Much Ado About Nothing's Sign Bit" W. Kahan Available at: https://people.freebsd.org/~das/kahan86branch.pdf And as described in the C99 standard which is claimed as the original cephes source. Sqrt did not return the correct branch when imag(x) == 0. The branch is now determined by sign(imag(x)). This incorrect branch choice was affecting the behavior of the Trigonometric/Hyperbolic functions that use Sqrt in intermediate calculations. Asin, Asinh and Atan had spurious domain checks, whereas the functions should be valid over the whole complex plane with appropriate branch cuts. Fixes #6888 Change-Id: I9b1278af54f54bfb4208276ae345bbd3ddf3ec83 Reviewed-on: https://go-review.googlesource.com/46492 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2017-11-27 07:44:00 +00:00
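A short sketch of the one-sided continuity rule for Sqrt on its branch cut, assuming a Go release that includes this change. The helper `sqrtOnCut` is a name invented for this example. It evaluates `cmplx.Sqrt` at a point on the negative real axis with an imaginary part of +0 or -0.

```go
package main

import (
	"fmt"
	"math"
	"math/cmplx"
)

// sqrtOnCut evaluates cmplx.Sqrt at the real value re, with the imaginary
// part set to +0 (approach from above) or -0 (approach from below).
// (Hypothetical helper for this illustration.)
func sqrtOnCut(re float64, fromBelow bool) complex128 {
	im := 0.0
	if fromBelow {
		im = math.Copysign(0, -1) // negative zero
	}
	return cmplx.Sqrt(complex(re, im))
}

func main() {
	// The sign of the zero imaginary part selects the branch.
	fmt.Println(sqrtOnCut(-4, false)) // imaginary part is +2
	fmt.Println(sqrtOnCut(-4, true))  // imaginary part is -2
}
```

Note that `math.Copysign(0, -1)` is used because the constant expression `-0.0` evaluates to positive zero in Go.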
Russ Cox	d9a198c7d0	Revert "math/rand: make Perm match Shuffle" This reverts CL 55972. Reason for revert: this changes Perm's behavior unnecessarily. I asked for this change originally but I now regret it. Reverting so that I don't have to justify it in Go 1.10 release notes. Edited to keep the change to rand_test.go, which seems to have been mostly unrelated. Fixes #22744. Change-Id: If8bb1bcde3ced0db2fdcd0aa65ab128613686c66 Reviewed-on: https://go-review.googlesource.com/78195 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com> Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-11-16 16:24:30 +00:00
Daniel Martí	a265f2e90e	go/printer: indent lone comments in composite lits If a composite literal contains any comments on their own lines without any elements, the printer would unindent the comments. The comments in this edge case are written when the closing '}' is written. Indent and outdent first so that the indentation is interspersed before the comment is written. Also note that the go/printer golden tests don't show the exact same behaviour that gofmt does. Added a TODO to figure this out in a separate CL. While at it, ensure that the tree conforms to gofmt. The changes are unrelated to this indentation fix, however. Fixes #22355. Change-Id: I5ac25ac6de95a236f1e123479127cc4dd71e93fe Reviewed-on: https://go-review.googlesource.com/74232 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-11-15 18:48:48 +00:00
Brian Kessler	2955a8a6cc	math/big: clarify comment on lehmerGCD overflow A clarifying comment was added to indicate that overflow of a single Word is not possible in the single digit calculation. Lehmer's paper includes a proof of the bounds on the size of the cosequences (u0, u1, u2, v0, v1, v2). Change-Id: I98127a07aa8f8fe44814b74b2bc6ff720805194b Reviewed-on: https://go-review.googlesource.com/77451 Reviewed-by: Robert Griesemer <gri@golang.org>	2017-11-14 17:32:39 +00:00
Filippo Valsorda	ef0e2af7b0	math/big: add security warning to (*Int).Rand Change-Id: I22a67733aa2d07298e124077654c9b1473802100 Reviewed-on: https://go-review.googlesource.com/76012 Reviewed-by: Aliaksandr Valialkin <valyala@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-11-06 15:55:31 +00:00
Tobias Klauser	89bcbf40b8	math/bits: add examples for right rotation Right rotation is achieved using negative k in RotateLeft*(x, k). Add examples demonstrating that functionality. Change-Id: I15dab159accd2937cb18d3fa8ca32da8501567d3 Reviewed-on: https://go-review.googlesource.com/75371 Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-11-03 20:12:07 +00:00
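The right-rotation idiom described in this commit can be sketched as follows. This is a minimal illustration, not the examples added by the CL; `rotateRight8` is a hypothetical helper name that does not exist in `math/bits`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotateRight8 is a hypothetical helper (not part of math/bits).
// It rotates x right by k bits by passing a negative count to RotateLeft8.
func rotateRight8(x uint8, k int) uint8 {
	return bits.RotateLeft8(x, -k)
}

func main() {
	fmt.Printf("%08b\n", uint8(15))              // 00001111
	fmt.Printf("%08b\n", bits.RotateLeft8(15, 2)) // 00111100
	fmt.Printf("%08b\n", rotateRight8(15, 2))     // 11000011
}
```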
Lynn Boger	3860478b42	math: implement asm modf for ppc64x This change adds an asm implementation of modf for ppc64x. Improvements: BenchmarkModf-16 7.48 6.26 -16.31% Updates: #21390 Change-Id: I9c4f3213688e3e8842d050840dc04fc9c0bf6ce4 Reviewed-on: https://go-review.googlesource.com/74411 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Munday <mike.munday@ibm.com>	2017-11-02 13:24:32 +00:00
griesemer	85c32c3744	math/big: implement CmpAbs Fixes #22473. Change-Id: Ie886dfc8b5510970d6d63ca6472c73325f6f2276 Reviewed-on: https://go-review.googlesource.com/74971 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2017-11-01 18:17:49 +00:00
Alberto Donizetti	856dccb175	math/big: avoid unnecessary Newton iteration in Float.Sqrt An initial draft of the Newton code for Float.Sqrt was structured like this: for condition // do Newton iteration.. prec *= 2 since prec, at the end of the loop, was double the precision used in the last Newton iteration, the termination condition was set to 2*limit. The code was later rewritten in the form for condition prec *= 2 // do Newton iteration.. but condition was not updated, and it's still 2*limit, which is about double what we actually need, and is triggering the execution of an additional, and unnecessary, Newton iteration. This change adjusts the Newton termination condition to the (correct) value of z.prec, plus 32 guard bits as a safety margin. name old time/op new time/op delta FloatSqrt/64-4 798ns ± 3% 802ns ± 3% ~ (p=0.458 n=8+8) FloatSqrt/128-4 1.65µs ± 1% 1.65µs ± 1% ~ (p=0.290 n=8+8) FloatSqrt/256-4 3.10µs ± 1% 2.10µs ± 0% -32.32% (p=0.000 n=8+7) FloatSqrt/1000-4 8.83µs ± 1% 4.91µs ± 2% -44.39% (p=0.000 n=8+8) FloatSqrt/10000-4 107µs ± 1% 40µs ± 1% -62.68% (p=0.000 n=8+8) FloatSqrt/100000-4 2.91ms ± 1% 0.96ms ± 1% -67.13% (p=0.000 n=8+8) FloatSqrt/1000000-4 240ms ± 1% 80ms ± 1% -66.66% (p=0.000 n=8+8) name old alloc/op new alloc/op delta FloatSqrt/64-4 416B ± 0% 416B ± 0% ~ (all equal) FloatSqrt/128-4 720B ± 0% 720B ± 0% ~ (all equal) FloatSqrt/256-4 1.34kB ± 0% 0.82kB ± 0% -39.29% (p=0.000 n=8+8) FloatSqrt/1000-4 5.09kB ± 0% 2.50kB ± 0% -50.94% (p=0.000 n=8+8) FloatSqrt/10000-4 45.9kB ± 0% 23.5kB ± 0% -48.81% (p=0.000 n=8+8) FloatSqrt/100000-4 533kB ± 0% 251kB ± 0% -52.90% (p=0.000 n=8+8) FloatSqrt/1000000-4 9.21MB ± 0% 4.61MB ± 0% -49.98% (p=0.000 n=8+8) name old allocs/op new allocs/op delta FloatSqrt/64-4 9.00 ± 0% 9.00 ± 0% ~ (all equal) FloatSqrt/128-4 13.0 ± 0% 13.0 ± 0% ~ (all equal) FloatSqrt/256-4 15.0 ± 0% 12.0 ± 0% -20.00% (p=0.000 n=8+8) FloatSqrt/1000-4 24.0 ± 0% 19.0 ± 0% -20.83% (p=0.000 n=8+8) FloatSqrt/10000-4 40.0 ± 0% 35.0 ± 0% -12.50% (p=0.000 n=8+8) FloatSqrt/100000-4 66.0 ± 0% 55.0 ± 0% -16.67% (p=0.000 n=8+8) FloatSqrt/1000000-4 143 ± 0% 122 ± 0% -14.69% (p=0.000 n=8+8) Change-Id: I4868adb7f8960f2ca20e7792734c2e6211669fc0 Reviewed-on: https://go-review.googlesource.com/75010 Reviewed-by: Robert Griesemer <gri@golang.org>	2017-11-01 16:21:06 +00:00
Alberto Donizetti	2c783dc038	math/big: save one subtraction per iteration in Float.Sqrt The Sqrt Newton method computes g(t) = f(t)/f'(t) and then iterates t2 = t1 - g(t1) We can save one operation by including the final subtraction in g(t) and evaluating the resulting expression symbolically. For example, for the direct method, g(t) = ½(t² - x)/t and we use 2 multiplications, 1 division and 1 subtraction in g(), plus 1 final subtraction; but if we compute t - g(t) = t - ½(t² - x)/t = ½(t² + x)/t we only use 2 multiplications, 1 division and 1 addition. A similar simplification can be done for the inverse method. name old time/op new time/op delta FloatSqrt/64-4 889ns ± 4% 790ns ± 1% -11.19% (p=0.000 n=8+7) FloatSqrt/128-4 1.82µs ± 0% 1.64µs ± 1% -10.07% (p=0.001 n=6+8) FloatSqrt/256-4 3.56µs ± 4% 3.10µs ± 3% -12.96% (p=0.000 n=7+8) FloatSqrt/1000-4 9.06µs ± 3% 8.86µs ± 1% -2.20% (p=0.001 n=7+7) FloatSqrt/10000-4 109µs ± 1% 107µs ± 1% -1.56% (p=0.000 n=8+8) FloatSqrt/100000-4 2.91ms ± 0% 2.89ms ± 2% -0.68% (p=0.026 n=7+7) FloatSqrt/1000000-4 237ms ± 1% 239ms ± 1% +0.72% (p=0.021 n=8+8) name old alloc/op new alloc/op delta FloatSqrt/64-4 448B ± 0% 416B ± 0% -7.14% (p=0.000 n=8+8) FloatSqrt/128-4 752B ± 0% 720B ± 0% -4.26% (p=0.000 n=8+8) FloatSqrt/256-4 2.05kB ± 0% 1.34kB ± 0% -34.38% (p=0.000 n=8+8) FloatSqrt/1000-4 6.91kB ± 0% 5.09kB ± 0% -26.39% (p=0.000 n=8+8) FloatSqrt/10000-4 60.5kB ± 0% 45.9kB ± 0% -24.17% (p=0.000 n=8+8) FloatSqrt/100000-4 617kB ± 0% 533kB ± 0% -13.57% (p=0.000 n=8+8) FloatSqrt/1000000-4 10.3MB ± 0% 9.2MB ± 0% -10.85% (p=0.000 n=8+8) name old allocs/op new allocs/op delta FloatSqrt/64-4 9.00 ± 0% 9.00 ± 0% ~ (all equal) FloatSqrt/128-4 13.0 ± 0% 13.0 ± 0% ~ (all equal) FloatSqrt/256-4 20.0 ± 0% 15.0 ± 0% -25.00% (p=0.000 n=8+8) FloatSqrt/1000-4 31.0 ± 0% 24.0 ± 0% -22.58% (p=0.000 n=8+8) FloatSqrt/10000-4 50.0 ± 0% 40.0 ± 0% -20.00% (p=0.000 n=8+8) FloatSqrt/100000-4 76.0 ± 0% 66.0 ± 0% -13.16% (p=0.000 n=8+8) FloatSqrt/1000000-4 146 ± 0% 143 ± 0% -2.05% (p=0.000 n=8+8) Change-Id: I271c00de1ca9740e585bf2af7bcd87b18c1fa68e Reviewed-on: https://go-review.googlesource.com/73879 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-11-01 11:24:02 +00:00
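The algebraic simplification in this commit message can be checked numerically. The sketch below uses float64 rather than big.Float, and the function names `stepTwoPart`, `stepFused` and `newtonSqrt` are hypothetical; it only illustrates that t - g(t) and ½(t² + x)/t describe the same Newton step for the direct method.

```go
package main

import (
	"fmt"
	"math"
)

// stepTwoPart computes g(t) = ½(t² - x)/t and then subtracts it from t.
func stepTwoPart(t, x float64) float64 {
	g := 0.5 * (t*t - x) / t
	return t - g
}

// stepFused evaluates t - g(t) symbolically as ½(t² + x)/t,
// which replaces the two subtractions with a single addition.
func stepFused(t, x float64) float64 {
	return 0.5 * (t*t + x) / t
}

// newtonSqrt runs a fixed number of fused steps.
// Assumption: x > 0, and x itself is used as a crude starting guess.
func newtonSqrt(x float64, steps int) float64 {
	t := x
	for i := 0; i < steps; i++ {
		t = stepFused(t, x)
	}
	return t
}

func main() {
	fmt.Println(stepTwoPart(3, 2), stepFused(3, 2))
	fmt.Println(newtonSqrt(2, 20), math.Sqrt(2))
}
```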
Ilya Tocar	94484d8ed5	cmd/compile: intrinsify math.{Trunc/Ceil/Floor} on amd64 This significantly speeds up Trunc. Ceil/Floor are using the same instruction, so do them too. name old time/op new time/op delta Floor-6 3.33ns ± 1% 3.22ns ± 0% -3.39% (p=0.000 n=10+10) Ceil-6 3.33ns ± 1% 3.22ns ± 0% -3.16% (p=0.000 n=10+7) Trunc-6 4.83ns ± 0% 3.22ns ± 0% -33.36% (p=0.000 n=6+8) Change-Id: If848790e458eedfe38a6a0407bb4f589c68ac254 Reviewed-on: https://go-review.googlesource.com/68630 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-10-31 19:30:54 +00:00
Michael Munday	c280126557	cmd/asm, cmd/internal/obj/s390x, math: add "test under mask" instructions Adds the following s390x test under mask (immediate) instructions: TMHH TMHL TMLH TMLL These are useful for testing bits and are already used in the math package. Change-Id: Idffb3f83b238dba76ac1e42ac6b0bf7f1d11bea2 Reviewed-on: https://go-review.googlesource.com/41092 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-10-30 23:55:14 +00:00
Michael Munday	b97688d112	math: optimize dim and remove s390x assembly implementation By calculating dim directly, rather than calling max, we can simplify the generated code significantly. The compiler now reports that dim is easily inlineable, but it can't be inlined because there is still an assembly stub for Dim. Since dim is now very simple I no longer think it is worth having assembly implementations of it. I have therefore removed the s390x assembly. Removing the other assembly for Dim is #21913. name old time/op new time/op delta Dim 4.29ns ± 0% 3.53ns ± 0% -17.62% (p=0.000 n=9+8) Change-Id: Ic38a6b51603cbc661dcdb868ecf2b1947e9f399e Reviewed-on: https://go-review.googlesource.com/64194 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-10-30 19:05:51 +00:00
Alberto Donizetti	bd48d37e30	math/big: add (*Float).Sqrt This change adds a Square root method to the big.Float type, with signature (z *Float) Sqrt(x *Float) *Float Fixes #20460 Change-Id: I050aaed0615fe0894e11c800744600648343c223 Reviewed-on: https://go-review.googlesource.com/67830 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-10-26 17:29:27 +00:00
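A short usage sketch of the method added here, assuming a Go release that ships `(*big.Float).Sqrt`. The helper name `sqrtAtPrec` and the 200-bit precision are illustrative choices, not taken from the commit.

```go
package main

import (
	"fmt"
	"math/big"
)

// sqrtAtPrec is a hypothetical helper: it returns the square root of x
// rounded to prec bits of mantissa.
func sqrtAtPrec(x int64, prec uint) *big.Float {
	arg := new(big.Float).SetPrec(prec).SetInt64(x)
	// Sqrt sets the receiver to the rounded square root of arg and returns it.
	return new(big.Float).SetPrec(prec).Sqrt(arg)
}

func main() {
	z := sqrtAtPrec(2, 200)
	fmt.Println(z.Text('f', 40))
}
```

The receiver's precision controls the rounding of the result, so setting it explicitly before the call keeps the output precision independent of the argument.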
Brian Kessler	1643d4f33a	math/big: implement Lehmer's GCD algorithm Updates #15833 Lehmer's GCD algorithm uses single precision calculations to simulate several steps of multiple precision calculations in Euclid's GCD algorithm which leads to a considerable speed up. This implementation uses Collins' simplified testing condition on the single digit cosequences which requires only one quotient and avoids any possibility of overflow. name old time/op new time/op delta GCD10x10/WithoutXY-4 1.82µs ±24% 0.28µs ± 6% -84.40% (p=0.008 n=5+5) GCD10x10/WithXY-4 1.69µs ± 6% 1.71µs ± 6% ~ (p=0.595 n=5+5) GCD10x100/WithoutXY-4 1.87µs ± 2% 0.56µs ± 4% -70.13% (p=0.008 n=5+5) GCD10x100/WithXY-4 2.61µs ± 2% 2.65µs ± 4% ~ (p=0.635 n=5+5) GCD10x1000/WithoutXY-4 2.75µs ± 2% 1.48µs ± 1% -46.06% (p=0.008 n=5+5) GCD10x1000/WithXY-4 5.29µs ± 2% 5.25µs ± 2% ~ (p=0.548 n=5+5) GCD10x10000/WithoutXY-4 10.7µs ± 2% 10.3µs ± 0% -4.38% (p=0.008 n=5+5) GCD10x10000/WithXY-4 22.3µs ± 6% 22.1µs ± 1% ~ (p=1.000 n=5+5) GCD10x100000/WithoutXY-4 93.7µs ± 2% 99.4µs ± 2% +6.09% (p=0.008 n=5+5) GCD10x100000/WithXY-4 196µs ± 2% 199µs ± 2% ~ (p=0.222 n=5+5) GCD100x100/WithoutXY-4 10.1µs ± 2% 2.5µs ± 2% -74.84% (p=0.008 n=5+5) GCD100x100/WithXY-4 21.4µs ± 2% 21.3µs ± 7% ~ (p=0.548 n=5+5) GCD100x1000/WithoutXY-4 11.3µs ± 2% 4.4µs ± 4% -60.87% (p=0.008 n=5+5) GCD100x1000/WithXY-4 24.7µs ± 3% 23.9µs ± 1% ~ (p=0.056 n=5+5) GCD100x10000/WithoutXY-4 26.6µs ± 1% 20.0µs ± 2% -24.82% (p=0.008 n=5+5) GCD100x10000/WithXY-4 78.7µs ± 2% 78.2µs ± 2% ~ (p=0.690 n=5+5) GCD100x100000/WithoutXY-4 174µs ± 2% 171µs ± 1% ~ (p=0.056 n=5+5) GCD100x100000/WithXY-4 563µs ± 4% 561µs ± 2% ~ (p=1.000 n=5+5) GCD1000x1000/WithoutXY-4 120µs ± 5% 29µs ± 3% -75.71% (p=0.008 n=5+5) GCD1000x1000/WithXY-4 355µs ± 4% 358µs ± 2% ~ (p=0.841 n=5+5) GCD1000x10000/WithoutXY-4 140µs ± 2% 49µs ± 2% -65.07% (p=0.008 n=5+5) GCD1000x10000/WithXY-4 626µs ± 3% 628µs ± 9% ~ (p=0.690 n=5+5) GCD1000x100000/WithoutXY-4 340µs ± 4% 259µs ± 6% -23.79% (p=0.008 n=5+5) GCD1000x100000/WithXY-4 3.76ms ± 4% 3.82ms ± 5% ~ (p=0.310 n=5+5) GCD10000x10000/WithoutXY-4 3.11ms ± 3% 0.54ms ± 2% -82.74% (p=0.008 n=5+5) GCD10000x10000/WithXY-4 7.96ms ± 3% 7.69ms ± 3% ~ (p=0.151 n=5+5) GCD10000x100000/WithoutXY-4 3.88ms ± 1% 1.27ms ± 2% -67.21% (p=0.008 n=5+5) GCD10000x100000/WithXY-4 38.1ms ± 2% 38.8ms ± 1% ~ (p=0.095 n=5+5) GCD100000x100000/WithoutXY-4 208ms ± 1% 25ms ± 4% -88.07% (p=0.008 n=5+5) GCD100000x100000/WithXY-4 533ms ± 5% 525ms ± 4% ~ (p=0.548 n=5+5) Change-Id: Ic1e007eb807b93e75f4752e968e98c1f0cb90e43 Reviewed-on: https://go-review.googlesource.com/59450 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-10-24 22:42:43 +00:00
Mark Pulford	a5c44f3e3f	math: add RoundToEven function Rounding ties to even is statistically useful for some applications. This implementation completes IEEE float64 rounding mode support (in addition to Round, Ceil, Floor, Trunc). This function avoids subtle faults found in ad-hoc implementations, and is simple enough to be inlined by the compiler. Fixes #21748 Change-Id: I09415df2e42435f9e7dabe3bdc0148e9b9ebd609 Reviewed-on: https://go-review.googlesource.com/61211 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-10-24 22:33:09 +00:00
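The difference between the two tie-breaking rules mentioned in this commit can be seen on a few halfway values. This is a small sketch assuming a Go release that provides both `math.Round` and `math.RoundToEven`; `roundBoth` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// roundBoth is a hypothetical helper that returns x rounded with
// ties away from zero (math.Round) and ties to even (math.RoundToEven).
func roundBoth(x float64) (away, even float64) {
	return math.Round(x), math.RoundToEven(x)
}

func main() {
	for _, x := range []float64{0.5, 1.5, 2.5, -2.5, 2.4} {
		away, even := roundBoth(x)
		fmt.Printf("x=%v Round=%v RoundToEven=%v\n", x, away, even)
	}
}
```

The two functions differ only on exact ties such as 0.5 and 2.5; for 1.5 and 2.4 they agree.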
Matthew Dempsky	a5868a47c6	cmd/internal/obj/x86: move MOV->XOR rewriting into compiler Fixes #20986. Change-Id: Ic3cf5c0ab260f259ecff7b92cfdf5f4ae432aef3 Reviewed-on: https://go-review.googlesource.com/73072 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2017-10-24 21:32:17 +00:00
Filippo Valsorda	f75158c365	math/big: fix ModSqrt optimized path for x = z name old time/op new time/op delta ModSqrt224_3Mod4-4 153µs ± 2% 154µs ± 1% ~ (p=0.548 n=5+5) ModSqrt5430_3Mod4-4 776ms ± 2% 791ms ± 2% ~ (p=0.222 n=5+5) Fixes #22265 Change-Id: If233542716e04341990a45a1c2b7118da6d233f7 Reviewed-on: https://go-review.googlesource.com/70832 Run-TryBot: Filippo Valsorda <hi@filippo.io> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-10-16 21:41:44 +00:00
griesemer	51cfe6849a	math/big: provide support for conversion bases up to 62 Increase MaxBase from 36 to 62 and extend the conversion alphabet with the upper-case letters 'A' to 'Z'. For int conversions with bases <= 36, the letters 'A' to 'Z' have the same values (10 to 35) as the corresponding lower-case letters. For conversion bases > 36 up to 62, the upper-case letters have the values 36 to 61. Added MaxBase to api/except.txt: Clients should not make assumptions about the value of MaxBase being constant. The core of the change is in natconv.go. The remaining changes are adjusted tests and documentation. Fixes #21558. Change-Id: I5f74da633caafca03993e13f32ac9546c572cc84 Reviewed-on: https://go-review.googlesource.com/65970 Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2017-10-06 17:46:15 +00:00
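The digit-value rules described in this commit can be illustrated with `(*big.Int).SetString` and `(*big.Int).Text`. The helpers `digitValue` and `encode` below are hypothetical names used only for this sketch.

```go
package main

import (
	"fmt"
	"math/big"
)

// digitValue is a hypothetical helper: it parses s in the given base
// and returns the value as an int64, or false if parsing fails.
func digitValue(s string, base int) (int64, bool) {
	z, ok := new(big.Int).SetString(s, base)
	if !ok {
		return 0, false
	}
	return z.Int64(), true
}

// encode is a hypothetical helper: it formats v in the given base.
func encode(v int64, base int) string {
	return big.NewInt(v).Text(base)
}

func main() {
	v36, _ := digitValue("Z", 36) // upper case equals lower case for bases <= 36
	v62, _ := digitValue("Z", 62) // upper case is a distinct digit for bases > 36
	fmt.Println(v36, v62)         // 35 61
	fmt.Println(encode(61, 62))   // Z
}
```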
griesemer	2ddd07138d	math/bits: complete examples Change-Id: Icbe6885ffd3aa4e77441ab03a2b9a04a9276d5eb Reviewed-on: https://go-review.googlesource.com/68311 Reviewed-by: Martin Möhrmann <moehrmann@google.com>	2017-10-06 16:58:03 +00:00
Marvin Stenger	90d71fe99e	all: revert "all: prefer strings.IndexByte over strings.Index" This reverts https://golang.org/cl/65930. Fixes #22148 Change-Id: Ie0712621ed89c43bef94417fc32de9af77607760 Reviewed-on: https://go-review.googlesource.com/68430 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-10-05 23:19:10 +00:00
Marvin Stenger	aa00c607e1	math/big: remove []byte/string conversions This removes some of the []byte/string conversions currently existing in the (un)marshaling methods of Int and Rat. For Int we introduce a new function (*Int).setFromScanner() that essentially implements the SetString method but is given an io.ByteScanner instead of a string. So we can now handle the string case in (*Int).SetString with a strings.Reader and the []byte case in (*Int).UnmarshalText() with a bytes.Reader, avoiding the []byte/string conversion here. For Rat we introduce a new function (*Rat).marshal() that essentially implements the String method but outputs []byte instead of string. Using this new function and the same formatting rules as in (*Rat).RatString we can implement (*Rat).MarshalText() without the []byte/string conversion it used to have. Change-Id: Ic5ef246c1582c428a40f214b95a16671ef0a06d9 Reviewed-on: https://go-review.googlesource.com/65950 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-10-04 17:16:52 +00:00
Marvin Stenger	f22ba1f247	all: prefer strings.IndexByte over strings.Index strings.IndexByte was introduced in go1.2 and it can be used effectively wherever the second argument to strings.Index is exactly one byte long. This avoids generating unnecessary string symbols and saves a few calls to strings.Index. Change-Id: I1ab5edb7c4ee9058084cfa57cbcc267c2597e793 Reviewed-on: https://go-review.googlesource.com/65930 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-09-25 17:35:41 +00:00
Marvin Stenger	7a5d76fa62	math/big: delete solved TODO The TODO is no longer needed as it was solved by a previous CL. See https://go-review.googlesource.com/14995. Change-Id: If62d1b296f35758ad3d18d28c8fbb95e797f4464 Reviewed-on: https://go-review.googlesource.com/65232 Reviewed-by: Robert Griesemer <gri@golang.org>	2017-09-22 10:21:46 +00:00
Agniva De Sarker	d2f317218b	math: implement fast path for Exp - using FMA and AVX instructions if available to speed-up Exp calculation on amd64 - using a data table instead of #define'ed constants because these instructions do not support loading floating point immediates. One has to use a memory operand / register. - Benchmark results on Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz: Original vs New (non-FMA path) name old time/op new time/op delta Exp 16.0ns ± 1% 16.1ns ± 3% ~ (p=0.308 n=9+10) Original vs New (FMA path) name old time/op new time/op delta Exp 16.0ns ± 1% 13.7ns ± 2% -14.80% (p=0.000 n=9+10) Change-Id: I3d8986925d82b39b95ee979ae06f59d7e591d02e Reviewed-on: https://go-review.googlesource.com/62590 Reviewed-by: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-09-20 21:43:00 +00:00
Burak Guven	9cc170f9a5	math/rand: fix comment for Shuffle Shuffle panics if n < 0, not n <= 0. The comment for the (*Rand).Shuffle function is already accurate. Change-Id: I073049310bca9632e50e9ca3ff79eec402122793 Reviewed-on: https://go-review.googlesource.com/63750 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-09-14 03:41:35 +00:00
Michael Munday	ffb4708d1b	math: fix Abs, Copysign and Signbit benchmarks CL 62250 makes constant folding a bit more aggressive and these benchmarks were optimized away. This CL adds some indirection to the function arguments to stop them being folded. The Copysign benchmark is a bit faster because I've left one argument as a constant and it can be partially folded. old CL 62250 this CL Copysign 1.24ns ± 0% 0.34ns ± 2% 1.02ns ± 2% Abs 0.67ns ± 0% 0.35ns ± 3% 0.67ns ± 0% Signbit 0.87ns ± 0% 0.35ns ± 2% 0.87ns ± 1% Change-Id: I9604465a87d7aa29f4bd6009839c8ee354be3cd7 Reviewed-on: https://go-review.googlesource.com/62450 Run-TryBot: Michael Munday <mike.munday@ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-09-09 16:52:16 +00:00
Ian Lance Taylor	44f7fd030f	math/rand: change http to https in comment Change-Id: I19c1b0e1b238dda82e69bd47459528ed06b55840 Reviewed-on: https://go-review.googlesource.com/62310 Reviewed-by: David Crawshaw <crawshaw@golang.org>	2017-09-08 22:11:55 +00:00
Josh Bleecher Snyder	caae0917bf	math/rand: make Perm match Shuffle Perm and Shuffle are fundamentally doing the same work. This change makes Perm's algorithm match Shuffle's. In addition to allowing developers to switch more easily between the two methods, it affords a nice speed-up: name old time/op new time/op delta Perm3-8 75.7ns ± 1% 51.8ns ± 1% -31.59% (p=0.000 n=9+8) Perm30-8 610ns ± 1% 405ns ± 1% -33.67% (p=0.000 n=9+9) This change alters the output from Perm, given the same Source and seed. This is a change from Go 1.0 behavior. This necessitates updating the regression test. This also changes the number of calls made to the Source during Perm, which changes the output of the math/rand examples. This also slightly perturbs the output of Perm, nudging it out of the range currently accepted by TestUniformFactorial. However, it is completely unclear that the helpers relied on by TestUniformFactorial are correct. That is #21211. This change updates checkSimilarDistribution to respect closeEnough for standard deviations, which makes the test pass. The whole situation is muddy; see #21211 for details. There is an alternative implementation of Perm that avoids initializing m, which is more similar to the existing implementation, plus some optimizations: func (r *Rand) Perm(n int) []int { m := make([]int, n) max31 := n if n > 1<<31-1-1 { max31 = 1<<31 - 1 - 1 } i := 1 for ; i < max31; i++ { j := r.int31n(int32(i + 1)) m[i] = m[j] m[j] = i } for ; i < n; i++ { j := r.Int63n(int64(i + 1)) m[i] = m[j] m[j] = i } return m } This is a tiny bit faster than the implementation actually used in this change: name old time/op new time/op delta Perm3-8 51.8ns ± 1% 50.3ns ± 1% -2.83% (p=0.000 n=8+9) Perm30-8 405ns ± 1% 394ns ± 1% -2.66% (p=0.000 n=9+8) However, 3% in performance doesn't seem worth having the two algorithms diverge, nor the reduced readability of this alternative. Updates #16213. Change-Id: I11a7441ff8837ee9c241b4c88f7aa905348be781 Reviewed-on: https://go-review.googlesource.com/55972 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rob Pike <r@golang.org>	2017-09-08 13:26:20 +00:00
Josh Bleecher Snyder	a2dfe5d278	math/rand: add Shuffle Shuffle uses the Fisher-Yates algorithm. Since this is new API, it affords us the opportunity to use a much faster Int31n implementation that mostly avoids division. As a result, BenchmarkPerm30ViaShuffle is about 30% faster than BenchmarkPerm30, despite requiring a separate initialization loop and using function calls to swap elements. Fixes #20480 Updates #16213 Updates #21211 Change-Id: Ib8956c4bebed9d84f193eb98282ec16ee7c2b2d5 Reviewed-on: https://go-review.googlesource.com/51891 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-09-08 13:03:02 +00:00
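The "separate initialization loop plus swap callback" pattern that this commit message compares against Perm might look like the sketch below. The helper names `shuffledRange` and `isPermutation` and the seed value are assumptions made for illustration; this is not the benchmark code from the CL.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffledRange is a hypothetical helper: it fills a slice with 0..n-1
// (the separate initialization loop) and then shuffles it in place
// through the swap callback passed to (*rand.Rand).Shuffle.
func shuffledRange(seed int64, n int) []int {
	r := rand.New(rand.NewSource(seed))
	p := make([]int, n)
	for i := range p {
		p[i] = i
	}
	r.Shuffle(n, func(i, j int) { p[i], p[j] = p[j], p[i] })
	return p
}

// isPermutation reports whether p contains each of 0..len(p)-1 exactly once.
func isPermutation(p []int) bool {
	seen := make([]bool, len(p))
	for _, v := range p {
		if v < 0 || v >= len(p) || seen[v] {
			return false
		}
		seen[v] = true
	}
	return true
}

func main() {
	p := shuffledRange(1, 10)
	fmt.Println(p, isPermutation(p))
}
```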
Mark Pulford	03c3bb5f84	math: Add Round function (ties away from zero) This function avoids subtle faults found in many ad-hoc implementations, and is simple enough to be inlined by the compiler. Fixes #20100 Change-Id: Ib320254e9b1f1f798c6ef906b116f63bc29e8d08 Reviewed-on: https://go-review.googlesource.com/43652 Reviewed-by: Robert Griesemer <gri@golang.org>	2017-09-02 21:00:08 +00:00
griesemer	f7cb5bca1a	math/big: fix internal comment Change-Id: Id003e2dbecad7b3c249a747f8b4032135dfbe34f Reviewed-on: https://go-review.googlesource.com/60670 Reviewed-by: Marcel van Lohuizen <mpvl@golang.org>	2017-08-31 13:05:11 +00:00
jaredculp	dc42ffff59	math: add examples for trig functions Change-Id: Ic3ce2f3c055f2636ec8fc9cec8592e596b18dc05 Reviewed-on: https://go-review.googlesource.com/54771 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-08-25 20:26:19 +00:00
Keith Randall	fb05948d9e	cmd/compile,math: improve code generation for math.Abs Implement int reg <-> fp reg moves on amd64. If we see a load to int reg followed by an int->fp move, then we can just load to the fp reg instead. Same for stores. math.Abs is now: MOVQ "".x+8(SP), AX SHLQ $1, AX SHRQ $1, AX MOVQ AX, "".~r1+16(SP) math.Copysign is now: MOVQ "".x+8(SP), AX SHLQ $1, AX SHRQ $1, AX MOVQ "".y+16(SP), CX SHRQ $63, CX SHLQ $63, CX ORQ CX, AX MOVQ AX, "".~r2+24(SP) math.Float64bits is now: MOVSD "".x+8(SP), X0 MOVSD X0, "".~r1+16(SP) (it would be nicer to use a non-SSE reg for this, nothing is perfect) And due to the fix for #21440, the inlined version of these improve as well. name old time/op new time/op delta Abs 1.38ns ± 5% 0.89ns ±10% -35.54% (p=0.000 n=10+10) Copysign 1.56ns ± 7% 1.35ns ± 6% -13.77% (p=0.000 n=9+10) Fixes #13095 Change-Id: Ibd7f2792412a6668608780b0688a77062e1f1499 Reviewed-on: https://go-review.googlesource.com/58732 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>	2017-08-25 19:15:01 +00:00
Agniva De Sarker	ea5e3bd2a1	all: fix easy-to-miss typos Using the wonderful https://github.com/client9/misspell tool. Change-Id: Icdbc75a5559854f4a7a61b5271bcc7e3f99a1a24 Reviewed-on: https://go-review.googlesource.com/57851 Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com> Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-08-23 03:07:12 +00:00
Lakshay Garg	4c0bba158e	math: implement the erfcinv function Fixes: #6359 Change-Id: I6c697befd681a253e73a7091faa9f20ff3791201 Reviewed-on: https://go-review.googlesource.com/57090 Reviewed-by: Robert Griesemer <gri@golang.org>	2017-08-22 13:13:20 +00:00
Brian Kessler	edaa0ffadb	math/big: use internal sqr on nats Replace z.mul(x, x) calls on nats in internal code with z.sqr(x) that employs optimized squaring routines. Benchmark results: Exp-4 12.9ms ± 2% 12.8ms ± 3% ~ (p=0.165 n=10+10) Exp2-4 13.0ms ± 4% 12.8ms ± 2% -2.14% (p=0.015 n=8+9) ModSqrt225_Tonelli-4 987µs ± 4% 989µs ± 2% ~ (p=0.673 n=8+9) ModSqrt224_3Mod4-4 300µs ± 2% 301µs ± 3% ~ (p=0.546 n=9+9) ModSqrt5430_Tonelli-4 4.88s ± 6% 4.82s ± 5% ~ (p=0.247 n=10+10) ModSqrt5430_3Mod4-4 1.62s ±10% 1.57s ± 1% ~ (p=0.094 n=9+9) Exp3Power/0x10-4 496ns ± 7% 426ns ± 7% -14.21% (p=0.000 n=10+10) Exp3Power/0x40-4 575ns ± 5% 470ns ± 7% -18.20% (p=0.000 n=9+10) Exp3Power/0x100-4 929ns ±19% 770ns ±10% -17.13% (p=0.000 n=10+10) Exp3Power/0x400-4 1.96µs ± 7% 1.79µs ± 5% -8.68% (p=0.000 n=10+10) Exp3Power/0x1000-4 10.9µs ± 9% 7.9µs ± 5% -28.02% (p=0.000 n=10+10) Exp3Power/0x4000-4 86.8µs ± 8% 67.3µs ± 8% -22.41% (p=0.000 n=10+10) Exp3Power/0x10000-4 750µs ± 8% 731µs ± 1% ~ (p=0.074 n=9+8) Exp3Power/0x40000-4 7.07ms ± 7% 7.05ms ± 4% ~ (p=0.931 n=9+9) Exp3Power/0x100000-4 64.7ms ± 2% 65.6ms ± 6% ~ (p=0.661 n=9+10) Exp3Power/0x400000-4 577ms ± 2% 580ms ± 3% ~ (p=0.931 n=9+9) ProbablyPrime/n=0-4 9.08ms ±17% 9.09ms ±16% ~ (p=0.447 n=9+10) ProbablyPrime/n=1-4 10.8ms ± 4% 10.7ms ± 2% ~ (p=0.243 n=10+9) ProbablyPrime/n=5-4 18.5ms ± 3% 18.5ms ± 1% ~ (p=0.863 n=9+9) ProbablyPrime/n=10-4 28.6ms ± 6% 28.2ms ± 1% ~ (p=0.050 n=9+9) ProbablyPrime/n=20-4 48.4ms ± 4% 48.4ms ± 2% ~ (p=0.739 n=10+10) ProbablyPrime/Lucas-4 6.75ms ± 4% 6.75ms ± 2% ~ (p=0.963 n=9+8) ProbablyPrime/MillerRabinBase2-4 2.00ms ± 5% 2.00ms ± 7% ~ (p=0.931 n=9+9) Change-Id: Ibe9f58d11dbad25eb369faedf480b666a0250a6b Reviewed-on: https://go-review.googlesource.com/56773 Reviewed-by: Robert Griesemer <gri@golang.org>	2017-08-22 08:04:44 +00:00
Lakshay Garg	77412b9300	math: implement the erfinv function This commit defines the inverse of error function (erfinv) in the math package. The function is based on the rational approximation of percentage points of normal distribution available at https://www.jstor.org/stable/pdf/2347330.pdf. Fixes #6359 Change-Id: Icfe4508f623e0574c7fffdbf7aa929540fd4c944 Reviewed-on: https://go-review.googlesource.com/46990 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-08-18 13:30:46 +00:00
Brian Kessler	497f891fce	math/big: recognize squaring for Floats Updates #13745 Recognize z.Mul(x, x) as squaring for Floats and use the internal z.sqr(x) method for nat on the mantissa. Change-Id: I0f792157bad93a13cae1aecc4c10bd20c6397693 Reviewed-on: https://go-review.googlesource.com/56774 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-08-18 12:48:10 +00:00
Brian Kessler	fe08ebaebb	math/big: use internal square for Rat updates #13745 A squared rational is always positive and can not be reduced since the numerator and denominator had no previous common factors. The nat multiplication can be performed using the internal sqr method. Change-Id: I558f5b38e379bfd26ff163c9489006d7e5a9cfaa Reviewed-on: https://go-review.googlesource.com/56776 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-08-18 12:47:46 +00:00
Daniel Martí	59413d34c9	all: unindent some big chunks of code Found with mvdan.cc/unindent. Prioritized the ones with the biggest wins for now. Change-Id: I2b032e45cdd559fc9ed5b1ee4c4de42c4c92e07b Reviewed-on: https://go-review.googlesource.com/56470 Run-TryBot: Daniel Martí <mvdan@mvdan.cc> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-08-18 06:59:48 +00:00
crvv	d46953c9f6	math: fix inaccurate result of Exp(1) The existing implementation is translated from C, which uses a polynomial coefficient very close to 1/6. If the function uses 1/6 as this coefficient, the result of Exp(1) will be more accurate. And this change doesn't introduce more error to the Exp function. Fixes #20319 Change-Id: I94c236a18cf95570ebb69f7fb99884b0d7cf5f6e Reviewed-on: https://go-review.googlesource.com/49294 Reviewed-by: Robert Griesemer <gri@golang.org>	2017-08-17 09:01:27 +00:00
Brian Kessler	25b040c287	math/big: recognize z.Mul(x, x) as squaring of x updates #13745 Multiprecision squaring can be done in a straightforward manner with about half the multiplications of a basic multiplication due to the symmetry of the operands. This change implements basic squaring for nat types and uses it for Int multiplication when the same variable is supplied to both arguments of z.Mul(x, x). This has some overhead to allocate a temporary variable to hold the cross products, shift them to double and add them to the diagonal terms. There is a speed benefit in the intermediate range when the overhead is negligible and the asymptotic performance of karatsuba multiplication has not been reached. The thresholds basicSqrThreshold = 20 and karatsubaSqrThreshold = 400 were set by running calibrate_test.go to measure timing differences between the algorithms. Benchmarks for squaring: name old time/op new time/op delta IntSqr/1-4 51.5ns ±25% 25.1ns ± 7% -51.38% (p=0.008 n=5+5) IntSqr/2-4 79.1ns ± 4% 72.4ns ± 2% -8.47% (p=0.008 n=5+5) IntSqr/3-4 102ns ± 4% 97ns ± 5% ~ (p=0.056 n=5+5) IntSqr/5-4 161ns ± 4% 163ns ± 7% ~ (p=0.952 n=5+5) IntSqr/8-4 277ns ± 5% 267ns ± 6% ~ (p=0.087 n=5+5) IntSqr/10-4 358ns ± 3% 360ns ± 4% ~ (p=0.730 n=5+5) IntSqr/20-4 1.07µs ± 3% 1.01µs ± 6% ~ (p=0.056 n=5+5) IntSqr/30-4 2.36µs ± 4% 1.72µs ± 2% -27.03% (p=0.008 n=5+5) IntSqr/50-4 5.19µs ± 3% 3.88µs ± 4% -25.37% (p=0.008 n=5+5) IntSqr/80-4 11.3µs ± 4% 8.6µs ± 3% -23.78% (p=0.008 n=5+5) IntSqr/100-4 16.2µs ± 4% 12.8µs ± 3% -21.49% (p=0.008 n=5+5) IntSqr/200-4 50.1µs ± 5% 44.7µs ± 3% -10.65% (p=0.008 n=5+5) IntSqr/300-4 105µs ±11% 95µs ± 3% -9.50% (p=0.008 n=5+5) IntSqr/500-4 231µs ± 5% 227µs ± 2% ~ (p=0.310 n=5+5) IntSqr/800-4 496µs ± 9% 459µs ± 3% -7.40% (p=0.016 n=5+5) IntSqr/1000-4 700µs ± 3% 710µs ± 5% ~ (p=0.841 n=5+5) The benchmarks show a speedup of 10-25% in the range where basicSqr is optimal, improved single word squaring and no significant difference when the fallback to standard multiplication is used. Change-Id: Iae2c82ca91cf890823f91e5c83bbe9a2c534b72b Reviewed-on: https://go-review.googlesource.com/53638 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-08-16 10:07:47 +00:00
Brian Kessler	53836a74f8	math/big: speed up GCD x, y calculation The current implementation of the extended Euclidean GCD algorithm calculates both cosequences x and y inside the division loop. This is unnecessary since the second Bezout coefficient can be obtained at the end of calculation via a multiplication, subtraction and a division. In case only one coefficient is needed, e.g. ModInverse this calculation can be skipped entirely. This is a standard optimization, see e.g. "Handbook of Elliptic and Hyperelliptic Curve Cryptography" Cohen et al pp 191 Available at: http://cs.ucsb.edu/~koc/ccs130h/2013/EllipticHyperelliptic-CohenFrey.pdf Updates #15833 Change-Id: I1e0d2e63567cfed97fd955048fe6373d36f22757 Reviewed-on: https://go-review.googlesource.com/50530 Reviewed-by: Robert Griesemer <gri@golang.org>	2017-08-16 09:13:12 +00:00
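The "multiplication, subtraction and a division" step mentioned in this commit follows from d = a*x + b*y, which gives y = (d - a*x)/b. The sketch below illustrates that identity through the public `(*big.Int).GCD` API; it is not the internal implementation, and `bezout` and `bezoutHolds` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math/big"
)

// bezout is a hypothetical helper. It asks GCD for d and the first
// coefficient x only, then derives the second coefficient as
// y = (d - a*x) / b. Assumption: a > 0 and b > 0.
func bezout(a, b int64) (d, x, y *big.Int) {
	A, B := big.NewInt(a), big.NewInt(b)
	x = new(big.Int)
	d = new(big.Int).GCD(x, nil, A, B) // y is not requested here
	y = new(big.Int).Mul(A, x)         // a*x
	y.Sub(d, y)                        // d - a*x
	y.Quo(y, B)                        // exact division by b
	return d, x, y
}

// bezoutHolds checks that a*x + b*y == d for the derived coefficients.
func bezoutHolds(a, b int64) bool {
	d, x, y := bezout(a, b)
	lhs := new(big.Int).Mul(big.NewInt(a), x)
	lhs.Add(lhs, new(big.Int).Mul(big.NewInt(b), y))
	return lhs.Cmp(d) == 0
}

func main() {
	d, x, y := bezout(240, 46)
	fmt.Println(d, x, y, bezoutHolds(240, 46))
}
```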
Brian Kessler	1246566142	math: eliminate overflow in Pow(x,y) for large y The current implementation uses a shift and add loop to compute the product of x's exponent xe and the integer part of y (yi) for yi up to 1<<63. Since xe is an 11-bit exponent, this product can be up to 74-bits and overflow both 32 and 64-bit int. This change checks whether the accumulated exponent will fit in the 11-bit float exponent of the output and breaks out of the loop early if overflow is detected. The current handling of yi >= 1<<63 uses Exp(y * Log(x)) which incorrectly returns NaN for x<0. In addition, for y this large, Exp(y * Log(x)) can be enumerated to only overflow except when x == -1 since the boundary cases computed exactly: Pow(NextAfter(1.0, Inf(1)), 1<<63) == 2.72332... * 10^889 Pow(NextAfter(1.0, Inf(-1)), 1<<63) == 1.91624... * 10^-445 exceed the range of float64. So, the call can be replaced with a simple case statement analogous to y == Inf that correctly handles x < 0 as well. Fixes #7394 Change-Id: I6f50dc951f3693697f9669697599860604323102 Reviewed-on: https://go-review.googlesource.com/48290 Reviewed-by: Robert Griesemer <gri@golang.org>	2017-08-16 09:10:10 +00:00
Carlos Eduardo Seo	3cb41be817	math/big: improve performance for AddMulVVW and mulAddVWW for ppc64x This change adds a better implementation in asm for AddMulVVW and mulAddVWW for ppc64x, with speedups up to 1.54x. benchmark old ns/op new ns/op delta BenchmarkAddMulVVW/1-8 6.58 6.29 -4.41% BenchmarkAddMulVVW/2-8 7.43 7.25 -2.42% BenchmarkAddMulVVW/3-8 8.95 8.15 -8.94% BenchmarkAddMulVVW/4-8 10.1 9.37 -7.23% BenchmarkAddMulVVW/5-8 12.0 10.7 -10.83% BenchmarkAddMulVVW/10-8 22.1 20.1 -9.05% BenchmarkAddMulVVW/100-8 211 154 -27.01% BenchmarkAddMulVVW/1000-8 2046 1450 -29.13% BenchmarkAddMulVVW/10000-8 20407 14793 -27.51% BenchmarkAddMulVVW/100000-8 223857 145548 -34.98% benchmark old MB/s new MB/s speedup BenchmarkAddMulVVW/1-8 9719.88 10175.79 1.05x BenchmarkAddMulVVW/2-8 17233.97 17657.54 1.02x BenchmarkAddMulVVW/3-8 21446.05 23550.49 1.10x BenchmarkAddMulVVW/4-8 25375.70 27334.33 1.08x BenchmarkAddMulVVW/5-8 26650.52 30029.34 1.13x BenchmarkAddMulVVW/10-8 28984.29 31833.68 1.10x BenchmarkAddMulVVW/100-8 30249.41 41531.69 1.37x BenchmarkAddMulVVW/1000-8 31273.35 44108.54 1.41x BenchmarkAddMulVVW/10000-8 31360.47 43263.54 1.38x BenchmarkAddMulVVW/100000-8 28589.58 43971.66 1.54x Change-Id: I8a8105d4da3592afdef3125757a99f378a0254bb Reviewed-on: https://go-review.googlesource.com/53931 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2017-08-11 13:59:52 +00:00
romanyx	92cfd07a6c	math/bits: examples generator Change-Id: Icdd0566d3b7dbc034256e16f8a6b6f1af07069b3 Reviewed-on: https://go-review.googlesource.com/54350 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-08-11 11:05:01 +00:00
Brian Kessler	9c7bf0807a	math/big: avoid unneeded sticky bit calculations As noted in the TODO comment, the sticky bit is only used when the rounding bit is zero or the rounding mode is ToNearestEven. This change makes that check explicit and will eliminate half the sticky bit calculations on average when rounding mode is not ToNearestEven. Change-Id: Ia4709f08f46e682bf97dabe5eb2a10e8e3d7af43 Reviewed-on: https://go-review.googlesource.com/54111 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org>	2017-08-11 09:52:30 +00:00
Wembley G. Leach, Jr	762a0bae06	math/bits: Add examples for Reverse functions Change-Id: I30563d31f6acea594cc853cc6b672ec664f90d48 Reviewed-on: https://go-review.googlesource.com/53636 Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-08-09 18:02:36 +00:00
Than McIntosh	ff560ee950	math: additional tests for Ldexp Add test cases to verify behavior for Ldexp with exponents outside the range of MinInt32/MaxInt32, for a gccgo bug. Test for issue #21323. Change-Id: Iea67bc6fcfafdfddf515cf7075bdac59360c277a Reviewed-on: https://go-review.googlesource.com/54230 Run-TryBot: Than McIntosh <thanm@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-08-09 15:33:37 +00:00
romanyx	fa155066c4	math/bits: some regular examples for functions Change-Id: Iee1b3e116b4dcc4071d6512abc5241eabedaeb5c Reviewed-on: https://go-review.googlesource.com/53850 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-08-09 13:25:29 +00:00
Josh Bleecher Snyder	6b53dd4f2b	math/rand: use t.Helper in tests Change-Id: Iece39e6412c0f6c63f563eed1621b8cca02de835 Reviewed-on: https://go-review.googlesource.com/51890 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Avelino <t@avelino.xxx> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-08-08 23:49:31 +00:00
Josh Bleecher Snyder	ca19f2fc78	math/rand: fix uniform distribution stddev in tests The standard deviation of a uniform distribution is size / √12. The size of the interval [0, 255] is 256, not 255. While we're here, simplify the expression. The tests previously passed only because the error margin was large enough. Sample observed standard deviations while running tests: 73.7893634666819 73.9221651548294 73.8077961697150 73.9084236069471 73.8968446814785 73.8684209136244 73.9774618960282 73.9523483202549 255 / √12 == 73.6121593216772 256 / √12 == 73.9008344562721 Change-Id: I7bc6cdc11e5d098951f2f2133036f62489275979 Reviewed-on: https://go-review.googlesource.com/51310 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-08-08 23:49:00 +00:00
Jelte Fennema	403ae5081a	math: change oeis.org urls to https Regular HTTP is insecure, oeis.org supports HTTPS and it is actually used in some other places in the codebase. This changes these final urls to use HTTPS. Change-Id: Ia46410a9c7ce67238a10cb6bfffaceca46112f58 Reviewed-on: https://go-review.googlesource.com/52072 Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>	2017-08-08 08:56:40 +00:00
Josh Bleecher Snyder	380525598c	all: remove some manual hyphenation Manual hyphenation doesn't work well when text gets reflown, for example by godoc. There are a few other manual hyphenations in the tree, but they are in local comments or comments for unexported functions. Change-Id: I17c9b1fee1def650da48903b3aae2fa1e1119a65 Reviewed-on: https://go-review.googlesource.com/53510 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-08-06 16:14:46 +00:00
Francesc Campoy Flores	3e3da54633	math/bits: fix example for OnesCount64 Erroneously called OnesCount instead of OnesCount64 Change-Id: Ie877e43f213253e45d31f64931c4a15915849586 Reviewed-on: https://go-review.googlesource.com/53410 Reviewed-by: Chris Broadfoot <cbro@golang.org>	2017-08-05 00:20:37 +00:00
Francesc Campoy	9b1e7cf2ac	math/bits: add examples for OnesCount functions Change-Id: Ie673f9665825a40281c2584d478ba1260f725856 Reviewed-on: https://go-review.googlesource.com/53357 Run-TryBot: Chris Broadfoot <cbro@golang.org> Reviewed-by: Chris Broadfoot <cbro@golang.org>	2017-08-04 23:24:07 +00:00
Dylan Waits	5f7b3fabe1	math/bits: add examples for leading zero methods Change-Id: Ib491d144387a7675af370f7b925fe6e62440d153 Reviewed-on: https://go-review.googlesource.com/48966 Run-TryBot: Kevin Burke <kev@inburke.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Kevin Burke <kev@inburke.com>	2017-07-15 21:55:58 +00:00
Aditya Mukerjee	a83d0175a8	math/rand: add concurrency warning to overview comment Change-Id: I52efa7aa72a23256e5ca56470ffeba975ed8f739 Reviewed-on: https://go-review.googlesource.com/48760 Reviewed-by: Bryan Mills <bcmills@google.com>	2017-07-15 20:34:17 +00:00
Martynas Budriūnas	41af3fa33e	math: add a Sqrt example Change-Id: I259e25b9d0b069912053a250e9739e04fafca54d Reviewed-on: https://go-review.googlesource.com/48892 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-07-15 20:12:22 +00:00
Ian Lance Taylor	dc6ae87c8c	math: clarify comment about bit-identical results across architectures Updates #18354. Change-Id: I76bc4a73d8dc99eeda14b395e451d75a65184191 Reviewed-on: https://go-review.googlesource.com/45013 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Rob Pike <r@golang.org>	2017-06-06 22:32:34 +00:00
gulyasm	a838191406	math: add doc note about floating point operation Go doesn't guarantee that the result of floating point operations will be the same on different architectures. This was not stated in the documentation, which can lead to confusion. Fixes #18354 Change-Id: Idb1b4c256fb9a7158a74256136eca3b8ce44476f Reviewed-on: https://go-review.googlesource.com/34938 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-06-06 20:20:41 +00:00
Alberto Donizetti	1948b7f806	math/big: fix Add, Sub when receiver aliases 2nd operand Fixes #20490 Change-Id: I9cfa604f9ff94df779cb9b4cbbd706258fc473ac Reviewed-on: https://go-review.googlesource.com/44150 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-05-31 10:28:05 +00:00
Martin Möhrmann	69972aea74	internal/cpu: new package to detect cpu features Implements detection of x86 cpu features that are used in the go standard library. Changes all standard library packages to use the new cpu package instead of using runtime internal variables to check x86 cpu features. Updates: #15403 Change-Id: I2999a10cb4d9ec4863ffbed72f4e021a1dbc4bb9 Reviewed-on: https://go-review.googlesource.com/41476 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-05-10 17:02:21 +00:00
Lynn Boger	8304d10763	cmd/compile: ppc64x intrinsics for math/bits This adds math/bits intrinsics for OnesCount, Len, TrailingZeros on ppc64x. benchmark old ns/op new ns/op delta BenchmarkLeadingZeros-16 4.26 1.71 -59.86% BenchmarkLeadingZeros16-16 3.04 1.83 -39.80% BenchmarkLeadingZeros32-16 3.31 1.82 -45.02% BenchmarkLeadingZeros64-16 3.69 1.71 -53.66% BenchmarkTrailingZeros-16 2.55 1.62 -36.47% BenchmarkTrailingZeros32-16 2.55 1.77 -30.59% BenchmarkTrailingZeros64-16 2.78 1.62 -41.73% BenchmarkOnesCount-16 3.19 0.93 -70.85% BenchmarkOnesCount32-16 2.55 1.18 -53.73% BenchmarkOnesCount64-16 3.22 0.93 -71.12% Update #18616 I also made a change to bits_test.go because when debugging some failures the output was not quite providing the right argument information. Change-Id: Ia58d31d1777cf4582a4505f85b11a1202ca07d3e Reviewed-on: https://go-review.googlesource.com/41630 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com> Reviewed-by: Keith Randall <khr@golang.org>	2017-05-10 12:10:56 +00:00
Bill O'Farrell	88672de7af	math: use SIMD to accelerate additional scalar math functions on s390x As necessary, math functions were structured to use stubs, so that they can be accelerated with assembly on any platform. Technique used was minimax polynomial approximation using tables of polynomial coefficients, with argument range reduction. Benchmark New Old Speedup BenchmarkAcos 12.2 47.5 3.89 BenchmarkAcosh 18.5 56.2 3.04 BenchmarkAsin 13.1 40.6 3.10 BenchmarkAsinh 19.4 62.8 3.24 BenchmarkAtan 10.1 23 2.28 BenchmarkAtanh 19.1 53.2 2.79 BenchmarkAtan2 16.5 33.9 2.05 BenchmarkCbrt 14.8 58 3.92 BenchmarkErf 10.8 20.1 1.86 BenchmarkErfc 11.2 23.5 2.10 BenchmarkExp 8.77 53.8 6.13 BenchmarkExpm1 10.1 38.3 3.79 BenchmarkLog 13.1 40.1 3.06 BenchmarkLog1p 12.7 38.3 3.02 BenchmarkPowInt 31.7 40.5 1.28 BenchmarkPowFrac 33.1 141 4.26 BenchmarkTan 11.5 30 2.61 Accuracy was tested against a high precision reference function to determine maximum error. Note: ulperr is error in "units in the last place" max ulperr Acos 1.15 Acosh 1.07 Asin 2.22 Asinh 1.72 Atan 1.41 Atanh 3.00 Atan2 1.45 Cbrt 1.18 Erf 1.29 Erfc 4.82 Exp 1.00 Expm1 2.26 Log 0.94 Log1p 2.39 Tan 3.14 Pow will have 99.99% correctly rounded results with reasonable inputs producing numeric (non Inf or NaN) results Change-Id: I850e8cf7b70426e8b54ec49d74acd4cddc8c6cb2 Reviewed-on: https://go-review.googlesource.com/38585 Reviewed-by: Michael Munday <munday@ca.ibm.com> Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-05-08 19:52:30 +00:00
Carlos Eduardo Seo	9459c03b29	math/big: improve performance for addVV/subVV for ppc64x This change adds a better asm implementation of addVV for ppc64x, with speedups up to nearly 3x in the best cases. benchmark old ns/op new ns/op delta BenchmarkAddVV/1-8 7.33 5.81 -20.74% BenchmarkAddVV/2-8 8.72 6.49 -25.57% BenchmarkAddVV/3-8 10.5 7.08 -32.57% BenchmarkAddVV/4-8 12.7 7.57 -40.39% BenchmarkAddVV/5-8 14.3 8.06 -43.64% BenchmarkAddVV/10-8 27.6 11.1 -59.78% BenchmarkAddVV/100-8 218 82.4 -62.20% BenchmarkAddVV/1000-8 2064 718 -65.21% BenchmarkAddVV/10000-8 20536 7153 -65.17% BenchmarkAddVV/100000-8 211004 72403 -65.69% benchmark old MB/s new MB/s speedup BenchmarkAddVV/1-8 8729.74 11006.26 1.26x BenchmarkAddVV/2-8 14683.65 19707.55 1.34x BenchmarkAddVV/3-8 18226.96 27103.63 1.49x BenchmarkAddVV/4-8 20204.50 33805.81 1.67x BenchmarkAddVV/5-8 22348.64 39694.06 1.78x BenchmarkAddVV/10-8 23212.74 57631.08 2.48x BenchmarkAddVV/100-8 29300.07 77629.53 2.65x BenchmarkAddVV/1000-8 31000.56 89094.54 2.87x BenchmarkAddVV/10000-8 31163.61 89469.16 2.87x BenchmarkAddVV/100000-8 30331.16 88393.73 2.91x It also adds the use of CTR for the loop counter in subVV, instead of manually updating the loop counter. This is slightly faster. Change-Id: Ic4b05cad384fd057972d46a5618ed5c3039d7460 Reviewed-on: https://go-review.googlesource.com/41010 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2017-04-25 13:15:39 +00:00
Ilya Tocar	bc6459ac6c	math: remove asm version of sincos everywhere, except 386 We have dedicated asm implementation of sincos only on 386 and amd64, on everything else we are just jumping to generic version. However amd64 version is actually slower than generic one: Sincos-6 34.4ns ± 0% 24.8ns ± 0% -27.79% (p=0.000 n=8+10) So remove all sincos*.s and keep only generic and 386. Updates #19819 Change-Id: I7eefab35743729578264f52f6d23ee2c227c92a5 Reviewed-on: https://go-review.googlesource.com/41200 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-04-24 15:09:18 +00:00
Michael Munday	eed6938cbb	cmd/asm, cmd/internal/obj/s390x, math: add LGDR and LDGR instructions The instructions allow moves between floating point and general purpose registers without any conversion taking place. Change-Id: I82c6f3ad9c841a83783b5be80dcf5cd538ff49e6 Reviewed-on: https://go-review.googlesource.com/38777 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-04-17 16:33:51 +00:00
Robert Griesemer	9d01def597	math/bits: support negative rotation count and remove RotateRight For details see the discussion on the issue below. RotateLeft functions can now be inlined because they don't panic anymore for negative rotation counts. name old time/op new time/op delta RotateLeft-8 6.72ns ± 2% 1.86ns ± 0% -72.33% (p=0.016 n=5+4) RotateLeft8-8 4.41ns ± 2% 1.67ns ± 1% -62.15% (p=0.008 n=5+5) RotateLeft16-8 4.46ns ± 6% 1.65ns ± 0% -63.06% (p=0.008 n=5+5) RotateLeft32-8 4.50ns ± 5% 1.67ns ± 1% -62.86% (p=0.008 n=5+5) RotateLeft64-8 4.54ns ± 1% 1.85ns ± 1% -59.32% (p=0.008 n=5+5) https://perf.golang.org/search?q=upload:20170411.4 (Measured on 2.3 GHz Intel Core i7 running macOS 10.12.3.) For #18616. Change-Id: I0828d80d54ec24f8d44954a57b3d6aeedb69c686 Reviewed-on: https://go-review.googlesource.com/40394 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-04-11 23:57:24 +00:00
Eric Lagergren	094498c9a1	all: fix minor misspellings Change-Id: I1f1cfb161640eb8756fb1a283892d06b30b7a8fa Reviewed-on: https://go-review.googlesource.com/39356 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-04-03 23:19:07 +00:00
Carlos Eduardo Seo	4a1140472b	math/big: Unify divWW implementation for ppc64 and ppc64le. Starting in go1.9, the minimum processor requirement for ppc64 is POWER8. So it may now use the same divWW implementation as ppc64le. Updates #19074 Change-Id: If1a85f175cda89eee06a1024ccd468da6124c844 Reviewed-on: https://go-review.googlesource.com/39010 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>	2017-03-31 14:05:12 +00:00
Ilya Tocar	4f579cc65b	math: speed up Log on amd64 After https://golang.org/cl/31490 we break false output dependency for CVTS.. in compiler generated code. I've looked through asm code, which uses CVTS.. and added XOR to the only case where it affected performance. Log-6 21.6ns ± 0% 19.9ns ± 0% -7.87% (p=0.000 n=10+10) Change-Id: I25d9b405e3041a3839b40f9f9a52e708034bb347 Reviewed-on: https://go-review.googlesource.com/38771 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2017-03-29 20:36:29 +00:00
Robert Griesemer	70ea0ec30f	math/big: replace local versions of bitLen, nlz with math/bits versions Verified that BenchmarkBitLen time went down from 2.25 ns/op to 0.65 ns/op on a 2.3 GHz Intel Core i7, before removing that benchmark (now covered by math/bits benchmarks). Change-Id: I3890bb7d1889e95b9a94bd68f0bdf06f1885adeb Reviewed-on: https://go-review.googlesource.com/38464 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-03-23 19:43:09 +00:00
Robert Griesemer	9ecfd177cf	math/big: fix TestFloatSetFloat64String A -0 constant is the same as 0. Use explicit negative zero for float64 -0.0. Also, fix two test cases that were wrong. Fixes #19673. Change-Id: Ic09775f29d9bc2ee7814172e59c4a693441ea730 Reviewed-on: https://go-review.googlesource.com/38463 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>	2017-03-23 17:17:16 +00:00
Josh Bleecher Snyder	2de773d45f	math/big: make nat.setUint64 vet-friendly nat.setUint64 is nicely generic. By assuming 32- or 64-bit words, however, we can write simpler code, and eliminate some shifts in dead code that vet complains about. Generated code for 64 bit systems is unaltered. Generated code for 32 bit systems is much better. For 386, the routine length drops from 325 bytes of code to 271 bytes of code, with fewer loops. Change-Id: I1bc14c06272dee37a7fcb48d33dd1e621eba945d Reviewed-on: https://go-review.googlesource.com/38070 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2017-03-11 00:39:23 +00:00
Eitan Adler	789c5255a4	all: remove the the duplicate words Change-Id: I6343c162e27e2e492547c96f1fc504909b1c03c0 Reviewed-on: https://go-review.googlesource.com/37793 Reviewed-by: Daniel Martí <mvdan@mvdan.cc> Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-03-06 04:39:12 +00:00
Robert Griesemer	32b41c8dc7	math/bits: move left-over functionality from bits_impl.go to bits.go Removes an extra function call for TrailingZeros and thus may increase chances for inlining. Change-Id: Iefd8d4402dc89b64baf4e5c865eb3dadade623af Reviewed-on: https://go-review.googlesource.com/37613 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-02-28 23:50:47 +00:00
Robert Griesemer	83bc4a2fee	math/bits: faster LeadingZeros and Len functions benchmark old ns/op new ns/op delta BenchmarkLeadingZeros-8 8.43 3.10 -63.23% BenchmarkLeadingZeros8-8 8.13 1.33 -83.64% BenchmarkLeadingZeros16-8 7.34 2.07 -71.80% BenchmarkLeadingZeros32-8 7.99 2.87 -64.08% BenchmarkLeadingZeros64-8 8.13 2.96 -63.59% Measured on 2.3 GHz Intel Core i7 running macOS 10.12.3. Change-Id: Id343531b408d42ac45f10c76f60e85bdb977f91e Reviewed-on: https://go-review.googlesource.com/37582 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-02-28 20:55:13 +00:00
Robert Griesemer	9515cb511a	math/bits: faster TrailingZeros8 For sizes > 8, the existing code is faster. benchmark old ns/op new ns/op delta BenchmarkTrailingZeros8-8 1.95 1.29 -33.85% Measured on 2.3 GHz Intel Core i7 running macOS 10.12.3. Change-Id: I6f3a33ec633a2c544ec29693c141f2f99335c745 Reviewed-on: https://go-review.googlesource.com/37581 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-02-28 20:55:01 +00:00
Robert Griesemer	d7a659b11b	math/bits: faster OnesCount using table lookups for sizes 8,16,32 For uint64, the existing algorithm is faster. benchmark old ns/op new ns/op delta BenchmarkOnesCount8-8 1.95 0.97 -50.26% BenchmarkOnesCount16-8 2.54 1.39 -45.28% BenchmarkOnesCount32-8 2.61 1.96 -24.90% Measured on 2.3 GHz Intel Core i7 running macOS 10.12.3. Change-Id: I6cc42882fef3d24694720464039161e339a9ae99 Reviewed-on: https://go-review.googlesource.com/37580 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-02-28 20:54:49 +00:00
Robert Griesemer	e18adbf88d	math/bits: faster Reverse8/16 functions using table lookups Measured on 2.3 GHz Intel Core i7, running macOS 10.12.3: benchmark old ns/op new ns/op delta BenchmarkReverse8-8 1.70 0.99 -41.76% BenchmarkReverse16-8 2.24 1.32 -41.07% Fixes #19279. Change-Id: I398cf8a3513b7fa63c130efc7846a7c5353999d4 Reviewed-on: https://go-review.googlesource.com/37459 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-02-25 22:18:58 +00:00
Robert Griesemer	ac91a514ff	math/bits: fix incorrect doc strings for TrailingZeros functions Change-Id: I3e40018ab1903d3b9ada7ad7812ba71ea2a428e7 Reviewed-on: https://go-review.googlesource.com/37456 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-02-25 00:58:25 +00:00
Robert Griesemer	322fff8ac8	math/big: use math/bits where appropriate This change adds math/bits as a new dependency of math/big. - use bits.LeadingZeros instead of local implementation (they are identical, so there's no performance loss here) - leave other functionality local (ntz, bitLen) since there are faster implementations in math/big at the moment Change-Id: I1218aa8a1df0cc9783583b090a4bb5a8a145c4a2 Reviewed-on: https://go-review.googlesource.com/37141 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-02-24 19:19:02 +00:00
Martin Möhrmann	8c6643846e	math: speed up and improve accuracy of Pow10 Removes init function from the math package. Allows stripping of arrays with pre-computed values used for Pow10 from binaries if Pow10 is not used. cmd/go shrinks by 128 bytes. Fixed small values like 10**-323 being 0 instead of 1e-323. Overall precision is increased but still not as good as predefined constants for some inputs. Samples: Pow10(208) before: 1.0000000000000006662e+208 after: 1.0000000000000000959e+208 Pow10(202) before 1.0000000000000009895e+202 after 1.0000000000000001193e+202 Pow10(60) before 1.0000000000000001278e+60 after 0.9999999999999999494e+60 Pow10(-100) before 0.99999999999999938551e-100 after 0.99999999999999989309e-100 Pow10(-200) before 0.9999999999999988218e-200 after 1.0000000000000001271e-200 name old time/op new time/op delta Pow10Pos-4 44.6ns ± 2% 1.2ns ± 1% -97.39% (p=0.000 n=19+17) Pow10Neg-4 50.8ns ± 1% 4.1ns ± 2% -92.02% (p=0.000 n=17+19) Change-Id: If094034286b8ac64be3a95fd9e8ffa3d4ad39b31 Reviewed-on: https://go-review.googlesource.com/36331 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-02-22 19:17:04 +00:00
Alexander Döring	ffb3b3698c	math: add more tests for special cases of Bessel functions Y0, Y1, Yn Test finite negative x with Y0(-1), Y1(-1), Yn(2,-1), Yn(-3,-1). Also test the special case Yn(0,0). Fixes #19130. Change-Id: I95f05a72e1c455ed8ddf202c56f4266f03f370fd Reviewed-on: https://go-review.googlesource.com/37310 Reviewed-by: Robert Griesemer <gri@golang.org>	2017-02-22 17:52:15 +00:00
Robert Griesemer	174058038c	math/big: define Word as uint instead of uintptr For compatibility with math/bits uint operations. When math/big was written originally, the Go compiler used 32bit int/uint values even on a 64bit machine. uintptr was the type that represented the machine register size. Now, the int/uint types are sized to the native machine register size, so they are the natural machine Word type. On most machines, the size of int/uint correspond to the size of uintptr. On platforms where uint and uintptr have different sizes, this change may lead to performance differences (e.g., amd64p32). Change-Id: Ief249c160b707b6441848f20041e32e9e9d8d8ca Reviewed-on: https://go-review.googlesource.com/37372 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-02-21 19:31:40 +00:00
Robert Griesemer	177dfba112	math/bits: faster OnesCount Using some additional suggestions per "Hacker's Delight". Added documentation and extra tests. Measured on 1.7 GHz Intel Core i7, running macOS 10.12.3. benchmark old ns/op new ns/op delta BenchmarkOnesCount-4 7.34 5.38 -26.70% BenchmarkOnesCount8-4 2.03 1.98 -2.46% BenchmarkOnesCount16-4 2.56 2.50 -2.34% BenchmarkOnesCount32-4 2.98 2.39 -19.80% BenchmarkOnesCount64-4 4.22 2.96 -29.86% Change-Id: I566b0ef766e55cf5776b1662b6016024ebe5d878 Reviewed-on: https://go-review.googlesource.com/37223 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-02-19 18:50:48 +00:00
Martin Möhrmann	6cfc3b25e9	math: protect benchmarked functions from being optimized away Add exported global variables and store the results of benchmarked functions in them. This prevents the current compiler optimizations from removing the instructions that are needed to compute the return values of the benchmarked functions. Change-Id: If8b08424e85f3796bb6dd73e761c653abbabcc5e Reviewed-on: https://go-review.googlesource.com/37195 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-02-18 17:00:59 +00:00
Robert Griesemer	a4a3d63dbe	math/bits: added benchmarks for Leading/TrailingZeros BenchmarkLeadingZeros-8 200000000 8.80 ns/op BenchmarkLeadingZeros8-8 200000000 8.21 ns/op BenchmarkLeadingZeros16-8 200000000 7.49 ns/op BenchmarkLeadingZeros32-8 200000000 7.80 ns/op BenchmarkLeadingZeros64-8 200000000 8.67 ns/op BenchmarkTrailingZeros-8 1000000000 2.05 ns/op BenchmarkTrailingZeros8-8 2000000000 1.94 ns/op BenchmarkTrailingZeros16-8 2000000000 1.94 ns/op BenchmarkTrailingZeros32-8 2000000000 1.92 ns/op BenchmarkTrailingZeros64-8 2000000000 2.03 ns/op Change-Id: I45497bf2d6369ba6cfc88ded05aa735908af8908 Reviewed-on: https://go-review.googlesource.com/37220 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-02-17 23:41:16 +00:00
Robert Griesemer	19028bdd18	math/bits: faster Rotate functions, added respective benchmarks Measured on 2.3 GHz Intel Core i7, running macOS 10.12.3. benchmark old ns/op new ns/op delta BenchmarkRotateLeft-8 7.87 7.00 -11.05% BenchmarkRotateLeft8-8 8.41 4.52 -46.25% BenchmarkRotateLeft16-8 8.07 4.55 -43.62% BenchmarkRotateLeft32-8 8.36 4.73 -43.42% BenchmarkRotateLeft64-8 7.93 4.78 -39.72% BenchmarkRotateRight-8 8.23 6.72 -18.35% BenchmarkRotateRight8-8 8.76 4.39 -49.89% BenchmarkRotateRight16-8 9.07 4.44 -51.05% BenchmarkRotateRight32-8 8.85 4.46 -49.60% BenchmarkRotateRight64-8 8.11 4.43 -45.38% Change-Id: I79ea1e9e6fc65f95794a91f860a911efed3aa8a1 Reviewed-on: https://go-review.googlesource.com/37219 Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-02-17 23:40:45 +00:00
Robert Griesemer	a12edb8db6	math/bits: faster OnesCount, added respective benchmarks Also: Changed Reverse/ReverseBytes implementations to use the same (smaller) masks as OnesCount. BenchmarkOnesCount-8 37.0 6.26 -83.08% BenchmarkOnesCount8-8 7.24 1.99 -72.51% BenchmarkOnesCount16-8 11.3 2.47 -78.14% BenchmarkOnesCount32-8 18.4 3.02 -83.59% BenchmarkOnesCount64-8 40.0 3.78 -90.55% BenchmarkReverse-8 6.69 6.22 -7.03% BenchmarkReverse8-8 1.64 1.64 +0.00% BenchmarkReverse16-8 2.26 2.18 -3.54% BenchmarkReverse32-8 2.88 2.87 -0.35% BenchmarkReverse64-8 5.64 4.34 -23.05% BenchmarkReverseBytes-8 2.48 2.17 -12.50% BenchmarkReverseBytes16-8 0.63 0.95 +50.79% BenchmarkReverseBytes32-8 1.13 1.24 +9.73% BenchmarkReverseBytes64-8 2.50 2.16 -13.60% OnesCount-8 37.0ns ± 0% 6.3ns ± 0% ~ (p=1.000 n=1+1) OnesCount8-8 7.24ns ± 0% 1.99ns ± 0% ~ (p=1.000 n=1+1) OnesCount16-8 11.3ns ± 0% 2.5ns ± 0% ~ (p=1.000 n=1+1) OnesCount32-8 18.4ns ± 0% 3.0ns ± 0% ~ (p=1.000 n=1+1) OnesCount64-8 40.0ns ± 0% 3.8ns ± 0% ~ (p=1.000 n=1+1) Reverse-8 6.69ns ± 0% 6.22ns ± 0% ~ (p=1.000 n=1+1) Reverse8-8 1.64ns ± 0% 1.64ns ± 0% ~ (all samples are equal) Reverse16-8 2.26ns ± 0% 2.18ns ± 0% ~ (p=1.000 n=1+1) Reverse32-8 2.88ns ± 0% 2.87ns ± 0% ~ (p=1.000 n=1+1) Reverse64-8 5.64ns ± 0% 4.34ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes-8 2.48ns ± 0% 2.17ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes16-8 0.63ns ± 0% 0.95ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes32-8 1.13ns ± 0% 1.24ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes64-8 2.50ns ± 0% 2.16ns ± 0% ~ (p=1.000 n=1+1) Change-Id: I591b0ffc83fc3a42828256b6e5030f32c64f9497 Reviewed-on: https://go-review.googlesource.com/37218 Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-02-17 23:40:10 +00:00
Robert Griesemer	4498b68390	math/bits: faster Reverse, ReverseBytes - moved from: x&m>>k | x&^m<<k to: x&m>>k | x<<k&m This permits use of the same constant m twice (*) which may be better for machines that can't use large immediate constants directly with an AND instruction and have to load them explicitly. *) CPUs don't usually have a &^ instruction, so x&^m becomes x&(^m) - simplified returns This improves the generated code because the compiler recognizes x>>k | x<<k as ROT when k is the bitsize of x. The 8-bit versions of these instructions can be significantly faster still if they are replaced with table lookups, as long as the table is in cache. If the table is not in cache, table-lookup is probably slower, hence the choice of an explicit register-only implementation for now. BenchmarkReverse-8 8.50 6.86 -19.29% BenchmarkReverse8-8 2.17 1.74 -19.82% BenchmarkReverse16-8 2.89 2.34 -19.03% BenchmarkReverse32-8 3.55 2.95 -16.90% BenchmarkReverse64-8 6.81 5.57 -18.21% BenchmarkReverseBytes-8 3.49 2.48 -28.94% BenchmarkReverseBytes16-8 0.93 0.62 -33.33% BenchmarkReverseBytes32-8 1.55 1.13 -27.10% BenchmarkReverseBytes64-8 2.47 2.47 +0.00% Reverse-8 8.50ns ± 0% 6.86ns ± 0% ~ (p=1.000 n=1+1) Reverse8-8 2.17ns ± 0% 1.74ns ± 0% ~ (p=1.000 n=1+1) Reverse16-8 2.89ns ± 0% 2.34ns ± 0% ~ (p=1.000 n=1+1) Reverse32-8 3.55ns ± 0% 2.95ns ± 0% ~ (p=1.000 n=1+1) Reverse64-8 6.81ns ± 0% 5.57ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes-8 3.49ns ± 0% 2.48ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes16-8 0.93ns ± 0% 0.62ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes32-8 1.55ns ± 0% 1.13ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes64-8 2.47ns ± 0% 2.47ns ± 0% ~ (all samples are equal) Change-Id: I0064de8c7e0e568ca7885d6f7064344bef91a06d Reviewed-on: https://go-review.googlesource.com/37215 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-02-17 22:20:28 +00:00
Robert Griesemer	3a239a6ae4	math/bits: fix benchmarks (make sure calls don't get optimized away) Sum up function results and store them in an exported (global) variable. This prevents the compiler from optimizing away the otherwise side-effect free function calls. We now have more realistic set of benchmark numbers... Measured on 2.3 GHz Intel Core i7, running macOS 10.12.3. Note: These measurements are based on the same "old" implementation as the prior measurements (commit `7d5c003`). benchmark old ns/op new ns/op delta BenchmarkReverse-8 72.9 8.50 -88.34% BenchmarkReverse8-8 13.2 2.17 -83.56% BenchmarkReverse16-8 21.2 2.89 -86.37% BenchmarkReverse32-8 36.3 3.55 -90.22% BenchmarkReverse64-8 71.3 6.81 -90.45% BenchmarkReverseBytes-8 11.2 3.49 -68.84% BenchmarkReverseBytes16-8 6.24 0.93 -85.10% BenchmarkReverseBytes32-8 7.40 1.55 -79.05% BenchmarkReverseBytes64-8 10.5 2.47 -76.48% Reverse-8 72.9ns ± 0% 8.5ns ± 0% ~ (p=1.000 n=1+1) Reverse8-8 13.2ns ± 0% 2.2ns ± 0% ~ (p=1.000 n=1+1) Reverse16-8 21.2ns ± 0% 2.9ns ± 0% ~ (p=1.000 n=1+1) Reverse32-8 36.3ns ± 0% 3.5ns ± 0% ~ (p=1.000 n=1+1) Reverse64-8 71.3ns ± 0% 6.8ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes-8 11.2ns ± 0% 3.5ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes16-8 6.24ns ± 0% 0.93ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes32-8 7.40ns ± 0% 1.55ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes64-8 10.5ns ± 0% 2.5ns ± 0% ~ (p=1.000 n=1+1) Change-Id: I8aef1334b84f6cafd25edccad7e6868b37969efb Reviewed-on: https://go-review.googlesource.com/37213 Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-02-17 20:58:12 +00:00
Robert Griesemer	ddb15cea4a	math/bits: much faster ReverseBytes, added respective benchmarks Measured on 2.3 GHz Intel Core i7, running macOS 10.12.3. benchmark old ns/op new ns/op delta BenchmarkReverseBytes-8 11.4 3.51 -69.21% BenchmarkReverseBytes16-8 6.87 0.64 -90.68% BenchmarkReverseBytes32-8 7.79 0.65 -91.66% BenchmarkReverseBytes64-8 11.6 0.64 -94.48% name old time/op new time/op delta ReverseBytes-8 11.4ns ± 0% 3.5ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes16-8 6.87ns ± 0% 0.64ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes32-8 7.79ns ± 0% 0.65ns ± 0% ~ (p=1.000 n=1+1) ReverseBytes64-8 11.6ns ± 0% 0.6ns ± 0% ~ (p=1.000 n=1+1) Change-Id: I67b529652b3b613c61687e9e185e8d4ee40c51a2 Reviewed-on: https://go-review.googlesource.com/37211 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-02-17 19:38:26 +00:00
Robert Griesemer	7d5c003a3a	math/bits: much faster Reverse, added respective benchmarks Measured on 2.3 GHz Intel Core i7, running macOS 10.12.3. name old time/op new time/op delta Reverse-8 76.6ns ± 0% 8.1ns ± 0% ~ (p=1.000 n=1+1) Reverse8-8 12.6ns ± 0% 0.6ns ± 0% ~ (p=1.000 n=1+1) Reverse16-8 20.8ns ± 0% 0.6ns ± 0% ~ (p=1.000 n=1+1) Reverse32-8 36.5ns ± 0% 0.6ns ± 0% ~ (p=1.000 n=1+1) Reverse64-8 74.0ns ± 0% 6.4ns ± 0% ~ (p=1.000 n=1+1) benchmark old ns/op new ns/op delta BenchmarkReverse-8 76.6 8.07 -89.46% BenchmarkReverse8-8 12.6 0.64 -94.92% BenchmarkReverse16-8 20.8 0.64 -96.92% BenchmarkReverse32-8 36.5 0.64 -98.25% BenchmarkReverse64-8 74.0 6.38 -91.38% Change-Id: I6b99b10cee2f2babfe79342b50ee36a45a34da30 Reviewed-on: https://go-review.googlesource.com/37149 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-02-17 19:38:13 +00:00
Robert Griesemer	81acd308a4	math/bits: expand doc strings for all functions Follow-up on https://go-review.googlesource.com/36315. No functionality change. For #18616. Change-Id: Id4df34dd7d0381be06eea483a11bf92f4a01f604 Reviewed-on: https://go-review.googlesource.com/37140 Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2017-02-17 19:02:56 +00:00
Shenghou Ma	211102c85f	math: fix typos in Bessel function docs While we're at it, also document Yn(0, 0) = -Inf for completeness. Fixes #18823. Change-Id: Ib6db68f76d29cc2373c12ebdf3fab129cac8c167 Reviewed-on: https://go-review.googlesource.com/35970 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-02-16 22:41:34 +00:00
Robert Griesemer	661e2179e5	math/bits: added package for bit-level counting and manipulation Initial platform-independent implementation. For #18616. Change-Id: I4585c55b963101af9059c06c1b8a866cb384754c Reviewed-on: https://go-review.googlesource.com/36315 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2017-02-16 21:54:59 +00:00
Daniel Martí	6910756f9b	math/big: simplify bool expression Change-Id: I280c53be455f2fe0474ad577c0f7b7908a4eccb2 Reviewed-on: https://go-review.googlesource.com/36993 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-02-14 23:34:25 +00:00
Michael Munday	d2fea0447f	math/big: fix s390x test build tags The tests failed to compile when using the math_big_pure_go tag on s390x. Change-Id: I2a09f53ff6562ab9bc9b886cffc0f6205bbfcfbb Reviewed-on: https://go-review.googlesource.com/36956 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2017-02-14 19:44:35 +00:00
Josh Bleecher Snyder	785cb7e098	all: fix some printf format strings Appease vet. Change-Id: Ie88de08b91041990c0eaf2e15628cdb98d40c660 Reviewed-on: https://go-review.googlesource.com/36938 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-02-14 02:09:30 +00:00
Michael Munday	a524616860	cmd/{asm,internal/obj/s390x}, math: remove emulated float instructions The s390x port was based on the ppc64 port and, because of the way the port was done, inherited some instructions from it. ppc64 supports 3-operand (4-operand for FMADD etc.) floating point instructions but s390x doesn't (the destination register is always an input) and so these were emulated. There is a bug in the emulation of FMADD whereby if the destination register is also a source for the multiplication it will be clobbered. This doesn't break any assembly code in the std lib but could affect future work. To fix this I have gone through the floating point instructions and removed all unnecessary 3-/4-operand emulation. The compiler doesn't need it and assembly writers don't need it, it's just a source of bugs. I've also deleted the FNMADD family of emulated instructions. They aren't used anywhere. Change-Id: Ic07cedcf141a6a3b43a0c84895460f6cfbf56c04 Reviewed-on: https://go-review.googlesource.com/33350 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2017-02-10 16:11:25 +00:00
Alberto Donizetti	f44e587031	math: check overflow in amd64 Exp implementation Unlike the pure go implementation used by every other architecture, the amd64 asm implementation of Exp does not fail early if the argument is known to overflow. Make it fail early. Cost of the check is < 1ns (on an old Sandy Bridge machine): name old time/op new time/op delta Exp-4 18.3ns ± 1% 18.7ns ± 1% +2.08% (p=0.000 n=18+20) Fixes #14932 Fixes #18912 Change-Id: I04b3f9b4ee853822cbdc97feade726fbe2907289 Reviewed-on: https://go-review.googlesource.com/36271 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> Reviewed-by: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2017-02-10 13:40:08 +00:00
Robert Griesemer	1f93ba66d6	math/big: add IsInt64/IsUint64 predicates Change-Id: Ia5ed3919cb492009ac8f66d175b47a69f83ee4f1 Reviewed-on: https://go-review.googlesource.com/36487 Reviewed-by: Alan Donovan <adonovan@google.com>	2017-02-07 23:02:33 +00:00
Russ Cox	850e55b8c0	crypto/*: document use or non-use of constant-time algorithms Fixes #16821. Change-Id: I63d5f3d7cfba1c76259912d754025c5f3cbe4a56 Reviewed-on: https://go-review.googlesource.com/31573 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-12-07 16:34:50 +00:00
Russ Cox	3f69822a9a	math/rand: export Source64, mainly for documentation value There is some code value too: types intending to implement Source64 can write a conversion confirming that. For #4254 and the Go 1.8 release notes. Change-Id: I7fc350a84f3a963e4dab317ad228fa340dda5c66 Reviewed-on: https://go-review.googlesource.com/33456 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-11-23 04:29:25 +00:00
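The "conversion confirming that" mentioned in commit 3f69822a9a is a compile-time interface check. A minimal sketch follows; `counterSource` is a hypothetical, deliberately non-random type invented for this example and is not part of math/rand.

```go
package main

import (
	"fmt"
	"math/rand"
)

// counterSource is a hypothetical Source64 that just counts upward.
// It is useless as a random source and exists only to show the interface.
type counterSource struct{ n uint64 }

func (s *counterSource) Seed(seed int64) { s.n = uint64(seed) }
func (s *counterSource) Uint64() uint64  { s.n++; return s.n }
func (s *counterSource) Int63() int64    { return int64(s.Uint64() >> 1) }

// Compile-time check: the build fails if *counterSource
// ever stops satisfying rand.Source64.
var _ rand.Source64 = (*counterSource)(nil)

func main() {
	// rand.New accepts any Source; Rand.Uint64 uses the source's
	// own Uint64 method when the source implements Source64.
	r := rand.New(&counterSource{})
	fmt.Println(r.Uint64())
}
```

The blank-identifier declaration costs nothing at run time and documents the intent next to the type.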
Russ Cox	37d078ede3	math/big: add Baillie-PSW test to (*Int).ProbablyPrime After x.ProbablyPrime(n) passes the n Miller-Rabin rounds, add a Baillie-PSW test before declaring x probably prime. Although the provable error bounds are unchanged, the empirical error bounds drop dramatically: there are no known inputs for which Baillie-PSW gives the wrong answer. For example, before this CL, big.NewInt(443*1327).ProbablyPrime(1) == true. Now it is (correctly) false. The new Baillie-PSW test is two pieces: an added Miller-Rabin round with base 2, and a so-called extra strong Lucas test. (See the references listed in prime.go for more details.) The Lucas test takes about 3.5x as long as the Miller-Rabin round, which is close to theoretical expectations. name time/op ProbablyPrime/Lucas 2.91ms ± 2% ProbablyPrime/MillerRabinBase2 850µs ± 1% ProbablyPrime/n=0 3.75ms ± 3% The speed of prime testing for a prime input does get slower: name old time/op new time/op delta ProbablyPrime/n=1 849µs ± 1% 4521µs ± 1% +432.31% (p=0.000 n=10+9) ProbablyPrime/n=5 4.31ms ± 3% 7.87ms ± 1% +82.70% (p=0.000 n=10+10) ProbablyPrime/n=10 8.52ms ± 3% 12.28ms ± 1% +44.11% (p=0.000 n=10+10) ProbablyPrime/n=20 16.9ms ± 2% 21.4ms ± 2% +26.35% (p=0.000 n=9+10) However, because the Baillie-PSW test is only added when the old ProbablyPrime(n) would return true, testing composites runs at the same speed as before, except in the case where the result would have been incorrect and is now correct. In particular, the most important use of this code is for generating random primes in crypto/rand. That use spends essentially all its time testing composites, so it is not slowed down by the new Baillie-PSW check: name old time/op new time/op delta Prime 104ms ±22% 111ms ±16% ~ (p=0.165 n=10+10) Thanks to Serhat Şevki Dinçer for CL 20170, which this CL builds on. Fixes #13229.
Change-Id: Id26dde9b012c7637c85f2e96355d029b6382812a Reviewed-on: https://go-review.googlesource.com/30770 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2016-11-22 02:05:47 +00:00
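The behavior described in commit 37d078ede3 can be exercised with a short program. This is a sketch under the assumption of a Go release that includes this change; `isProbablyPrime` is a hypothetical wrapper introduced here, and the inputs are the composite 443*1327 (587861) and the Mersenne prime 2^31-1.

```go
package main

import (
	"fmt"
	"math/big"
)

// isProbablyPrime is a hypothetical convenience wrapper around
// (*big.Int).ProbablyPrime; rounds is the number of Miller-Rabin rounds
// requested in addition to the Baillie-PSW test.
func isProbablyPrime(v int64, rounds int) bool {
	return big.NewInt(v).ProbablyPrime(rounds)
}

func main() {
	// A composite (product of the primes 443 and 1327).
	fmt.Println(isProbablyPrime(443*1327, 1))
	// A known prime, 2^31 - 1, for comparison.
	fmt.Println(isProbablyPrime(2147483647, 1))
}
```

Because the extra test only runs after the Miller-Rabin rounds pass, composite inputs that are rejected early pay no additional cost, which matches the timing discussion in the commit message.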
Brad Fitzpatrick	d8b14c5243	math/rand: make floating point tests shorter on mips and mipsle Like GOARM=5 does. Fixes #17944 Change-Id: Ica2a54a90fbd4a29471d1c6009ace2fcc5e82a73 Reviewed-on: https://go-review.googlesource.com/33326 Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-11-16 19:22:53 +00:00
Dmitri Shuralyov	d8264de868	all: spell "marshal" and "unmarshal" consistently The tree is inconsistent about single l vs double l in those words in documentation, test messages, and one error value text. $ git grep -E '[Mm]arshall(\|s\|er\|ers\|ed\|ing)' \| wc -l 42 $ git grep -E '[Mm]arshal(\|s\|er\|ers\|ed\|ing)' \| wc -l 1694 Make it consistently a single l, per earlier decisions. This means contributors won't be confused by misleading precedent, and it helps consistency. Change the spelling in one error value text in newRawAttributes of crypto/x509 package to be consistent. This change was generated with: perl -i -npe 's,([Mm]arshal)l(\|s\|er\|ers\|ed\|ing),$1$2,' $(git grep -l -E '[Mm]arshall' \| grep -v AUTHORS \| grep -v CONTRIBUTORS) Updates #12431. Follows https://golang.org/cl/14150. Change-Id: I85d28a2d7692862ccb02d6a09f5d18538b6049a2 Reviewed-on: https://go-review.googlesource.com/33017 Run-TryBot: Minux Ma <minux@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-11-12 00:13:35 +00:00
Bill O'Farrell	b6a15683f0	math: use SIMD to accelerate some scalar math functions on s390x Note, most math functions are structured to use stubs, so that they can be accelerated with assembly on any platform. Sinh, cosh, and tanh were not structured with stubs, so this CL does that. This set of routines was chosen as likely to produce good speedups with assembly on any platform. Technique used was minimax polynomial approximation using tables of polynomial coefficients, with argument range reduction. A table of scaling factors was also used for cosh and log10. before after speedup BenchmarkCos 22.1 ns/op 6.79 ns/op 3.25x BenchmarkCosh 125 ns/op 11.7 ns/op 10.68x BenchmarkLog10 48.4 ns/op 12.5 ns/op 3.87x BenchmarkSin 22.2 ns/op 6.55 ns/op 3.39x BenchmarkSinh 125 ns/op 14.2 ns/op 8.80x BenchmarkTanh 65.0 ns/op 15.1 ns/op 4.30x Accuracy was tested against a high precision reference function to determine maximum error. Approximately 4,000,000 points were tested for each function, producing the following result. Note: ulperr is error in "units in the last place" max ulperr sin 1.43 (returns NaN beyond +-2^50) cos 1.79 (returns NaN beyond +-2^50) cosh 1.05 sinh 3.02 tanh 3.69 log10 1.75 Also includes a set of tests to test non-vector functions even when SIMD is enabled Change-Id: Icb45f14d00864ee19ed973d209c3af21e4df4edc Reviewed-on: https://go-review.googlesource.com/32352 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Munday <munday@ca.ibm.com>	2016-11-11 20:20:23 +00:00
Vladimir Stefanovic	d1e9104fb2	math, math/big: add support for GOARCH=mips{,le} Change-Id: I54e100cced5b49674937fb87d1e0f585f962aeb7 Reviewed-on: https://go-review.googlesource.com/31484 Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-11-03 22:55:06 +00:00
Cherry Zhang	0dabbcdc43	math/big: flip long/short flag on TestFloat32Distribution It looks like a typo in CL 30707. Change-Id: Ia2d013567dbd1a49901d9be0cd2d5a103e6e38cf Reviewed-on: https://go-review.googlesource.com/32187 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2016-10-27 21:44:37 +00:00
Bill O'Farrell	1e6b12a201	math/big: uses SIMD for some math big functions on s390x The following benchmarks are improved by the amounts shown (Others unaffected beyond the level of noise.) Also adds a test to confirm non-SIMD implementation still correct, even when run on SIMD-capable machine Benchmark old new BenchmarkAddVV/100-18 66148.08 MB/s 117546.19 MB/s 1.8x BenchmarkAddVV/1000-18 70168.27 MB/s 133478.96 MB/s 1.9x BenchmarkAddVV/10000-18 67489.80 MB/s 100010.79 MB/s 1.5x BenchmarkAddVV/100000-18 54329.99 MB/s 69232.45 MB/s 1.3x BenchmarkAddVW/100-18 9929.10 MB/s 14841.31 MB/s 1.5x BenchmarkAddVW/1000-18 10583.31 MB/s 18674.44 MB/s 1.76x BenchmarkAddVW/10000-18 10521.15 MB/s 17484.10 MB/s 1.66x BenchmarkAddVW/100000-18 10616.56 MB/s 18084.27 MB/s 1.7x Change-Id: Ic9234c41a43f6c5e9d0e9377de8b4deeefc428a7 Reviewed-on: https://go-review.googlesource.com/32211 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-10-26 23:52:10 +00:00
Mohit Agarwal	5a9549260d	math/cmplx: prevent infinite loop in tanSeries The condition to determine if any further iterations are needed is evaluated to false in case it encounters a NaN. Instead, flip the condition to keep looping until the factor is greater than the machine roundoff error. Updates #17577 Change-Id: I058abe73fcd49d3ae4e2f7b33020437cc8f290c3 Reviewed-on: https://go-review.googlesource.com/31952 Reviewed-by: Robert Griesemer <gri@golang.org>	2016-10-25 18:32:22 +00:00
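The loop-termination issue in commit 5a9549260d comes from the rule that every ordered comparison involving NaN is false. The sketch below is not the tanSeries code; `stopOld`, `stopNew`, and the value of `eps` are assumptions chosen to show why flipping the condition ends the loop when a NaN appears.

```go
package main

import (
	"fmt"
	"math"
)

// stopOld mirrors a "stop once the ratio drops below eps" test.
// For NaN the comparison is false, so a loop using it never stops.
func stopOld(ratio, eps float64) bool { return ratio < eps }

// stopNew mirrors the flipped test: keep looping only while ratio > eps.
// For NaN the inner comparison is false, so the negation says "stop".
func stopNew(ratio, eps float64) bool { return !(ratio > eps) }

func main() {
	const eps = 1.0 / (1 << 53) // assumed stand-in for the machine roundoff constant
	for _, r := range []float64{1e-3, 1e-20, math.NaN()} {
		fmt.Printf("ratio=%v old=%v new=%v\n", r, stopOld(r, eps), stopNew(r, eps))
	}
}
```

For ordinary values the two tests differ only when the ratio equals eps exactly; they diverge for NaN, which is the case the commit addresses.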
Alexander Döring	4c9c023346	math,math/cmplx: fix linter issues Change-Id: If061f1f120573cb109d97fa40806e160603cd593 Reviewed-on: https://go-review.googlesource.com/31871 Reviewed-by: Rob Pike <r@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-10-24 23:25:46 +00:00
Mohit Agarwal	a6141ebd3f	math/big: fix alignment in Float.Parse docs Leading spaces in a couple of lines instead of tabs cause those to be misaligned (as seen on <https://golang.org/pkg/math/big/#Float.Parse>): <<< number = [ sign ] [ prefix ] mantissa [ exponent ] \| infinity . sign = "+" \| "-" . prefix = "0" ( "x" \| "X" \| "b" \| "B" ) . mantissa = digits \| digits "." [ digits ] \| "." digits . exponent = ( "E" \| "e" \| "p" ) [ sign ] digits . digits = digit { digit } . digit = "0" ... "9" \| "a" ... "z" \| "A" ... "Z" . infinity = [ sign ] ( "inf" \| "Inf" ) . >>> Replace the leading spaces with tabs so that those align well. Change-Id: Ibba6cd53f340001bbd929067dc587feb071dc3bd Reviewed-on: https://go-review.googlesource.com/31830 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-10-24 17:59:42 +00:00
Josh Bleecher Snyder	accf5cc386	all: minor vet fixes Change-Id: I22f0f3e792052762499f632571155768b4052bc9 Reviewed-on: https://go-review.googlesource.com/31759 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-10-24 17:27:37 +00:00
Emmanuel Odeke	f36e1adaa2	math/big: implement Float.Scan, type assert fmt interfaces to enforce docs Implements Float.Scan which satisfies fmt.Scanner interface. Also enforces docs' interface implementation claims with compile time type assertions, that is: + Float always implements fmt.Formatter and fmt.Scanner + Int always implements fmt.Formatter and fmt.Scanner + Rat always implements fmt.Formatter which will ensure that the API claims are strictly matched. Also note that Float.Scan doesn't handle ±Inf. Fixes #17391 Change-Id: I3d3dfbe7f602066975c7a7794fe25b4c645440ce Reviewed-on: https://go-review.googlesource.com/30723 Reviewed-by: Robert Griesemer <gri@golang.org>	2016-10-19 03:25:30 +00:00
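The compile-time assertions mentioned in commit f36e1adaa2 follow a common Go idiom. The sketch below restates that idiom from the caller's side for Float and Int only, and uses a hypothetical helper `scanFloat` to show Float.Scan being reached through fmt.Sscan; none of these names come from the commit.

```go
package main

import (
	"fmt"
	"math/big"
)

// Compile-time assertions: the build fails if any of these stops holding.
var (
	_ fmt.Formatter = (*big.Float)(nil)
	_ fmt.Scanner   = (*big.Float)(nil)
	_ fmt.Formatter = (*big.Int)(nil)
	_ fmt.Scanner   = (*big.Int)(nil)
)

// scanFloat is a hypothetical helper that parses s with fmt.Sscan,
// which dispatches to (*big.Float).Scan because *big.Float is a fmt.Scanner.
func scanFloat(s string) (float64, error) {
	f := new(big.Float)
	if _, err := fmt.Sscan(s, f); err != nil {
		return 0, err
	}
	v, _ := f.Float64() // accuracy indicator ignored in this sketch
	return v, nil
}

func main() {
	v, err := scanFloat("1.5")
	fmt.Println(v, err)
}
```

As the commit notes, Float.Scan does not handle ±Inf, so callers that need infinities have to treat them separately.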
Mohit Agarwal	7eed848a17	math: speed up Gamma(+Inf) Add special case for Gamma(+∞) which speeds it up: benchmark old ns/op new ns/op delta BenchmarkGamma-4 14.5 7.44 -48.69% The documentation for math.Gamma already specifies it as a special case: Gamma(+Inf) = +Inf The original C code that has been used as the reference implementation (as mentioned in the comments in gamma.go) also treats Gamma(+∞) as a special case: if( x == INFINITY ) return(x); Change-Id: Idac36e19192b440475aec0796faa2d2c7f8abe0b Reviewed-on: https://go-review.googlesource.com/31370 Reviewed-by: Robert Griesemer <gri@golang.org>	2016-10-18 22:12:03 +00:00
Alberto Donizetti	f6cdfc7987	math/big: add benchmarks for big.Float String In addition to the DecimalConversion benchmark, that exercises the String method of the internal decimal type on a range of small shifts, add a few benchmarks for the big.Float String method. They can be used to obtain more realistic data on the real-world performance of big.Float printing. Change-Id: I7ada324e7603cb1ce7492ccaf3382db0096223ba Reviewed-on: https://go-review.googlesource.com/31275 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-10-18 05:54:35 +00:00
Russ Cox	9ee21f90d2	math/big: add (*Int).Sqrt This is needed for some of the more complex primality tests (to filter out exact squares), and while the code is simple the boundary conditions are not obvious, so it seems worth having in the library. Change-Id: Ica994a6b6c1e412a6f6d9c3cf823f9b653c6bcbd Reviewed-on: https://go-review.googlesource.com/30706 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2016-10-17 20:30:19 +00:00
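The use case named in commit 9ee21f90d2, filtering out exact squares, can be sketched with (*Int).Sqrt, which returns the floor of the square root. `isPerfectSquare` is a hypothetical helper written for this example.

```go
package main

import (
	"fmt"
	"math/big"
)

// isPerfectSquare reports whether n is an exact square.
// Negative inputs are rejected up front because (*big.Int).Sqrt
// panics on a negative argument.
func isPerfectSquare(n int64) bool {
	if n < 0 {
		return false
	}
	x := big.NewInt(n)
	r := new(big.Int).Sqrt(x) // r = floor(sqrt(x))
	return r.Mul(r, r).Cmp(x) == 0
}

func main() {
	for _, n := range []int64{0, 48, 49, 50} {
		fmt.Println(n, isPerfectSquare(n))
	}
}
```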
Robert Griesemer	92221fe8bc	math/big: slightly faster float->decimal conversion Inspired by Alberto Donizetti's observations in https://go-review.googlesource.com/#/c/30099/. name old time/op new time/op delta DecimalConversion-8 138µs ± 1% 136µs ± 2% -1.85% (p=0.000 n=10+10) 10 runs each, measured on a Mac Mini, 2.3 GHz Intel Core i7. Performance improvements varied between -1.25% to -4.4%; -1.85% is about in the middle of the observed improvement. The generated code is slightly shorter in the inner loops of the conversion code. Change-Id: I10fb3b2843da527691c39ad5e5e5bd37ed63e2fa Reviewed-on: https://go-review.googlesource.com/31250 Reviewed-by: Alan Donovan <adonovan@google.com>	2016-10-17 19:33:33 +00:00
Russ Cox	f444b48fe4	encoding/json: fix decoding of null into Unmarshaler, TextUnmarshaler 1. Define behavior for Unmarshal of JSON null into Unmarshaler and TextUnmarshaler. Specifically, an Unmarshaler will be given the literal null and can decide what to do (because otherwise json.RawMessage is impossible to implement), and a TextUnmarshaler will be skipped over (because there is no text to unmarshal), like most other inappropriate types. Document this in Unmarshal, with a reminder in UnmarshalJSON about handling null. 2. Test all this. 3. Fix the TextUnmarshaler case, which was returning an unmarshalling error, to match the definition. 4. Fix the error that had been used for the TextUnmarshaler, since it was claiming that there was a JSON string when in fact the problem was NOT having a string. 5. Adjust time.Time and big.Int's UnmarshalJSON to ignore null, as is conventional. Fixes #9037. Change-Id: If78350414eb8dda712867dc8f4ca35a9db041b0c Reviewed-on: https://go-review.googlesource.com/30944 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-10-17 15:21:33 +00:00
Russ Cox	88562dc83e	math/big: move ProbablyPrime into its own source file A later CL will be adding more code here. It will help to keep it separate from the other code. Change-Id: I971ba53de819cd10991b51fdec665984939a5f9b Reviewed-on: https://go-review.googlesource.com/30709 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-10-11 16:16:17 +00:00
Russ Cox	9927f25d71	math/big: test and optimize Exp(2, y, n) for large y, odd n The Montgomery multiply code is applicable to this case but was being bypassed. Don't do that. The old test len(x) > 1 was really just a bad approximation to x > 1. name old time/op new time/op delta Exp-8 5.56ms ± 4% 5.73ms ± 3% ~ (p=0.095 n=5+5) Exp2-8 7.59ms ± 1% 5.66ms ± 1% -25.40% (p=0.008 n=5+5) This comes up especially when doing Fermat (Miller-Rabin) primality tests with base 2. Change-Id: I4cc02978db6dfa93f7f3c8f32718e25eedb4f5ed Reviewed-on: https://go-review.googlesource.com/30708 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-10-11 16:15:51 +00:00
Russ Cox	9a8832f142	math/big: move exhaustive tests behind -long flag This way you can still run 'go test' or 'go bench -run Foo' without wondering why it is taking so very long. Change-Id: Icfa097a6deb1d6682acb7be9f34729215c29eabb Reviewed-on: https://go-review.googlesource.com/30707 Reviewed-by: Robert Griesemer <gri@golang.org>	2016-10-11 16:15:12 +00:00
Russ Cox	3a9072829e	math/big: make division faster - Add new BenchmarkQuoRem. - Eliminate allocation in divLarge nat pool - Unroll mulAddVWW body 4x - Remove some redundant slice loads in divLarge name old time/op new time/op delta QuoRem-8 2.18µs ± 1% 1.93µs ± 1% -11.38% (p=0.000 n=19+18) The starting point in the comparison here is Cherry's pending CL to turn mulWW and divWW into intrinsics. The optimizations in divLarge work best because all the function calls are gone. The effect of this CL is not as large if you don't assume Cherry's CL. Change-Id: Ia6138907489c5b9168497912e43705634e163b35 Reviewed-on: https://go-review.googlesource.com/30613 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2016-10-10 18:50:23 +00:00
Robert Griesemer	95a6572b2b	math/big: Rat.SetString to report error if input is not consumed entirely Also, document behavior explicitly for all SetString implementations. Fixes #17001. Change-Id: Iccc882b4bc7f8b61b6092f330e405c146a80dc98 Reviewed-on: https://go-review.googlesource.com/30472 Reviewed-by: Matthew Dempsky <mdempsky@google.com>	2016-10-06 20:37:01 +00:00
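The stricter parsing from commit 95a6572b2b shows up in the boolean returned by Rat.SetString. A minimal sketch, assuming a Go release that includes this change; `parseRat` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math/big"
)

// parseRat is a hypothetical helper: it returns the normalized form of s
// and whether the whole string was accepted as a rational number.
func parseRat(s string) (string, bool) {
	r, ok := new(big.Rat).SetString(s)
	if !ok {
		return "", false
	}
	return r.String(), true
}

func main() {
	for _, s := range []string{"3/4", "0.75", "3/4x"} {
		v, ok := parseRat(s)
		fmt.Printf("%q -> %q ok=%v\n", s, v, ok)
	}
}
```

Trailing characters such as the `x` in `"3/4x"` cause the call to report failure instead of silently parsing a prefix.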
Alexander Döring	7b4a224667	math/cmplx: add examples for Abs, Exp, Polar Updates #16360 Change-Id: I941519981ff5bda3a113e14fa6be718eb4d2bf83 Reviewed-on: https://go-review.googlesource.com/30554 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-10-06 19:49:12 +00:00
Russ Cox	4f3a641e6e	math: fix Gamma(-171.5) on all platforms Using 387 mode was computing it without underflow to zero, apparently due to an 80-bit intermediate. Avoid underflow even with 64-bit floats. This eliminates the TODOs in the test suite. Fixes linux-386-387 build and fixes #11441. Change-Id: I8abaa63bfdf040438a95625d1cb61042f0302473 Reviewed-on: https://go-review.googlesource.com/30540 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-10-06 14:53:09 +00:00
Russ Cox	a39920fdbb	math: fix Gamma(x) for x < -170.5 and other corner cases Fixes #11441. Test tables generated by package main import ( "bytes" "fmt" "log" "os/exec" "strconv" "strings" ) var inputs = []float64{ 0.5, 1.5, 2.5, 3.5, -0.5, -1.5, -2.5, -3.5, 0.1, 0.01, 1e-8, 1e-16, 1e-3, 1e-16, 1e-308, 5.6e-309, 5.5e-309, 1e-309, 1e-323, 5e-324, -0.1, -0.01, -1e-8, -1e-16, -1e-3, -1e-16, -1e-308, -5.6e-309, -5.5e-309, -1e-300 / 1e9, -1e-300 / 1e23, -5e-300 / 1e24, -0.9999999999999999, -1.0000000000000002, -1.9999999999999998, -2.0000000000000004, -100.00000000000001, -99.999999999999986, 17, 171, 171.6, 171.624, 171.625, 172, 2000, -100.5, -160.5, -170.5, -171.5, -176.5, -177.5, -178.5, -179.5, -201.0001, -202.9999, -1000.5, -1000000000.3, -4503599627370495.5, -63.349078729022985, -127.45117632943295, } func main() { var buf bytes.Buffer for _, v := range inputs { fmt.Fprintf(&buf, "gamma(%.1000g)\n", v) } cmd := exec.Command("gp", "-q") cmd.Stdin = &buf out, err := cmd.CombinedOutput() if err != nil { log.Fatalf("gp: %v", err) } f := strings.Split(string(out), "\n") if len(f) > 0 && f[len(f)-1] == "" { f = f[:len(f)-1] } if len(f) != len(inputs) { log.Fatalf("gp: wrong output count\n%s\n", out) } for i, g := range f { gf, err := strconv.ParseFloat(strings.Replace(g, " E", "e", -1), 64) if err != nil { if strings.Contains(err.Error(), "value out of range") { if strings.HasPrefix(g, "-") { fmt.Printf("\t{%g, Inf(-1)},\n", inputs[i]) } else { fmt.Printf("\t{%g, Inf(1)},\n", inputs[i]) } continue } log.Fatal(err) } if gf == 0 && strings.HasPrefix(g, "-") { fmt.Printf("\t{%g, Copysign(0, -1)},\n", inputs[i]) continue } fmt.Printf("\t{%g, %g},\n", inputs[i], gf) } } Change-Id: Ie98c7751d92b8ffb40e8313f5ea10df0890e2feb Reviewed-on: https://go-review.googlesource.com/30146 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Quentin Smith <quentin@golang.org>	2016-10-05 03:53:13 +00:00
Russ Cox	aab849e429	math: use portable Exp instead of 387 instructions on 386 The 387 implementation is less accurate and slower. name old time/op new time/op delta Exp-8 29.7ns ± 2% 24.0ns ± 2% -19.08% (p=0.000 n=10+10) This makes Gamma more accurate too. Change-Id: Iad33b9cce0b087ccbce3e08ba7a6d285c4999d02 Reviewed-on: https://go-review.googlesource.com/30230 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Quentin Smith <quentin@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-10-05 03:53:11 +00:00
Florian Uekermann	003a598bf2	math/rand: add Rand.Uint64 This adds Uint64 methods to Rand and rngSource. Rand.Uint64 uses Source.Uint64 directly if it is present. rngSource.Uint64 provides access to all 64 bits generated by the underlying ALFG. To ensure high seed quality a 64th bit has been added to all elements of the array of "cooked" random numbers that are used for seeding. gen_cooked.go generates both the 63 bit and 64 bit array. Fixes #4254 Change-Id: I22855618ac69abae3d2799b3e7e59996d4c5a4b1 Reviewed-on: https://go-review.googlesource.com/27253 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-10-03 15:04:39 +00:00
Ilya Tocar	731b3ed18d	math: make sqrt smaller on AMD64 This makes function fit in 16 bytes, saving 16 bytes. Change-Id: Iac5d2add42f6dae985b2a5cbe19ad4bd4bcc92ec Reviewed-on: https://go-review.googlesource.com/29151 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-09-29 15:56:52 +00:00
Cherry Zhang	ba94dd3438	math: add some assembly implementations on ARM64 Also add GP<->FP move addressing mode to FMOVS, FMOVD instructions. Ceil-8 37.1ns ± 0% 7.9ns ± 0% -78.64% (p=0.000 n=4+5) Dim-8 20.9ns ± 1% 11.3ns ± 0% -45.93% (p=0.008 n=5+5) Floor-8 22.9ns ± 0% 7.9ns ± 0% -65.41% (p=0.029 n=4+4) Gamma-8 117ns ± 0% 94ns ± 1% -19.50% (p=0.016 n=4+5) PowInt-8 121ns ± 0% 108ns ± 1% -11.07% (p=0.008 n=5+5) PowFrac-8 331ns ± 0% 318ns ± 0% -3.93% (p=0.000 n=5+4) Trunc-8 18.8ns ± 0% 7.9ns ± 0% -57.83% (p=0.016 n=4+5) Change-Id: I709b7f1a914b28adc27414522db551e2630cfb92 Reviewed-on: https://go-review.googlesource.com/29734 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>	2016-09-27 23:52:12 +00:00
Michal Bohuslávek	9ed0715bb6	math/big: support negative numbers in ModInverse Fixes #16984 Change-Id: I3a330e82941a068ca6097985af4ab221275fd336 Reviewed-on: https://go-review.googlesource.com/29299 Run-TryBot: Adam Langley <agl@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Adam Langley <agl@golang.org>	2016-09-27 00:42:56 +00:00
Alberto Donizetti	6bcd258095	math/big: better SetFloat64 example in doc Fixes #17221 Change-Id: Idaa2af6b8646651ea72195671d1a4b5c370a5a22 Reviewed-on: https://go-review.googlesource.com/29711 Reviewed-by: Robert Griesemer <gri@golang.org>	2016-09-26 18:30:35 +00:00
Lynn Boger	3311275ce8	math, cmd/internal/obj/ppc64: improve floor, ceil, trunc with asm This adds the instructions frim, frip, and friz to the ppc64x assembler for use in implementing the math.Floor, math.Ceil, and math.Trunc functions to improve performance. Fixes #17185 BenchmarkCeil-128 21.4 6.99 -67.34% BenchmarkFloor-128 13.9 6.37 -54.17% BenchmarkTrunc-128 12.7 6.33 -50.16% Change-Id: I96131bd4e8c9c8dbafb25bfeb544cf9d2dbb4282 Reviewed-on: https://go-review.googlesource.com/29654 Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Michael Munday <munday@ca.ibm.com>	2016-09-23 13:03:08 +00:00
Michael Munday	e94c52933b	cmd/compile: intrinsify Ctz{32,64} and Bswap{32,64} on s390x Also adds the 'find leftmost one' instruction (FLOGR) and replaces the WORD-encoded use of FLOGR in math/big with it. Change-Id: I18e7cd19e75b8501a6ae8bd925471f7e37ded206 Reviewed-on: https://go-review.googlesource.com/29372 Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-09-19 19:03:01 +00:00
Brad Fitzpatrick	6f135bfd92	math/big: cut 2 minutes off race tests No need to test so many sizes in race mode, especially for a package which doesn't use goroutines. Reduces test time from 2.5 minutes to 25 seconds. Updates #17104 Change-Id: I7065b39273f82edece385c0d67b3f2d83d4934b8 Reviewed-on: https://go-review.googlesource.com/29163 Reviewed-by: David Crawshaw <crawshaw@golang.org>	2016-09-14 19:11:43 +00:00
Michael Munday	f1515a01fd	runtime, math/big: allow R0 on s390x to contain values other than 0 The new SSA backend for s390x can use R0 as a general purpose register. This change modifies assembly code to either avoid using R0 entirely or explicitly set R0 to 0. R0 can still be safely used as 0 in address calculations. Change-Id: I3efa723e9ef322a91a408bd8c31768d7858526c8 Reviewed-on: https://go-review.googlesource.com/28976 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-09-12 18:06:01 +00:00
Ilya Tocar	6e703ae709	math: fix sqrt regression on AMD64 1.7 introduced a significant regression compared to 1.6: SqrtIndirect-4 2.32ns ± 0% 7.86ns ± 0% +238.79% (p=0.000 n=20+18) This is caused by sqrtsd preserving upper part of destination register. Which introduces dependency on previous value of X0. In 1.6 benchmark loop didn't use X0 immediately after call: callq %rbx movsd 0x8(%rsp),%xmm2 movsd 0x20(%rsp),%xmm1 addsd %xmm2,%xmm1 mov 0x18(%rsp),%rax inc %rax jmp loop In 1.7 however xmm0 is used just after call: callq %rbx mov 0x10(%rsp),%rcx lea 0x1(%rcx),%rax movsd 0x8(%rsp),%xmm0 movsd 0x18(%rsp),%xmm1 I've verified that this is caused by dependency, by inserting XORPS X0,X0 in the beginning of math.Sqrt, which puts performance back on 1.6 level. Splitting SQRTSD mem,reg into: MOVSD mem,reg SQRTSD reg,reg Removes dependency, because MOVSD (load version) doesn't need to preserve upper part of a register. And reg,reg operation is solved by renamer in CPU. As a result of this change regression is gone: SqrtIndirect-4 7.86ns ± 0% 2.33ns ± 0% -70.36% (p=0.000 n=18+17) This also removes old Sqrt benchmarks, in favor of benchmarks measuring latency. Only SqrtIndirect is kept, to show impact of this patch. Change-Id: Ic7eebe8866445adff5bc38192fa8d64c9a6b8872 Reviewed-on: https://go-review.googlesource.com/28392 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> Reviewed-by: Keith Randall <khr@golang.org>	2016-09-06 15:45:02 +00:00
David Glasser	82bc0d4e80	math/rand: document that NewSource sources race While it was previously explicitly documented that "the default Source" is safe for concurrent use, a careless reader can interpret that as meaning "the implementation of the Source interface created by functions in this package" rather than "the default shared Source used by top-level functions". Be explicit that the Source returned by NewSource is not safe for use by multiple goroutines. Fixes #3611. Change-Id: Iae4bc04c3887ad6e2491e36e38feda40324022c5 Reviewed-on: https://go-review.googlesource.com/25501 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-09-02 05:16:21 +00:00
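The documentation change in commit 82bc0d4e80 implies that a Source from NewSource needs external synchronization when shared. One way to do that is sketched below; `lockedSource` and `newLockedSource` are hypothetical names for this example, and giving each goroutine its own Source is an equally valid alternative.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// lockedSource is a hypothetical wrapper that serializes access
// to a rand.Source so it can be shared between goroutines.
type lockedSource struct {
	mu  sync.Mutex
	src rand.Source
}

func newLockedSource(seed int64) *lockedSource {
	return &lockedSource{src: rand.NewSource(seed)}
}

func (s *lockedSource) Int63() int64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.src.Int63()
}

func (s *lockedSource) Seed(seed int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.src.Seed(seed)
}

func main() {
	src := newLockedSource(1)
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				_ = src.Int63() // safe: every call takes the mutex
			}
		}()
	}
	wg.Wait()
	fmt.Println("done")
}
```

Running a program like this with the race detector (`go run -race`) is a quick way to confirm that the shared Source is no longer accessed concurrently.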
Ilya Tocar	2a2cab2911	math: speed up bessel functions on AMD64 J0-4 71.9ns ± 1% 54.6ns ± 0% -24.08% (p=0.000 n=20+18) J1-4 71.6ns ± 0% 55.4ns ± 0% -22.60% (p=0.000 n=19+20) Jn-4 153ns ± 0% 118ns ± 1% -22.71% (p=0.000 n=20+20) Y0-4 70.8ns ± 0% 53.9ns ± 0% -23.87% (p=0.000 n=19+19) Y1-4 70.8ns ± 0% 54.1ns ± 0% -23.54% (p=0.000 n=20+20) Yn-4 149ns ± 0% 116ns ± 0% -22.15% (p=0.000 n=19+20) Fixes #16889 Change-Id: Ie88496407b42f6acb918ffae1226b1b4c0500cb9 Reviewed-on: https://go-review.googlesource.com/28086 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2016-08-31 14:45:29 +00:00
Ethan Miller	4955147291	math/big: add assembly implementation of arith for ppc64{le} The existing implementation used a pure go implementation, leading to slow cryptographic performance. Implemented mulWW, subVV, mulAddVWW, addMulVVW, and bitLen for ppc64{le}. Implemented divWW for ppc64le only, as the DIVDEU instruction is only available on Power8 or newer. benchcmp output: benchmark old ns/op new ns/op delta BenchmarkSignP384 28934360 10877330 -62.41% BenchmarkRSA2048Decrypt 41261033 5139930 -87.54% BenchmarkRSA2048Sign 45231300 7610985 -83.17% Benchmark3PrimeRSA2048Decrypt 20487300 2481408 -87.89% Fixes #16621 Change-Id: If8b68963bb49909bde832f2bda08a3791c4f5b7a Reviewed-on: https://go-review.googlesource.com/26951 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Munday <munday@ca.ibm.com>	2016-08-29 21:03:21 +00:00
Michael Munday	9f7ea61674	math: optimize Ceil, Floor and Trunc on s390x Use the FIDBR instruction to round floating-point numbers to integers. name old time/op new time/op delta Ceil 14.1ns ± 0% 3.0ns ± 0% -78.89% (p=0.000 n=10+10) Floor 6.42ns ± 0% 3.03ns ± 0% -52.80% (p=0.000 n=10+10) Trunc 6.67ns ± 0% 3.03ns ± 0% -54.57% (p=0.000 n=10+9) Change-Id: I3b416f6d0bccaaa9b547de86356471365862399c Reviewed-on: https://go-review.googlesource.com/27827 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-08-26 17:27:13 +00:00
Josh Bleecher Snyder	71ab9fa312	all: fix assembly vet issues Add missing function prototypes. Fix function prototypes. Use FP references instead of SP references. Fix variable names. Update comments. Clean up whitespace. (Not for vet.) All fairly minor fixes to make vet happy. Updates #11041 Change-Id: Ifab2cdf235ff61cdc226ab1d84b8467b5ac9446c Reviewed-on: https://go-review.googlesource.com/27713 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-08-25 18:52:31 +00:00
Alberto Donizetti	cda633b39b	math/big: avoid allocation in float.{Add, Sub} when there's no aliasing name old time/op new time/op delta FloatAdd/10-4 116ns ± 1% 82ns ± 0% -28.74% (p=0.008 n=5+5) FloatAdd/100-4 124ns ± 0% 86ns ± 1% -30.34% (p=0.016 n=4+5) FloatAdd/1000-4 192ns ± 1% 123ns ± 0% -35.94% (p=0.008 n=5+5) FloatAdd/10000-4 826ns ± 0% 438ns ± 0% -46.99% (p=0.000 n=4+5) FloatAdd/100000-4 6.82µs ± 1% 3.36µs ± 0% -50.74% (p=0.008 n=5+5) FloatSub/10-4 108ns ± 1% 77ns ± 1% -29.06% (p=0.008 n=5+5) FloatSub/100-4 115ns ± 0% 79ns ± 0% -31.48% (p=0.029 n=4+4) FloatSub/1000-4 168ns ± 0% 99ns ± 0% -41.09% (p=0.029 n=4+4) FloatSub/10000-4 690ns ± 2% 288ns ± 1% -58.24% (p=0.008 n=5+5) FloatSub/100000-4 5.37µs ± 1% 2.10µs ± 1% -60.89% (p=0.008 n=5+5) name old alloc/op new alloc/op delta FloatAdd/10-4 48.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) FloatAdd/100-4 64.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) FloatAdd/1000-4 176B ± 0% 0B ±NaN% -100.00% (p=0.008 n=5+5) FloatAdd/10000-4 1.41kB ± 0% 0.00kB ±NaN% -100.00% (p=0.008 n=5+5) FloatAdd/100000-4 13.6kB ± 0% 0.0kB ±NaN% -100.00% (p=0.008 n=5+5) FloatSub/10-4 48.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) FloatSub/100-4 64.0B ± 0% 0.0B ±NaN% -100.00% (p=0.008 n=5+5) FloatSub/1000-4 176B ± 0% 0B ±NaN% -100.00% (p=0.008 n=5+5) FloatSub/10000-4 1.41kB ± 0% 0.00kB ±NaN% -100.00% (p=0.008 n=5+5) FloatSub/100000-4 13.6kB ± 0% 0.0kB ±NaN% -100.00% (p=0.008 n=5+5) Fixes #14868 Change-Id: Ia2b8b1a8ef0868288ecb25f812b17bd03ff40d1c Reviewed-on: https://go-review.googlesource.com/23568 Reviewed-by: Robert Griesemer <gri@golang.org>	2016-08-17 17:56:42 +00:00
Florian Uekermann	507144c011	math/rand: Document origin of cooked pseudo-random numbers The Source provided by math/rand relies on an array of cooked pseudo-random 63bit integers for seeding. The origin of these numbers is undocumented. Add a standalone program in math/rand folder that generates the 63bit integer array as well as a 64bit version supporting extension of the Source to 64bit pseudo-random number generation while maintaining the current sequence in the lower 63bit. The code is largely based on the initial implementation of the random number generator in the go repository by Ken Thompson (revision 399). Change-Id: Ib4192aea8127595027116a0f5a7be53f11dc110b Reviewed-on: https://go-review.googlesource.com/22230 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-08-17 14:50:18 +00:00
Josh Bleecher Snyder	302dd7b71e	crypto/cipher, math/big: fix example names Fixes (legit) vet warnings. Fix some verb tenses while we're here. Updates #11041 Change-Id: I27e995f55b38f4cf584e97a67b8545e8247e83d6 Reviewed-on: https://go-review.googlesource.com/27122 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: David Crawshaw <crawshaw@golang.org>	2016-08-16 14:36:32 +00:00
Josh Bleecher Snyder	40cf4ad0ef	all: fix "result not used" vet warnings For tests, assign to _. For benchmarks, assign to a sink. Updates #11041 Change-Id: I87c5543245c7bc74dceb38902f4551768dd37948 Reviewed-on: https://go-review.googlesource.com/27116 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-08-16 14:15:10 +00:00
Josh Bleecher Snyder	3357a02b74	math/big: use array instead of slice for deBruijn lookups This allows the compiler to remove a bounds check. math/big/nat.go:681: index bounds check elided math/big/nat.go:683: index bounds check elided Change-Id: Ieecb89ec5e988761b06764bd671672015cd58e9d Reviewed-on: https://go-review.googlesource.com/26663 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-08-16 00:22:13 +00:00
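A minimal sketch of the idea in this commit, assuming a 32-bit de Bruijn table like the one used by math/bits: because the table is a fixed-size array and the index is a 32-bit value shifted right by 27, the index is provably in the range 0..31, which gives the compiler the chance to drop the bounds check. The names `deBruijn32Lookup` and `trailingZeros32` are illustrative and are not taken from math/big.

```go
package main

import "fmt"

const deBruijn32 = 0x077CB531

// deBruijn32Lookup is an array, not a slice, so its length (32) is part
// of its type and known at compile time.
var deBruijn32Lookup = [32]byte{
	0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
	31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9,
}

// trailingZeros32 returns the number of trailing zero bits in x.
// x must be nonzero. The index is a uint32 shifted right by 27, so it
// always lies in [0, 31] and cannot exceed the array length.
func trailingZeros32(x uint32) uint {
	return uint(deBruijn32Lookup[((x&-x)*deBruijn32)>>27])
}

func main() {
	for _, x := range []uint32{1, 12, 1 << 20, 0x80000000} {
		fmt.Println(x, trailingZeros32(x))
	}
}
```

Building with `go build -gcflags=-d=ssa/check_bce/debug=1` lists the bounds checks that remain, which is one way to confirm the effect on a given toolchain.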
Ian Lance Taylor	fb3cf5c686	math/rand: fix raciness in Rand.Read There are no synchronization points protecting the readVal and readPos variables. This leads to a race when Read is called concurrently. Fix this by adding methods to lockedSource, which is the case where a race matters. Fixes #16308. Change-Id: Ic028909955700906b2d71e5c37c02da21b0f4ad9 Reviewed-on: https://go-review.googlesource.com/24852 Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>	2016-07-11 15:11:44 +00:00
Dmitri Popov	8d966bad6e	math/rand: fix io.Reader implementation Do not throw away the rest of the Int63 value used for generating random bytes. Save it in the Rand struct and reuse it during the next Read call. Fixes #16124 Change-Id: Ic70bd80c3c3a6590e60ac615e8b3c2324589bea3 Reviewed-on: https://go-review.googlesource.com/24251 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com> Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-06-27 22:18:09 +00:00
Konstantin Shaposhnikov	33fa855e6c	math/rand: fix comment about bits of seed used by the default Source Fixes #15788 Change-Id: I5a1fd1e5992f1c16cf8d8437d742bf02e1653b9c Reviewed-on: https://go-review.googlesource.com/23461 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-06-26 05:00:39 +00:00
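As a hedged illustration of the behaviour this comment describes: the math/rand documentation for Seed states that seed values with the same remainder when divided by 2^31-1 generate the same pseudo-random sequence. The helpers `firstValues` and `sameSequence` below are hypothetical names used to check that property for a Source from `rand.NewSource`.

```go
package main

import (
	"fmt"
	"math/rand"
)

// firstValues returns the first n values of a Source seeded with seed.
func firstValues(seed int64, n int) []int64 {
	src := rand.NewSource(seed)
	out := make([]int64, n)
	for i := range out {
		out[i] = src.Int63()
	}
	return out
}

// sameSequence reports whether seeds a and b yield the same first n values.
func sameSequence(a, b int64, n int) bool {
	x, y := firstValues(a, n), firstValues(b, n)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

func main() {
	const m = 1<<31 - 1 // modulus named in the Seed documentation
	fmt.Println(sameSequence(1, 1+m, 8)) // seeds congruent modulo m
	fmt.Println(sameSequence(1, 2, 8))   // seeds not congruent modulo m
}
```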
Nathan VanBenschoten	5e43dc943a	math/big: special-case a 0 mantissa during Rat parsing Previously, a 0 mantissa was special-cased during big.Float parsing, but not during big.Rat parsing. This meant that a value like 0e9999999999 would parse successfully in big.Float.SetString, but would hang in big.Rat.SetString. This discrepancy became an issue in https://golang.org/src/go/constant/value.go?#L250, where the big.Float would report an exponent of 0, so big.Rat.SetString would be used and would subsequently hang. A Go Playground example of this is https://play.golang.org/p/3fy28eUJuF The solution is to special-case a zero mantissa during big.Rat parsing as well, so that neither big.Rat nor big.Float will hang when parsing a value with 0 mantissa but a large exponent. This was discovered using go-fuzz on CockroachDB: https://github.com/cockroachdb/go-fuzz/blob/master/examples/parser/main.go Fixes #16176 Change-Id: I775558a8682adbeba1cc9d20ba10f8ed26259c56 Reviewed-on: https://go-review.googlesource.com/24430 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-06-24 20:51:06 +00:00
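The input from this commit message can be exercised with a short program. This is a sketch under the assumption that the fix is present in the Go release being used; `parseRat` and `parseFloatIsZero` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math/big"
)

// parseRat parses s with big.Rat.SetString and returns the value in
// "a/b" form together with the success flag.
func parseRat(s string) (string, bool) {
	r, ok := new(big.Rat).SetString(s)
	if !ok {
		return "", false
	}
	return r.String(), true
}

// parseFloatIsZero parses s with big.Float.SetString and reports
// whether the parsed value is zero, together with the success flag.
func parseFloatIsZero(s string) (isZero, ok bool) {
	f, ok := new(big.Float).SetString(s)
	if !ok {
		return false, false
	}
	return f.Sign() == 0, true
}

func main() {
	const s = "0e9999999999" // zero mantissa, very large exponent
	fmt.Println(parseRat(s))
	fmt.Println(parseFloatIsZero(s))
}
```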
Alberto Donizetti	5db44c17a2	math/big: avoid panic in float.Text with negative prec Fixes #15918 Change-Id: I4b434aed262960a2e6c659d4c2296fbf662c3a52 Reviewed-on: https://go-review.googlesource.com/23633 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-06-01 19:20:52 +00:00
Marcel van Lohuizen	2deb9209de	math/big: using Run for some more benchmarks Change-Id: I3ede8098f405de5d88e51c8370d3b68446d40744 Reviewed-on: https://go-review.googlesource.com/23428 Run-TryBot: Marcel van Lohuizen <mpvl@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>	2016-05-26 19:14:13 +00:00
Robert Griesemer	2168f2a68b	math/big: simplify benchmarking code some more Follow-up cleanup to https://golang.org/cl/23424/ . Change-Id: Ifb05c1ff5327df6bc5f4cbc554e18363293f7960 Reviewed-on: https://go-review.googlesource.com/23446 Reviewed-by: Marcel van Lohuizen <mpvl@golang.org>	2016-05-26 16:27:24 +00:00
Marcel van Lohuizen	07f0c19a30	math/big: use run for benchmarks shortens code and gives an example of the use of Run. Change-Id: I75ffaf762218a589274b4b62e19022e31e805d1b Reviewed-on: https://go-review.googlesource.com/23424 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: Marcel van Lohuizen <mpvl@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-05-25 17:49:37 +00:00
Jeff R. Allen	3474610fbc	math/rand: Doc fix for how many bits Seed uses Document the fact that the default Source uses only the bottom 31 bits of the given seed. Fixes #15788 Change-Id: If20d1ec44a55c793a4a0a388f84b9392c2102bd1 Reviewed-on: https://go-review.googlesource.com/23352 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-05-24 19:07:46 +00:00
Russ Cox	ab4414773e	math/big: write t*10 to multiply t by 10 The compiler has caught up. In fact the compiler is ahead; it knows about a magic multiply-by-5 instruction: // compute '0' + byte(r - t*10) in AX MOVQ t, AX LEAQ (AX)(AX*4), AX SHLQ $1, AX MOVQ r, CX SUBQ AX, CX LEAL 48(CX), AX For comparison, the shifty version compiles to: // compute '0' + byte(r - t*10) in AX MOVQ t, AX MOVQ AX, CX SHLQ $3, AX MOVQ r, DX SUBQ AX, DX SUBQ CX, DX SUBQ CX, DX LEAL 48(DX), AX Fixes #2671. Change-Id: Ifbf23dbfeb19c0bb020fa44eb2f025943969fb6b Reviewed-on: https://go-review.googlesource.com/23372 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Andrew Gerrand <adg@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-05-24 13:34:20 +00:00
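The two source forms compared in this commit can be written side by side. This is only a sketch: `digitMul` and `digitShift` are hypothetical names, and the shift-based expression is reconstructed from the assembly listed in the message (one shift by 3 followed by two subtractions of t), not copied from math/big.

```go
package main

import "fmt"

// digitMul computes the ASCII digit for r - t*10 using a plain multiply.
func digitMul(r, t uint) byte {
	return '0' + byte(r-t*10)
}

// digitShift computes the same digit using a shift and subtractions:
// r - 8t - t - t equals r - 10t.
func digitShift(r, t uint) byte {
	return '0' + byte(r-t<<3-t-t)
}

func main() {
	r := uint(1234)
	t := r / 10
	fmt.Printf("%c %c\n", digitMul(r, t), digitShift(r, t))
}
```

Both functions return the last decimal digit of r when t is r/10; the commit's point is that the readable multiply form no longer costs anything.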
Emmanuel Odeke	53fd522c0d	all: make copyright headers consistent with one space after period Follows suit with https://go-review.googlesource.com/#/c/20111. Generated by running $ grep -R 'Go Authors.  All' * | cut -d":" -f1 | while read F;do perl -pi -e 's/Go Authors.  All/Go Authors. All/g' $F;done The code in cmd/internal/unvendor wasn't changed. Fixes #15213 Change-Id: I4f235cee0a62ec435f9e8540a1ec08ae03b1a75f Reviewed-on: https://go-review.googlesource.com/21819 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-05-02 13:43:18 +00:00
Robert Griesemer	aea224386e	math/big: more tests, documentation for Float gob marshalling Follow-up to https://golang.org/cl/21755. This turned out to be a bit more than just a few nits as originally expected in that CL. 1) The actual mantissa may be shorter than required for the given precision (because of trailing 0's): no need to allocate space for it (and transmit 0's). This can save a lot of space when the precision is high: E.g., for prec == 1000, 16 words or 128 bytes are required at the most, but if the actual number is short, it may be much less (for the test cases present, it's significantly less). 2) The actual mantissa may be longer than the number of words required for the given precision: make sure to not overflow when encoding in bytes. 3) Add more documentation. 4) Add more tests. Change-Id: I9f40c408cfdd9183a8e81076d2f7d6c75e7a00e9 Reviewed-on: https://go-review.googlesource.com/22324 Reviewed-by: Alan Donovan <adonovan@google.com>	2016-04-20 21:16:21 +00:00
OneOfOne	d8c9dd6048	math/big: implement GobDecode/Encode for big.Float Added GobEncode/Decode and a test for them. Fixes #14593 Change-Id: Ic8d3efd24d0313a1a66f01da293c4c1fd39764a8 Reviewed-on: https://go-review.googlesource.com/21755 Reviewed-by: Robert Griesemer <gri@golang.org>	2016-04-20 17:51:01 +00:00
Matthew Dempsky	0da4dbe232	all: remove unnecessary type conversions cmd and runtime were handled separately, and I've intentionally skipped syscall. This is the rest of the standard library. CL generated mechanically with github.com/mdempsky/unconvert. Change-Id: I9e0eff886974dedc37adb93f602064b83e469122 Reviewed-on: https://go-review.googlesource.com/22104 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-04-15 07:31:45 +00:00
Aliaksandr Valialkin	187afdebef	math/big: re-use memory in Int.GCD This improves TLS handshake performance. benchmark old ns/op new ns/op delta BenchmarkGCD10x10/WithoutXY-4 965 968 +0.31% BenchmarkGCD10x10/WithXY-4 1813 1391 -23.28% BenchmarkGCD10x100/WithoutXY-4 1093 1075 -1.65% BenchmarkGCD10x100/WithXY-4 2348 1676 -28.62% BenchmarkGCD10x1000/WithoutXY-4 1569 1565 -0.25% BenchmarkGCD10x1000/WithXY-4 4262 3242 -23.93% BenchmarkGCD10x10000/WithoutXY-4 6069 6066 -0.05% BenchmarkGCD10x10000/WithXY-4 12123 11331 -6.53% BenchmarkGCD10x100000/WithoutXY-4 52664 52610 -0.10% BenchmarkGCD10x100000/WithXY-4 97494 95649 -1.89% BenchmarkGCD100x100/WithoutXY-4 5244 5228 -0.31% BenchmarkGCD100x100/WithXY-4 22572 18630 -17.46% BenchmarkGCD100x1000/WithoutXY-4 6143 6233 +1.47% BenchmarkGCD100x1000/WithXY-4 24652 19357 -21.48% BenchmarkGCD100x10000/WithoutXY-4 15725 15804 +0.50% BenchmarkGCD100x10000/WithXY-4 60552 55973 -7.56% BenchmarkGCD100x100000/WithoutXY-4 107008 107853 +0.79% BenchmarkGCD100x100000/WithXY-4 349597 340994 -2.46% BenchmarkGCD1000x1000/WithoutXY-4 63785 64434 +1.02% BenchmarkGCD1000x1000/WithXY-4 373186 334035 -10.49% BenchmarkGCD1000x10000/WithoutXY-4 78038 78241 +0.26% BenchmarkGCD1000x10000/WithXY-4 543692 507034 -6.74% BenchmarkGCD1000x100000/WithoutXY-4 205607 207727 +1.03% BenchmarkGCD1000x100000/WithXY-4 2488113 2415323 -2.93% BenchmarkGCD10000x10000/WithoutXY-4 1731340 1714992 -0.94% BenchmarkGCD10000x10000/WithXY-4 10601046 7111329 -32.92% BenchmarkGCD10000x100000/WithoutXY-4 2239155 2212173 -1.21% BenchmarkGCD10000x100000/WithXY-4 30097040 26538887 -11.82% BenchmarkGCD100000x100000/WithoutXY-4 119845326 119863916 +0.02% BenchmarkGCD100000x100000/WithXY-4 768006543 426795966 -44.43% benchmark old allocs new allocs delta BenchmarkGCD10x10/WithoutXY-4 5 5 +0.00% BenchmarkGCD10x10/WithXY-4 17 9 -47.06% BenchmarkGCD10x100/WithoutXY-4 6 6 +0.00% BenchmarkGCD10x100/WithXY-4 21 9 -57.14% BenchmarkGCD10x1000/WithoutXY-4 6 6 +0.00% 
BenchmarkGCD10x1000/WithXY-4 30 12 -60.00% BenchmarkGCD10x10000/WithoutXY-4 6 6 +0.00% BenchmarkGCD10x10000/WithXY-4 26 12 -53.85% BenchmarkGCD10x100000/WithoutXY-4 6 6 +0.00% BenchmarkGCD10x100000/WithXY-4 28 12 -57.14% BenchmarkGCD100x100/WithoutXY-4 5 5 +0.00% BenchmarkGCD100x100/WithXY-4 183 61 -66.67% BenchmarkGCD100x1000/WithoutXY-4 8 8 +0.00% BenchmarkGCD100x1000/WithXY-4 170 47 -72.35% BenchmarkGCD100x10000/WithoutXY-4 8 8 +0.00% BenchmarkGCD100x10000/WithXY-4 200 67 -66.50% BenchmarkGCD100x100000/WithoutXY-4 8 8 +0.00% BenchmarkGCD100x100000/WithXY-4 188 65 -65.43% BenchmarkGCD1000x1000/WithoutXY-4 5 5 +0.00% BenchmarkGCD1000x1000/WithXY-4 2435 1193 -51.01% BenchmarkGCD1000x10000/WithoutXY-4 8 8 +0.00% BenchmarkGCD1000x10000/WithXY-4 2211 1076 -51.33% BenchmarkGCD1000x100000/WithoutXY-4 8 8 +0.00% BenchmarkGCD1000x100000/WithXY-4 2271 1108 -51.21% BenchmarkGCD10000x10000/WithoutXY-4 5 5 +0.00% BenchmarkGCD10000x10000/WithXY-4 23183 11605 -49.94% BenchmarkGCD10000x100000/WithoutXY-4 8 8 +0.00% BenchmarkGCD10000x100000/WithXY-4 23421 11717 -49.97% BenchmarkGCD100000x100000/WithoutXY-4 5 5 +0.00% BenchmarkGCD100000x100000/WithXY-4 232976 116815 -49.86% benchmark old bytes new bytes delta BenchmarkGCD10x10/WithoutXY-4 208 208 +0.00% BenchmarkGCD10x10/WithXY-4 736 432 -41.30% BenchmarkGCD10x100/WithoutXY-4 256 256 +0.00% BenchmarkGCD10x100/WithXY-4 896 432 -51.79% BenchmarkGCD10x1000/WithoutXY-4 368 368 +0.00% BenchmarkGCD10x1000/WithXY-4 1856 1152 -37.93% BenchmarkGCD10x10000/WithoutXY-4 1616 1616 +0.00% BenchmarkGCD10x10000/WithXY-4 7920 7376 -6.87% BenchmarkGCD10x100000/WithoutXY-4 13776 13776 +0.00% BenchmarkGCD10x100000/WithXY-4 68800 68176 -0.91% BenchmarkGCD100x100/WithoutXY-4 208 208 +0.00% BenchmarkGCD100x100/WithXY-4 6960 2112 -69.66% BenchmarkGCD100x1000/WithoutXY-4 544 560 +2.94% BenchmarkGCD100x1000/WithXY-4 7280 2400 -67.03% BenchmarkGCD100x10000/WithoutXY-4 2896 2912 +0.55% BenchmarkGCD100x10000/WithXY-4 15280 10002 -34.54% 
BenchmarkGCD100x100000/WithoutXY-4 27344 27365 +0.08% BenchmarkGCD100x100000/WithXY-4 88288 83427 -5.51% BenchmarkGCD1000x1000/WithoutXY-4 544 544 +0.00% BenchmarkGCD1000x1000/WithXY-4 178288 40043 -77.54% BenchmarkGCD1000x10000/WithoutXY-4 3344 3136 -6.22% BenchmarkGCD1000x10000/WithXY-4 188720 54432 -71.16% BenchmarkGCD1000x100000/WithoutXY-4 27792 27592 -0.72% BenchmarkGCD1000x100000/WithXY-4 373872 239447 -35.95% BenchmarkGCD10000x10000/WithoutXY-4 4288 4288 +0.00% BenchmarkGCD10000x10000/WithXY-4 11935584 481875 -95.96% BenchmarkGCD10000x100000/WithoutXY-4 31296 28834 -7.87% BenchmarkGCD10000x100000/WithXY-4 13237088 1662620 -87.44% BenchmarkGCD100000x100000/WithoutXY-4 40768 40768 +0.00% BenchmarkGCD100000x100000/WithXY-4 1165518864 14256010 -98.78% Change-Id: I652b3244bd074a03f3bc9a87c282330f9e5f1507 Reviewed-on: https://go-review.googlesource.com/21506 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-04-07 17:19:37 +00:00
Michael Munday	1e7c61d8c3	math/big: add s390x function implementations Change-Id: I2aadc885d6330460e494c687757f07c5e006f3b0 Reviewed-on: https://go-review.googlesource.com/20937 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-04-07 16:15:27 +00:00
Michael Munday	0382a30dd6	math: add functions and stubs for s390x Includes assembly implementations of Sqrt and Dim. Change-Id: I57472e8d31e2ee74bcebf9f8e818f765eb9b8abf Reviewed-on: https://go-review.googlesource.com/20936 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-04-06 23:35:56 +00:00
Robert Griesemer	7c86263be2	math/big: much simplified and faster Float rounding Change-Id: Iab0add7aee51a8c72a81f51d980d22d2fd612f5c Reviewed-on: https://go-review.googlesource.com/20817 Reviewed-by: Alan Donovan <adonovan@google.com>	2016-03-22 17:07:34 +00:00
Robert Griesemer	a14537816e	math/big: fix rounding to smallest denormal for Float.Float32/64 Converting a big.Float value x to a float32/64 value did not correctly round x up to the smallest denormal float32/64 if x was smaller than the smallest denormal float32/64, but larger than 0.5 of a smallest denormal float32/64. Handle this case explicitly and simplify some code in the turn. For #14651. Change-Id: I025e24bf8f0e671581a7de0abf7c1cd7e6403a6c Reviewed-on: https://go-review.googlesource.com/20816 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com>	2016-03-21 20:24:06 +00:00
Robert Griesemer	bea2008b83	math/cmplx: added clarifying comment Fixes #14890. Change-Id: Ie790276b0e2ef94c92db3a777042d750269f876a Reviewed-on: https://go-review.googlesource.com/20953 Reviewed-by: Alan Donovan <adonovan@google.com>	2016-03-21 16:18:38 +00:00
Dominik Honnef	b2cf571040	all: delete dead test code This deletes unused code and helpers from tests. Change-Id: Ie31d46115f558ceb8da6efbf90c3c204e03b0d7e Reviewed-on: https://go-review.googlesource.com/20927 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-03-21 07:10:08 +00:00
Matthew Dempsky	e28a3929ef	math/big: cleanup documentation for Format methods 'b' is a standard verb for floating point values. The runes like '+' and '#' are called "flags" by package fmt's documentation. The flag '-' controls left/right justification, not anything related to signs. Change-Id: Ia9cf81b002df373f274ce635fe09b5bd0066aa1c Reviewed-on: https://go-review.googlesource.com/20930 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-03-21 02:49:32 +00:00
Matthew Dempsky	95c6c5f36b	math/big: fix comment typos Change-Id: I34cdc9cb3d32e86ff3a57db0012326c39cd55670 Reviewed-on: https://go-review.googlesource.com/20718 Reviewed-by: Robert Griesemer <gri@golang.org>	2016-03-15 20:41:15 +00:00
Lynn Boger	b4b2ddb867	math: improve sqrt for ppc64le,ppc64 The existing implementation uses code written in Go to implement Sqrt; this adds the assembler to use the sqrt instruction for Power and makes the necessary changes to allow it to be inlined. The following tests showed this relative improvement: benchmark delta BenchmarkSqrt -97.91% BenchmarkSqrtIndirect -96.65% BenchmarkSqrtGo -35.93% BenchmarkSqrtPrime -96.94% Fixes #14349 Change-Id: I8074f4dc63486e756587564ceb320aca300bf5fa Reviewed-on: https://go-review.googlesource.com/19515 Reviewed-by: Minux Ma <minux@golang.org>	2016-03-10 15:01:21 +00:00
Brady Catherman	9323de3da7	testing: implement 'Unordered Output' in Examples. Adds a type of output to Examples that allows tests to have unordered output. This is intended to help clarify when the output of a command will produce a fixed return, but that return might not be in a constant order. Examples where this is useful would be documenting the rand.Perm() call, or perhaps the (os.File).Readdir(), both of which cannot guarantee order, but can guarantee the elements of the output. Fixes #10149 Change-Id: Iaf0cf1580b686afebd79718ed67ea744f5ed9fc5 Reviewed-on: https://go-review.googlesource.com/19280 Reviewed-by: Andrew Gerrand <adg@golang.org>	2016-03-09 04:34:41 +00:00
Robert Griesemer	3858efcc58	math/big: use correct precision in Float.Float32/64 for denormals When a big.Float is converted to a denormal float32/64, the rounding precision depends on the size of the denormal. Rounding may round up and thus change the size (exponent) of the denormal. Recompute the correct precision again for correct placement of the mantissa. Fixes #14553. Change-Id: Iedab5810a2d2a405cc5da28c6de7be34cb035b86 Reviewed-on: https://go-review.googlesource.com/20198 Reviewed-by: Alan Donovan <adonovan@google.com>	2016-03-04 17:39:50 +00:00
Rob Pike	0f9cc465fa	math: delete unused function sqrtC It appears to be a trivial dreg. Unreferenced. Gone. Change-Id: I4a5ceed48e84254bc8a07fdb04487a18a0edf965 Reviewed-on: https://go-review.googlesource.com/20122 Run-TryBot: Rob Pike <r@golang.org> Reviewed-by: Dave Cheney <dave@cheney.net>	2016-03-03 02:29:09 +00:00
Brad Fitzpatrick	5fea2ccc77	all: single space after period. The tree's pretty inconsistent about single space vs double space after a period in documentation. Make it consistently a single space, per earlier decisions. This means contributors won't be confused by misleading precedent. This CL doesn't use go/doc to parse. It only addresses // comments. It was generated with: $ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])') $ go test go/doc -update Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7 Reviewed-on: https://go-review.googlesource.com/20022 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dave Day <djd@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-03-02 00:13:47 +00:00
Brad Fitzpatrick	519474451a	all: make copyright headers consistent with one space after period This is a subset of https://golang.org/cl/20022 with only the copyright header lines, so the next CL will be smaller and more reviewable. Go policy has been single space after periods in comments for some time. The copyright header template at: https://golang.org/doc/contribute.html#copyright also uses a single space. Make them all consistent. Change-Id: Icc26c6b8495c3820da6b171ca96a74701b4a01b0 Reviewed-on: https://go-review.googlesource.com/20111 Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-03-01 23:34:33 +00:00
Martin Möhrmann	fdd0179bb1	all: fix typos and spelling Change-Id: Icd06d99c42b8299fd931c7da821e1f418684d913 Reviewed-on: https://go-review.googlesource.com/19829 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-02-24 18:42:29 +00:00
Nathan VanBenschoten	b04f3b06ec	all: replace strings.Index with strings.Contains where possible Change-Id: Ia613f1c37bfce800ece0533a5326fca91d99a66a Reviewed-on: https://go-review.googlesource.com/18120 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org>	2016-02-19 01:06:05 +00:00
Russ Cox	1e066cad1b	math/big: fix Exp(x, x, x) for certain large x Fixes #13907. Change-Id: Ieaa5183f399b12a9177372212adf481c8f0b4a0d Reviewed-on: https://go-review.googlesource.com/18491 Reviewed-by: Robert Griesemer <gri@golang.org> Reviewed-by: Vlad Krasnov <vlad@cloudflare.com> Reviewed-by: Adam Langley <agl@golang.org>	2016-01-13 01:43:35 +00:00
Robert Griesemer	5e059d1c31	math/big: fix typo in comment Fixes #13875. Change-Id: Icbb85c858d0bc545499a2b31622e9e7abdd7e5f9 Reviewed-on: https://go-review.googlesource.com/18441 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-01-08 20:18:10 +00:00
Russ Cox	04d732b4c2	build: shorten a few packages with long tests Takes 3% off my all.bash run time. For #10571. Change-Id: I8f00f523d6919e87182d35722a669b0b96b8218b Reviewed-on: https://go-review.googlesource.com/18087 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-12-29 15:46:44 +00:00
Russ Cox	6bcec09ceb	math/big: additional Montgomery cleanup Also fix bug reported in CL 17510. Found during fix of #13515 in CL 17672, but separate from the fix. Change-Id: I4b1024569a98f5cfd2ebb442ec3d64356164d284 Reviewed-on: https://go-review.googlesource.com/17673 Reviewed-by: Robert Griesemer <gri@golang.org>	2015-12-16 20:25:29 +00:00
Russ Cox	4306352182	math/big: fix carry propagation in Int.Exp Montgomery code Fixes #13515. Change-Id: I7dd5fbc816e5ea135f7d81f6735e7601f636fe4f Reviewed-on: https://go-review.googlesource.com/17672 Reviewed-by: Robert Griesemer <gri@golang.org>	2015-12-11 17:24:29 +00:00
David Chase	38255cbd1b	math/rand: improve uniformity of rand.Float64,Float32 Replaced code that substituted 0 for rounded-up 1 with code to try again. This has minimal effect on the existing stream of random numbers, but restores uniformity. Fixes #12290. Change-Id: Ib68f0b0a4a173339bcd0274cc16509f7b0977de8 Reviewed-on: https://go-review.googlesource.com/17670 Reviewed-by: Russ Cox <rsc@golang.org> Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-12-11 15:17:42 +00:00
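The "try again" approach can be sketched as follows. This is an illustration of the technique named in the commit, not a copy of the math/rand source; `float64From` and the `seq` stub are hypothetical names, and the stub exists only to force the rounded-up case.

```go
package main

import "fmt"

// float64From turns 63-bit values from next into a float64 in [0.0, 1.0).
// A value close to 1<<63 can round up to exactly 1.0 when converted;
// instead of substituting 0 for it, another value is drawn.
func float64From(next func() int64) float64 {
	for {
		f := float64(next()) / (1 << 63)
		if f != 1 {
			return f
		}
		// f rounded up to 1.0: discard it and draw again.
	}
}

// seq is a stub generator that replays a fixed list of values.
type seq struct {
	vals []int64
	i    int
}

// Int63 returns the next value in the list, wrapping around at the end.
func (s *seq) Int63() int64 {
	v := s.vals[s.i%len(s.vals)]
	s.i++
	return v
}

func main() {
	// The first value rounds up to 1.0 and is rejected.
	s := &seq{vals: []int64{1<<63 - 1, 1 << 62}}
	fmt.Println(float64From(s.Int63), s.i)
}
```

Substituting 0 for the rounded-up values would make 0 slightly more likely than its neighbours, which is the uniformity problem the commit addresses.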
Russ Cox	0816432918	math/big: fix misuse of Unicode ˆ (U+02C6) is a circumflex accent, not an exponentiation operator. In the rest of the source code for this package, exponentiation is written as **, so do the same here. Change-Id: I107b85be242ab79d152eb8a6fcf3ca2b197d7658 Reviewed-on: https://go-review.googlesource.com/17671 Reviewed-by: Robert Griesemer <gri@golang.org>	2015-12-11 03:37:57 +00:00
Brad Fitzpatrick	7e1791b97f	math/big: fix typo Found by github user asukakenji. Change-Id: I4c76316b69e8a243fb6bf280283f3722e728d853 Reviewed-on: https://go-review.googlesource.com/17641 Reviewed-by: Robert Griesemer <gri@golang.org>	2015-12-09 16:59:52 +00:00
Jingguo Yao	fb01ad21c2	math/rand: add a comment for the i=0 iteration Fixes #13215 Change-Id: I126117d42e7c1e69cbc7fad0760e225b03ed15bd Reviewed-on: https://go-review.googlesource.com/16852 Reviewed-by: Keith Randall <khr@golang.org>	2015-11-15 08:21:29 +00:00
Yao Zhang	4a25f6ca05	math, math/big: added support for mips64{,le} Change-Id: I5129a5b9dbbc57d97da723c2fc247bd28f951817 Reviewed-on: https://go-review.googlesource.com/14451 Reviewed-by: Minux Ma <minux@golang.org>	2015-11-12 04:49:57 +00:00
Matthew Dempsky	7832c82bf5	math: fix bad shift in Expm1 Noticed by cmd/vet. Expected values array produced by Python instead of Keisan because: 1) Keisan's website calculator is painfully difficult to copy/paste values into and out of, and 2) after tediously computing e^(vf[i] * 10) - 1 via Keisan I discovered that Keisan computing vf[i]*10 in a higher precision was giving substantially different output values. Also, testing uses "close" instead of "veryclose" because 386's assembly implementation produces values for some of the test cases that fail "veryclose". Curiously, Expm1(vf[i]*10) is identical to Exp(vf[i]*10)-1 on 386, whereas with the portable implementation they're only "veryclose". Investigating these questions is left to someone else. I just wanted to fix the cmd/vet warning. Fixes #13101. Change-Id: Ica8f6c267d01aa4cc31f53593e95812746942fbc Reviewed-on: https://go-review.googlesource.com/16505 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Klaus Post <klauspost@gmail.com> Reviewed-by: Robert Griesemer <gri@golang.org>	2015-10-30 22:55:19 +00:00
Brad Fitzpatrick	a59a27564b	math: fix typo and braino in my earlier commit The bug number was a typo, and I forgot to switch the implementation back to if statements after the change from Float64bits in the first patchset back to branching. if statements can currently be inlined, but switch cannot (#13071) Change-Id: I81d0cf64bda69186c3d747a07047f6a694f8fa70 Reviewed-on: https://go-review.googlesource.com/16446 Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-10-29 21:12:08 +00:00
Brad Fitzpatrick	6f8a66536b	math: replace assembly implementations of Abs with pure Go version The compiler can do a fine job, and can also inline it. From Jeremy Jackins's observation and rsc's recommendation in thread: "Pure Go math.Abs outperforms assembly version" https://groups.google.com/forum/#!topic/golang-dev/nP5mWvwAXZo Updates #13095 Change-Id: I3066f8eaa327bb403173b29791cc8661d7c0532c Reviewed-on: https://go-review.googlesource.com/16444 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-10-29 19:44:59 +00:00
Caleb Spare	22d4c8bf13	math: fix normalization bug in pure-Go sqrt Fixes #13013 Change-Id: I6cf500eacdce76e303fc1cd92dd1c80eef0986bc Reviewed-on: https://go-review.googlesource.com/16158 Reviewed-by: Andrew Gerrand <adg@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-10-23 18:29:10 +00:00
David Crawshaw	e4feb18fc2	math/big: fix SetMantExp comment Change-Id: If30cf9c94b58e18564db46c15c6f5cc14ec1a6fa Reviewed-on: https://go-review.googlesource.com/16271 Reviewed-by: Robert Griesemer <gri@golang.org>	2015-10-23 16:38:27 +00:00
Charlie Dorian	6fed2a68f7	math: Modf(-0) returns -0,-0 Fixes #12867 Change-Id: I8ba81c622bce2a77a6142f941603198582eaf8a4 Reviewed-on: https://go-review.googlesource.com/15570 Reviewed-by: Robert Griesemer <gri@golang.org>	2015-10-09 17:09:16 +00:00
Charlie Dorian	1ef9b5a5b9	math/cmplx: make error tolerance test function of expected value Copy math package CL 12230 to cmplx package. Change-Id: I3345b782b84b5b98e2b6a60d8774c7e7cede2891 Reviewed-on: https://go-review.googlesource.com/15500 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-10-07 18:41:11 +00:00
Damian Gryski	2fd016422e	math/big: check return value from quick.Check() for GCD tests Change-Id: I46c12aaaf453365c157604dfb1486605cfefd7af Reviewed-on: https://go-review.googlesource.com/15263 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-10-03 16:06:44 +00:00
Ilya Tocar	37cfb2e07e	math: optimize ceil/floor functions on amd64 Use SSE 4.1 rounding instruction to perform rounding Results (haswell): name old time/op new time/op delta Floor-48 2.71ns ± 0% 1.87ns ± 1% -31.17% (p=0.000 n=16+19) Ceil-48 3.09ns ± 3% 2.16ns ± 0% -30.16% (p=0.000 n=19+12) Change-Id: If63715879eed6530b1eb4fc96132d827f8f43909 Reviewed-on: https://go-review.googlesource.com/14561 Reviewed-by: Klaus Post <klauspost@gmail.com> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2015-10-03 15:55:08 +00:00
Adam Langley	5d5889c4d9	math/big: correct documentation for ProbablyPrime. As akalin points out in the bug, the comment previously claimed that the probability that the input is prime given that the function returned true is 1 - ¼ⁿ. But that's wrong: the correct statement is that the probability of the function returning false given a composite input is 1 - ¼ⁿ. This is not nearly as helpful, but at least it's truthful. A number of other (correct) expressions are suggested on the bug, but I think that the simpler one is preferable. This change also notes that the function is not suitable for adversarial inputs since it's deterministic. Fixes #12274. Change-Id: I6a0871d103b126ee5a5a922a8c6993055cb7b1ed Reviewed-on: https://go-review.googlesource.com/14052 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-09-30 00:39:00 +00:00
Robert Griesemer	3b9e8bb7f2	math/big: more documentation Good enough for now. Fixes #11241. Change-Id: Ieb50809f104d20bcbe14daecac503f72486bec92 Reviewed-on: https://go-review.googlesource.com/15111 Reviewed-by: Alan Donovan <adonovan@google.com>	2015-09-29 00:23:51 +00:00
Robert Griesemer	18563f8ab4	math/big: clean up *Int encoding tests - more uniform naming - test sign more deliberately - remove superfluous test (JSON encoder always uses the JSON marshaler if present) Change-Id: I37b1e367c01fc8bae1e06adbdb72dd366c08d5ce Reviewed-on: https://go-review.googlesource.com/15110 Reviewed-by: Alan Donovan <adonovan@google.com>	2015-09-29 00:23:31 +00:00
Robert Griesemer	38c5fd5cf8	math/big: implement Float.Text(Un)Marshaler Fixes #12256. Change-Id: Ie4a3337996da5c060b27530b076048ffead85f3b Reviewed-on: https://go-review.googlesource.com/15040 Reviewed-by: Alan Donovan <adonovan@google.com>	2015-09-29 00:21:45 +00:00
Robert Griesemer	3d4cd144cc	math/big: improved documentation - moved existing package documentation from nat.go to doc.go - expanded on it For #11241. Change-Id: Ie75a2b0178a8904a4154307a1f5080d7efc5489a Reviewed-on: https://go-review.googlesource.com/15042 Reviewed-by: Alan Donovan <adonovan@google.com>	2015-09-28 16:27:52 +00:00
Robert Griesemer	59129c6a93	math/big: remove some string conversions in Int encoding Change-Id: I1180aa3d30fb8563c8e6ecefeb3296af0a88f5a6 Reviewed-on: https://go-review.googlesource.com/14998 Reviewed-by: Alan Donovan <adonovan@google.com>	2015-09-25 22:25:52 +00:00
Robert Griesemer	7fa5a11ea1	math/big: move Int/Rat gob/json/xml functionality in separate files Like int/rat/float conversions, move this functionality into separate implementation and test files. No implementation changes besides the move. Change-Id: If19c45f5a72a57b95cbce2329724693ae5a4807d Reviewed-on: https://go-review.googlesource.com/14997 Reviewed-by: Alan Donovan <adonovan@google.com>	2015-09-25 22:25:36 +00:00
Robert Griesemer	e937eeeccd	math/big: removed more unnecessary string conversions - renamed (nat) itoa to utoa (since that's what it is) - added (nat) itoa that takes a sign parameter; this helps removing a few string copies - used buffers instead of string+ in Rat conversions Change-Id: I6b37a6b39557ae311cafdfe5c4a26e9246bde1a9 Reviewed-on: https://go-review.googlesource.com/14995 Reviewed-by: Alan Donovan <adonovan@google.com>	2015-09-25 22:25:12 +00:00
Robert Griesemer	8d701f092d	math/big: implement Int.Text, Int.Append This makes the Int conversion routines match the respective strconv and big.Float conversion routines. Change-Id: I5cfcda1632ee52fe87c5bb75892bdda76cc3af15 Reviewed-on: https://go-review.googlesource.com/14994 Reviewed-by: Alan Donovan <adonovan@google.com>	2015-09-25 22:24:49 +00:00
Robert Griesemer	b07a9efa78	math/big: faster string conversion routines Eliminated unnecessary string conversions throughout and removed (internal) capability for arbitrary character sets in conversion routines (functionality was not exported and not used internally). benchmark old ns/op new ns/op delta BenchmarkDecimalConversion-8 198283 187085 -5.65% BenchmarkStringPiParallel-8 46116 47822 +3.70% BenchmarkString10Base2-8 216 166 -23.15% BenchmarkString100Base2-8 886 762 -14.00% BenchmarkString1000Base2-8 7296 6625 -9.20% BenchmarkString10000Base2-8 72371 65563 -9.41% BenchmarkString100000Base2-8 725849 672766 -7.31% BenchmarkString10Base8-8 160 114 -28.75% BenchmarkString100Base8-8 398 309 -22.36% BenchmarkString1000Base8-8 2650 2244 -15.32% BenchmarkString10000Base8-8 24974 21745 -12.93% BenchmarkString100000Base8-8 245457 217489 -11.39% BenchmarkString10Base10-8 337 288 -14.54% BenchmarkString100Base10-8 1298 1046 -19.41% BenchmarkString1000Base10-8 6200 5752 -7.23% BenchmarkString10000Base10-8 24942 22589 -9.43% BenchmarkString100000Base10-8 8012921 7947152 -0.82% BenchmarkString10Base16-8 156 107 -31.41% BenchmarkString100Base16-8 344 255 -25.87% BenchmarkString1000Base16-8 2067 1705 -17.51% BenchmarkString10000Base16-8 19026 16112 -15.32% BenchmarkString100000Base16-8 184038 163457 -11.18% Change-Id: I68bd807529bd9b985f4b6ac2a87764bcc1a7d2f7 Reviewed-on: https://go-review.googlesource.com/14926 Reviewed-by: Alan Donovan <adonovan@google.com>	2015-09-24 23:07:47 +00:00
Robert Griesemer	3f7c3e01db	math/big: fix test for denormalized inputs and enable more test cases Also: removed unnecessary BUG comment (was fixed). Change-Id: I8f11fbcb4e30a19ec5a25df742b3e25e2ee7f846 Reviewed-on: https://go-review.googlesource.com/14923 Reviewed-by: Alan Donovan <adonovan@google.com>	2015-09-24 22:23:55 +00:00
Robert Griesemer	59a6ba5634	math/big: factored out an internal accessor method (cleanup), added benchmark Current result of DecimalConversion benchmark (for future reference): BenchmarkDecimalConversion-8 10000 204770 ns/op Measured on Mac Mini (late 2012) running OS X 10.10.5, 2.3 GHz Intel Core i7, 8 GB 1333 MHz DDR3. Also: Removed comment suggesting to implement decimal by representing digits as numbers 0..9 rather than ASCII chars '0'..'9' to avoid repeated +/-'0' operations. Tried and it appears (per above benchmark) that the +/-'0' operations are negligible but the additional conversion passes around it are not and that it makes things significantly slower. Change-Id: I6ee033b1172043248093cc5d02abff5fc54c2e7a Reviewed-on: https://go-review.googlesource.com/14857 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com>	2015-09-23 20:25:59 +00:00
Robert Griesemer	4fc9565ffc	math/big: implement negative precision for Float.Append/Text Enabled all but a handful of disabled Float formatting test cases. Fixes #10991. Change-Id: Id18e160e857be2743429a377000e996978015a1a Reviewed-on: https://go-review.googlesource.com/14850 Reviewed-by: Alan Donovan <adonovan@google.com>	2015-09-23 16:45:38 +00:00
Robert Griesemer	b5d94b7d41	math/big: add test cases for min/max exponent values Change-Id: I2e74e39628285e2fecaab712be6cff230619a6c2 Reviewed-on: https://go-review.googlesource.com/14778 Reviewed-by: Alan Donovan <adonovan@google.com>	2015-09-22 07:59:02 +00:00
Robert Griesemer	16b3675bc8	math/big: optimize Float.Parse by reducing powers of 10 to powers of 2 and 5 Instead of computing the final adjustment factor as a power of 10, it's more efficient to split 10**e into 2**e * 5**e . Powers of 2 are trivially added to the Float exponent, and powers of 5 are smaller and thus faster to compute. Also, use a table of uint64 values rather than float64 values for initial power value. uint64 values appear to be faster to convert to Floats (useful for small exponents). Added two small benchmarks to confirm that there's no regression. benchmark old ns/op new ns/op delta BenchmarkParseFloatSmallExp-8 17543 16220 -7.54% BenchmarkParseFloatLargeExp-8 60865 59996 -1.43% Change-Id: I3efd7556b023316f86f334137a67fe0c6d52f8ef Reviewed-on: https://go-review.googlesource.com/14782 Reviewed-by: Alan Donovan <adonovan@google.com>	2015-09-22 05:48:02 +00:00
Spencer Nelson	f9e404c1c5	math/rand: make Rand fulfill the Reader interface Add a Read function to Rand which reads random bytes into a buffer. Fixes #8330 Change-Id: I85b90277b8be9287c6697def8dbefe0029b6ee06 Reviewed-on: https://go-review.googlesource.com/14522 Reviewed-by: Rob Pike <r@golang.org>	2015-09-16 17:54:01 +00:00
Alberto Donizetti	3d5bed2726	math/big: Add small complete example of big.Rat usage Updates #11241 Change-Id: If71f651f3b8aca432c91314358b93f195217d9ec Reviewed-on: https://go-review.googlesource.com/14317 Reviewed-by: Robert Griesemer <gri@golang.org>	2015-09-15 17:46:22 +00:00
Rob Pike	67ddae87b9	all: use one 'l' when cancelling everywhere except Solaris Fixes #11626. Change-Id: I1b70c0844473c3b57a53d7cca747ea5cdc68d232 Reviewed-on: https://go-review.googlesource.com/14526 Run-TryBot: Rob Pike <r@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-09-11 18:31:51 +00:00
Robert Griesemer	a370fbaac6	math/big: use more direct formatting in ExampleRoundingMode, cosmetic changes Change-Id: I3d37391af2089881a5bd4d8f3e5d434b279c272e Reviewed-on: https://go-review.googlesource.com/14490 Reviewed-by: Chris Manghane <cmang@golang.org>	2015-09-10 22:10:41 +00:00
Konstantin Shaposhnikov	e216735dfa	math/big: add example for RoundingMode Updates #11241 Change-Id: I0614c5a9a7a4c399ad5d664f36c70c3210911905 Reviewed-on: https://go-review.googlesource.com/14356 Reviewed-by: Robert Griesemer <gri@golang.org>	2015-09-10 19:58:29 +00:00
Konstantin Shaposhnikov	49fb8cc10c	all: minor documentation tweaks for constants Block comments appear after a block in the HTML documentation generated by godoc. Words like "following" should be avoided. Change-Id: Iedfad67f4b8b9c84f128b98b9b06fa76919af388 Reviewed-on: https://go-review.googlesource.com/14357 Reviewed-by: Rob Pike <r@golang.org>	2015-09-09 05:07:52 +00:00
David Leon Gil	ea0491b70a	math/big: use optimized formula in ModSqrt for 3 mod 4 primes For primes which are 3 mod 4, using Tonelli-Shanks is slower and more complicated than using the identity a**((p+1)/4) mod p == sqrt(a) For 2^450-2^225-1 and 2^10860-2^5430-1, which are 3 mod 4: BenchmarkModSqrt225_TonelliTri 1000 1135375 ns/op BenchmarkModSqrt225_3Mod4 10000 156009 ns/op BenchmarkModSqrt5430_Tonelli 1 3448851386 ns/op BenchmarkModSqrt5430_3Mod4 2 914616710 ns/op ~2.6x to 7x faster. Fixes #11437 (which is a prime choice of issues to fix) Change-Id: I813fb29454160483ec29825469e0370d517850c2 Reviewed-on: https://go-review.googlesource.com/11522 Reviewed-by: Adam Langley <agl@golang.org>	2015-08-29 19:11:03 +00:00
Tarmigan Casebolt	e893724e75	math: avoid unused assignment in jn.go Change-Id: Ie4f21bcd5849e994c63ec5bbda2dee6f3ec4da12 Reviewed-on: https://go-review.googlesource.com/13891 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-08-24 14:34:36 +00:00
Robert Griesemer	e288271773	math/big: fix TestBytes test Fixes #12231. Change-Id: I1f07c444623cd864667e21b2fee534eacdc193bb Reviewed-on: https://go-review.googlesource.com/13814 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-08-21 20:02:40 +00:00
Alberto Donizetti	13b5dc885b	math/big: correctly handle large exponent in SetString Even though the umul/uquo functions expect two valid, finite big.Floats arguments, SetString was calling them with possibly Inf values, which resulted in bogus return values. Replace umul and udiv calls with Mul and Quo calls to fix this. Also, fix two wrong tests. See relevant issue on issue tracker for a detailed explanation. Fixes #11341 Change-Id: Ie35222763a57a2d712a5f5f7baec75cab8189a53 Reviewed-on: https://go-review.googlesource.com/13778 Reviewed-by: Robert Griesemer <gri@golang.org>	2015-08-21 18:11:22 +00:00
Robert Griesemer	92eb34b59a	math/big: remove superfluous comparison This is not a functional change. Also: - minor cleanups, better comments - uniform spelling of noun "zeros" (per OED) Fixes #11277. Change-Id: I1726f358ce15907bd2410f646b02cf8b11b919cd Reviewed-on: https://go-review.googlesource.com/11267 Reviewed-by: Alan Donovan <adonovan@google.com> Reviewed-by: Robert Griesemer <gri@golang.org>	2015-08-21 17:46:46 +00:00
Andrey Petrov	7cabaded51	math/rand: warn against using package for security-sensitive work Urge users of math/rand to consider using crypto/rand when doing security-sensitive work. Related to issue #11871. While we haven't reached consensus on how to make the package inherently safer, everyone agrees that the docs for math/rand can be improved. Change-Id: I576a312e51b2a3445691da6b277c7b4717173197 Reviewed-on: https://go-review.googlesource.com/12900 Reviewed-by: Rob Pike <r@golang.org>	2015-07-30 12:42:18 +00:00
Robert Griesemer	f35bc3ee87	math/big: document rounding for Rat.FloatToString Fixes #11523. Change-Id: I172f6facd555a1c6db76f25d5097343c20dea59a Reviewed-on: https://go-review.googlesource.com/12507 Reviewed-by: Alan Donovan <adonovan@google.com> Reviewed-by: Russ Cox <rsc@golang.org>	2015-07-22 19:47:19 +00:00
Russ Cox	bed6326a3c	math: fix Log2 test failures on ppc64 (and s390) - Make Log2 exact for powers of two. - Fix error tolerance function to make tolerance a function of the correct (expected) value. Fixes #9066. Change-Id: I0320a93ce4130deed1c7b7685627d51acb7bc56d Reviewed-on: https://go-review.googlesource.com/12230 Reviewed-by: Ian Lance Taylor <iant@golang.org>	2015-07-15 05:35:22 +00:00
Brad Fitzpatrick	783297ad6a	all: link to https for golang subdomains too The previous commit (git `2ae77376`) just did golang.org. This one includes golang.org subdomains like blog, play, and build. Change-Id: I4469f7b307ae2a12ea89323422044e604c5133ae Reviewed-on: https://go-review.googlesource.com/12071 Reviewed-by: Rob Pike <r@golang.org>	2015-07-12 04:42:40 +00:00

589 Commits