mirror/go - go - Git Fam. Sieh

Commit Graph

Author	SHA1	Message	Date
Russ Cox	a812e5f3c3	math/big: update calibration tests and recalibrate Refactor calibration tests to use the same logic for all. Choosing thresholds that are broadly appropriate for all systems is part science but also part guesswork and judgement. We could instead set per-GOOS/GOARCH thresholds, but that seems like too much work, and even then there would be variation between different chips within a GOOS/GOARCH. (For example see the three linux/amd64 systems benchmarked below.) The thresholds chosen in this CL are: karatsubaThreshold = 40 // unchanged basicSqrThreshold = 12 // was 20 karatsubaSqrThreshold = 80 // was 260 divRecursiveThreshold = 40 // was 100 The new file calibrate.md explains the calibration process and links to graphs justifying those values. (The graphs are hosted on swtch.com to avoid adding a megabyte of extra data to the Go repo and Go distributions.) A rendered copy of calibrate.md is at https://swtch.com/math/big/calibrate.html. goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-88 13.13n ± 2% 13.14n ± 2% ~ (p=0.494 n=15) Div/40/20-88 13.13n ± 2% 13.14n ± 2% ~ (p=0.137 n=15) Div/100/50-88 25.50n ± 0% 25.51n ± 0% ~ (p=0.038 n=15) Div/200/100-88 113.1n ± 1% 116.0n ± 3% +2.56% (p=0.000 n=15) Div/400/200-88 135.3n ± 0% 137.1n ± 1% ~ (p=0.004 n=15) Div/1000/500-88 259.9n ± 1% 259.0n ± 2% ~ (p=0.182 n=15) Div/2000/1000-88 568.8n ± 1% 564.7n ± 3% ~ (p=0.927 n=15) Div/20000/10000-88 25.79µ ± 1% 22.11µ ± 2% -14.26% (p=0.000 n=15) Div/200000/100000-88 755.1µ ± 1% 737.6µ ± 1% -2.32% (p=0.000 n=15) Div/2000000/1000000-88 31.30m ± 0% 31.20m ± 1% ~ (p=0.081 n=15) Div/20000000/10000000-88 1.268 ± 0% 1.265 ± 0% ~ (p=0.011 n=15) NatMul/10-88 142.6n ± 0% 142.9n ± 7% ~ (p=0.145 n=15) NatMul/100-88 4.347µ ± 0% 4.350µ ± 3% ~ (p=0.430 n=15) NatMul/1000-88 187.6µ ± 0% 188.4µ ± 2% ~ (p=0.004 n=15) NatMul/10000-88 8.052m ± 0% 8.057m ± 1% ~ (p=0.148 n=15) NatMul/100000-88 260.6m ± 0% 260.7m ± 0% ~ (p=0.512 n=15) NatSqr/1-88 26.58n ± 5% 27.96n ± 8% ~ (p=0.574 n=15) NatSqr/2-88 42.35n ± 7% 44.87n ± 6% ~ (p=0.690 n=15) NatSqr/3-88 53.28n ± 4% 55.62n ± 5% ~ (p=0.151 n=15) NatSqr/5-88 76.26n ± 6% 81.43n ± 6% +6.78% (p=0.000 n=15) NatSqr/8-88 110.8n ± 5% 116.4n ± 6% ~ (p=0.040 n=15) NatSqr/10-88 141.4n ± 4% 147.8n ± 4% ~ (p=0.011 n=15) NatSqr/20-88 325.8n ± 3% 341.7n ± 4% +4.88% (p=0.000 n=15) NatSqr/30-88 536.8n ± 3% 556.1n ± 4% ~ (p=0.027 n=15) NatSqr/50-88 1.168µ ± 3% 1.197µ ± 3% ~ (p=0.442 n=15) NatSqr/80-88 2.527µ ± 2% 2.480µ ± 2% -1.86% (p=0.000 n=15) NatSqr/100-88 3.771µ ± 2% 3.535µ ± 2% -6.26% (p=0.000 n=15) NatSqr/200-88 14.03µ ± 2% 10.57µ ± 3% -24.68% (p=0.000 n=15) NatSqr/300-88 24.06µ ± 2% 20.57µ ± 2% -14.52% (p=0.000 n=15) NatSqr/500-88 65.43µ ± 1% 45.45µ ± 1% -30.55% (p=0.000 n=15) NatSqr/800-88 126.41µ ± 1% 94.13µ ± 2% -25.54% (p=0.000 n=15) NatSqr/1000-88 196.4µ ± 1% 135.1µ ± 1% -31.18% (p=0.000 n=15) NatSqr/10000-88 6.404m ± 0% 5.326m ± 1% -16.84% (p=0.000 n=15) NatSqr/100000-88 267.2m ± 0% 198.7m ± 0% -25.64% (p=0.000 n=15) geomean 7.318µ 6.948µ -5.06% goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) CPU @ 3.10GHz │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.973 n=15) Div/40/20-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.226 n=15) Div/100/50-16 55.27n ± 1% 55.59n ± 0% ~ (p=0.004 n=15) Div/200/100-16 174.7n ± 3% 175.9n ± 2% ~ (p=0.645 n=15) Div/400/200-16 208.8n ± 1% 209.5n ± 2% ~ (p=0.169 n=15) Div/1000/500-16 378.7n ± 2% 380.5n ± 2% ~ (p=0.091 n=15) Div/2000/1000-16 778.4n ± 1% 781.1n ± 2% ~ (p=0.104 n=15) Div/20000/10000-16 25.16µ ± 1% 24.93µ ± 1% -0.91% (p=0.000 n=15) Div/200000/100000-16 926.4µ ± 0% 927.7µ ± 1% ~ (p=0.436 n=15) Div/2000000/1000000-16 35.58m ± 0% 35.53m ± 0% ~ (p=0.267 n=15) Div/20000000/10000000-16 1.333 ± 0% 1.330 ± 0% ~ (p=0.126 n=15) NatMul/10-16 172.6n ± 0% 165.4n ± 0% -4.17% (p=0.000 n=15) NatMul/100-16 5.706µ ± 0% 5.503µ ± 0% -3.56% (p=0.000 n=15) NatMul/1000-16 220.8µ ± 0% 219.1µ ± 0% -0.76% (p=0.000 n=15) NatMul/10000-16 8.688m ± 0% 8.621m ± 0% -0.77% (p=0.000 n=15) NatMul/100000-16 333.3m ± 0% 333.5m ± 0% ~ (p=0.512 n=15) NatSqr/1-16 28.66n ± 1% 28.42n ± 3% -0.84% (p=0.000 n=15) NatSqr/2-16 48.29n ± 2% 48.19n ± 2% ~ (p=0.042 n=15) NatSqr/3-16 59.93n ± 0% 59.64n ± 2% -0.48% (p=0.000 n=15) NatSqr/5-16 88.05n ± 0% 87.89n ± 3% ~ (p=0.066 n=15) NatSqr/8-16 127.7n ± 0% 126.9n ± 3% -0.63% (p=0.000 n=15) NatSqr/10-16 170.4n ± 0% 169.7n ± 3% ~ (p=0.004 n=15) NatSqr/20-16 388.8n ± 0% 392.9n ± 3% ~ (p=0.123 n=15) NatSqr/30-16 635.2n ± 0% 641.7n ± 3% ~ (p=0.123 n=15) NatSqr/50-16 1.304µ ± 1% 1.314µ ± 3% ~ (p=0.927 n=15) NatSqr/80-16 2.709µ ± 1% 2.899µ ± 4% +7.01% (p=0.000 n=15) NatSqr/100-16 3.885µ ± 0% 3.981µ ± 4% ~ (p=0.123 n=15) NatSqr/200-16 13.29µ ± 2% 12.14µ ± 4% -8.67% (p=0.000 n=15) NatSqr/300-16 23.39µ ± 0% 22.51µ ± 3% -3.78% (p=0.000 n=15) NatSqr/500-16 58.13µ ± 1% 50.56µ ± 2% -13.02% (p=0.000 n=15) NatSqr/800-16 118.4µ ± 1% 107.6µ ± 2% -9.11% (p=0.000 n=15) NatSqr/1000-16 172.7µ ± 1% 151.8µ ± 2% -12.11% (p=0.000 n=15) NatSqr/10000-16 6.065m ± 1% 5.757m ± 1% -5.08% (p=0.000 n=15) NatSqr/100000-16 240.9m ± 0% 228.1m ± 0% -5.32% (p=0.000 n=15) geomean 8.601µ 8.453µ -1.71% goos: linux goarch: amd64 pkg: math/big cpu: AMD Ryzen 9 7950X 16-Core Processor │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-32 11.11n ± 0% 11.11n ± 1% ~ (p=0.532 n=15) Div/40/20-32 11.08n ± 1% 11.11n ± 0% ~ (p=0.815 n=15) Div/100/50-32 16.81n ± 0% 16.84n ± 29% ~ (p=0.020 n=15) Div/200/100-32 73.91n ± 0% 76.85n ± 11% +3.98% (p=0.000 n=15) Div/400/200-32 87.35n ± 0% 88.91n ± 34% +1.79% (p=0.000 n=15) Div/1000/500-32 169.3n ± 1% 168.9n ± 1% ~ (p=0.049 n=15) Div/2000/1000-32 369.3n ± 0% 369.0n ± 0% ~ (p=0.108 n=15) Div/20000/10000-32 15.92µ ± 0% 13.55µ ± 2% -14.91% (p=0.000 n=15) Div/200000/100000-32 491.4µ ± 0% 482.4µ ± 1% -1.84% (p=0.000 n=15) Div/2000000/1000000-32 20.09m ± 0% 19.96m ± 0% -0.69% (p=0.000 n=15) Div/20000000/10000000-32 756.5m ± 0% 755.5m ± 0% ~ (p=0.089 n=15) NatMul/10-32 125.4n ± 5% 124.8n ± 1% ~ (p=0.588 n=15) NatMul/100-32 2.952µ ± 3% 2.969µ ± 0% ~ (p=0.237 n=15) NatMul/1000-32 120.7µ ± 0% 121.1µ ± 0% +0.30% (p=0.000 n=15) NatMul/10000-32 4.845m ± 0% 4.839m ± 1% ~ (p=0.653 n=15) NatMul/100000-32 173.3m ± 0% 173.3m ± 0% ~ (p=0.838 n=15) NatSqr/1-32 31.18n ± 23% 32.08n ± 2% ~ (p=0.015 n=15) NatSqr/2-32 57.22n ± 28% 58.88n ± 2% ~ (p=0.054 n=15) NatSqr/3-32 61.34n ± 18% 64.33n ± 2% ~ (p=0.237 n=15) NatSqr/5-32 72.47n ± 17% 79.81n ± 3% ~ (p=0.067 n=15) NatSqr/8-32 83.26n ± 26% 100.10n ± 3% ~ (p=0.016 n=15) NatSqr/10-32 87.31n ± 43% 125.50n ± 2% ~ (p=0.003 n=15) NatSqr/20-32 193.5n ± 25% 244.4n ± 13% ~ (p=0.002 n=15) NatSqr/30-32 323.9n ± 17% 380.9n ± 6% ~ (p=0.003 n=15) NatSqr/50-32 713.4n ± 9% 761.7n ± 8% ~ (p=0.419 n=15) NatSqr/80-32 1.486µ ± 7% 1.609µ ± 5% +8.28% (p=0.000 n=15) NatSqr/100-32 2.115µ ± 9% 2.253µ ± 1% ~ (p=0.104 n=15) NatSqr/200-32 7.201µ ± 4% 6.610µ ± 1% -8.21% (p=0.000 n=15) NatSqr/300-32 13.08µ ± 2% 12.37µ ± 1% -5.41% (p=0.000 n=15) NatSqr/500-32 32.56µ ± 2% 27.83µ ± 2% -14.52% (p=0.000 n=15) NatSqr/800-32 66.83µ ± 3% 59.59µ ± 1% -10.83% (p=0.000 n=15) NatSqr/1000-32 98.09µ ± 1% 83.59µ ± 1% -14.78% (p=0.000 n=15) NatSqr/10000-32 3.445m ± 1% 3.245m ± 0% -5.81% (p=0.000 n=15) NatSqr/100000-32 137.3m ± 0% 127.0m ± 0% -7.54% (p=0.000 n=15) geomean 4.897µ 4.972µ +1.52% goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 15.26n ± 2% 15.14n ± 1% ~ (p=0.212 n=15) Div/40/20-16 15.22n ± 1% 15.16n ± 0% ~ (p=0.190 n=15) Div/100/50-16 26.53n ± 2% 26.42n ± 0% -0.41% (p=0.000 n=15) Div/200/100-16 124.3n ± 0% 124.0n ± 0% ~ (p=0.704 n=15) Div/400/200-16 142.4n ± 0% 141.8n ± 0% ~ (p=0.074 n=15) Div/1000/500-16 262.0n ± 1% 261.3n ± 1% ~ (p=0.046 n=15) Div/2000/1000-16 532.6n ± 0% 532.5n ± 1% ~ (p=0.798 n=15) Div/20000/10000-16 22.27µ ± 0% 22.88µ ± 0% +2.73% (p=0.000 n=15) Div/200000/100000-16 890.4µ ± 0% 902.8µ ± 0% +1.39% (p=0.000 n=15) Div/2000000/1000000-16 35.03m ± 0% 35.10m ± 0% ~ (p=0.305 n=15) Div/20000000/10000000-16 1.380 ± 0% 1.385 ± 0% ~ (p=0.019 n=15) NatMul/10-16 177.6n ± 1% 175.6n ± 3% ~ (p=0.480 n=15) NatMul/100-16 5.675µ ± 0% 5.669µ ± 1% ~ (p=0.705 n=15) NatMul/1000-16 224.3µ ± 0% 224.6µ ± 0% ~ (p=0.653 n=15) NatMul/10000-16 8.735m ± 0% 8.739m ± 0% ~ (p=0.567 n=15) NatMul/100000-16 331.6m ± 0% 331.6m ± 1% ~ (p=0.412 n=15) NatSqr/1-16 43.69n ± 2% 42.77n ± 6% ~ (p=0.383 n=15) NatSqr/2-16 65.26n ± 2% 63.91n ± 5% ~ (p=0.285 n=15) NatSqr/3-16 73.95n ± 1% 72.25n ± 6% ~ (p=0.198 n=15) NatSqr/5-16 95.06n ± 1% 94.21n ± 3% ~ (p=0.721 n=15) NatSqr/8-16 155.5n ± 1% 153.4n ± 4% ~ (p=0.170 n=15) NatSqr/10-16 175.4n ± 1% 174.0n ± 2% ~ (p=0.271 n=15) NatSqr/20-16 360.8n ± 0% 358.5n ± 2% ~ (p=0.170 n=15) NatSqr/30-16 584.7n ± 0% 582.9n ± 1% ~ (p=0.170 n=15) NatSqr/50-16 1.323µ ± 0% 1.322µ ± 0% ~ (p=0.627 n=15) NatSqr/80-16 2.916µ ± 0% 2.674µ ± 0% -8.30% (p=0.000 n=15) NatSqr/100-16 4.365µ ± 0% 3.802µ ± 0% -12.90% (p=0.000 n=15) NatSqr/200-16 16.42µ ± 0% 11.29µ ± 0% -31.26% (p=0.000 n=15) NatSqr/300-16 28.07µ ± 0% 22.83µ ± 0% -18.68% (p=0.000 n=15) NatSqr/500-16 76.30µ ± 0% 50.06µ ± 0% -34.39% (p=0.000 n=15) NatSqr/800-16 147.5µ ± 0% 101.2µ ± 1% -31.41% (p=0.000 n=15) NatSqr/1000-16 228.6µ ± 0% 149.5µ ± 0% -34.61% (p=0.000 n=15) NatSqr/10000-16 7.417m ± 0% 6.025m ± 0% -18.76% (p=0.000 n=15) NatSqr/100000-16 309.2m ± 0% 214.9m ± 0% -30.50% (p=0.000 n=15) geomean 8.559µ 7.906µ -7.63% goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-12 9.577n ± 6% 9.473n ± 5% ~ (p=0.384 n=15) Div/40/20-12 9.480n ± 1% 9.430n ± 1% ~ (p=0.019 n=15) Div/100/50-12 14.82n ± 0% 14.82n ± 0% ~ (p=0.845 n=15) Div/200/100-12 83.94n ± 1% 84.35n ± 4% ~ (p=0.512 n=15) Div/400/200-12 102.7n ± 1% 102.9n ± 0% ~ (p=0.845 n=15) Div/1000/500-12 185.3n ± 1% 181.9n ± 1% -1.83% (p=0.000 n=15) Div/2000/1000-12 397.0n ± 1% 396.7n ± 0% ~ (p=0.959 n=15) Div/20000/10000-12 14.05µ ± 0% 13.70µ ± 1% ~ (p=0.002 n=15) Div/200000/100000-12 529.4µ ± 3% 526.7µ ± 2% ~ (p=0.967 n=15) Div/2000000/1000000-12 20.05m ± 0% 20.05m ± 0% ~ (p=0.653 n=15) Div/20000000/10000000-12 788.2m ± 1% 789.0m ± 1% ~ (p=0.412 n=15) NatMul/10-12 79.95n ± 1% 80.87n ± 1% +1.15% (p=0.000 n=15) NatMul/100-12 2.973µ ± 0% 2.986µ ± 2% ~ (p=0.051 n=15) NatMul/1000-12 122.6µ ± 5% 123.0µ ± 1% ~ (p=0.783 n=15) NatMul/10000-12 4.990m ± 1% 5.000m ± 1% ~ (p=0.653 n=15) NatMul/100000-12 185.3m ± 3% 190.3m ± 1% ~ (p=0.089 n=15) NatSqr/1-12 11.84n ± 1% 11.88n ± 1% ~ (p=0.735 n=15) NatSqr/2-12 21.01n ± 1% 21.44n ± 6% ~ (p=0.039 n=15) NatSqr/3-12 25.59n ± 0% 26.74n ± 9% +4.49% (p=0.000 n=15) NatSqr/5-12 36.78n ± 0% 37.04n ± 1% +0.71% (p=0.000 n=15) NatSqr/8-12 63.09n ± 3% 63.22n ± 1% ~ (p=0.846 n=15) NatSqr/10-12 79.98n ± 0% 79.78n ± 0% ~ (p=0.100 n=15) NatSqr/20-12 174.0n ± 0% 175.5n ± 1% ~ (p=0.361 n=15) NatSqr/30-12 290.0n ± 0% 291.4n ± 0% ~ (p=0.002 n=15) NatSqr/50-12 655.2n ± 4% 658.1n ± 0% ~ (p=0.060 n=15) NatSqr/80-12 1.506µ ± 0% 1.397µ ± 5% -7.24% (p=0.000 n=15) NatSqr/100-12 2.273µ ± 0% 2.005µ ± 5% -11.79% (p=0.000 n=15) NatSqr/200-12 8.833µ ± 6% 6.109µ ± 0% -30.84% (p=0.000 n=15) NatSqr/300-12 15.15µ ± 4% 12.37µ ± 0% -18.34% (p=0.000 n=15) NatSqr/500-12 41.89µ ± 0% 27.70µ ± 1% -33.88% (p=0.000 n=15) NatSqr/800-12 80.72µ ± 0% 56.40µ ± 0% -30.12% (p=0.000 n=15) NatSqr/1000-12 127.06µ ± 1% 84.06µ ± 1% -33.84% (p=0.000 n=15) NatSqr/10000-12 4.130m ± 0% 3.390m ± 0% -17.91% (p=0.000 n=15) NatSqr/100000-12 173.2m ± 0% 131.2m ± 6% -24.25% (p=0.000 n=15) geomean 4.489µ 4.189µ -6.68% Change-Id: Iaf65fd85457b003ebf07a787c875cda321b40cc9 Reviewed-on: https://go-review.googlesource.com/c/go/+/652058 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org>	2025-03-12 05:41:50 -07:00
Russ Cox	d037ed62bc	math/big: simplify, speed up Karatsuba multiplication The old Karatsuba implementation only operated on lengths that are a power of two times a number smaller than karatsubaThreshold. For example, when karatsubaThreshold = 40, multiplying a pair of 99-word numbers runs karatsuba on the low 96 (= 39<<2) words and then has to fix up the answer to include the high 3 words of each. I suspect this requirement was needed to make the analysis of how many temporary words to reserve easier, back when the answer was 3*n and depended on exactly halving the size at each Karatsuba step. Now that we have the more flexible temporary allocation stack, we can change Karatsuba to accept operands of odd length. Doing so avoids most of the fixup that the old approach required. For example, multiplying a pair of 99-word numbers runs karatsuba on all 99 words now. This is simpler and about the same speed or, for large cases, faster. goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) CPU @ 3.10GHz │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-16 99.62n ± 3% 99.10n ± 3% ~ (p=0.009 n=15) GCD10x10/WithXY-16 243.4n ± 1% 245.2n ± 1% ~ (p=0.009 n=15) GCD100x100/WithoutXY-16 921.9n ± 1% 919.2n ± 1% ~ (p=0.076 n=15) GCD100x100/WithXY-16 1.527µ ± 1% 1.526µ ± 0% ~ (p=0.813 n=15) GCD1000x1000/WithoutXY-16 9.704µ ± 1% 9.696µ ± 0% ~ (p=0.532 n=15) GCD1000x1000/WithXY-16 14.03µ ± 1% 13.96µ ± 0% ~ (p=0.014 n=15) GCD10000x10000/WithoutXY-16 206.5µ ± 2% 206.5µ ± 0% ~ (p=0.967 n=15) GCD10000x10000/WithXY-16 398.0µ ± 1% 397.4µ ± 0% ~ (p=0.683 n=15) Div/20/10-16 22.22n ± 0% 22.23n ± 0% ~ (p=0.105 n=15) Div/40/20-16 22.23n ± 0% 22.23n ± 0% ~ (p=0.307 n=15) Div/100/50-16 55.47n ± 0% 55.47n ± 0% ~ (p=0.573 n=15) Div/200/100-16 174.9n ± 1% 174.6n ± 1% ~ (p=0.814 n=15) Div/400/200-16 209.5n ± 1% 210.5n ± 1% ~ (p=0.454 n=15) Div/1000/500-16 379.9n ± 0% 383.5n ± 2% ~ (p=0.123 n=15) Div/2000/1000-16 780.1n ± 0% 784.6n ± 1% +0.58% (p=0.000 n=15) Div/20000/10000-16 25.22µ ± 1% 25.15µ ± 0% ~ (p=0.213 n=15) Div/200000/100000-16 921.8µ ± 1% 926.1µ ± 0% ~ (p=0.009 n=15) Div/2000000/1000000-16 37.91m ± 0% 35.63m ± 0% -6.02% (p=0.000 n=15) Div/20000000/10000000-16 1.378 ± 0% 1.336 ± 0% -3.03% (p=0.000 n=15) NatMul/10-16 166.8n ± 4% 168.9n ± 3% ~ (p=0.008 n=15) NatMul/100-16 5.519µ ± 2% 5.548µ ± 4% ~ (p=0.032 n=15) NatMul/1000-16 230.4µ ± 1% 220.2µ ± 1% -4.43% (p=0.000 n=15) NatMul/10000-16 8.569m ± 1% 8.640m ± 1% ~ (p=0.005 n=15) NatMul/100000-16 376.5m ± 1% 334.1m ± 0% -11.26% (p=0.000 n=15) NatSqr/1-16 27.85n ± 5% 28.60n ± 2% ~ (p=0.123 n=15) NatSqr/2-16 47.99n ± 2% 48.84n ± 1% ~ (p=0.008 n=15) NatSqr/3-16 59.41n ± 2% 60.87n ± 2% +2.46% (p=0.001 n=15) NatSqr/5-16 87.27n ± 2% 89.31n ± 3% ~ (p=0.087 n=15) NatSqr/8-16 124.6n ± 3% 128.9n ± 3% ~ (p=0.006 n=15) NatSqr/10-16 166.3n ± 3% 172.7n ± 3% ~ (p=0.002 n=15) NatSqr/20-16 385.2n ± 2% 394.7n ± 3% ~ (p=0.036 n=15) NatSqr/30-16 622.7n ± 3% 642.9n ± 3% ~ (p=0.032 n=15) NatSqr/50-16 1.274µ ± 3% 1.323µ ± 4% ~ (p=0.003 n=15) NatSqr/80-16 2.606µ ± 4% 2.714µ ± 4% ~ (p=0.044 n=15) NatSqr/100-16 3.731µ ± 4% 3.871µ ± 4% ~ (p=0.038 n=15) NatSqr/200-16 12.99µ ± 2% 13.09µ ± 3% ~ (p=0.838 n=15) NatSqr/300-16 22.87µ ± 2% 23.25µ ± 2% ~ (p=0.285 n=15) NatSqr/500-16 58.43µ ± 1% 58.25µ ± 2% ~ (p=0.345 n=15) NatSqr/800-16 115.3µ ± 3% 116.2µ ± 3% ~ (p=0.126 n=15) NatSqr/1000-16 173.9µ ± 1% 174.3µ ± 1% ~ (p=0.935 n=15) NatSqr/10000-16 6.133m ± 2% 6.034m ± 1% -1.62% (p=0.000 n=15) NatSqr/100000-16 253.8m ± 1% 241.5m ± 0% -4.87% (p=0.000 n=15) geomean 7.745µ 7.760µ +0.19% goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-88 62.17n ± 4% 61.44n ± 0% -1.17% (p=0.000 n=15) GCD10x10/WithXY-88 173.4n ± 2% 172.4n ± 4% ~ (p=0.615 n=15) GCD100x100/WithoutXY-88 584.0n ± 1% 582.9n ± 0% ~ (p=0.009 n=15) GCD100x100/WithXY-88 1.098µ ± 1% 1.091µ ± 2% ~ (p=0.002 n=15) GCD1000x1000/WithoutXY-88 6.055µ ± 0% 6.049µ ± 0% ~ (p=0.007 n=15) GCD1000x1000/WithXY-88 9.430µ ± 0% 9.417µ ± 1% ~ (p=0.123 n=15) GCD10000x10000/WithoutXY-88 153.4µ ± 2% 149.0µ ± 2% -2.85% (p=0.000 n=15) GCD10000x10000/WithXY-88 350.6µ ± 3% 349.0µ ± 2% ~ (p=0.126 n=15) Div/20/10-88 13.12n ± 0% 13.12n ± 1% 0.00% (p=0.042 n=15) Div/40/20-88 13.12n ± 0% 13.13n ± 0% ~ (p=0.004 n=15) Div/100/50-88 25.49n ± 0% 25.49n ± 0% ~ (p=0.452 n=15) Div/200/100-88 115.7n ± 2% 113.8n ± 2% ~ (p=0.212 n=15) Div/400/200-88 135.0n ± 1% 136.1n ± 1% ~ (p=0.005 n=15) Div/1000/500-88 257.5n ± 1% 259.9n ± 1% ~ (p=0.004 n=15) Div/2000/1000-88 567.5n ± 1% 572.4n ± 2% ~ (p=0.616 n=15) Div/20000/10000-88 25.65µ ± 0% 25.77µ ± 1% ~ (p=0.032 n=15) Div/200000/100000-88 777.4µ ± 1% 754.3µ ± 1% -2.97% (p=0.000 n=15) Div/2000000/1000000-88 33.66m ± 0% 31.37m ± 0% -6.81% (p=0.000 n=15) Div/20000000/10000000-88 1.320 ± 0% 1.266 ± 0% -4.04% (p=0.000 n=15) NatMul/10-88 151.9n ± 7% 143.3n ± 7% ~ (p=0.878 n=15) NatMul/100-88 4.418µ ± 2% 4.337µ ± 3% ~ (p=0.512 n=15) NatMul/1000-88 206.8µ ± 1% 189.8µ ± 1% -8.25% (p=0.000 n=15) NatMul/10000-88 8.531m ± 1% 8.095m ± 0% -5.12% (p=0.000 n=15) NatMul/100000-88 298.9m ± 0% 260.5m ± 1% -12.85% (p=0.000 n=15) NatSqr/1-88 27.55n ± 6% 28.25n ± 7% ~ (p=0.024 n=15) NatSqr/2-88 44.71n ± 6% 46.21n ± 9% ~ (p=0.024 n=15) NatSqr/3-88 55.44n ± 4% 58.41n ± 10% ~ (p=0.126 n=15) NatSqr/5-88 80.71n ± 5% 81.41n ± 5% ~ (p=0.032 n=15) NatSqr/8-88 115.7n ± 4% 115.4n ± 5% ~ (p=0.814 n=15) NatSqr/10-88 147.4n ± 4% 147.3n ± 4% ~ (p=0.505 n=15) NatSqr/20-88 337.8n ± 3% 337.3n ± 4% ~ (p=0.814 n=15) NatSqr/30-88 556.9n ± 3% 557.6n ± 4% ~ (p=0.814 n=15) NatSqr/50-88 1.208µ ± 4% 1.208µ ± 3% ~ (p=0.910 n=15) NatSqr/80-88 2.591µ ± 3% 2.581µ ± 3% ~ (p=0.705 n=15) NatSqr/100-88 3.870µ ± 3% 3.858µ ± 3% ~ (p=0.846 n=15) NatSqr/200-88 14.43µ ± 3% 14.28µ ± 2% ~ (p=0.383 n=15) NatSqr/300-88 24.68µ ± 2% 24.49µ ± 2% ~ (p=0.624 n=15) NatSqr/500-88 66.27µ ± 1% 66.18µ ± 1% ~ (p=0.735 n=15) NatSqr/800-88 128.7µ ± 1% 127.4µ ± 1% ~ (p=0.050 n=15) NatSqr/1000-88 198.7µ ± 1% 197.7µ ± 1% ~ (p=0.229 n=15) NatSqr/10000-88 6.582m ± 1% 6.426m ± 1% -2.37% (p=0.000 n=15) NatSqr/100000-88 274.3m ± 0% 267.3m ± 0% -2.57% (p=0.000 n=15) geomean 6.518µ 6.438µ -1.22% goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-16 61.70n ± 1% 61.32n ± 1% ~ (p=0.361 n=15) GCD10x10/WithXY-16 217.3n ± 1% 217.0n ± 1% ~ (p=0.395 n=15) GCD100x100/WithoutXY-16 569.7n ± 0% 572.6n ± 2% ~ (p=0.213 n=15) GCD100x100/WithXY-16 1.241µ ± 1% 1.236µ ± 1% ~ (p=0.157 n=15) GCD1000x1000/WithoutXY-16 5.558µ ± 0% 5.566µ ± 0% ~ (p=0.228 n=15) GCD1000x1000/WithXY-16 9.319µ ± 0% 9.326µ ± 0% ~ (p=0.233 n=15) GCD10000x10000/WithoutXY-16 126.4µ ± 2% 128.7µ ± 3% ~ (p=0.081 n=15) GCD10000x10000/WithXY-16 279.3µ ± 0% 278.3µ ± 5% ~ (p=0.187 n=15) Div/20/10-16 15.12n ± 1% 15.21n ± 1% ~ (p=0.490 n=15) Div/40/20-16 15.11n ± 0% 15.23n ± 1% ~ (p=0.107 n=15) Div/100/50-16 26.53n ± 0% 26.50n ± 0% ~ (p=0.299 n=15) Div/200/100-16 123.7n ± 0% 124.0n ± 0% ~ (p=0.086 n=15) Div/400/200-16 142.5n ± 0% 142.4n ± 0% ~ (p=0.039 n=15) Div/1000/500-16 259.9n ± 1% 261.2n ± 1% ~ (p=0.044 n=15) Div/2000/1000-16 539.4n ± 1% 532.3n ± 1% -1.32% (p=0.001 n=15) Div/20000/10000-16 22.43µ ± 0% 22.32µ ± 0% -0.49% (p=0.000 n=15) Div/200000/100000-16 898.3µ ± 0% 889.6µ ± 0% -0.96% (p=0.000 n=15) Div/2000000/1000000-16 38.37m ± 0% 35.11m ± 0% -8.49% (p=0.000 n=15) Div/20000000/10000000-16 1.449 ± 0% 1.384 ± 0% -4.48% (p=0.000 n=15) NatMul/10-16 182.0n ± 1% 177.8n ± 1% -2.31% (p=0.000 n=15) NatMul/100-16 5.537µ ± 0% 5.693µ ± 0% +2.82% (p=0.000 n=15) NatMul/1000-16 229.9µ ± 0% 224.8µ ± 0% -2.24% (p=0.000 n=15) NatMul/10000-16 8.985m ± 0% 8.751m ± 0% -2.61% (p=0.000 n=15) NatMul/100000-16 371.1m ± 0% 331.5m ± 0% -10.66% (p=0.000 n=15) NatSqr/1-16 46.77n ± 6% 42.76n ± 1% -8.57% (p=0.000 n=15) NatSqr/2-16 66.99n ± 4% 63.62n ± 1% -5.03% (p=0.000 n=15) NatSqr/3-16 76.79n ± 4% 73.42n ± 1% ~ (p=0.007 n=15) NatSqr/5-16 99.00n ± 3% 95.35n ± 1% -3.69% (p=0.000 n=15) NatSqr/8-16 160.0n ± 3% 155.1n ± 1% -3.06% (p=0.001 n=15) NatSqr/10-16 178.4n ± 2% 175.9n ± 0% -1.40% (p=0.001 n=15) NatSqr/20-16 361.9n ± 2% 361.3n ± 0% ~ (p=0.083 n=15) NatSqr/30-16 584.7n ± 0% 586.8n ± 0% +0.36% (p=0.000 n=15) NatSqr/50-16 1.327µ ± 0% 1.329µ ± 0% ~ (p=0.349 n=15) NatSqr/80-16 2.893µ ± 1% 2.925µ ± 0% +1.11% (p=0.000 n=15) NatSqr/100-16 4.330µ ± 1% 4.381µ ± 0% +1.18% (p=0.000 n=15) NatSqr/200-16 16.25µ ± 1% 16.43µ ± 0% +1.07% (p=0.000 n=15) NatSqr/300-16 27.85µ ± 1% 28.06µ ± 0% +0.77% (p=0.000 n=15) NatSqr/500-16 76.01µ ± 0% 76.34µ ± 0% ~ (p=0.002 n=15) NatSqr/800-16 146.8µ ± 0% 148.1µ ± 0% +0.83% (p=0.000 n=15) NatSqr/1000-16 228.2µ ± 0% 228.6µ ± 0% ~ (p=0.123 n=15) NatSqr/10000-16 7.524m ± 0% 7.426m ± 0% -1.31% (p=0.000 n=15) NatSqr/100000-16 316.7m ± 0% 309.2m ± 0% -2.36% (p=0.000 n=15) geomean 7.264µ 7.172µ -1.27% goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-12 32.61n ± 1% 32.42n ± 1% ~ (p=0.021 n=15) GCD10x10/WithXY-12 87.70n ± 1% 88.42n ± 1% ~ (p=0.010 n=15) GCD100x100/WithoutXY-12 305.9n ± 0% 306.4n ± 0% ~ (p=0.003 n=15) GCD100x100/WithXY-12 560.3n ± 2% 556.6n ± 1% ~ (p=0.018 n=15) GCD1000x1000/WithoutXY-12 3.509µ ± 2% 3.464µ ± 1% ~ (p=0.145 n=15) GCD1000x1000/WithXY-12 5.347µ ± 2% 5.372µ ± 1% ~ (p=0.046 n=15) GCD10000x10000/WithoutXY-12 73.75µ ± 1% 73.99µ ± 1% ~ (p=0.004 n=15) GCD10000x10000/WithXY-12 148.4µ ± 0% 147.8µ ± 1% ~ (p=0.076 n=15) Div/20/10-12 9.481n ± 0% 9.462n ± 1% ~ (p=0.631 n=15) Div/40/20-12 9.457n ± 0% 9.462n ± 1% ~ (p=0.798 n=15) Div/100/50-12 14.91n ± 0% 14.79n ± 1% -0.80% (p=0.000 n=15) Div/200/100-12 84.56n ± 1% 84.60n ± 1% ~ (p=0.271 n=15) Div/400/200-12 103.8n ± 0% 102.8n ± 0% -0.96% (p=0.000 n=15) Div/1000/500-12 181.3n ± 1% 184.2n ± 2% ~ (p=0.091 n=15) Div/2000/1000-12 397.5n ± 0% 397.4n ± 0% ~ (p=0.299 n=15) Div/20000/10000-12 14.04µ ± 1% 13.99µ ± 0% ~ (p=0.221 n=15) Div/200000/100000-12 523.1µ ± 0% 514.0µ ± 3% ~ (p=0.775 n=15) Div/2000000/1000000-12 21.58m ± 0% 20.01m ± 1% -7.29% (p=0.000 n=15) Div/20000000/10000000-12 813.5m ± 0% 796.2m ± 1% -2.13% (p=0.000 n=15) NatMul/10-12 80.46n ± 1% 80.02n ± 1% ~ (p=0.063 n=15) NatMul/100-12 2.904µ ± 0% 2.979µ ± 1% +2.58% (p=0.000 n=15) NatMul/1000-12 127.8µ ± 0% 122.3µ ± 0% -4.28% (p=0.000 n=15) NatMul/10000-12 5.141m ± 0% 4.975m ± 1% -3.23% (p=0.000 n=15) NatMul/100000-12 208.8m ± 0% 189.6m ± 3% -9.21% (p=0.000 n=15) NatSqr/1-12 11.90n ± 1% 11.76n ± 1% ~ (p=0.059 n=15) NatSqr/2-12 21.33n ± 1% 21.12n ± 0% ~ (p=0.063 n=15) NatSqr/3-12 26.05n ± 1% 25.79n ± 0% ~ (p=0.002 n=15) NatSqr/5-12 37.31n ± 0% 36.98n ± 1% ~ (p=0.008 n=15) NatSqr/8-12 63.07n ± 0% 62.75n ± 1% ~ (p=0.061 n=15) NatSqr/10-12 79.48n ± 0% 79.59n ± 0% ~ (p=0.455 n=15) NatSqr/20-12 173.1n ± 0% 173.2n ± 1% ~ (p=0.518 n=15) NatSqr/30-12 288.6n ± 1% 289.2n ± 0% ~ (p=0.030 n=15) NatSqr/50-12 653.3n ± 0% 653.3n ± 0% ~ (p=0.361 n=15) NatSqr/80-12 1.492µ ± 0% 1.496µ ± 0% ~ (p=0.018 n=15) NatSqr/100-12 2.270µ ± 1% 2.270µ ± 0% ~ (p=0.326 n=15) NatSqr/200-12 8.776µ ± 1% 8.784µ ± 1% ~ (p=0.083 n=15) NatSqr/300-12 15.07µ ± 0% 15.09µ ± 0% ~ (p=0.455 n=15) NatSqr/500-12 41.71µ ± 0% 41.77µ ± 1% ~ (p=0.305 n=15) NatSqr/800-12 80.77µ ± 1% 80.59µ ± 0% ~ (p=0.113 n=15) NatSqr/1000-12 126.4µ ± 1% 126.5µ ± 0% ~ (p=0.683 n=15) NatSqr/10000-12 4.204m ± 0% 4.119m ± 0% -2.02% (p=0.000 n=15) NatSqr/100000-12 177.0m ± 0% 172.9m ± 0% -2.31% (p=0.000 n=15) geomean 3.790µ 3.757µ -0.87% Change-Id: Ifc7a9b61f678df216690511ac8bb9143189a795e Reviewed-on: https://go-review.googlesource.com/c/go/+/652057 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com>	2025-03-12 05:41:17 -07:00
Russ Cox	70dcc78871	math/big: avoid negative slice size in nat.rem In a division, normally the answer to N digits / D digits has N-D digits, but not when N-D is negative. Fix the calculation of the number of digits for the temporary in nat.rem not to be negative. Fixes #72043. Change-Id: Ib9faa430aeb6c5f4c4a730f1ec631d2bf3f7472c Reviewed-on: https://go-review.googlesource.com/c/go/+/655156 Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-03-06 08:08:34 -08:00
Xiaolin Zhao	6ba91df153	math: implement func archExp and archExp2 in assembly on loong64 goos: linux goarch: loong64 pkg: math cpu: Loongson-3A6000 @ 2500.00MHz \| bench.old \| bench.new \| \| sec/op \| sec/op vs base \| Exp 26.30n ± 0% 12.93n ± 0% -50.85% (p=0.000 n=10) ExpGo 26.86n ± 0% 26.92n ± 0% +0.22% (p=0.000 n=10) Expm1 16.76n ± 0% 16.75n ± 0% ~ (p=0.060 n=10) Exp2 23.05n ± 0% 12.12n ± 0% -47.42% (p=0.000 n=10) Exp2Go 23.41n ± 0% 23.47n ± 0% +0.28% (p=0.000 n=10) geomean 22.97n 17.54n -23.64% goos: linux goarch: loong64 pkg: math/cmplx cpu: Loongson-3A6000 @ 2500.00MHz \| bench.old \| bench.new \| \| sec/op \| sec/op vs base \| Exp 51.32n ± 0% 35.41n ± 0% -30.99% (p=0.000 n=10) goos: linux goarch: loong64 pkg: math cpu: Loongson-3A5000 @ 2500.00MHz \| bench.old \| bench.new \| \| sec/op \| sec/op vs base \| Exp 50.27n ± 0% 48.75n ± 1% -3.01% (p=0.000 n=10) ExpGo 50.72n ± 0% 50.44n ± 0% -0.55% (p=0.000 n=10) Expm1 28.40n ± 0% 28.32n ± 0% ~ (p=0.360 n=10) Exp2 50.09n ± 0% 21.49n ± 1% -57.10% (p=0.000 n=10) Exp2Go 50.05n ± 0% 49.69n ± 0% -0.72% (p=0.000 n=10) geomean 44.85n 37.52n -16.35% goos: linux goarch: loong64 pkg: math/cmplx cpu: Loongson-3A5000 @ 2500.00MHz \| bench.old \| bench.new \| \| sec/op \| sec/op vs base \| Exp 88.56n ± 0% 67.29n ± 0% -24.03% (p=0.000 n=10) Change-Id: I89e456d26fc075d83335ee4a31227d2aface5714 Reviewed-on: https://go-review.googlesource.com/c/go/+/653935 Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>	2025-03-05 18:30:54 -08:00
Russ Cox	2b73707358	math/big: add tests for allocation during multiply Test that big.Int.Mul reusing the same target is not allocating temporary garbage during its computation. That code is going to be modified in an upcoming CL. Change-Id: I3ed55c06da030282233c29cd7af2a04f395dc7a2 Reviewed-on: https://go-review.googlesource.com/c/go/+/652056 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org>	2025-02-27 06:05:02 -08:00
Russ Cox	0ab2253ce1	math/big: move multiplication to natmul.go No code changes. This CL moves the multiplication (and squaring) code into natmul.go, in preparation for cleaning up Karatsuba and then adding Toom-Cook and FFT-based multiplication. Change-Id: I7f84328284cc4e1ca4da0ebb9f666a5535e8d7f2 Reviewed-on: https://go-review.googlesource.com/c/go/+/652055 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Alan Donovan <adonovan@google.com>	2025-02-27 06:05:00 -08:00
Russ Cox	1f7e28acdf	math/big: optimize atoi of base 2, 4, 16 Avoid multiplies when converting base 2, 4, 16 inputs, reducing conversion time from O(N²) to O(N). The Base8 and Base10 code paths should be unmodified, but the base-2,4,16 changes tickle the compiler to generate better (amd64) or worse (arm64) when really it should not. This is described in detail in #71868 and should be ignored for the purposes of this CL. goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) CPU @ 3.10GHz │ old │ new │ │ sec/op │ sec/op vs base │ Scan/10/Base2-16 324.4n ± 0% 258.7n ± 0% -20.25% (p=0.000 n=15) Scan/100/Base2-16 2.376µ ± 0% 1.968µ ± 0% -17.17% (p=0.000 n=15) Scan/1000/Base2-16 23.89µ ± 0% 19.16µ ± 0% -19.80% (p=0.000 n=15) Scan/10000/Base2-16 311.5µ ± 0% 190.4µ ± 0% -38.86% (p=0.000 n=15) Scan/100000/Base2-16 10.508m ± 0% 1.904m ± 0% -81.88% (p=0.000 n=15) Scan/10/Base8-16 138.3n ± 0% 127.9n ± 0% -7.52% (p=0.000 n=15) Scan/100/Base8-16 886.1n ± 0% 790.2n ± 0% -10.82% (p=0.000 n=15) Scan/1000/Base8-16 9.227µ ± 0% 8.234µ ± 0% -10.76% (p=0.000 n=15) Scan/10000/Base8-16 165.8µ ± 0% 155.6µ ± 0% -6.19% (p=0.000 n=15) Scan/100000/Base8-16 9.044m ± 0% 8.935m ± 0% -1.20% (p=0.000 n=15) Scan/10/Base10-16 129.9n ± 0% 120.0n ± 0% -7.62% (p=0.000 n=15) Scan/100/Base10-16 816.3n ± 0% 730.0n ± 0% -10.57% (p=0.000 n=15) Scan/1000/Base10-16 8.518µ ± 0% 7.628µ ± 0% -10.45% (p=0.000 n=15) Scan/10000/Base10-16 158.6µ ± 0% 149.4µ ± 0% -5.80% (p=0.000 n=15) Scan/100000/Base10-16 8.962m ± 0% 8.855m ± 0% -1.20% (p=0.000 n=15) Scan/10/Base16-16 114.5n ± 0% 108.6n ± 0% -5.15% (p=0.000 n=15) Scan/100/Base16-16 648.3n ± 0% 525.0n ± 0% -19.02% (p=0.000 n=15) Scan/1000/Base16-16 7.375µ ± 0% 5.636µ ± 0% -23.58% (p=0.000 n=15) Scan/10000/Base16-16 171.18µ ± 0% 66.99µ ± 0% -60.87% (p=0.000 n=15) Scan/100000/Base16-16 9490.9µ ± 0% 682.8µ ± 0% -92.81% (p=0.000 n=15) geomean 20.11µ 13.69µ -31.94% goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz │ old │ new │ │ sec/op │ sec/op vs base │ Scan/10/Base2-88 275.4n ± 0% 215.0n ± 0% -21.93% (p=0.000 n=15) Scan/100/Base2-88 1.869µ ± 0% 1.629µ ± 0% -12.84% (p=0.000 n=15) Scan/1000/Base2-88 18.56µ ± 0% 15.81µ ± 0% -14.82% (p=0.000 n=15) Scan/10000/Base2-88 270.0µ ± 0% 157.2µ ± 0% -41.77% (p=0.000 n=15) Scan/100000/Base2-88 11.518m ± 0% 1.571m ± 0% -86.36% (p=0.000 n=15) Scan/10/Base8-88 108.9n ± 0% 106.0n ± 0% -2.66% (p=0.000 n=15) Scan/100/Base8-88 655.2n ± 0% 594.9n ± 0% -9.20% (p=0.000 n=15) Scan/1000/Base8-88 6.467µ ± 0% 5.966µ ± 0% -7.75% (p=0.000 n=15) Scan/10000/Base8-88 151.2µ ± 0% 147.4µ ± 0% -2.53% (p=0.000 n=15) Scan/100000/Base8-88 10.33m ± 0% 10.30m ± 0% -0.25% (p=0.000 n=15) Scan/10/Base10-88 100.20n ± 0% 98.53n ± 0% -1.67% (p=0.000 n=15) Scan/100/Base10-88 596.9n ± 0% 543.3n ± 0% -8.98% (p=0.000 n=15) Scan/1000/Base10-88 5.904µ ± 0% 5.485µ ± 0% -7.10% (p=0.000 n=15) Scan/10000/Base10-88 145.7µ ± 0% 142.0µ ± 0% -2.55% (p=0.000 n=15) Scan/100000/Base10-88 10.26m ± 0% 10.24m ± 0% -0.18% (p=0.000 n=15) Scan/10/Base16-88 90.33n ± 0% 87.60n ± 0% -3.02% (p=0.000 n=15) Scan/100/Base16-88 506.4n ± 0% 437.7n ± 0% -13.57% (p=0.000 n=15) Scan/1000/Base16-88 5.056µ ± 0% 4.007µ ± 0% -20.75% (p=0.000 n=15) Scan/10000/Base16-88 163.35µ ± 0% 65.37µ ± 0% -59.98% (p=0.000 n=15) Scan/100000/Base16-88 11027.2µ ± 0% 735.1µ ± 0% -93.33% (p=0.000 n=15) geomean 17.13µ 11.74µ -31.46% goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ Scan/10/Base2-16 324.7n ± 0% 348.4n ± 0% +7.30% (p=0.000 n=15) Scan/100/Base2-16 2.604µ ± 0% 3.031µ ± 0% +16.40% (p=0.000 n=15) Scan/1000/Base2-16 26.15µ ± 0% 29.94µ ± 0% +14.52% (p=0.000 n=15) Scan/10000/Base2-16 334.3µ ± 0% 298.8µ ± 0% -10.64% (p=0.000 n=15) Scan/100000/Base2-16 10.664m ± 0% 2.991m ± 0% -71.95% (p=0.000 n=15) Scan/10/Base8-16 144.4n ± 1% 162.2n ± 1% +12.33% (p=0.000 n=15) Scan/100/Base8-16 917.2n ± 0% 1084.0n ± 0% +18.19% (p=0.000 n=15) Scan/1000/Base8-16 9.367µ ± 0% 10.901µ ± 0% +16.38% (p=0.000 n=15) Scan/10000/Base8-16 164.2µ ± 0% 181.2µ ± 0% +10.34% (p=0.000 n=15) Scan/100000/Base8-16 8.871m ± 1% 9.140m ± 0% +3.04% (p=0.000 n=15) Scan/10/Base10-16 134.6n ± 1% 148.3n ± 1% +10.18% (p=0.000 n=15) Scan/100/Base10-16 837.1n ± 0% 986.6n ± 0% +17.86% (p=0.000 n=15) Scan/1000/Base10-16 8.563µ ± 0% 9.936µ ± 0% +16.03% (p=0.000 n=15) Scan/10000/Base10-16 156.5µ ± 1% 171.3µ ± 0% +9.41% (p=0.000 n=15) Scan/100000/Base10-16 8.863m ± 1% 9.011m ± 0% +1.66% (p=0.000 n=15) Scan/10/Base16-16 115.7n ± 2% 129.1n ± 1% +11.58% (p=0.000 n=15) Scan/100/Base16-16 708.6n ± 0% 796.8n ± 0% +12.45% (p=0.000 n=15) Scan/1000/Base16-16 7.314µ ± 0% 7.554µ ± 0% +3.28% (p=0.000 n=15) Scan/10000/Base16-16 149.05µ ± 0% 74.60µ ± 0% -49.95% (p=0.000 n=15) Scan/100000/Base16-16 9091.6µ ± 0% 741.5µ ± 0% -91.84% (p=0.000 n=15) geomean 20.39µ 17.65µ -13.44% goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ Scan/10/Base2-12 193.8n ± 2% 157.3n ± 1% -18.83% (p=0.000 n=15) Scan/100/Base2-12 1.445µ ± 2% 1.362µ ± 1% -5.74% (p=0.000 n=15) Scan/1000/Base2-12 14.28µ ± 0% 13.51µ ± 0% -5.42% (p=0.000 n=15) Scan/10000/Base2-12 177.1µ ± 0% 134.6µ ± 0% -24.04% (p=0.000 n=15) Scan/100000/Base2-12 5.429m ± 1% 1.333m ± 0% -75.45% (p=0.000 n=15) Scan/10/Base8-12 75.52n ± 2% 76.09n ± 1% ~ (p=0.010 n=15) Scan/100/Base8-12 528.4n ± 1% 532.1n ± 1% ~ (p=0.003 n=15) Scan/1000/Base8-12 5.423µ ± 1% 5.427µ ± 0% ~ (p=0.183 n=15) Scan/10000/Base8-12 89.26µ ± 1% 89.37µ ± 0% ~ (p=0.237 n=15) Scan/100000/Base8-12 4.543m ± 2% 4.560m ± 1% ~ (p=0.595 n=15) Scan/10/Base10-12 69.87n ± 1% 70.51n ± 0% ~ (p=0.002 n=15) Scan/100/Base10-12 488.4n ± 1% 491.2n ± 0% ~ (p=0.060 n=15) Scan/1000/Base10-12 5.014µ ± 1% 5.008µ ± 0% ~ (p=0.783 n=15) Scan/10000/Base10-12 84.90µ ± 0% 85.10µ ± 0% ~ (p=0.109 n=15) Scan/100000/Base10-12 4.516m ± 1% 4.521m ± 1% ~ (p=0.713 n=15) Scan/10/Base16-12 59.21n ± 1% 57.70n ± 1% -2.55% (p=0.000 n=15) Scan/100/Base16-12 380.0n ± 1% 360.7n ± 1% -5.08% (p=0.000 n=15) Scan/1000/Base16-12 3.775µ ± 0% 3.421µ ± 0% -9.38% (p=0.000 n=15) Scan/10000/Base16-12 80.62µ ± 0% 34.44µ ± 1% -57.28% (p=0.000 n=15) Scan/100000/Base16-12 4826.4µ ± 2% 450.9µ ± 2% -90.66% (p=0.000 n=15) geomean 11.05µ 8.448µ -23.52% Change-Id: Ifdb2049545f34072aa75cdbb72bed4cf465f0ad7 Reviewed-on: https://go-review.googlesource.com/c/go/+/650640 Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com>	2025-02-27 06:04:57 -08:00
Russ Cox	f8f08bfd7c	math/big: improve scan test and benchmark Add a few more test cases for scanning (integer conversion), which were helpful in debugging some upcoming changes. BenchmarkScan currently times converting the value 10N represented in base B back into []Word form. When B = 10, the text is 1 followed by many zeros, which could hit a "multiply by zero" special case when processing many digit chunks, misrepresenting the actual time required depending on whether that case is optimized. Change the benchmark to use 9N, which is about as big and will not cause runs of zeros in any of the tested bases. The benchmark comparison below is not showing faster code, since of course the code is not changing at all here. Instead, it is showing that the new benchmark work is roughly the same size as the old benchmark work. goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ ScanPi-12 43.35µ ± 1% 43.59µ ± 1% ~ (p=0.069 n=15) Scan/10/Base2-12 202.3n ± 2% 193.7n ± 1% -4.25% (p=0.000 n=15) Scan/100/Base2-12 1.512µ ± 3% 1.447µ ± 1% -4.30% (p=0.000 n=15) Scan/1000/Base2-12 15.06µ ± 2% 14.33µ ± 0% -4.83% (p=0.000 n=15) Scan/10000/Base2-12 188.0µ ± 5% 177.3µ ± 1% -5.65% (p=0.000 n=15) Scan/100000/Base2-12 5.814m ± 3% 5.382m ± 1% -7.43% (p=0.000 n=15) Scan/10/Base8-12 78.57n ± 2% 75.02n ± 1% -4.52% (p=0.000 n=15) Scan/100/Base8-12 548.2n ± 2% 526.8n ± 1% -3.90% (p=0.000 n=15) Scan/1000/Base8-12 5.674µ ± 2% 5.421µ ± 0% -4.46% (p=0.000 n=15) Scan/10000/Base8-12 94.42µ ± 1% 88.61µ ± 1% -6.15% (p=0.000 n=15) Scan/100000/Base8-12 4.906m ± 2% 4.498m ± 3% -8.31% (p=0.000 n=15) Scan/10/Base10-12 73.42n ± 1% 69.56n ± 0% -5.26% (p=0.000 n=15) Scan/100/Base10-12 511.9n ± 1% 488.2n ± 0% -4.63% (p=0.000 n=15) Scan/1000/Base10-12 5.254µ ± 2% 5.009µ ± 0% -4.66% (p=0.000 n=15) Scan/10000/Base10-12 90.22µ ± 2% 84.52µ ± 0% -6.32% (p=0.000 n=15) Scan/100000/Base10-12 4.842m ± 3% 4.471m ± 3% -7.65% (p=0.000 n=15) Scan/10/Base16-12 62.28n ± 1% 58.70n ± 1% -5.75% (p=0.000 n=15) Scan/100/Base16-12 398.6n ± 0% 377.9n ± 1% -5.19% (p=0.000 n=15) Scan/1000/Base16-12 4.108µ ± 1% 3.782µ ± 0% -7.94% (p=0.000 n=15) Scan/10000/Base16-12 83.78µ ± 2% 80.51µ ± 1% -3.90% (p=0.000 n=15) Scan/100000/Base16-12 5.080m ± 3% 4.698m ± 3% -7.53% (p=0.000 n=15) geomean 12.41µ 11.74µ -5.36% Change-Id: If3ce290ecc7f38672f11b42fd811afb53dee665d Reviewed-on: https://go-review.googlesource.com/c/go/+/650639 Reviewed-by: Alan Donovan <adonovan@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org>	2025-02-27 05:58:49 -08:00
Russ Cox	872496da2b	math/big: replace nat pool with Word stack In the early days of math/big, algorithms that needed more space grew the result larger than it needed to be and then used the high words as extra space. This made results their own temporary space caches, at the cost that saving a result in a data structure might hold significantly more memory than necessary. Specifically, new(big.Int).Mul(x, y) returned a big.Int with a backing slice 3X as big as it strictly needed to be. If you are storing many multiplication results, or even a single large result, the 3X overhead can add up. This approach to storage for temporaries also requires being able to analyze the algorithms to predict the exact amount they need, which can be difficult. For both these reasons, the implementation of recursive long division, which came later, introduced a “nat pool” where temporaries could be stored and reused, or reclaimed by the GC when no longer used. This avoids the storage and bookkeeping overheads but introduces a per-temporary sync.Pool overhead. divRecursiveStep takes an array of cached temporaries to remove some of that overhead. The nat pool was better but is still not quite right. This CL introduces something even better than the nat pool (still probably not quite right, but the best I can see for now): a sync.Pool holding stacks for allocating temporaries. Now an operation can get one stack out of the pool and then allocate as many temporaries as it needs during the operation, eventually returning the stack back to the pool. The sync.Pool operations are now per-exported-operation (like big.Int.Mul), not per-temporary. This CL converts both the pre-allocation in nat.mul and the uses of the nat pool to use stack pools instead. This simplifies some code and sets us up better for more complex algorithms (such as Toom-Cook or FFT-based multiplication) that need more temporaries. It is also a little bit faster. goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) CPU @ 3.10GHz │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 23.68n ± 0% 22.21n ± 0% -6.21% (p=0.000 n=15) Div/40/20-16 23.68n ± 0% 22.21n ± 0% -6.21% (p=0.000 n=15) Div/100/50-16 56.65n ± 0% 55.53n ± 0% -1.98% (p=0.000 n=15) Div/200/100-16 194.6n ± 1% 172.8n ± 0% -11.20% (p=0.000 n=15) Div/400/200-16 232.1n ± 0% 206.7n ± 0% -10.94% (p=0.000 n=15) Div/1000/500-16 405.3n ± 1% 383.8n ± 0% -5.30% (p=0.000 n=15) Div/2000/1000-16 810.4n ± 1% 795.2n ± 0% -1.88% (p=0.000 n=15) Div/20000/10000-16 25.88µ ± 0% 25.39µ ± 0% -1.89% (p=0.000 n=15) Div/200000/100000-16 931.5µ ± 0% 924.3µ ± 0% -0.77% (p=0.000 n=15) Div/2000000/1000000-16 37.77m ± 0% 37.75m ± 0% ~ (p=0.098 n=15) Div/20000000/10000000-16 1.367 ± 0% 1.377 ± 0% +0.72% (p=0.003 n=15) NatMul/10-16 168.5n ± 3% 164.0n ± 4% ~ (p=0.751 n=15) NatMul/100-16 6.086µ ± 3% 5.380µ ± 3% -11.60% (p=0.000 n=15) NatMul/1000-16 238.1µ ± 3% 228.3µ ± 1% -4.12% (p=0.000 n=15) NatMul/10000-16 8.721m ± 2% 8.518m ± 1% -2.33% (p=0.000 n=15) NatMul/100000-16 369.6m ± 0% 371.1m ± 0% +0.42% (p=0.000 n=15) geomean 19.57µ 18.74µ -4.21% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-16 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 4.750Ki ± 0% 1.751Ki ± 0% -63.14% (p=0.000 n=15) NatMul/1000-16 48.16Ki ± 0% 16.02Ki ± 0% -66.73% (p=0.000 n=15) NatMul/10000-16 482.9Ki ± 1% 165.4Ki ± 3% -65.75% (p=0.000 n=15) NatMul/100000-16 5.747Mi ± 7% 4.197Mi ± 0% -26.97% (p=0.000 n=15) geomean 41.42Ki 20.63Ki -50.18% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100000-16 7.000 ± 14% 7.000 ± 14% ~ (p=0.668 n=15) geomean 1.476 1.476 +0.00% ¹ all samples are equal goos: linux goarch: amd64 pkg: math/big cpu: Intel(R) Xeon(R) Platinum 8481C CPU @ 2.70GHz │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-88 15.84n ± 1% 13.12n ± 0% -17.17% (p=0.000 n=15) Div/40/20-88 15.88n ± 1% 13.12n ± 0% -17.38% (p=0.000 n=15) Div/100/50-88 26.42n ± 0% 25.47n ± 0% -3.60% (p=0.000 n=15) Div/200/100-88 132.4n ± 0% 114.9n ± 0% -13.22% (p=0.000 n=15) Div/400/200-88 150.1n ± 0% 135.6n ± 0% -9.66% (p=0.000 n=15) Div/1000/500-88 275.5n ± 0% 264.1n ± 0% -4.14% (p=0.000 n=15) Div/2000/1000-88 586.5n ± 0% 581.1n ± 0% -0.92% (p=0.000 n=15) Div/20000/10000-88 25.87µ ± 0% 25.72µ ± 0% -0.59% (p=0.000 n=15) Div/200000/100000-88 772.2µ ± 0% 779.0µ ± 0% +0.88% (p=0.000 n=15) Div/2000000/1000000-88 33.36m ± 0% 33.63m ± 0% +0.80% (p=0.000 n=15) Div/20000000/10000000-88 1.307 ± 0% 1.320 ± 0% +1.03% (p=0.000 n=15) NatMul/10-88 140.4n ± 0% 148.8n ± 4% +5.98% (p=0.000 n=15) NatMul/100-88 4.663µ ± 1% 4.388µ ± 1% -5.90% (p=0.000 n=15) NatMul/1000-88 207.7µ ± 0% 205.8µ ± 0% -0.89% (p=0.000 n=15) NatMul/10000-88 8.456m ± 0% 8.468m ± 0% +0.14% (p=0.021 n=15) NatMul/100000-88 295.1m ± 0% 297.9m ± 0% +0.94% (p=0.000 n=15) geomean 14.96µ 14.33µ -4.23% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-88 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-88 4.750Ki ± 0% 1.758Ki ± 0% -62.99% (p=0.000 n=15) NatMul/1000-88 48.44Ki ± 0% 16.08Ki ± 0% -66.80% (p=0.000 n=15) NatMul/10000-88 489.7Ki ± 1% 166.1Ki ± 3% -66.08% (p=0.000 n=15) NatMul/100000-88 5.546Mi ± 0% 3.819Mi ± 60% -31.15% (p=0.000 n=15) geomean 41.29Ki 20.30Ki -50.85% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-88 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100000-88 5.000 ± 20% 6.000 ± 67% ~ (p=0.672 n=15) geomean 1.380 1.431 +3.71% ¹ all samples are equal goos: linux goarch: arm64 pkg: math/big │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-16 15.85n ± 0% 15.23n ± 0% -3.91% (p=0.000 n=15) Div/40/20-16 15.88n ± 0% 15.22n ± 0% -4.16% (p=0.000 n=15) Div/100/50-16 29.69n ± 0% 26.39n ± 0% -11.11% (p=0.000 n=15) Div/200/100-16 149.2n ± 0% 123.3n ± 0% -17.36% (p=0.000 n=15) Div/400/200-16 160.3n ± 0% 139.2n ± 0% -13.16% (p=0.000 n=15) Div/1000/500-16 271.0n ± 0% 256.1n ± 0% -5.50% (p=0.000 n=15) Div/2000/1000-16 545.3n ± 0% 527.0n ± 0% -3.36% (p=0.000 n=15) Div/20000/10000-16 22.60µ ± 0% 22.20µ ± 0% -1.77% (p=0.000 n=15) Div/200000/100000-16 889.0µ ± 0% 892.2µ ± 0% +0.35% (p=0.000 n=15) Div/2000000/1000000-16 38.01m ± 0% 38.12m ± 0% +0.30% (p=0.000 n=15) Div/20000000/10000000-16 1.437 ± 0% 1.444 ± 0% +0.50% (p=0.000 n=15) NatMul/10-16 166.4n ± 2% 169.5n ± 1% +1.86% (p=0.000 n=15) NatMul/100-16 5.733µ ± 1% 5.570µ ± 1% -2.84% (p=0.000 n=15) NatMul/1000-16 232.6µ ± 1% 229.8µ ± 0% -1.22% (p=0.000 n=15) NatMul/10000-16 9.039m ± 1% 8.969m ± 0% -0.77% (p=0.000 n=15) NatMul/100000-16 367.0m ± 0% 368.8m ± 0% +0.48% (p=0.000 n=15) geomean 16.15µ 15.50µ -4.01% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-16 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 4.750Ki ± 0% 1.751Ki ± 0% -63.14% (p=0.000 n=15) NatMul/1000-16 48.33Ki ± 0% 16.02Ki ± 0% -66.85% (p=0.000 n=15) NatMul/10000-16 536.5Ki ± 1% 165.7Ki ± 3% -69.12% (p=0.000 n=15) NatMul/100000-16 6.078Mi ± 6% 4.197Mi ± 0% -30.94% (p=0.000 n=15) geomean 42.81Ki 20.64Ki -51.78% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-16 2.000 ± 50% 1.000 ± 0% -50.00% (p=0.001 n=15) NatMul/100000-16 9.000 ± 11% 8.000 ± 12% -11.11% (p=0.001 n=15) geomean 1.783 1.516 -14.97% ¹ all samples are equal goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ old │ new │ │ sec/op │ sec/op vs base │ Div/20/10-12 9.850n ± 1% 9.405n ± 1% -4.52% (p=0.000 n=15) Div/40/20-12 9.858n ± 0% 9.403n ± 1% -4.62% (p=0.000 n=15) Div/100/50-12 16.40n ± 1% 14.81n ± 0% -9.70% (p=0.000 n=15) Div/200/100-12 88.48n ± 2% 80.88n ± 0% -8.59% (p=0.000 n=15) Div/400/200-12 107.90n ± 1% 99.28n ± 1% -7.99% (p=0.000 n=15) Div/1000/500-12 188.8n ± 1% 178.6n ± 1% -5.40% (p=0.000 n=15) Div/2000/1000-12 399.9n ± 0% 389.1n ± 0% -2.70% (p=0.000 n=15) Div/20000/10000-12 13.94µ ± 2% 13.81µ ± 1% ~ (p=0.574 n=15) Div/200000/100000-12 523.8µ ± 0% 521.7µ ± 0% -0.40% (p=0.000 n=15) Div/2000000/1000000-12 21.46m ± 0% 21.48m ± 0% ~ (p=0.067 n=15) Div/20000000/10000000-12 812.5m ± 0% 812.9m ± 0% ~ (p=0.061 n=15) NatMul/10-12 77.14n ± 0% 78.35n ± 1% +1.57% (p=0.000 n=15) NatMul/100-12 2.999µ ± 0% 2.871µ ± 1% -4.27% (p=0.000 n=15) NatMul/1000-12 126.2µ ± 0% 126.8µ ± 0% +0.51% (p=0.011 n=15) NatMul/10000-12 5.099m ± 0% 5.125m ± 0% +0.51% (p=0.000 n=15) NatMul/100000-12 206.7m ± 0% 208.4m ± 0% +0.80% (p=0.000 n=15) geomean 9.512µ 9.236µ -2.91% │ old │ new │ │ B/op │ B/op vs base │ NatMul/10-12 192.0 ± 0% 192.0 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-12 4.750Ki ± 0% 1.750Ki ± 0% -63.16% (p=0.000 n=15) NatMul/1000-12 48.13Ki ± 0% 16.01Ki ± 0% -66.73% (p=0.000 n=15) NatMul/10000-12 483.5Ki ± 1% 163.2Ki ± 2% -66.24% (p=0.000 n=15) NatMul/100000-12 5.480Mi ± 4% 1.532Mi ± 104% -72.05% (p=0.000 n=15) geomean 41.03Ki 16.82Ki -59.01% ¹ all samples are equal │ old │ new │ │ allocs/op │ allocs/op vs base │ NatMul/10-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/1000-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/10000-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=15) ¹ NatMul/100000-12 5.000 ± 0% 1.000 ± 400% -80.00% (p=0.007 n=15) geomean 1.380 1.000 -27.52% ¹ all samples are equal Change-Id: I7efa6fe37971ed26ae120a32250fcb47ece0a011 Reviewed-on: https://go-review.googlesource.com/c/go/+/650638 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Alan Donovan <adonovan@google.com>	2025-02-27 05:58:09 -08:00
Russ Cox	763766e9ab	math/big: report allocs in BenchmarkNatMul, BenchmarkNatSqr Change-Id: I112f55c0e3ee3b75e615a06b27552de164565c04 Reviewed-on: https://go-review.googlesource.com/c/go/+/650637 Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Russ Cox <rsc@golang.org>	2025-02-27 05:50:45 -08:00
Russ Cox	b66d83ccea	math/big: clean up GCD a little The GCD code was setting one *Int to the value of another by smashing one struct on top of the other, instead of using Set. That was safe in this one case, but it's not idiomatic in math/big nor safe in general, so rewrite the code not to do that. (In one case, by swapping variables around; in another, by calling Set.) The added Set call does slow down GCDs by a small amount, since the answer has to be copied out. To compensate for that, optimize a bit: remove the s, t temporaries entirely and handle vector x word multiplication directly. The net result is that almost all GCDs are faster, except for small ones, which are a few nanoseconds slower. goos: darwin goarch: arm64 pkg: math/big cpu: Apple M3 Pro │ bench.before │ bench.after │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-12 23.80n ± 1% 31.71n ± 1% +33.24% (p=0.000 n=10) GCD10x10/WithXY-12 100.40n ± 0% 92.14n ± 1% -8.22% (p=0.000 n=10) GCD10x100/WithoutXY-12 63.70n ± 0% 70.73n ± 0% +11.05% (p=0.000 n=10) GCD10x100/WithXY-12 278.6n ± 0% 233.1n ± 1% -16.35% (p=0.000 n=10) GCD10x1000/WithoutXY-12 153.4n ± 0% 162.2n ± 1% +5.74% (p=0.000 n=10) GCD10x1000/WithXY-12 456.0n ± 0% 411.8n ± 1% -9.69% (p=0.000 n=10) GCD10x10000/WithoutXY-12 1.002µ ± 1% 1.036µ ± 0% +3.39% (p=0.000 n=10) GCD10x10000/WithXY-12 2.330µ ± 1% 2.210µ ± 0% -5.13% (p=0.000 n=10) GCD10x100000/WithoutXY-12 8.894µ ± 0% 8.889µ ± 1% ~ (p=0.754 n=10) GCD10x100000/WithXY-12 20.84µ ± 0% 20.24µ ± 0% -2.84% (p=0.000 n=10) GCD100x100/WithoutXY-12 373.3n ± 3% 314.4n ± 0% -15.76% (p=0.000 n=10) GCD100x100/WithXY-12 662.5n ± 0% 572.4n ± 1% -13.59% (p=0.000 n=10) GCD100x1000/WithoutXY-12 641.8n ± 0% 598.1n ± 1% -6.81% (p=0.000 n=10) GCD100x1000/WithXY-12 1.123µ ± 0% 1.019µ ± 1% -9.26% (p=0.000 n=10) GCD100x10000/WithoutXY-12 2.870µ ± 0% 2.831µ ± 0% -1.38% (p=0.000 n=10) GCD100x10000/WithXY-12 4.930µ ± 1% 4.675µ ± 0% -5.16% (p=0.000 n=10) GCD100x100000/WithoutXY-12 24.08µ ± 0% 23.97µ ± 0% -0.48% (p=0.007 n=10) GCD100x100000/WithXY-12 43.66µ ± 0% 42.52µ ± 0% -2.61% (p=0.001 n=10) GCD1000x1000/WithoutXY-12 3.999µ ± 0% 3.569µ ± 1% -10.75% (p=0.000 n=10) GCD1000x1000/WithXY-12 6.397µ ± 0% 5.534µ ± 0% -13.49% (p=0.000 n=10) GCD1000x10000/WithoutXY-12 6.875µ ± 0% 6.450µ ± 0% -6.18% (p=0.000 n=10) GCD1000x10000/WithXY-12 20.75µ ± 1% 19.17µ ± 1% -7.64% (p=0.000 n=10) GCD1000x100000/WithoutXY-12 36.38µ ± 0% 35.60µ ± 1% -2.13% (p=0.000 n=10) GCD1000x100000/WithXY-12 172.1µ ± 0% 174.4µ ± 3% ~ (p=0.052 n=10) GCD10000x10000/WithoutXY-12 79.89µ ± 1% 75.16µ ± 2% -5.92% (p=0.000 n=10) GCD10000x10000/WithXY-12 160.1µ ± 0% 150.0µ ± 0% -6.33% (p=0.000 n=10) GCD10000x100000/WithoutXY-12 213.2µ ± 1% 209.0µ ± 1% -1.98% (p=0.000 n=10) GCD10000x100000/WithXY-12 1.399m ± 0% 1.342m ± 3% -4.08% (p=0.002 n=10) GCD100000x100000/WithoutXY-12 5.463m ± 1% 5.504m ± 2% ~ (p=0.190 n=10) GCD100000x100000/WithXY-12 11.36m ± 0% 11.46m ± 1% +0.86% (p=0.000 n=10) geomean 6.953µ 6.695µ -3.71% goos: linux goarch: amd64 pkg: math/big cpu: AMD Ryzen 9 7950X 16-Core Processor │ bench.before │ bench.after │ │ sec/op │ sec/op vs base │ GCD10x10/WithoutXY-32 39.66n ± 4% 44.34n ± 4% +11.77% (p=0.000 n=10) GCD10x10/WithXY-32 156.7n ± 12% 130.8n ± 2% -16.53% (p=0.000 n=10) GCD10x100/WithoutXY-32 115.8n ± 5% 120.2n ± 2% +3.89% (p=0.000 n=10) GCD10x100/WithXY-32 465.3n ± 3% 368.1n ± 2% -20.91% (p=0.000 n=10) GCD10x1000/WithoutXY-32 201.1n ± 1% 210.8n ± 2% +4.82% (p=0.000 n=10) GCD10x1000/WithXY-32 652.9n ± 4% 605.0n ± 1% -7.32% (p=0.002 n=10) GCD10x10000/WithoutXY-32 1.046µ ± 2% 1.143µ ± 1% +9.33% (p=0.000 n=10) GCD10x10000/WithXY-32 3.360µ ± 1% 3.258µ ± 1% -3.04% (p=0.000 n=10) GCD10x100000/WithoutXY-32 9.391µ ± 3% 9.997µ ± 1% +6.46% (p=0.000 n=10) GCD10x100000/WithXY-32 27.92µ ± 1% 28.21µ ± 0% +1.04% (p=0.043 n=10) GCD100x100/WithoutXY-32 443.7n ± 5% 320.0n ± 2% -27.88% (p=0.000 n=10) GCD100x100/WithXY-32 789.9n ± 2% 690.4n ± 1% -12.60% (p=0.000 n=10) GCD100x1000/WithoutXY-32 718.4n ± 3% 600.0n ± 1% -16.48% (p=0.000 n=10) GCD100x1000/WithXY-32 1.388µ ± 4% 1.175µ ± 1% -15.28% (p=0.000 n=10) GCD100x10000/WithoutXY-32 2.750µ ± 1% 2.668µ ± 1% -2.96% (p=0.000 n=10) GCD100x10000/WithXY-32 6.016µ ± 1% 5.590µ ± 1% -7.09% (p=0.000 n=10) GCD100x100000/WithoutXY-32 21.40µ ± 1% 22.30µ ± 1% +4.21% (p=0.000 n=10) GCD100x100000/WithXY-32 47.02µ ± 4% 48.80µ ± 0% +3.78% (p=0.015 n=10) GCD1000x1000/WithoutXY-32 3.417µ ± 4% 3.020µ ± 1% -11.65% (p=0.000 n=10) GCD1000x1000/WithXY-32 5.752µ ± 0% 5.418µ ± 2% -5.81% (p=0.000 n=10) GCD1000x10000/WithoutXY-32 6.150µ ± 0% 6.246µ ± 1% +1.55% (p=0.000 n=10) GCD1000x10000/WithXY-32 24.68µ ± 3% 25.07µ ± 1% ~ (p=0.051 n=10) GCD1000x100000/WithoutXY-32 34.60µ ± 2% 36.85µ ± 1% +6.51% (p=0.000 n=10) GCD1000x100000/WithXY-32 209.5µ ± 4% 227.4µ ± 0% +8.56% (p=0.000 n=10) GCD10000x10000/WithoutXY-32 90.69µ ± 0% 88.48µ ± 0% -2.44% (p=0.000 n=10) GCD10000x10000/WithXY-32 197.1µ ± 0% 200.5µ ± 0% +1.73% (p=0.000 n=10) GCD10000x100000/WithoutXY-32 239.1µ ± 0% 242.5µ ± 0% +1.42% (p=0.000 n=10) GCD10000x100000/WithXY-32 1.963m ± 3% 2.028m ± 0% +3.28% (p=0.000 n=10) GCD100000x100000/WithoutXY-32 7.466m ± 0% 7.412m ± 0% -0.71% (p=0.000 n=10) GCD100000x100000/WithXY-32 16.10m ± 2% 16.47m ± 0% +2.25% (p=0.000 n=10) geomean 8.388µ 8.127µ -3.12% Change-Id: I161dc409bad11bcc553bc8116449905ae5b06742 Reviewed-on: https://go-review.googlesource.com/c/go/+/650636 Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Alan Donovan <adonovan@google.com> Auto-Submit: Russ Cox <rsc@golang.org>	2025-02-27 05:49:50 -08:00
qmuntal	f707e53fd5	all: surround -test.run arguments with ^$ If the -test.run value is not surrounded by ^$ then any test that matches the -test.run value will be run. This is normally not the desired behavior, as it can lead to unexpected tests being run. Change-Id: I3447aaebad5156bbef7f263cdb9f6b8c32331324 Reviewed-on: https://go-review.googlesource.com/c/go/+/651956 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-02-25 10:58:29 -08:00
Eng Zer Jun	cc874072f3	math/big: use built-in max function Change-Id: I65721039dab311762e55c6a60dd75b82f6b4622f Reviewed-on: https://go-review.googlesource.com/c/go/+/642335 Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com>	2025-02-03 08:25:31 -08:00
Sean Liao	32ff485c7c	math/bits: update reference to debruijn paper The old link no longer works. Fixes #70684 Change-Id: I8711ef7d5721bf20ef83f5192dd0d1f73dda6ce1 Reviewed-on: https://go-review.googlesource.com/c/go/+/633775 Auto-Submit: Ian Lance Taylor <iant@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2024-12-04 22:15:36 +00:00
Jorropo	4c3aa5d324	math/rand/v2: replace <= 0 with == 0 for Uint function docs This harmonize the docs with (Rand).Uint functions. And it make it clearer, I wasn't sure if it would try to interpret the uint as a signed number somehow, it does not pull any surprises make that clear. Change-Id: I5a87a0a5563dbabfc31e536e40ee69b11f5cb6cf Reviewed-on: https://go-review.googlesource.com/c/go/+/633535 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Commit-Queue: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Robert Griesemer <gri@google.com>	2024-12-04 16:34:07 +00:00
Russ Cox	a947912d8a	internal/byteorder: use canonical Go casing in names If Be and Le stand for big-endian and little-endian, then they should be BE and LE. Change-Id: I723e3962b8918da84791783d3c547638f1c9e8a9 Reviewed-on: https://go-review.googlesource.com/c/go/+/627376 Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2024-11-20 20:59:28 +00:00
Adam	f7a14ae0cd	math/big: properly linkify a reference Change-Id: Ie7649060db25f1573eeaadd534a600bb24d30572 GitHub-Last-Rev: `c617848a4e` GitHub-Pull-Request: golang/go#70134 Reviewed-on: https://go-review.googlesource.com/c/go/+/623757 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Robert Griesemer <gri@google.com>	2024-10-31 19:23:59 +00:00
Xiaolin Zhao	b521ebb55a	math: implement arch{Floor, Ceil, Trunc} in hardware on loong64 benchmark: goos: linux goarch: loong64 pkg: math cpu: Loongson-3A6000 @ 2500.00MHz │ bench.old │ bench.new │ │ sec/op │ sec/op vs base │ Ceil 10.810n ± 0% 2.578n ± 0% -76.15% (p=0.000 n=20) Floor 10.810n ± 0% 2.531n ± 0% -76.59% (p=0.000 n=20) Trunc 9.606n ± 0% 2.530n ± 0% -73.67% (p=0.000 n=20) geomean 10.39n 2.546n -75.50% goos: linux goarch: loong64 pkg: math cpu: Loongson-3A5000 @ 2500.00MHz │ bench.old │ bench.new │ │ sec/op │ sec/op vs base │ Ceil 13.220n ± 0% 7.703n ± 8% -41.73% (p=0.000 n=20) Floor 12.410n ± 0% 7.248n ± 2% -41.59% (p=0.000 n=20) Trunc 11.210n ± 0% 7.757n ± 4% -30.80% (p=0.000 n=20) geomean 12.25n 7.566n -38.25% Change-Id: I3af51e9852e9cf5f965fed895d68945a2e8675f4 Reviewed-on: https://go-review.googlesource.com/c/go/+/612615 Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2024-10-12 03:24:22 +00:00
Robert Griesemer	aa06c94054	math/big: add clarifying (internal) comment Follow-up on CL 467555. Change-Id: I1815b5def656ae4b86c31385ad0737f0465fa2d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/613535 Auto-Submit: Robert Griesemer <gri@google.com> TryBot-Bypass: Robert Griesemer <gri@google.com> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Tim King <taking@google.com>	2024-09-16 19:10:55 +00:00
Joel Sing	5367d696f7	math/big: simplify divBasic ujn assignment Rather than conditionally assigning ujn, initialise ujn above the loop to invent the leading 0 for u, then unconditionally load ujn at the bottom of the loop. This code operates on the basis that n >= 2, hence j+n-1 is always greater than zero. Change-Id: I1272ef30c787ed8707ae8421af2adcccc776d389 Reviewed-on: https://go-review.googlesource.com/c/go/+/467555 Auto-Submit: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Commit-Queue: Robert Griesemer <gri@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Robert Griesemer <gri@google.com>	2024-09-16 17:33:18 +00:00
Meng Zhuo	90391c2e8a	math: add round assembly implementations on riscv64 This CL reapplies CL 504737 and adds integer precision limitation check, since CL 504737 only checks whether floating point number is +-Inf or NaN. This CL is also ~7% faster than CL 504737. Updates #68322 goos: linux goarch: riscv64 pkg: math │ math.old.bench │ math.new.bench │ │ sec/op │ sec/op vs base │ Ceil 54.09n ± 0% 18.72n ± 0% -65.39% (p=0.000 n=10) Floor 40.72n ± 0% 18.72n ± 0% -54.03% (p=0.000 n=10) Round 20.73n ± 0% 20.73n ± 0% ~ (p=1.000 n=10) RoundToEven 24.07n ± 0% 24.07n ± 0% ~ (p=1.000 n=10) Trunc 38.72n ± 0% 18.72n ± 0% -51.65% (p=0.000 n=10) geomean 33.56n 20.09n -40.13% Change-Id: I06cfe2cb9e2535cd705d40b6650a7e71fedd906c Reviewed-on: https://go-review.googlesource.com/c/go/+/600075 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2024-09-11 02:28:10 +00:00
cuishuang	cc912bd8eb	all: remove unnecessary symbols and add missing symbols Change-Id: I535a7aaaf3f9e8a9c0e0c04f8f745ad7445a32f7 Reviewed-on: https://go-review.googlesource.com/c/go/+/611678 Run-TryBot: shuang cui <imcusg@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>	2024-09-09 16:44:45 +00:00
Meng Zhuo	3473d2f8ef	math: add large exact float rounding tests This CL adds trunc,ceil,floor tests for large exact float. Change-Id: Ib7ffec1d2d50d2ac955398a3dd0fd06d494fcf4f Reviewed-on: https://go-review.googlesource.com/c/go/+/601095 Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2024-09-04 13:36:10 +00:00
nlwkobe30	7cd0a4be5c	all: omit unnecessary 0 in slice expression All changes are related to the code, except for the comments in src/regexp/syntax/parse.go and src/slices/slices.go. Change-Id: I73c5d3c54099749b62210aa7f3182c5eb84bb6a6 GitHub-Last-Rev: `794aa9b053` GitHub-Pull-Request: golang/go#69170 Reviewed-on: https://go-review.googlesource.com/c/go/+/609678 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com>	2024-09-03 20:55:15 +00:00
Kir Kolyshkin	c3f346a485	math,os,os/*: use testenv.Executable As some callers don't have a testing context, modify testenv.Executable to accept nil (similar to how testenv.GOROOT works). Change-Id: I39112a7869933785a26b5cb6520055b3cc42b847 Reviewed-on: https://go-review.googlesource.com/c/go/+/609835 Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>	2024-09-03 20:11:30 +00:00
Joel Sing	2cee5d8109	math/big: implement addMulVVW in riscv64 assembly This provides an assembly implementation of addMulVVW for riscv64, processing up to four words per loop, resulting in a significant performance gain. On a StarFive VisionFive 2: │ addmulvvw.1 │ addmulvvw.2 │ │ sec/op │ sec/op vs base │ AddMulVVW/1-4 65.49n ± 0% 50.79n ± 0% -22.44% (p=0.000 n=10) AddMulVVW/2-4 82.81n ± 0% 66.83n ± 0% -19.29% (p=0.000 n=10) AddMulVVW/3-4 100.20n ± 0% 82.87n ± 0% -17.30% (p=0.000 n=10) AddMulVVW/4-4 117.50n ± 0% 84.20n ± 0% -28.34% (p=0.000 n=10) AddMulVVW/5-4 134.9n ± 0% 100.3n ± 0% -25.69% (p=0.000 n=10) AddMulVVW/10-4 221.7n ± 0% 164.4n ± 0% -25.85% (p=0.000 n=10) AddMulVVW/100-4 1.794µ ± 0% 1.250µ ± 0% -30.32% (p=0.000 n=10) AddMulVVW/1000-4 17.42µ ± 0% 12.08µ ± 0% -30.68% (p=0.000 n=10) AddMulVVW/10000-4 254.9µ ± 0% 214.8µ ± 0% -15.75% (p=0.000 n=10) AddMulVVW/100000-4 2.569m ± 0% 2.178m ± 0% -15.20% (p=0.000 n=10) geomean 1.443µ 1.107µ -23.29% │ addmulvvw.1 │ addmulvvw.2 │ │ B/s │ B/s vs base │ AddMulVVW/1-4 932.0Mi ± 0% 1201.6Mi ± 0% +28.93% (p=0.000 n=10) AddMulVVW/2-4 1.440Gi ± 0% 1.784Gi ± 0% +23.90% (p=0.000 n=10) AddMulVVW/3-4 1.785Gi ± 0% 2.158Gi ± 0% +20.87% (p=0.000 n=10) AddMulVVW/4-4 2.029Gi ± 0% 2.832Gi ± 0% +39.59% (p=0.000 n=10) AddMulVVW/5-4 2.209Gi ± 0% 2.973Gi ± 0% +34.55% (p=0.000 n=10) AddMulVVW/10-4 2.689Gi ± 0% 3.626Gi ± 0% +34.86% (p=0.000 n=10) AddMulVVW/100-4 3.323Gi ± 0% 4.770Gi ± 0% +43.54% (p=0.000 n=10) AddMulVVW/1000-4 3.421Gi ± 0% 4.936Gi ± 0% +44.27% (p=0.000 n=10) AddMulVVW/10000-4 2.338Gi ± 0% 2.776Gi ± 0% +18.69% (p=0.000 n=10) AddMulVVW/100000-4 2.320Gi ± 0% 2.736Gi ± 0% +17.93% (p=0.000 n=10) geomean 2.109Gi 2.749Gi +30.36% Change-Id: I6c7ee48233c53ff9b6a5a9002675886cd9bff5af Reviewed-on: https://go-review.googlesource.com/c/go/+/595400 Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2024-08-23 10:41:28 +00:00
Joel Sing	86cd5c4034	math/big: implement mulAddVWW in riscv64 assembly This provides an assembly implementation of mulAddVWW for riscv64, processing up to four words per loop, resulting in a significant performance gain. On a StarFive VisionFive 2: │ muladdvww.1 │ muladdvww.2 │ │ sec/op │ sec/op vs base │ MulAddVWW/1-4 68.18n ± 0% 65.49n ± 0% -3.95% (p=0.000 n=10) MulAddVWW/2-4 82.81n ± 0% 78.85n ± 0% -4.78% (p=0.000 n=10) MulAddVWW/3-4 97.49n ± 0% 72.18n ± 0% -25.96% (p=0.000 n=10) MulAddVWW/4-4 112.20n ± 0% 85.54n ± 0% -23.76% (p=0.000 n=10) MulAddVWW/5-4 126.90n ± 0% 98.90n ± 0% -22.06% (p=0.000 n=10) MulAddVWW/10-4 200.3n ± 0% 144.3n ± 0% -27.96% (p=0.000 n=10) MulAddVWW/100-4 1532.0n ± 0% 860.0n ± 0% -43.86% (p=0.000 n=10) MulAddVWW/1000-4 14.757µ ± 0% 8.076µ ± 0% -45.27% (p=0.000 n=10) MulAddVWW/10000-4 204.0µ ± 0% 137.1µ ± 0% -32.77% (p=0.000 n=10) MulAddVWW/100000-4 2.066m ± 0% 1.382m ± 0% -33.12% (p=0.000 n=10) geomean 1.311µ 950.0n -27.51% │ muladdvww.1 │ muladdvww.2 │ │ B/s │ B/s vs base │ MulAddVWW/1-4 895.1Mi ± 0% 932.0Mi ± 0% +4.11% (p=0.000 n=10) MulAddVWW/2-4 1.440Gi ± 0% 1.512Gi ± 0% +5.02% (p=0.000 n=10) MulAddVWW/3-4 1.834Gi ± 0% 2.477Gi ± 0% +35.07% (p=0.000 n=10) MulAddVWW/4-4 2.125Gi ± 0% 2.787Gi ± 0% +31.15% (p=0.000 n=10) MulAddVWW/5-4 2.349Gi ± 0% 3.013Gi ± 0% +28.28% (p=0.000 n=10) MulAddVWW/10-4 2.975Gi ± 0% 4.130Gi ± 0% +38.79% (p=0.000 n=10) MulAddVWW/100-4 3.891Gi ± 0% 6.930Gi ± 0% +78.11% (p=0.000 n=10) MulAddVWW/1000-4 4.039Gi ± 0% 7.380Gi ± 0% +82.72% (p=0.000 n=10) MulAddVWW/10000-4 2.922Gi ± 0% 4.346Gi ± 0% +48.74% (p=0.000 n=10) MulAddVWW/100000-4 2.884Gi ± 0% 4.313Gi ± 0% +49.52% (p=0.000 n=10) geomean 2.321Gi 3.202Gi +37.95% Change-Id: If08191607913ce5c7641f34bae8fa5c9dfb44777 Reviewed-on: https://go-review.googlesource.com/c/go/+/595399 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>	2024-08-23 10:41:12 +00:00
Joel Sing	4f18477db6	math/big: implement subVW in riscv64 assembly This provides an assembly implementation of subVW for riscv64, processing up to four words per loop, resulting in a significant performance gain. On a StarFive VisionFive 2: │ subvw.1 │ subvw.2 │ │ sec/op │ sec/op vs base │ SubVW/1-4 57.43n ± 0% 41.45n ± 0% -27.82% (p=0.000 n=10) SubVW/2-4 69.31n ± 0% 48.15n ± 0% -30.53% (p=0.000 n=10) SubVW/3-4 76.12n ± 0% 54.87n ± 0% -27.92% (p=0.000 n=10) SubVW/4-4 85.47n ± 0% 56.14n ± 0% -34.32% (p=0.000 n=10) SubVW/5-4 96.15n ± 0% 62.83n ± 0% -34.65% (p=0.000 n=10) SubVW/10-4 149.60n ± 0% 89.55n ± 0% -40.14% (p=0.000 n=10) SubVW/100-4 1115.0n ± 0% 549.3n ± 0% -50.74% (p=0.000 n=10) SubVW/1000-4 10.732µ ± 0% 5.071µ ± 0% -52.75% (p=0.000 n=10) SubVW/10000-4 153.0µ ± 0% 103.7µ ± 0% -32.21% (p=0.000 n=10) SubVW/100000-4 1.542m ± 0% 1.046m ± 0% -32.13% (p=0.000 n=10) SubVWext/1-4 57.42n ± 0% 41.45n ± 0% -27.81% (p=0.000 n=10) SubVWext/2-4 69.33n ± 0% 48.15n ± 0% -30.55% (p=0.000 n=10) SubVWext/3-4 76.12n ± 0% 54.93n ± 0% -27.84% (p=0.000 n=10) SubVWext/4-4 85.47n ± 0% 56.14n ± 0% -34.32% (p=0.000 n=10) SubVWext/5-4 96.15n ± 0% 62.83n ± 0% -34.65% (p=0.000 n=10) SubVWext/10-4 149.60n ± 0% 89.56n ± 0% -40.14% (p=0.000 n=10) SubVWext/100-4 1115.0n ± 0% 549.3n ± 0% -50.74% (p=0.000 n=10) SubVWext/1000-4 10.732µ ± 0% 5.061µ ± 0% -52.84% (p=0.000 n=10) SubVWext/10000-4 152.5µ ± 0% 103.7µ ± 0% -32.02% (p=0.000 n=10) SubVWext/100000-4 1.533m ± 0% 1.046m ± 0% -31.75% (p=0.000 n=10) geomean 1.005µ 633.7n -36.92% │ subvw.1 │ subvw.2 │ │ B/s │ B/s vs base │ SubVW/1-4 132.9Mi ± 0% 184.1Mi ± 0% +38.54% (p=0.000 n=10) SubVW/2-4 220.1Mi ± 0% 316.9Mi ± 0% +43.95% (p=0.000 n=10) SubVW/3-4 300.7Mi ± 0% 417.1Mi ± 0% +38.72% (p=0.000 n=10) SubVW/4-4 357.1Mi ± 0% 543.6Mi ± 0% +52.24% (p=0.000 n=10) SubVW/5-4 396.7Mi ± 0% 607.2Mi ± 0% +53.03% (p=0.000 n=10) SubVW/10-4 510.1Mi ± 0% 851.9Mi ± 0% +67.01% (p=0.000 n=10) SubVW/100-4 684.2Mi ± 0% 1388.9Mi ± 0% +102.99% (p=0.000 n=10) SubVW/1000-4 710.9Mi ± 0% 1504.5Mi ± 0% +111.63% (p=0.000 n=10) SubVW/10000-4 498.7Mi ± 0% 735.7Mi ± 0% +47.52% (p=0.000 n=10) SubVW/100000-4 494.8Mi ± 0% 729.1Mi ± 0% +47.34% (p=0.000 n=10) SubVWext/1-4 132.9Mi ± 0% 184.1Mi ± 0% +38.53% (p=0.000 n=10) SubVWext/2-4 220.1Mi ± 0% 316.9Mi ± 0% +44.00% (p=0.000 n=10) SubVWext/3-4 300.7Mi ± 0% 416.7Mi ± 0% +38.57% (p=0.000 n=10) SubVWext/4-4 357.1Mi ± 0% 543.6Mi ± 0% +52.24% (p=0.000 n=10) SubVWext/5-4 396.7Mi ± 0% 607.2Mi ± 0% +53.04% (p=0.000 n=10) SubVWext/10-4 510.1Mi ± 0% 851.9Mi ± 0% +67.01% (p=0.000 n=10) SubVWext/100-4 684.2Mi ± 0% 1388.9Mi ± 0% +102.99% (p=0.000 n=10) SubVWext/1000-4 710.9Mi ± 0% 1507.6Mi ± 0% +112.07% (p=0.000 n=10) SubVWext/10000-4 500.1Mi ± 0% 735.7Mi ± 0% +47.10% (p=0.000 n=10) SubVWext/100000-4 497.8Mi ± 0% 729.4Mi ± 0% +46.52% (p=0.000 n=10) geomean 387.6Mi 614.5Mi +58.51% Change-Id: I9d7fac719e977710ad9db9121fa298db6df605de Reviewed-on: https://go-review.googlesource.com/c/go/+/595398 Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2024-08-22 13:19:44 +00:00
Joel Sing	c6f56985ad	math/big: implement addVW in riscv64 assembly This provides an assembly implementation of addVW for riscv64, processing up to four words per loop, resulting in a significant performance gain. On a StarFive VisionFive 2: │ addvw.1 │ addvw.2 │ │ sec/op │ sec/op vs base │ AddVW/1-4 57.43n ± 0% 41.45n ± 0% -27.83% (p=0.000 n=10) AddVW/2-4 69.31n ± 0% 48.15n ± 0% -30.53% (p=0.000 n=10) AddVW/3-4 76.12n ± 0% 54.97n ± 0% -27.79% (p=0.000 n=10) AddVW/4-4 85.47n ± 0% 56.14n ± 0% -34.32% (p=0.000 n=10) AddVW/5-4 96.16n ± 0% 62.82n ± 0% -34.67% (p=0.000 n=10) AddVW/10-4 149.60n ± 0% 89.55n ± 0% -40.14% (p=0.000 n=10) AddVW/100-4 1115.0n ± 0% 549.3n ± 0% -50.74% (p=0.000 n=10) AddVW/1000-4 10.732µ ± 0% 5.060µ ± 0% -52.85% (p=0.000 n=10) AddVW/10000-4 151.7µ ± 0% 103.7µ ± 0% -31.63% (p=0.000 n=10) AddVW/100000-4 1.523m ± 0% 1.050m ± 0% -31.03% (p=0.000 n=10) AddVWext/1-4 57.42n ± 0% 41.45n ± 0% -27.81% (p=0.000 n=10) AddVWext/2-4 69.32n ± 0% 48.15n ± 0% -30.54% (p=0.000 n=10) AddVWext/3-4 76.12n ± 0% 54.87n ± 0% -27.92% (p=0.000 n=10) AddVWext/4-4 85.47n ± 0% 56.14n ± 0% -34.32% (p=0.000 n=10) AddVWext/5-4 96.15n ± 0% 62.82n ± 0% -34.66% (p=0.000 n=10) AddVWext/10-4 149.60n ± 0% 89.55n ± 0% -40.14% (p=0.000 n=10) AddVWext/100-4 1115.0n ± 0% 549.3n ± 0% -50.74% (p=0.000 n=10) AddVWext/1000-4 10.732µ ± 0% 5.060µ ± 0% -52.85% (p=0.000 n=10) AddVWext/10000-4 150.5µ ± 0% 103.7µ ± 0% -31.10% (p=0.000 n=10) AddVWext/100000-4 1.530m ± 0% 1.049m ± 0% -31.41% (p=0.000 n=10) geomean 1.003µ 633.9n -36.79% │ addvw.1 │ addvw.2 │ │ B/s │ B/s vs base │ AddVW/1-4 132.8Mi ± 0% 184.1Mi ± 0% +38.55% (p=0.000 n=10) AddVW/2-4 220.1Mi ± 0% 316.9Mi ± 0% +43.96% (p=0.000 n=10) AddVW/3-4 300.7Mi ± 0% 416.4Mi ± 0% +38.48% (p=0.000 n=10) AddVW/4-4 357.1Mi ± 0% 543.6Mi ± 0% +52.25% (p=0.000 n=10) AddVW/5-4 396.7Mi ± 0% 607.2Mi ± 0% +53.06% (p=0.000 n=10) AddVW/10-4 510.1Mi ± 0% 852.0Mi ± 0% +67.02% (p=0.000 n=10) AddVW/100-4 684.1Mi ± 0% 1389.0Mi ± 0% +103.03% (p=0.000 n=10) AddVW/1000-4 710.9Mi ± 0% 1507.8Mi ± 0% +112.08% (p=0.000 n=10) AddVW/10000-4 503.1Mi ± 0% 735.8Mi ± 0% +46.26% (p=0.000 n=10) AddVW/100000-4 501.0Mi ± 0% 726.5Mi ± 0% +45.00% (p=0.000 n=10) AddVWext/1-4 132.9Mi ± 0% 184.1Mi ± 0% +38.55% (p=0.000 n=10) AddVWext/2-4 220.1Mi ± 0% 316.9Mi ± 0% +43.98% (p=0.000 n=10) AddVWext/3-4 300.7Mi ± 0% 417.1Mi ± 0% +38.73% (p=0.000 n=10) AddVWext/4-4 357.1Mi ± 0% 543.6Mi ± 0% +52.25% (p=0.000 n=10) AddVWext/5-4 396.7Mi ± 0% 607.2Mi ± 0% +53.05% (p=0.000 n=10) AddVWext/10-4 510.1Mi ± 0% 852.0Mi ± 0% +67.02% (p=0.000 n=10) AddVWext/100-4 684.2Mi ± 0% 1389.0Mi ± 0% +103.02% (p=0.000 n=10) AddVWext/1000-4 710.9Mi ± 0% 1507.7Mi ± 0% +112.08% (p=0.000 n=10) AddVWext/10000-4 506.9Mi ± 0% 735.8Mi ± 0% +45.15% (p=0.000 n=10) AddVWext/100000-4 498.6Mi ± 0% 727.0Mi ± 0% +45.79% (p=0.000 n=10) geomean 388.3Mi 614.3Mi +58.19% Change-Id: Ib14a4b8c1d81e710753bbf6dd5546bbca44fe3f1 Reviewed-on: https://go-review.googlesource.com/c/go/+/595397 Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>	2024-08-22 13:19:28 +00:00
apocelipes	fd985d23dc	crypto/x509,math/rand/v2: implement the encoding.(Binary\|Text)Appender Implement the encoding.(Binary\|Text)Appender interfaces for "x509.OID". Implement the encoding.BinaryAppender interface for "rand/v2.PCG" and "rand/v2.ChaCha8". "rand/v2.ChaCha8.MarshalBinary" alse gains some performance benefits: │ old │ new │ │ sec/op │ sec/op vs base │ ChaCha8MarshalBinary-8 33.730n ± 2% 9.786n ± 1% -70.99% (p=0.000 n=10) ChaCha8MarshalBinaryRead-8 99.86n ± 1% 17.79n ± 0% -82.18% (p=0.000 n=10) geomean 58.04n 13.19n -77.27% │ old │ new │ │ B/op │ B/op vs base │ ChaCha8MarshalBinary-8 48.00 ± 0% 0.00 ± 0% -100.00% (p=0.000 n=10) ChaCha8MarshalBinaryRead-8 83.00 ± 0% 0.00 ± 0% -100.00% (p=0.000 n=10) │ old │ new │ │ allocs/op │ allocs/op vs base │ ChaCha8MarshalBinary-8 1.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=10) ChaCha8MarshalBinaryRead-8 2.000 ± 0% 0.000 ± 0% -100.00% (p=0.000 n=10) For #62384 Change-Id: I604bde6dad90a916012909c7260f4bb06dcf5c0a GitHub-Last-Rev: `78abf9c5df` GitHub-Pull-Request: golang/go#68987 Reviewed-on: https://go-review.googlesource.com/c/go/+/607079 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com>	2024-08-21 19:19:57 +00:00
Alexander Cyon	54c948de9a	src: fix typos Fix typos in ~30 files Change-Id: Ie433aea01e7d15944c1e9e103691784876d5c1f9 GitHub-Last-Rev: `bbaeb3d1f8` GitHub-Pull-Request: golang/go#68964 Reviewed-on: https://go-review.googlesource.com/c/go/+/606955 Auto-Submit: Ian Lance Taylor <iant@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com>	2024-08-20 21:01:59 +00:00
Paschalis T	d2b6bdb035	math/rand: make calls to Seed no-op Makes calls to the global Seed a no-op. The GODEBUG=randseednop=0 setting can be used to revert this behavior. Fixes #67273 Change-Id: I79c1b2b23f3bc472fbd6190cb916a9d7583250f4 Reviewed-on: https://go-review.googlesource.com/c/go/+/606055 Auto-Submit: Cherry Mui <cherryyz@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2024-08-19 20:47:27 +00:00
apocelipes	527610763b	math/big,regexp: implement the encoding.TextAppender interface For #62384 Change-Id: I1557704c6a0f9c6f3b9aad001374dd5cdbc99065 GitHub-Last-Rev: `c258d18cce` GitHub-Pull-Request: golang/go#68893 Reviewed-on: https://go-review.googlesource.com/c/go/+/605758 Reviewed-by: Ian Lance Taylor <iant@google.com> Commit-Queue: Robert Griesemer <gri@google.com> Reviewed-by: Robert Griesemer <gri@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Robert Griesemer <gri@google.com>	2024-08-15 23:43:00 +00:00
Joel Sing	1419d0e925	math/big: implement subVV in riscv64 assembly This provides an assembly implementation of subVV for riscv64, processing up to four words per loop, resulting in a significant performance gain. On a StarFive VisionFive 2: │ subvv.1 │ subvv.2 │ │ sec/op │ sec/op vs base │ SubVV/1-4 73.46n ± 0% 48.08n ± 0% -34.55% (p=0.000 n=10) SubVV/2-4 88.13n ± 0% 58.76n ± 0% -33.33% (p=0.000 n=10) SubVV/3-4 102.80n ± 0% 69.45n ± 0% -32.44% (p=0.000 n=10) SubVV/4-4 117.50n ± 0% 72.11n ± 0% -38.63% (p=0.000 n=10) SubVV/5-4 132.20n ± 0% 82.80n ± 0% -37.37% (p=0.000 n=10) SubVV/10-4 216.3n ± 0% 126.9n ± 0% -41.33% (p=0.000 n=10) SubVV/100-4 1659.0n ± 0% 886.5n ± 0% -46.56% (p=0.000 n=10) SubVV/1000-4 16.089µ ± 0% 8.401µ ± 0% -47.78% (p=0.000 n=10) SubVV/10000-4 244.7µ ± 0% 176.8µ ± 0% -27.74% (p=0.000 n=10) SubVV/100000-4 2.562m ± 0% 1.871m ± 0% -26.96% (p=0.000 n=10) geomean 1.436µ 904.4n -37.04% │ subvv.1 │ subvv.2 │ │ B/s │ B/s vs base │ SubVV/1-4 830.9Mi ± 0% 1269.5Mi ± 0% +52.79% (p=0.000 n=10) SubVV/2-4 1.353Gi ± 0% 2.029Gi ± 0% +49.99% (p=0.000 n=10) SubVV/3-4 1.739Gi ± 0% 2.575Gi ± 0% +48.06% (p=0.000 n=10) SubVV/4-4 2.029Gi ± 0% 3.306Gi ± 0% +62.96% (p=0.000 n=10) SubVV/5-4 2.254Gi ± 0% 3.600Gi ± 0% +59.67% (p=0.000 n=10) SubVV/10-4 2.755Gi ± 0% 4.699Gi ± 0% +70.53% (p=0.000 n=10) SubVV/100-4 3.594Gi ± 0% 6.723Gi ± 0% +87.08% (p=0.000 n=10) SubVV/1000-4 3.705Gi ± 0% 7.095Gi ± 0% +91.52% (p=0.000 n=10) SubVV/10000-4 2.436Gi ± 0% 3.372Gi ± 0% +38.39% (p=0.000 n=10) SubVV/100000-4 2.327Gi ± 0% 3.185Gi ± 0% +36.91% (p=0.000 n=10) geomean 2.118Gi 3.364Gi +58.84% Change-Id: I361cb3f4195b27a9f1e9486c9e1fdbeaa94d32b4 Reviewed-on: https://go-review.googlesource.com/c/go/+/595396 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>	2024-08-15 12:06:59 +00:00
Xiaolin Zhao	ff14e08cd3	cmd/compile, math: improve implementation of math.{Max,Min} on loong64 Make math.{Min,Max} intrinsics and implement math.{archMax,archMin} in hardware. goos: linux goarch: loong64 pkg: math cpu: Loongson-3A6000 @ 2500.00MHz │ old.bench │ new.bench │ │ sec/op │ sec/op vs base │ Max 7.606n ± 0% 3.087n ± 0% -59.41% (p=0.000 n=20) Min 7.205n ± 0% 2.904n ± 0% -59.69% (p=0.000 n=20) MinFloat 37.220n ± 0% 4.802n ± 0% -87.10% (p=0.000 n=20) MaxFloat 33.620n ± 0% 4.802n ± 0% -85.72% (p=0.000 n=20) geomean 16.18n 3.792n -76.57% goos: linux goarch: loong64 pkg: runtime cpu: Loongson-3A5000 @ 2500.00MHz │ old.bench │ new.bench │ │ sec/op │ sec/op vs base │ Max 10.010n ± 0% 7.196n ± 0% -28.11% (p=0.000 n=20) Min 8.806n ± 0% 7.155n ± 0% -18.75% (p=0.000 n=20) MinFloat 60.010n ± 0% 7.976n ± 0% -86.71% (p=0.000 n=20) MaxFloat 56.410n ± 0% 7.980n ± 0% -85.85% (p=0.000 n=20) geomean 23.37n 7.566n -67.63% Updates #59120. Change-Id: I6815d20bc304af3cbf5d6ca8fe0ca1c2ddebea2d Reviewed-on: https://go-review.googlesource.com/c/go/+/580283 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>	2024-08-07 01:16:28 +00:00
Joel Sing	9abd11440c	math/big: implement addVV in riscv64 assembly This provides an assembly implementation of addVV for riscv64, processing up to four words per loop, resulting in a significant performance gain. On a StarFive VisionFive 2: │ addvv.1 │ addvv.2 │ │ sec/op │ sec/op vs base │ AddVV/1-4 73.45n ± 0% 48.08n ± 0% -34.54% (p=0.000 n=10) AddVV/2-4 88.14n ± 0% 58.76n ± 0% -33.33% (p=0.000 n=10) AddVV/3-4 102.80n ± 0% 69.44n ± 0% -32.45% (p=0.000 n=10) AddVV/4-4 117.50n ± 0% 72.18n ± 0% -38.57% (p=0.000 n=10) AddVV/5-4 132.20n ± 0% 82.79n ± 0% -37.38% (p=0.000 n=10) AddVV/10-4 216.3n ± 0% 126.8n ± 0% -41.35% (p=0.000 n=10) AddVV/100-4 1659.0n ± 0% 885.2n ± 0% -46.64% (p=0.000 n=10) AddVV/1000-4 16.089µ ± 0% 8.400µ ± 0% -47.79% (p=0.000 n=10) AddVV/10000-4 245.3µ ± 0% 176.9µ ± 0% -27.88% (p=0.000 n=10) AddVV/100000-4 2.537m ± 0% 1.873m ± 0% -26.17% (p=0.000 n=10) geomean 1.435µ 904.5n -36.99% │ addvv.1 │ addvv.2 │ │ B/s │ B/s vs base │ AddVV/1-4 830.9Mi ± 0% 1269.5Mi ± 0% +52.78% (p=0.000 n=10) AddVV/2-4 1.353Gi ± 0% 2.029Gi ± 0% +50.00% (p=0.000 n=10) AddVV/3-4 1.739Gi ± 0% 2.575Gi ± 0% +48.09% (p=0.000 n=10) AddVV/4-4 2.029Gi ± 0% 3.303Gi ± 0% +62.82% (p=0.000 n=10) AddVV/5-4 2.254Gi ± 0% 3.600Gi ± 0% +59.69% (p=0.000 n=10) AddVV/10-4 2.755Gi ± 0% 4.699Gi ± 0% +70.54% (p=0.000 n=10) AddVV/100-4 3.594Gi ± 0% 6.734Gi ± 0% +87.37% (p=0.000 n=10) AddVV/1000-4 3.705Gi ± 0% 7.096Gi ± 0% +91.54% (p=0.000 n=10) AddVV/10000-4 2.430Gi ± 0% 3.369Gi ± 0% +38.65% (p=0.000 n=10) AddVV/100000-4 2.350Gi ± 0% 3.183Gi ± 0% +35.44% (p=0.000 n=10) geomean 2.119Gi 3.364Gi +58.71% Change-Id: I727b3d9f8ab01eada7270046480b1430d56d0a96 Reviewed-on: https://go-review.googlesource.com/c/go/+/595395 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: M Zhuo <mengzhuo1203@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Than McIntosh <thanm@google.com>	2024-08-02 14:14:14 +00:00
withsky	fc51e5023e	math/big: fix comment typo in natdiv.go Comment in line 395: [x₀ < S, so S - x₀ < 0; drop it] Should be: [x₀ < S, so S - x₀ > 0; drop it] The proof is based on S - x₀ > 0, thus it's a typo of comment. Fixes #68466 Change-Id: I68bb7cb909ba2bfe02a8873f74b57edc6679b72a GitHub-Last-Rev: `40a2fc80cf` GitHub-Pull-Request: golang/go#68487 Reviewed-on: https://go-review.googlesource.com/c/go/+/598855 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org>	2024-07-17 15:08:04 +00:00
Kir Kolyshkin	97ccc224f1	math/big: use lists in docstrings This looks way better than the code formatting. Similar to CL 597656. Change-Id: I2c8809c1d6f8a8387941567213880662ff649a73 Reviewed-on: https://go-review.googlesource.com/c/go/+/597659 Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2024-07-16 18:17:35 +00:00
Kir Kolyshkin	66e940b6f8	math/big: more cross-references in docstrings Change-Id: I3541859bbf3ac4f9317b82a66d21be3d5c4c5a84 Reviewed-on: https://go-review.googlesource.com/c/go/+/597658 Reviewed-by: Ian Lance Taylor <iant@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Auto-Submit: Ian Lance Taylor <iant@google.com>	2024-07-16 18:17:32 +00:00
Jorropo	b3040679ad	math: remove riscv64 assembly implementations of rounding Fixes #68322 This reverts commit `ad377e906a`. Change-Id: Ifa4811e2c679d789cc830dbff5e50301410e24d0 Reviewed-on: https://go-review.googlesource.com/c/go/+/596516 Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: Keith Randall <khr@golang.org> Commit-Queue: Cuong Manh Le <cuong.manhle.vn@gmail.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>	2024-07-10 03:19:58 +00:00
Robert Griesemer	1831437f19	math/big: better doc string for Float.Copy, add example test Fixes #66358. Change-Id: Ic9bde88eabfb2a446d32e1dc5ac404a51ef49f11 Reviewed-on: https://go-review.googlesource.com/c/go/+/590635 Auto-Submit: Robert Griesemer <gri@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Reviewed-by: Robert Griesemer <gri@google.com>	2024-06-06 15:46:54 +00:00
Russ Cox	2a7ca156b8	all: document legacy //go:linkname for final round of modules Add linknames for most modules with ≥50 dependents. Add linknames for a few other modules that we know are important but are below 50. Remove linknames from badlinkname.go that do not merit inclusion (very small number of dependents). We can add them back later if the need arises. Fixes #67401. (For now.) Change-Id: I1e49fec0292265256044d64b1841d366c4106002 Reviewed-on: https://go-review.googlesource.com/c/go/+/587756 Auto-Submit: Russ Cox <rsc@golang.org> TryBot-Bypass: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2024-05-29 17:58:53 +00:00
Alan Donovan	bf91eb3a8b	std: fix calls to Printf(s) with non-constant s In all cases the intent was not to interpret s as a format string. In one case (go/types), this was a latent bug in production. (These were uncovered by a new check in vet's printf analyzer.) Updates #60529 Change-Id: I3e17af7e589be9aec1580783a1b1011c52ec494b Reviewed-on: https://go-review.googlesource.com/c/go/+/587855 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Russ Cox <rsc@golang.org>	2024-05-23 18:42:28 +00:00
Russ Cox	9a3ef86173	all: document legacy //go:linkname for modules with ≥5,000 dependents For #67401. Change-Id: Ifea84af92017b405466937f50fb8f28e6893c8cb Reviewed-on: https://go-review.googlesource.com/c/go/+/587220 Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Russ Cox <rsc@golang.org>	2024-05-23 01:15:13 +00:00
Ian Lance Taylor	b0b1d42db3	all: change from sort functions to slices functions where feasible Doing this because the slices functions are slightly faster and slightly easier to use. It also removes one dependency layer. This CL does not change packages that are used during bootstrap, as the bootstrap compiler does not have the required slices functions. It does not change the go/scanner package because the ErrorList Len, Swap, and Less methods are part of the Go 1 API. Change-Id: If52899be791c829198e11d2408727720b91ebe8a Reviewed-on: https://go-review.googlesource.com/c/go/+/587655 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Ian Lance Taylor <iant@google.com> Auto-Submit: Ian Lance Taylor <iant@google.com> Commit-Queue: Ian Lance Taylor <iant@google.com> Reviewed-by: Damien Neil <dneil@google.com>	2024-05-23 01:00:11 +00:00
Filippo Valsorda	587c3847da	math/rand/v2: add ChaCha8.Read Fixes #67059 Closes #67452 Closes #67498 Change-Id: I84eba2ed787a17e9d6aaad2a8a78596e3944909a Reviewed-on: https://go-review.googlesource.com/c/go/+/587280 Reviewed-by: Roland Shoemaker <roland@golang.org> Auto-Submit: Filippo Valsorda <filippo@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2024-05-22 22:09:08 +00:00
Paul E. Murphy	cc673d2ec5	all: convert PPC64 CMPx ...,R0,... to CMPx Rx,$0 Cleanup all remaining trivial compares against $0 in ppc64x assembly. In math, SRD ...,Rx; CMP Rx, $0 is further simplified to SRDCC. Change-Id: Ia2bc204953e32f08ee142bfd06a91965f30f99b6 Reviewed-on: https://go-review.googlesource.com/c/go/+/587016 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Run-TryBot: Paul Murphy <murp@ibm.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2024-05-22 18:17:17 +00:00
Brad Fitzpatrick	bf0bbd5360	math/rand/v2: drop pointer receiver on zero-width type Just a cleanup. Change-Id: Ibeb2c7d447c793086280e612fe5f0f7eeb863f71 Reviewed-on: https://go-review.googlesource.com/c/go/+/582875 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Damien Neil <dneil@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com>	2024-05-22 02:25:49 +00:00
Jayanth Krishnamurthy	f726c8d0fd	ppc64x: code cleanup in assembly files Replacing Branch Conditional (BC) with its extended mnemonic form of BDNZ and BDZ. - BC 16, 0, target can be replaced by BDNZ target - BC 18, 0, target can be replaced by BDZ target Change-Id: I1259e207f2a40d0b72780d5421f7449ddc006dc5 Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10 Reviewed-on: https://go-review.googlesource.com/c/go/+/585077 Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>	2024-05-20 18:05:32 +00:00
Cherry Mui	41aab30bd2	all: add push linknames to allow legacy pull linknames CL 585358 adds restrictions to disallow pull-only linknames (currently off by default). Currently, there are quite some pull- only linknames in user code in the wild. In order not to break those, we add push linknames to allow them to be pulled. This CL includes linknames found in a large code corpus (thanks Matthew Dempsky and Michael Pratt for the analysis!), that are not currently linknamed. Updates #67401. Change-Id: I32f5fc0c7a6abbd7a11359a025cfa2bf458fe767 Reviewed-on: https://go-review.googlesource.com/c/go/+/586137 Reviewed-by: Russ Cox <rsc@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2024-05-17 16:48:00 +00:00

1 2 3 4 5 ...

765 Commits