mirror/go - go - Git Fam. Sieh

Commit Graph

Author	SHA1	Message	Date
Austin Clements	1d20a362d0	math: avoid assembly stubs Currently almost all math functions have the following pattern: func Sin(x float64) float64 func sin(x float64) float64 { // ... pure Go implementation ... } Architectures that implement a function in assembly provide the assembly implementation directly as the exported function (e.g., Sin), and architectures that don't implement it in assembly use a small stub to jump back to the Go code, like: TEXT ·Sin(SB), NOSPLIT, $0 JMP ·sin(SB) However, most functions are not implemented in assembly on most architectures, so this jump through assembly is a waste. It defeats compiler optimizations like inlining. And, with regabi, it actually adds a small but non-trivial overhead because the jump from assembly back to Go must go through an ABI0->ABIInternal bridge function. Hence, this CL reorganizes this structure across the entire package. It now leans on inlining to achieve peak performance, but allows the compiler to see all the way through the pure Go implementation. Now, functions follow this pattern: func Sin(x float64) float64 { if haveArchSin { return archSin(x) } return sin(x) } func sin(x float64) float64 { // ... pure Go implementation ... } Architectures that have assembly implementations use build-tagged files to set haveArchX to true and provide an archX implementation. That implementation can also still call back into the Go implementation (some of them do this). Prior to this change, enabling ABI wrappers results in a geomean slowdown of the math benchmarks of 8.77% (full results: https://perf.golang.org/search?q=upload:20210415.6) and of the Tile38 benchmarks by ~4%. After this change, enabling ABI wrappers is completely performance-neutral on Tile38 and all but one math benchmark (full results: https://perf.golang.org/search?q=upload:20210415.7). ABI wrappers slow down SqrtIndirectLatency-12 by 2.09%, which makes sense because that call must still go through an ABI wrapper. 
With ABI wrappers disabled (which won't be an option on amd64 much longer), on linux/amd64, this change is largely performance-neutral and slightly improves the performance of a few benchmarks: (Because there are so many benchmarks, I've applied the Šidák correction to the alpha threshold. It makes relatively little difference in which benchmarks are statistically significant.) name old time/op new time/op delta Acos-12 22.3ns ± 0% 18.8ns ± 1% -15.44% (p=0.000 n=18+16) Acosh-12 28.2ns ± 0% 28.2ns ± 0% ~ (p=0.404 n=18+20) Asin-12 18.1ns ± 0% 18.2ns ± 0% +0.20% (p=0.000 n=18+16) Asinh-12 32.8ns ± 0% 32.9ns ± 1% ~ (p=0.891 n=18+20) Atan-12 9.92ns ± 0% 9.90ns ± 1% -0.24% (p=0.000 n=17+16) Atanh-12 27.7ns ± 0% 27.5ns ± 0% -0.72% (p=0.000 n=16+20) Atan2-12 18.5ns ± 0% 18.4ns ± 0% -0.59% (p=0.000 n=19+19) Cbrt-12 22.1ns ± 0% 22.1ns ± 0% ~ (p=0.804 n=16+17) Ceil-12 0.84ns ± 0% 0.84ns ± 0% ~ (p=0.663 n=18+16) Copysign-12 0.84ns ± 0% 0.84ns ± 0% ~ (p=0.762 n=16+19) Cos-12 12.7ns ± 0% 12.7ns ± 1% ~ (p=0.145 n=19+18) Cosh-12 22.2ns ± 0% 22.5ns ± 0% +1.60% (p=0.000 n=17+19) Erf-12 11.1ns ± 1% 11.1ns ± 1% ~ (p=0.010 n=19+19) Erfc-12 12.6ns ± 1% 12.7ns ± 0% ~ (p=0.066 n=19+15) Erfinv-12 16.1ns ± 0% 16.1ns ± 0% ~ (p=0.462 n=17+20) Erfcinv-12 16.0ns ± 1% 16.0ns ± 1% ~ (p=0.015 n=17+16) Exp-12 16.3ns ± 0% 16.5ns ± 1% +1.25% (p=0.000 n=19+16) ExpGo-12 36.2ns ± 1% 36.1ns ± 1% ~ (p=0.242 n=20+18) Expm1-12 18.6ns ± 0% 18.7ns ± 0% +0.25% (p=0.000 n=16+19) Exp2-12 34.7ns ± 0% 34.6ns ± 1% ~ (p=0.010 n=19+18) Exp2Go-12 34.8ns ± 1% 34.8ns ± 1% ~ (p=0.372 n=19+19) Abs-12 0.56ns ± 0% 0.56ns ± 0% ~ (p=0.766 n=18+16) Dim-12 0.84ns ± 1% 0.84ns ± 1% ~ (p=0.167 n=17+19) Floor-12 0.84ns ± 0% 0.84ns ± 0% ~ (p=0.993 n=18+16) Max-12 3.35ns ± 0% 3.35ns ± 0% ~ (p=0.894 n=17+19) Min-12 3.35ns ± 0% 3.36ns ± 1% ~ (p=0.214 n=18+18) Mod-12 35.2ns ± 0% 34.7ns ± 0% -1.45% (p=0.000 n=18+17) Frexp-12 5.31ns ± 0% 4.75ns ± 0% -10.51% (p=0.000 n=19+18) Gamma-12 14.8ns ± 0% 16.2ns ± 1% +9.21% (p=0.000 n=20+19) 
Hypot-12 6.16ns ± 0% 6.17ns ± 0% +0.26% (p=0.000 n=19+20) HypotGo-12 7.79ns ± 1% 7.78ns ± 0% ~ (p=0.497 n=18+17) Ilogb-12 4.47ns ± 0% 4.47ns ± 0% ~ (p=0.167 n=19+19) J0-12 76.0ns ± 0% 76.3ns ± 0% +0.35% (p=0.000 n=19+18) J1-12 76.8ns ± 1% 75.9ns ± 0% -1.14% (p=0.000 n=18+18) Jn-12 167ns ± 1% 168ns ± 1% ~ (p=0.038 n=18+18) Ldexp-12 6.98ns ± 0% 6.43ns ± 0% -7.97% (p=0.000 n=17+18) Lgamma-12 15.9ns ± 0% 16.0ns ± 1% ~ (p=0.011 n=20+17) Log-12 13.3ns ± 0% 13.4ns ± 1% +0.37% (p=0.000 n=15+18) Logb-12 4.75ns ± 0% 4.75ns ± 0% ~ (p=0.831 n=16+18) Log1p-12 19.5ns ± 0% 19.5ns ± 1% ~ (p=0.851 n=18+17) Log10-12 15.9ns ± 0% 14.0ns ± 0% -11.92% (p=0.000 n=17+16) Log2-12 7.88ns ± 1% 8.01ns ± 0% +1.72% (p=0.000 n=20+20) Modf-12 4.75ns ± 0% 4.34ns ± 0% -8.66% (p=0.000 n=19+17) Nextafter32-12 5.31ns ± 0% 5.31ns ± 0% ~ (p=0.389 n=17+18) Nextafter64-12 5.03ns ± 1% 5.03ns ± 0% ~ (p=0.774 n=17+18) PowInt-12 29.9ns ± 0% 28.5ns ± 0% -4.69% (p=0.000 n=18+19) PowFrac-12 91.0ns ± 0% 91.1ns ± 0% ~ (p=0.029 n=19+19) Pow10Pos-12 1.12ns ± 0% 1.12ns ± 0% ~ (p=0.363 n=20+20) Pow10Neg-12 3.90ns ± 0% 3.90ns ± 0% ~ (p=0.921 n=17+18) Round-12 2.31ns ± 0% 2.31ns ± 1% ~ (p=0.390 n=18+18) RoundToEven-12 0.84ns ± 0% 0.84ns ± 0% ~ (p=0.280 n=18+19) Remainder-12 31.6ns ± 0% 29.6ns ± 0% -6.16% (p=0.000 n=18+17) Signbit-12 0.56ns ± 0% 0.56ns ± 0% ~ (p=0.385 n=19+18) Sin-12 12.5ns ± 0% 12.5ns ± 0% ~ (p=0.080 n=18+18) Sincos-12 16.4ns ± 2% 16.4ns ± 2% ~ (p=0.253 n=20+19) Sinh-12 26.1ns ± 0% 26.1ns ± 0% +0.18% (p=0.000 n=17+19) SqrtIndirect-12 3.91ns ± 0% 3.90ns ± 0% ~ (p=0.133 n=19+19) SqrtLatency-12 2.79ns ± 0% 2.79ns ± 0% ~ (p=0.226 n=16+19) SqrtIndirectLatency-12 6.68ns ± 0% 6.37ns ± 2% -4.66% (p=0.000 n=17+20) SqrtGoLatency-12 49.4ns ± 0% 49.4ns ± 0% ~ (p=0.289 n=18+16) SqrtPrime-12 3.18µs ± 0% 3.18µs ± 0% ~ (p=0.084 n=17+18) Tan-12 13.8ns ± 0% 13.9ns ± 2% ~ (p=0.292 n=19+20) Tanh-12 25.4ns ± 0% 25.4ns ± 0% ~ (p=0.101 n=17+17) Trunc-12 0.84ns ± 0% 0.84ns ± 0% ~ (p=0.765 n=18+16) Y0-12 75.8ns ± 0% 75.9ns ± 1% 
~ (p=0.805 n=16+18) Y1-12 76.3ns ± 0% 75.3ns ± 1% -1.34% (p=0.000 n=19+17) Yn-12 164ns ± 0% 164ns ± 2% ~ (p=0.356 n=18+20) Float64bits-12 0.56ns ± 0% 0.56ns ± 0% ~ (p=0.383 n=18+18) Float64frombits-12 0.56ns ± 0% 0.56ns ± 0% ~ (p=0.066 n=18+19) Float32bits-12 0.56ns ± 0% 0.56ns ± 0% ~ (p=0.889 n=16+19) Float32frombits-12 0.56ns ± 0% 0.56ns ± 0% ~ (p=0.007 n=18+19) FMA-12 23.9ns ± 0% 24.0ns ± 0% +0.31% (p=0.000 n=16+17) [Geo mean] 9.86ns 9.77ns -0.87% (https://perf.golang.org/search?q=upload:20210415.5) For #40724. Change-Id: I44fbba2a17be930ec9daeb0a8222f55cd50555a0 Reviewed-on: https://go-review.googlesource.com/c/go/+/310331 Trust: Austin Clements <austin@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>	2021-04-15 15:48:19 +00:00
Iskander Sharipov	0fbaf6ca8b	math,net: omit explicit true tag expr in switch Performed `switch true {}` => `switch {}` replacement. Found using https://go-critic.github.io/overview.html#switchTrue-ref Change-Id: Ib39ea98531651966a5a56b7bd729b46e4eeb7f7c Reviewed-on: https://go-review.googlesource.com/123378 Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2018-08-20 22:15:59 +00:00
erifan01	ed6c6c9c11	math: optimize sinh and cosh Improve performance by reducing unnecessary function calls. Benchmarks: name old time/op new time/op delta Cosh-8 229ns ± 0% 138ns ± 0% -39.74% (p=0.008 n=5+5) Sinh-8 231ns ± 0% 139ns ± 0% -39.83% (p=0.008 n=5+5) Change-Id: Icab5485849bbfaafca8429d06b67c558101f4f3c Reviewed-on: https://go-review.googlesource.com/85477 Reviewed-by: Robert Griesemer <gri@golang.org>	2018-02-27 04:34:37 +00:00
Thanabodee Charoenpiriyakij	1124fa300b	math: use Abs rather than if x < 0 { x = -x } This is the benchmark result based on darwin with amd64 architecture: name old time/op new time/op delta Cos 10.2ns ± 2% 10.3ns ± 3% +1.18% (p=0.032 n=10+10) Cosh 25.3ns ± 3% 24.6ns ± 2% -3.00% (p=0.000 n=10+10) Hypot 6.40ns ± 2% 6.19ns ± 3% -3.36% (p=0.000 n=10+10) HypotGo 7.16ns ± 3% 6.54ns ± 2% -8.66% (p=0.000 n=10+10) J0 66.0ns ± 2% 63.7ns ± 1% -3.42% (p=0.000 n=9+10) Fixes #21812 Change-Id: I2b88fbdfc250cd548f8f08b44ce2eb172dcacf43 Reviewed-on: https://go-review.googlesource.com/84437 Reviewed-by: Giovanni Bajo <rasky@develer.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Giovanni Bajo <rasky@develer.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2018-02-13 20:12:23 +00:00
Bill O'Farrell	b6a15683f0	math: use SIMD to accelerate some scalar math functions on s390x Note, most math functions are structured to use stubs, so that they can be accelerated with assembly on any platform. Sinh, cosh, and tanh were not structured with stubs, so this CL does that. This set of routines was chosen as likely to produce good speedups with assembly on any platform. Technique used was minimax polynomial approximation using tables of polynomial coefficients, with argument range reduction. A table of scaling factors was also used for cosh and log10. before after speedup BenchmarkCos 22.1 ns/op 6.79 ns/op 3.25x BenchmarkCosh 125 ns/op 11.7 ns/op 10.68x BenchmarkLog10 48.4 ns/op 12.5 ns/op 3.87x BenchmarkSin 22.2 ns/op 6.55 ns/op 3.39x BenchmarkSinh 125 ns/op 14.2 ns/op 8.80x BenchmarkTanh 65.0 ns/op 15.1 ns/op 4.30x Accuracy was tested against a high precision reference function to determine maximum error. Approximately 4,000,000 points were tested for each function, producing the following result. Note: ulperr is error in "units in the last place" max ulperr sin 1.43 (returns NaN beyond +-2^50) cos 1.79 (returns NaN beyond +-2^50) cosh 1.05 sinh 3.02 tanh 3.69 log10 1.75 Also includes a set of tests to test non-vector functions even when SIMD is enabled. Change-Id: Icb45f14d00864ee19ed973d209c3af21e4df4edc Reviewed-on: https://go-review.googlesource.com/32352 Run-TryBot: Michael Munday <munday@ca.ibm.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Michael Munday <munday@ca.ibm.com>	2016-11-11 20:20:23 +00:00
Russ Cox	c007ce824d	build: move package sources from src/pkg to src Preparation was in CL 134570043. This CL contains only the effect of 'hg mv src/pkg/* src'. For more about the move, see golang.org/s/go14nopkg.	2014-09-08 00:08:51 -04:00

6 Commits