6 Commits

Author SHA1 Message Date
Austin Clements 1d20a362d0 math: avoid assembly stubs
Currently almost all math functions have the following pattern:

func Sin(x float64) float64

func sin(x float64) float64 {
    // ... pure Go implementation ...
}

Architectures that implement a function in assembly provide the
assembly implementation directly as the exported function (e.g., Sin),
and architectures that don't implement it in assembly use a small stub
to jump back to the Go code, like:

TEXT ·Sin(SB), NOSPLIT, $0
	JMP ·sin(SB)

However, most functions are not implemented in assembly on most
architectures, so this jump through assembly is a waste. It defeats
compiler optimizations like inlining. And, with regabi, it actually
adds a small but non-trivial overhead because the jump from assembly
back to Go must go through an ABI0->ABIInternal bridge function.

Hence, this CL reorganizes this structure across the entire package.
It now leans on inlining to achieve peak performance, but allows the
compiler to see all the way through the pure Go implementation.

Now, functions follow this pattern:

func Sin(x float64) float64 {
	if haveArchSin {
		return archSin(x)
	}
	return sin(x)
}

func sin(x float64) float64 {
    // ... pure Go implementation ...
}

Architectures that have assembly implementations use build-tagged
files to set haveArchX to true and provide an archX implementation.
That implementation can also still call back into the Go
implementation (some of them do this).

Prior to this change, enabling ABI wrappers results in a geomean
slowdown of the math benchmarks of 8.77% (full results:
https://perf.golang.org/search?q=upload:20210415.6) and of the Tile38
benchmarks by ~4%. After this change, enabling ABI wrappers is
completely performance-neutral on Tile38 and all but one math
benchmark (full results:
https://perf.golang.org/search?q=upload:20210415.7). ABI wrappers slow
down SqrtIndirectLatency-12 by 2.09%, which makes sense because that
call must still go through an ABI wrapper.

With ABI wrappers disabled (which won't be an option on amd64 much
longer), on linux/amd64, this change is largely performance-neutral
and slightly improves the performance of a few benchmarks:

(Because there are so many benchmarks, I've applied the Šidák
correction to the alpha threshold. It makes relatively little
difference in which benchmarks are statistically significant.)

name                    old time/op  new time/op  delta
Acos-12                 22.3ns ± 0%  18.8ns ± 1%  -15.44%  (p=0.000 n=18+16)
Acosh-12                28.2ns ± 0%  28.2ns ± 0%     ~     (p=0.404 n=18+20)
Asin-12                 18.1ns ± 0%  18.2ns ± 0%   +0.20%  (p=0.000 n=18+16)
Asinh-12                32.8ns ± 0%  32.9ns ± 1%     ~     (p=0.891 n=18+20)
Atan-12                 9.92ns ± 0%  9.90ns ± 1%   -0.24%  (p=0.000 n=17+16)
Atanh-12                27.7ns ± 0%  27.5ns ± 0%   -0.72%  (p=0.000 n=16+20)
Atan2-12                18.5ns ± 0%  18.4ns ± 0%   -0.59%  (p=0.000 n=19+19)
Cbrt-12                 22.1ns ± 0%  22.1ns ± 0%     ~     (p=0.804 n=16+17)
Ceil-12                 0.84ns ± 0%  0.84ns ± 0%     ~     (p=0.663 n=18+16)
Copysign-12             0.84ns ± 0%  0.84ns ± 0%     ~     (p=0.762 n=16+19)
Cos-12                  12.7ns ± 0%  12.7ns ± 1%     ~     (p=0.145 n=19+18)
Cosh-12                 22.2ns ± 0%  22.5ns ± 0%   +1.60%  (p=0.000 n=17+19)
Erf-12                  11.1ns ± 1%  11.1ns ± 1%     ~     (p=0.010 n=19+19)
Erfc-12                 12.6ns ± 1%  12.7ns ± 0%     ~     (p=0.066 n=19+15)
Erfinv-12               16.1ns ± 0%  16.1ns ± 0%     ~     (p=0.462 n=17+20)
Erfcinv-12              16.0ns ± 1%  16.0ns ± 1%     ~     (p=0.015 n=17+16)
Exp-12                  16.3ns ± 0%  16.5ns ± 1%   +1.25%  (p=0.000 n=19+16)
ExpGo-12                36.2ns ± 1%  36.1ns ± 1%     ~     (p=0.242 n=20+18)
Expm1-12                18.6ns ± 0%  18.7ns ± 0%   +0.25%  (p=0.000 n=16+19)
Exp2-12                 34.7ns ± 0%  34.6ns ± 1%     ~     (p=0.010 n=19+18)
Exp2Go-12               34.8ns ± 1%  34.8ns ± 1%     ~     (p=0.372 n=19+19)
Abs-12                  0.56ns ± 0%  0.56ns ± 0%     ~     (p=0.766 n=18+16)
Dim-12                  0.84ns ± 1%  0.84ns ± 1%     ~     (p=0.167 n=17+19)
Floor-12                0.84ns ± 0%  0.84ns ± 0%     ~     (p=0.993 n=18+16)
Max-12                  3.35ns ± 0%  3.35ns ± 0%     ~     (p=0.894 n=17+19)
Min-12                  3.35ns ± 0%  3.36ns ± 1%     ~     (p=0.214 n=18+18)
Mod-12                  35.2ns ± 0%  34.7ns ± 0%   -1.45%  (p=0.000 n=18+17)
Frexp-12                5.31ns ± 0%  4.75ns ± 0%  -10.51%  (p=0.000 n=19+18)
Gamma-12                14.8ns ± 0%  16.2ns ± 1%   +9.21%  (p=0.000 n=20+19)
Hypot-12                6.16ns ± 0%  6.17ns ± 0%   +0.26%  (p=0.000 n=19+20)
HypotGo-12              7.79ns ± 1%  7.78ns ± 0%     ~     (p=0.497 n=18+17)
Ilogb-12                4.47ns ± 0%  4.47ns ± 0%     ~     (p=0.167 n=19+19)
J0-12                   76.0ns ± 0%  76.3ns ± 0%   +0.35%  (p=0.000 n=19+18)
J1-12                   76.8ns ± 1%  75.9ns ± 0%   -1.14%  (p=0.000 n=18+18)
Jn-12                    167ns ± 1%   168ns ± 1%     ~     (p=0.038 n=18+18)
Ldexp-12                6.98ns ± 0%  6.43ns ± 0%   -7.97%  (p=0.000 n=17+18)
Lgamma-12               15.9ns ± 0%  16.0ns ± 1%     ~     (p=0.011 n=20+17)
Log-12                  13.3ns ± 0%  13.4ns ± 1%   +0.37%  (p=0.000 n=15+18)
Logb-12                 4.75ns ± 0%  4.75ns ± 0%     ~     (p=0.831 n=16+18)
Log1p-12                19.5ns ± 0%  19.5ns ± 1%     ~     (p=0.851 n=18+17)
Log10-12                15.9ns ± 0%  14.0ns ± 0%  -11.92%  (p=0.000 n=17+16)
Log2-12                 7.88ns ± 1%  8.01ns ± 0%   +1.72%  (p=0.000 n=20+20)
Modf-12                 4.75ns ± 0%  4.34ns ± 0%   -8.66%  (p=0.000 n=19+17)
Nextafter32-12          5.31ns ± 0%  5.31ns ± 0%     ~     (p=0.389 n=17+18)
Nextafter64-12          5.03ns ± 1%  5.03ns ± 0%     ~     (p=0.774 n=17+18)
PowInt-12               29.9ns ± 0%  28.5ns ± 0%   -4.69%  (p=0.000 n=18+19)
PowFrac-12              91.0ns ± 0%  91.1ns ± 0%     ~     (p=0.029 n=19+19)
Pow10Pos-12             1.12ns ± 0%  1.12ns ± 0%     ~     (p=0.363 n=20+20)
Pow10Neg-12             3.90ns ± 0%  3.90ns ± 0%     ~     (p=0.921 n=17+18)
Round-12                2.31ns ± 0%  2.31ns ± 1%     ~     (p=0.390 n=18+18)
RoundToEven-12          0.84ns ± 0%  0.84ns ± 0%     ~     (p=0.280 n=18+19)
Remainder-12            31.6ns ± 0%  29.6ns ± 0%   -6.16%  (p=0.000 n=18+17)
Signbit-12              0.56ns ± 0%  0.56ns ± 0%     ~     (p=0.385 n=19+18)
Sin-12                  12.5ns ± 0%  12.5ns ± 0%     ~     (p=0.080 n=18+18)
Sincos-12               16.4ns ± 2%  16.4ns ± 2%     ~     (p=0.253 n=20+19)
Sinh-12                 26.1ns ± 0%  26.1ns ± 0%   +0.18%  (p=0.000 n=17+19)
SqrtIndirect-12         3.91ns ± 0%  3.90ns ± 0%     ~     (p=0.133 n=19+19)
SqrtLatency-12          2.79ns ± 0%  2.79ns ± 0%     ~     (p=0.226 n=16+19)
SqrtIndirectLatency-12  6.68ns ± 0%  6.37ns ± 2%   -4.66%  (p=0.000 n=17+20)
SqrtGoLatency-12        49.4ns ± 0%  49.4ns ± 0%     ~     (p=0.289 n=18+16)
SqrtPrime-12            3.18µs ± 0%  3.18µs ± 0%     ~     (p=0.084 n=17+18)
Tan-12                  13.8ns ± 0%  13.9ns ± 2%     ~     (p=0.292 n=19+20)
Tanh-12                 25.4ns ± 0%  25.4ns ± 0%     ~     (p=0.101 n=17+17)
Trunc-12                0.84ns ± 0%  0.84ns ± 0%     ~     (p=0.765 n=18+16)
Y0-12                   75.8ns ± 0%  75.9ns ± 1%     ~     (p=0.805 n=16+18)
Y1-12                   76.3ns ± 0%  75.3ns ± 1%   -1.34%  (p=0.000 n=19+17)
Yn-12                    164ns ± 0%   164ns ± 2%     ~     (p=0.356 n=18+20)
Float64bits-12          0.56ns ± 0%  0.56ns ± 0%     ~     (p=0.383 n=18+18)
Float64frombits-12      0.56ns ± 0%  0.56ns ± 0%     ~     (p=0.066 n=18+19)
Float32bits-12          0.56ns ± 0%  0.56ns ± 0%     ~     (p=0.889 n=16+19)
Float32frombits-12      0.56ns ± 0%  0.56ns ± 0%     ~     (p=0.007 n=18+19)
FMA-12                  23.9ns ± 0%  24.0ns ± 0%   +0.31%  (p=0.000 n=16+17)
[Geo mean]              9.86ns       9.77ns        -0.87%

(https://perf.golang.org/search?q=upload:20210415.5)

For #40724.

Change-Id: I44fbba2a17be930ec9daeb0a8222f55cd50555a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/310331
Trust: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-04-15 15:48:19 +00:00
Plekhanov Maxim 47e71f3b69 math: use Abs in Pow rather than if x < 0 { x = -x }
name     old time/op  new time/op  delta
PowInt   55.7ns ± 1%  53.4ns ± 2%  -4.15%  (p=0.000 n=9+9)
PowFrac   133ns ± 1%   133ns ± 2%    ~     (p=0.587 n=8+9)

Change-Id: Ica0f4c2cbd554f2195c6d1762ed26742ff8e3924
Reviewed-on: https://go-review.googlesource.com/c/85375
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-04 17:33:04 +00:00
Brian Kessler 5305bdd86b math: correct result for Pow(x, ±.5)
Fixes #23224

The previous Pow code had an optimization for
powers equal to ±0.5 that used Sqrt for
increased accuracy/speed.  This caused special
cases involving powers of ±0.5 to disagree with
the Pow spec.  This change places the Sqrt optimization
after all of the special case handling.

Change-Id: I6bf757f6248256b29cc21725a84e27705d855369
Reviewed-on: https://go-review.googlesource.com/85660
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-01-02 18:10:43 +00:00
Brian Kessler 1246566142 math: eliminate overflow in Pow(x,y) for large y
The current implementation uses a shift and add
loop to compute the product of x's exponent xe and
the integer part of y (yi) for yi up to 1<<63.
Since xe is an 11-bit exponent, this product can be
up to 74-bits and overflow both 32 and 64-bit int.

This change checks whether the accumulated exponent
will fit in the 11-bit float exponent of the output
and breaks out of the loop early if overflow is detected.

The current handling of yi >= 1<<63 uses Exp(y * Log(x)),
which incorrectly returns NaN for x < 0.  In addition,
for y this large, the result can be shown to always
overflow or underflow except when x == -1, since the
boundary cases, computed exactly:

Pow(NextAfter(1.0, Inf(1)), 1<<63)  == 2.72332... * 10^889
Pow(NextAfter(1.0, Inf(-1)), 1<<63) == 1.91624... * 10^-445

exceed the range of float64. So, the call can be
replaced with a simple case statement analogous to
y == Inf that correctly handles x < 0 as well.

Fixes #7394

Change-Id: I6f50dc951f3693697f9669697599860604323102
Reviewed-on: https://go-review.googlesource.com/48290
Reviewed-by: Robert Griesemer <gri@golang.org>
2017-08-16 09:10:10 +00:00
Bill O'Farrell 88672de7af math: use SIMD to accelerate additional scalar math functions on s390x
As necessary, math functions were structured to use stubs, so that they can
be accelerated with assembly on any platform.

Technique used was minimax polynomial approximation using tables of
polynomial coefficients, with argument range reduction.

Benchmark         New     Old     Speedup
BenchmarkAcos     12.2    47.5    3.89
BenchmarkAcosh    18.5    56.2    3.04
BenchmarkAsin     13.1    40.6    3.10
BenchmarkAsinh    19.4    62.8    3.24
BenchmarkAtan     10.1    23      2.28
BenchmarkAtanh    19.1    53.2    2.79
BenchmarkAtan2    16.5    33.9    2.05
BenchmarkCbrt     14.8    58      3.92
BenchmarkErf      10.8    20.1    1.86
BenchmarkErfc     11.2    23.5    2.10
BenchmarkExp      8.77    53.8    6.13
BenchmarkExpm1    10.1    38.3    3.79
BenchmarkLog      13.1    40.1    3.06
BenchmarkLog1p    12.7    38.3    3.02
BenchmarkPowInt   31.7    40.5    1.28
BenchmarkPowFrac  33.1    141     4.26
BenchmarkTan      11.5    30      2.61

Accuracy was tested against a high precision
reference function to determine maximum error.
Note: ulperr is error in "units in the last place"

       max
      ulperr
Acos  1.15
Acosh 1.07
Asin  2.22
Asinh 1.72
Atan  1.41
Atanh 3.00
Atan2 1.45
Cbrt  1.18
Erf   1.29
Erfc  4.82
Exp   1.00
Expm1 2.26
Log   0.94
Log1p 2.39
Tan   3.14

Pow will have 99.99% correctly rounded results for reasonable inputs
that produce numeric (non-Inf, non-NaN) results.

Change-Id: I850e8cf7b70426e8b54ec49d74acd4cddc8c6cb2
Reviewed-on: https://go-review.googlesource.com/38585
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-05-08 19:52:30 +00:00
Russ Cox c007ce824d build: move package sources from src/pkg to src
Preparation was in CL 134570043.
This CL contains only the effect of 'hg mv src/pkg/* src'.
For more about the move, see golang.org/s/go14nopkg.
2014-09-08 00:08:51 -04:00