go/src/math
SparrowLii d54a9a9c42 math/big: replace division with multiplication by reciprocal word
Division is much slower than multiplication. Replacing each division
with a multiplication by a precomputed reciprocal of the divisor makes
the divWVW algorithm up to about three times faster, and also speeds
up nat division.

The benchmark test on arm64 is as follows:
name                     old time/op    new time/op    delta
DivWVW/1-4                 13.1ns ± 4%    13.3ns ± 4%      ~     (p=0.444 n=5+5)
DivWVW/2-4                 48.6ns ± 1%    51.2ns ± 2%    +5.39%  (p=0.008 n=5+5)
DivWVW/3-4                 82.0ns ± 1%    69.7ns ± 1%   -15.03%  (p=0.008 n=5+5)
DivWVW/4-4                  116ns ± 1%      71ns ± 2%   -38.88%  (p=0.008 n=5+5)
DivWVW/5-4                  152ns ± 1%      84ns ± 4%   -44.70%  (p=0.008 n=5+5)
DivWVW/10-4                 319ns ± 1%     155ns ± 4%   -51.50%  (p=0.008 n=5+5)
DivWVW/100-4               3.44µs ± 3%    1.30µs ± 8%   -62.30%  (p=0.008 n=5+5)
DivWVW/1000-4              33.8µs ± 0%    10.9µs ± 1%   -67.74%  (p=0.008 n=5+5)
DivWVW/10000-4              343µs ± 4%     111µs ± 5%   -67.63%  (p=0.008 n=5+5)
DivWVW/100000-4            3.35ms ± 1%    1.25ms ± 3%   -62.79%  (p=0.008 n=5+5)
QuoRem-4                   3.08µs ± 2%    2.21µs ± 4%   -28.40%  (p=0.008 n=5+5)
ModSqrt225_Tonelli-4        444µs ± 2%     457µs ± 3%      ~     (p=0.095 n=5+5)
ModSqrt225_3Mod4-4          136µs ± 1%     138µs ± 3%      ~     (p=0.151 n=5+5)
ModSqrt231_Tonelli-4        473µs ± 3%     483µs ± 4%      ~     (p=0.548 n=5+5)
ModSqrt231_5Mod8-4          164µs ± 9%     169µs ±12%      ~     (p=0.421 n=5+5)
Sqrt-4                     36.8µs ± 1%    28.6µs ± 0%   -22.17%  (p=0.016 n=5+4)
Div/20/10-4                50.0ns ± 3%    51.3ns ± 6%      ~     (p=0.238 n=5+5)
Div/40/20-4                49.8ns ± 2%    51.3ns ± 6%      ~     (p=0.222 n=5+5)
Div/100/50-4               85.8ns ± 4%    86.5ns ± 5%      ~     (p=0.246 n=5+5)
Div/200/100-4               335ns ± 3%     296ns ± 2%   -11.60%  (p=0.008 n=5+5)
Div/400/200-4               442ns ± 2%     359ns ± 5%   -18.81%  (p=0.008 n=5+5)
Div/1000/500-4              858ns ± 3%     643ns ± 6%   -25.06%  (p=0.008 n=5+5)
Div/2000/1000-4            1.70µs ± 3%    1.28µs ± 4%   -24.80%  (p=0.008 n=5+5)
Div/20000/10000-4          45.0µs ± 5%    41.8µs ± 4%    -7.17%  (p=0.016 n=5+5)
Div/200000/100000-4        1.51ms ± 7%    1.43ms ± 3%    -5.42%  (p=0.016 n=5+5)
Div/2000000/1000000-4      57.6ms ± 4%    57.5ms ± 3%      ~     (p=1.000 n=5+5)
Div/20000000/10000000-4     2.08s ± 3%     2.04s ± 1%      ~     (p=0.095 n=5+5)

name                     old speed      new speed      delta
DivWVW/1-4               4.87GB/s ± 4%  4.80GB/s ± 4%      ~     (p=0.310 n=5+5)
DivWVW/2-4               2.63GB/s ± 1%  2.50GB/s ± 2%    -5.07%  (p=0.008 n=5+5)
DivWVW/3-4               2.34GB/s ± 1%  2.76GB/s ± 1%   +17.70%  (p=0.008 n=5+5)
DivWVW/4-4               2.21GB/s ± 1%  3.61GB/s ± 2%   +63.42%  (p=0.008 n=5+5)
DivWVW/5-4               2.10GB/s ± 2%  3.81GB/s ± 4%   +80.89%  (p=0.008 n=5+5)
DivWVW/10-4              2.01GB/s ± 0%  4.13GB/s ± 4%  +105.91%  (p=0.008 n=5+5)
DivWVW/100-4             1.86GB/s ± 2%  4.95GB/s ± 7%  +165.63%  (p=0.008 n=5+5)
DivWVW/1000-4            1.89GB/s ± 0%  5.86GB/s ± 1%  +209.96%  (p=0.008 n=5+5)
DivWVW/10000-4           1.87GB/s ± 4%  5.76GB/s ± 5%  +208.96%  (p=0.008 n=5+5)
DivWVW/100000-4          1.91GB/s ± 1%  5.14GB/s ± 3%  +168.85%  (p=0.008 n=5+5)

Change-Id: I049f1196562b20800e6ef8a6493fd147f93ad830
Reviewed-on: https://go-review.googlesource.com/c/go/+/250417
Trust: Giovanni Bajo <rasky@develer.com>
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-09-23 21:55:55 +00:00
..
big math/big: replace division with multiplication by reciprocal word 2020-09-23 21:55:55 +00:00
bits cmd/compile: clean up codegen for branch-on-carry on s390x 2020-04-22 20:11:06 +00:00
cmplx math/cmplx: handle special cases 2020-05-01 03:16:37 +00:00
rand math/rand: update comment to avoid use of ^ for exponentiation 2019-12-04 21:14:24 +00:00
abs.go
acos_s390x.s math: use s390x mnemonics rather than binary encodings 2018-08-20 17:42:08 +00:00
acosh.go math: Remove redundant local variable Ln2 2020-09-19 09:09:52 +00:00
acosh_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
all_test.go math: correct Atan2(±y,+∞) = ±0 on s390x 2020-03-25 04:06:34 +00:00
arith_s390x.go math: simplify hasVX checking on s390x 2020-04-27 20:06:57 +00:00
arith_s390x_test.go
asin.go
asin_386.s
asin_s390x.s math: use s390x mnemonics rather than binary encodings 2018-08-20 17:42:08 +00:00
asinh.go
asinh_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
atan.go
atan2.go
atan2_386.s
atan2_s390x.s math: correct Atan2(±y,+∞) = ±0 on s390x 2020-03-25 04:06:34 +00:00
atan_386.s
atan_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
atanh.go
atanh_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
bits.go
cbrt.go
cbrt_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
const.go
copysign.go
cosh_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
dim.go
dim_amd64.s
dim_arm64.s
dim_riscv64.s math: implement Min/Max in riscv64 assembly 2020-05-04 17:29:13 +00:00
dim_s390x.s
erf.go
erf_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
erfc_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
erfinv.go all: update comment URLs from HTTP to HTTPS, where possible 2018-06-01 21:52:00 +00:00
example_test.go math: add function examples. 2020-05-02 20:22:19 +00:00
exp.go
exp2_386.s
exp_amd64.s math: fix dead link to springerlink (now link.springer) 2020-05-29 14:33:50 +00:00
exp_arm64.s math: optimize Exp and Exp2 on arm64 2018-03-27 19:55:02 +00:00
exp_asm.go all: remove nacl (part 3, more amd64p32) 2019-10-10 22:38:38 +00:00
exp_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
expm1.go
expm1_386.s all: this big patch remove whitespace from assembly files 2018-10-03 15:28:51 +00:00
expm1_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
export_s390x_test.go
export_test.go math: use constant rather than variable for exported test threshold 2018-12-13 06:33:18 +00:00
floor.go
floor_386.s
floor_amd64.s
floor_arm64.s
floor_ppc64x.s
floor_s390x.s
floor_wasm.s math, math/big: add wasm architecture 2018-05-08 13:29:22 +00:00
fma.go math, cmd/compile: rename Fma to FMA 2019-11-07 14:51:06 +00:00
frexp.go
frexp_386.s
gamma.go
huge_test.go math/cmplx: implement Payne-Hanek range reduction 2020-03-14 04:12:41 +00:00
hypot.go
hypot_386.s
hypot_amd64.s
j0.go all: s/cancelation/cancellation/ 2019-04-16 20:27:15 +00:00
j1.go all: s/cancelation/cancellation/ 2019-04-16 20:27:15 +00:00
jn.go math: use Sincos instead of Sin and Cos in Jn and Yn 2019-03-25 22:41:37 +00:00
ldexp.go math: fix Ldexp when result is below ldexp(2, -1075) 2018-03-29 23:14:13 +00:00
ldexp_386.s
lgamma.go go/printer, gofmt: tuned table alignment for better results 2018-04-04 13:39:34 -07:00
log.go
log1p.go math: simplify the code 2020-08-15 02:20:42 +00:00
log1p_386.s
log1p_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
log10.go
log10_386.s
log10_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
log_386.s
log_amd64.s
log_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
logb.go
mod.go math: use Abs in Mod rather than if x < 0 { x = -x} 2018-10-04 17:32:44 +00:00
mod_386.s
modf.go
modf_386.s
modf_arm64.s
modf_ppc64x.s
nextafter.go
pow.go math: use Abs in Pow rather than if x < 0 { x = -x } 2018-10-04 17:33:04 +00:00
pow10.go
pow_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
remainder.go math: fix math.Remainder(-x,x) (for Inf > x > 0) 2019-03-15 14:52:51 +00:00
remainder_386.s
signbit.go all: use "reports whether" consistently in the few places that didn't 2018-11-02 22:47:58 +00:00
sin.go src, misc: apply gofmt 2019-02-19 20:38:28 +00:00
sin_s390x.s cmd/asm, math: add s390x floating point test instructions 2018-04-03 16:08:04 +00:00
sincos.go src, misc: apply gofmt 2019-02-19 20:38:28 +00:00
sinh.go math,net: omit explicit true tag expr in switch 2018-08-20 22:15:59 +00:00
sinh_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
sqrt.go
sqrt_386.s all: this big patch remove whitespace from assembly files 2018-10-03 15:28:51 +00:00
sqrt_amd64.s
sqrt_arm.s all: this big patch remove whitespace from assembly files 2018-10-03 15:28:51 +00:00
sqrt_arm64.s
sqrt_mipsx.s
sqrt_ppc64x.s
sqrt_riscv64.s math: implement Sqrt in assembly for riscv64 2020-02-25 16:43:26 +00:00
sqrt_s390x.s
sqrt_wasm.s math, math/big: add wasm architecture 2018-05-08 13:29:22 +00:00
stubs_386.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
stubs_amd64.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
stubs_arm.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
stubs_arm64.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
stubs_mips64x.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
stubs_mipsx.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
stubs_ppc64x.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
stubs_riscv64.s math: implement Min/Max in riscv64 assembly 2020-05-04 17:29:13 +00:00
stubs_s390x.s math: simplify hasVX checking on s390x 2020-04-27 20:06:57 +00:00
stubs_wasm.s math: consolidate assembly stub implementations 2019-04-23 14:50:16 +00:00
tan.go src, misc: apply gofmt 2019-02-19 20:38:28 +00:00
tan_s390x.s math: use s390x mnemonics rather than binary encodings 2018-08-20 17:42:08 +00:00
tanh.go src, misc: apply gofmt 2019-02-19 20:38:28 +00:00
tanh_s390x.s math: use new mnemonics for 'rotate then insert' on s390x 2019-04-16 15:34:41 +00:00
trig_reduce.go math/cmplx: implement Payne-Hanek range reduction 2020-03-14 04:12:41 +00:00
unsafe.go math: document sign bit correspondence for floating-point/bits conversions 2018-12-06 22:27:54 +00:00