go/src/math
Carlos Eduardo Seo 9459c03b29 math/big: improve performance for addVV/subVV for ppc64x
This change adds a faster asm implementation of addVV for ppc64x, with speedups
of up to nearly 3x (2.91x at 100000 words) in the best cases.

benchmark                   old ns/op     new ns/op     delta
BenchmarkAddVV/1-8          7.33          5.81          -20.74%
BenchmarkAddVV/2-8          8.72          6.49          -25.57%
BenchmarkAddVV/3-8          10.5          7.08          -32.57%
BenchmarkAddVV/4-8          12.7          7.57          -40.39%
BenchmarkAddVV/5-8          14.3          8.06          -43.64%
BenchmarkAddVV/10-8         27.6          11.1          -59.78%
BenchmarkAddVV/100-8        218           82.4          -62.20%
BenchmarkAddVV/1000-8       2064          718           -65.21%
BenchmarkAddVV/10000-8      20536         7153          -65.17%
BenchmarkAddVV/100000-8     211004        72403         -65.69%

benchmark                   old MB/s     new MB/s     speedup
BenchmarkAddVV/1-8          8729.74      11006.26     1.26x
BenchmarkAddVV/2-8          14683.65     19707.55     1.34x
BenchmarkAddVV/3-8          18226.96     27103.63     1.49x
BenchmarkAddVV/4-8          20204.50     33805.81     1.67x
BenchmarkAddVV/5-8          22348.64     39694.06     1.78x
BenchmarkAddVV/10-8         23212.74     57631.08     2.48x
BenchmarkAddVV/100-8        29300.07     77629.53     2.65x
BenchmarkAddVV/1000-8       31000.56     89094.54     2.87x
BenchmarkAddVV/10000-8      31163.61     89469.16     2.87x
BenchmarkAddVV/100000-8     30331.16     88393.73     2.91x
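
For readers checking the tables, the delta and speedup columns can be
reproduced from the raw numbers. The sketch below is only illustrative: the
helper names deltaPercent and speedup are hypothetical and are not part of
the Go benchmark tooling.

```go
package main

import "fmt"

// deltaPercent returns the relative change in ns/op as a percentage.
// A negative value means the new code is faster.
// (Hypothetical helper, for illustration only.)
func deltaPercent(oldNs, newNs float64) float64 {
	return (newNs - oldNs) / oldNs * 100
}

// speedup returns the ratio of new throughput to old throughput.
// (Hypothetical helper, for illustration only.)
func speedup(oldRate, newRate float64) float64 {
	return newRate / oldRate
}

func main() {
	// Values taken from the BenchmarkAddVV/1-8 rows above.
	fmt.Printf("%.2f%%\n", deltaPercent(7.33, 5.81))  // -20.74%
	fmt.Printf("%.2fx\n", speedup(8729.74, 11006.26)) // 1.26x
}
```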

It also makes subVV use CTR (the ppc64 count register) as its loop counter,
instead of updating a loop counter manually. This is slightly faster.
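
For context, addVV and subVV are the word-vector add and subtract primitives
in math/big: they combine two equal-length slices word by word and propagate
a carry (or borrow) between words. The following is a minimal portable sketch
of that operation, not the ppc64x assembly from this change. It assumes
64-bit words, uses uint64 in place of big.Word, and assumes all three slices
have the same length.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addVV computes z = x + y word by word and returns the final carry (0 or 1).
// Illustrative sketch: assumes len(z) == len(x) == len(y).
func addVV(z, x, y []uint64) (c uint64) {
	for i := range z {
		// bits.Add64 returns the low 64 bits of the sum and the carry out.
		z[i], c = bits.Add64(x[i], y[i], c)
	}
	return c
}

// subVV computes z = x - y word by word and returns the final borrow (0 or 1).
// Illustrative sketch: assumes len(z) == len(x) == len(y).
func subVV(z, x, y []uint64) (b uint64) {
	for i := range z {
		// bits.Sub64 returns the low 64 bits of the difference and the borrow out.
		z[i], b = bits.Sub64(x[i], y[i], b)
	}
	return b
}

func main() {
	x := []uint64{^uint64(0), 1} // least significant word first
	y := []uint64{1, 2}
	z := make([]uint64, 2)

	c := addVV(z, x, y)
	fmt.Println(z, c) // [0 4] 0: the low word overflows and carries into the high word
}
```

The carry dependency between iterations is what makes this loop hard to speed
up in portable code. An assembly version can keep the carry in the processor's
carry flag and let a hardware counter drive the loop.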

Change-Id: Ic4b05cad384fd057972d46a5618ed5c3039d7460
Reviewed-on: https://go-review.googlesource.com/41010
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2017-04-25 13:15:39 +00:00
big math/big: improve performance for addVV/subVV for ppc64x 2017-04-25 13:15:39 +00:00
bits math/bits: support negative rotation count and remove RotateRight 2017-04-11 23:57:24 +00:00
cmplx math/cmplx: prevent infinite loop in tanSeries 2016-10-25 18:32:22 +00:00
rand math/rand: export Source64, mainly for documentation value 2016-11-23 04:29:25 +00:00
abs.go
acosh.go all: single space after period. 2016-03-02 00:13:47 +00:00
all_test.go math: speed up and improve accuracy of Pow10 2017-02-22 19:17:04 +00:00
arith_s390x.go math: use SIMD to accelerate some scalar math functions on s390x 2016-11-11 20:20:23 +00:00
arith_s390x_test.go math: use SIMD to accelerate some scalar math functions on s390x 2016-11-11 20:20:23 +00:00
asin.go
asin_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
asin_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
asin_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
asin_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
asinh.go all: single space after period. 2016-03-02 00:13:47 +00:00
atan.go
atan2.go
atan2_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan2_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan2_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan2_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atan_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
atanh.go all: single space after period. 2016-03-02 00:13:47 +00:00
bits.go
cbrt.go
const.go
copysign.go
cosh_s390x.s cmd/asm, cmd/internal/obj/s390x, math: add LGDR and LDGR instructions 2017-04-17 16:33:51 +00:00
dim.go
dim_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
dim_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
dim_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
dim_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
dim_arm64.s math: add some assembly implementations on ARM64 2016-09-27 23:52:12 +00:00
dim_s390x.s math: add functions and stubs for s390x 2016-04-06 23:35:56 +00:00
erf.go all: single space after period. 2016-03-02 00:13:47 +00:00
exp.go all: single space after period. 2016-03-02 00:13:47 +00:00
exp2_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp2_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp2_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp2_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp_386.s math: use portable Exp instead of 387 instructions on 386 2016-10-05 03:53:11 +00:00
exp_amd64.s math: check overflow in amd64 Exp implementation 2017-02-10 13:40:08 +00:00
exp_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
exp_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
expm1.go math,math/cmplx: fix linter issues 2016-10-24 23:25:46 +00:00
expm1_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
expm1_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
expm1_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
expm1_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
export_s390x_test.go math: use SIMD to accelerate some scalar math functions on s390x 2016-11-11 20:20:23 +00:00
export_test.go all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
floor.go
floor_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
floor_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
floor_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
floor_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
floor_arm64.s math: add some assembly implementations on ARM64 2016-09-27 23:52:12 +00:00
floor_asm.go
floor_ppc64x.s math, cmd/internal/obj/ppc64: improve floor, ceil, trunc with asm 2016-09-23 13:03:08 +00:00
floor_s390x.s math: optimize Ceil, Floor and Trunc on s390x 2016-08-26 17:27:13 +00:00
frexp.go
frexp_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
frexp_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
frexp_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
frexp_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
gamma.go math: speed up Gamma(+Inf) 2016-10-18 22:12:03 +00:00
hypot.go
hypot_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
hypot_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
hypot_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
hypot_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
j0.go math: speed up bessel functions on AMD64 2016-08-31 14:45:29 +00:00
j1.go math: speed up bessel functions on AMD64 2016-08-31 14:45:29 +00:00
jn.go math: fix typos in Bessel function docs 2017-02-16 22:41:34 +00:00
ldexp.go
ldexp_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
ldexp_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
ldexp_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
ldexp_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
lgamma.go all: single space after period. 2016-03-02 00:13:47 +00:00
log.go all: single space after period. 2016-03-02 00:13:47 +00:00
log1p.go math,math/cmplx: fix linter issues 2016-10-24 23:25:46 +00:00
log1p_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log1p_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log1p_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log1p_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log10.go
log10_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log10_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log10_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log10_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log10_s390x.s cmd/asm, cmd/internal/obj/s390x, math: add LGDR and LDGR instructions 2017-04-17 16:33:51 +00:00
log_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log_amd64.s math: speed up Log on amd64 2017-03-29 20:36:29 +00:00
log_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
log_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
logb.go
mod.go
mod_386.s
mod_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
mod_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
mod_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
modf.go all: single space after period. 2016-03-02 00:13:47 +00:00
modf_386.s all: fix assembly vet issues 2016-08-25 18:52:31 +00:00
modf_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
modf_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
modf_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
modf_arm64.s all: minor vet fixes 2016-10-24 17:27:37 +00:00
nextafter.go
pow.go
pow10.go math: speed up and improve accuracy of Pow10 2017-02-22 19:17:04 +00:00
remainder.go all: single space after period. 2016-03-02 00:13:47 +00:00
remainder_386.s
remainder_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
remainder_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
remainder_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
signbit.go
sin.go math,math/cmplx: fix linter issues 2016-10-24 23:25:46 +00:00
sin_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sin_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sin_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sin_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sin_s390x.s cmd/asm, cmd/internal/obj/s390x, math: add LGDR and LDGR instructions 2017-04-17 16:33:51 +00:00
sincos.go math: remove asm version of sincos everywhere, except 386 2017-04-24 15:09:18 +00:00
sincos_386.go math: remove asm version of sincos everywhere, except 386 2017-04-24 15:09:18 +00:00
sincos_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sinh.go math: use SIMD to accelerate some scalar math functions on s390x 2016-11-11 20:20:23 +00:00
sinh_s390x.s cmd/asm, cmd/internal/obj/s390x, math: add LGDR and LDGR instructions 2017-04-17 16:33:51 +00:00
sinh_stub.s math: use SIMD to accelerate some scalar math functions on s390x 2016-11-11 20:20:23 +00:00
sqrt.go math: delete unused function sqrtC 2016-03-03 02:29:09 +00:00
sqrt_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sqrt_amd64.s math: make sqrt smaller on AMD64 2016-09-29 15:56:52 +00:00
sqrt_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sqrt_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sqrt_arm64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
sqrt_mipsx.s math, math/big: add support for GOARCH=mips{,le} 2016-11-03 22:55:06 +00:00
sqrt_ppc64x.s all: make copyright headers consistent with one space after period 2016-05-02 13:43:18 +00:00
sqrt_s390x.s math: add functions and stubs for s390x 2016-04-06 23:35:56 +00:00
stubs_arm64.s math: remove asm version of sincos everywhere, except 386 2017-04-24 15:09:18 +00:00
stubs_mips64x.s math: remove asm version of sincos everywhere, except 386 2017-04-24 15:09:18 +00:00
stubs_mipsx.s math: remove asm version of sincos everywhere, except 386 2017-04-24 15:09:18 +00:00
stubs_ppc64x.s math: remove asm version of sincos everywhere, except 386 2017-04-24 15:09:18 +00:00
stubs_s390x.s math: remove asm version of sincos everywhere, except 386 2017-04-24 15:09:18 +00:00
tan.go math,math/cmplx: fix linter issues 2016-10-24 23:25:46 +00:00
tan_386.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
tan_amd64.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
tan_amd64p32.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
tan_arm.s all: make copyright headers consistent with one space after period 2016-03-01 23:34:33 +00:00
tanh.go math: use SIMD to accelerate some scalar math functions on s390x 2016-11-11 20:20:23 +00:00
tanh_s390x.s cmd/asm, cmd/internal/obj/s390x, math: add LGDR and LDGR instructions 2017-04-17 16:33:51 +00:00
unsafe.go