Commit Graph

8 Commits

Author SHA1 Message Date
Meng Zhuo 918d4d46cd runtime: improve MIPS64x memclr
Using MIPS MSA VLD/VST to improve mips64x large memclr.

name          old time/op    new time/op     delta
Memclr/5        23.2ns ± 0%     21.5ns ± 0%    -7.33%  (p=0.000 n=9+8)
Memclr/16       20.1ns ± 0%     17.1ns ± 0%   -14.93%  (p=0.000 n=10+10)
Memclr/64       27.2ns ± 0%     19.1ns ± 0%   -29.70%  (p=0.000 n=9+9)
Memclr/256      76.8ns ± 0%     24.1ns ± 0%   -68.66%  (p=0.000 n=10+10)
Memclr/4096     1.12µs ± 1%     0.18µs ± 0%   -84.32%  (p=0.000 n=10+8)
Memclr/65536    18.0µs ± 0%      2.8µs ± 0%   -84.29%  (p=0.000 n=10+10)
Memclr/1M        288µs ± 0%       45µs ± 0%   -84.20%  (p=0.000 n=10+10)
Memclr/4M       1.15ms ± 0%     0.18ms ± 0%   -84.21%  (p=0.000 n=9+10)
Memclr/8M       2.34ms ± 0%     1.39ms ± 0%   -40.55%  (p=0.000 n=10+8)
Memclr/16M      4.72ms ± 0%     4.74ms ± 0%    +0.52%  (p=0.000 n=9+10)
Memclr/64M      18.9ms ± 0%     18.9ms ± 0%      ~     (p=0.436 n=10+10)
GoMemclr/5      13.7ns ± 0%     16.9ns ± 0%   +23.36%  (p=0.000 n=10+10)
GoMemclr/16     14.3ns ± 0%      9.0ns ± 0%   -37.27%  (p=0.000 n=10+9)
GoMemclr/64     26.9ns ± 0%     13.7ns ± 0%   -49.07%  (p=0.000 n=10+10)
GoMemclr/256    77.8ns ± 0%     13.0ns ± 0%   -83.24%  (p=0.000 n=9+10)

name          old speed      new speed       delta
Memclr/5       215MB/s ± 0%    232MB/s ± 0%    +7.74%  (p=0.000 n=9+9)
Memclr/16      795MB/s ± 0%    935MB/s ± 0%   +17.60%  (p=0.000 n=10+10)
Memclr/64     2.35GB/s ± 0%   3.35GB/s ± 0%   +42.33%  (p=0.000 n=10+9)
Memclr/256    3.34GB/s ± 0%  10.65GB/s ± 0%  +219.16%  (p=0.000 n=10+10)
Memclr/4096   3.65GB/s ± 1%  23.30GB/s ± 0%  +538.36%  (p=0.000 n=10+10)
Memclr/65536  3.65GB/s ± 0%  23.21GB/s ± 0%  +536.59%  (p=0.000 n=10+10)
Memclr/1M     3.64GB/s ± 0%  23.07GB/s ± 0%  +532.96%  (p=0.000 n=10+10)
Memclr/4M     3.64GB/s ± 0%  23.08GB/s ± 0%  +533.36%  (p=0.000 n=9+10)
Memclr/8M     3.58GB/s ± 0%   6.02GB/s ± 0%   +68.20%  (p=0.000 n=10+8)
Memclr/16M    3.56GB/s ± 0%   3.54GB/s ± 0%    -0.51%  (p=0.000 n=9+10)
Memclr/64M    3.55GB/s ± 0%   3.55GB/s ± 0%      ~     (p=0.436 n=10+10)
GoMemclr/5     364MB/s ± 0%    296MB/s ± 0%   -18.76%  (p=0.000 n=9+10)
GoMemclr/16   1.12GB/s ± 0%   1.78GB/s ± 0%   +58.86%  (p=0.000 n=10+10)
GoMemclr/64   2.38GB/s ± 0%   4.66GB/s ± 0%   +96.27%  (p=0.000 n=10+9)
GoMemclr/256  3.29GB/s ± 0%  19.62GB/s ± 0%  +496.45%  (p=0.000 n=10+9)

Change-Id: I457858368f2875fd66818a41d2f0c190a850e8f1
Reviewed-on: https://go-review.googlesource.com/c/go/+/218177
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-26 17:48:20 +00:00
Cherry Zhang c46ffdd2ec runtime: guard VZEROUPPER on CPU feature
In CL 219131 we inserted a VZEROUPPER instruction on darwin/amd64.
The instruction is not available on pre-AVX machines. Guard it
with CPU feature.

Fixes #37459.

Change-Id: I9a064df277d091be4ee594eda5c7fd8ee323102b
Reviewed-on: https://go-review.googlesource.com/c/go/+/221057
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-26 01:52:42 +00:00
smasher164 58b031949b cmd/compile: add fma intrinsic for arm
This change introduces an arm intrinsic that generates the FMULAD
instruction for the fused-multiply-add operation on systems that
support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite
rule translates the generic intrinsic to FMULAD.

Updates #25819.

Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/142117
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 17:42:47 +00:00
smasher164 7a6da218b1 cmd/compile: add fma intrinsic for amd64
To permit ssa-level optimization, this change introduces an amd64 intrinsic
that generates the VFMADD231SD instruction for the fused-multiply-add
operation on systems that support it. System support is detected via
cpu.X86.HasFMA. A rewrite rule can then translate the generic ssa intrinsic
("Fma") to VFMADD231SD.

The benchmark compares the software implementation (old) with the intrinsic
(new).

name   old time/op  new time/op  delta
Fma-4  27.2ns ± 1%   1.0ns ± 9%  -96.48%  (p=0.008 n=5+5)

Updates #25819.

Change-Id: I966655e5f96817a5d06dff5942418a3915b09584
Reviewed-on: https://go-review.googlesource.com/c/go/+/137156
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 16:42:10 +00:00
Martin Möhrmann 75798e8ada runtime: make processor capability variable naming platform specific
The current support_XXX variables are specific for the
amd64 and 386 platforms.

Prefix processor capability variables by architecture to have a
consistent naming scheme and avoid reuse of the existing
variables for new platforms.

This also aligns naming of runtime variables closer with internal/cpu
processor capability variable names.

Change-Id: I3eabb29a03874678851376185d3a62e73c1aff1d
Reviewed-on: https://go-review.googlesource.com/c/91435
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-11-14 20:30:31 +00:00
Martin Möhrmann 05c02444eb all: align cpu feature variable offset naming
Add an "offset_" prefix to all cpu feature variable offset constants to
signify that they are not boolean cpu feature variables.

Remove _ from offset constant names.

Change-Id: I6e22a79ebcbe6e2ae54c4ac8764f9260bb3223ff
Reviewed-on: https://go-review.googlesource.com/131215
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-08-24 18:40:16 +00:00
Martin Möhrmann 2e8c31b3d2 runtime: move arm hardware division support detection to internal/cpu
Assumes mandatory VFP and VFPv3 support to be present by default
but not IDIVA if AT_HWCAP is not available.

Adds GODEBUGCPU options to disable the use of code paths in the runtime
that use hardware support for division.

Change-Id: Ida02311bd9b9701de3fc120697e69445bf6c0853
Reviewed-on: https://go-review.googlesource.com/114826
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-24 14:27:07 +00:00
Martin Möhrmann c15c04d9e8 runtime: use internal/cpu variables in assembler code
Using internal/cpu variables has the benefit of avoiding false sharing
(as those are padded) and allows memory and cache usage for these variables
to be shared by multiple packages.

Change-Id: I2bf68d03091bf52b466cf689230d5d25d5950037
Reviewed-on: https://go-review.googlesource.com/126599
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-08-24 07:29:52 +00:00