Commit Graph

600 Commits

Author SHA1 Message Date
Xiaolin Zhao ff14e08cd3 cmd/compile, math: improve implementation of math.{Max,Min} on loong64
Make math.{Min,Max} intrinsics and implement math.{archMax,archMin}
in hardware.
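
Whatever instructions are used, the result still has to agree with the special cases that package math documents for Max and Min. The following is a minimal sketch of those cases in portable Go. The helper name specialCasesHold is an assumption made for illustration and does not come from the CL.

```go
package main

import (
	"fmt"
	"math"
)

// specialCasesHold is a hypothetical helper (not part of the CL). It reports
// whether math.Max and math.Min follow the special cases documented in
// package math, which an intrinsic implementation has to keep intact.
func specialCasesHold() bool {
	nan := math.NaN()
	negZero := math.Copysign(0, -1)

	return math.IsNaN(math.Max(nan, 1)) && // Max(NaN, x) = NaN
		math.IsNaN(math.Min(1, nan)) && // Min(x, NaN) = NaN
		math.IsInf(math.Max(math.Inf(1), nan), 1) && // +Inf takes priority over NaN
		math.IsInf(math.Min(nan, math.Inf(-1)), -1) && // -Inf takes priority over NaN
		!math.Signbit(math.Max(negZero, 0)) && // Max(-0, +0) = +0
		math.Signbit(math.Min(0, negZero)) // Min(+0, -0) = -0
}

func main() {
	fmt.Println("special cases hold:", specialCasesHold())
}
```

A check like this is only a sanity test; the benchmarks below are what measure the effect of the change.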

goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
         │  old.bench   │              new.bench              │
         │    sec/op    │   sec/op     vs base                │
Max         7.606n ± 0%   3.087n ± 0%  -59.41% (p=0.000 n=20)
Min         7.205n ± 0%   2.904n ± 0%  -59.69% (p=0.000 n=20)
MinFloat   37.220n ± 0%   4.802n ± 0%  -87.10% (p=0.000 n=20)
MaxFloat   33.620n ± 0%   4.802n ± 0%  -85.72% (p=0.000 n=20)
geomean     16.18n        3.792n       -76.57%

goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A5000 @ 2500.00MHz
         │  old.bench   │              new.bench              │
         │    sec/op    │   sec/op     vs base                │
Max        10.010n ± 0%   7.196n ± 0%  -28.11% (p=0.000 n=20)
Min         8.806n ± 0%   7.155n ± 0%  -18.75% (p=0.000 n=20)
MinFloat   60.010n ± 0%   7.976n ± 0%  -86.71% (p=0.000 n=20)
MaxFloat   56.410n ± 0%   7.980n ± 0%  -85.85% (p=0.000 n=20)
geomean     23.37n        7.566n       -67.63%

Updates #59120.

Change-Id: I6815d20bc304af3cbf5d6ca8fe0ca1c2ddebea2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/580283
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2024-08-07 01:16:28 +00:00
Keith Randall c18ff29295 cmd/compile: make sync/atomic AND/OR operations intrinsic on amd64
Update #61395

Change-Id: I59a950f48efc587dfdffce00e2f4f3ab99d8df00
Reviewed-on: https://go-review.googlesource.com/c/go/+/594738
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Nicolas Hillegeer <aktau@google.com>
2024-07-23 21:29:38 +00:00
Keith Randall dbfa3cacc7 cmd/compile: fix typing of atomic logical operations
For atomic AND and OR operations on memory, we currently have two
views of the op. One just does the operation on the memory and returns
just a memory. The other does the operation on the memory and returns
the old value (before having the logical operation done to it) and
memory.

Update #61395

These two type differently, and there's currently some confusion in
our rules about which is which. Use different names for the two
different flavors so we don't get them confused.
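
At the source level, the two flavors correspond roughly to whether the caller uses the value returned by an atomic And/Or. The sketch below assumes a Go 1.23 or newer toolchain, where sync/atomic provides AndUint32 and OrUint32 returning the old value. The function names clearAndReport and setOnly are hypothetical, and the sketch makes no claim about which SSA op the compiler selects for either call.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// clearAndReport clears the mask bits in *flags and returns the value that
// was stored before the AND. The caller consumes the old value.
func clearAndReport(flags *uint32, mask uint32) (old uint32) {
	return atomic.AndUint32(flags, ^mask)
}

// setOnly sets the mask bits in *flags and ignores the returned old value,
// so only the memory effect matters.
func setOnly(flags *uint32, mask uint32) {
	atomic.OrUint32(flags, mask)
}

func main() {
	var flags uint32 = 0b1010
	setOnly(&flags, 0b0101)
	old := clearAndReport(&flags, 0b0011)
	fmt.Printf("old=%04b new=%04b\n", old, atomic.LoadUint32(&flags))
}
```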

Change-Id: I07b4542db672b2cee98169ac42b67db73c482093
Reviewed-on: https://go-review.googlesource.com/c/go/+/594976
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Nicolas Hillegeer <aktau@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-07-23 21:27:54 +00:00
fanzha02 63c1e141bc cmd/compile: intrinsify atomic And/Or on arm64
CL 528797 added the atomic And/Or operators, but the compiler does
not intrinsify them. This CL does so for arm64.

Also, for the existing atomicAnd/Or operations, the updated
value is not used, but at that time we needed a register to
temporarily hold it. Now that we have v.RegTmp, the new value
is not needed anymore. This CL changes it.

The other change is that the existing operations don't use their
result, but now we need the old value and not the new value for
the result.

This CL also aliases all of the And/Or operations into the
sync/atomic package.

Performance on an ARMv8.1 machine:
                      old.txt       new.txt
                      sec/op         sec/op         vs base
And32-160            8.716n ± 0%    4.771n ± 1%  -45.26% (p=0.000 n=10)
And32Parallel-160    30.58n ± 2%    26.45n ± 4%  -13.49% (p=0.000 n=10)
And64-160            8.750n ± 1%    4.754n ± 0%  -45.67% (p=0.000 n=10)
And64Parallel-160    29.40n ± 3%    25.55n ± 5%  -13.11% (p=0.000 n=10)
Or32-160             8.847n ± 1%    4.754n ± 1%  -46.26% (p=0.000 n=10)
Or32Parallel-160     30.75n ± 3%    26.10n ± 4%  -15.14% (p=0.000 n=10)
Or64-160             8.825n ± 1%    4.766n ± 0%  -46.00% (p=0.000 n=10)
Or64Parallel-160     30.52n ± 5%    25.89n ± 6%  -15.17% (p=0.000 n=10)

For #61395

Change-Id: Ib1d1ac83f7f67dcf67f74d003fadb0f80932b826
Reviewed-on: https://go-review.googlesource.com/c/go/+/584715
Auto-Submit: Austin Clements <austin@google.com>
TryBot-Bypass: Austin Clements <austin@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Fannie Zhang <Fannie.Zhang@arm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-05-23 15:49:20 +00:00
Paul E. Murphy dca577d882 cmd/compile/internal/ssa: reintroduce ANDconst opcode on PPC64
This allows more effective conversion of rotate and mask opcodes
into their CC equivalents, while simplifying the first lowering
pass.

This was removed before the latelower pass was introduced to fold
more cases of compare against zero. Add ANDconst to push the
conversion of ANDconst to ANDCCconst into latelower with the other
CC opcodes.

This also requires introducing RLDICLCC to prevent regressions
when ANDconst is converted to RLDICL then to RLDICLCC and back
to ANDCCconst when possible.

Change-Id: I9e5f9c99fbefa334db18c6c152c5f967f3ff2590
Reviewed-on: https://go-review.googlesource.com/c/go/+/586160
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-05-22 19:59:38 +00:00
Paul E. Murphy dfb17c126c cmd/compile: support float min/max instructions on PPC64
This enables efficient use of the builtin min/max function
for float64 and float32 types on GOPPC64 >= power9.

Extend the assembler to support xsminjdp/xsmaxjdp and use
them to implement float min/max.

Simplify the VSX xx3 opcode rules to allow FPR arguments,
if all arguments are an FPR.

Change-Id: I15882a4ce5dc46eba71d683cf1d184dc4236a328
Reviewed-on: https://go-review.googlesource.com/c/go/+/574535
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2024-04-01 18:50:29 +00:00
Paul E. Murphy c7065bb9db cmd/compile/internal: generate ADDZE on PPC64
This usage shows up in quite a few places, and helps reduce
register pressure in several complex crypto functions by
removing a MOVD $0,... instruction.

Change-Id: I9444ea8f9d19bfd68fb71ea8dc34e109681b3802
Reviewed-on: https://go-review.googlesource.com/c/go/+/571055
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
2024-03-15 17:57:45 +00:00
Joel Sing 997636760e cmd/compile,cmd/internal/obj: provide rotation pseudo-instructions for riscv64
Provide and use rotation pseudo-instructions for riscv64. The RISC-V bitmanip
extension adds support for hardware rotation instructions in the form of ROL,
ROLW, ROR, RORI, RORIW and RORW. These are easily implemented in the assembler
as pseudo-instructions for CPUs that do not support the bitmanip extension.

This approach provides a number of advantages, including reducing the rewrite
rules needed in the compiler, simplifying codegen tests and most importantly,
allowing these instructions to be used in assembly (for example, riscv64
optimised versions of SHA-256 and SHA-512). When bitmanip support is added,
these instruction sequences can simply be replaced with a single instruction
if permitted by the GORISCV64 profile.

Change-Id: Ia23402e1a82f211ac760690deb063386056ae1fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/565015
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: M Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
2024-03-07 14:57:07 +00:00
Joel Sing daa58db486 cmd/compile: improve rotations for riscv64
Enable canRotate for riscv64, enable rotation intrinsics and provide
better rewrite implementations for rotations. By avoiding Lsh*x64
and Rsh*Ux64 we can produce better code, especially for 32 and 64
bit rotations. By enabling canRotate we also benefit from the generic
rotation rewrite rules.
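
For reference, a rotation can always be expressed as two shifts and an OR, which is the kind of sequence a rotate has to expand to when no hardware rotate instruction is available. The sketch below is an illustration only: rotl32 and rotl32Matches are hypothetical names, and the code does not reproduce the compiler's rewrite rules.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotl32 is a hypothetical shift-and-or rotate. The count is reduced modulo
// 32, so negative counts rotate right, as with bits.RotateLeft32.
func rotl32(x uint32, k int) uint32 {
	s := uint(k) & 31
	return x<<s | x>>((32-s)&31)
}

// rotl32Matches reports whether rotl32 agrees with bits.RotateLeft32 for
// every rotation count in [-64, 64].
func rotl32Matches(x uint32) bool {
	for k := -64; k <= 64; k++ {
		if rotl32(x, k) != bits.RotateLeft32(x, k) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Printf("%#08x\n", rotl32(0x80000001, 1))
	fmt.Println("matches math/bits:", rotl32Matches(0xdeadbeef))
}
```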

Benchmark on a StarFive VisionFive 2:

               │   rotate.1   │              rotate.2               │
               │    sec/op    │   sec/op     vs base                │
RotateLeft-4     14.700n ± 0%   8.016n ± 0%  -45.47% (p=0.000 n=10)
RotateLeft8-4     14.70n ± 0%   10.69n ± 0%  -27.28% (p=0.000 n=10)
RotateLeft16-4    14.70n ± 0%   12.02n ± 0%  -18.23% (p=0.000 n=10)
RotateLeft32-4   13.360n ± 0%   8.016n ± 0%  -40.00% (p=0.000 n=10)
RotateLeft64-4   13.360n ± 0%   8.016n ± 0%  -40.00% (p=0.000 n=10)
geomean           14.15n        9.208n       -34.92%

Change-Id: I1a2036fdc57cf88ebb6617eb8d92e1d187e183b2
Reviewed-on: https://go-review.googlesource.com/c/go/+/560315
Reviewed-by: M Zhuo <mengzhuo1203@gmail.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2024-02-16 11:59:07 +00:00
Meng Zhuo 09ed9a6585 cmd/compile: implement float min/max in hardware for riscv64
CL 514596 adds float min/max for amd64, this CL adds it for riscv64.

The behavior of the RISC-V FMIN/FMAX instructions almost matches Go's
requirements.

However, according to RISC-V spec 8.3 "NaN Generation and Propagation":
>> if at least one input is a signaling NaN, or if both inputs are quiet
>> NaNs, the result is the canonical NaN. If one operand is a quiet NaN
>> and the other is not a NaN, the result is the non-NaN operand.

Go uses quiet NaN as NaN, and according to the Go spec:
>> if any argument is a NaN, the result is a NaN

This requires the float min/max implementation to check whether one
of the operands is a qNaN before the float min/max actually executes.
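
The mismatch can be sketched in portable Go. Here fminModel is a simplified, hypothetical model of the quiet-NaN behavior quoted from the RISC-V spec above (it ignores signaling NaNs and exception flags), and goMin uses the built-in min, which assumes Go 1.21 or newer. Neither function is taken from the CL.

```go
package main

import (
	"fmt"
	"math"
)

// fminModel is a simplified model of the quoted FMIN rule for quiet NaNs:
// the result is NaN only when both operands are NaN, otherwise the non-NaN
// operand wins.
func fminModel(a, b float64) float64 {
	switch {
	case math.IsNaN(a) && math.IsNaN(b):
		return math.NaN()
	case math.IsNaN(a):
		return b
	case math.IsNaN(b):
		return a
	}
	return math.Min(a, b)
}

// goMin applies Go's rule through the built-in min: if any argument is a
// NaN, the result is a NaN.
func goMin(a, b float64) float64 {
	return min(a, b)
}

// oneNaNDisagrees reports whether the model and Go's min disagree when
// exactly one operand is NaN, which is the case that needs an extra check.
func oneNaNDisagrees() bool {
	nan := math.NaN()
	return !math.IsNaN(fminModel(nan, 1)) && math.IsNaN(goMin(nan, 1))
}

func main() {
	fmt.Println("model and Go disagree for one NaN operand:", oneNaNDisagrees())
}
```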

This CL also fixes a typo in the minmax test.

Benchmark on Visionfive2
goos: linux
goarch: riscv64
pkg: runtime
         │ float_minmax.old.bench │       float_minmax.new.bench        │
         │         sec/op         │   sec/op     vs base                │
MinFloat             158.20n ± 0%   28.13n ± 0%  -82.22% (p=0.000 n=10)
MaxFloat             158.10n ± 0%   28.12n ± 0%  -82.21% (p=0.000 n=10)
geomean               158.1n        28.12n       -82.22%

Update #59488

Change-Id: Iab48be6d32b8882044fb8c821438ca8840e5493d
Reviewed-on: https://go-review.googlesource.com/c/go/+/514775
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Run-TryBot: M Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2024-01-26 01:41:50 +00:00
Guoqi Chen 6b77d1b736 cmd/compile: update loong64 CALL* ops
Allow the loong64 CALL* ops to take a variable number of args.

Update #40724

Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Change-Id: I4706d9651fcbf9a0f201af6820c97b1a924f14e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/521781
Auto-Submit: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
2023-11-21 19:04:19 +00:00
Guoqi Chen ebca52eeb7 cmd/compile/internal: add register info for loong64 regABI
Update #40724

Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Change-Id: Ifd7d94147b01e4fc83978b53dca2bcc0ad1ac4e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/521779
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
2023-11-21 19:04:14 +00:00
Guoqi Chen 070139a130 cmd/compile,cmd/internal,runtime: change registers on loong64 to avoid regABI arguments
Update #40724

Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Change-Id: Ic7e2e7fb4c1d3670e6abbfb817aa6e4e654e08d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/521777
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: David Chase <drchase@google.com>
2023-11-21 17:59:37 +00:00
Guoqi Chen f43581131e cmd/compile, cmd/internal, runtime: change the registers used by the duff device for loong64
Add R21 to the allocatable registers, use R20 and R21 in duff
device. This CL is in preparation for subsequent regABI support.

Updates #40724

Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Change-Id: If1661adc0f766925fbe74827a369797f95fa28a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/521775
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-11-21 17:42:40 +00:00
Paul E. Murphy 773039ed5c cmd/compile/internal/ssa: on PPC64, merge (CMPconst [0] (op ...)) more aggressively
Generate the CC version of many opcodes whose result is compared against
signed 0. The approach taken here works even if the opcode result is used in
multiple places too.

Add support for ADD, ADDconst, ANDN, SUB, NEG, CNTLZD, NOR conversions
to their CC opcode variant. These are the most commonly used variants.

Also, do not set clobberFlags of CNTLZD and CNTLZW; they do not clobber
flags.

This results in about 1% smaller text sections in kubernetes binaries,
and no regressions in the crypto benchmarks.

Change-Id: I9e0381944869c3774106bf348dead5ecb96dffda
Reviewed-on: https://go-review.googlesource.com/c/go/+/538636
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2023-11-13 22:12:32 +00:00
Keith Randall 962ccbef91 cmd/compile: ensure pointer arithmetic happens after the nil check
Have nil checks return a pointer that is known non-nil. Users of
that pointer can use the result, ensuring that they are ordered
after the nil check itself.

The order dependence goes away after scheduling, when we've fixed
an order. At that point we move uses back to the original pointer
so it doesn't change regalloc any.

This prevents pointer arithmetic on nil from being spilled to the
stack and then observed by a stack scan.

Fixes #63657

Change-Id: I1a5fa4f2e6d9000d672792b4f90dfc1b7b67f6ea
Reviewed-on: https://go-review.googlesource.com/c/go/+/537775
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-31 20:45:54 +00:00
Ubuntu 8fc043ccfa cmd/compile: optimize right shifts of int32 on riscv64
The compiler is currently sign extending 32 bit signed integers to
64 bits before right shifting them using a 64 bit shift instruction.
There's no need to do this as RISC-V has instructions for right
shifting 32 bit signed values (sraw and sraiw) which sign extend
the result of the shift to 64 bits.  Change the compiler so that
it uses sraw and sraiw for shifts of signed 32 bit integers reducing
in most cases the number of instructions needed to perform the shift.

Here are some examples of code sequences that are changed by this
patch:

int32(a) >> 2

  before:

    sll     x5,x10,0x20
    sra     x10,x5,0x22

  after:

    sraw    x10,x10,0x2

int32(v) >> int(s)

  before:

    sext.w  x5,x10
    sltiu   x6,x11,64
    add     x6,x6,-1
    or      x6,x11,x6
    sra     x10,x5,x6

  after:

    sltiu   x5,x11,32
    add     x5,x5,-1
    or      x5,x11,x5
    sraw    x10,x10,x5

int32(v) >> (int(s) & 31)

  before:

    sext.w  x5,x10
    and     x6,x11,63
    sra     x10,x5,x6

  after:

    and     x5,x11,31
    sraw    x10,x10,x5

int32(100) >> int(a)

  before:

    bltz    x10,<target address calls runtime.panicshift>
    sltiu   x5,x10,64
    add     x5,x5,-1
    or      x5,x10,x5
    li      x6,100
    sra     x10,x6,x5

  after:

    bltz    x10,<target address calls runtime.panicshift>
    sltiu   x5,x10,32
    add     x5,x5,-1
    or      x5,x10,x5
    li      x6,100
    sraw    x10,x6,x5

int32(v) >> (int(s) & 63)

  before:

    sext.w  x5,x10
    and     x6,x11,63
    sra     x10,x5,x6

  after:

    and     x5,x11,63
    sltiu   x6,x5,32
    add     x6,x6,-1
    or      x5,x5,x6
    sraw    x10,x10,x5

In most cases we eliminate one instruction.  In the case where
we shift an int32 constant by a variable the number of instructions
generated is identical.  A sra is simply replaced by a sraw.  In the
unusual case where we shift right by a variable anded with a constant
> 31 but < 64, we generate two additional instructions.  As this is
an unusual case we do not try to optimize for it.
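
The variable-shift sequences rely on saturating the shift count, because Go defines int32(v) >> s for s >= 32 as a full sign fill while sraw only looks at the low 5 bits of the count. The sketch below models the four-instruction "after" sequence for int32(v) >> int(s) in Go. The names srawSeq and srawSeqMatches are hypothetical, and the register names in the comments are copied from the listing above.

```go
package main

import "fmt"

// srawSeq models the "after" sequence for int32(v) >> int(s), with s given
// as the non-negative shift count held in a 64-bit register.
func srawSeq(v int32, s uint64) int32 {
	var lt uint64
	if s < 32 { // sltiu x5,x11,32
		lt = 1
	}
	mask := lt - 1         // add x5,x5,-1: zero when s < 32, all ones otherwise
	amt := s | mask        // or x5,x11,x5: count is forced to all ones when s >= 32
	return v >> (amt & 31) // sraw x10,x10,x5: only the low 5 bits of the count are used
}

// srawSeqMatches reports whether the model agrees with Go's own shift
// semantics for counts from 0 through 199.
func srawSeqMatches(v int32) bool {
	for s := uint64(0); s < 200; s++ {
		if srawSeq(v, s) != v>>s {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(srawSeq(-8, 1), srawSeq(-8, 40))
	fmt.Println("matches Go semantics:", srawSeqMatches(-100) && srawSeqMatches(100))
}
```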

Some improvements can be seen in some of the existing benchmarks,
notably in the utf8 package which performs right shifts of runes
which are signed 32 bit integers.

                      |  utf8-old   |              utf8-new            |
                      |   sec/op    |   sec/op     vs base             |
EncodeASCIIRune-4       17.68n ± 0%   17.67n ± 0%       ~ (p=0.312 n=10)
EncodeJapaneseRune-4    35.34n ± 0%   34.53n ± 1%  -2.31% (p=0.000 n=10)
AppendASCIIRune-4       3.213n ± 0%   3.213n ± 0%       ~ (p=0.318 n=10)
AppendJapaneseRune-4    36.14n ± 0%   35.35n ± 0%  -2.19% (p=0.000 n=10)
DecodeASCIIRune-4       28.11n ± 0%   27.36n ± 0%  -2.69% (p=0.000 n=10)
DecodeJapaneseRune-4    38.55n ± 0%   38.58n ± 0%       ~ (p=0.612 n=10)

Change-Id: I60a91cbede9ce65597571c7b7dd9943eeb8d3cc2
Reviewed-on: https://go-review.googlesource.com/c/go/+/535115
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
2023-10-30 14:47:06 +00:00
Guoqi Chen 3754ca0af2 cmd/compile: improve the implementation of Lowered{Move,Zero} on linux/loong64
As in CL 487295: when implementing Lowered{Move,Zero}, 8 is first subtracted
from Rarg0 (parameter Ptr), and then an offset of 8 is added during subsequent
operations on Rarg0. This operation is meaningless, so delete it.

Change LoweredMove's Rarg0 register to R20, consistent with duffcopy.

goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3C5000 @ 2200.00MHz
                              │  old.bench  │             new.bench               │
                              │    sec/op   │   sec/op     vs base                │
Memmove/15                      19.10n ± 0%   19.10n ± 0%        ~ (p=0.483 n=15)
MemmoveUnalignedDst/15          25.02n ± 0%   25.02n ± 0%        ~ (p=0.741 n=15)
MemmoveUnalignedDst/32          48.22n ± 0%   48.22n ± 0%        ~ (p=1.000 n=15) ¹
MemmoveUnalignedDst/64          90.57n ± 0%   90.52n ± 0%        ~ (p=0.212 n=15)
MemmoveUnalignedDstOverlap/32   44.12n ± 0%   44.13n ± 0%   +0.02% (p=0.000 n=15)
MemmoveUnalignedDstOverlap/64   87.79n ± 0%   87.80n ± 0%   +0.01% (p=0.002 n=15)
MemmoveUnalignedSrc/0           3.639n ± 0%   3.639n ± 0%        ~ (p=1.000 n=15) ¹
MemmoveUnalignedSrc/1           7.733n ± 0%   7.733n ± 0%        ~ (p=1.000 n=15)
MemmoveUnalignedSrc/2           9.097n ± 0%   9.097n ± 0%        ~ (p=1.000 n=15)
MemmoveUnalignedSrc/3           10.46n ± 0%   10.46n ± 0%        ~ (p=1.000 n=15) ¹
MemmoveUnalignedSrc/4           11.83n ± 0%   11.83n ± 0%        ~ (p=1.000 n=15) ¹
MemmoveUnalignedSrc/64          93.71n ± 0%   93.70n ± 0%        ~ (p=0.128 n=15)
Memclr/4096                     699.1n ± 0%   699.1n ± 0%        ~ (p=0.682 n=15)
Memclr/65536                    11.18µ ± 0%   11.18µ ± 0%   -0.01% (p=0.000 n=15)
Memclr/1M                       175.2µ ± 0%   175.2µ ± 0%        ~ (p=0.191 n=15)
Memclr/4M                       661.8µ ± 0%   662.0µ ± 0%        ~ (p=0.486 n=15)
MemclrUnaligned/4_5             19.39n ± 0%   20.47n ± 0%   +5.57% (p=0.000 n=15)
MemclrUnaligned/4_16            22.29n ± 0%   21.38n ± 0%   -4.08% (p=0.000 n=15)
MemclrUnaligned/4_64            30.58n ± 0%   29.81n ± 0%   -2.52% (p=0.000 n=15)
MemclrUnaligned/4_65536         11.19µ ± 0%   11.20µ ± 0%   +0.02% (p=0.000 n=15)
GoMemclr/5                      12.73n ± 0%   12.73n ± 0%        ~ (p=0.261 n=15)
GoMemclr/16                     10.01n ± 0%   10.00n ± 0%        ~ (p=0.264 n=15)
GoMemclr/256                    50.94n ± 0%   50.94n ± 0%        ~ (p=0.372 n=15)
ClearFat15                      14.95n ± 0%   15.01n ± 4%        ~ (p=0.925 n=15)
ClearFat1032                    125.5n ± 0%   125.6n ± 0%   +0.08% (p=0.000 n=15)
CopyFat64                       10.58n ± 0%   10.01n ± 0%   -5.39% (p=0.000 n=15)
CopyFat1040                     244.3n ± 0%   155.6n ± 0%  -36.31% (p=0.000 n=15)
Issue18740/2byte                29.82µ ± 0%   29.82µ ± 0%        ~ (p=0.648 n=30)
Issue18740/4byte                18.18µ ± 0%   18.18µ ± 0%   -0.02% (p=0.001 n=30)
Issue18740/8byte                8.395µ ± 0%   8.395µ ± 0%        ~ (p=0.401 n=30)
geomean                         154.5n        151.8n        -1.70%
¹ all samples are equal

Change-Id: Ia3f3c8b25e1e93c97ab72328651de78ca9dec016
Reviewed-on: https://go-review.googlesource.com/c/go/+/488515
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: xiaodong liu <teaofmoli@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-20 00:01:44 +00:00
Joel Sing f711892a8a cmd/compile/internal: stop lowering OpConvert on riscv64
Lowering for OpConvert was removed for all architectures in CL#108496,
prior to the riscv64 port being upstreamed. Remove lowering of OpConvert
on riscv64, which brings it in line with all other architectures. This
results in 1,600+ instructions being removed from the riscv64 go binary.

Change-Id: Iaaf1f8b397875926604048b66ad8ac91a98c871e
Reviewed-on: https://go-review.googlesource.com/c/go/+/533335
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-10-07 12:31:59 +00:00
Mark Ryan 561bf0457f cmd/compile: optimize right shifts of uint32 on riscv
The compiler is currently zero extending 32 bit unsigned integers to
64 bits before right shifting them using a 64 bit shift instruction.
There's no need to do this as RISC-V has instructions for right
shifting 32 bit unsigned values (srlw and srliw) which zero extend
the result of the shift to 64 bits.  Change the compiler so that
it uses srlw and srliw for 32 bit unsigned shifts reducing in most
cases the number of instructions needed to perform the shift.

Here are some examples of code sequences that are changed by this
patch:

uint32(a) >> 2

  before:

    sll     x5,x10,0x20
    srl     x10,x5,0x22

  after:

    srlw    x10,x10,0x2

uint32(a) >> int(b)

  before:

    sll     x5,x10,0x20
    srl     x5,x5,0x20
    srl     x5,x5,x11
    sltiu   x6,x11,64
    neg     x6,x6
    and     x10,x5,x6

  after:

    srlw    x5,x10,x11
    sltiu   x6,x11,32
    neg     x6,x6
    and     x10,x5,x6

bits.RotateLeft32(uint32(a), 1)

  before:

    sll     x5,x10,0x1
    sll     x6,x10,0x20
    srl     x7,x6,0x3f
    or      x5,x5,x7

  after:

    sll     x5,x10,0x1
    srlw    x6,x10,0x1f
    or      x10,x5,x6

bits.RotateLeft32(uint32(a), int(b))

  before:
    and     x6,x11,31
    sll     x7,x10,x6
    sll     x8,x10,0x20
    srl     x8,x8,0x20
    add     x6,x6,-32
    neg     x6,x6
    srl     x9,x8,x6
    sltiu   x6,x6,64
    neg     x6,x6
    and     x6,x9,x6
    or      x6,x6,x7

  after:

    and     x5,x11,31
    sll     x6,x10,x5
    add     x5,x5,-32
    neg     x5,x5
    srlw    x7,x10,x5
    sltiu   x5,x5,32
    neg     x5,x5
    and     x5,x7,x5
    or      x10,x6,x5

The one regression observed is the following case, an unbounded right
shift of a uint32 where the value we're shifting by is known to be
< 64 but > 31.  As this is an unusual case this commit does not
optimize for it, although the existing code does.

uint32(a) >> (b & 63)

  before:

    sll     x5,x10,0x20
    srl     x5,x5,0x20
    and     x6,x11,63
    srl     x10,x5,x6

  after:

    and     x5,x11,63
    srlw    x6,x10,x5
    sltiu   x5,x5,32
    neg     x5,x5
    and     x10,x6,x5

Here we have one extra instruction.
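
The unsigned sequences use a mask rather than a saturated count: srlw shifts by the low 5 bits of the count, and the result is then zeroed when the count is 32 or more. The sketch below models the "after" sequence for uint32(a) >> int(b) in Go. The names srlwSeq and srlwSeqMatches are hypothetical, and the register names in the comments are copied from the listings above.

```go
package main

import "fmt"

// srlwSeq models the "after" sequence for uint32(a) >> int(b), with b given
// as the non-negative shift count held in a 64-bit register.
func srlwSeq(a uint32, b uint64) uint32 {
	shifted := a >> (b & 31) // srlw x5,x10,x11: only the low 5 bits of the count are used
	var lt uint32
	if b < 32 { // sltiu x6,x11,32
		lt = 1
	}
	mask := -lt           // neg x6,x6: all ones when b < 32, zero otherwise
	return shifted & mask // and x10,x5,x6
}

// srlwSeqMatches reports whether the model agrees with Go's own shift
// semantics for counts from 0 through 199.
func srlwSeqMatches(a uint32) bool {
	for b := uint64(0); b < 200; b++ {
		if srlwSeq(a, b) != a>>b {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(srlwSeq(0x80000000, 31), srlwSeq(0x80000000, 32))
	fmt.Println("matches Go semantics:", srlwSeqMatches(0xdeadbeef))
}
```

In the regression case, the count is first reduced with b & 63, so it can still reach 32 or more and the masking instructions remain necessary.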

Some benchmark highlights, generated on a VisionFive2 8GB running
Ubuntu 23.04.

pkg: math/bits
LeadingZeros32-4    18.64n ± 0%     17.32n ± 0%   -7.11% (p=0.000 n=10)
LeadingZeros64-4    15.47n ± 0%     15.51n ± 0%   +0.26% (p=0.027 n=10)
TrailingZeros16-4   18.48n ± 0%     17.68n ± 0%   -4.33% (p=0.000 n=10)
TrailingZeros32-4   16.87n ± 0%     16.07n ± 0%   -4.74% (p=0.000 n=10)
TrailingZeros64-4   15.26n ± 0%     15.27n ± 0%   +0.07% (p=0.043 n=10)
OnesCount32-4       20.08n ± 0%     19.29n ± 0%   -3.96% (p=0.000 n=10)
RotateLeft-4        8.864n ± 0%     8.838n ± 0%   -0.30% (p=0.006 n=10)
RotateLeft32-4      8.837n ± 0%     8.032n ± 0%   -9.11% (p=0.000 n=10)
Reverse32-4         29.77n ± 0%     26.52n ± 0%  -10.93% (p=0.000 n=10)
ReverseBytes32-4    9.640n ± 0%     8.838n ± 0%   -8.32% (p=0.000 n=10)
Sub32-4             8.835n ± 0%     8.035n ± 0%   -9.06% (p=0.000 n=10)
geomean             11.50n          11.33n        -1.45%

pkg: crypto/md5
Hash8Bytes-4             1.486µ ± 0%   1.426µ ± 0%  -4.04% (p=0.000 n=10)
Hash64-4                 2.079µ ± 0%   1.968µ ± 0%  -5.36% (p=0.000 n=10)
Hash128-4                2.720µ ± 0%   2.557µ ± 0%  -5.99% (p=0.000 n=10)
Hash256-4                3.996µ ± 0%   3.733µ ± 0%  -6.58% (p=0.000 n=10)
Hash512-4                6.541µ ± 0%   6.072µ ± 0%  -7.18% (p=0.000 n=10)
Hash1K-4                 11.64µ ± 0%   10.75µ ± 0%  -7.58% (p=0.000 n=10)
Hash8K-4                 82.95µ ± 0%   76.32µ ± 0%  -7.99% (p=0.000 n=10)
Hash1M-4                10.436m ± 0%   9.591m ± 0%  -8.10% (p=0.000 n=10)
Hash8M-4                 83.50m ± 0%   76.73m ± 0%  -8.10% (p=0.000 n=10)
Hash8BytesUnaligned-4    1.494µ ± 0%   1.434µ ± 0%  -4.02% (p=0.000 n=10)
Hash1KUnaligned-4        11.64µ ± 0%   10.76µ ± 0%  -7.52% (p=0.000 n=10)
Hash8KUnaligned-4        83.01µ ± 0%   76.32µ ± 0%  -8.07% (p=0.000 n=10)
geomean                  28.32µ        26.42µ       -6.72%

Change-Id: I20483a6668cca1b53fe83944bee3706aadcf8693
Reviewed-on: https://go-review.googlesource.com/c/go/+/528975
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-07 12:31:38 +00:00
Xianmiao Qu d98f74b31e cmd/compile/internal: intrinsify publicationBarrier on riscv64
This enables publicationBarrier to be used as an intrinsic
on riscv64, optimizing the required function call and return
instructions for invoking the "runtime.publicationBarrier"
function.

This function is called by mallocgc. The benchmark results for malloc
tested on Lichee-Pi-4A (TH1520, RISC-V 2.0G C910 x4) are as follows.

goos: linux
goarch: riscv64
pkg: runtime
                    │   old.txt   │              new.txt               │
                    │   sec/op    │   sec/op     vs base               │
Malloc8-4             92.78n ± 1%   90.77n ± 1%  -2.17% (p=0.001 n=10)
Malloc16-4            156.5n ± 1%   151.7n ± 2%  -3.10% (p=0.000 n=10)
MallocTypeInfo8-4     131.7n ± 1%   130.6n ± 2%       ~ (p=0.165 n=10)
MallocTypeInfo16-4    186.5n ± 2%   186.2n ± 1%       ~ (p=0.956 n=10)
MallocLargeStruct-4   1.345µ ± 1%   1.355µ ± 1%       ~ (p=0.093 n=10)
geomean               216.9n        214.5n       -1.10%

Change-Id: Ieab6c02309614bac5c1b12b5ee3311f988ff644d
Reviewed-on: https://go-review.googlesource.com/c/go/+/531719
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: M Zhuo <mzh@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
2023-10-03 19:29:38 +00:00
Guoqi Chen 06f420fc19 runtime: remove the meaningless offset of 8 for duffzero on loong64
Currently we subtract 8 from offset when calling duffzero because 8
is added to offset in the duffzero implementation. This operation is
meaningless, so remove it.

Change-Id: I7e451d04d7e98ccafe711645d81d3aadf376766f
Reviewed-on: https://go-review.googlesource.com/c/go/+/487295
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: WANG Xuerui <git@xen0n.name>
Run-TryBot: WANG Xuerui <git@xen0n.name>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: xiaodong liu <teaofmoli@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
2023-09-01 15:48:45 +00:00
Meng Zhuo 63ab68ddc5 cmd/compile: add single-precision FMA code generation for riscv64
This CL adds FMADDS, FMSUBS, FNMADDS and FNMSUBS SSA support for riscv64.

Change-Id: I1e7dd322b46b9e0f4923dbba256303d69ed12066
Reviewed-on: https://go-review.googlesource.com/c/go/+/506616
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: M Zhuo <mzh@golangcn.org>
2023-08-22 12:05:36 +00:00
Meng Zhuo 05f9511582 cmd/compile: improve FP FMA performance on riscv64
FMADD/FMSUB/FNSUB are efficient FP FMA instructions, which can
be used by the compiler to improve FP performance.
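
A fused multiply-add rounds once instead of twice, which is visible from Go through math.FMA. The sketch below is an illustration, with fusedAndSeparate as a hypothetical helper name. It uses an explicit float64 conversion on the product because the Go spec lets an implementation fuse x*y + z on its own, and the conversion prevents that.

```go
package main

import (
	"fmt"
	"math"
)

// fusedAndSeparate computes x*y + z twice: once with a single rounding via
// math.FMA, and once with the product rounded to float64 before the add.
func fusedAndSeparate(x, y, z float64) (fused, separate float64) {
	fused = math.FMA(x, y, z)
	separate = float64(x*y) + z // the explicit conversion forces the product to be rounded
	return fused, separate
}

func main() {
	// (2^27 + 1)^2 = 2^54 + 2^28 + 1, and the trailing 1 does not fit in a
	// float64, so rounding the product first loses it.
	x := float64(1<<27 + 1)
	z := -float64(1<<54 + 1<<28)
	fused, separate := fusedAndSeparate(x, x, z)
	fmt.Println(fused, separate) // fused is 1, separate is 0
}
```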

Erf               188.0n ± 2%   139.5n ± 2%  -25.82% (p=0.000 n=10)
Erfc              193.6n ± 1%   143.2n ± 1%  -26.01% (p=0.000 n=10)
Erfinv            244.4n ± 2%   172.6n ± 0%  -29.40% (p=0.000 n=10)
Erfcinv           244.7n ± 2%   173.0n ± 1%  -29.31% (p=0.000 n=10)
geomean           216.0n        156.3n       -27.65%

Ref: The RISC-V Instruction Set Manual Volume I: Unprivileged ISA
11.6 Single-Precision Floating-Point Computational Instructions

Change-Id: I89aa3a4df7576fdd47f4a6ee608ac16feafd093c
Reviewed-on: https://go-review.googlesource.com/c/go/+/506036
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: M Zhuo <mzh@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-08-22 08:38:08 +00:00
Paul E. Murphy 41c71d48a1 cmd/compile/internal: add RLDICR opcode for PPC64
This is encoded similarly to RLDICL, but can clear the least
significant bits.

Likewise, update the auxint encoding of RLDICL to match those
used by the rotate and mask word ssa opcodes for easier usage
within lowering rules. The RLDICL ssa opcode is not used yet.
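
The difference between the two instructions can be modeled in plain Go. The sketch below follows the Power ISA description of the machine instructions (rotate left, then mask using IBM bit numbering, where bit 0 is the most significant bit); the function names are illustrative, and it does not model the SSA auxint encoding this CL changes.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rldicl models "rotate left doubleword immediate then clear left":
// rotate x left by sh, then clear the mb most significant bits.
// mb must be in the range 0..63.
func rldicl(x uint64, sh, mb uint) uint64 {
	return bits.RotateLeft64(x, int(sh)) & (^uint64(0) >> mb)
}

// rldicr models "rotate left doubleword immediate then clear right":
// rotate x left by sh, then clear the 63-me least significant bits.
// me must be in the range 0..63.
func rldicr(x uint64, sh, me uint) uint64 {
	return bits.RotateLeft64(x, int(sh)) & (^uint64(0) << (63 - me))
}

func main() {
	all := ^uint64(0)
	fmt.Printf("%#x\n", rldicl(all, 0, 8))  // top 8 bits cleared
	fmt.Printf("%#x\n", rldicr(all, 0, 55)) // low 8 bits cleared
}
```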

Change-Id: I42486dd95714a3e8e2f19ab237a6cf3af520c905
Reviewed-on: https://go-review.googlesource.com/c/go/+/515575
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-08-14 20:31:17 +00:00
Keith Randall 611706b171 cmd/compile: don't use BTS when OR works, add direct memory BTS operations
Stop using BTSconst and friends when ORLconst can be used instead.
OR can be issued by more function units than BTS can, so it could
lead to better IPC. OR might take a few more bytes to encode, but
not a lot more.

Still use BTSconst for cases where the constant otherwise wouldn't
fit and would require a separate movabs instruction to materialize
the constant. This happens when setting bits 31-63 of 64-bit targets.

Add BTS-to-memory operations so we don't need to load/bts/store.
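
A minimal sketch of the three cases described above, in Go source form. The function names are hypothetical, and the instruction actually selected depends on the compiler version; the generated assembly can be inspected with `go build -gcflags=-S`.

```go
package main

import "fmt"

// setLowBit sets bit 4. The mask fits in a 32-bit immediate, so an OR
// with a constant is sufficient.
func setLowBit(x uint64) uint64 {
	return x | 1<<4
}

// setHighBit sets bit 40. The mask does not fit in a sign-extended
// 32-bit immediate, which is the case where BTS avoids materializing
// the constant in a register first.
func setHighBit(x uint64) uint64 {
	return x | 1<<40
}

// setBitInMemory sets bit 40 of the word p points to. This is the
// read-modify-write shape that a BTS-to-memory operation can cover
// without a separate load and store.
func setBitInMemory(p *uint64) {
	*p |= 1 << 40
}

func main() {
	var w uint64
	setBitInMemory(&w)
	fmt.Printf("%#x %#x %#x\n", setLowBit(0), setHighBit(0), w)
}
```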

Fixes #61694

Change-Id: I00379608df8fb0167cb01466e97d11dec7c1596c
Reviewed-on: https://go-review.googlesource.com/c/go/+/515755
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-08-04 16:40:24 +00:00
Keith Randall 319504ce43 cmd/compile: implement float min/max in hardware for amd64 and arm64
Update #59488

Change-Id: I89f5ea494cbcc887f6fae8560e57bcbd8749be86
Reviewed-on: https://go-review.googlesource.com/c/go/+/514596
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-08-01 20:03:31 +00:00
Keith Randall 67983c0f78 cmd/compile: add indexed SET* opcodes for amd64
Update #61356

Change-Id: I391af98563b1c068208784c80ea736c78c29639d
Reviewed-on: https://go-review.googlesource.com/c/go/+/510435
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Martin Möhrmann <martin@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2023-07-26 17:19:57 +00:00
Junxian Zhu d9fd19a7f5 cmd/compile: optimize math.Float32bits and math.Float32frombits on mipsx
This CL uses MFC1/MTC1 instructions to move data directly between GPRs and FPRs, instead of moving float/int values through stores and loads.
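
For reference, the two functions being optimized reinterpret the bits of a value without converting it. The sketch below shows their behavior at the Go level only; the helper names are illustrative and nothing here is specific to mipsx.

```go
package main

import (
	"fmt"
	"math"
)

// bitsOf returns the IEEE 754 binary32 bit pattern of f.
func bitsOf(f float32) uint32 {
	return math.Float32bits(f)
}

// roundTrip converts f to its bit pattern and back; the result is
// bit-for-bit identical to the input.
func roundTrip(f float32) float32 {
	return math.Float32frombits(math.Float32bits(f))
}

func main() {
	fmt.Printf("%#x\n", bitsOf(1.0)) // 0x3f800000
	fmt.Println(roundTrip(0.5) == 0.5)
}
```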

goos: linux
goarch: mipsle
pkg: math
                      │   oldmathf   │              newmathf              │
                      │    sec/op    │   sec/op     vs base               │
Acos-4                   282.7n ± 0%   282.1n ± 0%   -0.18% (p=0.010 n=8)
Acosh-4                  450.8n ± 0%   450.9n ± 0%        ~ (p=0.699 n=8)
Asin-4                   272.6n ± 0%   272.1n ± 0%        ~ (p=0.050 n=8)
Asinh-4                  476.8n ± 0%   475.1n ± 0%   -0.35% (p=0.018 n=8)
Atan-4                   208.1n ± 0%   207.7n ± 0%   -0.17% (p=0.009 n=8)
Atanh-4                  448.8n ± 0%   448.7n ± 0%   -0.03% (p=0.014 n=8)
Atan2-4                  310.2n ± 0%   310.1n ± 0%        ~ (p=0.133 n=8)
Cbrt-4                   357.9n ± 0%   358.4n ± 0%   +0.11% (p=0.014 n=8)
Ceil-4                   203.8n ± 0%   204.7n ± 0%   +0.42% (p=0.008 n=8)
Compare-4                21.12n ± 0%   22.09n ± 0%   +4.59% (p=0.000 n=8)
Compare32-4             19.105n ± 0%   6.022n ± 0%  -68.48% (p=0.000 n=8)
Copysign-4               33.17n ± 0%   33.15n ± 0%        ~ (p=0.795 n=8)
Cos-4                    385.2n ± 0%   384.8n ± 1%        ~ (p=0.112 n=8)
Cosh-4                   546.0n ± 0%   545.0n ± 0%   -0.17% (p=0.012 n=8)
Erf-4                    192.4n ± 0%   195.4n ± 1%   +1.59% (p=0.000 n=8)
Erfc-4                   187.8n ± 0%   192.7n ± 0%   +2.64% (p=0.000 n=8)
Erfinv-4                 221.8n ± 1%   219.8n ± 0%   -0.88% (p=0.000 n=8)
Erfcinv-4                224.1n ± 1%   219.9n ± 0%   -1.87% (p=0.000 n=8)
Exp-4                    434.7n ± 0%   435.0n ± 0%        ~ (p=0.339 n=8)
ExpGo-4                  433.7n ± 0%   434.2n ± 0%   +0.13% (p=0.005 n=8)
Expm1-4                  243.0n ± 0%   242.9n ± 0%        ~ (p=0.103 n=8)
Exp2-4                   426.6n ± 0%   426.6n ± 0%        ~ (p=0.822 n=8)
Exp2Go-4                 425.6n ± 0%   425.5n ± 0%        ~ (p=0.377 n=8)
Abs-4                    8.033n ± 0%   8.029n ± 0%        ~ (p=0.065 n=8)
Dim-4                    18.07n ± 0%   18.07n ± 0%        ~ (p=0.051 n=8)
Floor-4                  151.6n ± 0%   151.6n ± 0%        ~ (p=0.450 n=8)
Max-4                    100.9n ± 8%   103.2n ± 2%        ~ (p=0.099 n=8)
Min-4                    116.4n ± 0%   116.4n ± 0%        ~ (p=0.467 n=8)
Mod-4                    959.6n ± 1%   950.9n ± 0%   -0.91% (p=0.006 n=8)
Frexp-4                  147.6n ± 0%   147.5n ± 0%   -0.07% (p=0.026 n=8)
Gamma-4                  482.7n ± 0%   478.2n ± 2%   -0.92% (p=0.000 n=8)
Hypot-4                  139.8n ± 1%   127.1n ± 8%   -9.12% (p=0.000 n=8)
HypotGo-4                137.2n ± 7%   117.5n ± 2%  -14.39% (p=0.001 n=8)
Ilogb-4                  109.5n ± 0%   108.4n ± 1%   -1.05% (p=0.001 n=8)
J0-4                     1.304µ ± 0%   1.304µ ± 0%        ~ (p=0.853 n=8)
J1-4                     1.349µ ± 0%   1.331µ ± 0%   -1.33% (p=0.000 n=8)
Jn-4                     2.774µ ± 0%   2.750µ ± 0%   -0.87% (p=0.000 n=8)
Ldexp-4                  151.6n ± 0%   151.5n ± 0%        ~ (p=0.695 n=8)
Lgamma-4                 226.9n ± 0%   233.9n ± 0%   +3.09% (p=0.000 n=8)
Log-4                    407.6n ± 0%   407.4n ± 0%        ~ (p=0.340 n=8)
Logb-4                   121.5n ± 0%   121.5n ± 0%   -0.08% (p=0.042 n=8)
Log1p-4                  315.5n ± 0%   315.6n ± 0%        ~ (p=0.930 n=8)
Log10-4                  417.8n ± 0%   417.5n ± 0%        ~ (p=0.053 n=8)
Log2-4                   208.8n ± 0%   208.8n ± 0%        ~ (p=0.582 n=8)
Modf-4                   126.5n ± 0%   126.4n ± 0%        ~ (p=0.128 n=8)
Nextafter32-4           112.45n ± 0%   82.27n ± 0%  -26.84% (p=0.000 n=8)
Nextafter64-4            141.5n ± 0%   141.5n ± 0%        ~ (p=0.569 n=8)
PowInt-4                 754.0n ± 1%   754.6n ± 0%        ~ (p=0.279 n=8)
PowFrac-4                1.608µ ± 1%   1.596µ ± 1%        ~ (p=0.661 n=8)
Pow10Pos-4               18.07n ± 0%   18.07n ± 0%        ~ (p=0.413 n=8)
Pow10Neg-4               17.08n ± 0%   18.07n ± 0%   +5.80% (p=0.000 n=8)
Round-4                  68.30n ± 0%   69.29n ± 0%   +1.45% (p=0.000 n=8)
RoundToEven-4            78.33n ± 0%   78.34n ± 0%        ~ (p=0.975 n=8)
Remainder-4              740.6n ± 1%   736.7n ± 0%        ~ (p=0.098 n=8)
Signbit-4                18.08n ± 0%   18.07n ± 0%        ~ (p=0.546 n=8)
Sin-4                    389.4n ± 0%   389.5n ± 0%        ~ (p=0.451 n=8)
Sincos-4                 415.6n ± 0%   415.6n ± 0%        ~ (p=0.450 n=8)
Sinh-4                   607.0n ± 0%   590.8n ± 1%   -2.68% (p=0.000 n=8)
SqrtIndirect-4           8.034n ± 0%   8.030n ± 0%        ~ (p=0.487 n=8)
SqrtLatency-4            8.031n ± 0%   8.034n ± 0%        ~ (p=0.152 n=8)
SqrtIndirectLatency-4    8.032n ± 0%   8.032n ± 0%        ~ (p=0.818 n=8)
SqrtGoLatency-4          895.8n ± 0%   895.3n ± 0%        ~ (p=0.553 n=8)
SqrtPrime-4              5.405µ ± 0%   5.379µ ± 0%   -0.48% (p=0.000 n=8)
Tan-4                    405.6n ± 0%   405.7n ± 0%        ~ (p=0.980 n=8)
Tanh-4                   545.1n ± 0%   545.1n ± 0%        ~ (p=0.806 n=8)
Trunc-4                  146.5n ± 0%   146.6n ± 0%        ~ (p=0.380 n=8)
Y0-4                     1.308µ ± 0%   1.306µ ± 0%        ~ (p=0.071 n=8)
Y1-4                     1.311µ ± 0%   1.315µ ± 0%   +0.31% (p=0.000 n=8)
Yn-4                     2.737µ ± 0%   2.745µ ± 0%   +0.27% (p=0.000 n=8)
Float64bits-4            14.56n ± 0%   14.56n ± 0%        ~ (p=0.689 n=8)
Float64frombits-4        19.08n ± 0%   19.08n ± 0%        ~ (p=0.580 n=8)
Float32bits-4           13.050n ± 0%   5.019n ± 0%  -61.54% (p=0.000 n=8)
Float32frombits-4       13.060n ± 0%   4.016n ± 0%  -69.25% (p=0.000 n=8)
FMA-4                    608.5n ± 0%   586.1n ± 0%   -3.67% (p=0.000 n=8)
geomean                  185.5n        176.2n        -5.02%

Change-Id: Ibf91092ffe70104e6c5ec03bc76d51259818b9b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/494535
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2023-05-24 14:43:03 +00:00
Junxian Zhu f0d575c266 cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on mips64x
This CL uses MFC1/MTC1 instructions to move data directly between GPRs and FPRs, instead of moving float/int values through stores and loads.

goos: linux
goarch: mips64le
pkg: math
                      │   oldmath    │              newmath               │
                      │    sec/op    │   sec/op     vs base               │
Acos-4                   258.2n ± 0%   258.2n ± 0%        ~ (p=0.859 n=8)
Acosh-4                  378.7n ± 0%   323.9n ± 0%  -14.47% (p=0.000 n=8)
Asin-4                   255.1n ± 2%   255.5n ± 0%   +0.16% (p=0.002 n=8)
Asinh-4                  407.1n ± 0%   348.7n ± 0%  -14.35% (p=0.000 n=8)
Atan-4                   189.5n ± 0%   189.9n ± 3%        ~ (p=0.205 n=8)
Atanh-4                  355.6n ± 0%   323.4n ± 2%   -9.03% (p=0.000 n=8)
Atan2-4                  284.1n ± 7%   280.1n ± 4%        ~ (p=0.313 n=8)
Cbrt-4                   314.3n ± 0%   236.4n ± 0%  -24.79% (p=0.000 n=8)
Ceil-4                   144.3n ± 3%   139.6n ± 0%        ~ (p=0.069 n=8)
Compare-4               21.100n ± 0%   7.035n ± 0%  -66.66% (p=0.000 n=8)
Compare32-4             20.100n ± 0%   6.030n ± 0%  -70.00% (p=0.000 n=8)
Copysign-4              34.970n ± 0%   6.221n ± 0%  -82.21% (p=0.000 n=8)
Cos-4                    183.4n ± 3%   184.1n ± 5%        ~ (p=0.159 n=8)
Cosh-4                   487.9n ± 2%   419.6n ± 0%  -14.00% (p=0.000 n=8)
Erf-4                    160.6n ± 0%   157.9n ± 0%   -1.68% (p=0.009 n=8)
Erfc-4                   183.7n ± 4%   169.8n ± 0%   -7.54% (p=0.000 n=8)
Erfinv-4                 191.5n ± 4%   183.6n ± 0%   -4.13% (p=0.023 n=8)
Erfcinv-4                192.0n ± 7%   184.3n ± 0%        ~ (p=0.425 n=8)
Exp-4                    398.2n ± 0%   340.1n ± 4%  -14.58% (p=0.000 n=8)
ExpGo-4                  383.3n ± 0%   327.3n ± 0%  -14.62% (p=0.000 n=8)
Expm1-4                  248.7n ± 5%   216.0n ± 0%  -13.11% (p=0.000 n=8)
Exp2-4                   372.8n ± 0%   316.9n ± 3%  -14.98% (p=0.000 n=8)
Exp2Go-4                 374.1n ± 0%   320.5n ± 0%  -14.33% (p=0.000 n=8)
Abs-4                    3.013n ± 0%   3.016n ± 0%   +0.10% (p=0.020 n=8)
Dim-4                    5.021n ± 0%   5.022n ± 0%        ~ (p=0.270 n=8)
Floor-4                  127.5n ± 4%   126.2n ± 3%        ~ (p=0.186 n=8)
Max-4                    72.32n ± 0%   61.33n ± 0%  -15.20% (p=0.000 n=8)
Min-4                    83.33n ± 1%   61.36n ± 0%  -26.37% (p=0.000 n=8)
Mod-4                    690.7n ± 0%   454.5n ± 0%  -34.20% (p=0.000 n=8)
Frexp-4                 116.30n ± 1%   71.80n ± 1%  -38.26% (p=0.000 n=8)
Gamma-4                  389.0n ± 0%   355.9n ± 1%   -8.48% (p=0.000 n=8)
Hypot-4                 102.40n ± 0%   83.90n ± 0%  -18.07% (p=0.000 n=8)
HypotGo-4               105.45n ± 4%   84.82n ± 2%  -19.56% (p=0.000 n=8)
Ilogb-4                  99.13n ± 4%   63.71n ± 2%  -35.73% (p=0.000 n=8)
J0-4                     859.7n ± 0%   854.8n ± 0%   -0.57% (p=0.000 n=8)
J1-4                     873.9n ± 0%   875.7n ± 0%   +0.21% (p=0.007 n=8)
Jn-4                     1.855µ ± 0%   1.867µ ± 0%   +0.65% (p=0.000 n=8)
Ldexp-4                 130.50n ± 2%   64.35n ± 0%  -50.69% (p=0.000 n=8)
Lgamma-4                 208.8n ± 0%   200.9n ± 0%   -3.78% (p=0.000 n=8)
Log-4                    294.1n ± 0%   255.2n ± 3%  -13.22% (p=0.000 n=8)
Logb-4                  105.45n ± 1%   66.81n ± 1%  -36.64% (p=0.000 n=8)
Log1p-4                  268.2n ± 0%   211.3n ± 0%  -21.21% (p=0.000 n=8)
Log10-4                  295.4n ± 0%   255.2n ± 2%  -13.59% (p=0.000 n=8)
Log2-4                   152.9n ± 1%   127.5n ± 0%  -16.61% (p=0.000 n=8)
Modf-4                  103.40n ± 0%   75.36n ± 0%  -27.12% (p=0.000 n=8)
Nextafter32-4           121.20n ± 1%   78.40n ± 0%  -35.31% (p=0.000 n=8)
Nextafter64-4           110.40n ± 1%   64.91n ± 0%  -41.20% (p=0.000 n=8)
PowInt-4                 509.8n ± 1%   369.3n ± 1%  -27.56% (p=0.000 n=8)
PowFrac-4               1189.0n ± 0%   947.8n ± 0%  -20.29% (p=0.000 n=8)
Pow10Pos-4               15.07n ± 0%   15.07n ± 0%        ~ (p=0.733 n=8)
Pow10Neg-4               20.10n ± 0%   20.10n ± 0%        ~ (p=0.576 n=8)
Round-4                  44.22n ± 0%   26.12n ± 0%  -40.92% (p=0.000 n=8)
RoundToEven-4            46.22n ± 0%   27.12n ± 0%  -41.31% (p=0.000 n=8)
Remainder-4              539.0n ± 1%   417.1n ± 1%  -22.62% (p=0.000 n=8)
Signbit-4               17.985n ± 0%   5.694n ± 0%  -68.34% (p=0.000 n=8)
Sin-4                    185.7n ± 5%   172.9n ± 0%   -6.89% (p=0.001 n=8)
Sincos-4                 176.6n ± 0%   200.9n ± 0%  +13.76% (p=0.000 n=8)
Sinh-4                   495.8n ± 0%   435.9n ± 0%  -12.09% (p=0.000 n=8)
SqrtIndirect-4           5.022n ± 0%   5.024n ± 0%        ~ (p=0.083 n=8)
SqrtLatency-4            8.038n ± 0%   8.044n ± 0%        ~ (p=0.524 n=8)
SqrtIndirectLatency-4    8.035n ± 0%   8.039n ± 0%   +0.06% (p=0.017 n=8)
SqrtGoLatency-4          340.1n ± 0%   278.3n ± 0%  -18.19% (p=0.000 n=8)
SqrtPrime-4              5.381µ ± 0%   5.386µ ± 0%        ~ (p=0.662 n=8)
Tan-4                    198.6n ± 1%   183.1n ± 0%   -7.85% (p=0.000 n=8)
Tanh-4                   491.3n ± 1%   440.8n ± 1%  -10.29% (p=0.000 n=8)
Trunc-4                  121.7n ± 0%   121.7n ± 0%        ~ (p=0.769 n=8)
Y0-4                     855.1n ± 0%   859.8n ± 0%   +0.54% (p=0.007 n=8)
Y1-4                     862.3n ± 0%   865.1n ± 0%   +0.32% (p=0.007 n=8)
Yn-4                     1.830µ ± 0%   1.837µ ± 0%   +0.36% (p=0.011 n=8)
Float64bits-4           13.060n ± 0%   3.016n ± 0%  -76.91% (p=0.000 n=8)
Float64frombits-4       13.060n ± 0%   3.018n ± 0%  -76.90% (p=0.000 n=8)
Float32bits-4           13.060n ± 0%   3.016n ± 0%  -76.91% (p=0.000 n=8)
Float32frombits-4       13.070n ± 0%   3.013n ± 0%  -76.94% (p=0.000 n=8)
FMA-4                    446.0n ± 0%   413.1n ± 1%   -7.38% (p=0.000 n=8)
geomean                  143.4n        108.3n       -24.49%

Change-Id: I2067f7a5ae1126ada7ab3fb2083710e8212535e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/493815
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
2023-05-24 03:36:31 +00:00
Junxian Zhu 75add1ce0e cmd/compile: intrinsify runtime/internal/atomic.{And,Or} on MIPS64x
This CL intrinsifies atomic{And,Or} on mips64x, which is already implemented on mipsx.

goos: linux
goarch: mips64le
pkg: runtime/internal/atomic
                │  oldatomic  │             newatomic              │
                │   sec/op    │   sec/op     vs base               │
AtomicLoad64-4    27.96n ± 0%   28.02n ± 0%   +0.20% (p=0.026 n=8)
AtomicStore64-4   29.14n ± 0%   29.21n ± 0%   +0.22% (p=0.004 n=8)
AtomicLoad-4      27.96n ± 0%   28.02n ± 0%        ~ (p=0.220 n=8)
AtomicStore-4     29.15n ± 0%   29.21n ± 0%   +0.19% (p=0.002 n=8)
And8-4            53.09n ± 0%   41.71n ± 0%  -21.44% (p=0.000 n=8)
And-4             49.87n ± 0%   39.93n ± 0%  -19.93% (p=0.000 n=8)
And8Parallel-4    70.45n ± 0%   68.58n ± 0%   -2.65% (p=0.000 n=8)
AndParallel-4     70.40n ± 0%   67.95n ± 0%   -3.47% (p=0.000 n=8)
Or8-4             52.09n ± 0%   41.11n ± 0%  -21.08% (p=0.000 n=8)
Or-4              49.80n ± 0%   39.87n ± 0%  -19.93% (p=0.000 n=8)
Or8Parallel-4     70.43n ± 0%   68.25n ± 0%   -3.08% (p=0.000 n=8)
OrParallel-4      70.42n ± 0%   67.94n ± 0%   -3.51% (p=0.000 n=8)
Xadd-4            67.83n ± 0%   67.92n ± 0%   +0.13% (p=0.003 n=8)
Xadd64-4          67.85n ± 0%   67.92n ± 0%   +0.09% (p=0.021 n=8)
Cas-4             81.34n ± 0%   81.37n ± 0%        ~ (p=0.859 n=8)
Cas64-4           81.43n ± 0%   81.53n ± 0%   +0.13% (p=0.001 n=8)
Xchg-4            67.15n ± 0%   67.18n ± 0%        ~ (p=0.367 n=8)
Xchg64-4          67.16n ± 0%   67.21n ± 0%   +0.08% (p=0.008 n=8)
geomean           54.04n        51.01n        -5.61%

Change-Id: I9a4353f4b14134f1e9cf0dcf99db3feb951328ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/494875
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Junxian Zhu <zhujunxian@oss.cipunited.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-18 10:23:17 +00:00
Junxian Zhu 5cad8d41ca math: optimize math.Abs on mipsx
This commit optimizes the math.Abs implementation on mipsx.
Tested on Loongson 3A2000.

goos: linux
goarch: mipsle
pkg: math
                      │   oldmath    │              newmath               │
                      │    sec/op    │   sec/op     vs base               │
Acos-4                   282.6n ± 0%   282.3n ± 0%        ~ (p=0.140 n=7)
Acosh-4                  506.1n ± 0%   451.8n ± 0%  -10.73% (p=0.001 n=7)
Asin-4                   272.3n ± 0%   272.2n ± 0%        ~ (p=0.808 n=7)
Asinh-4                  529.7n ± 0%   475.3n ± 0%  -10.27% (p=0.001 n=7)
Atan-4                   208.2n ± 0%   207.9n ± 0%        ~ (p=0.134 n=7)
Atanh-4                  503.4n ± 1%   449.7n ± 0%  -10.67% (p=0.001 n=7)
Atan2-4                  310.5n ± 0%   310.5n ± 0%        ~ (p=0.928 n=7)
Cbrt-4                   359.3n ± 0%   358.8n ± 0%        ~ (p=0.121 n=7)
Ceil-4                   203.9n ± 0%   204.0n ± 0%        ~ (p=0.600 n=7)
Compare-4                23.11n ± 0%   23.11n ± 0%        ~ (p=0.702 n=7)
Compare32-4              19.09n ± 0%   19.12n ± 0%        ~ (p=0.070 n=7)
Copysign-4               33.20n ± 0%   34.02n ± 0%   +2.47% (p=0.001 n=7)
Cos-4                    422.5n ± 0%   385.4n ± 1%   -8.78% (p=0.001 n=7)
Cosh-4                   628.0n ± 0%   545.5n ± 0%  -13.14% (p=0.001 n=7)
Erf-4                    193.7n ± 2%   192.7n ± 1%        ~ (p=0.430 n=7)
Erfc-4                   192.8n ± 1%   193.0n ± 0%        ~ (p=0.245 n=7)
Erfinv-4                 220.7n ± 1%   221.5n ± 2%        ~ (p=0.272 n=7)
Erfcinv-4                221.3n ± 1%   220.4n ± 2%        ~ (p=0.738 n=7)
Exp-4                    471.4n ± 0%   435.1n ± 0%   -7.70% (p=0.001 n=7)
ExpGo-4                  470.6n ± 0%   434.0n ± 0%   -7.78% (p=0.001 n=7)
Expm1-4                  243.1n ± 0%   243.4n ± 0%        ~ (p=0.417 n=7)
Exp2-4                   463.1n ± 0%   427.0n ± 0%   -7.80% (p=0.001 n=7)
Exp2Go-4                 462.4n ± 0%   426.2n ± 5%   -7.83% (p=0.001 n=7)
Abs-4                   37.000n ± 0%   8.039n ± 9%  -78.27% (p=0.001 n=7)
Dim-4                    18.09n ± 0%   18.11n ± 0%        ~ (p=0.094 n=7)
Floor-4                  151.9n ± 0%   151.8n ± 0%        ~ (p=0.190 n=7)
Max-4                    116.7n ± 1%   116.7n ± 1%        ~ (p=0.842 n=7)
Min-4                    116.6n ± 1%   116.6n ± 0%        ~ (p=0.464 n=7)
Mod-4                   1244.0n ± 0%   980.9n ± 0%  -21.15% (p=0.001 n=7)
Frexp-4                  199.0n ± 0%   146.7n ± 0%  -26.28% (p=0.001 n=7)
Gamma-4                  516.4n ± 0%   479.3n ± 1%   -7.18% (p=0.001 n=7)
Hypot-4                  169.8n ± 0%   117.8n ± 2%  -30.62% (p=0.001 n=7)
HypotGo-4                170.8n ± 0%   117.5n ± 0%  -31.21% (p=0.001 n=7)
Ilogb-4                  160.8n ± 0%   109.5n ± 0%  -31.90% (p=0.001 n=7)
J0-4                     1.359µ ± 0%   1.305µ ± 0%   -3.97% (p=0.001 n=7)
J1-4                     1.386µ ± 0%   1.334µ ± 0%   -3.75% (p=0.001 n=7)
Jn-4                     2.864µ ± 0%   2.758µ ± 0%   -3.70% (p=0.001 n=7)
Ldexp-4                  202.9n ± 0%   151.7n ± 0%  -25.23% (p=0.001 n=7)
Lgamma-4                 234.0n ± 0%   234.3n ± 0%        ~ (p=0.199 n=7)
Log-4                    444.1n ± 0%   407.9n ± 0%   -8.15% (p=0.001 n=7)
Logb-4                   157.8n ± 0%   121.6n ± 0%  -22.94% (p=0.001 n=7)
Log1p-4                  354.8n ± 0%   315.4n ± 0%  -11.10% (p=0.001 n=7)
Log10-4                  453.9n ± 0%   417.9n ± 0%   -7.93% (p=0.001 n=7)
Log2-4                   245.3n ± 0%   209.1n ± 0%  -14.76% (p=0.001 n=7)
Modf-4                   126.6n ± 0%   126.6n ± 0%        ~ (p=0.126 n=7)
Nextafter32-4            112.5n ± 0%   112.5n ± 0%        ~ (p=0.853 n=7)
Nextafter64-4            141.7n ± 0%   141.6n ± 0%        ~ (p=0.331 n=7)
PowInt-4                 878.8n ± 1%   758.3n ± 1%  -13.71% (p=0.001 n=7)
PowFrac-4                1.809µ ± 0%   1.615µ ± 0%  -10.72% (p=0.001 n=7)
Pow10Pos-4               18.10n ± 0%   18.12n ± 0%        ~ (p=0.464 n=7)
Pow10Neg-4               17.09n ± 0%   17.09n ± 0%        ~ (p=0.263 n=7)
Round-4                  68.36n ± 0%   68.33n ± 0%        ~ (p=0.325 n=7)
RoundToEven-4            78.40n ± 0%   78.40n ± 0%        ~ (p=0.934 n=7)
Remainder-4              894.0n ± 1%   753.4n ± 1%  -15.73% (p=0.001 n=7)
Signbit-4                18.09n ± 0%   18.09n ± 0%        ~ (p=0.761 n=7)
Sin-4                    389.8n ± 1%   389.8n ± 0%        ~ (p=0.995 n=7)
Sincos-4                 416.0n ± 0%   415.9n ± 0%        ~ (p=0.361 n=7)
Sinh-4                   634.6n ± 4%   585.6n ± 1%   -7.72% (p=0.001 n=7)
SqrtIndirect-4           8.035n ± 0%   8.036n ± 0%        ~ (p=0.523 n=7)
SqrtLatency-4            8.039n ± 0%   8.037n ± 0%        ~ (p=0.218 n=7)
SqrtIndirectLatency-4    8.040n ± 0%   8.040n ± 0%        ~ (p=0.652 n=7)
SqrtGoLatency-4          895.7n ± 0%   896.6n ± 0%   +0.10% (p=0.004 n=7)
SqrtPrime-4              5.406µ ± 0%   5.407µ ± 0%        ~ (p=0.592 n=7)
Tan-4                    406.1n ± 0%   405.8n ± 1%        ~ (p=0.435 n=7)
Tanh-4                   627.6n ± 0%   545.5n ± 0%  -13.08% (p=0.001 n=7)
Trunc-4                  146.7n ± 1%   146.7n ± 0%        ~ (p=0.755 n=7)
Y0-4                     1.359µ ± 0%   1.310µ ± 0%   -3.61% (p=0.001 n=7)
Y1-4                     1.351µ ± 0%   1.301µ ± 0%   -3.70% (p=0.001 n=7)
Yn-4                     2.829µ ± 0%   2.729µ ± 0%   -3.53% (p=0.001 n=7)
Float64bits-4            14.08n ± 0%   14.07n ± 0%        ~ (p=0.069 n=7)
Float64frombits-4        19.09n ± 0%   19.10n ± 0%        ~ (p=0.755 n=7)
Float32bits-4            13.06n ± 0%   13.07n ± 1%        ~ (p=0.586 n=7)
Float32frombits-4        13.06n ± 0%   13.06n ± 0%        ~ (p=0.853 n=7)
FMA-4                    606.9n ± 0%   606.8n ± 0%        ~ (p=0.393 n=7)
geomean                  201.1n        185.4n        -7.81%

Change-Id: I6d41a97ad3789ed5731588588859ac0b8b13b664
Reviewed-on: https://go-review.googlesource.com/c/go/+/484675
Reviewed-by: Rong Zhang <rongrong@oss.cipunited.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
2023-05-08 15:53:28 +00:00
Junxian Zhu 574431cfcd math: optimize math.Abs on mips64x
This commit optimizes the math.Abs implementation on mips64x.
Tested on Loongson 3A2000.

goos: linux
goarch: mips64le
pkg: math
                      │    oldmath    │               newmath               │
                      │    sec/op     │    sec/op     vs base               │
Acos-4                   258.0n ± ∞ ¹   257.1n ± ∞ ¹   -0.35% (p=0.008 n=5)
Acosh-4                  417.0n ± ∞ ¹   377.9n ± ∞ ¹   -9.38% (p=0.008 n=5)
Asin-4                   248.0n ± ∞ ¹   259.9n ± ∞ ¹   +4.80% (p=0.008 n=5)
Asinh-4                  439.6n ± ∞ ¹   408.3n ± ∞ ¹   -7.12% (p=0.008 n=5)
Atan-4                   189.6n ± ∞ ¹   188.8n ± ∞ ¹        ~ (p=0.056 n=5)
Atanh-4                  390.0n ± ∞ ¹   356.4n ± ∞ ¹   -8.62% (p=0.008 n=5)
Atan2-4                  279.0n ± ∞ ¹   263.9n ± ∞ ¹   -5.41% (p=0.008 n=5)
Cbrt-4                   314.2n ± ∞ ¹   322.3n ± ∞ ¹   +2.58% (p=0.008 n=5)
Ceil-4                   139.7n ± ∞ ¹   136.6n ± ∞ ¹   -2.22% (p=0.008 n=5)
Compare-4                21.11n ± ∞ ¹   21.09n ± ∞ ¹        ~ (p=0.405 n=5)
Compare32-4              20.10n ± ∞ ¹   20.12n ± ∞ ¹        ~ (p=0.206 n=5)
Copysign-4               32.17n ± ∞ ¹   35.71n ± ∞ ¹  +11.00% (p=0.008 n=5)
Cos-4                    222.8n ± ∞ ¹   169.8n ± ∞ ¹  -23.79% (p=0.008 n=5)
Cosh-4                   550.2n ± ∞ ¹   477.4n ± ∞ ¹  -13.23% (p=0.008 n=5)
Erf-4                    171.6n ± ∞ ¹   174.5n ± ∞ ¹        ~ (p=0.635 n=5)
Erfc-4                   182.6n ± ∞ ¹   170.2n ± ∞ ¹   -6.79% (p=0.008 n=5)
Erfinv-4                 177.6n ± ∞ ¹   196.6n ± ∞ ¹  +10.70% (p=0.008 n=5)
Erfcinv-4                177.8n ± ∞ ¹   197.8n ± ∞ ¹  +11.25% (p=0.008 n=5)
Exp-4                    422.8n ± ∞ ¹   382.1n ± ∞ ¹   -9.63% (p=0.008 n=5)
ExpGo-4                  416.1n ± ∞ ¹   383.2n ± ∞ ¹   -7.91% (p=0.008 n=5)
Expm1-4                  232.9n ± ∞ ¹   252.2n ± ∞ ¹   +8.29% (p=0.008 n=5)
Exp2-4                   404.8n ± ∞ ¹   389.1n ± ∞ ¹   -3.88% (p=0.008 n=5)
Exp2Go-4                 407.0n ± ∞ ¹   372.3n ± ∞ ¹   -8.53% (p=0.008 n=5)
Abs-4                   30.120n ± ∞ ¹   3.014n ± ∞ ¹  -89.99% (p=0.008 n=5)
Dim-4                    5.021n ± ∞ ¹   5.023n ± ∞ ¹        ~ (p=0.071 n=5)
Floor-4                  127.8n ± ∞ ¹   127.1n ± ∞ ¹   -0.55% (p=0.008 n=5)
Max-4                    77.69n ± ∞ ¹   76.33n ± ∞ ¹   -1.75% (p=0.008 n=5)
Min-4                    83.27n ± ∞ ¹   77.87n ± ∞ ¹   -6.48% (p=0.008 n=5)
Mod-4                    906.2n ± ∞ ¹   692.9n ± ∞ ¹  -23.54% (p=0.008 n=5)
Frexp-4                  150.6n ± ∞ ¹   108.6n ± ∞ ¹  -27.89% (p=0.008 n=5)
Gamma-4                  418.4n ± ∞ ¹   386.1n ± ∞ ¹   -7.72% (p=0.008 n=5)
Hypot-4                 148.20n ± ∞ ¹   93.78n ± ∞ ¹  -36.72% (p=0.008 n=5)
HypotGo-4               148.20n ± ∞ ¹   94.47n ± ∞ ¹  -36.26% (p=0.008 n=5)
Ilogb-4                 135.50n ± ∞ ¹   92.38n ± ∞ ¹  -31.82% (p=0.008 n=5)
J0-4                     937.7n ± ∞ ¹   861.7n ± ∞ ¹   -8.10% (p=0.008 n=5)
J1-4                     915.4n ± ∞ ¹   875.9n ± ∞ ¹   -4.32% (p=0.008 n=5)
Jn-4                     1.974µ ± ∞ ¹   1.863µ ± ∞ ¹   -5.62% (p=0.008 n=5)
Ldexp-4                  158.5n ± ∞ ¹   129.3n ± ∞ ¹  -18.42% (p=0.008 n=5)
Lgamma-4                 209.0n ± ∞ ¹   211.8n ± ∞ ¹        ~ (p=0.095 n=5)
Log-4                    326.4n ± ∞ ¹   295.2n ± ∞ ¹   -9.56% (p=0.008 n=5)
Logb-4                   147.7n ± ∞ ¹   105.0n ± ∞ ¹  -28.91% (p=0.008 n=5)
Log1p-4                  303.4n ± ∞ ¹   266.3n ± ∞ ¹  -12.23% (p=0.008 n=5)
Log10-4                  329.2n ± ∞ ¹   298.3n ± ∞ ¹   -9.39% (p=0.008 n=5)
Log2-4                   187.4n ± ∞ ¹   153.0n ± ∞ ¹  -18.36% (p=0.008 n=5)
Modf-4                   110.5n ± ∞ ¹   103.5n ± ∞ ¹   -6.33% (p=0.008 n=5)
Nextafter32-4            128.4n ± ∞ ¹   121.5n ± ∞ ¹   -5.37% (p=0.016 n=5)
Nextafter64-4            109.5n ± ∞ ¹   110.5n ± ∞ ¹   +0.91% (p=0.008 n=5)
PowInt-4                 603.3n ± ∞ ¹   516.4n ± ∞ ¹  -14.40% (p=0.008 n=5)
PowFrac-4                1.365µ ± ∞ ¹   1.183µ ± ∞ ¹  -13.33% (p=0.008 n=5)
Pow10Pos-4               15.07n ± ∞ ¹   15.07n ± ∞ ¹        ~ (p=0.738 n=5)
Pow10Neg-4               21.11n ± ∞ ¹   21.10n ± ∞ ¹        ~ (p=0.190 n=5)
Round-4                  44.23n ± ∞ ¹   44.22n ± ∞ ¹        ~ (p=0.635 n=5)
RoundToEven-4            50.25n ± ∞ ¹   46.27n ± ∞ ¹   -7.92% (p=0.008 n=5)
Remainder-4              675.6n ± ∞ ¹   530.4n ± ∞ ¹  -21.49% (p=0.008 n=5)
Signbit-4                17.07n ± ∞ ¹   17.95n ± ∞ ¹   +5.16% (p=0.008 n=5)
Sin-4                    171.6n ± ∞ ¹   189.1n ± ∞ ¹  +10.20% (p=0.008 n=5)
Sincos-4                 201.5n ± ∞ ¹   200.5n ± ∞ ¹        ~ (p=0.421 n=5)
Sinh-4                   529.6n ± ∞ ¹   484.6n ± ∞ ¹   -8.50% (p=0.008 n=5)
SqrtIndirect-4           5.021n ± ∞ ¹   5.023n ± ∞ ¹   +0.04% (p=0.048 n=5)
SqrtLatency-4            8.032n ± ∞ ¹   8.039n ± ∞ ¹   +0.09% (p=0.024 n=5)
SqrtIndirectLatency-4    8.036n ± ∞ ¹   8.038n ± ∞ ¹        ~ (p=0.056 n=5)
SqrtGoLatency-4          338.8n ± ∞ ¹   338.7n ± ∞ ¹        ~ (p=0.841 n=5)
SqrtPrime-4              5.379µ ± ∞ ¹   5.382µ ± ∞ ¹   +0.06% (p=0.048 n=5)
Tan-4                    182.7n ± ∞ ¹   191.8n ± ∞ ¹   +4.98% (p=0.008 n=5)
Tanh-4                   558.7n ± ∞ ¹   497.6n ± ∞ ¹  -10.94% (p=0.008 n=5)
Trunc-4                  122.5n ± ∞ ¹   122.6n ± ∞ ¹        ~ (p=0.405 n=5)
Y0-4                     892.8n ± ∞ ¹   851.7n ± ∞ ¹   -4.60% (p=0.008 n=5)
Y1-4                     887.2n ± ∞ ¹   863.2n ± ∞ ¹   -2.71% (p=0.008 n=5)
Yn-4                     1.889µ ± ∞ ¹   1.832µ ± ∞ ¹   -3.02% (p=0.008 n=5)
Float64bits-4            13.05n ± ∞ ¹   13.06n ± ∞ ¹   +0.08% (p=0.040 n=5)
Float64frombits-4        13.05n ± ∞ ¹   13.06n ± ∞ ¹        ~ (p=0.143 n=5)
Float32bits-4            13.05n ± ∞ ¹   13.06n ± ∞ ¹   +0.08% (p=0.008 n=5)
Float32frombits-4        13.05n ± ∞ ¹   13.08n ± ∞ ¹   +0.23% (p=0.016 n=5)
FMA-4                    445.7n ± ∞ ¹   448.1n ± ∞ ¹   +0.54% (p=0.008 n=5)
geomean                  157.2n         142.8n         -9.17%

Change-Id: I9bf104848b588c9ecf79401a81d483d7fcdb0a79
Reviewed-on: https://go-review.googlesource.com/c/go/+/481575
Reviewed-by: M Zhuo <mzh@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Than McIntosh <thanm@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Rong Zhang <rongrong@oss.cipunited.com>
2023-05-05 14:54:39 +00:00
Keith Randall cedf5008a8 cmd/compile: introduce separate memory op combining pass
Memory op combining is currently done using arch-specific rewrite rules.
Instead, do it as an arch-independent rewrite pass. This ensures that
all architectures (with unaligned loads & stores) get equal treatment.

This removes a lot of rewrite rules.

The new pass is a bit more comprehensive. It handles things like out-of-order
writes and is careful not to apply partial optimizations that then block
further optimizations.
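
The kind of source pattern such a pass targets can be sketched as follows. The function names are hypothetical, and whether the byte accesses are actually merged into a single 32-bit load or store depends on the compiler version and on the target supporting unaligned accesses.

```go
package main

import "fmt"

// load32le assembles a little-endian uint32 from four byte loads.
// A combining pass may turn these into one 32-bit load.
func load32le(b []byte) uint32 {
	_ = b[3] // one bounds check up front instead of four
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

// store32le writes the four bytes of v in little-endian order, with
// the stores deliberately issued out of order in the source.
func store32le(b []byte, v uint32) {
	_ = b[3]
	b[3] = byte(v >> 24)
	b[1] = byte(v >> 8)
	b[0] = byte(v)
	b[2] = byte(v >> 16)
}

func main() {
	buf := make([]byte, 4)
	store32le(buf, 0x12345678)
	fmt.Printf("% x %#x\n", buf, load32le(buf))
}
```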

Change-Id: I780ff3bb052475cd725a923309616882d25b8d9e
Reviewed-on: https://go-review.googlesource.com/c/go/+/478475
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2023-04-21 21:05:46 +00:00
Wayne Zuo 96428e160d cmd/compile: split DIVV/DIVVU op on loong64
Previously, we needed to calculate both the quotient and the remainder together.
However, in most cases, only one result is needed. By separating these
instructions, we can save one instruction in most cases.
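
At the source level, the cases in question look like the sketch below; the function names are illustrative. Only the last function needs both results, so the first two are where a separate quotient or remainder operation avoids computing the unused value.

```go
package main

import "fmt"

// quotient needs only the result of the division.
func quotient(a, b int64) int64 {
	return a / b
}

// remainder needs only the remainder; in Go it takes the sign of the
// dividend because integer division truncates toward zero.
func remainder(a, b int64) int64 {
	return a % b
}

// divmod is the less common case where both results are used.
func divmod(a, b int64) (int64, int64) {
	return a / b, a % b
}

func main() {
	fmt.Println(quotient(7, 2), remainder(7, 2)) // 3 1
	fmt.Println(divmod(-7, 2))                   // -3 -1
}
```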

Change-Id: I0a2d4167cda68ab606783ba1aa2720ede19d6b53
Reviewed-on: https://go-review.googlesource.com/c/go/+/475315
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2023-04-11 01:59:02 +00:00
erifan01 42f99b203d cmd/compile: optimize cmp to cmn under conditions < and >= on arm64
Under the right conditions we can optimize cmp comparisons to cmn
comparisons, such as:
func foo(a, b int) int {
  var c int
  if a + b < 0 {
    c = 1
  }
  return c
}

Previously it was compiled as:
  ADD     R1, R0, R1
  CMP     $0, R1
  CSET    LT, R0
With this CL it's compiled as:
  CMN     R1, R0
  CSET    MI, R0
Here we need to pay attention to the overflow situation of a+b. The MI
condition means N==1: it ignores the overflow flag V and depends only
on the sign of the result. That matches the semantics of the Go code,
where a+b wraps on overflow, so the rewrite is correct.

Similarly, this CL also optimizes the case of >= comparison
using the PL conditional flag.
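
The sketch below restates both comparisons in runnable form and checks the overflow case discussed above. The function names and the maxInt constant are introduced here for illustration only. Go defines signed integer overflow as wrapping, so the sign of the wrapped sum is what the comparison observes.

```go
package main

import "fmt"

// maxInt is the largest value of type int on the current platform.
const maxInt = int(^uint(0) >> 1)

// sumIsNegative mirrors the foo example: a+b < 0.
func sumIsNegative(a, b int) int {
	c := 0
	if a+b < 0 {
		c = 1
	}
	return c
}

// sumIsNonNegative is the >= counterpart: a+b >= 0.
func sumIsNonNegative(a, b int) int {
	c := 0
	if a+b >= 0 {
		c = 1
	}
	return c
}

func main() {
	// maxInt+1 wraps to the most negative int, so the sum is negative.
	fmt.Println(sumIsNegative(maxInt, 1), sumIsNonNegative(maxInt, 1))
	fmt.Println(sumIsNegative(1, 2), sumIsNonNegative(1, 2))
}
```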

Change-Id: I47179faba5b30cca84ea69bafa2ad5241bf6dfba
Reviewed-on: https://go-review.googlesource.com/c/go/+/476116
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-03-24 01:19:09 +00:00
Keith Randall 3360be4a11 cmd/compile: fix extraneous diff in generated files
Looks like CL 475735 contained a not-quite-up-to-date version
of the generated file. Maybe ABSFL was in an earlier version of the CL
and was removed before checkin without regenerating the generated file?

In any case, update the generated file. Shouldn't cause a problem, as
that field isn't used in x86/ssa.go.

Change-Id: I3f0b7d41081ba3ce2cdcae385fea16b37d7de81b
Reviewed-on: https://go-review.googlesource.com/c/go/+/477096
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-03-17 04:44:40 +00:00
Wayne Zuo cedfcba3e8 cmd/compile: intrinsify TrailingZeros{8,32,64} for 386
This CL adds support for intrinsifying the TrailingZeros{8,32,64}
functions for the 386 architecture. We need to handle the case when the
input is 0, which could lead to undefined output from the BSFL instruction.
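
To make the zero-input requirement concrete, here is a minimal sketch that compares math/bits against a portable reference that handles zero explicitly. The reference function and the agrees32 helper are hypothetical and only illustrate the contract: for a zero argument, TrailingZeros32 must return the operand width, 32.

```go
package main

import (
	"fmt"
	"math/bits"
)

// trailingZeros32Ref is a portable reference implementation. The explicit
// zero check is the case a hardware bit-scan instruction may leave undefined.
func trailingZeros32Ref(x uint32) int {
	if x == 0 {
		return 32
	}
	n := 0
	for x&1 == 0 {
		x >>= 1
		n++
	}
	return n
}

// agrees32 reports whether math/bits matches the reference for x.
func agrees32(x uint32) bool {
	return bits.TrailingZeros32(x) == trailingZeros32Ref(x)
}

func main() {
	for _, x := range []uint32{0, 1, 8, 0x80000000} {
		fmt.Println(x, bits.TrailingZeros32(x), agrees32(x))
	}
}
```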

Next CL will remove the assembly code in runtime/internal/sys package.

Change-Id: Ic168edf68e81bf69a536102100fdd3f56f0f4a1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/475735
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-03-14 08:10:32 +00:00
Wayne Zuo 14015be5bb cmd/compile: optimize multiplication on loong64
Previously, multiplication on loong64 architecture was performed using
MULV and MULHVU instructions to calculate the low 64-bit and high
64 bits of a multiplication, respectively. However, in most cases, only
the low 64 bits are needed. This commit enables computing only the low
64-bit result with the MULV instruction.
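
At the source level the two cases look like the sketch below, where the function names are hypothetical. An ordinary x*y needs only the low 64 bits, while bits.Mul64 is the kind of operation that also needs the high half.

```go
package main

import (
	"fmt"
	"math/bits"
)

// low64 is an ordinary Go multiplication: only the low 64 bits of the
// product are kept.
func low64(x, y uint64) uint64 { return x * y }

// full128 returns the full 128-bit product as (hi, lo).
func full128(x, y uint64) (hi, lo uint64) { return bits.Mul64(x, y) }

func main() {
	x, y := uint64(1)<<32, uint64(1)<<32
	fmt.Println(low64(x, y))
	fmt.Println(full128(x, y))
}
```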

This reduces the binary size slightly.

file      before    after     Δ       %
addr2line 2833777   2833849   +72     +0.003%
asm       5267499   5266963   -536    -0.010%
buildid   2579706   2579402   -304    -0.012%
cgo       4798260   4797444   -816    -0.017%
compile   25247419  25175030  -72389  -0.287%
cover     4973091   4972027   -1064   -0.021%
dist      3631013   3565653   -65360  -1.800%
doc       4076036   4074004   -2032   -0.050%
fix       3496378   3496066   -312    -0.009%
link      6984102   6983214   -888    -0.013%
nm        2743820   2743516   -304    -0.011%
objdump   4277171   4277035   -136    -0.003%
pack      2379248   2378872   -376    -0.016%
pprof     14419090  14419874  +784    +0.005%
test2json 2684386   2684018   -368    -0.014%
trace     13640018  13631034  -8984   -0.066%
vet       7748918   7752630   +3712   +0.048%
go        15643850  15638098  -5752   -0.037%
total     127423782 127268729 -155053 -0.122%

Change-Id: Ifce4a9a3ed1d03c170681e39cb6f3541db9882dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/472775
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
2023-03-03 01:33:00 +00:00
Keith Randall 21d82e6ac8 cmd/compile: batch write barrier calls
Have the write barrier call return a pointer to a buffer into which
the generated code records pointers that need write barrier treatment.

Change-Id: I7871764298e0aa1513de417010c8d46b296b199e
Reviewed-on: https://go-review.googlesource.com/c/go/+/447781
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Bypass: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-02-24 00:21:13 +00:00
Keith Randall 44d22e75dd cmd/compile: detect write barrier completion differently
Instead of keeping track of in which blocks write barriers complete,
introduce a new op that marks the exact memory state where the
write barrier completes.

For future use. This allows us to move some of the write barrier code
to between the start of the merging block and the WBend marker.

Change-Id: If3809b260292667d91bf0ee18d7b4d0eb1e929f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/447777
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2023-02-16 00:16:13 +00:00
Paul E. Murphy f9da938614 cmd/compile: remove unused ISELB PPC64 ssa opcode
The usage of ISELB has been removed as part of changes
made to support Power10 SETBC instructions.

Change-Id: I2fce4370f48c1eeee65d411dfd1bea4201f45b45
Reviewed-on: https://go-review.googlesource.com/c/go/+/465575
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2023-02-07 17:17:35 +00:00
Archana R a432d89137 cmd/compile: add rules to emit SETBC/R instructions on power10
This CL adds rules that replace instances of ISEL that produce
a boolean result based on a condition register with SETBC/SETBCR
operations. On Power10 these are converted to SETBC/SETBCR
instructions that use one register instead of the 3 registers
conventionally used by ISEL, and hence reduce register pressure.
On loops written specifically to exercise such instances of ISEL
extensively, a performance improvement of 2.5% is seen on Power10.
Also added verification tests to verify correct generation of
SETBC/SETBCR instructions on Power10.

Change-Id: Ib719897f09d893de40324440a43052dca026e8fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/449795
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-06 12:49:53 +00:00
Archana R cd1fc87156 cmd/compile: intrinsify math/bits/ReverseBytes{16|32|64} for ppc64/power10
This change intrinsifies ReverseBytes{16|32|64} by generating the
corresponding new instructions in Power10: brh, brd and brw and
adds a verification test for the same.
On Power 9 and 8, the .go code performs optimally as it is.
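
For reference, a minimal sketch of the three functions being intrinsified. The wrapper names are hypothetical; the calls themselves are the standard math/bits API.

```go
package main

import (
	"fmt"
	"math/bits"
)

// swap16, swap32 and swap64 wrap the math/bits byte-reversal functions.
func swap16(x uint16) uint16 { return bits.ReverseBytes16(x) }
func swap32(x uint32) uint32 { return bits.ReverseBytes32(x) }
func swap64(x uint64) uint64 { return bits.ReverseBytes64(x) }

func main() {
	fmt.Printf("%#x\n", swap16(0x1122))
	fmt.Printf("%#x\n", swap32(0x11223344))
	fmt.Printf("%#x\n", swap64(0x1122334455667788))
}
```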

Performance improvement seen on Power10:
ReverseBytes32  1.38ns ± 0%  1.18ns ± 0%  -14.2%
ReverseBytes64  1.52ns ± 0%  1.11ns ± 0%  -26.87%
ReverseBytes16  1.41ns ± 1%  1.18ns ± 0%  -16.47%

Change-Id: I88f127f3ab9ba24a772becc21ad90acfba324b37
Reviewed-on: https://go-review.googlesource.com/c/go/+/446675
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-02-03 19:01:06 +00:00
Jorropo 5c67ebbb31 cmd/compile: AMD64v3 remove unnecessary TEST comparison in isPowerOfTwo
With GOAMD64=v3 the canonical isPowerOfTwo function:
  func isPowerOfTwo(x uintptr) bool {
    return x&(x-1) == 0
  }

Used to compile to:
  temp := BLSR(x) // x&(x-1)
  flags = TEST(temp, temp)
  return flags.zf

However, the BLSR instruction already sets ZF according to the result,
so we can remove the TEST instruction if we are just checking ZF.
This pattern appears in multiple pieces of code around memory allocation.

This makes the code smaller and faster.
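
A runnable sketch of the pattern follows. The alignUp helper is hypothetical and only illustrates the kind of allocation-adjacent code that relies on this check. Note that the canonical form also reports true for zero.

```go
package main

import "fmt"

// isPowerOfTwo is the canonical form from the commit message. It clears
// the lowest set bit and checks whether anything remains, so it also
// returns true for x == 0.
func isPowerOfTwo(x uintptr) bool {
	return x&(x-1) == 0
}

// alignUp rounds n up to a multiple of a, which must be a power of two.
// This helper is an illustrative assumption, not code from the CL.
func alignUp(n, a uintptr) uintptr {
	if !isPowerOfTwo(a) {
		panic("alignment must be a power of two")
	}
	return (n + a - 1) &^ (a - 1)
}

func main() {
	fmt.Println(isPowerOfTwo(8), isPowerOfTwo(12), isPowerOfTwo(0))
	fmt.Println(alignUp(13, 8), alignUp(16, 8))
}
```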

Change-Id: Ia12d5a73aa3cb49188c0b647b1eff7b56c5a7b58
Reviewed-on: https://go-review.googlesource.com/c/go/+/448255
Run-TryBot: Jakub Ciolek <jakub@ciolek.dev>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-01-20 04:58:59 +00:00
Keith Randall 12befc3ce3 cmd/compile: improve scheduling pass
Convert the scheduling pass from scheduling backwards to scheduling forwards.

Forward scheduling makes it easier to prioritize scheduling values as
soon as they are ready, which is important for things like nil checks,
select ops, etc.

Forward scheduling is also quite a bit clearer. It was originally
backwards because computing uses is tricky, but I found a way to do it
simply and with n lg n complexity. The new scheme also makes it easy
to add new scheduling edges if needed.
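
The general idea of forward scheduling with a ready queue can be sketched as below. This is a generic list-scheduling illustration under simplified assumptions, not the compiler's implementation: the value type, its prio field, and scheduleForward are all hypothetical. Uses are computed in one pass over the arguments, and a heap picks the highest-priority ready value, giving n log n behavior overall.

```go
package main

import (
	"container/heap"
	"fmt"
)

// value is a hypothetical stand-in for an SSA value: an ID, a priority
// (lower is scheduled earlier), and the IDs of the values it depends on.
type value struct {
	id, prio int
	args     []int
}

// readyQueue is a min-heap of values whose arguments are all scheduled.
type readyQueue []*value

func (q readyQueue) Len() int { return len(q) }
func (q readyQueue) Less(i, j int) bool {
	if q[i].prio != q[j].prio {
		return q[i].prio < q[j].prio
	}
	return q[i].id < q[j].id
}
func (q readyQueue) Swap(i, j int) { q[i], q[j] = q[j], q[i] }
func (q *readyQueue) Push(x any)   { *q = append(*q, x.(*value)) }
func (q *readyQueue) Pop() any {
	old := *q
	v := old[len(old)-1]
	*q = old[:len(old)-1]
	return v
}

// scheduleForward returns value IDs in an order where every value comes
// after all of its arguments, preferring lower prio among ready values.
func scheduleForward(vals []*value) []int {
	uses := map[int][]*value{} // argument ID -> values that use it
	pending := map[int]int{}   // value ID -> arguments not yet scheduled
	for _, v := range vals {
		pending[v.id] = len(v.args)
		for _, a := range v.args {
			uses[a] = append(uses[a], v)
		}
	}
	q := &readyQueue{}
	for _, v := range vals {
		if pending[v.id] == 0 {
			heap.Push(q, v)
		}
	}
	var order []int
	for q.Len() > 0 {
		v := heap.Pop(q).(*value)
		order = append(order, v.id)
		for _, u := range uses[v.id] {
			pending[u.id]--
			if pending[u.id] == 0 {
				heap.Push(q, u)
			}
		}
	}
	return order
}

func main() {
	vals := []*value{
		{id: 1, prio: 5},
		{id: 2, prio: 5, args: []int{1}},
		{id: 3, prio: 0, args: []int{1}}, // e.g. a nil check: schedule as soon as ready
		{id: 4, prio: 5, args: []int{2, 3}},
	}
	fmt.Println(scheduleForward(vals))
}
```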

Fixes #42673
Update #56568

Change-Id: Ibbb38c52d191f50ce7a94f8c1cbd3cd9b614ea8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/270940
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2023-01-20 04:54:01 +00:00
Keith Randall 45dc81d856 cmd/compile: add memory argument to GetCallerSP
We need to make sure that when we get the stack pointer, we get it
at the right time.

V = GetCallerSP
Call()
W = GetCallerSP

If Call causes a stack growth, then we will be in a situation
where V != W. So it matters when GetCallerSP operations get scheduled.
Add a memory argument to GetCallerSP so it can't be reordered with
things like calls.

Change-Id: I6cc801134c38e358c5a1ec0c09d38379a16a4184
Reviewed-on: https://go-review.googlesource.com/c/go/+/453515
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <martin@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-01-19 22:43:22 +00:00
Keith Randall f959fb3872 cmd/compile: add anchored version of SP
The SPanchored opcode is identical to SP, except that it takes a memory
argument so that it (and more importantly, anything that uses it)
must be scheduled at or after that memory argument.

This opcode ensures that a LEAQ of a variable gets scheduled after the
corresponding VARDEF for that variable.

This may lead to less CSE of LEAQ operations. The effect is very small.
The go binary is only 80 bytes bigger after this CL. Usually LEAQs get
folded into load/store operations, so the effect is limited to variables
of pointerful types that are large enough to need a duffzero and have
their address passed somewhere. Even then, usually the CSEd LEAQs will
be un-CSEd because the two uses are on different sides of a function
call and the LEAQ ends up being rematerialized at the second use anyway.

Change-Id: Ib893562cd05369b91dd563b48fb83f5250950293
Reviewed-on: https://go-review.googlesource.com/c/go/+/452916
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Martin Möhrmann <martin@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2023-01-19 22:43:12 +00:00
Dmitri Shuralyov 47a0d46716 cmd/compile/internal/ssa: generate code via a //go:generate directive
The standard way to generate code in a Go package is via //go:generate
directives, which are invoked by the developer explicitly running:

	go generate import/path/of/said/package

Switch to using that approach here.

This way, developers don't need to learn and remember a custom way that
each particular Go package may choose to implement its code generation.
It also enables conveniences such as 'go generate -n' to discover how
code is generated without running anything (this works on all packages
that rely on //go:generate directives), being able to generate multiple
packages at once and from any directory, and so on.

Change-Id: I0e5b6a1edeff670a8e588befeef0c445613803c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/460135
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2023-01-19 22:42:34 +00:00
Keith Randall 5f7abeca5a cmd/compile: teach regalloc about temporary registers
Temporary registers are sometimes needed for an architecture backend
which needs to use several machine instructions to implement a single
SSA instruction.

Mark such instructions so that regalloc can reserve the temporary register
for them. That way we don't have to reserve a fixed register like we do now.

Convert the temp-register-using instructions on amd64 to use this
new mechanism. Other archs can follow as needed.

Change-Id: I1d0c8588afdad5cd18b4398eb5a0f755be5dead7
Reviewed-on: https://go-review.googlesource.com/c/go/+/398556
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2022-11-17 18:53:13 +00:00