Commit Graph

104 Commits

Author SHA1 Message Date
fanzha02 fa04d488bd cmd/asm: fix the issue of moving 128-bit integers to vector registers on arm64
The CL 249758 added `FMOVQ $vcon, Vd` instruction and assembler used
128-bit simd literal-loading to load `$vcon` from pool into 128-bit vector
register `Vd`. Because Go does not have 128-bit integers for now, the
assembler will report an error of `immediate out of range` when
assembleing `FMOVQ $0x123456789abcdef0123456789abcdef, V0` instruction.

This patch lets 128-bit integers take two 64-bit operands, for the high
and low parts separately and adds `VMOVQ $hi, $lo, Vd` instruction to
move `$hi<<64+$lo' into 128-bit register `Vd`.

In addition, this patch renames `FMOVQ/FMOVD/FMOVS` ops to 'VMOVQ/VMOVD/VMOVS'
and uses them to move 128-bit, 64-bit and 32-bit constants into vector
registers, respectively

Update the go doc.

Fixes #40725

Change-Id: Ia3c83bb6463f104d2bee960905053a97299e0a3a
Reviewed-on: https://go-review.googlesource.com/c/go/+/255900
Trust: fannie zhang <Fannie.Zhang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-09-25 01:47:40 +00:00
Joel Sing a3d8c210ad cmd/asm,cmd/internal/obj/riscv: provide branch pseudo-instructions
Implement various branch pseudo-instructions for riscv64. These make it easier
to read/write assembly and will also make it easier for the compiler to generate
optimised code.

Change-Id: Ic31a7748c0e1495522ebecf34b440842b8d12c04
Reviewed-on: https://go-review.googlesource.com/c/go/+/226397
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-03-31 13:13:58 +00:00
Joel Sing 10635921e5 cmd/asm,cmd/internal/obj/riscv: add atomic memory operation instructions
Use instructions in place of currently used defines.

Updates #36765

Change-Id: I00bb59e77b1aace549d7857cc9721ba2cb4ac6ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/220541
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-15 08:13:28 +00:00
Joel Sing 85a8526a7e cmd/asm,cmd/internal/obj/riscv: add LR/SC instructions
Add support for Load-Reserved (LR) and Store-Conditional (SC) instructions.

Use instructions in place of currently used defines.

Updates #36765

Change-Id: I77e660639802293ece40cfde4865ac237e3308d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/220540
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-15 07:51:41 +00:00
Meng Zhuo 68fea523fd cmd/asm: add MIPS MSA LD/ST/LDI support for mips64x
This CL adding primitive asm support of MIPS MSA by introducing
new sets of register W0-W31 (C_WREG) and 12 new instructions:

* VMOV{B,H,W,D} ADDCONST, WREG  (Vector load immediate)
* VMOV{B,H,W,D} SOREG, WREG     (Vector load)
* VMOV{B,H,W,D} WREG, SOREG     (Vector store)

Ref: MIPS Architecture for Programmers Volume IV-j: The MIPS64 SIMD Architecture Module

Change-Id: I3362c59a73c82c94769c18a19a0bee7e5029217d
Reviewed-on: https://go-review.googlesource.com/c/go/+/215723
Run-TryBot: Meng Zhuo <mengzhuo1203@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-04 19:06:44 +00:00
Michael Munday 86235ec2bf cmd/asm/internal/arch: delete unused s390x functions
These functions are not necessary and are not called anywhere.

Change-Id: I1c0d814ba3044c27e3626ac9e6052d8154140404
Reviewed-on: https://go-review.googlesource.com/c/go/+/201697
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-10-17 14:55:25 +00:00
Brad Fitzpatrick 03ef105dae all: remove nacl (part 3, more amd64p32)
Part 1: CL 199499 (GOOS nacl)
Part 2: CL 200077 (amd64p32 files, toolchain)
Part 3: stuff that arguably should've been part of Part 2, but I forgot
        one of my grep patterns when splitting the original CL up into
        two parts.

This one might also have interesting stuff to resurrect for any future
x32 ABI support.

Updates #30439

Change-Id: I2b4143374a253a003666f3c69e776b7e456bdb9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200318
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-10-10 22:38:38 +00:00
Michael Munday 38c4a73706 cmd/asm: add s390x branch-on-count instructions
The branch-on-count instructions on s390x decrement the input
register and then compare its value to 0. If not equal the branch
is taken.

These instructions are useful for implementing loops with a set
number of iterations (which might be in a register).

For example, this for loop:

	for i := 0; i < n; i++ {
		... // i is not used or modified in the loop
	}

Could be implemented using this assembly:

	MOVD  Rn, Ri
loop:
	...
	BRCTG Ri, loop

Note that i will count down from n in the assembly whereas in the
original for loop it counted up to n which is why we can't use i
in the loop.

These instructions will only be used in hand-written codegen and
assembly for now since SSA blocks cannot currently modify values.
We could look into this in the future though.

Change-Id: Iaab93b8aa2699513b825439b8ea20d8fe2ea1ee6
Reviewed-on: https://go-review.googlesource.com/c/go/+/199977
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-09 15:04:59 +00:00
Michael Munday 8c99e45ef9 cmd/asm: add masked branch and conditional load instructions to s390x
The branch-relative-on-condition (BRC) instruction allows us to use
an immediate to specify under what conditions the branch is taken.
For example, `BRC $7, L1` is equivalent to `BNE L1`. It is sometimes
useful to specify branches in this way when either we don't have
an extended mnemonic for a particular mask value or we want to
generate the condition code mask programmatically.

The new load-on-condition (LOCR and LOCGR) and compare-and-branch
(CRJ, CGRJ, CLRJ, CLGRJ, CIJ, CGIJ, CLIJ and CLGIJ) instructions
provide the same flexibility for conditional loads and combined
compare and branch instructions.

Change-Id: Ic6f5d399b0157e278b39bd3645f4ee0f4df8e5fc
Reviewed-on: https://go-review.googlesource.com/c/go/+/196558
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-09-25 22:24:41 +00:00
Lynn Boger 7987238d9c cmd/asm,cmd/compile: clean up isel codegen on ppc64x
This cleans up the isel code generation in ssa for ppc64x.

Current there is no isel op and the isel code is only
generated from pseudo ops in ppc64/ssa.go, and only using
operands with values 0 or 1. When the isel is generated,
there is always a load of 1 into the temp register before it.

This change implements the isel op so it can be used in PPC64.rules,
and can recognize operand values other than 0 or 1. This also
eliminates the forced load of 1, so it will be loaded only if
needed.

This will make the isel code generation consistent with other ops,
and allow future rule changes that can take advantage of having
a more general purpose isel rule.

Change-Id: I363e1dbd3f7f5dfecb53187ad51cce409a8d1f8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/195057
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <cseo@linux.vnet.ibm.com>
2019-09-18 15:54:32 +00:00
Joel Sing 6c6ad3086e cmd/asm,cmd/internal/obj: initial support for riscv64 assembler
Provide the initial framework for the riscv64 assembler. For now this
only supports raw WORD instructions, but at least allows for basic
testing. Additional functionality will be added in separate changes.

Based on the riscv-go port.

Updates #27532

Change-Id: I181ffb2d37a34764a3e91eded177d13a89c69f9a
Reviewed-on: https://go-review.googlesource.com/c/go/+/194117
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-09-12 18:54:38 +00:00
Joel Sing 112a72a020 cmd/asm/internal/arch: consolidate LinkArch handling
Rather than manually setting the LinkArch for each case, pass the correct
*obj.LinkArch to the arch* function, as is already done for archX86().

Change-Id: I4cf950780aa30a1385e785fb1d26edacb99bda79
Reviewed-on: https://go-review.googlesource.com/c/go/+/193818
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-09-07 09:43:04 +00:00
fanzha02 9c67516ed6 cmd/internal/obj/arm64: add support for most system registers
This patch supports the EL0 and EL1 system registers used in MRS/MSR
instructions. This patch refactors the assembler code, allowing the
assembler to read system register information from the automatically
generated sysRegEnc.go file and move existing declared system registers
to the sysRegEnc.go file.

This patch adds 431 system registers, it is worth noting that the number
of special registers is initialized to less than 1024 in the list7.go file.

This CL also adds some test cases to test the newly added system registers.

The test cases are contributed by Dianhong Xu <Dianhong.Xu@arm.com>

Change-Id: Ic09a937eaaeefe82bd08b5dd726808f8ff6cebf6
Reviewed-on: https://go-review.googlesource.com/c/go/+/189577
Reviewed-by: Ben Shi <powerman1st@163.com>
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-08-28 02:28:49 +00:00
smileeye 607493bed6 cmd/asm/internal/arch: improve the comment of function IsMIPSMUL
The check of MADD&MSUB was added to the function IsMIPSMUL in
a previous commit, and the comments should also be updated.

Change-Id: I2d3da055d55b459b908714c542dff99ab5c6cf99
Reviewed-on: https://go-review.googlesource.com/c/go/+/171102
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-04-10 14:39:34 +00:00
Ben Shi a189467cda cmd/internal/obj/mips: add MADD/MSUB
This CL implements MADD&MSUB, which are mips32r2 instructions.

Change-Id: I06fe51573569baf3b71536336b34b95ccd24750b
Reviewed-on: https://go-review.googlesource.com/c/go/+/167680
Run-TryBot: Ben Shi <powerman1st@163.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-23 01:39:16 +00:00
erifan01 448089854a cmd/asm: add arm64 v8.1 atomic instructions
This change adds several arm64 v8.1 atomic instructions and test cases.
They are LDADDAx, LDADDLx, LDANDAx, LDANDALx, LDANDLx, LDEORAx, LDEORALx,
LDEORLx, LDORAx, LDORALx, LDORLx, SWPAx and SWPLx. Their form is consistent
with the form of the existing atomic instructions.

For instructions STXRx, STLXRx, STXPx and STLXPx, the second destination
register can't be RSP. This CL also adds a check for this.

LDADDx Rs, (Rb), Rt: *Rb -> Rt, Rs + *Rb -> *Rb
LDANDx Rs, (Rb), Rt: *Rb -> Rt, Rs AND NOT(*Rb) -> *Rb
LDEORx Rs, (Rb), Rt: *Rb -> Rt, Rs EOR *Rb -> *Rb
LDORx  Rs, (Rb), Rt: *Rb -> Rt, Rs OR *Rb -> *Rb

Change-Id: I9f9b0245958cb57ab7d88c66fb9159b23b9017fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/157001
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-06 18:52:03 +00:00
Cherry Zhang 35c0554293 cmd/asm: rename R18 to R18_PLATFORM on ARM64
In ARM64 ABI, R18 is the "platform register", the use of which is
OS specific. The OS could choose to reserve this register. In
practice, it seems fine to use R18 on Linux but not on darwin (iOS).

Rename R18 to R18_PLATFORM to prevent accidental use. There is no
R18 usage within the standard library (besides tests, which are
updated).

Fixes #26110

Change-Id: Icef7b9549e2049db1df307a0180a3c90a12d7a84
Reviewed-on: https://go-review.googlesource.com/c/147218
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-11-06 20:10:12 +00:00
Ben Shi 1018a80fe8 cmd/internal/obj/arm64: support more atomic instructions
LDADDALD(64-bit) and LDADDALW(32-bit) are already supported.
This CL adds supports of LDADDALH(16-bit) and LDADDALB(8-bit).

Change-Id: I4eac61adcec226d618dfce88618a2b98f5f1afe7
Reviewed-on: https://go-review.googlesource.com/132135
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-04 20:29:33 +00:00
Ben Shi 26d62b4ca9 cmd/internal/obj/arm64: add SWPALD/SWPALW/SWPALH/SWPALB
Those new instructions have acquire/release semantics, besides
normal atomic SWPD/SWPW/SWPH/SWPB.

Change-Id: I24821a4d21aebc342897ae52903aef612c8d8a4a
Reviewed-on: https://go-review.googlesource.com/128476
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-08-20 14:09:51 +00:00
Mario Arranz 6d0f757bb9 cmd/asm/internal/arch: add package definition
The package arch didn't have a definition as you can see in https://tip.golang.org/pkg/cmd/asm/internal/arch/

Change-Id: I07653b396393a75c445d04dbae5e22e90a0d5133
GitHub-Last-Rev: a859e9410f
GitHub-Pull-Request: golang/go#26817
Reviewed-on: https://go-review.googlesource.com/127929
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-08-06 21:16:18 +00:00
Wei Xiao 0a7ac93c27 cmd/compile: improve atomic add intrinsics with ARMv8.1 new instruction
ARMv8.1 has added new instruction (LDADDAL) for atomic memory operations. This
CL improves existing atomic add intrinsics with the new instruction. Since the
new instruction is only guaranteed to be present after ARMv8.1, we guard its
usage with a conditional on CPU feature.

Performance result on ARMv8.1 machine:
name        old time/op  new time/op  delta
Xadd-224    1.05µs ± 6%  0.02µs ± 4%  -98.06%  (p=0.000 n=10+8)
Xadd64-224  1.05µs ± 3%  0.02µs ±13%  -98.10%  (p=0.000 n=9+10)
[Geo mean]  1.05µs       0.02µs       -98.08%

Performance result on ARMv8.0 machine:
name        old time/op  new time/op  delta
Xadd-46      538ns ± 1%   541ns ± 1%  +0.62%  (p=0.000 n=9+9)
Xadd64-46    505ns ± 1%   508ns ± 0%  +0.48%  (p=0.003 n=9+8)
[Geo mean]   521ns        524ns       +0.55%

Change-Id: If4b5d8d0e2d6f84fe1492a4f5de0789910ad0ee9
Reviewed-on: https://go-review.googlesource.com/81877
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-06-21 14:52:43 +00:00
Richard Musiol 3b137dd2df cmd/compile: add wasm architecture
This commit adds the wasm architecture to the compile command.
A later commit will contain the corresponding linker changes.

Design doc: https://docs.google.com/document/d/131vjr4DH6JFnb-blm_uRdaC0_Nv3OUwjEY5qVCxCup4

The following files are generated:
- src/cmd/compile/internal/ssa/opGen.go
- src/cmd/compile/internal/ssa/rewriteWasm.go
- src/cmd/internal/obj/wasm/anames.go

Updates #18892

Change-Id: Ifb4a96a3e427aac2362a1c97967d5667450fba3b
Reviewed-on: https://go-review.googlesource.com/103295
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-04 17:56:12 +00:00
Ben Shi fc48dcb15f cmd/internal/obj/arm64: add more atomic instructions
More atomic instructions were introduced in ARMv8.1. And this CL
adds support for them and corresponding test cases.

LDADD Rs, (Rb), Rt: (Rb) -> Rt, Rs+(Rb) -> (Rb)
LDAND Rs, (Rb), Rt: (Rb) -> Rt, Rs&(Rb) -> (Rb)
LDEOR Rs, (Rb), Rt: (Rb) -> Rt, Rs^(Rb) -> (Rb)
LDOR  Rs, (Rb), Rt: (Rb) -> Rt, Rs|(Rb) -> (Rb)

Change-Id: Ifb9df86583c4dc54fb96274852c3b93a197045e4
Reviewed-on: https://go-review.googlesource.com/110535
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-05-03 20:27:05 +00:00
Balaram Makam c789ce3f75 cmd/asm: add vector instructions for ChaCha20Poly1305 on ARM64
This change provides VZIP1, VZIP2, VTBL instruction for supporting
ChaCha20Poly1305 implementation later.

Change-Id: Ife7c87b8ab1a6495a444478eeb9d906ae4c5ffa9
Reviewed-on: https://go-review.googlesource.com/110015
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-30 20:07:37 +00:00
fanzha02 236c567ba9 cmd/internal/obj/arm64: refactor the extended/shifted register encoding to the backend
The current code encodes the register and the shift/extension into a.Offset
field and this is done in the frontend. The CL refactors it to have the
frontend record the register/shift/extension information in a.Reg or a.Index
and leave the encoding stuff for the backend.

Change-Id: I600f456aec95377b7b79cd58e94afcb30aca5d19
Reviewed-on: https://go-review.googlesource.com/106815
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-18 19:19:57 +00:00
Ben Shi e1040d7955 cmd/internal/obj/arm64: support SWPD/SWPW/SWPH/SWPB
SWPD/SWPW/SWPH/SWPB were introduced in ARMv8.1. They swap content
of register and memory atomically. And their difference is
SWPD: 64-bit double word data
SWPW: 32-bit word data (zero extended to 64-bit)
SWPH: 16-bit half word data (zero extended to 64-bit)
SWPB: 8-bit byte data (zero extended to 64-bit)

This CL implements them in the arm64 assembler.

Change-Id: I2d9fb2310674bd92693531210e187143e7eed602
Reviewed-on: https://go-review.googlesource.com/101516
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-11 16:06:01 +00:00
fanzha02 31700b83b5 cmd/internal/obj/arm64: add assembler support for load with register offset
The patch adds support for LDR(register offset) instruction.
And add the test cases and negative tests.

Change-Id: I5b32c6a5065afc4571116d4896f7ebec3c0416d3
Reviewed-on: https://go-review.googlesource.com/87955
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-09 18:20:41 +00:00
Tobias Klauser 58e3f2ac87 cmd/asm/internal/arch: unexport ParseARM64Suffix
ParseARM64Suffix is not used outside cmd/asm/internal/arch.

Change-Id: I8e7782dce11cf8cd2fd08dd17e555ced8d87ba24
Reviewed-on: https://go-review.googlesource.com/105115
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-06 14:09:04 +00:00
Daniel Martí 284c53498f cmd: some semi-automated cleanups
* Remove some redundant returns
* Replace HasPrefix with TrimPrefix
* Remove some obviously dead code

Passes toolstash -cmp on std cmd.

Change-Id: Ifb0d70a45cbb8a8553758a8c4878598b7fe932bc
Reviewed-on: https://go-review.googlesource.com/105017
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-06 13:59:29 +00:00
Fangming.Fang ef9bdd11e8 cmd/asm: add essential instructions for AES-GCM on ARM64
This change adds VLD1, VST1, VPMULL{2}, VEXT, VRBIT, VUSHR and VSHL instructions
for supporting AES-GCM implementation later.

Fixes #24400

Change-Id: I556feb88067f195cbe25629ec2b7a817acc58709
Reviewed-on: https://go-review.googlesource.com/101095
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-03 15:36:31 +00:00
Yuval Pavel Zholkover 0a5be12f5c cmd/internal/obj/arm: add DMB instruction
Change-Id: Ib67a61d5b37af210ff15d60d72bd5238b9c2d0ca
Reviewed-on: https://go-review.googlesource.com/94815
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-27 19:54:44 +00:00
Balaram Makam b46d398887 runtime: improve arm64 memclr implementation
Improve runtime memclr_arm64.s using ZVA feature to zero out memory when n
is at least 64 bytes.

Also add DCZID_EL0 system register to use in MRS instruction.

    Benchmark results of runtime/Memclr on Amberwing:
name          old time/op    new time/op    delta
Memclr/5        12.7ns ± 0%    12.7ns ± 0%      ~     (all equal)
Memclr/16       12.7ns ± 0%    12.2ns ± 1%    -4.13%  (p=0.000 n=7+8)
Memclr/64       14.0ns ± 0%    14.6ns ± 1%    +4.29%  (p=0.000 n=7+8)
Memclr/256      23.7ns ± 0%    25.7ns ± 0%    +8.44%  (p=0.000 n=8+7)
Memclr/4096      204ns ± 0%      74ns ± 0%   -63.71%  (p=0.000 n=8+8)
Memclr/65536    2.89µs ± 0%    0.84µs ± 0%   -70.91%  (p=0.000 n=8+8)
Memclr/1M       45.9µs ± 0%    17.0µs ± 0%   -62.88%  (p=0.000 n=8+8)
Memclr/4M        184µs ± 0%      77µs ± 4%   -57.94%  (p=0.001 n=6+8)
Memclr/8M        367µs ± 0%     144µs ± 1%   -60.72%  (p=0.000 n=7+8)
Memclr/16M       734µs ± 0%     293µs ± 1%   -60.09%  (p=0.000 n=8+8)
Memclr/64M      2.94ms ± 0%    1.23ms ± 0%   -58.06%  (p=0.000 n=7+8)
GoMemclr/5      8.00ns ± 0%    8.79ns ± 0%    +9.83%  (p=0.000 n=8+8)
GoMemclr/16     8.00ns ± 0%    7.60ns ± 0%    -5.00%  (p=0.000 n=8+8)
GoMemclr/64     10.8ns ± 0%    10.4ns ± 0%    -3.70%  (p=0.000 n=8+8)
GoMemclr/256    20.4ns ± 0%    21.2ns ± 0%    +3.92%  (p=0.000 n=8+8)

name          old speed      new speed      delta
Memclr/5       394MB/s ± 0%   393MB/s ± 0%    -0.28%  (p=0.006 n=8+8)
Memclr/16     1.26GB/s ± 0%  1.31GB/s ± 1%    +4.07%  (p=0.000 n=7+8)
Memclr/64     4.57GB/s ± 0%  4.39GB/s ± 2%    -3.91%  (p=0.000 n=7+8)
Memclr/256    10.8GB/s ± 0%  10.0GB/s ± 0%    -7.95%  (p=0.001 n=7+6)
Memclr/4096   20.1GB/s ± 0%  55.3GB/s ± 0%  +175.46%  (p=0.000 n=8+8)
Memclr/65536  22.6GB/s ± 0%  77.8GB/s ± 0%  +243.63%  (p=0.000 n=7+8)
Memclr/1M     22.8GB/s ± 0%  61.5GB/s ± 0%  +169.38%  (p=0.000 n=8+8)
Memclr/4M     22.8GB/s ± 0%  54.3GB/s ± 4%  +137.85%  (p=0.001 n=6+8)
Memclr/8M     22.8GB/s ± 0%  58.1GB/s ± 1%  +154.56%  (p=0.000 n=7+8)
Memclr/16M    22.8GB/s ± 0%  57.2GB/s ± 1%  +150.54%  (p=0.000 n=8+8)
Memclr/64M    22.8GB/s ± 0%  54.4GB/s ± 0%  +138.42%  (p=0.000 n=7+8)
GoMemclr/5     625MB/s ± 0%   569MB/s ± 0%    -8.90%  (p=0.000 n=7+8)
GoMemclr/16   2.00GB/s ± 0%  2.10GB/s ± 0%    +5.26%  (p=0.000 n=8+8)
GoMemclr/64   5.92GB/s ± 0%  6.15GB/s ± 0%    +3.83%  (p=0.000 n=7+8)
GoMemclr/256  12.5GB/s ± 0%  12.1GB/s ± 0%    -3.77%  (p=0.000 n=8+7)

    Benchmark results of runtime/Memclr on Amberwing without ZVA:
name          old time/op    new time/op    delta
Memclr/5        12.7ns ± 0%    12.8ns ± 0%   +0.79%  (p=0.008 n=5+5)
Memclr/16       12.7ns ± 0%    12.7ns ± 0%     ~     (p=0.444 n=5+5)
Memclr/64       14.0ns ± 0%    14.4ns ± 0%   +2.86%  (p=0.008 n=5+5)
Memclr/256      23.7ns ± 1%    19.2ns ± 0%  -19.06%  (p=0.008 n=5+5)
Memclr/4096      203ns ± 0%     119ns ± 0%  -41.38%  (p=0.008 n=5+5)
Memclr/65536    2.89µs ± 0%    1.66µs ± 0%  -42.76%  (p=0.008 n=5+5)
Memclr/1M       45.9µs ± 0%    26.2µs ± 0%  -42.82%  (p=0.008 n=5+5)
Memclr/4M        184µs ± 0%     105µs ± 0%  -42.81%  (p=0.008 n=5+5)
Memclr/8M        367µs ± 0%     210µs ± 0%  -42.76%  (p=0.008 n=5+5)
Memclr/16M       734µs ± 0%     420µs ± 0%  -42.74%  (p=0.008 n=5+5)
Memclr/64M      2.94ms ± 0%    1.69ms ± 0%  -42.46%  (p=0.008 n=5+5)
GoMemclr/5      8.00ns ± 0%    8.40ns ± 0%   +5.00%  (p=0.008 n=5+5)
GoMemclr/16     8.00ns ± 0%    8.40ns ± 0%   +5.00%  (p=0.008 n=5+5)
GoMemclr/64     10.8ns ± 0%     9.6ns ± 0%  -11.02%  (p=0.008 n=5+5)
GoMemclr/256    20.4ns ± 0%    17.2ns ± 0%  -15.69%  (p=0.008 n=5+5)

name          old speed      new speed      delta
Memclr/5       393MB/s ± 0%   391MB/s ± 0%   -0.64%  (p=0.008 n=5+5)
Memclr/16     1.26GB/s ± 0%  1.26GB/s ± 0%   -0.55%  (p=0.008 n=5+5)
Memclr/64     4.57GB/s ± 0%  4.44GB/s ± 0%   -2.79%  (p=0.008 n=5+5)
Memclr/256    10.8GB/s ± 0%  13.3GB/s ± 0%  +23.07%  (p=0.016 n=4+5)
Memclr/4096   20.1GB/s ± 0%  34.3GB/s ± 0%  +70.91%  (p=0.008 n=5+5)
Memclr/65536  22.7GB/s ± 0%  39.6GB/s ± 0%  +74.65%  (p=0.008 n=5+5)
Memclr/1M     22.8GB/s ± 0%  40.0GB/s ± 0%  +74.88%  (p=0.008 n=5+5)
Memclr/4M     22.8GB/s ± 0%  39.9GB/s ± 0%  +74.84%  (p=0.008 n=5+5)
Memclr/8M     22.9GB/s ± 0%  39.9GB/s ± 0%  +74.71%  (p=0.008 n=5+5)
Memclr/16M    22.9GB/s ± 0%  39.9GB/s ± 0%  +74.64%  (p=0.008 n=5+5)
Memclr/64M    22.8GB/s ± 0%  39.7GB/s ± 0%  +73.79%  (p=0.008 n=5+5)
GoMemclr/5     625MB/s ± 0%   595MB/s ± 0%   -4.77%  (p=0.000 n=4+5)
GoMemclr/16   2.00GB/s ± 0%  1.90GB/s ± 0%   -4.77%  (p=0.008 n=5+5)
GoMemclr/64   5.92GB/s ± 0%  6.66GB/s ± 0%  +12.48%  (p=0.016 n=4+5)
GoMemclr/256  12.5GB/s ± 0%  14.9GB/s ± 0%  +18.95%  (p=0.008 n=5+5)

Fixes #22948

Change-Id: Iaae4e22391e25b54d299821bb7f8a81ac3986b93
Reviewed-on: https://go-review.googlesource.com/82055
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-14 18:20:40 +00:00
fanzha02 fdf5aaf555 cmd/asm: fix ARM64 vector register arrangement encoding bug
The current code assigns vector register arrangement a wrong value
when the arrangement specifier is S2, which causes the incorrect
assembly.

The patch fixes the issue and adds the test cases.

Fixes #24249

Change-Id: I9736df1279494003d0b178da1af9cee9cd85ce21
Reviewed-on: https://go-review.googlesource.com/98555
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-12 15:05:53 +00:00
erifan01 8c3c8332cd cmd/asm: enable several arm64 load & store instructions
Instructions LDARB, LDARH, LDAXPW, LDAXP, STLRB, STLRH, STLXP, STLXPW, STXP,
STXPW have been added before, but they are not enabled. This CL enabled them.

Change the form of LDXP and LDXPW to the form of LDP, and fix a bug of STLXP.

Change-Id: I5d2b51494b92451bf6b072c65cfdd8acf07e9b54
Reviewed-on: https://go-review.googlesource.com/96215
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-28 23:46:21 +00:00
Kunpei Sakai 0c471dfae2 cmd: avoid unnecessary type conversions
CL generated mechanically with github.com/mdempsky/unconvert.

Also updated cmd/compile/internal/ssa/gen/*.rules manually.

Change-Id: If721ef73cf0771ae83ce7e2d11623fc8d9155768
Reviewed-on: https://go-review.googlesource.com/97075
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-26 20:22:06 +00:00
erifan01 f5de42001d cmd/asm: add arm64 instructions for math optimization
Add arm64 HW instructions FMADDD, FMADDS, FMSUBD, FMSUBS, FNMADDD, FNMADDS,
FNMSUBD, FNMSUBS, VFMLA, VFMLS, VMOV (element) for math optimization.

Add check on register element index and test cases.

Change-Id: Ice07c50b1a02d488ad2cde2a4e8aea93f3e3afff
Reviewed-on: https://go-review.googlesource.com/90876
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-22 03:55:53 +00:00
Shawn Smith d3beea8c52 all: fix misspellings
GitHub-Last-Rev: 468df242d0
GitHub-Pull-Request: golang/go#23935
Change-Id: If751ce3ffa3a4d5e00a3138211383d12cb6b23fc
Reviewed-on: https://go-review.googlesource.com/95577
Run-TryBot: Andrew Bonventre <andybons@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Bonventre <andybons@golang.org>
2018-02-20 21:02:58 +00:00
fanzha02 ebd4950e3b cmd/asm: add PRFM instruction on ARM64
The current assembler cannot handle PRFM(immediate) instruciton.
The fix creates a prfopfield struct that contains the eight
prefetch operations and the value to use in instruction. And add
the test cases.

Fixes #22932

Change-Id: I621d611bd930ef3c42306a4372447c46d53b2ccf
Reviewed-on: https://go-review.googlesource.com/81675
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-14 17:09:52 +00:00
fanzha02 a0222ec518 cmd/internal/obj/arm64: fix assemble add/adds/sub/subs/cmp/cmn(extended register) bug
The current code encodes the wrong option value in the binary.

The fix reconstructs the function opxrrr() that does not encode the option
value into the binary value when arguments is sign or zero-extended register.

Add the relevant test cases and negative tests.

Fixes #23501
Change-Id: Ie5850ead2ad08d9a235a5664869aac5051762f1f
Reviewed-on: https://go-review.googlesource.com/88876
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-06 00:25:23 +00:00
Ben Shi 1ac8846984 cmd/internal/obj/arm: add BFC/BFI to arm's assembler
BFC (Bit Field Clear) and BFI (Bit Field Insert) were
introduced in ARMv6T2, and the compiler can use them
to do further optimization.

Change-Id: I5a3fbcd2c2400c9bf4b939da6366c854c744c27f
Reviewed-on: https://go-review.googlesource.com/72891
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-11-03 14:06:21 +00:00
Wei Xiao 531e6c06c4 cmd/asm: refine Go assembly for ARM64
Some ARM64-specific instructions (such as SIMD instructions) are not supported.
This patch adds support for the following:
1. Extended register, e.g.:
     ADD	Rm.<ext>[<<amount], Rn, Rd
     <ext> can have the following values:
       UXTB, UXTH, UXTW, UXTX, SXTB, SXTH, SXTW and SXTX
2. Arrangement for SIMD instructions, e.g.:
     VADDP	Vm.<T>, Vn.<T>, Vd.<T>
     <T> can have the following values:
       B8, B16, H4, H8, S2, S4 and D2
3. Width specifier and element index for SIMD instructions, e.g.:
     VMOV	Vn.<T>[index], Rd // MOV(to general register)
     <T> can have the following values:
       S and D
4. Register List, e.g.:
     VLD1	(Rn), [Vt1.<T>, Vt2.<T>, Vt3.<T>]
5. Register offset variant, e.g.:
     VLD1.P	(Rn)(Rm), [Vt1.<T>, Vt2.<T>] // Rm is the post-index register
6. Go assembly for ARM64 reference manual
     new added instructions are required to have according explanation items in
     the manual and items for existed instructions will be added incrementally

For more information about the refinement background, please refer to the
discussion (https://groups.google.com/forum/#!topic/golang-dev/rWgDxCrL4GU)

This patch only adds syntax and doesn't break any assembly that already exists.

Change-Id: I34e90b7faae032820593a0e417022c354a882008
Reviewed-on: https://go-review.googlesource.com/41654
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-10-13 13:41:19 +00:00
isharipo 8c67f210a1 cmd/internal/obj: change Prog.From3 to RestArgs ([]Addr)
This change makes it easier to express instructions
with arbitrary number of operands.

Rationale: previous approach with operand "hiding" does
not scale well, AVX and especially AVX512 have many
instructions with 3+ operands.

x86 asm backend is updated to handle up to 6 explicit operands.
It also fixes issue with 4-th immediate operand type checks.
All `ytab` tables are updated accordingly.

Changes to non-x86 backends only include these patterns:
`p.From3 = X` => `p.SetFrom3(X)`
`p.From3.X = Y` => `p.GetFrom3().X = Y`

Over time, other backends can adapt Prog.RestArgs
and reduce the amount of workarounds.

-- Performance --

x/benchmark/build:

$ benchstat upstream.bench patched.bench
name      old time/op                 new time/op                 delta
Build-48                  21.7s ± 2%                  21.8s ± 2%   ~     (p=0.218 n=10+10)

name      old binary-size             new binary-size             delta
Build-48                  10.3M ± 0%                  10.3M ± 0%   ~     (all equal)

name      old build-time/op           new build-time/op           delta
Build-48                  21.7s ± 2%                  21.8s ± 2%   ~     (p=0.218 n=10+10)

name      old build-peak-RSS-bytes    new build-peak-RSS-bytes    delta
Build-48                  145MB ± 5%                  148MB ± 5%   ~     (p=0.218 n=10+10)

name      old build-user+sys-time/op  new build-user+sys-time/op  delta
Build-48                  21.0s ± 2%                  21.2s ± 2%   ~     (p=0.075 n=10+10)

Microbenchmark shows a slight slowdown.

name        old time/op  new time/op  delta
AMD64asm-4  49.5ms ± 1%  49.9ms ± 1%  +0.67%  (p=0.001 n=23+15)

func BenchmarkAMD64asm(b *testing.B) {
  for i := 0; i < b.N; i++ {
    TestAMD64EndToEnd(nil)
    TestAMD64Encoder(nil)
  }
}

Change-Id: I4f1d37b5c2c966da3f2127705ccac9bff0038183
Reviewed-on: https://go-review.googlesource.com/63490
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-09-15 21:05:03 +00:00
fanzha02 8f1e2a2610 cmd/internal/obj/arm64: fix assemble fcmp/fcmpe bug
The current code treats floating-point constant as integer
and does not treat fcmp/fcmpe as the comparison instrucitons
that requires special handling.

The fix corrects the type of immediate arguments and adds fcmp/fcmpe
in the special handing.

Uncomment the fcmp/fcmpe cases.

Fixes #21567
Change-Id: I6782520e2770f6ce70270b667dd5e68f71e2d5ad
Reviewed-on: https://go-review.googlesource.com/57852
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-08-23 15:59:34 +00:00
Ben Shi 9bf521b2b4 cmd/internal/obj/arm: support BFX/BFXU instructions
BFX extracts given bits from the source register, sign extends them
to 32-bit, and writes to destination register. BFXU does the similar
operation with zero extention.

They were introduced in ARMv6T2.

Change-Id: I6822ebf663497a87a662d3645eddd7c611de2b1e
Reviewed-on: https://go-review.googlesource.com/56071
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-08-21 16:29:59 +00:00
fanzha02 6e8b10397b cmd/internal/obj/arm64: fix assemble stxr/stxrw/stxrb/stxrh bug
The stxr/stxrw/stxrb/stxrh instructions belong to STLXR-like instructions
set and they require special handling. The current code has no special
handling for those instructions.

The fix adds the special handling for those instructions.

Uncomment stxr/stxrw/stxrb/stxrh test cases.

Fixes #21397
Change-Id: I31cee29dd6b30b1c25badd5c7574dda7a01bf016
Reviewed-on: https://go-review.googlesource.com/54951
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-08-15 14:05:29 +00:00
Michael Munday db6f3bbc9a cmd: fix the order that s390x operands are printed in
The assembler reordered the operands of some instructions to put the
first operand into From3. Unfortunately this meant that when the
instructions were printed the operands were in a different order than
the assembler would expect as input. For example, 'MVC $8, (R1), (R2)'
would be printed as 'MVC (R1), $8, (R2)'.

Originally this was done to ensure that From contained the source
memory operand. The current compiler no longer requires this and so
this CL simply makes all instructions use the standard order for
operands: From, Reg, From3 and finally To.

Fixes #18295

Change-Id: Ib2b5ec29c647ca7a995eb03dc78f82d99618b092
Reviewed-on: https://go-review.googlesource.com/40299
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-25 15:16:56 +00:00
Josh Bleecher Snyder 63c1aff60b cmd/internal/obj: eagerly initialize assemblers
CL 38662 changed the x86 assembler to be eagerly
initialized, for a concurrent backend.

This CL puts in place a proper mechanism for doing so,
and switches all architectures to use it.

Passes toolstash-check -all.

Updates #15756

Change-Id: Id2aa527d3a8259c95797d63a2f0d1123e3ca2a1c
Reviewed-on: https://go-review.googlesource.com/39917
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-07 16:57:03 +00:00
Dave Cheney bfd8093c96 cmd/asm/internal/arch: use generic obj.Rconv function everywhere
Rather than using arm64.Rconv directly in the archArm64 constructor
use the generic obj.Rconv helper. This removes the only use of
arm64.Rconv outside the arm64 package itself.

Change-Id: I99e9e7156b52cd26dc134f610f764ec794264e2c
Reviewed-on: https://go-review.googlesource.com/38756
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-03-31 04:29:58 +00:00
Ben Shi c5ddc558ba cmd/internal/obj/arm: support more ARMv5/ARMv6/ARMv7 instructions
REV/REV16/REVSH were introduced in ARMv6, they offered more efficient
byte reverse operatons.

MMUL/MMULA/MMULS were introduced in ARMv6, they simplified
a serial of mul->shift->add/sub operations into a single instruction.

RBIT was introduced in ARMv7, it inversed a 32-bit word's bit order.

MULS was introduced in ARMv7, it corresponded to MULA.

MULBB/MULABB were introduced in ARMv5TE, they performed 16-bit
multiplication (and accumulation).

Change-Id: I6365b17b3c4eaf382a657c210bb0094b423b11b8
Reviewed-on: https://go-review.googlesource.com/35565
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-30 18:19:04 +00:00
Josh Bleecher Snyder 3b251e603d cmd/internal/obj: eagerly initialize x86 assembler
Prior to this CL, instinit was called as needed.
This does not work well in a concurrent backend.
Initialization is very cheap; do it on startup instead.

Passes toolstash-check -all.
No compiler performance impact.

Updates #15756

Change-Id: Ifa5e82e8abf4504435e1b28766f5703a0555f42d
Reviewed-on: https://go-review.googlesource.com/38662
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-25 23:20:25 +00:00