Commit Graph

172 Commits

Author SHA1 Message Date
eric fang 5daefc5363 cmd/internal/obj/arm64: fix the wrong ROR operator of some instructions
Instructions such as ADD, SUB, CMP do not support ROR shift operations,
but we have not checked this at present. This CL adds this check.

Change-Id: Icac461f61ad6ddb60886a59ba34dddd29df1cc0f
Reviewed-on: https://go-review.googlesource.com/c/go/+/310035
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: eric fang <eric.fang@arm.com>
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-04-22 01:40:02 +00:00
eric fang 48b7432e3f cmd/internal/obj/arm64: fix the wrong sp dst register of ADDS/SUBS instructions
According the armv8-a specification, the destination register of the ADDS/ADDSW/
SUBS/SUBSW instructions can not be RSP, the current implementation does not
check this and encodes this wrong instruction format as a CMN instruction. This
CL adds a check and test cases for this situation.

Change-Id: I92cc2f8e17dbda70f0dce8fddf1ca6d5d7730589
Reviewed-on: https://go-review.googlesource.com/c/go/+/309989
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: eric fang <eric.fang@arm.com>
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-04-15 01:54:41 +00:00
eric fang d25476ebb2 cmd/internal/obj/arm64: fix constant pool size calculation error
The current calculation method of constant pool size is:
  c.pool.size = -c.pool.size & (funcAlign - 1)
  c.pool.size += uint32(sz)
This doesn't make sense. This CL changes it as:
  if q.As == ADWORD {
    c.pool.size = roundUp(c.pool.size, 8)
  }
  c.pool.size += uint32(sz)
which takes into account the padding size generated by aligning DWORD to
8 bytes.

It's unnecessary to set the Pc field in addpool and addpool128 because
the Pc value will be reset in function span7, so remove the related lines.

Change-Id: I5eb8f259be55a6b97fc2c20958b4a602bffa4f88
Reviewed-on: https://go-review.googlesource.com/c/go/+/298609
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: eric fang <eric.fang@arm.com>
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-03-23 01:21:24 +00:00
fanzha02 9136d958ab cmd/asm: complete the support for VDUP on arm64
"VMOV Vn.<T>[index], Vn" is equivalent to "VDUP Vn.<T>[index], Vn", and
the latter has a higher priority in the disassembler than the former.
But the assembler doesn't support to encode this combination of VDUP,
this leads to an inconsistency between assembler and disassembler.

For example, if we assemble "VMOV V20.S[0], V20" to hex then decode it,
we'll get "VDUP V20.S[0], V20".

  VMOV V20.S[0], V20 -> 9406045e -> VDUP V20.S[0], V20 -> error

But we cannot assemble this VDUP again.

Similar reason for "VDUP Rn, Vd.<T>". This CL completes the support for
VDUP.

This patch is a copy of CL 276092. Co-authored-by: JunchenLi
<junchen.li@arm.com>

Change-Id: I8f8d86cf1911d5b16bb40d189f1dc34b24416aaf
Reviewed-on: https://go-review.googlesource.com/c/go/+/302929
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-19 01:38:59 +00:00
fanzha02 a607408403 cmd/internal/obj/arm64: add support for op(extended register) with RSP arguments
Refer to ARM reference manual, like add(extended register) instructions,
the extension is encoded in the "option" field. If "Rd" or "Rn" is
RSP and "option" is "010" then LSL is preferred. Therefore, the instrution
"add Rm<<imm, RSP, RSP" or "add Rm<<imm RSP" is valid and can be encoded
as add(extended register) instruction.

But the current assembler can not handle like "op R1<<1, RSP, RSP"
instructions, this patch adds the support.

Because MVN(extended register) does not exist, remove it.

Add test cases.

Change-Id: I968749d75c6b93a4f297b39c73cc292e6b1035ad
Reviewed-on: https://go-review.googlesource.com/c/go/+/284900
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-12 01:47:01 +00:00
eric fang 775f11cda1 cmd/internal/obj/arm64: remove unncessary class check in addpool
The argument class check in addpool is unnecessary, remove it so that we don't
need to list all the compatiable classes.

Change-Id: I36f6594db35e25db22fe898273e024c2db4cb771
Reviewed-on: https://go-review.googlesource.com/c/go/+/283492
Trust: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-04 01:37:43 +00:00
eric fang 355c3a037e cmd/internal/obj/asm64: add support for moving BITCON to RSP
Constant of BITCON type can be moved into RSP by MOVD or MOVW instructions
directly, this CL enables this format of these two instructions.

For 32-bit ADDWop instructions with constant, rewrite the high 32-bit
to be a repetition of the low 32-bit, just as ANDWop instructions do,
so that we can optimize ADDW $bitcon, Rn, Rt as:
MOVW $bitcon, Rtmp
ADDW Rtmp, Rn, Rt
The original code is:
MOVZ $bitcon_low, Rtmp
MOVK $bitcon_high,Rtmp
ADDW Rtmp, Rn, Rt

Change-Id: I30e71972bcfd6470a8b6e6ffbacaee79d523805a
Reviewed-on: https://go-review.googlesource.com/c/go/+/289649
Trust: eric fang <eric.fang@arm.com>
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-04 01:28:21 +00:00
eric fang 726d704c32 cmd/asm: add arm64 instructions VUMAX and VUMIN
This CL adds support for arm64 fp&simd instructions VUMAX and VUMIN.
Fixes #42326

Change-Id: I3757ba165dc31ce1ce70f3b06a9e5b94c14d2ab9
Reviewed-on: https://go-review.googlesource.com/c/go/+/271497
Trust: eric fang <eric.fang@arm.com>
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: fannie zhang <Fannie.Zhang@arm.com>
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-04 01:26:52 +00:00
eric fang 79beddc773 cmd/asm: add 128-bit FLDPQ and FSTPQ instructions for arm64
This CL adds assembly support for 128-bit FLDPQ and FSTPQ instructions.

This CL also deletes some wrong pre/post-indexed LDP and STP instructions,
such as {ALDP, C_UAUTO4K, C_NONE, C_NONE, C_PAIR, 74, 8, REGSP, 0, C_XPRE},
because when the offset type is C_UAUTO4K, pre and post don't work.

Change-Id: Ifd901d4440eb06eb9e86c9dd17518749fdf32848
Reviewed-on: https://go-review.googlesource.com/c/go/+/273668
Trust: eric fang <eric.fang@arm.com>
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-04 01:26:21 +00:00
eric fang cd99385ff4 cmd/internal/obj/arm64: fix VMOVQ instruction encoding error
The VMOVQ instruction moves a 128-bit constant into a V register, as 128-bit
constant can't be loaded into a register directly, we split it into two 64-bit
constants and load it from constant pool. Currently we add the 128-bit constant
to literal pool by calling the 'addpool' function twice, this is not the right
way because it doesn't guarantee the two DWORD instructions are consecutive,
and the second call of addpool will overwrite the p.Pool field,resulting in a
wrong PC-relative offset value of the Prog.

This CL renames the flag LFROM3 to LFROM128, and adds a new function addpool128
to add a 128-bit constant to the literal pool.

Change-Id: I616f043c99a9a18a663f8768842cc980de2e6f79
Reviewed-on: https://go-review.googlesource.com/c/go/+/282334
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: eric fang <eric.fang@arm.com>
Trust: eric fang <eric.fang@arm.com>
2021-01-23 12:38:15 +00:00
Jonathan Swinney 5f0fca1475 cmd/asm: rename arm64 instructions LDANDx to LDCLRx
The LDANDx instructions were misleading because they correspond to the
mnemonic LDCLRx as defined in the Arm Architecture Reference Manual for
Armv8. This changes the assembler to use the same mnemonic as the GNU
assembler and the manual.

The instruction has the form:

LDCLRx Rs, (Rb), Rt: *Rb -> Rt, Rs AND NOT(*Rb) -> *Rb

Change-Id: I94ae003e99e817209bba1afe960e612bf3a0b410
Reviewed-on: https://go-review.googlesource.com/c/go/+/267138
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: fannie zhang <Fannie.Zhang@arm.com>
2020-11-04 15:53:19 +00:00
fanzha02 15131caeaa cmd/internal/obj/arm64: add CASx/CASPx instructions
This patch adds support for CASx and CASPx atomic instructions.

  go syntax                 gnu syntax
CASD Rs, (Rn|RSP), Rt => cas Xs, Xt, (Xn|SP)
CASALW Rs, (Rn|RSP), Rt => casal Ws, Wt, (Xn|SP)
CASPD (Rs, Rs+1), (Rn|RSP), (Rt, Rt+1) => casp Xs, Xs+1, Xt, Xt+1, (Xn|SP)
CASPW (Rs, Rs+1), (Rn|RSP), (Rt, Rt+1) => casp Ws, Ws+1, Wt, Wt+1, (Xn|SP)

This patch changes the type of prog.RestArgs from "[]Addr" to
"[]struct{Addr, Pos}", Pos is a enum, indicating the position of
the operand.

This patch also adds test cases.

Change-Id: Ib971cfda7890b7aa895d17bab22dea326c7fcaa4
Reviewed-on: https://go-review.googlesource.com/c/go/+/233277
Trust: fannie zhang <Fannie.Zhang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-29 05:07:11 +00:00
fanzha02 3089ef6bd7 cmd/asm: add several arm64 SIMD instructions
This patch enables VSLI, VUADDW(2), VUSRA and FMOVQ SIMD instructions
required by the issue #40725. And the GNU syntax of 'FMOVQ' is 128-bit
ldr/str(immediate, simd&fp).

Add test cases.

Fixes #40725

Change-Id: Ide968ef4a9385ce4cd8f69bce854289014d30456
Reviewed-on: https://go-review.googlesource.com/c/go/+/258397
Trust: fannie zhang <Fannie.Zhang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-29 03:52:23 +00:00
fanzha02 308ec220a2 cmd/asm: refactor some encoding functions for load/store with immediate offset on arm64
Some of the current functions for encoding load/store with
immediate offset instructions, like opstr12(), opstr9(),
opldr12(), opldr9() and opldrpp(), etc., they have the same
code, so this patch refactors them and merges them into two
functions opstr() and opldr().

Change-Id: I60367f8b720b77c7ebe6d66905a950dcf7701836
Reviewed-on: https://go-review.googlesource.com/c/go/+/263479
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: fannie zhang <Fannie.Zhang@arm.com>
2020-10-29 01:50:09 +00:00
Russ Cox 912262b806 cmd/internal/obj: move LSym.Func into LSym.Extra
This creates space for a different kind of extension field
in LSym without making the struct any larger.
(There are many LSym, so we care about keeping the struct small.)

Change-Id: Ib16edb9e15f54c2a7351c8b875e19684058711e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/243943
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-16 03:02:36 +00:00
Meng Zhuo 3036b76df0 cmd/asm: Add SHA3 hardware instructions for ARM64
Armv8.2-SHA introduced four SHA3-related instructions

EOR3 <Vd>.16B, <Vn>.16B, <Vm>.16B, <Va>.16B
RAX1 <Vd>.2D, <Vn>.2D, <Vm>.2D
XAR <Vd>.2D, <Vn>.2D, <Vm>.2D, #<imm6>
BCAX <Vd>.16B, <Vn>.16B, <Vm>.16B, <Va>.16B

We convert them into Go asm style as:

VEOR3 <Va>.B16, <Vm>.B16, <Vn>.B16, <Vd>.B16
VRAX1 <Vm>.D2, <Vn>.D2, <Vd>.D2
VXAR $imm6, <Vm>.D2, <Vn>.D2, <Vd>.D2
VBCAX <Va>.B16, <Vm>.B16, <Vn>.B16, <Vd>.B16

Armv8 Reference Manual:
* EOR3 (Three-way Exclusive OR) on C7.2.42
* RAX1 (Rotate and Exclusive OR) on C7.2.217
* XAR (Exclusive OR and Rotate) on C7.2.401
* BCAX (Bit Clear and Exclusive OR) on C7.2.12

Change-Id: I9a5d1b5ad508ed8fd5289d535906c54d9a63ca5a
Reviewed-on: https://go-review.googlesource.com/c/go/+/180757
Run-TryBot: Meng Zhuo <mzh@golangcn.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Emmanuel Odeke <emm.odeke@gmail.com>
2020-10-10 16:06:07 +00:00
Cherry Zhang 3f7b4d1207 cmd/internal/obj/arm64: only emit R_CALLIND relocations on calls
Don't emit it for jumps. In particular, not for the return
instruction, which is JMP (LR).

Reduce some binary size and linker resources.

Change-Id: Idb3242b86c5a137597fb8accb8aadfe0244c14cf
Reviewed-on: https://go-review.googlesource.com/c/go/+/260341
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
2020-10-07 20:18:43 +00:00
fanzha02 fa04d488bd cmd/asm: fix the issue of moving 128-bit integers to vector registers on arm64
The CL 249758 added `FMOVQ $vcon, Vd` instruction and assembler used
128-bit simd literal-loading to load `$vcon` from pool into 128-bit vector
register `Vd`. Because Go does not have 128-bit integers for now, the
assembler will report an error of `immediate out of range` when
assembleing `FMOVQ $0x123456789abcdef0123456789abcdef, V0` instruction.

This patch lets 128-bit integers take two 64-bit operands, for the high
and low parts separately and adds `VMOVQ $hi, $lo, Vd` instruction to
move `$hi<<64+$lo' into 128-bit register `Vd`.

In addition, this patch renames `FMOVQ/FMOVD/FMOVS` ops to 'VMOVQ/VMOVD/VMOVS'
and uses them to move 128-bit, 64-bit and 32-bit constants into vector
registers, respectively

Update the go doc.

Fixes #40725

Change-Id: Ia3c83bb6463f104d2bee960905053a97299e0a3a
Reviewed-on: https://go-review.googlesource.com/c/go/+/255900
Trust: fannie zhang <Fannie.Zhang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-09-25 01:47:40 +00:00
diaxu01 a86b6f23f0 cmd/internal/obj/arm64: optimize the instruction of moving long effective stack address
Currently, when the offset of "MOVD $offset(Rn), Rd" is a large positive
constant or a negative constant, the assembler will load this offset from
the constant pool.This patch gets rid of the constant pool by encoding the
offset into two ADD instructions if it's a large positive constant or one
SUB instruction if negative. For very large negative offset, it is rarely
used, here we don't optimize this case.

Optimized case 1: MOVD $-0x100000(R7), R0
Before: LDR 0x67670(constant pool), R27; ADD R27.UXTX, R0, R7
After: SUB $0x100000, R7, R0

Optimized case 2: MOVD $0x123468(R7), R0
Before: LDR 0x67670(constant pool), R27; ADD R27.UXTX, R0, R7
After: ADD $0x123000, R7, R27; ADD $0x000468, R27, R0

1. Binary size before/after.
binary                 size change
pkg/linux_arm64        +4KB
pkg/tool/linux_arm64   no change
go                     no change
gofmt                  no change

2. go1 benckmark.
name                      old time/op                new time/op                delta
pkg:test/bench/go1 goos:linux goarch:arm64
BinaryTree17-64           7335721401.800000ns +-40%  6264542009.800000ns +-14%    ~     (p=0.421 n=5+5)
Fannkuch11-64             3886551822.600000ns +- 0%  3875870590.200000ns +- 0%    ~     (p=0.151 n=5+5)
FmtFprintfEmpty-64                82.960000ns +- 1%          83.900000ns +- 2%  +1.13%  (p=0.048 n=5+5)
FmtFprintfString-64              149.200000ns +- 1%         148.000000ns +- 0%  -0.80%  (p=0.016 n=5+4)
FmtFprintfInt-64                 177.000000ns +- 0%         178.400000ns +- 2%    ~     (p=0.794 n=4+5)
FmtFprintfIntInt-64              240.200000ns +- 2%         239.400000ns +- 4%    ~     (p=0.302 n=5+5)
FmtFprintfPrefixedInt-64         300.400000ns +- 0%         299.200000ns +- 1%    ~     (p=0.119 n=5+5)
FmtFprintfFloat-64               360.000000ns +- 0%         361.600000ns +- 3%    ~     (p=0.349 n=4+5)
FmtManyArgs-64                  1064.400000ns +- 1%        1061.400000ns +- 0%    ~     (p=0.087 n=5+5)
GobDecode-64                12080404.400000ns +- 2%    11637601.000000ns +- 1%  -3.67%  (p=0.008 n=5+5)
GobEncode-64                 8474973.800000ns +- 2%     7977801.600000ns +- 2%  -5.87%  (p=0.008 n=5+5)
Gzip-64                    416501238.400000ns +- 0%   410463405.400000ns +- 0%  -1.45%  (p=0.008 n=5+5)
Gunzip-64                   58088415.200000ns +- 0%    58826209.600000ns +- 0%  +1.27%  (p=0.008 n=5+5)
HTTPClientServer-64           128660.200000ns +-23%      117840.800000ns +- 8%    ~     (p=0.222 n=5+5)
JSONEncode-64               17547746.800000ns +- 4%    17216180.000000ns +- 1%    ~     (p=0.222 n=5+5)
JSONDecode-64               80879896.000000ns +- 1%    80063737.200000ns +- 0%  -1.01%  (p=0.008 n=5+5)
Mandelbrot200-64             5484901.600000ns +- 0%     5483614.400000ns +- 0%    ~     (p=0.310 n=5+5)
GoParse-64                   6201166.800000ns +- 6%     6150920.600000ns +- 1%    ~     (p=0.548 n=5+5)
RegexpMatchEasy0_32-64           135.000000ns +- 0%         139.200000ns +- 7%    ~     (p=0.643 n=5+5)
RegexpMatchEasy0_1K-64           484.600000ns +- 2%         483.800000ns +- 2%    ~     (p=0.984 n=5+5)
RegexpMatchEasy1_32-64           128.000000ns +- 1%         124.600000ns +- 1%  -2.66%  (p=0.008 n=5+5)
RegexpMatchEasy1_1K-64           769.400000ns +- 2%         761.400000ns +- 1%    ~     (p=0.460 n=5+5)
RegexpMatchMedium_32-64           12.900000ns +- 0%          12.500000ns +- 0%  -3.10%  (p=0.008 n=5+5)
RegexpMatchMedium_1K-64        57879.200000ns +- 1%       56512.200000ns +- 0%  -2.36%  (p=0.008 n=5+5)
RegexpMatchHard_32-64           3091.600000ns +- 1%        3071.000000ns +- 0%  -0.67%  (p=0.048 n=5+5)
RegexpMatchHard_1K-64          92941.200000ns +- 1%       92794.000000ns +- 0%    ~     (p=1.000 n=5+5)
Revcomp-64                1695605187.000000ns +-54%  1821697637.400000ns +-47%    ~     (p=1.000 n=5+5)
Template-64                112839686.800000ns +- 1%   109964069.200000ns +- 3%    ~     (p=0.095 n=5+5)
TimeParse-64                     587.000000ns +- 0%         587.000000ns +- 0%    ~     (all equal)
TimeFormat-64                    586.000000ns +- 1%         584.200000ns +- 1%    ~     (p=0.659 n=5+5)
[Geo mean]                      81804.262218ns             80694.712973ns       -1.36%

name                      old speed                  new speed                  delta
pkg:test/bench/go1 goos:linux goarch:arm64
GobDecode-64                         63.6MB/s +- 2%             66.0MB/s +- 1%  +3.78%  (p=0.008 n=5+5)
GobEncode-64                         90.6MB/s +- 2%             96.2MB/s +- 2%  +6.23%  (p=0.008 n=5+5)
Gzip-64                              46.6MB/s +- 0%             47.3MB/s +- 0%  +1.47%  (p=0.008 n=5+5)
Gunzip-64                             334MB/s +- 0%              330MB/s +- 0%  -1.25%  (p=0.008 n=5+5)
JSONEncode-64                         111MB/s +- 4%              113MB/s +- 1%    ~     (p=0.222 n=5+5)
JSONDecode-64                        24.0MB/s +- 1%             24.2MB/s +- 0%  +1.02%  (p=0.008 n=5+5)
GoParse-64                           9.35MB/s +- 6%             9.42MB/s +- 1%    ~     (p=0.571 n=5+5)
RegexpMatchEasy0_32-64                237MB/s +- 0%              231MB/s +- 7%    ~     (p=0.690 n=5+5)
RegexpMatchEasy0_1K-64               2.11GB/s +- 2%             2.12GB/s +- 2%    ~     (p=1.000 n=5+5)
RegexpMatchEasy1_32-64                250MB/s +- 1%              257MB/s +- 1%  +2.63%  (p=0.008 n=5+5)
RegexpMatchEasy1_1K-64               1.33GB/s +- 2%             1.35GB/s +- 1%    ~     (p=0.548 n=5+5)
RegexpMatchMedium_32-64              77.6MB/s +- 0%             79.8MB/s +- 0%  +2.80%  (p=0.008 n=5+5)
RegexpMatchMedium_1K-64              17.7MB/s +- 1%             18.1MB/s +- 0%  +2.41%  (p=0.008 n=5+5)
RegexpMatchHard_32-64                10.4MB/s +- 1%             10.4MB/s +- 0%    ~     (p=0.056 n=5+5)
RegexpMatchHard_1K-64                11.0MB/s +- 1%             11.0MB/s +- 0%    ~     (p=0.984 n=5+5)
Revcomp-64                            188MB/s +-71%              155MB/s +-71%    ~     (p=1.000 n=5+5)
Template-64                          17.2MB/s +- 1%             17.7MB/s +- 3%    ~     (p=0.095 n=5+5)
[Geo mean]                            79.2MB/s                   79.3MB/s       +0.24%

Change-Id: I593ac3e7037afafc3605ad4b0cfb51d5dd88015d
Reviewed-on: https://go-review.googlesource.com/c/go/+/232438
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-09-16 14:56:18 +00:00
Junchen Li d7ab277eed cmd/asm: add more SIMD instructions on arm64
This CL adds USHLL, USHLL2, UZP1, UZP2, and BIF instructions requested
by #40725. And since UXTL* are aliases of USHLL*, this CL also merges
them into one case.

Updates #40725

Change-Id: I404a4fdaf953319f72eea548175bec1097a2a816
Reviewed-on: https://go-review.googlesource.com/c/go/+/253659
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-09-10 15:48:36 +00:00
fanzha02 dfdc3880b0 cmd/internal/obj/arm64: enable some SIMD instructions
Enable VBSL, VBIT, VCMTST, VUXTL VUXTL2 and FMOVQ SIMD
instructions required by the issue #40725. And FMOVQ
instrucion is used to move a large constant to a Vn
register.

Add test cases.

Fixes #40725

Change-Id: I1cac1922a0a0165d698a4b73a41f7a5f0a0ad549
Reviewed-on: https://go-review.googlesource.com/c/go/+/249758
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-09-10 02:22:19 +00:00
fanzha02 0e19aaabc0 cmd/asm: fix the error of checking the post-index offset of VLD[1-4]R instructions of arm64
The post-index offset of VLD[1-4]R instructions is decided by the
"size" field not "Q" field, the current assembler uses "Q" fileld
to check the correctness of post-index offset which is not correct.
This patch fixes it.

Fixes #40725

Change-Id: If1cde7f21c6b3ee0e491649eb567700bd1475c84
Reviewed-on: https://go-review.googlesource.com/c/go/+/249757
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-09-07 03:28:25 +00:00
Keith Randall 9e70564f63 cmd/compile,cmd/asm: simplify recording of branch targets, take 2
We currently use two fields to store the targets of branches.
Some phases use p.To.Val, some use p.Pcond. Rewrite so that
every branch instruction uses p.To.Val.
p.From.Val is also used in rare instances.
Introduce a Pool link for use by arm/arm64, instead of
repurposing Pcond.

This is a cleanup CL in preparation for some stack frame CLs.

Change-Id: If8239177e4b1ea2bccd0608eb39553d23210d405
Reviewed-on: https://go-review.googlesource.com/c/go/+/251437
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-08-31 17:36:08 +00:00
Xiangdong Ji 5d0b35ca98 cmd/asm: Always use go-style arrangement specifiers on ARM64
Fixing several error message and comment texts of the ARM64 assembler
to use arrangement specifiers of Go's assembly style.

Change-Id: Icdbb14fba7aaede40d57d0d754795b050366a1ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/237859
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-08-28 16:40:13 +00:00
Keith Randall 26ad27bb02 Revert "cmd/compile,cmd/asm: simplify recording of branch targets"
This reverts CL 243318.

Reason for revert: Seems to be crashing some builders.

Change-Id: I2ffc59bc5535be60b884b281c8d0eff4647dc756
Reviewed-on: https://go-review.googlesource.com/c/go/+/251169
Reviewed-by: Bryan C. Mills <bcmills@google.com>
2020-08-28 02:10:13 +00:00
Keith Randall 8247da3662 cmd/compile,cmd/asm: simplify recording of branch targets
We currently use two fields to store the targets of branches.
Some phases use p.To.Val, some use p.Pcond. Rewrite so that
every branch instruction uses p.To.Val.
p.From.Val is also used in rare instances.
Introduce a Pool link for use by arm/arm64, instead of
repurposing Pcond.

This is a cleanup CL in preparation for some stack frame CLs.

Change-Id: I9055bf0a1d986aff421e47951a1dedc301c846f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/243318
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-08-27 22:35:45 +00:00
Keith Randall 46ca7b5ee2 cmd/internal/obj: stop removing NOPs from instruction stream
This has already been done for s390x, ppc64. This CL is for
all the other architectures.

Fixes #40796

Change-Id: Idd1816e057df63022d47e99fa06617811d8c8489
Reviewed-on: https://go-review.googlesource.com/c/go/+/248684
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-08-20 01:12:34 +00:00
Meng Zhuo 93eeb819ca cmd/asm: Add SHA512 hardware instructions for ARM64
ARMv8.2-SHA add SHA512 intructions:

1. SHA512H	Vm.D2, Vn, Vd
2. SHA512H2	Vm.D2, Vn, Vd
3. SHA512SU0	Vn.D2, Vd.D2
4. SHA512SU1	Vm.D2, Vn.D2, Vd.D2

ARMv8 Architecture Reference Manual C7.2.234-C7.2.234

Change-Id: Ie970fef1bba5312ad466f246035da4c40a1bbb39
Reviewed-on: https://go-review.googlesource.com/c/go/+/180057
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-08-18 17:30:53 +00:00
eric fang e5e386938f cmd/asm: fix the encoding error of VCNT instruction for arm64
When the arrangement specifier is "B16", the 30-bit should be 1 rather than 0.
This CL fixes this error.

Fixes #39445

Change-Id: Ib44881cdb8b3aab855cb30f2c52a085cd73a6a2c
Reviewed-on: https://go-review.googlesource.com/c/go/+/236638
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-06-09 15:58:36 +00:00
Cherry Zhang f1ac85c8d1 cmd/internal/obj/arm64: fix 32-bit BITCON test
The BITCON test, isbitcon, assumes 32-bit constants are expanded
repeatedly, i.e. by copying the low 32 bits to high 32 bits,
instead of zero extending. We already do such expansion in
progedit. In con32class when classifying 32-bit constants, we
should use the expanded constant, instead of zero-extending it.

TODO: we could have better encoding for things like ANDW $-1, Rx.

Fixes #38946.

Change-Id: I37d0c95d744834419db5c897fd1f6c187595c926
Reviewed-on: https://go-review.googlesource.com/c/go/+/232984
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2020-05-08 20:57:01 +00:00
Cherry Zhang ee330385ca cmd/internal/obj, runtime: preempt & restart some instruction sequences
On some architectures, for async preemption the injected call
needs to clobber a register (usually REGTMP) in order to return
to the preempted function. As a consequence, the PC ranges where
REGTMP is live are not preemptible.

The uses of REGTMP are usually generated by the assembler, where
it needs to load or materialize a large constant or offset that
doesn't fit into the instruction. In those cases, REGTMP is not
live at the start of the instruction sequence. Instead of giving
up preemption in those cases, we could preempt it and restart the
sequence when resuming the execution. Basically, this is like
reissuing an interrupted instruction, except that here the
"instruction" is a Prog that consists of multiple machine
instructions. For this to work, we need to generate PC data to
mark the start of the Prog.

Currently this is only done for ARM64.

TODO: the split-stack function prologue is currently not async
preemptible. We could use this mechanism, preempt it and restart
at the function entry.

Change-Id: I37cb282f8e606e7ab6f67b3edfdc6063097b4bd1
Reviewed-on: https://go-review.googlesource.com/c/go/+/208126
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-05-06 15:41:12 +00:00
fanzha02 1dbcbcfca4 cmd/asm: align an instruction or a function's address
Recently, the gVisor project needs an instruction's address
with 128 bytes alignment to fit the architecture requirement
for interrupt table.

This patch allows aligning an instruction's address to be
aligned to a specific value (2^n and in the range [8, 2048])

The main changes include:

1. Adds a new element in the FuncInfo structure defined in
cmd/internal/obj/link.go file to record the alignment
information.

2. Adds a new element in the Func structure defined in
cmd/internal/goobj/read.go file to read the alignment
information.

3. Adds the assembler support to align an intruction's offset
with a specific value (2^n and in the range [8, 2048]).
e.g. "PCALIGN $256" indicates that the next instruction should
be aligned to 256 bytes.

4. An instruction's alignment is relative to the start of the
function where this instruction is located, so the function's
address must be aligned to the same or coarser boundary.

This CL also adds a test.

Change-Id: I9b365c111b3a12f767728f1b45aa0c00f073c37d
Reviewed-on: https://go-review.googlesource.com/c/go/+/226997
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-02 15:05:15 +00:00
Bryan C. Mills 5970480c68 Revert "cmd/asm: align an instruction or a function's address"
This reverts CL 212767.

Reason for revert: new test is persistently failing on freebsd-arm64-dmgk builder.

Change-Id: Ifd1227628e0e747688ddb4dc580170b2a103a89e
Reviewed-on: https://go-review.googlesource.com/c/go/+/226597
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-03-31 16:55:59 +00:00
fanzha02 71d477469c cmd/asm: align an instruction or a function's address
Recently, the gVisor project needs an instruction's address
with 128 bytes alignment and a function's start address with
2K bytes alignment to fit the architecture requirement for
interrupt table.

This patch allows aligning the address of an instruction to be
aligned to a specific value (2^n and not higher than 2048) and
the address of a function to be 2048 bytes.

The main changes include:

1. Adds ALIGN2048 flag to align a function's address with
2048 bytes.
e.g. "TEXT ·Add(SB),NOSPLIT|ALIGN2048" indicates that the
address of Add function should be aligned to 2048 bytes.

2. Adds a new element in the FuncInfo structure defined in
cmd/internal/obj/link.go file to record the alignment
information.

3. Adds a new element in the Func structure defined in
cmd/internal/goobj/read.go file to read the alignment
information.

4. Because go introduces a new object file format, also add
a new element in the FuncInfo structure defined in
cmd/internal/goobj2/funcinfo.go to record the alignment
information.

5. Adds the assembler support to align an intruction's offset
with a specific value (2^n and not higher than 2048).
e.g. "PCALIGN $256" indicates that the next instruction should
be aligned to 256 bytes.

This CL also adds a test.

Change-Id: I31cfa6fb5bc35dee2c44bf65913e90cddfcb492a
Reviewed-on: https://go-review.googlesource.com/c/go/+/212767
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-03-31 02:37:05 +00:00
diaxu01 2f8798ed07 cmd/internal/obj/arm64: add support of PCALIGN directive
Recently, we get requirements of instructions and functions alignment
from the gVisor project. To fit the alignment requirement of interrupt
table, they require an instruction's address to be aligned 128 bytes
and a function's entry address to be aligned 2K bytes. Thus we add
support for PCALIGN directive first. Below is a discussion about this
topic. https://groups.google.com/forum/m/#!topic/golang-dev/RPj90l5x86I

Functions in Go are aligned to 16 bytes on arm64, thus now we only
support 8 and 16 bytes alignment.

This patch adds support for PCALIGN directive. This directive can be
used within Go asm to align instruction by padding NOOP directives.

This patch also adds a test to verify the correnctness of the PCALIGN
directive. The test is contributed by Fannie Zhang <Fannie.Zhang@arm.com>.

Change-Id: I709e6b94847fe9e1824f42f4155355f90c63d523
Reviewed-on: https://go-review.googlesource.com/c/go/+/207117
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-03-25 14:28:26 +00:00
Russ Cox fc8a6336d1 cmd/asm, cmd/compile, runtime: add -spectre=ret mode
This commit extends the -spectre flag to cmd/asm and adds
a new Spectre mitigation mode "ret", which enables the use
of retpolines.

Retpolines prevent speculation about the target of an indirect
jump or call and are described in more detail here:
https://support.google.com/faqs/answer/7625886

Change-Id: I4f2cb982fa94e44d91e49bd98974fd125619c93a
Reviewed-on: https://go-review.googlesource.com/c/go/+/222661
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-13 19:05:54 +00:00
Xiangdong Ji 2783249068 cmd/asm: add asimd instruction 'rev16' on arm64
Add support to the asimd instruction rev16 which reverses elements in
16-bit halfwords.

syntax:
	VREV16 <Vn>.<T>, <Vd>.<T>
<T> should be either B8 or B16.

Change-Id: I7a7b8e772589c51ca9eb6dca98bab1aac863c6c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/213738
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-02-25 18:40:19 +00:00
Cherry Zhang 4736088463 cmd/internal/obj/arm64: mark unsafe points
For async preemption, we will be using REGTMP as a temporary
register in injected call on ARM64, which will clobber it. So any
code that uses REGTMP is not safe for async preemption.

In the assembler backend, we expand a Prog to multiple machine
instructions and use REGTMP as a temporary register if necessary.
These need to be marked unsafe. In fact, most of the
multi-instruction Progs use REGTMP, so we mark all of them,
except ones that are whitelisted.

Change-Id: I6e97805a13950e3b693fb606d77834940ac3722e
Reviewed-on: https://go-review.googlesource.com/c/go/+/203460
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-07 19:17:50 +00:00
diaxu01 3876bd67ef cmd/internal/obj/arm64: add support of NOOP instruction
This patch uses symbol NOOP to support arm64 instruction NOP. In
arm64, NOP stands for that No Operation does nothing, other than
advance the value of the program counter by 4. This instruction
can be used for instruction alignment purposes. This patch uses
NOOP to support arm64 instruction NOP, because we have a generic
"NOP" instruction, which is a zero-width pseudo-instruction.

In arm64, instruction NOP is an alias of HINT #0. This patch adds
test cases for instruction HINT #0.

Change-Id: I54e6854c46516eb652b412ef9e0f73ab7f171f8c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200578
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-18 14:15:51 +00:00
diaxu01 9ce5cad0aa cmd/internal/obj/arm64: add error checking for system registers.
This CL adds system register error checking test cases. There're two kinds of
error test cases:

1. illegal combination.
MRS should be used in this way: MRS <system register>, <general register>.
MSR should be used in this way: MSR <general register>, <system register>.
Error usage examples:
MRS     R8, VTCR_EL2    // ERROR "illegal combination"
MSR     VTCR_EL2, R8    // ERROR "illegal combination"

2. illegal read or write access.
Error usage examples:
MSR     R7, MIDR_EL1    // ERROR "expected writable system register or pstate"
MRS     OSLAR_EL1, R3   // ERROR "expected readable system register"

This CL reads system registers readable and writeable property to check whether
they're used with legal read or write access. This property is named AccessFlags
in sysRegEnc.go, and it is automatically generated by modifing the system register
generator.

Change-Id: Ic83d5f372de38d1ecd0df1ca56b354ee157f16b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/194917
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-08 16:20:53 +00:00
Meng Zhuo 2bf7a92571 cmd/asm: add VLD[1-4]R vector instructions on arm64
This change adds VLD1R, VLD2R, VLD3R, VLD4R

Change-Id: Ie19e9ae02fdfc94b9344acde8c9938849efb0bf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/181697
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-03 20:31:11 +00:00
Ainar Garipov 0efbd10157 all: fix typos
Use the following (suboptimal) script to obtain a list of possible
typos:

  #!/usr/bin/env sh

  set -x

  git ls-files |\
    grep -e '\.\(c\|cc\|go\)$' |\
    xargs -n 1\
    awk\
    '/\/\// { gsub(/.*\/\//, ""); print; } /\/\*/, /\*\// { gsub(/.*\/\*/, ""); gsub(/\*\/.*/, ""); }' |\
    hunspell -d en_US -l |\
    grep '^[[:upper:]]\{0,1\}[[:lower:]]\{1,\}$' |\
    grep -v -e '^.\{1,4\}$' -e '^.\{16,\}$' |\
    sort -f |\
    uniq -c |\
    awk '$1 == 1 { print $2; }'

Then, go through the results manually and fix the most obvious typos in
the non-vendored code.

Change-Id: I3cb5830a176850e1a0584b8a40b47bde7b260eae
Reviewed-on: https://go-review.googlesource.com/c/go/+/193848
Reviewed-by: Robert Griesemer <gri@golang.org>
2019-09-08 17:28:20 +00:00
Meng Zhuo 8403d4ea90 cmd/asm: add V[LD|ST][2-4] vector instructions on arm64
This change adds VLD2, VLD3, VLD4, VST2, VST3, VST4 (multiple structures)
for image or multi media optimazation.

Change-Id: Iae3538ef4434e436e3fb2f19153c58f918f773af
Reviewed-on: https://go-review.googlesource.com/c/go/+/166518
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-28 15:29:12 +00:00
fanzha02 9c67516ed6 cmd/internal/obj/arm64: add support for most system registers
This patch supports the EL0 and EL1 system registers used in MRS/MSR
instructions. This patch refactors the assembler code, allowing the
assembler to read system register information from the automatically
generated sysRegEnc.go file and move existing declared system registers
to the sysRegEnc.go file.

This patch adds 431 system registers, it is worth noting that the number
of special registers is initialized to less than 1024 in the list7.go file.

This CL also adds some test cases to test the newly added system registers.

The test cases are contributed by Dianhong Xu <Dianhong.Xu@arm.com>

Change-Id: Ic09a937eaaeefe82bd08b5dd726808f8ff6cebf6
Reviewed-on: https://go-review.googlesource.com/c/go/+/189577
Reviewed-by: Ben Shi <powerman1st@163.com>
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-08-28 02:28:49 +00:00
erifan01 448089854a cmd/asm: add arm64 v8.1 atomic instructions
This change adds several arm64 v8.1 atomic instructions and test cases.
They are LDADDAx, LDADDLx, LDANDAx, LDANDALx, LDANDLx, LDEORAx, LDEORALx,
LDEORLx, LDORAx, LDORALx, LDORLx, SWPAx and SWPLx. Their form is consistent
with the form of the existing atomic instructions.

For instructions STXRx, STLXRx, STXPx and STLXPx, the second destination
register can't be RSP. This CL also adds a check for this.

LDADDx Rs, (Rb), Rt: *Rb -> Rt, Rs + *Rb -> *Rb
LDANDx Rs, (Rb), Rt: *Rb -> Rt, Rs AND NOT(*Rb) -> *Rb
LDEORx Rs, (Rb), Rt: *Rb -> Rt, Rs EOR *Rb -> *Rb
LDORx  Rs, (Rb), Rt: *Rb -> Rt, Rs OR *Rb -> *Rb

Change-Id: I9f9b0245958cb57ab7d88c66fb9159b23b9017fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/157001
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-06 18:52:03 +00:00
fanzha02 2ef8abb41f cmd/internal/obj/arm64: fix the bug assembling TSTW
Current assembler reports error when it assembles
"TSTW $1689262177517664, R3", but go1.11 was building
fine.

Fixes #30334

Change-Id: I9c16d36717cd05df2134e8eb5b17edc385aff0a9
Reviewed-on: https://go-review.googlesource.com/c/163259
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ben Shi <powerman1st@163.com>
2019-02-22 09:29:27 +00:00
Tobias Klauser 9e277f7d55 all: use "reports whether" consistently instead of "returns whether"
Follow-up for CL 147037 and after Brad noticed the "returns whether"
pattern during the review of CL 150621.

Go documentation style for boolean funcs is to say:

    // Foo reports whether ...
    func Foo() bool

(rather than "returns whether")

Created with:

    $ perl -i -npe 's/returns whether/reports whether/' $(git grep -l "returns whether" | grep -v vendor)

Change-Id: I15fe9ff99180ad97750cd05a10eceafdb12dc0b4
Reviewed-on: https://go-review.googlesource.com/c/150918
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-12-02 15:12:26 +00:00
fanzha02 644ddaa842 cmd/internal/obj/arm64: encode large constants into MOVZ/MOVN and MOVK instructions
Current assembler gets large constants from constant pool, this CL
gets rid of the pool by using MOVZ/MOVN and MOVK to load large
constants.

This CL changes the assembler behavior as follows.

1. go assembly  1, MOVD $0x1111222233334444, R1
                2, MOVD $0x1111ffff1111ffff, R1
   previous version: MOVD 0x9a4, R1 (loads constant from pool).
   optimized version: 1, MOVD $0x4444, R1; MOVK $(0x3333<<16), R1; MOVK $(0x2222<<32), R1;
   MOVK $(0x1111<<48), R1. 2, MOVN $(0xeeee<<16), R1; MOVK $(0x1111<<48), R1.

Add test cases, and below are binary size comparison and bechmark results.

1. Binary size before/after
binary                 size change
pkg/linux_arm64        +25.4KB
pkg/tool/linux_arm64   -2.9KB
go                     -2KB
gofmt                  no change

2. compiler benchmark.
name       old time/op       new time/op       delta
Template         574ms ±21%        577ms ±14%     ~     (p=0.853 n=10+10)
Unicode          327ms ±29%        353ms ±23%     ~     (p=0.360 n=10+8)
GoTypes          1.97s ± 8%        2.04s ±11%     ~     (p=0.143 n=10+10)
Compiler         9.13s ± 9%        9.25s ± 8%     ~     (p=0.684 n=10+10)
SSA              29.2s ± 5%        27.0s ± 4%   -7.40%  (p=0.000 n=10+10)
Flate            402ms ±40%        308ms ± 6%  -23.29%  (p=0.004 n=10+10)
GoParser         470ms ±26%        382ms ±10%  -18.82%  (p=0.000 n=9+10)
Reflect          1.36s ±16%        1.17s ± 7%  -13.92%  (p=0.001 n=9+10)
Tar              561ms ±19%        466ms ±15%  -17.08%  (p=0.000 n=9+10)
XML              745ms ±20%        679ms ±20%     ~     (p=0.123 n=10+10)
StdCmd           35.5s ± 6%        37.2s ± 3%   +4.81%  (p=0.001 n=9+8)

name       old user-time/op  new user-time/op  delta
Template         625ms ±14%        660ms ±18%     ~     (p=0.343 n=10+10)
Unicode          355ms ±10%        373ms ±20%     ~     (p=0.346 n=9+10)
GoTypes          2.39s ± 8%        2.37s ± 5%     ~     (p=0.897 n=10+10)
Compiler         11.1s ± 4%        11.4s ± 2%   +2.63%  (p=0.010 n=10+9)
SSA              35.4s ± 3%        34.9s ± 2%     ~     (p=0.113 n=10+9)
Flate            402ms ±13%        371ms ±30%     ~     (p=0.089 n=10+9)
GoParser         513ms ± 8%        489ms ±24%   -4.76%  (p=0.039 n=9+9)
Reflect          1.52s ±12%        1.41s ± 5%   -7.32%  (p=0.001 n=9+10)
Tar              607ms ±10%        558ms ± 8%   -7.96%  (p=0.009 n=9+10)
XML              828ms ±10%        789ms ±12%     ~     (p=0.059 n=10+10)

name       old text-bytes    new text-bytes    delta
HelloSize        714kB ± 0%        712kB ± 0%   -0.23%  (p=0.000 n=10+10)
CmdGoSize       8.26MB ± 0%       8.25MB ± 0%   -0.14%  (p=0.000 n=10+10)

name       old data-bytes    new data-bytes    delta
HelloSize       10.5kB ± 0%       10.5kB ± 0%     ~     (all equal)
CmdGoSize        258kB ± 0%        258kB ± 0%     ~     (all equal)

name       old bss-bytes     new bss-bytes     delta
HelloSize        125kB ± 0%        125kB ± 0%     ~     (all equal)
CmdGoSize        146kB ± 0%        146kB ± 0%     ~     (all equal)

name       old exe-bytes     new exe-bytes     delta
HelloSize       1.18MB ± 0%       1.18MB ± 0%     ~     (all equal)
CmdGoSize       11.2MB ± 0%       11.2MB ± 0%   -0.13%  (p=0.000 n=10+10)

3. go1 benckmark.
name                   old time/op    new time/op    delta
BinaryTree17              6.60s ±18%     7.36s ±22%    ~     (p=0.222 n=5+5)
Fannkuch11                4.04s ± 0%     4.05s ± 0%    ~     (p=0.421 n=5+5)
FmtFprintfEmpty          91.8ns ±14%    91.2ns ± 9%    ~     (p=0.667 n=5+5)
FmtFprintfString          145ns ± 0%     151ns ± 6%    ~     (p=0.397 n=4+5)
FmtFprintfInt             169ns ± 0%     176ns ± 5%  +4.14%  (p=0.016 n=4+5)
FmtFprintfIntInt          229ns ± 2%     243ns ± 6%    ~     (p=0.143 n=5+5)
FmtFprintfPrefixedInt     343ns ± 0%     350ns ± 3%  +1.92%  (p=0.048 n=5+5)
FmtFprintfFloat           400ns ± 3%     394ns ± 3%    ~     (p=0.063 n=5+5)
FmtManyArgs              1.04µs ± 0%    1.05µs ± 0%  +1.62%  (p=0.029 n=4+4)
GobDecode                13.9ms ± 4%    13.9ms ± 5%    ~     (p=1.000 n=5+5)
GobEncode                10.6ms ± 4%    10.6ms ± 5%    ~     (p=0.421 n=5+5)
Gzip                      567ms ± 1%     563ms ± 4%    ~     (p=0.548 n=5+5)
Gunzip                   60.2ms ± 1%    60.4ms ± 0%    ~     (p=0.056 n=5+5)
HTTPClientServer          114µs ± 4%     108µs ± 7%    ~     (p=0.095 n=5+5)
JSONEncode               18.4ms ± 2%    17.8ms ± 2%  -3.06%  (p=0.016 n=5+5)
JSONDecode                105ms ± 1%     103ms ± 2%    ~     (p=0.056 n=5+5)
Mandelbrot200            5.48ms ± 0%    5.49ms ± 0%    ~     (p=0.841 n=5+5)
GoParse                  6.05ms ± 1%    6.05ms ± 2%    ~     (p=1.000 n=5+5)
RegexpMatchEasy0_32       143ns ± 1%     146ns ± 4%  +2.10%  (p=0.048 n=4+5)
RegexpMatchEasy0_1K       499ns ± 1%     492ns ± 2%    ~     (p=0.079 n=5+5)
RegexpMatchEasy1_32       137ns ± 0%     136ns ± 1%  -0.73%  (p=0.016 n=4+5)
RegexpMatchEasy1_1K       826ns ± 4%     823ns ± 2%    ~     (p=0.841 n=5+5)
RegexpMatchMedium_32      224ns ± 5%     233ns ± 8%    ~     (p=0.119 n=5+5)
RegexpMatchMedium_1K     59.6µs ± 0%    59.3µs ± 1%  -0.66%  (p=0.016 n=4+5)
RegexpMatchHard_32       3.29µs ± 3%    3.26µs ± 1%    ~     (p=0.889 n=5+5)
RegexpMatchHard_1K       98.8µs ± 2%    99.0µs ± 0%    ~     (p=0.690 n=5+5)
Revcomp                   1.02s ± 1%     1.01s ± 1%    ~     (p=0.095 n=5+5)
Template                  135ms ± 5%     131ms ± 1%    ~     (p=0.151 n=5+5)
TimeParse                 591ns ± 0%     593ns ± 0%  +0.20%  (p=0.048 n=5+5)
TimeFormat                655ns ± 2%     607ns ± 0%  -7.42%  (p=0.016 n=5+4)
[Geo mean]               93.5µs         93.8µs       +0.23%

name                   old speed      new speed      delta
GobDecode              55.1MB/s ± 4%  55.1MB/s ± 4%    ~     (p=1.000 n=5+5)
GobEncode              72.4MB/s ± 4%  72.3MB/s ± 5%    ~     (p=0.421 n=5+5)
Gzip                   34.2MB/s ± 1%  34.5MB/s ± 4%    ~     (p=0.548 n=5+5)
Gunzip                  322MB/s ± 1%   321MB/s ± 0%    ~     (p=0.056 n=5+5)
JSONEncode              106MB/s ± 2%   109MB/s ± 2%  +3.16%  (p=0.016 n=5+5)
JSONDecode             18.5MB/s ± 1%  18.8MB/s ± 2%    ~     (p=0.056 n=5+5)
GoParse                9.57MB/s ± 1%  9.57MB/s ± 2%    ~     (p=0.952 n=5+5)
RegexpMatchEasy0_32     223MB/s ± 1%   221MB/s ± 0%  -1.10%  (p=0.029 n=4+4)
RegexpMatchEasy0_1K    2.05GB/s ± 1%  2.08GB/s ± 2%    ~     (p=0.095 n=5+5)
RegexpMatchEasy1_32     232MB/s ± 0%   234MB/s ± 1%  +0.76%  (p=0.016 n=4+5)
RegexpMatchEasy1_1K    1.24GB/s ± 4%  1.24GB/s ± 2%    ~     (p=0.841 n=5+5)
RegexpMatchMedium_32   4.45MB/s ± 5%  4.20MB/s ± 1%  -5.63%  (p=0.000 n=5+4)
RegexpMatchMedium_1K   17.2MB/s ± 0%  17.3MB/s ± 1%  +0.66%  (p=0.016 n=4+5)
RegexpMatchHard_32     9.73MB/s ± 3%  9.83MB/s ± 1%    ~     (p=0.889 n=5+5)
RegexpMatchHard_1K     10.4MB/s ± 2%  10.3MB/s ± 0%    ~     (p=0.635 n=5+5)
Revcomp                 249MB/s ± 1%   252MB/s ± 1%    ~     (p=0.095 n=5+5)
Template               14.4MB/s ± 4%  14.8MB/s ± 1%    ~     (p=0.151 n=5+5)
[Geo mean]             62.1MB/s       62.3MB/s       +0.34%

Fixes #10108

Change-Id: I79038f3c4c2ff874c136053d1a2b1c8a5a9cfac5
Reviewed-on: https://go-review.googlesource.com/c/118796
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-11-07 16:12:02 +00:00
Cherry Zhang 441cb988b4 cmd/internal/obj/arm64: fix encoding of 32-bit negated logical instructions
32-bit negated logical instructions (BICW, ORNW, EONW) with
constants were mis-encoded, because they were missing in the
cases where we handle 32-bit logical instructions. This CL
adds the missing cases.

Fixes #28548

Change-Id: I3d6acde7d3b72bb7d3d5d00a9df698a72c806ad5
Reviewed-on: https://go-review.googlesource.com/c/147077
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Ben Shi <powerman1st@163.com>
Reviewed-by: Ben Shi <powerman1st@163.com>
2018-11-03 01:46:55 +00:00
fanzha02 86ce1cb060 cmd/internal/obj/arm64: reclassify 32-bit/64-bit constants
Current assembler saves constants in Offset which type is int64,
causing 32-bit constants have a incorrect class. This CL reclassifies
constants when opcodes are 32-bit variant, like MOVW, ANDW and
ADDW, etc. Besides, this CL encodes some constants of ADDCON class
as MOVs instructions.

This CL changes the assembler behavior as follows.

1. go assembler ADDW $MOVCON, Rn, Rd
   previous version: MOVD $MOVCON, Rtmp; ADDW Rtmp, Rn, Rd
   current version: MOVW $MOVCON, Rtmp; ADDW Rtmp, Rn, Rd

2. go assembly MOVW $0xaaaaffff, R1
   previous version: treats $0xaaaaffff as VCON, encodes it as MOVW 0x994, R1 (loads it from pool).
   current version: treats $0xaaaaffff as MOVCON, and encodes it into MOVW instructions.

3. go assembly MOVD $0x210000, R1
   previous version: treats $0x210000 as ADDCON, loads it from pool
   current version: treats $0x210000 as MOVCON, and encodes it into MOVD instructions.

Add the test cases.

1. Binary size before/after.
binary                          size change
pkg/linux_arm64                 -1.534KB
pkg/tool/linux_arm64            -0.718KB
go                              -0.32KB
gofmt                           no change

2. go1 benchmark result.
name                     old time/op    new time/op    delta
BinaryTree17-8              6.26s ± 1%     6.28s ± 1%     ~     (p=0.105 n=10+10)
Fannkuch11-8                5.40s ± 0%     5.39s ± 0%   -0.29%  (p=0.028 n=9+10)
FmtFprintfEmpty-8          94.5ns ± 0%    95.0ns ± 0%   +0.51%  (p=0.000 n=10+9)
FmtFprintfString-8          163ns ± 1%     159ns ± 1%   -2.06%  (p=0.000 n=10+9)
FmtFprintfInt-8             200ns ± 1%     196ns ± 1%   -1.99%  (p=0.000 n=9+10)
FmtFprintfIntInt-8          292ns ± 3%     284ns ± 1%   -2.87%  (p=0.001 n=10+9)
FmtFprintfPrefixedInt-8     422ns ± 1%     420ns ± 1%   -0.59%  (p=0.015 n=10+10)
FmtFprintfFloat-8           458ns ± 0%     463ns ± 1%   +1.19%  (p=0.000 n=9+10)
FmtManyArgs-8              1.37µs ± 1%    1.35µs ± 1%   -1.85%  (p=0.000 n=10+10)
GobDecode-8                15.5ms ± 1%    15.3ms ± 1%   -1.82%  (p=0.000 n=10+10)
GobEncode-8                11.7ms ± 5%    11.7ms ± 2%     ~     (p=0.549 n=10+9)
Gzip-8                      622ms ± 0%     624ms ± 0%   +0.23%  (p=0.000 n=10+9)
Gunzip-8                   73.6ms ± 0%    73.8ms ± 1%     ~     (p=0.077 n=9+9)
HTTPClientServer-8          115µs ± 1%     115µs ± 1%     ~     (p=0.796 n=10+10)
JSONEncode-8               31.1ms ± 2%    28.7ms ± 1%   -7.98%  (p=0.000 n=10+9)
JSONDecode-8                145ms ± 0%     145ms ± 1%     ~     (p=0.447 n=9+10)
Mandelbrot200-8            9.67ms ± 0%    9.60ms ± 0%   -0.76%  (p=0.000 n=9+9)
GoParse-8                  7.56ms ± 1%    7.58ms ± 0%   +0.21%  (p=0.035 n=10+9)
RegexpMatchEasy0_32-8       208ns ±10%     222ns ± 0%     ~     (p=0.531 n=10+6)
RegexpMatchEasy0_1K-8       699ns ± 4%     694ns ± 4%     ~     (p=0.868 n=10+10)
RegexpMatchEasy1_32-8       186ns ± 8%     190ns ±12%     ~     (p=0.955 n=10+10)
RegexpMatchEasy1_1K-8      1.13µs ± 1%    1.05µs ± 2%   -6.64%  (p=0.000 n=10+10)
RegexpMatchMedium_32-8      316ns ± 7%     288ns ± 1%   -8.68%  (p=0.000 n=10+7)
RegexpMatchMedium_1K-8     90.2µs ± 0%    85.5µs ± 2%   -5.19%  (p=0.000 n=10+10)
RegexpMatchHard_32-8       5.53µs ± 0%    3.90µs ± 0%  -29.52%  (p=0.000 n=10+10)
RegexpMatchHard_1K-8        119µs ± 0%     124µs ± 0%   +4.29%  (p=0.000 n=9+10)
Revcomp-8                   1.07s ± 0%     1.07s ± 0%     ~     (p=0.094 n=9+9)
Template-8                  162ms ± 1%     160ms ± 2%     ~     (p=0.089 n=10+10)
TimeParse-8                 756ns ± 2%     763ns ± 1%     ~     (p=0.158 n=10+10)
TimeFormat-8                758ns ± 1%     746ns ± 1%   -1.52%  (p=0.000 n=10+10)

name                     old speed      new speed      delta
GobDecode-8              49.4MB/s ± 1%  50.3MB/s ± 1%   +1.84%  (p=0.000 n=10+10)
GobEncode-8              65.6MB/s ± 5%  65.4MB/s ± 2%     ~     (p=0.549 n=10+9)
Gzip-8                   31.2MB/s ± 0%  31.1MB/s ± 0%   -0.24%  (p=0.000 n=9+9)
Gunzip-8                  264MB/s ± 0%   263MB/s ± 1%     ~     (p=0.073 n=9+9)
JSONEncode-8             62.3MB/s ± 2%  67.7MB/s ± 1%   +8.67%  (p=0.000 n=10+9)
JSONDecode-8             13.4MB/s ± 0%  13.4MB/s ± 1%     ~     (p=0.508 n=9+10)
GoParse-8                7.66MB/s ± 1%  7.64MB/s ± 0%   -0.23%  (p=0.049 n=10+9)
RegexpMatchEasy0_32-8     154MB/s ± 9%   143MB/s ± 3%     ~     (p=0.303 n=10+7)
RegexpMatchEasy0_1K-8    1.46GB/s ± 4%  1.47GB/s ± 4%     ~     (p=0.912 n=10+10)
RegexpMatchEasy1_32-8     172MB/s ± 9%   170MB/s ±12%     ~     (p=0.971 n=10+10)
RegexpMatchEasy1_1K-8     908MB/s ± 1%   972MB/s ± 2%   +7.12%  (p=0.000 n=10+10)
RegexpMatchMedium_32-8   3.17MB/s ± 7%  3.46MB/s ± 1%   +9.14%  (p=0.000 n=10+7)
RegexpMatchMedium_1K-8   11.3MB/s ± 0%  12.0MB/s ± 2%   +5.51%  (p=0.000 n=10+10)
RegexpMatchHard_32-8     5.78MB/s ± 0%  8.21MB/s ± 0%  +41.93%  (p=0.000 n=9+10)
RegexpMatchHard_1K-8     8.62MB/s ± 0%  8.27MB/s ± 0%   -4.11%  (p=0.000 n=9+10)
Revcomp-8                 237MB/s ± 0%   237MB/s ± 0%     ~     (p=0.081 n=9+9)
Template-8               12.0MB/s ± 1%  12.1MB/s ± 2%     ~     (p=0.072 n=10+10)

Change-Id: I080801f520366b42d5f9699954bd33106976a81b
Reviewed-on: https://go-review.googlesource.com/c/120661
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-10-22 14:23:23 +00:00