Commit Graph

797 Commits

Author SHA1 Message Date
Richard Musiol 6dd70fc5e3 all: add support for synchronous callbacks to js/wasm
With this change, callbacks returned by syscall/js.NewCallback
get executed synchronously. This is necessary for the APIs of
many JavaScript libraries.

A callback triggered during a call from Go to JavaScript gets executed
on the same goroutine. A callback triggered by JavaScript's event loop
gets executed on an extra goroutine.

Fixes #26045
Fixes #27441

Change-Id: I591b9e85ab851cef0c746c18eba95fb02ea9e85b
Reviewed-on: https://go-review.googlesource.com/c/142004
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-11-10 11:57:17 +00:00
Keith Randall a3b01440fe syscall: implement syscalls on Darwin using libSystem
There are still some references to the bare Syscall functions
in the stdlib. I will root those out in a following CL.
(This CL is big enough as it is.)
Most are in vendor directories:

cmd/vendor/golang.org/x/sys/unix/
vendor/golang_org/x/net/route/syscall.go
syscall/bpf_bsd.go
syscall/exec_unix.go
syscall/flock.go

Update #17490

Change-Id: I69ab707811530c26b652b291cadee92f5bf5c1a4
Reviewed-on: https://go-review.googlesource.com/c/141639
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Elias Naur <elias.naur@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-11-07 20:27:01 +00:00
fanzha02 644ddaa842 cmd/internal/obj/arm64: encode large constants into MOVZ/MOVN and MOVK instructions
Current assembler gets large constants from constant pool, this CL
gets rid of the pool by using MOVZ/MOVN and MOVK to load large
constants.

This CL changes the assembler behavior as follows.

1. go assembly  1, MOVD $0x1111222233334444, R1
                2, MOVD $0x1111ffff1111ffff, R1
   previous version: MOVD 0x9a4, R1 (loads constant from pool).
   optimized version: 1, MOVD $0x4444, R1; MOVK $(0x3333<<16), R1; MOVK $(0x2222<<32), R1;
   MOVK $(0x1111<<48), R1. 2, MOVN $(0xeeee<<16), R1; MOVK $(0x1111<<48), R1.

Add test cases, and below are binary size comparison and bechmark results.

1. Binary size before/after
binary                 size change
pkg/linux_arm64        +25.4KB
pkg/tool/linux_arm64   -2.9KB
go                     -2KB
gofmt                  no change

2. compiler benchmark.
name       old time/op       new time/op       delta
Template         574ms ±21%        577ms ±14%     ~     (p=0.853 n=10+10)
Unicode          327ms ±29%        353ms ±23%     ~     (p=0.360 n=10+8)
GoTypes          1.97s ± 8%        2.04s ±11%     ~     (p=0.143 n=10+10)
Compiler         9.13s ± 9%        9.25s ± 8%     ~     (p=0.684 n=10+10)
SSA              29.2s ± 5%        27.0s ± 4%   -7.40%  (p=0.000 n=10+10)
Flate            402ms ±40%        308ms ± 6%  -23.29%  (p=0.004 n=10+10)
GoParser         470ms ±26%        382ms ±10%  -18.82%  (p=0.000 n=9+10)
Reflect          1.36s ±16%        1.17s ± 7%  -13.92%  (p=0.001 n=9+10)
Tar              561ms ±19%        466ms ±15%  -17.08%  (p=0.000 n=9+10)
XML              745ms ±20%        679ms ±20%     ~     (p=0.123 n=10+10)
StdCmd           35.5s ± 6%        37.2s ± 3%   +4.81%  (p=0.001 n=9+8)

name       old user-time/op  new user-time/op  delta
Template         625ms ±14%        660ms ±18%     ~     (p=0.343 n=10+10)
Unicode          355ms ±10%        373ms ±20%     ~     (p=0.346 n=9+10)
GoTypes          2.39s ± 8%        2.37s ± 5%     ~     (p=0.897 n=10+10)
Compiler         11.1s ± 4%        11.4s ± 2%   +2.63%  (p=0.010 n=10+9)
SSA              35.4s ± 3%        34.9s ± 2%     ~     (p=0.113 n=10+9)
Flate            402ms ±13%        371ms ±30%     ~     (p=0.089 n=10+9)
GoParser         513ms ± 8%        489ms ±24%   -4.76%  (p=0.039 n=9+9)
Reflect          1.52s ±12%        1.41s ± 5%   -7.32%  (p=0.001 n=9+10)
Tar              607ms ±10%        558ms ± 8%   -7.96%  (p=0.009 n=9+10)
XML              828ms ±10%        789ms ±12%     ~     (p=0.059 n=10+10)

name       old text-bytes    new text-bytes    delta
HelloSize        714kB ± 0%        712kB ± 0%   -0.23%  (p=0.000 n=10+10)
CmdGoSize       8.26MB ± 0%       8.25MB ± 0%   -0.14%  (p=0.000 n=10+10)

name       old data-bytes    new data-bytes    delta
HelloSize       10.5kB ± 0%       10.5kB ± 0%     ~     (all equal)
CmdGoSize        258kB ± 0%        258kB ± 0%     ~     (all equal)

name       old bss-bytes     new bss-bytes     delta
HelloSize        125kB ± 0%        125kB ± 0%     ~     (all equal)
CmdGoSize        146kB ± 0%        146kB ± 0%     ~     (all equal)

name       old exe-bytes     new exe-bytes     delta
HelloSize       1.18MB ± 0%       1.18MB ± 0%     ~     (all equal)
CmdGoSize       11.2MB ± 0%       11.2MB ± 0%   -0.13%  (p=0.000 n=10+10)

3. go1 benckmark.
name                   old time/op    new time/op    delta
BinaryTree17              6.60s ±18%     7.36s ±22%    ~     (p=0.222 n=5+5)
Fannkuch11                4.04s ± 0%     4.05s ± 0%    ~     (p=0.421 n=5+5)
FmtFprintfEmpty          91.8ns ±14%    91.2ns ± 9%    ~     (p=0.667 n=5+5)
FmtFprintfString          145ns ± 0%     151ns ± 6%    ~     (p=0.397 n=4+5)
FmtFprintfInt             169ns ± 0%     176ns ± 5%  +4.14%  (p=0.016 n=4+5)
FmtFprintfIntInt          229ns ± 2%     243ns ± 6%    ~     (p=0.143 n=5+5)
FmtFprintfPrefixedInt     343ns ± 0%     350ns ± 3%  +1.92%  (p=0.048 n=5+5)
FmtFprintfFloat           400ns ± 3%     394ns ± 3%    ~     (p=0.063 n=5+5)
FmtManyArgs              1.04µs ± 0%    1.05µs ± 0%  +1.62%  (p=0.029 n=4+4)
GobDecode                13.9ms ± 4%    13.9ms ± 5%    ~     (p=1.000 n=5+5)
GobEncode                10.6ms ± 4%    10.6ms ± 5%    ~     (p=0.421 n=5+5)
Gzip                      567ms ± 1%     563ms ± 4%    ~     (p=0.548 n=5+5)
Gunzip                   60.2ms ± 1%    60.4ms ± 0%    ~     (p=0.056 n=5+5)
HTTPClientServer          114µs ± 4%     108µs ± 7%    ~     (p=0.095 n=5+5)
JSONEncode               18.4ms ± 2%    17.8ms ± 2%  -3.06%  (p=0.016 n=5+5)
JSONDecode                105ms ± 1%     103ms ± 2%    ~     (p=0.056 n=5+5)
Mandelbrot200            5.48ms ± 0%    5.49ms ± 0%    ~     (p=0.841 n=5+5)
GoParse                  6.05ms ± 1%    6.05ms ± 2%    ~     (p=1.000 n=5+5)
RegexpMatchEasy0_32       143ns ± 1%     146ns ± 4%  +2.10%  (p=0.048 n=4+5)
RegexpMatchEasy0_1K       499ns ± 1%     492ns ± 2%    ~     (p=0.079 n=5+5)
RegexpMatchEasy1_32       137ns ± 0%     136ns ± 1%  -0.73%  (p=0.016 n=4+5)
RegexpMatchEasy1_1K       826ns ± 4%     823ns ± 2%    ~     (p=0.841 n=5+5)
RegexpMatchMedium_32      224ns ± 5%     233ns ± 8%    ~     (p=0.119 n=5+5)
RegexpMatchMedium_1K     59.6µs ± 0%    59.3µs ± 1%  -0.66%  (p=0.016 n=4+5)
RegexpMatchHard_32       3.29µs ± 3%    3.26µs ± 1%    ~     (p=0.889 n=5+5)
RegexpMatchHard_1K       98.8µs ± 2%    99.0µs ± 0%    ~     (p=0.690 n=5+5)
Revcomp                   1.02s ± 1%     1.01s ± 1%    ~     (p=0.095 n=5+5)
Template                  135ms ± 5%     131ms ± 1%    ~     (p=0.151 n=5+5)
TimeParse                 591ns ± 0%     593ns ± 0%  +0.20%  (p=0.048 n=5+5)
TimeFormat                655ns ± 2%     607ns ± 0%  -7.42%  (p=0.016 n=5+4)
[Geo mean]               93.5µs         93.8µs       +0.23%

name                   old speed      new speed      delta
GobDecode              55.1MB/s ± 4%  55.1MB/s ± 4%    ~     (p=1.000 n=5+5)
GobEncode              72.4MB/s ± 4%  72.3MB/s ± 5%    ~     (p=0.421 n=5+5)
Gzip                   34.2MB/s ± 1%  34.5MB/s ± 4%    ~     (p=0.548 n=5+5)
Gunzip                  322MB/s ± 1%   321MB/s ± 0%    ~     (p=0.056 n=5+5)
JSONEncode              106MB/s ± 2%   109MB/s ± 2%  +3.16%  (p=0.016 n=5+5)
JSONDecode             18.5MB/s ± 1%  18.8MB/s ± 2%    ~     (p=0.056 n=5+5)
GoParse                9.57MB/s ± 1%  9.57MB/s ± 2%    ~     (p=0.952 n=5+5)
RegexpMatchEasy0_32     223MB/s ± 1%   221MB/s ± 0%  -1.10%  (p=0.029 n=4+4)
RegexpMatchEasy0_1K    2.05GB/s ± 1%  2.08GB/s ± 2%    ~     (p=0.095 n=5+5)
RegexpMatchEasy1_32     232MB/s ± 0%   234MB/s ± 1%  +0.76%  (p=0.016 n=4+5)
RegexpMatchEasy1_1K    1.24GB/s ± 4%  1.24GB/s ± 2%    ~     (p=0.841 n=5+5)
RegexpMatchMedium_32   4.45MB/s ± 5%  4.20MB/s ± 1%  -5.63%  (p=0.000 n=5+4)
RegexpMatchMedium_1K   17.2MB/s ± 0%  17.3MB/s ± 1%  +0.66%  (p=0.016 n=4+5)
RegexpMatchHard_32     9.73MB/s ± 3%  9.83MB/s ± 1%    ~     (p=0.889 n=5+5)
RegexpMatchHard_1K     10.4MB/s ± 2%  10.3MB/s ± 0%    ~     (p=0.635 n=5+5)
Revcomp                 249MB/s ± 1%   252MB/s ± 1%    ~     (p=0.095 n=5+5)
Template               14.4MB/s ± 4%  14.8MB/s ± 1%    ~     (p=0.151 n=5+5)
[Geo mean]             62.1MB/s       62.3MB/s       +0.34%

Fixes #10108

Change-Id: I79038f3c4c2ff874c136053d1a2b1c8a5a9cfac5
Reviewed-on: https://go-review.googlesource.com/c/118796
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-11-07 16:12:02 +00:00
Cherry Zhang 35c0554293 cmd/asm: rename R18 to R18_PLATFORM on ARM64
In ARM64 ABI, R18 is the "platform register", the use of which is
OS specific. The OS could choose to reserve this register. In
practice, it seems fine to use R18 on Linux but not on darwin (iOS).

Rename R18 to R18_PLATFORM to prevent accidental use. There is no
R18 usage within the standard library (besides tests, which are
updated).

Fixes #26110

Change-Id: Icef7b9549e2049db1df307a0180a3c90a12d7a84
Reviewed-on: https://go-review.googlesource.com/c/147218
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-11-06 20:10:12 +00:00
Austin Clements c0281afd87 cmd/internal/obj: don't dedup symbols in WriteObjFile
Currently, WriteObjFile deduplicates symbols by name. This is a
strange and unexpected place to do this. But, worse, there's no
checking that it's reasonable to deduplicate two symbols, so this
makes it incredibly easy to mask errors involving duplicate symbols.
Dealing with duplicate symbols is better left to the linker. We're
also about to introduce multiple symbols with the same name but
different ABIs/versions, which would make this deduplication more
complicated. We just removed the only part of the compiler that
actually depended on this behavior.

This CL removes symbol deduplication from WriteObjFile, since it is no
longer needed.

For #27539.

Change-Id: I650c550e46e83f95c67cb6c6646f9b2f7f10df30
Reviewed-on: https://go-review.googlesource.com/c/146558
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-11-03 15:12:58 +00:00
Austin Clements 15265ec421 cmd/compile: avoid duplicate GC bitmap symbols
Currently, liveness produces a distinct obj.LSym for each GC bitmap
for each function. These are then named by content hash and only
ultimately deduplicated by WriteObjFile.

For various reasons (see next commit), we want to remove this
deduplication behavior from WriteObjFile. Furthermore, it's
inefficient to produce these duplicate symbols in the first place.

GC bitmaps are the only source of duplicate symbols in the compiler.
This commit eliminates these duplicate symbols by declaring them in
the Ctxt symbol hash just like every other obj.LSym. As a result, all
GC bitmaps with the same content now refer to the same obj.LSym.

The next commit will remove deduplication from WriteObjFile.

For #27539.

Change-Id: I4f15e3d99530122cdf473b7a838c69ef5f79db59
Reviewed-on: https://go-review.googlesource.com/c/146557
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-11-03 15:12:34 +00:00
Cherry Zhang 441cb988b4 cmd/internal/obj/arm64: fix encoding of 32-bit negated logical instructions
32-bit negated logical instructions (BICW, ORNW, EONW) with
constants were mis-encoded, because they were missing in the
cases where we handle 32-bit logical instructions. This CL
adds the missing cases.

Fixes #28548

Change-Id: I3d6acde7d3b72bb7d3d5d00a9df698a72c806ad5
Reviewed-on: https://go-review.googlesource.com/c/147077
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Ben Shi <powerman1st@163.com>
Reviewed-by: Ben Shi <powerman1st@163.com>
2018-11-03 01:46:55 +00:00
Clément Chigot 85525c56ab all: skip unsupported tests on AIX
This commit skips tests which aren't yet supported on AIX.

nosplit.go is disabled because stackGuardMultiplier is increased for
syscalls.

Change-Id: Ib5ff9a4539c7646bcb6caee159f105ff8a160ad7
Reviewed-on: https://go-review.googlesource.com/c/146939
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
2018-11-02 16:12:08 +00:00
bill_ofarrell 3f3142ad99 cmd/asm: add s390x VMSLG instruction variants
VMSLG has three variants on z14 and later machines. These variants are used in "limbified" squaring:
VMSLEG: Even Shift Indication -- the even-indexed intermediate result is doubled
VMSLOG: Odd Shift Indication -- the odd-indexed intermediate result is doubled
VMSLEOG: Even and Odd Shift Indication -- both intermediate results are doubled
Limbified squaring is very useful for high performance cryptographic algorithms, such as
elliptic curve. This change allows these instructions to be used in Go assembly.

Change-Id: Iaad577b07320205539f99b3cb37a2a984882721b
Reviewed-on: https://go-review.googlesource.com/c/145180
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2018-10-29 09:54:51 +00:00
Clément Chigot 75a0b9dbf3 cmd: add DWARF64 support for aix port
This commit adds support for DWARF 64bits which is needed for AIX
operating system.

It also adds the save of each compilation unit's size which will be
used during XCOFF generation in a following patch.

Updates: #25893

Change-Id: Icdd0a4dd02bc0a9f0df319c351fb1db944610015
Reviewed-on: https://go-review.googlesource.com/c/138729
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-10-24 13:36:43 +00:00
Lynn Boger bdba55653f cmd/asm/internal,cmd/internal/obj/ppc64: add alignment directive to asm for ppc64x
This adds support for an alignment directive that can be used
within Go asm to indicate preferred code alignment for ppc64x.
This is intended to be used with loops to improve
performance.

This change only adds the directive and aligns the code based
on it. Follow up changes will modify asm functions for
ppc64x that benefit from preferred alignment.

Fixes #14935

Here is one example of the improvement in memmove when the
directive is used on the loops in the code:

Memmove/64      8.74ns ± 0%    8.64ns ± 0%   -1.19%  (p=0.000 n=8+8)
Memmove/128     11.5ns ± 0%    11.0ns ± 0%   -4.35%  (p=0.000 n=8+8)
Memmove/256     23.0ns ± 0%    15.3ns ± 0%  -33.48%  (p=0.000 n=8+8)
Memmove/512     31.7ns ± 0%    31.8ns ± 0%   +0.32%  (p=0.000 n=8+8)
Memmove/1024    52.3ns ± 0%    43.9ns ± 0%  -16.10%  (p=0.000 n=8+8)
Memmove/2048    93.2ns ± 0%    76.2ns ± 0%  -18.24%  (p=0.000 n=8+8)
Memmove/4096     174ns ± 0%     141ns ± 0%  -18.97%  (p=0.000 n=8+8)

Change-Id: I200d77e923dd5d78c22fe3f8eb142a8fbaff57bf
Reviewed-on: https://go-review.googlesource.com/c/144218
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-10-23 20:37:29 +00:00
fanzha02 86ce1cb060 cmd/internal/obj/arm64: reclassify 32-bit/64-bit constants
Current assembler saves constants in Offset which type is int64,
causing 32-bit constants have a incorrect class. This CL reclassifies
constants when opcodes are 32-bit variant, like MOVW, ANDW and
ADDW, etc. Besides, this CL encodes some constants of ADDCON class
as MOVs instructions.

This CL changes the assembler behavior as follows.

1. go assembler ADDW $MOVCON, Rn, Rd
   previous version: MOVD $MOVCON, Rtmp; ADDW Rtmp, Rn, Rd
   current version: MOVW $MOVCON, Rtmp; ADDW Rtmp, Rn, Rd

2. go assembly MOVW $0xaaaaffff, R1
   previous version: treats $0xaaaaffff as VCON, encodes it as MOVW 0x994, R1 (loads it from pool).
   current version: treats $0xaaaaffff as MOVCON, and encodes it into MOVW instructions.

3. go assembly MOVD $0x210000, R1
   previous version: treats $0x210000 as ADDCON, loads it from pool
   current version: treats $0x210000 as MOVCON, and encodes it into MOVD instructions.

Add the test cases.

1. Binary size before/after.
binary                          size change
pkg/linux_arm64                 -1.534KB
pkg/tool/linux_arm64            -0.718KB
go                              -0.32KB
gofmt                           no change

2. go1 benchmark result.
name                     old time/op    new time/op    delta
BinaryTree17-8              6.26s ± 1%     6.28s ± 1%     ~     (p=0.105 n=10+10)
Fannkuch11-8                5.40s ± 0%     5.39s ± 0%   -0.29%  (p=0.028 n=9+10)
FmtFprintfEmpty-8          94.5ns ± 0%    95.0ns ± 0%   +0.51%  (p=0.000 n=10+9)
FmtFprintfString-8          163ns ± 1%     159ns ± 1%   -2.06%  (p=0.000 n=10+9)
FmtFprintfInt-8             200ns ± 1%     196ns ± 1%   -1.99%  (p=0.000 n=9+10)
FmtFprintfIntInt-8          292ns ± 3%     284ns ± 1%   -2.87%  (p=0.001 n=10+9)
FmtFprintfPrefixedInt-8     422ns ± 1%     420ns ± 1%   -0.59%  (p=0.015 n=10+10)
FmtFprintfFloat-8           458ns ± 0%     463ns ± 1%   +1.19%  (p=0.000 n=9+10)
FmtManyArgs-8              1.37µs ± 1%    1.35µs ± 1%   -1.85%  (p=0.000 n=10+10)
GobDecode-8                15.5ms ± 1%    15.3ms ± 1%   -1.82%  (p=0.000 n=10+10)
GobEncode-8                11.7ms ± 5%    11.7ms ± 2%     ~     (p=0.549 n=10+9)
Gzip-8                      622ms ± 0%     624ms ± 0%   +0.23%  (p=0.000 n=10+9)
Gunzip-8                   73.6ms ± 0%    73.8ms ± 1%     ~     (p=0.077 n=9+9)
HTTPClientServer-8          115µs ± 1%     115µs ± 1%     ~     (p=0.796 n=10+10)
JSONEncode-8               31.1ms ± 2%    28.7ms ± 1%   -7.98%  (p=0.000 n=10+9)
JSONDecode-8                145ms ± 0%     145ms ± 1%     ~     (p=0.447 n=9+10)
Mandelbrot200-8            9.67ms ± 0%    9.60ms ± 0%   -0.76%  (p=0.000 n=9+9)
GoParse-8                  7.56ms ± 1%    7.58ms ± 0%   +0.21%  (p=0.035 n=10+9)
RegexpMatchEasy0_32-8       208ns ±10%     222ns ± 0%     ~     (p=0.531 n=10+6)
RegexpMatchEasy0_1K-8       699ns ± 4%     694ns ± 4%     ~     (p=0.868 n=10+10)
RegexpMatchEasy1_32-8       186ns ± 8%     190ns ±12%     ~     (p=0.955 n=10+10)
RegexpMatchEasy1_1K-8      1.13µs ± 1%    1.05µs ± 2%   -6.64%  (p=0.000 n=10+10)
RegexpMatchMedium_32-8      316ns ± 7%     288ns ± 1%   -8.68%  (p=0.000 n=10+7)
RegexpMatchMedium_1K-8     90.2µs ± 0%    85.5µs ± 2%   -5.19%  (p=0.000 n=10+10)
RegexpMatchHard_32-8       5.53µs ± 0%    3.90µs ± 0%  -29.52%  (p=0.000 n=10+10)
RegexpMatchHard_1K-8        119µs ± 0%     124µs ± 0%   +4.29%  (p=0.000 n=9+10)
Revcomp-8                   1.07s ± 0%     1.07s ± 0%     ~     (p=0.094 n=9+9)
Template-8                  162ms ± 1%     160ms ± 2%     ~     (p=0.089 n=10+10)
TimeParse-8                 756ns ± 2%     763ns ± 1%     ~     (p=0.158 n=10+10)
TimeFormat-8                758ns ± 1%     746ns ± 1%   -1.52%  (p=0.000 n=10+10)

name                     old speed      new speed      delta
GobDecode-8              49.4MB/s ± 1%  50.3MB/s ± 1%   +1.84%  (p=0.000 n=10+10)
GobEncode-8              65.6MB/s ± 5%  65.4MB/s ± 2%     ~     (p=0.549 n=10+9)
Gzip-8                   31.2MB/s ± 0%  31.1MB/s ± 0%   -0.24%  (p=0.000 n=9+9)
Gunzip-8                  264MB/s ± 0%   263MB/s ± 1%     ~     (p=0.073 n=9+9)
JSONEncode-8             62.3MB/s ± 2%  67.7MB/s ± 1%   +8.67%  (p=0.000 n=10+9)
JSONDecode-8             13.4MB/s ± 0%  13.4MB/s ± 1%     ~     (p=0.508 n=9+10)
GoParse-8                7.66MB/s ± 1%  7.64MB/s ± 0%   -0.23%  (p=0.049 n=10+9)
RegexpMatchEasy0_32-8     154MB/s ± 9%   143MB/s ± 3%     ~     (p=0.303 n=10+7)
RegexpMatchEasy0_1K-8    1.46GB/s ± 4%  1.47GB/s ± 4%     ~     (p=0.912 n=10+10)
RegexpMatchEasy1_32-8     172MB/s ± 9%   170MB/s ±12%     ~     (p=0.971 n=10+10)
RegexpMatchEasy1_1K-8     908MB/s ± 1%   972MB/s ± 2%   +7.12%  (p=0.000 n=10+10)
RegexpMatchMedium_32-8   3.17MB/s ± 7%  3.46MB/s ± 1%   +9.14%  (p=0.000 n=10+7)
RegexpMatchMedium_1K-8   11.3MB/s ± 0%  12.0MB/s ± 2%   +5.51%  (p=0.000 n=10+10)
RegexpMatchHard_32-8     5.78MB/s ± 0%  8.21MB/s ± 0%  +41.93%  (p=0.000 n=9+10)
RegexpMatchHard_1K-8     8.62MB/s ± 0%  8.27MB/s ± 0%   -4.11%  (p=0.000 n=9+10)
Revcomp-8                 237MB/s ± 0%   237MB/s ± 0%     ~     (p=0.081 n=9+9)
Template-8               12.0MB/s ± 1%  12.1MB/s ± 2%     ~     (p=0.072 n=10+10)

Change-Id: I080801f520366b42d5f9699954bd33106976a81b
Reviewed-on: https://go-review.googlesource.com/c/120661
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-10-22 14:23:23 +00:00
Denys Smirnov 6c631ae227 cmd/compile: in wasm, allocate approximately right number of locals for functions
Currently, WASM binary writer requests 16 int registers (locals) and
16 float registers for every function regardless of how many locals the
function uses.

This change counts the number of used registers and requests a number
of locals matching the highest register index. The change has no effect
on performance and neglectable binary size improvement, but it makes
WASM code more readable and easy to analyze.

Change-Id: Ic1079623c0d632b215c68482db909fa440892700
GitHub-Last-Rev: 184634fa91
GitHub-Pull-Request: golang/go#28116
Reviewed-on: https://go-review.googlesource.com/c/140999
Reviewed-by: Richard Musiol <neelance@gmail.com>
Run-TryBot: Richard Musiol <neelance@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-10-19 21:05:33 +00:00
Lynn Boger f776d51bf7 cmd/internal/obj/ppc64: generate float 0 more efficiently on ppc64x
This change makes use of a VSX instruction to generate the
float 0 value instead of generating a constant in memory and
loading it from there.

This uses 1 instruction instead of 2 and avoids a memory reference.
in the +0 case, uses 2 instructions in the -0 case but avoids
the memory reference.

Since this is done in the assembler for ppc64x, an update has
been made to the assembler test.

Change-Id: Ief7dddcb057bfb602f78215f6947664e8c841464
Reviewed-on: https://go-review.googlesource.com/c/139420
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2018-10-10 12:31:23 +00:00
Igor Zhilianin 04dc1b2443 all: fix a bunch of misspellings
Change-Id: I94cebca86706e072fbe3be782d3edbe0e22b9432
GitHub-Last-Rev: 8e15a40545
GitHub-Pull-Request: golang/go#28067
Reviewed-on: https://go-review.googlesource.com/c/140437
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-10-08 03:12:03 +00:00
Igor Zhilianin f90e89e675 all: fix a bunch of misspellings
Change-Id: If2954bdfc551515403706b2cd0dde94e45936e08
GitHub-Last-Rev: d4cfc41a55
GitHub-Pull-Request: golang/go#28049
Reviewed-on: https://go-review.googlesource.com/c/140299
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-10-06 15:40:03 +00:00
Tim Cooper 8aee193fb8 all: remove unneeded parentheses from package consts and vars
Change-Id: Ic7fce53c6264107c15b127d9c9ca0bec11a888ff
Reviewed-on: https://go-review.googlesource.com/c/138183
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-10-06 12:11:53 +00:00
Ben Shi 808186203b cmd/internal/obj/arm64: simplify ADD and SUB
Currently "ADD $0x123456, Rs, Rd" will load pre-stored 0x123456
from the constant pool and use it for the addition. Total 12 bytes
are cost. And so does SUB.

This CL breaks it to "ADD 0x123000, Rs, Rd" + "ADD 0x000456, Rd, Rd".
Both "0x123000" and "0x000456" can be directly encoded into the
instruction binary code. So 4 bytes are saved.

1. The total size of pkg/android_arm64 decreases about 0.3KB.

2. The go1 benchmark show little regression (excluding noise).

name                     old time/op    new time/op    delta
BinaryTree17-4              15.9s ± 0%     15.9s ± 1%  +0.10%  (p=0.044 n=29+29)
Fannkuch11-4                8.72s ± 0%     8.75s ± 0%  +0.34%  (p=0.000 n=30+24)
FmtFprintfEmpty-4           173ns ± 0%     173ns ± 0%    ~     (all equal)
FmtFprintfString-4          368ns ± 0%     368ns ± 0%    ~     (p=0.593 n=30+30)
FmtFprintfInt-4             417ns ± 0%     417ns ± 0%    ~     (all equal)
FmtFprintfIntInt-4          673ns ± 0%     661ns ± 1%  -1.70%  (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4     805ns ± 0%     805ns ± 0%  +0.10%  (p=0.011 n=30+30)
FmtFprintfFloat-4          1.09µs ± 0%    1.09µs ± 0%    ~     (p=0.125 n=30+29)
FmtManyArgs-4              2.68µs ± 0%    2.68µs ± 0%  +0.07%  (p=0.004 n=30+30)
GobDecode-4                32.9ms ± 0%    33.2ms ± 1%  +1.07%  (p=0.000 n=29+29)
GobEncode-4                29.5ms ± 0%    29.6ms ± 0%  +0.26%  (p=0.000 n=28+28)
Gzip-4                      1.38s ± 1%     1.35s ± 3%  -1.94%  (p=0.000 n=28+30)
Gunzip-4                    139ms ± 0%     139ms ± 0%  +0.10%  (p=0.000 n=28+29)
HTTPClientServer-4          745µs ± 5%     742µs ± 3%    ~     (p=0.405 n=28+29)
JSONEncode-4               49.5ms ± 1%    49.9ms ± 0%  +0.89%  (p=0.000 n=30+30)
JSONDecode-4                264ms ± 1%     264ms ± 0%  +0.25%  (p=0.001 n=30+30)
Mandelbrot200-4            16.6ms ± 0%    16.6ms ± 0%    ~     (p=0.507 n=29+29)
GoParse-4                  15.9ms ± 0%    16.0ms ± 1%  +0.91%  (p=0.002 n=23+30)
RegexpMatchEasy0_32-4       379ns ± 0%     379ns ± 0%    ~     (all equal)
RegexpMatchEasy0_1K-4      1.31µs ± 0%    1.31µs ± 0%  +0.09%  (p=0.008 n=27+30)
RegexpMatchEasy1_32-4       357ns ± 0%     358ns ± 0%  +0.28%  (p=0.000 n=28+29)
RegexpMatchEasy1_1K-4      2.04µs ± 0%    2.04µs ± 0%    ~     (p=0.850 n=30+30)
RegexpMatchMedium_32-4      587ns ± 0%     589ns ± 0%  +0.33%  (p=0.000 n=30+30)
RegexpMatchMedium_1K-4      162µs ± 0%     163µs ± 0%    ~     (p=0.351 n=30+29)
RegexpMatchHard_32-4       9.54µs ± 0%    9.60µs ± 0%  +0.59%  (p=0.000 n=28+30)
RegexpMatchHard_1K-4        287µs ± 0%     287µs ± 0%  +0.11%  (p=0.000 n=26+29)
Revcomp-4                   2.50s ± 0%     2.50s ± 0%  -0.13%  (p=0.012 n=28+27)
Template-4                  312ms ± 1%     312ms ± 1%  +0.20%  (p=0.015 n=27+30)
TimeParse-4                1.68µs ± 0%    1.68µs ± 0%  -0.35%  (p=0.000 n=30+30)
TimeFormat-4               1.66µs ± 0%    1.64µs ± 0%  -1.20%  (p=0.000 n=25+29)
[Geo mean]                  246µs          246µs       -0.00%

name                     old speed      new speed      delta
GobDecode-4              23.3MB/s ± 0%  23.1MB/s ± 1%  -1.05%  (p=0.000 n=29+29)
GobEncode-4              26.0MB/s ± 0%  25.9MB/s ± 0%  -0.25%  (p=0.000 n=29+28)
Gzip-4                   14.1MB/s ± 1%  14.4MB/s ± 3%  +1.94%  (p=0.000 n=27+30)
Gunzip-4                  139MB/s ± 0%   139MB/s ± 0%  -0.10%  (p=0.000 n=28+29)
JSONEncode-4             39.2MB/s ± 1%  38.9MB/s ± 0%  -0.88%  (p=0.000 n=30+30)
JSONDecode-4             7.37MB/s ± 0%  7.35MB/s ± 0%  -0.26%  (p=0.001 n=30+30)
GoParse-4                3.65MB/s ± 0%  3.62MB/s ± 1%  -0.86%  (p=0.001 n=23+30)
RegexpMatchEasy0_32-4    84.3MB/s ± 0%  84.3MB/s ± 0%    ~     (p=0.126 n=27+26)
RegexpMatchEasy0_1K-4     784MB/s ± 0%   783MB/s ± 0%  -0.10%  (p=0.003 n=27+30)
RegexpMatchEasy1_32-4    89.5MB/s ± 0%  89.3MB/s ± 0%  -0.20%  (p=0.000 n=27+29)
RegexpMatchEasy1_1K-4     502MB/s ± 0%   502MB/s ± 0%    ~     (p=0.858 n=30+28)
RegexpMatchMedium_32-4   1.70MB/s ± 0%  1.70MB/s ± 0%  -0.25%  (p=0.000 n=30+30)
RegexpMatchMedium_1K-4   6.30MB/s ± 0%  6.30MB/s ± 0%    ~     (all equal)
RegexpMatchHard_32-4     3.35MB/s ± 0%  3.33MB/s ± 0%  -0.47%  (p=0.000 n=30+30)
RegexpMatchHard_1K-4     3.57MB/s ± 0%  3.56MB/s ± 0%  -0.20%  (p=0.000 n=27+30)
Revcomp-4                 102MB/s ± 0%   102MB/s ± 0%  +0.14%  (p=0.008 n=28+28)
Template-4               6.23MB/s ± 0%  6.21MB/s ± 1%  -0.21%  (p=0.009 n=21+30)
[Geo mean]               24.1MB/s       24.0MB/s       -0.16%

Change-Id: Ifcef3edb667540e2d86e586c23afcfbc2cf1340b
Reviewed-on: https://go-review.googlesource.com/c/134536
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-10-04 20:26:13 +00:00
Keith Randall cbafcc55e8 cmd/compile,runtime: implement stack objects
Rework how the compiler+runtime handles stack-allocated variables
whose address is taken.

Direct references to such variables work as before. References through
pointers, however, use a new mechanism. The new mechanism is more
precise than the old "ambiguously live" mechanism. It computes liveness
at runtime based on the actual references among objects on the stack.

Each function records all of its address-taken objects in a FUNCDATA.
These are called "stack objects". The runtime then uses that
information while scanning a stack to find all of the stack objects on
a stack. It then does a mark phase on the stack objects, using all the
pointers found on the stack (and ancillary structures, like defer
records) as the root set. Only stack objects which are found to be
live during this mark phase will be scanned and thus retain any heap
objects they point to.

A subsequent CL will remove all the "ambiguously live" logic from
the compiler, so that the stack object tracing will be required.
For this CL, the stack tracing is all redundant with the current
ambiguously live logic.

Update #22350

Change-Id: Ide19f1f71a5b6ec8c4d54f8f66f0e9a98344772f
Reviewed-on: https://go-review.googlesource.com/c/134155
Reviewed-by: Austin Clements <austin@google.com>
2018-10-03 19:52:49 +00:00
Ben Shi c3304edf10 cmd/internal/obj/arm: delete unnecessary code
In the arm assembler, "AMOVW" never falls into optab
case 13, so the check "if p.As == AMOVW" is useless.

Change-Id: Iec241d5b4cffb358a1477f470619dc9a6287884a
Reviewed-on: https://go-review.googlesource.com/c/138575
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-10-03 01:05:56 +00:00
fanzha02 e7f5f3eca4 cmd/internal/obj/arm64: add error report for invalid base register
The current assembler accepts the non-integer register as the base register,
which should be an illegal combination.

Add the test cases.

Change-Id: Ia21596bbb5b1e212e34bd3a170748ae788860422
Reviewed-on: https://go-review.googlesource.com/134575
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-12 15:45:13 +00:00
fanzha02 7ab4b5586d cmd/internal/obj/arm64: add CONSTRAINED UNPREDICTABLE behavior check for some load/store
According to ARM64 manual, it is "constrained unpredictable behavior"
if the src and dst registers of some load/store instructions are same.
In order to completely prevent such unpredictable behavior, adding the
check for load/store instructions that are supported by the assembler
in the assembler.

Add test cases.

Update #25823

Change-Id: I64c14ad99ee543d778e7ec8ae6516a532293dbb3
Reviewed-on: https://go-review.googlesource.com/120660
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-06 19:42:03 +00:00
fanzha02 c430adf136 cmd/internal/obj/arm64: encode float constants into FMOVS/FMOVD instructions
Current assembler rewrites float constants to values stored in memory
except 0.0, which is not performant. This patch uses the FMOVS/FMOVD
instructions to move some available floating-point immediate constants
into SIMD&FP destination registers. These available constants can be
encoded into FMOVS/FMOVD instructions, checked by the chipfloat7() function.

go1 benchmark results.
name                     old time/op    new time/op    delta
BinaryTree17-8              6.27s ± 1%     6.27s ± 1%    ~     (p=0.762 n=10+8)
Fannkuch11-8                5.42s ± 1%     5.38s ± 0%  -0.63%  (p=0.000 n=10+10)
FmtFprintfEmpty-8          92.9ns ± 1%    93.4ns ± 0%  +0.47%  (p=0.004 n=9+8)
FmtFprintfString-8          169ns ± 2%     170ns ± 4%    ~     (p=0.378 n=10+10)
FmtFprintfInt-8             197ns ± 1%     196ns ± 1%  -0.77%  (p=0.009 n=10+9)
FmtFprintfIntInt-8          284ns ± 1%     286ns ± 1%    ~     (p=0.051 n=10+10)
FmtFprintfPrefixedInt-8     419ns ± 0%     422ns ± 1%  +0.69%  (p=0.038 n=6+10)
FmtFprintfFloat-8           458ns ± 0%     463ns ± 1%  +1.14%  (p=0.000 n=10+10)
FmtManyArgs-8              1.35µs ± 2%    1.36µs ± 1%  +0.91%  (p=0.043 n=10+10)
GobDecode-8                16.0ms ± 2%    15.5ms ± 1%   -3.39%  (p=0.000 n=10+10)
GobEncode-8                11.9ms ± 3%    11.4ms ± 1%   -3.98%  (p=0.000 n=10+9)
Gzip-8                      621ms ± 0%     625ms ± 0%   +0.59%  (p=0.000 n=9+10)
Gunzip-8                   74.0ms ± 1%    74.3ms ± 0%     ~     (p=0.059 n=9+8)
HTTPClientServer-8          116µs ± 1%     116µs ± 1%     ~     (p=0.165 n=10+10)
JSONEncode-8               29.3ms ± 1%    29.5ms ± 0%   +0.72%  (p=0.001 n=10+10)
JSONDecode-8                145ms ± 1%     148ms ± 2%   +2.06%  (p=0.000 n=10+10)
Mandelbrot200-8            9.67ms ± 0%    9.48ms ± 1%   -1.92%  (p=0.000 n=8+10)
GoParse-8                  7.55ms ± 0%    7.60ms ± 0%   +0.57%  (p=0.000 n=9+10)
RegexpMatchEasy0_32-8       234ns ± 0%     210ns ± 0%  -10.13%  (p=0.000 n=8+10)
RegexpMatchEasy0_1K-8       753ns ± 1%     729ns ± 0%   -3.17%  (p=0.000 n=10+8)
RegexpMatchEasy1_32-8       225ns ± 0%     224ns ± 0%   -0.44%  (p=0.000 n=9+9)
RegexpMatchEasy1_1K-8      1.03µs ± 0%    1.04µs ± 1%   +1.29%  (p=0.000 n=10+10)
RegexpMatchMedium_32-8      320ns ± 3%     296ns ± 6%   -7.50%  (p=0.000 n=10+10)
RegexpMatchMedium_1K-8     77.0µs ± 5%    73.6µs ± 1%     ~     (p=0.393 n=10+10)
RegexpMatchHard_32-8       3.93µs ± 0%    3.89µs ± 1%   -0.95%  (p=0.000 n=10+9)
RegexpMatchHard_1K-8        120µs ± 5%     115µs ± 1%     ~     (p=0.739 n=10+10)
Revcomp-8                   1.07s ± 0%     1.08s ± 1%   +0.63%  (p=0.000 n=10+9)
Template-8                  165ms ± 1%     163ms ± 1%   -1.05%  (p=0.001 n=8+10)
TimeParse-8                 751ns ± 1%     749ns ± 1%     ~     (p=0.209 n=10+10)
TimeFormat-8                759ns ± 1%     751ns ± 1%   -0.96%  (p=0.001 n=10+10)

name                     old speed      new speed      delta
GobDecode-8              48.0MB/s ± 2%  49.6MB/s ± 1%   +3.50%  (p=0.000 n=10+10)
GobEncode-8              64.5MB/s ± 3%  67.1MB/s ± 1%   +4.08%  (p=0.000 n=10+9)
Gzip-8                   31.2MB/s ± 0%  31.1MB/s ± 0%   -0.55%  (p=0.000 n=9+8)
Gunzip-8                  262MB/s ± 1%   261MB/s ± 0%     ~     (p=0.059 n=9+8)
JSONEncode-8             66.3MB/s ± 1%  65.8MB/s ± 0%   -0.72%  (p=0.001 n=10+10)
JSONDecode-8             13.4MB/s ± 1%  13.2MB/s ± 1%   -2.02%  (p=0.000 n=10+10)
GoParse-8                7.67MB/s ± 0%  7.63MB/s ± 0%   -0.57%  (p=0.000 n=9+10)
RegexpMatchEasy0_32-8     136MB/s ± 0%   152MB/s ± 0%  +11.45%  (p=0.000 n=10+10)
RegexpMatchEasy0_1K-8    1.36GB/s ± 1%  1.40GB/s ± 0%   +3.25%  (p=0.000 n=10+8)
RegexpMatchEasy1_32-8     142MB/s ± 0%   143MB/s ± 0%   +0.35%  (p=0.000 n=10+9)
RegexpMatchEasy1_1K-8     992MB/s ± 0%   980MB/s ± 1%   -1.27%  (p=0.000 n=10+10)
RegexpMatchMedium_32-8   3.12MB/s ± 3%  3.38MB/s ± 6%   +8.17%  (p=0.000 n=10+10)
RegexpMatchMedium_1K-8   13.3MB/s ± 5%  13.9MB/s ± 1%     ~     (p=0.362 n=10+10)
RegexpMatchHard_32-8     8.14MB/s ± 0%  8.21MB/s ± 1%   +0.95%  (p=0.000 n=10+9)
RegexpMatchHard_1K-8     8.54MB/s ± 5%  8.90MB/s ± 1%     ~     (p=0.636 n=10+10)
Revcomp-8                 238MB/s ± 0%   236MB/s ± 1%   -0.63%  (p=0.000 n=10+9)
Template-8               11.8MB/s ± 1%  11.9MB/s ± 1%   +1.07%  (p=0.001 n=8+10)

Change-Id: I57b372d8dcd47e6aec39893843b20385d5d9c37e
Reviewed-on: https://go-review.googlesource.com/129555
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-05 14:47:02 +00:00
Ben Shi 1018a80fe8 cmd/internal/obj/arm64: support more atomic instructions
LDADDALD(64-bit) and LDADDALW(32-bit) are already supported.
This CL adds supports of LDADDALH(16-bit) and LDADDALB(8-bit).

Change-Id: I4eac61adcec226d618dfce88618a2b98f5f1afe7
Reviewed-on: https://go-review.googlesource.com/132135
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-04 20:29:33 +00:00
Michael Munday 6f9b94ab66 cmd/compile: implement OnesCount{8,16,32,64} intrinsics on s390x
This CL implements the math/bits.OnesCount{8,16,32,64} functions
as intrinsics on s390x using the 'population count' (popcnt)
instruction. This instruction was released as the 'population-count'
facility which uses the same facility bit (45) as the
'distinct-operands' facility which is a pre-requisite for Go on
s390x. We can therefore use it without a feature check.

The s390x popcnt instruction treats a 64 bit register as a vector
of 8 bytes, summing the number of ones in each byte individually.
It then writes the results to the corresponding bytes in the
output register. Therefore to implement OnesCount{16,32,64} we
need to sum the individual byte counts using some extra
instructions. To do this efficiently I've added some additional
pseudo operations to the s390x SSA backend.

Unlike other architectures the new instruction sequence is faster
for OnesCount8, so that is implemented using the intrinsic.

name         old time/op  new time/op  delta
OnesCount    3.21ns ± 1%  1.35ns ± 0%  -58.00%  (p=0.000 n=20+20)
OnesCount8   0.91ns ± 1%  0.81ns ± 0%  -11.43%  (p=0.000 n=20+20)
OnesCount16  1.51ns ± 3%  1.21ns ± 0%  -19.71%  (p=0.000 n=20+17)
OnesCount32  1.91ns ± 0%  1.12ns ± 1%  -41.60%  (p=0.000 n=19+20)
OnesCount64  3.18ns ± 4%  1.35ns ± 0%  -57.52%  (p=0.000 n=20+20)

Change-Id: Id54f0bd28b6db9a887ad12c0d72fcc168ef9c4e0
Reviewed-on: https://go-review.googlesource.com/114675
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-09-03 14:35:38 +00:00
Zheng Xu 8f4fd3f34e build: support frame-pointer for arm64
Supporting frame-pointer makes Linux's perf and other profilers much more useful
because it lets them gather a stack trace efficiently on profiling events. Major
changes include:
1. save FP on the word below where RSP is pointing to (proposed by Cherry and Austin)
2. adjust some specific offsets in runtime assembly and wrapper code
3. add support to FP in goroutine scheduler
4. adjust link stack overflow check to take the extra word into account
5. adjust nosplit test cases to enable frame sizes which are 16 bytes aligned

Performance impacts on go1 benchmarks:

Enable frame-pointer (by default)

name                      old time/op    new time/op    delta
BinaryTree17-46              5.94s ± 0%     6.00s ± 0%  +1.03%  (p=0.029 n=4+4)
Fannkuch11-46                2.84s ± 1%     2.77s ± 0%  -2.58%  (p=0.008 n=5+5)
FmtFprintfEmpty-46          55.0ns ± 1%    58.9ns ± 1%  +7.06%  (p=0.008 n=5+5)
FmtFprintfString-46          102ns ± 0%     105ns ± 0%  +2.94%  (p=0.008 n=5+5)
FmtFprintfInt-46             118ns ± 0%     117ns ± 1%  -1.19%  (p=0.000 n=4+5)
FmtFprintfIntInt-46          181ns ± 0%     182ns ± 1%    ~     (p=0.444 n=5+5)
FmtFprintfPrefixedInt-46     215ns ± 1%     214ns ± 0%    ~     (p=0.254 n=5+4)
FmtFprintfFloat-46           292ns ± 0%     296ns ± 0%  +1.46%  (p=0.029 n=4+4)
FmtManyArgs-46               720ns ± 0%     732ns ± 0%  +1.72%  (p=0.008 n=5+5)
GobDecode-46                9.82ms ± 1%   10.03ms ± 2%  +2.10%  (p=0.008 n=5+5)
GobEncode-46                8.14ms ± 0%    8.72ms ± 1%  +7.14%  (p=0.008 n=5+5)
Gzip-46                      420ms ± 0%     424ms ± 0%  +0.92%  (p=0.008 n=5+5)
Gunzip-46                   48.2ms ± 0%    48.4ms ± 0%  +0.41%  (p=0.008 n=5+5)
HTTPClientServer-46          201µs ± 4%     201µs ± 0%    ~     (p=0.730 n=5+4)
JSONEncode-46               17.1ms ± 0%    17.7ms ± 1%  +3.80%  (p=0.008 n=5+5)
JSONDecode-46               88.0ms ± 0%    90.1ms ± 0%  +2.42%  (p=0.008 n=5+5)
Mandelbrot200-46            5.06ms ± 0%    5.07ms ± 0%    ~     (p=0.310 n=5+5)
GoParse-46                  5.04ms ± 0%    5.12ms ± 0%  +1.53%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-46       117ns ± 0%     117ns ± 0%    ~     (all equal)
RegexpMatchEasy0_1K-46       332ns ± 0%     329ns ± 0%  -0.78%  (p=0.008 n=5+5)
RegexpMatchEasy1_32-46       104ns ± 0%     113ns ± 0%  +8.65%  (p=0.029 n=4+4)
RegexpMatchEasy1_1K-46       563ns ± 0%     569ns ± 0%  +1.10%  (p=0.008 n=5+5)
RegexpMatchMedium_32-46      167ns ± 2%     177ns ± 1%  +5.74%  (p=0.008 n=5+5)
RegexpMatchMedium_1K-46     49.5µs ± 0%    53.4µs ± 0%  +7.81%  (p=0.008 n=5+5)
RegexpMatchHard_32-46       2.56µs ± 1%    2.72µs ± 0%  +6.01%  (p=0.008 n=5+5)
RegexpMatchHard_1K-46       77.0µs ± 0%    81.8µs ± 0%  +6.24%  (p=0.016 n=5+4)
Revcomp-46                   631ms ± 1%     627ms ± 1%    ~     (p=0.095 n=5+5)
Template-46                 81.8ms ± 0%    86.3ms ± 0%  +5.55%  (p=0.008 n=5+5)
TimeParse-46                 423ns ± 0%     432ns ± 0%  +2.32%  (p=0.008 n=5+5)
TimeFormat-46                478ns ± 2%     497ns ± 1%  +3.89%  (p=0.008 n=5+5)
[Geo mean]                  71.6µs         73.3µs       +2.45%

name                      old speed      new speed      delta
GobDecode-46              78.1MB/s ± 1%  76.6MB/s ± 2%  -2.04%  (p=0.008 n=5+5)
GobEncode-46              94.3MB/s ± 0%  88.0MB/s ± 1%  -6.67%  (p=0.008 n=5+5)
Gzip-46                   46.2MB/s ± 0%  45.8MB/s ± 0%  -0.91%  (p=0.008 n=5+5)
Gunzip-46                  403MB/s ± 0%   401MB/s ± 0%  -0.41%  (p=0.008 n=5+5)
JSONEncode-46              114MB/s ± 0%   109MB/s ± 1%  -3.66%  (p=0.008 n=5+5)
JSONDecode-46             22.0MB/s ± 0%  21.5MB/s ± 0%  -2.35%  (p=0.008 n=5+5)
GoParse-46                11.5MB/s ± 0%  11.3MB/s ± 0%  -1.51%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-46     272MB/s ± 0%   272MB/s ± 1%    ~     (p=0.190 n=4+5)
RegexpMatchEasy0_1K-46    3.08GB/s ± 0%  3.11GB/s ± 0%  +0.77%  (p=0.008 n=5+5)
RegexpMatchEasy1_32-46     306MB/s ± 0%   283MB/s ± 0%  -7.63%  (p=0.029 n=4+4)
RegexpMatchEasy1_1K-46    1.82GB/s ± 0%  1.80GB/s ± 0%  -1.07%  (p=0.008 n=5+5)
RegexpMatchMedium_32-46   5.99MB/s ± 0%  5.64MB/s ± 1%  -5.77%  (p=0.016 n=4+5)
RegexpMatchMedium_1K-46   20.7MB/s ± 0%  19.2MB/s ± 0%  -7.25%  (p=0.008 n=5+5)
RegexpMatchHard_32-46     12.5MB/s ± 1%  11.8MB/s ± 0%  -5.66%  (p=0.008 n=5+5)
RegexpMatchHard_1K-46     13.3MB/s ± 0%  12.5MB/s ± 1%  -6.01%  (p=0.008 n=5+5)
Revcomp-46                 402MB/s ± 1%   405MB/s ± 1%    ~     (p=0.095 n=5+5)
Template-46               23.7MB/s ± 0%  22.5MB/s ± 0%  -5.25%  (p=0.008 n=5+5)
[Geo mean]                82.2MB/s       79.6MB/s       -3.26%

Disable frame-pointer (GOEXPERIMENT=noframepointer)

name                      old time/op    new time/op    delta
BinaryTree17-46              5.94s ± 0%     5.96s ± 0%  +0.39%  (p=0.029 n=4+4)
Fannkuch11-46                2.84s ± 1%     2.79s ± 1%  -1.68%  (p=0.008 n=5+5)
FmtFprintfEmpty-46          55.0ns ± 1%    55.2ns ± 3%    ~     (p=0.794 n=5+5)
FmtFprintfString-46          102ns ± 0%     103ns ± 0%  +0.98%  (p=0.016 n=5+4)
FmtFprintfInt-46             118ns ± 0%     115ns ± 0%  -2.54%  (p=0.029 n=4+4)
FmtFprintfIntInt-46          181ns ± 0%     179ns ± 0%  -1.10%  (p=0.000 n=5+4)
FmtFprintfPrefixedInt-46     215ns ± 1%     213ns ± 0%    ~     (p=0.143 n=5+4)
FmtFprintfFloat-46           292ns ± 0%     300ns ± 0%  +2.83%  (p=0.029 n=4+4)
FmtManyArgs-46               720ns ± 0%     739ns ± 0%  +2.64%  (p=0.008 n=5+5)
GobDecode-46                9.82ms ± 1%    9.78ms ± 1%    ~     (p=0.151 n=5+5)
GobEncode-46                8.14ms ± 0%    8.12ms ± 1%    ~     (p=0.690 n=5+5)
Gzip-46                      420ms ± 0%     420ms ± 0%    ~     (p=0.548 n=5+5)
Gunzip-46                   48.2ms ± 0%    48.0ms ± 0%  -0.33%  (p=0.032 n=5+5)
HTTPClientServer-46          201µs ± 4%     199µs ± 3%    ~     (p=0.548 n=5+5)
JSONEncode-46               17.1ms ± 0%    17.2ms ± 0%    ~     (p=0.056 n=5+5)
JSONDecode-46               88.0ms ± 0%    88.6ms ± 0%  +0.64%  (p=0.008 n=5+5)
Mandelbrot200-46            5.06ms ± 0%    5.07ms ± 0%    ~     (p=0.548 n=5+5)
GoParse-46                  5.04ms ± 0%    5.07ms ± 0%  +0.65%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-46       117ns ± 0%     112ns ± 4%  -4.27%  (p=0.016 n=4+5)
RegexpMatchEasy0_1K-46       332ns ± 0%     330ns ± 1%    ~     (p=0.095 n=5+5)
RegexpMatchEasy1_32-46       104ns ± 0%     110ns ± 1%  +5.29%  (p=0.029 n=4+4)
RegexpMatchEasy1_1K-46       563ns ± 0%     567ns ± 2%    ~     (p=0.151 n=5+5)
RegexpMatchMedium_32-46      167ns ± 2%     166ns ± 0%    ~     (p=0.333 n=5+4)
RegexpMatchMedium_1K-46     49.5µs ± 0%    49.6µs ± 0%    ~     (p=0.841 n=5+5)
RegexpMatchHard_32-46       2.56µs ± 1%    2.49µs ± 0%  -2.81%  (p=0.008 n=5+5)
RegexpMatchHard_1K-46       77.0µs ± 0%    75.8µs ± 0%  -1.55%  (p=0.008 n=5+5)
Revcomp-46                   631ms ± 1%     628ms ± 0%    ~     (p=0.095 n=5+5)
Template-46                 81.8ms ± 0%    84.3ms ± 1%  +3.05%  (p=0.008 n=5+5)
TimeParse-46                 423ns ± 0%     425ns ± 0%  +0.52%  (p=0.008 n=5+5)
TimeFormat-46                478ns ± 2%     478ns ± 1%    ~     (p=1.000 n=5+5)
[Geo mean]                  71.6µs         71.6µs       -0.01%

name                      old speed      new speed      delta
GobDecode-46              78.1MB/s ± 1%  78.5MB/s ± 1%    ~     (p=0.151 n=5+5)
GobEncode-46              94.3MB/s ± 0%  94.5MB/s ± 1%    ~     (p=0.690 n=5+5)
Gzip-46                   46.2MB/s ± 0%  46.2MB/s ± 0%    ~     (p=0.571 n=5+5)
Gunzip-46                  403MB/s ± 0%   404MB/s ± 0%  +0.33%  (p=0.032 n=5+5)
JSONEncode-46              114MB/s ± 0%   113MB/s ± 0%    ~     (p=0.056 n=5+5)
JSONDecode-46             22.0MB/s ± 0%  21.9MB/s ± 0%  -0.64%  (p=0.008 n=5+5)
GoParse-46                11.5MB/s ± 0%  11.4MB/s ± 0%  -0.64%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-46     272MB/s ± 0%   285MB/s ± 4%  +4.74%  (p=0.016 n=4+5)
RegexpMatchEasy0_1K-46    3.08GB/s ± 0%  3.10GB/s ± 1%    ~     (p=0.151 n=5+5)
RegexpMatchEasy1_32-46     306MB/s ± 0%   290MB/s ± 1%  -5.21%  (p=0.029 n=4+4)
RegexpMatchEasy1_1K-46    1.82GB/s ± 0%  1.81GB/s ± 2%    ~     (p=0.151 n=5+5)
RegexpMatchMedium_32-46   5.99MB/s ± 0%  6.02MB/s ± 1%    ~     (p=0.063 n=4+5)
RegexpMatchMedium_1K-46   20.7MB/s ± 0%  20.7MB/s ± 0%    ~     (p=0.659 n=5+5)
RegexpMatchHard_32-46     12.5MB/s ± 1%  12.8MB/s ± 0%  +2.88%  (p=0.008 n=5+5)
RegexpMatchHard_1K-46     13.3MB/s ± 0%  13.5MB/s ± 0%  +1.58%  (p=0.008 n=5+5)
Revcomp-46                 402MB/s ± 1%   405MB/s ± 0%    ~     (p=0.095 n=5+5)
Template-46               23.7MB/s ± 0%  23.0MB/s ± 1%  -2.95%  (p=0.008 n=5+5)
[Geo mean]                82.2MB/s       82.3MB/s       +0.04%

Frame-pointer is enabled on Linux by default but can be disabled by setting: GOEXPERIMENT=noframepointer.

Fixes #10110

Change-Id: I1bfaca6dba29a63009d7c6ab04ed7a1413d9479e
Reviewed-on: https://go-review.googlesource.com/61511
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-08-29 18:28:34 +00:00
Ben Shi 84374d4de5 cmd/internal/obj: support more arm64 FP instructions
ARM64 also supports float point LDP(load pair) & STP (store pair).
The CL adds implementation and corresponding test cases for
FLDPD/FLDPS/FSTPD/FSTPS.

Change-Id: I45f112012a4e097bfaf023d029b36e6cbc7a5859
Reviewed-on: https://go-review.googlesource.com/125438
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-08-24 03:00:59 +00:00
Iskander Sharipov ed2f84a94e cmd/internal/obj/arm64: simplify some bool expressions
Replace `!(o1 != 0)` with `o1 == 0` (for readability).

Found using https://go-critic.github.io/overview.html#boolExprSimplify-ref

Change-Id: I4fc035458f530973f9be15b38441ec7b5fb591ec
Reviewed-on: https://go-review.googlesource.com/123377
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-08-22 16:24:52 +00:00
Xia Bin 477b7e5f4d cmd/internal/obj: remove pointless validation
s.Func.Text only can be nil at the moment, otherwise there has
some bugs in compiler's Go rumtime.

Change-Id: Ib2ff9bb977352838e67f2b98a69468f6f350c1f3
Reviewed-on: https://go-review.googlesource.com/123535
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-08-20 22:16:46 +00:00
Ben Shi 84feb4bbb7 cmd/internal/obj/arm64: add register indexed FMOVS/FMOVD
This CL adds register indexed FMOVS/FMOVD.
FMOVS Fx, (Rn)(Rm)
FMOVS Fx, (Rn)(Rm<<2)
FMOVD Fx, (Rn)(Rm)
FMOVD Fx, (Rn)(Rm<<3)
FMOVS (Rn)(Rm), Fx
FMOVS (Rn)(Rm<<2), Fx
FMOVD (Rn)(Rm), Fx
FMOVD (Rn)(Rm<<3), Fx

Change-Id: Id76de6a4be96b64cf79d7e9a1962d9d49cb462f2
Reviewed-on: https://go-review.googlesource.com/123995
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-08-20 15:15:16 +00:00
Ben Shi 26d62b4ca9 cmd/internal/obj/arm64: add SWPALD/SWPALW/SWPALH/SWPALB
Those new instructions have acquire/release semantics, besides
normal atomic SWPD/SWPW/SWPH/SWPB.

Change-Id: I24821a4d21aebc342897ae52903aef612c8d8a4a
Reviewed-on: https://go-review.googlesource.com/128476
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-08-20 14:09:51 +00:00
Ben Shi 9594ba4fe5 cmd/internal/obj/arm64: fix incorrect rejection of legal instructions
"BFI $0, R1, $7, R2" is expected to copy bit 0~6 from R1 to R2, and
left R2's other bits unchanged.

But the assembler rejects it with error "illegal bit number", and
BFIW/SBFIZ/SBFIZW/UBFIZ/UBFIZW have the same problem.

This CL fixes that issue and adds corresponding test cases.

fixes #26736

Change-Id: Ie0090a0faa38a49dd9b096a0f435987849800b76
Reviewed-on: https://go-review.googlesource.com/127159
Run-TryBot: Ben Shi <powerman1st@163.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-08-03 15:44:22 +00:00
Ben Shi e351a16005 cmd/internal/obj/arm64: reject incorrect form of LDP/STP
"LDP (R0), (F0, F1)" and "STP (F1, F2), (R0)" are
silently accepted by the arm64 assembler without
any error message. And this CL fixes that bug.

fixes #26556.

Change-Id: Ib6fae81956deb39a4ffd95e9409acc8dad3ab2d2
Reviewed-on: https://go-review.googlesource.com/125637
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-07-30 15:31:06 +00:00
Michael Munday 9fa988547a cmd/internal/obj/s390x: increase maximum number of loop iterations
The maximum number of 'spanz' iterations that the s390x assembler
performs to reach a fixed point for relative offsets was 10. This
turned out to be too aggressive for one example of auto-generated
fuzzing code. Increase the number of iterations by 10x to reduce
the likelihood that the limit will be hit again. This limit only
exists to help find bugs in the assembler.

master at tip does not fail with the example code in the issue, I
have therefore not submitted it as a test (it is also quite large).
I tested this change with the example code at the commit given and
it fixes the issue.

Fixes #25269.

Change-Id: I0e44948957a7faff51c7d27c0b7746ed6e2d47bb
Reviewed-on: https://go-review.googlesource.com/122235
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-07-05 07:21:50 +00:00
Tobias Klauser 2ee6bfbd7e cmd/internal/obj: follow convention for generated code comment
Follow the convertion (https://golang.org/s/generatedcode) for generated
code in stringer.go.

Change-Id: I7b5fbb04ba03e8ac77a9a0a402088669469de858
Reviewed-on: https://go-review.googlesource.com/122015
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-07-03 16:14:43 +00:00
Wei Xiao 0a7ac93c27 cmd/compile: improve atomic add intrinsics with ARMv8.1 new instruction
ARMv8.1 has added new instruction (LDADDAL) for atomic memory operations. This
CL improves existing atomic add intrinsics with the new instruction. Since the
new instruction is only guaranteed to be present after ARMv8.1, we guard its
usage with a conditional on CPU feature.

Performance result on ARMv8.1 machine:
name        old time/op  new time/op  delta
Xadd-224    1.05µs ± 6%  0.02µs ± 4%  -98.06%  (p=0.000 n=10+8)
Xadd64-224  1.05µs ± 3%  0.02µs ±13%  -98.10%  (p=0.000 n=9+10)
[Geo mean]  1.05µs       0.02µs       -98.08%

Performance result on ARMv8.0 machine:
name        old time/op  new time/op  delta
Xadd-46      538ns ± 1%   541ns ± 1%  +0.62%  (p=0.000 n=9+9)
Xadd64-46    505ns ± 1%   508ns ± 0%  +0.48%  (p=0.003 n=9+8)
[Geo mean]   521ns        524ns       +0.55%

Change-Id: If4b5d8d0e2d6f84fe1492a4f5de0789910ad0ee9
Reviewed-on: https://go-review.googlesource.com/81877
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-06-21 14:52:43 +00:00
Richard Musiol e083dc6307 runtime, sycall/js: add support for callbacks from JavaScript
This commit adds support for JavaScript callbacks back into
WebAssembly. This is experimental API, just like the rest of the
syscall/js package. The time package now also uses this mechanism
to properly support timers without resorting to a busy loop.

JavaScript code can call into the same entry point multiple times.
The new RUN register is used to keep track of the program's
run state. Possible values are: starting, running, paused and exited.
If no goroutine is ready any more, the scheduler can put the
program into the "paused" state and the WebAssembly code will
stop running. When a callback occurs, the JavaScript code puts
the callback data into a queue and then calls into WebAssembly
to allow the Go code to continue running.

Updates #18892
Updates #25506

Change-Id: Ib8701cfa0536d10d69bd541c85b0e2a754eb54fb
Reviewed-on: https://go-review.googlesource.com/114197
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-06-14 21:50:53 +00:00
Lynn Boger 9e9ff565cd runtime/race: implement race detector for ppc64le
This adds the support to enable the race detector for ppc64le.

Added runtime/race_ppc64le.s to manage the calls from Go to the
LLVM tsan functions, mostly converting from the Go ABI to the
PPC64 ABI expected by Clang generated code.

Changed racewalk.go to call racefuncenterfp instead of racefuncenter
on ppc64le to allow the caller pc to be obtained in the asm code
before calling the tsan version.

Changed the set up code for racecallbackthunk so it doesn't use
the autogenerated save and restore of the link register since that
sequence uses registers inconsistent with the normal ppc64 ABI.

Made various changes to recognize that race is supported for
ppc64le.

Ensured that tls_g is updated and accessible from race_linux_ppc64le.s
so that the race ctx can be obtained and passed to tsan functions.

This enables the race tests for ppc64le in cmd/dist/test.go and
increases the timeout when running the benchmarks with the -race
option to avoid timing out.

Updates #24354, #23731

Change-Id: Ib97dc7ac313e6313c836dc7d2fb698f9d8fba3ef
Reviewed-on: https://go-review.googlesource.com/107935
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-06-11 17:45:36 +00:00
Tim Cooper 161874da2a all: update comment URLs from HTTP to HTTPS, where possible
Each URL was manually verified to ensure it did not serve up incorrect
content.

Change-Id: I4dc846227af95a73ee9a3074d0c379ff0fa955df
Reviewed-on: https://go-review.googlesource.com/115798
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
2018-06-01 21:52:00 +00:00
Ben Shi 3d6e4ec0a8 cmd/internal/obj/arm64: fix two issues in the assembler
There are two issues in the arm64 assembler.

1. "CMPW $0x22220000, RSP" is encoded to 5b44a4d2ff031b6b, which
   is the combination of "MOVD $0x22220000, Rtmp" and
   "NEGSW Rtmp, ZR".
   The right encoding should be a combination of
   "MOVD $0x22220000, Rtmp" and "CMPW Rtmp, RSP".

2. "AND $0x22220000, R2, RSP" is encoded to 5b44a4d25f601b00,
   which is the combination of "MOVD $0x22220000, Rtmp" and
   an illegal instruction.
   The right behavior should be an error report of
   "illegal combination", since "AND Rtmp, RSP, RSP" is invalid
   in armv8.

This CL fixes the above 2 issues and adds more test cases.

fixes #25557

Change-Id: Ia510be26b58a229f5dfe8a5fa0b35569b2d566e7
Reviewed-on: https://go-review.googlesource.com/114796
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-06-01 15:24:42 +00:00
isharipo 67fe8b5677 cmd/internal/obj/x86: add missing Yi8 for VEX ytabs
This change adds Yi8 forms for every ytab that had them before AVX-512 patch.
The rationale is backwards-compatibility.

EVEX forms remain strict and unchanged as they're not bound to any
backwards-compatibility issues.

Fixes #25510

Change-Id: Icd692266010ed64c9fe47cc837afc2edf2ad2d1d
Reviewed-on: https://go-review.googlesource.com/114136
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-05-23 15:34:19 +00:00
Austin Clements 9f95c9db23 cmd/compile, cmd/internal/obj: record register maps in binary
This adds FUNCDATA and PCDATA that records the register maps much like
the existing live arguments maps and live locals maps. The register
map is indexed independently from the argument and locals maps since
changes in register liveness tend not to correlate with changes to
argument and local liveness.

This is the final CL toward adding safe-points everywhere. The
following CLs will optimize liveness analysis to bring down the cost.
The effect of this CL is:

name        old time/op       new time/op       delta
Template          195ms ± 2%        197ms ± 1%    ~     (p=0.136 n=9+9)
Unicode          98.4ms ± 2%       99.7ms ± 1%  +1.39%  (p=0.004 n=10+10)
GoTypes           685ms ± 1%        700ms ± 1%  +2.06%  (p=0.000 n=9+9)
Compiler          3.28s ± 1%        3.34s ± 0%  +1.71%  (p=0.000 n=9+8)
SSA               7.79s ± 1%        7.91s ± 1%  +1.55%  (p=0.000 n=10+9)
Flate             133ms ± 2%        133ms ± 2%    ~     (p=0.190 n=10+10)
GoParser          161ms ± 2%        164ms ± 3%  +1.83%  (p=0.015 n=10+10)
Reflect           450ms ± 1%        457ms ± 1%  +1.62%  (p=0.000 n=10+10)
Tar               183ms ± 2%        185ms ± 1%  +0.91%  (p=0.008 n=9+10)
XML               234ms ± 1%        238ms ± 1%  +1.60%  (p=0.000 n=9+9)
[Geo mean]        411ms             417ms       +1.40%

name        old exe-bytes     new exe-bytes     delta
HelloSize         1.47M ± 0%        1.51M ± 0%  +2.79%  (p=0.000 n=10+10)

Compared to just before "cmd/internal/obj: consolidate emitting entry
stack map", the cumulative effect of adding stack maps everywhere and
register maps is:

name        old time/op       new time/op       delta
Template          185ms ± 2%        197ms ± 1%   +6.42%  (p=0.000 n=10+9)
Unicode          96.3ms ± 3%       99.7ms ± 1%   +3.60%  (p=0.000 n=10+10)
GoTypes           658ms ± 0%        700ms ± 1%   +6.37%  (p=0.000 n=10+9)
Compiler          3.14s ± 1%        3.34s ± 0%   +6.53%  (p=0.000 n=9+8)
SSA               7.41s ± 2%        7.91s ± 1%   +6.71%  (p=0.000 n=9+9)
Flate             126ms ± 1%        133ms ± 2%   +6.15%  (p=0.000 n=10+10)
GoParser          153ms ± 1%        164ms ± 3%   +6.89%  (p=0.000 n=10+10)
Reflect           437ms ± 1%        457ms ± 1%   +4.59%  (p=0.000 n=10+10)
Tar               178ms ± 1%        185ms ± 1%   +4.18%  (p=0.000 n=10+10)
XML               223ms ± 1%        238ms ± 1%   +6.39%  (p=0.000 n=10+9)
[Geo mean]        394ms             417ms        +5.78%

name        old alloc/op      new alloc/op      delta
Template         34.5MB ± 0%       38.0MB ± 0%  +10.19%  (p=0.000 n=10+10)
Unicode          29.3MB ± 0%       30.3MB ± 0%   +3.56%  (p=0.000 n=8+9)
GoTypes           113MB ± 0%        125MB ± 0%  +10.89%  (p=0.000 n=10+10)
Compiler          510MB ± 0%        575MB ± 0%  +12.79%  (p=0.000 n=10+10)
SSA              1.46GB ± 0%       1.64GB ± 0%  +12.40%  (p=0.000 n=10+10)
Flate            23.9MB ± 0%       25.9MB ± 0%   +8.56%  (p=0.000 n=10+10)
GoParser         28.0MB ± 0%       30.8MB ± 0%  +10.08%  (p=0.000 n=10+10)
Reflect          77.6MB ± 0%       84.3MB ± 0%   +8.63%  (p=0.000 n=10+10)
Tar              34.1MB ± 0%       37.0MB ± 0%   +8.44%  (p=0.000 n=10+10)
XML              42.7MB ± 0%       47.2MB ± 0%  +10.75%  (p=0.000 n=10+10)
[Geo mean]       76.0MB            83.3MB        +9.60%

name        old allocs/op     new allocs/op     delta
Template           321k ± 0%         337k ± 0%   +4.98%  (p=0.000 n=10+10)
Unicode            337k ± 0%         340k ± 0%   +1.04%  (p=0.000 n=10+9)
GoTypes           1.13M ± 0%        1.18M ± 0%   +4.85%  (p=0.000 n=10+10)
Compiler          4.67M ± 0%        4.96M ± 0%   +6.25%  (p=0.000 n=10+10)
SSA               11.7M ± 0%        12.3M ± 0%   +5.69%  (p=0.000 n=10+10)
Flate              216k ± 0%         226k ± 0%   +4.52%  (p=0.000 n=10+9)
GoParser           271k ± 0%         283k ± 0%   +4.52%  (p=0.000 n=10+10)
Reflect            927k ± 0%         972k ± 0%   +4.78%  (p=0.000 n=10+10)
Tar                318k ± 0%         333k ± 0%   +4.56%  (p=0.000 n=10+10)
XML                376k ± 0%         395k ± 0%   +5.04%  (p=0.000 n=10+10)
[Geo mean]         730k              764k        +4.61%

name        old exe-bytes     new exe-bytes     delta
HelloSize         1.46M ± 0%        1.51M ± 0%   +3.66%  (p=0.000 n=10+10)

For #24543.

Change-Id: I91e003dc64151916b384274884bf02a2d6862547
Reviewed-on: https://go-review.googlesource.com/109353
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-22 15:55:03 +00:00
isharipo 5437cde96c cmd/asm: enable AVX512
- Uncomment tests for AVX512 encoder
- Permit instruction suffixes for x86
- Permit limited reg list [reg-reg] syntax for x86 for multi-source ops
- EVEX encoding support in obj/x86 (Z-cases, asmevex, etc.)
- optabs and ytabs generated by x86avxgen (https://golang.org/cl/107216)

Note: suffix formatting implemented with updated CConv function.
Now arch asm backend should register formatting function by
calling RegisterOpSuffix.

Updates #22779

Change-Id: I076a167ee49582700e058c56ad74e6696710c8c8
Reviewed-on: https://go-review.googlesource.com/113315
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-22 14:57:15 +00:00
Austin Clements 02495da6b6 cmd/internal/obj: consolidate emitting entry stack map
The obj package needs to emit the PCDATA to select the entry stack map
before calling morestack. Currently this is copied for every
architecture. Since we're about to change how this works, consolidate
all of these copies into a single helper function.

For #24543.

Change-Id: Ia92d94de78f8e23fd06dba747c43e03e5989f67b
Reviewed-on: https://go-review.googlesource.com/109346
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-22 14:43:35 +00:00
isharipo d2c68bb65f cmd/internal/obj/x86: fix VPERMQ and VPERMPD ytab
Fixes invalid encoding of VPERMQ and VPERMPD that use
negative immediate argument.

Fixes #25418
Updates #25420

Change-Id: Idd8180c4c632a76b76f3a68efd5f930d94431994
Reviewed-on: https://go-review.googlesource.com/113615
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-05-17 16:20:25 +00:00
David Chase 7bac2a95f6 cmd/compile: plumb prologueEnd into DWARF
This marks the first instruction after the prologue for
consumption by debuggers, specifically Delve, who asked
for it.  gdb appears to ignore it, lldb appears to use it.

The bits for end-of-prologue and beginning-of-epilogue
are added to Pos (reducing maximum line number by 4x, to
1048575).  They're added in cmd/internal/obj/<ARCH>.go
(currently x86 only), so the compiler-proper need not
deal with them.

The linker currently does nothing with beginning-of-epilogue,
but the plumbing exists to make it easier in the future.

This also upgrades the line number table to DWARF version 3.

This CL includes a regression in the coverage for
testdata/i22558.gdb-dbg.nexts, this appears to be a gdb
artifact but the fix would be in the preceding CL in the
stack.

Change-Id: I3bda5f46a0ed232d137ad48f65a14835c742c506
Reviewed-on: https://go-review.googlesource.com/110416
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-14 14:12:31 +00:00
Ben Shi bec2f51b07 cmd/internal/obj/arm: fix wrong encoding of MUL
The arm assembler incorrectly encodes the following instructions.
"MUL R2, R4" -> 0xe0040492 ("MUL R4, R2, R4")
"MUL R2, R4, R4" -> 0xe0040492 ("MUL R4, R2, R4")

The CL fixes that issue.

fixes #25347

Change-Id: I883716c7bc51c5f64837ae7d81342f94540a58cb
Reviewed-on: https://go-review.googlesource.com/112737
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-05-14 01:53:39 +00:00
quasilyte e405c9cca3 cmd/internal/obj/x86: use named consts for movtab Z-cases
Use 0-terminated opbyte sequences for Zlit-like movtabs instead of E=0xff.

movCodeFullPtr is unused (load full ptr is unsupported), but it should
be removed in a separate CL (if removed at all).

Passes toolstash-check.

Change-Id: I28436718d93b017153de0e50e3bcec344ea4ee05
Reviewed-on: https://go-review.googlesource.com/107076
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-11 14:40:34 +00:00
Richard Musiol bf23a4e61d cmd/internal/obj/wasm: avoid invalid offsets for Load/Store
Offsets for Load and Store instructions have type i32. Bad index
expression offsets can cause an offset to be larger than MaxUint32,
which is not allowed. One example for this is the test test/index0.go.

Generate valid code by adding a guard to the responsible rewrite rule.
Also emit a proper error when using such a bad index in assembly code.

Change-Id: Ie90adcbf3ae3861c26680eb81790f28692913ccf
Reviewed-on: https://go-review.googlesource.com/111955
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-10 12:05:17 +00:00
Lynn Boger 28edaf4584 cmd/compile,test: combine byte loads and stores on ppc64le
CL 74410 added rules to combine consecutive byte loads and
stores when the byte order was little endian for ppc64le. This
is the corresponding change for bytes that are in big endian order.
These rules are all intended for a little endian target arch.

This adds new testcases in test/codegen/memcombine.go

Fixes #22496
Updates #24242

Benchmark improvement for encoding/binary:
name                      old time/op    new time/op    delta
ReadSlice1000Int32s-16      11.0µs ± 0%     9.0µs ± 0%  -17.47%  (p=0.029 n=4+4)
ReadStruct-16               2.47µs ± 1%    2.48µs ± 0%   +0.67%  (p=0.114 n=4+4)
ReadInts-16                  642ns ± 1%     630ns ± 1%   -2.02%  (p=0.029 n=4+4)
WriteInts-16                 654ns ± 0%     653ns ± 1%   -0.08%  (p=0.629 n=4+4)
WriteSlice1000Int32s-16     8.75µs ± 0%    8.20µs ± 0%   -6.19%  (p=0.029 n=4+4)
PutUint16-16                1.16ns ± 0%    0.93ns ± 0%  -19.83%  (p=0.029 n=4+4)
PutUint32-16                1.16ns ± 0%    0.93ns ± 0%  -19.83%  (p=0.029 n=4+4)
PutUint64-16                1.85ns ± 0%    0.93ns ± 0%  -49.73%  (p=0.029 n=4+4)
LittleEndianPutUint16-16    1.03ns ± 0%    0.93ns ± 0%   -9.71%  (p=0.029 n=4+4)
LittleEndianPutUint32-16    0.93ns ± 0%    0.93ns ± 0%     ~     (all equal)
LittleEndianPutUint64-16    0.93ns ± 0%    0.93ns ± 0%     ~     (all equal)
PutUvarint32-16             43.0ns ± 0%    43.1ns ± 0%   +0.12%  (p=0.429 n=4+4)
PutUvarint64-16              174ns ± 0%     175ns ± 0%   +0.29%  (p=0.429 n=4+4)

Updates made to functions in gcm.go to enable their matching. An existing
testcase prevents these functions from being replaced by those in encoding/binary
due to import dependencies.

Change-Id: Idb3bd1e6e7b12d86cd828fb29cb095848a3e485a
Reviewed-on: https://go-review.googlesource.com/98136
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-08 13:15:39 +00:00