Commit Graph

65 Commits

Author SHA1 Message Date
Keith Randall 1a6281d950 [release-branch.go1.16] cmd/compile: ensure constant shift amounts are in range for arm
Ensure constant shift amounts are in the range [0-31]. When shift amounts
are out of range, bad things happen. Shift amounts out of range occur
when lowering 64-bit shifts (we take an in-range shift s in [0-63] and
calculate s-32 and 32-s, both of which might be out of [0-31]).

The constant shift operations themselves still work, but their shift
amounts get copied unmolested to operations like ORshiftLL which use only
the low 5 bits. That changes an operation like <<100 which unconditionally
produces 0, to <<4, which doesn't.
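
As an illustration only (not compiler code), the effect can be sketched in plain Go. maskedShift is a hypothetical helper that models an instruction which reads only the low 5 bits of its shift amount:

```go
package main

import "fmt"

// goShift follows Go semantics: a shift count at or above the operand
// width produces 0 for an unsigned value.
func goShift(x uint32, s uint) uint32 {
	return x << s
}

// maskedShift is a hypothetical model of an instruction that uses only
// the low 5 bits of the shift amount.
func maskedShift(x uint32, s uint) uint32 {
	return x << (s & 31)
}

func main() {
	fmt.Println(goShift(1, 100))     // 0
	fmt.Println(maskedShift(1, 100)) // 100&31 == 4, so 1<<4 == 16
}
```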

Fixes #48478

Change-Id: I87363ef2b4ceaf3b2e316426064626efdfbb8ee3
Reviewed-on: https://go-review.googlesource.com/c/go/+/350969
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
(cherry picked from commit eff27e858b)
Reviewed-on: https://go-review.googlesource.com/c/go/+/351070
Reviewed-by: Austin Clements <austin@google.com>
2021-10-27 21:14:21 +00:00
David Chase 3c85e995ef cmd/compile: extend ssa.AuxCall to closure and interface calls
Also introduce helper methods.

Change-Id: I11a744ed002bae0ca9ebabba3206e1c14147e03d
Reviewed-on: https://go-review.googlesource.com/c/go/+/239080
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-09-16 20:58:14 +00:00
David Chase b4ef49e527 cmd/compile: introduce special ssa Aux type for calls
This is a prerequisite to moving call expansion later into SSA,
and probably a good idea anyway.  Passes tests.

This is the first minimal CL that does a 1-for-1 substitution
of *ssa.AuxCall for *obj.LSym.  Next step (next CL) is to make
this change for all calls so that additional information can
be stored in AuxCall.

Change-Id: Ia3a7715648fd9fb1a176850767a726e6f5b959eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/237680
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-09-16 20:57:24 +00:00
Keith Randall 40ef1faabc cmd/compile: redo flag constant ops for arm
Encode the flag results in an auxint field instead of having
one opcode per flag state. This helps us handle the new *noov
branches in a unified manner.
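
A rough sketch of the idea, assuming a made-up packed type (flagState and compareFlags are hypothetical names for illustration, not the compiler's actual encoding): all four flags live in one small integer, and each condition is a predicate over that value.

```go
package main

import "fmt"

// flagState is a hypothetical packed form of the four ARM condition flags.
type flagState uint8

const (
	flagN flagState = 1 << iota // result was negative
	flagZ                       // result was zero
	flagC                       // carry (no borrow for a subtraction)
	flagV                       // signed overflow
)

// compareFlags returns the flags a 32-bit CMP x, y (that is, x - y) would set.
func compareFlags(x, y int32) flagState {
	var f flagState
	diff := x - y // wraps on overflow, like the hardware subtraction
	if diff < 0 {
		f |= flagN
	}
	if diff == 0 {
		f |= flagZ
	}
	if uint32(x) >= uint32(y) {
		f |= flagC
	}
	if (x < 0) != (y < 0) && (diff < 0) != (x < 0) {
		f |= flagV
	}
	return f
}

// Each condition reads the same packed value.
func (f flagState) eq() bool  { return f&flagZ != 0 }
func (f flagState) ult() bool { return f&flagC == 0 }

func main() {
	f := compareFlags(3, 7)
	fmt.Println(f.eq(), f.ult())                   // false true
	fmt.Println(compareFlags(5, 5) == flagZ|flagC) // true
}
```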

This is only for arm, arm64 is in a subsequent CL.

We could extend to other architectures as well, although it would
only be cleanup, no behavioral change.

Update #39505

Change-Id: Ia46cea596faad540d1496c5915ab1274571543f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/238077
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-06-18 20:57:49 +00:00
Xiangdong Ji e031318ca6 cmd/compile: ARM comparisons with 0 incorrect on overflow
Some ARM rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub calculation.

Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.

Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag:

  Block-Op         Meaning                   ARM condition codes
  1. LTnoov        less than                 MI
  2. GEnoov        greater than or equal     PL
  3. LEnoov        less than or equal        MI || EQ
  4. GTnoov        greater than              NE && PL

The patch also adds a few test cases to cover scenarios that are specific
to ARM and fine-tunes the code generation tests for 'x-const'.

For more details please refer to the previous fix on 64-bit ARM:
  https://go-review.googlesource.com/c/go/+/233097

Go1 perf results. 'old' is the non-optimized version, that is, with all of
the affected rewriting rules removed.

name                     old time/op    new time/op     delta
BinaryTree17-8              7.73s ± 0%      7.81s ± 0%  +0.97%  (p=0.000 n=7+8)
Fannkuch11-8                7.06s ± 0%      7.00s ± 0%  -0.83%  (p=0.000 n=8+8)
FmtFprintfEmpty-8           181ns ± 1%      183ns ± 1%  +1.31%  (p=0.001 n=8+8)
FmtFprintfString-8          319ns ± 1%      325ns ± 2%  +1.71%  (p=0.009 n=7+8)
FmtFprintfInt-8             358ns ± 1%      359ns ± 1%    ~     (p=0.293 n=7+7)
FmtFprintfIntInt-8          459ns ± 3%      456ns ± 1%    ~     (p=0.869 n=8+8)
FmtFprintfPrefixedInt-8     535ns ± 4%      538ns ± 4%    ~     (p=0.572 n=8+8)
FmtFprintfFloat-8          1.01µs ± 2%     1.01µs ± 2%    ~     (p=0.625 n=8+8)
FmtManyArgs-8              1.93µs ± 2%     1.93µs ± 1%    ~     (p=0.979 n=8+7)
GobDecode-8                16.1ms ± 1%     16.5ms ± 1%  +2.32%  (p=0.000 n=8+8)
GobEncode-8                15.9ms ± 0%     15.8ms ± 1%  -1.00%  (p=0.000 n=8+7)
Gzip-8                      690ms ± 1%      670ms ± 0%  -2.90%  (p=0.000 n=8+8)
Gunzip-8                    109ms ± 1%      109ms ± 1%    ~     (p=0.694 n=7+8)
HTTPClientServer-8          149µs ± 3%      146µs ± 2%  -1.70%  (p=0.028 n=8+8)
JSONEncode-8               50.5ms ± 1%     49.2ms ± 0%  -2.60%  (p=0.001 n=7+7)
JSONDecode-8                135ms ± 2%      137ms ± 1%    ~     (p=0.054 n=8+7)
Mandelbrot200-8             951ms ± 0%      952ms ± 0%    ~     (p=0.852 n=6+8)
GoParse-8                  9.47ms ± 1%     9.66ms ± 1%  +2.01%  (p=0.000 n=8+8)
RegexpMatchEasy0_32-8       288ns ± 2%      277ns ± 2%  -3.61%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8      1.66µs ± 1%     1.69µs ± 2%  +2.21%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8       334ns ± 1%      305ns ± 2%  -8.86%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8      2.14µs ± 2%     2.15µs ± 0%    ~     (p=0.099 n=8+8)
RegexpMatchMedium_32-8     13.3ns ± 1%     13.3ns ± 0%    ~     (p=1.000 n=7+7)
RegexpMatchMedium_1K-8     81.1µs ± 3%     80.7µs ± 1%    ~     (p=0.955 n=7+8)
RegexpMatchHard_32-8       4.26µs ± 0%     4.26µs ± 0%    ~     (p=0.933 n=7+8)
RegexpMatchHard_1K-8        124µs ± 0%      124µs ± 0%  +0.31%  (p=0.000 n=8+8)
Revcomp-8                  14.7ms ± 2%     14.5ms ± 1%  -1.66%  (p=0.003 n=8+8)
Template-8                  197ms ± 2%      200ms ± 3%  +1.62%  (p=0.021 n=8+8)
TimeParse-8                1.33µs ± 1%     1.30µs ± 1%  -1.86%  (p=0.002 n=8+8)
TimeFormat-8               3.04µs ± 1%     3.02µs ± 0%  -0.60%  (p=0.000 n=8+8)

name                     old speed      new speed       delta
GobDecode-8              47.6MB/s ± 1%   46.5MB/s ± 1%  -2.28%  (p=0.000 n=8+8)
GobEncode-8              48.1MB/s ± 0%   48.6MB/s ± 1%  +1.02%  (p=0.000 n=8+7)
Gzip-8                   28.1MB/s ± 1%   29.0MB/s ± 0%  +2.97%  (p=0.000 n=8+8)
Gunzip-8                  178MB/s ± 1%    179MB/s ± 2%    ~     (p=0.694 n=7+8)
JSONEncode-8             38.4MB/s ± 1%   39.4MB/s ± 0%  +2.67%  (p=0.001 n=7+7)
JSONDecode-8             14.3MB/s ± 2%   14.2MB/s ± 1%  -0.81%  (p=0.043 n=8+7)
GoParse-8                6.12MB/s ± 1%   5.99MB/s ± 1%  -2.00%  (p=0.000 n=8+8)
RegexpMatchEasy0_32-8     111MB/s ± 2%    115MB/s ± 2%  +3.77%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8     618MB/s ± 1%    604MB/s ± 2%  -2.16%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8    95.7MB/s ± 1%  105.1MB/s ± 2%  +9.76%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8     479MB/s ± 2%    477MB/s ± 0%    ~     (p=0.105 n=8+8)
RegexpMatchMedium_32-8   75.2MB/s ± 1%   75.2MB/s ± 0%    ~     (p=0.247 n=7+7)
RegexpMatchMedium_1K-8   12.6MB/s ± 3%   12.7MB/s ± 1%    ~     (p=0.538 n=7+8)
RegexpMatchHard_32-8     7.52MB/s ± 0%   7.52MB/s ± 0%    ~     (p=0.968 n=7+8)
RegexpMatchHard_1K-8     8.26MB/s ± 0%   8.24MB/s ± 0%  -0.30%  (p=0.001 n=8+8)
Revcomp-8                 173MB/s ± 2%    176MB/s ± 1%  +1.68%  (p=0.003 n=8+8)
Template-8               9.85MB/s ± 2%   9.69MB/s ± 3%  -1.59%  (p=0.021 n=8+8)

Fixes   #39303
Updates #38740

Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36
Reviewed-on: https://go-review.googlesource.com/c/go/+/236637
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-06-09 15:50:33 +00:00
Austin Clements 2bad2f7eba cmd/compile: mark PanicBounds/Extend as calls
PanicBounds and PanicExtend are lowered to runtime calls (with a
non-Go ABI), but are not currently marked as calls. Since liveness
analysis only emits stack maps at calls in the runtime, this means
these panic call sites in the runtime won't get a stack map. These
almost immediately turn into throws in the runtime, but there's still
a chance they'll try to grow the stack first, which would lead to a
different panic.

To fix this, mark these operations as calls.

Outside the runtime, we currently emit stack maps for everything that
isn't an unsafe-point, so these panic calls get stack maps by default.
However, we're about to move to emitting stack maps only at call
sites, at which point this will start to matter outside the runtime as
well.

I confirmed that this has no effect on anything but PCDATA/FUNCDATA in
runtime and net/http.

For #36365.

Change-Id: Ic5bb463fd152cc320c815dc04cf62005261ae169
Reviewed-on: https://go-review.googlesource.com/c/go/+/230539
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-29 21:29:14 +00:00
Cherry Zhang 2ff746d7dc runtime: add async preemption support on ARM
This CL adds support for call injection and async preemption on
ARM.

An injected call, like sigpanic, has a special frame layout. Teach
traceback to handle it.

Change-Id: I887e90134fbf8a676b73c26321c50b3c4762dba4
Reviewed-on: https://go-review.googlesource.com/c/go/+/202338
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-05 02:49:48 +00:00
smasher164 58b031949b cmd/compile: add fma intrinsic for arm
This change introduces an arm intrinsic that generates the FMULAD
instruction for the fused-multiply-add operation on systems that
support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite
rule translates the generic intrinsic to FMULAD.
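
For context, a small sketch of what fused multiply-add means at the Go level, using math.FMA from the standard library. The helper names fused and unfused are made up for this example; the explicit float64 conversion forces the product to be rounded before the addition.

```go
package main

import (
	"fmt"
	"math"
)

// fused computes x*y + z with a single rounding.
func fused(x, y, z float64) float64 {
	return math.FMA(x, y, z)
}

// unfused rounds the product first; the explicit conversion prevents
// the compiler from fusing the two operations.
func unfused(x, y, z float64) float64 {
	return float64(x*y) + z
}

func main() {
	a := 1 + 0x1p-27    // a*a is exactly 1 + 2^-26 + 2^-54
	b := -(1 + 0x1p-26) // cancels everything except the 2^-54 term

	fmt.Println(unfused(a, a, b) == 0)        // true: 2^-54 was rounded away
	fmt.Println(fused(a, a, b) == 0x1p-54)    // true: the low bits survive
}
```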

Updates #25819.

Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/142117
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 17:42:47 +00:00
Ben Shi 11d7775c9f cmd/compile: remove some nacl SSA rules
Updates golang/go#30439

Change-Id: I7ef5301fbd650d26a37a1241ddf7ca1ccd58b89d
Reviewed-on: https://go-review.googlesource.com/c/go/+/200941
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-10-15 16:45:31 +00:00
Michael Munday 9c2e7e8bed cmd/compile: allow multiple SSA block control values
Control values are used to choose which successor of a block is
jumped to. Typically a control value takes the form of a 'flags'
value that represents the result of a comparison. Some
architectures however use a variable in a register as a control
value.

Up until now we have managed with a single control value per block.
However some architectures (e.g. s390x and riscv64) have combined
compare-and-branch instructions that take two variables in registers
as parameters. To generate these instructions we need to support 2
control values per block.

This CL allows up to 2 control values to be used in a block in
order to support the addition of compare-and-branch instructions.
I have implemented s390x compare-and-branch instructions in a
different CL.
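
A simplified sketch of the shape this implies. The value and block types below are hypothetical stand-ins for illustration, not the definitions in cmd/compile/internal/ssa: a block carries a fixed two-slot array and reports how many slots are in use.

```go
package main

import "fmt"

// value and block are simplified, hypothetical stand-ins for SSA values
// and blocks.
type value struct{ name string }

type block struct {
	kind     string
	controls [2]*value // unused slots are nil and come last
}

// controlValues returns the control values in use (zero, one, or two).
func (b *block) controlValues() []*value {
	n := 0
	for n < len(b.controls) && b.controls[n] != nil {
		n++
	}
	return b.controls[:n]
}

func main() {
	flags := &value{name: "flags"}
	x, y := &value{name: "x"}, &value{name: "y"}

	// A flags-based branch needs one control value.
	ifLT := &block{kind: "LT", controls: [2]*value{flags}}
	// A combined compare-and-branch needs two register operands.
	cmpBr := &block{kind: "CompareAndBranch", controls: [2]*value{x, y}}

	fmt.Println(len(ifLT.controlValues()), len(cmpBr.controlValues())) // 1 2
}
```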

Passes toolstash-check -all.

Results of compilebench:

name                      old time/op       new time/op       delta
Template                        208ms ± 1%        209ms ± 1%    ~     (p=0.289 n=20+20)
Unicode                        83.7ms ± 1%       83.3ms ± 3%  -0.49%  (p=0.017 n=18+18)
GoTypes                         748ms ± 1%        748ms ± 0%    ~     (p=0.460 n=20+18)
Compiler                        3.47s ± 1%        3.48s ± 1%    ~     (p=0.070 n=19+18)
SSA                             11.5s ± 1%        11.7s ± 1%  +1.64%  (p=0.000 n=19+18)
Flate                           130ms ± 1%        130ms ± 1%    ~     (p=0.588 n=19+20)
GoParser                        160ms ± 1%        161ms ± 1%    ~     (p=0.211 n=20+20)
Reflect                         465ms ± 1%        467ms ± 1%  +0.42%  (p=0.007 n=20+20)
Tar                             184ms ± 1%        185ms ± 2%    ~     (p=0.087 n=18+20)
XML                             253ms ± 1%        253ms ± 1%    ~     (p=0.377 n=20+18)
LinkCompiler                    769ms ± 2%        774ms ± 2%    ~     (p=0.070 n=19+19)
ExternalLinkCompiler            3.59s ±11%        3.68s ± 6%    ~     (p=0.072 n=20+20)
LinkWithoutDebugCompiler        446ms ± 5%        454ms ± 3%  +1.79%  (p=0.002 n=19+20)
StdCmd                          26.0s ± 2%        26.0s ± 2%    ~     (p=0.799 n=20+20)

name                      old user-time/op  new user-time/op  delta
Template                        238ms ± 5%        240ms ± 5%    ~     (p=0.142 n=20+20)
Unicode                         105ms ±11%        106ms ±10%    ~     (p=0.512 n=20+20)
GoTypes                         876ms ± 2%        873ms ± 4%    ~     (p=0.647 n=20+19)
Compiler                        4.17s ± 2%        4.19s ± 1%    ~     (p=0.093 n=20+18)
SSA                             13.9s ± 1%        14.1s ± 1%  +1.45%  (p=0.000 n=18+18)
Flate                           145ms ±13%        146ms ± 5%    ~     (p=0.851 n=20+18)
GoParser                        185ms ± 5%        188ms ± 7%    ~     (p=0.174 n=20+20)
Reflect                         534ms ± 3%        538ms ± 2%    ~     (p=0.105 n=20+18)
Tar                             215ms ± 4%        211ms ± 9%    ~     (p=0.079 n=19+20)
XML                             295ms ± 6%        295ms ± 5%    ~     (p=0.968 n=20+20)
LinkCompiler                    832ms ± 4%        837ms ± 7%    ~     (p=0.707 n=17+20)
ExternalLinkCompiler            1.58s ± 8%        1.60s ± 4%    ~     (p=0.296 n=20+19)
LinkWithoutDebugCompiler        478ms ±12%        489ms ±10%    ~     (p=0.429 n=20+20)

name                      old object-bytes  new object-bytes  delta
Template                        559kB ± 0%        559kB ± 0%    ~     (all equal)
Unicode                         216kB ± 0%        216kB ± 0%    ~     (all equal)
GoTypes                        2.03MB ± 0%       2.03MB ± 0%    ~     (all equal)
Compiler                       8.07MB ± 0%       8.07MB ± 0%  -0.06%  (p=0.000 n=20+20)
SSA                            27.1MB ± 0%       27.3MB ± 0%  +0.89%  (p=0.000 n=20+20)
Flate                           343kB ± 0%        343kB ± 0%    ~     (all equal)
GoParser                        441kB ± 0%        441kB ± 0%    ~     (all equal)
Reflect                        1.36MB ± 0%       1.36MB ± 0%    ~     (all equal)
Tar                             487kB ± 0%        487kB ± 0%    ~     (all equal)
XML                             632kB ± 0%        632kB ± 0%    ~     (all equal)

name                      old export-bytes  new export-bytes  delta
Template                       18.5kB ± 0%       18.5kB ± 0%    ~     (all equal)
Unicode                        7.92kB ± 0%       7.92kB ± 0%    ~     (all equal)
GoTypes                        35.0kB ± 0%       35.0kB ± 0%    ~     (all equal)
Compiler                        109kB ± 0%        110kB ± 0%  +0.72%  (p=0.000 n=20+20)
SSA                             137kB ± 0%        138kB ± 0%  +0.58%  (p=0.000 n=20+20)
Flate                          4.89kB ± 0%       4.89kB ± 0%    ~     (all equal)
GoParser                       8.49kB ± 0%       8.49kB ± 0%    ~     (all equal)
Reflect                        11.4kB ± 0%       11.4kB ± 0%    ~     (all equal)
Tar                            10.5kB ± 0%       10.5kB ± 0%    ~     (all equal)
XML                            16.7kB ± 0%       16.7kB ± 0%    ~     (all equal)

name                      old text-bytes    new text-bytes    delta
HelloSize                       761kB ± 0%        761kB ± 0%    ~     (all equal)
CmdGoSize                      10.8MB ± 0%       10.8MB ± 0%    ~     (all equal)

name                      old data-bytes    new data-bytes    delta
HelloSize                      10.7kB ± 0%       10.7kB ± 0%    ~     (all equal)
CmdGoSize                       312kB ± 0%        312kB ± 0%    ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       122kB ± 0%        122kB ± 0%    ~     (all equal)
CmdGoSize                       146kB ± 0%        146kB ± 0%    ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.13MB ± 0%       1.13MB ± 0%    ~     (all equal)
CmdGoSize                      15.1MB ± 0%       15.1MB ± 0%    ~     (all equal)

Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-02 09:56:36 +00:00
Ben Shi 3cfd003a8a cmd/compile: optimize ARM's math.bits.RotateLeft32
This CL optimizes math.bits.RotateLeft32 to inline
"MOVW Rx@>Ry, Rd" on ARM.

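For reference, a minimal sketch of the operation being optimized. rotl32 is a hypothetical hand-written equivalent shown next to bits.RotateLeft32 from the standard library; whether a given call is compiled to a single rotate depends on the compiler version and target.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotl32 spells out a left rotation with two shifts. A negative k
// rotates right, matching bits.RotateLeft32.
func rotl32(x uint32, k int) uint32 {
	s := uint(k) & 31
	return x<<s | x>>(32-s)
}

// matchesStdlib reports whether rotl32 agrees with bits.RotateLeft32.
func matchesStdlib(x uint32, k int) bool {
	return rotl32(x, k) == bits.RotateLeft32(x, k)
}

func main() {
	fmt.Printf("%#x\n", bits.RotateLeft32(0x80000001, 1)) // 0x3
	fmt.Println(matchesStdlib(0x80000001, 1))             // true
	fmt.Println(matchesStdlib(0x12345678, -4))            // true
}
```
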
The benchmark results of math/bits show some improvements.
name               old time/op  new time/op  delta
RotateLeft-4       9.42ns ± 0%  6.91ns ± 0%  -26.66%  (p=0.000 n=40+33)
RotateLeft8-4      8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+31)
RotateLeft16-4     8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+32)
RotateLeft32-4     8.16ns ± 0%  7.54ns ± 0%   -7.68%  (p=0.000 n=40+40)
RotateLeft64-4     15.7ns ± 0%  15.7ns ± 0%     ~     (all equal)

updates #31265

Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712
Reviewed-on: https://go-review.googlesource.com/c/go/+/188697
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-28 15:41:58 +00:00
Ben Shi c683ab8128 cmd/compile: optimize ARM's math.Abs
This CL optimizes math.Abs to an inline ABSD instruction on ARM.
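
As a reminder of what the operation does, here is a small sketch. absBits is a hypothetical helper that clears the IEEE 754 sign bit by hand; it is shown only for comparison with math.Abs and is not how the compiler implements the intrinsic.

```go
package main

import (
	"fmt"
	"math"
)

// absBits clears the sign bit of a float64, which is what an absolute
// value amounts to for IEEE 754 numbers.
func absBits(x float64) float64 {
	return math.Float64frombits(math.Float64bits(x) &^ (1 << 63))
}

func main() {
	for _, x := range []float64{-2.5, 3, math.Inf(-1)} {
		fmt.Println(math.Abs(x) == absBits(x)) // true for each input
	}
}
```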

The benchmark results of src/math/ show big improvements.
name                   old time/op  new time/op  delta
Acos-4                  181ns ± 0%   182ns ± 0%   +0.30%  (p=0.000 n=40+40)
Acosh-4                 202ns ± 0%   202ns ± 0%     ~     (all equal)
Asin-4                  163ns ± 0%   163ns ± 0%     ~     (all equal)
Asinh-4                 242ns ± 0%   242ns ± 0%     ~     (all equal)
Atan-4                  120ns ± 0%   121ns ± 0%   +0.83%  (p=0.000 n=40+40)
Atanh-4                 202ns ± 0%   202ns ± 0%     ~     (all equal)
Atan2-4                 173ns ± 0%   173ns ± 0%     ~     (all equal)
Cbrt-4                 1.06µs ± 0%  1.06µs ± 0%   +0.09%  (p=0.000 n=39+37)
Ceil-4                 72.9ns ± 0%  72.8ns ± 0%     ~     (p=0.237 n=40+40)
Copysign-4             13.2ns ± 0%  13.2ns ± 0%     ~     (all equal)
Cos-4                   193ns ± 0%   183ns ± 0%   -5.18%  (p=0.000 n=40+40)
Cosh-4                  254ns ± 0%   239ns ± 0%   -5.91%  (p=0.000 n=40+40)
Erf-4                   112ns ± 0%   112ns ± 0%     ~     (all equal)
Erfc-4                  117ns ± 0%   117ns ± 0%     ~     (all equal)
Erfinv-4                127ns ± 0%   127ns ± 1%     ~     (p=0.492 n=40+40)
Erfcinv-4               128ns ± 0%   128ns ± 0%     ~     (all equal)
Exp-4                   212ns ± 0%   206ns ± 0%   -3.05%  (p=0.000 n=40+40)
ExpGo-4                 216ns ± 0%   209ns ± 0%   -3.24%  (p=0.000 n=40+40)
Expm1-4                 142ns ± 0%   142ns ± 0%     ~     (all equal)
Exp2-4                  191ns ± 0%   184ns ± 0%   -3.45%  (p=0.000 n=40+40)
Exp2Go-4                194ns ± 0%   187ns ± 0%   -3.61%  (p=0.000 n=40+40)
Abs-4                  14.4ns ± 0%   6.3ns ± 0%  -56.39%  (p=0.000 n=38+39)
Dim-4                  12.6ns ± 0%  12.6ns ± 0%     ~     (all equal)
Floor-4                49.6ns ± 0%  49.6ns ± 0%     ~     (all equal)
Max-4                  27.6ns ± 0%  27.6ns ± 0%     ~     (all equal)
Min-4                  27.0ns ± 0%  27.0ns ± 0%     ~     (all equal)
Mod-4                   349ns ± 0%   305ns ± 1%  -12.55%  (p=0.000 n=33+40)
Frexp-4                54.0ns ± 0%  47.1ns ± 0%  -12.78%  (p=0.000 n=38+38)
Gamma-4                 242ns ± 0%   234ns ± 0%   -3.16%  (p=0.000 n=36+40)
Hypot-4                84.8ns ± 0%  67.8ns ± 0%  -20.05%  (p=0.000 n=31+35)
HypotGo-4              88.5ns ± 0%  71.6ns ± 0%  -19.12%  (p=0.000 n=40+38)
Ilogb-4                45.8ns ± 0%  38.9ns ± 0%  -15.12%  (p=0.000 n=40+32)
J0-4                    821ns ± 0%   802ns ± 0%   -2.33%  (p=0.000 n=33+40)
J1-4                    816ns ± 0%   807ns ± 0%   -1.05%  (p=0.000 n=40+29)
Jn-4                   1.67µs ± 0%  1.65µs ± 0%   -1.45%  (p=0.000 n=40+39)
Ldexp-4                61.5ns ± 0%  54.6ns ± 0%  -11.27%  (p=0.000 n=40+32)
Lgamma-4                188ns ± 0%   188ns ± 0%     ~     (all equal)
Log-4                   154ns ± 0%   147ns ± 0%   -4.78%  (p=0.000 n=40+40)
Logb-4                 50.9ns ± 0%  42.7ns ± 0%  -16.11%  (p=0.000 n=34+39)
Log1p-4                 160ns ± 0%   159ns ± 0%     ~     (p=0.828 n=40+40)
Log10-4                 173ns ± 0%   166ns ± 0%   -4.05%  (p=0.000 n=40+40)
Log2-4                 65.3ns ± 0%  58.4ns ± 0%  -10.57%  (p=0.000 n=37+37)
Modf-4                 36.4ns ± 0%  36.4ns ± 0%     ~     (all equal)
Nextafter32-4          36.4ns ± 0%  36.4ns ± 0%     ~     (all equal)
Nextafter64-4          32.7ns ± 0%  32.6ns ± 0%     ~     (p=0.375 n=40+40)
PowInt-4                300ns ± 0%   277ns ± 0%   -7.78%  (p=0.000 n=40+40)
PowFrac-4               676ns ± 0%   635ns ± 0%   -6.00%  (p=0.000 n=40+35)
Pow10Pos-4             17.6ns ± 0%  17.6ns ± 0%     ~     (all equal)
Pow10Neg-4             22.0ns ± 0%  22.0ns ± 0%     ~     (all equal)
Round-4                30.1ns ± 0%  30.1ns ± 0%     ~     (all equal)
RoundToEven-4          38.9ns ± 0%  38.9ns ± 0%     ~     (all equal)
Remainder-4             291ns ± 0%   263ns ± 0%   -9.62%  (p=0.000 n=40+40)
Signbit-4              11.3ns ± 0%  11.3ns ± 0%     ~     (all equal)
Sin-4                   185ns ± 0%   185ns ± 0%     ~     (all equal)
Sincos-4                230ns ± 0%   230ns ± 0%     ~     (all equal)
Sinh-4                  253ns ± 0%   246ns ± 0%   -2.77%  (p=0.000 n=39+39)
SqrtIndirect-4         41.4ns ± 0%  41.4ns ± 0%     ~     (all equal)
SqrtLatency-4          13.8ns ± 0%  13.8ns ± 0%     ~     (all equal)
SqrtIndirectLatency-4  37.0ns ± 0%  37.0ns ± 0%     ~     (p=0.632 n=40+40)
SqrtGoLatency-4         911ns ± 0%   911ns ± 0%   +0.08%  (p=0.000 n=40+40)
SqrtPrime-4            13.2µs ± 0%  13.2µs ± 0%   +0.01%  (p=0.038 n=38+40)
Tan-4                   205ns ± 0%   205ns ± 0%     ~     (all equal)
Tanh-4                  264ns ± 0%   247ns ± 0%   -6.44%  (p=0.000 n=39+32)
Trunc-4                45.2ns ± 0%  45.2ns ± 0%     ~     (all equal)
Y0-4                    796ns ± 0%   792ns ± 0%   -0.55%  (p=0.000 n=35+40)
Y1-4                    804ns ± 0%   797ns ± 0%   -0.82%  (p=0.000 n=24+40)
Yn-4                   1.64µs ± 0%  1.62µs ± 0%   -1.27%  (p=0.000 n=40+39)
Float64bits-4          8.16ns ± 0%  8.16ns ± 0%   +0.04%  (p=0.000 n=35+40)
Float64frombits-4      10.7ns ± 0%  10.7ns ± 0%     ~     (all equal)
Float32bits-4          7.53ns ± 0%  7.53ns ± 0%     ~     (p=0.760 n=40+40)
Float32frombits-4      6.91ns ± 0%  6.91ns ± 0%   -0.04%  (p=0.002 n=32+38)
[Geo mean]              111ns        106ns        -3.98%

Change-Id: I54f4fd7f5160db020b430b556bde59cc0fdb996d
Reviewed-on: https://go-review.googlesource.com/c/go/+/188678
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-28 15:41:28 +00:00
Keith Randall 2c423f063b cmd/compile,runtime: provide index information on bounds check failure
A few examples (for accessing a slice of length 3):

   s[-1]    runtime error: index out of range [-1]
   s[3]     runtime error: index out of range [3] with length 3
   s[-1:0]  runtime error: slice bounds out of range [-1:]
   s[3:0]   runtime error: slice bounds out of range [3:0]
   s[3:-1]  runtime error: slice bounds out of range [:-1]
   s[3:4]   runtime error: slice bounds out of range [:4] with capacity 3
   s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3

Note that in cases where there are multiple things wrong with the
indexes (e.g. s[3:-1]), we report one of those errors kind of
arbitrarily, currently the rightmost one.
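
The messages can be observed from ordinary Go code by recovering the panic. A minimal sketch, where indexMessage is a hypothetical helper written for this example:

```go
package main

import "fmt"

// indexMessage indexes s at i and returns the text of the runtime
// panic, or "" if the access succeeded.
func indexMessage(s []int, i int) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	_ = s[i] // the bounds check still runs even though the value is unused
	return ""
}

func main() {
	s := []int{10, 20, 30}
	fmt.Println(indexMessage(s, 3))  // runtime error: index out of range [3] with length 3
	fmt.Println(indexMessage(s, -1)) // runtime error: index out of range [-1]
}
```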

An exhaustive set of examples is in issue30116[u].out in the CL.

The message text has the same prefix as the old message text. That
leads to slightly awkward phrasing but hopefully minimizes the chance
that code depending on the error text will break.

Increases the size of the go binary by 0.5% (amd64). The panic functions
take arguments in registers in order to keep the size of the compiled code
as small as possible.

Fixes #30116

Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/161477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-03-18 17:33:38 +00:00
erifan01 fee84cc905 cmd/compile: add an optimization rule for math/bits.ReverseBytes16 on arm
This CL adds two rules to turn patterns like ((x<<8) | (x>>8)) (the type of
x is uint16, "|" can also be "+" or "^") to a REV16 instruction on arm v6+.
This optimization rule can be used for math/bits.ReverseBytes16.
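
For illustration, the source-level pattern looks like this. swap16 is a hypothetical function name; it is compared against bits.ReverseBytes16 from the standard library.

```go
package main

import (
	"fmt"
	"math/bits"
)

// swap16 exchanges the two bytes of x using the shift-and-or pattern.
func swap16(x uint16) uint16 {
	return x<<8 | x>>8
}

// agrees reports whether swap16 matches bits.ReverseBytes16 for x.
func agrees(x uint16) bool {
	return swap16(x) == bits.ReverseBytes16(x)
}

func main() {
	fmt.Printf("%#04x\n", swap16(0x1234)) // 0x3412
	fmt.Println(agrees(0x1234))           // true
}
```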

Benchmarks on arm v6:
name               old time/op  new time/op  delta
ReverseBytes-32    2.86ns ± 0%  2.86ns ± 0%   ~     (all equal)
ReverseBytes16-32  2.86ns ± 0%  2.86ns ± 0%   ~     (all equal)
ReverseBytes32-32  1.29ns ± 0%  1.29ns ± 0%   ~     (all equal)
ReverseBytes64-32  1.43ns ± 0%  1.43ns ± 0%   ~     (all equal)

Change-Id: I819e633c9a9d308f8e476fb0c82d73fb73dd019f
Reviewed-on: https://go-review.googlesource.com/c/go/+/159019
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-07 13:37:54 +00:00
Ben Shi 096229b2ec cmd/compile: add missing type information for some arm/arm64 rules
Some indexed load/store rules lack type information; this CL adds
it.

Change-Id: Icac315ccb83a2f5bf30b056d4667d5b59eb4e5e2
Reviewed-on: https://go-review.googlesource.com/128455
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-08-27 15:22:45 +00:00
Wei Xiao 20102594a0 cmd/compile: intrinsify runtime.getcallerpc on all link register architectures
Add a compiler intrinsic for getcallerpc on the following architectures:
  arm
  mips mipsle mips64 mips64le
  ppc64 ppc64le
  s390x

Change-Id: I758f3d4742fc214b206bcd07d90408622c17dbef
Reviewed-on: https://go-review.googlesource.com/110835
Run-TryBot: Wei Xiao <Wei.Xiao@arm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-02 16:59:27 +00:00
Austin Clements 8871c930be cmd/compile: don't lower OpConvert
Currently, each architecture lowers OpConvert to an arch-specific
OpXXXconvert. This is silly because OpConvert means the same thing on
all architectures and is logically a no-op that exists only to keep
track of conversions to and from unsafe.Pointer. Furthermore, lowering
it makes it harder to recognize in other analyses, particularly
liveness analysis.

This CL eliminates the lowering of OpConvert, leaving it as the
generic op until code generation time.

The main complexity here is that we still need to register-allocate
OpConvert operations. Currently, each arch's lowered OpConvert
specifies all GP registers in its register mask. Ideally, OpConvert
wouldn't affect value homing at all, and we could just copy the home
of OpConvert's source, but this can potentially home an OpConvert in a
LocalSlot, which neither regalloc nor stackalloc expect. Rather than
try to disentangle this assumption from regalloc and stackalloc, we
continue to register-allocate OpConvert, but teach regalloc that
OpConvert can be allocated to any allocatable GP register.

For #24543.

Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-20 18:46:39 +00:00
Heschi Kreinick caa1b4afbd cmd/compile/internal/ssa: note zero-width Ops
Add a bool to opInfo to indicate if an Op never results in any
instructions. This is a conservative approximation: some operations,
like Copy, may or may not generate code depending on their arguments.

I built the list by reading each arch's ssaGenValue function. Hopefully
I got them all.

Change-Id: I130b251b65f18208294e129bb7ddc3f91d57d31d
Reviewed-on: https://go-review.googlesource.com/97957
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-02 18:55:45 +00:00
Austin Clements 1de1f316df runtime: buffered write barrier for arm
Updates #22460.

Change-Id: I5581df7ad553237db7df3701b117ad99e0593b78
Reviewed-on: https://go-review.googlesource.com/92698
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-13 16:34:17 +00:00
Ben Shi 1ec78d1dd1 cmd/compile: optimize ARM code with CMN/TST/TEQ
CMN/TST/TEQ have been supported since ARMv4 and can be used to
simplify comparisons.
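
For illustration, these are the kinds of source-level comparisons the three instructions can express directly. The function names are made up for this example, and whether the compiler actually selects CMN, TST, or TEQ for them depends on the compiler version and surrounding code.

```go
package main

import "fmt"

// sumIsZero compares a sum against zero (the shape CMN can test).
func sumIsZero(a, b int32) bool {
	return a+b == 0
}

// noCommonBits compares a bitwise AND against zero (the shape TST can test).
func noCommonBits(a, b uint32) bool {
	return a&b == 0
}

// sameBits compares a bitwise XOR against zero (the shape TEQ can test).
func sameBits(a, b uint32) bool {
	return a^b == 0
}

func main() {
	fmt.Println(sumIsZero(5, -5))             // true
	fmt.Println(noCommonBits(0b1010, 0b0101)) // true
	fmt.Println(sameBits(7, 7))               // true
}
```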

This patch implements the optimization and here are the benchmark
results.

1. A special test case got an 18.21% improvement.
name                     old time/op    new time/op    delta
TSTTEQ-4                    806µs ± 1%     659µs ± 0%  -18.21%  (p=0.000 n=20+18)
(https://github.com/benshi001/ugo1/blob/master/tstteq_test.go)

2. There is no regression in the compilecmp benchmark.
name        old time/op       new time/op       delta
Template          2.31s ± 1%        2.30s ± 1%    ~     (p=0.661 n=10+9)
Unicode           1.32s ± 3%        1.32s ± 5%    ~     (p=0.280 n=10+10)
GoTypes           7.69s ± 1%        7.65s ± 0%  -0.52%  (p=0.027 n=10+8)
Compiler          36.5s ± 1%        36.4s ± 1%    ~     (p=0.546 n=9+9)
SSA               85.1s ± 2%        84.9s ± 1%    ~     (p=0.529 n=10+10)
Flate             1.43s ± 2%        1.43s ± 2%    ~     (p=0.661 n=10+9)
GoParser          1.81s ± 2%        1.81s ± 1%    ~     (p=0.796 n=10+10)
Reflect           5.10s ± 2%        5.09s ± 1%    ~     (p=0.853 n=10+10)
Tar               2.47s ± 1%        2.48s ± 1%    ~     (p=0.123 n=10+10)
XML               2.59s ± 1%        2.58s ± 1%    ~     (p=0.853 n=10+10)
[Geo mean]        4.78s             4.77s       -0.17%

name        old user-time/op  new user-time/op  delta
Template          2.72s ± 3%        2.73s ± 2%    ~     (p=0.928 n=10+10)
Unicode           1.58s ± 4%        1.60s ± 1%    ~     (p=0.087 n=10+9)
GoTypes           9.41s ± 2%        9.36s ± 1%    ~     (p=0.060 n=10+10)
Compiler          44.4s ± 2%        44.2s ± 2%    ~     (p=0.289 n=10+10)
SSA                110s ± 2%         110s ± 1%    ~     (p=0.739 n=10+10)
Flate             1.67s ± 2%        1.63s ± 3%    ~     (p=0.063 n=10+10)
GoParser          2.12s ± 1%        2.12s ± 2%    ~     (p=0.840 n=10+10)
Reflect           5.94s ± 1%        5.98s ± 1%    ~     (p=0.063 n=9+10)
Tar               3.01s ± 2%        3.02s ± 2%    ~     (p=0.584 n=10+10)
XML               3.04s ± 3%        3.02s ± 2%    ~     (p=0.696 n=10+10)
[Geo mean]        5.73s             5.72s       -0.20%

name        old text-bytes    new text-bytes    delta
HelloSize         579kB ± 0%        579kB ± 0%    ~     (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize        72.8kB ± 0%       72.8kB ± 0%    ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)

3. There is little change in the go1 benchmark (excluding the noise).
name                     old time/op    new time/op     delta
BinaryTree17-4              40.3s ± 1%      40.6s ± 1%  +0.80%  (p=0.000 n=30+30)
Fannkuch11-4                24.2s ± 1%      24.1s ± 0%    ~     (p=0.093 n=30+30)
FmtFprintfEmpty-4           834ns ± 0%      826ns ± 0%  -0.93%  (p=0.000 n=29+24)
FmtFprintfString-4         1.39µs ± 1%     1.36µs ± 0%  -2.02%  (p=0.000 n=30+30)
FmtFprintfInt-4            1.43µs ± 1%     1.44µs ± 1%    ~     (p=0.155 n=30+29)
FmtFprintfIntInt-4         2.09µs ± 0%     2.11µs ± 0%  +1.16%  (p=0.000 n=28+30)
FmtFprintfPrefixedInt-4    2.33µs ± 1%     2.36µs ± 0%  +1.25%  (p=0.000 n=30+30)
FmtFprintfFloat-4          4.27µs ± 1%     4.32µs ± 1%  +1.27%  (p=0.000 n=30+30)
FmtManyArgs-4              8.18µs ± 0%     8.14µs ± 0%  -0.46%  (p=0.000 n=25+27)
GobDecode-4                 101ms ± 1%      101ms ± 1%    ~     (p=0.182 n=29+29)
GobEncode-4                89.6ms ± 1%     87.8ms ± 2%  -2.02%  (p=0.000 n=30+29)
Gzip-4                      4.07s ± 1%      4.08s ± 1%    ~     (p=0.173 n=30+27)
Gunzip-4                    602ms ± 1%      600ms ± 1%  -0.29%  (p=0.000 n=29+28)
HTTPClientServer-4          679µs ± 4%      683µs ± 3%    ~     (p=0.197 n=30+30)
JSONEncode-4                241ms ± 1%      239ms ± 1%  -0.84%  (p=0.000 n=30+30)
JSONDecode-4                903ms ± 1%      882ms ± 1%  -2.33%  (p=0.000 n=30+30)
Mandelbrot200-4            41.8ms ± 0%     41.8ms ± 0%    ~     (p=0.719 n=30+30)
GoParse-4                  45.5ms ± 1%     45.8ms ± 1%  +0.52%  (p=0.000 n=30+30)
RegexpMatchEasy0_32-4      1.27µs ± 1%     1.27µs ± 0%  -0.60%  (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4      7.77µs ± 6%     7.69µs ± 4%  -0.96%  (p=0.040 n=30+30)
RegexpMatchEasy1_32-4      1.29µs ± 1%     1.28µs ± 1%  -0.54%  (p=0.000 n=30+30)
RegexpMatchEasy1_1K-4      10.3µs ± 6%     10.2µs ± 3%    ~     (p=0.453 n=30+27)
RegexpMatchMedium_32-4     1.98µs ± 1%     2.00µs ± 1%  +0.85%  (p=0.000 n=30+29)
RegexpMatchMedium_1K-4      503µs ± 0%      503µs ± 1%    ~     (p=0.752 n=30+30)
RegexpMatchHard_32-4       27.1µs ± 1%     26.5µs ± 0%  -1.96%  (p=0.000 n=30+24)
RegexpMatchHard_1K-4        809µs ± 1%      799µs ± 1%  -1.29%  (p=0.000 n=29+30)
Revcomp-4                  67.3ms ± 2%     67.2ms ± 1%    ~     (p=0.265 n=29+29)
Template-4                  1.08s ± 1%      1.07s ± 0%  -1.39%  (p=0.000 n=30+22)
TimeParse-4                6.93µs ± 1%     6.96µs ± 1%  +0.40%  (p=0.005 n=30+30)
TimeFormat-4               13.3µs ± 0%     13.3µs ± 1%    ~     (p=0.734 n=30+30)
[Geo mean]                  709µs           707µs       -0.32%

name                     old speed      new speed       delta
GobDecode-4              7.59MB/s ± 1%   7.57MB/s ± 1%    ~     (p=0.145 n=29+29)
GobEncode-4              8.56MB/s ± 1%   8.74MB/s ± 1%  +2.07%  (p=0.000 n=30+29)
Gzip-4                   4.76MB/s ± 1%   4.75MB/s ± 1%  -0.25%  (p=0.037 n=30+30)
Gunzip-4                 32.2MB/s ± 1%   32.3MB/s ± 1%  +0.29%  (p=0.000 n=29+28)
JSONEncode-4             8.04MB/s ± 1%   8.11MB/s ± 1%  +0.85%  (p=0.000 n=30+30)
JSONDecode-4             2.15MB/s ± 1%   2.20MB/s ± 1%  +2.29%  (p=0.000 n=30+30)
GoParse-4                1.27MB/s ± 1%   1.26MB/s ± 1%  -0.73%  (p=0.000 n=30+30)
RegexpMatchEasy0_32-4    25.1MB/s ± 1%   25.3MB/s ± 0%  +0.61%  (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4     131MB/s ± 6%    133MB/s ± 4%  +1.35%  (p=0.009 n=28+30)
RegexpMatchEasy1_32-4    24.9MB/s ± 1%   25.0MB/s ± 1%  +0.54%  (p=0.000 n=30+30)
RegexpMatchEasy1_1K-4    99.2MB/s ± 6%  100.2MB/s ± 3%    ~     (p=0.448 n=30+27)
RegexpMatchMedium_32-4    503kB/s ± 1%    500kB/s ± 0%  -0.66%  (p=0.002 n=30+24)
RegexpMatchMedium_1K-4   2.04MB/s ± 0%   2.04MB/s ± 1%    ~     (p=0.358 n=30+30)
RegexpMatchHard_32-4     1.18MB/s ± 1%   1.20MB/s ± 1%  +1.75%  (p=0.000 n=30+30)
RegexpMatchHard_1K-4     1.26MB/s ± 1%   1.28MB/s ± 1%  +1.42%  (p=0.000 n=30+30)
Revcomp-4                37.8MB/s ± 2%   37.8MB/s ± 1%    ~     (p=0.266 n=29+29)
Template-4               1.80MB/s ± 1%   1.82MB/s ± 1%  +1.46%  (p=0.000 n=30+30)
[Geo mean]               6.91MB/s        6.96MB/s       +0.70%

fixes #21583

Change-Id: I24065a80588ccae7de3ad732a3cfb0026cf7e214
Reviewed-on: https://go-review.googlesource.com/67490
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-10-11 14:03:00 +00:00
Cherry Zhang 6f3e5e637c cmd/compile: intrinsify runtime.getcallersp
Add a compiler intrinsic for getcallersp, so that we can later get
rid of its argument (not done in this CL).

Change-Id: Ic38fda1c694f918328659ab44654198fb116668d
Reviewed-on: https://go-review.googlesource.com/69350
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
2017-10-10 15:15:21 +00:00
Ben Shi 9732485851 cmd/compile: optimized ARM code with BFX/BFXU
BFX and BFXU were introduced in ARMv6T2. A single BFX or BFXU is
more efficient than a left-shift/right-shift pair for bit field
extraction.
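
As a rough illustration of the shift pair described above, the sketch below
extracts a bit field with a left shift followed by a right shift. The helper
names extractUnsigned and extractSigned are hypothetical and do not come from
this CL, and whether the compiler turns this exact source into a single
BFX/BFXU depends on the SSA rules, which are not shown here. The helpers
assume 1 <= width and lsb+width <= 32.

```go
package main

import "fmt"

// extractUnsigned returns the width-bit field of x starting at bit lsb,
// zero-extended. The left shift drops the bits above the field and the
// logical right shift drops the bits below it.
// Assumes 1 <= width and lsb+width <= 32 (hypothetical helper).
func extractUnsigned(x uint32, lsb, width uint) uint32 {
	return (x << (32 - lsb - width)) >> (32 - width)
}

// extractSigned does the same on a signed value. The right shift is
// arithmetic here, so the field is sign-extended.
func extractSigned(x int32, lsb, width uint) int32 {
	return (x << (32 - lsb - width)) >> (32 - width)
}

func main() {
	// Bits 8..15 of 0xABCD1234, zero-extended.
	fmt.Printf("%#x\n", extractUnsigned(0xABCD1234, 8, 8))
	// Bits 8..15 of 0x0000F200, sign-extended.
	fmt.Println(extractSigned(0x0000F200, 8, 8))
}
```

The unsigned form is the zero-extending case and the signed form is the
sign-extending case of the same extraction.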

This patch implements this optimization. The benchmarks show a big
improvement in special cases and little change overall.

1. There is a big improvement in a special test case.
name                     old time/op    new time/op    delta
BFX-4                       665µs ± 1%     595µs ± 0%  -10.61%  (p=0.000 n=20+20)
(The test case: https://github.com/benshi001/ugo1/blob/master/bfx_test.go)

2. The compilecmp benchmark shows no regression.
name        old time/op       new time/op       delta
Template          2.33s ± 2%        2.34s ± 2%    ~     (p=0.356 n=9+10)
Unicode           1.32s ± 2%        1.30s ± 2%    ~     (p=0.139 n=9+8)
GoTypes           7.77s ± 1%        7.76s ± 1%    ~     (p=0.780 n=10+9)
Compiler          37.3s ± 1%        37.1s ± 1%    ~     (p=0.211 n=10+9)
SSA               84.3s ± 2%        84.3s ± 2%    ~     (p=0.842 n=10+9)
Flate             1.45s ± 1%        1.45s ± 3%    ~     (p=0.853 n=10+10)
GoParser          1.83s ± 2%        1.83s ± 2%    ~     (p=0.739 n=10+10)
Reflect           5.08s ± 2%        5.09s ± 2%    ~     (p=0.720 n=9+10)
Tar               2.44s ± 1%        2.44s ± 2%    ~     (p=0.684 n=10+10)
XML               2.62s ± 2%        2.62s ± 2%    ~     (p=0.529 n=10+10)
[Geo mean]        4.80s             4.79s       -0.06%

name        old user-time/op  new user-time/op  delta
Template          2.76s ± 2%        2.75s ± 3%    ~     (p=0.893 n=10+10)
Unicode           1.63s ± 1%        1.60s ± 1%  -2.07%  (p=0.000 n=8+9)
GoTypes           9.54s ± 1%        9.52s ± 1%    ~     (p=0.215 n=10+10)
Compiler          46.0s ± 1%        46.0s ± 1%    ~     (p=0.853 n=10+10)
SSA                110s ± 1%         110s ± 1%    ~     (p=0.838 n=10+10)
Flate             1.69s ± 3%        1.69s ± 5%    ~     (p=0.957 n=10+10)
GoParser          2.15s ± 2%        2.15s ± 2%    ~     (p=0.749 n=10+10)
Reflect           6.03s ± 1%        5.99s ± 2%    ~     (p=0.060 n=9+10)
Tar               3.02s ± 2%        2.99s ± 2%    ~     (p=0.214 n=10+10)
XML               3.10s ± 2%        3.08s ± 2%    ~     (p=0.732 n=9+10)
[Geo mean]        5.82s             5.79s       -0.41%

name        old text-bytes    new text-bytes    delta
HelloSize         589kB ± 0%        589kB ± 0%    ~     (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize        76.9kB ± 0%       76.9kB ± 0%    ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)

3. The go1 benchmark shows little change in total (excluding noise).
name                     old time/op    new time/op    delta
BinaryTree17-4              41.5s ± 1%     41.6s ± 1%    ~     (p=0.373 n=30+26)
Fannkuch11-4                23.6s ± 1%     23.6s ± 1%  +0.28%  (p=0.003 n=29+30)
FmtFprintfEmpty-4           826ns ± 1%     827ns ± 1%    ~     (p=0.155 n=30+30)
FmtFprintfString-4         1.35µs ± 1%    1.35µs ± 1%    ~     (p=0.499 n=30+30)
FmtFprintfInt-4            1.43µs ± 1%    1.41µs ± 1%  -1.19%  (p=0.000 n=30+30)
FmtFprintfIntInt-4         2.15µs ± 1%    2.11µs ± 1%  -1.78%  (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4    2.21µs ± 1%    2.21µs ± 1%    ~     (p=0.881 n=30+30)
FmtFprintfFloat-4          4.41µs ± 1%    4.44µs ± 0%  +0.64%  (p=0.000 n=30+30)
FmtManyArgs-4              8.06µs ± 1%    8.06µs ± 0%    ~     (p=0.871 n=30+30)
GobDecode-4                 103ms ± 1%     104ms ± 2%  +0.54%  (p=0.013 n=28+29)
GobEncode-4                92.4ms ± 1%    92.6ms ± 1%    ~     (p=0.447 n=30+29)
Gzip-4                      4.17s ± 1%     4.06s ± 1%  -2.56%  (p=0.000 n=29+30)
Gunzip-4                    603ms ± 1%     602ms ± 1%    ~     (p=0.423 n=30+30)
HTTPClientServer-4          688µs ± 2%     674µs ± 3%  -2.09%  (p=0.000 n=29+30)
JSONEncode-4                237ms ± 1%     237ms ± 1%    ~     (p=0.061 n=29+30)
JSONDecode-4                907ms ± 1%     910ms ± 1%    ~     (p=0.061 n=30+30)
Mandelbrot200-4            41.7ms ± 0%    41.7ms ± 0%  +0.19%  (p=0.000 n=24+20)
GoParse-4                  45.7ms ± 2%    45.5ms ± 2%  -0.29%  (p=0.005 n=30+30)
RegexpMatchEasy0_32-4      1.27µs ± 0%    1.27µs ± 0%  +0.12%  (p=0.031 n=30+30)
RegexpMatchEasy0_1K-4      7.77µs ± 4%    7.73µs ± 3%    ~     (p=0.169 n=30+30)
RegexpMatchEasy1_32-4      1.29µs ± 1%    1.29µs ± 1%    ~     (p=0.126 n=30+30)
RegexpMatchEasy1_1K-4      10.4µs ± 3%    10.3µs ± 2%  -1.32%  (p=0.004 n=30+29)
RegexpMatchMedium_32-4     2.06µs ± 0%    2.06µs ± 0%    ~     (p=0.071 n=30+30)
RegexpMatchMedium_1K-4      531µs ± 1%     530µs ± 0%    ~     (p=0.121 n=30+23)
RegexpMatchHard_32-4       28.7µs ± 1%    28.6µs ± 1%  -0.21%  (p=0.001 n=30+27)
RegexpMatchHard_1K-4        860µs ± 1%     857µs ± 1%    ~     (p=0.105 n=30+27)
Revcomp-4                  67.3ms ± 2%    67.3ms ± 2%    ~     (p=0.805 n=29+29)
Template-4                  1.08s ± 1%     1.08s ± 1%    ~     (p=0.260 n=30+30)
TimeParse-4                7.04µs ± 0%    7.04µs ± 0%    ~     (p=0.315 n=30+30)
TimeFormat-4               13.2µs ± 1%    13.2µs ± 1%    ~     (p=0.077 n=30+30)
[Geo mean]                  715µs          713µs       -0.30%

name                     old speed      new speed      delta
GobDecode-4              7.42MB/s ± 1%  7.38MB/s ± 2%  -0.54%  (p=0.011 n=28+29)
GobEncode-4              8.30MB/s ± 1%  8.29MB/s ± 1%    ~     (p=0.484 n=30+29)
Gzip-4                   4.65MB/s ± 2%  4.78MB/s ± 1%  +2.73%  (p=0.000 n=30+30)
Gunzip-4                 32.2MB/s ± 1%  32.2MB/s ± 1%    ~     (p=0.357 n=30+30)
JSONEncode-4             8.18MB/s ± 1%  8.19MB/s ± 1%    ~     (p=0.052 n=29+30)
JSONDecode-4             2.14MB/s ± 1%  2.13MB/s ± 1%    ~     (p=0.074 n=30+29)
GoParse-4                1.27MB/s ± 1%  1.27MB/s ± 2%    ~     (p=0.618 n=24+30)
RegexpMatchEasy0_32-4    25.2MB/s ± 0%  25.2MB/s ± 0%  -0.12%  (p=0.031 n=30+30)
RegexpMatchEasy0_1K-4     132MB/s ± 5%   132MB/s ± 2%    ~     (p=0.171 n=30+30)
RegexpMatchEasy1_32-4    24.8MB/s ± 1%  24.9MB/s ± 1%    ~     (p=0.106 n=30+30)
RegexpMatchEasy1_1K-4    98.4MB/s ± 3%  99.6MB/s ± 4%  +1.19%  (p=0.011 n=30+30)
RegexpMatchMedium_32-4    483kB/s ± 1%   484kB/s ± 1%    ~     (p=0.426 n=30+30)
RegexpMatchMedium_1K-4   1.93MB/s ± 1%  1.93MB/s ± 0%    ~     (p=0.157 n=30+17)
RegexpMatchHard_32-4     1.12MB/s ± 1%  1.12MB/s ± 0%  +0.33%  (p=0.001 n=30+24)
RegexpMatchHard_1K-4     1.19MB/s ± 1%  1.19MB/s ± 1%    ~     (p=0.290 n=30+30)
Revcomp-4                37.8MB/s ± 2%  37.8MB/s ± 1%    ~     (p=0.815 n=29+29)
Template-4               1.80MB/s ± 1%  1.80MB/s ± 1%    ~     (p=0.586 n=30+30)
[Geo mean]               6.80MB/s       6.81MB/s       +0.25%

fixes #20966

Change-Id: Idb5567bbe988c875315b8c98c128957cd474ccc5
Reviewed-on: https://go-review.googlesource.com/64950
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
2017-09-21 12:41:04 +00:00
Ben Shi a07176b45a cmd/compile: optimize ARM code with MULAF/MULSF/MULAD/MULSD
The Go compiler can generate better ARM code with these more
efficient FP instructions. There is little improvement overall,
but a big improvement in special cases.
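
For orientation, the source patterns that multiply-accumulate and
multiply-subtract instructions target look roughly like the sketch below.
The function names are hypothetical, and this is only an assumption about
the shape of code that benefits; the actual matching is done by SSA rules
that are not shown in this message.

```go
package main

import "fmt"

// mulAdd64 and mulSub64 are the float64 shapes: a multiply whose result
// is immediately added to or subtracted from another value.
func mulAdd64(a, b, c float64) float64 { return a + b*c }
func mulSub64(a, b, c float64) float64 { return a - b*c }

// mulAdd32 and mulSub32 are the float32 counterparts.
func mulAdd32(a, b, c float32) float32 { return a + b*c }
func mulSub32(a, b, c float32) float32 { return a - b*c }

func main() {
	fmt.Println(mulAdd64(1, 2, 3), mulSub64(10, 2, 3))
	fmt.Println(mulAdd32(1, 2, 3), mulSub32(10, 2, 3))
}
```

A loop such as the Mandelbrot iteration is dominated by expressions of this
shape, which is consistent with the Mandelbrot200 result reported in this
message.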

1. The size of pkg/linux_arm/math.a shrinks by 2.4%.

2. There is neither improvement nor regression in the compilecmp benchmark.
name        old time/op       new time/op       delta
Template          2.32s ± 2%        2.32s ± 1%    ~     (p=1.000 n=9+10)
Unicode           1.32s ± 4%        1.32s ± 4%    ~     (p=0.912 n=10+10)
GoTypes           7.76s ± 1%        7.79s ± 1%    ~     (p=0.447 n=9+10)
Compiler          37.4s ± 2%        37.2s ± 2%    ~     (p=0.218 n=10+10)
SSA               84.8s ± 2%        85.0s ± 1%    ~     (p=0.604 n=10+9)
Flate             1.45s ± 2%        1.44s ± 2%    ~     (p=0.075 n=10+10)
GoParser          1.82s ± 1%        1.81s ± 1%    ~     (p=0.190 n=10+10)
Reflect           5.06s ± 1%        5.05s ± 1%    ~     (p=0.315 n=10+9)
Tar               2.37s ± 1%        2.37s ± 2%    ~     (p=0.912 n=10+10)
XML               2.56s ± 1%        2.58s ± 2%    ~     (p=0.089 n=10+10)
[Geo mean]        4.77s             4.77s       -0.08%

name        old user-time/op  new user-time/op  delta
Template          2.74s ± 2%        2.75s ± 2%    ~     (p=0.856 n=9+10)
Unicode           1.61s ± 4%        1.62s ± 3%    ~     (p=0.693 n=10+10)
GoTypes           9.55s ± 1%        9.49s ± 2%    ~     (p=0.056 n=9+10)
Compiler          45.9s ± 1%        45.8s ± 1%    ~     (p=0.345 n=9+10)
SSA                110s ± 1%         110s ± 1%    ~     (p=0.763 n=9+10)
Flate             1.68s ± 2%        1.68s ± 3%    ~     (p=0.616 n=10+10)
GoParser          2.14s ± 4%        2.14s ± 1%    ~     (p=0.825 n=10+9)
Reflect           5.95s ± 1%        5.97s ± 3%    ~     (p=0.951 n=9+10)
Tar               2.94s ± 3%        2.93s ± 2%    ~     (p=0.359 n=10+10)
XML               3.03s ± 3%        3.07s ± 6%    ~     (p=0.166 n=10+10)
[Geo mean]        5.76s             5.77s       +0.12%

name        old text-bytes    new text-bytes    delta
HelloSize         588kB ± 0%        588kB ± 0%    ~     (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize        72.9kB ± 0%       72.9kB ± 0%    ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)

3. The performance of Mandelbrot200 improves by 15%, though there is
   little improvement in total.
name                     old time/op    new time/op    delta
BinaryTree17-4              41.7s ± 1%     41.7s ± 1%     ~     (p=0.264 n=29+23)
Fannkuch11-4                24.2s ± 0%     24.1s ± 1%   -0.13%  (p=0.050 n=30+30)
FmtFprintfEmpty-4           826ns ± 1%     824ns ± 1%   -0.24%  (p=0.038 n=25+30)
FmtFprintfString-4         1.38µs ± 1%    1.38µs ± 0%   -0.42%  (p=0.000 n=27+25)
FmtFprintfInt-4            1.46µs ± 1%    1.46µs ± 0%     ~     (p=0.060 n=30+23)
FmtFprintfIntInt-4         2.11µs ± 1%    2.08µs ± 0%   -1.04%  (p=0.000 n=30+30)
FmtFprintfPrefixedInt-4    2.23µs ± 1%    2.22µs ± 1%   -0.51%  (p=0.000 n=30+30)
FmtFprintfFloat-4          4.49µs ± 1%    4.48µs ± 1%   -0.22%  (p=0.004 n=26+30)
FmtManyArgs-4              8.06µs ± 1%    8.12µs ± 1%   +0.68%  (p=0.000 n=25+30)
GobDecode-4                 104ms ± 1%     104ms ± 2%     ~     (p=0.362 n=29+29)
GobEncode-4                92.9ms ± 1%    92.8ms ± 2%     ~     (p=0.786 n=30+30)
Gzip-4                      4.12s ± 1%     4.12s ± 1%     ~     (p=0.314 n=30+30)
Gunzip-4                    602ms ± 1%     603ms ± 1%     ~     (p=0.164 n=30+30)
HTTPClientServer-4          659µs ± 1%     655µs ± 2%   -0.64%  (p=0.006 n=25+28)
JSONEncode-4                234ms ± 1%     235ms ± 1%   +0.29%  (p=0.050 n=30+30)
JSONDecode-4                912ms ± 0%     911ms ± 0%     ~     (p=0.385 n=18+24)
Mandelbrot200-4            49.2ms ± 0%    41.7ms ± 0%  -15.35%  (p=0.000 n=25+27)
GoParse-4                  46.3ms ± 1%    46.3ms ± 2%     ~     (p=0.572 n=30+30)
RegexpMatchEasy0_32-4      1.29µs ± 1%    1.27µs ± 0%   -1.59%  (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4      7.62µs ± 4%    7.71µs ± 3%     ~     (p=0.074 n=30+30)
RegexpMatchEasy1_32-4      1.31µs ± 0%    1.30µs ± 1%   -0.71%  (p=0.000 n=23+30)
RegexpMatchEasy1_1K-4      10.3µs ± 3%    10.3µs ± 5%     ~     (p=0.105 n=30+30)
RegexpMatchMedium_32-4     2.06µs ± 1%    2.06µs ± 1%     ~     (p=0.100 n=30+30)
RegexpMatchMedium_1K-4      533µs ± 1%     534µs ± 1%     ~     (p=0.254 n=29+30)
RegexpMatchHard_32-4       28.9µs ± 0%    28.9µs ± 0%     ~     (p=0.154 n=30+30)
RegexpMatchHard_1K-4        868µs ± 1%     867µs ± 0%     ~     (p=0.729 n=30+23)
Revcomp-4                  66.9ms ± 1%    67.2ms ± 2%     ~     (p=0.102 n=28+29)
Template-4                  1.07s ± 1%     1.06s ± 1%   -0.53%  (p=0.000 n=30+30)
TimeParse-4                7.07µs ± 1%    7.01µs ± 0%   -0.85%  (p=0.000 n=30+25)
TimeFormat-4               13.1µs ± 0%    13.2µs ± 1%   +0.77%  (p=0.000 n=27+27)
[Geo mean]                  721µs          716µs        -0.70%

name                     old speed      new speed      delta
GobDecode-4              7.38MB/s ± 1%  7.37MB/s ± 2%     ~     (p=0.399 n=29+29)
GobEncode-4              8.26MB/s ± 1%  8.27MB/s ± 2%     ~     (p=0.790 n=30+30)
Gzip-4                   4.71MB/s ± 1%  4.71MB/s ± 1%     ~     (p=0.885 n=30+30)
Gunzip-4                 32.2MB/s ± 1%  32.2MB/s ± 1%     ~     (p=0.190 n=30+30)
JSONEncode-4             8.28MB/s ± 1%  8.25MB/s ± 1%     ~     (p=0.053 n=30+30)
JSONDecode-4             2.13MB/s ± 0%  2.12MB/s ± 1%     ~     (p=0.072 n=18+30)
GoParse-4                1.25MB/s ± 1%  1.25MB/s ± 2%     ~     (p=0.863 n=30+30)
RegexpMatchEasy0_32-4    24.8MB/s ± 0%  25.2MB/s ± 1%   +1.61%  (p=0.000 n=30+30)
RegexpMatchEasy0_1K-4     134MB/s ± 4%   133MB/s ± 3%     ~     (p=0.074 n=30+30)
RegexpMatchEasy1_32-4    24.5MB/s ± 0%  24.6MB/s ± 1%   +0.72%  (p=0.000 n=23+30)
RegexpMatchEasy1_1K-4    99.1MB/s ± 3%  99.8MB/s ± 5%     ~     (p=0.105 n=30+30)
RegexpMatchMedium_32-4    483kB/s ± 1%   487kB/s ± 1%   +0.83%  (p=0.002 n=30+30)
RegexpMatchMedium_1K-4   1.92MB/s ± 1%  1.92MB/s ± 1%     ~     (p=0.058 n=30+30)
RegexpMatchHard_32-4     1.10MB/s ± 0%  1.11MB/s ± 0%     ~     (p=0.804 n=30+30)
RegexpMatchHard_1K-4     1.18MB/s ± 0%  1.18MB/s ± 0%     ~     (all equal)
Revcomp-4                38.0MB/s ± 1%  37.8MB/s ± 2%     ~     (p=0.098 n=28+29)
Template-4               1.82MB/s ± 1%  1.83MB/s ± 1%   +0.55%  (p=0.000 n=29+29)
[Geo mean]               6.79MB/s       6.79MB/s        +0.09%

Change-Id: Ia91991c2c5c59c5df712de85a83b13a21c0a554b
Reviewed-on: https://go-review.googlesource.com/63770
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-09-15 22:30:34 +00:00
Ben Shi 2899c3e8cb cmd/compile: optimize ARM code with NMULF/NMULD
NMULF and NMULD are efficient FP instructions, and the Go compiler can
use them to generate better code.
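
A negated product is the kind of expression a negate-multiply instruction
can cover in one step. The sketch below shows that shape; the function names
are hypothetical and the exact patterns matched by the compiler are an
assumption, since the SSA rules are not part of this message.

```go
package main

import "fmt"

// negMul64 negates a float64 product. Without a combined instruction this
// would be a multiply followed by a separate negation.
func negMul64(a, b float64) float64 { return -(a * b) }

// negMul32 is the float32 counterpart.
func negMul32(a, b float32) float32 { return -(a * b) }

func main() {
	fmt.Println(negMul64(2, 3), negMul32(1.5, 4))
}
```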

The benchmarks for this patch show no general change, but a big
improvement in special cases.

1. A special test case improved by 12.6%.
https://github.com/benshi001/ugo1/blob/master/fpmul_test.go
name                     old time/op    new time/op    delta
FPMul-4                     398µs ± 1%     348µs ± 1%  -12.64%  (p=0.000 n=40+40)

2. The compilecmp test showed little change.
name        old time/op       new time/op       delta
Template          2.30s ± 1%        2.31s ± 1%    ~     (p=0.754 n=17+19)
Unicode           1.31s ± 3%        1.32s ± 5%    ~     (p=0.265 n=20+20)
GoTypes           7.73s ± 2%        7.73s ± 1%    ~     (p=0.925 n=20+20)
Compiler          37.0s ± 1%        37.3s ± 2%  +0.79%  (p=0.002 n=19+20)
SSA               83.8s ± 4%        83.5s ± 2%    ~     (p=0.964 n=20+17)
Flate             1.43s ± 2%        1.44s ± 1%    ~     (p=0.602 n=20+20)
GoParser          1.82s ± 2%        1.81s ± 2%    ~     (p=0.141 n=19+20)
Reflect           5.08s ± 2%        5.08s ± 3%    ~     (p=0.835 n=20+19)
Tar               2.36s ± 1%        2.35s ± 1%    ~     (p=0.195 n=18+17)
XML               2.57s ± 2%        2.56s ± 1%    ~     (p=0.283 n=20+17)
[Geo mean]        4.74s             4.75s       +0.05%

name        old user-time/op  new user-time/op  delta
Template          2.75s ± 2%        2.75s ± 0%    ~     (p=0.620 n=20+15)
Unicode           1.59s ± 4%        1.60s ± 4%    ~     (p=0.479 n=20+19)
GoTypes           9.48s ± 1%        9.47s ± 1%    ~     (p=0.743 n=20+20)
Compiler          45.7s ± 1%        45.7s ± 1%    ~     (p=0.482 n=19+20)
SSA                109s ± 1%         109s ± 2%    ~     (p=0.800 n=18+20)
Flate             1.67s ± 3%        1.67s ± 3%    ~     (p=0.598 n=19+18)
GoParser          2.15s ± 4%        2.13s ± 3%    ~     (p=0.153 n=20+20)
Reflect           5.95s ± 2%        5.95s ± 2%    ~     (p=0.961 n=19+20)
Tar               2.93s ± 2%        2.92s ± 3%    ~     (p=0.242 n=20+19)
XML               3.02s ± 3%        3.04s ± 3%    ~     (p=0.233 n=19+18)
[Geo mean]        5.74s             5.74s       -0.04%

name        old text-bytes    new text-bytes    delta
HelloSize         588kB ± 0%        588kB ± 0%    ~     (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize        72.9kB ± 0%       72.9kB ± 0%    ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)

3. The go1 benchmark showed little change in total.
name                     old time/op    new time/op    delta
BinaryTree17-4              41.8s ± 1%     41.8s ± 1%    ~     (p=0.388 n=40+39)
Fannkuch11-4                24.1s ± 1%     24.1s ± 1%    ~     (p=0.077 n=40+40)
FmtFprintfEmpty-4           834ns ± 1%     831ns ± 1%  -0.31%  (p=0.002 n=40+37)
FmtFprintfString-4         1.34µs ± 1%    1.34µs ± 0%    ~     (p=0.387 n=40+40)
FmtFprintfInt-4            1.44µs ± 1%    1.44µs ± 1%    ~     (p=0.421 n=40+40)
FmtFprintfIntInt-4         2.09µs ± 0%    2.09µs ± 1%    ~     (p=0.589 n=40+39)
FmtFprintfPrefixedInt-4    2.32µs ± 1%    2.33µs ± 1%  +0.15%  (p=0.001 n=40+40)
FmtFprintfFloat-4          4.51µs ± 0%    4.44µs ± 1%  -1.50%  (p=0.000 n=40+40)
FmtManyArgs-4              7.94µs ± 0%    7.97µs ± 0%  +0.36%  (p=0.001 n=32+40)
GobDecode-4                 104ms ± 1%     102ms ± 2%  -1.27%  (p=0.000 n=39+37)
GobEncode-4                90.5ms ± 1%    90.9ms ± 2%  +0.40%  (p=0.006 n=37+40)
Gzip-4                      4.10s ± 2%     4.08s ± 1%  -0.30%  (p=0.004 n=40+40)
Gunzip-4                    603ms ± 0%     602ms ± 1%    ~     (p=0.303 n=37+40)
HTTPClientServer-4          672µs ± 3%     658µs ± 2%  -2.08%  (p=0.000 n=39+37)
JSONEncode-4                238ms ± 1%     239ms ± 0%  +0.26%  (p=0.001 n=40+25)
JSONDecode-4                884ms ± 1%     885ms ± 1%  +0.16%  (p=0.012 n=40+40)
Mandelbrot200-4            49.3ms ± 0%    49.3ms ± 0%    ~     (p=0.588 n=40+38)
GoParse-4                  46.3ms ± 1%    46.4ms ± 2%    ~     (p=0.487 n=40+40)
RegexpMatchEasy0_32-4      1.28µs ± 1%    1.28µs ± 0%  +0.12%  (p=0.003 n=40+40)
RegexpMatchEasy0_1K-4      7.78µs ± 5%    7.78µs ± 4%    ~     (p=0.825 n=40+40)
RegexpMatchEasy1_32-4      1.29µs ± 1%    1.29µs ± 0%    ~     (p=0.659 n=40+40)
RegexpMatchEasy1_1K-4      10.3µs ± 3%    10.4µs ± 2%    ~     (p=0.266 n=40+40)
RegexpMatchMedium_32-4     2.05µs ± 1%    2.05µs ± 0%  -0.18%  (p=0.002 n=40+28)
RegexpMatchMedium_1K-4      533µs ± 1%     534µs ± 1%    ~     (p=0.397 n=37+40)
RegexpMatchHard_32-4       28.9µs ± 1%    28.9µs ± 1%  -0.22%  (p=0.002 n=40+40)
RegexpMatchHard_1K-4        868µs ± 1%     870µs ± 1%  +0.21%  (p=0.015 n=40+40)
Revcomp-4                  67.3ms ± 1%    67.2ms ± 2%    ~     (p=0.262 n=38+39)
Template-4                  1.07s ± 1%     1.07s ± 1%    ~     (p=0.276 n=40+40)
TimeParse-4                7.16µs ± 1%    7.16µs ± 1%    ~     (p=0.610 n=39+40)
TimeFormat-4               13.3µs ± 1%    13.3µs ± 1%    ~     (p=0.617 n=38+40)
[Geo mean]                  720µs          719µs       -0.13%

name                     old speed      new speed      delta
GobDecode-4              7.39MB/s ± 1%  7.49MB/s ± 2%  +1.25%  (p=0.000 n=39+38)
GobEncode-4              8.48MB/s ± 1%  8.45MB/s ± 2%  -0.40%  (p=0.005 n=37+40)
Gzip-4                   4.74MB/s ± 2%  4.75MB/s ± 1%  +0.30%  (p=0.018 n=40+40)
Gunzip-4                 32.2MB/s ± 0%  32.2MB/s ± 1%    ~     (p=0.272 n=36+40)
JSONEncode-4             8.15MB/s ± 1%  8.13MB/s ± 0%  -0.26%  (p=0.003 n=40+25)
JSONDecode-4             2.19MB/s ± 1%  2.19MB/s ± 1%    ~     (p=0.676 n=40+40)
GoParse-4                1.25MB/s ± 2%  1.25MB/s ± 2%    ~     (p=0.823 n=40+40)
RegexpMatchEasy0_32-4    25.1MB/s ± 1%  25.1MB/s ± 0%  -0.12%  (p=0.006 n=40+40)
RegexpMatchEasy0_1K-4     132MB/s ± 5%   132MB/s ± 5%    ~     (p=0.821 n=40+40)
RegexpMatchEasy1_32-4    24.7MB/s ± 1%  24.7MB/s ± 0%    ~     (p=0.630 n=40+40)
RegexpMatchEasy1_1K-4    99.1MB/s ± 3%  98.8MB/s ± 2%    ~     (p=0.268 n=40+40)
RegexpMatchMedium_32-4    487kB/s ± 2%   490kB/s ± 0%  +0.51%  (p=0.001 n=40+40)
RegexpMatchMedium_1K-4   1.92MB/s ± 1%  1.92MB/s ± 1%    ~     (p=0.208 n=39+40)
RegexpMatchHard_32-4     1.11MB/s ± 1%  1.11MB/s ± 0%  +0.36%  (p=0.000 n=40+33)
RegexpMatchHard_1K-4     1.18MB/s ± 1%  1.18MB/s ± 1%    ~     (p=0.207 n=40+37)
Revcomp-4                37.8MB/s ± 1%  37.8MB/s ± 2%    ~     (p=0.276 n=38+39)
Template-4               1.82MB/s ± 1%  1.81MB/s ± 1%    ~     (p=0.122 n=38+40)
[Geo mean]               6.81MB/s       6.81MB/s       +0.06%

fixes #19843

Change-Id: Ief3a0c2b15f59d40c7b40f2784eeb71196685b59
Reviewed-on: https://go-review.googlesource.com/61150
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 13:10:33 +00:00
Ben Shi 64607dbd26 cmd/compile: optimize ARM with MULS
MULS was introduced in ARMv7 and is the counterpart of MULA. This patch
duplicates all MULA-related SSA rules for MULS.
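
To make the MULA/MULS pairing concrete, the sketch below shows the two
integer shapes side by side. The function names are hypothetical, and the
assumption that these exact expressions map to one instruction each is
illustrative only; the SSA rules themselves are not shown in this message.

```go
package main

import "fmt"

// mulAddInt is the multiply-accumulate shape: a + b*c.
func mulAddInt(a, b, c int32) int32 { return a + b*c }

// mulSubInt is the multiply-subtract shape: a - b*c.
func mulSubInt(a, b, c int32) int32 { return a - b*c }

func main() {
	fmt.Println(mulAddInt(20, 3, 4), mulSubInt(20, 3, 4))
}
```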

Here are the comparison results against the original Go compiler.
There was no improvement overall, but a big improvement in special cases.

1. A specific test case sped up by 18.62%.
(https://github.com/benshi001/ugo1/blob/master/mulsub_test.go)
name                     old time/op    new time/op    delta
MulSub-4                    270µs ± 0%     219µs ± 0%  -18.62%  (p=0.000 n=35+40)

2. Total size of all .a files in pkg/ shrank by 0.002%.

3. The compilecmp benchmark showed no decline.
name        old time/op       new time/op       delta
Template          2.37s ± 3%        2.36s ± 1%    ~     (p=0.233 n=19+18)
Unicode           1.32s ± 2%        1.34s ± 5%  +1.32%  (p=0.011 n=20+18)
GoTypes           7.88s ± 1%        7.87s ± 1%    ~     (p=0.758 n=20+20)
Compiler          37.5s ± 1%        37.6s ± 1%    ~     (p=0.194 n=20+19)
SSA               83.7s ± 2%        83.5s ± 2%    ~     (p=0.569 n=20+19)
Flate             1.46s ± 3%        1.45s ± 1%    ~     (p=0.619 n=20+17)
GoParser          1.87s ± 2%        1.85s ± 1%  -0.58%  (p=0.048 n=20+18)
Reflect           5.10s ± 2%        5.11s ± 2%    ~     (p=0.365 n=19+20)
Tar               1.78s ± 2%        1.78s ± 2%    ~     (p=0.531 n=19+20)
XML               2.62s ± 1%        2.61s ± 2%    ~     (p=0.057 n=17+19)
[Geo mean]        4.68s             4.67s       -0.07%

name        old user-time/op  new user-time/op  delta
Template          2.80s ± 1%        2.79s ± 2%    ~     (p=0.686 n=17+20)
Unicode           1.61s ± 4%        1.63s ± 6%    ~     (p=0.222 n=20+20)
GoTypes           9.59s ± 1%        9.60s ± 1%    ~     (p=0.482 n=17+20)
Compiler          46.1s ± 1%        46.2s ± 1%    ~     (p=0.373 n=20+18)
SSA                108s ± 1%         108s ± 2%    ~     (p=0.784 n=20+20)
Flate             1.68s ± 3%        1.69s ± 3%    ~     (p=0.335 n=20+19)
GoParser          2.20s ± 4%        2.19s ± 2%    ~     (p=0.844 n=20+18)
Reflect           5.97s ± 3%        6.01s ± 2%    ~     (p=0.184 n=20+20)
Tar               2.11s ± 2%        2.11s ± 4%    ~     (p=0.961 n=19+20)
XML               3.07s ± 1%        3.07s ± 3%    ~     (p=0.786 n=16+19)
[Geo mean]        5.61s             5.62s       +0.19%

name        old text-bytes    new text-bytes    delta
HelloSize         586kB ± 0%        586kB ± 0%    ~     (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize        72.9kB ± 0%       72.9kB ± 0%    ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)

4. The go1 benchmark showed no decline in total.
name                     old time/op    new time/op    delta
BinaryTree17-4              41.7s ± 1%     41.7s ± 1%    ~     (p=0.966 n=40+40)
Fannkuch11-4                23.6s ± 0%     23.6s ± 1%  -0.23%  (p=0.000 n=40+40)
FmtFprintfEmpty-4           844ns ± 1%     834ns ± 1%  -1.23%  (p=0.000 n=40+40)
FmtFprintfString-4         1.39µs ± 1%    1.40µs ± 1%  +0.71%  (p=0.000 n=40+40)
FmtFprintfInt-4            1.44µs ± 1%    1.45µs ± 1%  +0.70%  (p=0.000 n=40+40)
FmtFprintfIntInt-4         2.10µs ± 1%    2.10µs ± 1%  +0.30%  (p=0.000 n=40+40)
FmtFprintfPrefixedInt-4    2.49µs ± 0%    2.50µs ± 1%  +0.66%  (p=0.000 n=32+40)
FmtFprintfFloat-4          4.42µs ± 1%    4.46µs ± 2%  +0.94%  (p=0.000 n=40+40)
FmtManyArgs-4              8.31µs ± 1%    8.22µs ± 1%  -1.09%  (p=0.000 n=40+40)
GobDecode-4                 105ms ± 1%     102ms ± 1%  -2.30%  (p=0.000 n=39+39)
GobEncode-4                90.2ms ± 1%    88.7ms ± 1%  -1.66%  (p=0.000 n=40+39)
Gzip-4                      4.17s ± 1%     4.16s ± 1%    ~     (p=0.785 n=40+40)
Gunzip-4                    608ms ± 1%     608ms ± 1%    ~     (p=0.481 n=40+40)
HTTPClientServer-4          697µs ± 2%     684µs ± 3%  -1.89%  (p=0.000 n=37+40)
JSONEncode-4                255ms ± 1%     256ms ± 1%  +0.35%  (p=0.000 n=40+40)
JSONDecode-4                920ms ± 1%     926ms ± 1%  +0.64%  (p=0.000 n=40+39)
Mandelbrot200-4            49.3ms ± 1%    49.3ms ± 0%  +0.07%  (p=0.005 n=40+40)
GoParse-4                  46.8ms ± 2%    46.7ms ± 1%    ~     (p=1.000 n=40+40)
RegexpMatchEasy0_32-4      1.27µs ± 0%    1.27µs ± 1%    ~     (p=0.057 n=40+40)
RegexpMatchEasy0_1K-4      7.97µs ± 7%    7.92µs ± 5%    ~     (p=0.094 n=40+40)
RegexpMatchEasy1_32-4      1.28µs ± 1%    1.28µs ± 1%    ~     (p=0.406 n=40+40)
RegexpMatchEasy1_1K-4      10.5µs ± 4%    10.5µs ± 3%    ~     (p=0.855 n=40+40)
RegexpMatchMedium_32-4     2.04µs ± 0%    2.04µs ± 1%  -0.22%  (p=0.000 n=39+40)
RegexpMatchMedium_1K-4      541µs ± 0%     540µs ± 1%  -0.25%  (p=0.000 n=40+38)
RegexpMatchHard_32-4       29.3µs ± 1%    29.3µs ± 0%    ~     (p=0.149 n=40+40)
RegexpMatchHard_1K-4        878µs ± 1%     880µs ± 0%  +0.14%  (p=0.005 n=36+35)
Revcomp-4                  81.8ms ± 2%    81.4ms ± 2%  -0.43%  (p=0.015 n=38+39)
Template-4                  1.05s ± 1%     1.05s ± 1%    ~     (p=0.302 n=40+35)
TimeParse-4                7.18µs ± 1%    7.26µs ± 1%  +1.05%  (p=0.000 n=40+36)
TimeFormat-4               13.1µs ± 1%    13.1µs ± 1%    ~     (p=0.698 n=37+40)
[Geo mean]                  733µs          732µs       -0.16%

name                     old speed      new speed      delta
GobDecode-4              7.34MB/s ± 1%  7.51MB/s ± 1%  +2.36%  (p=0.000 n=39+39)
GobEncode-4              8.51MB/s ± 1%  8.65MB/s ± 1%  +1.69%  (p=0.000 n=40+39)
Gzip-4                   4.66MB/s ± 1%  4.66MB/s ± 1%    ~     (p=0.783 n=40+40)
Gunzip-4                 31.9MB/s ± 1%  31.9MB/s ± 1%    ~     (p=0.466 n=40+40)
JSONEncode-4             7.61MB/s ± 1%  7.58MB/s ± 1%  -0.35%  (p=0.001 n=40+40)
JSONDecode-4             2.11MB/s ± 1%  2.10MB/s ± 1%  -0.52%  (p=0.000 n=38+39)
GoParse-4                1.24MB/s ± 2%  1.24MB/s ± 1%    ~     (p=0.556 n=40+39)
RegexpMatchEasy0_32-4    25.1MB/s ± 0%  25.1MB/s ± 1%    ~     (p=0.064 n=40+40)
RegexpMatchEasy0_1K-4     129MB/s ± 8%   129MB/s ± 5%    ~     (p=0.094 n=40+40)
RegexpMatchEasy1_32-4    25.0MB/s ± 1%  25.1MB/s ± 1%    ~     (p=0.331 n=40+40)
RegexpMatchEasy1_1K-4    97.7MB/s ± 4%  97.8MB/s ± 3%    ~     (p=0.851 n=40+40)
RegexpMatchMedium_32-4    490kB/s ± 0%   490kB/s ± 0%    ~     (all equal)
RegexpMatchMedium_1K-4   1.89MB/s ± 0%  1.90MB/s ± 1%  +0.12%  (p=0.031 n=40+40)
RegexpMatchHard_32-4     1.09MB/s ± 1%  1.09MB/s ± 1%    ~     (p=0.597 n=40+40)
RegexpMatchHard_1K-4     1.16MB/s ± 1%  1.16MB/s ± 1%    ~     (p=0.565 n=40+35)
Revcomp-4                31.1MB/s ± 2%  31.2MB/s ± 2%  +0.44%  (p=0.018 n=38+39)
Template-4               1.85MB/s ± 1%  1.85MB/s ± 1%    ~     (p=0.873 n=40+40)
[Geo mean]               6.66MB/s       6.67MB/s       +0.26%

Change-Id: Icc972d8a78ea06c32c3aa15733ff0537c82c2dc7
Reviewed-on: https://go-review.googlesource.com/58950
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
2017-08-30 13:45:08 +00:00
Ben Shi a2f22a6803 cmd/compile: optimize ARM with more efficient MOVB/MOVBU/MOVH/MOVHU
Like the indexed MOVW (MOVWloadidx/MOVWstoreidx) used in the current
ARM backend, the indexed MOVB/MOVBU/MOVH/MOVHU can also be used to
generate further optimized ARM code.

This patch implements that optimization. Below are test results
compared against the original Go compiler.

1. The total size of all .a files in pkg/ shrinks by 0.03%.

2. The compilecmp benchmark shows a slight slowdown.
name        old time/op       new time/op       delta
Template          2.35s ± 1%        2.37s ± 3%  +0.94%  (p=0.006 n=19+19)
Unicode           1.33s ± 3%        1.33s ± 2%    ~     (p=0.158 n=20+18)
GoTypes           7.86s ± 2%        7.84s ± 1%    ~     (p=0.284 n=19+18)
Compiler          37.5s ± 1%        37.7s ± 2%    ~     (p=0.101 n=20+19)
SSA               83.4s ± 2%        83.6s ± 2%    ~     (p=0.231 n=20+20)
Flate             1.46s ± 2%        1.45s ± 1%    ~     (p=0.097 n=20+17)
GoParser          1.86s ± 2%        1.86s ± 4%    ~     (p=0.738 n=20+20)
Reflect           5.10s ± 1%        5.11s ± 1%    ~     (p=0.290 n=20+18)
Tar               1.78s ± 2%        1.77s ± 2%    ~     (p=0.166 n=19+20)
XML               2.61s ± 2%        2.61s ± 2%    ~     (p=0.665 n=19+19)
[Geo mean]        4.67s             4.68s       +0.16%

name        old user-time/op  new user-time/op  delta
Template          2.79s ± 3%        2.80s ± 2%    ~     (p=0.662 n=20+20)
Unicode           1.62s ± 3%        1.64s ± 4%    ~     (p=0.252 n=20+20)
GoTypes           9.58s ± 2%        9.62s ± 2%    ~     (p=0.250 n=20+20)
Compiler          46.2s ± 1%        46.2s ± 1%    ~     (p=0.602 n=20+19)
SSA                108s ± 1%         108s ± 2%    ~     (p=0.242 n=18+20)
Flate             1.69s ± 3%        1.69s ± 4%    ~     (p=0.470 n=20+20)
GoParser          2.16s ± 3%        2.20s ± 4%  +1.70%  (p=0.005 n=19+20)
Reflect           6.02s ± 2%        6.02s ± 2%    ~     (p=0.700 n=20+17)
Tar               2.11s ± 2%        2.11s ± 3%    ~     (p=0.480 n=18+20)
XML               3.07s ± 2%        3.11s ± 4%  +1.50%  (p=0.043 n=20+20)
[Geo mean]        5.61s             5.64s       +0.55%

name        old text-bytes    new text-bytes    delta
HelloSize         586kB ± 0%        586kB ± 0%    ~     (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize        5.46kB ± 0%       5.46kB ± 0%    ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize        72.9kB ± 0%       72.9kB ± 0%    ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.03MB ± 0%       1.03MB ± 0%    ~     (all equal)

3. The go1 benchmark shows an overall improvement, including an
improvement of more than 10% in the Revcomp test case.
name                     old time/op    new time/op    delta
BinaryTree17-4              42.0s ± 1%     41.5s ± 1%   -1.32%  (p=0.000 n=39+40)
Fannkuch11-4                24.1s ± 1%     23.6s ± 0%   -2.38%  (p=0.000 n=40+40)
FmtFprintfEmpty-4           843ns ± 0%     839ns ± 1%   -0.46%  (p=0.000 n=33+40)
FmtFprintfString-4         1.44µs ± 1%    1.37µs ± 1%   -5.48%  (p=0.000 n=40+35)
FmtFprintfInt-4            1.44µs ± 1%    1.41µs ± 2%   -1.50%  (p=0.000 n=40+40)
FmtFprintfIntInt-4         2.07µs ± 1%    2.06µs ± 0%   -0.78%  (p=0.000 n=40+40)
FmtFprintfPrefixedInt-4    2.50µs ± 1%    2.33µs ± 1%   -6.85%  (p=0.000 n=40+40)
FmtFprintfFloat-4          4.36µs ± 1%    4.34µs ± 0%   -0.39%  (p=0.017 n=40+40)
FmtManyArgs-4              8.11µs ± 0%    8.00µs ± 0%   -1.37%  (p=0.000 n=40+40)
GobDecode-4                 105ms ± 2%     103ms ± 2%   -2.17%  (p=0.000 n=39+39)
GobEncode-4                90.1ms ± 2%    88.6ms ± 1%   -1.67%  (p=0.000 n=40+39)
Gzip-4                      4.18s ± 1%     4.09s ± 1%   -2.03%  (p=0.000 n=40+40)
Gunzip-4                    608ms ± 1%     603ms ± 1%   -0.86%  (p=0.000 n=40+34)
HTTPClientServer-4          674µs ± 3%     661µs ± 2%   -1.82%  (p=0.000 n=40+39)
JSONEncode-4                256ms ± 1%     243ms ± 0%   -5.11%  (p=0.000 n=39+31)
JSONDecode-4                915ms ± 1%     904ms ± 1%   -1.18%  (p=0.000 n=40+36)
Mandelbrot200-4            49.2ms ± 0%    49.3ms ± 0%     ~     (p=0.254 n=34+40)
GoParse-4                  46.9ms ± 2%    46.9ms ± 1%     ~     (p=0.737 n=40+39)
RegexpMatchEasy0_32-4      1.28µs ± 1%    1.27µs ± 1%   -0.71%  (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4      7.86µs ± 4%    7.67µs ± 4%   -2.46%  (p=0.000 n=38+40)
RegexpMatchEasy1_32-4      1.28µs ± 1%    1.28µs ± 1%   -0.54%  (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4      10.4µs ± 2%    10.3µs ± 2%   -0.88%  (p=0.003 n=40+39)
RegexpMatchMedium_32-4     2.05µs ± 0%    2.04µs ± 0%   -0.34%  (p=0.000 n=40+33)
RegexpMatchMedium_1K-4      541µs ± 1%     535µs ± 1%   -1.02%  (p=0.000 n=40+38)
RegexpMatchHard_32-4       29.3µs ± 1%    29.1µs ± 1%   -0.51%  (p=0.000 n=40+40)
RegexpMatchHard_1K-4        881µs ± 1%     871µs ± 1%   -1.15%  (p=0.000 n=40+40)
Revcomp-4                  81.7ms ± 2%    67.5ms ± 2%  -17.37%  (p=0.000 n=39+39)
Template-4                  1.05s ± 1%     1.08s ± 2%   +3.67%  (p=0.000 n=40+40)
TimeParse-4                7.24µs ± 1%    7.09µs ± 1%   -2.13%  (p=0.000 n=40+40)
TimeFormat-4               13.2µs ± 1%    13.1µs ± 0%   -0.31%  (p=0.007 n=40+31)
[Geo mean]                  733µs          718µs        -2.03%

name                     old speed      new speed      delta
GobDecode-4              7.28MB/s ± 2%  7.44MB/s ± 2%   +2.23%  (p=0.000 n=39+39)
GobEncode-4              8.52MB/s ± 2%  8.67MB/s ± 1%   +1.70%  (p=0.000 n=40+39)
Gzip-4                   4.65MB/s ± 1%  4.74MB/s ± 1%   +1.94%  (p=0.000 n=37+40)
Gunzip-4                 31.9MB/s ± 1%  32.2MB/s ± 1%   +0.90%  (p=0.000 n=40+36)
JSONEncode-4             7.57MB/s ± 1%  7.98MB/s ± 0%   +5.41%  (p=0.000 n=40+31)
JSONDecode-4             2.12MB/s ± 1%  2.15MB/s ± 1%   +1.23%  (p=0.000 n=40+40)
GoParse-4                1.23MB/s ± 1%  1.23MB/s ± 1%     ~     (p=0.769 n=39+40)
RegexpMatchEasy0_32-4    25.0MB/s ± 1%  25.2MB/s ± 1%   +0.71%  (p=0.000 n=40+40)
RegexpMatchEasy0_1K-4     130MB/s ± 5%   134MB/s ± 4%   +2.53%  (p=0.000 n=38+40)
RegexpMatchEasy1_32-4    24.9MB/s ± 1%  25.1MB/s ± 1%   +0.55%  (p=0.000 n=40+40)
RegexpMatchEasy1_1K-4    98.5MB/s ± 2%  99.4MB/s ± 2%   +0.88%  (p=0.003 n=40+39)
RegexpMatchMedium_32-4    490kB/s ± 0%   490kB/s ± 0%     ~     (all equal)
RegexpMatchMedium_1K-4   1.89MB/s ± 1%  1.91MB/s ± 1%   +1.02%  (p=0.000 n=40+38)
RegexpMatchHard_32-4     1.10MB/s ± 1%  1.10MB/s ± 0%   +0.41%  (p=0.000 n=40+33)
RegexpMatchHard_1K-4     1.16MB/s ± 1%  1.17MB/s ± 1%   +1.21%  (p=0.000 n=40+40)
Revcomp-4                31.1MB/s ± 2%  37.6MB/s ± 2%  +21.03%  (p=0.000 n=39+39)
Template-4               1.86MB/s ± 1%  1.79MB/s ± 1%   -3.51%  (p=0.000 n=40+38)
[Geo mean]               6.66MB/s       6.80MB/s        +2.13%

fixes #21492

Change-Id: Ia26e7ca393f0a5f31de240e8ff9a220453ca7e0d
Reviewed-on: https://go-review.googlesource.com/58450
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-08-28 16:10:27 +00:00
Keith Randall 1e72bf6218 cmd/compile: experiment which clobbers all dead pointer fields
The experiment "clobberdead" clobbers all pointer fields that the
compiler thinks are dead, just before and after every safepoint.
Useful for debugging the generation of live pointer bitmaps.

Helped find the following issues:
Update #15936
Update #16026
Update #16095
Update #18860

Change-Id: Id1d12f86845e3d93bae903d968b1eac61fc461f9
Reviewed-on: https://go-review.googlesource.com/23924
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-21 20:19:50 +00:00
Josh Bleecher Snyder 01b1a34aac cmd/compile: rework handling of udiv on ARM
Instead of populating the aux symbol
of CALLudiv during rewrite rules,
populate it during genssa.

This simplifies the rewrite rules.
It also removes all remaining calls
to ctxt.Lookup from any rewrite rules.
This is a first step towards removing
ctxt from ssa.Cache entirely,
and also a first step towards converting
the obj.LSym.Version field into a boolean.
It should also speed up compilation.

Also, move func udiv into package runtime.
That's where it is anyway,
and it lets udiv look and act like the rest of
the runtime support functions.

Change-Id: I41462a632c14fdc41f61b08049ec13cd80a87bfe
Reviewed-on: https://go-review.googlesource.com/41191
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-20 16:27:38 +00:00
Ben Shi 8577f81a10 cmd/compile/internal: Optimization with RBIT and REV
By checking GOARM in ssa/gen/ARM.rules, each intermediate operator
can be implemented via different instruction sequences.

It is up to the user to choose between compatibility and efficiency.

The Bswap32(x) is optimized to REV(x) when GOARM >= 6.
The CTZ(x) is optimized to CLZ(RBIT x) when GOARM == 7.
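
The two identities behind these rewrites can be checked in portable Go.
This is only a sketch using the standard math/bits package; the helper
names ctzViaRbit and bswap32 are hypothetical and do not appear in the
compiler:

```go
package main

import (
	"fmt"
	"math/bits"
)

// ctzViaRbit counts trailing zeros by reversing the bit order and then
// counting leading zeros, mirroring the CLZ(RBIT x) form named above.
func ctzViaRbit(x uint32) int {
	return bits.LeadingZeros32(bits.Reverse32(x))
}

// bswap32 reverses the byte order of x, the operation that the commit
// maps to a single REV instruction when GOARM >= 6.
func bswap32(x uint32) uint32 {
	return bits.ReverseBytes32(x)
}

func main() {
	// Compare the RBIT/CLZ formulation with the library's direct count.
	for _, x := range []uint32{0, 1, 8, 0x00f0, 0x80000000} {
		fmt.Println(x, ctzViaRbit(x), bits.TrailingZeros32(x))
	}
	fmt.Printf("%#x\n", bswap32(0x11223344))
}
```

Both formulations agree for x == 0 as well, where each yields 32.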

Change-Id: Ie9ee645fa39333fa79ad84ed4d1cefac30422814
Reviewed-on: https://go-review.googlesource.com/35610
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-31 15:10:24 +00:00
Matthew Dempsky 691755304c cmd/compile/internal/ssa: populate SymEffects for SSA Ops
Changes to ${GOARCH}Ops.go files were mechanically produced using
github.com/mdempsky/ssa-symops, a one-off tool that inserts
"SymEffect: X" elements by pattern matching against the Op names.

Change-Id: Ibf3e481ffd588647f2a31662d72114b740ccbfcf
Reviewed-on: https://go-review.googlesource.com/38084
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-14 18:34:45 +00:00
Matthew Dempsky cc71aa9ac4 cmd/compile/internal/ssa: make ARM's udiv like other calls
Passes toolstash-check -all.

Change-Id: Id389f8158cf33a3c0fcef373615b5351e7c74b5b
Reviewed-on: https://go-review.googlesource.com/38082
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-13 21:29:02 +00:00
Matthew Dempsky 08d8d5c986 cmd/compile/internal/ssa: replace {Defer,Go}Call with StaticCall
Passes toolstash-check -all.

Change-Id: Icf8b75364e4761a5e56567f503b2c1cb17382ed2
Reviewed-on: https://go-review.googlesource.com/38080
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-13 19:44:36 +00:00
Cherry Zhang 6fd5e2549a cmd/compile: mark MOVWF/MOVFW clobbering F15 on ARM
The assembler back end uses F15 as a temporary register in these
instructions.

Checked the assembler back end and made sure that this is the
only case clobbering F15.

Fixes #19403.

Change-Id: I02b9e00fdd9229db899f501c8e9b306e02912d83
Reviewed-on: https://go-review.googlesource.com/37792
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-05 18:31:27 +00:00
shawnps 067bab00a8 all: fix misspellings
Change-Id: I429637ca91f7db4144f17621de851a548dc1ce76
Reviewed-on: https://go-review.googlesource.com/34923
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-07 16:53:25 +00:00
Cherry Zhang f9238a76ff cmd/compile: make LR allocatable in non-leaf functions on ARM
The mechanism was initially introduced (and reviewed) in CL 30597
on S390X.

Reduce number of "spilled value remains" by 0.4% in cmd/go.

Disabled on ARMv5 because LR is clobbered almost everywhere with
inserted softfloat calls.

Change-Id: I2934737ce2455909647ed2118fe2bd6f0aa5ac52
Reviewed-on: https://go-review.googlesource.com/32178
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-10-28 14:25:33 +00:00
Michael Munday 15817e409b cmd/compile: make link register allocatable in non-leaf functions
We save and restore the link register in non-leaf functions because
it is clobbered by CALLs. It is therefore available for general
purpose use.

Only enabled on s390x currently. The RC4 benchmarks in particular
benefit from the extra register:

name     old speed     new speed     delta
RC4_128  243MB/s ± 2%  341MB/s ± 2%  +40.46%  (p=0.008 n=5+5)
RC4_1K   267MB/s ± 0%  359MB/s ± 1%  +34.32%  (p=0.008 n=5+5)
RC4_8K   271MB/s ± 0%  362MB/s ± 0%  +33.61%  (p=0.008 n=5+5)

Change-Id: Id23bff95e771da9425353da2f32668b8e34ba09f
Reviewed-on: https://go-review.googlesource.com/30597
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-11 18:52:35 +00:00
Keith Randall 98938189a1 cmd/compile: remove duplicate nilchecks
Mark nil check operations as faulting if their arg is zero.
This lets the late nilcheck pass remove duplicates.

Fixes #17242.

Change-Id: I4c9938d8a5a1e43edd85b4a66f0b34004860bcd9
Reviewed-on: https://go-review.googlesource.com/29952
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-09-27 23:54:01 +00:00
Cherry Zhang 38cd79889e cmd/compile: simplify div/mod on ARM
On ARM, DIV, DIVU, MOD, MODU are pseudo instructions that make
runtime calls to _div/_udiv/_mod/_umod, which themselves are wrappers
of udiv. The udiv function does the real work.

Instead of generating these pseudo instructions, call udiv
directly. This removes one layer of wrappers (which has an awkward
way of passing arguments), and also allows combining DIV and MOD
if both results are needed.
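
As a rough illustration of why one unsigned divide can serve both DIV
and MOD, the sketch below derives the signed quotient and remainder
from a single call. The Go functions udiv, magnitude and divmod are
hypothetical stand-ins written for this example, not the runtime's
actual routine or calling convention:

```go
package main

import "fmt"

// udiv stands in for the runtime helper: one unsigned division that
// yields both quotient and remainder. It panics if d == 0, as any Go
// integer division does.
func udiv(n, d uint32) (q, r uint32) {
	return n / d, n % d
}

// magnitude returns |x| as an unsigned value; it is also correct for the
// minimum int32, whose magnitude does not fit in int32.
func magnitude(x int32) uint32 {
	if x < 0 {
		return uint32(-x)
	}
	return uint32(x)
}

// divmod computes a/b and a%b with Go's truncated-division semantics
// using a single call to udiv.
func divmod(a, b int32) (q, r int32) {
	uq, ur := udiv(magnitude(a), magnitude(b))
	q, r = int32(uq), int32(ur)
	if (a < 0) != (b < 0) {
		q = -q // quotient is negative when the operand signs differ
	}
	if a < 0 {
		r = -r // remainder takes the sign of the dividend
	}
	return q, r
}

func main() {
	fmt.Println(divmod(-7, 2))
	fmt.Println(divmod(7, -2))
}
```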

Change-Id: I118afc3986db3a1daabb5c1e6e57430888c91817
Reviewed-on: https://go-review.googlesource.com/29390
Reviewed-by: David Chase <drchase@google.com>
2016-09-20 13:40:48 +00:00
Keith Randall 3134ab3c2d cmd/compile: redo nil checks
Get rid of BlockCheck. Josh goaded me into it, and I went
down a rabbithole making it happen.

NilCheck now panics if the pointer is nil and returns void, as before.
BlockCheck is gone, and NilCheck is no longer a Control value for
any block. It just exists (and deadcode knows not to throw it away).

I rewrote the nilcheckelim pass to handle this case.  In particular,
there can now be multiple NilCheck ops per block.

I moved all of the arch-dependent nil check elimination done as
part of ssaGenValue into its own proper pass, so we don't have to
duplicate that code for every architecture.

Making the arch-dependent nil check its own pass means I needed
to add a bunch of flags to the opcode table so I could write
the code without arch-dependent ops everywhere.

Change-Id: I419f891ac9b0de313033ff09115c374163416a9f
Reviewed-on: https://go-review.googlesource.com/29120
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-09-15 02:42:13 +00:00
Keith Randall c345a3913f cmd/compile: get rid of BlockCall
No need for it, we can treat calls as (mostly) normal values
that take a memory and return a memory.

Lowers the number of basic blocks needed to represent a function.
"go test -c net/http" uses 27% fewer basic blocks.
Probably doesn't affect generated code much, but should help
various passes whose running time and/or space depends on
the number of basic blocks.

Fixes #15631

Change-Id: I0bf21e123f835e2cfa382753955a4f8bce03dfa6
Reviewed-on: https://go-review.googlesource.com/28950
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-09-12 23:27:02 +00:00
Cherry Zhang 8ff4260777 cmd/compile: intrinsify Ctz, Bswap on ARM
Atomic ops on ARM are implemented with kernel calls, so they are
not intrinsified.

Change-Id: I0e7cc2e5526ae1a3d24b4b89be1bd13db071f8ef
Reviewed-on: https://go-review.googlesource.com/28977
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-09-12 19:26:31 +00:00
Keith Randall 320ddcf834 cmd/compile: inline atomics from runtime/internal/atomic on amd64
Inline atomic reads and writes on amd64.  There's no reason
to pay the overhead of a call for these.

To keep atomic loads from being reordered, we make them
return a <value,memory> tuple.

Change the meaning of resultInArg0 for tuple-generating ops
to mean the first part of the result tuple, not the second.
This means we can always put the store part of the tuple last,
matching how arguments are laid out.  This requires reordering
the outputs of add32carry and sub32carry and their descendents
in various architectures.

benchmark                    old ns/op     new ns/op     delta
BenchmarkAtomicLoad64-8      2.09          0.26          -87.56%
BenchmarkAtomicStore64-8     7.54          5.72          -24.14%

TBD (in a different CL): Cas, Or8, ...

Change-Id: I713ea88e7da3026c44ea5bdb56ed094b20bc5207
Reviewed-on: https://go-review.googlesource.com/27641
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-08-25 20:09:04 +00:00
Cherry Zhang 0484052358 [dev.ssa] cmd/compile: remove flags from regMask
Reg allocator skips flag-typed values. Flag allocator uses the type
and whether the op has "clobberFlags" set.

Tested on AMD64, ARM, ARM64, 386. Passed 'toolstash -cmp' on AMD64.
PPC64 is coded blindly.

Change-Id: Ib1cc27efecef6a1bb27f7d7ed035a582660d244f
Reviewed-on: https://go-review.googlesource.com/25480
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-08-07 03:08:03 +00:00
Cherry Zhang 114c05962c [dev.ssa] cmd/compile: fix possible invalid pointer spill in large Zero/Move on ARM
Instead of comparing against the address of the end of the memory to
zero/copy, compare against the address of the last element, which is a
valid pointer.
Also unify large and unaligned Zero/Move, by passing alignment as AuxInt.
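
The loop shape can be sketched in plain Go, with slice indices standing
in for addresses. The function zeroWords is hypothetical and only
illustrates the termination test; it is not the code the compiler
emits:

```go
package main

import "fmt"

// zeroWords clears buf with a loop that terminates by comparing the
// current position against the position of the last element, so no
// one-past-the-end position is ever formed or kept around.
func zeroWords(buf []uint32) {
	if len(buf) == 0 {
		return // nothing to clear, so there is no last element
	}
	last := len(buf) - 1
	for i := 0; ; i++ {
		buf[i] = 0
		if i == last {
			break
		}
	}
}

func main() {
	buf := []uint32{1, 2, 3, 4}
	zeroWords(buf)
	fmt.Println(buf)
}
```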

Fixes #16515 for ARM.

Change-Id: I19a62b31c5acf5c55c16a89bea1039c926dc91e5
Reviewed-on: https://go-review.googlesource.com/25300
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-07-27 18:00:19 +00:00
Cherry Zhang d8181d5d75 [dev.ssa] cmd/compile: simplify MOVWreg on ARM
For register-register move, if there is only one use, allocate it in
the same register so we don't need to emit an instruction.

Updates #15365.

Change-Id: Iad41843854a506c521d577ad93fcbe73e8de8065
Reviewed-on: https://go-review.googlesource.com/25059
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-07-21 16:46:58 +00:00
Cherry Zhang 7b9873b9b9 [dev.ssa] cmd/internal/obj, etc.: add and use NEGF, NEGD instructions on ARM
Updates #15365.

Change-Id: I372a5617c2c7d91de545cac0464809b96711b63a
Reviewed-on: https://go-review.googlesource.com/24646
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2016-07-20 18:15:37 +00:00
Keith Randall 25e0a367da [dev.ssa] cmd/compile: clean up tuple types and selects
Make tuple types and their SelectX ops fully generic.
These ops no longer need to be lowered.
Regalloc understands them and their tuple-generating arguments.
We can now have opcodes returning arbitrary pairs of results.
(And it would be easy to move to >2 results if needed.)

Update arm implementation to the new standard.
Implement just enough in 386 port to do 64-bit add.

Change-Id: I370ed5aacce219c82e1954c61d1f63af76c16f79
Reviewed-on: https://go-review.googlesource.com/24976
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-07-18 16:11:36 +00:00
Cherry Zhang 7d70f84f54 [dev.ssa] cmd/compile: add floating point optimizations in SSA for ARM
Add some simplification rules for floating point ops.

cmd/internal/obj/arm supports instructions that compare an FP register
to 0, but the runtime softfloat simulator does not. This CL adds these
instructions to the softfloat simulator as well.

Updates #15365.

Change-Id: I29405b2bfcb4c8cf106cb7a1a811409fec91b170
Reviewed-on: https://go-review.googlesource.com/24790
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-07-16 03:13:22 +00:00
Cherry Zhang 8cc3f4a17e [dev.ssa] cmd/compile: use shifted and indexed ops in SSA for ARM
This CL implements the following optimizations for ARM:
- use shifted ops (e.g. ADD R1<<2, R2) and indexed load/stores
- break up shift ops. Shifts used to be one SSA op that generates
  multiple instructions. We break them up into multiple ops, which
  allows constant folding and CSE for comparisons. Conditional moves
  are introduced for this.
- simplify zero/sign-extension ops.
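
One way to picture the "break up shift ops" item is as a raw shift
followed by a separate compare and conditional move. The sketch below
is an assumption-laden model, not the actual rewrite rules: hwShiftLeft
assumes a register shift that consumes only the low byte of the count
(as ARM register-specified shifts do), and the names hwShiftLeft and
shiftLeft as well as the threshold 256 are chosen for this example:

```go
package main

import "fmt"

// hwShiftLeft models a machine shift that looks only at the low 8 bits
// of the count: counts 32..255 give 0, and larger counts wrap around.
// This hardware model is an assumption made for the sketch.
func hwShiftLeft(x, s uint32) uint32 {
	n := s & 0xff
	if n >= 32 {
		return 0
	}
	return x << n
}

// shiftLeft reproduces Go's x << s for any count using three separate
// steps: the raw shift, a comparison of the count, and a conditional
// move of zero. Keeping the comparison as its own step is what lets it
// be constant folded or shared between several shifts.
func shiftLeft(x, s uint32) uint32 {
	r := hwShiftLeft(x, s) // step 1: raw machine shift
	tooBig := s >= 256     // step 2: comparison on the count
	if tooBig {            // step 3: conditional move of 0
		r = 0
	}
	return r
}

func main() {
	for _, s := range []uint32{0, 4, 31, 32, 255, 256, 260} {
		fmt.Println(s, shiftLeft(1, s), uint32(1)<<s)
	}
}
```

When the count is a known constant, the comparison folds away and only
the raw shift (or a constant zero) remains.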

Updates #15365.

Change-Id: I55e262a776a7ef2a1505d75e04d1208913c35d39
Reviewed-on: https://go-review.googlesource.com/24512
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-07-15 18:19:59 +00:00
Cherry Zhang 8599fdd9b6 [dev.ssa] cmd/compile: add some ARM optimization rewriting rules
Mostly constant folding rules, analogous to AMD64 ones. Along with
some simplifications.

Updates #15365.

Change-Id: If83bc1188bb05acb982ef3a1c21704c187e3eb24
Reviewed-on: https://go-review.googlesource.com/24210
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-07-06 15:55:29 +00:00