Commit Graph

138 Commits

Author SHA1 Message Date
Daniel Martí 5437b5a24b cmd/compile: disallow rewrite rules from declaring reserved names
If I change a rule in ARM64.rules to use the variable name "b" in a
conflicting way, rulegen would previously not complain, and the compiler
would later give a confusing error:

	$ go run *.go && go build cmd/compile/internal/ssa
	# cmd/compile/internal/ssa
	../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0)

Make rulegen complain early about those cases. Sometimes they might
happen to be harmless, but in general they can easily cause confusion or
unintended effects due to shadowing.

After the change, with the same conflicting rule:

	$ go run *.go && go build cmd/compile/internal/ssa
	2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b
	exit status 1
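
A minimal sketch of such an early check, written for illustration only. The function name `checkRuleVars` and the contents of the `reserved` set are assumptions; the message above only confirms that "b" and "typ" are reserved, and rulegen's real list and code may differ.

```go
package main

import "fmt"

// reserved holds names that the generated rewrite code is assumed to
// declare itself. Only "b" and "typ" are taken from the commit message;
// the real set in rulegen may be larger.
var reserved = map[string]bool{"b": true, "typ": true}

// checkRuleVars (hypothetical helper) returns an error naming the first
// variable in vars that collides with a reserved name.
func checkRuleVars(loc string, vars []string) error {
	for _, name := range vars {
		if reserved[name] {
			return fmt.Errorf("rule %s uses the reserved name %s", loc, name)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkRuleVars("ARM64.rules:495", []string{"x", "b"}))
	fmt.Println(checkRuleVars("ARM64.rules:496", []string{"x", "y"}))
}
```

Failing at generation time keeps the error next to the offending rule instead of surfacing later as a type error in generated code.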

Note that 24 existing rules were using reserved names. It seems like the
shadowing was harmless, as it wasn't causing typechecking issues nor did
it seem to cause unintended behavior when the rule rewrite code ran.

The bool values "b" were renamed "t", since that seems to have a
precedent in other rules and in the fmt package.

Sequential values like "a b c" were renamed to "x y z", since "b" is
reserved.

Finally, "typ" was renamed to "_typ", since there doesn't seem to be an
obviously better answer.

Passes all three of:

	$ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std
	$ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std
	$ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std

Fixes #45154.

Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/303549
Trust: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 16:02:08 +00:00
fanzha02 2b50ab2aee cmd/compile: optimize single-precision floating point square root
Add a generic rule to rewrite the single-precision square root expression
into one single-precision instruction. The optimization removes the two
precision conversions between single precision and double precision.

On the arm64 platform:

previous:
  FCVTSD F0, F0
  FSQRTD F0, F0
  FCVTDS F0, F0

optimized:
  FSQRTS S0, S0
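
For context, this is the kind of source-level expression that presumably produces the three-instruction sequence above: a float32 value widened to float64 for math.Sqrt and narrowed again. The helper name `sqrt32` is made up for this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// sqrt32 (hypothetical helper) takes a single-precision square root by way
// of math.Sqrt, which operates on float64. The two conversions around the
// call are what the rewrite rule is described as removing.
func sqrt32(x float32) float32 {
	return float32(math.Sqrt(float64(x)))
}

func main() {
	fmt.Println(sqrt32(2.25)) // 1.5
}
```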

This patch also adds a test case to check correctness.

This patch refers to CL 241877, contributed by Alice Xu
(dianhong.xu@arm.com)

Change-Id: I6de5d02281c693017ac4bd4c10963dd55989bd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/276873
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-03-02 06:38:07 +00:00
Keith Randall d2911d7612 cmd/compile: fold MOV*nop and MOV*const
MOV*nop and MOV*reg seem superfluous. They are there to keep type
information around that would otherwise get thrown away. Not sure
what we need it for. I think our compiler needs a normalization of
how types are represented in SSA, especially after lowering.

MOV*nop gets in the way of some optimization rules firing, like for
load combining.

For now, just fold MOV*nop and MOV*const. It's certainly safe to
do that, as the type info on the MOV*const isn't ever useful.

R=go1.17

Change-Id: I3630a80afc2455a8e9cd9fde10c7abe05ddc3767
Reviewed-on: https://go-review.googlesource.com/c/go/+/276792
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2021-02-23 20:05:01 +00:00
David Chase 9a19481acb [dev.regabi] cmd/compile: make ordering for InvertFlags more stable
Currently many architectures use a rule along the lines of

// Canonicalize the order of arguments to comparisons - helps with CSE.
((CMP|CMPW) x y) && x.ID > y.ID => (InvertFlags ((CMP|CMPW) y x))

to normalize comparisons as much as possible for CSE.  Replace the
ID comparison with something less variable across compiler changes.
This helps avoid spurious failures in some of the codegen-comparison
tests (though the current choice of comparison is sensitive to Op
ordering).

Two tests changed to accommodate modified instruction choice.

Change-Id: Ib35f450bd2bae9d4f9f7838ceaf7ec682bcf1e1a
Reviewed-on: https://go-review.googlesource.com/c/go/+/280155
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-01-13 02:40:43 +00:00
Keith Randall 6c64b6db68 cmd/compile: don't constant fold divide by zero
It just makes the compiler crash. Oops.
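
As an illustration only, a constant folder can guard against this by refusing to fold when the divisor is zero. The function `foldDiv` below is a hypothetical stand-in, not the compiler's code.

```go
package main

import "fmt"

// foldDiv (hypothetical) folds the constant division c / d. It reports
// false, leaving the expression unfolded, when d is zero, so the division
// is left to fail at run time instead of crashing the folder.
func foldDiv(c, d int64) (int64, bool) {
	if d == 0 {
		return 0, false
	}
	return c / d, true
}

func main() {
	fmt.Println(foldDiv(7, 2)) // 3 true
	fmt.Println(foldDiv(7, 0)) // 0 false
}
```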

Fixes #43099

Change-Id: Id996c14799c1a5d0063ecae3b8770568161c2440
Reviewed-on: https://go-review.googlesource.com/c/go/+/276652
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-12-10 03:18:00 +00:00
Jason A. Donenfeld 4826abb6c2 cmd/compile: do not assume TST and TEQ set V on arm
These replacement rules assume that TST and TEQ set V. But TST and
TEQ do not set V. This is a problem because instructions like LT are
actually checking for N!=V. But with TST and TEQ not setting V, LT
doesn't do anything meaningful. It's possible to construct trivial
miscompilations from this, such as:

    package main

    var x = [4]int32{-0x7fffffff, 0x7fffffff, 2, 4}

    func main() {
        if x[0] > x[1] {
            panic("fail 1")
        }
        if x[2]&x[3] < 0 {
            panic("fail 2") // Fails here
        }
    }

That first comparison sets V, via the CMP that subtracts the values
causing the overflow. Then the second comparison operation thinks that
it uses the result of TST, when it actually uses the V from CMP.

Before this fix:

    TST             R0, R1
    BLT             loc_6C164

After this fix:

    TST             R0, R1
    BMI             loc_6C164

The BMI instruction checks the N flag, which TST sets.  This commit
fixes the issue by using [LG][TE]noov instead of vanilla [LG][TE], and
also adds a test case for the direct issue.
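
The stale-V problem can be reproduced with a small flag model. This is a sketch under simplifying assumptions: only N and V are modelled, and the type and function names are invented for illustration.

```go
package main

import "fmt"

// flags models the ARM N (negative) and V (overflow) condition flags.
type flags struct{ N, V bool }

// cmp models CMP x, y: N and V are set from the subtraction x - y.
func cmp(x, y int32) flags {
	r := x - y // wraps on overflow, like the hardware
	// Signed overflow: operands have different signs and the result's
	// sign differs from x's.
	v := (x < 0) != (y < 0) && (r < 0) != (x < 0)
	return flags{N: r < 0, V: v}
}

// tst models TST x, y: N is set from x & y, V keeps its previous value.
func tst(prev flags, x, y int32) flags {
	return flags{N: x&y < 0, V: prev.V}
}

func lt(f flags) bool { return f.N != f.V } // condition used by BLT
func mi(f flags) bool { return f.N }        // condition used by BMI

func main() {
	f := cmp(-0x7fffffff, 0x7fffffff) // overflows, so V is set
	f = tst(f, 2, 4)                  // 2&4 == 0: N clear, V still set
	fmt.Println(lt(f), mi(f))         // true false
}
```

In this model BLT is taken even though 2&4 is not negative, while BMI gives the right answer because it looks only at N.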

Fixes #42876.

Change-Id: I13c62c88d18574247ad002b671b38d2d0b0fc6fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/274026
Run-TryBot: Jason A. Donenfeld <Jason@zx2c4.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Jason A. Donenfeld <Jason@zx2c4.com>
2020-12-01 22:59:34 +00:00
Alberto Donizetti 1090f0986d cmd/compile: rename mergeSymTyped to mergeSym
Also make canMergeSym take Syms instead of interface{}

Change-Id: I4926a1fc586aa90e198249d67e5b520404b40869
Reviewed-on: https://go-review.googlesource.com/c/go/+/265817
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-10-28 19:19:18 +00:00
Alberto Donizetti 0eb52ac250 cmd/compile: port last ARM rules to typed
Passes

  GOARCH=arm gotip build -toolexec 'toolstash -cmp' -a std

Change-Id: I4a1cace82c5d957774ea20572406af276f02bf97
Reviewed-on: https://go-review.googlesource.com/c/go/+/264680
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-10-26 21:55:54 +00:00
Constantin Konstantinidis 9ec0a46a15 cmd/compile: enforce strongly typed rules for ARM (read)
Add type casting to offset.
L246-L247
L1473-L1475

toolstash-check successful.

Change-Id: I816c7556609ca6dd67bff8007c2d006cab89ee2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/257639
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
2020-10-24 13:20:21 +00:00
Constantin Konstantinidis 3ca3ca51d7 cmd/compile: enforce strongly typed rules for ARM (to32)
Type casting changed to 32 from 64.
L1055-L1056
L1193-L1194, L1197-L1198

toolstash-check successful.

Change-Id: Icdb9985673292294bc4549afaaa6cf4fcf92ffa8
Reviewed-on: https://go-review.googlesource.com/c/go/+/257640
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-10-20 22:59:31 +00:00
David Chase 694025e74f cmd/compile: avoid applying ARM CMP->CMN rewrite in unsigned context
Fixes #41780.

Change-Id: I1dc7c19a9f057650905da3a96214c2ff4abb51be
Reviewed-on: https://go-review.googlesource.com/c/go/+/259450
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-06 01:14:39 +00:00
Constantin Konstantinidis f5c3eda4c9 cmd/compile: enforce strongly typed rules for ARM (mergesym)
Replace mergeSym by mergeSymTyped.
L435-L459

toolstash-check successful.

Change-Id: Icbefe5c3589ed4ecdbca3dff9b3a758bdba3b34b
Reviewed-on: https://go-review.googlesource.com/c/go/+/257642
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
2020-09-26 16:04:46 +00:00
Constantin Konstantinidis aacbd7c3aa cmd/compile: enforce strongly typed rules for ARM (GOARM)
Toolstash-check successful for remaining rules using GOARM value.

Change-Id: I254f80d17839ef4957c1b7afbdb4db363a3b9367
Reviewed-on: https://go-review.googlesource.com/c/go/+/240997
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Giovanni Bajo <rasky@develer.com>
2020-09-24 09:15:20 +00:00
Constantin Konstantinidis 1f41f04d2c cmd/compile: enforce strongly typed rules for ARM (8)
add type casting to int32: L148-L156, L774-L778

Toolstash-check successful

Change-Id: Ib6544c1d7853c2811def5b18786e1fc5c18086ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/256097
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Giovanni Bajo <rasky@develer.com>
2020-09-23 19:52:44 +00:00
Constantin Konstantinidis 58fa8075f5 cmd/compile: enforce strongly typed rules for ARM (mem)
L274-L281, L293-L307, L312, L317, L319, L335, L341

Toolstash-check successful

Change-Id: I69e8e9f964c1f35615e4e19401c3f661e1e64a3a
Reviewed-on: https://go-review.googlesource.com/c/go/+/256100
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Giovanni Bajo <rasky@develer.com>
2020-09-23 19:52:14 +00:00
Constantin Konstantinidis c9551f9c19 cmd/compile: enforce strongly typed rules for ARM (1)
Remove type casting in:
L731 - L764, L772, L780 - L781, L1014 - L1054, L1057 - L1068, L1195, L1199

Toolstash-check successful.

Change-Id: I80f90716477f269a227be28b14bf913b78ef375d
Reviewed-on: https://go-review.googlesource.com/c/go/+/228824
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Giovanni Bajo <rasky@develer.com>
2020-09-23 19:51:27 +00:00
Constantin Konstantinidis 150bd4ffd4 cmd/compile: enforce strongly typed rules for ARM (4)
"mul by constant" until "div by constant"
L547-L609

Change-Id: I19ebb5694e383878f505d34df2591a51fe38431a
Reviewed-on: https://go-review.googlesource.com/c/go/+/254662
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
2020-09-23 07:51:17 +00:00
Constantin Konstantinidis d2bd93a010 cmd/compile: enforce strongly typed rules for ARM (6)
End of "constant folding in *shift ops" until EOF
(L1070-)
Toolstash-check is successful.

Change-Id: I55846a459aca5238f831750f04132e13a0baeed7
Reviewed-on: https://go-review.googlesource.com/c/go/+/234198
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Giovanni Bajo <rasky@develer.com>
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Go Bot <gobot@golang.org>
2020-09-22 22:49:21 +00:00
Constantin Konstantinidis bfe72ec56a cmd/compile: enforce strongly typed rules for ARM (5)
From "absorb InvertFlags" until "constant folding in *shift ops"
L666-L1011
Toolstash-check is successful.

Change-Id: Ieed7d4643dc3dc2b3649477e87aebd22c81d1322
Reviewed-on: https://go-review.googlesource.com/c/go/+/234197
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Giovanni Bajo <rasky@develer.com>
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Go Bot <gobot@golang.org>
2020-09-22 22:49:06 +00:00
Constantin Konstantinidis be01f54c24 cmd/compile: enforce strongly typed rules for ARM (3)
Toolstash-check successful from L270 until L543.

Change-Id: Ic39ab86c80f970bfb21e318284f6bb3e8a994220
Reviewed-on: https://go-review.googlesource.com/c/go/+/233439
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Giovanni Bajo <rasky@develer.com>
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Go Bot <gobot@golang.org>
2020-09-22 22:48:49 +00:00
Constantin Konstantinidis 7f7184686b cmd/compile: enforce strongly typed rules for ARM (2)
Toolstash-check successful from L0 until L268

Change-Id: Ifc55ea1e4177c21107c521fc72da2da7b507b8ba
Reviewed-on: https://go-review.googlesource.com/c/go/+/232811
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Giovanni Bajo <rasky@develer.com>
2020-09-22 22:47:43 +00:00
fanzha02 85902b6786 cmd/compile: convert rest ARM64.rules lines to typed aux mode
This patch adds the ARM64Bitfield auxInt to auxIntType() and
returns its Go type as "arm64Bitfield", which is defined
as int16.

The Go type of the SymOff auxInt is int32, but some functions
(such as min(), areAdjacentOffsets() and read16/32/64()) take
SymOff as an input parameter and treat it as int64, so this
patch adds the type conversions for these rules.

Passes toolstash-check -all.

Change-Id: Ib234b48d0a97ef244dd37878e06b5825316dd782
Reviewed-on: https://go-review.googlesource.com/c/go/+/234378
Reviewed-by: Keith Randall <khr@golang.org>
2020-08-24 14:38:38 +00:00
Keith Randall 40ef1faabc cmd/compile: redo flag constant ops for arm
Encode the flag results in an auxint field instead of having
one opcode per flag state. This helps us handle the new *noov
branches in a unified manner.
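
One way to picture "flag results in an auxint field" is a small bit set holding N, Z, C and V. The encoding and all names below are assumptions made for this sketch and are not taken from the CL.

```go
package main

import "fmt"

// flagBits (hypothetical) packs the four ARM condition flags into one
// small integer, the kind of value that fits in an auxint field.
type flagBits uint8

const (
	flagN flagBits = 1 << iota // result negative
	flagZ                      // result zero
	flagC                      // carry (for subtraction: no borrow)
	flagV                      // signed overflow
)

// subFlags returns the flags a 32-bit CMP x, y would produce.
func subFlags(x, y int32) flagBits {
	var f flagBits
	r := x - y
	if r < 0 {
		f |= flagN
	}
	if r == 0 {
		f |= flagZ
	}
	if uint32(x) >= uint32(y) {
		f |= flagC
	}
	if (x < 0) != (y < 0) && (r < 0) != (x < 0) {
		f |= flagV
	}
	return f
}

func (f flagBits) lt() bool  { return (f&flagN != 0) != (f&flagV != 0) }
func (f flagBits) ult() bool { return f&flagC == 0 }

func main() {
	f := subFlags(-1, 1)
	fmt.Println(f.lt(), f.ult()) // true false
}
```

With a single encoded value, each branch condition becomes a predicate over the bits rather than a separate opcode per flag state.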

This is only for arm, arm64 is in a subsequent CL.

We could extend to other architectures as well, although it would
only be cleanup, no behavioral change.

Update #39505

Change-Id: Ia46cea596faad540d1496c5915ab1274571543f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/238077
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-06-18 20:57:49 +00:00
Xiangdong Ji e031318ca6 cmd/compile: ARM comparisons with 0 incorrect on overflow
Some ARM rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch on condition flags, in order to save one Add or Sub calculation.

Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.

Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag:

  Block-Op         Meaning                   ARM condition codes
  1. LTnoov        less than                 MI
  2. GEnoov        greater than or equal     PL
  3. LEnoov        less than or equal        MI || EQ
  4. GTnoov        greater than              NE && PL

The patch also adds a few test cases to cover scenarios that are specific
to ARM and fine-tunes the code generation tests for 'x-const'.
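
The four conditions in the table can be written as predicates over the flags. The sketch below uses an invented flag model to show a case where the overflow-aware LT gives the wrong answer for "x+y < 0" while LTnoov does not; it is an illustration, not the compiler's code.

```go
package main

import "fmt"

// flags models the ARM N, Z and V condition flags.
type flags struct{ N, Z, V bool }

// cmn models CMN x, y: flags are set from the addition x + y.
func cmn(x, y int32) flags {
	r := x + y // wraps on overflow
	// Signed overflow: operands share a sign that the result lacks.
	v := (x < 0) == (y < 0) && (r < 0) != (x < 0)
	return flags{N: r < 0, Z: r == 0, V: v}
}

func lt(f flags) bool     { return f.N != f.V }   // honors overflow
func ltNoov(f flags) bool { return f.N }          // MI
func geNoov(f flags) bool { return !f.N }         // PL
func leNoov(f flags) bool { return f.N || f.Z }   // MI || EQ
func gtNoov(f flags) bool { return !f.Z && !f.N } // not EQ and PL

func main() {
	var x, y int32 = 0x7fffffff, 1
	f := cmn(x, y)
	// x+y wraps to the most negative int32, so "x+y < 0" is true in Go.
	fmt.Println(x+y < 0, lt(f), ltNoov(f)) // true false true
}
```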

For more details please refer to the previous fix on 64-bit ARM:
  https://go-review.googlesource.com/c/go/+/233097

Go1 benchmark results follow; 'old' is the non-optimized version, that is,
the one with all affected rewrite rules removed.

name                     old time/op    new time/op     delta
BinaryTree17-8              7.73s ± 0%      7.81s ± 0%  +0.97%  (p=0.000 n=7+8)
Fannkuch11-8                7.06s ± 0%      7.00s ± 0%  -0.83%  (p=0.000 n=8+8)
FmtFprintfEmpty-8           181ns ± 1%      183ns ± 1%  +1.31%  (p=0.001 n=8+8)
FmtFprintfString-8          319ns ± 1%      325ns ± 2%  +1.71%  (p=0.009 n=7+8)
FmtFprintfInt-8             358ns ± 1%      359ns ± 1%    ~     (p=0.293 n=7+7)
FmtFprintfIntInt-8          459ns ± 3%      456ns ± 1%    ~     (p=0.869 n=8+8)
FmtFprintfPrefixedInt-8     535ns ± 4%      538ns ± 4%    ~     (p=0.572 n=8+8)
FmtFprintfFloat-8          1.01µs ± 2%     1.01µs ± 2%    ~     (p=0.625 n=8+8)
FmtManyArgs-8              1.93µs ± 2%     1.93µs ± 1%    ~     (p=0.979 n=8+7)
GobDecode-8                16.1ms ± 1%     16.5ms ± 1%  +2.32%  (p=0.000 n=8+8)
GobEncode-8                15.9ms ± 0%     15.8ms ± 1%  -1.00%  (p=0.000 n=8+7)
Gzip-8                      690ms ± 1%      670ms ± 0%  -2.90%  (p=0.000 n=8+8)
Gunzip-8                    109ms ± 1%      109ms ± 1%    ~     (p=0.694 n=7+8)
HTTPClientServer-8          149µs ± 3%      146µs ± 2%  -1.70%  (p=0.028 n=8+8)
JSONEncode-8               50.5ms ± 1%     49.2ms ± 0%  -2.60%  (p=0.001 n=7+7)
JSONDecode-8                135ms ± 2%      137ms ± 1%    ~     (p=0.054 n=8+7)
Mandelbrot200-8             951ms ± 0%      952ms ± 0%    ~     (p=0.852 n=6+8)
GoParse-8                  9.47ms ± 1%     9.66ms ± 1%  +2.01%  (p=0.000 n=8+8)
RegexpMatchEasy0_32-8       288ns ± 2%      277ns ± 2%  -3.61%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8      1.66µs ± 1%     1.69µs ± 2%  +2.21%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8       334ns ± 1%      305ns ± 2%  -8.86%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8      2.14µs ± 2%     2.15µs ± 0%    ~     (p=0.099 n=8+8)
RegexpMatchMedium_32-8     13.3ns ± 1%     13.3ns ± 0%    ~     (p=1.000 n=7+7)
RegexpMatchMedium_1K-8     81.1µs ± 3%     80.7µs ± 1%    ~     (p=0.955 n=7+8)
RegexpMatchHard_32-8       4.26µs ± 0%     4.26µs ± 0%    ~     (p=0.933 n=7+8)
RegexpMatchHard_1K-8        124µs ± 0%      124µs ± 0%  +0.31%  (p=0.000 n=8+8)
Revcomp-8                  14.7ms ± 2%     14.5ms ± 1%  -1.66%  (p=0.003 n=8+8)
Template-8                  197ms ± 2%      200ms ± 3%  +1.62%  (p=0.021 n=8+8)
TimeParse-8                1.33µs ± 1%     1.30µs ± 1%  -1.86%  (p=0.002 n=8+8)
TimeFormat-8               3.04µs ± 1%     3.02µs ± 0%  -0.60%  (p=0.000 n=8+8)

name                     old speed      new speed       delta
GobDecode-8              47.6MB/s ± 1%   46.5MB/s ± 1%  -2.28%  (p=0.000 n=8+8)
GobEncode-8              48.1MB/s ± 0%   48.6MB/s ± 1%  +1.02%  (p=0.000 n=8+7)
Gzip-8                   28.1MB/s ± 1%   29.0MB/s ± 0%  +2.97%  (p=0.000 n=8+8)
Gunzip-8                  178MB/s ± 1%    179MB/s ± 2%    ~     (p=0.694 n=7+8)
JSONEncode-8             38.4MB/s ± 1%   39.4MB/s ± 0%  +2.67%  (p=0.001 n=7+7)
JSONDecode-8             14.3MB/s ± 2%   14.2MB/s ± 1%  -0.81%  (p=0.043 n=8+7)
GoParse-8                6.12MB/s ± 1%   5.99MB/s ± 1%  -2.00%  (p=0.000 n=8+8)
RegexpMatchEasy0_32-8     111MB/s ± 2%    115MB/s ± 2%  +3.77%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-8     618MB/s ± 1%    604MB/s ± 2%  -2.16%  (p=0.001 n=7+7)
RegexpMatchEasy1_32-8    95.7MB/s ± 1%  105.1MB/s ± 2%  +9.76%  (p=0.000 n=8+8)
RegexpMatchEasy1_1K-8     479MB/s ± 2%    477MB/s ± 0%    ~     (p=0.105 n=8+8)
RegexpMatchMedium_32-8   75.2MB/s ± 1%   75.2MB/s ± 0%    ~     (p=0.247 n=7+7)
RegexpMatchMedium_1K-8   12.6MB/s ± 3%   12.7MB/s ± 1%    ~     (p=0.538 n=7+8)
RegexpMatchHard_32-8     7.52MB/s ± 0%   7.52MB/s ± 0%    ~     (p=0.968 n=7+8)
RegexpMatchHard_1K-8     8.26MB/s ± 0%   8.24MB/s ± 0%  -0.30%  (p=0.001 n=8+8)
Revcomp-8                 173MB/s ± 2%    176MB/s ± 1%  +1.68%  (p=0.003 n=8+8)
Template-8               9.85MB/s ± 2%   9.69MB/s ± 3%  -1.59%  (p=0.021 n=8+8)

Fixes   #39303
Updates #38740

Change-Id: I0a5f87bfda679f66414c0041ace2ca2e28363f36
Reviewed-on: https://go-review.googlesource.com/c/go/+/236637
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-06-09 15:50:33 +00:00
Josh Bleecher Snyder 67a8660b5a cmd/compile: CSE the RHS of rewrite rules
Keep track of all expressions encountered while
generating a rewrite result, and re-use them whenever possible.
Named expressions may still be used for clarity when desired.

Change-Id: I640dca108763eb8baeff8f9a4169300af3445b82
Reviewed-on: https://go-review.googlesource.com/c/go/+/229800
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2020-04-24 16:44:20 +00:00
Michael Munday bfd569fcb0 cmd/compile: delete the floating point Greater and Geq ops
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be
implemented using Less and Leq.

Fixes #37316.

Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-07 19:55:05 +00:00
David Chase 47ade08141 cmd/compile: add logging for large (>= 128 byte) copies
For 1.15, unless someone really wants it in 1.14.

A performance-sensitive user thought this would be useful,
though "large" was not well-defined.  If 128 is large,
there are 139 static instances of "large" copies in the compiler
itself.

Includes test.

Change-Id: I81f20c62da59d37072429f3a22c1809e6fb2946d
Reviewed-on: https://go-review.googlesource.com/c/go/+/205066
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-03 17:24:48 +00:00
Josh Bleecher Snyder a2bff7c296 cmd/compile: make pre-elimination of rulegen bounds checks more precise
In cases in which we had a named value whose args were all _,
like this rule from ARM.rules:

(MOVBUreg x:(MOVBUload _ _)) -> (MOVWreg x)

We previously inserted

_ = x.Args[1]

even though it is unnecessary.
This change eliminates this pointless bounds check.
And in other cases, we now check bounds just as far as strictly necessary.
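
The `_ = x.Args[1]` line is an instance of a common Go idiom: index the highest element first so that later, smaller indexes need no further checks. A standalone sketch follows, with a made-up function name; whether checks are actually elided depends on the compiler version.

```go
package main

import "fmt"

// sum3 reads three elements. The early `_ = s[2]` is a bounds-check hint:
// once it succeeds, s[0] and s[1] are known to be in range. It panics if
// len(s) < 3. If the function never indexed s afterwards, the hint would
// be a pointless check, which is the case the commit removes.
func sum3(s []int) int {
	_ = s[2]
	return s[0] + s[1] + s[2]
}

func main() {
	fmt.Println(sum3([]int{1, 2, 3})) // 6
}
```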

No significant movement on any compiler metrics.
Just nicer (and less) code.

Passes toolstash-check -all.

Change-Id: I075dfe9f926cc561cdc705e9ddaab563164bed3a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221781
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-02 17:40:11 +00:00
Josh Bleecher Snyder 5e4da0adac cmd/compile: add streamlined Block Reset+AddControl routines
For use in rewrite rules. Shrinks cmd/compile:

compile 20082104  19967416  -114688 -0.571%

Passes toolstash-check -all.

Change-Id: Ic856508b27ec5b7fb9b6ca63e955a7139ae7dc30
Reviewed-on: https://go-review.googlesource.com/c/go/+/221780
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-02 17:40:00 +00:00
Josh Bleecher Snyder d7c073ecbf cmd/compile: add specialized Value reset for OpCopy
This:

* Simplifies and shortens the generated code for rewrite rules.
* Shrinks cmd/compile by 86k (0.4%) and makes it easier to compile.
* Removes the stmt boundary code wrangling from Value.reset,
  in favor of doing it in the one place where it actually does some work,
  namely the writebarrier pass. (This was ascertained by inspecting the
  code for cases in which notStmtBoundary values were generated.)

Passes toolstash-check -all.

Change-Id: I25671d4c4bbd772f235195d11da090878ea2cc07
Reviewed-on: https://go-review.googlesource.com/c/go/+/221421
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2020-03-02 16:24:47 +00:00
Josh Bleecher Snyder 7913f7dfcf cmd/compile: add specialized AddArgN functions for rewrite rules
This shrinks the compiler without impacting performance.
(The performance-sensitive part of rewrite rules is the non-match case.)
Passes toolstash-check -all.

Executable size:

file    before    after     Δ       %       
compile 20356168  20163960  -192208 -0.944% 
total   115599376 115407168 -192208 -0.166% 

Text size:

file                       before   after    Δ       %       
cmd/compile/internal/ssa.s 3928309  3778774  -149535 -3.807% 
total                      18862943 18713408 -149535 -0.793% 

Memory allocated compiling package SSA:

SSA               12.7M ± 0%        12.5M ± 0%  -1.74%  (p=0.008 n=5+5)

Compiler speed impact:

name        old time/op       new time/op       delta
Template          211ms ± 1%        211ms ± 2%    ~     (p=0.832 n=49+49)
Unicode          82.8ms ± 2%       83.2ms ± 2%  +0.44%  (p=0.022 n=46+49)
GoTypes           726ms ± 1%        728ms ± 2%    ~     (p=0.076 n=46+48)
Compiler          3.39s ± 2%        3.40s ± 2%    ~     (p=0.633 n=48+49)
SSA               7.71s ± 1%        7.65s ± 1%  -0.78%  (p=0.000 n=45+44)
Flate             134ms ± 1%        134ms ± 1%    ~     (p=0.195 n=50+49)
GoParser          167ms ± 1%        167ms ± 1%    ~     (p=0.390 n=47+47)
Reflect           453ms ± 3%        452ms ± 2%    ~     (p=0.492 n=48+49)
Tar               184ms ± 3%        184ms ± 2%    ~     (p=0.862 n=50+48)
XML               248ms ± 2%        248ms ± 2%    ~     (p=0.096 n=49+47)
[Geo mean]        415ms             415ms       -0.03%

name        old user-time/op  new user-time/op  delta
Template          273ms ± 1%        273ms ± 2%    ~     (p=0.711 n=48+48)
Unicode           117ms ± 6%        117ms ± 5%    ~     (p=0.633 n=50+50)
GoTypes           972ms ± 2%        974ms ± 1%  +0.29%  (p=0.016 n=47+49)
Compiler          4.46s ± 6%        4.51s ± 6%    ~     (p=0.093 n=50+50)
SSA               10.4s ± 1%        10.3s ± 2%  -0.94%  (p=0.000 n=45+50)
Flate             166ms ± 2%        167ms ± 2%    ~     (p=0.148 n=49+48)
GoParser          202ms ± 1%        202ms ± 2%  -0.28%  (p=0.014 n=47+49)
Reflect           594ms ± 2%        594ms ± 2%    ~     (p=0.717 n=48+49)
Tar               224ms ± 2%        224ms ± 2%    ~     (p=0.805 n=50+49)
XML               311ms ± 1%        310ms ± 1%    ~     (p=0.177 n=49+48)
[Geo mean]        537ms             537ms       +0.01%


Change-Id: I562b9f349b34ddcff01771769e6dbbc80604da7a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221237
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-01 15:27:58 +00:00
Josh Bleecher Snyder d889f0cb10 cmd/compile: use correct types in phiopt
We try to preserve type correctness of generic ops.
phiopt modified a bool to be an int without a conversion.
Add a conversion. There are a few random fluctuations in the
generated code as a result, but nothing noteworthy or systematic.

no binary size changes

file                        before   after    Δ       %       
math.s                      35966    35961    -5      -0.014% 
debug/dwarf.s               108141   108147   +6      +0.006% 
crypto/dsa.s                6047     6044     -3      -0.050% 
image/png.s                 42882    42885    +3      +0.007% 
go/parser.s                 80281    80278    -3      -0.004% 
cmd/internal/obj.s          115116   115113   -3      -0.003% 
go/types.s                  322130   322118   -12     -0.004% 
cmd/internal/obj/arm64.s    151679   151685   +6      +0.004% 
go/internal/gccgoimporter.s 56487    56493    +6      +0.011% 
cmd/test2json.s             1650     1647     -3      -0.182% 
cmd/link/internal/loadelf.s 35442    35443    +1      +0.003% 
cmd/go/internal/work.s      305039   305035   -4      -0.001% 
cmd/link/internal/ld.s      544835   544834   -1      -0.000% 
net/http.s                  558777   558774   -3      -0.001% 
cmd/compile/internal/ssa.s  3926551  3926994  +443    +0.011% 
cmd/compile/internal/gc.s   1552320  1552321  +1      +0.000% 
total                       18862241 18862670 +429    +0.002% 


Change-Id: I4289e773be6be534ea3f907d68f614441b8f9b46
Reviewed-on: https://go-review.googlesource.com/c/go/+/221607
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-29 17:02:29 +00:00
Josh Bleecher Snyder eb5cd0fb40 cmd/compile: mark Lsyms as readonly earlier
The SSA backend has rules to read the contents of readonly Lsyms.
However, this rule was failing to trigger for many readonly Lsyms.
This is because the readonly attribute that was set on the Node.Name
was not propagated to its Lsym until the dump globals phase, after SSA runs.

To work around this phase ordering problem, introduce Node.SetReadonly,
which sets Node.Name.Readonly and also configures the Lsym
enough that SSA can use it.

This change also fixes a latent problem in the rewrite rule function,
namely that reads past the end of lsym.P were treated as entirely zero,
instead of merely requiring padding with trailing zeros.
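
The intended read semantics can be sketched as follows: bytes that exist are used, and only the missing tail is treated as zero. The function `read32`, its little-endian byte order, and the non-negative offset are assumptions for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// read32 (hypothetical) reads a little-endian uint32 from data at off,
// where off is assumed to be non-negative. Bytes past the end of data are
// taken to be zero, so a read that straddles the end still sees the bytes
// that are present.
func read32(data []byte, off int) uint32 {
	var buf [4]byte
	if off < len(data) {
		copy(buf[:], data[off:]) // copies at most 4 bytes
	}
	return binary.LittleEndian.Uint32(buf[:])
}

func main() {
	data := []byte{0x01, 0x02, 0x03}
	fmt.Println(read32(data, 0) == 0x030201) // true
	fmt.Println(read32(data, 2) == 0x03)     // true
	fmt.Println(read32(data, 8) == 0)        // true
}
```

The latent bug described above would correspond to returning zero for the first two reads as well, since both extend past the end of the data.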

This change also adds an amd64 rule needed to fully optimize
the results of this change. It would be better not to need this,
but the zero extension that should handle this for us
gets optimized away too soon (see #36897 for a similar problem).
I have not investigated whether other platforms also need new
rules to take full advantage of the new optimizations.

Compiled code for (interface{})(true) on amd64 goes from:

LEAQ	type.bool(SB), AX
MOVBLZX	""..stmp_0(SB), BX
LEAQ	runtime.staticbytes(SB), CX
ADDQ	CX, BX

to

LEAQ	type.bool(SB), AX
LEAQ	runtime.staticbytes+1(SB), BX

Prior to this change, the readonly symbol rewrite rules
fired a total of 884 times during make.bash.
Afterwards they fire 1807 times.

file    before    after     Δ       %
cgo     4827832   4823736   -4096   -0.085%
compile 24907768  24895656  -12112  -0.049%
fix     3376952   3368760   -8192   -0.243%
pprof   14751700  14747604  -4096   -0.028%
total   120343528 120315032 -28496  -0.024%

Change-Id: I59ea52138276c37840f69e30fb109fd376d579ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/220499
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-26 19:30:21 +00:00
Michael Munday cb74dcc172 cmd/compile: remove Greater* and Geq* generic integer ops
The generic Greater and Geq ops can always be replaced with the Less and
Leq ops. This CL therefore removes them. This simplifies the compiler since
it reduces the number of operations that need handling in both code and in
rewrite rules. This will be especially true when adding control flow
optimizations such as the integer-in-range optimizations in CL 165998.

Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040
Reviewed-on: https://go-review.googlesource.com/c/go/+/220417
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-26 13:11:53 +00:00
Michael Munday 44fe355694 cmd/compile: canonicalize comparison argument order
Ensure that any comparison between two values has the same argument
order. This helps ensure that they can be eliminated during the
lowered CSE pass which will be particularly important if we eliminate
the Greater and Geq ops (see #37316).

Example:

  CMP R0, R1
  BLT L1
  CMP R1, R0 // different order, cannot eliminate
  BEQ L2

  CMP R0, R1
  BLT L1
  CMP R0, R1 // same order, can eliminate
  BEQ L2

This does have some drawbacks. Notably comparisons might 'flip'
direction in the assembly output after even small changes to the
code or compiler. It should help make optimizations more reliable
however.
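
A simplified sketch of the idea: pick a fixed operand order (here by a numeric ID, which is an assumption of this sketch) and invert the condition whenever the operands are swapped. All types and names are invented for illustration.

```go
package main

import "fmt"

// value is a stand-in for an SSA value; only its ID matters here.
type value struct{ ID int }

// cond is a condition tested on the flags produced by a comparison.
type cond int

const (
	lt cond = iota
	gt
	eq
)

// invert returns the condition that holds once the operands are swapped.
func invert(c cond) cond {
	switch c {
	case lt:
		return gt
	case gt:
		return lt
	}
	return c // eq is symmetric
}

// canonicalize orders the operands by ID and adjusts the condition, so
// that two comparisons of the same pair of values end up identical.
func canonicalize(x, y value, c cond) (value, value, cond) {
	if x.ID > y.ID {
		return y, x, invert(c)
	}
	return x, y, c
}

func main() {
	a, b := value{1}, value{2}
	x, y, c := canonicalize(b, a, lt) // "b < a" becomes "a > b"
	fmt.Println(x.ID, y.ID, c == gt)  // 1 2 true
}
```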

compilecmp master -> HEAD
master (218f4572f5): text/template: make reflect.Value indirections more robust
HEAD (f1661fef3e): cmd/compile: canonicalize comparison argument order
platform: linux/amd64

file      before    after     Δ       %
api       6063927   6068023   +4096   +0.068%
asm       5191757   5183565   -8192   -0.158%
cgo       4893518   4901710   +8192   +0.167%
cover     5330345   5326249   -4096   -0.077%
fix       3417778   3421874   +4096   +0.120%
pprof     14889456  14885360  -4096   -0.028%
test2json 2848138   2844042   -4096   -0.144%
trace     11746239  11733951  -12288  -0.105%
total     132739173 132722789 -16384  -0.012%

Change-Id: I11736b3fe2a4553f6fc65018f475e88217fa22f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/220425
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-26 10:32:22 +00:00
Josh Bleecher Snyder 5749c0eb5b cmd/compile: use ellipses in ARM rules
Passes toolstash-check -all.

Change-Id: I390789ca5bfe45967ed156590f2690c91d416187
Reviewed-on: https://go-review.googlesource.com/c/go/+/217005
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-02-24 22:14:01 +00:00
Josh Bleecher Snyder 6dd11bcb35 cmd/compile: remove chunking of rewrite rules
We added chunking of rewrite rules to speed up compiling package SSA.
This series of changes has significantly shrunk the number of
rewrite rules, and they are no longer being added nearly as fast.
Now that we are sharing v.Args across multiple rewrite rules,
there is additional benefit to having more rules in a single function.
Removing chunking now has an incidental impact on compiling package SSA,
marginally speeds up other compilation, shrinks the cmd/compile binary,
and simplifies the code.

name        old time/op       new time/op       delta
Template          211ms ± 2%        210ms ± 2%  -0.50%  (p=0.000 n=91+97)
Unicode          81.9ms ± 3%       81.8ms ± 3%    ~     (p=0.179 n=96+91)
GoTypes           731ms ± 2%        731ms ± 1%    ~     (p=0.442 n=94+96)
Compiler          3.43s ± 2%        3.41s ± 2%  -0.36%  (p=0.001 n=98+94)
SSA               8.30s ± 2%        8.32s ± 2%  +0.19%  (p=0.034 n=94+95)
Flate             135ms ± 2%        134ms ± 1%  -0.30%  (p=0.006 n=98+94)
GoParser          167ms ± 1%        167ms ± 1%  -0.22%  (p=0.001 n=92+94)
Reflect           453ms ± 2%        453ms ± 3%    ~     (p=0.306 n=98+97)
Tar               184ms ± 2%        183ms ± 2%  -0.31%  (p=0.012 n=94+94)
XML               249ms ± 2%        248ms ± 1%  -0.26%  (p=0.002 n=96+92)
[Geo mean]        419ms             418ms       -0.21%

name        old user-time/op  new user-time/op  delta
Template          273ms ± 2%        272ms ± 2%  -0.46%  (p=0.000 n=93+96)
Unicode           116ms ± 4%        117ms ± 4%    ~     (p=0.433 n=98+98)
GoTypes           977ms ± 2%        977ms ± 1%    ~     (p=0.971 n=92+99)
Compiler          4.56s ± 6%        4.53s ± 6%    ~     (p=0.081 n=100+100)
SSA               11.1s ± 2%        11.1s ± 2%    ~     (p=0.064 n=99+96)
Flate             167ms ± 2%        167ms ± 1%  -0.24%  (p=0.004 n=95+96)
GoParser          203ms ± 1%        203ms ± 2%  -0.14%  (p=0.049 n=96+97)
Reflect           595ms ± 2%        595ms ± 2%    ~     (p=0.544 n=95+92)
Tar               225ms ± 2%        224ms ± 2%    ~     (p=0.562 n=99+99)
XML               312ms ± 2%        311ms ± 1%    ~     (p=0.050 n=97+93)
[Geo mean]        543ms             542ms       -0.13%

Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa
Reviewed-on: https://go-review.googlesource.com/c/go/+/216220
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-21 02:29:11 +00:00
Josh Bleecher Snyder a3f234c706 cmd/compile: reduce bounds checks in generated rewrite rules
CL 213703 converted generated rewrite rules for commutative ops
to use loops instead of duplicated code.

However, it loaded args using expressions like
v.Args[i] and v.Args[i^1], for which the compiler could
not eliminate bounds checks (including with all outstanding
prove CLs).

Also, given a series of separate rewrite rules for the same op,
we generated bounds checks for every rewrite rule, even though
we were repeatedly loading the same set of args.

This change reduces both sets of bounds checks.

Instead of loading v.Args[i] and v.Args[i^1] for commutative loops,
we now preload v.Args[0] and v.Args[1] into local variables,
and then swap them (as needed) in the commutative loop post statement.

And we now load all top level v.Args into local variables
at the beginning of every rewrite rule function.

The second optimization is the more significant,
but the first helps a little, and they play together
nicely from the perspective of generating the code.

This does increase register pressure, but the reduced bounds
checks more than compensate.

Note that the vast majority of rewrite rules evaluated
are not applied, so the prologue is the most important
part of the rewrite rules.

There is one subtle aspect to the new generated code.
Because the top level v.Args are shared across rewrite rules,
and rule evaluation can swap v_0 and v_1, v_0 and v_1
can end up being swapped from one rule to the next.
That is OK, because any time a rule does not get applied,
they will have been swapped exactly twice.
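
A minimal sketch of the shape described above, using a toy Value type
and a hypothetical matchAddConst function rather than real generated
code. The args are loaded into locals once, and the loop post statement
swaps them so that both orderings are tried.

```go
package main

import "fmt"

// Value is a toy stand-in for an SSA value.
type Value struct {
	Op   string
	Args []*Value
}

// matchAddConst reports whether v has a constant operand on either side.
// It returns the non-constant operand x and the constant operand c.
func matchAddConst(v *Value) (x, c *Value, ok bool) {
	// Preload the args; indexing happens once, up front.
	v_1 := v.Args[1]
	v_0 := v.Args[0]
	for i := 0; i <= 1; i, v_0, v_1 = i+1, v_1, v_0 {
		if v_1.Op != "Const" {
			continue
		}
		return v_0, v_1, true
	}
	// No match: the post statement ran twice, so v_0 and v_1 are back
	// in their original order for whatever rule comes next.
	return nil, nil, false
}

func main() {
	k := &Value{Op: "Const"}
	a := &Value{Op: "Arg"}
	add := &Value{Op: "Add", Args: []*Value{k, a}} // constant on the left
	x, c, ok := matchAddConst(add)
	fmt.Println(ok, x == a, c == k)
}
```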

Passes toolstash-check -all.

name        old time/op       new time/op       delta
Template          213ms ± 2%        211ms ± 2%  -0.85%  (p=0.000 n=92+96)
Unicode          83.5ms ± 2%       83.2ms ± 2%  -0.41%  (p=0.004 n=95+90)
GoTypes           737ms ± 2%        733ms ± 2%  -0.51%  (p=0.000 n=91+94)
Compiler          3.45s ± 2%        3.43s ± 2%  -0.44%  (p=0.000 n=99+100)
SSA               8.54s ± 1%        8.32s ± 2%  -2.56%  (p=0.000 n=96+99)
Flate             136ms ± 2%        135ms ± 1%  -0.47%  (p=0.000 n=96+96)
GoParser          169ms ± 1%        168ms ± 1%  -0.33%  (p=0.000 n=96+93)
Reflect           456ms ± 3%        455ms ± 3%    ~     (p=0.261 n=95+94)
Tar               186ms ± 2%        185ms ± 2%  -0.48%  (p=0.000 n=94+95)
XML               251ms ± 1%        250ms ± 1%  -0.51%  (p=0.000 n=91+94)
[Geo mean]        424ms             421ms       -0.68%

name        old user-time/op  new user-time/op  delta
Template          275ms ± 1%        274ms ± 2%  -0.55%  (p=0.000 n=95+98)
Unicode           118ms ± 4%        118ms ± 4%    ~     (p=0.642 n=98+90)
GoTypes           983ms ± 1%        980ms ± 1%  -0.30%  (p=0.000 n=93+93)
Compiler          4.56s ± 6%        4.52s ± 6%  -0.72%  (p=0.003 n=100+100)
SSA               11.4s ± 1%        11.1s ± 1%  -2.50%  (p=0.000 n=96+97)
Flate             168ms ± 1%        167ms ± 1%  -0.49%  (p=0.000 n=92+92)
GoParser          204ms ± 1%        204ms ± 2%  -0.27%  (p=0.003 n=99+96)
Reflect           599ms ± 2%        598ms ± 2%    ~     (p=0.116 n=95+92)
Tar               227ms ± 2%        225ms ± 2%  -0.57%  (p=0.000 n=95+98)
XML               313ms ± 2%        312ms ± 1%  -0.37%  (p=0.000 n=89+95)
[Geo mean]        547ms             544ms       -0.61%

file    before    after     Δ       %
compile 21113112  21109016  -4096   -0.019%
total   131704940 131700844 -4096   -0.003%

Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956
Reviewed-on: https://go-review.googlesource.com/c/go/+/216218
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-21 00:55:58 +00:00
Josh Bleecher Snyder bd6d78ef37 cmd/compile: use loops to handle commutative ops in rules
Prior to this change, we generated additional rules at rulegen time
for all possible combinations of args to commutative ops.
This is simple and works well, but leads to lots of generated rules.
This in turn has increased the size of the compiler,
made it hard to compile package ssa on small machines,
and provided a disincentive to mark some ops as commutative.

This change reworks how we handle commutative ops.
Instead of generating a rule per argument permutation,
we generate a series of nested loops, one for each commutative op.
Each loop tries both possible argument orderings.
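
For illustration only, the nested-loop shape might look roughly like the
sketch below. The Value type, the op names and matchMulConstAdd are
hypothetical; real generated code differs in detail.

```go
package main

import "fmt"

// Value is a toy stand-in for an SSA value.
type Value struct {
	Op   string
	Args []*Value
}

// matchMulConstAdd looks for (Add (Mul (Const) x) y), where both Add and
// Mul are commutative. There is one loop per commutative op, and each
// loop tries both argument orderings.
func matchMulConstAdd(v *Value) bool {
	if v.Op != "Add" {
		return false
	}
	for i := 0; i <= 1; i++ {
		mul := v.Args[i] // the other operand is v.Args[i^1]
		if mul.Op != "Mul" {
			continue
		}
		for j := 0; j <= 1; j++ {
			if mul.Args[j].Op != "Const" {
				continue
			}
			return true
		}
	}
	return false
}

func main() {
	c := &Value{Op: "Const"}
	x := &Value{Op: "Arg"}
	y := &Value{Op: "Arg"}
	mul := &Value{Op: "Mul", Args: []*Value{x, c}}
	add := &Value{Op: "Add", Args: []*Value{y, mul}}
	// Matches even though both ops have their operands "reversed".
	fmt.Println(matchMulConstAdd(add))
}
```

With one rule per permutation, this single pattern would have needed
four generated rules; the loops cover the same four orderings in one.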

I also considered attempting to canonicalize the inputs to the
rewrite rules. However, because either or both arguments might be
nothing more than an identifier, and because there can be arbitrary
conditions to evaluate during matching, I did not see how to proceed.

The duplicate rule detection now sorts arguments to commutative ops,
so that it can detect commutative-only duplicates.
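
One way to picture this kind of duplicate detection, as a sketch under
assumptions: the key function, the commutative table and the string
representation of rules below are invented for the example and are not
rulegen's actual data structures.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// commutative lists the ops treated as commutative in this sketch.
var commutative = map[string]bool{"Add64": true, "Mul64": true}

// key builds a normalized string for a one-level match pattern. The
// arguments of commutative ops are sorted, so argument order does not
// affect the key.
func key(op string, args ...string) string {
	a := append([]string(nil), args...) // copy; do not reorder the caller's slice
	if commutative[op] {
		sort.Strings(a)
	}
	return "(" + op + " " + strings.Join(a, " ") + ")"
}

func main() {
	rules := [][]string{
		{"Add64", "x", "(Const64 [0])"},
		{"Add64", "(Const64 [0])", "x"}, // same rule, operands swapped
		{"Sub64", "x", "(Const64 [0])"},
	}
	seen := map[string]bool{}
	for _, r := range rules {
		k := key(r[0], r[1:]...)
		if seen[k] {
			fmt.Println("duplicate rule:", k)
		}
		seen[k] = true
	}
}
```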

There may be further optimizations to the new generated code.
In particular, we may not be removing as many bounds checks as before;
I have not investigated deeply. If more work here is needed,
we could do it with more hints or with improvements to the prove pass.

This change has almost no impact on the generated code.
It does not pass toolstash-check, however. In a handful of functions,
for reasons I do not understand, there are minor position changes.

For the entire series ending at this change,
there is negligible compiler performance impact.

The compiler binary shrinks by about 15%,
and package ssa shrinks by about 25%.
Package ssa also compiles ~25% faster with ~25% less memory.

Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4
Reviewed-on: https://go-review.googlesource.com/c/go/+/213703
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-20 17:34:07 +00:00
Josh Bleecher Snyder 49f8d45994 cmd/compile: delete duplicate rules
Add logic during rulegen to detect exact duplicates
(after applying commutativity),
and clean up existing duplicates.

Change-Id: I7179f40fc48e236c74b74f429ec9f0f100026530
Reviewed-on: https://go-review.googlesource.com/c/go/+/213699
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-20 05:04:39 +00:00
Russ Cox 543c6d2e0d math, cmd/compile: rename Fma to FMA
This API was added for #25819, where it was discussed as math.FMA.
The commit adding it used math.Fma, presumably for consistency
with the rest of the unusual names in package math
(Sincos, Acosh, Erfcinv, Float32bits, etc).

I believe that using an idiomatic Go name is more important here
than consistency with these other names, most of which are historical
baggage from C's standard library.

Early additions like Float32frombits happened before "uppercase for export"
(so they were originally like "float32frombits") and they were not properly
reconsidered when we uppercased the symbols to export them.
That's a mistake we live with.

The names of functions we have added since then, and even a few
that were legacy, are more properly Go-cased, such as IsNaN, IsInf,
and RoundToEven, rather than Isnan, Isinf, and Roundtoeven.
And also constants like MaxFloat32.

For new API, we should keep using proper Go-cased symbols
instead of minimally-upper-cased-C symbols.

So math.FMA, not math.Fma.

This API has not yet been released, so this change does not break
the compatibility promise.

This CL also modifies cmd/compile, since the compiler knows
the name of the function. I could have stopped at changing the
string constants, but it seemed to make more sense to use a
consistent casing everywhere.

Change-Id: I0f6f3407f41e99bfa8239467345c33945088896e
Reviewed-on: https://go-review.googlesource.com/c/go/+/205317
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-11-07 14:51:06 +00:00
Michael Munday 7c3060e923 cmd/compile/internal/ssa: regenerate rewrite rules
Slight differences existed due to a change in rulegen after the
FMA intrinsic code was generated.

Change-Id: Ieb6b3ec1b29985a18d1bbbc5a820ffea699306fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/202443
Run-TryBot: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-21 21:14:04 +00:00
smasher164 58b031949b cmd/compile: add fma intrinsic for arm
This change introduces an arm intrinsic that generates the FMULAD
instruction for the fused-multiply-add operation on systems that
support it. System support is detected via cpu.ARM.HasVFPv4. A rewrite
rule translates the generic intrinsic to FMULAD.

Updates #25819.

Change-Id: I8459e5dd1cdbdca35f88a78dbeb7d387f1e20efa
Reviewed-on: https://go-review.googlesource.com/c/go/+/142117
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-21 17:42:47 +00:00
Brad Fitzpatrick 07b4abd62e all: remove the nacl port (part 2, amd64p32 + toolchain)
This is part two of the nacl removal. Part 1 was CL 199499.

This CL removes amd64p32 support, which might be useful in the future
if we implement the x32 ABI. It also removes the nacl bits in the
toolchain, and some remaining nacl bits.

Updates #30439

Change-Id: I2475d5bb066d1b474e00e40d95b520e7c2e286e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/200077
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-10-09 22:34:34 +00:00
Michael Munday c7d81bc086 cmd/compile: reduce amount of code generated for block rewrite rules
Add a Reset method to blocks that allows us to reduce the amount of
code we generate for block rewrite rules.
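
A toy sketch of the idea, assuming a simplified Block type (the fields
and the string-typed Kind are stand-ins, not the real ssa.Block): one
method call replaces several per-field assignments that would otherwise
be repeated in every generated block rule.

```go
package main

import "fmt"

// Value is a toy stand-in for an SSA value.
type Value struct{ Name string }

// Block is a simplified, hypothetical stand-in for an SSA block.
type Block struct {
	Kind     string
	Controls [2]*Value
	Aux      interface{}
}

// Reset sets the block kind and clears the state that a rewrite rule
// would otherwise have to clear field by field.
func (b *Block) Reset(kind string) {
	b.Kind = kind
	b.Controls = [2]*Value{}
	b.Aux = nil
}

func main() {
	b := &Block{Kind: "If", Controls: [2]*Value{&Value{Name: "cond"}, nil}, Aux: 1}
	b.Reset("First")
	fmt.Println(b.Kind, b.Controls[0] == nil, b.Aux == nil)
}
```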

Thanks to Cherry for suggesting a similar fix to this in CL 196557.

Compilebench result:

name                      old time/op       new time/op       delta
Template                        211ms ± 1%        211ms ± 1%   -0.30%  (p=0.028 n=19+20)
Unicode                        83.7ms ± 3%       83.0ms ± 2%   -0.79%  (p=0.029 n=18+19)
GoTypes                         757ms ± 1%        755ms ± 1%   -0.31%  (p=0.034 n=19+19)
Compiler                        3.51s ± 1%        3.50s ± 1%   -0.20%  (p=0.013 n=18+18)
SSA                             11.7s ± 1%        11.7s ± 1%   -0.38%  (p=0.000 n=19+19)
Flate                           131ms ± 1%        130ms ± 1%   -0.32%  (p=0.024 n=18+18)
GoParser                        162ms ± 1%        162ms ± 1%     ~     (p=0.059 n=20+18)
Reflect                         471ms ± 0%        470ms ± 0%   -0.24%  (p=0.045 n=20+17)
Tar                             187ms ± 1%        186ms ± 1%     ~     (p=0.157 n=20+20)
XML                             255ms ± 1%        255ms ± 1%     ~     (p=0.461 n=19+20)
LinkCompiler                    754ms ± 2%        755ms ± 2%     ~     (p=0.919 n=17+17)
ExternalLinkCompiler            2.82s ±16%        2.37s ±10%  -15.94%  (p=0.000 n=20+20)
LinkWithoutDebugCompiler        439ms ± 4%        442ms ± 6%     ~     (p=0.461 n=18+19)
StdCmd                          25.8s ± 2%        25.5s ± 1%   -0.95%  (p=0.000 n=20+20)

name                      old user-time/op  new user-time/op  delta
Template                        240ms ± 8%        238ms ± 7%     ~     (p=0.301 n=20+20)
Unicode                         107ms ±18%        104ms ±13%     ~     (p=0.149 n=20+20)
GoTypes                         883ms ± 3%        888ms ± 2%     ~     (p=0.211 n=20+20)
Compiler                        4.22s ± 1%        4.20s ± 1%     ~     (p=0.077 n=20+18)
SSA                             14.1s ± 1%        14.1s ± 2%     ~     (p=0.192 n=20+20)
Flate                           145ms ±10%        148ms ± 5%     ~     (p=0.126 n=20+18)
GoParser                        186ms ± 7%        186ms ± 7%     ~     (p=0.779 n=20+20)
Reflect                         538ms ± 3%        541ms ± 3%     ~     (p=0.192 n=20+20)
Tar                             218ms ± 4%        217ms ± 6%     ~     (p=0.835 n=19+20)
XML                             298ms ± 5%        298ms ± 5%     ~     (p=0.749 n=19+20)
LinkCompiler                    818ms ± 5%        825ms ± 8%     ~     (p=0.461 n=20+20)
ExternalLinkCompiler            1.55s ± 4%        1.53s ± 5%     ~     (p=0.063 n=20+18)
LinkWithoutDebugCompiler        460ms ±12%        460ms ± 7%     ~     (p=0.925 n=20+20)

name                      old object-bytes  new object-bytes  delta
Template                        554kB ± 0%        554kB ± 0%     ~     (all equal)
Unicode                         215kB ± 0%        215kB ± 0%     ~     (all equal)
GoTypes                        2.01MB ± 0%       2.01MB ± 0%     ~     (all equal)
Compiler                       7.97MB ± 0%       7.97MB ± 0%   +0.00%  (p=0.000 n=20+20)
SSA                            26.8MB ± 0%       26.9MB ± 0%   +0.27%  (p=0.000 n=20+20)
Flate                           340kB ± 0%        340kB ± 0%     ~     (all equal)
GoParser                        434kB ± 0%        434kB ± 0%     ~     (all equal)
Reflect                        1.34MB ± 0%       1.34MB ± 0%     ~     (all equal)
Tar                             480kB ± 0%        480kB ± 0%     ~     (all equal)
XML                             622kB ± 0%        622kB ± 0%     ~     (all equal)

name                      old export-bytes  new export-bytes  delta
Template                       20.4kB ± 0%       20.4kB ± 0%     ~     (all equal)
Unicode                        8.21kB ± 0%       8.21kB ± 0%     ~     (all equal)
GoTypes                        36.6kB ± 0%       36.6kB ± 0%     ~     (all equal)
Compiler                        115kB ± 0%        115kB ± 0%   +0.08%  (p=0.000 n=20+20)
SSA                             141kB ± 0%        141kB ± 0%   +0.07%  (p=0.000 n=20+20)
Flate                          5.11kB ± 0%       5.11kB ± 0%     ~     (all equal)
GoParser                       8.93kB ± 0%       8.93kB ± 0%     ~     (all equal)
Reflect                        11.8kB ± 0%       11.8kB ± 0%     ~     (all equal)
Tar                            10.9kB ± 0%       10.9kB ± 0%     ~     (all equal)
XML                            17.4kB ± 0%       17.4kB ± 0%     ~     (all equal)

name                      old text-bytes    new text-bytes    delta
HelloSize                       742kB ± 0%        742kB ± 0%     ~     (all equal)
CmdGoSize                      10.7MB ± 0%       10.7MB ± 0%     ~     (all equal)

name                      old data-bytes    new data-bytes    delta
HelloSize                      10.7kB ± 0%       10.7kB ± 0%     ~     (all equal)
CmdGoSize                       312kB ± 0%        312kB ± 0%     ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       122kB ± 0%        122kB ± 0%     ~     (all equal)
CmdGoSize                       146kB ± 0%        146kB ± 0%     ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.10MB ± 0%       1.10MB ± 0%     ~     (all equal)
CmdGoSize                      14.9MB ± 0%       14.9MB ± 0%     ~     (all equal)

Change-Id: Ic89a8e62423b3d9fd9391159e0663acf450803b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/198419
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2019-10-07 09:19:12 +00:00
Michael Munday 9c2e7e8bed cmd/compile: allow multiple SSA block control values
Control values are used to choose which successor of a block is
jumped to. Typically a control value takes the form of a 'flags'
value that represents the result of a comparison. Some
architectures however use a variable in a register as a control
value.

Up until now we have managed with a single control value per block.
However some architectures (e.g. s390x and riscv64) have combined
compare-and-branch instructions that take two variables in registers
as parameters. To generate these instructions we need to support 2
control values per block.

This CL allows up to 2 control values to be used in a block in
order to support the addition of compare-and-branch instructions.
I have implemented s390x compare-and-branch instructions in a
different CL.
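
As a sketch of what "up to 2 control values" can look like in Go, here
is a simplified, hypothetical Block with a fixed-size array of controls
and a helper that returns only the ones in use. The block kind names
are made up for the example.

```go
package main

import "fmt"

// Value is a toy stand-in for an SSA value.
type Value struct{ Name string }

// Block is a simplified stand-in for an SSA block holding zero, one or
// two control values. Unused slots are nil.
type Block struct {
	Kind     string
	Controls [2]*Value
}

// ControlValues returns a slice containing only the controls in use.
func (b *Block) ControlValues() []*Value {
	if b.Controls[0] == nil {
		return b.Controls[:0]
	}
	if b.Controls[1] == nil {
		return b.Controls[:1]
	}
	return b.Controls[:2]
}

func main() {
	flags := &Block{Kind: "LessThan", Controls: [2]*Value{&Value{Name: "flags"}, nil}}
	cmpBr := &Block{Kind: "CompareAndBranch", Controls: [2]*Value{&Value{Name: "x"}, &Value{Name: "y"}}}
	fmt.Println(len(flags.ControlValues()), len(cmpBr.ControlValues()))
}
```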

Passes toolstash-check -all.

Results of compilebench:

name                      old time/op       new time/op       delta
Template                        208ms ± 1%        209ms ± 1%    ~     (p=0.289 n=20+20)
Unicode                        83.7ms ± 1%       83.3ms ± 3%  -0.49%  (p=0.017 n=18+18)
GoTypes                         748ms ± 1%        748ms ± 0%    ~     (p=0.460 n=20+18)
Compiler                        3.47s ± 1%        3.48s ± 1%    ~     (p=0.070 n=19+18)
SSA                             11.5s ± 1%        11.7s ± 1%  +1.64%  (p=0.000 n=19+18)
Flate                           130ms ± 1%        130ms ± 1%    ~     (p=0.588 n=19+20)
GoParser                        160ms ± 1%        161ms ± 1%    ~     (p=0.211 n=20+20)
Reflect                         465ms ± 1%        467ms ± 1%  +0.42%  (p=0.007 n=20+20)
Tar                             184ms ± 1%        185ms ± 2%    ~     (p=0.087 n=18+20)
XML                             253ms ± 1%        253ms ± 1%    ~     (p=0.377 n=20+18)
LinkCompiler                    769ms ± 2%        774ms ± 2%    ~     (p=0.070 n=19+19)
ExternalLinkCompiler            3.59s ±11%        3.68s ± 6%    ~     (p=0.072 n=20+20)
LinkWithoutDebugCompiler        446ms ± 5%        454ms ± 3%  +1.79%  (p=0.002 n=19+20)
StdCmd                          26.0s ± 2%        26.0s ± 2%    ~     (p=0.799 n=20+20)

name                      old user-time/op  new user-time/op  delta
Template                        238ms ± 5%        240ms ± 5%    ~     (p=0.142 n=20+20)
Unicode                         105ms ±11%        106ms ±10%    ~     (p=0.512 n=20+20)
GoTypes                         876ms ± 2%        873ms ± 4%    ~     (p=0.647 n=20+19)
Compiler                        4.17s ± 2%        4.19s ± 1%    ~     (p=0.093 n=20+18)
SSA                             13.9s ± 1%        14.1s ± 1%  +1.45%  (p=0.000 n=18+18)
Flate                           145ms ±13%        146ms ± 5%    ~     (p=0.851 n=20+18)
GoParser                        185ms ± 5%        188ms ± 7%    ~     (p=0.174 n=20+20)
Reflect                         534ms ± 3%        538ms ± 2%    ~     (p=0.105 n=20+18)
Tar                             215ms ± 4%        211ms ± 9%    ~     (p=0.079 n=19+20)
XML                             295ms ± 6%        295ms ± 5%    ~     (p=0.968 n=20+20)
LinkCompiler                    832ms ± 4%        837ms ± 7%    ~     (p=0.707 n=17+20)
ExternalLinkCompiler            1.58s ± 8%        1.60s ± 4%    ~     (p=0.296 n=20+19)
LinkWithoutDebugCompiler        478ms ±12%        489ms ±10%    ~     (p=0.429 n=20+20)

name                      old object-bytes  new object-bytes  delta
Template                        559kB ± 0%        559kB ± 0%    ~     (all equal)
Unicode                         216kB ± 0%        216kB ± 0%    ~     (all equal)
GoTypes                        2.03MB ± 0%       2.03MB ± 0%    ~     (all equal)
Compiler                       8.07MB ± 0%       8.07MB ± 0%  -0.06%  (p=0.000 n=20+20)
SSA                            27.1MB ± 0%       27.3MB ± 0%  +0.89%  (p=0.000 n=20+20)
Flate                           343kB ± 0%        343kB ± 0%    ~     (all equal)
GoParser                        441kB ± 0%        441kB ± 0%    ~     (all equal)
Reflect                        1.36MB ± 0%       1.36MB ± 0%    ~     (all equal)
Tar                             487kB ± 0%        487kB ± 0%    ~     (all equal)
XML                             632kB ± 0%        632kB ± 0%    ~     (all equal)

name                      old export-bytes  new export-bytes  delta
Template                       18.5kB ± 0%       18.5kB ± 0%    ~     (all equal)
Unicode                        7.92kB ± 0%       7.92kB ± 0%    ~     (all equal)
GoTypes                        35.0kB ± 0%       35.0kB ± 0%    ~     (all equal)
Compiler                        109kB ± 0%        110kB ± 0%  +0.72%  (p=0.000 n=20+20)
SSA                             137kB ± 0%        138kB ± 0%  +0.58%  (p=0.000 n=20+20)
Flate                          4.89kB ± 0%       4.89kB ± 0%    ~     (all equal)
GoParser                       8.49kB ± 0%       8.49kB ± 0%    ~     (all equal)
Reflect                        11.4kB ± 0%       11.4kB ± 0%    ~     (all equal)
Tar                            10.5kB ± 0%       10.5kB ± 0%    ~     (all equal)
XML                            16.7kB ± 0%       16.7kB ± 0%    ~     (all equal)

name                      old text-bytes    new text-bytes    delta
HelloSize                       761kB ± 0%        761kB ± 0%    ~     (all equal)
CmdGoSize                      10.8MB ± 0%       10.8MB ± 0%    ~     (all equal)

name                      old data-bytes    new data-bytes    delta
HelloSize                      10.7kB ± 0%       10.7kB ± 0%    ~     (all equal)
CmdGoSize                       312kB ± 0%        312kB ± 0%    ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       122kB ± 0%        122kB ± 0%    ~     (all equal)
CmdGoSize                       146kB ± 0%        146kB ± 0%    ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.13MB ± 0%       1.13MB ± 0%    ~     (all equal)
CmdGoSize                      15.1MB ± 0%       15.1MB ± 0%    ~     (all equal)

Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-02 09:56:36 +00:00
Daniel Martí 870080752d cmd/compile: reduce rulegen's output by 200 KiB
First, remove unnecessary "// cond:" lines from the generated files.
This shaves off ~7k lines.

Second, join "if cond { break }" statements via "||", which allows us to
deduplicate a large number of them. This shaves off another ~25k lines.
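
To make the second step concrete, here is a hand-written sketch of the
two shapes. The function names and conditions are hypothetical; only
the structure (a for loop with "break" on mismatch) mirrors the
generated code.

```go
package main

import "fmt"

// matchBefore uses one "if cond { break }" statement per condition.
func matchBefore(op string, auxInt int64) bool {
	for {
		if op != "Const" {
			break
		}
		if auxInt != 0 {
			break
		}
		return true
	}
	return false
}

// matchAfter joins the same conditions with "||" into one statement.
func matchAfter(op string, auxInt int64) bool {
	for {
		if op != "Const" || auxInt != 0 {
			break
		}
		return true
	}
	return false
}

func main() {
	for _, op := range []string{"Const", "Add"} {
		for _, aux := range []int64{0, 1} {
			fmt.Println(op, aux, matchBefore(op, aux) == matchAfter(op, aux))
		}
	}
}
```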

This change is not for readability or simplicity, but rather to avoid
unnecessary verbosity that makes the generated files larger. All in all,
git reports that the generated files weigh ~200KiB less overall, or
about 2.7% less.

While at it, add a -trace flag to rulegen.

Updates #33644.

Change-Id: I3fac0290a6066070cc62400bf970a4ae0929470a
Reviewed-on: https://go-review.googlesource.com/c/go/+/196498
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-09-23 16:36:10 +00:00
Ben Shi 3cfd003a8a cmd/compile: optimize ARM's math.bits.RotateLeft32
This CL optimizes math.bits.RotateLeft32 to inline
"MOVW Rx@>Ry, Rd" on ARM.

The benchmark results of math/bits show some improvements.
name               old time/op  new time/op  delta
RotateLeft-4       9.42ns ± 0%  6.91ns ± 0%  -26.66%  (p=0.000 n=40+33)
RotateLeft8-4      8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+31)
RotateLeft16-4     8.79ns ± 0%  8.79ns ± 0%   -0.04%  (p=0.000 n=40+32)
RotateLeft32-4     8.16ns ± 0%  7.54ns ± 0%   -7.68%  (p=0.000 n=40+40)
RotateLeft64-4     15.7ns ± 0%  15.7ns ± 0%     ~     (all equal)

updates #31265

Change-Id: I77bc1c2c702d5323fc7cad5264a8e2d5666bf712
Reviewed-on: https://go-review.googlesource.com/c/go/+/188697
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-28 15:41:58 +00:00
Ben Shi c683ab8128 cmd/compile: optimize ARM's math.Abs
This CL optimizes math.Abs to an inline ABSD instruction on ARM.

The benchmark results of src/math/ show big improvements.
name                   old time/op  new time/op  delta
Acos-4                  181ns ± 0%   182ns ± 0%   +0.30%  (p=0.000 n=40+40)
Acosh-4                 202ns ± 0%   202ns ± 0%     ~     (all equal)
Asin-4                  163ns ± 0%   163ns ± 0%     ~     (all equal)
Asinh-4                 242ns ± 0%   242ns ± 0%     ~     (all equal)
Atan-4                  120ns ± 0%   121ns ± 0%   +0.83%  (p=0.000 n=40+40)
Atanh-4                 202ns ± 0%   202ns ± 0%     ~     (all equal)
Atan2-4                 173ns ± 0%   173ns ± 0%     ~     (all equal)
Cbrt-4                 1.06µs ± 0%  1.06µs ± 0%   +0.09%  (p=0.000 n=39+37)
Ceil-4                 72.9ns ± 0%  72.8ns ± 0%     ~     (p=0.237 n=40+40)
Copysign-4             13.2ns ± 0%  13.2ns ± 0%     ~     (all equal)
Cos-4                   193ns ± 0%   183ns ± 0%   -5.18%  (p=0.000 n=40+40)
Cosh-4                  254ns ± 0%   239ns ± 0%   -5.91%  (p=0.000 n=40+40)
Erf-4                   112ns ± 0%   112ns ± 0%     ~     (all equal)
Erfc-4                  117ns ± 0%   117ns ± 0%     ~     (all equal)
Erfinv-4                127ns ± 0%   127ns ± 1%     ~     (p=0.492 n=40+40)
Erfcinv-4               128ns ± 0%   128ns ± 0%     ~     (all equal)
Exp-4                   212ns ± 0%   206ns ± 0%   -3.05%  (p=0.000 n=40+40)
ExpGo-4                 216ns ± 0%   209ns ± 0%   -3.24%  (p=0.000 n=40+40)
Expm1-4                 142ns ± 0%   142ns ± 0%     ~     (all equal)
Exp2-4                  191ns ± 0%   184ns ± 0%   -3.45%  (p=0.000 n=40+40)
Exp2Go-4                194ns ± 0%   187ns ± 0%   -3.61%  (p=0.000 n=40+40)
Abs-4                  14.4ns ± 0%   6.3ns ± 0%  -56.39%  (p=0.000 n=38+39)
Dim-4                  12.6ns ± 0%  12.6ns ± 0%     ~     (all equal)
Floor-4                49.6ns ± 0%  49.6ns ± 0%     ~     (all equal)
Max-4                  27.6ns ± 0%  27.6ns ± 0%     ~     (all equal)
Min-4                  27.0ns ± 0%  27.0ns ± 0%     ~     (all equal)
Mod-4                   349ns ± 0%   305ns ± 1%  -12.55%  (p=0.000 n=33+40)
Frexp-4                54.0ns ± 0%  47.1ns ± 0%  -12.78%  (p=0.000 n=38+38)
Gamma-4                 242ns ± 0%   234ns ± 0%   -3.16%  (p=0.000 n=36+40)
Hypot-4                84.8ns ± 0%  67.8ns ± 0%  -20.05%  (p=0.000 n=31+35)
HypotGo-4              88.5ns ± 0%  71.6ns ± 0%  -19.12%  (p=0.000 n=40+38)
Ilogb-4                45.8ns ± 0%  38.9ns ± 0%  -15.12%  (p=0.000 n=40+32)
J0-4                    821ns ± 0%   802ns ± 0%   -2.33%  (p=0.000 n=33+40)
J1-4                    816ns ± 0%   807ns ± 0%   -1.05%  (p=0.000 n=40+29)
Jn-4                   1.67µs ± 0%  1.65µs ± 0%   -1.45%  (p=0.000 n=40+39)
Ldexp-4                61.5ns ± 0%  54.6ns ± 0%  -11.27%  (p=0.000 n=40+32)
Lgamma-4                188ns ± 0%   188ns ± 0%     ~     (all equal)
Log-4                   154ns ± 0%   147ns ± 0%   -4.78%  (p=0.000 n=40+40)
Logb-4                 50.9ns ± 0%  42.7ns ± 0%  -16.11%  (p=0.000 n=34+39)
Log1p-4                 160ns ± 0%   159ns ± 0%     ~     (p=0.828 n=40+40)
Log10-4                 173ns ± 0%   166ns ± 0%   -4.05%  (p=0.000 n=40+40)
Log2-4                 65.3ns ± 0%  58.4ns ± 0%  -10.57%  (p=0.000 n=37+37)
Modf-4                 36.4ns ± 0%  36.4ns ± 0%     ~     (all equal)
Nextafter32-4          36.4ns ± 0%  36.4ns ± 0%     ~     (all equal)
Nextafter64-4          32.7ns ± 0%  32.6ns ± 0%     ~     (p=0.375 n=40+40)
PowInt-4                300ns ± 0%   277ns ± 0%   -7.78%  (p=0.000 n=40+40)
PowFrac-4               676ns ± 0%   635ns ± 0%   -6.00%  (p=0.000 n=40+35)
Pow10Pos-4             17.6ns ± 0%  17.6ns ± 0%     ~     (all equal)
Pow10Neg-4             22.0ns ± 0%  22.0ns ± 0%     ~     (all equal)
Round-4                30.1ns ± 0%  30.1ns ± 0%     ~     (all equal)
RoundToEven-4          38.9ns ± 0%  38.9ns ± 0%     ~     (all equal)
Remainder-4             291ns ± 0%   263ns ± 0%   -9.62%  (p=0.000 n=40+40)
Signbit-4              11.3ns ± 0%  11.3ns ± 0%     ~     (all equal)
Sin-4                   185ns ± 0%   185ns ± 0%     ~     (all equal)
Sincos-4                230ns ± 0%   230ns ± 0%     ~     (all equal)
Sinh-4                  253ns ± 0%   246ns ± 0%   -2.77%  (p=0.000 n=39+39)
SqrtIndirect-4         41.4ns ± 0%  41.4ns ± 0%     ~     (all equal)
SqrtLatency-4          13.8ns ± 0%  13.8ns ± 0%     ~     (all equal)
SqrtIndirectLatency-4  37.0ns ± 0%  37.0ns ± 0%     ~     (p=0.632 n=40+40)
SqrtGoLatency-4         911ns ± 0%   911ns ± 0%   +0.08%  (p=0.000 n=40+40)
SqrtPrime-4            13.2µs ± 0%  13.2µs ± 0%   +0.01%  (p=0.038 n=38+40)
Tan-4                   205ns ± 0%   205ns ± 0%     ~     (all equal)
Tanh-4                  264ns ± 0%   247ns ± 0%   -6.44%  (p=0.000 n=39+32)
Trunc-4                45.2ns ± 0%  45.2ns ± 0%     ~     (all equal)
Y0-4                    796ns ± 0%   792ns ± 0%   -0.55%  (p=0.000 n=35+40)
Y1-4                    804ns ± 0%   797ns ± 0%   -0.82%  (p=0.000 n=24+40)
Yn-4                   1.64µs ± 0%  1.62µs ± 0%   -1.27%  (p=0.000 n=40+39)
Float64bits-4          8.16ns ± 0%  8.16ns ± 0%   +0.04%  (p=0.000 n=35+40)
Float64frombits-4      10.7ns ± 0%  10.7ns ± 0%     ~     (all equal)
Float32bits-4          7.53ns ± 0%  7.53ns ± 0%     ~     (p=0.760 n=40+40)
Float32frombits-4      6.91ns ± 0%  6.91ns ± 0%   -0.04%  (p=0.002 n=32+38)
[Geo mean]              111ns        106ns        -3.98%

Change-Id: I54f4fd7f5160db020b430b556bde59cc0fdb996d
Reviewed-on: https://go-review.googlesource.com/c/go/+/188678
Run-TryBot: Ben Shi <powerman1st@163.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-08-28 15:41:28 +00:00
Daniel Martí 79dee788ec cmd/compile: teach rulegen to remove unused decls
First, add cpu and memory profiling flags, as these are useful to see
where rulegen is spending its time. It now takes many seconds to run on
a recent laptop, so we have to keep an eye on what it's doing.

Second, stop writing '_ = var' lines to keep imports and variables used
at all times. Now that rulegen removes all such unused names, they're
unnecessary.

To perform the removal, lean on go/types to first detect what names are
unused. We can configure it to give us all the type-checking errors in a
file, so we can collect all "declared but not used" errors in a single
pass.

We then use astutil.Apply to remove the relevant nodes based on the line
information from each unused error. This allows us to apply the changes
without having to do extra parser+printer roundtrips to plaintext, which
are far too expensive.

We need to do multiple such passes, as removing an unused variable
declaration might then make another declaration unused. Two passes are
enough to clean every file at the moment, so add a limit of three passes
for now to avoid eating cpu uncontrollably by accident.

The resulting performance of the changes above is a ~30% loss across the
board, since go/types is fairly expensive. The numbers were obtained
with 'benchcmd Rulegen go run *.go', which involves compiling rulegen
itself, but that seems reflective of how the program is used.

	name     old time/op         new time/op         delta
	Rulegen          5.61s ± 0%          7.36s ± 0%  +31.17%  (p=0.016 n=5+4)

	name     old user-time/op    new user-time/op    delta
	Rulegen          7.20s ± 1%          9.92s ± 1%  +37.76%  (p=0.016 n=5+4)

	name     old sys-time/op     new sys-time/op     delta
	Rulegen          135ms ±19%          169ms ±17%  +25.66%  (p=0.032 n=5+5)

	name     old peak-RSS-bytes  new peak-RSS-bytes  delta
	Rulegen         71.0MB ± 2%         85.6MB ± 2%  +20.56%  (p=0.008 n=5+5)

We can live with a bit more resource usage, but the time/op getting
close to 10s isn't good. To win that back, introduce concurrency in
main.go. This further increases resource usage a bit, but the real time
on this quad-core laptop is greatly reduced. The final benchstat is as
follows:

	name     old time/op         new time/op         delta
	Rulegen          5.61s ± 0%          3.97s ± 1%   -29.26%  (p=0.008 n=5+5)

	name     old user-time/op    new user-time/op    delta
	Rulegen          7.20s ± 1%         13.91s ± 1%   +93.09%  (p=0.008 n=5+5)

	name     old sys-time/op     new sys-time/op     delta
	Rulegen          135ms ±19%          269ms ± 9%   +99.17%  (p=0.008 n=5+5)

	name     old peak-RSS-bytes  new peak-RSS-bytes  delta
	Rulegen         71.0MB ± 2%        226.3MB ± 1%  +218.72%  (p=0.008 n=5+5)

It might be possible to reduce the cpu or memory usage in the future,
such as configuring go/types to do less work, or taking shortcuts to
avoid having to run it many times. For now, ~2x cpu and ~3x memory usage
seems like a fair trade for a faster and better rulegen.

Finally, we can remove the old code that tried to remove some unused
variables in a hacky and unmaintainable way.

Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6
Reviewed-on: https://go-review.googlesource.com/c/go/+/189798
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 17:04:18 +00:00