Commit Graph

76 Commits

Author SHA1 Message Date
Daniel Martí 5437b5a24b cmd/compile: disallow rewrite rules from declaring reserved names
If I change a rule in ARM64.rules to use the variable name "b" in a
conflicting way, rulegen would previously not complain, and the compiler
would later give a confusing error:

	$ go run *.go && go build cmd/compile/internal/ssa
	# cmd/compile/internal/ssa
	../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0)

Make rulegen complain early about those cases. Sometimes they might
happen to be harmless, but in general they can easily cause confusion or
unintended effect due to shadowing.

After the change, with the same conflicting rule:

	$ go run *.go && go build cmd/compile/internal/ssa
	2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b
	exit status 1

Note that 24 existing rules were using reserved names. It seems like the
shadowing was harmless, as it wasn't causing typechecking issues nor did
it seem to cause unintended behavior when the rule rewrite code ran.

The bool values "b" were renamed "t", since that seems to have a
precedent in other rules and in the fmt package.

Sequential values like "a b c" were renamed to "x y z", since "b" is
reserved.

Finally, "typ" was renamed to "_typ", since there doesn't seem to be an
obviously better answer.

Passes all three of:

	$ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std
	$ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std
	$ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std

Fixes #45154.

Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/303549
Trust: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-03-22 16:02:08 +00:00
fanzha02 2b50ab2aee cmd/compile: optimize single-precision floating point square root
Add generic rule to rewrite the single-precision square root expression
with one single-precision instruction. The optimization will reduce two
times of precision converting between double-precision and single-precision.

On arm64 flatform.

previous:
  FCVTSD F0, F0
  FSQRTD F0, F0
  FCVTDS F0, F0

optimized:
  FSQRTS S0, S0

And this patch adds the test case to check the correctness.

This patch refers to CL 241877, contributed by Alice Xu
(dianhong.xu@arm.com)

Change-Id: I6de5d02281c693017ac4bd4c10963dd55989bd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/276873
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-03-02 06:38:07 +00:00
Keith Randall d2911d7612 cmd/compile: fold MOV*nop and MOV*const
MOV*nop and MOV*reg seem superfluous. They are there to keep type
information around that would otherwise get thrown away. Not sure
what we need it for. I think our compiler needs a normalization of
how types are represented in SSA, especially after lowering.

MOV*nop gets in the way of some optimization rules firing, like for
load combining.

For now, just fold MOV*nop and MOV*const. It's certainly safe to
do that, as the type info on the MOV*const isn't ever useful.

R=go1.17

Change-Id: I3630a80afc2455a8e9cd9fde10c7abe05ddc3767
Reviewed-on: https://go-review.googlesource.com/c/go/+/276792
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2021-02-23 20:05:01 +00:00
Keith Randall 6c64b6db68 cmd/compile: don't constant fold divide by zero
It just makes the compiler crash. Oops.

Fixes #43099

Change-Id: Id996c14799c1a5d0063ecae3b8770568161c2440
Reviewed-on: https://go-review.googlesource.com/c/go/+/276652
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-12-10 03:18:00 +00:00
Alberto Donizetti 399b5d14d4 cmd/compile: stop MOVW-ing -1 as SRA shift amount in mips
The shift amount in SRAconst needs to be in the [0,31] range, so stop
MOVWing -1 to SRA in the Rsh lowering rules.

Also see CL 270117.

Passes

  $ GOARCH=mips go build -toolexec 'toolstash -cmp' -a std
  $ GOARCH=mipsle go build -toolexec 'toolstash -cmp' -a std

Updates #42587

Change-Id: Ib5eb99b82310e404cc2d6f0c619b21b8a15406ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/270558
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-11-18 12:04:06 +00:00
Alberto Donizetti f2eea4c1dc cmd/compile: mask SLL,SRL,SRAconst shift amount
mips SRA/SLL/SRL shift amounts are used mod 32; this change aligns the
XXXconst rules to mask the shift amount by &31.

Passes

  $ GOARCH=mips go build -toolexec 'toolstash -cmp' -a std
  $ GOARCH=mipsle go build -toolexec 'toolstash -cmp' -a std

Fixes #42587

Change-Id: I6003ebd0bc500fba4cf6fb10254e1b557bf8c48f
Reviewed-on: https://go-review.googlesource.com/c/go/+/270117
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-11-16 08:22:10 +00:00
Alberto Donizetti 1090f0986d cmd/compile: rename mergeSymTyped to mergeSym
Also make canMergeSym take Syms instead of interface{}

Change-Id: I4926a1fc586aa90e198249d67e5b520404b40869
Reviewed-on: https://go-review.googlesource.com/c/go/+/265817
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-10-28 19:19:18 +00:00
Alberto Donizetti de2d1c3fe2 cmd/compile: replace int32(b2i(x)) with b2i32(x) in rules
Change-Id: I7fbb0c1ead6e29a7445c8ab43f7050947597f3e8
Reviewed-on: https://go-review.googlesource.com/c/go/+/265497
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-10-27 20:04:19 +00:00
Alberto Donizetti 5c1122b528 cmd/compile: delete isPowerOfTwo, switch to isPowerOfTwo64
rewrite.go has two identical functions isPowerOfTwo and
isPowerOfTwo64; the former has been there for a while, while the
latter was added together with isPowerOfTwo{8,16,32} for use in typed
rules.

This change deletes isPowerOfTwo and switch to using isPowerOfTwo64
everywhere.

Change-Id: If26c94565d2393fac6f0ba117ee7ee2fc915f7cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/265417
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-10-27 20:03:41 +00:00
Michael Pratt e5ad73508e cmd/compile: intrinsify runtime/internal/atomic.{And,Or} on MIPS
This one is trivial, as there are already 32-bit AND and OR ops used to
implement the more complex 8-bit versions.

Change-Id: Ic48a53ea291d0067ebeab8e96c82e054daf20ae7
Reviewed-on: https://go-review.googlesource.com/c/go/+/263149
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-23 16:25:07 +00:00
Josh Bleecher Snyder 67a8660b5a cmd/compile: CSE the RHS of rewrite rules
Keep track of all expressions encountered while
generating a rewrite result, and re-use them whenever possible.
Named expressions may still be used for clarity when desired.

Change-Id: I640dca108763eb8baeff8f9a4169300af3445b82
Reviewed-on: https://go-review.googlesource.com/c/go/+/229800
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2020-04-24 16:44:20 +00:00
Alberto Donizetti e47a17aeee cmd/compile: convert remaining mips rules to typed aux
Passes

  GOARCH=mips gotip build -toolexec 'toolstash -cmp' -a std
  GOARCH=mipsle gotip build -toolexec 'toolstash -cmp' -a std

Change-Id: I35df0522e299aa755491cd25f47f1f1bf447848c
Reviewed-on: https://go-review.googlesource.com/c/go/+/229637
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-24 07:22:41 +00:00
Alberto Donizetti 81df5e69fc cmd/compile: switch to typed aux for mips lowering rules
This covers most of the lowering rules.

Passes

  GOARCH=mips gotip build -toolexec 'toolstash -cmp' -a std
  GOARCH=mipsle gotip build -toolexec 'toolstash -cmp' -a std

Change-Id: I9d00aaebecb36622e3bdaf556e5a9377670bf86b
Reviewed-on: https://go-review.googlesource.com/c/go/+/229102
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-22 17:30:07 +00:00
Michael Munday bfd569fcb0 cmd/compile: delete the floating point Greater and Geq ops
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be
implemented using Less and Leq.

Fixes #37316.

Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-07 19:55:05 +00:00
David Chase 47ade08141 cmd/compile: add logging for large (>= 128 byte) copies
For 1.15, unless someone really wants it in 1.14.

A performance-sensitive user thought this would be useful,
though "large" was not well-defined.  If 128 is large,
there are 139 static instances of "large" copies in the compiler
itself.

Includes test.

Change-Id: I81f20c62da59d37072429f3a22c1809e6fb2946d
Reviewed-on: https://go-review.googlesource.com/c/go/+/205066
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-04-03 17:24:48 +00:00
Josh Bleecher Snyder a2bff7c296 cmd/compile: make pre-elimination of rulegen bounds checks more precise
In cases in which we had a named value whose args were all _,
like this rule from ARM.rules:

(MOVBUreg x:(MOVBUload _ _)) -> (MOVWreg x)

We previously inserted

_ = x.Args[1]

even though it is unnecessary.
This change eliminates this pointless bounds check.
And in other cases, we now check bounds just as far as strictly necessary.

No significant movement on any compiler metrics.
Just nicer (and less) code.

Passes toolstash-check -all.

Change-Id: I075dfe9f926cc561cdc705e9ddaab563164bed3a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221781
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-02 17:40:11 +00:00
Josh Bleecher Snyder 5e4da0adac cmd/compile: add streamlined Block Reset+AddControl routines
For use in rewrite rules. Shrinks cmd/compile:

compile 20082104  19967416  -114688 -0.571%

Passes toolstash-check -all.

Change-Id: Ic856508b27ec5b7fb9b6ca63e955a7139ae7dc30
Reviewed-on: https://go-review.googlesource.com/c/go/+/221780
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-02 17:40:00 +00:00
Josh Bleecher Snyder d7c073ecbf cmd/compile: add specialized Value reset for OpCopy
This:

* Simplifies and shortens the generated code for rewrite rules.
* Shrinks cmd/compile by 86k (0.4%) and makes it easier to compile.
* Removes the stmt boundary code wrangling from Value.reset,
  in favor of doing it in the one place where it actually does some work,
  namely the writebarrier pass. (This was ascertained by inspecting the
  code for cases in which notStmtBoundary values were generated.)

Passes toolstash-check -all.

Change-Id: I25671d4c4bbd772f235195d11da090878ea2cc07
Reviewed-on: https://go-review.googlesource.com/c/go/+/221421
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2020-03-02 16:24:47 +00:00
Josh Bleecher Snyder 7913f7dfcf cmd/compile: add specialized AddArgN functions for rewrite rules
This shrinks the compiler without impacting performance.
(The performance-sensitive part of rewrite rules is the non-match case.)
Passes toolstash-check -all.

Executable size:

file    before    after     Δ       %       
compile 20356168  20163960  -192208 -0.944% 
total   115599376 115407168 -192208 -0.166% 

Text size:

file                       before   after    Δ       %       
cmd/compile/internal/ssa.s 3928309  3778774  -149535 -3.807% 
total                      18862943 18713408 -149535 -0.793% 

Memory allocated compiling package SSA:

SSA               12.7M ± 0%        12.5M ± 0%  -1.74%  (p=0.008 n=5+5)

Compiler speed impact:

name        old time/op       new time/op       delta
Template          211ms ± 1%        211ms ± 2%    ~     (p=0.832 n=49+49)
Unicode          82.8ms ± 2%       83.2ms ± 2%  +0.44%  (p=0.022 n=46+49)
GoTypes           726ms ± 1%        728ms ± 2%    ~     (p=0.076 n=46+48)
Compiler          3.39s ± 2%        3.40s ± 2%    ~     (p=0.633 n=48+49)
SSA               7.71s ± 1%        7.65s ± 1%  -0.78%  (p=0.000 n=45+44)
Flate             134ms ± 1%        134ms ± 1%    ~     (p=0.195 n=50+49)
GoParser          167ms ± 1%        167ms ± 1%    ~     (p=0.390 n=47+47)
Reflect           453ms ± 3%        452ms ± 2%    ~     (p=0.492 n=48+49)
Tar               184ms ± 3%        184ms ± 2%    ~     (p=0.862 n=50+48)
XML               248ms ± 2%        248ms ± 2%    ~     (p=0.096 n=49+47)
[Geo mean]        415ms             415ms       -0.03%

name        old user-time/op  new user-time/op  delta
Template          273ms ± 1%        273ms ± 2%    ~     (p=0.711 n=48+48)
Unicode           117ms ± 6%        117ms ± 5%    ~     (p=0.633 n=50+50)
GoTypes           972ms ± 2%        974ms ± 1%  +0.29%  (p=0.016 n=47+49)
Compiler          4.46s ± 6%        4.51s ± 6%    ~     (p=0.093 n=50+50)
SSA               10.4s ± 1%        10.3s ± 2%  -0.94%  (p=0.000 n=45+50)
Flate             166ms ± 2%        167ms ± 2%    ~     (p=0.148 n=49+48)
GoParser          202ms ± 1%        202ms ± 2%  -0.28%  (p=0.014 n=47+49)
Reflect           594ms ± 2%        594ms ± 2%    ~     (p=0.717 n=48+49)
Tar               224ms ± 2%        224ms ± 2%    ~     (p=0.805 n=50+49)
XML               311ms ± 1%        310ms ± 1%    ~     (p=0.177 n=49+48)
[Geo mean]        537ms             537ms       +0.01%


Change-Id: I562b9f349b34ddcff01771769e6dbbc80604da7a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221237
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-01 15:27:58 +00:00
Josh Bleecher Snyder d889f0cb10 cmd/compile: use correct types in phiopt
We try to preserve type correctness of generic ops.
phiopt modified a bool to be an int without a conversion.
Add a conversion. There are a few random fluctations in the
generated code as a result, but nothing noteworthy or systematic.

no binary size changes

file                        before   after    Δ       %       
math.s                      35966    35961    -5      -0.014% 
debug/dwarf.s               108141   108147   +6      +0.006% 
crypto/dsa.s                6047     6044     -3      -0.050% 
image/png.s                 42882    42885    +3      +0.007% 
go/parser.s                 80281    80278    -3      -0.004% 
cmd/internal/obj.s          115116   115113   -3      -0.003% 
go/types.s                  322130   322118   -12     -0.004% 
cmd/internal/obj/arm64.s    151679   151685   +6      +0.004% 
go/internal/gccgoimporter.s 56487    56493    +6      +0.011% 
cmd/test2json.s             1650     1647     -3      -0.182% 
cmd/link/internal/loadelf.s 35442    35443    +1      +0.003% 
cmd/go/internal/work.s      305039   305035   -4      -0.001% 
cmd/link/internal/ld.s      544835   544834   -1      -0.000% 
net/http.s                  558777   558774   -3      -0.001% 
cmd/compile/internal/ssa.s  3926551  3926994  +443    +0.011% 
cmd/compile/internal/gc.s   1552320  1552321  +1      +0.000% 
total                       18862241 18862670 +429    +0.002% 


Change-Id: I4289e773be6be534ea3f907d68f614441b8f9b46
Reviewed-on: https://go-review.googlesource.com/c/go/+/221607
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-29 17:02:29 +00:00
Michael Munday cb74dcc172 cmd/compile: remove Greater* and Geq* generic integer ops
The generic Greater and Geq ops can always be replaced with the Less and
Leq ops. This CL therefore removes them. This simplifies the compiler since
it reduces the number of operations that need handling in both code and in
rewrite rules. This will be especially true when adding control flow
optimizations such as the integer-in-range optimizations in CL 165998.

Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040
Reviewed-on: https://go-review.googlesource.com/c/go/+/220417
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-26 13:11:53 +00:00
Josh Bleecher Snyder 1bda9b001f cmd/compile: use ellipses in MIPS rules
Passes toolstash-check -all.

Change-Id: I14db0acb9b531029c613fa31bc076928651b6448
Reviewed-on: https://go-review.googlesource.com/c/go/+/217007
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-02-24 23:32:38 +00:00
Josh Bleecher Snyder 6dd11bcb35 cmd/compile: remove chunking of rewrite rules
We added chunking of rewrite rules to speed up compiling package SSA.
This series of changes has significantly shrunk the number of
rewrite rules, and they are no longer being added nearly as fast.
Now that we are sharing v.Args across multiple rewrite rules,
there is additional benefit to having more rules in a single function.
Removing chunking now has an incidental impact on compiling package SSA,
marginally speeds up other compilation, shrinks the cmd/compile binary,
and simplifies the code.

name        old time/op       new time/op       delta
Template          211ms ± 2%        210ms ± 2%  -0.50%  (p=0.000 n=91+97)
Unicode          81.9ms ± 3%       81.8ms ± 3%    ~     (p=0.179 n=96+91)
GoTypes           731ms ± 2%        731ms ± 1%    ~     (p=0.442 n=94+96)
Compiler          3.43s ± 2%        3.41s ± 2%  -0.36%  (p=0.001 n=98+94)
SSA               8.30s ± 2%        8.32s ± 2%  +0.19%  (p=0.034 n=94+95)
Flate             135ms ± 2%        134ms ± 1%  -0.30%  (p=0.006 n=98+94)
GoParser          167ms ± 1%        167ms ± 1%  -0.22%  (p=0.001 n=92+94)
Reflect           453ms ± 2%        453ms ± 3%    ~     (p=0.306 n=98+97)
Tar               184ms ± 2%        183ms ± 2%  -0.31%  (p=0.012 n=94+94)
XML               249ms ± 2%        248ms ± 1%  -0.26%  (p=0.002 n=96+92)
[Geo mean]        419ms             418ms       -0.21%

name        old user-time/op  new user-time/op  delta
Template          273ms ± 2%        272ms ± 2%  -0.46%  (p=0.000 n=93+96)
Unicode           116ms ± 4%        117ms ± 4%    ~     (p=0.433 n=98+98)
GoTypes           977ms ± 2%        977ms ± 1%    ~     (p=0.971 n=92+99)
Compiler          4.56s ± 6%        4.53s ± 6%    ~     (p=0.081 n=100+100)
SSA               11.1s ± 2%        11.1s ± 2%    ~     (p=0.064 n=99+96)
Flate             167ms ± 2%        167ms ± 1%  -0.24%  (p=0.004 n=95+96)
GoParser          203ms ± 1%        203ms ± 2%  -0.14%  (p=0.049 n=96+97)
Reflect           595ms ± 2%        595ms ± 2%    ~     (p=0.544 n=95+92)
Tar               225ms ± 2%        224ms ± 2%    ~     (p=0.562 n=99+99)
XML               312ms ± 2%        311ms ± 1%    ~     (p=0.050 n=97+93)
[Geo mean]        543ms             542ms       -0.13%

Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa
Reviewed-on: https://go-review.googlesource.com/c/go/+/216220
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-21 02:29:11 +00:00
Josh Bleecher Snyder a3f234c706 cmd/compile: reduce bounds checks in generated rewrite rules
CL 213703 converted generated rewrite rules for commutative ops
to use loops instead of duplicated code.

However, it loaded args using expressions like
v.Args[i] and v.Args[i^1], which the compiler could
not eliminate bounds for (including with all outstanding
prove CLs).

Also, given a series of separate rewrite rules for the same op,
we generated bounds checks for every rewrite rule, even though
we were repeatedly loading the same set of args.

This change reduces both sets of bounds checks.

Instead of loading v.Args[i] and v.Args[i^1] for commutative loops,
we now preload v.Args[0] and v.Args[1] into local variables,
and then swap them (as needed) in the commutative loop post statement.

And we now load all top level v.Args into local variables
at the beginning of every rewrite rule function.

The second optimization is the more significant,
but the first helps a little, and they play together
nicely from the perspective of generating the code.

This does increase register pressure, but the reduced bounds
checks more than compensate.

Note that the vast majority of rewrite rules evaluated
are not applied, so the prologue is the most important
part of the rewrite rules.

There is one subtle aspect to the new generated code.
Because the top level v.Args are shared across rewrite rules,
and rule evaluation can swap v_0 and v_1, v_0 and v_1
can end up being swapped from one rule to the next.
That is OK, because any time a rule does not get applied,
they will have been swapped exactly twice.

Passes toolstash-check -all.

name        old time/op       new time/op       delta
Template          213ms ± 2%        211ms ± 2%  -0.85%  (p=0.000 n=92+96)
Unicode          83.5ms ± 2%       83.2ms ± 2%  -0.41%  (p=0.004 n=95+90)
GoTypes           737ms ± 2%        733ms ± 2%  -0.51%  (p=0.000 n=91+94)
Compiler          3.45s ± 2%        3.43s ± 2%  -0.44%  (p=0.000 n=99+100)
SSA               8.54s ± 1%        8.32s ± 2%  -2.56%  (p=0.000 n=96+99)
Flate             136ms ± 2%        135ms ± 1%  -0.47%  (p=0.000 n=96+96)
GoParser          169ms ± 1%        168ms ± 1%  -0.33%  (p=0.000 n=96+93)
Reflect           456ms ± 3%        455ms ± 3%    ~     (p=0.261 n=95+94)
Tar               186ms ± 2%        185ms ± 2%  -0.48%  (p=0.000 n=94+95)
XML               251ms ± 1%        250ms ± 1%  -0.51%  (p=0.000 n=91+94)
[Geo mean]        424ms             421ms       -0.68%

name        old user-time/op  new user-time/op  delta
Template          275ms ± 1%        274ms ± 2%  -0.55%  (p=0.000 n=95+98)
Unicode           118ms ± 4%        118ms ± 4%    ~     (p=0.642 n=98+90)
GoTypes           983ms ± 1%        980ms ± 1%  -0.30%  (p=0.000 n=93+93)
Compiler          4.56s ± 6%        4.52s ± 6%  -0.72%  (p=0.003 n=100+100)
SSA               11.4s ± 1%        11.1s ± 1%  -2.50%  (p=0.000 n=96+97)
Flate             168ms ± 1%        167ms ± 1%  -0.49%  (p=0.000 n=92+92)
GoParser          204ms ± 1%        204ms ± 2%  -0.27%  (p=0.003 n=99+96)
Reflect           599ms ± 2%        598ms ± 2%    ~     (p=0.116 n=95+92)
Tar               227ms ± 2%        225ms ± 2%  -0.57%  (p=0.000 n=95+98)
XML               313ms ± 2%        312ms ± 1%  -0.37%  (p=0.000 n=89+95)
[Geo mean]        547ms             544ms       -0.61%

file    before    after     Δ       %
compile 21113112  21109016  -4096   -0.019%
total   131704940 131700844 -4096   -0.003%

Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956
Reviewed-on: https://go-review.googlesource.com/c/go/+/216218
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-21 00:55:58 +00:00
Josh Bleecher Snyder bd6d78ef37 cmd/compile: use loops to handle commutative ops in rules
Prior to this change, we generated additional rules at rulegen time
for all possible combinations of args to commutative ops.
This is simple and works well, but leads to lots of generated rules.
This in turn has increased the size of the compiler,
made it hard to compile package ssa on small machines,
and provided a disincentive to mark some ops as commutative.

This change reworks how we handle commutative ops.
Instead of generating a rule per argument permutation,
we generate a series of nested loops, one for each commutative op.
Each loop tries both possible argument orderings.

I also considered attempting to canonicalize the inputs to the
rewrite rules. However, because either or both arguments might be
nothing more than an identifier, and because there can be arbitrary
conditions to evaluate during matching, I did not see how to proceed.

The duplicate rule detection now sorts arguments to commutative ops,
so that it can detect commutative-only duplicates.

There may be further optimizations to the new generated code.
In particular, we may not be removing as many bounds checks as before;
I have not investigated deeply. If more work here is needed,
we could do it with more hints or with improvements to the prove pass.

This change has almost no impact on the generated code.
It does not pass toolstash-check, however. In a handful of functions,
for reasons I do not understand, there are minor position changes.

For the entire series ending at this change,
there is negligible compiler performance impact.

The compiler binary shrinks by about 15%,
and package ssa shrinks by about 25%.
Package ssa also compiles ~25% faster with ~25% less memory.

Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4
Reviewed-on: https://go-review.googlesource.com/c/go/+/213703
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-20 17:34:07 +00:00
Austin Clements ec10e6f364 cmd/compile: fix missing lowering of atomic {Load,Store}8
CL 203284 added a compiler intrinsics from atomic Load8 and Store8 on
several architectures, but missed the lowering on MIPS. This CL fixes
that.

Updates #10958, #24543.

Change-Id: I82e88971554fe8c33ad2bf195a633c44b9ac4cf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/203977
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-29 13:48:29 +00:00
Michael Munday c7d81bc086 cmd/compile: reduce amount of code generated for block rewrite rules
Add a Reset method to blocks that allows us to reduce the amount of
code we generate for block rewrite rules.

Thanks to Cherry for suggesting a similar fix to this in CL 196557.

Compilebench result:

name                      old time/op       new time/op       delta
Template                        211ms ± 1%        211ms ± 1%   -0.30%  (p=0.028 n=19+20)
Unicode                        83.7ms ± 3%       83.0ms ± 2%   -0.79%  (p=0.029 n=18+19)
GoTypes                         757ms ± 1%        755ms ± 1%   -0.31%  (p=0.034 n=19+19)
Compiler                        3.51s ± 1%        3.50s ± 1%   -0.20%  (p=0.013 n=18+18)
SSA                             11.7s ± 1%        11.7s ± 1%   -0.38%  (p=0.000 n=19+19)
Flate                           131ms ± 1%        130ms ± 1%   -0.32%  (p=0.024 n=18+18)
GoParser                        162ms ± 1%        162ms ± 1%     ~     (p=0.059 n=20+18)
Reflect                         471ms ± 0%        470ms ± 0%   -0.24%  (p=0.045 n=20+17)
Tar                             187ms ± 1%        186ms ± 1%     ~     (p=0.157 n=20+20)
XML                             255ms ± 1%        255ms ± 1%     ~     (p=0.461 n=19+20)
LinkCompiler                    754ms ± 2%        755ms ± 2%     ~     (p=0.919 n=17+17)
ExternalLinkCompiler            2.82s ±16%        2.37s ±10%  -15.94%  (p=0.000 n=20+20)
LinkWithoutDebugCompiler        439ms ± 4%        442ms ± 6%     ~     (p=0.461 n=18+19)
StdCmd                          25.8s ± 2%        25.5s ± 1%   -0.95%  (p=0.000 n=20+20)

name                      old user-time/op  new user-time/op  delta
Template                        240ms ± 8%        238ms ± 7%     ~     (p=0.301 n=20+20)
Unicode                         107ms ±18%        104ms ±13%     ~     (p=0.149 n=20+20)
GoTypes                         883ms ± 3%        888ms ± 2%     ~     (p=0.211 n=20+20)
Compiler                        4.22s ± 1%        4.20s ± 1%     ~     (p=0.077 n=20+18)
SSA                             14.1s ± 1%        14.1s ± 2%     ~     (p=0.192 n=20+20)
Flate                           145ms ±10%        148ms ± 5%     ~     (p=0.126 n=20+18)
GoParser                        186ms ± 7%        186ms ± 7%     ~     (p=0.779 n=20+20)
Reflect                         538ms ± 3%        541ms ± 3%     ~     (p=0.192 n=20+20)
Tar                             218ms ± 4%        217ms ± 6%     ~     (p=0.835 n=19+20)
XML                             298ms ± 5%        298ms ± 5%     ~     (p=0.749 n=19+20)
LinkCompiler                    818ms ± 5%        825ms ± 8%     ~     (p=0.461 n=20+20)
ExternalLinkCompiler            1.55s ± 4%        1.53s ± 5%     ~     (p=0.063 n=20+18)
LinkWithoutDebugCompiler        460ms ±12%        460ms ± 7%     ~     (p=0.925 n=20+20)

name                      old object-bytes  new object-bytes  delta
Template                        554kB ± 0%        554kB ± 0%     ~     (all equal)
Unicode                         215kB ± 0%        215kB ± 0%     ~     (all equal)
GoTypes                        2.01MB ± 0%       2.01MB ± 0%     ~     (all equal)
Compiler                       7.97MB ± 0%       7.97MB ± 0%   +0.00%  (p=0.000 n=20+20)
SSA                            26.8MB ± 0%       26.9MB ± 0%   +0.27%  (p=0.000 n=20+20)
Flate                           340kB ± 0%        340kB ± 0%     ~     (all equal)
GoParser                        434kB ± 0%        434kB ± 0%     ~     (all equal)
Reflect                        1.34MB ± 0%       1.34MB ± 0%     ~     (all equal)
Tar                             480kB ± 0%        480kB ± 0%     ~     (all equal)
XML                             622kB ± 0%        622kB ± 0%     ~     (all equal)

name                      old export-bytes  new export-bytes  delta
Template                       20.4kB ± 0%       20.4kB ± 0%     ~     (all equal)
Unicode                        8.21kB ± 0%       8.21kB ± 0%     ~     (all equal)
GoTypes                        36.6kB ± 0%       36.6kB ± 0%     ~     (all equal)
Compiler                        115kB ± 0%        115kB ± 0%   +0.08%  (p=0.000 n=20+20)
SSA                             141kB ± 0%        141kB ± 0%   +0.07%  (p=0.000 n=20+20)
Flate                          5.11kB ± 0%       5.11kB ± 0%     ~     (all equal)
GoParser                       8.93kB ± 0%       8.93kB ± 0%     ~     (all equal)
Reflect                        11.8kB ± 0%       11.8kB ± 0%     ~     (all equal)
Tar                            10.9kB ± 0%       10.9kB ± 0%     ~     (all equal)
XML                            17.4kB ± 0%       17.4kB ± 0%     ~     (all equal)

name                      old text-bytes    new text-bytes    delta
HelloSize                       742kB ± 0%        742kB ± 0%     ~     (all equal)
CmdGoSize                      10.7MB ± 0%       10.7MB ± 0%     ~     (all equal)

name                      old data-bytes    new data-bytes    delta
HelloSize                      10.7kB ± 0%       10.7kB ± 0%     ~     (all equal)
CmdGoSize                       312kB ± 0%        312kB ± 0%     ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       122kB ± 0%        122kB ± 0%     ~     (all equal)
CmdGoSize                       146kB ± 0%        146kB ± 0%     ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.10MB ± 0%       1.10MB ± 0%     ~     (all equal)
CmdGoSize                      14.9MB ± 0%       14.9MB ± 0%     ~     (all equal)

Change-Id: Ic89a8e62423b3d9fd9391159e0663acf450803b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/198419
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2019-10-07 09:19:12 +00:00
Michael Munday 9c2e7e8bed cmd/compile: allow multiple SSA block control values
Control values are used to choose which successor of a block is
jumped to. Typically a control value takes the form of a 'flags'
value that represents the result of a comparison. Some
architectures however use a variable in a register as a control
value.

Up until now we have managed with a single control value per block.
However some architectures (e.g. s390x and riscv64) have combined
compare-and-branch instructions that take two variables in registers
as parameters. To generate these instructions we need to support 2
control values per block.

This CL allows up to 2 control values to be used in a block in
order to support the addition of compare-and-branch instructions.
I have implemented s390x compare-and-branch instructions in a
different CL.

Passes toolstash-check -all.

Results of compilebench:

name                      old time/op       new time/op       delta
Template                        208ms ± 1%        209ms ± 1%    ~     (p=0.289 n=20+20)
Unicode                        83.7ms ± 1%       83.3ms ± 3%  -0.49%  (p=0.017 n=18+18)
GoTypes                         748ms ± 1%        748ms ± 0%    ~     (p=0.460 n=20+18)
Compiler                        3.47s ± 1%        3.48s ± 1%    ~     (p=0.070 n=19+18)
SSA                             11.5s ± 1%        11.7s ± 1%  +1.64%  (p=0.000 n=19+18)
Flate                           130ms ± 1%        130ms ± 1%    ~     (p=0.588 n=19+20)
GoParser                        160ms ± 1%        161ms ± 1%    ~     (p=0.211 n=20+20)
Reflect                         465ms ± 1%        467ms ± 1%  +0.42%  (p=0.007 n=20+20)
Tar                             184ms ± 1%        185ms ± 2%    ~     (p=0.087 n=18+20)
XML                             253ms ± 1%        253ms ± 1%    ~     (p=0.377 n=20+18)
LinkCompiler                    769ms ± 2%        774ms ± 2%    ~     (p=0.070 n=19+19)
ExternalLinkCompiler            3.59s ±11%        3.68s ± 6%    ~     (p=0.072 n=20+20)
LinkWithoutDebugCompiler        446ms ± 5%        454ms ± 3%  +1.79%  (p=0.002 n=19+20)
StdCmd                          26.0s ± 2%        26.0s ± 2%    ~     (p=0.799 n=20+20)

name                      old user-time/op  new user-time/op  delta
Template                        238ms ± 5%        240ms ± 5%    ~     (p=0.142 n=20+20)
Unicode                         105ms ±11%        106ms ±10%    ~     (p=0.512 n=20+20)
GoTypes                         876ms ± 2%        873ms ± 4%    ~     (p=0.647 n=20+19)
Compiler                        4.17s ± 2%        4.19s ± 1%    ~     (p=0.093 n=20+18)
SSA                             13.9s ± 1%        14.1s ± 1%  +1.45%  (p=0.000 n=18+18)
Flate                           145ms ±13%        146ms ± 5%    ~     (p=0.851 n=20+18)
GoParser                        185ms ± 5%        188ms ± 7%    ~     (p=0.174 n=20+20)
Reflect                         534ms ± 3%        538ms ± 2%    ~     (p=0.105 n=20+18)
Tar                             215ms ± 4%        211ms ± 9%    ~     (p=0.079 n=19+20)
XML                             295ms ± 6%        295ms ± 5%    ~     (p=0.968 n=20+20)
LinkCompiler                    832ms ± 4%        837ms ± 7%    ~     (p=0.707 n=17+20)
ExternalLinkCompiler            1.58s ± 8%        1.60s ± 4%    ~     (p=0.296 n=20+19)
LinkWithoutDebugCompiler        478ms ±12%        489ms ±10%    ~     (p=0.429 n=20+20)

name                      old object-bytes  new object-bytes  delta
Template                        559kB ± 0%        559kB ± 0%    ~     (all equal)
Unicode                         216kB ± 0%        216kB ± 0%    ~     (all equal)
GoTypes                        2.03MB ± 0%       2.03MB ± 0%    ~     (all equal)
Compiler                       8.07MB ± 0%       8.07MB ± 0%  -0.06%  (p=0.000 n=20+20)
SSA                            27.1MB ± 0%       27.3MB ± 0%  +0.89%  (p=0.000 n=20+20)
Flate                           343kB ± 0%        343kB ± 0%    ~     (all equal)
GoParser                        441kB ± 0%        441kB ± 0%    ~     (all equal)
Reflect                        1.36MB ± 0%       1.36MB ± 0%    ~     (all equal)
Tar                             487kB ± 0%        487kB ± 0%    ~     (all equal)
XML                             632kB ± 0%        632kB ± 0%    ~     (all equal)

name                      old export-bytes  new export-bytes  delta
Template                       18.5kB ± 0%       18.5kB ± 0%    ~     (all equal)
Unicode                        7.92kB ± 0%       7.92kB ± 0%    ~     (all equal)
GoTypes                        35.0kB ± 0%       35.0kB ± 0%    ~     (all equal)
Compiler                        109kB ± 0%        110kB ± 0%  +0.72%  (p=0.000 n=20+20)
SSA                             137kB ± 0%        138kB ± 0%  +0.58%  (p=0.000 n=20+20)
Flate                          4.89kB ± 0%       4.89kB ± 0%    ~     (all equal)
GoParser                       8.49kB ± 0%       8.49kB ± 0%    ~     (all equal)
Reflect                        11.4kB ± 0%       11.4kB ± 0%    ~     (all equal)
Tar                            10.5kB ± 0%       10.5kB ± 0%    ~     (all equal)
XML                            16.7kB ± 0%       16.7kB ± 0%    ~     (all equal)

name                      old text-bytes    new text-bytes    delta
HelloSize                       761kB ± 0%        761kB ± 0%    ~     (all equal)
CmdGoSize                      10.8MB ± 0%       10.8MB ± 0%    ~     (all equal)

name                      old data-bytes    new data-bytes    delta
HelloSize                      10.7kB ± 0%       10.7kB ± 0%    ~     (all equal)
CmdGoSize                       312kB ± 0%        312kB ± 0%    ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       122kB ± 0%        122kB ± 0%    ~     (all equal)
CmdGoSize                       146kB ± 0%        146kB ± 0%    ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.13MB ± 0%       1.13MB ± 0%    ~     (all equal)
CmdGoSize                      15.1MB ± 0%       15.1MB ± 0%    ~     (all equal)

Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
2019-10-02 09:56:36 +00:00
Daniel Martí 870080752d cmd/compile: reduce rulegen's output by 200 KiB
First, renove unnecessary "// cond:" lines from the generated files.
This shaves off about ~7k lines.

Second, join "if cond { break }" statements via "||", which allows us to
deduplicate a large number of them. This shaves off another ~25k lines.

This change is not for readability or simplicity; but rather, to avoid
unnecessary verbosity that makes the generated files larger. All in all,
git reports that the generated files overall weigh ~200KiB less, or
about 2.7% less.

While at it, add a -trace flag to rulegen.

Updates #33644.

Change-Id: I3fac0290a6066070cc62400bf970a4ae0929470a
Reviewed-on: https://go-review.googlesource.com/c/go/+/196498
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-09-23 16:36:10 +00:00
Daniel Martí 79dee788ec cmd/compile: teach rulegen to remove unused decls
First, add cpu and memory profiling flags, as these are useful to see
where rulegen is spending its time. It now takes many seconds to run on
a recent laptop, so we have to keep an eye on what it's doing.

Second, stop writing '_ = var' lines to keep imports and variables used
at all times. Now that rulegen removes all such unused names, they're
unnecessary.

To perform the removal, lean on go/types to first detect what names are
unused. We can configure it to give us all the type-checking errors in a
file, so we can collect all "declared but not used" errors in a single
pass.

We then use astutil.Apply to remove the relevant nodes based on the line
information from each unused error. This allows us to apply the changes
without having to do extra parser+printer roundtrips to plaintext, which
are far too expensive.

We need to do multiple such passes, as removing an unused variable
declaration might then make another declaration unused. Two passes are
enough to clean every file at the moment, so add a limit of three passes
for now to avoid eating cpu uncontrollably by accident.

The resulting performance of the changes above is a ~30% loss across the
table, since go/types is fairly expensive. The numbers were obtained
with 'benchcmd Rulegen go run *.go', which involves compiling rulegen
itself, but that seems reflective of how the program is used.

	name     old time/op         new time/op         delta
	Rulegen          5.61s ± 0%          7.36s ± 0%  +31.17%  (p=0.016 n=5+4)

	name     old user-time/op    new user-time/op    delta
	Rulegen          7.20s ± 1%          9.92s ± 1%  +37.76%  (p=0.016 n=5+4)

	name     old sys-time/op     new sys-time/op     delta
	Rulegen          135ms ±19%          169ms ±17%  +25.66%  (p=0.032 n=5+5)

	name     old peak-RSS-bytes  new peak-RSS-bytes  delta
	Rulegen         71.0MB ± 2%         85.6MB ± 2%  +20.56%  (p=0.008 n=5+5)

We can live with a bit more resource usage, but the time/op getting
close to 10s isn't good. To win that back, introduce concurrency in
main.go. This further increases resource usage a bit, but the real time
on this quad-core laptop is greatly reduced. The final benchstat is as
follows:

	name     old time/op         new time/op         delta
	Rulegen          5.61s ± 0%          3.97s ± 1%   -29.26%  (p=0.008 n=5+5)

	name     old user-time/op    new user-time/op    delta
	Rulegen          7.20s ± 1%         13.91s ± 1%   +93.09%  (p=0.008 n=5+5)

	name     old sys-time/op     new sys-time/op     delta
	Rulegen          135ms ±19%          269ms ± 9%   +99.17%  (p=0.008 n=5+5)

	name     old peak-RSS-bytes  new peak-RSS-bytes  delta
	Rulegen         71.0MB ± 2%        226.3MB ± 1%  +218.72%  (p=0.008 n=5+5)

It might be possible to reduce the cpu or memory usage in the future,
such as configuring go/types to do less work, or taking shortcuts to
avoid having to run it many times. For now, ~2x cpu and ~4x memory usage
seems like a fair trade for a faster and better rulegen.

Finally, we can remove the old code that tried to remove some unused
variables in a hacky and unmaintainable way.

Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6
Reviewed-on: https://go-review.googlesource.com/c/go/+/189798
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 17:04:18 +00:00
Brian Kessler a28a942768 cmd/compile: add unsigned divisibility rules
"Division by invariant integers using multiplication" paper
by Granlund and Montgomery contains a method for directly computing
divisibility (x%c == 0 for c constant) by means of the modular inverse.
The method is further elaborated in "Hacker's Delight" by Warren Section 10-17

This general rule can compute divisibilty by one multiplication and a compare
for odd divisors and an additional rotate for even divisors.

To apply the divisibility rule, we must take into account
the rules to rewrite x%c = x-((x/c)*c) and (x/c) for c constant on the first
optimization pass "opt".  This complicates the matching as we want to match
only in the cases where the result of (x/c) is not also available.
So, we must match on the expanded form of (x/c) in the expression x == c*(x/c)
in the "late opt" pass after common subexpresion elimination.

Note, that if there is an intermediate opt pass introduced in the future we
could simplify these rules by delaying the magic division rewrite to "late opt"
and matching directly on (x/c) in the intermediate opt pass.

Additional rules to lower the generic RotateLeft* ops were also applied.

On amd64, the divisibility check is 25-50% faster.

name                     old time/op  new time/op  delta
DivconstI64-4            2.08ns ± 0%  2.08ns ± 1%     ~     (p=0.881 n=5+5)
DivisibleconstI64-4      2.67ns ± 0%  2.67ns ± 1%     ~     (p=1.000 n=5+5)
DivisibleWDivconstI64-4  2.67ns ± 0%  2.67ns ± 0%     ~     (p=0.683 n=5+5)
DivconstU64-4            2.08ns ± 1%  2.08ns ± 1%     ~     (p=1.000 n=5+5)
DivisibleconstU64-4      2.77ns ± 1%  1.55ns ± 2%  -43.90%  (p=0.008 n=5+5)
DivisibleWDivconstU64-4  2.99ns ± 1%  2.99ns ± 1%     ~     (p=1.000 n=5+5)
DivconstI32-4            1.53ns ± 2%  1.53ns ± 0%     ~     (p=1.000 n=5+5)
DivisibleconstI32-4      2.23ns ± 0%  2.25ns ± 3%     ~     (p=0.167 n=5+5)
DivisibleWDivconstI32-4  2.27ns ± 1%  2.27ns ± 1%     ~     (p=0.429 n=5+5)
DivconstU32-4            1.78ns ± 0%  1.78ns ± 1%     ~     (p=1.000 n=4+5)
DivisibleconstU32-4      2.52ns ± 2%  1.26ns ± 0%  -49.96%  (p=0.000 n=5+4)
DivisibleWDivconstU32-4  2.63ns ± 0%  2.85ns ±10%   +8.29%  (p=0.016 n=4+5)
DivconstI16-4            1.54ns ± 0%  1.54ns ± 0%     ~     (p=0.333 n=4+5)
DivisibleconstI16-4      2.10ns ± 0%  2.10ns ± 1%     ~     (p=0.571 n=4+5)
DivisibleWDivconstI16-4  2.22ns ± 0%  2.23ns ± 1%     ~     (p=0.556 n=4+5)
DivconstU16-4            1.09ns ± 0%  1.01ns ± 1%   -7.74%  (p=0.000 n=4+5)
DivisibleconstU16-4      1.83ns ± 0%  1.26ns ± 0%  -31.52%  (p=0.008 n=5+5)
DivisibleWDivconstU16-4  1.88ns ± 0%  1.89ns ± 1%     ~     (p=0.365 n=5+5)
DivconstI8-4             1.54ns ± 1%  1.54ns ± 1%     ~     (p=1.000 n=5+5)
DivisibleconstI8-4       2.10ns ± 0%  2.11ns ± 0%     ~     (p=0.238 n=5+4)
DivisibleWDivconstI8-4   2.22ns ± 0%  2.23ns ± 2%     ~     (p=0.762 n=5+5)
DivconstU8-4             0.92ns ± 1%  0.94ns ± 1%   +2.65%  (p=0.008 n=5+5)
DivisibleconstU8-4       1.66ns ± 0%  1.26ns ± 1%  -24.28%  (p=0.008 n=5+5)
DivisibleWDivconstU8-4   1.79ns ± 0%  1.80ns ± 1%     ~     (p=0.079 n=4+5)

A follow-up change will address the signed division case.

Updates #30282

Change-Id: I7e995f167179aa5c76bb10fbcbeb49c520943403
Reviewed-on: https://go-review.googlesource.com/c/go/+/168037
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-04-27 20:46:46 +00:00
Daniel Martí 0fbf681840 cmd/compile: reduce rulegen's for loop verbosity
A lot of the naked for loops begin like:

	for {
		v := b.Control
		if v.Op != OpConstBool {
			break
		}
		...
		return true
	}

Instead, write them out in a more compact and readable way:

	for v.Op == OpConstBool {
		...
		return true
	}

This requires the addition of two bytes.Buffer writers, as this helps us
make a decision based on future pieces of generated code. This probably
makes rulegen slightly slower, but that's not noticeable; the code
generation still takes ~3.5s on my laptop, excluding build time.

The "v := b.Control" declaration can be moved to the top of each
function. Even though the rules can modify b.Control when firing, they
also make the function return, so v can't be used again.

While at it, remove three unnecessary lines from the top of each
rewriteBlock func.

In total, this results in ~4k lines removed from the generated code, and
a slight improvement in readability.

Change-Id: I317e4c6a4842c64df506f4513375475fad2aeec5
Reviewed-on: https://go-review.googlesource.com/c/go/+/167399
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-03-22 15:41:51 +00:00
Josh Bleecher Snyder 700e969d5b cmd/compile: regenerate rewrite rules
Change-Id: I7e921b7b4665ff76dc8bae493b2a49318690770b
Reviewed-on: https://go-review.googlesource.com/c/go/+/168637
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-21 17:04:11 +00:00
Keith Randall 2c423f063b cmd/compile,runtime: provide index information on bounds check failure
A few examples (for accessing a slice of length 3):

   s[-1]    runtime error: index out of range [-1]
   s[3]     runtime error: index out of range [3] with length 3
   s[-1:0]  runtime error: slice bounds out of range [-1:]
   s[3:0]   runtime error: slice bounds out of range [3:0]
   s[3:-1]  runtime error: slice bounds out of range [:-1]
   s[3:4]   runtime error: slice bounds out of range [:4] with capacity 3
   s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3

Note that in cases where there are multiple things wrong with the
indexes (e.g. s[3:-1]), we report one of those errors kind of
arbitrarily, currently the rightmost one.

An exhaustive set of examples is in issue30116[u].out in the CL.

The message text has the same prefix as the old message text. That
leads to slightly awkward phrasing but hopefully minimizes the chance
that code depending on the error text will break.

Increases the size of the go binary by 0.5% (amd64). The panic functions
take arguments in registers in order to keep the size of the compiled code
as small as possible.

Fixes #30116

Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/161477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2019-03-18 17:33:38 +00:00
Michael Munday 04f1b65cc6 cmd/compile: try and access last argument first in rulegen
This reduces the number of extra bounds check hints we need to
insert. For example, rather than producing:

	_ = v.Args[2]
	x := v.Args[0]
	y := v.Args[1]
	z := v.Args[2]

We now produce:

	z := v.Args[2]
	x := v.Args[0]
	y := v.Args[1]

This gets rid of about 7000 lines of code from the rewrite rules.

Change-Id: I1291cf0f82e8d035a6d65bce7dee6cedee04cbcd
Reviewed-on: https://go-review.googlesource.com/c/go/+/167397
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-03-13 15:11:37 +00:00
Josh Bleecher Snyder 53948127d3 cmd/compile: make rulegen magic variable prediction more precise
The sheer length of the generated rules files makes my
editor and git client unhappy.
This change is a small step towards shortening them.

We recognize a few magic variables during rulegen: b, config, fe, typ.
Of these, only b appears prone to false positives.
By tightening the heuristic and fixing one case in MIPS.rules,
we can make the heuristic enough that it has no failures.
That allows us to remove the hedge assignments to _,
removing 3000 pointless lines of code.

Change-Id: I080cde5db28c8277cb3fd9ddcd829306c9a27785
Reviewed-on: https://go-review.googlesource.com/c/go/+/166979
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-03-12 20:13:46 +00:00
Keith Randall 7502ed3b90 cmd/compile: when merging instructions, prefer line number of faulting insn
Normally this happens when combining a sign extension and a load.  We
want the resulting combo-instruction to get the line number of the
load, not the line number of the sign extension.

For each rule, compute where we should get its line number by finding
a value on the match side that can fault.  Use that line number for
all the new values created on the right-hand side.

Fixes #27201

Change-Id: I19b3c6f468fff1a3c0bfbce2d6581828557064a3
Reviewed-on: https://go-review.googlesource.com/c/156937
Reviewed-by: David Chase <drchase@google.com>
2019-01-14 22:41:33 +00:00
Cherry Zhang 6a64efc250 cmd/compile: fix MIPS SGTconst-with-shift rules
(SGTconst [c] (SRLconst _ [d])) && 0 <= int32(c) && uint32(d) <= 31 && 1<<(32-uint32(d)) <= int32(c) -> (MOVWconst [1])

This rule is problematic. 1<<(32-uint32(d)) <= int32(c) meant to
say that it is true if c is greater than the largest possible
value of the right shift. But when d==1, 1<<(32-1) is negative
and results in the wrong comparison.

Rewrite the rules in a more direct way.

Fixes #29402.

Change-Id: I5940fc9538d9bc3a4bcae8aa34672867540dc60e
Reviewed-on: https://go-review.googlesource.com/c/155798
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-12-27 00:07:53 +00:00
Josh Bleecher Snyder 3bab4373c7 cmd/compile: make fmt available in rewrite rules
During development and debugging, I often want to
write noteRule(fmt.Sprintf(...)), and end up
manually adding the import to the generated code.
Let's just make it always available instead.

Change-Id: I1e2d47c98ba056e1b5da42e35fb6ad26f1d9cc3d
Reviewed-on: https://go-review.googlesource.com/c/145207
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2018-10-28 18:46:36 +00:00
David Chase 0029cd479e cmd/compile: add LocalAddr that takes SP,mem operands
Lack of a well-defined order between VarDef and related
address operations sometimes causes problems with store order
and write barrier transformations; glitches in the order are
made irreparable (by later optimizations) if the two parts of
the glitch straddle a split in the original block caused by
insertion of a write barrier diamond.

Fix this by creating a LocalAddr for addresses of locals
(what VarDef matters for) that takes a memory input to
help make the order explicit.  Addr is modified to only
be legal for SB operand, so there is no overlap between
Addr and LocalAddr uses (there may be some downstream
cleanup from this).

Changes to generic.rules and rewrite.go ensure that codegen
tests continue to pass; CSE of LocalAddr is impaired, not
quite sure of the cost.

Fixes #26105.

Change-Id: Id4192b4440aa4e9d7ba54a465c456df9b530b515
Reviewed-on: https://go-review.googlesource.com/122483
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-07-12 18:45:31 +00:00
Wei Xiao 20102594a0 cmd/compile: intrinsify runtime.getcallerpc on all link register architectures
Add a compiler intrinsic for getcallerpc on following architectures:
  arm
  mips mipsle mips64 mips64le
  ppc64 ppc64le
  s390x

Change-Id: I758f3d4742fc214b206bcd07d90408622c17dbef
Reviewed-on: https://go-review.googlesource.com/110835
Run-TryBot: Wei Xiao <Wei.Xiao@arm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-02 16:59:27 +00:00
Josh Bleecher Snyder d9a50a6531 cmd/compile: use prove pass to detect Ctz of non-zero values
On amd64, Ctz must include special handling of zeros.
But the prove pass has enough information to detect whether the input
is non-zero, allowing a more efficient lowering.

Introduce new CtzNonZero ops to capture and use this information.

Benchmark code:

func BenchmarkVisitBits(b *testing.B) {
	b.Run("8", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			x := uint8(0xff)
			for x != 0 {
				sink = bits.TrailingZeros8(x)
				x &= x - 1
			}
		}
	})

    // and similarly so for 16, 32, 64
}

name            old time/op  new time/op  delta
VisitBits/8-8   7.27ns ± 4%  5.58ns ± 4%  -23.35%  (p=0.000 n=28+26)
VisitBits/16-8  14.7ns ± 7%  10.5ns ± 4%  -28.43%  (p=0.000 n=30+28)
VisitBits/32-8  27.6ns ± 8%  19.3ns ± 3%  -30.14%  (p=0.000 n=30+26)
VisitBits/64-8  44.0ns ±11%  38.0ns ± 5%  -13.48%  (p=0.000 n=30+30)

Fixes #25077

Change-Id: Ie6e5bd86baf39ee8a4ca7cadcf56d934e047f957
Reviewed-on: https://go-review.googlesource.com/109358
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-26 18:22:28 +00:00
Austin Clements 8871c930be cmd/compile: don't lower OpConvert
Currently, each architecture lowers OpConvert to an arch-specific
OpXXXconvert. This is silly because OpConvert means the same thing on
all architectures and is logically a no-op that exists only to keep
track of conversions to and from unsafe.Pointer. Furthermore, lowering
it makes it harder to recognize in other analyses, particularly
liveness analysis.

This CL eliminates the lowering of OpConvert, leaving it as the
generic op until code generation time.

The main complexity here is that we still need to register-allocate
OpConvert operations. Currently, each arch's lowered OpConvert
specifies all GP registers in its register mask. Ideally, OpConvert
wouldn't affect value homing at all, and we could just copy the home
of OpConvert's source, but this can potentially home an OpConvert in a
LocalSlot, which neither regalloc nor stackalloc expect. Rather than
try to disentangle this assumption from regalloc and stackalloc, we
continue to register-allocate OpConvert, but teach regalloc that
OpConvert can be allocated to any allocatable GP register.

For #24543.

Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-20 18:46:39 +00:00
Giovanni Bajo 70fd25e4e1 cmd/compile: normalize spaces in rewrite rule comments.
In addition to look nicer to the eye, this allows to reformat
and indent rules without causing spurious changes to the generated
file, making it easier to spot functional changes.

After this CL, all CLs that will aggregate rules through
the new "|" functionality should cause no changes to the
generated files.

Change-Id: Icec283585ba8d7b91c79d76513c1d83dca4b30aa
Reviewed-on: https://go-review.googlesource.com/95216
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-20 18:14:38 +00:00
Austin Clements 313a4b2b7f runtime: buffered write barrier for mips
Updates #22460.

Change-Id: Ieaca94385c3bb88dcc8351c3866b4b0e2a1412b5
Reviewed-on: https://go-review.googlesource.com/92701
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-02-13 16:34:21 +00:00
Cherry Zhang 6f3e5e637c cmd/compile: intrinsify runtime.getcallersp
Add a compiler intrinsic for getcallersp. So we are able to get
rid of the argument (not done in this CL).

Change-Id: Ic38fda1c694f918328659ab44654198fb116668d
Reviewed-on: https://go-review.googlesource.com/69350
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
2017-10-10 15:15:21 +00:00
philhofer c59b495963 cmd/compile: add support for arm64 bit-test instructions
Add support for generating TBZ/TBNZ instructions.

The bit-test-and-branch pattern shows up in a number of
important places, including the runtime (gc bitmaps).

Before this change, there were 3 TB[N]?Z instructions in the Go tool,
all of which were in hand-written assembly. After this change, there
are 285. Also, the go1 benchmark binary gets about 4.5kB smaller.

Fixes #21361

Change-Id: I170c138b852754b9b8df149966ca5e62e6dfa771
Reviewed-on: https://go-review.googlesource.com/54470
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-08-15 13:39:11 +00:00
Cherry Zhang 1e0819101b cmd/compile: fix subword store/load elision for MIPS
Apply the fix in CL 44355 to MIPS.

ARM64 has these rules but commented out for performance reason.
Fix the commented rules, in case they are enabled in the future.

Enhance the test so it triggers the failure on ARM and MIPS without
the fix.

Updates #20530.

Change-Id: I82d77448e3939a545fe519d0a29a164f8fa5417c
Reviewed-on: https://go-review.googlesource.com/44430
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-05-31 14:44:02 +00:00
Josh Bleecher Snyder 5548f7d5cf cmd/compile: eliminate some bounds checks from generated rewrite rules
Noticed while looking at #20356.

Cuts 160k (1%) off of the cmd/compile binary.

Change-Id: If2397bc6971d6be9be6975048adecb0b5efa6d66
Reviewed-on: https://go-review.googlesource.com/43501
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-16 14:08:08 +00:00
Josh Bleecher Snyder 46b88c9fbc cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.

In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.

Though this is a big CL, most of the changes are
mechanical and uninteresting.

Interesting bits:

* Add new singleton globals to package types for the special
  SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
  and TTUPLE, for SSA tuple types.
  ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
  to package types.
* We had picked the name "types" in our rules for the handy
  list of types provided by ssa.Config. That conflicted with
  the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
  types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
  and probably also some other duplicated Type methods
  designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
  and they were not particularly careful about types in general.
  Of necessity, this CL switches them to use *types.Type;
  it does not make them more type-accurate.
  Unfortunately, using types.Type means initializing a bit
  of the types universe.
  This is prime for refactoring and improvement.

This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.

name        old alloc/op      new alloc/op      delta
Template         37.9MB ± 0%       37.7MB ± 0%  -0.57%  (p=0.000 n=10+8)
Unicode          28.9MB ± 0%       28.7MB ± 0%  -0.52%  (p=0.000 n=10+10)
GoTypes           110MB ± 0%        109MB ± 0%  -0.88%  (p=0.000 n=10+10)
Flate            24.7MB ± 0%       24.6MB ± 0%  -0.66%  (p=0.000 n=10+10)
GoParser         31.1MB ± 0%       30.9MB ± 0%  -0.61%  (p=0.000 n=10+9)
Reflect          73.9MB ± 0%       73.4MB ± 0%  -0.62%  (p=0.000 n=10+8)
Tar              25.8MB ± 0%       25.6MB ± 0%  -0.77%  (p=0.000 n=9+10)
XML              41.2MB ± 0%       40.9MB ± 0%  -0.80%  (p=0.000 n=10+10)
[Geo mean]       40.5MB            40.3MB       -0.68%

name        old allocs/op     new allocs/op     delta
Template           385k ± 0%         386k ± 0%    ~     (p=0.356 n=10+9)
Unicode            343k ± 1%         344k ± 0%    ~     (p=0.481 n=10+10)
GoTypes           1.16M ± 0%        1.16M ± 0%  -0.16%  (p=0.004 n=10+10)
Flate              238k ± 1%         238k ± 1%    ~     (p=0.853 n=10+10)
GoParser           320k ± 0%         320k ± 0%    ~     (p=0.720 n=10+9)
Reflect            957k ± 0%         957k ± 0%    ~     (p=0.460 n=10+8)
Tar                252k ± 0%         252k ± 0%    ~     (p=0.133 n=9+10)
XML                400k ± 0%         400k ± 0%    ~     (p=0.796 n=10+10)
[Geo mean]         428k              428k       -0.01%


Removing all the interface calls helps non-trivially with CPU, though.

name        old time/op       new time/op       delta
Template          178ms ± 4%        173ms ± 3%  -2.90%  (p=0.000 n=94+96)
Unicode          85.0ms ± 4%       83.9ms ± 4%  -1.23%  (p=0.000 n=96+96)
GoTypes           543ms ± 3%        528ms ± 3%  -2.73%  (p=0.000 n=98+96)
Flate             116ms ± 3%        113ms ± 4%  -2.34%  (p=0.000 n=96+99)
GoParser          144ms ± 3%        140ms ± 4%  -2.80%  (p=0.000 n=99+97)
Reflect           344ms ± 3%        334ms ± 4%  -3.02%  (p=0.000 n=100+99)
Tar               106ms ± 5%        103ms ± 4%  -3.30%  (p=0.000 n=98+94)
XML               198ms ± 5%        192ms ± 4%  -2.88%  (p=0.000 n=92+95)
[Geo mean]        178ms             173ms       -2.65%

name        old user-time/op  new user-time/op  delta
Template          229ms ± 5%        224ms ± 5%  -2.36%  (p=0.000 n=95+99)
Unicode           107ms ± 6%        106ms ± 5%  -1.13%  (p=0.001 n=93+95)
GoTypes           696ms ± 4%        679ms ± 4%  -2.45%  (p=0.000 n=97+99)
Flate             137ms ± 4%        134ms ± 5%  -2.66%  (p=0.000 n=99+96)
GoParser          176ms ± 5%        172ms ± 8%  -2.27%  (p=0.000 n=98+100)
Reflect           430ms ± 6%        411ms ± 5%  -4.46%  (p=0.000 n=100+92)
Tar               128ms ±13%        123ms ±13%  -4.21%  (p=0.000 n=100+100)
XML               239ms ± 6%        233ms ± 6%  -2.50%  (p=0.000 n=95+97)
[Geo mean]        220ms             213ms       -2.76%


Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-09 23:01:51 +00:00