If I change a rule in ARM64.rules to use the variable name "b" in a
conflicting way, rulegen would previously not complain, and the compiler
would later give a confusing error:
$ go run *.go && go build cmd/compile/internal/ssa
# cmd/compile/internal/ssa
../rewriteARM64.go:24236:10: b.NewValue0 undefined (type int64 has no field or method NewValue0)
Make rulegen complain early about those cases. Sometimes they might
happen to be harmless, but in general they can easily cause confusion or
unintended effects due to shadowing.
After the change, with the same conflicting rule:
$ go run *.go && go build cmd/compile/internal/ssa
2021/03/22 11:31:49 rule ARM64.rules:495 uses the reserved name b
exit status 1
Note that 24 existing rules were using reserved names. It seems like the
shadowing was harmless, as it wasn't causing typechecking issues nor did
it seem to cause unintended behavior when the rule rewrite code ran.
The bool values "b" were renamed "t", since that seems to have a
precedent in other rules and in the fmt package.
Sequential values like "a b c" were renamed to "x y z", since "b" is
reserved.
Finally, "typ" was renamed to "_typ", since there doesn't seem to be an
obviously better answer.
Passes all three of:
$ GOARCH=amd64 go build -toolexec 'toolstash -cmp' -a std
$ GOARCH=arm64 go build -toolexec 'toolstash -cmp' -a std
$ GOARCH=mips64 go build -toolexec 'toolstash -cmp' -a std
Fixes #45154.
Change-Id: I1cce194dc7b477886a9c218c17973e996bcedccf
Reviewed-on: https://go-review.googlesource.com/c/go/+/303549
Trust: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Add a generic rule to rewrite a single-precision square root expression
into a single single-precision instruction. The optimization removes two
precision conversions between double precision and single precision.
On the arm64 platform:
previous:
FCVTSD F0, F0
FSQRTD F0, F0
FCVTDS F0, F0
optimized:
FSQRTS S0, S0
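For reference, a Go-level sketch of the kind of expression this rule
targets (illustrative code, not the CL's actual test case):

package main

import (
    "fmt"
    "math"
)

// A float32 square root spelled via math.Sqrt, which works on float64.
// Previously this converted to double precision, took the sqrt, and
// converted back; with the rule it can use a single-precision sqrt directly.
func sqrt32(x float32) float32 {
    return float32(math.Sqrt(float64(x)))
}

func main() {
    fmt.Println(sqrt32(2)) // ≈ 1.4142135
}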
This patch also adds a test case to check correctness.
This patch refers to CL 241877, contributed by Alice Xu
(dianhong.xu@arm.com)
Change-Id: I6de5d02281c693017ac4bd4c10963dd55989bd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/276873
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
MOV*nop and MOV*reg seem superfluous. They are there to keep type
information around that would otherwise get thrown away. Not sure
what we need it for. I think our compiler needs a normalization of
how types are represented in SSA, especially after lowering.
MOV*nop gets in the way of some optimization rules firing, like for
load combining.
For now, just fold MOV*nop and MOV*const. It's certainly safe to
do that, as the type info on the MOV*const isn't ever useful.
R=go1.17
Change-Id: I3630a80afc2455a8e9cd9fde10c7abe05ddc3767
Reviewed-on: https://go-review.googlesource.com/c/go/+/276792
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
It just makes the compiler crash. Oops.
Fixes #43099
Change-Id: Id996c14799c1a5d0063ecae3b8770568161c2440
Reviewed-on: https://go-review.googlesource.com/c/go/+/276652
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The shift amount in SRAconst needs to be in the [0,31] range, so stop
MOVWing -1 to SRA in the Rsh lowering rules.
Also see CL 270117.
Passes
$ GOARCH=mips go build -toolexec 'toolstash -cmp' -a std
$ GOARCH=mipsle go build -toolexec 'toolstash -cmp' -a std
Updates #42587
Change-Id: Ib5eb99b82310e404cc2d6f0c619b21b8a15406ce
Reviewed-on: https://go-review.googlesource.com/c/go/+/270558
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
mips SRA/SLL/SRL shift amounts are used mod 32; this change aligns the
XXXconst rules to mask the shift amount by &31.
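As a sketch of what the masking means (plain Go, not the generated rule
code): folding a constant 32-bit shift must reduce the shift amount the
same way the hardware does, i.e. mod 32.

package main

import "fmt"

// foldSRLconst mimics constant-folding a logical right shift on MIPS,
// where the hardware uses only the low 5 bits of the shift amount.
func foldSRLconst(d uint32, c int64) uint32 {
    return d >> uint(c&31)
}

func main() {
    fmt.Println(foldSRLconst(0x80000000, 33)) // 33&31 == 1, so 0x40000000
}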
Passes
$ GOARCH=mips go build -toolexec 'toolstash -cmp' -a std
$ GOARCH=mipsle go build -toolexec 'toolstash -cmp' -a std
Fixes #42587
Change-Id: I6003ebd0bc500fba4cf6fb10254e1b557bf8c48f
Reviewed-on: https://go-review.googlesource.com/c/go/+/270117
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Also make canMergeSym take Syms instead of interface{}
Change-Id: I4926a1fc586aa90e198249d67e5b520404b40869
Reviewed-on: https://go-review.googlesource.com/c/go/+/265817
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
rewrite.go has two identical functions isPowerOfTwo and
isPowerOfTwo64; the former has been there for a while, while the
latter was added together with isPowerOfTwo{8,16,32} for use in typed
rules.
This change deletes isPowerOfTwo and switches to using isPowerOfTwo64
everywhere.
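For reference, a minimal sketch of the predicate (the real helper lives
in rewrite.go):

package main

import "fmt"

// isPowerOfTwo64 reports whether n is a positive power of two: such
// values have exactly one bit set, so n&(n-1) clears it to zero.
func isPowerOfTwo64(n int64) bool {
    return n > 0 && n&(n-1) == 0
}

func main() {
    fmt.Println(isPowerOfTwo64(64), isPowerOfTwo64(48)) // true false
}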
Change-Id: If26c94565d2393fac6f0ba117ee7ee2fc915f7cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/265417
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This one is trivial, as there are already 32-bit AND and OR ops used to
implement the more complex 8-bit versions.
Change-Id: Ic48a53ea291d0067ebeab8e96c82e054daf20ae7
Reviewed-on: https://go-review.googlesource.com/c/go/+/263149
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Keep track of all expressions encountered while
generating a rewrite result, and re-use them whenever possible.
Named expressions may still be used for clarity when desired.
Change-Id: I640dca108763eb8baeff8f9a4169300af3445b82
Reviewed-on: https://go-review.googlesource.com/c/go/+/229800
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
This covers most of the lowering rules.
Passes
GOARCH=mips gotip build -toolexec 'toolstash -cmp' -a std
GOARCH=mipsle gotip build -toolexec 'toolstash -cmp' -a std
Change-Id: I9d00aaebecb36622e3bdaf556e5a9377670bf86b
Reviewed-on: https://go-review.googlesource.com/c/go/+/229102
Reviewed-by: Keith Randall <khr@golang.org>
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be
implemented using Less and Leq.
Fixes #37316.
Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
For 1.15, unless someone really wants it in 1.14.
A performance-sensitive user thought this would be useful,
though "large" was not well-defined. If 128 is large,
there are 139 static instances of "large" copies in the compiler
itself.
Includes test.
Change-Id: I81f20c62da59d37072429f3a22c1809e6fb2946d
Reviewed-on: https://go-review.googlesource.com/c/go/+/205066
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
In cases in which we had a named value whose args were all _,
like this rule from ARM.rules:
(MOVBUreg x:(MOVBUload _ _)) -> (MOVWreg x)
We previously inserted
_ = x.Args[1]
even though it is unnecessary.
This change eliminates this pointless bounds check.
And in other cases, we now check bounds just as far as strictly necessary.
No significant movement on any compiler metrics.
Just nicer (and less) code.
Passes toolstash-check -all.
Change-Id: I075dfe9f926cc561cdc705e9ddaab563164bed3a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221781
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This:
* Simplifies and shortens the generated code for rewrite rules.
* Shrinks cmd/compile by 86k (0.4%) and makes it easier to compile.
* Removes the stmt boundary code wrangling from Value.reset,
in favor of doing it in the one place where it actually does some work,
namely the writebarrier pass. (This was ascertained by inspecting the
code for cases in which notStmtBoundary values were generated.)
Passes toolstash-check -all.
Change-Id: I25671d4c4bbd772f235195d11da090878ea2cc07
Reviewed-on: https://go-review.googlesource.com/c/go/+/221421
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
The generic Greater and Geq ops can always be replaced with the Less and
Leq ops. This CL therefore removes them. This simplifies the compiler since
it reduces the number of operations that need handling in both code and in
rewrite rules. This will be especially true when adding control flow
optimizations such as the integer-in-range optimizations in CL 165998.
Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040
Reviewed-on: https://go-review.googlesource.com/c/go/+/220417
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
CL 213703 converted generated rewrite rules for commutative ops
to use loops instead of duplicated code.
However, it loaded args using expressions like
v.Args[i] and v.Args[i^1], for which the compiler could
not eliminate bounds checks (including with all outstanding
prove CLs).
Also, given a series of separate rewrite rules for the same op,
we generated bounds checks for every rewrite rule, even though
we were repeatedly loading the same set of args.
This change reduces both sets of bounds checks.
Instead of loading v.Args[i] and v.Args[i^1] for commutative loops,
we now preload v.Args[0] and v.Args[1] into local variables,
and then swap them (as needed) in the commutative loop post statement.
And we now load all top level v.Args into local variables
at the beginning of every rewrite rule function.
The second optimization is the more significant,
but the first helps a little, and they play together
nicely from the perspective of generating the code.
This does increase register pressure, but the reduced bounds
checks more than compensate.
Note that the vast majority of rewrite rules evaluated
are not applied, so the prologue is the most important
part of the rewrite rules.
There is one subtle aspect to the new generated code.
Because the top level v.Args are shared across rewrite rules,
and rule evaluation can swap v_0 and v_1, v_0 and v_1
can end up being swapped from one rule to the next.
That is OK, because any time a rule does not get applied,
they will have been swapped exactly twice.
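For illustration, a rough sketch of the generated shape described above
(the types and the rule here are made up; the real code is in the
generated rewrite files):

package main

import "fmt"

type Op int

const (
    OpAdd Op = iota
    OpConst
    OpCopy
)

type Value struct {
    Op   Op
    Args []*Value
}

func rewriteOpAdd(v *Value) bool {
    // All top-level args are loaded into locals once, at the top of the function.
    v_0 := v.Args[0]
    v_1 := v.Args[1]
    // Commutative match: try both argument orders by swapping in the post statement.
    for _i0 := 0; _i0 <= 1; _i0, v_0, v_1 = _i0+1, v_1, v_0 {
        x, c := v_0, v_1
        if c.Op != OpConst {
            continue
        }
        // (Add x (Const)) => (Copy x)  -- placeholder rewrite
        v.Op = OpCopy
        v.Args = []*Value{x}
        return true
    }
    // A later rule in this function would reuse v_0 and v_1; they may arrive
    // swapped, which is harmless: a rule that does not fire swaps them an
    // even number of times.
    return false
}

func main() {
    v := &Value{Op: OpAdd, Args: []*Value{{Op: OpAdd}, {Op: OpConst}}}
    fmt.Println(rewriteOpAdd(v)) // true: the Const is found in the second slot
}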
Passes toolstash-check -all.
name old time/op new time/op delta
Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96)
Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90)
GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94)
Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100)
SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99)
Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96)
GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93)
Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94)
Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95)
XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94)
[Geo mean] 424ms 421ms -0.68%
name old user-time/op new user-time/op delta
Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98)
Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90)
GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93)
Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100)
SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97)
Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92)
GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96)
Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92)
Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98)
XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95)
[Geo mean] 547ms 544ms -0.61%
file before after Δ %
compile 21113112 21109016 -4096 -0.019%
total 131704940 131700844 -4096 -0.003%
Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956
Reviewed-on: https://go-review.googlesource.com/c/go/+/216218
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Prior to this change, we generated additional rules at rulegen time
for all possible combinations of args to commutative ops.
This is simple and works well, but leads to lots of generated rules.
This in turn has increased the size of the compiler,
made it hard to compile package ssa on small machines,
and provided a disincentive to mark some ops as commutative.
This change reworks how we handle commutative ops.
Instead of generating a rule per argument permutation,
we generate a series of nested loops, one for each commutative op.
Each loop tries both possible argument orderings.
I also considered attempting to canonicalize the inputs to the
rewrite rules. However, because either or both arguments might be
nothing more than an identifier, and because there can be arbitrary
conditions to evaluate during matching, I did not see how to proceed.
The duplicate rule detection now sorts arguments to commutative ops,
so that it can detect commutative-only duplicates.
There may be further optimizations to the new generated code.
In particular, we may not be removing as many bounds checks as before;
I have not investigated deeply. If more work here is needed,
we could do it with more hints or with improvements to the prove pass.
This change has almost no impact on the generated code.
It does not pass toolstash-check, however. In a handful of functions,
for reasons I do not understand, there are minor position changes.
For the entire series ending at this change,
there is negligible compiler performance impact.
The compiler binary shrinks by about 15%,
and package ssa shrinks by about 25%.
Package ssa also compiles ~25% faster with ~25% less memory.
Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4
Reviewed-on: https://go-review.googlesource.com/c/go/+/213703
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
CL 203284 added compiler intrinsics for atomic Load8 and Store8 on
several architectures, but missed the lowering on MIPS. This CL fixes
that.
Updates #10958, #24543.
Change-Id: I82e88971554fe8c33ad2bf195a633c44b9ac4cf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/203977
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
First, remove unnecessary "// cond:" lines from the generated files.
This shaves off ~7k lines.
Second, join "if cond { break }" statements via "||", which allows us to
deduplicate a large number of them. This shaves off another ~25k lines.
This change is not for readability or simplicity, but rather to avoid
unnecessary verbosity that makes the generated files larger. All in all,
git reports that the generated files overall weigh ~200KiB less, or
about 2.7% less.
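A before/after sketch of the generated shape (illustrative Go with a
made-up op name, not actual rulegen output):

package main

import "fmt"

// matchBefore shows the old shape: one "if cond { break }" per condition.
func matchBefore(op0, op1 string) bool {
    for {
        if op0 != "MOVDconst" {
            break
        }
        if op1 != "MOVDconst" {
            break
        }
        return true
    }
    return false
}

// matchAfter shows the new shape: conditions joined with ||, so identical
// joined checks across rules can then be deduplicated.
func matchAfter(op0, op1 string) bool {
    for {
        if op0 != "MOVDconst" || op1 != "MOVDconst" {
            break
        }
        return true
    }
    return false
}

func main() {
    fmt.Println(matchBefore("MOVDconst", "ADD"), matchAfter("MOVDconst", "MOVDconst"))
}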
While at it, add a -trace flag to rulegen.
Updates #33644.
Change-Id: I3fac0290a6066070cc62400bf970a4ae0929470a
Reviewed-on: https://go-review.googlesource.com/c/go/+/196498
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
First, add cpu and memory profiling flags, as these are useful to see
where rulegen is spending its time. It now takes many seconds to run on
a recent laptop, so we have to keep an eye on what it's doing.
Second, stop writing '_ = var' lines to keep imports and variables used
at all times. Now that rulegen removes all such unused names, they're
unnecessary.
To perform the removal, lean on go/types to first detect what names are
unused. We can configure it to give us all the type-checking errors in a
file, so we can collect all "declared but not used" errors in a single
pass.
We then use astutil.Apply to remove the relevant nodes based on the line
information from each unused error. This allows us to apply the changes
without having to do extra parser+printer roundtrips to plaintext, which
are far too expensive.
We need to do multiple such passes, as removing an unused variable
declaration might then make another declaration unused. Two passes are
enough to clean every file at the moment, so add a limit of three passes
for now to avoid eating cpu uncontrollably by accident.
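As a standalone sketch of one such pass (not rulegen's actual code; the
exact error wording can vary between Go versions):

package main

import (
    "go/ast"
    "go/parser"
    "go/printer"
    "go/token"
    "go/types"
    "os"
    "strings"

    "golang.org/x/tools/go/ast/astutil"
)

const src = `package p

func f() int {
    x := 1
    y := 2
    return y
}
`

func main() {
    fset := token.NewFileSet()
    file, err := parser.ParseFile(fset, "p.go", src, 0)
    if err != nil {
        panic(err)
    }

    // Type-check once, recording the position of every "declared but not
    // used" error reported by go/types.
    unused := map[token.Pos]bool{}
    conf := types.Config{Error: func(err error) {
        if terr, ok := err.(types.Error); ok && strings.Contains(terr.Msg, "not used") {
            unused[terr.Pos] = true
        }
    }}
    conf.Check("p", fset, []*ast.File{file}, nil)

    // Delete the flagged short variable declarations in place.
    astutil.Apply(file, func(c *astutil.Cursor) bool {
        if as, ok := c.Node().(*ast.AssignStmt); ok && as.Tok == token.DEFINE && unused[as.Lhs[0].Pos()] {
            c.Delete()
        }
        return true
    }, nil)

    printer.Fprint(os.Stdout, fset, file) // the "x := 1" line is gone
}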
The resulting performance of the changes above is a ~30% loss across the
table, since go/types is fairly expensive. The numbers were obtained
with 'benchcmd Rulegen go run *.go', which involves compiling rulegen
itself, but that seems reflective of how the program is used.
name old time/op new time/op delta
Rulegen 5.61s ± 0% 7.36s ± 0% +31.17% (p=0.016 n=5+4)
name old user-time/op new user-time/op delta
Rulegen 7.20s ± 1% 9.92s ± 1% +37.76% (p=0.016 n=5+4)
name old sys-time/op new sys-time/op delta
Rulegen 135ms ±19% 169ms ±17% +25.66% (p=0.032 n=5+5)
name old peak-RSS-bytes new peak-RSS-bytes delta
Rulegen 71.0MB ± 2% 85.6MB ± 2% +20.56% (p=0.008 n=5+5)
We can live with a bit more resource usage, but the time/op getting
close to 10s isn't good. To win that back, introduce concurrency in
main.go. This further increases resource usage a bit, but the real time
on this quad-core laptop is greatly reduced. The final benchstat is as
follows:
name old time/op new time/op delta
Rulegen 5.61s ± 0% 3.97s ± 1% -29.26% (p=0.008 n=5+5)
name old user-time/op new user-time/op delta
Rulegen 7.20s ± 1% 13.91s ± 1% +93.09% (p=0.008 n=5+5)
name old sys-time/op new sys-time/op delta
Rulegen 135ms ±19% 269ms ± 9% +99.17% (p=0.008 n=5+5)
name old peak-RSS-bytes new peak-RSS-bytes delta
Rulegen 71.0MB ± 2% 226.3MB ± 1% +218.72% (p=0.008 n=5+5)
It might be possible to reduce the cpu or memory usage in the future,
such as configuring go/types to do less work, or taking shortcuts to
avoid having to run it many times. For now, ~2x cpu and ~4x memory usage
seems like a fair trade for a faster and better rulegen.
Finally, we can remove the old code that tried to remove some unused
variables in a hacky and unmaintainable way.
Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6
Reviewed-on: https://go-review.googlesource.com/c/go/+/189798
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
"Division by invariant integers using multiplication" paper
by Granlund and Montgomery contains a method for directly computing
divisibility (x%c == 0 for c constant) by means of the modular inverse.
The method is further elaborated in "Hacker's Delight" by Warren Section 10-17
This general rule can compute divisibilty by one multiplication and a compare
for odd divisors and an additional rotate for even divisors.
To apply the divisibility rule, we must take into account
the rules to rewrite x%c = x-((x/c)*c) and (x/c) for c constant on the first
optimization pass "opt". This complicates the matching as we want to match
only in the cases where the result of (x/c) is not also available.
So, we must match on the expanded form of (x/c) in the expression x == c*(x/c)
in the "late opt" pass after common subexpresion elimination.
Note that if an intermediate opt pass is introduced in the future, we
could simplify these rules by delaying the magic division rewrite to "late opt"
and matching directly on (x/c) in the intermediate opt pass.
Additional rules to lower the generic RotateLeft* ops were also applied.
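For reference, a Go-level sketch of the underlying check for an odd
constant divisor (the helper names are illustrative, not the compiler's;
even divisors additionally need a rotate after factoring out the powers
of two):

package main

import "fmt"

// modInverse64 returns the multiplicative inverse of an odd c modulo 2^64,
// computed by Newton's iteration; each step doubles the number of correct
// bits, and c is already its own inverse mod 8.
func modInverse64(c uint64) uint64 {
    x := c
    for i := 0; i < 5; i++ {
        x *= 2 - c*x
    }
    return x
}

// divisibleByOdd reports whether x%c == 0 for odd c with one multiply and
// one compare: multiples of c map to 0..(2^64-1)/c under multiplication by
// the inverse, while non-multiples map above that bound.
func divisibleByOdd(x, c uint64) bool {
    return x*modInverse64(c) <= ^uint64(0)/c
}

func main() {
    fmt.Println(divisibleByOdd(21, 7), divisibleByOdd(22, 7)) // true false
}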
On amd64, the divisibility check is 25-50% faster.
name old time/op new time/op delta
DivconstI64-4 2.08ns ± 0% 2.08ns ± 1% ~ (p=0.881 n=5+5)
DivisibleconstI64-4 2.67ns ± 0% 2.67ns ± 1% ~ (p=1.000 n=5+5)
DivisibleWDivconstI64-4 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.683 n=5+5)
DivconstU64-4 2.08ns ± 1% 2.08ns ± 1% ~ (p=1.000 n=5+5)
DivisibleconstU64-4 2.77ns ± 1% 1.55ns ± 2% -43.90% (p=0.008 n=5+5)
DivisibleWDivconstU64-4 2.99ns ± 1% 2.99ns ± 1% ~ (p=1.000 n=5+5)
DivconstI32-4 1.53ns ± 2% 1.53ns ± 0% ~ (p=1.000 n=5+5)
DivisibleconstI32-4 2.23ns ± 0% 2.25ns ± 3% ~ (p=0.167 n=5+5)
DivisibleWDivconstI32-4 2.27ns ± 1% 2.27ns ± 1% ~ (p=0.429 n=5+5)
DivconstU32-4 1.78ns ± 0% 1.78ns ± 1% ~ (p=1.000 n=4+5)
DivisibleconstU32-4 2.52ns ± 2% 1.26ns ± 0% -49.96% (p=0.000 n=5+4)
DivisibleWDivconstU32-4 2.63ns ± 0% 2.85ns ±10% +8.29% (p=0.016 n=4+5)
DivconstI16-4 1.54ns ± 0% 1.54ns ± 0% ~ (p=0.333 n=4+5)
DivisibleconstI16-4 2.10ns ± 0% 2.10ns ± 1% ~ (p=0.571 n=4+5)
DivisibleWDivconstI16-4 2.22ns ± 0% 2.23ns ± 1% ~ (p=0.556 n=4+5)
DivconstU16-4 1.09ns ± 0% 1.01ns ± 1% -7.74% (p=0.000 n=4+5)
DivisibleconstU16-4 1.83ns ± 0% 1.26ns ± 0% -31.52% (p=0.008 n=5+5)
DivisibleWDivconstU16-4 1.88ns ± 0% 1.89ns ± 1% ~ (p=0.365 n=5+5)
DivconstI8-4 1.54ns ± 1% 1.54ns ± 1% ~ (p=1.000 n=5+5)
DivisibleconstI8-4 2.10ns ± 0% 2.11ns ± 0% ~ (p=0.238 n=5+4)
DivisibleWDivconstI8-4 2.22ns ± 0% 2.23ns ± 2% ~ (p=0.762 n=5+5)
DivconstU8-4 0.92ns ± 1% 0.94ns ± 1% +2.65% (p=0.008 n=5+5)
DivisibleconstU8-4 1.66ns ± 0% 1.26ns ± 1% -24.28% (p=0.008 n=5+5)
DivisibleWDivconstU8-4 1.79ns ± 0% 1.80ns ± 1% ~ (p=0.079 n=4+5)
A follow-up change will address the signed division case.
Updates #30282
Change-Id: I7e995f167179aa5c76bb10fbcbeb49c520943403
Reviewed-on: https://go-review.googlesource.com/c/go/+/168037
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
A lot of the naked for loops begin like:
for {
    v := b.Control
    if v.Op != OpConstBool {
        break
    }
    ...
    return true
}
Instead, write them out in a more compact and readable way:
for v.Op == OpConstBool {
    ...
    return true
}
This requires the addition of two bytes.Buffer writers, as this helps us
make a decision based on future pieces of generated code. This probably
makes rulegen slightly slower, but that's not noticeable; the code
generation still takes ~3.5s on my laptop, excluding build time.
The "v := b.Control" declaration can be moved to the top of each
function. Even though the rules can modify b.Control when firing, they
also make the function return, so v can't be used again.
While at it, remove three unnecessary lines from the top of each
rewriteBlock func.
In total, this results in ~4k lines removed from the generated code, and
a slight improvement in readability.
Change-Id: I317e4c6a4842c64df506f4513375475fad2aeec5
Reviewed-on: https://go-review.googlesource.com/c/go/+/167399
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
A few examples (for accessing a slice of length 3):
s[-1] runtime error: index out of range [-1]
s[3] runtime error: index out of range [3] with length 3
s[-1:0] runtime error: slice bounds out of range [-1:]
s[3:0] runtime error: slice bounds out of range [3:0]
s[3:-1] runtime error: slice bounds out of range [:-1]
s[3:4] runtime error: slice bounds out of range [:4] with capacity 3
s[0:3:4] runtime error: slice bounds out of range [::4] with capacity 3
Note that in cases where there are multiple things wrong with the
indexes (e.g. s[3:-1]), we report one of those errors kind of
arbitrarily, currently the rightmost one.
An exhaustive set of examples is in issue30116[u].out in the CL.
The message text has the same prefix as the old message text. That
leads to slightly awkward phrasing but hopefully minimizes the chance
that code depending on the error text will break.
Increases the size of the go binary by 0.5% (amd64). The panic functions
take arguments in registers in order to keep the size of the compiled code
as small as possible.
Fixes #30116
Change-Id: Idb99a827b7888822ca34c240eca87b7e44a04fdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/161477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This reduces the number of extra bounds check hints we need to
insert. For example, rather than producing:
_ = v.Args[2]
x := v.Args[0]
y := v.Args[1]
z := v.Args[2]
We now produce:
z := v.Args[2]
x := v.Args[0]
y := v.Args[1]
This gets rid of about 7000 lines of code from the rewrite rules.
Change-Id: I1291cf0f82e8d035a6d65bce7dee6cedee04cbcd
Reviewed-on: https://go-review.googlesource.com/c/go/+/167397
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The sheer length of the generated rules files makes my
editor and git client unhappy.
This change is a small step towards shortening them.
We recognize a few magic variables during rulegen: b, config, fe, typ.
Of these, only b appears prone to false positives.
By tightening the heuristic and fixing one case in MIPS.rules,
we can make the heuristic reliable enough that it has no failures.
That allows us to remove the hedge assignments to _,
removing 3000 pointless lines of code.
Change-Id: I080cde5db28c8277cb3fd9ddcd829306c9a27785
Reviewed-on: https://go-review.googlesource.com/c/go/+/166979
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Normally this happens when combining a sign extension and a load. We
want the resulting combo-instruction to get the line number of the
load, not the line number of the sign extension.
For each rule, compute where we should get its line number by finding
a value on the match side that can fault. Use that line number for
all the new values created on the right-hand side.
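A small illustrative example of why the load is the right source of the
line number (plain Go, unrelated to the rule syntax):

package main

import "fmt"

// When a sign extension and a load are combined into one instruction, the
// load is the value that can fault, so the combined instruction should
// carry the load's line number.
func widen(p *int8) int32 {
    v := *p         // the load: this line can fault on a nil pointer
    return int32(v) // the sign extension
}

func main() {
    x := int8(-5)
    fmt.Println(widen(&x)) // -5
}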
Fixes #27201
Change-Id: I19b3c6f468fff1a3c0bfbce2d6581828557064a3
Reviewed-on: https://go-review.googlesource.com/c/156937
Reviewed-by: David Chase <drchase@google.com>
(SGTconst [c] (SRLconst _ [d])) && 0 <= int32(c) && uint32(d) <= 31 && 1<<(32-uint32(d)) <= int32(c) -> (MOVWconst [1])
This rule is problematic. The condition 1<<(32-uint32(d)) <= int32(c) is
meant to be true when c is greater than the largest possible value of the
right shift. But when d==1, 1<<(32-1) is negative as an int32, and the
comparison gives the wrong result.
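A minimal demonstration of the overflow (plain Go, not the rule condition
itself):

package main

import "fmt"

func main() {
    d := uint32(1)
    // With d == 1 the old condition compared 1<<(32-d), i.e. 1<<31, against
    // int32(c). As a signed 32-bit value that shift is negative, so the
    // "c is at least the largest possible shift result" check passed
    // spuriously.
    fmt.Println(int32(1) << (32 - d)) // -2147483648
}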
Rewrite the rules in a more direct way.
Fixes #29402.
Change-Id: I5940fc9538d9bc3a4bcae8aa34672867540dc60e
Reviewed-on: https://go-review.googlesource.com/c/155798
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
During development and debugging, I often want to
write noteRule(fmt.Sprintf(...)), and end up
manually adding the import to the generated code.
Let's just make it always available instead.
Change-Id: I1e2d47c98ba056e1b5da42e35fb6ad26f1d9cc3d
Reviewed-on: https://go-review.googlesource.com/c/145207
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Lack of a well-defined order between VarDef and related
address operations sometimes causes problems with store order
and write barrier transformations; glitches in the order are
made irreparable (by later optimizations) if the two parts of
the glitch straddle a split in the original block caused by
insertion of a write barrier diamond.
Fix this by creating a LocalAddr for addresses of locals
(what VarDef matters for) that takes a memory input to
help make the order explicit. Addr is modified to only
be legal for SB operand, so there is no overlap between
Addr and LocalAddr uses (there may be some downstream
cleanup from this).
Changes to generic.rules and rewrite.go ensure that codegen
tests continue to pass; CSE of LocalAddr is impaired, not
quite sure of the cost.
Fixes #26105.
Change-Id: Id4192b4440aa4e9d7ba54a465c456df9b530b515
Reviewed-on: https://go-review.googlesource.com/122483
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
On amd64, Ctz must include special handling of zeros.
But the prove pass has enough information to detect whether the input
is non-zero, allowing a more efficient lowering.
Introduce new CtzNonZero ops to capture and use this information.
Benchmark code:
func BenchmarkVisitBits(b *testing.B) {
    b.Run("8", func(b *testing.B) {
        for i := 0; i < b.N; i++ {
            x := uint8(0xff)
            for x != 0 {
                sink = bits.TrailingZeros8(x)
                x &= x - 1
            }
        }
    })
    // and similarly so for 16, 32, 64
}
name old time/op new time/op delta
VisitBits/8-8 7.27ns ± 4% 5.58ns ± 4% -23.35% (p=0.000 n=28+26)
VisitBits/16-8 14.7ns ± 7% 10.5ns ± 4% -28.43% (p=0.000 n=30+28)
VisitBits/32-8 27.6ns ± 8% 19.3ns ± 3% -30.14% (p=0.000 n=30+26)
VisitBits/64-8 44.0ns ±11% 38.0ns ± 5% -13.48% (p=0.000 n=30+30)
Fixes #25077
Change-Id: Ie6e5bd86baf39ee8a4ca7cadcf56d934e047f957
Reviewed-on: https://go-review.googlesource.com/109358
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Currently, each architecture lowers OpConvert to an arch-specific
OpXXXconvert. This is silly because OpConvert means the same thing on
all architectures and is logically a no-op that exists only to keep
track of conversions to and from unsafe.Pointer. Furthermore, lowering
it makes it harder to recognize in other analyses, particularly
liveness analysis.
This CL eliminates the lowering of OpConvert, leaving it as the
generic op until code generation time.
The main complexity here is that we still need to register-allocate
OpConvert operations. Currently, each arch's lowered OpConvert
specifies all GP registers in its register mask. Ideally, OpConvert
wouldn't affect value homing at all, and we could just copy the home
of OpConvert's source, but this can potentially home an OpConvert in a
LocalSlot, which neither regalloc nor stackalloc expect. Rather than
try to disentangle this assumption from regalloc and stackalloc, we
continue to register-allocate OpConvert, but teach regalloc that
OpConvert can be allocated to any allocatable GP register.
For #24543.
Change-Id: I795a6aee5fd94d4444a7bafac3838a400c9f7bb6
Reviewed-on: https://go-review.googlesource.com/108496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
In addition to looking nicer to the eye, this allows reformatting
and indenting rules without causing spurious changes to the generated
file, making it easier to spot functional changes.
After this CL, all CLs that will aggregate rules through
the new "|" functionality should cause no changes to the
generated files.
Change-Id: Icec283585ba8d7b91c79d76513c1d83dca4b30aa
Reviewed-on: https://go-review.googlesource.com/95216
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Add a compiler intrinsic for getcallersp, so we are able to get
rid of the argument (not done in this CL).
Change-Id: Ic38fda1c694f918328659ab44654198fb116668d
Reviewed-on: https://go-review.googlesource.com/69350
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
Add support for generating TBZ/TBNZ instructions.
The bit-test-and-branch pattern shows up in a number of
important places, including the runtime (gc bitmaps).
Before this change, there were 3 TB[N]?Z instructions in the Go tool,
all of which were in hand-written assembly. After this change, there
are 285. Also, the go1 benchmark binary gets about 4.5kB smaller.
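For context, a sketch of the kind of branch this targets (illustrative
only; whether it actually becomes TBZ/TBNZ depends on the compiler version
and surrounding code):

package main

import "fmt"

// A single-bit test feeding a branch, the shape behind the new
// TBZ/TBNZ instruction selection.
func bit3Set(x uint64) bool {
    if x&(1<<3) != 0 {
        return true
    }
    return false
}

func main() {
    fmt.Println(bit3Set(8), bit3Set(4)) // true false
}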
Fixes #21361
Change-Id: I170c138b852754b9b8df149966ca5e62e6dfa771
Reviewed-on: https://go-review.googlesource.com/54470
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Apply the fix in CL 44355 to MIPS.
ARM64 has these rules too, but they are commented out for performance reasons.
Fix the commented rules, in case they are enabled in the future.
Enhance the test so it triggers the failure on ARM and MIPS without
the fix.
Updates #20530.
Change-Id: I82d77448e3939a545fe519d0a29a164f8fa5417c
Reviewed-on: https://go-review.googlesource.com/44430
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Noticed while looking at #20356.
Cuts 160k (1%) off of the cmd/compile binary.
Change-Id: If2397bc6971d6be9be6975048adecb0b5efa6d66
Reviewed-on: https://go-review.googlesource.com/43501
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.
In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.
Though this is a big CL, most of the changes are
mechanical and uninteresting.
Interesting bits:
* Add new singleton globals to package types for the special
SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
and TTUPLE, for SSA tuple types.
ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
to package types.
* We had picked the name "types" in our rules for the handy
list of types provided by ssa.Config. That conflicted with
the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
and probably also some other duplicated Type methods
designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
and they were not particularly careful about types in general.
Of necessity, this CL switches them to use *types.Type;
it does not make them more type-accurate.
Unfortunately, using types.Type means initializing a bit
of the types universe.
This is prime for refactoring and improvement.
This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8)
Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10)
GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10)
GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9)
Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8)
Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10)
XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10)
[Geo mean] 40.5MB 40.3MB -0.68%
name old allocs/op new allocs/op delta
Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9)
Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10)
GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10)
Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10)
GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9)
Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8)
Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10)
XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10)
[Geo mean] 428k 428k -0.01%
Removing all the interface calls helps non-trivially with CPU, though.
name old time/op new time/op delta
Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96)
Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96)
GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96)
Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99)
GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97)
Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99)
Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94)
XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95)
[Geo mean] 178ms 173ms -2.65%
name old user-time/op new user-time/op delta
Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99)
Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95)
GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99)
Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96)
GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100)
Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92)
Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100)
XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97)
[Geo mean] 220ms 213ms -2.76%
Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>