mirror of https://github.com/golang/go.git
209 Commits
83e288f3db
cmd/compile: prevent constant folding of +/- when result is NaN
Missed as part of CL 221790. It isn't just * and / that can make NaNs.
Update #36400
Fixes #38359
Change-Id: I3fa562f772fe03b510793a6dc0cf6189c0c3e652
Reviewed-on: https://go-review.googlesource.com/c/go/+/227860
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
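For illustration (not from the CL): addition and subtraction of infinities also produce NaN, so + and - need the same folding guard as * and /:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	inf := math.Inf(1)
	// Folding these at compile time would bake a NaN into the binary:
	fmt.Println(inf+math.Inf(-1), inf-inf) // NaN NaN
}
```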
28157b3292
cmd/compile: start implementing strongly typed aux and auxint fields
Right now the Aux and AuxInt fields of ssa.Values are typed as
interface{} and int64, respectively. Each rule that uses these values
must cast them to the type they actually are (*obj.LSym, or int32, or
ValAndOff, etc.), use them, and then cast them back to interface{} or
int64.
We know for each opcode what the types of the Aux and AuxInt fields
should be. So let's modify the rule generator to declare the types to
be what we know they should be, autoconverting to and from the generic
types for us. That way we can make the rules more type safe.
It's difficult to make a single CL for this, so I've coopted the "=>"
token to indicate a rule that is strongly typed. "->" rules are
processed as before. That will let us migrate a few rules at a time in
separate CLs. Hopefully we can reach a state where all rules are
strongly typed and we can drop the distinction.
This CL changes just a few rules to get a feel for what this
transition would look like.
I've decided not to put explicit types in the rules. Explicit types
would make the rules somewhat clearer, but definitely more verbose.
In particular, the passthrough rules that don't modify the fields
in question would become verbose for no real reason.
Change-Id: I63a1b789ac5702e7caf7934cd49f784235d1d73d
Reviewed-on: https://go-review.googlesource.com/c/go/+/190197
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
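A sketch of what the migration looks like, based on rule shapes from the AMD64 rules file (illustrative; the exact rules may differ):

```
// Old-style rule: the AuxInt values c and d are raw int64s.
(ADDQconst [c] (ADDQconst [d] x)) && is32Bit(c+d) -> (ADDQconst [c+d] x)

// New-style "=>" rule: rulegen knows ADDQconst's AuxInt is an int32 and
// generates the conversions, so the rule works with the declared type.
(ADDQconst [c] (ADDQconst [d] x)) && is32Bit(int64(c)+int64(d)) => (ADDQconst [c+d] x)
```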
bfd569fcb0
cmd/compile: delete the floating point Greater and Geq ops
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be implemented
using Less and Leq.
Fixes #37316.
Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
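Conceptually, the replacement relies on the identity x > y ⟺ y < x, which holds for floats too: when either operand is NaN, both sides are false. Schematically (not quoted from the CL):

```
(Greater64F x y) -> (Less64F y x)
(Geq64F x y) -> (Leq64F y x)
```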
82253ddc7a
cmd/compile: constant fold CtzNN
Change-Id: I3ecd2c7ed3c8ae35c2bb9562aed09f7ade5c8cdd
Reviewed-on: https://go-review.googlesource.com/c/go/+/221609
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
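For illustration (not from the CL): bits.TrailingZeros* lower to the CtzNN SSA ops, so a constant operand can now be evaluated at compile time:

```go
package main

import (
	"fmt"
	"math/bits"
)

func main() {
	// The Ctz32 op for this constant folds to 3; no runtime computation.
	fmt.Println(bits.TrailingZeros32(8)) // 8 == 0b1000
}
```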
af7eafd150
cmd/compile: convert 386 port to use addressing modes pass (take 2)
Retrying CL 222782, with a fix that will hopefully stop the random crashing.
The issue with the previous CL is that it does pointer arithmetic in a way that may briefly generate an out-of-bounds pointer. If an interrupt happens to occur in that state, the referenced object may be collected incorrectly.
Suppose there was code that did s[x+c]. The previous CL had a rule to the effect of ptr + (x + c) -> c + (ptr + x). But ptr+x is not guaranteed to point to the same object as ptr. In contrast, ptr+(x+c) is guaranteed to point to the same object as ptr, because we would have already checked that x+c is in bounds.
For example, strconv.trim used to have this code:
MOVZX -0x1(BX)(DX*1), BP
CMPL $0x30, AL
After CL 222782, it had this code:
LEAL 0(BX)(DX*1), BP
CMPB $0x30, -0x1(BP)
An interrupt between those last two instructions could see BP pointing outside the backing store of the slice involved.
It's really hard to actually demonstrate a bug. First, you need to have an interrupt occur at exactly the right time. Then, there must be no other pointers to the object in question. Since the interrupted frame will be scanned conservatively, there can't even be a dead pointer in another register or on the stack. (In the example above, a bug can't happen because BX still holds the original pointer.) Then, the object in question needs to be collected (or at least scanned?) before the interrupted code continues.
This CL needs to handle load combining somewhat differently than CL 222782 because of the new restriction on arithmetic. That's the only real difference (other than removing the bad rules) from that old CL.
This bug is also present in the amd64 rewrite rules, and we haven't seen any crashing as a result. I will fix up that code similarly to this one in a separate CL.
Update #37881
Change-Id: I5f0d584d9bef4696bfe89a61ef0a27c8d507329f
Reviewed-on: https://go-review.googlesource.com/c/go/+/225798
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
cd9fd640db
cmd/compile: don't allow NaNs in floating-point constant ops
Trying this CL again, with a fixed test that allows platforms to disagree on the exact behavior of converting NaNs.
We store 32-bit floating point constants in a 64-bit field, by converting that 32-bit float to 64-bit float to store it, and convert it back to use it. That works for *almost* all floating-point constants. The exception is signaling NaNs. The round trip described above means we can't represent a 32-bit signaling NaN, because conversions strip the signaling bit.
To fix this issue, just forbid NaNs as floating-point constants in SSA form. This shouldn't affect any real-world code, as people seldom constant-propagate NaNs (except in test code).
Additionally, NaNs are somewhat underspecified (which of the many NaNs do you get when dividing 0/0?), so when cross-compiling there's a danger of using the compiler machine's NaN regime for some math, and the target machine's NaN regime for other math. Better to use the target machine's NaN regime always.
Update #36400
Change-Id: Idf203b688a15abceabbd66ba290d4e9f63619ecb
Reviewed-on: https://go-review.googlesource.com/c/go/+/221790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
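A demonstration of the round-trip problem described above (illustrative; 0x7f800001 is one signaling-NaN bit pattern, and the quieting behavior is that of typical hardware):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	f32 := math.Float32frombits(0x7f800001) // signaling NaN
	// Round-tripping through float64, as the compiler's 64-bit constant
	// field effectively does, quiets the NaN:
	back := math.Float32bits(float32(float64(f32)))
	fmt.Printf("%#08x\n", back) // quiet bit (bit 22) now set on typical hardware
}
```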
e37cc29863
cmd/compile: optimize integer-in-range checks
This CL incorporates code from CL 201206 by Josh Bleecher Snyder (thanks Josh).
This CL restores the integer-in-range optimizations in the SSA backend. The fuse pass is enhanced to detect inequalities that could be merged and fuse their associated blocks while the generic rules optimize them into a single unsigned comparison. For example, the inequality `x >= 0 && x < 10` will now be optimized to `unsigned(x) < 10`.
Overall this has a fairly positive impact on binary sizes.
name old time/op new time/op delta
Template 192ms ± 1% 192ms ± 1% ~ (p=0.757 n=17+18)
Unicode 76.6ms ± 2% 76.5ms ± 2% ~ (p=0.603 n=19+19)
GoTypes 694ms ± 1% 693ms ± 1% ~ (p=0.569 n=19+20)
Compiler 3.26s ± 0% 3.27s ± 0% +0.25% (p=0.000 n=20+20)
SSA 7.41s ± 0% 7.49s ± 0% +1.10% (p=0.000 n=17+19)
Flate 120ms ± 1% 120ms ± 1% +0.38% (p=0.003 n=19+19)
GoParser 152ms ± 1% 152ms ± 1% ~ (p=0.061 n=17+19)
Reflect 422ms ± 1% 425ms ± 2% +0.76% (p=0.001 n=18+20)
Tar 167ms ± 1% 167ms ± 0% ~ (p=0.730 n=18+19)
XML 233ms ± 4% 231ms ± 1% ~ (p=0.752 n=20+17)
LinkCompiler 927ms ± 8% 928ms ± 8% ~ (p=0.857 n=19+20)
ExternalLinkCompiler 1.81s ± 2% 1.81s ± 2% ~ (p=0.513 n=19+20)
LinkWithoutDebugCompiler 556ms ±10% 583ms ±13% +4.95% (p=0.007 n=20+20)
[Geo mean] 478ms 481ms +0.52%
name old user-time/op new user-time/op delta
Template 270ms ± 5% 269ms ± 7% ~ (p=0.925 n=20+20)
Unicode 134ms ± 7% 131ms ±14% ~ (p=0.593 n=18+20)
GoTypes 981ms ± 3% 987ms ± 2% +0.63% (p=0.049 n=19+18)
Compiler 4.50s ± 2% 4.50s ± 1% ~ (p=0.588 n=19+20)
SSA 10.6s ± 2% 10.6s ± 1% ~ (p=0.141 n=20+19)
Flate 164ms ± 8% 165ms ±10% ~ (p=0.738 n=20+20)
GoParser 202ms ± 5% 203ms ± 6% ~ (p=0.820 n=20+20)
Reflect 587ms ± 6% 597ms ± 3% ~ (p=0.087 n=20+18)
Tar 230ms ± 6% 228ms ± 8% ~ (p=0.569 n=19+20)
XML 311ms ± 6% 314ms ± 5% ~ (p=0.369 n=20+20)
LinkCompiler 878ms ± 8% 887ms ± 7% ~ (p=0.289 n=20+20)
ExternalLinkCompiler 1.60s ± 7% 1.60s ± 7% ~ (p=0.820 n=20+20)
LinkWithoutDebugCompiler 498ms ±12% 489ms ±11% ~ (p=0.398 n=20+20)
[Geo mean] 611ms 611ms +0.05%
name old alloc/op new alloc/op delta
Template 36.1MB ± 0% 36.0MB ± 0% -0.32% (p=0.000 n=20+20)
Unicode 28.3MB ± 0% 28.3MB ± 0% -0.03% (p=0.000 n=19+20)
GoTypes 121MB ± 0% 121MB ± 0% ~ (p=0.226 n=16+20)
Compiler 563MB ± 0% 563MB ± 0% ~ (p=0.166 n=20+19)
SSA 1.32GB ± 0% 1.33GB ± 0% +0.88% (p=0.000 n=20+19)
Flate 22.7MB ± 0% 22.7MB ± 0% -0.02% (p=0.033 n=19+20)
GoParser 27.9MB ± 0% 27.9MB ± 0% -0.02% (p=0.001 n=20+20)
Reflect 78.3MB ± 0% 78.2MB ± 0% -0.01% (p=0.019 n=20+20)
Tar 34.0MB ± 0% 34.0MB ± 0% -0.04% (p=0.000 n=20+20)
XML 43.9MB ± 0% 43.9MB ± 0% -0.07% (p=0.000 n=20+19)
LinkCompiler 205MB ± 0% 205MB ± 0% +0.44% (p=0.000 n=20+18)
ExternalLinkCompiler 223MB ± 0% 223MB ± 0% +0.03% (p=0.000 n=20+20)
LinkWithoutDebugCompiler 139MB ± 0% 142MB ± 0% +1.75% (p=0.000 n=20+20)
[Geo mean] 93.7MB 93.9MB +0.20%
name old allocs/op new allocs/op delta
Template 363k ± 0% 361k ± 0% -0.58% (p=0.000 n=20+19)
Unicode 329k ± 0% 329k ± 0% -0.06% (p=0.000 n=19+20)
GoTypes 1.28M ± 0% 1.28M ± 0% -0.01% (p=0.000 n=20+20)
Compiler 5.40M ± 0% 5.40M ± 0% -0.01% (p=0.000 n=20+20)
SSA 12.7M ± 0% 12.8M ± 0% +0.80% (p=0.000 n=20+20)
Flate 228k ± 0% 228k ± 0% ~ (p=0.194 n=20+20)
GoParser 295k ± 0% 295k ± 0% -0.04% (p=0.000 n=20+20)
Reflect 949k ± 0% 949k ± 0% -0.01% (p=0.000 n=20+20)
Tar 337k ± 0% 337k ± 0% -0.06% (p=0.000 n=20+20)
XML 418k ± 0% 417k ± 0% -0.17% (p=0.000 n=20+20)
LinkCompiler 553k ± 0% 554k ± 0% +0.22% (p=0.000 n=20+19)
ExternalLinkCompiler 1.52M ± 0% 1.52M ± 0% +0.27% (p=0.000 n=20+20)
LinkWithoutDebugCompiler 186k ± 0% 186k ± 0% +0.06% (p=0.000 n=20+20)
[Geo mean] 723k 723k +0.03%
name old text-bytes new text-bytes delta
HelloSize 828kB ± 0% 828kB ± 0% -0.01% (p=0.000 n=20+20)
name old data-bytes new data-bytes delta
HelloSize 13.4kB ± 0% 13.4kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 180kB ± 0% 180kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.23MB ± 0% 1.23MB ± 0% -0.33% (p=0.000 n=20+20)
file before after Δ %
addr2line 4320075 4311883 -8192 -0.190%
asm 5191932 5187836 -4096 -0.079%
buildid 2835338 2831242 -4096 -0.144%
compile 20531717 20569099 +37382 +0.182%
cover 5322511 5318415 -4096 -0.077%
dist 3723749 3719653 -4096 -0.110%
doc 4743515 4739419 -4096 -0.086%
fix 3413960 3409864 -4096 -0.120%
link 6690119 6686023 -4096 -0.061%
nm 4269616 4265520 -4096 -0.096%
pprof 14942189 14929901 -12288 -0.082%
trace 11807164 11790780 -16384 -0.139%
vet 8384104 8388200 +4096 +0.049%
go 15339076 15334980 -4096 -0.027%
total 132258257 132226007 -32250 -0.024%
Fixes #30645.
Change-Id: If551ac5996097f3685870d083151b5843170aab0
Reviewed-on: https://go-review.googlesource.com/c/go/+/165998
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
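A sketch of the transform on ordinary Go code (function names are mine):

```go
// Before: two comparisons and two branches.
func inRange(x int) bool {
	return x >= 0 && x < 10
}

// After fuse merges the blocks, the generic rules reduce the pair to a
// single unsigned comparison, equivalent to:
func inRangeOpt(x int) bool {
	return uint(x) < 10
}
```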
74f898360d
cmd/compile: constant fold SSA bool to int conversions
Shaves off a few instructions here and there.
file before after Δ %
go/types.s 322118 321851 -267 -0.083%
go/internal/gcimporter.s 34937 34909 -28 -0.080%
go/internal/gccgoimporter.s 56493 56474 -19 -0.034%
cmd/compile/internal/ssa.s 3926994 3927177 +183 +0.005%
total 18862670 18862539 -131 -0.001%
Change-Id: I724f32317b946b5138224808f85709d9c097a247
Reviewed-on: https://go-review.googlesource.com/c/go/+/221428
Reviewed-by: Keith Randall <khr@golang.org>
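The folding rule is presumably of this shape (shown for illustration; b2i is the existing bool-to-int64 helper in the ssa package):

```
(CvtBoolToUint8 (ConstBool [b])) -> (Const8 [b2i(b)])
```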
4ae1879dda
cmd/compile: document Move's type
Fixes #37381
Change-Id: I8abf07d6342c10fc8d52e11c6a70fb0ec09220d2
Reviewed-on: https://go-review.googlesource.com/c/go/+/220683
Reviewed-by: Cherry Zhang <cherryyz@google.com>
390c096ee9
cmd/compile: make clobber variadic
There are often many values to clobber. Allow passing them all in at once. The goal is increased rule readability.
As a bonus, it shrinks cmd/compile by ~97k, almost half a percent. Package SSA requires 1.2% less memory to compile.
The single-line changes were made via regex, and the remaining multi-line clobbers were manually combined.
Passes toolstash-check -all.
Change-Id: Ib310e9265d3616211f8192c9040b4c8933824d19
Reviewed-on: https://go-review.googlesource.com/c/go/+/220691
Reviewed-by: Michael Munday <mike.munday@ibm.com>
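Schematically (not a real rule), the readability win in a rule's condition:

```
// Before: one clobber call per dead value.
... && clobber(x) && clobber(y) && clobber(z) -> (NewOp ...)
// After: variadic clobber takes them all at once.
... && clobber(x, y, z) -> (NewOp ...)
```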
cb74dcc172
cmd/compile: remove Greater* and Geq* generic integer ops
The generic Greater and Geq ops can always be replaced with the Less and Leq ops. This CL therefore removes them. This simplifies the compiler since it reduces the number of operations that need handling in both code and rewrite rules. This will be especially true when adding control flow optimizations such as the integer-in-range optimizations in CL 165998.
Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040
Reviewed-on: https://go-review.googlesource.com/c/go/+/220417
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
a9f1ea4a83
Revert "cmd/compile: don't allow NaNs in floating-point constant ops"
This reverts CL 213477.
Reason for revert: tests are failing on linux-mips*-rtrk builders.
Change-Id: I8168f7450890233f1bd7e53930b73693c26d4dc0
Reviewed-on: https://go-review.googlesource.com/c/go/+/220897
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2aa7c6c548
cmd/compile: don't allow NaNs in floating-point constant ops
We store 32-bit floating point constants in a 64-bit field, by converting that 32-bit float to 64-bit float to store it, and convert it back to use it. That works for *almost* all floating-point constants. The exception is signaling NaNs. The round trip described above means we can't represent a 32-bit signaling NaN, because conversions strip the signaling bit.
To fix this issue, just forbid NaNs as floating-point constants in SSA form. This shouldn't affect any real-world code, as people seldom constant-propagate NaNs (except in test code).
Additionally, NaNs are somewhat underspecified (which of the many NaNs do you get when dividing 0/0?), so when cross-compiling there's a danger of using the compiler machine's NaN regime for some math, and the target machine's NaN regime for other math. Better to use the target machine's NaN regime always.
This has been a bug since 1.10, and there's an easy workaround (declare a global variable containing the signaling NaN pattern, and use that as the argument to math.Float32frombits) so we'll fix it in 1.15.
Fixes #36400
Update #36399
Change-Id: Icf155e743281560eda2eed953d19a829552ccfda
Reviewed-on: https://go-review.googlesource.com/c/go/+/213477
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
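A sketch of the workaround mentioned above (0x7f800001 is an assumed signaling-NaN pattern, for illustration):

```go
package main

import (
	"fmt"
	"math"
)

// A global variable keeps the bit pattern out of reach of constant propagation.
var snanBits uint32 = 0x7f800001

func main() {
	f := math.Float32frombits(snanBits) // built at run time, not folded
	fmt.Println(math.IsNaN(float64(f)))
}
```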
bc98e35b53
cmd/compile: avoid memmove -> SSA move rewrite when size is negative
We should panic in this situation. Rewriting to an SSA op just leads to a compiler panic.
Fixes #36259
Change-Id: I6e0bccbed7dd0fdac7ebae76b98a211947947386
Reviewed-on: https://go-review.googlesource.com/c/go/+/212405
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
49f8d45994
cmd/compile: delete duplicate rules
Add logic during rulegen to detect exact duplicates (after applying commutativity), and clean up existing duplicates.
Change-Id: I7179f40fc48e236c74b74f429ec9f0f100026530
Reviewed-on: https://go-review.googlesource.com/c/go/+/213699
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
440f7d6404
all: fix a bunch of misspellings
Change-Id: I5b909df0fd048cd66c5a27fca1b06466d3bcaac7
GitHub-Last-Rev:
6b1d5471b9
cmd/compile: add signed indivisibility by power of 2 rules
Commit
bb7890b85a
cmd/compile: absorb more Not ops into Neq* and Eq* ops
We absorbed Not into most integer comparisons but not into pointer and floating point equality checks. The new cases trigger more than 300 times during make.bash.
Change-Id: I77c6b31fcacde10da5470b73fc001a19521ce78d
Reviewed-on: https://go-review.googlesource.com/c/go/+/200618
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
70331a31ed
cmd/compile: fix typing of IData opcodes
The rules for extracting the interface data word don't leave the result typed correctly. If I do i.([1]*int)[0], the result should have type *int, not [1]*int. Using (IData x) for the result keeps the typing of the original top-level Value.
I don't think this would ever cause a real codegen bug, but fixing it at least makes the typing shown in ssa.html more consistent.
Change-Id: I239d821c394e58347639387981b0510d13b2f7b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/204042
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
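The case from the message, as source code (illustrative):

```go
package main

import "fmt"

func main() {
	var i interface{} = [1]*int{new(int)}
	p := i.([1]*int)[0] // p must be typed *int, not [1]*int
	fmt.Println(p != nil)
}
```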
9c2e7e8bed
cmd/compile: allow multiple SSA block control values
Control values are used to choose which successor of a block is jumped to. Typically a control value takes the form of a 'flags' value that represents the result of a comparison. Some architectures however use a variable in a register as a control value.
Up until now we have managed with a single control value per block. However some architectures (e.g. s390x and riscv64) have combined compare-and-branch instructions that take two variables in registers as parameters. To generate these instructions we need to support 2 control values per block.
This CL allows up to 2 control values to be used in a block in order to support the addition of compare-and-branch instructions. I have implemented s390x compare-and-branch instructions in a different CL.
Passes toolstash-check -all.
Results of compilebench:
name old time/op new time/op delta
Template 208ms ± 1% 209ms ± 1% ~ (p=0.289 n=20+20)
Unicode 83.7ms ± 1% 83.3ms ± 3% -0.49% (p=0.017 n=18+18)
GoTypes 748ms ± 1% 748ms ± 0% ~ (p=0.460 n=20+18)
Compiler 3.47s ± 1% 3.48s ± 1% ~ (p=0.070 n=19+18)
SSA 11.5s ± 1% 11.7s ± 1% +1.64% (p=0.000 n=19+18)
Flate 130ms ± 1% 130ms ± 1% ~ (p=0.588 n=19+20)
GoParser 160ms ± 1% 161ms ± 1% ~ (p=0.211 n=20+20)
Reflect 465ms ± 1% 467ms ± 1% +0.42% (p=0.007 n=20+20)
Tar 184ms ± 1% 185ms ± 2% ~ (p=0.087 n=18+20)
XML 253ms ± 1% 253ms ± 1% ~ (p=0.377 n=20+18)
LinkCompiler 769ms ± 2% 774ms ± 2% ~ (p=0.070 n=19+19)
ExternalLinkCompiler 3.59s ±11% 3.68s ± 6% ~ (p=0.072 n=20+20)
LinkWithoutDebugCompiler 446ms ± 5% 454ms ± 3% +1.79% (p=0.002 n=19+20)
StdCmd 26.0s ± 2% 26.0s ± 2% ~ (p=0.799 n=20+20)
name old user-time/op new user-time/op delta
Template 238ms ± 5% 240ms ± 5% ~ (p=0.142 n=20+20)
Unicode 105ms ±11% 106ms ±10% ~ (p=0.512 n=20+20)
GoTypes 876ms ± 2% 873ms ± 4% ~ (p=0.647 n=20+19)
Compiler 4.17s ± 2% 4.19s ± 1% ~ (p=0.093 n=20+18)
SSA 13.9s ± 1% 14.1s ± 1% +1.45% (p=0.000 n=18+18)
Flate 145ms ±13% 146ms ± 5% ~ (p=0.851 n=20+18)
GoParser 185ms ± 5% 188ms ± 7% ~ (p=0.174 n=20+20)
Reflect 534ms ± 3% 538ms ± 2% ~ (p=0.105 n=20+18)
Tar 215ms ± 4% 211ms ± 9% ~ (p=0.079 n=19+20)
XML 295ms ± 6% 295ms ± 5% ~ (p=0.968 n=20+20)
LinkCompiler 832ms ± 4% 837ms ± 7% ~ (p=0.707 n=17+20)
ExternalLinkCompiler 1.58s ± 8% 1.60s ± 4% ~ (p=0.296 n=20+19)
LinkWithoutDebugCompiler 478ms ±12% 489ms ±10% ~ (p=0.429 n=20+20)
name old object-bytes new object-bytes delta
Template 559kB ± 0% 559kB ± 0% ~ (all equal)
Unicode 216kB ± 0% 216kB ± 0% ~ (all equal)
GoTypes 2.03MB ± 0% 2.03MB ± 0% ~ (all equal)
Compiler 8.07MB ± 0% 8.07MB ± 0% -0.06% (p=0.000 n=20+20)
SSA 27.1MB ± 0% 27.3MB ± 0% +0.89% (p=0.000 n=20+20)
Flate 343kB ± 0% 343kB ± 0% ~ (all equal)
GoParser 441kB ± 0% 441kB ± 0% ~ (all equal)
Reflect 1.36MB ± 0% 1.36MB ± 0% ~ (all equal)
Tar 487kB ± 0% 487kB ± 0% ~ (all equal)
XML 632kB ± 0% 632kB ± 0% ~ (all equal)
name old export-bytes new export-bytes delta
Template 18.5kB ± 0% 18.5kB ± 0% ~ (all equal)
Unicode 7.92kB ± 0% 7.92kB ± 0% ~ (all equal)
GoTypes 35.0kB ± 0% 35.0kB ± 0% ~ (all equal)
Compiler 109kB ± 0% 110kB ± 0% +0.72% (p=0.000 n=20+20)
SSA 137kB ± 0% 138kB ± 0% +0.58% (p=0.000 n=20+20)
Flate 4.89kB ± 0% 4.89kB ± 0% ~ (all equal)
GoParser 8.49kB ± 0% 8.49kB ± 0% ~ (all equal)
Reflect 11.4kB ± 0% 11.4kB ± 0% ~ (all equal)
Tar 10.5kB ± 0% 10.5kB ± 0% ~ (all equal)
XML 16.7kB ± 0% 16.7kB ± 0% ~ (all equal)
name old text-bytes new text-bytes delta
HelloSize 761kB ± 0% 761kB ± 0% ~ (all equal)
CmdGoSize 10.8MB ± 0% 10.8MB ± 0% ~ (all equal)
name old data-bytes new data-bytes delta
HelloSize 10.7kB ± 0% 10.7kB ± 0% ~ (all equal)
CmdGoSize 312kB ± 0% 312kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 122kB ± 0% 122kB ± 0% ~ (all equal)
CmdGoSize 146kB ± 0% 146kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.13MB ± 0% 1.13MB ± 0% ~ (all equal)
CmdGoSize 15.1MB ± 0% 15.1MB ± 0% ~ (all equal)
Change-Id: I3cc2f9829a109543d9a68be4a21775d2d3e9801f
Reviewed-on: https://go-review.googlesource.com/c/go/+/196557
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
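An illustrative sketch (pseudo-assembly, not from this CL) of what a second control value enables:

```
// One control value: a compare produces flags, the branch consumes them.
CMP  R1, R2            // control value: flags
BLT  target

// Two control values: a fused compare-and-branch (as on s390x or riscv64)
// consumes both registers directly, so the block needs R1 and R2 as controls.
CRJ  LT, R1, R2, target
```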
f41451e7eb
compile: prefer an AND instead of SHR+SHL instructions
On modern 64-bit CPUs a SHR, SHL or AND instruction takes 1 cycle to execute.
A pair of shifts that operate on the same register will take 2 cycles
and must wait for the input register value to be available.
Large constants used to mask the high bits of a register with an AND
instruction cannot be encoded as an immediate in the AND instruction
on amd64 and therefore need to be loaded into a register with a MOV
instruction.
However that MOV instruction is not dependent on the output register and
on many CPUs does not compete with the AND or shift instructions for
execution ports.
Using a pair of shifts to mask high bits instead of an AND to mask high
bits of a register has a shorter encoding and uses one less general
purpose register but is slower due to taking one clock cycle longer
if there is no register pressure that would make the AND variant need to
generate a spill.
For example the instructions emitted for (x & 1 << 63) before this CL are:
48c1ea3f SHRQ $0x3f, DX
48c1e23f SHLQ $0x3f, DX
after this CL the instructions are the same as GCC and LLVM use:
48b80000000000000080 MOVQ $0x8000000000000000, AX
4821d0 ANDQ DX, AX
Some platforms such as arm64 already have SSA optimization rules to fuse
two shift instructions back into an AND.
Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark:
var GlobalU uint
func BenchmarkAndHighBits(b *testing.B) {
x := uint(0)
for i := 0; i < b.N; i++ {
x &= 1 << 63
}
GlobalU = x
}
amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:
name old time/op new time/op delta
AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25)
'go run run.go -all_codegen -v codegen' passes with the following adjustments:
ARM64: The BFXIL pattern ((x << lc) >> rc | y & ac) needed adjustment
since ORshiftRL generation fusing '>> rc' and '|' interferes
with matching ((x << lc) >> rc) to generate UBFX. Previously
ORshiftLL was created first using the shifts generated for (y & ac).
S390X: Add rules for abs and copysign to match use of AND instead of SHIFTs.
Updates #33826
Updates #32781
Change-Id: I5a59f6239660d53c029cd22dfb44ddf39f93a56c
Reviewed-on: https://go-review.googlesource.com/c/go/+/196810
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
34fe8295c5
Revert "compile: prefer an AND instead of SHR+SHL instructions"
This reverts CL 194297.
Reason for revert: introduced register allocation failures on PPC64LE builders.
Updates #33826
Updates #32781
Updates #34468
Change-Id: I7d0b55df8cdf8e7d2277f1814299b083c2692e48
Reviewed-on: https://go-review.googlesource.com/c/go/+/196957
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
4e2b84ffc5
compile: prefer an AND instead of SHR+SHL instructions
On modern 64-bit CPUs a SHR, SHL or AND instruction takes 1 cycle to execute.
A pair of shifts that operate on the same register will take 2 cycles
and must wait for the input register value to be available.
Large constants used to mask the high bits of a register with an AND
instruction cannot be encoded as an immediate in the AND instruction
on amd64 and therefore need to be loaded into a register with a MOV
instruction.
However that MOV instruction is not dependent on the output register and
on many CPUs does not compete with the AND or shift instructions for
execution ports.
Using a pair of shifts to mask high bits instead of an AND to mask high
bits of a register has a shorter encoding and uses one less general
purpose register but is slower due to taking one clock cycle longer
if there is no register pressure that would make the AND variant need to
generate a spill.
For example the instructions emitted for (x & 1 << 63) before this CL are:
48c1ea3f SHRQ $0x3f, DX
48c1e23f SHLQ $0x3f, DX
after this CL the instructions are the same as GCC and LLVM use:
48b80000000000000080 MOVQ $0x8000000000000000, AX
4821d0 ANDQ DX, AX
Some platforms such as arm64 already have SSA optimization rules to fuse
two shift instructions back into an AND.
Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark:
var GlobalU uint
func BenchmarkAndHighBits(b *testing.B) {
x := uint(0)
for i := 0; i < b.N; i++ {
x &= 1 << 63
}
GlobalU = x
}
amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:
name old time/op new time/op delta
AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25)
'go run run.go -all_codegen -v codegen' passes with the following adjustments:
ARM64: The BFXIL pattern ((x << lc) >> rc | y & ac) needed adjustment
since ORshiftRL generation fusing '>> rc' and '|' interferes
with matching ((x << lc) >> rc) to generate UBFX. Previously
ORshiftLL was created first using the shifts generated for (y & ac).
S390X: Add rules for abs and copysign to match use of AND instead of SHIFTs.
Updates #33826
Updates #32781
Change-Id: I43227da76b625de03fbc51117162b23b9c678cdb
Reviewed-on: https://go-review.googlesource.com/c/go/+/194297
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
5bb59b6d16
Revert "compile: prefer an AND instead of SHR+SHL instructions"
This reverts commit
9ec7074a94
compile: prefer an AND instead of SHR+SHL instructions
On modern 64-bit CPUs a SHR, SHL or AND instruction takes 1 cycle to execute.
A pair of shifts that operate on the same register will take 2 cycles
and must wait for the input register value to be available.
Large constants used to mask the high bits of a register with an AND
instruction cannot be encoded as an immediate in the AND instruction
on amd64 and therefore need to be loaded into a register with a MOV
instruction.
However that MOV instruction is not dependent on the output register and
on many CPUs does not compete with the AND or shift instructions for
execution ports.
Using a pair of shifts to mask high bits instead of an AND to mask high
bits of a register has a shorter encoding and uses one less general
purpose register but is slower due to taking one clock cycle longer
if there is no register pressure that would make the AND variant need to
generate a spill.
For example the instructions emitted for (x & 1 << 63) before this CL are:
48c1ea3f SHRQ $0x3f, DX
48c1e23f SHLQ $0x3f, DX
after this CL the instructions are the same as GCC and LLVM use:
48b80000000000000080 MOVQ $0x8000000000000000, AX
4821d0 ANDQ DX, AX
Some platforms such as arm64 already have SSA optimization rules to fuse
two shift instructions back into an AND.
Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark:
var GlobalU uint
func BenchmarkAndHighBits(b *testing.B) {
x := uint(0)
for i := 0; i < b.N; i++ {
x &= 1 << 63
}
GlobalU = x
}
amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:
name old time/op new time/op delta
AndHighBits-4 0.61ns ± 6% 0.42ns ± 6% -31.42% (p=0.000 n=25+25)
Updates #33826
Updates #32781
Change-Id: I862d3587446410c447b9a7265196b57f85358633
Reviewed-on: https://go-review.googlesource.com/c/go/+/191780
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
9675f81928
cmd/compile: add more Neg/Com optimizations
This is a grab-bag of minor optimizations. While we're here, document the c != -(1<<31) constraints better (#31888).
file before after Δ %
go 14669924 14665828 -4096 -0.028%
asm 4867088 4858896 -8192 -0.168%
compile 23988320 23984224 -4096 -0.017%
cover 5210856 5206760 -4096 -0.079%
link 6084376 6080280 -4096 -0.067%
total 132181084 132156508 -24576 -0.019%
file before after Δ %
archive/tar.a 516708 516702 -6 -0.001%
bufio.a 182200 181974 -226 -0.124%
bytes.a 217624 216890 -734 -0.337%
cmd/compile/internal/gc.a 8865412 8865228 -184 -0.002%
cmd/compile/internal/ssa.a 29921002 29933976 +12974 +0.043%
cmd/go/internal/modfetch/codehost.a 530602 530430 -172 -0.032%
cmd/go/internal/modfetch.a 679664 679578 -86 -0.013%
cmd/go/internal/modfile.a 411102 410928 -174 -0.042%
cmd/go/internal/test.a 315218 315126 -92 -0.029%
cmd/go/internal/tlog.a 183242 183256 +14 +0.008%
cmd/go/internal/txtar.a 23148 23060 -88 -0.380%
cmd/internal/bio.a 132064 132060 -4 -0.003%
cmd/internal/buildid.a 107174 107172 -2 -0.002%
cmd/internal/edit.a 33208 33354 +146 +0.440%
cmd/internal/obj/arm.a 416488 416432 -56 -0.013%
cmd/internal/obj/arm64.a 2772626 2772622 -4 -0.000%
cmd/internal/obj/x86.a 923186 923114 -72 -0.008%
cmd/internal/obj.a 679834 679836 +2 +0.000%
cmd/internal/objfile.a 358374 358372 -2 -0.001%
cmd/internal/test2json.a 67482 67434 -48 -0.071%
cmd/link/internal/ld.a 2836280 2836110 -170 -0.006%
cmd/link/internal/loadpe.a 148234 147736 -498 -0.336%
cmd/link/internal/objfile.a 144534 144434 -100 -0.069%
cmd/link/internal/ppc64.a 170876 170382 -494 -0.289%
cmd/vendor/github.com/google/pprof/internal/elfexec.a 49896 49892 -4 -0.008%
cmd/vendor/github.com/google/pprof/internal/graph.a 437478 437404 -74 -0.017%
cmd/vendor/github.com/google/pprof/profile.a 902040 902044 +4 +0.000%
cmd/vendor/github.com/ianlancetaylor/demangle.a 1217856 1217854 -2 -0.000%
cmd/vendor/golang.org/x/arch/x86/x86asm.a 561332 560684 -648 -0.115%
cmd/vendor/golang.org/x/crypto/ssh/terminal.a 153788 153784 -4 -0.003%
cmd/vendor/golang.org/x/sys/unix.a 1043894 1043814 -80 -0.008%
cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.a 288458 288414 -44 -0.015%
compress/flate.a 369024 368132 -892 -0.242%
crypto/aes.a 109058 108968 -90 -0.083%
crypto/cipher.a 150410 150544 +134 +0.089%
crypto/elliptic.a 323572 323758 +186 +0.057%
crypto/md5.a 50868 50788 -80 -0.157%
crypto/rsa.a 195292 195214 -78 -0.040%
crypto/sha1.a 70936 70858 -78 -0.110%
crypto/sha256.a 75316 75236 -80 -0.106%
crypto/sha512.a 84846 84768 -78 -0.092%
crypto/subtle.a 6520 6514 -6 -0.092%
crypto/tls.a 1654916 1654852 -64 -0.004%
crypto/x509.a 888674 888638 -36 -0.004%
database/sql.a 730280 730198 -82 -0.011%
debug/gosym.a 184936 184862 -74 -0.040%
debug/macho.a 272138 272136 -2 -0.001%
debug/plan9obj.a 78444 78368 -76 -0.097%
encoding/base64.a 82126 81882 -244 -0.297%
encoding/binary.a 187196 187150 -46 -0.025%
encoding/gob.a 897868 897870 +2 +0.000%
encoding/json.a 659934 659832 -102 -0.015%
encoding/pem.a 59138 58870 -268 -0.453%
encoding/xml.a 694054 693300 -754 -0.109%
fmt.a 484518 484196 -322 -0.066%
go/format.a 33962 33994 +32 +0.094%
go/printer.a 437132 437134 +2 +0.000%
go/scanner.a 141774 141772 -2 -0.001%
go/token.a 125130 125126 -4 -0.003%
go/types.a 2192086 2191994 -92 -0.004%
html/template.a 599038 598770 -268 -0.045%
html.a 184842 184710 -132 -0.071%
image/draw.a 129592 129238 -354 -0.273%
image/gif.a 171824 171716 -108 -0.063%
image/internal/imageutil.a 20282 19272 -1010 -4.980%
image/jpeg.a 275608 275114 -494 -0.179%
image/png.a 343416 343620 +204 +0.059%
image.a 362244 362210 -34 -0.009%
index/suffixarray.a 113040 112954 -86 -0.076%
internal/trace.a 518972 518838 -134 -0.026%
math/big.a 1012670 1012354 -316 -0.031%
math.a 219338 219334 -4 -0.002%
mime/multipart.a 178854 178502 -352 -0.197%
mime/quotedprintable.a 49226 48936 -290 -0.589%
net/http/cgi.a 172328 172324 -4 -0.002%
net/http.a 4000180 3999732 -448 -0.011%
net.a 1858330 1858252 -78 -0.004%
path/filepath.a 107496 107498 +2 +0.002%
reflect.a 1439776 1439994 +218 +0.015%
regexp/syntax.a 459430 459432 +2 +0.000%
regexp.a 416394 416400 +6 +0.001%
runtime/debug.a 42106 42100 -6 -0.014%
runtime/pprof/internal/profile.a 608718 608720 +2 +0.000%
runtime/pprof.a 355474 355476 +2 +0.001%
runtime.a 3555748 3555796 +48 +0.001%
strconv.a 294432 294410 -22 -0.007%
strings.a 292148 292090 -58 -0.020%
syscall.a 859682 859470 -212 -0.025%
text/tabwriter.a 65614 65148 -466 -0.710%
vendor/golang.org/x/crypto/chacha20poly1305.a 126736 126728 -8 -0.006%
vendor/golang.org/x/crypto/cryptobyte.a 269112 269114 +2 +0.001%
vendor/golang.org/x/crypto/internal/chacha20.a 61842 61262 -580 -0.938%
vendor/golang.org/x/crypto/poly1305.a 47410 47404 -6 -0.013%
vendor/golang.org/x/net/dns/dnsmessage.a 628700 628012 -688 -0.109%
vendor/golang.org/x/net/idna.a 237678 237826 +148 +0.062%
vendor/golang.org/x/net/route.a 187852 187458 -394 -0.210%
vendor/golang.org/x/sys/unix.a 1022426 1022348 -78 -0.008%
vendor/golang.org/x/text/transform.a 117954 118104 +150 +0.127%
vendor/golang.org/x/text/unicode/bidi.a 291398 291404 +6 +0.002%
vendor/golang.org/x/text/unicode/norm.a 534640 534540 -100 -0.019%
total 128945190 128945128 -62 -0.000%
Change-Id: I346dc31356d5ef7774b824cf202169610bd26432
Reviewed-on: https://go-review.googlesource.com/c/go/+/175778
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
b8cbcacabe
cmd/compile: optimize more pointer comparisons
The existing pointer comparison optimizations don't include pointer arithmetic. Add them.
These rules trigger a few times in std cmd, while compiling:
time.Duration.String
cmd/go/internal/tlog.NodeHash
crypto/tls.ticketKeyFromBytes (3 times)
crypto/elliptic.(*p256Point).p256ScalarMult (15 times!)
crypto/elliptic.initTable
These weird comparisons occur when using the copy builtin, which does a pointer comparison between src and dst. This also happens to fix #32454, by optimizing enough early on that all values can be eliminated.
Fixes #32454
Change-Id: I799d45743350bddd15a295dc1e12f8d03c11d1c6
Reviewed-on: https://go-review.googlesource.com/c/go/+/180940
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
abda0a6a92
cmd/compile: remove redundant rules
EqPtr and NeqPtr are marked as commutative, so the transformations for these rules are already generated by the preceding two lines.
Change-Id: Ibecba5c8e54d9df00c84e1dae7e5d8cb53eeff43
Reviewed-on: https://go-review.googlesource.com/c/go/+/180939
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
c5f142fa9f
cmd/compile: optimize bitset tests
The assembly output for x & c == c, where c is a power of 2:
MOVQ "".set+8(SP), AX
ANDQ $8, AX
CMPQ AX, $8
SETEQ "".~r2+24(SP)
With the optimization using bitset:
MOVQ "".set+8(SP), AX
BTL $3, AX
SETCS "".~r2+24(SP)
The output is one instruction shorter. However, there is no speed improvement:
name old time/op new time/op delta
AllBitSet-8 0.35ns ± 0% 0.35ns ± 0% ~ (all equal)
Fixes #31904
Change-Id: I5dca4e410bf45716ed2145e3473979ec997e35d4
Reviewed-on: https://go-review.googlesource.com/c/go/+/175957
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
4d9dd35806
cmd/compile: add signed divisibility rules
"Division by invariant integers using multiplication" paper by Granlund and Montgomery contains a method for directly computing divisibility (x%c == 0 for c constant) by means of the modular inverse. The method is further elaborated in "Hacker's Delight" by Warren Section 10-17 This general rule can compute divisibilty by one multiplication, and add and a compare for odd divisors and an additional rotate for even divisors. To apply the divisibility rule, we must take into account the rules to rewrite x%c = x-((x/c)*c) and (x/c) for c constant on the first optimization pass "opt". This complicates the matching as we want to match only in the cases where the result of (x/c) is not also needed. So, we must match on the expanded form of (x/c) in the expression x == c*(x/c) in the "late opt" pass after common subexpresion elimination. Note, that if there is an intermediate opt pass introduced in the future we could simplify these rules by delaying the magic division rewrite to "late opt" and matching directly on (x/c) in the intermediate opt pass. On amd64, the divisibility check is 30-45% faster. name old time/op new time/op delta` DivisiblePow2constI64-4 0.83ns ± 1% 0.82ns ± 0% ~ (p=0.079 n=5+4) DivisibleconstI64-4 2.68ns ± 1% 1.87ns ± 0% -30.33% (p=0.000 n=5+4) DivisibleWDivconstI64-4 2.69ns ± 1% 2.71ns ± 3% ~ (p=1.000 n=5+5) DivisiblePow2constI32-4 1.15ns ± 1% 1.15ns ± 0% ~ (p=0.238 n=5+4) DivisibleconstI32-4 2.24ns ± 1% 1.20ns ± 0% -46.48% (p=0.016 n=5+4) DivisibleWDivconstI32-4 2.27ns ± 1% 2.27ns ± 1% ~ (p=0.683 n=5+5) DivisiblePow2constI16-4 0.81ns ± 1% 0.82ns ± 1% ~ (p=0.135 n=5+5) DivisibleconstI16-4 2.11ns ± 2% 1.20ns ± 1% -42.99% (p=0.008 n=5+5) DivisibleWDivconstI16-4 2.23ns ± 0% 2.27ns ± 2% +1.79% (p=0.029 n=4+4) DivisiblePow2constI8-4 0.81ns ± 1% 0.81ns ± 1% ~ (p=0.286 n=5+5) DivisibleconstI8-4 2.13ns ± 3% 1.19ns ± 1% -43.84% (p=0.008 n=5+5) DivisibleWDivconstI8-4 2.23ns ± 1% 2.25ns ± 1% ~ (p=0.183 n=5+5) Fixes #30282 Fixes #15806 Change-Id: Id20d78263a4fdfe0509229ae4dfa2fede83fc1d0 Reviewed-on: https://go-review.googlesource.com/c/go/+/173998 Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
a28a942768
cmd/compile: add unsigned divisibility rules
"Division by invariant integers using multiplication" paper by Granlund and Montgomery contains a method for directly computing divisibility (x%c == 0 for c constant) by means of the modular inverse. The method is further elaborated in "Hacker's Delight" by Warren Section 10-17 This general rule can compute divisibilty by one multiplication and a compare for odd divisors and an additional rotate for even divisors. To apply the divisibility rule, we must take into account the rules to rewrite x%c = x-((x/c)*c) and (x/c) for c constant on the first optimization pass "opt". This complicates the matching as we want to match only in the cases where the result of (x/c) is not also available. So, we must match on the expanded form of (x/c) in the expression x == c*(x/c) in the "late opt" pass after common subexpresion elimination. Note, that if there is an intermediate opt pass introduced in the future we could simplify these rules by delaying the magic division rewrite to "late opt" and matching directly on (x/c) in the intermediate opt pass. Additional rules to lower the generic RotateLeft* ops were also applied. On amd64, the divisibility check is 25-50% faster. name old time/op new time/op delta DivconstI64-4 2.08ns ± 0% 2.08ns ± 1% ~ (p=0.881 n=5+5) DivisibleconstI64-4 2.67ns ± 0% 2.67ns ± 1% ~ (p=1.000 n=5+5) DivisibleWDivconstI64-4 2.67ns ± 0% 2.67ns ± 0% ~ (p=0.683 n=5+5) DivconstU64-4 2.08ns ± 1% 2.08ns ± 1% ~ (p=1.000 n=5+5) DivisibleconstU64-4 2.77ns ± 1% 1.55ns ± 2% -43.90% (p=0.008 n=5+5) DivisibleWDivconstU64-4 2.99ns ± 1% 2.99ns ± 1% ~ (p=1.000 n=5+5) DivconstI32-4 1.53ns ± 2% 1.53ns ± 0% ~ (p=1.000 n=5+5) DivisibleconstI32-4 2.23ns ± 0% 2.25ns ± 3% ~ (p=0.167 n=5+5) DivisibleWDivconstI32-4 2.27ns ± 1% 2.27ns ± 1% ~ (p=0.429 n=5+5) DivconstU32-4 1.78ns ± 0% 1.78ns ± 1% ~ (p=1.000 n=4+5) DivisibleconstU32-4 2.52ns ± 2% 1.26ns ± 0% -49.96% (p=0.000 n=5+4) DivisibleWDivconstU32-4 2.63ns ± 0% 2.85ns ±10% +8.29% (p=0.016 n=4+5) DivconstI16-4 1.54ns ± 0% 1.54ns ± 0% ~ (p=0.333 n=4+5) DivisibleconstI16-4 2.10ns ± 0% 2.10ns ± 1% ~ (p=0.571 n=4+5) DivisibleWDivconstI16-4 2.22ns ± 0% 2.23ns ± 1% ~ (p=0.556 n=4+5) DivconstU16-4 1.09ns ± 0% 1.01ns ± 1% -7.74% (p=0.000 n=4+5) DivisibleconstU16-4 1.83ns ± 0% 1.26ns ± 0% -31.52% (p=0.008 n=5+5) DivisibleWDivconstU16-4 1.88ns ± 0% 1.89ns ± 1% ~ (p=0.365 n=5+5) DivconstI8-4 1.54ns ± 1% 1.54ns ± 1% ~ (p=1.000 n=5+5) DivisibleconstI8-4 2.10ns ± 0% 2.11ns ± 0% ~ (p=0.238 n=5+4) DivisibleWDivconstI8-4 2.22ns ± 0% 2.23ns ± 2% ~ (p=0.762 n=5+5) DivconstU8-4 0.92ns ± 1% 0.94ns ± 1% +2.65% (p=0.008 n=5+5) DivisibleconstU8-4 1.66ns ± 0% 1.26ns ± 1% -24.28% (p=0.008 n=5+5) DivisibleWDivconstU8-4 1.79ns ± 0% 1.80ns ± 1% ~ (p=0.079 n=4+5) A follow-up change will address the signed division case. Updates #30282 Change-Id: I7e995f167179aa5c76bb10fbcbeb49c520943403 Reviewed-on: https://go-review.googlesource.com/c/go/+/168037 Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |
44343c777c
cmd/compile: add signed divisibility by power of 2 rules
For powers of two (c=1<<k), the divisibility check x%c == 0 can be made just by checking the trailing zeroes via a mask x&(c-1) == 0, even for signed integers. This avoids division fix-ups when just the divisibility check is needed.
To apply this rule, we match on the fixed-up version of the division. This is necessary because the mod and division rewrite rules are already applied during the initial opt pass.
The speed-up on amd64 due to elimination of unnecessary fix-up code is ~55%:
name old time/op new time/op delta
DivconstI64-4 2.08ns ± 0% 2.09ns ± 1% ~ (p=0.730 n=5+5)
DivisiblePow2constI64-4 1.78ns ± 1% 0.81ns ± 1% -54.66% (p=0.008 n=5+5)
DivconstU64-4 2.08ns ± 0% 2.08ns ± 0% ~ (p=0.683 n=5+5)
DivconstI32-4 1.53ns ± 0% 1.53ns ± 1% ~ (p=0.968 n=4+5)
DivisiblePow2constI32-4 1.79ns ± 1% 0.81ns ± 1% -54.97% (p=0.008 n=5+5)
DivconstU32-4 1.78ns ± 1% 1.80ns ± 2% ~ (p=0.206 n=5+5)
DivconstI16-4 1.54ns ± 2% 1.54ns ± 0% ~ (p=0.238 n=5+4)
DivisiblePow2constI16-4 1.78ns ± 0% 0.81ns ± 1% -54.72% (p=0.000 n=4+5)
DivconstU16-4 1.00ns ± 5% 1.01ns ± 1% ~ (p=0.119 n=5+5)
DivconstI8-4 1.54ns ± 0% 1.54ns ± 2% ~ (p=0.571 n=4+5)
DivisiblePow2constI8-4 1.78ns ± 0% 0.82ns ± 8% -53.71% (p=0.008 n=5+5)
DivconstU8-4 0.93ns ± 1% 0.93ns ± 1% ~ (p=0.643 n=5+5)
A follow-up CL will address the general case of x%c == 0 for signed integers.
Updates #15806
Change-Id: Iabadbbe369b6e0998c8ce85d038ebc236142e42a
Reviewed-on: https://go-review.googlesource.com/c/go/+/173557
Run-TryBot: Brian Kessler <brian.m.kessler@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
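For illustration (function name is mine):

```go
// For c = 1<<k the signed divisibility check needs no sign fix-up at all:
func multipleOf8(x int64) bool {
	return x%8 == 0 // compiled as a mask test, x&7 == 0
}
```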
17615969b6
Revert "cmd/compile: add signed divisibility by power of 2 rules"
This reverts CL 168038 (git
68819fb6d2
cmd/compile: add signed divisibility by power of 2 rules
For powers of two (c=1<<k), the divisibility check x%c == 0 can be made just by checking the trailing zeroes via a mask x&(c-1)==0, even for signed integers. This avoids division fixups when just the divisibility check is needed.
To apply this rule the generic divisibility rule for A%B = A-(A/B*B) is disabled on the "opt" pass, but this does not affect generated code as this rule is applied later.
The speed-up on amd64 due to elimination of unnecessary fixup code is ~55%:
name old time/op new time/op delta
DivconstI64-4 2.08ns ± 0% 2.07ns ± 0% ~ (p=0.079 n=5+5)
DivisiblePow2constI64-4 1.78ns ± 1% 0.81ns ± 1% -54.55% (p=0.008 n=5+5)
DivconstU64-4 2.08ns ± 0% 2.08ns ± 0% ~ (p=1.000 n=5+5)
DivconstI32-4 1.53ns ± 0% 1.53ns ± 0% ~ (all equal)
DivisiblePow2constI32-4 1.79ns ± 1% 0.81ns ± 4% -54.75% (p=0.008 n=5+5)
DivconstU32-4 1.78ns ± 1% 1.78ns ± 1% ~ (p=1.000 n=5+5)
DivconstI16-4 1.54ns ± 2% 1.53ns ± 0% ~ (p=0.333 n=5+4)
DivisiblePow2constI16-4 1.78ns ± 0% 0.79ns ± 1% -55.39% (p=0.000 n=4+5)
DivconstU16-4 1.00ns ± 5% 0.99ns ± 1% ~ (p=0.730 n=5+5)
DivconstI8-4 1.54ns ± 0% 1.53ns ± 0% ~ (p=0.714 n=4+5)
DivisiblePow2constI8-4 1.78ns ± 0% 0.80ns ± 0% -55.06% (p=0.000 n=5+4)
DivconstU8-4 0.93ns ± 1% 0.95ns ± 1% +1.72% (p=0.024 n=5+5)
A follow-up CL will address the general case of x%c == 0 for signed integers.
Updates #15806
Change-Id: I0d284863774b1bc8c4ce87443bbaec6103e14ef4
Reviewed-on: https://go-review.googlesource.com/c/go/+/168038
Reviewed-by: Keith Randall <khr@golang.org>
68d4b1265e
cmd/compile: reduce bits.Div64(0, lo, y) to 64 bit division
With this change, these two functions generate identical code:
func f(x uint64) (uint64, uint64) {
return bits.Div64(0, x, 5)
}
func g(x uint64) (uint64, uint64) {
return x / 5, x % 5
}
Updates #31582
Change-Id: Ia96c2e67f8af5dd985823afee5f155608c04a4b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/173197
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
66f5d4e035
cmd/compile: int64(uint64 >> x) >= 0 if x > 0
This rewrite rule triggers only once, in math/big.quotToFloat64, as part of converting a uint64 to a float64. Nevertheless, it is cheap; let's add it.
Change-Id: I3ed4a197a559110fec1bc04b3a8abb4c7fcc2c89
Reviewed-on: https://go-review.googlesource.com/c/go/+/167500
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
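The provable fact, as source code (illustrative):

```go
// A uint64 shifted right by a nonzero amount has its top bit clear, so
// reinterpreting the result as int64 yields a non-negative value:
func nonNeg(u uint64) bool {
	return int64(u>>1) >= 0 // always true; the compiler can now prove it
}
```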
61945fc502
cmd/compile: don't generate panicshift for masked int shifts
We know that a & 31 is non-negative for all a, signed or not. We can avoid checking that and needing to write out an unreachable call to panicshift.
Change-Id: I32f32fb2c950d2b2b35ac5c0e99b7b2dbd47f917
Reviewed-on: https://go-review.googlesource.com/c/go/+/167499
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
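For illustration (Go permits signed shift counts, normally guarded by a panicshift call when negativity can't be ruled out):

```go
// s&31 is provably non-negative, so no panicshift guard is emitted:
func shift(x, s int64) int64 {
	return x << (s & 31)
}
```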
8f854244ad
cmd/compile: fix crash when memmove argument is not the right type
Make sure the argument to memmove is of pointer type before we try to get the element type. This has been noticed for code that uses unsafe+linkname so it can call runtime.memmove. Probably not the best thing to allow, but the code is out there and we'd rather not break it unnecessarily.
Fixes #30061
Change-Id: I334a8453f2e293959fd742044c43fbe93f0b3d31
Reviewed-on: https://go-review.googlesource.com/c/160826
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
04105ef1da
cmd/compile: decompose composite OpArg before decomposeUser
This makes it easier to track names of function arguments for debugging purposes.
Change-Id: Ic34856fe0b910005e1c7bc051d769d489a4b158e
Reviewed-on: https://go-review.googlesource.com/c/150098
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
8607b2e825
cmd/compile: optimize A->B->C Moves that include VarDefs
We have an existing optimization that recognizes memory moves of the form A -> B -> C and converts them into A -> C, in the hopes that the store to B will end up being dead and thus eliminated. However, when A, B, and C are large types, the front end sometimes emits VarDef ops for the moves. This change adds an optimization to match that pattern.
This required changing an old compiler test. The test assumed that a temporary was required to deal with a large return value. With this optimization in place, that temporary ended up being eliminated.
Triggers 649 times during 'go build -a std cmd'. Cuts 16k off cmd/go.
name old object-bytes new object-bytes delta
Template 507kB ± 0% 507kB ± 0% -0.15% (p=0.008 n=5+5)
Unicode 225kB ± 0% 225kB ± 0% ~ (all equal)
GoTypes 1.85MB ± 0% 1.85MB ± 0% ~ (all equal)
Flate 328kB ± 0% 328kB ± 0% ~ (all equal)
GoParser 402kB ± 0% 402kB ± 0% -0.00% (p=0.008 n=5+5)
Reflect 1.41MB ± 0% 1.41MB ± 0% -0.20% (p=0.008 n=5+5)
Tar 458kB ± 0% 458kB ± 0% ~ (all equal)
XML 601kB ± 0% 599kB ± 0% -0.21% (p=0.008 n=5+5)
Change-Id: I9b5f25c8663a0b772ad1ee51fa61f74b74d26dd3
Reviewed-on: https://go-review.googlesource.com/c/143479
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
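An illustration of the pattern (hypothetical code; the type just needs to be large enough to be non-SSA-able):

```go
type big [1024]byte

var sink big

func f(x big) {
	tmp := x   // front end emits VarDef tmp; Move x -> tmp
	sink = tmp // Move tmp -> sink; now combined into Move x -> sink, leaving tmp dead
}
```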
a98bb7e244
cmd/compile: only optimize chained Moves on disjoint stack mem
This optimization is not sound if A, B, or C might overlap with each other. Thanks to Michael Munday for pointing this out during the review of CL 143479.
This reduces the number of times this optimization triggers during make.bash from 386 to 74. This is unfortunate, but I don't see an obvious way around it, short of souping up the disjointness analysis.
name old object-bytes new object-bytes delta
Template 507kB ± 0% 507kB ± 0% +0.13% (p=0.008 n=5+5)
Unicode 225kB ± 0% 225kB ± 0% ~ (all equal)
GoTypes 1.85MB ± 0% 1.85MB ± 0% +0.02% (p=0.008 n=5+5)
Flate 328kB ± 0% 328kB ± 0% ~ (all equal)
GoParser 402kB ± 0% 402kB ± 0% ~ (all equal)
Reflect 1.41MB ± 0% 1.41MB ± 0% ~ (all equal)
Tar 457kB ± 0% 458kB ± 0% +0.20% (p=0.008 n=5+5)
XML 600kB ± 0% 601kB ± 0% +0.03% (p=0.008 n=5+5)
Change-Id: Ida408cb627145ba9faf473a78606f050c2f3f51c
Reviewed-on: https://go-review.googlesource.com/c/145208
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2578ac54eb
cmd/compile: move argument stack construction to SSA generation
The goal of this change is to move work from walk to SSA, and simplify things along the way.

This is hard to accomplish cleanly with small incremental changes, so this large commit message aims to provide a roadmap to the diff.

High level description:

Prior to this change, walk was responsible for constructing (most of) the stack for function calls.

ascompatte gathered variadic arguments into a slice. It also rewrote n.List from a list of arguments to a list of assignments to stack slots. ascompatte was called multiple times to handle the receiver in a method call. reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack. adjustargs then made extra stack space for go/defer args as needed.

Node to SSA construction evaluated all the statements in n.List, and issued the function call, assuming that the stack was correctly constructed.

Intrinsic calls had to dig around inside n.List to extract the arguments, since intrinsics don't use the stack to make function calls.

This change moves stack construction to the SSA construction phase.

ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did. It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries. It does not, however, make any assignments to stack slots. Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List. (It would be better to use Ninit instead of List; future work.)

During SSA construction, after doing all the temporary assignments in n.List, the function arguments are assigned to stack slots by constructing the appropriate SSA Value, using (*state).storeArg. SSA construction also now handles adjustments for go/defer args. This change also simplifies intrinsic calls, since we no longer need to undo walk's work.

Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely.

Generated code differences:

A few optimizations were applied along the way under the old scheme. f(g()) was rewritten to do a block copy of function results to function arguments. And reorder1 avoided introducing the final "save the stack" temporary in n.List.

The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed.

SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary. The exception was when the temporary's type was not SSA-able; in that case, we got a Move into an autotmp and then an immediate Move onto the stack, with the autotmp never read or used again. This change introduces a new rewrite rule to detect such pointless double Moves and collapse them into a single Move. This is actually more powerful than the original optimization, since the original optimization relied on the imprecise Node.HasCall calculation.

The other significant difference in the generated code is that the stack is now constructed completely in SP-offset order. Prior to this change, the stack was constructed somewhat haphazardly: first the final argument that Node.HasCall deemed to require a temporary, then other arguments, then the method receiver, then the defer/go args. SP-offset is probably a good default order. See future work.

There are a few minor object file size changes as a result of this change. I investigated some regressions in early versions of this change.

One regression (in archive/tar) was the addition of a single CMPQ instruction, which would be eliminated were this TODO from flagalloc to be done:

    // TODO: Remove original instructions if they are never used.

One regression (in text/template) was an ADDQconstmodify that is now a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change in the order in which arguments are written. The argument change order can also now be luckier, so this appears to be a wash.

All in all, though there will be minor winners and losers, this change appears to be performance neutral.

Future work:

Move loading the result of function calls to SSA construction; eliminate OINDREGSP.

Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass. Among other benefits, this would make it easier to transition to a new calling convention. This would require rethinking the handling of stack conflicts and is non-trivial.

Figure out some clean way to indicate that stack construction Stores/Moves do not alias each other, so that subsequent passes may do things like CSE+tighten shared stack setup, do DSE using non-first Stores, etc. This would allow us to eliminate the minor text/template regression.

Possibly make assignments to stack slots not be treated as statements by DWARF.

Compiler benchmarks:

name        old time/op       new time/op       delta
Template        182ms ± 2%      179ms ± 2%   -1.69%  (p=0.000 n=47+48)
Unicode        86.3ms ± 5%     85.1ms ± 4%   -1.36%  (p=0.001 n=50+50)
GoTypes         646ms ± 1%      642ms ± 1%   -0.63%  (p=0.000 n=49+48)
Compiler        2.89s ± 1%      2.86s ± 2%   -1.36%  (p=0.000 n=48+50)
SSA             8.47s ± 1%      8.37s ± 2%   -1.22%  (p=0.000 n=47+50)
Flate           122ms ± 2%      121ms ± 2%   -0.66%  (p=0.000 n=47+45)
GoParser        147ms ± 2%      146ms ± 2%   -0.53%  (p=0.006 n=46+49)
Reflect         406ms ± 2%      403ms ± 2%   -0.76%  (p=0.000 n=48+43)
Tar             162ms ± 3%      162ms ± 4%     ~     (p=0.191 n=46+50)
XML             223ms ± 2%      222ms ± 2%   -0.37%  (p=0.031 n=45+49)
[Geo mean]      382ms           378ms        -0.89%

name        old user-time/op  new user-time/op  delta
Template        219ms ± 3%      216ms ± 3%   -1.56%  (p=0.000 n=50+48)
Unicode         109ms ± 6%      109ms ± 5%     ~     (p=0.190 n=50+49)
GoTypes         836ms ± 2%      828ms ± 2%   -0.96%  (p=0.000 n=49+48)
Compiler        3.87s ± 2%      3.80s ± 1%   -1.81%  (p=0.000 n=49+46)
SSA             12.0s ± 1%      11.8s ± 1%   -2.01%  (p=0.000 n=48+50)
Flate           142ms ± 3%      141ms ± 3%   -0.85%  (p=0.003 n=50+48)
GoParser        178ms ± 4%      175ms ± 4%   -1.66%  (p=0.000 n=48+46)
Reflect         520ms ± 2%      512ms ± 2%   -1.44%  (p=0.000 n=45+48)
Tar             200ms ± 3%      198ms ± 4%   -0.61%  (p=0.037 n=47+50)
XML             277ms ± 3%      275ms ± 3%   -0.85%  (p=0.000 n=49+48)
[Geo mean]      482ms           476ms        -1.23%

name        old alloc/op      new alloc/op      delta
Template       36.1MB ± 0%     35.3MB ± 0%   -2.18%  (p=0.008 n=5+5)
Unicode        29.8MB ± 0%     29.3MB ± 0%   -1.58%  (p=0.008 n=5+5)
GoTypes         125MB ± 0%      123MB ± 0%   -2.13%  (p=0.008 n=5+5)
Compiler        531MB ± 0%      513MB ± 0%   -3.40%  (p=0.008 n=5+5)
SSA            2.00GB ± 0%     1.93GB ± 0%   -3.34%  (p=0.008 n=5+5)
Flate          24.5MB ± 0%     24.3MB ± 0%   -1.18%  (p=0.008 n=5+5)
GoParser       29.4MB ± 0%     28.7MB ± 0%   -2.34%  (p=0.008 n=5+5)
Reflect        87.1MB ± 0%     86.0MB ± 0%   -1.33%  (p=0.008 n=5+5)
Tar            35.3MB ± 0%     34.8MB ± 0%   -1.44%  (p=0.008 n=5+5)
XML            47.9MB ± 0%     47.1MB ± 0%   -1.86%  (p=0.008 n=5+5)
[Geo mean]     82.8MB          81.1MB        -2.08%

name        old allocs/op     new allocs/op     delta
Template         352k ± 0%       347k ± 0%   -1.32%  (p=0.008 n=5+5)
Unicode          342k ± 0%       339k ± 0%   -0.66%  (p=0.008 n=5+5)
GoTypes         1.29M ± 0%      1.27M ± 0%   -1.30%  (p=0.008 n=5+5)
Compiler        4.98M ± 0%      4.87M ± 0%   -2.14%  (p=0.008 n=5+5)
SSA             15.7M ± 0%      15.2M ± 0%   -2.86%  (p=0.008 n=5+5)
Flate            233k ± 0%       231k ± 0%   -0.83%  (p=0.008 n=5+5)
GoParser         296k ± 0%       291k ± 0%   -1.54%  (p=0.016 n=5+4)
Reflect         1.05M ± 0%      1.04M ± 0%   -0.65%  (p=0.008 n=5+5)
Tar              343k ± 0%       339k ± 0%   -0.97%  (p=0.008 n=5+5)
XML              432k ± 0%       426k ± 0%   -1.19%  (p=0.008 n=5+5)
[Geo mean]       815k            804k        -1.35%

name        old object-bytes  new object-bytes  delta
Template        505kB ± 0%      505kB ± 0%   -0.01%  (p=0.008 n=5+5)
Unicode         224kB ± 0%      224kB ± 0%     ~     (all equal)
GoTypes        1.82MB ± 0%     1.83MB ± 0%   +0.06%  (p=0.008 n=5+5)
Flate           324kB ± 0%      324kB ± 0%   +0.00%  (p=0.008 n=5+5)
GoParser        402kB ± 0%      402kB ± 0%   +0.04%  (p=0.008 n=5+5)
Reflect        1.39MB ± 0%     1.39MB ± 0%   -0.01%  (p=0.008 n=5+5)
Tar             449kB ± 0%      449kB ± 0%   -0.02%  (p=0.008 n=5+5)
XML             598kB ± 0%      597kB ± 0%   -0.05%  (p=0.008 n=5+5)

Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143
Reviewed-on: https://go-review.googlesource.com/c/114797
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com> |
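For readers outside the compiler, the shape of call whose argument setup this change reorganizes can be seen in a small, purely illustrative Go example (names invented) that combines the three cases the message describes: receiver insertion, variadic packing, and defer argument storage.

package main

import "fmt"

type counter struct{ n int }

// A variadic method call: the compiler inserts the receiver c as the
// first argument and gathers the ints into a slice before the call.
func (c *counter) add(xs ...int) int {
    for _, x := range xs {
        c.n += x
    }
    return c.n
}

func main() {
    c := &counter{}
    // defer additionally requires its arguments to be evaluated now
    // and stored until the function returns -- the "go/defer args"
    // stack adjustment described above.
    defer fmt.Println("total:", c.add(1, 2, 3))
}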
|
|
|
ceb0c371d9 |
cmd/compile: make []byte("...") more efficient
Do []byte(string) conversions more efficiently when the string
is a constant. Instead of calling stringtobyteslice, allocate
just the space we need and encode the initialization directly.
[]byte("foo") rewrites to the following pseudocode:
var s [3]byte // on heap or stack, depending on whether b escapes
s = *(*[3]byte)(&"foo"[0]) // initialize s from the string
b = s[:]
which generates this assembly:
0x001d 00029 (tmp1.go:9) LEAQ type.[3]uint8(SB), AX
0x0024 00036 (tmp1.go:9) MOVQ AX, (SP)
0x0028 00040 (tmp1.go:9) CALL runtime.newobject(SB)
0x002d 00045 (tmp1.go:9) MOVQ 8(SP), AX
0x0032 00050 (tmp1.go:9) MOVBLZX go.string."foo"+2(SB), CX
0x0039 00057 (tmp1.go:9) MOVWLZX go.string."foo"(SB), DX
0x0040 00064 (tmp1.go:9) MOVW DX, (AX)
0x0043 00067 (tmp1.go:9) MOVB CL, 2(AX)
// Then the slice is b = {AX, 3, 3}
The generated code is still not optimal, as it still does load/store
from read-only memory instead of constant stores. Next CL...
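To make the scope concrete, here is a small sketch (the helper name is invented): the rewrite applies only when the source string is a constant.

package main

import "fmt"

func label() string { return "bar" } // not a constant

func main() {
    // Constant string: with this change the compiler allocates exactly
    // three bytes and initializes them directly, with no call into the
    // runtime conversion helper.
    b1 := []byte("foo")

    // Non-constant string: still converted via the runtime.
    b2 := []byte(label())

    fmt.Println(string(b1), string(b2))
}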
Update #26498
Fixes #10170
Change-Id: I4b990b19f9a308f60c8f4f148934acffefe0a5bd
Reviewed-on: https://go-review.googlesource.com/c/140698
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
|
|
|
|
c96e3bcc97 |
cmd/compile: fix type of OffPtr in some optimization rules
In some optimization rules the type of generated OffPtr was incorrectly set to the type of the pointee, instead of the pointer. When the OffPtr value is spilled, this may generate a spill of the wrong type, e.g. a floating point spill of an integer (pointer) value. On Wasm, this leads to invalid bytecode.

Fixes #27961.

Change-Id: I5d464847eb900ed90794105c0013a1a7330756cc
Reviewed-on: https://go-review.googlesource.com/c/139257
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com> |
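In Go-source terms, an SSA OffPtr is pointer-plus-constant-offset, and its result is itself a pointer. A rough illustration using ordinary user code, not the compiler's internals:

package main

import (
    "fmt"
    "unsafe"
)

type pair struct{ a, b int64 }

func main() {
    p := &pair{a: 1, b: 2}
    // The moral equivalent of OffPtr [8] p: the result is a *int64,
    // i.e. a pointer type. Typing it as the pointee (int64, or worse,
    // a float type) is the bug fixed above: a spill would then use the
    // wrong register class or, on Wasm, the wrong value type.
    bp := (*int64)(unsafe.Pointer(uintptr(unsafe.Pointer(p)) + unsafe.Offsetof(p.b)))
    fmt.Println(*bp) // prints 2
}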
|
|
|
c6118af558 |
cmd/compile: don't do floating point optimization x+0 -> x
That optimization is not valid if x == -0. The test is a bit tricky because 0 == -0. We distinguish 0 from -0 with 1/0 == inf, 1/-0 == -inf.

This has been a bug since CL 24790 in Go 1.8. Probably doesn't warrant a backport.

Fixes #27718

Note: the optimization x-0 -> x is actually valid. But it's probably best to take it out, so as to not confuse readers.

Change-Id: I99f16a93b45f7406ec8053c2dc759a13eba035fa
Reviewed-on: https://go-review.googlesource.com/135701
Reviewed-by: Cherry Zhang <cherryyz@google.com> |
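The 1/0 trick from the message is easy to reproduce in ordinary Go: IEEE 754 defines -0 + (+0) to be +0, and the two zeros compare equal, so only division exposes the difference.

package main

import (
    "fmt"
    "math"
)

func main() {
    negZero := math.Copysign(0, -1) // -0.0
    sum := negZero + 0              // IEEE 754: -0 + (+0) rounds to +0

    fmt.Println(negZero == sum)   // true: == cannot tell the zeros apart
    fmt.Println(1/negZero, 1/sum) // -Inf +Inf: division can
}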
|
|
|
2db1a7f929 |
cmd/compile: avoid more float32 <-> float64 conversions in compiler
Use the new custom truncate/extension code when storing or extracting float32 values from AuxInts to avoid the value being changed by the host platform's floating point conversion instructions (e.g. sNaN -> qNaN).

Updates #27516.

Change-Id: Id39650f1431ef74af088c895cf4738ea5fa87974
Reviewed-on: https://go-review.googlesource.com/134855
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> |
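A sketch of what custom extension code means here: widening a float32 bit pattern to float64 using only integer operations, so hardware conversion never gets a chance to set a NaN's quiet bit. This is an illustration of the technique, not the compiler's actual helper:

package main

import (
    "fmt"
    "math"
    "math/bits"
)

// extend widens a float32 bit pattern to the float64 bit pattern with
// the same value. Because it uses only integer operations, NaN
// payloads -- including the signalling bit -- pass through unchanged.
func extend(b uint32) uint64 {
    sign := uint64(b>>31) << 63
    exp := uint64(b >> 23 & 0xff)
    frac := uint64(b & 0x7fffff)
    switch {
    case exp == 0xff: // Inf or NaN: copy the payload verbatim
        return sign | 0x7ff<<52 | frac<<29
    case exp == 0 && frac == 0: // +0 or -0
        return sign
    case exp == 0: // subnormal float32 becomes a normal float64
        n := uint(bits.Len64(frac)) // position of the leading 1 bit
        return sign | uint64(n+873)<<52 | (frac<<(53-n))&(1<<52-1)
    }
    return sign | (exp+896)<<52 | frac<<29 // normal: rebias the exponent
}

func main() {
    fmt.Printf("%#x\n", extend(0x7f800001))               // 0x7ff0000020000000: still signalling
    fmt.Println(math.Float64frombits(extend(0x3f800000))) // 1: ordinary values round-trip
}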
|
|
|
25b84c0155 |
cmd/compile: move v.Pos.line check to warnRule
This simplifies the rewrite rules.

Change-Id: Iff062297d42a23cb31ad55e8c733842ecbc07da2
Reviewed-on: https://go-review.googlesource.com/129377
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com> |
|
|
|
48af3a8be5 |
cmd/compile: fix store-to-load forwarding of 32-bit sNaNs
Signalling NaNs were being converted to quiet NaNs during constant propagation through integer <-> float store-to-load forwarding. This occurs because we store float32 constants as float64 values and CPU hardware 'quietens' NaNs during conversion between the two.

Eventually we want to move to using float32 values to store float32 constants, however this will be a big change since both the compiler and the assembler expect float64 values. So for now this is a small change that will fix the immediate issue.

Fixes #27193.

Change-Id: Iac54bd8c13abe26f9396712bc71f9b396f842724
Reviewed-on: https://go-review.googlesource.com/132956
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> |
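The quietening itself is observable from ordinary Go code on most hardware (a sketch only; the exact output is platform-dependent):

package main

import (
    "fmt"
    "math"
)

func main() {
    // 0x7f800001: exponent all ones, quiet bit (bit 22) clear, payload
    // non-zero -- a signalling NaN.
    snan := math.Float32frombits(0x7f800001)

    // Round-tripping through float64 -- which is what storing float32
    // constants in float64 form amounts to -- lets the hardware set
    // the quiet bit, typically yielding 0x7fc00001.
    quiet := float32(float64(snan))

    fmt.Printf("%#x -> %#x\n", math.Float32bits(snan), math.Float32bits(quiet))
}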
|
|
|
379d2dea72 |
cmd/compile: remove superfluous signed right shift used for signed division by 2
A signed right shift before an unsigned right shift by register width-1 (which extracts the sign bit) is superfluous.

Trigger counts during ./make.bash:

0   (Rsh8U  (Rsh8  x _)  7) -> (Rsh8U  x  7)
0   (Rsh16U (Rsh16 x _) 15) -> (Rsh16U x 15)
2   (Rsh32U (Rsh32 x _) 31) -> (Rsh32U x 31)
251 (Rsh64U (Rsh64 x _) 63) -> (Rsh64U x 63)

Changes the instructions generated on AMD64 for x / 2, where x is a signed integer, from:

MOVQ AX, CX
SARQ $63, AX
SHRQ $63, AX
ADDQ CX, AX
SARQ $1, AX

to:

MOVQ AX, CX
SHRQ $63, AX
ADDQ CX, AX
SARQ $1, AX

Change-Id: I86321ae8fc9dc24b8fa9eb80aa5c7299eff8c9dc
Reviewed-on: https://go-review.googlesource.com/115956
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org> |
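The retained instruction sequence computes signed division by two as below (a Go rendering of the semantics, not compiler output): the unsigned shift extracts the sign bit directly, so the removed SARQ contributed nothing.

package main

import "fmt"

// div2 mirrors the AMD64 sequence above: add the sign bit to x so that
// negative values round toward zero, then arithmetic-shift right by one.
func div2(x int64) int64 {
    bias := int64(uint64(x) >> 63) // SHRQ $63: 1 if x < 0, else 0
    return (x + bias) >> 1         // ADDQ; SARQ $1
}

func main() {
    for _, x := range []int64{7, -7, 1, -1, 0} {
        fmt.Println(x, div2(x) == x/2) // always true
    }
}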
|
|
|
4201c2077e |
cmd/compile: omit racefuncentry/exit when they are not needed
When compiling with -race, we insert calls to racefuncentry, into every function. Add a rule that removes them in leaf functions, without instrumented loads/stores. Shaves ~30kb from "-race" version of go tool: file difference: go_old 15626192 go_new 15597520 [-28672 bytes] section differences: global text (code) = -24513 bytes (-0.358598%) read-only data = -5849 bytes (-0.167064%) Total difference -30362 bytes (-0.097928%) Fixes #24662 Change-Id: Ia63bf1827f4cf2c25e3e28dcd097c150994ade0a Reviewed-on: https://go-review.googlesource.com/121235 Run-TryBot: Ilya Tocar <ilya.tocar@intel.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org> |