Prior to this CL, the SSA backend reported violations
of the //go:nowritebarrier annotation immediately.
This necessitated emitting errors during SSA compilation,
which is not compatible with a concurrent backend.
Instead, check for such violations later.
We already save the data required to do a late check
for violations of the //go:nowritebarrierrec annotation.
Use the same data, and check //go:nowritebarrier at the same time.
One downside to doing this is that now only a single
violation will be reported per function.
Given that this is for the runtime only,
and violations are rare, this seems an acceptable cost.
While we are here, remove several 'nerrors != 0' checks
that are rendered pointless.
Updates #15756
Fixes #19250 (as much as it ever can be)
Change-Id: Ia44c4ad5b6fd6f804d9f88d9571cec8d23665cb3
Reviewed-on: https://go-review.googlesource.com/38973
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Instead, add a scratchFpMem field to ssafn,
so that it may be passed on to genssa.
Updates #15756
Change-Id: Icdeae290d3098d14d31659fa07a9863964bb76ed
Reviewed-on: https://go-review.googlesource.com/38728
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This is a better home for it.
Change-Id: I7ce96c16378d841613edaa53c07347b0ac99ea6e
Reviewed-on: https://go-review.googlesource.com/38970
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
We don't need them any more since #15837 was fixed.
Fixes #19718
Change-Id: I13e46c62b321b2c9265f44c977b63bfb23163ca2
Reviewed-on: https://go-review.googlesource.com/38664
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Concurrent compilation requires providing an
explicit position and curfn to temp.
This implementation of tempAt temporarily
continues to use the globals lineno and Curfn,
so as not to collide with mdempsky's
work for #19683 eliminating the Curfn dependency
from func nod.
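To illustrate the shape of the change (toy code, not the actual
gc implementation; the names here are invented):

	import "fmt"

	// Before: the position implicitly comes from a global, which is
	// hostile to compiling functions concurrently.
	var lineno int

	func temp(kind string) string {
		return fmt.Sprintf("%s.%d", kind, lineno)
	}

	// After, tempAt-style: the caller passes position and function
	// explicitly, so no shared mutable state is involved.
	func tempAt(line int, fn, kind string) string {
		return fmt.Sprintf("%s.%s.%d", fn, kind, line)
	}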
Updates #15756
Updates #19683
Change-Id: Ib3149ca4b0740e9f6eea44babc6f34cdd63028a9
Reviewed-on: https://go-review.googlesource.com/38592
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This almost never happens in practice.
The compiler will generate reasonable code anyway,
since assignments involving [0]T never do any work.
Fixes #19696
Fixes #19671
Change-Id: I350d2e0c5bb326c4789c74a046ab0486b2cee49c
Reviewed-on: https://go-review.googlesource.com/38599
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Introduce a new type, gc.Progs, to manage
generation of Progs for a function.
Use it to replace globals pc and pcloc.
Passes toolstash-check -all.
Updates #15756
Change-Id: I2206998d7c58fe2a76b620904909f2e1cec8a57d
Reviewed-on: https://go-review.googlesource.com/38418
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
To enable this, inline the call to nod and simplify.
Eliminates a reference to lineno from the backend.
Passes toolstash-check -all.
Updates #15756
Change-Id: I9c4bd77d10d727aa8f5e6c6bb16b0e05de165631
Reviewed-on: https://go-review.googlesource.com/38441
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Prior to this CL, the function's position was used.
The dottype Node's position is clearly better.
I'm not thrilled about introducing a reference to
lineno in the middle of SSA construction;
I will have to remove it later.
My immediate goal is stability and correctness of positions,
though, since that aids refactoring, so this is an improvement.
An example from package io:
	func (t *multiWriter) WriteString(s string) (n int, err error) {
		var p []byte // lazily initialized if/when needed
		for _, w := range t.writers {
			if sw, ok := w.(stringWriter); ok {
				n, err = sw.WriteString(s)
The w.(stringWriter) type assertion includes loading
the address of static type data for stringWriter:
	LEAQ type."".stringWriter(SB), R10
Prior to this CL, this instruction was given the line number
of the function declaration.
After this CL, this instruction is given the line number
of the type assertion itself.
Change-Id: Ifcca274b581a5a57d7e3102c4d7b7786bf307210
Reviewed-on: https://go-review.googlesource.com/38389
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Teach the backend to recognize that the address of a symbol
is equal with itself, and that the addresses of two different
symbols are different.
Some examples of where this rule hits in the standard library:
- inlined uses of (*time.Time).setLoc (e.g. time.UTC)
- inlined uses of bufio.NewReader (via type assertion)
Change-Id: I23dcb068c2ec333655c1292917bec13bbd908c24
Reviewed-on: https://go-review.googlesource.com/38338
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
CL 27254 changed a constant string to a byte array
in encoding/hex and got significant performance
improvements.
hex.Encode used the string twice in a single function.
The rewrite rules lower constant strings into components.
The pointer component requires an aux symbol.
The existing implementation created a new aux symbol every time.
As a result, constant string pointers were never CSE'd.
The tighten pass then moved the pointer calculation next to the uses, i.e.
into the loop.
The re-use of aux syms enabled by this CL
occurs 3691 times during make.bash.
This CL should not go in without CL 38338
or something like it.
Change-Id: Ibbf5b17283c0e31821d04c7e08d995c654de5663
Reviewed-on: https://go-review.googlesource.com/28219
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This makes the overall naming and use of the functions
to create a Type more consistent.
Passes toolstash -cmp.
Change-Id: Ie0d40b42cc32b5ecf5f20502675a225038ea40e4
Reviewed-on: https://go-review.googlesource.com/38354
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This reduces the number of calls back into the
gc Type routines, which will help performance
in a concurrent backend.
It also reduces the number of callsites
that must be considered in making the transition.
Passes toolstash-check -all. No compiler performance changes.
Updates #15756
Change-Id: Ic7a8f1daac7e01a21658ae61ac118b2a70804117
Reviewed-on: https://go-review.googlesource.com/38340
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Prior to this CL, the ssa.Frontend field was responsible
for providing types to the backend during compilation.
However, the types needed by the backend are few and static.
It makes more sense to use a struct for them
and to hang that struct off the ssa.Config,
which is the correct home for readonly data.
Now that Types is a struct, we can clean up the names a bit as well.
This has the added benefit of allowing early construction
of all types needed by the backend.
This will be useful for concurrent backend compilation.
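A rough sketch of the resulting shape (field and type names here
are illustrative, not the exact ssa package definitions):

	type Type struct {
		// opaque to the backend; built by the frontend
	}

	// Types is the small, static set of types the backend needs,
	// constructed once before compilation starts.
	type Types struct {
		Bool, Int, Uintptr *Type
		BytePtr            *Type
	}

	// Config holds readonly data shared across compilations, so it
	// is a safe home for Types under concurrent compilation.
	type Config struct {
		Types Types
	}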
Passes toolstash-check -all. No compiler performance change.
Updates #15756
Change-Id: I021658c8cf2836d6a22bbc20cc828ac38c7da08a
Reviewed-on: https://go-review.googlesource.com/38336
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
While we're here, also eliminate a few more Curfn uses.
Passes toolstash -cmp. No compiler performance impact.
Updates #15756
Change-Id: Ib8db9e23467bbaf16cc44bf62d604910f733d6b8
Reviewed-on: https://go-review.googlesource.com/38331
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This is a first step towards eliminating the
Curfn global in the backend.
There's more to do.
Passes toolstash -cmp. No compiler performance impact.
Updates #15756
Change-Id: Ib09f550a001e279a5aeeed0f85698290f890939c
Reviewed-on: https://go-review.googlesource.com/38232
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Suggested by mdempsky in CL 38232.
This allows us to use the Frontend field
to associate frontend state and information
with a function.
See the following CL in the series for examples.
This is a giant CL, but it is almost entirely routine refactoring.
The ssa test API is starting to feel a bit unwieldy.
I will clean it up separately, once the dust has settled.
Passes toolstash -cmp.
Updates #15756
Change-Id: I71c573bd96ff7251935fce1391b06b1f133c3caf
Reviewed-on: https://go-review.googlesource.com/38327
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This CL changes the GOARCH.Init functions to take gc.Thearch as a
parameter, which gc.Main supplies.
Additionally, the x86 backend is refactored to decide within Init
whether to use the 387 or SSE2 instruction generators, rather than for
each individual SSA Value/Block.
Passes toolstash-check -all.
Change-Id: Ie6305a6cd6f6ab4e89ecbb3cbbaf5ffd57057a24
Reviewed-on: https://go-review.googlesource.com/38301
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
I don't know that it exists for any other architectures.
Update #18616
Change-Id: Idfe5dee251764d32787915889ec0be4bebc5be24
Reviewed-on: https://go-review.googlesource.com/38323
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
This makes ssa.Func, ssa.Cache, and ssa.Config fulfill
the roles laid out for them in CL 38160.
The only non-trivial change in this CL is how cached
values and blocks get IDs. Prior to this CL, their IDs were
assigned as part of resetting the cache, and only modified
IDs were reset. This required knowing how many values and
blocks were modified, which required a tight coupling between
ssa.Func and ssa.Config. To eliminate that coupling,
we now zero values and blocks during reset,
and assign their IDs when they are used.
Since unused values and blocks have ID == 0,
we can efficiently find the last used value/block,
to avoid zeroing everything.
Bulk zeroing is efficient, but not efficient enough
to obviate the need to avoid zeroing everything every time.
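A sketch of the scheme (illustrative, not the actual ssa package code):

	// value stands in for ssa.Value; id 0 means "unused since reset".
	type value struct {
		id int
	}

	type cache struct {
		values []*value
		used   int // entries handed out since the last reset
	}

	// newValue assigns the ID at point of use rather than at reset.
	func (c *cache) newValue() *value {
		if c.used == len(c.values) {
			c.values = append(c.values, new(value))
		}
		v := c.values[c.used]
		c.used++
		v.id = c.used // IDs start at 1, so 0 marks unused entries
		return v
	}

	// reset zeroes only the used prefix, so the cost of bulk zeroing
	// is proportional to what was used, not to the cache capacity.
	func (c *cache) reset() {
		for _, v := range c.values[:c.used] {
			*v = value{}
		}
		c.used = 0
	}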
As a happy side-effect, ssa.Func.Free is no longer necessary.
DebugHashMatch and friends now belong in func.go.
They have been left in place for clarity and review.
I will move them in a subsequent CL.
Passes toolstash -cmp. No compiler performance impact.
No change in 'go test cmd/compile/internal/ssa' execution time.
Change-Id: I2eb7af58da067ef6a36e815a6f386cfe8634d098
Reviewed-on: https://go-review.googlesource.com/38167
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
name old time/op new time/op delta
LeadingZeros-4 2.00ns ± 0% 1.34ns ± 1% -33.02% (p=0.000 n=8+10)
LeadingZeros16-4 1.62ns ± 0% 1.57ns ± 0% -3.09% (p=0.001 n=8+9)
LeadingZeros32-4 2.14ns ± 0% 1.48ns ± 0% -30.84% (p=0.002 n=8+10)
LeadingZeros64-4 2.06ns ± 1% 1.33ns ± 0% -35.08% (p=0.000 n=8+8)
8-bit args are a special case - the Go code is really fast because
it is just a single table lookup. So I've disabled that for now.
Intrinsics were actually slower:
LeadingZeros8-4 1.22ns ± 3% 1.58ns ± 1% +29.56% (p=0.000 n=10+10)
Update #18616
Change-Id: Ia9c289b9ba59c583ea64060470315fd637e814cf
Reviewed-on: https://go-review.googlesource.com/38311
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Remove size AuxInt in Store, and alignment in Move/Zero. We still
pass size AuxInt to Move/Zero, as it is used for partial Move/Zero
lowering (e.g. cmd/compile/internal/ssa/gen/386.rules:288).
SizeAndAlign is gone.
Passes "toolstash -cmp" on std.
Change-Id: I1ca34652b65dd30de886940e789fcf41d521475d
Reviewed-on: https://go-review.googlesource.com/38150
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Now that the write barrier insertion is moved to SSA, the SSA
building code can be simplified.
Updates #17583.
Change-Id: I5cacc034b11aa90b0abe6f8dd97e4e3994e2bc25
Reviewed-on: https://go-review.googlesource.com/36840
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
When the compiler inserts write barriers, the frontend makes
conservative decisions at an early stage. This sometimes has
false positives because of a lack of information, for example
for writes on the stack. SSA's writebarrier pass identifies
writes on the stack and eliminates write barriers for them.
This CL moves write barrier insertion into SSA. The frontend no
longer makes decisions about write barriers, and simply does
normal assignments and emits normal Store ops when building SSA.
The SSA writebarrier pass inserts write barriers for Stores when needed.
There, it has better information about the store because Phi and
Copy propagation are done at that time.
This CL only changes StoreWB to Store in gc/ssa.go. A followup CL
simplifies SSA building code.
Updates #17583.
Change-Id: I4592d9bc0067503befc169c50b4e6f4765673bec
Reviewed-on: https://go-review.googlesource.com/36839
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
For SSA Store/Move/Zero ops, attach the type of the value being
stored to the op as the Aux field. This type will be used for
write barrier insertion (in a followup CL). Since SSA passes
do not accurately propagate types of values (because of type
casting), we can't simply use type of the store's arguments
for write barrier insertion.
Passes "toolstash -cmp" on std.
Updates #17583.
Change-Id: I051d5e5c482931640d1d7d879b2a6bb91f2e0056
Reviewed-on: https://go-review.googlesource.com/36838
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Implement math/bits.TrailingZerosX using intrinsics.
Generally reorganize the intrinsic spec a bit.
The intrinsics data structure is now built at init time.
This will make doing the other functions in math/bits easier.
Update sys.CtzX to return int instead of uint{64,32} so it
matches math/bits.TrailingZerosX.
Improve the intrinsics a bit for amd64. We don't need the CMOV
for <64 bit versions.
Update #18616
Change-Id: Ic1c5339c943f961d830ae56f12674d7b29d4ff39
Reviewed-on: https://go-review.googlesource.com/38155
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
With this change, code like
	h := sha1.New()
	h.Write(buf)
	sum := h.Sum(nil)
gets compiled into static calls rather than
interface calls, because the compiler is able
to prove that 'h' is really a *sha1.digest.
The InterCall rewrite rule hits a few dozen times
during make.bash, and hundreds of times during all.bash.
The most common pattern identified by the compiler
is a constructor like
	func New() Interface { return &impl{...} }
where the constructor gets inlined into the caller,
and the result is used immediately. Examples include
{sha1,md5,crc32,crc64,...}.New, base64.NewEncoder,
base64.NewDecoder, errors.New, net.Pipe, and so on.
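A self-contained illustration of the pattern (toy types, not the
actual library code):

	type Writer interface {
		Write(p []byte) (int, error)
	}

	type impl struct{ n int }

	func (w *impl) Write(p []byte) (int, error) {
		w.n += len(p)
		return len(p), nil
	}

	// New is small enough to inline, so at the call site below the
	// compiler sees w = &impl{} and learns w's concrete type.
	func New() Writer { return &impl{} }

	func use(buf []byte) (int, error) {
		w := New()
		// Devirtualized: compiled as a static call to (*impl).Write.
		return w.Write(buf)
	}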
Some existing benchmarks that change on darwin/amd64:
Crc64/ISO4KB-8 2.67µs ± 1% 2.66µs ± 0% -0.36% (p=0.015 n=10+10)
Crc64/ISO1KB-8 694ns ± 0% 690ns ± 1% -0.59% (p=0.001 n=10+10)
Adler32KB-8 473ns ± 1% 471ns ± 0% -0.39% (p=0.010 n=10+9)
On architectures like amd64, the reduction in code size
appears to contribute more to benchmark improvements than just
removing the indirect call, since that branch gets predicted
accurately when called in a loop.
Updates #19361
Change-Id: I57d4dc21ef40a05ec0fbd55a9bb0eb74cdc67a3d
Reviewed-on: https://go-review.googlesource.com/38139
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This reverts commit 4e0c7c3f61.
Reason for revert: The presence-of-optimization test program is fragile, breaks under noopt, and might break if the Go libraries are tweaked. It needs to be (re)written without reference to other packages.
Change-Id: I3aaf1ab006a1a255f961a978e9c984341740e3c7
Reviewed-on: https://go-review.googlesource.com/38097
Reviewed-by: Keith Randall <khr@golang.org>
This abstracts creation of ACALL Progs into package gc. The main
benefit of this today is we can refactor away a lot of common
boilerplate code.
Later, once liveness analysis happens on the SSA graph, this will also
provide an easy insertion point for emitting the PCDATA Progs
immediately before call instructions.
Passes toolstash-check -all.
Change-Id: Ia15108ace97201cd84314f1ca916dfeb4f09d61c
Reviewed-on: https://go-review.googlesource.com/38081
Reviewed-by: Keith Randall <khr@golang.org>
With this change, code like
	h := sha1.New()
	h.Write(buf)
	sum := h.Sum(nil)
gets compiled into static calls rather than
interface calls, because the compiler is able
to prove that 'h' is really a *sha1.digest.
The InterCall rewrite rule hits a few dozen times
during make.bash, and hundreds of times during all.bash.
The most common pattern identified by the compiler
is a constructor like
	func New() Interface { return &impl{...} }
where the constructor gets inlined into the caller,
and the result is used immediately. Examples include
{sha1,md5,crc32,crc64,...}.New, base64.NewEncoder,
base64.NewDecoder, errors.New, net.Pipe, and so on.
Some existing benchmarks that change on darwin/amd64:
Crc64/ISO4KB-8 2.67µs ± 1% 2.66µs ± 0% -0.36% (p=0.015 n=10+10)
Crc64/ISO1KB-8 694ns ± 0% 690ns ± 1% -0.59% (p=0.001 n=10+10)
Adler32KB-8 473ns ± 1% 471ns ± 0% -0.39% (p=0.010 n=10+9)
On architectures like amd64, the reduction in code size
appears to contribute more to benchmark improvements than just
removing the indirect call, since that branch gets predicted
accurately when called in a loop.
Updates #19361
Change-Id: Ia9d30afdd5f6b4d38d38b14b88f308acae8ce7ed
Reviewed-on: https://go-review.googlesource.com/37751
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Previously the base register was unset, which led to the disassembler
using "FP" instead of "SP" as the base register. That led to some
confusion as to what the difference between the two was.
Be consistent and always use SP.
Fixes #19458
Change-Id: Ie8f8ee54653bd202c0cf6fbf1d350e3c8c8b67a0
Reviewed-on: https://go-review.googlesource.com/37971
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
After benchmarking with a compiler modified to have better
spill location, it became clear that this method of checking
was actually faster on (at least) two different architectures
(ppc64 and amd64) and it also provides more timely interruption
of loops.
This change adds a modified FOR loop node "FORUNTIL" that
checks after executing the loop body instead of before (i.e.,
always at least once). This ensures that a pointer past the
end of a slice or array is not made visible to the garbage
collector.
Without the rescheduling checks inserted, the restructured
loop from this change apparently provides a 1% geomean
improvement on PPC64 running the go1 benchmarks; the
improvement on AMD64 is only 0.12%.
Inserting the rescheduling check exposed a peculiar bug in the
SSA test code for s390x; this was updated based on the initial
code actually generated for GOARCH=s390x to use appropriate
OpArg, OpAddr, and OpVarDef.
NaCl is disabled in testing.
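In source-level terms the restructuring looks roughly like this
(a sketch; the real transformation operates on the compiler's loop
nodes, not on Go source):

	// Before: the test runs before each iteration, so the address
	// of the next element may be live across the loop test.
	func each(xs []int, f func(int)) {
		for i := 0; i < len(xs); i++ {
			f(xs[i])
		}
	}

	// After, FORUNTIL-style: guard once, then test after the body,
	// so a pointer past the end of xs never becomes visible.
	func eachUntil(xs []int, f func(int)) {
		if len(xs) == 0 {
			return
		}
		for i := 0; ; {
			f(xs[i])
			i++
			if i == len(xs) {
				break
			}
		}
	}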
Change-Id: Ieafaa9a61d2a583ad00968110ef3e7a441abca50
Reviewed-on: https://go-review.googlesource.com/36206
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Several SSA ops will always behave identically regardless of target
architecture, so handle those within gc/ssa.go instead.
Passes toolstash-check -all.
Change-Id: I54d514e80ab86723e44332a5a38e3054cbca8c5d
Reviewed-on: https://go-review.googlesource.com/37931
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This commit reworks multiway select statements to use normal control
flow primitives instead of the previous setjmp/longjmp-like behavior.
This simplifies liveness analysis and should prevent issues around
"returns twice" function calls within SSA passes.
test/live.go is updated because liveness analysis's CFG is more
representative of actual control flow. The case bodies are the only
real successors of the selectgo call, but previously the selectsend,
selectrecv, etc. calls were included in the successors list too.
Updates #19331.
Change-Id: I7f879b103a4b85e62fc36a270d812f54c0aa3e83
Reviewed-on: https://go-review.googlesource.com/37661
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Since switching to SSA, the only remaining use for the Ullman field
was in tracking whether or not an expression contained a function
call. Give it a new name and encode it in our fancy new bitset field.
Passes toolstash-check.
Change-Id: I95b7f9cb053856320c0d66efe14996667e6011c2
Reviewed-on: https://go-review.googlesource.com/37721
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Previously the compiler rewrote constant division into OHMUL
operations, but that rewriting was moved to SSA in CL 37015. Now OHMUL
is unused, so we can get rid of it.
Change-Id: Ib6fc7c2b6435510bafb5735b3b4f42cfd8ed8cdb
Reviewed-on: https://go-review.googlesource.com/37750
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
A value is "volatile" if it is a pointer to the argument region
on stack which will be clobbered by function call. This is used
to make sure the value is safe when inserting write barrier calls.
The writebarrier pass can tell whether a value is such a pointer.
Therefore no need to mark it when building SSA and thread this
information through.
Passes "toolstash -cmp" on std.
Updates #17583.
Change-Id: Idc5fc0d710152b94b3c504ce8db55ea9ff5b5195
Reviewed-on: https://go-review.googlesource.com/36835
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This adds the necessary changes so that atomics are treated as
intrinsics on ppc64x.
The implementations of And8 and Or8 require power8 for
both ppc64 and ppc64le. This is a new requirement
for ppc64.
Fixes #8739
Change-Id: Icb85e2755a49166ee3652668279f6ed5ebbca901
Reviewed-on: https://go-review.googlesource.com/36832
Reviewed-by: Keith Randall <khr@golang.org>
Explicitly block fused multiply-add pattern matching when a cast is used
after the multiplication, for example:
- (a * b) + c // can emit fused multiply-add
- float64(a * b) + c // cannot emit fused multiply-add
float{32,64} and complex{64,128} casts of matching types are now kept
as OCONV operations rather than being replaced with OCONVNOP operations
because they now imply a rounding operation (and therefore aren't a
no-op anymore).
Operations (for example, multiplication) on complex types may utilize
fused multiply-add and -subtract instructions internally. There is no
way to disable this behavior at the moment.
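Written out as compilable code, the two cases above are:

	func fma(a, b, c float64) (x, y float64) {
		x = a*b + c // may be compiled to a fused multiply-add
		// The explicit conversion forces a*b to be rounded first,
		// so no fused instruction may be emitted for this line.
		y = float64(a*b) + c
		return x, y
	}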
Improves the performance of the floating point implementation of
poly1305:
name old speed new speed delta
64 246MB/s ± 0% 275MB/s ± 0% +11.48% (p=0.000 n=10+8)
1K 312MB/s ± 0% 357MB/s ± 0% +14.41% (p=0.000 n=10+10)
64Unaligned 246MB/s ± 0% 274MB/s ± 0% +11.43% (p=0.000 n=10+10)
1KUnaligned 312MB/s ± 0% 357MB/s ± 0% +14.39% (p=0.000 n=10+8)
Updates #17895.
Change-Id: Ia771d275bb9150d1a598f8cc773444663de5ce16
Reviewed-on: https://go-review.googlesource.com/36963
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The type of the OffPtr should be consistent with the type of the
following load. Before this CL it was typed as a pointer to the
struct.
Fixes #19164.
Change-Id: Ibcdec4411c6f719702f76f8dba3cce8691bfbe0c
Reviewed-on: https://go-review.googlesource.com/37254
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
We can immediately emit static assignment data rather than queueing
them up to be processed during SSA building.
Passes toolstash -cmp.
Change-Id: I8bcea4b72eafb0cc0b849cd93e9cde9d84f30d5e
Reviewed-on: https://go-review.googlesource.com/37024
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
These seem not to really matter, but good to be correct.
Change-Id: I02edb9797c3d6739725cfbe4723c75f151acd05e
Reviewed-on: https://go-review.googlesource.com/36837
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
SSA's writebarrier pass requires that WB store ops always be at
the end of a block. If we move write barrier insertion into SSA
and emit normal Store ops when building SSA, this requirement
becomes impractical -- it would create too many blocks for all
the Store ops.
Redo SSA's writebarrier pass to explicitly order values in store
order, so that it no longer needs this requirement.
Updates #17583.
Fixes #19067.
Change-Id: I66e817e526affb7e13517d4245905300a90b7170
Reviewed-on: https://go-review.googlesource.com/36834
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Currently, whether we need a write barrier is simply a property of the
pointer slot being written to.
The only optimization we currently apply using the value being written
is that pointers to stack variables can omit write barriers because
they're only written to stack slots... but we already omit write
barriers for all writes to the stack anyway.
Passes toolstash -cmp.
Change-Id: I7f16b71ff473899ed96706232d371d5b2b7ae789
Reviewed-on: https://go-review.googlesource.com/37109
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
When doing i.(T) for non-empty-interface i and concrete type T,
there's no need to read the type out of the itab. Just compare the
itab to the itab we expect for that interface/type pair.
Also optimize type switches by putting the type hash of the
concrete type in the itab. That way we don't need to load the
type pointer out of the itab.
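Schematically (with illustrative types):

	type Stringer interface {
		String() string
	}

	type Buf struct{}

	func (*Buf) String() string { return "buf" }

	func f(s Stringer) *Buf {
		// Now compiled as a single comparison of s's itab word
		// against the itab for (Stringer, *Buf); no type pointer
		// is loaded from the itab.
		return s.(*Buf)
	}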
Update #18492
Change-Id: I49e280a21e5687e771db5b8a56b685291ac168ce
Reviewed-on: https://go-review.googlesource.com/34810
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Instead we can just call needwritebarrier when constructing the SSA
representation.
Change-Id: I6fefaad49daada9cdb3050f112889e49dca0047b
Reviewed-on: https://go-review.googlesource.com/34566
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This code is dead as a result of
* removing the Follow pass
* moving rotation detection from walk to ssa
Change-Id: I14599c85bedb4e3148347b547e724187920182c4
Reviewed-on: https://go-review.googlesource.com/36484
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Gc's Sym type represents a package-qualified identifier, which is a
frontend concept and doesn't belong in SSA. Bonus: we can replace some
interface{} types with *obj.LSym.
Passes toolstash -cmp.
Change-Id: I456eb9957207d80f99f6eb9b8eab4a1f3263e9ed
Reviewed-on: https://go-review.googlesource.com/36415
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Rather than collecting static data nodes to be written out later, just
write them out immediately.
Change-Id: I51708b690e94bc3e288b4d6ba3307bf738a80f64
Reviewed-on: https://go-review.googlesource.com/36352
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
If there is a defer, and that defer recovers, then the caller
can see all of the output parameters. That means that we must
mark all the output parameters live at any point which might panic.
If there is no defer then this is not necessary. This is implemented.
We could also detect whether there is a recover in any of the defers.
If not, we would need to mark only output params that the defer
actually references (and the closure mechanism already does that).
This is not implemented.
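For example (illustrative):

	func div(a, b int) (q int) {
		defer func() {
			if recover() != nil {
				q = -1 // the caller observes this write
			}
		}()
		// a / b may panic when b == 0, and the deferred function
		// can then recover and expose q, so q must be live here.
		q = a / b
		return q
	}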
Fixes #18860.
Change-Id: If984fe6686eddce9408bf25e725dd17fc16b8578
Reviewed-on: https://go-review.googlesource.com/36030
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
Remove rotate generation from walk. Remove OLROT and ssa.Lrot* opcodes.
Generate rotates during SSA lowering for architectures that have them.
This CL will allow rotates to be generated in more situations,
like when the shift values are determined to be constant
only after some analysis.
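For example (illustrative), a pattern that can now lower to a
rotate instruction on architectures that have one:

	func rotl32(x uint32, k uint) uint32 {
		k &= 31
		// If k is proven constant during optimization, this shift
		// pair can still be combined into a single rotate.
		return x<<k | x>>(32-k)
	}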
Fixes #18254
Change-Id: I8d6d684ff5ce2511aceaddfda98b908007851079
Reviewed-on: https://go-review.googlesource.com/34232
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
XPos is a compact (8 instead of 16 bytes on a 64bit machine) source
position representation. There is a 1:1 correspondence between each
XPos and each regular Pos, translated via a global table.
In some sense this brings back the LineHist, though positions can
track line and column information; there is a O(1) translation
between the representations (no binary search), and the translation
is factored out.
The size increase with the prior change is brought down again and
the compiler speed is in line with the master repo (measured on
the same "quiet" machine as for the prior change):
name old time/op new time/op delta
Template 256ms ± 1% 262ms ± 2% ~ (p=0.063 n=5+4)
Unicode 132ms ± 1% 135ms ± 2% ~ (p=0.063 n=5+4)
GoTypes 891ms ± 1% 871ms ± 1% -2.28% (p=0.016 n=5+4)
Compiler 3.84s ± 2% 3.89s ± 2% ~ (p=0.413 n=5+4)
MakeBash 47.1s ± 1% 46.2s ± 2% ~ (p=0.095 n=5+5)
name old user-ns/op new user-ns/op delta
Template 309M ± 1% 314M ± 2% ~ (p=0.111 n=5+4)
Unicode 165M ± 1% 172M ± 9% ~ (p=0.151 n=5+5)
GoTypes 1.14G ± 2% 1.12G ± 1% ~ (p=0.063 n=5+4)
Compiler 5.00G ± 1% 4.96G ± 1% ~ (p=0.286 n=5+4)
Change-Id: Icc570cc60ab014d8d9af6976f1f961ab8828cc47
Reviewed-on: https://go-review.googlesource.com/34506
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This replaces the src.Pos LineHist-based position tracking with
the syntax.Pos implementation and updates all uses.
The LineHist table is not used anymore - the respective code is still
there but should be removed eventually. CL forthcoming.
Passes toolstash -cmp when comparing to the master repo (with the
exception of a couple of swapped assembly instructions, likely due
to different instruction scheduling because the line-based sorting
has changed; though this won't affect correctness).
The sizes of various important compiler data structures have increased
significantly (see the various sizes_test.go files); this is probably
the reason for an increase of compilation times (to be addressed). Here
are the results of compilebench -count 5, run on a "quiet" machine (no
apps running besides a terminal):
name old time/op new time/op delta
Template 256ms ± 1% 280ms ±15% +9.54% (p=0.008 n=5+5)
Unicode 132ms ± 1% 132ms ± 1% ~ (p=0.690 n=5+5)
GoTypes 891ms ± 1% 917ms ± 2% +2.88% (p=0.008 n=5+5)
Compiler 3.84s ± 2% 3.99s ± 2% +3.95% (p=0.016 n=5+5)
MakeBash 47.1s ± 1% 47.2s ± 2% ~ (p=0.841 n=5+5)
name old user-ns/op new user-ns/op delta
Template 309M ± 1% 326M ± 2% +5.18% (p=0.008 n=5+5)
Unicode 165M ± 1% 168M ± 4% ~ (p=0.421 n=5+5)
GoTypes 1.14G ± 2% 1.18G ± 1% +3.47% (p=0.008 n=5+5)
Compiler 5.00G ± 1% 5.16G ± 1% +3.12% (p=0.008 n=5+5)
Change-Id: I241c4246cdff627d7ecb95cac23060b38f9775ec
Reviewed-on: https://go-review.googlesource.com/34273
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Loop breaking with a counter. Benchmarked (see comments),
eyeball-checked for sanity on popular loops. This code
ought to handle loops in general, and properly inserts phi
functions in cases where the earlier version might not have.
Includes a test, plus modifications to test/run.go to deal with
timing out and killing a looping test. Tests that were broken by
the extra code added for the checks (branch frequency and live
vars) now turn check insertion off.
If GOEXPERIMENT=preemptibleloops, the compiler inserts reschedule
checks on every backedge of every reducible loop. Alternately,
specifying GO_GCFLAGS=-d=ssa/insert_resched_checks/on will
enable it for a single compilation, but because the core Go
libraries contain some loops that may run long, this is less
likely to have the desired effect.
This is intended as a tool to help in the study and diagnosis
of GC and other latency problems, now that goal STW GC latency
is on the order of 100 microseconds or less.
Updates #17831.
Updates #10958.
Change-Id: I6206c163a5b0248e3f21eb4fc65f73a179e1f639
Reviewed-on: https://go-review.googlesource.com/33910
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This is a mostly mechanical rename followed by manual fixes where necessary.
Change-Id: Ie5c670b133db978f15dc03e50dc2da0c80fc8842
Reviewed-on: https://go-review.googlesource.com/34137
Reviewed-by: David Lazar <lazard@golang.org>
Various minor adjustments.
Change-Id: Iedfb97989f7bedaa3e9e8993b167e05f162434a7
Reviewed-on: https://go-review.googlesource.com/34136
Reviewed-by: David Lazar <lazard@golang.org>
Adjust cmd/compile accordingly.
This will make it easier to replace the underlying implementation.
Change-Id: I33645850bb18c839b24785b6222a9e028617addb
Reviewed-on: https://go-review.googlesource.com/34133
Reviewed-by: David Lazar <lazard@golang.org>
This is a step toward choosing a different position representation.
By introducing an explicit type, it will be easier to make the
transition step-wise while ensuring everything keeps running.
This has been reviewed via https://go-review.googlesource.com/#/c/34025/.
Change-Id: Ibceddcd62d8f346321ac3250e3940e9c436ed684
Reviewed-on: https://go-review.googlesource.com/34132
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Lazar <lazard@golang.org>
When compiling f(a, b, c), we do something like:
	*(SP+0) = eval(a)
	*(SP+8) = eval(b)
	*(SP+16) = eval(c)
	call f
If one of those evaluations is later determined to unconditionally panic
(say eval(b) in this example), then the call is deadcode eliminated. But
any previous argument write (*(SP+0)=... here) is still around. Because
we only compute the size of the outarg area for calls which are still
around at the end of optimization, the space needed for *(SP+0)=v is not
accounted for, and thus the outarg area may be too small.
The fix is to make sure that we evaluate any potentially panicking
operation before we write any of the args to the stack. It turns out
the fix is pretty easy, as we already have such a mechanism available
for function args. We just need to extend it to possibly panicking args
as well.
The resulting code (if b and c can panic, but a can't) is:
	tmpb = eval(b)
	*(SP+16) = eval(c)
	*(SP+0) = eval(a)
	*(SP+8) = tmpb
	call f
This change tickled a bug in how we find the arguments for intrinsic
calls, so that latent bug is fixed up as well.
Update #16760.
Change-Id: I0bf5edf370220f82bc036cf2085ecc24f356d166
Reviewed-on: https://go-review.googlesource.com/32551
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The code to do the conversion is smaller than the
call to the runtime.
The 1-result asserts need to call panic if they fail, but that
code is out of line.
The only conversions left in the runtime are those which
might allocate and those which might need to generate an itab.
Given the following types:
	type E interface{}
	type I interface { foo() }
	type I2 interface { foo(); bar() }
	type Big [10]int
	func (b Big) foo() { ... }
This CL inlines the following conversions:
was assertE2T
	var e E = ...
	b := e.(Big)
was assertE2T2
	var e E = ...
	b, ok := e.(Big)
was assertI2T
	var i I = ...
	b := i.(Big)
was assertI2T2
	var i I = ...
	b, ok := i.(Big)
was assertI2E
	var i I = ...
	e := i.(E)
was assertI2E2
	var i I = ...
	e, ok := i.(E)
These are the remaining runtime calls:
convT2E:
	var b Big = ...
	var e E = b
convT2I:
	var b Big = ...
	var i I = b
convI2I:
	var i2 I2 = ...
	var i I = i2
assertE2I:
	var e E = ...
	i := e.(I)
assertE2I2:
	var e E = ...
	i, ok := e.(I)
assertI2I:
	var i I = ...
	i2 := i.(I2)
assertI2I2:
	var i I = ...
	i2, ok := i.(I2)
Fixes #17405
Fixes #8422
Change-Id: Ida2367bf8ce3cd2c6bb599a1814f1d275afabe21
Reviewed-on: https://go-review.googlesource.com/32313
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
We generate an OpKeepAlive for the idata portion of the interface
for a runtime.KeepAlive call. But given such an op, we need to keep
the entire containing variable alive, not just the range that was
passed to the OpKeepAlive operation.
Fixes #17710
Change-Id: I90de66ec8065e22fb09bcf9722999ddda289ae6e
Reviewed-on: https://go-review.googlesource.com/32477
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
We used to have to keep on-stack copies of these types.
Now they can be registerized.
[0]T is kind of trivial but might as well handle it.
This change enables another change I'm working on to improve how x.(T)
expressions are handled (#17405). This CL helps because now all
types that are direct interface types are registerizable (e.g. [1]*byte).
No higher-degree arrays for now because non-constant indexes are hard.
Update #17405
Change-Id: I2399940965d17b3969ae66f6fe447a8cefdd6edd
Reviewed-on: https://go-review.googlesource.com/32416
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This is an extension of
https://go-review.googlesource.com/c/31662/
to mark all the temporaries, not just the ssa-generated ones.
Before-and-after ls -l `go tool -n compile` shows a 3%
reduction in size (or rather, a prior 3% inflation for
failing to filter temps out properly).
Replaced name-dependent "is it a temp?" tests with calls to
*Node.IsAutoTmp(), which depends on AutoTemp. Also replaced
calls to istemp(n) with n.IsAutoTmp(), to reduce duplication
and clean up the function name space. Generated temporaries
now come with a "." prefix to avoid (apparently harmless)
clashes with legal Go variable names.
Fixes #17644.
Fixes #17240.
Change-Id: If1417f29c79a7275d7303ddf859b51472890fd43
Reviewed-on: https://go-review.googlesource.com/32255
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Currently, zeroing generates an ssa.OpZero, which never has write
barriers, even if the assignment is an OASWB. The hybrid barrier
requires write barriers on zeroing, so change OASWB to generate an
ssa.OpZeroWB when assigning the zero value, which turns into a
typedmemclr.
Updates #17503.
Change-Id: Ib37ac5e39f578447dbd6b36a6a54117d5624784d
Reviewed-on: https://go-review.googlesource.com/31451
Reviewed-by: Cherry Zhang <cherryyz@google.com>
- removes the runtime function stringtoslicebytetmp
- removes the generation of calls to stringtoslicebytetmp from the frontend
- adds handling of OSTRARRAYBYTETMP in the backend
This reduces binary sizes and avoids function call overhead.
Change-Id: Ib9988d48549cee663b685b4897a483f94727b940
Reviewed-on: https://go-review.googlesource.com/32158
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
When we do
	var x []byte = ...
	y := x[i:]
we can't just use y.ptr = x.ptr + i, as the new pointer may point to the
next object in memory after the backing array.
We used to fix this by doing:
	y.cap = x.cap - i
	delta := i
	if y.cap == 0 {
		delta = 0
	}
	y.ptr = x.ptr + delta
That generates a branch in what is otherwise straight-line code.
Better to do:
	y.cap = x.cap - i
	mask := (y.cap - 1) >> 63 // -1 if y.cap==0, 0 otherwise
	y.ptr = x.ptr + i&^mask
It's about the same number of instructions (~4, depending on what
parts are constant, and the target architecture), but it is all
inline. It plays nicely with CSE, and the mask can be computed
in parallel with the index (in cases where a multiply is required).
It is a minor win in both speed and space.
Change-Id: Ied60465a0b8abb683c02208402e5bb7ac0e8370f
Reviewed-on: https://go-review.googlesource.com/32022
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
When the compiler inserts write barriers, the frontend makes
conservative decisions at an early stage. This may have false
positives, which result in write barriers for stack writes.
A new phase, writebarrier, is added to the SSA backend, to delay
the decision and eliminate false positives. The frontend still
makes conservative decisions. When building SSA, instead of
emitting runtime calls directly, it emits WB ops (StoreWB,
MoveWB, etc.), which will be expanded to branches and runtime
calls in writebarrier phase. Writes to static locations on stack
are detected and write barriers are removed.
All write barriers of stack writes found by the script from
issue #17330 are eliminated (except two false positives).
Fixes #17330.
Change-Id: I9bd66333da9d0ceb64dcaa3c6f33502798d1a0f8
Reviewed-on: https://go-review.googlesource.com/31131
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Adapt an old test for prove's bounds check elimination.
Added a missing rule to the generic rules whose absence led to
differences between 32- and 64-bit platforms on the sliceopt test.
Added debugging to prove.go that was helpful-to-necessary to
discover that missing rule.
Lowered the debugging level in prove.go from 3 to 1; no idea
why it was previously 3.
Change-Id: I09de206aeb2fced9f2796efe2bfd4a59927eda0c
Reviewed-on: https://go-review.googlesource.com/23290
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This pragma cancels the effect of go:nowritebarrierrec. This is useful
in the scheduler because there are places where we enter a function
without a valid P (and hence cannot have write barriers), but then
obtain a P. This allows us to annotate the function with
go:nowritebarrierrec and split out the part after we've obtained a P
into a go:yeswritebarrierrec function.
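Schematically, in runtime-style code (a sketch; these pragmas are
only permitted when compiling the runtime, and the function names
here are invented):

	//go:nowritebarrierrec
	func enter() {
		// No valid P here, so no write barriers are allowed in
		// this function or anything it calls, recursively.
		obtainP()
		withP()
	}

	//go:yeswritebarrierrec
	func withP() {
		// A P is held now; this pragma cancels the recursive
		// no-write-barrier requirement for this subtree.
	}

	func obtainP() {}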
Change-Id: Ic8ce4b6d3c074a1ecd8280ad90eaf39f0ffbcc2a
Reviewed-on: https://go-review.googlesource.com/30938
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
"abc"[1] is not like 'b', in that -"abc"[1] is uint8 math, not ideal constant math.
Delay the constantification until after ideal constant folding is over.
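Concretely (illustrative):

	var a = -'b' // 'b' is an untyped constant: ideal math, a == -98
	// var b = -"abc"[1] // "abc"[1] is a typed uint8 constant (98),
	//                   // so -98 overflows byte and is rejected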
Fixes #11370.
Change-Id: Iba2fc00ca2455959e7bab8f4b8b4aac14b1f9858
Reviewed-on: https://go-review.googlesource.com/15740
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This, along with CL 30140, removes ~50% of stack write barriers
mentioned in issue #17330. The remaining ones are mostly due to Phi
and FwdRef, which are not resolved when building SSA.
able to do it at a later stage where Phi and Copy propagations
are done, but matching an if-(store-store-call)+ sequence seems
not very pleasant.
Updates #17330.
Change-Id: Iaa36c7b1f4c4fc3dc10a27018a3b0e261094cb21
Reviewed-on: https://go-review.googlesource.com/30290
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Identify live stack variables during SSA and compute the stack frame
layout earlier so that we can emit instructions with the correct
offsets upfront.
Passes toolstash/buildall.
Change-Id: I191100dba274f1e364a15bdcfdc1d1466cdd1db5
Reviewed-on: https://go-review.googlesource.com/30216
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Probably a holdover from linked list vs. slice.
Change-Id: Ib2540b08ef0ae48707d44a5d57bc23f8d65c760d
Reviewed-on: https://go-review.googlesource.com/30256
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Should be more asymptotically happy.
We process each variable in turn to find all the
locations where it needs a phi (the dominance frontier
of all of its definitions). Then we add all those phis.
This takes O(n * #variables), although hopefully much less.
Then we do a single tree walk to match all the
FwdRefs with the nearest definition or phi.
This takes O(n) time.
The one remaining inefficiency is that we might end up
introducing a bunch of dead phis in the first step.
A TODO is to introduce phis only where they might be
used by a read.
The old algorithm is still faster on small functions,
so there's a cutover size (currently 500 blocks).
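The per-variable placement step is the classic iterated dominance
frontier walk; a sketch, assuming a precomputed dominance frontier
df for each block:

	// placePhis marks every block needing a phi for one variable,
	// given the blocks that contain its definitions. A newly placed
	// phi is itself a new definition, hence the worklist.
	func placePhis(defs []int, df [][]int, needsPhi []bool) {
		work := append([]int(nil), defs...)
		for len(work) > 0 {
			b := work[len(work)-1]
			work = work[:len(work)-1]
			for _, frontier := range df[b] {
				if !needsPhi[frontier] {
					needsPhi[frontier] = true
					work = append(work, frontier)
				}
			}
		}
	}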
This algorithm supersedes David's sparse phi
placement algorithm for large functions.
Lowers compile time of example from #14934 from
~10 sec to ~4 sec.
Lowers compile time of example from #16361 from
~4.5 sec to ~3 sec.
Lowers #16407 from ~20 min to ~30 sec.
Update #14934
Update #16361
Fixes #16407
Change-Id: I1cff6364e1623c143190b6a924d7599e309db58f
Reviewed-on: https://go-review.googlesource.com/30163
Reviewed-by: David Chase <drchase@google.com>
At this point in the compiler we haven't assigned Xoffset values for
PAUTO variables anyway, so just immediately store the stack offsets
into Xoffset rather than into a global map.
Change-Id: I61eb471c857c8b145fd0895cbd98fd4e8d3c3365
Reviewed-on: https://go-review.googlesource.com/30081
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
	var x *X = ...
	defer x.foo()
As part of the defer, we need to calculate &(*X).foo·f. This expression
is the address of the static closure that will call (*X).foo when a
pointer to that closure is used in a call/defer/go. This pointer is not
currently properly typed in SSA. It is a pointer type, but the base
type is nil, not a proper type.
This turns out not to be a problem currently because we never use the
type of these SSA values. But I'm trying to change that (to be able to
spill them) in CL 28391. To fix, use uint8 as the fake type of the
closure.
Change-Id: Ieee388089c9af398ed772ee8c815122c347cb633
Reviewed-on: https://go-review.googlesource.com/29444
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Also adds the 'find leftmost one' instruction (FLOGR) and replaces the
WORD-encoded use of FLOGR in math/big with it.
Change-Id: I18e7cd19e75b8501a6ae8bd925471f7e37ded206
Reviewed-on: https://go-review.googlesource.com/29372
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
We're dropping this behavior in favor of runtime.KeepAlive.
Implement runtime.KeepAlive as an intrinsic.
Update #15843
Change-Id: Ib60225bd30d6770ece1c3c7d1339a06aa25b1cbc
Reviewed-on: https://go-review.googlesource.com/28310
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Teach SSA about the cmd/internal/obj/$ARCH register numbering.
It can then return that numbering when requested. Each architecture
now does not need to know anything about the internal SSA numbering
of registers.
Change-Id: I34472a2736227c15482e60994eebcdd2723fa52d
Reviewed-on: https://go-review.googlesource.com/29249
Reviewed-by: David Chase <drchase@google.com>
Follow up to CL 29134. Generated with gofmt -r 'Nod -> nod', plus
three manual adjustments to the comments in syntax/parser.go
Change-Id: I02920f7ab10c70b6e850457b42d5fe35f1f3821a
Reviewed-on: https://go-review.googlesource.com/29136
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Binary search remains our friend.
Suppose you add an ought-to-be-benign pattern to PPC64.rules,
and make.bash starts crashing. You can guard the pattern(s)
with config.DebugTest:
(Eq8 x y) && config.DebugTest && isSigned(x.Type) && isSigned(y.Type) ->
	(Equal (CMPW (SignExt8to32 x) (SignExt8to32 y)))
and then
	gossahash -s ./make.bash
	...
	(go drink beer while silicon minions toil)
	...
	Trying ./make.bash args=[], env=[GOSSAHASH=100110010111110]
	./make.bash failed (1 distinct triggers): exit status 1
	Trigger string is 'GOSSAHASH triggered (*importReader).readByte',
	repeated 1 times
	Review GSHS_LAST_FAIL.0.log for failing run
Finished with GOSSAHASH=100110010111110
Change-Id: I4eff46ebaf496baa2acedd32e217005cb3ac1c62
Reviewed-on: https://go-review.googlesource.com/29273
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
GOSSAFUNC=foo had previously only done printing for the
single function foo, and didn't quite clean up after itself
properly. This change ensures that Config.HTML != nil iff
GOSSAFUNC == name-of-current-function.
Change-Id: I255e2902dfc64f715d93225f0d29d9525c06f764
Reviewed-on: https://go-review.googlesource.com/29250
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
After the removal of the old backend, many functions are no longer
referenced outside internal/gc. Make these functions private so that
tools like honnef.co/go/unused can spot when they become dead code.
In doing so, this CL identified several previously public helpers
which are no longer used, and removes them.
This should be the last of the public functions.
Change-Id: I7e9c4e72f86f391b428b9dddb6f0d516529706c3
Reviewed-on: https://go-review.googlesource.com/29134
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Get rid of BlockCheck. Josh goaded me into it, and I went
down a rabbit hole making it happen.
NilCheck now panics if the pointer is nil and returns void, as before.
BlockCheck is gone, and NilCheck is no longer a Control value for
any block. It just exists (and deadcode knows not to throw it away).
I rewrote the nilcheckelim pass to handle this case. In particular,
there can now be multiple NilCheck ops per block.
I moved all of the arch-dependent nil check elimination done as
part of ssaGenValue into its own proper pass, so we don't have to
duplicate that code for every architecture.
Making the arch-dependent nil check its own pass means I needed
to add a bunch of flags to the opcode table so I could write
the code without arch-dependent ops everywhere.
Change-Id: I419f891ac9b0de313033ff09115c374163416a9f
Reviewed-on: https://go-review.googlesource.com/29120
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
When not instrumenting:
- Intrinsify uses of slicebytetostringtmp within the runtime package
in the ssa backend.
- Pass OARRAYBYTESTRTMP nodes to the compiler backends for lowering
instead of generating calls to slicebytetostringtmp.
name old time/op new time/op delta
ConcatStringAndBytes-4 27.9ns ± 2% 24.7ns ± 2% -11.52% (p=0.000 n=43+43)
Fixes #17044
Change-Id: I51ce9c3b93284ce526edd0234f094e98580faf2d
Reviewed-on: https://go-review.googlesource.com/29017
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Rip out the code that allows SSA to be used conditionally.
No longer exists:
	ssa=0 flag
	GOSSAHASH
	GOSSAPKG
	SSATEST
GOSSAFUNC now only controls the printing of the IR/html.
Still need to rip out all of the old backend. It should no longer be
callable after this CL.
Update #16357
Change-Id: Ib30cc18fba6ca52232c41689ba610b0a94aa74f5
Reviewed-on: https://go-review.googlesource.com/29155
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
It passed tests once; if anything's wrong, better to fail
sooner than later.
Change-Id: Ibb1c5db3f4c5535a4ff4681fd157db77082c5041
Reviewed-on: https://go-review.googlesource.com/28982
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
The new SSA backend modifies the ABI slightly: R0 is now a usable
general purpose register.
Fixes #16677.
Change-Id: I367435ce921e0c7e79e021c80cf8ef5d1d1466cf
Reviewed-on: https://go-review.googlesource.com/28978
Run-TryBot: Michael Munday <munday@ca.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Also add an assembly implementation, in case intrinsics are disabled.
Change-Id: Iff0a8a8ce326651bd29f6c403f5ec08dd3629993
Reviewed-on: https://go-review.googlesource.com/28979
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
No need for it; we can treat calls as (mostly) normal values
that take a memory and return a memory.
Lowers the number of basic blocks needed to represent a function.
"go test -c net/http" uses 27% fewer basic blocks.
Probably doesn't affect generated code much, but should help
various passes whose running time and/or space depends on
the number of basic blocks.
Fixes #15631
Change-Id: I0bf21e123f835e2cfa382753955a4f8bce03dfa6
Reviewed-on: https://go-review.googlesource.com/28950
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
- also consistently use %v instead of %s when we have a (gc) Formatter
- rewrite done automatically using Formats test in -u (update) mode
- manual update of format strings that were not single string constants
- updated fmt.go, fmt_test.go accordingly
- fmt_test: permit "%T" always
Change-Id: I8f0704286aba5704600ad0c4a4484005b79b905d
Reviewed-on: https://go-review.googlesource.com/28954
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Atomic ops on ARM are implemented with kernel calls, so they are
not intrinsified.
Change-Id: I0e7cc2e5526ae1a3d24b4b89be1bd13db071f8ef
Reviewed-on: https://go-review.googlesource.com/28977
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This list now matches the one in popt.go.
Change-Id: Ib24de531cc35252f0ef276e5c6d247654b021533
Reviewed-on: https://go-review.googlesource.com/28965
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
When the divisor is known to be a constant
non-zero, don't insert panicdivide calls
that will just be eliminated later.
The main benefit here is readability of the SSA
form for compiler developers.
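For illustration (sketch only, not from the CL):

	package p

	func div7(x int) int {
		// Divisor is a non-zero constant: no panicdivide call is inserted.
		return x / 7
	}

	func div(x, y int) int {
		// Divisor is unknown: the zero check (panicdivide) is still emitted.
		return x / y
	}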
Change-Id: Icb7d07fc996941fbaff84524ac3e4b53d8e75fda
Reviewed-on: https://go-review.googlesource.com/28530
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
It should alias to Xchg instead of Swap. Found when testing #16985.
Change-Id: If9fd734a1f89b8b2656f421eb31b9d1b0d95a49f
Reviewed-on: https://go-review.googlesource.com/28512
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Does not pass toolstash -cmp due to changed export data,
but the cmd/go binary (which doesn't contain export data)
is bit-for-bit identical.
Change-Id: I6b12f9de18cf7da528e9207dccbf8f08c969f142
Reviewed-on: https://go-review.googlesource.com/26753
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reduce the duplication in every arch by moving the code into package gc.
Change-Id: Ia111add8316492571825431ecd4f0154c8792ae1
Reviewed-on: https://go-review.googlesource.com/28481
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Uses the same implementation as runtime/internal/atomic.
Reorganize the intrinsic detector to make it more table-driven.
Also works on amd64p32.
Change-Id: I7a5238951d6018d7d5d1bc01f339f6ee9282b2d0
Reviewed-on: https://go-review.googlesource.com/28076
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Atomic swap, add/and/or, compare and swap.
Also works on amd64p32.
Change-Id: Idf2d8f3e1255f71deba759e6e75e293afe4ab2ba
Reviewed-on: https://go-review.googlesource.com/27813
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Add the following optimizations:
- fold constants
- fold address into load/store
- simplify extensions and conditional branches
- remove nil checks
Turn on SSA on MIPS64 by default, and toggle the tests.
Fixes #16359.
Change-Id: I7f1e38c2509e22e42cd024e712990ebbe47176bd
Reviewed-on: https://go-review.googlesource.com/27870
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Inline atomic reads and writes on amd64. There's no reason
to pay the overhead of a call for these.
To keep atomic loads from being reordered, we make them
return a <value,memory> tuple.
Change the meaning of resultInArg0 for tuple-generating ops
to mean the first part of the result tuple, not the second.
This means we can always put the store part of the tuple last,
matching how arguments are laid out. This requires reordering
the outputs of add32carry and sub32carry and their descendants
in various architectures.
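The shape of the tuple, illustratively (op names as used in the SSA rules;
this is a sketch, not the CL's literal output):

	v = AtomicLoad64 ptr mem   // tuple-typed result: <UInt64, Mem>
	x = Select0 v              // the loaded value
	m = Select1 v              // the new memory; later memory ops depend
	                           // on m, so they cannot move before the load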
benchmark old ns/op new ns/op delta
BenchmarkAtomicLoad64-8 2.09 0.26 -87.56%
BenchmarkAtomicStore64-8 7.54 5.72 -24.14%
TBD (in a different CL): Cas, Or8, ...
Change-Id: I713ea88e7da3026c44ea5bdb56ed094b20bc5207
Reviewed-on: https://go-review.googlesource.com/27641
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Now that we have ops that can return 2 results, have BSF return a result
and flags. We can then get rid of the redundant comparison and use CMOV
instead of CMOVconst ops.
Get rid of a bunch of the ops we don't use. Ctz{8,16}, plus all the Clzs,
and CMOVNEs. I don't think we'll ever use them, and they would be easy
to add back if needed.
Change-Id: I8858a1d017903474ea7e4002fc76a6a86e7bd487
Reviewed-on: https://go-review.googlesource.com/27630
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This time with the cherry-pick from the proper patch of
the old CL.
Stack size increased.
Corrected NaN-comparison glitches.
Marked g register as clobbered by calls.
Fixed shared libraries.
live_ssa.go still disabled because of differences.
Presumably turning on more optimization will fix
both the stack size and the live_ssa.go glitches.
Enhanced debugging output for shared libs test.
Rebased onto master.
Updates #16010.
Change-Id: I40864faf1ef32c118fb141b7ef8e854498e6b2c4
Reviewed-on: https://go-review.googlesource.com/27159
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
- Use machine instructions for uint64<->float conversions
- Do not enforce alignment on Zero/Move
ARM64 supports unaligned load/stores, but only aligned offsets
or small offsets can be encoded into instructions.
- Do combined loads
Change-Id: Iffca7dd0f13070b17b784861ce5a30af584680eb
Reviewed-on: https://go-review.googlesource.com/27086
Reviewed-by: David Chase <drchase@google.com>
When T is a scalar, there are no runtime calls
required, which makes this a clear win.
encoding/binary:
WriteInts-8 958ns ± 3% 864ns ± 2% -9.80% (p=0.000 n=15+15)
This also considerably shrinks a core fmt
routine:
Before: "".(*pp).printArg t=1 size=3952 args=0x20 locals=0xf0
After: "".(*pp).printArg t=1 size=2624 args=0x20 locals=0x98
Unfortunately, I find it very hard to get stable
numbers out of the fmt benchmarks due to thermal scaling.
Change-Id: I1278006b030253bf8e48dc7631d18985cdaa143d
Reviewed-on: https://go-review.googlesource.com/26659
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
autopkg == localpkg, so it appears to be a remnant of earlier code.
Change-Id: I65b6c074535e877317cbf9f1f35e94890f0ebf14
Reviewed-on: https://go-review.googlesource.com/26662
Reviewed-by: Keith Randall <khr@golang.org>
Passes ssa_test.
Requires a few new instructions and some scratchpad
memory to move data between G and F registers.
Also fixed comparisons to be correct in case of NaN.
Added missing instructions for run.bash.
Removed some FP registers that are apparently "reserved"
(but that are apparently also unused, except for a
gratuitous multiplication by two where y = x+x would work
just as well).
Currently failing stack splits.
Updates #16010.
Change-Id: I73b161bfff54445d72bd7b813b1479f89fc72602
Reviewed-on: https://go-review.googlesource.com/26813
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Add more ARM64 optimizations:
- use the hardware zero register when possible.
- use shifted ops.
The assembler supports shifted ops, but they were not documented,
and it did not know how to print them. This CL adds them.
- enable fast division.
This was disabled because it makes the old backend generate slower
code. But with SSA it generates faster code.
Turn on SSA by default, also adjust tests.
Change-Id: I7794479954c83bb65008dcb457bc1e21d7496da6
Reviewed-on: https://go-review.googlesource.com/26950
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Last part of the 386 SSA port.
Modify the x86 backend to simulate SSE registers and
instructions with 387 registers and instructions.
The simulation isn't terribly performant, but it works,
and the old implementation wasn't very performant either.
Leaving to people who care about 387 to optimize if they want.
Turn on SSA backend for 386 by default.
Fixes #16358
Change-Id: I678fb59132620b2c47e993c1c10c4c21135f70c0
Reviewed-on: https://go-review.googlesource.com/25271
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
It's not a new backend, just a PtrSize==4 modification
of the existing AMD64 backend.
Change-Id: Icc63521a5cf4ebb379f7430ef3f070894c09afda
Reviewed-on: https://go-review.googlesource.com/25586
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
GOARCH=386 SSATEST=1 ./all.bash passes
Caveat: still needs changes to test/ files to use *_ssa.go versions. I
won't check those changes in with this CL because the builders will
complain as they don't have SSATEST=1.
Mostly minor fixes.
Implement float <-> uint32 in assembly. It seems the simplest option
for now.
GO386=387 does not work. That's why I can't make SSA the default for
386 yet.
Change-Id: Ic4d4402104d32bcfb1fd612f5bb6539f9acb8ae0
Reviewed-on: https://go-review.googlesource.com/25119
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Make tuple types and their SelectX ops fully generic.
These ops no longer need to be lowered.
Regalloc understands them and their tuple-generating arguments.
We can now have opcodes returning arbitrary pairs of results.
(And it would be easy to move to >2 results if needed.)
Update arm implementation to the new standard.
Implement just enough in 386 port to do 64-bit add.
Change-Id: I370ed5aacce219c82e1954c61d1f63af76c16f79
Reviewed-on: https://go-review.googlesource.com/24976
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
NaCl code runs in a sandbox, and there are restrictions on the
instructions it may use
(https://developer.chrome.com/native-client/reference/sandbox_internals/arm-32-bit-sandbox).
Like the legacy backend, on NaCl,
- don't use R9, which is used as NaCl's "thread pointer".
- don't use Duff's device.
- don't use indexed load/stores.
- the assembler rewrites DIV/MOD to runtime calls, which on NaCl
clobbers R12, so R12 is marked as clobbered for DIV/MOD.
- other restrictions are satisfied by the assembler.
Enable SSA-specific tests on nacl/arm, and disable non-SSA ones.
Updates #15365.
Change-Id: I9262693ec6756b89ca29d3ae4e52a96fe5403b02
Reviewed-on: https://go-review.googlesource.com/24859
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
As Josh mentioned in CL 24716, there have been requests for using SSA
for ARM. SSA can still be disabled by setting -ssa=0 for cmd/compile,
or partially enabled with GOSSAFUNC, GOSSAPKG, and GOSSAHASH.
Do not enable SSA by default on NaCl, which is not supported yet.
Enable SSA-specific tests on ARM: live_ssa.go and nilptr3_ssa.go;
disable non-SSA tests: live.go, nilptr3.go, and sliceopt.go.
Updates #15365.
Change-Id: Ic2ca8d166aeca8517b9d262a55e92f2130683a16
Reviewed-on: https://go-review.googlesource.com/23953
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
The comments were mostly duplicated; unify them.
Add a check that the required invariant holds.
Change-Id: I42fe09dcd1fac76d3c4e191f7a58c591c5ce429b
Reviewed-on: https://go-review.googlesource.com/24719
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Encode the size and the alignment into AuxInt of Zero and Move ops.
On AMD64, we simply don't look at the alignment. On ARM and PPC64, we
only generate aligned stores.
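One way to pack both into AuxInt (a sketch of the CL's SizeAndAlign
helper; the exact bit split is an assumption here):

	package p

	type SizeAndAlign int64

	func MakeSizeAndAlign(size, align int64) SizeAndAlign {
		return SizeAndAlign(size<<5 | align) // low bits hold the alignment
	}

	func (x SizeAndAlign) Size() int64  { return int64(x) >> 5 }
	func (x SizeAndAlign) Align() int64 { return int64(x) & (1<<5 - 1) }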
Updates #15365.
Change-Id: Ifdcc205c364f67c4516b9adebfe7d50d223b6863
Reviewed-on: https://go-review.googlesource.com/24511
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The frontend may emit nodes with the line number missing. In this case,
use the parent's line number. Instead of changing every call site of
pushLine, do it in pushLine itself.
Fixes #16214.
Change-Id: I80390550b56e4d690fc770b01ff725b892ffd6dc
Reviewed-on: https://go-review.googlesource.com/24641
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
We had ~30 one way, and these four new occurrences the other way.
Updates #11626
Change-Id: Ic6403dc4905874916ae292ff739d33482ed8e5bf
Reviewed-on: https://go-review.googlesource.com/24683
Reviewed-by: Rob Pike <r@golang.org>
This adds the initial SSA implementation for PPC64.
The toolchain builds, and all.bash runs correctly. A simple hello.go
builds but does not run.
Change-Id: I7cec211b934cd7a2dd75a6cdfaf9f71867063466
Reviewed-on: https://go-review.googlesource.com/24453
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This will be used verbatim in other architectures.
Change-Id: I307891ae597d797fd45f296b6a38ffe9fac6b975
Reviewed-on: https://go-review.googlesource.com/24155
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This will be needed by other architectures as well.
Put a cleaner encapsulation around it.
Change-Id: I0ac25d600378042b2233301678e9d037e20701d8
Reviewed-on: https://go-review.googlesource.com/24154
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
SSA treats SP as constant throughout a function, and likewise OffPtr [off] SP.
When the stack moves, spilled OffPtr values become invalid if they are
not pointer-typed.
(Currently this is fine because of the optimization rules that fold OffPtr
into Load/Store. But that had better remain an "optimization", not a
requirement.)
Updates #15365.
Change-Id: I76cf4008dfdc169e1cb5a55a2605b6678efc915d
Reviewed-on: https://go-review.googlesource.com/23941
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This matches the behavior of the legacy backend.
Fixes #15975 (if this is the intended behavior)
Change-Id: Id277959069b8b8bf9958fa8f2cbc762c752a1a19
Reviewed-on: https://go-review.googlesource.com/23820
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Make sure auto names don't conflict with function names. Before this CL,
we confused the name a.len (the len field of the slice a) with a.len
(the function len declared on a).
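An illustrative clash (assumed shape of the bug):

	package p

	type T []int

	func (t T) len() int { return len(t) } // the function "a.len"

	func f(a T) int {
		// SSA decomposition of the slice a also invents a hidden auto
		// named "a.len" for its length field; before this CL the two
		// names could collide.
		return a.len()
	}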
Fixes #15961
Change-Id: I14913de697b521fb35db9a1b10ba201f25d552bb
Reviewed-on: https://go-review.googlesource.com/23789
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The machine supports (or the runtime simulates, in soft float mode)
(u)int32<->float conversions. The frontend rewrites int64<->float
conversions into calls to runtime functions.
For int64->float32 conversion, the frontend generates
. . AS u(100) l(10) tc(1)
. . . NAME-main.~r1 u(1) a(true) g(1) l(9) x(8+0) class(PPARAMOUT) f(1) float32
. . . CALLFUNC u(100) l(10) tc(1) float32
. . . . NAME-runtime.int64tofloat64 u(1) a(true) x(0+0) class(PFUNC) tc(1) used(true) FUNC-func(int64) float64
The CALLFUNC node has type float32, whereas runtime.int64tofloat64
returns float64. The legacy backend implicitly makes a float64->float32
conversion. The SSA backend does not do implicit conversion, so we
insert an explicit CONV here.
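In source terms, the inserted CONV corresponds to (illustrative):

	var x int64
	f := float32(x)
	// frontend now generates, roughly:
	//   f = float32(runtime.int64tofloat64(x))
	// where the outer float32(...) is the explicit CONV added by this CL.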
All cmd/compile/internal/gc/testdata/*_ssa.go tests passed.
Progress on SSA for ARM. Still not complete.
Update #15365.
Change-Id: I30937c8ff977271246b068f48224693776804339
Reviewed-on: https://go-review.googlesource.com/23652
Reviewed-by: Keith Randall <khr@golang.org>
This CL adds support for Div, Mod, Convert, GetClosurePtr, and 64-bit
indexing to the SSA backend for ARM.
Add tests for 64-bit indexing to cmd/compile/internal/gc/testdata/string_ssa.go.
Tests cmd/compile/internal/gc/testdata/*_ssa.go passed, except compound_ssa.go
and fp_ssa.go.
Progress on SSA for ARM. Still not complete. Essentially the only unsupported
part is floating point.
Updates #15365.
Change-Id: I269e88b67f641c25e7a813d910c96d356d236bff
Reviewed-on: https://go-review.googlesource.com/23542
Reviewed-by: David Chase <drchase@google.com>
Introduce dec64 rules to (generically) decompose 64-bit integer ops on
32-bit architectures. 64-bit integers are composed/decomposed with
Int64Make/Hi/Lo ops, as for complex types.
The idea of dealing with Add64 is the following:
(Add64 (Int64Make xh xl) (Int64Make yh yl))
->
(Int64Make
(Add32withcarry xh yh (Select0 (Add32carry xl yl)))
(Select1 (Add32carry xl yl)))
where Add32carry returns a tuple (flags,uint32). Select0 and Select1
read the first and the second component of the tuple, respectively.
The two Add32carry will be CSE'd.
Similarly for multiplication, Mul32uhilo returns a tuple (hi, lo).
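In Go terms, the Add64 lowering behaves like this (illustrative only;
math/bits merely models the carry ops here):

	package main

	import (
		"fmt"
		"math/bits"
	)

	// add64 mirrors the dec64 lowering of Add64 on a 32-bit target.
	func add64(xh, xl, yh, yl uint32) (zh, zl uint32) {
		zl, c := bits.Add32(xl, yl, 0) // Add32carry: low words, carry out
		zh, _ = bits.Add32(xh, yh, c)  // Add32withcarry: carry into high words
		return
	}

	func main() {
		fmt.Println(add64(0, 0xffffffff, 0, 1)) // prints: 1 0
	}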
Also add support for KeepAlive, to fix the build after a merge.
Tests addressed_ssa.go, array_ssa.go, break_ssa.go, chan_ssa.go,
cmp_ssa.go, ctl_ssa.go, map_ssa.go, and string_ssa.go in
cmd/compile/internal/gc/testdata passed.
Progress on SSA for ARM. Still not complete.
Updates #15365.
Change-Id: I7867c76785a456312de5d8398a6b3f7ca5a4f7ec
Reviewed-on: https://go-review.googlesource.com/23213
Reviewed-by: Keith Randall <khr@golang.org>
When we do *p = f(), we might need to copy the return value from
f to p with a write barrier. The write barrier itself is a call,
so we need to copy the return value of f to a temporary location
before we call the write barrier function. Otherwise, the call
itself (specifically, marshalling the args to typedmemmove) will
clobber the value we're trying to write.
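Sketch of the lowering (illustrative pseudocode, not the compiler's
exact output):

	*p = f()
	// before: typedmemmove(typ, p, addr of f's result slot)
	//         (marshalling this call's args clobbers f's result)
	// after:  tmp := f()                 // copy the result out first
	//         typedmemmove(typ, p, &tmp) // the barrier sees a stable copy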
Fixes #15854
Change-Id: I5703da87634d91a9884e3ec098d7b3af713462e7
Reviewed-on: https://go-review.googlesource.com/23522
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Also fix argument offset for runtime calls.
Also fix LoadReg/StoreReg by generating instructions according to the type.
Progress on SSA backend for ARM. Still not complete.
Tests append_ssa.go, assert_ssa.go, loadstore_ssa.go, short_ssa.go, and
deferNoReturn.go in cmd/compile/internal/gc/testdata passed.
Updates #15365.
Change-Id: I0f0a2398cab8bbb461772a55241a16a7da2ecedf
Reviewed-on: https://go-review.googlesource.com/23212
Reviewed-by: David Chase <drchase@google.com>
As in the elimination of PHEAP|PPARAM in CL 23393,
this is something the front end can trivially take care of
and then not bother the back ends with.
It also eliminates some suspect (and only lightly exercised)
code paths in the back ends.
I don't have a smoking gun for this one but it seems
more clearly correct.
Change-Id: I3b3f5e669b3b81d091ff1e2fb13226a6f14c69d5
Reviewed-on: https://go-review.googlesource.com/23431
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
The liveness computation of parameters generally was never
correct, but forcing all parameters to be live throughout the
function covered up that problem. The new SSA back end is
too clever: even though it currently keeps the parameter values live
throughout the function, it may find optimizations that mean
the current values are not written back to the original parameter
stack slots immediately or ever (for example if a parameter is set
to nil, SSA constant propagation may replace all later uses of the
parameter with a constant nil, eliminating the need to write the nil
value back to the stack slot), so the liveness code must now
track the actual operations on the stack slots, exposing these
problems.
One small problem in the handling of arguments is that nodarg
can return ONAME PPARAM nodes with adjusted offsets, so that
there are actually multiple *Node pointers for the same parameter
in the instruction stream. This might be possible to correct, but
not in this CL. For now, we fix this by using n.Orig instead of n
when considering PPARAM and PPARAMOUT nodes.
The major problem in the handling of arguments is general
confusion in the liveness code about the meaning of PPARAM|PHEAP
and PPARAMOUT|PHEAP nodes, especially as contrasted with PAUTO|PHEAP.
The difference between these two is that when a local variable "moves"
to the heap, it's really just allocated there to start with; in contrast,
when an argument moves to the heap, the actual data has to be copied
there from the stack at the beginning of the function, and when a
result "moves" to the heap the value in the heap has to be copied
back to the stack when the function returns.
This general confusion is also present in the SSA back end.
The PHEAP bit worked decently when I first introduced it 7 years ago (!)
in 391425ae. The back end did nothing sophisticated, and in particular
there was no analysis at all: no escape analysis, no liveness analysis,
and certainly no SSA back end. But the complications caused in the
various downstream consumers suggest that this should be a detail
kept mainly in the front end.
This CL therefore eliminates both the PHEAP bit and even the idea of
"heap variables" from the back ends.
First, it replaces the PPARAM|PHEAP, PPARAMOUT|PHEAP, and PAUTO|PHEAP
variable classes with the single PAUTOHEAP, a pseudo-class indicating
a variable maintained on the heap and available by indirecting a
local variable kept on the stack (a plain PAUTO).
Second, walkexpr replaces all references to PAUTOHEAP variables
with indirections of the corresponding PAUTO variable.
The back ends and the liveness code now just see plain indirected
variables. This may actually produce better code, but the real goal
here is to eliminate these little-used and somewhat suspect code
paths in the back end analyses.
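For example (illustrative):

	package p

	func f(x int) *int {
		// x escapes. The frontend copies x to the heap on entry and
		// rewrites all uses as indirections through a plain PAUTO
		// pointer; the back ends never see a "heap variable".
		return &x
	}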
The OPARAM node type goes away too.
A followup CL will do the same to PPARAMREF. I'm not sure that
the back ends (SSA in particular) are handling those right either,
and with the framework established in this CL that change is trivial
and the result clearly more correct.
Fixes #15747.
Change-Id: I2770b1ce3cbc93981bfc7166be66a9da12013d74
Reviewed-on: https://go-review.googlesource.com/23393
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Fix hardcoded flag register mask in ssa/flagalloc.go by auto-generating
the mask.
Also fix a mistake (in a previous CL) about conditional branches.
Progress on SSA backend for ARM. Still not complete. Now the
"container/ring" package compiles and its tests pass.
Updates #15365.
Change-Id: Id7c8805c30dbb8107baedb485ed0f71f59ed6ea8
Reviewed-on: https://go-review.googlesource.com/23093
Reviewed-by: Keith Randall <khr@golang.org>
Introduce a KeepAlive op which makes sure that its argument is kept
live until the KeepAlive. Use KeepAlive to mark pointer input
arguments as live after each function call and at each return.
We do this change only for pointer arguments. Those are the
critical ones to handle because they might have finalizers.
Doing compound arguments (slices, structs, ...) is more complicated
because we would need to track field liveness individually (we do
that for auto variables now, but inputs require extra trickery).
Turn off the automatic marking of args as live. That way, when args
are explicitly nulled, plive will know that the original argument is
dead.
The KeepAlive op will be the eventual implementation of
runtime.KeepAlive.
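Illustrative use (the classic finalizer case; the names here are made up):

	package p

	import "runtime"

	type resource struct{ fd int }

	func use(fd int) { /* ... */ }

	func f(r *resource) {
		fd := r.fd
		use(fd)
		// The KeepAlive op keeps r live through the call above; without
		// it, r could be collected and a finalizer could invalidate fd
		// while use(fd) is still running.
		runtime.KeepAlive(r)
	}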
Fixes #15277
Change-Id: I5f223e65d99c9f8342c03fbb1512c4d363e903e5
Reviewed-on: https://go-review.googlesource.com/22365
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
This adds a sparse method for locating nearest ancestors
in a dominator tree, and checks blocks with more than one
predecessor for differences, inserting phi functions where
there are any.
Uses reverse postorder to cut the number of passes, running
it from first def to last use ("last use" for paramout and
mem is end-of-program; last use for a phi input from a
backedge is the source of the back edge).
Includes a cutover from the old algorithm to the new one, to avoid
paying a large constant factor for small programs. This keeps normal
builds running at about the same speed, while not running
over-long on large machine-generated inputs.
Add "phase" flags for ssa/build -- ssa/build/stats prints
number of blocks, values (before and after linking references
and inserting phis, so expansion can be measured), and their
product; the product governs the cutover, where a good value
seems to be somewhere between 1 and 5 million.
Among the files compiled by make.bash, this is the shape of
the tail of the distribution for #blocks, #vars, and their
product:
#blocks #vars product
max 6171 28180 173,898,780
99.9% 1641 6548 10,401,878
99% 463 1909 873,721
95% 152 639 95,235
90% 84 359 30,021
The old algorithm is indeed usually fastest, for 99%ile
values of usually.
The fix to LookupVarOutgoing
( https://go-review.googlesource.com/#/c/22790/ )
deals with some of the same problems addressed by this CL,
but on at least one bug ( #15537 ) this change is still
a significant help.
With this CL:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 4m35.200s
user 13m16.644s
sys 0m36.712s
and pprof reports 3.4GB allocated in one of the larger profiles
With tip:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 10m36.569s
user 25m52.286s
sys 4m3.696s
and pprof reports 8.3GB allocated in the same larger profile
With this CL, most of the compilation time on the benchmarked
input is spent in register/stack allocation (cumulative 53%)
and in the sparse lookup algorithm itself (cumulative 20%).
Fixes #15537.
Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00
Reviewed-on: https://go-review.googlesource.com/22342
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Progress on SSA backend for ARM. Still not complete. Now the
"helloworld" function compiles and runs.
Updates #15365.
Change-Id: I02f66983cefdf07a6aed262fb4af8add464d8e9a
Reviewed-on: https://go-review.googlesource.com/22854
Reviewed-by: Keith Randall <khr@golang.org>
If b has exactly one predecessor, as happens
frequently with static calls, we can make
lookupVarOutgoing generate less garbage.
Instead of generating a value that is just
going to be an OpCopy and then get eliminated,
loop. This can lead to lots of looping.
However, this loop is way cheaper than generating
lots of ssa.Values and then eliminating them.
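Sketch of the loop (simplified; the field and map names here are assumed):

	for {
		if v, ok := s.defvars[b.ID][name]; ok {
			return v // found a definition of the variable in this block
		}
		if len(b.Preds) != 1 {
			break // multiple predecessors: emit a forward ref as before
		}
		b = b.Preds[0] // single predecessor: walk up, no OpCopy needed
	}
	// ... otherwise generate a forward reference in b as before ...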
For a subset of the code in #15537:
Before:
28.31 real 36.17 user 1.68 sys
2282450944 maximum resident set size
After:
9.63 real 11.66 user 0.51 sys
638144512 maximum resident set size
Updates #15537.
Excitingly, it appears that this also helps
regular code:
name old time/op new time/op delta
Template 288ms ± 6% 276ms ± 7% -4.13% (p=0.000 n=21+24)
Unicode 143ms ± 8% 141ms ±10% ~ (p=0.287 n=24+25)
GoTypes 932ms ± 4% 874ms ± 4% -6.20% (p=0.000 n=23+22)
Compiler 4.89s ± 4% 4.58s ± 4% -6.46% (p=0.000 n=22+23)
MakeBash 40.2s ±13% 39.8s ± 9% ~ (p=0.648 n=23+23)
name old user-ns/op new user-ns/op delta
Template 388user-ms ±10% 373user-ms ± 5% -3.80% (p=0.000 n=24+25)
Unicode 203user-ms ± 6% 202user-ms ± 7% ~ (p=0.492 n=22+24)
GoTypes 1.29user-s ± 4% 1.17user-s ± 4% -9.67% (p=0.000 n=25+23)
Compiler 6.86user-s ± 5% 6.28user-s ± 4% -8.49% (p=0.000 n=25+25)
name old alloc/op new alloc/op delta
Template 51.5MB ± 0% 47.6MB ± 0% -7.47% (p=0.000 n=22+25)
Unicode 37.2MB ± 0% 37.1MB ± 0% -0.21% (p=0.000 n=25+25)
GoTypes 166MB ± 0% 138MB ± 0% -16.83% (p=0.000 n=25+25)
Compiler 756MB ± 0% 628MB ± 0% -16.96% (p=0.000 n=25+23)
name old allocs/op new allocs/op delta
Template 450k ± 0% 445k ± 0% -1.02% (p=0.000 n=25+25)
Unicode 356k ± 0% 356k ± 0% ~ (p=0.374 n=24+25)
GoTypes 1.31M ± 0% 1.25M ± 0% -4.18% (p=0.000 n=25+25)
Compiler 5.29M ± 0% 5.02M ± 0% -5.15% (p=0.000 n=25+23)
It also seems to help in other cases in which
phi insertion is a pain point (#14774, #14934).
Change-Id: Ibd05ed7b99d262117ece7bb250dfa8c3d1cc5dd2
Reviewed-on: https://go-review.googlesource.com/22790
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Turn SSAing of variables off when compiling with optimizations off.
This helps keep variable names around that would otherwise be
optimized away.
Fixes #14744
Change-Id: I31db8cf269c068c7c5851808f13e5955a09810ca
Reviewed-on: https://go-review.googlesource.com/22681
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
These symbols are de-duplicated in the linker, but the compiler generates
quite a few duplicates too: 2425 of 13769 total symbols for runtime.a, for example.
De-duplicating them in the compiler saves the linker a bit of work.
Fixes #14983
Change-Id: I5f18e5f9743563c795aad8f0a22d17a7ed147711
Reviewed-on: https://go-review.googlesource.com/22293
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Updates #15462
Automatic refactor with sed -e.
Replace all oconv(op, 0) string conversions with the raw op value,
which fmt's %v verb can print directly.
The remaining oconv(op, FmtSharp) will be replaced with op.GoString and
%#v in the next CL.
Change-Id: I5e2f7ee0bd35caa65c6dd6cb1a866b5e4519e641
Reviewed-on: https://go-review.googlesource.com/22499
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The line numbers of ONAMEs are the location of their
declaration, not their use.
The line numbers of named OLITERALs are also the location
of their declaration.
Ignore both of these. Instead, we will inherit the line number from
the containing syntactic item.
Fixes #14742
Fixes #15430
Change-Id: Ie43b5b9f6321cbf8cead56e37ccc9364d0702f2f
Reviewed-on: https://go-review.googlesource.com/22479
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Updates #15462
Semi-automatic change with gofmt -r and hand fixups for callers outside
internal/gc.
All the uses of gc.Oconv outside cmd/compile/internal/gc were of the
Oconv(op, 0) form, which is already handled by the Op.String method.
Replace the use of gc.Oconv(op, 0) with op itself, which will call
Op.String via the %v or %s verb. Unexport Oconv.
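A typical call-site change (illustrative; the exact call sites vary):

	// before
	fmt.Printf("unexpected op %v", gc.Oconv(int(n.Op), 0))
	// after: %v picks up Op's String method directly
	fmt.Printf("unexpected op %v", n.Op)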
Change-Id: I84da2a2e4381b35f52efce427b2d6a3bccdf2526
Reviewed-on: https://go-review.googlesource.com/22496
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>