When we do
var x []byte = ...
y := x[i:]
We can't just use y.ptr = x.ptr + i, as the new pointer may point to the
next object in memory after the backing array.
We used to fix this by doing:
y.cap = x.cap - i
delta := i
if y.cap == 0 {
delta = 0
}
y.ptr = x.ptr + delta
That generates a branch in what is otherwise straight-line code.
Better to do:
y.cap = x.cap - i
mask := (y.cap - 1) >> 63 // -1 if y.cap==0, 0 otherwise
y.ptr = x.ptr + i &^ mask
It's about the same number of instructions (~4, depending on what
parts are constant, and the target architecture), but it is all
inline. It plays nicely with CSE, and the mask can be computed
in parallel with the index (in cases where a multiply is required).
It is a minor win in both speed and space.
Change-Id: Ied60465a0b8abb683c02208402e5bb7ac0e8370f
Reviewed-on: https://go-review.googlesource.com/32022
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
On ARM, DIV, DIVU, MOD, MODU are pseudo instructions that makes
runtime calls _div/_udiv/_mod/_umod, which themselves are wrappers
of udiv. The udiv function does the real thing.
Instead of generating these pseudo instructions, call to udiv
directly. This removes one layer of wrappers (which has an awkward
way of passing argument), and also allows combining DIV and MOD
if both results are needed.
Change-Id: I118afc3986db3a1daabb5c1e6e57430888c91817
Reviewed-on: https://go-review.googlesource.com/29390
Reviewed-by: David Chase <drchase@google.com>
Atomic ops on ARM are implemented with kernel calls, so they are
not intrinsified.
Change-Id: I0e7cc2e5526ae1a3d24b4b89be1bd13db071f8ef
Reviewed-on: https://go-review.googlesource.com/28977
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Instead of comparing the address of the end of the memory to zero/copy,
comparing the address of the last element, which is a valid pointer.
Also unify large and unaligned Zero/Move, by passing alignment as AuxInt.
Fixes#16515 for ARM.
Change-Id: I19a62b31c5acf5c55c16a89bea1039c926dc91e5
Reviewed-on: https://go-review.googlesource.com/25300
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
For register-register move, if there is only one use, allocate it in
the same register so we don't need to emit an instruction.
Updates #15365.
Change-Id: Iad41843854a506c521d577ad93fcbe73e8de8065
Reviewed-on: https://go-review.googlesource.com/25059
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Make tuple types and their SelectX ops fully generic.
These ops no longer need to be lowered.
Regalloc understands them and their tuple-generating arguments.
We can now have opcodes returning arbitrary pairs of results.
(And it would be easy to move to >2 results if needed.)
Update arm implementation to the new standard.
Implement just enough in 386 port to do 64-bit add.
Change-Id: I370ed5aacce219c82e1954c61d1f63af76c16f79
Reviewed-on: https://go-review.googlesource.com/24976
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
NaCl code runs in sandbox and there are restrictions for its
instruction uses
(https://developer.chrome.com/native-client/reference/sandbox_internals/arm-32-bit-sandbox).
Like the legacy backend, on NaCl,
- don't use R9, which is used as NaCl's "thread pointer".
- don't use Duff's device.
- don't use indexed load/stores.
- the assembler rewrites DIV/MOD to runtime calls, which on NaCl
clobbers R12, so R12 is marked as clobbered for DIV/MOD.
- other restrictions are satisfied by the assembler.
Enable SSA specific tests on nacl/arm, and disable non-SSA ones.
Updates #15365.
Change-Id: I9262693ec6756b89ca29d3ae4e52a96fe5403b02
Reviewed-on: https://go-review.googlesource.com/24859
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Add some simplification rules for floating point ops.
cmd/internal/obj/arm supports instructions that compare FP register
to 0, but runtime softfloat simulator does not. This CL adds these
instructions to softfloat simulator as well.
Updates #15365.
Change-Id: I29405b2bfcb4c8cf106cb7a1a811409fec91b170
Reviewed-on: https://go-review.googlesource.com/24790
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
This CL implements the following optimizations for ARM:
- use shifted ops (e.g. ADD R1<<2, R2) and indexed load/stores
- break up shift ops. Shifts used to be one SSA op that generates
multiple instructions. We break them up to multiple ops, which
allows constant folding and CSE for comparisons. Conditional moves
are introduced for this.
- simplify zero/sign-extension ops.
Updates #15365.
Change-Id: I55e262a776a7ef2a1505d75e04d1208913c35d39
Reviewed-on: https://go-review.googlesource.com/24512
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Mostly constant folding rules, analogous to AMD64 ones. Along with
some simplifications.
Updates #15365.
Change-Id: If83bc1188bb05acb982ef3a1c21704c187e3eb24
Reviewed-on: https://go-review.googlesource.com/24210
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Encode the size and the alignment into AuxInt of Zero and Move ops.
On AMD64, we simply don't look at the alignment. On ARM and PPC64, we
only generate aligned stores.
Updates #15365.
Change-Id: Ifdcc205c364f67c4516b9adebfe7d50d223b6863
Reviewed-on: https://go-review.googlesource.com/24511
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Introduce an op MOVWaddr for addresses on ARM, instead of overuse
ADDconst.
Mark MOVWaddr as rematerializable. This fixes a liveness problem: if
it were not rematerializable, the address of a variable may be spilled
and later use of the address may just load the spilled value without
mentioning the variable, and the liveness code may think it is dead
prematurely.
Update #15365.
Change-Id: Ib0b0fa826bdb75c9e6bb362b95c6cf132cc6b1c0
Reviewed-on: https://go-review.googlesource.com/23942
Reviewed-by: David Chase <drchase@google.com>
- 64x signed right shift was wrong for shift larger than 0x80000000.
- for Lsh-followed-by-Rsh, the intermediate value should be full int
width, so when it is spilled MOVW should be used.
- use RET for RetJmp, so the assembler can take case of restoring LR
for non-leaf case.
- reserve R9 in dynlink mode. R9 is used for GOT by the assembler.
Progress on SSA backend for ARM. Still not complete.
Updates #15365.
Change-Id: I3caca256b92ff7cf96469da2feaf4868a592efc5
Reviewed-on: https://go-review.googlesource.com/23793
Reviewed-by: David Chase <drchase@google.com>
Machine supports (or the runtime simulates in soft float mode)
(u)int32<->float conversions. The frontend rewrites int64<->float
conversions to call to runtime function.
For int64->float32 conversion, the frontend generates
. . AS u(100) l(10) tc(1)
. . . NAME-main.~r1 u(1) a(true) g(1) l(9) x(8+0) class(PPARAMOUT) f(1) float32
. . . CALLFUNC u(100) l(10) tc(1) float32
. . . . NAME-runtime.int64tofloat64 u(1) a(true) x(0+0) class(PFUNC) tc(1) used(true) FUNC-func(int64) float64
The CALLFUNC node has type float32, whereas runtime.int64tofloat64
returns float64. The legacy backend implicitly makes a float64->float32
conversion. The SSA backend does not do implicit conversion, so we
insert an explicit CONV here.
All cmd/compile/internal/gc/testdata/*_ssa.go tests passed.
Progress on SSA for ARM. Still not complete.
Update #15365.
Change-Id: I30937c8ff977271246b068f48224693776804339
Reviewed-on: https://go-review.googlesource.com/23652
Reviewed-by: Keith Randall <khr@golang.org>
This CL adds support of Div, Mod, Convert, GetClosurePtr and 64-bit indexing
support to SSA backend for ARM.
Add tests for 64-bit indexing to cmd/compile/internal/gc/testdata/string_ssa.go.
Tests cmd/compile/internal/gc/testdata/*_ssa.go passed, except compound_ssa.go
and fp_ssa.go.
Progress on SSA for ARM. Still not complete. Essentially the only unsupported
part is floating point.
Updates #15365.
Change-Id: I269e88b67f641c25e7a813d910c96d356d236bff
Reviewed-on: https://go-review.googlesource.com/23542
Reviewed-by: David Chase <drchase@google.com>
Also fix a mistake in previous CL about x8 and x16 shifts:
the shift needs ZeroExt.
Progress on SSA for ARM. Still not complete.
Updates #15365.
Change-Id: Ibc352760023d38bc6b9c5251e929fe26e016637a
Reviewed-on: https://go-review.googlesource.com/23486
Reviewed-by: David Chase <drchase@google.com>
Introduce dec64 rules to (generically) decompose 64-bit integer on
32-bit architectures. 64-bit integer is composed/decomposed with
Int64Make/Hi/Lo ops, as for complex types.
The idea of dealing with Add64 is the following:
(Add64 (Int64Make xh xl) (Int64Make yh yl))
->
(Int64Make
(Add32withcarry xh yh (Select0 (Add32carry xl yl)))
(Select1 (Add32carry xl yl)))
where Add32carry returns a tuple (flags,uint32). Select0 and Select1
read the first and the second component of the tuple, respectively.
The two Add32carry will be CSE'd.
Similarly for multiplication, Mul32uhilo returns a tuple (hi, lo).
Also add support of KeepAlive, to fix build after merge.
Tests addressed_ssa.go, array_ssa.go, break_ssa.go, chan_ssa.go,
cmp_ssa.go, ctl_ssa.go, map_ssa.go, and string_ssa.go in
cmd/compile/internal/gc/testdata passed.
Progress on SSA for ARM. Still not complete.
Updates #15365.
Change-Id: I7867c76785a456312de5d8398a6b3f7ca5a4f7ec
Reviewed-on: https://go-review.googlesource.com/23213
Reviewed-by: Keith Randall <khr@golang.org>
Generate load/stores for small zeroing/move, DUFFZERO/DUFFCOPY for
medium zeroing/move, and loops for large zeroing/move.
cmd/compile/internal/gc/testdata/{copy_ssa.go,zero_ssa.go} tests
passed.
Progress on SSA backend for ARM. Still not complete. A few packages
in the standard library compile and tests passed, including
container/list, hash/crc32, unicode/utf8, etc.
Updates #15365.
Change-Id: Ieb4b68b44ee7de66bf7b68f5f33a605349fcc6fa
Reviewed-on: https://go-review.googlesource.com/23097
Reviewed-by: Keith Randall <khr@golang.org>
Implement shifts and multiplications for up to 32-bit values.
Also handle Exit block.
Progress on SSA backend for ARM. Still not complete.
container/heap, crypto/subtle, hash/adler32 packages compile and
tests passed.
Updates #15365.
Change-Id: I6bee4d5b0051e51d5de97e8a1938c4b87a36cbf8
Reviewed-on: https://go-review.googlesource.com/23096
Reviewed-by: Keith Randall <khr@golang.org>
Fix hardcoded flag register mask in ssa/flagalloc.go by auto-generating
the mask.
Also fix a mistake (in previous CL) about conditional branches.
Progress on SSA backend for ARM. Still not complete. Now "container/ring"
package compiles and tests passed.
Updates #15365.
Change-Id: Id7c8805c30dbb8107baedb485ed0f71f59ed6ea8
Reviewed-on: https://go-review.googlesource.com/23093
Reviewed-by: Keith Randall <khr@golang.org>
Start working on arm port. Gets close to correct
code for fibonacci:
func fib(n int) int {
if n < 2 {
return n
}
return fib(n-1) + fib(n-2)
}
Still a lot to do, but this is a good starting point.
Cleaned up some arch-specific dependencies in regalloc.
Change-Id: I4301c6c31a8402168e50dcfee8bcf7aee73ea9d5
Reviewed-on: https://go-review.googlesource.com/21000
Reviewed-by: David Chase <drchase@google.com>