Commit Graph

23 Commits

Author SHA1 Message Date
Keith Randall deb4177cf0 cmd/compile: use masks instead of branches for slicing
When we do

  var x []byte = ...
  y := x[i:]

We can't just use y.ptr = x.ptr + i, as the new pointer may point to the
next object in memory after the backing array.
We used to fix this by doing:

  y.cap = x.cap - i
  delta := i
  if y.cap == 0 {
    delta = 0
  }
  y.ptr = x.ptr + delta

That generates a branch in what is otherwise straight-line code.

Better to do:

  y.cap = x.cap - i
  mask := (y.cap - 1) >> 63 // -1 if y.cap==0, 0 otherwise
  y.ptr = x.ptr + i &^ mask

It's about the same number of instructions (~4, depending on what
parts are constant, and the target architecture), but it is all
inline. It plays nicely with CSE, and the mask can be computed
in parallel with the index (in cases where a multiply is required).

It is a minor win in both speed and space.

Change-Id: Ied60465a0b8abb683c02208402e5bb7ac0e8370f
Reviewed-on: https://go-review.googlesource.com/32022
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-10-27 20:22:49 +00:00
Cherry Zhang 38cd79889e cmd/compile: simplify div/mod on ARM
On ARM, DIV, DIVU, MOD, MODU are pseudo instructions that makes
runtime calls _div/_udiv/_mod/_umod, which themselves are wrappers
of udiv. The udiv function does the real thing.

Instead of generating these pseudo instructions, call to udiv
directly. This removes one layer of wrappers (which has an awkward
way of passing argument), and also allows combining DIV and MOD
if both results are needed.

Change-Id: I118afc3986db3a1daabb5c1e6e57430888c91817
Reviewed-on: https://go-review.googlesource.com/29390
Reviewed-by: David Chase <drchase@google.com>
2016-09-20 13:40:48 +00:00
Cherry Zhang 8ff4260777 cmd/compile: intrinsify Ctz, Bswap on ARM
Atomic ops on ARM are implemented with kernel calls, so they are
not intrinsified.

Change-Id: I0e7cc2e5526ae1a3d24b4b89be1bd13db071f8ef
Reviewed-on: https://go-review.googlesource.com/28977
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-09-12 19:26:31 +00:00
Cherry Zhang 114c05962c [dev.ssa] cmd/compile: fix possible invalid pointer spill in large Zero/Move on ARM
Instead of comparing the address of the end of the memory to zero/copy,
comparing the address of the last element, which is a valid pointer.
Also unify large and unaligned Zero/Move, by passing alignment as AuxInt.

Fixes #16515 for ARM.

Change-Id: I19a62b31c5acf5c55c16a89bea1039c926dc91e5
Reviewed-on: https://go-review.googlesource.com/25300
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-07-27 18:00:19 +00:00
Cherry Zhang d8181d5d75 [dev.ssa] cmd/compile: simplify MOVWreg on ARM
For register-register move, if there is only one use, allocate it in
the same register so we don't need to emit an instruction.

Updates #15365.

Change-Id: Iad41843854a506c521d577ad93fcbe73e8de8065
Reviewed-on: https://go-review.googlesource.com/25059
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-07-21 16:46:58 +00:00
Cherry Zhang 7b9873b9b9 [dev.ssa] cmd/internal/obj, etc.: add and use NEGF, NEGD instructions on ARM
Updates #15365.

Change-Id: I372a5617c2c7d91de545cac0464809b96711b63a
Reviewed-on: https://go-review.googlesource.com/24646
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2016-07-20 18:15:37 +00:00
Keith Randall 25e0a367da [dev.ssa] cmd/compile: clean up tuple types and selects
Make tuple types and their SelectX ops fully generic.
These ops no longer need to be lowered.
Regalloc understands them and their tuple-generating arguments.
We can now have opcodes returning arbitrary pairs of results.
(And it would be easy to move to >2 results if needed.)

Update arm implementation to the new standard.
Implement just enough in 386 port to do 64-bit add.

Change-Id: I370ed5aacce219c82e1954c61d1f63af76c16f79
Reviewed-on: https://go-review.googlesource.com/24976
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-07-18 16:11:36 +00:00
Cherry Zhang 6b6de15d32 [dev.ssa] cmd/compile: support NaCl in SSA for ARM
NaCl code runs in sandbox and there are restrictions for its
instruction uses
(https://developer.chrome.com/native-client/reference/sandbox_internals/arm-32-bit-sandbox).

Like the legacy backend, on NaCl,
- don't use R9, which is used as NaCl's "thread pointer".
- don't use Duff's device.
- don't use indexed load/stores.
- the assembler rewrites DIV/MOD to runtime calls, which on NaCl
  clobbers R12, so R12 is marked as clobbered for DIV/MOD.
- other restrictions are satisfied by the assembler.

Enable SSA specific tests on nacl/arm, and disable non-SSA ones.

Updates #15365.

Change-Id: I9262693ec6756b89ca29d3ae4e52a96fe5403b02
Reviewed-on: https://go-review.googlesource.com/24859
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-07-16 03:13:45 +00:00
Cherry Zhang 7d70f84f54 [dev.ssa] cmd/compile: add floating point optimizations in SSA for ARM
Add some simplification rules for floating point ops.

cmd/internal/obj/arm supports instructions that compare FP register
to 0, but runtime softfloat simulator does not. This CL adds these
instructions to softfloat simulator as well.

Updates #15365.

Change-Id: I29405b2bfcb4c8cf106cb7a1a811409fec91b170
Reviewed-on: https://go-review.googlesource.com/24790
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-07-16 03:13:22 +00:00
Cherry Zhang 8cc3f4a17e [dev.ssa] cmd/compile: use shifted and indexed ops in SSA for ARM
This CL implements the following optimizations for ARM:
- use shifted ops (e.g. ADD R1<<2, R2) and indexed load/stores
- break up shift ops. Shifts used to be one SSA op that generates
  multiple instructions. We break them up to multiple ops, which
  allows constant folding and CSE for comparisons. Conditional moves
  are introduced for this.
- simplify zero/sign-extension ops.

Updates #15365.

Change-Id: I55e262a776a7ef2a1505d75e04d1208913c35d39
Reviewed-on: https://go-review.googlesource.com/24512
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-07-15 18:19:59 +00:00
Cherry Zhang 8599fdd9b6 [dev.ssa] cmd/compile: add some ARM optimization rewriting rules
Mostly constant folding rules, analogous to AMD64 ones. Along with
some simplifications.

Updates #15365.

Change-Id: If83bc1188bb05acb982ef3a1c21704c187e3eb24
Reviewed-on: https://go-review.googlesource.com/24210
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-07-06 15:55:29 +00:00
Cherry Zhang f55317828b [dev.ssa] cmd/compile: ensure alignment for Zero and Move in SSA for ARM
Encode the size and the alignment into AuxInt of Zero and Move ops.
On AMD64, we simply don't look at the alignment. On ARM and PPC64, we
only generate aligned stores.

Updates #15365.

Change-Id: Ifdcc205c364f67c4516b9adebfe7d50d223b6863
Reviewed-on: https://go-review.googlesource.com/24511
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-07-02 22:22:12 +00:00
Cherry Zhang c40dcff2f2 [dev.ssa] cmd/compile: use MOVWaddr for address on ARM
Introduce an op MOVWaddr for addresses on ARM, instead of overuse
ADDconst.

Mark MOVWaddr as rematerializable. This fixes a liveness problem: if
it were not rematerializable, the address of a variable may be spilled
and later use of the address may just load the spilled value without
mentioning the variable, and the liveness code may think it is dead
prematurely.

Update #15365.

Change-Id: Ib0b0fa826bdb75c9e6bb362b95c6cf132cc6b1c0
Reviewed-on: https://go-review.googlesource.com/23942
Reviewed-by: David Chase <drchase@google.com>
2016-06-13 12:55:51 +00:00
Cherry Zhang fa54bf16e0 [dev.ssa] cmd/compile: fix a few bugs for SSA for ARM
- 64x signed right shift was wrong for shift larger than 0x80000000.
- for Lsh-followed-by-Rsh, the intermediate value should be full int
  width, so when it is spilled MOVW should be used.
- use RET for RetJmp, so the assembler can take case of restoring LR
  for non-leaf case.
- reserve R9 in dynlink mode. R9 is used for GOT by the assembler.

Progress on SSA backend for ARM. Still not complete.

Updates #15365.

Change-Id: I3caca256b92ff7cf96469da2feaf4868a592efc5
Reviewed-on: https://go-review.googlesource.com/23793
Reviewed-by: David Chase <drchase@google.com>
2016-06-08 20:37:31 +00:00
Cherry Zhang 59e11d7827 [dev.ssa] cmd/compile: handle floating point on ARM
Machine supports (or the runtime simulates in soft float mode)
(u)int32<->float conversions. The frontend rewrites int64<->float
conversions to call to runtime function.

For int64->float32 conversion, the frontend generates

.   .   AS u(100) l(10) tc(1)
.   .   .   NAME-main.~r1 u(1) a(true) g(1) l(9) x(8+0) class(PPARAMOUT) f(1) float32
.   .   .   CALLFUNC u(100) l(10) tc(1) float32
.   .   .   .   NAME-runtime.int64tofloat64 u(1) a(true) x(0+0) class(PFUNC) tc(1) used(true) FUNC-func(int64) float64

The CALLFUNC node has type float32, whereas runtime.int64tofloat64
returns float64. The legacy backend implicitly makes a float64->float32
conversion. The SSA backend does not do implicit conversion, so we
insert an explicit CONV here.

All cmd/compile/internal/gc/testdata/*_ssa.go tests passed.

Progress on SSA for ARM. Still not complete.

Update #15365.

Change-Id: I30937c8ff977271246b068f48224693776804339
Reviewed-on: https://go-review.googlesource.com/23652
Reviewed-by: Keith Randall <khr@golang.org>
2016-06-06 14:06:38 +00:00
Cherry Zhang e78d90beeb [dev.ssa] cmd/compile: handle Div, Convert, GetClosurePtr etc. on ARM
This CL adds support of Div, Mod, Convert, GetClosurePtr and 64-bit indexing
support to SSA backend for ARM.

Add tests for 64-bit indexing to cmd/compile/internal/gc/testdata/string_ssa.go.

Tests cmd/compile/internal/gc/testdata/*_ssa.go passed, except compound_ssa.go
and fp_ssa.go.

Progress on SSA for ARM. Still not complete. Essentially the only unsupported
part is floating point.

Updates #15365.

Change-Id: I269e88b67f641c25e7a813d910c96d356d236bff
Reviewed-on: https://go-review.googlesource.com/23542
Reviewed-by: David Chase <drchase@google.com>
2016-06-05 03:56:42 +00:00
Cherry Zhang 4636d02244 [dev.ssa] cmd/compile: handle 64-bit shifts on ARM
Also fix a mistake in previous CL about x8 and x16 shifts:
the shift needs ZeroExt.

Progress on SSA for ARM. Still not complete.

Updates #15365.

Change-Id: Ibc352760023d38bc6b9c5251e929fe26e016637a
Reviewed-on: https://go-review.googlesource.com/23486
Reviewed-by: David Chase <drchase@google.com>
2016-06-02 13:03:59 +00:00
Cherry Zhang 8756d9253f [dev.ssa] cmd/compile: decompose 64-bit integer on ARM
Introduce dec64 rules to (generically) decompose 64-bit integer on
32-bit architectures. 64-bit integer is composed/decomposed with
Int64Make/Hi/Lo ops, as for complex types.

The idea of dealing with Add64 is the following:

(Add64 (Int64Make xh xl) (Int64Make yh yl))
->
(Int64Make
	(Add32withcarry xh yh (Select0 (Add32carry xl yl)))
	(Select1 (Add32carry xl yl)))

where Add32carry returns a tuple (flags,uint32). Select0 and Select1
read the first and the second component of the tuple, respectively.
The two Add32carry will be CSE'd.

Similarly for multiplication, Mul32uhilo returns a tuple (hi, lo).

Also add support of KeepAlive, to fix build after merge.

Tests addressed_ssa.go, array_ssa.go, break_ssa.go, chan_ssa.go,
cmp_ssa.go, ctl_ssa.go, map_ssa.go, and string_ssa.go in
cmd/compile/internal/gc/testdata passed.

Progress on SSA for ARM. Still not complete.

Updates #15365.

Change-Id: I7867c76785a456312de5d8398a6b3f7ca5a4f7ec
Reviewed-on: https://go-review.googlesource.com/23213
Reviewed-by: Keith Randall <khr@golang.org>
2016-06-02 13:01:09 +00:00
Cherry Zhang 8357ec37ae [dev.ssa] cmd/compile: implement Zero, Move, Copy for SSA on ARM
Generate load/stores for small zeroing/move, DUFFZERO/DUFFCOPY for
medium zeroing/move, and loops for large zeroing/move.

cmd/compile/internal/gc/testdata/{copy_ssa.go,zero_ssa.go} tests
passed.

Progress on SSA backend for ARM. Still not complete. A few packages
in the standard library compile and tests passed, including
container/list, hash/crc32, unicode/utf8, etc.

Updates #15365.

Change-Id: Ieb4b68b44ee7de66bf7b68f5f33a605349fcc6fa
Reviewed-on: https://go-review.googlesource.com/23097
Reviewed-by: Keith Randall <khr@golang.org>
2016-05-19 02:55:35 +00:00
Cherry Zhang 8f72690711 [dev.ssa] cmd/compile: implement shifts & multiplications for SSA on ARM
Implement shifts and multiplications for up to 32-bit values.

Also handle Exit block.

Progress on SSA backend for ARM. Still not complete.
container/heap, crypto/subtle, hash/adler32 packages compile and
tests passed.

Updates #15365.

Change-Id: I6bee4d5b0051e51d5de97e8a1938c4b87a36cbf8
Reviewed-on: https://go-review.googlesource.com/23096
Reviewed-by: Keith Randall <khr@golang.org>
2016-05-19 02:49:09 +00:00
Cherry Zhang ccaed50c7b [dev.ssa] cmd/compile: handle boolean values for SSA on ARM
Fix hardcoded flag register mask in ssa/flagalloc.go by auto-generating
the mask.

Also fix a mistake (in previous CL) about conditional branches.

Progress on SSA backend for ARM. Still not complete. Now "container/ring"
package compiles and tests passed.

Updates #15365.

Change-Id: Id7c8805c30dbb8107baedb485ed0f71f59ed6ea8
Reviewed-on: https://go-review.googlesource.com/23093
Reviewed-by: Keith Randall <khr@golang.org>
2016-05-19 02:48:36 +00:00
Cherry Zhang e2848de9ef [dev.ssa] cmd/compile: implement the following for SSA on ARM
- generic Ops: Phi, CALL variants, NilCheck
- generic Blocks: Plain, Check
- 32-bit arithmetics
- CMP and conditional branches
- load/store
- zero/sign-extensions (8 to 16, 8 to 32, 16 to 32)

Progress on SSA backend for ARM. Still not complete. Now "errors"
package compiles and tests passed.

Updates #15365.

Change-Id: If126fd17f8695cbf55d64085bb3f1a4a53205701
Reviewed-on: https://go-review.googlesource.com/22856
Reviewed-by: Keith Randall <khr@golang.org>
2016-05-10 19:38:11 +00:00
Keith Randall 4c9a470d46 cmd/compile: start on ARM port
Start working on arm port.  Gets close to correct
code for fibonacci:
    func fib(n int) int {
        if n < 2 {
            return n
        }
        return fib(n-1) + fib(n-2)
    }

Still a lot to do, but this is a good starting point.

Cleaned up some arch-specific dependencies in regalloc.

Change-Id: I4301c6c31a8402168e50dcfee8bcf7aee73ea9d5
Reviewed-on: https://go-review.googlesource.com/21000
Reviewed-by: David Chase <drchase@google.com>
2016-03-23 17:46:05 +00:00