Commit Graph

372 Commits

Author SHA1 Message Date
Keith Randall 8dc04cbedc cmd/compile: enforce 32-bit restrictions on ops
Most 64-bit x86 ops can only take a signed 32-bit constant.
Clean up our rewrite rules to enforce this restriction.

Modify the assembler to fail if the offset does not fit
in the instruction.

That last check triggers a few times on weird testing code.
Suppress those errors if the compiler itself generated errors.

Fixes #14862

Change-Id: I76559af035b38483b1e59621a8029fc66b3a5d1e
Reviewed-on: https://go-review.googlesource.com/20815
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-03-20 00:12:47 +00:00
Keith Randall 3dd4f74e06 cmd/compile: merge shifts into LEAs
Change-Id: I5a43c354f36184ae64a52268023c3222da3026d8
Reviewed-on: https://go-review.googlesource.com/20880
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-18 23:08:24 +00:00
Todd Neal b2dc1f82a5 cmd/compile: perform minimal phi elimination during critical
Phi splitting sometimes leads to a phi with only a single predecessor.
This must be replaced with a copy to maintain a valid SSA form.

Fixes #14857

Change-Id: I5ab2423fb6c85a061928e3206b02185ea8c79cd7
Reviewed-on: https://go-review.googlesource.com/20826
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-18 15:35:49 +00:00
David Chase 815c9a7f28 cmd/compile: use loop information in regalloc
This seems to help the problem reported in #14606; this
change seems to produce about a 4% improvement (mostly
for the 128-8192 shards).

Fixes #14789.

Change-Id: I1bd52c82d4ca81d9d5e9ab371fdfc860d7e8af50
Reviewed-on: https://go-review.googlesource.com/20660
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-18 01:23:29 +00:00
David Chase 09a9ce60c7 cmd/compile: get gcflags to bootstrap; ssa debug opts for "all"
This is intended to help debug compiler problems that pop
up in the bootstrap phase of make.bash.  GO_GCFLAGS does not
normally apply there.  Options-for-all phases is intended
to allow crude tracing (and full timing) by turning on timing
for all phases, not just one.

Phase names can also be specified using a regular expression,
for example
BOOT_GO_GCFLAGS=-d='ssa/~^.*scc$/off' \
GO_GCFLAGS='-d=ssa/~^.*scc$/off' ./make.bash

I just added this because it was the fastest way to get
me to a place where I could easily debug the compiler.

Change-Id: I0781f3e7c19651ae7452fa25c2d54c9a245ef62d
Reviewed-on: https://go-review.googlesource.com/20775
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-18 01:04:47 +00:00
David Chase 3a17fdaba0 cmd/compile: correct maintain use count when phi args merge
The critical phase did not correctly maintain the use count
when two predecessors of a new critical block transmit the
same value.

Change-Id: Iba802c98ebb84e36a410721ec32c867140efb6d4
Reviewed-on: https://go-review.googlesource.com/20822
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-17 20:55:13 +00:00
David Chase 2d56dee61b cmd/compile: escape analysis explanations added to -m -m output
This should probably be considered "experimental" at this stage, but
what it needs is feedback from adventurous adopters.  I think the data
structure used for describing escape reasons might be extendable to
allow a cleanup of the underlying algorithms, which suffers from
insufficiently separated concerns (the graph does not deal well with
escape level adjustments, so it is augmented by a second custom-walk
portion of the "flood" phase. It would be better to put it all,
including level adjustments, in a single graph structure, and then
simply flood the graph.

Tweaked to avoid allocations in the no-logging case.

Modified run.go to ignore lines with leading "#" in the output (since
it can never match a line), and in -update_errors to ignore leading
tabs in output lines and to normalize embedded filenames.

Currently requires -m -m because otherwise the noise/update
burden for the other escape tests is considerable.

There is a partial test.  Existing escape analysis tests seem to
cover all except the panic case and what looks like it might be
unreachable code in escape analysis.

Fixes #10526.

Change-Id: I2524fdec54facae48b00b2548e25d9e46fcaf832
Reviewed-on: https://go-review.googlesource.com/18041
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-17 13:29:48 +00:00
Todd Neal 50bc546d43 cmd/compile: reuse blocks in critical pass
If a phi has duplicate arguments, then the new block that is constructed
to remove the critical edge can be used for all of the duplicate
arguments.

read-only data = -904 bytes (-0.058308%)
global text (code) = -2240 bytes (-0.060056%)
Total difference -3144 bytes (-0.056218%)

Change-Id: Iee3762744d6a8c9d26cdfa880bb23feb62b03c9c
Reviewed-on: https://go-review.googlesource.com/20746
Run-TryBot: Todd Neal <todd@tneal.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-17 12:12:34 +00:00
Keith Randall 56e0ecc5ea cmd/compile: keep value use counts in SSA
Keep track of how many uses each Value has.  Each appearance in
Value.Args and in Block.Control counts once.

The number of uses of a value is generically useful to
constrain rewrite rules.  For instance, we might want to
prevent merging index operations into loads if the same
index expression is used lots of times.

But I have one use in particular for which the use count is required.
We must make sure we don't combine ops with loads if the load has
more than one use.  Otherwise, we may split a single load
into multiple loads and that breaks perceived behavior in
the presence of races.  In particular, the load of m.state
in sync/mutex.go:Lock can't be done twice.  (I have a separate
CL which triggers the mutex failure.  This CL has a test which
demonstrates a similar failure.)

Change-Id: Icaafa479239f48632a069d0c3f624e6ebc6b1f0e
Reviewed-on: https://go-review.googlesource.com/20790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-17 04:20:02 +00:00
Todd Neal c8b148e7a5 cmd/compile: fold constants from lsh/rsh/lsh and rsh/lsh/rsh
Fixes #14825

Change-Id: Ib44d80579a55c15d75ea2ad1ef54efa6ca66a9a6
Reviewed-on: https://go-review.googlesource.com/20745
Run-TryBot: Todd Neal <todd@tneal.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-16 11:40:20 +00:00
Todd Neal 763afe13b9 cmd/compile: change logging of spills for regalloc to Warnl format
Change-Id: I01c000ff3f6dc6b0ed691e289eeef0fa61500337
Reviewed-on: https://go-review.googlesource.com/20744
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-16 02:50:13 +00:00
Todd Neal 8edb72587f cmd/compile: add logging to critical and phielim
Change-Id: Ieefeceea40bd29657fd519368b0920dad8443844
Reviewed-on: https://go-review.googlesource.com/20712
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-15 22:53:12 +00:00
Keith Randall 5305a329d8 cmd/compile: turn off SSA internal consistency checks
They've been on for a few weeks of general use and nothing
has tripped up on them yet.

Makes the compiler ~18% faster.

Change-Id: I42d7bbc0581597f9cf4fb28989847814c81b08a2
Reviewed-on: https://go-review.googlesource.com/20741
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-15 22:25:48 +00:00
Matthew Dempsky 1b9f168f73 cmd/compile: use int for field index
All of a struct's fields have to fit into memory anyway, so index them
with int instead of int64.  This also makes it nicer for
cmd/compile/internal/gc to reuse the same NumFields function.

Change-Id: I210be804a0c33370ec9977414918c02c675b0fbe
Reviewed-on: https://go-review.googlesource.com/20691
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2016-03-15 19:56:43 +00:00
Keith Randall 41f9f6f471 cmd/compile: fix load-combining
Make sure symbol gets carried along by load-combining rule.
Add the new load into the right block where we know that
mem is live.

Use auxInt field to carry i along instead of an explicit ADDQ.

Incorporate LEA ops into MOVBQZX and friends.

Change-Id: I587f7c6120b98fd2a0d48ddd6ddd13345d4421b4
Reviewed-on: https://go-review.googlesource.com/20732
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-15 19:18:13 +00:00
Alexandru Moșoi 13f74db304 cmd/compile: fix no-opt build after moving decomposing user functions
decompose-builtin pass requires an opt pass, but -N disables
late-opt, the only opt pass (out of two) that happens
after decompose-builtin.  This CL enables both 'opt' and 'late opt'
passes. The extra compile time for 'late opt' in negligible
since most rewrites were already done in the first 'opt'
(also measured before). We should put some effort in splitting the
generic rules into required and optional.

Also update generic.rules comments about lowering
of StringMake and SliceMake.

Tested with GO_GCFLAGS=-N ./all.bash

Change-Id: I92999681aaa02587b6dc6e32ce997a91f1fc9499
Reviewed-on: https://go-review.googlesource.com/20682
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-14 18:44:10 +00:00
Michael Pratt a4e31d42ee cmd/compile: remove amd64 code from package gc and the core gen tool
Parts of the SSA compiler in package gc contain amd64-specific code,
most notably Prog generation. Move this code into package amd64, so that
other architectures can be added more easily.

In package gc, this change is just moving code. There are no functional
changes or even any larger structural changes beyond changing function
names (mostly for export).

In the cmd/compile/internal/ssa/gen tool, more information is included
in arch to remove the AMD64-specific behavior in the main portion of the
tool. The generated opGen.go is identical.

Change-Id: I8eb37c6e6df6de1b65fa7dab6f3bc32c29daf643
Reviewed-on: https://go-review.googlesource.com/20609
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-14 16:59:03 +00:00
Todd Neal 98b88de56f cmd/compile: change the type of ssa Warnl line number
Line numbers are always int32, so the Warnl function should take the
line number as an int32 as well.  This matches gc.Warnl and removes
a cast every place it's used.

Change-Id: I5d6201e640d52ec390eb7174f8fd8c438d4efe58
Reviewed-on: https://go-review.googlesource.com/20662
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-14 11:04:40 +00:00
Alexandru Moșoi 8ec80176d4 cmd/compile: move decompose builtin closer to late opt
* Shaves about 10k from pkg/tools/linux_amd64.
* Was suggested by drchase before
* Found by looking at ssa output of #14758

Change-Id: If2c4ddf3b2603d4dfd8fb4d9199b9a3dcb05b17d
Reviewed-on: https://go-review.googlesource.com/20570
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-13 22:12:01 +00:00
Todd Neal f6ceed2cab cmd/compile: const folding for float32/64
Split the auxFloat type into 32/64 bit versions and perform checking for
exactly representable float32 values.  Perform const folding on
float32/64.  Comment out some const negation rules that the frontend
already performs.

Change-Id: Ib3f8d59fa8b30e50fe0267786cfb3c50a06169d2
Reviewed-on: https://go-review.googlesource.com/20568
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-13 13:32:41 +00:00
Alexandru Moșoi cd798dcb88 cmd/compile/internal/ssa: generalize prove to all booleans
* Refacts a bit saving and restoring parents restrictions
* Shaves ~100k from pkg/tools/linux_amd64,
but most of the savings come from the rewrite rules.
* Improves on the following artificial test case:
func f1(a4 bool, a6 bool) bool {
  return a6 || (a6 || (a6 || a4)) || (a6 || (a4 || a6 || (false || a6)))
}

Change-Id: I714000f75a37a3a6617c6e6834c75bd23674215f
Reviewed-on: https://go-review.googlesource.com/20306
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-13 12:05:41 +00:00
Keith Randall 77b527e283 cmd/compile: strength reduce *24
We use *24 a lot for pointer arithmetic when accessing slices
of slices ([][]T).  Rewrite to use an LEA and a shift.
The shift will likely be free, as it often gets folded into
an indexed load/store.

Update #14606

Change-Id: Ie0bf6dc1093876efd57e88ce5f62c26a9bf21cec
Reviewed-on: https://go-review.googlesource.com/20567
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-12 02:58:45 +00:00
Keith Randall 369f4f5de5 cmd/compile: regalloc of two address instructions
x86 has a lot of instructions that require the output to be in the same
register as one of the inputs.  When allocating the output register,
allocate the same register as the input if it is available.

Improves the performance of golang.org/x/crypto/sha3 by
10% (from 6% slower than 1.6 to 4% faster).

Fixes #14745

Change-Id: I4d81785240c9368e4dc75107b45c959d200df8e6
Reviewed-on: https://go-review.googlesource.com/20488
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-03-11 04:13:07 +00:00
Todd Neal 6b3d4a5353 cmd/compile: modify regalloc/stackalloc to use the cmd line debug args
Change the existing flags from compile time consts to be configurable
from the command line.

Change-Id: I4aab4bf3dfcbdd8e2b5a2ff51af95c2543967769
Reviewed-on: https://go-review.googlesource.com/20560
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-11 01:35:12 +00:00
Keith Randall 31d13f479a cmd/compile: don't use PPARAMOUT names for temps
The location of VARDEFs is incorrect for PPARAMOUT variables
which are also used as temporary locations.  We put in VARDEFs
when setting the variable at return time, but when the location
is also used as a temporary the lifetime values are wrong.

Fix copyelim to update the names map properly.  This is a
real name bug fix which, as a result, allows me to
write a reasonable test to trigger the PPARAMOUT bug.

This is kind of a band-aid fix for #14591.  A more pricipled
fix (which allows values to be stored in the return variable
earlier than the return point) will be harder.

Fixes #14591

Change-Id: I7df8ae103a982d1f218ed704c080d7b83cdcfdd9
Reviewed-on: https://go-review.googlesource.com/20457
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-03-11 00:56:04 +00:00
Matthew Dempsky 0b281872e6 cmd/compile: rename ssa.Type's Elem method to ElemType
I would like to add a

    func (t *Type) Elem() *Type

method to package gc, but that would collide with the existing

    func (t *Type) Elem() ssa.Type

method needed to make *gc.Type implement ssa.Type.  Because the latter
is much less widely used right now than the former will be, this CL
renames it to ElemType.

Longer term, hopefully gc and ssa will share a common Type interface,
and ElemType can go away.

Change-Id: I270008515dc4c01ef531cf715637a924659c4735
Reviewed-on: https://go-review.googlesource.com/20546
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-10 23:02:33 +00:00
Keith Randall ddc6b64444 cmd/compile: fix defer/deferreturn
Make sure we do any just-before-return cleanup on all paths out of a
function, including when recovering.  Each exit path should include
deferreturn (if there are any defers) and then the exit
code (e.g. copying heap-escaping return values back to the stack).

Introduce a Defer SSA block type which has two outgoing edges - one the
fallthrough edge (the defer was queued successfully) and one which
immediately returns (the defer had a successful recover() call and
normal execution should resume at the return point).

Fixes #14725

Change-Id: Iad035c9fd25ef8b7a74dafbd7461cf04833d981f
Reviewed-on: https://go-review.googlesource.com/20486
Reviewed-by: David Chase <drchase@google.com>
2016-03-10 22:33:49 +00:00
Todd Neal 6cb2e1d015 cmd/compile: remove values from const cache upon free
When calling freeValue for possible const values, remove them from the
cache as well.

Change-Id: I087ed592243e33c58e5db41700ab266fc70196d9
Reviewed-on: https://go-review.googlesource.com/20481
Run-TryBot: Todd Neal <tolchz@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-10 10:28:05 +00:00
Alexandru Moșoi bbd3ffbd83 cmd/compile: constant fold more of IsInBounds and IsSliceInBounds
Fixes #14721

Change-Id: Id1d5a819e5c242b91a37c4e464ed3f00c691aff5
Reviewed-on: https://go-review.googlesource.com/20482
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-09 18:05:28 +00:00
Alexandru Moșoi dfcb853d9d cmd/compile/internal/ssa: lower builtins much later
* Move lowering into a separate pass.
* SliceLen/SliceCap is now available to various intermediate passes
which use useful for bounds checking.
* Add a second opt pass to handle the new opportunities

Decreases the code size of binaries in pkg/tool/linux_amd64
by ~45K.

Updates #14564 #14606

Change-Id: I5b2bd6202181c50623a3585fbf15c0d6db6d4685
Reviewed-on: https://go-review.googlesource.com/20172
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-03-09 11:08:59 +00:00
David Chase 0321cabdfa cmd/compile: guard the &-to-<<>> opt against small constants
Converting an and-K into a pair of shifts for K that will
fit in a one-byte argument is probably not an optimization,
and it also interferes with other patterns that we want to
see fire, like (<< (AND K)) [for small K] and bounds check
elimination for masked indices.

Turns out that on Intel, even 32-bit signed immediates beat
the shift pair; the size reduction of tool binaries is 0.09%
vs 0.07% for only the 8-bit immediates.

RLH found this one working on the new/next GC.

Change-Id: I2414a8de1dd58d680d18587577fbadb7ff4f67d9
Reviewed-on: https://go-review.googlesource.com/20410
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
2016-03-08 23:06:11 +00:00
Alexandru Moșoi fb2f99d5fd cmd/compile/internal/ssa: simplify nil checks in opt.
* Simplify the nilcheck generated by
for _, e := range a {}
* No effect on the generated code because these nil checks
don't end up in the generated code.
* Useful for other analysis, e.g. it'll remove one dependecy
on the induction variable.

Change-Id: I6ee66ddfdc010ae22aea8dca48163303d93de7a9
Reviewed-on: https://go-review.googlesource.com/20307
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-08 22:17:47 +00:00
Josh Bleecher Snyder f3a29f1f81 cmd/compile: preallocate storage for three Value args
benchstat master2 arg3b
name      old time/op    new time/op    delta
Template     441ms ± 4%     444ms ± 6%    ~     (p=0.335 n=22+25)
GoTypes      1.51s ± 2%     1.51s ± 2%    ~     (p=0.129 n=25+21)
Compiler     5.59s ± 1%     5.56s ± 2%  -0.65%  (p=0.001 n=24+21)

name      old alloc/op   new alloc/op   delta
Template    85.6MB ± 0%    85.3MB ± 0%  -0.40%  (p=0.000 n=25+24)
GoTypes      307MB ± 0%     305MB ± 0%  -0.38%  (p=0.000 n=25+25)
Compiler    1.06GB ± 0%    1.05GB ± 0%  -0.43%  (p=0.000 n=25+25)

name      old allocs/op  new allocs/op  delta
Template     1.10M ± 0%     1.09M ± 0%  -1.04%  (p=0.000 n=25+25)
GoTypes      3.36M ± 0%     3.32M ± 0%  -1.13%  (p=0.000 n=25+24)
Compiler     13.0M ± 0%     12.9M ± 0%  -1.12%  (p=0.000 n=25+25)

Change-Id: I1280b846e895c00b95bb6664958a7765bd819610
Reviewed-on: https://go-review.googlesource.com/20296
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-08 18:46:49 +00:00
Josh Bleecher Snyder 39214275d6 cmd/compile: cache const nil, iface, slice, and ""
name      old time/op    new time/op    delta
Template     441ms ± 4%     446ms ± 4%  +1.23%  (p=0.048 n=22+25)
GoTypes      1.51s ± 2%     1.51s ± 2%    ~     (p=0.224 n=25+25)
Compiler     5.59s ± 1%     5.57s ± 2%  -0.38%  (p=0.019 n=24+24)

name      old alloc/op   new alloc/op   delta
Template    85.6MB ± 0%    85.6MB ± 0%  -0.11%  (p=0.000 n=25+24)
GoTypes      307MB ± 0%     305MB ± 0%  -0.45%  (p=0.000 n=25+25)
Compiler    1.06GB ± 0%    1.06GB ± 0%  -0.34%  (p=0.000 n=25+25)

name      old allocs/op  new allocs/op  delta
Template     1.10M ± 0%     1.10M ± 0%  -0.03%  (p=0.001 n=25+24)
GoTypes      3.36M ± 0%     3.35M ± 0%  -0.13%  (p=0.000 n=25+25)
Compiler     13.0M ± 0%     13.0M ± 0%  -0.12%  (p=0.000 n=25+24)

Change-Id: I7fc18acbc3b1588aececef9692e24a0bd3dba974
Reviewed-on: https://go-review.googlesource.com/20295
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-08 16:21:51 +00:00
Matthew Dempsky 0d9258a830 cmd/internal/obj: add As type for assembly opcodes
Passes toolstash/buildall.

Fixes #14692.

Change-Id: I4352678d8251309f2b8b7793674c550fac948006
Reviewed-on: https://go-review.googlesource.com/20350
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-08 04:20:09 +00:00
David Chase b1785a5065 cmd/compile: Tinkering with schedule for debug and regalloc
This adds a heap-based proper priority queue to the
scheduler which made a relatively easy to test quite a few
heuristics that "ought to work well".  For go tools
themselves (which may not be representative) the heuristic
that works best is (1) in line-number-order, then (2) from
more to fewer args, then (3) in variable ID order.  Trying
to improve this with information about use at end of
blocks turned out to be fruitless -- all of my naive
attempts at using that information turned out worse than
ignoring it.  I can confirm that the stores-early heuristic
tends to help; removing it makes the results slightly worse.

My metric is code size reduction, which I take to mean fewer
spills from register allocation.  It's not uniform.
Here's the endpoints for "vet" from one set of pretty-good
heuristics (this is representative at least).

-2208 time.parse 13472 15680 -14.081633%
-1514 runtime.pclntab 1002058 1003572 -0.150861%
-352 time.Time.AppendFormat 9952 10304 -3.416149%
-112 runtime.runGCProg 1984 2096 -5.343511%
-64 regexp/syntax.(*parser).factor 7264 7328 -0.873362%
-44 go.string.alldata 238630 238674 -0.018435%

48 math/big.(*Float).round 1376 1328 3.614458%
48 text/tabwriter.(*Writer).writeLines 1232 1184 4.054054%
48 math/big.shr 832 784 6.122449%
88 go.func.* 75174 75086 0.117199%
96 time.Date 1968 1872 5.128205%

Overall there appears to be an 0.1% decrease in text size.
No timings yet, and given the distribution of size reductions
it might make sense to wait on those.

addr2line  text (code) = -4392 bytes (-0.156273%)
api  text (code) = -5502 bytes (-0.147644%)
asm  text (code) = -5254 bytes (-0.187810%)
cgo  text (code) = -4886 bytes (-0.148846%)
compile  text (code) = -1577 bytes (-0.019346%) * changed
cover  text (code) = -5236 bytes (-0.137992%)
dist  text (code) = -5015 bytes (-0.167829%)
doc  text (code) = -5180 bytes (-0.182121%)
fix  text (code) = -5000 bytes (-0.215148%)
link  text (code) = -5092 bytes (-0.152712%)
newlink  text (code) = -5204 bytes (-0.196986%)
nm  text (code) = -4398 bytes (-0.156018%)
objdump  text (code) = -4582 bytes (-0.155046%)
pack  text (code) = -4503 bytes (-0.294287%)
pprof  text (code) = -6314 bytes (-0.085177%)
trace  text (code) = -5856 bytes (-0.097818%)
vet  text (code) = -5696 bytes (-0.117334%)
yacc  text (code) = -4971 bytes (-0.213817%)

This leaves me sorely tempted to look into a "real" scheduler
to try to do a better job, but I think it might make more
sense to look into getting loop information into the
register allocator instead.

Fixes #14577.

Change-Id: I5238b83284ce76dea1eb94084a8cd47277db6827
Reviewed-on: https://go-review.googlesource.com/20240
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-08 02:32:20 +00:00
Todd Neal 481fe59012 cmd/compile: fix load combining from a non-zero pointer offset
When the pointer offset is non-zero in the small loads, we need to add the offset
when converting to the larger load.

Fixes #14694

Change-Id: I5ba8bcb3b9ce26c7fae0c4951500b9ef0fed54cd
Reviewed-on: https://go-review.googlesource.com/20333
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-08 02:21:01 +00:00
Josh Bleecher Snyder 8969ab89b8 cmd/compile: add sizeof test for ssa types
Fix some test output while we're here.

Change-Id: I265cedc222e078eff120f268b92451e12b0400b2
Reviewed-on: https://go-review.googlesource.com/20294
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-07 22:41:22 +00:00
Josh Bleecher Snyder 6ed10382f7 cmd/compile: soup up isSamePtr
This increases the number of matches in make.bash
from 853 to 984.

Change-Id: I12697697a50ecd86d49698200144a4c80dd3e5a4
Reviewed-on: https://go-review.googlesource.com/20274
Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-07 01:30:07 +00:00
Josh Bleecher Snyder 3294014ae1 cmd/compile: collapse OffPtr sequences
This triggers an astonishing 160k times
during make.bash. The second biggest
generic rewrite triggers 100k times.

However, this is really just moving
rewrites that were happening at the
architecture level to the generic level.

Change-Id: Ife06fe5234f31433328460cb2e0741c071deda41
Reviewed-on: https://go-review.googlesource.com/20235
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-06 23:56:55 +00:00
Keith Randall 12e60452e9 cmd/compile: Combine smaller loads into a larger load
This only deals with the loads themselves.  The bounds checks
are a separate issue.  Also doesn't handle stores, those are
harder because we need to make sure intermediate memory states
aren't observed (which is hard to do with rewrite rules).

Use one byte shorter instructions for zero-extending loads.

Update #14267

Change-Id: I40af25ab5208488151ba7db32bf96081878fa7d9
Reviewed-on: https://go-review.googlesource.com/20218
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-06 22:52:22 +00:00
Matthew Dempsky 16e2e95dd9 cmd/compile/internal/ssa: chmod -x likelyadjust.go
Change-Id: I01ec7d08884a1e0fbce94ea77281c1ee4a2cfd56
Reviewed-on: https://go-review.googlesource.com/20238
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-04 22:56:02 +00:00
Keith Randall c0740fed37 cmd/compile: more ssa config flags
To turn ssa compilation on or off altogether, use
-ssa=1 or -ssa=0.  Default is on.

To turn on or off consistency checks, do
-d=ssa/check/on or -d=ssa/check/off.  Default is on for now.

Change-Id: I277e0311f538981c8b9c62e7b7382a0c8755ce4c
Reviewed-on: https://go-review.googlesource.com/20217
Reviewed-by: David Chase <drchase@google.com>
2016-03-04 19:03:12 +00:00
David du Colombier d55a099e22 cmd/compile: don't use duffcopy and duffzero on Plan 9
The ssa compiler uses the duffcopy and duffzero functions,
which rely on the MOVUPS instructions.

However, this doesn't work on Plan 9, since floating point
operations are not allowed in the note handler.

This change disables the use of duffcopy and duffzero
on Plan 9 in the ssa compiler.

Updates #14605.

Change-Id: I017f8ff83de00eabaf7e146b4344a863db1dfddc
Reviewed-on: https://go-review.googlesource.com/20171
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David du Colombier <0intro@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-03 21:36:11 +00:00
Keith Randall 686fbdb3b0 cmd/compile: make compilation deterministic, fixes toolstash
Make sure we don't depend on map iterator order.

Fixes #14600

Change-Id: Iac0e0c8689f3ace7a4dc8e2127e2fd3c8545bd29
Reviewed-on: https://go-review.googlesource.com/20158
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-03 18:03:45 +00:00
Keith Randall c03ed491fe cmd/compile: load some live values into registers before loop
If we're about to enter a loop, load values which are live
and will soon be used in the loop into registers.

name                     old time/op    new time/op    delta
BinaryTree17-8              2.80s ± 4%     2.62s ± 2%   -6.43%          (p=0.008 n=5+5)
Fannkuch11-8                2.45s ± 2%     2.14s ± 1%  -12.43%          (p=0.008 n=5+5)
FmtFprintfEmpty-8          49.0ns ± 1%    48.4ns ± 1%   -1.35%          (p=0.032 n=5+5)
FmtFprintfString-8          160ns ± 1%     153ns ± 0%   -4.63%          (p=0.008 n=5+5)
FmtFprintfInt-8             152ns ± 0%     150ns ± 0%   -1.57%          (p=0.000 n=5+4)
FmtFprintfIntInt-8          252ns ± 2%     244ns ± 1%   -3.02%          (p=0.008 n=5+5)
FmtFprintfPrefixedInt-8     223ns ± 0%     223ns ± 0%     ~     (all samples are equal)
FmtFprintfFloat-8           293ns ± 2%     291ns ± 2%     ~             (p=0.389 n=5+5)
FmtManyArgs-8               956ns ± 0%     936ns ± 0%   -2.05%          (p=0.008 n=5+5)
GobDecode-8                7.18ms ± 0%    7.11ms ± 0%   -1.02%          (p=0.008 n=5+5)
GobEncode-8                6.12ms ± 3%    6.07ms ± 1%     ~             (p=0.690 n=5+5)
Gzip-8                      284ms ± 1%     284ms ± 0%     ~             (p=1.000 n=5+5)
Gunzip-8                   40.8ms ± 1%    40.6ms ± 1%     ~             (p=0.310 n=5+5)
HTTPClientServer-8         69.8µs ± 1%    72.2µs ± 4%     ~             (p=0.056 n=5+5)
JSONEncode-8               16.1ms ± 2%    16.2ms ± 1%     ~             (p=0.151 n=5+5)
JSONDecode-8               54.9ms ± 0%    57.0ms ± 1%   +3.79%          (p=0.008 n=5+5)
Mandelbrot200-8            4.35ms ± 0%    4.39ms ± 0%   +0.85%          (p=0.008 n=5+5)
GoParse-8                  3.56ms ± 1%    3.42ms ± 1%   -4.03%          (p=0.008 n=5+5)
RegexpMatchEasy0_32-8      75.6ns ± 1%    75.0ns ± 0%   -0.83%          (p=0.016 n=5+4)
RegexpMatchEasy0_1K-8       250ns ± 0%     252ns ± 1%   +0.80%          (p=0.016 n=4+5)
RegexpMatchEasy1_32-8      75.0ns ± 0%    75.4ns ± 2%     ~             (p=0.206 n=5+5)
RegexpMatchEasy1_1K-8       401ns ± 0%     398ns ± 1%     ~             (p=0.056 n=5+5)
RegexpMatchMedium_32-8      119ns ± 0%     118ns ± 0%   -0.84%          (p=0.008 n=5+5)
RegexpMatchMedium_1K-8     36.6µs ± 0%    36.9µs ± 0%   +0.91%          (p=0.008 n=5+5)
RegexpMatchHard_32-8       1.95µs ± 1%    1.92µs ± 0%   -1.23%          (p=0.032 n=5+5)
RegexpMatchHard_1K-8       58.3µs ± 1%    58.1µs ± 1%     ~             (p=0.548 n=5+5)
Revcomp-8                   425ms ± 1%     389ms ± 1%   -8.39%          (p=0.008 n=5+5)
Template-8                 65.5ms ± 1%    63.6ms ± 1%   -2.86%          (p=0.008 n=5+5)
TimeParse-8                 363ns ± 0%     354ns ± 1%   -2.59%          (p=0.008 n=5+5)
TimeFormat-8                363ns ± 0%     364ns ± 1%     ~             (p=0.159 n=5+5)

Fixes #14511

Change-Id: I1b79d2545271fa90d5b04712cc25573bdc94f2ce
Reviewed-on: https://go-review.googlesource.com/20151
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-03-03 17:39:43 +00:00
David Chase 97b2295f06 cmd/compile: trunc(and(x,K)) rewrite to trunc(x) for some K
uint8(s.b & 0xff) ought to produce same code as uint8(s.b)
but it did not.  RLH found this one looking for moles to
whack in the GC code.

Change-Id: I883d68ec7a5746d652712be84a274a11256b3b33
Reviewed-on: https://go-review.googlesource.com/20141
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-02 21:50:55 +00:00
Keith Randall 62ac107a34 cmd/compile: some SSA cleanup
Do some easy TODOs.
Move a bunch of other TODOs into bugs.

Change-Id: Iaba9dad6221a2af11b3cbcc512875f4a85842873
Reviewed-on: https://go-review.googlesource.com/20114
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-02 03:17:46 +00:00
Brad Fitzpatrick 5fea2ccc77 all: single space after period.
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.

This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:

$ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
$ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02 00:13:47 +00:00
David Chase 6b3462c784 [dev.ssa] cmd/compile: adjust branch likeliness for calls/loops
Static branch predictions (which guide block ordering) are
adjusted based on:

loop/not-loop     (favor looping)
abnormal-exit/not (avoid panic)
call/not-call     (avoid call)
ret/default       (treat returns as rare)

This appears to make no difference in performance of real
code, meaning the compiler itself.  The earlier version of
this has been stripped down to help make the cost of this
only-aesthetic-on-Intel phase be as cheap as possible (we
probably want information about inner loops for improving
register allocation, but because register allocation follows
close behind this pass, conceivably the information could be
reused -- so we might do this anyway just to normalize
output).

For a ./make.bash that takes 200 user seconds, about .75
second is reported in likelyadjust (summing nanoseconds
reported with -d=ssa/likelyadjust/time ).

Upstream predictions are respected.
Includes test, limited to build on amd64 only.
Did several iterations on the debugging output to allow
some rough checks on behavior.
Debug=1 logging notes agree/disagree with earlier passes,
allowing analysis like the following:

Run on make.bash:
GO_GCFLAGS=-d=ssa/likelyadjust/debug \
   ./make.bash >& lkly5.log

grep 'ranch prediction' lkly5.log | wc -l
   78242 // 78k predictions

grep 'ranch predi' lkly5.log | egrep -v 'agrees with' | wc -l
   29633 // 29k NEW predictions

grep 'disagrees' lkly5.log | wc -l
     444 // contradicted 444 times

grep '< exit' lkly5.log | wc -l
   10212 // 10k exit predictions

grep '< exit' lkly5.log | egrep 'disagrees' | wc -l
       5 // 5 contradicted by previous prediction

grep '< exit' lkly5.log | egrep -v 'agrees' | wc -l
     702 // 702-5 redundant with previous prediction

grep '< call' lkly5.log | egrep -v 'agrees' | wc -l
   16699 // 16k new call predictions

grep 'stay in loop' lkly5.log | egrep -v 'agrees' | wc -l
    3951 // 4k new "remain in loop" predictions

Fixes #11451.

Change-Id: Iafb0504f7030d304ef4b6dc1aba9a5789151a593
Reviewed-on: https://go-review.googlesource.com/19995
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-01 20:09:41 +00:00