Commit Graph

3282 Commits

Author SHA1 Message Date
Matthew Dempsky 3aa53b3135 runtime: eliminate runtime.hselect
Now the registration phase looks like:

    var cases [4]runtime.scases
    var order [8]uint16
    selectsend(&cases[0], c1, &v1)
    selectrecv(&cases[1], c2, &v2, nil)
    selectrecv(&cases[2], c3, &v3, &ok)
    selectdefault(&cases[3])
    chosen := selectgo(&cases[0], &order[0], 4)

Primarily, this is just preparation for having the compiler open-code
selectsend, selectrecv, and selectdefault.

As a minor benefit, order can now be layed out separately on the stack
in the pointer-free segment, so it won't take up space in the
function's stack pointer maps.

Change-Id: I5552ba594201efd31fcb40084da20b42ea569a45
Reviewed-on: https://go-review.googlesource.com/37933
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-01 03:17:31 +00:00
Martin Möhrmann 48bfc8db51 runtime,cmd/compile: adjust and correct path names in comments of map code
Some of the comments relative paths do not exist and
reflect does not define its own hmap structure.

Correct paths and consistently reference paths starting from the
go src directory.

Change-Id: I5204a3a98f77d65f17dcde98b847378cea05ad8a
Reviewed-on: https://go-review.googlesource.com/94758
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-04-30 13:42:26 +00:00
Wei Xiao bd8a88729c cmd/compile: intrinsify runtime.getcallerpc on arm64
Add a compiler intrinsic for getcallerpc on arm64 for better code generation.

Change-Id: I897e670a2b8ffa1a8c2fdc638f5b2c44bda26318
Reviewed-on: https://go-review.googlesource.com/109276
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-30 13:29:14 +00:00
Josh Bleecher Snyder 743fd9171f cmd/compile: use AuxInt to store shift boundedness
Fixes ssacheck build.

Change-Id: Idf1d2ea9a971a1f17f2fca568099e870bb5d913f
Reviewed-on: https://go-review.googlesource.com/110122
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-04-30 00:59:26 +00:00
Alberto Donizetti 8a958bb8a6 cmd/compile: better formatting for ssa phases options doc
Change the help doc of

  go tool compile -d=ssa/help

from this:

  compile: GcFlag -d=ssa/<phase>/<flag>[=<value>|<function_name>]
  <phase> is one of:
  check, all, build, intrinsics, early_phielim, early_copyelim
  early_deadcode, short_circuit, decompose_user, opt, zero_arg_cse
  opt_deadcode, generic_cse, phiopt, nilcheckelim, prove, loopbce
  decompose_builtin, softfloat, late_opt, generic_deadcode, check_bce
  fuse, dse, writebarrier, insert_resched_checks, tighten, lower
  lowered_cse, elim_unread_autos, lowered_deadcode, checkLower
  late_phielim, late_copyelim, phi_tighten, late_deadcode, critical
  likelyadjust, layout, schedule, late_nilcheck, flagalloc, regalloc
  loop_rotate, stackframe, trim
  <flag> is one of on, off, debug, mem, time, test, stats, dump
  <value> defaults to 1
  <function_name> is required for "dump", specifies name of function to dump after <phase>
  Except for dump, output is directed to standard out; dump appears in a file.
  Phase "all" supports flags "time", "mem", and "dump".
  Phases "intrinsics" supports flags "on", "off", and "debug".
  Interpretation of the "debug" value depends on the phase.
  Dump files are named <phase>__<function_name>_<seq>.dump.

To this:

  compile: PhaseOptions usage:

      go tool compile -d=ssa/<phase>/<flag>[=<value>|<function_name>]

  where:

  - <phase> is one of:
      check, all, build, intrinsics, early_phielim, early_copyelim
      early_deadcode, short_circuit, decompose_user, opt, zero_arg_cse
      opt_deadcode, generic_cse, phiopt, nilcheckelim, prove
      decompose_builtin, softfloat, late_opt, generic_deadcode, check_bce
      branchelim, fuse, dse, writebarrier, insert_resched_checks, lower
      lowered_cse, elim_unread_autos, lowered_deadcode, checkLower
      late_phielim, late_copyelim, tighten, phi_tighten, late_deadcode
      critical, likelyadjust, layout, schedule, late_nilcheck, flagalloc
      regalloc, loop_rotate, stackframe, trim

  - <flag> is one of:
      on, off, debug, mem, time, test, stats, dump

  - <value> defaults to 1

  - <function_name> is required for the "dump" flag, and specifies the
    name of function to dump after <phase>

  Phase "all" supports flags "time", "mem", and "dump".
  Phase "intrinsics" supports flags "on", "off", and "debug".

  If the "dump" flag is specified, the output is written on a file named
  <phase>__<function_name>_<seq>.dump; otherwise it is directed to stdout.

Also add a few examples at the bottom.

Fixes #20349

Change-Id: I334799e951e7b27855b3ace5d2d966c4d6ec4cff
Reviewed-on: https://go-review.googlesource.com/110062
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-04-29 16:24:03 +00:00
Josh Bleecher Snyder 9eb4590ad2 cmd/compile: simplify shifts using bounds from prove pass
The prove pass sometimes has bounds information
that later rewrite passes do not.

Use this information to mark shifts as bounded,
and then use that information to generate better code on amd64.
It may prove to be helpful on other architectures, too.

While here, coalesce the existing shift lowering rules.

This triggers 35 times building std+cmd. The full list is below.

Here's an example from runtime.heapBitsSetType:

			if nb < 8 {
				b |= uintptr(*p) << nb
				p = add1(p)
			} else {
				nb -= 8
			}

We now generate better code on amd64 for that left shift.

Updates #25087

vendor/golang_org/x/crypto/curve25519/mont25519_amd64.go:48:20: Proved Rsh8Ux64 bounded
runtime/mbitmap.go:1252:22: Proved Lsh64x64 bounded
runtime/mbitmap.go:1265:16: Proved Lsh64x64 bounded
runtime/mbitmap.go:1275:28: Proved Lsh64x64 bounded
runtime/mbitmap.go:1645:25: Proved Lsh64x64 bounded
runtime/mbitmap.go:1663:25: Proved Lsh64x64 bounded
runtime/mbitmap.go:1808:41: Proved Lsh64x64 bounded
runtime/mbitmap.go:1831:49: Proved Lsh64x64 bounded
syscall/route_bsd.go:227:23: Proved Lsh32x64 bounded
syscall/route_bsd.go:295:23: Proved Lsh32x64 bounded
syscall/route_darwin.go:40:23: Proved Lsh32x64 bounded
compress/bzip2/bzip2.go:384:26: Proved Lsh64x16 bounded
vendor/golang_org/x/net/route/address.go:370:14: Proved Lsh64x64 bounded
compress/flate/inflate.go:201:54: Proved Lsh64x64 bounded
math/big/prime.go:50:25: Proved Lsh64x64 bounded
vendor/golang_org/x/crypto/cryptobyte/asn1.go:464:43: Proved Lsh8x8 bounded
net/ip.go:87:21: Proved Rsh8Ux64 bounded
cmd/internal/goobj/read.go:267:23: Proved Lsh64x64 bounded
cmd/vendor/golang.org/x/arch/arm64/arm64asm/decode.go:534:27: Proved Lsh32x32 bounded
cmd/vendor/golang.org/x/arch/arm64/arm64asm/decode.go:544:27: Proved Lsh32x32 bounded
cmd/internal/obj/arm/asm5.go:1044:16: Proved Lsh32x64 bounded
cmd/internal/obj/arm/asm5.go:1065:10: Proved Lsh32x32 bounded
cmd/internal/obj/mips/obj0.go:1311:21: Proved Lsh32x64 bounded
cmd/compile/internal/syntax/scanner.go:352:23: Proved Lsh64x64 bounded
go/types/expr.go:222:36: Proved Lsh64x64 bounded
crypto/x509/x509.go:1626:9: Proved Rsh8Ux64 bounded
cmd/link/internal/loadelf/ldelf.go:823:22: Proved Lsh8x64 bounded
net/http/h2_bundle.go:1470:17: Proved Lsh8x8 bounded
net/http/h2_bundle.go:1477:46: Proved Lsh8x8 bounded
net/http/h2_bundle.go:1481:31: Proved Lsh64x8 bounded
cmd/compile/internal/ssa/rewriteARM64.go:18759:17: Proved Lsh64x64 bounded
cmd/compile/internal/ssa/sparsemap.go:70:23: Proved Lsh32x64 bounded
cmd/compile/internal/ssa/sparsemap.go:73:45: Proved Lsh32x64 bounded

Change-Id: I58bb72f3e6f12f6ac69be633ea7222c245438142
Reviewed-on: https://go-review.googlesource.com/109776
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-04-29 15:59:45 +00:00
ChrisALiles 22ff9521da cmd/compile: pass arguments to convt2E/I integer functions by value
The motivation is avoid generating a pointer to the data being
converted so it can be garbage collected.
The change also slightly reduces binary size by shrinking call sites.

Fixes #24286

Benchmark results:
name                   old time/op  new time/op  delta
ConvT2ESmall-4         2.86ns ± 0%  2.80ns ± 1%  -2.12%  (p=0.000 n=29+28)
ConvT2EUintptr-4       2.88ns ± 1%  2.88ns ± 0%  -0.20%  (p=0.002 n=28+30)
ConvT2ELarge-4         19.6ns ± 0%  20.4ns ± 1%  +4.22%  (p=0.000 n=19+30)
ConvT2ISmall-4         3.01ns ± 0%  2.85ns ± 0%  -5.32%  (p=0.000 n=24+28)
ConvT2IUintptr-4       3.00ns ± 1%  2.87ns ± 0%  -4.44%  (p=0.000 n=29+25)
ConvT2ILarge-4         20.4ns ± 1%  21.3ns ± 1%  +4.41%  (p=0.000 n=30+26)
ConvT2Ezero/zero/16-4  2.84ns ± 1%  2.99ns ± 0%  +5.38%  (p=0.000 n=30+25)
ConvT2Ezero/zero/32-4  2.83ns ± 2%  3.00ns ± 0%  +5.91%  (p=0.004 n=27+3)

Change-Id: I65016ec94c53f97c52113121cab582d0c342b7a8
Reviewed-on: https://go-review.googlesource.com/102636
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-29 15:53:39 +00:00
Giovanni Bajo e0d37a33ab cmd/compile: teach prove to handle expressions like len(s)-delta
When a loop has bound len(s)-delta, findIndVar detected it and
returned len(s) as (conservative) upper bound. This little lie
allowed loopbce to drop bound checks.

It is obviously more generic to teach prove about relations like
x+d<w for non-constant "w"; we already handled the case for
constant "w", so we just want to learn that if d<0, then x+d<w
proves that x<w.

To be able to remove the code from findIndVar, we also need
to teach prove that len() and cap() are always non-negative.

This CL allows to prove 633 more checks in cmd+std. Most
of them are cases where the code was already testing before
accessing a slice but the compiler didn't know it. For instance,
take strings.HasSuffix:

    func HasSuffix(s, suffix string) bool {
        return len(s) >= len(suffix) && s[len(s)-len(suffix):] == suffix
    }

When suffix is a literal string, the compiler now understands
that the explicit check is enough to not emit a slice check.

I also found a loopbce test that was incorrectly
written to detect an overflow but had a off-by-one (on the
conservative side), so it unexpectly passed with this CL; I
changed it to really trigger the overflow as intended.

Change-Id: Ib5abade337db46b8811425afebad4719b6e46c4a
Reviewed-on: https://go-review.googlesource.com/105635
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:38:32 +00:00
Giovanni Bajo 6d379add0f cmd/compile: in prove, detect loops with negative increments
To be effective, this also requires being able to relax constraints
on min/max bound inclusiveness; they are now exposed through a flags,
and prove has been updated to handle it correctly.

Change-Id: I3490e54461b7b9de8bc4ae40d3b5e2fa2d9f0556
Reviewed-on: https://go-review.googlesource.com/104041
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:38:18 +00:00
Giovanni Bajo 980fdb8dd5 cmd/compile: improve testing of induction variables
Test both minimum and maximum bound, and prepare
formatting for more advanced tests (inclusive / esclusive bounds).

Change-Id: Ibe432916d9c938343bc07943798bc9709ad71845
Reviewed-on: https://go-review.googlesource.com/104040
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-29 09:38:09 +00:00
Giovanni Bajo f49369b67c cmd/compile: remove loopbce pass
prove now is able to do what loopbce used to do.

Passes toolstash -cmp.

Compilebench of the whole serie (master 9967582f77):

name       old time/op     new time/op     delta
Template       208ms ±18%      198ms ± 4%    ~     (p=0.690 n=5+5)
Unicode       99.1ms ±19%     96.5ms ± 4%    ~     (p=0.548 n=5+5)
GoTypes        623ms ± 1%      633ms ± 1%    ~     (p=0.056 n=5+5)
Compiler       2.94s ± 2%      3.02s ± 4%    ~     (p=0.095 n=5+5)
SSA            6.77s ± 1%      7.11s ± 2%  +4.94%  (p=0.008 n=5+5)
Flate          129ms ± 1%      136ms ± 0%  +4.87%  (p=0.016 n=5+4)
GoParser       152ms ± 3%      156ms ± 1%    ~     (p=0.095 n=5+5)
Reflect        380ms ± 2%      392ms ± 1%  +3.30%  (p=0.008 n=5+5)
Tar            185ms ± 6%      184ms ± 2%    ~     (p=0.690 n=5+5)
XML            223ms ± 2%      228ms ± 3%    ~     (p=0.095 n=5+5)
StdCmd         26.8s ± 2%      28.0s ± 5%  +4.46%  (p=0.032 n=5+5)

name       old user-ns/op  new user-ns/op  delta
Template        252M ± 5%       248M ± 3%    ~     (p=1.000 n=5+5)
Unicode         118M ± 7%       121M ± 4%    ~     (p=0.548 n=5+5)
GoTypes         790M ± 2%       793M ± 2%    ~     (p=0.690 n=5+5)
Compiler       3.78G ± 3%      3.91G ± 4%    ~     (p=0.056 n=5+5)
SSA            8.98G ± 2%      9.52G ± 3%  +6.08%  (p=0.008 n=5+5)
Flate           155M ± 1%       160M ± 0%  +3.47%  (p=0.016 n=5+4)
GoParser        185M ± 4%       187M ± 2%    ~     (p=0.310 n=5+5)
Reflect         469M ± 1%       481M ± 1%  +2.52%  (p=0.016 n=5+5)
Tar             222M ± 4%       222M ± 2%    ~     (p=0.841 n=5+5)
XML             269M ± 1%       274M ± 2%  +1.88%  (p=0.032 n=5+5)

name       old text-bytes  new text-bytes  delta
HelloSize       664k ± 0%       664k ± 0%    ~     (all equal)
CmdGoSize      7.23M ± 0%      7.22M ± 0%  -0.06%  (p=0.008 n=5+5)

name       old data-bytes  new data-bytes  delta
HelloSize       134k ± 0%       134k ± 0%    ~     (all equal)
CmdGoSize       390k ± 0%       390k ± 0%    ~     (all equal)

name       old exe-bytes   new exe-bytes   delta
HelloSize      1.39M ± 0%      1.39M ± 0%    ~     (all equal)
CmdGoSize      14.4M ± 0%      14.4M ± 0%  -0.06%  (p=0.008 n=5+5)

Go1 of the whole serie:

name                      old time/op    new time/op    delta
BinaryTree17-16              5.40s ± 6%     5.38s ± 4%     ~     (p=1.000 n=12+10)
Fannkuch11-16                4.04s ± 3%     3.81s ± 3%   -5.70%  (p=0.000 n=11+11)
FmtFprintfEmpty-16          60.7ns ± 2%    60.2ns ± 3%     ~     (p=0.136 n=11+10)
FmtFprintfString-16          115ns ± 2%     114ns ± 4%     ~     (p=0.175 n=11+10)
FmtFprintfInt-16             118ns ± 2%     125ns ± 2%   +5.76%  (p=0.000 n=11+10)
FmtFprintfIntInt-16          196ns ± 2%     204ns ± 3%   +4.42%  (p=0.000 n=10+11)
FmtFprintfPrefixedInt-16     207ns ± 2%     214ns ± 2%   +3.23%  (p=0.000 n=10+11)
FmtFprintfFloat-16           364ns ± 3%     357ns ± 2%   -1.88%  (p=0.002 n=11+11)
FmtManyArgs-16               773ns ± 2%     775ns ± 1%     ~     (p=0.457 n=11+10)
GobDecode-16                11.2ms ± 4%    11.0ms ± 3%   -1.51%  (p=0.022 n=10+9)
GobEncode-16                9.91ms ± 6%    9.81ms ± 5%     ~     (p=0.699 n=11+11)
Gzip-16                      339ms ± 1%     338ms ± 1%     ~     (p=0.438 n=11+11)
Gunzip-16                   64.4ms ± 1%    65.2ms ± 1%   +1.28%  (p=0.001 n=10+11)
HTTPClientServer-16          157µs ± 7%     160µs ± 5%     ~     (p=0.133 n=11+11)
JSONEncode-16               22.3ms ± 4%    23.2ms ± 4%   +3.79%  (p=0.000 n=11+11)
JSONDecode-16               96.7ms ± 3%    96.6ms ± 1%     ~     (p=0.562 n=11+11)
Mandelbrot200-16            6.42ms ± 1%    6.40ms ± 1%     ~     (p=0.365 n=11+11)
GoParse-16                  5.59ms ± 7%    5.42ms ± 5%   -3.07%  (p=0.020 n=11+10)
RegexpMatchEasy0_32-16       113ns ± 2%     113ns ± 3%     ~     (p=0.968 n=11+10)
RegexpMatchEasy0_1K-16       417ns ± 1%     416ns ± 3%     ~     (p=0.742 n=11+10)
RegexpMatchEasy1_32-16       106ns ± 1%     107ns ± 3%     ~     (p=0.223 n=11+11)
RegexpMatchEasy1_1K-16       654ns ± 2%     657ns ± 1%     ~     (p=0.672 n=11+8)
RegexpMatchMedium_32-16      176ns ± 3%     177ns ± 1%     ~     (p=0.664 n=11+9)
RegexpMatchMedium_1K-16     56.3µs ± 3%    56.7µs ± 3%     ~     (p=0.171 n=11+11)
RegexpMatchHard_32-16       2.83µs ± 5%    2.83µs ± 4%     ~     (p=0.735 n=11+11)
RegexpMatchHard_1K-16       82.7µs ± 2%    82.7µs ± 2%     ~     (p=0.853 n=10+10)
Revcomp-16                   679ms ± 9%     782ms ±29%  +15.16%  (p=0.031 n=9+11)
Template-16                  118ms ± 1%     109ms ± 2%   -7.49%  (p=0.000 n=11+11)
TimeParse-16                 474ns ± 1%     462ns ± 1%   -2.59%  (p=0.000 n=11+11)
TimeFormat-16                482ns ± 1%     494ns ± 1%   +2.49%  (p=0.000 n=10+11)

name                      old speed      new speed      delta
GobDecode-16              68.7MB/s ± 4%  69.8MB/s ± 3%   +1.52%  (p=0.022 n=10+9)
GobEncode-16              77.6MB/s ± 6%  78.3MB/s ± 5%     ~     (p=0.699 n=11+11)
Gzip-16                   57.2MB/s ± 1%  57.3MB/s ± 1%     ~     (p=0.428 n=11+11)
Gunzip-16                  301MB/s ± 2%   298MB/s ± 1%   -1.07%  (p=0.007 n=11+11)
JSONEncode-16             86.9MB/s ± 4%  83.7MB/s ± 4%   -3.63%  (p=0.000 n=11+11)
JSONDecode-16             20.1MB/s ± 3%  20.1MB/s ± 1%     ~     (p=0.529 n=11+11)
GoParse-16                10.4MB/s ± 6%  10.7MB/s ± 4%   +3.12%  (p=0.020 n=11+10)
RegexpMatchEasy0_32-16     282MB/s ± 2%   282MB/s ± 3%     ~     (p=0.756 n=11+10)
RegexpMatchEasy0_1K-16    2.45GB/s ± 1%  2.46GB/s ± 2%     ~     (p=0.705 n=11+10)
RegexpMatchEasy1_32-16     299MB/s ± 1%   297MB/s ± 2%     ~     (p=0.151 n=11+11)
RegexpMatchEasy1_1K-16    1.56GB/s ± 2%  1.56GB/s ± 1%     ~     (p=0.717 n=11+8)
RegexpMatchMedium_32-16   5.67MB/s ± 4%  5.63MB/s ± 1%     ~     (p=0.538 n=11+9)
RegexpMatchMedium_1K-16   18.2MB/s ± 3%  18.1MB/s ± 3%     ~     (p=0.156 n=11+11)
RegexpMatchHard_32-16     11.3MB/s ± 5%  11.3MB/s ± 4%     ~     (p=0.711 n=11+11)
RegexpMatchHard_1K-16     12.4MB/s ± 1%  12.4MB/s ± 2%     ~     (p=0.535 n=9+10)
Revcomp-16                 370MB/s ± 5%   332MB/s ±24%     ~     (p=0.062 n=8+11)
Template-16               16.5MB/s ± 1%  17.8MB/s ± 2%   +8.11%  (p=0.000 n=11+11)

Change-Id: I41e46f375ee127785c6491f7ef5bd35581261ae6
Reviewed-on: https://go-review.googlesource.com/104039
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:37:58 +00:00
Giovanni Bajo 7ec25d0acf cmd/compile: implement loop BCE in prove
Reuse findIndVar to discover induction variables, and then
register the facts we know about them into the facts table
when entering the loop block.

Moreover, handle "x+delta > w" while updating the facts table,
to be able to prove accesses to slices with constant offsets
such as slice[i-10].

Change-Id: I2a63d050ed58258136d54712ac7015b25c893d71
Reviewed-on: https://go-review.googlesource.com/104038
Run-TryBot: Giovanni Bajo <rasky@develer.com>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:37:35 +00:00
Giovanni Bajo 29162ec9a7 cmd/compile: in prove, infer unsigned relations while branching
When a branch is followed, we apply the relation as described
in the domain relation table. In case the relation is in the
positive domain, we can also infer an unsigned relation if,
by that point, we know that both operands are non-negative.

Fixes #20393

Change-Id: Ieaf0c81558b36d96616abae3eb834c788dd278d5
Reviewed-on: https://go-review.googlesource.com/100278
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:37:15 +00:00
Giovanni Bajo 5c40210987 cmd/compile: in prove, add transitive closure of relations
Implement it through a partial order datastructure, which
keeps the relations between SSA values in a forest of DAGs
and is able to discover contradictions.

In make.bash, this patch is able to prove hundreds of conditions
which were not proved before.

Compilebench:

name       old time/op       new time/op       delta
Template         371ms ± 2%        368ms ± 1%    ~     (p=0.222 n=5+5)
Unicode          203ms ± 6%        199ms ± 3%    ~     (p=0.421 n=5+5)
GoTypes          1.17s ± 4%        1.18s ± 1%    ~     (p=0.151 n=5+5)
Compiler         5.54s ± 2%        5.59s ± 1%    ~     (p=0.548 n=5+5)
SSA              12.9s ± 2%        13.2s ± 1%  +2.96%  (p=0.032 n=5+5)
Flate            245ms ± 2%        247ms ± 3%    ~     (p=0.690 n=5+5)
GoParser         302ms ± 6%        302ms ± 4%    ~     (p=0.548 n=5+5)
Reflect          764ms ± 4%        773ms ± 3%    ~     (p=0.095 n=5+5)
Tar              354ms ± 6%        361ms ± 3%    ~     (p=0.222 n=5+5)
XML              434ms ± 3%        429ms ± 1%    ~     (p=0.421 n=5+5)
StdCmd           22.6s ± 1%        22.9s ± 1%  +1.40%  (p=0.032 n=5+5)

name       old user-time/op  new user-time/op  delta
Template         436ms ± 8%        426ms ± 5%    ~     (p=0.579 n=5+5)
Unicode          219ms ±15%        219ms ±12%    ~     (p=1.000 n=5+5)
GoTypes          1.47s ± 6%        1.53s ± 6%    ~     (p=0.222 n=5+5)
Compiler         7.26s ± 4%        7.40s ± 2%    ~     (p=0.389 n=5+5)
SSA              17.7s ± 4%        18.5s ± 4%  +4.13%  (p=0.032 n=5+5)
Flate            257ms ± 5%        268ms ± 9%    ~     (p=0.333 n=5+5)
GoParser         354ms ± 6%        348ms ± 6%    ~     (p=0.913 n=5+5)
Reflect          904ms ± 2%        944ms ± 4%    ~     (p=0.056 n=5+5)
Tar              398ms ±11%        430ms ± 7%    ~     (p=0.079 n=5+5)
XML              501ms ± 7%        489ms ± 5%    ~     (p=0.444 n=5+5)

name       old text-bytes    new text-bytes    delta
HelloSize        670kB ± 0%        670kB ± 0%  +0.00%  (p=0.008 n=5+5)
CmdGoSize       7.22MB ± 0%       7.21MB ± 0%  -0.07%  (p=0.008 n=5+5)

name       old data-bytes    new data-bytes    delta
HelloSize       9.88kB ± 0%       9.88kB ± 0%    ~     (all equal)
CmdGoSize        248kB ± 0%        248kB ± 0%  -0.06%  (p=0.008 n=5+5)

name       old bss-bytes     new bss-bytes     delta
HelloSize        125kB ± 0%        125kB ± 0%    ~     (all equal)
CmdGoSize        145kB ± 0%        144kB ± 0%  -0.20%  (p=0.008 n=5+5)

name       old exe-bytes     new exe-bytes     delta
HelloSize       1.43MB ± 0%       1.43MB ± 0%    ~     (all equal)
CmdGoSize       14.5MB ± 0%       14.5MB ± 0%  -0.06%  (p=0.008 n=5+5)

Fixes #19714
Updates #20393

Change-Id: Ia090f5b5dc1bcd274ba8a39b233c1e1ace1b330e
Reviewed-on: https://go-review.googlesource.com/100277
Run-TryBot: Giovanni Bajo <rasky@develer.com>
Reviewed-by: David Chase <drchase@google.com>
2018-04-29 09:35:39 +00:00
Josh Bleecher Snyder 5af0b28a73 runtime: iterate over set bits in adjustpointers
There are several things combined in this change.

First, eliminate the gobitvector type in favor
of adding a ptrbit method to bitvector.
In non-performance-critical code, use that method.
In performance critical code, though, load the bitvector data
one byte at a time and iterate only over set bits.
To support that, add and use sys.Ctz8.

name                old time/op  new time/op  delta
StackCopyPtr-8      81.8ms ± 5%  78.9ms ± 3%   -3.58%  (p=0.000 n=97+96)
StackCopy-8         65.9ms ± 3%  62.8ms ± 3%   -4.67%  (p=0.000 n=96+92)
StackCopyNoCache-8   105ms ± 3%   102ms ± 3%   -3.38%  (p=0.000 n=96+95)

Change-Id: I00b80f45612708bd440b1a411a57fa6dfa24aa74
Reviewed-on: https://go-review.googlesource.com/109716
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-04-29 05:24:44 +00:00
Josh Bleecher Snyder 13cd006139 runtime: add fast version of getArgInfo
getArgInfo is called a lot during stack copying.
In the common case it doesn't do much work,
but it cannot be inlined.

This change works around that.

name                old time/op  new time/op  delta
StackCopyPtr-8       108ms ± 5%    96ms ± 4%  -10.40%  (p=0.000 n=20+20)
StackCopy-8         82.6ms ± 3%  78.4ms ± 6%   -5.15%  (p=0.000 n=19+20)
StackCopyNoCache-8   130ms ± 3%   122ms ± 3%   -6.44%  (p=0.000 n=20+20)

Change-Id: If7d8a08c50a4e2e76e4331b399396c5dbe88c2ce
Reviewed-on: https://go-review.googlesource.com/108945
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-04-29 03:33:09 +00:00
Ben Shi aaf73c6d1e cmd/compile: optimize ARM64 with shifted register indexed load/store
ARM64 supports efficient instructions which combine shift, addition, load/store
together. Such as "MOVD (R0)(R1<<3), R2" and "MOVWU R6, (R4)(R1<<2)".

This CL optimizes the compiler to emit such efficient instuctions. And below
is some test data.

1. binary size before/after
binary                 size change
pkg/linux_arm64        +80.1KB
pkg/tool/linux_arm64   +121.9KB
go                     -4.3KB
gofmt                  -64KB

2. go1 benchmark
There is big improvement for the test case Fannkuch11, and slight
improvement for sme others, excluding noise.

name                     old time/op    new time/op    delta
BinaryTree17-4              43.9s ± 2%     44.0s ± 2%     ~     (p=0.820 n=30+30)
Fannkuch11-4                30.6s ± 2%     24.5s ± 3%  -19.93%  (p=0.000 n=25+30)
FmtFprintfEmpty-4           500ns ± 0%     499ns ± 0%   -0.11%  (p=0.000 n=23+25)
FmtFprintfString-4         1.03µs ± 0%    1.04µs ± 3%     ~     (p=0.065 n=29+30)
FmtFprintfInt-4            1.15µs ± 3%    1.15µs ± 4%   -0.56%  (p=0.000 n=30+30)
FmtFprintfIntInt-4         1.80µs ± 5%    1.82µs ± 0%     ~     (p=0.094 n=30+24)
FmtFprintfPrefixedInt-4    2.17µs ± 5%    2.20µs ± 0%     ~     (p=0.100 n=30+23)
FmtFprintfFloat-4          3.08µs ± 3%    3.09µs ± 4%     ~     (p=0.123 n=30+30)
FmtManyArgs-4              7.41µs ± 4%    7.17µs ± 1%   -3.26%  (p=0.000 n=30+23)
GobDecode-4                93.7ms ± 0%    94.7ms ± 4%     ~     (p=0.685 n=24+30)
GobEncode-4                78.7ms ± 7%    77.1ms ± 0%     ~     (p=0.729 n=30+23)
Gzip-4                      4.01s ± 0%     3.97s ± 5%   -1.11%  (p=0.037 n=24+30)
Gunzip-4                    389ms ± 4%     384ms ± 0%     ~     (p=0.155 n=30+23)
HTTPClientServer-4          536µs ± 1%     537µs ± 1%     ~     (p=0.236 n=30+30)
JSONEncode-4                179ms ± 1%     182ms ± 6%     ~     (p=0.763 n=24+30)
JSONDecode-4                843ms ± 0%     839ms ± 6%   -0.42%  (p=0.003 n=25+30)
Mandelbrot200-4            46.5ms ± 0%    46.5ms ± 0%   +0.02%  (p=0.000 n=26+26)
GoParse-4                  44.3ms ± 6%    43.3ms ± 0%     ~     (p=0.067 n=30+27)
RegexpMatchEasy0_32-4      1.07µs ± 7%    1.07µs ± 4%     ~     (p=0.835 n=30+30)
RegexpMatchEasy0_1K-4      5.51µs ± 0%    5.49µs ± 0%   -0.35%  (p=0.000 n=23+26)
RegexpMatchEasy1_32-4      1.01µs ± 0%    1.02µs ± 4%   +0.96%  (p=0.014 n=24+30)
RegexpMatchEasy1_1K-4      7.43µs ± 0%    7.18µs ± 0%   -3.41%  (p=0.000 n=23+24)
RegexpMatchMedium_32-4     1.78µs ± 0%    1.81µs ± 4%   +1.47%  (p=0.012 n=23+30)
RegexpMatchMedium_1K-4      547µs ± 1%     542µs ± 3%   -0.90%  (p=0.003 n=24+30)
RegexpMatchHard_32-4       30.4µs ± 0%    29.7µs ± 0%   -2.15%  (p=0.000 n=19+23)
RegexpMatchHard_1K-4        913µs ± 0%     915µs ± 6%   +0.25%  (p=0.012 n=24+30)
Revcomp-4                   6.32s ± 1%     6.42s ± 4%     ~     (p=0.342 n=25+30)
Template-4                  868ms ± 6%     878ms ± 6%   +1.15%  (p=0.000 n=30+30)
TimeParse-4                4.57µs ± 4%    4.59µs ± 3%   +0.65%  (p=0.010 n=29+30)
TimeFormat-4               4.51µs ± 0%    4.50µs ± 0%   -0.27%  (p=0.000 n=27+24)
[Geo mean]                  695µs          689µs        -0.92%

name                     old speed      new speed      delta
GobDecode-4              8.19MB/s ± 0%  8.12MB/s ± 4%     ~     (p=0.680 n=24+30)
GobEncode-4              9.76MB/s ± 7%  9.96MB/s ± 0%     ~     (p=0.616 n=30+23)
Gzip-4                   4.84MB/s ± 0%  4.89MB/s ± 4%   +1.16%  (p=0.030 n=24+30)
Gunzip-4                 49.9MB/s ± 4%  50.6MB/s ± 0%     ~     (p=0.162 n=30+23)
JSONEncode-4             10.9MB/s ± 1%  10.7MB/s ± 6%     ~     (p=0.575 n=24+30)
JSONDecode-4             2.30MB/s ± 0%  2.32MB/s ± 5%   +0.72%  (p=0.003 n=22+30)
GoParse-4                1.31MB/s ± 6%  1.34MB/s ± 0%   +2.26%  (p=0.002 n=30+27)
RegexpMatchEasy0_32-4    30.0MB/s ± 6%  30.0MB/s ± 4%     ~     (p=1.000 n=30+30)
RegexpMatchEasy0_1K-4     186MB/s ± 0%   187MB/s ± 0%   +0.35%  (p=0.000 n=23+26)
RegexpMatchEasy1_32-4    31.8MB/s ± 0%  31.5MB/s ± 4%   -0.92%  (p=0.012 n=25+30)
RegexpMatchEasy1_1K-4     138MB/s ± 0%   143MB/s ± 0%   +3.53%  (p=0.000 n=23+24)
RegexpMatchMedium_32-4    560kB/s ± 0%   553kB/s ± 4%   -1.19%  (p=0.005 n=23+30)
RegexpMatchMedium_1K-4   1.87MB/s ± 0%  1.89MB/s ± 3%   +1.04%  (p=0.002 n=24+30)
RegexpMatchHard_32-4     1.05MB/s ± 0%  1.08MB/s ± 0%   +2.40%  (p=0.000 n=19+23)
RegexpMatchHard_1K-4     1.12MB/s ± 0%  1.12MB/s ± 5%   +0.12%  (p=0.006 n=25+30)
Revcomp-4                40.2MB/s ± 1%  39.6MB/s ± 4%     ~     (p=0.242 n=25+30)
Template-4               2.24MB/s ± 6%  2.21MB/s ± 6%   -1.15%  (p=0.000 n=30+30)
[Geo mean]               7.87MB/s       7.91MB/s        +0.44%

Change-Id: If374cb7abf83537aa0a176f73c0f736f7800db03
Reviewed-on: https://go-review.googlesource.com/108735
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-27 20:02:05 +00:00
Milan Knezevic 2959128dc5 cmd/compile: add softfloat support to mips64{,le}
mips64 softfloat support is based on mips implementation and introduces
new enviroment variable GOMIPS64.

GOMIPS64 is a GOARCH=mips64{,le} specific option, for a choice between
hard-float and soft-float. Valid values are 'hardfloat' (default) and
'softfloat'. It is passed to the assembler as
'GOMIPS64_{hardfloat,softfloat}'.

Change-Id: I7f73078627f7cb37c588a38fb5c997fe09c56134
Reviewed-on: https://go-review.googlesource.com/108475
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-27 14:50:17 +00:00
Daniel Martí a835c7396d cmd/compile: add initial README
As a follow-up to the first README for cmd/compile/internal/ssa.

Since this is the parent package for all the compiler packages, this
README serves as an overview of the compiler and its packages. As more
READMEs are added for specific parts with more detail, such as ssa's,
they can be linked from this one.

Thanks to Iskander Sharipov, Josh Bleecher Snyder, Matthew Dempsky,
Alberto Donizetti, and Robert Griesemer for helping with all the details
in this document.

Change-Id: I820a535e25dce86ccc667ce1c6e92b75fc32f3af
Reviewed-on: https://go-review.googlesource.com/103935
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-04-27 07:51:59 +00:00
Josh Bleecher Snyder b9785fc844 cmd/compile: log Ctz non-zero proofs
I forgot this in CL 109358.

Change-Id: Ia5e8bd9cf43393f098b101a0d6a0c526e3e4f101
Reviewed-on: https://go-review.googlesource.com/109775
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-27 04:49:09 +00:00
Matthew Dempsky ae2a2d12f6 cmd/compile: cleaner solution for importing init functions
Using oldname+resolve is how typecheck handles this anyway.

Passes toolstash -cmp, with both -iexport enabled and disabled.

Change-Id: I12b0f0333d6b86ce6bfc4d416c461b5f15c1110d
Reviewed-on: https://go-review.googlesource.com/109715
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-04-26 23:21:28 +00:00
Josh Bleecher Snyder 8bf4b7e673 cmd/compile: convert amd64 BSFL and BSRL from tuple to result only
Change-Id: I220a459f67ecb310b6e9a526a1ff55527d421e70
Reviewed-on: https://go-review.googlesource.com/109416
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-26 19:29:17 +00:00
Josh Bleecher Snyder d9a50a6531 cmd/compile: use prove pass to detect Ctz of non-zero values
On amd64, Ctz must include special handling of zeros.
But the prove pass has enough information to detect whether the input
is non-zero, allowing a more efficient lowering.

Introduce new CtzNonZero ops to capture and use this information.

Benchmark code:

func BenchmarkVisitBits(b *testing.B) {
	b.Run("8", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			x := uint8(0xff)
			for x != 0 {
				sink = bits.TrailingZeros8(x)
				x &= x - 1
			}
		}
	})

    // and similarly so for 16, 32, 64
}

name            old time/op  new time/op  delta
VisitBits/8-8   7.27ns ± 4%  5.58ns ± 4%  -23.35%  (p=0.000 n=28+26)
VisitBits/16-8  14.7ns ± 7%  10.5ns ± 4%  -28.43%  (p=0.000 n=30+28)
VisitBits/32-8  27.6ns ± 8%  19.3ns ± 3%  -30.14%  (p=0.000 n=30+26)
VisitBits/64-8  44.0ns ±11%  38.0ns ± 5%  -13.48%  (p=0.000 n=30+30)

Fixes #25077

Change-Id: Ie6e5bd86baf39ee8a4ca7cadcf56d934e047f957
Reviewed-on: https://go-review.googlesource.com/109358
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-26 18:22:28 +00:00
Michael Munday adbb6ec903 cmd/compile/internal/ssa: regenerate rewrite rules
Running 'go run *.go' in the gen directory resulted in this diff.

Change-Id: Iee398a720f54d3f2c3c122fc6fc45a708a39e45e
Reviewed-on: https://go-review.googlesource.com/109575
Run-TryBot: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-26 18:06:33 +00:00
Balaram Makam f524268c40 cmd/compile: optimize ARM64 code with CMN/TST
Use CMN/TST to simplify comparisons. This can reduce the
register pressure by removing single def/use registers for example:
ADDW R0, R1, R8 -> CMNW R1, R0 ; CMN is an alias of ADDS.
CBZW R8, label  -> BEQ  label  ; single def/use of R8 removed.

Little change in performance of go1 benchmark on Amberwing:
name                   old time/op    new time/op    delta
RegexpMatchEasy0_32       247ns ± 0%     246ns ± 0%  -0.40%  (p=0.008 n=5+5)
RegexpMatchEasy0_1K       581ns ± 0%     580ns ± 0%    ~     (p=0.079 n=4+5)
RegexpMatchEasy1_32       244ns ± 0%     243ns ± 0%  -0.41%  (p=0.008 n=5+5)
RegexpMatchEasy1_1K       804ns ± 0%     806ns ± 0%  +0.25%  (p=0.016 n=5+4)
RegexpMatchMedium_32      313ns ± 0%     311ns ± 0%  -0.64%  (p=0.008 n=5+5)
RegexpMatchMedium_1K     52.2µs ± 0%    51.9µs ± 0%  -0.51%  (p=0.008 n=5+5)
RegexpMatchHard_32       2.76µs ± 3%    2.74µs ± 0%    ~     (p=0.683 n=5+5)
RegexpMatchHard_1K       78.8µs ± 0%    78.9µs ± 0%  +0.04%  (p=0.008 n=5+5)
FmtFprintfEmpty          58.6ns ± 0%    57.7ns ± 0%  -1.54%  (p=0.008 n=5+5)
FmtFprintfString          118ns ± 0%     115ns ± 0%  -2.54%  (p=0.008 n=5+5)
FmtFprintfInt             119ns ± 0%     119ns ± 0%    ~     (all equal)
FmtFprintfIntInt          192ns ± 0%     192ns ± 0%    ~     (all equal)
FmtFprintfPrefixedInt     224ns ± 0%     205ns ± 0%  -8.48%  (p=0.008 n=5+5)
FmtFprintfFloat           336ns ± 0%     333ns ± 1%    ~     (p=0.683 n=5+5)
FmtManyArgs               779ns ± 1%     760ns ± 1%  -2.41%  (p=0.008 n=5+5)
Gzip                      437ms ± 0%     436ms ± 0%  -0.27%  (p=0.008 n=5+5)
HTTPClientServer         90.1µs ± 1%    91.1µs ± 0%  +1.19%  (p=0.008 n=5+5)
JSONEncode               20.1ms ± 0%    20.2ms ± 1%    ~     (p=0.690 n=5+5)
JSONDecode               94.5ms ± 1%    94.1ms ± 1%    ~     (p=0.095 n=5+5)
Mandelbrot200            5.37ms ± 0%    5.37ms ± 0%    ~     (p=0.421 n=5+5)
TimeParse                 450ns ± 0%     446ns ± 0%  -0.89%  (p=0.000 n=5+4)
TimeFormat                483ns ± 1%     473ns ± 0%  -2.19%  (p=0.008 n=5+5)
Template                 90.6ms ± 0%    89.7ms ± 0%  -0.93%  (p=0.008 n=5+5)
GoParse                  5.97ms ± 0%    6.01ms ± 0%  +0.65%  (p=0.008 n=5+5)
BinaryTree17              11.8s ± 0%     11.7s ± 0%  -0.28%  (p=0.016 n=5+5)
Revcomp                   669ms ± 0%     669ms ± 0%    ~     (p=0.222 n=5+5)
Fannkuch11                3.28s ± 0%     3.34s ± 0%  +1.72%  (p=0.016 n=4+5)
[Geo mean]               46.6µs         46.3µs       -0.74%

name                   old speed      new speed      delta
RegexpMatchEasy0_32     129MB/s ± 0%   130MB/s ± 0%  +0.32%  (p=0.016 n=5+4)
RegexpMatchEasy0_1K    1.76GB/s ± 0%  1.76GB/s ± 0%  +0.13%  (p=0.016 n=4+5)
RegexpMatchEasy1_32     131MB/s ± 0%   132MB/s ± 0%  +0.32%  (p=0.008 n=5+5)
RegexpMatchEasy1_1K    1.27GB/s ± 0%  1.27GB/s ± 0%  -0.24%  (p=0.016 n=5+4)
RegexpMatchMedium_32   3.19MB/s ± 0%  3.21MB/s ± 0%  +0.63%  (p=0.008 n=5+5)
RegexpMatchMedium_1K   19.6MB/s ± 0%  19.7MB/s ± 0%  +0.51%  (p=0.029 n=4+4)
RegexpMatchHard_32     11.6MB/s ± 2%  11.7MB/s ± 0%    ~     (p=1.000 n=5+5)
RegexpMatchHard_1K     13.0MB/s ± 0%  13.0MB/s ± 0%    ~     (p=0.079 n=4+5)
Gzip                   44.4MB/s ± 0%  44.5MB/s ± 0%  +0.27%  (p=0.008 n=5+5)
JSONEncode             96.4MB/s ± 0%  96.2MB/s ± 1%    ~     (p=0.579 n=5+5)
JSONDecode             20.5MB/s ± 1%  20.6MB/s ± 1%    ~     (p=0.111 n=5+5)
Template               21.4MB/s ± 0%  21.6MB/s ± 0%  +0.94%  (p=0.008 n=5+5)
GoParse                9.70MB/s ± 0%  9.63MB/s ± 0%  -0.68%  (p=0.016 n=4+5)
Revcomp                 380MB/s ± 0%   380MB/s ± 0%    ~     (p=0.222 n=5+5)
[Geo mean]             55.3MB/s       55.4MB/s       +0.23%

Change-Id: I2e5338138991d9bc984e67b51212aa5d1b0f2a6b
Reviewed-on: https://go-review.googlesource.com/97335
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
2018-04-26 14:13:12 +00:00
Carlos Eduardo Seo ebb67d993a cmd/compile, cmd/internal/obj/ppc64: make math.Round an intrinsic on ppc64x
This change implements math.Round as an intrinsic on ppc64x so it can be
done using a single instruction.

benchmark                   old ns/op     new ns/op     delta
BenchmarkRound-16           2.60          0.69          -73.46%

Change-Id: I9408363e96201abdfc73ced7bcd5f0c29db006a8
Reviewed-on: https://go-review.googlesource.com/109395
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-04-26 14:12:09 +00:00
Matthew Dempsky 736390c2bd cmd/compile: allow SSA of multi-field structs while instrumenting
When we moved racewalk to buildssa, we disabled SSA of structs with
2-4 fields while instrumenting because it caused false dependencies
because for "x.f" we would emit

    (StructSelect (Load (Addr x)) "f")

Even though we later simplify this to

    (Load (OffPtr (Addr x) "f"))

the instrumentation saw a load of x in its entirety and would issue
appropriate race/msan calls.

The fix taken here is to directly emit the OffPtr form when x.f is
addressable and can't be represented in SSA form.

Change-Id: I0caf37bced52e9c16937466b0ac8cab6d356e525
Reviewed-on: https://go-review.googlesource.com/109360
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-04-26 07:07:18 +00:00
Josh Bleecher Snyder c5f0104daf cmd/compile: use intrinsic for LeadingZeros8 on amd64
The previous change sped up the pure computation form of LeadingZeros8.
This places it somewhat close to the table lookup form.
Depending on something that varies from toolchain to toolchain
(alignment, perhaps?), the slowdown from ditching the table lookup
is either 20% or 5%.

This benchmark is the best case scenario for the table lookup:
It is in the L1 cache already.

I think we're close enough that we can switch to the computational version,
and trust that the memory effects and binary size savings will be worth it.

Code:

func f8(x uint8)   { z = bits.LeadingZeros8(x) }

Before:

"".f8 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	MOVBLZX	AL, AX
	0x0008 00008 (x.go:7)	LEAQ	math/bits.len8tab(SB), CX
	0x000f 00015 (x.go:7)	MOVBLZX	(CX)(AX*1), AX
	0x0013 00019 (x.go:7)	ADDQ	$-8, AX
	0x0017 00023 (x.go:7)	NEGQ	AX
	0x001a 00026 (x.go:7)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:7)	RET

After:

"".f8 STEXT nosplit size=30 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	MOVBLZX	AL, AX
	0x0008 00008 (x.go:7)	LEAL	1(AX)(AX*1), AX
	0x000c 00012 (x.go:7)	BSRL	AX, AX
	0x000f 00015 (x.go:7)	ADDQ	$-8, AX
	0x0013 00019 (x.go:7)	NEGQ	AX
	0x0016 00022 (x.go:7)	MOVQ	AX, "".z(SB)
	0x001d 00029 (x.go:7)	RET

Change-Id: Icc7db50a7820fb9a3da8a816d6b6940d7f8e193e
Reviewed-on: https://go-review.googlesource.com/108942
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:34:15 +00:00
Josh Bleecher Snyder 1d321ada73 cmd/compile: optimize LeadingZeros(16|32) on amd64
Introduce Len8 and Len16 ops and provide optimized lowerings for them.
amd64 only for this CL, although it wouldn't surprise me
if other architectures also admit of optimized lowerings.

Also use and optimize the Len32 lowering, along the same lines.

Leave Len8 unused for the moment; a subsequent CL will enable it.

For 16 and 32 bits, this leads to a speed-up.

name              old time/op  new time/op  delta
LeadingZeros16-8  1.42ns ± 5%  1.23ns ± 5%  -13.42%  (p=0.000 n=20+20)
LeadingZeros32-8  1.25ns ± 5%  1.03ns ± 5%  -17.63%  (p=0.000 n=20+16)

Code:

func f16(x uint16) { z = bits.LeadingZeros16(x) }
func f32(x uint32) { z = bits.LeadingZeros32(x) }

Before:

"".f16 STEXT nosplit size=38 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	MOVWLZX	AX, AX
	0x0008 00008 (x.go:8)	BSRQ	AX, AX
	0x000c 00012 (x.go:8)	MOVQ	$-1, CX
	0x0013 00019 (x.go:8)	CMOVQEQ	CX, AX
	0x0017 00023 (x.go:8)	ADDQ	$-15, AX
	0x001b 00027 (x.go:8)	NEGQ	AX
	0x001e 00030 (x.go:8)	MOVQ	AX, "".z(SB)
	0x0025 00037 (x.go:8)	RET

"".f32 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:9)	TEXT	"".f32(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:9)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:9)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:9)	MOVL	"".x+8(SP), AX
	0x0004 00004 (x.go:9)	BSRQ	AX, AX
	0x0008 00008 (x.go:9)	MOVQ	$-1, CX
	0x000f 00015 (x.go:9)	CMOVQEQ	CX, AX
	0x0013 00019 (x.go:9)	ADDQ	$-31, AX
	0x0017 00023 (x.go:9)	NEGQ	AX
	0x001a 00026 (x.go:9)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:9)	RET

After:

"".f16 STEXT nosplit size=30 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	MOVWLZX	AX, AX
	0x0008 00008 (x.go:8)	LEAL	1(AX)(AX*1), AX
	0x000c 00012 (x.go:8)	BSRL	AX, AX
	0x000f 00015 (x.go:8)	ADDQ	$-16, AX
	0x0013 00019 (x.go:8)	NEGQ	AX
	0x0016 00022 (x.go:8)	MOVQ	AX, "".z(SB)
	0x001d 00029 (x.go:8)	RET

"".f32 STEXT nosplit size=28 args=0x8 locals=0x0
	0x0000 00000 (x.go:9)	TEXT	"".f32(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:9)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:9)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:9)	MOVL	"".x+8(SP), AX
	0x0004 00004 (x.go:9)	LEAQ	1(AX)(AX*1), AX
	0x0009 00009 (x.go:9)	BSRQ	AX, AX
	0x000d 00013 (x.go:9)	ADDQ	$-32, AX
	0x0011 00017 (x.go:9)	NEGQ	AX
	0x0014 00020 (x.go:9)	MOVQ	AX, "".z(SB)
	0x001b 00027 (x.go:9)	RET

Change-Id: I6c93c173752a7bfdeab8be30777ae05a736e1f4b
Reviewed-on: https://go-review.googlesource.com/108941
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:34:04 +00:00
Josh Bleecher Snyder 54dbab5221 cmd/compile: optimize TrailingZeros(8|16) on amd64
Introduce Ctz8 and Ctz16 ops and provide optimized lowerings for them.
amd64 only for this CL, although it wouldn't surprise me
if other architectures also admit of optimized lowerings.

name               old time/op  new time/op  delta
TrailingZeros8-8   1.33ns ± 6%  0.84ns ± 3%  -36.90%  (p=0.000 n=20+20)
TrailingZeros16-8  1.26ns ± 5%  0.84ns ± 5%  -33.50%  (p=0.000 n=20+18)

Code:

func f8(x uint8)   { z = bits.TrailingZeros8(x) }
func f16(x uint16) { z = bits.TrailingZeros16(x) }

Before:

"".f8 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	MOVBLZX	AL, AX
	0x0008 00008 (x.go:7)	BTSQ	$8, AX
	0x000d 00013 (x.go:7)	BSFQ	AX, AX
	0x0011 00017 (x.go:7)	MOVL	$64, CX
	0x0016 00022 (x.go:7)	CMOVQEQ	CX, AX
	0x001a 00026 (x.go:7)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:7)	RET

"".f16 STEXT nosplit size=34 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	MOVWLZX	AX, AX
	0x0008 00008 (x.go:8)	BTSQ	$16, AX
	0x000d 00013 (x.go:8)	BSFQ	AX, AX
	0x0011 00017 (x.go:8)	MOVL	$64, CX
	0x0016 00022 (x.go:8)	CMOVQEQ	CX, AX
	0x001a 00026 (x.go:8)	MOVQ	AX, "".z(SB)
	0x0021 00033 (x.go:8)	RET

After:

"".f8 STEXT nosplit size=20 args=0x8 locals=0x0
	0x0000 00000 (x.go:7)	TEXT	"".f8(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:7)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:7)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:7)	MOVBLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:7)	BTSL	$8, AX
	0x0009 00009 (x.go:7)	BSFL	AX, AX
	0x000c 00012 (x.go:7)	MOVQ	AX, "".z(SB)
	0x0013 00019 (x.go:7)	RET

"".f16 STEXT nosplit size=20 args=0x8 locals=0x0
	0x0000 00000 (x.go:8)	TEXT	"".f16(SB), NOSPLIT, $0-8
	0x0000 00000 (x.go:8)	FUNCDATA	$0, gclocals·2a5305abe05176240e61b8620e19a815(SB)
	0x0000 00000 (x.go:8)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:8)	MOVWLZX	"".x+8(SP), AX
	0x0005 00005 (x.go:8)	BTSL	$16, AX
	0x0009 00009 (x.go:8)	BSFL	AX, AX
	0x000c 00012 (x.go:8)	MOVQ	AX, "".z(SB)
	0x0013 00019 (x.go:8)	RET

Change-Id: I0551e357348de2b724737d569afd6ac9f5c3aa11
Reviewed-on: https://go-review.googlesource.com/108940
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 21:33:52 +00:00
Russ Cox d3ff5090dd cmd/compile: fix format error
Found by pending CL to make cmd/vet auto-detect printf wrappers.

Change-Id: I6b5ba8f9c301dd2d7086c152cf2e54a68b012208
Reviewed-on: https://go-review.googlesource.com/109345
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-25 20:22:25 +00:00
Matthew Dempsky 7500b29993 cmd/compile/internal/types: remove Field.Funarg
Passes toolstash-check.

Change-Id: Idc00f15e369cad62cb8f7a09fd0ef09abd3fcdef
Reviewed-on: https://go-review.googlesource.com/109356
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-25 19:26:13 +00:00
Ilya Tocar 25813f9f91 cmd/compile/internal/ssa: tweak branchelim cost model on amd64
Currently branchelim is too aggressive in converting branches to
conditinal movs. On most x86 cpus resulting cmov* are more expensive than
most simple instructions, because they have a latency of 2, instead of 1,
So by teaching branchelim to limit number of CondSelects and consider possible
need to recalculate flags, we can archive huge speed-ups (fix big regressions).
In package strings:

ToUpper/#00-6                              10.9ns ± 1%  11.8ns ± 1%   +8.29%  (p=0.000 n=10+10)
ToUpper/ONLYUPPER-6                        27.9ns ± 0%  27.8ns ± 1%     ~     (p=0.106 n=9+10)
ToUpper/abc-6                              90.3ns ± 2%  90.3ns ± 1%     ~     (p=0.956 n=10+10)
ToUpper/AbC123-6                            110ns ± 1%   113ns ± 2%   +3.19%  (p=0.000 n=10+10)
ToUpper/azAZ09_-6                           109ns ± 2%   110ns ± 1%     ~     (p=0.174 n=10+10)
ToUpper/longStrinGwitHmixofsmaLLandcAps-6   228ns ± 1%   233ns ± 2%   +2.11%  (p=0.000 n=10+10)
ToUpper/longɐstringɐwithɐnonasciiⱯchars-6   907ns ± 1%   709ns ± 2%  -21.85%  (p=0.000 n=10+10)
ToUpper/ɐɐɐɐɐ-6                             793ns ± 2%   562ns ± 2%  -29.06%  (p=0.000 n=10+10)

In fmt:

SprintfQuoteString-6   272ns ± 2%   195ns ± 4%  -28.39%  (p=0.000 n=10+10)

And in archive/zip:

CompressedZipGarbage-6       4.00ms ± 0%    4.03ms ± 0%   +0.71%  (p=0.000 n=10+10)
Zip64Test-6                  27.5ms ± 1%    24.2ms ± 0%  -12.01%  (p=0.000 n=10+10)
Zip64TestSizes/4096-6        10.4µs ±12%    10.7µs ± 8%     ~     (p=0.068 n=10+8)
Zip64TestSizes/1048576-6     79.0µs ± 3%    70.2µs ± 2%  -11.14%  (p=0.000 n=10+10)
Zip64TestSizes/67108864-6    4.64ms ± 1%    4.11ms ± 1%  -11.43%  (p=0.000 n=10+10)

As far as I can tell, no cases with significant gain from cmov have regressed.

On go1 it looks like most changes are unrelated, but I've verified that
TimeFormat really switched from cmov to branch in a hot spot.
Fill results below:

name                     old time/op    new time/op    delta
BinaryTree17-6              4.42s ± 1%     4.44s ± 1%    ~     (p=0.075 n=10+10)
Fannkuch11-6                4.23s ± 0%     4.18s ± 0%  -1.16%  (p=0.000 n=8+10)
FmtFprintfEmpty-6          67.5ns ± 2%    67.5ns ± 0%    ~     (p=0.950 n=10+7)
FmtFprintfString-6          117ns ± 2%     119ns ± 1%  +1.07%  (p=0.045 n=9+10)
FmtFprintfInt-6             122ns ± 0%     123ns ± 2%    ~     (p=0.825 n=8+10)
FmtFprintfIntInt-6          188ns ± 1%     187ns ± 1%  -0.85%  (p=0.001 n=10+10)
FmtFprintfPrefixedInt-6     223ns ± 1%     226ns ± 1%  +1.40%  (p=0.000 n=9+10)
FmtFprintfFloat-6           380ns ± 1%     379ns ± 0%    ~     (p=0.350 n=9+7)
FmtManyArgs-6               784ns ± 0%     790ns ± 1%  +0.81%  (p=0.000 n=10+9)
GobDecode-6                10.7ms ± 1%    10.8ms ± 0%  +0.68%  (p=0.000 n=10+10)
GobEncode-6                8.95ms ± 0%    8.94ms ± 0%    ~     (p=0.436 n=10+10)
Gzip-6                      378ms ± 0%     378ms ± 0%    ~     (p=0.696 n=8+10)
Gunzip-6                   60.5ms ± 0%    60.9ms ± 0%  +0.73%  (p=0.000 n=8+10)
HTTPClientServer-6          109µs ± 3%     111µs ± 2%  +2.53%  (p=0.000 n=10+10)
JSONEncode-6               20.2ms ± 0%    20.2ms ± 0%    ~     (p=0.382 n=8+8)
JSONDecode-6               85.9ms ± 1%    84.5ms ± 0%  -1.59%  (p=0.000 n=10+8)
Mandelbrot200-6            6.89ms ± 0%    6.85ms ± 1%  -0.49%  (p=0.000 n=9+10)
GoParse-6                  5.49ms ± 0%    5.40ms ± 0%  -1.63%  (p=0.000 n=8+10)
RegexpMatchEasy0_32-6       126ns ± 1%     129ns ± 1%  +2.14%  (p=0.000 n=10+10)
RegexpMatchEasy0_1K-6       320ns ± 1%     317ns ± 2%    ~     (p=0.089 n=10+10)
RegexpMatchEasy1_32-6       119ns ± 2%     121ns ± 4%    ~     (p=0.591 n=10+10)
RegexpMatchEasy1_1K-6       544ns ± 1%     541ns ± 0%  -0.67%  (p=0.020 n=8+8)
RegexpMatchMedium_32-6      184ns ± 1%     184ns ± 1%    ~     (p=0.360 n=10+10)
RegexpMatchMedium_1K-6     57.7µs ± 2%    58.3µs ± 1%  +1.12%  (p=0.022 n=10+10)
RegexpMatchHard_32-6       2.72µs ± 5%    2.70µs ± 0%    ~     (p=0.166 n=10+8)
RegexpMatchHard_1K-6       80.2µs ± 0%    81.0µs ± 0%  +1.01%  (p=0.000 n=8+10)
Revcomp-6                   607ms ± 0%     601ms ± 2%  -1.00%  (p=0.006 n=8+10)
Template-6                 93.1ms ± 1%    92.6ms ± 0%  -0.56%  (p=0.000 n=9+9)
TimeParse-6                 472ns ± 0%     470ns ± 0%  -0.28%  (p=0.001 n=9+9)
TimeFormat-6                546ns ± 0%     511ns ± 0%  -6.41%  (p=0.001 n=7+7)
[Geo mean]                 76.4µs         76.3µs       -0.12%

name                     old speed      new speed      delta
GobDecode-6              71.5MB/s ± 1%  71.1MB/s ± 0%  -0.68%  (p=0.000 n=10+10)
GobEncode-6              85.8MB/s ± 0%  85.8MB/s ± 0%    ~     (p=0.425 n=10+10)
Gzip-6                   51.3MB/s ± 0%  51.3MB/s ± 0%    ~     (p=0.680 n=8+10)
Gunzip-6                  321MB/s ± 0%   318MB/s ± 0%  -0.73%  (p=0.000 n=8+10)
JSONEncode-6             95.9MB/s ± 0%  96.0MB/s ± 0%    ~     (p=0.367 n=8+8)
JSONDecode-6             22.6MB/s ± 1%  22.9MB/s ± 0%  +1.62%  (p=0.000 n=10+8)
GoParse-6                10.6MB/s ± 0%  10.7MB/s ± 0%  +1.64%  (p=0.000 n=8+10)
RegexpMatchEasy0_32-6     252MB/s ± 1%   247MB/s ± 1%  -2.22%  (p=0.000 n=10+10)
RegexpMatchEasy0_1K-6    3.19GB/s ± 1%  3.22GB/s ± 2%    ~     (p=0.105 n=10+10)
RegexpMatchEasy1_32-6     267MB/s ± 2%   264MB/s ± 4%    ~     (p=0.481 n=10+10)
RegexpMatchEasy1_1K-6    1.88GB/s ± 1%  1.89GB/s ± 0%  +0.62%  (p=0.038 n=8+8)
RegexpMatchMedium_32-6   5.41MB/s ± 2%  5.43MB/s ± 0%    ~     (p=0.339 n=10+8)
RegexpMatchMedium_1K-6   17.8MB/s ± 1%  17.6MB/s ± 1%  -1.12%  (p=0.029 n=10+10)
RegexpMatchHard_32-6     11.8MB/s ± 5%  11.8MB/s ± 0%    ~     (p=0.163 n=10+8)
RegexpMatchHard_1K-6     12.8MB/s ± 0%  12.6MB/s ± 0%  -1.06%  (p=0.000 n=7+10)
Revcomp-6                 419MB/s ± 0%   423MB/s ± 2%  +1.02%  (p=0.006 n=8+10)
Template-6               20.9MB/s ± 0%  21.0MB/s ± 0%  +0.53%  (p=0.000 n=8+8)
[Geo mean]               77.0MB/s       77.0MB/s       +0.05%

diff --git a/src/cmd/compile/internal/ssa/branchelim.go b/src/cmd/compile/internal/ssa/branchelim.go

Change-Id: Ibdffa9ea9b4c72668617ce3202ec4a83a1cd59be
Reviewed-on: https://go-review.googlesource.com/107936
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-25 16:06:54 +00:00
Keith Randall ae26d57f96 cmd/compile: update SSA TODO file
Get rid of a bunch of stuff we've already done.

Change-Id: Ibae4be7535ddb58590a072a2390c5f3e948c2fd7
Reviewed-on: https://go-review.googlesource.com/109136
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-24 23:35:13 +00:00
Matthew Dempsky 2083b5d673 cmd/compile/internal/types: replace Type.Val with Type.Elem
This reduces the API surface of Type slightly (for #25056), but also
makes it more consistent with the reflect and go/types APIs.

Passes toolstash-check.

Change-Id: Ief9a8eb461ae6e88895f347e2a1b7b8a62423222
Reviewed-on: https://go-review.googlesource.com/109138
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-24 22:37:52 +00:00
Matthew Dempsky e10ee798c4 cmd/compile/internal/types: remove ElemType wrapper
This was an artifact from when we had a separate ssa.Type interface to
break circular dependency between packages ssa and gc. It's no longer
needed now that package ssa directly uses package types.

Change-Id: I6a93e5d79082815f7f0eb89507381969cc6cb403
Reviewed-on: https://go-review.googlesource.com/109137
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-04-24 22:24:47 +00:00
Josh Bleecher Snyder 3d6647d6f8 cmd/compile: improve regalloc live values debug printing
Before:

live values at end of each block
  b1: v3 v2 v7 avoid=0
  b2: v3 v13 avoid=81
  b3: v19[AX] v3 avoid=81
  b6: avoid=0
  b7: avoid=0
  b5: avoid=0
  b4: v3 v18 avoid=81

After:

live values at end of each block
  b1: v3 v2 v7
  b2: v3 v13 avoid=AX DI
  b3: v19[AX] v3 avoid=AX DI
  b6:
  b7:
  b5:
  b4: v3 v18 avoid=AX DI

Change-Id: Ibec5c76a16151832b8d49a21c640699fdc9a9d28
Reviewed-on: https://go-review.googlesource.com/109000
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-24 17:45:48 +00:00
Ilya Tocar fb017c60bc cmd/compile/internal/ssa: fix endless compile loop on AMD64
We currently rewrite
(TESTQ (MOVQconst [c] x)) into (TESTQconst [c] x)
and (TESTQconst [-1] x) into (TESTQ x x)
if x is a (MOVQconst [-1]) we will be stuck in the endless rewrite loop.
Don't perform the rewrite in such cases.

Fixes #25006

Change-Id: I77f561ba2605fc104f1e5d5c57f32e9d67a2c000
Reviewed-on: https://go-review.googlesource.com/108879
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-24 16:20:41 +00:00
isharipo cb44c8debb cmd/compile/internal/ssa: add Op{SP,SB} type checks to check.go
gc/ssa.go initilizes SP and SB values with TUINTPTR type.
Assign same type in SSA tests and modify check.go to catch
mismatching types for those ops.

This makes SSA tests more consistent.

Change-Id: I798440d57d00fb949d1a0cd796759c9b82a934bd
Reviewed-on: https://go-review.googlesource.com/106658
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-24 15:51:15 +00:00
Matthew Dempsky a3c75d9b31 cmd/compile: enable indexed export format by default
Change-Id: Id018eeb79afbe2c695a583b3845cfbc1aab08388
Reviewed-on: https://go-review.googlesource.com/106797
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-04-24 01:06:17 +00:00
Matthew Dempsky ca2f85fd3f cmd/compile: add indexed export format
This CL introduces a new indexed data format for package export
data. This improves on the previous (sequential) binary format by
allowing the compiler to selectively (and lazily) load only the data
that's actually needed for compilation.

In large Go projects, the package export data can become very large
due to transitive type declaration dependencies and inline
function/method bodies. By lazily loading these declarations and
bodies as needed, we avoid wasting time and memory processing
unnecessary and/or redundant data.

In the benchmarks below, "old" is -iexport=false and "new" is
-iexport=true. The suffixes indicate the compiler concurrency (-c) and
inlining (-l) settings used for the build (using -gcflags=all=-foo).
Benchmarks were run on an HP Z620.

Juju is "go build -a github.com/juju/juju/cmd/...":

name          old real-time/op  new real-time/op  delta
Juju/c=1/l=0        44.0s ± 1%        38.7s ± 9%  -11.97%  (p=0.001 n=7+7)
Juju/c=1/l=4        53.7s ± 3%        45.3s ± 4%  -15.53%  (p=0.001 n=7+7)
Juju/c=4/l=0        39.7s ± 8%        32.0s ± 4%  -19.38%  (p=0.001 n=7+7)
Juju/c=4/l=4        46.3s ± 4%        38.0s ± 4%  -18.06%  (p=0.001 n=7+7)

name          old user-time/op  new user-time/op  delta
Juju/c=1/l=0         371s ± 1%         300s ± 0%  -19.07%  (p=0.001 n=7+6)
Juju/c=1/l=4         482s ± 0%         374s ± 1%  -22.37%  (p=0.001 n=7+7)
Juju/c=4/l=0         410s ± 1%         340s ± 1%  -17.19%  (p=0.001 n=7+7)
Juju/c=4/l=4         532s ± 1%         424s ± 1%  -20.26%  (p=0.001 n=7+7)

name          old sys-time/op   new sys-time/op   delta
Juju/c=1/l=0        33.4s ± 1%        28.4s ± 2%  -15.02%  (p=0.001 n=7+7)
Juju/c=1/l=4        40.7s ± 2%        32.8s ± 3%  -19.51%  (p=0.001 n=7+7)
Juju/c=4/l=0        39.8s ± 2%        34.4s ± 2%  -13.74%  (p=0.001 n=7+7)
Juju/c=4/l=4        48.4s ± 2%        40.4s ± 2%  -16.50%  (p=0.001 n=7+7)

Kubelet is "go build -a k8s.io/kubernetes/cmd/kubelet":

name             old real-time/op  new real-time/op  delta
Kubelet/c=1/l=0        42.0s ± 1%        34.8s ± 1%  -17.27%  (p=0.008 n=5+5)
Kubelet/c=1/l=4        55.4s ± 3%        45.4s ± 3%  -18.06%  (p=0.002 n=6+6)
Kubelet/c=4/l=0        37.4s ± 3%        29.9s ± 1%  -20.25%  (p=0.004 n=6+5)
Kubelet/c=4/l=4        48.1s ± 2%        39.0s ± 5%  -18.93%  (p=0.002 n=6+6)

name             old user-time/op  new user-time/op  delta
Kubelet/c=1/l=0         291s ± 1%         233s ± 1%  -19.96%  (p=0.002 n=6+6)
Kubelet/c=1/l=4         385s ± 1%         298s ± 1%  -22.51%  (p=0.002 n=6+6)
Kubelet/c=4/l=0         325s ± 0%         268s ± 1%  -17.48%  (p=0.004 n=5+6)
Kubelet/c=4/l=4         429s ± 1%         343s ± 1%  -20.08%  (p=0.002 n=6+6)

name             old sys-time/op   new sys-time/op   delta
Kubelet/c=1/l=0        25.1s ± 2%        20.9s ± 4%  -16.69%  (p=0.002 n=6+6)
Kubelet/c=1/l=4        31.2s ± 3%        24.4s ± 0%  -21.67%  (p=0.010 n=6+4)
Kubelet/c=4/l=0        30.2s ± 2%        25.6s ± 1%  -15.34%  (p=0.002 n=6+6)
Kubelet/c=4/l=4        37.3s ± 1%        30.9s ± 2%  -17.11%  (p=0.002 n=6+6)

Change-Id: Ie43eb3bbe1392cbb61c86792a17a57b33b9561f0
Reviewed-on: https://go-review.googlesource.com/106796
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-04-24 01:05:27 +00:00
Matthew Dempsky 03f546eb60 cmd/compile/internal/types: add Pkg and SetPkg methods to Type
The go/types API exposes what package objects were declared in, which
includes struct fields, interface methods, and function parameters.

The compiler implicitly tracks these for non-exported identifiers
(through the Sym's associated Pkg), but exported identifiers always
use localpkg. To simplify identifying this, add an explicit package
field to struct, interface, and function types.

Change-Id: I6adc5dc653e78f058714259845fb3077066eec82
Reviewed-on: https://go-review.googlesource.com/107622
Reviewed-by: Robert Griesemer <gri@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-24 00:48:29 +00:00
Josh Bleecher Snyder d292f77e95 cmd/compile: rewrite 2*x+c into LEAx1 on amd64
Rewrite x<<1+c into x+x+c, which can be expressed as a single LEAQ/LEAL.

Bit of a special case, but the single-instruction
LEA is both shorter and faster than SHL then ADD.

Triggers 293 times during make.bash.

Change-Id: I3f09c8e9a8f3859d1eeed336f095fc3ada79c2c1
Reviewed-on: https://go-review.googlesource.com/108938
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-23 22:40:10 +00:00
Josh Bleecher Snyder 22115859a5 cmd/compile: add amd64 LEAL{1,2,4,8} ops
For future use in rewrite rules.

Change-Id: Ic9875beb0dea6e0bbcbd4b75d99a53f4a9a7c3fd
Reviewed-on: https://go-review.googlesource.com/101275
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-04-23 21:42:28 +00:00
Matthew Dempsky 545ef11037 cmd/compile: remove toolstash workaround in bexport.go
Change-Id: Ie4facdcab4b35cf7d350c4b8fa06a3c5a0c6caeb
Reviewed-on: https://go-review.googlesource.com/108875
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-04-23 18:06:05 +00:00
Matthew Dempsky 7759b32a62 cmd/compile: replace Field.Nname.Pos with Field.Pos
For struct fields and methods, Field.Nname was only used to store
position information, which means we're allocating an entire ONAME
Node+Name+Param structure just for one field. We can optimize away
these ONAME allocations by instead adding a Field.Pos field.

Unfortunately, we can't get rid of Field.Nname, because it's needed
for function parameters, so Field grows a little bit and now has more
redundant information in those cases. However, that was already the
case (e.g., Field.Sym and Field.Nname.Sym), and it's still a net win
for allocations as demonstrated by the benchmarks below.

Additionally, by moving the ONAME allocation for function parameters
to funcargs, we can avoid allocating them for function parameters that
aren't used in corresponding function bodies (e.g., interface methods,
function-typed variables, and imported functions/methods without
inline bodies).

name       old time/op       new time/op       delta
Template         254ms ± 6%        251ms ± 6%  -1.04%  (p=0.000 n=487+488)
Unicode          128ms ± 7%        128ms ± 7%    ~     (p=0.294 n=482+467)
GoTypes          862ms ± 5%        860ms ± 4%    ~     (p=0.075 n=488+471)
Compiler         3.91s ± 4%        3.90s ± 4%  -0.39%  (p=0.000 n=468+473)

name       old user-time/op  new user-time/op  delta
Template         339ms ±14%        336ms ±14%  -1.02%  (p=0.001 n=498+494)
Unicode          176ms ±18%        176ms ±25%    ~     (p=0.940 n=491+499)
GoTypes          1.13s ± 8%        1.13s ± 9%    ~     (p=0.157 n=496+493)
Compiler         5.24s ± 6%        5.21s ± 6%  -0.57%  (p=0.000 n=485+489)

name       old alloc/op      new alloc/op      delta
Template        38.3MB ± 0%       37.3MB ± 0%  -2.58%  (p=0.000 n=499+497)
Unicode         29.1MB ± 0%       29.1MB ± 0%  -0.03%  (p=0.000 n=500+493)
GoTypes          116MB ± 0%        115MB ± 0%  -0.65%  (p=0.000 n=498+499)
Compiler         492MB ± 0%        487MB ± 0%  -1.00%  (p=0.000 n=497+498)

name       old allocs/op     new allocs/op     delta
Template          364k ± 0%         360k ± 0%  -1.15%  (p=0.000 n=499+499)
Unicode           336k ± 0%         336k ± 0%  -0.01%  (p=0.000 n=500+493)
GoTypes          1.16M ± 0%        1.16M ± 0%  -0.30%  (p=0.000 n=499+499)
Compiler         4.54M ± 0%        4.51M ± 0%  -0.58%  (p=0.000 n=494+495)

Passes toolstash-check -gcflags=-dwarf=false. Changes DWARF output
because position information is now tracked more precisely for
function parameters.

Change-Id: Ib8077d70d564cc448c5e4290baceab3a4396d712
Reviewed-on: https://go-review.googlesource.com/108217
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-04-23 18:05:57 +00:00
Austin Clements bdb65da049 cmd/compile: don't compact liveness maps in place
Currently Liveness.compact rewrites the Liveness.livevars slice in
place. However, we're about to add register maps, which we'll want to
track in livevars, but compact independently from the stack maps.
Hence, this CL modifies Liveness.compact to consume Liveness.livevars
and produce a new slice of deduplicated stack maps. This is somewhat
clearer anyway because it avoids potential confusion over how
Liveness.livevars is indexed.

Passes toolstash -cmp.

For #24543.

Change-Id: I7093fbc71143f8a29e677aa30c96e501f953ca2b
Reviewed-on: https://go-review.googlesource.com/108498
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-04-23 16:32:16 +00:00
Josh Bleecher Snyder 566e3e074c cmd/compile: avoid runtime call during switch string(byteslice)
This triggers three times while building std,
once in image/png and twice in go/internal/gccgoimporter.

There are no instances in std in which a more aggressive
optimization would have triggered.

This doesn't necessarily avoid an allocation,
because escape analysis is already able in many cases
to use a temporary backing for the string,
but it does at a minimum avoid the runtime call and copy.

Fixes #24937

Change-Id: I7019e85638ba8cd7e2f03890e672558b858579bc
Reviewed-on: https://go-review.googlesource.com/108035
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-04-21 00:50:50 +00:00
Matthew Dempsky dd71e3fef4 cmd/compile: refactor how declarations are imported
This CL moves all of the logic for wiring up imported declarations
into export.go, so that it can be reused by the indexed importer
code. While here, increase symmetry across routines.

Passes toolstash-check.

Change-Id: I1ccec5c3999522b010e4d04ed56b632fd4d712d9
Reviewed-on: https://go-review.googlesource.com/107621
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-04-20 21:19:17 +00:00
Austin Clements 568d6f9831 cmd/compile: teach Haspointer about TSSA and TTUPLE
These will appear when tracking live pointers in registers, so we need
to know whether they have pointers.

For #24543.

Change-Id: I2edccee39ca989473db4b3e7875ff166808ac141
Reviewed-on: https://go-review.googlesource.com/108497
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
2018-04-20 19:10:00 +00:00