11738 Commits

Author SHA1 Message Date
David Chase f7404974da cmd/compile: finish GOEXPERIMENT=preemptibleloops repair
A newish check for branch-likely on single-successor blocks
caught a case where the preemption-check inserter was
setting "likely" on an unconditional branch.

Fixed by checking for that case before setting likely.

Also removed an overconservative restriction on parallel
compilation for GOEXPERIMENT=preemptibleloops; it works
fine, it is just another control-flow transformation.

Change-Id: I8e786e6281e0631cac8d80cff67bfb6402b4d225
Reviewed-on: https://go-review.googlesource.com/102317
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-03-26 17:44:14 +00:00
Michael Munday 8afa8a3374 cmd/compile: use 32-bit comparisons where possible on s390x
We use 32-bit operations for 8- and 16-bit arithmetic, so use them
for comparisons too. This won't change performance but it is more
consistent and makes testing 8- and 16-bit comparison codegen
slightly more straightforward (for follow up CL).

Also fix a typo and add some additional double sign and zero
extension rules to remove the operations inserted by the comparison
rules.

Change-Id: I89ec1b0e09cb8be8090cf007be283ad88bba75a4
Reviewed-on: https://go-review.googlesource.com/102556
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-26 17:41:34 +00:00
Hana Kim 68a1c9c400 internal/trace: compute span stats as computing goroutine stats
Move part of UserSpan event processing from cmd/trace.analyzeAnnotations
to internal/trace.GoroutineStats that returns analyzed per-goroutine
execution information. Now the execution information includes a list of
spans and their execution information.

cmd/trace.analyzeAnnotations utilizes the span execution information
from internal/trace.GoroutineStats and connects them with task
information.

Change-Id: Ib7f79a3ba652a4ae55cd81ea17565bcc7e241c5c
Reviewed-on: https://go-review.googlesource.com/101917
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Peter Weinberger <pjw@google.com>
2018-03-26 16:59:01 +00:00
Ilya Tocar 24cd112086 cmd/compile/internal/ssa: optimize away double NEG on amd64
When lowering some ops on amd64 we generate additional NEGQ.
This may result in code like this:

NEGQ R12
NEGQ R12

Optimize it away. The gain is not significant: about 0.5% in geomean
in compress/flate and a 200-byte code size reduction in the go tool.
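
As an illustration only, the idea can be sketched as a peephole pass over a toy instruction list. The instr type and elideDoubleNeg function below are hypothetical stand-ins; this is not how the compiler represents or rewrites SSA values, and the toy ignores condition flags.

```go
package main

import "fmt"

// instr is a toy instruction: an opcode and the register it operates on.
// It is a hypothetical stand-in, not the compiler's SSA value type.
type instr struct {
	op  string
	reg string
}

// elideDoubleNeg removes adjacent pairs of NEGQ on the same register,
// since negating a value twice yields the original value.
func elideDoubleNeg(in []instr) []instr {
	out := make([]instr, 0, len(in))
	for _, i := range in {
		n := len(out)
		if n > 0 && i.op == "NEGQ" && out[n-1].op == "NEGQ" && out[n-1].reg == i.reg {
			out = out[:n-1] // cancel the pair
			continue
		}
		out = append(out, i)
	}
	return out
}

func main() {
	prog := []instr{
		{"MOVQ", "R12"},
		{"NEGQ", "R12"},
		{"NEGQ", "R12"},
		{"ADDQ", "R12"},
	}
	for _, i := range elideDoubleNeg(prog) {
		fmt.Println(i.op, i.reg)
	}
}
```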

Full results below:

name                             old time/op    new time/op    delta
Encode/Digits/Huffman/1e4-6        65.8µs ± 0%    65.7µs ± 0%  -0.21%  (p=0.010 n=10+9)
Encode/Digits/Huffman/1e5-6         633µs ± 0%     632µs ± 0%    ~     (p=0.370 n=8+9)
Encode/Digits/Huffman/1e6-6        6.30ms ± 1%    6.29ms ± 1%    ~     (p=0.796 n=10+10)
Encode/Digits/Speed/1e4-6           281µs ± 0%     280µs ± 1%  -0.34%  (p=0.043 n=8+10)
Encode/Digits/Speed/1e5-6          2.66ms ± 0%    2.66ms ± 0%  -0.09%  (p=0.043 n=10+10)
Encode/Digits/Speed/1e6-6          26.3ms ± 0%    26.3ms ± 0%    ~     (p=0.190 n=10+10)
Encode/Digits/Default/1e4-6         554µs ± 0%     557µs ± 0%  +0.46%  (p=0.001 n=9+10)
Encode/Digits/Default/1e5-6        8.63ms ± 1%    8.62ms ± 1%    ~     (p=0.912 n=10+10)
Encode/Digits/Default/1e6-6        92.7ms ± 1%    92.2ms ± 1%    ~     (p=0.052 n=10+10)
Encode/Digits/Compression/1e4-6     558µs ± 1%     557µs ± 1%    ~     (p=0.481 n=10+10)
Encode/Digits/Compression/1e5-6    8.58ms ± 0%    8.61ms ± 1%    ~     (p=0.315 n=8+10)
Encode/Digits/Compression/1e6-6    92.3ms ± 1%    92.4ms ± 1%    ~     (p=0.971 n=10+10)
Encode/Twain/Huffman/1e4-6         89.5µs ± 0%    89.0µs ± 1%  -0.48%  (p=0.001 n=9+9)
Encode/Twain/Huffman/1e5-6          727µs ± 1%     728µs ± 0%    ~     (p=0.604 n=10+9)
Encode/Twain/Huffman/1e6-6         7.21ms ± 0%    7.19ms ± 1%    ~     (p=0.696 n=8+10)
Encode/Twain/Speed/1e4-6            320µs ± 1%     321µs ± 1%    ~     (p=0.353 n=10+10)
Encode/Twain/Speed/1e5-6           2.63ms ± 0%    2.62ms ± 1%  -0.33%  (p=0.016 n=8+10)
Encode/Twain/Speed/1e6-6           25.8ms ± 0%    25.8ms ± 0%    ~     (p=0.360 n=10+8)
Encode/Twain/Default/1e4-6          677µs ± 1%     671µs ± 1%  -0.88%  (p=0.000 n=10+10)
Encode/Twain/Default/1e5-6         10.5ms ± 1%    10.3ms ± 0%  -2.06%  (p=0.000 n=10+10)
Encode/Twain/Default/1e6-6          113ms ± 1%     111ms ± 1%  -1.96%  (p=0.000 n=10+9)
Encode/Twain/Compression/1e4-6      688µs ± 0%     679µs ± 1%  -1.30%  (p=0.000 n=7+10)
Encode/Twain/Compression/1e5-6     11.6ms ± 1%    11.3ms ± 1%  -2.10%  (p=0.000 n=10+10)
Encode/Twain/Compression/1e6-6      126ms ± 1%     124ms ± 0%  -1.57%  (p=0.000 n=10+10)
[Geo mean]                         3.45ms         3.44ms       -0.46%

name                             old speed      new speed      delta
Encode/Digits/Huffman/1e4-6       152MB/s ± 0%   152MB/s ± 0%  +0.21%  (p=0.009 n=10+9)
Encode/Digits/Huffman/1e5-6       158MB/s ± 0%   158MB/s ± 0%    ~     (p=0.336 n=8+9)
Encode/Digits/Huffman/1e6-6       159MB/s ± 1%   159MB/s ± 1%    ~     (p=0.781 n=10+10)
Encode/Digits/Speed/1e4-6        35.6MB/s ± 0%  35.7MB/s ± 1%  +0.34%  (p=0.020 n=8+10)
Encode/Digits/Speed/1e5-6        37.6MB/s ± 0%  37.7MB/s ± 0%  +0.09%  (p=0.049 n=10+10)
Encode/Digits/Speed/1e6-6        38.0MB/s ± 0%  38.0MB/s ± 0%    ~     (p=0.146 n=10+10)
Encode/Digits/Default/1e4-6      18.0MB/s ± 0%  18.0MB/s ± 0%  -0.45%  (p=0.002 n=9+10)
Encode/Digits/Default/1e5-6      11.6MB/s ± 1%  11.6MB/s ± 1%    ~     (p=0.644 n=10+10)
Encode/Digits/Default/1e6-6      10.8MB/s ± 1%  10.8MB/s ± 1%  +0.51%  (p=0.044 n=10+10)
Encode/Digits/Compression/1e4-6  17.9MB/s ± 1%  17.9MB/s ± 1%    ~     (p=0.468 n=10+10)
Encode/Digits/Compression/1e5-6  11.7MB/s ± 0%  11.6MB/s ± 1%    ~     (p=0.322 n=8+10)
Encode/Digits/Compression/1e6-6  10.8MB/s ± 1%  10.8MB/s ± 1%    ~     (p=0.983 n=10+10)
Encode/Twain/Huffman/1e4-6        112MB/s ± 0%   112MB/s ± 1%  +0.42%  (p=0.002 n=8+9)
Encode/Twain/Huffman/1e5-6        138MB/s ± 1%   137MB/s ± 0%    ~     (p=0.616 n=10+9)
Encode/Twain/Huffman/1e6-6        139MB/s ± 0%   139MB/s ± 1%    ~     (p=0.652 n=8+10)
Encode/Twain/Speed/1e4-6         31.3MB/s ± 1%  31.2MB/s ± 1%    ~     (p=0.342 n=10+10)
Encode/Twain/Speed/1e5-6         38.0MB/s ± 0%  38.1MB/s ± 1%  +0.33%  (p=0.011 n=8+10)
Encode/Twain/Speed/1e6-6         38.8MB/s ± 0%  38.7MB/s ± 0%    ~     (p=0.325 n=10+8)
Encode/Twain/Default/1e4-6       14.8MB/s ± 1%  14.9MB/s ± 1%  +0.88%  (p=0.000 n=10+10)
Encode/Twain/Default/1e5-6       9.48MB/s ± 1%  9.68MB/s ± 0%  +2.11%  (p=0.000 n=10+10)
Encode/Twain/Default/1e6-6       8.86MB/s ± 1%  9.03MB/s ± 1%  +1.97%  (p=0.000 n=10+9)
Encode/Twain/Compression/1e4-6   14.5MB/s ± 0%  14.7MB/s ± 1%  +1.31%  (p=0.000 n=7+10)
Encode/Twain/Compression/1e5-6   8.63MB/s ± 1%  8.82MB/s ± 1%  +2.17%  (p=0.000 n=10+10)
Encode/Twain/Compression/1e6-6   7.92MB/s ± 1%  8.05MB/s ± 1%  +1.59%  (p=0.000 n=10+10)
[Geo mean]                       29.0MB/s       29.1MB/s       +0.47%

// symSizeComp `which go` go_old:

section differences:
global text (code) = 203 bytes (0.005131%)
read-only data = 1 bytes (0.000057%)
Total difference 204 bytes (0.003297%)

Change-Id: Ie2cdfa1216472d78694fff44d215b3b8e71cf7bf
Reviewed-on: https://go-review.googlesource.com/102277
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-26 16:19:53 +00:00
Hana (Hyang-Ah) Kim ea1f483240 cmd/trace: beautify goroutine page
- Summary: also includes links to pprof data.
- Sortable table: sorting is done on server-side. The intention is
  to add a pagination feature later and limit the page size the
  browser has to handle.
- Stacked horizontal bar graph to present total time breakdown.
- Human-friendly time representation.
- No dependency on external fancy javascript libraries to allow
  it to function without an internet connection.

Change-Id: I91e5c26746e59ad0329dfb61e096e11f768c7b73
Reviewed-on: https://go-review.googlesource.com/102156
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Bonventre <andybons@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-26 15:36:56 +00:00
Alberto Donizetti 2ba98f1ae9 cmd/compile: avoid some allocations in regalloc
Compilebench:
name      old time/op       new time/op       delta
Template        283ms ± 3%        281ms ± 4%    ~     (p=0.242 n=20+20)
Unicode         137ms ± 6%        135ms ± 6%    ~     (p=0.194 n=20+19)
GoTypes         890ms ± 2%        883ms ± 1%  -0.74%  (p=0.001 n=19+19)
Compiler        4.21s ± 2%        4.20s ± 2%  -0.40%  (p=0.033 n=20+19)
SSA             9.86s ± 2%        9.68s ± 1%  -1.80%  (p=0.000 n=20+19)
Flate           185ms ± 5%        185ms ± 7%    ~     (p=0.429 n=20+20)
GoParser        222ms ± 3%        222ms ± 4%    ~     (p=0.588 n=19+20)
Reflect         572ms ± 2%        570ms ± 3%    ~     (p=0.113 n=19+20)
Tar             263ms ± 4%        259ms ± 2%  -1.41%  (p=0.013 n=20+20)
XML             321ms ± 2%        321ms ± 4%    ~     (p=0.835 n=20+19)

name      old user-time/op  new user-time/op  delta
Template        400ms ± 5%        405ms ± 5%    ~     (p=0.096 n=20+20)
Unicode         217ms ± 8%        213ms ± 8%    ~     (p=0.242 n=20+20)
GoTypes         1.23s ± 3%        1.22s ± 3%    ~     (p=0.923 n=19+20)
Compiler        5.76s ± 6%        5.81s ± 2%    ~     (p=0.687 n=20+19)
SSA             14.2s ± 4%        14.0s ± 4%    ~     (p=0.121 n=20+20)
Flate           248ms ± 7%        251ms ±10%    ~     (p=0.369 n=20+20)
GoParser        308ms ± 5%        305ms ± 6%    ~     (p=0.336 n=19+20)
Reflect         771ms ± 2%        766ms ± 2%    ~     (p=0.113 n=20+19)
Tar             370ms ± 5%        362ms ± 7%  -2.06%  (p=0.036 n=19+20)
XML             435ms ± 4%        432ms ± 5%    ~     (p=0.369 n=20+20)

name      old alloc/op      new alloc/op      delta
Template       39.5MB ± 0%       39.4MB ± 0%  -0.20%  (p=0.000 n=20+20)
Unicode        29.1MB ± 0%       29.1MB ± 0%    ~     (p=0.064 n=20+20)
GoTypes         117MB ± 0%        117MB ± 0%  -0.17%  (p=0.000 n=20+20)
Compiler        503MB ± 0%        502MB ± 0%  -0.15%  (p=0.000 n=19+19)
SSA            1.42GB ± 0%       1.42GB ± 0%  -0.16%  (p=0.000 n=20+20)
Flate          25.3MB ± 0%       25.3MB ± 0%  -0.19%  (p=0.000 n=20+20)
GoParser       31.4MB ± 0%       31.3MB ± 0%  -0.14%  (p=0.000 n=20+18)
Reflect        78.1MB ± 0%       77.9MB ± 0%  -0.34%  (p=0.000 n=20+19)
Tar            40.1MB ± 0%       40.0MB ± 0%  -0.17%  (p=0.000 n=20+20)
XML            45.3MB ± 0%       45.2MB ± 0%  -0.13%  (p=0.000 n=20+20)

name      old allocs/op     new allocs/op     delta
Template         393k ± 0%         392k ± 0%  -0.21%  (p=0.000 n=20+19)
Unicode          337k ± 0%         337k ± 0%  -0.02%  (p=0.000 n=20+20)
GoTypes         1.22M ± 0%        1.22M ± 0%  -0.21%  (p=0.000 n=20+20)
Compiler        4.77M ± 0%        4.76M ± 0%  -0.16%  (p=0.000 n=20+20)
SSA             11.8M ± 0%        11.8M ± 0%  -0.12%  (p=0.000 n=20+20)
Flate            242k ± 0%         241k ± 0%  -0.20%  (p=0.000 n=20+20)
GoParser         324k ± 0%         324k ± 0%  -0.14%  (p=0.000 n=20+20)
Reflect          985k ± 0%         981k ± 0%  -0.38%  (p=0.000 n=20+20)
Tar              403k ± 0%         402k ± 0%  -0.19%  (p=0.000 n=20+20)
XML              424k ± 0%         424k ± 0%  -0.16%  (p=0.000 n=19+20)

Change-Id: I131e382b64cd6db11a9263a477d45d80c180c499
Reviewed-on: https://go-review.googlesource.com/102421
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-25 18:27:49 +00:00
Daniel Martí b1892d740e cmd/compile/internal/gc: various cleanups
Remove a couple of unnecessary var declarations, an unused sort.Sort
type, and simplify a range by using the two-name variant.

Change-Id: Ia251f634db0bfbe8b1d553b8659272ddbd13b2c3
Reviewed-on: https://go-review.googlesource.com/102336
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-24 19:44:18 +00:00
Daniel Nephin 5526ef1c51 cmd/test2json: document missing "skip" action
Change-Id: I906e61170279f0647598e2fd4fa931aac1b69288
GitHub-Last-Rev: f6df43e8e1
GitHub-Pull-Request: golang/go#24517
Reviewed-on: https://go-review.googlesource.com/102396
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-24 18:37:22 +00:00
Alberto Donizetti a27cd4fd31 test/codegen: port tbz/tbnz arm64 tests
And delete them from asm_test.

Change-Id: I34fcf85ae8ce09cd146fe4ce6a0ae7616bd97e2d
Reviewed-on: https://go-review.googlesource.com/102296
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-24 09:35:53 +00:00
Giovanni Bajo d54902ece9 cmd/compile: in prove, shortcircuit self-facts
Sometimes, we can end up calling update with a self-relation
about a variable (x REL x). In this case, there is no need
to record anything: the relation is unsatisfiable if and only
if it doesn't contain eq.
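
A minimal sketch of that rule, assuming relations are modelled as a bitmask of lt, eq and gt. The type and function names here are illustrative and are not taken from the prove pass source.

```go
package main

import "fmt"

// relation is a bitmask of the orderings a fact allows between two values.
// The names are illustrative, chosen to match the wording of the message.
type relation uint

const (
	lt relation = 1 << iota
	eq
	gt
)

// selfFactUnsat reports whether a fact of the form (x REL x) can never hold.
// A value always equals itself, so the fact is unsatisfiable exactly when
// the relation does not include eq.
func selfFactUnsat(r relation) bool {
	return r&eq == 0
}

func main() {
	fmt.Println("x <  x unsat:", selfFactUnsat(lt))
	fmt.Println("x <= x unsat:", selfFactUnsat(lt|eq))
	fmt.Println("x != x unsat:", selfFactUnsat(lt|gt))
}
```

Under this model nothing needs to be stored for a self-fact: the answer depends only on the relation bits.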

This also helps avoid an infinite loop in the next CL, which will
introduce the transitive closure of relations.

Passes toolstash -cmp.

Change-Id: Ic408452ec1c13653f22ada35466ec98bc14aaa8e
Reviewed-on: https://go-review.googlesource.com/100276
Reviewed-by: Austin Clements <austin@google.com>
2018-03-24 03:06:21 +00:00
Giovanni Bajo 385d936fb2 cmd/compile: in prove, fail fast when unsat is found
When an unsatisfiable relation is recorded in the facts table,
there is no need to compute further relations or update
additional data structures.

Since we're about to transitively propagate relations, make
sure to fail as fast as possible to avoid doing useless work
in dead branches.

Passes toolstash -cmp.

Change-Id: I23eed376d62776824c33088163c7ac9620abce85
Reviewed-on: https://go-review.googlesource.com/100275
Reviewed-by: Austin Clements <austin@google.com>
2018-03-24 03:06:01 +00:00
Giovanni Bajo 79112707bb cmd/compile: add patterns for bit set/clear/complement on amd64
This patch completes implementation of BT(Q|L), and adds support
for BT(S|R|C)(Q|L).

Example of code changes from time.(*Time).addSec:

        if t.wall&hasMonotonic != 0 {
  0x1073465               488b08                  MOVQ 0(AX), CX
  0x1073468               4889ca                  MOVQ CX, DX
  0x107346b               48c1e93f                SHRQ $0x3f, CX
  0x107346f               48c1e13f                SHLQ $0x3f, CX
  0x1073473               48f7c1ffffffff          TESTQ $-0x1, CX
  0x107347a               746b                    JE 0x10734e7

        if t.wall&hasMonotonic != 0 {
  0x1073435               488b08                  MOVQ 0(AX), CX
  0x1073438               480fbae13f              BTQ $0x3f, CX
  0x107343d               7363                    JAE 0x10734a2

Another example:

                        t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
  0x10734c8               4881e1ffffff3f          ANDQ $0x3fffffff, CX
  0x10734cf               48c1e61e                SHLQ $0x1e, SI
  0x10734d3               4809ce                  ORQ CX, SI
  0x10734d6               48b90000000000000080    MOVQ $0x8000000000000000, CX
  0x10734e0               4809f1                  ORQ SI, CX
  0x10734e3               488908                  MOVQ CX, 0(AX)

                        t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
  0x107348b               4881e2ffffff3f          ANDQ $0x3fffffff, DX
  0x1073492               48c1e61e                SHLQ $0x1e, SI
  0x1073496               4809f2                  ORQ SI, DX
  0x1073499               480fbaea3f              BTSQ $0x3f, DX
  0x107349e               488910                  MOVQ DX, 0(AX)
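
For reference, the source-level shapes that correspond to bit test, set, reset and complement can be sketched as below. The helper names are hypothetical, and whether a particular compiler version emits BT, BTS, BTR or BTC for each of them is not guaranteed by this sketch.

```go
package main

import "fmt"

// testBit reports whether bit n of x is set (the shape BT targets).
func testBit(x uint64, n uint) bool { return x&(1<<n) != 0 }

// setBit returns x with bit n set (the shape BTS targets).
func setBit(x uint64, n uint) uint64 { return x | 1<<n }

// clearBit returns x with bit n cleared (the shape BTR targets).
func clearBit(x uint64, n uint) uint64 { return x &^ (1 << n) }

// flipBit returns x with bit n inverted (the shape BTC targets).
func flipBit(x uint64, n uint) uint64 { return x ^ 1<<n }

func main() {
	var x uint64
	x = setBit(x, 63)
	fmt.Printf("%#x %v\n", x, testBit(x, 63))
	x = flipBit(x, 0)
	fmt.Printf("%#x\n", x)
	x = clearBit(x, 63)
	fmt.Printf("%#x %v\n", x, testBit(x, 63))
}
```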

Go1 benchmarks seem unaffected, and I would be surprised
otherwise:

name                     old time/op    new time/op     delta
BinaryTree17-4              2.64s ± 4%      2.56s ± 9%  -2.92%  (p=0.008 n=9+9)
Fannkuch11-4                2.90s ± 1%      2.95s ± 3%  +1.76%  (p=0.010 n=10+9)
FmtFprintfEmpty-4          35.3ns ± 1%     34.5ns ± 2%  -2.34%  (p=0.004 n=9+8)
FmtFprintfString-4         57.0ns ± 1%     58.4ns ± 5%  +2.52%  (p=0.029 n=9+10)
FmtFprintfInt-4            59.8ns ± 3%     59.8ns ± 6%    ~     (p=0.565 n=10+10)
FmtFprintfIntInt-4         93.9ns ± 3%     91.2ns ± 5%  -2.94%  (p=0.014 n=10+9)
FmtFprintfPrefixedInt-4     107ns ± 6%      104ns ± 6%    ~     (p=0.099 n=10+10)
FmtFprintfFloat-4           187ns ± 3%      188ns ± 3%    ~     (p=0.505 n=10+9)
FmtManyArgs-4               410ns ± 1%      415ns ± 6%    ~     (p=0.649 n=8+10)
GobDecode-4                5.30ms ± 3%     5.27ms ± 3%    ~     (p=0.436 n=10+10)
GobEncode-4                4.62ms ± 5%     4.47ms ± 2%  -3.24%  (p=0.001 n=9+10)
Gzip-4                      197ms ± 4%      193ms ± 3%    ~     (p=0.123 n=10+10)
Gunzip-4                   30.4ms ± 3%     30.1ms ± 3%    ~     (p=0.481 n=10+10)
HTTPClientServer-4         76.3µs ± 1%     76.0µs ± 1%    ~     (p=0.236 n=8+9)
JSONEncode-4               10.5ms ± 9%     10.3ms ± 3%    ~     (p=0.280 n=10+10)
JSONDecode-4               42.3ms ±10%     41.3ms ± 2%    ~     (p=0.053 n=9+10)
Mandelbrot200-4            3.80ms ± 2%     3.72ms ± 2%  -2.15%  (p=0.001 n=9+10)
GoParse-4                  2.88ms ±10%     2.81ms ± 2%    ~     (p=0.247 n=10+10)
RegexpMatchEasy0_32-4      69.5ns ± 4%     68.6ns ± 2%    ~     (p=0.171 n=10+10)
RegexpMatchEasy0_1K-4       165ns ± 3%      162ns ± 3%    ~     (p=0.137 n=10+10)
RegexpMatchEasy1_32-4      65.7ns ± 6%     64.4ns ± 2%  -2.02%  (p=0.037 n=10+10)
RegexpMatchEasy1_1K-4       278ns ± 2%      279ns ± 3%    ~     (p=0.991 n=8+9)
RegexpMatchMedium_32-4     99.3ns ± 3%     98.5ns ± 4%    ~     (p=0.457 n=10+9)
RegexpMatchMedium_1K-4     30.1µs ± 1%     30.4µs ± 2%    ~     (p=0.173 n=8+10)
RegexpMatchHard_32-4       1.40µs ± 2%     1.41µs ± 4%    ~     (p=0.565 n=10+10)
RegexpMatchHard_1K-4       42.5µs ± 1%     41.5µs ± 3%  -2.13%  (p=0.002 n=8+9)
Revcomp-4                   332ms ± 4%      328ms ± 5%    ~     (p=0.720 n=9+10)
Template-4                 48.3ms ± 2%     49.6ms ± 3%  +2.56%  (p=0.002 n=8+10)
TimeParse-4                 252ns ± 2%      249ns ± 3%    ~     (p=0.116 n=9+10)
TimeFormat-4                262ns ± 4%      252ns ± 3%  -4.01%  (p=0.000 n=9+10)

name                     old speed      new speed       delta
GobDecode-4               145MB/s ± 3%    146MB/s ± 3%    ~     (p=0.436 n=10+10)
GobEncode-4               166MB/s ± 5%    172MB/s ± 2%  +3.28%  (p=0.001 n=9+10)
Gzip-4                   98.6MB/s ± 4%  100.4MB/s ± 3%    ~     (p=0.123 n=10+10)
Gunzip-4                  639MB/s ± 3%    645MB/s ± 3%    ~     (p=0.481 n=10+10)
JSONEncode-4              185MB/s ± 8%    189MB/s ± 3%    ~     (p=0.280 n=10+10)
JSONDecode-4             46.0MB/s ± 9%   47.0MB/s ± 2%  +2.21%  (p=0.046 n=9+10)
GoParse-4                20.1MB/s ± 9%   20.6MB/s ± 2%    ~     (p=0.239 n=10+10)
RegexpMatchEasy0_32-4     460MB/s ± 4%    467MB/s ± 2%    ~     (p=0.165 n=10+10)
RegexpMatchEasy0_1K-4    6.19GB/s ± 3%   6.28GB/s ± 3%    ~     (p=0.165 n=10+10)
RegexpMatchEasy1_32-4     487MB/s ± 5%    497MB/s ± 2%  +2.00%  (p=0.043 n=10+10)
RegexpMatchEasy1_1K-4    3.67GB/s ± 2%   3.67GB/s ± 3%    ~     (p=0.963 n=8+9)
RegexpMatchMedium_32-4   10.1MB/s ± 3%   10.1MB/s ± 4%    ~     (p=0.435 n=10+9)
RegexpMatchMedium_1K-4   34.0MB/s ± 1%   33.7MB/s ± 2%    ~     (p=0.173 n=8+10)
RegexpMatchHard_32-4     22.9MB/s ± 2%   22.7MB/s ± 4%    ~     (p=0.565 n=10+10)
RegexpMatchHard_1K-4     24.0MB/s ± 3%   24.7MB/s ± 3%  +2.64%  (p=0.001 n=9+9)
Revcomp-4                 766MB/s ± 4%    775MB/s ± 5%    ~     (p=0.720 n=9+10)
Template-4               40.2MB/s ± 2%   39.2MB/s ± 3%  -2.47%  (p=0.002 n=8+10)

The rules match ~1800 times during all.bash.

Fixes #18943

Change-Id: I64be1ada34e89c486dfd935bf429b35652117ed4
Reviewed-on: https://go-review.googlesource.com/94766
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-24 02:38:50 +00:00
isharipo 3afd2d7fc8 cmd/compile/internal/gc: properly initialize ssa.Func Type field
ssa.Func has a Type field that is described as the
function signature type.

It is never assigned and remains nil, so the signature
is printed as "<T>".

Given this function declaration:
	func foo(x int, f func() string) (int, error)

GOSSAFUNC printed it as below:
	compiling foo
	foo <T>

After this change:
	compiling foo
	foo func(int, func() string) (int, error)

Change-Id: Iec5eec8aac5c76ff184659e30f41b2f5fe86d329
Reviewed-on: https://go-review.googlesource.com/102375
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-03-24 02:18:52 +00:00
Matthew Dempsky ea668e18a6 cmd/compile: always write pack files
By always writing out pack files, the object file format can be
simplified somewhat. In particular, the export data format will no
longer require escaping, because the pack file provides appropriate
framing.

This CL does not affect build systems that use -pack, which includes
all major Go build systems (cmd/go, gb, bazel).

Also, existing package import logic already distinguishes pack/object
files based on file contents rather than file extension.

The only exception is cmd/pack, which specially handled object files
created by cmd/compile when used with the 'c' mode. This mode is
extended to now recognize the pack files produced by cmd/compile and
handle them as before.

Passes toolstash-check.

Updates #21705.
Updates #24512.

Change-Id: Idf131013bfebd73a5cde7e087eb19964503a9422
Reviewed-on: https://go-review.googlesource.com/102236
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-24 00:51:24 +00:00
Matthew Dempsky 699b0d4e52 cmd/link: skip __.PKGDEF in archives
The __.PKGDEF file is a compiler object file only intended for other
compilers. Also, for build systems that use -linkobj, all of the
information it contains is present within the linker object files
already, so look for it there instead.

This requires a little bit of code reorganization. Significantly,
previously when loading an archive file, the __.PKGDEF file was
authoritative on whether the package was "main" and/or "safe". Now
that we're using the Go object files instead, there's the issue that
there can be multiple Go object files in an archive (because when
using assembly, each assembly file becomes its own additional object
file).

The solution taken here is to check if any object file within the
package declares itself as "main" and/or "safe".
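
A rough sketch of that check, using a hypothetical objInfo record per object file. The real linker data structures are not shown in this message and are not reproduced here.

```go
package main

import "fmt"

// objInfo is a hypothetical summary of one Go object file in an archive.
type objInfo struct {
	name string
	main bool // the object declares itself as "main"
	safe bool // the object declares itself as "safe"
}

// pkgFlags reports whether any object file in the archive declares
// itself as "main" and/or "safe".
func pkgFlags(objs []objInfo) (isMain, isSafe bool) {
	for _, o := range objs {
		isMain = isMain || o.main
		isSafe = isSafe || o.safe
	}
	return isMain, isSafe
}

func main() {
	objs := []objInfo{
		{name: "_go_.o", main: true},
		{name: "asm_amd64.o"},
	}
	fmt.Println(pkgFlags(objs))
}
```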

Updates #24512.

Change-Id: I70243a293bdf34b8555c0bf1833f8933b2809449
Reviewed-on: https://go-review.googlesource.com/102281
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-24 00:17:33 +00:00
Matthew Dempsky 946af1b658 cmd/link: make sure we're hashing __.PKGDEF in genhash
This is currently always the case because loadobjfile complains if
it's not, but that will be changed soon.

Updates #24512.

Change-Id: I262daca765932a0f4cea3fcc1cc80ca90de07a59
Reviewed-on: https://go-review.googlesource.com/102280
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-23 22:28:03 +00:00
Hyang-Ah Hana Kim 58734039bd Revert "cmd/vendor/.../pprof: refresh from upstream@a74ae6f"
This reverts commit c6e69ec7f9.

Reason for revert: Broke builders. #24508

Change-Id: I66abff0dd14ec6e1f8d8d982ccfb0438633b639d
Reviewed-on: https://go-review.googlesource.com/102316
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2018-03-23 15:09:04 +00:00
Hana (Hyang-Ah) Kim c6e69ec7f9 cmd/vendor/.../pprof: refresh from upstream@a74ae6f
Merges updates listed in
0e0e5b725...a74ae6f

Update #24443

cmd/vendor/vendor.json was updated manually.

Change-Id: I15d5fe82ac18263d4d54f5773cee0e197e93dd59
Reviewed-on: https://go-review.googlesource.com/101736
Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
2018-03-23 14:36:29 +00:00
Matthew Dempsky 50921bfa2e cmd/compile: change unsafeUintptrTag from var to const
Change-Id: Ie30878199e24cce5b75428e6b602c017ebd16642
Reviewed-on: https://go-review.googlesource.com/102175
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2018-03-22 19:38:06 +00:00
Daniel Martí 02798ed936 cmd/compile: use more range fors in gc
Slightly simplifies the code. Made sure to exclude the cases that would
change behavior, such as when the iterated value is a string, when the
index is modified within the body, or when the slice is modified.

Also checked that all the elements are of pointer type, to avoid the
corner case where non-pointer types could be copied by mistake.
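
The copying corner case can be seen in a small standalone program. The point type and the two helpers are made up for illustration and do not come from the gc package.

```go
package main

import "fmt"

type point struct{ x int }

// scaleValues ranges over a slice of values; v is a copy of each element,
// so the writes do not reach the slice.
func scaleValues(ps []point) {
	for _, v := range ps {
		v.x *= 10
	}
}

// scalePointers ranges over a slice of pointers; p is a copy of the pointer,
// which still refers to the original element.
func scalePointers(ps []*point) {
	for _, p := range ps {
		p.x *= 10
	}
}

func main() {
	vals := []point{{1}, {2}}
	ptrs := []*point{{1}, {2}}
	scaleValues(vals)
	scalePointers(ptrs)
	fmt.Println(vals[0].x, ptrs[0].x)
}
```

With pointer elements, switching between an index loop and the two-name range form leaves the behavior unchanged, which is why the check on element type matters.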

Change-Id: Iea64feb2a9a6a4c94ada9ff3ace40ee173505849
Reviewed-on: https://go-review.googlesource.com/100557
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-03-22 18:38:19 +00:00
Austin Clements 48f990b4a5 cmd/compile: fix GOEXPERIMENT=preemptibleloops type-checking
This experiment has gone stale. It causes a type-checking failure
because the condition of the OIF produced by range loop lowering has
type "untyped bool". Fix this by typechecking the whole OIF statement,
not just its condition.

This doesn't quite fix the whole experiment, but it gets further.
Something about preemption point insertion is causing failures like
"internal compiler error: likeliness prediction 1 for block b10 with 1
successors" in cmd/compile/internal/gc.

Change-Id: I7d80d618d7c91c338bf5f2a8dc174d582a479df3
Reviewed-on: https://go-review.googlesource.com/102157
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-03-22 18:20:31 +00:00
Travis Bischel 4f7b774822 cmd/compile: specialize Move up to 79B on amd64
Move currently uses mov instructions directly up to 31 bytes and then
switches to duffcopy. Moving 31 bytes takes 4 instructions (two loads
and two stores), or 6 if !useSSE. Depending on the usage, duffcopy
takes five (one or two mov, two or three lea, one call).

This adds direct mov instructions for Moves of size 32, 48, and 64 with
SSE and only for size 32 without.
With useSSE:
- 32 is 4 instructions (byte +/- comparison below)
- 33 thru 48 is 6
- 49 thru 64 is 8

Without:
- 32 is 8
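
The counts listed above are consistent with one load and one store per chunk, with 16-byte chunks when SSE is available and 8-byte chunks otherwise. The helper below is a back-of-the-envelope sketch under that assumption; it is not the compiler's lowering rule and only claims to match the sizes listed.

```go
package main

import "fmt"

// moveInstrs estimates the number of mov instructions (loads plus stores)
// for a direct move of n bytes, assuming one load and one store per chunk.
// The chunk width is an assumption: 16 bytes with SSE, 8 bytes without.
func moveInstrs(n int, useSSE bool) int {
	width := 8
	if useSSE {
		width = 16
	}
	chunks := (n + width - 1) / width // round up to whole chunks
	return 2 * chunks
}

func main() {
	for _, n := range []int{32, 33, 48, 49, 64} {
		fmt.Printf("%dB with SSE: %d instructions\n", n, moveInstrs(n, true))
	}
	fmt.Printf("32B without SSE: %d instructions\n", moveInstrs(32, false))
}
```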

Note that the only platform with useSSE set to false is Plan 9. I have
built three projects with tip and with tip plus this patch, and each
project's binary size with the patch is equal to or smaller than before.

The basis of this change is that copying data with instructions directly
is nearly free, whereas calling into duffcopy adds a bit of overhead.
This is most noticeable in range statements where elements are 32+
bytes. For code with the following pattern:

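// Assumed context, not part of the original message: this lives in a
// _test.go file that imports "strconv" and "testing", and s32 is a
// 32-byte type, for example: type s32 [32]byte
func Benchmark32Range(b *testing.B) {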
        var f s32
        for _, count := range []int{10, 100, 1000, 10000} {
                name := strconv.Itoa(count)
                b.Run(name, func(b *testing.B) {
                        base := make([]s32, count)
                        for i := 0; i < b.N; i++ {
                                for _, v := range base {
                                        f = v
                                }
                        }
                })
        }
        _ = f
}

These are the resulting benchmarks:
benchmark                    old ns/op     new ns/op     delta
Benchmark16Range/10-4        19.1          19.1          +0.00%
Benchmark16Range/100-4       169           170           +0.59%
Benchmark16Range/1000-4      1684          1691          +0.42%
Benchmark16Range/10000-4     18147         18124         -0.13%
Benchmark31Range/10-4        141           142           +0.71%
Benchmark31Range/100-4       1407          1410          +0.21%
Benchmark31Range/1000-4      14070         14074         +0.03%
Benchmark31Range/10000-4     141781        141759        -0.02%
Benchmark32Range/10-4        71.4          32.2          -54.90%
Benchmark32Range/100-4       695           326           -53.09%
Benchmark32Range/1000-4      7166          3313          -53.77%
Benchmark32Range/10000-4     72571         35425         -51.19%
Benchmark64Range/10-4        87.8          64.9          -26.08%
Benchmark64Range/100-4       868           629           -27.53%
Benchmark64Range/1000-4      9355          6907          -26.17%
Benchmark64Range/10000-4     94463         70385         -25.49%
Benchmark79Range/10-4        177           152           -14.12%
Benchmark79Range/100-4       1769          1531          -13.45%
Benchmark79Range/1000-4      17893         15532         -13.20%
Benchmark79Range/10000-4     178947        155551        -13.07%
Benchmark80Range/10-4        99.6          99.7          +0.10%
Benchmark80Range/100-4       987           985           -0.20%
Benchmark80Range/1000-4      10573         10560         -0.12%
Benchmark80Range/10000-4     106792        106639        -0.14%

For runtime's BenchCopyFat* benchmarks:
name           old time/op  new time/op  delta
CopyFat8-4     0.40ns ± 0%  0.40ns ± 0%      ~     (all equal)
CopyFat12-4    0.40ns ± 0%  0.80ns ± 0%  +100.00%  (p=0.000 n=9+9)
CopyFat16-4    0.40ns ± 0%  0.80ns ± 0%  +100.00%  (p=0.000 n=10+8)
CopyFat24-4    0.80ns ± 0%  0.40ns ± 0%   -50.00%  (p=0.001 n=8+9)
CopyFat32-4    2.01ns ± 0%  0.40ns ± 0%   -80.10%  (p=0.000 n=8+8)
CopyFat64-4    2.87ns ± 0%  0.40ns ± 0%   -86.07%  (p=0.000 n=8+10)
CopyFat128-4   4.82ns ± 0%  4.82ns ± 0%      ~     (p=1.000 n=8+8)
CopyFat256-4   8.83ns ± 0%  8.83ns ± 0%      ~     (p=1.000 n=8+8)
CopyFat512-4   16.9ns ± 0%  16.9ns ± 0%      ~     (all equal)
CopyFat520-4   14.6ns ± 0%  14.6ns ± 1%      ~     (p=0.529 n=8+9)
CopyFat1024-4  32.9ns ± 0%  33.0ns ± 0%    +0.20%  (p=0.041 n=8+9)

Function calls do not benefit as much due to how they are compiled, but
other benchmarks I ran show that calling a function with 64-byte elements
is marginally improved.

The main downside with this change is that it may increase binary sizes
depending on the size of the copy, but this change also decreases
binaries for moves of 48 bytes or less.

For the following code:
package main

type size [32]byte

//go:noinline
func use(t size) {
}

//go:noinline
func get() size {
	var z size
	return z
}

func main() {
	var a size
	use(a)
}

Changing size around gives the following assembly leading up to the call
(the initialization and actual call are removed):

tip func call with 32B arg: 27B
    48 89 e7                 mov    %rsp,%rdi
    48 8d 74 24 20           lea    0x20(%rsp),%rsi
    48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
    48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
    e8 53 ab ff ff           callq  448964 <runtime.duffcopy+0x364>
    48 8b 6d 00              mov    0x0(%rbp),%rbp

modified: 19B (-8B)
    0f 10 44 24 20           movups 0x20(%rsp),%xmm0
    0f 11 04 24              movups %xmm0,(%rsp)
    0f 10 44 24 30           movups 0x30(%rsp),%xmm0
    0f 11 44 24 10           movups %xmm0,0x10(%rsp)
-
tip with 47B arg: 29B
    48 8d 7c 24 0f           lea    0xf(%rsp),%rdi
    48 8d 74 24 40           lea    0x40(%rsp),%rsi
    48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
    48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
    e8 43 ab ff ff           callq  448964 <runtime.duffcopy+0x364>
    48 8b 6d 00              mov    0x0(%rbp),%rbp

modified: 20B (-9B)
    0f 10 44 24 40           movups 0x40(%rsp),%xmm0
    0f 11 44 24 0f           movups %xmm0,0xf(%rsp)
    0f 10 44 24 50           movups 0x50(%rsp),%xmm0
    0f 11 44 24 1f           movups %xmm0,0x1f(%rsp)
-
tip with 64B arg: 27B
    48 89 e7                 mov    %rsp,%rdi
    48 8d 74 24 40           lea    0x40(%rsp),%rsi
    48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
    48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
    e8 1f ab ff ff           callq  448948 <runtime.duffcopy+0x348>
    48 8b 6d 00              mov    0x0(%rbp),%rbp

modified: 39B [+12B]
    0f 10 44 24 40           movups 0x40(%rsp),%xmm0
    0f 11 04 24              movups %xmm0,(%rsp)
    0f 10 44 24 50           movups 0x50(%rsp),%xmm0
    0f 11 44 24 10           movups %xmm0,0x10(%rsp)
    0f 10 44 24 60           movups 0x60(%rsp),%xmm0
    0f 11 44 24 20           movups %xmm0,0x20(%rsp)
    0f 10 44 24 70           movups 0x70(%rsp),%xmm0
    0f 11 44 24 30           movups %xmm0,0x30(%rsp)
-
tip with 79B arg: 29B
    48 8d 7c 24 0f           lea    0xf(%rsp),%rdi
    48 8d 74 24 60           lea    0x60(%rsp),%rsi
    48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
    48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
    e8 09 ab ff ff           callq  448948 <runtime.duffcopy+0x348>
    48 8b 6d 00              mov    0x0(%rbp),%rbp

modified: 46B [+17B]
    0f 10 44 24 60           movups 0x60(%rsp),%xmm0
    0f 11 44 24 0f           movups %xmm0,0xf(%rsp)
    0f 10 44 24 70           movups 0x70(%rsp),%xmm0
    0f 11 44 24 1f           movups %xmm0,0x1f(%rsp)
    0f 10 84 24 80 00 00     movups 0x80(%rsp),%xmm0
    00
    0f 11 44 24 2f           movups %xmm0,0x2f(%rsp)
    0f 10 84 24 90 00 00     movups 0x90(%rsp),%xmm0
    00
    0f 11 44 24 3f           movups %xmm0,0x3f(%rsp)

So, at best we save 9B, at worst we gain 17. I do not think that copying
around 65+B sized types is common enough to bloat program sizes. Using
bincmp on the go binary itself shows a zero byte difference; there are
gains and losses all over. One of the largest gains in binary size comes
from cmd/go/internal/cache.(*Cache).Get, which passes around a 64 byte
sized type -- this is one of the cases I would expect to be benefitted
by this change.

I think that this marginal improvement in struct copying for 64 byte
structs is worth it: most data structs / work items I use in my programs
are small, but few are smaller than 32 bytes: with one slice, the budget
is up. The 32 rule alone would allow another 16 bytes, the 48 and 64
rules allow another 32 and 48.

Change-Id: I19a8f9190d5d41825091f17f268f4763bfc12a62
Reviewed-on: https://go-review.googlesource.com/100718
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-22 18:17:37 +00:00
Alberto Donizetti fc6280d4b0 test/codegen: port direct comparisons with memory tests
And remove them from asm_test.

Change-Id: I1ca29b40546d6de06f20bfd550ed8ff87f495454
Reviewed-on: https://go-review.googlesource.com/102115
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-22 17:20:09 +00:00
Carlos Eduardo Seo 6633bb2aa7 cmd/compile/internal/ppc64, runtime internal/atomic, sync/atomic: implement faster atomics for ppc64x
This change implements faster atomics for ppc64x based on the ISA 2.07B,
Appendix B.2 recommendations, replacing SYNC/ISYNC by LWSYNC in some
cases.

Updates #21348

name                                           old time/op new time/op    delta
Cond1-16                                           955ns     856ns      -10.33%
Cond2-16                                          2.38µs    2.03µs      -14.59%
Cond4-16                                          5.90µs    5.44µs       -7.88%
Cond8-16                                          12.1µs    11.1µs       -8.42%
Cond16-16                                         27.0µs    25.1µs       -7.04%
Cond32-16                                         59.1µs    55.5µs       -6.14%
LoadMostlyHits/*sync_test.DeepCopyMap-16          22.1ns    24.1ns       +9.02%
LoadMostlyHits/*sync_test.RWMutexMap-16            252ns     249ns       -1.20%
LoadMostlyHits/*sync.Map-16                       16.2ns    16.3ns         ~
LoadMostlyMisses/*sync_test.DeepCopyMap-16        22.3ns    22.6ns         ~
LoadMostlyMisses/*sync_test.RWMutexMap-16          249ns     247ns       -0.51%
LoadMostlyMisses/*sync.Map-16                     12.7ns    12.7ns         ~
LoadOrStoreBalanced/*sync_test.RWMutexMap-16      1.27µs    1.17µs       -7.54%
LoadOrStoreBalanced/*sync.Map-16                  1.12µs    1.10µs       -2.35%
LoadOrStoreUnique/*sync_test.RWMutexMap-16        1.75µs    1.68µs       -3.84%
LoadOrStoreUnique/*sync.Map-16                    2.07µs    1.97µs       -5.13%
LoadOrStoreCollision/*sync_test.DeepCopyMap-16    15.8ns    15.9ns         ~
LoadOrStoreCollision/*sync_test.RWMutexMap-16      496ns     424ns      -14.48%
LoadOrStoreCollision/*sync.Map-16                 6.07ns    6.07ns         ~
Range/*sync_test.DeepCopyMap-16                   1.65µs    1.64µs         ~
Range/*sync_test.RWMutexMap-16                     278µs     288µs       +3.75%
Range/*sync.Map-16                                2.00µs    2.01µs         ~
AdversarialAlloc/*sync_test.DeepCopyMap-16        3.45µs    3.44µs         ~
AdversarialAlloc/*sync_test.RWMutexMap-16          226ns     227ns         ~
AdversarialAlloc/*sync.Map-16                     1.09µs    1.07µs       -2.36%
AdversarialDelete/*sync_test.DeepCopyMap-16        553ns     550ns       -0.57%
AdversarialDelete/*sync_test.RWMutexMap-16         273ns     274ns         ~
AdversarialDelete/*sync.Map-16                     247ns     249ns         ~
UncontendedSemaphore-16                           79.0ns    65.5ns      -17.11%
ContendedSemaphore-16                              112ns      97ns      -13.77%
MutexUncontended-16                               3.34ns    2.51ns      -24.69%
Mutex-16                                           266ns     191ns      -28.26%
MutexSlack-16                                      226ns     159ns      -29.55%
MutexWork-16                                       377ns     338ns      -10.14%
MutexWorkSlack-16                                  335ns     308ns       -8.20%
MutexNoSpin-16                                     196ns     184ns       -5.91%
MutexSpin-16                                       710ns     666ns       -6.21%
Once-16                                           1.29ns    1.29ns         ~
Pool-16                                           8.64ns    8.71ns         ~
PoolOverflow-16                                   1.60µs    1.44µs      -10.25%
SemaUncontended-16                                5.39ns    4.42ns      -17.96%
SemaSyntNonblock-16                                539ns     483ns      -10.42%
SemaSyntBlock-16                                   413ns     354ns      -14.20%
SemaWorkNonblock-16                                305ns     258ns      -15.36%
SemaWorkBlock-16                                   266ns     229ns      -14.06%
RWMutexUncontended-16                             12.9ns     9.7ns      -24.80%
RWMutexWrite100-16                                 203ns     147ns      -27.47%
RWMutexWrite10-16                                  177ns     119ns      -32.74%
RWMutexWorkWrite100-16                             435ns     403ns       -7.39%
RWMutexWorkWrite10-16                              642ns     611ns       -4.79%
WaitGroupUncontended-16                           4.67ns    3.70ns      -20.92%
WaitGroupAddDone-16                                402ns     355ns      -11.54%
WaitGroupAddDoneWork-16                            208ns     250ns      +20.09%
WaitGroupWait-16                                  1.21ns    1.21ns         ~
WaitGroupWaitWork-16                              5.91ns    5.87ns       -0.81%
WaitGroupActuallyWait-16                          92.2ns    85.8ns       -6.91%

Updates #21348

Change-Id: Ibb9b271d11b308264103829e176c6d9fe8f867d3
Reviewed-on: https://go-review.googlesource.com/95175
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-03-22 14:13:01 +00:00
Tim Wright 88129f0cb2 all: enable c-shared/c-archive support for freebsd/amd64
Fixes #14327
Much of the code is based on the linux/amd64 code that implements these
build modes, and code is shared where possible.

Change-Id: Ia510f2023768c0edbc863aebc585929ec593b332
Reviewed-on: https://go-review.googlesource.com/93875
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-21 21:56:20 +00:00
isharipo ff5cf43df5 runtime,sync/atomic: replace asm BYTEs with insts for x86
For each replacement, a test case is added to the new 386enc.s file,
with the exception of EMMS, SYSENTER, MFENCE and LFENCE, as they
are already covered in amd64enc.s (same on amd64 and 386).

The replacement became less obvious after go vet suggested changes.
Before:
	BYTE $0x0f; BYTE $0x7f; BYTE $0x44; BYTE $0x24; BYTE $0x08
Changed to MOVQ (this form is being tested):
	MOVQ M0, 8(SP)
Refactored to FP-relative access (go vet advice):
	MOVQ M0, val+4(FP)

Change-Id: I56b87cf3371b6ad81ad0cd9db2033aee407b5818
Reviewed-on: https://go-review.googlesource.com/101475
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-03-21 20:51:04 +00:00
Daniel Martí 77c3ef6f6f cmd/doc: use empty GOPATH when running the tests
Otherwise, a populated GOPATH might result in failures such as:

	$ go test
	[...] no buildable Go source files in [...]/gopherjs/compiler/natives/src/crypto/rand
	exit status 1

Move the initialization of the dirs walker out of the init func, so that
we can control its behavior in the tests.

Updates #24464.

Change-Id: I4b26a7d3d6809bdd8e9b6b0556d566e7855f80fe
Reviewed-on: https://go-review.googlesource.com/101836
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-21 13:43:22 +00:00
Alberto Donizetti 041c5d8348 cmd/trace: remove unused variable in tests
Unused variables in closures are currently not diagnosed by the
compiler (this is Issue #3059), while go/types catches them.

One unused variable in the cmd/trace tests is causing the go/types
test that typechecks the whole standard library to fail:

  FAIL: TestStdlib (8.05s)
    stdlib_test.go:223: cmd/trace/annotations_test.go:241:6: gcTime
    declared but not used
  FAIL

Remove it.

Updates #24464

Change-Id: I0f1b9db6ae1f0130616ee649bdbfdc91e38d2184
Reviewed-on: https://go-review.googlesource.com/101815
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2018-03-21 11:10:03 +00:00
Ilya Tocar 983dcf70ba cmd/compile/internal/ssa: update regalloc in loops
Currently we don't lift a spill out of a loop if the loop contains a call.
However, we often have code like this:

for .. {
    if hard_case {
	call()
    }
    // simple case, without call
}

So instead of checking for any call, check for an unavoidable call.
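
A runnable version of the loop shape described above, as a minimal sketch.
The names slowPath and sum are hypothetical and only illustrate a loop whose
call sits on a rare branch; the code says nothing about what the register
allocator actually does with it.

```go
package main

import "fmt"

// slowPath is a hypothetical stand-in for the call on the hard path.
// It is kept out of line so the loop body really contains a call.
//
//go:noinline
func slowPath(b byte) int { return int(b) * 2 }

// sum has a call only on the rare branch, so an iteration can complete
// without executing any call: the call is avoidable.
func sum(data []byte) int {
	total := 0
	for _, b := range data {
		if b == 0xff { // hard case
			total += slowPath(b)
			continue
		}
		total += int(b) // simple case, without call
	}
	return total
}

func main() {
	fmt.Println(sum([]byte{1, 2, 3, 0xff}))
}
```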
For #22698 cases I see:
mime/quotedprintable/Writer-6                   10.9µs ± 4%      9.2µs ± 3%   -15.02%  (p=0.000 n=8+8)
And:
compress/flate/Encode/Twain/Huffman/1e4-6       99.4µs ± 6%     90.9µs ± 0%    -8.57%  (p=0.000 n=8+8)
compress/flate/Encode/Twain/Huffman/1e5-6        760µs ± 1%      725µs ± 1%    -4.56%  (p=0.000 n=8+8)
compress/flate/Encode/Twain/Huffman/1e6-6       7.55ms ± 0%     7.24ms ± 0%    -4.07%  (p=0.000 n=8+7)

There are no significant changes on go1 benchmarks.
But for cases with runtime arch checks, where we call the generic version on old hardware,
there are respectable performance gains:
math/RoundToEven-6                             1.43ns ± 0%     1.25ns ± 0%   -12.59%  (p=0.001 n=7+7)
math/bits/OnesCount64-6                        1.60ns ± 1%     1.42ns ± 1%   -11.32%  (p=0.000 n=8+8)

Also, on some runtime benchmarks, loops have fewer loads and higher performance:
runtime/RuneIterate/range1/ASCII-6             15.6ns ± 1%     13.9ns ± 1%   -10.74%  (p=0.000 n=7+8)
runtime/ArrayEqual-6                           3.22ns ± 0%     2.86ns ± 2%   -11.06%  (p=0.000 n=7+8)

Fixes #22698
Updates #22234

Change-Id: I0ae2f19787d07a9026f064366dedbe601bf7257a
Reviewed-on: https://go-review.googlesource.com/84055
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-03-20 21:02:39 +00:00
Alberto Donizetti be371edd67 test/codegen: port comparisons tests to codegen
And delete them from asm_test.

Change-Id: I64c512bfef3b3da6db5c5d29277675dade28b8ab
Reviewed-on: https://go-review.googlesource.com/101595
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-20 19:38:06 +00:00
Than McIntosh f45c07e84a cmd/compile: fix regression in DWARF inlined routine variable tracking
Fix a bug in the code that generates the pre-inlined variable
declaration table used as raw material for emitting DWARF inline
routine records. The fix for issue 23704 altered the recipe for
assigning file/line/col to variables in one part of the compiler, but
didn't update a similar recipe in the code for variable tracking.
Added a new test that should catch problems of a similar nature.

Fixes #24460.

Change-Id: I255c036637f4151aa579c0e21d123fd413724d61
Reviewed-on: https://go-review.googlesource.com/101676
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-20 18:56:52 +00:00
Michael Munday ae10914e67 cmd/compile: mark LAA and LAAG as clobbering flags on s390x
The atomic add instructions modify the condition code and so need to
be marked as clobbering flags.

Fixes #24449.

Change-Id: Ic69c8d775fbdbfb2a56c5e0cfca7a49c0d7f6897
Reviewed-on: https://go-review.googlesource.com/101455
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-20 09:44:50 +00:00
Fangming.Fang 9c312245ac cmd/asm: fix bug about VMOV instruction (move a vector element to another) on ARM64
This change fixes an index error when encoding the VMOV instruction whose
pattern is vmov Vn.<T>[index], Vd.<T>[index]

Change-Id: I949166e6dfd63fb0a9365f183b6c50d452614f9d
Reviewed-on: https://go-review.googlesource.com/101335
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-20 03:45:04 +00:00
Fangming.Fang 7673e30503 cmd/asm: fix bug about VMOV instruction (move register to vector element) on ARM64
This change fixes an index error when encoding the VMOV instruction whose pattern is
VMOV Rn, V.<T>[index]. For example, VMOV R1, V1.S[1] was assembled as VMOV R1, V1.S[0].

Fixes #24400
Change-Id: I82b5edc8af4e06862bc4692b119697c6bb7dc3fb
Reviewed-on: https://go-review.googlesource.com/101297
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-20 03:43:37 +00:00
Vladimir Kuzmin c12b185a6e cmd/compile: avoid mapaccess at m[k]=append(m[k]..
Currently rvalue m[k] is transformed during walk into:

        tmp1 := *mapaccess(m, k)
        tmp2 := append(tmp1, ...)
        *mapassign(m, k) = tmp2

However, this is suboptimal, as we could instead produce just:

        tmp := mapassign(m, k)
        *tmp = append(*tmp, ...)

The optimization is possible only if, during Order, the compiler can tell
that m[k] is exactly the same on the left and right sides of the
assignment. It does not apply to:
1) m[f(k)] = append(m[f(k)], ...)
2) sink, m[k] = sink, append(m[k]...)
3) m[k] = append(..., m[k],...)
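
To make the source forms concrete, here is a small sketch. The helper names
(id, appendSame, appendViaFunc) are hypothetical. Both functions produce the
same result; only the first has the identical m[k] on both sides that the
optimization looks for, while the second matches case 1 in the list above.
Whether a given compiler version applies the rewrite cannot be observed from
this program's output.

```go
package main

import "fmt"

// id is a hypothetical helper standing in for f(k) in case 1.
func id(k int) int { return k }

// appendSame uses the exact form m[k] = append(m[k], ...): the same map
// and key expression appear on both sides of the assignment.
func appendSame(m map[int][]int, k, v int) {
	m[k] = append(m[k], v)
}

// appendViaFunc computes the key through a function call on each side,
// so the two index expressions are not trivially identical.
func appendViaFunc(m map[int][]int, k, v int) {
	m[id(k)] = append(m[id(k)], v)
}

func main() {
	a := map[int][]int{}
	b := map[int][]int{}
	for i := 0; i < 3; i++ {
		appendSame(a, 7, i)
		appendViaFunc(b, 7, i)
	}
	fmt.Println(a[7], b[7])
}
```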

Benchmark:
name                           old time/op    new time/op    delta
MapAppendAssign/Int32/256-8      33.5ns ± 3%    22.4ns ±10%  -33.24%  (p=0.000 n=16+18)
MapAppendAssign/Int32/65536-8    68.2ns ± 6%    48.5ns ±29%  -28.90%  (p=0.000 n=20+20)
MapAppendAssign/Int64/256-8      34.3ns ± 4%    23.3ns ± 5%  -32.23%  (p=0.000 n=17+18)
MapAppendAssign/Int64/65536-8    65.9ns ± 7%    61.2ns ±19%   -7.06%  (p=0.002 n=18+20)
MapAppendAssign/Str/256-8         116ns ±12%      79ns ±16%  -31.70%  (p=0.000 n=20+19)
MapAppendAssign/Str/65536-8       134ns ±15%     111ns ±45%  -16.95%  (p=0.000 n=19+20)

name                           old alloc/op   new alloc/op   delta
MapAppendAssign/Int32/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=19+18)
MapAppendAssign/Int32/65536-8     27.0B ± 0%     20.7B ±30%  -23.33%  (p=0.000 n=20+20)
MapAppendAssign/Int64/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=20+17)
MapAppendAssign/Int64/65536-8     27.0B ± 0%     27.0B ± 0%     ~     (all equal)
MapAppendAssign/Str/256-8         94.0B ± 0%     78.0B ± 0%  -17.02%  (p=0.000 n=20+16)
MapAppendAssign/Str/65536-8       54.0B ± 0%     54.0B ± 0%     ~     (all equal)

Fixes #24364
Updates #5147

Change-Id: Id257d052b75b9a445b4885dc571bf06ce6f6b409
Reviewed-on: https://go-review.googlesource.com/100838
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-20 01:47:07 +00:00
fanzha02 910c3a9dfc cmd/asm: add ARM64 assembler check for incorrect input
The current ARM64 assembler has no check for invalid values of the
shift amount or of the post-index immediate offset of LD1/ST1. This
patch adds the check.

This patch also fixes the printing of register number 31, which
should be printed as ZR instead of R31. Test cases are also added.

Change-Id: I476235f3ab3a3fc91fe89c5a3149a4d4529c05c7
Reviewed-on: https://go-review.googlesource.com/100255
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
2018-03-19 23:45:50 +00:00
Alberto Donizetti 5a4e09837c test/codegen: port maps test to codegen
And delete them from asm_test.

Change-Id: I3cf0934706a640136cb0f646509174f8c1bf3363
Reviewed-on: https://go-review.googlesource.com/101395
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-19 13:39:34 +00:00
Alberto Donizetti b61b1d2c57 test/codegen: port structs test to codegen
And delete them from asm_test.

Change-Id: Ia286239a3d8f3915f2ca25dbcb39f3354a4f8aea
Reviewed-on: https://go-review.googlesource.com/101138
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-18 16:53:53 +00:00
Daniel Martí 2767c4e285 cmd/go: remove some unused parameters
Change-Id: I441b3045e76afc1c561914926c14efc8a116c8a7
Reviewed-on: https://go-review.googlesource.com/101195
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-16 21:01:28 +00:00
David Chase b30bf958da cmd/compile: enable scopes unconditionally
This revives Alessandro Arzilli's CL to enable scopes
whenever any DWARF is emitted (with optimization or not),
adds a test that detects this change and shows that it
creates more truthful debugging output.

Reverted change to ssa/debug_test tests made when
scopes were disabled during dwarflocationlist development.

Also included are updates to the Delve test output (it
had fallen out of sync; creating test output for one
updates it for all) and minor naming changes in
ssa/debug_test.

Compile-time/space changes (relative to tip including dwarflocationlists):

benchstat -geomean after.log scopes.log
name        old time/op     new time/op     delta
Template        182ms ± 1%      182ms ± 1%    ~     (p=0.666 n=9+9)
Unicode        82.8ms ± 1%     86.6ms ±14%    ~     (p=0.211 n=9+10)
GoTypes         611ms ± 1%      616ms ± 2%  +0.97%  (p=0.001 n=10+9)
Compiler        2.95s ± 1%      2.95s ± 0%    ~     (p=0.573 n=10+8)
SSA             6.70s ± 1%      6.81s ± 1%  +1.68%  (p=0.000 n=9+10)
Flate           117ms ± 1%      118ms ± 1%  +0.60%  (p=0.036 n=9+8)
GoParser        145ms ± 1%      145ms ± 1%    ~     (p=1.000 n=9+9)
Reflect         398ms ± 1%      396ms ± 1%    ~     (p=0.053 n=9+10)
Tar             171ms ± 1%      171ms ± 1%    ~     (p=0.356 n=9+10)
XML             214ms ± 1%      214ms ± 1%    ~     (p=0.605 n=9+9)
StdCmd          12.4s ± 2%      12.4s ± 1%    ~     (p=1.000 n=9+9)
[Geo mean]      506ms           509ms       +0.71%

name        old user-ns/op  new user-ns/op  delta
Template         254M ± 4%       249M ± 6%    ~     (p=0.155 n=10+10)
Unicode          121M ±11%       124M ± 6%    ~     (p=0.516 n=10+10)
GoTypes          824M ± 2%       869M ± 5%  +5.49%  (p=0.001 n=8+10)
Compiler        4.01G ± 2%      4.02G ± 1%    ~     (p=0.561 n=9+9)
SSA             10.0G ± 2%      10.2G ± 2%  +2.29%  (p=0.000 n=9+10)
Flate            154M ± 7%       154M ± 7%    ~     (p=0.960 n=10+9)
GoParser         190M ± 7%       196M ± 6%    ~     (p=0.064 n=9+10)
Reflect          528M ± 2%       517M ± 3%  -1.97%  (p=0.025 n=10+10)
Tar              227M ± 5%       232M ± 3%    ~     (p=0.061 n=9+10)
XML              286M ± 4%       283M ± 4%    ~     (p=0.343 n=9+9)
[Geo mean]       502M            508M       +1.09%

name        old text-bytes  new text-bytes  delta
HelloSize        672k ± 0%       672k ± 0%  +0.01%  (p=0.000 n=10+10)
CmdGoSize       7.21M ± 0%      7.21M ± 0%  -0.00%  (p=0.000 n=10+10)
[Geo mean]      2.20M           2.20M       +0.00%

name        old data-bytes  new data-bytes  delta
HelloSize       9.88k ± 0%      9.88k ± 0%    ~     (all equal)
CmdGoSize        248k ± 0%       248k ± 0%    ~     (all equal)
[Geo mean]      49.5k           49.5k       +0.00%

name        old bss-bytes   new bss-bytes   delta
HelloSize        125k ± 0%       125k ± 0%    ~     (all equal)
CmdGoSize        144k ± 0%       144k ± 0%  -0.04%  (p=0.000 n=10+10)
[Geo mean]       135k            135k       -0.02%

name        old exe-bytes   new exe-bytes   delta
HelloSize       1.30M ± 0%      1.34M ± 0%  +3.15%  (p=0.000 n=10+10)
CmdGoSize       13.5M ± 0%      13.9M ± 0%  +2.70%  (p=0.000 n=10+10)
[Geo mean]      4.19M           4.31M       +2.92%

Change-Id: Id53b8d57bd00440142ccbd39b95710e14e083fb5
Reviewed-on: https://go-review.googlesource.com/101217
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-16 20:25:10 +00:00
Matthew Dempsky 86a338960d reflect: sort exported methods first
By moving exported methods to the front of method lists, filtering
down to only the exported methods just needs a count of how many
exported methods exist, which the compiler can statically
provide. This allows getting rid of the exported method cache.
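
A minimal sketch of the idea, with hypothetical names (isExported,
sortExportedFirst). Here the count is computed at run time for illustration;
in the change described above it is the compiler that provides the count
statically.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
	"unicode/utf8"
)

// isExported reports whether name starts with an upper-case letter.
func isExported(name string) bool {
	r, _ := utf8.DecodeRuneInString(name)
	return unicode.IsUpper(r)
}

// sortExportedFirst orders names so that exported ones come first
// (each group sorted by name) and returns the number of exported names.
func sortExportedFirst(names []string) int {
	sort.SliceStable(names, func(i, j int) bool {
		ei, ej := isExported(names[i]), isExported(names[j])
		if ei != ej {
			return ei // exported sorts before unexported
		}
		return names[i] < names[j]
	})
	n := 0
	for n < len(names) && isExported(names[n]) {
		n++
	}
	return n
}

func main() {
	methods := []string{"reset", "Write", "grow", "Len", "String"}
	n := sortExportedFirst(methods)
	// Filtering to exported methods is now just a prefix slice.
	fmt.Println(methods[:n])
}
```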

For #22075.

Change-Id: I8eeb274563a2940e1347c34d673f843ae2569064
Reviewed-on: https://go-review.googlesource.com/100846
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-15 21:56:08 +00:00
Matthew Dempsky 91bbe5388d cmd/compile: sort method sets earlier
By sorting method sets earlier, we can change the interface
satisfaction problem from taking O(NM) time to O(N+M). This is the
same algorithm already used by runtime and reflect for dynamic
interface satisfaction testing.
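
The linear-time check can be sketched as a merge-style scan over two sorted
name lists. This is an illustration only: the function name implements and
the use of plain strings for methods are assumptions, and real method
matching also compares signatures.

```go
package main

import (
	"fmt"
	"sort"
)

// implements reports whether every name in want also appears in have.
// Both slices must be sorted. The index into have only moves forward,
// so the scan takes O(len(have)+len(want)) steps.
func implements(have, want []string) bool {
	i := 0
	for _, w := range want {
		// Skip methods of the type that sort before the wanted one.
		for i < len(have) && have[i] < w {
			i++
		}
		if i == len(have) || have[i] != w {
			return false
		}
	}
	return true
}

func main() {
	typ := []string{"Write", "Close", "Read", "Seek"}
	iface := []string{"Read", "Close"}
	sort.Strings(typ)
	sort.Strings(iface)
	fmt.Println(implements(typ, iface))
}
```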

For #22075.

Change-Id: I3d889f0227f37704535739bbde11f5107b4eea17
Reviewed-on: https://go-review.googlesource.com/100845
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-03-15 21:53:01 +00:00
David Chase 1c24ffbf93 cmd/compile: turn on DWARF locations lists for ssa vars
This changes the default setting for -dwarflocationlists
from false to true, removes the flag from ssa/debug_test.go,
and updates runtime/runtime-gdb_test.go to match a change
in debugging output for composite variables.

Current benchmarks (perflock, -count 10)

benchstat -geomean before.log after.log
name        old time/op     new time/op     delta
Template        175ms ± 0%      182ms ± 1%   +3.68%  (p=0.000 n=8+9)
Unicode        82.0ms ± 2%     82.8ms ± 1%   +0.96%  (p=0.019 n=9+9)
GoTypes         590ms ± 1%      611ms ± 1%   +3.42%  (p=0.000 n=9+10)
Compiler        2.85s ± 0%      2.95s ± 1%   +3.60%  (p=0.000 n=9+10)
SSA             6.42s ± 1%      6.70s ± 1%   +4.31%  (p=0.000 n=10+9)
Flate           113ms ± 2%      117ms ± 1%   +3.11%  (p=0.000 n=10+9)
GoParser        140ms ± 1%      145ms ± 1%   +3.47%  (p=0.000 n=10+9)
Reflect         384ms ± 0%      398ms ± 1%   +3.56%  (p=0.000 n=8+9)
Tar             165ms ± 1%      171ms ± 1%   +3.33%  (p=0.000 n=9+9)
XML             207ms ± 2%      214ms ± 1%   +3.41%  (p=0.000 n=9+9)
StdCmd          11.8s ± 2%      12.4s ± 2%   +4.41%  (p=0.000 n=10+9)
[Geo mean]      489ms           506ms        +3.38%

name        old user-ns/op  new user-ns/op  delta
Template         247M ± 4%       254M ± 4%   +2.76%  (p=0.040 n=10+10)
Unicode          118M ±16%       121M ±11%     ~     (p=0.364 n=10+10)
GoTypes          805M ± 2%       824M ± 2%   +2.37%  (p=0.003 n=9+8)
Compiler        3.92G ± 2%      4.01G ± 2%   +2.20%  (p=0.001 n=9+9)
SSA             9.63G ± 4%     10.00G ± 2%   +3.81%  (p=0.000 n=10+9)
Flate            155M ±10%       154M ± 7%     ~     (p=0.718 n=9+10)
GoParser         184M ±11%       190M ± 7%     ~     (p=0.220 n=10+9)
Reflect          506M ± 4%       528M ± 2%   +4.27%  (p=0.000 n=10+10)
Tar              224M ± 4%       227M ± 5%     ~     (p=0.207 n=10+9)
XML              272M ± 7%       286M ± 4%   +5.23%  (p=0.010 n=10+9)
[Geo mean]       489M            502M        +2.76%

name        old text-bytes  new text-bytes  delta
HelloSize        672k ± 0%       672k ± 0%     ~     (all equal)
CmdGoSize       7.21M ± 0%      7.21M ± 0%     ~     (all equal)
[Geo mean]      2.20M           2.20M        +0.00%

name        old data-bytes  new data-bytes  delta
HelloSize       9.88k ± 0%      9.88k ± 0%     ~     (all equal)
CmdGoSize        248k ± 0%       248k ± 0%     ~     (all equal)
[Geo mean]      49.5k           49.5k        +0.00%

name        old bss-bytes   new bss-bytes   delta
HelloSize        125k ± 0%       125k ± 0%     ~     (all equal)
CmdGoSize        144k ± 0%       144k ± 0%     ~     (all equal)
[Geo mean]       135k            135k        +0.00%

name        old exe-bytes   new exe-bytes   delta
HelloSize       1.10M ± 0%      1.30M ± 0%  +17.82%  (p=0.000 n=10+10)
CmdGoSize       11.6M ± 0%      13.5M ± 0%  +16.90%  (p=0.000 n=10+10)
[Geo mean]      3.57M           4.19M       +17.36%

Change-Id: I250055813cadd25cebee8da1f9a7f995a6eae432
Reviewed-on: https://go-review.googlesource.com/100738
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-15 21:34:17 +00:00
Heschi Kreinick 1814a0595c cmd/trace: filter tasks by log text
Add a search box to the top of the user task views that only displays
tasks containing a particular log message.

Change-Id: I92f4aa113f930954e8811416901e37824f0eb884
Reviewed-on: https://go-review.googlesource.com/100843
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2018-03-15 19:59:22 +00:00
Alberto Donizetti cceee685be test/codegen: port floats tests to codegen
And delete them from asm_test.

Change-Id: Ibdaca3496eefc73c731b511ddb9636a1f3dff68c
Reviewed-on: https://go-review.googlesource.com/100915
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-15 18:05:59 +00:00
Keith Randall 9d4215311b runtime: identify special functions by flag instead of address
When there are plugins, there may not be a unique copy of runtime
functions like goexit, mcall, etc.  So identifying them by entry
address is problematic.  Instead, keep track of each special function
using a field in the symbol table.  That way, multiple copies of
the same runtime function will be treated identically.
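
As a rough illustration of the difference, the sketch below tags each
function record with a kind instead of comparing entry addresses. All names
here (funcKind, funcInfo, isGoexit) are hypothetical and much simpler than
the runtime's real symbol table.

```go
package main

import "fmt"

// funcKind is a hypothetical tag stored with each symbol-table entry.
type funcKind uint8

const (
	kindNormal funcKind = iota
	kindGoexit
	kindMcall
)

// funcInfo is a hypothetical, simplified symbol-table entry.
type funcInfo struct {
	name  string
	entry uintptr
	kind  funcKind
}

// isGoexit checks the tag rather than the entry address, so it gives
// the same answer for every copy of the function.
func isGoexit(f funcInfo) bool { return f.kind == kindGoexit }

func main() {
	// Two copies of the same runtime function at different addresses,
	// for example one in the host program and one in a plugin.
	host := funcInfo{name: "runtime.goexit", entry: 0x1000, kind: kindGoexit}
	plugin := funcInfo{name: "runtime.goexit", entry: 0x9000, kind: kindGoexit}
	fmt.Println(host.entry == plugin.entry, isGoexit(host), isGoexit(plugin))
}
```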

Fixes #24351
Fixes #23133

Change-Id: Iea3232df8a6af68509769d9ca618f530cc0f84fd
Reviewed-on: https://go-review.googlesource.com/100739
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-15 17:31:57 +00:00
Daniel Martí cd2cb6e3f5 cmd/compile: cache sparse maps across ssa passes
This is done for sparse sets already, but it was missing for sparse
maps. Only affects deadstore and regalloc, as they're the only ones that
use sparse maps.
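
A minimal sketch of the caching pattern, assuming a simple free list. The
types sparseMap and mapCache and their methods are hypothetical and are not
the compiler's own code; they only show why reusing a cleared sparse map
avoids a fresh allocation in each pass.

```go
package main

import "fmt"

// entry is one key/value pair stored in the dense array.
type entry struct {
	key, val int
}

// sparseMap maps small non-negative keys to ints and can be cleared in O(1).
type sparseMap struct {
	dense  []entry
	sparse []int // sparse[key] is the index of key's entry in dense
}

func newSparseMap(n int) *sparseMap {
	return &sparseMap{sparse: make([]int, n)}
}

func (s *sparseMap) contains(k int) bool {
	i := s.sparse[k]
	return i < len(s.dense) && s.dense[i].key == k
}

func (s *sparseMap) set(k, v int) {
	if s.contains(k) {
		s.dense[s.sparse[k]].val = v
		return
	}
	s.sparse[k] = len(s.dense)
	s.dense = append(s.dense, entry{key: k, val: v})
}

func (s *sparseMap) get(k int) (int, bool) {
	if !s.contains(k) {
		return 0, false
	}
	return s.dense[s.sparse[k]].val, true
}

// clear drops all entries; stale values left in sparse are harmless
// because contains cross-checks them against dense.
func (s *sparseMap) clear() { s.dense = s.dense[:0] }

// mapCache keeps cleared maps so later passes can reuse their storage.
type mapCache struct {
	free []*sparseMap
}

// get returns a cached map with room for keys in [0, n), or a new one.
func (c *mapCache) get(n int) *sparseMap {
	for i, m := range c.free {
		if len(m.sparse) >= n {
			last := len(c.free) - 1
			c.free[i] = c.free[last]
			c.free = c.free[:last]
			return m
		}
	}
	return newSparseMap(n)
}

// put clears m and makes it available for reuse.
func (c *mapCache) put(m *sparseMap) {
	m.clear()
	c.free = append(c.free, m)
}

func main() {
	var c mapCache
	m := c.get(16)
	m.set(3, 30)
	c.put(m)
	m2 := c.get(8) // reuses m's storage instead of allocating
	_, ok := m2.get(3)
	fmt.Println(m == m2, ok)
}
```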

name                 old time/op    new time/op    delta
DSEPass-4               247µs ± 0%     216µs ± 0%  -12.75%  (p=0.008 n=5+5)
DSEPassBlock-4         3.05ms ± 1%    2.87ms ± 1%   -6.02%  (p=0.002 n=6+6)
CSEPass-4              2.30ms ± 0%    2.32ms ± 0%   +0.53%  (p=0.026 n=6+6)
CSEPassBlock-4         23.8ms ± 0%    23.8ms ± 0%     ~     (p=0.931 n=6+5)
DeadcodePass-4         51.7µs ± 1%    51.5µs ± 2%     ~     (p=0.429 n=5+6)
DeadcodePassBlock-4     734µs ± 1%     742µs ± 3%     ~     (p=0.394 n=6+6)
MultiPass-4             152µs ± 0%     149µs ± 2%     ~     (p=0.082 n=5+6)
MultiPassBlock-4       2.67ms ± 1%    2.41ms ± 2%   -9.77%  (p=0.008 n=5+5)

name                 old alloc/op   new alloc/op   delta
DSEPass-4              41.2kB ± 0%     0.1kB ± 0%  -99.68%  (p=0.002 n=6+6)
DSEPassBlock-4          560kB ± 0%       4kB ± 0%  -99.34%  (p=0.026 n=5+6)
CSEPass-4               189kB ± 0%     189kB ± 0%     ~     (all equal)
CSEPassBlock-4         3.10MB ± 0%    3.10MB ± 0%     ~     (p=0.444 n=5+5)
DeadcodePass-4         10.5kB ± 0%    10.5kB ± 0%     ~     (all equal)
DeadcodePassBlock-4     164kB ± 0%     164kB ± 0%     ~     (all equal)
MultiPass-4             240kB ± 0%     199kB ± 0%  -17.06%  (p=0.002 n=6+6)
MultiPassBlock-4       3.60MB ± 0%    2.99MB ± 0%  -17.06%  (p=0.002 n=6+6)

name                 old allocs/op  new allocs/op  delta
DSEPass-4                8.00 ± 0%      4.00 ± 0%  -50.00%  (p=0.002 n=6+6)
DSEPassBlock-4            240 ± 0%       120 ± 0%  -50.00%  (p=0.002 n=6+6)
CSEPass-4                9.00 ± 0%      9.00 ± 0%     ~     (all equal)
CSEPassBlock-4          1.35k ± 0%     1.35k ± 0%     ~     (all equal)
DeadcodePass-4           3.00 ± 0%      3.00 ± 0%     ~     (all equal)
DeadcodePassBlock-4      9.00 ± 0%      9.00 ± 0%     ~     (all equal)
MultiPass-4              11.0 ± 0%      10.0 ± 0%   -9.09%  (p=0.002 n=6+6)
MultiPassBlock-4          165 ± 0%       150 ± 0%   -9.09%  (p=0.002 n=6+6)

Change-Id: I43860687c88f33605eb1415f36473c5cfe8fde4a
Reviewed-on: https://go-review.googlesource.com/98449
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-03-15 17:24:39 +00:00
Giovanni Bajo a35ec9a59e cmd/compile: implement CMOV on amd64
This builds upon the branchelim pass, activating it for amd64 and
lowering CondSelect. Special care is taken with FPU instructions for
NaN handling.
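
For illustration, the function below is the kind of small if/else diamond
that a branch-elimination pass can turn into a conditional select. The name
minInt is hypothetical, and whether a particular compiler version emits CMOV
for it is an assumption; the generated assembly can be inspected with
go build -gcflags=-S.

```go
package main

import "fmt"

// minInt picks one of two values with a short, side-effect-free branch,
// a candidate shape for a conditional move instead of a jump.
func minInt(a, b int) int {
	r := a
	if b < a {
		r = b
	}
	return r
}

func main() {
	fmt.Println(minInt(3, 5), minInt(5, 3))
}
```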

Benchmark results on Xeon E5630 (Westmere EP):

name                      old time/op    new time/op    delta
BinaryTree17-16              4.99s ± 9%     4.66s ± 2%     ~     (p=0.095 n=5+5)
Fannkuch11-16                4.93s ± 3%     5.04s ± 2%     ~     (p=0.548 n=5+5)
FmtFprintfEmpty-16          58.8ns ± 7%    61.4ns ±14%     ~     (p=0.579 n=5+5)
FmtFprintfString-16          114ns ± 2%     114ns ± 4%     ~     (p=0.603 n=5+5)
FmtFprintfInt-16             181ns ± 4%     125ns ± 3%  -30.90%  (p=0.008 n=5+5)
FmtFprintfIntInt-16          263ns ± 2%     217ns ± 2%  -17.34%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt-16     230ns ± 1%     212ns ± 1%   -7.99%  (p=0.008 n=5+5)
FmtFprintfFloat-16           411ns ± 3%     344ns ± 5%  -16.43%  (p=0.008 n=5+5)
FmtManyArgs-16               828ns ± 4%     790ns ± 2%   -4.59%  (p=0.032 n=5+5)
GobDecode-16                10.9ms ± 4%    10.8ms ± 5%     ~     (p=0.548 n=5+5)
GobEncode-16                9.52ms ± 5%    9.46ms ± 2%     ~     (p=1.000 n=5+5)
Gzip-16                      334ms ± 2%     337ms ± 2%     ~     (p=0.548 n=5+5)
Gunzip-16                   64.4ms ± 1%    65.0ms ± 1%   +1.00%  (p=0.008 n=5+5)
HTTPClientServer-16          156µs ± 3%     155µs ± 3%     ~     (p=0.690 n=5+5)
JSONEncode-16               21.0ms ± 1%    21.8ms ± 0%   +3.76%  (p=0.016 n=5+4)
JSONDecode-16               95.1ms ± 0%    95.7ms ± 1%     ~     (p=0.151 n=5+5)
Mandelbrot200-16            6.38ms ± 1%    6.42ms ± 1%     ~     (p=0.095 n=5+5)
GoParse-16                  5.47ms ± 2%    5.36ms ± 1%   -1.95%  (p=0.016 n=5+5)
RegexpMatchEasy0_32-16       111ns ± 1%     111ns ± 1%     ~     (p=0.635 n=5+4)
RegexpMatchEasy0_1K-16       408ns ± 1%     411ns ± 2%     ~     (p=0.087 n=5+5)
RegexpMatchEasy1_32-16       103ns ± 1%     104ns ± 1%     ~     (p=0.484 n=5+5)
RegexpMatchEasy1_1K-16       659ns ± 2%     652ns ± 1%     ~     (p=0.571 n=5+5)
RegexpMatchMedium_32-16      176ns ± 2%     174ns ± 1%     ~     (p=0.476 n=5+5)
RegexpMatchMedium_1K-16     58.6µs ± 4%    57.7µs ± 4%     ~     (p=0.548 n=5+5)
RegexpMatchHard_32-16       3.07µs ± 3%    3.04µs ± 4%     ~     (p=0.421 n=5+5)
RegexpMatchHard_1K-16       89.2µs ± 1%    87.9µs ± 2%   -1.52%  (p=0.032 n=5+5)
Revcomp-16                   575ms ± 0%     587ms ± 2%   +2.12%  (p=0.032 n=4+5)
Template-16                  110ms ± 1%     107ms ± 3%   -3.00%  (p=0.032 n=5+5)
TimeParse-16                 463ns ± 0%     462ns ± 0%     ~     (p=0.810 n=5+4)
TimeFormat-16                538ns ± 0%     535ns ± 0%   -0.63%  (p=0.024 n=5+5)

name                      old speed      new speed      delta
GobDecode-16              70.7MB/s ± 4%  71.4MB/s ± 5%     ~     (p=0.452 n=5+5)
GobEncode-16              80.7MB/s ± 5%  81.2MB/s ± 2%     ~     (p=1.000 n=5+5)
Gzip-16                   58.2MB/s ± 2%  57.7MB/s ± 2%     ~     (p=0.452 n=5+5)
Gunzip-16                  302MB/s ± 1%   299MB/s ± 1%   -0.99%  (p=0.008 n=5+5)
JSONEncode-16             92.4MB/s ± 1%  89.1MB/s ± 0%   -3.63%  (p=0.016 n=5+4)
JSONDecode-16             20.4MB/s ± 0%  20.3MB/s ± 1%     ~     (p=0.135 n=5+5)
GoParse-16                10.6MB/s ± 2%  10.8MB/s ± 1%   +2.00%  (p=0.016 n=5+5)
RegexpMatchEasy0_32-16     286MB/s ± 1%   285MB/s ± 3%     ~     (p=1.000 n=5+5)
RegexpMatchEasy0_1K-16    2.51GB/s ± 1%  2.49GB/s ± 2%     ~     (p=0.095 n=5+5)
RegexpMatchEasy1_32-16     309MB/s ± 1%   307MB/s ± 1%     ~     (p=0.548 n=5+5)
RegexpMatchEasy1_1K-16    1.55GB/s ± 2%  1.57GB/s ± 1%     ~     (p=0.690 n=5+5)
RegexpMatchMedium_32-16   5.68MB/s ± 2%  5.73MB/s ± 1%     ~     (p=0.579 n=5+5)
RegexpMatchMedium_1K-16   17.5MB/s ± 4%  17.8MB/s ± 4%     ~     (p=0.500 n=5+5)
RegexpMatchHard_32-16     10.4MB/s ± 3%  10.5MB/s ± 4%     ~     (p=0.460 n=5+5)
RegexpMatchHard_1K-16     11.5MB/s ± 1%  11.7MB/s ± 2%   +1.57%  (p=0.032 n=5+5)
Revcomp-16                 442MB/s ± 0%   433MB/s ± 2%   -2.05%  (p=0.032 n=4+5)
Template-16               17.7MB/s ± 1%  18.2MB/s ± 3%   +3.12%  (p=0.032 n=5+5)

Change-Id: I6972e8f35f2b31f9a42ac473a6bf419a18022558
Reviewed-on: https://go-review.googlesource.com/100935
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-15 16:41:59 +00:00
James Cowgill 423111081b cmd/internal/obj/mips: load/store even float registers first
There is a bug in Octeon III processors where storing an odd floating
point register after it has recently been written to by a double
floating point operation will store the old value from before the double
operation (there are some extra details - the operation and store
must be a certain number of cycles apart). However, this bug does not
occur if the even register is stored first. Currently the bug only
happens on big endian because Go always loads the even register first on
little endian.

Work around the bug by always loading / storing the even floating point
register first. Since this is just an instruction reordering, it should
have no performance penalty. This follows other compilers like GCC which
will always store the even register first (although you do have to set
the ISA level to MIPS I to prevent it from using SDC1).

Change-Id: I5e73daa4d724ca1df7bf5228aab19f53f26a4976
Reviewed-on: https://go-review.googlesource.com/97735
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-15 15:40:39 +00:00
Geoff Berry e244a7a7d3 cmd/compile/internal/ssa: add patterns for arm64 bitfield opcodes
Add patterns to match common idioms for EXTR, BFI, BFXIL, SBFIZ, SBFX,
UBFIZ and UBFX opcodes.

go1 benchmark results on Amberwing:
name                   old time/op    new time/op    delta
FmtManyArgs               786ns ± 2%     714ns ± 1%  -9.20%  (p=0.000 n=10+10)
Gzip                      437ms ± 0%     402ms ± 0%  -7.99%  (p=0.000 n=10+10)
FmtFprintfIntInt          196ns ± 0%     182ns ± 0%  -7.28%  (p=0.000 n=10+9)
FmtFprintfPrefixedInt     207ns ± 0%     199ns ± 0%  -3.86%  (p=0.000 n=10+10)
FmtFprintfFloat           324ns ± 0%     316ns ± 0%  -2.47%  (p=0.000 n=10+8)
FmtFprintfInt             119ns ± 0%     117ns ± 0%  -1.68%  (p=0.000 n=10+9)
GobDecode                12.8ms ± 2%    12.6ms ± 1%  -1.62%  (p=0.002 n=10+10)
JSONDecode               94.4ms ± 1%    93.4ms ± 0%  -1.10%  (p=0.000 n=10+10)
RegexpMatchEasy0_32       247ns ± 0%     245ns ± 0%  -0.65%  (p=0.000 n=10+10)
RegexpMatchMedium_32      314ns ± 0%     312ns ± 0%  -0.64%  (p=0.000 n=10+10)
RegexpMatchEasy0_1K       541ns ± 0%     538ns ± 0%  -0.55%  (p=0.000 n=10+9)
TimeParse                 450ns ± 1%     448ns ± 1%  -0.42%  (p=0.035 n=9+9)
RegexpMatchEasy1_32       244ns ± 0%     243ns ± 0%  -0.41%  (p=0.000 n=10+10)
GoParse                  6.03ms ± 0%    6.00ms ± 0%  -0.40%  (p=0.002 n=10+10)
RegexpMatchEasy1_1K       779ns ± 0%     777ns ± 0%  -0.26%  (p=0.000 n=10+10)
RegexpMatchHard_32       2.75µs ± 0%    2.74µs ± 1%  -0.06%  (p=0.026 n=9+9)
BinaryTree17              11.7s ± 0%     11.6s ± 0%    ~     (p=0.089 n=10+10)
HTTPClientServer         89.1µs ± 1%    89.5µs ± 2%    ~     (p=0.436 n=10+10)
RegexpMatchHard_1K       78.9µs ± 0%    79.5µs ± 2%    ~     (p=0.469 n=10+10)
FmtFprintfEmpty          58.5ns ± 0%    58.5ns ± 0%    ~     (all equal)
GobEncode                12.0ms ± 1%    12.1ms ± 0%    ~     (p=0.075 n=10+10)
Revcomp                   669ms ± 0%     668ms ± 0%    ~     (p=0.091 n=7+9)
Mandelbrot200            5.35ms ± 0%    5.36ms ± 0%  +0.07%  (p=0.000 n=9+9)
RegexpMatchMedium_1K     52.1µs ± 0%    52.1µs ± 0%  +0.10%  (p=0.000 n=9+9)
Fannkuch11                3.25s ± 0%     3.26s ± 0%  +0.36%  (p=0.000 n=9+10)
FmtFprintfString          114ns ± 1%     115ns ± 0%  +0.52%  (p=0.011 n=10+10)
JSONEncode               20.2ms ± 0%    20.3ms ± 0%  +0.65%  (p=0.000 n=10+10)
Template                 91.3ms ± 0%    92.3ms ± 0%  +1.08%  (p=0.000 n=10+10)
TimeFormat                484ns ± 0%     495ns ± 1%  +2.30%  (p=0.000 n=9+10)

There are some opportunities to improve this change further by adding
patterns to match the "extended register" versions of ADD/SUB/CMP, but I
think that should be evaluated on its own.  The regressions in Template
and TimeFormat would likely be recovered by this, as they seem to be due
to generating:

    ubfiz x0, x0, #3, #8
    add x1, x2, x0

instead of

    add x1, x2, x0, lsl #3
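
For illustration only, here is a minimal Go sketch of the source-level idioms that bitfield patterns of this kind target. The function names are hypothetical, and whether a given compiler version actually selects UBFIZ or UBFX for them is an assumption, not something this example verifies.

```go
package main

import "fmt"

// insertZero masks the low 8 bits of x and shifts them left by 3.
// This mask-then-shift shape corresponds to the bitfield
// "ubfiz x0, x0, #3, #8" quoted in the commit message.
func insertZero(x uint64) uint64 {
	return (x & 0xff) << 3
}

// extract shifts x right by 3 and keeps 8 bits, the shift-then-mask
// shape that an unsigned bitfield extract (UBFX) can express.
func extract(x uint64) uint64 {
	return (x >> 3) & 0xff
}

func main() {
	fmt.Printf("%#x\n", insertZero(0x1ff)) // 0x7f8
	fmt.Printf("%#x\n", extract(0x7f8))    // 0xff
}
```

When the result of insertZero feeds straight into an addition, it is the case the message describes, where a shifted-register ADD would be preferable to a separate UBFIZ.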

Change-Id: I5644a8d70ac7a98e784a377a2b76ab47a3415a4b
Reviewed-on: https://go-review.googlesource.com/88355
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-15 14:10:41 +00:00
Alberto Donizetti ded9a1b372 test/codegen: port len/cap pow2 div tests to codegen
And delete them from asm_test.

Change-Id: I29c8d098a8893e6b669b6272a2f508985ac9d618
Reviewed-on: https://go-review.googlesource.com/100876
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-15 13:34:01 +00:00
Matthew Dempsky 29517daff9 cmd/compile: extract common noding code from func{Decl,Lit}
Passes toolstash-check.

Change-Id: I8290221d6169e077dfa4ea737d685c7fcecf6841
Reviewed-on: https://go-review.googlesource.com/100835
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-03-15 01:09:42 +00:00
Matthew Dempsky 463fe95bdd cmd/compile: fix duplicate code generation in swt.go
When combining adjacent type switch cases with the same type hash, we
failed to actually remove the combined cases, so we would generate
code for them twice.

We use MD5 for type hashes, so collisions are rare, but they do
currently appear in test/fixedbugs/bug248.dir/bug2.go, which is how I
noticed this failure.

Passes toolstash-check.

Change-Id: I66729b3366b96cb8ddc8fa6f3ebea11ef6d74012
Reviewed-on: https://go-review.googlesource.com/100461
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-03-15 01:01:56 +00:00
Matthew Dempsky eb3c44b2c4 cmd/compile: cleanup closure.go
The main thing is we now eagerly create the ODCLFUNC node for
closures, immediately cross-link them, and assign fields (e.g., Nbody,
Dcl, Parents, Marks) directly on the ODCLFUNC (previously they were
assigned on the OCLOSURE and later moved to the ODCLFUNC).

This allows us to set Curfn to the ODCLFUNC instead of the OCLOSURE,
which makes things more consistent with normal function declarations.
(Notably, this means Cvars now hang off the ODCLFUNC instead of the
OCLOSURE.)

Assignment of xfunc symbol names also now happens before typechecking
their body, which means debugging output now provides a more helpful
name than "<S>".

In golang.org/cl/66810, we changed "x := y" statements to avoid
creating false closure variables for x, but we still create them for
struct literals like "s{f: x}". Update comment in capturevars
accordingly.

More opportunity for cleanups still, but this makes some substantial
progress, IMO.

Passes toolstash-check.

Change-Id: I65a4efc91886e3dcd1000561348af88297775cd7
Reviewed-on: https://go-review.googlesource.com/100197
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-03-14 23:54:39 +00:00
Ilya Tocar 644d14ea0f Revert "cmd/compile: implement CMOV on amd64"
This reverts commit 080187f4f7.

It broke the build of golang.org/x/exp/shiny/iconvg.
See issue 24395 for details.

Change-Id: Ifd6134f6214e6cee40bd3c63c32941d5fc96ae8b
Reviewed-on: https://go-review.googlesource.com/100755
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-14 21:21:23 +00:00
Heschi Kreinick 44e65f2c94 cmd/compile/internal/ssa: track stack-only vars
User variables that cannot be SSA'd, either because their addresses are
taken or because they are too large for the decomposition heuristic, do
not explicitly appear as operands of SSA values. Instead they are written
to directly via the stack pointer.

This hid them from the location list generation, which is only
interested in the named value table. Fortunately, the lifetime of
stack-only variables is delineated by VarDef/VarKill ops, and it's easy
enough to turn those into location list bounds.

One wrinkle: stack frame information is not explicitly available in the
SSA phases, because it's owned by the frontend in AllocFrame. It would
be easier if the set of live LocalSlots were returned by that, but this
is the minimal change to fix missing variables. Or VarDef/VarKills
could appear in NamedValues, which would make this change even easier.

Change-Id: Ice6654dad6f9babb0286e95c7ec28594561dc91f
Reviewed-on: https://go-review.googlesource.com/100458
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-14 20:52:29 +00:00
Lynn Boger aff222cd18 cmd/compile: improve PPC64.rules to reduce size of rewritePPC64.go
Some rules in PPC64.rules cause an extremely large rewritePPC64.go
file to be generated, due to rules with commutative operations and
many operands. This happens with the existing
rules for combining byte loads in little endian order, and
also happens with the pending change to do the same for bytes
in big endian order.

The change improves the existing rules and reduces the size of
the rewrite file by more than 60%. Once this change is merged,
then the pending change for big endian ordered rules will be
updated to use rules that avoid generating an excessively large
rewrite file.

This also includes a fix to a performance regression for
littleEndian.PutUint16 on ppc64le.

Change-Id: I8d2ea42885fa2b84b30c63aa124b0a9b130564ff
Reviewed-on: https://go-review.googlesource.com/100675
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-14 19:03:05 +00:00
Balaram Makam b46d398887 runtime: improve arm64 memclr implementation
Improve runtime memclr_arm64.s using ZVA feature to zero out memory when n
is at least 64 bytes.

Also add DCZID_EL0 system register to use in MRS instruction.
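
As a rough illustration of how ordinary Go code reaches this routine: the gc compiler can recognize a range loop that zeroes a slice and replace it with a call to the runtime's memory-clearing function. The sketch below assumes that lowering applies; the helper name zero is hypothetical.

```go
package main

import "fmt"

// zero clears b with the range-loop idiom. The gc compiler can
// recognize this loop shape and turn it into a runtime memclr call,
// which is where an improved memclr implementation would pay off.
func zero(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

func main() {
	buf := make([]byte, 4096)
	for i := range buf {
		buf[i] = 0xff
	}
	zero(buf)
	fmt.Println(buf[0], buf[len(buf)-1]) // 0 0
}
```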

Benchmark results of runtime/Memclr on Amberwing:
name          old time/op    new time/op    delta
Memclr/5        12.7ns ± 0%    12.7ns ± 0%      ~     (all equal)
Memclr/16       12.7ns ± 0%    12.2ns ± 1%    -4.13%  (p=0.000 n=7+8)
Memclr/64       14.0ns ± 0%    14.6ns ± 1%    +4.29%  (p=0.000 n=7+8)
Memclr/256      23.7ns ± 0%    25.7ns ± 0%    +8.44%  (p=0.000 n=8+7)
Memclr/4096      204ns ± 0%      74ns ± 0%   -63.71%  (p=0.000 n=8+8)
Memclr/65536    2.89µs ± 0%    0.84µs ± 0%   -70.91%  (p=0.000 n=8+8)
Memclr/1M       45.9µs ± 0%    17.0µs ± 0%   -62.88%  (p=0.000 n=8+8)
Memclr/4M        184µs ± 0%      77µs ± 4%   -57.94%  (p=0.001 n=6+8)
Memclr/8M        367µs ± 0%     144µs ± 1%   -60.72%  (p=0.000 n=7+8)
Memclr/16M       734µs ± 0%     293µs ± 1%   -60.09%  (p=0.000 n=8+8)
Memclr/64M      2.94ms ± 0%    1.23ms ± 0%   -58.06%  (p=0.000 n=7+8)
GoMemclr/5      8.00ns ± 0%    8.79ns ± 0%    +9.83%  (p=0.000 n=8+8)
GoMemclr/16     8.00ns ± 0%    7.60ns ± 0%    -5.00%  (p=0.000 n=8+8)
GoMemclr/64     10.8ns ± 0%    10.4ns ± 0%    -3.70%  (p=0.000 n=8+8)
GoMemclr/256    20.4ns ± 0%    21.2ns ± 0%    +3.92%  (p=0.000 n=8+8)

name          old speed      new speed      delta
Memclr/5       394MB/s ± 0%   393MB/s ± 0%    -0.28%  (p=0.006 n=8+8)
Memclr/16     1.26GB/s ± 0%  1.31GB/s ± 1%    +4.07%  (p=0.000 n=7+8)
Memclr/64     4.57GB/s ± 0%  4.39GB/s ± 2%    -3.91%  (p=0.000 n=7+8)
Memclr/256    10.8GB/s ± 0%  10.0GB/s ± 0%    -7.95%  (p=0.001 n=7+6)
Memclr/4096   20.1GB/s ± 0%  55.3GB/s ± 0%  +175.46%  (p=0.000 n=8+8)
Memclr/65536  22.6GB/s ± 0%  77.8GB/s ± 0%  +243.63%  (p=0.000 n=7+8)
Memclr/1M     22.8GB/s ± 0%  61.5GB/s ± 0%  +169.38%  (p=0.000 n=8+8)
Memclr/4M     22.8GB/s ± 0%  54.3GB/s ± 4%  +137.85%  (p=0.001 n=6+8)
Memclr/8M     22.8GB/s ± 0%  58.1GB/s ± 1%  +154.56%  (p=0.000 n=7+8)
Memclr/16M    22.8GB/s ± 0%  57.2GB/s ± 1%  +150.54%  (p=0.000 n=8+8)
Memclr/64M    22.8GB/s ± 0%  54.4GB/s ± 0%  +138.42%  (p=0.000 n=7+8)
GoMemclr/5     625MB/s ± 0%   569MB/s ± 0%    -8.90%  (p=0.000 n=7+8)
GoMemclr/16   2.00GB/s ± 0%  2.10GB/s ± 0%    +5.26%  (p=0.000 n=8+8)
GoMemclr/64   5.92GB/s ± 0%  6.15GB/s ± 0%    +3.83%  (p=0.000 n=7+8)
GoMemclr/256  12.5GB/s ± 0%  12.1GB/s ± 0%    -3.77%  (p=0.000 n=8+7)

Benchmark results of runtime/Memclr on Amberwing without ZVA:
name          old time/op    new time/op    delta
Memclr/5        12.7ns ± 0%    12.8ns ± 0%   +0.79%  (p=0.008 n=5+5)
Memclr/16       12.7ns ± 0%    12.7ns ± 0%     ~     (p=0.444 n=5+5)
Memclr/64       14.0ns ± 0%    14.4ns ± 0%   +2.86%  (p=0.008 n=5+5)
Memclr/256      23.7ns ± 1%    19.2ns ± 0%  -19.06%  (p=0.008 n=5+5)
Memclr/4096      203ns ± 0%     119ns ± 0%  -41.38%  (p=0.008 n=5+5)
Memclr/65536    2.89µs ± 0%    1.66µs ± 0%  -42.76%  (p=0.008 n=5+5)
Memclr/1M       45.9µs ± 0%    26.2µs ± 0%  -42.82%  (p=0.008 n=5+5)
Memclr/4M        184µs ± 0%     105µs ± 0%  -42.81%  (p=0.008 n=5+5)
Memclr/8M        367µs ± 0%     210µs ± 0%  -42.76%  (p=0.008 n=5+5)
Memclr/16M       734µs ± 0%     420µs ± 0%  -42.74%  (p=0.008 n=5+5)
Memclr/64M      2.94ms ± 0%    1.69ms ± 0%  -42.46%  (p=0.008 n=5+5)
GoMemclr/5      8.00ns ± 0%    8.40ns ± 0%   +5.00%  (p=0.008 n=5+5)
GoMemclr/16     8.00ns ± 0%    8.40ns ± 0%   +5.00%  (p=0.008 n=5+5)
GoMemclr/64     10.8ns ± 0%     9.6ns ± 0%  -11.02%  (p=0.008 n=5+5)
GoMemclr/256    20.4ns ± 0%    17.2ns ± 0%  -15.69%  (p=0.008 n=5+5)

name          old speed      new speed      delta
Memclr/5       393MB/s ± 0%   391MB/s ± 0%   -0.64%  (p=0.008 n=5+5)
Memclr/16     1.26GB/s ± 0%  1.26GB/s ± 0%   -0.55%  (p=0.008 n=5+5)
Memclr/64     4.57GB/s ± 0%  4.44GB/s ± 0%   -2.79%  (p=0.008 n=5+5)
Memclr/256    10.8GB/s ± 0%  13.3GB/s ± 0%  +23.07%  (p=0.016 n=4+5)
Memclr/4096   20.1GB/s ± 0%  34.3GB/s ± 0%  +70.91%  (p=0.008 n=5+5)
Memclr/65536  22.7GB/s ± 0%  39.6GB/s ± 0%  +74.65%  (p=0.008 n=5+5)
Memclr/1M     22.8GB/s ± 0%  40.0GB/s ± 0%  +74.88%  (p=0.008 n=5+5)
Memclr/4M     22.8GB/s ± 0%  39.9GB/s ± 0%  +74.84%  (p=0.008 n=5+5)
Memclr/8M     22.9GB/s ± 0%  39.9GB/s ± 0%  +74.71%  (p=0.008 n=5+5)
Memclr/16M    22.9GB/s ± 0%  39.9GB/s ± 0%  +74.64%  (p=0.008 n=5+5)
Memclr/64M    22.8GB/s ± 0%  39.7GB/s ± 0%  +73.79%  (p=0.008 n=5+5)
GoMemclr/5     625MB/s ± 0%   595MB/s ± 0%   -4.77%  (p=0.000 n=4+5)
GoMemclr/16   2.00GB/s ± 0%  1.90GB/s ± 0%   -4.77%  (p=0.008 n=5+5)
GoMemclr/64   5.92GB/s ± 0%  6.66GB/s ± 0%  +12.48%  (p=0.016 n=4+5)
GoMemclr/256  12.5GB/s ± 0%  14.9GB/s ± 0%  +18.95%  (p=0.008 n=5+5)

Fixes #22948

Change-Id: Iaae4e22391e25b54d299821bb7f8a81ac3986b93
Reviewed-on: https://go-review.googlesource.com/82055
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-14 18:20:40 +00:00
Robert Griesemer e65d6a6abe cmd/compile: document new line directives
Fixes #24183.

Change-Id: I5ef31c4a3aad7e05568b7de1227745d686d4aff8
Reviewed-on: https://go-review.googlesource.com/100462
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-14 18:11:30 +00:00
Alberto Donizetti cd3aae9b81 test/codegen: port all small memmove tests to codegen
This change ports all the remaining tests checking that small memmoves
are replaced with MOVs to the new codegen test harness, and deletes
them from the asm_test file.

Change-Id: I01c94b441e27a5d61518035af62d62779dafeb56
Reviewed-on: https://go-review.googlesource.com/100476
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-14 15:57:07 +00:00
Daniel Martí b8d26225c1 cmd/asm: move manual tests out of generated file
Thanks to Iskander Sharipov for spotting this in an earlier CL of mine.

Change-Id: Idf45ad266205ff83985367cb38f585badfbed151
Reviewed-on: https://go-review.googlesource.com/100535
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Iskander Sharipov <iskander.sharipov@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-14 15:32:55 +00:00
Giovanni Bajo eb44b8635c cmd/compile: remove BTQconst rule
This rule is meant for code optimization, but it makes other rules
potentially more complex, as they need to cope with the fact that
a 32-bit op (BTLconst) can appear everywhere a 64-bit rule matches.

Move the optimization to opcode expansion instead. Tests will be
added in a following CL.

Change-Id: Ica5ef291e7963c4af17c124d4a2869e6c8f7b0c7
Reviewed-on: https://go-review.googlesource.com/99995
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-13 23:21:38 +00:00
Matthew Dempsky e601c07908 cmd/compile: reject type switch with guarded declaration and no cases
Fixes #23116.

Change-Id: I5db5c5c39bbb50148ffa18c9393b045f255f80a3
Reviewed-on: https://go-review.googlesource.com/100459
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-03-13 22:02:46 +00:00
Robert Griesemer 363bcd7b4f cmd/compile: use key position for key:val elements in composite literals
Fixes #24339.

Change-Id: Ie47764fed27f76b480834b1fdbed0512c94831d9
Reviewed-on: https://go-review.googlesource.com/100457
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-03-13 21:44:12 +00:00
Daniel Martí 63f4ab98eb cmd/compile: deduplicate racewalk switch cases
Only the contiguous ones, to keep the patch simple. Remove some
unnecessary newlines, while at it.

Change-Id: Ia588f80538b49a169fbf49835979ebff5a0a7b6d
Reviewed-on: https://go-review.googlesource.com/94756
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-03-13 21:42:58 +00:00
Daniel Martí 1178e51a37 cmd/asm: VPERMQ's imm8 arg is an uint8
The imm8 argument consists of 4 2-bit indices, so it can take values up
to $255. However, the assembler was treating it as Yi8, which reads
"fits in int8". Add a Yu8 variant, to also keep backwards compatibility
with negative values possible with Yi8.
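
The range problem can be sketched in Go. The helper names below are hypothetical and are not the assembler's own code. The sketch packs four 2-bit indices into one immediate byte, assuming the first index occupies the lowest two bits, and shows that the largest such immediate fits in a uint8 but not in an int8.

```go
package main

import (
	"fmt"
	"math"
)

// permImm8 packs four 2-bit lane indices into one immediate byte,
// with idx[0] in the lowest two bits (an assumption of this sketch).
func permImm8(idx [4]uint8) uint8 {
	var imm uint8
	for i, v := range idx {
		imm |= (v & 3) << (2 * uint(i))
	}
	return imm
}

// fitsInt8 mirrors the "fits in int8" reading of Yi8.
func fitsInt8(v int64) bool { return v >= math.MinInt8 && v <= math.MaxInt8 }

// fitsUint8 mirrors the unsigned reading that a Yu8 class would use.
func fitsUint8(v int64) bool { return v >= 0 && v <= math.MaxUint8 }

func main() {
	imm := permImm8([4]uint8{3, 3, 3, 3})
	fmt.Println(imm)                    // 255
	fmt.Println(fitsInt8(int64(imm)))   // false
	fmt.Println(fitsUint8(int64(imm)))  // true
}
```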

Fixes #24378.

Change-Id: I24ddb19c219b54d039a6c1bcdb903717d1c7c3b8
Reviewed-on: https://go-review.googlesource.com/100475
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-13 21:29:45 +00:00
pityonline 375280f1bb cmd/vendor: fix JSON format
Change-Id: I9c5a4a4031c12d67c7e75e9a276a766927abf83d
Reviewed-on: https://go-review.googlesource.com/100415
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-13 20:41:16 +00:00
Matthew Dempsky 09d4455f45 cmd/compile: enable inlining variadic functions
As a side effect of working on mid-stack inlining, we've fixed support
for inlining variadic functions. Might as well enable it.
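
A small variadic leaf function of the kind this change concerns might look like the sketch below. The function first is hypothetical, and whether it is actually inlined depends on the compiler version and its inlining budget. Building with -gcflags=-m prints the compiler's optimization decisions, which is one way to check.

```go
package main

import "fmt"

// first returns the first argument, or 0 when called with none.
// It is a small, loop-free variadic function, so it is a plausible
// inlining candidate once variadic inlining is enabled.
func first(xs ...int) int {
	if len(xs) == 0 {
		return 0
	}
	return xs[0]
}

func main() {
	fmt.Println(first(7, 8, 9))      // 7
	fmt.Println(first())             // 0
	fmt.Println(first([]int{4}...))  // 4
}
```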

Change-Id: I7f555f8b941969791db7eb598c0b49f6dc0820aa
Reviewed-on: https://go-review.googlesource.com/100456
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-13 20:34:03 +00:00
Matthew Dempsky c74aa39f47 cmd/compile: eliminate mkinlcall's isddd parameter
These are always set to n.Isddd(), which is readily available within
mkinlcall.

Change-Id: I3d7fbc9dc19a40d6b905691c666eee9bcd031a00
Reviewed-on: https://go-review.googlesource.com/100455
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-13 20:23:55 +00:00
Carlos Eduardo Seo e1f8fe8dff cmd/internal/obj/ppc64: implement full operand support for l*arx instructions
The current implementation of l*arx instructions does not accept non-zero
offsets in RA nor the EH field. This change adds full functionality to those
instructions.

Updates #23845

Change-Id: If113f70d11de5f35f8389520b049390dbc40e863
Reviewed-on: https://go-review.googlesource.com/99635
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-03-13 19:46:54 +00:00
Daniel Martí ca9abbb731 cmd/compile: remove some unused parameters
As reported by unparam.

Passes toolstash -cmp on std cmd.

Change-Id: I55473e1eed096ed1c3e431aed2cbf0b6b5444b91
Reviewed-on: https://go-review.googlesource.com/97895
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-03-13 16:50:11 +00:00
Cherry Zhang 518e6f0893 cmd/internal/obj/arm64: support logical instructions targeting RSP
Logical instructions can have RSP as their destination. Support it.

Note that the two-operand form, like "AND $1, RSP", which is
equivalent to the three-operand form "AND $1, RSP, RSP", is
invalid, because the source register is not allowed to be RSP.

Also note that instructions that set the conditional flags, like
ANDS, cannot target RSP. Because of this, we split out the optab
entries of AND et al. and ANDS et al.

Merge the optab entries of BIC et al. into AND et al., because they
are the same.

Fixes #24332.

Change-Id: I3584d6f2e7cea98a659a1ed9fdf67c353e090637
Reviewed-on: https://go-review.googlesource.com/100217
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-13 16:09:57 +00:00
Cherry Zhang 911839c1f4 cmd/internal/obj/arm64: fix branch-too-far with TBZ like instructions
The compiler now emits TBZ-like instructions, but the assembler's
too-far-branch patch code didn't include that case. Add it.

Fixes #23889.

Change-Id: Ib75f9250c660b9fb652835fbc83263a5d5073dc5
Reviewed-on: https://go-review.googlesource.com/94902
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-03-13 13:27:58 +00:00
David Chase 3c16934f16 cmd/compile: fix failure to reset reused bit of storage
This is the "3rd bug" that caused compilations to sometimes
produce different results when dwarf location lists were
enabled.

A loop had not been properly rewritten in an earlier
optimization CL, and it accessed uninitialized data. The result
was deterministic (though perhaps wrong) when single-threaded,
but varied from run to run when multithreaded.

Change-Id: Ib3da538762fdf7d5e4407106f2434f3b14a1d7ea
Reviewed-on: https://go-review.googlesource.com/99935
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-13 05:03:31 +00:00
Vladimir Kuzmin 7395083136 cmd/compile: avoid extra mapaccess in "m[k] op= r"
Currently, order desugars map assignment operations like

    m[k] op= r

into

    m[k] = m[k] op r

which in turn is transformed during walk into:

    tmp := *mapaccess(m, k)
    tmp = tmp op r
    *mapassign(m, k) = tmp

However, this is suboptimal, as we could instead produce just:

    *mapassign(m, k) op= r

One complication though is if "r == 0", then "m[k] /= r" and "m[k] %=
r" will panic, and they need to do so *before* calling mapassign,
otherwise we may insert a new zero-value element into the map.

It would be spec compliant to just emit the "r != 0" check before
calling mapassign (see #23735), but currently these checks aren't
generated until SSA construction. For now, it's simpler to continue
desugaring /= and %= into two map indexing operations.
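
The user-visible behavior that the division case must preserve can be sketched as follows. The helper divAssign is hypothetical and exists only to capture the panic: a failing "m[k] /= r" must not leave a new zero-value entry behind in the map.

```go
package main

import "fmt"

// divAssign runs m[k] /= r and reports whether it panicked.
func divAssign(m map[string]int, k string, r int) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	m[k] /= r
	return false
}

func main() {
	m := map[string]int{}
	m["a"] += 2 // a missing key reads as zero, so m["a"] becomes 2
	m["a"] *= 5
	fmt.Println(m["a"]) // 10

	// Dividing by zero panics before any entry for "b" is created.
	fmt.Println(divAssign(m, "b", 0), len(m))
}
```

Operators such as += and *= cannot panic on integer operands, which is why they are the safe candidates for the single-mapassign form.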

Fixes #23661.

Change-Id: I46e3739d9adef10e92b46fdd78b88d5aabe68952
Reviewed-on: https://go-review.googlesource.com/91557
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-03-12 19:27:44 +00:00
isharipo 85a8d25d53 cmd/compile/internal/ssa: emit IMUL3{L/Q} for MUL{L/Q}const on x86
cmd/asm now supports the three-operand form of IMUL,
so instead of using IMUL with resultInArg0, emit an IMUL3 instruction.

This results in fewer redundant MOVs where SSA assigns
different registers to input[0] and dst arguments.

Note: these have exactly the same encoding when reg0=reg1:
      IMUL3x $const, reg0, reg1
      IMULx $const, reg
Two-operand IMULx is like a crippled IMUL3x, with dst fixed to input[0].
This is why we don't bother to generate IMULx for the case where
dst is the same as input[0].

Change-Id: I4becda475b3dffdd07b6fdf1c75bacc82af654e4
Reviewed-on: https://go-review.googlesource.com/99656
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-12 19:02:36 +00:00
Giovanni Bajo 080187f4f7 cmd/compile: implement CMOV on amd64
This builds upon the branchelim pass, activating it for amd64 and
lowering CondSelect. Special care is taken with FPU instructions for
NaN handling.

Benchmark results on Xeon E5630 (Westmere EP):

name                      old time/op    new time/op    delta
BinaryTree17-16              4.99s ± 9%     4.66s ± 2%     ~     (p=0.095 n=5+5)
Fannkuch11-16                4.93s ± 3%     5.04s ± 2%     ~     (p=0.548 n=5+5)
FmtFprintfEmpty-16          58.8ns ± 7%    61.4ns ±14%     ~     (p=0.579 n=5+5)
FmtFprintfString-16          114ns ± 2%     114ns ± 4%     ~     (p=0.603 n=5+5)
FmtFprintfInt-16             181ns ± 4%     125ns ± 3%  -30.90%  (p=0.008 n=5+5)
FmtFprintfIntInt-16          263ns ± 2%     217ns ± 2%  -17.34%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt-16     230ns ± 1%     212ns ± 1%   -7.99%  (p=0.008 n=5+5)
FmtFprintfFloat-16           411ns ± 3%     344ns ± 5%  -16.43%  (p=0.008 n=5+5)
FmtManyArgs-16               828ns ± 4%     790ns ± 2%   -4.59%  (p=0.032 n=5+5)
GobDecode-16                10.9ms ± 4%    10.8ms ± 5%     ~     (p=0.548 n=5+5)
GobEncode-16                9.52ms ± 5%    9.46ms ± 2%     ~     (p=1.000 n=5+5)
Gzip-16                      334ms ± 2%     337ms ± 2%     ~     (p=0.548 n=5+5)
Gunzip-16                   64.4ms ± 1%    65.0ms ± 1%   +1.00%  (p=0.008 n=5+5)
HTTPClientServer-16          156µs ± 3%     155µs ± 3%     ~     (p=0.690 n=5+5)
JSONEncode-16               21.0ms ± 1%    21.8ms ± 0%   +3.76%  (p=0.016 n=5+4)
JSONDecode-16               95.1ms ± 0%    95.7ms ± 1%     ~     (p=0.151 n=5+5)
Mandelbrot200-16            6.38ms ± 1%    6.42ms ± 1%     ~     (p=0.095 n=5+5)
GoParse-16                  5.47ms ± 2%    5.36ms ± 1%   -1.95%  (p=0.016 n=5+5)
RegexpMatchEasy0_32-16       111ns ± 1%     111ns ± 1%     ~     (p=0.635 n=5+4)
RegexpMatchEasy0_1K-16       408ns ± 1%     411ns ± 2%     ~     (p=0.087 n=5+5)
RegexpMatchEasy1_32-16       103ns ± 1%     104ns ± 1%     ~     (p=0.484 n=5+5)
RegexpMatchEasy1_1K-16       659ns ± 2%     652ns ± 1%     ~     (p=0.571 n=5+5)
RegexpMatchMedium_32-16      176ns ± 2%     174ns ± 1%     ~     (p=0.476 n=5+5)
RegexpMatchMedium_1K-16     58.6µs ± 4%    57.7µs ± 4%     ~     (p=0.548 n=5+5)
RegexpMatchHard_32-16       3.07µs ± 3%    3.04µs ± 4%     ~     (p=0.421 n=5+5)
RegexpMatchHard_1K-16       89.2µs ± 1%    87.9µs ± 2%   -1.52%  (p=0.032 n=5+5)
Revcomp-16                   575ms ± 0%     587ms ± 2%   +2.12%  (p=0.032 n=4+5)
Template-16                  110ms ± 1%     107ms ± 3%   -3.00%  (p=0.032 n=5+5)
TimeParse-16                 463ns ± 0%     462ns ± 0%     ~     (p=0.810 n=5+4)
TimeFormat-16                538ns ± 0%     535ns ± 0%   -0.63%  (p=0.024 n=5+5)

name                      old speed      new speed      delta
GobDecode-16              70.7MB/s ± 4%  71.4MB/s ± 5%     ~     (p=0.452 n=5+5)
GobEncode-16              80.7MB/s ± 5%  81.2MB/s ± 2%     ~     (p=1.000 n=5+5)
Gzip-16                   58.2MB/s ± 2%  57.7MB/s ± 2%     ~     (p=0.452 n=5+5)
Gunzip-16                  302MB/s ± 1%   299MB/s ± 1%   -0.99%  (p=0.008 n=5+5)
JSONEncode-16             92.4MB/s ± 1%  89.1MB/s ± 0%   -3.63%  (p=0.016 n=5+4)
JSONDecode-16             20.4MB/s ± 0%  20.3MB/s ± 1%     ~     (p=0.135 n=5+5)
GoParse-16                10.6MB/s ± 2%  10.8MB/s ± 1%   +2.00%  (p=0.016 n=5+5)
RegexpMatchEasy0_32-16     286MB/s ± 1%   285MB/s ± 3%     ~     (p=1.000 n=5+5)
RegexpMatchEasy0_1K-16    2.51GB/s ± 1%  2.49GB/s ± 2%     ~     (p=0.095 n=5+5)
RegexpMatchEasy1_32-16     309MB/s ± 1%   307MB/s ± 1%     ~     (p=0.548 n=5+5)
RegexpMatchEasy1_1K-16    1.55GB/s ± 2%  1.57GB/s ± 1%     ~     (p=0.690 n=5+5)
RegexpMatchMedium_32-16   5.68MB/s ± 2%  5.73MB/s ± 1%     ~     (p=0.579 n=5+5)
RegexpMatchMedium_1K-16   17.5MB/s ± 4%  17.8MB/s ± 4%     ~     (p=0.500 n=5+5)
RegexpMatchHard_32-16     10.4MB/s ± 3%  10.5MB/s ± 4%     ~     (p=0.460 n=5+5)
RegexpMatchHard_1K-16     11.5MB/s ± 1%  11.7MB/s ± 2%   +1.57%  (p=0.032 n=5+5)
Revcomp-16                 442MB/s ± 0%   433MB/s ± 2%   -2.05%  (p=0.032 n=4+5)
Template-16               17.7MB/s ± 1%  18.2MB/s ± 3%   +3.12%  (p=0.032 n=5+5)

Change-Id: Ic7cb7374d07da031e771bdcbfdd832fd1b17159c
Reviewed-on: https://go-review.googlesource.com/98695
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-03-12 18:01:33 +00:00
fanzha02 fdf5aaf555 cmd/asm: fix ARM64 vector register arrangement encoding bug
The current code assigns the wrong value to the vector register
arrangement when the arrangement specifier is S2, which produces
incorrect assembly.

The patch fixes the issue and adds test cases.

Fixes #24249

Change-Id: I9736df1279494003d0b178da1af9cee9cd85ce21
Reviewed-on: https://go-review.googlesource.com/98555
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-12 15:05:53 +00:00
Giovanni Bajo f7ac70a566 test: move rotate tests to top-level testsuite.
Remove old tests from asm_test.

Change-Id: Ib408ec7faa60068bddecf709b93ce308e0ef665a
Reviewed-on: https://go-review.googlesource.com/100075
Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
2018-03-11 10:08:18 +00:00
Daniel Martí c15b7b2a54 cmd: re-generate all stringer files
The tool has gotten better over time, so re-generating the files brings
some advantages like fewer objects, dropping the use of fmt, and
dropping unnecessary bounds checks.

While at it, add the missing go:generate line for obj.AddrType.

Change-Id: I120c9795ee8faddf5961ff0384b9dcaf58d831ff
Reviewed-on: https://go-review.googlesource.com/100015
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-10 21:20:50 +00:00
Giovanni Bajo 5c432fe0e3 cmd/compile: gofmt rewriteARM64.go
Change-Id: I7424257e496f8f40c9601b62335b64d641dcd3b5
Reviewed-on: https://go-review.googlesource.com/99996
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2018-03-10 13:03:55 +00:00
Daniel Martí 0c5cfec844 cmd/internal/test2json: support subtests containing colons
The "updates" lines, such as RUN, do not contain a colon. However,
test2json looked for one anyway, meaning that it would be thrown off if
it encountered a line like:

	=== RUN   TestWithColons/[::1]

In that case, it must not use the first colon it encounters to separate
the action from the test name.
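
The idea can be sketched as follows. This is an illustration, not the actual test2json code: the helper name parseUpdate is hypothetical, and it assumes an update line is "=== ", an action word, spaces, and then the test name.

```go
package main

import (
	"fmt"
	"strings"
)

// parseUpdate is a hypothetical helper, not part of test2json.
// It splits an update line such as "=== RUN   TestName" into the
// action and the test name without ever looking for a colon.
func parseUpdate(line string) (action, name string, ok bool) {
	const prefix = "=== "
	if !strings.HasPrefix(line, prefix) {
		return "", "", false
	}
	rest := strings.TrimPrefix(line, prefix)
	// The action is the first word; the test name is everything after
	// the run of spaces that follows it, colons included.
	i := strings.IndexByte(rest, ' ')
	if i < 0 {
		return "", "", false
	}
	return rest[:i], strings.TrimSpace(rest[i:]), true
}

func main() {
	action, name, ok := parseUpdate("=== RUN   TestWithColons/[::1]")
	fmt.Println(action, name, ok)
}
```

Because the split point is the whitespace after the action, colons in the subtest name never influence the result.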

Fixes #23920.

Change-Id: I82eff23e24b83dae183c0cf9f85fc5f409f51c25
Reviewed-on: https://go-review.googlesource.com/98445
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-10 10:13:25 +00:00
David Chase 0eacf8cbdf cmd/compile: add DWARF reg defs & fix 32-bit location list bug
Before DWARF location lists can be turned on, 3 bugs need
fixing.

This CL addresses two -- lack of register definitions for
various architectures, and bugs on 32-bit platforms.
The third bug comes later.

Passes
GO_GCFLAGS=-dwarflocationlists ./run.bash -no-rebuild
(-no-rebuild because the map dependence causes trouble)

Change-Id: I4223b48ade84763e4b048e4aeb81149f082c7bc7
Reviewed-on: https://go-review.googlesource.com/99255
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-09 23:17:18 +00:00
Ben Shi 20046020c4 cmd/compile: fix an issue in MNEG of ARM64
My previous CL https://go-review.googlesource.com/c/go/+/95075
contains two suboptimal SSA rules.

This CL fixes them, and a test case improves by about 10%.
name    old time/op  new time/op  delta
MNEG-4   263µs ± 3%   235µs ± 3%  -10.53%  (p=0.000 n=20+20)
(https://github.com/benshi001/ugo1/blob/master/mneg_7_test.go)

Change-Id: I30087097e281dd9d9d1c870d32e13b4ef4a96ad3
Reviewed-on: https://go-review.googlesource.com/99495
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-09 22:45:21 +00:00
Matthew Dempsky e4de522c95 cmd/compile: fix Node.Etype overloading
Add helper methods that validate n.Op and convert to/from the
appropriate type.

Notably, there was a lot of code in walk.go that thought setting
Etype=1 on an OADDR node affected escape analysis.

Passes toolstash-check.

TBR=marvin

Change-Id: Ieae7c67225c1459c9719f9e6a748a25b975cf758
Reviewed-on: https://go-review.googlesource.com/99535
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-09 21:44:35 +00:00
Alberto Donizetti 5f541b11aa test/codegen: port MULs merging tests to codegen
And delete them from asm_go.

Change-Id: I0057cbd90ca55fa51c596e32406e190f3866f93e
Reviewed-on: https://go-review.googlesource.com/99815
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-09 17:01:56 +00:00
Jean de Klerk 709317138f cmd/go: briefly document test caching in go test -h output
Fixes #23971

Change-Id: I073f278cc058aa15a23c0ea06292c02d50a3df21
Reviewed-on: https://go-review.googlesource.com/95582
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
2018-03-09 11:39:11 +00:00
Alberto Donizetti cde34780b7 test/codegen: port math/bits.RotateLeft tests to codegen
Only RotateLeft{64,32} were tested, and just for ppc64. This CL adds
tests for RotateLeft{64,32,16,8} on arm64 and amd64/386, for the cases
where the calls are actually intrinsified.

RotateLeft tests (the last ones for math/bits functions) are deleted
from asm_test.

This CL also adds a space between the "//" and the arch name in the
comments, to make this file consistent with the style used in all the
other files.

Change-Id: Ifc2a27261d70bcc294b4ec64490d8367f62d2b89
Reviewed-on: https://go-review.googlesource.com/99596
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-09 10:53:38 +00:00
Hana Kim 6b5a0b5c16 cmd/trace: set cname for span slices
Define a set of color names available in trace viewer

https://user-images.githubusercontent.com/4999471/37063995-5d0bad48-2169-11e8-92be-9cb363e21c38.png

Change-Id: I312fcbc5430d7512b4c39ddc79a769259bad8c22
Reviewed-on: https://go-review.googlesource.com/99055
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-09 01:58:41 +00:00
Hana Kim 476f71c958 cmd/trace: remove unrelated arrows in task-oriented traceview
Also grey out instants that represent events that occurred outside the
task's span. Furthermore, if the unrelated instants represent user
annotation events for a task other than the task of interest, skip
rendering them completely.

This helps users to focus on the task-related events better.

UI screen shot:
https://gist.github.com/hyangah/1df5d2c8f429fd933c481e9636b89b55#file-golang-org_cl_99035

Change-Id: I2b5aef41584c827f8c1e915d0d8e5c95fe2b4b65
Reviewed-on: https://go-review.googlesource.com/99035
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-09 01:34:12 +00:00
Austin Clements 60a9e5d613 runtime: ensure abort actually crashes the process
On all non-x86 arches, runtime.abort simply reads from nil.
Unfortunately, if this happens on a user stack, the signal handler
will dutifully turn this into a panicmem, which lets user defers run
and which user code can even recover from.

To fix this, add an explicit check to the signal handler that turns
faults in abort into hard crashes directly in the signal handler. This
has the added benefit of giving a register dump at the abort point.
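
The recoverability described above is ordinary behaviour for a nil read in user code. A minimal sketch, using a plain nil pointer dereference rather than runtime.abort itself (which user code cannot call); the function name readNil is made up for the example:

```go
package main

import "fmt"

// readNil dereferences a nil pointer and recovers from the resulting
// run-time panic. This is the kind of recovery that must not be
// possible when the faulting read comes from abort.
func readNil() (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	var p *int
	v = *p // faults; the runtime turns the fault into a panic
	return v, nil
}

func main() {
	_, err := readNil()
	fmt.Println(err != nil)
}
```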

Change-Id: If26a7f13790745ee3867db7f53b72d8281176d70
Reviewed-on: https://go-review.googlesource.com/93661
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-08 22:55:55 +00:00
Austin Clements c950a90d72 runtime: call abort instead of raw INT $3 or bad MOV
Everything except for amd64, amd64p32, and 386 currently defines and
uses an abort function. This CL makes these match. The next CL will
recognize the abort function to make this more useful.

Change-Id: I7c155871ea48919a9220417df0630005b444f488
Reviewed-on: https://go-review.googlesource.com/93660
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-08 22:55:54 +00:00
Austin Clements da022da900 cmd/compile: simplify OpSlicemask optimization
The previous CL introduced isConstDelta. Use it to simplify the
OpSlicemask optimization in the prove pass. This passes toolstash
-cmp.

Change-Id: If2aa762db4cdc0cd1c581a536340530a9831081b
Reviewed-on: https://go-review.googlesource.com/87481
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-08 22:25:29 +00:00
Austin Clements 6436270dad cmd/compile: add fence-post implications to prove
This adds four new deductions to the prove pass, all related to adding
or subtracting one from a value. This is the first hint of actual
arithmetic relations in the prove pass.

The most effective of these is

   x-1 >= w && x > min  ⇒  x > w

This helps eliminate bounds checks in code like

  if x > 0 {
    // do something with s[x-1]
  }

Altogether, these deductions prove an additional 260 branches in std
and cmd. Furthermore, they will let us eliminate some tricky
compiler-inserted panics in the runtime that are interfering with
static analysis.
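
A small, self-contained function of the shape discussed above. The function name prev and the extra upper-bound guard are assumptions added so the example is safe to run; whether the bounds check is actually removed depends on the compiler version, and the remaining checks can be listed with -gcflags=-d=ssa/check_bce/debug=1.

```go
package main

import "fmt"

// prev returns the element just before index x, or 0 if there is none.
// The guard establishes x > 0 and x <= len(s), so x-1 lies in
// [0, len(s)) and the index expression cannot go out of range.
func prev(s []int, x int) int {
	if x > 0 && x <= len(s) {
		return s[x-1]
	}
	return 0
}

func main() {
	fmt.Println(prev([]int{10, 20, 30}, 2)) // element at index 1
}
```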

Fixes #23354.

Change-Id: I7088223e0e0cd6ff062a75c127eb4bb60e6dce02
Reviewed-on: https://go-review.googlesource.com/87480
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>
2018-03-08 22:25:28 +00:00
Austin Clements 941fc129e2 cmd/compile: derive unsigned limits from signed limits in prove
This adds a few simple deductions to the prove pass' fact table to
derive unsigned concrete limits from signed concrete limits where
possible.

This tweak lets the pass prove 70 additional branch conditions in std
and cmd.
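
The deduction can be sketched as below. The limit type and tightenUnsigned are simplified stand-ins written for this illustration, not the compiler's own definitions.

```go
package main

import (
	"fmt"
	"math"
)

// limit is a simplified stand-in for a per-value range:
// signed bounds [min, max] and unsigned bounds [umin, umax].
type limit struct {
	min, max   int64
	umin, umax uint64
}

var noLimit = limit{math.MinInt64, math.MaxInt64, 0, math.MaxUint64}

// tightenUnsigned derives unsigned bounds from signed ones. This is
// only valid when the signed range is non-negative, because then the
// signed and unsigned interpretations of the value agree.
func tightenUnsigned(l limit) limit {
	if l.min >= 0 {
		if u := uint64(l.min); u > l.umin {
			l.umin = u
		}
		if u := uint64(l.max); u < l.umax {
			l.umax = u
		}
	}
	return l
}

func main() {
	l := noLimit
	l.min, l.max = 0, 100
	fmt.Println(tightenUnsigned(l).umax)
}
```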

This is based on a comment from the recently-deleted factsTable.get:
"// TODO: also use signed data if lim.min >= 0".

Change-Id: Ib4340249e7733070f004a0aa31254adf5df8a392
Reviewed-on: https://go-review.googlesource.com/87479
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-08 22:25:27 +00:00
Austin Clements 669db2cef5 cmd/compile: make prove pass use unsatisfiability
Currently the prove pass uses implication queries. For each block, it
collects the set of branch conditions leading to that block, and
queries this fact table for whether any of these facts imply the
block's own branch condition (or its inverse). This works remarkably
well considering it doesn't do any deduction on these facts, but it
has various downsides:

1. It requires an implementation both of adding facts to the table and
   determining implications. These are very nearly duals of each
   other, but require separate implementations. Likewise, the process
   of asserting facts of dominating branch conditions is very nearly
   the dual of the process of querying implied branch conditions.

2. It leads to less effective use of derived facts. For example, the
   prove pass currently derives facts about the relations between len
   and cap, but can't make use of these unless a branch condition is
   in the exact form of a derived fact. If one of these derived facts
   contradicts another fact, it won't notice or make use of this.

This CL changes the prove pass to use
*contradiction* instead of implication. Rather than ever querying a
branch condition, it simply adds branch conditions to the fact table.
If this leads to a contradiction (specifically, it makes the fact set
unsatisfiable), that branch is impossible and can be cut. As a result,

1. We can eliminate the code for determining implications
   (factsTable.get disappears entirely). Also, there is now a single
   implementation of visiting and asserting branch conditions, since
   we don't have to flip them around to treat them as facts in one
   place and queries in another.

2. Derived facts can be used effectively. It doesn't matter *why* the
   fact table is unsatisfiable; a contradiction in any of the facts is
   enough.

3. As an added benefit, it's now quite easy to avoid traversing beyond
   provably-unreachable blocks. In contrast, the current
   implementation always visits all blocks.
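
A toy model of the contradiction-based approach. The types below are invented for this sketch and are far simpler than the real fact table: each ordered pair of values maps to the set of relations still possible, and a branch is impossible when asserting its condition empties that set.

```go
package main

import "fmt"

// relation is a bit set of the orderings still possible between two values.
type relation uint8

const (
	lt relation = 1 << iota
	eq
	gt
)

type pair struct{ v, w string }

// factsTable is a toy fact table; it does not normalize reversed pairs
// or derive new facts, which the real pass does.
type factsTable struct {
	facts map[pair]relation
	unsat bool
}

func newFactsTable() *factsTable {
	return &factsTable{facts: make(map[pair]relation)}
}

// update asserts that v and w are related by one of the relations in r.
// If no relation remains possible, the fact set is unsatisfiable.
func (ft *factsTable) update(v, w string, r relation) {
	p := pair{v, w}
	old, ok := ft.facts[p]
	if !ok {
		old = lt | eq | gt
	}
	ft.facts[p] = old & r
	if ft.facts[p] == 0 {
		ft.unsat = true
	}
}

func main() {
	ft := newFactsTable()
	ft.update("i", "n", lt)    // dominating branch condition: i < n
	ft.update("i", "n", eq|gt) // candidate branch condition: i >= n
	fmt.Println(ft.unsat)      // the candidate branch can be cut
}
```

Note that the table is never queried for an implication; both conditions are added the same way, and only the resulting contradiction is observed.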

The prove pass already has nearly all of the mechanism necessary to
compute unsatisfiability, which means this both simplifies the code
and makes it more powerful.

The only complication is that the current implication procedure has a
hack for dealing with the 0 <= Args[0] condition of OpIsInBounds and
OpIsSliceInBounds. We replace this with asserting the appropriate fact
when we process one of these conditions. This seems much cleaner
anyway, and works because we can now take advantage of derived facts.

This has no measurable effect on compiler performance.

Effectiveness:

There is exactly one condition in all of std and cmd that this fails
to prove that the old implementation could: (int64(^uint(0)>>1) < x)
in encoding/gob. This can never be true because x is an int, and it's
basically coincidence that the old code gets this. (For example, it
fails to prove the similar (x < ^int64(^uint(0)>>1)) condition that
immediately precedes it, and even though the conditions are logically
unrelated, it wouldn't get the second one if it hadn't first processed
the first!)

It does, however, prove a few dozen additional branches. These come
from facts that are added to the fact table about the relations
between len and cap. These were almost never queried directly before,
but could lead to contradictions, which the unsat-based approach is
able to use.

There are exactly two branches in std and cmd that this implementation
proves in the *other* direction. This sounds scary, but is okay
because both occur in already-unreachable blocks, so it doesn't matter
which one we choose. Because the fact table logic is sound but incomplete,
it fails to prove that the block isn't reachable, even though it is
able to prove that both outgoing branches are impossible. We could
turn these blocks into BlockExit blocks, but it doesn't seem worth the
trouble of the extra proof effort for something that happens twice in
all of std and cmd.

Tests:

This CL updates test/prove.go to change the expected messages because
it can no longer give a "reason" why it proved or disproved a
condition. It also adds a new test of a branch it couldn't prove
before.

It mostly guts test/sliceopt.go, removing everything related to slice
bounds optimizations and moving a few relevant tests to test/prove.go.
Much of this test is actually unreachable. The new prove pass figures
this out and doesn't try to prove anything about the unreachable
parts. The output on the unreachable parts is already suspect because
anything can be proved at that point, so it's really just a regression
test for an algorithm the compiler no longer uses.

This is a step toward fixing #23354. That issue is quite easy to fix
once we can use derived facts effectively.

Change-Id: Ia48a1b9ee081310579fe474e4a61857424ff8ce8
Reviewed-on: https://go-review.googlesource.com/87478
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-08 22:25:25 +00:00
Austin Clements 2e9cf5f66e cmd/compile: simplify limit logic in prove
This replaces the open-coded intersection of limits in the prove pass
with a general limit intersection operation. This should get identical
results except in one case where it's more precise: when handling an
equality relation, if the value is *outside* the existing range, this
will reduce the range to empty rather than resetting it. This will be
important to a follow-up CL where we can take advantage of empty
ranges.
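
The equality case can be illustrated with a minimal signed-only range type. The names limit, intersect, and empty here are assumptions for the sketch, not necessarily those used in the compiler.

```go
package main

import "fmt"

// limit is an inclusive signed range [min, max].
type limit struct{ min, max int64 }

// intersect returns the range of values that lie in both l and o.
func (l limit) intersect(o limit) limit {
	if o.min > l.min {
		l.min = o.min
	}
	if o.max < l.max {
		l.max = o.max
	}
	return l
}

// empty reports whether no value satisfies the range.
func (l limit) empty() bool { return l.min > l.max }

func main() {
	known := limit{0, 10}
	// Learning "value == 20" is the range [20, 20]; it lies outside
	// the known range, so the intersection is empty rather than reset.
	fmt.Println(known.intersect(limit{20, 20}).empty())
}
```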

For #23354.

Change-Id: I3d3d75924f61b1da1cb604b3a9d189b26fb3a14e
Reviewed-on: https://go-review.googlesource.com/87477
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>
2018-03-08 22:25:24 +00:00
Austin Clements 44e20b64ef cmd/compile: more String methods for prove types
These aid in debugging.

Change-Id: Ieb38c996765f780f6103f8c3292639d408c25123
Reviewed-on: https://go-review.googlesource.com/87476
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-08 22:25:23 +00:00
Austin Clements 491f409a32 cmd/compile: minor comment improvements/corrections
Change-Id: Ie0934f1528d58d4971cdef726d3e2d23cf3935d3
Reviewed-on: https://go-review.googlesource.com/87475
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>
2018-03-08 22:25:21 +00:00
Matthew Dempsky b55eedd173 Revert "cmd/compile: cleanup nodpc and nodfp"
This reverts commit dcac984b97.

Reason for revert: broke LR architectures (arm64, ppc64, s390x)

Change-Id: I531d311c9053e81503c8c78d6cf044b318fc828b
Reviewed-on: https://go-review.googlesource.com/99695
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-03-08 21:23:01 +00:00
isharipo d2a5263a9c math/big: speedup nat.setBytes for bigger slices
Set up to _S (number of bytes in Uint) bytes at a time
by using BigEndian.Uint32 and BigEndian.Uint64.

The performance improves for slices bigger than _S bytes.
This is the case for 128/256-bit arithmetic that initializes
its objects from bytes.
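
A sketch of the technique with the word size fixed at 8 bytes. The function wordsFromBytes is written for this illustration and is not math/big's nat.setBytes; only binary.BigEndian.Uint64 is taken from the message above.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// wordsFromBytes interprets buf as a big-endian unsigned integer and
// returns its 64-bit words, least significant word first.
func wordsFromBytes(buf []byte) []uint64 {
	z := make([]uint64, (len(buf)+7)/8)
	i := len(buf)
	k := 0
	// Consume 8 bytes at a time from the least significant end.
	for ; i >= 8; i -= 8 {
		z[k] = binary.BigEndian.Uint64(buf[i-8 : i])
		k++
	}
	// Fold any remaining high-order bytes into the last word.
	if i > 0 {
		var w uint64
		for _, b := range buf[:i] {
			w = w<<8 | uint64(b)
		}
		z[k] = w
	}
	return z
}

func main() {
	// Nine bytes: one high-order byte followed by a full 8-byte word.
	fmt.Printf("%#x\n", wordsFromBytes([]byte{0x01, 0, 0, 0, 0, 0, 0, 0, 0x02}))
}
```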

name               old time/op  new time/op  delta
NatSetBytes/8-4    29.8ns ± 1%  11.4ns ± 0%  -61.63%  (p=0.000 n=9+8)
NatSetBytes/24-4    109ns ± 1%    56ns ± 0%  -48.75%  (p=0.000 n=9+8)
NatSetBytes/128-4   420ns ± 2%   110ns ± 1%  -73.83%  (p=0.000 n=10+10)
NatSetBytes/7-4    26.2ns ± 1%  21.3ns ± 2%  -18.63%  (p=0.000 n=8+9)
NatSetBytes/23-4    106ns ± 1%    67ns ± 1%  -36.93%  (p=0.000 n=9+10)
NatSetBytes/127-4   410ns ± 2%   121ns ± 0%  -70.46%  (p=0.000 n=9+8)

Found this optimization opportunity by looking at ethereum_corevm
community benchmark cpuprofile.

name        old time/op  new time/op  delta
OpDiv256-4   715ns ± 1%   596ns ± 1%  -16.57%  (p=0.008 n=5+5)
OpDiv128-4   373ns ± 1%   314ns ± 1%  -15.83%  (p=0.008 n=5+5)
OpDiv64-4    301ns ± 0%   285ns ± 1%   -5.12%  (p=0.008 n=5+5)

Change-Id: I8e5a680ae6284c8233d8d7431d51253a8a740b57
Reviewed-on: https://go-review.googlesource.com/98775
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-08 18:50:10 +00:00
Matthew Dempsky dcac984b97 cmd/compile: cleanup nodpc and nodfp
Instead of creating a new &nodfp expression for every recover() call,
or a new nodpc variable for every function instrumented by the race
detector, this CL introduces two new uintptr-typed pseudo-variables
callerSP and callerPC. These pseudo-variables act just like calls to
the runtime's getcallersp() and getcallerpc() functions.

For consistency, change runtime.gorecover's builtin stub's parameter
type from "*int32" to "uintptr".

Passes toolstash-check, but toolstash-check -race fails because of
register allocator changes.

Change-Id: I985d644653de2dac8b7b03a28829ad04dfd4f358
Reviewed-on: https://go-review.googlesource.com/99416
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-08 18:22:29 +00:00
Matthew Dempsky 6a5cfa8b63 cmd/compile: remove two out-of-phase calls to walk
All calls to walkstmt/walkexpr/etc should be rooted from funccompile,
whereas transformclosure and fninit are called by main.

Passes toolstash-check.

Change-Id: Ic880e2d2d83af09618ce4daa8e7716f6b389e53e
Reviewed-on: https://go-review.googlesource.com/99418
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-08 18:22:13 +00:00
Matthew Dempsky 8b766e5d09 cmd/compile: remove state.exitCode
We're holding onto the function's complete AST anyway, so might as
well grab the exit code from there.

Passes toolstash-check.

Change-Id: I851b5dfdb53f991e9cd9488d25d0d2abc2a8379f
Reviewed-on: https://go-review.googlesource.com/99417
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-08 18:22:04 +00:00
Matthew Dempsky e3127f023f cmd/compile: fuse escape analysis parameter tagging loops
Simplifies the code somewhat and allows removing Param.Field.

Passes toolstash-check.

Change-Id: Id854416aea8afd27ce4830ff0f5ff940f7353792
Reviewed-on: https://go-review.googlesource.com/99336
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-08 18:21:52 +00:00
Alberto Donizetti 3772b2e1d5 test/codegen: port 2^n muls tests to codegen harness
And delete them from the asm_test.go file.

Change-Id: I124c8c352299646ec7db0968cdb0fe59a3b5d83d
Reviewed-on: https://go-review.googlesource.com/99475
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-08 16:30:14 +00:00
Lynn Boger 5b14c7b324 cmd/asm, cmd/internal/obj/ppc64: avoid unnecessary load zeros
When the add, and, or, xor, and movd instructions have constant
operands, the assembler in some cases generates more instructions
than necessary.

This adds more opcode/operand combinations to the optab
and improves the code generation for the cases where the
size and sign of the constant allow the use of 1
instruction instead of 2.

Example of previous code:
	oris r3, r0, 0
	ori  r3, r3, 65533

now:
	ori r3, r0, 65533

This does not significantly reduce the overall binary size
because the improvement depends on the constant value.
Some procedures show a 1-2% reduction in size. This improvement
could also be significant in cases where the extra instructions
occur in a critical loop.

Testcase ppc64enc.s was added to cmd/asm/internal/asm/testdata
with the variations affected by this change.

Updates #23845

Change-Id: I7fdf2320c95815d99f2755ba77d0c6921cd7fad7
Reviewed-on: https://go-review.googlesource.com/95135
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-03-08 14:17:34 +00:00
Matthew Dempsky 88466e93a4 cmd/compile: mark anonymous receiver parameters as non-escaping
This was already done for normal parameters, and the same logic
applies for receiver parameters too.

Updates #24305.

Change-Id: Ia2a46f68d14e8fb62004ff0da1db0f065a95a1b7
Reviewed-on: https://go-review.googlesource.com/99335
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-08 00:20:01 +00:00
Ian Lance Taylor 8b8625a328 cmd/cover: don't crash on non-gofmt'ed input
Without the change to cover.go, the new test fails with

panic: overlapping edits: [4946,4950)->"", [4947,4947)->"thisNameMustBeVeryLongToCauseOverflowOfCounterIncrementStatementOntoNextLineForTest.Count[112]++;"

The original code inserts "else{", deletes "else", and then positions
a new block just after the "}" that must come before the "else".
That works on gofmt'ed code, but fails if the code looks like "}else".
When there is no space between the "}" and the "else", the new block
is inserted into a location that we are deleting, leading to the
"overlapping edits" mentioned above.

This CL fixes this case by not deleting the "else" but just using the
one that is already there. That requires adjusting the block offset to
come after the "{" that we insert.

Fixes #23927

Change-Id: I40ef592490878765bbce6550ddb439e43ac525b2
Reviewed-on: https://go-review.googlesource.com/98935
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-03-07 23:36:25 +00:00
Brad Fitzpatrick d8c9ef9e5c cmd/dist: skip rebuild before running tests when on the build systems
Updates #24300

Change-Id: I7752dab67e15a6dfe5fffe5b5ccbf3373bbc2c13
Reviewed-on: https://go-review.googlesource.com/99296
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-07 23:27:24 +00:00
David du Colombier b1335037fa cmd/go: skip TestVetWithOnlyCgoFiles when cgo is disabled
CL 99175 added TestVetWithOnlyCgoFiles. However, this
test is failing on platforms where cgo is disabled,
because no file can be built.

This change fixes TestVetWithOnlyCgoFiles by skipping
this test when cgo is disabled.

Fixes #24304.

Change-Id: Ibb38fcd3e0ed1a791782145d3f2866f12117c6fe
Reviewed-on: https://go-review.googlesource.com/99275
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-07 22:03:43 +00:00
Ian Lance Taylor 709da95513 cmd/go: run vet on packages with only cgo files
CgoFiles is not included in GoFiles, so we need to check both.

Fixes #24193

Change-Id: I6a67bd912e3d9a4be0eae8fa8db6fa8a07fb5df3
Reviewed-on: https://go-review.googlesource.com/99175
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-07 18:42:17 +00:00
Matthew Dempsky a3b3284ddc cmd/compile: prevent untyped types from reaching walk
We already require expressions to be typechecked before
reaching walk. Moreover, all untyped expressions should have been
converted to their default type by walk.

However, in practice, we've been somewhat sloppy and inconsistent
about ensuring this. In particular, a lot of AST rewrites ended up
leaving untyped bool expressions scattered around. These likely aren't
harmful in practice, but it seems worth cleaning up.

The two most common cases addressed by this CL are:

1) When generating OIF and OFOR nodes, we would often typecheck the
conditional expression, but not apply defaultlit to force it to the
expression's default type.

2) When rewriting string comparisons into more fundamental primitives,
we were simply overwriting r.Type with the desired type, which didn't
propagate the type to nested subexpressions. These are fixed by
utilizing finishcompare, which correctly handles this (and is already
used by other comparison lowering rewrites).

Lastly, walkexpr is extended to assert that it's not called on untyped
expressions.

Fixes #23834.

Change-Id: Icbd29648a293555e4015d3b06a95a24ccbd3f790
Reviewed-on: https://go-review.googlesource.com/98337
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-03-07 18:14:22 +00:00
Kunpei Sakai ed8b7a7785 cmd/compile: go fmt
Change-Id: I2eae33928641c6ed74badfe44d079ae90e5cc8c8
Reviewed-on: https://go-review.googlesource.com/99195
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-07 16:57:03 +00:00
Hana Kim 93b0261d0a cmd/trace: force GC occasionally
to return memory to the OS after completing potentially
large operations.
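
A minimal sketch of the pattern, assuming runtime/debug.FreeOSMemory as the mechanism (the message above does not name the call). The helper readStats and the 64 MiB buffer are made up for the example; the printed fields match those in the log below.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// readStats returns a snapshot of the runtime's memory statistics.
func readStats() runtime.MemStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m
}

func main() {
	// Stand-in for a large, short-lived operation.
	buf := make([]byte, 64<<20)
	for i := range buf {
		buf[i] = 1
	}
	buf = nil

	// Force a collection and ask the runtime to return as much
	// memory to the OS as it can.
	debug.FreeOSMemory()

	m := readStats()
	fmt.Printf(" Sys:\t%d Bytes\n HeapReleased:\t%d Bytes\n", m.Sys, m.HeapReleased)
}
```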

Update #21870

Sys went down to 3.7G

$ DEBUG_MEMORY_USAGE=1 go tool trace trace.out

2018/03/07 09:35:52 Parsing trace...
after parsing trace
 Alloc:	3385754360 Bytes
 Sys:	3662047864 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	3488907264 Bytes
 HeapInUse:	3426549760 Bytes
 HeapAlloc:	3385754360 Bytes
Enter to continue...
2018/03/07 09:36:09 Splitting trace...
after spliting trace
 Alloc:	3238309424 Bytes
 Sys:	3684410168 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	3488874496 Bytes
 HeapInUse:	3266461696 Bytes
 HeapAlloc:	3238309424 Bytes
Enter to continue...
2018/03/07 09:36:39 Opening browser. Trace viewer is listening on http://100.101.224.241:12345

after httpJsonTrace
 Alloc:	3000633872 Bytes
 Sys:	3693978424 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	3488743424 Bytes
 HeapInUse:	3030966272 Bytes
 HeapAlloc:	3000633872 Bytes
Enter to continue...

Change-Id: I56f64cae66c809cbfbad03fba7bd0d35494c1d04
Reviewed-on: https://go-review.googlesource.com/92376
Reviewed-by: Peter Weinberger <pjw@google.com>
2018-03-07 14:39:25 +00:00
Hana Kim ee465831ec cmd/trace: generate jsontrace data in a streaming fashion
Update #21870

The Sys went down to 4.25G from 6.2G.

$ DEBUG_MEMORY_USAGE=1 go tool trace trace.out
2018/03/07 08:49:01 Parsing trace...
after parsing trace
 Alloc:	3385757184 Bytes
 Sys:	3661195896 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	3488841728 Bytes
 HeapInUse:	3426516992 Bytes
 HeapAlloc:	3385757184 Bytes
Enter to continue...
2018/03/07 08:49:18 Splitting trace...
after spliting trace
 Alloc:	2352071904 Bytes
 Sys:	4243825464 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	4025712640 Bytes
 HeapInUse:	2377703424 Bytes
 HeapAlloc:	2352071904 Bytes
Enter to continue...
after httpJsonTrace
 Alloc:	3228697832 Bytes
 Sys:	4250379064 Bytes
 HeapReleased:	0 Bytes
 HeapSys:	4025647104 Bytes
 HeapInUse:	3260014592 Bytes
 HeapAlloc:	3228697832 Bytes

Change-Id: I546f26bdbc68b1e58f1af1235a0e299dc0ff115e
Reviewed-on: https://go-review.googlesource.com/92375
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Peter Weinberger <pjw@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-07 14:33:54 +00:00
Matthew Dempsky d7eb4901f1 cmd/compile: remove funcdepth variables
There were only two large classes of use for these variables:

1) Testing "funcdepth != 0" or "funcdepth > 0", which is equivalent to
checking "Curfn != nil".

2) In oldname, detecting whether a closure variable has been created
for the current function, which can be handled by instead testing
"n.Name.Curfn != Curfn".

Lastly, merge funcstart into funchdr, since it's only called once, and
it better matches up with funcbody now.

Passes toolstash-check.

Change-Id: I8fe159a9d37ef7debc4cd310354cea22a8b23394
Reviewed-on: https://go-review.googlesource.com/99076
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-07 06:05:18 +00:00
Matthew Dempsky aa00ca12fe cmd/compile: cleanup funccompile and compile
Bring these functions next to each other, and clean them up a little
bit. Also, change emitptrargsmap to take Curfn as a parameter instead
of a global.

Passes toolstash-check.

Change-Id: Ib9c94fda3b2cb6f0dcec1585622b33b4f311b5e9
Reviewed-on: https://go-review.googlesource.com/99075
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-07 03:12:38 +00:00
Kunpei Sakai b75e8a2a3b cmd/compile: prevent detection of wrong duplicates
by including *types.Type in typeVal.

Updates #21866
Fixes #24159

Change-Id: I2f8cac252d88d43e723124f2867b1410b7abab7b
Reviewed-on: https://go-review.googlesource.com/98476
Run-TryBot: Kunpei Sakai <namusyaka@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-03-07 01:26:00 +00:00
Matthew Dempsky 2c0c68d621 cmd/compile: fix miscompilation of "defer delete(m, k)"
Previously, for slow map key types (i.e., any type other than a 32-bit
or 64-bit plain memory type), we would rewrite

    defer delete(m, k)

into

    ktmp := k
    defer delete(m, &ktmp)

However, if the defer statement was inside a loop, we would end up
reusing the same ktmp value for all of the deferred deletes.

We already rewrite

    defer print(x, y, z)

into

    defer func(a1, a2, a3) {
        print(a1, a2, a3)
    }(x, y, z)

This CL generalizes this rewrite to also apply for slow map deletes.

This could be extended to apply even more generally to other builtins,
but as discussed on #24259, there are cases where we must *not* do
this (e.g., "defer recover()"). However, if we elect to do this more
generally, this CL should still make that easier.
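
A small program that exercises the affected pattern. The function name clearLater is made up for the example; string is used as the key type because it is not a 32-bit or 64-bit plain memory type.

```go
package main

import "fmt"

// clearLater defers one delete per key. Each deferred call must see
// the key that was current when the defer statement executed.
func clearLater(m map[string]int, keys []string) {
	for _, k := range keys {
		defer delete(m, k)
	}
}

func main() {
	m := map[string]int{"a": 1, "b": 2, "c": 3}
	clearLater(m, []string{"a", "b", "c"})
	fmt.Println(len(m)) // 0 when every key was deleted
}
```

With the miscompilation described above, the deferred deletes would share one temporary and leave entries behind.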

Lastly, while here, fix a few issues in wrapCall (nee walkprintfunc):

1) lookupN appends the generation number to the symbol anyway, so "%d"
was being literally included in the generated function names.

2) walkstmt will be called when the function is compiled later anyway,
so no need to do it now.

Fixes #24259.

Change-Id: I70286867c64c69c18e9552f69e3f4154a0fc8b04
Reviewed-on: https://go-review.googlesource.com/99017
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-06 23:33:28 +00:00
ChrisALiles 42ecf39e85 cmd/compile: improve compiler error on embedded structs
Fixes #23609

Change-Id: I751aae3d849de7fce1306324fcb1a4c3842d873e
Reviewed-on: https://go-review.googlesource.com/97076
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-06 21:06:46 +00:00
Alberto Donizetti 8516ecd05f test/codegen: port math/bits.ReverseBytes tests to codegen
And remove them from ssa_test.

Change-Id: If767af662801219774d1bdb787c77edfa6067770
Reviewed-on: https://go-review.googlesource.com/98976
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-06 20:34:33 +00:00
Wei Xiao 05962561ae cmd/compile/internal/ssa: improve store combine optimization on arm64
The current implementation doesn't consider MOVDreg-type operands and fails
to combine them into a larger store. This patch fixes the issue.

Fixes #24242

Change-Id: I7d68697f80e76f48c3528ece01a602bf513248ec
Reviewed-on: https://go-review.googlesource.com/98397
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-06 20:29:04 +00:00
Balaram Makam 0e8b7110f6 cmd/compile/internal/ssa: inline small memmove for arm64
This patch enables the optimization for arm64 target.

Performance results on Amberwing for strconv benchmark:
name             old time/op  new time/op  delta
Quote             721ns ± 0%   617ns ± 0%  -14.40%  (p=0.016 n=5+4)
QuoteRune         118ns ± 0%   117ns ± 0%   -0.85%  (p=0.008 n=5+5)
AppendQuote       436ns ± 2%   321ns ± 0%  -26.31%  (p=0.008 n=5+5)
AppendQuoteRune  34.7ns ± 0%  28.4ns ± 0%  -18.16%  (p=0.000 n=5+4)
[Geo mean]        189ns        160ns       -15.41%

Change-Id: I5714c474e7483d07ca338fbaf49beb4bbcc11c44
Reviewed-on: https://go-review.googlesource.com/98735
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-06 18:37:19 +00:00
Alberto Donizetti 18ae5eca3b test/codegen: port math/bits.OnesCount tests to codegen
And remove them from ssa_test.

Change-Id: I3efac5fea529bb0efa2dae32124530482ba5058e
Reviewed-on: https://go-review.googlesource.com/98815
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-06 17:53:00 +00:00
Cherry Zhang f624445473 cmd/internal/obj/arm64: gofmt
Change-Id: Ica778fef2d0245fbb14f595597e45c7cf6adef84
Reviewed-on: https://go-review.googlesource.com/98895
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-06 16:35:20 +00:00
Elias Naur ad87a67cdf cmd/dist: default to GOARM=7 on android
Auto-detecting GOARM on Android makes as little sense as for nacl/arm
and darwin/arm.

Also update androidtest.sh to not require GOARM set.

Change-Id: Id409ce1573d3c668d00fa4b7e3562ad7ece6fef5
Reviewed-on: https://go-review.googlesource.com/98875
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Run-TryBot: Elias Naur <elias.naur@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-06 16:08:04 +00:00
Alberto Donizetti 85dcc709a8 test/codegen: port math/bits.TrailingZeros tests to codegen
And remove them from ssa_test.

Change-Id: Ib5de5c0d908f23915e0847eca338cacf2fa5325b
Reviewed-on: https://go-review.googlesource.com/98795
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-06 11:48:37 +00:00
Meng Zhuo 8916773a3d runtime, cmd/compile: use ldp for DUFFCOPY on ARM64
name         old time/op  new time/op  delta
CopyFat8     2.15ns ± 1%  2.19ns ± 6%     ~     (p=0.171 n=8+9)
CopyFat12    2.15ns ± 0%  2.17ns ± 2%     ~     (p=0.137 n=8+10)
CopyFat16    2.17ns ± 3%  2.15ns ± 0%     ~     (p=0.211 n=10+10)
CopyFat24    2.16ns ± 1%  2.15ns ± 0%     ~     (p=0.087 n=10+10)
CopyFat32    11.5ns ± 0%  12.8ns ± 2%  +10.87%  (p=0.000 n=8+10)
CopyFat64    20.2ns ± 2%  12.9ns ± 0%  -36.11%  (p=0.000 n=10+10)
CopyFat128   37.2ns ± 0%  21.5ns ± 0%  -42.20%  (p=0.000 n=10+10)
CopyFat256   71.6ns ± 0%  38.7ns ± 0%  -45.95%  (p=0.000 n=10+10)
CopyFat512    140ns ± 0%    73ns ± 0%  -47.86%  (p=0.000 n=10+9)
CopyFat520    142ns ± 0%    74ns ± 0%  -47.54%  (p=0.000 n=10+10)
CopyFat1024   277ns ± 0%   141ns ± 0%  -49.10%  (p=0.000 n=10+10)

Change-Id: If54bc571add5db674d5e081579c87e80153d0a5a
Reviewed-on: https://go-review.googlesource.com/97395
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-06 04:14:59 +00:00
Rob Pike baf3eb1625 cmd/doc: make local dot-slash path names work
Before, an argument that started with ./ or ../ was not treated as
a package relative to the current directory. Thus

	$ cd $GOROOT/src/text
	$ go doc ./template

could find html/template as $GOROOT/src/html/./template
is a valid Go source directory.

Fix this by catching such paths and making them absolute before
processing.

Fixes #23383.

Change-Id: Ic2a92eaa3a6328f728635657f9de72ac3ee82afb
Reviewed-on: https://go-review.googlesource.com/98396
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-06 01:11:26 +00:00
Matthew Dempsky 26708439ec cmd/compile: refactor order.go into methods
No functional changes, just changing all the orderfoo functions
into (*Order).foo methods.

Passes toolstash-check.

Change-Id: Ib9833daa98aff3c645ce56794a414f8472689152
Reviewed-on: https://go-review.googlesource.com/98617
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-05 21:25:24 +00:00
Hana Kim d3946f75d3 internal/trace: remove backlinks from span/task end to start
This is an updated version of golang.org/cl/96395, with the fix to
TestUserSpan.

This reverts commit 7b6f6267e90a8e4eab37a3f2164ba882e6222adb.

Change-Id: I31eec8ba0997f9178dffef8dac608e731ab70872
Reviewed-on: https://go-review.googlesource.com/98236
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-05 20:10:22 +00:00
Alberto Donizetti 83e41b3e76 test/codegen: port math/bits.LeadingZeros tests to codegen
Change-Id: Ic21d25db5d56ce77516c53082dfbc010e5875b81
Reviewed-on: https://go-review.googlesource.com/98655
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-05 19:52:04 +00:00
Alberto Donizetti c1806906d8 test: port bits.Len intrinsics tests to the new codegen harness
This change moves the bits.Len* intrinsification tests to the new codegen
test harness, removing them from the old ssa_test file. Five different
test functions (one for each bits.Len function tested) were used, to
avoid possible unwanted interactions between multiple calls inside one
function.
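
The layout described above can be sketched as follows, with one wrapper per bits.Len variant. This is an assumption about the shape of the test file, and the per-architecture expected-instruction annotations that the codegen harness checks are left out.

```go
package main

import (
	"fmt"
	"math/bits"
)

// One function per intrinsic, so that the code generated for one call
// cannot be influenced by another call in the same function body.
func Len(n uint) int     { return bits.Len(n) }
func Len64(n uint64) int { return bits.Len64(n) }
func Len32(n uint32) int { return bits.Len32(n) }
func Len16(n uint16) int { return bits.Len16(n) }
func Len8(n uint8) int   { return bits.Len8(n) }

func main() {
	fmt.Println(Len(8), Len64(1<<40), Len32(255), Len16(256), Len8(0))
}
```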

Change-Id: Iffd5be55b58e88597fa30a562a28dacb01236d8b
Reviewed-on: https://go-review.googlesource.com/98156
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-05 18:01:19 +00:00
Giovanni Bajo 29fcd57a9f cmd/compile: fold offsets into memory ops
Fold offsets for:

  {ADD,SUB,MUL}[SD]mem
  ADD[LQ]constmem
  {ADD,SUB,AND,OR,XOR}[LQ]mem

Cumulatively, the rules trigger ~900 times in all.bash.

Fixes #23325

Change-Id: If6c701f68fa0b57907a353a07a516b914127d0d8
Reviewed-on: https://go-review.googlesource.com/98035
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-04 22:53:50 +00:00
Keith Randall ee58eccc56 internal/bytealg: move short string Index implementations into bytealg
Also move the arm64 CountByte implementation while we're here.

Fixes #19792

Change-Id: I1e0fdf1e03e3135af84150a2703b58dad1b0d57e
Reviewed-on: https://go-review.googlesource.com/98518
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-04 19:49:44 +00:00
Keith Randall f6332bb84a internal/bytealg: move compare functions to bytealg
Move bytes.Compare and runtime·cmpstring to bytealg.

Update #19792

Change-Id: I139e6d7c59686bef7a3017e3dec99eba5fd10447
Reviewed-on: https://go-review.googlesource.com/98515
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-04 17:49:39 +00:00
Keith Randall 45964e4f9c internal/bytealg: move Count to bytealg
Move bytes.Count and strings.Count to bytealg.

Update #19792

Change-Id: I3e4e14b504a0b71758885bb131e5656e342cf8cb
Reviewed-on: https://go-review.googlesource.com/98495
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-04 17:49:25 +00:00
Giovanni Bajo 89ae7045f3 test: convert all math-related tests from asm_test
Change-Id: If542f0b5c5754e6eb2f9b302fe5a148ba9a57338
Reviewed-on: https://go-review.googlesource.com/98443
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-04 16:52:33 +00:00
Giovanni Bajo fad31e513d test: move load/store combines into asmcheck
This CL moves the load/store combining tests into asmcheck.
In addition to being more compact, it is now also easier to
spot what is missing on each architecture.

While doing so, I think I uncovered a bug in ppc64le and arm64
rules, because they fail to load/store combine in non-trivial
functions. Not sure why, I'll open an issue.

Change-Id: Ia1572d53c0553d9104f3e52b95e4d1768a8440a3
Reviewed-on: https://go-review.googlesource.com/98441
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-04 16:52:03 +00:00
Giovanni Bajo 8ce74b7d11 test: port a nil-check interface test from asm_test
Change-Id: I69c1688506d1aeca655047acf35d1bff966fc01e
Reviewed-on: https://go-review.googlesource.com/98442
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-03 20:20:54 +00:00
Keith Randall 1dfa380e3d internal/bytealg: move equal functions to bytealg
Move bytes.Equal, runtime.memequal, and runtime.memequal_varlen
to the bytealg package.

Update #19792

Change-Id: Ic4175e952936016ea0bda6c7c3dbb33afdc8e4ac
Reviewed-on: https://go-review.googlesource.com/98355
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-03 04:18:27 +00:00
Keith Randall 403ab0f221 internal/bytealg: move IndexByte assembly to the new bytealg package
Move the IndexByte function from the runtime to a new bytealg package.
The new package will eventually hold all the optimized assembly for
groveling through byte slices and strings. It seems a better home for
this code than randomly keeping it in runtime.

Once this is in, the next step is to move the other functions
(Compare, Equal, ...).

Update #19792

This change seems complicated enough that we might just declare
"not worth it" and abandon.  Opinions welcome.

The core assembly is all unchanged, except minor modifications where
the code reads cpu feature bits.

The wrapper functions have been cleaned up as they are now actually
checked by vet.

Change-Id: I9fa75bee5d85db3a65b3fd3b7997e60367523796
Reviewed-on: https://go-review.googlesource.com/98016
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-02 22:46:15 +00:00
David du Colombier 1c9297c365 cmd/compile: skip TestEmptyDwarfRanges on Plan 9
TestEmptyDwarfRanges has been added in CL 94816.
This test is failing on Plan 9 because executables
don't have a DWARF symbol table.

Fixes #24226.

Change-Id: Iff7e34b8c2703a2f19ee8087a4d64d0bb98496cd
Reviewed-on: https://go-review.googlesource.com/98275
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-02 21:23:07 +00:00
Hana Kim d3562c9db9 internal/trace: Revert "remove backlinks from span/task end to start"
This reverts commit 16398894dc.
This broke TestUserTaskSpan test.

Change-Id: If5ff8bdfe84e8cb30787b03ead87205ece3d5601
Reviewed-on: https://go-review.googlesource.com/98235
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-02 20:32:08 +00:00
Hana Kim 16398894dc internal/trace: remove backlinks from span/task end to start
Even though undocumented, the assumption is the Event's link field
points to the following event in the future. The new span/task event
processing breaks the assumption.

Change-Id: I4ce2f30c67c4f525ec0a121a7e43d8bdd2ec3f77
Reviewed-on: https://go-review.googlesource.com/96395
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-02 20:15:57 +00:00
Michael Fraenkel 5b071bfa88 cmd/compile: convert type during finishcompare
When recursively calling walkexpr, r.Type is still the untyped value.
It then sometimes recursively calls finishcompare, which complains that
you can't compare the resulting expression to that untyped value.

Updates #23834.

Change-Id: I6b7acd3970ceaff8da9216bfa0ae24aca5dee828
Reviewed-on: https://go-review.googlesource.com/97856
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-03-02 19:48:23 +00:00
Than McIntosh 9b95611e38 cmd/compile: add DWARF register mappings for ARM64.
Add DWARF register mappings for ARM64, so that that arch will become
usable with "-dwarflocationlists". [NB: I've plugged in a set of
numbers from the doc, but this will require additional manual testing.]

Change-Id: Id9aa63857bc8b4f5c825f49274101cf372e9e856
Reviewed-on: https://go-review.googlesource.com/82515
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-02 19:40:29 +00:00
Alessandro Arzilli eca41af012 cmd/link: fix up debug_range for dsymutil (revert CL 72371)
Dsymutil, a utility used on macOS when externally linking executables,
does not support base address selector entries in debug_ranges.

CL 73271 worked around this problem by removing base address selectors
and emitting CU-relative relocations for each list entry.

This commit, as an optimization, reintroduces the base address
selectors and changes the linker to remove them again, but only when it
knows that it will have to invoke the external linker on macOS.

Compilecmp comparing master with a branch that has scope tracking
always enabled:

completed   15 of   15, estimated time remaining 0s (eta 2:43PM)
name        old time/op       new time/op       delta
Template          272ms ± 8%        257ms ± 5%  -5.33%  (p=0.000 n=15+14)
Unicode           124ms ± 7%        122ms ± 5%    ~     (p=0.210 n=14+14)
GoTypes           873ms ± 3%        870ms ± 5%    ~     (p=0.856 n=15+13)
Compiler          4.49s ± 2%        4.49s ± 5%    ~     (p=0.982 n=14+14)
SSA               11.8s ± 4%        11.8s ± 3%    ~     (p=0.653 n=15+15)
Flate             163ms ± 6%        164ms ± 9%    ~     (p=0.914 n=14+15)
GoParser          203ms ± 6%        202ms ±10%    ~     (p=0.571 n=14+14)
Reflect           547ms ± 7%        542ms ± 4%    ~     (p=0.914 n=15+14)
Tar               244ms ± 7%        237ms ± 3%  -2.80%  (p=0.002 n=14+13)
XML               289ms ± 6%        289ms ± 5%    ~     (p=0.839 n=14+14)
[Geo mean]        537ms             531ms       -1.10%

name        old user-time/op  new user-time/op  delta
Template          360ms ± 4%        341ms ± 7%  -5.16%  (p=0.000 n=14+14)
Unicode           189ms ±11%        190ms ± 8%    ~     (p=0.844 n=15+15)
GoTypes           1.13s ± 4%        1.14s ± 7%    ~     (p=0.582 n=15+14)
Compiler          5.34s ± 2%        5.40s ± 4%  +1.19%  (p=0.036 n=11+13)
SSA               14.7s ± 2%        14.7s ± 3%    ~     (p=0.602 n=15+15)
Flate             211ms ± 7%        214ms ± 8%    ~     (p=0.252 n=14+14)
GoParser          267ms ±12%        266ms ± 2%    ~     (p=0.837 n=15+11)
Reflect           706ms ± 4%        701ms ± 3%    ~     (p=0.213 n=14+12)
Tar               331ms ± 9%        320ms ± 5%  -3.30%  (p=0.025 n=15+14)
XML               378ms ± 4%        373ms ± 6%    ~     (p=0.253 n=14+15)
[Geo mean]        704ms             700ms       -0.58%

name        old alloc/op      new alloc/op      delta
Template         38.0MB ± 0%       38.4MB ± 0%  +1.12%  (p=0.000 n=15+15)
Unicode          28.8MB ± 0%       28.8MB ± 0%  +0.17%  (p=0.000 n=15+15)
GoTypes           112MB ± 0%        114MB ± 0%  +1.47%  (p=0.000 n=15+15)
Compiler          465MB ± 0%        473MB ± 0%  +1.71%  (p=0.000 n=15+15)
SSA              1.48GB ± 0%       1.53GB ± 0%  +3.07%  (p=0.000 n=15+15)
Flate            24.3MB ± 0%       24.7MB ± 0%  +1.67%  (p=0.000 n=15+15)
GoParser         30.7MB ± 0%       31.0MB ± 0%  +1.15%  (p=0.000 n=12+15)
Reflect          76.3MB ± 0%       77.1MB ± 0%  +0.97%  (p=0.000 n=15+15)
Tar              39.2MB ± 0%       39.6MB ± 0%  +0.91%  (p=0.000 n=15+15)
XML              41.5MB ± 0%       42.0MB ± 0%  +1.29%  (p=0.000 n=15+15)
[Geo mean]       77.5MB            78.6MB       +1.35%

name        old allocs/op     new allocs/op     delta
Template           385k ± 0%         387k ± 0%  +0.51%  (p=0.000 n=15+15)
Unicode            342k ± 0%         343k ± 0%  +0.10%  (p=0.000 n=14+15)
GoTypes           1.19M ± 0%        1.19M ± 0%  +0.62%  (p=0.000 n=15+15)
Compiler          4.51M ± 0%        4.54M ± 0%  +0.50%  (p=0.000 n=14+15)
SSA               12.2M ± 0%        12.4M ± 0%  +1.12%  (p=0.000 n=14+15)
Flate              234k ± 0%         236k ± 0%  +0.60%  (p=0.000 n=15+15)
GoParser           318k ± 0%         320k ± 0%  +0.60%  (p=0.000 n=15+15)
Reflect            974k ± 0%         977k ± 0%  +0.27%  (p=0.000 n=15+15)
Tar                395k ± 0%         397k ± 0%  +0.37%  (p=0.000 n=14+15)
XML                404k ± 0%         407k ± 0%  +0.53%  (p=0.000 n=15+15)
[Geo mean]         794k              798k       +0.52%

name        old text-bytes    new text-bytes    delta
HelloSize         680kB ± 0%        680kB ± 0%    ~     (all equal)

name        old data-bytes    new data-bytes    delta
HelloSize        9.62kB ± 0%       9.62kB ± 0%    ~     (all equal)

name        old bss-bytes     new bss-bytes     delta
HelloSize         125kB ± 0%        125kB ± 0%    ~     (all equal)

name        old exe-bytes     new exe-bytes     delta
HelloSize        1.11MB ± 0%       1.13MB ± 0%  +1.85%  (p=0.000 n=15+15)

Change-Id: I61c98ba0340cb798034b2bb55e3ab3a58ac1cf23
Reviewed-on: https://go-review.googlesource.com/98075
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-02 19:33:44 +00:00
Heschi Kreinick 9dc351beba cmd/compile/internal/ssa: batch up all zero-width instructions
When generating location lists, batch up changes for all zero-width
instructions, not just phis. This prevents the creation of location list
entries that don't actually cover any instructions.

This isn't perfect because of the caveats in the prior CL (Copy is
zero-width sometimes) but in practice this seems to fix all of the empty
lists in std.

Change-Id: Ice4a9ade36b6b24ca111d1494c414eec96e5af25
Reviewed-on: https://go-review.googlesource.com/97958
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-03-02 18:55:56 +00:00