Commit Graph

11651 Commits

Author SHA1 Message Date
Michael Munday 331c187b17 cmd/compile: simplify Neg lowering on s390x
No need to sign extend input to Neg8 and Neg16.

Change-Id: I7896c83c9cdf84a34098582351a4aabf61cd6fdd
Reviewed-on: https://go-review.googlesource.com/102675
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-27 19:03:42 +00:00
Matthew Dempsky 7b177b1a03 cmd/compile: fix method set computation for shadowed methods
In expandmeth, we call expand1/expand0 to build a list of all
candidate methods to promote, and then we use dotpath to prune down
which names actually resolve to a promoted method and how.

However, previously we still computed "followsptr" based on the
expand1/expand0 traversal (which is depth-first), rather than
dotpath (which is breadth-first). The result is that we could
sometimes end up miscomputing whether a particular promoted method
involves a pointer traversal, which could result in bad code
generation for method trampolines.

Fixes #24547.

Change-Id: I57dc014466d81c165b05d78b98610dc3765b7a90
Reviewed-on: https://go-review.googlesource.com/102618
Reviewed-by: Robert Griesemer <gri@golang.org>
2018-03-27 18:56:36 +00:00
Alberto Donizetti 377a2cb2d2 cmd/compile: reduce allocations in regAllocState.regalloc
name      old time/op       new time/op       delta
Template        281ms ± 2%        282ms ± 3%    ~     (p=0.428 n=19+20)
Unicode         138ms ± 6%        138ms ± 7%    ~     (p=0.813 n=19+20)
GoTypes         901ms ± 2%        895ms ± 2%    ~     (p=0.050 n=19+20)
Compiler        4.25s ± 1%        4.23s ± 1%  -0.31%  (p=0.031 n=19+18)
SSA             9.77s ± 1%        9.78s ± 1%    ~     (p=0.512 n=20+20)
Flate           187ms ± 3%        187ms ± 4%    ~     (p=0.687 n=20+19)
GoParser        224ms ± 4%        222ms ± 3%    ~     (p=0.301 n=20+20)
Reflect         576ms ± 2%        576ms ± 2%    ~     (p=0.620 n=20+20)
Tar             262ms ± 3%        263ms ± 3%    ~     (p=0.599 n=19+18)
XML             322ms ± 4%        322ms ± 2%    ~     (p=0.512 n=20+20)

name      old user-time/op  new user-time/op  delta
Template        403ms ± 3%        399ms ± 5%    ~     (p=0.149 n=17+20)
Unicode         217ms ±12%        217ms ± 9%    ~     (p=0.883 n=20+20)
GoTypes         1.24s ± 3%        1.24s ± 3%    ~     (p=0.718 n=20+20)
Compiler        5.90s ± 3%        5.84s ± 5%    ~     (p=0.217 n=18+20)
SSA             14.0s ± 6%        14.1s ± 5%    ~     (p=0.235 n=19+20)
Flate           253ms ± 6%        254ms ± 5%    ~     (p=0.749 n=20+19)
GoParser        309ms ± 7%        307ms ± 5%    ~     (p=0.398 n=20+20)
Reflect         772ms ± 3%        771ms ± 3%    ~     (p=0.901 n=20+19)
Tar             368ms ± 5%        369ms ± 8%    ~     (p=0.429 n=20+20)
XML             435ms ± 5%        434ms ± 5%    ~     (p=0.841 n=20+20)

name      old alloc/op      new alloc/op      delta
Template       39.0MB ± 0%       38.9MB ± 0%  -0.21%  (p=0.000 n=20+19)
Unicode        29.0MB ± 0%       29.0MB ± 0%  -0.03%  (p=0.000 n=20+20)
GoTypes         116MB ± 0%        115MB ± 0%  -0.33%  (p=0.000 n=20+20)
Compiler        498MB ± 0%        496MB ± 0%  -0.37%  (p=0.000 n=19+20)
SSA            1.41GB ± 0%       1.40GB ± 0%  -0.24%  (p=0.000 n=20+20)
Flate          25.0MB ± 0%       25.0MB ± 0%  -0.22%  (p=0.000 n=20+19)
GoParser       31.0MB ± 0%       30.9MB ± 0%  -0.23%  (p=0.000 n=20+17)
Reflect        77.1MB ± 0%       77.0MB ± 0%  -0.12%  (p=0.000 n=20+20)
Tar            39.7MB ± 0%       39.6MB ± 0%  -0.17%  (p=0.000 n=20+20)
XML            44.9MB ± 0%       44.8MB ± 0%  -0.29%  (p=0.000 n=20+20)

name      old allocs/op     new allocs/op     delta
Template         386k ± 0%         385k ± 0%  -0.28%  (p=0.000 n=20+20)
Unicode          337k ± 0%         336k ± 0%  -0.07%  (p=0.000 n=20+20)
GoTypes         1.20M ± 0%        1.20M ± 0%  -0.41%  (p=0.000 n=20+20)
Compiler        4.71M ± 0%        4.68M ± 0%  -0.52%  (p=0.000 n=20+20)
SSA             11.7M ± 0%        11.6M ± 0%  -0.31%  (p=0.000 n=20+19)
Flate            238k ± 0%         237k ± 0%  -0.28%  (p=0.000 n=18+20)
GoParser         320k ± 0%         319k ± 0%  -0.34%  (p=0.000 n=20+19)
Reflect          961k ± 0%         959k ± 0%  -0.12%  (p=0.000 n=20+20)
Tar              397k ± 0%         396k ± 0%  -0.23%  (p=0.000 n=20+20)
XML              419k ± 0%         417k ± 0%  -0.39%  (p=0.000 n=20+19)

Change-Id: Ic7ec3614808d9892c1cab3991b996b7a3b8eff21
Reviewed-on: https://go-review.googlesource.com/102676
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-27 18:03:39 +00:00
HaraldNordgren a42ea51ae9 cmd/go: print each import error only once
This change prevents import errors from being printed multiple times.
Creating a bare-bones package 'p' with only one file importing itself
and running 'go build p', the current implementation gives this error
message:

	can't load package: import cycle not allowed
	package p
		imports p
	import cycle not allowed
	package p
		imports p

With this change we will show the message only once.

Updates #23295

Change-Id: I653b34c1c06c279f3df514f12ec0b89745a7e64a
Reviewed-on: https://go-review.googlesource.com/86535
Reviewed-by: Harald Nordgren <haraldnordgren@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-27 14:07:14 +00:00
Alberto Donizetti b63b0f2b75 cmd/compile: allocate less in regalloc's liveValues
Instrumenting the compiler shows that, at the end of liveValues, the
values of the workList's cap are distributed as:

  cap	 freq
  1  	 0.006
  2  	 0.002
  4  	 0.237
  8  	 0.272
  16	 0.254
  32	 0.141
  64	 0.062
  128	 0.02
  256	 0.005
  512	 0.001
  1024	 0.0

Since the initial workList slice allocation is always on the stack
(as the variable does not escape), we can aggressively pre-allocate a
big backing array at (almost) no cost. This will save several
allocations in liveValues calls that end up having a large workList,
with no performance penalties for calls that have a small workList.

name      old time/op       new time/op       delta
Template        284ms ± 3%        282ms ± 3%    ~     (p=0.201 n=20+20)
Unicode         138ms ± 7%        138ms ± 7%    ~     (p=0.718 n=20+20)
GoTypes         905ms ± 2%        895ms ± 1%  -1.10%  (p=0.003 n=19+18)
Compiler        4.26s ± 1%        4.25s ± 1%  -0.38%  (p=0.038 n=20+19)
SSA             9.85s ± 2%        9.80s ± 1%    ~     (p=0.061 n=20+19)
Flate           187ms ± 6%        186ms ± 5%    ~     (p=0.289 n=20+20)
GoParser        227ms ± 3%        225ms ± 3%    ~     (p=0.072 n=20+20)
Reflect         578ms ± 2%        575ms ± 2%    ~     (p=0.059 n=18+20)
Tar             263ms ± 2%        265ms ± 3%    ~     (p=0.224 n=19+20)
XML             323ms ± 3%        325ms ± 2%    ~     (p=0.127 n=20+20)

name      old user-time/op  new user-time/op  delta
Template        406ms ± 6%        404ms ± 4%    ~     (p=0.314 n=20+20)
Unicode         220ms ± 6%        215ms ±11%    ~     (p=0.077 n=18+20)
GoTypes         1.25s ± 3%        1.24s ± 4%    ~     (p=0.461 n=20+20)
Compiler        5.95s ± 2%        5.84s ± 5%  -1.93%  (p=0.007 n=20+20)
SSA             14.4s ± 4%        14.2s ± 4%    ~     (p=0.108 n=20+20)
Flate           257ms ± 6%        252ms ± 9%    ~     (p=0.063 n=20+20)
GoParser        317ms ± 5%        312ms ± 6%  -1.85%  (p=0.049 n=20+20)
Reflect         779ms ± 2%        774ms ± 3%    ~     (p=0.253 n=20+20)
Tar             371ms ± 4%        374ms ± 4%    ~     (p=0.327 n=20+20)
XML             440ms ± 5%        442ms ± 5%    ~     (p=0.678 n=20+20)

name      old alloc/op      new alloc/op      delta
Template       39.4MB ± 0%       39.0MB ± 0%  -0.96%  (p=0.000 n=20+20)
Unicode        29.1MB ± 0%       29.0MB ± 0%  -0.13%  (p=0.000 n=20+20)
GoTypes         117MB ± 0%        116MB ± 0%  -0.88%  (p=0.000 n=20+20)
Compiler        502MB ± 0%        498MB ± 0%  -0.77%  (p=0.000 n=19+20)
SSA            1.42GB ± 0%       1.40GB ± 0%  -0.80%  (p=0.000 n=20+20)
Flate          25.3MB ± 0%       25.0MB ± 0%  -1.10%  (p=0.000 n=20+19)
GoParser       31.3MB ± 0%       31.0MB ± 0%  -1.05%  (p=0.000 n=20+20)
Reflect        77.9MB ± 0%       77.1MB ± 0%  -1.03%  (p=0.000 n=20+20)
Tar            40.0MB ± 0%       39.7MB ± 0%  -0.80%  (p=0.000 n=20+20)
XML            45.2MB ± 0%       44.9MB ± 0%  -0.72%  (p=0.000 n=20+20)

name      old allocs/op     new allocs/op     delta
Template         392k ± 0%         386k ± 0%  -1.44%  (p=0.000 n=20+20)
Unicode          337k ± 0%         337k ± 0%  -0.22%  (p=0.000 n=20+20)
GoTypes         1.22M ± 0%        1.20M ± 0%  -1.33%  (p=0.000 n=20+20)
Compiler        4.76M ± 0%        4.71M ± 0%  -1.12%  (p=0.000 n=20+20)
SSA             11.8M ± 0%        11.7M ± 0%  -1.00%  (p=0.000 n=20+20)
Flate            241k ± 0%         238k ± 0%  -1.49%  (p=0.000 n=20+20)
GoParser         324k ± 0%         320k ± 0%  -1.17%  (p=0.000 n=20+20)
Reflect          981k ± 0%         961k ± 0%  -2.11%  (p=0.000 n=20+20)
Tar              402k ± 0%         397k ± 0%  -1.29%  (p=0.000 n=20+20)
XML              424k ± 0%         419k ± 0%  -1.10%  (p=0.000 n=19+20)

Change-Id: If46667ae98eee2d47a615cad05e18df0629d8388
Reviewed-on: https://go-review.googlesource.com/102495
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-27 08:16:50 +00:00
Alberto Donizetti fdee46ee70 cmd/compile: use more ORs in generic.rules
No changes in the actual generated compiler code.

Change-Id: I206a7bf7b60f70a73640119fc92974f79ed95a6b
Reviewed-on: https://go-review.googlesource.com/102416
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2018-03-27 07:49:49 +00:00
Tim Wright 131901e80d cmd/go, cmd/link, runtime: enable PIE build mode, cgo race tests on FreeBSD
Fixes #24546

Change-Id: I99ebd5bc18e5c5e42eee4689644a7a8b02405f31
Reviewed-on: https://go-review.googlesource.com/102616
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-27 02:50:29 +00:00
Brad Fitzpatrick 48db2c01b4 all: use strings.Builder instead of bytes.Buffer where appropriate
I grepped for "bytes.Buffer" and "buf.String" and mostly ignored test
files. I skipped a few on purpose and probably missed a few others,
but otherwise I think this should be most of them.

Updates #18990

Change-Id: I5a6ae4296b87b416d8da02d7bfaf981d8cc14774
Reviewed-on: https://go-review.googlesource.com/102479
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-26 23:05:53 +00:00
Hana Kim f0eca373be cmd/trace: add /userspans, /userspan pages
Change-Id: Ifbefb659a8df3b079d69679871af444b179deaeb
Reviewed-on: https://go-review.googlesource.com/102599
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-26 22:30:52 +00:00
Austin Clements 2d8181e7b5 cmd/compile: clarify unsigned interpretation of AuxInt
The way Value.AuxInt represents unsigned numbers is currently
documented in genericOps.go, which is not the most obvious place for
it. Move that documentation to Value.AuxInt. Furthermore, to make it
harder to use incorrectly, introduce a Value.AuxUnsigned accessor that
returns the zero-extended value of Value.AuxInt.

Change-Id: I85030c3c68761404058a430e0b1c7464591b2f42
Reviewed-on: https://go-review.googlesource.com/102597
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-26 19:08:24 +00:00
Tobias Klauser a29e25b82c cmd/cgo: add support for GOARCH=sparc64
Even though GOARCH=sparc64 is not supported by gc (yet), it is easy to
make cgo already support it.

This e.g. allows to generate Go type definitions for linux/sparc64 in
the golang.org/x/sys/unix package without using gccgo.

Change-Id: I8886c81e7c895a0d93e350d81ed653fb59d95dd8
Reviewed-on: https://go-review.googlesource.com/102555
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-26 18:58:51 +00:00
David Chase a934e34e87 cmd/compile: invoke gdb more carefully in ssa/debug_test.go
Gdb can be sensitive to contents of .gdbinit, and to run
this test properly needs to have runtime/runtime-gdb.py
on the auto load safe path.  Therefore, turn off .gdbinit
loading and explicitly add $GOROOT/runtime to the safe
load path.

This should make ssa/debug_test.go run more consistently.

Updates #24464.

Change-Id: I63ed17c032cb3773048713ce51fca3a3f86e79b6
Reviewed-on: https://go-review.googlesource.com/102598
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-26 18:44:52 +00:00
Ilya Tocar 3db3826a57 cmd/internal/obj/x86: use PutOpBytesLit in more places
We already replaced most loops with PutOpBytesLit where possible,
do this in a last few places.

Change-Id: I8c90de017810145a12394fa6b887755e9111b22a
Reviewed-on: https://go-review.googlesource.com/102276
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-26 18:20:18 +00:00
David Chase f7404974da cmd/compile: finish GOEXPERIMENT=preemptibleloops repair
A newish check for branch-likely on single-successor blocks
caught a case where the preemption-check inserter was
setting "likely" on an unconditional branch.

Fixed by checking for that case before setting likely.

Also removed an overconservative restriction on parallel
compilation for GOEXPERIMENT=preemptibleloops; it works
fine, it is just another control-flow transformation.

Change-Id: I8e786e6281e0631cac8d80cff67bfb6402b4d225
Reviewed-on: https://go-review.googlesource.com/102317
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-03-26 17:44:14 +00:00
Michael Munday 8afa8a3374 cmd/compile: use 32-bit comparisons where possible on s390x
We use 32-bit operations for 8- and 16-bit arithmetic, so use them
for comparisons too. This won't change performance but it is more
consistent and makes testing 8- and 16-bit comparison codegen
slightly more straightforward (for follow up CL).

Also fix a typo and add some additional double sign and zero
extension rules to remove the operations inserted by the comparison
rules.

Change-Id: I89ec1b0e09cb8be8090cf007be283ad88bba75a4
Reviewed-on: https://go-review.googlesource.com/102556
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-26 17:41:34 +00:00
Hana Kim 68a1c9c400 internal/trace: compute span stats as computing goroutine stats
Move part of UserSpan event processing from cmd/trace.analyzeAnnotations
to internal/trace.GoroutineStats that returns analyzed per-goroutine
execution information. Now the execution information includes list of
spans and their execution information.

cmd/trace.analyzeAnnotations utilizes the span execution information
from internal/trace.GoroutineStats and connects them with task
information.

Change-Id: Ib7f79a3ba652a4ae55cd81ea17565bcc7e241c5c
Reviewed-on: https://go-review.googlesource.com/101917
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Peter Weinberger <pjw@google.com>
2018-03-26 16:59:01 +00:00
Ilya Tocar 24cd112086 cmd/compile/internal/ssa: optimize away double NEG on amd64
When lowering some ops on amd64 we generate additional NEGQ.
This may result in code like this:

NEGQ R12
NEGQ R12

Optimize it away. Gain is not significant, about ~0.5% gain in geomean
in compress/flate and 200 bytes codesize reduction in go tool.

Full results below:

name                             old time/op    new time/op    delta
Encode/Digits/Huffman/1e4-6        65.8µs ± 0%    65.7µs ± 0%  -0.21%  (p=0.010 n=10+9)
Encode/Digits/Huffman/1e5-6         633µs ± 0%     632µs ± 0%    ~     (p=0.370 n=8+9)
Encode/Digits/Huffman/1e6-6        6.30ms ± 1%    6.29ms ± 1%    ~     (p=0.796 n=10+10)
Encode/Digits/Speed/1e4-6           281µs ± 0%     280µs ± 1%  -0.34%  (p=0.043 n=8+10)
Encode/Digits/Speed/1e5-6          2.66ms ± 0%    2.66ms ± 0%  -0.09%  (p=0.043 n=10+10)
Encode/Digits/Speed/1e6-6          26.3ms ± 0%    26.3ms ± 0%    ~     (p=0.190 n=10+10)
Encode/Digits/Default/1e4-6         554µs ± 0%     557µs ± 0%  +0.46%  (p=0.001 n=9+10)
Encode/Digits/Default/1e5-6        8.63ms ± 1%    8.62ms ± 1%    ~     (p=0.912 n=10+10)
Encode/Digits/Default/1e6-6        92.7ms ± 1%    92.2ms ± 1%    ~     (p=0.052 n=10+10)
Encode/Digits/Compression/1e4-6     558µs ± 1%     557µs ± 1%    ~     (p=0.481 n=10+10)
Encode/Digits/Compression/1e5-6    8.58ms ± 0%    8.61ms ± 1%    ~     (p=0.315 n=8+10)
Encode/Digits/Compression/1e6-6    92.3ms ± 1%    92.4ms ± 1%    ~     (p=0.971 n=10+10)
Encode/Twain/Huffman/1e4-6         89.5µs ± 0%    89.0µs ± 1%  -0.48%  (p=0.001 n=9+9)
Encode/Twain/Huffman/1e5-6          727µs ± 1%     728µs ± 0%    ~     (p=0.604 n=10+9)
Encode/Twain/Huffman/1e6-6         7.21ms ± 0%    7.19ms ± 1%    ~     (p=0.696 n=8+10)
Encode/Twain/Speed/1e4-6            320µs ± 1%     321µs ± 1%    ~     (p=0.353 n=10+10)
Encode/Twain/Speed/1e5-6           2.63ms ± 0%    2.62ms ± 1%  -0.33%  (p=0.016 n=8+10)
Encode/Twain/Speed/1e6-6           25.8ms ± 0%    25.8ms ± 0%    ~     (p=0.360 n=10+8)
Encode/Twain/Default/1e4-6          677µs ± 1%     671µs ± 1%  -0.88%  (p=0.000 n=10+10)
Encode/Twain/Default/1e5-6         10.5ms ± 1%    10.3ms ± 0%  -2.06%  (p=0.000 n=10+10)
Encode/Twain/Default/1e6-6          113ms ± 1%     111ms ± 1%  -1.96%  (p=0.000 n=10+9)
Encode/Twain/Compression/1e4-6      688µs ± 0%     679µs ± 1%  -1.30%  (p=0.000 n=7+10)
Encode/Twain/Compression/1e5-6     11.6ms ± 1%    11.3ms ± 1%  -2.10%  (p=0.000 n=10+10)
Encode/Twain/Compression/1e6-6      126ms ± 1%     124ms ± 0%  -1.57%  (p=0.000 n=10+10)
[Geo mean]                         3.45ms         3.44ms       -0.46%

name                             old speed      new speed      delta
Encode/Digits/Huffman/1e4-6       152MB/s ± 0%   152MB/s ± 0%  +0.21%  (p=0.009 n=10+9)
Encode/Digits/Huffman/1e5-6       158MB/s ± 0%   158MB/s ± 0%    ~     (p=0.336 n=8+9)
Encode/Digits/Huffman/1e6-6       159MB/s ± 1%   159MB/s ± 1%    ~     (p=0.781 n=10+10)
Encode/Digits/Speed/1e4-6        35.6MB/s ± 0%  35.7MB/s ± 1%  +0.34%  (p=0.020 n=8+10)
Encode/Digits/Speed/1e5-6        37.6MB/s ± 0%  37.7MB/s ± 0%  +0.09%  (p=0.049 n=10+10)
Encode/Digits/Speed/1e6-6        38.0MB/s ± 0%  38.0MB/s ± 0%    ~     (p=0.146 n=10+10)
Encode/Digits/Default/1e4-6      18.0MB/s ± 0%  18.0MB/s ± 0%  -0.45%  (p=0.002 n=9+10)
Encode/Digits/Default/1e5-6      11.6MB/s ± 1%  11.6MB/s ± 1%    ~     (p=0.644 n=10+10)
Encode/Digits/Default/1e6-6      10.8MB/s ± 1%  10.8MB/s ± 1%  +0.51%  (p=0.044 n=10+10)
Encode/Digits/Compression/1e4-6  17.9MB/s ± 1%  17.9MB/s ± 1%    ~     (p=0.468 n=10+10)
Encode/Digits/Compression/1e5-6  11.7MB/s ± 0%  11.6MB/s ± 1%    ~     (p=0.322 n=8+10)
Encode/Digits/Compression/1e6-6  10.8MB/s ± 1%  10.8MB/s ± 1%    ~     (p=0.983 n=10+10)
Encode/Twain/Huffman/1e4-6        112MB/s ± 0%   112MB/s ± 1%  +0.42%  (p=0.002 n=8+9)
Encode/Twain/Huffman/1e5-6        138MB/s ± 1%   137MB/s ± 0%    ~     (p=0.616 n=10+9)
Encode/Twain/Huffman/1e6-6        139MB/s ± 0%   139MB/s ± 1%    ~     (p=0.652 n=8+10)
Encode/Twain/Speed/1e4-6         31.3MB/s ± 1%  31.2MB/s ± 1%    ~     (p=0.342 n=10+10)
Encode/Twain/Speed/1e5-6         38.0MB/s ± 0%  38.1MB/s ± 1%  +0.33%  (p=0.011 n=8+10)
Encode/Twain/Speed/1e6-6         38.8MB/s ± 0%  38.7MB/s ± 0%    ~     (p=0.325 n=10+8)
Encode/Twain/Default/1e4-6       14.8MB/s ± 1%  14.9MB/s ± 1%  +0.88%  (p=0.000 n=10+10)
Encode/Twain/Default/1e5-6       9.48MB/s ± 1%  9.68MB/s ± 0%  +2.11%  (p=0.000 n=10+10)
Encode/Twain/Default/1e6-6       8.86MB/s ± 1%  9.03MB/s ± 1%  +1.97%  (p=0.000 n=10+9)
Encode/Twain/Compression/1e4-6   14.5MB/s ± 0%  14.7MB/s ± 1%  +1.31%  (p=0.000 n=7+10)
Encode/Twain/Compression/1e5-6   8.63MB/s ± 1%  8.82MB/s ± 1%  +2.17%  (p=0.000 n=10+10)
Encode/Twain/Compression/1e6-6   7.92MB/s ± 1%  8.05MB/s ± 1%  +1.59%  (p=0.000 n=10+10)
[Geo mean]                       29.0MB/s       29.1MB/s       +0.47%

// symSizeComp `which go` go_old:

section differences:
global text (code) = 203 bytes (0.005131%)
read-only data = 1 bytes (0.000057%)
Total difference 204 bytes (0.003297%)

Change-Id: Ie2cdfa1216472d78694fff44d215b3b8e71cf7bf
Reviewed-on: https://go-review.googlesource.com/102277
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-26 16:19:53 +00:00
Hana (Hyang-Ah) Kim ea1f483240 cmd/trace: beautify goroutine page
- Summary: also includes links to pprof data.
- Sortable table: sorting is done on server-side. The intention is
  that later, I want to add pagination feature and limit the page
  size the browser has to handle.
- Stacked horizontal bar graph to present total time breakdown.
- Human-friendly time representation.
- No dependency on external fancy javascript libraries to allow
  it to function without an internet connection.

Change-Id: I91e5c26746e59ad0329dfb61e096e11f768c7b73
Reviewed-on: https://go-review.googlesource.com/102156
Run-TryBot: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Andrew Bonventre <andybons@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-26 15:36:56 +00:00
Alberto Donizetti 2ba98f1ae9 cmd/compile: avoid some allocations in regalloc
Compilebench:
name      old time/op       new time/op       delta
Template        283ms ± 3%        281ms ± 4%    ~     (p=0.242 n=20+20)
Unicode         137ms ± 6%        135ms ± 6%    ~     (p=0.194 n=20+19)
GoTypes         890ms ± 2%        883ms ± 1%  -0.74%  (p=0.001 n=19+19)
Compiler        4.21s ± 2%        4.20s ± 2%  -0.40%  (p=0.033 n=20+19)
SSA             9.86s ± 2%        9.68s ± 1%  -1.80%  (p=0.000 n=20+19)
Flate           185ms ± 5%        185ms ± 7%    ~     (p=0.429 n=20+20)
GoParser        222ms ± 3%        222ms ± 4%    ~     (p=0.588 n=19+20)
Reflect         572ms ± 2%        570ms ± 3%    ~     (p=0.113 n=19+20)
Tar             263ms ± 4%        259ms ± 2%  -1.41%  (p=0.013 n=20+20)
XML             321ms ± 2%        321ms ± 4%    ~     (p=0.835 n=20+19)

name      old user-time/op  new user-time/op  delta
Template        400ms ± 5%        405ms ± 5%    ~     (p=0.096 n=20+20)
Unicode         217ms ± 8%        213ms ± 8%    ~     (p=0.242 n=20+20)
GoTypes         1.23s ± 3%        1.22s ± 3%    ~     (p=0.923 n=19+20)
Compiler        5.76s ± 6%        5.81s ± 2%    ~     (p=0.687 n=20+19)
SSA             14.2s ± 4%        14.0s ± 4%    ~     (p=0.121 n=20+20)
Flate           248ms ± 7%        251ms ±10%    ~     (p=0.369 n=20+20)
GoParser        308ms ± 5%        305ms ± 6%    ~     (p=0.336 n=19+20)
Reflect         771ms ± 2%        766ms ± 2%    ~     (p=0.113 n=20+19)
Tar             370ms ± 5%        362ms ± 7%  -2.06%  (p=0.036 n=19+20)
XML             435ms ± 4%        432ms ± 5%    ~     (p=0.369 n=20+20)

name      old alloc/op      new alloc/op      delta
Template       39.5MB ± 0%       39.4MB ± 0%  -0.20%  (p=0.000 n=20+20)
Unicode        29.1MB ± 0%       29.1MB ± 0%    ~     (p=0.064 n=20+20)
GoTypes         117MB ± 0%        117MB ± 0%  -0.17%  (p=0.000 n=20+20)
Compiler        503MB ± 0%        502MB ± 0%  -0.15%  (p=0.000 n=19+19)
SSA            1.42GB ± 0%       1.42GB ± 0%  -0.16%  (p=0.000 n=20+20)
Flate          25.3MB ± 0%       25.3MB ± 0%  -0.19%  (p=0.000 n=20+20)
GoParser       31.4MB ± 0%       31.3MB ± 0%  -0.14%  (p=0.000 n=20+18)
Reflect        78.1MB ± 0%       77.9MB ± 0%  -0.34%  (p=0.000 n=20+19)
Tar            40.1MB ± 0%       40.0MB ± 0%  -0.17%  (p=0.000 n=20+20)
XML            45.3MB ± 0%       45.2MB ± 0%  -0.13%  (p=0.000 n=20+20)

name      old allocs/op     new allocs/op     delta
Template         393k ± 0%         392k ± 0%  -0.21%  (p=0.000 n=20+19)
Unicode          337k ± 0%         337k ± 0%  -0.02%  (p=0.000 n=20+20)
GoTypes         1.22M ± 0%        1.22M ± 0%  -0.21%  (p=0.000 n=20+20)
Compiler        4.77M ± 0%        4.76M ± 0%  -0.16%  (p=0.000 n=20+20)
SSA             11.8M ± 0%        11.8M ± 0%  -0.12%  (p=0.000 n=20+20)
Flate            242k ± 0%         241k ± 0%  -0.20%  (p=0.000 n=20+20)
GoParser         324k ± 0%         324k ± 0%  -0.14%  (p=0.000 n=20+20)
Reflect          985k ± 0%         981k ± 0%  -0.38%  (p=0.000 n=20+20)
Tar              403k ± 0%         402k ± 0%  -0.19%  (p=0.000 n=20+20)
XML              424k ± 0%         424k ± 0%  -0.16%  (p=0.000 n=19+20)

Change-Id: I131e382b64cd6db11a9263a477d45d80c180c499
Reviewed-on: https://go-review.googlesource.com/102421
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-25 18:27:49 +00:00
Daniel Martí b1892d740e cmd/compile/internal/gc: various cleanups
Remove a couple of unnecessary var declarations, an unused sort.Sort
type, and simplify a range by using the two-name variant.

Change-Id: Ia251f634db0bfbe8b1d553b8659272ddbd13b2c3
Reviewed-on: https://go-review.googlesource.com/102336
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-24 19:44:18 +00:00
Daniel Nephin 5526ef1c51 cmd/test2json: document missing "skip" action
Change-Id: I906e61170279f0647598e2fd4fa931aac1b69288
GitHub-Last-Rev: f6df43e8e1
GitHub-Pull-Request: golang/go#24517
Reviewed-on: https://go-review.googlesource.com/102396
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-24 18:37:22 +00:00
Alberto Donizetti a27cd4fd31 test/codegen: port tbz/tbnz arm64 tests
And delete them from asm_test.

Change-Id: I34fcf85ae8ce09cd146fe4ce6a0ae7616bd97e2d
Reviewed-on: https://go-review.googlesource.com/102296
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-24 09:35:53 +00:00
Giovanni Bajo d54902ece9 cmd/compile: in prove, shortcircuit self-facts
Sometimes, we can end up calling update with a self-relation
about a variable (x REL x). In this case, there is no need
to record anything: the relation is unsatisfiable if and only
if it doesn't contain eq.

This also helps avoiding infinite loop in next CL that will
introduce transitive closure of relations.

Passes toolstash -cmp.

Change-Id: Ic408452ec1c13653f22ada35466ec98bc14aaa8e
Reviewed-on: https://go-review.googlesource.com/100276
Reviewed-by: Austin Clements <austin@google.com>
2018-03-24 03:06:21 +00:00
Giovanni Bajo 385d936fb2 cmd/compile: in prove, fail fast when unsat is found
When an unsatisfiable relation is recorded in the facts table,
there is no need to compute further relations or updates
additional data structures.

Since we're about to transitively propagate relations, make
sure to fail as fast as possible to avoid doing useless work
in dead branches.

Passes toolstash -cmp.

Change-Id: I23eed376d62776824c33088163c7ac9620abce85
Reviewed-on: https://go-review.googlesource.com/100275
Reviewed-by: Austin Clements <austin@google.com>
2018-03-24 03:06:01 +00:00
Giovanni Bajo 79112707bb cmd/compile: add patterns for bit set/clear/complement on amd64
This patch completes implementation of BT(Q|L), and adds support
for BT(S|R|C)(Q|L).

Example of code changes from time.(*Time).addSec:

        if t.wall&hasMonotonic != 0 {
  0x1073465               488b08                  MOVQ 0(AX), CX
  0x1073468               4889ca                  MOVQ CX, DX
  0x107346b               48c1e93f                SHRQ $0x3f, CX
  0x107346f               48c1e13f                SHLQ $0x3f, CX
  0x1073473               48f7c1ffffffff          TESTQ $-0x1, CX
  0x107347a               746b                    JE 0x10734e7

        if t.wall&hasMonotonic != 0 {
  0x1073435               488b08                  MOVQ 0(AX), CX
  0x1073438               480fbae13f              BTQ $0x3f, CX
  0x107343d               7363                    JAE 0x10734a2

Another example:

                        t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
  0x10734c8               4881e1ffffff3f          ANDQ $0x3fffffff, CX
  0x10734cf               48c1e61e                SHLQ $0x1e, SI
  0x10734d3               4809ce                  ORQ CX, SI
  0x10734d6               48b90000000000000080    MOVQ $0x8000000000000000, CX
  0x10734e0               4809f1                  ORQ SI, CX
  0x10734e3               488908                  MOVQ CX, 0(AX)

                        t.wall = t.wall&nsecMask | uint64(dsec)<<nsecShift | hasMonotonic
  0x107348b		4881e2ffffff3f		ANDQ $0x3fffffff, DX
  0x1073492		48c1e61e		SHLQ $0x1e, SI
  0x1073496		4809f2			ORQ SI, DX
  0x1073499		480fbaea3f		BTSQ $0x3f, DX
  0x107349e		488910			MOVQ DX, 0(AX)

Go1 benchmarks seem unaffected, and I would be surprised
otherwise:

name                     old time/op    new time/op     delta
BinaryTree17-4              2.64s ± 4%      2.56s ± 9%  -2.92%  (p=0.008 n=9+9)
Fannkuch11-4                2.90s ± 1%      2.95s ± 3%  +1.76%  (p=0.010 n=10+9)
FmtFprintfEmpty-4          35.3ns ± 1%     34.5ns ± 2%  -2.34%  (p=0.004 n=9+8)
FmtFprintfString-4         57.0ns ± 1%     58.4ns ± 5%  +2.52%  (p=0.029 n=9+10)
FmtFprintfInt-4            59.8ns ± 3%     59.8ns ± 6%    ~     (p=0.565 n=10+10)
FmtFprintfIntInt-4         93.9ns ± 3%     91.2ns ± 5%  -2.94%  (p=0.014 n=10+9)
FmtFprintfPrefixedInt-4     107ns ± 6%      104ns ± 6%    ~     (p=0.099 n=10+10)
FmtFprintfFloat-4           187ns ± 3%      188ns ± 3%    ~     (p=0.505 n=10+9)
FmtManyArgs-4               410ns ± 1%      415ns ± 6%    ~     (p=0.649 n=8+10)
GobDecode-4                5.30ms ± 3%     5.27ms ± 3%    ~     (p=0.436 n=10+10)
GobEncode-4                4.62ms ± 5%     4.47ms ± 2%  -3.24%  (p=0.001 n=9+10)
Gzip-4                      197ms ± 4%      193ms ± 3%    ~     (p=0.123 n=10+10)
Gunzip-4                   30.4ms ± 3%     30.1ms ± 3%    ~     (p=0.481 n=10+10)
HTTPClientServer-4         76.3µs ± 1%     76.0µs ± 1%    ~     (p=0.236 n=8+9)
JSONEncode-4               10.5ms ± 9%     10.3ms ± 3%    ~     (p=0.280 n=10+10)
JSONDecode-4               42.3ms ±10%     41.3ms ± 2%    ~     (p=0.053 n=9+10)
Mandelbrot200-4            3.80ms ± 2%     3.72ms ± 2%  -2.15%  (p=0.001 n=9+10)
GoParse-4                  2.88ms ±10%     2.81ms ± 2%    ~     (p=0.247 n=10+10)
RegexpMatchEasy0_32-4      69.5ns ± 4%     68.6ns ± 2%    ~     (p=0.171 n=10+10)
RegexpMatchEasy0_1K-4       165ns ± 3%      162ns ± 3%    ~     (p=0.137 n=10+10)
RegexpMatchEasy1_32-4      65.7ns ± 6%     64.4ns ± 2%  -2.02%  (p=0.037 n=10+10)
RegexpMatchEasy1_1K-4       278ns ± 2%      279ns ± 3%    ~     (p=0.991 n=8+9)
RegexpMatchMedium_32-4     99.3ns ± 3%     98.5ns ± 4%    ~     (p=0.457 n=10+9)
RegexpMatchMedium_1K-4     30.1µs ± 1%     30.4µs ± 2%    ~     (p=0.173 n=8+10)
RegexpMatchHard_32-4       1.40µs ± 2%     1.41µs ± 4%    ~     (p=0.565 n=10+10)
RegexpMatchHard_1K-4       42.5µs ± 1%     41.5µs ± 3%  -2.13%  (p=0.002 n=8+9)
Revcomp-4                   332ms ± 4%      328ms ± 5%    ~     (p=0.720 n=9+10)
Template-4                 48.3ms ± 2%     49.6ms ± 3%  +2.56%  (p=0.002 n=8+10)
TimeParse-4                 252ns ± 2%      249ns ± 3%    ~     (p=0.116 n=9+10)
TimeFormat-4                262ns ± 4%      252ns ± 3%  -4.01%  (p=0.000 n=9+10)

name                     old speed      new speed       delta
GobDecode-4               145MB/s ± 3%    146MB/s ± 3%    ~     (p=0.436 n=10+10)
GobEncode-4               166MB/s ± 5%    172MB/s ± 2%  +3.28%  (p=0.001 n=9+10)
Gzip-4                   98.6MB/s ± 4%  100.4MB/s ± 3%    ~     (p=0.123 n=10+10)
Gunzip-4                  639MB/s ± 3%    645MB/s ± 3%    ~     (p=0.481 n=10+10)
JSONEncode-4              185MB/s ± 8%    189MB/s ± 3%    ~     (p=0.280 n=10+10)
JSONDecode-4             46.0MB/s ± 9%   47.0MB/s ± 2%  +2.21%  (p=0.046 n=9+10)
GoParse-4                20.1MB/s ± 9%   20.6MB/s ± 2%    ~     (p=0.239 n=10+10)
RegexpMatchEasy0_32-4     460MB/s ± 4%    467MB/s ± 2%    ~     (p=0.165 n=10+10)
RegexpMatchEasy0_1K-4    6.19GB/s ± 3%   6.28GB/s ± 3%    ~     (p=0.165 n=10+10)
RegexpMatchEasy1_32-4     487MB/s ± 5%    497MB/s ± 2%  +2.00%  (p=0.043 n=10+10)
RegexpMatchEasy1_1K-4    3.67GB/s ± 2%   3.67GB/s ± 3%    ~     (p=0.963 n=8+9)
RegexpMatchMedium_32-4   10.1MB/s ± 3%   10.1MB/s ± 4%    ~     (p=0.435 n=10+9)
RegexpMatchMedium_1K-4   34.0MB/s ± 1%   33.7MB/s ± 2%    ~     (p=0.173 n=8+10)
RegexpMatchHard_32-4     22.9MB/s ± 2%   22.7MB/s ± 4%    ~     (p=0.565 n=10+10)
RegexpMatchHard_1K-4     24.0MB/s ± 3%   24.7MB/s ± 3%  +2.64%  (p=0.001 n=9+9)
Revcomp-4                 766MB/s ± 4%    775MB/s ± 5%    ~     (p=0.720 n=9+10)
Template-4               40.2MB/s ± 2%   39.2MB/s ± 3%  -2.47%  (p=0.002 n=8+10)

The rules match ~1800 times during all.bash.

Fixes #18943

Change-Id: I64be1ada34e89c486dfd935bf429b35652117ed4
Reviewed-on: https://go-review.googlesource.com/94766
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-24 02:38:50 +00:00
isharipo 3afd2d7fc8 cmd/compile/internal/gc: properly initialize ssa.Func Type field
The ssa.Func has Type field that is described as
function signature type.

It never gets any value and remains nil.
This leads to "<T>" signature printed representation.

Given this function declaration:
	func foo(x int, f func() string) (int, error)

GOSSAFUNC printed it as below:
	compiling foo
	foo <T>

After this change:
	compiling foo
	foo func(int, func() string) (int, error)

Change-Id: Iec5eec8aac5c76ff184659e30f41b2f5fe86d329
Reviewed-on: https://go-review.googlesource.com/102375
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-03-24 02:18:52 +00:00
Matthew Dempsky ea668e18a6 cmd/compile: always write pack files
By always writing out pack files, the object file format can be
simplified somewhat. In particular, the export data format will no
longer require escaping, because the pack file provides appropriate
framing.

This CL does not affect build systems that use -pack, which includes
all major Go build systems (cmd/go, gb, bazel).

Also, existing package import logic already distinguishes pack/object
files based on file contents rather than file extension.

The only exception is cmd/pack, which specially handled object files
created by cmd/compile when used with the 'c' mode. This mode is
extended to now recognize the pack files produced by cmd/compile and
handle them as before.

Passes toolstash-check.

Updates #21705.
Updates #24512.

Change-Id: Idf131013bfebd73a5cde7e087eb19964503a9422
Reviewed-on: https://go-review.googlesource.com/102236
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-24 00:51:24 +00:00
Matthew Dempsky 699b0d4e52 cmd/link: skip __.PKGDEF in archives
The __.PKGDEF file is a compiler object file only intended for other
compilers. Also, for build systems that use -linkobj, all of the
information it contains is present within the linker object files
already, so look for it there instead.

This requires a little bit of code reorganization. Significantly,
previously when loading an archive file, the __.PKGDEF file was
authoritative on whether the package was "main" and/or "safe". Now
that we're using the Go object files instead, there's the issue that
there can be multiple Go object files in an archive (because when
using assembly, each assembly file becomes its own additional object
file).

The solution taken here is to check if any object file within the
package declares itself as "main" and/or "safe".

Updates #24512.

Change-Id: I70243a293bdf34b8555c0bf1833f8933b2809449
Reviewed-on: https://go-review.googlesource.com/102281
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-24 00:17:33 +00:00
Matthew Dempsky 946af1b658 cmd/link: make sure we're hashing __.PKGDEF in genhash
This is currently always the case because loadobjfile complains if
it's not, but that will be changed soon.

Updates #24512.

Change-Id: I262daca765932a0f4cea3fcc1cc80ca90de07a59
Reviewed-on: https://go-review.googlesource.com/102280
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-23 22:28:03 +00:00
Hyang-Ah Hana Kim 58734039bd Revert "cmd/vendor/.../pprof: refresh from upstream@a74ae6f"
This reverts commit c6e69ec7f9.

Reason for revert: Broke builders. #24508

Change-Id: I66abff0dd14ec6e1f8d8d982ccfb0438633b639d
Reviewed-on: https://go-review.googlesource.com/102316
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
2018-03-23 15:09:04 +00:00
Hana (Hyang-Ah) Kim c6e69ec7f9 cmd/vendor/.../pprof: refresh from upstream@a74ae6f
Merges updates listed in
0e0e5b725...a74ae6f

Update #24443

cmd/vendor/vendor.json was updated manually.

Change-Id: I15d5fe82ac18263d4d54f5773cee0e197e93dd59
Reviewed-on: https://go-review.googlesource.com/101736
Reviewed-by: Alberto Donizetti <alb.donizetti@gmail.com>
2018-03-23 14:36:29 +00:00
Matthew Dempsky 50921bfa2e cmd/compile: change unsafeUintptrTag from var to const
Change-Id: Ie30878199e24cce5b75428e6b602c017ebd16642
Reviewed-on: https://go-review.googlesource.com/102175
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2018-03-22 19:38:06 +00:00
Daniel Martí 02798ed936 cmd/compile: use more range fors in gc
Slightly simplifies the code. Made sure to exclude the cases that would
change behavior, such as when the iterated value is a string, when the
index is modified within the body, or when the slice is modified.

Also checked that all the elements are of pointer type, to avoid the
corner case where non-pointer types could be copied by mistake.

Change-Id: Iea64feb2a9a6a4c94ada9ff3ace40ee173505849
Reviewed-on: https://go-review.googlesource.com/100557
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-03-22 18:38:19 +00:00
Austin Clements 48f990b4a5 cmd/compile: fix GOEXPERIMENT=preemptibleloops type-checking
This experiment has gone stale. It causes a type-checking failure
because the condition of the OIF produced by range loop lowering has
type "untyped bool". Fix this by typechecking the whole OIF statement,
not just its condition.

This doesn't quite fix the whole experiment, but it gets further.
Something about preemption point insertion is causing failures like
"internal compiler error: likeliness prediction 1 for block b10 with 1
successors" in cmd/compile/internal/gc.

Change-Id: I7d80d618d7c91c338bf5f2a8dc174d582a479df3
Reviewed-on: https://go-review.googlesource.com/102157
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-03-22 18:20:31 +00:00
Travis Bischel 4f7b774822 cmd/compile: specialize Move up to 79B on amd64
Move currently uses mov instructions directly up to 31 bytes and then
switches to duffcopy. Moving 31 bytes is 4 instructions corresponding to
two loads and two stores, (or 6 if !useSSE) depending on the usage,
duffcopy is five (one or two mov, two or three lea, one call).

This adds direct mov instructions for Move's of size 32, 48, and 64 with
sse and for only size 32 without.
With useSSE:
- 32 is 4 instructions (byte +/- comparison below)
- 33 thru 48 is 6
- 49 thru 64 is 8

Without:
- 32 is 8

Note that the only platform with useSSE set to false is plan 9. I have
built three projects based off tip and tip with this patch and the
project's byte size is equal to or less than they were prior.

The basis of this change is that copying data with instructions directly
is nearly free, whereas calling into duffcopy adds a bit of overhead.
This is most noticeable in range statements where elements are 32+
bytes. For code with the following pattern:

func Benchmark32Range(b *testing.B) {
        var f s32
        for _, count := range []int{10, 100, 1000, 10000} {
                name := strconv.Itoa(count)
                b.Run(name, func(b *testing.B) {
                        base := make([]s32, count)
                        for i := 0; i < b.N; i++ {
                                for _, v := range base {
                                        f = v
                                }
                        }
                })
        }
        _ = f
}

These are the resulting benchmarks:
Benchmark16Range/10-4        19.1          19.1          +0.00%
Benchmark16Range/100-4       169           170           +0.59%
Benchmark16Range/1000-4      1684          1691          +0.42%
Benchmark16Range/10000-4     18147         18124         -0.13%
Benchmark31Range/10-4        141           142           +0.71%
Benchmark31Range/100-4       1407          1410          +0.21%
Benchmark31Range/1000-4      14070         14074         +0.03%
Benchmark31Range/10000-4     141781        141759        -0.02%
Benchmark32Range/10-4        71.4          32.2          -54.90%
Benchmark32Range/100-4       695           326           -53.09%
Benchmark32Range/1000-4      7166          3313          -53.77%
Benchmark32Range/10000-4     72571         35425         -51.19%
Benchmark64Range/10-4        87.8          64.9          -26.08%
Benchmark64Range/100-4       868           629           -27.53%
Benchmark64Range/1000-4      9355          6907          -26.17%
Benchmark64Range/10000-4     94463         70385         -25.49%
Benchmark79Range/10-4        177           152           -14.12%
Benchmark79Range/100-4       1769          1531          -13.45%
Benchmark79Range/1000-4      17893         15532         -13.20%
Benchmark79Range/10000-4     178947        155551        -13.07%
Benchmark80Range/10-4        99.6          99.7          +0.10%
Benchmark80Range/100-4       987           985           -0.20%
Benchmark80Range/1000-4      10573         10560         -0.12%
Benchmark80Range/10000-4     106792        106639        -0.14%

For runtime's BenchCopyFat* benchmarks:
CopyFat8-4     0.40ns ± 0%  0.40ns ± 0%      ~     (all equal)
CopyFat12-4    0.40ns ± 0%  0.80ns ± 0%  +100.00%  (p=0.000 n=9+9)
CopyFat16-4    0.40ns ± 0%  0.80ns ± 0%  +100.00%  (p=0.000 n=10+8)
CopyFat24-4    0.80ns ± 0%  0.40ns ± 0%   -50.00%  (p=0.001 n=8+9)
CopyFat32-4    2.01ns ± 0%  0.40ns ± 0%   -80.10%  (p=0.000 n=8+8)
CopyFat64-4    2.87ns ± 0%  0.40ns ± 0%   -86.07%  (p=0.000 n=8+10)
CopyFat128-4   4.82ns ± 0%  4.82ns ± 0%      ~     (p=1.000 n=8+8)
CopyFat256-4   8.83ns ± 0%  8.83ns ± 0%      ~     (p=1.000 n=8+8)
CopyFat512-4   16.9ns ± 0%  16.9ns ± 0%      ~     (all equal)
CopyFat520-4   14.6ns ± 0%  14.6ns ± 1%      ~     (p=0.529 n=8+9)
CopyFat1024-4  32.9ns ± 0%  33.0ns ± 0%    +0.20%  (p=0.041 n=8+9)

Function calls are not benefitted as much due how they are compiled, but
other benchmarks I ran show that calling function with 64 byte elements
is marginally improved.

The main downside with this change is that it may increase binary sizes
depending on the size of the copy, but this change also decreases
binaries for moves of 48 bytes or less.

For the following code:
package main

type size [32]byte

//go:noinline
func use(t size) {
}

//go:noinline
func get() size {
	var z size
	return z
}

func main() {
	var a size
	use(a)
}

Changing size around gives the following assembly leading up to the call
(the initialization and actual call are removed):

tip func call with 32B arg: 27B
    48 89 e7                 mov    %rsp,%rdi
    48 8d 74 24 20           lea    0x20(%rsp),%rsi
    48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
    48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
    e8 53 ab ff ff           callq  448964 <runtime.duffcopy+0x364>
    48 8b 6d 00              mov    0x0(%rbp),%rbp

modified: 19B (-8B)
    0f 10 44 24 20           movups 0x20(%rsp),%xmm0
    0f 11 04 24              movups %xmm0,(%rsp)
    0f 10 44 24 30           movups 0x30(%rsp),%xmm0
    0f 11 44 24 10           movups %xmm0,0x10(%rsp)
-
tip with 47B arg: 29B
    48 8d 7c 24 0f           lea    0xf(%rsp),%rdi
    48 8d 74 24 40           lea    0x40(%rsp),%rsi
    48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
    48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
    e8 43 ab ff ff           callq  448964 <runtime.duffcopy+0x364>
    48 8b 6d 00              mov    0x0(%rbp),%rbp

modified: 20B (-9B)
    0f 10 44 24 40           movups 0x40(%rsp),%xmm0
    0f 11 44 24 0f           movups %xmm0,0xf(%rsp)
    0f 10 44 24 50           movups 0x50(%rsp),%xmm0
    0f 11 44 24 1f           movups %xmm0,0x1f(%rsp)
-
tip with 64B arg: 27B
    48 89 e7                 mov    %rsp,%rdi
    48 8d 74 24 40           lea    0x40(%rsp),%rsi
    48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
    48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
    e8 1f ab ff ff           callq  448948 <runtime.duffcopy+0x348>
    48 8b 6d 00              mov    0x0(%rbp),%rbp

modified: 39B [+12B]
    0f 10 44 24 40           movups 0x40(%rsp),%xmm0
    0f 11 04 24              movups %xmm0,(%rsp)
    0f 10 44 24 50           movups 0x50(%rsp),%xmm0
    0f 11 44 24 10           movups %xmm0,0x10(%rsp)
    0f 10 44 24 60           movups 0x60(%rsp),%xmm0
    0f 11 44 24 20           movups %xmm0,0x20(%rsp)
    0f 10 44 24 70           movups 0x70(%rsp),%xmm0
    0f 11 44 24 30           movups %xmm0,0x30(%rsp)
-
tip with 79B arg: 29B
    48 8d 7c 24 0f           lea    0xf(%rsp),%rdi
    48 8d 74 24 60           lea    0x60(%rsp),%rsi
    48 89 6c 24 f0           mov    %rbp,-0x10(%rsp)
    48 8d 6c 24 f0           lea    -0x10(%rsp),%rbp
    e8 09 ab ff ff           callq  448948 <runtime.duffcopy+0x348>
    48 8b 6d 00              mov    0x0(%rbp),%rbp

modified: 46B [+17B]
    0f 10 44 24 60           movups 0x60(%rsp),%xmm0
    0f 11 44 24 0f           movups %xmm0,0xf(%rsp)
    0f 10 44 24 70           movups 0x70(%rsp),%xmm0
    0f 11 44 24 1f           movups %xmm0,0x1f(%rsp)
    0f 10 84 24 80 00 00     movups 0x80(%rsp),%xmm0
    00
    0f 11 44 24 2f           movups %xmm0,0x2f(%rsp)
    0f 10 84 24 90 00 00     movups 0x90(%rsp),%xmm0
    00
    0f 11 44 24 3f           movups %xmm0,0x3f(%rsp)

So, at best we save 9B, at worst we gain 17. I do not think that copying
around 65+B sized types is common enough to bloat program sizes. Using
bincmp on the go binary itself shows a zero byte difference; there are
gains and losses all over. One of the largest gains in binary size comes
from cmd/go/internal/cache.(*Cache).Get, which passes around a 64 byte
sized type -- this is one of the cases I would expect to be benefitted
by this change.

I think that this marginal improvement in struct copying for 64 byte
structs is worth it: most data structs / work items I use in my programs
are small, but few are smaller than 32 bytes: with one slice, the budget
is up. The 32 rule alone would allow another 16 bytes, the 48 and 64
rules allow another 32 and 48.

Change-Id: I19a8f9190d5d41825091f17f268f4763bfc12a62
Reviewed-on: https://go-review.googlesource.com/100718
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-22 18:17:37 +00:00
Alberto Donizetti fc6280d4b0 test/codegen: port direct comparisons with memory tests
And remove them from asm_test.

Change-Id: I1ca29b40546d6de06f20bfd550ed8ff87f495454
Reviewed-on: https://go-review.googlesource.com/102115
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-03-22 17:20:09 +00:00
Carlos Eduardo Seo 6633bb2aa7 cmd/compile/internal/ppc64, runtime internal/atomic, sync/atomic: implement faster atomics for ppc64x
This change implements faster atomics for ppc64x based on the ISA 2.07B,
Appendix B.2 recommendations, replacing SYNC/ISYNC by LWSYNC in some
cases.

Updates #21348

name                                           old time/op new time/op    delta
Cond1-16                                           955ns     856ns      -10.33%
Cond2-16                                          2.38µs    2.03µs      -14.59%
Cond4-16                                          5.90µs    5.44µs       -7.88%
Cond8-16                                          12.1µs    11.1µs       -8.42%
Cond16-16                                         27.0µs    25.1µs       -7.04%
Cond32-16                                         59.1µs    55.5µs       -6.14%
LoadMostlyHits/*sync_test.DeepCopyMap-16          22.1ns    24.1ns       +9.02%
LoadMostlyHits/*sync_test.RWMutexMap-16            252ns     249ns       -1.20%
LoadMostlyHits/*sync.Map-16                       16.2ns    16.3ns         ~
LoadMostlyMisses/*sync_test.DeepCopyMap-16        22.3ns    22.6ns         ~
LoadMostlyMisses/*sync_test.RWMutexMap-16          249ns     247ns       -0.51%
LoadMostlyMisses/*sync.Map-16                     12.7ns    12.7ns         ~
LoadOrStoreBalanced/*sync_test.RWMutexMap-16      1.27µs    1.17µs       -7.54%
LoadOrStoreBalanced/*sync.Map-16                  1.12µs    1.10µs       -2.35%
LoadOrStoreUnique/*sync_test.RWMutexMap-16        1.75µs    1.68µs       -3.84%
LoadOrStoreUnique/*sync.Map-16                    2.07µs    1.97µs       -5.13%
LoadOrStoreCollision/*sync_test.DeepCopyMap-16    15.8ns    15.9ns         ~
LoadOrStoreCollision/*sync_test.RWMutexMap-16      496ns     424ns      -14.48%
LoadOrStoreCollision/*sync.Map-16                 6.07ns    6.07ns         ~
Range/*sync_test.DeepCopyMap-16                   1.65µs    1.64µs         ~
Range/*sync_test.RWMutexMap-16                     278µs     288µs       +3.75%
Range/*sync.Map-16                                2.00µs    2.01µs         ~
AdversarialAlloc/*sync_test.DeepCopyMap-16        3.45µs    3.44µs         ~
AdversarialAlloc/*sync_test.RWMutexMap-16          226ns     227ns         ~
AdversarialAlloc/*sync.Map-16                     1.09µs    1.07µs       -2.36%
AdversarialDelete/*sync_test.DeepCopyMap-16        553ns     550ns       -0.57%
AdversarialDelete/*sync_test.RWMutexMap-16         273ns     274ns         ~
AdversarialDelete/*sync.Map-16                     247ns     249ns         ~
UncontendedSemaphore-16                           79.0ns    65.5ns      -17.11%
ContendedSemaphore-16                              112ns      97ns      -13.77%
MutexUncontended-16                               3.34ns    2.51ns      -24.69%
Mutex-16                                           266ns     191ns      -28.26%
MutexSlack-16                                      226ns     159ns      -29.55%
MutexWork-16                                       377ns     338ns      -10.14%
MutexWorkSlack-16                                  335ns     308ns       -8.20%
MutexNoSpin-16                                     196ns     184ns       -5.91%
MutexSpin-16                                       710ns     666ns       -6.21%
Once-16                                           1.29ns    1.29ns         ~
Pool-16                                           8.64ns    8.71ns         ~
PoolOverflow-16                                   1.60µs    1.44µs      -10.25%
SemaUncontended-16                                5.39ns    4.42ns      -17.96%
SemaSyntNonblock-16                                539ns     483ns      -10.42%
SemaSyntBlock-16                                   413ns     354ns      -14.20%
SemaWorkNonblock-16                                305ns     258ns      -15.36%
SemaWorkBlock-16                                   266ns     229ns      -14.06%
RWMutexUncontended-16                             12.9ns     9.7ns      -24.80%
RWMutexWrite100-16                                 203ns     147ns      -27.47%
RWMutexWrite10-16                                  177ns     119ns      -32.74%
RWMutexWorkWrite100-16                             435ns     403ns       -7.39%
RWMutexWorkWrite10-16                              642ns     611ns       -4.79%
WaitGroupUncontended-16                           4.67ns    3.70ns      -20.92%
WaitGroupAddDone-16                                402ns     355ns      -11.54%
WaitGroupAddDoneWork-16                            208ns     250ns      +20.09%
WaitGroupWait-16                                  1.21ns    1.21ns         ~
WaitGroupWaitWork-16                              5.91ns    5.87ns       -0.81%
WaitGroupActuallyWait-16                          92.2ns    85.8ns       -6.91%

Updates #21348

Change-Id: Ibb9b271d11b308264103829e176c6d9fe8f867d3
Reviewed-on: https://go-review.googlesource.com/95175
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2018-03-22 14:13:01 +00:00
Tim Wright 88129f0cb2 all: enable c-shared/c-archive support for freebsd/amd64
Fixes #14327
Much of the code is based on the linux/amd64 code that implements these
build modes, and code is shared where possible.

Change-Id: Ia510f2023768c0edbc863aebc585929ec593b332
Reviewed-on: https://go-review.googlesource.com/93875
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-21 21:56:20 +00:00
isharipo ff5cf43df5 runtime,sync/atomic: replace asm BYTEs with insts for x86
For each replacement, test case is added to new 386enc.s file
with exception of EMMS, SYSENTER, MFENCE and LFENCE as they
are already covered in amd64enc.s (same on amd64 and 386).

The replacement became less obvious after go vet suggested changes
Before:
	BYTE $0x0f; BYTE $0x7f; BYTE $0x44; BYTE $0x24; BYTE $0x08
Changed to MOVQ (this form is being tested):
	MOVQ M0, 8(SP)
Refactored to FP-relative access (go vet advice):
	MOVQ M0, val+4(FP)

Change-Id: I56b87cf3371b6ad81ad0cd9db2033aee407b5818
Reviewed-on: https://go-review.googlesource.com/101475
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-03-21 20:51:04 +00:00
Daniel Martí 77c3ef6f6f cmd/doc: use empty GOPATH when running the tests
Otherwise, a populated GOPATH might result in failures such as:

	$ go test
	[...] no buildable Go source files in [...]/gopherjs/compiler/natives/src/crypto/rand
	exit status 1

Move the initialization of the dirs walker out of the init func, so that
we can control its behavior in the tests.

Updates #24464.

Change-Id: I4b26a7d3d6809bdd8e9b6b0556d566e7855f80fe
Reviewed-on: https://go-review.googlesource.com/101836
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-21 13:43:22 +00:00
Alberto Donizetti 041c5d8348 cmd/trace: remove unused variable in tests
Unused variables in closures are currently not diagnosed by the
compiler (this is Issue #3059), while go/types catches them.

One unused variable in the cmd/trace tests is causing the go/types
test that typechecks the whole standard library to fail:

  FAIL: TestStdlib (8.05s)
    stdlib_test.go:223: cmd/trace/annotations_test.go:241:6: gcTime
    declared but not used
  FAIL

Remove it.

Updates #24464

Change-Id: I0f1b9db6ae1f0130616ee649bdbfdc91e38d2184
Reviewed-on: https://go-review.googlesource.com/101815
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
2018-03-21 11:10:03 +00:00
Ilya Tocar 983dcf70ba cmd/compile/internal/ssa: update regalloc in loops
Currently we don't lift spill out of loop if loop contains call.
However often we have code like this:

for .. {
    if hard_case {
	call()
    }
    // simple case, without call
}

So instead of checking for any call, check for unavoidable call.
For #22698 cases I see:
mime/quotedprintable/Writer-6                   10.9µs ± 4%      9.2µs ± 3%   -15.02%  (p=0.000 n=8+8)
And:
compress/flate/Encode/Twain/Huffman/1e4-6       99.4µs ± 6%     90.9µs ± 0%    -8.57%  (p=0.000 n=8+8)
compress/flate/Encode/Twain/Huffman/1e5-6       760µs ± 1%      725µs ± 1%     -4.56%  (p=0.000 n=8+8)
compress/flate/Encode/Twain/Huffman/1e6-6       7.55ms ± 0%      7.24ms ± 0%     -4.07%  (p=0.000 n=8+7)

There are no significant changes on go1 benchmarks.
But for cases with runtime arch checks, where we call generic version on old hardware,
there are respectable performance gains:
math/RoundToEven-6                             1.43ns ± 0%     1.25ns ± 0%   -12.59%  (p=0.001 n=7+7)
math/bits/OnesCount64-6                        1.60ns ± 1%     1.42ns ± 1%   -11.32%  (p=0.000 n=8+8)

Also on some runtime benchmarks loops have less loads and higher performance:
runtime/RuneIterate/range1/ASCII-6             15.6ns ± 1%     13.9ns ± 1%   -10.74%  (p=0.000 n=7+8)
runtime/ArrayEqual-6                           3.22ns ± 0%     2.86ns ± 2%   -11.06%  (p=0.000 n=7+8)

Fixes #22698
Updates #22234

Change-Id: I0ae2f19787d07a9026f064366dedbe601bf7257a
Reviewed-on: https://go-review.googlesource.com/84055
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-03-20 21:02:39 +00:00
Alberto Donizetti be371edd67 test/codegen: port comparisons tests to codegen
And delete them from asm_test.

Change-Id: I64c512bfef3b3da6db5c5d29277675dade28b8ab
Reviewed-on: https://go-review.googlesource.com/101595
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-20 19:38:06 +00:00
Than McIntosh f45c07e84a cmd/compile: fix regression in DWARF inlined routine variable tracking
Fix a bug in the code that generates the pre-inlined variable
declaration table used as raw material for emitting DWARF inline
routine records. The fix for issue 23704 altered the recipe for
assigning file/line/col to variables in one part of the compiler, but
didn't update a similar recipe in the code for variable tracking.
Added a new test that should catch problems of a similar nature.

Fixes #24460.

Change-Id: I255c036637f4151aa579c0e21d123fd413724d61
Reviewed-on: https://go-review.googlesource.com/101676
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-20 18:56:52 +00:00
Michael Munday ae10914e67 cmd/compile: mark LAA and LAAG as clobbering flags on s390x
The atomic add instructions modify the condition code and so need to
be marked as clobbering flags.

Fixes #24449.

Change-Id: Ic69c8d775fbdbfb2a56c5e0cfca7a49c0d7f6897
Reviewed-on: https://go-review.googlesource.com/101455
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-03-20 09:44:50 +00:00
Fangming.Fang 9c312245ac cmd/asm: fix bug about VMOV instruction (move a vector element to another) on ARM64
This change fixes index error when encoding VMOV instruction which pattern
is vmov Vn.<T>[index], Vd.<T>[index]

Change-Id: I949166e6dfd63fb0a9365f183b6c50d452614f9d
Reviewed-on: https://go-review.googlesource.com/101335
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-20 03:45:04 +00:00
Fangming.Fang 7673e30503 cmd/asm: fix bug about VMOV instruction (move register to vector element) on ARM64
This change fixes index error when encoding VMOV instruction which pattern is
VMOV Rn, V.<T>[index]. For example VMOV R1, V1.S[1] is assembled as VMOV R1, V1.S[0]

Fixes #24400
Change-Id: I82b5edc8af4e06862bc4692b119697c6bb7dc3fb
Reviewed-on: https://go-review.googlesource.com/101297
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-20 03:43:37 +00:00
Vladimir Kuzmin c12b185a6e cmd/compile: avoid mapaccess at m[k]=append(m[k]..
Currently rvalue m[k] is transformed during walk into:

        tmp1 := *mapaccess(m, k)
        tmp2 := append(tmp1, ...)
        *mapassign(m, k) = tmp2

However, this is suboptimal, as we could instead produce just:
        tmp := mapassign(m, k)
        *tmp := append(*tmp, ...)

Optimization is possible only if during Order it may tell that m[k] is
exactly the same at left and right part of assignment. It doesn't work:
1) m[f(k)] = append(m[f(k)], ...)
2) sink, m[k] = sink, append(m[k]...)
3) m[k] = append(..., m[k],...)

Benchmark:
name                           old time/op    new time/op    delta
MapAppendAssign/Int32/256-8      33.5ns ± 3%    22.4ns ±10%  -33.24%  (p=0.000 n=16+18)
MapAppendAssign/Int32/65536-8    68.2ns ± 6%    48.5ns ±29%  -28.90%  (p=0.000 n=20+20)
MapAppendAssign/Int64/256-8      34.3ns ± 4%    23.3ns ± 5%  -32.23%  (p=0.000 n=17+18)
MapAppendAssign/Int64/65536-8    65.9ns ± 7%    61.2ns ±19%   -7.06%  (p=0.002 n=18+20)
MapAppendAssign/Str/256-8         116ns ±12%      79ns ±16%  -31.70%  (p=0.000 n=20+19)
MapAppendAssign/Str/65536-8       134ns ±15%     111ns ±45%  -16.95%  (p=0.000 n=19+20)

name                           old alloc/op   new alloc/op   delta
MapAppendAssign/Int32/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=19+18)
MapAppendAssign/Int32/65536-8     27.0B ± 0%     20.7B ±30%  -23.33%  (p=0.000 n=20+20)
MapAppendAssign/Int64/256-8       47.0B ± 0%     46.0B ± 0%   -2.13%  (p=0.000 n=20+17)
MapAppendAssign/Int64/65536-8     27.0B ± 0%     27.0B ± 0%     ~     (all equal)
MapAppendAssign/Str/256-8         94.0B ± 0%     78.0B ± 0%  -17.02%  (p=0.000 n=20+16)
MapAppendAssign/Str/65536-8       54.0B ± 0%     54.0B ± 0%     ~     (all equal)

Fixes #24364
Updates #5147

Change-Id: Id257d052b75b9a445b4885dc571bf06ce6f6b409
Reviewed-on: https://go-review.googlesource.com/100838
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-03-20 01:47:07 +00:00
fanzha02 910c3a9dfc cmd/asm: add ARM64 assembler check for incorrect input
Current ARM64 assembler has no check for the invalid value of both
shift amount and post-index immediate offset of LD1/ST1. This patch
adds the check.

This patch also fixes the printing error of register number equals
to 31, which should be printed as ZR instead of R31. Test cases
are also added.

Change-Id: I476235f3ab3a3fc91fe89c5a3149a4d4529c05c7
Reviewed-on: https://go-review.googlesource.com/100255
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
2018-03-19 23:45:50 +00:00
Alberto Donizetti 5a4e09837c test/codegen: port maps test to codegen
And delete them from asm_test.

Change-Id: I3cf0934706a640136cb0f646509174f8c1bf3363
Reviewed-on: https://go-review.googlesource.com/101395
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Giovanni Bajo <rasky@develer.com>
2018-03-19 13:39:34 +00:00