Commit Graph

82 Commits

Author SHA1 Message Date
Russ Cox 4dd77bdc91 cmd/asm, cmd/link, runtime: introduce FuncInfo flag bits
The runtime traceback code has its own definition of which functions
mark the top frame of a stack, separate from the TOPFRAME bits that
exist in the assembly and are passed along in DWARF information.
It's error-prone and redundant to have two different sources of truth.
This CL provides the actual TOPFRAME bits to the runtime, so that
the runtime can use those bits instead of reinventing its own category.

This CL also adds a new bit, SPWRITE, which marks functions that
write directly to SP (anything but adding and subtracting constants).
Such functions must stop a traceback, because the traceback has no
way to rederive the SP on entry. Again, the runtime has its own definition
which is mostly correct, but also missing some functions. During ordinary
goroutine context switches, such functions do not appear on the stack,
so the incompleteness in the runtime usually doesn't matter.
But profiling signals can arrive at any moment, and the runtime may
crash during traceback if it attempts to unwind an SP-writing frame
and gets out-of-sync with the actual stack. The runtime contains code
to try to detect likely candidates but again it is incomplete.
Deriving the SPWRITE bit automatically from the actual assembly code
provides the complete truth, and passing it to the runtime lets the
runtime use it.

This CL is part of a stack adding windows/arm64
support (#36439), intended to land in the Go 1.17 cycle.
This CL is, however, not windows/arm64-specific.
It is cleanup meant to make the port (and future ports) easier.

Change-Id: I227f53b23ac5b3dabfcc5e8ee3f00df4e113cf58
Reviewed-on: https://go-review.googlesource.com/c/go/+/288800
Trust: Russ Cox <rsc@golang.org>
Trust: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
2021-02-19 00:02:30 +00:00
Derek Parker 63fd764502 cmd/internal/obj: add prologue_end DWARF stmt for ppc64
This patch adds a prologue_end statement to the DWARF information for
the ppc64 arch.

Prologue end is used by the Delve debugger in order to determine where
to set a breakpoint to avoid the stacksplit prologue.

Updates #36612

Change-Id: Ifb16c1476fe716a0bf493c5486d1d88ebe8d0253
GitHub-Last-Rev: 77a217206d
GitHub-Pull-Request: golang/go#42261
Reviewed-on: https://go-review.googlesource.com/c/go/+/266019
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
2020-11-04 23:24:22 +00:00
Russ Cox 912262b806 cmd/internal/obj: move LSym.Func into LSym.Extra
This creates space for a different kind of extension field
in LSym without making the struct any larger.
(There are many LSym, so we care about keeping the struct small.)

Change-Id: Ib16edb9e15f54c2a7351c8b875e19684058711e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/243943
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-10-16 03:02:36 +00:00
Keith Randall 9e70564f63 cmd/compile,cmd/asm: simplify recording of branch targets, take 2
We currently use two fields to store the targets of branches.
Some phases use p.To.Val, some use p.Pcond. Rewrite so that
every branch instruction uses p.To.Val.
p.From.Val is also used in rare instances.
Introduce a Pool link for use by arm/arm64, instead of
repurposing Pcond.

This is a cleanup CL in preparation for some stack frame CLs.

Change-Id: If8239177e4b1ea2bccd0608eb39553d23210d405
Reviewed-on: https://go-review.googlesource.com/c/go/+/251437
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-08-31 17:36:08 +00:00
Keith Randall 26ad27bb02 Revert "cmd/compile,cmd/asm: simplify recording of branch targets"
This reverts CL 243318.

Reason for revert: Seems to be crashing some builders.

Change-Id: I2ffc59bc5535be60b884b281c8d0eff4647dc756
Reviewed-on: https://go-review.googlesource.com/c/go/+/251169
Reviewed-by: Bryan C. Mills <bcmills@google.com>
2020-08-28 02:10:13 +00:00
Keith Randall 8247da3662 cmd/compile,cmd/asm: simplify recording of branch targets
We currently use two fields to store the targets of branches.
Some phases use p.To.Val, some use p.Pcond. Rewrite so that
every branch instruction uses p.To.Val.
p.From.Val is also used in rare instances.
Introduce a Pool link for use by arm/arm64, instead of
repurposing Pcond.

This is a cleanup CL in preparation for some stack frame CLs.

Change-Id: I9055bf0a1d986aff421e47951a1dedc301c846f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/243318
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-08-27 22:35:45 +00:00
Lynn Boger 7d7bd5abc7 cmd/internal/obj/ppc64: don't remove NOP in assembler
Previously, the assembler removed NOPs from the Prog list in
obj9.go. NOPs shouldn't be removed if they were added as
an inline mark, as described in the issue below.

Fixes #40689

Once the NOPs were left in the Prog list, some instructions
were flagged as invalid because they had an operand which was
not represented in optab. In order to preserve the previous
assembler behavior, entries were added to optab for those
operand cases. They were not flagged as errors before because
the NOP instructions were removed before the code to check the
valid opcode/operand combinations.

Change-Id: Iae5145f94459027cf458e914d7c5d6089807ccf8
Reviewed-on: https://go-review.googlesource.com/c/go/+/247842
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Paul Murphy <murp@ibm.com>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-08-13 15:22:41 +00:00
Than McIntosh 076dc2111b [dev.link] cmd/compile: make compiler-generated ppc64 TOC symbols static
Set the AttrStatic flag on compiler-emitted TOC symbols for ppc64; these
symbols don't need to go into the final symbol table in Go binaries.
This fixes a buglet introduced by CL 240539 that was causing failures
on the aix builder.

Change-Id: If8b63bcf6d2791f1ec5a0c371d2d11e806202fd2
Reviewed-on: https://go-review.googlesource.com/c/go/+/241637
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-07-08 23:51:08 +00:00
Cherry Zhang b2482e4817 cmd/internal/obj: mark split-stack prologue nonpreemptible
When there are both a synchronous preemption request (by
clobbering the stack guard) and an asynchronous one (by signal),
the running goroutine may observe the synchronous request first
in stack bounds check, and go to the path of calling morestack.
If the preemption signal arrives at this point before the call to
morestack, the goroutine will be asynchronously preempted,
entering the scheduler. When it is resumed, the scheduler clears
the preemption request, unclobbers the stack guard. But the
resumed goroutine will still call morestack, as it is already on
its way. morestack will, as there is no preemption request,
double the stack unnecessarily. If this happens multiple times,
the stack may grow too big, although only a small amount is
actually used.

To fix this, we mark the stack bounds check and the call to
morestack async-nonpreemptible, starting after the memory
instruction (mostly a load, on x86 CMP with memory).

Not done for Wasm as it does not support async preemption.

Fixes #35470.

Change-Id: Ibd7f3d935a3649b80f47539116ec9b9556680cf2
Reviewed-on: https://go-review.googlesource.com/c/go/+/207350
Reviewed-by: David Chase <drchase@google.com>
2019-11-27 01:30:19 +00:00
Cherry Zhang 3c47eada3f cmd/internal/obj/ppc64: handle MOVDU for SP delta
If a MOVDU instruction is used with an offset of SP, the
instruction changes SP therefore needs an SP delta, which is used
for generating the PC-SP table for stack unwinding. MOVDU is
frequently used for allocating the frame and saving the LR in the
same instruction, so this is particularly useful.

Change-Id: Icb63eb55aa01c3dc350ac4e4cff6371f4c3c5867
Reviewed-on: https://go-review.googlesource.com/c/go/+/205279
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-07 19:20:57 +00:00
Cherry Zhang ceca99bdeb cmd/compile, cmd/internal/obj/ppc64: mark unsafe points
We'll use CTR as a scratch register for call injection. Mark code
sequences that use CTR as unsafe for async preemption. Currently
it is only used in LoweredZero and LoweredMove. It is unfortunate
that they are nonpreemptible. But I think it is still better than
using LR for call injection and marking all leaf functions
nonpreemptible.

Also mark the prologue of large frame functions nonpreemptible,
as we write below SP.

Change-Id: I05a75431499f3f4b2f23651a7b17f7fcf2afbe06
Reviewed-on: https://go-review.googlesource.com/c/go/+/203823
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-07 19:20:35 +00:00
Cherry Zhang 24e549a717 cmd/compile, cmd/internal/obj/ppc64: use LR for indirect calls
On PPC64, indirect calls can be made through LR or CTR. Currently
both are used. This CL changes it to always use LR.

For async preemption, to return from the injected call, we need
an indirect jump back to the PC we preeempted. This jump can be
made through LR or CTR. So we'll have to clobber either LR or CTR.
Currently, LR is used more frequently. In particular, for a leaf
function, LR is live throughout the function. We don't want to
make leaf functions nonpreemptible. So we choose CTR for the call
injection. For code sequences that use CTR, if it is ok to use
another register, change it to.

Plus, it is a call so it will clobber LR anyway. It doesn't need
to also clobber CTR (even without preemption).

Change-Id: I07bd0e93b94a1a3aa2be2cd465801136165d8ab8
Reviewed-on: https://go-review.googlesource.com/c/go/+/203822
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-07 19:20:03 +00:00
Clément Chigot 09c9bced82 cmd/internal/obj/ppc64: Fix ADUFFxxxx generation on aix/ppc64
ADUFFCOPY and ADUFFZERO instructions weren't handled by rewriteToUseTOC.
These instructions are considered as a simple branch except with -dynlink
where they become an indirect call.

Fixes #34604

Change-Id: I16ca6a152164966fb9cbf792219a8a39aad2b53b
Reviewed-on: https://go-review.googlesource.com/c/go/+/197842
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-10-01 18:30:25 +00:00
Clément Chigot 20ac64a2dd cmd/dist, cmd/link, runtime: fix stack size when cross-compiling aix/ppc64
This commit allows to cross-compiling aix/ppc64. The nosplit limit must
twice as large as on others platforms because of AIX syscalls.
The stack limit, especially stackGuardMultiplier, was set by cmd/dist
during the bootstrap and doesn't depend on GOOS/GOARCH target.

Fixes #29572

Change-Id: Id51e38885e1978d981aa9e14972eaec17294322e
Reviewed-on: https://go-review.googlesource.com/c/157117
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-01-09 22:06:51 +00:00
Clément Chigot 503091a77c cmd: improve aix/ppc64 new symbol addressing
This commit updates the new symbol addressing made for aix/ppc64 according
to feedbacks given in CL 151039.

Change-Id: Ic4eb9943dc520d65f7d084adf8fa9a2530f4d3f9
Reviewed-on: https://go-review.googlesource.com/c/151302
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-12-17 21:13:09 +00:00
Clément Chigot 4295ed9bef cmd: fix symbols addressing for aix/ppc64
This commit changes the code generated for addressing symbols on AIX
operating system.

On AIX, every symbol accesses must be done via another symbol near the TOC,
named TOC anchor or TOC entry. This TOC anchor is a pointer to the symbol
address.
During Progedit function, when a symbol access is detected, its instructions
are modified to create a load on its TOC anchor and retrieve the symbol.

Change-Id: I00cf8f49c13004bc99fa8af13d549a709320f797
Reviewed-on: https://go-review.googlesource.com/c/151039
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-11-27 21:06:16 +00:00
Lynn Boger f776d51bf7 cmd/internal/obj/ppc64: generate float 0 more efficiently on ppc64x
This change makes use of a VSX instruction to generate the
float 0 value instead of generating a constant in memory and
loading it from there.

This uses 1 instruction instead of 2 and avoids a memory reference.
in the +0 case, uses 2 instructions in the -0 case but avoids
the memory reference.

Since this is done in the assembler for ppc64x, an update has
been made to the assembler test.

Change-Id: Ief7dddcb057bfb602f78215f6947664e8c841464
Reviewed-on: https://go-review.googlesource.com/c/139420
Reviewed-by: Michael Munday <mike.munday@ibm.com>
2018-10-10 12:31:23 +00:00
Lynn Boger 9e9ff565cd runtime/race: implement race detector for ppc64le
This adds the support to enable the race detector for ppc64le.

Added runtime/race_ppc64le.s to manage the calls from Go to the
LLVM tsan functions, mostly converting from the Go ABI to the
PPC64 ABI expected by Clang generated code.

Changed racewalk.go to call racefuncenterfp instead of racefuncenter
on ppc64le to allow the caller pc to be obtained in the asm code
before calling the tsan version.

Changed the set up code for racecallbackthunk so it doesn't use
the autogenerated save and restore of the link register since that
sequence uses registers inconsistent with the normal ppc64 ABI.

Made various changes to recognize that race is supported for
ppc64le.

Ensured that tls_g is updated and accessible from race_linux_ppc64le.s
so that the race ctx can be obtained and passed to tsan functions.

This enables the race tests for ppc64le in cmd/dist/test.go and
increases the timeout when running the benchmarks with the -race
option to avoid timing out.

Updates #24354, #23731

Change-Id: Ib97dc7ac313e6313c836dc7d2fb698f9d8fba3ef
Reviewed-on: https://go-review.googlesource.com/107935
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-06-11 17:45:36 +00:00
Austin Clements 02495da6b6 cmd/internal/obj: consolidate emitting entry stack map
The obj package needs to emit the PCDATA to select the entry stack map
before calling morestack. Currently this is copied for every
architecture. Since we're about to change how this works, consolidate
all of these copies into a single helper function.

For #24543.

Change-Id: Ia92d94de78f8e23fd06dba747c43e03e5989f67b
Reviewed-on: https://go-review.googlesource.com/109346
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-05-22 14:43:35 +00:00
Wei Xiao 20102594a0 cmd/compile: intrinsify runtime.getcallerpc on all link register architectures
Add a compiler intrinsic for getcallerpc on following architectures:
  arm
  mips mipsle mips64 mips64le
  ppc64 ppc64le
  s390x

Change-Id: I758f3d4742fc214b206bcd07d90408622c17dbef
Reviewed-on: https://go-review.googlesource.com/110835
Run-TryBot: Wei Xiao <Wei.Xiao@arm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-05-02 16:59:27 +00:00
David Chase 0eacf8cbdf cmd/compile: add DWARF reg defs & fix 32-bit location list bug
Before DWARF location lists can be turned on, 3 bugs need
fixing.

This CL addresses two -- lack of register definitions for
various architectures, and bugs on 32-bit platforms.
The third bug comes later.

Passes
GO_GCFLAGS=-dwarflocationlists ./run.bash -no-rebuild
(-no-rebuild because the map dependence causes trouble)

Change-Id: I4223b48ade84763e4b048e4aeb81149f082c7bc7
Reviewed-on: https://go-review.googlesource.com/99255
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2018-03-09 23:17:18 +00:00
Kunpei Sakai 0c471dfae2 cmd: avoid unnecessary type conversions
CL generated mechanically with github.com/mdempsky/unconvert.

Also updated cmd/compile/internal/ssa/gen/*.rules manually.

Change-Id: If721ef73cf0771ae83ce7e2d11623fc8d9155768
Reviewed-on: https://go-review.googlesource.com/97075
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-26 20:22:06 +00:00
isharipo 8c67f210a1 cmd/internal/obj: change Prog.From3 to RestArgs ([]Addr)
This change makes it easier to express instructions
with arbitrary number of operands.

Rationale: previous approach with operand "hiding" does
not scale well, AVX and especially AVX512 have many
instructions with 3+ operands.

x86 asm backend is updated to handle up to 6 explicit operands.
It also fixes issue with 4-th immediate operand type checks.
All `ytab` tables are updated accordingly.

Changes to non-x86 backends only include these patterns:
`p.From3 = X` => `p.SetFrom3(X)`
`p.From3.X = Y` => `p.GetFrom3().X = Y`

Over time, other backends can adapt Prog.RestArgs
and reduce the amount of workarounds.

-- Performance --

x/benchmark/build:

$ benchstat upstream.bench patched.bench
name      old time/op                 new time/op                 delta
Build-48                  21.7s ± 2%                  21.8s ± 2%   ~     (p=0.218 n=10+10)

name      old binary-size             new binary-size             delta
Build-48                  10.3M ± 0%                  10.3M ± 0%   ~     (all equal)

name      old build-time/op           new build-time/op           delta
Build-48                  21.7s ± 2%                  21.8s ± 2%   ~     (p=0.218 n=10+10)

name      old build-peak-RSS-bytes    new build-peak-RSS-bytes    delta
Build-48                  145MB ± 5%                  148MB ± 5%   ~     (p=0.218 n=10+10)

name      old build-user+sys-time/op  new build-user+sys-time/op  delta
Build-48                  21.0s ± 2%                  21.2s ± 2%   ~     (p=0.075 n=10+10)

Microbenchmark shows a slight slowdown.

name        old time/op  new time/op  delta
AMD64asm-4  49.5ms ± 1%  49.9ms ± 1%  +0.67%  (p=0.001 n=23+15)

func BenchmarkAMD64asm(b *testing.B) {
  for i := 0; i < b.N; i++ {
    TestAMD64EndToEnd(nil)
    TestAMD64Encoder(nil)
  }
}

Change-Id: I4f1d37b5c2c966da3f2127705ccac9bff0038183
Reviewed-on: https://go-review.googlesource.com/63490
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-09-15 21:05:03 +00:00
Ian Lance Taylor 5331e7e9df cmd/internal/obj, cmd/link: fix st_other field on PPC64
In PPC64 ELF files, the st_other field indicates the number of
prologue instructions between the global and local entry points.
We add the instructions in the compiler and assembler if -shared is used.
We were assuming that the instructions were present when building a
c-archive or PIE or doing dynamic linking, on the assumption that those
are the cases where the go tool would be building with -shared.
That assumption fails when using some other tool, such as Bazel,
that does not necessarily use -shared in exactly the same way.

This CL records in the object file whether a symbol was compiled
with -shared (this will be the same for all symbols in a given compilation)
and uses that information when setting the st_other field.

Fixes #20290.

Change-Id: Ib2b77e16aef38824871102e3c244fcf04a86c6ea
Reviewed-on: https://go-review.googlesource.com/43051
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-05-09 23:36:51 +00:00
Lynn Boger 6910e1085b cmd/internal/obj/ppc64: use MOVDU to update stack reg for leaf functions where possible
When the stack register is decremented to acquire stack space at
the beginning of a function, a MOVDU should be used so it is done
atomically, unless the size of the stack frame is too large for
that instruction.  The code to determine whether to use MOVDU
or MOVD was checking if the function was a leaf and always generating MOVD
when it was.  The choice of MOVD vs. MOVDU should only depend on the stack
frame size.  This fixes that problem.

Change-Id: I0e49c79036f1e8f7584179e1442b938fc6da085f
Reviewed-on: https://go-review.googlesource.com/41813
Reviewed-by: Michael Munday <munday@ca.ibm.com>
2017-04-26 17:39:33 +00:00
Josh Bleecher Snyder 405a280d01 cmd/internal/obj: eliminate LSym.Version
There were only two versions, 0 and 1,
and the only user of version 1 was the assembler,
to indicate that a symbol was static.

Rename LSym.Version to Static,
and add it to LSym.Attributes.
Simplify call-sites.

Passes toolstash-check.

Change-Id: Iabd39918f5019cce78f381d13f0481ae09f3871f
Reviewed-on: https://go-review.googlesource.com/41201
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-04-20 21:56:23 +00:00
Matthew Dempsky 1e3570ac86 cmd/internal/objabi: extract shared functionality from obj
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing
the assembler backends no longer requires reinstalling cmd/link or
cmd/addr2line.

There's also now one canonical definition of the object file format in
cmd/internal/objabi/doc.go, with a warning to update all three
implementations.

objabi is still something of a grab bag of unrelated code (e.g., flag
and environment variable handling probably belong in a separate "tool"
package), but this is still progress.

Fixes #15165.
Fixes #20026.

Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c
Reviewed-on: https://go-review.googlesource.com/40972
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-19 00:00:09 +00:00
Matthew Dempsky 1747078695 cmd/internal/obj: un-embed FuncInfo field in LSym
Automated refactoring using github.com/mdempsky/unbed (to rewrite
s.Foo to s.FuncInfo.Foo) and then gorename (to rename the FuncInfo
field to just Func).

Passes toolstash-check -all.

Change-Id: I802c07a1239e0efea058a91a87c5efe12170083a
Reviewed-on: https://go-review.googlesource.com/40670
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-18 17:29:50 +00:00
Josh Bleecher Snyder ce3ee7cdae cmd/internal/obj: stop storing Text flags in From3
Prior to this CL, flags such as NOSPLIT
on ATEXT Progs were stored in From3.Offset.
Some but not all of those flags were also
duplicated into From.Sym.Attribute.

This CL migrates all of those flags into
From.Sym.Attribute and stops creating a From3.

A side-effect of this is that printing an
ATEXT Prog can no longer simply dump From3.Offset.
That's kind of good, since the raw flag value
wasn't very informative anyway, but it did
necessitate a bunch of updates to the cmd/asm tests.

The reason I'm doing this work now is that
avoiding storing flags in both From.Sym and From3.Offset
simplifies some other changes to fix the data
race first described in CL 40254.

This CL almost passes toolstash-check -all.
The only changes are in cases where the assembler
has decided that a function's flags may be altered,
e.g. to make a function with no calls in it NOSPLIT.
Prior to this CL, that information was not printed.

Sample before:

"".Ctz64 t=1 size=63 args=0x10 locals=0x0
	0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35)	TEXT	"".Ctz64(SB), $0-16
	0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35)	FUNCDATA	$0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)

Sample after:

"".Ctz64 t=1 nosplit size=63 args=0x10 locals=0x0
	0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35)	TEXT	"".Ctz64(SB), NOSPLIT, $0-16
	0x0000 00000 (/Users/josh/go/tip/src/runtime/internal/sys/intrinsics.go:35)	FUNCDATA	$0, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB)

Observe the additional "nosplit" in the first line
and the additional "NOSPLIT" in the second line.

Updates #15756

Change-Id: I5c59bd8f3bdc7c780361f801d94a261f0aef3d13
Reviewed-on: https://go-review.googlesource.com/40495
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-12 21:53:39 +00:00
Josh Bleecher Snyder f95de5c679 cmd/internal/obj/ppc64: make assembler almost concurrency-safe
CL 39922 made the arm assembler concurrency-safe.
This CL does the same, but for ppc64.
The approach is similar: introduce ctxt9 to hold
function-local state and thread it through
the assembler as necessary.

One race remains after this CL, similar to CL 40252.

That race is conceptually unrelated to this refactoring,
and will be addressed in a separate CL.

Passes toolstash-check -all.

Updates #15756

Change-Id: Icc37d9a971bed2184c8e66b1a64f4f2e556dc207
Reviewed-on: https://go-review.googlesource.com/40372
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-11 14:53:08 +00:00
Josh Bleecher Snyder 7165bcc6ba cmd/internal/obj: remove timing prints from assemblers
Updates #19865

Change-Id: I24fbf5d79b5e4cac09c14cfff678a8215397b670
Reviewed-on: https://go-review.googlesource.com/39914
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-07 17:13:26 +00:00
Josh Bleecher Snyder 63c1aff60b cmd/internal/obj: eagerly initialize assemblers
CL 38662 changed the x86 assembler to be eagerly
initialized, for a concurrent backend.

This CL puts in place a proper mechanism for doing so,
and switches all architectures to use it.

Passes toolstash-check -all.

Updates #15756

Change-Id: Id2aa527d3a8259c95797d63a2f0d1123e3ca2a1c
Reviewed-on: https://go-review.googlesource.com/39917
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-07 16:57:03 +00:00
Josh Bleecher Snyder 99683483d6 cmd/internal/obj: unify creation of numeric literal syms
This is a straightforward refactoring,
to reduce the scope of upcoming changes.

The symbol size and AttrLocal=true was not
set universally, but it appears not to matter,
since toolstash -cmp is happy.

Passes toolstash-check -all.

Change-Id: I7f8392f939592d3a1bc6f61dec992f5661f42fca
Reviewed-on: https://go-review.googlesource.com/39791
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-06 19:01:50 +00:00
Josh Bleecher Snyder c311488283 cmd/internal/obj: remove Linklookup
It was simply a wrapper around Link.Lookup.
Unwrap everything.

CL prepared using eg with template:

package p

import "cmd/internal/obj"

func before(ctxt *obj.Link, name string, version int) *obj.LSym {
	return obj.Linklookup(ctxt, name, version)
}

func after(ctxt *obj.Link, name string, version int) *obj.LSym {
	return ctxt.Lookup(name, version)
}

Then one comment in cmd/asm/internal/asm/parse.go
was manually updated (and gofmt'ed!),
and func Linklookup deleted.

Passes toolstash-check (as a sanity measure).

Change-Id: Icc4d56b0b2b5c8888d3184c1898c48359ea1e638
Reviewed-on: https://go-review.googlesource.com/39715
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-06 19:01:41 +00:00
Josh Bleecher Snyder 5b59b32c97 cmd/compile: teach assemblers to accept a Prog allocator
The existing bulk Prog allocator is not concurrency-safe.
To allow for concurrency-safe bulk allocation of Progs,
I want to move Prog allocation and caching upstream,
to the clients of cmd/internal/obj.

This is a preliminary enabling refactoring.
After this CL, instead of calling Ctxt.NewProg
throughout the assemblers, we thread through
a newprog function that returns a new Prog.

That function is set up to be Ctxt.NewProg,
so there are no real changes in this CL;
this CL only establishes the plumbing.

Passes toolstash-check -all.
Negligible compiler performance impact.

Updates #15756

name        old time/op     new time/op     delta
Template        213ms ± 3%      214ms ± 4%    ~     (p=0.574 n=49+47)
Unicode        90.1ms ± 5%     89.9ms ± 4%    ~     (p=0.417 n=50+49)
GoTypes         585ms ± 4%      584ms ± 3%    ~     (p=0.466 n=49+49)
SSA             6.50s ± 3%      6.52s ± 2%    ~     (p=0.251 n=49+49)
Flate           128ms ± 4%      128ms ± 4%    ~     (p=0.673 n=49+50)
GoParser        152ms ± 3%      152ms ± 3%    ~     (p=0.810 n=48+49)
Reflect         372ms ± 4%      372ms ± 5%    ~     (p=0.778 n=49+50)
Tar             113ms ± 5%      111ms ± 4%  -0.98%  (p=0.016 n=50+49)
XML             208ms ± 3%      208ms ± 2%    ~     (p=0.483 n=47+49)
[Geo mean]      285ms           285ms       -0.17%

name        old user-ns/op  new user-ns/op  delta
Template         253M ± 8%       254M ± 9%    ~     (p=0.899 n=50+50)
Unicode          106M ± 9%       106M ±11%    ~     (p=0.642 n=50+50)
GoTypes          736M ± 4%       740M ± 4%    ~     (p=0.121 n=50+49)
SSA             8.82G ± 3%      8.88G ± 2%  +0.65%  (p=0.006 n=49+48)
Flate            147M ± 4%       147M ± 5%    ~     (p=0.844 n=47+48)
GoParser         179M ± 4%       178M ± 6%    ~     (p=0.785 n=50+50)
Reflect          443M ± 6%       441M ± 5%    ~     (p=0.850 n=48+47)
Tar              126M ± 5%       126M ± 5%    ~     (p=0.734 n=50+50)
XML              244M ± 5%       244M ± 5%    ~     (p=0.594 n=49+50)
[Geo mean]       341M            341M       +0.11%

Change-Id: Ice962f61eb3a524c2db00a166cb582c22caa7d68
Reviewed-on: https://go-review.googlesource.com/39633
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-06 02:07:21 +00:00
Michael Munday 41fd8d6401 cmd/internal/obj: make morestack cutoff the same on all architectures
There is always 128 bytes available below the stackguard. Allow functions
with medium-sized stack frames to use this, potentially allowing them to
avoid growing the stack.

This change makes all architectures use the same calculation as x86.

Change-Id: I2afb1a7c686ae5a933e50903b31ea4106e4cd0a0
Reviewed-on: https://go-review.googlesource.com/38734
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-29 20:35:46 +00:00
Josh Bleecher Snyder 42a915c933 cmd/internal/obj: convert Debug* Link fields into bools
Change-Id: I9ac274dbfe887675a7820d2f8f87b5887b1c9b0e
Reviewed-on: https://go-review.googlesource.com/38383
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-20 22:08:41 +00:00
Josh Bleecher Snyder 604e4841d6 cmd/internal/obj/ppc64: remove stackbarrier function check
Stack barriers were removed in CL 36620.

Change-Id: If124d65a73a7b344a42be2a4b386a14d7a0a428b
Reviewed-on: https://go-review.googlesource.com/38169
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: David Chase <drchase@google.com>
2017-03-17 04:46:58 +00:00
Cherry Zhang bed8129ee6 cmd/internal/obj: remove Follow pass
The Follow pass in the assembler backend reorders and copies
instructions. This even applies to hand-written assembly code,
which in many cases don't want to be reordered. Now that the
SSA compiler does a good job for laying out instructions, the
benefit of this pass is very little:

AMD64: (old = with Follow, new = without Follow)
name                      old time/op    new time/op    delta
BinaryTree17-12              2.78s ± 1%     2.79s ± 1%  +0.44%  (p=0.000 n=20+19)
Fannkuch11-12                3.11s ± 0%     3.31s ± 1%  +6.16%  (p=0.000 n=19+19)
FmtFprintfEmpty-12          50.9ns ± 1%    51.6ns ± 3%  +1.40%  (p=0.000 n=17+20)
FmtFprintfString-12          127ns ± 0%     128ns ± 1%  +0.88%  (p=0.000 n=17+17)
FmtFprintfInt-12             122ns ± 0%     123ns ± 1%  +0.76%  (p=0.000 n=20+19)
FmtFprintfIntInt-12          185ns ± 1%     186ns ± 1%  +0.65%  (p=0.000 n=20+19)
FmtFprintfPrefixedInt-12     192ns ± 1%     202ns ± 1%  +4.99%  (p=0.000 n=20+19)
FmtFprintfFloat-12           284ns ± 0%     288ns ± 0%  +1.33%  (p=0.000 n=15+19)
FmtManyArgs-12               807ns ± 0%     804ns ± 0%  -0.44%  (p=0.000 n=16+18)
GobDecode-12                7.23ms ± 1%    7.21ms ± 1%    ~     (p=0.052 n=20+20)
GobEncode-12                6.09ms ± 1%    6.12ms ± 1%  +0.41%  (p=0.002 n=19+19)
Gzip-12                      253ms ± 1%     255ms ± 1%  +0.95%  (p=0.000 n=18+20)
Gunzip-12                   38.4ms ± 0%    38.5ms ± 0%  +0.34%  (p=0.000 n=17+17)
HTTPClientServer-12         95.4µs ± 2%    96.1µs ± 1%  +0.78%  (p=0.002 n=19+19)
JSONEncode-12               16.5ms ± 1%    16.6ms ± 1%  +1.17%  (p=0.000 n=19+19)
JSONDecode-12               54.6ms ± 1%    55.3ms ± 1%  +1.23%  (p=0.000 n=18+18)
Mandelbrot200-12            4.47ms ± 0%    4.47ms ± 0%  +0.06%  (p=0.000 n=18+18)
GoParse-12                  3.47ms ± 1%    3.47ms ± 1%    ~     (p=0.583 n=20+20)
RegexpMatchEasy0_32-12      84.8ns ± 1%    85.2ns ± 2%  +0.51%  (p=0.022 n=20+20)
RegexpMatchEasy0_1K-12       206ns ± 1%     206ns ± 1%    ~     (p=0.770 n=20+20)
RegexpMatchEasy1_32-12      82.8ns ± 1%    83.4ns ± 1%  +0.64%  (p=0.000 n=20+19)
RegexpMatchEasy1_1K-12       363ns ± 1%     361ns ± 1%  -0.48%  (p=0.007 n=20+20)
RegexpMatchMedium_32-12      126ns ± 1%     126ns ± 0%  +0.72%  (p=0.000 n=20+20)
RegexpMatchMedium_1K-12     39.1µs ± 1%    39.8µs ± 0%  +1.73%  (p=0.000 n=19+19)
RegexpMatchHard_32-12       1.97µs ± 0%    1.98µs ± 1%  +0.29%  (p=0.005 n=18+20)
RegexpMatchHard_1K-12       59.5µs ± 1%    59.8µs ± 1%  +0.36%  (p=0.000 n=18+20)
Revcomp-12                   442ms ± 1%     445ms ± 2%  +0.67%  (p=0.000 n=19+20)
Template-12                 58.0ms ± 1%    57.5ms ± 1%  -0.85%  (p=0.000 n=19+19)
TimeParse-12                 311ns ± 0%     314ns ± 0%  +0.94%  (p=0.000 n=20+18)
TimeFormat-12                350ns ± 3%     346ns ± 0%    ~     (p=0.076 n=20+19)
[Geo mean]                  55.9µs         56.4µs       +0.80%

ARM32:
name                     old time/op    new time/op    delta
BinaryTree17-4              30.4s ± 0%     30.1s ± 0%  -1.14%  (p=0.000 n=10+8)
Fannkuch11-4                13.7s ± 0%     13.6s ± 0%  -0.75%  (p=0.000 n=10+10)
FmtFprintfEmpty-4           664ns ± 1%     651ns ± 1%  -1.96%  (p=0.000 n=7+8)
FmtFprintfString-4         1.83µs ± 2%    1.77µs ± 2%  -3.21%  (p=0.000 n=10+10)
FmtFprintfInt-4            1.57µs ± 2%    1.54µs ± 2%  -2.25%  (p=0.007 n=10+10)
FmtFprintfIntInt-4         2.37µs ± 2%    2.31µs ± 1%  -2.68%  (p=0.000 n=10+10)
FmtFprintfPrefixedInt-4    2.14µs ± 2%    2.10µs ± 1%  -1.83%  (p=0.006 n=10+10)
FmtFprintfFloat-4          3.69µs ± 2%    3.74µs ± 1%  +1.60%  (p=0.000 n=10+10)
FmtManyArgs-4              9.43µs ± 1%    9.17µs ± 1%  -2.70%  (p=0.000 n=10+10)
GobDecode-4                76.3ms ± 1%    75.5ms ± 1%  -1.14%  (p=0.003 n=10+10)
GobEncode-4                70.7ms ± 2%    69.0ms ± 1%  -2.36%  (p=0.000 n=10+10)
Gzip-4                      2.64s ± 1%     2.65s ± 0%  +0.59%  (p=0.002 n=10+10)
Gunzip-4                    402ms ± 0%     398ms ± 0%  -1.11%  (p=0.000 n=10+9)
HTTPClientServer-4          458µs ± 0%     457µs ± 0%    ~     (p=0.247 n=10+10)
JSONEncode-4                171ms ± 0%     172ms ± 0%  +0.56%  (p=0.000 n=10+10)
JSONDecode-4                672ms ± 1%     668ms ± 1%    ~     (p=0.105 n=10+10)
Mandelbrot200-4            33.5ms ± 0%    33.5ms ± 0%    ~     (p=0.156 n=9+10)
GoParse-4                  33.9ms ± 0%    34.0ms ± 0%  +0.36%  (p=0.031 n=9+9)
RegexpMatchEasy0_32-4       823ns ± 1%     835ns ± 1%  +1.49%  (p=0.000 n=8+8)
RegexpMatchEasy0_1K-4      3.99µs ± 0%    4.02µs ± 1%  +0.92%  (p=0.000 n=8+10)
RegexpMatchEasy1_32-4       877ns ± 3%     904ns ± 2%  +3.07%  (p=0.012 n=10+10)
RegexpMatchEasy1_1K-4      5.99µs ± 0%    5.97µs ± 1%  -0.38%  (p=0.023 n=8+8)
RegexpMatchMedium_32-4     1.40µs ± 2%    1.40µs ± 2%    ~     (p=0.590 n=10+9)
RegexpMatchMedium_1K-4      357µs ± 0%     355µs ± 1%  -0.72%  (p=0.000 n=7+8)
RegexpMatchHard_32-4       22.3µs ± 0%    22.1µs ± 0%  -0.49%  (p=0.000 n=8+7)
RegexpMatchHard_1K-4        661µs ± 0%     658µs ± 0%  -0.42%  (p=0.000 n=8+7)
Revcomp-4                  46.3ms ± 0%    46.3ms ± 0%    ~     (p=0.393 n=10+10)
Template-4                  753ms ± 1%     750ms ± 0%    ~     (p=0.211 n=10+9)
TimeParse-4                4.28µs ± 1%    4.22µs ± 1%  -1.34%  (p=0.000 n=8+10)
TimeFormat-4               9.00µs ± 0%    9.05µs ± 0%  +0.59%  (p=0.000 n=10+10)
[Geo mean]                  538µs          535µs       -0.55%

ARM64:
name                     old time/op    new time/op    delta
BinaryTree17-8              8.39s ± 0%     8.39s ± 0%    ~     (p=0.684 n=10+10)
Fannkuch11-8                5.95s ± 0%     5.99s ± 0%  +0.63%  (p=0.000 n=10+10)
FmtFprintfEmpty-8           116ns ± 0%     116ns ± 0%    ~     (all equal)
FmtFprintfString-8          361ns ± 0%     360ns ± 0%  -0.31%  (p=0.003 n=8+6)
FmtFprintfInt-8             290ns ± 0%     290ns ± 0%    ~     (p=0.620 n=9+9)
FmtFprintfIntInt-8          476ns ± 1%     469ns ± 0%  -1.47%  (p=0.000 n=10+6)
FmtFprintfPrefixedInt-8     412ns ± 2%     417ns ± 2%  +1.39%  (p=0.006 n=9+10)
FmtFprintfFloat-8           652ns ± 1%     652ns ± 0%    ~     (p=0.161 n=10+8)
FmtManyArgs-8              1.94µs ± 0%    1.94µs ± 2%    ~     (p=0.781 n=10+10)
GobDecode-8                17.7ms ± 1%    17.7ms ± 0%    ~     (p=0.962 n=10+7)
GobEncode-8                15.6ms ± 0%    15.6ms ± 1%    ~     (p=0.063 n=10+10)
Gzip-8                      786ms ± 0%     787ms ± 0%    ~     (p=0.356 n=10+9)
Gunzip-8                    127ms ± 0%     127ms ± 0%  +0.08%  (p=0.028 n=10+9)
HTTPClientServer-8          198µs ± 6%     198µs ± 7%    ~     (p=0.796 n=10+10)
JSONEncode-8               42.5ms ± 0%    42.2ms ± 0%  -0.73%  (p=0.000 n=9+8)
JSONDecode-8                158ms ± 1%     162ms ± 0%  +2.28%  (p=0.000 n=10+9)
Mandelbrot200-8            10.1ms ± 0%    10.1ms ± 0%  -0.01%  (p=0.000 n=10+9)
GoParse-8                  8.54ms ± 1%    8.63ms ± 1%  +1.06%  (p=0.000 n=10+9)
RegexpMatchEasy0_32-8       231ns ± 1%     225ns ± 0%  -2.52%  (p=0.000 n=9+10)
RegexpMatchEasy0_1K-8      1.63µs ± 0%    1.63µs ± 0%    ~     (p=0.170 n=10+10)
RegexpMatchEasy1_32-8       253ns ± 0%     249ns ± 0%  -1.41%  (p=0.000 n=9+10)
RegexpMatchEasy1_1K-8      2.08µs ± 0%    2.08µs ± 0%  -0.32%  (p=0.000 n=9+10)
RegexpMatchMedium_32-8      355ns ± 1%     351ns ± 0%  -1.04%  (p=0.007 n=10+7)
RegexpMatchMedium_1K-8      104µs ± 0%     104µs ± 0%    ~     (p=0.148 n=10+10)
RegexpMatchHard_32-8       5.79µs ± 0%    5.79µs ± 0%    ~     (p=0.578 n=10+10)
RegexpMatchHard_1K-8        176µs ± 0%     176µs ± 0%    ~     (p=0.137 n=10+10)
Revcomp-8                   1.37s ± 1%     1.36s ± 1%  -0.26%  (p=0.023 n=10+10)
Template-8                  151ms ± 1%     154ms ± 1%  +2.14%  (p=0.000 n=9+10)
TimeParse-8                 723ns ± 2%     721ns ± 1%    ~     (p=0.592 n=10+10)
TimeFormat-8                804ns ± 2%     798ns ± 3%    ~     (p=0.344 n=10+10)
[Geo mean]                  154µs          154µs       -0.02%

Therefore remove this pass. Also reduce text size by 0.5~2%.

Comment out some dead code in runtime/sys_nacl_amd64p32.s
which contains undefined symbols.

Change-Id: I1473986fe5b18b3d2554ce96cdc6f0999b8d955d
Reviewed-on: https://go-review.googlesource.com/36205
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2017-02-07 15:00:48 +00:00
David Lazar 48d029fe43 [dev.inline] cmd/internal/obj: rename Prog.Lineno to Prog.Pos
Change-Id: I7585d85907869f5a286b36936dfd035f1e8e9906
Reviewed-on: https://go-review.googlesource.com/34197
Run-TryBot: David Lazar <lazard@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-12-09 20:35:56 +00:00
Cherry Zhang 698bfa17a8 cmd/internal/obj: save link register in leaf function with non-empty frame on PPC64, ARM64, S390X
The runtime traceback code assumes non-empty frame has link
link register saved on LR architectures. Make sure it is so in
the assember.

Also make sure that LR is stored before update SP, so the traceback
code will not see a half-updated stack frame if a signal comes
during the execution of function prologue.

Fixes #17381.

Change-Id: I668b04501999b7f9b080275a2d1f8a57029cbbb3
Reviewed-on: https://go-review.googlesource.com/31760
Reviewed-by: Michael Munday <munday@ca.ibm.com>
2016-10-25 21:44:32 +00:00
shaharko d391dc260a cmd/internal/obj: Use bitfield for LSym attributes
Reduces the size of LSym struct.

On 32bit: before 84  after 76
On 64bit: before 136 after 128

name       old time/op     new time/op     delta
Template       182ms ± 3%      182ms ± 3%    ~           (p=0.607 n=19+20)
Unicode       93.5ms ± 4%     94.2ms ± 3%    ~           (p=0.141 n=20+19)
GoTypes        608ms ± 1%      605ms ± 2%    ~           (p=0.056 n=20+20)

name       old user-ns/op  new user-ns/op  delta
Template        249M ± 7%       249M ± 4%    ~           (p=0.605 n=18+19)
Unicode         149M ±14%       151M ± 5%    ~           (p=0.724 n=20+17)
GoTypes         855M ± 4%       853M ± 3%    ~           (p=0.537 n=19+19)

name       old alloc/op    new alloc/op    delta
Template      40.3MB ± 0%     40.3MB ± 0%  -0.11%        (p=0.000 n=19+20)
Unicode       33.8MB ± 0%     33.8MB ± 0%  -0.08%        (p=0.000 n=20+20)
GoTypes        119MB ± 0%      119MB ± 0%  -0.10%        (p=0.000 n=19+20)

name       old allocs/op   new allocs/op   delta
Template        383k ± 0%       383k ± 0%    ~           (p=0.703 n=20+20)
Unicode         317k ± 0%       317k ± 0%    ~           (p=0.982 n=19+18)
GoTypes        1.14M ± 0%      1.14M ± 0%    ~           (p=0.086 n=20+20)

Change-Id: Id6ba0db3ecc4503a4e9af3ed0d5884d4366e8bf9
Reviewed-on: https://go-review.googlesource.com/31870
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Run-TryBot: Shahar Kohanim <skohanim@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-10-25 20:10:05 +00:00
Michael Munday 430b82009c cmd/internal/obj/{ppc64,s390x}: mark functions with small stacks NOSPLIT
This change omits the stack check on ppc64 and s390x when the size of
a stack frame is less than obj.StackSmall. This is an optimization
x86 already performs.

The effect on s390x isn't huge because we were already omitting the
stack check when the frame size was 0 (it shaves about 1K from the
size of bin/go). On ppc64 however this change reduces the size of the
.text section in bin/go by 33K (1%).

Updates #13379 (for ppc64).

Change-Id: I6af0eb987646bea47fcaf0a812db3496bab0f680
Reviewed-on: https://go-review.googlesource.com/31357
Reviewed-by: David Chase <drchase@google.com>
2016-10-18 17:07:14 +00:00
Cherry Zhang 4c9a372946 runtime, cmd/internal/obj: get rid of rewindmorestack
In the function prologue, we emit a jump to the beginning of
the function immediately after calling morestack. And in the
runtime stack growing code, it decodes and emulates that jump.
This emulation was necessary before we had per-PC SP deltas,
since the traceback code assumed that the frame size was fixed
for the whole function, except on the first instruction where
it was 0. Since we now have per-PC SP deltas and PCDATA, we
can correctly record that the frame size is 0. This makes the
emulation unnecessary.

This may be helpful for registerized calling convention, where
there may be unspills of arguments after calling morestack. It
also simplifies the runtime.

Change-Id: I7ebee31eaee81795445b33f521ab6a79624c4ceb
Reviewed-on: https://go-review.googlesource.com/30138
Reviewed-by: David Chase <drchase@google.com>
2016-10-05 18:19:46 +00:00
Dave Cheney d61c07ffd8 cmd/link/internal, cmd/internal/obj: introduce ctxt.Logf
Replace the various calls to Fprintf(ctxt.Bso, ...) with a helper,
ctxt.Logf. This also addresses the various inconsistent flushing of
ctxt.Bso.

Because we have two Link structures, add Link.Logf in both places.

Change-Id: I23093f9b9b3bf33089a0ffd7f815f92dcd1a1fa1
Reviewed-on: https://go-review.googlesource.com/27730
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-08-25 03:34:06 +00:00
Dave Cheney 3167ff7ca9 cmd/internal/*: only call ctx.Bso.Flush when something has been written
In many places where ctx.Bso.Flush is used as the target for some debug
logging, ctx.Bso.Flush is called unconditionally. In the majority of
cases where debug logging is not enabled, this means Flush is called
many times when there is nothing to be flushed (it will be called anyway
when ctx.Bso is eventually closed), sometimes in a loop.

Avoid this by moving the ctx.Bso.Flush call into the same condition
block as the debug print. This pattern was previously applied
sporadically.

Change-Id: I0444cb235cc8b9bac51a59b2e44e59872db2be06
Reviewed-on: https://go-review.googlesource.com/27579
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-08-25 01:31:17 +00:00
Lynn Boger eeca3ba92f sync/atomic, runtime/internal/atomic: improve ppc64x atomics
The following performance improvements have been made to the
low-level atomic functions for ppc64le & ppc64:

- For those cases containing a lwarx and stwcx (or other sizes):
sync, lwarx, maybe something, stwcx, loop to sync, sync, isync
The sync is moved before (outside) the lwarx/stwcx loop, and the
 sync after is removed, so it becomes:
sync, lwarx, maybe something, stwcx, loop to lwarx, isync

- For the Or8 and And8, the shifting and manipulation of the
address to the word aligned version were removed and the
instructions were changed to use lbarx, stbcx instead of
register shifting, xor, then lwarx, stwcx.

- New instructions LWSYNC, LBAR, STBCC were tested and added.
runtime/atomic_ppc64x.s was changed to use the LWSYNC opcode
instead of the WORD encoding.

Fixes #15469

Ran some of the benchmarks in the runtime and sync directories.
Some results varied from run to run but the trend was improvement
based on best times for base and new:

runtime.test:
BenchmarkChanNonblocking-128         0.88          0.89          +1.14%
BenchmarkChanUncontended-128         569           511           -10.19%
BenchmarkChanContended-128           63110         53231         -15.65%
BenchmarkChanSync-128                691           598           -13.46%
BenchmarkChanSyncWork-128            11355         11649         +2.59%
BenchmarkChanProdCons0-128           2402          2090          -12.99%
BenchmarkChanProdCons10-128          1348          1363          +1.11%
BenchmarkChanProdCons100-128         1002          746           -25.55%
BenchmarkChanProdConsWork0-128       2554          2720          +6.50%
BenchmarkChanProdConsWork10-128      1909          1804          -5.50%
BenchmarkChanProdConsWork100-128     1624          1580          -2.71%
BenchmarkChanCreation-128            237           212           -10.55%
BenchmarkChanSem-128                 705           667           -5.39%
BenchmarkChanPopular-128             5081190       4497566       -11.49%

BenchmarkCreateGoroutines-128             532           473           -11.09%
BenchmarkCreateGoroutinesParallel-128     35.0          34.7          -0.86%
BenchmarkCreateGoroutinesCapture-128      4923          4200          -14.69%

sync.test:
BenchmarkUncontendedSemaphore-128      112           94.2          -15.89%
BenchmarkContendedSemaphore-128        133           128           -3.76%
BenchmarkMutexUncontended-128          1.90          1.67          -12.11%
BenchmarkMutex-128                     353           310           -12.18%
BenchmarkMutexSlack-128                304           283           -6.91%
BenchmarkMutexWork-128                 554           541           -2.35%
BenchmarkMutexWorkSlack-128            567           556           -1.94%
BenchmarkMutexNoSpin-128               275           242           -12.00%
BenchmarkMutexSpin-128                 1129          1030          -8.77%
BenchmarkOnce-128                      1.08          0.96          -11.11%
BenchmarkPool-128                      29.8          27.4          -8.05%
BenchmarkPoolOverflow-128              40564         36583         -9.81%
BenchmarkSemaUncontended-128           3.14          2.63          -16.24%
BenchmarkSemaSyntNonblock-128          1087          1069          -1.66%
BenchmarkSemaSyntBlock-128             897           893           -0.45%
BenchmarkSemaWorkNonblock-128          1034          1028          -0.58%
BenchmarkSemaWorkBlock-128             949           886           -6.64%

Change-Id: I4403fb29d3cd5254b7b1ce87a216bd11b391079e
Reviewed-on: https://go-review.googlesource.com/22549
Reviewed-by: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Minux Ma <minux@golang.org>
2016-05-05 18:52:28 +00:00
Emmanuel Odeke 53fd522c0d all: make copyright headers consistent with one space after period
Follows suit with https://go-review.googlesource.com/#/c/20111.

Generated by running
$ grep -R 'Go Authors.  All' * | cut -d":" -f1 | while read F;do perl -pi -e 's/Go
Authors.  All/Go Authors. All/g' $F;done

The code in cmd/internal/unvendor wasn't changed.

Fixes #15213

Change-Id: I4f235cee0a62ec435f9e8540a1ec08ae03b1a75f
Reviewed-on: https://go-review.googlesource.com/21819
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-02 13:43:18 +00:00
Matthew Dempsky babfb4ec3b cmd/internal/obj: change Link.Flag_shared to bool
Change-Id: I9bda2ce6f45fb8292503f86d8f9f161601f222b7
Reviewed-on: https://go-review.googlesource.com/22053
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
2016-04-14 02:11:17 +00:00
Matthew Dempsky c6e11fe037 cmd: add new common architecture representation
Information about CPU architectures (e.g., name, family, byte
ordering, pointer and register size) is currently redundantly
scattered around the source tree. Instead consolidate the basic
information into a single new package cmd/internal/sys.

Also, introduce new sys.I386, sys.AMD64, etc. names for the constants
'8', '6', etc. and replace most uses of the latter. The notable
exceptions are a couple of error messages that still refer to the old
char-based toolchain names and function reltype in cmd/link.

Passes toolstash/buildall.

Change-Id: I8a6f0cbd49577ec1672a98addebc45f767e36461
Reviewed-on: https://go-review.googlesource.com/21623
Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-07 01:23:25 +00:00