Commit Graph

87 Commits

Author SHA1 Message Date
David du Colombier c8af6de2e8 [dev.cc] cmd/5g,cmd/6g,cmd/9g: fix warnings on Plan 9
warning: src/cmd/5g/reg.c:461 format mismatch d VLONG, arg 5
warning: src/cmd/6g/reg.c:396 format mismatch d VLONG, arg 5
warning: src/cmd/9g/reg.c:440 format mismatch d VLONG, arg 5

LGTM=minux
R=rsc, minux
CC=golang-codereviews
https://golang.org/cl/179300043
2014-11-25 08:42:00 +01:00
Austin Clements 5b38501a4f [dev.power64] 5g,6g,8g,9g: debug prints for regopt pass 6 and paint2
Theses were very helpful in understanding the regions and
register selection when porting regopt to 9g.  Add them to the
other compilers (and improve 9g's successor debug print).

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/174130043
2014-11-14 11:56:31 -05:00
Austin Clements c3dadb3d19 [dev.power64] 6g,8g: remove unnecessary and incorrect reg use scanning
Previously, the 6g and 8g registerizers scanned for used
registers beyond the end of a region being considered for
registerization.  This ancient artifact was copied from the C
compilers, where it was probably necessary to track implicitly
used registers.  In the Go compilers it's harmless (because it
can only over-restrict the set of available registers), but no
longer necessary because the Go compilers correctly track
register use/set information.  The consequences of this extra
scan were (at least) that 1) we would not consider allocating
the AX register if there was a deferproc call in the future
because deferproc uses AX as a return register, so we see the
use of AX, but don't track that AX is set by the CALL, and 2)
we could not consider allocating the DX register if there was
a MUL in the future because MUL implicitly sets DX and (thanks
to an abuse of copyu in this code) we would also consider DX
used.

This commit fixes these problems by nuking this code.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/174110043
2014-11-13 13:34:20 -05:00
Austin Clements 4f81684f86 [dev.power64] 6g: don't create variables for indirect addresses
Previously, mkvar treated, for example, 0(AX) the same as AX.
As a result, a move to an indirect address would be marked as
*setting* the register, rather than just using it, resulting
in unnecessary register moves.  Fix this by not producing
variables for indirect addresses.

LGTM=rsc
R=rsc, dave
CC=golang-codereviews
https://golang.org/cl/164610043
2014-11-05 15:36:47 -05:00
Austin Clements fa32e922d5 [dev.power64] gc: convert Bits to a uint64 array
So far all of our architectures have had at most 32 registers,
so we've been able to use entry 0 in the Bits uint32 array
directly as a register mask.  Power64 has 64 registers, so
this converts Bits to a uint64 array so we can continue to use
entry 0 directly as a register mask on Power64.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/169060043
2014-11-04 16:34:56 -05:00
Josh Bleecher Snyder be2ad1d7f5 cmd/5g, cmd/6g, cmd/8g: clear Addr node when registerizing
Update #8525

Some temporary variables that were fully registerized nevertheless had stack space allocated for them because the Addrs were still marked as having associated nodes.

Distribution of stack space reserved for temporary variables while running make.bash (6g):

BEFORE

40.89%  7026 allocauto: 0 to 0
 7.83%  1346 allocauto: 0 to 24
 7.22%  1241 allocauto: 0 to 8
 6.30%  1082 allocauto: 0 to 16
 4.96%   853 allocauto: 0 to 56
 4.59%   789 allocauto: 0 to 32
 2.97%   510 allocauto: 0 to 40
 2.32%   399 allocauto: 0 to 48
 2.10%   360 allocauto: 0 to 64
 1.91%   328 allocauto: 0 to 72

AFTER

48.49%  8332 allocauto: 0 to 0
 9.52%  1635 allocauto: 0 to 16
 5.28%   908 allocauto: 0 to 48
 4.80%   824 allocauto: 0 to 32
 4.73%   812 allocauto: 0 to 8
 3.38%   581 allocauto: 0 to 24
 2.35%   404 allocauto: 0 to 40
 2.32%   399 allocauto: 0 to 64
 1.65%   284 allocauto: 0 to 56
 1.34%   230 allocauto: 0 to 72

LGTM=rsc
R=rsc
CC=dave, dvyukov, golang-codereviews, minux
https://golang.org/cl/126160043
2014-08-24 19:07:27 -07:00
Dave Cheney afb6221bf7 cmd/6g: fix undefined behavior in reg.c
Update #8527

Fixes, cmd/6g/reg.c:847:24: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'

LGTM=minux, rsc
R=minux, rsc
CC=dvyukov, golang-codereviews
https://golang.org/cl/129290043
2014-08-19 10:52:50 +10:00
Russ Cox ebce79446d build: annotations and modifications for c2go
The main changes fall into a few patterns:

1. Replace #define with enum.

2. Add /*c2go */ comment giving effect of #define.
This is necessary for function-like #defines and
non-enum-able #defined constants.
(Not all compilers handle negative or large enums.)

3. Add extra braces in struct initializer.
(c2go does not implement the full rules.)

This is enough to let c2go typecheck the source tree.
There may be more changes once it is doing
other semantic analyses.

LGTM=minux, iant
R=minux, dave, iant
CC=golang-codereviews
https://golang.org/cl/106860045
2014-07-02 15:41:29 -04:00
Russ Cox 1afbceb599 cmd/6g: treat vardef-initialized fat variables as live at calls
This CL forces the optimizer to preserve some memory stores
that would be redundant except that a stack scan due to garbage
collection or stack copying might look at them during a function call.
As such, it forces additional memory writes and therefore slows
down the execution of some programs, especially garbage-heavy
programs that are already limited by memory bandwidth.

The slowdown can be as much as 7% for end-to-end benchmarks.

These numbers are from running go1.test -test.benchtime=5s three times,
taking the best (lowest) ns/op for each benchmark. I am excluding
benchmarks with time/op < 10us to focus on macro effects.
All benchmarks are on amd64.

Comparing tip (a27f34c771cb) against this CL on an Intel Core i5 MacBook Pro:

benchmark                          old ns/op      new ns/op      delta
BenchmarkBinaryTree17              3876500413     3856337341     -0.52%
BenchmarkFannkuch11                2965104777     2991182127     +0.88%
BenchmarkGobDecode                 8563026        8788340        +2.63%
BenchmarkGobEncode                 5050608        5267394        +4.29%
BenchmarkGzip                      431191816      434168065      +0.69%
BenchmarkGunzip                    107873523      110563792      +2.49%
BenchmarkHTTPClientServer          85036          86131          +1.29%
BenchmarkJSONEncode                22143764       22501647       +1.62%
BenchmarkJSONDecode                79646916       85658808       +7.55%
BenchmarkMandelbrot200             4720421        4700108        -0.43%
BenchmarkGoParse                   4651575        4712247        +1.30%
BenchmarkRegexpMatchMedium_1K      71986          73490          +2.09%
BenchmarkRegexpMatchHard_1K        111018         117495         +5.83%
BenchmarkRevcomp                   648798723      659352759      +1.63%
BenchmarkTemplate                  112673009      112819078      +0.13%

Comparing tip (a27f34c771cb) against this CL on an Intel Xeon E5520:

BenchmarkBinaryTree17              5461110720     5393104469     -1.25%
BenchmarkFannkuch11                4314677151     4327177615     +0.29%
BenchmarkGobDecode                 11065853       11235272       +1.53%
BenchmarkGobEncode                 6500065        6959837        +7.07%
BenchmarkGzip                      647478596      671769097      +3.75%
BenchmarkGunzip                    139348579      141096376      +1.25%
BenchmarkHTTPClientServer          69376          73610          +6.10%
BenchmarkJSONEncode                30172320       31796106       +5.38%
BenchmarkJSONDecode                113704905      114239137      +0.47%
BenchmarkMandelbrot200             6032730        6003077        -0.49%
BenchmarkGoParse                   6775251        6405995        -5.45%
BenchmarkRegexpMatchMedium_1K      111832         113895         +1.84%
BenchmarkRegexpMatchHard_1K        161112         168420         +4.54%
BenchmarkRevcomp                   876363406      892319935      +1.82%
BenchmarkTemplate                  146273096      148998339      +1.86%

Just to get a sense of where we are compared to the previous release,
here are the same benchmarks comparing Go 1.2 to this CL.

Comparing Go 1.2 against this CL on an Intel Core i5 MacBook Pro:

BenchmarkBinaryTree17              4370077662     3856337341     -11.76%
BenchmarkFannkuch11                3347052657     2991182127     -10.63%
BenchmarkGobDecode                 8791384        8788340        -0.03%
BenchmarkGobEncode                 4968759        5267394        +6.01%
BenchmarkGzip                      437815669      434168065      -0.83%
BenchmarkGunzip                    94604099       110563792      +16.87%
BenchmarkHTTPClientServer          87798          86131          -1.90%
BenchmarkJSONEncode                22818243       22501647       -1.39%
BenchmarkJSONDecode                97182444       85658808       -11.86%
BenchmarkMandelbrot200             4733516        4700108        -0.71%
BenchmarkGoParse                   5054384        4712247        -6.77%
BenchmarkRegexpMatchMedium_1K      67612          73490          +8.69%
BenchmarkRegexpMatchHard_1K        107321         117495         +9.48%
BenchmarkRevcomp                   733270055      659352759      -10.08%
BenchmarkTemplate                  109304977      112819078      +3.21%

Comparing Go 1.2 against this CL on an Intel Xeon E5520:

BenchmarkBinaryTree17              5986953594     5393104469     -9.92%
BenchmarkFannkuch11                4861139174     4327177615     -10.98%
BenchmarkGobDecode                 11830997       11235272       -5.04%
BenchmarkGobEncode                 6608722        6959837        +5.31%
BenchmarkGzip                      661875826      671769097      +1.49%
BenchmarkGunzip                    138630019      141096376      +1.78%
BenchmarkHTTPClientServer          71534          73610          +2.90%
BenchmarkJSONEncode                30393609       31796106       +4.61%
BenchmarkJSONDecode                139645860      114239137      -18.19%
BenchmarkMandelbrot200             5988660        6003077        +0.24%
BenchmarkGoParse                   6974092        6405995        -8.15%
BenchmarkRegexpMatchMedium_1K      111331         113895         +2.30%
BenchmarkRegexpMatchHard_1K        165961         168420         +1.48%
BenchmarkRevcomp                   995049292      892319935      -10.32%
BenchmarkTemplate                  145623363      148998339      +2.32%

Fixes #8036.

LGTM=khr
R=golang-codereviews, josharian, khr
CC=golang-codereviews, iant, r
https://golang.org/cl/99660044
2014-05-30 16:41:58 -04:00
Russ Cox f5184d3437 cmd/gc: correct handling of globals, func args, results
Globals, function arguments, and results are special cases in
registerization.

Globals must be flushed aggressively, because nearly any
operation can cause a panic, and the recovery code must see
the latest values. Globals also must be loaded aggressively,
because nearly any store through a pointer might be updating a
global: the compiler cannot see all the "address of"
operations on globals, especially exported globals. To
accomplish this, mark all globals as having their address
taken, which effectively disables registerization.

If a function contains a defer statement, the function results
must be flushed aggressively, because nearly any operation can
cause a panic, and the deferred code may call recover, causing
the original function to return the current values of its
function results. To accomplish this, mark all function
results as having their address taken if the function contains
any defer statements. This causes not just aggressive flushing
but also aggressive loading. The aggressive loading is
overkill but the best we can do in the current code.

Function arguments must be considered live at all safe points
in a function, because garbage collection always preserves
them: they must be up-to-date in order to be preserved
correctly. Accomplish this by marking them live at all call
sites. An earlier attempt at this marked function arguments as
having their address taken, which disabled registerization
completely, making programs slower. This CL's solution allows
registerization while preserving safety. The benchmark speedup
is caused by being able to registerize again (the earlier CL
lost the same amount).

benchmark                old ns/op     new ns/op     delta
BenchmarkEqualPort32     61.4          56.0          -8.79%

benchmark                old MB/s     new MB/s     speedup
BenchmarkEqualPort32     521.56       570.97       1.09x

Fixes #1304. (again)
Fixes #7944. (again)
Fixes #7984.
Fixes #7995.

LGTM=khr
R=golang-codereviews, khr
CC=golang-codereviews, iant, r
https://golang.org/cl/97500044
2014-05-15 15:34:53 -04:00
Russ Cox 26ad5d4ff0 cmd/gc: fix liveness vs regopt mismatch for input variables
The inputs to a function are marked live at all times in the
liveness bitmaps, so that the garbage collector will not free
the things they point at and reuse the pointers, so that the
pointers shown in stack traces are guaranteed not to have
been recycled.

Unfortunately, no one told the register optimizer that the
inputs need to be preserved at all call sites. If a function
is done with a particular input value, the optimizer will stop
preserving it across calls. For single-word values this just
means that the value recorded might be stale. For multi-word
values like slices, the value recorded could be only partially stale:
it can happen that, say, the cap was updated but not the len,
or that the len was updated but not the base pointer.
Either of these possibilities (and others) would make the
garbage collector misinterpret memory, leading to memory
corruption.

This came up in a real program, in which the garbage collector's
'slice len ≤ slice cap' check caught the inconsistency.

Fixes #7944.

LGTM=iant
R=golang-codereviews, iant
CC=golang-codereviews, khr
https://golang.org/cl/100370045
2014-05-12 17:19:02 -04:00
Josh Bleecher Snyder 1848d71445 cmd/gc: don't give credit for NOPs during register allocation
The register allocator decides which variables should be placed into registers by charging for each load/store and crediting for each use, and then selecting an allocation with minimal cost. NOPs will be eliminated, however, so using a variable in a NOP should not generate credit.

Issue 7867 arises from attempted registerization of multi-word variables because they are used in NOPs. By not crediting for that use, they will no longer be considered for registerization.

This fix could theoretically lead to better register allocation, but NOPs are rare relative to other instructions.

Fixes #7867.

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/94810044
2014-05-09 09:55:17 -07:00
Russ Cox 0a8a719ded cmd/5g, cmd/6g, cmd/8g: preserve wide values in large functions
In large functions with many variables, the register optimizer
may give up and choose not to track certain variables at all.
In this case, the "nextinnode" information linking together
all the words from a given variable will be incomplete, and
the result may be that only some of a multiword value is
preserved across a call. That confuses the garbage collector,
so don't do that. Instead, mark those variables as having
their address taken, so that they will be preserved at all
calls. It's overkill, but correct.

Tested by hand using the 6g -S output to see that it does fix
the buggy generated code leading to the issue 7726 failure.

There is no automated test because I managed to break the
compiler while writing a test (see issue 7727). I will check
in a test along with the fix to issue 7727.

Fixes #7726.

LGTM=khr
R=khr, bradfitz, dave
CC=golang-codereviews
https://golang.org/cl/85200043
2014-04-16 13:59:42 -04:00
Russ Cox b700cb4974 cmd/gc: shorten temporary lifetimes when possible
The new channel and map runtime routines take pointers
to values, typically temporaries. Without help, the compiler
cannot tell when those temporaries stop being needed,
because it isn't sure what happened to the pointer.
Arrange to insert explicit VARKILL instructions for these
temporaries so that the liveness analysis can avoid seeing
them as "ambiguously live".

The change is made in order.c, which was already in charge of
introducing temporaries to preserve the order-of-evaluation
guarantees. Now its job has expanded to include introducing
temporaries as needed by runtime routines, and then also
inserting the VARKILL annotations for all these temporaries,
so that their lifetimes can be shortened.

In order to do its job for the map runtime routines, order.c arranges
that all map lookups or map assignments have the form:

        x = m[k]
        x, y = m[k]
        m[k] = x

where x, y, and k are simple variables (often temporaries).
Likewise, receiving from a channel is now always:

        x = <-c

In order to provide the map guarantee, order.c is responsible for
rewriting x op= y into x = x op y, so that m[k] += z becomes

        t = m[k]
        t2 = t + z
        m[k] = t2

While here, fix a few bugs in order.c's traversal: it was failing to
walk into select and switch case bodies, so order of evaluation
guarantees were not preserved in those situations.
Added tests to test/reorder2.go.

Fixes #7671.

In gc/popt's temporary-merging optimization, allow merging
of temporaries with their address taken as long as the liveness
ranges do not intersect. (There is a good chance of that now
that we have VARKILL annotations to limit the liveness range.)

Explicitly killing temporaries cuts the number of ambiguously
live temporaries that must be zeroed in the godoc binary from
860 to 711, or -17%. There is more work to be done, but this
is a good checkpoint.

Update #7345

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/81940043
2014-04-01 13:31:38 -04:00
Russ Cox 6722d45631 cmd/gc: liveness-related bug fixes
1. On entry to a function, only zero the ambiguously live stack variables.
Before, we were zeroing all stack variables containing pointers.
The zeroing is pretty inefficient right now (issue 7624), but there are also
too many stack variables detected as ambiguously live (issue 7345),
and that must be addressed before deciding how to improve the zeroing code.
(Changes in 5g/ggen.c, 6g/ggen.c, 8g/ggen.c, gc/pgen.c)

Fixes #7647.

2. Make the regopt word-based liveness analysis preserve the
whole-variable liveness property expected by the garbage collection
bitmap liveness analysis. That is, if the regopt liveness decides that
one word in a struct needs to be preserved, make sure it preserves
the entire struct. This is particularly important for multiword values
such as strings, slices, and interfaces, in which all the words need
to be present in order to understand the meaning.
(Changes in 5g/reg.c, 6g/reg.c, 8g/reg.c.)

Fixes #7591.

3. Make the regopt word-based liveness analysis treat a variable
as having its address taken - which makes it preserved across
all future calls - whenever n->addrtaken is set, for consistency
with the gc bitmap liveness analysis, even if there is no machine
instruction actually taking the address. In this case n->addrtaken
is incorrect (a nicer way to put it is overconservative), and ideally
there would be no such cases, but they can happen and the two
analyses need to agree.
(Changes in 5g/reg.c, 6g/reg.c, 8g/reg.c; test in bug484.go.)

Fixes crashes found by turning off "zero everything" in step 1.

4. Remove spurious VARDEF annotations. As the comment in
gc/pgen.c explains, the VARDEF must immediately precede
the initialization. It cannot be too early, and it cannot be too late.
In particular, if a function call sits between the VARDEF and the
actual machine instructions doing the initialization, the variable
will be treated as live during that function call even though it is
uninitialized, leading to problems.
(Changes in gc/gen.c; test in live.go.)

Fixes crashes found by turning off "zero everything" in step 1.

5. Do not treat loading the address of a wide value as a signal
that the value must be initialized. Instead depend on the existence
of a VARDEF or the first actual read/write of a word in the value.
If the load is in order to pass the address to a function that does
the actual initialization, treating the load as an implicit VARDEF
causes the same problems as described in step 4.
The alternative is to arrange to zero every such value before
passing it to the real initialization function, but this is a much
easier and more efficient change.
(Changes in gc/plive.c.)

Fixes crashes found by turning off "zero everything" in step 1.

6. Treat wide input parameters with their address taken as
initialized on entry to the function. Otherwise they look
"ambiguously live" and we will try to emit code to zero them.
(Changes in gc/plive.c.)

Fixes crashes found by turning off "zero everything" in step 1.

7. An array of length 0 has no pointers, even if the element type does.
Without this change, the zeroing code complains when asked to
clear a 0-length array.
(Changes in gc/reflect.c.)

LGTM=khr
R=khr
CC=golang-codereviews
https://golang.org/cl/80160044
2014-03-27 14:05:57 -04:00
Dave Cheney 7c8280c9ef all: merge NaCl branch (part 1)
See golang.org/s/go13nacl for design overview.

This CL is the mostly mechanical changes from rsc's Go 1.2 based NaCl branch, specifically 39cb35750369 to 500771b477cf from https://code.google.com/r/rsc-go13nacl. This CL does not include working NaCl support, there are probably two or three more large merges to come.

CL 15750044 is not included as it involves more invasive changes to the linker which will need to be merged separately.

The exact change lists included are

15050047: syscall: support for Native Client
15360044: syscall: unzip implementation for Native Client
15370044: syscall: Native Client SRPC implementation
15400047: cmd/dist, cmd/go, go/build, test: support for Native Client
15410048: runtime: support for Native Client
15410049: syscall: file descriptor table for Native Client
15410050: syscall: in-memory file system for Native Client
15440048: all: update +build lines for Native Client port
15540045: cmd/6g, cmd/8g, cmd/gc: support for Native Client
15570045: os: support for Native Client
15680044: crypto/..., hash/crc32, reflect, sync/atomic: support for amd64p32
15690044: net: support for Native Client
15690048: runtime: support for fake time like on Go Playground
15690051: build: disable various tests on Native Client

LGTM=rsc
R=rsc
CC=golang-codereviews
https://golang.org/cl/68150047
2014-02-25 09:47:42 -05:00
Russ Cox 7a7c0ffb47 cmd/gc: correct liveness for fat variables
The VARDEF placement must be before the initialization
but after any final use. If you have something like s = ... using s ...
the rhs must be evaluated, then the VARDEF, then the lhs
assigned.

There is a large comment in pgen.c on gvardef explaining
this in more detail.

This CL also includes Ian's suggestions from earlier CLs,
namely commenting the use of mode in link.h and fixing
the precedence of the ~r check in dcl.c.

This CL enables the check that if liveness analysis decides
a variable is live on entry to the function, that variable must
be a function parameter (not a result, and not a local variable).
If this check fails, it indicates a bug in the liveness analysis or
in the generated code being analyzed.

The race detector generates invalid code for append(x, y...).
The code declares a temporary t and then uses cap(t) before
initializing t. The new liveness check catches this bug and
stops the compiler from writing out the buggy code.
Consequently, this CL disables the race detector tests in
run.bash until the race detector bug can be fixed
(golang.org/issue/7334).

Except for the race detector bug, the liveness analysis check
does not detect any problems (this CL and the previous CLs
fixed all the detected problems).

The net test still fails with GOGC=0 but the rest of the tests
now pass or time out (because GOGC=0 is so slow).

TBR=iant
CC=golang-codereviews
https://golang.org/cl/64170043
2014-02-15 10:58:55 -05:00
Russ Cox 684332f47c cmd/5g: fix regopt bug in copyprop
copyau1 was assuming that it could deduce the type of the
middle register p->reg from the type of the left or right
argument: in CMPF F1, F2, the p->reg==2 must be a D_FREG
because p->from is F1, and in CMP R1, R2, the p->reg==2 must
be a D_REG because p->from is R1.

This heuristic fails for CMP $0, R2, which was causing copyau1
not to recognize p->reg==2 as a reference to R2, which was
keeping it from properly renaming the register use when
substituting registers.

cmd/5c has the right approach: look at the opcode p->as to
decide the kind of register. It is unclear where 5g's copyau1
came from; perhaps it was an attempt to avoid expanding 5c's
a2type to include new instructions used only by 5g.

Copy a2type from cmd/5c, expand to include additional instructions,
and make it crash the compiler if asked about an instruction
it does not understand (avoid silent bugs in the future if new
instructions are added).

Should fix current arm build breakage.

While we're here, fix the print statements dumping the pred and
succ info in the asm listing to pass an int arg to %.4ud
(Prog.pc is a vlong now, due to the liblink merge).

TBR=ken2
CC=golang-codereviews
https://golang.org/cl/62730043
2014-02-13 03:54:55 +00:00
Anthony Martin 2cae0591cd cmd/cc, cmd/gc, cmd/ld: consolidate print format routines
We now use the %A, %D, %P, and %R routines from liblink
across the board.

Fixes #7178.
Fixes #7055.

LGTM=iant
R=golang-codereviews, gobot, rsc, dave, iant, remyoudompheng
CC=golang-codereviews
https://golang.org/cl/49170043
2014-02-12 14:29:11 -05:00
David du Colombier 9607255760 cmd/6g, cmd/gc, cmd/ld: fix Plan 9 amd64 warnings
warning: src/cmd/6g/reg.c:671 format mismatch d VLONG, arg 4
warning: src/cmd/gc/pgen.c:230 set and not used: oldstksize
warning: src/cmd/gc/plive.c:877 format mismatch lx UVLONG, arg 2
warning: src/cmd/gc/walk.c:2878 set and not used: cbv
warning: src/cmd/gc/walk.c:2885 set and not used: hbv
warning: src/cmd/ld/data.c:198 format mismatch s IND FUNC(IND CHAR) INT, arg 2
warning: src/cmd/ld/data.c:230 format mismatch s IND FUNC(IND CHAR) INT, arg 2
warning: src/cmd/ld/dwarf.c:1517 set and not used: pc
warning: src/cmd/ld/elf.c:1507 format mismatch d VLONG, arg 2
warning: src/cmd/ld/ldmacho.c:509 set and not used: dsymtab

R=golang-dev, gobot, rsc
CC=golang-dev
https://golang.org/cl/36740045
2013-12-18 20:20:46 +01:00
Russ Cox f606c1be80 cmd/5g, cmd/6g, cmd/8g: use liblink
Preparation for golang.org/s/go13linker work.

This CL does not build by itself. It depends on 35740044
and 35790044 and will be submitted at the same time.

R=iant
CC=golang-dev
https://golang.org/cl/34590045
2013-12-08 22:51:55 -05:00
Carl Shapiro f056daf075 cmd/5g, cmd/5l, cmd/6g, cmd/6l, cmd/8g, cmd/8l, cmd/gc, runtime: generate pointer maps by liveness analysis
This change allows the garbage collector to examine stack
slots that are determined as live and containing a pointer
value by the garbage collector.  This results in a mean
reduction of 65% in the number of stack slots scanned during
an invocation of "GOGC=1 all.bash".

Unfortunately, this does not yet allow garbage collection to
be precise for the stack slots computed as live.  Pointers
confound the determination of what definitions reach a given
instruction.  In general, this problem is not solvable without
runtime cost but some advanced cooperation from the compiler
might mitigate common cases.

R=golang-dev, rsc, cshapiro
CC=golang-dev
https://golang.org/cl/14430048
2013-12-05 17:35:22 -08:00
Russ Cox fa72679f07 cmd/gc: add temporary-merging optimization pass
The compilers assume they can generate temporary variables
as needed to preserve the right semantics or simplify code
generation and the back end will still generate good code.
This turns out not to be true. The back ends will only
track the first 128 variables per function and give up
on the remainder. That needs to be fixed too, in a later CL.

This CL merges temporary variables with equal types and
non-overlapping lifetimes using the greedy algorithm in
Poletto and Sarkar, "Linear Scan Register Allocation",
ACM TOPLAS 1999.

The result can be striking in the right functions.

Top 20 frame size changes in a 6g godoc binary by bytes saved:

5464 1984 (-3480, -63.7%) go/build.(*Context).Import
4456 1824 (-2632, -59.1%) go/printer.(*printer).expr1
2560   80 (-2480, -96.9%) time.nextStdChunk
3496 1608 (-1888, -54.0%) go/printer.(*printer).stmt
1896  272 (-1624, -85.7%) net/http.init
2688 1400 (-1288, -47.9%) fmt.(*pp).printReflectValue
2800 1512 (-1288, -46.0%) main.main
3296 2016 (-1280, -38.8%) crypto/tls.(*Conn).clientHandshake
1664  488 (-1176, -70.7%) time.loadZoneZip
1760  608 (-1152, -65.5%) time.parse
4104 3072 (-1032, -25.1%) runtime/pprof.writeHeap
1680  712 ( -968, -57.6%) go/ast.Walk
2488 1560 ( -928, -37.3%) crypto/x509.parseCertificate
1128  392 ( -736, -65.2%) math/big.nat.divLarge
1528  864 ( -664, -43.5%) go/printer.(*printer).fieldList
1360  712 ( -648, -47.6%) regexp/syntax.(*parser).factor
2104 1528 ( -576, -27.4%) encoding/asn1.parseField
1064  504 ( -560, -52.6%) encoding/xml.(*Decoder).text
 584   48 ( -536, -91.8%) html.init
1400  864 ( -536, -38.3%) go/doc.playExample

In the same godoc build, cuts the number of functions with
too many vars from 83 to 32.

R=ken2
CC=golang-dev
https://golang.org/cl/12829043
2013-08-13 00:09:31 -04:00
Russ Cox dbf96addfb cmd/gc: move flow graph into portable opt
Now there's only one copy of the flow graph construction
and dominator computation, and different optimizations
can attach different annotations to the instructions.

R=ken2
CC=golang-dev
https://golang.org/cl/12797045
2013-08-12 22:02:10 -04:00
Russ Cox b3b87143f2 cmd/gc: support for "portable" optimization logic
Code in gc/popt.c is compiled as part of 5g, 6g, and 8g,
meaning it can use arch-specific headers but there's
just one copy of the code.

This is the same arrangement we use for the portable
code generation logic in gc/pgen.c.

Move fixjmp and noreturn there to get the ball rolling.

R=ken2
CC=golang-dev
https://golang.org/cl/12789043
2013-08-12 19:14:02 -04:00
Russ Cox 24c8035fbe cmd/6g: move opt instruction decode into common function
Add new proginfo function that returns information about a
Prog*. The information includes various instruction
description bits as well as a list of required registers set
and used and indexing registers used.

Convert the large instruction switches to use proginfo.

This information was formerly duplicated in multiple
optimization passes, inconsistently. For example, the
information about which registers an instruction requires
appeared three times for most instructions.

Most of the switches were incomplete or incorrect in some way.
For example, the switch in copyu did not list cases for INCB,
JPS, MOVAPD, MOVBWSX, MOVBWZX, PCDATA, POPQ, PUSHQ, STD,
TESTB, TESTQ, and XCHGL. Those were all falling into the
"unknown instruction" default case and stopping the rewrite,
perhaps unnecessarily. Similarly, the switch in needc only
listed a handful of the instructions that use or set the carry bit.

We still need to decide whether to use proginfo to generalize
a few of the remaining smaller switches in peep.c.

If this goes well, we'll make similar changes in 8g and 5g.

R=ken2
CC=golang-dev
https://golang.org/cl/12637051
2013-08-11 21:46:38 -04:00
Russ Cox 48769bf546 runtime: use funcdata to supply garbage collection information
This CL introduces a FUNCDATA number for runtime-specific
garbage collection metadata, changes the C and Go compilers
to emit that metadata, and changes the runtime to expect it.

The old pseudo-instructions that carried this information
are gone, as is the linker code to process them.

R=golang-dev, dvyukov, cshapiro
CC=golang-dev
https://golang.org/cl/11406044
2013-07-19 16:04:09 -04:00
Carl Shapiro 4e0a51c210 cmd/5l, cmd/6l, cmd/8l, cmd/gc, runtime: generate and use bitmaps of argument pointer locations
With this change the compiler emits a bitmap for each function
covering its stack frame arguments area.  If an argument word
is known to contain a pointer, a bit is set.  The garbage
collector reads this information when scanning the stack by
frames and uses it to ignores locations known to not contain a
pointer.

R=golang-dev, bradfitz, daniel.morsing, dvyukov, khr, khr, iant, cshapiro
CC=golang-dev
https://golang.org/cl/9223046
2013-05-28 17:59:10 -07:00
Rob Pike 4dcb13bb44 cmd/gc: fix some overflows in the compiler
Some 64-bit fields were run through 32-bit words, some counts were
not checked for overflow, and relocations must fit in 32 bits.
Tests to follow.

R=golang-dev, dsymonds
CC=golang-dev
https://golang.org/cl/9033043
2013-04-29 22:44:40 -07:00
David du Colombier 3a14b42edf cmd/6g: fix warnings on Plan 9
src/cmd/6g/peep.c:471 set and not used: r
src/cmd/6g/peep.c:560 overspecified class: regconsttyp GLOBL STATIC
src/cmd/6g/peep.c:761 more arguments than format IND STRUCT Prog
src/cmd/6g/reg.c:185 set and not used: r1
src/cmd/6g/reg.c:786 format mismatch d VLONG, arg 3
src/cmd/6g/reg.c:1064 format mismatch d VLONG, arg 5

R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/8197044
2013-03-30 09:31:49 -07:00
Russ Cox aa3efb28f0 cmd/gc: can stop tracking gotype in regopt
Now that the type information is in TYPE instructions
that are not rewritten by the optimization passes,
we don't have to try to preserve the type information
(no longer) attached to MOV instructions.

R=ken2
CC=golang-dev
https://golang.org/cl/7402054
2013-02-25 16:11:34 -05:00
Russ Cox 1d5dc4fd48 cmd/gc: emit explicit type information for local variables
The type information is (and for years has been) included
as an extra field in the address chunk of an instruction.
Unfortunately, suppose there is a string at a+24(FP) and
we have an instruction reading its length. It will say:

        MOVQ x+32(FP), AX

and the type of *that* argument is int (not slice), because
it is the length being read. This confuses the picture seen
by debuggers and now, worse, by the garbage collector.

Instead of attaching the type information to all uses,
emit an explicit list of TYPE instructions with the information.
The TYPE instructions are no-ops whose only role is to
provide an address to attach type information to.

For example, this function:

        func f(x, y, z int) (a, b string) {
                return
        }

now compiles into:

        --- prog list "f" ---
        0000 (/Users/rsc/x.go:3) TEXT    f+0(SB),$0-56
        0001 (/Users/rsc/x.go:3) LOCALS  ,
        0002 (/Users/rsc/x.go:3) TYPE    x+0(FP){int},$8
        0003 (/Users/rsc/x.go:3) TYPE    y+8(FP){int},$8
        0004 (/Users/rsc/x.go:3) TYPE    z+16(FP){int},$8
        0005 (/Users/rsc/x.go:3) TYPE    a+24(FP){string},$16
        0006 (/Users/rsc/x.go:3) TYPE    b+40(FP){string},$16
        0007 (/Users/rsc/x.go:3) MOVQ    $0,b+40(FP)
        0008 (/Users/rsc/x.go:3) MOVQ    $0,b+48(FP)
        0009 (/Users/rsc/x.go:3) MOVQ    $0,a+24(FP)
        0010 (/Users/rsc/x.go:3) MOVQ    $0,a+32(FP)
        0011 (/Users/rsc/x.go:4) RET     ,

The { } show the formerly hidden type information.
The { } syntax is used when printing from within the gc compiler.
It is not accepted by the assemblers.

The same type information is now included on global variables:

0055 (/Users/rsc/x.go:15) GLOBL   slice+0(SB){[]string},$24(AL*0)

This more accurate type information fixes a bug in the
garbage collector's precise heap collection.

The linker only cares about globals right now, but having the
local information should make things a little nicer for Carl
in the future.

Fixes #4907.

R=ken2
CC=golang-dev
https://golang.org/cl/7395056
2013-02-25 12:13:47 -05:00
Russ Cox 2c09d6992f cmd/gc: slightly better code generation
* Avoid treating CALL fn(SB) as justification for introducing
and tracking a registerized variable for fn(SB).

* Remove USED(n) after declaration and zeroing of n.
It was left over from when the compiler emitted more
aggressive set and not used errors, and it was keeping
the optimizer from removing a redundant zeroing of n
when n was a pointer or integer variable.

Update #597.

R=ken2
CC=golang-dev
https://golang.org/cl/7277048
2013-02-03 14:51:21 -05:00
Daniel Morsing f1e4ee3f49 cmd/5g, cmd/6g, cmd/8g: flush return parameters in case of panic.
Fixes #4066.

R=rsc, minux.ma
CC=golang-dev
https://golang.org/cl/7040044
2013-01-04 17:07:21 +01:00
Dave Cheney b2797f2ae0 cmd/{5,6,8}g: reduce size of Prog and Addr
5g: Prog went from 128 bytes to 88 bytes
6g: Prog went from 174 bytes to 144 bytes
8g: Prog went from 124 bytes to 92 bytes

There may be a little more that can be squeezed out of Addr, but alignment will be a factor.

All: remove the unused pun field from Addr

R=rsc, minux.ma
CC=golang-dev
https://golang.org/cl/6922048
2012-12-14 06:20:24 +11:00
Rémy Oudompheng 45fe306ac8 cmd/[568]g: recycle ONAME nodes used in regopt to denote registers.
The reported decrease in memory usage is about 5%.

R=golang-dev, dave, rsc
CC=golang-dev
https://golang.org/cl/6902064
2012-12-09 19:10:52 +01:00
Rémy Oudompheng 7c0cbbfa18 cmd/6g, cmd/8g: mark used registers in indirect addressing.
Fixes #4094.
Fixes #4353.

R=golang-dev, dave, rsc
CC=golang-dev
https://golang.org/cl/6810090
2012-11-07 21:36:15 +01:00
Rémy Oudompheng 33cceb09e2 cmd/{5g,6g,8g,6c}: remove unused macro, use %E to print etype.
R=golang-dev, rsc, dave
CC=golang-dev
https://golang.org/cl/6569044
2012-09-24 23:44:00 +02:00
Russ Cox 650160e36a cmd/gc: prepare for 64-bit ints
This CL makes the compiler understand that the type of
the len or cap of a map, slice, or string is 'int', not 'int32'.
It does not change the meaning of int, but it should make
the eventual change of the meaning of int in 6g a bit smoother.

Update #2188.

R=ken, dave, remyoudompheng
CC=golang-dev
https://golang.org/cl/6542059
2012-09-24 14:59:44 -04:00
Rémy Oudompheng 36df358a30 cmd/6g: fix internal error with SSE registers.
Revision 63f7abcae015 introduced a bug caused by
code assuming registers started at X5, not X0.

Fixes #4138.

R=rsc
CC=golang-dev, remy
https://golang.org/cl/6558043
2012-09-23 18:22:03 +02:00
Russ Cox 658482d70f cmd/5g: fix register opt bug
The width was not being set on the address, which meant
that the optimizer could not find variables that overlapped
with it and mark them as having had their address taken.
This let to the compiler believing variables had been set
but never used and then optimizing away the set.

Fixes #4129.

R=ken2
CC=golang-dev
https://golang.org/cl/6552059
2012-09-22 10:01:35 -04:00
Rémy Oudompheng 413fbed341 cmd/6g: cosmetic improvements to regopt debugging.
R=rsc, golang-dev
CC=golang-dev
https://golang.org/cl/6528044
2012-09-21 20:20:26 +02:00
Russ Cox 57ad05db15 cmd/6g: use all 16 float registers, optimize float moves
Fixes #2446.

R=ken2
CC=golang-dev
https://golang.org/cl/6557044
2012-09-21 13:39:09 -04:00
Rémy Oudompheng a7059cc793 cmd/[568]g: correct freeing of allocated Regs.
R=golang-dev, rsc
CC=golang-dev, remy
https://golang.org/cl/6281050
2012-06-05 06:43:15 +02:00
Russ Cox 001b75c942 cmd/gc: contiguous loop layout
Drop expecttaken function in favor of extra argument
to gbranch and bgen. Mark loop condition as likely to
be true, so that loops are generated inline.

The main benefit here is contiguous code when trying
to read the generated assembly. It has only minor effects
on the timing, and they mostly cancel the minor effects
that aligning function entry points had.  One exception:
both changes made Fannkuch faster.

Compared to before CL 6244066 (before aligned functions)
benchmark                 old ns/op    new ns/op    delta
BenchmarkBinaryTree17    4222117400   4201958800   -0.48%
BenchmarkFannkuch11      3462631800   3215908600   -7.13%
BenchmarkGobDecode         20887622     20899164   +0.06%
BenchmarkGobEncode          9548772      9439083   -1.15%
BenchmarkGzip                151687       152060   +0.25%
BenchmarkGunzip                8742         8711   -0.35%
BenchmarkJSONEncode        62730560     62686700   -0.07%
BenchmarkJSONDecode       252569180    252368960   -0.08%
BenchmarkMandelbrot200      5267599      5252531   -0.29%
BenchmarkRevcomp25M       980813500    985248400   +0.45%
BenchmarkTemplate         361259100    357414680   -1.06%

Compared to tip (aligned functions):
benchmark                 old ns/op    new ns/op    delta
BenchmarkBinaryTree17    4140739800   4201958800   +1.48%
BenchmarkFannkuch11      3259914400   3215908600   -1.35%
BenchmarkGobDecode         20620222     20899164   +1.35%
BenchmarkGobEncode          9384886      9439083   +0.58%
BenchmarkGzip                150333       152060   +1.15%
BenchmarkGunzip                8741         8711   -0.34%
BenchmarkJSONEncode        65210990     62686700   -3.87%
BenchmarkJSONDecode       249394860    252368960   +1.19%
BenchmarkMandelbrot200      5273394      5252531   -0.40%
BenchmarkRevcomp25M       996013800    985248400   -1.08%
BenchmarkTemplate         360620840    357414680   -0.89%

R=ken2
CC=golang-dev
https://golang.org/cl/6245069
2012-05-30 18:07:39 -04:00
Russ Cox 8f8640a057 cmd/6g: allow use of R14, R15 now
We stopped reserving them in 2009 or so.

R=ken
CC=golang-dev
https://golang.org/cl/6215061
2012-05-21 12:59:26 -04:00
Russ Cox e530d6a1e0 6c, 6g, 6l: add MOVQL to make truncation explicit
Without an explicit signal for a truncation, copy propagation
will sometimes propagate a 32-bit truncation and end up
overwriting uses of the original 64-bit value.

The case that arose in practice is in C but I believe
that the same could plausibly happen in Go.
The main reason we didn't run into the same in Go
is that I (perhaps incorrectly?) drop MOVL AX, AX
during gins, so the truncation was never generated, so
it didn't confuse the optimizer.

Fixes #1315.
Fixes #3488.

R=ken2
CC=golang-dev
https://golang.org/cl/6002043
2012-04-10 12:51:59 -04:00
Russ Cox 8998835543 5g, 6g, 8g: flush modified globals aggressively
The alternative is to record enough information that the
trap handler know which registers contain cached globals
and can flush the registers back to their original locations.
That's significantly more work.

This only affects globals that have been written to.
Code that reads from a global should continue to registerize
as well as before.

Fixes #1304.

R=ken2
CC=golang-dev
https://golang.org/cl/5687046
2012-02-20 13:41:44 -05:00
Russ Cox e2d326b878 5g, 6g, 8g: fix loop finding bug, squash jmps
The loop recognizer uses the standard dominance
frontiers but gets confused by dead code, which
has a (not explicitly set) rpo number of 0, meaning it
looks like the head of the function, so it dominates
everything.  If the loop recognizer encounters dead
code while tracking backward through the graph
it fails to recognize where it started as a loop, and
then the optimizer does not registerize values loaded
inside that loop.  Fix by checking rpo against rpo2r.

Separately, run a quick pass over the generated
code to squash JMPs to JMP instructions, which
are convenient to emit during code generation but
difficult to read when debugging the -S output.
A side effect of this pass is to eliminate dead code,
so the output files may be slightly smaller and the
optimizer may have less work to do.
There is no semantic effect, because the linkers
flatten JMP chains and delete dead instructions
when laying out the final code.  Doing it here too
just makes the -S output easier to read and more
like what the final binary will contain.

The "dead code breaks loop finding" bug is thus
fixed twice over.  It seemed prudent to fix loopit
separately just in case dead code ever sneaks back
in for one reason or another.

R=ken2
CC=golang-dev
https://golang.org/cl/5190043
2011-10-04 15:06:16 -04:00
Russ Cox e419535f2a 5g, 6g, 8g: registerize variables again
My previous CL:

changeset:   9645:ce2e5f44b310
user:        Russ Cox <rsc@golang.org>
date:        Tue Sep 06 10:24:21 2011 -0400
summary:     gc: unify stack frame layout

introduced a bug wherein no variables were
being registerized, making Go programs 2-3x
slower than they had been before.

This CL fixes that bug (along with some others
it was hiding) and adds a test that optimization
makes at least one test case faster.

R=ken2
CC=golang-dev
https://golang.org/cl/5174045
2011-10-03 17:46:36 -04:00