Commit Graph

35 Commits

Author SHA1 Message Date
Russ Cox 80a153dd51 cmd/6l, cmd/8l: fix MOVL MOVQ optab
The entry for LEAL/LEAQ in these optabs was listed as having
two data bytes in the y array. In fact they had and expect no data
bytes. However, the general loop expects to be able to look at at
least one data byte, to make sure it is not 0x0f. So give them each
a single data byte set to 0 (not 0x0f).

Since the MOV instructions have the largest optab cases, this
requires growing the size of the data array.

Clang found this bug because the general o->op[z] == 0x0f
test was using z == 22, which was out of bounds.

In practice the next byte in memory was probably not 0x0f
so it wasn't truly broken. But might as well be clean.

Update #5764

R=ken2
CC=golang-dev
https://golang.org/cl/13241050
2013-09-10 14:53:41 -04:00
Russ Cox 567818224e cmd/5l, cmd/6l, cmd/8l: accept PCDATA instruction in input
The portable code in cmd/ld already knows how to process it,
we just have to ignore it during code generation.

R=ken2
CC=golang-dev
https://golang.org/cl/11363043
2013-07-16 16:23:11 -04:00
Russ Cox 5d363c6357 cmd/ld, runtime: new in-memory symbol table format
Design at http://golang.org/s/go12symtab.

This enables some cleanup of the garbage collector metadata
that will be done in future CLs.

This CL does not move the old symtab and pclntab back into
an unmapped section of the file. That's a bit tricky and will be
done separately.

Fixes #4020.

R=golang-dev, dave, cshapiro, iant, r
CC=golang-dev, nigeltao
https://golang.org/cl/11085043
2013-07-16 09:41:38 -04:00
Russ Cox aad4720b51 cmd/6l, cmd/8l: use one-byte XCHG forms when possible
Pointed out by khr.

R=ken2
CC=golang-dev
https://golang.org/cl/11145044
2013-07-12 20:58:38 -04:00
Adam Langley 6bea504b94 cmd/6a, cmd/6l: add PCLMULQDQ instruction.
This Intel instruction implements multiplication in binary fields.

R=golang-dev, minux.ma, dave, rsc
CC=golang-dev
https://golang.org/cl/10428043
2013-06-21 15:17:13 -04:00
Russ Cox 26d43a0f22 cmd/6l: accept NOP of $x+10(SP) and of X0
Needed to link code compiled with 6c -N.

R=ken2
CC=golang-dev
https://golang.org/cl/10043044
2013-06-05 10:38:52 -04:00
Carl Shapiro 4e0a51c210 cmd/5l, cmd/6l, cmd/8l, cmd/gc, runtime: generate and use bitmaps of argument pointer locations
With this change the compiler emits a bitmap for each function
covering its stack frame arguments area.  If an argument word
is known to contain a pointer, a bit is set.  The garbage
collector reads this information when scanning the stack by
frames and uses it to ignores locations known to not contain a
pointer.

R=golang-dev, bradfitz, daniel.morsing, dvyukov, khr, khr, iant, cshapiro
CC=golang-dev
https://golang.org/cl/9223046
2013-05-28 17:59:10 -07:00
Russ Cox b505ff6279 crypto/rc4: faster amd64 implementation
XOR key into data 128 bits at a time instead of 64 bits
and pipeline half of state loads. Rotate loop to allow
single-register indexing for state[i].

On a MacBookPro10,2 (Core i5):

benchmark           old ns/op    new ns/op    delta
BenchmarkRC4_128          412          224  -45.63%
BenchmarkRC4_1K          3179         1613  -49.26%
BenchmarkRC4_8K         25223        12545  -50.26%

benchmark            old MB/s     new MB/s  speedup
BenchmarkRC4_128       310.51       570.42    1.84x
BenchmarkRC4_1K        322.09       634.48    1.97x
BenchmarkRC4_8K        320.97       645.32    2.01x

For comparison, on the same machine, openssl 0.9.8r reports
its rc4 speed as somewhat under 350 MB/s for both 1K and 8K
(it is operating 64 bits at a time).

On an Intel Xeon E5520:

benchmark           old ns/op    new ns/op    delta
BenchmarkRC4_128          418          259  -38.04%
BenchmarkRC4_1K          3200         1884  -41.12%
BenchmarkRC4_8K         25173        14529  -42.28%

benchmark            old MB/s     new MB/s  speedup
BenchmarkRC4_128       306.04       492.48    1.61x
BenchmarkRC4_1K        319.93       543.26    1.70x
BenchmarkRC4_8K        321.61       557.20    1.73x

For comparison, on the same machine, openssl 1.0.1
reports its rc4 speed as 587 MB/s for 1K and 601 MB/s for 8K.

R=agl
CC=golang-dev
https://golang.org/cl/7865046
2013-03-21 16:38:57 -04:00
Keith Randall 297bb12809 cmd/6a, cmd/8a, cmd/6l, cmd/8l: add AES instructions
Instructions for use in AES hashing.  See CL#7543043

R=rsc
CC=golang-dev
https://golang.org/cl/7548043
2013-03-07 12:54:00 -08:00
Russ Cox 1d5dc4fd48 cmd/gc: emit explicit type information for local variables
The type information is (and for years has been) included
as an extra field in the address chunk of an instruction.
Unfortunately, suppose there is a string at a+24(FP) and
we have an instruction reading its length. It will say:

        MOVQ x+32(FP), AX

and the type of *that* argument is int (not slice), because
it is the length being read. This confuses the picture seen
by debuggers and now, worse, by the garbage collector.

Instead of attaching the type information to all uses,
emit an explicit list of TYPE instructions with the information.
The TYPE instructions are no-ops whose only role is to
provide an address to attach type information to.

For example, this function:

        func f(x, y, z int) (a, b string) {
                return
        }

now compiles into:

        --- prog list "f" ---
        0000 (/Users/rsc/x.go:3) TEXT    f+0(SB),$0-56
        0001 (/Users/rsc/x.go:3) LOCALS  ,
        0002 (/Users/rsc/x.go:3) TYPE    x+0(FP){int},$8
        0003 (/Users/rsc/x.go:3) TYPE    y+8(FP){int},$8
        0004 (/Users/rsc/x.go:3) TYPE    z+16(FP){int},$8
        0005 (/Users/rsc/x.go:3) TYPE    a+24(FP){string},$16
        0006 (/Users/rsc/x.go:3) TYPE    b+40(FP){string},$16
        0007 (/Users/rsc/x.go:3) MOVQ    $0,b+40(FP)
        0008 (/Users/rsc/x.go:3) MOVQ    $0,b+48(FP)
        0009 (/Users/rsc/x.go:3) MOVQ    $0,a+24(FP)
        0010 (/Users/rsc/x.go:3) MOVQ    $0,a+32(FP)
        0011 (/Users/rsc/x.go:4) RET     ,

The { } show the formerly hidden type information.
The { } syntax is used when printing from within the gc compiler.
It is not accepted by the assemblers.

The same type information is now included on global variables:

0055 (/Users/rsc/x.go:15) GLOBL   slice+0(SB){[]string},$24(AL*0)

This more accurate type information fixes a bug in the
garbage collector's precise heap collection.

The linker only cares about globals right now, but having the
local information should make things a little nicer for Carl
in the future.

Fixes #4907.

R=ken2
CC=golang-dev
https://golang.org/cl/7395056
2013-02-25 12:13:47 -05:00
Russ Cox d57fcbf05c cmd/5l, cmd/6l, cmd/8l: accept CALL reg, reg
The new src argument is ignored during linking
(that is, CALL r1, r2 is identical to CALL r2 for linking),
but it serves as a hint to the 5g/6g/8g optimizer
that the src register is live on entry to the called
function and must be preserved.

It is possible to avoid exposing this fact to the rest of
the toolchain, keeping it entirely within 5g/6g/8g,
but I think it will help to be able to look in object files
and assembly listings and linker -a / -W output to
see CALL instructions are "Go func value" calls and
which are "C function pointer" calls.

R=ken2
CC=golang-dev
https://golang.org/cl/7364045
2013-02-22 14:23:21 -05:00
Carl Shapiro f466617a62 cmd/5g, cmd/5l, cmd/6l, cmd/8l, cmd/gc, cmd/ld, runtime: accurate args and locals information
Previously, the func structure contained an inaccurate value for
the args member and a 0 value for the locals member.

This change populates the func structure with args and locals
values computed by the compiler.  The number of args was
already available in the ATEXT instruction.  The number of
locals is now passed through in the new ALOCALS instruction.

This change also switches the unit of args and locals to be
bytes, just like the frame member, instead of 32-bit words.

R=golang-dev, bradfitz, cshapiro, dave, rsc
CC=golang-dev
https://golang.org/cl/7399045
2013-02-21 12:52:26 -08:00
Russ Cox 3d40062c68 cmd/gc, cmd/ld: struct field tracking
This is an experiment in static analysis of Go programs
to understand which struct fields a program might use.
It is not part of the Go language specification, it must
be enabled explicitly when building the toolchain,
and it may be removed at any time.

After building the toolchain with GOEXPERIMENT=fieldtrack,
a specific field can be marked for tracking by including
`go:"track"` in the field tag:

        package pkg

        type T struct {
                F int `go:"track"`
                G int // untracked
        }

To simplify usage, only named struct types can have
tracked fields, and only exported fields can be tracked.

The implementation works by making each function begin
with a sequence of no-op USEFIELD instructions declaring
which tracked fields are accessed by a specific function.
After the linker's dead code elimination removes unused
functions, the fields referred to by the remaining
USEFIELD instructions are the ones reported as used by
the binary.

The -k option to the linker specifies the fully qualified
symbol name (such as my/pkg.list) of a string variable that
should be initialized with the field tracking information
for the program. The field tracking string is a sequence
of lines, each terminated by a \n and describing a single
tracked field referred to by the program. Each line is made
up of one or more tab-separated fields. The first field is
the name of the tracked field, fully qualified, as in
"my/pkg.T.F". Subsequent fields give a shortest path of
reverse references from that field to a global variable or
function, corresponding to one way in which the program
might reach that field.

A common source of false positives in field tracking is
types with large method sets, because a reference to the
type descriptor carries with it references to all methods.
To address this problem, the CL also introduces a comment
annotation

        //go:nointerface

that marks an upcoming method declaration as unavailable
for use in satisfying interfaces, both statically and
dynamically. Such a method is also invisible to package
reflect.

Again, all of this is disabled by default. It only turns on
if you have GOEXPERIMENT=fieldtrack set during make.bash.

R=iant, ken
CC=golang-dev
https://golang.org/cl/6749064
2012-11-02 00:17:21 -04:00
Shenghou Ma e039c405c8 cmd/6a, cmd/6l: add support for AES-NI instrutions and PSHUFD
This CL adds support for the these 7 new instructions to 6a/6l in
preparation of the upcoming CL for AES-NI accelerated crypto/aes:
AESENC, AESENCLAST, AESDEC, AESDECLAST, AESIMC, AESKEYGENASSIST,
and PSHUFD.

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/5970055
2012-09-27 01:53:08 +08:00
Adam Langley 2c5b53866c undo CL 6498092 / 4ff71bc1a199
Broke tests on 386.

««« original CL description
6l/8l: emit correct opcodes to F(SUB|DIV)R?D.

When the destination was not F0, 6l and 8l swapped FSUBD/FSUBRD and
FDIVD/FDIVRD.

R=golang-dev, dave, rsc
CC=golang-dev
https://golang.org/cl/6498092
»»»

R=golang-dev
CC=golang-dev
https://golang.org/cl/6492100
2012-09-10 15:52:36 -04:00
Adam Langley 72fa142fc5 6l/8l: emit correct opcodes to F(SUB|DIV)R?D.
When the destination was not F0, 6l and 8l swapped FSUBD/FSUBRD and
FDIVD/FDIVRD.

R=golang-dev, dave, rsc
CC=golang-dev
https://golang.org/cl/6498092
2012-09-10 15:35:39 -04:00
Russ Cox f2bd3a977d cmd/6l, cmd/8l, cmd/5l: add AUNDEF instruction
On 6l and 8l, this is a real instruction, guaranteed to
cause an 'undefined instruction' exception.

On 5l, we simulate it as BL to address 0.

The plan is to use it as a signal to the linker that this
point in the instruction stream cannot be reached
(hence the changes to nofollow).  This will help the
compiler explain that panicindex and friends do not
return without having to put a list of these functions
in the linker.

R=ken2
CC=golang-dev
https://golang.org/cl/6255064
2012-05-30 16:47:56 -04:00
Russ Cox fefae6eed1 cmd/6g, cmd/8g: move panicindex calls out of line
The old code generated for a bounds check was
                CMP
                JLT ok
                CALL panicindex
        ok:
                ...

The new code is (once the linker finishes with it):
                CMP
                JGE panic
                ...
        panic:
                CALL panicindex

which moves the calls out of line, putting more useful
code in each cache line.  This matters especially in tight
loops, such as in Fannkuch.  The benefit is more modest
elsewhere, but real.

From test/bench/go1, amd64:

benchmark                old ns/op    new ns/op    delta
BenchmarkBinaryTree17   6096092000   6088808000   -0.12%
BenchmarkFannkuch11     6151404000   4020463000  -34.64%
BenchmarkGobDecode        28990050     28894630   -0.33%
BenchmarkGobEncode        12406310     12136730   -2.17%
BenchmarkGzip               179923       179903   -0.01%
BenchmarkGunzip              11219        11130   -0.79%
BenchmarkJSONEncode       86429350     86515900   +0.10%
BenchmarkJSONDecode      334593800    315728400   -5.64%
BenchmarkRevcomp25M     1219763000   1180767000   -3.20%
BenchmarkTemplate        492947600    483646800   -1.89%

And 386:

benchmark                old ns/op    new ns/op    delta
BenchmarkBinaryTree17   6354902000   6243000000   -1.76%
BenchmarkFannkuch11     8043769000   7326965000   -8.91%
BenchmarkGobDecode        19010800     18941230   -0.37%
BenchmarkGobEncode        14077500     13792460   -2.02%
BenchmarkGzip               194087       193619   -0.24%
BenchmarkGunzip              12495        12457   -0.30%
BenchmarkJSONEncode      125636400    125451400   -0.15%
BenchmarkJSONDecode      696648600    685032800   -1.67%
BenchmarkRevcomp25M     2058088000   2052545000   -0.27%
BenchmarkTemplate        602140000    589876800   -2.04%

To implement this, two new instruction forms:

        JLT target      // same as always
        JLT $0, target  // branch expected not taken
        JLT $1, target  // branch expected taken

The linker could also emit the prediction prefixes, but it
does not: expected taken branches are reversed so that the
expected case is not taken (as in example above), and
the default expectaton for such a jump is not taken
already.

R=golang-dev, gri, r, dave
CC=golang-dev
https://golang.org/cl/6248049
2012-05-29 12:09:27 -04:00
Russ Cox ed480128a6 cmd/6a, cmd/6l: add BSWAPL, BSWAPQ
R=ken2
CC=golang-dev
https://golang.org/cl/6209087
2012-05-22 00:12:58 -04:00
Russ Cox e530d6a1e0 6c, 6g, 6l: add MOVQL to make truncation explicit
Without an explicit signal for a truncation, copy propagation
will sometimes propagate a 32-bit truncation and end up
overwriting uses of the original 64-bit value.

The case that arose in practice is in C but I believe
that the same could plausibly happen in Go.
The main reason we didn't run into the same in Go
is that I (perhaps incorrectly?) drop MOVL AX, AX
during gins, so the truncation was never generated, so
it didn't confuse the optimizer.

Fixes #1315.
Fixes #3488.

R=ken2
CC=golang-dev
https://golang.org/cl/6002043
2012-04-10 12:51:59 -04:00
Russ Cox 35d260fa4c 6a, 6l: add PREFETCH instructions
R=ken2
CC=golang-dev
https://golang.org/cl/5989073
2012-04-10 10:09:09 -04:00
Adam Langley 36d3707009 6a/6l: add IMUL3Q and SHLDL
Although Intel considers the three-argument form of IMUL to be a
variant of IMUL, I couldn't make 6l able to differentiate it without
huge changes, so I called it IMUL3.

R=rsc
CC=golang-dev
https://golang.org/cl/5686055
2012-02-23 10:51:04 -05:00
Michał Derkacz 17105870ff 6l: add MOVQ xmm_reg, xmm_reg
Added handler for:
        MOVQ xmm_reg, xmm_reg/mem64
        MOVQ xmm_reg/mem64, xmm_reg
using native MOVQ (it take precedence above REX.W MOVD)
I don't understood 6l code enough to be sure that my small changes
didn't broke it. But now 6l works with MOVQ xmm_reg, xmm_reg and
all.bash reports "0 unexpected bugs".

There is test assembly source:
MOVQ    X0, X1
MOVQ    AX, X1
MOVQ    X1, AX
MOVQ    xxx+8(FP), X2
MOVQ    X2, xxx+8(FP)

and generated code (gdb disassemble /r):

0x000000000040f112 <+0>:   f3 0f 7e c8        movq  %xmm0,%xmm1
0x000000000040f116 <+4>:   66 48 0f 6e c8     movq  %rax,%xmm1
0x000000000040f11b <+9>:   66 48 0f 7e c8     movq  %xmm1,%rax
0x000000000040f120 <+14>:  f3 0f 7e 54 24 10  movq  0x10(%rsp),%xmm2
0x000000000040f126 <+20>:  66 0f d6 54 24 10  movq  %xmm2,0x10(%rsp)

Fixes #2418.

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/5316076
2011-11-09 16:01:17 -05:00
Michał Derkacz c8a2be8c38 6l: Fixes opcode for PSLLQ imm8, xmm_reg
R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/5340056
2011-11-09 16:00:24 -05:00
Jaroslavas Počepko a88994f804 6l, 8l: remove JCXZ; add JCXZW, JCXZL, and JCXZQ
R=golang-dev
CC=golang-dev, rsc
https://golang.org/cl/4950050
2011-08-26 17:45:19 -04:00
Dmitriy Vyukov 4e5086b993 runtime: improve Linux mutex
The implementation is hybrid active/passive spin/blocking mutex.
The design minimizes amount of context switches and futex calls.
The idea is that all critical sections in runtime are intentially
small, so pure blocking mutex behaves badly causing
a lot of context switches, thread parking/unparking and kernel calls.
Note that some synthetic benchmarks become somewhat slower,
that's due to increased contention on other data structures,
it should not affect programs that do any real work.

On 2 x Intel E5620, 8 HT cores, 2.4GHz
benchmark                     old ns/op    new ns/op    delta
BenchmarkSelectContended         521.00       503.00   -3.45%
BenchmarkSelectContended-2       661.00       320.00  -51.59%
BenchmarkSelectContended-4      1139.00       629.00  -44.78%
BenchmarkSelectContended-8      2870.00       878.00  -69.41%
BenchmarkSelectContended-16     5276.00       818.00  -84.50%
BenchmarkChanContended           112.00       103.00   -8.04%
BenchmarkChanContended-2         631.00       174.00  -72.42%
BenchmarkChanContended-4         682.00       272.00  -60.12%
BenchmarkChanContended-8        1601.00       520.00  -67.52%
BenchmarkChanContended-16       3100.00       372.00  -88.00%
BenchmarkChanSync                253.00       239.00   -5.53%
BenchmarkChanSync-2             5030.00      4648.00   -7.59%
BenchmarkChanSync-4             4826.00      4694.00   -2.74%
BenchmarkChanSync-8             4778.00      4713.00   -1.36%
BenchmarkChanSync-16            5289.00      4710.00  -10.95%
BenchmarkChanProdCons0           273.00       254.00   -6.96%
BenchmarkChanProdCons0-2         599.00       400.00  -33.22%
BenchmarkChanProdCons0-4        1168.00       659.00  -43.58%
BenchmarkChanProdCons0-8        2831.00      1057.00  -62.66%
BenchmarkChanProdCons0-16       4197.00      1037.00  -75.29%
BenchmarkChanProdCons10          150.00       140.00   -6.67%
BenchmarkChanProdCons10-2        607.00       268.00  -55.85%
BenchmarkChanProdCons10-4       1137.00       404.00  -64.47%
BenchmarkChanProdCons10-8       2115.00       828.00  -60.85%
BenchmarkChanProdCons10-16      4283.00       855.00  -80.04%
BenchmarkChanProdCons100         117.00       110.00   -5.98%
BenchmarkChanProdCons100-2       558.00       218.00  -60.93%
BenchmarkChanProdCons100-4       722.00       287.00  -60.25%
BenchmarkChanProdCons100-8      1840.00       431.00  -76.58%
BenchmarkChanProdCons100-16     3394.00       448.00  -86.80%
BenchmarkChanProdConsWork0      2014.00      1996.00   -0.89%
BenchmarkChanProdConsWork0-2    1207.00      1127.00   -6.63%
BenchmarkChanProdConsWork0-4    1913.00       611.00  -68.06%
BenchmarkChanProdConsWork0-8    3016.00       949.00  -68.53%
BenchmarkChanProdConsWork0-16   4320.00      1154.00  -73.29%
BenchmarkChanProdConsWork10     1906.00      1897.00   -0.47%
BenchmarkChanProdConsWork10-2   1123.00      1033.00   -8.01%
BenchmarkChanProdConsWork10-4   1076.00       571.00  -46.93%
BenchmarkChanProdConsWork10-8   2748.00      1096.00  -60.12%
BenchmarkChanProdConsWork10-16  4600.00      1105.00  -75.98%
BenchmarkChanProdConsWork100    1884.00      1852.00   -1.70%
BenchmarkChanProdConsWork100-2  1235.00      1146.00   -7.21%
BenchmarkChanProdConsWork100-4  1217.00       619.00  -49.14%
BenchmarkChanProdConsWork100-8  1534.00       509.00  -66.82%
BenchmarkChanProdConsWork100-16 4126.00       918.00  -77.75%
BenchmarkSyscall                  34.40        33.30   -3.20%
BenchmarkSyscall-2               160.00       121.00  -24.38%
BenchmarkSyscall-4               131.00       136.00   +3.82%
BenchmarkSyscall-8               139.00       131.00   -5.76%
BenchmarkSyscall-16              161.00       168.00   +4.35%
BenchmarkSyscallWork             950.00       950.00   +0.00%
BenchmarkSyscallWork-2           481.00       480.00   -0.21%
BenchmarkSyscallWork-4           268.00       270.00   +0.75%
BenchmarkSyscallWork-8           156.00       169.00   +8.33%
BenchmarkSyscallWork-16          188.00       184.00   -2.13%
BenchmarkSemaSyntNonblock         36.40        35.60   -2.20%
BenchmarkSemaSyntNonblock-2       81.40        45.10  -44.59%
BenchmarkSemaSyntNonblock-4      126.00       108.00  -14.29%
BenchmarkSemaSyntNonblock-8      112.00       112.00   +0.00%
BenchmarkSemaSyntNonblock-16     110.00       112.00   +1.82%
BenchmarkSemaSyntBlock            35.30        35.30   +0.00%
BenchmarkSemaSyntBlock-2         118.00       124.00   +5.08%
BenchmarkSemaSyntBlock-4         105.00       108.00   +2.86%
BenchmarkSemaSyntBlock-8         101.00       111.00   +9.90%
BenchmarkSemaSyntBlock-16        112.00       118.00   +5.36%
BenchmarkSemaWorkNonblock        810.00       811.00   +0.12%
BenchmarkSemaWorkNonblock-2      476.00       414.00  -13.03%
BenchmarkSemaWorkNonblock-4      238.00       228.00   -4.20%
BenchmarkSemaWorkNonblock-8      140.00       126.00  -10.00%
BenchmarkSemaWorkNonblock-16     117.00       116.00   -0.85%
BenchmarkSemaWorkBlock           810.00       811.00   +0.12%
BenchmarkSemaWorkBlock-2         454.00       466.00   +2.64%
BenchmarkSemaWorkBlock-4         243.00       241.00   -0.82%
BenchmarkSemaWorkBlock-8         145.00       137.00   -5.52%
BenchmarkSemaWorkBlock-16        132.00       123.00   -6.82%
BenchmarkContendedSemaphore      123.00       102.00  -17.07%
BenchmarkContendedSemaphore-2     34.80        34.90   +0.29%
BenchmarkContendedSemaphore-4     34.70        34.80   +0.29%
BenchmarkContendedSemaphore-8     34.70        34.70   +0.00%
BenchmarkContendedSemaphore-16    34.80        34.70   -0.29%
BenchmarkMutex                    26.80        26.00   -2.99%
BenchmarkMutex-2                 108.00        45.20  -58.15%
BenchmarkMutex-4                 103.00       127.00  +23.30%
BenchmarkMutex-8                 109.00       147.00  +34.86%
BenchmarkMutex-16                102.00       152.00  +49.02%
BenchmarkMutexSlack               27.00        26.90   -0.37%
BenchmarkMutexSlack-2            149.00       165.00  +10.74%
BenchmarkMutexSlack-4            121.00       209.00  +72.73%
BenchmarkMutexSlack-8            101.00       158.00  +56.44%
BenchmarkMutexSlack-16            97.00       129.00  +32.99%
BenchmarkMutexWork               792.00       794.00   +0.25%
BenchmarkMutexWork-2             407.00       409.00   +0.49%
BenchmarkMutexWork-4             220.00       209.00   -5.00%
BenchmarkMutexWork-8             267.00       160.00  -40.07%
BenchmarkMutexWork-16            315.00       300.00   -4.76%
BenchmarkMutexWorkSlack          792.00       793.00   +0.13%
BenchmarkMutexWorkSlack-2        406.00       404.00   -0.49%
BenchmarkMutexWorkSlack-4        225.00       212.00   -5.78%
BenchmarkMutexWorkSlack-8        268.00       136.00  -49.25%
BenchmarkMutexWorkSlack-16       300.00       300.00   +0.00%
BenchmarkRWMutexWrite100          27.10        27.00   -0.37%
BenchmarkRWMutexWrite100-2        33.10        40.80  +23.26%
BenchmarkRWMutexWrite100-4       113.00        88.10  -22.04%
BenchmarkRWMutexWrite100-8       119.00        95.30  -19.92%
BenchmarkRWMutexWrite100-16      148.00       109.00  -26.35%
BenchmarkRWMutexWrite10           29.60        29.40   -0.68%
BenchmarkRWMutexWrite10-2        111.00        61.40  -44.68%
BenchmarkRWMutexWrite10-4        270.00       208.00  -22.96%
BenchmarkRWMutexWrite10-8        204.00       185.00   -9.31%
BenchmarkRWMutexWrite10-16       261.00       190.00  -27.20%
BenchmarkRWMutexWorkWrite100    1040.00      1036.00   -0.38%
BenchmarkRWMutexWorkWrite100-2   593.00       580.00   -2.19%
BenchmarkRWMutexWorkWrite100-4   470.00       365.00  -22.34%
BenchmarkRWMutexWorkWrite100-8   468.00       289.00  -38.25%
BenchmarkRWMutexWorkWrite100-16  604.00       374.00  -38.08%
BenchmarkRWMutexWorkWrite10      951.00       951.00   +0.00%
BenchmarkRWMutexWorkWrite10-2   1001.00       928.00   -7.29%
BenchmarkRWMutexWorkWrite10-4   1555.00      1006.00  -35.31%
BenchmarkRWMutexWorkWrite10-8   2085.00      1171.00  -43.84%
BenchmarkRWMutexWorkWrite10-16  2082.00      1614.00  -22.48%

R=rsc, iant, msolo, fw, iant
CC=golang-dev
https://golang.org/cl/4711045
2011-07-29 12:44:06 -04:00
Adam Langley 9f4c288c16 hash/crc32: add SSE4.2 support
Using the CRC32 instruction speeds up the Castagnoli computation by
about 20x on a modern Intel CPU.

R=rsc
CC=golang-dev
https://golang.org/cl/4650072
2011-07-12 09:29:24 -04:00
Evan Shaw 4d429c7fe5 6l: More SSE instruction fixes
PSADBW and PSHUFL had the wrong prefixes.

R=rsc
CC=golang-dev
https://golang.org/cl/2836041
2010-11-05 13:59:53 -04:00
Evan Shaw 884dceca1f 6a/6l: fix MOVOU encoding
The andproto field was set incorrectly, causing 6a to encode illegal
instructions.

R=rsc
CC=golang-dev
https://golang.org/cl/2781042
2010-11-01 16:14:43 -04:00
Russ Cox 285312a05c 6l: drop confusing comment
R=ken2
CC=golang-dev
https://golang.org/cl/1693047
2010-07-01 12:51:00 -07:00
Russ Cox bf10982739 6l: implement MOVLQZX as "mov", not "movsxd"
(Here, quoted strings are the official AMD names.)

The amd64 "movsxd" instruction, when invoked
with a 64-bit REX prefix, moves and sign extends
a 32-bit value from register or memory into a
64-bit register.  6.out.h spells this MOVLQSX.

6.out.h also includes MOVLQZX, the zero extending
version, which it implements as "movsxd" without
the REX prefix.  Without the REX prefix it's only sign
extending 32 bits to 32 bits (i.e., not doing anything
to the bits) and then storing in a 32-bit register.
Any write to a 32-bit register zeros the top half of the
corresponding 64-bit register, giving the advertised effect.
This particular implementation of the functionality is
non-standard, because an ordinary 32-bit "mov" would
do the same thing.

Because it is non-standard, it is often mishandled or
not handled by binary translation tools like valgrind.
Switching to the standard "mov" makes the binaries
work better with those tools.

It's probably useful in 6c and 6g to have an explicit
instruction, though, so that the intent of the size
change is clear.  Thus we leave the concept of MOVLQZX
and just implement it by the standard "mov" instead of
the non-standard 32-bit "movsxd".

Fixes #896.

R=ken2
CC=golang-dev
https://golang.org/cl/1733046
2010-07-01 12:18:35 -07:00
Ken Thompson 3f982aeaf6 morestack magic number
automatically generated in 6g and 6c,
manually set in 6a. format is
	TEXT	a(SB),, $a-b
where a is auto size and b is parameter size

SVN=126946
2008-07-12 17:16:22 -07:00
Ken Thompson ddba96aed8 stack offset
SVN=123521
2008-06-18 22:07:09 -07:00
Ken Thompson f997bc6eb6 stack offseet table marker
tacked above each TEXT entry

SVN=123496
2008-06-18 17:51:56 -07:00
Rob Pike 0cafb9ea3d Add compiler source to new directory structure
SVN=121164
2008-06-04 14:37:38 -07:00