335 Commits

Author SHA1 Message Date
Russ Cox 6034406eae build: more "undefined behavior" fixes
Fixes #5764.

R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/13441051
2013-09-10 14:54:55 -04:00
Russ Cox 80a153dd51 cmd/6l, cmd/8l: fix MOVL MOVQ optab
The entry for LEAL/LEAQ in these optabs was listed as having
two data bytes in the y array. In fact they had and expect no data
bytes. However, the general loop expects to be able to look at at
least one data byte, to make sure it is not 0x0f. So give them each
a single data byte set to 0 (not 0x0f).

Since the MOV instructions have the largest optab cases, this
requires growing the size of the data array.

Clang found this bug because the general o->op[z] == 0x0f
test was using z == 22, which was out of bounds.

In practice the next byte in memory was probably not 0x0f
so it wasn't truly broken. But might as well be clean.

Update #5764

R=ken2
CC=golang-dev
https://golang.org/cl/13241050
2013-09-10 14:53:41 -04:00
Aulus Egnatius Varialus 2b44b36487 cgo: enable cgo on dragonfly
Enable cgo for dragonfly/386 and dragonfly/amd64.

R=golang-dev, jsing, iant, bradfitz
CC=golang-dev
https://golang.org/cl/13247046
2013-09-04 15:19:21 -07:00
Joel Sing d0206101c8 cmd/5l,cmd/6l,cmd/8l: fix dragonflydynld path
R=golang-dev, bradfitz, dave
CC=golang-dev
https://golang.org/cl/13225043
2013-08-31 22:02:21 +10:00
Dmitriy Vyukov 79dca0327e libbio, all cmd: consistently use BGETC/BPUTC instead of Bgetc/Bputc
Also introduce BGET2/4, BPUT2/4 as they are widely used.
Slightly improve BGETC/BPUTC implementation.
This gives ~5% CPU time improvement on go install -a -p1 std.
Before:
real		user		sys
0m23.561s	0m16.625s	0m5.848s
0m23.766s	0m16.624s	0m5.846s
0m23.742s	0m16.621s	0m5.868s
After:
0m22.999s	0m15.841s	0m5.889s
0m22.845s	0m15.808s	0m5.850s
0m22.889s	0m15.832s	0m5.848s

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/12745047
2013-08-30 15:46:12 +04:00
Joel Sing 71dc91db0f all: compiler/bootstrap for dragonfly/amd64
Add dragonfly/amd64 support to the Go compiler, bootstrap and GOOS list.

R=devon.odell, bradfitz
CC=golang-dev
https://golang.org/cl/12796050
2013-08-24 01:18:04 +10:00
Russ Cox 999a36f9af cmd/gc: &x panics if x does
See golang.org/s/go12nil.

This CL is about getting all the right checks inserted.
A followup CL will add an optimization pass to
remove redundant checks.
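
A minimal sketch of the user-visible behavior, assuming the rule that evaluating &x panics whenever evaluating x would. The type T, the variable sink, and the helper addrPanics are made up for this illustration and are not part of the CL.

```go
package main

import "fmt"

// T is a hypothetical struct used only for this illustration.
type T struct {
	a, f int
}

// sink keeps the computed address alive so the expression is really evaluated.
var sink *int

// addrPanics reports whether taking the address of p.f panics.
func addrPanics(p *T) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	sink = &p.f // p.f dereferences p, so &p.f needs a nil check on p
	return false
}

func main() {
	fmt.Println(addrPanics(nil))    // true: nil pointer, the address-of panics
	fmt.Println(addrPanics(new(T))) // false: valid pointer, no panic
}
```

The helper recovers the panic only to make the behavior observable; ordinary code would let the nil dereference crash.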

R=ken2
CC=golang-dev
https://golang.org/cl/12970043
2013-08-15 14:38:32 -04:00
Elias Naur 45233734e2 runtime, cmd/ld: Add ARM external linking and implement -shared in terms of external linking
This CL is an aggregate of 10271047, 10499043, 9733044. Descriptions of each follow:

10499043
runtime,cmd/ld: Merge TLS symbols and teach 5l about ARM TLS

This CL prepares for external linking support to ARM.

The pseudo-symbols runtime.g and runtime.m are merged into a single
runtime.tlsgm symbol. When external linking, the offset of a thread local
variable is stored at a memory location instead of being embedded into an offset
of an ldr instruction. With a single runtime.tlsgm symbol for both g and m, only
one such offset is needed.

The larger part of this CL moves TLS code from gcc compiled to internally
compiled. The TLS code now uses the modern MRC instruction, and 5l is taught
about TLS fallbacks in case the instruction is not available or appropriate.

10271047
This CL adds support for -linkmode external to 5l.

For 5l itself, use addrel to allow for D_CALL relocations to be handled by the
host linker. Of the cases listed in rsc's comment in issue 4069, only cases 5 and
63 needed an update. One of the TODO: addrel cases has since been replaced, and the
rest of the cases are either covered by indirection through addpool (cases with
LTO or LFROM flags) or stubs (case 74). The addpool cases are covered because
addpool emits AWORD instructions, which in turn are handled by case 11.

In the runtime, change the argv argument in the rt0* functions slightly to be a
pointer to the argv list, instead of relying on a particular location of argv.

9733044
The -shared flag to 6l outputs a shared library, implemented in Go
and callable from non-Go programs such as C.

The main part of this CL changes the thread local storage model.
Go uses the fastest and least general mode, local exec. TLS data in shared
libraries normally requires at least the local dynamic mode, however, this CL
instead opts for using the initial exec mode. Initial exec mode is faster than
local dynamic mode and can be used on Linux since the linker has reserved a
limited amount of TLS space for performance sensitive TLS code.

Initial exec mode requires an extra load from the GOT table to determine the
TLS offset. This penalty will not be paid if ld is not in -shared mode, since
TLS accesses will be reduced to local exec.

The elf sections .init_array and .rela.init_array are added to register the Go
runtime entry with cgo at library load time.

The "hidden" attribute is added to Cgo functions called from Go, since Go
does not generate calls through the GOT table, and adding non-GOT relocations for
a global function is not supported by gcc. Cgo symbols don't need to be global
and avoiding the GOT table is also faster.

The changes to 8l only remove code relevant to the old -shared mode where
internal linking was used.

This CL only addresses the low-level linker work. It can be submitted by itself,
but to be useful, the runtime changes in CL 9738047 are also needed.

Design discussion at
https://groups.google.com/forum/?fromgroups#!topic/golang-nuts/zmjXkGrEx6Q

Fixes #5590.

R=rsc
CC=golang-dev
https://golang.org/cl/12871044
2013-08-14 15:38:54 +00:00
Russ Cox fa72679f07 cmd/gc: add temporary-merging optimization pass
The compilers assume they can generate temporary variables
as needed to preserve the right semantics or simplify code
generation and the back end will still generate good code.
This turns out not to be true. The back ends will only
track the first 128 variables per function and give up
on the remainder. That needs to be fixed too, in a later CL.

This CL merges temporary variables with equal types and
non-overlapping lifetimes using the greedy algorithm in
Poletto and Sarkar, "Linear Scan Register Allocation",
ACM TOPLAS 1999.
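
The merging step can be sketched roughly as follows. This is a simplified greedy pass in the spirit of linear scan, not the code in this CL: the Temp type, the integer live ranges, and the mergeTemps helper are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Temp is a hypothetical compiler temporary with a type name and a
// live range [Start, End] measured in instruction indexes.
type Temp struct {
	Name       string
	Type       string
	Start, End int
}

// mergeTemps assigns each temporary a slot number so that temporaries
// sharing a slot have the same type and non-overlapping lifetimes.
// It returns the assignment and the number of slots used.
func mergeTemps(temps []Temp) (map[string]int, int) {
	sorted := append([]Temp(nil), temps...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Start < sorted[j].Start
	})

	type slot struct {
		typ string
		end int // end of the lifetime currently occupying the slot
	}
	var slots []slot
	assign := make(map[string]int)
	for _, t := range sorted {
		found := -1
		for i := range slots {
			// Reuse the first slot of the same type whose occupant is dead.
			if slots[i].typ == t.Type && slots[i].end < t.Start {
				found = i
				break
			}
		}
		if found < 0 {
			slots = append(slots, slot{typ: t.Type})
			found = len(slots) - 1
		}
		slots[found].end = t.End
		assign[t.Name] = found
	}
	return assign, len(slots)
}

func main() {
	temps := []Temp{
		{"t1", "int", 0, 3},
		{"t2", "int", 4, 6},    // starts after t1 dies, so it can share t1's slot
		{"t3", "string", 1, 2}, // different type, needs its own slot
		{"t4", "int", 2, 5},    // overlaps both t1 and t2
	}
	assign, n := mergeTemps(temps)
	fmt.Println(assign, n) // four temporaries end up in three slots
}
```

Fewer distinct slots means a smaller frame and fewer variables for the back end to track, which is where the savings listed below come from.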

The result can be striking in the right functions.

Top 20 frame size changes in a 6g godoc binary by bytes saved:

5464 1984 (-3480, -63.7%) go/build.(*Context).Import
4456 1824 (-2632, -59.1%) go/printer.(*printer).expr1
2560   80 (-2480, -96.9%) time.nextStdChunk
3496 1608 (-1888, -54.0%) go/printer.(*printer).stmt
1896  272 (-1624, -85.7%) net/http.init
2688 1400 (-1288, -47.9%) fmt.(*pp).printReflectValue
2800 1512 (-1288, -46.0%) main.main
3296 2016 (-1280, -38.8%) crypto/tls.(*Conn).clientHandshake
1664  488 (-1176, -70.7%) time.loadZoneZip
1760  608 (-1152, -65.5%) time.parse
4104 3072 (-1032, -25.1%) runtime/pprof.writeHeap
1680  712 ( -968, -57.6%) go/ast.Walk
2488 1560 ( -928, -37.3%) crypto/x509.parseCertificate
1128  392 ( -736, -65.2%) math/big.nat.divLarge
1528  864 ( -664, -43.5%) go/printer.(*printer).fieldList
1360  712 ( -648, -47.6%) regexp/syntax.(*parser).factor
2104 1528 ( -576, -27.4%) encoding/asn1.parseField
1064  504 ( -560, -52.6%) encoding/xml.(*Decoder).text
 584   48 ( -536, -91.8%) html.init
1400  864 ( -536, -38.3%) go/doc.playExample

In the same godoc build, cuts the number of functions with
too many vars from 83 to 32.

R=ken2
CC=golang-dev
https://golang.org/cl/12829043
2013-08-13 00:09:31 -04:00
Russ Cox 4984e6e9fd cmd/6l: fix printing of frame size in TEXT instruction
R=ken2
CC=golang-dev
https://golang.org/cl/12827043
2013-08-12 22:04:24 -04:00
Keith Randall 5a54696d78 cmd/ld: Put the textflag constants in a separate file.
We can then include this file in assembly to replace
cryptic constants like "7" with meaningful constants
like "(NOPROF|DUPOK|NOSPLIT)".
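
As an illustration of how a numeric flag maps to the symbolic form, here is a small sketch. The individual bit values (NOPROF=1, DUPOK=2, NOSPLIT=4) are an assumption consistent with the three flags combining to 7, and flagString is a made-up helper, not part of the toolchain.

```go
package main

import (
	"fmt"
	"strings"
)

// Bit values are assumed for this sketch; the message above only says
// that the three flags OR together to 7.
const (
	NOPROF  = 1 << iota // 1
	DUPOK               // 2
	NOSPLIT             // 4
)

// flagString renders a numeric textflag as a symbolic expression.
func flagString(flag int) string {
	if flag == 0 {
		return "0"
	}
	names := []struct {
		bit  int
		name string
	}{
		{NOPROF, "NOPROF"},
		{DUPOK, "DUPOK"},
		{NOSPLIT, "NOSPLIT"},
	}
	var parts []string
	for _, n := range names {
		if flag&n.bit != 0 {
			parts = append(parts, n.name)
			flag &^= n.bit
		}
	}
	if flag != 0 {
		// Bits this sketch does not know about are kept as a number.
		parts = append(parts, fmt.Sprintf("%#x", flag))
	}
	return strings.Join(parts, "|")
}

func main() {
	fmt.Println(flagString(7)) // NOPROF|DUPOK|NOSPLIT
	fmt.Println(flagString(4)) // NOSPLIT
}
```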

Converting just pkg/runtime/asm*.s for now.  Dropping NOPROF
and DUPOK from lots of places where they aren't needed.
More .s files to come in a subsequent changelist.

A nonzero number in the textflag field now means
"has not been converted yet".

R=golang-dev, daniel.morsing, rsc, khr
CC=golang-dev
https://golang.org/cl/12568043
2013-08-07 10:23:24 -07:00
Russ Cox 48769bf546 runtime: use funcdata to supply garbage collection information
This CL introduces a FUNCDATA number for runtime-specific
garbage collection metadata, changes the C and Go compilers
to emit that metadata, and changes the runtime to expect it.

The old pseudo-instructions that carried this information
are gone, as is the linker code to process them.

R=golang-dev, dvyukov, cshapiro
CC=golang-dev
https://golang.org/cl/11406044
2013-07-19 16:04:09 -04:00
Russ Cox c3de91bb15 cmd/ld, runtime: use new contiguous pcln table
R=golang-dev, r, dave
CC=golang-dev
https://golang.org/cl/11494043
2013-07-18 10:43:22 -04:00
Russ Cox 567818224e cmd/5l, cmd/6l, cmd/8l: accept PCDATA instruction in input
The portable code in cmd/ld already knows how to process it,
we just have to ignore it during code generation.

R=ken2
CC=golang-dev
https://golang.org/cl/11363043
2013-07-16 16:23:11 -04:00
Russ Cox 5d363c6357 cmd/ld, runtime: new in-memory symbol table format
Design at http://golang.org/s/go12symtab.

This enables some cleanup of the garbage collector metadata
that will be done in future CLs.

This CL does not move the old symtab and pclntab back into
an unmapped section of the file. That's a bit tricky and will be
done separately.

Fixes #4020.

R=golang-dev, dave, cshapiro, iant, r
CC=golang-dev, nigeltao
https://golang.org/cl/11085043
2013-07-16 09:41:38 -04:00
Russ Cox aad4720b51 cmd/6l, cmd/8l: use one-byte XCHG forms when possible
Pointed out by khr.

R=ken2
CC=golang-dev
https://golang.org/cl/11145044
2013-07-12 20:58:38 -04:00
Russ Cox 031c107cad cmd/ld: fix large stack split for preempt check
If the stack frame size is larger than the known-unmapped region at the
bottom of the address space, then the stack split prologue cannot use the usual
condition:

        SP - size >= stackguard

because SP - size may wrap around to a very large number.
Instead, if the stack frame is large, the prologue tests:

        SP - stackguard >= size

(This ends up being a few instructions more expensive, so we don't do it always.)

Preemption requests register by setting stackguard to a very large value, so
that the first test (SP - size >= stackguard) cannot possibly succeed.
Unfortunately, that same very large value causes a wraparound in the
second test (SP - stackguard >= size), making it succeed incorrectly.

To avoid *that* wraparound, we have to amend the test:

        stackguard != StackPreempt && SP - stackguard >= size

This test is only used for functions with large frames, which essentially
always split the stack, so the cost of the few instructions is noise.

This CL and CL 11085043 together fix the known issues with preemption
at the beginning of a function, so we will be able to try turning it on again.
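
The three tests above can be modeled with unsigned 64-bit arithmetic. This is only a sketch: the function names are invented, and the value used for stackPreempt is an assumption standing in for the "very large value".

```go
package main

import "fmt"

// stackPreempt stands in for the very large stackguard value that signals
// a preemption request; the exact constant is an assumption.
const stackPreempt = ^uint64(0) - 1313

// usualCheck is the normal prologue test: SP - size >= stackguard.
func usualCheck(sp, size, stackguard uint64) bool {
	return sp-size >= stackguard
}

// unguardedLargeCheck is the large-frame test without the preemption guard.
func unguardedLargeCheck(sp, size, stackguard uint64) bool {
	return sp-stackguard >= size
}

// largeCheck is the amended large-frame test.
func largeCheck(sp, size, stackguard uint64) bool {
	return stackguard != stackPreempt && sp-stackguard >= size
}

func main() {
	// Case 1: the frame is larger than SP itself, so SP - size wraps around.
	sp, guard, size := uint64(0x6000), uint64(0x5000), uint64(0x10000)
	fmt.Println(usualCheck(sp, size, guard)) // true: wrongly claims there is room
	fmt.Println(largeCheck(sp, size, guard)) // false: a split is needed

	// Case 2: a preemption request is pending on a goroutine with a large frame.
	sp, size = 0xc000100000, 0x20000
	fmt.Println(usualCheck(sp, size, stackPreempt))          // false: cannot succeed
	fmt.Println(unguardedLargeCheck(sp, size, stackPreempt)) // true: SP - stackguard wraps
	fmt.Println(largeCheck(sp, size, stackPreempt))          // false: the guard catches it
}
```

A result of true means "enough stack, skip the call to morestack", so each wrongly true result is a missed split or a missed preemption.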

R=ken2
CC=golang-dev
https://golang.org/cl/11205043
2013-07-12 12:12:56 -04:00
Russ Cox d6d83c918c cmd/ld: place read-only data in non-executable segment
R=golang-dev, dave, r
CC=golang-dev, nigeltao
https://golang.org/cl/10713043
2013-07-11 22:52:48 -04:00
Russ Cox 6c99b5c0d3 cmd/5l, cmd/6l, cmd/8l: increase error buffer size
STRINGSZ (200) is fine for lines generated by things like
instruction dumps, but an error containing a couple file
names can easily exceed that, especially on Macs with
the ridiculous default $TMPDIR.

R=ken2
CC=golang-dev
https://golang.org/cl/11199043
2013-07-11 22:49:15 -04:00
Russ Cox 6fa3c89b77 runtime: record proper goroutine state during stack split
Until now, the goroutine state has been scattered during the
execution of newstack and oldstack. It's all there, and those routines
know how to get back to a working goroutine, but other pieces of
the system, like stack traces, do not. If something does interrupt
the newstack or oldstack execution, the rest of the system can't
understand the goroutine. For example, if newstack decides there
is an overflow and calls throw, the stack tracer wouldn't dump the
goroutine correctly.

For newstack to save a useful state snapshot, it needs to be able
to rewind the PC in the function that triggered the split back to
the beginning of the function. (The PC is a few instructions in, just
after the call to morestack.) To make that possible, we change the
prologues to insert a jmp back to the beginning of the function
after the call to morestack. That is, the prologue used to be roughly:

        TEXT myfunc
                check for split
                jmpcond nosplit
                call morestack
        nosplit:
                sub $xxx, sp

Now an extra instruction is inserted after the call:

        TEXT myfunc
        start:
                check for split
                jmpcond nosplit
                call morestack
                jmp start
        nosplit:
                sub $xxx, sp

The jmp is not executed directly. It is decoded and simulated by
runtime.rewindmorestack to discover the beginning of the function,
and then the call to morestack returns directly to the start label
instead of to the jump instruction. So logically the jmp is still
executed, just not by the cpu.
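
A rough sketch of what "decoded and simulated" can look like on x86, where a relative jump is either opcode 0xEB with an 8-bit displacement or 0xE9 with a 32-bit displacement. The helper jmpTarget is hypothetical and is not the runtime's rewindmorestack; it covers only these two encodings.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// jmpTarget decodes an x86 relative JMP located at address pc, where
// code[0] is the byte at pc, and returns the address it would jump to.
// The displacement is relative to the end of the jump instruction.
func jmpTarget(code []byte, pc uint64) (uint64, bool) {
	switch {
	case len(code) >= 2 && code[0] == 0xEB: // JMP rel8
		rel := int8(code[1])
		return pc + 2 + uint64(int64(rel)), true
	case len(code) >= 5 && code[0] == 0xE9: // JMP rel32
		rel := int32(binary.LittleEndian.Uint32(code[1:5]))
		return pc + 5 + uint64(int64(rel)), true
	}
	return 0, false // not a relative jump this sketch understands
}

func main() {
	// Short jump at 0x1010 back to a function that starts at 0x1000.
	short := []byte{0xEB, 0xEE}
	target, ok := jmpTarget(short, 0x1010)
	fmt.Printf("%#x %v\n", target, ok)

	// Near jump at 0x1100 back to the same start address.
	near := []byte{0xE9, 0xFB, 0xFE, 0xFF, 0xFF}
	target, ok = jmpTarget(near, 0x1100)
	fmt.Printf("%#x %v\n", target, ok)
}
```

Both calls resolve to 0x1000, the address a morestack return would be redirected to in this made-up layout.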

The prologue thus repeats in the case of a function that needs a
stack split, but against the cost of the split itself, the extra few
instructions are noise. The repeated prologue has the nice effect of
making a stack split double-check that the new stack is big enough:
if morestack happens to return on a too-small stack, we'll now notice
before corruption happens.

The ability for newstack to rewind to the beginning of the function
should help preemption too. If newstack decides that it was called
for preemption instead of a stack split, it now has the goroutine state
correctly paused if rescheduling is needed, and when the goroutine
can run again, it can return to the start label on its original stack
and re-execute the split check.

Here is an example of a split stack overflow showing the full
trace, without any special cases in the stack printer.
(This one was triggered by making the split check incorrect.)

runtime: newstack framesize=0x0 argsize=0x18 sp=0x6aebd0 stack=[0x6b0000, 0x6b0fa0]
        morebuf={pc:0x69f5b sp:0x6aebd8 lr:0x0}
        sched={pc:0x68880 sp:0x6aebd0 lr:0x0 ctxt:0x34e700}
runtime: split stack overflow: 0x6aebd0 < 0x6b0000
fatal error: runtime: split stack overflow

goroutine 1 [stack split]:
runtime.mallocgc(0x290, 0x100000000, 0x1)
        /Users/rsc/g/go/src/pkg/runtime/zmalloc_darwin_amd64.c:21 fp=0x6aebd8
runtime.new()
        /Users/rsc/g/go/src/pkg/runtime/zmalloc_darwin_amd64.c:682 +0x5b fp=0x6aec08
go/build.(*Context).Import(0x5ae340, 0xc210030c71, 0xa, 0xc2100b4380, 0x1b, ...)
        /Users/rsc/g/go/src/pkg/go/build/build.go:424 +0x3a fp=0x6b00a0
main.loadImport(0xc210030c71, 0xa, 0xc2100b4380, 0x1b, 0xc2100b42c0, ...)
        /Users/rsc/g/go/src/cmd/go/pkg.go:249 +0x371 fp=0x6b01a8
main.(*Package).load(0xc21017c800, 0xc2100b42c0, 0xc2101828c0, 0x0, 0x0, ...)
        /Users/rsc/g/go/src/cmd/go/pkg.go:431 +0x2801 fp=0x6b0c98
main.loadPackage(0x369040, 0x7, 0xc2100b42c0, 0x0)
        /Users/rsc/g/go/src/cmd/go/pkg.go:709 +0x857 fp=0x6b0f80
----- stack segment boundary -----
main.(*builder).action(0xc2100902a0, 0x0, 0x0, 0xc2100e6c00, 0xc2100e5750, ...)
        /Users/rsc/g/go/src/cmd/go/build.go:539 +0x437 fp=0x6b14a0
main.(*builder).action(0xc2100902a0, 0x0, 0x0, 0xc21015b400, 0x2, ...)
        /Users/rsc/g/go/src/cmd/go/build.go:528 +0x1d2 fp=0x6b1658
main.(*builder).test(0xc2100902a0, 0xc210092000, 0x0, 0x0, 0xc21008ff60, ...)
        /Users/rsc/g/go/src/cmd/go/test.go:622 +0x1b53 fp=0x6b1f68
----- stack segment boundary -----
main.runTest(0x5a6b20, 0xc21000a020, 0x2, 0x2)
        /Users/rsc/g/go/src/cmd/go/test.go:366 +0xd09 fp=0x6a5cf0
main.main()
        /Users/rsc/g/go/src/cmd/go/main.go:161 +0x4f9 fp=0x6a5f78
runtime.main()
        /Users/rsc/g/go/src/pkg/runtime/proc.c:183 +0x92 fp=0x6a5fa0
runtime.goexit()
        /Users/rsc/g/go/src/pkg/runtime/proc.c:1266 fp=0x6a5fa8

And here is a seg fault during oldstack:

SIGSEGV: segmentation violation
PC=0x1b2a6

runtime.oldstack()
        /Users/rsc/g/go/src/pkg/runtime/stack.c:159 +0x76
runtime.lessstack()
        /Users/rsc/g/go/src/pkg/runtime/asm_amd64.s:270 +0x22

goroutine 1 [stack unsplit]:
fmt.(*pp).printArg(0x2102e64e0, 0xe5c80, 0x2102c9220, 0x73, 0x0, ...)
        /Users/rsc/g/go/src/pkg/fmt/print.go:818 +0x3d3 fp=0x221031e6f8
fmt.(*pp).doPrintf(0x2102e64e0, 0x12fb20, 0x2, 0x221031eb98, 0x1, ...)
        /Users/rsc/g/go/src/pkg/fmt/print.go:1183 +0x15cb fp=0x221031eaf0
fmt.Sprintf(0x12fb20, 0x2, 0x221031eb98, 0x1, 0x1, ...)
        /Users/rsc/g/go/src/pkg/fmt/print.go:234 +0x67 fp=0x221031eb40
flag.(*stringValue).String(0x2102c9210, 0x1, 0x0)
        /Users/rsc/g/go/src/pkg/flag/flag.go:180 +0xb3 fp=0x221031ebb0
flag.(*FlagSet).Var(0x2102f6000, 0x293d38, 0x2102c9210, 0x143490, 0xa, ...)
        /Users/rsc/g/go/src/pkg/flag/flag.go:633 +0x40 fp=0x221031eca0
flag.(*FlagSet).StringVar(0x2102f6000, 0x2102c9210, 0x143490, 0xa, 0x12fa60, ...)
        /Users/rsc/g/go/src/pkg/flag/flag.go:550 +0x91 fp=0x221031ece8
flag.(*FlagSet).String(0x2102f6000, 0x143490, 0xa, 0x12fa60, 0x0, ...)
        /Users/rsc/g/go/src/pkg/flag/flag.go:563 +0x87 fp=0x221031ed38
flag.String(0x143490, 0xa, 0x12fa60, 0x0, 0x161950, ...)
        /Users/rsc/g/go/src/pkg/flag/flag.go:570 +0x6b fp=0x221031ed80
testing.init()
        /Users/rsc/g/go/src/pkg/testing/testing.go:-531 +0xbb fp=0x221031edc0
strings_test.init()
        /Users/rsc/g/go/src/pkg/strings/strings_test.go:1115 +0x62 fp=0x221031ef70
main.init()
        strings/_test/_testmain.go:90 +0x3d fp=0x221031ef78
runtime.main()
        /Users/rsc/g/go/src/pkg/runtime/proc.c:180 +0x8a fp=0x221031efa0
runtime.goexit()
        /Users/rsc/g/go/src/pkg/runtime/proc.c:1269 fp=0x221031efa8

goroutine 2 [runnable]:
runtime.MHeap_Scavenger()
        /Users/rsc/g/go/src/pkg/runtime/mheap.c:438
runtime.goexit()
        /Users/rsc/g/go/src/pkg/runtime/proc.c:1269
created by runtime.main
        /Users/rsc/g/go/src/pkg/runtime/proc.c:166

rax     0x23ccc0
rbx     0x23ccc0
rcx     0x0
rdx     0x38
rdi     0x2102c0170
rsi     0x221032cfe0
rbp     0x221032cfa0
rsp     0x7fff5fbff5b0
r8      0x2102c0120
r9      0x221032cfa0
r10     0x221032c000
r11     0x104ce8
r12     0xe5c80
r13     0x1be82baac718
r14     0x13091135f7d69200
r15     0x0
rip     0x1b2a6
rflags  0x10246
cs      0x2b
fs      0x0
gs      0x0

Fixes #5723.

R=r, dvyukov, go.peter.90, dave, iant
CC=golang-dev
https://golang.org/cl/10360048
2013-06-27 11:32:01 -04:00
Adam Langley 6bea504b94 cmd/6a, cmd/6l: add PCLMULQDQ instruction.
This Intel instruction implements multiplication in binary fields.
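
For readers unfamiliar with the operation, multiplication in a binary field is carry-less: partial products are combined with XOR instead of addition. The following is a plain software sketch of a 64x64 to 128-bit carry-less multiply; clmul64 is an illustrative name, and the sketch does not model the instruction's operand selection.

```go
package main

import "fmt"

// clmul64 returns the 128-bit carry-less product of a and b as (hi, lo).
// Each set bit i of b contributes a shifted copy of a, combined by XOR.
func clmul64(a, b uint64) (hi, lo uint64) {
	for i := uint(0); i < 64; i++ {
		if b&(1<<i) != 0 {
			lo ^= a << i
			if i > 0 {
				hi ^= a >> (64 - i)
			}
		}
	}
	return hi, lo
}

func main() {
	// (x + 1) * (x + 1) = x^2 + 1 over GF(2): 3 * 3 gives 5, not 9.
	fmt.Println(clmul64(3, 3))
	// The top bit shifted by one lands in the high word.
	fmt.Println(clmul64(1<<63, 2))
}
```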

R=golang-dev, minux.ma, dave, rsc
CC=golang-dev
https://golang.org/cl/10428043
2013-06-21 15:17:13 -04:00
Russ Cox 1f51d27922 cmd/gc: move genembedtramp into portable code
Requires adding new linker instruction
        RET	f(SB)
meaning return but then immediately call f.
This is what you'd use to implement a tail call after
fiddling with the arguments, but the compiler only
uses it in genwrapper.

This CL eliminates the copy-and-paste genembedtramp
functions from 5g/8g/6g and makes the code run on ARM
for the first time. It removes a small special case for function
generation, which should help Carl a bit, but at the same time
it does not bother to implement general tail call optimization,
which we do not want anyway.

Fixes #5627.

R=ken2
CC=golang-dev
https://golang.org/cl/10057044
2013-06-11 09:41:49 -04:00
Russ Cox 26d43a0f22 cmd/6l: accept NOP of $x+10(SP) and of X0
Needed to link code compiled with 6c -N.

R=ken2
CC=golang-dev
https://golang.org/cl/10043044
2013-06-05 10:38:52 -04:00
Lucio De Re 0b88587d22 cmd/[568]l/obj.c: NULL is not recognised in Plan 9 build, use nil instead.
Fixes #5591.

R=golang-dev, dave, minux.ma, cshapiro
CC=cshapiro, golang-dev
https://golang.org/cl/9839046
2013-05-30 15:02:10 +10:00
Carl Shapiro 4e0a51c210 cmd/5l, cmd/6l, cmd/8l, cmd/gc, runtime: generate and use bitmaps of argument pointer locations
With this change the compiler emits a bitmap for each function
covering its stack frame arguments area.  If an argument word
is known to contain a pointer, a bit is set.  The garbage
collector reads this information when scanning the stack by
frames and uses it to ignore locations known to not contain a
pointer.
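
The idea can be sketched as follows. The packing into 32-bit chunks, the function names, and the example frame are assumptions for illustration, not the format emitted by the compiler.

```go
package main

import "fmt"

// argBitmap packs one bit per argument word into 32-bit chunks.
// Bit i is set when word i is known to contain a pointer.
func argBitmap(isPointer []bool) []uint32 {
	bits := make([]uint32, (len(isPointer)+31)/32)
	for i, p := range isPointer {
		if p {
			bits[i/32] |= 1 << uint(i%32)
		}
	}
	return bits
}

// pointerWords lists the argument words a stack scan has to look at;
// every other word is skipped.
func pointerWords(bits []uint32, nwords int) []int {
	var words []int
	for i := 0; i < nwords; i++ {
		if bits[i/32]&(1<<uint(i%32)) != 0 {
			words = append(words, i)
		}
	}
	return words
}

func main() {
	// Hypothetical argument area for f(n int, p *T, s string): four words,
	// where p and the string's data pointer hold pointers.
	args := []bool{false, true, true, false}
	bits := argBitmap(args)
	fmt.Printf("%04b %v\n", bits[0], pointerWords(bits, len(args)))
}
```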

R=golang-dev, bradfitz, daniel.morsing, dvyukov, khr, khr, iant, cshapiro
CC=golang-dev
https://golang.org/cl/9223046
2013-05-28 17:59:10 -07:00
Ian Lance Taylor 9182c364aa cmd/ld: add -extld and -extldflags options
Permits specifying the linker to use, and trailing flags to
pass to that linker, when linking in external mode.  External
mode linking is used when building a package that uses cgo, as
described in the cgo docs.

Also document -linkmode and -tmpdir.

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/8225043
2013-04-01 12:56:18 -07:00
Ian Lance Taylor 3197be4807 cmd/dist, cmd/ld: GO_EXTLINK_ENABLED=0 defaults to -linkmode=internal
Change build system to set GO_EXTLINK_ENABLED=0 by default for
OS X 10.6, since the system linker has a bug and cannot
handle the object files generated by 6l.

Fixes #5130.

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/8183043
2013-03-29 16:33:35 -07:00
Ian Lance Taylor e7fc9a5c57 cmd/6l: fix OpenBSD build
Avoid generating TLS relocations on OpenBSD.

R=golang-dev, bradfitz
CC=golang-dev
https://golang.org/cl/7641055
2013-03-27 14:32:51 -07:00
Ian Lance Taylor 30e29ee9b6 cmd/ld: emit TLS relocations during external linking
This CL was written by rsc.  I just tweaked 8l.

This CL adds TLS relocation to the ELF .o file we write during external linking,
so that the host linker (gcc) can decide the final location of m and g.

Similar relocations are not necessary on OS X because we use an alternate
program start-time mechanism to acquire thread-local storage.

Similar relocations are not necessary on ARM or Plan 9 or Windows
because external linking mode is not yet supported on those systems.

On almost all ELF systems, the references we use are like %fs:-0x4 or %gs:-0x4,
which we write in 6a/8a as -0x4(FS) or -0x4(GS). On Linux/ELF, however,
Xen's lack of support for this mode forced us long ago to use a two-instruction
sequence: first we load %gs:0x0 into a register r, and then we use -0x4(r).
(The ELF program loader arranges that %gs:0x0 contains a regular pointer to
that same memory location.) In order to relocate those -0x4(r) references,
the linker must know where they are. This CL adds the equivalent notation
-0x4(r)(GS*1) for this purpose: it assembles to the same encoding as -0x4(r)
but the (GS*1) indicates to the linker that this is one of those thread-local
references that needs relocation.

Thanks to Elias Naur for reminding me about this missing piece and
also for writing the test.

R=r
CC=golang-dev
https://golang.org/cl/7891047
2013-03-27 13:27:35 -07:00
Rémy Oudompheng d815a14718 cmd/5l, cmd/6l, cmd/8l: fix segfault on reading LOCALS for a duplicate definition.
Fixes #5105.

R=golang-dev, dave, daniel.morsing, rsc
CC=golang-dev
https://golang.org/cl/7965043
2013-03-25 22:09:55 +01:00
Rémy Oudompheng a3c2d62a9a cmd/5l, cmd/6l, cmd/8l: remove declarations of non-existent variables.
R=golang-dev, minux.ma
CC=golang-dev
https://golang.org/cl/7985043
2013-03-24 08:55:08 +01:00
Russ Cox b505ff6279 crypto/rc4: faster amd64 implementation
XOR key into data 128 bits at a time instead of 64 bits
and pipeline half of state loads. Rotate loop to allow
single-register indexing for state[i].

On a MacBookPro10,2 (Core i5):

benchmark           old ns/op    new ns/op    delta
BenchmarkRC4_128          412          224  -45.63%
BenchmarkRC4_1K          3179         1613  -49.26%
BenchmarkRC4_8K         25223        12545  -50.26%

benchmark            old MB/s     new MB/s  speedup
BenchmarkRC4_128       310.51       570.42    1.84x
BenchmarkRC4_1K        322.09       634.48    1.97x
BenchmarkRC4_8K        320.97       645.32    2.01x

For comparison, on the same machine, openssl 0.9.8r reports
its rc4 speed as somewhat under 350 MB/s for both 1K and 8K
(it is operating 64 bits at a time).

On an Intel Xeon E5520:

benchmark           old ns/op    new ns/op    delta
BenchmarkRC4_128          418          259  -38.04%
BenchmarkRC4_1K          3200         1884  -41.12%
BenchmarkRC4_8K         25173        14529  -42.28%

benchmark            old MB/s     new MB/s  speedup
BenchmarkRC4_128       306.04       492.48    1.61x
BenchmarkRC4_1K        319.93       543.26    1.70x
BenchmarkRC4_8K        321.61       557.20    1.73x

For comparison, on the same machine, openssl 1.0.1
reports its rc4 speed as 587 MB/s for 1K and 601 MB/s for 8K.

R=agl
CC=golang-dev
https://golang.org/cl/7865046
2013-03-21 16:38:57 -04:00
Russ Cox b4f3533c92 cmd/ld: replace -hostobj with -linkmode
Still disabled. Need to fix TLS.

R=golang-dev, minux.ma, bradfitz
CC=golang-dev
https://golang.org/cl/7783044
2013-03-19 15:45:42 -04:00
Russ Cox 8bbb6d3ed0 cmd/ld: another use-after-free
This only shows up in the duplicate symbol error message.

R=golang-dev, r
CC=golang-dev
https://golang.org/cl/7486053
2013-03-14 14:35:47 -04:00
Russ Cox 3b85b724c5 cmd/ld: darwin support for host linking
R=ken2
CC=golang-dev
https://golang.org/cl/7626045
2013-03-11 00:51:42 -04:00
Russ Cox 9e13803ae1 cmd/ld: avoid redundant external relocation calculations
R=ken2, ken
CC=golang-dev
https://golang.org/cl/7483045
2013-03-10 19:07:16 -07:00
Russ Cox 96b243fa47 cmd/ld: replace dynimpname with extname
Dynimpname was getting too confusing.
Replace flag-like checks with tests of s->type.

R=ken2
CC=golang-dev
https://golang.org/cl/7594046
2013-03-10 18:19:53 -04:00
Russ Cox df6072b41c cmd/ld: include full symbol table in Mach-O output
This makes binaries work with OS X nm.

R=ken2
CC=golang-dev
https://golang.org/cl/7558044
2013-03-10 16:24:01 -04:00
Russ Cox e982ecacd1 cmd/ld: add tmpdir flag to preserve temp files
R=ken2
CC=golang-dev
https://golang.org/cl/7497044
2013-03-10 12:50:44 -04:00
Steve McCoy 18f926aab3 cgo: enable external linking mode on FreeBSD amd64.
Tested on FreeBSD 9.1 amd64, per rsc's instructions at
https://groups.google.com/d/topic/golang-dev/HjRTLvRsJXo/discussion .

R=golang-dev, lucio.dere, devon.odell, rsc
CC=golang-dev
https://golang.org/cl/7664044
2013-03-09 14:51:57 -08:00
Russ Cox e0c430d5b7 cmd/6l, cmd/8l: fix BSD builds
Before this CL, running

        cd misc/cgo/test
        go test -c
        readelf --dyn-syms test.test | grep cgoexp

turned up many UNDEF symbols corresponding to symbols actually
in the binary but marked only cgo_export_static. Only symbols
marked cgo_export_dynamic should be listed in this mode.
And if the symbol is going to be listed, it should be listed with its
actual address instead of UNDEF.

The Linux dynamic linker didn't care about the seemingly missing
symbols, but the BSD one did.

This CL eliminates the symbols from the dyn-syms table.

R=golang-dev
TBR=golang-dev
CC=golang-dev
https://golang.org/cl/7624043
2013-03-07 21:23:59 -08:00
Russ Cox 7663ffcae6 cmd/ld: steps toward 386 host linking
- Introduce MaxAlign constant and use in data layout
and ELF section header.

- Allow up to 16-byte alignment for large objects
(will help Keith's hash changes).

- Emit ELF symbol for .rathole (global /dev/null used by 8c).

- Invoke gcc with -m32/-m64 as appropriate.

- Don't invoke gcc if writing the .o file failed.

R=golang-dev, iant
CC=golang-dev
https://golang.org/cl/7563045
2013-03-07 19:57:25 -08:00
Keith Randall 297bb12809 cmd/6a, cmd/8a, cmd/6l, cmd/8l: add AES instructions
Instructions for use in AES hashing.  See CL#7543043

R=rsc
CC=golang-dev
https://golang.org/cl/7548043
2013-03-07 12:54:00 -08:00
Russ Cox 60f783d92b cmd/ld: host linking support for linux/amd64
Still to do: non-linux and non-amd64.
It may work on other ELF-based amd64 systems too, but untested.

"go test -ldflags -hostobj $GOROOT/misc/cgo/test" passes.

Much may yet change, but this seems a reasonable checkpoint.

R=iant
CC=golang-dev
https://golang.org/cl/7369057
2013-03-07 09:19:02 -05:00
Russ Cox 1d5dc4fd48 cmd/gc: emit explicit type information for local variables
The type information is (and for years has been) included
as an extra field in the address chunk of an instruction.
Unfortunately, suppose there is a string at a+24(FP) and
we have an instruction reading its length. It will say:

        MOVQ a+32(FP), AX

and the type of *that* argument is int (not string), because
it is the length being read. This confuses the picture seen
by debuggers and now, worse, by the garbage collector.

Instead of attaching the type information to all uses,
emit an explicit list of TYPE instructions with the information.
The TYPE instructions are no-ops whose only role is to
provide an address to attach type information to.

For example, this function:

        func f(x, y, z int) (a, b string) {
                return
        }

now compiles into:

        --- prog list "f" ---
        0000 (/Users/rsc/x.go:3) TEXT    f+0(SB),$0-56
        0001 (/Users/rsc/x.go:3) LOCALS  ,
        0002 (/Users/rsc/x.go:3) TYPE    x+0(FP){int},$8
        0003 (/Users/rsc/x.go:3) TYPE    y+8(FP){int},$8
        0004 (/Users/rsc/x.go:3) TYPE    z+16(FP){int},$8
        0005 (/Users/rsc/x.go:3) TYPE    a+24(FP){string},$16
        0006 (/Users/rsc/x.go:3) TYPE    b+40(FP){string},$16
        0007 (/Users/rsc/x.go:3) MOVQ    $0,b+40(FP)
        0008 (/Users/rsc/x.go:3) MOVQ    $0,b+48(FP)
        0009 (/Users/rsc/x.go:3) MOVQ    $0,a+24(FP)
        0010 (/Users/rsc/x.go:3) MOVQ    $0,a+32(FP)
        0011 (/Users/rsc/x.go:4) RET     ,

The { } show the formerly hidden type information.
The { } syntax is used when printing from within the gc compiler.
It is not accepted by the assemblers.
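
The offsets and sizes in the listing follow from laying the parameters and results out in order. A small sketch, assuming a 64-bit target where an int is 8 bytes and a string header is 16 bytes; the param type and layout helper are invented for this example.

```go
package main

import "fmt"

// Sizes in bytes on a 64-bit target such as amd64 (assumed here).
var sizeOf = map[string]int{"int": 8, "string": 16}

// param is a hypothetical description of one argument or result.
type param struct{ Name, Type string }

// layout assigns consecutive frame offsets to params and returns the
// total size of the argument area. No padding is needed in this
// example because every size is a multiple of 8.
func layout(params []param) (offsets map[string]int, total int) {
	offsets = make(map[string]int)
	for _, p := range params {
		offsets[p.Name] = total
		total += sizeOf[p.Type]
	}
	return offsets, total
}

func main() {
	// func f(x, y, z int) (a, b string)
	params := []param{
		{"x", "int"}, {"y", "int"}, {"z", "int"},
		{"a", "string"}, {"b", "string"},
	}
	offsets, total := layout(params)
	for _, p := range params {
		fmt.Printf("TYPE    %s+%d(FP){%s},$%d\n", p.Name, offsets[p.Name], p.Type, sizeOf[p.Type])
	}
	fmt.Println("argument bytes:", total) // matches the 56 in $0-56
}
```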

The same type information is now included on global variables:

0055 (/Users/rsc/x.go:15) GLOBL   slice+0(SB){[]string},$24(AL*0)

This more accurate type information fixes a bug in the
garbage collector's precise heap collection.

The linker only cares about globals right now, but having the
local information should make things a little nicer for Carl
in the future.

Fixes #4907.

R=ken2
CC=golang-dev
https://golang.org/cl/7395056
2013-02-25 12:13:47 -05:00
Russ Cox d57fcbf05c cmd/5l, cmd/6l, cmd/8l: accept CALL reg, reg
The new src argument is ignored during linking
(that is, CALL r1, r2 is identical to CALL r2 for linking),
but it serves as a hint to the 5g/6g/8g optimizer
that the src register is live on entry to the called
function and must be preserved.

It is possible to avoid exposing this fact to the rest of
the toolchain, keeping it entirely within 5g/6g/8g,
but I think it will help to be able to look in object files
and assembly listings and linker -a / -W output to
see which CALL instructions are "Go func value" calls and
which are "C function pointer" calls.

R=ken2
CC=golang-dev
https://golang.org/cl/7364045
2013-02-22 14:23:21 -05:00
Carl Shapiro f466617a62 cmd/5g, cmd/5l, cmd/6l, cmd/8l, cmd/gc, cmd/ld, runtime: accurate args and locals information
Previously, the func structure contained an inaccurate value for
the args member and a 0 value for the locals member.

This change populates the func structure with args and locals
values computed by the compiler.  The number of args was
already available in the ATEXT instruction.  The number of
locals is now passed through in the new ALOCALS instruction.

This change also switches the unit of args and locals to be
bytes, just like the frame member, instead of 32-bit words.

R=golang-dev, bradfitz, cshapiro, dave, rsc
CC=golang-dev
https://golang.org/cl/7399045
2013-02-21 12:52:26 -08:00
Robert Griesemer 3ee87d02b0 cmd/godoc: use go/build to determine package and example files
Also:
- faster code for example extraction
- simplify handling of command documentation:
  all "main" packages are treated as commands
- various minor cleanups along the way

For commands written in Go, any doc.go file containing
documentation must now be part of package main (rather
then package documentation), otherwise the documentation
won't show up in godoc (it will still build, though).

For commands written in C, documentation may still be
in doc.go files defining package documentation, but the
recommended way is to explicitly ignore those files with
a +build ignore constraint to define package main.

Fixes #4806.

R=adg, rsc, dave, bradfitz
CC=golang-dev
https://golang.org/cl/7333046
2013-02-19 11:19:58 -08:00
Russ Cox 0cb0f6d090 cmd/ld: support for linking with host linker
A step toward a fix for issue 4069.

To allow linking with arbitrary host object files, add a linker mode
that can generate a host object file instead of an executable.
Then the host linker can be invoked to generate the final executable.

This CL adds a new -hostobj flag that instructs the linker to write
a host object file instead of an executable.

That is, this works:

        go tool 6g x.go
        go tool 6l -hostobj -o x.o x.6
        ld -e _rt0_amd64_linux x.o
        ./a.out

as does:

        go tool 8g x.go
        go tool 8l -hostobj -o x.o x.8
        ld -m elf_i386 -e _rt0_386_linux x.o
        ./a.out

Because 5l was never updated to use the standard relocation scheme,
it will take more work to get this working on ARM.

This is a checkpoint of the basic functionality. It does not work
with cgo yet, and cgo is the main reason for the change.
The command-line interface will likely change too.
The gc linker has other information that needs to be returned to
the caller for use when invoking the host linker besides the single
object file.

R=iant, iant
CC=golang-dev
https://golang.org/cl/7060044
2013-01-31 14:11:32 -08:00
Elias Naur 3bdeaf2a64 6l/5l: PIC and shared library support for the linkers.
Added the -shared flag to 5l/6l to output a PIC executable with the required
dynamic relocations and RIP-relative addressing in machine code.
Added dummy support to 8l to avoid compilation errors.

See also:
https://golang.org/cl/6822078
https://golang.org/cl/7064048

and

https://groups.google.com/d/topic/golang-nuts/P05BDjLcQ5k/discussion

R=rsc, iant
CC=golang-dev
https://golang.org/cl/6926049
2013-01-30 08:46:56 -08:00