diff --git a/doc/asm.html b/doc/asm.html
new file mode 100644
index 0000000000..ba19700643
--- /dev/null
+++ b/doc/asm.html
@@ -0,0 +1,402 @@
+
+
+
+This document is a quick outline of the unusual form of assembly language used by the gc
+suite of Go compilers (6g, 8g, etc.).
+It is based on the input to the Plan 9 assemblers, which is documented in detail
+on the Plan 9 site.
+If you plan to write assembly language, you should read that document, although much of it is Plan 9-specific.
+This document provides a summary of the syntax and
+describes the peculiarities that apply when writing assembly code to interact with Go.
+
+The most important thing to know about Go's assembler is that it is not a direct representation of the underlying machine.
+Some of the details map precisely to the machine, but some do not.
+This is because the compiler suite (see
+this description)
+needs no assembler pass in the usual pipeline.
+Instead, the compiler emits a kind of incompletely defined instruction set, in binary form, which the linker
+then completes.
+In particular, the linker does instruction selection, so when you see an instruction like MOV,
+what the linker actually generates for that operation might not be a move instruction at all; it might be a clear or a load.
+Or it might correspond exactly to the machine instruction with that name.
+In general, machine-specific operations tend to appear as themselves, while more general concepts like
+memory move and subroutine call and return are more abstract.
+The details vary with architecture, and we apologize for the imprecision; the situation is not well-defined.
+
+The assembler program is a way to generate that intermediate, incompletely defined instruction sequence
+as input for the linker.
+If you want to see what the instructions look like in assembly for a given architecture, say amd64, there
+are many examples in the sources of the standard library, in packages such as
+runtime and
+math/big.
+You can also examine what the compiler emits as assembly code:
+
+$ cat x.go
+package main
+
+func main() {
+ println(3)
+}
+$ go tool 6g -S x.go # or: go build -gcflags -S x.go
+
+--- prog list "main" ---
+0000 (x.go:3) TEXT main+0(SB),$8-0
+0001 (x.go:3) FUNCDATA $0,gcargs·0+0(SB)
+0002 (x.go:3) FUNCDATA $1,gclocals·0+0(SB)
+0003 (x.go:4) MOVQ $3,(SP)
+0004 (x.go:4) PCDATA $0,$8
+0005 (x.go:4) CALL ,runtime.printint+0(SB)
+0006 (x.go:4) PCDATA $0,$-1
+0007 (x.go:4) PCDATA $0,$0
+0008 (x.go:4) CALL ,runtime.printnl+0(SB)
+0009 (x.go:4) PCDATA $0,$-1
+0010 (x.go:5) RET ,
+...
+
+
+
+The FUNCDATA and PCDATA directives contain information
+for use by the garbage collector; they are introduced by the compiler.
+
+To see what gets put in the binary after linking, add the -a flag to the linker:
+
+$ go tool 6l -a x.6 # or: go build -ldflags -a x.go
+codeblk [0x2000,0x1d059) at offset 0x1000
+002000 main.main | (3) TEXT main.main+0(SB),$8
+002000 65488b0c25a0080000 | (3) MOVQ 2208(GS),CX
+002009 483b21 | (3) CMPQ SP,(CX)
+00200c 7707 | (3) JHI ,2015
+00200e e83da20100 | (3) CALL ,1c250+runtime.morestack00
+002013 ebeb | (3) JMP ,2000
+002015 4883ec08 | (3) SUBQ $8,SP
+002019 | (3) FUNCDATA $0,main.gcargs·0+0(SB)
+002019 | (3) FUNCDATA $1,main.gclocals·0+0(SB)
+002019 48c7042403000000 | (4) MOVQ $3,(SP)
+002021 | (4) PCDATA $0,$8
+002021 e8aad20000 | (4) CALL ,f2d0+runtime.printint
+002026 | (4) PCDATA $0,$-1
+002026 | (4) PCDATA $0,$0
+002026 e865d40000 | (4) CALL ,f490+runtime.printnl
+00202b | (4) PCDATA $0,$-1
+00202b 4883c408 | (5) ADDQ $8,SP
+00202f c3 | (5) RET ,
+...
+
+
+
+Some symbols, such as PC, R0 and SP, are predeclared and refer to registers.
+There are two other predeclared symbols, SB (static base) and FP (frame pointer).
+All user-defined symbols other than jump labels are written as offsets to these pseudo-registers.
+
+The SB pseudo-register can be thought of as the origin of memory, so the symbol foo(SB)
+is the name foo as an address in memory.
+
+The FP is a virtual frame pointer.
+The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register.
+Thus 0(FP) is the first argument to the function,
+8(FP) is the second (on a 64-bit machine), and so on.
+To refer to an argument by name, add the name to the numerical offset, like this: first_arg+0(FP).
+The name in this syntax has no semantic value; think of it as a comment to the reader.
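+
+The offset arithmetic described above can be sketched in Go.
+This is an illustration only: the helper argRef is hypothetical, and it assumes
+that every argument occupies exactly one machine word, which is not true in general.

```go
package main

import "fmt"

// argRef formats a reference to the i-th argument (counting from 0)
// in the name+offset(FP) form used by the assembler.
// wordSize is 8 on a 64-bit machine and 4 on a 32-bit one.
// Assumption for this sketch: each argument is exactly one word wide.
func argRef(name string, i, wordSize int) string {
	return fmt.Sprintf("%s+%d(FP)", name, i*wordSize)
}

func main() {
	fmt.Println(argRef("first_arg", 0, 8))  // first_arg+0(FP)
	fmt.Println(argRef("second_arg", 1, 8)) // second_arg+8(FP)
}
```

+As the text notes, the name carries no semantic value; only the numeric offset matters to the assembler.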
+
+Instructions, registers, and assembler directives are always in UPPER CASE to remind you
+that assembly programming is a fraught endeavor.
+(Exceptions: the m and g register renamings on ARM.)
+
+In Go object files and binaries, the full name of a symbol is the
+package path followed by a period and the symbol name:
+fmt.Printf or math/rand.Int.
+Because the assembler's parser treats period and slash as punctuation,
+those strings cannot be used directly as identifier names.
+Instead, the assembler allows the middle dot character U+00B7
+and the division slash U+2215 in identifiers and rewrites them to
+plain period and slash.
+Within an assembler source file, the symbols above are written as
+fmt·Printf and math∕rand·Int.
+The assembly listings generated by the compilers when using the -S flag
+show the period and slash directly instead of the Unicode replacements
+required by the assemblers.
+
+Most hand-written assembly files do not include the full package path
+in symbol names, because the linker inserts the package path of the current
+object file at the beginning of any name starting with a period:
+in an assembly source file within the math/rand package implementation,
+the package's Int function can be referred to as ·Int.
+This convention avoids the need to hard-code a package's import path in its
+own source code, making it easier to move the code from one location to another.
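+
+The two naming rules above can be mimicked with a small Go sketch.
+The function linkerName is hypothetical and folds both steps (the assembler's character
+rewriting and the linker's package-path insertion) into one place purely for illustration;
+it is not how the tools are implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// linkerName approximates the final symbol name for an identifier written
// in an assembly source file that belongs to the package pkgPath.
func linkerName(sym, pkgPath string) string {
	// Middle dot (U+00B7) becomes a period, division slash (U+2215) a slash.
	s := strings.NewReplacer("\u00b7", ".", "\u2215", "/").Replace(sym)
	// A name starting with a period gets the current package path prepended.
	if strings.HasPrefix(s, ".") {
		s = pkgPath + s
	}
	return s
}

func main() {
	fmt.Println(linkerName("fmt·Printf", "main"))    // fmt.Printf
	fmt.Println(linkerName("math∕rand·Int", "main")) // math/rand.Int
	fmt.Println(linkerName("·Int", "math/rand"))     // math/rand.Int
}
```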
+
+The assembler uses various directives to bind text and data to symbol names.
+For example, here is a simple complete function definition. The TEXT
+directive declares the symbol runtime·profileloop and the instructions
+that follow form the body of the function.
+The last instruction in a TEXT block must be some sort of jump, usually a RET (pseudo-)instruction.
+(If it's not, the linker will append a jump-to-itself instruction; there is no fallthrough in TEXTs.)
+After the symbol, the arguments are flags (see below)
+and the frame size, a constant (but see below):
+
+TEXT runtime·profileloop(SB),NOSPLIT,$8
+	MOVQ	$runtime·profileloop1(SB), CX
+	MOVQ	CX, 0(SP)
+	CALL	runtime·externalthreadhandler(SB)
+	RET
+
+
+In the general case, the frame size is followed by an argument size, separated by a minus sign.
+(It's not a subtraction, just idiosyncratic syntax.)
+The frame size $24-8 states that the function has a 24-byte frame
+and is called with 8 bytes of argument, which live on the caller's frame.
+If NOSPLIT is not specified for the TEXT,
+the argument size must be provided.
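+
+To make the frame-size syntax concrete, here is a minimal Go sketch that splits a
+specification such as $24-8 into its two numbers.
+The function parseFrame is hypothetical and is not taken from the assembler;
+it treats a leading minus sign (as in $-4) as the sign of the frame size rather than as the separator.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFrame splits a TEXT frame specification into the frame size and,
// if present, the argument size. hasArgs reports whether an argument
// size followed the frame size.
func parseFrame(spec string) (frame, args int, hasArgs bool, err error) {
	s := strings.TrimPrefix(spec, "$")
	// The separator is the last '-' that is not the first character;
	// a '-' in first position is the sign of the frame size.
	if i := strings.LastIndex(s, "-"); i > 0 {
		args, err = strconv.Atoi(s[i+1:])
		if err != nil {
			return 0, 0, false, err
		}
		hasArgs = true
		s = s[:i]
	}
	frame, err = strconv.Atoi(s)
	if err != nil {
		return 0, 0, false, err
	}
	return frame, args, hasArgs, nil
}

func main() {
	for _, spec := range []string{"$24-8", "$8", "$0-8", "$-4"} {
		frame, args, hasArgs, err := parseFrame(spec)
		fmt.Printf("%s: frame=%d args=%d hasArgs=%v err=%v\n", spec, frame, args, hasArgs, err)
	}
}
```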
+
+Note that the symbol name uses a middle dot to separate the components and is specified as an offset from the
+static base pseudo-register SB.
+This function would be called from Go source for package runtime using the
+simple name profileloop.
+
+For DATA directives, the symbol is followed by a slash and the number
+of bytes the memory associated with the symbol occupies.
+The arguments are optional flags and the data itself.
+For instance,
+
+DATA runtime·isplan9(SB)/4, $1
+
+
+declares the local symbol runtime·isplan9 of size 4 and value 1.
+Again the symbol has the middle dot and is offset from SB.
+
+The GLOBL directive declares a symbol to be global.
+The arguments are optional flags and the size of the data being declared as a global,
+which will have initial value all zeros unless a DATA directive
+has initialized it.
+The GLOBL directive must follow any corresponding DATA directives.
+This example
+
+GLOBL runtime·tlsoffset(SB),$4
+
+
+declares runtime·tlsoffset to have size 4.
+
+There may be one or two arguments to the directives.
+If there are two, the first is a bit mask of flags,
+which can be written as numeric expressions, added or or-ed together,
+or can be set symbolically for easier absorption by a human.
+Their values, defined in the file src/cmd/ld/textflag.h, are:
+
+NOPROF = 1
+(For TEXT items.)
+Don't profile the marked function. This flag is deprecated.
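+DUPOK = 2
+It is legal to have multiple instances of this symbol in a single binary.
+The linker will choose one of the duplicates to use.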
+NOSPLIT = 4
+(For TEXT items.)
+Don't insert the preamble to check if the stack must be split.
+The frame for the routine, plus anything it calls, must fit in the
+spare space at the top of the stack segment.
+Used to protect routines such as the stack splitting code itself.
+RODATA = 8
+(For DATA and GLOBL items.)
+Put this data in a read-only section.
+NOPTR = 16
+(For DATA and GLOBL items.)
+This data contains no pointers and therefore does not need to be
+scanned by the garbage collector.
+WRAPPER = 32
+(For TEXT items.)
+This is a wrapper function and should not count as disabling recover.
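+
+Because the flags form a bit mask, they combine with a bitwise OR.
+The Go sketch below mirrors the values listed above; the describe helper and the
+flagNames table are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Flag values as listed in the text (from src/cmd/ld/textflag.h).
const (
	NOPROF  = 1
	DUPOK   = 2
	NOSPLIT = 4
	RODATA  = 8
	NOPTR   = 16
	WRAPPER = 32
)

// flagNames pairs each bit with its symbolic name, lowest bit first.
var flagNames = []struct {
	bit  int
	name string
}{
	{NOPROF, "NOPROF"},
	{DUPOK, "DUPOK"},
	{NOSPLIT, "NOSPLIT"},
	{RODATA, "RODATA"},
	{NOPTR, "NOPTR"},
	{WRAPPER, "WRAPPER"},
}

// describe renders a numeric mask as its symbolic names joined by "|".
func describe(mask int) string {
	var names []string
	for _, f := range flagNames {
		if mask&f.bit != 0 {
			names = append(names, f.name)
		}
	}
	if len(names) == 0 {
		return "0"
	}
	return strings.Join(names, "|")
}

func main() {
	fmt.Println(describe(NOSPLIT | WRAPPER)) // NOSPLIT|WRAPPER
	fmt.Println(describe(24))                // RODATA|NOPTR
}
```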
+
+It is impractical to list all the instructions and other details for each machine.
+To see what instructions are defined for a given machine, say 32-bit Intel x86,
+look in the top-level header file for the corresponding linker, in this case 8l.
+That is, the file $GOROOT/src/cmd/8l/8.out.h contains a C enumeration, called as,
+of the instructions and their spellings as known to the assembler and linker for that architecture.
+In that file you'll find a declaration that begins
+
+enum as
+{
+ AXXX,
+ AAAA,
+ AAAD,
+ AAAM,
+ AAAS,
+ AADCB,
+ ...
+
+
+
+Each instruction begins with an initial capital A in this list, so AADCB
+represents the ADCB (add carry byte) instruction.
+The enumeration is in alphabetical order, plus some late additions (AXXX occupies
+the zero slot as an invalid instruction).
+The sequence has nothing to do with the actual encoding of the machine instructions.
+Again, the linker takes care of that detail.
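+
+The naming convention can be stated as a one-line Go sketch.
+The function mnemonic is hypothetical; it simply drops the leading A that the
+enumeration adds to each instruction name.

```go
package main

import (
	"fmt"
	"strings"
)

// mnemonic converts an enumeration constant name into the instruction
// spelling used in assembly source by removing the single leading "A".
func mnemonic(enumName string) string {
	return strings.TrimPrefix(enumName, "A")
}

func main() {
	fmt.Println(mnemonic("AADCB")) // ADCB
	fmt.Println(mnemonic("AAAA"))  // AAA
}
```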
+
+One detail evident in the examples from the previous sections is that data in the instructions flows from left to right:
+MOVQ $0, CX clears CX.
+This convention applies even on architectures where the usual mode is the opposite direction.
+
+Here follow some descriptions of key Go-specific details for the supported architectures.
+
+ +
+On the 386, the runtime pointers to the m and g structures are maintained
+through the value of an otherwise unused (as far as Go is concerned) register in the MMU.
+An OS-dependent macro get_tls is defined for the assembler if the source includes
+an architecture-dependent header file, like this:
+
+#include "zasm_GOOS_GOARCH.h"
+
+
+Within the runtime, the get_tls macro loads its argument register
+with a pointer to a pair of words representing the g and m pointers.
+The sequence to load g and m using CX looks like this:
+
+get_tls(CX)
+MOVL	g(CX), AX	// Move g into AX.
+MOVL	m(CX), BX	// Move m into BX.
+
+
+On amd64, the assembly code to access the m and g
+pointers is the same as on the 386, except it uses MOVQ rather than
+MOVL:
+
+get_tls(CX)
+MOVQ	g(CX), AX	// Move g into AX.
+MOVQ	m(CX), BX	// Move m into BX.
+
+
+On ARM, the registers R9 and R10 are reserved by the
+compiler and linker to point to the m (machine) and g
+(goroutine) structures, respectively.
+Within assembler source code, these pointers
+can be referred to as simply m and g.
+
+When defining a TEXT, specifying frame size $-4
+tells the linker that this is a leaf function that does not need to save LR on entry.
+
+The assemblers are designed to support the compiler so not all hardware instructions
+are defined for all architectures: if the compiler doesn't generate it, it might not be there.
+If you need to use a missing instruction, there are two ways to proceed.
+One is to update the assembler to support that instruction, which is straightforward
+but only worthwhile if it's likely the instruction will be used again.
+Instead, for simple one-off cases, it's possible to use the BYTE
+and WORD directives
+to lay down explicit data into the instruction stream within a TEXT.
+Here's how the 386 runtime defines the 64-bit atomic load function.
+
+// uint64 atomicload64(uint64 volatile* addr);
+// so actually
+// void atomicload64(uint64 *res, uint64 volatile *addr);
+TEXT runtime·atomicload64(SB), NOSPLIT, $0-8
+	MOVL	4(SP), BX
+	MOVL	8(SP), AX
+	// MOVQ (%EAX), %MM0
+	BYTE $0x0f; BYTE $0x6f; BYTE $0x00
+	// MOVQ %MM0, 0(%EBX)
+	BYTE $0x0f; BYTE $0x7f; BYTE $0x03
+	// EMMS
+	BYTE $0x0F; BYTE $0x77
+	RET
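+
+Writing out BYTE directives by hand is error-prone, so one might generate them
+from the raw encoding. The following Go sketch does that; byteDirectives is a hypothetical
+helper, not part of the Go tool chain, and the encoding bytes must still be looked up
+in the processor manual.

```go
package main

import (
	"fmt"
	"strings"
)

// byteDirectives renders raw instruction bytes as a single line of
// BYTE directives separated by semicolons, ready to paste into a TEXT.
func byteDirectives(code []byte) string {
	parts := make([]string, len(code))
	for i, b := range code {
		parts[i] = fmt.Sprintf("BYTE $0x%02x", b)
	}
	return strings.Join(parts, "; ")
}

func main() {
	// Encoding of MOVQ (%EAX), %MM0 as used in the listing above.
	fmt.Println(byteDirectives([]byte{0x0f, 0x6f, 0x00}))
	// BYTE $0x0f; BYTE $0x6f; BYTE $0x00
}
```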