mirror of https://github.com/golang/go.git
internal/pkgbits, cmd/compile/internal/noder: document string section
To understand this change, we begin with a short description of the UIR
file format.
Every file is a header followed by a series of sections. Each section
has a kind, which determines the type of elements it contains. An
element is just a collection of one or more primitives, as defined by
package pkgbits.
Strings have their own section. Elements in the string section contain
only string primitives. To use a string, elements in other sections
encode a reference to the string section.
To illustrate, consider a simple file which exports nothing at all.
package p
In the meta section, there is an element representing a package stub.
In that package stub, a string ("p") represents both the path and name
of the package. Again, these are encoded as references.
To manage references, every element begins with a reference table.
Instead of writing the bytes for "p" directly, the package stub encodes
an index in this reference table. At that index, a pair of numbers is
stored, indicating:
1. which section
2. which element index within the section
Effectively, elements always use *2* layers of indirection; first to the
reference table, then to the bytes themselves.
With some minor hand-waving, an encoding for the above package is given
below, with (S)ections, (E)lements and (P)rimitives denoted.
+ Header
| + Section Ends // each section has 1 element
| | + 1 // String is elements [0, 1)
| | + 2 // Meta is elements [1, 2)
| + Element Ends
| | + 1 // "p" is bytes [0, 1)
| | + 6 // stub is bytes [1, 6)
+ Payload
| + (S) String
| | + (E) String
| | | + (P) String { byte } 0x70 // "p"
| + (S) Meta
| | + (E) Package Stub
| | | + Reference Table
| | | | + (P) Entry Count uvarint 1 // there is a single entry
| | | | + (P) 0th Section uvarint 0 // to String, 0th section
| | | | + (P) 0th Index uvarint 0 // to 0th element in String
| | | + Internals
| | | | + (P) Path uvarint 0 // 0th entry in table
| | | | + (P) Name uvarint 0 // 0th entry in table
Note that string elements do not have reference tables like other
elements. They behave more like a primitive.
As this is a bit complicated and getting into details of the UIR file
format, we omit some details in the documentation here. The structure
will become clearer as we continue documenting.
Change-Id: I12a5ce9a34251c5358a20f2f2c4d0f9bd497f4d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/671997
Reviewed-by: Robert Griesemer <gri@google.com>
Auto-Submit: Mark Freeman <mark@golang.org>
TryBot-Bypass: Mark Freeman <mark@golang.org>
This commit is contained in:
parent
adcad7bea9
commit
a24f4db2a2
|
|
@ -20,7 +20,7 @@ The payload is a series of sections. Each section has a kind which determines
|
||||||
its index in the series.
|
its index in the series.
|
||||||
|
|
||||||
SectionKind = Uint64 .
|
SectionKind = Uint64 .
|
||||||
Payload = SectionString // TODO(markfreeman) Define.
|
Payload = SectionString
|
||||||
SectionMeta
|
SectionMeta
|
||||||
SectionPosBase // TODO(markfreeman) Define.
|
SectionPosBase // TODO(markfreeman) Define.
|
||||||
SectionPkg // TODO(markfreeman) Define.
|
SectionPkg // TODO(markfreeman) Define.
|
||||||
|
|
@ -40,6 +40,12 @@ accessed using an index relative to the start of the section.
|
||||||
// TODO(markfreeman): Rename to SectionIndex.
|
// TODO(markfreeman): Rename to SectionIndex.
|
||||||
RelIndex = Uint64 .
|
RelIndex = Uint64 .
|
||||||
|
|
||||||
|
## String Section
|
||||||
|
String values are stored as elements in the string section. Elements outside
|
||||||
|
the string section access string values by reference.
|
||||||
|
|
||||||
|
SectionString = { String } .
|
||||||
|
|
||||||
## Meta Section
|
## Meta Section
|
||||||
The meta section provides fundamental information for a package. It contains
|
The meta section provides fundamental information for a package. It contains
|
||||||
exactly two elements — a public root and a private root.
|
exactly two elements — a public root and a private root.
|
||||||
|
|
|
||||||
|
|
@ -16,51 +16,62 @@ zvarint = (* a zig-zag encoded signed variable-width integer *) .
|
||||||
uvarint = (* an unsigned variable-width integer *) .
|
uvarint = (* an unsigned variable-width integer *) .
|
||||||
|
|
||||||
# Strings
|
# Strings
|
||||||
Strings are not encoded directly. Rather, they are deduplicated during encoding
|
A string is a series of bytes.
|
||||||
and referenced where needed.
|
|
||||||
|
|
||||||
String = [ Sync ] StringRef .
|
// TODO(markfreeman): Does this need a marker?
|
||||||
StringRef = [ Sync ] Uint64 . // TODO(markfreeman): Document.
|
String = { byte } .
|
||||||
|
|
||||||
StringSlice = Uint64 // the number of strings in the slice
|
Strings are typically not encoded directly. Rather, they are deduplicated
|
||||||
{ String }
|
during encoding and referenced where needed; this process is called interning.
|
||||||
.
|
|
||||||
|
|
||||||
// TODO(markfreeman) It is awkward to discuss references (and by extension
|
StringRef = [ Sync ] Ref[String] .
|
||||||
// strings and constants). We cannot explain how they resolve without mention
|
|
||||||
// of foreign concepts. Ideally, references would be defined in familar terms —
|
Note that StringRef is *not* equivalent to Ref[String] due to the extra marker.
|
||||||
// perhaps using an index on the byte array.
|
|
||||||
|
# References
|
||||||
|
References specify the location of a value. While the representation here is
|
||||||
|
fixed, the interpretation of a reference is left to other packages.
|
||||||
|
|
||||||
|
Ref[T] = [ Sync ] Uint64 . // points to a value of type T
|
||||||
|
|
||||||
|
# Slices
|
||||||
|
Slices are a convenience for encoding a series of values of the same type.
|
||||||
|
|
||||||
|
// TODO(markfreeman): Does this need a marker?
|
||||||
|
Slice[T] = Uint64 // the number of values in the slice
|
||||||
|
{ T } // the values
|
||||||
|
.
|
||||||
|
|
||||||
# Constants
|
# Constants
|
||||||
Constants appear as defined via the package constant.
|
Constants appear as defined via the package constant.
|
||||||
|
|
||||||
Constant = [ Sync ]
|
Constant = [ Sync ]
|
||||||
Bool // whether the constant is a complex number
|
Bool // whether the constant is a complex number
|
||||||
Scalar // the real part
|
Scalar // the real part
|
||||||
[ Scalar ] // if complex, the imaginary part
|
[ Scalar ] // if complex, the imaginary part
|
||||||
.
|
.
|
||||||
|
|
||||||
A scalar represents a value using one of several potential formats. The exact
|
A scalar represents a value using one of several potential formats. The exact
|
||||||
format and interpretation is distinguished by a code preceding the value.
|
format and interpretation is distinguished by a code preceding the value.
|
||||||
|
|
||||||
Scalar = [ Sync ]
|
Scalar = [ Sync ]
|
||||||
Uint64 // the code
|
Uint64 // the code indicating the type of Val
|
||||||
Val
|
Val
|
||||||
.
|
.
|
||||||
|
|
||||||
Val = Bool
|
Val = Bool
|
||||||
| Int64
|
| Int64
|
||||||
| String
|
| StringRef
|
||||||
| Term // big integer
|
| Term // big integer
|
||||||
| Term Term // big ratio, numerator / denominator
|
| Term Term // big ratio, numerator / denominator
|
||||||
| BigBytes // big float, precision 512
|
| BigBytes // big float, precision 512
|
||||||
.
|
.
|
||||||
|
|
||||||
Term = BigBytes
|
Term = BigBytes
|
||||||
Bool // whether the term is negative
|
Bool // whether the term is negative
|
||||||
.
|
.
|
||||||
|
|
||||||
BigBytes = String . // bytes of a big value
|
BigBytes = StringRef . // bytes of a big value
|
||||||
|
|
||||||
# Markers
|
# Markers
|
||||||
Markers provide a mechanism for asserting that encoders and decoders are
|
Markers provide a mechanism for asserting that encoders and decoders are
|
||||||
|
|
|
||||||
Loading…
Reference in New Issue