monomorphization chapter

Mark Mansi 2020-03-07 15:47:35 -06:00 committed by Who? Me?!
parent e19762b57c
commit 44cba6e075
2 changed files with 80 additions and 3 deletions


@ -52,7 +52,9 @@ newtype | a "newtype" is a wrapper around some other type (e.g.
NLL | [non-lexical lifetimes](../borrow_check/region_inference.html), an extension to Rust's borrowing system to make it be based on the control-flow graph.
node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
obligation | something that must be proven by the trait system ([see more](../traits/resolution.html))
placeholder | **NOTE: skolemization is deprecated by placeholder** a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on placeholder and universes](../borrow_check/region_inference/placeholders_and_universes.md) for more details.
point | used in the NLL analysis to refer to some particular location in the MIR; typically used to refer to a node in the control-flow graph.
polymorphize | An optimization that avoids unnecessary monomorphization ([see more](../backend/monomorph.md#polymorphization))
projection | a general term for a "relative path", e.g. `x.f` is a "field projection", and `T::Item` is an ["associated type projection"](../traits/goals-and-clauses.html#trait-ref)
promoted constants | constants extracted from a function and lifted to static scope; see [this section](../mir/index.html#promoted) for more details.
provider | the function that executes a query ([see more](../query.html))
@ -63,7 +65,6 @@ rib | a data structure in the name resolver that keeps trac
sess | the compiler session, which stores global data used throughout compilation
side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
sigil | like a keyword but composed entirely of non-alphanumeric tokens. For example, `&` is a sigil for references.
placeholder | **NOTE: skolemization is deprecated by placeholder** a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on placeholder and universes](../borrow_check/region_inference/placeholders_and_universes.md) for more details.
soundness | soundness is a technical term in type theory. Roughly, if a type system is sound, then if a program type-checks, it is type-safe; i.e. I can never (in safe Rust) force a value into a variable of the wrong type. (see "completeness").
span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
substs | the substitutions for a given generic type or item (e.g. the `i32`, `u32` in `HashMap<i32, u32>`)


@ -1,8 +1,84 @@
# Monomorphization
As you probably know, Rust has a very expressive type system with extensive
support for generic types. But of course, assembly is not generic, so we need
to figure out the concrete types of all the generics before the code can
execute.
Different languages handle this problem differently. For example, in some
languages, such as Java, we may not know the most precise type of a value until
runtime. In the case of Java, this is ok because (almost) all variables are
reference values anyway (i.e. pointers to a heap-allocated object). This
flexibility comes at the cost of performance, since all accesses to an object
must dereference a pointer.
Rust takes a different approach: it _monomorphizes_ all generic types. This
means that the compiler stamps out a different copy of the code of a generic
function for each concrete type needed. For example, if I use a `Vec<u64>` and
a `Vec<String>` in my code, then the generated binary will have two copies of
the generated code for `Vec`: one for `Vec<u64>` and another for `Vec<String>`.
The result is fast programs, but it comes at the cost of compile time (creating
all those copies can take a while) and binary size (all those copies might take
a lot of space).
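
As a rough illustration of the paragraph above, here is a minimal sketch. The
function `elem_size` is made up for this example and does not come from the
compiler or the standard library. Each distinct type argument it is called with
is a separate instantiation, and conceptually the compiler generates code for
each one as if it had been written by hand for that type.

```rust
use std::mem::size_of;

/// Hypothetical generic function, used only for illustration.
/// Every distinct `T` it is instantiated with is a separate concrete function.
fn elem_size<T>() -> usize {
    size_of::<T>()
}

fn main() {
    // Three instantiations are needed here:
    // `elem_size::<u8>`, `elem_size::<u64>` and `elem_size::<[u16; 4]>`.
    println!("{}", elem_size::<u8>());
    println!("{}", elem_size::<u64>());
    println!("{}", elem_size::<[u16; 4]>());
}
```

Because each copy knows its concrete `T`, the size can be resolved at compile
time rather than looked up at runtime.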
Monomorphization is the first step in the backend of the Rust compiler.
## Collection
First, we need to figure out what concrete types we need for all the generic
things in our program. This is called _collection_, and the code that does this
is called the _monomorphization collector_.
Take this example:
```rust
// A generic function; its definition is assumed here so the example compiles.
fn peach<T>() {}

fn banana() {
    peach::<u64>();
}

fn main() {
    banana();
}
```
The monomorphization collector will give you a list of `[main, banana,
peach::<u64>]`. These are the functions that will have machine code generated
for them. The collector will also add things like statics to that list.
See [the collector rustdocs][collect] for more info.
[collect]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/monomorphize/collector/index.html
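
To make the idea of collection more concrete, here is a toy sketch of a
worklist traversal over a "uses" graph. It is not the real collector: the
`Item` alias, the `collect` function and the hand-built `uses` map are all
hypothetical, and the actual implementation works on MIR and is considerably
more involved. The sketch only shows the general shape: start from the roots,
follow every use, and record each concrete item once.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Toy stand-in for a concrete item: just its printed name (an assumption
/// made for this sketch, not how rustc represents items).
type Item = &'static str;

/// Walk the `uses` graph from `roots` and return every reachable item once,
/// in discovery order.
fn collect(roots: &[Item], uses: &HashMap<Item, Vec<Item>>) -> Vec<Item> {
    let mut seen: HashSet<Item> = HashSet::new();
    let mut queue: VecDeque<Item> = VecDeque::new();
    let mut out = Vec::new();

    for &root in roots {
        if seen.insert(root) {
            queue.push_back(root);
        }
    }
    while let Some(item) = queue.pop_front() {
        out.push(item);
        if let Some(used) = uses.get(item) {
            for &next in used {
                // `insert` returns false for items we have already queued.
                if seen.insert(next) {
                    queue.push_back(next);
                }
            }
        }
    }
    out
}

/// Hand-written graph mirroring the `main` / `banana` / `peach` example.
fn example() -> Vec<Item> {
    let mut uses = HashMap::new();
    uses.insert("main", vec!["banana"]);
    uses.insert("banana", vec!["peach::<u64>"]);
    collect(&["main"], &uses)
}

fn main() {
    println!("{:?}", example());
}
```

With `main` as the only root, the traversal visits `main`, then `banana`, then
`peach::<u64>`, matching the list described in this section.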
## Polymorphization
As mentioned above, monomorphization produces fast code, but it comes at the
cost of compile time and binary size. [MIR
optimizations](../mir/optimizations.md) can help a bit with this. Another
optimization currently under development is called _polymorphization_.
The general idea is that often we can share some code between monomorphized
copies of code. More precisely, if a MIR block is not dependent on a type
parameter, it may not need to be monomorphized into many copies. Consider the
following example:
```rust
pub fn f() {
    g::<bool>();
    g::<usize>();
}

fn g<T>() -> usize {
    let n = 1;
    let closure = || n;
    closure()
}
```
In this case, we would currently collect `[f, g::<bool>, g::<usize>,
g::<bool>::{{closure}}, g::<usize>::{{closure}}]`. Notice, however, that the two
closures would be identical: they don't depend on the type parameter `T` of
function `g`. So we only need to emit one copy of the closure.
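
For comparison, a similar effect can be achieved by hand today by moving the
type-independent part of a generic function into a non-generic inner function.
The sketch below is only an analogy for what polymorphization aims to do
automatically, not a description of how the optimization is implemented. The
name `file_name_len` is hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Generic shim: instantiated once per `P`, but it only converts and forwards.
pub fn file_name_len<P: AsRef<Path>>(path: P) -> usize {
    // Non-generic body: it does not mention `P`, so only one copy is needed
    // no matter how many different `P`s the outer function is used with.
    fn inner(path: &Path) -> usize {
        path.file_name().map_or(0, |name| name.len())
    }
    inner(path.as_ref())
}

fn main() {
    // Three different `P`s, but `inner` exists only once.
    println!("{}", file_name_len("dir/a.txt"));
    println!("{}", file_name_len(String::from("dir/a.txt")));
    println!("{}", file_name_len(PathBuf::from("dir/a.txt")));
}
```

The outer function is still copied per type, but each copy is tiny, and the
bulk of the work lives in code that is emitted once.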
For more information, see [this thread on GitHub][polymorph].
[polymorph]: https://github.com/rust-lang/rust/issues/46477