diff --git a/src/appendix/glossary.md b/src/appendix/glossary.md
index f5274a8f..715928d2 100644
--- a/src/appendix/glossary.md
+++ b/src/appendix/glossary.md
@@ -52,7 +52,9 @@ newtype | a "newtype" is a wrapper around some other type (e.g.
 NLL | [non-lexical lifetimes](../borrow_check/region_inference.html), an extension to Rust's borrowing system to make it be based on the control-flow graph.
 node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
 obligation | something that must be proven by the trait system ([see more](../traits/resolution.html))
+placeholder | **NOTE: skolemization is deprecated by placeholder** a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on placeholder and universes](../borrow_check/region_inference/placeholders_and_universes.md) for more details.
 point | used in the NLL analysis to refer to some particular location in the MIR; typically used to refer to a node in the control-flow graph.
+polymorphize | an optimization that avoids unnecessary monomorphization ([see more](../backend/monomorph.md#polymorphization))
 projection | a general term for a "relative path", e.g. `x.f` is a "field projection", and `T::Item` is an ["associated type projection"](../traits/goals-and-clauses.html#trait-ref)
 promoted constants | constants extracted from a function and lifted to static scope; see [this section](../mir/index.html#promoted) for more details.
 provider | the function that executes a query ([see more](../query.html))
@@ -63,7 +65,6 @@ rib | a data structure in the name resolver that keeps trac
 sess | the compiler session, which stores global data used throughout compilation
 side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
 sigil | like a keyword but composed entirely of non-alphanumeric tokens. For example, `&` is a sigil for references.
-placeholder | **NOTE: skolemization is deprecated by placeholder** a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on placeholder and universes](../borrow_check/region_inference/placeholders_and_universes.md) for more details.
 soundness | soundness is a technical term in type theory. Roughly, if a type system is sound, then if a program type-checks, it is type-safe; i.e. I can never (in safe rust) force a value into a variable of the wrong type. (see "completeness").
 span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
 substs | the substitutions for a given generic type or item (e.g. the `i32`, `u32` in `HashMap<i32, u32>`)
diff --git a/src/backend/monomorph.md b/src/backend/monomorph.md
index eaf37f2e..e28eac8f 100644
--- a/src/backend/monomorph.md
+++ b/src/backend/monomorph.md
@@ -1,8 +1,86 @@
 # Monomorphization
 
-TODO
+As you probably know, Rust has a very expressive type system that has extensive
+support for generic types. But of course, assembly is not generic, so we need
+to figure out the concrete types of all the generics before the code can
+execute.
 
+Different languages handle this problem differently. For example, in some
+languages, such as Java, we may not know the most precise type of a value until
+runtime. In the case of Java, this is ok because (almost) all variables are
+reference values anyway (i.e. pointers to a heap-allocated object). This
+flexibility comes at the cost of performance, since all accesses to an object
+must dereference a pointer.
+
+Rust takes a different approach: it _monomorphizes_ all generic types. This
+means that the compiler stamps out a different copy of the code of a generic
+function for each concrete type needed. For example, if I use a `Vec<u64>` and
+a `Vec<String>` in my code, then the generated binary will have two copies of
+the generated code for `Vec`: one for `Vec<u64>` and another for `Vec<String>`.
+The result is fast programs, but it comes at the cost of compile time (creating
+all those copies can take a while) and binary size (all those copies might take
+a lot of space).
+
+Monomorphization is the first step in the backend of the Rust compiler.
+
+## Collection
+
+First, we need to figure out what concrete types we need for all the generic
+things in our program. This is called _collection_, and the code that does this
+is called the _monomorphization collector_.
+
+Take this example:
+
+```rust
+fn peach<T>() {} // assumed definition (not in the original) so the example compiles
+
+fn banana() {
+    peach::<u64>();
+}
+
+fn main() {
+    banana();
+}
+```
+
+The monomorphization collector will give you a list of `[main, banana,
+peach::<u64>]`. These are the functions that will have machine code generated
+for them. The collector will also add things like statics to that list.
+
+See [the collector rustdocs][collect] for more info.
+
+[collect]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/monomorphize/collector/index.html
 
 ## Polymorphization
 
-TODO
+As mentioned above, monomorphization produces fast code, but it comes at the
+cost of compile time and binary size. [MIR
+optimizations](../mir/optimizations.md) can help a bit with this. Another
+optimization currently under development is called _polymorphization_.
+
+The general idea is that monomorphized copies can often share some of their
+code. More precisely, if a MIR block does not depend on a type
+parameter, it may not need to be monomorphized into many copies. Consider the
+following example:
+
+```rust
+pub fn f() {
+    g::<bool>();
+    g::<usize>();
+}
+
+fn g<T>() -> usize {
+    let n = 1;
+    let closure = || n; // captures only `n`; does not mention `T`
+    closure()
+}
+```
+
+In this case, we would currently collect `[f, g::<bool>, g::<usize>,
+g::<bool>::{{closure}}, g::<usize>::{{closure}}]`, but notice that the two
+closures would be identical -- they don't depend on the type parameter `T` of
+function `g`. So we only need to emit one copy of the closure.
+
+For more information, see [this thread on GitHub][polymorph].
+
+[polymorph]: https://github.com/rust-lang/rust/issues/46477
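
Review note: the contrast the new chapter draws can be seen from ordinary user code. The sketch below is only an illustration under stated assumptions and is not part of the patch. `slot_size` is a hypothetical helper whose result depends on `T`, so each instantiation really is different code. `g` has the same shape as the function in the polymorphization example, so its result is the same for every `T`. The program only shows observable behavior and says nothing about how many copies a particular compiler version actually emits.

```rust
use std::mem::size_of;

// Hypothetical helper (not from the patch): each concrete `T` used below
// gets its own monomorphized copy, specialized to that type's size.
fn slot_size<T>() -> usize {
    size_of::<T>()
}

// Same shape as `g` in the patch: the closure captures only `n`, so its
// body is the same no matter which `T` the enclosing function was
// instantiated with.
fn g<T>() -> usize {
    let n = 1;
    let closure = || n;
    closure()
}

fn main() {
    // Two instantiations of one generic function, with different results.
    println!("u8 slot: {} byte(s)", slot_size::<u8>());
    println!("u64 slot: {} byte(s)", slot_size::<u64>());

    // Two instantiations of `g` that behave identically.
    println!("g::<bool>() = {}, g::<usize>() = {}", g::<bool>(), g::<usize>());
}
```

Functions like `slot_size` are the case where separate copies are unavoidable, while functions like `g` and its closure are the candidates for sharing that the Polymorphization section describes.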