apply mark-i-m's suggestions

This commit is contained in:
Niko Matsakis 2018-02-25 20:55:56 -05:00 committed by Who? Me?!
parent 644dccfa34
commit 75b2591ab3
9 changed files with 209 additions and 202 deletions

View File

@ -24,7 +24,7 @@
- [Type checking](./type-checking.md)
- [The MIR (Mid-level IR)](./mir.md)
- [MIR construction](./mir-construction.md)
- [MIR visitor](./mir-visitor.md)
- [MIR visitor and traversal](./mir-visitor.md)
- [MIR passes: getting the MIR for a function](./mir-passes.md)
- [MIR borrowck](./mir-borrowck.md)
- [MIR-based region checking (NLL)](./mir-regionck.md)

View File

@ -17,9 +17,9 @@ A control-flow graph is structured as a set of **basic blocks**
connected by edges. The key idea of a basic block is that it is a set
of statements that execute "together" -- that is, whenever you branch
to a basic block, you start at the first statement and then execute
all the remainder. Only at the end of the is there the possibility of
branching to more than one place (in MIR, we call that final statement
the **terminator**):
all the remainder. Only at the end of the block is there the
possibility of branching to more than one place (in MIR, we call that
final statement the **terminator**):
```
bb0: {
@ -88,7 +88,8 @@ cycle.
## What is co- and contra-variance?
*to be written*
Check out the subtyping chapter from the
[Rust Nomicon](https://doc.rust-lang.org/nomicon/subtyping.html).
<a name=free-vs-bound>
@ -97,18 +98,17 @@ cycle.
Let's describe the concepts of free vs bound in terms of program
variables, since that's the thing we're most familiar with.
- Consider this expression: `a + b`. In this expression, `a` and `b`
refer to local variables that are defined *outside* of the
expression. We say that those variables **appear free** in the
expression. To see why this term makes sense, consider the next
example.
- In contrast, consider this expression, which creates a closure: `|a,
- Consider this expression, which creates a closure: `|a,
b| a + b`. Here, the `a` and `b` in `a + b` refer to the arguments
that the closure will be given when it is called. We say that the
`a` and `b` there are **bound** to the closure, and that the closure
signature `|a, b|` is a **binder** for the names `a` and `b`
(because any references to `a` or `b` within refer to the variables
that it introduces).
- Consider this expression: `a + b`. In this expression, `a` and `b`
refer to local variables that are defined *outside* of the
expression. We say that those variables **appear free** in the
expression (i.e., they are **free**, not **bound** (tied up)).
So there you have it: a variable "appears free" in some
expression/statement/whatever if it refers to something defined

View File

@ -6,11 +6,16 @@ The compiler uses a number of...idiosyncratic abbreviations and things. This glo
Term | Meaning
------------------------|--------
AST | the abstract syntax tree produced by the syntax crate; reflects user syntax very closely.
binder | a "binder" is a place where a variable or type is declared; for example, the `<T>` is a binder for the generic type parameter `T` in `fn foo<T>(..)`, and `|a| ...` is a binder for the parameter `a`. See [the background chapter for more](./background.html#free-vs-bound)
bound variable | a "bound variable" is one that is declared within an expression/term. For example, the variable `a` is bound within the closure expression `|a| a * 2`. See [the background chapter for more](./background.html#free-vs-bound)
codegen unit | when we produce LLVM IR, we group the Rust code into a number of codegen units. Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use.
completeness | completeness is a technical term in type theory. Completeness means that every type-safe program also type-checks. Having both soundness and completeness is very hard, and usually soundness is more important. (see "soundness").
control-flow graph | a representation of the control-flow of a program; see [the background chapter for more](./background.html#cfg)
cx | we tend to use "cx" as an abbreviation for context. See also `tcx`, `infcx`, etc.
DAG | a directed acyclic graph is used during compilation to keep track of dependencies between queries. ([see more](incremental-compilation.html))
data-flow analysis | a static analysis that figures out what properties are true at each point in the control-flow of a program; see [the background chapter for more](./background.html#dataflow)
DefId | an index identifying a definition (see `librustc/hir/def_id.rs`). Uniquely identifies a `DefPath`.
free variable | a "free variable" is one that is not bound within an expression or term; see [the background chapter for more](./background.html#free-vs-bound)
'gcx | the lifetime of the global arena ([see more](ty.html))
generics | the set of generic type parameters defined on a type or item
HIR | the High-level IR, created by lowering and desugaring the AST ([see more](hir.html))
@ -18,7 +23,7 @@ HirId | identifies a particular node in the HIR by combining
HIR Map | The HIR map, accessible via tcx.hir, allows you to quickly navigate the HIR and convert between various forms of identifiers.
ICE | internal compiler error. When the compiler crashes.
ICH | incremental compilation hash. ICHs are used as fingerprints for things such as HIR and crate metadata, to check if changes have been made. This is useful in incremental compilation to see if part of a crate has changed and should be recompiled.
inference variable | when doing type or region inference, an "inference variable" is a kind of special type/region that represents value you are trying to find. Think of `X` in algebra.
inference variable | when doing type or region inference, an "inference variable" is a kind of special type/region that represents what you are trying to infer. Think of X in algebra. For example, if we are trying to infer the type of a variable in a program, we create an inference variable to represent that unknown type.
infcx | the inference context (see `librustc/infer`)
IR | Intermediate Representation. A general term in compilers. During compilation, the code is transformed from raw source (ASCII text) to various IRs. In Rust, these are primarily HIR, MIR, and LLVM IR. Each IR is well-suited for some set of computations. For example, MIR is well-suited for the borrow checker, and LLVM IR is well-suited for codegen because LLVM accepts it.
local crate | the crate currently being compiled.
@ -27,14 +32,18 @@ LTO | Link-Time Optimizations. A set of optimizations offer
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans ([see more](./mir.html))
miri | an interpreter for MIR used for constant evaluation ([see more](./miri.html))
newtype | a "newtype" is a wrapper around some other type (e.g., `struct Foo(T)` is a "newtype" for `T`). This is commonly used in Rust to give a stronger type for indices.
NLL | [non-lexical lifetimes](./mir-regionck.html), an extension to Rust's borrowing system to make it based on the control-flow graph.
node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
obligation | something that must be proven by the trait system ([see more](trait-resolution.html))
promoted constants | constants extracted from a function and lifted to static scope; see [this section](./mir.html#promoted) for more details.
provider | the function that executes a query ([see more](query.html))
quantified | in math or logic, existential and universal quantification are used to ask questions like "is there any type T for which this is true?" or "is this true for all types T?"; see [the background chapter for more](./background.html#quantified)
query | perhaps some sub-computation during compilation ([see more](query.html))
region | another term for "lifetime" often used in the literature and in the borrow checker.
sess | the compiler session, which stores global data used throughout compilation
side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
sigil | like a keyword but composed entirely of non-alphanumeric tokens. For example, `&` is a sigil for references.
skolemization | a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./mir-regionck.html#skol) for more details.
soundness | soundness is a technical term in type theory. Roughly, if a type system is sound, then if a program type-checks, it is type-safe; i.e. I can never (in safe rust) force a value into a variable of the wrong type. (see "completeness").
span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
substs | the substitutions for a given generic type or item (e.g. the `i32`, `u32` in `HashMap<i32, u32>`)
@ -45,6 +54,7 @@ token | the smallest unit of parsing. Tokens are produced aft
trans | the code to translate MIR into LLVM IR.
trait reference | a trait and values for its type parameters ([see more](ty.html)).
ty | the internal representation of a type ([see more](ty.html)).
variance | variance determines how changes to a generic type/lifetime parameter affect subtyping; for example, if `T` is a subtype of `U`, then `Vec<T>` is a subtype of `Vec<U>` because `Vec` is *covariant* in its generic parameter. See [the background chapter for more](./background.html#variance).
[LLVM]: https://llvm.org/
[lto]: https://llvm.org/docs/LinkTimeOptimization.html
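
To make the "newtype" entry above concrete, here is a minimal sketch of the index-newtype pattern it describes. The names `BasicBlockIdx` and `StatementIdx` are made up for illustration; they are not actual compiler types.

```rust
// A sketch of the index-newtype pattern: wrapping a raw usize in its
// own struct so that different kinds of indices cannot be mixed up.
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
struct BasicBlockIdx(usize);

#[derive(Copy, Clone, PartialEq, Eq, Debug)]
struct StatementIdx(usize);

fn statement(blocks: &[Vec<String>], bb: BasicBlockIdx, stmt: StatementIdx) -> &str {
    // The type system now rejects passing a StatementIdx where a
    // BasicBlockIdx is expected, even though both wrap a usize.
    &blocks[bb.0][stmt.0]
}

fn main() {
    let blocks = vec![vec!["statement0".to_string(), "terminator".to_string()]];
    println!("{}", statement(&blocks, BasicBlockIdx(0), StatementIdx(1)));
}
```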

View File

@ -1,122 +0,0 @@
# MIR Background topics
This section covers a numbers of common compiler terms that arise when
talking about MIR and optimizations. We try to give the general
definition while providing some Rust-specific context.
<a name=cfg>
## What is a control-flow graph?
A control-flow graph is a common term from compilers. If you've ever
used a flow-chart, then the concept of a control-flow graph will be
pretty familiar to you. It's a representation of your program that
exposes the underlying control flow in a very clear way.
A control-flow graph is structured as a set of **basic blocks**
connected by edges. The key idea of a basic block is that it is a set
of statements that execute "together" -- that is, whenever you branch
to a basic block, you start at the first statement and then execute
all the remainder. Only at the end of the is there the possibility of
branching to more than one place (in MIR, we call that final statement
the **terminator**):
```
bb0: {
statement0;
statement1;
statement2;
...
terminator;
}
```
Many expressions that you are used to in Rust compile down to multiple
basic blocks. For example, consider an if statement:
```rust
a = 1;
if some_variable {
b = 1;
} else {
c = 1;
}
d = 1;
```
This would compile into four basic blocks:
```
BB0: {
a = 1;
if some_variable { goto BB1 } else { goto BB2 }
}
BB1: {
b = 1;
goto BB3;
}
BB2: {
c = 1;
goto BB3;
}
BB3: {
d = 1;
...;
}
```
When using a control-flow graph, a loop simply appears as a cycle in
the graph, and the `break` keyword translates into a path out of that
cycle.
<a name=dataflow>
## What is a dataflow analysis?
*to be written*
<a name=quantified>
## What is "universally quantified"? What about "existentially quantified"?
*to be written*
<a name=variance>
## What is co- and contra-variance?
*to be written*
<a name=free-vs-bound>
## What is a "free region" or a "free variable"? What about "bound region"?
Let's describe the concepts of free vs bound in terms of program
variables, since that's the thing we're most familiar with.
- Consider this expression: `a + b`. In this expression, `a` and `b`
refer to local variables that are defined *outside* of the
expression. We say that those variables **appear free** in the
expression. To see why this term makes sense, consider the next
example.
- In contrast, consider this expression, which creates a closure: `|a,
b| a + b`. Here, the `a` and `b` in `a + b` refer to the arguments
that the closure will be given when it is called. We say that the
`a` and `b` there are **bound** to the closure, and that the closure
signature `|a, b|` is a **binder** for the names `a` and `b`
(because any references to `a` or `b` within refer to the variables
that it introduces).
So there you have it: a variable "appears free" in some
expression/statement/whatever if it refers to something defined
outside of that expressions/statement/whatever. Equivalently, we can
then refer to the "free variables" of an expression -- which is just
the set of variables that "appear free".
So what does this have to do with regions? Well, we can apply the
analogous concept to type and regions. For example, in the type `&'a
u32`, `'a` appears free. But in the type `for<'a> fn(&'a u32)`, it
does not.

View File

@ -37,9 +37,9 @@ in several modes, but this text will describe only the mode when NLL is enabled
The overall flow of the borrow checker is as follows:
- We first create a **local copy** C of the MIR. We will be modifying
this copy in place to modify the types and things to include
references to the new regions that we are computing.
- We first create a **local copy** C of the MIR. In the coming steps,
we will modify this copy in place, updating its types and other data
to refer to the new regions that we are computing.
- We then invoke `nll::replace_regions_in_mir` to modify this copy C.
Among other things, this function will replace all of the regions in
the MIR with fresh [inference variables](glossary.html).
@ -51,6 +51,6 @@ The overall flow of the borrow checker is as follows:
- (More details can be found in [the NLL section](./mir-regionck.html).)
- Finally, the borrow checker itself runs, taking as input (a) the
results of move analysis and (b) the regions computed by the region
checker. This allows is to figure out which loans are still in scope
checker. This allows us to figure out which loans are still in scope
at any particular point.
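
A schematic sketch of that flow, using simplified stand-in types and functions rather than the real rustc APIs (only `replace_regions_in_mir` echoes a name from the list above):

```rust
// Every type and function here is a stand-in for illustration only.
#[derive(Clone)]
struct Mir;          // the MIR of one function
struct MoveData;     // results of move analysis
struct RegionValues; // regions computed by region inference (NLL)

fn replace_regions_in_mir(_mir: &mut Mir) {
    // replace every region in the MIR with a fresh inference variable
}

fn do_move_analysis(_mir: &Mir) -> MoveData {
    MoveData
}

fn infer_regions(_mir: &Mir) -> RegionValues {
    RegionValues
}

fn check_loans(_mir: &Mir, _moves: &MoveData, _regions: &RegionValues) {
    // figure out which loans are still in scope at each point and
    // report any violations
}

fn borrowck(input_mir: &Mir) {
    // 1. Local copy C of the MIR, which we are free to modify in place.
    let mut c = input_mir.clone();
    // 2. Replace the regions in C with fresh inference variables.
    replace_regions_in_mir(&mut c);
    // 3. Run move analysis and region inference on the modified copy.
    let moves = do_move_analysis(&c);
    let regions = infer_regions(&c);
    // 4. The borrow checker proper combines (a) move analysis results
    //    and (b) the computed regions.
    check_loans(&c, &moves, &regions);
}

fn main() {
    borrowck(&Mir);
}
```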

View File

@ -4,9 +4,9 @@ If you would like to get the MIR for a function (or constant, etc),
you can use the `optimized_mir(def_id)` query. This will give you back
the final, optimized MIR. For foreign def-ids, we simply read the MIR
from the other crate's metadata. But for local def-ids, the query will
construct the MIR and then iteratively optimize it by putting it
through various pipeline stages. This section describes those pipeline
stages and how you can extend them.
construct the MIR and then iteratively optimize it by applying a
series of passes. This section describes how those passes work and how
you can extend them.
To produce the `optimized_mir(D)` for a given def-id `D`, the MIR
passes through several suites of optimizations, each represented by a
@ -97,18 +97,19 @@ that appeared within the `main` function.)
### Implementing and registering a pass
A `MirPass` is some bit of code that processes the MIR, typically --
but not always -- transforming it along the way in some way. For
example, it might perform an optimization. The `MirPass` trait itself
is found in in [the `rustc_mir::transform` module][mirtransform], and
it basically consists of one method, `run_pass`, that simply gets an
but not always -- transforming it along the way somehow. For example,
it might perform an optimization. The `MirPass` trait itself is found
in [the `rustc_mir::transform` module][mirtransform], and it
basically consists of one method, `run_pass`, that simply gets an
`&mut Mir` (along with the tcx and some information about where it
came from).
came from). The MIR is therefore modified in place (which helps to
keep things efficient).
A good example of a basic MIR pass is [`NoLandingPads`], which walks the
MIR and removes all edges that are due to unwinding -- this is used
with when configured with `panic=abort`, which never unwinds. As you can see
from its source, a MIR pass is defined by first defining a dummy type, a struct
with no fields, something like:
A good example of a basic MIR pass is [`NoLandingPads`], which walks
the MIR and removes all edges that are due to unwinding -- this is
used when configured with `panic=abort`, which never unwinds. As you
can see from its source, a MIR pass is defined by first defining a
dummy type, a struct with no fields, something like:
```rust
struct MyPass;
@ -120,8 +121,9 @@ this pass into the appropriate list of passes found in a query like
should go into the `optimized_mir` list.)
If you are writing a pass, there's a good chance that you are going to
want to use a [MIR visitor] too -- those are a handy visitor that
walks the MIR for you and lets you make small edits here and there.
want to use a [MIR visitor]. MIR visitors are a handy way to walk all
the parts of the MIR, either to search for something or to make small
edits.
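
To make the shape of a pass concrete, here is a minimal, self-contained sketch. `Mir`, `TyCtxt`, and `MirSource` are simplified stand-ins, and the real `run_pass` signature in `rustc_mir::transform` has more detail (lifetimes, the actual context types), so treat this purely as the shape:

```rust
struct Mir;        // the MIR being transformed
struct TyCtxt;     // the type context (tcx)
struct MirSource;  // information about where the MIR came from

trait MirPass {
    // One method: it gets a mutable reference to the MIR and is
    // expected to modify it in place.
    fn run_pass(&self, tcx: &TyCtxt, source: MirSource, mir: &mut Mir);
}

// The dummy, fieldless struct that identifies the pass...
struct MyPass;

// ...and the impl that does the actual work.
impl MirPass for MyPass {
    fn run_pass(&self, _tcx: &TyCtxt, _source: MirSource, _mir: &mut Mir) {
        // e.g. walk the MIR and rewrite statements, remove unwind
        // edges, or perform some other optimization
    }
}

fn main() {
    let mut mir = Mir;
    MyPass.run_pass(&TyCtxt, MirSource, &mut mir);
}
```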
### Stealing
@ -149,7 +151,9 @@ be **stolen** by the `mir_validated()` suite. If nothing was done,
then `mir_const_qualif(D)` would succeed if it came before
`mir_validated(D)`, but fail otherwise. Therefore, `mir_validated(D)`
will **force** `mir_const_qualif` before it actually steals, thus
ensuring that the reads have already happened:
ensuring that the reads have already happened (remember that
[queries are memoized](./query.html), so executing a query twice
simply loads from a cache the second time):
```
mir_const(D) --read-by--> mir_const_qualif(D)

View File

@ -10,11 +10,11 @@ deprecated once they become the standard kind of lifetime.)
The MIR-based region analysis consists of two major functions:
- `replace_regions_in_mir`, invoked first, has two jobs:
- First, it analyzes the signature of the MIR and finds the set of
regions that appear in the MIR signature (e.g., `'a` in `fn
foo<'a>(&'a u32) { ... }`. These are called the "universal" or
"free" regions -- in particular, they are the regions that
[appear free][fvb] in the function body.
- First, it finds the set of regions that appear within the
signature of the function (e.g., `'a` in `fn foo<'a>(&'a u32) {
... }`). These are called the "universal" or "free" regions -- in
particular, they are the regions that [appear free][fvb] in the
function body.
- Second, it replaces all the regions from the function body with
fresh inference variables. This is because (presently) those
regions are the results of lexical region inference and hence are
@ -49,6 +49,8 @@ the role of `liveness_constraints` vs other `constraints`, plus
## Closures
*to be written*
<a name=mirtypeck>
## The MIR type-check
@ -131,15 +133,14 @@ replace them with
representatives, written like `!1`. We call these regions "skolemized
regions" -- they represent, basically, "some unknown region".
Once we've done that replacement, we have the following types:
Once we've done that replacement, we have the following relation:
fn(&'static u32) <: fn(&'!1 u32)
The key idea here is that this unknown region `'!1` is not related to
any other regions. So if we can prove that the subtyping relationship
is true for `'!1`, then it ought to be true for any region, which is
what we wanted. (This number `!1` is called a "universe", for reasons
we'll get into later.)
what we wanted.
So let's work through what happens next. To check if two functions are
subtypes, we check if their arguments have the desired relationship
@ -154,6 +155,118 @@ outlives `'static`. Now, this *might* be true -- after all, `'!1`
could be `'static` -- but we don't *know* that it's true. So this
should yield up an error (eventually).
### What is a universe
In the previous section, we introduced the idea of a skolemized
region, and we denoted it `!1`. We call this number `1` the **universe
index**. The idea of a "universe" is that it is a set of names that
are in scope within some type or at some point. Universes are formed
into a tree, where each child extends its parents with some new names.
So the **root universe** conceptually contains global names, such as
the lifetime `'static` or the type `i32`. In the compiler, we also
put generic type parameters into this root universe. So consider
this function `bar`:
```rust
struct Foo { }
fn bar<'a, T>(t: &'a T) {
...
}
```
Here, the root universe would consist of the lifetimes `'static` and
`'a`. In fact, although we're focused on lifetimes here, we can apply
the same concept to types, in which case the types `Foo` and `T` would
be in the root universe (along with other global types, like `i32`).
Basically, the root universe contains all the names that
[appear free](./background.html#free-vs-bound) in the body of `bar`.
Now let's extend `bar` a bit by adding a variable `x`:
```rust
fn bar<'a, T>(t: &'a T) {
let x: for<'b> fn(&'b u32) = ...;
}
```
Here, the name `'b` is not part of the root universe. Instead, when we
"enter" into this `for<'b>` (e.g., by skolemizing it), we will create
a child universe of the root, let's call it U1:
```
U0 (root universe)
└─ U1 (child universe)
```
The idea is that this child universe U1 extends the root universe U0
with a new name, which we are identifying by its universe number:
`!1`.
Now let's extend `bar` a bit by adding one more variable, `y`:
```rust
fn bar<'a, T>(t: &'a T) {
let x: for<'b> fn(&'b u32) = ...;
let y: for<'c> fn(&'c u32) = ...;
}
```
When we enter *this* type, we will again create a new universe, which
we'll call `U2`. Its parent will be the root universe, and U1 will be
its sibling:
```
U0 (root universe)
├─ U1 (child universe)
└─ U2 (child universe)
```
This implies that, while in U2, we can name things from U0 or U2, but
not U1.
**Giving existential variables a universe.** Now that we have this
notion of universes, we can use it to extend our type-checker and
things to prevent illegal names from leaking out. The idea is that we
give each inference (existential) variable -- whether it be a type or
a lifetime -- a universe. That variable's value can then only
reference names visible from that universe. So, for example, if a
lifetime variable is created in U0, then it cannot be assigned a value
of `!1` or `!2`, because those names are not visible from the universe
U0.
**Representing universes with just a counter.** You might be surprised
to see that the compiler doesn't keep track of a full tree of
universes. Instead, it just keeps a counter -- and, to determine if
one universe can see another one, it just checks if the index is
greater or equal. For example, U2 can see U0 because 2 >= 0. But U0 cannot see
U2, because 0 >= 2 is false.
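
Here is that counter-based check in miniature (this is just the idea from the paragraph above, not the compiler's actual data structure):

```rust
// A universe is just an index, and universe `a` can see universe `b`
// whenever `a >= b`.
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
struct Universe(u32);

impl Universe {
    const ROOT: Universe = Universe(0); // U0

    /// Can names from `other` be referenced from `self`?
    fn can_see(self, other: Universe) -> bool {
        self.0 >= other.0
    }
}

fn main() {
    let u0 = Universe::ROOT;
    let u2 = Universe(2);
    assert!(u2.can_see(u0));  // 2 >= 0, so U2 can see U0
    assert!(!u0.can_see(u2)); // 0 >= 2 is false, so U0 cannot see U2

    // An inference variable is tagged with the universe it was created
    // in; a variable created in U0 cannot be assigned a value that
    // names `!2`, because U0 cannot see U2.
    let variable_universe = u0;
    assert!(!variable_universe.can_see(u2));
    println!("ok");
}
```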
How can we get away with this? Doesn't this mean that we would allow
U2 to also see U1? The answer is that, yes, we would, **if that
question ever arose**. But because of the structure of our type
checker etc, there is no way for that to happen. In order for
something happening in the universe U1 to "communicate" with something
happening in U2, they would have to have a shared inference variable X
in common. And because everything in U1 is scoped to just U1 and its
children, that inference variable X would have to be in U0. And since
X is in U0, it cannot name anything from U1 (or U2). This is perhaps easiest
to see by using a kind of generic "logic" example:
```
exists<X> {
forall<Y> { ... /* Y is in U1 ... */ }
forall<Z> { ... /* Z is in U2 ... */ }
}
```
Here, the only way for the two foralls to interact would be through X,
but neither Y nor Z are in scope when X is declared, so its value
cannot reference either of them.
### Universes and skolemized region elements
But where does that error come from? The way it happens is like this.
@ -179,10 +292,11 @@ In the region inference engine, outlives constraints have the form:
V1: V2 @ P
where `V1` and `V2` are region indices, and hence map to some region
variable (which may be universally or existentially quantified). This
variable will have a universe, so let's call those universes `U(V1)`
and `U(V2)` respectively. (Actually, the only one we are going to care
about is `U(V1)`.)
variable (which may be universally or existentially quantified). The
`P` here is a "point" in the control-flow graph; it's not important
for this section. This variable will have a universe, so let's call
those universes `U(V1)` and `U(V2)` respectively. (Actually, the only
one we are going to care about is `U(V1)`.)
When we encounter this constraint, the ordinary procedure is to start
a DFS from `P`. We keep walking so long as the nodes we are walking
@ -190,24 +304,24 @@ are present in `value(V2)` and we add those nodes to `value(V1)`. If
we reach a return point, we add in any `end(X)` elements. That part
remains unchanged.
But then *after that* we want to iterate over the skolemized `skol(u)`
But then *after that* we want to iterate over the skolemized `skol(x)`
elements in V2 (each of those must be visible to `U(V2)`, but we
should be able to just assume that is true, we don't have to check
it). We have to ensure that `value(V1)` outlives each of those
skolemized elements.
Now there are two ways that could happen. First, if `U(V1)` can see
the universe `u` (i.e., `u <= U(V1)`), then we can just add `skol(u1)`
the universe `x` (i.e., `x <= U(V1)`), then we can just add `skol(x)`
to `value(V1)` and be done. But if not, then we have to approximate:
we may not know what set of elements `skol(u1)` represents, but we
should be able to compute some sort of **upper bound** for it --
something that it is smaller than. For now, we'll just use `'static`
for that (since it is bigger than everything) -- in the future, we can
sometimes be smarter here (and in fact we have code for doing this
already in other contexts). Moreover, since `'static` is in U0, we
know that all variables can see it -- so basically if we find a that
`value(V2)` contains `skol(u)` for some universe `u` that `V1` can't
see, then we force `V1` to `'static`.
we may not know what set of elements `skol(x)` represents, but we
should be able to compute some sort of **upper bound** B for it --
some region B that outlives `skol(x)`. For now, we'll just use
`'static` for that (since it outlives everything) -- in the future, we
can sometimes be smarter here (and in fact we have code for doing this
already in other contexts). Moreover, since `'static` is in the root
universe U0, we know that all variables can see it -- so basically if
we find that `value(V2)` contains `skol(x)` for some universe `x`
that `V1` can't see, then we force `V1` to `'static`.
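
Here is a small sketch of that rule, with toy types standing in for the region-inference data structures (the real `value(V)` sets also contain CFG points and `end(X)` elements, which are omitted here):

```rust
use std::collections::BTreeSet;

#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Element {
    Static,    // the upper bound 'static (lives in the root universe)
    Skol(u32), // skol(x): a skolemized region from universe x
}

#[derive(Default, Debug)]
struct Value {
    elements: BTreeSet<Element>,
}

// Propagate the skolemized elements of value(V2) into value(V1),
// where `u_v1` is the universe U(V1) (universes as a plain counter).
fn propagate_skols(v1: &mut Value, u_v1: u32, v2: &Value) {
    for &elem in &v2.elements {
        match elem {
            Element::Skol(x) if x <= u_v1 => {
                // U(V1) can see universe x: just add skol(x).
                v1.elements.insert(elem);
            }
            Element::Skol(_) => {
                // U(V1) cannot see x: approximate with the upper
                // bound 'static (i.e. force V1 to 'static).
                v1.elements.insert(Element::Static);
            }
            Element::Static => {
                // 'static is in the root universe, visible everywhere.
                v1.elements.insert(Element::Static);
            }
        }
    }
}

fn main() {
    let mut v1 = Value::default();
    let mut v2 = Value::default();
    v2.elements.insert(Element::Skol(1));

    // V1 lives in the root universe U0 and cannot see U1, so it gets
    // forced to 'static.
    propagate_skols(&mut v1, 0, &v2);
    println!("{:?}", v1);
}
```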
### Extending the "universal regions" check
@ -258,7 +372,7 @@ To process this, we would grow the value of V1 to include all of Vs:
Vs = { CFG; end('static) }
V1 = { CFG; end('static), skol(1) }
At that point, constraint propagation is done, because all the
At that point, constraint propagation is complete, because all the
outlives relationships are satisfied. Then we would go to the "check
universal regions" portion of the code, which would test that no
universal region grew too large.
@ -280,8 +394,9 @@ Here we would skolemize the supertype, as before, yielding:
<:
fn(&'!1 u32, &'!2 u32)
then we instantiate the variable on the left-hand side with an existential
in universe U2, yielding:
then we instantiate the variable on the left-hand side with an
existential in universe U2, yielding the following (`?n` is a notation
for an existential variable):
fn(&'?3 u32, &'?3 u32)
<:

View File

@ -43,3 +43,13 @@ terminators and removes their `unwind` successors.
[`NoLandingPads`]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/transform/no_landing_pads.rs
## Traversal
In addition to the visitor, [the `rustc::mir::traversal` module][t]
contains useful functions for walking the MIR CFG in
[different standard orders][traversal] (e.g. pre-order, reverse
post-order, and so forth).
[t]: https://github.com/rust-lang/rust/tree/master/src/librustc/mir/traversal.rs
[traversal]: https://en.wikipedia.org/wiki/Tree_traversal
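
If the traversal orders themselves are unfamiliar, here is a small, self-contained illustration of pre-order and reverse post-order over a toy CFG (this just shows the orders; it is not the `rustc::mir::traversal` implementation):

```rust
// The CFG is a toy adjacency list of basic-block indices; block 0 is
// the start block.
fn preorder(succ: &[Vec<usize>], start: usize) -> Vec<usize> {
    let mut seen = vec![false; succ.len()];
    let mut order = Vec::new();
    let mut stack = vec![start];
    while let Some(bb) = stack.pop() {
        if seen[bb] {
            continue;
        }
        seen[bb] = true;
        order.push(bb); // visit a block before its successors
        for &s in succ[bb].iter().rev() {
            stack.push(s);
        }
    }
    order
}

fn reverse_postorder(succ: &[Vec<usize>], start: usize) -> Vec<usize> {
    fn post(succ: &[Vec<usize>], bb: usize, seen: &mut [bool], out: &mut Vec<usize>) {
        if seen[bb] {
            return;
        }
        seen[bb] = true;
        for &s in &succ[bb] {
            post(succ, s, seen, out);
        }
        out.push(bb); // visit a block after its successors
    }
    let mut seen = vec![false; succ.len()];
    let mut out = Vec::new();
    post(succ, start, &mut seen, &mut out);
    out.reverse();
    out
}

fn main() {
    // The if/else example from the background chapter:
    // BB0 -> {BB1, BB2}, BB1 -> BB3, BB2 -> BB3.
    let succ = vec![vec![1, 2], vec![3], vec![3], vec![]];
    println!("pre-order:          {:?}", preorder(&succ, 0));
    println!("reverse post-order: {:?}", reverse_postorder(&succ, 0));
}
```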

View File

@ -1,10 +1,10 @@
# The MIR (Mid-level IR)
MIR is Rust's _Mid-level Intermediate Representation_. It is
constructed from HIR (described in an earlier chapter). MIR was
introduced in [RFC 1211]. It is a radically simplified form of Rust
that is used for certain flow-sensitive safety checks -- notably the
borrow checker! -- and also for optimization and code generation.
constructed from [HIR](./hir.html). MIR was introduced in
[RFC 1211]. It is a radically simplified form of Rust that is used for
certain flow-sensitive safety checks -- notably the borrow checker! --
and also for optimization and code generation.
If you'd like a very high-level introduction to MIR, as well as some
of the compiler concepts that it relies on (such as control-flow
@ -35,20 +35,20 @@ This section introduces the key concepts of MIR, summarized here:
- **Basic blocks**: units of the control-flow graph, consisting of:
- **statements:** actions with one successor
- **terminators:** actions with potentially multiple successors; always at the end of a block
- (if you're not familiar with the term basic block, see the [MIR background chapter][bg])
- (if you're not familiar with the term *basic block*, see the [background chapter][cfg])
- **Locals:** Memory locations allocated on the stack (conceptually, at
least), such as function arguments, local variables, and
temporaries. These are identified by an index, written with a
leading underscore, like `_1`. There is also a special "local"
(`_0`) allocated to store the return value.
- **Places:** expressions that identify a location in memory, like `_1` or `_1.f`.
- **Rvalues:** expressions that product a value. The "R" stands for
- **Rvalues:** expressions that produce a value. The "R" stands for
the fact that these are the "right-hand side" of an assignment.
- **Operands:** the arguments to an rvalue, which can either be a
constant (like `22`) or a place (like `_1`).
You can get a feeling for how MIR is structured by translating simple
programs into MIR and ready the pretty printed output. In fact, the
programs into MIR and reading the pretty printed output. In fact, the
playground makes this easy, since it supplies a MIR button that will
show you the MIR for your program. Try putting this program into play
(or [clicking on this link][sample-play]), and then clicking the "MIR"
@ -96,7 +96,9 @@ You can see that variables in MIR don't have names, they have indices,
like `_0` or `_1`. We also intermingle the user's variables (e.g.,
`_1`) with temporary values (e.g., `_2` or `_3`). You can tell the
difference because user-defined variables have a comment that gives
you their original name (`// "vec" in scope 1...`).
you their original name (`// "vec" in scope 1...`). The "scope" blocks
(e.g., `scope 1 { .. }`) describe the lexical structure of the source
program (which names were in scope when).
**Basic blocks.** Reading further, we see our first **basic block** (naturally it may look
slightly different when you view it, and I am ignoring some of the comments):
@ -223,27 +225,15 @@ but [you can read about those below](#promoted)).
- **Rvalues** are represented by the enum `Rvalue`.
- **Operands** are represented by the enum `Operand`.
## MIR Visitor
The main MIR data type is `rustc::mir::Mir`, defined in `mod.rs`.
There is also the MIR visitor (in `visit.rs`) which allows you to walk
the MIR and override what actions will be taken at various points (you
can visit in either shared or mutable mode; the latter allows changing
the MIR in place). Finally `traverse.rs` contains various traversal
routines for visiting the MIR CFG in [different standard orders][traversal]
(e.g. pre-order, reverse post-order, and so forth).
[traversal]: https://en.wikipedia.org/wiki/Tree_traversal
## Representing constants
TBD
*to be written*
<a name=promoted>
### Promoted constants
TBD
*to be written*
[mir]: https://github.com/rust-lang/rust/tree/master/src/librustc/mir