Merge pull request #190 from nikomatsakis/mir-borrow-check-1
start to document MIR borrow check
commit 93b5783fec

@@ -3,7 +3,7 @@ cache:
- cargo
before_install:
- shopt -s globstar
- MAX_LINE_LENGTH=80 bash ci/check_line_lengths.sh src/**/*.md
- MAX_LINE_LENGTH=100 bash ci/check_line_lengths.sh src/**/*.md
install:
- source ~/.cargo/env || true
- bash ci/install.sh

@@ -2,7 +2,7 @@
if [ "$1" == "--help" ]; then
echo 'Usage:'
echo ' MAX_LINE_LENGTH=80' "$0" 'src/**/*.md'
echo ' MAX_LINE_LENGTH=100' "$0" 'src/**/*.md'
exit 1
fi

@@ -53,9 +53,12 @@
- [MIR construction](./mir/construction.md)
- [MIR visitor and traversal](./mir/visitor.md)
- [MIR passes: getting the MIR for a function](./mir/passes.md)
- [MIR borrowck](./mir/borrowck.md)
- [MIR-based region checking (NLL)](./mir/regionck.md)
- [MIR optimizations](./mir/optimizations.md)
- [The borrow checker](./borrow_check.md)
- [Tracking moves and initialization](./borrow_check/moves_and_initialization.md)
- [Move paths](./borrow_check/moves_and_initialization/move_paths.md)
- [MIR type checker](./borrow_check/type_check.md)
- [Region inference](./borrow_check/region_inference.md)
- [Constant evaluation](./const-eval.md)
- [miri const evaluator](./miri.md)
- [Parameter Environments](./param_env.md)

@@ -40,7 +40,7 @@ MIR | the Mid-level IR that is created after type-checking
miri | an interpreter for MIR used for constant evaluation ([see more](./miri.html))
normalize | a general term for converting to a more canonical form, but in the case of rustc typically refers to [associated type normalization](./traits/associated-types.html#normalize)
newtype | a "newtype" is a wrapper around some other type (e.g., `struct Foo(T)` is a "newtype" for `T`). This is commonly used in Rust to give a stronger type for indices.
NLL | [non-lexical lifetimes](./mir/regionck.html), an extension to Rust's borrowing system to make it be based on the control-flow graph.
NLL | [non-lexical lifetimes](./borrow_check/region_inference.html), an extension to Rust's borrowing system to make it be based on the control-flow graph.
node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
obligation | something that must be proven by the trait system ([see more](traits/resolution.html))
projection | a general term for a "relative path", e.g. `x.f` is a "field projection", and `T::Item` is an ["associated type projection"](./traits/goals-and-clauses.html#trait-ref)

@@ -53,7 +53,7 @@ rib | a data structure in the name resolver that keeps trac
sess | the compiler session, which stores global data used throughout compilation
side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
sigil | like a keyword but composed entirely of non-alphanumeric tokens. For example, `&` is a sigil for references.
skolemization | a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./mir/regionck.html#skol) for more details.
skolemization | a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./borrow_check/region_inference.html#skol) for more details.
soundness | soundness is a technical term in type theory. Roughly, if a type system is sound, then if a program type-checks, it is type-safe; i.e. I can never (in safe rust) force a value into a variable of the wrong type. (see "completeness").
span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
substs | the substitutions for a given generic type or item (e.g. the `i32`, `u32` in `HashMap<i32, u32>`)

@@ -0,0 +1,63 @@
# MIR borrow check

The borrow check is Rust's "secret sauce". It is tasked with
enforcing a number of properties:

- That all variables are initialized before they are used.
- That you can't move the same value twice.
- That you can't move a value while it is borrowed.
- That you can't access a place while it is mutably borrowed (except through
the reference).
- That you can't mutate a place while it is shared-borrowed.
- etc.
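
The properties listed above can be seen in a small program. The following sketch (the function name `properties_demo` is made up for this illustration) compiles as written; each commented-out line is one the borrow checker would reject if it were uncommented, for the reason given beside it.

```rust
/// Illustrative only: every commented-out line would be rejected by the
/// borrow checker for the reason given next to it.
pub fn properties_demo() -> Vec<u32> {
    let v: Vec<u32>;
    // let n = v.len();   // ERROR: `v` is used before it is initialized
    v = vec![1, 2, 3];

    let mut w = v; // moves `v` into `w`
    // let again = v;     // ERROR: `v` has already been moved

    let r = &mut w;
    // w.push(4);         // ERROR: `w` is mutably borrowed by `r`
    r.push(4); // OK: the access goes through the reference itself

    let shared = &w;
    // w.push(5);         // ERROR: `w` is shared-borrowed by `shared`
    let total: u32 = shared.iter().sum();
    w.push(total); // OK: `shared` is not used after this point
    w
}
```

The final `w.push(total)` is accepted only because the shared borrow held by `shared` ends at its last use, an instance of the non-lexical lifetimes mentioned below.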

At the time of this writing, the code is in a state of transition. The
"main" borrow checker still works by processing [the HIR](hir.html),
but that is being phased out in favor of the MIR-based borrow checker.
Accordingly, this documentation focuses on the new, MIR-based borrow
checker.

Doing borrow checking on MIR has two key advantages:

- The MIR is *far* less complex than the HIR; the radical desugaring
helps prevent bugs in the borrow checker. (If you're curious, you
can see [a list of the bugs that the MIR-based borrow checker fixes][47366].)
- Even more importantly, using the MIR enables ["non-lexical lifetimes"][nll],
which are regions derived from the control-flow graph.
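
As a small illustration of the second point (a sketch; `nll_demo` is a hypothetical name), the function below is accepted with non-lexical lifetimes because the shared borrow `first` ends at its last use. The older, lexical borrow checker treated that borrow as lasting until the end of the enclosing block and therefore rejected the `push`.

```rust
/// Accepted under NLL: the borrow held by `first` is only live until the
/// last use of `first`, so the later mutation of `v` does not conflict.
pub fn nll_demo() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    let first = &v[0];
    let copy = *first; // last use of the shared borrow
    v.push(copy * 10); // rejected by the lexical checker, accepted by NLL
    v
}
```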

[47366]: https://github.com/rust-lang/rust/issues/47366
[nll]: http://rust-lang.github.io/rfcs/2094-nll.html

### Major phases of the borrow checker

The borrow checker source is found in
[the `rustc_mir::borrow_check` module][b_c]. The main entry point is
the [`mir_borrowck`] query.

[b_c]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html
[`mir_borrowck`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/fn.mir_borrowck.html

The overall flow of the borrow checker is as follows:

- We first create a **local copy** of the MIR. In the coming steps,
we will modify this copy in place so that its types (among other things)
refer to the new regions that we are computing.
- We then invoke [`replace_regions_in_mir`] to modify our local MIR.
Among other things, this function will replace all of the [regions](./appendix/glossary.html) in
the MIR with fresh [inference variables](./appendix/glossary.html).
- Next, we perform a number of
[dataflow analyses](./appendix/background.html#dataflow) that
compute what data is moved and when.
- We then do a [second type check](borrow_check/type_check.html) across the MIR:
the purpose of this type check is to determine all of the constraints between
different regions.
- Next, we do [region inference](borrow_check/region_inference.html), which computes
the value of each region; a region's value is, roughly, a set of points in the
control-flow graph.
- At this point, we can compute the "borrows in scope" at each point.
- Finally, we do a second walk over the MIR, looking at the actions it
performs and reporting errors. For example, if we see a statement like
`*a + 1`, then we check that the variable `a` is initialized
and that it is not mutably borrowed; if either check fails, we report an error.
This walk requires the results of all the previous analyses.
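
To make the region-replacement step more concrete, here is a toy model (not rustc's real data structures; `Region`, `Ty`, and `Renumberer` are hypothetical). Every region that appears in a type, whether the user wrote it or it was erased, is replaced by a fresh inference variable, numbered in order of appearance. The real function also does other work that this sketch ignores.

```rust
/// Toy regions; rustc's real representation is much richer.
#[derive(Clone, Debug, PartialEq)]
pub enum Region {
    Named(&'static str), // e.g. `'a` written by the user
    Erased,              // a region that is no longer tracked
    Var(usize),          // a fresh inference variable `'?N`
}

#[derive(Clone, Debug, PartialEq)]
pub enum Ty {
    U32,
    Ref(Region, Box<Ty>), // `&'r T`
    Tuple(Vec<Ty>),
}

/// Hands out fresh inference variables `'?0`, `'?1`, ...
pub struct Renumberer {
    next: usize,
}

impl Renumberer {
    pub fn new() -> Self {
        Renumberer { next: 0 }
    }

    fn fresh(&mut self) -> Region {
        let r = Region::Var(self.next);
        self.next += 1;
        r
    }

    /// Returns a copy of `ty` in which every region is a fresh variable.
    pub fn renumber(&mut self, ty: &Ty) -> Ty {
        match ty {
            Ty::U32 => Ty::U32,
            Ty::Ref(_, inner) => {
                let r = self.fresh();
                Ty::Ref(r, Box::new(self.renumber(inner)))
            }
            Ty::Tuple(elems) => Ty::Tuple(elems.iter().map(|t| self.renumber(t)).collect()),
        }
    }
}
```

For example, renumbering `&'a (&u32, u32)` yields `&'?0 (&'?1 u32, u32)`: the outer reference gets the first fresh variable and the inner one gets the second.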

[`replace_regions_in_mir`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/fn.replace_regions_in_mir.html
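
Region inference can be pictured as a fixed-point computation. The sketch below is a deliberate simplification (the names `Point` and `infer_regions` are hypothetical): each region's value is a set of control-flow points, seeded with the points where the region is known to be live, and every outlives constraint `'a: 'b` forces the value of `'a` to include every point in the value of `'b`. Values only grow, so iterating until nothing changes terminates. rustc's actual solver is considerably more involved.

```rust
use std::collections::BTreeSet;

/// A control-flow point, simplified to a plain number.
pub type Point = usize;

/// `seeds[r]` holds the points where region `r` is known to be live.
/// Each `(sup, sub)` pair is an outlives constraint `'sup: 'sub`,
/// which requires `value[sup] ⊇ value[sub]`.
pub fn infer_regions(
    seeds: &[Vec<Point>],
    outlives: &[(usize, usize)],
) -> Vec<BTreeSet<Point>> {
    let mut values: Vec<BTreeSet<Point>> =
        seeds.iter().map(|pts| pts.iter().copied().collect()).collect();

    let mut changed = true;
    while changed {
        changed = false;
        for &(sup, sub) in outlives {
            // Points in `sub` that `sup` does not contain yet.
            let to_add: Vec<Point> = values[sub].difference(&values[sup]).copied().collect();
            if !to_add.is_empty() {
                values[sup].extend(to_add);
                changed = true;
            }
        }
    }
    values
}
```

With seeds `[1]`, `[2, 3]`, `[4]` and constraints `'0: '1` and `'1: '2`, region 1 grows to `{2, 3, 4}` and region 0 to `{1, 2, 3, 4}`.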

@@ -0,0 +1,50 @@
# Tracking moves and initialization

Part of the borrow checker's job is to track which variables are
"initialized" at any given point in the program -- this also requires
figuring out where moves occur and tracking those.

## Initialization and moves

From a user's perspective, initialization (giving a variable some
value) and moves (transferring ownership to another place) might
seem like distinct topics. Indeed, our borrow checker error messages
often talk about them differently. But **within the borrow checker**,
they are not nearly as separate. Roughly speaking, the borrow checker
tracks the set of "initialized places" at any point in the source
code. Assigning to a previously uninitialized local variable adds it
to that set; moving from a local variable removes it from that set.

Consider this example:

```rust,ignore
fn foo() {
    let a: Vec<u32>;

    // a is not initialized yet

    a = vec![22];

    // a is initialized here

    std::mem::drop(a); // a is moved here

    // a is no longer initialized here

    let l = a.len(); //~ ERROR
}
```

Here you can see that `a` starts off as uninitialized; once it is
assigned, it becomes initialized. But when `drop(a)` is called, that
moves `a` into the call, and hence it becomes uninitialized again.
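
The example above is straight-line code. When control flow branches, the question becomes whether a variable *might* be uninitialized at a given point, which a forward dataflow analysis can answer. The following is a minimal sketch under simplifying assumptions: a hand-written control-flow graph of basic blocks over numbered locals, with hypothetical `Stmt`, `Block`, and `maybe_uninit` names that do not come from rustc. A local is maybe-uninitialized on entry to a block if it is maybe-uninitialized at the end of *any* predecessor, so the join operation is set union.

```rust
use std::collections::BTreeSet;

/// Toy statements over locals identified by number.
#[derive(Clone, Copy)]
pub enum Stmt {
    Init(usize), // `x = ...;` makes `x` initialized
    Move(usize), // `drop(x);` makes `x` uninitialized
}

/// A basic block: its statements plus the blocks it can jump to.
pub struct Block {
    pub stmts: Vec<Stmt>,
    pub succs: Vec<usize>,
}

/// Applies one block's statements to the set of maybe-uninitialized locals.
fn transfer(block: &Block, mut state: BTreeSet<usize>) -> BTreeSet<usize> {
    for stmt in &block.stmts {
        match *stmt {
            Stmt::Init(x) => {
                state.remove(&x);
            }
            Stmt::Move(x) => {
                state.insert(x);
            }
        }
    }
    state
}

/// Returns, for each block, the locals that may be uninitialized on entry.
/// Block 0 is the entry block; every local in `all_locals` starts out uninitialized.
pub fn maybe_uninit(blocks: &[Block], all_locals: &[usize]) -> Vec<BTreeSet<usize>> {
    let mut entry: Vec<BTreeSet<usize>> = vec![BTreeSet::new(); blocks.len()];
    entry[0] = all_locals.iter().copied().collect();

    let mut changed = true;
    while changed {
        changed = false;
        for (i, block) in blocks.iter().enumerate() {
            let exit = transfer(block, entry[i].clone());
            for &succ in &block.succs {
                // Join = union: being uninitialized on any incoming path is enough.
                let before = entry[succ].len();
                entry[succ].extend(exit.iter().copied());
                if entry[succ].len() != before {
                    changed = true;
                }
            }
        }
    }
    entry
}
```

If block 0 initializes local 0 and branches to block 1 (which moves it) and block 2 (which does not), both flowing into block 3, then local 0 is maybe-uninitialized on entry to block 3, so a use there would be an error. rustc's real analyses operate on move paths rather than whole locals.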

## Subsections

To make it easier to peruse, this section is broken into a number of
subsections:

- [Move paths](./moves_and_initialization/move_paths.html): the
*move path* concept that we use to track which local variables (or parts of
local variables, in some cases) are initialized.
- TODO *Rest not yet written* =)

@@ -0,0 +1,128 @@
# Move paths

In reality, it's not enough to track initialization at the granularity
of local variables. Rust also allows us to do moves and initialization
at the field granularity:

```rust,ignore
fn foo() {
    let a: (Vec<u32>, Vec<u32>) = (vec![22], vec![44]);

    // a.0 and a.1 are both initialized

    let b = a.0; // moves a.0

    // a.0 is not initialized, but a.1 still is

    let c = a.0; // ERROR
    let d = a.1; // OK
}
```

To handle this, we track initialization at the granularity of a **move
path**. A [`MovePath`] represents some location that the user can
initialize, move, etc. For example, there is a move-path representing the
local variable `a`, and there is a move-path representing `a.0`. Move
paths roughly correspond to the concept of a [`Place`] from MIR, but
they are indexed in ways that enable us to do move analysis more
efficiently.
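
As a rough model of why field granularity matters, the sketch below tracks moves per path (hypothetical names; a path is a local number followed by field indices, so `[0, 1]` stands for `a.1` when local 0 is `a`). Moving `a.0` affects only that path, so `a.1` stays usable, while `a` as a whole is no longer fully initialized because one of its children was moved. The sketch ignores re-initialization.

```rust
use std::collections::HashSet;

/// A toy move path: a local followed by field indices.
pub type Path = Vec<u32>;

/// Tracks moved-out paths; every path starts out initialized.
#[derive(Default)]
pub struct InitState {
    moved: HashSet<Path>,
}

impl InitState {
    pub fn record_move(&mut self, path: &[u32]) {
        self.moved.insert(path.to_vec());
    }

    /// `path` is initialized only if neither it, nor an enclosing path,
    /// nor any path nested inside it has been moved.
    pub fn is_initialized(&self, path: &[u32]) -> bool {
        !self
            .moved
            .iter()
            .any(|m| m.starts_with(path) || path.starts_with(m))
    }
}
```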

[`MovePath`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePath.html
[`Place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/enum.Place.html

## Move path indices

Although there is a [`MovePath`] data structure, move paths are never
referenced directly. Instead, all the code passes around *indices* of
type [`MovePathIndex`]. If you need to get information about a move
path, you use this index with the
[`move_paths` field of the `MoveData`][move_paths]. For example,
to convert a [`MovePathIndex`] `mpi` into a MIR [`Place`], you might
access the [`MovePath::place`] field like so:

```rust,ignore
move_data.move_paths[mpi].place
```

[`MovePathIndex`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/indexes/struct.MovePathIndex.html
[move_paths]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#structfield.move_paths
[`MovePath::place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePath.html#structfield.place
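
The index-based design can be sketched with a newtype index and a vector that only accepts that index type. This is a simplified stand-in, not rustc's definitions: `MovePathVec` is a hypothetical type loosely modeled on rustc's typed index vectors, and `place` is a plain `String` instead of a MIR `Place`. It lets the expression `move_data.move_paths[mpi].place` from the text work as written.

```rust
use std::ops::Index;

/// A newtype index: cheap to copy, and it cannot be confused with other
/// kinds of indices at the type level.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MovePathIndex(usize);

#[derive(Debug)]
pub struct MovePath {
    pub place: String, // stand-in for a MIR `Place`
}

/// A vector of move paths that can only be indexed by `MovePathIndex`.
#[derive(Default)]
pub struct MovePathVec(Vec<MovePath>);

impl MovePathVec {
    /// Appends a move path and returns its index.
    pub fn push(&mut self, path: MovePath) -> MovePathIndex {
        self.0.push(path);
        MovePathIndex(self.0.len() - 1)
    }
}

impl Index<MovePathIndex> for MovePathVec {
    type Output = MovePath;

    fn index(&self, mpi: MovePathIndex) -> &MovePath {
        &self.0[mpi.0]
    }
}

#[derive(Default)]
pub struct MoveData {
    pub move_paths: MovePathVec,
}
```

Passing a plain `usize` where a `MovePathIndex` is expected is a type error, which is the main benefit of the newtype.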

## Building move paths

One of the first things we do in the MIR borrow check is to construct
the set of move paths. This is done as part of the
[`MoveData::gather_moves`] function. This function uses a MIR visitor
called [`Gatherer`] to walk the MIR and look at how each [`Place`]
within is accessed. For each such [`Place`], it constructs a
corresponding [`MovePathIndex`]. It also records when/where that
particular move path is moved/initialized, but we'll get to that in a
later section.

[`Gatherer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/builder/struct.Gatherer.html
[`MoveData::gather_moves`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#method.gather_moves
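
At its core, building move paths is a kind of interning: each distinct place should get exactly one index, no matter how many times it is accessed. The sketch below uses hypothetical names and writes places as dotted strings like `"a.b.c"` rather than MIR `Place`s (field names are assumed not to contain dots). It also creates a path's ancestors on demand and records each path's parent, which the tree structure described under "Cross-references" relies on.

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct MovePaths {
    pub places: Vec<String>,         // place for each index
    pub parents: Vec<Option<usize>>, // parent index for each index
    lookup: HashMap<String, usize>,  // place -> index, to avoid duplicates
}

impl MovePaths {
    /// Returns the index for `place` (e.g. "a.0.1"), creating it and any
    /// missing ancestors ("a", "a.0") the first time it is seen.
    pub fn intern(&mut self, place: &str) -> usize {
        if let Some(&idx) = self.lookup.get(place) {
            return idx;
        }
        // Everything before the last '.' is the parent place, if any.
        let parent = place.rfind('.').map(|dot| self.intern(&place[..dot]));
        let idx = self.places.len();
        self.places.push(place.to_string());
        self.parents.push(parent);
        self.lookup.insert(place.to_string(), idx);
        idx
    }
}
```

Interning `"a.b.c"` first creates `"a"` and `"a.b"`; interning any of the three again returns the existing index.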

### Illegal move paths

We don't actually create a move-path for **every** [`Place`] that gets
used. In particular, if it is illegal to move from a [`Place`], then
there is no need for a [`MovePathIndex`]. Some examples:

- You cannot move from a static variable, so we do not create a [`MovePathIndex`]
for static variables.
- You cannot move an individual element of an array, so if we have e.g. `foo: [String; 3]`,
there would be no move-path for `foo[1]`.
- You cannot move out of a borrowed reference, so if we have e.g. `foo: &String`,
there would be no move-path for `*foo`.

These rules are enforced by the [`move_path_for`] function, which
converts a [`Place`] into a [`MovePathIndex`] -- in error cases like
those just discussed, the function returns an `Err`. This in turn
means we don't have to bother tracking whether those places are
initialized (which lowers overhead).

[`move_path_for`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/builder/struct.Gatherer.html#method.move_path_for
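
The rules listed above amount to a walk over a place's projections that fails as soon as it meets something you cannot move out of. Here is a sketch with a hypothetical, much-simplified `Place` type and error enum; rustc's version works on real MIR places and their types, and only shared-reference dereferences are modeled here.

```rust
/// A simplified place: a base plus a list of projections.
pub enum Base {
    Local(&'static str),
    Static(&'static str),
}

pub enum Projection {
    Field(u32),  // `x.0`
    Index,       // `x[i]` on an array
    DerefShared, // `*x` where `x: &T`
}

pub struct Place {
    pub base: Base,
    pub projections: Vec<Projection>,
}

#[derive(Debug, PartialEq)]
pub enum IllegalMove {
    FromStatic,
    FromArrayElement,
    FromBorrowedContent,
}

/// Returns `Ok(())` if a move path may be created for `place`.
pub fn check_movable(place: &Place) -> Result<(), IllegalMove> {
    if let Base::Static(_) = place.base {
        return Err(IllegalMove::FromStatic);
    }
    for proj in &place.projections {
        match proj {
            Projection::Field(_) => {}
            Projection::Index => return Err(IllegalMove::FromArrayElement),
            Projection::DerefShared => return Err(IllegalMove::FromBorrowedContent),
        }
    }
    Ok(())
}
```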

## Looking up a move-path

If you have a [`Place`] and you would like to convert it to a [`MovePathIndex`], you
can do that using the [`MovePathLookup`] structure found in the [`rev_lookup`] field
of [`MoveData`]. There are two different methods:

[`MoveData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html
[`MovePathLookup`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html
[`rev_lookup`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#structfield.rev_lookup

- [`find_local`], which takes a [`mir::Local`] representing a local
variable. This is the easier method, because we **always** create a
[`MovePathIndex`] for every local variable.
- [`find`], which takes an arbitrary [`Place`]. This method is a bit
more annoying to use, precisely because we don't have a
[`MovePathIndex`] for **every** [`Place`] (as we just discussed in
the "illegal move paths" section). Therefore, [`find`] returns a
[`LookupResult`] indicating the closest path it was able to find
that exists (e.g., for `foo[1]`, it might return just the path for
`foo`).

[`find`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html#method.find
[`find_local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html#method.find_local
[`mir::Local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/struct.Local.html
[`LookupResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/enum.LookupResult.html
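
The behavior of `find` can be sketched as follows. The table layout (a map keyed by parent path and projection text) and all names are assumptions made for this illustration; the `LookupResult` here only mirrors the idea of returning either an exact match or the closest existing ancestor, and is not rustc's definition.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum LookupResult {
    Exact(usize),  // a move path exists for the whole place
    Parent(usize), // closest existing ancestor, e.g. `foo` for `foo[1]`
}

/// Toy reverse lookup: locals always have a path; projections only
/// have one if a path was created for them.
pub struct MovePathLookup {
    pub locals: Vec<usize>, // local number -> path index
    pub projections: HashMap<(usize, String), usize>, // (base path, projection) -> path
}

impl MovePathLookup {
    pub fn find_local(&self, local: usize) -> usize {
        self.locals[local]
    }

    /// Follows `projs` from `local` for as long as move paths exist.
    pub fn find(&self, local: usize, projs: &[&str]) -> LookupResult {
        let mut current = self.find_local(local);
        for proj in projs {
            match self.projections.get(&(current, proj.to_string())) {
                Some(&next) => current = next,
                None => return LookupResult::Parent(current),
            }
        }
        LookupResult::Exact(current)
    }
}
```

If local 0 has path 0 and only its field `.0` has a path of its own, then looking up `x[1]` stops at the local and returns `Parent(0)`.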

## Cross-references

As we noted above, move-paths are stored in a big vector and
referenced via their [`MovePathIndex`]. However, within this vector,
they are also structured into a tree. For example, if you have the
[`MovePathIndex`] for `a.b.c`, you can go to its parent move-path
`a.b`. You can also iterate over all child paths: so, from `a.b`,
you might iterate to find the path `a.b.c` (here you are iterating
just over the paths that are **actually referenced** in the source,
not all **possible** paths that could have been referenced). These
references are used, for example, in the [`has_any_child_of`] function,
which checks whether the dataflow results contain a value for the
given move-path (e.g., `a.b`) or any child of that move-path (e.g.,
`a.b.c`).

[`has_any_child_of`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/at_location/struct.FlowAtLocation.html#method.has_any_child_of
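
One common way to store such a tree inside a flat vector is for each entry to record its parent, its first child, and its next sibling. The sketch below uses that layout and implements a `has_any_child_of`-style query over a set of indices standing in for dataflow results. Treat it as a simplified illustration (`add_path` and the field layout are assumptions), not rustc's code.

```rust
use std::collections::HashSet;

/// One entry in the flat vector of move paths, linked into a tree.
pub struct MovePath {
    pub place: &'static str,
    pub parent: Option<usize>,
    pub first_child: Option<usize>,
    pub next_sibling: Option<usize>,
}

/// Adds a path under `parent`, prepending it to the parent's child list.
pub fn add_path(
    paths: &mut Vec<MovePath>,
    place: &'static str,
    parent: Option<usize>,
) -> usize {
    let idx = paths.len();
    let next_sibling = parent.and_then(|p| paths[p].first_child);
    paths.push(MovePath { place, parent, first_child: None, next_sibling });
    if let Some(p) = parent {
        paths[p].first_child = Some(idx);
    }
    idx
}

/// Returns `mpi` or some descendant of it that is present in `set`.
pub fn has_any_child_of(
    paths: &[MovePath],
    set: &HashSet<usize>,
    mpi: usize,
) -> Option<usize> {
    if set.contains(&mpi) {
        return Some(mpi);
    }
    let mut child = paths[mpi].first_child;
    while let Some(c) = child {
        if let Some(found) = has_any_child_of(paths, set, c) {
            return Some(found);
        }
        child = paths[c].next_sibling;
    }
    None
}
```

Only paths that were actually created appear in the tree, which matches the point above about iterating over referenced paths rather than all possible ones.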

@@ -1,11 +1,11 @@
# MIR-based region checking (NLL)
# Region inference (NLL)

The MIR-based region checking code is located in
[the `rustc_mir::borrow_check::nll` module][nll]. (NLL, of course,
stands for "non-lexical lifetimes", a term that will hopefully be
deprecated once they become the standard kind of lifetime.)

[nll]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check/nll
[nll]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/index.html

The MIR-based region analysis consists of two major functions:

@@ -0,0 +1,10 @@
# The MIR type-check

A key component of the borrow check is the
[MIR type-check](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/type_check/index.html).
This check walks the MIR and does a complete "type check" -- the same
kind you might find in any other language. In the process of doing
this type-check, we also uncover the region constraints that apply to
the program.
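
As an illustration of how a type check can uncover region constraints, consider relating two reference types: `&'a T` is a subtype of `&'b T` when `'a: 'b` (`'a` outlives `'b`). The sketch below walks two types in parallel and records one outlives constraint per reference it meets instead of deciding the question on the spot. It uses hypothetical `Ty`, `Outlives`, and `relate` names and handles only shared references, which are covariant.

```rust
/// Toy types with region variables identified by number.
#[derive(Debug)]
pub enum Ty {
    U32,
    Ref(usize, Box<Ty>), // `&'r T`, where `r` is a region variable
}

/// `Outlives(a, b)` means `'a: 'b`.
#[derive(Debug, PartialEq)]
pub struct Outlives(pub usize, pub usize);

/// Requires `sub <: sup`, pushing the region constraints this implies.
/// Returns `Err` if the two types have different shapes.
pub fn relate(sub: &Ty, sup: &Ty, out: &mut Vec<Outlives>) -> Result<(), String> {
    match (sub, sup) {
        (Ty::U32, Ty::U32) => Ok(()),
        (Ty::Ref(ra, a), Ty::Ref(rb, b)) => {
            // `&'ra A <: &'rb B` requires `'ra: 'rb` and `A <: B`.
            out.push(Outlives(*ra, *rb));
            relate(a, b, out)
        }
        _ => Err(format!("type mismatch: {:?} vs {:?}", sub, sup)),
    }
}
```

Relating `&'0 &'1 u32` to `&'2 &'3 u32` yields the constraints `'0: '2` and `'1: '3`, the kind of input that region inference then solves.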

TODO -- elaborate further? Maybe? :)

@@ -1,59 +0,0 @@
# MIR borrow check

The borrow check is Rust's "secret sauce" – it is tasked with
enforcing a number of properties:

- That all variables are initialized before they are used.
- That you can't move the same value twice.
- That you can't move a value while it is borrowed.
- That you can't access a place while it is mutably borrowed (except through
the reference).
- That you can't mutate a place while it is shared borrowed.
- etc

At the time of this writing, the code is in a state of transition. The
"main" borrow checker still works by processing [the HIR](hir.html),
but that is being phased out in favor of the MIR-based borrow checker.
Doing borrow checking on MIR has two key advantages:

- The MIR is *far* less complex than the HIR; the radical desugaring
helps prevent bugs in the borrow checker. (If you're curious, you
can see
[a list of bugs that the MIR-based borrow checker fixes here][47366].)
- Even more importantly, using the MIR enables ["non-lexical lifetimes"][nll],
which are regions derived from the control-flow graph.

[47366]: https://github.com/rust-lang/rust/issues/47366
[nll]: http://rust-lang.github.io/rfcs/2094-nll.html

### Major phases of the borrow checker

The borrow checker source is found in
[the `rustc_mir::borrow_check` module][b_c]. The main entry point is
the `mir_borrowck` query. At the time of this writing, MIR borrowck can operate
in several modes, but this text will describe only the mode when NLL is enabled
(what you get with `#![feature(nll)]`).

[b_c]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check

The overall flow of the borrow checker is as follows:

- We first create a **local copy** C of the MIR. In the coming steps,
we will modify this copy in place to modify the types and things to
include references to the new regions that we are computing.
- We then invoke `nll::replace_regions_in_mir` to modify this copy C.
Among other things, this function will replace all of the regions in
the MIR with fresh [inference variables](./appendix/glossary.html).
- (More details can be found in [the regionck section](./mir/regionck.html).)
- Next, we perform a number of [dataflow
analyses](./appendix/background.html#dataflow)
that compute what data is moved and when. The results of these analyses
are needed to do both borrow checking and region inference.
- Using the move data, we can then compute the values of all the regions in the
MIR.
- (More details can be found in [the NLL section](./mir/regionck.html).)
- Finally, the borrow checker itself runs, taking as input (a) the
results of move analysis and (b) the regions computed by the region
checker. This allows us to figure out which loans are still in scope
at any particular point.