Merge pull request #190 from nikomatsakis/mir-borrow-check-1

start to document MIR borrow check
Niko Matsakis 2018-09-11 16:52:54 -04:00 committed by GitHub
commit 93b5783fec
10 changed files with 262 additions and 67 deletions

View File

@ -3,7 +3,7 @@ cache:
- cargo
before_install:
- shopt -s globstar
- MAX_LINE_LENGTH=80 bash ci/check_line_lengths.sh src/**/*.md
- MAX_LINE_LENGTH=100 bash ci/check_line_lengths.sh src/**/*.md
install:
- source ~/.cargo/env || true
- bash ci/install.sh

View File

@ -2,7 +2,7 @@
if [ "$1" == "--help" ]; then
echo 'Usage:'
echo ' MAX_LINE_LENGTH=80' "$0" 'src/**/*.md'
echo ' MAX_LINE_LENGTH=100' "$0" 'src/**/*.md'
exit 1
fi

View File

@ -53,9 +53,12 @@
- [MIR construction](./mir/construction.md)
- [MIR visitor and traversal](./mir/visitor.md)
- [MIR passes: getting the MIR for a function](./mir/passes.md)
- [MIR borrowck](./mir/borrowck.md)
- [MIR-based region checking (NLL)](./mir/regionck.md)
- [MIR optimizations](./mir/optimizations.md)
- [The borrow checker](./borrow_check.md)
- [Tracking moves and initialization](./borrow_check/moves_and_initialization.md)
- [Move paths](./borrow_check/moves_and_initialization/move_paths.md)
- [MIR type checker](./borrow_check/type_check.md)
- [Region inference](./borrow_check/region_inference.md)
- [Constant evaluation](./const-eval.md)
- [miri const evaluator](./miri.md)
- [Parameter Environments](./param_env.md)

View File

@ -40,7 +40,7 @@ MIR | the Mid-level IR that is created after type-checking
miri | an interpreter for MIR used for constant evaluation ([see more](./miri.html))
normalize | a general term for converting to a more canonical form, but in the case of rustc typically refers to [associated type normalization](./traits/associated-types.html#normalize)
newtype | a "newtype" is a wrapper around some other type (e.g., `struct Foo(T)` is a "newtype" for `T`). This is commonly used in Rust to give a stronger type for indices.
NLL | [non-lexical lifetimes](./mir/regionck.html), an extension to Rust's borrowing system to make it be based on the control-flow graph.
NLL | [non-lexical lifetimes](./borrow_check/region_inference.html), an extension to Rust's borrowing system to make it be based on the control-flow graph.
node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
obligation | something that must be proven by the trait system ([see more](traits/resolution.html))
projection | a general term for a "relative path", e.g. `x.f` is a "field projection", and `T::Item` is an ["associated type projection"](./traits/goals-and-clauses.html#trait-ref)
@ -53,7 +53,7 @@ rib | a data structure in the name resolver that keeps trac
sess | the compiler session, which stores global data used throughout compilation
side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
sigil | like a keyword but composed entirely of non-alphanumeric tokens. For example, `&` is a sigil for references.
skolemization | a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./mir/regionck.html#skol) for more details.
skolemization | a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./borrow_check/region_inference.html#skol) for more details.
soundness | soundness is a technical term in type theory. Roughly, if a type system is sound, then if a program type-checks, it is type-safe; i.e. I can never (in safe rust) force a value into a variable of the wrong type. (see "completeness").
span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
substs | the substitutions for a given generic type or item (e.g. the `i32`, `u32` in `HashMap<i32, u32>`)

63
src/borrow_check.md Normal file
View File

@ -0,0 +1,63 @@
# MIR borrow check
The borrow check is Rust's "secret sauce"; it is tasked with
enforcing a number of properties:
- That all variables are initialized before they are used.
- That you can't move the same value twice.
- That you can't move a value while it is borrowed.
- That you can't access a place while it is mutably borrowed (except through
the reference).
- That you can't mutate a place while it is shared borrowed.
- etc
At the time of this writing, the code is in a state of transition. The
"main" borrow checker still works by processing [the HIR](hir.html),
but that is being phased out in favor of the MIR-based borrow checker.
Accordingly, this documentation focuses on the new, MIR-based borrow
checker.
Doing borrow checking on MIR has several advantages:
- The MIR is *far* less complex than the HIR; the radical desugaring
helps prevent bugs in the borrow checker. (If you're curious, you
can see
[a list of bugs that the MIR-based borrow checker fixes here][47366].)
- Even more importantly, using the MIR enables ["non-lexical lifetimes"][nll],
which are regions derived from the control-flow graph.
[47366]: https://github.com/rust-lang/rust/issues/47366
[nll]: http://rust-lang.github.io/rfcs/2094-nll.html
### Major phases of the borrow checker
The borrow checker source is found in
[the `rustc_mir::borrow_check` module][b_c]. The main entry point is
the [`mir_borrowck`] query.
[b_c]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html
[`mir_borrowck`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/fn.mir_borrowck.html
- We first create a **local copy** of the MIR. In the coming steps,
we will modify this copy in place so that its types and other data
refer to the new regions that we are computing.
- We then invoke [`replace_regions_in_mir`] to modify our local MIR.
Among other things, this function will replace all of the [regions](./appendix/glossary.html) in
the MIR with fresh [inference variables](./appendix/glossary.html).
- Next, we perform a number of
[dataflow analyses](./appendix/background.html#dataflow) that
compute what data is moved and when.
- We then do a [second type check](borrow_check/type_check.html) across the MIR:
the purpose of this type check is to determine all of the constraints between
different regions.
- Next, we do [region inference](borrow_check/region_inference.html), which computes
the value of each region: basically, a set of points in the control-flow graph.
- At this point, we can compute the "borrows in scope" at each point.
- Finally, we do a second walk over the MIR, looking at the actions it
does and reporting errors. For example, if we see a statement like
`*a + 1`, then we would check that the variable `a` is initialized
and that it is not mutably borrowed, as either of those would
require an error to be reported.
- Doing this check requires the results of all the previous analyses.
[`replace_regions_in_mir`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/fn.replace_regions_in_mir.html

View File

@ -0,0 +1,50 @@
# Tracking moves and initialization
Part of the borrow checker's job is to track which variables are
"initialized" at any given point in time -- this also requires
figuring out where moves occur and tracking those.
## Initialization and moves
From a user's perspective, initialization -- giving a variable some
value -- and moves -- transferring ownership to another place -- might
seem like distinct topics. Indeed, our borrow checker error messages
often talk about them differently. But **within the borrow checker**,
they are not nearly as separate. Roughly speaking, the borrow checker
tracks the set of "initialized places" at any point in the source
code. Assigning to a previously uninitialized local variable adds it
to that set; moving from a local variable removes it from that set.
Consider this example:
```rust,ignore
fn foo() {
let a: Vec<u32>;
// a is not initialized yet
a = vec![22];
// a is initialized here
std::mem::drop(a); // a is moved here
// a is no longer initialized here
let l = a.len(); //~ ERROR
}
```
Here you can see that `a` starts off as uninitialized; once it is
assigned, it becomes initialized. But when `drop(a)` is called, that
moves `a` into the call, and hence it becomes uninitialized again.
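The "set of initialized places" idea can be sketched in a few lines. This is a minimal model for straight-line code only, not the compiler's implementation: the real analysis is a dataflow computation over the control-flow graph, and the names `Action`, `find_uninitialized_uses`, and `foo_actions` are hypothetical and exist only for this illustration.

```rust
use std::collections::HashSet;

/// The three kinds of actions this toy model understands.
/// (Hypothetical type; it is not part of rustc.)
#[derive(Debug, Clone, Copy)]
enum Action {
    /// `x = ...`: adds `x` to the initialized set.
    Assign(&'static str),
    /// Passing `x` by value: removes `x` from the initialized set.
    Move(&'static str),
    /// Any read of `x`, such as `x.len()`.
    Use(&'static str),
}

/// Walks the actions in order and returns the name of every variable
/// that was moved or used while it was not in the initialized set.
fn find_uninitialized_uses(actions: &[Action]) -> Vec<&'static str> {
    let mut initialized: HashSet<&'static str> = HashSet::new();
    let mut errors = Vec::new();
    for action in actions {
        match *action {
            Action::Assign(var) => {
                initialized.insert(var);
            }
            Action::Move(var) => {
                // A move needs an initialized source and leaves it
                // uninitialized afterwards.
                if !initialized.remove(var) {
                    errors.push(var);
                }
            }
            Action::Use(var) => {
                if !initialized.contains(var) {
                    errors.push(var);
                }
            }
        }
    }
    errors
}

/// The body of `foo` from the example above, as a sequence of actions.
fn foo_actions() -> Vec<Action> {
    vec![
        Action::Assign("a"), // a = vec![22];
        Action::Move("a"),   // std::mem::drop(a);
        Action::Use("a"),    // a.len()
    ]
}
```

Running `find_uninitialized_uses(&foo_actions())` reports `a` once, for the final `a.len()`, which matches the error marked in the example.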
## Subsections
To make it easier to peruse, this section is broken into a number of
subsections:
- [Move paths](./moves_and_initialization/move_paths.html) introduces the
*move path* concept that we use to track which local variables (or parts of
local variables, in some cases) are initialized.
- TODO *Rest not yet written* =)

View File

@ -0,0 +1,128 @@
# Move paths
In reality, it's not enough to track initialization at the granularity
of local variables. Rust also allows us to do moves and initialization
at the field granularity:
```rust,ignore
fn foo() {
let a: (Vec<u32>, Vec<u32>) = (vec![22], vec![44]);
// a.0 and a.1 are both initialized
let b = a.0; // moves a.0
// a.0 is not initialized, but a.1 still is
let c = a.0; // ERROR
let d = a.1; // OK
}
```
To handle this, we track initialization at the granularity of a **move
path**. A [`MovePath`] represents some location that the user can
initialize, move, etc. So e.g. there is a move-path representing the
local variable `a`, and there is a move-path representing `a.0`. Move
paths roughly correspond to the concept of a [`Place`] from MIR, but
they are indexed in ways that enable us to do move analysis more
efficiently.
[`MovePath`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePath.html
[`Place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/enum.Place.html
## Move path indices
Although there is a [`MovePath`] data structure, move paths are never
referenced directly. Instead, all the code passes around *indices* of
type
[`MovePathIndex`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/indexes/struct.MovePathIndex.html). If
you need to get information about a move path, you use this index with
the [`move_paths` field of the `MoveData`][move_paths]. For example,
to convert a [`MovePathIndex`] `mpi` into a MIR [`Place`], you might
access the [`MovePath::place`] field like so:
```rust,ignore
move_data.move_paths[mpi].place
```
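[move_paths]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#structfield.move_paths
[`MovePathIndex`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/indexes/struct.MovePathIndex.html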
[`MovePath::place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePath.html#structfield.place
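The index-based access pattern can be modeled outside the compiler roughly as follows. This is a simplified sketch: the types below reuse the names `MovePathIndex`, `MovePath`, and `MoveData` for readability, but they are stand-ins written for this example (with `Place` reduced to a string, and `MovePaths` and `build_example` being hypothetical helpers), not the rustc definitions.

```rust
use std::ops::Index;

/// Stand-in for a MIR place, kept as a string such as "a.0".
type Place = String;

/// A newtype index into the move-path table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MovePathIndex(usize);

struct MovePath {
    place: Place,
}

/// Table that owns every move path; other code holds only indices.
struct MovePaths {
    paths: Vec<MovePath>,
}

impl MovePaths {
    fn new() -> Self {
        MovePaths { paths: Vec::new() }
    }

    /// Stores a new move path and returns the index that names it.
    fn push(&mut self, place: &str) -> MovePathIndex {
        let index = MovePathIndex(self.paths.len());
        self.paths.push(MovePath { place: place.to_string() });
        index
    }
}

/// Allows `move_paths[mpi]`, so a plain `usize` cannot be used by accident.
impl Index<MovePathIndex> for MovePaths {
    type Output = MovePath;

    fn index(&self, mpi: MovePathIndex) -> &MovePath {
        &self.paths[mpi.0]
    }
}

struct MoveData {
    move_paths: MovePaths,
}

/// Builds a table with move paths for `a` and `a.0`.
fn build_example() -> (MoveData, MovePathIndex, MovePathIndex) {
    let mut move_paths = MovePaths::new();
    let a = move_paths.push("a");
    let a0 = move_paths.push("a.0");
    (MoveData { move_paths }, a, a0)
}
```

With this model, `move_data.move_paths[mpi].place` reads the same way as the expression shown above. Using a dedicated index type rather than a bare `usize` is the "newtype" pattern: indices into different tables cannot be mixed up.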
## Building move paths
One of the first things we do in the MIR borrow check is to construct
the set of move paths. This is done as part of the
[`MoveData::gather_moves`] function. This function uses a MIR visitor
called [`Gatherer`] to walk the MIR and look at how each [`Place`]
within is accessed. For each such [`Place`], it constructs a
corresponding [`MovePathIndex`]. It also records when/where that
particular move path is moved/initialized, but we'll get to that in a
later section.
[`Gatherer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/builder/struct.Gatherer.html
[`MoveData::gather_moves`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#method.gather_moves
### Illegal move paths
We don't actually create a move-path for **every** [`Place`] that gets
used. In particular, if it is illegal to move from a [`Place`], then
there is no need for a [`MovePathIndex`]. Some examples:
- You cannot move from a static variable, so we do not create a [`MovePathIndex`]
for static variables.
- You cannot move an individual element of an array, so if we have e.g. `foo: [String; 3]`,
there would be no move-path for `foo[1]`.
- You cannot move from inside of a borrowed reference, so if we have e.g. `foo: &String`,
there would be no move-path for `*foo`.
These rules are enforced by the [`move_path_for`] function, which
converts a [`Place`] into a [`MovePathIndex`] -- in error cases like
those just discussed, the function returns an `Err`. This in turn
means we don't have to bother tracking whether those places are
initialized (which lowers overhead).
[`move_path_for`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/builder/struct.Gatherer.html#method.move_path_for
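The following sketch shows the shape of such a function on a toy `Place` type. Everything here is an assumption made for illustration: the `Place` variants, the `IllegalMove` error type, and the `Builder` struct are hypothetical, and the real [`move_path_for`] works on MIR places and types rather than on this simplified enum.

```rust
/// Toy model of a place (hypothetical; much simpler than MIR's `Place`).
#[derive(Debug, Clone, PartialEq)]
enum Place {
    /// A local variable such as `foo`.
    Local(&'static str),
    /// A static variable.
    Static(&'static str),
    /// A field projection such as `base.0`.
    Field(Box<Place>, usize),
    /// An array element such as `base[1]`.
    Index(Box<Place>, usize),
    /// `*base`, where `base` is a borrowed reference.
    DerefRef(Box<Place>),
}

/// Why a place has no move path (hypothetical error type).
#[derive(Debug, PartialEq)]
enum IllegalMove {
    Static,
    ArrayElement,
    BorrowedContent,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MovePathIndex(usize);

struct Builder {
    /// The place behind each move path, indexed by `MovePathIndex`.
    paths: Vec<Place>,
}

impl Builder {
    fn new() -> Self {
        Builder { paths: Vec::new() }
    }

    /// Returns the move path for `place`, creating it on first use,
    /// or an error if it is illegal to move out of `place`.
    fn move_path_for(&mut self, place: &Place) -> Result<MovePathIndex, IllegalMove> {
        match place {
            Place::Static(_) => return Err(IllegalMove::Static),
            Place::Index(..) => return Err(IllegalMove::ArrayElement),
            Place::DerefRef(_) => return Err(IllegalMove::BorrowedContent),
            // A field has a move path only if its base has one.
            Place::Field(base, _) => {
                self.move_path_for(base)?;
            }
            Place::Local(_) => {}
        }
        if let Some(existing) = self.paths.iter().position(|p| p == place) {
            return Ok(MovePathIndex(existing));
        }
        self.paths.push(place.clone());
        Ok(MovePathIndex(self.paths.len() - 1))
    }
}
```

Because the illegal cases return early, no entry is ever pushed for them, which is the property the text relies on: places that cannot be moved from never show up in the table.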
## Looking up a move-path
If you have a [`Place`] and you would like to convert it to a [`MovePathIndex`], you
can do that using the [`MovePathLookup`] structure found in the [`rev_lookup`] field
of [`MoveData`]. There are two different methods:
[`MovePathLookup`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html
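[`rev_lookup`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#structfield.rev_lookup
[`MoveData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html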
- [`find_local`], which takes a [`mir::Local`] representing a local
variable. This is the easier method, because we **always** create a
[`MovePathIndex`] for every local variable.
- [`find`], which takes an arbitrary [`Place`]. This method is a bit
more annoying to use, precisely because we don't have a
[`MovePathIndex`] for **every** [`Place`] (as we just discussed in
the "illegal move paths" section). Therefore, [`find`] returns a
[`LookupResult`] indicating the closest path it was able to find
that exists (e.g., for `foo[1]`, it might return just the path for
`foo`).
[`find`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html#method.find
[`find_local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html#method.find_local
[`mir::Local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/struct.Local.html
[`LookupResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/enum.LookupResult.html
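As a rough illustration of the two lookup methods, here is a toy version in which a place is a list of path components. The representation of `Place`, the `example_lookup` helper, and the exact variants of `LookupResult` are assumptions made for this sketch; consult the linked documentation for the real definitions.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MovePathIndex(usize);

/// Result of looking up an arbitrary place (variants assumed for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum LookupResult {
    /// The place itself has a move path.
    Exact(MovePathIndex),
    /// Only some enclosing place has a move path (or none does).
    Parent(Option<MovePathIndex>),
}

/// Toy place: a local name followed by projections, e.g. ["foo", "[1]"].
type Place = Vec<&'static str>;

struct MovePathLookup {
    paths: HashMap<Place, MovePathIndex>,
}

impl MovePathLookup {
    /// Every local is expected to have a move path, so this returns an
    /// index directly. In this toy model it panics on an unknown local.
    fn find_local(&self, local: &'static str) -> MovePathIndex {
        self.paths[&vec![local]]
    }

    /// Looks up `place`; if it has no move path, strips projections one
    /// at a time until a place with a move path is found.
    fn find(&self, place: &[&'static str]) -> LookupResult {
        if let Some(&mpi) = self.paths.get(place) {
            return LookupResult::Exact(mpi);
        }
        let mut prefix = place;
        while let Some((_, shorter)) = prefix.split_last() {
            if let Some(&mpi) = self.paths.get(shorter) {
                return LookupResult::Parent(Some(mpi));
            }
            prefix = shorter;
        }
        LookupResult::Parent(None)
    }
}

/// A table with move paths for `foo`, `a`, and `a.0`.
fn example_lookup() -> MovePathLookup {
    let mut paths = HashMap::new();
    paths.insert(vec!["foo"], MovePathIndex(0));
    paths.insert(vec!["a"], MovePathIndex(1));
    paths.insert(vec!["a", "0"], MovePathIndex(2));
    MovePathLookup { paths }
}
```

In this model, looking up `["foo", "[1]"]` yields the path for `foo` as the closest parent, mirroring the `foo[1]` case described above.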
## Cross-references
As we noted above, move-paths are stored in a big vector and
referenced via their [`MovePathIndex`]. However, within this vector,
they are also structured into a tree. So for example if you have the
[`MovePathIndex`] for `a.b.c`, you can go to its parent move-path
`a.b`. You can also iterate over all children paths: so, from `a.b`,
you might iterate to find the path `a.b.c` (here you are iterating
just over the paths that are **actually referenced** in the source,
not all **possible** paths that could have been referenced). These
references are used for example in the [`has_any_child_of`] function,
which checks whether the dataflow results contain a value for the
given move-path (e.g., `a.b`) or any child of that move-path (e.g.,
`a.b.c`).
[`Place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/enum.Place.html
[`has_any_child_of`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/at_location/struct.FlowAtLocation.html#method.has_any_child_of
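One common way to store such a tree inside a flat vector is to give each entry `parent`, `first_child`, and `next_sibling` links. The sketch below assumes that layout; the field names, the `add` and `children` helpers, and the closure-based signature of `has_any_child_of` are illustrative choices and may differ from the real code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MovePathIndex(usize);

struct MovePath {
    place: &'static str,
    parent: Option<MovePathIndex>,
    first_child: Option<MovePathIndex>,
    next_sibling: Option<MovePathIndex>,
}

struct MoveData {
    move_paths: Vec<MovePath>,
}

impl MoveData {
    /// Adds a move path and links it in as the first child of `parent`.
    fn add(&mut self, place: &'static str, parent: Option<MovePathIndex>) -> MovePathIndex {
        let index = MovePathIndex(self.move_paths.len());
        // The previous first child (if any) becomes our next sibling.
        let next_sibling = match parent {
            Some(p) => self.move_paths[p.0].first_child.replace(index),
            None => None,
        };
        self.move_paths.push(MovePath { place, parent, first_child: None, next_sibling });
        index
    }

    /// Direct children of `mpi`: only paths that were actually added.
    fn children(&self, mpi: MovePathIndex) -> Vec<MovePathIndex> {
        let mut out = Vec::new();
        let mut next = self.move_paths[mpi.0].first_child;
        while let Some(child) = next {
            out.push(child);
            next = self.move_paths[child.0].next_sibling;
        }
        out
    }

    /// Returns some path in the subtree rooted at `mpi` (including `mpi`
    /// itself) for which `is_set` holds, or `None` if there is none.
    fn has_any_child_of(
        &self,
        mpi: MovePathIndex,
        is_set: impl Fn(MovePathIndex) -> bool,
    ) -> Option<MovePathIndex> {
        let mut todo = vec![mpi];
        while let Some(current) = todo.pop() {
            if is_set(current) {
                return Some(current);
            }
            todo.extend(self.children(current));
        }
        None
    }
}
```

Here `is_set` stands in for "the dataflow results contain a value for this move path". Storing links as indices keeps the whole tree in one vector, so walking to a parent or across siblings is a matter of indexing rather than following pointers.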

View File

@ -1,11 +1,11 @@
# MIR-based region checking (NLL)
# Region inference (NLL)
The MIR-based region checking code is located in
[the `rustc_mir::borrow_check::nll` module][nll]. (NLL, of course,
stands for "non-lexical lifetimes", a term that will hopefully be
deprecated once they become the standard kind of lifetime.)
[nll]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check/nll
[nll]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/index.html
The MIR-based region analysis consists of two major functions:

View File

@ -0,0 +1,10 @@
# The MIR type-check
A key component of the borrow check is the
[MIR type-check](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/type_check/index.html).
This check walks the MIR and does a complete "type check" -- the same
kind you might find in any other language. In the process of doing
this type-check, we also uncover the region constraints that apply to
the program.
TODO -- elaborate further? Maybe? :)

View File

@ -1,59 +0,0 @@
# MIR borrow check
The borrow check is Rust's "secret sauce"; it is tasked with
enforcing a number of properties:
- That all variables are initialized before they are used.
- That you can't move the same value twice.
- That you can't move a value while it is borrowed.
- That you can't access a place while it is mutably borrowed (except through
the reference).
- That you can't mutate a place while it is shared borrowed.
- etc
At the time of this writing, the code is in a state of transition. The
"main" borrow checker still works by processing [the HIR](hir.html),
but that is being phased out in favor of the MIR-based borrow checker.
Doing borrow checking on MIR has two key advantages:
- The MIR is *far* less complex than the HIR; the radical desugaring
helps prevent bugs in the borrow checker. (If you're curious, you
can see
[a list of bugs that the MIR-based borrow checker fixes here][47366].)
- Even more importantly, using the MIR enables ["non-lexical lifetimes"][nll],
which are regions derived from the control-flow graph.
[47366]: https://github.com/rust-lang/rust/issues/47366
[nll]: http://rust-lang.github.io/rfcs/2094-nll.html
### Major phases of the borrow checker
The borrow checker source is found in
[the `rustc_mir::borrow_check` module][b_c]. The main entry point is
the `mir_borrowck` query. At the time of this writing, MIR borrowck can operate
in several modes, but this text will describe only the mode when NLL is enabled
(what you get with `#![feature(nll)]`).
[b_c]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check
The overall flow of the borrow checker is as follows:
- We first create a **local copy** C of the MIR. In the coming steps,
we will modify this copy in place to modify the types and things to
include references to the new regions that we are computing.
- We then invoke `nll::replace_regions_in_mir` to modify this copy C.
Among other things, this function will replace all of the regions in
the MIR with fresh [inference variables](./appendix/glossary.html).
- (More details can be found in [the regionck section](./mir/regionck.html).)
- Next, we perform a number of [dataflow
analyses](./appendix/background.html#dataflow)
that compute what data is moved and when. The results of these analyses
are needed to do both borrow checking and region inference.
- Using the move data, we can then compute the values of all the regions in the
MIR.
- (More details can be found in [the NLL section](./mir/regionck.html).)
- Finally, the borrow checker itself runs, taking as input (a) the
results of move analysis and (b) the regions computed by the region
checker. This allows us to figure out which loans are still in scope
at any particular point.