diff --git a/src/SUMMARY.md b/src/SUMMARY.md index e9e2ffae..338cb7fe 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -55,6 +55,9 @@ - [MIR passes: getting the MIR for a function](./mir/passes.md) - [MIR optimizations](./mir/optimizations.md) - [The borrow checker](./borrow_check.md) + - [Tracking moves and initialization](./borrow_check/moves_and_initialization.md) + - [Move paths](./borrow_check/moves_and_initialization/move_paths.md) + - [MIR type checker](./borrow_check/type_check.md) - [Region inference](./borrow_check/region_inference.md) - [Constant evaluation](./const-eval.md) - [miri const evaluator](./miri.md) diff --git a/src/borrow_check.md b/src/borrow_check.md index d5fea618..07095f20 100644 --- a/src/borrow_check.md +++ b/src/borrow_check.md @@ -14,7 +14,10 @@ enforcing a number of properties: At the time of this writing, the code is in a state of transition. The "main" borrow checker still works by processing [the HIR](hir.html), but that is being phased out in favor of the MIR-based borrow checker. -Doing borrow checking on MIR has two key advantages: +Accordingly, this documentation focuses on the new, MIR-based borrow +checker. + +Doing borrow checking on MIR has several advantages: - The MIR is *far* less complex than the HIR; the radical desugaring helps prevent bugs in the borrow checker. (If you're curious, you @@ -30,30 +33,31 @@ Doing borrow checking on MIR has two key advantages: The borrow checker source is found in [the `rustc_mir::borrow_check` module][b_c]. The main entry point is -the `mir_borrowck` query. At the time of this writing, MIR borrowck can operate -in several modes, but this text will describe only the mode when NLL is enabled -(what you get with `#![feature(nll)]`). +the [`mir_borrowck`] query. -[b_c]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check - -The overall flow of the borrow checker is as follows: +[b_c]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html +[`mir_borrowck`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/fn.mir_borrowck.html - We first create a **local copy** C of the MIR. In the coming steps, we will modify this copy in place to modify the types and things to include references to the new regions that we are computing. -- We then invoke `nll::replace_regions_in_mir` to modify this copy C. - Among other things, this function will replace all of the regions in +- We then invoke [`replace_regions_in_mir`] to modify this copy C. + Among other things, this function will replace all of the [regions](./appendix/glossary.html) in the MIR with fresh [inference variables](./appendix/glossary.html). - - (More details can be found in [the regionck section](./mir/regionck.html).) -- Next, we perform a number of [dataflow - analyses](./appendix/background.html#dataflow) - that compute what data is moved and when. The results of these analyses - are needed to do both borrow checking and region inference. -- Using the move data, we can then compute the values of all the regions in the - MIR. - - (More details can be found in [the NLL section](./mir/regionck.html).) -- Finally, the borrow checker itself runs, taking as input (a) the - results of move analysis and (b) the regions computed by the region - checker. This allows us to figure out which loans are still in scope - at any particular point. +- Next, we perform a number of + [dataflow analyses](./appendix/background.html#dataflow) that + compute what data is moved and when. +- We then do a [second type check](borrow_check/type_check.html) across the MIR: + the purpose of this type check is to determine all of the constraints between + different regions. +- Next, we do [region inference](borrow_check/region_inference.html), which computes + the values of each region -- basically, points in the control-flow graph. +- At this point, we can compute the "borrows in scope" at each point. +- Finally, we do a second walk over the MIR, looking at the actions it + does and reporting errors. For example, if we see a statement like + `*a + 1`, then we would check that the variable `a` is initialized + and that it is not mutably borrowed, as either of those would + require an error to be reported. + - Doing this check requires the results of all the previous analyses. +[`replace_regions_in_mir`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/fn.replace_regions_in_mir.html diff --git a/src/borrow_check/moves_and_initialization.md b/src/borrow_check/moves_and_initialization.md new file mode 100644 index 00000000..6bc8cc47 --- /dev/null +++ b/src/borrow_check/moves_and_initialization.md @@ -0,0 +1,50 @@ +# Tracking moves and initialization + +Part of the borrow checker's job is to track which variables are +"initialized" at any given point in time -- this also requires +figuring out where moves occur and tracking those. + +## Initialization and moves + +From a user's perspective, initialization -- giving a variable some +value -- and moves -- transfering ownership to another place -- might +seem like distinct topics. Indeed, our borrow checker error messages +often talk about them differently. But **within the borrow checker**, +they are not nearly as separate. Roughly speaking, the borrow checker +tracks the set of "initialized places" at any point in time. Assigning +to a previously uninitialized local variable adds it to that set; +moving from a local variable removes it from that set. + +Consider this example: + +```rust +fn foo() { + let a: Vec; + + // a is not initialized yet + + a = vec![22]; + + // a is initialized here + + std::mem::drop(a); // a is moved here + + // a is no longer initialized here + + let l = a.len(); //~ ERROR +} +``` + +Here you can see that `a` starts off as uninitialized; once it is +assigned, it becomes initialized. But when `drop(a)` is called, it +becomes uninitialized again. + +## Subsections + +To make it easier to peruse, this section is broken into a number of +subsections: + +- [Move paths](./moves_and_initialization/move_paths.html the + *move path* concept that we use to track which local variables (or parts of + local variables, in some cases) are initialized. +- *Rest not yet written* =) diff --git a/src/borrow_check/moves_and_initialization/move_paths.md b/src/borrow_check/moves_and_initialization/move_paths.md new file mode 100644 index 00000000..9d5dc3ed --- /dev/null +++ b/src/borrow_check/moves_and_initialization/move_paths.md @@ -0,0 +1,126 @@ +# Move paths + +In reality, it's not enough to track initialization at the granularity +of local variables. Sometimes we need to track, e.g., individual fields: + +```rust +fn foo() { + let a: (Vec, Vec) = (vec![22], vec![44]); + + // a.0 and a.1 are both initialized + + let b = a.0; // moves a.0 + + // a.0 is not initializd, but a.1 still is + + let c = a.0; // ERROR + let d = a.1; // OK +} +``` + +To handle this, we track initialization at the granularity of a **move +path**. A [`MovePath`] represents some location that the user can +initialize, move, etc. So e.g. there is a move-path representing the +local variable `a`, and there is a move-path representing `a.0`. Move +paths roughly correspond to the concept of a [`Place`] from MIR, but +they are indexed in ways that enable us to do move analysis more +efficiently. + +[`MovePath`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePath.html +[`Place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/enum.Place.html + +## Move path indices + +Although there is a [`MovePath`] data structure, they are never +referenced directly. Instead, all the code passes around *indices* of +type +[`MovePathIndex`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/indexes/struct.MovePathIndex.html). If +you need to get information about a move path, you use this index with +the [`move_paths` field of the `MoveData`][move_paths]. For example, +to convert a [`MovePathIndex`] `mpi` into a MIR [`Place`], you might +access the [`MovePath::place`] field like so: + +```rust +move_data.move_paths[mpi].place +``` + +[move_paths]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#structfield.move_paths +[`MovePath::place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePath.html#structfield.place + +## Building move paths + +One of the first things we do in the MIR borrow check is to construct +the set of move paths. This is done as part of the +[`MoveData::gather_moves`] function. This function uses a MIR visitor +called [`Gatherer`] to walk the MIR and look at how each [`Place`] +within is accessed. For each such [`Place`], it constructs a +corresponding [`MovePathIndex`]. It also records when/where that +particular move path is moved/initialized, but we'll get to that in a +later section. + +[`Gatherer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/builder/struct.Gatherer.html +[`MoveData::gather_moves`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#method.gather_moves + +### Illegal move paths + +We don't actually move-paths for **every** [`Place`] that gets used. +In particular, if it is illegal to move from a [`Place`], then there +is no need for a [`MovePathIndex`]. Some examples: + +- You cannot move from a static variable, so we do not create a [`MovePathIndex`] + for static variables. +- You cannot move an individual element of an array, so if we have e.g. `foo: [String; 3]`, + there would be no move-path for `foo[1]`. +- You cannot move from inside of a borrowed reference, so if we have e.g. `foo: &String`, + there would be no move-path for `*foo`. + +These rules are enforced by the [`move_path_for`] function, which +converts a [`Place`] into a [`MovePathIndex`] -- in error cases like +those just discussed, the function returns an `Err`. This in turn +means we don't have to bother tracking whether those places are +initialized (which lowers overhead). + +[`move_path_for`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/builder/struct.Gatherer.html#method.move_path_for + +## Looking up a move-path + +If you have a [`Place`] and you would like to convert it to a [`MovePathIndex`], you +can do that using the [`MovePathLookup`] structure found in the [`rev_lookup`] field +of [`MoveData`]. There are two different methods: + +[`MovePathLookup`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html +[`rev_lookup`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#structfield.rev_lookup + +- [`find_local`], which takes a [`mir::Local`] representing a local + variable. This is the easier method, because we **always** create a + [`MovePathIndex`] for every local variable. +- [`find`], which takes an arbitrary [`Place`]. This method is a bit + more annoying to use, precisely because we don't have a + [`MovePathIndex`] for **every** [`Place`] (as we just discussed in + the "illegal move paths" section). Therefore, [`find`] returns a + [`LookupResult`] indicating the closest path it was able to find + that exists (e.g., for `foo[1]`, it might return just the path for + `foo`). + +[`find`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html#method.find +[`find_local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html#method.find_local +[`mir::Local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/struct.Local.html +[`LookupResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/enum.LookupResult.html + +## Cross-references + +As we noted above, move-paths are stored in a big vector and +referenced via their [`MovePathIndex`]. However, within this vector, +they are also structured into a tree. So for example if you have the +[`MovePathIndex`] for `a.b.c`, you can go to its parent move-path +`a.b`. You can also iterate over all children paths: so, from `a.b`, +you might iterate to find the path `a.b.c` (here you are iterating +just over the paths that the user **actually referenced**, not all +**possible** paths the user could have done). These references are +used for example in the [`has_any_child_of`] function, which checks +whether the dataflow results contain a value for the given move-path +(e.g., `a.b`) or any child of that move-path (e.g., `a.b.c`). + +[`Place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/enum.Place.html +[`has_any_child_of`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/at_location/struct.FlowAtLocation.html#method.has_any_child_of + diff --git a/src/borrow_check/region_inference.md b/src/borrow_check/region_inference.md index 9034af8a..47b21b0d 100644 --- a/src/borrow_check/region_inference.md +++ b/src/borrow_check/region_inference.md @@ -1,11 +1,11 @@ -# MIR-based region checking (NLL) +# Region inference (NLL) The MIR-based region checking code is located in [the `rustc_mir::borrow_check::nll` module][nll]. (NLL, of course, stands for "non-lexical lifetimes", a term that will hopefully be deprecated once they become the standard kind of lifetime.) -[nll]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check/nll +[nll]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/index.html The MIR-based region analysis consists of two major functions: diff --git a/src/borrow_check/type_check.md b/src/borrow_check/type_check.md new file mode 100644 index 00000000..10d96155 --- /dev/null +++ b/src/borrow_check/type_check.md @@ -0,0 +1,11 @@ +# The MIR type-check + +A key component of the borrow check is the +[MIR type-check](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/type_check/index.html). +This check walks the MIR and does a complete "type check" -- the same +kind you might find in any other language. In the process of doing +this type-check, we also uncover the region constraints that apply to +the program. + + +