Merge pull request #190 from nikomatsakis/mir-borrow-check-1

start to document MIR borrow check
2018-09-11 16:52:54 -04:00 · 2018-09-11 16:52:54 -04:00 · 93b5783fec
parent b53fd08f17 c749bb2272
commit 93b5783fec
10 changed files with 262 additions and 67 deletions
--- a/.travis.yml
+++ b/.travis.yml
@@ -3,7 +3,7 @@ cache:
 - cargo
 before_install:
 - shopt -s globstar
- MAX_LINE_LENGTH=80 bash ci/check_line_lengths.sh src/**/*.md
+- MAX_LINE_LENGTH=100 bash ci/check_line_lengths.sh src/**/*.md
 install:
 - source ~/.cargo/env || true
 - bash ci/install.sh
--- a/ci/check_line_lengths.sh
+++ b/ci/check_line_lengths.sh
@@ -2,7 +2,7 @@

 if [ "$1" == "--help" ]; then
    echo 'Usage:'
-    echo '  MAX_LINE_LENGTH=80' "$0" 'src/**/*.md'
+    echo '  MAX_LINE_LENGTH=100' "$0" 'src/**/*.md'
    exit 1
 fi

--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -53,9 +53,12 @@
    - [MIR construction](./mir/construction.md)
    - [MIR visitor and traversal](./mir/visitor.md)
    - [MIR passes: getting the MIR for a function](./mir/passes.md)
-    - [MIR borrowck](./mir/borrowck.md)
-      - [MIR-based region checking (NLL)](./mir/regionck.md)
    - [MIR optimizations](./mir/optimizations.md)
+- [The borrow checker](./borrow_check.md)
+    - [Tracking moves and initialization](./borrow_check/moves_and_initialization.md)
+      - [Move paths](./borrow_check/moves_and_initialization/move_paths.md)
+    - [MIR type checker](./borrow_check/type_check.md)
+    - [Region inference](./borrow_check/region_inference.md)
 - [Constant evaluation](./const-eval.md)
    - [miri const evaluator](./miri.md)
 - [Parameter Environments](./param_env.md)
--- a/src/appendix/glossary.md
+++ b/src/appendix/glossary.md
@@ -40,7 +40,7 @@ MIR                     |  the Mid-level IR that is created after type-checking
 miri                    |  an interpreter for MIR used for constant evaluation ([see more](./miri.html))
 normalize               |  a general term for converting to a more canonical form, but in the case of rustc typically refers to [associated type normalization](./traits/associated-types.html#normalize)
 newtype                 |  a "newtype" is a wrapper around some other type (e.g., `struct Foo(T)` is a "newtype" for `T`). This is commonly used in Rust to give a stronger type for indices.
-NLL                     | [non-lexical lifetimes](./mir/regionck.html), an extension to Rust's borrowing system to make it be based on the control-flow graph.
+NLL                     | [non-lexical lifetimes](./borrow_check/region_inference.html), an extension to Rust's borrowing system to make it be based on the control-flow graph.
 node-id or NodeId       |  an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
 obligation              |  something that must be proven by the trait system ([see more](traits/resolution.html))
 projection              |  a general term for a "relative path", e.g. `x.f` is a "field projection", and `T::Item` is an ["associated type projection"](./traits/goals-and-clauses.html#trait-ref)
@@ -53,7 +53,7 @@ rib                     |  a data structure in the name resolver that keeps trac
 sess                    |  the compiler session, which stores global data used throughout compilation
 side tables             |  because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
 sigil                   |  like a keyword but composed entirely of non-alphanumeric tokens. For example, `&` is a sigil for references.
-skolemization           |  a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./mir/regionck.html#skol) for more details.
+skolemization           |  a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./borrow_check/region_inference.html#skol) for more details.
 soundness               |  soundness is a technical term in type theory. Roughly, if a type system is sound, then if a program type-checks, it is type-safe; i.e. I can never (in safe rust) force a value into a variable of the wrong type. (see "completeness").
 span                    |  a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
 substs                  |  the substitutions for a given generic type or item (e.g. the `i32`, `u32` in `HashMap<i32, u32>`)
--- a/src/borrow_check.md
+++ b/src/borrow_check.md
@@ -0,0 +1,63 @@
+# MIR borrow check
+
+The borrow check is Rust's "secret sauce" – it is tasked with
+enforcing a number of properties:
+
+- That all variables are initialized before they are used.
+- That you can't move the same value twice.
+- That you can't move a value while it is borrowed.
+- That you can't access a place while it is mutably borrowed (except through
+  the reference).
+- That you can't mutate a place while it is shared borrowed.
+- etc
+
+At the time of this writing, the code is in a state of transition. The
+"main" borrow checker still works by processing [the HIR](hir.html),
+but that is being phased out in favor of the MIR-based borrow checker.
+Accordingly, this documentation focuses on the new, MIR-based borrow
+checker.
+
+Doing borrow checking on MIR has several advantages:
+
+- The MIR is *far* less complex than the HIR; the radical desugaring
+  helps prevent bugs in the borrow checker. (If you're curious, you
+  can see
+  [a list of bugs that the MIR-based borrow checker fixes here][47366].)
+- Even more importantly, using the MIR enables ["non-lexical lifetimes"][nll],
+  which are regions derived from the control-flow graph.
+
+[47366]: https://github.com/rust-lang/rust/issues/47366
+[nll]: http://rust-lang.github.io/rfcs/2094-nll.html
+
+### Major phases of the borrow checker
+
+The borrow checker source is found in
+[the `rustc_mir::borrow_check` module][b_c]. The main entry point is
+the [`mir_borrowck`] query.
+
+[b_c]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html
+[`mir_borrowck`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/fn.mir_borrowck.html
+
+- We first create a **local copy** of the MIR. In the coming steps,
+  we will modify this copy in place so that its types (and other data)
+  refer to the new regions that we are computing.
+- We then invoke [`replace_regions_in_mir`] to modify our local MIR.
+  Among other things, this function will replace all of the [regions](./appendix/glossary.html) in
+  the MIR with fresh [inference variables](./appendix/glossary.html).
+- Next, we perform a number of
+  [dataflow analyses](./appendix/background.html#dataflow) that
+  compute what data is moved and when.
+- We then do a [second type check](borrow_check/type_check.html) across the MIR:
+  the purpose of this type check is to determine all of the constraints between
+  different regions.
+- Next, we do [region inference](borrow_check/region_inference.html), which computes
+  the value of each region: basically, a set of points in the control-flow graph.
+- At this point, we can compute the "borrows in scope" at each point.
+- Finally, we do a second walk over the MIR, looking at the actions it
+  does and reporting errors. For example, if we see a statement like
+  `*a + 1`, then we would check that the variable `a` is initialized
+  and that it is not mutably borrowed, as either of those would
+  require an error to be reported.
+  - Doing this check requires the results of all the previous analyses.
+
+[`replace_regions_in_mir`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/fn.replace_regions_in_mir.html
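
The final walk described in `borrow_check.md` above can be pictured with a toy model. This is only a sketch, not rustc's data structures: `PointState`, `AccessError`, and `check_read` are hypothetical names, and places are reduced to plain strings. It assumes the earlier analyses have already produced, for one program point, the set of initialized places and the set of places with a mutable borrow in scope.

```rust
use std::collections::HashSet;

/// Hypothetical summary of the earlier analyses at a single program point.
struct PointState {
    /// Places known to be initialized here (from the move/initialization dataflow).
    initialized: HashSet<&'static str>,
    /// Places with a mutable borrow in scope here (from region inference).
    mutably_borrowed: HashSet<&'static str>,
}

#[derive(Debug, PartialEq)]
enum AccessError {
    Uninitialized(&'static str),
    MutablyBorrowed(&'static str),
}

/// Check a direct read of `place`, such as the `a` in `*a + 1`.
fn check_read(state: &PointState, place: &'static str) -> Result<(), AccessError> {
    if !state.initialized.contains(place) {
        return Err(AccessError::Uninitialized(place));
    }
    if state.mutably_borrowed.contains(place) {
        return Err(AccessError::MutablyBorrowed(place));
    }
    Ok(())
}

fn main() {
    let state = PointState {
        initialized: HashSet::from(["a", "b"]),
        mutably_borrowed: HashSet::from(["b"]),
    };
    assert_eq!(check_read(&state, "a"), Ok(()));
    assert_eq!(check_read(&state, "b"), Err(AccessError::MutablyBorrowed("b")));
    assert_eq!(check_read(&state, "c"), Err(AccessError::Uninitialized("c")));
}
```

The point of the sketch is only that the last phase is a lookup against results computed earlier, which is why it has to run after all the other analyses.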
--- a/src/borrow_check/moves_and_initialization.md
+++ b/src/borrow_check/moves_and_initialization.md
@@ -0,0 +1,50 @@
+# Tracking moves and initialization
+
+Part of the borrow checker's job is to track which variables are
+"initialized" at any given point in time -- this also requires
+figuring out where moves occur and tracking those.
+
+## Initialization and moves
+
+From a user's perspective, initialization -- giving a variable some
+value -- and moves -- transferring ownership to another place -- might
+seem like distinct topics. Indeed, our borrow checker error messages
+often talk about them differently. But **within the borrow checker**,
+they are not nearly as separate. Roughly speaking, the borrow checker
+tracks the set of "initialized places" at any point in the source
+code. Assigning to a previously uninitialized local variable adds it
+to that set; moving from a local variable removes it from that set.
+
+Consider this example:
+
+```rust,ignore
+fn foo() {
+    let a: Vec<u32>;
+    
+    // a is not initialized yet
+    
+    a = vec![22];
+    
+    // a is initialized here
+    
+    std::mem::drop(a); // a is moved here
+    
+    // a is no longer initialized here
+
+    let l = a.len(); //~ ERROR
+}
+```
+
+Here you can see that `a` starts off as uninitialized; once it is
+assigned, it becomes initialized. But when `drop(a)` is called, that
+moves `a` into the call, and hence it becomes uninitialized again.
+
+## Subsections
+
+To make it easier to peruse, this section is broken into a number of
+subsections:
+
+- [Move paths](./moves_and_initialization/move_paths.html): the
+  *move path* concept that we use to track which local variables (or parts of
+  local variables, in some cases) are initialized.
+- TODO *Rest not yet written* =)
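
The "set of initialized places" view from `moves_and_initialization.md` above can be sketched as follows. This is a deliberately simplified model, assuming straight-line code and string-named places; `InitState` and its methods are hypothetical and do not correspond to any rustc type. The real analysis is a dataflow computation over the MIR control-flow graph.

```rust
use std::collections::BTreeSet;

/// Toy model: the set of places that are initialized at the current point.
#[derive(Default)]
struct InitState {
    initialized: BTreeSet<String>,
}

impl InitState {
    /// Assigning to a place adds it to the set.
    fn assign(&mut self, place: &str) {
        self.initialized.insert(place.to_string());
    }

    /// Using a place requires it to be in the set.
    fn use_place(&self, place: &str) -> Result<(), String> {
        if self.initialized.contains(place) {
            Ok(())
        } else {
            Err(format!("use of uninitialized or moved place `{}`", place))
        }
    }

    /// Moving from a place is a use that also removes it from the set.
    fn move_out(&mut self, place: &str) -> Result<(), String> {
        self.use_place(place)?;
        self.initialized.remove(place);
        Ok(())
    }
}

fn main() {
    // Mirrors the `foo` example: declare, assign, move, then use.
    let mut state = InitState::default();
    assert!(state.use_place("a").is_err()); // `let a: Vec<u32>;`
    state.assign("a"); // `a = vec![22];`
    assert!(state.use_place("a").is_ok());
    assert!(state.move_out("a").is_ok()); // `std::mem::drop(a);`
    assert!(state.use_place("a").is_err()); // `a.len()` is an error
}
```

Treating a move as "remove from the set" is what lets a single analysis answer both "was this ever initialized?" and "was this already moved?".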
--- a/src/borrow_check/moves_and_initialization/move_paths.md
+++ b/src/borrow_check/moves_and_initialization/move_paths.md
@@ -0,0 +1,128 @@
+# Move paths
+
+In reality, it's not enough to track initialization at the granularity
+of local variables. Rust also allows us to do moves and initialization
+at the field granularity:
+
+```rust,ignore
+fn foo() {
+    let a: (Vec<u32>, Vec<u32>) = (vec![22], vec![44]);
+    
+    // a.0 and a.1 are both initialized
+    
+    let b = a.0; // moves a.0
+    
+    // a.0 is not initialized, but a.1 still is
+
+    let c = a.0; // ERROR
+    let d = a.1; // OK
+}
+```
+
+To handle this, we track initialization at the granularity of a **move
+path**. A [`MovePath`] represents some location that the user can
+initialize, move, etc. So e.g. there is a move-path representing the
+local variable `a`, and there is a move-path representing `a.0`.  Move
+paths roughly correspond to the concept of a [`Place`] from MIR, but
+they are indexed in ways that enable us to do move analysis more
+efficiently.
+
+[`MovePath`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePath.html
+[`Place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/enum.Place.html
+
+## Move path indices
+
+Although there is a [`MovePath`] data structure, move paths are never
+referenced directly.  Instead, all the code passes around *indices* of
+type
+[`MovePathIndex`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/indexes/struct.MovePathIndex.html). If
+you need to get information about a move path, you use this index with
+the [`move_paths` field of the `MoveData`][move_paths]. For example,
+to convert a [`MovePathIndex`] `mpi` into a MIR [`Place`], you might
+access the [`MovePath::place`] field like so:
+
+```rust,ignore
+move_data.move_paths[mpi].place
+```
+
+[move_paths]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#structfield.move_paths
+[`MovePath::place`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePath.html#structfield.place
+
+## Building move paths
+
+One of the first things we do in the MIR borrow check is to construct
+the set of move paths. This is done as part of the
+[`MoveData::gather_moves`] function. This function uses a MIR visitor
+called [`Gatherer`] to walk the MIR and look at how each [`Place`]
+within is accessed. For each such [`Place`], it constructs a
+corresponding [`MovePathIndex`]. It also records when/where that
+particular move path is moved/initialized, but we'll get to that in a
+later section.
+
+[`Gatherer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/builder/struct.Gatherer.html
+[`MoveData::gather_moves`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#method.gather_moves
+
+### Illegal move paths
+
+We don't actually create a move-path for **every** [`Place`] that gets
+used.  In particular, if it is illegal to move from a [`Place`], then
+there is no need for a [`MovePathIndex`]. Some examples:
+
+- You cannot move from a static variable, so we do not create a [`MovePathIndex`]
+  for static variables.
+- You cannot move an individual element of an array, so if we have e.g. `foo: [String; 3]`,
+  there would be no move-path for `foo[1]`.
+- You cannot move from inside of a borrowed reference, so if we have e.g. `foo: &String`,
+  there would be no move-path for `*foo`.
+  
+These rules are enforced by the [`move_path_for`] function, which
+converts a [`Place`] into a [`MovePathIndex`] -- in error cases like
+those just discussed, the function returns an `Err`. This in turn
+means we don't have to bother tracking whether those places are
+initialized (which lowers overhead).
+
+[`move_path_for`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/builder/struct.Gatherer.html#method.move_path_for
+
+## Looking up a move-path
+
+If you have a [`Place`] and you would like to convert it to a [`MovePathIndex`], you 
+can do that using the [`MovePathLookup`] structure found in the [`rev_lookup`] field
+of [`MoveData`]. There are two different methods:
+
+[`MovePathLookup`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html
+[`rev_lookup`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html#structfield.rev_lookup
+
+- [`find_local`], which takes a [`mir::Local`] representing a local
+  variable. This is the easier method, because we **always** create a
+  [`MovePathIndex`] for every local variable.
+- [`find`], which takes an arbitrary [`Place`]. This method is a bit
+  more annoying to use, precisely because we don't have a
+  [`MovePathIndex`] for **every** [`Place`] (as we just discussed in
+  the "illegal move paths" section). Therefore, [`find`] returns a
+  [`LookupResult`] indicating the closest path it was able to find
+  that exists (e.g., for `foo[1]`, it might return just the path for
+  `foo`).
+  
+[`find`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html#method.find
+[`find_local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MovePathLookup.html#method.find_local
+[`mir::Local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/struct.Local.html
+[`LookupResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/enum.LookupResult.html
+
+## Cross-references
+
+As we noted above, move-paths are stored in a big vector and
+referenced via their [`MovePathIndex`]. However, within this vector,
+they are also structured into a tree. So for example if you have the
+[`MovePathIndex`] for `a.b.c`, you can go to its parent move-path
+`a.b`. You can also iterate over all children paths: so, from `a.b`,
+you might iterate to find the path `a.b.c` (here you are iterating
+just over the paths that are **actually referenced** in the source,
+not all **possible** paths that could have been referenced). These
+references are used for example in the [`has_any_child_of`] function,
+which checks whether the dataflow results contain a value for the
+given move-path (e.g., `a.b`) or any child of that move-path (e.g.,
+`a.b.c`).
+
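+[`MovePathIndex`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/indexes/struct.MovePathIndex.html
+[`MoveData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/move_paths/struct.MoveData.html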
+[`has_any_child_of`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/dataflow/at_location/struct.FlowAtLocation.html#method.has_any_child_of
+
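
The indexing, lookup, and tree structure described in `move_paths.md` above can be sketched with a small arena. This is a toy model under stated assumptions: the type names echo the ones in the text, but the fields, the `add` helper, and the string-based places (with `.` as the only projection) are invented for illustration and are not rustc's definitions.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct MovePathIndex(usize);

struct MovePath {
    place: String,
    parent: Option<MovePathIndex>,
    children: Vec<MovePathIndex>,
}

#[derive(Debug, PartialEq, Eq)]
enum LookupResult {
    /// The place has its own move path.
    Exact(MovePathIndex),
    /// Only an ancestor (if any) has a move path.
    Parent(Option<MovePathIndex>),
}

#[derive(Default)]
struct MoveData {
    move_paths: Vec<MovePath>,
    rev_lookup: HashMap<String, MovePathIndex>,
}

impl MoveData {
    /// Hypothetical helper: register a move path and link it to its parent.
    fn add(&mut self, place: &str, parent: Option<MovePathIndex>) -> MovePathIndex {
        let mpi = MovePathIndex(self.move_paths.len());
        self.move_paths.push(MovePath { place: place.to_string(), parent, children: Vec::new() });
        if let Some(p) = parent {
            self.move_paths[p.0].children.push(mpi);
        }
        self.rev_lookup.insert(place.to_string(), mpi);
        mpi
    }

    /// Find the move path for `place`, or else the closest enclosing one.
    fn find(&self, place: &str) -> LookupResult {
        if let Some(&mpi) = self.rev_lookup.get(place) {
            return LookupResult::Exact(mpi);
        }
        let mut prefix = place;
        while let Some(dot) = prefix.rfind('.') {
            prefix = &prefix[..dot];
            if let Some(&mpi) = self.rev_lookup.get(prefix) {
                return LookupResult::Parent(Some(mpi));
            }
        }
        LookupResult::Parent(None)
    }

    /// Is `mpi`, or any path below it in the tree, a member of `set`?
    fn has_any_child_of(&self, mpi: MovePathIndex, set: &[MovePathIndex]) -> bool {
        let mut stack = vec![mpi];
        while let Some(cur) = stack.pop() {
            if set.contains(&cur) {
                return true;
            }
            stack.extend(self.move_paths[cur.0].children.iter().copied());
        }
        false
    }
}

fn main() {
    let mut data = MoveData::default();
    let a = data.add("a", None);
    let ab = data.add("a.b", Some(a));
    let abc = data.add("a.b.c", Some(ab));

    assert_eq!(data.move_paths[abc.0].place, "a.b.c");
    assert_eq!(data.move_paths[abc.0].parent, Some(ab));
    assert_eq!(data.find("a.b.c"), LookupResult::Exact(abc));
    assert_eq!(data.find("a.b.d"), LookupResult::Parent(Some(ab)));
    assert!(data.has_any_child_of(ab, &[abc]));
    assert!(!data.has_any_child_of(abc, &[a]));
}
```

Passing small `Copy` indices around instead of references keeps the tree in one vector and sidesteps borrowing the arena while it is being walked, which is one plausible reason for the index-based design.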
--- a/src/borrow_check/region_inference.md
+++ b/src/borrow_check/region_inference.md
@@ -1,11 +1,11 @@
-# MIR-based region checking (NLL)
+# Region inference (NLL)

 The MIR-based region checking code is located in
 [the `rustc_mir::borrow_check::nll` module][nll]. (NLL, of course,
 stands for "non-lexical lifetimes", a term that will hopefully be
 deprecated once they become the standard kind of lifetime.)

-[nll]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check/nll
+[nll]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/index.html

 The MIR-based region analysis consists of two major functions:

--- a/src/borrow_check/type_check.md
+++ b/src/borrow_check/type_check.md
@@ -0,0 +1,10 @@
+# The MIR type-check
+
+A key component of the borrow check is the
+[MIR type-check](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/type_check/index.html).
+This check walks the MIR and does a complete "type check" -- the same
+kind you might find in any other language. In the process of doing
+this type-check, we also uncover the region constraints that apply to
+the program.
+
+TODO -- elaborate further? Maybe? :)
--- a/src/mir/borrowck.md
+++ b/src/mir/borrowck.md
@@ -1,59 +0,0 @@
-# MIR borrow check
-
-The borrow check is Rust's "secret sauce" – it is tasked with
-enforcing a number of properties:
-
-- That all variables are initialized before they are used.
-- That you can't move the same value twice.
-- That you can't move a value while it is borrowed.
-- That you can't access a place while it is mutably borrowed (except through
-  the reference).
-- That you can't mutate a place while it is shared borrowed.
-- etc
-
-At the time of this writing, the code is in a state of transition. The
-"main" borrow checker still works by processing [the HIR](hir.html),
-but that is being phased out in favor of the MIR-based borrow checker.
-Doing borrow checking on MIR has two key advantages:
-
-- The MIR is *far* less complex than the HIR; the radical desugaring
-  helps prevent bugs in the borrow checker. (If you're curious, you
-  can see
-  [a list of bugs that the MIR-based borrow checker fixes here][47366].)
-- Even more importantly, using the MIR enables ["non-lexical lifetimes"][nll],
-  which are regions derived from the control-flow graph.
-
-[47366]: https://github.com/rust-lang/rust/issues/47366
-[nll]: http://rust-lang.github.io/rfcs/2094-nll.html
-
-### Major phases of the borrow checker
-
-The borrow checker source is found in
-[the `rustc_mir::borrow_check` module][b_c]. The main entry point is
-the `mir_borrowck` query. At the time of this writing, MIR borrowck can operate
-in several modes, but this text will describe only the mode when NLL is enabled
-(what you get with `#![feature(nll)]`).
-
-[b_c]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check
-
-The overall flow of the borrow checker is as follows:
-
-- We first create a **local copy** C of the MIR. In the coming steps,
-  we will modify this copy in place to modify the types and things to
-  include references to the new regions that we are computing.
-- We then invoke `nll::replace_regions_in_mir` to modify this copy C.
-  Among other things, this function will replace all of the regions in
-  the MIR with fresh [inference variables](./appendix/glossary.html).
-  - (More details can be found in [the regionck section](./mir/regionck.html).)
-- Next, we perform a number of [dataflow
-  analyses](./appendix/background.html#dataflow)
-  that compute what data is moved and when. The results of these analyses
-  are needed to do both borrow checking and region inference.
-- Using the move data, we can then compute the values of all the regions in the
-  MIR.
-  - (More details can be found in [the NLL section](./mir/regionck.html).)
-- Finally, the borrow checker itself runs, taking as input (a) the
-  results of move analysis and (b) the regions computed by the region
-  checker. This allows us to figure out which loans are still in scope
-  at any particular point.
-