describe region inference and member constraints in some detail

This commit is contained in:
Niko Matsakis 2019-06-20 09:43:48 -04:00 committed by Who? Me?!
parent 00be9ef01b
commit 7ba8e25bb8
5 changed files with 470 additions and 72 deletions

View File

@ -77,6 +77,8 @@
- [MIR type checker](./borrow_check/type_check.md)
- [Region inference](./borrow_check/region_inference.md)
- [Constraint propagation](./borrow_check/region_inference/constraint_propagation.md)
- [Lifetime parameters](./borrow_check/region_inference/lifetime_parameters.md)
- [Member constraints](./borrow_check/region_inference/member_constraints.md)
- [Placeholders and universes](./borrow_check/region_inference/placeholders_and_universes.md)
- [Closure constraints](./borrow_check/region_inference/closure_constraints.md)
- [Errror reporting](./borrow_check/region_inference/error_reporting.md)

View File

@ -71,6 +71,8 @@ TODO: write about _how_ these regions are computed.
[`UniversalRegions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/universal_regions/struct.UniversalRegions.html
<a name="region-variables"></a>
## Region variables
The value of a region can be thought of as a **set**. This set contains all

View File

@ -1,88 +1,121 @@
# Constraint propagation
The main work of the region inference is **constraint
propagation**. This means processing the set of constraints to compute
the final values for all the region variables.
The main work of the region inference is **constraint propagation**,
which is done in the [`propagate_constraints`] function. There are
three sorts of constraints that are used in NLL, and we'll explain how
`propagate_constraints` works by "layering" those sorts of constraints
on one at a time (each of them is fairly independent from the others):
## Kinds of constraints
- liveness constraints (`R live at E`), which arise from liveness;
- outlives constraints (`R1: R2`), which arise from subtyping;
- [member constraints][m_c] (`member R_m of [R_c...]`), which arise from impl Trait.
Each kind of constraint is handled somewhat differently by the region inferencer.
[`propagate_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.propagate_constraints
[m_c]: ./member_constraints.html
### Liveness constraints
In this chapter, we'll explain the "heart" of constraint propagation,
covering both liveness and outlives constraints.
## Notation and high-level concepts
Conceptually, region inference is a "fixed-point" computation. It is
given some set of constraints `{C}` and it computes a set of values
`Values: R -> {E}` that maps each region `R` to a set of elements
`{E}` (see [here][riv] for more notes on region elements):
- Initially, each region is mapped to an empty set, so `Values(R) =
{}` for all regions `R`.
- Next, we process the constraints repeatedly until a fixed-point is reached:
- For each constraint C:
- Update `Values` as needed to satisfy the constraint
[riv]: ../region-inference.html#region-variables
As a simple example, if we have a liveness constraint `R live at E`,
then we can apply `Values(R) = Values(R) union {E}` to make the
constraint be satisfied. Similarly, if we have an outlives constraints
`R1: R2`, we can apply `Values(R1) = Values(R1) union Values(R2)`.
(Member constraints are more complex and we discuss them below.)
In practice, however, we are a bit more clever. Instead of applying
the constraints in a loop, we can analyze the constraints and figure
out the correct order to apply them, so that we only have to apply
each constraint once in order to find the final result.
Similarly, in the implementation, the `Values` set is stored in the
`scc_values` field, but they are indexed not by a *region* but by a
*strongly connected component* (SCC). SCCs are an optimization that
avoids a lot of redundant storage and computation. They are explained
in the section on outlives constraints.
## Liveness constraints
A **liveness constraint** arises when some variable whose type
includes a region R is live at some point P. This simply means that
the value of R must include the point P. Liveness constraints are
computed by the MIR type checker.
We represent them by keeping a (sparse) bitset for each region
variable, which is the field [`liveness_constraints`], of type
[`LivenessValues`]
A liveness constraint `R live at E` is satisfied if `E` is a member of
`Values(R)`. So to "apply" such a constraint to `Values`, we just have
to compute `Values(R) = Values(R) union {E}`.
The liveness values are computed in the type-check and passes to the
region inference upon creation in the `liveness_constraints` argument.
These are not represented as individual constraints like `R live at E`
though; instead, we store a (sparse) bitset per region variable (of
type [`LivenessValues`]). This way we only need a single bit for each
liveness constraint.
[`liveness_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#structfield.liveness_constraints
[`LivenessValues`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/values/struct.LivenessValues.html
### Outlives constraints
One thing that is worth mentioning: All lifetime parameters are always
considered to be live over the entire function body. This is because
they correspond to some portion of the *caller's* execution, and that
execution clearly includes the time spent in this function, since the
caller is waiting for us to return.
## Outlives constraints
An outlives constraint `'a: 'b` indicates that the value of `'a` must
be a **superset** of the value of `'b`. On creation, we are given a
set of outlives constraints in the form of a
[`ConstraintSet`]. However, to work more efficiently with outlives
constraints, they are [converted into the form of a graph][graph-fn],
where the nodes of the graph are region variables (`'a`, `'b`) and
each constraint `'a: 'b` induces an edge `'a -> 'b`. This conversion
happens in the [`RegionInferenceContext::new`] function that creates
the inference context.
be a **superset** of the value of `'b`. That is, an outlives
constraint `R1: R2` is satisfied if `Values(R1)` is a superset of
`Values(R2)`. So to "apply" such a constraint to `Values`, we just
have to compute `Values(R1) = Values(R1) union Values(R2)`.
One observation that follows from this is that if you have `R1: R2`
and `R2: R1`, then `R1 = R2` must be true. Similarly, if you have:
```
R1: R2
R2: R3
R3: R4
R4: R1
```
then `R1 = R2 = R3 = R4` follows. We take advantage of this to make things
much faster, as described shortly.
In the code, the set of outlives constraints is given to the region
inference context on creation in a parameter of type
[`ConstraintSet`]. The constraint set is basically just a list of `'a:
'b` constraints.
### The outlives constraint graph and SCCs
In order to work more efficiently with outlives constraints, they are
[converted into the form of a graph][graph-fn], where the nodes of the
graph are region variables (`'a`, `'b`) and each constraint `'a: 'b`
induces an edge `'a -> 'b`. This conversion happens in the
[`RegionInferenceContext::new`] function that creates the inference
context.
[`ConstraintSet`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/constraints/struct.ConstraintSet.html
[graph-fn]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/constraints/struct.ConstraintSet.html#method.graph
[`RegionInferenceContext::new`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.new
### Member constraints
A member constraint `'m member of ['c_1..'c_N]` expresses that the
region `'m` must be *equal* to some **choice regions** `'c_i` (for
some `i`). These constraints cannot be expressed by users, but they arise
from `impl Trait` due to its lifetime capture rules. Consinder a function
such as the following:
```rust
fn make(a: &'a u32, b: &'b u32) -> impl Trait<'a, 'b> { .. }
```
Here, the true return type (often called the "hidden type") is only
permitted to capture the lifeimes `'a` or `'b`. You can kind of see
this more clearly by desugaring that `impl Trait` return type into its
more explicit form:
```rust
type MakeReturn<'x, 'y> = impl Trait<'x, 'y>;
fn make(a: &'a u32, b: &'b u32) -> MakeReturn<'a, 'b> { .. }
```
Here, the idea is that the hidden type must be some type that could
have been written in place of the `impl Trait<'x, 'y>` -- but clearly
such a type can only reference the regions `'x` or `'y` (or
`'static`!), as those are the only names in scope. This limitation is
then translated into a restriction to only access `'a` or `'b` because
we are returning `MakeReturn<'a, 'b>`, where `'x` and `'y` have been
replaced with `'a` and `'b` respectively.
## SCCs in the outlives constraint graph
The most common sort of constraint in practice are outlives
constraints like `'a: 'b`. Such a cosntraint means that `'a` is a
superset of `'b`. So what happens if we have two regions `'a` and `'b`
that mutually outlive one another, like so?
```
'a: 'b
'b: 'a
```
In this case, we can conclude that `'a` and `'b` must be equal
sets. In fact, it doesn't have to be just two regions. We could create
an extended "chain" of outlives constraints:
When using a graph representation, we can detect regions that must be equal
by looking for cycles. That is, if you have a constraint like
```
'a: 'b
@ -91,11 +124,8 @@ an extended "chain" of outlives constraints:
'd: 'a
```
Here, we know that `'a..'d` are all equal to one another.
As mentioned above, an outlives constraint like `'a: 'b` can be viewed
as an edge in a graph `'a -> 'b`. Cycles in this graph indicate regions
that mutually outlive one another and hence must be equal.
then this will correspond to a cycle in the graph containing the
elements `'a...'d`.
Therefore, one of the first things that we do in propagating region
values is to compute the **strongly connected components** (SCCs) in
@ -138,9 +168,55 @@ superset of the value of `S1`. One crucial thing is that this graph of
SCCs is always a DAG -- that is, it never has cycles. This is because
all the cycles have been removed to form the SCCs themselves.
## How constraint propagation works
### Applying liveness constraints to SCCs
The liveness constraints that come in from the type-checker are
expressed in terms of regions -- that is, we have a map like
`Liveness: R -> {E}`. But we want our final result to be expressed
in terms of SCCs -- we can integrate these liveness constraints very
easily just by taking the union:
```
for each region R:
let S by the SCC that contains R
Values(S) = Values(S) union Liveness(R)
```
In the region inferencer, this step is done in [`RegionInferenceContext::new`].
### Applying outlives constraints
Once we have computed the DAG of SCCs, we use that to structure out
entire computation. If we have an edge `S1 -> S2` between two SCCs,
that means that `Values(S1) >= Values(S2)` must hold. So, to compute
the value of `S1`, we first compute the values of each successor `S2`.
Then we simply union all of those values together. To use a
quasi-iterator-like notation:
```
Values(S1) =
s1.successors()
.map(|s2| Values(s2))
.union()
```
In the code, this work starts in the [`propagate_constraints`]
function, which iterates over all the SCCs. For each SCC S1, we
compute its value by first computing the value of its
successors. Since SCCs form a DAG, we don't have to be conecrned about
cycles, though we do need to keep a set around to track whether we
have already processed a given SCC or not. For each successor S2, once
we have computed S2's value, we can union those elements into the
value for S1. (Although we have to be careful in this process to
properly handle [higher-ranked
placeholders](./placeholders_and_universes.html). Note that the value
for S1 already contains the liveness constraints, since they were
added in [`RegionInferenceContext::new`].
Once that process is done, we now have the "minimal value" for S1,
taking into account all of the liveness and outlives
constraints. However, in order to complete the process, we must also
consider [member constraints][m_c], which are described in [a later
section][m_c].
The main work of constraint propagation is done in the
`propagation_constraints` function.
[`propagate_constraints`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.propagate_constraints

View File

@ -0,0 +1,125 @@
# Universal regions
"Universal regions" is the name that the code uses to refer to "named
lifetimes" -- e.g., lifetime parameters and `'static`. The name
derives from the fact that such lifetimes are "universally quantified"
(i.e., we must make sure the code is true for all values of those
lifetimes). It is worth spending a bit of discussing how lifetime
parameters are handled during region inference. Consider this example:
```rust
fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'b u32 {
x
}
```
This example is intended not to compile, because we are returning `x`,
which has type `&'a u32`, but our signature promises that we will
return a `&'b u32` value. But how are lifetimes like `'a` and `'b
integrated into region inference, and how this error wind up being
detected?
## Universal regions and their relationships to one another
Early on in region inference, one of the first things we do is to
construct a [`UniversalRegions`] struct. This struct tracks the
various universal regions in scope on a particular function. We also
create a [`UniversalRegionRelations`] struct, which tracks their
relationships to one another. So if you have e.g. `where 'a: 'b`, then
the [`UniversalRegionRelations`] struct would track that `'a: 'b` is
known to hold (which could be tested with the [`outlives`] function.
[`UniversalRegions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/universal_regions/struct.UniversalRegions.html
[`UniversalRegionsRelations`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/type_check/free_region_relations/struct.UniversalRegionRelations.html
[`outlives`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/type_check/free_region_relations/struct.UniversalRegionRelations.html#method.outlives
## Everything is a region variable
One important aspect of how NLL region inference works is that **all
lifetimes** are represented as numbered variables. This means that the
only variant of [`ty::RegionKind`] that we use is the [`ReVar`]
variant. These region variables are broken into two major categories,
based on their index:
[`ty::RegionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.RegionKind.html
[`ReVar`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.RegionKind.html#variant.ReVar
- 0..N: universal regions -- the ones we are discussing here. In this
case, the code must be correct with respect to any value of those
variables that meets the declared relationships.
- N..M: existential regions -- inference variables where the region
inferencer is tasked with finding *some* suitable value.
In fact, the universal regions can be further subdivided based on
where they were brought into scope (see the [`RegionClassification`]
type). These subdivions are not important for the topics discussed
here, but become important when we consider [closure constraint
propagation](./closure_constraints.html), so we discuss them there.
[`RegionClassification`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/universal_regions/enum.RegionClassification.html#variant.Local
## Universal lifetimes as the elements of a region's value
As noted previously, the value that we infer for each region is a set
`{E}`. The elements of this set can be points in the control-flow
graph, but they can also be an element `end('a)` corresponding to each
universal lifetime `'a`. If the value for some region `R0` includes
`end('a`), then this implies that R0 must extend until the end of `'a`
in the caller.
## The "value" of a universal region
During region inference, we compute a value for each universal region
in the same way as we compute values for other regions. This value
represents, effectively, the **lower bound** on that universal region
-- the things that it must outlive. We now describe how we use this
value to check for errors.
## Liveness and universal regions
All universal regions have an initial liveness constraint that
includes the entire function body. This is because lifetime parameters
are defined in the caller and must include the entirety of the
function call that invokes this particular function. In addition, each
universal region `'a` includes itself (that is, `end('a)`) in its
liveness constraint (i.e., `'a` must extend until the end of
itself). In the code, these liveness constraints are setup in
[`init_free_and_bound_regions`].
[`init_free_and_bound_regions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.init_free_and_bound_regions
## Propagating outlives constraints for universal regions
So, consider the first example of this section:
```rust
fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'b u32 {
x
}
```
Here, returning `x` requires that `&'a u32 <: &'b u32`, which gives
rise to an outlives constraint `'a: 'b`. Combined with our default liveness
constraints we get:
```
'a live at {B, end('a)} // B represents the "bunction body"
'b live at {B, end('b)}
'a: 'b
```
When we process the `'a: 'b` constraint, therefore, we will add
`end('b)` into the value for `'a`, resulting in a final value of `{B,
end('a), end('b)}`.
## Detecting errors
Once we have finished constraint propagation, we then enforce a
constraint that if some universal region `'a` includes an element
`end('b)`, then `'a: 'b` must be declared in the function's bounds. If
not, as in our example, that is an error. This check is done in the
[`check_universal_regions`] function, which simply iterates over all
universal regions, inspects their final value, and tests against the
declared [`UniversalRegionRelations`].
[`check_universal_regions`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/nll/region_infer/struct.RegionInferenceContext.html#method.check_universal_regions

View File

@ -0,0 +1,193 @@
# Member constraints
A member constraint `'m member of ['c_1..'c_N]` expresses that the
region `'m` must be *equal* to some **choice regions** `'c_i` (for
some `i`). These constraints cannot be expressed by users, but they
arise from `impl Trait` due to its lifetime capture rules. Consinder a
function such as the following:
```rust
fn make(a: &'a u32, b: &'b u32) -> impl Trait<'a, 'b> { .. }
```
Here, the true return type (often called the "hidden type") is only
permitted to capture the lifeimes `'a` or `'b`. You can kind of see
this more clearly by desugaring that `impl Trait` return type into its
more explicit form:
```rust
type MakeReturn<'x, 'y> = impl Trait<'x, 'y>;
fn make(a: &'a u32, b: &'b u32) -> MakeReturn<'a, 'b> { .. }
```
Here, the idea is that the hidden type must be some type that could
have been written in place of the `impl Trait<'x, 'y>` -- but clearly
such a type can only reference the regions `'x` or `'y` (or
`'static`!), as those are the only names in scope. This limitation is
then translated into a restriction to only access `'a` or `'b` because
we are returning `MakeReturn<'a, 'b>`, where `'x` and `'y` have been
replaced with `'a` and `'b` respectively.
## Detailed example
To help us explain member constraints in more detail, let's spell out
the `make` example in a bit more detail. First off, let's assume that
you have some dummy trait:
```rust
trait Trait<'a, 'b> { }
impl<T> Trait<'_, '_> for T { }
```
and this is the `make` function (in desugared form):
```rust
type MakeReturn<'x, 'y> = impl Trait<'x, 'y>;
fn make(a: &'a u32, b: &'b u32) -> MakeReturn<'a, 'b> {
(a, b)
}
```
What happens in this case is that the return type will be `(&'0 u32, &'1 u32)`,
where `'0` and `'1` are fresh region variables. We will have the following
region constraints:
```
'0 live at {L}
'1 live at {L}
'a: '0
'b: '1
'0 member of ['a, 'b, 'static]
'1 member of ['a, 'b, 'static]
```
Here the "liveness set" `{L}` corresponds to that subset of the body
where `'0` and `'1` are live -- basically the point from where the
return tuple is constructed to where it is returned (in fact, `'0` and
`'1` might have slightly different liveness sets, but that's not very
interesting to the point we are illustrating here).
The `'a: '0` and `'b: '1` constraints arise from subtyping. When we
construct the `(a, b)` value, it will be assigned type `(&'0 u32, &'1
u32)` -- the region variables reflect that the lifetime of this
references could be made smaller. For this value to be created from
`a` and `b`, however, we do require that:
```
(&'a u32, &'b u32) <: (&'0 u32, &'1 u32)
```
which means in turn that `&'a u32 <: &'0 u32` and hence that `'a: '0`
(and similarly that `&'b u32 <: &'1 u32`, `'b: '1`).
Note that if we ignore member constraints, the value of `'0` would be
inferred to some subset of the function body (from the liveness
constraints, which we did not write explicitly). It would never become
`'a`, because there is no need for it too -- we have a constraint that
`'a: '0`, but that just puts a "cap" on how *large* `'0` can grow to
become. Since we compute the *minimal* value that we can, we are happy
to leave `'0` as being just equal to the liveness set. This is where
member constraints come in.
## Choices are always lifetime parameters
At present, the "choice" regions from a member constraint are always
lifetime parameters from the current function. This falls out from the
placement of impl Trait, though in the future it may not be the case.
We take some advantage of this fact, as it simplifies the current
code. In particular, we don't have to consider a case like `'0 member
of ['1, 'static]`, in which the value of both `'0` and `'1` are being
inferred and hence changing. See [rust-lang/rust#61773] for more
information.
[#61773]: https://github.com/rust-lang/rust/issues/61773
## Applying member constraints
Member constraints are a bit more complex than other forms of
constraints. This is because they have a "or" quality to them -- that
is, they describe multiple choices that we must select from. E.g., in
our example constraint `'0 member of ['a, 'b, 'static]`, it might be
that `'0` is equal to `'a`, `'b`, *or* `'static`. How can we pick the
correct one? What we currently do is to look for a *minimal choice*
-- if we find one, then we will grow `'0` to be equal to that minimal
choice. To find that minimal choice, we take two factors into
consideration: lower and upper bounds.
### Lower bounds
The *lower bounds* are those lifetimes that `'0` *must outlive* --
i.e., that `'0` must be larger than. In fact, when it comes time to
apply member constraints, we've already *computed* the lower bounds of
`'0` because we computed its minimal value (or at least, the lower
bounds considering everything but member constraints).
Let `LB` be the current value of `'0`. We know then that `'0: LB` must
hold, whatever the final value of `'0` is. Therefore, we can rule out
any choice where `'choice` where `'choice: LB` does not hold.
Unfortunately, in our example, this is not very helpful. The lower
bound for `'0` will just be the liveness set `{L}`, and we know that
all the lifetime parameters outlive that set. So we are left with the
same set of choices here. (But in other examples, particularly those
with different variance, lower bound constraints may be relevant.)
### Upper bounds
The *upper bounds* are those lifetimes that *must outlive* `'0` --
i.e., that `'0` must be *smaller* then. In our example, this would be
`'a`, because we have the constraint that `'a: '0`. In more complex
examples, the chain may be more indirect.
We can use upper bounds to rule out members in a very similar way to
lower lower bounds. If UB is some upper bound, then we know that `UB:
'0` must hold, so we can rule out any choice `'choice` where `UB:
'choice` does not hold.
In our example, we would be able to reduce our choice set from `['a,
'b, 'static]` to just `['a]`. This is because `'0` has an upper bound
of `'a`, and neither `'a: 'b` nor `'a: 'static` is known to hold.
(For notes on how we collect upper bounds in the implementation, see
[the section below](#collecting).)
### Minimal choice
After applying lower and upper bounds, we can still sometimes have
multiple possibilities. For example, imagine a variant of our example
using types with the opposite variance. In that case, we would have
the constraint `'0: 'a` instead of `'a: '0`. Hence the current value
of `'0` would be `{L, 'a}`. Using this as a lower bound, we would be
able to narrow down the member choices to `['a, 'static]` because `'b:
'a` is not known to hold (but `'a: 'a` and `'static: 'a` do hold). We
would not have any upper bounds, so that would be our final set of choices.
In that case, we apply the **minimal choice** rule -- basically, if
one of our choices if smaller than the others, we can use that. In
this case, we would opt for `'a` (and not `'static`).
This choice is consistent with the general 'flow' of region
propagation, which always aims to compute a minimal value for the
region being inferred. However, it is somewhat arbitrary.
<a name="collecting"></a>
### Collecting upper bounds in the implementation
In practice, computing upper bounds is a bit inconvenient, because our
data structures are setup for the opposite. What we do is to compute
the **reverse SCC graph** (we do this lazilly and cache the result) --
that is, a graph where `'a: 'b` induces an edge `SCC('b) ->
SCC('a)`. Like the normal SCC graph, this is a DAG. We can then do a
depth-first search starting from `SCC('0)` in this graph. This will
take us to all the SCCs that must outlive `'0`.
One wrinkle is that, as we walk the "upper bound" SCCs, their values
will not yet have been fully computed. However, we **have** already
applied their liveness constraints, so we have some information about
their value. In particular, for any regions representing lifetime
parameters, their value will contain themselves (i.e., the initial
value for `'a` includes `'a` and the value for `'b` contains `'b`). So
we can collect all of the lifetime parameters that are reachable,
which is precisely what we are interested in.