Add back the `canonicalization` chapter. (#1532)
* Add back the `canonicalization` chapter. * Add a `FIXME` about reorganizing contents.
This commit is contained in:
parent
e3bda8f6fb
commit
10b168e2c7
|
|
@ -123,6 +123,7 @@
|
||||||
- [Lowering to logic](./traits/lowering-to-logic.md)
|
- [Lowering to logic](./traits/lowering-to-logic.md)
|
||||||
- [Goals and clauses](./traits/goals-and-clauses.md)
|
- [Goals and clauses](./traits/goals-and-clauses.md)
|
||||||
- [Canonical queries](./traits/canonical-queries.md)
|
- [Canonical queries](./traits/canonical-queries.md)
|
||||||
|
- [Canonicalization](./traits/canonicalization.md)
|
||||||
- [Next-gen trait solving](./solve/trait-solving.md)
|
- [Next-gen trait solving](./solve/trait-solving.md)
|
||||||
- [Invariants of the type system](./solve/invariants.md)
|
- [Invariants of the type system](./solve/invariants.md)
|
||||||
- [The solver](./solve/the-solver.md)
|
- [The solver](./solve/the-solver.md)
|
||||||
|
|
|
||||||
|
|
@ -0,0 +1,260 @@
|
||||||
|
# Canonicalization
|
||||||
|
|
||||||
|
> **NOTE**: FIXME: The content of this chapter has some overlap with
|
||||||
|
> [Next-gen trait solving Canonicalization chapter](../solve/canonicalization.html).
|
||||||
|
> It is suggested to reorganize these contents in the future.
|
||||||
|
|
||||||
|
Canonicalization is the process of **isolating** an inference value
|
||||||
|
from its context. It is a key part of implementing
|
||||||
|
[canonical queries][cq], and you may wish to read the parent chapter
|
||||||
|
to get more context.
|
||||||
|
|
||||||
|
Canonicalization is really based on a very simple concept: every
|
||||||
|
[inference variable](../type-inference.html#vars) is always in one of
|
||||||
|
two states: either it is **unbound**, in which case we don't know yet
|
||||||
|
what type it is, or it is **bound**, in which case we do. So to
|
||||||
|
isolate some data-structure T that contains types/regions from its
|
||||||
|
environment, we just walk down and find the unbound variables that
|
||||||
|
appear in T; those variables get replaced with "canonical variables",
|
||||||
|
starting from zero and numbered in a fixed order (left to right, for
|
||||||
|
the most part, but really it doesn't matter as long as it is
|
||||||
|
consistent).
|
||||||
|
|
||||||
|
[cq]: ./canonical-queries.html
|
||||||
|
|
||||||
|
So, for example, if we have the type `X = (?T, ?U)`, where `?T` and
|
||||||
|
`?U` are distinct, unbound inference variables, then the canonical
|
||||||
|
form of `X` would be `(?0, ?1)`, where `?0` and `?1` represent these
|
||||||
|
**canonical placeholders**. Note that the type `Y = (?U, ?T)` also
|
||||||
|
canonicalizes to `(?0, ?1)`. But the type `Z = (?T, ?T)` would
|
||||||
|
canonicalize to `(?0, ?0)` (as would `(?U, ?U)`). In other words, the
|
||||||
|
exact identity of the inference variables is not important – unless
|
||||||
|
they are repeated.
|
||||||
|
|
||||||
|
We use this to improve caching as well as to detect cycles and other
|
||||||
|
things during trait resolution. Roughly speaking, the idea is that if
|
||||||
|
two trait queries have the same canonical form, then they will get
|
||||||
|
the same answer. That answer will be expressed in terms of the
|
||||||
|
canonical variables (`?0`, `?1`), which we can then map back to the
|
||||||
|
original variables (`?T`, `?U`).
|
||||||
|
|
||||||
|
## Canonicalizing the query
|
||||||
|
|
||||||
|
To see how it works, imagine that we are asking to solve the following
|
||||||
|
trait query: `?A: Foo<'static, ?B>`, where `?A` and `?B` are unbound.
|
||||||
|
This query contains two unbound variables, but it also contains the
|
||||||
|
lifetime `'static`. The trait system generally ignores all lifetimes
|
||||||
|
and treats them equally, so when canonicalizing, we will *also*
|
||||||
|
replace any [free lifetime](../appendix/background.html#free-vs-bound) with a
|
||||||
|
canonical variable (Note that `'static` is actually a _free_ lifetime
|
||||||
|
variable here. We are not considering it in the typing context of the whole
|
||||||
|
program but only in the context of this trait reference. Mathematically, we
|
||||||
|
are not quantifying over the whole program, but only this obligation).
|
||||||
|
Therefore, we get the following result:
|
||||||
|
|
||||||
|
```text
|
||||||
|
?0: Foo<'?1, ?2>
|
||||||
|
```
|
||||||
|
|
||||||
|
Sometimes we write this differently, like so:
|
||||||
|
|
||||||
|
```text
|
||||||
|
for<T,L,T> { ?0: Foo<'?1, ?2> }
|
||||||
|
```
|
||||||
|
|
||||||
|
This `for<>` gives some information about each of the canonical
|
||||||
|
variables within. In this case, each `T` indicates a type variable,
|
||||||
|
so `?0` and `?2` are types; the `L` indicates a lifetime variable, so
|
||||||
|
`?1` is a lifetime. The `canonicalize` method *also* gives back a
|
||||||
|
`CanonicalVarValues` array OV with the "original values" for each
|
||||||
|
canonicalized variable:
|
||||||
|
|
||||||
|
```text
|
||||||
|
[?A, 'static, ?B]
|
||||||
|
```
|
||||||
|
|
||||||
|
We'll need this vector OV later, when we process the query response.
|
||||||
|
|
||||||
|
## Executing the query
|
||||||
|
|
||||||
|
Once we've constructed the canonical query, we can try to solve it.
|
||||||
|
To do so, we will wind up creating a fresh inference context and
|
||||||
|
**instantiating** the canonical query in that context. The idea is that
|
||||||
|
we create a substitution S from the canonical form containing a fresh
|
||||||
|
inference variable (of suitable kind) for each canonical variable.
|
||||||
|
So, for our example query:
|
||||||
|
|
||||||
|
```text
|
||||||
|
for<T,L,T> { ?0: Foo<'?1, ?2> }
|
||||||
|
```
|
||||||
|
|
||||||
|
the substitution S might be:
|
||||||
|
|
||||||
|
```text
|
||||||
|
S = [?A, '?B, ?C]
|
||||||
|
```
|
||||||
|
|
||||||
|
We can then replace the bound canonical variables (`?0`, etc) with
|
||||||
|
these inference variables, yielding the following fully instantiated
|
||||||
|
query:
|
||||||
|
|
||||||
|
```text
|
||||||
|
?A: Foo<'?B, ?C>
|
||||||
|
```
|
||||||
|
|
||||||
|
Remember that substitution S though! We're going to need it later.
|
||||||
|
|
||||||
|
OK, now that we have a fresh inference context and an instantiated
|
||||||
|
query, we can go ahead and try to solve it. The trait solver itself is
|
||||||
|
explained in more detail in [another section](./slg.html), but
|
||||||
|
suffice to say that it will compute a [certainty value][cqqr] (`Proven` or
|
||||||
|
`Ambiguous`) and have side-effects on the inference variables we've
|
||||||
|
created. For example, if there were only one impl of `Foo`, like so:
|
||||||
|
|
||||||
|
[cqqr]: ./canonical-queries.html#query-response
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
impl<'a, X> Foo<'a, X> for Vec<X>
|
||||||
|
where X: 'a
|
||||||
|
{ ... }
|
||||||
|
```
|
||||||
|
|
||||||
|
then we might wind up with a certainty value of `Proven`, as well as
|
||||||
|
creating fresh inference variables `'?D` and `?E` (to represent the
|
||||||
|
parameters on the impl) and unifying as follows:
|
||||||
|
|
||||||
|
- `'?B = '?D`
|
||||||
|
- `?A = Vec<?E>`
|
||||||
|
- `?C = ?E`
|
||||||
|
|
||||||
|
We would also accumulate the region constraint `?E: '?D`, due to the
|
||||||
|
where clause.
|
||||||
|
|
||||||
|
In order to create our final query result, we have to "lift" these
|
||||||
|
values out of the query's inference context and into something that
|
||||||
|
can be reapplied in our original inference context. We do that by
|
||||||
|
**re-applying canonicalization**, but to the **query result**.
|
||||||
|
|
||||||
|
## Canonicalizing the query result
|
||||||
|
|
||||||
|
As discussed in [the parent section][cqqr], most trait queries wind up
|
||||||
|
with a result that brings together a "certainty value" `certainty`, a
|
||||||
|
result substitution `var_values`, and some region constraints. To
|
||||||
|
create this, we wind up re-using the substitution S that we created
|
||||||
|
when first instantiating our query. To refresh your memory, we had a query
|
||||||
|
|
||||||
|
```text
|
||||||
|
for<T,L,T> { ?0: Foo<'?1, ?2> }
|
||||||
|
```
|
||||||
|
|
||||||
|
for which we made a substutition S:
|
||||||
|
|
||||||
|
```text
|
||||||
|
S = [?A, '?B, ?C]
|
||||||
|
```
|
||||||
|
|
||||||
|
We then did some work which unified some of those variables with other things.
|
||||||
|
If we "refresh" S with the latest results, we get:
|
||||||
|
|
||||||
|
```text
|
||||||
|
S = [Vec<?E>, '?D, ?E]
|
||||||
|
```
|
||||||
|
|
||||||
|
These are precisely the new values for the three input variables from
|
||||||
|
our original query. Note though that they include some new variables
|
||||||
|
(like `?E`). We can make those go away by canonicalizing again! We don't
|
||||||
|
just canonicalize S, though, we canonicalize the whole query response QR:
|
||||||
|
|
||||||
|
```text
|
||||||
|
QR = {
|
||||||
|
certainty: Proven, // or whatever
|
||||||
|
var_values: [Vec<?E>, '?D, ?E] // this is S
|
||||||
|
region_constraints: [?E: '?D], // from the impl
|
||||||
|
value: (), // for our purposes, just (), but
|
||||||
|
// in some cases this might have
|
||||||
|
// a type or other info
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The result would be as follows:
|
||||||
|
|
||||||
|
```text
|
||||||
|
Canonical(QR) = for<T, L> {
|
||||||
|
certainty: Proven,
|
||||||
|
var_values: [Vec<?0>, '?1, ?0]
|
||||||
|
region_constraints: [?0: '?1],
|
||||||
|
value: (),
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
(One subtle point: when we canonicalize the query **result**, we do not
|
||||||
|
use any special treatment for free lifetimes. Note that both
|
||||||
|
references to `'?D`, for example, were converted into the same
|
||||||
|
canonical variable (`?1`). This is in contrast to the original query,
|
||||||
|
where we canonicalized every free lifetime into a fresh canonical
|
||||||
|
variable.)
|
||||||
|
|
||||||
|
Now, this result must be reapplied in each context where needed.
|
||||||
|
|
||||||
|
## Processing the canonicalized query result
|
||||||
|
|
||||||
|
In the previous section we produced a canonical query result. We now have
|
||||||
|
to apply that result in our original context. If you recall, way back in the
|
||||||
|
beginning, we were trying to prove this query:
|
||||||
|
|
||||||
|
```text
|
||||||
|
?A: Foo<'static, ?B>
|
||||||
|
```
|
||||||
|
|
||||||
|
We canonicalized that into this:
|
||||||
|
|
||||||
|
```text
|
||||||
|
for<T,L,T> { ?0: Foo<'?1, ?2> }
|
||||||
|
```
|
||||||
|
|
||||||
|
and now we got back a canonical response:
|
||||||
|
|
||||||
|
```text
|
||||||
|
for<T, L> {
|
||||||
|
certainty: Proven,
|
||||||
|
var_values: [Vec<?0>, '?1, ?0]
|
||||||
|
region_constraints: [?0: '?1],
|
||||||
|
value: (),
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
We now want to apply that response to our context. Conceptually, how
|
||||||
|
we do that is to (a) instantiate each of the canonical variables in
|
||||||
|
the result with a fresh inference variable, (b) unify the values in
|
||||||
|
the result with the original values, and then (c) record the region
|
||||||
|
constraints for later. Doing step (a) would yield a result of
|
||||||
|
|
||||||
|
```text
|
||||||
|
{
|
||||||
|
certainty: Proven,
|
||||||
|
var_values: [Vec<?C>, '?D, ?C]
|
||||||
|
^^ ^^^ fresh inference variables
|
||||||
|
region_constraints: [?C: '?D],
|
||||||
|
value: (),
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Step (b) would then unify:
|
||||||
|
|
||||||
|
```text
|
||||||
|
?A with Vec<?C>
|
||||||
|
'static with '?D
|
||||||
|
?B with ?C
|
||||||
|
```
|
||||||
|
|
||||||
|
And finally the region constraint of `?C: 'static` would be recorded
|
||||||
|
for later verification.
|
||||||
|
|
||||||
|
(What we *actually* do is a mildly optimized variant of that: Rather
|
||||||
|
than eagerly instantiating all of the canonical values in the result
|
||||||
|
with variables, we instead walk the vector of values, looking for
|
||||||
|
cases where the value is just a canonical variable. In our example,
|
||||||
|
`values[2]` is `?C`, so that means we can deduce that `?C := ?B` and
|
||||||
|
`'?D := 'static`. This gives us a partial set of values. Anything for
|
||||||
|
which we do not find a value, we create an inference variable.)
|
||||||
|
|
||||||
Loading…
Reference in New Issue