Add a bit about various type system concepts (#697)
* add a bit on dataflow analysis * add a bit on quanitification * add a bit on debruijn index * add a bit on early and late bound params * add missing link * Typos Co-authored-by: Tshepang Lekhonkhobe <tshepang@gmail.com> * clarify dataflow example * fix formatting * fix typos * Typos Co-authored-by: Tshepang Lekhonkhobe <tshepang@gmail.com> * fix errors in background * remove dup material and make early/late intro short * adjust intro * Niko's intro Co-authored-by: Niko Matsakis <niko@alum.mit.edu> Co-authored-by: Tshepang Lekhonkhobe <tshepang@gmail.com> Co-authored-by: Niko Matsakis <niko@alum.mit.edu>
This commit is contained in:
parent
e74b4ab3a4
commit
98d000fb3c
|
|
@ -76,6 +76,7 @@
|
|||
- [Generic arguments](./generic_arguments.md)
|
||||
- [Type inference](./type-inference.md)
|
||||
- [Trait solving](./traits/resolution.md)
|
||||
- [Early and Late Bound Parameters](./early-late-bound.md)
|
||||
- [Higher-ranked trait bounds](./traits/hrtb.md)
|
||||
- [Caching subtleties](./traits/caching.md)
|
||||
- [Specialization](./traits/specialization.md)
|
||||
|
|
|
|||
|
|
@ -79,16 +79,158 @@ cycle.
|
|||
[*Static Program Analysis*](https://cs.au.dk/~amoeller/spa/) by Anders Møller
|
||||
and Michael I. Schwartzbach is an incredible resource!
|
||||
|
||||
*to be written*
|
||||
_Dataflow analysis_ is a type of static analysis that is common in many
|
||||
compilers. It describes a general technique, rather than a particular analysis.
|
||||
|
||||
The basic idea is that we can walk over a [CFG](#cfg) and keep track of what
|
||||
some value could be. At the end of the walk, we might have shown that some
|
||||
claim is true or not necessarily true (e.g. "this variable must be
|
||||
initialized"). `rustc` tends to do dataflow analyses over the MIR, since that
|
||||
is already a CFG.
|
||||
|
||||
For example, suppose we want to check that `x` is initialized before it is used
|
||||
in this snippet:
|
||||
|
||||
```rust,ignore
|
||||
fn foo() {
|
||||
let mut x;
|
||||
|
||||
if some_cond {
|
||||
x = 1;
|
||||
}
|
||||
|
||||
dbg!(x);
|
||||
}
|
||||
```
|
||||
|
||||
A CFG for this code might look like this:
|
||||
|
||||
```txt
|
||||
+------+
|
||||
| Init | (A)
|
||||
+------+
|
||||
| \
|
||||
| if some_cond
|
||||
else \ +-------+
|
||||
| \| x = 1 | (B)
|
||||
| +-------+
|
||||
| /
|
||||
+---------+
|
||||
| dbg!(x) | (C)
|
||||
+---------+
|
||||
```
|
||||
|
||||
We can do the dataflow analysis as follows: we will start off with a flag `init`
|
||||
which indicates if we know `x` is initialized. As we walk the CFG, we will
|
||||
update the flag. At the end, we can check its value.
|
||||
|
||||
So first, in block (A), the variable `x` is declared but not initialized, so
|
||||
`init = false`. In block (B), we initialize the value, so we know that `x` is
|
||||
initialized. So at the end of (B), `init = true`.
|
||||
|
||||
Block (C) is where things get interesting. Notice that there are two incoming
|
||||
edges, one from (A) and one from (B), corresponding to whether `some_cond` is true or not.
|
||||
But we cannot know that! It could be the case the `some_cond` is always true,
|
||||
so that `x` is actually always initialized. It could also be the case that
|
||||
`some_cond` depends on something random (e.g. the time), so `x` may not be
|
||||
initialized. In general, we cannot know statically (due to [Rice's
|
||||
Theorem][rice]). So what should the value of `init` be in block (C)?
|
||||
|
||||
[rice]: https://en.wikipedia.org/wiki/Rice%27s_theorem
|
||||
|
||||
Generally, in dataflow analyses, if a block has multiple parents (like (C) in
|
||||
our example), its dataflow value will be some function of all its parents (and
|
||||
of course, what happens in (C)). Which function we use depends on the analysis
|
||||
we are doing.
|
||||
|
||||
In this case, we want to be able to prove definitively that `x` must be
|
||||
initialized before use. This forces us to be conservative and assume that
|
||||
`some_cond` might be false sometimes. So our "merging function" is "and". That
|
||||
is, `init = true` in (C) if `init = true` in (A) _and_ in (B) (or if `x` is
|
||||
initialized in (C)). But this is not the case; in particular, `init = false` in
|
||||
(A), and `x` is not initialized in (C). Thus, `init = false` in (C); we can
|
||||
report an error that "`x` may not be initialized before use".
|
||||
|
||||
There is definitely a lot more that can be said about dataflow analyses. There is an
|
||||
extensive body of research literature on the topic, including a lot of theory.
|
||||
We only discussed a forwards analysis, but backwards dataflow analysis is also
|
||||
useful. For example, rather than starting from block (A) and moving forwards,
|
||||
we might have started with the usage of `x` and moved backwards to try to find
|
||||
its initialization.
|
||||
|
||||
<a name="quantified"></a>
|
||||
|
||||
## What is "universally quantified"? What about "existentially quantified"?
|
||||
|
||||
*to be written*
|
||||
In math, a predicate may be _universally quantified_ or _existentially
|
||||
quantified_:
|
||||
|
||||
- _Universal_ quantification:
|
||||
- the predicate holds if it is true for all possible inputs.
|
||||
- Traditional notation: ∀x: P(x). Read as "for all x, P(x) holds".
|
||||
- _Existential_ quantification:
|
||||
- the predicate holds if there is any input where it is true, i.e., there
|
||||
only has to be a single input.
|
||||
- Traditional notation: ∃x: P(x). Read as "there exists x such that P(x) holds".
|
||||
|
||||
In Rust, they come up in type checking and trait solving. For example,
|
||||
|
||||
```rust,ignore
|
||||
fn foo<T>()
|
||||
```
|
||||
This function claims that the function is well-typed for all types `T`: `∀ T: well_typed(foo)`.
|
||||
|
||||
Another example:
|
||||
|
||||
```rust,ignore
|
||||
fn foo<'a>(_: &'a usize)
|
||||
```
|
||||
This function claims that for any lifetime `'a` (determined by the
|
||||
caller), it is well-typed: `∀ 'a: well_typed(foo)`.
|
||||
|
||||
Another example:
|
||||
|
||||
```rust,ignore
|
||||
fn foo<F>()
|
||||
where for<'a> F: Fn(&'a u8)
|
||||
```
|
||||
This function claims that it is well-typed for all types `F` such that for all
|
||||
lifetimes `'a`, `F: Fn(&'a u8)`: `∀ F: ∀ 'a: (F: Fn(&'a u8)) => well_typed(foo)`.
|
||||
|
||||
One more example:
|
||||
|
||||
```rust,ignore
|
||||
fn foo(_: dyn Debug)
|
||||
```
|
||||
This function claims that there exists some type `T` that implements `Debug`
|
||||
such that the function is well-typed: `∃ T: (T: Debug) and well_typed(foo)`.
|
||||
|
||||
<a name="variance"></a>
|
||||
|
||||
## What is a DeBruijn Index?
|
||||
|
||||
DeBruijn indices are a way of representing which variables are bound in
|
||||
which binders using only integers. They were [originally invented][wikideb] for
|
||||
use in lambda calculus evaluation. In `rustc`, we use a similar idea for the
|
||||
[representation of generic types][sub].
|
||||
|
||||
[wikideb]: https://en.wikipedia.org/wiki/De_Bruijn_index
|
||||
[sub]: ../generics.md
|
||||
|
||||
Here is a basic example of how DeBruijn indices might be used for closures (we
|
||||
don't actually do this in `rustc` though):
|
||||
|
||||
```rust,ignore
|
||||
|x| {
|
||||
f(x) // de Bruijn index of `x` is 1 because `x` is bound 1 level up
|
||||
|
||||
|y| {
|
||||
g(x, y) // index of `x` is 2 because it is bound 2 levels up
|
||||
// index of `y` is 1 because it is bound 1 level up
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## What is co- and contra-variance?
|
||||
|
||||
Check out the subtyping chapter from the
|
||||
|
|
|
|||
|
|
@ -15,6 +15,7 @@ CTFE <div id="ctfe"/> | Short for Compile-Time Function Eval
|
|||
cx <div id="cx"/> | We tend to use "cx" as an abbreviation for context. See also `tcx`, `infcx`, etc.
|
||||
DAG <div id="dag"/> | A directed acyclic graph is used during compilation to keep track of dependencies between queries. ([see more](../queries/incremental-compilation.html))
|
||||
data-flow analysis <div id="data-flow"/> | A static analysis that figures out what properties are true at each point in the control-flow of a program; see [the background chapter for more](./background.html#dataflow).
|
||||
DeBruijn Index <div id="debruijn"> | A technique for describing which binder a variable is bound by using only integers. It has the benefit that it is invariant under variable renaming. ([see more](./background.md#debruijn))
|
||||
DefId <div id="def-id"/> | An index identifying a definition (see `librustc_middle/hir/def_id.rs`). Uniquely identifies a `DefPath`. See [the HIR chapter for more](../hir.html#identifiers-in-the-hir).
|
||||
Double pointer <div id="double-ptr"/> | A pointer with additional metadata. See "fat pointer" for more.
|
||||
drop glue <div id="drop-glue"/> | (internal) compiler-generated instructions that handle calling the destructors (`Drop`) for data types.
|
||||
|
|
|
|||
|
|
@ -0,0 +1,107 @@
|
|||
# Early and Late Bound Variables
|
||||
|
||||
In Rust, item definitions (like `fn`) can often have generic parameters, which
|
||||
are always [_universally_ quantified][quant]. That is, if you have a function
|
||||
like
|
||||
|
||||
```rust
|
||||
fn foo<T>(x: T) { }
|
||||
```
|
||||
|
||||
this function is defined "for all T" (not "for some specific T", which would be
|
||||
[_existentially_ quantified][quant]).
|
||||
|
||||
[quant]: ./appendix/background.md#quantified
|
||||
|
||||
While Rust *items* can be quantified over types, lifetimes, and constants, the
|
||||
types of values in Rust are only ever quantified over lifetimes. So you can
|
||||
have a type like `for<'a> fn(&'a u32)`, which represents a function pointer
|
||||
that takes a reference with any lifetime, or `for<'a> dyn Trait<'a>`, which is
|
||||
a `dyn` trait for a trait implemented for any lifetime; but we have no type
|
||||
like `for<T> fn(T)`, which would be a function that takes a value of *any type*
|
||||
as a parameter. This is a consequence of monomorphization -- to support a value
|
||||
of type `for<T> fn(T)`, we would need a single function pointer that can be
|
||||
used for a parameter of any type, but in Rust we generate customized code for
|
||||
each parameter type.
|
||||
|
||||
One consequence of this asymmetry is a weird split in how we represesent some
|
||||
generic types: _early-_ and _late-_ bound parameters.
|
||||
Basically, if we cannot represent a type (e.g. a universally quantified type),
|
||||
we have to bind it _early_ so that the unrepresentable type is never around.
|
||||
|
||||
Consider the following example:
|
||||
|
||||
```rust,ignore
|
||||
fn foo<'a, 'b, T>(x: &'a u32, y: &'b T) where T: 'b { ... }
|
||||
```
|
||||
|
||||
We cannot treat `'a`, `'b`, and `T` in the same way. Types in Rust can't have
|
||||
`for<T> { .. }`, only `for<'a> {...}`, so whenever you reference `foo` the type
|
||||
you get back can't be `for<'a, 'b, T> fn(&'a u32, y: &'b T)`. Instead, the `T`
|
||||
must be substituted early. In particular, you have:
|
||||
|
||||
```rust,ignore
|
||||
let x = foo; // T, 'b have to be substituted here
|
||||
x(...); // 'a substituted here, at the point of call
|
||||
x(...); // 'a substituted here with a different value
|
||||
```
|
||||
|
||||
## Early-bound parameters
|
||||
|
||||
Early-bound parameters in rustc are identified by an index, stored in the
|
||||
[`ParamTy`] struct for types or the [`EarlyBoundRegion`] struct for lifetimes.
|
||||
The index counts from the outermost declaration in scope. This means that as you
|
||||
add more binders inside, the index doesn't change.
|
||||
|
||||
For example,
|
||||
|
||||
```rust,ignore
|
||||
trait Foo<T> {
|
||||
type Bar<U> = (Self, T, U);
|
||||
}
|
||||
```
|
||||
|
||||
Here, the type `(Self, T, U)` would be `($0, $1, $2)`, where `$N` means a
|
||||
[`ParamTy`] with the index of `N`.
|
||||
|
||||
In rustc, the [`Generics`] structure carries the this information. So the
|
||||
[`Generics`] for `Bar` above would be just like for `U` and would indicate the
|
||||
'parent' generics of `Foo`, which declares `Self` and `T`. You can read more
|
||||
in [this chapter](./generics.md).
|
||||
|
||||
[`ParamTy`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.ParamTy.html
|
||||
[`EarlyBoundRegion`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.EarlyBoundRegion.html
|
||||
[`Generics`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Generics.html
|
||||
|
||||
## Late-bound parameters
|
||||
|
||||
Late-bound parameters in `rustc` are handled quite differently (they are also
|
||||
specialized to lifetimes since, right now, only late-bound lifetimes are
|
||||
supported, though with GATs that has to change). We indicate their potential
|
||||
presence by a [`Binder`] type. The [`Binder`] doesn't know how many variables
|
||||
there are at that binding level. This can only be determined by walking the
|
||||
type itself and collecting them. So a type like `for<'a, 'b> ('a, 'b)` would be
|
||||
`for (^0.a, ^0.b)`. Here, we just write `for` because we don't know the names
|
||||
of the things bound within.
|
||||
|
||||
Moreover, a reference to a late-bound lifetime is written `^0.a`:
|
||||
|
||||
- The `0` is the index; it identifies that this lifetime is bound in the
|
||||
innermost binder (the `for`).
|
||||
- The `a` is the "name"; late-bound lifetimes in rustc are identified by a
|
||||
"name" -- the [`BoundRegion`] struct. This struct can contain a
|
||||
[`DefId`][defid] or it might have various "anonymous" numbered names. The
|
||||
latter arise from types like `fn(&u32, &u32)`, which are equivalent to
|
||||
something like `for<'a, 'b> fn(&'a u32, &'b u32)`, but the names of those
|
||||
lifetimes must be generated.
|
||||
|
||||
This setup of not knowing the full set of variables at a binding level has some
|
||||
advantages and some disadvantages. The disadvantage is that you must walk the
|
||||
type to find out what is bound at the given level and so forth. The advantage
|
||||
is primarily that, when constructing types from Rust syntax, if we encounter
|
||||
anonymous regions like in `fn(&u32)`, we just create a fresh index and don't have
|
||||
to update the binder.
|
||||
|
||||
[`Binder`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Binder.html
|
||||
[`BoundRegion`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/enum.BoundRegion.html
|
||||
[defid]: ./hir.html#identifiers-in-the-hir
|
||||
Loading…
Reference in New Issue