Merge pull request #85 from mark-i-m/typeck

Add the contents of the typeck READMEs
This commit is contained in:
Niko Matsakis 2018-03-13 11:55:25 -04:00 committed by GitHub
commit 9384331641
7 changed files with 467 additions and 1 deletions

View File

@ -35,6 +35,8 @@
- [The SLG solver](./traits-slg.md) - [The SLG solver](./traits-slg.md)
- [Bibliography](./traits-bibliography.md) - [Bibliography](./traits-bibliography.md)
- [Type checking](./type-checking.md) - [Type checking](./type-checking.md)
- [Method Lookup](./method-lookup.md)
- [Variance](./variance.md)
- [The MIR (Mid-level IR)](./mir.md) - [The MIR (Mid-level IR)](./mir.md)
- [MIR construction](./mir-construction.md) - [MIR construction](./mir-construction.md)
- [MIR visitor and traversal](./mir-visitor.md) - [MIR visitor and traversal](./mir-visitor.md)

View File

@ -91,6 +91,9 @@ cycle.
Check out the subtyping chapter from the Check out the subtyping chapter from the
[Rust Nomicon](https://doc.rust-lang.org/nomicon/subtyping.html). [Rust Nomicon](https://doc.rust-lang.org/nomicon/subtyping.html).
See the [variance](./variance.html) chapter of this guide for more info on how
the type checker handles variance.
<a name=free-vs-bound> <a name=free-vs-bound>
## What is a "free region" or a "free variable"? What about "bound region"? ## What is a "free region" or a "free variable"? What about "bound region"?

View File

@ -14,9 +14,11 @@ Item | Kind | Short description | Chapter |
`Session` | struct | The data associated with a compilation session | [the Parser], [The Rustc Driver] | [src/librustc/session/mod.html](https://github.com/rust-lang/rust/blob/master/src/librustc/session/mod.rs) `Session` | struct | The data associated with a compilation session | [the Parser], [The Rustc Driver] | [src/librustc/session/mod.html](https://github.com/rust-lang/rust/blob/master/src/librustc/session/mod.rs)
`StringReader` | struct | This is the lexer used during parsing. It consumes characters from the raw source code being compiled and produces a series of tokens for use by the rest of the parser | [The parser] | [src/libsyntax/parse/lexer/mod.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/parse/lexer/mod.rs) `StringReader` | struct | This is the lexer used during parsing. It consumes characters from the raw source code being compiled and produces a series of tokens for use by the rest of the parser | [The parser] | [src/libsyntax/parse/lexer/mod.rs](https://github.com/rust-lang/rust/blob/master/src/libsyntax/parse/lexer/mod.rs)
`TraitDef` | struct | This struct contains a trait's definition with type information | [The `ty` modules] | [src/librustc/ty/trait_def.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/ty/trait_def.rs) `TraitDef` | struct | This struct contains a trait's definition with type information | [The `ty` modules] | [src/librustc/ty/trait_def.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/ty/trait_def.rs)
`Ty<'tcx>` | struct | This is the internal representation of a type used for type checking | [Type checking] | [src/librustc/ty/mod.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/ty/mod.rs)
`TyCtxt<'cx, 'tcx, 'tcx>` | type | The "typing context". This is the central data structure in the compiler. It is the context that you use to perform all manner of queries. | [The `ty` modules] | [src/librustc/ty/context.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/ty/context.rs) `TyCtxt<'cx, 'tcx, 'tcx>` | type | The "typing context". This is the central data structure in the compiler. It is the context that you use to perform all manner of queries. | [The `ty` modules] | [src/librustc/ty/context.rs](https://github.com/rust-lang/rust/blob/master/src/librustc/ty/context.rs)
[The HIR]: hir.html [The HIR]: hir.html
[The parser]: the-parser.html [The parser]: the-parser.html
[The Rustc Driver]: rustc-driver.html [The Rustc Driver]: rustc-driver.html
[Type checking]: type-checking.html
[The `ty` modules]: ty.html [The `ty` modules]: ty.html

View File

@ -56,7 +56,8 @@ token | the smallest unit of parsing. Tokens are produced aft
trans | the code to translate MIR into LLVM IR. trans | the code to translate MIR into LLVM IR.
trait reference | a trait and values for its type parameters ([see more](ty.html)). trait reference | a trait and values for its type parameters ([see more](ty.html)).
ty | the internal representation of a type ([see more](ty.html)). ty | the internal representation of a type ([see more](ty.html)).
variance | variance determines how changes to a generic type/lifetime parameter affect subtyping; for example, if `T` is a subtype of `U`, then `Vec<T>` is a subtype `Vec<U>` because `Vec` is *covariant* in its generic parameter. See [the background chapter for more](./appendix-background.html#variance). UFCS | Universal Function Call Syntax. An unambiguous syntax for calling a method ([see more](type-checking.html)).
variance | variance determines how changes to a generic type/lifetime parameter affect subtyping; for example, if `T` is a subtype of `U`, then `Vec<T>` is a subtype `Vec<U>` because `Vec` is *covariant* in its generic parameter. See [the background chapter](./appendix-background.html#variance) for a more general explanation. See the [variance chapter](./variance.html) for an explanation of how type checking handles variance.
[LLVM]: https://llvm.org/ [LLVM]: https://llvm.org/
[lto]: https://llvm.org/docs/LinkTimeOptimization.html [lto]: https://llvm.org/docs/LinkTimeOptimization.html

119
src/method-lookup.md Normal file
View File

@ -0,0 +1,119 @@
# Method lookup
Method lookup can be rather complex due to the interaction of a number
of factors, such as self types, autoderef, trait lookup, etc. This
file provides an overview of the process. More detailed notes are in
the code itself, naturally.
One way to think of method lookup is that we convert an expression of
the form:
```rust
receiver.method(...)
```
into a more explicit UFCS form:
```rust
Trait::method(ADJ(receiver), ...) // for a trait call
ReceiverType::method(ADJ(receiver), ...) // for an inherent method call
```
Here `ADJ` is some kind of adjustment, which is typically a series of
autoderefs and then possibly an autoref (e.g., `&**receiver`). However
we sometimes do other adjustments and coercions along the way, in
particular unsizing (e.g., converting from `[T; n]` to `[T]`).
Method lookup is divided into two major phases:
1. Probing ([`probe.rs`][probe]). The probe phase is when we decide what method
to call and how to adjust the receiver.
2. Confirmation ([`confirm.rs`][confirm]). The confirmation phase "applies"
this selection, updating the side-tables, unifying type variables, and
otherwise doing side-effectful things.
One reason for this division is to be more amenable to caching. The
probe phase produces a "pick" (`probe::Pick`), which is designed to be
cacheable across method-call sites. Therefore, it does not include
inference variables or other information.
[probe]: https://github.com/rust-lang/rust/blob/master/src/librustc_typeck/check/method/probe.rs
[confirm]: https://github.com/rust-lang/rust/blob/master/src/librustc_typeck/check/method/confirm.rs
## The Probe phase
### Steps
The first thing that the probe phase does is to create a series of
*steps*. This is done by progressively dereferencing the receiver type
until it cannot be deref'd anymore, as well as applying an optional
"unsize" step. So if the receiver has type `Rc<Box<[T; 3]>>`, this
might yield:
```rust
Rc<Box<[T; 3]>>
Box<[T; 3]>
[T; 3]
[T]
```
### Candidate assembly
We then search along those steps to create a list of *candidates*. A
`Candidate` is a method item that might plausibly be the method being
invoked. For each candidate, we'll derive a "transformed self type"
that takes into account explicit self.
Candidates are grouped into two kinds, inherent and extension.
**Inherent candidates** are those that are derived from the
type of the receiver itself. So, if you have a receiver of some
nominal type `Foo` (e.g., a struct), any methods defined within an
impl like `impl Foo` are inherent methods. Nothing needs to be
imported to use an inherent method, they are associated with the type
itself (note that inherent impls can only be defined in the same
module as the type itself).
FIXME: Inherent candidates are not always derived from impls. If you
have a trait object, such as a value of type `Box<ToString>`, then the
trait methods (`to_string()`, in this case) are inherently associated
with it. Another case is type parameters, in which case the methods of
their bounds are inherent. However, this part of the rules is subject
to change: when DST's "impl Trait for Trait" is complete, trait object
dispatch could be subsumed into trait matching, and the type parameter
behavior should be reconsidered in light of where clauses.
TODO: Is this FIXME still accurate?
**Extension candidates** are derived from imported traits. If I have
the trait `ToString` imported, and I call `to_string()` on a value of
type `T`, then we will go off to find out whether there is an impl of
`ToString` for `T`. These kinds of method calls are called "extension
methods". They can be defined in any module, not only the one that
defined `T`. Furthermore, you must import the trait to call such a
method.
So, let's continue our example. Imagine that we were calling a method
`foo` with the receiver `Rc<Box<[T; 3]>>` and there is a trait `Foo`
that defines it with `&self` for the type `Rc<U>` as well as a method
on the type `Box` that defines `Foo` but with `&mut self`. Then we
might have two candidates:
&Rc<Box<[T; 3]>> from the impl of `Foo` for `Rc<U>` where `U=Box<T; 3]>
&mut Box<[T; 3]>> from the inherent impl on `Box<U>` where `U=[T; 3]`
### Candidate search
Finally, to actually pick the method, we will search down the steps,
trying to match the receiver type against the candidate types. At
each step, we also consider an auto-ref and auto-mut-ref to see whether
that makes any of the candidates match. We pick the first step where
we find a match.
In the case of our example, the first step is `Rc<Box<[T; 3]>>`,
which does not itself match any candidate. But when we autoref it, we
get the type `&Rc<Box<[T; 3]>>` which does match. We would then
recursively consider all where-clauses that appear on the impl: if
those match (or we cannot rule out that they do), then this is the
method we would pick. Otherwise, we would continue down the series of
steps.

View File

@ -1 +1,44 @@
# Type checking # Type checking
The [`rustc_typeck`][typeck] crate contains the source for "type collection"
and "type checking", as well as a few other bits of related functionality. (It
draws heavily on the [type inference] and [trait solving].)
[typeck]: https://github.com/rust-lang/rust/tree/master/src/librustc_typeck
[type inference]: type-inference.html
[trait solving]: trait-resolution.html
## Type collection
Type "collection" is the process of converting the types found in the HIR
(`hir::Ty`), which represent the syntactic things that the user wrote, into the
**internal representation** used by the compiler (`Ty<'tcx>`) -- we also do
similar conversions for where-clauses and other bits of the function signature.
To try and get a sense for the difference, consider this function:
```rust
struct Foo { }
fn foo(x: Foo, y: self::Foo) { .. }
// ^^^ ^^^^^^^^^
```
Those two parameters `x` and `y` each have the same type: but they will have
distinct `hir::Ty` nodes. Those nodes will have different spans, and of course
they encode the path somewhat differently. But once they are "collected" into
`Ty<'tcx>` nodes, they will be represented by the exact same internal type.
Collection is defined as a bundle of [queries] for computing information about
the various functions, traits, and other items in the crate being compiled.
Note that each of these queries is concerned with *interprocedural* things --
for example, for a function definition, collection will figure out the type and
signature of the function, but it will not visit the *body* of the function in
any way, nor examine type annotations on local variables (that's the job of
type *checking*).
For more details, see the [`collect`][collect] module.
[queries]: query.html
[collect]: https://github.com/rust-lang/rust/blob/master/src/librustc_typeck/collect.rs
**TODO**: actually talk about type checking...

296
src/variance.md Normal file
View File

@ -0,0 +1,296 @@
# Variance of type and lifetime parameters
For a more general background on variance, see the [background] appendix.
[background]: ./appendix-background.html
During type checking we must infer the variance of type and lifetime
parameters. The algorithm is taken from Section 4 of the paper ["Taming the
Wildcards: Combining Definition- and Use-Site Variance"][pldi11] published in
PLDI'11 and written by Altidor et al., and hereafter referred to as The Paper.
[pldi11]: https://people.cs.umass.edu/~yannis/variance-extended2011.pdf
This inference is explicitly designed *not* to consider the uses of
types within code. To determine the variance of type parameters
defined on type `X`, we only consider the definition of the type `X`
and the definitions of any types it references.
We only infer variance for type parameters found on *data types*
like structs and enums. In these cases, there is a fairly straightforward
explanation for what variance means. The variance of the type
or lifetime parameters defines whether `T<A>` is a subtype of `T<B>`
(resp. `T<'a>` and `T<'b>`) based on the relationship of `A` and `B`
(resp. `'a` and `'b`).
We do not infer variance for type parameters found on traits, functions,
or impls. Variance on trait parameters can indeed make sense
(and we used to compute it) but it is actually rather subtle in
meaning and not that useful in practice, so we removed it. See the
[addendum] for some details. Variances on function/impl parameters, on the
other hand, doesn't make sense because these parameters are instantiated and
then forgotten, they don't persist in types or compiled byproducts.
[addendum]: #addendum
> **Notation**
>
> We use the notation of The Paper throughout this chapter:
>
> - `+` is _covariance_.
> - `-` is _contravariance_.
> - `*` is _bivariance_.
> - `o` is _invariance_.
## The algorithm
The basic idea is quite straightforward. We iterate over the types
defined and, for each use of a type parameter `X`, accumulate a
constraint indicating that the variance of `X` must be valid for the
variance of that use site. We then iteratively refine the variance of
`X` until all constraints are met. There is *always* a solution, because at
the limit we can declare all type parameters to be invariant and all
constraints will be satisfied.
As a simple example, consider:
```rust
enum Option<A> { Some(A), None }
enum OptionalFn<B> { Some(|B|), None }
enum OptionalMap<C> { Some(|C| -> C), None }
```
Here, we will generate the constraints:
1. V(A) <= +
2. V(B) <= -
3. V(C) <= +
4. V(C) <= -
These indicate that (1) the variance of A must be at most covariant;
(2) the variance of B must be at most contravariant; and (3, 4) the
variance of C must be at most covariant *and* contravariant. All of these
results are based on a variance lattice defined as follows:
* Top (bivariant)
- +
o Bottom (invariant)
Based on this lattice, the solution `V(A)=+`, `V(B)=-`, `V(C)=o` is the
optimal solution. Note that there is always a naive solution which
just declares all variables to be invariant.
You may be wondering why fixed-point iteration is required. The reason
is that the variance of a use site may itself be a function of the
variance of other type parameters. In full generality, our constraints
take the form:
V(X) <= Term
Term := + | - | * | o | V(X) | Term x Term
Here the notation `V(X)` indicates the variance of a type/region
parameter `X` with respect to its defining class. `Term x Term`
represents the "variance transform" as defined in the paper:
> If the variance of a type variable `X` in type expression `E` is `V2`
and the definition-site variance of the [corresponding] type parameter
of a class `C` is `V1`, then the variance of `X` in the type expression
`C<E>` is `V3 = V1.xform(V2)`.
## Constraints
If I have a struct or enum with where clauses:
```rust
struct Foo<T: Bar> { ... }
```
you might wonder whether the variance of `T` with respect to `Bar` affects the
variance `T` with respect to `Foo`. I claim no. The reason: assume that `T` is
invariant with respect to `Bar` but covariant with respect to `Foo`. And then
we have a `Foo<X>` that is upcast to `Foo<Y>`, where `X <: Y`. However, while
`X : Bar`, `Y : Bar` does not hold. In that case, the upcast will be illegal,
but not because of a variance failure, but rather because the target type
`Foo<Y>` is itself just not well-formed. Basically we get to assume
well-formedness of all types involved before considering variance.
### Dependency graph management
Because variance is a whole-crate inference, its dependency graph
can become quite muddled if we are not careful. To resolve this, we refactor
into two queries:
- `crate_variances` computes the variance for all items in the current crate.
- `variances_of` accesses the variance for an individual reading; it
works by requesting `crate_variances` and extracting the relevant data.
If you limit yourself to reading `variances_of`, your code will only
depend then on the inference of that particular item.
Ultimately, this setup relies on the [red-green algorithm][rga]. In particular,
every variance query effectively depends on all type definitions in the entire
crate (through `crate_variances`), but since most changes will not result in a
change to the actual results from variance inference, the `variances_of` query
will wind up being considered green after it is re-evaluated.
[rga]: ./incremental-compilation.html
<a name=addendum>
## Addendum: Variance on traits
As mentioned above, we used to permit variance on traits. This was
computed based on the appearance of trait type parameters in
method signatures and was used to represent the compatibility of
vtables in trait objects (and also "virtual" vtables or dictionary
in trait bounds). One complication was that variance for
associated types is less obvious, since they can be projected out
and put to myriad uses, so it's not clear when it is safe to allow
`X<A>::Bar` to vary (or indeed just what that means). Moreover (as
covered below) all inputs on any trait with an associated type had
to be invariant, limiting the applicability. Finally, the
annotations (`MarkerTrait`, `PhantomFn`) needed to ensure that all
trait type parameters had a variance were confusing and annoying
for little benefit.
Just for historical reference, I am going to preserve some text indicating how
one could interpret variance and trait matching.
### Variance and object types
Just as with structs and enums, we can decide the subtyping
relationship between two object types `&Trait<A>` and `&Trait<B>`
based on the relationship of `A` and `B`. Note that for object
types we ignore the `Self` type parameter -- it is unknown, and
the nature of dynamic dispatch ensures that we will always call a
function that is expected the appropriate `Self` type. However, we
must be careful with the other type parameters, or else we could
end up calling a function that is expecting one type but provided
another.
To see what I mean, consider a trait like so:
trait ConvertTo<A> {
fn convertTo(&self) -> A;
}
Intuitively, If we had one object `O=&ConvertTo<Object>` and another
`S=&ConvertTo<String>`, then `S <: O` because `String <: Object`
(presuming Java-like "string" and "object" types, my go to examples
for subtyping). The actual algorithm would be to compare the
(explicit) type parameters pairwise respecting their variance: here,
the type parameter A is covariant (it appears only in a return
position), and hence we require that `String <: Object`.
You'll note though that we did not consider the binding for the
(implicit) `Self` type parameter: in fact, it is unknown, so that's
good. The reason we can ignore that parameter is precisely because we
don't need to know its value until a call occurs, and at that time (as
you said) the dynamic nature of virtual dispatch means the code we run
will be correct for whatever value `Self` happens to be bound to for
the particular object whose method we called. `Self` is thus different
from `A`, because the caller requires that `A` be known in order to
know the return type of the method `convertTo()`. (As an aside, we
have rules preventing methods where `Self` appears outside of the
receiver position from being called via an object.)
### Trait variance and vtable resolution
But traits aren't only used with objects. They're also used when
deciding whether a given impl satisfies a given trait bound. To set the
scene here, imagine I had a function:
fn convertAll<A,T:ConvertTo<A>>(v: &[T]) {
...
}
Now imagine that I have an implementation of `ConvertTo` for `Object`:
impl ConvertTo<i32> for Object { ... }
And I want to call `convertAll` on an array of strings. Suppose
further that for whatever reason I specifically supply the value of
`String` for the type parameter `T`:
let mut vector = vec!["string", ...];
convertAll::<i32, String>(vector);
Is this legal? To put another way, can we apply the `impl` for
`Object` to the type `String`? The answer is yes, but to see why
we have to expand out what will happen:
- `convertAll` will create a pointer to one of the entries in the
vector, which will have type `&String`
- It will then call the impl of `convertTo()` that is intended
for use with objects. This has the type:
fn(self: &Object) -> i32
It is ok to provide a value for `self` of type `&String` because
`&String <: &Object`.
OK, so intuitively we want this to be legal, so let's bring this back
to variance and see whether we are computing the correct result. We
must first figure out how to phrase the question "is an impl for
`Object,i32` usable where an impl for `String,i32` is expected?"
Maybe it's helpful to think of a dictionary-passing implementation of
type classes. In that case, `convertAll()` takes an implicit parameter
representing the impl. In short, we *have* an impl of type:
V_O = ConvertTo<i32> for Object
and the function prototype expects an impl of type:
V_S = ConvertTo<i32> for String
As with any argument, this is legal if the type of the value given
(`V_O`) is a subtype of the type expected (`V_S`). So is `V_O <: V_S`?
The answer will depend on the variance of the various parameters. In
this case, because the `Self` parameter is contravariant and `A` is
covariant, it means that:
V_O <: V_S iff
i32 <: i32
String <: Object
These conditions are satisfied and so we are happy.
### Variance and associated types
Traits with associated types -- or at minimum projection
expressions -- must be invariant with respect to all of their
inputs. To see why this makes sense, consider what subtyping for a
trait reference means:
<T as Trait> <: <U as Trait>
means that if I know that `T as Trait`, I also know that `U as
Trait`. Moreover, if you think of it as dictionary passing style,
it means that a dictionary for `<T as Trait>` is safe to use where
a dictionary for `<U as Trait>` is expected.
The problem is that when you can project types out from `<T as
Trait>`, the relationship to types projected out of `<U as Trait>`
is completely unknown unless `T==U` (see #21726 for more
details). Making `Trait` invariant ensures that this is true.
Another related reason is that if we didn't make traits with
associated types invariant, then projection is no longer a
function with a single result. Consider:
```
trait Identity { type Out; fn foo(&self); }
impl<T> Identity for T { type Out = T; ... }
```
Now if I have `<&'static () as Identity>::Out`, this can be
validly derived as `&'a ()` for any `'a`:
<&'a () as Identity> <: <&'static () as Identity>
if &'static () < : &'a () -- Identity is contravariant in Self
if 'static : 'a -- Subtyping rules for relations
This change otoh means that `<'static () as Identity>::Out` is
always `&'static ()` (which might then be upcast to `'a ()`,
separately). This was helpful in solving #21750.