Write implied bounds chapter

2018-10-19 15:20:41 +02:00 · 2018-10-19 15:20:41 +02:00 · 0e032c2870
parent 7117abcc53
commit 0e032c2870
1 changed files with 496 additions and 5 deletions
--- a/src/traits/implied-bounds.md
+++ b/src/traits/implied-bounds.md
@@ -1,9 +1,500 @@
 # Implied Bounds
-*to be written*
+Implied bounds remove the need to repeat where clauses written on
 a type declaration or a trait declaration. For example, say we have the
 following type declaration:
 ```rust,ignore
 struct HashSet<K: Hash> {
    ...
 }
 ```
-Cover:
+then everywhere we use `HashSet<K>` as an "input" type, that is, appearing in
 the receiver type of an `impl` or in the arguments of a function, we don't
 want to have to repeat the `where K: Hash` bound, as in:
- Why the `FromEnv` setup etc is the way it is
+```rust,ignore
- Perhaps move some of the material from 'lowering rules' in to here
+// I don't want to have to repeat `where K: Hash` here.
- Show various examples where you could go wrong
+impl<K> HashSet<K> {
    ...
 }
 // Same here.
 fn loud_insert<K>(set: &mut HashSet<K>, item: K) {
    println!("inserting!");
    set.insert(item);
 }
 ```
 Note that in the `loud_insert` example, `HashSet<K>` is not the type of an
 argument of the `loud_insert` function, it only *appears* in the argument type
 `&mut HashSet<K>`.
 The rationale for applying implied bounds to input types is that, for example,
 in order to call the `loud_insert` function above, the programmer must have
 *produced* the type `HashSet<K>` already, hence the compiler already verified
 that `HashSet<K>` was well-formed, i.e. that `K` effectively implemented
 `Hash`, as in the following example:
 ```rust,ignore
 fn main() {
    // I am producing a value of type `HashSet<i32>`.
    // If `i32` was not `Hash`, the compiler would report an error here.
    let mut set: HashSet<i32> = HashSet::new();
    loud_insert(&mut set, 5);
 }
 ```
 Hence we don't want to repeat where clauses for input types: that would
 duplicate the programmer's work, forcing them to verify that their types
 are well-formed both when calling the function and when using them in the
 arguments of their function. The same reasoning applies when using an `impl`.
 Similarly, given the following trait declaration:
 ```rust,ignore
 trait Copy where Self: Clone {
    ...
 }
 ```
 then everywhere we bound over `SomeType: Copy`, we would like to be able to
 use the fact that `SomeType: Clone` without having to write it explicitly,
 as in:
 ```rust,ignore
 fn loud_clone<T: Clone>(x: T) {
    println!("cloning!");
    x.clone();
 }
 fn fun_with_copy<T: Copy>(x: T) {
    println!("will clone a `Copy` type soon...");
    // I'm using `loud_clone<T: Clone>` with `T: Copy`, I know this
    // implies `T: Clone` so I don't want to have to write it explicitly.
    loud_clone(x);
 }
 ```
 The rationale for implied bounds for traits is that if a type implements `Copy`,
 that is, if there exists an `impl Copy` for that type, there *ought* to exist
 an `impl Clone` for that type, otherwise the compiler would have reported an
 error in the first place. So again, if we were forced to repeat the additional
 `where SomeType: Clone` everywhere even though we already know that
 `SomeType: Copy` holds, we would duplicate the verification work.
 Implied bounds are not yet completely enforced in rustc: at the moment they only
 work for outlives requirements, supertrait bounds and bounds on associated
 types. The full RFC can be found [here][RFC]. We'll give here a brief view
 of how implied bounds work and why we chose to implement them that way. The
 complete set of lowering rules can be found in the corresponding
 [chapter](./lowering-rules.md).
 [RFC]: https://github.com/rust-lang/rfcs/blob/master/text/2089-implied-bounds.md
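As a small illustration of the two forms that already work, here is a sketch in
ordinary Rust. The names `Named`, `describe`, `Wrapper` and `unwrap_ref` are
made up for this example. The first function relies on a supertrait bound, the
second on an outlives requirement that is never written explicitly.

```rust
use std::fmt::Debug;

// Hypothetical trait with a supertrait bound: every `Named` type is `Debug`.
trait Named: Debug {
    fn name(&self) -> String;
}

impl Named for i32 {
    fn name(&self) -> String {
        String::from("int")
    }
}

// Only `T: Named` is written. `T: Debug` is implied by the supertrait
// bound, so `{:?}` formatting is available without repeating it.
fn describe<T: Named>(value: &T) -> String {
    format!("{} = {:?}", value.name(), value)
}

// Hypothetical wrapper around a reference. The requirement `T: 'a` is
// never written: it is inferred on the struct declaration.
struct Wrapper<'a, T> {
    inner: &'a T,
}

// `Wrapper<'a, T>` appears as an input type, so `T: 'a` may be assumed
// here, which is what makes the return type `&'a T` acceptable.
fn unwrap_ref<'a, T>(w: Wrapper<'a, T>) -> &'a T {
    w.inner
}

fn main() {
    println!("{}", describe(&5_i32));
    let n = 7;
    println!("{}", unwrap_ref(Wrapper { inner: &n }));
}
```

Neither function repeats a bound that its input types or trait bounds already
guarantee.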
 ## Implied bounds and lowering rules
 Now we need to express implied bounds in terms of logical rules. We will start
 by presenting a naive way to do it. Suppose that we have the following traits:
 ```rust,ignore
 trait Foo {
    ...
 }
 trait Bar where Self: Foo {
    ...
 }
 ```
 So we would like to say that if a type implements `Bar`, then necessarily
 it must also implement `Foo`. We might think that a clause like this would
 work:
 ```text
 forall<Type> {
    Implemented(Type: Foo) :- Implemented(Type: Bar).
 }
 ```
 Now suppose that we just write this impl:
 ```rust,ignore
 struct X;
 impl Bar for X { }
 ```
 Clearly this should not be allowed: indeed, we wrote a `Bar` impl for `X`, but
 the `Bar` trait requires that we also implement `Foo` for `X`, which we never
 did. In terms of what the compiler does, this would look like this:
 ```rust,ignore
 struct X;
 impl Bar for X {
    // We are in a `Bar` impl for the type `X`.
    // There is a `where Self: Foo` bound on the `Bar` trait declaration.
    // Hence I need to prove that `X` also implements `Foo` for that impl
    // to be legal.
 }
 ```
 So the compiler would try to prove `Implemented(X: Foo)`. Of course it will
 not find any `impl Foo for X` since we did not write any. However, it
 will see our implied bound clause:
 ```text
 forall<Type> {
    Implemented(Type: Foo) :- Implemented(Type: Bar).
 }
 ```
 so that it may be able to prove `Implemented(X: Foo)` if `Implemented(X: Bar)`
 holds. And it turns out that `Implemented(X: Bar)` does hold since we wrote
 a `Bar` impl for `X`! Hence the compiler will accept the `Bar` impl while it
 should not.
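To make the circularity concrete, here is a toy model of this situation. It is
not how rustc or chalk represent clauses: facts are plain strings, and `Rule`,
`saturate` and `naive_program` are hypothetical names used only for this
sketch. It applies rules until nothing new can be derived, then asks whether
the obligation of the `Bar` impl is considered proven.

```rust
use std::collections::BTreeSet;

/// A ground rule `head :- body[0], body[1], ...` (toy representation).
struct Rule {
    head: &'static str,
    body: Vec<&'static str>,
}

/// Apply the rules repeatedly until no new fact can be derived.
fn saturate(facts: &[&'static str], rules: &[Rule]) -> BTreeSet<&'static str> {
    let mut known: BTreeSet<&'static str> = facts.iter().copied().collect();
    loop {
        let mut changed = false;
        for rule in rules {
            // `insert` returns true only if the fact was not already known.
            if rule.body.iter().all(|g| known.contains(g)) && known.insert(rule.head) {
                changed = true;
            }
        }
        if !changed {
            return known;
        }
    }
}

/// The program from the text, with the naive clause instantiated at `X`.
fn naive_program() -> Vec<Rule> {
    vec![
        // `impl Bar for X { }` has no where clauses.
        Rule { head: "Implemented(X: Bar)", body: vec![] },
        // Naive implied-bound clause: `Foo` follows from `Bar`.
        Rule { head: "Implemented(X: Foo)", body: vec!["Implemented(X: Bar)"] },
    ]
}

fn main() {
    let derived = saturate(&[], &naive_program());
    // The obligation of the `Bar` impl is "proven" although no
    // `impl Foo for X` exists anywhere in the program.
    println!("{}", derived.contains("Implemented(X: Foo)"));
}
```

The impl being checked is itself the only evidence for the bound it was
supposed to satisfy.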
 ## Implied bounds coming from the environment
 So the naive approach does not work. What we need to do is to somehow decouple
 implied bounds from impls. Suppose we know that a type `SomeType<...>`
 implements `Bar` and we want to deduce that `SomeType<...>` must also implement
 `Foo`.
 There are two possibilities: first one, we have enough information about
 `SomeType<...>` to see that there exists a `Bar` impl in the program which
 covers `SomeType<...>`, for example a plain `impl<...> Bar for SomeType<...>`.
 Then if the compiler has done its job correctly, there *must* exist a `Foo`
 impl which covers `SomeType<...>`, e.g. another plain
 `impl<...> Foo for SomeType<...>`. In that case, we can just use this
 impl and we do not need implied bounds at all.
 Second possibility: we do not know enough about `SomeType<...>` in order to
 find a `Bar` impl which covers it, for example if `SomeType<...>` is just
 a type parameter in a function:
 ```rust,ignore
 fn foo<T: Bar>() {
    // We'd like to deduce `Implemented(T: Foo)`.
 }
 ```
 that is, the information that `T` implements `Bar` here comes from the
 *environment*. The environment is the set of things that we assume to be true
 when we type check some Rust declaration. In that case, what we assume is that
 `T: Bar`. Then at that point, we might allow ourselves to have some kind
 of "local" implied bound reasoning which would say
 `Implemented(T: Foo) :- Implemented(T: Bar)`. This reasoning would
 only be done within our `foo` function in order to avoid the earlier
 problem where we had a global clause.
 We can apply these local reasonings everywhere we can have an environment
 -- i.e. when we can write where clauses -- that is inside impls,
 trait declarations and type declarations.
 ## Computing implied bounds with `FromEnv`
 The previous subsection showed that it was only useful to compute implied
 bounds for facts coming from the environment.
 We talked about "local" rules, but there are multiple possible strategies to
 actually implement the locality of implied bounds.
 In rustc, the current strategy is to *elaborate* bounds: that is, each time
 we have a fact in the environment, we recursively derive all the other things
 that are implied by this fact until we reach a fixed point. For example, if
 we have the following declarations:
 ```rust,ignore
 trait A { }
 trait B where Self: A { }
 trait C where Self: B { }
 fn foo<T: C>() {
    ...
 }
 ```
 then inside the `foo` function, we start with an environment containing only
 `Implemented(T: C)`. Then because of implied bounds for the `C` trait, we
 elaborate `Implemented(T: B)` and add it to our environment. Because of
 implied bounds for the `B` trait, we elaborate `Implemented(T: A)` and add it
 to our environment as well. We cannot elaborate anything else, so we conclude
 that our final environment consists of `Implemented(T: A + B + C)`.
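The elaboration walk-through above can be sketched as a worklist loop. This is
only an illustration, not rustc's actual elaboration code: traits are
identified by name, and `elaborate` and `example_supertraits` are hypothetical
helpers for the `A`, `B`, `C` example.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Starting from the traits in `env`, keep adding direct supertraits
/// until a fixed point is reached.
fn elaborate(
    supertraits: &BTreeMap<&'static str, Vec<&'static str>>,
    env: &[&'static str],
) -> BTreeSet<&'static str> {
    let mut elaborated = BTreeSet::new();
    let mut worklist: Vec<&'static str> = env.to_vec();
    while let Some(tr) = worklist.pop() {
        // Only visit a trait the first time we see it, so that the loop
        // terminates even if the supertrait graph contains repeats.
        if elaborated.insert(tr) {
            if let Some(supers) = supertraits.get(tr) {
                worklist.extend(supers.iter().copied());
            }
        }
    }
    elaborated
}

/// `trait A { }`, `trait B where Self: A { }`, `trait C where Self: B { }`.
fn example_supertraits() -> BTreeMap<&'static str, Vec<&'static str>> {
    let mut map = BTreeMap::new();
    map.insert("A", vec![]);
    map.insert("B", vec!["A"]);
    map.insert("C", vec!["B"]);
    map
}

fn main() {
    // Inside `fn foo<T: C>()` the environment starts with `T: C` only.
    let env = elaborate(&example_supertraits(), &["C"]);
    println!("{:?}", env); // prints {"A", "B", "C"}
}
```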
 In the new-style trait system, we like to encode as many things as possible
 with logical rules. So rather than "elaborating", we have a set of *global*
 program clauses defined like so:
 ```text
 forall<T> { Implemented(T: A) :- FromEnv(T: A). }
 forall<T> { Implemented(T: B) :- FromEnv(T: B). }
 forall<T> { FromEnv(T: A) :- FromEnv(T: B). }
 forall<T> { Implemented(T: C) :- FromEnv(T: C). }
 forall<T> { FromEnv(T: B) :- FromEnv(T: C). }
 ```
 So these clauses are defined globally (that is, they are available from
 everywhere in the program) but they cannot be used on their own, because the
 hypothesis is always of the form `FromEnv(...)`, which is a bit special. Indeed,
 as indicated by the name, `FromEnv(...)` facts can **only** come from the
 environment.
 How it works is that in the `foo` function, instead of having an environment
 containing `Implemented(T: C)`, we replace this environment with
 `FromEnv(T: C)`. From here and thanks to the above clauses, we see that we
 are able to reach any of `Implemented(T: A)`, `Implemented(T: B)` or
 `Implemented(T: C)`, which is what we wanted.
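The behaviour of these clauses can be checked with a toy model. It assumes the
clause for `C` takes `FromEnv(T: C)` to `FromEnv(T: B)`, which is what the
elaboration example requires. `Fact`, `Clause`, `global_clauses` and
`consequences` are hypothetical names, and traits are reduced to a single
character. This is a sketch of the idea, not the solver's real data
structures.

```rust
use std::collections::BTreeSet;

/// Facts about one fixed type `T`; the `char` names the trait.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Fact {
    /// `Implemented(T: Trait)`
    Implemented(char),
    /// `FromEnv(T: Trait)`
    FromEnv(char),
}

/// A clause `head :- body` with a single hypothesis.
struct Clause {
    head: Fact,
    body: Fact,
}

/// The global clauses for the `A`, `B`, `C` traits.
fn global_clauses() -> Vec<Clause> {
    use Fact::*;
    vec![
        Clause { head: Implemented('A'), body: FromEnv('A') },
        Clause { head: Implemented('B'), body: FromEnv('B') },
        Clause { head: FromEnv('A'), body: FromEnv('B') },
        Clause { head: Implemented('C'), body: FromEnv('C') },
        Clause { head: FromEnv('B'), body: FromEnv('C') },
    ]
}

/// Everything derivable from the environment `env` using the global clauses.
fn consequences(env: &[Fact]) -> BTreeSet<Fact> {
    let clauses = global_clauses();
    let mut known: BTreeSet<Fact> = env.iter().copied().collect();
    let mut changed = true;
    while changed {
        changed = false;
        for clause in &clauses {
            if known.contains(&clause.body) && known.insert(clause.head) {
                changed = true;
            }
        }
    }
    known
}

fn main() {
    // Inside `fn foo<T: C>()` the environment is `FromEnv(T: C)`.
    let inside_foo = consequences(&[Fact::FromEnv('C')]);
    for tr in "ABC".chars() {
        println!("Implemented(T: {}): {}", tr, inside_foo.contains(&Fact::Implemented(tr)));
    }
    // With an empty environment the same global clauses derive nothing.
    println!("{}", consequences(&[]).is_empty());
}
```

With an empty environment nothing is derivable at all, which is the locality
we were after: the clauses are global, but only an environment can trigger
them.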
 ## Implied bounds and well-formedness checking
 Implied bounds are closely related to well-formedness checking.
 Well-formedness checking is the process of checking that the impls the
 programmer wrote are legal, what we referred to earlier as "the compiler doing
 its job correctly".
 We already saw examples of illegal and legal impls:
 ```rust,ignore
 trait Foo { }
 trait Bar where Self: Foo { }
 struct X;
 struct Y;
 impl Bar for X {
    // This impl is not legal: the `Bar` trait requires that we also
    // implement `Foo`, and we didn't.
 }
 impl Foo for Y {
    // This impl is legal: there is nothing to check as there are no where
    // clauses on the `Foo` trait.
 }
 impl Bar for Y {
    // This impl is legal: we have a `Foo` impl for `Y`.
 }
 ```
 We must define what "legal" and "illegal" mean. For this, we introduce another
 predicate: `WellFormed(Type: Trait)`. We say that the trait reference
 `Type: Trait` is well-formed if `Type` meets the bounds written on the
 `Trait` declaration. For each impl we write, assuming that the where clauses
 declared on the impl hold, the compiler tries to prove that the corresponding
 trait reference is well-formed. The impl is legal if the compiler manages to do
 so.
 Coming to the definition of `WellFormed(Type: Trait)`, it would be tempting
 to define it as:
 ```rust,ignore
 trait Trait where WC1, WC2, ..., WCn {
    ...
 }
 ```
 ```text
 forall<Type> {
    WellFormed(Type: Trait) :- WC1 && WC2 && .. && WCn.
 }
 ```
 and indeed this was basically what was done in rustc until it was noticed that
 this mixed badly with implied bounds. The key thing is that implied bounds
 allow someone to derive all bounds implied by a fact in the environment, and
 to do so *transitively*, as we've seen with the `A + B + C` traits example.
 However, the `WellFormed` predicate as defined above only checks that the
 *direct* superbounds hold. That is, if we come back to our `A + B + C`
 example:
 ```rust,ignore
 trait A { }
 // No where clauses, always well-formed.
 // forall<Type> { WellFormed(Type: A). }
 trait B where Self: A { }
 // We only check the direct superbound `Self: A`.
 // forall<Type> { WellFormed(Type: B) :- Implemented(Type: A). }
 trait C where Self: B { }
 // We only check the direct superbound `Self: B`. We do not check
 // the `Self: A` implied bound coming from the `Self: B` superbound.
 // forall<Type> { WellFormed(Type: C) :- Implemented(Type: B). }
 ```
 There is an asymmetry between the recursive power of implied bounds and
 the shallow checking of `WellFormed`. It turns out that this asymmetry
 can be [exploited][bug]. Indeed, suppose that we define the following
 traits:
 ```rust,ignore
 trait Partial where Self: Copy { }
 // WellFormed(Self: Partial) :- Implemented(Self: Copy).
 trait Complete where Self: Partial { }
 // WellFormed(Self: Complete) :- Implemented(Self: Partial).
 impl<T> Partial for T where T: Complete { }
 impl<T> Complete for T { }
 ```
 For the `Partial` impl, what the compiler must prove is:
 ```text
 forall<T> {
    if (T: Complete) { // assume that the where clauses hold
        WellFormed(T: Partial) // show that the trait reference is well-formed
    }
 }
 ```
 Proving `WellFormed(T: Partial)` amounts to proving `Implemented(T: Copy)`.
 However, we have `Implemented(T: Complete)` in our environment: thanks to
 implied bounds, we can deduce `Implemented(T: Partial)`. Using implied bounds
 one level deeper, we can deduce `Implemented(T: Copy)`. So the `Partial`
 impl is legal.
 For the `Complete` impl, what the compiler must prove is:
 ```text
 forall<T> {
    WellFormed(T: Complete) // show that the trait reference is well-formed
 }
 ```
 Proving `WellFormed(T: Complete)` amounts to proving `Implemented(T: Partial)`.
 We see that the `impl Partial for T` applies if we can prove
 `Implemented(T: Complete)`, and it turns out we can prove this fact since our
 `impl<T> Complete for T` is a blanket impl without any where clauses.
 So both impls are legal and the compiler accepts the program. Moreover, thanks
 to the `Complete` blanket impl, all types implement `Complete`. So we could
 now use this impl like so:
 ```rust,ignore
 fn eat<T>(x: T) { }
 fn copy_everything<T: Complete>(x: T) {
    eat(x);
    eat(x);
 }
 fn main() {
    let not_copiable = vec![1, 2, 3, 4];
    copy_everything(not_copiable);
 }
 ```
 In this program, we use the fact that `Vec<i32>` implements `Complete`, as any
 other type. Hence we can call `copy_everything` with an argument of type
 `Vec<i32>`. Inside the `copy_everything` function, we have the
 `Implemented(T: Complete)` bound in our environment. Thanks to implied bounds,
 we can deduce `Implemented(T: Partial)`. Using implied bounds again, we deduce
 `Implemented(T: Copy)` and we can indeed call the `eat` function, which moves
 its argument, twice on the same value since that value is `Copy`. Problem: the
 `T` type was in fact `Vec<i32>`, which is not `Copy` at all, hence we will
 double-free the underlying vec storage: we have a memory unsoundness in safe Rust.
 Of course, disregarding the asymmetry between `WellFormed` and implied bounds,
 this bug was possible only because we had some kind of self-referencing impls.
 But self-referencing impls are very useful in practice and are not the real
 culprits in this affair.
 [bug]: https://github.com/rust-lang/rust/pull/43786
 ## Co-inductiveness of `WellFormed`
 So the solution is to fix this asymmetry between `WellFormed` and implied
 bounds. For that, we need for the `WellFormed` predicate to not only require
 that the direct superbounds hold, but also all the bounds transitively implied
 by the superbounds. What we can do is to have the following rules for the
 `WellFormed` predicate:
 ```rust,ignore
 trait A { }
 // WellFormed(Self: A) :- Implemented(Self: A).
 trait B where Self: A { }
 // WellFormed(Self: B) :- Implemented(Self: B) && WellFormed(Self: A).
 trait C where Self: B { }
 // WellFormed(Self: C) :- Implemented(Self: C) && WellFormed(Self: B).
 ```
 Notice that we are now also requiring `Implemented(Self: Trait)` for
 `WellFormed(Self: Trait)` to be true: this is to simplify the process of
 traversing all the implied bounds transitively. This does not change anything
 when checking whether impls are legal: since we assume
 that the where clauses hold inside the impl, we know that the corresponding
 trait reference does hold. Thanks to this setup, you can see that we indeed
 require proving the set of all bounds transitively implied by the where
 clauses.
 However there is still a catch. Suppose that we have the following trait
 definition:
 ```rust,ignore
 trait Foo where <Self as Foo>::Item: Foo {
    type Item;
 }
 ```
 This definition is a bit more involved than the ones we've seen already
 because it defines an associated item. However, the well-formedness rule
 would not be more complicated:
 ```text
 WellFormed(Self: Foo) :-
    Implemented(Self: Foo) &&
    WellFormed(<Self as Foo>::Item: Foo).
 ```
 Now we would like to write the following impl:
 ```rust,ignore
 impl Foo for i32 {
    type Item = i32;
 }
 ```
 The `Foo` trait definition and the `impl Foo for i32` are perfectly valid
 Rust: we're kind of recursively using our `Foo` impl in order to show that
 the associated value indeed implements `Foo`, but that's ok. But if we
 translate this to our well-formedness setting, the compiler proof process
 inside the `Foo` impl is the following: it starts with proving that the
 well-formedness goal `WellFormed(i32: Foo)` is true. In order to do that,
 it must prove the following goals: `Implemented(i32: Foo)` and
 `WellFormed(<i32 as Foo>::Item: Foo)`. `Implemented(i32: Foo)` holds because
 there is our impl and there are no where clauses on it so it's always true.
 However, because of the associated type value we used,
 `WellFormed(<i32 as Foo>::Item: Foo)` simplifies to just
 `WellFormed(i32: Foo)`. So in order to prove its original goal
 `WellFormed(i32: Foo)`, the compiler needs to prove `WellFormed(i32: Foo)`:
 this clearly is a cycle and cycles are usually rejected by the trait solver,
 unless the `WellFormed` predicate is made co-inductive.
 A co-inductive predicate, as discussed in the chapter on
 [goals and clauses](./goals-and-clauses.md#coinductive-goals), is a predicate
 for which the
 trait solver accepts cycles. In our setting, this would be a valid thing to do:
 indeed, the `WellFormed` predicate just serves as a way of enumerating all
 the implied bounds. Hence, it's like a fixed point algorithm: it tries to grow
 the set of implied bounds until there is nothing more to add. Here, a cycle
 in the chain of `WellFormed` predicates just means that there are no more bounds
 to add in that direction, so we can just accept this cycle and focus on other
 directions. It's easy to prove that under these co-inductive semantics, we
 are effectively visiting all the transitive implied bounds, and only these.
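The difference between the two treatments of a cycle can be sketched with a
tiny recursive solver for the `impl Foo for i32` example. This is a
deliberately simplified model, not the real trait solver: there are only two
goals, the cycle goes through a single predicate, and `Goal`, `subgoals` and
`prove` are hypothetical names.

```rust
/// The two goals that appear when checking `impl Foo for i32`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Goal {
    /// `Implemented(i32: Foo)`
    Implemented,
    /// `WellFormed(i32: Foo)`
    WellFormed,
}

/// Subgoals that must all hold for `goal` to hold.
fn subgoals(goal: Goal) -> Vec<Goal> {
    match goal {
        // `impl Foo for i32` has no where clauses.
        Goal::Implemented => vec![],
        // WellFormed(i32: Foo) :-
        //     Implemented(i32: Foo) && WellFormed(<i32 as Foo>::Item: Foo)
        // and `<i32 as Foo>::Item` normalizes to `i32`.
        Goal::WellFormed => vec![Goal::Implemented, Goal::WellFormed],
    }
}

/// Depth-first proof search. `stack` holds the goals currently being proven.
fn prove(goal: Goal, wf_is_coinductive: bool, stack: &mut Vec<Goal>) -> bool {
    if stack.contains(&goal) {
        // We hit a cycle: accept it only for a co-inductive predicate.
        return goal == Goal::WellFormed && wf_is_coinductive;
    }
    stack.push(goal);
    let ok = subgoals(goal)
        .into_iter()
        .all(|sub| prove(sub, wf_is_coinductive, stack));
    stack.pop();
    ok
}

fn main() {
    // Inductive reading: the cycle is rejected and the impl is refused.
    println!("{}", prove(Goal::WellFormed, false, &mut Vec::new()));
    // Co-inductive reading: the cycle is accepted and the impl is legal.
    println!("{}", prove(Goal::WellFormed, true, &mut Vec::new()));
}
```

Note that `Implemented(i32: Foo)` is still proven the ordinary way in both
runs: co-induction only changes what happens when `WellFormed` meets itself
again.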
 ## Implied bounds on types
 We mainly talked about implied bounds for traits because this was the most
 subtle regarding implementation. Implied bounds on types are simpler,
 especially because if we assume that a type is well-formed, we don't use that
 fact to deduce that other types are well-formed, we only use it to deduce
 that e.g. some trait bounds hold.
 For types, we just use rules like these ones:
 ```rust,ignore
 struct Type<...> where WC1, ..., WCn {
    ...
 }
 ```
 ```text
 forall<...> {
    WellFormed(Type<...>) :- WC1, ..., WCn.
 }
 forall<...> {
    FromEnv(WC1) :- FromEnv(Type<...>).
    ...
    FromEnv(WCn) :- FromEnv(Type<...>).
 }
 ```
 We can see that we have this asymmetry between the well-formedness check,
 which only verifies that the direct superbounds hold, and implied bounds, which
 give access to all bounds transitively implied by the where clauses. In that
 case this is ok because as we said, we don't use `FromEnv(Type<...>)` to deduce
 other `FromEnv(OtherType<...>)` things, nor do we use `FromEnv(Type: Trait)` to
 deduce `FromEnv(OtherType<...>)` things. So in that sense type definitions are
 "less recursive" than traits, and we saw in a previous subsection that
 it was the combination of asymmetry and recursive trait / impls that led to
 unsoundness. As such, the `WellFormed(Type<...>)` predicate does not need
 to be co-inductive.
 This asymmetry optimization is useful because in a real Rust program, we have
 to check the well-formedness of types very often (e.g. for each type which
 appears in the body of a function).