From 0e032c2870f6ad29e7ec41caacf3c933ba284b6e Mon Sep 17 00:00:00 2001 From: scalexm Date: Fri, 19 Oct 2018 15:20:41 +0200 Subject: [PATCH] Write implied bounds chapter --- src/traits/implied-bounds.md | 501 ++++++++++++++++++++++++++++++++++- 1 file changed, 496 insertions(+), 5 deletions(-) diff --git a/src/traits/implied-bounds.md b/src/traits/implied-bounds.md index 26a63bf3..1feb84e3 100644 --- a/src/traits/implied-bounds.md +++ b/src/traits/implied-bounds.md @@ -1,9 +1,500 @@ # Implied Bounds -*to be written* +Implied bounds remove the need to repeat where clauses written on +a type declaration or a trait declaration. For example, say we have the +following type declaration: +```rust,ignore +struct HashSet { + ... +} +``` -Cover: +then everywhere we use `HashSet` as an "input" type, that is appearing in +the receiver type of an `impl` or in the arguments of a function, we don't +want to have to repeat the `where K: Hash` bound, as in: -- Why the `FromEnv` setup etc is the way it is -- Perhaps move some of the material from 'lowering rules' in to here -- Show various examples where you could go wrong +```rust,ignore +// I don't want to have to repeat `where K: Hash` here. +impl HashSet { + ... +} + +// Same here. +fn loud_insert(set: &mut HashSet, item: K) { + println!("inserting!"); + set.insert(item); +} +``` + +Note that in the `loud_insert` example, `HashSet` is not the type of an +argument of the `loud_insert` function, it only *appears* in the argument type +`&mut HashSet`. + +The rationale for applying implied bounds to input types is that, for example, +in order to call the `loud_insert` function above, the programmer must have +*produced* the type `HashSet` already, hence the compiler already verified +that `HashSet` was well-formed, i.e. that `K` effectively implemented +`Hash`, as in the following example: + +```rust,ignore +fn main() { + // I am producing a value of type `HashSet`. + // If `i32` was not `Hash`, the compiler would report an error here. + let set: HashSet = HashSet::new(); + loud_insert(&mut set, 5); +} +``` + +hence we don't want to repeat where clauses for input types because that would +sort of duplicate the work of the programmer, having to verify that their types +are well-formed both when calling the function and when using them in the +arguments of their function. The same reasoning applies when using an `impl`. + +Similarly, given the following trait declaration: +```rust,ignore +trait Copy where Self: Clone { + ... +} +``` + +then everywhere we bound over `SomeType: Copy`, we would like to be able to +use the fact that `SomeType: Clone` without having to write it explicitly, +as in: +```rust,ignore +fn loud_clone(x: T) { + println!("cloning!"); + x.clone(); +} + +fn fun_with_copy(x: T) { + println!("will clone a `Copy` type soon..."); + + // I'm using `loud_clone` with `T: Copy`, I know this + // implies `T: Clone` so I don't want to have to write it explicitly. + loud_clone(x); +} +``` + +The rationale for implied bounds for traits is that if a type implement `Copy`, +that is if there exists an `impl Copy` for that type, there *ought* to exist +an `impl Clone` for that type, otherwise the compiler would have reported an +error in the first place. So again, if we were forced to repeat the additionnal +`where SomeType: Clone` everywhere whereas we already know that +`SomeType: Copy` hold, we would kind of duplicate the verification work. + +Implied bounds are not yet completely enforced in rustc, at the moment it only +works for outlive requirements, super trait bounds and bounds on associated +types. The full RFC can be found [here][RFC]. We'll give here a brief view +of how implied bounds work and why we chose to implement it that way. The +complete set of lowering rules can be found in the corresponding +[chapter](./lowering-rules.md). + +[RFC]: https://github.com/rust-lang/rfcs/blob/master/text/2089-implied-bounds.md + +## Implied bounds and lowering rules + +Now we need to express implied bounds in terms of logical rules. We will start +with exposing a naive way to do it. Suppose that we have the following traits: +```rust,ignore +trait Foo { + ... +} + +trait Bar where Self: Foo { } { + ... +} +``` + +So we would like to say that if a type implements `Bar`, then necessarily +it must also implement `Foo`. We might think that a clause like this would +work: +```text +forall { + Implemented(Type: Foo) :- Implemented(Type: Bar). +} +``` + +Now suppose that we just write this impl: +```rust,ignore +struct X; + +impl Bar for X { } +``` + +Clearly this should not be allowed: indeed, we wrote a `Bar` impl for `X`, but +the `Bar` trait requires that we also implement `Foo` for `X`, which we never +did. In terms of what the compiler does, this would look like this: +```rust,ignore +struct X; + +impl Bar for X { + // We are in a `Bar` impl for the type `X`. + // There is a `where Self: Foo` bound on the `Bar` trait declaration. + // Hence I need to prove that `X` also implements `Foo` for that impl + // to be legal. +} +``` +So the compiler would try to prove `Implemented(X: Foo)`. Of course it will +not find any `impl Foo for X` since we did not write any. However, it +will see our implied bound clause: +```text +forall { + Implemented(Type: Foo) :- Implemented(Type: Bar). +} +``` + +so that it may be able to prove `Implemented(X: Foo)` if `Implemented(X: Bar)` +holds. And it turns out that `Implemented(X: Bar)` does hold since we wrote +a `Bar` impl for `X`! Hence the compiler will accept the `Bar` impl while it +should not. + +## Implied bounds coming from the environment + +So the naive approach does not work. What we need to do is to somehow decouple +implied bounds from impls. Suppose we know that a type `SomeType<...>` +implements `Bar` and we want to deduce that `SomeType<...>` must also implement +`Foo`. + +There are two possibilities: first one, we have enough information about +`SomeType<...>` to see that there exists a `Bar` impl in the program which +covers `SomeType<...>`, for example a plain `impl<...> Bar for SomeType<...>`. +Then if the compiler has done its job correctly, there *must* exist a `Foo` +impl which covers `SomeType<...>`, e.g. another plain +`impl<...> Foo for SomeType<...>`. In that case then, we can just use this +impl and we do not need implied bounds at all. + +Second possibility: we do not know enough about `SomeType<...>` in order to +find a `Bar` impl which covers it, for example if `SomeType<...>` is just +a type parameter in a function: +```rust,ignore +fn foo() { + // We'd like to deduce `Implemented(T: Foo)`. +} +``` + +that is, the information that `T` implements `Bar` here comes from the +*environment*. The environment is the set of things that we assume to be true +when we type check some Rust declaration. In that case, what we assume is that +`T: Bar`. Then at that point, we might authorize ourselves to have some kind +of "local" implied bound reasoning which would say +`Implemented(T: Foo) :- Implemented(T: Bar)`. This reasoning would +only be done within our `foo` function in order to avoid the earlier +problem where we had a global clause. + +We can apply these local reasonings everywhere we can have an environment +-- i.e. when we can write where clauses -- that is inside impls, +trait declarations and type declarations. + +## Computing implied bounds with `FromEnv` + +The previous subsection showed that it was only useful to compute implied +bounds for facts coming from the environment. +We talked about "local" rules, but there are multiple possible strategies to +indeed implement the locality of implied bounds. + +In rustc, the current strategy is to *elaborate* bounds: that is, each time +we have a fact in the environment, we recursively derive all the other things +that are implied by this fact until we reach a fixed point. For example, if +we have the following declarations: +```rust,ignore +trait A { } +trait B where Self: A { } +trait C where Self: B { } + +fn foo() { + ... +} +``` +then inside the `foo` function, we start with an environment containing only +`Implemented(T: C)`. Then because of implied bounds for the `C` trait, we +elaborate `Implemented(T: B)` and add it to our environment. Because of +implied bounds for the `B` trait, we elaborate `Implemented(T: A)`and add it +to our environment as well. We cannot elaborate anything else, so we conclude +that our final environment consists of `Implemented(T: A + B + C)`. + +In the new-style trait system, we like to encode as many things as possible +with logical rules. So rather than "elaborating", we have a set of *global* +program clauses defined like so: +```text +forall { Implemented(T: A) :- FromEnv(T: A). } + +forall { Implemented(T: B) :- FromEnv(T: B). } +forall { FromEnv(T: A) :- FromEnv(T: B). } + +forall { Implemented(T: C) :- FromEnv(T: C). } +forall { FromEnv(T: C) :- FromEnv(T: C). } +``` +So these clauses are defined globally (that is they are available from +everywhere in the program) but they cannot be used because the hypothesis +is always of the form `FromEnv(...)` which is a bit special. Indeed, as +indicated by the name, `FromEnv(...)` facts can **only** come from the +environment. +How it works is that in the `foo` function, instead of having an environment +containing `Implemented(T: C)`, we replace this environment with +`FromEnv(T: C)`. From here and thanks to the above clauses, we see that we +are able to reach any of `Implemented(T: A)`, `Implemented(T: B)` or +`Implemented(T: C)`, which is what we wanted. + +## Implied bounds and well-formedness checking + +Implied bounds are tightly related with well-formedness checking. +Well-formedness checking is the process of checking that the impls the +programmer wrote are legal, what we referred to earlier as "the compiler doing +its job correctly". + +We already saw examples of illegal and legal impls: +```rust,ignore +trait Foo { } +trait Bar where Self: Foo { } + +struct X; +struct Y; + +impl Bar for X { + // This impl is not legal: the `Bar` trait requires that we also + // implement `Foo`, and we didn't. +} + +impl Foo for Y { + // This impl is legal: there is nothing to check as there are no where + // clauses on the `Foo` trait. +} + +impl Bar for Y { + // This impl is legal: we have a `Foo` impl for `Y`. +} +``` +We must define what "legal" and "illegal" mean. For this, we introduce another +predicate: `WellFormed(Type: Trait)`. We say that the trait reference +`Type: Trait` is well-formed is `Type` meets the bounds written on the +`Trait` declaration. For each impl we write, assuming that the where clauses +declared on the impl hold, the compiler tries to prove that the corresponding +trait reference is well-formed. The impl is legal if the compiler manages to do +so. + +Coming to the definition of `WellFormed(Type: Trait)`, it would be tempting +to define it as: +```rust,ignore +trait Trait where WC1, WC2, ..., WCn { + ... +} +``` + +```text +forall { + WellFormed(Type: Trait) :- WC1 && WC2 && .. && WCn. +} +``` +and indeed this was basically what was done in rustc until it was noticed that +this mixed badly with implied bounds. The key thing is that implied bounds +allows someone to derive all bounds implied by a fact in the environment, and +this *transitively* as we've seen with the `A + B + C` traits example. +However, the `WellFormed` predicate as defined above only checks that the +*direct* superbounds hold. That is, if we come back to our `A + B + C` +example: +```rust,ignore +trait A { } +// No where clauses, always well-formed. +// forall { WellFormed(Type: A). } + +trait B where Self: A { } +// We only check the direct superbound `Self: A`. +// forall { WellFormed(Type: B) :- Implemented(Type: A). } + +trait C where Self: B { } +// We only check the direct superbound `Self: B`. We do not check +// the `Self: A` implied bound coming from the `Self: B` superbound. +// forall { WellFormed(Type: C) :- Implemented(Type: B). } +``` +There is an asymmetry between the recursive power of implied bounds and +the shallow checking of `WellFormed`. It turns out that this asymmetry +can be [exploited][bug]. Indeed, suppose that we define the following +traits: +```rust,ignore +trait Partial where Self: Copy { } +// WellFormed(Self: Partial) :- Implemented(Self: Copy). + +trait Complete where Self: Partial { } +// WellFormed(Self: Complete) :- Implemented(Self: Partial). + +impl Partial for T where T: Complete { } + +impl Complete for T { } +``` + +For the `Partial` impl, what the compiler must prove is: +```text +forall { + if (T: Complete) { // assume that the where clauses hold + WellFormed(T: Partial) // show that the trait reference is well-formed + } +} +``` +Proving `WellFormed(T: Partial)` amounts to proving `Implemented(T: Copy)`. +However, we have `Implemented(T: Complete)` in our environment: thanks to +implied bounds, we can deduce `Implemented(T: Partial)`. Using implied bounds +one level deeper, we can deduce `Implemented(T: Copy)`. Finally, the `Partial` +impl is legal. + +For the `Complete` impl, what the compiler must prove is: +```text +forall { + WellFormed(T: Complete) // show that the trait reference is well-formed +} +``` +Proving `WellFormed(T: Complete)` amounts to proving `Implemented(T: Partial)`. +We see that the `impl Partial for T` applies if we can prove +`Implemented(T: Complete)`, and it turns out we can prove this fact since our +`impl Complete for T` is a blanket impl without any where clauses. + +So both impls are legal and the compiler accepts the program. Moreover, thanks +to the `Complete` blanket impl, all types implement `Complete`. So we could +now use this impl like so: +```rust,ignore +fn eat(x: T) { } + +fn copy_everything(x: T) { + eat(x); + eat(x); +} + +fn main() { + let not_copiable = vec![1, 2, 3, 4]; + copy_everything(not_copiable); +} +``` +In this program, we use the fact that `Vec` implements `Complete`, as any +other type. Hence we can call `copy_everything` with an argument of type +`Vec`. Inside the `copy_everything` function, we have the +`Implemented(T: Complete)` bound in our environment. Thanks to implied bounds, +we can deduce `Implemented(T: Partial)`. Using implied bounds again, we deduce +`Implemented(T: Copy)` and we can indeed call the `eat` function which moves +the argument twice since its argument is `Copy`. Problem: the `T` type was +in fact `Vec` which is not copy at all, hence we will double-free the +underlying vec storage so we have a memory unsoundness in safe Rust. + +Of course, disregarding the asymmetry between `WellFormed` and implied bounds, +this bug was possible only because we had some kind of self-referencing impls. +But self-referencing impls are very useful in practice and are not the real +culprits in this affair. + +[bug]: https://github.com/rust-lang/rust/pull/43786 + +## Co-inductiveness of `WellFormed` + +So the solution is to fix this asymmetry between `WellFormed` and implied +bounds. For that, we need for the `WellFormed` predicate to not only require +that the direct superbounds hold, but also all the bounds transitively implied +by the superbounds. What we can do is to have the following rules for the +`WellFormed` predicate: +```rust,ignore +trait A { } +// WellFormed(Self: A) :- Implemented(Self: A). + +trait B where Self: A { } +// WellFormed(Self: B) :- Implemented(Self: B) && WellFormed(Self: A). + +trait C where Self: B { } +// WellFormed(Self: C) :- Implemented(Self: C) && WellFormed(Self: B). +``` + +Notice that we are now also requiring `Implemented(Self: Trait)` for +`WellFormed(Self: Trait)` to be true: this is to simplify the process of +traversing all the implied bounds transitively. This does not change anything +when checking whether impls are legal, because since we assume +that the where clauses hold inside the impl, we know that the corresponding +trait reference do hold. Thanks to this setup, you can see that we indeed +require to prove the set of all bounds transitively implied by the where +clauses. + +However there is still a catch. Suppose that we have the following trait +definition: +```rust,ignore +trait Foo where ::Item: Foo { + type Item; +} +``` + +so this definition is a bit more involved than the ones we've seen already +because it defines an associated item. However, the well-formedness rule +would not be more complicated: +```text +WellFormed(Self: Foo) :- + Implemented(Self: Foo) && + WellFormed(::Item: Foo). +``` + +Now we would like to write the following impl: +```rust,ignore +impl Foo for i32 { + type Item = i32; +} +``` +The `Foo` trait definition and the `impl Foo for i32` are perfectly valid +Rust: we're kind of recursively using our `Foo` impl in order to show that +the associated value indeed implements `Foo`, but that's ok. But if we +translates this to our well-formedness setting, the compiler proof process +inside the `Foo` impl is the following: it starts with proving that the +well-formedness goal `WellFormed(i32: Foo)` is true. In order to do that, +it must prove the following goals: `Implemented(i32: Foo)` and +`WellFormed(::Item: Foo)`. `Implemented(i32: Foo)` holds because +there is our impl and there are no where clauses on it so it's always true. +However, because of the associated type value we used, +`WellFormed(::Item: Foo)` simplifies to just +`WellFormed(i32: Foo)`. So in order to prove its original goal +`WellFormed(i32: Foo)`, the compiler needs to prove `WellFormed(i32: Foo)`: +this clearly is a cycle and cycles are usually rejected by the trait solver, +unless... if the `WellFormed` predicate was made to be co-inductive. + +A co-inductive predicate, as discussed in the chapter on +[goals and clauses](./goals-and-clauses.md#coinductive-goals), are predicates +for which the +trait solver accepts cycles. In our setting, this would be a valid thing to do: +indeed, the `WellFormed` predicate just serves as a way of enumerating all +the implied bounds. Hence, it's like a fixed point algorithm: it tries to grow +the set of implied bounds until there is nothing more to add. Here, a cycle +in the chain of `WellFormed` predicates just means that there is no more bounds +to add in that direction, so we can just accept this cycle and focus on other +directions. It's easy to prove that under these co-inductive semantics, we +are effectively visiting all the transitive implied bounds, and only these. + +## Implied bounds on types + +We mainly talked about implied bounds for traits because this was the most +subtle regarding implementation. Implied bounds on types are simpler, +especially because if we assume that a type is well-formed, we don't use that +fact to deduce that other types are well-formed, we only use it to deduce +that e.g. some trait bounds hold. + +For types, we just use rules like these ones: +```rust,ignore +struct Type<...> where WC1, ..., WCn { + ... +} +``` + +```text +forall<...> { + WellFormed(Type<...>) :- WC1, ..., WCn. +} + +forall<...> { + FromEnv(WC1) :- FromEnv(Type<...>). + ... + FromEnv(WCn) :- FromEnv(Type<...>). +} +``` +We can see that we have this asymmetry between well-formedness check, +which only verifies that the direct superbounds hold, and implied bounds which +gives access to all bounds transitively implied by the where clauses. In that +case this is ok because as we said, we don't use `FromEnv(Type<...>)` to deduce +other `FromEnv(OtherType<...>)` things, nor do we use `FromEnv(Type: Trait)` to +deduce `FromEnv(OtherType<...>)` things. So in that sense type definitions are +"less recursive" than traits, and we saw in a previous subsection that +it was the combination of asymmetry and recursive trait / impls that led to +unsoundness. As such, the `WellFormed(Type<...>)` predicate does not need +to be co-inductive. + +This asymmetry optimization is useful because in a real Rust program, we have +to check the well-formedness of types very often (e.g. for each type which +appears in the body of a function).