Explain the important concepts of exhaustiveness checking
This commit is contained in:
parent
eb008cbd1e
commit
7f49cdcba2
|
|
@ -7,7 +7,7 @@ are exhaustive.
|
|||
## Pattern usefulness
|
||||
|
||||
The central question that usefulness checking answers is:
|
||||
"in this match expression, is that branch reachable?".
|
||||
"in this match expression, is that branch redundant?".
|
||||
More precisely, it boils down to computing whether,
|
||||
given a list of patterns we have already seen,
|
||||
a given new pattern might match any new value.
|
||||
|
|
@ -42,10 +42,8 @@ because a match expression can return a value).
|
|||
|
||||
## Where it happens
|
||||
|
||||
This check is done to any expression that desugars to a match expression in MIR.
|
||||
That includes actual `match` expressions,
|
||||
but also anything that looks like pattern matching,
|
||||
including `if let`, destructuring `let`, and similar expressions.
|
||||
This check is done anywhere you can write a pattern: `match` expressions, `if let`, `let else`,
|
||||
plain `let`, and function arguments.
|
||||
|
||||
```rust
|
||||
// `match`
|
||||
|
|
@ -80,9 +78,141 @@ fn foo(Foo { x, y }: Foo) {
|
|||
|
||||
## The algorithm
|
||||
|
||||
Exhaustiveness checking is implemented in [`check_match`].
|
||||
The core of the algorithm is in [`usefulness`].
|
||||
Exhaustiveness checking is run before MIR building in [`check_match`].
|
||||
It is implemented in the [`rustc_pattern_analysis`] crate,
|
||||
with the core of the algorithm in the [`usefulness`] module.
|
||||
That file contains a detailed description of the algorithm.
|
||||
|
||||
## Important concepts
|
||||
|
||||
### Constructors and fields
|
||||
|
||||
In the value `Pair(Some(0), true)`, `Pair` is called the constructor of the value, and `Some(0)` and
|
||||
`true` are its fields. Every matcheable value can be decomposed in this way. Examples of
|
||||
constructors are: `Some`, `None`, `(,)` (the 2-tuple constructor), `Foo {..}` (the constructor for
|
||||
a struct `Foo`), and `2` (the constructor for the number `2`).
|
||||
|
||||
Each constructor takes a fixed number of fields; this is called its arity. `Pair` and `(,)` have
|
||||
arity 2, `Some` has arity 1, `None` and `42` have arity 0. Each type has a known set of
|
||||
constructors. Some types have many constructors (like `u64`) or even an infinitely many (like `&str`
|
||||
and `&[T]`).
|
||||
|
||||
Patterns are similar: `Pair(Some(_), _)` has constructor `Pair` and two fields. The difference is
|
||||
that we get some extra pattern-only constructors, namely: the wildcard `_`, variable bindings,
|
||||
integer ranges like `0..=10`, and variable-length slices like `[_, .., _]`. We treat or-patterns
|
||||
separately.
|
||||
|
||||
Now to check if a value `v` matches a pattern `p`, we check if `v`'s constructor matches `p`'s
|
||||
constructor, then recursively compare their fields if necessary. A few representative examples:
|
||||
|
||||
- `matches!(v, _) := true`
|
||||
- `matches!((v0, v1), (p0, p1)) := matches!(v0, p0) && matches!(v1, p1)`
|
||||
- `matches!(Foo { a: v0, b: v1 }, Foo { a: p0, b: p1 }) := matches!(v0, p0) && matches!(v1, p1)`
|
||||
- `matches!(Ok(v0), Ok(p0)) := matches!(v0, p0)`
|
||||
- `matches!(Ok(v0), Err(p0)) := false` (incompatible variants)
|
||||
- `matches!(v, 1..=100) := matches!(v, 1) || ... || matches!(v, 100)`
|
||||
- `matches!([v0], [p0, .., p1]) := false` (incompatible lengths)
|
||||
- `matches!([v0, v1, v2], [p0, .., p1]) := matches!(v0, p0) && matches!(v2, p1)`
|
||||
|
||||
This concept is absolutely central to pattern analysis. The [`constructor`] module provides
|
||||
functions to extract, list and manipulate constructors. This is a useful enough concept that
|
||||
variations of it can be found in other places of the compiler, like in the MIR-lowering of a match
|
||||
expression and in some clippy lints.
|
||||
|
||||
### Constructor grouping and splitting
|
||||
|
||||
The pattern-only constructors (`_`, ranges and variable-length slices) each stand for a set of
|
||||
normal constructors, e.g. `_: Option<T>` stands for the set {`None`, `Some`} and `[_, .., _]` stands
|
||||
for the infinite set {`[,]`, `[,,]`, `[,,,]`, ...} of the slice constructors of arity >= 2.
|
||||
|
||||
In order to manage these constructors, we keep them as grouped as possible. For example:
|
||||
|
||||
```rust
|
||||
match (0, false) {
|
||||
(0 ..=100, true) => {}
|
||||
(50..=150, false) => {}
|
||||
(0 ..=200, _) => {}
|
||||
}
|
||||
```
|
||||
|
||||
In this example, all of `0`, `1`, .., `49` match the same arms, and thus can be treated as a group.
|
||||
In fact, in this match, the only ranges we need to consider are: `0..50`, `50..=100`,
|
||||
`101..=150`,`151..=200` and `201..`. Similarly:
|
||||
|
||||
```rust
|
||||
enum Direction { North, South, East, West }
|
||||
# let wind = (Direction::North, 0u8);
|
||||
match wind {
|
||||
(Direction::North, 50..) => {}
|
||||
(_, _) => {}
|
||||
}
|
||||
```
|
||||
|
||||
Here we can treat all the non-`North` constructors as a group, giving us only two cases to handle:
|
||||
`North`, and everything else.
|
||||
|
||||
This is called "constructor splitting" and is crucial to having exhaustiveness run in reasonable
|
||||
time.
|
||||
|
||||
### Usefulness vs reachability in the presence of empty types
|
||||
|
||||
This is likely the subtlest aspect of exhaustiveness. To be fully precise, a match doesn't operate
|
||||
on a value, it operates on a place. In certain unsafe circumstances, it is possible for a place to
|
||||
not contain valid data for its type. This has subtle consequences for empty types. Take the
|
||||
following:
|
||||
|
||||
```rust
|
||||
enum Void {}
|
||||
let x: u8 = 0;
|
||||
let ptr: *const Void = &x as *const u8 as *const Void;
|
||||
unsafe {
|
||||
match *ptr {
|
||||
_ => println!("Reachable!"),
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
In this example, `ptr` is a valid pointer pointing to a place with invalid data. The `_` pattern
|
||||
does not look at the contents of the place `*ptr`, so this code is ok and the arm is taken. In other
|
||||
words, despite the place we are inspecting being of type `Void`, there is a reachable arm. If the
|
||||
arm had a binding however:
|
||||
|
||||
```rust
|
||||
# #[derive(Copy, Clone)]
|
||||
# enum Void {}
|
||||
# let x: u8 = 0;
|
||||
# let ptr: *const Void = &x as *const u8 as *const Void;
|
||||
# unsafe {
|
||||
match *ptr {
|
||||
_a => println!("Unreachable!"),
|
||||
}
|
||||
# }
|
||||
```
|
||||
|
||||
Here the binding loads the value of type `Void` from the `*ptr` place. In this example, this causes
|
||||
UB since the data is not valid. In the general case, this asserts validity of the data at `*ptr`.
|
||||
Either way, this arm will never be taken.
|
||||
|
||||
Finally, let's consider the empty match `match *ptr {}`. If we consider this exhaustive, then
|
||||
having invalid data at `*ptr` is invalid. In other words, the empty match is semantically
|
||||
equivalent to the `_a => ...` match. In the interest of explicitness, we prefer the case with an
|
||||
arm, hence we won't tell the user to remove the `_a` arm. In other words, the `_a` arm is
|
||||
unreachable yet not redundant. This is why we lint on redundant arms rather than unreachable
|
||||
arms, despite the fact that the lint says "unreachable".
|
||||
|
||||
These considerations only affects certain places, namely those that can contain non-valid data
|
||||
without UB. These are: pointer dereferences, reference dereferences, and union field accesses. We
|
||||
track during exhaustiveness checking whether a given place is known to contain valid data.
|
||||
|
||||
Having said all that, the current implementation of exhaustiveness checking does not follow the
|
||||
above considerations. On stable, empty types are for the most part treated as non-empty. The
|
||||
[`exhaustive_patterns`] feature errs on the other end: it allows omitting arms that could be
|
||||
reachable in unsafe situations. The [`never_patterns`] experimental feature aims to fix this and
|
||||
permit the correct behavior of empty types in patterns.
|
||||
|
||||
[`check_match`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_build/thir/pattern/check_match/index.html
|
||||
[`usefulness`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_build/thir/pattern/usefulness/index.html
|
||||
[`rustc_pattern_analysis`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_pattern_analysis/index.html
|
||||
[`usefulness`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_pattern_analysis/usefulness/index.html
|
||||
[`constructor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_pattern_analysis/constructor/index.html
|
||||
[`never_patterns`]: https://github.com/rust-lang/rust/issues/118155
|
||||
[`exhaustive_patterns`]: https://github.com/rust-lang/rust/issues/51085
|
||||
|
|
|
|||
Loading…
Reference in New Issue