Merge pull request #270 from michaelwoerister/query-eval-model-update
Add "The Query Evaluation Model in Detail" and "Incremental Compilation In Detail" chapters.
This commit is contained in:
commit 1ad362e6d6
```diff
@@ -27,7 +27,9 @@
 - [The Rustc Driver](./rustc-driver.md)
 - [Rustdoc](./rustdoc.md)
 - [Queries: demand-driven compilation](./query.md)
-- [Incremental compilation](./incremental-compilation.md)
+- [The Query Evaluation Model in Detail](./queries/query-evaluation-model-in-detail.md)
+- [Incremental compilation](./queries/incremental-compilation.md)
+- [Incremental compilation In Detail](./queries/incremental-compilation-in-detail.md)
 - [Debugging and Testing](./incrcomp-debugging.md)
 - [The parser](./the-parser.md)
 - [`#[test]` Implementation](./test-implementation.md)
```
```diff
@@ -15,7 +15,7 @@ completeness | completeness is a technical term in type theory. Comp
 control-flow graph | a representation of the control-flow of a program; see [the background chapter for more](./background.html#cfg)
 CTFE | Compile-Time Function Evaluation. This is the ability of the compiler to evaluate `const fn`s at compile time. This is part of the compiler's constant evaluation system. ([see more](../const-eval.html))
 cx | we tend to use "cx" as an abbreviation for context. See also `tcx`, `infcx`, etc.
-DAG | a directed acyclic graph is used during compilation to keep track of dependencies between queries. ([see more](../incremental-compilation.html))
+DAG | a directed acyclic graph is used during compilation to keep track of dependencies between queries. ([see more](../queries/incremental-compilation.html))
 data-flow analysis | a static analysis that figures out what properties are true at each point in the control-flow of a program; see [the background chapter for more](./background.html#dataflow)
 DefId | an index identifying a definition (see `librustc/hir/def_id.rs`). Uniquely identifies a `DefPath`.
 Double pointer | a pointer with additional metadata. See "fat pointer" for more.
```
@@ -0,0 +1,354 @@
# Incremental Compilation In Detail

The incremental compilation scheme is, in essence, a surprisingly
simple extension to the overall query system. It relies on the fact that:

1. queries are pure functions -- given the same inputs, a query will always
   yield the same result, and
2. the query model structures compilation in an acyclic graph that makes
   dependencies between individual computations explicit.

This chapter will explain how we can use these properties for making things
incremental and then goes on to discuss various implementation issues.

# A Basic Algorithm For Incremental Query Evaluation

As explained in the [query evaluation model primer][query-model], query
invocations form a directed-acyclic graph. Here's the example from the
previous chapter again:

```ignore
  list_of_all_hir_items <----------------------------- type_check_crate()
                                                               |
                                                               |
  Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
                                        |                      |
                      +-----------------+                      |
                      |                                        |
                      v                                        |
  Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+
```

Since every access from one query to another has to go through the query
context, we can record these accesses and thus actually build this dependency
graph in memory. With dependency tracking enabled, when compilation is done,
we know which queries were invoked (the nodes of the graph) and, for each
invocation, which other queries or inputs have gone into computing the query's
result (the edges of the graph).

Now suppose we change the source code of our program so that the
HIR of `bar` looks different than before. Our goal is to only recompute
those queries that are actually affected by the change while just re-using
the cached results of all the other queries. Given the dependency graph, we can
do exactly that. For a given query invocation, the graph tells us exactly
what data has gone into computing its result: we just have to follow the
edges until we reach something that has changed. If we don't encounter
anything that has changed, we know that the query would still evaluate to
the same result we already have in our cache.

Taking the `type_of(foo)` invocation from above as an example, we can check
whether the cached result is still valid by following the edges to its
inputs. The only edge leads to `Hir(foo)`, an input that has not been affected
by the change. So we know that the cached result for `type_of(foo)` is still
valid.

The story is a bit different for `type_check_item(foo)`: We again walk the
edges and already know that `type_of(foo)` is fine. Then we get to
`type_of(bar)` which we have not checked yet, so we walk the edges of
`type_of(bar)` and encounter `Hir(bar)` which *has* changed. Consequently
the result of `type_of(bar)` might yield a different result than what we
have in the cache and, transitively, the result of `type_check_item(foo)`
might have changed too. We thus re-run `type_check_item(foo)`, which in
turn will re-run `type_of(bar)`, which will yield an up-to-date result
because it reads the up-to-date version of `Hir(bar)`.
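The "follow the edges until we reach something that has changed" rule of this
basic algorithm can be modeled as a plain reachability check over the recorded
graph. The following is only an illustrative sketch -- the graph representation
and the function name are made up, not the compiler's actual data structures:

```rust
use std::collections::{HashMap, HashSet};

// `deps` maps each query node to the nodes it read while computing its
// result (its outgoing edges in the dependency graph).
fn needs_recompute(
    node: &str,
    deps: &HashMap<&str, Vec<&str>>,
    changed_inputs: &HashSet<&str>,
) -> bool {
    // A node must be recomputed if it is a changed input itself or if
    // anything it depends on (transitively) has changed.
    if changed_inputs.contains(node) {
        return true;
    }
    deps.get(node)
        .map(|ds| ds.iter().any(|d| needs_recompute(d, deps, changed_inputs)))
        .unwrap_or(false)
}

fn main() {
    // The example graph from the text: edges point from a query to the
    // data it read.
    let mut deps: HashMap<&str, Vec<&str>> = HashMap::new();
    deps.insert("type_of(foo)", vec!["Hir(foo)"]);
    deps.insert("type_of(bar)", vec!["Hir(bar)"]);
    deps.insert("type_check_item(foo)", vec!["type_of(foo)", "type_of(bar)"]);
    deps.insert("type_check_item(bar)", vec!["type_of(bar)"]);

    // Only `Hir(bar)` changed.
    let changed: HashSet<&str> = ["Hir(bar)"].iter().copied().collect();

    // `type_of(foo)` only reads `Hir(foo)`, so its cached result is reusable.
    assert!(!needs_recompute("type_of(foo)", &deps, &changed));
    // `type_check_item(foo)` transitively reads `Hir(bar)`, so it is re-run.
    assert!(needs_recompute("type_check_item(foo)", &deps, &changed));
}
```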

# The Problem With The Basic Algorithm: False Positives

If you read the previous paragraph carefully, you'll notice that it says that
`type_of(bar)` *might* have changed because one of its inputs has changed.
There's also the possibility that it might still yield exactly the same
result *even though* its input has changed. Consider an example with a
simple query that just computes the sign of an integer:

```ignore
IntValue(x) <---- sign_of(x) <--- some_other_query(x)
```

Let's say that `IntValue(x)` starts out as `1000` and then is set to `2000`.
Even though `IntValue(x)` is different in the two cases, `sign_of(x)` yields
the result `+` in both cases.

If we follow the basic algorithm, however, `some_other_query(x)` would have to
(unnecessarily) be re-evaluated because it transitively depends on a changed
input. Change detection yields a "false positive" in this case because it has
to conservatively assume that `some_other_query(x)` might be affected by that
changed input.

Unfortunately it turns out that the actual queries in the compiler are full
of examples like this, and small changes to the input often potentially affect
very large parts of the output binaries. As a consequence, we had to make the
change detection system smarter and more accurate.
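The situation can be made concrete with a tiny sketch. The `sign_of` provider
below is hypothetical, chosen only to mirror the diagram above:

```rust
// Hypothetical `sign_of` provider: the result depends only on the sign
// of `IntValue(x)`, not on its magnitude.
fn sign_of(x: i64) -> char {
    if x >= 0 { '+' } else { '-' }
}

fn main() {
    // `IntValue(x)` changes from 1000 to 2000 ...
    // ... yet `sign_of(x)` yields `+` in both cases:
    assert_eq!(sign_of(1000), '+');
    assert_eq!(sign_of(2000), '+');
    assert_eq!(sign_of(1000), sign_of(2000));

    // The basic algorithm never compares results; it only sees that the
    // `IntValue(x)` input changed, so `some_other_query(x)` -- which
    // depends on `sign_of(x)` -- is re-evaluated even though nothing it
    // can observe has changed: a false positive.
}
```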

# Improving Accuracy: The red-green Algorithm

The "false positives" problem can be solved by interleaving change detection
and query re-evaluation. Instead of walking the graph all the way to the
inputs when trying to find out if some cached result is still valid, we can
check if a result has *actually* changed after we were forced to re-evaluate
it.

We call this algorithm, for better or worse, the red-green algorithm because
nodes in the dependency graph are assigned the color green if we were able to
prove that their cached result is still valid and the color red if the result
has turned out to be different after re-evaluating it.

The meat of red-green change tracking is implemented in the try-mark-green
algorithm that, you've guessed it, tries to mark a given node as green:

```rust,ignore
fn try_mark_green(tcx, current_node) -> bool {

    // Fetch the inputs to `current_node`, i.e. get the nodes that the direct
    // edges from `node` lead to.
    let dependencies = tcx.dep_graph.get_dependencies_of(current_node);

    // Now check all the inputs for changes
    for dependency in dependencies {

        match tcx.dep_graph.get_node_color(dependency) {
            Green => {
                // This input has already been checked before and it has not
                // changed; so we can go on to check the next one
            }
            Red => {
                // We found an input that has changed. We cannot mark
                // `current_node` as green without re-running the
                // corresponding query.
                return false
            }
            Unknown => {
                // This is the first time we look at this node. Let's try
                // to mark it green by calling try_mark_green() recursively.
                if try_mark_green(tcx, dependency) {
                    // We successfully marked the input as green, on to the
                    // next.
                } else {
                    // We could *not* mark the input as green. This means we
                    // don't know if its value has changed. In order to find
                    // out, we re-run the corresponding query now!
                    tcx.run_query_for(dependency);

                    // Fetch and check the node color again. Running the query
                    // has forced it to either red (if it yielded a different
                    // result than we have in the cache) or green (if it
                    // yielded the same result).
                    match tcx.dep_graph.get_node_color(dependency) {
                        Red => {
                            // The input turned out to be red, so we cannot
                            // mark `current_node` as green.
                            return false
                        }
                        Green => {
                            // Re-running the query paid off! The result is the
                            // same as before, so this particular input does
                            // not invalidate `current_node`.
                        }
                        Unknown => {
                            // There is no way a node has no color after
                            // re-running the query.
                            panic!("unreachable")
                        }
                    }
                }
            }
        }
    }

    // If we have gotten through the entire loop, it means that all inputs
    // have turned out to be green. If all inputs are unchanged, it means
    // that the query result corresponding to `current_node` cannot have
    // changed either.
    tcx.dep_graph.mark_green(current_node);

    true
}

// Note: The actual implementation can be found in
// src/librustc/dep_graph/graph.rs
```

By using red-green marking we can avoid the devastating cumulative effect of
having false positives during change detection. Whenever a query is executed
in incremental mode, we first check if it's already green. If not, we run
`try_mark_green()` on it. If it still isn't green after that, then we actually
invoke the query provider to re-compute the result.
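The pseudocode above can be turned into a tiny, self-contained model. The
`Engine` type, its node names, the toy providers, and the way inputs are
pre-colored are all invented for illustration; the sketch only shows how
re-running one query can rescue a node from a false positive:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Color { Red, Green, Unknown }

struct Engine {
    deps: HashMap<&'static str, Vec<&'static str>>, // node -> its inputs
    prev_results: HashMap<&'static str, i64>,       // previous session's cache
    colors: HashMap<&'static str, Color>,
}

impl Engine {
    // Toy "providers" for the current session, where IntValue(x) is 2000.
    fn compute(&self, node: &'static str) -> i64 {
        match node {
            "IntValue(x)" => 2000,
            "sign_of(x)" => if self.compute("IntValue(x)") >= 0 { 1 } else { -1 },
            "some_other_query(x)" => self.compute("sign_of(x)") * 10,
            _ => unreachable!(),
        }
    }

    // Re-run the query and compare against the cached result, coloring
    // the node red or green accordingly.
    fn run_query(&mut self, node: &'static str) {
        let new_result = self.compute(node);
        let color = if self.prev_results.get(node) == Some(&new_result) {
            Color::Green
        } else {
            Color::Red
        };
        self.prev_results.insert(node, new_result);
        self.colors.insert(node, color);
    }

    fn try_mark_green(&mut self, node: &'static str) -> bool {
        let deps = self.deps.get(node).cloned().unwrap_or_default();
        for dep in deps {
            match self.colors.get(dep).copied().unwrap_or(Color::Unknown) {
                Color::Green => {}              // unchanged, check the next one
                Color::Red => return false,     // a changed input: give up
                Color::Unknown => {
                    if !self.try_mark_green(dep) {
                        // Could not prove it unchanged: re-run and re-check.
                        self.run_query(dep);
                        if self.colors[dep] == Color::Red {
                            return false;
                        }
                    }
                }
            }
        }
        self.colors.insert(node, Color::Green);
        true
    }
}

fn demo() -> bool {
    let mut e = Engine {
        deps: HashMap::from([
            ("sign_of(x)", vec!["IntValue(x)"]),
            ("some_other_query(x)", vec!["sign_of(x)"]),
        ]),
        // Results from the previous session, where IntValue(x) was 1000.
        prev_results: HashMap::from([("sign_of(x)", 1), ("some_other_query(x)", 10)]),
        // Inputs are colored eagerly: IntValue(x) changed, so it is red.
        colors: HashMap::from([("IntValue(x)", Color::Red)]),
    };
    // Re-running `sign_of` proves its result unchanged, so the dependent
    // query is marked green: no false positive.
    e.try_mark_green("some_other_query(x)") && e.colors["sign_of(x)"] == Color::Green
}

fn main() {
    assert!(demo());
}
```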



# The Real World: How Persistence Makes Everything Complicated

The sections above described the underlying algorithm for incremental
compilation but, because the compiler process exits after being finished and
takes the query context with its result cache with it into oblivion, we have
to persist data to disk so the next compilation session can make use of it.
This comes with a whole new set of implementation challenges:

- The query result cache is stored to disk, so the results are not readily
  available for change comparison.
- A subsequent compilation session will start off with a new version of the
  code that has arbitrary changes applied to it. All kinds of IDs and indices
  that are generated from a global, sequential counter (e.g. `NodeId`,
  `DefId`, etc.) might have shifted, making the persisted results on disk not
  immediately usable anymore because the same numeric IDs and indices might
  refer to completely new things in the new compilation session.
- Persisting things to disk comes at a cost, so not every tiny piece of
  information should actually be cached in between compilation sessions.
  Fixed-size, plain-old-data is preferred to complex things that need to run
  branching code during (de-)serialization.

The following sections describe how the compiler currently solves these issues.

## A Question Of Stability: Bridging The Gap Between Compilation Sessions

As noted before, various IDs (like `DefId`) are generated by the compiler in a
way that depends on the contents of the source code being compiled. ID assignment
is usually deterministic, that is, if the exact same code is compiled twice,
the same things will end up with the same IDs. However, if something
changes, e.g. a function is added in the middle of a file, there is no
guarantee that anything will have the same ID as it had before.

As a consequence we cannot represent the data in our on-disk cache the same
way it is represented in memory. For example, if we just stored a piece
of type information like `TyKind::FnDef(DefId, &'tcx Substs<'tcx>)` (as we do
in memory) and then the contained `DefId` points to a different function in
a new compilation session, we'd be in trouble.

The solution to this problem is to find "stable" forms for IDs which remain
valid in between compilation sessions. For the most important case, `DefId`s,
these are the so-called `DefPath`s. Each `DefId` has a
corresponding `DefPath` but in place of a numeric ID, a `DefPath` is based on
the path to the identified item, e.g. `std::collections::HashMap`. The
advantage of an ID like this is that it is not affected by unrelated changes.
For example, one can add a new function to `std::collections` but
`std::collections::HashMap` would still be `std::collections::HashMap`. A
`DefPath` is "stable" across changes made to the source code while a `DefId`
isn't.

There is also the `DefPathHash` which is just a 128-bit hash value of the
`DefPath`. The two contain the same information and we mostly use the
`DefPathHash` because it is simpler to handle, being `Copy` and self-contained.

This principle of stable identifiers is used to make the data in the on-disk
cache resilient to source code changes. Instead of storing a `DefId`, we store
the `DefPathHash` and when we deserialize something from the cache, we map the
`DefPathHash` to the corresponding `DefId` in the *current* compilation session
(which is just a simple hash table lookup).
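A minimal sketch of that remapping step follows. The two types are invented
stand-ins, and std's 64-bit `DefaultHasher` stands in for the compiler's
128-bit hash; only the lookup scheme is the point:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Illustrative stand-ins for the real compiler types.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct DefPathHash(u64);
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct DefId(u32);

// Hash the *stable* path, never the session-local numeric ID.
fn def_path_hash(def_path: &str) -> DefPathHash {
    let mut h = DefaultHasher::new();
    def_path.hash(&mut h);
    DefPathHash(h.finish())
}

fn main() {
    // Previous session: the on-disk cache stored the stable DefPathHash
    // of `std::collections::HashMap` instead of its numeric DefId.
    let cached = def_path_hash("std::collections::HashMap");

    // Current session: a function was added somewhere, so numeric IDs
    // shifted and the same item now happens to be DefId(9).
    let current: HashMap<DefPathHash, DefId> =
        HashMap::from([(def_path_hash("std::collections::HashMap"), DefId(9))]);

    // Deserializing from the cache is just a hash table lookup:
    assert_eq!(current[&cached], DefId(9));
}
```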

The `HirId`, used for identifying HIR components that don't have their own
`DefId`, is another such stable ID. It is (conceptually) a pair of a `DefPath`
and a `LocalId`, where the `LocalId` identifies something (e.g. a `hir::Expr`)
locally within its "owner" (e.g. a `hir::Item`). If the owner is moved around,
the `LocalId`s within it are still the same.



## Checking Query Results For Changes: StableHash And Fingerprints

In order to do red-green-marking we often need to check if the result of a
query has changed compared to the result it had during the previous
compilation session. There are two performance problems with this though:

- We'd like to avoid having to load the previous result from disk just for
  doing the comparison. We already computed the new result and will use that.
  Also, loading a result from disk will "pollute" the interners with data that
  is unlikely to ever be used.
- We don't want to store each and every result in the on-disk cache. For
  example, it would be wasted effort to persist things to disk that are
  already available in upstream crates.

The compiler avoids these problems by using so-called `Fingerprint`s. Each time
a new query result is computed, the query engine will compute a 128 bit hash
value of the result. We call this hash value "the `Fingerprint` of the query
result". The hashing is (and has to be) done "in a stable way". This means
that whenever something is hashed that might change in between compilation
sessions (e.g. a `DefId`), we instead hash its stable equivalent
(e.g. the corresponding `DefPath`). That's what the whole `StableHash`
infrastructure is for. This way `Fingerprint`s computed in two
different compilation sessions are still comparable.

The next step is to store these fingerprints along with the dependency graph.
This is cheap since fingerprints are just bytes to be copied. It's also cheap to
load the entire set of fingerprints together with the dependency graph.

Now, when red-green-marking reaches the point where it needs to check if a
result has changed, it can just compare the (already loaded) previous
fingerprint to the fingerprint of the new result.

This approach works rather well but it's not without flaws:

- There is a small possibility of hash collisions. That is, two different
  results could have the same fingerprint and the system would erroneously
  assume that the result hasn't changed, leading to a missed update.

  We mitigate this risk by using a high-quality hash function and a 128 bit
  wide hash value. Due to these measures the practical risk of a hash
  collision is negligible.

- Computing fingerprints is quite costly. It is the main reason why incremental
  compilation can be slower than non-incremental compilation. We are forced to
  use a good and thus expensive hash function, and we have to map things to
  their stable equivalents while doing the hashing.

In the future we might want to explore different approaches to this problem.
For now it's `StableHash` and `Fingerprint`.
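The "hash the stable equivalent" idea can be sketched as follows. The result
type is invented and std's 64-bit `DefaultHasher` again stands in for the
128-bit `Fingerprint`; the point is only which field gets hashed:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// A toy query result that mentions another definition. In memory it
// holds a session-local numeric id; for fingerprinting we hash the
// stable path instead.
struct FnDefResult {
    callee_def_id: u32,      // session-local, NOT stable
    callee_def_path: String, // stable across sessions
}

fn fingerprint(result: &FnDefResult) -> u64 {
    let mut h = DefaultHasher::new();
    // Hash the stable equivalent of the DefId, never the raw number.
    result.callee_def_path.hash(&mut h);
    h.finish()
}

fn main() {
    // The same logical result in two sessions; only the numeric id shifted.
    let previous = FnDefResult { callee_def_id: 7, callee_def_path: "std::mem::swap".into() };
    let current = FnDefResult { callee_def_id: 9, callee_def_path: "std::mem::swap".into() };

    // The fingerprints still compare equal, so red-green marking can
    // conclude the result is unchanged.
    assert_eq!(fingerprint(&previous), fingerprint(&current));
    let _ = (previous.callee_def_id, current.callee_def_id);
}
```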



## A Tale Of Two DepGraphs: The Old And The New

The initial description of dependency tracking glosses over a few details
that quickly become a head scratcher when actually trying to implement things.
In particular it's easy to overlook that we are actually dealing with *two*
dependency graphs: the one we built during the previous compilation session and
the one that we are building for the current compilation session.

When a compilation session starts, the compiler loads the previous dependency
graph into memory as an immutable piece of data. Then, when a query is invoked,
it will first try to mark the corresponding node in the graph as green. This
really means that we are trying to mark the node in the *previous* dep-graph
that corresponds to the query key in the *current* session as green. How do we
do this mapping between current query key and previous `DepNode`? The answer
is again `Fingerprint`s: nodes in the dependency graph are identified by a
fingerprint of the query key. Since fingerprints are stable across compilation
sessions, computing one in the current session allows us to find a node
in the dependency graph from the previous session. If we don't find a node with
the given fingerprint, it means that the query key refers to something that
did not yet exist in the previous session.

So, having found the dep-node in the previous dependency graph, we can look
up its dependencies (also dep-nodes in the previous graph) and continue with
the rest of the try-mark-green algorithm. The next interesting thing happens
when we successfully mark a node as green. At that point we copy the node
and the edges to its dependencies from the old graph into the new graph. We
have to do this because the new dep-graph cannot acquire the
node and edges via the regular dependency tracking. The tracking system can
only record edges while actually running a query -- but running the query,
although we have the result already cached, is exactly what we want to avoid.

Once the compilation session has finished, all the unchanged parts have been
copied over from the old into the new dependency graph, while the changed parts
have been added to the new graph by the tracking system. At this point, the
new graph is serialized out to disk, alongside the query result cache, and can
act as the previous dep-graph in a subsequent compilation session.
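The copying step for green nodes can be sketched like this. The graph type,
function name, and node names are invented for illustration; only the
"promote from old to new" idea is taken from the text:

```rust
use std::collections::{HashMap, HashSet};

// A toy dep-graph: each node maps to the nodes its edges lead to.
struct DepGraph {
    edges: HashMap<&'static str, Vec<&'static str>>,
}

// When a node is marked green, the tracking system never re-ran its
// query, so the new graph can only get the node and its edges by
// copying them over from the (immutable) previous graph.
fn promote_green(
    prev: &DepGraph,
    current: &mut DepGraph,
    green_nodes: &HashSet<&'static str>,
) {
    for &node in green_nodes {
        if let Some(deps) = prev.edges.get(node) {
            current.edges.insert(node, deps.clone());
        }
    }
}

fn main() {
    let prev = DepGraph {
        edges: HashMap::from([
            ("type_of(foo)", vec!["Hir(foo)"]),
            ("type_of(bar)", vec!["Hir(bar)"]),
        ]),
    };
    let mut current = DepGraph { edges: HashMap::new() };

    // Only `type_of(foo)` could be marked green this session.
    let green: HashSet<&'static str> = ["type_of(foo)"].into_iter().collect();
    promote_green(&prev, &mut current, &green);

    // The green node was copied; the red one will be re-recorded by the
    // tracking system when its query actually runs.
    assert!(current.edges.contains_key("type_of(foo)"));
    assert!(!current.edges.contains_key("type_of(bar)"));
}
```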


## Didn't You Forget Something?: Cache Promotion

TODO


# The Future: Shortcomings Of The Current System and Possible Solutions

TODO


[query-model]: ./query-evaluation-model-in-detail.html
@@ -0,0 +1,237 @@

# The Query Evaluation Model in Detail

This chapter provides a deeper dive into the abstract model queries are built on.
It does not go into implementation details but tries to explain
the underlying logic. The examples here, therefore, have been stripped down and
simplified and don't directly reflect the compiler's internal APIs.

## What is a query?

Abstractly we view the compiler's knowledge about a given crate as a "database"
and queries are the way of asking the compiler questions about it, i.e.
we "query" the compiler's "database" for facts.

However, there's something special about this compiler database: it starts out
empty and is filled on-demand when queries are executed. Consequently, a query
must know how to compute its result if the database does not contain it yet. For
doing so, it can access other queries and certain input values that the database
is pre-filled with on creation.

A query thus consists of the following things:

- A name that identifies the query
- A "key" that specifies what we want to look up
- A result type that specifies what kind of result it yields
- A "provider" which is a function that specifies how the result is to be
  computed if it isn't already present in the database.

As an example, the name of the `type_of` query is `type_of`, its query key is a
`DefId` identifying the item we want to know the type of, the result type is
`Ty<'tcx>`, and the provider is a function that, given the query key and access
to the rest of the database, can compute the type of the item identified by the
key.

So in some sense a query is just a function that maps the query key to the
corresponding result. However, we have to apply some restrictions in order for
this to be sound:

- The key and result must be immutable values.
- The provider function must be a pure function, that is, for the same key it
  must always yield the same result.
- The only parameters a provider function takes are the key and a reference to
  the "query context" (which provides access to the rest of the "database").

The database is built up lazily by invoking queries. The query providers will
invoke other queries, for which the result is either already cached or computed
by calling another query provider. These query provider invocations
conceptually form a directed acyclic graph (DAG) at the leaves of which are
input values that are already known when the query context is created.
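A toy version of such a context, for a single made-up query `length_of`, might
look like this. All names are invented and the real query system is generated
by compiler macros; the sketch only shows the key/provider/cache structure:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// A toy query context: inputs are pre-filled at creation, results are
// memoized in an internal table.
struct QueryContext {
    inputs: HashMap<String, String>,        // pre-filled "input" data
    cache: RefCell<HashMap<String, usize>>, // memoized query results
    provider_runs: RefCell<usize>,          // for demonstration only
}

impl QueryContext {
    // The `length_of` query: key is an item name, result is a length.
    fn length_of(&self, key: &str) -> usize {
        // Return the memoized result if the database already contains it.
        if let Some(&len) = self.cache.borrow().get(key) {
            return len;
        }
        // Otherwise run the provider: a pure function of key + context.
        *self.provider_runs.borrow_mut() += 1;
        let result = self.inputs[key].len();
        self.cache.borrow_mut().insert(key.to_string(), result);
        result
    }
}

fn main() {
    let tcx = QueryContext {
        inputs: HashMap::from([("foo".to_string(), "fn foo() {}".to_string())]),
        cache: RefCell::new(HashMap::new()),
        provider_runs: RefCell::new(0),
    };

    // The first invocation computes; the second is served from the cache.
    assert_eq!(tcx.length_of("foo"), 11);
    assert_eq!(tcx.length_of("foo"), 11);
    assert_eq!(*tcx.provider_runs.borrow(), 1);
}
```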



## Caching/Memoization

Results of query invocations are "memoized" which means that the query context
will cache the result in an internal table and, when the query is invoked with
the same query key again, will return the result from the cache instead of
running the provider again.

This caching is crucial for making the query engine efficient. Without
memoization the system would still be sound (that is, it would yield the same
results) but the same computations would be done over and over again.

Memoization is one of the main reasons why query providers have to be pure
functions. If calling a provider function could yield different results for
each invocation (because it accesses some global mutable state) then we could
not memoize the result.



## Input data

When the query context is created, it is still empty: no queries have been
executed, no results are cached. But the context already provides access to
"input" data, i.e. pieces of immutable data that were computed before the
context was created and that queries can access to do their computations.
Currently this input data consists mainly of the HIR map and the command-line
options the compiler was invoked with. In the future, inputs will just consist
of command-line options and a list of source files -- the HIR map will itself
be provided by a query which processes these source files.

Without inputs, queries would live in a void without anything to compute their
result from (remember, query providers only have access to other queries and
the context but not any other outside state or information).

For a query provider, input data and results of other queries look exactly the
same: it just tells the context "give me the value of X". Because input data
is immutable, the provider can rely on it being the same across
different query invocations, just as is the case for query results.



## An example execution trace of some queries

How does this DAG of query invocations come into existence? At some point
the compiler driver will create the, as yet empty, query context. It will then,
from outside of the query system, invoke the queries it needs to perform its
task. This looks something like the following:

```rust,ignore
fn compile_crate() {
    let cli_options = ...;
    let hir_map = ...;

    // Create the query context `tcx`
    let tcx = TyCtxt::new(cli_options, hir_map);

    // Do type checking by invoking the type check query
    tcx.type_check_crate();
}
```

The `type_check_crate` query provider would look something like the following:

```rust,ignore
fn type_check_crate_provider(tcx, _key: ()) {
    let list_of_hir_items = tcx.hir_map.list_of_items();

    for item_def_id in list_of_hir_items {
        tcx.type_check_item(item_def_id);
    }
}
```

We see that the `type_check_crate` query accesses input data
(`tcx.hir_map.list_of_items()`) and invokes other queries
(`type_check_item`). The `type_check_item`
invocations will themselves access input data and/or invoke other queries,
so that in the end the DAG of query invocations will be built up backwards
from the node that was initially executed:

```ignore
         (2)                                                  (1)
  list_of_all_hir_items <----------------------------- type_check_crate()
                                                               |
     (5)            (4)                   (3)                  |
  Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
                                        |                      |
                      +-----------------+                      |
                      |                                        |
     (7)              v  (6)              (8)                  |
  Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+

  // (x) denotes invocation order
```

We also see that often a query result can be read from the cache:
`type_of(bar)` was computed for `type_check_item(foo)` so when
`type_check_item(bar)` needs it, it is already in the cache.

Query results stay cached in the query context as long as the context lives.
So if the compiler driver invoked another query later on, the above graph
would still exist and already executed queries would not have to be re-done.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
## Cycles
|
||||||
|
|
||||||
|
Earlier we stated that query invocations form a DAG. However, it would be easy
|
||||||
|
form a cyclic graph by, for example, having a query provider like the following:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
fn cyclic_query_provider(tcx, key) -> u32 {
|
||||||
|
// Invoke the same query with the same key again
|
||||||
|
tcx.cyclic_query(key)
|
||||||
|
}
|
||||||
|
```

Since query providers are regular functions, this would behave much as expected:
Evaluation would get stuck in an infinite recursion. A query like this would not
be very useful either. However, sometimes certain kinds of invalid user input
can result in queries being called in a cyclic way. The query engine includes
a check for cyclic invocations and, because cycles are an irrecoverable error,
will abort execution with a "cycle error" message that tries to be
human-readable.

At some point the compiler had a notion of "cycle recovery", that is, one could
"try" to execute a query and if it ended up causing a cycle, proceed in some
other fashion. However, this was later removed because it is not entirely
clear what the theoretical consequences of this are, especially regarding
incremental compilation.

## "Steal" Queries

Some queries have their result wrapped in a `Steal<T>` struct. These queries
behave exactly the same as regular queries with one exception: Their result is
expected to be "stolen" out of the cache at some point, meaning some other
part of the program is taking ownership of it and the result cannot be
accessed anymore.

This stealing mechanism exists purely as a performance optimization because some
|
||||||
|
result values are too costly to clone (e.g. the MIR of a function). It seems
|
||||||
|
like result stealing would violate the condition that query results must be
|
||||||
|
immutable (after all we are moving the result value out of the cache) but it is
|
||||||
|
OK as long as the mutation is not observable. This is achieved by two things:
|
||||||
|
|
||||||
|
- Before a result is stolen, we make sure to eagerly run all queries that
  might ever need to read that result. This has to be done manually by calling
  those queries.
- Whenever a query tries to access a stolen result, we make the compiler ICE so
  that such a condition cannot go unnoticed.

This is not an ideal setup because of the manual intervention needed, so it
should be used sparingly and only when it is well known which queries might
access a given result. In practice, however, stealing has not turned out to be
much of a maintenance burden.

To summarize: "Steal queries" break some of the rules in a controlled way.
There are checks in place that make sure that nothing can go silently wrong.

## Parallel Query Execution

The query model has some properties that make it actually feasible to evaluate
multiple queries in parallel without too much effort:

- All data a query provider can access is accessed via the query context, so
  the query context can take care of synchronizing access.
- Query results are required to be immutable so they can safely be used by
  different threads concurrently.

The nightly compiler already implements parallel query evaluation as follows:

When a query `foo` is evaluated, the cache table for `foo` is locked.

- If there already is a result, we can clone it, release the lock and
  we are done.
- If there is no cache entry and no other active query invocation computing the
  same result, we mark the key as being "in progress", release the lock and
  start evaluating.
- If there *is* another query invocation for the same key in progress, we
  release the lock, and just block the thread until the other invocation has
  computed the result we are waiting for. This cannot deadlock because, as
  mentioned before, query invocations form a DAG. Some thread will always make
  progress.

src/query.md

@@ -35,6 +35,12 @@ will in turn demand information about that crate, starting from the
 However, that vision is not fully realized. Still, big chunks of the
 compiler (for example, generating MIR) work exactly like this.

+### The Query Evaluation Model in Detail
+
+The [Query Evaluation Model in Detail][query-model] chapter gives a more
+in-depth description of what queries are and how they work.
+If you intend to write a query of your own, this is a good read.
+
 ### Invoking queries

 To invoke a query is simple. The tcx ("type context") offers a method

@@ -45,60 +51,6 @@ query, you would just do this:
 let ty = tcx.type_of(some_def_id);
 ```

-### Cycles between queries
-
-A cycle is when a query becomes stuck in a loop e.g. query A generates query B
-which generates query A again.
-
-Currently, cycles during query execution should always result in a
-compilation error. Typically, they arise because of illegal programs
-that contain cyclic references they shouldn't (though sometimes they
-arise because of compiler bugs, in which case we need to factor our
-queries in a more fine-grained fashion to avoid them).
-
-However, it is nonetheless often useful to *recover* from a cycle
-(after reporting an error, say) and try to soldier on, so as to give a
-better user experience. In order to recover from a cycle, you don't
-get to use the nice method-call-style syntax. Instead, you invoke
-using the `try_get` method, which looks roughly like this:
-
-```rust,ignore
-use ty::queries;
-...
-match queries::type_of::try_get(tcx, DUMMY_SP, self.did) {
-  Ok(result) => {
-    // no cycle occurred! You can use `result`
-  }
-  Err(err) => {
-    // A cycle occurred! The error value `err` is a `DiagnosticBuilder`,
-    // meaning essentially an "in-progress", not-yet-reported error message.
-    // See below for more details on what to do here.
-  }
-}
-```
-
-So, if you get back an `Err` from `try_get`, then a cycle *did* occur. This
-means that you must ensure that a compiler error message is reported. You can
-do that in two ways:
-
-The simplest is to invoke `err.emit()`. This will emit the cycle error to the
-user.
-
-However, often cycles happen because of an illegal program, and you
-know at that point that an error either already has been reported or
-will be reported due to this cycle by some other bit of code. In that
-case, you can invoke `err.cancel()` to not emit any error. It is
-traditional to then invoke:
-
-```rust,ignore
-tcx.sess.delay_span_bug(some_span, "some message")
-```
-
-`delay_span_bug()` is a helper that says: we expect a compilation
-error to have happened or to happen in the future; so, if compilation
-ultimately succeeds, make an ICE with the message `"some
-message"`. This is basically just a precaution in case you are wrong.
-
 ### How the compiler executes a query

 So you may be wondering what happens when you invoke a query

@@ -315,3 +267,4 @@ impl<'tcx> QueryDescription for queries::type_of<'tcx> {
 }
 ```

+[query-model]: queries/query-evaluation-model-in-detail.html

@@ -139,7 +139,7 @@ crate (through `crate_variances`), but since most changes will not result in a
 change to the actual results from variance inference, the `variances_of` query
 will wind up being considered green after it is re-evaluated.

-[rga]: ./incremental-compilation.html
+[rga]: ./queries/incremental-compilation.html

 <a name="addendum"></a>