Add "The Query Evaluation Model in Detail" Chapter.
This commit is contained in:
parent cdd17886a2
commit d9ec22b420
@@ -20,6 +20,7 @@
 - [The Rustc Driver](./rustc-driver.md)
 - [Rustdoc](./rustdoc.md)
 - [Queries: demand-driven compilation](./query.md)
+- [The Query Evaluation Model in Detail](./query-evaluation-model-in-detail.md)
 - [Incremental compilation](./incremental-compilation.md)
 - [Debugging and Testing](./incrcomp-debugging.md)
 - [The parser](./the-parser.md)
@@ -0,0 +1,236 @@
# The Query Evaluation Model in Detail

This chapter provides a deeper dive into the abstract model queries are built
on. It does not go into implementation details but tries to explain the
underlying logic. The examples here, therefore, have been stripped down and
simplified and don't directly reflect the compiler's internal APIs.

## What is a query?

Abstractly we view the compiler's knowledge about a given crate as a "database"
and queries are the way of asking the compiler questions about it, i.e. we
"query" the compiler's "database" for facts.

However, there's something special to this compiler database: it starts out
empty and is filled on demand as queries are executed. Consequently, a query
must know how to compute its result if the database does not contain it yet.
To do so, it can access other queries and certain input values that the
database is pre-filled with on creation.

A query thus consists of the following things:

- A name that identifies the query
- A "key" that specifies what we want to look up
- A result type that specifies what kind of result it yields
- A "provider", which is a function that specifies how the result is to be
  computed if it isn't already present in the database

As an example, the name of the `type_of` query is `type_of`, its query key is a
`DefId` identifying the item we want to know the type of, the result type is
`Ty<'tcx>`, and the provider is a function that, given the query key and access
to the rest of the database, can compute the type of the item identified by the
key.

So in some sense a query is just a function that maps the query key to the
corresponding result. However, we have to apply some restrictions in order for
this to be sound:

- The key and result must be immutable values.
- The provider function must be a pure function, that is, for the same key it
  must always yield the same result.
- The only parameters a provider function takes are the key and a reference to
  the "query context" (which provides access to the rest of the "database").
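The pieces listed above can be sketched in code. The following is a toy model, not the compiler's actual API: `QueryContext`, the `length_of` query, and the use of `usize` as a stand-in for a `DefId`-like key are all made up for illustration.

```rust
use std::collections::HashMap;

/// A toy "query context": just the pre-filled input data.
/// `usize` stands in for a `DefId`-like query key.
struct QueryContext {
    inputs: HashMap<usize, String>,
}

/// A provider: a pure function from (context, key) to result.
/// For the same key it must always yield the same result, and it may
/// only look at the context, not at any outside state.
fn length_of(qcx: &QueryContext, key: usize) -> usize {
    qcx.inputs[&key].len()
}

fn main() {
    let mut inputs = HashMap::new();
    inputs.insert(0, String::from("fn main() {}"));
    let qcx = QueryContext { inputs };
    assert_eq!(length_of(&qcx, 0), 12);
}
```

Here the query's name is `length_of`, its key is the `usize`, its result type is `usize`, and the function body is the provider.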

The database is built up lazily by invoking queries. The query providers will
invoke other queries, for which the result is either already cached or computed
by calling another query provider. These query provider invocations
conceptually form a directed acyclic graph (DAG) at the leaves of which are
input values that are already known when the query context is created.

## Caching/Memoization

Results of query invocations are "memoized" which means that the query context
will cache the result in an internal table and, when the query is invoked with
the same query key again, will return the result from the cache instead of
running the provider again.

This caching is crucial for making the query engine efficient. Without
memoization the system would still be sound (that is, it would yield the same
results) but the same computations would be done over and over again.

Memoization is one of the main reasons why query providers have to be pure
functions. If calling a provider function could yield different results for
each invocation (because it accesses some global mutable state) then we could
not memoize the result.

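A minimal sketch of this memoization scheme (all names invented for illustration; the compiler's real caches are considerably more elaborate):

```rust
use std::collections::HashMap;

/// A toy memoizing query context: each result is computed at most once
/// per key and then served from the cache.
struct QueryContext {
    cache: HashMap<u32, u32>,
    provider_runs: u32, // instrumentation, only for demonstration
}

impl QueryContext {
    fn new() -> Self {
        QueryContext { cache: HashMap::new(), provider_runs: 0 }
    }

    /// Invoke the (made-up) `double` query: return the cached result if
    /// present, otherwise run the provider and cache what it returns.
    fn double(&mut self, key: u32) -> u32 {
        if let Some(&result) = self.cache.get(&key) {
            return result; // cache hit: the provider is not run again
        }
        self.provider_runs += 1;
        let result = key * 2; // the "provider" computation
        self.cache.insert(key, result);
        result
    }
}

fn main() {
    let mut qcx = QueryContext::new();
    assert_eq!(qcx.double(21), 42);
    assert_eq!(qcx.double(21), 42);   // served from the cache
    assert_eq!(qcx.provider_runs, 1); // provider ran only once
}
```

Memoization is only sound here because the provider is pure: caching an impure provider's first answer would silently change program behavior.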
## Input data

When the query context is created, it is still empty: no queries have been
executed, no results are cached. But the context already provides access to
"input" data, i.e. pieces of immutable data that were computed before the
context was created and that queries can access to do their computations.
Currently this input data consists mainly of the HIR map and the command-line
options the compiler was invoked with. In the future, inputs will just consist
of command-line options and a list of source files -- the HIR map will itself
be provided by a query which processes these source files.

Without inputs, queries would live in a void without anything to compute their
result from (remember, query providers only have access to other queries and
the context but not to any other outside state or information).

For a query provider, input data and results of other queries look exactly the
same: it just tells the context "give me the value of X". Because input data
is immutable, the provider can rely on it being the same across different
query invocations, just as is the case for query results.

## An example execution trace of some queries

How does this DAG of query invocations come into existence? At some point the
compiler driver will create the, as yet empty, query context. It will then,
from outside of the query system, invoke the queries it needs to perform its
task. This looks something like the following:

```rust,ignore
fn compile_crate() {
    let cli_options = ...;
    let hir_map = ...;

    // Create the query context `tcx`
    let tcx = TyCtxt::new(cli_options, hir_map);

    // Do type checking by invoking the type check query
    tcx.type_check_crate();
}
```

The `type_check_crate` query provider would look something like the following:

```rust,ignore
fn type_check_crate_provider(tcx, _key: ()) {
    let list_of_items = tcx.hir_map.list_of_items();

    for item_def_id in list_of_items {
        tcx.type_check_item(item_def_id);
    }
}
```

We see that the `type_check_crate` query accesses input data (`tcx.hir_map`)
and invokes other queries (`type_check_item`). The `type_check_item`
invocations will themselves access input data and/or invoke other queries,
so that in the end the DAG of query invocations will be built up backwards
from the node that was initially executed:

```
                                                                  (1)
  hir_map <--------------------------------------------------- type_check_crate()
    ^                                                                  |
    |   (4)              (3)                      (2)                  |
    +-- Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <---------+
    |                                         |                        |
    |                      +------------------+                        |
    |                      |                                           |
    |   (6)                v  (5)                 (7)                  |
    +-- Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <---------+

  // (x) denotes invocation order
```

We also see that often a query result can be read from the cache:
`type_of(bar)` was computed for `type_check_item(foo)` so when
`type_check_item(bar)` needs it, it is already in the cache.

Query results stay cached in the query context as long as the context lives.
So if the compiler driver invoked another query later on, the above graph
would still exist and already executed queries would not have to be re-done.

## Cycles

Earlier we stated that query invocations form a DAG. However, it would be easy
to form a cyclic graph by, for example, having a query provider like the
following:

```rust,ignore
fn cyclic_query_provider(tcx, key) -> u32 {
    // Invoke the same query with the same key again
    tcx.cyclic_query(key)
}
```

Since query providers are regular functions, this would behave much as expected:
evaluation would get stuck in an infinite recursion. A query like this would not
be very useful either. However, sometimes certain kinds of invalid user input
can result in queries being called in a cyclic way. The query engine includes
a check for cyclic invocations and, because cycles are an irrecoverable error,
will abort execution with a "cycle error" message that tries to be
human-readable.

At some point the compiler had a notion of "cycle recovery", that is, one could
"try" to execute a query and if it ended up causing a cycle, proceed in some
other fashion. However, this was later removed because it is not entirely
clear what the theoretical consequences of this are, especially regarding
incremental compilation.

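A simple way to implement such a cycle check is to track which keys are currently being computed and report an error on re-entry instead of recursing forever. This is a sketch with made-up names, not the real query engine:

```rust
use std::collections::HashSet;

/// Toy cycle detection: keys whose computation is currently active are
/// kept in an "in progress" set; re-entering one is reported as a cycle.
struct QueryContext {
    in_progress: HashSet<u32>,
}

impl QueryContext {
    /// A deliberately cyclic "query": its provider invokes itself with
    /// the same key, so the check fires immediately.
    fn cyclic_query(&mut self, key: u32) -> Result<u32, String> {
        if !self.in_progress.insert(key) {
            // The key was already in the set: we re-entered an active
            // invocation, i.e. we found a cycle.
            return Err(format!("cycle detected when computing key {key}"));
        }
        let result = self.cyclic_query(key); // the cyclic provider body
        self.in_progress.remove(&key);
        result
    }
}

fn main() {
    let mut qcx = QueryContext { in_progress: HashSet::new() };
    // Instead of looping forever, evaluation aborts with a cycle error.
    assert!(qcx.cyclic_query(0).is_err());
}
```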
## "Steal" Queries

Some queries have their result wrapped in a `Steal<T>` struct. These queries
behave exactly the same as regular ones with one exception: their result is
expected to be "stolen" out of the cache at some point, meaning some other
part of the program is taking ownership of it and the result cannot be
accessed anymore.

This stealing mechanism exists purely as a performance optimization because some
result values are too costly to clone (e.g. the MIR of a function). It seems
like result stealing would violate the condition that query results must be
immutable (after all we are moving the result value out of the cache) but it is
OK as long as the mutation is not observable. This is achieved by two things:

- Before a result is stolen, we make sure to eagerly run all queries that
  might ever need to read that result. This has to be done manually by calling
  those queries.
- Whenever a query tries to access a stolen result, we make the compiler ICE so
  that such a condition cannot go unnoticed.

This is not an ideal setup because of the manual intervention needed, so it
should be used sparingly and only when it is well known which queries might
access a given result. In practice, however, stealing has not turned out to be
much of a maintenance burden.

To summarize: "steal queries" break some of the rules in a controlled way.
There are checks in place that make sure that nothing can go silently wrong.

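The behavior described above can be sketched with a small `Steal`-like wrapper built on `RefCell`. This is an illustration only, not rustc's actual `Steal` type; it panics where the compiler would ICE:

```rust
use std::cell::{Ref, RefCell};

/// A toy `Steal<T>`-like wrapper: reading after the value has been
/// stolen panics loudly instead of going unnoticed.
struct Steal<T> {
    value: RefCell<Option<T>>,
}

impl<T> Steal<T> {
    fn new(value: T) -> Self {
        Steal { value: RefCell::new(Some(value)) }
    }

    /// Read the value without taking ownership.
    /// Panics if the value was already stolen.
    fn borrow(&self) -> Ref<'_, T> {
        Ref::map(self.value.borrow(), |opt| {
            opt.as_ref().expect("attempted to read stolen value")
        })
    }

    /// Take ownership of the value; any later read will panic.
    fn steal(&self) -> T {
        self.value.borrow_mut().take().expect("value already stolen")
    }
}

fn main() {
    let cached = Steal::new(vec![1, 2, 3]);
    assert_eq!(cached.borrow().len(), 3); // fine: not yet stolen
    let owned = cached.steal();           // take ownership out of the "cache"
    assert_eq!(owned, vec![1, 2, 3]);
    // `cached.borrow()` would now panic, making the misuse loud.
}
```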
## Parallel Query Execution

The query model has some properties that make it actually feasible to evaluate
multiple queries in parallel without too much of an effort:

- All data a query provider can access is accessed via the query context, so
  the query context can take care of synchronizing access.
- Query results are required to be immutable so they can safely be used by
  different threads concurrently.

The nightly compiler already implements parallel query evaluation as follows:

When a query `foo` is evaluated, the cache table for `foo` is locked.

- If there already is a result, we can clone it, release the lock and
  we are done.
- If there is no cache entry and no other active query invocation computing the
  same result, we mark the key as being "in progress", release the lock and
  start evaluating.
- If there *is* another query invocation for the same key in progress, we
  release the lock, and just block the thread until the other invocation has
  computed the result we are waiting for. This cannot deadlock because, as
  mentioned before, query invocations form a DAG. Some thread will always make
  progress.

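The three cases above can be sketched with a mutex-protected cache table and a condition variable. This is a toy model with invented names (`QueryCache`, `slow_square`), not the compiler's implementation:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

#[derive(Clone)]
enum Entry {
    InProgress,
    Done(u64),
}

/// Toy cache table for one made-up query, `slow_square`.
struct QueryCache {
    table: Mutex<HashMap<u64, Entry>>,
    cond: Condvar,
}

impl QueryCache {
    fn new() -> Self {
        QueryCache { table: Mutex::new(HashMap::new()), cond: Condvar::new() }
    }

    fn slow_square(&self, key: u64) -> u64 {
        let mut table = self.table.lock().unwrap();
        loop {
            match table.get(&key).cloned() {
                // Case 1: cached result -> clone it and we are done.
                Some(Entry::Done(result)) => return result,
                // Case 3: another invocation is computing this key ->
                // block until it finishes, then re-check the table.
                Some(Entry::InProgress) => {
                    table = self.cond.wait(table).unwrap();
                }
                // Case 2: mark the key "in progress", release the lock,
                // evaluate, then store the result and wake any waiters.
                None => {
                    table.insert(key, Entry::InProgress);
                    drop(table);
                    let result = key * key; // stand-in for real work
                    table = self.table.lock().unwrap();
                    table.insert(key, Entry::Done(result));
                    self.cond.notify_all();
                    return result;
                }
            }
        }
    }
}

fn main() {
    let cache = Arc::new(QueryCache::new());
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || cache.slow_square(6))
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), 36);
    }
}
```

Immutability of results is what makes case 1's clone-and-return safe to hand out across threads.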
60 src/query.md
@@ -35,6 +35,12 @@ will in turn demand information about that crate, starting from the
 However, that vision is not fully realized. Still, big chunks of the
 compiler (for example, generating MIR) work exactly like this.
 
+### The Query Evaluation Model in Detail
+
+The [Query Evaluation Model in Detail](query-evaluation-model-in-detail.html)
+chapter gives a more in-depth description of what queries are and how they work.
+If you intend to write a query of your own, this is a good read.
+
 ### Invoking queries
 
 To invoke a query is simple. The tcx ("type context") offers a method
@@ -45,60 +51,6 @@ query, you would just do this:
 let ty = tcx.type_of(some_def_id);
 ```
 
-### Cycles between queries
-
-A cycle is when a query becomes stuck in a loop e.g. query A generates query B
-which generates query A again.
-
-Currently, cycles during query execution should always result in a
-compilation error. Typically, they arise because of illegal programs
-that contain cyclic references they shouldn't (though sometimes they
-arise because of compiler bugs, in which case we need to factor our
-queries in a more fine-grained fashion to avoid them).
-
-However, it is nonetheless often useful to *recover* from a cycle
-(after reporting an error, say) and try to soldier on, so as to give a
-better user experience. In order to recover from a cycle, you don't
-get to use the nice method-call-style syntax. Instead, you invoke
-using the `try_get` method, which looks roughly like this:
-
-```rust,ignore
-use ty::queries;
-...
-match queries::type_of::try_get(tcx, DUMMY_SP, self.did) {
-    Ok(result) => {
-        // no cycle occurred! You can use `result`
-    }
-    Err(err) => {
-        // A cycle occurred! The error value `err` is a `DiagnosticBuilder`,
-        // meaning essentially an "in-progress", not-yet-reported error message.
-        // See below for more details on what to do here.
-    }
-}
-```
-
-So, if you get back an `Err` from `try_get`, then a cycle *did* occur. This
-means that you must ensure that a compiler error message is reported. You can
-do that in two ways:
-
-The simplest is to invoke `err.emit()`. This will emit the cycle error to the
-user.
-
-However, often cycles happen because of an illegal program, and you
-know at that point that an error either already has been reported or
-will be reported due to this cycle by some other bit of code. In that
-case, you can invoke `err.cancel()` to not emit any error. It is
-traditional to then invoke:
-
-```rust,ignore
-tcx.sess.delay_span_bug(some_span, "some message")
-```
-
-`delay_span_bug()` is a helper that says: we expect a compilation
-error to have happened or to happen in the future; so, if compilation
-ultimately succeeds, make an ICE with the message `"some
-message"`. This is basically just a precaution in case you are wrong.
-
 ### How the compiler executes a query
 
 So you may be wondering what happens when you invoke a query