Add "The Query Evaluation Model in Detail" Chapter.
This commit is contained in:
parent
cdd17886a2
commit
d9ec22b420
@ -20,6 +20,7 @@
- [The Rustc Driver](./rustc-driver.md)
- [Rustdoc](./rustdoc.md)
- [Queries: demand-driven compilation](./query.md)
- [The Query Evaluation Model in Detail](./query-evaluation-model-in-detail.md)
- [Incremental compilation](./incremental-compilation.md)
- [Debugging and Testing](./incrcomp-debugging.md)
- [The parser](./the-parser.md)
@ -0,0 +1,236 @@
# The Query Evaluation Model in Detail

This chapter provides a deeper dive into the abstract model queries are built
on. It does not go into implementation details but tries to explain the
underlying logic. The examples here have therefore been stripped down and
simplified and don't directly reflect the compiler's internal APIs.

## What is a query?

Abstractly we view the compiler's knowledge about a given crate as a
"database", and queries are the way of asking the compiler questions about it,
i.e. we "query" the compiler's "database" for facts.

However, there's something special about this compiler database: it starts out
empty and is filled on demand as queries are executed. Consequently, a query
must know how to compute its result if the database does not contain it yet.
To do so, it can access other queries and certain input values that the
database is pre-filled with on creation.

A query thus consists of the following things:

- A name that identifies the query
- A "key" that specifies what we want to look up
- A result type that specifies what kind of result it yields
- A "provider", which is a function that specifies how the result is to be
  computed if it isn't already present in the database

As an example, the name of the `type_of` query is `type_of`, its query key is
a `DefId` identifying the item we want to know the type of, the result type is
`Ty<'tcx>`, and the provider is a function that, given the query key and
access to the rest of the database, can compute the type of the item
identified by the key.

So in some sense a query is just a function that maps the query key to the
corresponding result. However, we have to apply some restrictions in order for
this to be sound:

- The key and result must be immutable values.
- The provider function must be a pure function, that is, for the same key it
  must always yield the same result.
- The only parameters a provider function takes are the key and a reference to
  the "query context" (which provides access to the rest of the "database").

The database is built up lazily by invoking queries. The query providers will
invoke other queries, for which the result is either already cached or
computed by calling another query provider. These query provider invocations
conceptually form a directed acyclic graph (DAG), at the leaves of which are
the input values that are already known when the query context is created.
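As a minimal sketch, these pieces (pre-filled inputs, a pure provider, and a
result cache, which the next section discusses in more detail) can be modeled
in plain Rust. All names here (`QueryContext`, `type_of_provider`, the
string-based "HIR" and "types") are hypothetical stand-ins and do not reflect
rustc's actual APIs:

```rust
use std::collections::HashMap;

// A toy model of a query context, NOT rustc's real API: it holds
// pre-filled input data and a lazily filled cache for one query.
pub struct QueryContext {
    // Input data the context is created with (here: item name -> toy "HIR").
    hir: HashMap<String, String>,
    // Memoization table for the `type_of` query.
    type_of_cache: HashMap<String, String>,
}

impl QueryContext {
    pub fn new(hir: HashMap<String, String>) -> Self {
        QueryContext { hir, type_of_cache: HashMap::new() }
    }

    // The `type_of` query: return the cached result if present,
    // otherwise run the provider and memoize its result.
    pub fn type_of(&mut self, key: &str) -> String {
        if let Some(cached) = self.type_of_cache.get(key) {
            return cached.clone();
        }
        let result = type_of_provider(self, key);
        self.type_of_cache.insert(key.to_string(), result.clone());
        result
    }
}

// The provider: a pure function of the key and the query context. It only
// reads immutable input data (and could invoke other queries via `tcx`).
fn type_of_provider(tcx: &QueryContext, key: &str) -> String {
    match tcx.hir.get(key).map(String::as_str) {
        Some("fn item") => "fn() -> ()".to_string(),
        _ => "{error}".to_string(),
    }
}
```

Because the provider only sees the key and the context, invoking `type_of`
twice with the same key necessarily yields the same result.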

## Caching/Memoization

Results of query invocations are "memoized", which means that the query
context will cache the result in an internal table and, when the query is
invoked with the same query key again, will return the result from the cache
instead of running the provider again.

This caching is crucial for making the query engine efficient. Without
memoization the system would still be sound (that is, it would yield the same
results) but the same computations would be done over and over again.

Memoization is one of the main reasons why query providers have to be pure
functions. If calling a provider function could yield different results for
each invocation (because it accesses some global mutable state), then we
could not memoize the result.

## Input data

When the query context is created, it is still empty: no queries have been
executed, no results are cached. But the context already provides access to
"input" data, i.e. pieces of immutable data that were computed before the
context was created and that queries can access to do their computations.
Currently this input data consists mainly of the HIR map and the command-line
options the compiler was invoked with. In the future, inputs will just
consist of command-line options and a list of source files -- the HIR map
will itself be provided by a query which processes these source files.

Without inputs, queries would live in a void without anything to compute
their result from (remember, query providers only have access to other
queries and the context but not to any other outside state or information).

For a query provider, input data and results of other queries look exactly
the same: it just tells the context "give me the value of X". Because input
data is immutable, the provider can rely on it being the same across
different query invocations, just as is the case for query results.

## An example execution trace of some queries

How does this DAG of query invocations come into existence? At some point the
compiler driver will create the, as yet empty, query context. It will then,
from outside of the query system, invoke the queries it needs to perform its
task. This looks something like the following:

```rust,ignore
fn compile_crate() {
    let cli_options = ...;
    let hir_map = ...;

    // Create the query context `tcx`
    let tcx = TyCtxt::new(cli_options, hir_map);

    // Do type checking by invoking the type check query
    tcx.type_check_crate();
}
```
|
||||
|
||||
The `type_check_crate` query provider would look something like the following:
|
||||
|
||||
```rust,ignore
|
||||
fn type_check_crate_provider(tcx, _key: ()) {
|
||||
let list_of_items = tcx.hir_map.list_of_items();
|
||||
|
||||
for item_def_id in list_of_hir_items {
|
||||
tcx.type_check_item(item_def_id);
|
||||
}
|
||||
}
|
||||
```

We see that the `type_check_crate` query accesses input data (`tcx.hir_map`)
and invokes other queries (`type_check_item`). The `type_check_item`
invocations will themselves access input data and/or invoke other queries, so
that in the end the DAG of query invocations will be built up backwards from
the node that was initially executed:

```
                                             (1)
  hir_map <--------------------------------------------------- type_check_crate()
    ^                                                                  |
    |   (4)             (3)                  (2)                       |
    +-- Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <---------+
    |                     |                                            |
    |                     | (5)                                        |
    |   (6)               v                  (7)                       |
    +-- Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <---------+

  // (x) denotes invocation order
```

We also see that often a query result can be read from the cache:
`type_of(bar)` was computed for `type_check_item(foo)`, so when
`type_check_item(bar)` needs it, it is already in the cache.

Query results stay cached in the query context as long as the context lives.
So if the compiler driver invoked another query later on, the above graph
would still exist and already executed queries would not have to be re-done.
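The cache hit for `type_of(bar)` can be illustrated with a toy re-creation of
this trace. The names and the run counter are hypothetical stand-ins, not
rustc's real queries; here `type_of("foo")` is made to depend on
`type_of("bar")` so that the second demand for `type_of("bar")` is served
from the cache:

```rust
use std::cell::Cell;
use std::collections::HashMap;

pub struct QueryContext {
    type_of_cache: HashMap<&'static str, String>,
    // Counts how often the `type_of` provider actually runs (as opposed
    // to being answered from the cache).
    pub type_of_provider_runs: Cell<u32>,
}

impl QueryContext {
    pub fn new() -> Self {
        QueryContext {
            type_of_cache: HashMap::new(),
            type_of_provider_runs: Cell::new(0),
        }
    }

    pub fn type_of(&mut self, key: &'static str) -> String {
        if let Some(cached) = self.type_of_cache.get(key) {
            return cached.clone();
        }
        self.type_of_provider_runs.set(self.type_of_provider_runs.get() + 1);
        // Toy provider: the type of `foo` mentions the type of `bar`,
        // so computing it invokes the `type_of` query again.
        let result = if key == "foo" {
            format!("fn() -> {}", self.type_of("bar"))
        } else {
            "u32".to_string()
        };
        self.type_of_cache.insert(key, result.clone());
        result
    }

    pub fn type_check_item(&mut self, key: &'static str) {
        // Type checking an item demands its type (and, transitively,
        // the types it refers to).
        let _ty = self.type_of(key);
    }

    pub fn type_check_crate(&mut self) {
        for item in ["foo", "bar"] {
            self.type_check_item(item);
        }
    }
}
```

After `type_check_crate()` runs, the provider has executed exactly twice,
once per key, even though `type_of("bar")` was demanded twice.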

## Cycles

Earlier we stated that query invocations form a DAG. However, it would be
easy to form a cyclic graph by, for example, having a query provider like the
following:

```rust,ignore
fn cyclic_query_provider(tcx, key) -> u32 {
    // Invoke the same query with the same key again
    tcx.cyclic_query(key)
}
```

Since query providers are regular functions, this would behave much as
expected: evaluation would get stuck in an infinite recursion. A query like
this would not be very useful either. However, sometimes certain kinds of
invalid user input can result in queries being called in a cyclic way. The
query engine includes a check for cyclic invocations and, because cycles are
an irrecoverable error, will abort execution with a "cycle error" message
that tries to be human readable.
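One common way to implement such a check, sketched here for a hypothetical
single-threaded context (the real query engine's machinery is more involved),
is to track the set of keys whose computation is currently in progress and
report a cycle when such a key is requested re-entrantly:

```rust
use std::collections::{HashMap, HashSet};

// Toy context: keys and results are plain u32s. Assumed names, not rustc's.
pub struct QueryContext {
    cache: HashMap<u32, u32>,
    // Keys whose provider is currently executing somewhere up the stack.
    in_progress: HashSet<u32>,
}

impl QueryContext {
    pub fn new() -> Self {
        QueryContext { cache: HashMap::new(), in_progress: HashSet::new() }
    }

    pub fn cyclic_query(&mut self, key: u32) -> Result<u32, String> {
        if let Some(&cached) = self.cache.get(&key) {
            return Ok(cached);
        }
        if !self.in_progress.insert(key) {
            // The key was already marked in progress: we are inside our
            // own computation, i.e. the invocations form a cycle.
            return Err(format!("cycle detected when computing key {key}"));
        }
        // An (unsound) provider that re-invokes the same query with the
        // same key, mirroring `cyclic_query_provider` above.
        let result = self.cyclic_query(key);
        self.in_progress.remove(&key);
        let value = result?;
        self.cache.insert(key, value);
        Ok(value)
    }
}
```

Instead of recursing forever, the re-entrant invocation bottoms out with a
cycle error that the caller can turn into a diagnostic.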

At some point the compiler had a notion of "cycle recovery", that is, one
could "try" to execute a query and, if it ended up causing a cycle, proceed
in some other fashion. However, this was later removed because it is not
entirely clear what the theoretical consequences of this are, especially
regarding incremental compilation.

## "Steal" Queries

Some queries have their result wrapped in a `Steal<T>` struct. These queries
behave exactly the same as regular queries with one exception: their result
is expected to be "stolen" out of the cache at some point, meaning some other
part of the program is taking ownership of it and the result cannot be
accessed anymore.

This stealing mechanism exists purely as a performance optimization because
some result values are too costly to clone (e.g. the MIR of a function). It
seems like result stealing would violate the condition that query results
must be immutable (after all we are moving the result value out of the cache)
but it is OK as long as the mutation is not observable. This is achieved by
two things:

- Before a result is stolen, we make sure to eagerly run all queries that
  might ever need to read that result. This has to be done manually by
  calling those queries.
- Whenever a query tries to access a stolen result, we make the compiler ICE
  so that such a condition cannot go unnoticed.

This is not an ideal setup because of the manual intervention needed, so it
should be used sparingly and only when it is well known which queries might
access a given result. In practice, however, stealing has not turned out to
be much of a maintenance burden.

To summarize: "steal queries" break some of the rules in a controlled way.
There are checks in place that make sure that nothing can go silently wrong.
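The idea can be sketched with a small wrapper around `Option` (a hypothetical
model, not rustc's actual `Steal<T>`): stealing takes the value out, and any
later read panics loudly, standing in for the ICE, rather than silently
observing the mutation:

```rust
// A minimal `Steal`-like wrapper: a one-shot owned slot.
pub struct Steal<T> {
    value: Option<T>,
}

impl<T> Steal<T> {
    pub fn new(value: T) -> Self {
        Steal { value: Some(value) }
    }

    // Read access; panics (modeling an ICE) if the value was stolen.
    pub fn borrow(&self) -> &T {
        self.value.as_ref().expect("attempted to read a stolen value")
    }

    // Move the value out without cloning it. After this, every
    // `borrow` fails loudly instead of going unnoticed.
    pub fn steal(&mut self) -> T {
        self.value.take().expect("value already stolen")
    }
}
```

A consumer that needs to own an expensive result (say, a function's MIR)
calls `steal` once; every query that merely reads it must have run before
that point.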

## Parallel Query Execution

The query model has some properties that make it actually feasible to
evaluate multiple queries in parallel without too much of an effort:

- All data a query provider can access is accessed via the query context, so
  the query context can take care of synchronizing access.
- Query results are required to be immutable so they can safely be used by
  different threads concurrently.

The nightly compiler already implements parallel query evaluation as follows:

When a query `foo` is evaluated, the cache table for `foo` is locked.

- If there already is a result, we can clone it, release the lock and we are
  done.
- If there is no cache entry and no other active query invocation computing
  the same result, we mark the key as being "in progress", release the lock
  and start evaluating.
- If there *is* another query invocation for the same key in progress, we
  release the lock, and just block the thread until the other invocation has
  computed the result we are waiting for. This cannot deadlock because, as
  mentioned before, query invocations form a DAG. Some thread will always
  make progress.
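The three cases above can be sketched as follows. This is a simplified,
assumed model (toy names, a single table, a `Condvar` for blocking), not the
compiler's actual implementation:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};

// Each key is either finished (`Done`) or being computed by some thread.
enum Entry {
    InProgress,
    Done(u64),
}

pub struct QueryCache {
    // `Arc` so clones of the cache handle can be shared across threads.
    table: Arc<(Mutex<HashMap<u64, Entry>>, Condvar)>,
}

impl QueryCache {
    pub fn new() -> Self {
        QueryCache {
            table: Arc::new((Mutex::new(HashMap::new()), Condvar::new())),
        }
    }

    // Evaluate the query for `key`, running `provider` on a cache miss.
    pub fn get(&self, key: u64, provider: impl FnOnce(u64) -> u64) -> u64 {
        let (lock, cvar) = &*self.table;
        let mut table = lock.lock().unwrap();
        loop {
            match table.get(&key) {
                // Case 1: cached result; clone it and we are done.
                Some(Entry::Done(v)) => return *v,
                // Case 3: another invocation is computing it; block until
                // it finishes, then re-check the table.
                Some(Entry::InProgress) => {
                    table = cvar.wait(table).unwrap();
                }
                // Case 2: mark the key as in progress, release the lock,
                // evaluate, then store the result and wake waiters.
                None => {
                    table.insert(key, Entry::InProgress);
                    drop(table);
                    let value = provider(key);
                    let mut table = lock.lock().unwrap();
                    table.insert(key, Entry::Done(value));
                    cvar.notify_all();
                    return value;
                }
            }
        }
    }
}
```

The lock is never held while a provider runs, so other threads can evaluate
unrelated queries concurrently; only threads waiting on the very same key
block.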

src/query.md
@ -35,6 +35,12 @@ will in turn demand information about that crate, starting from the
However, that vision is not fully realized. Still, big chunks of the
compiler (for example, generating MIR) work exactly like this.

### The Query Evaluation Model in Detail

The [Query Evaluation Model in Detail](query-evaluation-model-in-detail.html)
chapter gives a more in-depth description of what queries are and how they
work. If you intend to write a query of your own, this is a good read.

### Invoking queries

To invoke a query is simple. The tcx ("type context") offers a method
@ -45,60 +51,6 @@ query, you would just do this:
let ty = tcx.type_of(some_def_id);
```

### Cycles between queries

A cycle is when a query becomes stuck in a loop, e.g. query A generates query
B, which generates query A again.

Currently, cycles during query execution should always result in a
compilation error. Typically, they arise because of illegal programs that
contain cyclic references they shouldn't (though sometimes they arise because
of compiler bugs, in which case we need to factor our queries in a more
fine-grained fashion to avoid them).

However, it is nonetheless often useful to *recover* from a cycle (after
reporting an error, say) and try to soldier on, so as to give a better user
experience. In order to recover from a cycle, you don't get to use the nice
method-call-style syntax. Instead, you invoke using the `try_get` method,
which looks roughly like this:

```rust,ignore
use ty::queries;
...
match queries::type_of::try_get(tcx, DUMMY_SP, self.did) {
  Ok(result) => {
    // no cycle occurred! You can use `result`
  }
  Err(err) => {
    // A cycle occurred! The error value `err` is a `DiagnosticBuilder`,
    // meaning essentially an "in-progress", not-yet-reported error message.
    // See below for more details on what to do here.
  }
}
```

So, if you get back an `Err` from `try_get`, then a cycle *did* occur. This
means that you must ensure that a compiler error message is reported. You can
do that in two ways:

The simplest is to invoke `err.emit()`. This will emit the cycle error to the
user.

However, often cycles happen because of an illegal program, and you know at
that point that an error either already has been reported or will be reported
due to this cycle by some other bit of code. In that case, you can invoke
`err.cancel()` to not emit any error. It is traditional to then invoke:

```rust,ignore
tcx.sess.delay_span_bug(some_span, "some message")
```

`delay_span_bug()` is a helper that says: we expect a compilation error to
have happened or to happen in the future; so, if compilation ultimately
succeeds, make an ICE with the message `"some message"`. This is basically
just a precaution in case you are wrong.

### How the compiler executes a query

So you may be wondering what happens when you invoke a query