259 lines
10 KiB
Markdown
259 lines
10 KiB
Markdown
# Queries: demand-driven compilation
|
||
|
||
<!-- toc -->
|
||
|
||
As described in [the high-level overview of the compiler][hl], the Rust compiler
|
||
is still (as of <!-- date-check --> July 2021) transitioning from a
|
||
traditional "pass-based" setup to a "demand-driven" system. The compiler query
|
||
system is the key to rustc's demand-driven organization.
|
||
The idea is pretty simple. Instead of entirely independent passes
|
||
(parsing, type-checking, etc.), a set of function-like *queries*
|
||
compute information about the input source. For example,
|
||
there is a query called `type_of` that, given the [`DefId`] of
|
||
some item, will compute the type of that item and return it to you.
|
||
|
||
[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/def_id/struct.DefId.html
|
||
[hl]: ./compiler-src.md
|
||
|
||
Query execution is *memoized*. The first time you invoke a
|
||
query, it will go do the computation, but the next time, the result is
|
||
returned from a hashtable. Moreover, query execution fits nicely into
|
||
*incremental computation*; the idea is roughly that, when you invoke a
|
||
query, the result *may* be returned to you by loading stored data
|
||
from disk.[^incr-comp-detail]
|
||
|
||
Eventually, we want the entire compiler
|
||
control-flow to be query driven. There will effectively be one
|
||
top-level query (`compile`) that will run compilation on a crate; this
|
||
will in turn demand information about that crate, starting from the
|
||
*end*. For example:
|
||
|
||
- The `compile` query might demand to get a list of codegen-units
|
||
(i.e. modules that need to be compiled by LLVM).
|
||
- But computing the list of codegen-units would invoke some subquery
|
||
that returns the list of all modules defined in the Rust source.
|
||
- That query in turn would invoke something asking for the HIR.
|
||
- This keeps going further and further back until we wind up doing the
|
||
actual parsing.
|
||
|
||
Although this vision is not fully realized, large sections of the
|
||
compiler (for example, generating [MIR](./mir/index.md)) currently work exactly like this.
|
||
|
||
[^incr-comp-detail]: The ["Incremental Compilation in Detail](queries/incremental-compilation-in-detail.md) chapter gives a more
|
||
in-depth description of what queries are and how they work.
|
||
If you intend to write a query of your own, this is a good read.
|
||
|
||
## Invoking queries
|
||
|
||
Invoking a query is simple. The [`TyCtxt`] ("type context") struct offers a method
|
||
for each defined query. For example, to invoke the `type_of`
|
||
query, you would just do this:
|
||
|
||
```rust,ignore
|
||
let ty = tcx.type_of(some_def_id);
|
||
```
|
||
|
||
[`TyCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html
|
||
|
||
## How the compiler executes a query
|
||
|
||
So you may be wondering what happens when you invoke a query
|
||
method. The answer is that, for each query, the compiler maintains a
|
||
cache – if your query has already been executed, then, the answer is
|
||
simple: we clone the return value out of the cache and return it
|
||
(therefore, you should try to ensure that the return types of queries
|
||
are cheaply cloneable; insert an `Rc` if necessary).
|
||
|
||
### Providers
|
||
|
||
If, however, the query is *not* in the cache, then the compiler will
|
||
try to find a suitable **provider**. A provider is a function that has
|
||
been defined and linked into the compiler somewhere that contains the
|
||
code to compute the result of the query.
|
||
|
||
**Providers are defined per-crate.** The compiler maintains,
|
||
internally, a table of providers for every crate, at least
|
||
conceptually. Right now, there are really two sets: the providers for
|
||
queries about the **local crate** (that is, the one being compiled)
|
||
and providers for queries about **external crates** (that is,
|
||
dependencies of the local crate). Note that what determines the crate
|
||
that a query is targeting is not the *kind* of query, but the *key*.
|
||
For example, when you invoke `tcx.type_of(def_id)`, that could be a
|
||
local query or an external query, depending on what crate the `def_id`
|
||
is referring to (see the [`self::keys::Key`][Key] trait for more
|
||
information on how that works).
|
||
|
||
Providers always have the same signature:
|
||
|
||
```rust,ignore
|
||
fn provider<'tcx>(
|
||
tcx: TyCtxt<'tcx>,
|
||
key: QUERY_KEY,
|
||
) -> QUERY_RESULT {
|
||
...
|
||
}
|
||
```
|
||
|
||
Providers take two arguments: the `tcx` and the query key.
|
||
They return the result of the query.
|
||
|
||
### How providers are setup
|
||
|
||
When the tcx is created, it is given the providers by its creator using
|
||
the [`Providers`][providers_struct] struct. This struct is generated by
|
||
the macros here, but it is basically a big list of function pointers:
|
||
|
||
[providers_struct]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/query/struct.Providers.html
|
||
|
||
```rust,ignore
|
||
struct Providers {
|
||
type_of: for<'tcx> fn(TyCtxt<'tcx>, DefId) -> Ty<'tcx>,
|
||
...
|
||
}
|
||
```
|
||
|
||
At present, we have one copy of the struct for local crates, and one
|
||
for external crates, though the plan is that we may eventually have
|
||
one per crate.
|
||
|
||
These `Providers` structs are ultimately created and populated by
|
||
`rustc_driver`, but it does this by distributing the work
|
||
throughout the other `rustc_*` crates. This is done by invoking
|
||
various [`provide`][provide_fn] functions. These functions tend to look
|
||
something like this:
|
||
|
||
[provide_fn]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/hir/fn.provide.html
|
||
|
||
```rust,ignore
|
||
pub fn provide(providers: &mut Providers) {
|
||
*providers = Providers {
|
||
type_of,
|
||
..*providers
|
||
};
|
||
}
|
||
```
|
||
|
||
That is, they take an `&mut Providers` and mutate it in place. Usually
|
||
we use the formulation above just because it looks nice, but you could
|
||
as well do `providers.type_of = type_of`, which would be equivalent.
|
||
(Here, `type_of` would be a top-level function, defined as we saw
|
||
before.) So, if we want to add a provider for some other query,
|
||
let's call it `fubar`, into the crate above, we might modify the `provide()`
|
||
function like so:
|
||
|
||
```rust,ignore
|
||
pub fn provide(providers: &mut Providers) {
|
||
*providers = Providers {
|
||
type_of,
|
||
fubar,
|
||
..*providers
|
||
};
|
||
}
|
||
|
||
fn fubar<'tcx>(tcx: TyCtxt<'tcx>, key: DefId) -> Fubar<'tcx> { ... }
|
||
```
|
||
|
||
N.B. Most of the `rustc_*` crates only provide **local
|
||
providers**. Almost all **extern providers** wind up going through the
|
||
[`rustc_metadata` crate][rustc_metadata], which loads the information
|
||
from the crate metadata. But in some cases there are crates that
|
||
provide queries for *both* local and external crates, in which case
|
||
they define both a `provide` and a `provide_extern` function, through
|
||
[`wasm_import_module_map`][wasm_import_module_map], that `rustc_driver` can invoke.
|
||
|
||
[rustc_metadata]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/index.html
|
||
[wasm_import_module_map]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/back/symbol_export/fn.wasm_import_module_map.html
|
||
|
||
## Adding a new query
|
||
|
||
How do you add a new query?
|
||
Defining a query takes place in two steps:
|
||
|
||
1. Declare the query name, its arguments and description.
|
||
2. Supply query providers where needed.
|
||
|
||
To declare the query name and arguments, you simply add an entry to
|
||
the big macro invocation in [`compiler/rustc_middle/src/query/mod.rs`][query-mod].
|
||
Then you need to add a documentation comment to it with some _internal_ description.
|
||
Then, provide the `desc` attribute which contains a _user-facing_ description of the query.
|
||
The `desc` attribute is shown to the user in query cycles.
|
||
|
||
This looks something like:
|
||
|
||
[query-mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/query/index.html
|
||
|
||
```rust,ignore
|
||
rustc_queries! {
|
||
/// Records the type of every item.
|
||
query type_of(key: DefId) -> Ty<'tcx> {
|
||
cache_on_disk_if { key.is_local() }
|
||
desc { |tcx| "computing the type of `{}`", tcx.def_path_str(key) }
|
||
}
|
||
...
|
||
}
|
||
```
|
||
|
||
A query definition has the following form:
|
||
|
||
```rust,ignore
|
||
query type_of(key: DefId) -> Ty<'tcx> { ... }
|
||
^^^^^ ^^^^^^^ ^^^^^ ^^^^^^^^ ^^^
|
||
| | | | |
|
||
| | | | query modifiers
|
||
| | | result type
|
||
| | query key type
|
||
| name of query
|
||
query keyword
|
||
```
|
||
|
||
Let's go over these elements one by one:
|
||
|
||
- **Query keyword:** indicates a start of a query definition.
|
||
- **Name of query:** the name of the query method
|
||
(`tcx.type_of(..)`). Also used as the name of a struct
|
||
(`ty::queries::type_of`) that will be generated to represent
|
||
this query.
|
||
- **Query key type:** the type of the argument to this query.
|
||
This type must implement the [`ty::query::keys::Key`][Key] trait, which
|
||
defines (for example) how to map it to a crate, and so forth.
|
||
- **Result type of query:** the type produced by this query. This type
|
||
should (a) not use `RefCell` or other interior mutability and (b) be
|
||
cheaply cloneable. Interning or using `Rc` or `Arc` is recommended for
|
||
non-trivial data types.[^steal]
|
||
- **Query modifiers:** various flags and options that customize how the
|
||
query is processed (mostly with respect to [incremental compilation][incrcomp]).
|
||
|
||
[Key]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/query/keys/trait.Key.html
|
||
[incrcomp]: queries/incremental-compilation-in-detail.html#query-modifiers
|
||
|
||
So, to add a query:
|
||
|
||
- Add an entry to `rustc_queries!` using the format above.
|
||
- Link the provider by modifying the appropriate `provide` method;
|
||
or add a new one if needed and ensure that `rustc_driver` is invoking it.
|
||
|
||
[^steal]: The one exception to those rules is the `ty::steal::Steal` type,
|
||
which is used to cheaply modify MIR in place. See the definition
|
||
of `Steal` for more details. New uses of `Steal` should **not** be
|
||
added without alerting `@rust-lang/compiler`.
|
||
|
||
## External links
|
||
|
||
Related design ideas, and tracking issues:
|
||
|
||
- Design document: [On-demand Rustc incremental design doc]
|
||
- Tracking Issue: ["Red/Green" dependency tracking in compiler]
|
||
|
||
More discussion and issues:
|
||
|
||
- [GitHub issue #42633]
|
||
- [Incremental Compilation Beta]
|
||
- [Incremental Compilation Announcement]
|
||
|
||
[On-demand Rustc incremental design doc]: https://github.com/nikomatsakis/rustc-on-demand-incremental-design-doc/blob/master/0000-rustc-on-demand-and-incremental.md
|
||
["Red/Green" dependency tracking in compiler]: https://github.com/rust-lang/rust/issues/42293
|
||
[GitHub issue #42633]: https://github.com/rust-lang/rust/issues/42633
|
||
[Incremental Compilation Beta]: https://internals.rust-lang.org/t/incremental-compilation-beta/4721
|
||
[Incremental Compilation Announcement]: https://blog.rust-lang.org/2016/09/08/incremental.html
|
||
|