diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 2e3b3438..e4bc2428 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -5,6 +5,8 @@ - [Using the compiler testing framework](./running-tests.md) - [Walkthrough: a typical contribution](./walkthrough.md) - [High-level overview of the compiler source](./high-level-overview.md) +- [Queries: demand-driven compilation](./query.md) + - [Incremental compilation](./incremental-compilation.md) - [The parser](./the-parser.md) - [Macro expansion](./macro-expansion.md) - [Name resolution](./name-resolution.md) diff --git a/src/query.md b/src/query.md new file mode 100644 index 00000000..65d65130 --- /dev/null +++ b/src/query.md @@ -0,0 +1,314 @@ +# Queries: demand-driven compilation + +As described in [the high-level overview of the compiler][hl], the +Rust compiler is current transitioning from a traditional "pass-based" +setup to a "demand-driven" system. **The Compiler Query System is the +key to our new demand-driven organization.** The idea is pretty +simple. You have various queries that compute things about the input +-- for example, there is a query called `type_of(def_id)` that, given +the def-id of some item, will compute the type of that item and return +it to you. + +[hl]: high-level-overview.html + +Query execution is **memoized** -- so the first time you invoke a +query, it will go do the computation, but the next time, the result is +returned from a hashtable. Moreover, query execution fits nicely into +**incremental computation**; the idea is roughly that, when you do a +query, the result **may** be returned to you by loading stored data +from disk (but that's a separate topic we won't discuss further here). + +The overall vision is that, eventually, the entire compiler +control-flow will be query driven. There will effectively be one +top-level query ("compile") that will run compilation on a crate; this +will in turn demand information about that crate, starting from the +*end*. For example: + +- This "compile" query might demand to get a list of codegen-units + (i.e., modules that need to be compiled by LLVM). +- But computing the list of codegen-units would invoke some subquery + that returns the list of all modules defined in the Rust source. +- That query in turn would invoke something asking for the HIR. +- This keeps going further and further back until we wind up doing the + actual parsing. + +However, that vision is not fully realized. Still, big chunks of the +compiler (for example, generating MIR) work exactly like this. + +### Invoking queries + +To invoke a query is simple. The tcx ("type context") offers a method +for each defined query. So, for example, to invoke the `type_of` +query, you would just do this: + +```rust +let ty = tcx.type_of(some_def_id); +``` + +### Cycles between queries + +Currently, cycles during query execution should always result in a +compilation error. Typically, they arise because of illegal programs +that contain cyclic references they shouldn't (though sometimes they +arise because of compiler bugs, in which case we need to factor our +queries in a more fine-grained fashion to avoid them). + +However, it is nonetheless often useful to *recover* from a cycle +(after reporting an error, say) and try to soldier on, so as to give a +better user experience. In order to recover from a cycle, you don't +get to use the nice method-call-style syntax. Instead, you invoke +using the `try_get` method, which looks roughly like this: + +```rust +use ty::maps::queries; +... +match queries::type_of::try_get(tcx, DUMMY_SP, self.did) { + Ok(result) => { + // no cycle occurred! You can use `result` + } + Err(err) => { + // A cycle occurred! The error value `err` is a `DiagnosticBuilder`, + // meaning essentially an "in-progress", not-yet-reported error message. + // See below for more details on what to do here. + } +} +``` + +So, if you get back an `Err` from `try_get`, then a cycle *did* occur. This means that +you must ensure that a compiler error message is reported. You can do that in two ways: + +The simplest is to invoke `err.emit()`. This will emit the cycle error to the user. + +However, often cycles happen because of an illegal program, and you +know at that point that an error either already has been reported or +will be reported due to this cycle by some other bit of code. In that +case, you can invoke `err.cancel()` to not emit any error. It is +traditional to then invoke: + +``` +tcx.sess.delay_span_bug(some_span, "some message") +``` + +`delay_span_bug()` is a helper that says: we expect a compilation +error to have happened or to happen in the future; so, if compilation +ultimately succeeds, make an ICE with the message `"some +message"`. This is basically just a precaution in case you are wrong. + +### How the compiler executes a query + +So you may be wondering what happens when you invoke a query +method. The answer is that, for each query, the compiler maintains a +cache -- if your query has already been executed, then, the answer is +simple: we clone the return value out of the cache and return it +(therefore, you should try to ensure that the return types of queries +are cheaply cloneable; insert a `Rc` if necessary). + +#### Providers + +If, however, the query is *not* in the cache, then the compiler will +try to find a suitable **provider**. A provider is a function that has +been defined and linked into the compiler somewhere that contains the +code to compute the result of the query. + +**Providers are defined per-crate.** The compiler maintains, +internally, a table of providers for every crate, at least +conceptually. Right now, there are really two sets: the providers for +queries about the **local crate** (that is, the one being compiled) +and providers for queries about **external crates** (that is, +dependencies of the local crate). Note that what determines the crate +that a query is targeting is not the *kind* of query, but the *key*. +For example, when you invoke `tcx.type_of(def_id)`, that could be a +local query or an external query, depending on what crate the `def_id` +is referring to (see the `self::keys::Key` trait for more information +on how that works). + +Providers always have the same signature: + +```rust +fn provider<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>, + key: QUERY_KEY) + -> QUERY_RESULT +{ + ... +} +``` + +Providers take two arguments: the `tcx` and the query key. Note also +that they take the *global* tcx (i.e., they use the `'tcx` lifetime +twice), rather than taking a tcx with some active inference context. +They return the result of the query. + +#### How providers are setup + +When the tcx is created, it is given the providers by its creator using +the `Providers` struct. This struct is generate by the macros here, but it +is basically a big list of function pointers: + +```rust +struct Providers { + type_of: for<'cx, 'tcx> fn(TyCtxt<'cx, 'tcx, 'tcx>, DefId) -> Ty<'tcx>, + ... +} +``` + +At present, we have one copy of the struct for local crates, and one +for external crates, though the plan is that we may eventually have +one per crate. + +These `Provider` structs are ultimately created and populated by +`librustc_driver`, but it does this by distributing the work +throughout the other `rustc_*` crates. This is done by invoking +various `provide` functions. These functions tend to look something +like this: + +```rust +pub fn provide(providers: &mut Providers) { + *providers = Providers { + type_of, + ..*providers + }; +} +``` + +That is, they take an `&mut Providers` and mutate it in place. Usually +we use the formulation above just because it looks nice, but you could +as well do `providers.type_of = type_of`, which would be equivalent. +(Here, `type_of` would be a top-level function, defined as we saw +before.) So, if we want to add a provider for some other query, +let's call it `fubar`, into the crate above, we might modify the `provide()` +function like so: + +```rust +pub fn provide(providers: &mut Providers) { + *providers = Providers { + type_of, + fubar, + ..*providers + }; +} + +fn fubar<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx>, key: DefId) -> Fubar<'tcx> { .. } +``` + +NB. Most of the `rustc_*` crates only provide **local +providers**. Almost all **extern providers** wind up going through the +[`rustc_metadata` crate][rustc_metadata], which loads the information from the crate +metadata. But in some cases there are crates that provide queries for +*both* local and external crates, in which case they define both a +`provide` and a `provide_extern` function that `rustc_driver` can +invoke. + +[rustc_metadata]: https://github.com/rust-lang/rust/tree/master/src/librustc_metadata + +### Adding a new kind of query + +So suppose you want to add a new kind of query, how do you do so? +Well, defining a query takes place in two steps: + +1. first, you have to specify the query name and arguments; and then, +2. you have to supply query providers where needed. + +To specify the query name and arguments, you simply add an entry to +the big macro invocation in +[`src/librustc/ty/maps/mod.rs`][maps-mod]. This will probably have +changed by the time you read this README, but at present it looks +something like: + +[maps-mod]: https://github.com/rust-lang/rust/blob/master/src/librustc/ty/maps/mod.rs + +``` +define_maps! { <'tcx> + /// Records the type of every item. + [] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>, + + ... +} +``` + +Each line of the macro defines one query. The name is broken up like this: + +``` +[] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>, +^^ ^^^^^^^ ^^^^^^^^^^ ^^^^^ ^^^^^^^^ +| | | | | +| | | | result type of query +| | | query key type +| | dep-node constructor +| name of query +query flags +``` + +Let's go over them one by one: + +- **Query flags:** these are largely unused right now, but the intention + is that we'll be able to customize various aspects of how the query is + processed. +- **Name of query:** the name of the query method + (`tcx.type_of(..)`). Also used as the name of a struct + (`ty::maps::queries::type_of`) that will be generated to represent + this query. +- **Dep-node constructor:** indicates the constructor function that + connects this query to incremental compilation. Typically, this is a + `DepNode` variant, which can be added by modifying the + `define_dep_nodes!` macro invocation in + [`librustc/dep_graph/dep_node.rs`][dep-node]. + - However, sometimes we use a custom function, in which case the + name will be in snake case and the function will be defined at the + bottom of the file. This is typically used when the query key is + not a def-id, or just not the type that the dep-node expects. +- **Query key type:** the type of the argument to this query. + This type must implement the `ty::maps::keys::Key` trait, which + defines (for example) how to map it to a crate, and so forth. +- **Result type of query:** the type produced by this query. This type + should (a) not use `RefCell` or other interior mutability and (b) be + cheaply cloneable. Interning or using `Rc` or `Arc` is recommended for + non-trivial data types. + - The one exception to those rules is the `ty::steal::Steal` type, + which is used to cheaply modify MIR in place. See the definition + of `Steal` for more details. New uses of `Steal` should **not** be + added without alerting `@rust-lang/compiler`. + +[dep-node]: https://github.com/rust-lang/rust/blob/master/src/librustc/dep_graph/dep_node.rs + +So, to add a query: + +- Add an entry to `define_maps!` using the format above. +- Possibly add a corresponding entry to the dep-node macro. +- Link the provider by modifying the appropriate `provide` method; + or add a new one if needed and ensure that `rustc_driver` is invoking it. + +#### Query structs and descriptions + +For each kind, the `define_maps` macro will generate a "query struct" +named after the query. This struct is a kind of a place-holder +describing the query. Each such struct implements the +`self::config::QueryConfig` trait, which has associated types for the +key/value of that particular query. Basically the code generated looks something +like this: + +```rust +// Dummy struct representing a particular kind of query: +pub struct type_of<'tcx> { phantom: PhantomData<&'tcx ()> } + +impl<'tcx> QueryConfig for type_of<'tcx> { + type Key = DefId; + type Value = Ty<'tcx>; +} +``` + +There is an additional trait that you may wish to implement called +`self::config::QueryDescription`. This trait is used during cycle +errors to give a "human readable" name for the query, so that we can +summarize what was happening when the cycle occurred. Implementing +this trait is optional if the query key is `DefId`, but if you *don't* +implement it, you get a pretty generic error ("processing `foo`..."). +You can put new impls into the `config` module. They look something like this: + +```rust +impl<'tcx> QueryDescription for queries::type_of<'tcx> { + fn describe(tcx: TyCtxt, key: DefId) -> String { + format!("computing the type of `{}`", tcx.item_path_str(key)) + } +} +``` +