Merge pull request #28 from nikomatsakis/master
add query + incremental section and restructure a bit
commit
ccc8ca961e
|
|
@ -5,16 +5,19 @@
|
||||||
- [Using the compiler testing framework](./running-tests.md)
|
- [Using the compiler testing framework](./running-tests.md)
|
||||||
- [Walkthrough: a typical contribution](./walkthrough.md)
|
- [Walkthrough: a typical contribution](./walkthrough.md)
|
||||||
- [High-level overview of the compiler source](./high-level-overview.md)
|
- [High-level overview of the compiler source](./high-level-overview.md)
|
||||||
|
- [Queries: demand-driven compilation](./query.md)
|
||||||
|
- [Incremental compilation](./incremental-compilation.md)
|
||||||
- [The parser](./the-parser.md)
|
- [The parser](./the-parser.md)
|
||||||
- [Macro expansion](./macro-expansion.md)
|
- [Macro expansion](./macro-expansion.md)
|
||||||
- [Name resolution](./name-resolution.md)
|
- [Name resolution](./name-resolution.md)
|
||||||
- [HIR lowering](./hir-lowering.md)
|
- [The HIR (High-level IR)](./hir.md)
|
||||||
- [The `ty` module: representing types](./ty.md)
|
- [The `ty` module: representing types](./ty.md)
|
||||||
- [Type inference](./type-inference.md)
|
- [Type inference](./type-inference.md)
|
||||||
- [Trait resolution](./trait-resolution.md)
|
- [Trait resolution](./trait-resolution.md)
|
||||||
- [Type checking](./type-checking.md)
|
- [Type checking](./type-checking.md)
|
||||||
- [MIR construction](./mir-construction.md)
|
- [The MIR (Mid-level IR)](./mir.md)
|
||||||
- [MIR borrowck](./mir-borrowck.md)
|
- [MIR construction](./mir-construction.md)
|
||||||
- [MIR optimizations](./mir-optimizations.md)
|
- [MIR borrowck](./mir-borrowck.md)
|
||||||
|
- [MIR optimizations](./mir-optimizations.md)
|
||||||
- [trans: generating LLVM IR](./trans.md)
|
- [trans: generating LLVM IR](./trans.md)
|
||||||
- [Glossary](./glossary.md)
|
- [Glossary](./glossary.md)
|
||||||
|
|
|
||||||
|
|
@ -9,23 +9,24 @@ AST | the abstract syntax tree produced by the syntax crate
|
||||||
codegen unit | when we produce LLVM IR, we group the Rust code into a number of codegen units. Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use.
|
codegen unit | when we produce LLVM IR, we group the Rust code into a number of codegen units. Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use.
|
||||||
cx | we tend to use "cx" as an abbreviation for context. See also `tcx`, `infcx`, etc.
|
cx | we tend to use "cx" as an abbreviation for context. See also `tcx`, `infcx`, etc.
|
||||||
DefId | an index identifying a definition (see `librustc/hir/def_id.rs`). Uniquely identifies a `DefPath`.
|
DefId | an index identifying a definition (see `librustc/hir/def_id.rs`). Uniquely identifies a `DefPath`.
|
||||||
HIR | the High-level IR, created by lowering and desugaring the AST. See `librustc/hir`.
|
HIR | the High-level IR, created by lowering and desugaring the AST ([see more](hir.html))
|
||||||
HirId | identifies a particular node in the HIR by combining a def-id with an "intra-definition offset".
|
HirId | identifies a particular node in the HIR by combining a def-id with an "intra-definition offset".
|
||||||
'gcx | the lifetime of the global arena (see `librustc/ty`).
|
'gcx | the lifetime of the global arena ([see more](ty.html))
|
||||||
generics | the set of generic type parameters defined on a type or item
|
generics | the set of generic type parameters defined on a type or item
|
||||||
ICE | internal compiler error. When the compiler crashes.
|
ICE | internal compiler error. When the compiler crashes.
|
||||||
infcx | the inference context (see `librustc/infer`)
|
infcx | the inference context (see `librustc/infer`)
|
||||||
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans. Defined in the `src/librustc/mir/` module, but much of the code that manipulates it is found in `src/librustc_mir`.
|
MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans ([see more](./mir.html))
|
||||||
obligation | something that must be proven by the trait system; see `librustc/traits`.
|
obligation | something that must be proven by the trait system ([see more](trait-resolution.html))
|
||||||
local crate | the crate currently being compiled.
|
local crate | the crate currently being compiled.
|
||||||
node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
|
node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`.
|
||||||
query | perhaps some sub-computation during compilation; see `librustc/maps`.
|
query | perhaps some sub-computation during compilation ([see more](query.html))
|
||||||
provider | the function that executes a query; see `librustc/maps`.
|
provider | the function that executes a query ([see more](query.html))
|
||||||
sess | the compiler session, which stores global data used throughout compilation
|
sess | the compiler session, which stores global data used throughout compilation
|
||||||
side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
|
side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node.
|
||||||
span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
|
span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more.
|
||||||
substs | the substitutions for a given generic type or item (e.g., the `i32`, `u32` in `HashMap<i32, u32>`)
|
substs | the substitutions for a given generic type or item (e.g., the `i32`, `u32` in `HashMap<i32, u32>`)
|
||||||
tcx | the "typing context", main data structure of the compiler (see `librustc/ty`).
|
tcx | the "typing context", main data structure of the compiler ([see more](ty.html))
|
||||||
|
'tcx | the lifetime of the currently active inference context ([see more](ty.html))
|
||||||
trans | the code to translate MIR into LLVM IR.
|
trans | the code to translate MIR into LLVM IR.
|
||||||
trait reference | a trait and values for its type parameters (see `librustc/ty`).
|
trait reference | a trait and values for its type parameters ([see more](ty.html)).
|
||||||
ty | the internal representation of a type (see `librustc/ty`).
|
ty | the internal representation of a type ([see more](ty.html)).
|
||||||
|
|
|
||||||
|
|
@ -1,4 +1,4 @@
|
||||||
# HIR lowering
|
# The HIR
|
||||||
|
|
||||||
The HIR -- "High-level IR" -- is the primary IR used in most of
|
The HIR -- "High-level IR" -- is the primary IR used in most of
|
||||||
rustc. It is a desugared version of the "abstract syntax tree" (AST)
|
rustc. It is a desugared version of the "abstract syntax tree" (AST)
|
||||||
|
|
@ -116,4 +116,4 @@ associated with an **owner**, which is typically some kind of item
|
||||||
(e.g., a `fn()` or `const`), but could also be a closure expression
|
(e.g., a `fn()` or `const`), but could also be a closure expression
|
||||||
(e.g., `|x, y| x + y`). You can use the HIR map to find the body
|
(e.g., `|x, y| x + y`). You can use the HIR map to find the body
|
||||||
associated with a given def-id (`maybe_body_owned_by()`) or to find
|
associated with a given def-id (`maybe_body_owned_by()`) or to find
|
||||||
the owner of a body (`body_owner_def_id()`).
|
the owner of a body (`body_owner_def_id()`).
|
||||||
|
|
@ -0,0 +1,139 @@
|
||||||
|
# Incremental compilation
|
||||||
|
|
||||||
|
The incremental compilation scheme is, in essence, a surprisingly
|
||||||
|
simple extension to the overall query system. We'll start by describing
|
||||||
|
a slightly simplified variant of the real thing, the "basic algorithm", and then describe
|
||||||
|
some possible improvements.
|
||||||
|
|
||||||
|
## The basic algorithm
|
||||||
|
|
||||||
|
The basic algorithm is
|
||||||
|
called the **red-green** algorithm[^salsa]. The high-level idea is
|
||||||
|
that, after each run of the compiler, we will save the results of all
|
||||||
|
the queries that we do, as well as the **query DAG**. The
|
||||||
|
**query DAG** is a [DAG] that indicates which queries executed which
|
||||||
|
other queries. So for example there would be an edge from a query Q1
|
||||||
|
to another query Q2 if computing Q1 required computing Q2 (note that
|
||||||
|
because queries cannot depend on themselves, this results in a DAG and
|
||||||
|
not a general graph).
|
||||||
|
|
||||||
|
[DAG]: https://en.wikipedia.org/wiki/Directed_acyclic_graph
|
||||||
|
|
||||||
|
On the next run of the compiler, then, we can sometimes reuse these
|
||||||
|
query results to avoid re-executing a query. We do this by assigning
|
||||||
|
every query a **color**:
|
||||||
|
|
||||||
|
- If a query is colored **red**, that means that its result during
|
||||||
|
this compilation has **changed** from the previous compilation.
|
||||||
|
- If a query is colored **green**, that means that its result is
|
||||||
|
the **same** as the previous compilation.
|
||||||
|
|
||||||
|
There are two key insights here:
|
||||||
|
|
||||||
|
- First, if all the inputs to query Q are colored green, then the
|
||||||
|
query Q **must** result in the same value as last time and hence
|
||||||
|
need not be re-executed (or else the compiler is not deterministic).
|
||||||
|
- Second, even if some inputs to a query change, it may be that it
|
||||||
|
  **still** produces the same result as the previous compilation. In
|
||||||
|
  particular, the query may only use part of its input.
|
||||||
|
  - Therefore, after executing a query, we always check whether it
|
||||||
|
    produced the same result as the previous time. **If it did,** we
|
||||||
|
    can still mark the query as green, and hence avoid re-executing
|
||||||
|
    dependent queries.
|
||||||
|
|
||||||
|
### The try-mark-green algorithm
|
||||||
|
|
||||||
|
The core of the incremental compilation is an algorithm called
|
||||||
|
"try-mark-green". It has the job of determining the color of a given
|
||||||
|
query Q (which must not yet have been executed). In cases where Q has
|
||||||
|
red inputs, determining Q's color may involve re-executing Q so that
|
||||||
|
we can compare its output; but if all of Q's inputs are green, then we
|
||||||
|
can determine that Q must be green without re-executing it or inspecting
|
||||||
|
its value whatsoever. In the compiler, this allows us to avoid
|
||||||
|
deserializing the result from disk when we don't need it, and -- in
|
||||||
|
fact -- enables us to sometimes skip *serializing* the result as well
|
||||||
|
(see the refinements section below).
|
||||||
|
|
||||||
|
Try-mark-green works as follows:
|
||||||
|
|
||||||
|
- First check whether the query Q was executed during the previous
|
||||||
|
  compilation.
|
||||||
|
  - If not, we can just re-execute the query as normal, and assign it the
|
||||||
|
    color of red.
|
||||||
|
  - If yes, then load the 'dependent queries' that Q executed, as described next.
|
||||||
|
- If there is a saved result, then we load the `reads(Q)` vector from the
|
||||||
|
  query DAG. The "reads" is the set of queries that Q executed during
|
||||||
|
  its execution.
|
||||||
|
  - For each query R in `reads(Q)`, we recursively demand the color
|
||||||
|
    of R using try-mark-green.
|
||||||
|
    - Note: it is important that we visit each node in `reads(Q)` in the same order
|
||||||
|
      as they occurred in the original compilation. See [the section on the query DAG below](#dag).
|
||||||
|
  - If **any** of the nodes in `reads(Q)` wind up colored **red**, then Q is dirty.
|
||||||
|
    - We re-execute Q and compare the hash of its result to the hash of the result
|
||||||
|
      from the previous compilation.
|
||||||
|
    - If the hash has not changed, we can mark Q as **green** and return; otherwise, Q is **red**.
|
||||||
|
  - Otherwise, **all** of the nodes in `reads(Q)` must be **green**. In that case,
|
||||||
|
    we can color Q as **green** and return.
|
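The steps above can be sketched as a toy model in plain Rust. This is only an illustration under stated assumptions, not the code in `librustc/dep_graph`: the names `Incr`, `PrevNode`, `now` and `is_input` are hypothetical, queries are identified by strings, and "executing" a query is simulated by looking up the hash it would produce. The sketch also assumes that inputs (such as source text) are always re-hashed, which the list above does not spell out.

```rust
use std::collections::HashMap;

/// Color assigned to a query during the current compilation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Color {
    Red,
    Green,
}

/// What the previous compilation saved for one query (hypothetical layout).
pub struct PrevNode {
    /// Queries read last time, in the order they were read.
    pub reads: Vec<String>,
    /// Hash of the previous result.
    pub result_hash: u64,
    /// Assumption: inputs (e.g. source text) are always re-hashed.
    pub is_input: bool,
}

pub struct Incr {
    pub prev: HashMap<String, PrevNode>,
    /// Hash each query would produce if re-executed now; a lookup here
    /// stands in for actually running the query.
    pub now: HashMap<String, u64>,
    pub colors: HashMap<String, Color>,
    /// Queries that were re-executed, in order.
    pub executed: Vec<String>,
}

impl Incr {
    pub fn new(prev: HashMap<String, PrevNode>, now: HashMap<String, u64>) -> Self {
        Incr { prev, now, colors: HashMap::new(), executed: Vec::new() }
    }

    /// "Re-executes" `q` and returns the hash of its new result.
    /// Panics if `now` has no entry for `q`.
    fn execute(&mut self, q: &str) -> u64 {
        self.executed.push(q.to_string());
        self.now[q]
    }

    pub fn try_mark_green(&mut self, q: &str) -> Color {
        if let Some(&color) = self.colors.get(q) {
            return color;
        }
        let saved = self
            .prev
            .get(q)
            .map(|n| (n.reads.clone(), n.result_hash, n.is_input));
        let color = match saved {
            // Not executed in the previous compilation: run it, color it red.
            None => {
                self.execute(q);
                Color::Red
            }
            Some((reads, old_hash, is_input)) => {
                // Walk the reads in their recorded order; stop at the first red.
                let mut dirty = is_input;
                for r in &reads {
                    if self.try_mark_green(r) == Color::Red {
                        dirty = true;
                        break;
                    }
                }
                if !dirty {
                    // All reads are green: green without re-executing.
                    Color::Green
                } else if self.execute(q) == old_hash {
                    // Re-executed, but the result hash is unchanged.
                    Color::Green
                } else {
                    Color::Red
                }
            }
        };
        self.colors.insert(q.to_string(), color);
        color
    }
}
```

In this model, if a changed input makes `parse` re-run but `parse` yields the same hash as before, anything that only reads `parse` is marked green without being executed at all.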
||||||
|
|
||||||
|
<a name="dag"></a>
|
||||||
|
|
||||||
|
### The query DAG
|
||||||
|
|
||||||
|
The query DAG code is stored in
|
||||||
|
[`src/librustc/dep_graph`][dep_graph]. Construction of the DAG is done
|
||||||
|
by instrumenting the query execution.
|
||||||
|
|
||||||
|
One key point is that the query DAG also tracks ordering; that is, for
|
||||||
|
each query Q, we not only track the queries that Q reads, we track the
|
||||||
|
**order** in which they were read. This allows try-mark-green to walk
|
||||||
|
those queries back in the same order. This is important because once a subquery comes back as red,
|
||||||
|
we can no longer be sure that Q will continue along the same path as before.
|
||||||
|
That is, imagine a query like this:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
fn main_query(tcx) {
|
||||||
|
    if tcx.subquery1() {
|
||||||
|
        tcx.subquery2()
|
||||||
|
    } else {
|
||||||
|
        tcx.subquery3()
|
||||||
|
    }
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Now imagine that in the first compilation, `main_query` starts by
|
||||||
|
executing `subquery1`, and this returns true. In that case, the next
|
||||||
|
query `main_query` executes will be `subquery2`, and `subquery3` will
|
||||||
|
not be executed at all.
|
||||||
|
|
||||||
|
But now imagine that in the **next** compilation, the input has
|
||||||
|
changed such that `subquery1` returns **false**. In this case, `subquery2` would never
|
||||||
|
execute. If try-mark-green were to visit `reads(main_query)` out of order,
|
||||||
|
however, it might have visited `subquery2` before `subquery1`, and hence executed it.
|
||||||
|
This can lead to ICEs and other problems in the compiler.
|
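One way to picture how instrumenting query execution yields ordered reads is the following minimal sketch. It is a toy under assumptions, not the real dep-graph code: `DepGraph::with_task` here is a hypothetical helper that keeps a stack of in-progress queries and appends each sub-query to its parent's read list in the order it is demanded. There is no memoization in this sketch.

```rust
use std::collections::HashMap;

/// Toy dep-graph recorder (hypothetical; not the rustc data structure).
#[derive(Default)]
pub struct DepGraph {
    /// Finished queries mapped to the queries they read, in read order.
    pub reads: HashMap<String, Vec<String>>,
    /// Queries currently executing, each with the reads seen so far.
    stack: Vec<(String, Vec<String>)>,
}

impl DepGraph {
    /// Runs `body` as the query `name`, recording an edge from the
    /// enclosing query (if any) to `name`.
    pub fn with_task<R>(&mut self, name: &str, body: impl FnOnce(&mut DepGraph) -> R) -> R {
        if let Some((_, parent_reads)) = self.stack.last_mut() {
            // Keep the first occurrence only, so the order stays stable.
            if !parent_reads.iter().any(|r| r == name) {
                parent_reads.push(name.to_string());
            }
        }
        self.stack.push((name.to_string(), Vec::new()));
        let result = body(self);
        let (done, reads) = self.stack.pop().expect("task stack underflow");
        self.reads.insert(done, reads);
        result
    }
}
```

Running the `main_query` example through this recorder with `subquery1` returning true produces the read list `[subquery1, subquery2]`, and `subquery3` never appears in the graph.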
||||||
|
|
||||||
|
[dep_graph]: https://github.com/rust-lang/rust/tree/master/src/librustc/dep_graph
|
||||||
|
|
||||||
|
## Improvements to the basic algorithm
|
||||||
|
|
||||||
|
In the description of the basic algorithm, we said that at the end of
|
||||||
|
compilation we would save the results of all the queries that were
|
||||||
|
performed. In practice, this can be quite wasteful -- many of those
|
||||||
|
results are very cheap to recompute, and serializing + deserializing
|
||||||
|
them is not a particular win. In practice, what we would do is to save
|
||||||
|
**the hashes** of all the subqueries that we performed. Then, in select cases,
|
||||||
|
we **also** save the results.
|
||||||
|
|
||||||
|
This is why the incremental algorithm separates computing the
|
||||||
|
**color** of a node, which often does not require its value, from
|
||||||
|
computing the **result** of a node. Computing the result is done via a simple algorithm
|
||||||
|
like so:
|
||||||
|
|
||||||
|
- Check if a saved result for Q is available. If so, compute the color of Q.
|
||||||
|
If Q is green, deserialize and return the saved result.
|
||||||
|
- Otherwise, execute Q.
|
||||||
|
  - We can then compare the hash of the result and color Q as green if
|
||||||
|
    it did not change.
|
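The result-computing steps can be sketched as below. This is an illustrative toy, assuming a hypothetical `Saved` record that always holds a hash and only sometimes holds the result itself; `force` and `fingerprint` are made-up names, and the standard library's `DefaultHasher` stands in for whatever hash the compiler really uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Color {
    Red,
    Green,
}

/// Hashes a result; `DefaultHasher` is a stand-in chosen for this sketch.
pub fn fingerprint<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Data kept from the previous compilation: always the hash,
/// and only in select cases the result itself.
pub struct Saved<T> {
    pub hash: u64,
    pub result: Option<T>,
}

/// Returns the value of a query together with its color.
/// `color_of_q` stands in for try-mark-green; `execute` runs the query.
pub fn force<T: Hash + Clone>(
    saved: Option<&Saved<T>>,
    color_of_q: impl FnOnce() -> Color,
    execute: impl FnOnce() -> T,
) -> (T, Color) {
    // A saved result exists: reuse it if the query turns out green.
    if let Some(Saved { result: Some(value), .. }) = saved {
        if color_of_q() == Color::Green {
            return (value.clone(), Color::Green);
        }
    }
    // Otherwise execute, then compare hashes to decide the color.
    let value = execute();
    let color = match saved {
        Some(s) if s.hash == fingerprint(&value) => Color::Green,
        _ => Color::Red,
    };
    (value, color)
}
```

Note that the color is only computed when a saved result is actually available, which mirrors the separation between "color" and "result" described above.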
||||||
|
|
||||||
|
# Footnotes
|
||||||
|
|
||||||
|
[^salsa]: I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis
|
||||||
|
|
@ -0,0 +1,6 @@
|
||||||
|
# The MIR (Mid-level IR)
|
||||||
|
|
||||||
|
TODO
|
||||||
|
|
||||||
|
Defined in the `src/librustc/mir/` module, but much of the code that
|
||||||
|
manipulates it is found in `src/librustc_mir`.
|
||||||
|
|
@ -0,0 +1,314 @@
|
||||||
|
# Queries: demand-driven compilation
|
||||||
|
|
||||||
|
As described in [the high-level overview of the compiler][hl], the
|
||||||
|
Rust compiler is currently transitioning from a traditional "pass-based"
|
||||||
|
setup to a "demand-driven" system. **The Compiler Query System is the
|
||||||
|
key to our new demand-driven organization.** The idea is pretty
|
||||||
|
simple. You have various queries that compute things about the input
|
||||||
|
-- for example, there is a query called `type_of(def_id)` that, given
|
||||||
|
the def-id of some item, will compute the type of that item and return
|
||||||
|
it to you.
|
||||||
|
|
||||||
|
[hl]: high-level-overview.html
|
||||||
|
|
||||||
|
Query execution is **memoized** -- so the first time you invoke a
|
||||||
|
query, it will go do the computation, but the next time, the result is
|
||||||
|
returned from a hashtable. Moreover, query execution fits nicely into
|
||||||
|
**incremental computation**; the idea is roughly that, when you do a
|
||||||
|
query, the result **may** be returned to you by loading stored data
|
||||||
|
from disk (but that's a separate topic we won't discuss further here).
|
||||||
|
|
||||||
|
The overall vision is that, eventually, the entire compiler
|
||||||
|
control-flow will be query driven. There will effectively be one
|
||||||
|
top-level query ("compile") that will run compilation on a crate; this
|
||||||
|
will in turn demand information about that crate, starting from the
|
||||||
|
*end*. For example:
|
||||||
|
|
||||||
|
- This "compile" query might demand to get a list of codegen-units
|
||||||
|
(i.e., modules that need to be compiled by LLVM).
|
||||||
|
- But computing the list of codegen-units would invoke some subquery
|
||||||
|
that returns the list of all modules defined in the Rust source.
|
||||||
|
- That query in turn would invoke something asking for the HIR.
|
||||||
|
- This keeps going further and further back until we wind up doing the
|
||||||
|
actual parsing.
|
||||||
|
|
||||||
|
However, that vision is not fully realized. Still, big chunks of the
|
||||||
|
compiler (for example, generating MIR) work exactly like this.
|
||||||
|
|
||||||
|
### Invoking queries
|
||||||
|
|
||||||
|
Invoking a query is simple. The tcx ("typing context") offers a method
|
||||||
|
for each defined query. So, for example, to invoke the `type_of`
|
||||||
|
query, you would just do this:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
let ty = tcx.type_of(some_def_id);
|
||||||
|
```
|
||||||
|
|
||||||
|
### Cycles between queries
|
||||||
|
|
||||||
|
Currently, cycles during query execution should always result in a
|
||||||
|
compilation error. Typically, they arise because of illegal programs
|
||||||
|
that contain cyclic references they shouldn't (though sometimes they
|
||||||
|
arise because of compiler bugs, in which case we need to factor our
|
||||||
|
queries in a more fine-grained fashion to avoid them).
|
||||||
|
|
||||||
|
However, it is nonetheless often useful to *recover* from a cycle
|
||||||
|
(after reporting an error, say) and try to soldier on, so as to give a
|
||||||
|
better user experience. In order to recover from a cycle, you don't
|
||||||
|
get to use the nice method-call-style syntax. Instead, you invoke
|
||||||
|
using the `try_get` method, which looks roughly like this:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
use ty::maps::queries;
|
||||||
|
...
|
||||||
|
match queries::type_of::try_get(tcx, DUMMY_SP, self.did) {
|
||||||
|
    Ok(result) => {
|
||||||
|
        // no cycle occurred! You can use `result`
|
||||||
|
    }
|
||||||
|
    Err(err) => {
|
||||||
|
        // A cycle occurred! The error value `err` is a `DiagnosticBuilder`,
|
||||||
|
        // meaning essentially an "in-progress", not-yet-reported error message.
|
||||||
|
        // See below for more details on what to do here.
|
||||||
|
    }
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
So, if you get back an `Err` from `try_get`, then a cycle *did* occur. This means that
|
||||||
|
you must ensure that a compiler error message is reported. You can do that in two ways:
|
||||||
|
|
||||||
|
The simplest is to invoke `err.emit()`. This will emit the cycle error to the user.
|
||||||
|
|
||||||
|
However, often cycles happen because of an illegal program, and you
|
||||||
|
know at that point that an error either already has been reported or
|
||||||
|
will be reported due to this cycle by some other bit of code. In that
|
||||||
|
case, you can invoke `err.cancel()` to not emit any error. It is
|
||||||
|
traditional to then invoke:
|
||||||
|
|
||||||
|
```
|
||||||
|
tcx.sess.delay_span_bug(some_span, "some message")
|
||||||
|
```
|
||||||
|
|
||||||
|
`delay_span_bug()` is a helper that says: we expect a compilation
|
||||||
|
error to have happened or to happen in the future; so, if compilation
|
||||||
|
ultimately succeeds, make an ICE with the message `"some
|
||||||
|
message"`. This is basically just a precaution in case you are wrong.
|
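To make the idea of a recoverable cycle concrete, here is a small self-contained model. It does not use the real `try_get` machinery: `Engine`, `CycleError` and the toy "count reachable nodes" query are all hypothetical, and the sketch simply assumes that a cycle is detected by finding the requested key on a stack of in-progress queries.

```rust
use std::collections::HashMap;

/// Reported when a query ends up (transitively) demanding itself.
#[derive(Debug, PartialEq)]
pub struct CycleError {
    /// The chain of keys that were in progress, ending with the repeated key.
    pub stack: Vec<u32>,
}

/// Toy query engine (hypothetical): the query for `key` counts `key`
/// plus everything reachable through `deps`, without memoization.
pub struct Engine {
    pub deps: HashMap<u32, Vec<u32>>,
    active: Vec<u32>,
}

impl Engine {
    pub fn new(deps: HashMap<u32, Vec<u32>>) -> Self {
        Engine { deps, active: Vec::new() }
    }

    pub fn try_get(&mut self, key: u32) -> Result<u32, CycleError> {
        // The key is already being computed further up the stack: a cycle.
        if self.active.contains(&key) {
            let mut stack = self.active.clone();
            stack.push(key);
            return Err(CycleError { stack });
        }
        self.active.push(key);
        let children = self.deps.get(&key).cloned().unwrap_or_default();
        let mut total = 1;
        let mut outcome = Ok(());
        for child in children {
            match self.try_get(child) {
                Ok(n) => total += n,
                Err(e) => {
                    outcome = Err(e);
                    break;
                }
            }
        }
        // Always unwind the stack so the engine stays usable after an error.
        self.active.pop();
        outcome.map(|()| total)
    }
}
```

Because the in-progress stack is unwound even on error, a caller can handle the `Err` and keep issuing other queries, which is the "soldier on" behaviour described above.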
||||||
|
|
||||||
|
### How the compiler executes a query
|
||||||
|
|
||||||
|
So you may be wondering what happens when you invoke a query
|
||||||
|
method. The answer is that, for each query, the compiler maintains a
|
||||||
|
cache -- if your query has already been executed, then, the answer is
|
||||||
|
simple: we clone the return value out of the cache and return it
|
||||||
|
(therefore, you should try to ensure that the return types of queries
|
||||||
|
are cheaply cloneable; insert an `Rc` if necessary).
|
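A minimal sketch of this caching behaviour, assuming a toy context type rather than the real `TyCtxt`: `Ctx`, `compute_type_of` and the `u32` key are hypothetical stand-ins, and the result is wrapped in `Rc` so that cloning it out of the cache is cheap.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::rc::Rc;

/// Toy context with one memoized query (hypothetical names throughout).
pub struct Ctx {
    cache: RefCell<HashMap<u32, Rc<String>>>,
    /// How many times the provider actually ran.
    pub provider_calls: Cell<u32>,
}

impl Ctx {
    pub fn new() -> Self {
        Ctx { cache: RefCell::new(HashMap::new()), provider_calls: Cell::new(0) }
    }

    /// Query method: returns a cheap clone of the cached value if present,
    /// otherwise runs the provider once and caches its result.
    pub fn type_of(&self, def_id: u32) -> Rc<String> {
        let cached = self.cache.borrow().get(&def_id).cloned();
        if let Some(hit) = cached {
            return hit;
        }
        self.provider_calls.set(self.provider_calls.get() + 1);
        let value = Rc::new(compute_type_of(def_id));
        self.cache.borrow_mut().insert(def_id, Rc::clone(&value));
        value
    }
}

/// Stand-in for the real computation.
fn compute_type_of(def_id: u32) -> String {
    format!("Ty#{}", def_id)
}
```

Calling `type_of` twice with the same key runs the provider once; the second call only bumps a reference count.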
||||||
|
|
||||||
|
#### Providers
|
||||||
|
|
||||||
|
If, however, the query is *not* in the cache, then the compiler will
|
||||||
|
try to find a suitable **provider**. A provider is a function that has
|
||||||
|
been defined and linked into the compiler somewhere that contains the
|
||||||
|
code to compute the result of the query.
|
||||||
|
|
||||||
|
**Providers are defined per-crate.** The compiler maintains,
|
||||||
|
internally, a table of providers for every crate, at least
|
||||||
|
conceptually. Right now, there are really two sets: the providers for
|
||||||
|
queries about the **local crate** (that is, the one being compiled)
|
||||||
|
and providers for queries about **external crates** (that is,
|
||||||
|
dependencies of the local crate). Note that what determines the crate
|
||||||
|
that a query is targeting is not the *kind* of query, but the *key*.
|
||||||
|
For example, when you invoke `tcx.type_of(def_id)`, that could be a
|
||||||
|
local query or an external query, depending on what crate the `def_id`
|
||||||
|
is referring to (see the `self::keys::Key` trait for more information
|
||||||
|
on how that works).
|
||||||
|
|
||||||
|
Providers always have the same signature:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
fn provider<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>,
|
||||||
|
                       key: QUERY_KEY)
|
||||||
|
                       -> QUERY_RESULT
|
||||||
|
{
|
||||||
|
    ...
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Providers take two arguments: the `tcx` and the query key. Note also
|
||||||
|
that they take the *global* tcx (i.e., they use the `'tcx` lifetime
|
||||||
|
twice), rather than taking a tcx with some active inference context.
|
||||||
|
They return the result of the query.
|
||||||
|
|
||||||
|
#### How providers are set up
|
||||||
|
|
||||||
|
When the tcx is created, it is given the providers by its creator using
|
||||||
|
the `Providers` struct. This struct is generated by the macros here, but it
|
||||||
|
is basically a big list of function pointers:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
struct Providers {
|
||||||
|
    type_of: for<'cx, 'tcx> fn(TyCtxt<'cx, 'tcx, 'tcx>, DefId) -> Ty<'tcx>,
|
||||||
|
    ...
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
At present, we have one copy of the struct for local crates, and one
|
||||||
|
for external crates, though the plan is that we may eventually have
|
||||||
|
one per crate.
|
||||||
|
|
||||||
|
These `Providers` structs are ultimately created and populated by
|
||||||
|
`librustc_driver`, but it does this by distributing the work
|
||||||
|
throughout the other `rustc_*` crates. This is done by invoking
|
||||||
|
various `provide` functions. These functions tend to look something
|
||||||
|
like this:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
pub fn provide(providers: &mut Providers) {
|
||||||
|
    *providers = Providers {
|
||||||
|
        type_of,
|
||||||
|
        ..*providers
|
||||||
|
    };
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
That is, they take an `&mut Providers` and mutate it in place. Usually
|
||||||
|
we use the formulation above just because it looks nice, but you could
|
||||||
|
as well do `providers.type_of = type_of`, which would be equivalent.
|
||||||
|
(Here, `type_of` would be a top-level function, defined as we saw
|
||||||
|
before.) So, if we want to add a provider for some other query,
|
||||||
|
let's call it `fubar`, into the crate above, we might modify the `provide()`
|
||||||
|
function like so:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
pub fn provide(providers: &mut Providers) {
|
||||||
|
    *providers = Providers {
|
||||||
|
        type_of,
|
||||||
|
        fubar,
|
||||||
|
        ..*providers
|
||||||
|
    };
|
||||||
|
}
|
||||||
|
|
||||||
|
fn fubar<'cx, 'tcx>(tcx: TyCtxt<'cx, 'tcx, 'tcx>, key: DefId) -> Fubar<'tcx> { .. }
|
||||||
|
```
|
||||||
|
|
||||||
|
NB. Most of the `rustc_*` crates only provide **local
|
||||||
|
providers**. Almost all **extern providers** wind up going through the
|
||||||
|
[`rustc_metadata` crate][rustc_metadata], which loads the information from the crate
|
||||||
|
metadata. But in some cases there are crates that provide queries for
|
||||||
|
*both* local and external crates, in which case they define both a
|
||||||
|
`provide` and a `provide_extern` function that `rustc_driver` can
|
||||||
|
invoke.
|
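The provider tables and the local/extern split can be modeled in a few lines of standalone Rust. This is a simplified sketch, not the generated `Providers` struct: `Tcx`, `DefId`, `LOCAL_CRATE` and the string-valued "types" are stand-ins invented for the example, and the only thing it aims to show is that the *key* (here, the crate of the `DefId`) selects which table of function pointers is consulted.

```rust
pub const LOCAL_CRATE: u32 = 0;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DefId {
    pub krate: u32,
    pub index: u32,
}

/// A table of function pointers, one per query (toy version).
#[derive(Clone, Copy)]
pub struct Providers {
    pub type_of: fn(&Tcx, DefId) -> String,
    pub fubar: fn(&Tcx, DefId) -> u32,
}

fn missing_type_of(_: &Tcx, id: DefId) -> String {
    panic!("no provider for type_of({:?})", id)
}

fn missing_fubar(_: &Tcx, id: DefId) -> u32 {
    panic!("no provider for fubar({:?})", id)
}

impl Default for Providers {
    /// Every slot starts out as a function that panics.
    fn default() -> Self {
        Providers { type_of: missing_type_of, fubar: missing_fubar }
    }
}

/// Toy context holding one table for the local crate and one for externs.
pub struct Tcx {
    pub local: Providers,
    pub external: Providers,
}

impl Tcx {
    /// The key, not the kind of query, decides which table is used.
    fn providers_for(&self, id: DefId) -> &Providers {
        if id.krate == LOCAL_CRATE { &self.local } else { &self.external }
    }

    pub fn type_of(&self, id: DefId) -> String {
        (self.providers_for(id).type_of)(self, id)
    }

    pub fn fubar(&self, id: DefId) -> u32 {
        (self.providers_for(id).fubar)(self, id)
    }
}

fn type_of(_tcx: &Tcx, id: DefId) -> String {
    format!("local#{}", id.index)
}

fn extern_type_of(_tcx: &Tcx, id: DefId) -> String {
    format!("extern{}#{}", id.krate, id.index)
}

/// A provider may itself invoke other queries through the context.
fn fubar(tcx: &Tcx, id: DefId) -> u32 {
    tcx.type_of(id).len() as u32
}

pub fn provide(providers: &mut Providers) {
    *providers = Providers { type_of, ..*providers };
    providers.fubar = fubar; // the equivalent field-assignment form
}

pub fn provide_extern(providers: &mut Providers) {
    *providers = Providers { type_of: extern_type_of, ..*providers };
}
```

A driver in this model would create two default tables, pass one to `provide` and the other to `provide_extern`, and then build the `Tcx` from them.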
||||||
|
|
||||||
|
[rustc_metadata]: https://github.com/rust-lang/rust/tree/master/src/librustc_metadata
|
||||||
|
|
||||||
|
### Adding a new kind of query
|
||||||
|
|
||||||
|
So suppose you want to add a new kind of query, how do you do so?
|
||||||
|
Well, defining a query takes place in two steps:
|
||||||
|
|
||||||
|
1. first, you have to specify the query name and arguments; and then,
|
||||||
|
2. you have to supply query providers where needed.
|
||||||
|
|
||||||
|
To specify the query name and arguments, you simply add an entry to
|
||||||
|
the big macro invocation in
|
||||||
|
[`src/librustc/ty/maps/mod.rs`][maps-mod]. This will probably have
|
||||||
|
changed by the time you read this README, but at present it looks
|
||||||
|
something like:
|
||||||
|
|
||||||
|
[maps-mod]: https://github.com/rust-lang/rust/blob/master/src/librustc/ty/maps/mod.rs
|
||||||
|
|
||||||
|
```
|
||||||
|
define_maps! { <'tcx>
|
||||||
|
    /// Records the type of every item.
|
||||||
|
    [] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,
|
||||||
|
|
||||||
|
    ...
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Each line of the macro defines one query. The name is broken up like this:
|
||||||
|
|
||||||
|
```
|
||||||
|
[] fn type_of: TypeOfItem(DefId) -> Ty<'tcx>,
|
||||||
|
^^    ^^^^^^^  ^^^^^^^^^^ ^^^^^     ^^^^^^^^
|
||||||
|
|     |        |          |         |
|
||||||
|
|     |        |          |         result type of query
|
||||||
|
|     |        |          query key type
|
||||||
|
|     |        dep-node constructor
|
||||||
|
|     name of query
|
||||||
|
query flags
|
||||||
|
```
|
||||||
|
|
||||||
|
Let's go over them one by one:
|
||||||
|
|
||||||
|
- **Query flags:** these are largely unused right now, but the intention
|
||||||
|
is that we'll be able to customize various aspects of how the query is
|
||||||
|
processed.
|
||||||
|
- **Name of query:** the name of the query method
|
||||||
|
(`tcx.type_of(..)`). Also used as the name of a struct
|
||||||
|
(`ty::maps::queries::type_of`) that will be generated to represent
|
||||||
|
this query.
|
||||||
|
- **Dep-node constructor:** indicates the constructor function that
|
||||||
|
connects this query to incremental compilation. Typically, this is a
|
||||||
|
`DepNode` variant, which can be added by modifying the
|
||||||
|
`define_dep_nodes!` macro invocation in
|
||||||
|
[`librustc/dep_graph/dep_node.rs`][dep-node].
|
||||||
|
  - However, sometimes we use a custom function, in which case the
|
||||||
|
    name will be in snake case and the function will be defined at the
|
||||||
|
    bottom of the file. This is typically used when the query key is
|
||||||
|
    not a def-id, or just not the type that the dep-node expects.
|
||||||
|
- **Query key type:** the type of the argument to this query.
|
||||||
|
This type must implement the `ty::maps::keys::Key` trait, which
|
||||||
|
defines (for example) how to map it to a crate, and so forth.
|
||||||
|
- **Result type of query:** the type produced by this query. This type
|
||||||
|
should (a) not use `RefCell` or other interior mutability and (b) be
|
||||||
|
cheaply cloneable. Interning or using `Rc` or `Arc` is recommended for
|
||||||
|
non-trivial data types.
|
||||||
|
  - The one exception to those rules is the `ty::steal::Steal` type,
|
||||||
|
    which is used to cheaply modify MIR in place. See the definition
|
||||||
|
    of `Steal` for more details. New uses of `Steal` should **not** be
|
||||||
|
    added without alerting `@rust-lang/compiler`.
|
||||||
|
|
||||||
|
[dep-node]: https://github.com/rust-lang/rust/blob/master/src/librustc/dep_graph/dep_node.rs
|
||||||
|
|
||||||
|
So, to add a query:
|
||||||
|
|
||||||
|
- Add an entry to `define_maps!` using the format above.
|
||||||
|
- Possibly add a corresponding entry to the dep-node macro.
|
||||||
|
- Link the provider by modifying the appropriate `provide` method;
|
||||||
|
or add a new one if needed and ensure that `rustc_driver` is invoking it.
|
||||||
|
|
||||||
|
#### Query structs and descriptions
|
||||||
|
|
||||||
|
For each kind of query, the `define_maps!` macro will generate a "query struct"
|
||||||
|
named after the query. This struct is a kind of a place-holder
|
||||||
|
describing the query. Each such struct implements the
|
||||||
|
`self::config::QueryConfig` trait, which has associated types for the
|
||||||
|
key/value of that particular query. Basically the code generated looks something
|
||||||
|
like this:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// Dummy struct representing a particular kind of query:
|
||||||
|
pub struct type_of<'tcx> { phantom: PhantomData<&'tcx ()> }
|
||||||
|
|
||||||
|
impl<'tcx> QueryConfig for type_of<'tcx> {
|
||||||
|
    type Key = DefId;
|
||||||
|
    type Value = Ty<'tcx>;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
There is an additional trait that you may wish to implement called
|
||||||
|
`self::config::QueryDescription`. This trait is used during cycle
|
||||||
|
errors to give a "human readable" name for the query, so that we can
|
||||||
|
summarize what was happening when the cycle occurred. Implementing
|
||||||
|
this trait is optional if the query key is `DefId`, but if you *don't*
|
||||||
|
implement it, you get a pretty generic error ("processing `foo`...").
|
||||||
|
You can put new impls into the `config` module. They look something like this:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
impl<'tcx> QueryDescription for queries::type_of<'tcx> {
|
||||||
|
    fn describe(tcx: TyCtxt, key: DefId) -> String {
|
||||||
|
        format!("computing the type of `{}`", tcx.item_path_str(key))
|
||||||
|
    }
|
||||||
|
}
|
||||||
|
```
|
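The relationship between the query struct and the two traits can be reproduced in miniature. The traits below are simplified stand-ins written for this example (the real `describe` also takes the tcx, and the real key would be a `DefId`); `cycle_note` is a hypothetical helper showing how a description might be used when reporting a cycle.

```rust
use std::marker::PhantomData;

/// Simplified stand-in: associates a key and value type with a query.
pub trait QueryConfig {
    type Key;
    type Value;
}

/// Simplified stand-in: a human-readable description of a query.
pub trait QueryDescription: QueryConfig {
    fn describe(key: &Self::Key) -> String;
}

/// Dummy struct representing one kind of query; it is never constructed
/// and only serves as a type-level name for the query.
#[allow(non_camel_case_types)]
pub struct type_of<'tcx> {
    _phantom: PhantomData<&'tcx ()>,
}

impl<'tcx> QueryConfig for type_of<'tcx> {
    type Key = u32; // assumption: a plain integer instead of `DefId`
    type Value = &'tcx str;
}

impl<'tcx> QueryDescription for type_of<'tcx> {
    fn describe(key: &u32) -> String {
        format!("computing the type of `item#{}`", key)
    }
}

/// Hypothetical helper: builds the note a cycle report might show.
pub fn cycle_note<Q: QueryDescription>(key: &Q::Key) -> String {
    format!("cycle detected while {}", Q::describe(key))
}
```

Because the description is attached to the query struct rather than to a value, generic code such as `cycle_note` can name any query purely by type.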
||||||
|
|
||||||