diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index e4bc2428..8e18969a 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -6,7 +6,7 @@
 - [Walkthrough: a typical contribution](./walkthrough.md)
 - [High-level overview of the compiler source](./high-level-overview.md)
 - [Queries: demand-driven compilation](./query.md)
-  - [Incremental compilation](./incremental-compilation.md)
+    - [Incremental compilation](./incremental-compilation.md)
 - [The parser](./the-parser.md)
 - [Macro expansion](./macro-expansion.md)
 - [Name resolution](./name-resolution.md)
@@ -15,8 +15,9 @@
 - [Type inference](./type-inference.md)
 - [Trait resolution](./trait-resolution.md)
 - [Type checking](./type-checking.md)
-- [MIR construction](./mir-construction.md)
-- [MIR borrowck](./mir-borrowck.md)
-- [MIR optimizations](./mir-optimizations.md)
+- [The MIR (Mid-level IR)](./mir.md)
+    - [MIR construction](./mir-construction.md)
+    - [MIR borrowck](./mir-borrowck.md)
+    - [MIR optimizations](./mir-optimizations.md)
 - [trans: generating LLVM IR](./trans.md)
 - [Glossary](./glossary.md)
diff --git a/src/incremental-compilation.md b/src/incremental-compilation.md
new file mode 100644
index 00000000..23910c5b
--- /dev/null
+++ b/src/incremental-compilation.md
@@ -0,0 +1,139 @@
+# Incremental compilation
+
+The incremental compilation scheme is, in essence, a surprisingly
+simple extension to the overall query system. We'll start by describing
+a slightly simplified variant of the real thing, the "basic algorithm", and then describe
+some possible improvements.
+
+## The basic algorithm
+
+The basic algorithm is
+called the **red-green** algorithm[^salsa]. The high-level idea is
+that, after each run of the compiler, we will save the results of all
+the queries that we execute, as well as the **query DAG**. The
+**query DAG** is a [DAG] that indicates which queries executed which other queries.
+So for example there would be an edge from a query Q1
+to another query Q2 if computing Q1 required computing Q2 (note that
+because queries cannot depend on themselves, this results in a DAG and
+not a general graph).
+
+[DAG]: https://en.wikipedia.org/wiki/Directed_acyclic_graph
+
+On the next run of the compiler, then, we can sometimes reuse these
+query results to avoid re-executing a query. We do this by assigning
+every query a **color**:
+
+- If a query is colored **red**, that means that its result during
+  this compilation has **changed** from the previous compilation.
+- If a query is colored **green**, that means that its result is
+  the **same** as the previous compilation.
+
+There are two key insights here:
+
+- First, if all the inputs to query Q are colored green, then the
+  query Q **must** result in the same value as last time and hence
+  need not be re-executed (or else the compiler is not deterministic).
+- Second, even if some inputs to a query change, it may be that it
+  **still** produces the same result as the previous compilation. In
+  particular, the query may only use part of its input.
+  - Therefore, after executing a query, we always check whether it
+    produced the same result as the previous time. **If it did,** we
+    can still mark the query as green, and hence avoid re-executing
+    dependent queries.
+
+### The try-mark-green algorithm
+
+The core of incremental compilation is an algorithm called
+"try-mark-green". It has the job of determining the color of a given
+query Q (which must not yet have been executed). In cases where Q has
+red inputs, determining Q's color may involve re-executing Q so that
+we can compare its output; but if all of Q's inputs are green, then we
+can determine that Q must be green without re-executing it or inspecting its value whatsoever.
+In the compiler, this allows us to avoid
+deserializing the result from disk when we don't need it, and -- in
+fact -- enables us to sometimes skip *serializing* the result as well
+(see the section on improvements below).
+
+Try-mark-green works as follows:
+
+- First check whether the query Q was executed during the previous
+  compilation.
+  - If not, we can just re-execute the query as normal, and assign it the
+    color of red.
+- If yes, then we load the queries that Q depended on: that is, we
+  load the `reads(Q)` vector from the query DAG. Here `reads(Q)` is
+  the set of queries that Q executed during its execution in the
+  previous compilation.
+  - For each query R in `reads(Q)`, we recursively demand the color
+    of R using try-mark-green.
+    - Note: it is important that we visit each node in `reads(Q)` in the same order
+      as they occurred in the original compilation. See [the section on the query DAG below](#dag).
+    - If **any** of the nodes in `reads(Q)` wind up colored **red**, then Q is dirty.
+      - We re-execute Q and compare the hash of its result to the hash of the result
+        from the previous compilation.
+      - If the hash has not changed, we can mark Q as **green** and return; otherwise, Q is **red**.
+  - Otherwise, **all** of the nodes in `reads(Q)` must be **green**. In that case,
+    we can color Q as **green** and return.
+
+
+
+### The query DAG
+
+The query DAG code is stored in
+[`src/librustc/dep_graph`][dep_graph]. Construction of the DAG is done
+by instrumenting the query execution.
+
+One key point is that the query DAG also tracks ordering; that is, for
+each query Q, we not only track the queries that Q reads, we also track the
+**order** in which they were read. This allows try-mark-green to walk
+those queries back in the same order. This is important because once a subquery comes back as red,
+we can no longer be sure that Q will continue along the same path as before.
+That is, imagine a query like this:
+
+```rust,ignore
+fn main_query(tcx) {
+    if tcx.subquery1() {
+        tcx.subquery2() // reached only when `subquery1` returns true
+    } else {
+        tcx.subquery3() // reached only when `subquery1` returns false
+    }
+}
+```
+
+Now imagine that in the first compilation, `main_query` starts by
+executing `subquery1`, and this returns true. In that case, the next
+query `main_query` executes will be `subquery2`, and `subquery3` will
+not be executed at all.
+
+But now imagine that in the **next** compilation, the input has
+changed such that `subquery1` returns **false**. In this case, `subquery2` would never
+execute. If try-mark-green were to visit `reads(main_query)` out of order,
+however, it might visit `subquery2` before `subquery1`, and hence execute it.
+This can lead to ICEs (internal compiler errors) and other problems in the compiler.
+
+[dep_graph]: https://github.com/rust-lang/rust/tree/master/src/librustc/dep_graph
+
+## Improvements to the basic algorithm
+
+In the description of the basic algorithm, we said that at the end of
+compilation we would save the results of all the queries that were
+performed. In practice, this can be quite wasteful -- many of those
+results are very cheap to recompute, and serializing + deserializing
+them is not a particular win. Instead, what we would do is save
+**the hashes** of all the subqueries that we performed. Then, in select cases,
+we **also** save the results.
+
+This is why the incremental algorithm separates computing the
+**color** of a node, which often does not require its value, from
+computing the **result** of a node. Computing the result is done via a simple algorithm like so:
+
+- Check if a saved result for Q is available. If so, compute the color of Q.
+  If Q is green, deserialize and return the saved result.
+- Otherwise, execute Q.
+  - We can then compare the hash of the result and color Q as green if
+    it did not change.
+
+# Footnotes
+
+[^salsa]: I have long wanted to rename it to the Salsa algorithm, but it never caught on.
+-@nikomatsakis
diff --git a/src/mir.md b/src/mir.md
new file mode 100644
index 00000000..2be6a2e1
--- /dev/null
+++ b/src/mir.md
@@ -0,0 +1 @@
+# The MIR (Mid-level IR)