breakup the MIR section and add an incremental compilation section
This commit is contained in:
parent
d052d295ef
commit
eb31ada5c7
|
|
@ -15,6 +15,7 @@
|
||||||
- [Type inference](./type-inference.md)
|
- [Type inference](./type-inference.md)
|
||||||
- [Trait resolution](./trait-resolution.md)
|
- [Trait resolution](./trait-resolution.md)
|
||||||
- [Type checking](./type-checking.md)
|
- [Type checking](./type-checking.md)
|
||||||
|
- [The MIR (Mid-level IR)](./mir.md)
|
||||||
- [MIR construction](./mir-construction.md)
|
- [MIR construction](./mir-construction.md)
|
||||||
- [MIR borrowck](./mir-borrowck.md)
|
- [MIR borrowck](./mir-borrowck.md)
|
||||||
- [MIR optimizations](./mir-optimizations.md)
|
- [MIR optimizations](./mir-optimizations.md)
|
||||||
|
|
|
||||||
|
|
@ -0,0 +1,139 @@
|
||||||
|
# Incremental compilation
|
||||||
|
|
||||||
|
The incremental compilation scheme is, in essence, a surprisingly
|
||||||
|
simple extension to the overall query system. We'll start by describing
|
||||||
|
a slightly simplified variant of the real thing, the "basic algorithm", and then describe
|
||||||
|
some possible improvements.
|
||||||
|
|
||||||
|
## The basic algorithm
|
||||||
|
|
||||||
|
The basic algorithm is
|
||||||
|
called the **red-green** algorithm[^salsa]. The high-level idea is
|
||||||
|
that, after each run of the compiler, we will save the results of all
|
||||||
|
the queries that we do, as well as the **query DAG**. The
|
||||||
|
**query DAG** is a [DAG] that indices which queries executed which
|
||||||
|
other queries. So for example there would be an edge from a query Q1
|
||||||
|
to another query Q2 if computing Q1 required computing Q2 (note that
|
||||||
|
because queries cannot depend on themselves, this results in a DAG and
|
||||||
|
not a general graph).
|
||||||
|
|
||||||
|
[DAG]: https://en.wikipedia.org/wiki/Directed_acyclic_graph
|
||||||
|
|
||||||
|
On the next run of the compiler, then, we can sometimes reuse these
|
||||||
|
query results to avoid re-executing a query. We do this by assigning
|
||||||
|
every query a **color**:
|
||||||
|
|
||||||
|
- If a query is colored **red**, that means that its result during
|
||||||
|
this compilation has **changed** from the previous compilation.
|
||||||
|
- If a query is colored **green**, that means that its result is
|
||||||
|
the **same** as the previous compilation.
|
||||||
|
|
||||||
|
There are two key insights here:
|
||||||
|
|
||||||
|
- First, if all the inputs to query Q are colored green, then the
|
||||||
|
query Q **must** result in the same value as last time and hence
|
||||||
|
need not be re-executed (or else the compiler is not deterministic).
|
||||||
|
- Second, even if some inputs to a query changes, it may be that it
|
||||||
|
**still** produces the same result as the previous compilation. In
|
||||||
|
particular, the query may only use part of its input.
|
||||||
|
- Therefore, after executing a query, we always check whether it
|
||||||
|
produced the same result as the previous time. **If it did,** we
|
||||||
|
can still mark the query as green, and hence avoid re-executing
|
||||||
|
dependent queries.
|
||||||
|
|
||||||
|
### The try-mark-green algorithm
|
||||||
|
|
||||||
|
The core of the incremental compilation is an algorithm called
|
||||||
|
"try-mark-green". It has the job of determining the color of a given
|
||||||
|
query Q (which must not yet have been executed). In cases where Q has
|
||||||
|
red inputs, determining Q's color may involve re-executing Q so that
|
||||||
|
we can compare its output; but if all of Q's inputs are green, then we
|
||||||
|
can determine that Q must be green without re-executing it or inspect
|
||||||
|
its value what-so-ever. In the compiler, this allows us to avoid
|
||||||
|
deserializing the result from disk when we don't need it, and -- in
|
||||||
|
fact -- enables us to sometimes skip *serializing* the result as well
|
||||||
|
(see the refinements section below).
|
||||||
|
|
||||||
|
Try-mark-green works as follows:
|
||||||
|
|
||||||
|
- First check if there is the query Q was executed during the previous
|
||||||
|
compilation.
|
||||||
|
- If not, we can just re-execute the query as normal, and assign it the
|
||||||
|
color of red.
|
||||||
|
- If yes, then load the 'dependent queries' that Q
|
||||||
|
- If there is a saved result, then we load the `reads(Q)` vector from the
|
||||||
|
query DAG. The "reads" is the set of queries that Q executed during
|
||||||
|
its execution.
|
||||||
|
- For each query R that in `reads(Q)`, we recursively demand the color
|
||||||
|
of R using try-mark-green.
|
||||||
|
- Note: it is important that we visit each node in `reads(Q)` in same order
|
||||||
|
as they occurred in the original compilation. See [the section on the query DAG below](#dag).
|
||||||
|
- If **any** of the nodes in `reads(Q)` wind up colored **red**, then Q is dirty.
|
||||||
|
- We re-execute Q and compare the hash of its result to the hash of the result
|
||||||
|
from the previous compilation.
|
||||||
|
- If the hash has not changed, we can mark Q as **green** and return.
|
||||||
|
- Otherwise, **all** of the nodes in `reads(Q)` must be **green**. In that case,
|
||||||
|
we can color Q as **green** and return.
|
||||||
|
|
||||||
|
<a name="dag">
|
||||||
|
|
||||||
|
### The query DAG
|
||||||
|
|
||||||
|
The query DAG code is stored in
|
||||||
|
[`src/librustc/dep_graph`][dep_graph]. Construction of the DAG is done
|
||||||
|
by instrumenting the query execution.
|
||||||
|
|
||||||
|
One key point is that the query DAG also tracks ordering; that is, for
|
||||||
|
each query Q, we noy only track the queries that Q reads, we track the
|
||||||
|
**order** in which they were read. This allows try-mark-green to walk
|
||||||
|
those queries back in the same order. This is important because once a subquery comes back as red,
|
||||||
|
we can no longer be sure that Q will continue along the same path as before.
|
||||||
|
That is, imagine a query like this:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
fn main_query(tcx) {
|
||||||
|
if tcx.subquery1() {
|
||||||
|
tcx.subquery2()
|
||||||
|
} else {
|
||||||
|
tcx.subquery3()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Now imagine that in the first compilation, `main_query` starts by
|
||||||
|
executing `subquery1`, and this returns true. In that case, the next
|
||||||
|
query `main_query` executes will be `subquery2`, and `subquery3` will
|
||||||
|
not be executed at all.
|
||||||
|
|
||||||
|
But now imagine that in the **next** compilation, the input has
|
||||||
|
changed such that `subquery` returns **false**. In this case, `subquery2` would never
|
||||||
|
execute. If try-mark-green were to visit `reads(main_query)` out of order,
|
||||||
|
however, it might have visited `subquery2` before `subquery1`, and hence executed it.
|
||||||
|
This can lead to ICEs and other problems in the compiler.
|
||||||
|
|
||||||
|
[dep_graph]: https://github.com/rust-lang/rust/tree/master/src/librustc/dep_graph
|
||||||
|
|
||||||
|
## Improvements to the basic algorithm
|
||||||
|
|
||||||
|
In the description basic algorithm, we said that at the end of
|
||||||
|
compilation we would save the results of all the queries that were
|
||||||
|
performed. In practice, this can be quite wasteful -- many of those
|
||||||
|
results are very cheap to recompute, and serializing + deserializing
|
||||||
|
them is not a particular win. In practice, what we would do is to save
|
||||||
|
**the hashes** of all the subqueries that we performed. Then, in select cases,
|
||||||
|
we **also** save the results.
|
||||||
|
|
||||||
|
This is why the incremental algorithm separates computing the
|
||||||
|
**color** of a node, which often does not require its value, from
|
||||||
|
computing the **result** of a node. Computing the result is done via a simple algorithm
|
||||||
|
like so:
|
||||||
|
|
||||||
|
- Check if a saved result for Q is available. If so, compute the color of Q.
|
||||||
|
If Q is green, deserialize and return the saved result.
|
||||||
|
- Otherwise, execute Q.
|
||||||
|
- We can then compare the hash of the result and color Q as green if
|
||||||
|
it did not change.
|
||||||
|
|
||||||
|
# Footnotes
|
||||||
|
|
||||||
|
[^salsa]: I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis
|
||||||
|
|
@ -0,0 +1 @@
|
||||||
|
# The MIR (Mid-level IR)
|
||||||
Loading…
Reference in New Issue