Merge pull request #270 from michaelwoerister/query-eval-model-update
Add "The Query Evaluation Model in Detail" and "Incremental Compilation In Detail" chapters.
This commit is contained in:
commit 1ad362e6d6
```diff
@@ -27,7 +27,9 @@
 - [The Rustc Driver](./rustc-driver.md)
 - [Rustdoc](./rustdoc.md)
 - [Queries: demand-driven compilation](./query.md)
-- [Incremental compilation](./incremental-compilation.md)
+- [The Query Evaluation Model in Detail](./queries/query-evaluation-model-in-detail.md)
+- [Incremental compilation](./queries/incremental-compilation.md)
+- [Incremental compilation In Detail](./queries/incremental-compilation-in-detail.md)
 - [Debugging and Testing](./incrcomp-debugging.md)
 - [The parser](./the-parser.md)
 - [`#[test]` Implementation](./test-implementation.md)
```
```diff
@@ -15,7 +15,7 @@ completeness | completeness is a technical term in type theory. Comp
 control-flow graph | a representation of the control-flow of a program; see [the background chapter for more](./background.html#cfg)
 CTFE | Compile-Time Function Evaluation. This is the ability of the compiler to evaluate `const fn`s at compile time. This is part of the compiler's constant evaluation system. ([see more](../const-eval.html))
 cx | we tend to use "cx" as an abbreviation for context. See also `tcx`, `infcx`, etc.
-DAG | a directed acyclic graph is used during compilation to keep track of dependencies between queries. ([see more](../incremental-compilation.html))
+DAG | a directed acyclic graph is used during compilation to keep track of dependencies between queries. ([see more](../queries/incremental-compilation.html))
 data-flow analysis | a static analysis that figures out what properties are true at each point in the control-flow of a program; see [the background chapter for more](./background.html#dataflow)
 DefId | an index identifying a definition (see `librustc/hir/def_id.rs`). Uniquely identifies a `DefPath`.
 Double pointer | a pointer with additional metadata. See "fat pointer" for more.
```
@@ -0,0 +1,354 @@
# Incremental Compilation In Detail

The incremental compilation scheme is, in essence, a surprisingly
simple extension to the overall query system. It relies on the fact that:

1. queries are pure functions -- given the same inputs, a query will always
   yield the same result, and
2. the query model structures compilation in an acyclic graph that makes
   dependencies between individual computations explicit.

This chapter will explain how we can use these properties for making things
incremental and then goes on to discuss various implementation issues.

# A Basic Algorithm For Incremental Query Evaluation

As explained in the [query evaluation model primer][query-model], query
invocations form a directed-acyclic graph. Here's the example from the
previous chapter again:

```ignore
  list_of_all_hir_items <----------------------------- type_check_crate()
                                                               |
                                                               |
  Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
                                        |                      |
                      +-----------------+                      |
                      |                                        |
                      v                                        |
  Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+
```

Since every access from one query to another has to go through the query
context, we can record these accesses and thus actually build this dependency
graph in memory. With dependency tracking enabled, when compilation is done,
we know which queries were invoked (the nodes of the graph) and, for each
invocation, which other queries or inputs have gone into computing the query's
result (the edges of the graph).

Now suppose we change the source code of our program so that the
HIR of `bar` looks different than before. Our goal is to only recompute
those queries that are actually affected by the change while just re-using
the cached results of all the other queries. Given the dependency graph, we can
do exactly that. For a given query invocation, the graph tells us exactly
what data has gone into computing its result: we just have to follow the
edges until we reach something that has changed. If we don't encounter
anything that has changed, we know that the query would still evaluate to
the same result we already have in our cache.

Taking the `type_of(foo)` invocation from above as an example, we can check
whether the cached result is still valid by following the edges to its
inputs. The only edge leads to `Hir(foo)`, an input that has not been affected
by the change. So we know that the cached result for `type_of(foo)` is still
valid.

The story is a bit different for `type_check_item(foo)`: We again walk the
edges and already know that `type_of(foo)` is fine. Then we get to
`type_of(bar)` which we have not checked yet, so we walk the edges of
`type_of(bar)` and encounter `Hir(bar)` which *has* changed. Consequently
the result of `type_of(bar)` might yield a different result than what we
have in the cache and, transitively, the result of `type_check_item(foo)`
might have changed too. We thus re-run `type_check_item(foo)`, which in
turn will re-run `type_of(bar)`, which will yield an up-to-date result
because it reads the up-to-date version of `Hir(bar)`.
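The "follow the edges until we reach something that has changed" rule of this
basic algorithm can be modeled as a plain reachability check over the recorded
graph. The following is only an illustrative sketch -- the graph representation
and the function name are made up, not the compiler's actual data structures:

```rust
use std::collections::{HashMap, HashSet};

// `deps` maps each query node to the nodes it read while computing its
// result (its outgoing edges in the dependency graph).
fn needs_recompute(
    node: &str,
    deps: &HashMap<&str, Vec<&str>>,
    changed_inputs: &HashSet<&str>,
) -> bool {
    // A node must be recomputed if it is a changed input itself or if
    // anything it depends on (transitively) has changed.
    if changed_inputs.contains(node) {
        return true;
    }
    deps.get(node)
        .map(|ds| ds.iter().any(|d| needs_recompute(d, deps, changed_inputs)))
        .unwrap_or(false)
}

fn main() {
    // The example graph from the text: edges point from a query to the
    // data it read.
    let mut deps: HashMap<&str, Vec<&str>> = HashMap::new();
    deps.insert("type_of(foo)", vec!["Hir(foo)"]);
    deps.insert("type_of(bar)", vec!["Hir(bar)"]);
    deps.insert("type_check_item(foo)", vec!["type_of(foo)", "type_of(bar)"]);
    deps.insert("type_check_item(bar)", vec!["type_of(bar)"]);

    // Only `Hir(bar)` changed.
    let changed: HashSet<&str> = ["Hir(bar)"].iter().copied().collect();

    // `type_of(foo)` only reads `Hir(foo)`, so its cached result is reusable.
    assert!(!needs_recompute("type_of(foo)", &deps, &changed));
    // `type_check_item(foo)` transitively reads `Hir(bar)`, so it is re-run.
    assert!(needs_recompute("type_check_item(foo)", &deps, &changed));
}
```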

# The Problem With The Basic Algorithm: False Positives

If you read the previous paragraph carefully, you'll notice that it says that
`type_of(bar)` *might* have changed because one of its inputs has changed.
There's also the possibility that it might still yield exactly the same
result *even though* its input has changed. Consider an example with a
simple query that just computes the sign of an integer:

```ignore
IntValue(x) <---- sign_of(x) <--- some_other_query(x)
```

Let's say that `IntValue(x)` starts out as `1000` and then is set to `2000`.
Even though `IntValue(x)` is different in the two cases, `sign_of(x)` yields
the result `+` in both cases.

If we follow the basic algorithm, however, `some_other_query(x)` would have to
(unnecessarily) be re-evaluated because it transitively depends on a changed
input. Change detection yields a "false positive" in this case because it has
to conservatively assume that `some_other_query(x)` might be affected by that
changed input.

Unfortunately it turns out that the actual queries in the compiler are full
of examples like this, and small changes to the input often potentially affect
very large parts of the output binaries. As a consequence, we had to make the
change detection system smarter and more accurate.
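The situation can be made concrete with a tiny sketch. The `sign_of` provider
below is hypothetical, chosen only to mirror the diagram above:

```rust
// Hypothetical `sign_of` provider: the result depends only on the sign
// of `IntValue(x)`, not on its magnitude.
fn sign_of(x: i64) -> char {
    if x >= 0 { '+' } else { '-' }
}

fn main() {
    // `IntValue(x)` changes from 1000 to 2000 ...
    // ... yet `sign_of(x)` yields `+` in both cases:
    assert_eq!(sign_of(1000), '+');
    assert_eq!(sign_of(2000), '+');
    assert_eq!(sign_of(1000), sign_of(2000));

    // The basic algorithm never compares results; it only sees that the
    // `IntValue(x)` input changed, so `some_other_query(x)` -- which
    // depends on `sign_of(x)` -- is re-evaluated even though nothing it
    // can observe has changed: a false positive.
}
```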

# Improving Accuracy: The red-green Algorithm

The "false positives" problem can be solved by interleaving change detection
and query re-evaluation. Instead of walking the graph all the way to the
inputs when trying to find out if some cached result is still valid, we can
check if a result has *actually* changed after we were forced to re-evaluate
it.

We call this algorithm, for better or worse, the red-green algorithm because
nodes in the dependency graph are assigned the color green if we were able to
prove that their cached result is still valid and the color red if the result
has turned out to be different after re-evaluating it.

The meat of red-green change tracking is implemented in the try-mark-green
algorithm that, you've guessed it, tries to mark a given node as green:

```rust,ignore
fn try_mark_green(tcx, current_node) -> bool {

    // Fetch the inputs to `current_node`, i.e. get the nodes that the direct
    // edges from `node` lead to.
    let dependencies = tcx.dep_graph.get_dependencies_of(current_node);

    // Now check all the inputs for changes
    for dependency in dependencies {

        match tcx.dep_graph.get_node_color(dependency) {
            Green => {
                // This input has already been checked before and it has not
                // changed; so we can go on to check the next one
            }
            Red => {
                // We found an input that has changed. We cannot mark
                // `current_node` as green without re-running the
                // corresponding query.
                return false
            }
            Unknown => {
                // This is the first time we look at this node. Let's try
                // to mark it green by calling try_mark_green() recursively.
                if try_mark_green(tcx, dependency) {
                    // We successfully marked the input as green, on to the
                    // next.
                } else {
                    // We could *not* mark the input as green. This means we
                    // don't know if its value has changed. In order to find
                    // out, we re-run the corresponding query now!
                    tcx.run_query_for(dependency);

                    // Fetch and check the node color again. Running the query
                    // has forced it to either red (if it yielded a different
                    // result than we have in the cache) or green (if it
                    // yielded the same result).
                    match tcx.dep_graph.get_node_color(dependency) {
                        Red => {
                            // The input turned out to be red, so we cannot
                            // mark `current_node` as green.
                            return false
                        }
                        Green => {
                            // Re-running the query paid off! The result is the
                            // same as before, so this particular input does
                            // not invalidate `current_node`.
                        }
                        Unknown => {
                            // There is no way a node has no color after
                            // re-running the query.
                            panic!("unreachable")
                        }
                    }
                }
            }
        }
    }

    // If we have gotten through the entire loop, it means that all inputs
    // have turned out to be green. If all inputs are unchanged, it means
    // that the query result corresponding to `current_node` cannot have
    // changed either.
    tcx.dep_graph.mark_green(current_node);

    true
}

// Note: The actual implementation can be found in
// src/librustc/dep_graph/graph.rs
```

By using red-green marking we can avoid the devastating cumulative effect of
having false positives during change detection. Whenever a query is executed
in incremental mode, we first check if it's already green. If not, we run
`try_mark_green()` on it. If it still isn't green after that, then we actually
invoke the query provider to re-compute the result.
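The pseudocode above can be turned into a tiny, self-contained model. The
`Engine` type, its node names, the toy providers, and the way inputs are
pre-colored are all invented for illustration; the sketch only shows how
re-running one query can rescue a node from a false positive:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Color { Red, Green, Unknown }

struct Engine {
    deps: HashMap<&'static str, Vec<&'static str>>, // node -> its inputs
    prev_results: HashMap<&'static str, i64>,       // previous session's cache
    colors: HashMap<&'static str, Color>,
}

impl Engine {
    // Toy "providers" for the current session, where IntValue(x) is 2000.
    fn compute(&self, node: &'static str) -> i64 {
        match node {
            "IntValue(x)" => 2000,
            "sign_of(x)" => if self.compute("IntValue(x)") >= 0 { 1 } else { -1 },
            "some_other_query(x)" => self.compute("sign_of(x)") * 10,
            _ => unreachable!(),
        }
    }

    // Re-run the query and compare against the cached result, coloring
    // the node red or green accordingly.
    fn run_query(&mut self, node: &'static str) {
        let new_result = self.compute(node);
        let color = if self.prev_results.get(node) == Some(&new_result) {
            Color::Green
        } else {
            Color::Red
        };
        self.prev_results.insert(node, new_result);
        self.colors.insert(node, color);
    }

    fn try_mark_green(&mut self, node: &'static str) -> bool {
        let deps = self.deps.get(node).cloned().unwrap_or_default();
        for dep in deps {
            match self.colors.get(dep).copied().unwrap_or(Color::Unknown) {
                Color::Green => {}              // unchanged, check the next one
                Color::Red => return false,     // a changed input: give up
                Color::Unknown => {
                    if !self.try_mark_green(dep) {
                        // Could not prove it unchanged: re-run and re-check.
                        self.run_query(dep);
                        if self.colors[dep] == Color::Red {
                            return false;
                        }
                    }
                }
            }
        }
        self.colors.insert(node, Color::Green);
        true
    }
}

fn demo() -> bool {
    let mut e = Engine {
        deps: HashMap::from([
            ("sign_of(x)", vec!["IntValue(x)"]),
            ("some_other_query(x)", vec!["sign_of(x)"]),
        ]),
        // Results from the previous session, where IntValue(x) was 1000.
        prev_results: HashMap::from([("sign_of(x)", 1), ("some_other_query(x)", 10)]),
        // Inputs are colored eagerly: IntValue(x) changed, so it is red.
        colors: HashMap::from([("IntValue(x)", Color::Red)]),
    };
    // Re-running `sign_of` proves its result unchanged, so the dependent
    // query is marked green: no false positive.
    e.try_mark_green("some_other_query(x)") && e.colors["sign_of(x)"] == Color::Green
}

fn main() {
    assert!(demo());
}
```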



# The Real World: How Persistence Makes Everything Complicated

The sections above described the underlying algorithm for incremental
compilation but, because the compiler process exits after being finished and
takes the query context with its result cache with it into oblivion, we have
to persist data to disk so the next compilation session can make use of it.
This comes with a whole new set of implementation challenges:

- The query result cache is stored to disk, so the results are not readily
  available for change comparison.
- A subsequent compilation session will start off with a new version of the
  code that has arbitrary changes applied to it. All kinds of IDs and indices
  that are generated from a global, sequential counter (e.g. `NodeId`,
  `DefId`, etc.) might have shifted, making the persisted results on disk not
  immediately usable anymore because the same numeric IDs and indices might
  refer to completely new things in the new compilation session.
- Persisting things to disk comes at a cost, so not every tiny piece of
  information should actually be cached in between compilation sessions.
  Fixed-size, plain-old-data is preferred to complex things that need to run
  branching code during (de-)serialization.

The following sections describe how the compiler currently solves these issues.

## A Question Of Stability: Bridging The Gap Between Compilation Sessions

As noted before, various IDs (like `DefId`) are generated by the compiler in a
way that depends on the contents of the source code being compiled. ID assignment
is usually deterministic, that is, if the exact same code is compiled twice,
the same things will end up with the same IDs. However, if something
changes, e.g. a function is added in the middle of a file, there is no
guarantee that anything will have the same ID as it had before.

As a consequence we cannot represent the data in our on-disk cache the same
way it is represented in memory. For example, if we just stored a piece
of type information like `TyKind::FnDef(DefId, &'tcx Substs<'tcx>)` (as we do
in memory) and then the contained `DefId` points to a different function in
a new compilation session, we'd be in trouble.

The solution to this problem is to find "stable" forms for IDs which remain
valid in between compilation sessions. For the most important case, `DefId`s,
these are the so-called `DefPath`s. Each `DefId` has a
corresponding `DefPath` but in place of a numeric ID, a `DefPath` is based on
the path to the identified item, e.g. `std::collections::HashMap`. The
advantage of an ID like this is that it is not affected by unrelated changes.
For example, one can add a new function to `std::collections` but
`std::collections::HashMap` would still be `std::collections::HashMap`. A
`DefPath` is "stable" across changes made to the source code while a `DefId`
isn't.

There is also the `DefPathHash` which is just a 128-bit hash value of the
`DefPath`. The two contain the same information and we mostly use the
`DefPathHash` because it is simpler to handle, being `Copy` and self-contained.

This principle of stable identifiers is used to make the data in the on-disk
cache resilient to source code changes. Instead of storing a `DefId`, we store
the `DefPathHash` and when we deserialize something from the cache, we map the
`DefPathHash` to the corresponding `DefId` in the *current* compilation session
(which is just a simple hash table lookup).
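A minimal sketch of that remapping step follows. The two types are invented
stand-ins, and std's 64-bit `DefaultHasher` stands in for the compiler's
128-bit hash; only the lookup scheme is the point:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Illustrative stand-ins for the real compiler types.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct DefPathHash(u64);
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct DefId(u32);

// Hash the *stable* path, never the session-local numeric ID.
fn def_path_hash(def_path: &str) -> DefPathHash {
    let mut h = DefaultHasher::new();
    def_path.hash(&mut h);
    DefPathHash(h.finish())
}

fn main() {
    // Previous session: the on-disk cache stored the stable DefPathHash
    // of `std::collections::HashMap` instead of its numeric DefId.
    let cached = def_path_hash("std::collections::HashMap");

    // Current session: a function was added somewhere, so numeric IDs
    // shifted and the same item now happens to be DefId(9).
    let current: HashMap<DefPathHash, DefId> =
        HashMap::from([(def_path_hash("std::collections::HashMap"), DefId(9))]);

    // Deserializing from the cache is just a hash table lookup:
    assert_eq!(current[&cached], DefId(9));
}
```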

The `HirId`, used for identifying HIR components that don't have their own
`DefId`, is another such stable ID. It is (conceptually) a pair of a `DefPath`
and a `LocalId`, where the `LocalId` identifies something (e.g. a `hir::Expr`)
locally within its "owner" (e.g. a `hir::Item`). If the owner is moved around,
the `LocalId`s within it are still the same.



## Checking Query Results For Changes: StableHash And Fingerprints

In order to do red-green-marking we often need to check if the result of a
query has changed compared to the result it had during the previous
compilation session. There are two performance problems with this though:

- We'd like to avoid having to load the previous result from disk just for
  doing the comparison. We already computed the new result and will use that.
  Also, loading a result from disk will "pollute" the interners with data that
  is unlikely to ever be used.
- We don't want to store each and every result in the on-disk cache. For
  example, it would be wasted effort to persist things to disk that are
  already available in upstream crates.

The compiler avoids these problems by using so-called `Fingerprint`s. Each time
a new query result is computed, the query engine will compute a 128 bit hash
value of the result. We call this hash value "the `Fingerprint` of the query
result". The hashing is (and has to be) done "in a stable way". This means
that whenever something is hashed that might change in between compilation
sessions (e.g. a `DefId`), we instead hash its stable equivalent
(e.g. the corresponding `DefPath`). That's what the whole `StableHash`
infrastructure is for. This way `Fingerprint`s computed in two
different compilation sessions are still comparable.

The next step is to store these fingerprints along with the dependency graph.
This is cheap since fingerprints are just bytes to be copied. It's also cheap to
load the entire set of fingerprints together with the dependency graph.

Now, when red-green-marking reaches the point where it needs to check if a
result has changed, it can just compare the (already loaded) previous
fingerprint to the fingerprint of the new result.

This approach works rather well but it's not without flaws:

- There is a small possibility of hash collisions. That is, two different
  results could have the same fingerprint and the system would erroneously
  assume that the result hasn't changed, leading to a missed update.

  We mitigate this risk by using a high-quality hash function and a 128 bit
  wide hash value. Due to these measures the practical risk of a hash
  collision is negligible.

- Computing fingerprints is quite costly. It is the main reason why incremental
  compilation can be slower than non-incremental compilation. We are forced to
  use a good and thus expensive hash function, and we have to map things to
  their stable equivalents while doing the hashing.

In the future we might want to explore different approaches to this problem.
For now it's `StableHash` and `Fingerprint`.
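The "hash the stable equivalent" idea can be sketched as follows. The result
type is invented and std's 64-bit `DefaultHasher` again stands in for the
128-bit `Fingerprint`; the point is only which field gets hashed:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// A toy query result that mentions another definition. In memory it
// holds a session-local numeric id; for fingerprinting we hash the
// stable path instead.
struct FnDefResult {
    callee_def_id: u32,      // session-local, NOT stable
    callee_def_path: String, // stable across sessions
}

fn fingerprint(result: &FnDefResult) -> u64 {
    let mut h = DefaultHasher::new();
    // Hash the stable equivalent of the DefId, never the raw number.
    result.callee_def_path.hash(&mut h);
    h.finish()
}

fn main() {
    // The same logical result in two sessions; only the numeric id shifted.
    let previous = FnDefResult { callee_def_id: 7, callee_def_path: "std::mem::swap".into() };
    let current = FnDefResult { callee_def_id: 9, callee_def_path: "std::mem::swap".into() };

    // The fingerprints still compare equal, so red-green marking can
    // conclude the result is unchanged.
    assert_eq!(fingerprint(&previous), fingerprint(&current));
    let _ = (previous.callee_def_id, current.callee_def_id);
}
```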



## A Tale Of Two DepGraphs: The Old And The New

The initial description of dependency tracking glosses over a few details
that quickly become a head scratcher when actually trying to implement things.
In particular it's easy to overlook that we are actually dealing with *two*
dependency graphs: the one we built during the previous compilation session and
the one that we are building for the current compilation session.

When a compilation session starts, the compiler loads the previous dependency
graph into memory as an immutable piece of data. Then, when a query is invoked,
it will first try to mark the corresponding node in the graph as green. This
really means that we are trying to mark the node in the *previous* dep-graph
that corresponds to the query key in the *current* session as green. How do we
do this mapping between current query key and previous `DepNode`? The answer
is again `Fingerprint`s: nodes in the dependency graph are identified by a
fingerprint of the query key. Since fingerprints are stable across compilation
sessions, computing one in the current session allows us to find a node
in the dependency graph from the previous session. If we don't find a node with
the given fingerprint, it means that the query key refers to something that
did not yet exist in the previous session.

So, having found the dep-node in the previous dependency graph, we can look
up its dependencies (also dep-nodes in the previous graph) and continue with
the rest of the try-mark-green algorithm. The next interesting thing happens
when we successfully mark a node as green. At that point we copy the node
and the edges to its dependencies from the old graph into the new graph. We
have to do this because the new dep-graph cannot acquire the
node and edges via the regular dependency tracking. The tracking system can
only record edges while actually running a query -- but running the query,
although we have the result already cached, is exactly what we want to avoid.

Once the compilation session has finished, all the unchanged parts have been
copied over from the old into the new dependency graph, while the changed parts
have been added to the new graph by the tracking system. At this point, the
new graph is serialized out to disk, alongside the query result cache, and can
act as the previous dep-graph in a subsequent compilation session.
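The copying step for green nodes can be sketched like this. The graph type,
function name, and node names are invented for illustration; only the
"promote from old to new" idea is taken from the text:

```rust
use std::collections::{HashMap, HashSet};

// A toy dep-graph: each node maps to the nodes its edges lead to.
struct DepGraph {
    edges: HashMap<&'static str, Vec<&'static str>>,
}

// When a node is marked green, the tracking system never re-ran its
// query, so the new graph can only get the node and its edges by
// copying them over from the (immutable) previous graph.
fn promote_green(
    prev: &DepGraph,
    current: &mut DepGraph,
    green_nodes: &HashSet<&'static str>,
) {
    for &node in green_nodes {
        if let Some(deps) = prev.edges.get(node) {
            current.edges.insert(node, deps.clone());
        }
    }
}

fn main() {
    let prev = DepGraph {
        edges: HashMap::from([
            ("type_of(foo)", vec!["Hir(foo)"]),
            ("type_of(bar)", vec!["Hir(bar)"]),
        ]),
    };
    let mut current = DepGraph { edges: HashMap::new() };

    // Only `type_of(foo)` could be marked green this session.
    let green: HashSet<&'static str> = ["type_of(foo)"].into_iter().collect();
    promote_green(&prev, &mut current, &green);

    // The green node was copied; the red one will be re-recorded by the
    // tracking system when its query actually runs.
    assert!(current.edges.contains_key("type_of(foo)"));
    assert!(!current.edges.contains_key("type_of(bar)"));
}
```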


## Didn't You Forget Something?: Cache Promotion

TODO


# The Future: Shortcomings Of The Current System and Possible Solutions

TODO


[query-model]: ./query-evaluation-model-in-detail.html
@@ -0,0 +1,237 @@

# The Query Evaluation Model in Detail

This chapter provides a deeper dive into the abstract model queries are built on.
It does not go into implementation details but tries to explain
the underlying logic. The examples here, therefore, have been stripped down and
simplified and don't directly reflect the compiler's internal APIs.

## What is a query?

Abstractly we view the compiler's knowledge about a given crate as a "database"
and queries are the way of asking the compiler questions about it, i.e.
we "query" the compiler's "database" for facts.

However, there's something special about this compiler database: it starts out
empty and is filled on-demand when queries are executed. Consequently, a query
must know how to compute its result if the database does not contain it yet. For
doing so, it can access other queries and certain input values that the database
is pre-filled with on creation.

A query thus consists of the following things:

- A name that identifies the query
- A "key" that specifies what we want to look up
- A result type that specifies what kind of result it yields
- A "provider" which is a function that specifies how the result is to be
  computed if it isn't already present in the database.

As an example, the name of the `type_of` query is `type_of`, its query key is a
`DefId` identifying the item we want to know the type of, the result type is
`Ty<'tcx>`, and the provider is a function that, given the query key and access
to the rest of the database, can compute the type of the item identified by the
key.

So in some sense a query is just a function that maps the query key to the
corresponding result. However, we have to apply some restrictions in order for
this to be sound:

- The key and result must be immutable values.
- The provider function must be a pure function, that is, for the same key it
  must always yield the same result.
- The only parameters a provider function takes are the key and a reference to
  the "query context" (which provides access to the rest of the "database").

The database is built up lazily by invoking queries. The query providers will
invoke other queries, for which the result is either already cached or computed
by calling another query provider. These query provider invocations
conceptually form a directed acyclic graph (DAG) at the leaves of which are
input values that are already known when the query context is created.
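A toy version of such a context, for a single made-up query `length_of`, might
look like this. All names are invented and the real query system is generated
by compiler macros; the sketch only shows the key/provider/cache structure:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// A toy query context: inputs are pre-filled at creation, results are
// memoized in an internal table.
struct QueryContext {
    inputs: HashMap<String, String>,        // pre-filled "input" data
    cache: RefCell<HashMap<String, usize>>, // memoized query results
    provider_runs: RefCell<usize>,          // for demonstration only
}

impl QueryContext {
    // The `length_of` query: key is an item name, result is a length.
    fn length_of(&self, key: &str) -> usize {
        // Return the memoized result if the database already contains it.
        if let Some(&len) = self.cache.borrow().get(key) {
            return len;
        }
        // Otherwise run the provider: a pure function of key + context.
        *self.provider_runs.borrow_mut() += 1;
        let result = self.inputs[key].len();
        self.cache.borrow_mut().insert(key.to_string(), result);
        result
    }
}

fn main() {
    let tcx = QueryContext {
        inputs: HashMap::from([("foo".to_string(), "fn foo() {}".to_string())]),
        cache: RefCell::new(HashMap::new()),
        provider_runs: RefCell::new(0),
    };

    // The first invocation computes; the second is served from the cache.
    assert_eq!(tcx.length_of("foo"), 11);
    assert_eq!(tcx.length_of("foo"), 11);
    assert_eq!(*tcx.provider_runs.borrow(), 1);
}
```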



## Caching/Memoization

Results of query invocations are "memoized" which means that the query context
will cache the result in an internal table and, when the query is invoked with
the same query key again, will return the result from the cache instead of
running the provider again.

This caching is crucial for making the query engine efficient. Without
memoization the system would still be sound (that is, it would yield the same
results) but the same computations would be done over and over again.

Memoization is one of the main reasons why query providers have to be pure
functions. If calling a provider function could yield different results for
each invocation (because it accesses some global mutable state) then we could
not memoize the result.



## Input data

When the query context is created, it is still empty: no queries have been
executed, no results are cached. But the context already provides access to
"input" data, i.e. pieces of immutable data that were computed before the
context was created and that queries can access to do their computations.
Currently this input data consists mainly of the HIR map and the command-line
options the compiler was invoked with. In the future, inputs will just consist
of command-line options and a list of source files -- the HIR map will itself
be provided by a query which processes these source files.

Without inputs, queries would live in a void without anything to compute their
result from (remember, query providers only have access to other queries and
the context but not any other outside state or information).

For a query provider, input data and results of other queries look exactly the
same: it just tells the context "give me the value of X". Because input data
is immutable, the provider can rely on it being the same across
different query invocations, just as is the case for query results.



## An example execution trace of some queries

How does this DAG of query invocations come into existence? At some point
the compiler driver will create the, as yet empty, query context. It will then,
from outside of the query system, invoke the queries it needs to perform its
task. This looks something like the following:

```rust,ignore
fn compile_crate() {
    let cli_options = ...;
    let hir_map = ...;

    // Create the query context `tcx`
    let tcx = TyCtxt::new(cli_options, hir_map);

    // Do type checking by invoking the type check query
    tcx.type_check_crate();
}
```

The `type_check_crate` query provider would look something like the following:

```rust,ignore
fn type_check_crate_provider(tcx, _key: ()) {
    let list_of_hir_items = tcx.hir_map.list_of_items();

    for item_def_id in list_of_hir_items {
        tcx.type_check_item(item_def_id);
    }
}
```

We see that the `type_check_crate` query accesses input data
(`tcx.hir_map.list_of_items()`) and invokes other queries
(`type_check_item`). The `type_check_item`
invocations will themselves access input data and/or invoke other queries,
so that in the end the DAG of query invocations will be built up backwards
from the node that was initially executed:

```ignore
         (2)                                                  (1)
  list_of_all_hir_items <----------------------------- type_check_crate()
                                                               |
     (5)            (4)                   (3)                  |
  Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
                                        |                      |
                      +-----------------+                      |
                      |                                        |
     (7)              v  (6)              (8)                  |
  Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+

  // (x) denotes invocation order
```

We also see that often a query result can be read from the cache:
`type_of(bar)` was computed for `type_check_item(foo)` so when
`type_check_item(bar)` needs it, it is already in the cache.

Query results stay cached in the query context as long as the context lives.
So if the compiler driver invoked another query later on, the above graph
would still exist and already executed queries would not have to be re-done.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
## Cycles
|
||||||
|
|
||||||
|
Earlier we stated that query invocations form a DAG. However, it would be easy
|
||||||
|
form a cyclic graph by, for example, having a query provider like the following:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
fn cyclic_query_provider(tcx, key) -> u32 {
|
||||||
|
// Invoke the same query with the same key again
|
||||||
|
tcx.cyclic_query(key)
|
||||||
|
}
|
||||||
|
```

Since query providers are regular functions, this would behave much as expected:
Evaluation would get stuck in an infinite recursion. A query like this would not
be very useful either. However, sometimes certain kinds of invalid user input
can result in queries being called in a cyclic way. The query engine includes
a check for cyclic invocations and, because cycles are an irrecoverable error,
will abort execution with a "cycle error" message that tries to be
human-readable.

At some point the compiler had a notion of "cycle recovery", that is, one could
"try" to execute a query and if it ended up causing a cycle, proceed in some
other fashion. However, this was later removed because it is not entirely
clear what the theoretical consequences of this are, especially regarding
incremental compilation.

## "Steal" Queries

Some queries have their result wrapped in a `Steal<T>` struct. These queries
behave exactly the same as regular queries with one exception: Their result is
expected to be "stolen" out of the cache at some point, meaning some other
part of the program is taking ownership of it and the result cannot be
accessed anymore.

This stealing mechanism exists purely as a performance optimization because some
|
||||||
|
result values are too costly to clone (e.g. the MIR of a function). It seems
|
||||||
|
like result stealing would violate the condition that query results must be
|
||||||
|
immutable (after all we are moving the result value out of the cache) but it is
|
||||||
|
OK as long as the mutation is not observable. This is achieved by two things:
|
||||||
|
|
||||||
|
- Before a result is stolen, we make sure to eagerly run all queries that
  might ever need to read that result. This has to be done manually by calling
  those queries.
- Whenever a query tries to access a stolen result, we make the compiler ICE so
  that such a condition cannot go unnoticed.

This is not an ideal setup because of the manual intervention needed, so it
should be used sparingly and only when it is well known which queries might
access a given result. In practice, however, stealing has not turned out to be
much of a maintenance burden.

To summarize: "Steal queries" break some of the rules in a controlled way.
There are checks in place that make sure that nothing can go silently wrong.

## Parallel Query Execution

The query model has some properties that make it actually feasible to evaluate
multiple queries in parallel without too much effort:

- All data a query provider can access is accessed via the query context, so
  the query context can take care of synchronizing access.
- Query results are required to be immutable so they can safely be used by
  different threads concurrently.

The nightly compiler already implements parallel query evaluation as follows:

When a query `foo` is evaluated, the cache table for `foo` is locked.

- If there already is a result, we can clone it, release the lock and
  we are done.
- If there is no cache entry and no other active query invocation computing the
  same result, we mark the key as being "in progress", release the lock and
  start evaluating.
- If there *is* another query invocation for the same key in progress, we
  release the lock, and just block the thread until the other invocation has
  computed the result we are waiting for. This cannot deadlock because, as
  mentioned before, query invocations form a DAG. Some thread will always make
  progress.

src/query.md

@@ -35,6 +35,12 @@ will in turn demand information about that crate, starting from the
 However, that vision is not fully realized. Still, big chunks of the
 compiler (for example, generating MIR) work exactly like this.

+### The Query Evaluation Model in Detail
+
+The [Query Evaluation Model in Detail][query-model] chapter gives a more
+in-depth description of what queries are and how they work.
+If you intend to write a query of your own, this is a good read.
+
 ### Invoking queries

 To invoke a query is simple. The tcx ("type context") offers a method

@@ -45,60 +51,6 @@ query, you would just do this:
 let ty = tcx.type_of(some_def_id);
 ```

-### Cycles between queries
-
-A cycle is when a query becomes stuck in a loop e.g. query A generates query B
-which generates query A again.
-
-Currently, cycles during query execution should always result in a
-compilation error. Typically, they arise because of illegal programs
-that contain cyclic references they shouldn't (though sometimes they
-arise because of compiler bugs, in which case we need to factor our
-queries in a more fine-grained fashion to avoid them).
-
-However, it is nonetheless often useful to *recover* from a cycle
-(after reporting an error, say) and try to soldier on, so as to give a
-better user experience. In order to recover from a cycle, you don't
-get to use the nice method-call-style syntax. Instead, you invoke
-using the `try_get` method, which looks roughly like this:
-
-```rust,ignore
-use ty::queries;
-...
-match queries::type_of::try_get(tcx, DUMMY_SP, self.did) {
-  Ok(result) => {
-    // no cycle occurred! You can use `result`
-  }
-  Err(err) => {
-    // A cycle occurred! The error value `err` is a `DiagnosticBuilder`,
-    // meaning essentially an "in-progress", not-yet-reported error message.
-    // See below for more details on what to do here.
-  }
-}
-```
-
-So, if you get back an `Err` from `try_get`, then a cycle *did* occur. This
-means that you must ensure that a compiler error message is reported. You can
-do that in two ways:
-
-The simplest is to invoke `err.emit()`. This will emit the cycle error to the
-user.
-
-However, often cycles happen because of an illegal program, and you
-know at that point that an error either already has been reported or
-will be reported due to this cycle by some other bit of code. In that
-case, you can invoke `err.cancel()` to not emit any error. It is
-traditional to then invoke:
-
-```rust,ignore
-tcx.sess.delay_span_bug(some_span, "some message")
-```
-
-`delay_span_bug()` is a helper that says: we expect a compilation
-error to have happened or to happen in the future; so, if compilation
-ultimately succeeds, make an ICE with the message `"some
-message"`. This is basically just a precaution in case you are wrong.
-
 ### How the compiler executes a query

 So you may be wondering what happens when you invoke a query

@@ -315,3 +267,4 @@ impl<'tcx> QueryDescription for queries::type_of<'tcx> {
 }
 ```

+[query-model]: queries/query-evaluation-model-in-detail.html

@@ -139,7 +139,7 @@ crate (through `crate_variances`), but since most changes will not result in a
 change to the actual results from variance inference, the `variances_of` query
 will wind up being considered green after it is re-evaluated.

-[rga]: ./incremental-compilation.html
+[rga]: ./queries/incremental-compilation.html

 <a name="addendum"></a>