From 4ebe82a278021767747c2a18ae977419c02a9e02 Mon Sep 17 00:00:00 2001
From: Timothy Maloney <tmaloney@pdx.edu>
Date: Mon, 6 Sep 2021 13:11:09 -0700
Subject: [PATCH] Docs: consolidated parallelism information

---
 src/overview.md                               |  7 +--
 src/parallel-rustc.md                         | 61 ++++++++++++++-----
 .../query-evaluation-model-in-detail.md       | 26 --------
 3 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/src/overview.md b/src/overview.md
index 2e4c6955..26b291d3 100644
--- a/src/overview.md
+++ b/src/overview.md
@@ -297,12 +297,7 @@ Compiler performance is a problem that we would like to improve on
 (and are always working on). One aspect of that is parallelizing
 `rustc` itself.
 
-Currently, there is only one part of rustc that is already parallel: codegen.
-During monomorphization, the compiler will split up all the code to be
-generated into smaller chunks called _codegen units_. These are then generated
-by independent instances of LLVM. Since they are independent, we can run them
-in parallel. At the end, the linker is run to combine all the codegen units
-together into one binary.
+Currently, only one part of `rustc` is parallel by default: codegen.
 
 However, the rest of the compiler is still not yet parallel. There have been
 lots of efforts spent on this, but it is generally a hard problem. The current
diff --git a/src/parallel-rustc.md b/src/parallel-rustc.md
index eec8219a..243dca98 100644
--- a/src/parallel-rustc.md
+++ b/src/parallel-rustc.md
@@ -1,25 +1,55 @@
 # Parallel Compilation
 
-Most of the compiler is not parallel. This represents an opportunity for
-improving compiler performance.
+As of <!-- date: 2021-09 --> September 2021, the only stage of the compiler
+that is already parallel is codegen. The nightly compiler implements parallel
+query evaluation, but a lot of correctness work still needs to be done. The lack of
+parallelism at other stages also represents an opportunity for improving compiler
+performance. One can try out the current parallel compiler work by enabling it in the `config.toml`.
+
+The next few sections describe where and how parallelism is currently used,
+and the current status of making parallel compilation the default in `rustc`.
+
+The underlying thread-safe data structures used in the parallel compiler
+can be found in `rustc_data_structures/sync.rs`. Some of these data structures
+use the `parking_lot` API.
+
+## Codegen
+
+During [monomorphization][monomorphization], the compiler splits up all the code to
+be generated into smaller chunks called _codegen units_. These are then generated by
+independent instances of LLVM running in parallel. At the end, the linker
+is run to combine all the codegen units together into one binary.
+
+## Query System
+
+The query model has some properties that make it feasible to evaluate
+multiple queries in parallel without too much effort:
+
+- All data a query provider can access is accessed via the query context, so
+  the query context can take care of synchronizing access.
+- Query results are required to be immutable so they can safely be used by
+  different threads concurrently.
+
+
+When a query `foo` is evaluated, the cache table for `foo` is locked.
+
+- If there already is a result, we can clone it, release the lock and
+  we are done.
+- If there is no cache entry and no other active query invocation computing the
+  same result, we mark the key as being "in progress", release the lock and
+  start evaluating.
+- If there *is* another query invocation for the same key in progress, we
+  release the lock, and just block the thread until the other invocation has
+  computed the result we are waiting for. This cannot deadlock because
+  query invocations form a DAG (directed acyclic graph), so some thread will
+  always make progress.
+
+## Current Status
 
 As of <!-- date: 2021-07 --> July 2021, work on explicitly parallelizing the
 compiler has stalled. There is a lot of design and correctness work that needs
 to be done.
 
-One can try out the current parallel compiler work by enabling it in the
-`config.toml`.
-
-There are a few basic ideas in this effort:
-
-- There are a lot of loops in the compiler that just iterate over all items in
-  a crate. These can possibly be parallelized.
-- We can use (a custom fork of) [`rayon`] to run tasks in parallel. The custom
-  fork allows the execution of DAGs of tasks, not just trees.
-- There are currently a lot of global data structures that need to be made
-  thread-safe. A key strategy here has been converting interior-mutable
-  data-structures (e.g. `Cell`) into their thread-safe siblings (e.g. `Mutex`).
-
 [`rayon`]: https://crates.io/crates/rayon
 
 As of <!-- date: 2021-02 --> February 2021, much of this effort is on hold due
@@ -45,3 +75,4 @@ are a bit out of date):
 [imlist]: https://github.com/nikomatsakis/rustc-parallelization/blob/master/interior-mutability-list.md
 [irlo1]: https://internals.rust-lang.org/t/help-test-parallel-rustc/11503
 [tracking]: https://github.com/rust-lang/rust/issues/48685
+[monomorphization]: https://rustc-dev-guide.rust-lang.org/backend/monomorph.html
diff --git a/src/queries/query-evaluation-model-in-detail.md b/src/queries/query-evaluation-model-in-detail.md
index 4c2427e3..b84a5dac 100644
--- a/src/queries/query-evaluation-model-in-detail.md
+++ b/src/queries/query-evaluation-model-in-detail.md
@@ -211,29 +211,3 @@ much of a maintenance burden.
 
 To summarize: "Steal queries" break some of the rules in a controlled way.
 There are checks in place that make sure that nothing can go silently wrong.
-
-
-## Parallel Query Execution
-
-The query model has some properties that make it actually feasible to evaluate
-multiple queries in parallel without too much of an effort:
-
-- All data a query provider can access is accessed via the query context, so
-  the query context can take care of synchronizing access.
-- Query results are required to be immutable so they can safely be used by
-  different threads concurrently.
-
-The nightly compiler already implements parallel query evaluation as follows:
-
-When a query `foo` is evaluated, the cache table for `foo` is locked.
-
-- If there already is a result, we can clone it, release the lock and
-  we are done.
-- If there is no cache entry and no other active query invocation computing the
-  same result, we mark the key as being "in progress", release the lock and
-  start evaluating.
-- If there *is* another query invocation for the same key in progress, we
-  release the lock, and just block the thread until the other invocation has
-  computed the result we are waiting for. This cannot deadlock because, as
-  mentioned before, query invocations form a DAG. Some thread will always make
-  progress.