From 4ebe82a278021767747c2a18ae977419c02a9e02 Mon Sep 17 00:00:00 2001
From: Timothy Maloney
Date: Mon, 6 Sep 2021 13:11:09 -0700
Subject: [PATCH] Docs: consolidated parallelism information

---
 src/overview.md                               |  7 +--
 src/parallel-rustc.md                         | 61 ++++++++++++++-----
 .../query-evaluation-model-in-detail.md       | 26 --------
 3 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/src/overview.md b/src/overview.md
index 2e4c6955..26b291d3 100644
--- a/src/overview.md
+++ b/src/overview.md
@@ -297,12 +297,7 @@ Compiler performance is a problem that we would like to improve on
 (and are always working on). One aspect of that is parallelizing
 `rustc` itself.
 
-Currently, there is only one part of rustc that is already parallel: codegen.
-During monomorphization, the compiler will split up all the code to be
-generated into smaller chunks called _codegen units_. These are then generated
-by independent instances of LLVM. Since they are independent, we can run them
-in parallel. At the end, the linker is run to combine all the codegen units
-together into one binary.
+Currently, there is only one part of rustc that is parallel by default: codegen.
 
 However, the rest of the compiler is still not yet parallel. There have been
 lots of efforts spent on this, but it is generally a hard problem. The current
diff --git a/src/parallel-rustc.md b/src/parallel-rustc.md
index eec8219a..243dca98 100644
--- a/src/parallel-rustc.md
+++ b/src/parallel-rustc.md
@@ -1,25 +1,55 @@
 # Parallel Compilation
 
-Most of the compiler is not parallel. This represents an opportunity for
-improving compiler performance.
+As of September 2021, The only stage of the compiler
+that is already parallel is codegen. The nightly compiler implements query evaluation,
+but there is a lot of correctness work that needs to be done. The lack of parallelism at other stages
+also represents an opportunity for improving compiler performance. One can try out the current
+parallel compiler work by enabling it in the `config.toml`.
+
+These next few sections describe where and how parallelism is currently used,
+and the current status of making parallel compilation the default in `rustc`.
+
+The underlying thread-safe data-structures used in the parallel compiler
+can be found in `rustc_data_structures/sync.rs`. Some of these data structures
+use the `parking_lot` API.
+
+## Code Gen
+
+During [monomorphization][monomorphization] the compiler splits up all the code to
+be generated into smaller chunks called _codegen units_. These are then generated by
+independent instances of LLVM running in parallel. At the end, the linker
+is run to combine all the codegen units together into one binary.
+
+## Query System
+
+The query model has some properties that make it actually feasible to evaluate
+multiple queries in parallel without too much of an effort:
+
+- All data a query provider can access is accessed via the query context, so
+  the query context can take care of synchronizing access.
+- Query results are required to be immutable so they can safely be used by
+  different threads concurrently.
+
+
+When a query `foo` is evaluated, the cache table for `foo` is locked.
+
+- If there already is a result, we can clone it, release the lock and
+  we are done.
+- If there is no cache entry and no other active query invocation computing the
+  same result, we mark the key as being "in progress", release the lock and
+  start evaluating.
+- If there *is* another query invocation for the same key in progress, we
+  release the lock, and just block the thread until the other invocation has
+  computed the result we are waiting for. This cannot deadlock because, as
+  mentioned before, query invocations form a DAG. Some thread will always make
+  progress.
+
+## Current Status
 
 As of July 2021, work on explicitly parallelizing the compiler has
 stalled. There is a lot of design and correctness work that needs to be
 done.
 
-One can try out the current parallel compiler work by enabling it in the
-`config.toml`.
-
-There are a few basic ideas in this effort:
-
-- There are a lot of loops in the compiler that just iterate over all items in
-  a crate. These can possibly be parallelized.
-- We can use (a custom fork of) [`rayon`] to run tasks in parallel. The custom
-  fork allows the execution of DAGs of tasks, not just trees.
-- There are currently a lot of global data structures that need to be made
-  thread-safe. A key strategy here has been converting interior-mutable
-  data-structures (e.g. `Cell`) into their thread-safe siblings (e.g. `Mutex`).
-
 [`rayon`]: https://crates.io/crates/rayon
 
 As of February 2021, much of this effort is on hold due
@@ -45,3 +75,4 @@ are a bit out of date):
 [imlist]: https://github.com/nikomatsakis/rustc-parallelization/blob/master/interior-mutability-list.md
 [irlo1]: https://internals.rust-lang.org/t/help-test-parallel-rustc/11503
 [tracking]: https://github.com/rust-lang/rust/issues/48685
+[monomorphization]:https://rustc-dev-guide.rust-lang.org/backend/monomorph.html
diff --git a/src/queries/query-evaluation-model-in-detail.md b/src/queries/query-evaluation-model-in-detail.md
index 4c2427e3..b84a5dac 100644
--- a/src/queries/query-evaluation-model-in-detail.md
+++ b/src/queries/query-evaluation-model-in-detail.md
@@ -211,29 +211,3 @@ much of a maintenance burden.
 
 To summarize: "Steal queries" break some of the rules in a controlled way.
 There are checks in place that make sure that nothing can go silently wrong.
-
-
-## Parallel Query Execution
-
-The query model has some properties that make it actually feasible to evaluate
-multiple queries in parallel without too much of an effort:
-
-- All data a query provider can access is accessed via the query context, so
-  the query context can take care of synchronizing access.
-- Query results are required to be immutable so they can safely be used by
-  different threads concurrently.
-
-The nightly compiler already implements parallel query evaluation as follows:
-
-When a query `foo` is evaluated, the cache table for `foo` is locked.
-
-- If there already is a result, we can clone it, release the lock and
-  we are done.
-- If there is no cache entry and no other active query invocation computing the
-  same result, we mark the key as being "in progress", release the lock and
-  start evaluating.
-- If there *is* another query invocation for the same key in progress, we
-  release the lock, and just block the thread until the other invocation has
-  computed the result we are waiting for. This cannot deadlock because, as
-  mentioned before, query invocations form a DAG. Some thread will always make
-  progress.
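
Note on the "Query System" text moved by this patch: the three cache-table cases it describes (result present, no entry, entry in progress) can be illustrated with a small sketch. This is not rustc's implementation. `QueryCache`, `Entry`, `get_or_compute` and `run_demo` are hypothetical names invented for illustration, and the sketch uses plain `std::sync::Mutex` and `Condvar` rather than the primitives in `rustc_data_structures/sync.rs`. It also assumes a provider never panics, since a panic here would leave the key marked "in progress" forever.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// State of one key in the (hypothetical) cache table of a query.
enum Entry<V> {
    InProgress,
    Done(V),
}

/// Hypothetical stand-in for the cache table of a single query `foo`.
struct QueryCache<K, V> {
    table: Mutex<HashMap<K, Entry<V>>>,
    finished: Condvar,
}

impl<K: Eq + Hash + Clone, V: Clone> QueryCache<K, V> {
    fn new() -> Self {
        QueryCache { table: Mutex::new(HashMap::new()), finished: Condvar::new() }
    }

    fn get_or_compute(&self, key: K, compute: impl FnOnce(&K) -> V) -> V {
        // Lock the cache table for this query.
        let mut table = self.table.lock().unwrap();
        loop {
            let in_progress = match table.get(&key) {
                // Case 1: a result exists; clone it (the lock is released on return).
                Some(Entry::Done(value)) => return value.clone(),
                // Case 3: another invocation is computing this key.
                Some(Entry::InProgress) => true,
                // Case 2: nobody has started on this key yet.
                None => false,
            };
            if !in_progress {
                break;
            }
            // `wait` releases the lock while blocked and re-acquires it on wake-up.
            table = self.finished.wait(table).unwrap();
        }
        // Case 2 continued: mark the key "in progress", release the lock, evaluate.
        table.insert(key.clone(), Entry::InProgress);
        drop(table);
        let value = compute(&key);
        // Publish the result and wake every thread blocked in case 3.
        let mut table = self.table.lock().unwrap();
        table.insert(key, Entry::Done(value.clone()));
        drop(table);
        self.finished.notify_all();
        value
    }
}

/// Runs `threads` workers that all request the same key. Returns how many
/// times the provider closure ran and the values the workers received.
fn run_demo(threads: usize) -> (usize, Vec<u64>) {
    let cache = Arc::new(QueryCache::<u32, u64>::new());
    let provider_runs = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let cache = Arc::clone(&cache);
            let provider_runs = Arc::clone(&provider_runs);
            thread::spawn(move || {
                cache.get_or_compute(7, |key| {
                    provider_runs.fetch_add(1, Ordering::SeqCst);
                    u64::from(*key) * 6
                })
            })
        })
        .collect();
    let values = handles.into_iter().map(|h| h.join().unwrap()).collect();
    (provider_runs.load(Ordering::SeqCst), values)
}
```

Calling `run_demo(8)` has eight threads ask for the same key. Only the first thread to take the lock runs the provider; the others either block on the condition variable or find the finished result. The sketch does not model the DAG property that the text relies on to rule out deadlock: a provider that re-entered `get_or_compute` with its own key would block forever.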