Docs: consolidated parallelism information

Timothy Maloney 2021-09-06 13:11:09 -07:00 committed by Joshua Nelson
parent 71d88b345f
commit 0fe44f730b
3 changed files with 47 additions and 47 deletions


@ -297,12 +297,7 @@ Compiler performance is a problem that we would like to improve on
(and are always working on). One aspect of that is parallelizing
`rustc` itself.
Currently, there is only one part of rustc that is already parallel: codegen.
During monomorphization, the compiler will split up all the code to be
generated into smaller chunks called _codegen units_. These are then generated
by independent instances of LLVM. Since they are independent, we can run them
in parallel. At the end, the linker is run to combine all the codegen units
together into one binary.
Currently, there is only one part of rustc that is parallel by default: codegen.
However, the rest of the compiler is not yet parallel. A lot of effort has
been spent on this, but it is generally a hard problem. The current


@ -1,25 +1,55 @@
# Parallel Compilation
Most of the compiler is not parallel. This represents an opportunity for
improving compiler performance.
As of <!-- date: 2021-09 --> September 2021, the only stage of the compiler
that is already parallel is codegen. The nightly compiler implements parallel
query evaluation, but there is still a lot of correctness work that needs to be
done. The lack of parallelism at other stages also represents an opportunity
for improving compiler performance. One can try out the current parallel
compiler work by enabling it in `config.toml`.
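For example (option name as of this writing; check `config.toml.example` in a
rust-lang/rust checkout for the current spelling):

```toml
# config.toml at the root of a rust-lang/rust checkout
[rust]
# Build rustc with the experimental parallel front end enabled.
parallel-compiler = true
```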
These next few sections describe where and how parallelism is currently used,
and the current status of making parallel compilation the default in `rustc`.
The underlying thread-safe data structures used in the parallel compiler
can be found in `rustc_data_structures/sync.rs`. Some of these data
structures use the `parking_lot` API.
## Codegen
During [monomorphization][monomorphization], the compiler splits up all the code to
be generated into smaller chunks called _codegen units_. These are then generated by
independent instances of LLVM running in parallel. At the end, the linker
is run to combine all the codegen units together into one binary.
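The division of labor can be sketched with plain threads (the item names and
the `generate_unit` helper below are hypothetical stand-ins, not rustc APIs):

```rust
use std::thread;

// Hypothetical stand-in for handing one codegen unit to an LLVM instance.
fn generate_unit(unit: &[&'static str]) -> String {
    let compiled: Vec<String> = unit.iter().map(|item| format!("compiled({item})")).collect();
    compiled.join(" ")
}

fn main() {
    // All monomorphized items, split into independent codegen units.
    let items = ["foo", "bar", "baz", "quux"];
    let units: Vec<Vec<&'static str>> = items.chunks(2).map(|c| c.to_vec()).collect();

    // Each unit is generated on its own thread, mirroring the independent
    // LLVM instances that run in parallel.
    let handles: Vec<_> = units
        .into_iter()
        .map(|unit| thread::spawn(move || generate_unit(&unit)))
        .collect();

    // The "linker" step: combine the generated units into one artifact.
    let objects: Vec<String> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    println!("{}", objects.join(" | "));
}
```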
## Query System
The query model has some properties that make it feasible to evaluate
multiple queries in parallel without too much effort:
- All data a query provider can access is accessed via the query context, so
the query context can take care of synchronizing access.
- Query results are required to be immutable so they can safely be used by
different threads concurrently.
When a query `foo` is evaluated, the cache table for `foo` is locked.
- If there already is a result, we can clone it, release the lock and
we are done.
- If there is no cache entry and no other active query invocation computing the
same result, we mark the key as being "in progress", release the lock and
start evaluating.
- If there *is* another query invocation for the same key in progress, we
release the lock, and just block the thread until the other invocation has
computed the result we are waiting for. This cannot deadlock because, as
mentioned before, query invocations form a DAG. Some thread will always make
progress.
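The protocol above can be sketched with a `Mutex`-protected table and a
`Condvar` (`QueryCache`, `Entry`, and the string-valued results are
illustrative inventions, not the actual rustc types):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};

// One cache table: a key is either being computed or holds an
// immutable (hence freely shareable) result.
#[derive(Clone)]
enum Entry {
    InProgress,
    Done(Arc<String>),
}

struct QueryCache {
    table: Mutex<HashMap<&'static str, Entry>>,
    ready: Condvar,
}

impl QueryCache {
    fn get_or_compute(&self, key: &'static str, compute: impl FnOnce() -> String) -> Arc<String> {
        let mut table = self.table.lock().unwrap();
        loop {
            match table.get(key).cloned() {
                // Cached result: clone the handle, release the lock, done.
                Some(Entry::Done(result)) => return result,
                // Another thread is computing this key: block until it is done.
                Some(Entry::InProgress) => table = self.ready.wait(table).unwrap(),
                // No entry: mark the key "in progress", release the lock,
                // evaluate, then publish the result and wake any waiters.
                None => {
                    table.insert(key, Entry::InProgress);
                    drop(table);
                    let result = Arc::new(compute());
                    self.table.lock().unwrap().insert(key, Entry::Done(Arc::clone(&result)));
                    self.ready.notify_all();
                    return result;
                }
            }
        }
    }
}

fn main() {
    let cache = QueryCache { table: Mutex::new(HashMap::new()), ready: Condvar::new() };
    let first = cache.get_or_compute("type_of(main)", || "fn()".to_string());
    // The second call hits the cache; the closure is never run.
    let second = cache.get_or_compute("type_of(main)", || unreachable!());
    assert_eq!(first, second);
    println!("{first}");
}
```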
## Current Status
As of <!-- date: 2021-07 --> July 2021, work on explicitly parallelizing the
compiler has stalled. There is a lot of design and correctness work that needs
to be done.
One can try out the current parallel compiler work by enabling it in the
`config.toml`.
There are a few basic ideas in this effort:
- There are a lot of loops in the compiler that just iterate over all items in
a crate. These can possibly be parallelized.
- We can use (a custom fork of) [`rayon`] to run tasks in parallel. The custom
fork allows the execution of DAGs of tasks, not just trees.
- There are currently a lot of global data structures that need to be made
thread-safe. A key strategy here has been converting interior-mutable
data structures (e.g. `Cell`) into their thread-safe siblings (e.g. `Mutex`).
[`rayon`]: https://crates.io/crates/rayon
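A minimal illustration of that conversion, using std threads rather than the
`rayon` fork (the `check_items` pass and its item list are made up for the
example): a single-threaded pass could tally errors in a `Cell<usize>`, but
once the loop over items runs on several threads, the counter has to become a
`Mutex` (or an atomic):

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical compiler pass: visit every item in the crate and count errors.
// Single-threaded, the counter could be a `Cell<usize>`; parallelized, it
// must be a thread-safe sibling such as `Mutex<usize>` or `AtomicUsize`.
fn check_items(items: &[&str]) -> usize {
    let error_count = Mutex::new(0usize);
    thread::scope(|s| {
        let error_count = &error_count;
        // Split the per-item loop across threads, two items per thread.
        for chunk in items.chunks(2) {
            s.spawn(move || {
                for item in chunk {
                    if item.contains("bad") {
                        *error_count.lock().unwrap() += 1;
                    }
                }
            });
        }
    });
    error_count.into_inner().unwrap()
}

fn main() {
    let items = ["ok_fn", "bad_fn", "ok_ty", "bad_ty", "ok_const"];
    println!("errors: {}", check_items(&items));
}
```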
As of <!-- date: 2021-02 --> February 2021, much of this effort is on hold due
@ -45,3 +75,4 @@ are a bit out of date):
[imlist]: https://github.com/nikomatsakis/rustc-parallelization/blob/master/interior-mutability-list.md
[irlo1]: https://internals.rust-lang.org/t/help-test-parallel-rustc/11503
[tracking]: https://github.com/rust-lang/rust/issues/48685
[monomorphization]: https://rustc-dev-guide.rust-lang.org/backend/monomorph.html


@ -211,29 +211,3 @@ much of a maintenance burden.
To summarize: "Steal queries" break some of the rules in a controlled way.
There are checks in place that make sure that nothing can go silently wrong.
## Parallel Query Execution
The query model has some properties that make it actually feasible to evaluate
multiple queries in parallel without too much of an effort:
- All data a query provider can access is accessed via the query context, so
the query context can take care of synchronizing access.
- Query results are required to be immutable so they can safely be used by
different threads concurrently.
The nightly compiler already implements parallel query evaluation as follows:
When a query `foo` is evaluated, the cache table for `foo` is locked.
- If there already is a result, we can clone it, release the lock and
we are done.
- If there is no cache entry and no other active query invocation computing the
same result, we mark the key as being "in progress", release the lock and
start evaluating.
- If there *is* another query invocation for the same key in progress, we
release the lock, and just block the thread until the other invocation has
computed the result we are waiting for. This cannot deadlock because, as
mentioned before, query invocations form a DAG. Some thread will always make
progress.