Docs: consolidated parallelism information
parent
71d88b345f
commit
0fe44f730b
|
|
@@ -297,12 +297,7 @@ Compiler performance is a problem that we would like to improve on
|
||||||
(and are always working on). One aspect of that is parallelizing
|
(and are always working on). One aspect of that is parallelizing
|
||||||
`rustc` itself.
|
`rustc` itself.
|
||||||
|
|
||||||
Currently, there is only one part of rustc that is already parallel: codegen.
|
Currently, there is only one part of rustc that is parallel by default: codegen.
|
||||||
During monomorphization, the compiler will split up all the code to be
|
|
||||||
generated into smaller chunks called _codegen units_. These are then generated
|
|
||||||
by independent instances of LLVM. Since they are independent, we can run them
|
|
||||||
in parallel. At the end, the linker is run to combine all the codegen units
|
|
||||||
together into one binary.
|
|
||||||
|
|
||||||
However, the rest of the compiler is still not yet parallel. There have been
|
However, the rest of the compiler is still not yet parallel. There have been
|
||||||
lots of efforts spent on this, but it is generally a hard problem. The current
|
lots of efforts spent on this, but it is generally a hard problem. The current
|
||||||
|
|
|
||||||
|
|
@@ -1,25 +1,55 @@
|
||||||
# Parallel Compilation
|
# Parallel Compilation
|
||||||
|
|
||||||
Most of the compiler is not parallel. This represents an opportunity for
|
As of <!-- date: 2021-09 --> September 2021, the only stage of the compiler
|
||||||
improving compiler performance.
|
that is already parallel is codegen. The nightly compiler implements parallel query evaluation,
|
||||||
|
but there is a lot of correctness work that needs to be done. The lack of parallelism at other stages
|
||||||
|
also represents an opportunity for improving compiler performance. One can try out the current
|
||||||
|
parallel compiler work by enabling it in the `config.toml`.
|
||||||
|
|
||||||
|
These next few sections describe where and how parallelism is currently used,
|
||||||
|
and the current status of making parallel compilation the default in `rustc`.
|
||||||
|
|
||||||
|
The underlying thread-safe data-structures used in the parallel compiler
|
||||||
|
can be found in `rustc_data_structures/sync.rs`. Some of these data structures
|
||||||
|
use the `parking_lot` API.
|
||||||
|
|
||||||
|
## Code Gen
|
||||||
|
|
||||||
|
During [monomorphization][monomorphization] the compiler splits up all the code to
|
||||||
|
be generated into smaller chunks called _codegen units_. These are then generated by
|
||||||
|
independent instances of LLVM running in parallel. At the end, the linker
|
||||||
|
is run to combine all the codegen units together into one binary.
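The split-compile-combine shape described above can be sketched in a few lines of Rust. This is only an illustration, not how `rustc` drives LLVM: `CodegenUnit`, `compile_unit` and `compile_and_link` are hypothetical names, and string concatenation stands in for the linker.

```rust
use std::thread;

/// Hypothetical stand-in for one codegen unit: a named chunk of work.
struct CodegenUnit {
    name: &'static str,
    items: Vec<u32>,
}

/// Hypothetical stand-in for one independent backend instance.
/// It only looks at its own unit, so units can be processed concurrently.
fn compile_unit(unit: &CodegenUnit) -> String {
    let sum: u32 = unit.items.iter().sum();
    format!("{}.o({})", unit.name, sum)
}

/// Compile every unit on its own thread, then "link" the results in order.
fn compile_and_link(units: &[CodegenUnit]) -> String {
    let objects: Vec<String> = thread::scope(|s| {
        // Spawn one scoped thread per unit.
        let handles: Vec<_> = units
            .iter()
            .map(|unit| s.spawn(move || compile_unit(unit)))
            .collect();
        // Join in the original order so the output is deterministic.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // The final, sequential step that combines all objects.
    objects.join("+")
}
```

Joining the handles in spawn order keeps the combined output independent of which thread happens to finish first.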
|
||||||
|
|
||||||
|
## Query System
|
||||||
|
|
||||||
|
The query model has some properties that make it actually feasible to evaluate
|
||||||
|
multiple queries in parallel without too much of an effort:
|
||||||
|
|
||||||
|
- All data a query provider can access is accessed via the query context, so
|
||||||
|
the query context can take care of synchronizing access.
|
||||||
|
- Query results are required to be immutable so they can safely be used by
|
||||||
|
different threads concurrently.
|
||||||
|
|
||||||
|
|
||||||
|
When a query `foo` is evaluated, the cache table for `foo` is locked.
|
||||||
|
|
||||||
|
- If there already is a result, we can clone it, release the lock and
|
||||||
|
we are done.
|
||||||
|
- If there is no cache entry and no other active query invocation computing the
|
||||||
|
same result, we mark the key as being "in progress", release the lock and
|
||||||
|
start evaluating.
|
||||||
|
- If there *is* another query invocation for the same key in progress, we
|
||||||
|
release the lock, and just block the thread until the other invocation has
|
||||||
|
computed the result we are waiting for. This cannot deadlock because, as
|
||||||
|
mentioned before, query invocations form a DAG. Some thread will always make
|
||||||
|
progress.
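The three cases above can be sketched with a mutex-protected table and a condition variable. This is a minimal illustration under stated assumptions, not the compiler's implementation: `QueryCache`, `Entry` and `get_or_compute` are hypothetical names, keys and results are plain integers, and a panicking provider would leave waiters blocked.

```rust
use std::collections::HashMap;
use std::sync::{Condvar, Mutex};

/// State of one key in the (hypothetical) cache table for a query `foo`.
enum Entry {
    InProgress,
    Done(u64),
}

struct QueryCache {
    table: Mutex<HashMap<u32, Entry>>,
    finished: Condvar,
}

impl QueryCache {
    fn new() -> Self {
        QueryCache { table: Mutex::new(HashMap::new()), finished: Condvar::new() }
    }

    fn get_or_compute(&self, key: u32, provider: impl FnOnce(u32) -> u64) -> u64 {
        // Lock the cache table for this query.
        let mut table = self.table.lock().unwrap();
        loop {
            let in_progress = match table.get(&key) {
                // Cached result: copy it out; the lock is released on return.
                Some(Entry::Done(value)) => return *value,
                Some(Entry::InProgress) => true,
                None => false,
            };
            if !in_progress {
                break;
            }
            // Another invocation is computing this key: `wait` releases the
            // lock while this thread is blocked and reacquires it on wake-up.
            table = self.finished.wait(table).unwrap();
        }
        // No entry and nobody computing it: mark the key and release the lock.
        table.insert(key, Entry::InProgress);
        drop(table);

        // Evaluate without holding the lock, then publish the result.
        let value = provider(key);
        self.table.lock().unwrap().insert(key, Entry::Done(value));
        self.finished.notify_all();
        value
    }
}
```

Because the provider runs outside the lock, unrelated keys of the same query can be evaluated concurrently; only threads asking for the same key wait on each other.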
|
||||||
|
|
||||||
|
## Current Status
|
||||||
|
|
||||||
As of <!-- date: 2021-07 --> July 2021, work on explicitly parallelizing the
|
As of <!-- date: 2021-07 --> July 2021, work on explicitly parallelizing the
|
||||||
compiler has stalled. There is a lot of design and correctness work that needs
|
compiler has stalled. There is a lot of design and correctness work that needs
|
||||||
to be done.
|
to be done.
|
||||||
|
|
||||||
One can try out the current parallel compiler work by enabling it in the
|
|
||||||
`config.toml`.
|
|
||||||
|
|
||||||
There are a few basic ideas in this effort:
|
|
||||||
|
|
||||||
- There are a lot of loops in the compiler that just iterate over all items in
|
|
||||||
a crate. These can possibly be parallelized.
|
|
||||||
- We can use (a custom fork of) [`rayon`] to run tasks in parallel. The custom
|
|
||||||
fork allows the execution of DAGs of tasks, not just trees.
|
|
||||||
- There are currently a lot of global data structures that need to be made
|
|
||||||
thread-safe. A key strategy here has been converting interior-mutable
|
|
||||||
data-structures (e.g. `Cell`) into their thread-safe siblings (e.g. `Mutex`).
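As a rough illustration of the last point, the sketch below shows the same counter written once with `Cell` and once with `Mutex`. The type and function names are made up for this example and do not come from the compiler.

```rust
use std::cell::Cell;
use std::sync::Mutex;
use std::thread;

/// Single-threaded version. `Cell` is not `Sync`, so a `&LocalCounter`
/// cannot be shared between threads.
struct LocalCounter {
    count: Cell<u32>,
}

impl LocalCounter {
    fn bump(&self) {
        self.count.set(self.count.get() + 1);
    }
}

/// Thread-safe sibling: same `&self` interface, but every access goes
/// through a lock.
struct SharedCounter {
    count: Mutex<u32>,
}

impl SharedCounter {
    fn bump(&self) {
        *self.count.lock().unwrap() += 1;
    }

    fn get(&self) -> u32 {
        *self.count.lock().unwrap()
    }
}

/// Bump the shared counter once from each of `threads` scoped threads.
fn bump_from_threads(counter: &SharedCounter, threads: u32) {
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| counter.bump());
        }
    });
}
```

Both types keep the interior-mutable `&self` interface, so call sites stay the same; replacing `SharedCounter` with `LocalCounter` in `bump_from_threads` would be rejected at compile time.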
|
|
||||||
|
|
||||||
[`rayon`]: https://crates.io/crates/rayon
|
[`rayon`]: https://crates.io/crates/rayon
|
||||||
|
|
||||||
As of <!-- date: 2021-02 --> February 2021, much of this effort is on hold due
|
As of <!-- date: 2021-02 --> February 2021, much of this effort is on hold due
|
||||||
|
|
@@ -45,3 +75,4 @@ are a bit out of date):
|
||||||
[imlist]: https://github.com/nikomatsakis/rustc-parallelization/blob/master/interior-mutability-list.md
|
[imlist]: https://github.com/nikomatsakis/rustc-parallelization/blob/master/interior-mutability-list.md
|
||||||
[irlo1]: https://internals.rust-lang.org/t/help-test-parallel-rustc/11503
|
[irlo1]: https://internals.rust-lang.org/t/help-test-parallel-rustc/11503
|
||||||
[tracking]: https://github.com/rust-lang/rust/issues/48685
|
[tracking]: https://github.com/rust-lang/rust/issues/48685
|
||||||
|
[monomorphization]: https://rustc-dev-guide.rust-lang.org/backend/monomorph.html
|
||||||
|
|
|
||||||
|
|
@@ -211,29 +211,3 @@ much of a maintenance burden.
|
||||||
|
|
||||||
To summarize: "Steal queries" break some of the rules in a controlled way.
|
To summarize: "Steal queries" break some of the rules in a controlled way.
|
||||||
There are checks in place that make sure that nothing can go silently wrong.
|
There are checks in place that make sure that nothing can go silently wrong.
|
||||||
|
|
||||||
|
|
||||||
## Parallel Query Execution
|
|
||||||
|
|
||||||
The query model has some properties that make it actually feasible to evaluate
|
|
||||||
multiple queries in parallel without too much of an effort:
|
|
||||||
|
|
||||||
- All data a query provider can access is accessed via the query context, so
|
|
||||||
the query context can take care of synchronizing access.
|
|
||||||
- Query results are required to be immutable so they can safely be used by
|
|
||||||
different threads concurrently.
|
|
||||||
|
|
||||||
The nightly compiler already implements parallel query evaluation as follows:
|
|
||||||
|
|
||||||
When a query `foo` is evaluated, the cache table for `foo` is locked.
|
|
||||||
|
|
||||||
- If there already is a result, we can clone it, release the lock and
|
|
||||||
we are done.
|
|
||||||
- If there is no cache entry and no other active query invocation computing the
|
|
||||||
same result, we mark the key as being "in progress", release the lock and
|
|
||||||
start evaluating.
|
|
||||||
- If there *is* another query invocation for the same key in progress, we
|
|
||||||
release the lock, and just block the thread until the other invocation has
|
|
||||||
computed the result we are waiting for. This cannot deadlock because, as
|
|
||||||
mentioned before, query invocations form a DAG. Some thread will always make
|
|
||||||
progress.
|
|
||||||
|
|
|
||||||