Update parallel-rustc.md (#1926)
Co-authored-by: SparrowLii <liyuan179@huawei.com> Co-authored-by: Jieyou Xu <jieyouxu@outlook.com>
parent 5d7107b836
commit c423c5636d
@@ -1,31 +1,46 @@
|
|||
# Parallel Compilation
|
||||
|
||||
As of <!-- date-check --> August 2022, the only stage of the compiler that
|
||||
is already parallel is codegen. Some parts of the compiler already have
|
||||
parallel implementations, such as query evaluation, type check and
|
||||
monomorphization, but the general version of the compiler does not include
|
||||
these parallelization functions. **To try out the current parallel compiler**,
|
||||
one can install rustc from source code with `parallel-compiler = true` in
|
||||
the `config.toml`.
|
||||
<div class="warning">
|
||||
Parallel front-end is currently (as of 2024 November) undergoing significant
|
||||
changes, so this page contains a good deal of outdated information.
|
||||
|
||||
The lack of parallelism at other stages (for example, macro expansion) also
|
||||
represents an opportunity for improving compiler performance.
|
||||
Tracking issue: <https://github.com/rust-lang/rust/issues/113349>
|
||||
</div>
|
||||
|
||||
These next few sections describe where and how parallelism is currently used,
|
||||
and the current status of making parallel compilation the default in `rustc`.
|
||||
As of <!-- date-check --> November 2024, most of the Rust compiler is now
|
||||
parallelized.
|
||||
|
||||
## Codegen
|
||||
- The codegen part is executed concurrently by default. You can use the `-C
|
||||
codegen-units=n` option to control the number of concurrent tasks.
|
||||
- The stages between HIR lowering and codegen, such as type checking, borrow
|
||||
checking, and MIR optimization are parallelized in the nightly version.
|
||||
Currently, they are executed in serial by default, and parallelization is
|
||||
manually enabled by the user using the `-Z threads=n` option.
|
||||
- Other parts, such as lexical parsing, HIR lowering, and macro expansion, are
|
||||
still executed in serial mode.
|
||||
|
||||
During [monomorphization][monomorphization] the compiler splits up all the code to
|
||||
<div class="warning">
|
||||
The following sections are kept for now but are quite outdated.
|
||||
</div>
|
||||
|
||||
---
|
||||
|
||||
[codegen]: backend/codegen.md
|
||||
|
||||
## Code Generation
|
||||
|
||||
During monomorphization the compiler splits up all the code to
|
||||
be generated into smaller chunks called _codegen units_. These are then generated by
|
||||
independent instances of LLVM running in parallel. At the end, the linker
|
||||
is run to combine all the codegen units together into one binary. This process
|
||||
occurs in the `rustc_codegen_ssa::base` module.
|
||||
occurs in the [`rustc_codegen_ssa::base`] module.
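
The split, generate, and link flow described above can be pictured with a small standard-library sketch. It is only an illustration under stated assumptions: `compile_item` and `compile_in_units` are hypothetical names, strings stand in for object code, and plain scoped threads stand in for the independent LLVM instances.

```rust
use std::thread;

/// Hypothetical stand-in for generating code for one item.
fn compile_item(item: &str) -> String {
    format!("obj({item})")
}

/// Splits `items` into at most `units` chunks, "compiles" each chunk on its
/// own thread, then "links" the per-unit outputs in their original order.
pub fn compile_in_units(items: &[&str], units: usize) -> String {
    // Round the chunk size up so that no more than `units` chunks are produced.
    let size = items.len().div_ceil(units.max(1)).max(1);
    let objects: Vec<String> = thread::scope(|s| {
        // Spawn every unit first so that they all run concurrently.
        let handles: Vec<_> = items
            .chunks(size)
            .map(|unit| {
                s.spawn(move || {
                    unit.iter()
                        .map(|item| compile_item(item))
                        .collect::<Vec<_>>()
                        .join("+")
                })
            })
            .collect();
        // Joining in spawn order keeps the output deterministic.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // The "link" step: combine all units into one artifact.
    objects.join(" | ")
}
```

Calling `compile_in_units(&["a", "b", "c", "d"], 2)` produces two units of two items each; raising the unit count increases the available parallelism at the cost of smaller chunks.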
|
||||
|
||||
[`rustc_codegen_ssa::base`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/index.html
|
||||
|
||||
## Data Structures
|
||||
|
||||
The underlying thread-safe data-structures used in the parallel compiler
|
||||
can be found in the `rustc_data_structures::sync` module. These data structures
|
||||
can be found in the [`rustc_data_structures::sync`] module. These data structures
|
||||
are implemented differently depending on whether `parallel-compiler` is true.
|
||||
|
||||
| data structure | parallel | non-parallel |
|
||||
|
|
@@ -45,34 +60,39 @@ are implemented differently depending on whether `parallel-compiler` is true.
|
|||
| LockGuard | parking_lot::MutexGuard | std::cell::RefMut |
|
||||
| MappedLockGuard | parking_lot::MappedMutexGuard | std::cell::RefMut |
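
As a rough sketch of the idea behind the table, the two modules below expose the same `new`/`lock` surface over different primitives. This is an assumption-laden illustration, not the compiler's code: the module names are hypothetical, `std::sync::Mutex` stands in for the `parking_lot` mutex, and both flavours are shown side by side, whereas the compiler selects one of them at build time.

```rust
/// Thread-safe flavour: the guard comes from a mutex.
/// (`std::sync::Mutex` is a stand-in for `parking_lot::Mutex`.)
pub mod parallel {
    pub struct Lock<T>(std::sync::Mutex<T>);

    impl<T> Lock<T> {
        pub fn new(value: T) -> Self {
            Lock(std::sync::Mutex::new(value))
        }

        pub fn lock(&self) -> std::sync::MutexGuard<'_, T> {
            self.0.lock().unwrap()
        }
    }
}

/// Single-threaded flavour: the guard is a `RefMut`, so no atomic
/// operations are needed, but the type cannot be shared across threads.
pub mod non_parallel {
    pub struct Lock<T>(std::cell::RefCell<T>);

    impl<T> Lock<T> {
        pub fn new(value: T) -> Self {
            Lock(std::cell::RefCell::new(value))
        }

        pub fn lock(&self) -> std::cell::RefMut<'_, T> {
            self.0.borrow_mut()
        }
    }
}
```

Code written against `Lock::new` and `Lock::lock` compiles unchanged with either flavour, which is what lets the rest of the compiler stay agnostic about the build mode.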
|
||||
|
||||
- These thread-safe data structures interspersed during compilation can
|
||||
cause a lot of lock contention, which actually degrades performance as the
|
||||
number of threads increases beyond 4. This inspires us to audit the use
|
||||
of these data structures, leading to either refactoring to reduce use of
|
||||
shared state, or persistent documentation covering invariants, atomicity,
|
||||
and lock orderings.
|
||||
- These thread-safe data structures are interspersed during compilation which
|
||||
can cause lock contention resulting in degraded performance as the number of
|
||||
threads increases beyond 4. So we audit the use of these data structures
|
||||
which leads to either a refactoring so as to reduce the use of shared state,
|
||||
or the authoring of persistent documentation covering the specifics of the
|
||||
invariants, the atomicity, and the lock orderings.
|
||||
|
||||
- On the other hand, we still need to figure out what other invariants
|
||||
during compilation might not hold in parallel compilation.
|
||||
|
||||
### WorkLocal
|
||||
[`rustc_data_structures::sync`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/sync/index.html
|
||||
|
||||
`WorkLocal` is a special data structure implemented for parallel compiler.
|
||||
It holds worker-local values for each thread in a thread pool. You can only
|
||||
access the worker local value through the Deref impl on the thread pool it
|
||||
was constructed on. It will panic otherwise.
|
||||
### WorkerLocal
|
||||
|
||||
`WorkLocal` is used to implement the `Arena` allocator in the parallel
|
||||
environment, which is critical in parallel queries. Its implementation
|
||||
is located in the `rustc-rayon-core::worker_local` module. However, in the
|
||||
non-parallel compiler, it is implemented as `(OneThread<T>)`, whose `T`
|
||||
[`WorkerLocal`] is a special data structure implemented for parallel compilers. It
|
||||
holds worker-local values for each thread in a thread pool. You can only
|
||||
access the worker local value through the `Deref` `impl` on the thread pool it
|
||||
was constructed on. It panics otherwise.
|
||||
|
||||
`WorkerLocal` is used to implement the `Arena` allocator in the parallel
|
||||
environment, which is critical in parallel queries. Its implementation is
|
||||
located in the [`rustc_data_structures::sync::worker_local`] module. However,
|
||||
in the non-parallel compiler, it is implemented as `(OneThread<T>)`, whose `T`
|
||||
can be accessed directly through `Deref::deref`.
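
A minimal sketch of the worker-local idea, assuming a hypothetical thread-local `WORKER_INDEX` registry in place of the real thread pool, is shown below. The struct reuses the `WorkerLocal` name purely for illustration. To stay in safe Rust the slots hold a `Sync` type (`AtomicUsize`); a real implementation would not need that restriction.

```rust
use std::cell::Cell;
use std::ops::Deref;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

thread_local! {
    // Hypothetical registry: the index of the current pool worker, if any.
    static WORKER_INDEX: Cell<Option<usize>> = Cell::new(None);
}

/// One value per worker; `Deref` hands out the slot of the calling worker.
pub struct WorkerLocal<T> {
    slots: Vec<T>,
}

impl<T> WorkerLocal<T> {
    pub fn new(workers: usize, init: impl FnMut(usize) -> T) -> Self {
        WorkerLocal { slots: (0..workers).map(init).collect() }
    }
}

impl<T> Deref for WorkerLocal<T> {
    type Target = T;

    fn deref(&self) -> &T {
        // Panics when the calling thread is not a registered worker.
        let index = WORKER_INDEX
            .with(|i| i.get())
            .expect("WorkerLocal accessed from outside its thread pool");
        &self.slots[index]
    }
}

/// Spawns `workers` threads; worker `i` bumps its own slot `i + 1` times.
pub fn per_worker_counts(workers: usize) -> Vec<usize> {
    let counters = WorkerLocal::new(workers, |_| AtomicUsize::new(0));
    thread::scope(|s| {
        for index in 0..workers {
            let counters = &counters;
            s.spawn(move || {
                // Register this thread as worker `index`.
                WORKER_INDEX.with(|i| i.set(Some(index)));
                for _ in 0..=index {
                    // Auto-deref resolves to this worker's own slot.
                    counters.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    counters.slots.iter().map(|c| c.load(Ordering::Relaxed)).collect()
}
```

Each worker only ever touches its own slot, so the slots never contend with each other; dereferencing from a thread that never registered an index panics, mirroring the behaviour described above.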
|
||||
|
||||
[`rustc_data_structures::sync::worker_local`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/sync/worker_local/index.html
|
||||
[`WorkerLocal`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/sync/worker_local/struct.WorkerLocal.html
|
||||
|
||||
## Parallel Iterator
|
||||
|
||||
The parallel iterators provided by the [`rayon`] crate are easy ways
|
||||
to implement parallelism. In the current implementation of the parallel
|
||||
compiler we use a custom [fork][rustc-rayon] of [`rayon`] to run tasks in parallel.
|
||||
The parallel iterators provided by the [`rayon`] crate are easy ways to
|
||||
implement parallelism. In the current implementation of the parallel compiler
|
||||
we use a custom [fork][rustc-rayon] of `rayon` to run tasks in parallel.
|
||||
|
||||
Some iterator functions are implemented to run loops in parallel
|
||||
when `parallel-compiler` is true.
|
||||
|
|
@@ -88,10 +108,9 @@ when `parallel-compiler` is true.
|
|||
| **ModuleItems::par_impl_items**(&self, f: impl Fn(ImplItemId)) | run `f` on all impl items in the module | rustc_middle::hir |
|
||||
| **ModuleItems::par_foreign_items**(&self, f: impl Fn(ForeignItemId)) | run `f` on all foreign items in the module | rustc_middle::hir |
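
The shape of such a helper can be sketched with the standard library alone. The function `par_for_each` below is a hypothetical name, and scoped threads stand in for the `rayon` fork that the compiler actually uses, so treat it as an illustration of the calling convention rather than the real implementation.

```rust
use std::thread;

/// Runs `f` on every element of `items`, spreading the work over up to
/// `threads` scoped threads. Returns only after every call has finished.
pub fn par_for_each<T: Sync>(items: &[T], threads: usize, f: impl Fn(&T) + Sync) {
    // Round the chunk size up; `max(1)` avoids a zero-sized chunk for empty input.
    let chunk = items.len().div_ceil(threads.max(1)).max(1);
    let f = &f;
    thread::scope(|s| {
        for part in items.chunks(chunk) {
            // Each thread walks its own contiguous chunk of the slice.
            s.spawn(move || part.iter().for_each(f));
        }
    });
}
```

Because `f` is shared by reference between threads, it must be `Fn + Sync`; any state it updates has to go through thread-safe types such as atomics or locks.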
|
||||
|
||||
There are a lot of loops in the compiler which can possibly be
|
||||
parallelized using these functions. As of <!-- date-check--> August
|
||||
2022, scenarios where the parallel iterator function has been used
|
||||
are as follows:
|
||||
There are a lot of loops in the compiler which can possibly be parallelized
|
||||
using these functions. As of <!-- date-check--> August 2022, scenarios where
|
||||
the parallel iterator function has been used are as follows:
|
||||
|
||||
| caller | scenario | callee |
|
||||
| ------------------------------------------------------- | ------------------------------------------------------------ | ------------------------ |
|
||||
|
|
@@ -113,9 +132,9 @@ There are still many loops that have the potential to use parallel iterators.
|
|||
## Query System
|
||||
|
||||
The query model has some properties that make it actually feasible to evaluate
|
||||
multiple queries in parallel without too much of an effort:
|
||||
multiple queries in parallel without too much effort:
|
||||
|
||||
- All data a query provider can access is accessed via the query context, so
|
||||
- All data a query provider can access is via the query context, so
|
||||
the query context can take care of synchronizing access.
|
||||
- Query results are required to be immutable so they can safely be used by
|
||||
different threads concurrently.
|
||||
|
|
@@ -135,31 +154,31 @@ When a query `foo` is evaluated, the cache table for `foo` is locked.
|
|||
the compiler uses an extra thread *(named deadlock handler)* to detect, remove and
|
||||
report the cycle error.
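
The combination of a locked cache table and immutable, shareable results can be sketched as follows. Everything here is hypothetical (`QueryCache`, the `u32` keys, the string results), and the sketch deliberately simplifies: it keeps the table locked while the provider runs, so it shows only the caching behaviour and none of the cycle or deadlock handling.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Cache table for a single hypothetical query from `u32` keys to strings.
pub struct QueryCache {
    table: Mutex<HashMap<u32, Arc<String>>>,
    provider_calls: AtomicUsize,
}

impl QueryCache {
    pub fn new() -> Self {
        QueryCache { table: Mutex::new(HashMap::new()), provider_calls: AtomicUsize::new(0) }
    }

    /// Returns the cached result, computing it on first use.
    pub fn get(&self, key: u32) -> Arc<String> {
        // Simplification: the table stays locked while the provider runs.
        let mut table = self.table.lock().unwrap();
        if let Some(hit) = table.get(&key) {
            return Arc::clone(hit);
        }
        self.provider_calls.fetch_add(1, Ordering::Relaxed);
        // Results are immutable once stored, so sharing them via `Arc` is safe.
        let value = Arc::new(format!("result-{key}"));
        table.insert(key, Arc::clone(&value));
        value
    }

    pub fn provider_calls(&self) -> usize {
        self.provider_calls.load(Ordering::Relaxed)
    }
}

/// Asks for the same key from several threads at once.
pub fn query_from_threads(cache: &QueryCache, key: u32, threads: usize) -> Vec<Arc<String>> {
    thread::scope(|s| {
        let handles: Vec<_> = (0..threads).map(|_| s.spawn(move || cache.get(key))).collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

However many threads request the same key, the provider body runs once and every caller receives a handle to the same immutable result.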
|
||||
|
||||
Parallel query still has a lot of work to do, most of which is related to
|
||||
the previous `Data Structures` and `Parallel Iterators`. See [this tracking issue][tracking].
|
||||
The parallel query feature still has implementation work remaining, most of which is
|
||||
related to the previous `Data Structures` and `Parallel Iterators`. See [this
|
||||
open feature tracking issue][tracking].
|
||||
|
||||
## Rustdoc
|
||||
|
||||
As of <!-- date-check--> November 2022, there are still a number of steps
|
||||
to complete before rustdoc rendering can be made parallel. More details on
|
||||
this issue can be found [here][parallel-rustdoc].
|
||||
As of <!-- date-check--> November 2022, there are still a number of steps to
|
||||
complete before `rustdoc` rendering can be made parallel (see an open discussion
|
||||
of [parallel `rustdoc`][parallel-rustdoc]).
|
||||
|
||||
## Resources
|
||||
|
||||
Here are some resources that can be used to learn more (note that some of them
|
||||
are a bit out of date):
|
||||
Here are some resources that can be used to learn more:
|
||||
|
||||
- [This IRLO thread by alexcrichton about performance][irlo1]
|
||||
- [This IRLO thread by Zoxc, one of the pioneers of the effort][irlo0]
|
||||
- [This list of interior mutability in the compiler by nikomatsakis][imlist]
|
||||
- [This IRLO thread by alexcrichton about performance][irlo1]
|
||||
|
||||
[`rayon`]: https://crates.io/crates/rayon
|
||||
[rustc-rayon]: https://github.com/rust-lang/rustc-rayon
|
||||
[irlo0]: https://internals.rust-lang.org/t/parallelizing-rustc-using-rayon/6606
|
||||
[Arc]: https://doc.rust-lang.org/std/sync/struct.Arc.html
|
||||
[imlist]: https://github.com/nikomatsakis/rustc-parallelization/blob/master/interior-mutability-list.md
|
||||
[irlo0]: https://internals.rust-lang.org/t/parallelizing-rustc-using-rayon/6606
|
||||
[irlo1]: https://internals.rust-lang.org/t/help-test-parallel-rustc/11503
|
||||
[tracking]: https://github.com/rust-lang/rust/issues/48685
|
||||
[monomorphization]: backend/monomorph.md
|
||||
[parallel-rustdoc]: https://github.com/rust-lang/rust/issues/82741
|
||||
[Arc]: https://doc.rust-lang.org/std/sync/struct.Arc.html
|
||||
[Rc]: https://doc.rust-lang.org/std/rc/struct.Rc.html
|
||||
[rustc-rayon]: https://github.com/rust-lang/rustc-rayon
|
||||
[tracking]: https://github.com/rust-lang/rust/issues/48685
|
||||