update parallel-rustc.md
parent
dc6fcaeae8
commit
7d4856ba23
|
|
@ -1,34 +1,111 @@
|
||||||
# Parallel Compilation
|
# Parallel Compilation
|
||||||
|
|
||||||
As of <!-- date-check --> May 2022, The only stage of the compiler
|
As of <!-- date-check --> August 2022, the only stage of the compiler that
|
||||||
that is already parallel is codegen. The nightly compiler implements query evaluation,
|
is already parallel is codegen. Some other parts of the nightly compiler
|
||||||
but there is still a lot of work to be done. The lack of parallelism at other stages
|
have parallel implementations, such as query evaluation, type checking, and
|
||||||
also represents an opportunity for improving compiler performance. One can try out the current
|
monomorphization, but there is still a lot of work to be done. The lack of
|
||||||
parallel compiler work by enabling it in the `config.toml`.
|
parallelism at other stages (for example, macro expansion) also represents
|
||||||
|
an opportunity for improving compiler performance.
|
||||||
|
|
||||||
|
**To try out the current parallel compiler**, one can install rustc from
|
||||||
|
source code with `parallel-compiler = true` enabled in the `config.toml`.
|
||||||
|
|
||||||
These next few sections describe where and how parallelism is currently used,
|
These next few sections describe where and how parallelism is currently used,
|
||||||
and the current status of making parallel compilation the default in `rustc`.
|
and the current status of making parallel compilation the default in `rustc`.
|
||||||
|
|
||||||
The underlying thread-safe data-structures used in the parallel compiler
|
|
||||||
can be found in the `rustc_data_structures::sync` module. Some of these data structures
|
|
||||||
use the `parking_lot` crate as well.
|
|
||||||
|
|
||||||
## Codegen
|
## Codegen
|
||||||
|
|
||||||
There are two underlying thread safe data structures used in code generation:
|
|
||||||
|
|
||||||
- `Lrc`
|
|
||||||
- Which is an [`Arc`][Arc] if `parallel_compiler` is true, and a [`Rc`][Rc]
|
|
||||||
if it is not.
|
|
||||||
- `MetadataRef` -> [`OwningRef<Box<dyn Erased + Send + Sync>, [u8]>`][OwningRef]
|
|
||||||
- This data structure is specific to `rustc`.
|
|
||||||
|
|
||||||
During [monomorphization][monomorphization] the compiler splits up all the code to
|
During [monomorphization][monomorphization] the compiler splits up all the code to
|
||||||
be generated into smaller chunks called _codegen units_. These are then generated by
|
be generated into smaller chunks called _codegen units_. These are then generated by
|
||||||
independent instances of LLVM running in parallel. At the end, the linker
|
independent instances of LLVM running in parallel. At the end, the linker
|
||||||
is run to combine all the codegen units together into one binary. This process
|
is run to combine all the codegen units together into one binary. This process
|
||||||
occurs in the `rustc_codegen_ssa::base` module.
|
occurs in the `rustc_codegen_ssa::base` module.
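
The partition/compile/link flow described above can be illustrated with a small, standard-library-only sketch. It is not how `rustc_codegen_ssa` is implemented: `CodegenUnit`, `partition`, `compile_unit` and `compile_and_link` are hypothetical names, and formatting a string stands in for handing a unit to an LLVM instance.

```rust
use std::thread;

/// Hypothetical stand-in for a codegen unit: a batch of item names.
type CodegenUnit = Vec<String>;

/// Split `items` into at most `n_units` roughly equal chunks.
fn partition(items: &[String], n_units: usize) -> Vec<CodegenUnit> {
    let n_units = n_units.max(1);
    let chunk_len = ((items.len() + n_units - 1) / n_units).max(1);
    items.chunks(chunk_len).map(|c| c.to_vec()).collect()
}

/// "Compile" one unit. Real codegen would run LLVM here (assumption:
/// a formatted string is enough to show the data flow).
fn compile_unit(unit: &CodegenUnit) -> Vec<String> {
    unit.iter().map(|item| format!("obj({item})")).collect()
}

/// Compile every unit on its own thread, then "link" the results by
/// concatenating them in unit order.
fn compile_and_link(items: &[String], n_units: usize) -> Vec<String> {
    let units = partition(items, n_units);
    let objects: Vec<Vec<String>> = thread::scope(|s| {
        // Spawn all workers first so that they actually run concurrently.
        let handles: Vec<_> = units
            .iter()
            .map(|unit| s.spawn(move || compile_unit(unit)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("codegen worker panicked"))
            .collect()
    });
    objects.into_iter().flatten().collect()
}
```

Joining the handles in spawn order keeps the "linked" output deterministic even though the units may finish in any order.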
|
||||||
|
|
||||||
|
## Data Structures
|
||||||
|
|
||||||
|
The underlying thread-safe data-structures used in the parallel compiler
|
||||||
|
can be found in the `rustc_data_structures::sync` module. These data structures
|
||||||
|
are implemented differently depending on whether `parallel-compiler` is true.
|
||||||
|
|
||||||
|
| data structure | parallel | non-parallel |
|
||||||
|
| -------------------------------- | --------------------------------------------------- | ------------ |
|
||||||
|
| Lrc | std::sync::Arc | std::rc::Rc |
|
||||||
|
| Weak | std::sync::Weak | std::rc::Weak |
|
||||||
|
| Atomic{Bool}/{Usize}/{U32}/{U64} | std::sync::atomic::Atomic{Bool}/{Usize}/{U32}/{U64} | (Cell\<bool/usize/u32/u64>) |
|
||||||
|
| OnceCell | std::sync::OnceLock | std::cell::OnceCell |
|
||||||
|
| Lock\<T> | (parking_lot::Mutex\<T>) | (std::cell::RefCell) |
|
||||||
|
| RwLock\<T> | (parking_lot::RwLock\<T>) | (std::cell::RefCell) |
|
||||||
|
| MTRef<'a, T> | &'a T | &'a mut T |
|
||||||
|
| MTLock\<T> | (Lock\<T>) | (T) |
|
||||||
|
| ReadGuard | parking_lot::RwLockReadGuard | std::cell::Ref |
|
||||||
|
| MappedReadGuard | parking_lot::MappedRwLockReadGuard | std::cell::Ref |
|
||||||
|
| WriteGuard | parking_lot::RwLockWriteGuard | std::cell::RefMut |
|
||||||
|
| MappedWriteGuard | parking_lot::MappedRwLockWriteGuard | std::cell::RefMut |
|
||||||
|
| LockGuard | parking_lot::MutexGuard | std::cell::RefMut |
|
||||||
|
| MappedLockGuard | parking_lot::MappedMutexGuard | std::cell::RefMut |
|
||||||
|
| MetadataRef | [`OwningRef<Box<dyn Erased + Send + Sync>, [u8]>`][OwningRef] | [`OwningRef<Box<dyn Erased>, [u8]>`][OwningRef] |
|
||||||
|
|
||||||
|
- There are currently a lot of global data structures that need to be made
|
||||||
|
thread-safe. A key strategy here has been converting interior-mutable
|
||||||
|
data-structures (e.g. `Cell`) into their thread-safe siblings (e.g. `Mutex`).
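
As a rough illustration of how one name can map to a thread-safe or a single-threaded type, here is a minimal sketch of the pattern in the table above. It is an assumption-laden simplification: the Cargo feature `parallel` is hypothetical (rustc uses its own `parallel_compiler` cfg), `std::sync::Mutex` stands in for `parking_lot::Mutex`, and `count_up` exists only for demonstration.

```rust
// Parallel flavour: thread-safe building blocks.
#[cfg(feature = "parallel")]
mod sync {
    pub use std::sync::Arc as Lrc;
    use std::sync::{Mutex, MutexGuard};

    pub struct Lock<T>(Mutex<T>);

    impl<T> Lock<T> {
        pub fn new(value: T) -> Self {
            Lock(Mutex::new(value))
        }
        pub fn lock(&self) -> MutexGuard<'_, T> {
            self.0.lock().unwrap()
        }
    }
}

// Non-parallel flavour: cheaper single-threaded equivalents, same API.
#[cfg(not(feature = "parallel"))]
mod sync {
    pub use std::rc::Rc as Lrc;
    use std::cell::{RefCell, RefMut};

    pub struct Lock<T>(RefCell<T>);

    impl<T> Lock<T> {
        pub fn new(value: T) -> Self {
            Lock(RefCell::new(value))
        }
        pub fn lock(&self) -> RefMut<'_, T> {
            self.0.borrow_mut()
        }
    }
}

use sync::{Lock, Lrc};

/// Caller code is written once against `Lrc` and `Lock` and compiles
/// unchanged in both configurations.
fn count_up() -> u32 {
    let counter = Lrc::new(Lock::new(0u32));
    let alias = Lrc::clone(&counter);
    *alias.lock() += 1;
    *counter.lock() += 1;
    let total = *counter.lock();
    total
}
```

The design choice this is meant to show is that the cost of synchronization is only paid when the parallel configuration is selected, while call sites stay identical.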
|
||||||
|
|
||||||
|
### WorkLocal
|
||||||
|
|
||||||
|
`WorkLocal` is a special data structure implemented for the parallel compiler.
|
||||||
|
It holds worker-local values for each thread in a thread pool. You can only
|
||||||
|
access the worker local value through the Deref impl on the thread pool it
|
||||||
|
was constructed on. It will panic otherwise.
|
||||||
|
|
||||||
|
`WorkLocal` is used to implement the `Arena` allocator in the parallel
|
||||||
|
environment, which is critical in parallel queries. Its implementation
|
||||||
|
lives in the `rustc-rayon-core::worker_local` module. However, in the
|
||||||
|
non-parallel compiler, it is implemented as `(OneThread<T>)`, whose `T`
|
||||||
|
can be accessed directly through `Deref::deref`.
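
The idea of "one value per worker, reached through `Deref`" can be sketched with the standard library alone. This is not the `rustc-rayon-core` implementation: `WorkerLocal`, `WORKER_INDEX` and `run_pool` are hypothetical, the worker index is stored in a thread-local set by hand, and each slot is wrapped in a `Mutex` only so the sketch needs no `unsafe`.

```rust
use std::cell::Cell;
use std::ops::Deref;
use std::sync::Mutex;
use std::thread;

thread_local! {
    // Index of the current worker inside its pool; `None` on any other thread.
    static WORKER_INDEX: Cell<Option<usize>> = Cell::new(None);
}

/// Hypothetical, simplified worker-local storage: one slot per worker.
struct WorkerLocal<T> {
    slots: Vec<T>,
}

impl<T> WorkerLocal<T> {
    fn new(workers: usize, mut init: impl FnMut(usize) -> T) -> Self {
        WorkerLocal { slots: (0..workers).map(|i| init(i)).collect() }
    }
}

impl<T> Deref for WorkerLocal<T> {
    type Target = T;

    /// Returns the calling worker's slot; panics outside the pool.
    fn deref(&self) -> &T {
        let idx = WORKER_INDEX
            .with(|i| i.get())
            .expect("not running on a pool worker thread");
        &self.slots[idx]
    }
}

/// Spawn `workers` threads; each pushes into its own slot via `Deref`.
fn run_pool(workers: usize) -> Vec<Vec<usize>> {
    let locals = WorkerLocal::new(workers, |_| Mutex::new(Vec::new()));
    thread::scope(|s| {
        for idx in 0..workers {
            let locals = &locals;
            s.spawn(move || {
                WORKER_INDEX.with(|i| i.set(Some(idx)));
                // Auto-deref selects this worker's slot; the lock is uncontended.
                locals.lock().unwrap().push(idx * 10);
            });
        }
    });
    locals.slots.into_iter().map(|m| m.into_inner().unwrap()).collect()
}
```

Because every worker only ever touches its own slot, an arena built this way needs no cross-thread coordination on the allocation path, which is the property the text above relies on.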
|
||||||
|
|
||||||
|
## Parallel Iterator
|
||||||
|
|
||||||
|
The parallel iterators provided by the [`rayon`] crate are efficient
|
||||||
|
ways to achieve parallelization. The current nightly rustc uses (a custom
|
||||||
|
fork of) [`rayon`] to run tasks in parallel. The custom fork allows the
|
||||||
|
execution of DAGs of tasks, not just trees.
|
||||||
|
|
||||||
|
Some iterator functions are implemented in the current nightly compiler to
|
||||||
|
run loops in parallel when `parallel-compiler` is true.
|
||||||
|
|
||||||
|
| Function (`Send` and `Sync` bounds omitted) | Description | Owning Module |
|
||||||
|
| ------------------------------------------------------------ | ------------------------------------------------------------ | -------------------------- |
|
||||||
|
| **par_iter**<T: IntoParallelIterator>(t: T) -> T::Iter | generate a parallel iterator | rustc_data_structures::sync |
|
||||||
|
| **par_for_each_in**<T: IntoParallelIterator>(t: T, for_each: impl Fn(T::Item)) | generate a parallel iterator and run `for_each` on each element | rustc_data_structures::sync |
|
||||||
|
| **Map::par_body_owners**(self, f: impl Fn(LocalDefId)) | run `f` on all hir owners in the crate | rustc_middle::hir::map |
|
||||||
|
| **Map::par_for_each_module**(self, f: impl Fn(LocalDefId)) | run `f` on all modules and sub modules in the crate | rustc_middle::hir::map |
|
||||||
|
| **ModuleItems::par_items**(&self, f: impl Fn(ItemId)) | run `f` on all items in the module | rustc_middle::hir |
|
||||||
|
| **ModuleItems::par_trait_items**(&self, f: impl Fn(TraitItemId)) | run `f` on all trait items in the module | rustc_middle::hir |
|
||||||
|
| **ModuleItems::par_impl_items**(&self, f: impl Fn(ImplItemId)) | run `f` on all impl items in the module | rustc_middle::hir |
|
||||||
|
| **ModuleItems::par_foreign_items**(&self, f: impl Fn(ForeignItemId)) | run `f` on all foreign items in the module | rustc_middle::hir |
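
To give a feel for what a helper such as `par_for_each_in` does, here is a std-only analogue. It is a sketch under stated assumptions, not the compiler's code: the real helper is built on the rayon fork and takes an `IntoParallelIterator`, whereas this version takes a slice plus an explicit (hypothetical) `threads` parameter, and `sum_of_squares` is only a usage example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Run `for_each` on every element of `items`, distributing the elements
/// over `threads` scoped worker threads.
fn par_for_each_in<T: Sync>(items: &[T], threads: usize, for_each: impl Fn(&T) + Sync) {
    // Shared cursor: each worker claims the next unprocessed index.
    let next = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads.max(1) {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                match items.get(i) {
                    Some(item) => for_each(item),
                    None => break, // no work left for this worker
                }
            });
        }
    });
}

/// Usage example: the callback must be safe to call from several threads,
/// so shared state is updated atomically.
fn sum_of_squares(values: &[usize]) -> usize {
    let total = AtomicUsize::new(0);
    par_for_each_in(values, 4, |v| {
        total.fetch_add(v * v, Ordering::Relaxed);
    });
    total.into_inner()
}
```

The `Sync` bound on the callback mirrors the `Send` and `Sync` bounds that the table omits for brevity.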
|
||||||
|
|
||||||
|
There are a lot of loops in the compiler which can possibly be
|
||||||
|
parallelized using these functions. As of <!-- date-check --> August
|
||||||
|
2022, scenarios where the parallel iterator function has been used
|
||||||
|
are as follows:
|
||||||
|
|
||||||
|
| caller | scenario | callee |
|
||||||
|
| ------------------------------------------------------- | ------------------------------------------------------------ | ------------------------ |
|
||||||
|
| rustc_metadata::rmeta::encoder::prefetch_mir | Prefetch queries which will be needed later by metadata encoding | par_iter |
|
||||||
|
| rustc_monomorphize::collector::collect_crate_mono_items | Collect monomorphized items reachable from non-generic items | par_for_each_in |
|
||||||
|
| rustc_interface::passes::analysis | Check the validity of the match statements | Map::par_body_owners |
|
||||||
|
| rustc_interface::passes::analysis | MIR borrow check | Map::par_body_owners |
|
||||||
|
| rustc_typeck::check::typeck_item_bodies | Type check | Map::par_body_owners |
|
||||||
|
| rustc_interface::passes::hir_id_validator::check_crate | Check the validity of hir | Map::par_for_each_module |
|
||||||
|
| rustc_interface::passes::analysis | Check the validity of loop bodies, attributes, naked functions, unstable ABI, and const bodies | Map::par_for_each_module |
|
||||||
|
| rustc_interface::passes::analysis | Liveness and intrinsic checking of MIR | Map::par_for_each_module |
|
||||||
|
| rustc_interface::passes::analysis | Dead code checking | Map::par_for_each_module |
|
||||||
|
| rustc_interface::passes::analysis | Privacy checking | Map::par_for_each_module |
|
||||||
|
| rustc_lint::late::check_crate | Run per-module lints | Map::par_for_each_module |
|
||||||
|
| rustc_typeck::check_crate | Well-formedness checking | Map::par_for_each_module |
|
||||||
|
|
||||||
|
There are still many loops that have the potential to use
|
||||||
|
parallel iterators.
|
||||||
|
|
||||||
## Query System
|
## Query System
|
||||||
|
|
||||||
The query model has some properties that make it actually feasible to evaluate
|
The query model has some properties that make it actually feasible to evaluate
|
||||||
|
|
@ -48,9 +125,12 @@ When a query `foo` is evaluated, the cache table for `foo` is locked.
|
||||||
start evaluating.
|
start evaluating.
|
||||||
- If there *is* another query invocation for the same key in progress, we
|
- If there *is* another query invocation for the same key in progress, we
|
||||||
release the lock, and just block the thread until the other invocation has
|
release the lock, and just block the thread until the other invocation has
|
||||||
computed the result we are waiting for. This cannot deadlock because, as
|
computed the result we are waiting for. **Deadlocks are possible**, in which
|
||||||
mentioned before, query invocations form a DAG. Some threads will always make
|
case `rustc_query_system::query::job::deadlock()` will be called to detect
|
||||||
progress.
|
and remove the deadlock and then return a cycle error as the query result.
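
The "lock the table, mark the key as in progress, otherwise block until the other invocation finishes" protocol can be sketched as follows. This is a simplified model, not the `rustc_query_system` implementation: `QueryCache`, `Slot` and `demo` are hypothetical, keys and values are plain `u64`, and there is no deadlock detection, so a cyclic query would block forever in this sketch.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;

enum Slot {
    InProgress,
    Done(u64),
}

/// Hypothetical cache table for a single query.
struct QueryCache {
    table: Mutex<HashMap<u64, Slot>>,
    finished: Condvar,
}

impl QueryCache {
    fn new() -> Self {
        QueryCache { table: Mutex::new(HashMap::new()), finished: Condvar::new() }
    }

    fn get(&self, key: u64, compute: impl Fn(u64) -> u64) -> u64 {
        let mut table = self.table.lock().unwrap();
        loop {
            match table.get(&key) {
                Some(Slot::Done(value)) => return *value, // cached result
                Some(Slot::InProgress) => {}              // someone else is on it
                None => break,                            // we must compute it
            }
            // Release the lock and block until some invocation finishes.
            table = self.finished.wait(table).unwrap();
        }
        table.insert(key, Slot::InProgress);
        drop(table); // never hold the table lock while evaluating the query
        let value = compute(key);
        self.table.lock().unwrap().insert(key, Slot::Done(value));
        self.finished.notify_all();
        value
    }
}

/// Four threads ask for the same key; the provider runs exactly once.
fn demo() -> (Vec<u64>, usize) {
    let cache = QueryCache::new();
    let runs = AtomicUsize::new(0);
    let results = thread::scope(|s| {
        let handles: Vec<_> = (0..4)
            .map(|_| {
                s.spawn(|| {
                    cache.get(7, |k| {
                        runs.fetch_add(1, Ordering::SeqCst);
                        k * 6
                    })
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect::<Vec<_>>()
    });
    (results, runs.into_inner())
}
```

In this model a waiting thread simply sleeps on the condition variable; the real compiler additionally needs the deadlock handler mentioned above because waiting threads can form a cycle.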
|
||||||
|
|
||||||
|
Parallel query evaluation still needs a lot of work, most of which is related to
|
||||||
|
the `Data Structures` and `Parallel Iterator` sections above. See [this tracking issue][tracking].
|
||||||
|
|
||||||
## Rustdoc
|
## Rustdoc
|
||||||
|
|
||||||
|
|
@ -64,18 +144,6 @@ As of <!-- date-check --> May 2022, work on explicitly parallelizing the
|
||||||
compiler has stalled. There is a lot of design and correctness work that needs
|
compiler has stalled. There is a lot of design and correctness work that needs
|
||||||
to be done.
|
to be done.
|
||||||
|
|
||||||
These are the basic ideas in the effort to make `rustc` parallel:
|
|
||||||
|
|
||||||
- There are a lot of loops in the compiler that just iterate over all items in
|
|
||||||
a crate. These can possibly be parallelized.
|
|
||||||
- We can use (a custom fork of) [`rayon`] to run tasks in parallel. The custom
|
|
||||||
fork allows the execution of DAGs of tasks, not just trees.
|
|
||||||
- There are currently a lot of global data structures that need to be made
|
|
||||||
thread-safe. A key strategy here has been converting interior-mutable
|
|
||||||
data-structures (e.g. `Cell`) into their thread-safe siblings (e.g. `Mutex`).
|
|
||||||
|
|
||||||
[`rayon`]: https://crates.io/crates/rayon
|
|
||||||
|
|
||||||
As of <!-- date-check --> May 2022, much of this effort is on hold due
|
As of <!-- date-check --> May 2022, much of this effort is on hold due
|
||||||
to lack of manpower. We have a working prototype with promising performance
|
to lack of manpower. We have a working prototype with promising performance
|
||||||
gains in many cases. However, there are two blockers:
|
gains in many cases. However, there are two blockers:
|
||||||
|
|
@ -93,8 +161,8 @@ are a bit out of date):
|
||||||
- [This IRLO thread by Zoxc, one of the pioneers of the effort][irlo0]
|
- [This IRLO thread by Zoxc, one of the pioneers of the effort][irlo0]
|
||||||
- [This list of interior mutability in the compiler by nikomatsakis][imlist]
|
- [This list of interior mutability in the compiler by nikomatsakis][imlist]
|
||||||
- [This IRLO thread by alexcrichton about performance][irlo1]
|
- [This IRLO thread by alexcrichton about performance][irlo1]
|
||||||
- [This tracking issue][tracking]
|
|
||||||
|
|
||||||
|
[`rayon`]: https://crates.io/crates/rayon
|
||||||
[irlo0]: https://internals.rust-lang.org/t/parallelizing-rustc-using-rayon/6606
|
[irlo0]: https://internals.rust-lang.org/t/parallelizing-rustc-using-rayon/6606
|
||||||
[imlist]: https://github.com/nikomatsakis/rustc-parallelization/blob/master/interior-mutability-list.md
|
[imlist]: https://github.com/nikomatsakis/rustc-parallelization/blob/master/interior-mutability-list.md
|
||||||
[irlo1]: https://internals.rust-lang.org/t/help-test-parallel-rustc/11503
|
[irlo1]: https://internals.rust-lang.org/t/help-test-parallel-rustc/11503
|
||||||
|
|
|
||||||