Add documentation about profile-guided optimization.
This commit is contained in:
parent
e928847fde
commit
c3b32895e7
|
|
@ -85,6 +85,7 @@
|
|||
- [Debugging LLVM](./codegen/debugging.md)
|
||||
- [Emitting Diagnostics](./diag.md)
|
||||
- [JSON diagnostic format](./diag/json-format.md)
|
||||
- [Profile-guided Optimization](./profile-guided-optimization.md)
|
||||
|
||||
---
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,132 @@
|
|||
# Profile-guided Optimization
|
||||
|
||||
`rustc` supports doing profile-guided optimization (PGO).
|
||||
This chapter describes what PGO is and how the support for it is
|
||||
implemented in `rustc`.
|
||||
|
||||
## What Is Profile-Guided Optimization?
|
||||
|
||||
The basic concept of PGO is to collect data about the typical execution of
|
||||
a program (e.g. which branches it is likely to take) and then use this data
|
||||
to inform optimizations such as inlining, machine-code layout,
|
||||
register allocation, etc.
|
||||
|
||||
There are different ways of collecting data about a program's execution.
|
||||
One is to run the program inside a profiler (such as `perf`) and another
|
||||
is to create an instrumented binary, that is, a binary that has data
|
||||
collection built into it, and run that.
|
||||
The latter usually provides more accurate data.
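To make "data collection built into the binary" more concrete, here is a purely conceptual sketch of what instrumentation amounts to. The counters and function names are hypothetical and written by hand; the instrumentation LLVM actually inserts looks different, but the idea of counting how often each branch is taken is the same.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical hand-written counters standing in for the ones an
// instrumented binary would maintain automatically.
static BRANCH_TAKEN: AtomicU64 = AtomicU64::new(0);
static BRANCH_NOT_TAKEN: AtomicU64 = AtomicU64::new(0);

/// The function being "profiled": each arm of the branch bumps its counter.
pub fn classify(x: i32) -> &'static str {
    if x < 0 {
        BRANCH_TAKEN.fetch_add(1, Ordering::Relaxed);
        "negative"
    } else {
        BRANCH_NOT_TAKEN.fetch_add(1, Ordering::Relaxed);
        "non-negative"
    }
}

/// Returns the collected counts as `(taken, not_taken)`. A real instrumented
/// binary would persist such data to a file instead of returning it.
pub fn branch_profile() -> (u64, u64) {
    (
        BRANCH_TAKEN.load(Ordering::Relaxed),
        BRANCH_NOT_TAKEN.load(Ordering::Relaxed),
    )
}
```

After calling `classify` on a representative set of inputs, `branch_profile()` reports which arm dominates, which is the kind of information an optimizer can use to lay out the hot path first.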
|
||||
|
||||
## How is PGO implemented in `rustc`?
|
||||
|
||||
`rustc`'s current PGO implementation relies entirely on LLVM.
|
||||
LLVM actually [supports multiple forms][clang-pgo] of PGO:
|
||||
|
||||
[clang-pgo]: https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
|
||||
|
||||
- Sampling-based PGO where an external profiling tool like `perf` is used
|
||||
to collect data about a program's execution.
|
||||
- GCOV-based profiling, where code coverage infrastructure is used to collect
|
||||
profiling information.
|
||||
- Front-end based instrumentation, where the compiler front-end (e.g. Clang)
|
||||
inserts instrumentation intrinsics into the LLVM IR it generates.
|
||||
- IR-level instrumentation, where LLVM inserts the instrumentation intrinsics
|
||||
itself during optimization passes.
|
||||
|
||||
`rustc` supports only the last approach, IR-level instrumentation, mainly
|
||||
because it is almost exclusively implemented in LLVM and needs little
|
||||
maintenance on the Rust side. Fortunately, it is also the most modern approach,
|
||||
yielding the best results.
|
||||
|
||||
So, we are dealing with an instrumentation-based approach, i.e. profiling data
|
||||
is generated by a specially instrumented version of the program that's being
|
||||
optimized. Instrumentation-based PGO has two components: a compile-time
|
||||
component and a run-time component, and one needs to understand the overall
|
||||
workflow to see how they interact.
|
||||
|
||||
### Overall Workflow
|
||||
|
||||
Generating a PGO-optimized program involves the following four steps:
|
||||
|
||||
1. Compile the program with instrumentation enabled (e.g. `rustc -Cprofile-generate main.rs`).
|
||||
2. Run the instrumented program (e.g. `./main`), which generates a `default-<id>.profraw` file.
|
||||
3. Convert the `.profraw` file into a `.profdata` file using LLVM's `llvm-profdata` tool.
|
||||
4. Compile the program again, this time making use of the profiling data
|
||||
   (e.g. `rustc -Cprofile-use=merged.profdata main.rs`).
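
The four steps can be spelled out as concrete command lines. The following sketch only builds the commands as strings and does not execute anything. The function name, the `/tmp/pgo-data` style profile directory, and the `llvm-profdata merge -o ...` invocation are illustrative assumptions rather than part of the workflow description above; passing a directory to `-Cprofile-generate` is likewise an assumed variant of the flag shown in step 1.

```rust
/// Builds the four command lines of the PGO workflow as plain strings.
/// Nothing is executed here; all paths and the `merged.profdata` file name
/// are illustrative assumptions.
pub fn pgo_workflow(source: &str, binary: &str, profile_dir: &str) -> Vec<String> {
    let merged = "merged.profdata";
    vec![
        // 1. Build an instrumented binary that writes raw profiles to `profile_dir`.
        format!("rustc -Cprofile-generate={} {}", profile_dir, source),
        // 2. Run the instrumented binary to produce `.profraw` files.
        binary.to_string(),
        // 3. Merge the raw profiles into a single `.profdata` file.
        format!("llvm-profdata merge -o {} {}", merged, profile_dir),
        // 4. Rebuild, letting the optimizer consume the merged profile.
        format!("rustc -Cprofile-use={} {}", merged, source),
    ]
}
```

Calling `pgo_workflow("main.rs", "./main", "/tmp/pgo-data")` yields the commands in the order they would be run in a shell.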
|
||||
|
||||
### Compile-Time Aspects
|
||||
|
||||
Depending on which step in the above workflow we are in, two different things
|
||||
can happen at compile time:
|
||||
|
||||
#### Create Binaries with Instrumentation
|
||||
|
||||
As mentioned above, the profiling instrumentation is added by LLVM.
|
||||
`rustc` instructs LLVM to do so [by setting the appropriate][pgo-gen-passmanager]
|
||||
flags when creating LLVM `PassManager`s:
|
||||
|
||||
```C
|
||||
// `PMBR` is an `LLVMPassManagerBuilderRef`
|
||||
unwrap(PMBR)->EnablePGOInstrGen = true;
|
||||
// Instrumented binaries have a default output path for the `.profraw` file
|
||||
// hard-coded into them:
|
||||
unwrap(PMBR)->PGOInstrGen = PGOGenPath;
|
||||
```
|
||||
|
||||
`rustc` also has to make sure that some of the symbols from LLVM's profiling
|
||||
runtime are not removed [by marking them with the right export level][pgo-gen-symbols].
|
||||
|
||||
[pgo-gen-passmanager]: https://github.com/rust-lang/rust/blob/1.34.1/src/rustllvm/PassWrapper.cpp#L412-L416
|
||||
[pgo-gen-symbols]: https://github.com/rust-lang/rust/blob/1.34.1/src/librustc_codegen_ssa/back/symbol_export.rs#L212-L225
|
||||
|
||||
|
||||
#### Compile Binaries Where Optimizations Make Use Of Profiling Data
|
||||
|
||||
In the final step of the workflow described above, the program is compiled
|
||||
again, with the compiler using the gathered profiling data in order to drive
|
||||
optimization decisions. `rustc` again leaves most of the work to LLVM here,
|
||||
basically [just telling][pgo-use-passmanager] the LLVM `PassManagerBuilder`
|
||||
where the profiling data can be found:
|
||||
|
||||
```C
|
||||
unwrap(PMBR)->PGOInstrUse = PGOUsePath;
|
||||
```
|
||||
|
||||
[pgo-use-passmanager]: https://github.com/rust-lang/rust/blob/1.34.1/src/rustllvm/PassWrapper.cpp#L417-L420
|
||||
|
||||
LLVM does the rest (e.g. setting branch weights, marking functions with
|
||||
`cold` or `inlinehint`, etc.).
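
For intuition, the kinds of hints LLVM derives from profiling data resemble what a programmer can write by hand with Rust's `#[cold]` and `#[inline]` attributes. The sketch below is only an analogy with hypothetical function names; with PGO, such decisions come from measured execution counts instead of manual annotations.

```rust
/// Rarely executed error path, marked by hand as unlikely to be called.
#[cold]
fn report_overflow(a: u32, b: u32) -> u32 {
    eprintln!("overflow adding {} and {}", a, b);
    u32::MAX
}

/// Small, frequently called helper, marked by hand as an inlining candidate.
#[inline]
fn add_saturating(a: u32, b: u32) -> u32 {
    match a.checked_add(b) {
        Some(sum) => sum,
        None => report_overflow(a, b),
    }
}

/// Sums all values, clamping at `u32::MAX` if an addition overflows.
pub fn sum_all(values: &[u32]) -> u32 {
    values.iter().fold(0, |acc, &v| add_saturating(acc, v))
}
```

A profile showing that `report_overflow` is almost never reached would let the optimizer arrive at similar conclusions without either attribute being present in the source.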
|
||||
|
||||
|
||||
### Runtime Aspects
|
||||
|
||||
Instrumentation-based approaches always have a runtime component as well, i.e.
|
||||
once we have an instrumented program, that program needs to be run in order
|
||||
to generate profiling data, and collecting and persisting this profiling
|
||||
data needs some infrastructure in place.
|
||||
|
||||
In the case of LLVM, these runtime components are implemented in
|
||||
[compiler-rt][compiler-rt-profile] and statically linked into any instrumented
|
||||
binaries.
|
||||
The `rustc` version of this can be found in `src/libprofiler_builtins` which
|
||||
basically packs the C code from `compiler-rt` into a Rust crate.
|
||||
|
||||
In order for `libprofiler_builtins` to be built, `profiler = true` must be set
|
||||
in `rustc`'s `config.toml`.
|
||||
|
||||
[compiler-rt-profile]: https://github.com/llvm/llvm-project/tree/master/compiler-rt/lib/profile
|
||||
|
||||
## Testing PGO
|
||||
|
||||
Since the PGO workflow spans multiple compiler invocations, most testing happens
|
||||
in [run-make tests][rmake-tests] (the relevant tests have `pgo` in their name).
|
||||
There is also a [codegen test][codegen-test] that checks that some expected
|
||||
instrumentation artifacts show up in LLVM IR.
|
||||
|
||||
[rmake-tests]: https://github.com/rust-lang/rust/tree/master/src/test/run-make-fulldeps
|
||||
[codegen-test]: https://github.com/rust-lang/rust/blob/master/src/test/codegen/pgo-instrumentation.rs
|
||||
|
||||
## Additional Information
|
||||
|
||||
Clang's documentation contains a good overview of PGO in LLVM here:
|
||||
https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
|
||||