various nits from mark-i-m

This commit is contained in:
Niko Matsakis 2018-08-31 13:46:13 -04:00 committed by Who? Me?!
parent 9ecda8c863
commit 860e34b13a
1 changed files with 20 additions and 18 deletions

View File

@ -1,6 +1,6 @@
# Profiling with perf # Profiling with perf
sThis is a guide for how to profile rustc with perf. This is a guide for how to profile rustc with [perf](https://perf.wiki.kernel.org/index.php/Main_Page).
## Initial steps ## Initial steps
@ -11,7 +11,7 @@ sThis is a guide for how to profile rustc with perf.
- leave everything else the defaults - leave everything else the defaults
- Run `./x.py build` to get a full build - Run `./x.py build` to get a full build
- Make a rustup toolchain (let's call it `rust-prof`) pointing to that result - Make a rustup toolchain (let's call it `rust-prof`) pointing to that result
- `rustup toolchain link` XXX - `rustup toolchain link <path-to-toolchain>`
## Gathering a perf profile ## Gathering a perf profile
@ -29,9 +29,11 @@ perf record -F99 --call-graph dwarf XXX
``` ```
The `-F99` tells perf to sample at 99 Hz, which avoids generating too The `-F99` tells perf to sample at 99 Hz, which avoids generating too
much data for longer runs. The `--call-graph dwarf` tells perf to get much data for longer runs (why 99 Hz you ask? No particular reason, it
call-graph information from debuginfo, which is accurate. The `XXX` is just seems to work well for me). The `--call-graph dwarf` tells perf
the command you want to profile. So, for example, you might do: to get call-graph information from debuginfo, which is accurate. The
`XXX` is the command you want to profile. So, for example, you might
do:
``` ```
perf record -F99 --call-graph dwarf cargo +rust-prof rustc perf record -F99 --call-graph dwarf cargo +rust-prof rustc
@ -42,6 +44,7 @@ to run `cargo`. But there are some things to be aware of:
- You probably don't want to profile the time spend building - You probably don't want to profile the time spend building
dependencies. So something like `cargo build; cargo clean -p $C` may dependencies. So something like `cargo build; cargo clean -p $C` may
be helpful (where `$C` is the crate name) be helpful (where `$C` is the crate name)
- Though usually I just do `touch src/lib.rs` and rebuild instead. =)
- You probably don't want incremental messing about with your - You probably don't want incremental messing about with your
profile. So something like `CARGO_INCREMENTAL=0` can be helpful. profile. So something like `CARGO_INCREMENTAL=0` can be helpful.
@ -89,8 +92,7 @@ CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile ch
Note that final command: it's a doozy! It uses the `cargo rustc` Note that final command: it's a doozy! It uses the `cargo rustc`
command, which executes rustc with (potentially) additional options; command, which executes rustc with (potentially) additional options;
the `--profile check` and `--lib` options specify that we are doing a the `--profile check` and `--lib` options specify that we are doing a
`cargo check` execution, and that this is a library (not an `cargo check` execution, and that this is a library (not a binary).
execution).
At this point, we can use `perf` tooling to analyze the results. For example: At this point, we can use `perf` tooling to analyze the results. For example:
@ -110,7 +112,8 @@ can be helpful; it is covered below.
### Gathering NLL data ### Gathering NLL data
If you want to profile an NLL run, you can just pass extra options to the `cargo rustc` command. The actual perf site just uses `-Zborrowck=mir`, which we can simulate like so: If you want to profile an NLL run, you can just pass extra options to
the `cargo rustc` command, like so:
```bash ```bash
touch src/lib.rs touch src/lib.rs
@ -128,18 +131,18 @@ simple but useful tool that lets you answer queries like:
- "how much time was spent in function F" (no matter where it was called from) - "how much time was spent in function F" (no matter where it was called from)
- "how much time was spent in function F when it was called from G" - "how much time was spent in function F when it was called from G"
- "how much time was spent in function F *excluding* time spent in G" - "how much time was spent in function F *excluding* time spent in G"
- "what fns does F call and how much time does it spend in them" - "what functions does F call and how much time does it spend in them"
To understand how it works, you have to know just a bit about To understand how it works, you have to know just a bit about
perf. Basically, perf works by *sampling* your process on a regular perf. Basically, perf works by *sampling* your process on a regular
basis (or whenever some event occurs). For each sample, perf gathers a basis (or whenever some event occurs). For each sample, perf gathers a
backtrace. `perf focus` lets you write a regular expression that tests backtrace. `perf focus` lets you write a regular expression that tests
which fns appear in that backtrace, and then tells you which which functions appear in that backtrace, and then tells you which
percentage of samples had a backtrace that met the regular percentage of samples had a backtrace that met the regular
expression. It's probably easiest to explain by walking through how I expression. It's probably easiest to explain by walking through how I
would analyze NLL performance. would analyze NLL performance.
## Installing `perf-focus` ### Installing `perf-focus`
You can install perf-focus using `cargo install`: You can install perf-focus using `cargo install`:
@ -147,7 +150,7 @@ You can install perf-focus using `cargo install`:
cargo install perf-focus cargo install perf-focus
``` ```
## Example: How much time is spent in MIR borrowck? ### Example: How much time is spent in MIR borrowck?
Let's say we've gathered the NLL data for a test. We'd like to know Let's say we've gathered the NLL data for a test. We'd like to know
how much time it is spending in the MIR borrow-checker. The "main" how much time it is spending in the MIR borrow-checker. The "main"
@ -175,7 +178,7 @@ samples where `do_mir_borrowck` was on the stack: in this case, 29%.
currently executes `perf script` (perhaps there is a better currently executes `perf script` (perhaps there is a better
way...). I've sometimes found that `perf script` outputs C++ mangled way...). I've sometimes found that `perf script` outputs C++ mangled
names. This is annoying. You can tell by running `perf script | names. This is annoying. You can tell by running `perf script |
head` yourself -- if you see named like `5rustc6middle` instead of head` yourself -- if you see names like `5rustc6middle` instead of
`rustc::middle`, then you have the same problem. You can solve this `rustc::middle`, then you have the same problem. You can solve this
by doing: by doing:
@ -190,7 +193,7 @@ stdin, rather than executing `perf focus`. We should make this more
convenient (at worst, maybe add a `c++filt` option to `perf focus`, or convenient (at worst, maybe add a `c++filt` option to `perf focus`, or
just always use it -- it's pretty harmless). just always use it -- it's pretty harmless).
## Example: How much time does MIR borrowck spend solving traits? ### Example: How much time does MIR borrowck spend solving traits?
Perhaps we'd like to know how much time MIR borrowck spends in the Perhaps we'd like to know how much time MIR borrowck spends in the
trait checker. We can ask this using a more complex regex: trait checker. We can ask this using a more complex regex:
@ -215,7 +218,7 @@ If you're curious, you can find out exactly which samples by using the
each sample. The `|` at the front of the line indicates the part that each sample. The `|` at the front of the line indicates the part that
the regular expression matched. the regular expression matched.
## Example: Where does MIR borrowck spend its time? ### Example: Where does MIR borrowck spend its time?
Often we want to do a more "explorational" queries. Like, we know that Often we want to do a more "explorational" queries. Like, we know that
MIR borrowck is 29% of the time, but where does that time get spent? MIR borrowck is 29% of the time, but where does that time get spent?
@ -258,7 +261,7 @@ altogether ("total") and the percent of time spent in **just that
function and not some callee of that function** (self). Usually function and not some callee of that function** (self). Usually
"total" is the more interesting number, but not always. "total" is the more interesting number, but not always.
### Absolute vs relative percentages ### Relative percentages
By default, all in perf-focus are relative to the **total program By default, all in perf-focus are relative to the **total program
execution**. This is useful to help you keep perspective -- often as execution**. This is useful to help you keep perspective -- often as
@ -270,8 +273,7 @@ are easily compared against one another.
That said, sometimes it's useful to get relative percentages, so `perf That said, sometimes it's useful to get relative percentages, so `perf
focus` offers a `--relative` option. In this case, the percentages are focus` offers a `--relative` option. In this case, the percentages are
listed only for samples that match (vs all samples). So for example we listed only for samples that match (vs all samples). So for example we
could find out get our percentages relative to the borrowck itself could get our percentages relative to the borrowck itself like so:
like so:
```bash ```bash
> perf focus '{do_mir_borrowck}' --tree-callees --relative --tree-max-depth 1 --tree-min-percent 5 > perf focus '{do_mir_borrowck}' --tree-callees --relative --tree-max-depth 1 --tree-min-percent 5