various nits from mark-i-m
This commit is contained in:
parent
9ecda8c863
commit
860e34b13a
|
|
@ -1,6 +1,6 @@
|
||||||
# Profiling with perf
|
# Profiling with perf
|
||||||
|
|
||||||
sThis is a guide for how to profile rustc with perf.
|
This is a guide for how to profile rustc with [perf](https://perf.wiki.kernel.org/index.php/Main_Page).
|
||||||
|
|
||||||
## Initial steps
|
## Initial steps
|
||||||
|
|
||||||
|
|
@ -11,7 +11,7 @@ sThis is a guide for how to profile rustc with perf.
|
||||||
- leave everything else the defaults
|
- leave everything else the defaults
|
||||||
- Run `./x.py build` to get a full build
|
- Run `./x.py build` to get a full build
|
||||||
- Make a rustup toolchain (let's call it `rust-prof`) pointing to that result
|
- Make a rustup toolchain (let's call it `rust-prof`) pointing to that result
|
||||||
- `rustup toolchain link` XXX
|
- `rustup toolchain link <path-to-toolchain>`
|
||||||
|
|
||||||
## Gathering a perf profile
|
## Gathering a perf profile
|
||||||
|
|
||||||
|
|
@ -29,9 +29,11 @@ perf record -F99 --call-graph dwarf XXX
|
||||||
```
|
```
|
||||||
|
|
||||||
The `-F99` tells perf to sample at 99 Hz, which avoids generating too
|
The `-F99` tells perf to sample at 99 Hz, which avoids generating too
|
||||||
much data for longer runs. The `--call-graph dwarf` tells perf to get
|
much data for longer runs (why 99 Hz you ask? No particular reason, it
|
||||||
call-graph information from debuginfo, which is accurate. The `XXX` is
|
just seems to work well for me). The `--call-graph dwarf` tells perf
|
||||||
the command you want to profile. So, for example, you might do:
|
to get call-graph information from debuginfo, which is accurate. The
|
||||||
|
`XXX` is the command you want to profile. So, for example, you might
|
||||||
|
do:
|
||||||
|
|
||||||
```
|
```
|
||||||
perf record -F99 --call-graph dwarf cargo +rust-prof rustc
|
perf record -F99 --call-graph dwarf cargo +rust-prof rustc
|
||||||
|
|
@ -42,6 +44,7 @@ to run `cargo`. But there are some things to be aware of:
|
||||||
- You probably don't want to profile the time spend building
|
- You probably don't want to profile the time spend building
|
||||||
dependencies. So something like `cargo build; cargo clean -p $C` may
|
dependencies. So something like `cargo build; cargo clean -p $C` may
|
||||||
be helpful (where `$C` is the crate name)
|
be helpful (where `$C` is the crate name)
|
||||||
|
- Though usually I just do `touch src/lib.rs` and rebuild instead. =)
|
||||||
- You probably don't want incremental messing about with your
|
- You probably don't want incremental messing about with your
|
||||||
profile. So something like `CARGO_INCREMENTAL=0` can be helpful.
|
profile. So something like `CARGO_INCREMENTAL=0` can be helpful.
|
||||||
|
|
||||||
|
|
@ -89,8 +92,7 @@ CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile ch
|
||||||
Note that final command: it's a doozy! It uses the `cargo rustc`
|
Note that final command: it's a doozy! It uses the `cargo rustc`
|
||||||
command, which executes rustc with (potentially) additional options;
|
command, which executes rustc with (potentially) additional options;
|
||||||
the `--profile check` and `--lib` options specify that we are doing a
|
the `--profile check` and `--lib` options specify that we are doing a
|
||||||
`cargo check` execution, and that this is a library (not an
|
`cargo check` execution, and that this is a library (not a binary).
|
||||||
execution).
|
|
||||||
|
|
||||||
At this point, we can use `perf` tooling to analyze the results. For example:
|
At this point, we can use `perf` tooling to analyze the results. For example:
|
||||||
|
|
||||||
|
|
@ -110,7 +112,8 @@ can be helpful; it is covered below.
|
||||||
|
|
||||||
### Gathering NLL data
|
### Gathering NLL data
|
||||||
|
|
||||||
If you want to profile an NLL run, you can just pass extra options to the `cargo rustc` command. The actual perf site just uses `-Zborrowck=mir`, which we can simulate like so:
|
If you want to profile an NLL run, you can just pass extra options to
|
||||||
|
the `cargo rustc` command, like so:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
touch src/lib.rs
|
touch src/lib.rs
|
||||||
|
|
@ -128,18 +131,18 @@ simple but useful tool that lets you answer queries like:
|
||||||
- "how much time was spent in function F" (no matter where it was called from)
|
- "how much time was spent in function F" (no matter where it was called from)
|
||||||
- "how much time was spent in function F when it was called from G"
|
- "how much time was spent in function F when it was called from G"
|
||||||
- "how much time was spent in function F *excluding* time spent in G"
|
- "how much time was spent in function F *excluding* time spent in G"
|
||||||
- "what fns does F call and how much time does it spend in them"
|
- "what functions does F call and how much time does it spend in them"
|
||||||
|
|
||||||
To understand how it works, you have to know just a bit about
|
To understand how it works, you have to know just a bit about
|
||||||
perf. Basically, perf works by *sampling* your process on a regular
|
perf. Basically, perf works by *sampling* your process on a regular
|
||||||
basis (or whenever some event occurs). For each sample, perf gathers a
|
basis (or whenever some event occurs). For each sample, perf gathers a
|
||||||
backtrace. `perf focus` lets you write a regular expression that tests
|
backtrace. `perf focus` lets you write a regular expression that tests
|
||||||
which fns appear in that backtrace, and then tells you which
|
which functions appear in that backtrace, and then tells you which
|
||||||
percentage of samples had a backtrace that met the regular
|
percentage of samples had a backtrace that met the regular
|
||||||
expression. It's probably easiest to explain by walking through how I
|
expression. It's probably easiest to explain by walking through how I
|
||||||
would analyze NLL performance.
|
would analyze NLL performance.
|
||||||
|
|
||||||
## Installing `perf-focus`
|
### Installing `perf-focus`
|
||||||
|
|
||||||
You can install perf-focus using `cargo install`:
|
You can install perf-focus using `cargo install`:
|
||||||
|
|
||||||
|
|
@ -147,7 +150,7 @@ You can install perf-focus using `cargo install`:
|
||||||
cargo install perf-focus
|
cargo install perf-focus
|
||||||
```
|
```
|
||||||
|
|
||||||
## Example: How much time is spent in MIR borrowck?
|
### Example: How much time is spent in MIR borrowck?
|
||||||
|
|
||||||
Let's say we've gathered the NLL data for a test. We'd like to know
|
Let's say we've gathered the NLL data for a test. We'd like to know
|
||||||
how much time it is spending in the MIR borrow-checker. The "main"
|
how much time it is spending in the MIR borrow-checker. The "main"
|
||||||
|
|
@ -175,7 +178,7 @@ samples where `do_mir_borrowck` was on the stack: in this case, 29%.
|
||||||
currently executes `perf script` (perhaps there is a better
|
currently executes `perf script` (perhaps there is a better
|
||||||
way...). I've sometimes found that `perf script` outputs C++ mangled
|
way...). I've sometimes found that `perf script` outputs C++ mangled
|
||||||
names. This is annoying. You can tell by running `perf script |
|
names. This is annoying. You can tell by running `perf script |
|
||||||
head` yourself -- if you see named like `5rustc6middle` instead of
|
head` yourself -- if you see names like `5rustc6middle` instead of
|
||||||
`rustc::middle`, then you have the same problem. You can solve this
|
`rustc::middle`, then you have the same problem. You can solve this
|
||||||
by doing:
|
by doing:
|
||||||
|
|
||||||
|
|
@ -190,7 +193,7 @@ stdin, rather than executing `perf focus`. We should make this more
|
||||||
convenient (at worst, maybe add a `c++filt` option to `perf focus`, or
|
convenient (at worst, maybe add a `c++filt` option to `perf focus`, or
|
||||||
just always use it -- it's pretty harmless).
|
just always use it -- it's pretty harmless).
|
||||||
|
|
||||||
## Example: How much time does MIR borrowck spend solving traits?
|
### Example: How much time does MIR borrowck spend solving traits?
|
||||||
|
|
||||||
Perhaps we'd like to know how much time MIR borrowck spends in the
|
Perhaps we'd like to know how much time MIR borrowck spends in the
|
||||||
trait checker. We can ask this using a more complex regex:
|
trait checker. We can ask this using a more complex regex:
|
||||||
|
|
@ -215,7 +218,7 @@ If you're curious, you can find out exactly which samples by using the
|
||||||
each sample. The `|` at the front of the line indicates the part that
|
each sample. The `|` at the front of the line indicates the part that
|
||||||
the regular expression matched.
|
the regular expression matched.
|
||||||
|
|
||||||
## Example: Where does MIR borrowck spend its time?
|
### Example: Where does MIR borrowck spend its time?
|
||||||
|
|
||||||
Often we want to do a more "explorational" queries. Like, we know that
|
Often we want to do a more "explorational" queries. Like, we know that
|
||||||
MIR borrowck is 29% of the time, but where does that time get spent?
|
MIR borrowck is 29% of the time, but where does that time get spent?
|
||||||
|
|
@ -258,7 +261,7 @@ altogether ("total") and the percent of time spent in **just that
|
||||||
function and not some callee of that function** (self). Usually
|
function and not some callee of that function** (self). Usually
|
||||||
"total" is the more interesting number, but not always.
|
"total" is the more interesting number, but not always.
|
||||||
|
|
||||||
### Absolute vs relative percentages
|
### Relative percentages
|
||||||
|
|
||||||
By default, all in perf-focus are relative to the **total program
|
By default, all in perf-focus are relative to the **total program
|
||||||
execution**. This is useful to help you keep perspective -- often as
|
execution**. This is useful to help you keep perspective -- often as
|
||||||
|
|
@ -270,8 +273,7 @@ are easily compared against one another.
|
||||||
That said, sometimes it's useful to get relative percentages, so `perf
|
That said, sometimes it's useful to get relative percentages, so `perf
|
||||||
focus` offers a `--relative` option. In this case, the percentages are
|
focus` offers a `--relative` option. In this case, the percentages are
|
||||||
listed only for samples that match (vs all samples). So for example we
|
listed only for samples that match (vs all samples). So for example we
|
||||||
could find out get our percentages relative to the borrowck itself
|
could get our percentages relative to the borrowck itself like so:
|
||||||
like so:
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
> perf focus '{do_mir_borrowck}' --tree-callees --relative --tree-max-depth 1 --tree-min-percent 5
|
> perf focus '{do_mir_borrowck}' --tree-callees --relative --tree-max-depth 1 --tree-min-percent 5
|
||||||
|
|
|
||||||
Loading…
Reference in New Issue