add a section about profiling with perf

2018-08-31 12:57:51 -04:00 · 2018-08-31 12:57:51 -04:00 · 9ecda8c863
parent 3cd4413429
commit 9ecda8c863
3 changed files with 306 additions and 1 deletions
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@ -4,13 +4,15 @@
 - [About the compiler team](./compiler-team.md)
 - [How to build the compiler and run what you built](./how-to-build-and-run.md)
 - [Coding conventions](./conventions.md)
 - [Walkthrough: a typical contribution](./walkthrough.md)
 - [The compiler testing framework](./tests/intro.md)
    - [Running tests](./tests/running.md)
    - [Adding new tests](./tests/adding.md)
    - [Using `compiletest` + commands to control test
      execution](./compiletest.md)
 - [Debugging the Compiler](./compiler-debugging.md)
- [Walkthrough: a typical contribution](./walkthrough.md)
+- [Profiling the compiler](./profiling.md)
    - [with the linux perf tool](./profiling/with_perf.md)
 - [High-level overview of the compiler source](./high-level-overview.md)
 - [The Rustc Driver](./rustc-driver.md)
    - [Rustdoc](./rustdoc.md)
--- a/src/profiling.md
+++ b/src/profiling.md
@ -0,0 +1,9 @@
 # Profiling the compiler
 This discussion talks about how profile the compiler and find out
 where it spends its time.  If you just want to get a general overview,
 it is often a good idea to just add `-Zself-profile` option to the
 rustc command line. This will break down time spent into various
 categories.  But if you want a more detailed look, you probably want
 to break out a custom profiler.
--- a/src/profiling/with_perf.md
+++ b/src/profiling/with_perf.md
@ -0,0 +1,294 @@
 # Profiling with perf
 sThis is a guide for how to profile rustc with perf.
 ## Initial steps
 - Get a clean checkout of rust-lang/master, or whatever it is you want to profile.
 - Set the following settings in your `config.toml`:
  - `debuginfo-lines = true`
  - `use-jemalloc = false` -- lets you do memory use profiling with valgrind 
  - leave everything else the defaults
 - Run `./x.py build` to get a full build
 - Make a rustup toolchain (let's call it `rust-prof`) pointing to that result
  - `rustup toolchain link` XXX
 ## Gathering a perf profile
 perf is an excellent tool on linux that can be used to gather and
 analyze all kinds of information. Mostly it is used to figure out
 where a program spends its time. It can also be used for other sorts
 of events, though, like cache misses and so forth.
 ### The basics
 The basic `perf` command is this:
 ```
 perf record -F99 --call-graph dwarf XXX
 ```
 The `-F99` tells perf to sample at 99 Hz, which avoids generating too
 much data for longer runs. The `--call-graph dwarf` tells perf to get
 call-graph information from debuginfo, which is accurate. The `XXX` is
 the command you want to profile. So, for example, you might do:
 ```
 perf record -F99 --call-graph dwarf cargo +rust-prof rustc
 ```
 to run `cargo`. But there are some things to be aware of:
 - You probably don't want to profile the time spend building
  dependencies. So something like `cargo build; cargo clean -p $C` may
  be helpful (where `$C` is the crate name)
 - You probably don't want incremental messing about with your
  profile. So something like `CARGO_INCREMENTAL=0` can be helpful.
 ### Gathering a perf profile from a `perf.rust-lang.org` test
 Often we want to analyze a specific test from `perf.rust-lang.org`. To
 do that, the first step is to clone
 [the rustc-perf repository][rustc-perf-gh]:
 ```bash
 > git clone https://github.com/rust-lang-nursery/rustc-perf
 ```
 [rustc-perf-gh]: https://github.com/rust-lang-nursery/rustc-perf
 This repo contains a bunch of stuff, but the sources for the tests are
 found in [the `collector/benchmarks` directory][dir]. So let's go into
 the directory of a specific test; we'll use `clap-rs` as an example:
 [dir]: https://github.com/rust-lang-nursery/rustc-perf/tree/master/collector/benchmarks
 ```bash
 cd collector/benchmarks/clap-rs
 ```
 In this case, let's say we want to profile the `cargo check`
 performance. In that case, I would first run some basic commands to
 build the dependencies:
 ```bash
 # Setup: first clean out any old results and build the dependencies:
 cargo +rust-prof clean 
 CARGO_INCREMENTAL=0 cargo +rust-prof check 
 ```
 Next: we want record the execution time for *just* the clap-rs crate,
 running cargo check. I tend to use `cargo rustc` for this, since it
 also allows me to add explicit flags, which we'll do later on.
 ```bash
 touch src/lib.rs
 CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile check --lib
 ```
 Note that final command: it's a doozy! It uses the `cargo rustc`
 command, which executes rustc with (potentially) additional options;
 the `--profile check` and `--lib` options specify that we are doing a
 `cargo check` execution, and that this is a library (not an
 execution).
 At this point, we can use `perf` tooling to analyze the results. For example:
 ```bash
 > perf report
 ```
 will open up an interactive TUI program. In simple cases, that can be
 helpful. For more detailed examination, the [`perf-focus` tool][pf]
 can be helpful; it is covered below.
 **A note of caution.** Each of the rustc-perf tests is its own special
  snowflake. In particular, some of them are not libraries, in which
  case you would want to do `touch src/main.rs` and avoid passing
  `--lib`. I'm not sure how best to tell which test is which to be
  honest.
 ### Gathering NLL data
 If you want to profile an NLL run, you can just pass extra options to the `cargo rustc` command. The actual perf site just uses `-Zborrowck=mir`, which we can simulate like so:
 ```bash
 touch src/lib.rs
 CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile check --lib -- -Zborrowck=mir
 ```
 [pf]: https://github.com/nikomatsakis/perf-focus
 ## Analyzing a perf profile with `perf focus`
 Once you've gathered a perf profile, we want to get some information
 about it. For this, I personally use [perf focus][pf]. It's a kind of
 simple but useful tool that lets you answer queries like:
 - "how much time was spent in function F" (no matter where it was called from)
 - "how much time was spent in function F when it was called from G"
 - "how much time was spent in function F *excluding* time spent in G"
 - "what fns does F call and how much time does it spend in them"
 To understand how it works, you have to know just a bit about
 perf. Basically, perf works by *sampling* your process on a regular
 basis (or whenever some event occurs). For each sample, perf gathers a
 backtrace. `perf focus` lets you write a regular expression that tests
 which fns appear in that backtrace, and then tells you which
 percentage of samples had a backtrace that met the regular
 expression. It's probably easiest to explain by walking through how I
 would analyze NLL performance.
 ## Installing `perf-focus`
 You can install perf-focus using `cargo install`:
 ```
 cargo install perf-focus
 ```
 ## Example: How much time is spent in MIR borrowck?
 Let's say we've gathered the NLL data for a test. We'd like to know
 how much time it is spending in the MIR borrow-checker. The "main"
 function of the MIR borrowck is called `do_mir_borrowck`, so we can do
 this command:
 ```bash
 > perf focus '{do_mir_borrowck}'
 Matcher    : {do_mir_borrowck}
 Matches    : 228
 Not Matches: 542
 Percentage : 29%
 ```
 The `'{do_mir_borrowck}'` argument is called the **matcher**. It
 specifies the test to be applied on the backtrace. In this case, the
 `{X}` indicates that there must be *some* function on the backtrace
 that meets the regular expression `X`. In this case, that regex is
 just the name of the fn we want (in fact, it's a subset of the name;
 the full name includes a bunch of other stuff, like the module
 path). In this mode, perf-focus just prints out the percentage of
 samples where `do_mir_borrowck` was on the stack: in this case, 29%.
 **A note about c++filt.** To get the data from `perf`, `perf focus`
  currently executes `perf script` (perhaps there is a better
  way...). I've sometimes found that `perf script` outputs C++ mangled
  names. This is annoying. You can tell by running `perf script |
  head` yourself -- if you see named like `5rustc6middle` instead of
  `rustc::middle`, then you have the same problem. You can solve this
  by doing:
 ```bash
 > perf script | c++filt | perf focus --from-stdin ...
 ```
 This will pipe the output from `perf script` through `c++filt` and
 should mostly convert those names into a more friendly format. The
 `--from-stdin` flag to `perf focus` tells it to get its data from
 stdin, rather than executing `perf focus`. We should make this more
 convenient (at worst, maybe add a `c++filt` option to `perf focus`, or
 just always use it -- it's pretty harmless).
 ## Example: How much time does MIR borrowck spend solving traits?
 Perhaps we'd like to know how much time MIR borrowck spends in the
 trait checker. We can ask this using a more complex regex:
 ```bash
 > perf focus '{do_mir_borrowck}..{^rustc::traits}'
 Matcher    : {do_mir_borrowck},..{^rustc::traits}
 Matches    : 12
 Not Matches: 1311
 Percentage : 0%
 ```
 Here we used the `..` operator to ask "how often do we have
 `do_mir_borrowck` on the stack and then, later, some fn whose name
 begins with `rusc::traits`?" (basically, code in that module). It
 turns out the answer is "almost never" -- only 12 samples fit that
 description (if you ever see *no* samples, that often indicates your
 query is messed up).
 If you're curious, you can find out exactly which samples by using the
 `--print-match` option. This will print out the full backtrace for
 each sample. The `|` at the front of the line indicates the part that
 the regular expression matched.
 ## Example: Where does MIR borrowck spend its time?
 Often we want to do a more "explorational" queries. Like, we know that
 MIR borrowck is 29% of the time, but where does that time get spent?
 For that, the `--tree-callees` option is often the best tool. You
 usually also want to give `--tree-min-percent` or
 `--tree-max-depth`. The result looks like this:
 ```bash
 > perf focus '{do_mir_borrowck}' --tree-callees --tree-min-percent 3
 Matcher    : {do_mir_borrowck}
 Matches    : 577
 Not Matches: 746
 Percentage : 43%
 Tree
 | matched `{do_mir_borrowck}` (43% total, 0% self)
 : | rustc_mir::borrow_check::nll::compute_regions (20% total, 0% self)
 : : | rustc_mir::borrow_check::nll::type_check::type_check_internal (13% total, 0% self)
 : : : | core::ops::function::FnOnce::call_once (5% total, 0% self)
 : : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (5% total, 3% self)
 : : : | <rustc_mir::borrow_check::nll::type_check::TypeVerifier<'a, 'b, 'gcx, 'tcx> as rustc::mir::visit::Visitor<'tcx>>::visit_mir (3% total, 0% self)
 : | rustc::mir::visit::Visitor::visit_mir (8% total, 6% self)
 : | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'gcx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (5% total, 0% self)
 : | rustc_mir::dataflow::do_dataflow (3% total, 0% self)
 ```
 What happens with `--tree-callees` is that
 - we find each sample matching the regular expression
 - we look at the code that is occurs *after* the regex match and try to build up a call tree
 The `--tree-min-percent 3` option says "only show me things that take
 more than 3% of the time. Without this, the tree often gets really
 noisy and includes random stuff like the innards of
 malloc. `--tree-max-depth` can be useful too, it just limits how many
 levels we print.
 For each line, we display the percent of time in that function
 altogether ("total") and the percent of time spent in **just that
 function and not some callee of that function** (self). Usually
 "total" is the more interesting number, but not always.
 ### Absolute vs relative percentages
 By default, all in perf-focus are relative to the **total program
 execution**. This is useful to help you keep perspective -- often as
 we drill down to find hot spots, we can lose sight of the fact that,
 in terms of overall program execution, this "hot spot" is actually not
 important. It also ensures that percentages between different queries
 are easily compared against one another.
 That said, sometimes it's useful to get relative percentages, so `perf
 focus` offers a `--relative` option. In this case, the percentages are
 listed only for samples that match (vs all samples). So for example we
 could find out get our percentages relative to the borrowck itself
 like so:
 ```bash
 > perf focus '{do_mir_borrowck}' --tree-callees --relative --tree-max-depth 1 --tree-min-percent 5
 Matcher    : {do_mir_borrowck}
 Matches    : 577
 Not Matches: 746
 Percentage : 100%
 Tree
 | matched `{do_mir_borrowck}` (100% total, 0% self)
 : | rustc_mir::borrow_check::nll::compute_regions (47% total, 0% self) [...]
 : | rustc::mir::visit::Visitor::visit_mir (19% total, 15% self) [...]
 : | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'gcx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (13% total, 0% self) [...]
 : | rustc_mir::dataflow::do_dataflow (8% total, 1% self) [...]
 ```
 Here you see that `compute_regions` came up as "47% total" -- that
 means that 47% of `do_mir_borrowck` is spent in that function. Before,
 we saw 20% -- that's because `do_mir_borrowck` itself is only 43% of
 the total time (and `.47 * .43 = .20`).