Add article on using WPA to profile rustc memory usage on Windows (#1074)

Document how to use WPA to profile rustc and what the normal workflow should be for investigating bootstrap memory usage issues.

Co-authored-by: Ryan Levick <ryan.levick@gmail.com>
parent 8bb61fe105
commit 91f8f334f4
@@ -23,6 +23,7 @@
 - [Debugging the Compiler](./compiler-debugging.md)
 - [Profiling the compiler](./profiling.md)
     - [with the linux perf tool](./profiling/with_perf.md)
+    - [with Windows Performance Analyzer](./profiling/wpa_profiling.md)
 - [crates.io Dependencies](./crates-io.md)
Binary file not shown. Size: 305 KiB
Binary file not shown. Size: 142 KiB
@@ -21,6 +21,10 @@ Depending on what you're trying to measure, there are several different approach
 eg. `cargo -Z timings build`.
 You can use this flag on the compiler itself with `CARGOFLAGS="-Z timings" ./x.py build`
 
+- If you want to profile memory usage, you can use various tools depending on what operating system
+  you are using.
+  - For Windows, read our [WPA guide](profiling/wpa_profiling.html).
+
 ## Optimizing rustc's bootstrap times with `cargo-llvm-lines`
 
 Using [cargo-llvm-lines](https://github.com/dtolnay/cargo-llvm-lines) you can count the
@@ -0,0 +1,108 @@
# Profiling on Windows

## Introducing WPR and WPA

High-level performance analysis (including memory usage) can be performed with the Windows
Performance Recorder (WPR) and Windows Performance Analyzer (WPA). As the names suggest, WPR is for
recording system statistics (in the form of Event Trace Log, a.k.a. ETL, files), while WPA is for
analyzing these ETL files.

WPR collects system-wide statistics, so it won't just record things relevant to rustc but also
everything else that's running on the machine. During analysis, we can filter to just the things we
find interesting.

These tools are quite powerful but also require a bit of learning
before we can successfully profile the Rust compiler.

Here we will explore how to use WPR and WPA for analyzing the Rust compiler as well as provide
links to useful "profiles" (i.e., settings files that tweak the defaults for WPR and WPA) that are
specifically designed to make analyzing rustc easier.

### Installing WPR and WPA

WPR and WPA are part of the Windows Performance Toolkit, which is an optional component of the
Windows Assessment and Deployment Kit (ADK). You can download the ADK
installer [here](https://go.microsoft.com/fwlink/?linkid=2086042). Make sure to select the Windows
Performance Toolkit (you don't need to select anything else).

## Recording

In order to perform system analysis, you'll first need to record your system with WPR. Open WPR and
at the bottom of the window select the "profiles" of the things you want to record. For looking
into memory usage of the rustc bootstrap process, we'll want to select the following items:

* CPU usage
* VirtualAlloc usage

You might be tempted to record "Heap usage" as well, but this records every single heap allocation
and can be very, very expensive. For high-level analysis, it might be best to leave that turned
off.

Now we need to get our setup ready to record. For memory usage analysis, it is best to record the
stage 2 compiler being built by a stage 1 compiler that has debug symbols. Having symbols in the
compiler we're using to build rustc will aid our analysis greatly by allowing WPA to resolve Rust
symbols correctly. Unfortunately, the stage 0 compiler does not have symbols turned on, which is why
we'll need to build a stage 1 compiler and then a stage 2 compiler ourselves.

To do this, make sure you have set `debuginfo-level = 1` in your `config.toml` file. This tells
rustc to generate debug information which includes stack frames when bootstrapping.
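
As a minimal sketch, the relevant `config.toml` fragment might look like the following. It assumes
the key lives in the `[rust]` table, as it does in the example config shipped with the compiler
sources; check your own checkout if the layout differs.

```toml
# config.toml (fragment; the [rust] table placement is an assumption)
[rust]
# Level 1 emits line tables only, which is enough to resolve stack frames.
debuginfo-level = 1
```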

Now you can build the stage 1 compiler: `python x.py build --stage 1 -i library/std` or however
else you want to build the stage 1 compiler.

Now that the stage 1 compiler is built, we can record the stage 2 build. Go back to WPR, click the
"start" button and build the stage 2 compiler (e.g., `python x.py build --stage=2 -i library/std`).
When this process finishes, stop the recording.

Click the Save button and once that process is complete, click the "Open in WPA" button which
appears.
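
If you would rather script the recording than click through the WPR window, the same steps can be
approximated with the `wpr` command-line tool. The sketch below is an illustration only: the
profile names `CPU` and `VirtualAllocation` are assumed to be the built-in equivalents of the "CPU
usage" and "VirtualAlloc usage" checkboxes (confirm with `wpr -profiles`), the helper names and the
output file name are hypothetical, and the script must be run from an elevated prompt in the root
of the Rust checkout.

```python
import subprocess
import sys

# Assumed built-in WPR profile names for "CPU usage" and "VirtualAlloc usage".
# Verify them on your machine with `wpr -profiles`.
PROFILES = ["CPU", "VirtualAllocation"]


def wpr_start_command(profiles):
    """Build the `wpr -start ...` command line for the given profiles."""
    cmd = ["wpr"]
    for profile in profiles:
        cmd += ["-start", profile]
    return cmd


def wpr_stop_command(etl_path):
    """Build the `wpr -stop <file>` command, which stops and saves the trace."""
    return ["wpr", "-stop", etl_path]


def record_stage2_build(etl_path="rustc-stage2.etl"):
    """Record a stage 2 build; `etl_path` is a hypothetical output name."""
    subprocess.run(wpr_start_command(PROFILES), check=True)
    try:
        # Same build command as in the text above.
        subprocess.run(
            [sys.executable, "x.py", "build", "--stage", "2", "-i", "library/std"],
            check=True,
        )
    finally:
        # Always stop the recording, even if the build fails.
        subprocess.run(wpr_stop_command(etl_path), check=True)
```

Calling `record_stage2_build()` should leave an ETL file in the current directory, which can then
be opened in WPA from the File menu instead of via the "Open in WPA" button.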

> Note: The trace file is fairly large so it can take WPA some time to finish opening the file.

## Analysis

Now that our ETL file is open in WPA, we can analyze the results. First, we'll want to apply the
pre-made "profile" which will put WPA into a state conducive to analyzing rustc bootstrap. Download
the profile [here](https://github.com/wesleywiser/rustc-bootstrap-wpa-analysis/releases/download/1/rustc.generic.wpaProfile).
Select the "Profiles" menu at the top, then "apply" and then choose the downloaded profile.

You should see something resembling the following:



Next, we will need to tell WPA to load and process debug symbols so that it can properly demangle
the Rust stack traces. To do this, click "Trace" and then choose "Load Symbols". This step can take
a while.

Once WPA has loaded symbols for rustc, we can expand the rustc.exe node and begin drilling down
into the stack with the largest allocations.

To do that, we'll expand the `[Root]` node in the "Commit Stack" column and continue expanding
until we find interesting stack frames.

> Tip: After selecting the node you want to expand, press the right arrow key. This will expand the
node and put the selection on the next largest node in the expanded set. You can continue pressing
the right arrow key until you reach an interesting frame.



In this sample, you can see that calls through codegen allocate ~30 GB of memory in total over the
course of this profile.

## Other Analysis Tabs

The profile also includes a few other tabs which can be helpful:

- System Configuration
  - General information about the system the capture was recorded on.
- rustc Build Processes
  - A flat list of relevant processes such as rustc.exe, cargo.exe, link.exe etc.
  - Each process lists its command line arguments.
  - Useful for figuring out what a specific rustc process was working on.
- rustc Build Process Tree
  - Timeline showing when processes started and exited.
- rustc CPU Analysis
  - Contains charts preconfigured to show hotspots in rustc.
  - These charts are designed to support analyzing where rustc is spending its time.
- rustc Memory Analysis
  - Contains charts preconfigured to show where rustc is allocating memory.