add notes about generating llvm ir

2018-07-15 18:49:03 -05:00 · 2018-07-15 18:49:03 -05:00 · e0d07aad5f
parent 894893860d
commit e0d07aad5f
1 changed files with 114 additions and 50 deletions
--- a/src/compiler-debugging.md
+++ b/src/compiler-debugging.md
@ -1,4 +1,4 @@
-**Note: This is copied from the 
+**Note: This is copied from the
 [rust-forge](https://github.com/rust-lang-nursery/rust-forge). If anything needs
 updating, please open an issue or make a PR on the github repo.**

@ -16,7 +16,7 @@ normal Rust programs.  IIRC backtraces **don't work** on Mac and on MinGW,
 sorry. If you have trouble or the backtraces are full of `unknown`,
 you might want to find some way to use Linux or MSVC on Windows.

-In the default configuration, you don't have line numbers enabled, so the 
+In the default configuration, you don't have line numbers enabled, so the
 backtrace looks like this:

 ```text
@ -36,8 +36,8 @@ stack backtrace:
  37: rustc_driver::run_compiler
 ```

-If you want line numbers for the stack trace, you can enable 
-`debuginfo-lines=true` or `debuginfo=true` in your config.toml and rebuild the 
+If you want line numbers for the stack trace, you can enable
+`debuginfo-lines=true` or `debuginfo=true` in your config.toml and rebuild the
 compiler. Then the backtrace will look like this:

 ```text
@ -110,16 +110,16 @@ note: rustc 1.24.0-dev running on x86_64-unknown-linux-gnu

 note: run with `RUST_BACKTRACE=1` for a backtrace

-thread 'rustc' panicked at 'encountered error with `-Z treat_err_as_bug', 
+thread 'rustc' panicked at 'encountered error with `-Z treat_err_as_bug',
 /home/user/rust/src/librustc_errors/lib.rs:411:12
-note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose 
+note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose
 backtrace.
 stack backtrace:
  (~~~ IRRELEVANT PART OF BACKTRACE REMOVED BY ME ~~~)
-   7: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'gcx, 
+   7: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'gcx,
             'tcx>>::report_selection_error
             at /home/user/rust/src/librustc/traits/error_reporting.rs:823
-   8: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'gcx, 
+   8: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'gcx,
             'tcx>>::report_fulfillment_errors
             at /home/user/rust/src/librustc/traits/error_reporting.rs:160
             at /home/user/rust/src/librustc/traits/error_reporting.rs:112
@ -136,7 +136,7 @@ $ # Cool, now I have a backtrace for the error

 The compiler has a lot of `debug!` calls, which print out logging information
 at many points. These are very useful to at least narrow down the location of
-a bug if not to find it entirely, or just to orient yourself as to why the 
+a bug if not to find it entirely, or just to orient yourself as to why the
 compiler is doing a particular thing.

 To see the logs, you need to set the `RUST_LOG` environment variable to
@ -191,9 +191,9 @@ want to call `x.py clean` to force one.
 ### Logging etiquette

 Because calls to `debug!` are removed by default, in most cases, don't worry
-about adding "unnecessary" calls to `debug!` and leaving them in code you 
-commit - they won't slow down the performance of what we ship, and if they 
-helped you pinning down a bug, they will probably help someone else with a 
+about adding "unnecessary" calls to `debug!` and leaving them in code you
+commit - they won't slow down the performance of what we ship, and if they
+helped you pin down a bug, they will probably help someone else with a
 different one.

 However, there are still a few concerns that you might care about:
@ -201,27 +201,27 @@ However, there are still a few concerns that you might care about:
 ### Expensive operations in logs

 A note of caution: the expressions *within* the `debug!` call are run
-whenever RUST_LOG is set, even if the filter would exclude the log. This means 
+whenever `RUST_LOG` is set, even if the filter would exclude the log. This means
 that if in the module `rustc::foo` you have a statement

 ```Rust
 debug!("{:?}", random_operation(tcx));
 ```

-Then if someone runs a debug `rustc` with `RUST_LOG=rustc::bar`, then 
-`random_operation()` will still run - even while it's output will never be 
+Then if someone runs a debug `rustc` with `RUST_LOG=rustc::bar`, then
+`random_operation()` will still run - even though its output will never be
 needed!

 This means that you should not put anything too expensive or likely
-to crash there - that would annoy anyone who wants to use logging for their own 
-module. Note that if `RUST_LOG` is unset (the default), then the code will not 
-run - this means that if your logging code panics, then no-one will know it 
+to crash there - that would annoy anyone who wants to use logging for their own
+module. Note that if `RUST_LOG` is unset (the default), then the code will not
+run - this means that if your logging code panics, then no-one will know it
 until someone tries to use logging to find *another* bug.

-If you *need* to do an expensive operation in a log, be aware that while log 
-expressions are *evaluated* even if logging is not enabled in your module, 
-they are not *formatted* unless it *is*. This means you can put your 
-expensive/crashy operations inside an `fmt::Debug` impl, and they will not be 
+If you *need* to do an expensive operation in a log, be aware that while log
+expressions are *evaluated* even if logging is not enabled in your module,
+they are not *formatted* unless it *is*. This means you can put your
+expensive/crashy operations inside an `fmt::Debug` impl, and they will not be
 run unless your log is enabled:

 ```Rust
@ -246,7 +246,7 @@ debug!("{:?}", ExpensiveOperationContainer { tcx });
 ## Formatting Graphviz output (.dot files)
 [formatting-graphviz-output]: #formatting-graphviz-output

-Some compiler options for debugging specific features yield graphviz graphs - 
+Some compiler options for debugging specific features yield graphviz graphs -
 e.g. the `#[rustc_mir(borrowck_graphviz_postflow="suffix.dot")]` attribute
 dumps various borrow-checker dataflow graphs.

@ -261,30 +261,66 @@ $ firefox maybe_init_suffix.pdf # Or your favorite pdf viewer
 ## Debugging LLVM
 [debugging-llvm]: #debugging-llvm

-LLVM is a big project on its own that probably needs to have its own debugging
-document (not that I could find one). But here are some tips that are important
-in a rustc context:
+> NOTE: If you are looking for info about code generation, please see [this
+> chapter][codegen] instead.
+
+[codegen]: codegen.html
+
+This section is about debugging compiler bugs in code generation (e.g. why the
+compiler generated some piece of code or crashed in LLVM).  LLVM is a big
+project on its own that probably needs to have its own debugging document (not
+that I could find one). But here are some tips that are important in a rustc
+context:
+
+As a general rule, compilers generate lots of information from analyzing code.
+Thus, a useful first step is usually to find a minimal example. One way to do
+this is to
+
+1. create a new crate that reproduces the issue (e.g. adding whatever crate is
+   at fault as a dependency, and using it from there)
+
+2. minimize the crate by removing external dependencies; that is, moving
+   everything relevant to the new crate
+
+3. further minimize the issue by making the code shorter (there are tools that
+   help with this like `creduce`)

 The official compilers (including nightlies) have LLVM assertions disabled,
 which means that LLVM assertion failures can show up as compiler crashes (not
 ICEs but "real" crashes) and other sorts of weird behavior. If you are
 encountering these, it is a good idea to try using a compiler with LLVM
 assertions enabled - either an "alt" nightly or a compiler you build yourself
-by setting `[llvm] assertions=true` in your config.toml - and
-see whether anything turns up.
+by setting `[llvm] assertions=true` in your config.toml - and see whether
+anything turns up.

-The rustc build process builds the LLVM tools into 
+The rustc build process builds the LLVM tools into
 `./build/<host-triple>/llvm/bin`. They can be called directly.

-The default rustc compilation pipeline has multiple codegen units, which is hard
-to replicate manually and means that LLVM is called multiple times in parallel.
-If you can get away with it (i.e. if it doesn't make your bug disappear),
-passing `-C codegen-units=1` to rustc will make debugging easier.
+The default rustc compilation pipeline has multiple codegen units, which is
+hard to replicate manually and means that LLVM is called multiple times in
+parallel.  If you can get away with it (i.e. if it doesn't make your bug
+disappear), passing `-C codegen-units=1` to rustc will make debugging easier.

-If you want to play with the optimization pipeline, you can use the opt tool 
-from `./build/<host-triple>/llvm/bin/` with the the LLVM IR emitted by rustc. 
-Note that rustc emits different IR depending on whether `-O` is enabled, even 
-without LLVM's optimizations, so if you want to play with the IR rustc emits, 
+To get rustc to generate LLVM IR, you need to pass the `--emit=llvm-ir` flag. If
+you are building via cargo, use the `RUSTFLAGS` environment variable (e.g.
+`RUSTFLAGS='--emit=llvm-ir'`). This causes rustc to spit out LLVM IR into the
+target directory.
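
As a sketch (the file name `repro.rs` and the function `sum_slice` are hypothetical, not from these notes), a minimized reproducer can often be compiled directly with rustc as a library, e.g. `rustc --crate-type=lib --emit=llvm-ir -C codegen-units=1 -O repro.rs`, which writes the IR to `repro.ll`. Marking the function `#[unsafe(no_mangle)]` (spelled `#[no_mangle]` on compilers older than 1.82) keeps its symbol name readable in the IR, which makes it easier to find and to pass to `-filter-print-funcs` or `llvm-extract -func=`.

```rust
// repro.rs: a hypothetical minimized reproducer, built as a library so no
// `main` is needed. Compile with e.g.:
//   rustc --crate-type=lib --emit=llvm-ir -C codegen-units=1 -O repro.rs

/// Sums a slice. `#[unsafe(no_mangle)]` keeps the symbol as plain `sum_slice`
/// in the emitted IR instead of a mangled `_ZN...E` name.
/// (Compilers older than 1.82 spell this attribute `#[no_mangle]`.)
#[unsafe(no_mangle)]
pub fn sum_slice(xs: &[u32]) -> u32 {
    // `wrapping_add` avoids the overflow check (and its panic path) that `+`
    // gets in debug builds, so the unoptimized IR stays small.
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}
```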
+
+`cargo llvm-ir [options] path` spits out the LLVM IR for a particular function
+at `path` (`cargo install cargo-asm` installs both `cargo asm` and
+`cargo llvm-ir`). `--build-type=debug` emits code for debug builds, and there
+are other useful options as well. Also, debug info in LLVM IR can clutter the
+output a lot, so `RUSTFLAGS="-C debuginfo=0"` is really useful.
+
+`RUSTFLAGS="-C save-temps"` outputs LLVM bitcode (the binary encoding of LLVM
+IR, not the textual `.ll` form) at different stages during compilation, which
+is sometimes useful. To read it, convert the bitcode files to `.ll` files with
+`llvm-dis`, which is among the LLVM tools built alongside a local build of
+rustc (in `./build/<host-triple>/llvm/bin`).
+
+If you want to play with the optimization pipeline, you can use the `opt` tool
+from `./build/<host-triple>/llvm/bin/` with the LLVM IR emitted by rustc.  Note
+that rustc emits different IR depending on whether `-O` is enabled, even
+without LLVM's optimizations, so if you want to play with the IR rustc emits,
 you should:

 ```bash
@ -295,21 +331,21 @@ $ $OPT -S -O2 < my-file.ll > my
 ```

 If you just want to get the LLVM IR during the LLVM pipeline, to e.g. see which
-IR causes an optimization-time assertion to fail, or to see when
-LLVM performs a particular optimization, you can pass the rustc flag
-`-C llvm-args=-print-after-all`, and possibly add
-`-C llvm-args='-filter-print-funcs=EXACT_FUNCTION_NAME` (e.g.
-`-C llvm-args='-filter-print-funcs=_ZN11collections3str21_$LT$impl$u20$str$GT$\
-    7replace17hbe10ea2e7c809b0bE'`).
+IR causes an optimization-time assertion to fail, or to see when LLVM performs
+a particular optimization, you can pass the rustc flag
+`-C llvm-args=-print-after-all`, and possibly add
+`-C llvm-args='-filter-print-funcs=EXACT_FUNCTION_NAME'` (e.g.
+`-C llvm-args='-filter-print-funcs=_ZN11collections3str21_$LT$impl$u20$str$GT$7replace17hbe10ea2e7c809b0bE'`).

-That produces a lot of output into standard error, so you'll want to pipe
-that to some file. Also, if you are using neither `-filter-print-funcs` nor
-`-C codegen-units=1`, then, because the multiple codegen units run in parallel,
-the printouts will mix together and you won't be able to read anything.
+That produces a lot of output into standard error, so you'll want to pipe that
+to some file. Also, if you are using neither `-filter-print-funcs` nor
+`-C codegen-units=1`, then, because the multiple codegen units run in
+parallel, the printouts will mix together and you won't be able to read
+anything.

-If you want just the IR for a specific function (say, you want to see
-why it causes an assertion or doesn't optimize correctly), you can use
-`llvm-extract`, e.g.
+If you want just the IR for a specific function (say, you want to see why it
+causes an assertion or doesn't optimize correctly), you can use `llvm-extract`,
+e.g.

 ```bash
 $ ./build/$TRIPLE/llvm/bin/llvm-extract \
@ -319,4 +355,32 @@ $ ./build/$TRIPLE/llvm/bin/llvm-extract \
    > extracted.ll
 ```
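
Both `-filter-print-funcs` and `llvm-extract -func=` need the exact (usually mangled) function name as it appears in the IR. Below is a minimal sketch of a hypothetical helper (`defined_functions` is not part of rustc or LLVM) that lists the functions defined in a textual `.ll` file, assuming each definition starts at the beginning of a line with `define` and that the first `@` on that line introduces the function name.

```rust
/// Hypothetical helper: returns the names of all functions *defined* (not
/// merely declared) in textual LLVM IR, e.g. a `.ll` file from
/// `--emit=llvm-ir`.
pub fn defined_functions(ir: &str) -> Vec<String> {
    let mut names = Vec::new();
    for line in ir.lines() {
        // Definitions start with `define`; declarations of external
        // functions start with `declare` and are skipped.
        if !line.starts_with("define ") {
            continue;
        }
        // The function name is the first `@`-prefixed token on the line.
        let Some(at) = line.find('@') else { continue };
        let rest = &line[at + 1..];
        let name = if let Some(quoted) = rest.strip_prefix('"') {
            // Names with special characters are quoted: @"...".
            match quoted.find('"') {
                Some(end) => &quoted[..end],
                None => continue,
            }
        } else {
            // Unquoted names end where the argument list begins.
            match rest.find('(') {
                Some(end) => &rest[..end],
                None => continue,
            }
        };
        names.push(name.to_string());
    }
    names
}
```

For example, `defined_functions(&std::fs::read_to_string("repro.ll")?)` returns the names to copy into `-filter-print-funcs=...`. Grepping for lines starting with `define` works just as well; the helper only makes the name extraction explicit.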

+### Filing LLVM bug reports
+
+When filing an LLVM bug report, you will probably want some sort of minimal
+working example that demonstrates the problem. The Godbolt compiler explorer is
+really helpful for this.
+
+1. Once you have some LLVM IR for the problematic code (see above), you can
+   create a minimal working example with Godbolt. Go to
+   [gcc.godbolt.org](https://gcc.godbolt.org).
+
+2. Choose `LLVM-IR` as the programming language.
+
+3. Use `llc` to compile the IR to a particular target as is:
+    - There are some useful flags: `-mattr=` enables target features, `-march=`
+      selects the target, `-mcpu=` selects the CPU, etc.
+    - Commands like `llc -march=help` output all architectures available, which
+      is useful because sometimes the Rust arch names and the LLVM names do not
+      match.
+    - If you have compiled rustc yourself, the LLVM tools in its build
+      directory (`./build/<host-triple>/llvm/bin`) include binaries for `llc`,
+      `opt`, etc.
+
+4. If you want to optimize the LLVM-IR, you can use `opt` to see how the LLVM
+   optimizations transform it.
+
+5. Once you have a Godbolt link demonstrating the issue, it is pretty easy to
+   file an LLVM bug report.
+
+
 [env-logger]: https://docs.rs/env_logger/0.4.3/env_logger/