update backend chapters from nagisa's notes

This commit is contained in:
Mark Mansi 2020-03-07 17:26:30 -06:00 committed by Who? Me?!
parent 44cba6e075
commit da7894aa29
5 changed files with 151 additions and 32 deletions

View File

@ -11,7 +11,7 @@ AST | the abstract syntax tree produced by the `rustc_ast`
binder | a "binder" is a place where a variable or type is declared; for example, the `<T>` is a binder for the generic type parameter `T` in `fn foo<T>(..)`, and \|`a`\|` ...` is a binder for the parameter `a`. See [the background chapter for more](./background.html#free-vs-bound)
bound variable | a "bound variable" is one that is declared within an expression/term. For example, the variable `a` is bound within the closure expression \|`a`\|` a * 2`. See [the background chapter for more](./background.html#free-vs-bound)
codegen | the code to translate MIR into LLVM IR.
codegen unit | when we produce LLVM IR, we group the Rust code into a number of codegen units (sometimes abbreviated as CGUs). Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use.
codegen unit | when we produce LLVM IR, we group the Rust code into a number of codegen units (sometimes abbreviated as CGUs). Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use. ([see more](../backend/codegen.md))
completeness | completeness is a technical term in type theory. Completeness means that every type-safe program also type-checks. Having both soundness and completeness is very hard, and usually soundness is more important. (see "soundness").
control-flow graph | a representation of the control-flow of a program; see [the background chapter for more](./background.html#cfg)
CTFE | Compile-Time Function Evaluation. This is the ability of the compiler to evaluate `const fn`s at compile time. This is part of the compiler's constant evaluation system. ([see more](../const-eval.html))

View File

@ -1,9 +1,19 @@
# The Compiler Backend
The _compiler backend_ refers to the parts of the compiler that turn rustc's
MIR into actual executable code (e.g. an ELF or EXE binary) that can run on a
processor. This is the last stage of compilation, and it has a few important
parts:
All of the preceding chapters of this guide have one thing in common: we never
generated any executable machine code at all! With this chapter, all of that
changes.
It's often useful to think of compilers as being composed of a _frontend_ and a
_backend_ (though in rustc, there's not a sharp line between frontend and
backend). The _frontend_ is responsible for taking raw source code, checking it
for correctness, and getting it into a format usable by the backend. For rustc,
this format is the MIR. The _backend_ refers to the parts of the compiler that
turn rustc's MIR into actual executable code (e.g. an ELF or EXE binary) that
can run on a processor. All of the previous chapters deal with rustc's
frontend.
rustc's backend does the following:
0. First, we need to collect the set of things to generate code for. In
particular, we need to find out which concrete types to substitute for
@ -11,7 +21,30 @@ parts:
Generating code for the concrete types (i.e. emitting a copy of the code for
each concrete type) is called _monomorphization_, so the process of
collecting all the concrete types is called _monomorphization collection_
(a short example follows this list).
1. Next, we need to actually lower the MIR (which is generic) to a codegen IR
(usually LLVM IR; which is not generic) for each concrete type we collected.
2. Finally, we need to invoke LLVM, which runs a bunch of optimization passes,
generates executable code, and links together an executable binary.
1. Next, we need to actually lower the MIR to a codegen IR
(usually LLVM IR) for each concrete type we collected.
2. Finally, we need to invoke the codegen backend (e.g. LLVM or Cranelift),
   which runs a bunch of optimization passes, generates executable code, and
   links everything together to produce an executable binary.
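To make step 0 concrete, here is a small illustrative example (the function
and its uses are made up for illustration; this is not rustc code):

```rust
// A generic function in some crate.
fn twice<T: std::ops::Add<Output = T> + Copy>(x: T) -> T {
    x + x
}

fn main() {
    // Each use at a distinct concrete type becomes its own instantiation:
    let a = twice(2u32);   // needs code for twice::<u32>
    let b = twice(3.0f64); // needs code for twice::<f64>
    println!("{} {}", a, b);
}
```

Roughly speaking, the collector starts from roots such as `main` and records
that code must be generated for `twice::<u32>`, `twice::<f64>`, and `main`
itself; those instances are later split across codegen units.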
[codegen1]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html
The code for codegen is actually a bit complex due to a few factors:
- Support for multiple backends (LLVM and Cranelift). We try to share as much
backend code between them as possible, so a lot of it is generic over the
codegen implementation. This means that there are often a lot of layers of
abstraction.
- Codegen happens asynchronously in another thread for performance.
- The actual codegen is done by a third-party library (either LLVM or Cranelift).
Generally, the [`rustc_codegen_ssa`][ssa] crate contains backend-agnostic code
(i.e. independent of LLVM or Cranelift), while the [`rustc_codegen_llvm`][llvm]
crate contains code specific to LLVM codegen.
[ssa]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/index.html
[llvm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/index.html
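To give a rough feel for what "generic over the codegen implementation" means
in practice, here is a minimal sketch. The trait, its methods, and the toy
backend below are invented for illustration; they are not the actual
`rustc_codegen_ssa` traits:

```rust
// Hypothetical, simplified stand-in for the backend abstraction: shared
// ("SSA") code is written against a trait, and each backend (LLVM,
// Cranelift, ...) supplies its own implementation.
trait BuilderLike {
    type Value;
    fn add(&mut self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
    fn ret(&mut self, value: Self::Value);
}

// Backend-agnostic lowering code only ever talks to the trait.
fn emit_sum<B: BuilderLike>(bx: &mut B, a: B::Value, b: B::Value) {
    let sum = bx.add(a, b);
    bx.ret(sum);
}

// A toy "backend" that just interprets the operations, to show the shape.
struct ToyBackend;
impl BuilderLike for ToyBackend {
    type Value = i64;
    fn add(&mut self, lhs: i64, rhs: i64) -> i64 { lhs + rhs }
    fn ret(&mut self, value: i64) { println!("ret {}", value); }
}

fn main() {
    emit_sum(&mut ToyBackend, 2, 3); // prints "ret 5"
}
```

The real abstraction is far larger, but the layering works the same way:
shared code is parameterized over a backend-provided builder, which is where
many of the layers of abstraction come from.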
At a very high level, the entry point is
[`rustc_codegen_ssa::base::codegen_crate`][codegen1]. This function starts the
process discussed in the rest of this chapter.

View File

@ -1,7 +1,13 @@
# Code generation
Code generation or "codegen" is the part of the compiler that actually
generates an executable binary. rustc uses LLVM for code generation.
generates an executable binary. Usually, rustc uses LLVM for code generation;
there is also support for [Cranelift]. The key point is that rustc doesn't
implement codegen itself: it hands that work off to one of these backends. It's
worth noting, though, that in the Rust source code, many parts of the backend
have `codegen` in their names (there are no hard boundaries).
[Cranelift]: https://github.com/bytecodealliance/wasmtime/tree/master/cranelift
> NOTE: If you are looking for hints on how to debug code generation bugs,
> please see [this section of the debugging chapter][debugging].
@ -10,28 +16,16 @@ generates an executable binary. rustc uses LLVM for code generation.
## What is LLVM?
All of the preceding chapters of this guide have one thing in common: we never
generated any executable machine code at all! With this chapter, all of that
changes.
[LLVM](https://llvm.org) is "a collection of modular and reusable compiler and
toolchain technologies". In particular, the LLVM project contains a pluggable
compiler backend (also called "LLVM"), which is used by many compiler projects,
including the `clang` C compiler and our beloved `rustc`.
Like most compilers, rustc is composed of a "frontend" and a "backend". The
"frontend" is responsible for taking raw source code, checking it for
correctness, and getting it into a format `X` from which we can generate
executable machine code. The "backend" then takes that format `X` and produces
(possibly optimized) executable machine code for some platform. All of the
previous chapters deal with rustc's frontend.
rustc's backend is [LLVM](https://llvm.org), "a collection of modular and
reusable compiler and toolchain technologies". In particular, the LLVM project
contains a pluggable compiler backend (also called "LLVM"), which is used by
many compiler projects, including the `clang` C compiler and our beloved
`rustc`.
LLVM's "format `X`" is called LLVM IR. It is basically assembly code with
LLVM takes input in the form of LLVM IR. It is basically assembly code with
additional low-level types and annotations added. These annotations are helpful
for doing optimizations on the LLVM IR and outputted machine code. The end
result of all this is (at long last) something executable (e.g. an ELF object
or wasm).
result of all this is (at long last) something executable (e.g. an ELF object,
an EXE, or wasm).
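As a rough illustration, here is a trivial Rust function together with the
general shape of LLVM IR it lowers to (the IR below is hand-written and
simplified; what rustc actually emits has mangled names, attributes, and, in
debug builds, overflow checks):

```rust
// A trivial Rust function...
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

// ...corresponds, roughly, to LLVM IR of this shape (optimized build):
//
//   define i32 @add(i32 %a, i32 %b) {
//   start:
//     %sum = add i32 %a, %b
//     ret i32 %sum
//   }
//
// Note the explicit low-level type (`i32`) on every value; that is what makes
// LLVM IR feel like "assembly code with additional types and annotations".

fn main() {
    println!("{}", add(2, 3));
}
```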
There are a few benefits to using LLVM:
@ -49,6 +43,34 @@ There are a few benefits to using LLVM:
[spectre]: https://meltdownattack.com/
## Generating LLVM IR
## Running LLVM, linking, and metadata generation
TODO
Once the LLVM IR for all of the functions, statics, etc. is built, it is time
to start running LLVM and its optimization passes. LLVM IR is grouped into
"modules". Multiple "modules" can be codegened at the same time to aid in
multi-core utilization. These "modules" are what we refer to as _codegen
units_. These units were established back during the monomorphization
collection phase.
Once LLVM produces objects from these modules, those objects are passed to the
linker, optionally along with the metadata object, and an archive or an
executable is produced.
The optimizations are not necessarily run by the codegen phase described
above. With certain kinds of LTO, optimization may instead happen at link
time. It is also possible for some optimizations to happen before the objects
are passed to the linker, and for others to happen during linking.
This all happens towards the very end of compilation. The code for this can be
found in [`librustc_codegen_ssa::back`][ssaback] and
[`librustc_codegen_llvm::back`][llvmback]. Sadly, the LLVM-dependent code is
not cleanly separated out here; the [`rustc_codegen_ssa`][ssa] crate itself
contains a fair amount of code specific to the LLVM backend.
Once these components are done with their work, you end up with a number of
files in your filesystem corresponding to the outputs you have requested.
[ssa]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/index.html
[ssaback]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/back/index.html
[llvmback]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/back/index.html

View File

@ -1,3 +1,58 @@
# Lowering MIR to a Codegen IR
TODO
Now that we have a list of symbols to generate from the collector, we need to
generate some sort of codegen IR. In this chapter, we will assume LLVM IR,
since that's what rustc usually uses. The actual monomorphization is performed
as we go, during the translation.
Recall that the backend is started by
[`rustc_codegen_ssa::base::codegen_crate`][codegen1]. Eventually, this reaches
[`rustc_codegen_ssa::mir::codegen_mir`][codegen2], which does the lowering from
MIR to LLVM IR.
[codegen1]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html
[codegen2]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/fn.codegen_mir.html
The code is split into modules which handle particular MIR primitives:
- [`librustc_codegen_ssa::mir::block`][mirblk] will deal with translating
  blocks and their terminators. The most complicated and also the most
  interesting thing this module does is generating code for function calls,
  including the necessary unwinding-handling IR (see the example after this
  list).
- [`librustc_codegen_ssa::mir::statement`][mirst] translates MIR statements.
- [`librustc_codegen_ssa::mir::operand`][mirop] translates MIR operands.
- [`librustc_codegen_ssa::mir::place`][mirpl] translates MIR place references.
- [`librustc_codegen_ssa::mir::rvalue`][mirrv] translates MIR r-values.
[mirblk]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/block/index.html
[mirst]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/statement/index.html
[mirop]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/operand/index.html
[mirpl]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/place/index.html
[mirrv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/rvalue/index.html
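To see why call lowering is the most involved part, consider the ordinary user
code below (nothing rustc-internal). Assuming panics unwind rather than abort,
a call that can panic while cleanup is pending gets an unwind edge on its MIR
`Call` terminator, and codegen roughly has to emit an LLVM `invoke` plus a
cleanup block instead of a plain `call`:

```rust
struct Guard;
impl Drop for Guard {
    fn drop(&mut self) {
        println!("cleanup runs even if `might_panic` panics");
    }
}

fn might_panic(x: i32) -> i32 {
    assert!(x >= 0, "negative input");
    x * 2
}

fn caller(x: i32) -> i32 {
    let _g = Guard; // live across the call, so the call needs an unwind path
    // In MIR this call is a block terminator with both a "return" edge and an
    // "unwind" edge (which drops `_g`); codegen must emit the corresponding
    // landing-pad/cleanup IR rather than a simple call instruction.
    might_panic(x)
}

fn main() {
    println!("{}", caller(21));
}
```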
Before a function is translated, a number of simple and primitive analysis
passes are run to help us generate simpler and more efficient LLVM IR. An
example of such an analysis pass is figuring out which variables are SSA-like,
so that we can translate them to SSA directly rather than relying on LLVM's
`mem2reg` for those variables. The analyses can be found in
[`rustc_codegen_ssa::mir::analyze`][mirana].
[mirana]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/analyze/index.html
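As a small illustration of the distinction that analysis draws (plain user
code, not rustc internals):

```rust
fn example(input: i32) -> i32 {
    // `doubled` is assigned once and its address is never taken, so it is
    // "SSA-like": codegen can keep it as a plain SSA value, with no `alloca`.
    let doubled = input * 2;

    // `slot` has its address taken (passed by reference), so it needs real
    // stack storage; it cannot be turned into an SSA value up front.
    let mut slot = 0;
    add_one(&mut slot);

    doubled + slot
}

fn add_one(x: &mut i32) {
    *x += 1;
}

fn main() {
    println!("{}", example(5)); // prints 11
}
```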
Usually, a single MIR basic block will map to an LLVM basic block, with very
few exceptions: intrinsic or function calls and less basic MIR constructs like
`assert` can result in multiple basic blocks. This is a perfect segue into the
non-portable, LLVM-specific part of the code generation. Intrinsic generation
is fairly easy to understand since it involves very few abstraction levels in
between, and can be found in [`rustc_codegen_llvm::intrinsic`][llvmint].
[llvmint]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/intrinsic/index.html
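For example, ordinary slice indexing compiles to a MIR block ending in an
`Assert` terminator (the bounds check), which codegen expands into more than
one LLVM basic block:

```rust
fn first(xs: &[i32], i: usize) -> i32 {
    // The indexing operation carries an implicit bounds check. In MIR this is
    // an `Assert` terminator; in LLVM IR it becomes a conditional branch to
    // either the block that loads `xs[i]` or a block that calls the panic
    // machinery, so one source-level expression yields several basic blocks.
    xs[i]
}

fn main() {
    let data = [10, 20, 30];
    println!("{}", first(&data, 1)); // prints 20
}
```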
Everything else will use the [builder interface][builder]; this is the code
that gets called in the [`librustc_codegen_ssa::mir::*`][ssamir] modules
discussed above.
[builder]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/builder/index.html
[ssamir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/index.html
> TODO: discuss how constants are generated

View File

@ -49,6 +49,15 @@ See [the collector rustdocs][collect] for more info.
[collect]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/monomorphize/collector/index.html
The monomorphization collector is run just before MIR lowering and codegen.
[`rustc_codegen_ssa::base::codegen_crate`][codegen1] calls the
[`collect_and_partition_mono_items`][mono] query, which does monomorphization
collection and then partitions the resulting mono items into [codegen
units](../appendix/glossary.md).
[mono]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/monomorphize/partitioning/fn.collect_and_partition_mono_items.html
[codegen1]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html
## Polymorphization
As mentioned above, monomorphisation produces fast code, but it comes at the