Tshepang Mbambo 2022-08-26 17:34:44 +02:00 committed by GitHub
parent 4128e99571
commit f54dffb9e1
2 changed files with 44 additions and 38 deletions


@@ -1,13 +1,16 @@
 # Code generation

-Code generation or "codegen" is the part of the compiler that actually
-generates an executable binary. Usually, rustc uses LLVM for code generation;
-there is also support for [Cranelift]. The key is that rustc doesn't implement
-codegen itself. It's worth noting, though, that in the Rust source code, many
-parts of the backend have `codegen` in their names (there are no hard
-boundaries).
+Code generation (or "codegen") is the part of the compiler
+that actually generates an executable binary.
+Usually, rustc uses LLVM for code generation,
+but there is also support for [Cranelift] and [GCC].
+The key is that rustc doesn't implement codegen itself.
+It's worth noting, though, that in the Rust source code,
+many parts of the backend have `codegen` in their names
+(there are no hard boundaries).

-[Cranelift]: https://github.com/bytecodealliance/wasmtime/tree/HEAD/cranelift
+[Cranelift]: https://github.com/bytecodealliance/wasmtime/tree/main/cranelift
+[GCC]: https://github.com/rust-lang/rustc_codegen_gcc

 > NOTE: If you are looking for hints on how to debug code generation bugs,
 > please see [this section of the debugging chapter][debugging].


@@ -1,54 +1,57 @@
 # From MIR to Binaries

-All of the preceding chapters of this guide have one thing in common: we never
-generated any executable machine code at all! With this chapter, all of that
-changes.
+All of the preceding chapters of this guide have one thing in common:
+we never generated any executable machine code at all!
+With this chapter, all of that changes.

-So far, we've shown how the compiler can take raw source code in text format
-and transform it into [MIR]. We have also shown how the compiler does various
-analyses on the code to detect things like type or lifetime errors. Now, we
-will finally take the MIR and produce some executable machine code.
+So far,
+we've shown how the compiler can take raw source code in text format
+and transform it into [MIR].
+We have also shown how the compiler does various
+analyses on the code to detect things like type or lifetime errors.
+Now, we will finally take the MIR and produce some executable machine code.

 [MIR]: ./mir/index.md

-> NOTE: This part of a compiler is often called the _backend_. The term is a bit
-> overloaded because in the compiler source, it usually refers to the "codegen
-> backend" (i.e. LLVM or Cranelift). Usually, when you see the word "backend"
-> in this part, we are referring to the "codegen backend".
+> NOTE: This part of a compiler is often called the _backend_.
+> The term is a bit overloaded because in the compiler source,
+> it usually refers to the "codegen backend" (i.e. LLVM, Cranelift, or GCC).
+> Usually, when you see the word "backend" in this part,
+> we are referring to the "codegen backend".

 So what do we need to do?

-0. First, we need to collect the set of things to generate code for. In
-particular, we need to find out which concrete types to substitute for
-generic ones, since we need to generate code for the concrete types.
-Generating code for the concrete types (i.e. emitting a copy of the code for
-each concrete type) is called _monomorphization_, so the process of
-collecting all the concrete types is called _monomorphization collection_.
+0. First, we need to collect the set of things to generate code for.
+In particular,
+we need to find out which concrete types to substitute for generic ones,
+since we need to generate code for the concrete types.
+Generating code for the concrete types
+(i.e. emitting a copy of the code for each concrete type) is called _monomorphization_,
+so the process of collecting all the concrete types is called _monomorphization collection_.
 1. Next, we need to actually lower the MIR to a codegen IR
 (usually LLVM IR) for each concrete type we collected.
-2. Finally, we need to invoke LLVM or Cranelift, which runs a bunch of
-optimization passes, generates executable code, and links together an
-executable binary.
+2. Finally, we need to invoke the codegen backend,
+which runs a bunch of optimization passes,
+generates executable code,
+and links together an executable binary.

 [codegen1]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html

 The code for codegen is actually a bit complex due to a few factors:

-- Support for multiple codegen backends (LLVM and Cranelift). We try to share as much
-backend code between them as possible, so a lot of it is generic over the
-codegen implementation. This means that there are often a lot of layers of
-abstraction.
+- Support for multiple codegen backends (LLVM, Cranelift, and GCC).
+We try to share as much backend code between them as possible,
+so a lot of it is generic over the codegen implementation.
+This means that there are often a lot of layers of abstraction.
 - Codegen happens asynchronously in another thread for performance.
-- The actual codegen is done by a third-party library (either LLVM or Cranelift).
+- The actual codegen is done by a third-party library (either of the 3 backends).

-Generally, the [`rustc_codegen_ssa`][ssa] crate contains backend-agnostic code
-(i.e. independent of LLVM or Cranelift), while the [`rustc_codegen_llvm`][llvm]
-crate contains code specific to LLVM codegen.
+Generally, the [`rustc_codegen_ssa`][ssa] crate contains backend-agnostic code,
+while the [`rustc_codegen_llvm`][llvm] crate contains code specific to LLVM codegen.

 [ssa]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/index.html
 [llvm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/index.html

 At a very high level, the entry point is
-[`rustc_codegen_ssa::base::codegen_crate`][codegen1]. This function starts the
-process discussed in the rest of this chapter.
+[`rustc_codegen_ssa::base::codegen_crate`][codegen1].
+This function starts the process discussed in the rest of this chapter.