move readme to guide (#481)
parent bb6ca4c4b8
commit 114ed694f2

@@ -91,6 +91,7 @@
- [Code Generation](./codegen.md)
- [Updating LLVM](./codegen/updating-llvm.md)
- [Debugging LLVM](./codegen/debugging.md)
- [Backend Agnostic Codegen](./codegen/backend-agnostic.md)
- [Profile-guided Optimization](./profile-guided-optimization.md)
- [Debugging Support in Rust Compiler](./debugging-support-in-rustc.md)

@@ -0,0 +1,203 @@
# Backend Agnostic Codegen

In the future, it would be nice to allow other codegen backends (e.g.
[Cranelift][cranelift]). To this end, `librustc_codegen_ssa` provides an
abstract interface for all backends to implement.

> The following is a copy/paste of a README from the rust-lang/rust repo.
> Please submit a PR if it needs updating.

# Refactoring of `rustc_codegen_llvm`
by Denis Merigoux, October 23rd 2018

## State of the code before the refactoring

All the code related to the compilation of MIR into LLVM IR was contained
inside the `rustc_codegen_llvm` crate. Here is the breakdown of the most
important elements:
* the `back` folder (7,800 LOC) implements the mechanisms for creating the
  different object files and archives through LLVM, but also the communication
  mechanisms for parallel code generation;
* the `debuginfo` folder (3,200 LOC) contains all code that passes debug
  information down to LLVM;
* the `llvm` folder (2,200 LOC) defines the FFI necessary to communicate with
  LLVM using the C++ API;
* the `mir` folder (4,300 LOC) implements the actual lowering from MIR to LLVM
  IR;
* the `base.rs` file (1,300 LOC) contains some helper functions but also the
  high-level code that launches the code generation and distributes the work;
* the `builder.rs` file (1,200 LOC) contains all the functions generating
  individual LLVM IR instructions inside a basic block;
* the `common.rs` file (450 LOC) contains various helper functions and all the
  functions generating LLVM static values;
* the `type_.rs` file (300 LOC) defines most of the type translations to LLVM IR.

The goal of this refactoring is to separate, inside this crate, the code that
is specific to LLVM from the code that can be reused for other rustc backends.
For instance, the `mir` folder is almost entirely backend-agnostic but it relies
heavily on other parts of the crate. The separation of the code must not affect
the logic of the code nor its performance.

For these reasons, the separation process involves two transformations that
have to be done at the same time for the resulting code to compile:

1. replace all the LLVM-specific types by generics inside function signatures
   and structure definitions;
2. encapsulate all functions calling the LLVM FFI inside a set of traits that
   will define the interface between backend-agnostic code and the backend.

While the LLVM-specific code will be left in `rustc_codegen_llvm`, all the new
traits and backend-agnostic code will be moved to `rustc_codegen_ssa` (name
suggestion by @eddyb).
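
The two transformations can be illustrated with a toy sketch. Every name in it
(`Backend`, `ToyBackend`, `lower_sum`, `const_int`, `emit_add`) is hypothetical
and much simpler than the real traits; it only shows a concrete value type
being replaced by a generic one, and backend calls being routed through a
trait.

```rust
/// Hypothetical, minimal stand-in for a backend interface.
/// Transformation 2: every call that would reach the backend goes through
/// this trait.
pub trait Backend {
    /// Transformation 1: the concrete value type becomes an associated type.
    type Value: Copy;

    fn const_int(&mut self, n: i64) -> Self::Value;
    fn emit_add(&mut self, lhs: Self::Value, rhs: Self::Value) -> Self::Value;
}

/// Backend-agnostic code: it only knows the trait, not the backend.
pub fn lower_sum<B: Backend>(backend: &mut B, operands: &[i64]) -> B::Value {
    let mut acc = backend.const_int(0);
    for &n in operands {
        let v = backend.const_int(n);
        acc = backend.emit_add(acc, v);
    }
    acc
}

/// Toy backend that records textual instructions.
/// A value is the index of the instruction that produced it.
#[derive(Default)]
pub struct ToyBackend {
    pub instructions: Vec<String>,
}

impl Backend for ToyBackend {
    type Value = usize;

    fn const_int(&mut self, n: i64) -> usize {
        self.instructions.push(format!("const {}", n));
        self.instructions.len() - 1
    }

    fn emit_add(&mut self, lhs: usize, rhs: usize) -> usize {
        self.instructions.push(format!("add %{} %{}", lhs, rhs));
        self.instructions.len() - 1
    }
}
```

In this sketch, `lower_sum` plays the role of the code that would live in
`rustc_codegen_ssa`, while `ToyBackend` plays the role of a backend crate.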

## Generic types and structures

@irinagpopa started to parametrize the types of `rustc_codegen_llvm` by a
generic `Value` type, implemented in LLVM by a reference `&'ll Value`. This
work has been extended to all structures inside the `mir` folder and elsewhere,
as well as to LLVM's `BasicBlock` and `Type` types.

The two most important structures for the LLVM codegen are `CodegenCx` and
`Builder`. They are parametrized by multiple lifetime parameters and the type
for `Value`.

```rust,ignore
struct CodegenCx<'ll, 'tcx> {
    /* ... */
}

struct Builder<'a, 'll, 'tcx> {
    cx: &'a CodegenCx<'ll, 'tcx>,
    /* ... */
}
```

`CodegenCx` is used to compile one codegen-unit that can contain multiple
functions, whereas `Builder` is created to compile one basic block.

The code in `rustc_codegen_llvm` has to deal with multiple explicit lifetime
parameters that correspond to the following:
* `'tcx` is the longest lifetime, which corresponds to the original `TyCtxt`
  containing the program's information;
* `'a` is the lifetime of a short-lived reference to a `CodegenCx` or another
  object inside a struct;
* `'ll` is the lifetime of references to LLVM objects such as `Value` or
  `Type`.

Although there are already many lifetime parameters in the code, making it
generic uncovered situations where the borrow-checker was passing only due to
the special nature of the LLVM objects manipulated (they are extern pointers).
For instance, an additional lifetime parameter had to be added to
`LocalAnalyzer` in `analyze.rs`, leading to the definition:

```rust,ignore
struct LocalAnalyzer<'mir, 'a, 'tcx> {
    /* ... */
}
```

However, the two most important structures `CodegenCx` and `Builder` are not
defined in the backend-agnostic code. Indeed, their content is highly specific
to the backend and it makes more sense to leave their definition to the backend
implementor than to allow just a narrow spot via a generic field for the
backend's context.

## Traits and interface

Because they have to be defined by the backend, `CodegenCx` and `Builder` will
be the structures implementing all the traits defining the backend's interface.
These traits are defined in the folder `rustc_codegen_ssa/traits` and all the
backend-agnostic code is parametrized by them. For instance, let us explain how
a function in `base.rs` is parametrized:

```rust,ignore
pub fn codegen_instance<'a, 'tcx, Bx: BuilderMethods<'a, 'tcx>>(
    cx: &'a Bx::CodegenCx,
    instance: Instance<'tcx>
) {
    /* ... */
}
```

In this signature, we have the two lifetime parameters explained earlier and
the master type `Bx` which satisfies the trait `BuilderMethods` corresponding
to the interface satisfied by the `Builder` struct. The `BuilderMethods` trait
defines an associated type `Bx::CodegenCx` that itself satisfies the
`CodegenMethods` trait implemented by the struct `CodegenCx`.

On the trait side, here is an example with part of the definition of
`BuilderMethods` in `traits/builder.rs`:

```rust,ignore
pub trait BuilderMethods<'a, 'tcx>:
    HasCodegen<'tcx>
    + DebugInfoBuilderMethods<'tcx>
    + ArgTypeMethods<'tcx>
    + AbiBuilderMethods<'tcx>
    + IntrinsicCallMethods<'tcx>
    + AsmBuilderMethods<'tcx>
{
    fn new_block<'b>(
        cx: &'a Self::CodegenCx,
        llfn: Self::Function,
        name: &'b str
    ) -> Self;
    /* ... */
    fn cond_br(
        &mut self,
        cond: Self::Value,
        then_llbb: Self::BasicBlock,
        else_llbb: Self::BasicBlock,
    );
    /* ... */
}
```
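
The relationship between a builder trait and its associated context type can
be reproduced in a small, compilable sketch. The names below (`ContextMethods`,
`BuilderLike`, `codegen_item`, `ToyCx`, `ToyBuilder`) are hypothetical
stand-ins, not the real `rustc_codegen_ssa` items; the point is only that a
generic function parametrized by the builder type reaches the context type
through the associated type.

```rust
/// Hypothetical stand-in for the context-level interface.
pub trait ContextMethods {
    fn unit_name(&self) -> &str;
}

/// Hypothetical stand-in for the builder-level interface.
pub trait BuilderLike<'a>: Sized {
    /// The builder names the context type it works with.
    type Context: ContextMethods + 'a;

    fn new_block(cx: &'a Self::Context, name: &str) -> Self;
    fn describe(&self) -> String;
}

/// Backend-agnostic function: parametrized only by the builder type,
/// the context type is obtained as `Bx::Context`.
pub fn codegen_item<'a, Bx: BuilderLike<'a>>(cx: &'a Bx::Context, item: &str) -> String {
    let bx = Bx::new_block(cx, item);
    bx.describe()
}

/// Toy context, defined by the "backend".
pub struct ToyCx {
    pub name: String,
}

impl ContextMethods for ToyCx {
    fn unit_name(&self) -> &str {
        &self.name
    }
}

/// Toy builder holding a short-lived reference to its context.
pub struct ToyBuilder<'a> {
    cx: &'a ToyCx,
    block: String,
}

impl<'a> BuilderLike<'a> for ToyBuilder<'a> {
    type Context = ToyCx;

    fn new_block(cx: &'a ToyCx, name: &str) -> Self {
        ToyBuilder { cx, block: name.to_string() }
    }

    fn describe(&self) -> String {
        format!("{}::{}", self.cx.unit_name(), self.block)
    }
}
```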

Finally, a master structure implementing the `ExtraBackendMethods` trait is
used for high-level codegen-driving functions like `codegen_crate` in
`base.rs`. For LLVM, it is the empty `LlvmCodegenBackend`.
`ExtraBackendMethods` should be implemented by the same structure that
implements the `CodegenBackend` trait defined in
`rustc_codegen_utils/codegen_backend.rs`.

During the traitification process, certain functions have been converted from
methods of a local structure to methods of `CodegenCx` or `Builder` and a
corresponding `self` parameter has been added. Indeed, LLVM stores information
internally that it can access when called through its API. This information
does not show up in a Rust data structure carried around when these methods are
called. However, when implementing a Rust backend for `rustc`, these methods
will need information from `CodegenCx`, hence the additional parameter (unused
in the LLVM implementation of the trait).
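
As a rough illustration of why the extra `self` parameter matters, the sketch
below defines a hypothetical `ConstMethods` trait with two toy
implementations. `OpaqueCx` stands in for a backend whose external library
tracks its own state, so `self` goes unused; `InterningCx` stands in for a
backend written in Rust that has to keep that state in its context. None of
these names come from `rustc`.

```rust
use std::cell::RefCell;

/// Hypothetical trait: `&self` is part of the interface even though
/// some implementations ignore it.
pub trait ConstMethods {
    type Value;
    fn const_u32(&self, n: u32) -> Self::Value;
}

/// Toy backend that needs nothing from its context: `self` is unused.
pub struct OpaqueCx;

impl ConstMethods for OpaqueCx {
    type Value = u32;

    fn const_u32(&self, n: u32) -> u32 {
        n
    }
}

/// Toy backend that interns constants in a pool owned by the context.
#[derive(Default)]
pub struct InterningCx {
    pub pool: RefCell<Vec<u32>>,
}

impl ConstMethods for InterningCx {
    type Value = usize;

    /// Returns the pool index of `n`, adding it to the pool if needed.
    fn const_u32(&self, n: u32) -> usize {
        let mut pool = self.pool.borrow_mut();
        if let Some(i) = pool.iter().position(|&c| c == n) {
            return i;
        }
        pool.push(n);
        pool.len() - 1
    }
}
```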

## State of the code after the refactoring

The traits offer an API which is very similar to the API of LLVM. This is not
the best solution since LLVM has a very special way of doing things: when
adding another backend, the trait definitions might be changed in order to
offer more flexibility.

However, the current separation between backend-agnostic and LLVM-specific code
has allowed the reuse of a significant part of the old `rustc_codegen_llvm`.
Here is the new LOC breakdown between backend-agnostic (BA) and LLVM for the
most important elements:

* `back` folder: 3,800 (BA) vs 4,100 (LLVM);
* `mir` folder: 4,400 (BA) vs 0 (LLVM);
* `base.rs`: 1,100 (BA) vs 250 (LLVM);
* `builder.rs`: 1,400 (BA) vs 0 (LLVM);
* `common.rs`: 350 (BA) vs 350 (LLVM).

The `debuginfo` folder has been left almost untouched by the splitting and is
specific to LLVM. Only its high-level features have been traitified.

The new `traits` folder has 1,500 LOC only for trait definitions. Overall, the
27,000 LOC-sized old `rustc_codegen_llvm` code has been split into the new
18,500 LOC-sized `rustc_codegen_llvm` and the 12,000 LOC-sized
`rustc_codegen_ssa`. We can say that this refactoring allowed the reuse of
approximately 10,000 LOC that would otherwise have had to be duplicated between
the multiple backends of `rustc`.

The refactored version of `rustc`'s backend introduced no regression in the
test suite or in performance benchmarks, which is consistent with the nature
of the refactoring, which used only compile-time parametricity (no trait
objects).
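
The difference between compile-time parametricity and trait objects can be
seen in a minimal sketch. The `Emit` trait and `ToyLlvm` type are hypothetical;
`lower_static` corresponds to the style used by the refactoring, and
`lower_dynamic` is shown only for contrast.

```rust
/// Hypothetical backend interface.
pub trait Emit {
    fn emit(&self, op: &str) -> String;
}

/// Toy backend.
pub struct ToyLlvm;

impl Emit for ToyLlvm {
    fn emit(&self, op: &str) -> String {
        format!("llvm:{}", op)
    }
}

/// Compile-time parametricity: the function is monomorphized for each
/// backend type, so the call to `emit` is statically dispatched.
pub fn lower_static<B: Emit>(backend: &B, op: &str) -> String {
    backend.emit(op)
}

/// Trait-object alternative: a single compiled function, with the call
/// to `emit` dispatched dynamically through a vtable.
pub fn lower_dynamic(backend: &dyn Emit, op: &str) -> String {
    backend.emit(op)
}
```

Both functions return the same result; they differ only in how the call is
dispatched.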