Update overview.md (#1898)
* Update overview.md

  Various link addition and minor edits for clarity.

* generic improvements

* fix line lengths for ci/cd

---------

Co-authored-by: Tbkhi <me.stole546@silomails.com>
Co-authored-by: Oliver Dechant <ol922807@dal.ca>
parent 57b23c8566
commit 0a9c758ed5
312
src/overview.md
@@ -6,25 +6,24 @@ This chapter is about the overall process of compiling a program -- how
|
|||
everything fits together.
|
||||
|
||||
The Rust compiler is special in two ways: it does things to your code that
|
||||
other compilers don't do (e.g. borrow checking) and it has a lot of
|
||||
other compilers don't do (e.g. borrow-checking) and it has a lot of
|
||||
unconventional implementation choices (e.g. queries). We will talk about these
|
||||
in turn in this chapter, and in the rest of the guide, we will look at all the
|
||||
in turn in this chapter, and in the rest of the guide, we will look at the
|
||||
individual pieces in more detail.
|
||||
|
||||
## What the compiler does to your code
|
||||
|
||||
So first, let's look at what the compiler does to your code. For now, we will
|
||||
avoid mentioning how the compiler implements these steps except as needed;
|
||||
we'll talk about that later.
|
||||
avoid mentioning how the compiler implements these steps except as needed.
|
||||
|
||||
### Invocation
|
||||
|
||||
Compilation begins when a user writes a Rust source program in text
|
||||
and invokes the `rustc` compiler on it. The work that the compiler needs to
|
||||
perform is defined by command-line options. For example, it is possible to
|
||||
enable nightly features (`-Z` flags), perform `check`-only builds, or emit
|
||||
LLVM-IR rather than executable machine code. The `rustc` executable call may
|
||||
be indirect through the use of `cargo`.
|
||||
Compilation begins when a user writes a Rust source program in text and invokes
|
||||
the `rustc` compiler on it. The work that the compiler needs to perform is
|
||||
defined by command-line options. For example, it is possible to enable nightly
|
||||
features (`-Z` flags), perform `check`-only builds, or emit the LLVM
|
||||
Intermediate Representation (`LLVM-IR`) rather than executable machine code.
|
||||
The `rustc` executable call may be indirect through the use of `cargo`.
|
||||
|
||||
Command line argument parsing occurs in the [`rustc_driver`]. This crate
|
||||
defines the compile configuration that is requested by the user and passes it
|
||||
|
|
@@ -34,140 +33,151 @@ to the rest of the compilation process as a [`rustc_interface::Config`].
|
|||
|
||||
The raw Rust source text is analyzed by a low-level *lexer* located in
|
||||
[`rustc_lexer`]. At this stage, the source text is turned into a stream of
|
||||
atomic source code units known as _tokens_. The lexer supports the
|
||||
atomic source code units known as _tokens_. The `lexer` supports the
|
||||
Unicode character encoding.
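
To make the idea of a token stream concrete, here is a minimal sketch of a toy lexer. It is not `rustc_lexer`: the `Token` type and the `tokenize` function are hypothetical names invented for this illustration, and the token categories are far coarser than the real ones.

```rust
/// Token kinds for a toy lexer (hypothetical; not `rustc_lexer`'s types).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Ident(String),
    Number(String),
    Punct(char),
}

/// Split source text into tokens, skipping whitespace.
pub fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_alphabetic() || c == '_' {
            // Identifiers and keywords: letters (including non-ASCII), digits, `_`.
            let mut ident = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_alphanumeric() || c == '_' {
                    ident.push(c);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(ident));
        } else if c.is_ascii_digit() {
            // Integer literals only; no suffixes, floats, or underscores.
            let mut num = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_ascii_digit() {
                    num.push(c);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Number(num));
        } else {
            // Everything else becomes a single-character punctuation token.
            tokens.push(Token::Punct(c));
            chars.next();
        }
    }
    tokens
}
```

With this sketch, `tokenize("let x = 42;")` yields five tokens: two identifiers, `=`, the number `42`, and `;`. Distinguishing keywords from identifiers is deliberately left out here.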
|
||||
|
||||
The token stream passes through a higher-level lexer located in
|
||||
[`rustc_parse`] to prepare for the next stage of the compile process. The
|
||||
[`StringReader`] struct is used at this stage to perform a set of validations
|
||||
[`StringReader`] `struct` is used at this stage to perform a set of validations
|
||||
and turn strings into interned symbols (_interning_ is discussed later).
|
||||
[String interning] is a way of storing only one immutable
|
||||
copy of each distinct string value.
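
As a rough illustration of interning, the sketch below maps each distinct string to a small integer handle and stores the text once. The `Interner` and `Symbol` types here are hypothetical stand-ins written for this example; they are not the compiler's own interner, which is considerably more involved.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// A cheap, copyable handle to an interned string (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

/// Stores one shared allocation per distinct string value.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<Rc<str>, Symbol>,
    strings: Vec<Rc<str>>,
}

impl Interner {
    /// Return the existing symbol for `s`, or allocate a new one.
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        let text: Rc<str> = Rc::from(s);
        // Both containers point at the same allocation.
        self.strings.push(Rc::clone(&text));
        self.ids.insert(text, sym);
        sym
    }

    /// Look up the text behind a symbol produced by this interner.
    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}
```

Comparing two `Symbol` values is then an integer comparison rather than a string comparison, which is one common motivation for interning.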
|
||||
|
||||
The lexer has a small interface and doesn't depend directly on the
|
||||
diagnostic infrastructure in `rustc`. Instead it provides diagnostics as plain
|
||||
data which are emitted in `rustc_parse::lexer` as real diagnostics.
|
||||
The lexer preserves full fidelity information for both IDEs and proc macros.
|
||||
The lexer has a small interface and doesn't depend directly on the diagnostic
|
||||
infrastructure in `rustc`. Instead it provides diagnostics as plain data which
|
||||
are emitted in [`rustc_parse::lexer`] as real diagnostics. The `lexer`
|
||||
preserves full fidelity information for both IDEs and procedural macros
|
||||
(sometimes referred to as "proc-macros").
|
||||
|
||||
The *parser* [translates the token stream from the lexer into an Abstract Syntax
|
||||
The *parser* [translates the token stream from the `lexer` into an Abstract Syntax
|
||||
Tree (AST)][parser]. It uses a recursive descent (top-down) approach to syntax
|
||||
analysis. The crate entry points for the parser are the
|
||||
analysis. The crate entry points for the `parser` are the
|
||||
[`Parser::parse_crate_mod()`][parse_crate_mod] and [`Parser::parse_mod()`][parse_mod]
|
||||
methods found in [`rustc_parse::parser::Parser`]. The external module parsing
|
||||
entry point is [`rustc_expand::module::parse_external_mod`][parse_external_mod].
|
||||
And the macro parser entry point is [`Parser::parse_nonterminal()`][parse_nonterminal].
|
||||
And the macro-`parser` entry point is [`Parser::parse_nonterminal()`][parse_nonterminal].
|
||||
|
||||
Parsing is performed with a set of `Parser` utility methods including `bump`,
|
||||
`check`, `eat`, `expect`, `look_ahead`.
|
||||
Parsing is performed with a set of [`parser`] utility methods including [`bump`],
|
||||
[`check`], [`eat`], [`expect`], [`look_ahead`].
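
The following sketch shows how utility methods with these names might cooperate in a small recursive descent parser. It only borrows the method names: `ToyParser`, `Tok`, and all signatures are assumptions made for this example and differ from the real `Parser` API. For brevity it evaluates the expression directly instead of building an AST.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Num(i64),
    Plus,
    LParen,
    RParen,
    Eof,
}

/// A toy parser over a pre-lexed token vector (hypothetical type).
pub struct ToyParser {
    tokens: Vec<Tok>,
    pos: usize,
}

impl ToyParser {
    pub fn new(mut tokens: Vec<Tok>) -> Self {
        // A trailing `Eof` means there is always a current token.
        tokens.push(Tok::Eof);
        ToyParser { tokens, pos: 0 }
    }

    /// Peek `n` tokens ahead without consuming anything.
    pub fn look_ahead(&self, n: usize) -> &Tok {
        let last = self.tokens.len() - 1;
        &self.tokens[(self.pos + n).min(last)]
    }

    /// Consume and return the current token (never moves past `Eof`).
    pub fn bump(&mut self) -> Tok {
        let tok = self.look_ahead(0).clone();
        if self.pos < self.tokens.len() - 1 {
            self.pos += 1;
        }
        tok
    }

    /// Is the current token equal to `tok`?
    pub fn check(&self, tok: &Tok) -> bool {
        self.look_ahead(0) == tok
    }

    /// Consume the current token only if it equals `tok`.
    pub fn eat(&mut self, tok: &Tok) -> bool {
        if self.check(tok) {
            self.bump();
            true
        } else {
            false
        }
    }

    /// Like `eat`, but a mismatch is an error.
    pub fn expect(&mut self, tok: &Tok) -> Result<(), String> {
        if self.eat(tok) {
            Ok(())
        } else {
            Err(format!("expected {:?}, found {:?}", tok, self.look_ahead(0)))
        }
    }

    /// expr := atom ('+' atom)*
    pub fn parse_expr(&mut self) -> Result<i64, String> {
        let mut value = self.parse_atom()?;
        while self.eat(&Tok::Plus) {
            value += self.parse_atom()?;
        }
        Ok(value)
    }

    /// atom := NUM | '(' expr ')'
    fn parse_atom(&mut self) -> Result<i64, String> {
        match self.bump() {
            Tok::Num(n) => Ok(n),
            Tok::LParen => {
                let value = self.parse_expr()?;
                self.expect(&Tok::RParen)?;
                Ok(value)
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}
```

Each grammar rule gets its own `parse_*` method, and the methods call each other top-down, which is the essence of the recursive descent approach.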
|
||||
|
||||
Parsing is organized by semantic construct. Separate
|
||||
`parse_*` methods can be found in the [`rustc_parse`][rustc_parse_parser_dir]
|
||||
directory. The source file name follows the construct name. For example, the
|
||||
following files are found in the parser:
|
||||
following files are found in the `parser`:
|
||||
|
||||
- `expr.rs`
|
||||
- `pat.rs`
|
||||
- `ty.rs`
|
||||
- `stmt.rs`
|
||||
- [`expr.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/expr.rs)
|
||||
- [`pat.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/pat.rs)
|
||||
- [`ty.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/ty.rs)
|
||||
- [`stmt.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/stmt.rs)
|
||||
|
||||
This naming scheme is used across many compiler stages. You will find
|
||||
either a file or directory with the same name across the parsing, lowering,
|
||||
type checking, THIR lowering, and MIR building sources.
|
||||
This naming scheme is used across many compiler stages. You will find either a
|
||||
file or directory with the same name across the parsing, lowering, type
|
||||
checking, [Typed High-level Intermediate Representation (`THIR`)] lowering, and
|
||||
[Mid-level Intermediate Representation (`MIR`)][mir] building sources.
|
||||
|
||||
Macro expansion, AST validation, name resolution, and early linting also take place
|
||||
during this stage.
|
||||
Macro-expansion, `AST`-validation, name-resolution, and early linting also take
|
||||
place during the lexing and parsing stage.
|
||||
|
||||
The parser uses the standard `DiagnosticBuilder` API for error handling, but we
|
||||
try to recover, parsing a superset of Rust's grammar, while also emitting an error.
|
||||
`rustc_ast::ast::{Crate, Mod, Expr, Pat, ...}` AST nodes are returned from the parser.
|
||||
The [`rustc_ast::ast`]::{[`Crate`], [`Expr`], [`Pat`], ...} `AST` nodes are
|
||||
returned from the parser while the standard [`DiagnosticBuilder`] API is used
|
||||
for error handling. Generally Rust's compiler will try to recover from errors
|
||||
by parsing a superset of Rust's grammar, while also emitting an error type.
|
||||
|
||||
### HIR lowering
|
||||
### `HIR` lowering
|
||||
|
||||
Next, we take the AST and convert it to [High-Level Intermediate
|
||||
Representation (HIR)][hir], a more compiler-friendly representation of the
|
||||
AST. This process is called "lowering". It involves a lot of desugaring of things
|
||||
like loops and `async fn`.
|
||||
Next the `AST` is converted into [High-Level Intermediate Representation
|
||||
(`HIR`)][hir], a more compiler-friendly representation of the `AST`. This process
|
||||
is called "lowering" and involves a lot of desugaring (the expansion and
|
||||
formalizing of shortened or abbreviated syntax constructs) of things like loops
|
||||
and `async fn`.
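
As an illustration of desugaring, the two functions below behave the same. The second spells out, in ordinary Rust, approximately what a `for` loop expands to. This is a hand-written approximation for readers; the function names are made up, and the lowering the compiler actually performs operates on its internal data structures rather than on source text.

```rust
/// Sum a slice using the surface `for` syntax.
pub fn sum_sugared(xs: &[i32]) -> i32 {
    let mut total = 0;
    for x in xs {
        total += *x;
    }
    total
}

/// Roughly the same loop with the `for` sugar removed:
/// an explicit iterator, a `loop`, and a `match` on `next()`.
pub fn sum_desugared(xs: &[i32]) -> i32 {
    let mut total = 0;
    match IntoIterator::into_iter(xs) {
        mut iter => loop {
            match Iterator::next(&mut iter) {
                Some(x) => {
                    total += *x;
                }
                None => break,
            }
        },
    }
    total
}
```

After this kind of rewrite, later stages only need to understand `loop` and `match`, not a separate `for` construct.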
|
||||
|
||||
We then use the HIR to do [*type inference*] (the process of automatic
|
||||
detection of the type of an expression), [*trait solving*] (the process
|
||||
of pairing up an impl with each reference to a trait), and [*type
|
||||
checking*]. Type checking is the process of converting the types found in the HIR
|
||||
([`hir::Ty`]), which represent what the user wrote,
|
||||
into the internal representation used by the compiler ([`Ty<'tcx>`]).
|
||||
That information is used to verify the type safety, correctness and
|
||||
coherence of the types used in the program.
|
||||
We then use the `HIR` to do [*type inference*] (the process of automatic
|
||||
detection of the type of an expression), [*trait solving*] (the process of
|
||||
pairing up an impl with each reference to a `trait`), and [*type checking*]. Type
|
||||
checking is the process of converting the types found in the `HIR` ([`hir::Ty`]),
|
||||
which represent what the user wrote, into the internal representation used by
|
||||
the compiler ([`Ty<'tcx>`]). It's called type checking because the information
|
||||
is used to verify the type safety, correctness and coherence of the types used
|
||||
in the program.
|
||||
|
||||
### MIR lowering
|
||||
### `MIR` lowering
|
||||
|
||||
The HIR is then [lowered to Mid-level Intermediate Representation (MIR)][mir],
|
||||
which is used for [borrow checking].
|
||||
The `HIR` is further lowered to `MIR`
|
||||
(used for [borrow checking]) by constructing the `THIR` (an even more desugared `HIR` used for
|
||||
pattern and exhaustiveness checking) to convert into `MIR`.
|
||||
|
||||
Along the way, we also construct the THIR, which is an even more desugared HIR.
|
||||
THIR is used for pattern and exhaustiveness checking. It is also more
|
||||
convenient to convert into MIR than HIR is.
|
||||
We do [many optimizations on the MIR][mir-opt] because it is generic and that
|
||||
improves later code generation and compilation speed. It is easier to do some
|
||||
optimizations at `MIR` level than at `LLVM-IR` level. For example LLVM doesn't seem
|
||||
to be able to optimize the pattern the [`simplify_try`] `MIR`-opt looks for.
|
||||
|
||||
We do [many optimizations on the MIR][mir-opt] because it is still
|
||||
generic and that improves the code we generate later, improving compilation
|
||||
speed too.
|
||||
MIR is a higher level (and generic) representation, so it is easier to do
|
||||
some optimizations at MIR level than at LLVM-IR level. For example LLVM
|
||||
doesn't seem to be able to optimize the pattern the [`simplify_try`] mir
|
||||
opt looks for.
|
||||
Rust code is also [_monomorphized_] during code generation, which means making
|
||||
copies of all the generic code with the type parameters replaced by concrete
|
||||
types. To do this, we need to collect a list of what concrete types to generate
|
||||
code for. This is called _monomorphization collection_ and it happens at the
|
||||
`MIR` level.
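
To picture what monomorphization produces, consider the sketch below. The generic `describe` function is what a user writes; `describe_i32` and `describe_str` are hand-written stand-ins for the concrete copies that would be generated if the program used `describe` with `i32` and `&str`. All three names are hypothetical, and the real copies exist only inside the compiler, not as source code.

```rust
use std::fmt::Display;

/// Generic source code: one definition, many possible types.
pub fn describe<T: Display>(value: T) -> String {
    format!("value = {}", value)
}

/// What the copy for `T = i32` is conceptually equivalent to.
pub fn describe_i32(value: i32) -> String {
    format!("value = {}", value)
}

/// What the copy for `T = &str` is conceptually equivalent to.
pub fn describe_str(value: &str) -> String {
    format!("value = {}", value)
}
```

In this picture, monomorphization collection corresponds to working out that `i32` and `&str` are the only instantiations the program needs.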
|
||||
|
||||
Rust code is _monomorphized_, which means making copies of all the generic
|
||||
code with the type parameters replaced by concrete types. To do
|
||||
this, we need to collect a list of what concrete types to generate code for.
|
||||
This is called _monomorphization collection_ and it happens at the MIR level.
|
||||
[_monomorphized_]: https://en.wikipedia.org/wiki/Monomorphization
|
||||
|
||||
### Code generation
|
||||
|
||||
We then begin what is vaguely called _code generation_ or _codegen_.
|
||||
The [code generation stage][codegen] is when higher level
|
||||
representations of source are turned into an executable binary. `rustc`
|
||||
uses LLVM for code generation. The first step is to convert the MIR
|
||||
to LLVM Intermediate Representation (LLVM IR). This is where the MIR
|
||||
is actually monomorphized, according to the list we created in the
|
||||
previous step.
|
||||
The LLVM IR is passed to LLVM, which does a lot more optimizations on it.
|
||||
It then emits machine code. It is basically assembly code with additional
|
||||
low-level types and annotations added (e.g. an ELF object or WASM).
|
||||
The different libraries/binaries are then linked together to produce the final
|
||||
binary.
|
||||
We then begin what is simply called _code generation_ or _codegen_. The [code
|
||||
generation stage][codegen] is when higher-level representations of source are
|
||||
turned into an executable binary. Since `rustc` uses LLVM for code generation,
|
||||
the first step is to convert the `MIR` to `LLVM-IR`. This is where the `MIR` is
|
||||
actually monomorphized. The `LLVM-IR` is passed to LLVM, which does a lot more
|
||||
optimizations on it, emitting machine code which is basically assembly code
|
||||
with additional low-level types and annotations added (e.g. an ELF object or
|
||||
`WASM`). The different libraries/binaries are then linked together to produce
|
||||
the final binary.
|
||||
|
||||
[String interning]: https://en.wikipedia.org/wiki/String_interning
|
||||
[`rustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
|
||||
[`rustc_driver`]: rustc-driver.md
|
||||
[`rustc_interface::Config`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/interface/struct.Config.html
|
||||
[lex]: the-parser.md
|
||||
[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
|
||||
[`rustc_parse`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
|
||||
[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
|
||||
[hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html
|
||||
[*type inference*]: type-inference.md
|
||||
[*trait solving*]: traits/resolution.md
|
||||
[*type checking*]: type-checking.md
|
||||
[mir]: mir/index.md
|
||||
[borrow checking]: borrow_check.md
|
||||
[mir-opt]: mir/optimizations.md
|
||||
[`simplify_try`]: https://github.com/rust-lang/rust/pull/66282
|
||||
[codegen]: backend/codegen.md
|
||||
[parse_nonterminal]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_nonterminal
|
||||
[parse_crate_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_crate_mod
|
||||
[parse_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_mod
|
||||
[`rustc_parse::parser::Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
|
||||
[parse_external_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html
|
||||
[rustc_parse_parser_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_parse/src/parser
|
||||
[*type inference*]: type-inference.md
|
||||
[`bump`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.bump
|
||||
[`check`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.check
|
||||
[`Crate`]: https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/ast/struct.Crate.html
|
||||
[`DiagnosticBuilder`]: https://doc.rust-lang.org/beta/nightly-rustc/rustc_errors/struct.DiagnosticBuilder.html
|
||||
[`eat`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.eat
|
||||
[`expect`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.expect
|
||||
[`Expr`]: https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/ast/struct.Expr.html
|
||||
[`hir::Ty`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Ty.html
|
||||
[`look_ahead`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.look_ahead
|
||||
[`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
|
||||
[`Pat`]: https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/ast/struct.Pat.html
|
||||
[`rustc_ast::ast`]: https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/index.html
|
||||
[`rustc_driver`]: rustc-driver.md
|
||||
[`rustc_interface::Config`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/interface/struct.Config.html
|
||||
[`rustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
|
||||
[`rustc_parse::lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/index.html
|
||||
[`rustc_parse::parser::Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
|
||||
[`rustc_parse`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
|
||||
[`simplify_try`]: https://github.com/rust-lang/rust/pull/66282
|
||||
[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
|
||||
[`Ty<'tcx>`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html
|
||||
[borrow checking]: borrow_check.md
|
||||
[codegen]: backend/codegen.md
|
||||
[hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html
|
||||
[lex]: the-parser.md
|
||||
[mir-opt]: mir/optimizations.md
|
||||
[mir]: mir/index.md
|
||||
[parse_crate_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_crate_mod
|
||||
[parse_external_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html
|
||||
[parse_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_mod
|
||||
[parse_nonterminal]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_nonterminal
|
||||
[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
|
||||
[rustc_parse_parser_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_parse/src/parser
|
||||
[String interning]: https://en.wikipedia.org/wiki/String_interning
|
||||
[Typed High-level Intermediate Representation (`THIR`)]: https://rustc-dev-guide.rust-lang.org/thir.html
|
||||
|
||||
## How it does it
|
||||
|
||||
Ok, so now that we have a high-level view of what the compiler does to your
|
||||
code, let's take a high-level view of _how_ it does all that stuff. There are a
|
||||
lot of constraints and conflicting goals that the compiler needs to
|
||||
Now that we have a high-level view of what the compiler does to your code,
|
||||
let's take a high-level view of _how_ it does all that stuff. There are a lot
|
||||
of constraints and conflicting goals that the compiler needs to
|
||||
satisfy/optimize for. For example,
|
||||
|
||||
- Compilation speed: how fast is it to compile a program. More/better
|
||||
- Compilation speed: how fast is it to compile a program? More/better
|
||||
compile-time analyses often mean compilation is slower.
|
||||
- Also, we want to support incremental compilation, so we need to take that
|
||||
into account. How can we keep track of what work needs to be redone and
|
||||
|
|
@@ -190,17 +200,17 @@ satisfy/optimize for. For example,
|
|||
the input programs say they do, and should continue to do so despite the
|
||||
tremendous amount of change constantly going on.
|
||||
- Integration: a number of other tools need to use the compiler in
|
||||
various ways (e.g. cargo, clippy, miri) that must be supported.
|
||||
various ways (e.g. `cargo`, `clippy`, `MIRI`) that must be supported.
|
||||
- Compiler stability: the compiler should not crash or fail ungracefully on the
|
||||
stable channel.
|
||||
- Rust stability: the compiler must respect Rust's stability guarantees by not
|
||||
breaking programs that previously compiled despite the many changes that are
|
||||
always going on to its implementation.
|
||||
- Limitations of other tools: rustc uses LLVM in its backend, and LLVM has some
|
||||
strengths we leverage and some limitations/weaknesses we need to work around.
|
||||
- Limitations of other tools: `rustc` uses LLVM in its backend, and LLVM has some
|
||||
strengths we leverage and some aspects we need to work around.
|
||||
|
||||
So, as you read through the rest of the guide, keep these things in mind. They
|
||||
will often inform decisions that we make.
|
||||
So, as you continue your journey through the rest of the guide, keep these
|
||||
things in mind. They will often inform decisions that we make.
|
||||
|
||||
### Intermediate representations
|
||||
|
||||
|
|
@@ -217,31 +227,32 @@ for different purposes:
|
|||
- Token stream: the lexer produces a stream of tokens directly from the source
|
||||
code. This stream of tokens is easier for the parser to deal with than raw
|
||||
text.
|
||||
- Abstract Syntax Tree (AST): the abstract syntax tree is built from the stream
|
||||
- Abstract Syntax Tree (`AST`): the abstract syntax tree is built from the stream
|
||||
of tokens produced by the lexer. It represents
|
||||
pretty much exactly what the user wrote. It helps to do some syntactic sanity
|
||||
checking (e.g. checking that a type is expected where the user wrote one).
|
||||
- High-level IR (HIR): This is a sort of desugared AST. It's still close
|
||||
- High-level IR (HIR): This is a sort of desugared `AST`. It's still close
|
||||
to what the user wrote syntactically, but it includes some implicit things
|
||||
such as some elided lifetimes, etc. This IR is amenable to type checking.
|
||||
- Typed HIR (THIR): This is an intermediate between HIR and MIR, and used to be called
|
||||
High-level Abstract IR (HAIR). It is like the HIR but it is fully typed and a bit
|
||||
more desugared (e.g. method calls and implicit dereferences are made fully explicit).
|
||||
Moreover, it is easier to lower to MIR from THIR than from HIR.
|
||||
- Middle-level IR (MIR): This IR is basically a Control-Flow Graph (CFG). A CFG
|
||||
- Typed `HIR` (THIR) _formerly High-level Abstract IR (HAIR)_: This is an
|
||||
intermediate between `HIR` and MIR. It is like the `HIR` but it is fully typed
|
||||
and a bit more desugared (e.g. method calls and implicit dereferences are
|
||||
made fully explicit). As a result, it is easier to lower to `MIR` from `THIR` than
|
||||
from HIR.
|
||||
- Middle-level IR (`MIR`): This IR is basically a Control-Flow Graph (CFG). A CFG
|
||||
is a type of diagram that shows the basic blocks of a program and how control
|
||||
flow can go between them. Likewise, MIR also has a bunch of basic blocks with
|
||||
flow can go between them. Likewise, `MIR` also has a bunch of basic blocks with
|
||||
simple typed statements inside them (e.g. assignment, simple computations,
|
||||
etc) and control flow edges to other basic blocks (e.g., calls, dropping
|
||||
values). MIR is used for borrow checking and other
|
||||
values). `MIR` is used for borrow checking and other
|
||||
important dataflow-based checks, such as checking for uninitialized values.
|
||||
It is also used for a series of optimizations and for constant evaluation (via
|
||||
MIRI). Because MIR is still generic, we can do a lot of analyses here more
|
||||
`MIRI`). Because `MIR` is still generic, we can do a lot of analyses here more
|
||||
efficiently than after monomorphization.
|
||||
- LLVM IR: This is the standard form of all input to the LLVM compiler. LLVM IR
|
||||
- `LLVM-IR`: This is the standard form of all input to the LLVM compiler. `LLVM-IR`
|
||||
is a sort of typed assembly language with lots of annotations. It's
|
||||
a standard format that is used by all compilers that use LLVM (e.g. the clang
|
||||
C compiler also outputs LLVM IR). LLVM IR is designed to be easy for other
|
||||
C compiler also outputs `LLVM-IR`). `LLVM-IR` is designed to be easy for other
|
||||
compilers to emit and also rich enough for LLVM to run a bunch of
|
||||
optimizations on it.
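
The control-flow graph shape described for `MIR` in the list above can be sketched with a few small types. Everything here (`BasicBlock`, `Terminator`, `successors`, `reachable`) is a simplified, hypothetical model for illustration; the real definitions live in `rustc_middle` and carry much more information.

```rust
/// How control leaves a basic block (toy model).
#[derive(Debug, Clone, PartialEq)]
pub enum Terminator {
    Goto(usize),
    Branch { then_bb: usize, else_bb: usize },
    Return,
}

/// A straight-line run of statements ending in one terminator.
#[derive(Debug, Clone)]
pub struct BasicBlock {
    pub statements: Vec<String>,
    pub terminator: Terminator,
}

/// The control-flow edges out of a block, as indices of other blocks.
pub fn successors(block: &BasicBlock) -> Vec<usize> {
    match &block.terminator {
        Terminator::Goto(target) => vec![*target],
        Terminator::Branch { then_bb, else_bb } => vec![*then_bb, *else_bb],
        Terminator::Return => Vec::new(),
    }
}

/// Indices of all blocks reachable from block 0, in ascending order.
pub fn reachable(blocks: &[BasicBlock]) -> Vec<usize> {
    let mut seen = vec![false; blocks.len()];
    let mut stack = vec![0usize];
    let mut found = Vec::new();
    while let Some(bb) = stack.pop() {
        if bb >= blocks.len() || seen[bb] {
            continue;
        }
        seen[bb] = true;
        found.push(bb);
        stack.extend(successors(&blocks[bb]));
    }
    found.sort();
    found
}
```

Dataflow-style checks walk a graph like this one, following the edges between blocks rather than the nesting of the original source syntax.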
|
||||
|
||||
|
|
@@ -258,25 +269,25 @@ representations are interned.
|
|||
|
||||
### Queries
|
||||
|
||||
The first big implementation choice is the _query_ system. The Rust compiler
|
||||
uses a query system which is unlike most textbook compilers, which are
|
||||
organized as a series of passes over the code that execute sequentially. The
|
||||
compiler does this to make incremental compilation possible -- that is, if the
|
||||
user makes a change to their program and recompiles, we want to do as little
|
||||
redundant work as possible to produce the new binary.
|
||||
The first big implementation choice is Rust's use of the _query_ system in its
|
||||
compiler. The Rust compiler _is not_ organized as a series of passes over the
|
||||
code which execute sequentially. The Rust compiler does this to make
|
||||
incremental compilation possible -- that is, if the user makes a change to
|
||||
their program and recompiles, we want to do as little redundant work as
|
||||
possible to output the new binary.
|
||||
|
||||
In `rustc`, all the major steps above are organized as a bunch of queries that
|
||||
call each other. For example, there is a query to ask for the type of something
|
||||
and another to ask for the optimized MIR of a function. These
|
||||
queries can call each other and are all tracked through the query system.
|
||||
The results of the queries are cached on disk so that we can tell which
|
||||
queries' results changed from the last compilation and only redo those. This is
|
||||
how incremental compilation works.
|
||||
and another to ask for the optimized `MIR` of a function. These queries can call
|
||||
each other and are all tracked through the query system. The results of the
|
||||
queries are cached on disk so that the compiler can tell which queries' results
|
||||
changed from the last compilation and only redo those. This is how incremental
|
||||
compilation works.
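
The core idea of queries that call each other and cache their results can be sketched as follows. This is a deliberately tiny model under stated assumptions: `Ctxt`, `type_of`, and `describe` are hypothetical names, the "type inference" is a placeholder rule, and the cache lives only in memory, whereas the text above describes results cached on disk between compilations.

```rust
use std::collections::HashMap;

/// A toy query context: inputs plus a memoization table.
pub struct Ctxt {
    /// Item name -> initializer text (a stand-in for source code).
    inputs: HashMap<String, String>,
    /// Cached results of the `type_of` query.
    type_cache: HashMap<String, &'static str>,
    /// How many times `type_of` actually had to compute something.
    pub type_of_runs: usize,
}

impl Ctxt {
    pub fn new(inputs: &[(&str, &str)]) -> Self {
        Ctxt {
            inputs: inputs
                .iter()
                .map(|(k, v)| (k.to_string(), v.to_string()))
                .collect(),
            type_cache: HashMap::new(),
            type_of_runs: 0,
        }
    }

    /// Query 1: the "type" of an item, computed once and then cached.
    pub fn type_of(&mut self, item: &str) -> &'static str {
        if let Some(&ty) = self.type_cache.get(item) {
            return ty;
        }
        self.type_of_runs += 1;
        // Placeholder rule: integer-looking initializers are `i64`.
        let ty = match self.inputs.get(item) {
            Some(init) if init.parse::<i64>().is_ok() => "i64",
            Some(_) => "&str",
            None => "<unknown>",
        };
        self.type_cache.insert(item.to_string(), ty);
        ty
    }

    /// Query 2: built on top of query 1, so it benefits from the cache.
    pub fn describe(&mut self, item: &str) -> String {
        format!("{}: {}", item, self.type_of(item))
    }
}
```

Calling `describe` twice for the same item runs the underlying `type_of` computation only once. Deciding which cached results are still valid after the input changes is the hard part of incremental compilation, and this sketch does not attempt it.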
|
||||
|
||||
In principle, for the query-fied steps, we do each of the above for each item
|
||||
individually. For example, we will take the HIR for a function and use queries
|
||||
to ask for the LLVM IR for that HIR. This drives the generation of optimized
|
||||
MIR, which drives the borrow checker, which drives the generation of MIR, and
|
||||
individually. For example, we will take the `HIR` for a function and use queries
|
||||
to ask for the `LLVM-IR` for that HIR. This drives the generation of optimized
|
||||
`MIR`, which drives the borrow checker, which drives the generation of `MIR`, and
|
||||
so on.
|
||||
|
||||
... except that this is very over-simplified. In fact, some queries are not
|
||||
|
|
@@ -295,8 +306,8 @@ Moreover, the compiler wasn't originally built to use a query system; the query
|
|||
system has been retrofitted into the compiler, so parts of it are not query-fied
|
||||
yet. Also, LLVM isn't our code, so that isn't querified either. The plan is to
|
||||
eventually query-fy all of the steps listed in the previous section,
|
||||
but as of <!-- date-check --> November 2022, only the steps between HIR and
|
||||
LLVM IR are query-fied. That is, lexing, parsing, name resolution, and macro
|
||||
but as of <!-- date-check --> November 2022, only the steps between `HIR` and
|
||||
`LLVM-IR` are query-fied. That is, lexing, parsing, name resolution, and macro
|
||||
expansion are done all at once for the whole program.
|
||||
|
||||
One other thing to mention here is the all-important "typing context",
|
||||
|
|
@@ -308,7 +319,7 @@ queries are defined as methods on the [`TyCtxt`] type, and the in-memory query
|
|||
cache is stored there too. In the code, there is usually a variable called
|
||||
`tcx` which is a handle on the typing context. You will also see lifetimes with
|
||||
the name `'tcx`, which means that something is tied to the lifetime of the
|
||||
`TyCtxt` (usually it is stored or interned there).
|
||||
[`TyCtxt`] (usually it is stored or interned there).
|
||||
|
||||
[`TyCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html
|
||||
|
||||
|
|
@@ -320,9 +331,10 @@ program) is [`rustc_middle::ty::Ty`][ty]. This is so important that we have a wh
|
|||
on [`ty::Ty`][ty], but for now, we just want to mention that it exists and is the way
|
||||
`rustc` represents types!
|
||||
|
||||
Also note that the `rustc_middle::ty` module defines the `TyCtxt` struct we mentioned before.
|
||||
Also note that the [`rustc_middle::ty`] module defines the [`TyCtxt`] struct we mentioned before.
|
||||
|
||||
[ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html
|
||||
[`rustc_middle::ty`]: https://doc.rust-lang.org/beta/nightly-rustc/rustc_middle/ty/index.html
|
||||
|
||||
### Parallelism
|
||||
|
||||
|
|
@@ -330,17 +342,21 @@ Compiler performance is a problem that we would like to improve on
|
|||
(and are always working on). One aspect of that is parallelizing
|
||||
`rustc` itself.
|
||||
|
||||
Currently, there is only one part of rustc that is parallel by default: codegen.
|
||||
Currently, there is only one part of rustc that is parallel by default:
|
||||
[code generation](./parallel-rustc.md#Codegen).
|
||||
|
||||
However, the rest of the compiler is still not yet parallel. There have been
|
||||
lots of efforts spent on this, but it is generally a hard problem. The current
|
||||
approach is to turn `RefCell`s into `Mutex`s -- that is, we
|
||||
approach is to turn [`RefCell`]s into [`Mutex`]s -- that is, we
|
||||
switch to thread-safe internal mutability. However, there are ongoing
|
||||
challenges with lock contention, maintaining query-system invariants under
|
||||
concurrency, and the complexity of the code base. One can try out the current
|
||||
work by enabling parallel compilation in `config.toml`. It's still early days,
|
||||
but there are already some promising performance improvements.
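
The switch from `RefCell` to `Mutex` mentioned above can be illustrated in isolation. The sketch below is not taken from the compiler; the function names are hypothetical, and it only shows the general pattern of replacing single-threaded interior mutability with a thread-safe equivalent.

```rust
use std::cell::RefCell;
use std::sync::Mutex;
use std::thread;

/// Single-threaded interior mutability. `RefCell` is not `Sync`,
/// so this counter cannot be shared between threads.
pub fn count_single_threaded(n: u32) -> u32 {
    let counter = RefCell::new(0u32);
    for _ in 0..n {
        *counter.borrow_mut() += 1;
    }
    counter.into_inner()
}

/// The same counter behind a `Mutex`, shared by several scoped threads.
pub fn count_multi_threaded(threads: u32, per_thread: u32) -> u32 {
    let counter = Mutex::new(0u32);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    // Every increment takes the lock: safe, but contended.
                    *counter.lock().unwrap() += 1;
                }
            });
        }
    });
    counter.into_inner().unwrap()
}
```

The multi-threaded version is correct, but every thread queues on the same lock. That is a small-scale picture of the lock contention challenge described above.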
|
||||
|
||||
[`RefCell`]: https://doc.rust-lang.org/std/cell/struct.RefCell.html
|
||||
[`Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html
|
||||
|
||||
### Bootstrapping
|
||||
|
||||
`rustc` itself is written in Rust. So how do we compile the compiler? We use an
|
||||
|
|
@@ -362,7 +378,7 @@ For more details on bootstrapping, see
|
|||
- Does LLVM ever do optimizations in debug builds?
|
||||
- How do I explore phases of the compile process in my own sources (lexer,
|
||||
parser, HIR, etc)? - e.g., `cargo rustc -- -Z unpretty=hir-tree` allows you to
|
||||
view HIR representation
|
||||
view `HIR` representation
|
||||
- What is the main source entry point for `X`?
|
||||
- Where do phases diverge for cross-compilation to machine code across
|
||||
different platforms?
|
||||
|
|
@@ -387,16 +403,16 @@ For more details on bootstrapping, see
|
|||
- [Entry point for first file in crate](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/passes/fn.parse.html)
|
||||
- [Entry point for outline module parsing](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html)
|
||||
- [Entry point for macro fragments][parse_nonterminal]
|
||||
- AST definition: [`rustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html)
|
||||
- `AST` definition: [`rustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html)
|
||||
- Feature gating: **TODO**
|
||||
- Early linting: **TODO**
|
||||
- The High Level Intermediate Representation (HIR)
|
||||
- Guide: [The HIR](hir.md)
|
||||
- Guide: [Identifiers in the HIR](hir.md#identifiers-in-the-hir)
|
||||
- Guide: [The HIR Map](hir.md#the-hir-map)
|
||||
- Guide: [Lowering AST to HIR](lowering.md)
|
||||
- How to view HIR representation for your code `cargo rustc -- -Z unpretty=hir-tree`
|
||||
- Rustc HIR definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html)
|
||||
- Guide: [The `HIR` Map](hir.md#the-hir-map)
|
||||
- Guide: [Lowering `AST` to HIR](lowering.md)
|
||||
- How to view `HIR` representation for your code `cargo rustc -- -Z unpretty=hir-tree`
|
||||
- Rustc `HIR` definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html)
|
||||
- Main entry point: **TODO**
|
||||
- Late linting: **TODO**
|
||||
- Type Inference
|
||||
|
|
@@ -406,21 +422,21 @@ For more details on bootstrapping, see
|
|||
- Main entry point (type checking bodies): [the `typeck` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.typeck)
|
||||
- These two functions can't be decoupled.
|
||||
- The Mid Level Intermediate Representation (MIR)
|
||||
- Guide: [The MIR (Mid level IR)](mir/index.md)
|
||||
- Guide: [The `MIR` (Mid level IR)](mir/index.md)
|
||||
- Definition: [`rustc_middle/src/mir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/index.html)
|
||||
- Definition of sources that manipulate the MIR: [`rustc_mir_build`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_build/index.html), [`rustc_mir_dataflow`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_dataflow/index.html), [`rustc_mir_transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/index.html)
|
||||
- The Borrow Checker
|
||||
- Guide: [MIR Borrow Check](borrow_check.md)
|
||||
- Definition: [`rustc_borrowck`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_borrowck/index.html)
|
||||
- Main entry point: [`mir_borrowck` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_borrowck/fn.mir_borrowck.html)
|
||||
- MIR Optimizations
|
||||
- `MIR` Optimizations
|
||||
- Guide: [MIR Optimizations](mir/optimizations.md)
|
||||
- Definition: [`rustc_mir_transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/index.html)
|
||||
- Main entry point: [`optimized_mir` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/fn.optimized_mir.html)
|
||||
- Code Generation
|
||||
- Guide: [Code Generation](backend/codegen.md)
|
||||
- Generating Machine Code from LLVM IR with LLVM - **TODO: reference?**
|
||||
- Generating Machine Code from `LLVM-IR` with LLVM - **TODO: reference?**
|
||||
- Main entry point: [`rustc_codegen_ssa::base::codegen_crate`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html)
|
||||
- This monomorphizes and produces LLVM IR for one codegen unit. It then
|
||||
- This monomorphizes and produces `LLVM-IR` for one codegen unit. It then
|
||||
starts a background thread to run LLVM, which must be joined later.
|
||||
- Monomorphization happens lazily via [`FunctionCx::monomorphize`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.monomorphize) and [`rustc_codegen_ssa::base::codegen_instance`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_instance.html)
|
||||