Update overview.md (#1898)

* Update overview.md

Various link additions and minor edits for clarity.

* generic improvements

* fix line lengths for ci/cd

---------

Co-authored-by: Tbkhi <me.stole546@silomails.com>
Co-authored-by: Oliver Dechant <ol922807@dal.ca>
Tbkhi 2024-03-04 16:00:53 -04:00 committed by GitHub
parent 57b23c8566
commit 0a9c758ed5
1 changed files with 164 additions and 148 deletions

@ -6,25 +6,24 @@ This chapter is about the overall process of compiling a program -- how
everything fits together. everything fits together.
The Rust compiler is special in two ways: it does things to your code that The Rust compiler is special in two ways: it does things to your code that
other compilers don't do (e.g. borrow checking) and it has a lot of other compilers don't do (e.g. borrow-checking) and it has a lot of
unconventional implementation choices (e.g. queries). We will talk about these unconventional implementation choices (e.g. queries). We will talk about these
in turn in this chapter, and in the rest of the guide, we will look at all the in turn in this chapter, and in the rest of the guide, we will look at the
individual pieces in more detail. individual pieces in more detail.
## What the compiler does to your code ## What the compiler does to your code
So first, let's look at what the compiler does to your code. For now, we will So first, let's look at what the compiler does to your code. For now, we will
avoid mentioning how the compiler implements these steps except as needed; avoid mentioning how the compiler implements these steps except as needed.
we'll talk about that later.
### Invocation ### Invocation
Compilation begins when a user writes a Rust source program in text Compilation begins when a user writes a Rust source program in text and invokes
and invokes the `rustc` compiler on it. The work that the compiler needs to the `rustc` compiler on it. The work that the compiler needs to perform is
perform is defined by command-line options. For example, it is possible to defined by command-line options. For example, it is possible to enable nightly
enable nightly features (`-Z` flags), perform `check`-only builds, or emit features (`-Z` flags), perform `check`-only builds, or emit the LLVM
LLVM-IR rather than executable machine code. The `rustc` executable call may Intermediate Representation (`LLVM-IR`) rather than executable machine code.
be indirect through the use of `cargo`. The `rustc` executable call may be indirect through the use of `cargo`.
Command line argument parsing occurs in the [`rustc_driver`]. This crate Command line argument parsing occurs in the [`rustc_driver`]. This crate
defines the compile configuration that is requested by the user and passes it defines the compile configuration that is requested by the user and passes it
@ -34,140 +33,151 @@ to the rest of the compilation process as a [`rustc_interface::Config`].
The raw Rust source text is analyzed by a low-level *lexer* located in The raw Rust source text is analyzed by a low-level *lexer* located in
[`rustc_lexer`]. At this stage, the source text is turned into a stream of [`rustc_lexer`]. At this stage, the source text is turned into a stream of
atomic source code units known as _tokens_. The lexer supports the atomic source code units known as _tokens_. The `lexer` supports the
Unicode character encoding. Unicode character encoding.
The token stream passes through a higher-level lexer located in The token stream passes through a higher-level lexer located in
[`rustc_parse`] to prepare for the next stage of the compile process. The [`rustc_parse`] to prepare for the next stage of the compile process. The
[`StringReader`] struct is used at this stage to perform a set of validations [`StringReader`] `struct` is used at this stage to perform a set of validations
and turn strings into interned symbols (_interning_ is discussed later). and turn strings into interned symbols (_interning_ is discussed later).
[String interning] is a way of storing only one immutable [String interning] is a way of storing only one immutable
copy of each distinct string value. copy of each distinct string value.
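To make the interning idea concrete, here is a minimal sketch of a string interner. The `Interner` and `Symbol` names below are made up for this illustration and are much simpler than whatever `rustc` does internally; the only point is that equal strings always come back as the same small, cheaply comparable handle.

```rust
use std::collections::HashMap;

/// A cheap, copyable handle standing in for an interned string.
/// (Hypothetical type for illustration; not rustc's real symbol type.)
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

/// Maps each distinct string value to exactly one `Symbol`.
#[derive(Default)]
pub struct Interner {
    lookup: HashMap<String, Symbol>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing symbol for `s`, or allocates a new one.
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.lookup.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.lookup.insert(s.to_owned(), sym);
        sym
    }

    /// Resolves a symbol back to its text.
    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}
```

Interning the same text twice yields equal symbols, so later stages can compare identifiers by comparing integers instead of string contents.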
The lexer has a small interface and doesn't depend directly on the The lexer has a small interface and doesn't depend directly on the diagnostic
diagnostic infrastructure in `rustc`. Instead it provides diagnostics as plain infrastructure in `rustc`. Instead it provides diagnostics as plain data which
data which are emitted in `rustc_parse::lexer` as real diagnostics. are emitted in [`rustc_parse::lexer`] as real diagnostics. The `lexer`
The lexer preserves full fidelity information for both IDEs and proc macros. preserves full fidelity information for both IDEs and procedural macros
(sometimes referred to as "proc-macros").
The *parser* [translates the token stream from the lexer into an Abstract Syntax The *parser* [translates the token stream from the `lexer` into an Abstract Syntax
Tree (AST)][parser]. It uses a recursive descent (top-down) approach to syntax Tree (AST)][parser]. It uses a recursive descent (top-down) approach to syntax
analysis. The crate entry points for the parser are the analysis. The crate entry points for the `parser` are the
[`Parser::parse_crate_mod()`][parse_crate_mod] and [`Parser::parse_mod()`][parse_mod] [`Parser::parse_crate_mod()`][parse_crate_mod] and [`Parser::parse_mod()`][parse_mod]
methods found in [`rustc_parse::parser::Parser`]. The external module parsing methods found in [`rustc_parse::parser::Parser`]. The external module parsing
entry point is [`rustc_expand::module::parse_external_mod`][parse_external_mod]. entry point is [`rustc_expand::module::parse_external_mod`][parse_external_mod].
And the macro parser entry point is [`Parser::parse_nonterminal()`][parse_nonterminal]. And the macro-`parser` entry point is [`Parser::parse_nonterminal()`][parse_nonterminal].
Parsing is performed with a set of `Parser` utility methods including `bump`, Parsing is performed with a set of [`parser`] utility methods including [`bump`],
`check`, `eat`, `expect`, `look_ahead`. [`check`], [`eat`], [`expect`], [`look_ahead`].
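The following toy parser sketches how helpers of this kind support a recursive descent approach. Everything in it is hypothetical: `Token`, `ToyParser`, and the tiny arithmetic grammar are invented for the example, and only the helper names (`bump`, `check`, `eat`, `expect`) echo the ones mentioned above. To stay short it evaluates the expression directly rather than building AST nodes.

```rust
/// Toy token type; real compiler tokens carry much more information.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Token {
    Num(i64),
    Plus,
    Star,
    LParen,
    RParen,
    Eof,
}

/// Toy parser (hypothetical; only the helper names mirror the text above).
pub struct ToyParser<'a> {
    tokens: &'a [Token],
    pos: usize,
}

impl<'a> ToyParser<'a> {
    pub fn new(tokens: &'a [Token]) -> Self {
        ToyParser { tokens, pos: 0 }
    }

    /// Current token, or `Eof` once the input is exhausted.
    fn current(&self) -> Token {
        self.tokens.get(self.pos).copied().unwrap_or(Token::Eof)
    }

    /// Advance past the current token.
    fn bump(&mut self) {
        self.pos += 1;
    }

    /// Is the current token `tok`? Does not consume it.
    fn check(&self, tok: Token) -> bool {
        self.current() == tok
    }

    /// Consume the current token if it is `tok`.
    fn eat(&mut self, tok: Token) -> bool {
        let found = self.check(tok);
        if found {
            self.bump();
        }
        found
    }

    /// Like `eat`, but a mismatch is reported as an error.
    fn expect(&mut self, tok: Token) -> Result<(), String> {
        if self.eat(tok) {
            Ok(())
        } else {
            Err(format!("expected {:?}, found {:?}", tok, self.current()))
        }
    }

    /// expr := term ('+' term)*
    pub fn parse_expr(&mut self) -> Result<i64, String> {
        let mut value = self.parse_term()?;
        while self.eat(Token::Plus) {
            value += self.parse_term()?;
        }
        Ok(value)
    }

    /// term := atom ('*' atom)*
    fn parse_term(&mut self) -> Result<i64, String> {
        let mut value = self.parse_atom()?;
        while self.eat(Token::Star) {
            value *= self.parse_atom()?;
        }
        Ok(value)
    }

    /// atom := NUM | '(' expr ')'
    fn parse_atom(&mut self) -> Result<i64, String> {
        match self.current() {
            Token::Num(n) => {
                self.bump();
                Ok(n)
            }
            Token::LParen => {
                self.bump();
                let value = self.parse_expr()?;
                self.expect(Token::RParen)?;
                Ok(value)
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}
```

Each grammar rule becomes one method, and the call structure of those methods mirrors the nesting of the input, which is what "top-down" refers to.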
Parsing is organized by semantic construct. Separate Parsing is organized by semantic construct. Separate
`parse_*` methods can be found in the [`rustc_parse`][rustc_parse_parser_dir] `parse_*` methods can be found in the [`rustc_parse`][rustc_parse_parser_dir]
directory. The source file name follows the construct name. For example, the directory. The source file name follows the construct name. For example, the
following files are found in the parser: following files are found in the `parser`:
- `expr.rs` - [`expr.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/expr.rs)
- `pat.rs` - [`pat.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/pat.rs)
- `ty.rs` - [`ty.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/ty.rs)
- `stmt.rs` - [`stmt.rs`](https://github.com/rust-lang/rust/blob/master/compiler/rustc_parse/src/parser/stmt.rs)
This naming scheme is used across many compiler stages. You will find This naming scheme is used across many compiler stages. You will find either a
either a file or directory with the same name across the parsing, lowering, file or directory with the same name across the parsing, lowering, type
type checking, THIR lowering, and MIR building sources. checking, [Typed High-level Intermediate Representation (`THIR`)] lowering, and
[Mid-level Intermediate Representation (`MIR`)][mir] building sources.
Macro expansion, AST validation, name resolution, and early linting also take place Macro-expansion, `AST`-validation, name-resolution, and early linting also take
during this stage. place during the lexing and parsing stage.
The parser uses the standard `DiagnosticBuilder` API for error handling, but we The [`rustc_ast::ast`]::{[`Crate`], [`Expr`], [`Pat`], ...} `AST` nodes are
try to recover, parsing a superset of Rust's grammar, while also emitting an error. returned from the parser while the standard [`DiagnosticBuilder`] API is used
`rustc_ast::ast::{Crate, Mod, Expr, Pat, ...}` AST nodes are returned from the parser. for error handling. Generally Rust's compiler will try to recover from errors
by parsing a superset of Rust's grammar, while also emitting an error type.
### HIR lowering ### `HIR` lowering
Next, we take the AST and convert it to [High-Level Intermediate Next the `AST` is converted into [High-Level Intermediate Representation
Representation (HIR)][hir], a more compiler-friendly representation of the (`HIR`)][hir], a more compiler-friendly representation of the `AST`. This process
AST. This process is called "lowering". It involves a lot of desugaring of things is called "lowering" and involves a lot of desugaring (the expansion and
like loops and `async fn`. formalizing of shortened or abbreviated syntax constructs) of things like loops
and `async fn`.
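As a rough illustration of desugaring, the two functions below compute the same result. The second one spells out, by hand and only approximately, the kind of explicit iterator loop that a `for` loop stands for; the function names are invented for this example and the compiler's actual lowered form differs in detail.

```rust
/// Sums a slice using ordinary `for` syntax.
pub fn sum_sugared(xs: &[i32]) -> i32 {
    let mut total = 0;
    for x in xs {
        total += *x;
    }
    total
}

/// Approximately what the loop above stands for: an explicit iterator
/// driven by `loop` and `match`.
pub fn sum_desugared(xs: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(xs);
    loop {
        match iter.next() {
            Some(x) => total += *x,
            None => break,
        }
    }
    total
}
```

After lowering, later stages only need to understand `loop` and `match`, not every surface-level loop form.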
We then use the HIR to do [*type inference*] (the process of automatic We then use the `HIR` to do [*type inference*] (the process of automatic
detection of the type of an expression), [*trait solving*] (the process detection of the type of an expression), [*trait solving*] (the process of
of pairing up an impl with each reference to a trait), and [*type pairing up an impl with each reference to a `trait`), and [*type checking*]. Type
checking*]. Type checking is the process of converting the types found in the HIR checking is the process of converting the types found in the `HIR` ([`hir::Ty`]),
([`hir::Ty`]), which represent what the user wrote, which represent what the user wrote, into the internal representation used by
into the internal representation used by the compiler ([`Ty<'tcx>`]). the compiler ([`Ty<'tcx>`]). It's called type checking because the information
That information is used to verify the type safety, correctness and is used to verify the type safety, correctness and coherence of the types used
coherence of the types used in the program. in the program.
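A small user-level example may help show what type inference and trait solving have to figure out. The function names are invented for this sketch; the code only demonstrates the language behavior, not how the compiler implements it.

```rust
/// Type inference: no element type is written for `v`, yet the return type
/// forces the compiler to conclude that it is a `Vec<u8>`.
pub fn inferred_vec() -> Vec<u8> {
    let mut v = Vec::new();
    v.push(1);
    v.push(2);
    v
}

/// Trait solving: `sum` is generic over its result type, so the annotation on
/// `total` is what lets the compiler select the matching `Sum` impl for `i32`.
pub fn summed() -> i32 {
    let total: i32 = [1, 2, 3].iter().sum();
    total
}
```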
### MIR lowering ### `MIR` lowering
The HIR is then [lowered to Mid-level Intermediate Representation (MIR)][mir], The `HIR` is further lowered to `MIR`
which is used for [borrow checking]. (used for [borrow checking]) by constructing the `THIR` (an even more desugared `HIR` used for
pattern and exhaustiveness checking) to convert into `MIR`.
Along the way, we also construct the THIR, which is an even more desugared HIR. We do [many optimizations on the MIR][mir-opt] because it is generic and that
THIR is used for pattern and exhaustiveness checking. It is also more improves later code generation and compilation speed. It is easier to do some
convenient to convert into MIR than HIR is. optimizations at `MIR` level than at `LLVM-IR` level. For example LLVM doesn't seem
to be able to optimize the pattern the [`simplify_try`] `MIR`-opt looks for.
We do [many optimizations on the MIR][mir-opt] because it is still Rust code is also [_monomorphized_] during code generation, which means making
generic and that improves the code we generate later, improving compilation copies of all the generic code with the type parameters replaced by concrete
speed too. types. To do this, we need to collect a list of what concrete types to generate
MIR is a higher level (and generic) representation, so it is easier to do code for. This is called _monomorphization collection_ and it happens at the
some optimizations at MIR level than at LLVM-IR level. For example LLVM `MIR` level.
doesn't seem to be able to optimize the pattern the [`simplify_try`] mir
opt looks for.
Rust code is _monomorphized_, which means making copies of all the generic [_monomorphized_]: https://en.wikipedia.org/wiki/Monomorphization
code with the type parameters replaced by concrete types. To do
this, we need to collect a list of what concrete types to generate code for.
This is called _monomorphization collection_ and it happens at the MIR level.
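The sketch below illustrates the idea from the user's side. `describe` is ordinary generic code; `describe_i32` and `describe_str` are hand-written stand-ins (hypothetical names) for the specialized copies that would be generated, and `uses` shows the kind of call sites that determine which copies are needed.

```rust
use std::fmt::Display;

/// Generic source code: one definition that works for any `T: Display`.
pub fn describe<T: Display>(value: T) -> String {
    format!("value = {}", value)
}

/// Hand-written stand-in for the copy specialized to `i32`
/// (hypothetical name; generated instances do not get names like this).
pub fn describe_i32(value: i32) -> String {
    format!("value = {}", value)
}

/// Hand-written stand-in for the copy specialized to `&str`.
pub fn describe_str(value: &str) -> String {
    format!("value = {}", value)
}

/// These two calls use `describe` at two concrete types, so two
/// instances are required.
pub fn uses() -> (String, String) {
    (describe(7_i32), describe("seven"))
}
```

If `uses` never called `describe` with a `&str`, there would be no reason to generate code for that instance, which is why the set of concrete types has to be collected first.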
### Code generation ### Code generation
We then begin what is vaguely called _code generation_ or _codegen_. We then begin what is simply called _code generation_ or _codegen_. The [code
The [code generation stage][codegen] is when higher level generation stage][codegen] is when higher-level representations of source are
representations of source are turned into an executable binary. `rustc` turned into an executable binary. Since `rustc` uses LLVM for code generation,
uses LLVM for code generation. The first step is to convert the MIR the first step is to convert the `MIR` to `LLVM-IR`. This is where the `MIR` is
to LLVM Intermediate Representation (LLVM IR). This is where the MIR actually monomorphized. The `LLVM-IR` is passed to LLVM, which does a lot more
is actually monomorphized, according to the list we created in the optimizations on it, emitting machine code which is basically assembly code
previous step. with additional low-level types and annotations added (e.g. an ELF object or
The LLVM IR is passed to LLVM, which does a lot more optimizations on it. `WASM`). The different libraries/binaries are then linked together to produce
It then emits machine code. It is basically assembly code with additional the final binary.
low-level types and annotations added (e.g. an ELF object or WASM).
The different libraries/binaries are then linked together to produce the final
binary.
[String interning]: https://en.wikipedia.org/wiki/String_interning
[`rustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
[`rustc_driver`]: rustc-driver.md
[`rustc_interface::Config`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/interface/struct.Config.html
[lex]: the-parser.md
[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
[`rustc_parse`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
[hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html
[*type inference*]: type-inference.md
[*trait solving*]: traits/resolution.md [*trait solving*]: traits/resolution.md
[*type checking*]: type-checking.md [*type checking*]: type-checking.md
[mir]: mir/index.md [*type inference*]: type-inference.md
[borrow checking]: borrow_check.md [`bump`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.bump
[mir-opt]: mir/optimizations.md [`check`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.check
[`simplify_try`]: https://github.com/rust-lang/rust/pull/66282 [`Crate`]: https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/ast/struct.Crate.html
[codegen]: backend/codegen.md [`DiagnosticBuilder`]: https://doc.rust-lang.org/beta/nightly-rustc/rustc_errors/struct.DiagnosticBuilder.html
[parse_nonterminal]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_nonterminal [`eat`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.eat
[parse_crate_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_crate_mod [`expect`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.expect
[parse_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_mod [`Expr`]: https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/ast/struct.Expr.html
[`rustc_parse::parser::Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
[parse_external_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html
[rustc_parse_parser_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_parse/src/parser
[`hir::Ty`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Ty.html [`hir::Ty`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Ty.html
[`look_ahead`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.look_ahead
[`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
[`Pat`]: https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/ast/struct.Pat.html
[`rustc_ast::ast`]: https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/index.html
[`rustc_driver`]: rustc-driver.md
[`rustc_interface::Config`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/interface/struct.Config.html
[`rustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
[`rustc_parse::lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/index.html
[`rustc_parse::parser::Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
[`rustc_parse`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
[`simplify_try`]: https://github.com/rust-lang/rust/pull/66282
[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
[`Ty<'tcx>`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html [`Ty<'tcx>`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html
[borrow checking]: borrow_check.md
[codegen]: backend/codegen.md
[hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html
[lex]: the-parser.md
[mir-opt]: mir/optimizations.md
[mir]: mir/index.md
[parse_crate_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_crate_mod
[parse_external_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html
[parse_mod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_mod
[parse_nonterminal]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_nonterminal
[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
[rustc_parse_parser_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_parse/src/parser
[String interning]: https://en.wikipedia.org/wiki/String_interning
[Typed High-level Intermediate Representation (`THIR`)]: https://rustc-dev-guide.rust-lang.org/thir.html
## How it does it ## How it does it
Ok, so now that we have a high-level view of what the compiler does to your Now that we have a high-level view of what the compiler does to your code,
code, let's take a high-level view of _how_ it does all that stuff. There are a let's take a high-level view of _how_ it does all that stuff. There are a lot
lot of constraints and conflicting goals that the compiler needs to of constraints and conflicting goals that the compiler needs to
satisfy/optimize for. For example, satisfy/optimize for. For example,
- Compilation speed: how fast is it to compile a program. More/better - Compilation speed: how fast is it to compile a program? More/better
compile-time analyses often means compilation is slower. compile-time analyses often means compilation is slower.
- Also, we want to support incremental compilation, so we need to take that - Also, we want to support incremental compilation, so we need to take that
into account. How can we keep track of what work needs to be redone and into account. How can we keep track of what work needs to be redone and
@ -190,17 +200,17 @@ satisfy/optimize for. For example,
the input programs says they do, and should continue to do so despite the the input programs says they do, and should continue to do so despite the
tremendous amount of change constantly going on. tremendous amount of change constantly going on.
- Integration: a number of other tools need to use the compiler in - Integration: a number of other tools need to use the compiler in
various ways (e.g. cargo, clippy, miri) that must be supported. various ways (e.g. `cargo`, `clippy`, `MIRI`) that must be supported.
- Compiler stability: the compiler should not crash or fail ungracefully on the - Compiler stability: the compiler should not crash or fail ungracefully on the
stable channel. stable channel.
- Rust stability: the compiler must respect Rust's stability guarantees by not - Rust stability: the compiler must respect Rust's stability guarantees by not
breaking programs that previously compiled despite the many changes that are breaking programs that previously compiled despite the many changes that are
always going on to its implementation. always going on to its implementation.
- Limitations of other tools: rustc uses LLVM in its backend, and LLVM has some - Limitations of other tools: `rustc` uses LLVM in its backend, and LLVM has some
strengths we leverage and some limitations/weaknesses we need to work around. strengths we leverage and some aspects we need to work around.
So, as you read through the rest of the guide, keep these things in mind. They So, as you continue your journey through the rest of the guide, keep these
will often inform decisions that we make. things in mind. They will often inform decisions that we make.
### Intermediate representations ### Intermediate representations
@ -217,31 +227,32 @@ for different purposes:
- Token stream: the lexer produces a stream of tokens directly from the source - Token stream: the lexer produces a stream of tokens directly from the source
code. This stream of tokens is easier for the parser to deal with than raw code. This stream of tokens is easier for the parser to deal with than raw
text. text.
- Abstract Syntax Tree (AST): the abstract syntax tree is built from the stream - Abstract Syntax Tree (`AST`): the abstract syntax tree is built from the stream
of tokens produced by the lexer. It represents of tokens produced by the lexer. It represents
pretty much exactly what the user wrote. It helps to do some syntactic sanity pretty much exactly what the user wrote. It helps to do some syntactic sanity
checking (e.g. checking that a type is expected where the user wrote one). checking (e.g. checking that a type is expected where the user wrote one).
- High-level IR (HIR): This is a sort of desugared AST. It's still close - High-level IR (HIR): This is a sort of desugared `AST`. It's still close
to what the user wrote syntactically, but it includes some implicit things to what the user wrote syntactically, but it includes some implicit things
such as some elided lifetimes, etc. This IR is amenable to type checking. such as some elided lifetimes, etc. This IR is amenable to type checking.
- Typed HIR (THIR): This is an intermediate between HIR and MIR, and used to be called - Typed `HIR` (THIR) _formerly High-level Abstract IR (HAIR)_: This is an
High-level Abstract IR (HAIR). It is like the HIR but it is fully typed and a bit intermediate between `HIR` and MIR. It is like the `HIR` but it is fully typed
more desugared (e.g. method calls and implicit dereferences are made fully explicit). and a bit more desugared (e.g. method calls and implicit dereferences are
Moreover, it is easier to lower to MIR from THIR than from HIR. made fully explicit). As a result, it is easier to lower to `MIR` from `THIR` than
- Middle-level IR (MIR): This IR is basically a Control-Flow Graph (CFG). A CFG from HIR.
- Middle-level IR (`MIR`): This IR is basically a Control-Flow Graph (CFG). A CFG
is a type of diagram that shows the basic blocks of a program and how control is a type of diagram that shows the basic blocks of a program and how control
flow can go between them. Likewise, MIR also has a bunch of basic blocks with flow can go between them. Likewise, `MIR` also has a bunch of basic blocks with
simple typed statements inside them (e.g. assignment, simple computations, simple typed statements inside them (e.g. assignment, simple computations,
etc) and control flow edges to other basic blocks (e.g., calls, dropping etc) and control flow edges to other basic blocks (e.g., calls, dropping
values). MIR is used for borrow checking and other values). `MIR` is used for borrow checking and other
important dataflow-based checks, such as checking for uninitialized values. important dataflow-based checks, such as checking for uninitialized values.
It is also used for a series of optimizations and for constant evaluation (via It is also used for a series of optimizations and for constant evaluation (via
MIRI). Because MIR is still generic, we can do a lot of analyses here more `MIRI`). Because `MIR` is still generic, we can do a lot of analyses here more
efficiently than after monomorphization. efficiently than after monomorphization.
- LLVM IR: This is the standard form of all input to the LLVM compiler. LLVM IR - `LLVM-IR`: This is the standard form of all input to the LLVM compiler. `LLVM-IR`
is a sort of typed assembly language with lots of annotations. It's is a sort of typed assembly language with lots of annotations. It's
a standard format that is used by all compilers that use LLVM (e.g. the clang a standard format that is used by all compilers that use LLVM (e.g. the clang
C compiler also outputs LLVM IR). LLVM IR is designed to be easy for other C compiler also outputs `LLVM-IR`). `LLVM-IR` is designed to be easy for other
compilers to emit and also rich enough for LLVM to run a bunch of compilers to emit and also rich enough for LLVM to run a bunch of
optimizations on it. optimizations on it.
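To give a feel for the "basic blocks plus control-flow edges" shape described for MIR, here is a toy control-flow graph with a tiny interpreter. All types and names (`Statement`, `Terminator`, `BasicBlock`, `run`) are hypothetical and far simpler than the compiler's real data structures.

```rust
/// Simple statements that live inside a basic block.
#[derive(Debug)]
pub enum Statement {
    /// `locals[dest] = value`
    Assign { dest: usize, value: i64 },
    /// `locals[dest] = locals[dest] + locals[src]`
    AddAssign { dest: usize, src: usize },
}

/// The control-flow edge(s) leaving a basic block.
#[derive(Debug)]
pub enum Terminator {
    /// Unconditional jump to another block.
    Goto(usize),
    /// Jump to `then_block` if `locals[cond] != 0`, otherwise to `else_block`.
    Branch { cond: usize, then_block: usize, else_block: usize },
    /// Leave the function, returning `locals[0]`.
    Return,
}

/// Straight-line statements followed by exactly one terminator.
#[derive(Debug)]
pub struct BasicBlock {
    pub statements: Vec<Statement>,
    pub terminator: Terminator,
}

/// Interprets the graph, starting at block 0.
pub fn run(blocks: &[BasicBlock], num_locals: usize) -> i64 {
    let mut locals = vec![0_i64; num_locals];
    let mut current = 0;
    loop {
        let block = &blocks[current];
        for stmt in &block.statements {
            match stmt {
                Statement::Assign { dest, value } => locals[*dest] = *value,
                Statement::AddAssign { dest, src } => {
                    let rhs = locals[*src];
                    locals[*dest] += rhs;
                }
            }
        }
        match &block.terminator {
            Terminator::Goto(next) => current = *next,
            Terminator::Branch { cond, then_block, else_block } => {
                current = if locals[*cond] != 0 { *then_block } else { *else_block };
            }
            Terminator::Return => return locals[0],
        }
    }
}
```

A loop in the source program shows up here as a cycle of edges between blocks, which is the form that dataflow-style analyses operate on.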
@ -258,25 +269,25 @@ representations are interned.
### Queries ### Queries
The first big implementation choice is the _query_ system. The Rust compiler The first big implementation choice is Rust's use of the _query_ system in its
uses a query system which is unlike most textbook compilers, which are compiler. The Rust compiler _is not_ organized as a series of passes over the
organized as a series of passes over the code that execute sequentially. The code which execute sequentially. The Rust compiler does this to make
compiler does this to make incremental compilation possible -- that is, if the incremental compilation possible -- that is, if the user makes a change to
user makes a change to their program and recompiles, we want to do as little their program and recompiles, we want to do as little redundant work as
redundant work as possible to produce the new binary. possible to output the new binary.
In `rustc`, all the major steps above are organized as a bunch of queries that In `rustc`, all the major steps above are organized as a bunch of queries that
call each other. For example, there is a query to ask for the type of something call each other. For example, there is a query to ask for the type of something
and another to ask for the optimized MIR of a function. These and another to ask for the optimized `MIR` of a function. These queries can call
queries can call each other and are all tracked through the query system. each other and are all tracked through the query system. The results of the
The results of the queries are cached on disk so that we can tell which queries are cached on disk so that the compiler can tell which queries' results
queries' results changed from the last compilation and only redo those. This is changed from the last compilation and only redo those. This is how incremental
how incremental compilation works. compilation works.
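The core idea of queries that call each other and cache their results can be sketched with plain memoization. This is an assumption-laden toy: `ToyCtxt`, `word_count`, and `is_large` are invented names, the cache lives only in memory rather than on disk, and there is no dependency tracking between compilations.

```rust
use std::collections::HashMap;

/// A toy query context (hypothetical; far simpler than a real compiler's).
pub struct ToyCtxt {
    /// Input data: item name -> source text.
    sources: HashMap<&'static str, &'static str>,
    word_count_cache: HashMap<&'static str, usize>,
    is_large_cache: HashMap<&'static str, bool>,
    /// Number of times a query body actually ran (cache misses).
    pub executions: usize,
}

impl ToyCtxt {
    pub fn new(sources: HashMap<&'static str, &'static str>) -> Self {
        ToyCtxt {
            sources,
            word_count_cache: HashMap::new(),
            is_large_cache: HashMap::new(),
            executions: 0,
        }
    }

    /// Query 1: number of whitespace-separated words in an item's source.
    pub fn word_count(&mut self, item: &'static str) -> usize {
        if let Some(&cached) = self.word_count_cache.get(item) {
            return cached;
        }
        self.executions += 1;
        let count = self
            .sources
            .get(item)
            .map_or(0, |src| src.split_whitespace().count());
        self.word_count_cache.insert(item, count);
        count
    }

    /// Query 2: depends on query 1 by calling it.
    pub fn is_large(&mut self, item: &'static str) -> bool {
        if let Some(&cached) = self.is_large_cache.get(item) {
            return cached;
        }
        self.executions += 1;
        let large = self.word_count(item) > 3;
        self.is_large_cache.insert(item, large);
        large
    }
}
```

Asking for `is_large` a second time runs no query bodies at all, and asking for `word_count` afterwards is also served from the cache because the first call already computed it as a dependency.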
In principle, for the query-fied steps, we do each of the above for each item In principle, for the query-fied steps, we do each of the above for each item
individually. For example, we will take the HIR for a function and use queries individually. For example, we will take the `HIR` for a function and use queries
to ask for the LLVM IR for that HIR. This drives the generation of optimized to ask for the `LLVM-IR` for that HIR. This drives the generation of optimized
MIR, which drives the borrow checker, which drives the generation of MIR, and `MIR`, which drives the borrow checker, which drives the generation of `MIR`, and
so on. so on.
... except that this is very over-simplified. In fact, some queries are not ... except that this is very over-simplified. In fact, some queries are not
@ -295,8 +306,8 @@ Moreover, the compiler wasn't originally built to use a query system; the query
system has been retrofitted into the compiler, so parts of it are not query-fied system has been retrofitted into the compiler, so parts of it are not query-fied
yet. Also, LLVM isn't our code, so that isn't querified either. The plan is to yet. Also, LLVM isn't our code, so that isn't querified either. The plan is to
eventually query-fy all of the steps listed in the previous section, eventually query-fy all of the steps listed in the previous section,
but as of <!-- date-check --> November 2022, only the steps between HIR and but as of <!-- date-check --> November 2022, only the steps between `HIR` and
LLVM IR are query-fied. That is, lexing, parsing, name resolution, and macro `LLVM-IR` are query-fied. That is, lexing, parsing, name resolution, and macro
expansion are done all at once for the whole program. expansion are done all at once for the whole program.
One other thing to mention here is the all-important "typing context", One other thing to mention here is the all-important "typing context",
@ -308,7 +319,7 @@ queries are defined as methods on the [`TyCtxt`] type, and the in-memory query
cache is stored there too. In the code, there is usually a variable called cache is stored there too. In the code, there is usually a variable called
`tcx` which is a handle on the typing context. You will also see lifetimes with `tcx` which is a handle on the typing context. You will also see lifetimes with
the name `'tcx`, which means that something is tied to the lifetime of the the name `'tcx`, which means that something is tied to the lifetime of the
`TyCtxt` (usually it is stored or interned there). [`TyCtxt`] (usually it is stored or interned there).
[`TyCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html [`TyCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html
@ -320,9 +331,10 @@ program) is [`rustc_middle::ty::Ty`][ty]. This is so important that we have a wh
on [`ty::Ty`][ty], but for now, we just want to mention that it exists and is the way on [`ty::Ty`][ty], but for now, we just want to mention that it exists and is the way
`rustc` represents types! `rustc` represents types!
Also note that the `rustc_middle::ty` module defines the `TyCtxt` struct we mentioned before. Also note that the [`rustc_middle::ty`] module defines the [`TyCtxt`] struct we mentioned before.
[ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html [ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html
[`rustc_middle::ty`]: https://doc.rust-lang.org/beta/nightly-rustc/rustc_middle/ty/index.html
### Parallelism ### Parallelism
@ -330,17 +342,21 @@ Compiler performance is a problem that we would like to improve on
(and are always working on). One aspect of that is parallelizing (and are always working on). One aspect of that is parallelizing
`rustc` itself. `rustc` itself.
Currently, there is only one part of rustc that is parallel by default: codegen. Currently, there is only one part of rustc that is parallel by default:
[code generation](./parallel-rustc.md#Codegen).
However, the rest of the compiler is still not yet parallel. There have been However, the rest of the compiler is still not yet parallel. There have been
lots of efforts spent on this, but it is generally a hard problem. The current lots of efforts spent on this, but it is generally a hard problem. The current
approach is to turn `RefCell`s into `Mutex`s -- that is, we approach is to turn [`RefCell`]s into [`Mutex`]s -- that is, we
switch to thread-safe internal mutability. However, there are ongoing switch to thread-safe internal mutability. However, there are ongoing
challenges with lock contention, maintaining query-system invariants under challenges with lock contention, maintaining query-system invariants under
concurrency, and the complexity of the code base. One can try out the current concurrency, and the complexity of the code base. One can try out the current
work by enabling parallel compilation in `config.toml`. It's still early days, work by enabling parallel compilation in `config.toml`. It's still early days,
but there are already some promising performance improvements. but there are already some promising performance improvements.
[`RefCell`]: https://doc.rust-lang.org/std/cell/struct.RefCell.html
[`Mutex`]: https://doc.rust-lang.org/std/sync/struct.Mutex.html
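The `RefCell`-to-`Mutex` switch described above can be illustrated with a minimal sketch. This is not `rustc`'s actual code: `LocalCounter`, `SharedCounter`, and `count_in_parallel` are hypothetical names invented for illustration, and the example assumes only the standard library. It shows why a `RefCell` field keeps a type confined to one thread, while the same field behind a `Mutex` can be shared, at the cost of locking on every access.

```rust
use std::cell::RefCell;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical single-threaded state: `RefCell` provides interior
/// mutability, but the type is not `Sync`, so it cannot be shared
/// between threads.
struct LocalCounter {
    hits: RefCell<u64>,
}

impl LocalCounter {
    fn record(&self) {
        // Borrow rules are checked at runtime; no locking is involved.
        *self.hits.borrow_mut() += 1;
    }
}

/// The same state with thread-safe interior mutability.
struct SharedCounter {
    hits: Mutex<u64>,
}

impl SharedCounter {
    fn record(&self) {
        // Every access takes the lock, which is where contention can arise.
        *self.hits.lock().unwrap() += 1;
    }
}

/// Spawns `threads` workers that each call `record` `per_thread` times
/// and returns the final count.
fn count_in_parallel(threads: usize, per_thread: u64) -> u64 {
    let counter = Arc::new(SharedCounter { hits: Mutex::new(0) });
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    counter.record();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    // Bind the value first so the lock guard is dropped before `counter`.
    let total = *counter.hits.lock().unwrap();
    total
}

fn main() {
    let local = LocalCounter { hits: RefCell::new(0) };
    local.record();
    println!("local hits: {}", local.hits.borrow());
    println!("shared hits: {}", count_in_parallel(4, 1000));
}
```

Wrapping `LocalCounter` in an `Arc` and moving it into `thread::spawn` would be rejected at compile time, which is the kind of error that forces such a conversion in the first place.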
### Bootstrapping ### Bootstrapping
`rustc` itself is written in Rust. So how do we compile the compiler? We use an `rustc` itself is written in Rust. So how do we compile the compiler? We use an
@ -362,7 +378,7 @@ For more details on bootstrapping, see
- Does LLVM ever do optimizations in debug builds? - Does LLVM ever do optimizations in debug builds?
- How do I explore phases of the compile process in my own sources (lexer, - How do I explore phases of the compile process in my own sources (lexer,
parser, HIR, etc)? - e.g., `cargo rustc -- -Z unpretty=hir-tree` allows you to parser, HIR, etc)? - e.g., `cargo rustc -- -Z unpretty=hir-tree` allows you to
view HIR representation view `HIR` representation
- What is the main source entry point for `X`? - What is the main source entry point for `X`?
- Where do phases diverge for cross-compilation to machine code across - Where do phases diverge for cross-compilation to machine code across
different platforms? different platforms?
@ -387,16 +403,16 @@ For more details on bootstrapping, see
- [Entry point for first file in crate](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/passes/fn.parse.html) - [Entry point for first file in crate](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/passes/fn.parse.html)
- [Entry point for outline module parsing](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html) - [Entry point for outline module parsing](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html)
- [Entry point for macro fragments][parse_nonterminal] - [Entry point for macro fragments][parse_nonterminal]
- AST definition: [`rustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html) - `AST` definition: [`rustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html)
- Feature gating: **TODO** - Feature gating: **TODO**
- Early linting: **TODO** - Early linting: **TODO**
- The High Level Intermediate Representation (HIR) - The High Level Intermediate Representation (HIR)
- Guide: [The HIR](hir.md) - Guide: [The HIR](hir.md)
- Guide: [Identifiers in the HIR](hir.md#identifiers-in-the-hir) - Guide: [Identifiers in the HIR](hir.md#identifiers-in-the-hir)
- Guide: [The HIR Map](hir.md#the-hir-map) - Guide: [The `HIR` Map](hir.md#the-hir-map)
- Guide: [Lowering AST to HIR](lowering.md) - Guide: [Lowering `AST` to HIR](lowering.md)
- How to view the HIR representation for your code: `cargo rustc -- -Z unpretty=hir-tree` - How to view the `HIR` representation for your code: `cargo rustc -- -Z unpretty=hir-tree`
- Rustc HIR definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html) - Rustc `HIR` definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html)
- Main entry point: **TODO** - Main entry point: **TODO**
- Late linting: **TODO** - Late linting: **TODO**
- Type Inference - Type Inference
@ -406,21 +422,21 @@ For more details on bootstrapping, see
- Main entry point (type checking bodies): [the `typeck` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.typeck) - Main entry point (type checking bodies): [the `typeck` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.typeck)
- These two functions can't be decoupled. - These two functions can't be decoupled.
- The Mid Level Intermediate Representation (MIR) - The Mid Level Intermediate Representation (MIR)
- Guide: [The MIR (Mid level IR)](mir/index.md) - Guide: [The `MIR` (Mid level IR)](mir/index.md)
- Definition: [`rustc_middle/src/mir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/index.html) - Definition: [`rustc_middle/src/mir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/index.html)
- Definition of sources that manipulate the MIR: [`rustc_mir_build`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_build/index.html), [`rustc_mir_dataflow`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_dataflow/index.html), [`rustc_mir_transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/index.html) - Definition of sources that manipulate the MIR: [`rustc_mir_build`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_build/index.html), [`rustc_mir_dataflow`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_dataflow/index.html), [`rustc_mir_transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/index.html)
- The Borrow Checker - The Borrow Checker
- Guide: [MIR Borrow Check](borrow_check.md) - Guide: [MIR Borrow Check](borrow_check.md)
- Definition: [`rustc_borrowck`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_borrowck/index.html) - Definition: [`rustc_borrowck`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_borrowck/index.html)
- Main entry point: [`mir_borrowck` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_borrowck/fn.mir_borrowck.html) - Main entry point: [`mir_borrowck` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_borrowck/fn.mir_borrowck.html)
- MIR Optimizations - `MIR` Optimizations
- Guide: [MIR Optimizations](mir/optimizations.md) - Guide: [MIR Optimizations](mir/optimizations.md)
- Definition: [`rustc_mir_transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/index.html) - Definition: [`rustc_mir_transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/index.html)
- Main entry point: [`optimized_mir` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/fn.optimized_mir.html) - Main entry point: [`optimized_mir` query](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir_transform/fn.optimized_mir.html)
- Code Generation - Code Generation
- Guide: [Code Generation](backend/codegen.md) - Guide: [Code Generation](backend/codegen.md)
- Generating Machine Code from LLVM IR with LLVM - **TODO: reference?** - Generating Machine Code from `LLVM-IR` with LLVM - **TODO: reference?**
- Main entry point: [`rustc_codegen_ssa::base::codegen_crate`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html) - Main entry point: [`rustc_codegen_ssa::base::codegen_crate`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_crate.html)
- This monomorphizes and produces LLVM IR for one codegen unit. It then - This monomorphizes and produces `LLVM-IR` for one codegen unit. It then
starts a background thread to run LLVM, which must be joined later. starts a background thread to run LLVM, which must be joined later.
- Monomorphization happens lazily via [`FunctionCx::monomorphize`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.monomorphize) and [`rustc_codegen_ssa::base::codegen_instance`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_instance.html) - Monomorphization happens lazily via [`FunctionCx::monomorphize`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/mir/struct.FunctionCx.html#method.monomorphize) and [`rustc_codegen_ssa::base::codegen_instance`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/base/fn.codegen_instance.html)