rewrite/update compiler source code chapter

parent f31532d3b2, commit 3852cb1592

@@ -1,137 +1,149 @@

Old version (lines 1-137):

# High-level overview of the compiler source

## Crate structure

The main Rust repository consists of a `src` directory, under which
there live many crates. These crates contain the sources for the
standard library and the compiler. This document, of course, focuses
on the latter.

Rustc consists of a number of crates, including `rustc_ast`,
`rustc`, `rustc_target`, `rustc_codegen`, `rustc_driver`, and
many more. The source for each crate can be found in a directory
like `src/libXXX`, where `XXX` is the crate name.

(N.B. The names and divisions of these crates are not set in
stone and may change over time. For the time being, we tend towards a
finer-grained division to help with compilation time, though as incremental
compilation improves, that may change.)

The dependency structure of these crates is roughly a diamond:

```text
                 rustc_driver
               /      |       \
             /        |         \
           /          |           \
         /            v             \
rustc_codegen  rustc_borrowck   ...  rustc_metadata
         \            |            /
           \          |          /
             \        |        /
               \      v      /
                 rustc_middle
                      |
                      v
                   rustc_ast
                 /         \
               /             \
        rustc_span       rustc_builtin_macros
```

The `rustc_driver` crate, at the top of this lattice, is effectively
the "main" function for the Rust compiler. It doesn't have much "real
code", but instead ties together all of the code defined in the other
crates and defines the overall flow of execution. (As we transition
more and more to the [query model], however, the
"flow" of compilation is becoming less centrally defined.)

At the other extreme, the `rustc_middle` crate defines the common and
pervasive data structures that all the rest of the compiler uses
(e.g. how to represent types, traits, and the program itself). It
also contains some amount of the compiler itself, although that is
relatively limited.

Finally, all the crates in the bulge in the middle define the bulk of
the compiler – they all depend on `rustc_middle`, so that they can make use
of the various types defined there, and they export public routines
that `rustc_driver` will invoke as needed (more and more, what these
crates export are "query definitions", but those are covered later
on).

Below `rustc_middle` lie various crates that make up the parser and error
reporting mechanism. They are also an internal part
of the compiler and not intended to be stable (though they do wind up
getting used by some crates in the wild; a practice we hope to
gradually phase out).

## The main stages of compilation

The Rust compiler is in a bit of transition right now. It used to be a
purely "pass-based" compiler, where we ran a number of passes over the
entire program, and each did a particular check or transformation. We
are gradually replacing this pass-based code with an alternative setup
based on on-demand **queries**. In the query model, we work backwards,
executing a *query* that expresses our ultimate goal (e.g. "compile
this crate"). This query in turn may make other queries (e.g. "get me
a list of all modules in the crate"). Those queries make other queries
that ultimately bottom out in the base operations, like parsing the
input, running the type-checker, and so forth. This on-demand model
permits us to do exciting things like only do the minimal amount of
work needed to type-check a single function. It also helps with
incremental compilation. (For details on defining queries, check out
the [query model].)

Regardless of the general setup, the basic operations that the
compiler must perform are the same. The only thing that changes is
whether these operations are invoked front-to-back, or on demand. In
order to compile a Rust crate, these are the general steps that we
take:

1. **Parsing input**
    - This processes the `.rs` files and produces the AST
      ("abstract syntax tree").
    - The AST is defined in `src/librustc_ast/ast.rs`. It is intended to match the lexical
      syntax of the Rust language quite closely.
2. **Name resolution, macro expansion, and configuration**
    - Once parsing is complete, we process the AST recursively, resolving
      paths and expanding macros. This same process also processes `#[cfg]`
      nodes, and hence may strip things out of the AST as well.
3. **Lowering to HIR**
    - Once name resolution completes, we convert the AST into the HIR,
      or "[high-level intermediate representation]". The HIR is defined in
      `src/librustc_middle/hir/`; that module also includes the [lowering] code.
    - The HIR is a lightly desugared variant of the AST. It is more processed
      than the AST and more suitable for the analyses that follow.
      It is **not** required to match the syntax of the Rust language.
    - As a simple example, in the **AST**, we preserve the parentheses
      that the user wrote, so `((1 + 2) + 3)` and `1 + 2 + 3` parse
      into distinct trees, even though they are equivalent. In the
      HIR, however, parentheses nodes are removed, and those two
      expressions are represented in the same way.
4. **Type-checking and subsequent analyses**
    - An important step in processing the HIR is to perform type
      checking. This process assigns types to every HIR expression,
      for example, and also is responsible for resolving some
      "type-dependent" paths, such as field accesses (`x.f` – we
      can't know what field `f` is being accessed until we know the
      type of `x`) and associated type references (`T::Item` – we
      can't know what type `Item` is until we know what `T` is).
    - Type checking creates "side-tables" (`TypeckTables`) that include
      the types of expressions, the way to resolve methods, and so forth.
    - After type-checking, we can do other analyses, such as privacy checking.
5. **Lowering to MIR and post-processing**
    - Once type-checking is done, we can lower the HIR into MIR ("middle IR"),
      which is a **very** desugared version of Rust, well suited to borrowck
      but also to certain high-level optimizations.
6. **Translation to LLVM and LLVM optimizations**
    - From MIR, we can produce LLVM IR.
    - LLVM then runs its various optimizations, which produces a number of
      `.o` files (one for each "codegen unit").
7. **Linking**
    - Finally, those `.o` files are linked together.

[query model]: query.html
[high-level intermediate representation]: hir.html
[lowering]: lowering.html

New version (lines 1-149):

# High-level overview of the compiler source

> **NOTE**: The structure of the repository is going through a lot of
> transitions. In particular, we want to get to a point eventually where the
> top-level directory has separate directories for the compiler, build-system,
> std libs, etc., rather than one huge `src/` directory.

## Workspace structure

The `rust-lang/rust` repository consists of a single large cargo workspace
containing the compiler, the standard library (core, alloc, std, etc.), and
`rustdoc`, along with the build system and a bunch of tools and submodules for
building a full Rust distribution.

As of this writing, this structure is gradually undergoing some transformation
to make it a bit less monolithic and more approachable, especially to
newcomers.

> Eventually, the hope is for the standard library to live in a `stdlib/`
> directory, while the compiler lives in `compiler/`. However, as of this
> writing, both live in `src/`.

The repository consists of a `src` directory, under which there live many
crates, which are the source for the compiler, standard library, etc., as
mentioned above.

## Standard library

The standard library crates are obviously named `libstd`, `libcore`,
`liballoc`, etc. There is also `libproc_macro`, `libtest`, and other runtime
libraries.

This code is fairly similar to most other Rust crates except that it must be
built in a special way because it can use unstable features.

## Compiler

The compiler crates all have names starting with `librustc_*`. These are a large
collection of interdependent crates. There is also the `rustc` crate which is
the actual binary. It doesn't actually do anything besides calling the compiler
main function elsewhere.

The dependency structure of these crates is complex, but roughly it is
something like this:

- `rustc` (the binary) calls [`rustc_driver::main`][main].
- [`rustc_driver`] depends on a lot of other crates, but the main one is
  [`rustc_interface`].
- [`rustc_interface`] depends on most of the other compiler crates. It
  is a fairly generic interface for driving the whole compilation.
- Most of the other `rustc_*` crates depend on [`rustc_middle`],
  which defines a lot of central data structures in the compiler.
- [`rustc_middle`] and most of the other crates depend on a
  handful of crates representing the early parts of the
  compiler (e.g. the parser), fundamental data structures (e.g.
  [`Span`]), or error reporting: [`rustc_data_structures`],
  [`rustc_span`], [`rustc_errors`], etc.

[main]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/fn.main.html
[`rustc_driver`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/index.html
[`rustc_interface`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/index.html
[`rustc_middle`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/index.html
[`rustc_data_structures`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/index.html
[`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html
[`Span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html
[`rustc_errors`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html

You can see the exact dependencies by reading the `Cargo.toml` for the various
crates, just like a normal Rust crate.

You may ask why the compiler is broken into so many crates. There are two major reasons:

1. Organization. The compiler is a _huge_ codebase; it would be an impossibly large crate.
2. Compile time. By breaking the compiler into multiple crates, we can take
   better advantage of incremental/parallel compilation using cargo. In
   particular, we try to have as few dependencies between crates as possible so
   that we don't have to rebuild as many crates if you change one.
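
The compile-time reason can be made concrete with a small sketch. It models a simplified, partly hypothetical slice of the crate graph described above. The real edges live in each crate's `Cargo.toml`, and `rustc_borrowck` is included only as an example of a mid-level crate. The sketch computes which crates would need rebuilding after one of them changes: the changed crate plus everything that depends on it, directly or transitively. The names `example_graph` and `rebuild_set` are illustrative, and this is not cargo's actual scheduling algorithm.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified, partly hypothetical slice of the compiler's crate graph:
/// each key maps to the crates it depends on (not the real Cargo.toml data).
fn example_graph() -> BTreeMap<&'static str, Vec<&'static str>> {
    BTreeMap::from([
        ("rustc", vec!["rustc_driver"]),
        ("rustc_driver", vec!["rustc_interface"]),
        ("rustc_interface", vec!["rustc_borrowck", "rustc_middle"]),
        ("rustc_borrowck", vec!["rustc_middle"]),
        ("rustc_middle", vec!["rustc_data_structures", "rustc_errors", "rustc_span"]),
        ("rustc_data_structures", vec![]),
        ("rustc_errors", vec![]),
        ("rustc_span", vec![]),
    ])
}

/// Returns `changed` plus every crate that depends on it directly or
/// transitively, i.e. the crates that would need rebuilding.
fn rebuild_set<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>, changed: &'a str) -> BTreeSet<&'a str> {
    let mut dirty = BTreeSet::from([changed]);
    // Repeat until a full pass adds nothing: a crate becomes dirty
    // as soon as any of its dependencies is dirty.
    loop {
        let mut grew = false;
        for (krate, ds) in deps {
            if !dirty.contains(krate) && ds.iter().any(|d| dirty.contains(d)) {
                dirty.insert(*krate);
                grew = true;
            }
        }
        if !grew {
            return dirty;
        }
    }
}

fn main() {
    let deps = example_graph();
    for changed in ["rustc_borrowck", "rustc_span"] {
        let dirty = rebuild_set(&deps, changed);
        println!("changing {} rebuilds {} crate(s): {:?}", changed, dirty.len(), dirty);
    }
}
```

In this model, editing `rustc_borrowck` dirties 4 of the 8 crates. Editing `rustc_span`, which sits underneath `rustc_middle`, dirties 6. The fewer crates that depend on a given crate, the fewer rebuilds a change to it triggers.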

Most of this book is about the compiler, so we won't have any further
explanation of these crates here.

One final thing: [`src/llvm-project`] is a submodule for our fork of LLVM.

[`src/llvm-project`]: https://github.com/rust-lang/rust/tree/master/src

## rustdoc

The bulk of `rustdoc` is in [`librustdoc`]. However, the `rustdoc` binary
itself is [`src/tools/rustdoc`], which does nothing except call [`rustdoc::main`].

There is also JavaScript and CSS for the rustdocs in [`src/tools/rustdoc-js`]
and [`src/tools/rustdoc-themes`].

You can read more about rustdoc in [this chapter][rustdocch].

[`librustdoc`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustdoc/index.html
[`rustdoc::main`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustdoc/fn.main.html
[`src/tools/rustdoc`]: https://github.com/rust-lang/rust/tree/master/src/tools/rustdoc
[`src/tools/rustdoc-js`]: https://github.com/rust-lang/rust/tree/master/src/tools/rustdoc-js
[`src/tools/rustdoc-themes`]: https://github.com/rust-lang/rust/tree/master/src/tools/rustdoc-themes

[rustdocch]: ./rustdoc-internals.md

## Tests

The test suite for all of the above is in [`src/test/`]. You can read more
about the test suite [in this chapter][testsch].

The test harness itself is in [`src/tools/compiletest`].

[testsch]: ./tests/intro.md

[`src/test/`]: https://github.com/rust-lang/rust/tree/master/src/test
[`src/tools/compiletest`]: https://github.com/rust-lang/rust/tree/master/src/tools/compiletest
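
Much of what `compiletest` checks is driven by annotations written inside the test files themselves. For example, `//~ ERROR` marks an expected error on the same line, and `//~^ ERROR` marks one on the line above, with each extra `^` moving one more line up. The sketch below is a simplified, hypothetical parser for just those two forms. It is not compiletest's code, and it ignores other annotation forms such as `//~|`. The names `Expected` and `parse_annotations` are made up for illustration.

```rust
/// One expected diagnostic taken from a test-file annotation.
#[derive(Debug, PartialEq)]
struct Expected {
    line: usize,  // 1-based line the diagnostic should point at
    kind: String, // e.g. "ERROR" or "WARN"
    msg: String,  // text the real diagnostic message is expected to contain
}

/// Simplified, hypothetical parser for `//~ KIND msg` and `//~^ KIND msg`.
fn parse_annotations(src: &str) -> Vec<Expected> {
    let mut out = Vec::new();
    for (idx, text) in src.lines().enumerate() {
        let pos = match text.find("//~") {
            Some(p) => p,
            None => continue,
        };
        let rest = &text[pos + 3..];
        // Each `^` points the annotation one line further up.
        let carets = rest.chars().take_while(|&c| c == '^').count();
        let rest = rest[carets..].trim_start();
        let (kind, msg) = rest.split_once(' ').unwrap_or((rest, ""));
        out.push(Expected {
            // saturating_sub avoids underflow on malformed input near the top of the file
            line: (idx + 1).saturating_sub(carets),
            kind: kind.to_string(),
            msg: msg.trim().to_string(),
        });
    }
    out
}

fn main() {
    let test_file = r#"fn main() {
    let x: u32 = "a"; //~ ERROR mismatched types
    undefined_fn();
    //~^ ERROR cannot find function
}"#;
    for e in parse_annotations(test_file) {
        println!("line {}: {} {:?}", e.line, e.kind, e.msg);
    }
}
```

Here the second annotation sits on line 4 but targets line 3, the call it refers to. A real harness would then compare these expectations against the diagnostics the compiler actually emitted.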

## Build System

There are a number of tools in the repository just for building the compiler,
standard library, rustdoc, etc., along with testing, building a full Rust
distribution, etc.

One of the primary tools is [`src/bootstrap`]. You can read more about
bootstrapping [in this chapter][bootstch]. The process may also use other tools
from `src/tools/`, such as [`tidy`] or [`compiletest`].

[`src/bootstrap`]: https://github.com/rust-lang/rust/tree/master/src/bootstrap
[`tidy`]: https://github.com/rust-lang/rust/tree/master/src/tools/tidy
[`compiletest`]: https://github.com/rust-lang/rust/tree/master/src/tools/compiletest

[bootstch]: ./building/bootstrapping.md
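
As an example of the kind of source check `tidy` performs, the sketch below flags overly long lines and trailing whitespace in a source string. It is a simplified, hypothetical stand-in rather than tidy's implementation. `check_style` is a made-up name, the column limit is a parameter, and the value 100 used in `main` is an assumption for illustration.

```rust
/// Simplified, hypothetical tidy-style check. Returns (1-based line, problem) pairs.
fn check_style(src: &str, max_cols: usize) -> Vec<(usize, &'static str)> {
    let mut problems = Vec::new();
    for (idx, line) in src.lines().enumerate() {
        // Count chars, not bytes, so non-ASCII text is not over-reported.
        if line.chars().count() > max_cols {
            problems.push((idx + 1, "line longer than the column limit"));
        }
        if line.ends_with(' ') || line.ends_with('\t') {
            problems.push((idx + 1, "trailing whitespace"));
        }
    }
    problems
}

fn main() {
    // Assumed 100-column limit, for illustration only.
    let src = format!("fn main() {{ \n    let s = \"{}\";\n}}\n", "x".repeat(100));
    for (line, problem) in check_style(&src, 100) {
        println!("line {}: {}", line, problem);
    }
}
```

Checks like this are cheap to run over the whole tree, which is why they fit naturally into the build and CI process rather than into the compiler itself.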

## Other

There are a lot of other things in the `rust-lang/rust` repo that are related
to building a full Rust distribution. Most of the time you don't need to worry
about them.

These include:

- [`src/ci`]: The CI configuration. This is actually quite extensive because we
  run a lot of tests on a lot of platforms.
- [`src/doc`]: Various documentation, including submodules for a few books.
- [`src/etc`]: Miscellaneous utilities.
- [`src/tools/rustc-workspace-hack`], and others: Various workarounds to make cargo work with bootstrapping.
- And more...

[`src/ci`]: https://github.com/rust-lang/rust/tree/master/src/ci
[`src/doc`]: https://github.com/rust-lang/rust/tree/master/src/doc
[`src/etc`]: https://github.com/rust-lang/rust/tree/master/src/etc
[`src/tools/rustc-workspace-hack`]: https://github.com/rust-lang/rust/tree/master/src/tools/rustc-workspace-hack