[overview.md] Add command line argument parsing, lexer stages, and parser outline
parent
a43ef4d3b3
commit
0783019c12
132
src/overview.md
|
|
@ -19,16 +19,16 @@ we'll talk about that later.
|
||||||
|
|
||||||
**TODO: someone else should confirm this vvv**
|
**TODO: someone else should confirm this vvv**
|
||||||
|
|
||||||
- User writes a program and invokes `rustc` on it (possibly through `cargo`).
|
- The compile process begins when a user writes a Rust source program in text and invokes the `rustc` compiler on it. The work that the compiler needs to perform is defined with command line options. For example, it is possible to optionally enable nightly features, perform `check`-only builds, or emit LLVM-IR rather than complete the entire compile process defined here. The `rustc` executable call may be indirect through the use of `cargo`.
|
||||||
- First, we parse command line flags, etc. This is done in [`librustc_driver`].
|
- Command line argument parsing occurs in the [`librustc_driver`]. This crate defines the compile configuration that is requested by the user.
|
||||||
We now know what the exact work is we need to do (e.g. which nightly features
|
- The raw Rust source text is analyzed by a low-level lexer located in [`librustc_lexer`]. At this stage, the source text is turned into a stream of atomic source code units known as _tokens_. (**TODO**: chrissimpkins - Maybe discuss Unicode handling during this stage?)
|
||||||
are enabled, whether we are doing a `check`-only build or emitting LLVM-IR or
|
- The token stream passes through a higher-level lexer located in [`librustc_parse`] to prepare for the next stage of the compile process. The [`StringReader`] struct is used at this stage to perform a set of validations and turn strings into interned symbols.
|
||||||
a full compilation).
|
- (**TODO**: chrissimpkins - Expand info on parser) We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax Tree (AST).
|
||||||
- Then, we start to do compilation...
|
- macro expansion (**TODO** chrissimpkins)
|
||||||
- We first [_lex_ the user program][lex]. This turns the program into a stream
|
- ast validation (**TODO** chrissimpkins)
|
||||||
of _tokens_ (yes, the same sort of tokens as `proc_macros` (sort of)).
|
- nameres (**TODO** chrissimpkins)
|
||||||
[`StringReader`] from [`librustc_parse`] integrates [`librustc_lexer`] with
|
- early linting (**TODO** chrissimpkins)
|
||||||
`rustc` data structures.
|
|
||||||
- We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax
|
- We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax
|
||||||
Tree (AST).
|
Tree (AST).
|
||||||
- We then take the AST and [convert it to High-Level Intermediate
|
- We then take the AST and [convert it to High-Level Intermediate
|
||||||
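The two lexing stages described in this hunk (a low-level lexer that splits raw text into tokens, then a higher-level pass that validates them and interns strings as symbols) can be illustrated with a toy sketch. Everything here is hypothetical: `RawKind`, `RawToken`, `Interner`, `Symbol`, `raw_lex`, and `cook` are invented for illustration and are not the APIs of `librustc_lexer` or `librustc_parse`, and the token grammar is far simpler than Rust's.

```rust
use std::collections::HashMap;

/// Hypothetical raw token kinds; a real lexer distinguishes many more.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RawKind {
    Ident,
    Number,
    Punct,
    Whitespace,
}

/// Stage 1 output: only a kind and a byte length, no compiler data structures.
#[derive(Debug, Clone, Copy, PartialEq)]
struct RawToken {
    kind: RawKind,
    len: usize,
}

/// Byte length of the longest prefix of `s` whose chars all satisfy `pred`.
fn prefix_len(s: &str, pred: impl Fn(char) -> bool) -> usize {
    s.chars().take_while(|&c| pred(c)).map(char::len_utf8).sum()
}

/// Stage 1: split raw source text into a flat list of raw tokens.
fn raw_lex(src: &str) -> Vec<RawToken> {
    let mut tokens = Vec::new();
    let mut rest = src;
    while let Some(first) = rest.chars().next() {
        let (kind, len) = if first.is_alphabetic() || first == '_' {
            (RawKind::Ident, prefix_len(rest, |c| c.is_alphanumeric() || c == '_'))
        } else if first.is_ascii_digit() {
            (RawKind::Number, prefix_len(rest, |c| c.is_ascii_digit()))
        } else if first.is_whitespace() {
            (RawKind::Whitespace, prefix_len(rest, char::is_whitespace))
        } else {
            // Anything else is treated as single-character punctuation.
            (RawKind::Punct, first.len_utf8())
        };
        tokens.push(RawToken { kind, len });
        rest = &rest[len..];
    }
    tokens
}

/// An interned string: equal identifiers map to the same small integer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Symbol(u32);

#[derive(Default)]
struct Interner {
    ids: HashMap<String, Symbol>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = Symbol(self.ids.len() as u32);
        self.ids.insert(s.to_owned(), sym);
        sym
    }
}

/// Stage 2 output: tokens carrying validated values and interned symbols.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(Symbol),
    Number(u64),
    Punct(char),
}

/// Stage 2: walk the raw tokens, drop whitespace, validate, and intern.
fn cook(src: &str, interner: &mut Interner) -> Result<Vec<Token>, String> {
    let mut pos = 0;
    let mut out = Vec::new();
    for raw in raw_lex(src) {
        let text = &src[pos..pos + raw.len];
        pos += raw.len;
        match raw.kind {
            RawKind::Whitespace => {}
            RawKind::Ident => out.push(Token::Ident(interner.intern(text))),
            RawKind::Number => {
                // Validation step: reject literals that do not fit in a u64.
                let n = text
                    .parse::<u64>()
                    .map_err(|e| format!("bad number {text:?}: {e}"))?;
                out.push(Token::Number(n));
            }
            RawKind::Punct => out.push(Token::Punct(text.chars().next().unwrap())),
        }
    }
    Ok(out)
}
```

Calling `cook("a + a", &mut Interner::default())` yields the same `Symbol` for both occurrences of `a`, which is the point of interning: later stages can compare identifiers by integer rather than by string. Keeping stage 1 free of compiler data structures is a design choice in this sketch that lets the raw lexer be reused outside the compiler.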
|
|
@ -45,27 +45,27 @@ we'll talk about that later.
|
||||||
- We (want to) do [many optimizations on the MIR][mir-opt] because it is still
|
- We (want to) do [many optimizations on the MIR][mir-opt] because it is still
|
||||||
generic and that improves the code we generate later, improving compilation
|
generic and that improves the code we generate later, improving compilation
|
||||||
speed too. (**TODO: size optimizations too?**)
|
speed too. (**TODO: size optimizations too?**)
|
||||||
- MIR is a higher level (and generic) representation, so it is easier to do
|
- MIR is a higher level (and generic) representation, so it is easier to do
|
||||||
some optimizations at MIR level than at LLVM-IR level. For example LLVM
|
some optimizations at MIR level than at LLVM-IR level. For example LLVM
|
||||||
doesn't seem to be able to optimize the pattern the [`simplify_try`] mir
|
doesn't seem to be able to optimize the pattern the [`simplify_try`] mir
|
||||||
opt looks for.
|
opt looks for.
|
||||||
- Rust code is _monomorphized_, which means making copies of all the generic
|
- Rust code is _monomorphized_, which means making copies of all the generic
|
||||||
code with the type parameters replaced by concrete types. To do
|
code with the type parameters replaced by concrete types. To do
|
||||||
this, we need to collect a list of what concrete types to generate code for.
|
this, we need to collect a list of what concrete types to generate code for.
|
||||||
This is called _monomorphization collection_.
|
This is called _monomorphization collection_.
|
||||||
- We then begin what is vaguely called _code generation_ or _codegen_.
|
- We then begin what is vaguely called _code generation_ or _codegen_.
|
||||||
- The [code generation stage (codegen)][codegen] is when higher level
|
- The [code generation stage (codegen)][codegen] is when higher level
|
||||||
representations of source are turned into an executable binary. `rustc`
|
representations of source are turned into an executable binary. `rustc`
|
||||||
uses LLVM for code generation. The first step is the MIR is then
|
uses LLVM for code generation. The first step is the MIR is then
|
||||||
converted to LLVM Intermediate Representation (LLVM IR). This is where
|
converted to LLVM Intermediate Representation (LLVM IR). This is where
|
||||||
the MIR is actually monomorphized, according to the list we created in
|
the MIR is actually monomorphized, according to the list we created in
|
||||||
the previous step.
|
the previous step.
|
||||||
- The LLVM IR is passed to LLVM, which does a lot more optimizations on it.
|
- The LLVM IR is passed to LLVM, which does a lot more optimizations on it.
|
||||||
It then emits machine code. It is basically assembly code with additional
|
It then emits machine code. It is basically assembly code with additional
|
||||||
low-level types and annotations added. (e.g. an ELF object or wasm).
|
low-level types and annotations added. (e.g. an ELF object or wasm).
|
||||||
**TODO: reference for this section?**
|
**TODO: reference for this section?**
|
||||||
- The different libraries/binaries are linked together to produce the final
|
- The different libraries/binaries are linked together to produce the final
|
||||||
binary. **TODO: reference for this section?**
|
binary. **TODO: reference for this section?**
|
||||||
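As a rough illustration of the monomorphization step described in this hunk, the sketch below shows one generic function next to hand-written copies of the kind the compiler conceptually produces for each concrete type it is used with. The names `describe`, `describe_i32`, and `describe_str` are hypothetical; the compiler's generated copies are not named this way and are not visible in source form.

```rust
use std::fmt::Display;

/// Generic source code: one definition, usable with any `T: Display`.
fn describe<T: Display>(value: T) -> String {
    format!("<{value}>")
}

// Suppose the program calls `describe(7_i32)` and `describe("hi")`.
// Monomorphization collection would record the concrete types `i32` and
// `&str`, and code generation would then behave as if these two
// non-generic copies had been written by hand:

/// Hand-written stand-in for the copy specialized to `i32`.
fn describe_i32(value: i32) -> String {
    format!("<{value}>")
}

/// Hand-written stand-in for the copy specialized to `&str`.
fn describe_str(value: &str) -> String {
    format!("<{value}>")
}
```

Each specialized copy behaves identically to the generic function at that type, so `describe(7)` and `describe_i32(7)` return the same string. The cost is that every distinct concrete type adds another copy of the code to generate.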
|
|
||||||
[`librustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
|
[`librustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
|
||||||
[`librustc_driver`]: https://rust-lang.github.io/rustc-guide/rustc-driver.html
|
[`librustc_driver`]: https://rust-lang.github.io/rustc-guide/rustc-driver.html
|
||||||
|
|
@ -90,12 +90,12 @@ satisfy/optimize for. For example,
|
||||||
|
|
||||||
- Compilation speed: how fast is it to compile a program. More/better
|
- Compilation speed: how fast is it to compile a program. More/better
|
||||||
compile-time analyses often means compilation is slower.
|
compile-time analyses often means compilation is slower.
|
||||||
- Also, we want to support incremental compilation, so we need to take that
|
- Also, we want to support incremental compilation, so we need to take that
|
||||||
into account. How can we keep track of what work needs to be redone and
|
into account. How can we keep track of what work needs to be redone and
|
||||||
what can be reused if the user modifies their program?
|
what can be reused if the user modifies their program?
|
||||||
- Also we can't store too much stuff in the incremental cache because
|
- Also we can't store too much stuff in the incremental cache because
|
||||||
it would take a long time to load from disk and it could take a lot
|
it would take a long time to load from disk and it could take a lot
|
||||||
of space on the user's system...
|
of space on the user's system...
|
||||||
- Compiler memory usage: while compiling a program, we don't want to use more
|
- Compiler memory usage: while compiling a program, we don't want to use more
|
||||||
memory than we need.
|
memory than we need.
|
||||||
- Program speed: how fast is your compiled program. More/better compile-time
|
- Program speed: how fast is your compiled program. More/better compile-time
|
||||||
|
|
@ -277,46 +277,46 @@ but there are already some promising performance improvements.
|
||||||
# References
|
# References
|
||||||
|
|
||||||
- Command line parsing
|
- Command line parsing
|
||||||
- Guide: [The Rustc Driver and Interface](https://rust-lang.github.io/rustc-guide/rustc-driver.html)
|
- Guide: [The Rustc Driver and Interface](https://rust-lang.github.io/rustc-guide/rustc-driver.html)
|
||||||
- Driver definition: [`rustc_driver`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/)
|
- Driver definition: [`rustc_driver`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/)
|
||||||
- Main entry point: **TODO**
|
- Main entry point: **TODO**
|
||||||
- Lexical Analysis: Lex the user program to a stream of tokens
|
- Lexical Analysis: Lex the user program to a stream of tokens
|
||||||
- Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html)
|
- Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html)
|
||||||
- Lexer definition: [`librustc_lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html)
|
- Lexer definition: [`librustc_lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html)
|
||||||
- Main entry point: **TODO**
|
- Main entry point: **TODO**
|
||||||
- Parsing: Parse the stream of tokens to an Abstract Syntax Tree (AST)
|
- Parsing: Parse the stream of tokens to an Abstract Syntax Tree (AST)
|
||||||
- Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html)
|
- Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html)
|
||||||
- Parser definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html)
|
- Parser definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html)
|
||||||
- Main entry point: **TODO**
|
- Main entry point: **TODO**
|
||||||
- AST definition: [`librustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html)
|
- AST definition: [`librustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html)
|
||||||
- The High Level Intermediate Representation (HIR)
|
- The High Level Intermediate Representation (HIR)
|
||||||
- Guide: [The HIR](https://rust-lang.github.io/rustc-guide/hir.html)
|
- Guide: [The HIR](https://rust-lang.github.io/rustc-guide/hir.html)
|
||||||
- Guide: [Identifiers in the HIR](https://rust-lang.github.io/rustc-guide/hir.html#identifiers-in-the-hir)
|
- Guide: [Identifiers in the HIR](https://rust-lang.github.io/rustc-guide/hir.html#identifiers-in-the-hir)
|
||||||
- Guide: [The HIR Map](https://rust-lang.github.io/rustc-guide/hir.html#the-hir-map)
|
- Guide: [The HIR Map](https://rust-lang.github.io/rustc-guide/hir.html#the-hir-map)
|
||||||
- Guide: [Lowering AST to HIR](https://rust-lang.github.io/rustc-guide/lowering.html)
|
- Guide: [Lowering AST to HIR](https://rust-lang.github.io/rustc-guide/lowering.html)
|
||||||
- How to view HIR representation for your code `cargo rustc -- -Zunpretty=hir-tree`
|
- How to view HIR representation for your code `cargo rustc -- -Zunpretty=hir-tree`
|
||||||
- Rustc HIR definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html)
|
- Rustc HIR definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html)
|
||||||
- Main entry point: **TODO**
|
- Main entry point: **TODO**
|
||||||
- Type Inference
|
- Type Inference
|
||||||
- Guide: [Type Inference](https://rust-lang.github.io/rustc-guide/type-inference.html)
|
- Guide: [Type Inference](https://rust-lang.github.io/rustc-guide/type-inference.html)
|
||||||
- Guide: [The ty Module: Representing Types](https://rust-lang.github.io/rustc-guide/ty.html) (semantics)
|
- Guide: [The ty Module: Representing Types](https://rust-lang.github.io/rustc-guide/ty.html) (semantics)
|
||||||
- Main entry point: **TODO**
|
- Main entry point: **TODO**
|
||||||
- The Mid Level Intermediate Representation (MIR)
|
- The Mid Level Intermediate Representation (MIR)
|
||||||
- Guide: [The MIR (Mid level IR)](https://rust-lang.github.io/rustc-guide/mir/index.html)
|
- Guide: [The MIR (Mid level IR)](https://rust-lang.github.io/rustc-guide/mir/index.html)
|
||||||
- Definition: [`librustc/mir`](https://github.com/rust-lang/rust/tree/master/src/librustc/mir)
|
- Definition: [`librustc/mir`](https://github.com/rust-lang/rust/tree/master/src/librustc/mir)
|
||||||
- Definition of source that manipulates the MIR: [`librustc_mir`](https://github.com/rust-lang/rust/tree/master/src/librustc_mir)
|
- Definition of source that manipulates the MIR: [`librustc_mir`](https://github.com/rust-lang/rust/tree/master/src/librustc_mir)
|
||||||
- Main entry point: **TODO**
|
- Main entry point: **TODO**
|
||||||
- The Borrow Checker
|
- The Borrow Checker
|
||||||
- Guide: [MIR Borrow Check](https://rust-lang.github.io/rustc-guide/borrow_check.html)
|
- Guide: [MIR Borrow Check](https://rust-lang.github.io/rustc-guide/borrow_check.html)
|
||||||
- Definition: [`rustc_mir/borrow_check`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html)
|
- Definition: [`rustc_mir/borrow_check`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html)
|
||||||
- Main entry point: **TODO**
|
- Main entry point: **TODO**
|
||||||
- MIR Optimizations
|
- MIR Optimizations
|
||||||
- Guide: [MIR Optimizations](https://rust-lang.github.io/rustc-guide/mir/optimizations.html)
|
- Guide: [MIR Optimizations](https://rust-lang.github.io/rustc-guide/mir/optimizations.html)
|
||||||
- Definition: [`rustc_mir/transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/index.html) **TODO: is this correct?**
|
- Definition: [`rustc_mir/transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/index.html) **TODO: is this correct?**
|
||||||
- Main entry point: **TODO**
|
- Main entry point: **TODO**
|
||||||
- Code Generation
|
- Code Generation
|
||||||
- Guide: [Code Generation](https://rust-lang.github.io/rustc-guide/codegen.html)
|
- Guide: [Code Generation](https://rust-lang.github.io/rustc-guide/codegen.html)
|
||||||
- Guide: [Generating LLVM IR](https://rust-lang.github.io/rustc-guide/codegen.html#generating-llvm-ir) - **TODO: this is not available yet**
|
- Guide: [Generating LLVM IR](https://rust-lang.github.io/rustc-guide/codegen.html#generating-llvm-ir) - **TODO: this is not available yet**
|
||||||
- Generating Machine Code from LLVM IR with LLVM - **TODO: reference?**
|
- Generating Machine Code from LLVM IR with LLVM - **TODO: reference?**
|
||||||
- Main entry point MIR -> LLVM IR: **TODO**
|
- Main entry point MIR -> LLVM IR: **TODO**
|
||||||
- Main entry point LLVM IR -> Machine Code **TODO**
|
- Main entry point LLVM IR -> Machine Code **TODO**
|
||||||
|
|
|
||||||