[overview.md] Add command line argument parsing, lexer stages, and parser outline

Chris Simpkins 2020-04-03 01:41:04 -04:00 committed by Who? Me?!
parent a43ef4d3b3
commit 0783019c12
1 changed files with 66 additions and 66 deletions


@ -19,16 +19,16 @@ we'll talk about that later.
 **TODO: someone else should confirm this vvv**
-- User writes a program and invokes `rustc` on it (possibly through `cargo`).
-- First, we parse command line flags, etc. This is done in [`librustc_driver`].
-We now know what the exact work is we need to do (e.g. which nightly features
-are enabled, whether we are doing a `check`-only build or emiting LLVM-IR or
-a full compilation).
-- Then, we start to do compilation...
-- We first [_lex_ the user program][lex]. This turns the program into a stream
-of _tokens_ (yes, the same sort of tokens as `proc_macros` (sort of)).
-[`StringReader`] from [`librustc_parse`] integrates [`librustc_lexer`] with
-`rustc` data structures.
+- The compile process begins when a user writes a Rust source program in text and invokes the `rustc` compiler on it. The work that the compiler needs to perform is defined with command line options. For example, it is possible to optionally enable nightly features, perform `check`-only builds, or emit LLVM-IR rather than complete the entire compile process defined here. The `rustc` executable call may be indirect through the use of `cargo`.
+- Command line argument parsing occurs in the [`librustc_driver`]. This crate defines the compile configuration that is requested by the user.
+- The raw Rust source text is analyzed by a low-level lexer located in [`librustc_lexer`]. At this stage, the source text is turned into a stream of atomic source code units known as _tokens_. (**TODO**: chrissimpkins - Maybe discuss Unicode handling during this stage?)
+- The token stream passes through a higher-level lexer located in [`librustc_parse`] to prepare for the next stage of the compile process. The [`StringReader`] struct is used at this stage to perform a set of validations and turn strings into interned symbols.
+- (**TODO**: chrissimpkins - Expand info on parser) We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax Tree (AST).
+- macro expansion (**TODO** chrissimpkins)
+- ast validation (**TODO** chrissimpkins)
+- nameres (**TODO** chrissimpkins)
+- early linting (**TODO** chrissimpkins)
 - We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax
 Tree (AST).
 - We then take the AST and [convert it to High-Level Intermediate
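The lexer bullets in this hunk describe turning raw source text into a stream of tokens. As a rough illustration only, the sketch below splits a string into tokens that record a kind and a byte length. The names `ToyTokenKind`, `ToyToken`, and `toy_tokenize` are hypothetical and invented for this example; this is not the `librustc_lexer` API, and it ignores comments, literals, and multi-character operators.

```rust
/// Hypothetical token categories for this sketch (not the real `librustc_lexer` kinds).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToyTokenKind {
    Ident,
    Number,
    Whitespace,
    Punct,
}

/// A token stores only its kind and byte length; the text itself stays in the source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ToyToken {
    pub kind: ToyTokenKind,
    pub len: usize,
}

/// Decide which kind of token starts with `first`.
fn classify(first: char) -> ToyTokenKind {
    if first.is_alphabetic() || first == '_' {
        ToyTokenKind::Ident
    } else if first.is_ascii_digit() {
        ToyTokenKind::Number
    } else if first.is_whitespace() {
        ToyTokenKind::Whitespace
    } else {
        ToyTokenKind::Punct
    }
}

/// Decide whether `c` extends a token of the given kind.
fn continues(kind: ToyTokenKind, c: char) -> bool {
    match kind {
        ToyTokenKind::Ident => c.is_alphanumeric() || c == '_',
        ToyTokenKind::Number => c.is_ascii_digit(),
        ToyTokenKind::Whitespace => c.is_whitespace(),
        // Punctuation is always a single character in this sketch.
        ToyTokenKind::Punct => false,
    }
}

/// Split `src` into a flat list of tokens, covering every byte of the input.
pub fn toy_tokenize(src: &str) -> Vec<ToyToken> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(first) = chars.next() {
        let kind = classify(first);
        let mut len = first.len_utf8();
        // Consume characters for as long as they belong to the current token.
        while let Some(&next) = chars.peek() {
            if !continues(kind, next) {
                break;
            }
            len += next.len_utf8();
            chars.next();
        }
        tokens.push(ToyToken { kind, len });
    }
    tokens
}
```

Calling `toy_tokenize("let x = 42;")` yields eight tokens whose lengths sum to the length of the input. A later stage could slice the source by those lengths to recover each token's text, which is roughly the role the hunk assigns to the higher-level lexer.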
@ -45,27 +45,27 @@ we'll talk about that later.
 - We (want to) do [many optimizations on the MIR][mir-opt] because it is still
 generic and that improves the code we generate later, improving compilation
 speed too. (**TODO: size optimizations too?**)
 - MIR is a higher level (and generic) representation, so it is easier to do
 some optimizations at MIR level than at LLVM-IR level. For example LLVM
 doesn't seem to be able to optimize the pattern the [`simplify_try`] mir
 opt looks for.
 - Rust code is _monomorphized_, which means making copies of all the generic
 code with the type parameters replaced by concrete types. To do
 this, we need to collect a list of what concrete types to generate code for.
 This is called _monomorphization collection_.
 - We then begin what is vaguely called _code generation_ or _codegen_.
 - The [code generation stage (codegen)][codegen] is when higher level
 representations of source are turned into an executable binary. `rustc`
 uses LLVM for code generation. The first step is the MIR is then
 converted to LLVM Intermediate Representation (LLVM IR). This is where
 the MIR is actually monomorphized, according to the list we created in
 the previous step.
 - The LLVM IR is passed to LLVM, which does a lot more optimizations on it.
 It then emits machine code. It is basically assembly code with additional
 low-level types and annotations added. (e.g. an ELF object or wasm).
 **TODO: reference for this section?**
 - The different libraries/binaries are linked together to produce the final
 binary. **TODO: reference for this section?**
 [`librustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
 [`librustc_driver`]: https://rust-lang.github.io/rustc-guide/rustc-driver.html
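The monomorphization bullet in this hunk can be made concrete with a small example. The sketch below shows one generic function next to two hand-written copies that resemble what monomorphization conceptually produces for `i32` and `f64`. The names `largest`, `largest_i32`, and `largest_f64` are hypothetical; the compiler does not emit source like this, so treat it as an illustration of the idea rather than of `rustc` internals.

```rust
/// One generic definition in the source program.
pub fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}

/// Conceptual copy of `largest` with `T` replaced by `i32`.
pub fn largest_i32(items: &[i32]) -> Option<i32> {
    let mut iter = items.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}

/// Conceptual copy of `largest` with `T` replaced by `f64`.
pub fn largest_f64(items: &[f64]) -> Option<f64> {
    let mut iter = items.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}
```

If a program calls `largest` only with `i32` and `f64` slices, those two types are what the collection step described above would need to record, and each specialized copy returns the same result as the generic call for its type.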
@ -90,12 +90,12 @@ satisfy/optimize for. For example,
 - Compilation speed: how fast is it to compile a program. More/better
 compile-time analyses often means compilation is slower.
 - Also, we want to support incremental compilation, so we need to take that
 into account. How can we keep track of what work needs to be redone and
 what can be reused if the user modifies their program?
 - Also we can't store too much stuff in the incremental cache because
 it would take a long time to load from disk and it could take a lot
 of space on the user's system...
 - Compiler memory usage: while compiling a program, we don't want to use more
 memory than we need.
 - Program speed: how fast is your compiled program. More/better compile-time
@ -277,46 +277,46 @@ but there are already some promising performance improvements.
 # References
 - Command line parsing
 - Guide: [The Rustc Driver and Interface](https://rust-lang.github.io/rustc-guide/rustc-driver.html)
 - Driver definition: [`rustc_driver`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/)
 - Main entry point: **TODO**
 - Lexical Analysis: Lex the user program to a stream of tokens
 - Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html)
 - Lexer definition: [`librustc_lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html)
 - Main entry point: **TODO**
 - Parsing: Parse the stream of tokens to an Abstract Syntax Tree (AST)
 - Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html)
 - Parser definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html)
 - Main entry point: **TODO**
 - AST definition: [`librustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html)
 - The High Level Intermediate Representation (HIR)
 - Guide: [The HIR](https://rust-lang.github.io/rustc-guide/hir.html)
 - Guide: [Identifiers in the HIR](https://rust-lang.github.io/rustc-guide/hir.html#identifiers-in-the-hir)
 - Guide: [The HIR Map](https://rust-lang.github.io/rustc-guide/hir.html#the-hir-map)
 - Guide: [Lowering AST to HIR](https://rust-lang.github.io/rustc-guide/lowering.html)
 - How to view HIR representation for your code `cargo rustc -- -Zunpretty=hir-tree`
 - Rustc HIR definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html)
 - Main entry point: **TODO**
 - Type Inference
 - Guide: [Type Inference](https://rust-lang.github.io/rustc-guide/type-inference.html)
 - Guide: [The ty Module: Representing Types](https://rust-lang.github.io/rustc-guide/ty.html) (semantics)
 - Main entry point: **TODO**
 - The Mid Level Intermediate Representation (MIR)
 - Guide: [The MIR (Mid level IR)](https://rust-lang.github.io/rustc-guide/mir/index.html)
 - Definition: [`librustc/mir`](https://github.com/rust-lang/rust/tree/master/src/librustc/mir)
 - Definition of source that manipulates the MIR: [`librustc_mir`](https://github.com/rust-lang/rust/tree/master/src/librustc_mir)
 - Main entry point: **TODO**
 - The Borrow Checker
 - Guide: [MIR Borrow Check](https://rust-lang.github.io/rustc-guide/borrow_check.html)
 - Definition: [`rustc_mir/borrow_check`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html)
 - Main entry point: **TODO**
 - MIR Optimizations
 - Guide: [MIR Optimizations](https://rust-lang.github.io/rustc-guide/mir/optimizations.html)
 - Definition: [`rustc_mir/transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/index.html) **TODO: is this correct?**
 - Main entry point: **TODO**
 - Code Generation
 - Guide: [Code Generation](https://rust-lang.github.io/rustc-guide/codegen.html)
 - Guide: [Generating LLVM IR](https://rust-lang.github.io/rustc-guide/codegen.html#generating-llvm-ir) - **TODO: this is not available yet**
 - Generating Machine Code from LLVM IR with LLVM - **TODO: reference?**
 - Main entry point MIR -> LLVM IR: **TODO**
 - Main entry point LLVM IR -> Machine Code **TODO**