From 0783019c123a14235102bb4afdc52e4026c7e48b Mon Sep 17 00:00:00 2001 From: Chris Simpkins Date: Fri, 3 Apr 2020 01:41:04 -0400 Subject: [PATCH] [overview.md] Add command line argument parsing, lexer stages, and parser outline --- src/overview.md | 132 ++++++++++++++++++++++++------------------------ 1 file changed, 66 insertions(+), 66 deletions(-) diff --git a/src/overview.md b/src/overview.md index 82aa6370..ee876114 100644 --- a/src/overview.md +++ b/src/overview.md @@ -19,16 +19,16 @@ we'll talk about that later. **TODO: someone else should confirm this vvv** -- User writes a program and invokes `rustc` on it (possibly through `cargo`). -- First, we parse command line flags, etc. This is done in [`librustc_driver`]. - We now know what the exact work is we need to do (e.g. which nightly features - are enabled, whether we are doing a `check`-only build or emiting LLVM-IR or - a full compilation). -- Then, we start to do compilation... -- We first [_lex_ the user program][lex]. This turns the program into a stream - of _tokens_ (yes, the same sort of tokens as `proc_macros` (sort of)). - [`StringReader`] from [`librustc_parse`] integrates [`librustc_lexer`] with - `rustc` data structures. +- The compile process begins when a user writes a Rust source program in text and invokes the `rustc` compiler on it. The work that the compiler needs to perform is defined with command line options. For example, it is possible to optionally enable nightly features, perform `check`-only builds, or emit LLVM-IR rather than complete the entire compile process defined here. The `rustc` executable call may be indirect through the use of `cargo`. +- Command line argument parsing occurs in the [`librustc_driver`]. This crate defines the compile configuration that is requested by the user. +- The raw Rust source text is analyzed by a low-level lexer located in [`librustc_lexer`]. At this stage, the source text is turned into a stream of atomic source code units known as _tokens_. 
(**TODO**: chrissimpkins - Maybe discuss Unicode handling during this stage?) +- The token stream passes through a higher-level lexer located in [`librustc_parse`] to prepare for the next stage of the compile process. The [`StringReader`] struct is used at this stage to perform a set of validations and turn strings into interned symbols. +- (**TODO**: chrissimpkins - Expand info on parser) We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax Tree (AST). + - macro expansion (**TODO** chrissimpkins) + - ast validation (**TODO** chrissimpkins) + - nameres (**TODO** chrissimpkins) + - early linting (**TODO** chrissimpkins) + - We then [_parse_ the stream of tokens][parser] to build an Abstract Syntax Tree (AST). - We then take the AST and [convert it to High-Level Intermediate @@ -45,27 +45,27 @@ we'll talk about that later. - We (want to) do [many optimizations on the MIR][mir-opt] because it is still generic and that improves the code we generate later, improving compilation speed too. (**TODO: size optimizations too?**) - - MIR is a higher level (and generic) representation, so it is easier to do - some optimizations at MIR level than at LLVM-IR level. For example LLVM - doesn't seem to be able to optimize the pattern the [`simplify_try`] mir - opt looks for. + - MIR is a higher level (and generic) representation, so it is easier to do + some optimizations at MIR level than at LLVM-IR level. For example LLVM + doesn't seem to be able to optimize the pattern the [`simplify_try`] mir + opt looks for. - Rust code is _monomorphized_, which means making copies of all the generic code with the type parameters replaced by concrete types. To do this, we need to collect a list of what concrete types to generate code for. This is called _monomorphization collection_. - We then begin what is vaguely called _code generation_ or _codegen_. 
- - The [code generation stage (codegen)][codegen] is when higher level - representations of source are turned into an executable binary. `rustc` + - The [code generation stage (codegen)][codegen] is when higher level + representations of source are turned into an executable binary. `rustc` uses LLVM for code generation. The first step is the MIR is then - converted to LLVM Intermediate Representation (LLVM IR). This is where - the MIR is actually monomorphized, according to the list we created in - the previous step. - - The LLVM IR is passed to LLVM, which does a lot more optimizations on it. - It then emits machine code. It is basically assembly code with additional - low-level types and annotations added. (e.g. an ELF object or wasm). - **TODO: reference for this section?** - - The different libraries/binaries are linked together to produce the final - binary. **TODO: reference for this section?** + converted to LLVM Intermediate Representation (LLVM IR). This is where + the MIR is actually monomorphized, according to the list we created in + the previous step. + - The LLVM IR is passed to LLVM, which does a lot more optimizations on it. + It then emits machine code. It is basically assembly code with additional + low-level types and annotations added. (e.g. an ELF object or wasm). + **TODO: reference for this section?** + - The different libraries/binaries are linked together to produce the final + binary. **TODO: reference for this section?** [`librustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html [`librustc_driver`]: https://rust-lang.github.io/rustc-guide/rustc-driver.html @@ -90,12 +90,12 @@ satisfy/optimize for. For example, - Compilation speed: how fast is it to compile a program. More/better compile-time analyses often means compilation is slower. - - Also, we want to support incremental compilation, so we need to take that - into account. 
How can we keep track of what work needs to be redone and - what can be reused if the user modifies their program? - - Also we can't store too much stuff in the incremental cache because - it would take a long time to load from disk and it could take a lot - of space on the user's system... + - Also, we want to support incremental compilation, so we need to take that + into account. How can we keep track of what work needs to be redone and + what can be reused if the user modifies their program? + - Also we can't store too much stuff in the incremental cache because + it would take a long time to load from disk and it could take a lot + of space on the user's system... - Compiler memory usage: while compiling a program, we don't want to use more memory than we need. - Program speed: how fast is your compiled program. More/better compile-time @@ -277,46 +277,46 @@ but there are already some promising performance improvements. # References - Command line parsing - - Guide: [The Rustc Driver and Interface](https://rust-lang.github.io/rustc-guide/rustc-driver.html) - - Driver definition: [`rustc_driver`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/) - - Main entry point: **TODO** + - Guide: [The Rustc Driver and Interface](https://rust-lang.github.io/rustc-guide/rustc-driver.html) + - Driver definition: [`rustc_driver`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/) + - Main entry point: **TODO** - Lexical Analysis: Lex the user program to a stream of tokens - - Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html) - - Lexer definition: [`librustc_lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html) - - Main entry point: **TODO** + - Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html) + - Lexer definition: [`librustc_lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html) + - Main entry point: **TODO** - Parsing: Parse the 
stream of tokens to an Abstract Syntax Tree (AST) - - Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html) - - Parser definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html) - - Main entry point: **TODO** - - AST definition: [`librustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html) + - Guide: [Lexing and Parsing](https://rust-lang.github.io/rustc-guide/the-parser.html) + - Parser definition: [`librustc_parse`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html) + - Main entry point: **TODO** + - AST definition: [`librustc_ast`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html) - The High Level Intermediate Representation (HIR) - - Guide: [The HIR](https://rust-lang.github.io/rustc-guide/hir.html) - - Guide: [Identifiers in the HIR](https://rust-lang.github.io/rustc-guide/hir.html#identifiers-in-the-hir) - - Guide: [The HIR Map](https://rust-lang.github.io/rustc-guide/hir.html#the-hir-map) - - Guide: [Lowering AST to HIR](https://rust-lang.github.io/rustc-guide/lowering.html) - - How to view HIR representation for your code `cargo rustc -- -Zunpretty=hir-tree` - - Rustc HIR definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html) - - Main entry point: **TODO** + - Guide: [The HIR](https://rust-lang.github.io/rustc-guide/hir.html) + - Guide: [Identifiers in the HIR](https://rust-lang.github.io/rustc-guide/hir.html#identifiers-in-the-hir) + - Guide: [The HIR Map](https://rust-lang.github.io/rustc-guide/hir.html#the-hir-map) + - Guide: [Lowering AST to HIR](https://rust-lang.github.io/rustc-guide/lowering.html) + - How to view HIR representation for your code `cargo rustc -- -Zunpretty=hir-tree` + - Rustc HIR definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html) + - Main entry point: **TODO** - Type Inference - - Guide: [Type 
Inference](https://rust-lang.github.io/rustc-guide/type-inference.html) - - Guide: [The ty Module: Representing Types](https://rust-lang.github.io/rustc-guide/ty.html) (semantics) - - Main entry point: **TODO** + - Guide: [Type Inference](https://rust-lang.github.io/rustc-guide/type-inference.html) + - Guide: [The ty Module: Representing Types](https://rust-lang.github.io/rustc-guide/ty.html) (semantics) + - Main entry point: **TODO** - The Mid Level Intermediate Representation (MIR) - - Guide: [The MIR (Mid level IR)](https://rust-lang.github.io/rustc-guide/mir/index.html) - - Definition: [`librustc/mir`](https://github.com/rust-lang/rust/tree/master/src/librustc/mir) - - Definition of source that manipulates the MIR: [`librustc_mir`](https://github.com/rust-lang/rust/tree/master/src/librustc_mir) - - Main entry point: **TODO** + - Guide: [The MIR (Mid level IR)](https://rust-lang.github.io/rustc-guide/mir/index.html) + - Definition: [`librustc/mir`](https://github.com/rust-lang/rust/tree/master/src/librustc/mir) + - Definition of source that manipulates the MIR: [`librustc_mir`](https://github.com/rust-lang/rust/tree/master/src/librustc_mir) + - Main entry point: **TODO** - The Borrow Checker - - Guide: [MIR Borrow Check](https://rust-lang.github.io/rustc-guide/borrow_check.html) - - Definition: [`rustc_mir/borrow_check`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html) - - Main entry point: **TODO** + - Guide: [MIR Borrow Check](https://rust-lang.github.io/rustc-guide/borrow_check.html) + - Definition: [`rustc_mir/borrow_check`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/borrow_check/index.html) + - Main entry point: **TODO** - MIR Optimizations - - Guide: [MIR Optimizations](https://rust-lang.github.io/rustc-guide/mir/optimizations.html) - - Definition: [`rustc_mir/transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/index.html) **TODO: is this correct?** - - Main entry point: **TODO** 
+ - Guide: [MIR Optimizations](https://rust-lang.github.io/rustc-guide/mir/optimizations.html) + - Definition: [`rustc_mir/transform`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/transform/index.html) **TODO: is this correct?** + - Main entry point: **TODO** - Code Generation - - Guide: [Code Generation](https://rust-lang.github.io/rustc-guide/codegen.html) - - Guide: [Generating LLVM IR](https://rust-lang.github.io/rustc-guide/codegen.html#generating-llvm-ir) - **TODO: this is not available yet** - - Generating Machine Code from LLVM IR with LLVM - **TODO: reference?** - - Main entry point MIR -> LLVM IR: **TODO** - - Main entry point LLVM IR -> Machine Code **TODO** + - Guide: [Code Generation](https://rust-lang.github.io/rustc-guide/codegen.html) + - Guide: [Generating LLVM IR](https://rust-lang.github.io/rustc-guide/codegen.html#generating-llvm-ir) - **TODO: this is not available yet** + - Generating Machine Code from LLVM IR with LLVM - **TODO: reference?** + - Main entry point MIR -> LLVM IR: **TODO** + - Main entry point LLVM IR -> Machine Code **TODO**
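Reviewer note on the low-level lexer bullet added by this patch: the wording "turned into a stream of atomic source code units known as _tokens_" may be easier to follow next to a small example. The following is a minimal sketch of that idea only. It is not the `librustc_lexer` API: `TokenKind`, `Token`, `classify`, and `tokenize` are hypothetical names invented for illustration, and the token categories are an assumption that is much coarser than what a real Rust lexer distinguishes.

```rust
// Toy illustration of "source text -> stream of tokens".
// Hypothetical types and functions; NOT the rustc lexer implementation.

#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenKind {
    Ident,      // identifiers and keywords (not distinguished here)
    Number,     // runs of ASCII digits
    Whitespace, // runs of whitespace
    Punct,      // any other single character
}

#[derive(Debug, PartialEq)]
struct Token {
    kind: TokenKind,
    len: usize, // length of the token in bytes
}

// Decide which kind of token starts with the character `c`.
fn classify(c: char) -> TokenKind {
    if c.is_alphabetic() || c == '_' {
        TokenKind::Ident
    } else if c.is_ascii_digit() {
        TokenKind::Number
    } else if c.is_whitespace() {
        TokenKind::Whitespace
    } else {
        TokenKind::Punct
    }
}

// Split `src` into tokens that cover the whole input with no gaps.
fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(first) = chars.next() {
        let kind = classify(first);
        let mut len = first.len_utf8();
        // Punctuation is always one character; other kinds extend greedily.
        if kind != TokenKind::Punct {
            while let Some(&next) = chars.peek() {
                let continues = match kind {
                    TokenKind::Ident => next.is_alphanumeric() || next == '_',
                    TokenKind::Number => next.is_ascii_digit(),
                    TokenKind::Whitespace => next.is_whitespace(),
                    TokenKind::Punct => false,
                };
                if !continues {
                    break;
                }
                len += next.len_utf8();
                chars.next();
            }
        }
        tokens.push(Token { kind, len });
    }
    tokens
}

fn main() {
    let src = "let x = 42;";
    let mut offset = 0;
    // Recover each token's text by slicing the source with the byte lengths.
    for token in tokenize(src) {
        println!("{:?} {:?}", token.kind, &src[offset..offset + token.len]);
        offset += token.len;
    }
}
```

In this sketch each token stores only a kind and a byte length, so the token text is recovered by slicing the original source. That keeps the toy lexer free of any compiler data structures. Validation and string interning are left out, since the patch assigns those to the higher-level lexer stage.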