[overview.md] add lexer updates, parser updates

includes feedback from matklad (lexer) and centril (parser)
Chris Simpkins 2020-04-07 22:19:57 -04:00 committed by Who? Me?!
parent 4eadacdf29
commit 5090bb8d0f
1 changed files with 11 additions and 15 deletions

@@ -28,8 +28,8 @@ we'll talk about that later.
 to the rest of the compilation process as a [`rustc_interface::Config`].
 - The raw Rust source text is analyzed by a low-level lexer located in
 [`librustc_lexer`]. At this stage, the source text is turned into a stream of
-atomic source code units known as _tokens_. The lexer supports the Unicode
-character encoding.
+atomic source code units known as _tokens_. The lexer supports the
+Unicode character encoding.
 - The token stream passes through a higher-level lexer located in
 [`librustc_parse`] to prepare for the next stage of the compile process. The
 [`StringReader`] struct is used at this stage to perform a set of validations
@@ -47,25 +47,21 @@ we'll talk about that later.
 - Parsing is performed with a set of `Parser` utility methods including `fn bump`,
 `fn check`, `fn eat`, `fn expect`, `fn look_ahead`.
 - Parsing is organized by the semantic construct that is being parsed. Separate
-`parse_*` methods can be found in `librustc_parse` `parser` directory. File
-naming follows the construct name. For example, the following files are found
+`parse_*` methods can be found in `librustc_parse` `parser` directory. The source
+file name follows the construct name. For example, the following files are found
 in the parser:
 - `expr.rs`
 - `pat.rs`
 - `ty.rs`
 - `stmt.rs`
-- This naming scheme is used across the parser, lowering, type checking,
-HAIR lowering, & MIR building stages of the compile process and you will
-find either a file or directory with the same name for most of these constructs
-at each of these stages of compilation.
-- For error handling, the parser uses the standard `DiagnosticBuilder` API, but we
+- This naming scheme is used across many compiler stages. You will find
+either a file or directory with the same name across the parsing, lowering,
+type checking, HAIR lowering, and MIR building sources.
+- Macro expansion, AST validation, name resolution, and early linting takes place
+during this stage of the compile process.
+- The parser uses the standard `DiagnosticBuilder` API for error handling, but we
 try to recover, parsing a superset of Rust's grammar, while also emitting an error.
-- The `rustc_ast::ast::{Crate, Mod, Expr, Pat, ...}` AST node returned from the parser.
-- macro expansion (**TODO** chrissimpkins)
-- ast validation (**TODO** chrissimpkins)
-- nameres (**TODO** chrissimpkins)
-- early linting (**TODO** chrissimpkins)
+- `rustc_ast::ast::{Crate, Mod, Expr, Pat, ...}` AST nodes are returned from the parser.
 - We then take the AST and [convert it to High-Level Intermediate
 Representation (HIR)][hir]. This is a compiler-friendly representation of the
 AST. This involves a lot of desugaring of things like loops and `async fn`.
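
Note for readers of this change: the parser bullets in the second hunk name the `Parser` utility methods `bump`, `check`, `eat`, `expect` and `look_ahead`, and say the parser tries to recover while still emitting an error. The following is a minimal sketch of that style of parsing on a toy token stream. Everything in it is an assumption made for illustration: the `Token` enum, the `Parser` struct, its fields, the method signatures and `parse_let` are hypothetical and are not the real `librustc_parse` API, and plain `String` messages stand in for `DiagnosticBuilder`.

```rust
// Toy illustration only. These types and signatures are hypothetical and
// do NOT mirror the real `librustc_parse` parser.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Let,
    Ident(&'static str),
    Eq,
    Number(i64),
    Semi,
    Eof,
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
    // Stand-in for real diagnostics: errors are collected, not fatal.
    errors: Vec<String>,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0, errors: Vec::new() }
    }

    /// Peek `dist` tokens ahead without consuming anything.
    fn look_ahead(&self, dist: usize) -> Token {
        self.tokens.get(self.pos + dist).copied().unwrap_or(Token::Eof)
    }

    /// Advance past the current token.
    fn bump(&mut self) {
        if self.pos < self.tokens.len() {
            self.pos += 1;
        }
    }

    /// Is the current token `tok`? Does not consume it.
    fn check(&self, tok: Token) -> bool {
        self.look_ahead(0) == tok
    }

    /// Consume the current token if it is `tok`; report whether it was.
    fn eat(&mut self, tok: Token) -> bool {
        let present = self.check(tok);
        if present {
            self.bump();
        }
        present
    }

    /// Like `eat`, but a missing token is recorded as an error and
    /// parsing continues (a simple form of recovery).
    fn expect(&mut self, tok: Token) {
        if !self.eat(tok) {
            let found = self.look_ahead(0);
            self.errors.push(format!("expected {:?}, found {:?}", tok, found));
        }
    }

    /// Parse `let <ident> = <number> ;` and return the name and value.
    fn parse_let(&mut self) -> Option<(&'static str, i64)> {
        self.expect(Token::Let);
        let name = match self.look_ahead(0) {
            Token::Ident(name) => {
                self.bump();
                name
            }
            other => {
                self.errors.push(format!("expected identifier, found {:?}", other));
                return None;
            }
        };
        self.expect(Token::Eq);
        let value = match self.look_ahead(0) {
            Token::Number(n) => {
                self.bump();
                n
            }
            other => {
                self.errors.push(format!("expected number, found {:?}", other));
                return None;
            }
        };
        // A missing `;` is reported, yet the statement is still returned.
        self.expect(Token::Semi);
        Some((name, value))
    }
}
```

Calling `parse_let` on the tokens for `let y = 2` (no trailing `;`) still yields a result in this sketch, with one entry left in `errors`. That is the "accept a superset of the grammar, but emit an error" behaviour the bullet describes, reduced to its simplest form.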