Update the-parser.md

Tbkhi 2024-03-10 17:45:12 -03:00 committed by nora
parent 04480082ad
commit 41c4e9c988
1 changed file with 45 additions and 39 deletions

@@ -1,68 +1,74 @@
 # Lexing and Parsing
-The very first thing the compiler does is take the program (in Unicode
-characters) and turn it into something the compiler can work with more
-conveniently than strings. This happens in two stages: Lexing and Parsing.
+The very first thing the compiler does is take the program (in Unicode) and
+transmute it into a data format the compiler can work with more conveniently
+than strings. This happens in two stages: Lexing and Parsing.
-Lexing takes strings and turns them into streams of [tokens]. For example,
-`foo.bar + buz` would be turned into the tokens `foo`, `.`,
-`bar`, `+`, and `buz`. The lexer lives in [`rustc_lexer`][lexer].
+1. _Lexing_ takes strings and turns them into streams of [tokens]. For
+example, `foo.bar + buz` would be turned into the tokens `foo`, `.`, `bar`,
+`+`, and `buz`.
 [tokens]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/token/index.html
 [lexer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
-Parsing then takes streams of tokens and turns them into a structured
-form which is easier for the compiler to work with, usually called an [*Abstract
-Syntax Tree*][ast] (AST). An AST mirrors the structure of a Rust program in memory,
-using a `Span` to link a particular AST node back to its source text.
+2. _Parsing_ takes streams of tokens and turns them into a structured form
+which is easier for the compiler to work with, usually called an [*Abstract
+Syntax Tree* (`AST`)][ast] .
+An `AST` mirrors the structure of a Rust program in memory, using a `Span` to
+link a particular `AST` node back to its source text. The `AST` is defined in
+[`rustc_ast`][rustc_ast], along with some definitions for tokens and token
+streams, data structures/`trait`s for mutating `AST`s, and shared definitions for
+other `AST`-related parts of the compiler (like the lexer and
+`macro`-expansion).
-The AST is defined in [`rustc_ast`][rustc_ast], along with some definitions for
-tokens and token streams, data structures/traits for mutating ASTs, and shared
-definitions for other AST-related parts of the compiler (like the lexer and
-macro-expansion).
+The lexer is developed in [`rustc_lexer`][lexer].
 The parser is defined in [`rustc_parse`][rustc_parse], along with a
 high-level interface to the lexer and some validation routines that run after
-macro expansion. In particular, the [`rustc_parse::parser`][parser] contains
+`macro` expansion. In particular, the [`rustc_parse::parser`][parser] contains
 the parser implementation.
-The main entrypoint to the parser is via the various `parse_*` functions and others in the
-[parser crate][parser_lib]. They let you do things like turn a [`SourceFile`][sourcefile]
+The main entrypoint to the parser is via the various `parse_*` functions and others in
+[rustc_parse][rustc_parse]. They let you do things like turn a [`SourceFile`][sourcefile]
 (e.g. the source in a single file) into a token stream, create a parser from
-the token stream, and then execute the parser to get a `Crate` (the root AST
+the token stream, and then execute the parser to get a [`Crate`] (the root `AST`
 node).
-To minimize the amount of copying that is done,
-both [`StringReader`] and [`Parser`] have lifetimes which bind them to the parent `ParseSess`.
-This contains all the information needed while parsing,
-as well as the [`SourceMap`] itself.
+To minimize the amount of copying that is done, both [`StringReader`] and
+[`Parser`] have lifetimes which bind them to the parent [`ParseSess`]. This
+contains all the information needed while parsing, as well as the [`SourceMap`]
+itself.
-Note that while parsing, we may encounter macro definitions or invocations. We
-set these aside to be expanded (see [this chapter](./macro-expansion.md)).
-Expansion may itself require parsing the output of the macro, which may reveal
-more macros to be expanded, and so on.
+Note that while parsing, we may encounter `macro` definitions or invocations. We
+set these aside to be expanded (see [Macro Expansion](./macro-expansion.md)).
+Expansion itself may require parsing the output of a `macro`, which may reveal
+more `macro`s to be expanded, and so on.
 ## More on Lexical Analysis
 Code for lexical analysis is split between two crates:
-- `rustc_lexer` crate is responsible for breaking a `&str` into chunks
+- [`rustc_lexer`] crate is responsible for breaking a `&str` into chunks
 constituting tokens. Although it is popular to implement lexers as generated
-finite state machines, the lexer in `rustc_lexer` is hand-written.
+finite state machines, the lexer in [`rustc_lexer`] is hand-written.
-- [`StringReader`] integrates `rustc_lexer` with data structures specific to `rustc`.
-Specifically,
-it adds `Span` information to tokens returned by `rustc_lexer` and interns identifiers.
+- [`StringReader`] integrates [`rustc_lexer`] with data structures specific to
+`rustc`. Specifically, it adds `Span` information to tokens returned by
+[`rustc_lexer`] and interns identifiers.
+[`Crate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/struct.Crate.html
+[`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
+[`ParseSess`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_session/parse/struct.ParseSess.html
+[`rustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
+[`SourceMap`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/source_map/struct.SourceMap.html
+[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
+[ast module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html
+[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
+[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/index.html
 [rustc_ast]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
 [rustc_errors]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html
-[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
-[`SourceMap`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/source_map/struct.SourceMap.html
-[ast module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html
 [rustc_parse]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
-[parser_lib]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
-[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/index.html
-[`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
-[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
-[visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/visit/index.html
 [sourcefile]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.SourceFile.html
+[visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/visit/index.html
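
The changed text describes lexing `foo.bar + buz` into the tokens `foo`, `.`, `bar`, `+`, and `buz`, and says that `Span` information links each token back to its source text. The following is a minimal toy sketch of that idea only. It does not use or reproduce the API of `rustc_lexer` or `StringReader`: the names `TokenKind`, `Token`, `lex`, and `token_texts` are hypothetical, a plain byte range stands in for a real `Span`, and identifier interning is left out.

```rust
use std::ops::Range;

/// Hypothetical token kinds for this sketch (not the real `rustc_lexer` kinds).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TokenKind {
    Ident,
    Dot,
    Plus,
    Unknown,
}

/// A token plus the byte range it covers in the source.
/// The range is an assumed stand-in for a `Span`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    pub span: Range<usize>,
}

/// Hand-written lexer: walks the input once and skips whitespace.
pub fn lex(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some(&(start, c)) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
            continue;
        }
        let kind = if c.is_alphabetic() || c == '_' {
            // Consume every character that belongs to the identifier.
            while let Some(&(_, next)) = chars.peek() {
                if next.is_alphanumeric() || next == '_' {
                    chars.next();
                } else {
                    break;
                }
            }
            TokenKind::Ident
        } else {
            // Single-character token.
            chars.next();
            match c {
                '.' => TokenKind::Dot,
                '+' => TokenKind::Plus,
                _ => TokenKind::Unknown,
            }
        };
        // The token ends where the next unread character begins.
        let end = chars.peek().map_or(src.len(), |&(i, _)| i);
        tokens.push(Token { kind, span: start..end });
    }
    tokens
}

/// Maps each token back to its source text through its span.
pub fn token_texts(src: &str) -> Vec<&str> {
    lex(src).into_iter().map(|t| &src[t.span]).collect()
}
```

Calling `token_texts("foo.bar + buz")` yields the five token strings listed in the text, and each `Token` carries the byte range needed to point back into the original input, which is the role the text assigns to `Span`.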