Update the-parser.md
parent 04480082ad
commit 41c4e9c988

@@ -1,68 +1,74 @@
 # Lexing and Parsing

-The very first thing the compiler does is take the program (in Unicode
-characters) and turn it into something the compiler can work with more
-conveniently than strings. This happens in two stages: Lexing and Parsing.
+The very first thing the compiler does is take the program (in Unicode) and
+transmute it into a data format the compiler can work with more conveniently
+than strings. This happens in two stages: Lexing and Parsing.

-Lexing takes strings and turns them into streams of [tokens]. For example,
-`foo.bar + buz` would be turned into the tokens `foo`, `.`,
-`bar`, `+`, and `buz`. The lexer lives in [`rustc_lexer`][lexer].
+1. _Lexing_ takes strings and turns them into streams of [tokens]. For
+example, `foo.bar + buz` would be turned into the tokens `foo`, `.`, `bar`,
+`+`, and `buz`.

 [tokens]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/token/index.html
 [lexer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html

-Parsing then takes streams of tokens and turns them into a structured
-form which is easier for the compiler to work with, usually called an [*Abstract
-Syntax Tree*][ast] (AST). An AST mirrors the structure of a Rust program in memory,
-using a `Span` to link a particular AST node back to its source text.
+2. _Parsing_ takes streams of tokens and turns them into a structured form
+which is easier for the compiler to work with, usually called an [*Abstract
+Syntax Tree* (`AST`)][ast] .
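
The two stages described in this hunk can be sketched with a toy lexer and parser. This is a minimal illustration only, not how `rustc_lexer` or `rustc_parse` are implemented: the `Token` and `Expr` types, the `lex`, `parse`, and `parse_field` functions, and the tiny grammar are all hypothetical names made up for the example.

```rust
/// A toy token type (hypothetical; rustc's real tokens are far richer).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Dot,
    Plus,
}

/// A toy AST for expressions such as `foo.bar + buz`.
#[derive(Debug, PartialEq)]
enum Expr {
    Path(String),
    Field(Box<Expr>, String),
    Add(Box<Expr>, Box<Expr>),
}

/// Stage 1: turn a string into a flat list of tokens.
fn lex(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '.' {
            chars.next();
            tokens.push(Token::Dot);
        } else if c == '+' {
            chars.next();
            tokens.push(Token::Plus);
        } else if c.is_alphabetic() || c == '_' {
            // Consume the whole identifier.
            let mut name = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_alphanumeric() || c == '_' {
                    name.push(c);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(name));
        } else {
            return Err(format!("unexpected character {c:?}"));
        }
    }
    Ok(tokens)
}

/// Stage 2: turn tokens into a tree, using an illustrative grammar:
///   expr  := field ('+' field)*
///   field := IDENT ('.' IDENT)*
fn parse(tokens: &[Token]) -> Result<Expr, String> {
    let mut pos = 0;
    let mut lhs = parse_field(tokens, &mut pos)?;
    while tokens.get(pos) == Some(&Token::Plus) {
        pos += 1;
        let rhs = parse_field(tokens, &mut pos)?;
        lhs = Expr::Add(Box::new(lhs), Box::new(rhs));
    }
    if pos == tokens.len() {
        Ok(lhs)
    } else {
        Err(format!("unexpected token {:?}", tokens[pos]))
    }
}

fn parse_field(tokens: &[Token], pos: &mut usize) -> Result<Expr, String> {
    let mut expr = match tokens.get(*pos) {
        Some(Token::Ident(name)) => Expr::Path(name.clone()),
        other => return Err(format!("expected identifier, found {other:?}")),
    };
    *pos += 1;
    while tokens.get(*pos) == Some(&Token::Dot) {
        match tokens.get(*pos + 1) {
            Some(Token::Ident(name)) => {
                expr = Expr::Field(Box::new(expr), name.clone());
                *pos += 2;
            }
            other => return Err(format!("expected field name, found {other:?}")),
        }
    }
    Ok(expr)
}
```

With these definitions, `lex("foo.bar + buz")` yields the five tokens listed in the text, and passing them to `parse` produces an `Add` node whose left child is the field access `foo.bar` and whose right child is the path `buz`.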

-The AST is defined in [`rustc_ast`][rustc_ast], along with some definitions for
-tokens and token streams, data structures/traits for mutating ASTs, and shared
-definitions for other AST-related parts of the compiler (like the lexer and
-macro-expansion).

+An `AST` mirrors the structure of a Rust program in memory, using a `Span` to
+link a particular `AST` node back to its source text. The `AST` is defined in
+[`rustc_ast`][rustc_ast], along with some definitions for tokens and token
+streams, data structures/`trait`s for mutating `AST`s, and shared definitions for
+other `AST`-related parts of the compiler (like the lexer and
+`macro`-expansion).
+
+The lexer is developed in [`rustc_lexer`][lexer].
+
 The parser is defined in [`rustc_parse`][rustc_parse], along with a
 high-level interface to the lexer and some validation routines that run after
-macro expansion. In particular, the [`rustc_parse::parser`][parser] contains
+`macro` expansion. In particular, the [`rustc_parse::parser`][parser] contains
 the parser implementation.

-The main entrypoint to the parser is via the various `parse_*` functions and others in the
-[parser crate][parser_lib]. They let you do things like turn a [`SourceFile`][sourcefile]
+The main entrypoint to the parser is via the various `parse_*` functions and others in
+[rustc_parse][rustc_parse]. They let you do things like turn a [`SourceFile`][sourcefile]
 (e.g. the source in a single file) into a token stream, create a parser from
-the token stream, and then execute the parser to get a `Crate` (the root AST
+the token stream, and then execute the parser to get a [`Crate`] (the root `AST`
 node).

-To minimize the amount of copying that is done,
-both [`StringReader`] and [`Parser`] have lifetimes which bind them to the parent `ParseSess`.
-This contains all the information needed while parsing,
-as well as the [`SourceMap`] itself.
+To minimize the amount of copying that is done, both [`StringReader`] and
+[`Parser`] have lifetimes which bind them to the parent [`ParseSess`]. This
+contains all the information needed while parsing, as well as the [`SourceMap`]
+itself.
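
The lifetime point in this paragraph can be illustrated with a small sketch. It assumes a hypothetical `Session` type that owns the source text and a hypothetical `Reader<'sess>` that only borrows from it. Neither is the real `ParseSess` or `StringReader`; they just show how tying a reader to its session by lifetime lets it hand out slices of the source instead of copies.

```rust
/// Hypothetical stand-in for a parse session that owns the source text.
struct Session {
    source: String,
}

/// A reader that borrows from the session instead of copying the source.
struct Reader<'sess> {
    sess: &'sess Session,
    pos: usize,
}

impl<'sess> Reader<'sess> {
    fn new(sess: &'sess Session) -> Self {
        Reader { sess, pos: 0 }
    }

    /// Return the next whitespace-separated word as a slice of the session's
    /// source. The slice is valid for as long as the session, not the reader.
    fn next_word(&mut self) -> Option<&'sess str> {
        // Copy the shared reference out so the slice borrows from the session.
        let sess: &'sess Session = self.sess;
        let src: &'sess str = &sess.source;
        let rest = &src[self.pos..];
        // Skip leading whitespace; stop if nothing is left.
        let start = self.pos + rest.find(|c: char| !c.is_whitespace())?;
        // The word runs until the next whitespace or the end of the source.
        let len = src[start..]
            .find(char::is_whitespace)
            .unwrap_or(src.len() - start);
        self.pos = start + len;
        Some(&src[start..start + len])
    }
}
```

Because `next_word` returns `&'sess str`, a word obtained from a `Reader` stays usable after that reader is dropped, as long as the `Session` is still alive, and no string data is copied along the way.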

-Note that while parsing, we may encounter macro definitions or invocations. We
-set these aside to be expanded (see [this chapter](./macro-expansion.md)).
-Expansion may itself require parsing the output of the macro, which may reveal
-more macros to be expanded, and so on.
+Note that while parsing, we may encounter `macro` definitions or invocations. We
+set these aside to be expanded (see [Macro Expansion](./macro-expansion.md)).
+Expansion itself may require parsing the output of a `macro`, which may reveal
+more `macro`s to be expanded, and so on.

 ## More on Lexical Analysis

 Code for lexical analysis is split between two crates:

-- `rustc_lexer` crate is responsible for breaking a `&str` into chunks
+- [`rustc_lexer`] crate is responsible for breaking a `&str` into chunks
 constituting tokens. Although it is popular to implement lexers as generated
-finite state machines, the lexer in `rustc_lexer` is hand-written.
+finite state machines, the lexer in [`rustc_lexer`] is hand-written.

-- [`StringReader`] integrates `rustc_lexer` with data structures specific to `rustc`.
-Specifically,
-it adds `Span` information to tokens returned by `rustc_lexer` and interns identifiers.
+- [`StringReader`] integrates [`rustc_lexer`] with data structures specific to
+`rustc`. Specifically, it adds `Span` information to tokens returned by
+[`rustc_lexer`] and interns identifiers.
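
The split between the two layers can be sketched as follows. This is a toy under stated assumptions, not the actual `rustc_lexer` or `StringReader` code: it assumes a raw lexer that reports only a token kind and a byte length, and every name here (`RawToken`, `RawKind`, `Interner`, `Symbol`, `read_tokens`, and the simplified `Span`) is hypothetical.

```rust
use std::collections::HashMap;

/// Byte range into the source text (a simplified stand-in for a span).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span { lo: usize, hi: usize }

/// Index into the interner's table (a stand-in for an interned identifier).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Symbol(u32);

#[derive(Debug, Clone, Copy, PartialEq)]
enum RawKind { Ident, Punct, Whitespace }

/// What the hypothetical raw lexer yields: a kind and a length, nothing else.
struct RawToken { kind: RawKind, len: usize }

#[derive(Debug, PartialEq)]
enum TokenKind { Ident(Symbol), Punct(char) }

#[derive(Debug, PartialEq)]
struct Token { kind: TokenKind, span: Span }

#[derive(Default)]
struct Interner { map: HashMap<String, Symbol>, names: Vec<String> }

impl Interner {
    /// Return the existing symbol for `s`, or allocate a new one.
    fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.map.get(s) {
            return sym;
        }
        let sym = Symbol(self.names.len() as u32);
        self.names.push(s.to_string());
        self.map.insert(s.to_string(), sym);
        sym
    }

    fn resolve(&self, sym: Symbol) -> &str {
        &self.names[sym.0 as usize]
    }
}

fn classify(ch: char) -> RawKind {
    if ch.is_alphanumeric() || ch == '_' {
        RawKind::Ident
    } else if ch.is_whitespace() {
        RawKind::Whitespace
    } else {
        RawKind::Punct
    }
}

/// Raw layer: split a `&str` into chunks, knowing nothing about sessions.
fn raw_lex(src: &str) -> Vec<RawToken> {
    let mut out = Vec::new();
    let mut rest = src;
    while let Some(c) = rest.chars().next() {
        let kind = classify(c);
        let len = if kind == RawKind::Punct {
            c.len_utf8() // punctuation is one character per token
        } else {
            rest.chars()
                .take_while(|&ch| classify(ch) == kind)
                .map(char::len_utf8)
                .sum::<usize>()
        };
        out.push(RawToken { kind, len });
        rest = &rest[len..];
    }
    out
}

/// Reader layer: attach spans to raw tokens and intern identifiers.
fn read_tokens(src: &str, interner: &mut Interner) -> Vec<Token> {
    let mut pos = 0;
    let mut tokens = Vec::new();
    for raw in raw_lex(src) {
        let span = Span { lo: pos, hi: pos + raw.len };
        let text = &src[span.lo..span.hi];
        pos = span.hi;
        let kind = match raw.kind {
            RawKind::Whitespace => continue, // dropped, but still advances `pos`
            RawKind::Ident => TokenKind::Ident(interner.intern(text)),
            RawKind::Punct => TokenKind::Punct(text.chars().next().unwrap()),
        };
        tokens.push(Token { kind, span });
    }
    tokens
}
```

Keeping the raw layer free of spans and interning means it can be reused without any compiler session state, while the reader layer is the only place that needs to know about source positions and the symbol table. In this sketch, reading `foo.bar + foo` gives both occurrences of `foo` the same `Symbol` but different spans.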

+[`Crate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/struct.Crate.html
+[`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
+[`ParseSess`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_session/parse/struct.ParseSess.html
+[`rustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
+[`SourceMap`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/source_map/struct.SourceMap.html
+[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
+[ast module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html
+[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
+[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/index.html
 [rustc_ast]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
 [rustc_errors]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html
-[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
-[`SourceMap`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/source_map/struct.SourceMap.html
-[ast module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html
 [rustc_parse]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
-[parser_lib]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
-[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/index.html
-[`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
-[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
-[visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/visit/index.html
 [sourcefile]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.SourceFile.html
+[visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/visit/index.html