update parser chapter

2019-11-11 10:36:45 -06:00 · a50b8f144f
parent 7616ab36bc
commit a50b8f144f
4 changed files with 33 additions and 24 deletions
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -38,7 +38,7 @@
        - [Incremental compilation](./queries/incremental-compilation.md)
        - [Incremental compilation In Detail](./queries/incremental-compilation-in-detail.md)
        - [Debugging and Testing](./incrcomp-debugging.md)
-    - [The parser](./the-parser.md)
+    - [Lexing and Parsing](./the-parser.md)
    - [`#[test]` Implementation](./test-implementation.md)
    - [Macro expansion](./macro-expansion.md)
    - [Name resolution](./name-resolution.md)
--- a/src/appendix/code-index.md
+++ b/src/appendix/code-index.md
@@ -24,7 +24,7 @@ Item            |  Kind    | Short description           | Chapter            |
 `SourceFile` | struct | Part of the `SourceMap`. Maps AST nodes to their source code for a single source file. Was previously called FileMap | [The parser] | [src/libsyntax_pos/lib.rs](https://doc.rust-lang.org/nightly/nightly-rustc/syntax/source_map/struct.SourceFile.html)
 `SourceMap` | struct | Maps AST nodes to their source code. It is composed of `SourceFile`s. Was previously called CodeMap | [The parser] | [src/libsyntax/source_map.rs](https://doc.rust-lang.org/nightly/nightly-rustc/syntax/source_map/struct.SourceMap.html)
 `Span` | struct  | A location in the user's source code, used for error reporting primarily | [Emitting Diagnostics] | [src/libsyntax_pos/span_encoding.rs](https://doc.rust-lang.org/nightly/nightly-rustc/syntax_pos/struct.Span.html)
-`StringReader` | struct | This is the lexer used during parsing. It consumes characters from the raw source code being compiled and produces a series of tokens for use by the rest of the parser | [The parser] |  [src/libsyntax/parse/lexer/mod.rs](https://doc.rust-lang.org/nightly/nightly-rustc/syntax/parse/lexer/struct.StringReader.html)
+`StringReader` | struct | This is the lexer used during parsing. It consumes characters from the raw source code being compiled and produces a series of tokens for use by the rest of the parser | [The parser] |  [src/librustc_parse/lexer/mod.rs](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html)
 `syntax::token_stream::TokenStream` | struct | An abstract sequence of tokens, organized into `TokenTree`s | [The parser], [Macro expansion] | [src/libsyntax/tokenstream.rs](https://doc.rust-lang.org/nightly/nightly-rustc/syntax/tokenstream/struct.TokenStream.html)
 `TraitDef` | struct | This struct contains a trait's definition with type information | [The `ty` modules] |  [src/librustc/ty/trait_def.rs](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/trait_def/struct.TraitDef.html)
 `TraitRef` | struct | The combination of a trait and its input types (e.g. `P0: Trait<P1...Pn>`) | [Trait Solving: Goals and Clauses], [Trait Solving: Lowering impls]  |  [src/librustc/ty/sty.rs](https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TraitRef.html)
--- a/src/macro-expansion.md
+++ b/src/macro-expansion.md
@@ -1,5 +1,8 @@
 # Macro expansion

+> `libsyntax`, `librustc_expand`, and `libsyntax_ext` are all undergoing
+> refactoring, so some of the links in this chapter may be broken.
+
 Macro expansion happens during parsing. `rustc` has two parsers, in fact: the
 normal Rust parser, and the macro parser. During the parsing phase, the normal
 Rust parser will set aside the contents of macros and their invocations. Later,
--- a/src/the-parser.md
+++ b/src/the-parser.md
@@ -1,29 +1,35 @@
-# The Parser
+# Lexing and Parsing

-The parser is responsible for converting raw Rust source code into a structured
+> The parser and lexer are currently undergoing a lot of refactoring, so parts
+> of this chapter may be out of date.
+
+The very first thing the compiler does is take the program (in Unicode
+characters) and turn it into something the compiler can work with more
+conveniently than strings. This happens in two stages: lexing and parsing.
+
+Lexing takes strings and turns them into streams of tokens. For example,
+`a.b + c` would be turned into the tokens `a`, `.`, `b`, `+`, and `c`.
+The lexer lives in [`librustc_lexer`][lexer].
+
+[lexer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
+
+Parsing then takes streams of tokens and turns them into a structured
 form which is easier for the compiler to work with, usually called an [*Abstract
-Syntax Tree*][ast]. An AST mirrors the structure of a Rust program in memory,
+Syntax Tree*][ast] (AST). An AST mirrors the structure of a Rust program in memory,
 using a `Span` to link a particular AST node back to its source text.

-The bulk of the parser lives in the [libsyntax] crate.
+The AST is defined in [`libsyntax`][libsyntax], along with some definitions for
+tokens and token streams, data structures/traits for mutating ASTs, and shared
+definitions for other AST-related parts of the compiler (like the lexer and
+macro expansion).

-Like most parsers, the parsing process is composed of two main steps,
-
- lexical analysis – turn a stream of characters into a stream of token trees
- parsing – turn the token trees into an AST
-
-The `syntax` crate contains several main players,
-
- a [`SourceMap`] for mapping AST nodes to their source code
- the [ast module] contains types corresponding to each AST node
- a [`StringReader`] for lexing source code into tokens
- the [parser module] and [`Parser`] struct are in charge of actually parsing
-  tokens into AST nodes,
- and a [visit module] for walking the AST and inspecting or mutating the AST
-  nodes.
+The parser is defined in [`librustc_parse`][librustc_parse], along with a
+high-level interface to the lexer and some validation routines that run after
+macro expansion. In particular, the [`rustc_parse::parser`][parser] module
+contains the parser implementation.

 The main entrypoint to the parser is via the various `parse_*` functions in the
-[parser module]. They let you do things like turn a [`SourceFile`][sourcefile]
+[parser module][parser]. They let you do things like turn a [`SourceFile`][sourcefile]
 (e.g. the source in a single file) into a token stream, create a parser from
 the token stream, and then execute the parser to get a `Crate` (the root AST
 node).
@@ -44,14 +50,14 @@ Code for lexical analysis is split between two crates:
  specific data structures. Specifically, it adds `Span` information to tokens
  returned by `rustc_lexer` and interns identifiers.

-
 [libsyntax]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/index.html
 [rustc_errors]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html
 [ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
 [`SourceMap`]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/source_map/struct.SourceMap.html
 [ast module]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/ast/index.html
-[parser module]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/parse/index.html
+[librustc_parse]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
+[parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/index.html
 [`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/parse/parser/struct.Parser.html
-[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/parse/lexer/struct.StringReader.html
+[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
 [visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/visit/index.html
 [sourcefile]: https://doc.rust-lang.org/nightly/nightly-rustc/syntax/source_map/struct.SourceFile.html
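
Note on the new chapter text: the `a.b + c` example can be illustrated with a toy lexer. The sketch below is only meant to show what "strings in, tokens out" looks like. It is not how `rustc_lexer` is implemented, and `Token` and `lex` are hypothetical names invented for this illustration. It handles only identifiers, `.`, `+`, and whitespace.

```rust
/// Toy token type for illustration only; NOT the `rustc_lexer` token type.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Dot,
    Plus,
}

/// Hypothetical helper: split `src` into tokens, skipping whitespace.
/// Returns `None` on any character this toy lexer does not understand.
fn lex(src: &str) -> Option<Vec<Token>> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();

    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            // Whitespace separates tokens but produces none.
            chars.next();
        } else if c == '.' {
            chars.next();
            tokens.push(Token::Dot);
        } else if c == '+' {
            chars.next();
            tokens.push(Token::Plus);
        } else if c.is_alphabetic() || c == '_' {
            // Consume the longest run of identifier characters.
            let mut ident = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_alphanumeric() || d == '_' {
                    ident.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(ident));
        } else {
            // Anything else is outside the scope of this sketch.
            return None;
        }
    }

    Some(tokens)
}
```

Calling `lex("a.b + c")` yields the five tokens named in the chapter: the identifier `a`, a dot, the identifier `b`, a plus, and the identifier `c`. The real lexer additionally attaches `Span` information and interns identifiers, as the chapter describes, which this sketch leaves out.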