Updated macros to address Niko's comments

2018-01-26 14:41:56 -06:00 · 2018-01-26 14:41:56 -06:00 · 81e9d3bb83
parent ce6899ab8b
commit 81e9d3bb83
1 changed files with 125 additions and 37 deletions
--- a/src/macro-expansion.md
+++ b/src/macro-expansion.md
@@ -2,56 +2,136 @@
 Macro expansion happens during parsing. `rustc` has two parsers, in fact: the
 normal Rust parser, and the macro parser. During the parsing phase, the normal
-Rust parser will call into the macro parser when it encounters a macro. The
+Rust parser will call into the macro parser when it encounters a macro
-macro parser, in turn, may call back out to the Rust parser when it needs to
+definition or macro invocation (TODO: verify). The macro parser, in turn, may
-bind a metavariable (e.g. `$my_expr`). There are a few aspects of this system to
+call back out to the Rust parser when it needs to bind a metavariable (e.g.
-be explained. The code for macro expansion is in `src/libsyntax/ext/tt/`.
+`$my_expr`) while parsing the contents of a macro invocation. The code for macro
 expansion is in [`src/libsyntax/ext/tt/`][code_dir]. This chapter aims to
 explain how macro expansion works.
 ### Example
 It's helpful to have an example to refer to. For the remainder of this chapter,
 whenever we refer to the "example _definition_", we mean the following:
 ```rust
 macro_rules! printer {
    (print $mvar:ident) => {
        println!("{}", $mvar);
    };
    (print twice $mvar:ident) => {
        println!("{}", $mvar);
        println!("{}", $mvar);
    }
 }
 ```
 `$mvar` is called a _metavariable_. Unlike normal variables, rather than binding
 to a value in a computation, a metavariable binds _at compile time_ to a tree of
 _tokens_. A _token_ is zero or more symbols that together have some meaning. For
 example, in our example definition, `print`, `$mvar`, `=>`, `{` are all tokens
 (though that's not an exhaustive list). There are also other special tokens,
 such as `EOF`, which indicates that there are no more tokens. The process of
 producing a stream of tokens from the raw bytes of the source file is called
 _lexing_. For more information about _lexing_, see the [Parsing
 chapter][parsing] of this book.
 Whenever we refer to the "example _invocation_", we mean the following snippet:
 ```rust
 printer!(print foo); // Assume `foo` is a variable defined somewhere else...
 ```
 The process of expanding the macro invocation into the syntax tree
 `println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
 called _macro expansion_, and it is the topic of this chapter.
 ### The macro parser
 There are two parts to macro expansion: parsing the definition and parsing the
 invocations. Interestingly, both are done by the macro parser.
 Basically, the macro parser is like an NFA-based regex parser. It uses an
 algorithm similar in spirit to the [Earley parsing
 algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is
-defined in `src/libsyntax/ext/tt/macro_parser.rs`.
+defined in [`src/libsyntax/ext/tt/macro_parser.rs`][code_mp].
-In a traditional NFA-based parser, one common approach is to have some pattern
+The interface of the macro parser is as follows (this is slightly simplified):
 which we are trying to match an input against. Moreover, we may try to capture
 some portion of the input and bind it to a variable in the pattern. For example:
 suppose we have a pattern (borrowing Rust macro syntax) such as `a $b:ident a`
 -- that is, an `a` token followed by an `ident` token followed by another `a`
 token. Given an input `a foo a`, the _metavariable_ `$b` would bind to the
 `ident` `foo`. On the other hand, an input `a foo b` would be rejected as a
 parse failure because the pattern `a <ident> a` cannot match `a foo b` (or as
 the compiler would put it, "no rules expected token `b`").
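 The `a $b:ident a` pattern can be tried directly with `macro_rules!`. The
 following is a minimal sketch; the macro name `bind_b` and the helper
 `bound_ident` are hypothetical and exist only for illustration.

 ```rust
 // Hypothetical macro whose single arm is the pattern `a $b:ident a`.
 macro_rules! bind_b {
     (a $b:ident a) => {
         // `stringify!` turns the tokens bound to `$b` into a string literal.
         stringify!($b)
     };
 }

 // Returns the identifier that `$b` was bound to for the input `a foo a`.
 fn bound_ident() -> &'static str {
     bind_b!(a foo a)
 }

 fn main() {
     println!("b is bound to `{}`", bound_ident());
     // Uncommenting the next line makes compilation fail, because the final
     // `b` does not match the literal `a` that the pattern expects:
     // bind_b!(a foo b);
 }
 ```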
-The macro parser does pretty much exactly that with one exception: in order to
+```rust
-parse different types of metavariables, such as `ident`, `block`, `expr`, etc.,
+fn parse(
-the macro parser must sometimes call back to the normal Rust parser.
+    sess: ParserSession,
    tts: TokenStream,
    ms: &[TokenTree]
 ) -> NamedParseResult
 ```
-Interestingly, both definitions and invokations of macros are parsed using the
+In this interface:
 macro parser. This is extremely non-intuitive and self-referential. The code to
 parse macro _definitions_ is in `src/libsyntax/ext/tt/macro_rules.rs`. It
 defines the pattern for matching for a macro definition as `$( $lhs:tt =>
 $rhs:tt );+`. In other words, a `macro_rules` definition should have in its body
 at least one occurrence of a token tree followed by `=>` followed by another
 token tree. When the compiler comes to a `macro_rules` definition, it uses this
 pattern to match the two token trees per rule in the definition of the macro
 _using the macro parser itself_.
-When the compiler comes to a macro invokation, it needs to parse that
+- `sess` is a "parsing session", which keeps track of some metadata. Most
-invokation. This is also known as _macro expansion_. The same NFA-based macro
+  notably, this is used to keep track of errors that are generated so they can
-parser is used that is described above. Notably, the "pattern" (or _matcher_)
+  be reported to the user.
-used is the first token tree extracted from the rules of the macro _definition_.
+- `tts` is a stream of tokens. The macro parser's job is to consume the raw
-In other words, given some pattern described by the _definition_ of the macro,
+  stream of tokens and output a binding of metavariables to corresponding token
-we want to match the contents of the _invokation_ of the macro.
+  trees.
 - `ms` is a _matcher_. This is a sequence of token trees that we want to match
  `tts` against.
-The algorithm is exactly the same, but when the macro parser comes to a place in
+In the analogy of a regex parser, `tts` is the input and we are matching it
-the current matcher where it needs to match a _non-terminal_ (i.e. a
+against the pattern `ms`. Using our examples, `tts` could be the stream of
-metavariable), it calls back to the normal Rust parser to get the contents of
+tokens containing the inside of the example invocation `print foo`, while `ms`
-that non-terminal. Then, the macro parser proceeds in parsing as normal.
+might be the sequence of token (trees) `print $mvar:ident`.
 The output of the parser is a `NamedParseResult`, which indicates which of
 three cases has occurred:
 - Success: `tts` matches the given matcher `ms`, and we have produced a binding
  from metavariables to the corresponding token trees.
 - Failure: `tts` does not match `ms`. This results in an error message such as
  "No rule expected token _blah_".
 - Error: some fatal error has occurred _in the parser_. For example, this happens
  if more than one pattern matches, since that indicates the macro is
  ambiguous.
 The full interface is defined [here][code_parse_int].
 The macro parser does pretty much exactly the same as a normal regex parser with
 one exception: in order to parse different types of metavariables, such as
 `ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the
 normal Rust parser.
 As mentioned above, both definitions and invocations of macros are parsed using
 the macro parser. This is extremely non-intuitive and self-referential. The code
 to parse macro _definitions_ is in
 [`src/libsyntax/ext/tt/macro_rules.rs`][code_mr]. It defines the pattern for
 matching for a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
 a `macro_rules` definition should have in its body at least one occurrence of a
 token tree followed by `=>` followed by another token tree. When the compiler
 comes to a `macro_rules` definition, it uses this pattern to match the two token
 trees per rule in the definition of the macro _using the macro parser itself_.
 In our example definition, the metavariable `$lhs` would match the patterns of
 both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`.  And `$rhs`
 would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{
 println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this
 knowledge around for when it needs to expand a macro invocation.
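 To see what that pattern extracts, the same shape can be written as an ordinary
 macro. This is only a sketch: `split_arms` and `example_arms` are hypothetical
 names, and the real pattern in `macro_rules.rs` may differ in detail. The exact
 spacing of the strings produced by `stringify!` is not guaranteed.

 ```rust
 // Hypothetical macro using the pattern `$( $lhs:tt => $rhs:tt );+`.
 // Each arm is split into its left-hand and right-hand token trees.
 macro_rules! split_arms {
     ($( $lhs:tt => $rhs:tt );+) => {
         vec![$( (stringify!($lhs), stringify!($rhs)) ),+]
     };
 }

 // Feeds the body of the example definition through `split_arms!`.
 fn example_arms() -> Vec<(&'static str, &'static str)> {
     split_arms!(
         (print $mvar:ident) => { println!("{}", $mvar); };
         (print twice $mvar:ident) => {
             println!("{}", $mvar);
             println!("{}", $mvar);
         }
     )
 }

 fn main() {
     for (lhs, rhs) in example_arms() {
         println!("lhs: {}", lhs);
         println!("rhs: {}", rhs);
     }
 }
 ```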
 When the compiler comes to a macro invocation, it parses that invocation using
 the same NFA-based macro parser that is described above. However, the matcher
 used is the first token tree (`$lhs`) extracted from the arms of the macro
 _definition_. Using our example, we would try to match the token stream `print
 foo` from the invocation against the matchers `print $mvar:ident` and `print
 twice $mvar:ident` that we previously extracted from the definition.  The
 algorithm is exactly the same, but when the macro parser comes to a place in the
 current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
 it calls back to the normal Rust parser to get the contents of that
 non-terminal. In this case, the Rust parser would look for an `ident` token,
 which it finds (`foo`) and returns to the macro parser. Then, the macro parser
 proceeds in parsing as normal. Also, note that the arms are tried in order, and
 the first arm whose matcher matches the invocation is used; an ambiguity error
 arises when a single matcher can be matched in more than one way.
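 The matching step can be sketched with a toy model over string tokens. None of
 this is `rustc` code: `Matcher`, `is_ident`, `parse`, `expand` and
 `printer_matchers` are hypothetical, tokens are plain strings, and `is_ident`
 stands in for the call back into the Rust parser. The sketch simply tries the
 arms in order and takes the first whose matcher succeeds.

 ```rust
 use std::collections::HashMap;

 // Hypothetical, simplified matcher element; not rustc's `TokenTree`.
 enum Matcher<'a> {
     Literal(&'a str), // a token that must appear verbatim, e.g. `print`
     Ident(&'a str),   // a metavariable such as `$mvar:ident`, stored by name
 }

 // Stand-in for the call back into the Rust parser: is `tok` an identifier?
 fn is_ident(tok: &str) -> bool {
     let mut chars = tok.chars();
     match chars.next() {
         Some(c) if c.is_alphabetic() || c == '_' => {
             chars.all(|c| c.is_alphanumeric() || c == '_')
         }
         _ => false,
     }
 }

 // Match one token stream against one matcher, binding metavariables.
 fn parse(tts: &[&str], ms: &[Matcher<'_>]) -> Result<HashMap<String, String>, String> {
     let mut bindings = HashMap::new();
     let mut toks = tts.iter();
     for m in ms {
         let tok = toks
             .next()
             .ok_or_else(|| "unexpected end of macro invocation".to_string())?;
         match m {
             Matcher::Literal(lit) if lit == tok => {}
             Matcher::Ident(name) if is_ident(tok) => {
                 bindings.insert(name.to_string(), tok.to_string());
             }
             _ => return Err(format!("no rules expected token `{}`", tok)),
         }
     }
     // Leftover input means this matcher does not cover the whole invocation.
     match toks.next() {
         Some(extra) => Err(format!("no rules expected token `{}`", extra)),
         None => Ok(bindings),
     }
 }

 // Try each arm in order; return the index and bindings of the first match.
 fn expand(
     tts: &[&str],
     arms: &[Vec<Matcher<'_>>],
 ) -> Result<(usize, HashMap<String, String>), String> {
     let mut last_err = String::from("macro has no arms");
     for (i, ms) in arms.iter().enumerate() {
         match parse(tts, ms) {
             Ok(bindings) => return Ok((i, bindings)),
             Err(e) => last_err = e,
         }
     }
     Err(last_err)
 }

 // The two matchers extracted from the example definition.
 fn printer_matchers() -> Vec<Vec<Matcher<'static>>> {
     vec![
         vec![Matcher::Literal("print"), Matcher::Ident("mvar")],
         vec![
             Matcher::Literal("print"),
             Matcher::Literal("twice"),
             Matcher::Ident("mvar"),
         ],
     ]
 }

 fn main() {
     let arms = printer_matchers();
     match expand(&["print", "foo"], &arms) {
         Ok((arm, bindings)) => println!("arm {} matched with {:?}", arm, bindings),
         Err(msg) => println!("error: {}", msg),
     }
 }
 ```

 With the example invocation `print foo`, the first arm matches and `mvar` is
 bound to `foo`; the input `print twice foo` fails the first arm on the trailing
 token and matches the second.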
 For more information about the macro parser's implementation, see the comments
-in `src/libsyntax/ext/tt/macro_parser.rs`.
+in [`src/libsyntax/ext/tt/macro_parser.rs`][code_mp].
 ### Hygiene
@@ -64,3 +144,11 @@ TODO
 ### Custom Derive
 TODO
 [code_dir]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt
 [code_mp]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt/macro_parser.rs
 [code_mr]: https://github.com/rust-lang/rust/tree/master/src/libsyntax/ext/tt/macro_rules.rs
 [code_parse_int]: https://github.com/rust-lang/rust/blob/a97cd17f5d71fb4ec362f4fbd79373a6e7ed7b82/src/libsyntax/ext/tt/macro_parser.rs#L421
 [parsing]: ./the-parser.md