additional changes to links and some text

2024-03-11 09:10:25 -03:00 · 2024-03-11 09:10:25 -03:00 · 027c805e5c
parent 07aa8b109f
commit 027c805e5c
1 changed files with 119 additions and 118 deletions
--- a/src/macro-expansion.md
+++ b/src/macro-expansion.md
@ -331,9 +331,11 @@ a code location and [`SyntaxContext`][sc]. Likewise, an [`Ident`] is just an int
 [am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark

 For built-in `macro`s, we use the context:
-`SyntaxContext::empty().apply_mark(expn_id)`, and such `macro`s are considered to
-be defined at the hierarchy root. We do the same for `proc macro`s because we
-haven't implemented cross-crate hygiene yet.
+[`SyntaxContext::empty().apply_mark(expn_id)`], and such `macro`s are
+considered to be defined at the hierarchy root. We do the same for `proc
+macro`s because we haven't implemented cross-crate hygiene yet.
+
+[`SyntaxContext::empty().apply_mark(expn_id)`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark

 If the token had context `X` before being produced by a `macro` then after being
 produced by the `macro` it has context `X -> macro_id`. Here are some examples:
@ -346,12 +348,11 @@ macro m() { ident }
 m!();
 ```

-Here `ident` originally has context [`SyntaxContext::root`][scr]. `ident` has
+Here `ident` which initially has context [`SyntaxContext::root`][scr] has
 context `ROOT -> id(m)` after it's produced by `m`.

 [scr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.root

-
 Example 1:

 ```rust,ignore
@ -360,7 +361,8 @@ macro m() { macro n() { ident } }
 m!();
 n!();
 ```
-In this example the `ident` has context `ROOT` originally, then `ROOT -> id(m)`
+
+In this example the `ident` has context `ROOT` initially, then `ROOT -> id(m)`
 after the first expansion, then `ROOT -> id(m) -> id(n)`.

 Example 2:
@ -377,11 +379,11 @@ m!(foo);
 After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context
 `ROOT -> id(m) -> id(n)`.

-Finally, one last thing to mention is that currently, this hierarchy is subject
-to the ["context transplantation hack"][hack]. Basically, the more modern (and
-experimental) `macro` `macro`s have stronger hygiene than the older MBE system,
-but this can result in weird interactions between the two. The hack is intended
-to make things "just work" for now.
+Currently this hierarchy for tracking `macro` definitions is subject to the
+so-called ["context transplantation hack"][hack]. Modern (i.e. experimental)
+`macro`s have stronger hygiene than the legacy "Macros By Example" (`MBE`)
+system which can result in weird interactions between the two. The hack is
+intended to make things "just work" for now.

 [`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html
 [hack]: https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732
@ -390,7 +392,8 @@ to make things "just work" for now.

 The third and final hierarchy tracks the location of `macro` invocations.

-In this hierarchy [`ExpnData::call_site`][callsite] is the child -> parent link.
+In this hierarchy [`ExpnData::call_site`][callsite] is the `child -> parent`
+link.

 [callsite]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.call_site

@ -420,20 +423,22 @@ Above, we saw how the output of a `macro` is integrated into the `AST` for a cra
 and we also saw how the hygiene data for a crate is generated. But how do we
 actually produce the output of a `macro`? It depends on the type of `macro`.

-There are two types of `macro`s in Rust:
-`macro_rules!` `macro`s (a.k.a. "Macros By Example" (MBE)) and procedural `macro`s
-(or "proc `macro`s"; including custom derives). During the parsing phase, the normal
-Rust parser will set aside the contents of `macro`s and their invocations. Later,
-`macro`s are expanded using these portions of the code.
+There are two types of `macro`s in Rust: 
+  1. `macro_rules!` macros, and,
+  2. procedural `macro`s (`proc macro`s); including custom derives. 
+  
+During the parsing phase, the normal Rust parser will set aside the contents of
+`macro`s and their invocations. Later, `macro`s are expanded using these
+portions of the code.

 Some important data structures/interfaces here:
 - [`SyntaxExtension`] - a lowered `macro` representation, contains its expander
-  function, which transforms a `TokenStream` or `AST` into another `TokenStream`
-  or `AST` + some additional data like stability, or a list of unstable features
-  allowed inside the `macro`.
+  function, which transforms a [`TokenStream`] or `AST` into another
+  [`TokenStream`] or `AST` + some additional data like stability, or a list of
+  unstable features allowed inside the `macro`.
 - [`SyntaxExtensionKind`] - expander functions may have several different
  signatures (take one token stream, or two, or a piece of `AST`, etc). This is
-  an enum that lists them.
+  an `enum` that lists them.
 - [`BangProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] -
  `trait`s representing the expander function signatures.

@ -446,18 +451,15 @@ Some important data structures/interfaces here:

 ## Macros By Example

-MBEs have their own parser distinct from the normal Rust parser. When `macro`s
-are expanded, we may invoke the MBE parser to parse and expand a `macro`.  The
-MBE parser, in turn, may call the normal Rust parser when it needs to bind a
-metavariable (e.g.  `$my_expr`) while parsing the contents of a `macro`
+`MBE`s have their own parser distinct from the Rust parser. When `macro`s are
+expanded, we may invoke the `MBE` parser to parse and expand a `macro`.  The
+`MBE` parser, in turn, may call the Rust parser when it needs to bind a
+metavariable (e.g. `$my_expr`) while parsing the contents of a `macro`
 invocation. The code for `macro` expansion is in
 [`compiler/rustc_expand/src/mbe/`][code_dir].

 ### Example

-It's helpful to have an example to refer to. For the remainder of this chapter,
-whenever we refer to the "example _definition_", we mean the following:
-
 ```rust,ignore
 macro_rules! printer {
    (print $mvar:ident) => {
@ -470,41 +472,41 @@ macro_rules! printer {
 }
 ```

-`$mvar` is called a _metavariable_. Unlike normal variables, rather than
-binding to a value in a computation, a metavariable binds _at compile time_ to
-a tree of _tokens_.  A _token_ is a single "unit" of the grammar, such as an
+Here `$mvar` is called a _metavariable_. Unlike normal variables, rather than
+binding to a value _at runtime_, a metavariable binds _at compile time_ to a
+tree of _tokens_.  A _token_ is a single "unit" of the grammar, such as an
 identifier (e.g. `foo`) or punctuation (e.g. `=>`). There are also other
-special tokens, such as `EOF`, which indicates that there are no more tokens.
-Token trees resulting from paired parentheses-like characters (`(`...`)`,
-`[`...`]`, and `{`...`}`) – they include the open and close and all the tokens
-in between (we do require that parentheses-like characters be balanced). Having
-`macro` expansion operate on token streams rather than the raw bytes of a source
-file abstracts away a lot of complexity. The `macro` expander (and much of the
-rest of the compiler) doesn't really care that much about the exact line and
-column of some syntactic construct in the code; it cares about what constructs
-are used in the code. Using tokens allows us to care about _what_ without
-worrying about _where_. For more information about tokens, see the
-[Parsing][parsing] chapter of this book.
-
-Whenever we refer to the "example _invocation_", we mean the following snippet:
+special tokens, such as `EOF`, which its self indicates that there are no more
+tokens. There are token trees resulting from the paired parentheses-like
+characters (`(`...`)`, `[`...`]`, and `{`...`}`) – they include the open and
+close and all the tokens in between (Rust requires that parentheses-like
+characters be balanced). Having `macro` expansion operate on token streams
+rather than the raw bytes of a source-file abstracts away a lot of complexity.
+The `macro` expander (and much of the rest of the compiler) doesn't consider
+the exact line and column of some syntactic construct in the code; it considers
+which constructs are used in the code. Using tokens allows us to care about
+_what_ without worrying about _where_. For more information about tokens, see
+the [Parsing][parsing] chapter of this book.

 ```rust,ignore
-printer!(print foo); // Assume `foo` is a variable defined somewhere else...
+printer!(print foo); // `foo` is a variable
 ```

 The process of expanding the `macro` invocation into the syntax tree
-`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
-called _`macro` expansion_, and it is the topic of this chapter.
+`println!("{}", foo)` and then expanding the syntax tree into a call to
+`Display::fmt` is one common example of _`macro` expansion_.

 ### The MBE parser

-There are two parts to MBE expansion: parsing the definition and parsing the
-invocations. Interestingly, both are done by the `macro` parser.
+There are two parts to `MBE` expansion done by the `macro` parser: 
+  1. parsing the definition, and,
+  2. parsing the invocations. 

-Basically, the MBE parser is like an NFA-based regex parser. It uses an
-algorithm similar in spirit to the [Earley parsing
-algorithm](https://en.wikipedia.org/wiki/Earley_parser). The `macro` parser is
-defined in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
+We think of the `MBE` parser as a nondeterministic finite automaton (NFA) based
+regex parser since it uses an algorithm similar in spirit to the [Earley
+parsing algorithm](https://en.wikipedia.org/wiki/Earley_parser). The `macro`
+parser is defined in
+[`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].

 The interface of the `macro` parser is as follows (this is slightly simplified):

@ -518,64 +520,67 @@ fn parse_tt(

 We use these items in `macro` parser:

- `parser` is a reference to the state of a normal Rust parser, including the
-  token stream and parsing session. The token stream is what we are about to
-  ask the MBE parser to parse. We will consume the raw stream of tokens and
-  output a binding of metavariables to corresponding token trees. The parsing
-  session can be used to report parser errors.
- `matcher` is a sequence of `MatcherLoc`s that we want to match
+- a `parser` variable is a reference to the state of a normal Rust parser,
+  including the token stream and parsing session. The token stream is what we
+  are about to ask the `MBE` parser to parse. We will consume the raw stream of
+  tokens and output a binding of metavariables to corresponding token trees.
+  The parsing session can be used to report parser errors.
+- a `matcher` variable is a sequence of [`MatcherLoc`]s that we want to match
  the token stream against. They're converted from token trees before matching.

-In the analogy of a regex parser, the token stream is the input and we are matching it
-against the pattern `matcher`. Using our examples, the token stream could be the stream of
-tokens containing the inside of the example invocation `print foo`, while `matcher`
-might be the sequence of token (trees) `print $mvar:ident`.
+[`MatcherLoc`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.MatcherLoc.html
+
+In the analogy of a regex parser, the token stream is the input and we are
+matching it against the pattern defined by `matcher`. Using our examples, the
+token stream could be the stream of tokens containing the inside of the example
+invocation `print foo`, while `matcher` might be the sequence of token (trees)
+`print $mvar:ident`.

 The output of the parser is a [`ParseResult`], which indicates which of
 three cases has occurred:

- Success: the token stream matches the given `matcher`, and we have produced a binding
-  from metavariables to the corresponding token trees.
- Failure: the token stream does not match `matcher`. This results in an error message such as
-  "No rule expected token _blah_".
- Error: some fatal error has occurred _in the parser_. For example, this
-  happens if there is more than one pattern match, since that indicates
-  the `macro` is ambiguous.
+- **Success**: the token stream matches the given `matcher` and we have produced a
+  binding from metavariables to the corresponding token trees.
+- **Failure**: the token stream does not match `matcher` and results in an error
+  message such as "No rule expected token ...".
+- **Error**: some fatal error has occurred _in the parser_. For example, this
+  happens if there is more than one pattern match, since that indicates the
+  `macro` is ambiguous.

 The full interface is defined [here][code_parse_int].

-The `macro` parser does pretty much exactly the same as a normal regex parser with
-one exception: in order to parse different types of metavariables, such as
-`ident`, `block`, `expr`, etc., the `macro` parser must sometimes call back to the
-normal Rust parser.
+The `macro` parser does pretty much exactly the same as a normal regex parser
+with one exception: in order to parse different types of metavariables, such as
+`ident`, `block`, `expr`, etc., the `macro` parser must call back to the normal
+Rust parser. Both the definition and invocation of `macro`s are parsed using
+the parser in a process which is non-intuitively self-referential. 

-As mentioned above, both definitions and invocations of `macro`s are parsed using
-the `macro` parser. This is extremely non-intuitive and self-referential. The code
-to parse `macro` _definitions_ is in
-[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the pattern for
-matching for a `macro` definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
-a `macro_rules` definition should have in its body at least one occurrence of a
-token tree followed by `=>` followed by another token tree. When the compiler
-comes to a `macro_rules` definition, it uses this pattern to match the two token
-trees per rule in the definition of the `macro` _using the `macro` parser itself_.
-In our example definition, the metavariable `$lhs` would match the patterns of
-both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`.  And `$rhs`
-would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{
-println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this
-knowledge around for when it needs to expand a `macro` invocation.
+The code to parse `macro` _definitions_ is in
+[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the
+pattern for matching a `macro` definition as `$( $lhs:tt => $rhs:tt );+`. In
+other words, a `macro_rules` definition should have in its body at least one
+occurrence of a token tree followed by `=>` followed by another token tree.
+When the compiler comes to a `macro_rules` definition, it uses this pattern to
+match the two token trees per rule in the definition of the `macro`, _thereby
+utilizing the `macro` parser itself_. In our example definition, the
+metavariable `$lhs` would match the patterns of both arms: `(print
+$mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs` would match the
+bodies of both arms: `{ println!("{}", $mvar); }` and `{ println!("{}", $mvar);
+println!("{}", $mvar); }`. The parser keeps this knowledge around for when it
+needs to expand a `macro` invocation.

 When the compiler comes to a `macro` invocation, it parses that invocation using
-the same NFA-based `macro` parser that is described above. However, the matcher
+a NFA-based `macro` parser described above. However, the `matcher` variable
 used is the first token tree (`$lhs`) extracted from the arms of the `macro`
 _definition_. Using our example, we would try to match the token stream `print
-foo` from the invocation against the matchers `print $mvar:ident` and `print
-twice $mvar:ident` that we previously extracted from the definition.  The
+foo` from the invocation against the `matcher`s `print $mvar:ident` and `print
+twice $mvar:ident` that we previously extracted from the definition. The
 algorithm is exactly the same, but when the `macro` parser comes to a place in the
-current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
+current `matcher` where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
 it calls back to the normal Rust parser to get the contents of that
 non-terminal. In this case, the Rust parser would look for an `ident` token,
 which it finds (`foo`) and returns to the `macro` parser. Then, the `macro` parser
-proceeds in parsing as normal. Also, note that exactly one of the matchers from
+proceeds in parsing as normal. Also, note that exactly one of the `matcher`s from
 the various arms should match the invocation; if there is more than one match,
 the parse is ambiguous, while if there are no matches at all, there is a syntax
 error.
@ -583,32 +588,21 @@ error.
 For more information about the `macro` parser's implementation, see the comments
 in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].

-### `macro`s and Macros 2.0
-
-There is an old and mostly undocumented effort to improve the MBE system, give
-it more hygiene-related features, better scoping and visibility rules, etc. There
-hasn't been a lot of work on this recently, unfortunately. Internally, `macro`
-`macro`s use the same machinery as today's MBEs; they just have additional
-syntactic sugar and are allowed to be in namespaces.
-
 ## Procedural Macros

-Procedural `macro`s are also expanded during parsing, as mentioned above.
-However, they use a rather different mechanism. Rather than having a parser in
-the compiler, procedural `macro`s are implemented as custom, third-party crates.
-The compiler will compile the proc `macro` crate and specially annotated
-functions in them (i.e. the proc `macro` itself), passing them a stream of tokens.
+Procedural `macro`s are also expanded during parsing. However, rather than
+having a parser in the compiler, `proc macro`s are implemented as custom,
+third-party crates. The compiler will compile the `proc macro` crate and
+specially annotated functions in them (i.e. the `proc macro` itself), passing
+them a stream of tokens. A `proc macro` can then transform the token stream and
+output a new token stream, which is synthesized into the `AST`.

-The proc `macro` can then transform the token stream and output a new token
-stream, which is synthesized into the `AST`.
-
-It's worth noting that the token stream type used by proc `macro`s is _stable_,
-so `rustc` does not use it internally (since our internal data structures are
-unstable). The compiler's token stream is
-[`rustc_ast::tokenstream::TokenStream`][rustcts], as previously. This is
-converted into the stable [`proc_macro::TokenStream`][stablets] and back in
+The token stream type used by `proc macro`s is _stable_, so `rustc` does not
+use it internally. The compiler's (unstable) token stream is defined in
+[`rustc_ast::tokenstream::TokenStream`][rustcts]. This is converted into the
+stable [`proc_macro::TokenStream`][stablets] and back in
 [`rustc_expand::proc_macro`][pm] and [`rustc_expand::proc_macro_server`][pms].
-Because the Rust ABI is unstable, we use the C ABI for this conversion.
+Since the Rust ABI is currently unstable, we use the C ABI for this conversion.

 [tsmod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/index.html
 [rustcts]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
@ -617,10 +611,17 @@ Because the Rust ABI is unstable, we use the C ABI for this conversion.
 [pms]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro_server/index.html
 [`ParseResult`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.ParseResult.html

-TODO: more here. [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)
+<!-- TODO(rylev): more here. [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160) -->

 ### Custom Derive

-Custom derives are a special type of proc `macro`.
+Custom derives are a special type of `proc macro`.

-TODO: more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)
+### Macros By Example and Macros 2.0
+
+There is an legacy and mostly undocumented effort to improve the `MBE` system
+by giving it more hygiene-related features, better scoping and visibility
+rules, etc. Internally this uses the same machinery as today's `MBE`s with some
+additional syntactic sugar and are allowed to be in namespaces.
+
+<!-- TODO(rylev): more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160) -->