diff --git a/src/macro-expansion.md b/src/macro-expansion.md index 0e1c72e7..ac77495b 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -2,25 +2,29 @@ -> `rustc_ast`, `rustc_expand`, and `rustc_builtin_macros` are all undergoing -> refactoring, so some of the links in this chapter may be broken. +> N.B. [`rustc_ast`], [`rustc_expand`], and [`rustc_builtin_macros`] are all +> undergoing refactoring, so some of the links in this chapter may be broken. -Rust has a very powerful macro system. In the previous chapter, we saw how the -parser sets aside macros to be expanded (it temporarily uses [placeholders]). -This chapter is about the process of expanding those macros iteratively until -we have a complete AST for our crate with no unexpanded macros (or a compile -error). +Rust has a very powerful `macro` system. In the previous chapter, we saw how +the parser sets aside `macro`s to be expanded (using temporary [placeholders]). +This chapter is about the process of expanding those `macro`s iteratively until +we have a complete [*Abstract Syntax Tree* (`AST`)][ast] for our crate with no +unexpanded `macro`s (or a compile error). +[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree +[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html +[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html +[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html [placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html -First, we will discuss the algorithm that expands and integrates macro output -into ASTs. Next, we will take a look at how hygiene data is collected. Finally, -we will look at the specifics of expanding different types of macros. +First, we discuss the algorithm that expands and integrates `macro` output into +`AST`s. Next, we take a look at how hygiene data is collected. 
Finally, we look +at the specifics of expanding different types of `macro`s. Many of the algorithms and data structures described below are in [`rustc_expand`], -with basic data structures in [`rustc_expand::base`][base]. +with fundamental data structures in [`rustc_expand::base`][base]. -Also of note, `cfg` and `cfg_attr` are treated specially from other macros, and are +Also of note, `cfg` and `cfg_attr` are treated specially from other `macro`s, and are handled in [`rustc_expand::config`][cfg]. [`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html @@ -29,108 +33,112 @@ handled in [`rustc_expand::config`][cfg]. ## Expansion and AST Integration -First of all, expansion happens at the crate level. Given a raw source code for -a crate, the compiler will produce a massive AST with all macros expanded, all +Firstly, expansion happens at the crate level. Given a raw source code for +a crate, the compiler will produce a massive `AST` with all `macro`s expanded, all modules inlined, etc. The primary entry point for this process is the -[`MacroExpander::fully_expand_fragment`][fef] method. With few exceptions, we +[`MacroExpander::fully_expand_fragment()`][fef] method. With few exceptions, we use this method on the whole crate (see ["Eager Expansion"](#eager-expansion) below for more detailed discussion of edge case expansion issues). [`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html [reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html -At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a -queue of unresolved macro invocations (that is, macros we haven't found the -definition of yet). We repeatedly try to pick a macro from the queue, resolve +At a high level, [`fully_expand_fragment()`][fef] works in iterations. We keep a +queue of unresolved `macro` invocations (i.e. `macro`s we haven't found the +definition of yet). 
We repeatedly try to pick a `macro` from the queue, resolve it, expand it, and integrate it back. If we can't make progress in an iteration, this represents a compile error. Here is the [algorithm][original]: [fef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.MacroExpander.html#method.fully_expand_fragment [original]: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049 -1. Initialize a `queue` of unresolved macros. +1. Initialize a `queue` of unresolved `macro`s. 2. Repeat until `queue` is empty (or we make no progress, which is an error): 1. [Resolve](./name-resolution.md) imports in our partially built crate as much as possible. - 2. Collect as many macro [`Invocation`s][inv] as possible from our - partially built crate (fn-like, attributes, derives) and add them to the + 2. Collect as many `macro` [`Invocation`s][inv] as possible from our + partially built crate (`fn`-like, attributes, derives) and add them to the queue. - 3. Dequeue the first element, and attempt to resolve it. + 3. Dequeue the first element and attempt to resolve it. 4. If it's resolved: - 1. Run the macro's expander function that consumes a [`TokenStream`] or - AST and produces a [`TokenStream`] or [`AstFragment`] (depending on - the macro kind). (A `TokenStream` is a collection of [`TokenTree`s][tt], + 1. Run the `macro`'s expander function that consumes a [`TokenStream`] or + `AST` and produces a [`TokenStream`] or [`AstFragment`] (depending on + the `macro` kind). (A [`TokenStream`] is a collection of [`TokenTree`s][tt], each of which are a token (punctuation, identifier, or literal) or a delimited group (anything inside `()`/`[]`/`{}`)). - - At this point, we know everything about the macro itself and can - call `set_expn_data` to fill in its properties in the global data; - that is the hygiene data associated with `ExpnId`. (See [the - "Hygiene" section below][hybelow]). - 2. 
Integrate that piece of AST into the big existing partially built - AST. This is essentially where the "token-like mass" becomes a - proper set-in-stone AST with side-tables. It happens as follows: - - If the macro produces tokens (e.g. a proc macro), we parse into - an AST, which may produce parse errors. - - During expansion, we create `SyntaxContext`s (hierarchy 2). (See - [the "Hygiene" section below][hybelow]) - - These three passes happen one after another on every AST fragment - freshly expanded from a macro: + - At this point, we know everything about the `macro` itself and can + call [`set_expn_data()`] to fill in its properties in the global + data; that is the [hygiene] data associated with [`ExpnId`] (see + [Hygiene][hybelow] below). + 2. Integrate that piece of `AST` into the currently-existing though + partially-built `AST`. This is essentially where the "token-like mass" + becomes a proper set-in-stone `AST` with side-tables. It happens as + follows: + - If the `macro` produces tokens (e.g. a `proc macro`), we parse into + an `AST`, which may produce parse errors. + - During expansion, we create [`SyntaxContext`]s (hierarchy 2) (see + [Hygiene][hybelow] below). + - These three passes happen one after another on every `AST` fragment + freshly expanded from a `macro`: - [`NodeId`]s are assigned by [`InvocationCollector`]. This - also collects new macro calls from this new AST piece and + also collects new `macro` calls from this new `AST` piece and adds them to the queue. - ["Def paths"][defpath] are created and [`DefId`]s are assigned to them by [`DefCollector`]. - Names are put into modules (from the resolver's point of view) by [`BuildReducedGraphVisitor`]. - 3. After expanding a single macro and integrating its output, continue - to the next iteration of [`fully_expand_fragment`][fef]. + 3. After expanding a single `macro` and integrating its output, continue + to the next iteration of [`fully_expand_fragment()`][fef]. 5. If it's not resolved: - 1. 
Put the macro back in the queue + 1. Put the `macro` back in the queue. 2. Continue to next iteration... -[defpath]: hir.md#identifiers-in-the-hir -[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html -[`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html -[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html -[`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html -[`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html -[hybelow]: #hygiene-and-hierarchies -[tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html -[`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html -[inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html [`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html +[`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html +[`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html +[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html +[`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html +[`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html +[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html +[`set_expn_data()`]: 
https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.LocalExpnId.html#method.set_expn_data +[`SyntaxContext`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html +[`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html +[defpath]: hir.md#identifiers-in-the-hir +[hybelow]: #hygiene-and-hierarchies +[hygiene]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/index.html +[inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html +[tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html ### Error Recovery -If we make no progress in an iteration, then we have reached a compilation -error (e.g. an undefined macro). We attempt to recover from failures -(unresolved macros or imports) for the sake of diagnostics. This allows -compilation to continue past the first error, so that we can report more errors -at a time. Recovery can't cause compilation to succeed. We know that it will -fail at this point. The recovery happens by expanding unresolved macros into -[`ExprKind::Err`][err]. +If we make no progress in an iteration we have reached a compilation error +(e.g. an undefined `macro`). We attempt to recover from failures (i.e. +unresolved `macro`s or imports) with the intent of generating diagnostics. +Failure recovery happens by expanding unresolved `macro`s into +[`ExprKind::Err`][err] and allows compilation to continue past the first error +so that `rustc` can report more errors than just the original failure. [err]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/enum.ExprKind.html#variant.Err ### Name Resolution Notice that name resolution is involved here: we need to resolve imports and -macro names in the above algorithm. 
This is done in -[`rustc_resolve::macros`][mresolve], which resolves macro paths, validates -those resolutions, and reports various errors (e.g. "not found" or "found, but -it's unstable" or "expected x, found y"). However, we don't try to resolve -other names yet. This happens later, as we will see in the [next -chapter](./name-resolution.md). +`macro` names in the above algorithm. This is done in +[`rustc_resolve::macros`][mresolve], which resolves `macro` paths, validates +those resolutions, and reports various errors (e.g. "not found", "found, but +it's unstable", "expected x, found y"). However, we don't try to resolve +other names yet. This happens later, as we will see in the chapter: [Name +Resolution](./name-resolution.md). [mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html ### Eager Expansion -_Eager expansion_ means that we expand the arguments of a macro invocation -before the macro invocation itself. This is implemented only for a few special -built-in macros that expect literals; expanding arguments first for some of -these macro results in a smoother user experience. As an example, consider the -following: +_Eager expansion_ means we expand the arguments of a `macro` invocation before +the `macro` invocation itself. This is implemented only for a few special +built-in `macro`s that expect literals; expanding arguments first for some of +these `macro`s results in a smoother user experience. As an example, consider +the following: ```rust,ignore macro bar($i: ident) { $i } @@ -139,35 +147,37 @@ macro foo($i: ident) { $i } foo!(bar!(baz)); ``` -A lazy expansion would expand `foo!` first. An eager expansion would expand +A lazy-expansion would expand `foo!` first. An eager-expansion would expand `bar!` first. -Eager expansion is not a generally available feature of Rust. 
Implementing -eager expansion more generally would be challenging, but we implement it for a -few special built-in macros for the sake of user experience. The built-in -macros are implemented in [`rustc_builtin_macros`], along with some other early -code generation facilities like injection of standard library imports or +Eager-expansion is not a generally available feature of Rust. Implementing +eager-expansion more generally would be challenging, so we implement it for a +few special built-in `macro`s for the sake of user-experience. The built-in +`macro`s are implemented in [`rustc_builtin_macros`], along with some other +early code generation facilities like injection of standard library imports or generation of test harness. There are some additional helpers for building -their AST fragments in [`rustc_expand::build`][reb]. Eager expansion generally -performs a subset of the things that lazy (normal) expansion does. It is done by -invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed to -the whole crate, like we normally do). +`AST` fragments in [`rustc_expand::build`][reb]. Eager-expansion generally +performs a subset of the things that lazy (normal) expansion does. It is done +by invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed +to the whole crate, like we normally do). ### Other Data Structures -Here are some other notable data structures involved in expansion and integration: -- [`ResolverExpand`] - a trait used to break crate dependencies. This allows the +Here are some other notable data structures involved in expansion and +integration: +- [`ResolverExpand`] - a `trait` used to break crate dependencies. This allows the resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and pretty much everything else depending on [`rustc_ast`]. 
-- [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by expansion - infrastructure in the process of its work -- [`Annotatable`] - a piece of AST that can be an attribute target, almost same - thing as AstFragment except for types and patterns that can be produced by - macros but cannot be annotated with attributes -- [`MacResult`] - a "polymorphic" AST fragment, something that can turn into a - different `AstFragment` depending on its [`AstFragmentKind`] - item, - or expression, or pattern etc. +- [`ExtCtxt`]/[`ExpansionData`] - holds various intermediate expansion + infrastructure data. +- [`Annotatable`] - a piece of `AST` that can be an attribute target, almost the same + thing as [`AstFragment`] except for `type`s and patterns that can be produced by + `macro`s but cannot be annotated with attributes. +- [`MacResult`] - a "polymorphic" `AST` fragment, something that can turn into + a different [`AstFragment`] depending on its [`AstFragmentKind`] (i.e. an item, + expression, pattern, etc). +[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html [`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html [`rustc_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/index.html [`ResolverExpand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.ResolverExpand.html @@ -179,7 +189,7 @@ Here are some other notable data structures involved in expansion and integratio ## Hygiene and Hierarchies -If you have ever used C/C++ preprocessor macros, you know that there are some +If you have ever used the C/C++ preprocessor macros, you know that there are some annoying and hard-to-debug gotchas! For example, consider the following C code: ```c @@ -213,16 +223,16 @@ we got `foo(0, 0)` because the macro defined its own `y`! These are both examples of _macro hygiene_ issues. _Hygiene_ relates to how to handle names defined _within a macro_. 
In particular, a hygienic macro system -prevents errors due to names introduced within a macro. Rust macros are hygienic +prevents errors due to names introduced within a macro. Rust `macro`s are hygienic in that they do not allow one to write the sorts of bugs above. At a high level, hygiene within the Rust compiler is accomplished by keeping track of the context where a name is introduced and used. We can then -disambiguate names based on that context. Future iterations of the macro system -will allow greater control to the macro author to use that context. For example, -a macro author may want to introduce a new name to the context where the macro -was called. Alternately, the macro author may be defining a variable for use -only within the macro (i.e. it should not be visible outside the macro). +disambiguate names based on that context. Future iterations of the `macro` system +will allow greater control to the `macro` author to use that context. For example, +a `macro` author may want to introduce a new name to the context where the `macro` +was called. Alternately, the `macro` author may be defining a variable for use +only within the `macro` (i.e. it should not be visible outside the `macro`). [code_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_expand/src/mbe [code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser @@ -230,18 +240,18 @@ only within the macro (i.e. it should not be visible outside the macro). [code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/struct.TtParser.html#method.parse_tt [parsing]: ./the-parser.html -The context is attached to AST nodes. All AST nodes generated by macros have +The context is attached to `AST` nodes. All `AST` nodes generated by `macro`s have context attached. 
Additionally, there may be other nodes that have context -attached, such as some desugared syntax (non-macro-expanded nodes are +attached, such as some desugared syntax (non-`macro`-expanded nodes are considered to just have the "root" context, as described below). Throughout the compiler, we use [`rustc_span::Span`s][span] to refer to code locations. This struct also has hygiene information attached to it, as we will see later. [span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html -Because macros invocations and definitions can be nested, the syntax context of -a node must be a hierarchy. For example, if we expand a macro and there is -another macro invocation or definition in the generated output, then the syntax +Because `macro` invocations and definitions can be nested, the syntax context of +a node must be a hierarchy. For example, if we expand a `macro` and there is +another `macro` invocation or definition in the generated output, then the syntax context should reflect the nesting. However, it turns out that there are actually a few types of context we may @@ -249,13 +259,13 @@ want to track for different purposes. Thus, there are not just one but _three_ expansion hierarchies that together comprise the hygiene information for a crate. -All of these hierarchies need some sort of "macro ID" to identify individual -elements in the chain of expansions. This ID is [`ExpnId`]. All macros receive -an integer ID, assigned continuously starting from 0 as we discover new macro +All of these hierarchies need some sort of "`macro` ID" to identify individual +elements in the chain of expansions. This ID is [`ExpnId`]. All `macro`s receive +an integer ID, assigned continuously starting from 0 as we discover new `macro` calls. All hierarchies start at [`ExpnId::root()`][rootid], which is its own parent. 
-[`rustc_span::hygiene`][hy] contains all of the hygiene-related algorithms +The [`rustc_span::hygiene`][hy] library contains all of the hygiene-related algorithms (with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks]) and structures related to hygiene and expansion that are kept in global data. @@ -273,18 +283,18 @@ any [`Ident`] without any context. ### The Expansion Order Hierarchy -The first hierarchy tracks the order of expansions, i.e., when a macro -invocation is in the output of another macro. +The first hierarchy tracks the order of expansions, i.e., when a `macro` +invocation is in the output of another `macro`. Here, the children in the hierarchy will be the "innermost" tokens. The -[`ExpnData`] struct itself contains a subset of properties from both macro -definition and macro call available through global data. -[`ExpnData::parent`][edp] tracks the child -> parent link in this hierarchy. +[`ExpnData`] struct itself contains a subset of properties from both `macro` +definition and `macro` call available through global data. +[`ExpnData::parent`][edp] tracks the child-to-parent link in this hierarchy. [`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html [edp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.parent -For example, +For example: ```rust,ignore macro_rules! foo { () => { println!(); } } @@ -292,25 +302,25 @@ macro_rules! foo { () => { println!(); } } fn main() { foo!(); } ``` -In this code, the AST nodes that are finally generated would have hierarchy +In this code, the `AST` nodes that are finally generated would have hierarchy `root -> id(foo) -> id(println)`. ### The Macro Definition Hierarchy -The second hierarchy tracks the order of macro definitions, i.e., when we are -expanding one macro another macro definition is revealed in its output. 
This +The second hierarchy tracks the order of `macro` definitions, i.e., when we are +expanding one `macro` another `macro` definition is revealed in its output. This one is a bit tricky and more complex than the other two hierarchies. [`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID. [`SyntaxContextData`][scd] contains data associated with the given -`SyntaxContext`; mostly it is a cache for results of filtering that chain in -different ways. [`SyntaxContextData::parent`][scdp] is the child -> parent +[`SyntaxContext`][sc]; mostly it is a cache for results of filtering that chain in +different ways. [`SyntaxContextData::parent`][scdp] is the child-to-parent link here, and [`SyntaxContextData::outer_expns`][scdoe] are individual -elements in the chain. The "chaining operator" is +elements in the chain. The "chaining-operator" is [`SyntaxContext::apply_mark`][am] in compiler code. A [`Span`][span], mentioned above, is actually just a compact representation of -a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned +a code location and [`SyntaxContext`][sc]. Likewise, an [`Ident`] is just an interned [`Symbol`] + `Span` (i.e. an interned string + hygiene data). [`Symbol`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Symbol.html @@ -320,13 +330,13 @@ a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned [scdoe]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.outer_expn [am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark -For built-in macros, we use the context: -`SyntaxContext::empty().apply_mark(expn_id)`, and such macros are considered to -be defined at the hierarchy root. 
We do the same for proc-macros because we +For built-in `macro`s, we use the context: +`SyntaxContext::empty().apply_mark(expn_id)`, and such `macro`s are considered to +be defined at the hierarchy root. We do the same for `proc macro`s because we haven't implemented cross-crate hygiene yet. -If the token had context `X` before being produced by a macro then after being -produced by the macro it has context `X -> macro_id`. Here are some examples: +If the token had context `X` before being produced by a `macro` then after being +produced by the `macro` it has context `X -> macro_id`. Here are some examples: Example 0: @@ -356,7 +366,7 @@ after the first expansion, then `ROOT -> id(m) -> id(n)`. Example 2: Note that these chains are not entirely determined by their last element, in -other words `ExpnId` is not isomorphic to `SyntaxContext`. +other words [`ExpnId`] is not isomorphic to [`SyntaxContext`][sc]. ```rust,ignore macro m($i: ident) { macro n() { ($i, bar) } } @@ -369,15 +379,16 @@ After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context Finally, one last thing to mention is that currently, this hierarchy is subject to the ["context transplantation hack"][hack]. Basically, the more modern (and -experimental) `macro` macros have stronger hygiene than the older MBE system, +experimental) `macro` `macro`s have stronger hygiene than the older MBE system, but this can result in weird interactions between the two. The hack is intended to make things "just work" for now. +[`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html [hack]: https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732 ### The Call-site Hierarchy -The third and final hierarchy tracks the location of macro invocations. +The third and final hierarchy tracks the location of `macro` invocations. In this hierarchy [`ExpnData::call_site`][callsite] is the child -> parent link. 
@@ -392,39 +403,39 @@ macro foo($i: ident) { $i } foo!(bar!(baz)); ``` -For the `baz` AST node in the final output, the expansion-order hierarchy is +For the `baz` `AST` node in the final output, the expansion-order hierarchy is `ROOT -> id(foo) -> id(bar) -> baz`, while the call-site hierarchy is `ROOT -> baz`. ### Macro Backtraces -Macro backtraces are implemented in [`rustc_span`] using the hygiene machinery +`macro` backtraces are implemented in [`rustc_span`] using the hygiene machinery in [`rustc_span::hygiene`][hy]. [`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html ## Producing Macro Output -Above, we saw how the output of a macro is integrated into the AST for a crate, +Above, we saw how the output of a `macro` is integrated into the `AST` for a crate, and we also saw how the hygiene data for a crate is generated. But how do we -actually produce the output of a macro? It depends on the type of macro. +actually produce the output of a `macro`? It depends on the type of `macro`. -There are two types of macros in Rust: -`macro_rules!` macros (a.k.a. "Macros By Example" (MBE)) and procedural macros -(or "proc macros"; including custom derives). During the parsing phase, the normal -Rust parser will set aside the contents of macros and their invocations. Later, -macros are expanded using these portions of the code. +There are two types of `macro`s in Rust: +`macro_rules!` `macro`s (a.k.a. "Macros By Example" (MBE)) and procedural `macro`s +(or "proc `macro`s"; including custom derives). During the parsing phase, the normal +Rust parser will set aside the contents of `macro`s and their invocations. Later, +`macro`s are expanded using these portions of the code. 
Some important data structures/interfaces here: -- [`SyntaxExtension`] - a lowered macro representation, contains its expander - function, which transforms a `TokenStream` or AST into another `TokenStream` - or AST + some additional data like stability, or a list of unstable features - allowed inside the macro. +- [`SyntaxExtension`] - a lowered `macro` representation, contains its expander + function, which transforms a `TokenStream` or `AST` into another `TokenStream` + or `AST` + some additional data like stability, or a list of unstable features + allowed inside the `macro`. - [`SyntaxExtensionKind`] - expander functions may have several different - signatures (take one token stream, or two, or a piece of AST, etc). This is + signatures (take one token stream, or two, or a piece of `AST`, etc). This is an enum that lists them. - [`BangProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] - - traits representing the expander function signatures. + `trait`s representing the expander function signatures. [`SyntaxExtension`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.SyntaxExtension.html [`SyntaxExtensionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.SyntaxExtensionKind.html @@ -435,11 +446,11 @@ Some important data structures/interfaces here: ## Macros By Example -MBEs have their own parser distinct from the normal Rust parser. When macros -are expanded, we may invoke the MBE parser to parse and expand a macro. The +MBEs have their own parser distinct from the normal Rust parser. When `macro`s +are expanded, we may invoke the MBE parser to parse and expand a `macro`. The MBE parser, in turn, may call the normal Rust parser when it needs to bind a -metavariable (e.g. `$my_expr`) while parsing the contents of a macro -invocation. The code for macro expansion is in +metavariable (e.g. `$my_expr`) while parsing the contents of a `macro` +invocation. 
The code for `macro` expansion is in [`compiler/rustc_expand/src/mbe/`][code_dir]. ### Example @@ -467,8 +478,8 @@ special tokens, such as `EOF`, which indicates that there are no more tokens. Token trees resulting from paired parentheses-like characters (`(`...`)`, `[`...`]`, and `{`...`}`) – they include the open and close and all the tokens in between (we do require that parentheses-like characters be balanced). Having -macro expansion operate on token streams rather than the raw bytes of a source -file abstracts away a lot of complexity. The macro expander (and much of the +`macro` expansion operate on token streams rather than the raw bytes of a source +file abstracts away a lot of complexity. The `macro` expander (and much of the rest of the compiler) doesn't really care that much about the exact line and column of some syntactic construct in the code; it cares about what constructs are used in the code. Using tokens allows us to care about _what_ without @@ -481,21 +492,21 @@ Whenever we refer to the "example _invocation_", we mean the following snippet: printer!(print foo); // Assume `foo` is a variable defined somewhere else... ``` -The process of expanding the macro invocation into the syntax tree +The process of expanding the `macro` invocation into the syntax tree `println!("{}", foo)` and then expanding that into a call to `Display::fmt` is -called _macro expansion_, and it is the topic of this chapter. +called _`macro` expansion_, and it is the topic of this chapter. ### The MBE parser There are two parts to MBE expansion: parsing the definition and parsing the -invocations. Interestingly, both are done by the macro parser. +invocations. Interestingly, both are done by the `macro` parser. Basically, the MBE parser is like an NFA-based regex parser. It uses an algorithm similar in spirit to the [Earley parsing -algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is +algorithm](https://en.wikipedia.org/wiki/Earley_parser). 
The `macro` parser is defined in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp]. -The interface of the macro parser is as follows (this is slightly simplified): +The interface of the `macro` parser is as follows (this is slightly simplified): ```rust,ignore fn parse_tt( @@ -505,7 +516,7 @@ fn parse_tt( ) -> ParseResult ``` -We use these items in macro parser: +We use these items in `macro` parser: - `parser` is a reference to the state of a normal Rust parser, including the token stream and parsing session. The token stream is what we are about to @@ -529,47 +540,47 @@ three cases has occurred: "No rule expected token _blah_". - Error: some fatal error has occurred _in the parser_. For example, this happens if there is more than one pattern match, since that indicates - the macro is ambiguous. + the `macro` is ambiguous. The full interface is defined [here][code_parse_int]. -The macro parser does pretty much exactly the same as a normal regex parser with +The `macro` parser does pretty much exactly the same as a normal regex parser with one exception: in order to parse different types of metavariables, such as -`ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the +`ident`, `block`, `expr`, etc., the `macro` parser must sometimes call back to the normal Rust parser. -As mentioned above, both definitions and invocations of macros are parsed using -the macro parser. This is extremely non-intuitive and self-referential. The code -to parse macro _definitions_ is in +As mentioned above, both definitions and invocations of `macro`s are parsed using +the `macro` parser. This is extremely non-intuitive and self-referential. The code +to parse `macro` _definitions_ is in [`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the pattern for -matching for a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words, +matching for a `macro` definition as `$( $lhs:tt => $rhs:tt );+`. 
In other words, a `macro_rules` definition should have in its body at least one occurrence of a token tree followed by `=>` followed by another token tree. When the compiler comes to a `macro_rules` definition, it uses this pattern to match the two token -trees per rule in the definition of the macro _using the macro parser itself_. +trees per rule in the definition of the `macro` _using the `macro` parser itself_. In our example definition, the metavariable `$lhs` would match the patterns of both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs` would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{ println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this -knowledge around for when it needs to expand a macro invocation. +knowledge around for when it needs to expand a `macro` invocation. -When the compiler comes to a macro invocation, it parses that invocation using -the same NFA-based macro parser that is described above. However, the matcher -used is the first token tree (`$lhs`) extracted from the arms of the macro +When the compiler comes to a `macro` invocation, it parses that invocation using +the same NFA-based `macro` parser that is described above. However, the matcher +used is the first token tree (`$lhs`) extracted from the arms of the `macro` _definition_. Using our example, we would try to match the token stream `print foo` from the invocation against the matchers `print $mvar:ident` and `print twice $mvar:ident` that we previously extracted from the definition. The -algorithm is exactly the same, but when the macro parser comes to a place in the +algorithm is exactly the same, but when the `macro` parser comes to a place in the current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`), it calls back to the normal Rust parser to get the contents of that non-terminal. 
In this case, the Rust parser would look for an `ident` token, -which it finds (`foo`) and returns to the macro parser. Then, the macro parser +which it finds (`foo`) and returns to the `macro` parser. Then, the `macro` parser proceeds in parsing as normal. Also, note that exactly one of the matchers from the various arms should match the invocation; if there is more than one match, the parse is ambiguous, while if there are no matches at all, there is a syntax error. -For more information about the macro parser's implementation, see the comments +For more information about the `macro` parser's implementation, see the comments in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp]. ### `macro`s and Macros 2.0 @@ -577,21 +588,21 @@ in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp]. There is an old and mostly undocumented effort to improve the MBE system, give it more hygiene-related features, better scoping and visibility rules, etc. There hasn't been a lot of work on this recently, unfortunately. Internally, `macro` -macros use the same machinery as today's MBEs; they just have additional +`macro`s use the same machinery as today's MBEs; they just have additional syntactic sugar and are allowed to be in namespaces. ## Procedural Macros -Procedural macros are also expanded during parsing, as mentioned above. +Procedural `macro`s are also expanded during parsing, as mentioned above. However, they use a rather different mechanism. Rather than having a parser in -the compiler, procedural macros are implemented as custom, third-party crates. -The compiler will compile the proc macro crate and specially annotated -functions in them (i.e. the proc macro itself), passing them a stream of tokens. +the compiler, procedural `macro`s are implemented as custom, third-party crates. +The compiler will compile the proc `macro` crate and specially annotated +functions in them (i.e. the proc `macro` itself), passing them a stream of tokens. 
-The proc macro can then transform the token stream and output a new token -stream, which is synthesized into the AST. +The proc `macro` can then transform the token stream and output a new token +stream, which is synthesized into the `AST`. -It's worth noting that the token stream type used by proc macros is _stable_, +It's worth noting that the token stream type used by proc `macro`s is _stable_, so `rustc` does not use it internally (since our internal data structures are unstable). The compiler's token stream is [`rustc_ast::tokenstream::TokenStream`][rustcts], as previously. This is @@ -610,6 +621,6 @@ TODO: more here. [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/116 ### Custom Derive -Custom derives are a special type of proc macro. +Custom derives are a special type of proc `macro`. TODO: more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)
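
As a companion to the "Macros By Example" walkthrough in this chapter, here is a minimal sketch of the running `printer!` example, reconstructed from the two arms quoted in the text (`(print $mvar:ident)` and `(print twice $mvar:ident)`) and the example invocation `printer!(print foo)`. The `rendered!` macro and the `demo` function are hypothetical additions, not part of the original chapter; they reuse the same matchers but return a `String`, so that the choice of arm can be observed without capturing standard output.

```rust
// Reconstructed from the arms quoted in the chapter. Each arm is one
// `$lhs:tt => $rhs:tt` pair in the sense of the `macro_rules` pattern.
macro_rules! printer {
    // $lhs: the matcher `print $mvar:ident`
    (print $mvar:ident) => {
        println!("{}", $mvar);
    };
    // $lhs: the matcher `print twice $mvar:ident`
    (print twice $mvar:ident) => {
        println!("{}", $mvar);
        println!("{}", $mvar);
    };
}

// Hypothetical variant (an assumption for illustration only): same matchers,
// but each arm produces a `String` instead of printing.
macro_rules! rendered {
    (print $mvar:ident) => {
        format!("{}\n", $mvar)
    };
    (print twice $mvar:ident) => {
        format!("{}\n{}\n", $mvar, $mvar)
    };
}

// Hypothetical helper: expands both invocation shapes against a local `foo`.
fn demo() -> (String, String) {
    let foo = "hello";
    (rendered!(print foo), rendered!(print twice foo))
}

fn main() {
    let foo = "hello";
    // The token stream `print foo` is matched against the first matcher;
    // `$mvar:ident` is the non-terminal that binds `foo`.
    printer!(print foo);
    // `print twice foo` does not fit the first matcher (tokens remain after
    // the ident `twice`), so the second matcher is the one that applies.
    printer!(print twice foo);

    let (once, twice) = demo();
    assert_eq!(once, "hello\n");
    assert_eq!(twice, "hello\nhello\n");
}
```

In both macros the literal tokens `print` and `twice` are matched as plain tokens, while `$mvar:ident` is the only place where the matcher needs a fragment parsed as an identifier.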