Update macro-expansion.md

Tbkhi 2024-03-10 18:40:28 -03:00 committed by nora
parent 6977f206f5
commit 2edd9e08d4
1 changed files with 189 additions and 178 deletions


@ -2,25 +2,29 @@
<!-- toc --> <!-- toc -->
> `rustc_ast`, `rustc_expand`, and `rustc_builtin_macros` are all undergoing > N.B. [`rustc_ast`], [`rustc_expand`], and [`rustc_builtin_macros`] are all
> refactoring, so some of the links in this chapter may be broken. > undergoing refactoring, so some of the links in this chapter may be broken.
Rust has a very powerful macro system. In the previous chapter, we saw how the Rust has a very powerful `macro` system. In the previous chapter, we saw how
parser sets aside macros to be expanded (it temporarily uses [placeholders]). the parser sets aside `macro`s to be expanded (using temporary [placeholders]).
This chapter is about the process of expanding those macros iteratively until This chapter is about the process of expanding those `macro`s iteratively until
we have a complete AST for our crate with no unexpanded macros (or a compile we have a complete [*Abstract Syntax Tree* (`AST`)][ast] for our crate with no
error). unexpanded `macro`s (or a compile error).
[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html
[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html
[placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html [placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html
First, we will discuss the algorithm that expands and integrates macro output First, we discuss the algorithm that expands and integrates `macro` output into
into ASTs. Next, we will take a look at how hygiene data is collected. Finally, `AST`s. Next, we take a look at how hygiene data is collected. Finally, we look
we will look at the specifics of expanding different types of macros. at the specifics of expanding different types of `macro`s.
Many of the algorithms and data structures described below are in [`rustc_expand`], Many of the algorithms and data structures described below are in [`rustc_expand`],
with basic data structures in [`rustc_expand::base`][base]. with fundamental data structures in [`rustc_expand::base`][base].
Also of note, `cfg` and `cfg_attr` are treated specially from other macros, and are Also of note, `cfg` and `cfg_attr` are treated specially from other `macro`s, and are
handled in [`rustc_expand::config`][cfg]. handled in [`rustc_expand::config`][cfg].
[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html [`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html
@ -29,108 +33,112 @@ handled in [`rustc_expand::config`][cfg].
## Expansion and AST Integration ## Expansion and AST Integration
First of all, expansion happens at the crate level. Given a raw source code for Firstly, expansion happens at the crate level. Given a raw source code for
a crate, the compiler will produce a massive AST with all macros expanded, all a crate, the compiler will produce a massive `AST` with all `macro`s expanded, all
modules inlined, etc. The primary entry point for this process is the modules inlined, etc. The primary entry point for this process is the
[`MacroExpander::fully_expand_fragment`][fef] method. With few exceptions, we [`MacroExpander::fully_expand_fragment()`][fef] method. With few exceptions, we
use this method on the whole crate (see ["Eager Expansion"](#eager-expansion) use this method on the whole crate (see ["Eager Expansion"](#eager-expansion)
below for more detailed discussion of edge case expansion issues). below for more detailed discussion of edge case expansion issues).
[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html [`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html
[reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html [reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html
At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a At a high level, [`fully_expand_fragment()`][fef] works in iterations. We keep a
queue of unresolved macro invocations (that is, macros we haven't found the queue of unresolved `macro` invocations (i.e. `macro`s we haven't found the
definition of yet). We repeatedly try to pick a macro from the queue, resolve definition of yet). We repeatedly try to pick a `macro` from the queue, resolve
it, expand it, and integrate it back. If we can't make progress in an it, expand it, and integrate it back. If we can't make progress in an
iteration, this represents a compile error. Here is the [algorithm][original]: iteration, this represents a compile error. Here is the [algorithm][original]:
[fef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.MacroExpander.html#method.fully_expand_fragment [fef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.MacroExpander.html#method.fully_expand_fragment
[original]: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049 [original]: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049
1. Initialize a `queue` of unresolved macros. 1. Initialize a `queue` of unresolved `macro`s.
2. Repeat until `queue` is empty (or we make no progress, which is an error): 2. Repeat until `queue` is empty (or we make no progress, which is an error):
1. [Resolve](./name-resolution.md) imports in our partially built crate as 1. [Resolve](./name-resolution.md) imports in our partially built crate as
much as possible. much as possible.
2. Collect as many macro [`Invocation`s][inv] as possible from our 2. Collect as many `macro` [`Invocation`s][inv] as possible from our
partially built crate (fn-like, attributes, derives) and add them to the partially built crate (`fn`-like, attributes, derives) and add them to the
queue. queue.
3. Dequeue the first element, and attempt to resolve it. 3. Dequeue the first element and attempt to resolve it.
4. If it's resolved: 4. If it's resolved:
1. Run the macro's expander function that consumes a [`TokenStream`] or 1. Run the `macro`'s expander function that consumes a [`TokenStream`] or
AST and produces a [`TokenStream`] or [`AstFragment`] (depending on `AST` and produces a [`TokenStream`] or [`AstFragment`] (depending on
the macro kind). (A `TokenStream` is a collection of [`TokenTree`s][tt], the `macro` kind). (A [`TokenStream`] is a collection of [`TokenTree`s][tt],
each of which are a token (punctuation, identifier, or literal) or a each of which are a token (punctuation, identifier, or literal) or a
delimited group (anything inside `()`/`[]`/`{}`)). delimited group (anything inside `()`/`[]`/`{}`)).
- At this point, we know everything about the macro itself and can - At this point, we know everything about the `macro` itself and can
call `set_expn_data` to fill in its properties in the global data; call [`set_expn_data()`] to fill in its properties in the global
that is the hygiene data associated with `ExpnId`. (See [the data; that is the [hygiene] data associated with [`ExpnId`] (see
"Hygiene" section below][hybelow]). [Hygiene][hybelow] below).
2. Integrate that piece of AST into the big existing partially built 2. Integrate that piece of `AST` into the currently-existing though
AST. This is essentially where the "token-like mass" becomes a partially-built `AST`. This is essentially where the "token-like mass"
proper set-in-stone AST with side-tables. It happens as follows: becomes a proper set-in-stone `AST` with side-tables. It happens as
- If the macro produces tokens (e.g. a proc macro), we parse into follows:
an AST, which may produce parse errors. - If the `macro` produces tokens (e.g. a `proc macro`), we parse into
- During expansion, we create `SyntaxContext`s (hierarchy 2). (See an `AST`, which may produce parse errors.
[the "Hygiene" section below][hybelow]) - During expansion, we create [`SyntaxContext`]s (hierarchy 2) (see
- These three passes happen one after another on every AST fragment [Hygiene][hybelow] below).
freshly expanded from a macro: - These three passes happen one after another on every `AST` fragment
freshly expanded from a `macro`:
- [`NodeId`]s are assigned by [`InvocationCollector`]. This - [`NodeId`]s are assigned by [`InvocationCollector`]. This
also collects new macro calls from this new AST piece and also collects new `macro` calls from this new `AST` piece and
adds them to the queue. adds them to the queue.
- ["Def paths"][defpath] are created and [`DefId`]s are - ["Def paths"][defpath] are created and [`DefId`]s are
assigned to them by [`DefCollector`]. assigned to them by [`DefCollector`].
- Names are put into modules (from the resolver's point of - Names are put into modules (from the resolver's point of
view) by [`BuildReducedGraphVisitor`]. view) by [`BuildReducedGraphVisitor`].
3. After expanding a single macro and integrating its output, continue 3. After expanding a single `macro` and integrating its output, continue
to the next iteration of [`fully_expand_fragment`][fef]. to the next iteration of [`fully_expand_fragment()`][fef].
5. If it's not resolved: 5. If it's not resolved:
1. Put the macro back in the queue 1. Put the `macro` back in the queue.
2. Continue to next iteration... 2. Continue to next iteration...
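The queue-driven loop in the list above can be sketched as a toy model. This is only an illustration under stated assumptions, not the code in `rustc_expand`: `Expansion`, `fully_expand_sketch`, and the `resolvable` set are hypothetical names, macro invocations are reduced to strings, and "integration" is reduced to appending items to a vector.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical stand-in for what expanding one macro yields.
#[derive(Clone, Default)]
struct Expansion {
    /// Plain output that gets integrated into the toy "AST".
    items: Vec<String>,
    /// New macro invocations discovered in the output.
    invocations: Vec<String>,
    /// Macro names that only become resolvable after this expansion.
    defines: Vec<String>,
}

/// Toy version of the iteration: returns the integrated items, or the
/// invocations that could never be resolved (the "no progress" error).
fn fully_expand_sketch(
    initial: &[&str],
    expansions: &HashMap<String, Expansion>,
    mut resolvable: HashSet<String>,
) -> Result<Vec<String>, Vec<String>> {
    let mut queue: VecDeque<String> = initial.iter().map(|s| s.to_string()).collect();
    let mut ast = Vec::new();
    // Consecutive invocations that failed to resolve since the last success.
    let mut stalled = 0;

    while let Some(name) = queue.pop_front() {
        let resolved = if resolvable.contains(&name) {
            expansions.get(&name)
        } else {
            None
        };
        match resolved {
            Some(expansion) => {
                // Expand, integrate the output, and collect new invocations.
                ast.extend(expansion.items.iter().cloned());
                queue.extend(expansion.invocations.iter().cloned());
                resolvable.extend(expansion.defines.iter().cloned());
                stalled = 0;
            }
            None => {
                // Not resolved yet: put it back and try the next one.
                queue.push_back(name);
                stalled += 1;
                // A full pass over the queue without progress is an error.
                if stalled >= queue.len() {
                    return Err(queue.into_iter().collect());
                }
            }
        }
    }
    Ok(ast)
}
```

In this model an invocation whose definition is produced by a later expansion is simply retried on a later pass, which mirrors the "put the macro back in the queue" step; an invocation that never becomes resolvable ends the loop with an error.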
[defpath]: hir.md#identifiers-in-the-hir
[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html
[`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html
[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html
[`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html
[`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html
[hybelow]: #hygiene-and-hierarchies
[tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html
[`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
[inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html
[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html [`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html
[`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html
[`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html
[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html
[`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html
[`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html
[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html
[`set_expn_data()`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.LocalExpnId.html#method.set_expn_data
[`SyntaxContext`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html
[`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
[defpath]: hir.md#identifiers-in-the-hir
[hybelow]: #hygiene-and-hierarchies
[hygiene]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/index.html
[inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html
[tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html
### Error Recovery ### Error Recovery
If we make no progress in an iteration, then we have reached a compilation If we make no progress in an iteration we have reached a compilation error
error (e.g. an undefined macro). We attempt to recover from failures (e.g. an undefined `macro`). We attempt to recover from failures (i.e.
(unresolved macros or imports) for the sake of diagnostics. This allows unresolved `macro`s or imports) with the intent of generating diagnostics.
compilation to continue past the first error, so that we can report more errors Failure recovery happens by expanding unresolved `macro`s into
at a time. Recovery can't cause compilation to succeed. We know that it will [`ExprKind::Err`][err] and allows compilation to continue past the first error
fail at this point. The recovery happens by expanding unresolved macros into so that `rustc` can report more errors than just the original failure.
[`ExprKind::Err`][err].
[err]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/enum.ExprKind.html#variant.Err [err]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/enum.ExprKind.html#variant.Err
### Name Resolution ### Name Resolution
Notice that name resolution is involved here: we need to resolve imports and Notice that name resolution is involved here: we need to resolve imports and
macro names in the above algorithm. This is done in `macro` names in the above algorithm. This is done in
[`rustc_resolve::macros`][mresolve], which resolves macro paths, validates [`rustc_resolve::macros`][mresolve], which resolves `macro` paths, validates
those resolutions, and reports various errors (e.g. "not found" or "found, but those resolutions, and reports various errors (e.g. "not found", "found, but
it's unstable" or "expected x, found y"). However, we don't try to resolve it's unstable", "expected x, found y"). However, we don't try to resolve
other names yet. This happens later, as we will see in the [next other names yet. This happens later, as we will see in the chapter: [Name
chapter](./name-resolution.md). Resolution](./name-resolution.md).
[mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html [mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html
### Eager Expansion ### Eager Expansion
_Eager expansion_ means that we expand the arguments of a macro invocation _Eager expansion_ means we expand the arguments of a `macro` invocation before
before the macro invocation itself. This is implemented only for a few special the `macro` invocation itself. This is implemented only for a few special
built-in macros that expect literals; expanding arguments first for some of built-in `macro`s that expect literals; expanding arguments first for some of
these macro results in a smoother user experience. As an example, consider the these `macro` results in a smoother user experience. As an example, consider
following: the following:
```rust,ignore ```rust,ignore
macro bar($i: ident) { $i } macro bar($i: ident) { $i }
@ -139,35 +147,37 @@ macro foo($i: ident) { $i }
foo!(bar!(baz)); foo!(bar!(baz));
``` ```
A lazy expansion would expand `foo!` first. An eager expansion would expand A lazy-expansion would expand `foo!` first. An eager-expansion would expand
`bar!` first. `bar!` first.
Eager expansion is not a generally available feature of Rust. Implementing Eager-expansion is not a generally available feature of Rust. Implementing
eager expansion more generally would be challenging, but we implement it for a eager-expansion more generally would be challenging, so we implement it for a
few special built-in macros for the sake of user experience. The built-in few special built-in `macro`s for the sake of user-experience. The built-in
macros are implemented in [`rustc_builtin_macros`], along with some other early `macro`s are implemented in [`rustc_builtin_macros`], along with some other
code generation facilities like injection of standard library imports or early code generation facilities like injection of standard library imports or
generation of test harness. There are some additional helpers for building generation of test harness. There are some additional helpers for building
their AST fragments in [`rustc_expand::build`][reb]. Eager expansion generally `AST` fragments in [`rustc_expand::build`][reb]. Eager-expansion generally
performs a subset of the things that lazy (normal) expansion does. It is done by performs a subset of the things that lazy (normal) expansion does. It is done
invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed to by invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed
the whole crate, like we normally do). to the whole crate, like we normally do).
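The difference in expansion order can also be observed from stable Rust. The sketch below assumes that the built-in `concat!` is one of the eagerly expanding macros described above; `show_tokens!`, `eager`, and `lazy` are hypothetical names introduced only for this illustration.

```rust
/// A user-defined macro is expanded lazily: it receives its argument
/// tokens as written, including any unexpanded macro calls.
macro_rules! show_tokens {
    ($($t:tt)*) => {
        stringify!($($t)*)
    };
}

/// `concat!` expects literals, so its argument `stringify!(baz)` has to be
/// expanded to the literal "baz" before `concat!` itself can run.
fn eager() -> &'static str {
    concat!("value: ", stringify!(baz))
}

/// Here the inner `concat!` call is never expanded; `show_tokens!` only
/// sees (and stringifies) the raw tokens of the call.
fn lazy() -> &'static str {
    show_tokens!(concat!("a", "b"))
}
```

Calling `eager()` yields a single string literal built from the already-expanded inner call, while `lazy()` yields text that still contains the word `concat`, because the inner invocation was passed through as tokens.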
### Other Data Structures ### Other Data Structures
Here are some other notable data structures involved in expansion and integration: Here are some other notable data structures involved in expansion and
- [`ResolverExpand`] - a trait used to break crate dependencies. This allows the integration:
- [`ResolverExpand`] - a `trait` used to break crate dependencies. This allows the
resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and
pretty much everything else depending on [`rustc_ast`]. pretty much everything else depending on [`rustc_ast`].
- [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by expansion - [`ExtCtxt`]/[`ExpansionData`] - holds various intermediate expansion
infrastructure in the process of its work infrastructure data.
- [`Annotatable`] - a piece of AST that can be an attribute target, almost same - [`Annotatable`] - a piece of `AST` that can be an attribute target, almost the same
thing as AstFragment except for types and patterns that can be produced by thing as [`AstFragment`] except for `type`s and patterns that can be produced by
macros but cannot be annotated with attributes `macro`s but cannot be annotated with attributes.
- [`MacResult`] - a "polymorphic" AST fragment, something that can turn into a - [`MacResult`] - a "polymorphic" `AST` fragment, something that can turn into
different `AstFragment` depending on its [`AstFragmentKind`] - item, a different [`AstFragment`] depending on its [`AstFragmentKind`] (i.e. an item,
or expression, or pattern etc. expression, pattern, etc).
[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html
[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html [`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
[`rustc_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/index.html [`rustc_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/index.html
[`ResolverExpand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.ResolverExpand.html [`ResolverExpand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.ResolverExpand.html
@ -179,7 +189,7 @@ Here are some other notable data structures involved in expansion and integratio
## Hygiene and Hierarchies ## Hygiene and Hierarchies
If you have ever used C/C++ preprocessor macros, you know that there are some If you have ever used the C/C++ preprocessor macros, you know that there are some
annoying and hard-to-debug gotchas! For example, consider the following C code: annoying and hard-to-debug gotchas! For example, consider the following C code:
```c ```c
@ -213,16 +223,16 @@ we got `foo(0, 0)` because the macro defined its own `y`!
These are both examples of _macro hygiene_ issues. _Hygiene_ relates to how to These are both examples of _macro hygiene_ issues. _Hygiene_ relates to how to
handle names defined _within a macro_. In particular, a hygienic macro system handle names defined _within a macro_. In particular, a hygienic macro system
prevents errors due to names introduced within a macro. Rust macros are hygienic prevents errors due to names introduced within a macro. Rust `macro`s are hygienic
in that they do not allow one to write the sorts of bugs above. in that they do not allow one to write the sorts of bugs above.
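As a small illustration of what this hygiene means in practice, the following sketch uses a `macro_rules!` macro that introduces its own local `y`. The names `scale_by_two` and `hygiene_demo` are hypothetical and exist only for this example.

```rust
/// The macro introduces a local named `y`. Because of hygiene, this `y`
/// lives in the macro's own syntax context and cannot capture a `y` that
/// appears in the caller's expression.
macro_rules! scale_by_two {
    ($e:expr) => {{
        let y = 2;
        $e * y
    }};
}

fn hygiene_demo() -> i32 {
    let y = 10;
    // The `y` inside `y + 1` still refers to the caller's `y` (10),
    // so this evaluates (10 + 1) * 2.
    scale_by_two!(y + 1)
}
```

A purely textual substitution would have resolved the `y` in the argument to the macro's own `y`; here the two names are kept apart even though they are spelled the same.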
At a high level, hygiene within the Rust compiler is accomplished by keeping At a high level, hygiene within the Rust compiler is accomplished by keeping
track of the context where a name is introduced and used. We can then track of the context where a name is introduced and used. We can then
disambiguate names based on that context. Future iterations of the macro system disambiguate names based on that context. Future iterations of the `macro` system
will allow greater control to the macro author to use that context. For example, will allow greater control to the `macro` author to use that context. For example,
a macro author may want to introduce a new name to the context where the macro a `macro` author may want to introduce a new name to the context where the `macro`
was called. Alternately, the macro author may be defining a variable for use was called. Alternately, the `macro` author may be defining a variable for use
only within the macro (i.e. it should not be visible outside the macro). only within the `macro` (i.e. it should not be visible outside the `macro`).
[code_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_expand/src/mbe [code_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_expand/src/mbe
[code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser [code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser
@ -230,18 +240,18 @@ only within the macro (i.e. it should not be visible outside the macro).
[code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/struct.TtParser.html#method.parse_tt [code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/struct.TtParser.html#method.parse_tt
[parsing]: ./the-parser.html [parsing]: ./the-parser.html
The context is attached to AST nodes. All AST nodes generated by macros have The context is attached to `AST` nodes. All `AST` nodes generated by `macro`s have
context attached. Additionally, there may be other nodes that have context context attached. Additionally, there may be other nodes that have context
attached, such as some desugared syntax (non-macro-expanded nodes are attached, such as some desugared syntax (non-`macro`-expanded nodes are
considered to just have the "root" context, as described below). considered to just have the "root" context, as described below).
Throughout the compiler, we use [`rustc_span::Span`s][span] to refer to code locations. Throughout the compiler, we use [`rustc_span::Span`s][span] to refer to code locations.
This struct also has hygiene information attached to it, as we will see later. This struct also has hygiene information attached to it, as we will see later.
[span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html [span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html
Because macros invocations and definitions can be nested, the syntax context of Because `macro`s invocations and definitions can be nested, the syntax context of
a node must be a hierarchy. For example, if we expand a macro and there is a node must be a hierarchy. For example, if we expand a `macro` and there is
another macro invocation or definition in the generated output, then the syntax another `macro` invocation or definition in the generated output, then the syntax
context should reflect the nesting. context should reflect the nesting.
However, it turns out that there are actually a few types of context we may However, it turns out that there are actually a few types of context we may
@ -249,13 +259,13 @@ want to track for different purposes. Thus, there are not just one but _three_
expansion hierarchies that together comprise the hygiene information for a expansion hierarchies that together comprise the hygiene information for a
crate. crate.
All of these hierarchies need some sort of "macro ID" to identify individual All of these hierarchies need some sort of "`macro` ID" to identify individual
elements in the chain of expansions. This ID is [`ExpnId`]. All macros receive elements in the chain of expansions. This ID is [`ExpnId`]. All `macro`s receive
an integer ID, assigned continuously starting from 0 as we discover new macro an integer ID, assigned continuously starting from 0 as we discover new `macro`
calls. All hierarchies start at [`ExpnId::root()`][rootid], which is its own calls. All hierarchies start at [`ExpnId::root()`][rootid], which is its own
parent. parent.
[`rustc_span::hygiene`][hy] contains all of the hygiene-related algorithms The [`rustc_span::hygiene`][hy] library contains all of the hygiene-related algorithms
(with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks]) (with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks])
and structures related to hygiene and expansion that are kept in global data. and structures related to hygiene and expansion that are kept in global data.
@ -273,18 +283,18 @@ any [`Ident`] without any context.
### The Expansion Order Hierarchy ### The Expansion Order Hierarchy
The first hierarchy tracks the order of expansions, i.e., when a macro The first hierarchy tracks the order of expansions, i.e., when a `macro`
invocation is in the output of another macro. invocation is in the output of another `macro`.
Here, the children in the hierarchy will be the "innermost" tokens. The Here, the children in the hierarchy will be the "innermost" tokens. The
[`ExpnData`] struct itself contains a subset of properties from both macro [`ExpnData`] struct itself contains a subset of properties from both `macro`
definition and macro call available through global data. definition and `macro` call available through global data.
[`ExpnData::parent`][edp] tracks the child -> parent link in this hierarchy. [`ExpnData::parent`][edp] tracks the child-to-parent link in this hierarchy.
[`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html [`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html
[edp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.parent [edp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.parent
For example, For example:
```rust,ignore ```rust,ignore
macro_rules! foo { () => { println!(); } } macro_rules! foo { () => { println!(); } }
@ -292,25 +302,25 @@ macro_rules! foo { () => { println!(); } }
fn main() { foo!(); } fn main() { foo!(); }
``` ```
In this code, the AST nodes that are finally generated would have hierarchy In this code, the `AST` nodes that are finally generated would have hierarchy
`root -> id(foo) -> id(println)`. `root -> id(foo) -> id(println)`.
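The child-to-parent structure of this hierarchy can be modelled with a small table. This is a toy sketch, not the data layout in `rustc_span::hygiene`: `ToyHygiene`, `ToyExpnData`, `fresh`, and `chain` are hypothetical names, and the `ExpnId` defined here is a local stand-in that only mimics the "integer ID starting at 0, root is its own parent" description.

```rust
/// Local stand-in for an expansion ID (an integer assigned in order).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ExpnId(u32);

/// The root expansion has ID 0 and is its own parent.
const ROOT: ExpnId = ExpnId(0);

/// Hypothetical per-expansion record holding only the parent link.
struct ToyExpnData {
    parent: ExpnId,
    descr: &'static str,
}

/// Hypothetical global table indexed by `ExpnId`.
struct ToyHygiene {
    expn_data: Vec<ToyExpnData>,
}

impl ToyHygiene {
    fn new() -> Self {
        Self {
            expn_data: vec![ToyExpnData { parent: ROOT, descr: "root" }],
        }
    }

    /// Registers a new expansion found in the output of `parent`.
    fn fresh(&mut self, parent: ExpnId, descr: &'static str) -> ExpnId {
        self.expn_data.push(ToyExpnData { parent, descr });
        ExpnId((self.expn_data.len() - 1) as u32)
    }

    fn parent(&self, id: ExpnId) -> ExpnId {
        self.expn_data[id.0 as usize].parent
    }

    /// Walks the parent links up to the root and returns the chain
    /// in root-first order.
    fn chain(&self, mut id: ExpnId) -> Vec<&'static str> {
        let mut out = Vec::new();
        loop {
            out.push(self.expn_data[id.0 as usize].descr);
            if id == ROOT {
                break;
            }
            id = self.parent(id);
        }
        out.reverse();
        out
    }
}
```

Registering `foo` under the root and `println` under `foo`, then walking up from `println`, reproduces the `root -> id(foo) -> id(println)` chain from the example.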
### The Macro Definition Hierarchy ### The Macro Definition Hierarchy
The second hierarchy tracks the order of macro definitions, i.e., when we are The second hierarchy tracks the order of `macro` definitions, i.e., when we are
expanding one macro another macro definition is revealed in its output. This expanding one `macro` another `macro` definition is revealed in its output. This
one is a bit tricky and more complex than the other two hierarchies. one is a bit tricky and more complex than the other two hierarchies.
[`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID. [`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID.
[`SyntaxContextData`][scd] contains data associated with the given [`SyntaxContextData`][scd] contains data associated with the given
`SyntaxContext`; mostly it is a cache for results of filtering that chain in [`SyntaxContext`][sc]; mostly it is a cache for results of filtering that chain in
different ways. [`SyntaxContextData::parent`][scdp] is the child -> parent different ways. [`SyntaxContextData::parent`][scdp] is the child-to-parent
link here, and [`SyntaxContextData::outer_expns`][scdoe] are individual link here, and [`SyntaxContextData::outer_expns`][scdoe] are individual
elements in the chain. The "chaining operator" is elements in the chain. The "chaining-operator" is
[`SyntaxContext::apply_mark`][am] in compiler code. [`SyntaxContext::apply_mark`][am] in compiler code.
A [`Span`][span], mentioned above, is actually just a compact representation of A [`Span`][span], mentioned above, is actually just a compact representation of
a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned a code location and [`SyntaxContext`][sc]. Likewise, an [`Ident`] is just an interned
[`Symbol`] + `Span` (i.e. an interned string + hygiene data). [`Symbol`] + `Span` (i.e. an interned string + hygiene data).
[`Symbol`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Symbol.html [`Symbol`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Symbol.html
@ -320,13 +330,13 @@ a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned
[scdoe]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.outer_expn [scdoe]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.outer_expn
[am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark [am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
For built-in macros, we use the context: For built-in `macro`s, we use the context:
`SyntaxContext::empty().apply_mark(expn_id)`, and such macros are considered to `SyntaxContext::empty().apply_mark(expn_id)`, and such `macro`s are considered to
be defined at the hierarchy root. We do the same for proc-macros because we be defined at the hierarchy root. We do the same for `proc macro`s because we
haven't implemented cross-crate hygiene yet. haven't implemented cross-crate hygiene yet.
If the token had context `X` before being produced by a macro then after being If the token had context `X` before being produced by a `macro` then after being
produced by the macro it has context `X -> macro_id`. Here are some examples: produced by the `macro` it has context `X -> macro_id`. Here are some examples:
Example 0: Example 0:
@ -356,7 +366,7 @@ after the first expansion, then `ROOT -> id(m) -> id(n)`.
Example 2: Example 2:
Note that these chains are not entirely determined by their last element, in Note that these chains are not entirely determined by their last element, in
other words `ExpnId` is not isomorphic to `SyntaxContext`. other words [`ExpnId`] is not isomorphic to [`SyntaxContext`][sc].
```rust,ignore ```rust,ignore
macro m($i: ident) { macro n() { ($i, bar) } } macro m($i: ident) { macro n() { ($i, bar) } }
@ -369,15 +379,16 @@ After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context
Finally, note that this hierarchy is currently subject to the
["context transplantation hack"][hack]. Basically, the more modern (and
experimental) `macro` `macro`s have stronger hygiene than the older MBE system,
but this can result in weird interactions between the two. The hack is intended
to make things "just work" for now.

[`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html
[hack]: https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732
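The chain notation used in this section can be modelled with a small,
self-contained sketch. Everything in it is a hypothetical toy: `ToyExpnId` and
`ToyContext` are not rustc types, and the real `SyntaxContext` is an interned
index rather than a vector. The sketch only shows how applying a mark extends a
chain (`X -> macro_id`), and why two chains can share a last element yet differ.

```rust
/// Toy stand-in for an expansion id (hypothetical; not `rustc_span::ExpnId`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToyExpnId(u32);

/// Toy stand-in for a syntax context: the chain of marks applied after `ROOT`.
#[derive(Clone, Debug, PartialEq, Eq, Default)]
struct ToyContext {
    marks: Vec<ToyExpnId>,
}

impl ToyContext {
    /// The empty context, written `ROOT` in the text.
    fn empty() -> Self {
        ToyContext::default()
    }

    /// Returns a new context `self -> id`, leaving `self` unchanged.
    fn apply_mark(&self, id: ToyExpnId) -> Self {
        let mut marks = self.marks.clone();
        marks.push(id);
        ToyContext { marks }
    }

    /// The most recently applied mark, if any.
    fn outer_mark(&self) -> Option<ToyExpnId> {
        self.marks.last().copied()
    }
}

/// Builds the two chains `ROOT -> id(n)` and `ROOT -> id(m) -> id(n)`.
fn two_chains() -> (ToyContext, ToyContext) {
    let (m, n) = (ToyExpnId(1), ToyExpnId(2));
    let short = ToyContext::empty().apply_mark(n);
    let long = ToyContext::empty().apply_mark(m).apply_mark(n);
    (short, long)
}
```

Both chains returned by `two_chains` end in the same mark, but they compare as
different contexts, which is the sense in which the last element alone does not
determine the chain.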
### The Call-site Hierarchy

The third and final hierarchy tracks the location of `macro` invocations.

In this hierarchy [`ExpnData::call_site`][callsite] is the child -> parent link.
@ -392,39 +403,39 @@ macro foo($i: ident) { $i }
foo!(bar!(baz));
```

For the `baz` `AST` node in the final output, the expansion-order hierarchy is
`ROOT -> id(foo) -> id(bar) -> baz`, while the call-site hierarchy is `ROOT ->
baz`.
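To make the difference between the two hierarchies concrete, here is a minimal
sketch that stores both child -> parent links in one table and walks either of
them. `ToyEntry`, `Hierarchy`, and `chain` are hypothetical names invented for
this illustration; in rustc the call-site link is a span stored in `ExpnData`,
not an index.

```rust
/// One entry in a toy expansion table (hypothetical; not `rustc_span::ExpnData`).
struct ToyEntry {
    label: &'static str,
    /// Expansion-order link: the entry under which this one was produced.
    parent: Option<usize>,
    /// Call-site link: the entry where these tokens were written.
    call_site: Option<usize>,
}

/// Selects which child -> parent link to follow.
#[derive(Clone, Copy)]
enum Hierarchy {
    ExpansionOrder,
    CallSite,
}

/// Walks from `start` up to the root and renders the chain root-first.
fn chain(table: &[ToyEntry], start: usize, which: Hierarchy) -> String {
    let mut labels = Vec::new();
    let mut current = Some(start);
    while let Some(idx) = current {
        labels.push(table[idx].label);
        current = match which {
            Hierarchy::ExpansionOrder => table[idx].parent,
            Hierarchy::CallSite => table[idx].call_site,
        };
    }
    labels.reverse();
    labels.join(" -> ")
}

/// The table for `foo!(bar!(baz))` as described in the text.
fn example_table() -> Vec<ToyEntry> {
    vec![
        ToyEntry { label: "ROOT", parent: None, call_site: None },
        ToyEntry { label: "id(foo)", parent: Some(0), call_site: Some(0) },
        ToyEntry { label: "id(bar)", parent: Some(1), call_site: Some(0) },
        ToyEntry { label: "baz", parent: Some(2), call_site: Some(0) },
    ]
}
```

Walking from the `baz` entry (index 3) with `Hierarchy::ExpansionOrder` yields
the long chain, while `Hierarchy::CallSite` goes straight to `ROOT`, because
`baz` was written directly in the root source.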
### Macro Backtraces

`macro` backtraces are implemented in [`rustc_span`] using the hygiene machinery
in [`rustc_span::hygiene`][hy].

[`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html
## Producing Macro Output

Above, we saw how the output of a `macro` is integrated into the `AST` for a crate,
and we also saw how the hygiene data for a crate is generated. But how do we
actually produce the output of a `macro`? It depends on the type of `macro`.

There are two types of `macro`s in Rust:
`macro_rules!` `macro`s (a.k.a. "Macros By Example" (MBE)) and procedural `macro`s
(or "proc `macro`s"; including custom derives). During the parsing phase, the normal
Rust parser will set aside the contents of `macro`s and their invocations. Later,
`macro`s are expanded using these portions of the code.
Some important data structures/interfaces here:

- [`SyntaxExtension`] - a lowered `macro` representation. It contains the
  expander function, which transforms a `TokenStream` or `AST` into another
  `TokenStream` or `AST`, plus some additional data such as stability or a list
  of unstable features allowed inside the `macro`.
- [`SyntaxExtensionKind`] - expander functions may have several different
  signatures (take one token stream, or two, or a piece of `AST`, etc). This is
  an enum that lists them.
- [`BangProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] -
  `trait`s representing the expander function signatures.

[`SyntaxExtension`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.SyntaxExtension.html
[`SyntaxExtensionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.SyntaxExtensionKind.html
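The relationship between these three pieces can be sketched with a toy model: a
struct that carries extra data, an enum listing the expander signatures, and one
trait per signature. All names below (`ToyExtension`, `ToyExtensionKind`,
`ToyBangExpander`, `ToyAttrExpander`, `unstable_features_allowed`) are
hypothetical simplifications, and tokens are plain strings; the real types in
`rustc_expand::base` have more variants and fields.

```rust
/// Toy token stream: whitespace-separated strings (an assumption of this sketch).
type ToyTokens = Vec<String>;

fn toks(src: &str) -> ToyTokens {
    src.split_whitespace().map(String::from).collect()
}

/// Expander signature taking one token stream, like a `name!(...)` invocation.
trait ToyBangExpander {
    fn expand(&self, input: ToyTokens) -> ToyTokens;
}

/// Expander signature taking two token streams: attribute arguments and the item.
trait ToyAttrExpander {
    fn expand(&self, attr_args: ToyTokens, item: ToyTokens) -> ToyTokens;
}

/// Enum listing the supported expander signatures.
enum ToyExtensionKind {
    Bang(Box<dyn ToyBangExpander>),
    Attr(Box<dyn ToyAttrExpander>),
}

/// Lowered macro representation: the expander plus some additional data.
struct ToyExtension {
    kind: ToyExtensionKind,
    unstable_features_allowed: Vec<&'static str>,
}

impl ToyExtension {
    /// Dispatches on the kind; `item` is `Some` only for attribute-style invocations.
    fn expand(&self, args: ToyTokens, item: Option<ToyTokens>) -> Result<ToyTokens, String> {
        match (&self.kind, item) {
            (ToyExtensionKind::Bang(e), None) => Ok(e.expand(args)),
            (ToyExtensionKind::Attr(e), Some(item)) => Ok(e.expand(args, item)),
            _ => Err("expander kind does not match the invocation shape".to_string()),
        }
    }
}

/// Example bang expander: emits its input twice.
struct Twice;
impl ToyBangExpander for Twice {
    fn expand(&self, input: ToyTokens) -> ToyTokens {
        let mut out = input.clone();
        out.extend(input);
        out
    }
}

/// Example attribute expander: prepends the attribute arguments to the item.
struct Tag;
impl ToyAttrExpander for Tag {
    fn expand(&self, attr_args: ToyTokens, item: ToyTokens) -> ToyTokens {
        let mut out = attr_args;
        out.extend(item);
        out
    }
}
```

Keeping the signatures in an enum lets the caller hold every extension in one
table and pick the calling convention only at the point of expansion.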
@ -435,11 +446,11 @@ Some important data structures/interfaces here:
## Macros By Example

MBEs have their own parser distinct from the normal Rust parser. When `macro`s
are expanded, we may invoke the MBE parser to parse and expand a `macro`. The
MBE parser, in turn, may call the normal Rust parser when it needs to bind a
metavariable (e.g. `$my_expr`) while parsing the contents of a `macro`
invocation. The code for `macro` expansion is in
[`compiler/rustc_expand/src/mbe/`][code_dir].

### Example
@ -467,8 +478,8 @@ special tokens, such as `EOF`, which indicates that there are no more tokens.
Token trees resulting from paired parentheses-like characters (`(`...`)`,
`[`...`]`, and `{`...`}`) include the open and close delimiters and all the tokens
in between (we do require that parentheses-like characters be balanced). Having
`macro` expansion operate on token streams rather than the raw bytes of a source
file abstracts away a lot of complexity. The `macro` expander (and much of the
rest of the compiler) doesn't really care that much about the exact line and
column of some syntactic construct in the code; it cares about what constructs
are used in the code. Using tokens allows us to care about _what_ without
@ -481,21 +492,21 @@ Whenever we refer to the "example _invocation_", we mean the following snippet:
printer!(print foo); // Assume `foo` is a variable defined somewhere else...
```

The process of expanding the `macro` invocation into the syntax tree
`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
called _`macro` expansion_, and it is the topic of this chapter.
### The MBE parser

There are two parts to MBE expansion: parsing the definition and parsing the
invocations. Interestingly, both are done by the `macro` parser.

Basically, the MBE parser is like an NFA-based regex parser. It uses an
algorithm similar in spirit to the [Earley parsing
algorithm](https://en.wikipedia.org/wiki/Earley_parser). The `macro` parser is
defined in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].

The interface of the `macro` parser is as follows (this is slightly simplified):

```rust,ignore
fn parse_tt(
@ -505,7 +516,7 @@ fn parse_tt(
) -> ParseResult
```

The `macro` parser uses these items:

- `parser` is a reference to the state of a normal Rust parser, including the
  token stream and parsing session. The token stream is what we are about to
@ -529,47 +540,47 @@ three cases has occurred:
  "No rule expected token _blah_".
- Error: some fatal error has occurred _in the parser_. For example, this
  happens if there is more than one pattern match, since that indicates
  the `macro` is ambiguous.

The full interface is defined [here][code_parse_int].

The `macro` parser does pretty much exactly the same as a normal regex parser with
one exception: in order to parse different types of metavariables, such as
`ident`, `block`, `expr`, etc., the `macro` parser must sometimes call back to the
normal Rust parser.
As mentioned above, both definitions and invocations of `macro`s are parsed using
the `macro` parser. This is extremely non-intuitive and self-referential. The code
to parse `macro` _definitions_ is in
[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the pattern
for matching a `macro` definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
a `macro_rules` definition should have in its body at least one occurrence of a
token tree followed by `=>` followed by another token tree. When the compiler
comes to a `macro_rules` definition, it uses this pattern to match the two token
trees per rule in the definition of the `macro` _using the `macro` parser itself_.
In our example definition, the metavariable `$lhs` would match the patterns of
both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs`
would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{
println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this
knowledge around for when it needs to expand a `macro` invocation.
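The self-referential step can be imitated in ordinary user code. The sketch
below is only an illustration, not how `rustc` implements it: `rule_parts!` and
`example_rules` are hypothetical names, and the macro simply reuses the matcher
shape `$( $lhs:tt => $rhs:tt );+` to split the two example arms into pairs of
stringified token trees.

```rust
/// Splits a `macro_rules`-style body into `(lhs, rhs)` pairs of token trees,
/// using the same matcher shape quoted in the text above.
macro_rules! rule_parts {
    ($( $lhs:tt => $rhs:tt );+) => {
        vec![$( (stringify!($lhs), stringify!($rhs)) ),+]
    };
}

/// Applies `rule_parts!` to the two arms of the example `printer` definition.
/// The arms are only stringified here, never expanded.
fn example_rules() -> Vec<(&'static str, &'static str)> {
    rule_parts!(
        (print $mvar:ident) => { println!("{}", $mvar); };
        (print twice $mvar:ident) => { println!("{}", $mvar); println!("{}", $mvar); }
    )
}
```

Each `$lhs` binds a whole parenthesized token tree and each `$rhs` a whole
braced one, so `example_rules()` returns two pairs, one per arm. The exact
spacing inside the stringified text is not something to rely on.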
When the compiler comes to a `macro` invocation, it parses that invocation using
the same NFA-based `macro` parser that is described above. However, the matcher
used is the first token tree (`$lhs`) extracted from the arms of the `macro`
_definition_. Using our example, we would try to match the token stream `print
foo` from the invocation against the matchers `print $mvar:ident` and `print
twice $mvar:ident` that we previously extracted from the definition. The
algorithm is exactly the same, but when the `macro` parser comes to a place in the
current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
it calls back to the normal Rust parser to get the contents of that
non-terminal. In this case, the Rust parser would look for an `ident` token,
which it finds (`foo`) and returns to the `macro` parser. Then, the `macro` parser
proceeds in parsing as normal. Also, note that the arms are tried in order and
the first matcher that matches the invocation is used; if no arm matches, there
is a syntax error. Ambiguity is a property of a single matcher: if one matcher
can match the input in more than one way, the parse is ambiguous.
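The matching of `print foo` against the two matchers can be sketched as a toy
program. This is a deliberately reduced model under stated assumptions: tokens
are plain strings, matchers contain only literals and `$name:ident` fragments
(no repetitions, so none of the NFA machinery is needed), and the sketch simply
returns the first arm whose matcher succeeds. `MatcherTok`, `parse_ident`,
`match_arm`, and `first_matching_arm` are hypothetical names, not rustc APIs.

```rust
/// One element of a toy matcher (hypothetical; far simpler than rustc's own).
#[derive(Debug, Clone, Copy, PartialEq)]
enum MatcherTok {
    /// A literal token that must appear verbatim, such as `print`.
    Literal(&'static str),
    /// An `$name:ident` metavariable.
    Ident(&'static str),
}

type Bindings = Vec<(&'static str, String)>;

/// Stand-in for calling back to the normal Rust parser to parse an `ident`.
fn parse_ident(tok: &str) -> Option<String> {
    let mut chars = tok.chars();
    let first = chars.next()?;
    let ok = (first.is_alphabetic() || first == '_')
        && chars.all(|c| c.is_alphanumeric() || c == '_');
    if ok { Some(tok.to_string()) } else { None }
}

/// Matches one arm's matcher against the invocation tokens.
fn match_arm(matcher: &[MatcherTok], input: &[&str]) -> Option<Bindings> {
    if matcher.len() != input.len() {
        return None;
    }
    let mut bindings = Vec::new();
    for (m, tok) in matcher.iter().zip(input) {
        match *m {
            MatcherTok::Literal(lit) if lit == *tok => {}
            MatcherTok::Literal(_) => return None,
            MatcherTok::Ident(name) => bindings.push((name, parse_ident(tok)?)),
        }
    }
    Some(bindings)
}

/// Tries each arm in order; returns the index and bindings of the first match.
fn first_matching_arm(arms: &[Vec<MatcherTok>], input: &[&str]) -> Option<(usize, Bindings)> {
    arms.iter()
        .enumerate()
        .find_map(|(i, arm)| match_arm(arm, input).map(|b| (i, b)))
}

/// The two matchers extracted from the example `printer` definition.
fn printer_arms() -> Vec<Vec<MatcherTok>> {
    vec![
        vec![MatcherTok::Literal("print"), MatcherTok::Ident("mvar")],
        vec![
            MatcherTok::Literal("print"),
            MatcherTok::Literal("twice"),
            MatcherTok::Ident("mvar"),
        ],
    ]
}
```

Calling `first_matching_arm(&printer_arms(), &["print", "foo"])` selects the
first arm and binds `mvar` to `foo`, while an input such as `["show", "foo"]`
matches no arm and yields `None`, the toy equivalent of the syntax error case.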
For more information about the `macro` parser's implementation, see the comments
in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].

### `macro`s and Macros 2.0
@ -577,21 +588,21 @@ in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
There is an old and mostly undocumented effort to improve the MBE system, give
it more hygiene-related features, better scoping and visibility rules, etc. There
hasn't been a lot of work on this recently, unfortunately. Internally, `macro`
`macro`s use the same machinery as today's MBEs; they just have additional
syntactic sugar and are allowed to be in namespaces.
## Procedural Macros

Procedural `macro`s are also expanded during parsing, as mentioned above.
However, they use a rather different mechanism. Rather than having a parser in
the compiler, procedural `macro`s are implemented as custom, third-party crates.
The compiler will compile the proc `macro` crate and the specially annotated
functions in it (i.e. the proc `macro` itself), passing them a stream of tokens.
The proc `macro` can then transform the token stream and output a new token
stream, which is synthesized into the `AST`.

It's worth noting that the token stream type used by proc `macro`s is _stable_,
so `rustc` does not use it internally (since our internal data structures are
unstable). The compiler's token stream is
[`rustc_ast::tokenstream::TokenStream`][rustcts], as described previously. This is
@ -610,6 +621,6 @@ TODO: more here. [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/116
### Custom Derive

Custom derives are a special type of proc `macro`.

TODO: more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)