Update macro-expansion.md

Tbkhi 2024-03-10 18:40:28 -03:00 committed by nora
parent 6977f206f5
commit 2edd9e08d4
1 changed files with 189 additions and 178 deletions

@ -2,25 +2,29 @@
<!-- toc -->
> `rustc_ast`, `rustc_expand`, and `rustc_builtin_macros` are all undergoing
> refactoring, so some of the links in this chapter may be broken.
> N.B. [`rustc_ast`], [`rustc_expand`], and [`rustc_builtin_macros`] are all
> undergoing refactoring, so some of the links in this chapter may be broken.
Rust has a very powerful macro system. In the previous chapter, we saw how the
parser sets aside macros to be expanded (it temporarily uses [placeholders]).
This chapter is about the process of expanding those macros iteratively until
we have a complete AST for our crate with no unexpanded macros (or a compile
error).
Rust has a very powerful `macro` system. In the previous chapter, we saw how
the parser sets aside `macro`s to be expanded (using temporary [placeholders]).
This chapter is about the process of expanding those `macro`s iteratively until
we have a complete [*Abstract Syntax Tree* (`AST`)][ast] for our crate with no
unexpanded `macro`s (or a compile error).
[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html
[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html
[placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html
First, we will discuss the algorithm that expands and integrates macro output
into ASTs. Next, we will take a look at how hygiene data is collected. Finally,
we will look at the specifics of expanding different types of macros.
First, we discuss the algorithm that expands and integrates `macro` output into
`AST`s. Next, we take a look at how hygiene data is collected. Finally, we look
at the specifics of expanding different types of `macro`s.
Many of the algorithms and data structures described below are in [`rustc_expand`],
with basic data structures in [`rustc_expand::base`][base].
with fundamental data structures in [`rustc_expand::base`][base].
Also of note, `cfg` and `cfg_attr` are treated specially from other macros, and are
Also of note, `cfg` and `cfg_attr` are treated specially from other `macro`s, and are
handled in [`rustc_expand::config`][cfg].
[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html
@ -29,108 +33,112 @@ handled in [`rustc_expand::config`][cfg].
## Expansion and AST Integration
First of all, expansion happens at the crate level. Given a raw source code for
a crate, the compiler will produce a massive AST with all macros expanded, all
Firstly, expansion happens at the crate level. Given the raw source code for
a crate, the compiler will produce a massive `AST` with all `macro`s expanded, all
modules inlined, etc. The primary entry point for this process is the
[`MacroExpander::fully_expand_fragment`][fef] method. With few exceptions, we
[`MacroExpander::fully_expand_fragment()`][fef] method. With few exceptions, we
use this method on the whole crate (see ["Eager Expansion"](#eager-expansion)
below for more detailed discussion of edge case expansion issues).
[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html
[reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html
At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a
queue of unresolved macro invocations (that is, macros we haven't found the
definition of yet). We repeatedly try to pick a macro from the queue, resolve
At a high level, [`fully_expand_fragment()`][fef] works in iterations. We keep a
queue of unresolved `macro` invocations (i.e. `macro`s we haven't found the
definition of yet). We repeatedly try to pick a `macro` from the queue, resolve
it, expand it, and integrate it back. If we can't make progress in an
iteration, this represents a compile error. Here is the [algorithm][original]:
[fef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.MacroExpander.html#method.fully_expand_fragment
[original]: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049
1. Initialize a `queue` of unresolved macros.
1. Initialize a `queue` of unresolved `macro`s.
2. Repeat until `queue` is empty (or we make no progress, which is an error):
1. [Resolve](./name-resolution.md) imports in our partially built crate as
much as possible.
2. Collect as many macro [`Invocation`s][inv] as possible from our
partially built crate (fn-like, attributes, derives) and add them to the
2. Collect as many `macro` [`Invocation`s][inv] as possible from our
partially built crate (`fn`-like, attributes, derives) and add them to the
queue.
3. Dequeue the first element, and attempt to resolve it.
3. Dequeue the first element and attempt to resolve it.
4. If it's resolved:
1. Run the macro's expander function that consumes a [`TokenStream`] or
AST and produces a [`TokenStream`] or [`AstFragment`] (depending on
the macro kind). (A `TokenStream` is a collection of [`TokenTree`s][tt],
1. Run the `macro`'s expander function that consumes a [`TokenStream`] or
`AST` and produces a [`TokenStream`] or [`AstFragment`] (depending on
the `macro` kind). (A [`TokenStream`] is a collection of [`TokenTree`s][tt],
each of which is a token (punctuation, identifier, or literal) or a
delimited group (anything inside `()`/`[]`/`{}`)).
- At this point, we know everything about the macro itself and can
call `set_expn_data` to fill in its properties in the global data;
that is the hygiene data associated with `ExpnId`. (See [the
"Hygiene" section below][hybelow]).
2. Integrate that piece of AST into the big existing partially built
AST. This is essentially where the "token-like mass" becomes a
proper set-in-stone AST with side-tables. It happens as follows:
- If the macro produces tokens (e.g. a proc macro), we parse into
an AST, which may produce parse errors.
- During expansion, we create `SyntaxContext`s (hierarchy 2). (See
[the "Hygiene" section below][hybelow])
- These three passes happen one after another on every AST fragment
freshly expanded from a macro:
- At this point, we know everything about the `macro` itself and can
call [`set_expn_data()`] to fill in its properties in the global
data; that is the [hygiene] data associated with [`ExpnId`] (see
[Hygiene][hybelow] below).
2. Integrate that piece of `AST` into the currently-existing though
partially-built `AST`. This is essentially where the "token-like mass"
becomes a proper set-in-stone `AST` with side-tables. It happens as
follows:
- If the `macro` produces tokens (e.g. a `proc macro`), we parse into
an `AST`, which may produce parse errors.
- During expansion, we create [`SyntaxContext`]s (hierarchy 2) (see
[Hygiene][hybelow] below).
- These three passes happen one after another on every `AST` fragment
freshly expanded from a `macro`:
- [`NodeId`]s are assigned by [`InvocationCollector`]. This
also collects new macro calls from this new AST piece and
also collects new `macro` calls from this new `AST` piece and
adds them to the queue.
- ["Def paths"][defpath] are created and [`DefId`]s are
assigned to them by [`DefCollector`].
- Names are put into modules (from the resolver's point of
view) by [`BuildReducedGraphVisitor`].
3. After expanding a single macro and integrating its output, continue
to the next iteration of [`fully_expand_fragment`][fef].
3. After expanding a single `macro` and integrating its output, continue
to the next iteration of [`fully_expand_fragment()`][fef].
5. If it's not resolved:
1. Put the macro back in the queue
1. Put the `macro` back in the queue.
2. Continue to next iteration...
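
The queue-driven loop above can be sketched as a small toy model. This is not the code of `fully_expand_fragment()`: the `ToyMacro` type, the `expand_all` function, and the "visible names" set are all hypothetical stand-ins, and import resolution, `AST` integration, and hygiene are left out. It only illustrates the "dequeue, try to resolve, requeue, and stop when no progress is made" shape.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Toy macro definition (hypothetical): what expanding it produces.
struct ToyMacro {
    /// Names of macros invoked in this macro's output.
    invokes: Vec<&'static str>,
    /// Names of macros *defined* in this macro's output.
    defines: Vec<&'static str>,
}

/// Returns the order in which invocations were expanded, or the
/// invocations that could never be resolved.
fn expand_all(
    all: &HashMap<&'static str, ToyMacro>,
    initially_visible: &[&'static str],
    initial_invocations: &[&'static str],
) -> Result<Vec<&'static str>, Vec<&'static str>> {
    let mut visible: HashSet<&'static str> = initially_visible.iter().copied().collect();
    let mut queue: VecDeque<&'static str> = initial_invocations.iter().copied().collect();
    let mut expanded = Vec::new();
    // Number of consecutive dequeues that failed to resolve.
    let mut stalled = 0;

    while let Some(name) = queue.pop_front() {
        // "Resolution" here just means: the definition exists and is visible.
        match all.get(name).filter(|_| visible.contains(name)) {
            Some(mac) => {
                // Resolved: "expand" it, then collect what its output reveals.
                expanded.push(name);
                visible.extend(mac.defines.iter().copied());
                queue.extend(mac.invokes.iter().copied());
                stalled = 0;
            }
            None => {
                // Not resolved yet: put it back and try the others first.
                queue.push_back(name);
                stalled += 1;
                if stalled >= queue.len() {
                    // Every queued invocation failed in a row: no progress.
                    return Err(queue.into_iter().collect());
                }
            }
        }
    }
    Ok(expanded)
}

fn main() {
    let mut all = HashMap::new();
    all.insert("make_b", ToyMacro { invokes: vec![], defines: vec!["b"] });
    all.insert("b", ToyMacro { invokes: vec![], defines: vec![] });
    // `b` cannot be resolved until `make_b` has been expanded.
    println!("{:?}", expand_all(&all, &["make_b"], &["b", "make_b"]));
}
```

In this sketch an invocation that fails to resolve is simply retried later, which mirrors step 5 of the list; the real expander additionally re-runs import resolution on each iteration.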
[defpath]: hir.md#identifiers-in-the-hir
[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html
[`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html
[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html
[`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html
[`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html
[hybelow]: #hygiene-and-hierarchies
[tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html
[`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
[inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html
[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html
[`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html
[`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html
[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html
[`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html
[`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html
[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html
[`set_expn_data()`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.LocalExpnId.html#method.set_expn_data
[`SyntaxContext`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html
[`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
[defpath]: hir.md#identifiers-in-the-hir
[hybelow]: #hygiene-and-hierarchies
[hygiene]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/index.html
[inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html
[tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html
### Error Recovery
If we make no progress in an iteration, then we have reached a compilation
error (e.g. an undefined macro). We attempt to recover from failures
(unresolved macros or imports) for the sake of diagnostics. This allows
compilation to continue past the first error, so that we can report more errors
at a time. Recovery can't cause compilation to succeed. We know that it will
fail at this point. The recovery happens by expanding unresolved macros into
[`ExprKind::Err`][err].
If we make no progress in an iteration we have reached a compilation error
(e.g. an undefined `macro`). We attempt to recover from failures (i.e.
unresolved `macro`s or imports) with the intent of generating diagnostics.
Failure recovery happens by expanding unresolved `macro`s into
[`ExprKind::Err`][err] and allows compilation to continue past the first error
so that `rustc` can report more errors than just the original failure.
[err]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/enum.ExprKind.html#variant.Err
### Name Resolution
Notice that name resolution is involved here: we need to resolve imports and
macro names in the above algorithm. This is done in
[`rustc_resolve::macros`][mresolve], which resolves macro paths, validates
those resolutions, and reports various errors (e.g. "not found" or "found, but
it's unstable" or "expected x, found y"). However, we don't try to resolve
other names yet. This happens later, as we will see in the [next
chapter](./name-resolution.md).
`macro` names in the above algorithm. This is done in
[`rustc_resolve::macros`][mresolve], which resolves `macro` paths, validates
those resolutions, and reports various errors (e.g. "not found", "found, but
it's unstable", "expected x, found y"). However, we don't try to resolve
other names yet. This happens later, as we will see in the chapter: [Name
Resolution](./name-resolution.md).
[mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html
### Eager Expansion
_Eager expansion_ means that we expand the arguments of a macro invocation
before the macro invocation itself. This is implemented only for a few special
built-in macros that expect literals; expanding arguments first for some of
these macro results in a smoother user experience. As an example, consider the
following:
_Eager expansion_ means we expand the arguments of a `macro` invocation before
the `macro` invocation itself. This is implemented only for a few special
built-in `macro`s that expect literals; expanding arguments first for some of
these `macro`s results in a smoother user experience. As an example, consider
the following:
```rust,ignore
macro bar($i: ident) { $i }
@ -139,35 +147,37 @@ macro foo($i: ident) { $i }
foo!(bar!(baz));
```
A lazy expansion would expand `foo!` first. An eager expansion would expand
A lazy-expansion would expand `foo!` first. An eager-expansion would expand
`bar!` first.
Eager expansion is not a generally available feature of Rust. Implementing
eager expansion more generally would be challenging, but we implement it for a
few special built-in macros for the sake of user experience. The built-in
macros are implemented in [`rustc_builtin_macros`], along with some other early
code generation facilities like injection of standard library imports or
Eager-expansion is not a generally available feature of Rust. Implementing
eager-expansion more generally would be challenging, so we implement it for a
few special built-in `macro`s for the sake of user-experience. The built-in
`macro`s are implemented in [`rustc_builtin_macros`], along with some other
early code generation facilities like injection of standard library imports or
generation of the test harness. There are some additional helpers for building
their AST fragments in [`rustc_expand::build`][reb]. Eager expansion generally
performs a subset of the things that lazy (normal) expansion does. It is done by
invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed to
the whole crate, like we normally do).
`AST` fragments in [`rustc_expand::build`][reb]. Eager-expansion generally
performs a subset of the things that lazy (normal) expansion does. It is done
by invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed
to the whole crate, like we normally do).
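
As a user-visible illustration (a minimal sketch, not taken from the compiler sources), `concat!` is one of the built-in `macro`s that expects literals. Its argument `stringify!(foo)` has to be expanded to the literal `"foo"` before `concat!` itself can do its work:

```rust
// `stringify!(foo)` is expanded first, so `concat!` sees two string literals.
const NAME: &str = concat!(stringify!(foo), "_bar");

fn main() {
    assert_eq!(NAME, "foo_bar");
    println!("{NAME}");
}
```

A user-defined `macro_rules!` `macro` in the position of `concat!` would instead receive the unexpanded tokens `stringify!(foo)`.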
### Other Data Structures
Here are some other notable data structures involved in expansion and integration:
- [`ResolverExpand`] - a trait used to break crate dependencies. This allows the
Here are some other notable data structures involved in expansion and
integration:
- [`ResolverExpand`] - a `trait` used to break crate dependencies. This allows the
resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and
pretty much everything else depending on [`rustc_ast`].
- [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by expansion
infrastructure in the process of its work
- [`Annotatable`] - a piece of AST that can be an attribute target, almost same
thing as AstFragment except for types and patterns that can be produced by
macros but cannot be annotated with attributes
- [`MacResult`] - a "polymorphic" AST fragment, something that can turn into a
different `AstFragment` depending on its [`AstFragmentKind`] - item,
or expression, or pattern etc.
- [`ExtCtxt`]/[`ExpansionData`] - holds various intermediate expansion
infrastructure data.
- [`Annotatable`] - a piece of `AST` that can be an attribute target, almost the same
thing as [`AstFragment`] except for `type`s and patterns that can be produced by
`macro`s but cannot be annotated with attributes.
- [`MacResult`] - a "polymorphic" `AST` fragment, something that can turn into
a different [`AstFragment`] depending on its [`AstFragmentKind`] (i.e. an item,
expression, pattern, etc).
[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html
[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
[`rustc_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/index.html
[`ResolverExpand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.ResolverExpand.html
@ -179,7 +189,7 @@ Here are some other notable data structures involved in expansion and integratio
## Hygiene and Hierarchies
If you have ever used C/C++ preprocessor macros, you know that there are some
If you have ever used the C/C++ preprocessor macros, you know that there are some
annoying and hard-to-debug gotchas! For example, consider the following C code:
```c
@ -213,16 +223,16 @@ we got `foo(0, 0)` because the macro defined its own `y`!
These are both examples of _macro hygiene_ issues. _Hygiene_ relates to how to
handle names defined _within a macro_. In particular, a hygienic macro system
prevents errors due to names introduced within a macro. Rust macros are hygienic
prevents errors due to names introduced within a macro. Rust `macro`s are hygienic
in that they do not allow one to write the sorts of bugs above.
At a high level, hygiene within the Rust compiler is accomplished by keeping
track of the context where a name is introduced and used. We can then
disambiguate names based on that context. Future iterations of the macro system
will allow greater control to the macro author to use that context. For example,
a macro author may want to introduce a new name to the context where the macro
was called. Alternately, the macro author may be defining a variable for use
only within the macro (i.e. it should not be visible outside the macro).
disambiguate names based on that context. Future iterations of the `macro` system
will allow greater control to the `macro` author to use that context. For example,
a `macro` author may want to introduce a new name to the context where the `macro`
was called. Alternately, the `macro` author may be defining a variable for use
only within the `macro` (i.e. it should not be visible outside the `macro`).
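
A small example of what this buys the user (the `times_ten!` `macro` and the `scaled` function are made up for illustration): the `x` introduced inside the `macro` body and the `x` written at the call site carry different contexts, so they never collide.

```rust
macro_rules! times_ten {
    ($e:expr) => {{
        let x = 10; // introduced inside the macro
        $e * x      // `$e` still refers to names from the call site
    }};
}

fn scaled() -> i32 {
    let x = 2; // the caller's `x`
    // The `x` inside `x + 1` is the caller's `x`, not the macro's.
    times_ten!(x + 1)
}

fn main() {
    assert_eq!(scaled(), 30);
}
```

With a purely textual macro system, the `x` in `x + 1` would be captured by the inner `let x = 10` and the result would be `110` instead.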
[code_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_expand/src/mbe
[code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser
@ -230,18 +240,18 @@ only within the macro (i.e. it should not be visible outside the macro).
[code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/struct.TtParser.html#method.parse_tt
[parsing]: ./the-parser.html
The context is attached to AST nodes. All AST nodes generated by macros have
The context is attached to `AST` nodes. All `AST` nodes generated by `macro`s have
context attached. Additionally, there may be other nodes that have context
attached, such as some desugared syntax (non-macro-expanded nodes are
attached, such as some desugared syntax (non-`macro`-expanded nodes are
considered to just have the "root" context, as described below).
Throughout the compiler, we use [`rustc_span::Span`s][span] to refer to code locations.
This struct also has hygiene information attached to it, as we will see later.
[span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html
Because macros invocations and definitions can be nested, the syntax context of
a node must be a hierarchy. For example, if we expand a macro and there is
another macro invocation or definition in the generated output, then the syntax
Because `macro` invocations and definitions can be nested, the syntax context of
a node must be a hierarchy. For example, if we expand a `macro` and there is
another `macro` invocation or definition in the generated output, then the syntax
context should reflect the nesting.
However, it turns out that there are actually a few types of context we may
@ -249,13 +259,13 @@ want to track for different purposes. Thus, there are not just one but _three_
expansion hierarchies that together comprise the hygiene information for a
crate.
All of these hierarchies need some sort of "macro ID" to identify individual
elements in the chain of expansions. This ID is [`ExpnId`]. All macros receive
an integer ID, assigned continuously starting from 0 as we discover new macro
All of these hierarchies need some sort of "`macro` ID" to identify individual
elements in the chain of expansions. This ID is [`ExpnId`]. All `macro`s receive
an integer ID, assigned continuously starting from 0 as we discover new `macro`
calls. All hierarchies start at [`ExpnId::root()`][rootid], which is its own
parent.
[`rustc_span::hygiene`][hy] contains all of the hygiene-related algorithms
The [`rustc_span::hygiene`][hy] library contains all of the hygiene-related algorithms
(with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks])
and structures related to hygiene and expansion that are kept in global data.
@ -273,18 +283,18 @@ any [`Ident`] without any context.
### The Expansion Order Hierarchy
The first hierarchy tracks the order of expansions, i.e., when a macro
invocation is in the output of another macro.
The first hierarchy tracks the order of expansions, i.e., when a `macro`
invocation is in the output of another `macro`.
Here, the children in the hierarchy will be the "innermost" tokens. The
[`ExpnData`] struct itself contains a subset of properties from both macro
definition and macro call available through global data.
[`ExpnData::parent`][edp] tracks the child -> parent link in this hierarchy.
[`ExpnData`] struct itself contains a subset of properties from both `macro`
definition and `macro` call available through global data.
[`ExpnData::parent`][edp] tracks the child-to-parent link in this hierarchy.
[`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html
[edp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.parent
For example,
For example:
```rust,ignore
macro_rules! foo { () => { println!(); } }
@ -292,25 +302,25 @@ macro_rules! foo { () => { println!(); } }
fn main() { foo!(); }
```
In this code, the AST nodes that are finally generated would have hierarchy
In this code, the `AST` nodes that are finally generated would have hierarchy
`root -> id(foo) -> id(println)`.
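
The child-to-parent links can be modelled with a small table. The following is a toy sketch, not the data structures in `rustc_span::hygiene`: `ToyExpnId`, `ToyExpnData`, and `ToyHygiene` are hypothetical names, and only the `parent` link is modelled.

```rust
/// Toy stand-in for `ExpnId`: an index into a global table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToyExpnId(usize);

/// Toy stand-in for `ExpnData`: only the parent link and a display name.
struct ToyExpnData {
    parent: ToyExpnId,
    name: &'static str,
}

struct ToyHygiene {
    expns: Vec<ToyExpnData>,
}

impl ToyHygiene {
    const ROOT: ToyExpnId = ToyExpnId(0);

    fn new() -> Self {
        // The root expansion is its own parent.
        Self { expns: vec![ToyExpnData { parent: Self::ROOT, name: "root" }] }
    }

    /// Assign the next integer ID to a newly discovered macro call.
    fn fresh(&mut self, parent: ToyExpnId, name: &'static str) -> ToyExpnId {
        self.expns.push(ToyExpnData { parent, name });
        ToyExpnId(self.expns.len() - 1)
    }

    /// Walk child-to-parent links and return the chain starting at the root.
    fn chain(&self, mut id: ToyExpnId) -> Vec<&'static str> {
        let mut names = vec![self.expns[id.0].name];
        while id != Self::ROOT {
            id = self.expns[id.0].parent;
            names.push(self.expns[id.0].name);
        }
        names.reverse();
        names
    }
}

fn main() {
    let mut hygiene = ToyHygiene::new();
    let foo = hygiene.fresh(ToyHygiene::ROOT, "foo");
    let println = hygiene.fresh(foo, "println");
    println!("{}", hygiene.chain(println).join(" -> "));
}
```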
### The Macro Definition Hierarchy
The second hierarchy tracks the order of macro definitions, i.e., when we are
expanding one macro another macro definition is revealed in its output. This
The second hierarchy tracks the order of `macro` definitions, i.e., when we are
expanding one `macro` and another `macro` definition is revealed in its output. This
one is a bit tricky and more complex than the other two hierarchies.
[`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID.
[`SyntaxContextData`][scd] contains data associated with the given
`SyntaxContext`; mostly it is a cache for results of filtering that chain in
different ways. [`SyntaxContextData::parent`][scdp] is the child -> parent
[`SyntaxContext`][sc]; mostly it is a cache for results of filtering that chain in
different ways. [`SyntaxContextData::parent`][scdp] is the child-to-parent
link here, and [`SyntaxContextData::outer_expns`][scdoe] are individual
elements in the chain. The "chaining operator" is
elements in the chain. The "chaining-operator" is
[`SyntaxContext::apply_mark`][am] in compiler code.
A [`Span`][span], mentioned above, is actually just a compact representation of
a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned
a code location and [`SyntaxContext`][sc]. Likewise, an [`Ident`] is just an interned
[`Symbol`] + `Span` (i.e. an interned string + hygiene data).
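
To make "an ID that represents a whole chain" concrete, here is a toy interner. It is a sketch under loud assumptions: `ToyCtxt` and `ToyInterner` are hypothetical, expansion IDs are plain `u32`s, and the real `apply_mark` takes more inputs than this two-argument version.

```rust
use std::collections::HashMap;

/// Toy stand-in for `SyntaxContext`: an ID naming a whole chain of marks.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ToyCtxt(usize);

struct ToyInterner {
    /// For each context: (parent context, outermost expansion ID).
    data: Vec<(ToyCtxt, u32)>,
    /// Cache so that the same (parent, expansion) pair yields the same ID.
    cache: HashMap<(ToyCtxt, u32), ToyCtxt>,
}

impl ToyInterner {
    const ROOT: ToyCtxt = ToyCtxt(0);

    fn new() -> Self {
        // The root context is its own parent; expansion 0 stands for "root".
        Self { data: vec![(Self::ROOT, 0)], cache: HashMap::new() }
    }

    /// The toy "chaining operator": extend `ctxt` with one more expansion.
    fn apply_mark(&mut self, ctxt: ToyCtxt, expn: u32) -> ToyCtxt {
        if let Some(&existing) = self.cache.get(&(ctxt, expn)) {
            return existing;
        }
        let fresh = ToyCtxt(self.data.len());
        self.data.push((ctxt, expn));
        self.cache.insert((ctxt, expn), fresh);
        fresh
    }

    /// Recover the full chain of expansion IDs, outermost context first.
    fn chain(&self, mut ctxt: ToyCtxt) -> Vec<u32> {
        let mut marks = Vec::new();
        while ctxt != Self::ROOT {
            let (parent, expn) = self.data[ctxt.0];
            marks.push(expn);
            ctxt = parent;
        }
        marks.reverse();
        marks
    }
}

fn main() {
    let mut interner = ToyInterner::new();
    let m = interner.apply_mark(ToyInterner::ROOT, 1);
    let m_then_n = interner.apply_mark(m, 2);
    let only_n = interner.apply_mark(ToyInterner::ROOT, 2);
    // Same last element, different chains, hence different context IDs.
    assert_ne!(m_then_n, only_n);
    println!("{:?} vs {:?}", interner.chain(m_then_n), interner.chain(only_n));
}
```

Two contexts that end in the same expansion are still distinct IDs when their parents differ, which is why a context cannot be recovered from its last element alone.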
[`Symbol`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Symbol.html
@ -320,13 +330,13 @@ a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned
[scdoe]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.outer_expn
[am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
For built-in macros, we use the context:
`SyntaxContext::empty().apply_mark(expn_id)`, and such macros are considered to
be defined at the hierarchy root. We do the same for proc-macros because we
For built-in `macro`s, we use the context:
`SyntaxContext::empty().apply_mark(expn_id)`, and such `macro`s are considered to
be defined at the hierarchy root. We do the same for `proc macro`s because we
haven't implemented cross-crate hygiene yet.
If the token had context `X` before being produced by a macro then after being
produced by the macro it has context `X -> macro_id`. Here are some examples:
If the token had context `X` before being produced by a `macro` then after being
produced by the `macro` it has context `X -> macro_id`. Here are some examples:
Example 0:
@ -356,7 +366,7 @@ after the first expansion, then `ROOT -> id(m) -> id(n)`.
Example 2:
Note that these chains are not entirely determined by their last element, in
other words `ExpnId` is not isomorphic to `SyntaxContext`.
other words [`ExpnId`] is not isomorphic to [`SyntaxContext`][sc].
```rust,ignore
macro m($i: ident) { macro n() { ($i, bar) } }
@ -369,15 +379,16 @@ After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context
Finally, one last thing to mention is that currently, this hierarchy is subject
to the ["context transplantation hack"][hack]. Basically, the more modern (and
experimental) `macro` macros have stronger hygiene than the older MBE system,
experimental) `macro` `macro`s have stronger hygiene than the older MBE system,
but this can result in weird interactions between the two. The hack is intended
to make things "just work" for now.
[`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html
[hack]: https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732
### The Call-site Hierarchy
The third and final hierarchy tracks the location of macro invocations.
The third and final hierarchy tracks the location of `macro` invocations.
In this hierarchy [`ExpnData::call_site`][callsite] is the child -> parent link.
@ -392,39 +403,39 @@ macro foo($i: ident) { $i }
foo!(bar!(baz));
```
For the `baz` AST node in the final output, the expansion-order hierarchy is
For the `baz` `AST` node in the final output, the expansion-order hierarchy is
`ROOT -> id(foo) -> id(bar) -> baz`, while the call-site hierarchy is `ROOT ->
baz`.
### Macro Backtraces
Macro backtraces are implemented in [`rustc_span`] using the hygiene machinery
`macro` backtraces are implemented in [`rustc_span`] using the hygiene machinery
in [`rustc_span::hygiene`][hy].
[`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html
## Producing Macro Output
Above, we saw how the output of a macro is integrated into the AST for a crate,
Above, we saw how the output of a `macro` is integrated into the `AST` for a crate,
and we also saw how the hygiene data for a crate is generated. But how do we
actually produce the output of a macro? It depends on the type of macro.
actually produce the output of a `macro`? It depends on the type of `macro`.
There are two types of macros in Rust:
`macro_rules!` macros (a.k.a. "Macros By Example" (MBE)) and procedural macros
(or "proc macros"; including custom derives). During the parsing phase, the normal
Rust parser will set aside the contents of macros and their invocations. Later,
macros are expanded using these portions of the code.
There are two types of `macro`s in Rust:
`macro_rules!` `macro`s (a.k.a. "Macros By Example" (MBE)) and procedural `macro`s
(or "proc `macro`s"; including custom derives). During the parsing phase, the normal
Rust parser will set aside the contents of `macro`s and their invocations. Later,
`macro`s are expanded using these portions of the code.
Some important data structures/interfaces here:
- [`SyntaxExtension`] - a lowered macro representation, contains its expander
function, which transforms a `TokenStream` or AST into another `TokenStream`
or AST + some additional data like stability, or a list of unstable features
allowed inside the macro.
- [`SyntaxExtension`] - a lowered `macro` representation, contains its expander
function, which transforms a `TokenStream` or `AST` into another `TokenStream`
or `AST` + some additional data like stability, or a list of unstable features
allowed inside the `macro`.
- [`SyntaxExtensionKind`] - expander functions may have several different
signatures (take one token stream, or two, or a piece of AST, etc). This is
signatures (take one token stream, or two, or a piece of `AST`, etc). This is
an enum that lists them.
- [`BangProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] -
traits representing the expander function signatures.
`trait`s representing the expander function signatures.
[`SyntaxExtension`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.SyntaxExtension.html
[`SyntaxExtensionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.SyntaxExtensionKind.html
@ -435,11 +446,11 @@ Some important data structures/interfaces here:
## Macros By Example
MBEs have their own parser distinct from the normal Rust parser. When macros
are expanded, we may invoke the MBE parser to parse and expand a macro. The
MBEs have their own parser distinct from the normal Rust parser. When `macro`s
are expanded, we may invoke the MBE parser to parse and expand a `macro`. The
MBE parser, in turn, may call the normal Rust parser when it needs to bind a
metavariable (e.g. `$my_expr`) while parsing the contents of a macro
invocation. The code for macro expansion is in
metavariable (e.g. `$my_expr`) while parsing the contents of a `macro`
invocation. The code for `macro` expansion is in
[`compiler/rustc_expand/src/mbe/`][code_dir].
### Example
@ -467,8 +478,8 @@ special tokens, such as `EOF`, which indicates that there are no more tokens.
Token trees resulting from paired parentheses-like characters (`(`...`)`,
`[`...`]`, and `{`...`}`) include the open and close delimiters and all the tokens
in between (we do require that parentheses-like characters be balanced). Having
macro expansion operate on token streams rather than the raw bytes of a source
file abstracts away a lot of complexity. The macro expander (and much of the
`macro` expansion operate on token streams rather than the raw bytes of a source
file abstracts away a lot of complexity. The `macro` expander (and much of the
rest of the compiler) doesn't really care that much about the exact line and
column of some syntactic construct in the code; it cares about what constructs
are used in the code. Using tokens allows us to care about _what_ without
@ -481,21 +492,21 @@ Whenever we refer to the "example _invocation_", we mean the following snippet:
printer!(print foo); // Assume `foo` is a variable defined somewhere else...
```
The process of expanding the macro invocation into the syntax tree
The process of expanding the `macro` invocation into the syntax tree
`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
called _macro expansion_, and it is the topic of this chapter.
called _`macro` expansion_, and it is the topic of this chapter.
### The MBE parser
There are two parts to MBE expansion: parsing the definition and parsing the
invocations. Interestingly, both are done by the macro parser.
invocations. Interestingly, both are done by the `macro` parser.
Basically, the MBE parser is like an NFA-based regex parser. It uses an
algorithm similar in spirit to the [Earley parsing
algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is
algorithm](https://en.wikipedia.org/wiki/Earley_parser). The `macro` parser is
defined in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
The interface of the macro parser is as follows (this is slightly simplified):
The interface of the `macro` parser is as follows (this is slightly simplified):
```rust,ignore
fn parse_tt(
@ -505,7 +516,7 @@ fn parse_tt(
) -> ParseResult
```
We use these items in macro parser:
We use these items in the `macro` parser:
- `parser` is a reference to the state of a normal Rust parser, including the
token stream and parsing session. The token stream is what we are about to
@ -529,47 +540,47 @@ three cases has occurred:
"No rule expected token _blah_".
- Error: some fatal error has occurred _in the parser_. For example, this
happens if there is more than one pattern match, since that indicates
the macro is ambiguous.
the `macro` is ambiguous.
The full interface is defined [here][code_parse_int].
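To make the three outcomes concrete, here is a minimal toy sketch. It is not the real `rustc_expand` code: `ToyParseResult`, `Piece`, `match_one`, `toy_parse`, and `example_matchers` are hypothetical names invented for illustration, tokens are plain strings, and ambiguity is deliberately simplified to "more than one candidate pattern matched".

```rust
/// Hypothetical, simplified stand-in for the parser's result type.
#[derive(Debug, PartialEq)]
enum ToyParseResult {
    /// The input matched; holds (metavariable, matched token) bindings.
    Success(Vec<(String, String)>),
    /// The input did not match any pattern.
    Failure(String),
    /// Something went wrong in the parser itself, e.g. an ambiguous match.
    Error(String),
}

/// One piece of a toy matcher: a literal word or an `ident`-like metavariable.
enum Piece {
    Literal(&'static str),
    MetaVar(&'static str),
}

/// Try to match `tokens` against one matcher, returning bindings on success.
fn match_one(tokens: &[&str], matcher: &[Piece]) -> Option<Vec<(String, String)>> {
    if tokens.len() != matcher.len() {
        return None;
    }
    let mut bindings = Vec::new();
    for (tok, piece) in tokens.iter().zip(matcher) {
        match piece {
            // A literal must equal the input token exactly.
            Piece::Literal(lit) if lit == tok => {}
            Piece::Literal(_) => return None,
            // A metavariable binds whatever single token is present.
            Piece::MetaVar(name) => bindings.push((name.to_string(), tok.to_string())),
        }
    }
    Some(bindings)
}

/// Classify the outcome: no match, exactly one match, or an ambiguous match.
fn toy_parse(tokens: &[&str], matchers: &[Vec<Piece>]) -> ToyParseResult {
    let mut found: Vec<Vec<(String, String)>> =
        matchers.iter().filter_map(|m| match_one(tokens, m)).collect();
    match found.len() {
        0 => ToyParseResult::Failure(format!("no rule expected tokens {:?}", tokens)),
        1 => ToyParseResult::Success(found.remove(0)),
        _ => ToyParseResult::Error("ambiguous: more than one pattern matched".to_string()),
    }
}

/// The two matchers from the example definition, in toy form.
fn example_matchers() -> Vec<Vec<Piece>> {
    vec![
        vec![Piece::Literal("print"), Piece::MetaVar("mvar")],
        vec![Piece::Literal("print"), Piece::Literal("twice"), Piece::MetaVar("mvar")],
    ]
}
```

Calling `toy_parse(&["print", "foo"], &example_matchers())` in this sketch yields the `Success` variant with `mvar` bound to `foo`, while an input that starts with some other word yields `Failure`.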
The macro parser does pretty much exactly the same as a normal regex parser with
The `macro` parser works in much the same way as a normal regex parser, with
one exception: in order to parse different types of metavariables, such as
`ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the
`ident`, `block`, `expr`, etc., the `macro` parser must sometimes call back to the
normal Rust parser.
As mentioned above, both definitions and invocations of macros are parsed using
the macro parser. This is extremely non-intuitive and self-referential. The code
to parse macro _definitions_ is in
As mentioned above, both definitions and invocations of `macro`s are parsed using
the `macro` parser. This is extremely non-intuitive and self-referential. The code
to parse `macro` _definitions_ is in
[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the pattern for
matching for a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
matching a `macro` definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
a `macro_rules` definition should have in its body at least one occurrence of a
token tree followed by `=>` followed by another token tree. When the compiler
comes to a `macro_rules` definition, it uses this pattern to match the two token
trees per rule in the definition of the macro _using the macro parser itself_.
trees per rule in the definition of the `macro` _using the `macro` parser itself_.
In our example definition, the metavariable `$lhs` would match the patterns of
both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs`
would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{
println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this
knowledge around for when it needs to expand a macro invocation.
knowledge around for when it needs to expand a `macro` invocation.
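The splitting of a definition into `$lhs` and `$rhs` token trees can be imitated in user code. The sketch below is only an illustration, not how the compiler stores rules: `split_rules!` and `example_rules` are hypothetical names, and the macro turns each matched token tree into a string with `stringify!` (whose exact spacing is not guaranteed).

```rust
/// Hypothetical helper that mimics the simplified pattern
/// `$( $lhs:tt => $rhs:tt );+` and records each arm as a pair of strings.
macro_rules! split_rules {
    ( $( $lhs:tt => $rhs:tt );+ ) => {
        vec![ $( (stringify!($lhs), stringify!($rhs)) ),+ ]
    };
}

/// Feed the two arms of the example `printer!` definition through the pattern.
/// Each arm contributes one `$lhs` token tree and one `$rhs` token tree.
fn example_rules() -> Vec<(&'static str, &'static str)> {
    split_rules!(
        (print $mvar:ident) => { println!("{}", $mvar); };
        (print twice $mvar:ident) => { println!("{}", $mvar); println!("{}", $mvar); }
    )
}
```

Here `example_rules()` returns two pairs, one per arm. The first element of each pair is the stringified matcher and the second is the stringified body, which mirrors the knowledge the parser keeps around for later expansion.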
When the compiler comes to a macro invocation, it parses that invocation using
the same NFA-based macro parser that is described above. However, the matcher
used is the first token tree (`$lhs`) extracted from the arms of the macro
When the compiler comes to a `macro` invocation, it parses that invocation using
the same NFA-based `macro` parser that is described above. However, the matcher
used is the first token tree (`$lhs`) extracted from the arms of the `macro`
_definition_. Using our example, we would try to match the token stream `print
foo` from the invocation against the matchers `print $mvar:ident` and `print
twice $mvar:ident` that we previously extracted from the definition. The
algorithm is exactly the same, but when the macro parser comes to a place in the
algorithm is exactly the same, but when the `macro` parser comes to a place in the
current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
it calls back to the normal Rust parser to get the contents of that
non-terminal. In this case, the Rust parser would look for an `ident` token,
which it finds (`foo`) and returns to the macro parser. Then, the macro parser
which it finds (`foo`) and returns to the `macro` parser. Then, the `macro` parser
proceeds in parsing as normal. Also, note that exactly one of the matchers from
the various arms should match the invocation; if there is more than one match,
the parse is ambiguous, while if there are no matches at all, there is a syntax
error.
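The matching described above can be observed with a variant of the example `macro`. As an assumption made purely so the result can be inspected, this sketch returns a `String` via `format!` instead of printing with `println!`; `print_once` and `print_twice` are hypothetical helper names.

```rust
/// Variant of the chapter's example: the same two matchers, but the bodies
/// build a `String` instead of printing (an assumption for this sketch).
macro_rules! printer {
    (print $mvar:ident) => { format!("{}", $mvar) };
    (print twice $mvar:ident) => { format!("{}{}", $mvar, $mvar) };
}

/// `print foo` matches the first matcher: `print` is matched literally and
/// `$mvar:ident` binds the identifier `foo`.
fn print_once() -> String {
    let foo = "hi";
    printer!(print foo)
}

/// `print twice foo` fails against the first matcher (a token is left over
/// after `$mvar` binds `twice`), so the second matcher is the one that matches.
fn print_twice() -> String {
    let foo = "hi";
    printer!(print twice foo)
}
```

An invocation such as `printer!(show foo)` would match neither matcher and be rejected at compile time with a "no rules expected" style error.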
For more information about the macro parser's implementation, see the comments
For more information about the `macro` parser's implementation, see the comments
in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
### `macro`s and Macros 2.0
@ -577,21 +588,21 @@ in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
There is an old and mostly undocumented effort to improve the MBE system, give
it more hygiene-related features, better scoping and visibility rules, etc. There
hasn't been a lot of work on this recently, unfortunately. Internally, `macro`
macros use the same machinery as today's MBEs; they just have additional
`macro`s use the same machinery as today's MBEs; they just have additional
syntactic sugar and are allowed to be in namespaces.
## Procedural Macros
Procedural macros are also expanded during parsing, as mentioned above.
Procedural `macro`s are also expanded during parsing, as mentioned above.
However, they use a rather different mechanism. Rather than having a parser in
the compiler, procedural macros are implemented as custom, third-party crates.
The compiler will compile the proc macro crate and specially annotated
functions in them (i.e. the proc macro itself), passing them a stream of tokens.
the compiler, procedural `macro`s are implemented as custom, third-party crates.
The compiler will compile the proc `macro` crate and the specially annotated
functions in it (i.e. the proc `macro` itself), passing them a stream of tokens.
The proc macro can then transform the token stream and output a new token
stream, which is synthesized into the AST.
The proc `macro` can then transform the token stream and output a new token
stream, which is synthesized into the `AST`.
It's worth noting that the token stream type used by proc macros is _stable_,
It's worth noting that the token stream type used by proc `macro`s is _stable_,
so `rustc` does not use it internally (since our internal data structures are
unstable). The compiler's token stream is
[`rustc_ast::tokenstream::TokenStream`][rustcts], as described previously. This is
@ -610,6 +621,6 @@ TODO: more here. [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/116
### Custom Derive
Custom derives are a special type of proc macro.
Custom derives are a special type of proc `macro`.
TODO: more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)