Update macro-expansion.md
parent
6977f206f5
commit
2edd9e08d4
|
|
@ -2,25 +2,29 @@
|
|||
|
||||
<!-- toc -->
|
||||
|
||||
> `rustc_ast`, `rustc_expand`, and `rustc_builtin_macros` are all undergoing
|
||||
> refactoring, so some of the links in this chapter may be broken.
|
||||
> N.B. [`rustc_ast`], [`rustc_expand`], and [`rustc_builtin_macros`] are all
|
||||
> undergoing refactoring, so some of the links in this chapter may be broken.
|
||||
|
||||
Rust has a very powerful macro system. In the previous chapter, we saw how the
|
||||
parser sets aside macros to be expanded (it temporarily uses [placeholders]).
|
||||
This chapter is about the process of expanding those macros iteratively until
|
||||
we have a complete AST for our crate with no unexpanded macros (or a compile
|
||||
error).
|
||||
Rust has a very powerful `macro` system. In the previous chapter, we saw how
|
||||
the parser sets aside `macro`s to be expanded (using temporary [placeholders]).
|
||||
This chapter is about the process of expanding those `macro`s iteratively until
|
||||
we have a complete [*Abstract Syntax Tree* (`AST`)][ast] for our crate with no
|
||||
unexpanded `macro`s (or a compile error).
|
||||
|
||||
[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
|
||||
[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
|
||||
[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html
|
||||
[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html
|
||||
[placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html
|
||||
|
||||
First, we will discuss the algorithm that expands and integrates macro output
|
||||
into ASTs. Next, we will take a look at how hygiene data is collected. Finally,
|
||||
we will look at the specifics of expanding different types of macros.
|
||||
First, we discuss the algorithm that expands and integrates `macro` output into
|
||||
`AST`s. Next, we take a look at how hygiene data is collected. Finally, we look
|
||||
at the specifics of expanding different types of `macro`s.
|
||||
|
||||
Many of the algorithms and data structures described below are in [`rustc_expand`],
|
||||
with basic data structures in [`rustc_expand::base`][base].
|
||||
with fundamental data structures in [`rustc_expand::base`][base].
|
||||
|
||||
Also of note, `cfg` and `cfg_attr` are treated specially from other macros, and are
|
||||
Also of note, `cfg` and `cfg_attr` are treated differently from other `macro`s, and are
|
||||
handled in [`rustc_expand::config`][cfg].
|
||||
|
||||
[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html
|
||||
|
|
@ -29,108 +33,112 @@ handled in [`rustc_expand::config`][cfg].
|
|||
|
||||
## Expansion and AST Integration
|
||||
|
||||
First of all, expansion happens at the crate level. Given a raw source code for
|
||||
a crate, the compiler will produce a massive AST with all macros expanded, all
|
||||
First, expansion happens at the crate level. Given the raw source code for
|
||||
a crate, the compiler will produce a massive `AST` with all `macro`s expanded, all
|
||||
modules inlined, etc. The primary entry point for this process is the
|
||||
[`MacroExpander::fully_expand_fragment`][fef] method. With few exceptions, we
|
||||
[`MacroExpander::fully_expand_fragment()`][fef] method. With few exceptions, we
|
||||
use this method on the whole crate (see ["Eager Expansion"](#eager-expansion)
|
||||
below for more detailed discussion of edge case expansion issues).
|
||||
|
||||
[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html
|
||||
[reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html
|
||||
|
||||
At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a
|
||||
queue of unresolved macro invocations (that is, macros we haven't found the
|
||||
definition of yet). We repeatedly try to pick a macro from the queue, resolve
|
||||
At a high level, [`fully_expand_fragment()`][fef] works in iterations. We keep a
|
||||
queue of unresolved `macro` invocations (i.e. `macro`s we haven't found the
|
||||
definition of yet). We repeatedly try to pick a `macro` from the queue, resolve
|
||||
it, expand it, and integrate it back. If we can't make progress in an
|
||||
iteration, this represents a compile error. Here is the [algorithm][original]:
|
||||
|
||||
[fef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.MacroExpander.html#method.fully_expand_fragment
|
||||
[original]: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049
|
||||
|
||||
1. Initialize a `queue` of unresolved macros.
|
||||
1. Initialize a `queue` of unresolved `macro`s.
|
||||
2. Repeat until `queue` is empty (or we make no progress, which is an error):
|
||||
1. [Resolve](./name-resolution.md) imports in our partially built crate as
|
||||
much as possible.
|
||||
2. Collect as many macro [`Invocation`s][inv] as possible from our
|
||||
partially built crate (fn-like, attributes, derives) and add them to the
|
||||
2. Collect as many `macro` [`Invocation`s][inv] as possible from our
|
||||
partially built crate (`fn`-like, attributes, derives) and add them to the
|
||||
queue.
|
||||
3. Dequeue the first element, and attempt to resolve it.
|
||||
3. Dequeue the first element and attempt to resolve it.
|
||||
4. If it's resolved:
|
||||
1. Run the macro's expander function that consumes a [`TokenStream`] or
|
||||
AST and produces a [`TokenStream`] or [`AstFragment`] (depending on
|
||||
the macro kind). (A `TokenStream` is a collection of [`TokenTree`s][tt],
|
||||
1. Run the `macro`'s expander function that consumes a [`TokenStream`] or
|
||||
`AST` and produces a [`TokenStream`] or [`AstFragment`] (depending on
|
||||
the `macro` kind). (A [`TokenStream`] is a collection of [`TokenTree`s][tt],
|
||||
each of which is a token (punctuation, identifier, or literal) or a
|
||||
delimited group (anything inside `()`/`[]`/`{}`)).
|
||||
- At this point, we know everything about the macro itself and can
|
||||
call `set_expn_data` to fill in its properties in the global data;
|
||||
that is the hygiene data associated with `ExpnId`. (See [the
|
||||
"Hygiene" section below][hybelow]).
|
||||
2. Integrate that piece of AST into the big existing partially built
|
||||
AST. This is essentially where the "token-like mass" becomes a
|
||||
proper set-in-stone AST with side-tables. It happens as follows:
|
||||
- If the macro produces tokens (e.g. a proc macro), we parse into
|
||||
an AST, which may produce parse errors.
|
||||
- During expansion, we create `SyntaxContext`s (hierarchy 2). (See
|
||||
[the "Hygiene" section below][hybelow])
|
||||
- These three passes happen one after another on every AST fragment
|
||||
freshly expanded from a macro:
|
||||
- At this point, we know everything about the `macro` itself and can
|
||||
call [`set_expn_data()`] to fill in its properties in the global
|
||||
data; that is the [hygiene] data associated with [`ExpnId`] (see
|
||||
[Hygiene][hybelow] below).
|
||||
2. Integrate that piece of `AST` into the existing, partially built
`AST`. This is essentially where the "token-like mass"
|
||||
becomes a proper set-in-stone `AST` with side-tables. It happens as
|
||||
follows:
|
||||
- If the `macro` produces tokens (e.g. a `proc macro`), we parse into
|
||||
an `AST`, which may produce parse errors.
|
||||
- During expansion, we create [`SyntaxContext`]s (hierarchy 2) (see
|
||||
[Hygiene][hybelow] below).
|
||||
- These three passes happen one after another on every `AST` fragment
|
||||
freshly expanded from a `macro`:
|
||||
- [`NodeId`]s are assigned by [`InvocationCollector`]. This
|
||||
also collects new macro calls from this new AST piece and
|
||||
also collects new `macro` calls from this new `AST` piece and
|
||||
adds them to the queue.
|
||||
- ["Def paths"][defpath] are created and [`DefId`]s are
|
||||
assigned to them by [`DefCollector`].
|
||||
- Names are put into modules (from the resolver's point of
|
||||
view) by [`BuildReducedGraphVisitor`].
|
||||
3. After expanding a single macro and integrating its output, continue
|
||||
to the next iteration of [`fully_expand_fragment`][fef].
|
||||
3. After expanding a single `macro` and integrating its output, continue
|
||||
to the next iteration of [`fully_expand_fragment()`][fef].
|
||||
5. If it's not resolved:
|
||||
1. Put the macro back in the queue
|
||||
1. Put the `macro` back in the queue.
|
||||
2. Continue to next iteration...
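
The loop above can be sketched as a small worklist. This is only a toy model under stated assumptions, not the code in `rustc_expand`: `Node`, `Expander`, `integrate`, and `fully_expand` are hypothetical names, "resolution" is a plain `HashMap` lookup, and the toy ignores where in the tree an invocation sat (the real compiler uses placeholders for that) and has no recursion limit.

```rust
use std::collections::{HashMap, VecDeque};

/// A toy "AST node": plain text, a macro invocation, or a macro definition.
#[derive(Clone, Debug)]
enum Node {
    Text(String),
    Invoke(String),
    Define(String, Vec<Node>),
}

/// Hypothetical expander state: known definitions, pending invocations, output.
struct Expander {
    defs: HashMap<String, Vec<Node>>,
    queue: VecDeque<String>,
    output: Vec<String>,
}

impl Expander {
    /// Integrate a fragment: record definitions, queue invocations, keep text.
    fn integrate(&mut self, fragment: Vec<Node>) {
        for node in fragment {
            match node {
                Node::Text(text) => self.output.push(text),
                Node::Invoke(name) => self.queue.push_back(name),
                Node::Define(name, body) => {
                    self.defs.insert(name, body);
                }
            }
        }
    }
}

/// Expand until the queue is empty, or return the names that never resolved.
fn fully_expand(krate: Vec<Node>) -> Result<Vec<String>, Vec<String>> {
    let mut ex = Expander {
        defs: HashMap::new(),
        queue: VecDeque::new(),
        output: Vec::new(),
    };
    ex.integrate(krate);
    // Number of consecutive invocations that failed to resolve.
    let mut stalled = 0;
    while let Some(name) = ex.queue.pop_front() {
        match ex.defs.get(&name).cloned() {
            Some(body) => {
                // Resolved: expand it and integrate the output.
                ex.integrate(body);
                stalled = 0;
            }
            None => {
                // Not resolved yet: put it back and try the next one.
                ex.queue.push_back(name);
                stalled += 1;
                if stalled >= ex.queue.len() {
                    // Every queued invocation failed in a row: no progress.
                    return Err(ex.queue.into_iter().collect());
                }
            }
        }
    }
    Ok(ex.output)
}

fn main() {
    // `late` is only defined by the output of `early`, so it is retried once.
    let krate = vec![
        Node::Define(
            "early".to_string(),
            vec![
                Node::Define("late".to_string(), vec![Node::Text("from late".to_string())]),
                Node::Text("from early".to_string()),
            ],
        ),
        Node::Invoke("late".to_string()),
        Node::Invoke("early".to_string()),
    ];
    println!("{:?}", fully_expand(krate));
}
```

In this sketch, an invocation whose definition only appears in another macro's output is put back in the queue and succeeds on a later iteration, while an invocation that never resolves ends the loop with an error.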
|
||||
|
||||
[defpath]: hir.md#identifiers-in-the-hir
|
||||
[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html
|
||||
[`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html
|
||||
[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html
|
||||
[`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html
|
||||
[`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html
|
||||
[hybelow]: #hygiene-and-hierarchies
|
||||
[tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html
|
||||
[`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
|
||||
[inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html
|
||||
[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html
|
||||
[`BuildReducedGraphVisitor`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/build_reduced_graph/struct.BuildReducedGraphVisitor.html
|
||||
[`DefCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/def_collector/struct.DefCollector.html
|
||||
[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html
|
||||
[`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html
|
||||
[`InvocationCollector`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.InvocationCollector.html
|
||||
[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html
|
||||
[`set_expn_data()`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.LocalExpnId.html#method.set_expn_data
|
||||
[`SyntaxContext`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html
|
||||
[`TokenStream`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html
|
||||
[defpath]: hir.md#identifiers-in-the-hir
|
||||
[hybelow]: #hygiene-and-hierarchies
|
||||
[hygiene]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/index.html
|
||||
[inv]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.Invocation.html
|
||||
[tt]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/enum.TokenTree.html
|
||||
|
||||
### Error Recovery
|
||||
|
||||
If we make no progress in an iteration, then we have reached a compilation
|
||||
error (e.g. an undefined macro). We attempt to recover from failures
|
||||
(unresolved macros or imports) for the sake of diagnostics. This allows
|
||||
compilation to continue past the first error, so that we can report more errors
|
||||
at a time. Recovery can't cause compilation to succeed. We know that it will
|
||||
fail at this point. The recovery happens by expanding unresolved macros into
|
||||
[`ExprKind::Err`][err].
|
||||
If we make no progress in an iteration, we have reached a compilation error
|
||||
(e.g. an undefined `macro`). We attempt to recover from failures (i.e.
|
||||
unresolved `macro`s or imports) with the intent of generating diagnostics.
|
||||
Failure recovery happens by expanding unresolved `macro`s into
|
||||
[`ExprKind::Err`][err] and allows compilation to continue past the first error
|
||||
so that `rustc` can report more errors than just the original failure.
|
||||
|
||||
[err]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/enum.ExprKind.html#variant.Err
|
||||
|
||||
### Name Resolution
|
||||
|
||||
Notice that name resolution is involved here: we need to resolve imports and
|
||||
macro names in the above algorithm. This is done in
|
||||
[`rustc_resolve::macros`][mresolve], which resolves macro paths, validates
|
||||
those resolutions, and reports various errors (e.g. "not found" or "found, but
|
||||
it's unstable" or "expected x, found y"). However, we don't try to resolve
|
||||
other names yet. This happens later, as we will see in the [next
|
||||
chapter](./name-resolution.md).
|
||||
`macro` names in the above algorithm. This is done in
|
||||
[`rustc_resolve::macros`][mresolve], which resolves `macro` paths, validates
|
||||
those resolutions, and reports various errors (e.g. "not found", "found, but
|
||||
it's unstable", "expected x, found y"). However, we don't try to resolve
|
||||
other names yet. This happens later, as we will see in the chapter: [Name
|
||||
Resolution](./name-resolution.md).
|
||||
|
||||
[mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html
|
||||
|
||||
### Eager Expansion
|
||||
|
||||
_Eager expansion_ means that we expand the arguments of a macro invocation
|
||||
before the macro invocation itself. This is implemented only for a few special
|
||||
built-in macros that expect literals; expanding arguments first for some of
|
||||
these macro results in a smoother user experience. As an example, consider the
|
||||
following:
|
||||
_Eager expansion_ means we expand the arguments of a `macro` invocation before
|
||||
the `macro` invocation itself. This is implemented only for a few special
|
||||
built-in `macro`s that expect literals; expanding arguments first for some of
|
||||
these `macro`s results in a smoother user experience. As an example, consider
|
||||
the following:
|
||||
|
||||
```rust,ignore
|
||||
macro bar($i: ident) { $i }
|
||||
|
|
@ -139,35 +147,37 @@ macro foo($i: ident) { $i }
|
|||
foo!(bar!(baz));
|
||||
```
|
||||
|
||||
A lazy expansion would expand `foo!` first. An eager expansion would expand
|
||||
A lazy expansion would expand `foo!` first. An eager expansion would expand
|
||||
`bar!` first.
|
||||
|
||||
Eager expansion is not a generally available feature of Rust. Implementing
|
||||
eager expansion more generally would be challenging, but we implement it for a
|
||||
few special built-in macros for the sake of user experience. The built-in
|
||||
macros are implemented in [`rustc_builtin_macros`], along with some other early
|
||||
code generation facilities like injection of standard library imports or
|
||||
Eager expansion is not a generally available feature of Rust. Implementing
eager expansion more generally would be challenging, so we implement it for a
few special built-in `macro`s for the sake of user experience. The built-in
|
||||
`macro`s are implemented in [`rustc_builtin_macros`], along with some other
|
||||
early code generation facilities like injection of standard library imports or
|
||||
generation of the test harness. There are some additional helpers for building
|
||||
their AST fragments in [`rustc_expand::build`][reb]. Eager expansion generally
|
||||
performs a subset of the things that lazy (normal) expansion does. It is done by
|
||||
invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed to
|
||||
the whole crate, like we normally do).
|
||||
`AST` fragments in [`rustc_expand::build`][reb]. Eager expansion generally
|
||||
performs a subset of the things that lazy (normal) expansion does. It is done
|
||||
by invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed
|
||||
to the whole crate, like we normally do).
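
As a user-facing illustration, `concat!` is one of the built-in `macro`s that expects literals and expands `macro` invocations in its arguments first. The sketch below is a minimal example; the constant name `GREETING` is made up for illustration.

```rust
// `concat!` needs literals, so the nested `stringify!` call is expanded
// before `concat!` itself runs.
const GREETING: &str = concat!("hello, ", stringify!(world));

fn main() {
    assert_eq!(GREETING, "hello, world");
    println!("{GREETING}");
}
```

A user-defined `macro_rules!` macro gets no such treatment: a nested invocation passed as an argument reaches it as unexpanded tokens.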
|
||||
|
||||
### Other Data Structures
|
||||
|
||||
Here are some other notable data structures involved in expansion and integration:
|
||||
- [`ResolverExpand`] - a trait used to break crate dependencies. This allows the
|
||||
Here are some other notable data structures involved in expansion and
|
||||
integration:
|
||||
- [`ResolverExpand`] - a `trait` used to break crate dependencies. This allows the
|
||||
resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and
|
||||
pretty much everything else depending on [`rustc_ast`].
|
||||
- [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by expansion
|
||||
infrastructure in the process of its work
|
||||
- [`Annotatable`] - a piece of AST that can be an attribute target, almost same
|
||||
thing as AstFragment except for types and patterns that can be produced by
|
||||
macros but cannot be annotated with attributes
|
||||
- [`MacResult`] - a "polymorphic" AST fragment, something that can turn into a
|
||||
different `AstFragment` depending on its [`AstFragmentKind`] - item,
|
||||
or expression, or pattern etc.
|
||||
- [`ExtCtxt`]/[`ExpansionData`] - holds various intermediate expansion
|
||||
infrastructure data.
|
||||
- [`Annotatable`] - a piece of `AST` that can be an attribute target, almost the same
|
||||
thing as [`AstFragment`] except for `type`s and patterns that can be produced by
|
||||
`macro`s but cannot be annotated with attributes.
|
||||
- [`MacResult`] - a "polymorphic" `AST` fragment, something that can turn into
|
||||
a different [`AstFragment`] depending on its [`AstFragmentKind`] (i.e. an item,
|
||||
expression, pattern, etc.).
|
||||
|
||||
[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html
|
||||
[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
|
||||
[`rustc_resolve`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/index.html
|
||||
[`ResolverExpand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/trait.ResolverExpand.html
|
||||
|
|
@ -179,7 +189,7 @@ Here are some other notable data structures involved in expansion and integratio
|
|||
|
||||
## Hygiene and Hierarchies
|
||||
|
||||
If you have ever used C/C++ preprocessor macros, you know that there are some
|
||||
If you have ever used the C/C++ preprocessor macros, you know that there are some
|
||||
annoying and hard-to-debug gotchas! For example, consider the following C code:
|
||||
|
||||
```c
|
||||
|
|
@ -213,16 +223,16 @@ we got `foo(0, 0)` because the macro defined its own `y`!
|
|||
|
||||
These are both examples of _macro hygiene_ issues. _Hygiene_ relates to how to
|
||||
handle names defined _within a macro_. In particular, a hygienic macro system
|
||||
prevents errors due to names introduced within a macro. Rust macros are hygienic
|
||||
prevents errors due to names introduced within a macro. Rust `macro`s are hygienic
|
||||
in that they do not allow one to write the sorts of bugs above.
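
For comparison, here is a minimal Rust sketch of the "macro defined its own `y`" situation. The names `double_plus_one` and `compute` are made up for illustration. Because the `y` introduced inside the macro lives in a different syntax context from the caller's `y`, the two do not collide.

```rust
macro_rules! double_plus_one {
    ($x:expr) => {{
        // This `y` belongs to the macro's own syntax context.
        let y = 1;
        $x * 2 + y
    }};
}

fn compute() -> i32 {
    // This `y` belongs to the caller's context and is what `$x` refers to.
    let y = 10;
    double_plus_one!(y)
}

fn main() {
    // With hygiene the result is 10 * 2 + 1; a textual substitution, as in
    // the C preprocessor, would instead compute 1 * 2 + 1.
    assert_eq!(compute(), 21);
    println!("{}", compute());
}
```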
|
||||
|
||||
At a high level, hygiene within the Rust compiler is accomplished by keeping
|
||||
track of the context where a name is introduced and used. We can then
|
||||
disambiguate names based on that context. Future iterations of the macro system
|
||||
will allow greater control to the macro author to use that context. For example,
|
||||
a macro author may want to introduce a new name to the context where the macro
|
||||
was called. Alternately, the macro author may be defining a variable for use
|
||||
only within the macro (i.e. it should not be visible outside the macro).
|
||||
disambiguate names based on that context. Future iterations of the `macro` system
|
||||
will allow greater control to the `macro` author to use that context. For example,
|
||||
a `macro` author may want to introduce a new name to the context where the `macro`
|
||||
was called. Alternatively, the `macro` author may be defining a variable for use
|
||||
only within the `macro` (i.e. it should not be visible outside the `macro`).
|
||||
|
||||
[code_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_expand/src/mbe
|
||||
[code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser
|
||||
|
|
@ -230,18 +240,18 @@ only within the macro (i.e. it should not be visible outside the macro).
|
|||
[code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/struct.TtParser.html#method.parse_tt
|
||||
[parsing]: ./the-parser.html
|
||||
|
||||
The context is attached to AST nodes. All AST nodes generated by macros have
|
||||
The context is attached to `AST` nodes. All `AST` nodes generated by `macro`s have
|
||||
context attached. Additionally, there may be other nodes that have context
|
||||
attached, such as some desugared syntax (non-macro-expanded nodes are
|
||||
attached, such as some desugared syntax (non-`macro`-expanded nodes are
|
||||
considered to just have the "root" context, as described below).
|
||||
Throughout the compiler, we use [`rustc_span::Span`s][span] to refer to code locations.
|
||||
This struct also has hygiene information attached to it, as we will see later.
|
||||
|
||||
[span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html
|
||||
|
||||
Because macros invocations and definitions can be nested, the syntax context of
|
||||
a node must be a hierarchy. For example, if we expand a macro and there is
|
||||
another macro invocation or definition in the generated output, then the syntax
|
||||
Because `macro` invocations and definitions can be nested, the syntax context of
|
||||
a node must be a hierarchy. For example, if we expand a `macro` and there is
|
||||
another `macro` invocation or definition in the generated output, then the syntax
|
||||
context should reflect the nesting.
|
||||
|
||||
However, it turns out that there are actually a few types of context we may
|
||||
|
|
@ -249,13 +259,13 @@ want to track for different purposes. Thus, there are not just one but _three_
|
|||
expansion hierarchies that together comprise the hygiene information for a
|
||||
crate.
|
||||
|
||||
All of these hierarchies need some sort of "macro ID" to identify individual
|
||||
elements in the chain of expansions. This ID is [`ExpnId`]. All macros receive
|
||||
an integer ID, assigned continuously starting from 0 as we discover new macro
|
||||
All of these hierarchies need some sort of "`macro` ID" to identify individual
|
||||
elements in the chain of expansions. This ID is [`ExpnId`]. All `macro`s receive
|
||||
an integer ID, assigned consecutively starting from 0 as we discover new `macro`
|
||||
calls. All hierarchies start at [`ExpnId::root()`][rootid], which is its own
|
||||
parent.
|
||||
|
||||
[`rustc_span::hygiene`][hy] contains all of the hygiene-related algorithms
|
||||
The [`rustc_span::hygiene`][hy] module contains all of the hygiene-related algorithms
|
||||
(with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks])
|
||||
and structures related to hygiene and expansion that are kept in global data.
|
||||
|
||||
|
|
@ -273,18 +283,18 @@ any [`Ident`] without any context.
|
|||
|
||||
### The Expansion Order Hierarchy
|
||||
|
||||
The first hierarchy tracks the order of expansions, i.e., when a macro
|
||||
invocation is in the output of another macro.
|
||||
The first hierarchy tracks the order of expansions, i.e., when a `macro`
|
||||
invocation is in the output of another `macro`.
|
||||
|
||||
Here, the children in the hierarchy will be the "innermost" tokens. The
|
||||
[`ExpnData`] struct itself contains a subset of properties from both macro
|
||||
definition and macro call available through global data.
|
||||
[`ExpnData::parent`][edp] tracks the child -> parent link in this hierarchy.
|
||||
[`ExpnData`] struct itself contains a subset of properties from both `macro`
|
||||
definition and `macro` call available through global data.
|
||||
[`ExpnData::parent`][edp] tracks the child-to-parent link in this hierarchy.
|
||||
|
||||
[`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html
|
||||
[edp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html#structfield.parent
|
||||
|
||||
For example,
|
||||
For example:
|
||||
|
||||
```rust,ignore
|
||||
macro_rules! foo { () => { println!(); } }
|
||||
|
|
@ -292,25 +302,25 @@ macro_rules! foo { () => { println!(); } }
|
|||
fn main() { foo!(); }
|
||||
```
|
||||
|
||||
In this code, the AST nodes that are finally generated would have hierarchy
|
||||
In this code, the `AST` nodes that are finally generated would have hierarchy
|
||||
`root -> id(foo) -> id(println)`.
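
The child-to-parent links can be pictured with a toy table. This is a sketch under stated assumptions, not the compiler's data structure: `ToyExpnId` and `ToyExpansions` are hypothetical names, and the only data kept per expansion is a name and a parent.

```rust
/// Hypothetical stand-in for an expansion ID: an index into the table below.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToyExpnId(usize);

/// Toy global data: one parent link and one name per expansion.
struct ToyExpansions {
    parents: Vec<ToyExpnId>,
    names: Vec<&'static str>,
}

impl ToyExpansions {
    /// The table starts with the root, which is its own parent.
    fn new() -> Self {
        ToyExpansions { parents: vec![ToyExpnId(0)], names: vec!["root"] }
    }

    fn root(&self) -> ToyExpnId {
        ToyExpnId(0)
    }

    /// IDs are handed out consecutively as new invocations are discovered.
    fn fresh(&mut self, name: &'static str, parent: ToyExpnId) -> ToyExpnId {
        let id = ToyExpnId(self.parents.len());
        self.parents.push(parent);
        self.names.push(name);
        id
    }

    /// Follow parent links up to the root and report the chain root-first.
    fn chain(&self, mut id: ToyExpnId) -> Vec<&'static str> {
        let mut out = vec![self.names[id.0]];
        while self.parents[id.0] != id {
            id = self.parents[id.0];
            out.push(self.names[id.0]);
        }
        out.reverse();
        out
    }
}

fn main() {
    let mut expansions = ToyExpansions::new();
    let root = expansions.root();
    // `foo!()` is invoked at the root; `println!()` appears in its output.
    let foo = expansions.fresh("foo", root);
    let println_id = expansions.fresh("println", foo);
    println!("{}", expansions.chain(println_id).join(" -> "));
}
```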
|
||||
|
||||
### The Macro Definition Hierarchy
|
||||
|
||||
The second hierarchy tracks the order of macro definitions, i.e., when we are
|
||||
expanding one macro another macro definition is revealed in its output. This
|
||||
The second hierarchy tracks the order of `macro` definitions, i.e., when
expanding one `macro` reveals another `macro` definition in its output. This
|
||||
one is a bit tricky and more complex than the other two hierarchies.
|
||||
|
||||
[`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID.
|
||||
[`SyntaxContextData`][scd] contains data associated with the given
|
||||
`SyntaxContext`; mostly it is a cache for results of filtering that chain in
|
||||
different ways. [`SyntaxContextData::parent`][scdp] is the child -> parent
|
||||
[`SyntaxContext`][sc]; mostly it is a cache for results of filtering that chain in
|
||||
different ways. [`SyntaxContextData::parent`][scdp] is the child-to-parent
|
||||
link here, and [`SyntaxContextData::outer_expns`][scdoe] are individual
|
||||
elements in the chain. The "chaining operator" is
|
||||
elements in the chain. The "chaining-operator" is
|
||||
[`SyntaxContext::apply_mark`][am] in compiler code.
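
To make the idea of "a whole chain represented by an ID" concrete, here is a toy interner. It is only a sketch under stated assumptions: `ToyCtxt`, `ToyInterner`, and `marks` are hypothetical names, expansion IDs are bare integers, and the toy `apply_mark` ignores the other concerns the real method handles (such as transparency).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a syntax context: an index into the interner.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ToyCtxt(usize);

/// The empty (root) context.
const ROOT: ToyCtxt = ToyCtxt(0);

struct ToyInterner {
    /// For each context: (parent context, outermost expansion ID).
    data: Vec<(ToyCtxt, u32)>,
    /// Cache so the same chain always maps to the same ID.
    cache: HashMap<(ToyCtxt, u32), ToyCtxt>,
}

impl ToyInterner {
    fn new() -> Self {
        // Entry 0 is the root; its fields are never read.
        ToyInterner { data: vec![(ROOT, 0)], cache: HashMap::new() }
    }

    /// The "chaining operator": extend `ctxt` with one more expansion.
    fn apply_mark(&mut self, ctxt: ToyCtxt, expn: u32) -> ToyCtxt {
        if let Some(&existing) = self.cache.get(&(ctxt, expn)) {
            return existing;
        }
        let new = ToyCtxt(self.data.len());
        self.data.push((ctxt, expn));
        self.cache.insert((ctxt, expn), new);
        new
    }

    /// Recover the whole chain of expansion IDs, outermost last.
    fn marks(&self, mut ctxt: ToyCtxt) -> Vec<u32> {
        let mut out = Vec::new();
        while ctxt != ROOT {
            let (parent, expn) = self.data[ctxt.0];
            out.push(expn);
            ctxt = parent;
        }
        out.reverse();
        out
    }
}

fn main() {
    let mut interner = ToyInterner::new();
    let (m, n) = (1, 2);
    let short = interner.apply_mark(ROOT, n);
    let via_m = interner.apply_mark(ROOT, m);
    let long = interner.apply_mark(via_m, n);
    // Both chains end in `n`, yet they are different contexts.
    println!("{:?} vs {:?}", interner.marks(short), interner.marks(long));
}
```

Two chains can end in the same expansion and still be distinct contexts, which is why the last element alone does not determine the context.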
|
||||
|
||||
A [`Span`][span], mentioned above, is actually just a compact representation of
|
||||
a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned
|
||||
a code location and [`SyntaxContext`][sc]. Likewise, an [`Ident`] is just an interned
|
||||
[`Symbol`] + `Span` (i.e. an interned string + hygiene data).
|
||||
|
||||
[`Symbol`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/symbol/struct.Symbol.html
|
||||
|
|
@ -320,13 +330,13 @@ a code location and `SyntaxContext`. Likewise, an [`Ident`] is just an interned
|
|||
[scdoe]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.outer_expn
|
||||
[am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
|
||||
|
||||
For built-in macros, we use the context:
|
||||
`SyntaxContext::empty().apply_mark(expn_id)`, and such macros are considered to
|
||||
be defined at the hierarchy root. We do the same for proc-macros because we
|
||||
For built-in `macro`s, we use the context:
|
||||
`SyntaxContext::empty().apply_mark(expn_id)`, and such `macro`s are considered to
|
||||
be defined at the hierarchy root. We do the same for `proc macro`s because we
|
||||
haven't implemented cross-crate hygiene yet.
|
||||
|
||||
If the token had context `X` before being produced by a macro then after being
|
||||
produced by the macro it has context `X -> macro_id`. Here are some examples:
|
||||
If a token had context `X` before being produced by a `macro`, then after being
|
||||
produced by the `macro` it has context `X -> macro_id`. Here are some examples:
|
||||
|
||||
Example 0:
|
||||
|
||||
|
|
@ -356,7 +366,7 @@ after the first expansion, then `ROOT -> id(m) -> id(n)`.
|
|||
Example 2:
|
||||
|
||||
Note that these chains are not entirely determined by their last element, in
|
||||
other words `ExpnId` is not isomorphic to `SyntaxContext`.
|
||||
other words [`ExpnId`] is not isomorphic to [`SyntaxContext`][sc].
|
||||
|
||||
```rust,ignore
|
||||
macro m($i: ident) { macro n() { ($i, bar) } }
|
||||
|
|
@ -369,15 +379,16 @@ After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context
|
|||
|
||||
Finally, one last thing to mention is that currently, this hierarchy is subject
|
||||
to the ["context transplantation hack"][hack]. Basically, the more modern (and
|
||||
experimental) `macro` macros have stronger hygiene than the older MBE system,
|
||||
experimental) `macro` `macro`s have stronger hygiene than the older MBE system,
|
||||
but this can result in weird interactions between the two. The hack is intended
|
||||
to make things "just work" for now.
|
||||
|
||||
[`ExpnId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnId.html
|
||||
[hack]: https://github.com/rust-lang/rust/pull/51762#issuecomment-401400732
|
||||
|
||||
### The Call-site Hierarchy
|
||||
|
||||
The third and final hierarchy tracks the location of macro invocations.
|
||||
The third and final hierarchy tracks the location of `macro` invocations.
|
||||
|
||||
In this hierarchy [`ExpnData::call_site`][callsite] is the child -> parent link.
|
||||
|
||||
|
|
@ -392,39 +403,39 @@ macro foo($i: ident) { $i }
|
|||
foo!(bar!(baz));
|
||||
```
|
||||
|
||||
For the `baz` AST node in the final output, the expansion-order hierarchy is
|
||||
For the `baz` `AST` node in the final output, the expansion-order hierarchy is
|
||||
`ROOT -> id(foo) -> id(bar) -> baz`, while the call-site hierarchy is `ROOT ->
|
||||
baz`.
|
||||
|
||||
### Macro Backtraces
|
||||
|
||||
Macro backtraces are implemented in [`rustc_span`] using the hygiene machinery
|
||||
`macro` backtraces are implemented in [`rustc_span`] using the hygiene machinery
|
||||
in [`rustc_span::hygiene`][hy].
|
||||
|
||||
[`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html
|
||||
|
||||
## Producing Macro Output
|
||||
|
||||
Above, we saw how the output of a macro is integrated into the AST for a crate,
|
||||
Above, we saw how the output of a `macro` is integrated into the `AST` for a crate,
|
||||
and we also saw how the hygiene data for a crate is generated. But how do we
|
||||
actually produce the output of a macro? It depends on the type of macro.
|
||||
actually produce the output of a `macro`? It depends on the type of `macro`.
|
||||
|
||||
There are two types of macros in Rust:
|
||||
`macro_rules!` macros (a.k.a. "Macros By Example" (MBE)) and procedural macros
|
||||
(or "proc macros"; including custom derives). During the parsing phase, the normal
|
||||
Rust parser will set aside the contents of macros and their invocations. Later,
|
||||
macros are expanded using these portions of the code.
|
||||
There are two types of `macro`s in Rust:
|
||||
`macro_rules!` `macro`s (a.k.a. "Macros By Example" (MBE)) and procedural `macro`s
|
||||
(or "proc `macro`s"; including custom derives). During the parsing phase, the normal
|
||||
Rust parser will set aside the contents of `macro`s and their invocations. Later,
|
||||
`macro`s are expanded using these portions of the code.
|
||||
|
||||
Some important data structures/interfaces here:
|
||||
- [`SyntaxExtension`] - a lowered macro representation, contains its expander
|
||||
function, which transforms a `TokenStream` or AST into another `TokenStream`
|
||||
or AST + some additional data like stability, or a list of unstable features
|
||||
allowed inside the macro.
|
||||
- [`SyntaxExtension`] - a lowered `macro` representation, contains its expander
|
||||
function, which transforms a `TokenStream` or `AST` into another `TokenStream`
|
||||
or `AST` + some additional data like stability, or a list of unstable features
|
||||
allowed inside the `macro`.
|
||||
- [`SyntaxExtensionKind`] - expander functions may have several different
|
||||
signatures (take one token stream, or two, or a piece of AST, etc). This is
|
||||
signatures (take one token stream, or two, or a piece of `AST`, etc). This is
|
||||
an enum that lists them.
|
||||
- [`BangProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] -
|
||||
traits representing the expander function signatures.
|
||||
`trait`s representing the expander function signatures.
|
||||
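
The idea of "one enum listing several expander signatures" can be sketched with a toy model. Everything below (`ToyTokens`, `ToyExtensionKind`, `ToyExtension`, `expand`, `demo`) is hypothetical and only mirrors the shape described in the list above; it is not the real `rustc_expand` API, whose types carry far more information.

```rust
/// Toy token stream: a list of token strings (an assumption for illustration,
/// not rustc's `TokenStream`).
type ToyTokens = Vec<String>;

/// Toy analogue of an enum that lists the possible expander signatures.
enum ToyExtensionKind {
    /// Function-like expander: one token stream in, one out.
    Bang(Box<dyn Fn(ToyTokens) -> ToyTokens>),
    /// Attribute-like expander: attribute arguments plus the annotated item.
    Attr(Box<dyn Fn(ToyTokens, ToyTokens) -> ToyTokens>),
}

/// Toy analogue of a lowered macro: the expander plus some extra data.
#[allow(dead_code)]
struct ToyExtension {
    kind: ToyExtensionKind,
    /// Stand-in for "additional data", e.g. unstable features allowed inside.
    allowed_unstable: Vec<String>,
}

/// Dispatch on the expander kind and call it with the matching arguments.
fn expand(ext: &ToyExtension, attr_args: ToyTokens, input: ToyTokens) -> ToyTokens {
    match &ext.kind {
        ToyExtensionKind::Bang(f) => f(input),
        ToyExtensionKind::Attr(f) => f(attr_args, input),
    }
}

/// Split a string on whitespace into toy tokens.
fn toks(s: &str) -> ToyTokens {
    s.split_whitespace().map(String::from).collect()
}

/// Build one expander of each kind and run both.
fn demo() -> (ToyTokens, ToyTokens) {
    let duplicate = ToyExtension {
        kind: ToyExtensionKind::Bang(Box::new(|input: ToyTokens| {
            // Emit the input twice.
            let mut out = input.clone();
            out.extend(input);
            out
        })),
        allowed_unstable: vec![],
    };
    let tag = ToyExtension {
        kind: ToyExtensionKind::Attr(Box::new(|args: ToyTokens, item: ToyTokens| {
            // Prepend the attribute arguments to the item tokens.
            let mut out = args;
            out.extend(item);
            out
        })),
        allowed_unstable: vec!["toy_feature".to_string()],
    };
    (
        expand(&duplicate, vec![], toks("a b")),
        expand(&tag, toks("x"), toks("fn f")),
    )
}
```

Calling `demo()` runs both toy expanders; the point is only that callers dispatch on the kind before invoking the expander with the right number of inputs.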
|
||||
[`SyntaxExtension`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/struct.SyntaxExtension.html
|
||||
[`SyntaxExtensionKind`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/enum.SyntaxExtensionKind.html
|
||||
|
|
@@ -435,11 +446,11 @@ Some important data structures/interfaces here:
|
|||
|
||||
## Macros By Example
|
||||
|
||||
MBEs have their own parser distinct from the normal Rust parser. When macros
|
||||
are expanded, we may invoke the MBE parser to parse and expand a macro. The
|
||||
MBEs have their own parser distinct from the normal Rust parser. When `macro`s
|
||||
are expanded, we may invoke the MBE parser to parse and expand a `macro`. The
|
||||
MBE parser, in turn, may call the normal Rust parser when it needs to bind a
|
||||
metavariable (e.g. `$my_expr`) while parsing the contents of a macro
|
||||
invocation. The code for macro expansion is in
|
||||
metavariable (e.g. `$my_expr`) while parsing the contents of a `macro`
|
||||
invocation. The code for `macro` expansion is in
|
||||
[`compiler/rustc_expand/src/mbe/`][code_dir].
|
||||
|
||||
### Example
|
||||
|
|
@@ -467,8 +478,8 @@ special tokens, such as `EOF`, which indicates that there are no more tokens.
|
|||
Token trees resulting from paired parentheses-like characters (`(`...`)`,
|
||||
`[`...`]`, and `{`...`}`) – they include the open and close and all the tokens
|
||||
in between (we do require that parentheses-like characters be balanced). Having
|
||||
macro expansion operate on token streams rather than the raw bytes of a source
|
||||
file abstracts away a lot of complexity. The macro expander (and much of the
|
||||
`macro` expansion operate on token streams rather than the raw bytes of a source
|
||||
file abstracts away a lot of complexity. The `macro` expander (and much of the
|
||||
rest of the compiler) doesn't really care that much about the exact line and
|
||||
column of some syntactic construct in the code; it cares about what constructs
|
||||
are used in the code. Using tokens allows us to care about _what_ without
|
||||
|
|
@@ -481,21 +492,21 @@ Whenever we refer to the "example _invocation_", we mean the following snippet:
|
|||
printer!(print foo); // Assume `foo` is a variable defined somewhere else...
|
||||
```
|
||||
|
||||
The process of expanding the macro invocation into the syntax tree
|
||||
The process of expanding the `macro` invocation into the syntax tree
|
||||
`println!("{}", foo)` and then expanding that into a call to `Display::fmt` is
|
||||
called _macro expansion_, and it is the topic of this chapter.
|
||||
called _`macro` expansion_, and it is the topic of this chapter.
|
||||
|
||||
### The MBE parser
|
||||
|
||||
There are two parts to MBE expansion: parsing the definition and parsing the
|
||||
invocations. Interestingly, both are done by the macro parser.
|
||||
invocations. Interestingly, both are done by the `macro` parser.
|
||||
|
||||
Basically, the MBE parser is like an NFA-based regex parser. It uses an
|
||||
algorithm similar in spirit to the [Earley parsing
|
||||
algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro parser is
|
||||
algorithm](https://en.wikipedia.org/wiki/Earley_parser). The `macro` parser is
|
||||
defined in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
|
||||
|
||||
The interface of the macro parser is as follows (this is slightly simplified):
|
||||
The interface of the `macro` parser is as follows (this is slightly simplified):
|
||||
|
||||
```rust,ignore
|
||||
fn parse_tt(
|
||||
|
|
@@ -505,7 +516,7 @@ fn parse_tt(
|
|||
) -> ParseResult
|
||||
```
|
||||
|
||||
We use these items in macro parser:
|
||||
We use these items in the `macro` parser:
|
||||
|
||||
- `parser` is a reference to the state of a normal Rust parser, including the
|
||||
token stream and parsing session. The token stream is what we are about to
|
||||
|
|
@@ -529,47 +540,47 @@ three cases has occurred:
|
|||
"No rule expected token _blah_".
|
||||
- Error: some fatal error has occurred _in the parser_. For example, this
|
||||
happens if there is more than one pattern match, since that indicates
|
||||
the macro is ambiguous.
|
||||
the `macro` is ambiguous.
|
||||
|
||||
The full interface is defined [here][code_parse_int].
|
||||
|
||||
The macro parser does pretty much exactly the same as a normal regex parser with
|
||||
The `macro` parser does pretty much exactly the same as a normal regex parser with
|
||||
one exception: in order to parse different types of metavariables, such as
|
||||
`ident`, `block`, `expr`, etc., the macro parser must sometimes call back to the
|
||||
`ident`, `block`, `expr`, etc., the `macro` parser must sometimes call back to the
|
||||
normal Rust parser.
|
||||
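
The "call back to the normal parser" step can be illustrated with a toy matcher. This is a heavily simplified sketch under stated assumptions: `ToyResult`, `Piece`, `parse_ident`, and `toy_match` are hypothetical names, tokens are plain strings, there are no repetitions, and the fatal error case is left out. The real matcher in `macro_parser.rs` is considerably more involved.

```rust
/// Outcome of matching one matcher against an input (toy version; the fatal
/// "Error" case is omitted for brevity).
#[derive(Debug, PartialEq)]
enum ToyResult {
    /// The input matched; holds (metavariable name, bound token) pairs.
    Success(Vec<(String, String)>),
    /// The input did not match; holds a diagnostic message.
    Failure(String),
}

/// One element of a toy matcher.
enum Piece {
    /// A literal token that must appear verbatim, e.g. `print`.
    Lit(&'static str),
    /// An `ident` metavariable, e.g. `$mvar:ident` (stored without the `$`).
    Ident(&'static str),
}

/// Stand-in for the callback into the normal Rust parser: accept a token only
/// if it looks like an identifier. (Simplified: keywords are not rejected.)
fn parse_ident(tok: &str) -> Option<String> {
    let mut chars = tok.chars();
    let first = chars.next()?;
    let ok = (first.is_alphabetic() || first == '_')
        && chars.all(|c| c.is_alphanumeric() || c == '_');
    if ok { Some(tok.to_string()) } else { None }
}

/// Match `input` against `matcher`, token by token.
fn toy_match(matcher: &[Piece], input: &[&str]) -> ToyResult {
    if matcher.len() != input.len() {
        return ToyResult::Failure("unexpected number of tokens".to_string());
    }
    let mut bindings = Vec::new();
    for (piece, tok) in matcher.iter().zip(input) {
        match piece {
            // Literal tokens are compared directly by the matcher itself.
            Piece::Lit(lit) => {
                if lit != tok {
                    return ToyResult::Failure(format!("no rule expected token `{}`", tok));
                }
            }
            // Metavariables are delegated to the "normal parser" stand-in.
            Piece::Ident(name) => match parse_ident(tok) {
                Some(id) => bindings.push((name.to_string(), id)),
                None => {
                    return ToyResult::Failure(format!("expected ident, found `{}`", tok));
                }
            },
        }
    }
    ToyResult::Success(bindings)
}
```

With the matcher `[Piece::Lit("print"), Piece::Ident("mvar")]`, the input `["print", "foo"]` succeeds and binds `mvar` to `foo`, while `["print", "42"]` fails because the identifier callback rejects `42`.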
|
||||
As mentioned above, both definitions and invocations of macros are parsed using
|
||||
the macro parser. This is extremely non-intuitive and self-referential. The code
|
||||
to parse macro _definitions_ is in
|
||||
As mentioned above, both definitions and invocations of `macro`s are parsed using
|
||||
the `macro` parser. This is extremely non-intuitive and self-referential. The code
|
||||
to parse `macro` _definitions_ is in
|
||||
[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the pattern for
|
||||
matching for a macro definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
|
||||
matching a `macro` definition as `$( $lhs:tt => $rhs:tt );+`. In other words,
|
||||
a `macro_rules` definition should have in its body at least one occurrence of a
|
||||
token tree followed by `=>` followed by another token tree. When the compiler
|
||||
comes to a `macro_rules` definition, it uses this pattern to match the two token
|
||||
trees per rule in the definition of the macro _using the macro parser itself_.
|
||||
trees per rule in the definition of the `macro` _using the `macro` parser itself_.
|
||||
In our example definition, the metavariable `$lhs` would match the patterns of
|
||||
both arms: `(print $mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs`
|
||||
would match the bodies of both arms: `{ println!("{}", $mvar); }` and `{
|
||||
println!("{}", $mvar); println!("{}", $mvar); }`. The parser would keep this
|
||||
knowledge around for when it needs to expand a macro invocation.
|
||||
knowledge around for when it needs to expand a `macro` invocation.
|
||||
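
The same pattern can be tried out in ordinary user code. The sketch below is only an illustration: `split_rules!` and `example_rules` are hypothetical names, and the macro merely stringifies each `$lhs`/`$rhs` pair instead of compiling it the way the compiler does. It takes the two arms quoted above as input.

```rust
/// Split a `macro_rules`-style body into (lhs, rhs) pairs using the same
/// pattern shape, `$( $lhs:tt => $rhs:tt );+`, and stringify each half.
macro_rules! split_rules {
    ($( $lhs:tt => $rhs:tt );+) => {
        vec![$( (stringify!($lhs), stringify!($rhs)) ),+]
    };
}

/// Feed the two arms of the example definition through `split_rules!`.
/// Each arm contributes one token tree for `$lhs` and one for `$rhs`.
fn example_rules() -> Vec<(&'static str, &'static str)> {
    split_rules!(
        (print $mvar:ident) => { println!("{}", $mvar); };
        (print twice $mvar:ident) => { println!("{}", $mvar); println!("{}", $mvar); }
    )
}
```

`example_rules()` yields two pairs, one per arm. The exact whitespace produced by `stringify!` is not guaranteed, so the strings are best inspected rather than compared verbatim.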
|
||||
When the compiler comes to a macro invocation, it parses that invocation using
|
||||
the same NFA-based macro parser that is described above. However, the matcher
|
||||
used is the first token tree (`$lhs`) extracted from the arms of the macro
|
||||
When the compiler comes to a `macro` invocation, it parses that invocation using
|
||||
the same NFA-based `macro` parser that is described above. However, the matcher
|
||||
used is the first token tree (`$lhs`) extracted from the arms of the `macro`
|
||||
_definition_. Using our example, we would try to match the token stream `print
|
||||
foo` from the invocation against the matchers `print $mvar:ident` and `print
|
||||
twice $mvar:ident` that we previously extracted from the definition. The
|
||||
algorithm is exactly the same, but when the macro parser comes to a place in the
|
||||
algorithm is exactly the same, but when the `macro` parser comes to a place in the
|
||||
current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
|
||||
it calls back to the normal Rust parser to get the contents of that
|
||||
non-terminal. In this case, the Rust parser would look for an `ident` token,
|
||||
which it finds (`foo`) and returns to the macro parser. Then, the macro parser
|
||||
which it finds (`foo`) and returns to the `macro` parser. Then, the `macro` parser
|
||||
proceeds in parsing as normal. Also, note that exactly one of the matchers from
|
||||
the various arms should match the invocation; if there is more than one match,
|
||||
the parse is ambiguous, while if there are no matches at all, there is a syntax
|
||||
error.
|
||||
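
To see arm selection in action, here is a small variation on the chapter's example. `render!` and `render_examples` are hypothetical names; the arms use the same matchers as the example definition but call `format!` instead of `println!`, purely so that the result can be inspected.

```rust
/// Same matchers as the example definition, but each arm returns a `String`.
macro_rules! render {
    (print $mvar:ident) => { format!("{}", $mvar) };
    (print twice $mvar:ident) => { format!("{}{}", $mvar, $mvar) };
}

/// Expand one invocation per arm. `foo` is bound to the `ident` non-terminal.
fn render_examples() -> (String, String) {
    let foo = 7;
    (render!(print foo), render!(print twice foo))
}
```

In the second invocation the first matcher fails (after `print twice` it expects the end of input but finds `foo`), so only the `print twice $mvar:ident` matcher applies.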
|
||||
For more information about the macro parser's implementation, see the comments
|
||||
For more information about the `macro` parser's implementation, see the comments
|
||||
in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
|
||||
|
||||
### `macro`s and Macros 2.0
|
||||
|
|
@@ -577,21 +588,21 @@ in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
|
|||
There is an old and mostly undocumented effort to improve the MBE system, give
|
||||
it more hygiene-related features, better scoping and visibility rules, etc. There
|
||||
hasn't been a lot of work on this recently, unfortunately. Internally, `macro`
|
||||
macros use the same machinery as today's MBEs; they just have additional
|
||||
`macro`s use the same machinery as today's MBEs; they just have additional
|
||||
syntactic sugar and are allowed to be in namespaces.
|
||||
|
||||
## Procedural Macros
|
||||
|
||||
Procedural macros are also expanded during parsing, as mentioned above.
|
||||
Procedural `macro`s are also expanded during parsing, as mentioned above.
|
||||
However, they use a rather different mechanism. Rather than having a parser in
|
||||
the compiler, procedural macros are implemented as custom, third-party crates.
|
||||
The compiler will compile the proc macro crate and specially annotated
|
||||
functions in them (i.e. the proc macro itself), passing them a stream of tokens.
|
||||
the compiler, procedural `macro`s are implemented as custom, third-party crates.
|
||||
The compiler will compile the proc `macro` crate and specially annotated
|
||||
functions in it (i.e. the proc `macro` itself), passing them a stream of tokens.
|
||||
|
||||
The proc macro can then transform the token stream and output a new token
|
||||
stream, which is synthesized into the AST.
|
||||
The proc `macro` can then transform the token stream and output a new token
|
||||
stream, which is synthesized into the `AST`.
|
||||
|
||||
It's worth noting that the token stream type used by proc macros is _stable_,
|
||||
It's worth noting that the token stream type used by proc `macro`s is _stable_,
|
||||
so `rustc` does not use it internally (since our internal data structures are
|
||||
unstable). The compiler's token stream is
|
||||
[`rustc_ast::tokenstream::TokenStream`][rustcts], as previously. This is
|
||||
|
|
@@ -610,6 +621,6 @@ TODO: more here. [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/116
|
|||
|
||||
### Custom Derive
|
||||
|
||||
Custom derives are a special type of proc macro.
|
||||
Custom derives are a special type of proc `macro`.
|
||||
|
||||
TODO: more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160)
|
||||
|
|
|
|||