Minor edits
parent
b956638072
commit
04480082ad
|
|
@ -5,11 +5,11 @@
|
||||||
> N.B. [`rustc_ast`], [`rustc_expand`], and [`rustc_builtin_macros`] are all
|
> N.B. [`rustc_ast`], [`rustc_expand`], and [`rustc_builtin_macros`] are all
|
||||||
> undergoing refactoring, so some of the links in this chapter may be broken.
|
> undergoing refactoring, so some of the links in this chapter may be broken.
|
||||||
|
|
||||||
Rust has a very powerful `macro` system. In the previous chapter, we saw how
|
Rust has a very powerful macro system. In the previous chapter, we saw how
|
||||||
the parser sets aside `macro`s to be expanded (using temporary [placeholders]).
|
the parser sets aside macros to be expanded (using temporary [placeholders]).
|
||||||
This chapter is about the process of expanding those `macro`s iteratively until
|
This chapter is about the process of expanding those macros iteratively until
|
||||||
we have a complete [*Abstract Syntax Tree* (`AST`)][ast] for our crate with no
|
we have a complete [*Abstract Syntax Tree* (AST)][ast] for our crate with no
|
||||||
unexpanded `macro`s (or a compile error).
|
unexpanded macros (or a compile error).
|
||||||
|
|
||||||
[ast]: ./ast-validation.md
|
[ast]: ./ast-validation.md
|
||||||
[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
|
[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
|
||||||
|
|
@ -17,14 +17,14 @@ unexpanded `macro`s (or a compile error).
|
||||||
[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html
|
[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html
|
||||||
[placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html
|
[placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html
|
||||||
|
|
||||||
First, we discuss the algorithm that expands and integrates `macro` output into
|
First, we discuss the algorithm that expands and integrates macro output into
|
||||||
`AST`s. Next, we take a look at how hygiene data is collected. Finally, we look
|
ASTs. Next, we take a look at how hygiene data is collected. Finally, we look
|
||||||
at the specifics of expanding different types of `macro`s.
|
at the specifics of expanding different types of macros.
|
||||||
|
|
||||||
Many of the algorithms and data structures described below are in [`rustc_expand`],
|
Many of the algorithms and data structures described below are in [`rustc_expand`],
|
||||||
with fundamental data structures in [`rustc_expand::base`][base].
|
with fundamental data structures in [`rustc_expand::base`][base].
|
||||||
|
|
||||||
Also of note, `cfg` and `cfg_attr` are treated specially from other `macro`s, and are
|
Also of note, `cfg` and `cfg_attr` are treated specially from other macros, and are
|
||||||
handled in [`rustc_expand::config`][cfg].
|
handled in [`rustc_expand::config`][cfg].
|
||||||
|
|
||||||
[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html
|
[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html
|
||||||
|
|
@ -34,7 +34,7 @@ handled in [`rustc_expand::config`][cfg].
|
||||||
## Expansion and AST Integration
|
## Expansion and AST Integration
|
||||||
|
|
||||||
Firstly, expansion happens at the crate level. Given the raw source code for
|
Firstly, expansion happens at the crate level. Given the raw source code for
|
||||||
a crate, the compiler will produce a massive `AST` with all `macro`s expanded, all
|
a crate, the compiler will produce a massive AST with all macros expanded, all
|
||||||
modules inlined, etc. The primary entry point for this process is the
|
modules inlined, etc. The primary entry point for this process is the
|
||||||
[`MacroExpander::fully_expand_fragment`][fef] method. With few exceptions, we
|
[`MacroExpander::fully_expand_fragment`][fef] method. With few exceptions, we
|
||||||
use this method on the whole crate (see ["Eager Expansion"](#eager-expansion)
|
use this method on the whole crate (see ["Eager Expansion"](#eager-expansion)
|
||||||
|
|
@ -44,53 +44,53 @@ below for more detailed discussion of edge case expansion issues).
|
||||||
[reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html
|
[reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html
|
||||||
|
|
||||||
At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a
|
At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a
|
||||||
queue of unresolved `macro` invocations (i.e. `macro`s we haven't found the
|
queue of unresolved macro invocations (i.e. macros we haven't found the
|
||||||
definition of yet). We repeatedly try to pick a `macro` from the queue, resolve
|
definition of yet). We repeatedly try to pick a macro from the queue, resolve
|
||||||
it, expand it, and integrate it back. If we can't make progress in an
|
it, expand it, and integrate it back. If we can't make progress in an
|
||||||
iteration, this represents a compile error. Here is the [algorithm][original]:
|
iteration, this represents a compile error. Here is the [algorithm][original]:
|
||||||
|
|
||||||
[fef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.MacroExpander.html#method.fully_expand_fragment
|
[fef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.MacroExpander.html#method.fully_expand_fragment
|
||||||
[original]: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049
|
[original]: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049
|
||||||
|
|
||||||
1. Initialize a `queue` of unresolved `macro`s.
|
1. Initialize a `queue` of unresolved macros.
|
||||||
2. Repeat until `queue` is empty (or we make no progress, which is an error):
|
2. Repeat until `queue` is empty (or we make no progress, which is an error):
|
||||||
1. [Resolve](./name-resolution.md) imports in our partially built crate as
|
1. [Resolve](./name-resolution.md) imports in our partially built crate as
|
||||||
much as possible.
|
much as possible.
|
||||||
2. Collect as many `macro` [`Invocation`s][inv] as possible from our
|
2. Collect as many macro [`Invocation`s][inv] as possible from our
|
||||||
partially built crate (`fn`-like, attributes, derives) and add them to the
|
partially built crate (`fn`-like, attributes, derives) and add them to the
|
||||||
queue.
|
queue.
|
||||||
3. Dequeue the first element and attempt to resolve it.
|
3. Dequeue the first element and attempt to resolve it.
|
||||||
4. If it's resolved:
|
4. If it's resolved:
|
||||||
1. Run the `macro`'s expander function that consumes a [`TokenStream`] or
|
1. Run the macro's expander function that consumes a [`TokenStream`] or
|
||||||
`AST` and produces a [`TokenStream`] or [`AstFragment`] (depending on
|
AST and produces a [`TokenStream`] or [`AstFragment`] (depending on
|
||||||
the `macro` kind). (A [`TokenStream`] is a collection of [`TokenTree`s][tt],
|
the macro kind). (A [`TokenStream`] is a collection of [`TokenTree`s][tt],
|
||||||
each of which is a token (punctuation, identifier, or literal) or a
|
each of which is a token (punctuation, identifier, or literal) or a
|
||||||
delimited group (anything inside `()`/`[]`/`{}`)).
|
delimited group (anything inside `()`/`[]`/`{}`)).
|
||||||
- At this point, we know everything about the `macro` itself and can
|
- At this point, we know everything about the macro itself and can
|
||||||
call [`set_expn_data`] to fill in its properties in the global
|
call [`set_expn_data`] to fill in its properties in the global
|
||||||
data; that is the [hygiene] data associated with [`ExpnId`] (see
|
data; that is the [hygiene] data associated with [`ExpnId`] (see
|
||||||
[Hygiene][hybelow] below).
|
[Hygiene][hybelow] below).
|
||||||
2. Integrate that piece of `AST` into the currently-existing though
|
2. Integrate that piece of AST into the currently-existing though
|
||||||
partially-built `AST`. This is essentially where the "token-like mass"
|
partially-built AST. This is essentially where the "token-like mass"
|
||||||
becomes a proper set-in-stone `AST` with side-tables. It happens as
|
becomes a proper set-in-stone AST with side-tables. It happens as
|
||||||
follows:
|
follows:
|
||||||
- If the `macro` produces tokens (e.g. a `proc macro`), we parse into
|
- If the macro produces tokens (e.g. a proc macro), we parse into
|
||||||
an `AST`, which may produce parse errors.
|
an AST, which may produce parse errors.
|
||||||
- During expansion, we create [`SyntaxContext`]s (hierarchy 2) (see
|
- During expansion, we create [`SyntaxContext`]s (hierarchy 2) (see
|
||||||
[Hygiene][hybelow] below).
|
[Hygiene][hybelow] below).
|
||||||
- These three passes happen one after another on every `AST` fragment
|
- These three passes happen one after another on every AST fragment
|
||||||
freshly expanded from a `macro`:
|
freshly expanded from a macro:
|
||||||
- [`NodeId`]s are assigned by [`InvocationCollector`]. This
|
- [`NodeId`]s are assigned by [`InvocationCollector`]. This
|
||||||
also collects new `macro` calls from this new `AST` piece and
|
also collects new macro calls from this new AST piece and
|
||||||
adds them to the queue.
|
adds them to the queue.
|
||||||
- ["Def paths"][defpath] are created and [`DefId`]s are
|
- ["Def paths"][defpath] are created and [`DefId`]s are
|
||||||
assigned to them by [`DefCollector`].
|
assigned to them by [`DefCollector`].
|
||||||
- Names are put into modules (from the resolver's point of
|
- Names are put into modules (from the resolver's point of
|
||||||
view) by [`BuildReducedGraphVisitor`].
|
view) by [`BuildReducedGraphVisitor`].
|
||||||
3. After expanding a single `macro` and integrating its output, continue
|
3. After expanding a single macro and integrating its output, continue
|
||||||
to the next iteration of [`fully_expand_fragment`][fef].
|
to the next iteration of [`fully_expand_fragment`][fef].
|
||||||
5. If it's not resolved:
|
5. If it's not resolved:
|
||||||
1. Put the `macro` back in the queue.
|
1. Put the macro back in the queue.
|
||||||
2. Continue to next iteration...
|
2. Continue to next iteration...
|
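The queue-driven loop above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in `rustc_expand`: the `Expansion` struct, the `expand_all` function, and the macro names are all hypothetical, name resolution is reduced to a set lookup, and "integration" merely records which definitions and invocations an expansion reveals.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical description of what expanding one macro produces.
struct Expansion {
    /// Macros whose definitions appear in the output.
    defines: Vec<&'static str>,
    /// Macro invocations that appear in the output.
    invokes: Vec<&'static str>,
}

/// Toy version of the loop: returns the expansion order on success, or the
/// invocations that could never be resolved.
fn expand_all(
    initial: &[&'static str],
    mut visible: HashSet<&'static str>,
    outputs: &HashMap<&'static str, Expansion>,
) -> Result<Vec<&'static str>, Vec<&'static str>> {
    let mut queue: VecDeque<&'static str> = initial.iter().copied().collect();
    let mut expanded = Vec::new();
    // Number of consecutive invocations that failed to resolve.
    let mut stalled = 0;

    while let Some(name) = queue.pop_front() {
        if visible.contains(name) {
            // Resolved: "expand" it and integrate the output, which may
            // reveal new definitions and new invocations.
            expanded.push(name);
            if let Some(out) = outputs.get(name) {
                visible.extend(out.defines.iter().copied());
                queue.extend(out.invokes.iter().copied());
            }
            stalled = 0;
        } else {
            // Not resolved yet: put it back and try the next one.
            queue.push_back(name);
            stalled += 1;
            if stalled >= queue.len() {
                // A full pass over the queue made no progress.
                return Err(queue.into_iter().collect());
            }
        }
    }
    Ok(expanded)
}

fn main() {
    let mut outputs = HashMap::new();
    outputs.insert(
        "make_helper",
        Expansion { defines: vec!["helper"], invokes: vec![] },
    );
    let visible: HashSet<&'static str> = ["make_helper"].iter().copied().collect();

    // `helper` is invoked before the macro that defines it has been expanded.
    println!("{:?}", expand_all(&["helper", "make_helper"], visible.clone(), &outputs));
    // `missing` is never defined, so the loop gives up.
    println!("{:?}", expand_all(&["missing"], visible, &outputs));
}
```

In this model `helper` is put back in the queue once, resolves after `make_helper` has been expanded, and `missing` ends in the "no progress" error case.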
||||||
|
|
||||||
[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html
|
[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html
|
||||||
|
|
@ -112,9 +112,9 @@ iteration, this represents a compile error. Here is the [algorithm][original]:
|
||||||
### Error Recovery
|
### Error Recovery
|
||||||
|
|
||||||
If we make no progress in an iteration we have reached a compilation error
|
If we make no progress in an iteration we have reached a compilation error
|
||||||
(e.g. an undefined `macro`). We attempt to recover from failures (i.e.
|
(e.g. an undefined macro). We attempt to recover from failures (i.e.
|
||||||
unresolved `macro`s or imports) with the intent of generating diagnostics.
|
unresolved macros or imports) with the intent of generating diagnostics.
|
||||||
Failure recovery happens by expanding unresolved `macro`s into
|
Failure recovery happens by expanding unresolved macros into
|
||||||
[`ExprKind::Err`][err] and allows compilation to continue past the first error
|
[`ExprKind::Err`][err] and allows compilation to continue past the first error
|
||||||
so that `rustc` can report more errors than just the original failure.
|
so that `rustc` can report more errors than just the original failure.
|
||||||
|
|
||||||
|
|
@ -123,8 +123,8 @@ so that `rustc` can report more errors than just the original failure.
|
||||||
### Name Resolution
|
### Name Resolution
|
||||||
|
|
||||||
Notice that name resolution is involved here: we need to resolve imports and
|
Notice that name resolution is involved here: we need to resolve imports and
|
||||||
`macro` names in the above algorithm. This is done in
|
macro names in the above algorithm. This is done in
|
||||||
[`rustc_resolve::macros`][mresolve], which resolves `macro` paths, validates
|
[`rustc_resolve::macros`][mresolve], which resolves macro paths, validates
|
||||||
those resolutions, and reports various errors (e.g. "not found", "found, but
|
those resolutions, and reports various errors (e.g. "not found", "found, but
|
||||||
it's unstable", "expected x, found y"). However, we don't try to resolve
|
it's unstable", "expected x, found y"). However, we don't try to resolve
|
||||||
other names yet. This happens later, as we will see in the chapter: [Name
|
other names yet. This happens later, as we will see in the chapter: [Name
|
||||||
|
|
@ -134,10 +134,10 @@ Resolution](./name-resolution.md).
|
||||||
|
|
||||||
### Eager Expansion
|
### Eager Expansion
|
||||||
|
|
||||||
_Eager expansion_ means we expand the arguments of a `macro` invocation before
|
_Eager expansion_ means we expand the arguments of a macro invocation before
|
||||||
the `macro` invocation itself. This is implemented only for a few special
|
the macro invocation itself. This is implemented only for a few special
|
||||||
built-in `macro`s that expect literals; expanding arguments first for some of
|
built-in macros that expect literals; expanding arguments first for some of
|
||||||
these `macro` results in a smoother user experience. As an example, consider
|
these macros results in a smoother user experience. As an example, consider
|
||||||
the following:
|
the following:
|
||||||
|
|
||||||
```rust,ignore
|
```rust,ignore
|
||||||
|
|
@ -152,11 +152,11 @@ A lazy-expansion would expand `foo!` first. An eager-expansion would expand
|
||||||
|
|
||||||
Eager-expansion is not a generally available feature of Rust. Implementing
|
Eager-expansion is not a generally available feature of Rust. Implementing
|
||||||
eager-expansion more generally would be challenging, so we implement it for a
|
eager-expansion more generally would be challenging, so we implement it for a
|
||||||
few special built-in `macro`s for the sake of user-experience. The built-in
|
few special built-in macros for the sake of user-experience. The built-in
|
||||||
`macro`s are implemented in [`rustc_builtin_macros`], along with some other
|
macros are implemented in [`rustc_builtin_macros`], along with some other
|
||||||
early code generation facilities like injection of standard library imports or
|
early code generation facilities like injection of standard library imports or
|
||||||
generation of the test harness. There are some additional helpers for building
|
generation of the test harness. There are some additional helpers for building
|
||||||
`AST` fragments in [`rustc_expand::build`][reb]. Eager-expansion generally
|
AST fragments in [`rustc_expand::build`][reb]. Eager-expansion generally
|
||||||
performs a subset of the things that lazy (normal) expansion does. It is done
|
performs a subset of the things that lazy (normal) expansion does. It is done
|
||||||
by invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed
|
by invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed
|
||||||
to the whole crate, like we normally do).
|
to the whole crate, like we normally do).
|
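From the user's side, eager expansion is visible in built-in macros such as `concat!` and the format-string argument of `format_args!`. The snippet below is a small illustration using only standard macros; the names `GREETING` and `joined` are made up for the example.

```rust
// `concat!` expects literals. Its arguments are expanded first, so by the
// time `concat!` inspects its input, `stringify!(world)` is already the
// string literal "world".
const GREETING: &str = concat!("hello, ", stringify!(world), "!");

fn joined() -> String {
    // `format!` forwards to `format_args!`, whose format-string argument is
    // also expanded before being checked, so a nested `concat!` is accepted.
    format!(concat!("{}", "-", "{}"), 1, 2)
}

fn main() {
    println!("{}", GREETING);
    println!("{}", joined());
}
```

A user-defined `macro_rules!` macro in the same position would instead receive the unexpanded tokens `stringify!(world)`.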
||||||
|
|
@ -170,10 +170,10 @@ integration:
|
||||||
pretty much everything else depending on [`rustc_ast`].
|
pretty much everything else depending on [`rustc_ast`].
|
||||||
- [`ExtCtxt`]/[`ExpansionData`] - holds various intermediate expansion
|
- [`ExtCtxt`]/[`ExpansionData`] - holds various intermediate expansion
|
||||||
infrastructure data.
|
infrastructure data.
|
||||||
- [`Annotatable`] - a piece of `AST` that can be an attribute target, almost the same
|
- [`Annotatable`] - a piece of AST that can be an attribute target, almost the same
|
||||||
thing as [`AstFragment`] except for `type`s and patterns that can be produced by
|
thing as [`AstFragment`] except for types and patterns that can be produced by
|
||||||
`macro`s but cannot be annotated with attributes.
|
macros but cannot be annotated with attributes.
|
||||||
- [`MacResult`] - a "polymorphic" `AST` fragment, something that can turn into
|
- [`MacResult`] - a "polymorphic" AST fragment, something that can turn into
|
||||||
a different [`AstFragment`] depending on its [`AstFragmentKind`] (i.e. an item,
|
a different [`AstFragment`] depending on its [`AstFragmentKind`] (i.e. an item,
|
||||||
expression, pattern, etc).
|
expression, pattern, etc).
|
||||||
|
|
||||||
|
|
@ -223,16 +223,16 @@ we got `foo(0, 0)` because the macro defined its own `y`!
|
||||||
|
|
||||||
These are both examples of _macro hygiene_ issues. _Hygiene_ relates to how to
|
These are both examples of _macro hygiene_ issues. _Hygiene_ relates to how to
|
||||||
handle names defined _within a macro_. In particular, a hygienic macro system
|
handle names defined _within a macro_. In particular, a hygienic macro system
|
||||||
prevents errors due to names introduced within a macro. Rust `macro`s are hygienic
|
prevents errors due to names introduced within a macro. Rust macros are hygienic
|
||||||
in that they do not allow one to write the sorts of bugs above.
|
in that they do not allow one to write the sorts of bugs above.
|
||||||
|
|
||||||
At a high level, hygiene within the Rust compiler is accomplished by keeping
|
At a high level, hygiene within the Rust compiler is accomplished by keeping
|
||||||
track of the context where a name is introduced and used. We can then
|
track of the context where a name is introduced and used. We can then
|
||||||
disambiguate names based on that context. Future iterations of the `macro` system
|
disambiguate names based on that context. Future iterations of the macro system
|
||||||
will allow greater control to the `macro` author to use that context. For example,
|
will allow greater control to the macro author to use that context. For example,
|
||||||
a `macro` author may want to introduce a new name to the context where the `macro`
|
a macro author may want to introduce a new name to the context where the macro
|
||||||
was called. Alternately, the `macro` author may be defining a variable for use
|
was called. Alternately, the macro author may be defining a variable for use
|
||||||
only within the `macro` (i.e. it should not be visible outside the `macro`).
|
only within the macro (i.e. it should not be visible outside the macro).
|
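As a small illustration of the second case (a variable meant for use only within the macro), consider the following `macro_rules!` sketch. The macro name `scaled` and the variable names are made up for the example.

```rust
macro_rules! scaled {
    ($e:expr) => {{
        // This `factor` is introduced inside the macro definition, so it
        // carries the macro's syntax context.
        let factor = 2;
        $e * factor
    }};
}

fn caller() -> i32 {
    // This `factor` belongs to the call site.
    let factor = 10;
    // The `factor` inside the argument still refers to the caller's binding,
    // even though the expansion declares its own `factor` before using `$e`.
    scaled!(factor + 1)
}

fn main() {
    println!("{}", caller());
}
```

Without hygiene, the macro's `let factor = 2;` would shadow the caller's variable and the argument would evaluate to `2 + 1`; with hygiene the two names are kept apart and the result is `(10 + 1) * 2`.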
||||||
|
|
||||||
[code_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_expand/src/mbe
|
[code_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_expand/src/mbe
|
||||||
[code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser
|
[code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser
|
||||||
|
|
@ -240,18 +240,18 @@ only within the `macro` (i.e. it should not be visible outside the `macro`).
|
||||||
[code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/struct.TtParser.html#method.parse_tt
|
[code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/struct.TtParser.html#method.parse_tt
|
||||||
[parsing]: ./the-parser.html
|
[parsing]: ./the-parser.html
|
||||||
|
|
||||||
The context is attached to `AST` nodes. All `AST` nodes generated by `macro`s have
|
The context is attached to AST nodes. All AST nodes generated by macros have
|
||||||
context attached. Additionally, there may be other nodes that have context
|
context attached. Additionally, there may be other nodes that have context
|
||||||
attached, such as some desugared syntax (non-`macro`-expanded nodes are
|
attached, such as some desugared syntax (non-macro-expanded nodes are
|
||||||
considered to just have the "root" context, as described below).
|
considered to just have the "root" context, as described below).
|
||||||
Throughout the compiler, we use [`rustc_span::Span`s][span] to refer to code locations.
|
Throughout the compiler, we use [`rustc_span::Span`s][span] to refer to code locations.
|
||||||
This struct also has hygiene information attached to it, as we will see later.
|
This struct also has hygiene information attached to it, as we will see later.
|
||||||
|
|
||||||
[span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html
|
[span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html
|
||||||
|
|
||||||
Because `macro`s invocations and definitions can be nested, the syntax context of
|
Because macro invocations and definitions can be nested, the syntax context of
|
||||||
a node must be a hierarchy. For example, if we expand a `macro` and there is
|
a node must be a hierarchy. For example, if we expand a macro and there is
|
||||||
another `macro` invocation or definition in the generated output, then the syntax
|
another macro invocation or definition in the generated output, then the syntax
|
||||||
context should reflect the nesting.
|
context should reflect the nesting.
|
||||||
|
|
||||||
However, it turns out that there are actually a few types of context we may
|
However, it turns out that there are actually a few types of context we may
|
||||||
|
|
@ -259,13 +259,13 @@ want to track for different purposes. Thus, there are not just one but _three_
|
||||||
expansion hierarchies that together comprise the hygiene information for a
|
expansion hierarchies that together comprise the hygiene information for a
|
||||||
crate.
|
crate.
|
||||||
|
|
||||||
All of these hierarchies need some sort of "`macro` ID" to identify individual
|
All of these hierarchies need some sort of "macro ID" to identify individual
|
||||||
elements in the chain of expansions. This ID is [`ExpnId`]. All `macro`s receive
|
elements in the chain of expansions. This ID is [`ExpnId`]. All macros receive
|
||||||
an integer ID, assigned continuously starting from 0 as we discover new `macro`
|
an integer ID, assigned continuously starting from 0 as we discover new macro
|
||||||
calls. All hierarchies start at [`ExpnId::root`][rootid], which is its own
|
calls. All hierarchies start at [`ExpnId::root`][rootid], which is its own
|
||||||
parent.
|
parent.
|
||||||
|
|
||||||
The [`rustc_span::hygiene`][hy] library contains all of the hygiene-related algorithms
|
The [`rustc_span::hygiene`][hy] module contains all of the hygiene-related algorithms
|
||||||
(with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks])
|
(with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks])
|
||||||
and structures related to hygiene and expansion that are kept in global data.
|
and structures related to hygiene and expansion that are kept in global data.
|
||||||
|
|
||||||
|
|
@ -283,12 +283,12 @@ any [`Ident`] without any context.
|
||||||
|
|
||||||
### The Expansion Order Hierarchy
|
### The Expansion Order Hierarchy
|
||||||
|
|
||||||
The first hierarchy tracks the order of expansions, i.e., when a `macro`
|
The first hierarchy tracks the order of expansions, i.e., when a macro
|
||||||
invocation is in the output of another `macro`.
|
invocation is in the output of another macro.
|
||||||
|
|
||||||
Here, the children in the hierarchy will be the "innermost" tokens. The
|
Here, the children in the hierarchy will be the "innermost" tokens. The
|
||||||
[`ExpnData`] struct itself contains a subset of properties from both `macro`
|
[`ExpnData`] struct itself contains a subset of properties from both macro
|
||||||
definition and `macro` call available through global data.
|
definition and macro call available through global data.
|
||||||
[`ExpnData::parent`][edp] tracks the child-to-parent link in this hierarchy.
|
[`ExpnData::parent`][edp] tracks the child-to-parent link in this hierarchy.
|
||||||
|
|
||||||
[`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html
|
[`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html
|
||||||
|
|
@ -302,13 +302,13 @@ macro_rules! foo { () => { println!(); } }
|
||||||
fn main() { foo!(); }
|
fn main() { foo!(); }
|
||||||
```
|
```
|
||||||
|
|
||||||
In this code, the `AST` nodes that are finally generated would have hierarchy
|
In this code, the AST nodes that are finally generated would have hierarchy
|
||||||
`root -> id(foo) -> id(println)`.
|
`root -> id(foo) -> id(println)`.
|
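The parent links of this hierarchy can be modelled with a small table. The sketch below is a toy, not the real `rustc_span::hygiene` data structures: `ToyExpnId` and `ToyExpnTable` are hypothetical names, and each entry stores only a macro name and a parent ID.

```rust
/// Hypothetical stand-in for `ExpnId`: an index into a table of expansions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ToyExpnId(usize);

/// Hypothetical stand-in for the global expansion data.
struct ToyExpnTable {
    /// One `(macro name, parent)` pair per expansion, indexed by ID.
    entries: Vec<(&'static str, ToyExpnId)>,
}

impl ToyExpnTable {
    /// The root expansion has ID 0 and is its own parent.
    const ROOT: ToyExpnId = ToyExpnId(0);

    fn new() -> Self {
        Self { entries: vec![("root", Self::ROOT)] }
    }

    /// IDs are handed out consecutively as new macro calls are discovered.
    fn fresh(&mut self, name: &'static str, parent: ToyExpnId) -> ToyExpnId {
        let id = ToyExpnId(self.entries.len());
        self.entries.push((name, parent));
        id
    }

    /// Follows the child-to-parent links up to the root and returns the
    /// chain in root-first order.
    fn chain(&self, mut id: ToyExpnId) -> Vec<&'static str> {
        let mut names = Vec::new();
        loop {
            let (name, parent) = self.entries[id.0];
            names.push(name);
            if parent == id {
                break; // reached the root, which is its own parent
            }
            id = parent;
        }
        names.reverse();
        names
    }
}

fn main() {
    let mut table = ToyExpnTable::new();
    // `foo!()` is called from ordinary code; `println!()` appears in its output.
    let foo = table.fresh("foo", ToyExpnTable::ROOT);
    let println_id = table.fresh("println", foo);
    println!("{}", table.chain(println_id).join(" -> "));
}
```

For the `foo!`/`println!` example, walking the parent links from the `println` expansion yields the chain `root -> foo -> println`.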
||||||
|
|
||||||
### The Macro Definition Hierarchy
|
### The Macro Definition Hierarchy
|
||||||
|
|
||||||
The second hierarchy tracks the order of `macro` definitions, i.e., when we are
|
The second hierarchy tracks the order of macro definitions, i.e., when we are
|
||||||
expanding one `macro` another `macro` definition is revealed in its output. This
|
expanding one macro, another macro definition is revealed in its output. This
|
||||||
one is a bit tricky and more complex than the other two hierarchies.
|
one is a bit tricky and more complex than the other two hierarchies.
|
||||||
|
|
||||||
[`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID.
|
[`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID.
|
||||||
|
|
@ -330,15 +330,15 @@ a code location and [`SyntaxContext`][sc]. Likewise, an [`Ident`] is just an int
|
||||||
[scdoe]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.outer_expn
|
[scdoe]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.outer_expn
|
||||||
[am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
|
[am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
|
||||||
|
|
||||||
For built-in `macro`s, we use the context:
|
For built-in macros, we use the context:
|
||||||
[`SyntaxContext::empty().apply_mark(expn_id)`], and such `macro`s are
|
[`SyntaxContext::empty().apply_mark(expn_id)`], and such macros are
|
||||||
considered to be defined at the hierarchy root. We do the same for `proc
|
considered to be defined at the hierarchy root. We do the same for `proc
|
||||||
macro`s because we haven't implemented cross-crate hygiene yet.
|
macro`s because we haven't implemented cross-crate hygiene yet.
|
||||||
|
|
||||||
[`SyntaxContext::empty().apply_mark(expn_id)`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
|
[`SyntaxContext::empty().apply_mark(expn_id)`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
|
||||||
|
|
||||||
If the token had context `X` before being produced by a `macro` then after being
|
If the token had context `X` before being produced by a macro then after being
|
||||||
produced by the `macro` it has context `X -> macro_id`. Here are some examples:
|
produced by the macro it has context `X -> macro_id`. Here are some examples:
|
||||||
|
|
||||||
Example 0:
|
Example 0:
|
||||||
|
|
||||||
|
|
@ -379,9 +379,9 @@ m!(foo);
|
||||||
After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context
|
After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context
|
||||||
`ROOT -> id(m) -> id(n)`.
|
`ROOT -> id(m) -> id(n)`.
|
||||||
|
|
||||||
Currently this hierarchy for tracking `macro` definitions is subject to the
|
Currently this hierarchy for tracking macro definitions is subject to the
|
||||||
so-called ["context transplantation hack"][hack]. Modern (i.e. experimental)
|
so-called ["context transplantation hack"][hack]. Modern (i.e. experimental)
|
||||||
`macro`s have stronger hygiene than the legacy "Macros By Example" (`MBE`)
|
macros have stronger hygiene than the legacy "Macros By Example" (MBE)
|
||||||
system which can result in weird interactions between the two. The hack is
|
system which can result in weird interactions between the two. The hack is
|
||||||
intended to make things "just work" for now.
|
intended to make things "just work" for now.
|
||||||
|
|
||||||
|
|
@ -390,7 +390,7 @@ intended to make things "just work" for now.
|
||||||
|
|
||||||
### The Call-site Hierarchy
|
### The Call-site Hierarchy
|
||||||
|
|
||||||
The third and final hierarchy tracks the location of `macro` invocations.
|
The third and final hierarchy tracks the location of macro invocations.
|
||||||
|
|
||||||
In this hierarchy [`ExpnData::call_site`][callsite] is the `child -> parent`
|
In this hierarchy [`ExpnData::call_site`][callsite] is the `child -> parent`
|
||||||
link.
|
link.
|
||||||
|
|
@ -406,38 +406,38 @@ macro foo($i: ident) { $i }
|
||||||
foo!(bar!(baz));
|
foo!(bar!(baz));
|
||||||
```
|
```
|
||||||
|
|
||||||
For the `baz` `AST` node in the final output, the expansion-order hierarchy is
|
For the `baz` AST node in the final output, the expansion-order hierarchy is
|
||||||
`ROOT -> id(foo) -> id(bar) -> baz`, while the call-site hierarchy is `ROOT ->
|
`ROOT -> id(foo) -> id(bar) -> baz`, while the call-site hierarchy is `ROOT ->
|
||||||
baz`.
|
baz`.
|
||||||
|
|
||||||
### Macro Backtraces
|
### Macro Backtraces
|
||||||
|
|
||||||
`macro` backtraces are implemented in [`rustc_span`] using the hygiene machinery
|
Macro backtraces are implemented in [`rustc_span`] using the hygiene machinery
|
||||||
in [`rustc_span::hygiene`][hy].
|
in [`rustc_span::hygiene`][hy].
|
||||||
|
|
||||||
[`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html
|
[`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html
|
||||||
|
|
||||||
## Producing Macro Output
|
## Producing Macro Output
|
||||||
|
|
||||||
Above, we saw how the output of a `macro` is integrated into the `AST` for a crate,
|
Above, we saw how the output of a macro is integrated into the AST for a crate,
|
||||||
and we also saw how the hygiene data for a crate is generated. But how do we
|
and we also saw how the hygiene data for a crate is generated. But how do we
|
||||||
actually produce the output of a `macro`? It depends on the type of `macro`.
|
actually produce the output of a macro? It depends on the type of macro.
|
||||||
|
|
||||||
There are two types of `macro`s in Rust:
|
There are two types of macros in Rust:
|
||||||
1. `macro_rules!` macros, and,
|
1. `macro_rules!` macros (a.k.a. "Macros By Example" (MBE)), and,
|
||||||
2. procedural `macro`s (`proc macro`s); including custom derives.
|
2. procedural macros (proc macros); including custom derives.
|
||||||
|
|
||||||
During the parsing phase, the normal Rust parser will set aside the contents of
|
During the parsing phase, the normal Rust parser will set aside the contents of
|
||||||
`macro`s and their invocations. Later, `macro`s are expanded using these
|
macros and their invocations. Later, macros are expanded using these
|
||||||
portions of the code.
|
portions of the code.
|
||||||
|
|
||||||
Some important data structures/interfaces here:
|
Some important data structures/interfaces here:
|
||||||
- [`SyntaxExtension`] - a lowered `macro` representation, contains its expander
|
- [`SyntaxExtension`] - a lowered macro representation, contains its expander
|
||||||
function, which transforms a [`TokenStream`] or `AST` into another
|
function, which transforms a [`TokenStream`] or AST into another
|
||||||
[`TokenStream`] or `AST` + some additional data like stability, or a list of
|
[`TokenStream`] or AST + some additional data like stability, or a list of
|
||||||
unstable features allowed inside the `macro`.
|
unstable features allowed inside the macro.
|
||||||
- [`SyntaxExtensionKind`] - expander functions may have several different
|
- [`SyntaxExtensionKind`] - expander functions may have several different
|
||||||
signatures (take one token stream, or two, or a piece of `AST`, etc). This is
|
signatures (take one token stream, or two, or a piece of AST, etc). This is
|
||||||
an `enum` that lists them.
|
an `enum` that lists them.
|
||||||
- [`BangProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] -
|
- [`BangProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] -
|
||||||
`trait`s representing the expander function signatures.
|
`trait`s representing the expander function signatures.
|
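
To make the idea of "several different expander signatures" concrete, here is a minimal toy model. It is only a sketch: `ToyTokens`, `ToyExtensionKind`, `expand`, `duplicate`, `prepend_args`, and `toks` are hypothetical names invented for illustration and do not mirror the real `SyntaxExtensionKind` variants or trait signatures in `rustc_expand`.

```rust
/// Toy token stream: just a list of token strings (an assumption of this sketch).
type ToyTokens = Vec<String>;

/// Hypothetical, heavily simplified analogue of an enum listing expander kinds.
enum ToyExtensionKind {
    /// Function-like macro: one token stream in, one token stream out.
    Bang(fn(ToyTokens) -> ToyTokens),
    /// Attribute-like macro: attribute arguments plus the annotated item.
    Attr(fn(ToyTokens, ToyTokens) -> ToyTokens),
}

/// Dispatch on the kind to call the expander with the inputs it expects.
fn expand(kind: &ToyExtensionKind, attr_args: ToyTokens, input: ToyTokens) -> ToyTokens {
    match kind {
        ToyExtensionKind::Bang(f) => f(input),
        ToyExtensionKind::Attr(f) => f(attr_args, input),
    }
}

/// Example "bang" expander: emits its input twice.
fn duplicate(input: ToyTokens) -> ToyTokens {
    let mut out = input.clone();
    out.extend(input);
    out
}

/// Example "attribute" expander: puts the attribute arguments before the item.
fn prepend_args(args: ToyTokens, item: ToyTokens) -> ToyTokens {
    let mut out = args;
    out.extend(item);
    out
}

/// Helper to build a toy token stream from string slices.
fn toks(words: &[&str]) -> ToyTokens {
    words.iter().map(|w| w.to_string()).collect()
}

fn main() {
    let bang = ToyExtensionKind::Bang(duplicate);
    println!("{:?}", expand(&bang, vec![], toks(&["a", "b"])));

    let attr = ToyExtensionKind::Attr(prepend_args);
    println!("{:?}", expand(&attr, toks(&["x"]), toks(&["y"])));
}
```

The point of the enum is that the caller can store every kind of expander behind one type and still call each with the right arguments.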
||||||
|
|
@ -451,11 +451,11 @@ Some important data structures/interfaces here:
|
||||||
|
|
||||||
## Macros By Example
|
## Macros By Example
|
||||||
|
|
||||||
`MBE`s have their own parser distinct from the Rust parser. When `macro`s are
|
MBEs have their own parser distinct from the Rust parser. When macros are
|
||||||
expanded, we may invoke the `MBE` parser to parse and expand a `macro`. The
|
expanded, we may invoke the MBE parser to parse and expand a macro. The
|
||||||
`MBE` parser, in turn, may call the Rust parser when it needs to bind a
|
MBE parser, in turn, may call the Rust parser when it needs to bind a
|
||||||
metavariable (e.g. `$my_expr`) while parsing the contents of a `macro`
|
metavariable (e.g. `$my_expr`) while parsing the contents of a macro
|
||||||
invocation. The code for `macro` expansion is in
|
invocation. The code for macro expansion is in
|
||||||
[`compiler/rustc_expand/src/mbe/`][code_dir].
|
[`compiler/rustc_expand/src/mbe/`][code_dir].
|
||||||
|
|
||||||
### Example
|
### Example
|
||||||
|
|
@ -480,9 +480,9 @@ special tokens, such as `EOF`, which itself indicates that there are no more
|
||||||
tokens. There are token trees resulting from the paired parentheses-like
|
tokens. There are token trees resulting from the paired parentheses-like
|
||||||
characters (`(`...`)`, `[`...`]`, and `{`...`}`) – they include the open and
|
characters (`(`...`)`, `[`...`]`, and `{`...`}`) – they include the open and
|
||||||
close and all the tokens in between (Rust requires that parentheses-like
|
close and all the tokens in between (Rust requires that parentheses-like
|
||||||
characters be balanced). Having `macro` expansion operate on token streams
|
characters be balanced). Having macro expansion operate on token streams
|
||||||
rather than the raw bytes of a source-file abstracts away a lot of complexity.
|
rather than the raw bytes of a source-file abstracts away a lot of complexity.
|
||||||
The `macro` expander (and much of the rest of the compiler) doesn't consider
|
The macro expander (and much of the rest of the compiler) doesn't consider
|
||||||
the exact line and column of some syntactic construct in the code; it considers
|
the exact line and column of some syntactic construct in the code; it considers
|
||||||
which constructs are used in the code. Using tokens allows us to care about
|
which constructs are used in the code. Using tokens allows us to care about
|
||||||
_what_ without worrying about _where_. For more information about tokens, see
|
_what_ without worrying about _where_. For more information about tokens, see
|
||||||
|
|
@ -492,23 +492,23 @@ the [Parsing][parsing] chapter of this book.
|
||||||
printer!(print foo); // `foo` is a variable
|
printer!(print foo); // `foo` is a variable
|
||||||
```
|
```
|
||||||
|
|
||||||
The process of expanding the `macro` invocation into the syntax tree
|
The process of expanding the macro invocation into the syntax tree
|
||||||
`println!("{}", foo)` and then expanding the syntax tree into a call to
|
`println!("{}", foo)` and then expanding the syntax tree into a call to
|
||||||
`Display::fmt` is one common example of _`macro` expansion_.
|
`Display::fmt` is one common example of _macro expansion_.
|
||||||
|
|
||||||
### The MBE parser
|
### The MBE parser
|
||||||
|
|
||||||
There are two parts to `MBE` expansion done by the `macro` parser:
|
There are two parts to MBE expansion done by the macro parser:
|
||||||
1. parsing the definition, and,
|
1. parsing the definition, and,
|
||||||
2. parsing the invocations.
|
2. parsing the invocations.
|
||||||
|
|
||||||
We think of the `MBE` parser as a nondeterministic finite automaton (NFA) based
|
We think of the MBE parser as a nondeterministic finite automaton (NFA) based
|
||||||
regex parser since it uses an algorithm similar in spirit to the [Earley
|
regex parser since it uses an algorithm similar in spirit to the [Earley
|
||||||
parsing algorithm](https://en.wikipedia.org/wiki/Earley_parser). The `macro`
|
parsing algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro
|
||||||
parser is defined in
|
parser is defined in
|
||||||
[`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
|
[`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
|
||||||
|
|
||||||
The interface of the `macro` parser is as follows (this is slightly simplified):
|
The interface of the macro parser is as follows (this is slightly simplified):
|
||||||
|
|
||||||
```rust,ignore
|
```rust,ignore
|
||||||
fn parse_tt(
|
fn parse_tt(
|
||||||
|
|
@ -518,11 +518,11 @@ fn parse_tt(
|
||||||
) -> ParseResult
|
) -> ParseResult
|
||||||
```
|
```
|
||||||
|
|
||||||
We use these items in the `macro` parser:
|
We use these items in the macro parser:
|
||||||
|
|
||||||
- a `parser` variable is a reference to the state of a normal Rust parser,
|
- a `parser` variable is a reference to the state of a normal Rust parser,
|
||||||
including the token stream and parsing session. The token stream is what we
|
including the token stream and parsing session. The token stream is what we
|
||||||
are about to ask the `MBE` parser to parse. We will consume the raw stream of
|
are about to ask the MBE parser to parse. We will consume the raw stream of
|
||||||
tokens and output a binding of metavariables to corresponding token trees.
|
tokens and output a binding of metavariables to corresponding token trees.
|
||||||
The parsing session can be used to report parser errors.
|
The parsing session can be used to report parser errors.
|
||||||
- a `matcher` variable is a sequence of [`MatcherLoc`]s that we want to match
|
- a `matcher` variable is a sequence of [`MatcherLoc`]s that we want to match
|
||||||
|
|
@ -531,73 +531,73 @@ We use these items in `macro` parser:
|
||||||
[`MatcherLoc`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.MatcherLoc.html
|
[`MatcherLoc`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.MatcherLoc.html
|
||||||
|
|
||||||
In the analogy of a regex parser, the token stream is the input and we are
|
In the analogy of a regex parser, the token stream is the input and we are
|
||||||
matching it against the pattern defined by `matcher`. Using our examples, the
|
matching it against the pattern defined by matcher. Using our examples, the
|
||||||
token stream could be the stream of tokens containing the inside of the example
|
token stream could be the stream of tokens containing the inside of the example
|
||||||
invocation `print foo`, while `matcher` might be the sequence of token (trees)
|
invocation `print foo`, while matcher might be the sequence of token (trees)
|
||||||
`print $mvar:ident`.
|
`print $mvar:ident`.
|
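
The regex analogy can be sketched with a toy matcher. This is a deliberately tiny model under stated assumptions: tokens are plain strings, the only metavariable kind is `ident`, and there are no repetitions or nested token trees. `ToyMatcherLoc`, `is_ident`, and `toy_match` are hypothetical names, not the real [`MatcherLoc`] or `parse_tt`.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified matcher position: a literal token or an `ident` metavariable.
#[derive(Debug, Clone, PartialEq)]
enum ToyMatcherLoc {
    Token(&'static str),
    MetaVarIdent(&'static str),
}

/// Stand-in for "calling back to the Rust parser" to check for an identifier.
fn is_ident(tok: &str) -> bool {
    let mut chars = tok.chars();
    match chars.next() {
        Some(c) if c.is_alphabetic() || c == '_' => {
            chars.all(|c| c.is_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

/// Match the input tokens against the matcher, returning metavariable bindings on success.
fn toy_match(matcher: &[ToyMatcherLoc], tokens: &[&str]) -> Option<HashMap<String, String>> {
    // Without repetitions, a match needs exactly one token per matcher position.
    if matcher.len() != tokens.len() {
        return None;
    }
    let mut bindings = HashMap::new();
    for (loc, tok) in matcher.iter().zip(tokens) {
        match loc {
            ToyMatcherLoc::Token(expected) => {
                if expected != tok {
                    return None;
                }
            }
            ToyMatcherLoc::MetaVarIdent(name) => {
                if !is_ident(tok) {
                    return None;
                }
                bindings.insert(name.to_string(), tok.to_string());
            }
        }
    }
    Some(bindings)
}

fn main() {
    // Matcher for `print $mvar:ident`, input tokens for `print foo`.
    let matcher = [
        ToyMatcherLoc::Token("print"),
        ToyMatcherLoc::MetaVarIdent("mvar"),
    ];
    println!("{:?}", toy_match(&matcher, &["print", "foo"]));
    println!("{:?}", toy_match(&matcher, &["print", "twice", "foo"]));
}
```

In this sketch the first call binds `mvar` to `foo`, and the second call fails because the input has one token too many for this matcher.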
||||||
|
|
||||||
The output of the parser is a [`ParseResult`], which indicates which of
|
The output of the parser is a [`ParseResult`], which indicates which of
|
||||||
three cases has occurred:
|
three cases has occurred:
|
||||||
|
|
||||||
- **Success**: the token stream matches the given `matcher` and we have produced a
|
- **Success**: the token stream matches the given matcher and we have produced a
|
||||||
binding from metavariables to the corresponding token trees.
|
binding from metavariables to the corresponding token trees.
|
||||||
- **Failure**: the token stream does not match `matcher` and results in an error
|
- **Failure**: the token stream does not match matcher and results in an error
|
||||||
message such as "No rule expected token ...".
|
message such as "No rule expected token ...".
|
||||||
- **Error**: some fatal error has occurred _in the parser_. For example, this
|
- **Error**: some fatal error has occurred _in the parser_. For example, this
|
||||||
happens if there is more than one pattern match, since that indicates the
|
happens if there is more than one pattern match, since that indicates the
|
||||||
`macro` is ambiguous.
|
macro is ambiguous.
|
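
The three outcomes can be modeled as a small enum. This is a simplified sketch, not the real [`ParseResult`]: `ToyParseResult` and `describe` are hypothetical names, and the payload types (a map of token strings, plain message strings) are assumptions chosen to keep the example self-contained.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the three outcomes described above.
#[derive(Debug, PartialEq)]
enum ToyParseResult {
    /// The input matched; maps each metavariable name to the tokens bound to it.
    Success(HashMap<String, Vec<String>>),
    /// The input did not match this matcher; carries a diagnostic message.
    Failure(String),
    /// A fatal problem occurred in the parser itself.
    Error(String),
}

/// Turn a result into a short human-readable summary.
fn describe(result: &ToyParseResult) -> String {
    match result {
        ToyParseResult::Success(bindings) => {
            format!("matched with {} binding(s)", bindings.len())
        }
        ToyParseResult::Failure(msg) => format!("no match: {}", msg),
        ToyParseResult::Error(msg) => format!("fatal: {}", msg),
    }
}

fn main() {
    let mut bindings = HashMap::new();
    bindings.insert("mvar".to_string(), vec!["foo".to_string()]);

    println!("{}", describe(&ToyParseResult::Success(bindings)));
    println!("{}", describe(&ToyParseResult::Failure("unexpected token".to_string())));
    println!("{}", describe(&ToyParseResult::Error("ambiguous parse".to_string())));
}
```

Keeping **Failure** and **Error** as separate variants lets a caller move on to another matcher after a failure while stopping immediately on an error.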
||||||
|
|
||||||
The full interface is defined [here][code_parse_int].
|
The full interface is defined [here][code_parse_int].
|
||||||
|
|
||||||
The `macro` parser does pretty much exactly the same as a normal regex parser
|
The macro parser does pretty much exactly the same as a normal regex parser
|
||||||
with one exception: in order to parse different types of metavariables, such as
|
with one exception: in order to parse different types of metavariables, such as
|
||||||
`ident`, `block`, `expr`, etc., the `macro` parser must call back to the normal
|
`ident`, `block`, `expr`, etc., the macro parser must call back to the normal
|
||||||
Rust parser. Both the definition and invocation of `macro`s are parsed using
|
Rust parser. Both the definition and invocation of macros are parsed using
|
||||||
the parser in a process which is non-intuitively self-referential.
|
the parser in a process which is non-intuitively self-referential.
|
||||||
|
|
||||||
The code to parse `macro` _definitions_ is in
|
The code to parse macro _definitions_ is in
|
||||||
[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the
|
[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the
|
||||||
pattern for matching a `macro` definition as `$( $lhs:tt => $rhs:tt );+`. In
|
pattern for matching a macro definition as `$( $lhs:tt => $rhs:tt );+`. In
|
||||||
other words, a `macro_rules` definition should have in its body at least one
|
other words, a `macro_rules` definition should have in its body at least one
|
||||||
occurrence of a token tree followed by `=>` followed by another token tree.
|
occurrence of a token tree followed by `=>` followed by another token tree.
|
||||||
When the compiler comes to a `macro_rules` definition, it uses this pattern to
|
When the compiler comes to a `macro_rules` definition, it uses this pattern to
|
||||||
match the two token trees per the rules of the definition of the `macro`, _thereby
|
match the two token trees per the rules of the definition of the macro, _thereby
|
||||||
utilizing the `macro` parser itself_. In our example definition, the
|
utilizing the macro parser itself_. In our example definition, the
|
||||||
metavariable `$lhs` would match the patterns of both arms: `(print
|
metavariable `$lhs` would match the patterns of both arms: `(print
|
||||||
$mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs` would match the
|
$mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs` would match the
|
||||||
bodies of both arms: `{ println!("{}", $mvar); }` and `{ println!("{}", $mvar);
|
bodies of both arms: `{ println!("{}", $mvar); }` and `{ println!("{}", $mvar);
|
||||||
println!("{}", $mvar); }`. The parser keeps this knowledge around for when it
|
println!("{}", $mvar); }`. The parser keeps this knowledge around for when it
|
||||||
needs to expand a `macro` invocation.
|
needs to expand a macro invocation.
|
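
For reference, the two arms quoted above can be written as an ordinary `macro_rules!` definition. This is a sketch reconstructed from the arms in the text, with one deliberate change that is an assumption of this example: the bodies use `format!` and return a `String` instead of calling `println!`, so that the result of each expansion can be inspected.

```rust
macro_rules! printer {
    // First arm: `$lhs` is `(print $mvar:ident)`, `$rhs` is the braced body.
    (print $mvar:ident) => {
        format!("{}", $mvar)
    };
    // Second arm: `$lhs` is `(print twice $mvar:ident)`.
    (print twice $mvar:ident) => {
        format!("{0}{0}", $mvar)
    };
}

fn main() {
    let foo = 7; // `foo` is a variable

    // Tokens `print foo` are matched against the first arm's matcher.
    println!("{}", printer!(print foo));

    // Tokens `print twice foo` do not fit the first arm, but fit the second.
    println!("{}", printer!(print twice foo));
}
```

Each invocation passes the tokens between the parentheses to the macro parser, which binds `$mvar` to the identifier `foo` before the body is transcribed.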
||||||
|
|
||||||
When the compiler comes to a `macro` invocation, it parses that invocation using
|
When the compiler comes to a macro invocation, it parses that invocation using
|
||||||
the NFA-based `macro` parser described above. However, the `matcher` variable
|
the NFA-based macro parser described above. However, the matcher variable
|
||||||
used is the first token tree (`$lhs`) extracted from the arms of the `macro`
|
used is the first token tree (`$lhs`) extracted from the arms of the macro
|
||||||
_definition_. Using our example, we would try to match the token stream `print
|
_definition_. Using our example, we would try to match the token stream `print
|
||||||
foo` from the invocation against the `matcher`s `print $mvar:ident` and `print
|
foo` from the invocation against the matchers `print $mvar:ident` and `print
|
||||||
twice $mvar:ident` that we previously extracted from the definition. The
|
twice $mvar:ident` that we previously extracted from the definition. The
|
||||||
algorithm is exactly the same, but when the `macro` parser comes to a place in the
|
algorithm is exactly the same, but when the macro parser comes to a place in the
|
||||||
current `matcher` where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
|
current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
|
||||||
it calls back to the normal Rust parser to get the contents of that
|
it calls back to the normal Rust parser to get the contents of that
|
||||||
non-terminal. In this case, the Rust parser would look for an `ident` token,
|
non-terminal. In this case, the Rust parser would look for an `ident` token,
|
||||||
which it finds (`foo`) and returns to the `macro` parser. Then, the `macro` parser
|
which it finds (`foo`) and returns to the macro parser. Then, the macro parser
|
||||||
proceeds in parsing as normal. Also, note that exactly one of the `matcher`s from
|
proceeds in parsing as normal. Also, note that exactly one of the matchers from
|
||||||
the various arms should match the invocation; if there is more than one match,
|
the various arms should match the invocation; if there is more than one match,
|
||||||
the parse is ambiguous, while if there are no matches at all, there is a syntax
|
the parse is ambiguous, while if there are no matches at all, there is a syntax
|
||||||
error.
|
error.
|
||||||
|
|
||||||
For more information about the `macro` parser's implementation, see the comments
|
For more information about the macro parser's implementation, see the comments
|
||||||
in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
|
in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
|
||||||
|
|
||||||
## Procedural Macros
|
## Procedural Macros
|
||||||
|
|
||||||
Procedural `macro`s are also expanded during parsing. However, rather than
|
Procedural macros are also expanded during parsing. However, rather than
|
||||||
having a parser in the compiler, `proc macro`s are implemented as custom,
|
having a parser in the compiler, proc macros are implemented as custom,
|
||||||
third-party crates. The compiler will compile the `proc macro` crate and
|
third-party crates. The compiler will compile the proc macro crate and
|
||||||
specially annotated functions in them (i.e. the `proc macro` itself), passing
|
specially annotated functions in them (i.e. the proc macro itself), passing
|
||||||
them a stream of tokens. A `proc macro` can then transform the token stream and
|
them a stream of tokens. A proc macro can then transform the token stream and
|
||||||
output a new token stream, which is synthesized into the `AST`.
|
output a new token stream, which is synthesized into the AST.
|
||||||
|
|
||||||
The token stream type used by `proc macro`s is _stable_, so `rustc` does not
|
The token stream type used by proc macros is _stable_, so `rustc` does not
|
||||||
use it internally. The compiler's (unstable) token stream is defined in
|
use it internally. The compiler's (unstable) token stream is defined in
|
||||||
[`rustc_ast::tokenstream::TokenStream`][rustcts]. This is converted into the
|
[`rustc_ast::tokenstream::TokenStream`][rustcts]. This is converted into the
|
||||||
stable [`proc_macro::TokenStream`][stablets] and back in
|
stable [`proc_macro::TokenStream`][stablets] and back in
|
||||||
|
|
@ -615,13 +615,13 @@ Since the Rust ABI is currently unstable, we use the C ABI for this conversion.
|
||||||
|
|
||||||
### Custom Derive
|
### Custom Derive
|
||||||
|
|
||||||
Custom derives are a special type of `proc macro`.
|
Custom derives are a special type of proc macro.
|
||||||
|
|
||||||
### Macros By Example and Macros 2.0
|
### Macros By Example and Macros 2.0
|
||||||
|
|
||||||
There is a legacy and mostly undocumented effort to improve the `MBE` system
|
There is a legacy and mostly undocumented effort to improve the MBE system
|
||||||
by giving it more hygiene-related features, better scoping and visibility
|
by giving it more hygiene-related features, better scoping and visibility
|
||||||
rules, etc. Internally this uses the same machinery as today's `MBE`s, with some
|
rules, etc. Internally this uses the same machinery as today's MBEs, with some
|
||||||
additional syntactic sugar, and these macros are allowed to be in namespaces.
|
additional syntactic sugar, and these macros are allowed to be in namespaces.
|
||||||
|
|
||||||
<!-- TODO(rylev): more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160) -->
|
<!-- TODO(rylev): more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160) -->
|
||||||
|
|
|
||||||