Minor edits
This commit is contained in:
parent
b956638072
commit
04480082ad
|
|
@ -5,11 +5,11 @@
|
|||
> N.B. [`rustc_ast`], [`rustc_expand`], and [`rustc_builtin_macros`] are all
|
||||
> undergoing refactoring, so some of the links in this chapter may be broken.
|
||||
|
||||
Rust has a very powerful `macro` system. In the previous chapter, we saw how
|
||||
the parser sets aside `macro`s to be expanded (using temporary [placeholders]).
|
||||
This chapter is about the process of expanding those `macro`s iteratively until
|
||||
we have a complete [*Abstract Syntax Tree* (`AST`)][ast] for our crate with no
|
||||
unexpanded `macro`s (or a compile error).
|
||||
Rust has a very powerful macro system. In the previous chapter, we saw how
|
||||
the parser sets aside macros to be expanded (using temporary [placeholders]).
|
||||
This chapter is about the process of expanding those macros iteratively until
|
||||
we have a complete [*Abstract Syntax Tree* (AST)][ast] for our crate with no
|
||||
unexpanded macros (or a compile error).
|
||||
|
||||
[ast]: ./ast-validation.md
|
||||
[`rustc_ast`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
|
||||
|
|
@ -17,14 +17,14 @@ unexpanded `macro`s (or a compile error).
|
|||
[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html
|
||||
[placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html
|
||||
|
||||
First, we discuss the algorithm that expands and integrates `macro` output into
|
||||
`AST`s. Next, we take a look at how hygiene data is collected. Finally, we look
|
||||
at the specifics of expanding different types of `macro`s.
|
||||
First, we discuss the algorithm that expands and integrates macro output into
|
||||
ASTs. Next, we take a look at how hygiene data is collected. Finally, we look
|
||||
at the specifics of expanding different types of macros.
|
||||
|
||||
Many of the algorithms and data structures described below are in [`rustc_expand`],
|
||||
with fundamental data structures in [`rustc_expand::base`][base].
|
||||
|
||||
Also of note, `cfg` and `cfg_attr` are treated specially from other `macro`s, and are
|
||||
Also of note, `cfg` and `cfg_attr` are treated specially from other macros, and are
|
||||
handled in [`rustc_expand::config`][cfg].
|
||||
|
||||
[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html
|
||||
|
|
@ -34,7 +34,7 @@ handled in [`rustc_expand::config`][cfg].
|
|||
## Expansion and AST Integration
|
||||
|
||||
Firstly, expansion happens at the crate level. Given a raw source code for
|
||||
a crate, the compiler will produce a massive `AST` with all `macro`s expanded, all
|
||||
a crate, the compiler will produce a massive AST with all macros expanded, all
|
||||
modules inlined, etc. The primary entry point for this process is the
|
||||
[`MacroExpander::fully_expand_fragment`][fef] method. With few exceptions, we
|
||||
use this method on the whole crate (see ["Eager Expansion"](#eager-expansion)
|
||||
|
|
@ -44,53 +44,53 @@ below for more detailed discussion of edge case expansion issues).
|
|||
[reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html
|
||||
|
||||
At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a
|
||||
queue of unresolved `macro` invocations (i.e. `macro`s we haven't found the
|
||||
definition of yet). We repeatedly try to pick a `macro` from the queue, resolve
|
||||
queue of unresolved macro invocations (i.e. macros we haven't found the
|
||||
definition of yet). We repeatedly try to pick a macro from the queue, resolve
|
||||
it, expand it, and integrate it back. If we can't make progress in an
|
||||
iteration, this represents a compile error. Here is the [algorithm][original]:
|
||||
|
||||
[fef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/struct.MacroExpander.html#method.fully_expand_fragment
|
||||
[original]: https://github.com/rust-lang/rust/pull/53778#issuecomment-419224049
|
||||
|
||||
1. Initialize a `queue` of unresolved `macro`s.
|
||||
1. Initialize a `queue` of unresolved macros.
|
||||
2. Repeat until `queue` is empty (or we make no progress, which is an error):
|
||||
1. [Resolve](./name-resolution.md) imports in our partially built crate as
|
||||
much as possible.
|
||||
2. Collect as many `macro` [`Invocation`s][inv] as possible from our
|
||||
2. Collect as many macro [`Invocation`s][inv] as possible from our
|
||||
partially built crate (`fn`-like, attributes, derives) and add them to the
|
||||
queue.
|
||||
3. Dequeue the first element and attempt to resolve it.
|
||||
4. If it's resolved:
|
||||
1. Run the `macro`'s expander function that consumes a [`TokenStream`] or
|
||||
`AST` and produces a [`TokenStream`] or [`AstFragment`] (depending on
|
||||
the `macro` kind). (A [`TokenStream`] is a collection of [`TokenTree`s][tt],
|
||||
1. Run the macro's expander function that consumes a [`TokenStream`] or
|
||||
AST and produces a [`TokenStream`] or [`AstFragment`] (depending on
|
||||
the macro kind). (A [`TokenStream`] is a collection of [`TokenTree`s][tt],
|
||||
each of which are a token (punctuation, identifier, or literal) or a
|
||||
delimited group (anything inside `()`/`[]`/`{}`)).
|
||||
- At this point, we know everything about the `macro` itself and can
|
||||
- At this point, we know everything about the macro itself and can
|
||||
call [`set_expn_data`] to fill in its properties in the global
|
||||
data; that is the [hygiene] data associated with [`ExpnId`] (see
|
||||
[Hygiene][hybelow] below).
|
||||
2. Integrate that piece of `AST` into the currently-existing though
|
||||
partially-built `AST`. This is essentially where the "token-like mass"
|
||||
becomes a proper set-in-stone `AST` with side-tables. It happens as
|
||||
2. Integrate that piece of AST into the currently-existing though
|
||||
partially-built AST. This is essentially where the "token-like mass"
|
||||
becomes a proper set-in-stone AST with side-tables. It happens as
|
||||
follows:
|
||||
- If the `macro` produces tokens (e.g. a `proc macro`), we parse into
|
||||
an `AST`, which may produce parse errors.
|
||||
- If the macro produces tokens (e.g. a proc macro), we parse into
|
||||
an AST, which may produce parse errors.
|
||||
- During expansion, we create [`SyntaxContext`]s (hierarchy 2) (see
|
||||
[Hygiene][hybelow] below).
|
||||
- These three passes happen one after another on every `AST` fragment
|
||||
freshly expanded from a `macro`:
|
||||
- These three passes happen one after another on every AST fragment
|
||||
freshly expanded from a macro:
|
||||
- [`NodeId`]s are assigned by [`InvocationCollector`]. This
|
||||
also collects new `macro` calls from this new `AST` piece and
|
||||
also collects new macro calls from this new AST piece and
|
||||
adds them to the queue.
|
||||
- ["Def paths"][defpath] are created and [`DefId`]s are
|
||||
assigned to them by [`DefCollector`].
|
||||
- Names are put into modules (from the resolver's point of
|
||||
view) by [`BuildReducedGraphVisitor`].
|
||||
3. After expanding a single `macro` and integrating its output, continue
|
||||
3. After expanding a single macro and integrating its output, continue
|
||||
to the next iteration of [`fully_expand_fragment`][fef].
|
||||
5. If it's not resolved:
|
||||
1. Put the `macro` back in the queue.
|
||||
1. Put the macro back in the queue.
|
||||
2. Continue to next iteration...
|
||||
|
||||
[`AstFragment`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/expand/enum.AstFragment.html
|
||||
|
|
@ -112,9 +112,9 @@ iteration, this represents a compile error. Here is the [algorithm][original]:
|
|||
### Error Recovery
|
||||
|
||||
If we make no progress in an iteration we have reached a compilation error
|
||||
(e.g. an undefined `macro`). We attempt to recover from failures (i.e.
|
||||
unresolved `macro`s or imports) with the intent of generating diagnostics.
|
||||
Failure recovery happens by expanding unresolved `macro`s into
|
||||
(e.g. an undefined macro). We attempt to recover from failures (i.e.
|
||||
unresolved macros or imports) with the intent of generating diagnostics.
|
||||
Failure recovery happens by expanding unresolved macros into
|
||||
[`ExprKind::Err`][err] and allows compilation to continue past the first error
|
||||
so that `rustc` can report more errors than just the original failure.
|
||||
|
||||
|
|
@ -123,8 +123,8 @@ so that `rustc` can report more errors than just the original failure.
|
|||
### Name Resolution
|
||||
|
||||
Notice that name resolution is involved here: we need to resolve imports and
|
||||
`macro` names in the above algorithm. This is done in
|
||||
[`rustc_resolve::macros`][mresolve], which resolves `macro` paths, validates
|
||||
macro names in the above algorithm. This is done in
|
||||
[`rustc_resolve::macros`][mresolve], which resolves macro paths, validates
|
||||
those resolutions, and reports various errors (e.g. "not found", "found, but
|
||||
it's unstable", "expected x, found y"). However, we don't try to resolve
|
||||
other names yet. This happens later, as we will see in the chapter: [Name
|
||||
|
|
@ -134,10 +134,10 @@ Resolution](./name-resolution.md).
|
|||
|
||||
### Eager Expansion
|
||||
|
||||
_Eager expansion_ means we expand the arguments of a `macro` invocation before
|
||||
the `macro` invocation itself. This is implemented only for a few special
|
||||
built-in `macro`s that expect literals; expanding arguments first for some of
|
||||
these `macro` results in a smoother user experience. As an example, consider
|
||||
_Eager expansion_ means we expand the arguments of a macro invocation before
|
||||
the macro invocation itself. This is implemented only for a few special
|
||||
built-in macros that expect literals; expanding arguments first for some of
|
||||
these macro results in a smoother user experience. As an example, consider
|
||||
the following:
|
||||
|
||||
```rust,ignore
|
||||
|
|
@ -152,11 +152,11 @@ A lazy-expansion would expand `foo!` first. An eager-expansion would expand
|
|||
|
||||
Eager-expansion is not a generally available feature of Rust. Implementing
|
||||
eager-expansion more generally would be challenging, so we implement it for a
|
||||
few special built-in `macro`s for the sake of user-experience. The built-in
|
||||
`macro`s are implemented in [`rustc_builtin_macros`], along with some other
|
||||
few special built-in macros for the sake of user-experience. The built-in
|
||||
macros are implemented in [`rustc_builtin_macros`], along with some other
|
||||
early code generation facilities like injection of standard library imports or
|
||||
generation of test harness. There are some additional helpers for building
|
||||
`AST` fragments in [`rustc_expand::build`][reb]. Eager-expansion generally
|
||||
AST fragments in [`rustc_expand::build`][reb]. Eager-expansion generally
|
||||
performs a subset of the things that lazy (normal) expansion does. It is done
|
||||
by invoking [`fully_expand_fragment`][fef] on only part of a crate (as opposed
|
||||
to the whole crate, like we normally do).
|
||||
|
|
@ -170,10 +170,10 @@ integration:
|
|||
pretty much everything else depending on [`rustc_ast`].
|
||||
- [`ExtCtxt`]/[`ExpansionData`] - holds various intermediate expansion
|
||||
infrastructure data.
|
||||
- [`Annotatable`] - a piece of `AST` that can be an attribute target, almost the same
|
||||
thing as [`AstFragment`] except for `type`s and patterns that can be produced by
|
||||
`macro`s but cannot be annotated with attributes.
|
||||
- [`MacResult`] - a "polymorphic" `AST` fragment, something that can turn into
|
||||
- [`Annotatable`] - a piece of AST that can be an attribute target, almost the same
|
||||
thing as [`AstFragment`] except for types and patterns that can be produced by
|
||||
macros but cannot be annotated with attributes.
|
||||
- [`MacResult`] - a "polymorphic" AST fragment, something that can turn into
|
||||
a different [`AstFragment`] depending on its [`AstFragmentKind`] (i.e. an item,
|
||||
expression, pattern, etc).
|
||||
|
||||
|
|
@ -223,16 +223,16 @@ we got `foo(0, 0)` because the macro defined its own `y`!
|
|||
|
||||
These are both examples of _macro hygiene_ issues. _Hygiene_ relates to how to
|
||||
handle names defined _within a macro_. In particular, a hygienic macro system
|
||||
prevents errors due to names introduced within a macro. Rust `macro`s are hygienic
|
||||
prevents errors due to names introduced within a macro. Rust macros are hygienic
|
||||
in that they do not allow one to write the sorts of bugs above.
|
||||
|
||||
At a high level, hygiene within the Rust compiler is accomplished by keeping
|
||||
track of the context where a name is introduced and used. We can then
|
||||
disambiguate names based on that context. Future iterations of the `macro` system
|
||||
will allow greater control to the `macro` author to use that context. For example,
|
||||
a `macro` author may want to introduce a new name to the context where the `macro`
|
||||
was called. Alternately, the `macro` author may be defining a variable for use
|
||||
only within the `macro` (i.e. it should not be visible outside the `macro`).
|
||||
disambiguate names based on that context. Future iterations of the macro system
|
||||
will allow greater control to the macro author to use that context. For example,
|
||||
a macro author may want to introduce a new name to the context where the macro
|
||||
was called. Alternately, the macro author may be defining a variable for use
|
||||
only within the macro (i.e. it should not be visible outside the macro).
|
||||
|
||||
[code_dir]: https://github.com/rust-lang/rust/tree/master/compiler/rustc_expand/src/mbe
|
||||
[code_mp]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser
|
||||
|
|
@ -240,18 +240,18 @@ only within the `macro` (i.e. it should not be visible outside the `macro`).
|
|||
[code_parse_int]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/struct.TtParser.html#method.parse_tt
|
||||
[parsing]: ./the-parser.html
|
||||
|
||||
The context is attached to `AST` nodes. All `AST` nodes generated by `macro`s have
|
||||
The context is attached to AST nodes. All AST nodes generated by macros have
|
||||
context attached. Additionally, there may be other nodes that have context
|
||||
attached, such as some desugared syntax (non-`macro`-expanded nodes are
|
||||
attached, such as some desugared syntax (non-macro-expanded nodes are
|
||||
considered to just have the "root" context, as described below).
|
||||
Throughout the compiler, we use [`rustc_span::Span`s][span] to refer to code locations.
|
||||
This struct also has hygiene information attached to it, as we will see later.
|
||||
|
||||
[span]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.Span.html
|
||||
|
||||
Because `macro`s invocations and definitions can be nested, the syntax context of
|
||||
a node must be a hierarchy. For example, if we expand a `macro` and there is
|
||||
another `macro` invocation or definition in the generated output, then the syntax
|
||||
Because macros invocations and definitions can be nested, the syntax context of
|
||||
a node must be a hierarchy. For example, if we expand a macro and there is
|
||||
another macro invocation or definition in the generated output, then the syntax
|
||||
context should reflect the nesting.
|
||||
|
||||
However, it turns out that there are actually a few types of context we may
|
||||
|
|
@ -259,13 +259,13 @@ want to track for different purposes. Thus, there are not just one but _three_
|
|||
expansion hierarchies that together comprise the hygiene information for a
|
||||
crate.
|
||||
|
||||
All of these hierarchies need some sort of "`macro` ID" to identify individual
|
||||
elements in the chain of expansions. This ID is [`ExpnId`]. All `macro`s receive
|
||||
an integer ID, assigned continuously starting from 0 as we discover new `macro`
|
||||
All of these hierarchies need some sort of "macro ID" to identify individual
|
||||
elements in the chain of expansions. This ID is [`ExpnId`]. All macros receive
|
||||
an integer ID, assigned continuously starting from 0 as we discover new macro
|
||||
calls. All hierarchies start at [`ExpnId::root`][rootid], which is its own
|
||||
parent.
|
||||
|
||||
The [`rustc_span::hygiene`][hy] library contains all of the hygiene-related algorithms
|
||||
The [`rustc_span::hygiene`][hy] crate contains all of the hygiene-related algorithms
|
||||
(with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks])
|
||||
and structures related to hygiene and expansion that are kept in global data.
|
||||
|
||||
|
|
@ -283,12 +283,12 @@ any [`Ident`] without any context.
|
|||
|
||||
### The Expansion Order Hierarchy
|
||||
|
||||
The first hierarchy tracks the order of expansions, i.e., when a `macro`
|
||||
invocation is in the output of another `macro`.
|
||||
The first hierarchy tracks the order of expansions, i.e., when a macro
|
||||
invocation is in the output of another macro.
|
||||
|
||||
Here, the children in the hierarchy will be the "innermost" tokens. The
|
||||
[`ExpnData`] struct itself contains a subset of properties from both `macro`
|
||||
definition and `macro` call available through global data.
|
||||
[`ExpnData`] struct itself contains a subset of properties from both macro
|
||||
definition and macro call available through global data.
|
||||
[`ExpnData::parent`][edp] tracks the child-to-parent link in this hierarchy.
|
||||
|
||||
[`ExpnData`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.ExpnData.html
|
||||
|
|
@ -302,13 +302,13 @@ macro_rules! foo { () => { println!(); } }
|
|||
fn main() { foo!(); }
|
||||
```
|
||||
|
||||
In this code, the `AST` nodes that are finally generated would have hierarchy
|
||||
In this code, the AST nodes that are finally generated would have hierarchy
|
||||
`root -> id(foo) -> id(println)`.
|
||||
|
||||
### The Macro Definition Hierarchy
|
||||
|
||||
The second hierarchy tracks the order of `macro` definitions, i.e., when we are
|
||||
expanding one `macro` another `macro` definition is revealed in its output. This
|
||||
The second hierarchy tracks the order of macro definitions, i.e., when we are
|
||||
expanding one macro another macro definition is revealed in its output. This
|
||||
one is a bit tricky and more complex than the other two hierarchies.
|
||||
|
||||
[`SyntaxContext`][sc] represents a whole chain in this hierarchy via an ID.
|
||||
|
|
@ -330,15 +330,15 @@ a code location and [`SyntaxContext`][sc]. Likewise, an [`Ident`] is just an int
|
|||
[scdoe]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContextData.html#structfield.outer_expn
|
||||
[am]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
|
||||
|
||||
For built-in `macro`s, we use the context:
|
||||
[`SyntaxContext::empty().apply_mark(expn_id)`], and such `macro`s are
|
||||
For built-in macros, we use the context:
|
||||
[`SyntaxContext::empty().apply_mark(expn_id)`], and such macros are
|
||||
considered to be defined at the hierarchy root. We do the same for `proc
|
||||
macro`s because we haven't implemented cross-crate hygiene yet.
|
||||
|
||||
[`SyntaxContext::empty().apply_mark(expn_id)`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/hygiene/struct.SyntaxContext.html#method.apply_mark
|
||||
|
||||
If the token had context `X` before being produced by a `macro` then after being
|
||||
produced by the `macro` it has context `X -> macro_id`. Here are some examples:
|
||||
If the token had context `X` before being produced by a macro then after being
|
||||
produced by the macro it has context `X -> macro_id`. Here are some examples:
|
||||
|
||||
Example 0:
|
||||
|
||||
|
|
@ -379,9 +379,9 @@ m!(foo);
|
|||
After all expansions, `foo` has context `ROOT -> id(n)` and `bar` has context
|
||||
`ROOT -> id(m) -> id(n)`.
|
||||
|
||||
Currently this hierarchy for tracking `macro` definitions is subject to the
|
||||
Currently this hierarchy for tracking macro definitions is subject to the
|
||||
so-called ["context transplantation hack"][hack]. Modern (i.e. experimental)
|
||||
`macro`s have stronger hygiene than the legacy "Macros By Example" (`MBE`)
|
||||
macros have stronger hygiene than the legacy "Macros By Example" (MBE)
|
||||
system which can result in weird interactions between the two. The hack is
|
||||
intended to make things "just work" for now.
|
||||
|
||||
|
|
@ -390,7 +390,7 @@ intended to make things "just work" for now.
|
|||
|
||||
### The Call-site Hierarchy
|
||||
|
||||
The third and final hierarchy tracks the location of `macro` invocations.
|
||||
The third and final hierarchy tracks the location of macro invocations.
|
||||
|
||||
In this hierarchy [`ExpnData::call_site`][callsite] is the `child -> parent`
|
||||
link.
|
||||
|
|
@ -406,38 +406,38 @@ macro foo($i: ident) { $i }
|
|||
foo!(bar!(baz));
|
||||
```
|
||||
|
||||
For the `baz` `AST` node in the final output, the expansion-order hierarchy is
|
||||
For the `baz` AST node in the final output, the expansion-order hierarchy is
|
||||
`ROOT -> id(foo) -> id(bar) -> baz`, while the call-site hierarchy is `ROOT ->
|
||||
baz`.
|
||||
|
||||
### Macro Backtraces
|
||||
|
||||
`macro` backtraces are implemented in [`rustc_span`] using the hygiene machinery
|
||||
Macro backtraces are implemented in [`rustc_span`] using the hygiene machinery
|
||||
in [`rustc_span::hygiene`][hy].
|
||||
|
||||
[`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html
|
||||
|
||||
## Producing Macro Output
|
||||
|
||||
Above, we saw how the output of a `macro` is integrated into the `AST` for a crate,
|
||||
Above, we saw how the output of a macro is integrated into the AST for a crate,
|
||||
and we also saw how the hygiene data for a crate is generated. But how do we
|
||||
actually produce the output of a `macro`? It depends on the type of `macro`.
|
||||
actually produce the output of a macro? It depends on the type of macro.
|
||||
|
||||
There are two types of `macro`s in Rust:
|
||||
1. `macro_rules!` macros, and,
|
||||
2. procedural `macro`s (`proc macro`s); including custom derives.
|
||||
There are two types of macros in Rust:
|
||||
1. `macro_rules!` macros (a.k.a. "Macros By Example" (MBE)), and,
|
||||
2. procedural macros (proc macros); including custom derives.
|
||||
|
||||
During the parsing phase, the normal Rust parser will set aside the contents of
|
||||
`macro`s and their invocations. Later, `macro`s are expanded using these
|
||||
macros and their invocations. Later, macros are expanded using these
|
||||
portions of the code.
|
||||
|
||||
Some important data structures/interfaces here:
|
||||
- [`SyntaxExtension`] - a lowered `macro` representation, contains its expander
|
||||
function, which transforms a [`TokenStream`] or `AST` into another
|
||||
[`TokenStream`] or `AST` + some additional data like stability, or a list of
|
||||
unstable features allowed inside the `macro`.
|
||||
- [`SyntaxExtension`] - a lowered macro representation, contains its expander
|
||||
function, which transforms a [`TokenStream`] or AST into another
|
||||
[`TokenStream`] or AST + some additional data like stability, or a list of
|
||||
unstable features allowed inside the macro.
|
||||
- [`SyntaxExtensionKind`] - expander functions may have several different
|
||||
signatures (take one token stream, or two, or a piece of `AST`, etc). This is
|
||||
signatures (take one token stream, or two, or a piece of AST, etc). This is
|
||||
an `enum` that lists them.
|
||||
- [`BangProcMacro`]/[`TTMacroExpander`]/[`AttrProcMacro`]/[`MultiItemModifier`] -
|
||||
`trait`s representing the expander function signatures.
|
||||
|
|
@ -451,11 +451,11 @@ Some important data structures/interfaces here:
|
|||
|
||||
## Macros By Example
|
||||
|
||||
`MBE`s have their own parser distinct from the Rust parser. When `macro`s are
|
||||
expanded, we may invoke the `MBE` parser to parse and expand a `macro`. The
|
||||
`MBE` parser, in turn, may call the Rust parser when it needs to bind a
|
||||
metavariable (e.g. `$my_expr`) while parsing the contents of a `macro`
|
||||
invocation. The code for `macro` expansion is in
|
||||
MBEs have their own parser distinct from the Rust parser. When macros are
|
||||
expanded, we may invoke the MBE parser to parse and expand a macro. The
|
||||
MBE parser, in turn, may call the Rust parser when it needs to bind a
|
||||
metavariable (e.g. `$my_expr`) while parsing the contents of a macro
|
||||
invocation. The code for macro expansion is in
|
||||
[`compiler/rustc_expand/src/mbe/`][code_dir].
|
||||
|
||||
### Example
|
||||
|
|
@ -480,9 +480,9 @@ special tokens, such as `EOF`, which its self indicates that there are no more
|
|||
tokens. There are token trees resulting from the paired parentheses-like
|
||||
characters (`(`...`)`, `[`...`]`, and `{`...`}`) – they include the open and
|
||||
close and all the tokens in between (Rust requires that parentheses-like
|
||||
characters be balanced). Having `macro` expansion operate on token streams
|
||||
characters be balanced). Having macro expansion operate on token streams
|
||||
rather than the raw bytes of a source-file abstracts away a lot of complexity.
|
||||
The `macro` expander (and much of the rest of the compiler) doesn't consider
|
||||
The macro expander (and much of the rest of the compiler) doesn't consider
|
||||
the exact line and column of some syntactic construct in the code; it considers
|
||||
which constructs are used in the code. Using tokens allows us to care about
|
||||
_what_ without worrying about _where_. For more information about tokens, see
|
||||
|
|
@ -492,23 +492,23 @@ the [Parsing][parsing] chapter of this book.
|
|||
printer!(print foo); // `foo` is a variable
|
||||
```
|
||||
|
||||
The process of expanding the `macro` invocation into the syntax tree
|
||||
The process of expanding the macro invocation into the syntax tree
|
||||
`println!("{}", foo)` and then expanding the syntax tree into a call to
|
||||
`Display::fmt` is one common example of _`macro` expansion_.
|
||||
`Display::fmt` is one common example of _macro expansion_.
|
||||
|
||||
### The MBE parser
|
||||
|
||||
There are two parts to `MBE` expansion done by the `macro` parser:
|
||||
There are two parts to MBE expansion done by the macro parser:
|
||||
1. parsing the definition, and,
|
||||
2. parsing the invocations.
|
||||
|
||||
We think of the `MBE` parser as a nondeterministic finite automaton (NFA) based
|
||||
We think of the MBE parser as a nondeterministic finite automaton (NFA) based
|
||||
regex parser since it uses an algorithm similar in spirit to the [Earley
|
||||
parsing algorithm](https://en.wikipedia.org/wiki/Earley_parser). The `macro`
|
||||
parsing algorithm](https://en.wikipedia.org/wiki/Earley_parser). The macro
|
||||
parser is defined in
|
||||
[`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
|
||||
|
||||
The interface of the `macro` parser is as follows (this is slightly simplified):
|
||||
The interface of the macro parser is as follows (this is slightly simplified):
|
||||
|
||||
```rust,ignore
|
||||
fn parse_tt(
|
||||
|
|
@ -518,11 +518,11 @@ fn parse_tt(
|
|||
) -> ParseResult
|
||||
```
|
||||
|
||||
We use these items in `macro` parser:
|
||||
We use these items in macro parser:
|
||||
|
||||
- a `parser` variable is a reference to the state of a normal Rust parser,
|
||||
including the token stream and parsing session. The token stream is what we
|
||||
are about to ask the `MBE` parser to parse. We will consume the raw stream of
|
||||
are about to ask the MBE parser to parse. We will consume the raw stream of
|
||||
tokens and output a binding of metavariables to corresponding token trees.
|
||||
The parsing session can be used to report parser errors.
|
||||
- a `matcher` variable is a sequence of [`MatcherLoc`]s that we want to match
|
||||
|
|
@ -531,73 +531,73 @@ We use these items in `macro` parser:
|
|||
[`MatcherLoc`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/mbe/macro_parser/enum.MatcherLoc.html
|
||||
|
||||
In the analogy of a regex parser, the token stream is the input and we are
|
||||
matching it against the pattern defined by `matcher`. Using our examples, the
|
||||
matching it against the pattern defined by matcher. Using our examples, the
|
||||
token stream could be the stream of tokens containing the inside of the example
|
||||
invocation `print foo`, while `matcher` might be the sequence of token (trees)
|
||||
invocation `print foo`, while matcher might be the sequence of token (trees)
|
||||
`print $mvar:ident`.
|
||||
|
||||
The output of the parser is a [`ParseResult`], which indicates which of
|
||||
three cases has occurred:
|
||||
|
||||
- **Success**: the token stream matches the given `matcher` and we have produced a
|
||||
- **Success**: the token stream matches the given matcher and we have produced a
|
||||
binding from metavariables to the corresponding token trees.
|
||||
- **Failure**: the token stream does not match `matcher` and results in an error
|
||||
- **Failure**: the token stream does not match matcher and results in an error
|
||||
message such as "No rule expected token ...".
|
||||
- **Error**: some fatal error has occurred _in the parser_. For example, this
|
||||
happens if there is more than one pattern match, since that indicates the
|
||||
`macro` is ambiguous.
|
||||
macro is ambiguous.
|
||||
|
||||
The full interface is defined [here][code_parse_int].
|
||||
|
||||
The `macro` parser does pretty much exactly the same as a normal regex parser
|
||||
The macro parser does pretty much exactly the same as a normal regex parser
|
||||
with one exception: in order to parse different types of metavariables, such as
|
||||
`ident`, `block`, `expr`, etc., the `macro` parser must call back to the normal
|
||||
Rust parser. Both the definition and invocation of `macro`s are parsed using
|
||||
`ident`, `block`, `expr`, etc., the macro parser must call back to the normal
|
||||
Rust parser. Both the definition and invocation of macros are parsed using
|
||||
the parser in a process which is non-intuitively self-referential.
|
||||
|
||||
The code to parse `macro` _definitions_ is in
|
||||
The code to parse macro _definitions_ is in
|
||||
[`compiler/rustc_expand/src/mbe/macro_rules.rs`][code_mr]. It defines the
|
||||
pattern for matching a `macro` definition as `$( $lhs:tt => $rhs:tt );+`. In
|
||||
pattern for matching a macro definition as `$( $lhs:tt => $rhs:tt );+`. In
|
||||
other words, a `macro_rules` definition should have in its body at least one
|
||||
occurrence of a token tree followed by `=>` followed by another token tree.
|
||||
When the compiler comes to a `macro_rules` definition, it uses this pattern to
|
||||
match the two token trees per the rules of the definition of the `macro`, _thereby
|
||||
utilizing the `macro` parser itself_. In our example definition, the
|
||||
match the two token trees per the rules of the definition of the macro, _thereby
|
||||
utilizing the macro parser itself_. In our example definition, the
|
||||
metavariable `$lhs` would match the patterns of both arms: `(print
|
||||
$mvar:ident)` and `(print twice $mvar:ident)`. And `$rhs` would match the
|
||||
bodies of both arms: `{ println!("{}", $mvar); }` and `{ println!("{}", $mvar);
|
||||
println!("{}", $mvar); }`. The parser keeps this knowledge around for when it
|
||||
needs to expand a `macro` invocation.
|
||||
needs to expand a macro invocation.
|
||||
|
||||
When the compiler comes to a `macro` invocation, it parses that invocation using
|
||||
a NFA-based `macro` parser described above. However, the `matcher` variable
|
||||
used is the first token tree (`$lhs`) extracted from the arms of the `macro`
|
||||
When the compiler comes to a macro invocation, it parses that invocation using
|
||||
a NFA-based macro parser described above. However, the matcher variable
|
||||
used is the first token tree (`$lhs`) extracted from the arms of the macro
|
||||
_definition_. Using our example, we would try to match the token stream `print
|
||||
foo` from the invocation against the `matcher`s `print $mvar:ident` and `print
|
||||
foo` from the invocation against the matchers `print $mvar:ident` and `print
|
||||
twice $mvar:ident` that we previously extracted from the definition. The
|
||||
algorithm is exactly the same, but when the `macro` parser comes to a place in the
|
||||
current `matcher` where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
|
||||
algorithm is exactly the same, but when the macro parser comes to a place in the
|
||||
current matcher where it needs to match a _non-terminal_ (e.g. `$mvar:ident`),
|
||||
it calls back to the normal Rust parser to get the contents of that
|
||||
non-terminal. In this case, the Rust parser would look for an `ident` token,
|
||||
which it finds (`foo`) and returns to the `macro` parser. Then, the `macro` parser
|
||||
proceeds in parsing as normal. Also, note that exactly one of the `matcher`s from
|
||||
which it finds (`foo`) and returns to the macro parser. Then, the macro parser
|
||||
proceeds in parsing as normal. Also, note that exactly one of the matchers from
|
||||
the various arms should match the invocation; if there is more than one match,
|
||||
the parse is ambiguous, while if there are no matches at all, there is a syntax
|
||||
error.
|
||||
|
||||
For more information about the `macro` parser's implementation, see the comments
|
||||
For more information about the macro parser's implementation, see the comments
|
||||
in [`compiler/rustc_expand/src/mbe/macro_parser.rs`][code_mp].
|
||||
|
||||
## Procedural Macros
|
||||
|
||||
Procedural `macro`s are also expanded during parsing. However, rather than
|
||||
having a parser in the compiler, `proc macro`s are implemented as custom,
|
||||
third-party crates. The compiler will compile the `proc macro` crate and
|
||||
specially annotated functions in them (i.e. the `proc macro` itself), passing
|
||||
them a stream of tokens. A `proc macro` can then transform the token stream and
|
||||
output a new token stream, which is synthesized into the `AST`.
|
||||
Procedural macros are also expanded during parsing. However, rather than
|
||||
having a parser in the compiler, proc macros are implemented as custom,
|
||||
third-party crates. The compiler will compile the proc macro crate and
|
||||
specially annotated functions in them (i.e. the proc macro itself), passing
|
||||
them a stream of tokens. A proc macro can then transform the token stream and
|
||||
output a new token stream, which is synthesized into the AST.
|
||||
|
||||
The token stream type used by `proc macro`s is _stable_, so `rustc` does not
|
||||
The token stream type used by proc macros is _stable_, so `rustc` does not
|
||||
use it internally. The compiler's (unstable) token stream is defined in
|
||||
[`rustc_ast::tokenstream::TokenStream`][rustcts]. This is converted into the
|
||||
stable [`proc_macro::TokenStream`][stablets] and back in
|
||||
|
|
@ -615,13 +615,13 @@ Since the Rust ABI is currently unstable, we use the C ABI for this conversion.
|
|||
|
||||
### Custom Derive
|
||||
|
||||
Custom derives are a special type of `proc macro`.
|
||||
Custom derives are a special type of proc macro.
|
||||
|
||||
### Macros By Example and Macros 2.0
|
||||
|
||||
There is an legacy and mostly undocumented effort to improve the `MBE` system
|
||||
There is an legacy and mostly undocumented effort to improve the MBE system
|
||||
by giving it more hygiene-related features, better scoping and visibility
|
||||
rules, etc. Internally this uses the same machinery as today's `MBE`s with some
|
||||
rules, etc. Internally this uses the same machinery as today's MBEs with some
|
||||
additional syntactic sugar and are allowed to be in namespaces.
|
||||
|
||||
<!-- TODO(rylev): more? [#1160](https://github.com/rust-lang/rustc-dev-guide/issues/1160) -->
|
||||
|
|
|
|||
Loading…
Reference in New Issue