diff --git a/src/macro-expansion.md b/src/macro-expansion.md index faf9bad4..5cd3c067 100644 --- a/src/macro-expansion.md +++ b/src/macro-expansion.md @@ -4,14 +4,27 @@ > refactoring, so some of the links in this chapter may be broken. Rust has a very powerful macro system. In the previous chapter, we saw how the -parser sets aside macros to be expanded. This chapter is about the process of -expanding those macros iteratively until we have a complete AST for our crate -with no unexpanded macros (or a compile error). +parser sets aside macros to be expanded (it temporarily uses [placeholders]). +This chapter is about the process of expanding those macros iteratively until +we have a complete AST for our crate with no unexpanded macros (or a compile +error). + +[placeholders]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/placeholders/index.html First, we will discuss the algorithm that expands and integrates macro output into ASTs. Next, we will take a look at how hygiene data is collected. Finally, we will look at the specifics of expanding different types of macros. +Many of the algorithms and data structures described below are in [`rustc_expand`], +with basic data structures in [`rustc_expand::base`][base]. + +Also of note, `cfg` and `cfg_attr` are treated specially from other macros, and are +handled in [`rustc_expand::config`][cfg]. + +[`rustc_expand`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/index.html +[base]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/base/index.html +[cfg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/config/index.html + ## Expansion and AST Integration First of all, expansion happens at the crate level. Given a raw source code for @@ -24,10 +37,7 @@ method on a whole crate. If it is not run on a full crate, it means we are doing _eager macro expansion_. Eager expansion means that we expand the arguments of a macro invocation before the macro invocation itself. This is implemented only for a few special built-in macros that expect literals (it's -not a generally available feature of Rust). Eager expansion generally performs -a subset of the things that lazy (normal) expansion does, so we will focus on -lazy expansion for the rest of this chapter. - +not a generally available feature of Rust). As an example, consider the following: ```rust,ignore @@ -40,7 +50,16 @@ foo!(bar!(baz)); A lazy expansion would expand `foo!` first. An eager expansion would expand `bar!` first. Implementing eager expansion more generally would be challenging, but we implement it for a few special built-in macros for the sake of user -experience. +experience. The built-in macros are implemented in [`rustc_builtin_macros`], +along with some other early code generation facilities like injection of +standard library imports or generation of test harness. There are some +additional helpers for building their AST fragments in +[`rustc_expand::build`][reb]. Eager expansion generally performs a subset of +the things that lazy (normal) expansion does, so we will focus on lazy +expansion for the rest of this chapter. + +[`rustc_builtin_macros`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_builtin_macros/index.html +[reb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/build/index.html At a high level, [`fully_expand_fragment`][fef] works in iterations. We keep a queue of unresolved macro invocations (that is, macros we haven't found the @@ -114,10 +133,15 @@ fail at this point. The recovery happens by expanding unresolved macros into [err]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/enum.ExprKind.html#variant.Err Notice that name resolution is involved here: we need to resolve imports and -macro names in the above algorithm. However, we don't try to resolve other -names yet. This happens later, as we will see in the [next +macro names in the above algorithm. This is done in +[`rustc_resolve::macros`][mresolve], which resolves macro paths, validates +those resolutions, and reports various errors (e.g. "not found" or "found, but +it's unstable" or "expected x, found y"). However, we don't try to resolve +other names yet. This happens later, as we will see in the [next chapter](./name-resolution.md). +[mresolve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_resolve/macros/index.html + Here are some other notable data structures involved in expansion and integration: - [`Resolver`] - a trait used to break crate dependencies. This allows the resolver services to be used in [`rustc_ast`], despite [`rustc_resolve`] and pretty much everything else depending on [`rustc_ast`]. - [`ExtCtxt`]/[`ExpansionData`] - various intermediate data kept and used by expansion @@ -217,9 +241,9 @@ an integer ID, assigned continuously starting from 0 as we discover new macro calls. All heirarchies start at [`ExpnId::root()`][rootid], which is its own parent. -All of the hygiene-related algorithms are implemented in -[`rustc_span::hygiene`][hy], with the exception of some hacks -[`Resolver::resolve_crate_root`][hacks]. +[`rustc_span::hygiene`][hy] contains all of the hygiene-related algorithms +(with the exception of some hacks in [`Resolver::resolve_crate_root`][hacks]) +and structures related to hygiene and expansion that are kept in global data. The actual heirarchies are stored in [`HygieneData`][hd]. This is a global piece of data containing hygiene and expansion info that can be accessed from @@ -362,6 +386,13 @@ foo!(bar!(baz)); For the `baz` AST node in the final output, the first heirarchy is `ROOT -> id(foo) -> id(bar) -> baz`, while the third heirarchy is `ROOT -> baz`. +### Macro Backtraces + +Macro backtraces are implemented in [`rustc_span`] using the hygiene machinery +in [`rustc_span::hygiene`][hy]. + +[`rustc_span`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/index.html + ## Producing Macro Output Above, we saw how the output of a macro is integrated into the AST for a crate, @@ -551,7 +582,17 @@ stream, which is synthesized into the AST. It's worth noting that the token stream type used by proc macros is _stable_, so `rustc` does not use it internally (since our internal data structures are -unstable). +unstable). The compiler's token stream is +[`rustc_ast::tokenstream::TokenStream`][rustcts], as previously. This is +converted into the stable [`proc_macro::TokenStream`][stablets] and back in +[`rustc_expand::proc_macro`][pm] and [`rustc_expand::proc_macro_server`][pms]. +Because the Rust ABI is unstable, we use the C ABI for this conversion. + +[tsmod]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/index.html +[rustcts]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/tokenstream/struct.TokenStream.html +[stablets]: https://doc.rust-lang.org/proc_macro/struct.TokenStream.html +[pm]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro/index.html +[pms]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/proc_macro_server/index.html TODO: more here. @@ -560,22 +601,3 @@ TODO: more here. Custom derives are a special type of proc macro. TODO: more? - -## Important Modules and Data Structures - -TODO: sprinkle these throughout the chapter as much as possible... - -- librustc_span/hygiene.rs - structures related to hygiene and expansion that are kept in global data (can be accessed from any Ident without any context) -- librustc_span/lib.rs - some secondary methods like macro backtrace using primary methods from hygiene.rs -- librustc_builtin_macros - implementations of built-in macros (including macro attributes and derives) and some other early code generation facilities like injection of standard library imports or generation of test harness. -- librustc_ast/config.rs - implementation of cfg/cfg_attr (they treated specially from other macros), should probably be moved into librustc_ast/ext. -- librustc_ast/tokenstream.rs + librustc_ast/parse/token.rs - structures for compiler-side tokens, token trees, and token streams. -- librustc_ast/ext - various expansion-related stuff -- librustc_ast/ext/base.rs - basic structures used by expansion -- librustc_ast/ext/expand.rs - some expansion structures and the bulk of expansion infrastructure code - collecting macro invocations, calling into resolve for them, calling their expanding functions, and integrating the results back into AST -- librustc_ast/ext/placeholder.rs - the part of expand.rs responsible for "integrating the results back into AST" basicallly, "placeholder" is a temporary AST node replaced with macro expansion result nodes -- librustc_ast/ext/builer.rs - helper functions for building AST for built-in macros in librustc_builtin_macros (and user-defined syntactic plugins previously), can probably be moved into librustc_builtin_macros these days -- librustc_ast/ext/proc_macro.rs + librustc_ast/ext/proc_macro_server.rs - interfaces between the compiler and the stable proc_macro library, converting tokens and token streams between the two representations and sending them through C ABI -- librustc_ast/ext/tt - implementation of macro_rules, turns macro_rules DSL into something with signature Fn(TokenStream) -> TokenStream that can eat and produce tokens, @mark-i-m knows more about this -- librustc_resolve/macros.rs - resolving macro paths, validating those resolutions, reporting various "not found"/"found, but it's unstable"/"expected x, found y" errors -- librustc_middle/hir/map/def_collector.rs + librustc_resolve/build_reduced_graph.rs - integrate an AST fragment freshly expanded from a macro into various parent/child structures like module hierarchy or "definition paths" diff --git a/src/the-parser.md b/src/the-parser.md index c0f2a071..da318c9e 100644 --- a/src/the-parser.md +++ b/src/the-parser.md @@ -7,10 +7,11 @@ The very first thing the compiler does is take the program (in Unicode characters) and turn it into something the compiler can work with more conveniently than strings. This happens in two stages: Lexing and Parsing. -Lexing takes strings and turns them into streams of tokens. For example, +Lexing takes strings and turns them into streams of [tokens]. For example, `a.b + c` would be turned into the tokens `a`, `.`, `b`, `+`, and `c`. The lexer lives in [`librustc_lexer`][lexer]. +[tokens]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/token/index.html [lexer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html Parsing then takes streams of tokens and turns them into a structured