Compare commits

11 Commits

Author SHA1 Message Date
xizheyin 9bbf15ac40
Merge 23d77abfc0 into e0a39188f1 2025-06-19 01:17:21 +08:00
Boxy e0a39188f1
Merge pull request #2474 from BoxyUwU/ambig_unambig_ty_consts
Document Ambig vs Unambig Type/Consts
2025-06-18 15:30:14 +01:00
Boxy 9d7ba8573d Reviews 2025-06-18 15:28:44 +01:00
Boxy c963b4ad93 Add links 2025-06-17 18:09:06 +01:00
Boxy a02af2f135 Write chapter on Unambig vs Ambig Types/Consts 2025-06-17 18:09:06 +01:00
Boxy 4185dca095 Stub chapter and consolidate under `/hir/` 2025-06-17 18:09:02 +01:00
nora a2c80e6e23
Merge pull request #2475 from lolbinarycat/patch-3
Profiling with perf: specify the section of bootstrap settings.
2025-06-17 18:34:03 +02:00
lolbinarycat 7b921990fc
Profiling with perf: specify the section of bootstrap settings. 2025-06-17 11:31:04 -05:00
xizheyin 23d77abfc0
Add Section How queries interact with external crate metadata
Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-06-16 23:47:04 +08:00
xizheyin e03ee80811
change key in Provider example (local) into `LocalDefId`
Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-06-15 21:03:22 +08:00
xizheyin 07e05bbc46 Refinement of Providers into Providers and ExternProviders
Signed-off-by: xizheyin <xizheyin@smail.nju.edu.cn>
2025-06-15 20:59:08 +08:00
9 changed files with 149 additions and 41 deletions

@ -121,8 +121,9 @@
- [Feature gate checking](./feature-gate-ck.md)
- [Lang Items](./lang-items.md)
- [The HIR (High-level IR)](./hir.md)
- [Lowering AST to HIR](./ast-lowering.md)
- [Debugging](./hir-debugging.md)
- [Lowering AST to HIR](./hir/lowering.md)
- [Ambig/Unambig Types and Consts](./hir/ambig-unambig-ty-and-consts.md)
- [Debugging](./hir/debugging.md)
- [The THIR (Typed High-level IR)](./thir.md)
- [The MIR (Mid-level IR)](./mir/index.md)
- [MIR construction](./mir/construction.md)

@ -553,7 +553,7 @@ compiler](#linting-early-in-the-compiler).
[AST nodes]: the-parser.md
[AST lowering]: ast-lowering.md
[AST lowering]: ./hir/lowering.md
[HIR nodes]: hir.md
[MIR nodes]: mir/index.md
[macro expansion]: macro-expansion.md

@ -5,7 +5,7 @@
The HIR "High-Level Intermediate Representation" is the primary IR used
in most of rustc. It is a compiler-friendly representation of the abstract
syntax tree (AST) that is generated after parsing, macro expansion, and name
resolution (see [Lowering](./ast-lowering.html) for how the HIR is created).
resolution (see [Lowering](./hir/lowering.md) for how the HIR is created).
Many parts of HIR resemble Rust surface syntax quite closely, with
the exception that some of Rust's expression forms have been desugared away.
For example, `for` loops are converted into a `loop` and do not appear in

@ -0,0 +1,63 @@
# Ambig/Unambig Types and Consts
Type and const arguments in the HIR can be in two kinds of positions: ambiguous (ambig) or unambiguous (unambig). Ambig positions are those where
it would be valid to parse either a type or a const; unambig positions are those where only one kind would be valid to
parse.
```rust
fn func<T, const N: usize>(arg: T) {
// ^ Unambig type position
let a: _ = arg;
// ^ Unambig type position
func::<T, N>(arg);
// ^ ^
// ^^^^ Ambig position
let _: [u8; 10];
// ^^ ^^ Unambig const position
// ^^ Unambig type position
}
```
Most types/consts in ambig positions are able to be disambiguated as either a type or const during parsing. Single segment paths are always represented as types in the AST but may get resolved to a const parameter during name resolution, then lowered to a const argument during ast-lowering. The only generic arguments which remain ambiguous after lowering are inferred generic arguments (`_`) in path segments. For example, in `Foo<_>` it is not clear whether the `_` argument is an inferred type argument, or an inferred const argument.
In unambig positions, inferred arguments are represented with [`hir::TyKind::Infer`][ty_infer] or [`hir::ConstArgKind::Infer`][const_infer] depending on whether it is a type or const position respectively.
In ambig positions, inferred arguments are represented with `hir::GenericArg::Infer`.
A naive implementation would leave potentially five places where, judging from the structure of the HIR, you might expect to find an inferred type/const:
1. In unambig type position as a `hir::TyKind::Infer`
2. In unambig const arg position as a `hir::ConstArgKind::Infer`
3. In an ambig position as a [`GenericArg::Type(TyKind::Infer)`][generic_arg_ty]
4. In an ambig position as a [`GenericArg::Const(ConstArgKind::Infer)`][generic_arg_const]
5. In an ambig position as a [`GenericArg::Infer`][generic_arg_infer]
Note that places 3 and 4 would never actually be possible to encounter as we always lower to `GenericArg::Infer` in generic arg position.
This has a few failure modes:
- People may write visitors which check for `GenericArg::Infer` but forget to check for `hir::TyKind/ConstArgKind::Infer`, only handling infers in ambig positions by accident.
- People may write visitors which check for `hir::TyKind/ConstArgKind::Infer` but forget to check for `GenericArg::Infer`, only handling infers in unambig positions by accident.
- People may write visitors which check for `GenericArg::Type/Const(TyKind/ConstArgKind::Infer)` and `GenericArg::Infer`, not realising that we never represent inferred types/consts in ambig positions as a `GenericArg::Type/Const`.
- People may write visitors which check for *only* `TyKind::Infer` and not `ConstArgKind::Infer` forgetting that there are also inferred const arguments (and vice versa).
To make writing HIR visitors less error prone when caring about inferred types/consts we have a relatively complex system:
1. We have different types in the compiler for when a type or const is in an ambig or unambig position: `hir::Ty<AmbigArg>` and `hir::Ty<()>` respectively. [`AmbigArg`][ambig_arg] is an uninhabited type which we use in the `Infer` variant of `TyKind` and `ConstArgKind` to selectively "disable" it if we are in an ambig position.
2. The [`visit_ty`][visit_ty] and [`visit_const_arg`][visit_const_arg] methods on HIR visitors only accept the ambig position versions of types/consts. Unambig types/consts are implicitly converted to ambig types/consts during the visiting process, with the `Infer` variant handled by a dedicated [`visit_infer`][visit_infer] method.
This has a number of benefits:
- It's clear that `GenericArg::Type/Const` cannot represent inferred type/const arguments
- Implementors of `visit_ty` and `visit_const_arg` will never encounter inferred types/consts making it impossible to write a visitor that seems to work right but handles edge cases wrong
- The `visit_infer` method handles *all* cases of inferred type/consts in the HIR making it easy for visitors to handle inferred type/consts in one dedicated place and not forget cases
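The uninhabited-type trick described above can be sketched in plain Rust without any compiler internals. This is a toy model, not the real `rustc_hir` definitions: `TyKind`, `GenericArg`, and `AmbigArg` below are simplified stand-ins, and `InferCounter` with its `visit_unambig_ty` and `visit_generic_arg` methods is a hypothetical visitor invented for illustration.

```rust
/// Uninhabited marker: no value of this type can ever be constructed.
enum AmbigArg {}

/// Toy stand-in for `hir::TyKind`. With the default `Unambig = ()` the
/// `Infer` variant is usable; with `Unambig = AmbigArg` it is "disabled".
enum TyKind<Unambig = ()> {
    Path(String),
    Infer(Unambig),
}

/// Toy stand-in for `hir::GenericArg`: inferred arguments in ambig
/// positions get their own variant.
enum GenericArg {
    Type(TyKind<AmbigArg>),
    Infer,
}

/// Hypothetical visitor that records paths and counts inferred arguments.
#[derive(Default)]
struct InferCounter {
    infers: usize,
    paths: Vec<String>,
}

impl InferCounter {
    /// Only accepts the ambig-position version, so it never sees `Infer`.
    fn visit_ty(&mut self, ty: TyKind<AmbigArg>) {
        match ty {
            TyKind::Path(name) => self.paths.push(name),
            // The payload is uninhabited, so this arm can never run.
            TyKind::Infer(never) => match never {},
        }
    }

    /// The single place where inferred arguments are handled.
    fn visit_infer(&mut self) {
        self.infers += 1;
    }

    /// Unambig position: route `Infer` to `visit_infer`, convert the rest.
    fn visit_unambig_ty(&mut self, ty: TyKind<()>) {
        match ty {
            TyKind::Infer(()) => self.visit_infer(),
            TyKind::Path(name) => self.visit_ty(TyKind::Path(name)),
        }
    }

    /// Ambig position: `GenericArg::Infer` also goes to `visit_infer`.
    fn visit_generic_arg(&mut self, arg: GenericArg) {
        match arg {
            GenericArg::Type(ty) => self.visit_ty(ty),
            GenericArg::Infer => self.visit_infer(),
        }
    }
}
```

In this sketch, a `GenericArg::Type(TyKind::Infer(..))` value cannot be written at all, because that would require a value of type `AmbigArg`.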
[ty_infer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/enum.TyKind.html#variant.Infer
[const_infer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/enum.ConstArgKind.html#variant.Infer
[generic_arg_ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/enum.GenericArg.html#variant.Type
[generic_arg_const]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/enum.GenericArg.html#variant.Const
[generic_arg_infer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/enum.GenericArg.html#variant.Infer
[ambig_arg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/enum.AmbigArg.html
[visit_ty]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/intravisit/trait.Visitor.html#method.visit_ty
[visit_const_arg]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/intravisit/trait.Visitor.html#method.visit_const_arg
[visit_infer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/intravisit/trait.Visitor.html#method.visit_infer

@ -1,6 +1,6 @@
# AST lowering
The AST lowering step converts AST to [HIR](hir.html).
The AST lowering step converts AST to [HIR](../hir.md).
This means many structures are removed if they are irrelevant
for type analysis or similar syntax agnostic analyses. Examples
of such structures include but are not limited to

@ -410,7 +410,7 @@ For more details on bootstrapping, see
- Guide: [The HIR](hir.md)
- Guide: [Identifiers in the HIR](hir.md#identifiers-in-the-hir)
- Guide: [The `HIR` Map](hir.md#the-hir-map)
- Guide: [Lowering `AST` to `HIR`](ast-lowering.md)
- Guide: [Lowering `AST` to `HIR`](./hir/lowering.md)
- How to view `HIR` representation for your code `cargo rustc -- -Z unpretty=hir-tree`
- Rustc `HIR` definition: [`rustc_hir`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html)
- Main entry point: **TODO**

@ -7,8 +7,8 @@ This is a guide for how to profile rustc with [perf](https://perf.wiki.kernel.or
- Get a clean checkout of rust-lang/master, or whatever it is you want
to profile.
- Set the following settings in your `bootstrap.toml`:
- `debuginfo-level = 1` - enables line debuginfo
- `jemalloc = false` - lets you do memory use profiling with valgrind
- `rust.debuginfo-level = 1` - enables line debuginfo
- `rust.jemalloc = false` - lets you do memory use profiling with valgrind
- leave everything else the defaults
- Run `./x build` to get a full build
- Make a rustup toolchain pointing to that result

@ -71,22 +71,24 @@ are cheaply cloneable; insert an `Rc` if necessary).
If, however, the query is *not* in the cache, then the compiler will
call the corresponding **provider** function. A provider is a function
implemented in a specific module and **manually registered** into the
[`Providers`][providers_struct] struct during compiler initialization.
The macro system generates the [`Providers`][providers_struct] struct,
which acts as a function table for all query implementations, where each
implemented in a specific module and **manually registered** into either
the [`Providers`][providers_struct] struct (for local crate queries) or
the [`ExternProviders`][extern_providers_struct] struct (for external crate queries)
during compiler initialization. The macro system generates both structs,
which act as function tables for all query implementations, where each
field is a function pointer to the actual provider.
**Note:** The `Providers` struct is generated by macros and acts as a function table for all query implementations.
It is **not** a Rust trait, but a plain struct with function pointer fields.
**Note:** Both the `Providers` and `ExternProviders` structs are generated by macros and act as function tables for all query implementations.
They are **not** Rust traits, but plain structs with function pointer fields.
**Providers are defined per-crate.** The compiler maintains,
internally, a table of providers for every crate, at least
conceptually. Right now, there are really two sets: the providers for
queries about the **local crate** (that is, the one being compiled)
and providers for queries about **external crates** (that is,
dependencies of the local crate). Note that what determines the crate
that a query is targeting is not the *kind* of query, but the *key*.
conceptually. There are two sets of providers:
- The `Providers` struct for queries about the **local crate** (that is, the one being compiled)
- The `ExternProviders` struct for queries about **external crates** (that is,
dependencies of the local crate)
Note that what determines the crate that a query is targeting is not the *kind* of query, but the *key*.
For example, when you invoke `tcx.type_of(def_id)`, that could be a
local query or an external query, depending on what crate the `def_id`
is referring to (see the [`self::keys::Key`][Key] trait for more
@ -119,22 +121,22 @@ they define both a `provide` and a `provide_extern` function, through
### How providers are set up
When the tcx is created, it is given the providers by its creator using
the [`Providers`][providers_struct] struct. This struct is generated by
the macros here, but it is basically a big list of function pointers:
[providers_struct]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/query/struct.Providers.html
When the tcx is created, it is given both the local and external providers by its creator using
the `Providers` struct from `rustc_middle::util`. This struct contains both the local and external providers:
```rust,ignore
struct Providers {
type_of: for<'tcx> fn(TyCtxt<'tcx>, DefId) -> Ty<'tcx>,
// ... one field for each query
pub struct Providers {
pub queries: crate::query::Providers, // Local crate providers
pub extern_queries: crate::query::ExternProviders, // External crate providers
pub hooks: crate::hooks::Providers,
}
```
Each of these provider structs is generated by the macros and contains function pointers for their respective queries.
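To make the "function table" idea concrete, here is a minimal standalone sketch of structs whose fields are function pointers. Everything in it is a simplified stand-in: `Tcx`, `LocalDefId`, and `DefId` are toy types, `QueryProviders` plays the role of `crate::query::Providers`, the hooks field is omitted, and the differing key types of the two tables are an assumption of this sketch rather than a statement about the generated code.

```rust
// Toy stand-ins; none of these are the real rustc types.
struct Tcx;

#[derive(Clone, Copy)]
struct LocalDefId(u32);

#[derive(Clone, Copy)]
struct DefId {
    krate: u32,
    index: u32,
}

/// Local-crate table: one function pointer per query.
struct QueryProviders {
    type_of: fn(&Tcx, LocalDefId) -> String,
}

/// External-crate table (assumed here to take a full `DefId` as key).
struct ExternProviders {
    type_of: fn(&Tcx, DefId) -> String,
}

/// Aggregate shaped like the `Providers` struct shown above (hooks omitted).
struct Providers {
    queries: QueryProviders,
    extern_queries: ExternProviders,
}

fn local_type_of(_tcx: &Tcx, key: LocalDefId) -> String {
    format!("local type #{}", key.0)
}

fn extern_type_of(_tcx: &Tcx, key: DefId) -> String {
    format!("type #{} from crate {}", key.index, key.krate)
}

/// Fills both tables with plain function items, which coerce to `fn` pointers.
fn build_providers() -> Providers {
    Providers {
        queries: QueryProviders { type_of: local_type_of },
        extern_queries: ExternProviders { type_of: extern_type_of },
    }
}
```

Calling through a table entry needs parentheses around the field access, for example `(providers.queries.type_of)(&tcx, key)`, because the field holds a function pointer rather than being a method.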
#### How are providers registered?
The `Providers` struct is filled in during compiler initialization, mainly by the `rustc_driver` crate.
The provider structs are filled in during compiler initialization, mainly by the `rustc_driver` crate.
But the actual provider functions are implemented in various `rustc_*` crates (like `rustc_middle`, `rustc_hir_analysis`, etc).
To register providers, each crate exposes a [`provide`][provide_fn] function that looks like this:
@ -142,17 +144,20 @@ To register providers, each crate exposes a [`provide`][provide_fn] function tha
[provide_fn]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/hir/fn.provide.html
```rust,ignore
pub fn provide(providers: &mut Providers) {
*providers = Providers {
type_of,
// ... add more providers here
..*providers
};
pub fn provide(providers: &mut rustc_middle::util::Providers) {
providers.queries.type_of = type_of;
// ... add more local providers here
providers.extern_queries.type_of = extern_type_of;
// ... add more external providers here
providers.hooks.some_hook = some_hook;
// ... add more hooks here
}
```
- This function takes a mutable reference to the `Providers` struct and sets the fields to point to the correct provider functions.
- You can also assign fields individually, e.g. `providers.type_of = type_of;`.
- You can assign fields individually for each provider type (local, external, and hooks).
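The registration pattern, in which several crates each contribute a `provide` function that fills in part of one shared table, can be sketched as follows. This is an illustrative model only: the modules `crate_a` and `crate_b`, the placeholder `unset` provider, and `default_providers` are hypothetical names, and the keys and results are reduced to `u32` and `&'static str`.

```rust
/// Toy provider table with two queries.
pub struct Providers {
    pub type_of: fn(u32) -> &'static str,
    pub fubar: fn(u32) -> &'static str,
}

/// Placeholder used before any crate has registered a provider.
fn unset(_key: u32) -> &'static str {
    "<no provider registered>"
}

/// Hypothetical crate that implements `type_of`.
mod crate_a {
    fn type_of(_key: u32) -> &'static str {
        "type_of from crate_a"
    }

    pub fn provide(providers: &mut super::Providers) {
        providers.type_of = type_of;
    }
}

/// Hypothetical crate that implements `fubar`.
mod crate_b {
    fn fubar(_key: u32) -> &'static str {
        "fubar from crate_b"
    }

    pub fn provide(providers: &mut super::Providers) {
        providers.fubar = fubar;
    }
}

/// Builds the table by letting each crate register its own providers.
pub fn default_providers() -> Providers {
    let mut providers = Providers { type_of: unset, fubar: unset };
    crate_a::provide(&mut providers);
    crate_b::provide(&mut providers);
    providers
}
```

Because each `provide` function only assigns the fields it owns, the order of the calls matters only if two crates assign the same field, in which case the later assignment wins.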
#### Adding a new provider
@ -160,18 +165,57 @@ Suppose you want to add a new query called `fubar`. You would:
1. Implement the provider function:
```rust,ignore
fn fubar<'tcx>(tcx: TyCtxt<'tcx>, key: DefId) -> Fubar<'tcx> { ... }
fn fubar<'tcx>(tcx: TyCtxt<'tcx>, key: LocalDefId) -> Fubar<'tcx> { ... }
```
2. Register it in the `provide` function:
```rust,ignore
pub fn provide(providers: &mut Providers) {
*providers = Providers {
fubar,
..*providers
};
pub fn provide(providers: &mut rustc_middle::util::Providers) {
providers.queries.fubar = fubar;
}
```
### How queries interact with external crate metadata
When a query is made for an external crate (i.e., a dependency), the query system needs to load the information from that crate's metadata.
This is handled by the [`rustc_metadata` crate][rustc_metadata], which is responsible for decoding and providing the information stored in the `.rmeta` files.
The process works like this:
1. When a query is made, the query system first checks if the `DefId` refers to a local or external crate by checking if `def_id.krate == LOCAL_CRATE`.
This determines whether to use the local provider from [`Providers`][providers_struct] or the external provider from [`ExternProviders`][extern_providers_struct].
2. For external crates, the query system will look for a provider in the [`ExternProviders`][extern_providers_struct] struct.
The `rustc_metadata` crate registers these external providers through the `provide_extern` function in `rustc_metadata/src/rmeta/decoder/cstore_impl.rs`, along these lines:
```rust,ignore
pub fn provide_extern(providers: &mut ExternProviders) {
providers.foo = |tcx, def_id| {
// Load and decode metadata for external crate
let cdata = CStore::from_tcx(tcx).get_crate_data(def_id.krate);
cdata.foo(def_id.index)
};
// Register other external providers...
}
```
3. The metadata is stored in a binary format in `.rmeta` files, which contain pre-computed information about the external crate, such as types, function signatures, trait implementations, and other information needed by the compiler. When an external query is made, the `rustc_metadata` crate:
- Loads the `.rmeta` file for the external crate
- Decodes the metadata using the `Decodable` trait
- Returns the decoded information to the query system
This approach avoids recompiling external crates, allows for faster compilation of dependent crates, and enables incremental compilation to work across crate boundaries.
As a simplified example, when you call `tcx.type_of(def_id)` for a type defined in an external crate, the query system will:
1. Detect that the `def_id` refers to an external crate by checking `def_id.krate != LOCAL_CRATE`
2. Call the appropriate provider from `ExternProviders` which was registered by `rustc_metadata`
3. The provider will load and decode the type information from the external crate's metadata
4. Return the decoded type to the caller
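The steps above can be modelled with a small standalone sketch that dispatches on the key's crate. It is not the real query system: `Tcx` here is a toy struct, the "metadata" is a `HashMap` of already-decoded strings rather than an `.rmeta` file, caching is left out, and `example_tcx` is a hypothetical helper that exists only to build test data.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct CrateNum(u32);

/// By convention in this sketch, crate 0 is the crate being compiled.
const LOCAL_CRATE: CrateNum = CrateNum(0);

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct DefId {
    krate: CrateNum,
    index: u32,
}

/// Toy context holding one local and one external provider for `type_of`.
struct Tcx {
    local_provider: fn(&Tcx, u32) -> String,
    extern_provider: fn(&Tcx, DefId) -> String,
    /// Stand-in for decoded crate metadata, keyed by external `DefId`.
    metadata: HashMap<DefId, String>,
}

impl Tcx {
    /// The key, not the query name, decides which provider runs.
    fn type_of(&self, def_id: DefId) -> String {
        if def_id.krate == LOCAL_CRATE {
            (self.local_provider)(self, def_id.index)
        } else {
            (self.extern_provider)(self, def_id)
        }
    }
}

fn local_type_of(_tcx: &Tcx, index: u32) -> String {
    format!("computed locally for item {}", index)
}

/// Looks the answer up in the toy metadata instead of computing it.
fn extern_type_of(tcx: &Tcx, def_id: DefId) -> String {
    tcx.metadata
        .get(&def_id)
        .cloned()
        .unwrap_or_else(|| "<missing from metadata>".to_string())
}

fn example_tcx() -> Tcx {
    let mut metadata = HashMap::new();
    metadata.insert(DefId { krate: CrateNum(1), index: 5 }, "Vec<u8>".to_string());
    Tcx {
        local_provider: local_type_of,
        extern_provider: extern_type_of,
        metadata,
    }
}
```

The caller writes `tcx.type_of(def_id)` in both cases; only the `krate` field of the key determines whether the answer is computed or read back from metadata.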
This is why most `rustc_*` crates only need to register local providers: the external providers are handled by the metadata system.
The only exception is when a crate needs to provide special handling for external queries, in which case it would implement both local and external providers.
[rustc_metadata]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/index.html
[providers_struct]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/query/struct.Providers.html
[extern_providers_struct]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/query/struct.ExternProviders.html
---
## Adding a new query