From 141528264b91f9dd1f5c5d15dc6995ea9ccc27f5 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Fri, 19 Jan 2018 06:51:14 -0500 Subject: [PATCH] move over the `ty` README --- src/SUMMARY.md | 2 +- src/ty.md | 166 ++++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 166 insertions(+), 2 deletions(-) diff --git a/src/SUMMARY.md b/src/SUMMARY.md index b1dfa440..f8b1f1d6 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -9,7 +9,7 @@ - [Macro expansion](./macro-expansion.md) - [Name resolution](./name-resolution.md) - [HIR lowering](./hir-lowering.md) -- [Representing types (`ty` module in depth)](./ty.md) +- [The `ty` module: representing types](./ty.md) - [Type inference](./type-inference.md) - [Trait resolution](./trait-resolution.md) - [Type checking](./type-checking.md) diff --git a/src/ty.md b/src/ty.md index 1762ce03..8debb71c 100644 --- a/src/ty.md +++ b/src/ty.md @@ -1 +1,165 @@ -# Representing types (`ty` module in depth) +# The `ty` module: representing types + +The `ty` module defines how the Rust compiler represents types +internally. It also defines the *typing context* (`tcx` or `TyCtxt`), +which is the central data structure in the compiler. + +## The tcx and how it uses lifetimes + +The `tcx` ("typing context") is the central data structure in the +compiler. It is the context that you use to perform all manner of +queries. The struct `TyCtxt` defines a reference to this shared context: + +```rust +tcx: TyCtxt<'a, 'gcx, 'tcx> +// -- ---- ---- +// | | | +// | | innermost arena lifetime (if any) +// | "global arena" lifetime +// lifetime of this reference +``` + +As you can see, the `TyCtxt` type takes three lifetime parameters. +These lifetimes are perhaps the most complex thing to understand about +the tcx. During Rust compilation, we allocate most of our memory in +**arenas**, which are basically pools of memory that get freed all at +once. When you see a reference with a lifetime like `'tcx` or `'gcx`, +you know that it refers to arena-allocated data (or data that lives as +long as the arenas, anyhow). + +We use two distinct levels of arenas. The outer level is the "global +arena". This arena lasts for the entire compilation: so anything you +allocate in there is only freed once compilation is basically over +(actually, when we shift to executing LLVM). + +To reduce peak memory usage, when we do type inference, we also use an +inner level of arena. These arenas get thrown away once type inference +is over. This is done because type inference generates a lot of +"throw-away" types that are not particularly interesting after type +inference completes, so keeping around those allocations would be +wasteful. + +Often, we wish to write code that explicitly asserts that it is not +taking place during inference. In that case, there is no "local" +arena, and all the types that you can access are allocated in the +global arena. To express this, the idea is to use the same lifetime +for the `'gcx` and `'tcx` parameters of `TyCtxt`. Just to be a touch +confusing, we tend to use the name `'tcx` in such contexts. Here is an +example: + +```rust +fn not_in_inference<'a, 'tcx>(tcx: TyCtxt<'a, 'tcx, 'tcx>, def_id: DefId) { + // ---- ---- + // Using the same lifetime here asserts + // that the innermost arena accessible through + // this reference *is* the global arena. +} +``` + +In contrast, if we want to code that can be usable during type inference, then you +need to declare a distinct `'gcx` and `'tcx` lifetime parameter: + +```rust +fn maybe_in_inference<'a, 'gcx, 'tcx>(tcx: TyCtxt<'a, 'gcx, 'tcx>, def_id: DefId) { + // ---- ---- + // Using different lifetimes here means that + // the innermost arena *may* be distinct + // from the global arena (but doesn't have to be). +} +``` + +### Allocating and working with types + +Rust types are represented using the `Ty<'tcx>` defined in the `ty` +module (not to be confused with the `Ty` struct from [the HIR]). This +is in fact a simple type alias for a reference with `'tcx` lifetime: + +```rust +pub type Ty<'tcx> = &'tcx TyS<'tcx>; +``` + +[the HIR]: ../hir/README.md + +You can basically ignore the `TyS` struct -- you will basically never +access it explicitly. We always pass it by reference using the +`Ty<'tcx>` alias -- the only exception I think is to define inherent +methods on types. Instances of `TyS` are only ever allocated in one of +the rustc arenas (never e.g. on the stack). + +One common operation on types is to **match** and see what kinds of +types they are. This is done by doing `match ty.sty`, sort of like this: + +```rust +fn test_type<'tcx>(ty: Ty<'tcx>) { + match ty.sty { + ty::TyArray(elem_ty, len) => { ... } + ... + } +} +``` + +The `sty` field (the origin of this name is unclear to me; perhaps +structural type?) is of type `TypeVariants<'tcx>`, which is an enum +defining all of the different kinds of types in the compiler. + +> NB: inspecting the `sty` field on types during type inference can be +> risky, as there may be inference variables and other things to +> consider, or sometimes types are not yet known that will become +> known later.). + +To allocate a new type, you can use the various `mk_` methods defined +on the `tcx`. These have names that correpond mostly to the various kinds +of type variants. For example: + +```rust +let array_ty = tcx.mk_array(elem_ty, len * 2); +``` + +These methods all return a `Ty<'tcx>` -- note that the lifetime you +get back is the lifetime of the innermost arena that this `tcx` has +access to. In fact, types are always canonicalized and interned (so we +never allocate exactly the same type twice) and are always allocated +in the outermost arena where they can be (so, if they do not contain +any inference variables or other "temporary" types, they will be +allocated in the global arena). However, the lifetime `'tcx` is always +a safe approximation, so that is what you get back. + +> NB. Because types are interned, it is possible to compare them for +> equality efficiently using `==` -- however, this is almost never what +> you want to do unless you happen to be hashing and looking for +> duplicates. This is because often in Rust there are multiple ways to +> represent the same type, particularly once inference is involved. If +> you are going to be testing for type equality, you probably need to +> start looking into the inference code to do it right. + +You can also find various common types in the `tcx` itself by accessing +`tcx.types.bool`, `tcx.types.char`, etc (see `CommonTypes` for more). + +### Beyond types: Other kinds of arena-allocated data structures + +In addition to types, there are a number of other arena-allocated data +structures that you can allocate, and which are found in this +module. Here are a few examples: + +- `Substs`, allocated with `mk_substs` -- this will intern a slice of types, often used to + specify the values to be substituted for generics (e.g., `HashMap` + would be represented as a slice `&'tcx [tcx.types.i32, tcx.types.u32]`). +- `TraitRef`, typically passed by value -- a **trait reference** + consists of a reference to a trait along with its various type + parameters (including `Self`), like `i32: Display` (here, the def-id + would reference the `Display` trait, and the substs would contain + `i32`). +- `Predicate` defines something the trait system has to prove (see `traits` module). + +### Import conventions + +Although there is no hard and fast rule, the `ty` module tends to be used like so: + +```rust +use ty::{self, Ty, TyCtxt}; +``` + +In particular, since they are so common, the `Ty` and `TyCtxt` types +are imported directly. Other types are often referenced with an +explicit `ty::` prefix (e.g., `ty::TraitRef<'tcx>`). But some modules +choose to import a larger or smaller set of names explicitly.