create a separate chapter on arenas/interning
This commit is contained in:
parent
43ca498a19
commit
d6c9c42df5
|
|
@ -43,6 +43,7 @@
|
|||
- [Debugging and Testing](./incrcomp-debugging.md)
|
||||
- [Profiling Queries](./queries/profiling.md)
|
||||
- [Salsa](./salsa.md)
|
||||
- [Memory Management in Rustc](./memory.md)
|
||||
- [Lexing and Parsing](./the-parser.md)
|
||||
- [`#[test]` Implementation](./test-implementation.md)
|
||||
- [Panic Implementation](./panic-implementation.md)
|
||||
|
|
|
|||
|
|
@ -0,0 +1,88 @@
|
|||
# Memory Management in Rustc
|
||||
|
||||
Rustc tries to be pretty careful how it manages memory. The compiler allocates
|
||||
_a lot_ of data structures throughout compilation, and if we are not careful,
|
||||
it will take a lot of time and space to do so.
|
||||
|
||||
One of the main way the compiler manages this is using arenas and interning.
|
||||
|
||||
## Arenas and Interning
|
||||
|
||||
We create a LOT of data structures during compilation. For performance reasons,
|
||||
we allocate them from a global memory pool; they are each allocated once from a
|
||||
long-lived *arena*. This is called _arena allocation_. This system reduces
|
||||
allocations/deallocations of memory. It also allows for easy comparison of
|
||||
types for equality: for each interned type `X`, we implemented [`PartialEq for
|
||||
X`][peqimpl], so we can just compare pointers. The [`CtxtInterners`] type
|
||||
contains a bunch of maps of interned types and the arena itself.
|
||||
|
||||
[peqimpl]: https://github.com/rust-lang/rust/blob/3ee936378662bd2e74be951d6a7011a95a6bd84d/src/librustc/ty/mod.rs#L528-L534
|
||||
[`CtxtInterners`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.CtxtInterners.html#structfield.arena
|
||||
|
||||
### Example: `ty::TyS`
|
||||
|
||||
Taking the example of [`ty::TyS`] which represents a type in the compiler (you
|
||||
can read more [here](./ty.md)). Each time we want to construct a type, the
|
||||
compiler doesn’t naively allocate from the buffer. Instead, we check if that
|
||||
type was already constructed. If it was, we just get the same pointer we had
|
||||
before, otherwise we make a fresh pointer. With this schema if we want to know
|
||||
if two types are the same, all we need to do is compare the pointers which is
|
||||
efficient. `TyS` is carefully setup so you never construct them on the stack.
|
||||
You always allocate them from this arena and you always intern them so they are
|
||||
unique.
|
||||
|
||||
At the beginning of the compilation we make a buffer and each time we need to allocate a type we use
|
||||
some of this memory buffer. If we run out of space we get another one. The lifetime of that buffer
|
||||
is `'tcx`. Our types are tied to that lifetime, so when compilation finishes all the memory related
|
||||
to that buffer is freed and our `'tcx` references would be invalid.
|
||||
|
||||
In addition to types, there are a number of other arena-allocated data structures that you can
|
||||
allocate, and which are found in this module. Here are a few examples:
|
||||
|
||||
- [`Substs`][subst], allocated with `mk_substs` – this will intern a slice of types, often used to
|
||||
specify the values to be substituted for generics (e.g. `HashMap<i32, u32>` would be represented
|
||||
as a slice `&'tcx [tcx.types.i32, tcx.types.u32]`).
|
||||
- [`TraitRef`], typically passed by value – a **trait reference** consists of a reference to a trait
|
||||
along with its various type parameters (including `Self`), like `i32: Display` (here, the def-id
|
||||
would reference the `Display` trait, and the substs would contain `i32`). Note that `def-id` is
|
||||
defined and discussed in depth in the `AdtDef and DefId` section.
|
||||
- [`Predicate`] defines something the trait system has to prove (see `traits` module).
|
||||
|
||||
[subst]: ./generic_arguments.html#subst
|
||||
[`TraitRef`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TraitRef.html
|
||||
[`Predicate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.Predicate.html
|
||||
|
||||
[`ty::TyS`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyS.html
|
||||
|
||||
## The tcx and how it uses lifetimes
|
||||
|
||||
The `tcx` ("typing context") is the central data structure in the compiler. It is the context that
|
||||
you use to perform all manner of queries. The struct `TyCtxt` defines a reference to this shared
|
||||
context:
|
||||
|
||||
```rust,ignore
|
||||
tcx: TyCtxt<'tcx>
|
||||
// ----
|
||||
// |
|
||||
// arena lifetime
|
||||
```
|
||||
|
||||
As you can see, the `TyCtxt` type takes a lifetime parameter. When you see a reference with a
|
||||
lifetime like `'tcx`, you know that it refers to arena-allocated data (or data that lives as long as
|
||||
the arenas, anyhow).
|
||||
|
||||
### A Note On Lifetimes
|
||||
|
||||
The Rust compiler is a fairly large program containing lots of big data
|
||||
structures (e.g. the AST, HIR, and the type system) and as such, arenas and
|
||||
references are heavily relied upon to minimize unnecessary memory use. This
|
||||
manifests itself in the way people can plug into the compiler (i.e. the
|
||||
[driver](./rustc-driver.md)), preferring a "push"-style API (callbacks) instead
|
||||
of the more Rust-ic "pull" style (think the `Iterator` trait).
|
||||
|
||||
Thread-local storage and interning are used a lot through the compiler to reduce
|
||||
duplication while also preventing a lot of the ergonomic issues due to many
|
||||
pervasive lifetimes. The [`rustc::ty::tls`][tls] module is used to access these
|
||||
thread-locals, although you should rarely need to touch it.
|
||||
|
||||
[tls]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/tls/index.html
|
||||
|
|
@ -32,19 +32,6 @@ replaces this functionality.
|
|||
> **Warning:** By its very nature, the internal compiler APIs are always going
|
||||
> to be unstable. That said, we do try not to break things unnecessarily.
|
||||
|
||||
## A Note On Lifetimes
|
||||
|
||||
The Rust compiler is a fairly large program containing lots of big data
|
||||
structures (e.g. the AST, HIR, and the type system) and as such, arenas and
|
||||
references are heavily relied upon to minimize unnecessary memory use. This
|
||||
manifests itself in the way people can plug into the compiler, preferring a
|
||||
"push"-style API (callbacks) instead of the more Rust-ic "pull" style (think
|
||||
the `Iterator` trait).
|
||||
|
||||
Thread-local storage and interning are used a lot through the compiler to reduce
|
||||
duplication while also preventing a lot of the ergonomic issues due to many
|
||||
pervasive lifetimes. The `rustc::ty::tls` module is used to access these
|
||||
thread-locals, although you should rarely need to touch it.
|
||||
|
||||
[cb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/trait.Callbacks.html
|
||||
[rd_rc]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/fn.run_compiler.html
|
||||
|
|
|
|||
117
src/ty.md
117
src/ty.md
|
|
@ -119,12 +119,41 @@ field of type [`TyKind`][tykind], which represents the key type information. `Ty
|
|||
which represents different kinds of types (e.g. primitives, references, abstract data types,
|
||||
generics, lifetimes, etc). `TyS` also has 2 more fields, `flags` and `outer_exclusive_binder`. They
|
||||
are convenient hacks for efficiency and summarize information about the type that we may want to
|
||||
know, but they don’t come into the picture as much here.
|
||||
know, but they don’t come into the picture as much here. Finally, `ty::TyS`s
|
||||
are [interned](./memory.md), so that the `ty::Ty` can be a thin pointer-like
|
||||
type. This allows us to do cheap comparisons for equality, along with the other
|
||||
benefits of interning.
|
||||
|
||||
[tys]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyS.html
|
||||
[kind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyS.html#structfield.kind
|
||||
[tykind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.TyKind.html
|
||||
|
||||
## Allocating and working with types
|
||||
|
||||
To allocate a new type, you can use the various `mk_` methods defined on the `tcx`. These have names
|
||||
that correspond mostly to the various kinds of types. For example:
|
||||
|
||||
```rust,ignore
|
||||
let array_ty = tcx.mk_array(elem_ty, len * 2);
|
||||
```
|
||||
|
||||
These methods all return a `Ty<'tcx>` – note that the lifetime you get back is the lifetime of the
|
||||
arena that this `tcx` has access to. Types are always canonicalized and interned (so we never
|
||||
allocate exactly the same type twice).
|
||||
|
||||
> NB. Because types are interned, it is possible to compare them for equality efficiently using `==`
|
||||
> – however, this is almost never what you want to do unless you happen to be hashing and looking
|
||||
> for duplicates. This is because often in Rust there are multiple ways to represent the same type,
|
||||
> particularly once inference is involved. If you are going to be testing for type equality, you
|
||||
> probably need to start looking into the inference code to do it right.
|
||||
|
||||
You can also find various common types in the `tcx` itself by accessing `tcx.types.bool`,
|
||||
`tcx.types.char`, etc (see [`CommonTypes`] for more).
|
||||
|
||||
[`CommonTypes`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/context/struct.CommonTypes.html
|
||||
|
||||
## `ty::TyKind` Variants
|
||||
|
||||
Note: `TyKind` is **NOT** the functional programming concept of *Kind*.
|
||||
|
||||
Whenever working with a `Ty` in the compiler, it is common to match on the kind of type:
|
||||
|
|
@ -147,8 +176,6 @@ types in the compiler.
|
|||
There are a lot of related types, and we’ll cover them in time (e.g regions/lifetimes,
|
||||
“substitutions”, etc).
|
||||
|
||||
## `ty::TyKind` Variants
|
||||
|
||||
There are a bunch of variants on the `TyKind` enum, which you can see by looking at the rustdocs.
|
||||
Here is a sampling:
|
||||
|
||||
|
|
@ -191,90 +218,6 @@ will discuss this more later.
|
|||
[kinderr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.TyKind.html#variant.Error
|
||||
[kindvars]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.TyKind.html#variants
|
||||
|
||||
## Interning
|
||||
|
||||
We create a LOT of types during compilation. For performance reasons, we allocate them from a global
|
||||
memory pool, they are each allocated once from a long-lived *arena*. This is called _arena
|
||||
allocation_. This system reduces allocations/deallocations of memory. It also allows for easy
|
||||
comparison of types for equality: we implemented [`PartialEq for TyS`][peqimpl], so we can just
|
||||
compare pointers. The [`CtxtInterners`] type contains a bunch of maps of interned types and the
|
||||
arena itself.
|
||||
|
||||
[peqimpl]: https://github.com/rust-lang/rust/blob/3ee936378662bd2e74be951d6a7011a95a6bd84d/src/librustc/ty/mod.rs#L528-L534
|
||||
[`CtxtInterners`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.CtxtInterners.html#structfield.arena
|
||||
|
||||
Each time we want to construct a type, the compiler doesn’t naively allocate from the buffer.
|
||||
Instead, we check if that type was already constructed. If it was, we just get the same pointer we
|
||||
had before, otherwise we make a fresh pointer. With this schema if we want to know if two types are
|
||||
the same, all we need to do is compare the pointers which is efficient. `TyS` which represents types
|
||||
is carefully setup so you never construct them on the stack. You always allocate them from this
|
||||
arena and you always intern them so they are unique.
|
||||
|
||||
At the beginning of the compilation we make a buffer and each time we need to allocate a type we use
|
||||
some of this memory buffer. If we run out of space we get another one. The lifetime of that buffer
|
||||
is `'tcx`. Our types are tied to that lifetime, so when compilation finishes all the memory related
|
||||
to that buffer is freed and our `'tcx` references would be invalid.
|
||||
|
||||
|
||||
## The tcx and how it uses lifetimes
|
||||
|
||||
The `tcx` ("typing context") is the central data structure in the compiler. It is the context that
|
||||
you use to perform all manner of queries. The struct `TyCtxt` defines a reference to this shared
|
||||
context:
|
||||
|
||||
```rust,ignore
|
||||
tcx: TyCtxt<'tcx>
|
||||
// ----
|
||||
// |
|
||||
// arena lifetime
|
||||
```
|
||||
|
||||
As you can see, the `TyCtxt` type takes a lifetime parameter. When you see a reference with a
|
||||
lifetime like `'tcx`, you know that it refers to arena-allocated data (or data that lives as long as
|
||||
the arenas, anyhow).
|
||||
|
||||
## Allocating and working with types
|
||||
|
||||
To allocate a new type, you can use the various `mk_` methods defined on the `tcx`. These have names
|
||||
that correspond mostly to the various kinds of types. For example:
|
||||
|
||||
```rust,ignore
|
||||
let array_ty = tcx.mk_array(elem_ty, len * 2);
|
||||
```
|
||||
|
||||
These methods all return a `Ty<'tcx>` – note that the lifetime you get back is the lifetime of the
|
||||
arena that this `tcx` has access to. Types are always canonicalized and interned (so we never
|
||||
allocate exactly the same type twice).
|
||||
|
||||
> NB. Because types are interned, it is possible to compare them for equality efficiently using `==`
|
||||
> – however, this is almost never what you want to do unless you happen to be hashing and looking
|
||||
> for duplicates. This is because often in Rust there are multiple ways to represent the same type,
|
||||
> particularly once inference is involved. If you are going to be testing for type equality, you
|
||||
> probably need to start looking into the inference code to do it right.
|
||||
|
||||
You can also find various common types in the `tcx` itself by accessing `tcx.types.bool`,
|
||||
`tcx.types.char`, etc (see [`CommonTypes`] for more).
|
||||
|
||||
[`CommonTypes`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/context/struct.CommonTypes.html
|
||||
|
||||
## Beyond types: other kinds of arena-allocated data structures
|
||||
|
||||
In addition to types, there are a number of other arena-allocated data structures that you can
|
||||
allocate, and which are found in this module. Here are a few examples:
|
||||
|
||||
- [`Substs`][subst], allocated with `mk_substs` – this will intern a slice of types, often used to
|
||||
specify the values to be substituted for generics (e.g. `HashMap<i32, u32>` would be represented
|
||||
as a slice `&'tcx [tcx.types.i32, tcx.types.u32]`).
|
||||
- [`TraitRef`], typically passed by value – a **trait reference** consists of a reference to a trait
|
||||
along with its various type parameters (including `Self`), like `i32: Display` (here, the def-id
|
||||
would reference the `Display` trait, and the substs would contain `i32`). Note that `def-id` is
|
||||
defined and discussed in depth in the `AdtDef and DefId` section.
|
||||
- [`Predicate`] defines something the trait system has to prove (see `traits` module).
|
||||
|
||||
[subst]: ./generic_arguments.html#subst
|
||||
[`TraitRef`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TraitRef.html
|
||||
[`Predicate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.Predicate.html
|
||||
|
||||
## Import conventions
|
||||
|
||||
Although there is no hard and fast rule, the `ty` module tends to be used like so:
|
||||
|
|
|
|||
|
|
@ -43,13 +43,6 @@ tcx.infer_ctxt().enter(|infcx| {
|
|||
})
|
||||
```
|
||||
|
||||
Each inference context creates a short-lived type arena to store the
|
||||
fresh types and things that it will create, as described in the
|
||||
[chapter on the `ty` module][ty-ch]. This arena is created by the `enter`
|
||||
function and disposed of after it returns.
|
||||
|
||||
[ty-ch]: ty.html
|
||||
|
||||
Within the closure, `infcx` has the type `InferCtxt<'cx, 'tcx>` for some
|
||||
fresh `'cx`, while `'tcx` is the same as outside the inference context.
|
||||
(Again, see the [`ty` chapter][ty-ch] for more details on this setup.)
|
||||
|
|
|
|||
Loading…
Reference in New Issue