create a separate chapter on arenas/interning
parent
4b34444364
commit
f007762b9c
|
|
@ -43,6 +43,7 @@
|
||||||
- [Debugging and Testing](./incrcomp-debugging.md)
|
- [Debugging and Testing](./incrcomp-debugging.md)
|
||||||
- [Profiling Queries](./queries/profiling.md)
|
- [Profiling Queries](./queries/profiling.md)
|
||||||
- [Salsa](./salsa.md)
|
- [Salsa](./salsa.md)
|
||||||
|
- [Memory Management in Rustc](./memory.md)
|
||||||
- [Lexing and Parsing](./the-parser.md)
|
- [Lexing and Parsing](./the-parser.md)
|
||||||
- [`#[test]` Implementation](./test-implementation.md)
|
- [`#[test]` Implementation](./test-implementation.md)
|
||||||
- [Panic Implementation](./panic-implementation.md)
|
- [Panic Implementation](./panic-implementation.md)
|
||||||
|
|
|
||||||
|
|
@ -0,0 +1,88 @@
|
||||||
|
# Memory Management in Rustc
|
||||||
|
|
||||||
|
Rustc tries to be pretty careful about how it manages memory. The compiler allocates
|
||||||
|
_a lot_ of data structures throughout compilation, and if we are not careful,
|
||||||
|
it will take a lot of time and space to do so.
|
||||||
|
|
||||||
|
One of the main ways the compiler manages this is by using arenas and interning.
|
||||||
|
|
||||||
|
## Arenas and Interning
|
||||||
|
|
||||||
|
We create a LOT of data structures during compilation. For performance reasons,
|
||||||
|
we allocate them from a global memory pool; they are each allocated once from a
|
||||||
|
long-lived *arena*. This is called _arena allocation_. This system reduces
|
||||||
|
allocations/deallocations of memory. It also allows for easy comparison of
|
||||||
|
types for equality: for each interned type `X`, we implemented [`PartialEq for
|
||||||
|
X`][peqimpl], so we can just compare pointers. The [`CtxtInterners`] type
|
||||||
|
contains a bunch of maps of interned types and the arena itself.
|
||||||
|
|
||||||
|
[peqimpl]: https://github.com/rust-lang/rust/blob/3ee936378662bd2e74be951d6a7011a95a6bd84d/src/librustc/ty/mod.rs#L528-L534
|
||||||
|
[`CtxtInterners`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.CtxtInterners.html#structfield.arena
|
||||||
|
|
||||||
|
### Example: `ty::TyS`
|
||||||
|
|
||||||
|
Take the example of [`ty::TyS`], which represents a type in the compiler (you
|
||||||
|
can read more [here](./ty.md)). Each time we want to construct a type, the
|
||||||
|
compiler doesn’t naively allocate from the buffer. Instead, we check if that
|
||||||
|
type was already constructed. If it was, we just get the same pointer we had
|
||||||
|
before; otherwise, we make a fresh pointer. With this scheme, if we want to know
|
||||||
|
if two types are the same, all we need to do is compare the pointers, which is
|
||||||
|
efficient. `TyS` is carefully set up so you never construct them on the stack.
|
||||||
|
You always allocate them from this arena and you always intern them so they are
|
||||||
|
unique.
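
The check-then-allocate step described above can be sketched in a few lines. This is only an illustration, not the compiler's actual implementation: `ToyKind`, `ToyTy`, and `Interner` are hypothetical names, and `Box::leak` stands in for the arena so that the example stays self-contained.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the structural description of a type.
#[derive(Debug, PartialEq, Eq, Hash)]
enum ToyKind {
    Bool,
    Int,
    Array(ToyTy, usize),
}

/// Hypothetical stand-in for an interned type: a thin pointer to unique data.
#[derive(Debug, Clone, Copy)]
struct ToyTy(&'static ToyKind);

// Because every `ToyKind` is interned, comparing addresses is enough.
impl PartialEq for ToyTy {
    fn eq(&self, other: &Self) -> bool {
        std::ptr::eq(self.0, other.0)
    }
}
impl Eq for ToyTy {}
impl Hash for ToyTy {
    fn hash<H: Hasher>(&self, state: &mut H) {
        (self.0 as *const ToyKind).hash(state)
    }
}

#[derive(Default)]
struct Interner {
    set: HashSet<&'static ToyKind>,
}

impl Interner {
    /// Return the existing pointer if `kind` was seen before, else allocate.
    fn intern(&mut self, kind: ToyKind) -> ToyTy {
        if let Some(&existing) = self.set.get(&kind) {
            return ToyTy(existing);
        }
        // Assumption: leaking a box plays the role of the long-lived arena.
        let fresh: &'static ToyKind = Box::leak(Box::new(kind));
        self.set.insert(fresh);
        ToyTy(fresh)
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern(ToyKind::Int);
    let b = interner.intern(ToyKind::Int);
    let c = interner.intern(ToyKind::Bool);
    assert!(a == b); // same pointer, so equal
    assert!(a != c);
    // Nested types are unique too, since their parts are already interned.
    let arr1 = interner.intern(ToyKind::Array(a, 4));
    let arr2 = interner.intern(ToyKind::Array(b, 4));
    assert!(arr1 == arr2);
}
```

Note that the lookup in the set is still structural; only comparisons between already-interned values reduce to a pointer comparison.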
|
||||||
|
|
||||||
|
At the beginning of the compilation we make a buffer and each time we need to allocate a type we use
|
||||||
|
some of this memory buffer. If we run out of space we get another one. The lifetime of that buffer
|
||||||
|
is `'tcx`. Our types are tied to that lifetime, so when compilation finishes all the memory related
|
||||||
|
to that buffer is freed and our `'tcx` references would be invalid.
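
As a rough sketch of the buffer behaviour described above (and not the arena the compiler actually uses), the following toy arena hands out references tied to its own lifetime and chains on a new buffer when the current one is full. `Chunk` and its fields are hypothetical names, and the design relies on `std::cell::OnceCell` to stay in safe Rust.

```rust
use std::cell::{Cell, OnceCell};

/// Hypothetical fixed-size buffer; a new one is chained on when it fills up.
struct Chunk<T> {
    slots: Vec<OnceCell<T>>,
    used: Cell<usize>,
    next: OnceCell<Box<Chunk<T>>>,
}

impl<T> Chunk<T> {
    fn new(capacity: usize) -> Self {
        // Guard against a zero-sized buffer, which could never hold anything.
        let capacity = capacity.max(1);
        Chunk {
            slots: (0..capacity).map(|_| OnceCell::new()).collect(),
            used: Cell::new(0),
            next: OnceCell::new(),
        }
    }

    /// Store `value` and return a reference that lives as long as the
    /// borrow of the arena itself.
    fn alloc(&self, value: T) -> &T {
        let index = self.used.get();
        if index < self.slots.len() {
            self.used.set(index + 1);
            self.slots[index].get_or_init(|| value)
        } else {
            // Out of space: get another buffer and allocate from it.
            let capacity = self.slots.len();
            self.next
                .get_or_init(|| Box::new(Chunk::new(capacity)))
                .alloc(value)
        }
    }
}

fn main() {
    let arena = Chunk::new(2);
    let a = arena.alloc(String::from("i32"));
    let b = arena.alloc(String::from("u32"));
    let c = arena.alloc(String::from("bool")); // spills into a second buffer
    assert_eq!((a.as_str(), b.as_str(), c.as_str()), ("i32", "u32", "bool"));
    // `a`, `b`, and `c` borrow from `arena`, so the borrow checker rejects
    // any use of them after `arena` has been dropped.
}
```

Everything is freed together when `arena` goes out of scope, which is the property the `'tcx` lifetime expresses for the compiler's data.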
|
||||||
|
|
||||||
|
In addition to types, there are a number of other arena-allocated data structures that you can
|
||||||
|
allocate, and which are found in the `ty` module. Here are a few examples:
|
||||||
|
|
||||||
|
- [`Substs`][subst], allocated with `mk_substs` – this will intern a slice of types, often used to
|
||||||
|
specify the values to be substituted for generics (e.g. `HashMap<i32, u32>` would be represented
|
||||||
|
as a slice `&'tcx [tcx.types.i32, tcx.types.u32]`).
|
||||||
|
- [`TraitRef`], typically passed by value – a **trait reference** consists of a reference to a trait
|
||||||
|
along with its various type parameters (including `Self`), like `i32: Display` (here, the def-id
|
||||||
|
would reference the `Display` trait, and the substs would contain `i32`). Note that `def-id` is
|
||||||
|
defined and discussed in depth in the `AdtDef and DefId` section.
|
||||||
|
- [`Predicate`] defines something the trait system has to prove (see `traits` module).
|
||||||
|
|
||||||
|
[subst]: ./generic_arguments.html#subst
|
||||||
|
[`TraitRef`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TraitRef.html
|
||||||
|
[`Predicate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.Predicate.html
|
||||||
|
|
||||||
|
[`ty::TyS`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyS.html
|
||||||
|
|
||||||
|
## The tcx and how it uses lifetimes
|
||||||
|
|
||||||
|
The `tcx` ("typing context") is the central data structure in the compiler. It is the context that
|
||||||
|
you use to perform all manner of queries. The struct `TyCtxt` defines a reference to this shared
|
||||||
|
context:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
tcx: TyCtxt<'tcx>
|
||||||
|
// ----
|
||||||
|
// |
|
||||||
|
// arena lifetime
|
||||||
|
```
|
||||||
|
|
||||||
|
As you can see, the `TyCtxt` type takes a lifetime parameter. When you see a reference with a
|
||||||
|
lifetime like `'tcx`, you know that it refers to arena-allocated data (or data that lives as long as
|
||||||
|
the arenas, anyhow).
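
To illustrate what the lifetime parameter buys, here is a minimal sketch with a hypothetical `ToyCtxt<'tcx>` handle. The names `ToyGlobals`, `ToyCtxt`, `type_name`, and `first_name` are invented for this example and are not compiler APIs. The point is that data obtained through the handle is tied to `'tcx` rather than to the handle itself.

```rust
/// Hypothetical stand-in for the long-lived data owned by the arenas.
struct ToyGlobals {
    type_names: Vec<String>,
}

/// Hypothetical stand-in for a context handle: cheap to copy around.
#[derive(Clone, Copy)]
struct ToyCtxt<'tcx> {
    globals: &'tcx ToyGlobals,
}

impl<'tcx> ToyCtxt<'tcx> {
    /// The result is tied to `'tcx`, not to the borrow of `self`.
    fn type_name(self, index: usize) -> &'tcx str {
        &self.globals.type_names[index]
    }
}

/// Functions can pass the handle by value and return `'tcx` data freely.
fn first_name<'tcx>(tcx: ToyCtxt<'tcx>) -> &'tcx str {
    tcx.type_name(0)
}

fn main() {
    let globals = ToyGlobals {
        type_names: vec!["bool".to_string(), "char".to_string()],
    };
    let name = {
        // The handle is a short-lived local...
        let tcx = ToyCtxt { globals: &globals };
        first_name(tcx)
    };
    // ...but the data it returned lives as long as `globals` does.
    assert_eq!(name, "bool");
}
```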
|
||||||
|
|
||||||
|
### A Note On Lifetimes
|
||||||
|
|
||||||
|
The Rust compiler is a fairly large program containing lots of big data
|
||||||
|
structures (e.g. the AST, HIR, and the type system) and as such, arenas and
|
||||||
|
references are heavily relied upon to minimize unnecessary memory use. This
|
||||||
|
manifests itself in the way people can plug into the compiler (i.e. the
|
||||||
|
[driver](./rustc-driver.md)), preferring a "push"-style API (callbacks) instead
|
||||||
|
of the more Rust-ic "pull" style (think the `Iterator` trait).
|
||||||
|
|
||||||
|
Thread-local storage and interning are used a lot throughout the compiler to reduce
|
||||||
|
duplication while also preventing a lot of the ergonomic issues due to many
|
||||||
|
pervasive lifetimes. The [`rustc::ty::tls`][tls] module is used to access these
|
||||||
|
thread-locals, although you should rarely need to touch it.
|
||||||
|
|
||||||
|
[tls]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/tls/index.html
|
||||||
|
|
@ -32,19 +32,6 @@ replaces this functionality.
|
||||||
> **Warning:** By its very nature, the internal compiler APIs are always going
|
> **Warning:** By its very nature, the internal compiler APIs are always going
|
||||||
> to be unstable. That said, we do try not to break things unnecessarily.
|
> to be unstable. That said, we do try not to break things unnecessarily.
|
||||||
|
|
||||||
## A Note On Lifetimes
|
|
||||||
|
|
||||||
The Rust compiler is a fairly large program containing lots of big data
|
|
||||||
structures (e.g. the AST, HIR, and the type system) and as such, arenas and
|
|
||||||
references are heavily relied upon to minimize unnecessary memory use. This
|
|
||||||
manifests itself in the way people can plug into the compiler, preferring a
|
|
||||||
"push"-style API (callbacks) instead of the more Rust-ic "pull" style (think
|
|
||||||
the `Iterator` trait).
|
|
||||||
|
|
||||||
Thread-local storage and interning are used a lot through the compiler to reduce
|
|
||||||
duplication while also preventing a lot of the ergonomic issues due to many
|
|
||||||
pervasive lifetimes. The `rustc::ty::tls` module is used to access these
|
|
||||||
thread-locals, although you should rarely need to touch it.
|
|
||||||
|
|
||||||
[cb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/trait.Callbacks.html
|
[cb]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/trait.Callbacks.html
|
||||||
[rd_rc]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/fn.run_compiler.html
|
[rd_rc]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_driver/fn.run_compiler.html
|
||||||
|
|
|
||||||
117
src/ty.md
|
|
@ -119,12 +119,41 @@ field of type [`TyKind`][tykind], which represents the key type information. `Ty
|
||||||
which represents different kinds of types (e.g. primitives, references, abstract data types,
|
which represents different kinds of types (e.g. primitives, references, abstract data types,
|
||||||
generics, lifetimes, etc). `TyS` also has 2 more fields, `flags` and `outer_exclusive_binder`. They
|
generics, lifetimes, etc). `TyS` also has 2 more fields, `flags` and `outer_exclusive_binder`. They
|
||||||
are convenient hacks for efficiency and summarize information about the type that we may want to
|
are convenient hacks for efficiency and summarize information about the type that we may want to
|
||||||
know, but they don’t come into the picture as much here.
|
know, but they don’t come into the picture as much here. Finally, `ty::TyS`s
|
||||||
|
are [interned](./memory.md), so that the `ty::Ty` can be a thin pointer-like
|
||||||
|
type. This allows us to do cheap comparisons for equality, along with the other
|
||||||
|
benefits of interning.
|
||||||
|
|
||||||
[tys]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyS.html
|
[tys]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyS.html
|
||||||
[kind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyS.html#structfield.kind
|
[kind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TyS.html#structfield.kind
|
||||||
[tykind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.TyKind.html
|
[tykind]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.TyKind.html
|
||||||
|
|
||||||
|
## Allocating and working with types
|
||||||
|
|
||||||
|
To allocate a new type, you can use the various `mk_` methods defined on the `tcx`. These have names
|
||||||
|
that correspond mostly to the various kinds of types. For example:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
let array_ty = tcx.mk_array(elem_ty, len * 2);
|
||||||
|
```
|
||||||
|
|
||||||
|
These methods all return a `Ty<'tcx>` – note that the lifetime you get back is the lifetime of the
|
||||||
|
arena that this `tcx` has access to. Types are always canonicalized and interned (so we never
|
||||||
|
allocate exactly the same type twice).
|
||||||
|
|
||||||
|
> NB. Because types are interned, it is possible to compare them for equality efficiently using `==`
|
||||||
|
> – however, this is almost never what you want to do unless you happen to be hashing and looking
|
||||||
|
> for duplicates. This is because often in Rust there are multiple ways to represent the same type,
|
||||||
|
> particularly once inference is involved. If you are going to be testing for type equality, you
|
||||||
|
> probably need to start looking into the inference code to do it right.
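
A toy model can show why `==` may give a misleading answer once inference is involved. This is a simplified sketch and not how the inference code works: `InferTy`, `resolve`, and the `solutions` map are hypothetical. An unresolved inference variable and the type it eventually stands for are different values, so they compare unequal until the variable is resolved.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified type representation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InferTy {
    Int,
    /// An inference variable such as `?0`, identified by an index.
    Infer(u32),
}

/// Replace an inference variable with its known solution, if there is one.
fn resolve(ty: InferTy, solutions: &HashMap<u32, InferTy>) -> InferTy {
    match ty {
        InferTy::Infer(var) => solutions.get(&var).copied().unwrap_or(ty),
        other => other,
    }
}

fn main() {
    let mut solutions = HashMap::new();
    solutions.insert(0, InferTy::Int); // inference decided that ?0 = Int

    let lhs = InferTy::Infer(0);
    let rhs = InferTy::Int;

    // Plain equality sees two different representations...
    assert!(lhs != rhs);
    // ...even though both denote the same type once ?0 is resolved.
    assert!(resolve(lhs, &solutions) == resolve(rhs, &solutions));
}
```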
|
||||||
|
|
||||||
|
You can also find various common types in the `tcx` itself by accessing `tcx.types.bool`,
|
||||||
|
`tcx.types.char`, etc (see [`CommonTypes`] for more).
|
||||||
|
|
||||||
|
[`CommonTypes`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/context/struct.CommonTypes.html
|
||||||
|
|
||||||
|
## `ty::TyKind` Variants
|
||||||
|
|
||||||
Note: `TyKind` is **NOT** the functional programming concept of *Kind*.
|
Note: `TyKind` is **NOT** the functional programming concept of *Kind*.
|
||||||
|
|
||||||
Whenever working with a `Ty` in the compiler, it is common to match on the kind of type:
|
Whenever working with a `Ty` in the compiler, it is common to match on the kind of type:
|
||||||
|
|
@ -147,8 +176,6 @@ types in the compiler.
|
||||||
There are a lot of related types, and we’ll cover them in time (e.g regions/lifetimes,
|
There are a lot of related types, and we’ll cover them in time (e.g regions/lifetimes,
|
||||||
“substitutions”, etc).
|
“substitutions”, etc).
|
||||||
|
|
||||||
## `ty::TyKind` Variants
|
|
||||||
|
|
||||||
There are a bunch of variants on the `TyKind` enum, which you can see by looking at the rustdocs.
|
There are a bunch of variants on the `TyKind` enum, which you can see by looking at the rustdocs.
|
||||||
Here is a sampling:
|
Here is a sampling:
|
||||||
|
|
||||||
|
|
@ -191,90 +218,6 @@ will discuss this more later.
|
||||||
[kinderr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.TyKind.html#variant.Error
|
[kinderr]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.TyKind.html#variant.Error
|
||||||
[kindvars]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.TyKind.html#variants
|
[kindvars]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.TyKind.html#variants
|
||||||
|
|
||||||
## Interning
|
|
||||||
|
|
||||||
We create a LOT of types during compilation. For performance reasons, we allocate them from a global
|
|
||||||
memory pool, they are each allocated once from a long-lived *arena*. This is called _arena
|
|
||||||
allocation_. This system reduces allocations/deallocations of memory. It also allows for easy
|
|
||||||
comparison of types for equality: we implemented [`PartialEq for TyS`][peqimpl], so we can just
|
|
||||||
compare pointers. The [`CtxtInterners`] type contains a bunch of maps of interned types and the
|
|
||||||
arena itself.
|
|
||||||
|
|
||||||
[peqimpl]: https://github.com/rust-lang/rust/blob/3ee936378662bd2e74be951d6a7011a95a6bd84d/src/librustc/ty/mod.rs#L528-L534
|
|
||||||
[`CtxtInterners`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.CtxtInterners.html#structfield.arena
|
|
||||||
|
|
||||||
Each time we want to construct a type, the compiler doesn’t naively allocate from the buffer.
|
|
||||||
Instead, we check if that type was already constructed. If it was, we just get the same pointer we
|
|
||||||
had before, otherwise we make a fresh pointer. With this schema if we want to know if two types are
|
|
||||||
the same, all we need to do is compare the pointers which is efficient. `TyS` which represents types
|
|
||||||
is carefully setup so you never construct them on the stack. You always allocate them from this
|
|
||||||
arena and you always intern them so they are unique.
|
|
||||||
|
|
||||||
At the beginning of the compilation we make a buffer and each time we need to allocate a type we use
|
|
||||||
some of this memory buffer. If we run out of space we get another one. The lifetime of that buffer
|
|
||||||
is `'tcx`. Our types are tied to that lifetime, so when compilation finishes all the memory related
|
|
||||||
to that buffer is freed and our `'tcx` references would be invalid.
|
|
||||||
|
|
||||||
|
|
||||||
## The tcx and how it uses lifetimes
|
|
||||||
|
|
||||||
The `tcx` ("typing context") is the central data structure in the compiler. It is the context that
|
|
||||||
you use to perform all manner of queries. The struct `TyCtxt` defines a reference to this shared
|
|
||||||
context:
|
|
||||||
|
|
||||||
```rust,ignore
|
|
||||||
tcx: TyCtxt<'tcx>
|
|
||||||
// ----
|
|
||||||
// |
|
|
||||||
// arena lifetime
|
|
||||||
```
|
|
||||||
|
|
||||||
As you can see, the `TyCtxt` type takes a lifetime parameter. When you see a reference with a
|
|
||||||
lifetime like `'tcx`, you know that it refers to arena-allocated data (or data that lives as long as
|
|
||||||
the arenas, anyhow).
|
|
||||||
|
|
||||||
## Allocating and working with types
|
|
||||||
|
|
||||||
To allocate a new type, you can use the various `mk_` methods defined on the `tcx`. These have names
|
|
||||||
that correspond mostly to the various kinds of types. For example:
|
|
||||||
|
|
||||||
```rust,ignore
|
|
||||||
let array_ty = tcx.mk_array(elem_ty, len * 2);
|
|
||||||
```
|
|
||||||
|
|
||||||
These methods all return a `Ty<'tcx>` – note that the lifetime you get back is the lifetime of the
|
|
||||||
arena that this `tcx` has access to. Types are always canonicalized and interned (so we never
|
|
||||||
allocate exactly the same type twice).
|
|
||||||
|
|
||||||
> NB. Because types are interned, it is possible to compare them for equality efficiently using `==`
|
|
||||||
> – however, this is almost never what you want to do unless you happen to be hashing and looking
|
|
||||||
> for duplicates. This is because often in Rust there are multiple ways to represent the same type,
|
|
||||||
> particularly once inference is involved. If you are going to be testing for type equality, you
|
|
||||||
> probably need to start looking into the inference code to do it right.
|
|
||||||
|
|
||||||
You can also find various common types in the `tcx` itself by accessing `tcx.types.bool`,
|
|
||||||
`tcx.types.char`, etc (see [`CommonTypes`] for more).
|
|
||||||
|
|
||||||
[`CommonTypes`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/context/struct.CommonTypes.html
|
|
||||||
|
|
||||||
## Beyond types: other kinds of arena-allocated data structures
|
|
||||||
|
|
||||||
In addition to types, there are a number of other arena-allocated data structures that you can
|
|
||||||
allocate, and which are found in this module. Here are a few examples:
|
|
||||||
|
|
||||||
- [`Substs`][subst], allocated with `mk_substs` – this will intern a slice of types, often used to
|
|
||||||
specify the values to be substituted for generics (e.g. `HashMap<i32, u32>` would be represented
|
|
||||||
as a slice `&'tcx [tcx.types.i32, tcx.types.u32]`).
|
|
||||||
- [`TraitRef`], typically passed by value – a **trait reference** consists of a reference to a trait
|
|
||||||
along with its various type parameters (including `Self`), like `i32: Display` (here, the def-id
|
|
||||||
would reference the `Display` trait, and the substs would contain `i32`). Note that `def-id` is
|
|
||||||
defined and discussed in depth in the `AdtDef and DefId` section.
|
|
||||||
- [`Predicate`] defines something the trait system has to prove (see `traits` module).
|
|
||||||
|
|
||||||
[subst]: ./generic_arguments.html#subst
|
|
||||||
[`TraitRef`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/struct.TraitRef.html
|
|
||||||
[`Predicate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/ty/enum.Predicate.html
|
|
||||||
|
|
||||||
## Import conventions
|
## Import conventions
|
||||||
|
|
||||||
Although there is no hard and fast rule, the `ty` module tends to be used like so:
|
Although there is no hard and fast rule, the `ty` module tends to be used like so:
|
||||||
|
|
|
||||||
|
|
@ -43,13 +43,6 @@ tcx.infer_ctxt().enter(|infcx| {
|
||||||
})
|
})
|
||||||
```
|
```
|
||||||
|
|
||||||
Each inference context creates a short-lived type arena to store the
|
|
||||||
fresh types and things that it will create, as described in the
|
|
||||||
[chapter on the `ty` module][ty-ch]. This arena is created by the `enter`
|
|
||||||
function and disposed of after it returns.
|
|
||||||
|
|
||||||
[ty-ch]: ty.html
|
|
||||||
|
|
||||||
Within the closure, `infcx` has the type `InferCtxt<'cx, 'tcx>` for some
|
Within the closure, `infcx` has the type `InferCtxt<'cx, 'tcx>` for some
|
||||||
fresh `'cx`, while `'tcx` is the same as outside the inference context.
|
fresh `'cx`, while `'tcx` is the same as outside the inference context.
|
||||||
(Again, see the [`ty` chapter][ty-ch] for more details on this setup.)
|
(Again, see the [`ty` chapter][ty-ch] for more details on this setup.)
|
||||||
|
|
|
||||||