apply linebreaks
This commit is contained in:
parent
153f3796a9
commit
00d5bcc913
147
src/miri.md
147
src/miri.md
|
|
@ -55,33 +55,40 @@ Before the evaluation, a virtual memory location (in this case essentially a
|
||||||
`vec![u8; 4]` or `vec![u8; 8]`) is created for storing the evaluation result.
|
`vec![u8; 4]` or `vec![u8; 8]`) is created for storing the evaluation result.
|
||||||
|
|
||||||
At the start of the evaluation, `_0` and `_1` are
|
At the start of the evaluation, `_0` and `_1` are
|
||||||
`Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))`.
|
`Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))`. This is quite
|
||||||
This is quite a mouthful: [`Operand`] can represent either data stored somewhere in the [interpreter memory](#memory) (`Operand::Indirect`), or (as an optimization) immediate data stored in-line.
|
a mouthful: [`Operand`] can represent either data stored somewhere in the
|
||||||
And [`Immediate`] can either be a single (potentially uninitialized) [scalar value][`Scalar`] (integer or thin pointer), or a pair of two of them.
|
[interpreter memory](#memory) (`Operand::Indirect`), or (as an optimization)
|
||||||
In our case, the single scalar value is *not* (yet) initialized.
|
immediate data stored in-line. And [`Immediate`] can either be a single
|
||||||
|
(potentially uninitialized) [scalar value][`Scalar`] (integer or thin pointer),
|
||||||
|
or a pair of two of them. In our case, the single scalar value is *not* (yet)
|
||||||
|
initialized.
|
||||||
|
|
||||||
When the initialization of `_1` is invoked, the
|
When the initialization of `_1` is invoked, the value of the `FOO` constant is
|
||||||
value of the `FOO` constant is required, and triggers another call to
|
required, and triggers another call to `tcx.const_eval`, which will not be shown
|
||||||
`tcx.const_eval`, which will not be shown here. If the evaluation of FOO is
|
here. If the evaluation of FOO is successful, `42` will be subtracted from its
|
||||||
successful, `42` will be subtracted from its value `4096` and the result stored in
|
value `4096` and the result stored in `_1` as
|
||||||
`_1` as `Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. }, Scalar::Raw { data: 0, .. })`. The first
|
`Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. },
|
||||||
part of the pair is the computed value, the second part is a bool that's true if
|
Scalar::Raw { data: 0, .. })`. The first part of the pair is the computed value,
|
||||||
an overflow happened. A `Scalar::Raw` also stores the size (in bytes) of this scalar value; we are eliding that here.
|
the second part is a bool that's true if an overflow happened. A `Scalar::Raw`
|
||||||
|
also stores the size (in bytes) of this scalar value; we are eliding that here.
|
||||||
|
|
||||||
The next statement asserts that said boolean is `0`. In case the assertion
|
The next statement asserts that said boolean is `0`. In case the assertion
|
||||||
fails, its error message is used for reporting a compile-time error.
|
fails, its error message is used for reporting a compile-time error.
|
||||||
|
|
||||||
Since it does not fail, `Operand::Immediate(Immediate::Scalar(Scalar::Raw { data: 4054, .. }))` is stored in the
|
Since it does not fail, `Operand::Immediate(Immediate::Scalar(Scalar::Raw {
|
||||||
virtual memory was allocated before the evaluation. `_0` always refers to that
|
data: 4054, .. }))` is stored in the virtual memory was allocated before the
|
||||||
location directly.
|
evaluation. `_0` always refers to that location directly.
|
||||||
|
|
||||||
After the evaluation is done, the return value is converted from [`Operand`] to [`ConstValue`] by [`op_to_const`]:
|
After the evaluation is done, the return value is converted from [`Operand`] to
|
||||||
the former representation is geared towards what is needed *during* cost evaluation, while [`ConstValue`]
|
[`ConstValue`] by [`op_to_const`]: the former representation is geared towards
|
||||||
is shaped by the needs of the remaining parts of the compiler that consume the results of const evaluation.
|
what is needed *during* cost evaluation, while [`ConstValue`] is shaped by the
|
||||||
As part of this conversion, for types with scalar values, even if
|
needs of the remaining parts of the compiler that consume the results of const
|
||||||
the resulting [`Operand`] is `Indirect`, it will return an immediate `ConstValue::Scalar(computed_value)` (instead of the usual `ConstValue::ByRef`).
|
evaluation. As part of this conversion, for types with scalar values, even if
|
||||||
This makes using the result much more efficient and also more convenient, as no further queries need to be
|
the resulting [`Operand`] is `Indirect`, it will return an immediate
|
||||||
executed in order to get at something as simple as a `usize`.
|
`ConstValue::Scalar(computed_value)` (instead of the usual `ConstValue::ByRef`).
|
||||||
|
This makes using the result much more efficient and also more convenient, as no
|
||||||
|
further queries need to be executed in order to get at something as simple as a
|
||||||
|
`usize`.
|
||||||
|
|
||||||
Future evaluations of the same constants will not actually invoke
|
Future evaluations of the same constants will not actually invoke
|
||||||
Miri, but just use the cached result.
|
Miri, but just use the cached result.
|
||||||
|
|
@ -96,12 +103,13 @@ Miri, but just use the cached result.
|
||||||
|
|
||||||
Miri's outside-facing datastructures can be found in
|
Miri's outside-facing datastructures can be found in
|
||||||
[librustc/mir/interpret](https://github.com/rust-lang/rust/blob/master/src/librustc/mir/interpret).
|
[librustc/mir/interpret](https://github.com/rust-lang/rust/blob/master/src/librustc/mir/interpret).
|
||||||
This is mainly the error enum and the [`ConstValue`] and [`Scalar`] types. A `ConstValue` can
|
This is mainly the error enum and the [`ConstValue`] and [`Scalar`] types. A
|
||||||
be either `Scalar` (a single `Scalar`, i.e., integer or thin pointer),
|
`ConstValue` can be either `Scalar` (a single `Scalar`, i.e., integer or thin
|
||||||
`Slice` (to represent byte slices and strings, as needed for pattern matching) or `ByRef`, which is used for anything else and
|
pointer), `Slice` (to represent byte slices and strings, as needed for pattern
|
||||||
refers to a virtual allocation. These allocations can be accessed via the
|
matching) or `ByRef`, which is used for anything else and refers to a virtual
|
||||||
methods on `tcx.interpret_interner`.
|
allocation. These allocations can be accessed via the methods on
|
||||||
A `Scalar` is either some `Raw` integer or a pointer; see [the next section](#memory) for more on that.
|
`tcx.interpret_interner`. A `Scalar` is either some `Raw` integer or a pointer;
|
||||||
|
see [the next section](#memory) for more on that.
|
||||||
|
|
||||||
If you are expecting a numeric result, you can use `eval_usize` (panics on
|
If you are expecting a numeric result, you can use `eval_usize` (panics on
|
||||||
anything that can't be representad as a `u64`) or `try_eval_usize` which results
|
anything that can't be representad as a `u64`) or `try_eval_usize` which results
|
||||||
|
|
@ -109,18 +117,25 @@ in an `Option<u64>` yielding the `Scalar` if possible.
|
||||||
|
|
||||||
## Memory
|
## Memory
|
||||||
|
|
||||||
To support any kind of pointers, Miri needs to have a "virtual memory" that the pointers can point to.
|
To support any kind of pointers, Miri needs to have a "virtual memory" that the
|
||||||
This is implemented in the [`Memory`] type.
|
pointers can point to. This is implemented in the [`Memory`] type. In the
|
||||||
In the simplest model, every global variable, stack variable and every dynamic allocation corresponds to an [`Allocation`] in that memory.
|
simplest model, every global variable, stack variable and every dynamic
|
||||||
(Actually using an allocation for every MIR stack variable would be very inefficient; that's why we have `Operand::Immediate` for stack variables that are both small and never have their address taken.
|
allocation corresponds to an [`Allocation`] in that memory. (Actually using an
|
||||||
But that is purely an optimization.)
|
allocation for every MIR stack variable would be very inefficient; that's why we
|
||||||
|
have `Operand::Immediate` for stack variables that are both small and never have
|
||||||
|
their address taken. But that is purely an optimization.)
|
||||||
|
|
||||||
Such an `Allocation` is basically just a sequence of `u8` storing the value of each byte in this allocation.
|
Such an `Allocation` is basically just a sequence of `u8` storing the value of
|
||||||
(Plus some extra data, see below.)
|
each byte in this allocation. (Plus some extra data, see below.) Every
|
||||||
Every `Allocation` has a globally unique `AllocId` assigned in `Memory`.
|
`Allocation` has a globally unique `AllocId` assigned in `Memory`. With that, a
|
||||||
With that, a [`Pointer`] consists of a pair of an `AllocId` (indicating the allocation) and an offset into the allocation (indicating which byte of the allocation the pointer points to).
|
[`Pointer`] consists of a pair of an `AllocId` (indicating the allocation) and
|
||||||
It may seem odd that a `Pointer` is not just an integer address, but remember that during const evaluation, we cannot know at which actual integer address the allocation will end up -- so we use `AllocId` as symbolic base addresses, which means we need a separate offset.
|
an offset into the allocation (indicating which byte of the allocation the
|
||||||
(As an aside, it turns out that pointers at run-time are [more than just integers, too](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#pointer-provenance).)
|
pointer points to). It may seem odd that a `Pointer` is not just an integer
|
||||||
|
address, but remember that during const evaluation, we cannot know at which
|
||||||
|
actual integer address the allocation will end up -- so we use `AllocId` as
|
||||||
|
symbolic base addresses, which means we need a separate offset. (As an aside,
|
||||||
|
it turns out that pointers at run-time are
|
||||||
|
[more than just integers, too](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#pointer-provenance).)
|
||||||
|
|
||||||
These allocations exist so that references and raw pointers have something to
|
These allocations exist so that references and raw pointers have something to
|
||||||
point to. There is no global linear heap in which things are allocated, but each
|
point to. There is no global linear heap in which things are allocated, but each
|
||||||
|
|
@ -131,23 +146,35 @@ matter how unsafe) operation that you can do that would ever change said pointer
|
||||||
to a pointer to a different local variable `b`.
|
to a pointer to a different local variable `b`.
|
||||||
Pointer arithmetic on `a` will only ever change its offset; the `AllocId` stays the same.
|
Pointer arithmetic on `a` will only ever change its offset; the `AllocId` stays the same.
|
||||||
|
|
||||||
This, however, causes a problem when we want to store a `Pointer` into an `Allocation`: we cannot turn it into a sequence of `u8` of the right length!
|
This, however, causes a problem when we want to store a `Pointer` into an
|
||||||
`AllocId` and offset together are twice as big as a pointer "seems" to be.
|
`Allocation`: we cannot turn it into a sequence of `u8` of the right length!
|
||||||
This is what the `relocation` field of `Allocation` is for: the byte offset of the `Pointer` gets stored as a bunch of `u8`, while its `AllocId` gets stored out-of-band.
|
`AllocId` and offset together are twice as big as a pointer "seems" to be. This
|
||||||
The two are reassembled when the `Pointer` is read from memory.
|
is what the `relocation` field of `Allocation` is for: the byte offset of the
|
||||||
The other bit of extra data an `Allocation` needs is `undef_mask` for keeping track of which of its bytes are initialized.
|
`Pointer` gets stored as a bunch of `u8`, while its `AllocId` gets stored
|
||||||
|
out-of-band. The two are reassembled when the `Pointer` is read from memory.
|
||||||
|
The other bit of extra data an `Allocation` needs is `undef_mask` for keeping
|
||||||
|
track of which of its bytes are initialized.
|
||||||
|
|
||||||
### Global memory and exotic allocations
|
### Global memory and exotic allocations
|
||||||
|
|
||||||
`Memory` exists only during the Miri evaluation; it gets destroyed when the final value of the constant is computed.
|
`Memory` exists only during the Miri evaluation; it gets destroyed when the
|
||||||
In case that constant contains any pointers, those get "interned" and moved to a global "const eval memory" that is part of `TyCtxt`.
|
final value of the constant is computed. In case that constant contains any
|
||||||
These allocations stay around for the remaining computation and get serialized into the final output (so that dependent crates can use them).
|
pointers, those get "interned" and moved to a global "const eval memory" that is
|
||||||
|
part of `TyCtxt`. These allocations stay around for the remaining computation
|
||||||
|
and get serialized into the final output (so that dependent crates can use
|
||||||
|
them).
|
||||||
|
|
||||||
Moreover, to also support function pointers, the global memory in `TyCtxt` can also contain "virtual allocations": instead of an `Allocation`, these contain an `Instance`.
|
Moreover, to also support function pointers, the global memory in `TyCtxt` can
|
||||||
That allows a `Pointer` to point to either normal data or a function, which is needed to be able to evaluate casts from function pointers to raw pointers.
|
also contain "virtual allocations": instead of an `Allocation`, these contain an
|
||||||
|
`Instance`. That allows a `Pointer` to point to either normal data or a
|
||||||
|
function, which is needed to be able to evaluate casts from function pointers to
|
||||||
|
raw pointers.
|
||||||
|
|
||||||
Finally, the [`GlobalAlloc`] type used in the global memory also contains a variant `Static` that points to a particular `const` or `static` item.
|
Finally, the [`GlobalAlloc`] type used in the global memory also contains a
|
||||||
This is needed to support circular statics, where we need to have a `Pointer` to a `static` for which we cannot yet have an `Allocation` as we do not know the bytes of its value.
|
variant `Static` that points to a particular `const` or `static` item. This is
|
||||||
|
needed to support circular statics, where we need to have a `Pointer` to a
|
||||||
|
`static` for which we cannot yet have an `Allocation` as we do not know the
|
||||||
|
bytes of its value.
|
||||||
|
|
||||||
[`Memory`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/interpret/struct.Memory.html
|
[`Memory`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/interpret/struct.Memory.html
|
||||||
[`Allocation`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/interpret/struct.Allocation.html
|
[`Allocation`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/interpret/struct.Allocation.html
|
||||||
|
|
@ -156,14 +183,20 @@ This is needed to support circular statics, where we need to have a `Pointer` to
|
||||||
|
|
||||||
### Pointer values vs Pointer types
|
### Pointer values vs Pointer types
|
||||||
|
|
||||||
One common cause of confusion in Miri is that being a pointer *value* and having a pointer *type* are entirely independent properties.
|
One common cause of confusion in Miri is that being a pointer *value* and having
|
||||||
By "pointer value", we refer to a `Scalar::Ptr` containing a `Pointer` and thus pointing somewhere into Miri's virtual memory.
|
a pointer *type* are entirely independent properties. By "pointer value", we
|
||||||
This is in contrast to `Scalar::Raw`, which is just some concrete integer.
|
refer to a `Scalar::Ptr` containing a `Pointer` and thus pointing somewhere into
|
||||||
|
Miri's virtual memory. This is in contrast to `Scalar::Raw`, which is just some
|
||||||
|
concrete integer.
|
||||||
|
|
||||||
However, a variable of pointer or reference *type*, such as `*const T` or `&T`, does not have to have a pointer *value*:
|
However, a variable of pointer or reference *type*, such as `*const T` or `&T`,
|
||||||
it could be obtaining by casting or transmuting an integer to a pointer (currently that is hard to do in const eval, but eventually `transmute` will be stable as a `const fn`).
|
does not have to have a pointer *value*: it could be obtaining by casting or
|
||||||
And similarly, when casting or transmuting a reference to some actual allocation to an integer, we end up with a pointer *value* (`Scalar::Ptr`) at integer *type* (`usize`).
|
transmuting an integer to a pointer (currently that is hard to do in const eval,
|
||||||
This is a problem because we cannot meaningfully perform integer operations such as division on pointer values.
|
but eventually `transmute` will be stable as a `const fn`). And similarly, when
|
||||||
|
casting or transmuting a reference to some actual allocation to an integer, we
|
||||||
|
end up with a pointer *value* (`Scalar::Ptr`) at integer *type* (`usize`). This
|
||||||
|
is a problem because we cannot meaningfully perform integer operations such as
|
||||||
|
division on pointer values.
|
||||||
|
|
||||||
## Interpretation
|
## Interpretation
|
||||||
|
|
||||||
|
|
|
||||||
Loading…
Reference in New Issue