From 00d5bcc913cbde4f99a7c0cd86460f718ef29305 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Tue, 5 Nov 2019 16:57:35 +0100 Subject: [PATCH] apply linebreaks --- src/miri.md | 147 ++++++++++++++++++++++++++++++++-------------------- 1 file changed, 90 insertions(+), 57 deletions(-) diff --git a/src/miri.md b/src/miri.md index 77ae8f93..09c31e0a 100644 --- a/src/miri.md +++ b/src/miri.md @@ -55,33 +55,40 @@ Before the evaluation, a virtual memory location (in this case essentially a `vec![u8; 4]` or `vec![u8; 8]`) is created for storing the evaluation result. At the start of the evaluation, `_0` and `_1` are -`Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))`. -This is quite a mouthful: [`Operand`] can represent either data stored somewhere in the [interpreter memory](#memory) (`Operand::Indirect`), or (as an optimization) immediate data stored in-line. -And [`Immediate`] can either be a single (potentially uninitialized) [scalar value][`Scalar`] (integer or thin pointer), or a pair of two of them. -In our case, the single scalar value is *not* (yet) initialized. +`Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))`. This is quite +a mouthful: [`Operand`] can represent either data stored somewhere in the +[interpreter memory](#memory) (`Operand::Indirect`), or (as an optimization) +immediate data stored in-line. And [`Immediate`] can either be a single +(potentially uninitialized) [scalar value][`Scalar`] (integer or thin pointer), +or a pair of two of them. In our case, the single scalar value is *not* (yet) +initialized. -When the initialization of `_1` is invoked, the -value of the `FOO` constant is required, and triggers another call to -`tcx.const_eval`, which will not be shown here. If the evaluation of FOO is -successful, `42` will be subtracted from its value `4096` and the result stored in -`_1` as `Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. }, Scalar::Raw { data: 0, .. })`. The first -part of the pair is the computed value, the second part is a bool that's true if -an overflow happened. A `Scalar::Raw` also stores the size (in bytes) of this scalar value; we are eliding that here. +When the initialization of `_1` is invoked, the value of the `FOO` constant is +required, and triggers another call to `tcx.const_eval`, which will not be shown +here. If the evaluation of FOO is successful, `42` will be subtracted from its +value `4096` and the result stored in `_1` as +`Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. }, +Scalar::Raw { data: 0, .. })`. The first part of the pair is the computed value, +the second part is a bool that's true if an overflow happened. A `Scalar::Raw` +also stores the size (in bytes) of this scalar value; we are eliding that here. The next statement asserts that said boolean is `0`. In case the assertion fails, its error message is used for reporting a compile-time error. -Since it does not fail, `Operand::Immediate(Immediate::Scalar(Scalar::Raw { data: 4054, .. }))` is stored in the -virtual memory was allocated before the evaluation. `_0` always refers to that -location directly. +Since it does not fail, `Operand::Immediate(Immediate::Scalar(Scalar::Raw { +data: 4054, .. }))` is stored in the virtual memory was allocated before the +evaluation. `_0` always refers to that location directly. -After the evaluation is done, the return value is converted from [`Operand`] to [`ConstValue`] by [`op_to_const`]: -the former representation is geared towards what is needed *during* cost evaluation, while [`ConstValue`] -is shaped by the needs of the remaining parts of the compiler that consume the results of const evaluation. -As part of this conversion, for types with scalar values, even if -the resulting [`Operand`] is `Indirect`, it will return an immediate `ConstValue::Scalar(computed_value)` (instead of the usual `ConstValue::ByRef`). -This makes using the result much more efficient and also more convenient, as no further queries need to be -executed in order to get at something as simple as a `usize`. +After the evaluation is done, the return value is converted from [`Operand`] to +[`ConstValue`] by [`op_to_const`]: the former representation is geared towards +what is needed *during* cost evaluation, while [`ConstValue`] is shaped by the +needs of the remaining parts of the compiler that consume the results of const +evaluation. As part of this conversion, for types with scalar values, even if +the resulting [`Operand`] is `Indirect`, it will return an immediate +`ConstValue::Scalar(computed_value)` (instead of the usual `ConstValue::ByRef`). +This makes using the result much more efficient and also more convenient, as no +further queries need to be executed in order to get at something as simple as a +`usize`. Future evaluations of the same constants will not actually invoke Miri, but just use the cached result. @@ -96,12 +103,13 @@ Miri, but just use the cached result. Miri's outside-facing datastructures can be found in [librustc/mir/interpret](https://github.com/rust-lang/rust/blob/master/src/librustc/mir/interpret). -This is mainly the error enum and the [`ConstValue`] and [`Scalar`] types. A `ConstValue` can -be either `Scalar` (a single `Scalar`, i.e., integer or thin pointer), -`Slice` (to represent byte slices and strings, as needed for pattern matching) or `ByRef`, which is used for anything else and -refers to a virtual allocation. These allocations can be accessed via the -methods on `tcx.interpret_interner`. -A `Scalar` is either some `Raw` integer or a pointer; see [the next section](#memory) for more on that. +This is mainly the error enum and the [`ConstValue`] and [`Scalar`] types. A +`ConstValue` can be either `Scalar` (a single `Scalar`, i.e., integer or thin +pointer), `Slice` (to represent byte slices and strings, as needed for pattern +matching) or `ByRef`, which is used for anything else and refers to a virtual +allocation. These allocations can be accessed via the methods on +`tcx.interpret_interner`. A `Scalar` is either some `Raw` integer or a pointer; +see [the next section](#memory) for more on that. If you are expecting a numeric result, you can use `eval_usize` (panics on anything that can't be representad as a `u64`) or `try_eval_usize` which results @@ -109,18 +117,25 @@ in an `Option` yielding the `Scalar` if possible. ## Memory -To support any kind of pointers, Miri needs to have a "virtual memory" that the pointers can point to. -This is implemented in the [`Memory`] type. -In the simplest model, every global variable, stack variable and every dynamic allocation corresponds to an [`Allocation`] in that memory. -(Actually using an allocation for every MIR stack variable would be very inefficient; that's why we have `Operand::Immediate` for stack variables that are both small and never have their address taken. -But that is purely an optimization.) +To support any kind of pointers, Miri needs to have a "virtual memory" that the +pointers can point to. This is implemented in the [`Memory`] type. In the +simplest model, every global variable, stack variable and every dynamic +allocation corresponds to an [`Allocation`] in that memory. (Actually using an +allocation for every MIR stack variable would be very inefficient; that's why we +have `Operand::Immediate` for stack variables that are both small and never have +their address taken. But that is purely an optimization.) -Such an `Allocation` is basically just a sequence of `u8` storing the value of each byte in this allocation. -(Plus some extra data, see below.) -Every `Allocation` has a globally unique `AllocId` assigned in `Memory`. -With that, a [`Pointer`] consists of a pair of an `AllocId` (indicating the allocation) and an offset into the allocation (indicating which byte of the allocation the pointer points to). -It may seem odd that a `Pointer` is not just an integer address, but remember that during const evaluation, we cannot know at which actual integer address the allocation will end up -- so we use `AllocId` as symbolic base addresses, which means we need a separate offset. -(As an aside, it turns out that pointers at run-time are [more than just integers, too](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#pointer-provenance).) +Such an `Allocation` is basically just a sequence of `u8` storing the value of +each byte in this allocation. (Plus some extra data, see below.) Every +`Allocation` has a globally unique `AllocId` assigned in `Memory`. With that, a +[`Pointer`] consists of a pair of an `AllocId` (indicating the allocation) and +an offset into the allocation (indicating which byte of the allocation the +pointer points to). It may seem odd that a `Pointer` is not just an integer +address, but remember that during const evaluation, we cannot know at which +actual integer address the allocation will end up -- so we use `AllocId` as +symbolic base addresses, which means we need a separate offset. (As an aside, +it turns out that pointers at run-time are +[more than just integers, too](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#pointer-provenance).) These allocations exist so that references and raw pointers have something to point to. There is no global linear heap in which things are allocated, but each @@ -131,23 +146,35 @@ matter how unsafe) operation that you can do that would ever change said pointer to a pointer to a different local variable `b`. Pointer arithmetic on `a` will only ever change its offset; the `AllocId` stays the same. -This, however, causes a problem when we want to store a `Pointer` into an `Allocation`: we cannot turn it into a sequence of `u8` of the right length! -`AllocId` and offset together are twice as big as a pointer "seems" to be. -This is what the `relocation` field of `Allocation` is for: the byte offset of the `Pointer` gets stored as a bunch of `u8`, while its `AllocId` gets stored out-of-band. -The two are reassembled when the `Pointer` is read from memory. -The other bit of extra data an `Allocation` needs is `undef_mask` for keeping track of which of its bytes are initialized. +This, however, causes a problem when we want to store a `Pointer` into an +`Allocation`: we cannot turn it into a sequence of `u8` of the right length! +`AllocId` and offset together are twice as big as a pointer "seems" to be. This +is what the `relocation` field of `Allocation` is for: the byte offset of the +`Pointer` gets stored as a bunch of `u8`, while its `AllocId` gets stored +out-of-band. The two are reassembled when the `Pointer` is read from memory. +The other bit of extra data an `Allocation` needs is `undef_mask` for keeping +track of which of its bytes are initialized. ### Global memory and exotic allocations -`Memory` exists only during the Miri evaluation; it gets destroyed when the final value of the constant is computed. -In case that constant contains any pointers, those get "interned" and moved to a global "const eval memory" that is part of `TyCtxt`. -These allocations stay around for the remaining computation and get serialized into the final output (so that dependent crates can use them). +`Memory` exists only during the Miri evaluation; it gets destroyed when the +final value of the constant is computed. In case that constant contains any +pointers, those get "interned" and moved to a global "const eval memory" that is +part of `TyCtxt`. These allocations stay around for the remaining computation +and get serialized into the final output (so that dependent crates can use +them). -Moreover, to also support function pointers, the global memory in `TyCtxt` can also contain "virtual allocations": instead of an `Allocation`, these contain an `Instance`. -That allows a `Pointer` to point to either normal data or a function, which is needed to be able to evaluate casts from function pointers to raw pointers. +Moreover, to also support function pointers, the global memory in `TyCtxt` can +also contain "virtual allocations": instead of an `Allocation`, these contain an +`Instance`. That allows a `Pointer` to point to either normal data or a +function, which is needed to be able to evaluate casts from function pointers to +raw pointers. -Finally, the [`GlobalAlloc`] type used in the global memory also contains a variant `Static` that points to a particular `const` or `static` item. -This is needed to support circular statics, where we need to have a `Pointer` to a `static` for which we cannot yet have an `Allocation` as we do not know the bytes of its value. +Finally, the [`GlobalAlloc`] type used in the global memory also contains a +variant `Static` that points to a particular `const` or `static` item. This is +needed to support circular statics, where we need to have a `Pointer` to a +`static` for which we cannot yet have an `Allocation` as we do not know the +bytes of its value. [`Memory`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/interpret/struct.Memory.html [`Allocation`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/interpret/struct.Allocation.html @@ -156,14 +183,20 @@ This is needed to support circular statics, where we need to have a `Pointer` to ### Pointer values vs Pointer types -One common cause of confusion in Miri is that being a pointer *value* and having a pointer *type* are entirely independent properties. -By "pointer value", we refer to a `Scalar::Ptr` containing a `Pointer` and thus pointing somewhere into Miri's virtual memory. -This is in contrast to `Scalar::Raw`, which is just some concrete integer. +One common cause of confusion in Miri is that being a pointer *value* and having +a pointer *type* are entirely independent properties. By "pointer value", we +refer to a `Scalar::Ptr` containing a `Pointer` and thus pointing somewhere into +Miri's virtual memory. This is in contrast to `Scalar::Raw`, which is just some +concrete integer. -However, a variable of pointer or reference *type*, such as `*const T` or `&T`, does not have to have a pointer *value*: -it could be obtaining by casting or transmuting an integer to a pointer (currently that is hard to do in const eval, but eventually `transmute` will be stable as a `const fn`). -And similarly, when casting or transmuting a reference to some actual allocation to an integer, we end up with a pointer *value* (`Scalar::Ptr`) at integer *type* (`usize`). -This is a problem because we cannot meaningfully perform integer operations such as division on pointer values. +However, a variable of pointer or reference *type*, such as `*const T` or `&T`, +does not have to have a pointer *value*: it could be obtaining by casting or +transmuting an integer to a pointer (currently that is hard to do in const eval, +but eventually `transmute` will be stable as a `const fn`). And similarly, when +casting or transmuting a reference to some actual allocation to an integer, we +end up with a pointer *value* (`Scalar::Ptr`) at integer *type* (`usize`). This +is a problem because we cannot meaningfully perform integer operations such as +division on pointer values. ## Interpretation