From 1a399f5ea3c861b7ae3d282f621eabad59e34c19 Mon Sep 17 00:00:00 2001 From: Alexander Regueiro Date: Sun, 4 Feb 2018 18:48:38 +0000 Subject: [PATCH] Improved grammar of HIR section. --- src/hir.md | 89 ++++++++++++++++++++++++++---------------------------- 1 file changed, 43 insertions(+), 46 deletions(-) diff --git a/src/hir.md b/src/hir.md index 52fda69a..ac7690bd 100644 --- a/src/hir.md +++ b/src/hir.md @@ -1,12 +1,11 @@ # The HIR -The HIR – "High-level IR" – is the primary IR used in most of -rustc. It is a desugared version of the "abstract syntax tree" (AST) -that is generated after parsing, macro expansion, and name resolution -have completed. Many parts of HIR resemble Rust surface syntax quite -closely, with the exception that some of Rust's expression forms have -been desugared away (as an example, `for` loops are converted into a -`loop` and do not appear in the HIR). +The HIR – "High-level IR" – is the primary IR used in most of rustc. +It is a desugared version of the "abstract syntax tree" (AST) that is generated +after parsing, macro expansion, and name resolution have completed. Many parts +of HIR resemble Rust surface syntax quite closely, with the exception that some +of Rust's expression forms have been desugared away (as an example, `for` loops +are converted into a `loop` and do not appear in the HIR). This chapter covers the main concepts of the HIR. @@ -21,8 +20,8 @@ serve to organize the content of the crate for easier access. For example, the contents of individual items (e.g., modules, functions, traits, impls, etc) in the HIR are not immediately -accessible in the parents. So, for example, if had a module item `foo` -containing a function `bar()`: +accessible in the parents. So, for example, if there is a module item +`foo` containing a function `bar()`: ``` mod foo { @@ -30,25 +29,25 @@ mod foo { } ``` -Then in the HIR the representation of module `foo` (the `Mod` -stuct) would have only the **`ItemId`** `I` of `bar()`. To get the +then in the HIR the representation of module `foo` (the `Mod` +stuct) would only have the **`ItemId`** `I` of `bar()`. To get the details of the function `bar()`, we would lookup `I` in the `items` map. One nice result from this representation is that one can iterate over all items in the crate by iterating over the key-value pairs -in these maps (without the need to trawl through the IR in total). +in these maps (without the need to trawl through the whole HIR). There are similar maps for things like trait items and impl items, as well as "bodies" (explained below). -The other reason to setup the representation this way is for better +The other reason to set up the representation this way is for better integration with incremental compilation. This way, if you gain access -to a `&hir::Item` (e.g. for the mod `foo`), you do not immediately +to an `&hir::Item` (e.g. for the mod `foo`), you do not immediately gain access to the contents of the function `bar()`. Instead, you only gain access to the **id** for `bar()`, and you must invoke some -function to lookup the contents of `bar()` given its id; this gives us -a chance to observe that you accessed the data for `bar()` and record -the dependency. +function to lookup the contents of `bar()` given its id; this gives the +compiler a chance to observe that you accessed the data for `bar()`, +and then record the dependency. ### Identifiers in the HIR @@ -57,37 +56,35 @@ carry around references into the HIR, but rather to carry around *identifier numbers* (or just "ids"). Right now, you will find four sorts of identifiers in active use: -- `DefId`, which primarily names "definitions" or top-level items. - - You can think of a `DefId` as being shorthand for a very explicit - and complete path, like `std::collections::HashMap`. However, - these paths are able to name things that are not nameable in - normal Rust (e.g., impls), and they also include extra information - about the crate (such as its version number, as two versions of - the same crate can co-exist). - - A `DefId` really consists of two parts, a `CrateNum` (which - identifies the crate) and a `DefIndex` (which indixes into a list - of items that is maintained per crate). -- `HirId`, which combines the index of a particular item with an - offset within that item. - - the key point of a `HirId` is that it is *relative* to some item (which is named - via a `DefId`). -- `BodyId`, this is an absolute identifier that refers to a specific - body (definition of a function or constant) in the crate. It is currently - effectively a "newtype'd" `NodeId`. -- `NodeId`, which is an absolute id that identifies a single node in the HIR tree. +- `DefId` – primarily names "definitions" or top-level items. + - You can think of a `DefId` as shorthand for a very explicit and complete + path, like `std::collections::HashMap`. However, these paths are able to + name things that are not nameable in normal Rust (e.g. impls), and they also + include extra information about the crate (such as its version number, since + two versions of the same crate can co-exist). + - A `DefId` really consists of two parts, a `CrateNum` (which identifies the + crate) and a `DefIndex` (which indexes into a list of items that is + maintained per crate). +- `HirId` – combines the index of a particular item with an offset within + that item. + - The key point of an `HirId` is that it is *relative* to some item (which is + named via a `DefId`). +- `BodyId` – an absolute identifier that refers to a specific body (definition + of a function or constant) in the crate. It is currently effectively a + "newtype'd" `NodeId`. +- `NodeId` – an absolute ID that identifies a single node in the HIR tree. - While these are still in common use, **they are being slowly phased out**. - - Since they are absolute within the crate, adding a new node - anywhere in the tree causes the node-ids of all subsequent code in - the crate to change. This is terrible for incremental compilation, - as you can perhaps imagine. + - Since they are absolute within the crate, adding a new node anywhere in the + tree causes the `NodeId`s of all subsequent code in the crate to change. + This is terrible for incremental compilation, as you can perhaps imagine. -### HIR Map +### The HIR Map Most of the time when you are working with the HIR, you will do so via the **HIR Map**, accessible in the tcx via `tcx.hir` (and defined in the `hir::map` module). The HIR map contains a number of methods to -convert between ids of various kinds and to lookup data associated -with a HIR node. +convert between IDs of various kinds and to lookup data associated +with an HIR node. For example, if you have a `DefId`, and you would like to convert it to a `NodeId`, you can use `tcx.hir.as_local_node_id(def_id)`. This @@ -100,7 +97,7 @@ Similarly, you can use `tcx.hir.find(n)` to lookup the node for a `NodeId`. This returns a `Option>`, where `Node` is an enum defined in the map; by matching on this you can find out what sort of node the node-id referred to and also get a pointer to the data -itself. Often, you know what sort of node `n` is – e.g., if you know +itself. Often, you know what sort of node `n` is – e.g. if you know that `n` must be some HIR expression, you can do `tcx.hir.expect_expr(n)`, which will extract and return the `&hir::Expr`, panicking if `n` is not in fact an expression. @@ -113,7 +110,7 @@ calls like `tcx.hir.get_parent_node(n)`. A **body** represents some kind of executable code, such as the body of a function/closure or the definition of a constant. Bodies are associated with an **owner**, which is typically some kind of item -(e.g., a `fn()` or `const`), but could also be a closure expression -(e.g., `|x, y| x + y`). You can use the HIR map to find the body -associated with a given def-id (`maybe_body_owned_by()`) or to find +(e.g. an `fn()` or `const`), but could also be a closure expression +(e.g. `|x, y| x + y`). You can use the HIR map to find the body +associated with a given `DefId` (`maybe_body_owned_by()`) or to find the owner of a body (`body_owner_def_id()`).