Add chapter on libs and metadata. (#1044)
This commit is contained in:
parent
be872c1ce3
commit
95c3930c67
|
|
@ -145,6 +145,7 @@
|
||||||
- [Debugging LLVM](./backend/debugging.md)
|
- [Debugging LLVM](./backend/debugging.md)
|
||||||
- [Backend Agnostic Codegen](./backend/backend-agnostic.md)
|
- [Backend Agnostic Codegen](./backend/backend-agnostic.md)
|
||||||
- [Implicit Caller Location](./backend/implicit-caller-location.md)
|
- [Implicit Caller Location](./backend/implicit-caller-location.md)
|
||||||
|
- [Libraries and Metadata](./backend/libs-and-metadata.md)
|
||||||
- [Profile-guided Optimization](./profile-guided-optimization.md)
|
- [Profile-guided Optimization](./profile-guided-optimization.md)
|
||||||
- [LLVM Source-Based Code Coverage](./llvm-coverage-instrumentation.md)
|
- [LLVM Source-Based Code Coverage](./llvm-coverage-instrumentation.md)
|
||||||
- [Sanitizers Support](./sanitizers.md)
|
- [Sanitizers Support](./sanitizers.md)
|
||||||
|
|
|
||||||
|
|
@ -0,0 +1,192 @@
|
||||||
|
# Libraries and Metadata
|
||||||
|
|
||||||
|
When the compiler sees a reference to an external crate, it needs to load some
|
||||||
|
information about that crate. This chapter gives an overview of that process,
|
||||||
|
and the supported file formats for crate libraries.
|
||||||
|
|
||||||
|
## Libraries
|
||||||
|
|
||||||
|
A crate dependency can be loaded from an `rlib`, `dylib`, or `rmeta` file. A
|
||||||
|
key point of these file formats is that they contain `rustc`-specific
|
||||||
|
[*metadata*](#metadata). This metadata allows the compiler to discover enough
|
||||||
|
information about the external crate to understand the items it contains,
|
||||||
|
which macros it exports, and *much* more.
|
||||||
|
|
||||||
|
### rlib
|
||||||
|
|
||||||
|
An `rlib` is an [archive file], which is similar to a tar file. This file
|
||||||
|
format is specific to `rustc`, and may change over time. This file contains:
|
||||||
|
|
||||||
|
* Object code, which is the result of code generation. This is used during
|
||||||
|
regular linking. There is a separate `.o` file for each [codegen unit]. The
|
||||||
|
codegen step can be skipped with the [`-C
|
||||||
|
linker-plugin-lto`][linker-plugin-lto] CLI option, which means each `.o`
|
||||||
|
file will only contain LLVM bitcode.
|
||||||
|
* [LLVM bitcode], which is a binary representation of LLVM's intermediate
|
||||||
|
representation, which is embedded as a section in the `.o` files. This can
|
||||||
|
be used for [Link Time Optimization] (LTO). This can be removed with the
|
||||||
|
[`-C embed-bitcode=no`][embed-bitcode] CLI option to improve compile times
|
||||||
|
and reduce disk space if LTO is not needed.
|
||||||
|
* `rustc` [metadata], in a file named `lib.rmeta`.
|
||||||
|
* A symbol table, which is generally a list of symbols with offsets to the
|
||||||
|
object file that contain that symbol. This is pretty standard for archive
|
||||||
|
files.
|
||||||
|
|
||||||
|
[archive file]: https://en.wikipedia.org/wiki/Ar_(Unix)
|
||||||
|
[LLVM bitcode]: https://llvm.org/docs/BitCodeFormat.html
|
||||||
|
[Link Time Optimization]: https://llvm.org/docs/LinkTimeOptimization.html
|
||||||
|
[codegen unit]: ../backend/codegen.md
|
||||||
|
[embed-bitcode]: https://doc.rust-lang.org/rustc/codegen-options/index.html#embed-bitcode
|
||||||
|
[linker-plugin-lto]: https://doc.rust-lang.org/rustc/codegen-options/index.html#linker-plugin-lto
|
||||||
|
|
||||||
|
### dylib
|
||||||
|
|
||||||
|
A `dylib` is a platform-specific shared library. It includes the `rustc`
|
||||||
|
[metadata] in a special link section called `.rustc` in a compressed format.
|
||||||
|
|
||||||
|
### rmeta
|
||||||
|
|
||||||
|
An `rmeta` file is custom binary format that contains the [metadata] for the
|
||||||
|
crate. This file can be used for fast "checks" of a project by skipping all
|
||||||
|
code generation (as is done with `cargo check`), collecting enough information
|
||||||
|
for documentation (as is done with `cargo doc`), or for
|
||||||
|
[pipelining](#pipelining). This file is created if the
|
||||||
|
[`--emit=metadata`][emit] CLI option is used.
|
||||||
|
|
||||||
|
`rmeta` files do not support linking, since they do not contain compiled
|
||||||
|
object files.
|
||||||
|
|
||||||
|
[emit]: https://doc.rust-lang.org/rustc/command-line-arguments.html#option-emit
|
||||||
|
|
||||||
|
## Metadata
|
||||||
|
|
||||||
|
The metadata contains a wide swath of different elements. This guide will not
|
||||||
|
go into detail of every field it contains. You are encouraged to browse the
|
||||||
|
[`CrateRoot`] definition to get a sense of the different elements it contains.
|
||||||
|
Everything about metadata encoding and decoding is in the [`rustc_metadata`]
|
||||||
|
package.
|
||||||
|
|
||||||
|
Here are a few highlights of things it contains:
|
||||||
|
|
||||||
|
* The version of the `rustc` compiler. The compiler will refuse to load files
|
||||||
|
from any other version.
|
||||||
|
* The [Strict Version Hash](#strict-version-hash) (SVH). This helps ensure the
|
||||||
|
correct dependency is loaded.
|
||||||
|
* The [Crate Disambiguator](#crate-disambiguator). This is a hash used
|
||||||
|
to disambiguate between different crates of the same name.
|
||||||
|
* Information about all the source files in the library. This can be used for
|
||||||
|
a variety of things, such as diagnostics pointing to sources in a
|
||||||
|
dependency.
|
||||||
|
* Information about exported macros, traits, types, and items. Generally,
|
||||||
|
anything that's needed to be known when a path references something inside a
|
||||||
|
crate dependency.
|
||||||
|
* Encoded [MIR]. This is optional, and only encoded if needed for code
|
||||||
|
generation. `cargo check` skips this for performance reasons.
|
||||||
|
|
||||||
|
[`CrateRoot`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/struct.CrateRoot.html
|
||||||
|
[`rustc_metadata`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/index.html
|
||||||
|
[MIR]: ../mir/index.md
|
||||||
|
|
||||||
|
### Strict Version Hash
|
||||||
|
|
||||||
|
The Strict Version Hash ([SVH], also known as the "crate hash") is a 64-bit
|
||||||
|
hash that is used to ensure that the correct crate dependencies are loaded. It
|
||||||
|
is possible for a directory to contain multiple copies of the same dependency
|
||||||
|
built with different settings, or built from different sources. The crate
|
||||||
|
loader will skip any crates that have the wrong SVH.
|
||||||
|
|
||||||
|
The SVH is also used for the [incremental compilation] session filename,
|
||||||
|
though that usage is mostly historic.
|
||||||
|
|
||||||
|
The hash includes a variety of elements:
|
||||||
|
|
||||||
|
* Hashes of the HIR nodes.
|
||||||
|
* All of the upstream crate hashes.
|
||||||
|
* All of the source filenames.
|
||||||
|
* Hashes of certain command-line flags (like `-C metadata` via the [Crate
|
||||||
|
Disambiguator](#crate-disambiguator), and all CLI options marked with
|
||||||
|
`[TRACKED]`).
|
||||||
|
|
||||||
|
See [`finalize_and_compute_crate_hash`] for where the hash is actually
|
||||||
|
computed.
|
||||||
|
|
||||||
|
[SVH]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/svh/struct.Svh.html
|
||||||
|
[incremental compilation]: ../queries/incremental-compilation.md
|
||||||
|
[`finalize_and_compute_crate_hash`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/hir/map/collector/struct.NodeCollector.html#method.finalize_and_compute_crate_hash
|
||||||
|
|
||||||
|
### Crate Disambiguator
|
||||||
|
|
||||||
|
The [`CrateDisambiguator`] is a 128-bit hash used to distinguish between
|
||||||
|
different crates of the same name. It is a hash of all the [`-C metadata`] CLI
|
||||||
|
options computed in [`compute_crate_disambiguator`]. It is used in a variety
|
||||||
|
of places, such as symbol name mangling, crate loading, and much more.
|
||||||
|
|
||||||
|
By default, all Rust symbols are mangled and incorporate the disambiguator
|
||||||
|
hash. This allows multiple versions of the same crate to be included together.
|
||||||
|
Cargo automatically generates `-C metadata` hashes based on a variety of
|
||||||
|
factors, like the package version, source, and the target kind (a lib and bin
|
||||||
|
can have the same crate name, so they need to be disambiguated).
|
||||||
|
|
||||||
|
[`CrateDisambiguator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/crate_disambiguator/struct.CrateDisambiguator.html
|
||||||
|
[`compute_crate_disambiguator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/util/fn.compute_crate_disambiguator.html
|
||||||
|
[`-C metadata`]: https://doc.rust-lang.org/rustc/codegen-options/index.html#metadata
|
||||||
|
|
||||||
|
## Crate loading
|
||||||
|
|
||||||
|
Crate loading can have quite a few subtle complexities. During [name
|
||||||
|
resolution], when an external crate is referenced (via an `extern crate` or
|
||||||
|
path), the resolver uses the [`CrateLoader`] which is responsible for finding
|
||||||
|
the crate libraries and loading the [metadata] for them. After the dependency
|
||||||
|
is loaded, the `CrateLoader` will provide the information the resolver needs
|
||||||
|
to perform its job (such as expanding macros, resolving paths, etc.).
|
||||||
|
|
||||||
|
To load each external crate, the `CrateLoader` uses a [`CrateLocator`] to
|
||||||
|
actually find the correct files for one specific crate. There is some great
|
||||||
|
documentation in the [`locator`] module that goes into detail on how loading
|
||||||
|
works, and I strongly suggest reading it to get the full picture.
|
||||||
|
|
||||||
|
The location of a dependency can come from several different places. Direct
|
||||||
|
dependencies are usually passed with `--extern` flags, and the loader can look
|
||||||
|
at those directly. Direct dependencies often have references to their own
|
||||||
|
dependencies, which need to be loaded, too. These are usually found by
|
||||||
|
scanning the directories passed with the `-L` flag for any file whose metadata
|
||||||
|
contains a matching crate name and [SVH](#strict-version-hash). The loader
|
||||||
|
will also look at the [sysroot] to find dependencies.
|
||||||
|
|
||||||
|
As crates are loaded, they are kept in the [`CStore`] with the crate metadata
|
||||||
|
wrapped in the [`CrateMetadata`] struct. After resolution and expansion, the
|
||||||
|
`CStore` will make its way into the [`GlobalCtxt`] for the rest of
|
||||||
|
compilation.
|
||||||
|
|
||||||
|
[name resolution]: ../name-resolution.md
|
||||||
|
[`CrateLoader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CrateLoader.html
|
||||||
|
[`CrateLocator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/locator/struct.CrateLocator.html
|
||||||
|
[`locator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/locator/index.html
|
||||||
|
[`CStore`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CStore.html
|
||||||
|
[`CrateMetadata`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/decoder/struct.CrateMetadata.html
|
||||||
|
[`GlobalCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.GlobalCtxt.html
|
||||||
|
[sysroot]: ../building/bootstrapping.md#what-is-a-sysroot
|
||||||
|
|
||||||
|
## Pipelining
|
||||||
|
|
||||||
|
One trick to improve compile times is to start building a crate as soon as the
|
||||||
|
metadata for its dependencies is available. For a library, there is no need to
|
||||||
|
wait for the code generation of dependencies to finish. Cargo implements this
|
||||||
|
technique by telling `rustc` to emit an [`rmeta`](#rmeta) file for each
|
||||||
|
dependency as well as an [`rlib`](#rlib). As early as it can, `rustc` will
|
||||||
|
save the `rmeta` file to disk before it continues to the code generation
|
||||||
|
phase. The compiler sends a JSON message to let the build tool know that it
|
||||||
|
can start building the next crate if possible.
|
||||||
|
|
||||||
|
The [crate loading](#crate-loading) system is smart enough to know when it
|
||||||
|
sees an `rmeta` file to use that if the `rlib` is not there (or has only been
|
||||||
|
partially written).
|
||||||
|
|
||||||
|
This pipelining isn't possible for binaries, because the linking phase will
|
||||||
|
require the code generation of all its dependencies. In the future, it may be
|
||||||
|
possible to further improve this scenario by splitting linking into a separate
|
||||||
|
command (see [#64191]).
|
||||||
|
|
||||||
|
[#64191]: https://github.com/rust-lang/rust/issues/64191
|
||||||
|
|
||||||
|
[metadata]: #metadata
|
||||||
Loading…
Reference in New Issue