Add chapter on libs and metadata. (#1044)
parent be872c1ce3
commit 95c3930c67
@@ -145,6 +145,7 @@
- [Debugging LLVM](./backend/debugging.md)
- [Backend Agnostic Codegen](./backend/backend-agnostic.md)
- [Implicit Caller Location](./backend/implicit-caller-location.md)
- [Libraries and Metadata](./backend/libs-and-metadata.md)
- [Profile-guided Optimization](./profile-guided-optimization.md)
- [LLVM Source-Based Code Coverage](./llvm-coverage-instrumentation.md)
- [Sanitizers Support](./sanitizers.md)
@@ -0,0 +1,192 @@
# Libraries and Metadata

When the compiler sees a reference to an external crate, it needs to load some
information about that crate. This chapter gives an overview of that process,
and the supported file formats for crate libraries.

## Libraries

A crate dependency can be loaded from an `rlib`, `dylib`, or `rmeta` file. A
key point of these file formats is that they contain `rustc`-specific
[*metadata*](#metadata). This metadata allows the compiler to discover enough
information about the external crate to understand the items it contains,
which macros it exports, and *much* more.

### rlib

An `rlib` is an [archive file], which is similar to a tar file. This file
format is specific to `rustc`, and may change over time. This file contains:

* Object code, which is the result of code generation. This is used during
  regular linking. There is a separate `.o` file for each [codegen unit]. The
  codegen step can be skipped with the
  [`-C linker-plugin-lto`][linker-plugin-lto] CLI option, which means each
  `.o` file will only contain LLVM bitcode.
* [LLVM bitcode], a binary representation of LLVM's intermediate
  representation, embedded as a section in the `.o` files. This can be used
  for [Link Time Optimization] (LTO). It can be removed with the
  [`-C embed-bitcode=no`][embed-bitcode] CLI option to improve compile times
  and reduce disk space if LTO is not needed.
* `rustc` [metadata], in a file named `lib.rmeta`.
* A symbol table, which is generally a list of symbols with offsets to the
  object file that contains each symbol. This is pretty standard for archive
  files.

[archive file]: https://en.wikipedia.org/wiki/Ar_(Unix)
[LLVM bitcode]: https://llvm.org/docs/BitCodeFormat.html
[Link Time Optimization]: https://llvm.org/docs/LinkTimeOptimization.html
[codegen unit]: ../backend/codegen.md
[embed-bitcode]: https://doc.rust-lang.org/rustc/codegen-options/index.html#embed-bitcode
[linker-plugin-lto]: https://doc.rust-lang.org/rustc/codegen-options/index.html#linker-plugin-lto
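
To make the archive layout more concrete, here is a minimal sketch that lists
the members of a common-format `ar` archive held in memory. It is not how
`rustc` reads an `rlib`. The function names `list_members` and `make_member`
are hypothetical, and the sketch assumes the plain layout of an 8-byte magic
string followed by 60-byte member headers, ignoring details such as long-name
tables.

```rust
/// Size of the fixed per-member header in the common `ar` format.
const HEADER_LEN: usize = 60;
/// Magic bytes at the start of every `ar` archive.
const MAGIC: &[u8] = b"!<arch>\n";

/// Returns `(raw_name, size_in_bytes)` for each member of an archive, or
/// `None` if the bytes do not look like a well-formed archive.
/// Hypothetical helper for illustration only.
pub fn list_members(bytes: &[u8]) -> Option<Vec<(String, usize)>> {
    let mut rest = bytes.strip_prefix(MAGIC)?;
    let mut members = Vec::new();
    while !rest.is_empty() {
        if rest.len() < HEADER_LEN {
            return None;
        }
        let (header, body) = rest.split_at(HEADER_LEN);
        // Every header ends with the two-byte terminator "`\n".
        if !header.ends_with(b"`\n") {
            return None;
        }
        // Bytes 0..16 hold the name and bytes 48..58 the decimal size,
        // both padded with spaces.
        let name = std::str::from_utf8(&header[0..16]).ok()?.trim_end().to_string();
        let size: usize = std::str::from_utf8(&header[48..58]).ok()?.trim().parse().ok()?;
        if body.len() < size {
            return None;
        }
        members.push((name, size));
        // Member data is padded to an even number of bytes.
        let padded = size + (size % 2);
        rest = &body[padded.min(body.len())..];
    }
    Some(members)
}

/// Builds one archive member (header plus padded data) so the parser can be
/// exercised without a real `rlib`. Names must be at most 16 bytes here.
pub fn make_member(name: &str, data: &[u8]) -> Vec<u8> {
    // Fields: name, mtime, uid, gid, mode, size, terminator (60 bytes total).
    let mut out = format!(
        "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n",
        name, 0, 0, 0, 644, data.len()
    )
    .into_bytes();
    out.extend_from_slice(data);
    if data.len() % 2 == 1 {
        out.push(b'\n');
    }
    out
}
```

Appending a few `make_member` results to the magic string and passing the
bytes to `list_members` returns the member names and sizes in order. A real
`rlib` would also show the symbol table and the `lib.rmeta` member described
above.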

### dylib

A `dylib` is a platform-specific shared library. It includes the `rustc`
[metadata] in a special link section called `.rustc` in a compressed format.

### rmeta

An `rmeta` file is a custom binary format that contains the [metadata] for the
crate. This file can be used for fast "checks" of a project by skipping all
code generation (as is done with `cargo check`), collecting enough information
for documentation (as is done with `cargo doc`), or for
[pipelining](#pipelining). This file is created if the
[`--emit=metadata`][emit] CLI option is used.

`rmeta` files do not support linking, since they do not contain compiled
object files.

[emit]: https://doc.rust-lang.org/rustc/command-line-arguments.html#option-emit

## Metadata

The metadata contains a wide swath of different elements. This guide will not
go into detail on every field it contains. You are encouraged to browse the
[`CrateRoot`] definition to get a sense of the different elements it contains.
Everything about metadata encoding and decoding is in the [`rustc_metadata`]
crate.

Here are a few highlights of things it contains:

* The version of the `rustc` compiler. The compiler will refuse to load files
  from any other version.
* The [Strict Version Hash](#strict-version-hash) (SVH). This helps ensure the
  correct dependency is loaded.
* The [Crate Disambiguator](#crate-disambiguator). This is a hash used
  to disambiguate between different crates of the same name.
* Information about all the source files in the library. This can be used for
  a variety of things, such as diagnostics pointing to sources in a
  dependency.
* Information about exported macros, traits, types, and items. Generally,
  anything that must be known when a path references something inside a
  crate dependency.
* Encoded [MIR]. This is optional, and only encoded if needed for code
  generation. `cargo check` skips this for performance reasons.

[`CrateRoot`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/struct.CrateRoot.html
[`rustc_metadata`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/index.html
[MIR]: ../mir/index.md

### Strict Version Hash

The Strict Version Hash ([SVH], also known as the "crate hash") is a 64-bit
hash that is used to ensure that the correct crate dependencies are loaded. It
is possible for a directory to contain multiple copies of the same dependency
built with different settings, or built from different sources. The crate
loader will skip any crates that have the wrong SVH.

The SVH is also used for the [incremental compilation] session filename,
though that usage is mostly historical.

The hash includes a variety of elements:

* Hashes of the HIR nodes.
* All of the upstream crate hashes.
* All of the source filenames.
* Hashes of certain command-line flags (like `-C metadata` via the [Crate
  Disambiguator](#crate-disambiguator), and all CLI options marked with
  `[TRACKED]`).

See [`finalize_and_compute_crate_hash`] for where the hash is actually
computed.

[SVH]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_data_structures/svh/struct.Svh.html
[incremental compilation]: ../queries/incremental-compilation.md
[`finalize_and_compute_crate_hash`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/hir/map/collector/struct.NodeCollector.html#method.finalize_and_compute_crate_hash
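
As a rough illustration of how these elements feed one 64-bit value, the
sketch below folds them into a single hasher. It is a toy model, not the
compiler's implementation: `SvhInputs` and `toy_svh` are hypothetical names,
and it uses the standard library's `DefaultHasher`, whereas `rustc` has its
own stable hashing infrastructure.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the inputs listed above.
#[derive(Clone, Copy, Debug)]
pub struct SvhInputs<'a> {
    /// Hashes of the HIR nodes.
    pub hir_node_hashes: &'a [u64],
    /// Hashes of all upstream crates.
    pub upstream_crate_hashes: &'a [u64],
    /// Names of all source files.
    pub source_filenames: &'a [&'a str],
    /// Tracked command-line flags, rendered as strings for this sketch.
    pub tracked_flags: &'a [&'a str],
}

/// Combines every input into one 64-bit value. Changing any input is
/// expected to change the result (an assumption of this toy model; it is
/// not the algorithm rustc uses).
pub fn toy_svh(inputs: &SvhInputs<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    inputs.hir_node_hashes.hash(&mut hasher);
    inputs.upstream_crate_hashes.hash(&mut hasher);
    inputs.source_filenames.hash(&mut hasher);
    inputs.tracked_flags.hash(&mut hasher);
    hasher.finish()
}
```

The point of the sketch is the dependency on upstream hashes: because a
crate's hash includes the hashes of its dependencies, rebuilding a dependency
with different sources or flags changes the hash of everything downstream,
which is what lets the loader reject stale copies.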

### Crate Disambiguator

The [`CrateDisambiguator`] is a 128-bit hash used to distinguish between
different crates of the same name. It is a hash of all the [`-C metadata`] CLI
options, and is computed in [`compute_crate_disambiguator`]. It is used in a
variety of places, such as symbol name mangling, crate loading, and much more.

By default, all Rust symbols are mangled and incorporate the disambiguator
hash. This allows multiple versions of the same crate to be included together.
Cargo automatically generates `-C metadata` hashes based on a variety of
factors, like the package version, source, and the target kind (a lib and bin
can have the same crate name, so they need to be disambiguated).

[`CrateDisambiguator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/crate_disambiguator/struct.CrateDisambiguator.html
[`compute_crate_disambiguator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/util/fn.compute_crate_disambiguator.html
[`-C metadata`]: https://doc.rust-lang.org/rustc/codegen-options/index.html#metadata
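
The idea of deriving a 128-bit value from the `-C metadata` strings can be
sketched as follows. This is only an illustration: `toy_disambiguator` is a
hypothetical name, the sorting and deduplication step is an assumption made
here so that flag order does not matter, and the hashing scheme (two salted
64-bit hashes glued together) is not what `compute_crate_disambiguator` does.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derives a 128-bit value from a set of `-C metadata` strings.
/// Toy model only; see the real function for the actual algorithm.
pub fn toy_disambiguator(metadata: &[&str]) -> u128 {
    // Assumption of this sketch: normalize the inputs so that the order and
    // repetition of flags on the command line do not affect the result.
    let mut values: Vec<&str> = metadata.to_vec();
    values.sort_unstable();
    values.dedup();

    // Produce two different 64-bit hashes by salting, then join them.
    let half = |salt: u8| {
        let mut hasher = DefaultHasher::new();
        salt.hash(&mut hasher);
        values.hash(&mut hasher);
        hasher.finish() as u128
    };
    (half(0) << 64) | half(1)
}
```

Two builds of a crate named `foo` that receive different `-C metadata`
values end up with different disambiguators, so their mangled symbols do not
collide when both are linked into one program.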

## Crate loading

Crate loading can have quite a few subtle complexities. During [name
resolution], when an external crate is referenced (via an `extern crate` or
path), the resolver uses the [`CrateLoader`] which is responsible for finding
the crate libraries and loading the [metadata] for them. After the dependency
is loaded, the `CrateLoader` will provide the information the resolver needs
to perform its job (such as expanding macros, resolving paths, etc.).

To load each external crate, the `CrateLoader` uses a [`CrateLocator`] to
actually find the correct files for one specific crate. There is some great
documentation in the [`locator`] module that goes into detail on how loading
works, and it is well worth reading to get the full picture.

The location of a dependency can come from several different places. Direct
dependencies are usually passed with `--extern` flags, and the loader can look
at those directly. Direct dependencies often have references to their own
dependencies, which need to be loaded, too. These are usually found by
scanning the directories passed with the `-L` flag for any file whose metadata
contains a matching crate name and [SVH](#strict-version-hash). The loader
will also look at the [sysroot] to find dependencies.

As crates are loaded, they are kept in the [`CStore`] with the crate metadata
wrapped in the [`CrateMetadata`] struct. After resolution and expansion, the
`CStore` will make its way into the [`GlobalCtxt`] for the rest of
compilation.

[name resolution]: ../name-resolution.md
[`CrateLoader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CrateLoader.html
[`CrateLocator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/locator/struct.CrateLocator.html
[`locator`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/locator/index.html
[`CStore`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/creader/struct.CStore.html
[`CrateMetadata`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_metadata/rmeta/decoder/struct.CrateMetadata.html
[`GlobalCtxt`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.GlobalCtxt.html
[sysroot]: ../building/bootstrapping.md#what-is-a-sysroot
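
The search for a transitive dependency can be pictured as a filter over the
files found in the `-L` directories. The sketch below models only that
matching step. `Candidate` and `find_matching` are hypothetical names, the
file paths are made up, and the real `CrateLocator` does considerably more
(see the [`locator`] module documentation).

```rust
/// Hypothetical summary of what the loader learns from one library file
/// after reading its metadata header.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    /// Path of the `rlib`, `dylib`, or `rmeta` file on disk.
    pub path: String,
    /// Crate name recorded in the file's metadata.
    pub crate_name: String,
    /// Strict Version Hash recorded in the file's metadata.
    pub svh: u64,
}

/// Keeps only the candidates whose metadata matches both the crate name and
/// the SVH that the referencing crate recorded for this dependency.
pub fn find_matching<'a>(
    candidates: &'a [Candidate],
    name: &str,
    svh: u64,
) -> Vec<&'a Candidate> {
    candidates
        .iter()
        .filter(|c| c.crate_name == name && c.svh == svh)
        .collect()
}
```

If a directory holds two builds of `foo` with different hashes, only the one
whose SVH matches what the direct dependency was compiled against survives
the filter; the other copy is skipped rather than treated as an error.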

## Pipelining

One trick to improve compile times is to start building a crate as soon as the
metadata for its dependencies is available. For a library, there is no need to
wait for the code generation of dependencies to finish. Cargo implements this
technique by telling `rustc` to emit an [`rmeta`](#rmeta) file for each
dependency as well as an [`rlib`](#rlib). As early as it can, `rustc` will
save the `rmeta` file to disk before it continues to the code generation
phase. The compiler sends a JSON message to let the build tool know that it
can start building the next crate if possible.

The [crate loading](#crate-loading) system is smart enough to use an `rmeta`
file when the corresponding `rlib` is not there (or has only been partially
written).

This pipelining isn't possible for binaries, because the linking phase
requires the generated code of all of the binary's dependencies. In the
future, it may be possible to further improve this scenario by splitting
linking into a separate command (see [#64191]).

[#64191]: https://github.com/rust-lang/rust/issues/64191
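
The scheduling rule described in this section can be written down as a small
predicate. This is a sketch of the rule only, not of Cargo's scheduler:
`Kind`, `DepState`, and `can_start` are hypothetical names introduced for
illustration.

```rust
/// What is being built.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Kind {
    Library,
    Binary,
}

/// Which outputs of one dependency are already on disk.
#[derive(Clone, Copy, Debug)]
pub struct DepState {
    pub rmeta_ready: bool,
    pub rlib_ready: bool,
}

/// Decides whether a crate can start compiling given its dependencies.
pub fn can_start(kind: Kind, deps: &[DepState]) -> bool {
    match kind {
        // A library only needs metadata from each dependency, so it can
        // start while the dependencies are still generating code.
        Kind::Library => deps.iter().all(|d| d.rmeta_ready || d.rlib_ready),
        // A binary is linked, so it needs the generated code of every
        // dependency.
        Kind::Binary => deps.iter().all(|d| d.rlib_ready),
    }
}
```

With a dependency whose `rmeta` is written but whose `rlib` is not, the
predicate allows a dependent library to start and makes a dependent binary
wait, which matches the asymmetry described above.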

[metadata]: #metadata