A guide to how rustc works and how to contribute to it.
bors de1eb200fd Auto merge of #134767 - Bryanskiy:dylibs-3, r=petrochenkov
Initial support for dynamically linked crates

This PR is an initial implementation of the [rust-lang/rfcs#3435](https://github.com/rust-lang/rfcs/pull/3435) proposal.
### component 1: interface generator

The interface generator is a tool that produces a stripped version of a crate's source code. The interface is like a C header: all function bodies are omitted. For example, given the initial crate:

```rust
#[export]
#[repr(C)]
pub struct S {
    pub x: i32,
}

#[export]
pub extern "C" fn foo(x: S) {
    // `bar` is defined at the crate root, so it is called without a module prefix.
    bar(x);
}

pub fn bar(x: crate::S) {
    // some computations
}
```

The generated interface:

```rust
#[export]
#[repr(C)]
pub struct S {
    pub x: i32,
}

#[export]
pub extern "C" fn foo(x: S);

pub fn bar(x: crate::S);
```

The interface generator was implemented as part of the pretty-printer. Ideally, the interface should contain only exportable items, but this runs into the first problem:
- The pass that determines exportable items relies on privacy information, which is fully available only in HIR.
- The HIR pretty-printer emits pseudo-code (at least for attributes).

So, the interface generator was implemented on the AST. As a result, non-exportable items cannot be filtered out, but I don't think this is a major issue at the moment.
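
As a rough illustration of body stripping at the AST level, the sketch below uses a toy `Item` enum (hypothetical, far simpler than rustc's AST) and prints an interface in which every function body is replaced by `;`. Because this toy printer has no privacy information, the non-exported `bar` is kept, which mirrors the limitation described above. It is not the actual pretty-printer code.

```rust
/// Toy AST for this sketch only; all names here are hypothetical.
#[allow(dead_code)] // `body` is stored but deliberately never printed.
enum Item {
    Struct { attrs: Vec<String>, name: String, fields: Vec<(String, String)> },
    Fn { attrs: Vec<String>, sig: String, body: String },
}

/// Pretty-print items as an interface: struct definitions are kept in full,
/// function bodies are dropped and replaced by `;`.
fn print_interface(items: &[Item]) -> String {
    let mut out = String::new();
    for item in items {
        match item {
            Item::Struct { attrs, name, fields } => {
                for attr in attrs {
                    out.push_str(attr);
                    out.push('\n');
                }
                out.push_str(&format!("pub struct {name} {{\n"));
                for (field, ty) in fields {
                    out.push_str(&format!("    pub {field}: {ty},\n"));
                }
                out.push_str("}\n");
            }
            Item::Fn { attrs, sig, .. } => {
                for attr in attrs {
                    out.push_str(attr);
                    out.push('\n');
                }
                out.push_str(&format!("{sig};\n"));
            }
        }
    }
    out
}

/// The example crate from the text, expressed in the toy AST.
fn sample_crate() -> Vec<Item> {
    vec![
        Item::Struct {
            attrs: vec!["#[export]".into(), "#[repr(C)]".into()],
            name: "S".into(),
            fields: vec![("x".into(), "i32".into())],
        },
        Item::Fn {
            attrs: vec!["#[export]".into()],
            sig: "pub extern \"C\" fn foo(x: S)".into(),
            body: "{ bar(x); }".into(),
        },
        Item::Fn {
            attrs: vec![],
            sig: "pub fn bar(x: crate::S)".into(),
            body: "{ /* some computations */ }".into(),
        },
    ]
}

fn main() {
    print!("{}", print_interface(&sample_crate()));
}
```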

To emit an interface, use the new `sdylib` crate type. It is basically the same as `dylib`, except that it doesn't contain metadata and it produces the interface as a second artifact. The current interface name is `lib{crate_name}.rs`.
#### Why was it decided to use a design with an auto-generated interface?

One of the main objectives of this proposal is to allow building the library and the application with different compiler versions. This requires either a metadata format that is compatible across rustc versions or some form of source code. The option of a stable metadata format has not been investigated in detail, and it is not part of the RFC either. Here is the related discussion: https://github.com/rust-lang/rfcs/pull/3435#discussion_r1202872373

The original proposal suggests using the source code of the dynamic library and all its dependencies, with metadata obtained from `cargo check`. I decided to use interface files because they are more or less compatible with the original proposal but also allow users to hide the source code.
##### Regarding the design with interfaces

In Rust, files generally do not have a special meaning, unlike in C++. A translation unit, i.e. a crate, is not a single file; it consists of modules. Modules, in turn, can be declared in one file or split across several. That's why the "interface file" isn't a very coherent concept in Rust. I would like to avoid adding an additional level of complexity for users until it is proven necessary. Therefore, the initial plan was to make the interfaces completely invisible to users, i.e. to make them auto-generated. I also planned to put them in the dylib, but this has not been done yet (since the PR is already big enough, I decided to postpone it).

There is one concern, though, which has not yet been investigated (https://github.com/rust-lang/rust/pull/134767#issuecomment-2736471828):

> Compiling the interface as pretty-printed source code doesn't use correct macro hygiene (mostly relevant to macros 2.0, stable macros do not affect item hygiene).  I don't have much hope for encoding hygiene data in any stable way, we should rather support a way for the interface file to be provided manually, instead of being auto-generated, if there are any non-trivial requirements.
### component 2: crate loader

When building dynamic dependencies, the crate loader searches the file system for the interface, builds the interface without codegen, and loads its metadata. Routing rules for interface files are almost the same as for `rlibs` and `dylibs`: the compiler first checks the `extern` options and then tries to deduce the path itself.

Here are the code and commands that correspond to the compilation process:

```rust
// simple-lib.rs
#![crate_type = "sdylib"]

#[export]
pub extern "C" fn foo() -> i32 {
    42
}
```

```rust
// app.rs
extern crate simple_lib;

fn main() {
    assert_eq!(simple_lib::foo(), 42);
}
```

```
# Generate the interface and build the library.
rustc +toolchain1 simple-lib.rs

# Build the app, possibly with a different compiler version.
rustc +toolchain2 app.rs -L.
```

P.S. The interface name/format and the rules for file system routing may still change.
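
The lookup order described in this section (explicit `extern` options first, then deduced paths) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the crate loader's real code: `find_interface`, its parameters, and the `exists` callback that stands in for the file system are all hypothetical. Only the `lib{crate_name}.rs` naming comes from the text above.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Interface file name as described above: `lib{crate_name}.rs`.
fn interface_file_name(crate_name: &str) -> String {
    format!("lib{crate_name}.rs")
}

/// Hypothetical lookup: an explicit `--extern name=path` entry wins;
/// otherwise each search directory (as given by `-L`) is probed in order.
/// `exists` abstracts the file system so the sketch runs without touching disk.
fn find_interface(
    crate_name: &str,
    externs: &HashMap<String, PathBuf>,
    search_dirs: &[PathBuf],
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    if let Some(path) = externs.get(crate_name) {
        return Some(path.clone());
    }
    search_dirs
        .iter()
        .map(|dir| dir.join(interface_file_name(crate_name)))
        .find(|candidate| exists(candidate.as_path()))
}

fn main() {
    // Mirrors `rustc app.rs -L.`: no `--extern`, one search directory.
    let externs = HashMap::new();
    let dirs = vec![PathBuf::from(".")];
    let found = find_interface("simple_lib", &externs, &dirs, |p| {
        p.ends_with("libsimple_lib.rs")
    });
    println!("{found:?}");
}
```

Passing the existence check as a closure keeps the sketch testable; a real loader would also have to handle several candidate kinds (`rlib`, `dylib`, interface) for the same crate name.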
### component 3: exportable items collector

A query that collects exportable items. Which items are exportable is defined [here](https://github.com/m-ou-se/rfcs/blob/export/text/0000-export.md#the-export-attribute).
### component 4: "stable" mangling scheme

The mangling scheme proposed in the RFC consists of two parts: a mangled item path and a hash of the signature.
#### mangled item path

For the first part of the symbol, it was decided to reuse the `v0` mangling scheme, as it is much less dependent on compiler internals than the `legacy` scheme.

The exception is disambiguators (https://doc.rust-lang.org/rustc/symbol-mangling/v0.html#disambiguator):

For example, during symbol mangling rustc uses a special index to distinguish between two impls of the same type in the same module (see `DisambiguatedDefPathData`). The calculation of this index may depend on private items, but private items should not affect the ABI. Example:

```rust
#[export]
#[repr(C)]
pub struct S<T>(pub T);

struct S1;
pub struct S2;

impl S<S1> {
    extern "C" fn foo() -> i32 {
        1
    }
}

#[export]
impl S<S2> {
    // Different symbol names can be generated for this item
    // when compiling the interface and source code.
    pub extern "C" fn foo() -> i32 {
        2
    }
}
```

To make disambiguation independent of the compiler version, we can assign an id to each impl according to its relative order in the source code.
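
A minimal sketch of that idea follows. It rests on one assumption that the text does not spell out: only exported impls are numbered, so that private impls cannot shift the ids. The `ImplBlock` type and `stable_impl_ids` function are hypothetical names used for illustration only.

```rust
/// Minimal description of an impl block for this sketch (hypothetical type).
struct ImplBlock {
    /// Textual self type, e.g. "S<S2>".
    self_ty: &'static str,
    /// Whether the impl carries `#[export]`.
    exported: bool,
}

/// Give each exported impl an id equal to its position among the exported
/// impls, in source order. Non-exported impls get no id and therefore do not
/// influence the ids of exported ones.
fn stable_impl_ids(impls: &[ImplBlock]) -> Vec<(&'static str, usize)> {
    impls
        .iter()
        .filter(|block| block.exported)
        .enumerate()
        .map(|(id, block)| (block.self_ty, id))
        .collect()
}

fn main() {
    // The two impls from the example above, in source order.
    let impls = [
        ImplBlock { self_ty: "S<S1>", exported: false },
        ImplBlock { self_ty: "S<S2>", exported: true },
    ];
    for (ty, id) in stable_impl_ids(&impls) {
        println!("impl {ty} -> id {id}");
    }
}
```

With this numbering, the id of `impl S<S2>` is the same whether or not the private `impl S<S1>` is visible to the compiler.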

The second example is `StableCrateId`, which is used to disambiguate different crates. `StableCrateId` consists of the crate name, the `-Cmetadata` arguments, and the compiler version. At the moment, I have decided to keep only the crate name, but a more consistent approach to crate disambiguation could be added in the future.

There are more cases where such disambiguation is used, for instance when mangling internal rustc symbols, but they haven't been investigated in detail yet.
#### hash of the signature

Exportable functions from stable dylibs can be called from safe code. To provide type safety, a 128-bit hash of the relevant type information is appended to the symbol ([description from RFC](https://github.com/m-ou-se/rfcs/blob/export/text/0000-export.md#name-mangling-and-safety)). For now, it includes:

- the hash of the type name for primitive types
- for ADTs with public fields, a hash that follows [these rules](https://github.com/m-ou-se/rfcs/blob/export/text/0000-export.md#types-with-public-fields)

`#[export(unsafe_stable_abi = "hash")]` syntax for ADT types with private fields is not yet implemented.
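
To make the idea concrete, here is a toy sketch of appending a 128-bit signature hash to a mangled path. Everything in it is an assumption made for illustration: the `Ty` type, the textual encoding of types, and the FNV-1a-style hash are stand-ins, not what rustc or the RFC specify. The sketch only shows that the hash depends on primitive type names and on the public fields of ADTs, so a change in a field type yields a different symbol.

```rust
/// Toy 128-bit FNV-1a-style hash, used here only because it is deterministic
/// and short. It is NOT the hash used by rustc.
fn fnv1a_128(bytes: &[u8]) -> u128 {
    const OFFSET_BASIS: u128 = 0x6c62272e07bb014262b821756295c58d;
    const PRIME: u128 = (1u128 << 88) + 0x13b;
    let mut hash = OFFSET_BASIS;
    for &byte in bytes {
        hash ^= byte as u128;
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Hypothetical type description: primitives by name, ADTs by public fields.
enum Ty {
    Prim(&'static str),
    Adt { name: &'static str, fields: Vec<(&'static str, Ty)> },
}

/// Serialize a type into a string; the format is an assumption of this sketch.
fn encode(ty: &Ty, out: &mut String) {
    match ty {
        Ty::Prim(name) => out.push_str(name),
        Ty::Adt { name, fields } => {
            out.push_str(name);
            out.push('{');
            for (field, field_ty) in fields {
                out.push_str(field);
                out.push(':');
                encode(field_ty, out);
                out.push(',');
            }
            out.push('}');
        }
    }
}

/// Hash a function signature built from its input and output types.
fn signature_hash(inputs: &[Ty], output: &Ty) -> u128 {
    let mut buf = String::from("fn(");
    for ty in inputs {
        encode(ty, &mut buf);
        buf.push(',');
    }
    buf.push_str(")->");
    encode(output, &mut buf);
    fnv1a_128(buf.as_bytes())
}

/// Append the hash, as 32 hex digits, to an already mangled item path.
fn symbol_with_hash(mangled_path: &str, inputs: &[Ty], output: &Ty) -> String {
    format!("{mangled_path}{:032x}", signature_hash(inputs, output))
}

fn main() {
    let s = Ty::Adt { name: "S", fields: vec![("x", Ty::Prim("i32"))] };
    // "PATH_" is a placeholder, not a real v0-mangled path.
    println!("{}", symbol_with_hash("PATH_", &[s], &Ty::Prim("()")));
}
```

If the library and the application disagree about the layout of `S`, they compute different hashes, so the mismatch surfaces as an unresolved symbol at link time instead of undefined behavior at run time.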

Type safety is a subtle matter here. I used the approach from the RFC, but there is an ongoing research project on this topic: [https://rust-lang.github.io/rust-project-goals/2025h1/safe-linking.html](https://rust-lang.github.io/rust-project-goals/2025h1/safe-linking.html)

### Unresolved questions

Interfaces:
1. Move the interface generator to HIR and add an exportable items filter.
2. Compatibility of auto-generated interfaces and macro hygiene.
3. There is an open issue with interface files compilation: https://github.com/rust-lang/rust/pull/134767#issuecomment-2736471828
4. Put an interface into a dylib.

Mangling scheme:
1. Which information is required to ensure type safety and how should it be encoded? ([https://rust-lang.github.io/rust-project-goals/2025h1/safe-linking.html](https://rust-lang.github.io/rust-project-goals/2025h1/safe-linking.html))
2. Determine all other possible cases, where path disambiguation is used. Make it compiler independent.

We also need a semi-stable API to represent types. For example, the order of fields in `VariantDef` must be stable. The same applies to a semi-stable representation of the AST, which must ensure that the order of items in the code is preserved.

There are some other unresolved questions, mentioned in the proposal.
2025-05-05 08:36:17 +00:00
.github/workflows Update mdbook to 0.4.48 2025-04-28 11:41:17 -07:00
ci add rustfmt settings file 2025-03-30 00:31:44 +02:00
examples add rustfmt settings file 2025-03-30 00:31:44 +02:00
josh-sync use repo name in push pr title 2025-04-28 06:49:13 +02:00
src compiletest: Support matching on non-json lines in compiler output 2025-05-04 18:27:45 +03:00
.editorconfig Set max line length in `.editorconfig` to 100 (#1788) 2023-09-05 23:24:59 +09:00
.gitattributes .gitattributes: Mark minified javascript as binary to filter greps 2022-10-07 18:34:51 +02:00
.gitignore add josh-sync build dir to gitignore (#2196) 2025-01-06 02:57:03 +08:00
.mailmap add a mailmap 2023-12-17 18:21:38 +01:00
CITATION.cff Add a citation file 2023-02-11 08:41:56 +02:00
CNAME cname (#606) 2020-03-09 18:10:52 -03:00
CODE_OF_CONDUCT.md Fix some links (#1865) 2024-01-28 19:44:41 -03:00
LICENSE-APACHE add code-of-conduct, licensing material, and a README 2018-01-16 16:36:21 -05:00
LICENSE-MIT add code-of-conduct, licensing material, and a README 2018-01-16 16:36:21 -05:00
README.md toolchain version does not need to be specified 2025-04-19 13:34:13 +02:00
book.toml Update book.toml fix the authors field 2025-04-04 08:34:08 +03:00
mermaid-init.js add mdbook-mermaid 2022-07-17 23:34:12 +02:00
mermaid.min.js add mdbook-mermaid 2022-07-17 23:34:12 +02:00
rust-version Preparing for merge from rustc 2025-05-01 04:05:40 +00:00
rustfmt.toml add rustfmt settings file 2025-03-30 00:31:44 +02:00
triagebot.toml Merge pull request #2352 from xizheyin/enable-behind-upstream 2025-04-30 18:42:23 +08:00

README.md


This is a collaborative effort to build a guide that explains how rustc works. The aim of the guide is to help new contributors get oriented to rustc, as well as to help more experienced folks in figuring out some new part of the compiler that they haven't worked on before.

You can read the latest version of the guide here.

You may also find the rustdocs for the compiler itself useful. Note that these are not intended as a guide; it's recommended that you search for the docs you're looking for instead of reading them top to bottom.

For documentation on developing the standard library, see std-dev-guide.

Contributing to the guide

The guide is useful today, but there is still a lot of work to do.

If you'd like to help improve the guide, we'd love to have you! You can find plenty of issues on the issue tracker. Just post a comment on the issue you would like to work on to make sure that we don't accidentally duplicate work. If you think something is missing, please open an issue about it!

If you don't know how the compiler works, that is not a problem! In that case, we will schedule a bit of time for you to talk with someone who does know the code, or who wants to pair with you and figure it out. Then you can work on writing up what you learned.

When writing about a particular part of the compiler's code, we recommend that you link to the relevant parts of the rustc rustdocs.

Build Instructions

To build a local static HTML site, install mdbook with:

cargo install mdbook mdbook-linkcheck2 mdbook-toc mdbook-mermaid

and execute the following command in the root of the repository:

mdbook build --open

The build files are found in the book/html directory.

We use mdbook-linkcheck2 to validate URLs included in our documentation. Link checking is not run by default locally, though it is in CI. To enable it locally, set the environment variable ENABLE_LINKCHECK=1 like in the following example.

ENABLE_LINKCHECK=1 mdbook serve

Table of Contents

We use mdbook-toc to auto-generate TOCs for long sections. You can invoke the preprocessor by including the <!-- toc --> marker at the place where you want the TOC.

Synchronizing josh subtree with rustc

This repository is linked to rust-lang/rust as a josh subtree. You can use the following commands to synchronize the subtree in both directions.

You'll need to install josh-proxy locally via

cargo install josh-proxy --git https://github.com/josh-project/josh --tag r24.10.04

Older versions of josh-proxy may not round trip commits losslessly so it is important to install this exact version.

Pull changes from rust-lang/rust into this repository

  1. Checkout a new branch that will be used to create a PR into rust-lang/rustc-dev-guide
  2. Run the pull command
    cargo run --manifest-path josh-sync/Cargo.toml rustc-pull
    
  3. Push the branch to your fork and create a PR into rustc-dev-guide

Push changes from this repository into rust-lang/rust

  1. Run the push command to create a branch named <branch-name> in a rustc fork under the <gh-username> account
    cargo run --manifest-path josh-sync/Cargo.toml rustc-push <branch-name> <gh-username>
    
  2. Create a PR from <branch-name> into rust-lang/rust

Minimal git config

For ease of implementation, the josh-sync script simply calls out to system git. This means that the git invocation may be influenced by global (or local) git configuration.

You may see "Nothing to pull" even when you know rustc-pull has something to pull if your global git config sets fetch.prunetags = true (other configurations may also cause unexpected outcomes).

To minimize the likelihood of this happening, you may wish to keep a separate minimal git config that only has [user] entries from global git config, then repoint system git to use the minimal git config instead. E.g.

GIT_CONFIG_GLOBAL=/path/to/minimal/gitconfig GIT_CONFIG_SYSTEM='' cargo run --manifest-path josh-sync/Cargo.toml -- rustc-pull