
skip codegen for intrinsics with big fallback bodies if backend does not need them #150605

Merged
rust-bors[bot] merged 1 commit into rust-lang:main from RalfJung:fallback-intrinsic-skip
Feb 4, 2026


@RalfJung
Member

@RalfJung RalfJung commented Jan 2, 2026

This hopefully fixes the perf regression from #148478. I only added the intrinsics with big fallback bodies to the list; it doesn't seem worth the effort of going through the entire list.

Fixes #149945
Cc @scottmcm @bjorn3

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 2, 2026
@rustbot
Collaborator

rustbot commented Jan 2, 2026

r? @jieyouxu

rustbot has assigned @jieyouxu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer


/// The names of intrinsics that the current codegen backend replaces
/// with its own implementations.
pub replaced_intrinsics: Vec<Symbol>,
Member Author


It seems there is no way to get the current codegen backend from a tcx. I wasn't sure what the best way is to make this list of symbols available to monomorphization, and went for a new field in Session -- does that make sense?

Member


I don't know enough about how all this should be structured to know what the best option is here.

This seems at least plausible, since at worst it stays empty and that doesn't hurt anything (other than perf).

Member Author


@bjorn3 do you have any suggestions for how to deal with this?

Member


I am not the biggest fan of another Session field, but don't have any other suggestions either.
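
As a rough illustration of the idea discussed in this thread, the sketch below models a session field that monomorphization could consult before collecting a fallback body. It is not the rustc implementation: `Session` and `Intrinsic` here are simplified stand-ins, `needs_fallback_body` is a hypothetical helper, and plain `String` names replace rustc's `Symbol`. Only the field name `replaced_intrinsics` is taken from the diff above.

```rust
/// Simplified stand-in for the compiler session (assumption: the real
/// `Session` has many more fields and stores `Symbol`s, not `String`s).
pub struct Session {
    /// The names of intrinsics that the current codegen backend replaces
    /// with its own implementations.
    pub replaced_intrinsics: Vec<String>,
}

/// Hypothetical description of an intrinsic encountered during monomorphization.
pub struct Intrinsic {
    pub name: String,
    pub has_fallback_body: bool,
}

/// Hypothetical helper: returns true if the fallback body still has to be
/// monomorphized, i.e. the intrinsic has a fallback body and the backend
/// does not replace the intrinsic with its own implementation.
pub fn needs_fallback_body(sess: &Session, intrinsic: &Intrinsic) -> bool {
    intrinsic.has_fallback_body && !sess.replaced_intrinsics.contains(&intrinsic.name)
}
```

With an empty `replaced_intrinsics` list the helper returns true for every intrinsic that has a fallback body, which matches the observation above that an empty list only costs performance.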

@RalfJung
Member Author

RalfJung commented Jan 2, 2026

@bors try
@rust-timer queue


rust-bors bot added a commit that referenced this pull request Jan 2, 2026
skip codegen for intrinsics with big fallback bodies if backend does not need them
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 2, 2026

@RalfJung RalfJung force-pushed the fallback-intrinsic-skip branch from 4ca06da to a170604 Compare January 2, 2026 19:29
@rust-bors
Contributor

rust-bors bot commented Jan 2, 2026

☀️ Try build successful (CI)
Build commit: 4763a83 (4763a83f81ae539aaa6f6e5e773ba1fc73de0a10, parent: 8a24a202aa02f677fc2a3e0e1a69af7545803952)


@rust-timer
Collaborator

Finished benchmarking commit (4763a83): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.6% | [0.6%, 0.6%] | 1 |
| Regressions ❌ (secondary) | 0.1% | [0.1%, 0.1%] | 1 |
| Improvements ✅ (primary) | -1.8% | [-2.8%, -0.8%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.0% | [-2.8%, 0.6%] | 3 |

Max RSS (memory usage)

Results (primary -1.5%, secondary 3.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.3% | [0.7%, 1.9%] | 2 |
| Regressions ❌ (secondary) | 3.5% | [3.5%, 3.5%] | 1 |
| Improvements ✅ (primary) | -4.3% | [-7.2%, -1.4%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.5% | [-7.2%, 1.9%] | 4 |

Cycles

Results (primary -3.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -3.9% | [-3.9%, -3.9%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -3.9% | [-3.9%, -3.9%] | 1 |

Binary size

Results (primary 0.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.4% | [1.4%, 1.4%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.0% | [-0.1%, -0.0%] | 7 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.2% | [-0.1%, 1.4%] | 8 |

Bootstrap: 473.485s -> 474.195s (0.15%)
Artifact size: 390.77 MiB -> 390.79 MiB (0.01%)
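
One way to read the "All (primary)" rows in this report is as a count-weighted mean of the primary regression and improvement rows. That reading is an assumption: it matches the rounded figures shown here (for instruction count, 0.6% over 1 benchmark and -1.8% over 2 benchmarks combine to -1.0% over 3), but the report itself does not state how the row is computed. The helper name `combined_mean` below is hypothetical.

```rust
/// Combines per-group `(mean, count)` pairs into one count-weighted mean.
/// Returns `None` when the groups contain no benchmarks at all.
pub fn combined_mean(groups: &[(f64, u32)]) -> Option<f64> {
    let total: u32 = groups.iter().map(|&(_, count)| count).sum();
    if total == 0 {
        return None;
    }
    // Weight each group's mean by the number of benchmarks in that group.
    let weighted: f64 = groups
        .iter()
        .map(|&(mean, count)| mean * f64::from(count))
        .sum();
    Some(weighted / f64::from(total))
}
```

Because the published means are already rounded to one decimal place, recomputing from them can only reproduce the "All" figure approximately.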

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jan 2, 2026
@RalfJung RalfJung force-pushed the fallback-intrinsic-skip branch from a170604 to 57e44f5 Compare January 2, 2026 22:14
@RalfJung
Member Author

RalfJung commented Jan 2, 2026

@bors try
@rust-timer queue


rust-bors bot added a commit that referenced this pull request Jan 2, 2026
skip codegen for intrinsics with big fallback bodies if backend does not need them
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 2, 2026
@rust-bors
Contributor

rust-bors bot commented Jan 3, 2026

☀️ Try build successful (CI)
Build commit: c75310a (c75310a5c412df8835187dd0ef37361b2f00d085, parent: 5497a36a7faf3d2af37beebcff7008e493202902)


@rust-timer
Collaborator

Finished benchmarking commit (c75310a): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.7% | [0.7%, 0.7%] | 1 |
| Regressions ❌ (secondary) | 0.1% | [0.1%, 0.1%] | 1 |
| Improvements ✅ (primary) | -1.8% | [-2.9%, -0.8%] | 2 |
| Improvements ✅ (secondary) | -0.4% | [-0.4%, -0.4%] | 1 |
| All ❌✅ (primary) | -1.0% | [-2.9%, 0.7%] | 3 |

Max RSS (memory usage)

Results (primary -4.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -4.1% | [-7.3%, -1.7%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -4.1% | [-7.3%, -1.7%] | 3 |

Cycles

Results (primary -3.9%, secondary 15.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 15.2% | [15.0%, 15.4%] | 2 |
| Improvements ✅ (primary) | -3.9% | [-3.9%, -3.9%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -3.9% | [-3.9%, -3.9%] | 1 |

Binary size

Results (primary 0.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.4% | [1.4%, 1.4%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.0% | [-0.1%, -0.0%] | 7 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.2% | [-0.1%, 1.4%] | 8 |

Bootstrap: 471.287s -> 473.923s (0.56%)
Artifact size: 390.83 MiB -> 390.83 MiB (-0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 3, 2026
@jieyouxu
Member

jieyouxu commented Jan 3, 2026

@rustbot reroll

@RalfJung
Member Author

Seems like I had no luck with that reroll.
@rustbot reroll

@scottmcm or could you review this?

@rustbot rustbot assigned mati865 and unassigned SparrowLii Jan 31, 2026
@mati865
Member

mati865 commented Jan 31, 2026

Cool idea!

I'll wait a few days to give @scottmcm time to respond, as the much more knowledgeable person.

Do you know if there is a list of similarly optimised intrinsics somewhere?

@RalfJung
Member Author

RalfJung commented Feb 2, 2026

In principle one could go over all the intrinsics that have fallback bodies, and then check whether the LLVM backend has implementations for them.

But most fallback bodies are small so the cost of monomorphizing them is tiny. Not sure if it's worth going through the entire list. I think I got all the ones that have big fallback bodies where we really don't want to pay the monomorphization cost.
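
The trade-off described here could be sketched as a filter over the intrinsics. This is only an illustration under stated assumptions: `IntrinsicInfo`, `worth_listing`, the size measure, and the `min_size` threshold are all hypothetical, and the comment above suggests the PR's list was picked manually rather than computed.

```rust
/// Hypothetical summary of one intrinsic; these fields do not mirror rustc types.
#[derive(Debug, Clone, PartialEq)]
pub struct IntrinsicInfo {
    pub name: &'static str,
    /// Rough size of the fallback body (for example a statement count);
    /// 0 means the intrinsic has no fallback body.
    pub fallback_body_size: usize,
    /// Whether the backend provides its own implementation.
    pub backend_implements: bool,
}

/// Picks the intrinsics worth adding to the skip list: the backend implements
/// them itself and their fallback body has at least `min_size` units, so
/// monomorphizing the unused body would be comparatively expensive.
pub fn worth_listing(all: &[IntrinsicInfo], min_size: usize) -> Vec<&'static str> {
    all.iter()
        .filter(|info| info.backend_implements && info.fallback_body_size >= min_size)
        .map(|info| info.name)
        .collect()
}
```

Lowering `min_size` toward 1 corresponds to going through the entire list, which the comment above judges not worth the effort because small bodies are cheap to monomorphize.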

@mati865
Member

mati865 commented Feb 4, 2026

Fair enough, thanks for the explanation.

@bors r+

@rust-bors
Contributor

rust-bors bot commented Feb 4, 2026

📌 Commit 57e44f5 has been approved by mati865

It is now in the queue for this repository.

@rust-bors rust-bors bot added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 4, 2026

rust-bors bot pushed a commit that referenced this pull request Feb 4, 2026
skip codegen for intrinsics with big fallback bodies if backend does not need them

This hopefully fixes the perf regression from #148478. I only added the intrinsics with big fallback bodies to the list; it doesn't seem worth the effort of going through the entire list.

Fixes #149945
Cc @scottmcm @bjorn3
JonathanBrouwer added a commit to JonathanBrouwer/rust that referenced this pull request Feb 4, 2026
…r=mati865

skip codegen for intrinsics with big fallback bodies if backend does not need them

This hopefully fixes the perf regression from rust-lang#148478. I only added the intrinsics with big fallback bodies to the list; it doesn't seem worth the effort of going through the entire list.

Fixes rust-lang#149945
Cc @scottmcm @bjorn3
@JonathanBrouwer
Contributor

@bors yield
Yielding to enclosing rollup

@rust-bors
Contributor

rust-bors bot commented Feb 4, 2026

Auto build cancelled. Cancelled workflows:

The next pull request likely to be tested is #152099.

rust-bors bot pushed a commit that referenced this pull request Feb 4, 2026
…uwer

Rollup of 11 pull requests

Successful merges:

 - #150605 (skip codegen for intrinsics with big fallback bodies if backend does not need them)
 - #150992 (link modifier `export-symbols`: export all global symbols from selected uptream c static libraries)
 - #151534 (target: fix destabilising target-spec-json)
 - #152088 (rustbook/README.md: add missing `)`)
 - #151526 (Fix autodiff codegen tests)
 - #151810 (citool: report debuginfo test statistics)
 - #152065 (Convert to inline diagnostics in `rustc_ty_utils`)
 - #152068 (Convert to inline diagnostics in `rustc_resolve`)
 - #152070 (Convert to inline diagnostics in `rustc_pattern_analysis`)
 - #152072 (Convert to inline diagnostics in `rustc_monomorphize`)
 - #152083 (Fix set_times_nofollow for directory on windows)

Failed merges:

 - #152069 (Convert to inline diagnostics in `rustc_privacy`)
@RalfJung
Member Author

RalfJung commented Feb 4, 2026

This has perf impact, should it really be rolled up? It is marked rollup=never.

@RalfJung
Member Author

RalfJung commented Feb 4, 2026

Oh I guess that mark got lost in the bors transition?
@bors rollup=never


@rust-bors rust-bors bot added merged-by-bors This PR was explicitly merged by bors. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 4, 2026
@rust-bors
Contributor

rust-bors bot commented Feb 4, 2026

☀️ Test successful - CI
Approved by: mati865
Duration: 3h 17m 40s
Pushing db3e99b to main...

@rust-bors rust-bors bot merged commit db3e99b into rust-lang:main Feb 4, 2026
13 checks passed
@rustbot rustbot added this to the 1.95.0 milestone Feb 4, 2026
@github-actions
Contributor

github-actions bot commented Feb 4, 2026

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 8bccf12 (parent) -> db3e99b (this PR)

Test differences


16 doctest diffs were found. These are ignored, as they are noisy.

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard db3e99bbab28c6ca778b13222becdea54533d908 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. dist-aarch64-llvm-mingw: 1h 34m -> 1h 50m (+17.0%)
  2. dist-aarch64-apple: 2h 17m -> 2h (-12.5%)
  3. dist-x86_64-apple: 2h 11m -> 1h 59m (-8.7%)
  4. dist-various-1: 1h 11m -> 1h 5m (-8.4%)
  5. armhf-gnu: 1h 33m -> 1h 27m (-7.2%)
  6. dist-armhf-linux: 1h 26m -> 1h 31m (+6.7%)
  7. aarch64-gnu-llvm-20-1: 59m 31s -> 1h 3m (+6.7%)
  8. dist-apple-various: 1h 13m -> 1h 18m (+6.2%)
  9. x86_64-msvc-1: 2h 25m -> 2h 34m (+6.2%)
  10. x86_64-msvc-2: 2h 29m -> 2h 38m (+5.9%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Collaborator

Finished benchmarking commit (db3e99b): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.3% | [-2.3%, -0.4%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.3% | [-2.3%, -0.4%] | 2 |

Max RSS (memory usage)

Results (primary 1.9%, secondary 2.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 5.3% | [1.3%, 9.3%] | 2 |
| Regressions ❌ (secondary) | 2.4% | [2.4%, 2.4%] | 1 |
| Improvements ✅ (primary) | -1.5% | [-1.8%, -1.1%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.9% | [-1.8%, 9.3%] | 4 |

Cycles

Results (primary -2.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.6% | [-2.6%, -2.6%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.6% | [-2.6%, -2.6%] | 1 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 472.482s -> 472.94s (0.10%)
Artifact size: 398.10 MiB -> 398.12 MiB (0.01%)

@rustbot rustbot removed the perf-regression Performance regression. label Feb 4, 2026
@RalfJung RalfJung deleted the fallback-intrinsic-skip branch February 5, 2026 07:51

Labels

A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. merged-by-bors This PR was explicitly merged by bors. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.


Development

Successfully merging this pull request may close these issues.

Avoid monomorphizing intrinsic fallback bodies that the backend does not need

10 participants