skip codegen for intrinsics with big fallback bodies if backend does not need them #150605
    /// The names of intrinsics that the current codegen backend replaces
    /// with its own implementations.
    pub replaced_intrinsics: Vec<Symbol>,
It seems there is no way to get the current codegen backend from a tcx. I wasn't sure what the best way is to make this list of symbols available to monomorphization, and went for a new field in Session -- does that make sense?
I don't know enough about how all this should be structured to know what the best option is here.
This seems at least plausible, since at worst it stays empty and that doesn't hurt anything (other than perf).
@bjorn3 do you have any suggestions for how to deal with this?
I am not the biggest fan of another Session field, but don't have any other suggestions either.
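For readers following the thread, here is a minimal sketch of the idea under discussion: a session-level list of intrinsic names that the monomorphization step consults before queuing a fallback body. Everything except the `replaced_intrinsics` field name is an assumption made for illustration (`Symbol` as a string alias, the `Callee` struct, the `should_codegen_body` helper); this is not the actual rustc code.

```rust
/// Stand-in for rustc's interned `Symbol`; a plain string slice here.
type Symbol = &'static str;

/// Hypothetical, stripped-down session that carries only the new field.
struct Session {
    /// Names of intrinsics the current backend replaces with its own
    /// implementations. Empty means "replace nothing".
    replaced_intrinsics: Vec<Symbol>,
}

/// Hypothetical view of a called item as a collector might see it.
struct Callee {
    name: Symbol,
    is_intrinsic: bool,
    has_fallback_body: bool,
}

/// Decides whether the callee's body should be queued for codegen.
fn should_codegen_body(sess: &Session, callee: &Callee) -> bool {
    // Ordinary functions are always compiled.
    if !callee.is_intrinsic {
        return true;
    }
    // Intrinsics without a fallback body have nothing to compile.
    if !callee.has_fallback_body {
        return false;
    }
    // The fallback body is only needed if the backend does not replace it.
    !sess.replaced_intrinsics.contains(&callee.name)
}
```

With an empty list every fallback body is still compiled, which matches the observation above that an empty field costs only performance.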
@bors try
Force-pushed from 4ca06da to a170604.
Finished benchmarking commit (4763a83): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never
Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (primary -1.5%, secondary 3.5%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (primary -3.9%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: Results (primary 0.2%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 473.485s -> 474.195s (0.15%)
Force-pushed from a170604 to 57e44f5.
@bors try
Finished benchmarking commit (c75310a): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never
Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (primary -4.1%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (primary -3.9%, secondary 15.2%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: Results (primary 0.2%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 471.287s -> 473.923s (0.56%)
@rustbot reroll
Cool idea! I'll wait a few days to give @scottmcm time to respond, as the much more knowledgeable person. Do you know if there is a list of similarly optimised intrinsics somewhere?
In principle one could go over all the intrinsics that have fallback bodies, and then check whether the LLVM backend has implementations for them. But most fallback bodies are small so the cost of monomorphizing them is tiny. Not sure if it's worth going through the entire list. I think I got all the ones that have big fallback bodies where we really don't want to pay the monomorphization cost.
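To illustrate the trade-off described here, the following rough sketch shows a backend opting in only for the intrinsics whose fallback bodies are expensive, with a default that replaces nothing. The trait, the method name, the two backend types and the intrinsic names are all hypothetical and chosen for illustration; the real compiler interfaces may look different.

```rust
/// Stand-in for rustc's interned `Symbol`; a plain string slice here.
type Symbol = &'static str;

/// Hypothetical slice of a backend trait; the method name is an assumption.
trait Backend {
    /// Intrinsics this backend lowers itself. The default is empty, so a
    /// backend that overrides nothing keeps compiling every fallback body.
    fn replaced_intrinsics(&self) -> Vec<Symbol> {
        Vec::new()
    }
}

/// Hypothetical backend with native lowering for the big intrinsics.
struct NativeBackend;
/// Hypothetical backend that relies entirely on fallback bodies.
struct SimpleBackend;

impl Backend for NativeBackend {
    fn replaced_intrinsics(&self) -> Vec<Symbol> {
        // Only the intrinsics with big fallback bodies are listed;
        // small ones are cheap to monomorphize and are left out.
        vec!["example_big_intrinsic_a", "example_big_intrinsic_b"]
    }
}

impl Backend for SimpleBackend {}

/// Counts how many fallback bodies would still be monomorphized
/// for the given set of used intrinsics.
fn fallback_bodies_to_codegen(backend: &dyn Backend, used: &[Symbol]) -> usize {
    let replaced = backend.replaced_intrinsics();
    used.iter().filter(|&name| !replaced.contains(name)).count()
}
```

Listing only the large bodies keeps the list short while still avoiding most of the monomorphization cost, since the small fallback bodies that remain are cheap to compile.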
Fair enough, thanks for the explanation.
@bors r+
skip codegen for intrinsics with big fallback bodies if backend does not need them
This hopefully fixes the perf regression from #148478. I only added the intrinsics with big fallback bodies to the list; it doesn't seem worth the effort of going through the entire list.
Fixes #149945
Cc @scottmcm @bjorn3
@bors yield
Auto build cancelled. The next pull request likely to be tested is #152099.
…uwer
Rollup of 11 pull requests
Successful merges:
 - #150605 (skip codegen for intrinsics with big fallback bodies if backend does not need them)
 - #150992 (link modifier `export-symbols`: export all global symbols from selected uptream c static libraries)
 - #151534 (target: fix destabilising target-spec-json)
 - #152088 (rustbook/README.md: add missing `)`)
 - #151526 (Fix autodiff codegen tests)
 - #151810 (citool: report debuginfo test statistics)
 - #152065 (Convert to inline diagnostics in `rustc_ty_utils`)
 - #152068 (Convert to inline diagnostics in `rustc_resolve`)
 - #152070 (Convert to inline diagnostics in `rustc_pattern_analysis`)
 - #152072 (Convert to inline diagnostics in `rustc_monomorphize`)
 - #152083 (Fix set_times_nofollow for directory on windows)
Failed merges:
 - #152069 (Convert to inline diagnostics in `rustc_privacy`)
This has perf impact, should it really be rolled up? It is marked rollup=never.
Oh I guess that mark got lost in the bors transition?
What is this?
This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.
Comparing 8bccf12 (parent) -> db3e99b (this PR)
Test differences
16 doctest diffs were found. These are ignored, as they are noisy.
Test dashboard
Run
    cargo run --manifest-path src/ci/citool/Cargo.toml -- \
        test-dashboard db3e99bbab28c6ca778b13222becdea54533d908 --output-dir test-dashboard
And then open
Job duration changes
How to interpret the job duration changes?
Job durations can vary a lot, based on the actual runner instance
Finished benchmarking commit (db3e99b): comparison URL.
Overall result: ✅ improvements - no action needed
@rustbot label: -perf-regression
Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage): Results (primary 1.9%, secondary 2.4%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Cycles: Results (primary -2.6%). A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 472.482s -> 472.94s (0.10%)