Optimize checked_ilog and pow when base is a power of two#147250
Conversation
8f19ce6 to
e5ca8cd
Compare
This comment has been minimized.
This comment has been minimized.
|
does this affect the codegen in practical circumstances? I would expect even a fairly weak optimizer to perform this optimization, making us hand-coding it irrelevant and potentially even harmful because we introduce more conditional logic to chew through. |
checked_ilog when base is a power of twochecked_ilog and pow when base is a power of two
It replaces a loop by some bit manipulations. Whether anyone actually calls either function with a compile-time known, power of 2 base is another question. But it can't hurt, since it is guarded by
No, in either function, LLVM is not able to see that the loop is performing a log/pow and apply the identities: https://godbolt.org/z/6vMsxc9Kh
They're guarded by |
2faa397 to
abb7f32
Compare
...then I'm kinda surprised! Nice catch. |
Would be good to have codegen tests to demonstrate what this is doing -- especially since that way there's a way for people to try removing the special cases later if they think that LLVM no longer needs them. |
abb7f32 to
9ab0f63
Compare
9ab0f63 to
1d0ac82
Compare
|
thanks for the codegen tests! |
1d0ac82 to
c84b99e
Compare
|
@rustbot ready |
|
@scottmcm ping? |
There was a problem hiding this comment.
The code here is looking good, but can you make sure there's normal runtime behaviour tests for it too? Notably, after the base conversation (that's fixed in the code) it made me think that that's not currently covered by any tests, so we should have something -- maybe just add to the # Examples that it returns None for base zero or one? (They seem like perfectly reasonable and helpful examples, in addition to giving coverage for this stuff.)
And maybe add some should_panic tests to ensure that the overflow checking is still correct for pow when overflow checks are enabled?
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (759376f): comparison URL. Overall result: ❌ regressions - please read:Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. Next, please: If you can, justify the regressions found in this try perf run in writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 3.0%, secondary -1.3%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary -1.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.0%, secondary 0.3%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 510.625s -> 510.701s (0.01%) |
|
Same crates and for roughly the same amount. I'm clueless. Maybe try posting on Zulip, as I'm not sure who to ping. |
|
Despite the unexpected perf result, it was recommended on Zulip to merge this anyway. @bors r+ |
|
@bors p=2 |
This comment has been minimized.
This comment has been minimized.
…=jhpratt Optimize `checked_ilog` and `pow` when `base` is a power of two Optimize `checked_ilog` and `pow` when the base is a power of two
|
Stuck |
|
Auto build was cancelled. Cancelled workflows: The next pull request likely to be tested is #157548. |
Thanks. I was going to post to zulip but haven't had time this week. Can you link to the zulip discussion? |
This comment has been minimized.
This comment has been minimized.
What is this?This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.Comparing 43a4909 (parent) -> f20a92e (this PR) Test differencesShow 3966 test diffsStage 1
Stage 2
Additionally, 3962 doctest diffs were found. These are ignored, as they are noisy. Job group index
Test dashboardRun cargo run --manifest-path src/ci/citool/Cargo.toml -- \
test-dashboard f20a92ec01483dc5c58e90e246f266bdad822d86 --output-dir test-dashboardAnd then open Job duration changes
How to interpret the job duration changes?Job durations can vary a lot, based on the actual runner instance |
|
Finished benchmarking commit (f20a92e): comparison URL. Overall result: ❌ regressions - please read:Our benchmarks found a performance regression caused by this PR. Next Steps:
@rustbot label: +perf-regression Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 5.0%, secondary -5.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesThis perf run didn't have relevant results for this metric. Binary sizeResults (primary -0.0%, secondary 0.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 516.093s -> 516.04s (-0.01%) |
|
This was known before approving. @rustbot label +perf-regression-triaged |
View all comments
Optimize
checked_ilogandpowwhen the base is a power of two