In `copy_nonoverlapping`, use `mul nuw nsw` to compute the byte size #157560
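Yeah, in fact both `copy` and `copy_nonoverlapping` could use this, since we know the result must be at most `isize::MAX`, else it cannot be inbounds.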
@bors try @rust-timer queue
In `copy_nonoverlapping`, use `mul nuw nsw` to compute the byte size
Finished benchmarking commit (1fe6841): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking means the PR may be perf-sensitive. Consider adding `rollup=never` if this change is not fit for rolling up.

@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count
This perf run didn't have relevant results for this metric.

Max RSS (memory usage)
Results (primary 3.3%, secondary -5.6%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Cycles
Results (secondary -3.5%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Binary size
This perf run didn't have relevant results for this metric.

Bootstrap: 516.093s -> 517.507s (0.27%)
@bors r+ rollup (because clearly it doesn't impact perf... Right now)
…saethlin

In `copy_nonoverlapping`, use `mul nuw nsw` to compute the byte size

Seems like we might as well? Adding these flags means the optimizer can tell the limited range on the count of items -- like how we use these flags (rust-lang#136575) when calculating `size_of_val` for a slice.

Today we use a wrapping multiplication, which means that `copy_nonoverlapping::<u32>(src, dst, 0x40000000_00000001)` appears like 4 bytes -- a perfectly reasonable size! -- once it gets to the `memcpy` call.

If I'm understanding <https://doc.rust-lang.org/std/ptr/fn.copy_nonoverlapping.html#safety> properly, this is just exploiting existing UB, since `src` and `dst` must each be inside an allocation, and those allocations can be at most `isize::MAX` bytes. (Plus, fundamentally, to be non-overlapping there's not enough space in the address space to be bigger than `isize::MAX`.)

cc @RalfJung to make sure this is ok, as requested last time he found out I was newly exploiting some UB in codegen 🙃

r? codegen
Rollup of 25 pull requests

Successful merges:

- #157447 (Move cross crate tests into the appropriate folder)
- #145108 (Resolver: Batched Import Resolution)
- #156119 (Further optimize `SliceIndex<str>` impl for `Range<usize>`)
- #157224 (Manually unroll loop in `str::floor_char_boundary`)
- #157289 (Add infallible primitive type lookups to template arg resolver)
- #157540 (Cleanup and optimize `render_impls`)
- #157444 (Couple of work product cleanups)
- #157543 (Reorganize `tests/ui/issues` [5/N])
- #153513 (Syntactically reject equality predicates)
- #155797 (LineWriter: cap write_vectored newline scan to avoid quadratic write_all_vectored)
- #156155 (macros: report unbound metavariables directly)
- #156188 (riscv: promote d, e, and f target_features to CfgStableToggleUnstable)
- #156666 (Clarify meaning of ranges in pointer offset docs)
- #157078 (Document equivalence of `highest_one` and `ilog2` methods on integers)
- #157129 (ci: update download-artifact action to v8)
- #157169 (triagebot: Update messages to direct changes to appropriate repositories)
- #157323 (Document Repeat::last panic behavior)
- #157370 (Clarify MaybeUninit::zeroed padding docs)
- #157399 (Silence llbc's output by default to prevent rustc's linker output warning)
- #157500 (Improve documentation of `align_of` and `Alignment`.)
- #157545 (Suggest using comma to separate valid attribute list items)
- #157559 (chore: Update annotate-snippets to 0.12.16)
- #157560 (In `copy_nonoverlapping`, use `mul nuw nsw` to compute the byte size)
- #157580 (Importing suggestion reported twice when reporting privacy error)
- #157581 (Test fixup)
Yeah, it enables fewer optimizations than I'd hoped right now; see llvm/llvm-project#202240 and #157589
Rollup merge of #157560 - scottmcm:mul_nuw_nsw_in_memcpy, r=saethlin

In `copy_nonoverlapping`, use `mul nuw nsw` to compute the byte size
Rollup of 25 pull requests Successful merges: - rust-lang/rust#157447 (Move cross crate tests into the appropriate folder) - rust-lang/rust#145108 (Resolver: Batched Import Resolution) - rust-lang/rust#156119 (Further optimize `SliceIndex<str>` impl for `Range<usize>`) - rust-lang/rust#157224 (Manually unroll loop in `str::floor_char_boundary`) - rust-lang/rust#157289 (Add infallible primitive type lookups to template arg resolver) - rust-lang/rust#157540 (Cleanup and optimize `render_impls`) - rust-lang/rust#157444 (Couple of work product cleanups) - rust-lang/rust#157543 (Reorganize `tests/ui/issues` [5/N]) - rust-lang/rust#153513 (Syntactically reject equality predicates) - rust-lang/rust#155797 (LineWriter: cap write_vectored newline scan to avoid quadratic write_all_vectored) - rust-lang/rust#156155 (macros: report unbound metavariables directly) - rust-lang/rust#156188 (riscv: promote d, e, and f target_features to CfgStableToggleUnstable) - rust-lang/rust#156666 (Clarify meaning of ranges in pointer offset docs) - rust-lang/rust#157078 (Document equivalence of `highest_one` and `ilog2` methods on integers) - rust-lang/rust#157129 (ci: update download-artifact action to v8) - rust-lang/rust#157169 (triagebot: Update messages to direct changes to appropriate repositories) - rust-lang/rust#157323 (Document Repeat::last panic behavior) - rust-lang/rust#157370 (Clarify MaybeUninit::zeroed padding docs) - rust-lang/rust#157399 (Silence llbc's output by default to prevent rustc's linker output warning) - rust-lang/rust#157500 (Improve documentation of `align_of` and `Alignment`.) - rust-lang/rust#157545 (Suggest using comma to separate valid attribute list items) - rust-lang/rust#157559 (chore: Update annotate-snippets to 0.12.16) - rust-lang/rust#157560 (In `copy_nonoverlapping`, use `mul nuw nsw` to compute the byte size) - rust-lang/rust#157580 (Importing suggestion reported twice when reporting privacy error) - rust-lang/rust#157581 (Test fixup)
Use `mul nuw nsw` in `intrinsics::copy`

Essentially the same as rust-lang#157560, just for `copy` instead of `copy_nonoverlapping`.

> Yeah, in fact both copy and copy_nonoverlapping could use this since we know the result must be at most isize::MAX else it cannot be inbounds.
> ~ rust-lang#157560 (comment)

r? saethlin
Rollup merge of #157588 - scottmcm:copy-nsuw, r=saethlin

Use `mul nuw nsw` in `intrinsics::copy`

Essentially the same as #157560, just for `copy` instead of `copy_nonoverlapping`.

> Yeah, in fact both copy and copy_nonoverlapping could use this since we know the result must be at most isize::MAX else it cannot be inbounds.
> ~ #157560 (comment)

r? saethlin
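The "at most `isize::MAX`, else it cannot be inbounds" reasoning can be written out in user-level Rust as a rough sketch. This is not how the compiler lowers the intrinsic: `byte_len_assuming_in_bounds` is a hypothetical helper made up for illustration, and it only mimics the idea by making overflow UB with `usize::unchecked_mul` and stating the signed-range bound with `std::hint::assert_unchecked` (both real std APIs, available on a reasonably recent stable toolchain).

```rust
/// Hypothetical helper (not part of `core`): computes the byte length of
/// `count` items of `T` under the same assumption the PR relies on.
///
/// # Safety
/// `count * size_of::<T>()` must not exceed `isize::MAX`, which holds
/// whenever the `count` items all live inside a single allocation.
unsafe fn byte_len_assuming_in_bounds<T>(count: usize) -> usize {
    // Overflow here is UB rather than a silent wrap (the "no unsigned wrap" part).
    let bytes = unsafe { count.unchecked_mul(std::mem::size_of::<T>()) };
    // Tell the optimizer the result also fits in the signed range.
    unsafe { std::hint::assert_unchecked(bytes <= isize::MAX as usize) };
    bytes
}

fn main() {
    let src = [1u32, 2, 3];
    let mut dst = [0u32; 3];

    // SAFETY: three `u32`s inside one array are far below `isize::MAX` bytes.
    let bytes = unsafe { byte_len_assuming_in_bounds::<u32>(src.len()) };

    // SAFETY: both pointers are valid for `src.len()` elements; `ptr::copy`
    // (unlike `copy_nonoverlapping`) would also tolerate overlapping ranges.
    unsafe { std::ptr::copy(src.as_ptr(), dst.as_mut_ptr(), src.len()) };

    println!("{bytes} bytes copied: {dst:?}");
}
```

The safety contract of the helper is the same precondition the pointer-copy functions already document, which is why the quoted comment treats the flags as exploiting existing UB rather than adding a new requirement.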
Seems like we might as well? Adding these flags means the optimizer can tell the limited range on the count of items -- like how we use these flags (#136575) when calculating `size_of_val` for a slice.

Today we use a wrapping multiplication, which means that `copy_nonoverlapping::<u32>(src, dst, 0x40000000_00000001)` appears like 4 bytes -- a perfectly reasonable size! -- once it gets to the `memcpy` call.

If I'm understanding https://doc.rust-lang.org/std/ptr/fn.copy_nonoverlapping.html#safety properly, this is just exploiting existing UB, since `src` and `dst` must each be inside an allocation, and those allocations can be at most `isize::MAX` bytes. (Plus, fundamentally, to be non-overlapping there's not enough space in the address space to be bigger than `isize::MAX`.)

cc @RalfJung to make sure this is ok, as requested last time he found out I was newly exploiting some UB in codegen 🙃

r? codegen
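To make the wraparound in the description concrete, here is a small sketch in plain Rust. It models the byte-size computation with `u64` so that it behaves the same on any host. The helper names `wrapped_byte_size` and `flagged_byte_size` are hypothetical and are not part of `core` or the compiler. The second helper checks the documented precondition (the byte size fits in the signed range), which is a sufficient condition for both flags to hold; in LLVM itself, a flagged multiply that overflows produces poison rather than `None`.

```rust
/// Hypothetical model of the old lowering: a plain wrapping multiply.
fn wrapped_byte_size(count: u64, elem_size: u64) -> u64 {
    count.wrapping_mul(elem_size)
}

/// Hypothetical model of the precondition behind `mul nuw nsw`:
/// returns `Some(bytes)` only when the product neither wraps as an
/// unsigned value nor exceeds the signed maximum.
fn flagged_byte_size(count: u64, elem_size: u64) -> Option<u64> {
    count
        .checked_mul(elem_size) // no unsigned wrap
        .filter(|&bytes| bytes <= i64::MAX as u64) // fits in the signed range
}

fn main() {
    // The count from the PR description, with `u32` elements (4 bytes each).
    let count: u64 = 0x4000_0000_0000_0001;
    let elem_size = std::mem::size_of::<u32>() as u64;

    // The wrapped product looks like a tiny, perfectly plausible copy.
    println!("wrapping: {}", wrapped_byte_size(count, elem_size));
    // The checked model rejects it, since no allocation can be that large.
    println!("flagged: {:?}", flagged_byte_size(count, elem_size));
    // An ordinary in-bounds copy is unaffected.
    println!("small copy: {:?}", flagged_byte_size(3, elem_size));
}
```

Running this prints `wrapping: 4`, `flagged: None`, and `small copy: Some(12)`, matching the "appears like 4 bytes" observation above.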