Skip to content

Use undef for (some) partially-uninit constants#94130

Merged
bors merged 5 commits into
rust-lang:masterfrom
erikdesjardins:partially
Feb 25, 2022
Merged

Use undef for (some) partially-uninit constants#94130
bors merged 5 commits into
rust-lang:masterfrom
erikdesjardins:partially

Conversation

@erikdesjardins

@erikdesjardins erikdesjardins commented Feb 18, 2022

Copy link
Copy Markdown
Contributor

There needs to be some limit to avoid perf regressions on large arrays
with undef in each element (see comment in the code).

Fixes: #84565
Original PR: #83698

Depends on LLVM 14: #93577

There needs to be some limit to avoid perf regressions on large arrays
with undef in each element (see comment in the code).
@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Feb 18, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 18, 2022
@lqd

lqd commented Feb 18, 2022

Copy link
Copy Markdown
Member

@bors try @rust-timer queue

@rust-timer

Copy link
Copy Markdown
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 18, 2022
@bors

bors commented Feb 18, 2022

Copy link
Copy Markdown
Collaborator

⌛ Trying commit b7e5597 with merge 796a3796f114911f2d4b98ea97c8ff28a6f37634...

@RalfJung

Copy link
Copy Markdown
Member

I know next to nothing about how LLVM represents consts and when or how that might cause slowdowns, and this doesn't even touch the CTFE engine, so I don't think I am a suitable reviewer for this.

@erikdesjardins

Copy link
Copy Markdown
Contributor Author

r? @nagisa

@rust-highfive rust-highfive assigned nagisa and unassigned RalfJung Feb 18, 2022
@nagisa

nagisa commented Feb 18, 2022

Copy link
Copy Markdown
Member

cc @nikic

@nikic

nikic commented Feb 18, 2022

Copy link
Copy Markdown
Contributor

I wonder if the right heuristic here isn't related to size, but rather whether the constant would otherwise be zeroinitializer? If it isn't, then the value needs to be made explicit anyway, and it shouldn't matter whether the explicit value is undef or some other value.

@bors

bors commented Feb 18, 2022

Copy link
Copy Markdown
Collaborator

☀️ Try build successful - checks-actions
Build commit: 796a3796f114911f2d4b98ea97c8ff28a6f37634 (796a3796f114911f2d4b98ea97c8ff28a6f37634)

@rust-timer

Copy link
Copy Markdown
Collaborator

Queued 796a3796f114911f2d4b98ea97c8ff28a6f37634 with parent b8c56fa, future comparison URL.

@erikdesjardins

erikdesjardins commented Feb 18, 2022

Copy link
Copy Markdown
Contributor Author

I wonder if the right heuristic here isn't related to size, but rather whether the constant would otherwise be zeroinitializer

I think that makes some sense* -- but I do want to emit undef for some values that would otherwise be zeroinitializer.

e.g. a type like

struct Foo {
    init: bool,
    data: MaybeUninit<SomeLargeStruct>,
}
const FOO: Foo = Foo { init: false, data: MaybeUninit::uninit() };

could be zeroinitializer, but would benefit from being partially undef since only one byte is initialized.
(i.e. so memcpys of it could be shrunk to just one byte--doesn't look like LLVM currently does this, but it could be done)

Combining the two heuristics is an option of course (using undef for consts that are either nonzero or < size limit), although at that point I'm not sure it would be worth it.

Another option: limit the number of fields in the anonymous struct that we generate, i.e., the number of contiguous chunks of all-init or all-uninit bytes. I think this would be a decently close match to the actual cost of generating the const expression. It would also let us handle (potentially) profitable cases like my struct Foo example regardless of how large they are. (Hmm, I think I've convinced myself that this is better than size, at least.)

* I do think it's still not fully accurate though--it ought to be more expensive to generate { [1 x i8] c"1", [1 x i8] undef, [1 x i8] c"2", ... } than { [1000 x i8] c"..." }, since for the former we have to allocate 1000500 const strings vs. just 1 large one for the latter.

@erikdesjardins

Copy link
Copy Markdown
Contributor Author

(Hmm, I think I've convinced myself that this is better than size, at least.)

Switched to limiting the number of chunks. Let me know if you'd prefer something else.

@erikdesjardins erikdesjardins changed the title Use undef for partially-uninit constants up to 1024 bytes Use undef for (some) partially-uninit constants Feb 19, 2022
@rust-log-analyzer

This comment has been minimized.

@rust-timer

Copy link
Copy Markdown
Collaborator

Finished benchmarking commit (796a3796f114911f2d4b98ea97c8ff28a6f37634): comparison url.

Summary: This benchmark run did not return any relevant results. 5 results were found to be statistically significant but too small to be relevant.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 19, 2022
@nikic

nikic commented Feb 19, 2022

Copy link
Copy Markdown
Contributor

Yeah, using the number of chunks makes a lot of sense to me.

Comment thread compiler/rustc_session/src/options.rs Outdated
@nikic

nikic commented Feb 19, 2022

Copy link
Copy Markdown
Contributor

@bors r+

@RalfJung

Copy link
Copy Markdown
Member

@bors r=nikic

@bors

bors commented Feb 20, 2022

Copy link
Copy Markdown
Collaborator

📌 Commit 067f628 has been approved by nikic

@rust-log-analyzer

This comment has been minimized.

@RalfJung

Copy link
Copy Markdown
Member

@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 20, 2022
@nikic

nikic commented Feb 20, 2022

Copy link
Copy Markdown
Contributor

@bors r+

@bors

bors commented Feb 20, 2022

Copy link
Copy Markdown
Collaborator

📌 Commit 5bf8303 has been approved by nikic

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 20, 2022
@bors

bors commented Feb 24, 2022

Copy link
Copy Markdown
Collaborator

⌛ Testing commit 5bf8303 with merge 3450b91e88c8e5343cffeead37603a63a8f904a9...

@bors

bors commented Feb 24, 2022

Copy link
Copy Markdown
Collaborator

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 24, 2022
@rust-log-analyzer

Copy link
Copy Markdown
Collaborator

The job aarch64-gnu failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@nikic

nikic commented Feb 24, 2022

Copy link
Copy Markdown
Contributor

@bors retry spurious network error

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 24, 2022
@bors

bors commented Feb 25, 2022

Copy link
Copy Markdown
Collaborator

⌛ Testing commit 5bf8303 with merge ece55d4...

@bors

bors commented Feb 25, 2022

Copy link
Copy Markdown
Collaborator

☀️ Test successful - checks-actions
Approved by: nikic
Pushing ece55d4 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 25, 2022
@bors bors merged commit ece55d4 into rust-lang:master Feb 25, 2022
@rustbot rustbot added this to the 1.61.0 milestone Feb 25, 2022
@rust-timer

Copy link
Copy Markdown
Collaborator

Finished benchmarking commit (ece55d4): comparison url.

Summary: This benchmark run shows 3 relevant improvements 🎉 but 6 relevant regressions 😿 to instruction counts.

  • Arithmetic mean of relevant regressions: 0.7%
  • Arithmetic mean of relevant improvements: -1.9%
  • Arithmetic mean of all relevant changes: -0.1%
  • Largest improvement in instruction counts: -2.9% on full builds of ctfe-stress-4 opt
  • Largest regression in instruction counts: 0.9% on incr-unchanged builds of externs opt

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression

@rustbot rustbot added the perf-regression Performance regression. label Feb 25, 2022
@erikdesjardins erikdesjardins deleted the partially branch February 25, 2022 19:14
@erikdesjardins

erikdesjardins commented Feb 25, 2022

Copy link
Copy Markdown
Contributor Author

Perf changes:

ctfe-stress-4 opt full improvement:

LLVM does less work. Judging from the involved functions, this is likely because the generated bitcode is much smaller, since we emit just undef for large parts of some constant(s) instead of a large string.

--------------------------------------------------------------------------------
Ir             
--------------------------------------------------------------------------------
-1,234,509,749  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir             file:function
--------------------------------------------------------------------------------
 -293,601,221  ???:(anonymous namespace)::ModuleBitcodeWriter::writeConstants(unsigned int, unsigned int, bool)
 -270,535,020  ???:void llvm::BitstreamWriter::EmitRecordWithAbbrevImpl<unsigned long>(unsigned int, llvm::ArrayRef<unsigned long>, llvm::StringRef, llvm::Optional<unsigned int>)
 -255,855,613  ???:llvm::BitstreamCursor::readRecord(unsigned int, llvm::SmallVectorImpl<unsigned long>&, llvm::StringRef*)
 -184,944,003  ???:llvm::SHA1::hashBlock()
 -117,440,526  ???:llvm::Type::getPrimitiveSizeInBits() const
  -87,031,635  ???:llvm::ConstantDataArray::getString(llvm::LLVMContext&, llvm::StringRef, bool)
  -66,059,028  ???:(anonymous namespace)::BitcodeReader::parseConstants()
   17,301,570  ???:llvm::raw_svector_ostream::write_impl(char const*, unsigned long)
   15,204,381  ???:llvm::raw_ostream::write(char const*, unsigned long)
  -15,006,677  ./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_sse2_unaligned_erms
    9,831,661  ???:<rustc_const_eval::interpret::eval_context::InterpCx<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::run
    9,437,220  ???:llvm::raw_ostream::flush_tied_then_write(char const*, unsigned long)
   -7,077,468  ???:llvm::SHA1::update(llvm::ArrayRef<unsigned char>)
    5,244,443  ???:llvm::MCAssembler::writeSectionData(llvm::raw_ostream&, llvm::MCSection const*, llvm::MCAsmLayout const&) const
    5,240,004  ./string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:memcpy@GLIBC_2.2.5
    2,753,552  ???:<rustc_middle::mir::interpret::allocation::InitMask>::find_bit
   -1,310,727  ???:<rustc_const_eval::const_eval::machine::CompileTimeInterpreter as rustc_const_eval::interpret::machine::Machine>::find_mir_or_eval_fn

ucd debug full regression:

LLVM does more work, likely because we generate constants with more fields (which is expected).

Functions like <&mut alloc::vec::Vec<ena::unify::VarValue<rustc_type_ir::TyVid>> as core::convert::AsRef<[ena::unify::VarValue<rustc_type_ir::TyVid>]>>::as_ref (this is impl<T> AsRef<T> for &mut T) should never show up in diffs since they're trivial (and Vec<T> as AsRef<[T]> is also pretty simple). I'll open a PR adding #[inline] to those to see if it causes improvements more generally.

Edit: opened #94372

--------------------------------------------------------------------------------
Ir          
--------------------------------------------------------------------------------
112,870,638  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir           file:function
--------------------------------------------------------------------------------
-19,111,572  ???:<&mut alloc::vec::Vec<ena::unify::VarValue<rustc_type_ir::TyVid>> as core::convert::AsRef<[ena::unify::VarValue<rustc_type_ir::TyVid>]>>::as_ref
 15,031,469  ???:llvm::DataLayout::getAlignment(llvm::Type*, bool) const
 14,773,437  ???:llvm::MCAssembler::writeSectionData(llvm::raw_ostream&, llvm::MCSection const*, llvm::MCAsmLayout const&) const
  6,930,326  ???:llvm::ConstantDataArray::getString(llvm::LLVMContext&, llvm::StringRef, bool)
  6,135,110  ???:llvm::SmallPtrSetImplBase::insert_imp_big(void const*)
  5,341,507  ???:llvm::DataLayout::getTypeSizeInBits(llvm::Type*) const
  5,186,440  ???:llvm::MCAsmLayout::layoutFragment(llvm::MCFragment*)
 -4,139,756  ???:<rustc_infer::infer::sub::Sub as rustc_middle::ty::relate::TypeRelation>::tys
  4,015,730  ???:llvm::MCAssembler::layout(llvm::MCAsmLayout&)
  3,861,034  ???:<rustc_const_eval::interpret::validity::ValidityVisitor<rustc_const_eval::const_eval::machine::CompileTimeInterpreter> as rustc_const_eval::interpret::visitor::ValueVisitor<rustc_const_eval::const_eval::machine::CompileTimeInterpreter>>::walk_value
  3,765,544  ???:llvm::MCObjectStreamer::emitFill(llvm::MCExpr const&, unsigned long, llvm::SMLoc)
  3,615,753  obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/build/tikv-jemalloc-sys-93177ab130254195/out/build/src/arena.c:arena_dalloc_bin_locked_impl
  3,447,968  ???:llvm::MCObjectStreamer::getOrCreateDataFragment(llvm::MCSubtargetInfo const*)
...

all the externs regressions look like:

--------------------------------------------------------------------------------
Ir         
--------------------------------------------------------------------------------
13,448,267  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir          file:function
--------------------------------------------------------------------------------
13,504,500  ???:rustc_middle::ty::context::tls::TLV::__getit
  -145,666  ???:<rustc_query_system::dep_graph::graph::CurrentDepGraph<rustc_middle::dep_graph::dep_node::DepKind>>::intern_node
   -66,004  ???:<rustc_middle::dep_graph::dep_node::DepKind as rustc_query_system::dep_graph::DepKind>::read_deps::<<rustc_query_system::dep_graph::graph::DepGraph<rustc_middle::dep_graph::dep_node::DepKind>>::read_index::{closure
    27,018  ./nptl/../nptl/pthread_mutex_trylock.c:pthread_mutex_trylock
   -20,467  obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/build/tikv-jemalloc-sys-93177ab130254195/out/build/src/extent.c:_rjem_je_extent_heap_remove_first
    18,004  ???:<rustc_ast::ast::NestedMetaItem>::from_tokens::<rustc_ast::tokenstream::Cursor>
   -15,157  obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/build/tikv-jemalloc-sys-93177ab130254195/out/build/src/base.c:base_alloc_impl
    14,724  obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/build/tikv-jemalloc-sys-93177ab130254195/out/build/include/jemalloc/internal/hash.h:extent_try_coalesce_impl
    14,315  obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/build/tikv-jemalloc-sys-93177ab130254195/out/build/include/jemalloc/internal/hash.h:extent_lock_from_addr
    14,220  ./nptl/pthread_mutex_unlock.c:__pthread_mutex_unlock_usercnt

Judging from callgrind (omitted from this comment), this is a __getit call that was inlined before this change and isn't afterwards.
__getit is already #[inline], I'll open a PR to try making it #[inline(always)] to see if that is worthwhile.

Edit: opened #94373

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Partially-undef constants cause a large size regression in src/test/run-make/wasm-panic-small

10 participants