
LST: Remove matrix caps for MDs using Precompute #256

Open
GNiendorf wants to merge 1 commit into master from md_full_precompute

GNiendorf (Member) commented May 1, 2026

Most LST objects (segments, triplets, quintuplets, ...) size their buffers from an upper-bound estimate computed in a counting pass before object creation (see previous PRs: cms-sw#50157, cms-sw#48698, cms-sw#47232). Memory for LST mini-doublets (two hits in a given module) is currently allocated using the minimum of a loose dynamic estimate (number of hits on the lower sensor × number of hits on the upper sensor) and static caps based on detector region. These static caps reduce overallocation from the loose dynamic estimate, but they can cause truncation in high-occupancy events and have to be re-tuned by hand whenever hit selections or pT cuts change.

This PR replaces the static caps with a precomputed exact count: a counting pass first computes the number of mini-doublets that will be created, and an exact-size buffer is then allocated to store them. This eliminates possible truncation and reduces the memory allocated for mini-doublets, at the cost of a small increase in overall timing. Mini-doublet memory drops from 31.4 MB to 7.4 MB on average (a ~22% decrease in total LST memory per event) for a ~7% increase in LST time per event.
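For illustration, here is a minimal host-side C++ sketch of the count-then-allocate pattern described above. `Hit`, `Module`, `MiniDoublet`, `MiniDoubletBuffer`, `buildMiniDoublets`, and the `passesSelection` phi-window cut are hypothetical stand-ins, not LST names or LST cuts; the real code runs these steps as parallel kernels with its own data layout. Pass 1 evaluates the selection only to count MDs per module, a running prefix sum turns the counts into per-module offsets, and pass 2 re-evaluates the same selection to fill an exactly sized buffer. The old scheme would instead have reserved something like `min(nLower * nUpper, regionCap)` slots per module.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified hit: one angular coordinate is enough for the sketch.
struct Hit {
  float phi;
};

// Hypothetical module with hits on its lower and upper sensors.
struct Module {
  std::vector<Hit> lower;
  std::vector<Hit> upper;
};

// One mini-doublet: indices of the paired lower/upper hits within a module.
struct MiniDoublet {
  std::uint32_t module;
  std::uint32_t lowerIdx;
  std::uint32_t upperIdx;
};

// Hypothetical stand-in for the real MD selection cuts (assumption).
inline bool passesSelection(const Hit& lo, const Hit& up) {
  return std::fabs(lo.phi - up.phi) < 0.01f;
}

struct MiniDoubletBuffer {
  std::vector<std::size_t> offsets;  // nModules + 1 entries; module m owns [offsets[m], offsets[m+1])
  std::vector<MiniDoublet> mds;      // exactly offsets.back() entries, no static cap
};

MiniDoubletBuffer buildMiniDoublets(const std::vector<Module>& modules) {
  MiniDoubletBuffer buf;
  buf.offsets.assign(modules.size() + 1, 0);

  // Pass 1: count the MDs each module will produce and accumulate the
  // counts into offsets (a running prefix sum).
  for (std::size_t m = 0; m < modules.size(); ++m) {
    std::size_t count = 0;
    for (const Hit& lo : modules[m].lower)
      for (const Hit& up : modules[m].upper)
        if (passesSelection(lo, up)) ++count;
    buf.offsets[m + 1] = buf.offsets[m] + count;
  }

  // Allocate exactly the number of MDs that will be written.
  buf.mds.resize(buf.offsets.back());

  // Pass 2: apply the identical selection and write each MD into its module's slice.
  for (std::size_t m = 0; m < modules.size(); ++m) {
    const Module& mod = modules[m];
    std::size_t out = buf.offsets[m];
    for (std::size_t i = 0; i < mod.lower.size(); ++i)
      for (std::size_t j = 0; j < mod.upper.size(); ++j)
        if (passesSelection(mod.lower[i], mod.upper[j]))
          buf.mds[out++] = MiniDoublet{static_cast<std::uint32_t>(m),
                                       static_cast<std::uint32_t>(i),
                                       static_cast<std::uint32_t>(j)};
    assert(out == buf.offsets[m + 1]);  // both passes must agree on the count
  }
  return buf;
}
```

The trade-off in this sketch is that every hit pair is tested twice, which is consistent with the higher MD column in the timing comparison; in exchange, the buffer cannot overflow and there is no per-region cap to re-tune. Both passes must apply identical cuts, otherwise the offsets no longer match what is written (the `assert` checks this in debug builds).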

GNiendorf (Member Author)

run-ci: [all, hlt]

github-actions Bot commented May 1, 2026

The PR was built and ran successfully in standalone mode running on CPU. Here are some of the comparison plots.

[Plots: efficiency, fake rate, and duplicate rate comparisons vs pT and vs eta]

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     29.7     90.8    131.4    132.2     49.4    785.2     10.6     42.3     74.2    212.0      0.8    1558.7     743.7+/- 187.1     507.4   explicit[s=4] (target branch)
   avg     29.3    163.1    128.4    125.3     47.8    671.8     10.6     41.5     73.2    214.2      0.1    1505.2     804.1+/- 195.7     495.6   explicit[s=4] (this PR)

github-actions Bot commented May 1, 2026

The PR was built and ran successfully with HLT setup running on CPU (procModifiers = ). Here are some plots.

HLT General Plots
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

github-actions Bot commented May 1, 2026

The PR was built and ran successfully with CMSSW running on CPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

slava77 commented May 1, 2026

> The PR was built and ran successfully in standalone mode running on CPU. Here are some of the comparison plots.
>
> [image]

Why did the pLS time change? (Or is it not the pLS column? Recall that the header text is missing the T4 kernel; I guess T4 is now under TC, if I recall the rough timing cost correctly.)

GNiendorf (Member Author) commented May 1, 2026

> Why did the pLS time change? (or is it not a pLS column? recall that the header text is missing a T4 kernel; I guess T4 is now under the TC, if I recall the rough timing cost correctly)

Just compiler weirdness or multi-stream effects. The single-stream pLS time on the lnx4555 CPU fluctuates randomly between 316 and 360 even when I don't touch that code. I don't think you would see this in HLT, since the kernel that accounts for almost all of the pLS time is the pLS deduplication, which is turned off there.

slava77 commented May 1, 2026

In #245 the MD time went down from 323.6 to 90; now it goes up from 90 to 163. Still a net gain. At a quick glance, the PR looks good to go.
Please prepare a description.
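A quick worked check of that "net gain", using 323.6 for the MD time before #245 (as quoted in the comment) and 90.8 (target branch) and 163.1 (this PR) from the CPU timing table, assuming the three measurements were taken under comparable conditions:

```latex
\frac{t_{\mathrm{MD}}^{\text{this PR}}}{t_{\mathrm{MD}}^{\text{target}}} = \frac{163.1}{90.8} \approx 1.80,
\qquad
\frac{t_{\mathrm{MD}}^{\text{this PR}}}{t_{\mathrm{MD}}^{\text{pre-\#245}}} = \frac{163.1}{323.6} \approx 0.50
```

So the MD step is about 1.8× slower than on the target branch but still costs roughly half of what it did before #245.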

GNiendorf marked this pull request as ready for review May 1, 2026 20:36
GNiendorf changed the title from "Remove matrix caps for MDs using full precompute" to "LST: Remove matrix caps for MDs using Precompute" May 1, 2026
GNiendorf (Member Author)

@slava77 PR description updated

slava77 commented May 1, 2026

> @slava77 PR description updated

Thank you.
Let's go with this to CMSSW.
