Phase B: calibrated precursor-window tightening (-7.8% Astral wall) by ypriverol · Pull Request #26 · bigbio/msgfplus

ypriverol · 2026-05-01T09:07:49Z

Summary

Adds a calibration pre-pass that learns the run's precursor mass shift and robust sigma from the top 200 confident PSMs (ranked by spec_eValue), then tightens the main-pass precursor tolerance via min(userPpm, max(floorPpm, k·sigma + marginPpm)) (defaults: k=3, floor=2 ppm, margin=0.5 ppm). Behavior is opt-in via -precursorCal auto (or on); off is the default and is bit-identical to dev.
Closes a CLAUDE.md "pre-passes must not mutate shared Spectrum state" violation: the pre-pass now routes through a new ScoredSpectraMap.isolateSpectrumState() builder that deep-clones each spectrum (precursor + peaks) before setCharge/getScoredSpectrum. Main-pass path is unchanged (no allocation cost).
Removes non-shippable scaffolding from earlier exploration (Experiment 2 telemetry classes, mass-interval pruning hook, calibration debug TSV emitter, ScoredSpectraMap binary-search helper).

Validation — 3-dataset, 3-trial, isolated bench machine

pride-linux-vm.ebi.ac.uk, 8 threads, JAR md5 a4fe020dd1563fe8ec8922f7b9df595e.

Dataset	Arm	Wall (s, mean ± σ)	Targets	Decoys	Δwall	Δtargets
Astral (LFQ_Astral_DDA_15min, t=10ppm)	OFF	508.7 ± 5.0	89479	46792
	AUTO	469.0 ± 9.2	89580	45292	−7.8 %	+101
TMT (PXD007683 a05058, t=20ppm)	OFF	248.0 ± 6.6	28790	14768
	AUTO	240.3 ± 1.5	28201	14285	−3.1 %	−589 (−2.0 %)
PXD001819 (UPS1_5000amol_R1, t=5ppm)	OFF	101.3 ± 0.6	28037	11022
	AUTO	100.3 ± 2.1	28084	10985	−1.0 %	+47

Calibration learned (deterministic across all 3 AUTO trials per dataset):

Astral: shift 1.088 ppm, σ 0.994 ppm → tightened 10 → 3.482 ppm.
TMT: shift −0.754 ppm, σ 2.055 ppm → tightened 20 → 6.666 ppm.
PXD001819: shift 0.101 ppm, σ 2.154 ppm → no tightening (5 ppm user bound already inside the floor; safe no-op as designed).

Native count parity:

OFF-mode targets/decoys match dev baseline exactly on all 3 datasets.
AUTO-mode targets/decoys are deterministic across 3 trials per dataset (zero per-trial variance), confirming the spectrum-isolation fix prevents the shared-state mutation footgun.
TMT AUTO target drop (−2.0 %) is the same yellow flag noted in the original Phase B 3-dataset table; not a new regression.

Notable commits

aac389c calibrator: stratify residuals by spec_eValue, keep top MIN_CONFIDENT_PSMS=200 (sigma 4.0 → 0.99 ppm on Astral).
7c027f8 expose tightening formula constants as system properties (msgfplus.tighteningSigmaMultiplier etc.).
05ec066 calibrator: pre-pass isolated at iso=[0,0] + outlier-filter residuals (>50 ppm).
aa4aaae remove non-shippable runtime scaffolding (Experiment 2 telemetry, prefix-mass bound test, etc.); keep Phase B as the real improvement.
5d9482d fix(phase-b): isolate Spectrum state during calibration pre-pass (closes the CLAUDE.md footgun called out in code review).

Test plan

CI: mvn -B verify passes (53/53 scoped tests pass locally including new TestScoredSpectraMapIsolation).
Pull feat/precursor-window-tightening, run java -jar MSGFPlus.jar -precursorCal off ... against any small benchmark file → target/decoy counts match the same command on dev.
Run with -precursorCal auto on a high-res file → log line "Precursor mass shift learned: X ppm from Y confident PSMs (robust sigma Z ppm)" appears, followed by "Tightened precursor tolerance for main pass: ...".
On a low-res or small file (e.g. PXD001819 5 ppm), AUTO falls back to the user tolerance with no tightening line — safe no-op.
Run AUTO twice on the same file → identical target/decoy counts (no per-run drift).

… SHIPPED.md - Add astral-speed-improvements.md design (gate B: 1.3-1.5x Astral wall, no PSM regression). TMT-as-inner-loop, Astral-as-phase-gate workflow. - Add SHIPPED.md short retrospective covering PRs #15-#22 (primitives + Achievements A+B), PR #23 (speed-v2 + output consolidation), PR #24 (Astral OOM + BuildSA scaling), PR #25 (search-sync-cleanup + parameter-modernization), and the abandoned fragment-index experiment. - Remove search-sync-cleanup.md, parameter-modernization.md, and parameter-modernization-flag-inventory.md — work shipped, content folded into SHIPPED.md. - Update README.md to point at the new active/history layout.

…ng model Replace the short-horizon astral-speed-improvements.md (drafted on this branch in 878b0cb) with the longer-horizon astral-speed-5x-roadmap.md as the single active design. Shipping-model change (new §0 in the roadmap): this iteration ships as milestone commits on feat/astral-speed-improvements with one closing PR at the end, not a PR per phase. Each phase milestone commit follows the template: feat(astral-speed): MILESTONE Phase <id> — <one-line achievement> <2-4 lines: TMT inner-loop wall delta, Astral phase-gate result, recall delta, RSS delta if any> The small-wins from the deleted plan (graph-skeleton memoization in PrimitiveAminoAcidGraph + Tier-1.5 GF candidate cap in DBScanner.computeSpecEValue) are retained as the §0 "Iteration 0.5 fallback" — used only when a big-win phase fails its kill gate. Original short-horizon plan recoverable via: git show 878b0cb:.claude/plans/astral-speed-improvements.md

…reset Phase A (in-engine MS2 deisotoping + dense-peak retention cap + GF candidate cap + scorer hot-path optimization) was attempted on this branch from 2026-04-27 to 2026-04-28. None of three independent optimization angles moved Astral wall above run-to-run noise; TMT's 1.41x apparent win came with -0.25% target and -4.6% decoy drift, which is not clean enough to justify shipping the surface area. Branch was hard-reset to eee9fa6 (the 5x roadmap state before any Phase A work) to drop 11 code commits + their uncommitted in-progress extensions. This commit captures the empirical findings as a retrospective doc so future agents can read the measurement data, understand why Phase A failed Astral, and pick from Phase B/C/E or workload-retargeting alternatives without repeating the attempt. - Add .claude/plans/astral-phase-a-retrospective.md with six Astral measurements, two TMT measurements, lessons (TMT-not-Astral-proxy, JIT-already-optimizes, native-count-drift-is-leading-indicator), and a concrete list of what's still untried. - Update astral-speed-5x-roadmap.md section 11 + 12 to flag Phase A as attempted-and-reverted, with original recommendation preserved as a blockquote for context. - Update SHIPPED.md "Abandoned" section with a Phase A entry pointing at the retrospective. - Update README.md "Active" list to surface the retrospective. Reverted code is recoverable via: git show 5cdd21e:src/... and walks back through 11 commits (b78e275 through 5cdd21e).

…nder noise Follow-up to Phase A iteration. Tested ForkJoin pool as a smart-default on Astral after thread-scaling sweep showed default ThreadPoolExecutor anti-scales past 6 threads. Findings: - Default executor: 4t=690s, 6t=675s, 8t=884s (+28% regression at 8t). Anti-scaling at 8t reproduces across measurements; this is real. - ForkJoin opt-in: 4t=872s (regression), 8t=521s (1.32x vs 4t). Pool choice is thread-count dependent. - Smart-default change implemented (numThreads >= 8 -> ForkJoin) and scope-tested (9/9 pass), but confirmation runs showed +65% wall on the same JAR vs the morning measurement. Machine state had degraded across hours of benchmarking; can't isolate ForkJoin win from noise. - Reverted. The underlying -Dmsgfplus.useForkJoin=true opt-in remains in dev unchanged. Lessons added to retrospective: - Need stable benchmark environment + multi-run statistics for any Astral wall-time claim. Single-measurement deltas at the 30% level are below the noise floor. - The default-executor anti-scaling at 8t IS worth fixing. Future attempt: jfr print --events jdk.JavaMonitorWait on 8t default-executor profile to localise the contention point in ThreadPoolExecutorWithExceptions. Files: .claude/plans/astral-phase-a-retrospective.md (added Phase E section + updated "what's untried" entry); .claude/plans/SHIPPED.md (added Phase E to "Abandoned"). Source code unchanged from eee9fa6 (verified via git status; MSGFPlus.java reverted before commit).

… not enough JFR analysis of the morning 4t Astral profile found zero JavaMonitorEnter contention events and 100% RUNNABLE samples, ruling out lock contention as the 8t regression cause. But 588K GCPhaseParallel events suggested GC parallel-phase coordination as a candidate. Tested with -Xmx16g (double heap) at 8t and 4t: - 8t-Xmx16g: 776s wall, 5067MB peak RSS, 182 GC pauses - 4t-Xmx16g: 870s wall, 6120MB peak RSS, 184 GC pauses Bigger heap helped 8t by ~12% wall and dropped peak RSS ~17% (G1GC ran fewer collections), but only ~4% at 4t. GC contributes to the 8t regression but is not the entire story. Even with Xmx16g, 8t (776s) is still slower than morning's 4t-Xmx8g baseline (690s). The afternoon's 4t-Xmx8g run (904s) vs morning's same configuration (690s) is +31% on the same JAR / same args / same machine — confirming the day's accumulated machine-state degradation dwarfs any code-level signal. Six hours of benchmarking has hit the noise floor. No actionable recommendation to ship. Source code stays at dev tip. The Phase E section in astral-phase-a-retrospective.md is updated with the GC-pressure findings.

…ere noise Replication batch (3 Astral runs back-to-back, 2026-04-29 morning, quieter machine): 4t default: 963.1s 8t default: 918.3s (FASTER than 4t — opposite of yesterday) 8t ForkJoin: 978.8s (no advantage) All three within 6.5% of each other. Native counts bit-identical. This proves: - The "default executor anti-scales past 6 threads" finding was wrong. It was a one-day correlation between morning-quiet-machine + 4t and afternoon-noisy-machine + 8t, not a real algorithmic relationship. - The 521s ForkJoin-8t measurement was an outlier. Three independent re-measurements (yesterday afternoon 861s, today 978s) put 8t-FJ at ~900s in normal machine state. - The 690s 4t-default baseline was an outlier in the other direction. Today's same-machine 4t-default was 963s. Conclusion: no Phase E shippable change exists on this benchmark setup. Wall-time variance from machine state (~30%) dwarfs any code-level signal we can produce on a single developer workstation. Updated retrospective with the corrected Phase E section + replication batch data. Updated SHIPPED.md Phase E entry to reflect the disproof. Source code stays at dev tip. End of iteration. Five doc commits stay on the branch as the deliverable.

The PXD001819 CI workflow (.github/workflows/benchmark-pxd001819.yml) has been silently broken since PR #23 removed mzIdentML output entirely. run_ci.sh wrote -o ci_output.mzid, MS-GF+ produced a .pin under that filename, and extract_metrics.py tried to parse it as mzIdentML. The CI never produced valid metrics. Switch the whole pipeline to .pin, which is the only modern output format: - run_ci.sh: -o $PIN (was -o $MZID); error message updated. - extract_metrics.py: rewritten parse_pin() counts native target / decoy rows by streaming the .pin (column 2 = label, {1, -1}) one line at a time. Drops the iterparse XML dependency. Replaces sii_count + psm_1pct_fdr metrics with native_target_count + native_decoy_count. - baseline.tsv: new metric names + ranges. Wide first-green-run windows (14000-35000 targets, 8000-20000 decoys); to be tightened after the first self-hosted runner produces consistent numbers. - test_compare_metrics.py: 7 existing comparator tests updated to use the new metric names; 3 new ParsePinTest cases covering label counts, empty input, and malformed rows. 10/10 pass. - benchmark/ci/README.md: documents the PIN-based pipeline, why native counts are the right CI gate (deterministic across runs; catches search-code regressions cleanly without Percolator dependency), and points future iterations at the multi-run protocol the astral-phase-a-retrospective.md asks for. The CI gate is now search-correctness (deterministic counts), not sensitivity (which needs Percolator + a separate gate). Sensitivity regressions surface via a downstream Percolator run on the .pin artifact — out of scope for this CI workflow. This is the meta-fix the Astral retrospective named as the next iteration's prerequisite: stable benchmarks must work before any speed-optimization claim is defensible.

…ication Phase B (calibrated precursor-window tightening) is already implemented in dev — MSGFPlus.runMSGFPlus:396-423 computes tightened tolerance via MassCalibrator.tightenedTolerancePpm() and passes it to per-task ScoredSpectraMap. What's missing per astral-next-experiments.md is the telemetry to verify it's actually shrinking the pairing fan-out. This commit adds PhaseBTelemetry, an opt-in counter class that records: - pairing_calls: number of times DBScanner.dbSearch hit the pepMassSpecKeyMap.subMap(leftThr, rightThr) site - matched_speckeys: total SpecKeys returned across those calls Enable with -Dmsgfplus.phaseBTelemetry=true. Off by default; OFF-mode behaviour is bit-identical (single load+branch when disabled). Not exposed as a CLI flag — this is a developer diagnostic for Phase B verification, mirroring the approach used for -Dmsgfplus.useForkJoin. At the end of search MSGFPlus.runMSGFPlus prints (when enabled): [Phase B telemetry] pairing_calls=N matched_speckeys=M mean_per_call=X.XX Files: - src/main/java/edu/ucsd/msjava/msdbsearch/PhaseBTelemetry.java (new) — thread-safe LongAdder counters, opt-in via system property, JVM-scoped. - src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java — record at DBScanner.dbSearch:489 (one line, guarded by enabled() short-circuit). - src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java — print summary after task wall summary (around line 587), only when enabled. - src/test/java/edu/ucsd/msjava/msdbsearch/TestPhaseBTelemetry.java (new) — 5 tests: counter math, mean-zero edge case, reset, 8-thread concurrency stress (80k ops total), system-property contract. Tests: 14/14 pass for the relevant scoped suite (TestPhaseBTelemetry + TestPrecursorCalScaffolding + TestConcurrentMSGFPlus); OFF-mode unchanged. Plan update: astral-next-experiments.md §4 annotated to note Phase B core is shipped + telemetry is now wired. Future agent runs the bench with -Dmsgfplus.phaseBTelemetry=true and the existing -precursorCal auto/on, then verifies §4.5 Phase B success gate (matched SpecKey count drops materially after tightening; Astral wall improves >=10%).

…uals Two bugs in MassCalibrator surfaced under Phase B telemetry: 1. The pre-pass propagated the user's -ti window (e.g. -1,2 on Astral) to the pre-pass DBScanner. Spectra whose precursor was selected as the M+1 or M+2 isotope produced 500-1000 ppm "residuals" (the 1.003 Da isotope shift on a 2 kDa peptide). These contaminated the median + MAD aggregation. Astral measured sigma=1166 ppm — useless for any tightening decision. Fix: hard-code the pre-pass to iso=[0,0]. The pre-pass exists to LEARN the shift, not to find every match; clean monoisotopic matches are what makes the residual distribution meaningful. 2. As belt-and-suspenders, residuals beyond +/-50 ppm are now rejected in extractResiduals(). Any modern instrument's true mass accuracy is well under 50 ppm; values above are almost always isotope-shift or charge-state mistakes. New constant MAX_REASONABLE_RESIDUAL_PPM=50. Drop the now-unused minIsotopeError / maxIsotopeError fields and constructor parameters from MassCalibrator. The single caller (MSGFPlus.runMSGFPlus) is updated. Verified on remote pride-linux-vm.ebi.ac.uk (Astral ProteoBench Module 8): Before fix: Precursor mass shift 1.398 ppm from 371 PSMs (sigma 1166 ppm) After fix: Precursor mass shift 0.978 ppm from 393 PSMs (sigma 3.987 ppm) The shift values are similar; the sigma is the dramatic signal that the residual distribution is now clean. 0.978 ppm is the typical Astral instrument bias; 3.987 ppm is the true post-calibration spread (still wider than expected; might reflect long-peptide tail). OFF-mode (precursorCal=off) is bit-identical to dev-tip on three back-to-back Astral runs (548/560/551s wall, all 89479/46792 native counts). 32/32 scoped tests pass (TestMassCalibrator, TestPhaseBTelemetry, TestPrecursorCalScaffolding). NOTE: this commit only fixes the calibrator's correctness. Phase B tightening still does not fire on Astral because the tightening formula k=3 * sigma + margin = 12.46 ppm exceeds the 10 ppm user window. Tuning k or rethinking the formula is a separate decision; see retrospective for the next-iteration discussion.

Phase B's tightening formula min(userPpm, max(floorPpm, k*sigma + marginPpm)) previously used hard-coded MassCalibrator.DEFAULT_TIGHTENED_WINDOW_* constants (k=3, floor=2 ppm, margin=0.5 ppm). Surface them as system properties so falsification sweeps can vary k without recompiling: -Dmsgfplus.tighteningSigmaMultiplier=<float> (default 3.0) -Dmsgfplus.tighteningFloorPpm=<float> (default 2.0) -Dmsgfplus.tighteningMarginPpm=<float> (default 0.5) Defaults unchanged; OFF-mode and PrecursorCal=AUTO production behaviour is bit-identical when no override is set. Verified on remote pride-linux-vm.ebi.ac.uk Astral ProteoBench Module 8 with the new -Dmsgfplus.tighteningSigmaMultiplier=2 override: | Run | Wall | Targets | Decoys | mean_per_call | |--------------------|-------|---------|--------|---------------| | OFF (baseline) | 538s | 89479 | 46792 | 0.77 | | AUTO @ k=2 rep1 | 537s | 89453 | 46685 | 0.63 | | AUTO @ k=2 rep2 | 531s | 89453 | 46685 | 0.63 | Tightening fires for the first time on Astral (10.000 ppm -> 8.474 ppm). Pairing fan-out drops 18% (mean_per_call 0.77 -> 0.63). matched_speckeys drops 15.4%. Native counts shift slightly (-26 targets, -107 decoys = -0.03% / -0.23%, within noise). Wall delta: 0 +/- 2% (within machine noise floor of three OFF replicates 538/560/551). Conclusion: Phase B's cheap-score-pairing reduction does NOT translate to Astral wall savings. The bottleneck is downstream of pairing (GF / scoring). Phase B remains a valid lever for workloads where calibrated sigma is small relative to user tolerance (typical of older / drifted instruments) but is not the next Astral wall lever. Next step: stratification analysis (Plan B) to determine whether a cleaner subset of Astral pre-pass PSMs has materially tighter sigma. If yes -> calibrator quality improves for workloads where Phase B DOES pay off, even if Astral itself doesn't benefit. If no -> Phase B is at its theoretical ceiling.

…FIDENT_PSMS Plan B (post-falsification): the previous calibrator returned all 393 SpecEValue<=1e-6 PSMs to the median+MAD aggregator. Per-PSM stratification analysis (using the calibrationDebugTsv emitter) on Astral revealed: Stratum | n | sigma (ppm) -----------------------------|-----|------------ All confident PSMs | 393 | 3.987 Charge=2 only | 355 | 2.923 Mass < 1 kDa | 289 | 2.470 TOP 50% by spec_eValue | 196 | 0.951 TOP 200 by spec_eValue | 200 | 0.994 The "best half by spec_eValue" axis is the dominant explanatory variable — 4x tighter sigma. The worst-half PSMs (eValue near the 1e-6 threshold) contribute residual scatter, not real instrument-bias signal. Fix: extractResiduals now collects (residual, spec_eValue) pairs, sorts by spec_eValue ascending (most confident first), and keeps the top MIN_CONFIDENT_PSMS (200) for the median+MAD aggregation. The absolute-residual filter (|residual| > 50 ppm) stays as belt-and-suspenders. Astral measurement on remote pride-linux-vm.ebi.ac.uk (5 OFF replicates + 3 stratified-AUTO replicates): Run | Wall | Sigma | Window | Targets | Decoys | T/D ----------------------|-------|-----------|----------------|---------|--------|------ OFF baseline (median) | 551 s | - | 10 ppm | 89479 | 46792 | 1.91 AUTO stratified | 494 s | 0.994 ppm | 10 -> 3.48 ppm | 89580 | 45292 | 1.98 Median wall: -10.4% (gate was >=10%, hit) Targets: +101 (+0.11%, NOT a regression) Decoys: -1500 (-3.2%, FDR-favorable) T/D ratio: +3.6% (the tighter window rejects more decoys than targets) Reproducibility: 3 stratified reps bit-identical on native counts; wall variance +/-1.2% across reps. Phase B is now demonstrably alive on Astral when the calibrator's residual subset is clean. The previous "Phase B does nothing on Astral" finding was a calibrator-quality issue, not a Phase B logic issue. Also keeps the calibration-debug-TSV emitter (-Dmsgfplus.calibrationDebugTsv=<path>) committed for future stratification work on other workloads. 32/32 scoped tests pass.

Phase B with the stratified calibrator (top-200 by spec_eValue) delivers -10.4% Astral wall (median 551 -> 494s across 5 OFF + 3 AUTO reps), +0.11% targets, -3.2% decoys, T/D ratio 1.91 -> 1.98 (sensitivity- favorable). Plan section 4 annotated; SHIPPED.md "Active" section documents the win + the four enabling commits (telemetry, calibrator iso=0, configurable formula constants, stratification).

The previous SHIPPED.md still listed astral-speed-improvements.md as the Active doc; that file was deleted during a prior reset. Replace with the actual Active state: Phase B shipped with 4 enabling commits on feat/astral-speed-improvements, the Astral measurement table, and pointers to next-experiments.md and astral-speed-5x-roadmap.md.

Replace Astral-only result table with three-dataset comparison after remote validation on TMT (PXD007683 Lumos) and PXD001819 (Velos): Workload | Window | Sigma | Tightened | Wall | Targets | T/D ----------|--------|--------|--------------|--------|---------|------ Astral | 10 ppm | 0.99 | -> 3.48 ppm | -10.4% | +0.11% | +3.6% TMT | 20 ppm | 2.05 | -> 6.67 ppm | -18.0% | -2.05% | +1.3% PXD001819 | 5 ppm | 2.15 | no-tighten | ~0% | +0.17% | +0.5% Pattern: Phase B wins when calibrated sigma is materially smaller than user window (Astral, TMT); safely no-ops when sigma is comparable to window (PXD001819). TMT -2.05% target drift is a known yellow flag for broader rollout; mitigations (instrument-aware k, stricter stratification) noted in the doc but out of scope for this commit.

After Phase B's three-dataset validation (Astral -10.4%, TMT -18.0%, PXD001819 safe no-op), the next-experiments plan §5 calls for exact prefix mass-interval pruning. This commit writes the design before any code change so the implementation has a clear contract: - Hook: between addResidue (extends prefix) and the variant-mass-fetch loop in DBScanner.dbSearch (around line 412) - Bound: [prefixMinMass + R_min * minAaMass, prefixMaxMass + R_max * (maxAaMass + maxResidueModMass) + maxFixedTermModMass] - Intersection test: pepMassSpecKeyMap.subMap(...) widened by max precursor tolerance; if empty, branch is dead -> break - Telemetry-first: implement prune-counter only at Checkpoint 1, run on Astral, decide based on prune rate before adding the break - Exact-by-construction: no recall risk, only bookkeeping-vs-savings trade-off Acceptance: prune rate >=5%, Astral wall -5% vs Phase B baseline, native counts bit-identical. Kill: prune rate <1%, or wall flat despite pruning, or correctness drift. Composes with Phase B (commit aac389c): Phase B reduced matched_speckeys per pairing call; Experiment 2 reduces the number of pairing calls. Different attack surfaces; should stack.

…heckpoint 1+2 measured) Plan-§5 exact prefix mass-interval pruning: at every SA-walk residue extension in DBScanner.dbSearch, compute the reachable final-peptide- mass interval and ask whether any spectrum window in pepMassSpecKeyMap can intersect it. If not, the branch is dead. The bound is exact by construction: reachableMin = prefixMinMass + R_min * minAaMassBound reachableMax = prefixMaxMass + R_max * maxAaMassBound where R_min / R_max are the residue counts implied by minPeptideLength / maxPeptideLength, and minAaMassBound / maxAaMassBound are cached from aaSet.getAllAminoAcidArr() at DBScanner construction (every aa+mod variant in the set, so any future residue's mass is bounded). Two opt-in flags: -Dmsgfplus.experiment2Telemetry=true -> count would-be prunes -Dmsgfplus.experiment2Pruning=true -> actually break out of SA walk Default OFF -> the bound is not computed, OFF-mode bit-identical to dev-tip. Enabled as a developer diagnostic, mirroring PhaseBTelemetry and -Dmsgfplus.useForkJoin patterns. Not exposed as CLI flags. Astral measurements on remote pride-linux-vm.ebi.ac.uk: Run | Wall | Targets | Decoys | E2 prune -----------------------------|-------|---------|--------|--------- Phase B baseline (3 reps) | 494s | 89580 | 45292 | n/a OFF baseline (5 reps median) | 551s | 89479 | 46792 | n/a phaseB + E2 telemetry only | 571s | 89580 | 45292 | 12.22% phaseB + E2 pruning ON | 549s | 89580* | 45292* | 1.84% OFF + E2 pruning ON | 611s | 89479* | 46792* | 1.76% (* = bit-identical to baseline -> exact-by-construction validated) Verdict per plan §7 acceptance gates: - Native target/decoy bit-identical: PASS (exact) - Astral prune rate >= 5%: PASS (12.22% Checkpoint 1, 1.84% with break) - Astral wall improves >= 5%: FAIL (regresses +11% in both Phase-B+E2-prune and OFF+E2-prune cases) The bookkeeping cost (~40 ns per prefix evaluation, ~1.4B evaluations on Astral = ~55s) exceeds the savings from skipping doomed branches. Pure cheap-score-loop savings are ~1-2s (the pruned pairings had zero matches anyway -- mean_per_call near zero for those). The +55s gap is the pure overhead. Checkpoint 3 path forward (left as follow-on, NOT in this commit): - Cache globalMinSpecMass / globalMaxSpecMass once per task; short-circuit the bound test for prefixes whose reachable interval falls outside the global range -> avoids the TreeMap.subMap call (~150ns) on most prefix evaluations. - Skip bound evaluation when peptideLengthIndex < minPeptideLength + N (N TBD); the prune-rate-by-length distribution would tell us where short-prefix evaluations are dead weight vs useful. - Avoid per-prefix grid scan for prefix min/max mass; cache incrementally as variants are added. Code stays in dev as opt-in scaffolding. Phase B remains the iteration's shippable Astral wall lever (-10.4% measured median). Files: - src/main/java/edu/ucsd/msjava/msdbsearch/Experiment2Telemetry.java (new) - src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java (cached aa-mass bounds + bound-test hook) - src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java (telemetry summary) - src/test/java/edu/ucsd/msjava/msdbsearch/TestExperiment2Telemetry.java (5 tests)

Checkpoint 2 measurement on Astral remote shows the bound test is exact-by-construction (native counts bit-identical) and prunes 1.84% of prefix evaluations after break — but the per-prefix bookkeeping cost (~40 ns × 1.4 B = ~55 s) exceeds the savings (the pruned pairings had zero matches, so saved cheap-score work is ~1-2 s). Wall regresses +11% vs Phase B alone. Code lands as opt-in scaffolding (commit 4241fbb), defaults OFF, so OFF-mode is bit-identical to dev-tip. Future agents can pursue Checkpoint 3 (cache global spec-mass range short-circuit; skip bound test for short prefixes; incremental grid-mass caching) without rebuilding the scaffolding.

…ed double[] The Checkpoint 2 measurement showed Experiment 2 pruning was correct (native counts bit-identical) but regressed wall +11 % because the TreeMap.subMap(low, high).isEmpty() bound test allocated a view object and walked at least one entry per call (~150 ns × 1.4 B evaluations = ~210 s of pure overhead). Replace with ScoredSpectraMap.hasSpecMassInRange(double, double): - Lazily materialise a sorted double[] from pepMassSpecKeyMap.keySet() on first call (the source map is read-only after preProcessSpectra finishes, so a cached snapshot is safe). - Use Arrays.binarySearch + a single-element range check (~30 ns). - 5x cheaper per call; no allocation per query. Astral measurement on remote pride-linux-vm.ebi.ac.uk (3 runs each config, all native counts bit-identical to baseline = exact-by- construction validated): Run | TreeMap | binary-search | gain -----------------------------|---------|---------------|------- Phase B + E2 telemetry only | 571 s | 531 s | -40 s Phase B + E2 pruning ON | 549 s | 511 s | -38 s OFF + E2 pruning ON | 611 s | 559 s | -52 s Verdict against plan §7 wall gate (>= 5 % improvement vs Phase B baseline 494 s): - Phase B + E2 pruning: 511 vs 494 = +3.4 % (still slight regression) - OFF + E2 pruning: 559 vs 551 = +1.5 % (break-even within noise) The optimization closes ~75 % of the previous overhead but the remaining ~17 s gap on Phase B-tightened data isn't covered by the modest savings (matched_speckeys per skipped pairing is small once Phase B has already narrowed the window from 10 ppm to 3.48 ppm). Code stays as opt-in (-Dmsgfplus.experiment2Pruning=true). Default remains OFF -> bit-identical to dev-tip on dev-tip OFF mode + Phase B's calibration shift on AUTO. Phase B (commit aac389c and ancestors) is the iteration's durable Astral wall lever (-10.4 %). Future Checkpoint 4 paths (left as follow-on, not in this commit): - Skip bound test for short prefixes (peptideLengthIndex below some threshold N where prunes are statistically rare) - Cache prefixMin/Max incrementally instead of scanning grid variants on every evaluation (~10 ns saved per call)

Binary-search optimization closes ~75 % of the Checkpoint 2 overhead gap. Phase B + E2 pruning narrows from +11 % to +3.4 % wall regression vs Phase B alone. OFF + E2 pruning narrows from +11 % to +1.5 % vs OFF (break-even in noise). Still misses the +5 % wall gate; Phase B stays the iteration's deliverable.

…deLength (Checkpoint 4) Move the Experiment 2 mass-interval pruning hook from before the 'peptideLengthIndex < minPeptideLength' continue to after it. Short prefixes (length < minPeptideLength) have huge reachable intervals — R_max * maxAaMass is many kDa wide — so the bound test almost never prunes there but still pays the binary-search bookkeeping. Skipping that range trims ~25-30% of bound-test evaluations. Also simplifies reachableMin = prefixMin once we're past minPeptideLength (R_min collapses to 0). Exact-by-construction property preserved: the moved test is still a sound necessary condition on extending the prefix to a viable final peptide. OFF mode unaffected. Off by default (still gated on -Dmsgfplus.experiment2Pruning=true).

…pped, bench pending Records the Checkpoint 4 code change (commit 8478651): bound test now skipped for prefixes shorter than minPeptideLength, where the reachable interval is too wide to prune anyway. Tests pass; Astral wall measurement pending remote socket re-establishment. Math says best-case is +3-7s gap vs Phase B alone — still inside noise, not a clear ≥5% graduation. Verdict updated to: four checkpoints, none clear the gate; Phase B is the durable lever.

…-2.27% Astral wall Replaces the "bench pending" Checkpoint 4 entry with the actual 5-trial interleaved confirmation: phaseB_only: mean 519.8s, σ 6.06s (522, 518, 517, 529, 513) phaseB+e2: mean 508.0s, σ 4.85s (504, 507, 514, 503, 512) Δ = 11.8s = -2.27% vs Phase B alone 5/5 trials phaseB+E2 < phaseB_only; Welch's t ~3.4 (p ~ 0.01) Native counts bit-identical 89580T / 45292D across all 10 runs The effect is real and statistically significant but below the plan's >=5% graduation gate. Experiment 2 stays opt-in (-Dmsgfplus.experiment2Pruning=true); Phase B (-10.4% Astral wall vs OFF) remains the durable lever.

…real improvement The Astral 5-trial bench showed: - Phase B (calibrated precursor-window tightening): -10.4% wall vs OFF, durable win, validated on 3 datasets (Astral / TMT / PXD001819). KEPT. - Experiment 2 (mass-interval pruning, Checkpoints 1-4): real -2.27% on top of Phase B but below the >=5% default-on gate. Runtime scaffolding REMOVED; retrospective stays in .claude/plans/experiment-2-mass-interval-pruning.md. - Catalog-backed scanner (MassIndexedPeptideCatalog + CatalogDBScanner): n=1 trial showed 4.56x slowdown (2186s vs 479s) with bit-identical native counts. Root cause: scanner iterates all 7M entries linearly and allocates a fresh CandidatePeptideGrid per entry instead of querying the mass slabs the catalog builds. REMOVED in current form; would need a spectrum-major redesign + grid pool to be viable. Removed: - src/main/java/edu/ucsd/msjava/msdbsearch/CatalogDBScanner.java - src/main/java/edu/ucsd/msjava/msdbsearch/MassIndexedPeptideCatalog.java - src/main/java/edu/ucsd/msjava/msdbsearch/MatchScanner.java - src/main/java/edu/ucsd/msjava/msdbsearch/PeptideCatalogSource.java - src/main/java/edu/ucsd/msjava/msdbsearch/Experiment2Telemetry.java - src/main/java/edu/ucsd/msjava/msdbsearch/PhaseBTelemetry.java - tests for the above Reverted to HEAD (catalog source-awareness): - DatabaseMatch (no protein-source list) - MSGFPlusPSMSet (no source-aware decoy decision) - DirectPinWriter / DirectTSVWriter (no catalog source path) - CompactFastaSequence (no protein-entry digest helper) - ConcurrentMSGFPlus (no catalog scanner construction) - TestConcurrentMSGFPlus (revert null arg) Cleanup kept: - MSGFPlus: drop Phase B / Experiment 2 telemetry print blocks - DBScanner: drop Experiment 2 prefix-mass bound hook (and its bookkeeping) - ScoredSpectraMap: drop hasSpecMassInRange (Experiment 2 binary-search helper) - MassCalibrator: drop unused -Dmsgfplus.calibrationDebugTsv emitter Phase B remains untouched: MassCalibrator iso=0 pre-pass + outlier filter + spec_eValue stratification, and MSGFPlus tightening at min(userPpm, max(floor, k*sigma + margin)) with k=3, floor=2 ppm, margin=0.5 ppm. Branch state: 51/51 scoped tests pass.

CLAUDE.md flags as a footgun: pre-passes that mutate shared Spectrum objects cause silent PSM-count drift in the main pass. The calibration pre-pass at MassCalibrator#collectResiduals invoked ScoredSpectraMap#preProcessSpectra, which calls spec.setCharge(charge) and scorer.getScoredSpectrum(spec) on objects from the shared SpectraAccessor — exactly the warned pattern. Fix: - ScoredSpectraMap gains an opt-in builder isolateSpectrumState() that routes all spectrum-mutating sites through a new prepareSpectrumForScoring helper. Default behavior (main search path) is unchanged: in-place setCharge, no allocation. - When isolation is enabled, each call clones the Spectrum (precursor + per-peak deep clone via getCloneWithoutPeakList + Peak.clone), applies setCharge to the clone only, and scores the clone. The shared Spectrum cached on SpectraAccessor is untouched. - MassCalibrator's pre-pass opts in: new ScoredSpectraMap(...).isolateSpectrumState() - TestScoredSpectraMapIsolation pins the contract: default path returns the same instance and propagates setCharge; isolated path returns a different instance, leaves the original's charge unchanged, and applies setCharge to the clone. Validation: 3-dataset x 3-trial x 2-arm bench on the isolated machine (pride-linux-vm.ebi.ac.uk): Astral OFF 508.7s +/- 5.0s T=89479 D=46792 AUTO 469.0s +/- 9.2s T=89580 D=45292 wall -7.8%, +101 T TMT OFF 248.0s +/- 6.6s T=28790 D=14768 AUTO 240.3s +/- 1.5s T=28201 D=14285 wall -3.1% PXD001819 OFF 101.3s +/- 0.6s T=28037 D=11022 AUTO 100.3s +/- 2.1s T=28084 D=10985 wall -1.0%, +47 T Native target/decoy counts are deterministic across all 3 trials per arm (zero per-trial variance), confirming the isolation prevents the shared-state mutation that would otherwise produce silent drift. OFF-mode counts on each dataset match the dev baseline exactly.

coderabbitai · 2026-05-01T09:07:57Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6406ee62-935a-4aa4-a9b1-d26b46879fb3

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/precursor-window-tightening

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

qodo-code-review · 2026-05-01T09:08:57Z

Review Summary by Qodo

Phase B: Calibrated precursor-window tightening with spectrum state isolation

✨ Enhancement 🐞 Bug fix 🧪 Tests 📝 Documentation

Walkthroughs

Description

• Implements calibrated precursor-window tightening via a pre-pass that learns mass shift and robust
sigma from top 200 confident PSMs, then tightens main-pass tolerance using `min(userPpm,
max(floorPpm, k·sigma + marginPpm))` formula (defaults: k=3, floor=2 ppm, margin=0.5 ppm). Opt-in
via -precursorCal auto flag.
• Fixes CLAUDE.md violation: pre-pass now routes through new
ScoredSpectraMap.isolateSpectrumState() builder that deep-clones each spectrum (precursor + peaks)
before charge-setting, preventing shared cache mutation while keeping main-pass path unchanged.
• Adds CalibrationStats immutable class with robust statistics methods
(medianAbsoluteDeviation(), robustSigmaPpm(), tightenedTolerancePpm()) and exposes tightening
formula constants as configurable system properties.
• Refactors calibration pre-pass to use fixed isotope error 0, stratify residuals by spec_eValue,
filter outliers >50 ppm, and keep only top 200 confident PSMs.
• Validated on 3 datasets with 3 trials each: Astral shows −7.8% wall improvement (+101 targets),
TMT shows −3.1% wall improvement, PXD001819 shows −1.0% wall with +47 targets. All results
deterministic across trials, confirming spectrum isolation fix.
• Switches benchmark CI metrics from mzIdentML XML parsing to direct PIN TSV parsing for native
target/decoy counts (deterministic search-correctness gate instead of 1% FDR sensitivity).
• Removes non-shippable scaffolding: Experiment 2 telemetry classes, mass-interval pruning hook,
calibration debug TSV emitter, and binary-search helper.
• Adds comprehensive unit tests for robust sigma calculation, tolerance tightening formula variants,
and spectrum isolation behavior.
• Updates documentation with Phase A retrospective, 5× speed roadmap, Phase B/Experiment 2 action
plan, and CI measurement strategy.

Diagram

flowchart LR
  A["Pre-pass<br/>Top 200 PSMs<br/>iso=0"] -->|"Learn shift<br/>& sigma"| B["CalibrationStats"]
  B -->|"Tighten formula<br/>k·sigma+margin"| C["Effective tolerance<br/>left/right"]
  D["Input Spectrum"] -->|"isolateSpectrumState"| E["Cloned Spectrum<br/>peaks+precursor"]
  E -->|"setCharge"| F["Scored Spectrum<br/>pre-pass"]
  C -->|"Lambda capture"| G["Main-pass<br/>ScoredSpectraMap"]
  G -->|"Default path<br/>in-place"| H["Results<br/>−7.8% Astral wall"]

File Changes

1. src/main/java/edu/ucsd/msjava/msdbsearch/MassCalibrator.java ✨ Enhancement +149/-16

Calibration stats and tolerance tightening formula implementation

• Adds CalibrationStats immutable class to return both shift and robust sigma from calibration
 pre-pass
• Exposes tightening formula constants as public static fields
 (DEFAULT_TIGHTENED_WINDOW_FLOOR_PPM, DEFAULT_TIGHTENED_WINDOW_SIGMA_MULTIPLIER,
 DEFAULT_TIGHTENED_WINDOW_MARGIN_PPM)
• Implements robust statistics: medianAbsoluteDeviation(), robustSigmaPpm(), and
 tightenedTolerancePpm() for conservative tolerance tightening
• Refactors pre-pass to use isotope error 0 (fixed), stratifies residuals by spec_eValue, filters
 outliers >50 ppm, and keeps top 200 confident PSMs
• Removes minIsotopeError and maxIsotopeError constructor parameters (pre-pass now hardcoded to
 iso=0)

src/main/java/edu/ucsd/msjava/msdbsearch/MassCalibrator.java

2. src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java ✨ Enhancement +59/-12

Main-pass tolerance tightening and calibration stats integration

• Calls learnCalibrationStats() instead of learnPrecursorShiftPpm() to capture both shift and
 robust sigma
• Implements precursor-window tightening logic when calibration stats are reliable and tolerance is
 ppm-based
• Computes tightened left/right tolerances via MassCalibrator.tightenedTolerancePpm() using
 configurable system properties
• Passes effective (tightened or original) tolerances to main-pass ScoredSpectraMap via lambda
 capture
• Removes minIsotopeError and maxIsotopeError arguments from MassCalibrator constructor call

src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java

3. src/main/java/edu/ucsd/msjava/msdbsearch/ScoredSpectraMap.java 🐞 Bug fix +35/-5

Spectrum state isolation for pre-pass to prevent shared cache mutation

• Adds isolateSpectrumState flag and isolateSpectrumState() builder method for spectrum cloning
 during preprocessing
• Introduces prepareSpectrumForScoring() method that either mutates in-place (default) or clones
 before charge-setting (isolated mode)
• Implements cloneSpectrum() helper that deep-clones spectrum metadata and all peaks
• Updates preProcessIndividualSpectra() and preProcessFusedSpectra() to use the new preparation
 method

src/main/java/edu/ucsd/msjava/msdbsearch/ScoredSpectraMap.java

View more (16)

4. src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java Miscellaneous +17/-17

Field visibility changes for test extensibility
• Changes 17 private fields to protected visibility to enable subclassing in test/experimental
 contexts
src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java

5. src/test/java/msgfplus/TestMassCalibrator.java 🧪 Tests +49/-0

Unit tests for robust sigma and tolerance tightening logic

• Adds 5 new unit tests for robust statistics: medianAbsoluteDeviationUsesProvidedCenter(),
 robustSigmaPpmScalesMad(), three tightenedTolerancePpm*() variants
• Adds test for CalibrationStats.hasReliableStats() with zero shift but reliable sigma
• Tests verify MAD scaling, floor/margin enforcement, and user-bound capping in tightening formula

src/test/java/msgfplus/TestMassCalibrator.java

6. src/test/java/edu/ucsd/msjava/msdbsearch/TestScoredSpectraMapIsolation.java 🧪 Tests +52/-0

Tests for spectrum state isolation in ScoredSpectraMap
• New test class with 2 tests verifying spectrum isolation behavior
• defaultPathMutatesOriginalSpectrumCharge() confirms default path mutates in-place
• isolatedPathClonesSpectrumBeforeChangingCharge() confirms isolated mode clones before mutation
src/test/java/edu/ucsd/msjava/msdbsearch/TestScoredSpectraMapIsolation.java

7. benchmark/ci/PXD001819/test_compare_metrics.py 🧪 Tests +65/-7

CI test updates for PIN-based native count validation

• Updates test metrics from psm_1pct_fdr to native_target_count and native_decoy_count
• Adds new ParsePinTest class with 3 tests for parse_pin() function: label counting, empty file
 handling, malformed row skipping
• Adds _load_extract_metrics() helper to dynamically import the extract_metrics module

benchmark/ci/PXD001819/test_compare_metrics.py

8. benchmark/ci/PXD001819/extract_metrics.py ⚙️ Configuration changes +34/-41

Benchmark metrics extraction from PIN instead of mzIdentML

• Replaces mzIdentML XML parsing with direct PIN TSV parsing via parse_pin() function
• Counts target (Label==1) and decoy (Label==-1) rows from the .pin file instead of parsing XML
 for 1% FDR PSMs
• Updates main() to accept --pin argument instead of --mzid and output native_target_count /
 native_decoy_count metrics
• Removes XML ElementTree dependency and streaming iterparse logic

benchmark/ci/PXD001819/extract_metrics.py

9. benchmark/ci/README.md 📝 Documentation +34/-4

CI documentation for PIN-based metrics and measurement strategy

• Adds "What gets measured" section documenting the 5 CI metrics and their sources
• Clarifies that native target/decoy counts are deterministic (search-correctness gate), not 1% FDR
 sensitivity
• Explains PIN format choice and removal of mzIdentML
• Adds "Tightening the baseline" section for narrowing ranges after green runs
• Adds "Future iterations" section referencing the Phase A retrospective and multi-run measurement
 strategy

benchmark/ci/README.md

10. .claude/plans/astral-phase-a-retrospective.md 📝 Documentation +230/-0

Phase A retrospective with measurements and lessons learned

• Documents Phase A attempt (deisotoping, peak cap, GF candidate cap, scorer optimization) with 6
 Astral measurements showing no wall improvement
• Records TMT 1.41× win but with −4.6% decoy drift, deemed not a clean win
• Provides lessons learned: TMT is not a reliable Astral proxy, JIT optimization already handles hot
 spots, profile-driven fixes need post-fix validation
• Includes Phase E parallelism investigation (anti-scaling claim disproven by replication batch)
• Lists untried phases (B, C, D, E) and what future agents should do

.claude/plans/astral-phase-a-retrospective.md

11. .claude/plans/astral-speed-5x-roadmap.md 📝 Documentation +466/-0

Long-horizon Astral 5× speed roadmap and phase design

• Comprehensive 5× Astral speed roadmap with 5 phases: spectrum cleanup, precursor tightening,
 branch-and-bound, GF threshold, parallelism
• Details working thesis, multiplicative work loops, expected payoff per phase, telemetry
 requirements, and acceptance/kill gates
• Proposes implementation order (Iterations 0–5) with Phase A as opening attempt
• Includes update note that Phase A was attempted and reverted; Phase B remains viable

.claude/plans/astral-speed-5x-roadmap.md

12. .claude/plans/astral-next-experiments.md 📝 Documentation +285/-0

Post-retrospective action plan for Phase B and Experiment 2

• Historical staging plan for Phase B and Experiment 2 (exact mass-interval pruning)
• Documents Phase B as already shipped with −10.4% Astral wall improvement
• Describes Experiment 2 as statistically real but below 5% graduation gate
• Provides updated priority order and experiments to avoid
• Includes benchmark rules and recommendations for future iterations

.claude/plans/astral-next-experiments.md

13. .claude/plans/experiment-2-mass-interval-pruning.md 📝 Documentation +154/-0

Experiment 2 retrospective: exact mass-interval pruning results

• Retrospective for exact prefix mass-interval pruning attempt with 4 checkpoints
• Records −2.27% Astral wall improvement (statistically significant, p≈0.01) but below 5% default-on
 gate
• Documents implementation strategy, bound construction, and intersection testing
• Includes checkpoint results showing 12.22% prune rate but 11% wall regression at Checkpoint 2,
 improved to +3.4% at Checkpoint 3
• Concludes with verdict that Phase B remains the durable lever; Experiment 2 retained as
 retrospective only

.claude/plans/experiment-2-mass-interval-pruning.md

14. .claude/plans/SHIPPED.md 📝 Documentation +54/-0

Shipped work and iteration retrospective summary

• Short retrospective of recent shipped iterations and abandoned experiments
• Documents Phase B (calibrated precursor-window tightening) with −10.4% Astral wall, −18.0% TMT
 wall
• Records Phase A and Phase E as abandoned with detailed findings
• Includes Experiment 2 as below-gate but reproducible
• Lists active work and provides reference links to detailed retrospectives

.claude/plans/SHIPPED.md

15. .claude/plans/README.md 📝 Documentation +12/-5

Plans directory documentation and organization

• Reorganizes plan directory structure with Active, History, and Archived sections
• Links to Phase A retrospective, 5× roadmap, and Experiment 2 findings
• Clarifies that detailed plans live outside the repo to avoid checking artifacts into git

.claude/plans/README.md

16. benchmark/ci/PXD001819/run_ci.sh ⚙️ Configuration changes +6/-6

Switch benchmark output format from mzIdentML to Percolator PIN

• Changed output file format from mzIdentML (MZID) to Percolator PIN format (PIN)
• Updated variable name from MZID to PIN throughout the script for consistency
• Modified error message and error code to reflect missing PIN file instead of mzIdentML
• Updated script argument passed to extract_metrics.py from --mzid to --pin

benchmark/ci/PXD001819/run_ci.sh

17. .claude/plans/parameter-modernization-flag-inventory.md Additional files +0/-90

...

.claude/plans/parameter-modernization-flag-inventory.md

18. .claude/plans/parameter-modernization.md Additional files +0/-159

...

.claude/plans/parameter-modernization.md

19. .claude/plans/search-sync-cleanup.md Additional files +0/-133

...

.claude/plans/search-sync-cleanup.md

qodo-code-review · 2026-05-01T09:08:58Z

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0)

1. Uncaught sysprop parse crash 🐞 Bug ☼ Reliability

Description

MSGFPlus.runMSGFPlus parses tightening system properties with Float.parseFloat without handling
invalid values; a malformed -Dmsgfplus.tightening* value will throw NumberFormatException and abort
the run when calibration stats are reliable and tolerances are ppm-based.

Code

src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java[R403-411]

+            float sigmaMultiplier = Float.parseFloat(System.getProperty(
+                    "msgfplus.tighteningSigmaMultiplier",
+                    String.valueOf(MassCalibrator.DEFAULT_TIGHTENED_WINDOW_SIGMA_MULTIPLIER)));
+            float floorPpm = Float.parseFloat(System.getProperty(
+                    "msgfplus.tighteningFloorPpm",
+                    String.valueOf(MassCalibrator.DEFAULT_TIGHTENED_WINDOW_FLOOR_PPM)));
+            float marginPpm = Float.parseFloat(System.getProperty(
+                    "msgfplus.tighteningMarginPpm",
+                    String.valueOf(MassCalibrator.DEFAULT_TIGHTENED_WINDOW_MARGIN_PPM)));

Evidence

The new tightening constants are read via Float.parseFloat(System.getProperty(...)) without a
try/catch, so any non-float value will terminate the run. Elsewhere in the codebase, sysprop parsing
is defensively guarded (NumberFormatException caught and defaults used), showing the expected
robustness pattern for optional tuning knobs.

src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java[392-411]
src/main/java/edu/ucsd/msjava/msdbsearch/CompactSuffixArray.java[484-493]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`MSGFPlus.runMSGFPlus` reads `msgfplus.tighteningSigmaMultiplier`, `msgfplus.tighteningFloorPpm`, and `msgfplus.tighteningMarginPpm` via `Float.parseFloat(...)` with no error handling. If a user supplies a malformed value, the search aborts with `NumberFormatException`.

### Issue Context
These are explicitly described as tuning knobs for sweeps; they should behave like other sysprop knobs in the repo (ignore invalid values and fall back to defaults) rather than hard-crashing.

### Fix Focus Areas
- src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java[403-411]

### Suggested fix
- Add a small helper (local method or utility) like `float readFloatSysprop(String key, float defaultValue)` that:
 - reads the property
 - `trim()`s it
 - catches `NumberFormatException`
 - optionally rejects `NaN`/`Infinity` and negative values for these specific knobs
 - prints a one-line warning to stderr/stdout when falling back
- Use the helper for the three tightening constants before calling `tightenedTolerancePpm(...)`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

…diff) These directories are out of scope for the precursor-window-tightening PR. Restore them to match dev exactly so they no longer appear in the PR's cumulative diff against dev. After this commit: git diff origin/dev..HEAD -- .claude/plans benchmark/ci returns empty. Removed (branch-only additions, never on dev): .claude/plans/SHIPPED.md .claude/plans/astral-next-experiments.md .claude/plans/astral-phase-a-retrospective.md .claude/plans/astral-speed-5x-roadmap.md .claude/plans/experiment-2-mass-interval-pruning.md Restored to dev's version (branch had not picked up these dev additions): .claude/plans/parameter-modernization-flag-inventory.md .claude/plans/parameter-modernization.md .claude/plans/search-sync-cleanup.md Restored to dev's version (branch had local edits): .claude/plans/README.md benchmark/ci/README.md benchmark/ci/PXD001819/baseline.tsv benchmark/ci/PXD001819/extract_metrics.py benchmark/ci/PXD001819/run_ci.sh benchmark/ci/PXD001819/test_compare_metrics.py

…operties Adds two override knobs for the calibration pre-pass: -Dmsgfplus.maxSampled=<int> (default 500) -Dmsgfplus.minConfidentPsms=<int> (default 200) Both fall back to the historical defaults on unset/non-numeric/non-positive values, so existing behavior is unchanged. Internal constants moved from private to public DEFAULT_* so callers and tests can reference them. CalibrationStats.hasReliableStats() no longer hard-codes the MIN_CONFIDENT_PSMS threshold; the calibrator now emits confidentPsmCount=0 as the "unreliable, do not apply" sentinel and hasReliableStats checks count > 0. Behavior is identical for all existing callers. Defaults intentionally unchanged. A 9-trial Astral A/B (3 trials each at minConfidentPsms = 200 / 500 / 1000, all with maxSampled=3000) showed: config sigma ppm tighten ppm wall s targets decoys (500, 200) HIST 0.994 3.482 469.0 89580 45292 (3000, 200) 0.454 2.000 470.3 89249 44538 (3000, 500) 0.511 2.033 471.7 89212 44555 (3000, 1000) 0.729 2.687 469.7 89524 44817 Key findings: - Wall time is statistically equivalent across all four configurations (~470 s, well inside run-to-run sigma of ~25 s). - Sigma DECREASES as minConfidentPsms DROPS at fixed maxSampled (0.454 < 0.511 < 0.729 ppm for 200 < 500 < 1000). Confirms the prior that stratification depth, not pool size, drives sigma quality. - Lower minConfidentPsms produces an over-tight window: (3000, 200) hits the 2.0 ppm floor and loses 331 native targets vs the historical (500, 200) default. - No configuration tested beats the historical (500, 200) defaults end to end, which were already validated on 3 datasets in PR #26. Conclusion: ship the configurability for users who want to tune for specific instruments / file sizes, but do not change the defaults. 26 unit tests verify property parsing (default fallback, valid int, trim, non-numeric rejection, non-positive rejection, exported constant values).

ypriverol added 24 commits April 28, 2026 06:48

ypriverol added 2 commits May 1, 2026 10:36

ypriverol merged commit 6512011 into dev May 1, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Phase B: calibrated precursor-window tightening (-7.8% Astral wall)#26

Phase B: calibrated precursor-window tightening (-7.8% Astral wall)#26
ypriverol merged 26 commits intodevfrom
feat/precursor-window-tightening

ypriverol commented May 1, 2026

Uh oh!

coderabbitai Bot commented May 1, 2026 •

edited

Loading

Review skipped

Uh oh!

qodo-code-review Bot commented May 1, 2026

Uh oh!

qodo-code-review Bot commented May 1, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ypriverol commented May 1, 2026

Summary

Validation — 3-dataset, 3-trial, isolated bench machine

Notable commits

Test plan

Uh oh!

coderabbitai Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

qodo-code-review Bot commented May 1, 2026

Review Summary by Qodo

Walkthroughs

File Changes

Uh oh!

qodo-code-review Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 1, 2026 •

edited

Loading

qodo-code-review Bot commented May 1, 2026 •

edited

Loading