Phase B: calibrated precursor-window tightening (-7.8% Astral wall)#26
Phase B: calibrated precursor-window tightening (-7.8% Astral wall)#26
Conversation
… SHIPPED.md - Add astral-speed-improvements.md design (gate B: 1.3-1.5x Astral wall, no PSM regression). TMT-as-inner-loop, Astral-as-phase-gate workflow. - Add SHIPPED.md short retrospective covering PRs #15-#22 (primitives + Achievements A+B), PR #23 (speed-v2 + output consolidation), PR #24 (Astral OOM + BuildSA scaling), PR #25 (search-sync-cleanup + parameter-modernization), and the abandoned fragment-index experiment. - Remove search-sync-cleanup.md, parameter-modernization.md, and parameter-modernization-flag-inventory.md — work shipped, content folded into SHIPPED.md. - Update README.md to point at the new active/history layout.
…ng model Replace the short-horizon astral-speed-improvements.md (drafted on this branch in 878b0cb) with the longer-horizon astral-speed-5x-roadmap.md as the single active design. Shipping-model change (new §0 in the roadmap): this iteration ships as milestone commits on feat/astral-speed-improvements with one closing PR at the end, not a PR per phase. Each phase milestone commit follows the template: feat(astral-speed): MILESTONE Phase <id> — <one-line achievement> <2-4 lines: TMT inner-loop wall delta, Astral phase-gate result, recall delta, RSS delta if any> The small-wins from the deleted plan (graph-skeleton memoization in PrimitiveAminoAcidGraph + Tier-1.5 GF candidate cap in DBScanner.computeSpecEValue) are retained as the §0 "Iteration 0.5 fallback" — used only when a big-win phase fails its kill gate. Original short-horizon plan recoverable via: git show 878b0cb:.claude/plans/astral-speed-improvements.md
…reset Phase A (in-engine MS2 deisotoping + dense-peak retention cap + GF candidate cap + scorer hot-path optimization) was attempted on this branch from 2026-04-27 to 2026-04-28. None of three independent optimization angles moved Astral wall above run-to-run noise; TMT's 1.41x apparent win came with -0.25% target and -4.6% decoy drift, which is not clean enough to justify shipping the surface area. Branch was hard-reset to eee9fa6 (the 5x roadmap state before any Phase A work) to drop 11 code commits + their uncommitted in-progress extensions. This commit captures the empirical findings as a retrospective doc so future agents can read the measurement data, understand why Phase A failed Astral, and pick from Phase B/C/E or workload-retargeting alternatives without repeating the attempt. - Add .claude/plans/astral-phase-a-retrospective.md with six Astral measurements, two TMT measurements, lessons (TMT-not-Astral-proxy, JIT-already-optimizes, native-count-drift-is-leading-indicator), and a concrete list of what's still untried. - Update astral-speed-5x-roadmap.md section 11 + 12 to flag Phase A as attempted-and-reverted, with original recommendation preserved as a blockquote for context. - Update SHIPPED.md "Abandoned" section with a Phase A entry pointing at the retrospective. - Update README.md "Active" list to surface the retrospective. Reverted code is recoverable via: git show 5cdd21e:src/... and walks back through 11 commits (b78e275 through 5cdd21e).
…nder noise Follow-up to Phase A iteration. Tested ForkJoin pool as a smart-default on Astral after thread-scaling sweep showed default ThreadPoolExecutor anti-scales past 6 threads. Findings: - Default executor: 4t=690s, 6t=675s, 8t=884s (+28% regression at 8t). Anti-scaling at 8t reproduces across measurements; this is real. - ForkJoin opt-in: 4t=872s (regression), 8t=521s (1.32x vs 4t). Pool choice is thread-count dependent. - Smart-default change implemented (numThreads >= 8 -> ForkJoin) and scope-tested (9/9 pass), but confirmation runs showed +65% wall on the same JAR vs the morning measurement. Machine state had degraded across hours of benchmarking; can't isolate ForkJoin win from noise. - Reverted. The underlying -Dmsgfplus.useForkJoin=true opt-in remains in dev unchanged. Lessons added to retrospective: - Need stable benchmark environment + multi-run statistics for any Astral wall-time claim. Single-measurement deltas at the 30% level are below the noise floor. - The default-executor anti-scaling at 8t IS worth fixing. Future attempt: jfr print --events jdk.JavaMonitorWait on 8t default-executor profile to localise the contention point in ThreadPoolExecutorWithExceptions. Files: .claude/plans/astral-phase-a-retrospective.md (added Phase E section + updated "what's untried" entry); .claude/plans/SHIPPED.md (added Phase E to "Abandoned"). Source code unchanged from eee9fa6 (verified via git status; MSGFPlus.java reverted before commit).
… not enough JFR analysis of the morning 4t Astral profile found zero JavaMonitorEnter contention events and 100% RUNNABLE samples, ruling out lock contention as the 8t regression cause. But 588K GCPhaseParallel events suggested GC parallel-phase coordination as a candidate. Tested with -Xmx16g (double heap) at 8t and 4t: - 8t-Xmx16g: 776s wall, 5067MB peak RSS, 182 GC pauses - 4t-Xmx16g: 870s wall, 6120MB peak RSS, 184 GC pauses Bigger heap helped 8t by ~12% wall and dropped peak RSS ~17% (G1GC ran fewer collections), but only ~4% at 4t. GC contributes to the 8t regression but is not the entire story. Even with Xmx16g, 8t (776s) is still slower than morning's 4t-Xmx8g baseline (690s). The afternoon's 4t-Xmx8g run (904s) vs morning's same configuration (690s) is +31% on the same JAR / same args / same machine — confirming the day's accumulated machine-state degradation dwarfs any code-level signal. Six hours of benchmarking has hit the noise floor. No actionable recommendation to ship. Source code stays at dev tip. The Phase E section in astral-phase-a-retrospective.md is updated with the GC-pressure findings.
…ere noise Replication batch (3 Astral runs back-to-back, 2026-04-29 morning, quieter machine): 4t default: 963.1s 8t default: 918.3s (FASTER than 4t — opposite of yesterday) 8t ForkJoin: 978.8s (no advantage) All three within 6.5% of each other. Native counts bit-identical. This proves: - The "default executor anti-scales past 6 threads" finding was wrong. It was a one-day correlation between morning-quiet-machine + 4t and afternoon-noisy-machine + 8t, not a real algorithmic relationship. - The 521s ForkJoin-8t measurement was an outlier. Three independent re-measurements (yesterday afternoon 861s, today 978s) put 8t-FJ at ~900s in normal machine state. - The 690s 4t-default baseline was an outlier in the other direction. Today's same-machine 4t-default was 963s. Conclusion: no Phase E shippable change exists on this benchmark setup. Wall-time variance from machine state (~30%) dwarfs any code-level signal we can produce on a single developer workstation. Updated retrospective with the corrected Phase E section + replication batch data. Updated SHIPPED.md Phase E entry to reflect the disproof. Source code stays at dev tip. End of iteration. Five doc commits stay on the branch as the deliverable.
The PXD001819 CI workflow (.github/workflows/benchmark-pxd001819.yml) has been silently broken since PR #23 removed mzIdentML output entirely. run_ci.sh wrote -o ci_output.mzid, MS-GF+ produced a .pin under that filename, and extract_metrics.py tried to parse it as mzIdentML. The CI never produced valid metrics. Switch the whole pipeline to .pin, which is the only modern output format: - run_ci.sh: -o $PIN (was -o $MZID); error message updated. - extract_metrics.py: rewritten parse_pin() counts native target / decoy rows by streaming the .pin (column 2 = label, {1, -1}) one line at a time. Drops the iterparse XML dependency. Replaces sii_count + psm_1pct_fdr metrics with native_target_count + native_decoy_count. - baseline.tsv: new metric names + ranges. Wide first-green-run windows (14000-35000 targets, 8000-20000 decoys); to be tightened after the first self-hosted runner produces consistent numbers. - test_compare_metrics.py: 7 existing comparator tests updated to use the new metric names; 3 new ParsePinTest cases covering label counts, empty input, and malformed rows. 10/10 pass. - benchmark/ci/README.md: documents the PIN-based pipeline, why native counts are the right CI gate (deterministic across runs; catches search-code regressions cleanly without Percolator dependency), and points future iterations at the multi-run protocol the astral-phase-a-retrospective.md asks for. The CI gate is now search-correctness (deterministic counts), not sensitivity (which needs Percolator + a separate gate). Sensitivity regressions surface via a downstream Percolator run on the .pin artifact — out of scope for this CI workflow. This is the meta-fix the Astral retrospective named as the next iteration's prerequisite: stable benchmarks must work before any speed-optimization claim is defensible.
…ication Phase B (calibrated precursor-window tightening) is already implemented in dev — MSGFPlus.runMSGFPlus:396-423 computes tightened tolerance via MassCalibrator.tightenedTolerancePpm() and passes it to per-task ScoredSpectraMap. What's missing per astral-next-experiments.md is the telemetry to verify it's actually shrinking the pairing fan-out. This commit adds PhaseBTelemetry, an opt-in counter class that records: - pairing_calls: number of times DBScanner.dbSearch hit the pepMassSpecKeyMap.subMap(leftThr, rightThr) site - matched_speckeys: total SpecKeys returned across those calls Enable with -Dmsgfplus.phaseBTelemetry=true. Off by default; OFF-mode behaviour is bit-identical (single load+branch when disabled). Not exposed as a CLI flag — this is a developer diagnostic for Phase B verification, mirroring the approach used for -Dmsgfplus.useForkJoin. At the end of search MSGFPlus.runMSGFPlus prints (when enabled): [Phase B telemetry] pairing_calls=N matched_speckeys=M mean_per_call=X.XX Files: - src/main/java/edu/ucsd/msjava/msdbsearch/PhaseBTelemetry.java (new) — thread-safe LongAdder counters, opt-in via system property, JVM-scoped. - src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java — record at DBScanner.dbSearch:489 (one line, guarded by enabled() short-circuit). - src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java — print summary after task wall summary (around line 587), only when enabled. - src/test/java/edu/ucsd/msjava/msdbsearch/TestPhaseBTelemetry.java (new) — 5 tests: counter math, mean-zero edge case, reset, 8-thread concurrency stress (80k ops total), system-property contract. Tests: 14/14 pass for the relevant scoped suite (TestPhaseBTelemetry + TestPrecursorCalScaffolding + TestConcurrentMSGFPlus); OFF-mode unchanged. Plan update: astral-next-experiments.md §4 annotated to note Phase B core is shipped + telemetry is now wired. Future agent runs the bench with -Dmsgfplus.phaseBTelemetry=true and the existing -precursorCal auto/on, then verifies §4.5 Phase B success gate (matched SpecKey count drops materially after tightening; Astral wall improves >=10%).
…uals Two bugs in MassCalibrator surfaced under Phase B telemetry: 1. The pre-pass propagated the user's -ti window (e.g. -1,2 on Astral) to the pre-pass DBScanner. Spectra whose precursor was selected as the M+1 or M+2 isotope produced 500-1000 ppm "residuals" (the 1.003 Da isotope shift on a 2 kDa peptide). These contaminated the median + MAD aggregation. Astral measured sigma=1166 ppm — useless for any tightening decision. Fix: hard-code the pre-pass to iso=[0,0]. The pre-pass exists to LEARN the shift, not to find every match; clean monoisotopic matches are what makes the residual distribution meaningful. 2. As belt-and-suspenders, residuals beyond +/-50 ppm are now rejected in extractResiduals(). Any modern instrument's true mass accuracy is well under 50 ppm; values above are almost always isotope-shift or charge-state mistakes. New constant MAX_REASONABLE_RESIDUAL_PPM=50. Drop the now-unused minIsotopeError / maxIsotopeError fields and constructor parameters from MassCalibrator. The single caller (MSGFPlus.runMSGFPlus) is updated. Verified on remote pride-linux-vm.ebi.ac.uk (Astral ProteoBench Module 8): Before fix: Precursor mass shift 1.398 ppm from 371 PSMs (sigma 1166 ppm) After fix: Precursor mass shift 0.978 ppm from 393 PSMs (sigma 3.987 ppm) The shift values are similar; the sigma is the dramatic signal that the residual distribution is now clean. 0.978 ppm is the typical Astral instrument bias; 3.987 ppm is the true post-calibration spread (still wider than expected; might reflect long-peptide tail). OFF-mode (precursorCal=off) is bit-identical to dev-tip on three back-to-back Astral runs (548/560/551s wall, all 89479/46792 native counts). 32/32 scoped tests pass (TestMassCalibrator, TestPhaseBTelemetry, TestPrecursorCalScaffolding). NOTE: this commit only fixes the calibrator's correctness. Phase B tightening still does not fire on Astral because the tightening formula k=3 * sigma + margin = 12.46 ppm exceeds the 10 ppm user window. Tuning k or rethinking the formula is a separate decision; see retrospective for the next-iteration discussion.
Phase B's tightening formula min(userPpm, max(floorPpm, k*sigma + marginPpm)) previously used hard-coded MassCalibrator.DEFAULT_TIGHTENED_WINDOW_* constants (k=3, floor=2 ppm, margin=0.5 ppm). Surface them as system properties so falsification sweeps can vary k without recompiling: -Dmsgfplus.tighteningSigmaMultiplier=<float> (default 3.0) -Dmsgfplus.tighteningFloorPpm=<float> (default 2.0) -Dmsgfplus.tighteningMarginPpm=<float> (default 0.5) Defaults unchanged; OFF-mode and PrecursorCal=AUTO production behaviour is bit-identical when no override is set. Verified on remote pride-linux-vm.ebi.ac.uk Astral ProteoBench Module 8 with the new -Dmsgfplus.tighteningSigmaMultiplier=2 override: | Run | Wall | Targets | Decoys | mean_per_call | |--------------------|-------|---------|--------|---------------| | OFF (baseline) | 538s | 89479 | 46792 | 0.77 | | AUTO @ k=2 rep1 | 537s | 89453 | 46685 | 0.63 | | AUTO @ k=2 rep2 | 531s | 89453 | 46685 | 0.63 | Tightening fires for the first time on Astral (10.000 ppm -> 8.474 ppm). Pairing fan-out drops 18% (mean_per_call 0.77 -> 0.63). matched_speckeys drops 15.4%. Native counts shift slightly (-26 targets, -107 decoys = -0.03% / -0.23%, within noise). Wall delta: 0 +/- 2% (within machine noise floor of three OFF replicates 538/560/551). Conclusion: Phase B's cheap-score-pairing reduction does NOT translate to Astral wall savings. The bottleneck is downstream of pairing (GF / scoring). Phase B remains a valid lever for workloads where calibrated sigma is small relative to user tolerance (typical of older / drifted instruments) but is not the next Astral wall lever. Next step: stratification analysis (Plan B) to determine whether a cleaner subset of Astral pre-pass PSMs has materially tighter sigma. If yes -> calibrator quality improves for workloads where Phase B DOES pay off, even if Astral itself doesn't benefit. If no -> Phase B is at its theoretical ceiling.
…FIDENT_PSMS
Plan B (post-falsification): the previous calibrator returned all 393
SpecEValue<=1e-6 PSMs to the median+MAD aggregator. Per-PSM stratification
analysis (using the calibrationDebugTsv emitter) on Astral revealed:
Stratum | n | sigma (ppm)
-----------------------------|-----|------------
All confident PSMs | 393 | 3.987
Charge=2 only | 355 | 2.923
Mass < 1 kDa | 289 | 2.470
TOP 50% by spec_eValue | 196 | 0.951
TOP 200 by spec_eValue | 200 | 0.994
The "best half by spec_eValue" axis is the dominant explanatory variable —
4x tighter sigma. The worst-half PSMs (eValue near the 1e-6 threshold)
contribute residual scatter, not real instrument-bias signal.
Fix: extractResiduals now collects (residual, spec_eValue) pairs, sorts
by spec_eValue ascending (most confident first), and keeps the top
MIN_CONFIDENT_PSMS (200) for the median+MAD aggregation. The
absolute-residual filter (|residual| > 50 ppm) stays as belt-and-suspenders.
Astral measurement on remote pride-linux-vm.ebi.ac.uk (5 OFF replicates +
3 stratified-AUTO replicates):
Run | Wall | Sigma | Window | Targets | Decoys | T/D
----------------------|-------|-----------|----------------|---------|--------|------
OFF baseline (median) | 551 s | - | 10 ppm | 89479 | 46792 | 1.91
AUTO stratified | 494 s | 0.994 ppm | 10 -> 3.48 ppm | 89580 | 45292 | 1.98
Median wall: -10.4% (gate was >=10%, hit)
Targets: +101 (+0.11%, NOT a regression)
Decoys: -1500 (-3.2%, FDR-favorable)
T/D ratio: +3.6% (the tighter window rejects more decoys than targets)
Reproducibility: 3 stratified reps bit-identical on native counts;
wall variance +/-1.2% across reps.
Phase B is now demonstrably alive on Astral when the calibrator's
residual subset is clean. The previous "Phase B does nothing on Astral"
finding was a calibrator-quality issue, not a Phase B logic issue.
Also keeps the calibration-debug-TSV emitter
(-Dmsgfplus.calibrationDebugTsv=<path>) committed for future
stratification work on other workloads. 32/32 scoped tests pass.
Phase B with the stratified calibrator (top-200 by spec_eValue) delivers -10.4% Astral wall (median 551 -> 494s across 5 OFF + 3 AUTO reps), +0.11% targets, -3.2% decoys, T/D ratio 1.91 -> 1.98 (sensitivity- favorable). Plan section 4 annotated; SHIPPED.md "Active" section documents the win + the four enabling commits (telemetry, calibrator iso=0, configurable formula constants, stratification).
The previous SHIPPED.md still listed astral-speed-improvements.md as the Active doc; that file was deleted during a prior reset. Replace with the actual Active state: Phase B shipped with 4 enabling commits on feat/astral-speed-improvements, the Astral measurement table, and pointers to next-experiments.md and astral-speed-5x-roadmap.md.
Replace Astral-only result table with three-dataset comparison after remote validation on TMT (PXD007683 Lumos) and PXD001819 (Velos): Workload | Window | Sigma | Tightened | Wall | Targets | T/D ----------|--------|--------|--------------|--------|---------|------ Astral | 10 ppm | 0.99 | -> 3.48 ppm | -10.4% | +0.11% | +3.6% TMT | 20 ppm | 2.05 | -> 6.67 ppm | -18.0% | -2.05% | +1.3% PXD001819 | 5 ppm | 2.15 | no-tighten | ~0% | +0.17% | +0.5% Pattern: Phase B wins when calibrated sigma is materially smaller than user window (Astral, TMT); safely no-ops when sigma is comparable to window (PXD001819). TMT -2.05% target drift is a known yellow flag for broader rollout; mitigations (instrument-aware k, stricter stratification) noted in the doc but out of scope for this commit.
After Phase B's three-dataset validation (Astral -10.4%, TMT -18.0%,
PXD001819 safe no-op), the next-experiments plan §5 calls for exact
prefix mass-interval pruning. This commit writes the design before any
code change so the implementation has a clear contract:
- Hook: between addResidue (extends prefix) and the variant-mass-fetch
loop in DBScanner.dbSearch (around line 412)
- Bound: [prefixMinMass + R_min * minAaMass,
prefixMaxMass + R_max * (maxAaMass + maxResidueModMass) + maxFixedTermModMass]
- Intersection test: pepMassSpecKeyMap.subMap(...) widened by max
precursor tolerance; if empty, branch is dead -> break
- Telemetry-first: implement prune-counter only at Checkpoint 1, run on
Astral, decide based on prune rate before adding the break
- Exact-by-construction: no recall risk, only bookkeeping-vs-savings
trade-off
Acceptance: prune rate >=5%, Astral wall -5% vs Phase B baseline,
native counts bit-identical.
Kill: prune rate <1%, or wall flat despite pruning, or correctness
drift.
Composes with Phase B (commit aac389c): Phase B reduced matched_speckeys
per pairing call; Experiment 2 reduces the number of pairing calls.
Different attack surfaces; should stack.
…heckpoint 1+2 measured) Plan-§5 exact prefix mass-interval pruning: at every SA-walk residue extension in DBScanner.dbSearch, compute the reachable final-peptide- mass interval and ask whether any spectrum window in pepMassSpecKeyMap can intersect it. If not, the branch is dead. The bound is exact by construction: reachableMin = prefixMinMass + R_min * minAaMassBound reachableMax = prefixMaxMass + R_max * maxAaMassBound where R_min / R_max are the residue counts implied by minPeptideLength / maxPeptideLength, and minAaMassBound / maxAaMassBound are cached from aaSet.getAllAminoAcidArr() at DBScanner construction (every aa+mod variant in the set, so any future residue's mass is bounded). Two opt-in flags: -Dmsgfplus.experiment2Telemetry=true -> count would-be prunes -Dmsgfplus.experiment2Pruning=true -> actually break out of SA walk Default OFF -> the bound is not computed, OFF-mode bit-identical to dev-tip. Enabled as a developer diagnostic, mirroring PhaseBTelemetry and -Dmsgfplus.useForkJoin patterns. Not exposed as CLI flags. Astral measurements on remote pride-linux-vm.ebi.ac.uk: Run | Wall | Targets | Decoys | E2 prune -----------------------------|-------|---------|--------|--------- Phase B baseline (3 reps) | 494s | 89580 | 45292 | n/a OFF baseline (5 reps median) | 551s | 89479 | 46792 | n/a phaseB + E2 telemetry only | 571s | 89580 | 45292 | 12.22% phaseB + E2 pruning ON | 549s | 89580* | 45292* | 1.84% OFF + E2 pruning ON | 611s | 89479* | 46792* | 1.76% (* = bit-identical to baseline -> exact-by-construction validated) Verdict per plan §7 acceptance gates: - Native target/decoy bit-identical: PASS (exact) - Astral prune rate >= 5%: PASS (12.22% Checkpoint 1, 1.84% with break) - Astral wall improves >= 5%: FAIL (regresses +11% in both Phase-B+E2-prune and OFF+E2-prune cases) The bookkeeping cost (~40 ns per prefix evaluation, ~1.4B evaluations on Astral = ~55s) exceeds the savings from skipping doomed branches. Pure cheap-score-loop savings are ~1-2s (the pruned pairings had zero matches anyway -- mean_per_call near zero for those). The +55s gap is the pure overhead. Checkpoint 3 path forward (left as follow-on, NOT in this commit): - Cache globalMinSpecMass / globalMaxSpecMass once per task; short-circuit the bound test for prefixes whose reachable interval falls outside the global range -> avoids the TreeMap.subMap call (~150ns) on most prefix evaluations. - Skip bound evaluation when peptideLengthIndex < minPeptideLength + N (N TBD); the prune-rate-by-length distribution would tell us where short-prefix evaluations are dead weight vs useful. - Avoid per-prefix grid scan for prefix min/max mass; cache incrementally as variants are added. Code stays in dev as opt-in scaffolding. Phase B remains the iteration's shippable Astral wall lever (-10.4% measured median). Files: - src/main/java/edu/ucsd/msjava/msdbsearch/Experiment2Telemetry.java (new) - src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java (cached aa-mass bounds + bound-test hook) - src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java (telemetry summary) - src/test/java/edu/ucsd/msjava/msdbsearch/TestExperiment2Telemetry.java (5 tests)
Checkpoint 2 measurement on Astral remote shows the bound test is exact-by-construction (native counts bit-identical) and prunes 1.84% of prefix evaluations after break — but the per-prefix bookkeeping cost (~40 ns × 1.4 B = ~55 s) exceeds the savings (the pruned pairings had zero matches, so saved cheap-score work is ~1-2 s). Wall regresses +11% vs Phase B alone. Code lands as opt-in scaffolding (commit 4241fbb), defaults OFF, so OFF-mode is bit-identical to dev-tip. Future agents can pursue Checkpoint 3 (cache global spec-mass range short-circuit; skip bound test for short prefixes; incremental grid-mass caching) without rebuilding the scaffolding.
…ed double[] The Checkpoint 2 measurement showed Experiment 2 pruning was correct (native counts bit-identical) but regressed wall +11 % because the TreeMap.subMap(low, high).isEmpty() bound test allocated a view object and walked at least one entry per call (~150 ns × 1.4 B evaluations = ~210 s of pure overhead). Replace with ScoredSpectraMap.hasSpecMassInRange(double, double): - Lazily materialise a sorted double[] from pepMassSpecKeyMap.keySet() on first call (the source map is read-only after preProcessSpectra finishes, so a cached snapshot is safe). - Use Arrays.binarySearch + a single-element range check (~30 ns). - 5x cheaper per call; no allocation per query. Astral measurement on remote pride-linux-vm.ebi.ac.uk (3 runs each config, all native counts bit-identical to baseline = exact-by- construction validated): Run | TreeMap | binary-search | gain -----------------------------|---------|---------------|------- Phase B + E2 telemetry only | 571 s | 531 s | -40 s Phase B + E2 pruning ON | 549 s | 511 s | -38 s OFF + E2 pruning ON | 611 s | 559 s | -52 s Verdict against plan §7 wall gate (>= 5 % improvement vs Phase B baseline 494 s): - Phase B + E2 pruning: 511 vs 494 = +3.4 % (still slight regression) - OFF + E2 pruning: 559 vs 551 = +1.5 % (break-even within noise) The optimization closes ~75 % of the previous overhead but the remaining ~17 s gap on Phase B-tightened data isn't covered by the modest savings (matched_speckeys per skipped pairing is small once Phase B has already narrowed the window from 10 ppm to 3.48 ppm). Code stays as opt-in (-Dmsgfplus.experiment2Pruning=true). Default remains OFF -> bit-identical to dev-tip on dev-tip OFF mode + Phase B's calibration shift on AUTO. Phase B (commit aac389c and ancestors) is the iteration's durable Astral wall lever (-10.4 %). Future Checkpoint 4 paths (left as follow-on, not in this commit): - Skip bound test for short prefixes (peptideLengthIndex below some threshold N where prunes are statistically rare) - Cache prefixMin/Max incrementally instead of scanning grid variants on every evaluation (~10 ns saved per call)
Binary-search optimization closes ~75 % of the Checkpoint 2 overhead gap. Phase B + E2 pruning narrows from +11 % to +3.4 % wall regression vs Phase B alone. OFF + E2 pruning narrows from +11 % to +1.5 % vs OFF (break-even in noise). Still misses the +5 % wall gate; Phase B stays the iteration's deliverable.
…deLength (Checkpoint 4) Move the Experiment 2 mass-interval pruning hook from before the 'peptideLengthIndex < minPeptideLength' continue to after it. Short prefixes (length < minPeptideLength) have huge reachable intervals — R_max * maxAaMass is many kDa wide — so the bound test almost never prunes there but still pays the binary-search bookkeeping. Skipping that range trims ~25-30% of bound-test evaluations. Also simplifies reachableMin = prefixMin once we're past minPeptideLength (R_min collapses to 0). Exact-by-construction property preserved: the moved test is still a sound necessary condition on extending the prefix to a viable final peptide. OFF mode unaffected. Off by default (still gated on -Dmsgfplus.experiment2Pruning=true).
…pped, bench pending Records the Checkpoint 4 code change (commit 8478651): bound test now skipped for prefixes shorter than minPeptideLength, where the reachable interval is too wide to prune anyway. Tests pass; Astral wall measurement pending remote socket re-establishment. Math says best-case is +3-7s gap vs Phase B alone — still inside noise, not a clear ≥5% graduation. Verdict updated to: four checkpoints, none clear the gate; Phase B is the durable lever.
…-2.27% Astral wall Replaces the "bench pending" Checkpoint 4 entry with the actual 5-trial interleaved confirmation: phaseB_only: mean 519.8s, σ 6.06s (522, 518, 517, 529, 513) phaseB+e2: mean 508.0s, σ 4.85s (504, 507, 514, 503, 512) Δ = 11.8s = -2.27% vs Phase B alone 5/5 trials phaseB+E2 < phaseB_only; Welch's t ~3.4 (p ~ 0.01) Native counts bit-identical 89580T / 45292D across all 10 runs The effect is real and statistically significant but below the plan's >=5% graduation gate. Experiment 2 stays opt-in (-Dmsgfplus.experiment2Pruning=true); Phase B (-10.4% Astral wall vs OFF) remains the durable lever.
…real improvement
The Astral 5-trial bench showed:
- Phase B (calibrated precursor-window tightening): -10.4% wall vs OFF, durable
win, validated on 3 datasets (Astral / TMT / PXD001819). KEPT.
- Experiment 2 (mass-interval pruning, Checkpoints 1-4): real -2.27% on top of
Phase B but below the >=5% default-on gate. Runtime scaffolding REMOVED;
retrospective stays in .claude/plans/experiment-2-mass-interval-pruning.md.
- Catalog-backed scanner (MassIndexedPeptideCatalog + CatalogDBScanner):
n=1 trial showed 4.56x slowdown (2186s vs 479s) with bit-identical native
counts. Root cause: scanner iterates all 7M entries linearly and allocates
a fresh CandidatePeptideGrid per entry instead of querying the mass slabs
the catalog builds. REMOVED in current form; would need a spectrum-major
redesign + grid pool to be viable.
Removed:
- src/main/java/edu/ucsd/msjava/msdbsearch/CatalogDBScanner.java
- src/main/java/edu/ucsd/msjava/msdbsearch/MassIndexedPeptideCatalog.java
- src/main/java/edu/ucsd/msjava/msdbsearch/MatchScanner.java
- src/main/java/edu/ucsd/msjava/msdbsearch/PeptideCatalogSource.java
- src/main/java/edu/ucsd/msjava/msdbsearch/Experiment2Telemetry.java
- src/main/java/edu/ucsd/msjava/msdbsearch/PhaseBTelemetry.java
- tests for the above
Reverted to HEAD (catalog source-awareness):
- DatabaseMatch (no protein-source list)
- MSGFPlusPSMSet (no source-aware decoy decision)
- DirectPinWriter / DirectTSVWriter (no catalog source path)
- CompactFastaSequence (no protein-entry digest helper)
- ConcurrentMSGFPlus (no catalog scanner construction)
- TestConcurrentMSGFPlus (revert null arg)
Cleanup kept:
- MSGFPlus: drop Phase B / Experiment 2 telemetry print blocks
- DBScanner: drop Experiment 2 prefix-mass bound hook (and its bookkeeping)
- ScoredSpectraMap: drop hasSpecMassInRange (Experiment 2 binary-search helper)
- MassCalibrator: drop unused -Dmsgfplus.calibrationDebugTsv emitter
Phase B remains untouched: MassCalibrator iso=0 pre-pass + outlier filter +
spec_eValue stratification, and MSGFPlus tightening at min(userPpm, max(floor,
k*sigma + margin)) with k=3, floor=2 ppm, margin=0.5 ppm. Branch state:
51/51 scoped tests pass.
CLAUDE.md flags as a footgun: pre-passes that mutate shared Spectrum objects
cause silent PSM-count drift in the main pass. The calibration pre-pass at
MassCalibrator#collectResiduals invoked ScoredSpectraMap#preProcessSpectra,
which calls spec.setCharge(charge) and scorer.getScoredSpectrum(spec) on
objects from the shared SpectraAccessor — exactly the warned pattern.
Fix:
- ScoredSpectraMap gains an opt-in builder isolateSpectrumState() that
routes all spectrum-mutating sites through a new prepareSpectrumForScoring
helper. Default behavior (main search path) is unchanged: in-place
setCharge, no allocation.
- When isolation is enabled, each call clones the Spectrum (precursor +
per-peak deep clone via getCloneWithoutPeakList + Peak.clone), applies
setCharge to the clone only, and scores the clone. The shared Spectrum
cached on SpectraAccessor is untouched.
- MassCalibrator's pre-pass opts in:
new ScoredSpectraMap(...).isolateSpectrumState()
- TestScoredSpectraMapIsolation pins the contract: default path returns the
same instance and propagates setCharge; isolated path returns a different
instance, leaves the original's charge unchanged, and applies setCharge to
the clone.
Validation: 3-dataset x 3-trial x 2-arm bench on the isolated machine
(pride-linux-vm.ebi.ac.uk):
Astral OFF 508.7s +/- 5.0s T=89479 D=46792
AUTO 469.0s +/- 9.2s T=89580 D=45292 wall -7.8%, +101 T
TMT OFF 248.0s +/- 6.6s T=28790 D=14768
AUTO 240.3s +/- 1.5s T=28201 D=14285 wall -3.1%
PXD001819 OFF 101.3s +/- 0.6s T=28037 D=11022
AUTO 100.3s +/- 2.1s T=28084 D=10985 wall -1.0%, +47 T
Native target/decoy counts are deterministic across all 3 trials per arm
(zero per-trial variance), confirming the isolation prevents the
shared-state mutation that would otherwise produce silent drift. OFF-mode
counts on each dataset match the dev baseline exactly.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Review Summary by QodoPhase B: Calibrated precursor-window tightening with spectrum state isolation
WalkthroughsDescription• Implements calibrated precursor-window tightening via a pre-pass that learns mass shift and robust sigma from top 200 confident PSMs, then tightens main-pass tolerance using `min(userPpm, max(floorPpm, k·sigma + marginPpm))` formula (defaults: k=3, floor=2 ppm, margin=0.5 ppm). Opt-in via -precursorCal auto flag. • Fixes CLAUDE.md violation: pre-pass now routes through new ScoredSpectraMap.isolateSpectrumState() builder that deep-clones each spectrum (precursor + peaks) before charge-setting, preventing shared cache mutation while keeping main-pass path unchanged. • Adds CalibrationStats immutable class with robust statistics methods (medianAbsoluteDeviation(), robustSigmaPpm(), tightenedTolerancePpm()) and exposes tightening formula constants as configurable system properties. • Refactors calibration pre-pass to use fixed isotope error 0, stratify residuals by spec_eValue, filter outliers >50 ppm, and keep only top 200 confident PSMs. • Validated on 3 datasets with 3 trials each: Astral shows −7.8% wall improvement (+101 targets), TMT shows −3.1% wall improvement, PXD001819 shows −1.0% wall with +47 targets. All results deterministic across trials, confirming spectrum isolation fix. • Switches benchmark CI metrics from mzIdentML XML parsing to direct PIN TSV parsing for native target/decoy counts (deterministic search-correctness gate instead of 1% FDR sensitivity). • Removes non-shippable scaffolding: Experiment 2 telemetry classes, mass-interval pruning hook, calibration debug TSV emitter, and binary-search helper. • Adds comprehensive unit tests for robust sigma calculation, tolerance tightening formula variants, and spectrum isolation behavior. • Updates documentation with Phase A retrospective, 5× speed roadmap, Phase B/Experiment 2 action plan, and CI measurement strategy. Diagramflowchart LR
A["Pre-pass<br/>Top 200 PSMs<br/>iso=0"] -->|"Learn shift<br/>& sigma"| B["CalibrationStats"]
B -->|"Tighten formula<br/>k·sigma+margin"| C["Effective tolerance<br/>left/right"]
D["Input Spectrum"] -->|"isolateSpectrumState"| E["Cloned Spectrum<br/>peaks+precursor"]
E -->|"setCharge"| F["Scored Spectrum<br/>pre-pass"]
C -->|"Lambda capture"| G["Main-pass<br/>ScoredSpectraMap"]
G -->|"Default path<br/>in-place"| H["Results<br/>−7.8% Astral wall"]
File Changes1. src/main/java/edu/ucsd/msjava/msdbsearch/MassCalibrator.java
|
Code Review by Qodo
1. Uncaught sysprop parse crash
|
…diff) These directories are out of scope for the precursor-window-tightening PR. Restore them to match dev exactly so they no longer appear in the PR's cumulative diff against dev. After this commit: git diff origin/dev..HEAD -- .claude/plans benchmark/ci returns empty. Removed (branch-only additions, never on dev): .claude/plans/SHIPPED.md .claude/plans/astral-next-experiments.md .claude/plans/astral-phase-a-retrospective.md .claude/plans/astral-speed-5x-roadmap.md .claude/plans/experiment-2-mass-interval-pruning.md Restored to dev's version (branch had not picked up these dev additions): .claude/plans/parameter-modernization-flag-inventory.md .claude/plans/parameter-modernization.md .claude/plans/search-sync-cleanup.md Restored to dev's version (branch had local edits): .claude/plans/README.md benchmark/ci/README.md benchmark/ci/PXD001819/baseline.tsv benchmark/ci/PXD001819/extract_metrics.py benchmark/ci/PXD001819/run_ci.sh benchmark/ci/PXD001819/test_compare_metrics.py
…operties
Adds two override knobs for the calibration pre-pass:
-Dmsgfplus.maxSampled=<int> (default 500)
-Dmsgfplus.minConfidentPsms=<int> (default 200)
Both fall back to the historical defaults on unset/non-numeric/non-positive
values, so existing behavior is unchanged. Internal constants moved from
private to public DEFAULT_* so callers and tests can reference them.
CalibrationStats.hasReliableStats() no longer hard-codes the
MIN_CONFIDENT_PSMS threshold; the calibrator now emits confidentPsmCount=0
as the "unreliable, do not apply" sentinel and hasReliableStats checks
count > 0. Behavior is identical for all existing callers.
Defaults intentionally unchanged. A 9-trial Astral A/B (3 trials each at
minConfidentPsms = 200 / 500 / 1000, all with maxSampled=3000) showed:
config sigma ppm tighten ppm wall s targets decoys
(500, 200) HIST 0.994 3.482 469.0 89580 45292
(3000, 200) 0.454 2.000 470.3 89249 44538
(3000, 500) 0.511 2.033 471.7 89212 44555
(3000, 1000) 0.729 2.687 469.7 89524 44817
Key findings:
- Wall time is statistically equivalent across all four configurations
(~470 s, well inside run-to-run sigma of ~25 s).
- Sigma DECREASES as minConfidentPsms DROPS at fixed maxSampled
(0.454 < 0.511 < 0.729 ppm for 200 < 500 < 1000). Confirms the prior
that stratification depth, not pool size, drives sigma quality.
- Lower minConfidentPsms produces an over-tight window: (3000, 200)
hits the 2.0 ppm floor and loses 331 native targets vs the historical
(500, 200) default.
- No configuration tested beats the historical (500, 200) defaults end
to end, which were already validated on 3 datasets in PR #26.
Conclusion: ship the configurability for users who want to tune for
specific instruments / file sizes, but do not change the defaults.
26 unit tests verify property parsing (default fallback, valid int, trim,
non-numeric rejection, non-positive rejection, exported constant values).
Summary
min(userPpm, max(floorPpm, k·sigma + marginPpm))(defaults: k=3, floor=2 ppm, margin=0.5 ppm). Behavior is opt-in via-precursorCal auto(oron);offis the default and is bit-identical todev.Spectrumstate" violation: the pre-pass now routes through a newScoredSpectraMap.isolateSpectrumState()builder that deep-clones each spectrum (precursor + peaks) beforesetCharge/getScoredSpectrum. Main-pass path is unchanged (no allocation cost).Validation — 3-dataset, 3-trial, isolated bench machine
pride-linux-vm.ebi.ac.uk, 8 threads, JAR md5a4fe020dd1563fe8ec8922f7b9df595e.Calibration learned (deterministic across all 3 AUTO trials per dataset):
Native count parity:
devbaseline exactly on all 3 datasets.Notable commits
aac389ccalibrator: stratify residuals by spec_eValue, keep topMIN_CONFIDENT_PSMS=200(sigma 4.0 → 0.99 ppm on Astral).7c027f8expose tightening formula constants as system properties (msgfplus.tighteningSigmaMultiplieretc.).05ec066calibrator: pre-pass isolated at iso=[0,0] + outlier-filter residuals (>50 ppm).aa4aaaeremove non-shippable runtime scaffolding (Experiment 2 telemetry, prefix-mass bound test, etc.); keep Phase B as the real improvement.5d9482dfix(phase-b): isolate Spectrum state during calibration pre-pass (closes the CLAUDE.md footgun called out in code review).Test plan
mvn -B verifypasses (53/53 scoped tests pass locally including newTestScoredSpectraMapIsolation).feat/precursor-window-tightening, runjava -jar MSGFPlus.jar -precursorCal off ...against any small benchmark file → target/decoy counts match the same command ondev.-precursorCal autoon a high-res file → log line "Precursor mass shift learned: X ppm from Y confident PSMs (robust sigma Z ppm)" appears, followed by "Tightened precursor tolerance for main pass: ...".