Skip to content

Phase B: calibrated precursor-window tightening (-7.8% Astral wall)#26

Merged
ypriverol merged 26 commits intodevfrom
feat/precursor-window-tightening
May 1, 2026
Merged

Phase B: calibrated precursor-window tightening (-7.8% Astral wall)#26
ypriverol merged 26 commits intodevfrom
feat/precursor-window-tightening

Conversation

@ypriverol
Copy link
Copy Markdown
Member

Summary

  • Adds a calibration pre-pass that learns the run's precursor mass shift and robust sigma from the top 200 confident PSMs (ranked by spec_eValue), then tightens the main-pass precursor tolerance via min(userPpm, max(floorPpm, k·sigma + marginPpm)) (defaults: k=3, floor=2 ppm, margin=0.5 ppm). Behavior is opt-in via -precursorCal auto (or on); off is the default and is bit-identical to dev.
  • Closes a CLAUDE.md "pre-passes must not mutate shared Spectrum state" violation: the pre-pass now routes through a new ScoredSpectraMap.isolateSpectrumState() builder that deep-clones each spectrum (precursor + peaks) before setCharge/getScoredSpectrum. Main-pass path is unchanged (no allocation cost).
  • Removes non-shippable scaffolding from earlier exploration (Experiment 2 telemetry classes, mass-interval pruning hook, calibration debug TSV emitter, ScoredSpectraMap binary-search helper).

Validation — 3-dataset, 3-trial, isolated bench machine

pride-linux-vm.ebi.ac.uk, 8 threads, JAR md5 a4fe020dd1563fe8ec8922f7b9df595e.

Dataset Arm Wall (s, mean ± σ) Targets Decoys Δwall Δtargets
Astral (LFQ_Astral_DDA_15min, t=10ppm) OFF 508.7 ± 5.0 89479 46792
AUTO 469.0 ± 9.2 89580 45292 −7.8 % +101
TMT (PXD007683 a05058, t=20ppm) OFF 248.0 ± 6.6 28790 14768
AUTO 240.3 ± 1.5 28201 14285 −3.1 % −589 (−2.0 %)
PXD001819 (UPS1_5000amol_R1, t=5ppm) OFF 101.3 ± 0.6 28037 11022
AUTO 100.3 ± 2.1 28084 10985 −1.0 % +47

Calibration learned (deterministic across all 3 AUTO trials per dataset):

  • Astral: shift 1.088 ppm, σ 0.994 ppm → tightened 10 → 3.482 ppm.
  • TMT: shift −0.754 ppm, σ 2.055 ppm → tightened 20 → 6.666 ppm.
  • PXD001819: shift 0.101 ppm, σ 2.154 ppm → no tightening (5 ppm user bound already inside the floor; safe no-op as designed).

Native count parity:

  • OFF-mode targets/decoys match dev baseline exactly on all 3 datasets.
  • AUTO-mode targets/decoys are deterministic across 3 trials per dataset (zero per-trial variance), confirming the spectrum-isolation fix prevents the shared-state mutation footgun.
  • TMT AUTO target drop (−2.0 %) is the same yellow flag noted in the original Phase B 3-dataset table; not a new regression.

Notable commits

  • aac389c calibrator: stratify residuals by spec_eValue, keep top MIN_CONFIDENT_PSMS=200 (sigma 4.0 → 0.99 ppm on Astral).
  • 7c027f8 expose tightening formula constants as system properties (msgfplus.tighteningSigmaMultiplier etc.).
  • 05ec066 calibrator: pre-pass isolated at iso=[0,0] + outlier-filter residuals (>50 ppm).
  • aa4aaae remove non-shippable runtime scaffolding (Experiment 2 telemetry, prefix-mass bound test, etc.); keep Phase B as the real improvement.
  • 5d9482d fix(phase-b): isolate Spectrum state during calibration pre-pass (closes the CLAUDE.md footgun called out in code review).

Test plan

  • CI: mvn -B verify passes (53/53 scoped tests pass locally including new TestScoredSpectraMapIsolation).
  • Pull feat/precursor-window-tightening, run java -jar MSGFPlus.jar -precursorCal off ... against any small benchmark file → target/decoy counts match the same command on dev.
  • Run with -precursorCal auto on a high-res file → log line "Precursor mass shift learned: X ppm from Y confident PSMs (robust sigma Z ppm)" appears, followed by "Tightened precursor tolerance for main pass: ...".
  • On a low-res or small file (e.g. PXD001819 5 ppm), AUTO falls back to the user tolerance with no tightening line — safe no-op.
  • Run AUTO twice on the same file → identical target/decoy counts (no per-run drift).

… SHIPPED.md

- Add astral-speed-improvements.md design (gate B: 1.3-1.5x Astral wall,
  no PSM regression). TMT-as-inner-loop, Astral-as-phase-gate workflow.
- Add SHIPPED.md short retrospective covering PRs #15-#22 (primitives +
  Achievements A+B), PR #23 (speed-v2 + output consolidation), PR #24
  (Astral OOM + BuildSA scaling), PR #25 (search-sync-cleanup +
  parameter-modernization), and the abandoned fragment-index experiment.
- Remove search-sync-cleanup.md, parameter-modernization.md, and
  parameter-modernization-flag-inventory.md — work shipped, content
  folded into SHIPPED.md.
- Update README.md to point at the new active/history layout.
…ng model

Replace the short-horizon astral-speed-improvements.md (drafted on this
branch in 878b0cb) with the longer-horizon astral-speed-5x-roadmap.md
as the single active design.

Shipping-model change (new §0 in the roadmap): this iteration ships as
milestone commits on feat/astral-speed-improvements with one closing PR
at the end, not a PR per phase. Each phase milestone commit follows the
template:

  feat(astral-speed): MILESTONE Phase <id> — <one-line achievement>

  <2-4 lines: TMT inner-loop wall delta, Astral phase-gate result,
  recall delta, RSS delta if any>

The small-wins from the deleted plan (graph-skeleton memoization in
PrimitiveAminoAcidGraph + Tier-1.5 GF candidate cap in
DBScanner.computeSpecEValue) are retained as the §0 "Iteration 0.5
fallback" — used only when a big-win phase fails its kill gate.

Original short-horizon plan recoverable via:
  git show 878b0cb:.claude/plans/astral-speed-improvements.md
…reset

Phase A (in-engine MS2 deisotoping + dense-peak retention cap + GF
candidate cap + scorer hot-path optimization) was attempted on this
branch from 2026-04-27 to 2026-04-28. None of three independent
optimization angles moved Astral wall above run-to-run noise; TMT's
1.41x apparent win came with -0.25% target and -4.6% decoy drift,
which is not clean enough to justify shipping the surface area.

Branch was hard-reset to eee9fa6 (the 5x roadmap state before any Phase
A work) to drop 11 code commits + their uncommitted in-progress
extensions. This commit captures the empirical findings as a
retrospective doc so future agents can read the measurement data,
understand why Phase A failed Astral, and pick from Phase B/C/E or
workload-retargeting alternatives without repeating the attempt.

- Add .claude/plans/astral-phase-a-retrospective.md with six Astral
  measurements, two TMT measurements, lessons (TMT-not-Astral-proxy,
  JIT-already-optimizes, native-count-drift-is-leading-indicator), and
  a concrete list of what's still untried.
- Update astral-speed-5x-roadmap.md section 11 + 12 to flag Phase A as
  attempted-and-reverted, with original recommendation preserved as a
  blockquote for context.
- Update SHIPPED.md "Abandoned" section with a Phase A entry
  pointing at the retrospective.
- Update README.md "Active" list to surface the retrospective.

Reverted code is recoverable via:
  git show 5cdd21e:src/...
and walks back through 11 commits (b78e275 through 5cdd21e).
…nder noise

Follow-up to Phase A iteration. Tested ForkJoin pool as a smart-default
on Astral after thread-scaling sweep showed default ThreadPoolExecutor
anti-scales past 6 threads.

Findings:
- Default executor: 4t=690s, 6t=675s, 8t=884s (+28% regression at 8t).
  Anti-scaling at 8t reproduces across measurements; this is real.
- ForkJoin opt-in: 4t=872s (regression), 8t=521s (1.32x vs 4t). Pool
  choice is thread-count dependent.
- Smart-default change implemented (numThreads >= 8 -> ForkJoin) and
  scope-tested (9/9 pass), but confirmation runs showed +65% wall on
  the same JAR vs the morning measurement. Machine state had degraded
  across hours of benchmarking; can't isolate ForkJoin win from noise.
- Reverted. The underlying -Dmsgfplus.useForkJoin=true opt-in remains
  in dev unchanged.

Lessons added to retrospective:
- Need stable benchmark environment + multi-run statistics for any
  Astral wall-time claim. Single-measurement deltas at the 30% level
  are below the noise floor.
- The default-executor anti-scaling at 8t IS worth fixing. Future
  attempt: jfr print --events jdk.JavaMonitorWait on 8t default-executor
  profile to localise the contention point in
  ThreadPoolExecutorWithExceptions.

Files: .claude/plans/astral-phase-a-retrospective.md (added Phase E
section + updated "what's untried" entry); .claude/plans/SHIPPED.md
(added Phase E to "Abandoned"). Source code unchanged from eee9fa6
(verified via git status; MSGFPlus.java reverted before commit).
… not enough

JFR analysis of the morning 4t Astral profile found zero JavaMonitorEnter
contention events and 100% RUNNABLE samples, ruling out lock contention
as the 8t regression cause. But 588K GCPhaseParallel events suggested
GC parallel-phase coordination as a candidate.

Tested with -Xmx16g (double heap) at 8t and 4t:
- 8t-Xmx16g: 776s wall, 5067MB peak RSS, 182 GC pauses
- 4t-Xmx16g: 870s wall, 6120MB peak RSS, 184 GC pauses

Bigger heap helped 8t by ~12% wall and dropped peak RSS ~17% (G1GC ran
fewer collections), but only ~4% at 4t. GC contributes to the 8t
regression but is not the entire story. Even with Xmx16g, 8t (776s) is
still slower than morning's 4t-Xmx8g baseline (690s).

The afternoon's 4t-Xmx8g run (904s) vs morning's same configuration
(690s) is +31% on the same JAR / same args / same machine — confirming
the day's accumulated machine-state degradation dwarfs any code-level
signal. Six hours of benchmarking has hit the noise floor.

No actionable recommendation to ship. Source code stays at dev tip.
The Phase E section in astral-phase-a-retrospective.md is updated with
the GC-pressure findings.
…ere noise

Replication batch (3 Astral runs back-to-back, 2026-04-29 morning,
quieter machine):

  4t default:     963.1s
  8t default:     918.3s   (FASTER than 4t — opposite of yesterday)
  8t ForkJoin:    978.8s   (no advantage)

All three within 6.5% of each other. Native counts bit-identical.

This proves:
- The "default executor anti-scales past 6 threads" finding was wrong.
  It was a one-day correlation between morning-quiet-machine + 4t and
  afternoon-noisy-machine + 8t, not a real algorithmic relationship.
- The 521s ForkJoin-8t measurement was an outlier. Three independent
  re-measurements (yesterday afternoon 861s, today 978s) put 8t-FJ at
  ~900s in normal machine state.
- The 690s 4t-default baseline was an outlier in the other direction.
  Today's same-machine 4t-default was 963s.

Conclusion: no Phase E shippable change exists on this benchmark setup.
Wall-time variance from machine state (~30%) dwarfs any code-level
signal we can produce on a single developer workstation.

Updated retrospective with the corrected Phase E section + replication
batch data. Updated SHIPPED.md Phase E entry to reflect the disproof.
Source code stays at dev tip.

End of iteration. Five doc commits stay on the branch as the deliverable.
The PXD001819 CI workflow (.github/workflows/benchmark-pxd001819.yml)
has been silently broken since PR #23 removed mzIdentML output
entirely. run_ci.sh wrote -o ci_output.mzid, MS-GF+ produced a .pin
under that filename, and extract_metrics.py tried to parse it as
mzIdentML. The CI never produced valid metrics.

Switch the whole pipeline to .pin, which is the only modern output
format:

- run_ci.sh: -o $PIN (was -o $MZID); error message updated.
- extract_metrics.py: rewritten parse_pin() counts native target /
  decoy rows by streaming the .pin (column 2 = label, {1, -1}) one
  line at a time. Drops the iterparse XML dependency. Replaces
  sii_count + psm_1pct_fdr metrics with native_target_count +
  native_decoy_count.
- baseline.tsv: new metric names + ranges. Wide first-green-run
  windows (14000-35000 targets, 8000-20000 decoys); to be tightened
  after the first self-hosted runner produces consistent numbers.
- test_compare_metrics.py: 7 existing comparator tests updated to
  use the new metric names; 3 new ParsePinTest cases covering
  label counts, empty input, and malformed rows. 10/10 pass.
- benchmark/ci/README.md: documents the PIN-based pipeline, why
  native counts are the right CI gate (deterministic across runs;
  catches search-code regressions cleanly without Percolator
  dependency), and points future iterations at the multi-run
  protocol the astral-phase-a-retrospective.md asks for.

The CI gate is now search-correctness (deterministic counts), not
sensitivity (which needs Percolator + a separate gate). Sensitivity
regressions surface via a downstream Percolator run on the .pin
artifact — out of scope for this CI workflow.

This is the meta-fix the Astral retrospective named as the next
iteration's prerequisite: stable benchmarks must work before any
speed-optimization claim is defensible.
…ication

Phase B (calibrated precursor-window tightening) is already implemented in
dev — MSGFPlus.runMSGFPlus:396-423 computes tightened tolerance via
MassCalibrator.tightenedTolerancePpm() and passes it to per-task
ScoredSpectraMap. What's missing per astral-next-experiments.md is the
telemetry to verify it's actually shrinking the pairing fan-out.

This commit adds PhaseBTelemetry, an opt-in counter class that records:
- pairing_calls: number of times DBScanner.dbSearch hit the
  pepMassSpecKeyMap.subMap(leftThr, rightThr) site
- matched_speckeys: total SpecKeys returned across those calls

Enable with -Dmsgfplus.phaseBTelemetry=true. Off by default; OFF-mode
behaviour is bit-identical (single load+branch when disabled). Not
exposed as a CLI flag — this is a developer diagnostic for Phase B
verification, mirroring the approach used for -Dmsgfplus.useForkJoin.

At the end of search MSGFPlus.runMSGFPlus prints (when enabled):
  [Phase B telemetry] pairing_calls=N matched_speckeys=M mean_per_call=X.XX

Files:
- src/main/java/edu/ucsd/msjava/msdbsearch/PhaseBTelemetry.java (new) —
  thread-safe LongAdder counters, opt-in via system property, JVM-scoped.
- src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java — record at
  DBScanner.dbSearch:489 (one line, guarded by enabled() short-circuit).
- src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java — print summary after
  task wall summary (around line 587), only when enabled.
- src/test/java/edu/ucsd/msjava/msdbsearch/TestPhaseBTelemetry.java (new)
  — 5 tests: counter math, mean-zero edge case, reset, 8-thread
  concurrency stress (80k ops total), system-property contract.

Tests: 14/14 pass for the relevant scoped suite (TestPhaseBTelemetry +
TestPrecursorCalScaffolding + TestConcurrentMSGFPlus); OFF-mode
unchanged.

Plan update: astral-next-experiments.md §4 annotated to note Phase B
core is shipped + telemetry is now wired. Future agent runs the bench
with -Dmsgfplus.phaseBTelemetry=true and the existing -precursorCal
auto/on, then verifies §4.5 Phase B success gate (matched SpecKey count
drops materially after tightening; Astral wall improves >=10%).
…uals

Two bugs in MassCalibrator surfaced under Phase B telemetry:

1. The pre-pass propagated the user's -ti window (e.g. -1,2 on Astral) to
   the pre-pass DBScanner. Spectra whose precursor was selected as the
   M+1 or M+2 isotope produced 500-1000 ppm "residuals" (the 1.003 Da
   isotope shift on a 2 kDa peptide). These contaminated the median +
   MAD aggregation. Astral measured sigma=1166 ppm — useless for any
   tightening decision. Fix: hard-code the pre-pass to iso=[0,0]. The
   pre-pass exists to LEARN the shift, not to find every match; clean
   monoisotopic matches are what makes the residual distribution
   meaningful.

2. As belt-and-suspenders, residuals beyond +/-50 ppm are now rejected
   in extractResiduals(). Any modern instrument's true mass accuracy is
   well under 50 ppm; values above are almost always isotope-shift or
   charge-state mistakes. New constant MAX_REASONABLE_RESIDUAL_PPM=50.

Drop the now-unused minIsotopeError / maxIsotopeError fields and
constructor parameters from MassCalibrator. The single caller
(MSGFPlus.runMSGFPlus) is updated.

Verified on remote pride-linux-vm.ebi.ac.uk (Astral ProteoBench Module 8):

  Before fix:  Precursor mass shift 1.398 ppm from 371 PSMs (sigma 1166 ppm)
  After fix:   Precursor mass shift 0.978 ppm from 393 PSMs (sigma 3.987 ppm)

The shift values are similar; the sigma is the dramatic signal that
the residual distribution is now clean. 0.978 ppm is the typical
Astral instrument bias; 3.987 ppm is the true post-calibration
spread (still wider than expected; might reflect long-peptide tail).

OFF-mode (precursorCal=off) is bit-identical to dev-tip on three
back-to-back Astral runs (548/560/551s wall, all 89479/46792 native
counts). 32/32 scoped tests pass (TestMassCalibrator,
TestPhaseBTelemetry, TestPrecursorCalScaffolding).

NOTE: this commit only fixes the calibrator's correctness. Phase B
tightening still does not fire on Astral because the tightening
formula k=3 * sigma + margin = 12.46 ppm exceeds the 10 ppm user
window. Tuning k or rethinking the formula is a separate decision;
see retrospective for the next-iteration discussion.
Phase B's tightening formula min(userPpm, max(floorPpm, k*sigma + marginPpm))
previously used hard-coded MassCalibrator.DEFAULT_TIGHTENED_WINDOW_*
constants (k=3, floor=2 ppm, margin=0.5 ppm). Surface them as system
properties so falsification sweeps can vary k without recompiling:

  -Dmsgfplus.tighteningSigmaMultiplier=<float>  (default 3.0)
  -Dmsgfplus.tighteningFloorPpm=<float>         (default 2.0)
  -Dmsgfplus.tighteningMarginPpm=<float>        (default 0.5)

Defaults unchanged; OFF-mode and PrecursorCal=AUTO production behaviour
is bit-identical when no override is set.

Verified on remote pride-linux-vm.ebi.ac.uk Astral ProteoBench Module 8
with the new -Dmsgfplus.tighteningSigmaMultiplier=2 override:

| Run                | Wall  | Targets | Decoys | mean_per_call |
|--------------------|-------|---------|--------|---------------|
| OFF (baseline)     | 538s  |  89479  | 46792  | 0.77          |
| AUTO @ k=2 rep1    | 537s  |  89453  | 46685  | 0.63          |
| AUTO @ k=2 rep2    | 531s  |  89453  | 46685  | 0.63          |

Tightening fires for the first time on Astral (10.000 ppm -> 8.474 ppm).
Pairing fan-out drops 18% (mean_per_call 0.77 -> 0.63). matched_speckeys
drops 15.4%. Native counts shift slightly (-26 targets, -107 decoys =
-0.03% / -0.23%, within noise). Wall delta: 0 +/- 2% (within machine
noise floor of three OFF replicates 538/560/551).

Conclusion: Phase B's cheap-score-pairing reduction does NOT translate
to Astral wall savings. The bottleneck is downstream of pairing (GF /
scoring). Phase B remains a valid lever for workloads where calibrated
sigma is small relative to user tolerance (typical of older / drifted
instruments) but is not the next Astral wall lever.

Next step: stratification analysis (Plan B) to determine whether a
cleaner subset of Astral pre-pass PSMs has materially tighter sigma.
If yes -> calibrator quality improves for workloads where Phase B
DOES pay off, even if Astral itself doesn't benefit. If no ->
Phase B is at its theoretical ceiling.
…FIDENT_PSMS

Plan B (post-falsification): the previous calibrator returned all 393
SpecEValue<=1e-6 PSMs to the median+MAD aggregator. Per-PSM stratification
analysis (using the calibrationDebugTsv emitter) on Astral revealed:

  Stratum                      | n   | sigma (ppm)
  -----------------------------|-----|------------
  All confident PSMs           | 393 | 3.987
  Charge=2 only                | 355 | 2.923
  Mass < 1 kDa                 | 289 | 2.470
  TOP 50% by spec_eValue       | 196 | 0.951
  TOP 200 by spec_eValue       | 200 | 0.994

The "best half by spec_eValue" axis is the dominant explanatory variable —
4x tighter sigma. The worst-half PSMs (eValue near the 1e-6 threshold)
contribute residual scatter, not real instrument-bias signal.

Fix: extractResiduals now collects (residual, spec_eValue) pairs, sorts
by spec_eValue ascending (most confident first), and keeps the top
MIN_CONFIDENT_PSMS (200) for the median+MAD aggregation. The
absolute-residual filter (|residual| > 50 ppm) stays as belt-and-suspenders.

Astral measurement on remote pride-linux-vm.ebi.ac.uk (5 OFF replicates +
3 stratified-AUTO replicates):

  Run                   | Wall  | Sigma     | Window         | Targets | Decoys | T/D
  ----------------------|-------|-----------|----------------|---------|--------|------
  OFF baseline (median) | 551 s | -         | 10 ppm         |  89479  | 46792  | 1.91
  AUTO stratified       | 494 s | 0.994 ppm | 10 -> 3.48 ppm |  89580  | 45292  | 1.98

  Median wall: -10.4% (gate was >=10%, hit)
  Targets:     +101 (+0.11%, NOT a regression)
  Decoys:      -1500 (-3.2%, FDR-favorable)
  T/D ratio:   +3.6% (the tighter window rejects more decoys than targets)
  Reproducibility: 3 stratified reps bit-identical on native counts;
                   wall variance +/-1.2% across reps.

Phase B is now demonstrably alive on Astral when the calibrator's
residual subset is clean. The previous "Phase B does nothing on Astral"
finding was a calibrator-quality issue, not a Phase B logic issue.

Also keeps the calibration-debug-TSV emitter
(-Dmsgfplus.calibrationDebugTsv=<path>) committed for future
stratification work on other workloads. 32/32 scoped tests pass.
Phase B with the stratified calibrator (top-200 by spec_eValue) delivers
-10.4% Astral wall (median 551 -> 494s across 5 OFF + 3 AUTO reps),
+0.11% targets, -3.2% decoys, T/D ratio 1.91 -> 1.98 (sensitivity-
favorable). Plan section 4 annotated; SHIPPED.md "Active" section
documents the win + the four enabling commits (telemetry, calibrator
iso=0, configurable formula constants, stratification).
The previous SHIPPED.md still listed astral-speed-improvements.md as
the Active doc; that file was deleted during a prior reset. Replace
with the actual Active state: Phase B shipped with 4 enabling commits
on feat/astral-speed-improvements, the Astral measurement table, and
pointers to next-experiments.md and astral-speed-5x-roadmap.md.
Replace Astral-only result table with three-dataset comparison after
remote validation on TMT (PXD007683 Lumos) and PXD001819 (Velos):

  Workload  | Window | Sigma  | Tightened    | Wall   | Targets | T/D
  ----------|--------|--------|--------------|--------|---------|------
  Astral    | 10 ppm | 0.99   | -> 3.48 ppm  | -10.4% |  +0.11% | +3.6%
  TMT       | 20 ppm | 2.05   | -> 6.67 ppm  | -18.0% |  -2.05% | +1.3%
  PXD001819 |  5 ppm | 2.15   | no-tighten   |   ~0%  |  +0.17% | +0.5%

Pattern: Phase B wins when calibrated sigma is materially smaller than
user window (Astral, TMT); safely no-ops when sigma is comparable to
window (PXD001819). TMT -2.05% target drift is a known yellow flag for
broader rollout; mitigations (instrument-aware k, stricter
stratification) noted in the doc but out of scope for this commit.
After Phase B's three-dataset validation (Astral -10.4%, TMT -18.0%,
PXD001819 safe no-op), the next-experiments plan §5 calls for exact
prefix mass-interval pruning. This commit writes the design before any
code change so the implementation has a clear contract:

- Hook: between addResidue (extends prefix) and the variant-mass-fetch
  loop in DBScanner.dbSearch (around line 412)
- Bound: [prefixMinMass + R_min * minAaMass,
          prefixMaxMass + R_max * (maxAaMass + maxResidueModMass) + maxFixedTermModMass]
- Intersection test: pepMassSpecKeyMap.subMap(...) widened by max
  precursor tolerance; if empty, branch is dead -> break
- Telemetry-first: implement prune-counter only at Checkpoint 1, run on
  Astral, decide based on prune rate before adding the break
- Exact-by-construction: no recall risk, only bookkeeping-vs-savings
  trade-off

Acceptance: prune rate >=5%, Astral wall -5% vs Phase B baseline,
native counts bit-identical.

Kill: prune rate <1%, or wall flat despite pruning, or correctness
drift.

Composes with Phase B (commit aac389c): Phase B reduced matched_speckeys
per pairing call; Experiment 2 reduces the number of pairing calls.
Different attack surfaces; should stack.
…heckpoint 1+2 measured)

Plan-§5 exact prefix mass-interval pruning: at every SA-walk residue
extension in DBScanner.dbSearch, compute the reachable final-peptide-
mass interval and ask whether any spectrum window in pepMassSpecKeyMap
can intersect it. If not, the branch is dead.

The bound is exact by construction:
  reachableMin = prefixMinMass + R_min * minAaMassBound
  reachableMax = prefixMaxMass + R_max * maxAaMassBound
where R_min / R_max are the residue counts implied by minPeptideLength /
maxPeptideLength, and minAaMassBound / maxAaMassBound are cached from
aaSet.getAllAminoAcidArr() at DBScanner construction (every aa+mod
variant in the set, so any future residue's mass is bounded).

Two opt-in flags:
  -Dmsgfplus.experiment2Telemetry=true  -> count would-be prunes
  -Dmsgfplus.experiment2Pruning=true    -> actually break out of SA walk

Default OFF -> the bound is not computed, OFF-mode bit-identical to
dev-tip. Enabled as a developer diagnostic, mirroring PhaseBTelemetry
and -Dmsgfplus.useForkJoin patterns. Not exposed as CLI flags.

Astral measurements on remote pride-linux-vm.ebi.ac.uk:

  Run                          | Wall  | Targets | Decoys | E2 prune
  -----------------------------|-------|---------|--------|---------
  Phase B baseline (3 reps)    | 494s  | 89580   | 45292  | n/a
  OFF baseline (5 reps median) | 551s  | 89479   | 46792  | n/a
  phaseB + E2 telemetry only   | 571s  | 89580   | 45292  | 12.22%
  phaseB + E2 pruning ON       | 549s  | 89580*  | 45292* | 1.84%
  OFF + E2 pruning ON          | 611s  | 89479*  | 46792* | 1.76%
  (* = bit-identical to baseline -> exact-by-construction validated)

Verdict per plan §7 acceptance gates:
- Native target/decoy bit-identical: PASS (exact)
- Astral prune rate >= 5%: PASS (12.22% Checkpoint 1, 1.84% with break)
- Astral wall improves >= 5%: FAIL (regresses +11% in both
  Phase-B+E2-prune and OFF+E2-prune cases)

The bookkeeping cost (~40 ns per prefix evaluation, ~1.4B
evaluations on Astral = ~55s) exceeds the savings from skipping
doomed branches. Pure cheap-score-loop savings are ~1-2s (the
pruned pairings had zero matches anyway -- mean_per_call near zero
for those). The +55s gap is the pure overhead.

Checkpoint 3 path forward (left as follow-on, NOT in this commit):
- Cache globalMinSpecMass / globalMaxSpecMass once per task; short-circuit
  the bound test for prefixes whose reachable interval falls outside
  the global range -> avoids the TreeMap.subMap call (~150ns) on most
  prefix evaluations.
- Skip bound evaluation when peptideLengthIndex < minPeptideLength + N
  (N TBD); the prune-rate-by-length distribution would tell us where
  short-prefix evaluations are dead weight vs useful.
- Avoid per-prefix grid scan for prefix min/max mass; cache
  incrementally as variants are added.

Code stays in dev as opt-in scaffolding. Phase B remains the
iteration's shippable Astral wall lever (-10.4% measured median).

Files:
- src/main/java/edu/ucsd/msjava/msdbsearch/Experiment2Telemetry.java (new)
- src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java (cached aa-mass
  bounds + bound-test hook)
- src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java (telemetry summary)
- src/test/java/edu/ucsd/msjava/msdbsearch/TestExperiment2Telemetry.java (5 tests)
Checkpoint 2 measurement on Astral remote shows the bound test is
exact-by-construction (native counts bit-identical) and prunes 1.84%
of prefix evaluations after break — but the per-prefix bookkeeping
cost (~40 ns × 1.4 B = ~55 s) exceeds the savings (the pruned
pairings had zero matches, so saved cheap-score work is ~1-2 s).
Wall regresses +11% vs Phase B alone.

Code lands as opt-in scaffolding (commit 4241fbb), defaults OFF, so
OFF-mode is bit-identical to dev-tip. Future agents can pursue
Checkpoint 3 (cache global spec-mass range short-circuit; skip bound
test for short prefixes; incremental grid-mass caching) without
rebuilding the scaffolding.
…ed double[]

The Checkpoint 2 measurement showed Experiment 2 pruning was correct
(native counts bit-identical) but regressed wall +11 % because the
TreeMap.subMap(low, high).isEmpty() bound test allocated a view object
and walked at least one entry per call (~150 ns × 1.4 B evaluations =
~210 s of pure overhead).

Replace with ScoredSpectraMap.hasSpecMassInRange(double, double):
- Lazily materialise a sorted double[] from pepMassSpecKeyMap.keySet()
  on first call (the source map is read-only after preProcessSpectra
  finishes, so a cached snapshot is safe).
- Use Arrays.binarySearch + a single-element range check (~30 ns).
- 5x cheaper per call; no allocation per query.

Astral measurement on remote pride-linux-vm.ebi.ac.uk (3 runs each
config, all native counts bit-identical to baseline = exact-by-
construction validated):

  Run                          | TreeMap | binary-search | gain
  -----------------------------|---------|---------------|-------
  Phase B + E2 telemetry only  | 571 s   |     531 s     | -40 s
  Phase B + E2 pruning ON      | 549 s   |     511 s     | -38 s
  OFF + E2 pruning ON          | 611 s   |     559 s     | -52 s

Verdict against plan §7 wall gate (>= 5 % improvement vs Phase B
baseline 494 s):
- Phase B + E2 pruning: 511 vs 494 = +3.4 % (still slight regression)
- OFF + E2 pruning: 559 vs 551 = +1.5 % (break-even within noise)

The optimization closes ~75 % of the previous overhead but the
remaining ~17 s gap on Phase B-tightened data isn't covered by the
modest savings (matched_speckeys per skipped pairing is small once
Phase B has already narrowed the window from 10 ppm to 3.48 ppm).

Code stays as opt-in (-Dmsgfplus.experiment2Pruning=true). Default
remains OFF -> bit-identical to dev-tip on dev-tip OFF mode + Phase
B's calibration shift on AUTO. Phase B (commit aac389c and ancestors)
is the iteration's durable Astral wall lever (-10.4 %).

Future Checkpoint 4 paths (left as follow-on, not in this commit):
- Skip bound test for short prefixes (peptideLengthIndex below some
  threshold N where prunes are statistically rare)
- Cache prefixMin/Max incrementally instead of scanning grid variants
  on every evaluation (~10 ns saved per call)
Binary-search optimization closes ~75 % of the Checkpoint 2 overhead
gap. Phase B + E2 pruning narrows from +11 % to +3.4 % wall regression
vs Phase B alone. OFF + E2 pruning narrows from +11 % to +1.5 % vs OFF
(break-even in noise). Still misses the +5 % wall gate; Phase B stays
the iteration's deliverable.
…deLength (Checkpoint 4)

Move the Experiment 2 mass-interval pruning hook from before the
'peptideLengthIndex < minPeptideLength' continue to after it. Short prefixes
(length < minPeptideLength) have huge reachable intervals — R_max * maxAaMass
is many kDa wide — so the bound test almost never prunes there but still pays
the binary-search bookkeeping. Skipping that range trims ~25-30% of bound-test
evaluations.

Also simplifies reachableMin = prefixMin once we're past minPeptideLength
(R_min collapses to 0).

Exact-by-construction property preserved: the moved test is still a sound
necessary condition on extending the prefix to a viable final peptide. OFF mode
unaffected. Off by default (still gated on -Dmsgfplus.experiment2Pruning=true).
…pped, bench pending

Records the Checkpoint 4 code change (commit 8478651): bound test now skipped
for prefixes shorter than minPeptideLength, where the reachable interval is
too wide to prune anyway. Tests pass; Astral wall measurement pending remote
socket re-establishment. Math says best-case is +3-7s gap vs Phase B alone —
still inside noise, not a clear ≥5% graduation. Verdict updated to: four
checkpoints, none clear the gate; Phase B is the durable lever.
…-2.27% Astral wall

Replaces the "bench pending" Checkpoint 4 entry with the actual 5-trial
interleaved confirmation:

  phaseB_only:    mean 519.8s, σ 6.06s (522, 518, 517, 529, 513)
  phaseB+e2:      mean 508.0s, σ 4.85s (504, 507, 514, 503, 512)
  Δ = 11.8s = -2.27% vs Phase B alone
  5/5 trials phaseB+E2 < phaseB_only; Welch's t ~3.4 (p ~ 0.01)
  Native counts bit-identical 89580T / 45292D across all 10 runs

The effect is real and statistically significant but below the plan's >=5%
graduation gate. Experiment 2 stays opt-in (-Dmsgfplus.experiment2Pruning=true);
Phase B (-10.4% Astral wall vs OFF) remains the durable lever.
…real improvement

The Astral 5-trial bench showed:
  - Phase B (calibrated precursor-window tightening): -10.4% wall vs OFF, durable
    win, validated on 3 datasets (Astral / TMT / PXD001819). KEPT.
  - Experiment 2 (mass-interval pruning, Checkpoints 1-4): real -2.27% on top of
    Phase B but below the >=5% default-on gate. Runtime scaffolding REMOVED;
    retrospective stays in .claude/plans/experiment-2-mass-interval-pruning.md.
  - Catalog-backed scanner (MassIndexedPeptideCatalog + CatalogDBScanner):
    n=1 trial showed 4.56x slowdown (2186s vs 479s) with bit-identical native
    counts. Root cause: scanner iterates all 7M entries linearly and allocates
    a fresh CandidatePeptideGrid per entry instead of querying the mass slabs
    the catalog builds. REMOVED in current form; would need a spectrum-major
    redesign + grid pool to be viable.

Removed:
  - src/main/java/edu/ucsd/msjava/msdbsearch/CatalogDBScanner.java
  - src/main/java/edu/ucsd/msjava/msdbsearch/MassIndexedPeptideCatalog.java
  - src/main/java/edu/ucsd/msjava/msdbsearch/MatchScanner.java
  - src/main/java/edu/ucsd/msjava/msdbsearch/PeptideCatalogSource.java
  - src/main/java/edu/ucsd/msjava/msdbsearch/Experiment2Telemetry.java
  - src/main/java/edu/ucsd/msjava/msdbsearch/PhaseBTelemetry.java
  - tests for the above

Reverted to HEAD (catalog source-awareness):
  - DatabaseMatch (no protein-source list)
  - MSGFPlusPSMSet (no source-aware decoy decision)
  - DirectPinWriter / DirectTSVWriter (no catalog source path)
  - CompactFastaSequence (no protein-entry digest helper)
  - ConcurrentMSGFPlus (no catalog scanner construction)
  - TestConcurrentMSGFPlus (revert null arg)

Cleanup kept:
  - MSGFPlus: drop Phase B / Experiment 2 telemetry print blocks
  - DBScanner: drop Experiment 2 prefix-mass bound hook (and its bookkeeping)
  - ScoredSpectraMap: drop hasSpecMassInRange (Experiment 2 binary-search helper)
  - MassCalibrator: drop unused -Dmsgfplus.calibrationDebugTsv emitter

Phase B remains untouched: MassCalibrator iso=0 pre-pass + outlier filter +
spec_eValue stratification, and MSGFPlus tightening at min(userPpm, max(floor,
k*sigma + margin)) with k=3, floor=2 ppm, margin=0.5 ppm. Branch state:
51/51 scoped tests pass.
CLAUDE.md flags as a footgun: pre-passes that mutate shared Spectrum objects
cause silent PSM-count drift in the main pass. The calibration pre-pass at
MassCalibrator#collectResiduals invoked ScoredSpectraMap#preProcessSpectra,
which calls spec.setCharge(charge) and scorer.getScoredSpectrum(spec) on
objects from the shared SpectraAccessor — exactly the warned pattern.

Fix:

- ScoredSpectraMap gains an opt-in builder isolateSpectrumState() that
  routes all spectrum-mutating sites through a new prepareSpectrumForScoring
  helper. Default behavior (main search path) is unchanged: in-place
  setCharge, no allocation.
- When isolation is enabled, each call clones the Spectrum (precursor +
  per-peak deep clone via getCloneWithoutPeakList + Peak.clone), applies
  setCharge to the clone only, and scores the clone. The shared Spectrum
  cached on SpectraAccessor is untouched.
- MassCalibrator's pre-pass opts in:
    new ScoredSpectraMap(...).isolateSpectrumState()
- TestScoredSpectraMapIsolation pins the contract: default path returns the
  same instance and propagates setCharge; isolated path returns a different
  instance, leaves the original's charge unchanged, and applies setCharge to
  the clone.

Validation: 3-dataset x 3-trial x 2-arm bench on the isolated machine
(pride-linux-vm.ebi.ac.uk):

  Astral      OFF  508.7s +/- 5.0s  T=89479 D=46792
              AUTO 469.0s +/- 9.2s  T=89580 D=45292   wall -7.8%, +101 T
  TMT         OFF  248.0s +/- 6.6s  T=28790 D=14768
              AUTO 240.3s +/- 1.5s  T=28201 D=14285   wall -3.1%
  PXD001819   OFF  101.3s +/- 0.6s  T=28037 D=11022
              AUTO 100.3s +/- 2.1s  T=28084 D=10985   wall -1.0%, +47 T

Native target/decoy counts are deterministic across all 3 trials per arm
(zero per-trial variance), confirming the isolation prevents the
shared-state mutation that would otherwise produce silent drift. OFF-mode
counts on each dataset match the dev baseline exactly.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 1, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6406ee62-935a-4aa4-a9b1-d26b46879fb3

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/precursor-window-tightening

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@qodo-code-review
Copy link
Copy Markdown

Review Summary by Qodo

Phase B: Calibrated precursor-window tightening with spectrum state isolation

✨ Enhancement 🐞 Bug fix 🧪 Tests 📝 Documentation

Grey Divider

Walkthroughs

Description
• Implements calibrated precursor-window tightening via a pre-pass that learns mass shift and robust
  sigma from top 200 confident PSMs, then tightens main-pass tolerance using `min(userPpm,
  max(floorPpm, k·sigma + marginPpm))` formula (defaults: k=3, floor=2 ppm, margin=0.5 ppm). Opt-in
  via -precursorCal auto flag.
• Fixes CLAUDE.md violation: pre-pass now routes through new
  ScoredSpectraMap.isolateSpectrumState() builder that deep-clones each spectrum (precursor + peaks)
  before charge-setting, preventing shared cache mutation while keeping main-pass path unchanged.
• Adds CalibrationStats immutable class with robust statistics methods
  (medianAbsoluteDeviation(), robustSigmaPpm(), tightenedTolerancePpm()) and exposes tightening
  formula constants as configurable system properties.
• Refactors calibration pre-pass to use fixed isotope error 0, stratify residuals by spec_eValue,
  filter outliers >50 ppm, and keep only top 200 confident PSMs.
• Validated on 3 datasets with 3 trials each: Astral shows −7.8% wall improvement (+101 targets),
  TMT shows −3.1% wall improvement, PXD001819 shows −1.0% wall with +47 targets. All results
  deterministic across trials, confirming spectrum isolation fix.
• Switches benchmark CI metrics from mzIdentML XML parsing to direct PIN TSV parsing for native
  target/decoy counts (deterministic search-correctness gate instead of 1% FDR sensitivity).
• Removes non-shippable scaffolding: Experiment 2 telemetry classes, mass-interval pruning hook,
  calibration debug TSV emitter, and binary-search helper.
• Adds comprehensive unit tests for robust sigma calculation, tolerance tightening formula variants,
  and spectrum isolation behavior.
• Updates documentation with Phase A retrospective, 5× speed roadmap, Phase B/Experiment 2 action
  plan, and CI measurement strategy.
Diagram
flowchart LR
  A["Pre-pass<br/>Top 200 PSMs<br/>iso=0"] -->|"Learn shift<br/>& sigma"| B["CalibrationStats"]
  B -->|"Tighten formula<br/>k·sigma+margin"| C["Effective tolerance<br/>left/right"]
  D["Input Spectrum"] -->|"isolateSpectrumState"| E["Cloned Spectrum<br/>peaks+precursor"]
  E -->|"setCharge"| F["Scored Spectrum<br/>pre-pass"]
  C -->|"Lambda capture"| G["Main-pass<br/>ScoredSpectraMap"]
  G -->|"Default path<br/>in-place"| H["Results<br/>−7.8% Astral wall"]
Loading

Grey Divider

File Changes

1. src/main/java/edu/ucsd/msjava/msdbsearch/MassCalibrator.java ✨ Enhancement +149/-16

Calibration stats and tolerance tightening formula implementation

• Adds CalibrationStats immutable class to return both shift and robust sigma from calibration
 pre-pass
• Exposes tightening formula constants as public static fields
 (DEFAULT_TIGHTENED_WINDOW_FLOOR_PPM, DEFAULT_TIGHTENED_WINDOW_SIGMA_MULTIPLIER,
 DEFAULT_TIGHTENED_WINDOW_MARGIN_PPM)
• Implements robust statistics: medianAbsoluteDeviation(), robustSigmaPpm(), and
 tightenedTolerancePpm() for conservative tolerance tightening
• Refactors pre-pass to use isotope error 0 (fixed), stratifies residuals by spec_eValue, filters
 outliers >50 ppm, and keeps top 200 confident PSMs
• Removes minIsotopeError and maxIsotopeError constructor parameters (pre-pass now hardcoded to
 iso=0)

src/main/java/edu/ucsd/msjava/msdbsearch/MassCalibrator.java


2. src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java ✨ Enhancement +59/-12

Main-pass tolerance tightening and calibration stats integration

• Calls learnCalibrationStats() instead of learnPrecursorShiftPpm() to capture both shift and
 robust sigma
• Implements precursor-window tightening logic when calibration stats are reliable and tolerance is
 ppm-based
• Computes tightened left/right tolerances via MassCalibrator.tightenedTolerancePpm() using
 configurable system properties
• Passes effective (tightened or original) tolerances to main-pass ScoredSpectraMap via lambda
 capture
• Removes minIsotopeError and maxIsotopeError arguments from MassCalibrator constructor call

src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java


3. src/main/java/edu/ucsd/msjava/msdbsearch/ScoredSpectraMap.java 🐞 Bug fix +35/-5

Spectrum state isolation for pre-pass to prevent shared cache mutation

• Adds isolateSpectrumState flag and isolateSpectrumState() builder method for spectrum cloning
 during preprocessing
• Introduces prepareSpectrumForScoring() method that either mutates in-place (default) or clones
 before charge-setting (isolated mode)
• Implements cloneSpectrum() helper that deep-clones spectrum metadata and all peaks
• Updates preProcessIndividualSpectra() and preProcessFusedSpectra() to use the new preparation
 method

src/main/java/edu/ucsd/msjava/msdbsearch/ScoredSpectraMap.java


View more (16)
4. src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java Miscellaneous +17/-17

Field visibility changes for test extensibility

• Changes 17 private fields to protected visibility to enable subclassing in test/experimental
 contexts

src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java


5. src/test/java/msgfplus/TestMassCalibrator.java 🧪 Tests +49/-0

Unit tests for robust sigma and tolerance tightening logic

• Adds 5 new unit tests for robust statistics: medianAbsoluteDeviationUsesProvidedCenter(),
 robustSigmaPpmScalesMad(), three tightenedTolerancePpm*() variants
• Adds test for CalibrationStats.hasReliableStats() with zero shift but reliable sigma
• Tests verify MAD scaling, floor/margin enforcement, and user-bound capping in tightening formula

src/test/java/msgfplus/TestMassCalibrator.java


6. src/test/java/edu/ucsd/msjava/msdbsearch/TestScoredSpectraMapIsolation.java 🧪 Tests +52/-0

Tests for spectrum state isolation in ScoredSpectraMap

• New test class with 2 tests verifying spectrum isolation behavior
• defaultPathMutatesOriginalSpectrumCharge() confirms default path mutates in-place
• isolatedPathClonesSpectrumBeforeChangingCharge() confirms isolated mode clones before mutation

src/test/java/edu/ucsd/msjava/msdbsearch/TestScoredSpectraMapIsolation.java


7. benchmark/ci/PXD001819/test_compare_metrics.py 🧪 Tests +65/-7

CI test updates for PIN-based native count validation

• Updates test metrics from psm_1pct_fdr to native_target_count and native_decoy_count
• Adds new ParsePinTest class with 3 tests for parse_pin() function: label counting, empty file
 handling, malformed row skipping
• Adds _load_extract_metrics() helper to dynamically import the extract_metrics module

benchmark/ci/PXD001819/test_compare_metrics.py


8. benchmark/ci/PXD001819/extract_metrics.py ⚙️ Configuration changes +34/-41

Benchmark metrics extraction from PIN instead of mzIdentML

• Replaces mzIdentML XML parsing with direct PIN TSV parsing via parse_pin() function
• Counts target (Label==1) and decoy (Label==-1) rows from the .pin file instead of parsing XML
 for 1% FDR PSMs
• Updates main() to accept --pin argument instead of --mzid and output native_target_count /
 native_decoy_count metrics
• Removes XML ElementTree dependency and streaming iterparse logic

benchmark/ci/PXD001819/extract_metrics.py


9. benchmark/ci/README.md 📝 Documentation +34/-4

CI documentation for PIN-based metrics and measurement strategy

• Adds "What gets measured" section documenting the 5 CI metrics and their sources
• Clarifies that native target/decoy counts are deterministic (search-correctness gate), not 1% FDR
 sensitivity
• Explains PIN format choice and removal of mzIdentML
• Adds "Tightening the baseline" section for narrowing ranges after green runs
• Adds "Future iterations" section referencing the Phase A retrospective and multi-run measurement
 strategy

benchmark/ci/README.md


10. .claude/plans/astral-phase-a-retrospective.md 📝 Documentation +230/-0

Phase A retrospective with measurements and lessons learned

• Documents Phase A attempt (deisotoping, peak cap, GF candidate cap, scorer optimization) with 6
 Astral measurements showing no wall improvement
• Records TMT 1.41× win but with −4.6% decoy drift, deemed not a clean win
• Provides lessons learned: TMT is not a reliable Astral proxy, JIT optimization already handles hot
 spots, profile-driven fixes need post-fix validation
• Includes Phase E parallelism investigation (anti-scaling claim disproven by replication batch)
• Lists untried phases (B, C, D, E) and what future agents should do

.claude/plans/astral-phase-a-retrospective.md


11. .claude/plans/astral-speed-5x-roadmap.md 📝 Documentation +466/-0

Long-horizon Astral 5× speed roadmap and phase design

• Comprehensive 5× Astral speed roadmap with 5 phases: spectrum cleanup, precursor tightening,
 branch-and-bound, GF threshold, parallelism
• Details working thesis, multiplicative work loops, expected payoff per phase, telemetry
 requirements, and acceptance/kill gates
• Proposes implementation order (Iterations 0–5) with Phase A as opening attempt
• Includes update note that Phase A was attempted and reverted; Phase B remains viable

.claude/plans/astral-speed-5x-roadmap.md


12. .claude/plans/astral-next-experiments.md 📝 Documentation +285/-0

Post-retrospective action plan for Phase B and Experiment 2

• Historical staging plan for Phase B and Experiment 2 (exact mass-interval pruning)
• Documents Phase B as already shipped with −10.4% Astral wall improvement
• Describes Experiment 2 as statistically real but below 5% graduation gate
• Provides updated priority order and experiments to avoid
• Includes benchmark rules and recommendations for future iterations

.claude/plans/astral-next-experiments.md


13. .claude/plans/experiment-2-mass-interval-pruning.md 📝 Documentation +154/-0

Experiment 2 retrospective: exact mass-interval pruning results

• Retrospective for exact prefix mass-interval pruning attempt with 4 checkpoints
• Records −2.27% Astral wall improvement (statistically significant, p≈0.01) but below 5% default-on
 gate
• Documents implementation strategy, bound construction, and intersection testing
• Includes checkpoint results showing 12.22% prune rate but 11% wall regression at Checkpoint 2,
 improved to +3.4% at Checkpoint 3
• Concludes with verdict that Phase B remains the durable lever; Experiment 2 retained as
 retrospective only

.claude/plans/experiment-2-mass-interval-pruning.md


14. .claude/plans/SHIPPED.md 📝 Documentation +54/-0

Shipped work and iteration retrospective summary

• Short retrospective of recent shipped iterations and abandoned experiments
• Documents Phase B (calibrated precursor-window tightening) with −10.4% Astral wall, −18.0% TMT
 wall
• Records Phase A and Phase E as abandoned with detailed findings
• Includes Experiment 2 as below-gate but reproducible
• Lists active work and provides reference links to detailed retrospectives

.claude/plans/SHIPPED.md


15. .claude/plans/README.md 📝 Documentation +12/-5

Plans directory documentation and organization

• Reorganizes plan directory structure with Active, History, and Archived sections
• Links to Phase A retrospective, 5× roadmap, and Experiment 2 findings
• Clarifies that detailed plans live outside the repo to avoid checking artifacts into git

.claude/plans/README.md


16. benchmark/ci/PXD001819/run_ci.sh ⚙️ Configuration changes +6/-6

Switch benchmark output format from mzIdentML to Percolator PIN

• Changed output file format from mzIdentML (MZID) to Percolator PIN format (PIN)
• Updated variable name from MZID to PIN throughout the script for consistency
• Modified error message and error code to reflect missing PIN file instead of mzIdentML
• Updated script argument passed to extract_metrics.py from --mzid to --pin

benchmark/ci/PXD001819/run_ci.sh


17. .claude/plans/parameter-modernization-flag-inventory.md Additional files +0/-90

...

.claude/plans/parameter-modernization-flag-inventory.md


18. .claude/plans/parameter-modernization.md Additional files +0/-159

...

.claude/plans/parameter-modernization.md


19. .claude/plans/search-sync-cleanup.md Additional files +0/-133

...

.claude/plans/search-sync-cleanup.md


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented May 1, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0)

Grey Divider


Remediation recommended

1. Uncaught sysprop parse crash 🐞 Bug ☼ Reliability
Description
MSGFPlus.runMSGFPlus parses tightening system properties with Float.parseFloat without handling
invalid values; a malformed -Dmsgfplus.tightening* value will throw NumberFormatException and abort
the run when calibration stats are reliable and tolerances are ppm-based.
Code

src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java[R403-411]

+            float sigmaMultiplier = Float.parseFloat(System.getProperty(
+                    "msgfplus.tighteningSigmaMultiplier",
+                    String.valueOf(MassCalibrator.DEFAULT_TIGHTENED_WINDOW_SIGMA_MULTIPLIER)));
+            float floorPpm = Float.parseFloat(System.getProperty(
+                    "msgfplus.tighteningFloorPpm",
+                    String.valueOf(MassCalibrator.DEFAULT_TIGHTENED_WINDOW_FLOOR_PPM)));
+            float marginPpm = Float.parseFloat(System.getProperty(
+                    "msgfplus.tighteningMarginPpm",
+                    String.valueOf(MassCalibrator.DEFAULT_TIGHTENED_WINDOW_MARGIN_PPM)));
Evidence
The new tightening constants are read via Float.parseFloat(System.getProperty(...)) without a
try/catch, so any non-float value will terminate the run. Elsewhere in the codebase, sysprop parsing
is defensively guarded (NumberFormatException caught and defaults used), showing the expected
robustness pattern for optional tuning knobs.

src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java[392-411]
src/main/java/edu/ucsd/msjava/msdbsearch/CompactSuffixArray.java[484-493]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`MSGFPlus.runMSGFPlus` reads `msgfplus.tighteningSigmaMultiplier`, `msgfplus.tighteningFloorPpm`, and `msgfplus.tighteningMarginPpm` via `Float.parseFloat(...)` with no error handling. If a user supplies a malformed value, the search aborts with `NumberFormatException`.

### Issue Context
These are explicitly described as tuning knobs for sweeps; they should behave like other sysprop knobs in the repo (ignore invalid values and fall back to defaults) rather than hard-crashing.

### Fix Focus Areas
- src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java[403-411]

### Suggested fix
- Add a small helper (local method or utility) like `float readFloatSysprop(String key, float defaultValue)` that:
 - reads the property
 - `trim()`s it
 - catches `NumberFormatException`
 - optionally rejects `NaN`/`Infinity` and negative values for these specific knobs
 - prints a one-line warning to stderr/stdout when falling back
- Use the helper for the three tightening constants before calling `tightenedTolerancePpm(...)`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

ypriverol added 2 commits May 1, 2026 10:36
…diff)

These directories are out of scope for the precursor-window-tightening PR.
Restore them to match dev exactly so they no longer appear in the PR's
cumulative diff against dev. After this commit:

  git diff origin/dev..HEAD -- .claude/plans benchmark/ci

returns empty.

Removed (branch-only additions, never on dev):
  .claude/plans/SHIPPED.md
  .claude/plans/astral-next-experiments.md
  .claude/plans/astral-phase-a-retrospective.md
  .claude/plans/astral-speed-5x-roadmap.md
  .claude/plans/experiment-2-mass-interval-pruning.md

Restored to dev's version (branch had not picked up these dev additions):
  .claude/plans/parameter-modernization-flag-inventory.md
  .claude/plans/parameter-modernization.md
  .claude/plans/search-sync-cleanup.md

Restored to dev's version (branch had local edits):
  .claude/plans/README.md
  benchmark/ci/README.md
  benchmark/ci/PXD001819/baseline.tsv
  benchmark/ci/PXD001819/extract_metrics.py
  benchmark/ci/PXD001819/run_ci.sh
  benchmark/ci/PXD001819/test_compare_metrics.py
…operties

Adds two override knobs for the calibration pre-pass:

  -Dmsgfplus.maxSampled=<int>          (default 500)
  -Dmsgfplus.minConfidentPsms=<int>    (default 200)

Both fall back to the historical defaults on unset/non-numeric/non-positive
values, so existing behavior is unchanged. Internal constants moved from
private to public DEFAULT_* so callers and tests can reference them.

CalibrationStats.hasReliableStats() no longer hard-codes the
MIN_CONFIDENT_PSMS threshold; the calibrator now emits confidentPsmCount=0
as the "unreliable, do not apply" sentinel and hasReliableStats checks
count > 0. Behavior is identical for all existing callers.

Defaults intentionally unchanged. A 9-trial Astral A/B (3 trials each at
minConfidentPsms = 200 / 500 / 1000, all with maxSampled=3000) showed:

  config            sigma ppm   tighten ppm   wall s    targets   decoys
  (500,  200) HIST  0.994       3.482         469.0     89580     45292
  (3000, 200)       0.454       2.000         470.3     89249     44538
  (3000, 500)       0.511       2.033         471.7     89212     44555
  (3000, 1000)      0.729       2.687         469.7     89524     44817

Key findings:
  - Wall time is statistically equivalent across all four configurations
    (~470 s, well inside run-to-run sigma of ~25 s).
  - Sigma DECREASES as minConfidentPsms DROPS at fixed maxSampled
    (0.454 < 0.511 < 0.729 ppm for 200 < 500 < 1000). Confirms the prior
    that stratification depth, not pool size, drives sigma quality.
  - Lower minConfidentPsms produces an over-tight window: (3000, 200)
    hits the 2.0 ppm floor and loses 331 native targets vs the historical
    (500, 200) default.
  - No configuration tested beats the historical (500, 200) defaults end
    to end, which were already validated on 3 datasets in PR #26.

Conclusion: ship the configurability for users who want to tune for
specific instruments / file sizes, but do not change the defaults.

26 unit tests verify property parsing (default fallback, valid int, trim,
non-numeric rejection, non-positive rejection, exported constant values).
@ypriverol ypriverol merged commit 6512011 into dev May 1, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant