Skip to content

First bigbio release of msgf+#14

Open
ypriverol wants to merge 168 commits intomasterfrom
dev
Open

First bigbio release of msgf+#14
ypriverol wants to merge 168 commits intomasterfrom
dev

Conversation

@ypriverol
Copy link
Copy Markdown
Member

@ypriverol ypriverol commented Apr 16, 2026

This pull request adds comprehensive project documentation and internal investigation records to the MS-GF+ codebase. It introduces a high-level project context file, detailed documentation of command-line flags, and a new system for tracking technical investigations and plans. The main themes are improved onboarding/documentation, enhanced transparency into known issues and technical debt, and groundwork for future development and parameter modernization.

Project documentation and onboarding:

  • Added .claude/CLAUDE.md with an overview of the MS-GF+ project, directory structure, build instructions, conventions, and performance-sensitive invariants. This serves as a central onboarding and context document for the codebase.
  • Added .claude/plans/README.md and .claude/plans/parameter-modernization-flag-inventory.md to document implementation plans, design documents, and a detailed inventory of all command-line flags and their parsing semantics. This supports the upcoming parameter modernization effort. [1] [2]

Investigations and technical debt tracking:

  • Introduced .claude/investigations/README.md as an index for tracked issues, bugs, and behaviors needing further analysis, establishing a process for technical investigations.
  • Added detailed investigations:
    • Investigation 001: MGF scan number extraction failure, documenting the parser's inability to handle PRIDE/ProteomeXchange TITLE formats and proposing regex-based fixes.
    • Investigation 002: E-value leakage to Percolator, analyzing how MS-GF+ E-value output leaks target/decoy information, impacting FDR estimation in downstream tools, and proposing output changes and CLI flags to mitigate.

ypriverol and others added 30 commits April 13, 2026 07:51
…Java 17

- Change skipTests from true to false so mvn test actually runs
- Update maven-compiler-plugin source/target from 1.8 to 17 (matches runtime)
- Add missing compile dependencies: jmzml 1.7.11, fastutil 8.5.12,
  slf4j-api 1.7.36, logback-classic 1.2.12, commons-io 2.15.1
  (master code references these classes but they were not declared)
- @ignore TestMzML test that requires Windows-specific DMS files

Result: 120 tests run, 53 active, 67 skipped, 0 failures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…shold

In MZIdentMLGen.addSpectrumIdentificationResults(), change `break` to
`continue` when a match has DeNovoScore below the minimum threshold.
The `break` was incorrectly stopping emission of all subsequent matches
for that spectrum, silently dropping valid PSMs from the mzid output.

Also add null safety check for spectrum index lookup — if a spectrum
index is not found in the spectrum file, log a warning and skip
instead of throwing a NullPointerException.

Add TestMZIdentMLGen with two integration tests:
- testMzidScoreCompleteness: runs MSGF+ search, verifies every SII
  has all 4 score CVParams (RawScore, DeNovoScore, SpecEValue, EValue)
- testMzidStructuralValidity: verifies output mzid has required
  mzIdentML structure elements

Closes MSGFPlus#157

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add new -msLevel CLI parameter to filter spectra by MS level.
Accepts single value (e.g., -msLevel 2) or comma-separated range
(e.g., -msLevel 2,3). Default is 2 (MS2 only).

Changes:
- ParamManager: add MS_LEVEL enum and registration
- IntRangeParameter: enable single-value parsing, fix typo
- SearchParams: add minMSLevel/maxMSLevel fields
- SpecKey: filter spectra by MS level in getSpecKeyList()
- SpectraAccessor: add setMSLevelRange(), wire to parsers
- MzMLAdapter/MzXMLSpectraMap: fix maxMSLevel to be inclusive
- MSGFPlus/MSGFDB/MSGFDBLib: wire MS level parameters
- pom.xml: remove fastutil shade filter (jmzml 1.7.11 needs full fastutil)

Tests: TestIntRangeParameter (9 tests), TestMSLevelFiltering (6 tests)

Benchmark (TMT 1.1GB, TDA):
  Baseline: 1245s, 6654 PSMs@1%FDR
  -msLevel 2: 957s (-23%), 6936 PSMs@1%FDR (+4.2%)

Closes MSGFPlus#159

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat(MSGFPlus#159): add -msLevel parameter for MS level filtering
fix(MSGFPlus#157): preserve PSM scores when DeNovoScore is below threshold
fix: enable test suite and fix broken build dependencies
Remove standalone scripts, legacy tools, and unused classes that are
not referenced by the core MSGF+ search pipeline, reducing codebase
by ~22,000 lines.

Deleted entire packages:
- ims/ (9 files) — legacy IMS utilities
- ipa/ (5 files) — unused isotope pattern analysis
- msgf2d/ (8 files) — abandoned 2D scoring experiment
- msdictionary/ (7 files) — unused genome dictionary tool
- mstag/ (3 files) — unused sequence tagging
- scripts/ (6 files) — standalone CLI utilities
- msutil/test/ (3 files) — misplaced test classes
- msgf/test/ (2 files) — legacy test stubs
- msgf/analysis/ (1 file) — unused ROC generator

Cleaned mixed packages:
- misc/: removed 59 standalone scripts, kept 5 core utilities
- msgf/: removed 6 unused graph/scoring classes
- msutil/: removed 9 unused filter/annotation classes
- msdbsearch/: removed 4 standalone DB tools
- parser/: removed 9 legacy format parsers (InsPecT, Mascot, etc.)
- ui/: removed 6 legacy entry points (MSGF, MSGFLib, etc.)
- mzid/: removed 1 unused adapter stub
- msscorer/: removed 1 unused stats class
- suffixarray/: removed 1 unused mass array class

Also removed dead test methods and cleaned dangling imports.
Tests: 119 run, 0 failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite README.md:
- Full parameter reference tables covering all 30+ flags organized
  by category (core search, fragmentation, enzyme, filtering, etc.)
- Quick start examples for basic and TMT searches
- Modification file format documentation with examples
- Build-from-source instructions
- Updated requirements to Java 17+

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add parameter docs to README and CI/CD workflow
Remove dead code: 150 unused classes, -22K lines
Write search results directly to TSV from in-memory objects, bypassing
mzIdentML serialization. Output is column-identical to MzIDToTsv
(verified by diff on test.mgf search). This avoids generating large
.mzid files when only TSV is needed downstream (e.g. OpenMS
MSGFPlusAdapter, Percolator).

- New DirectTSVWriter class with same score/protein/mod logic as
  MZIdentMLGen but streaming tab-delimited output
- New -outputFormat parameter: 0=mzid (default), 1=tsv, 2=both
- Includes fixed + variable mods, MGF Title column, decoy filtering
- Backwards compatible: default remains mzid

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When -addFeatures 1 is used with -outputFormat tsv, the TSV now
includes all PSMFeatureFinder columns needed for Percolator:
ExplainedIonCurrentRatio, NTermIonCurrentRatio, CTermIonCurrentRatio,
MS2IonCurrent, MS1IonCurrent, IsolationWindowEfficiency,
NumMatchedMainIons, and all error statistics (MeanError/StdevError
for All and Top7, both absolute and relative).

These features were previously only available as UserParams in mzid
and were not extracted by OpenMS's addMSGFFeatures() — now they are
directly accessible as TSV columns.

The peptide modification format (M+15.995) is already compatible with
OpenMS MSGFPlusAdapter's modifySequence_() converter which transforms
it to bracket notation M[+15.995] for AASequence.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the jmzml JAXB-based MzMLUnmarshaller with a lightweight StAX
streaming parser that extracts only the 11 fields MSGF+ needs. The new
parser builds a spectrum index in a single pass, then preloads all
spectra into memory on first random access, eliminating repeated XML
parsing during the search phase.

Benchmark (TMT 1.1GB mzML, target-decoy, 4 threads):
- Wall time: 957s -> 853s (-10.9%)
- PSMs at 1% FDR: 6,936 (unchanged)
- Score completeness: 100% (unchanged)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…port

- Remove jmzml (JAXB-based mzML parser) dependency from pom.xml
- Delete old jmzml-dependent classes: MzMLAdapter, MzMLSpectraMap,
  MzMLSpectraIterator, SpectrumConverter
- Add referenceableParamGroupRef resolution to StaxMzMLParser: builds
  a map of param groups during index pass, resolves refs during
  spectrum parsing (critical for files that define polarity, MS level,
  etc. in referenceable groups)
- Move turnOffLogs() utility to StaxMzMLParser, update all callers
- Keep fastutil dependency (needed by jmzidml at runtime)

JAR size reduced from 39.5MB to 38MB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jmzReader (uk.ac.ebi.pride.tools:jmzreader:2.0.6) had zero imports
anywhere in the codebase — a dead dependency from earlier development.
All spectrum file format parsing uses custom implementations:
mzML (StaxMzMLParser), mzXML (embedded jrap/stax), MGF/MS2/PKL
(custom parsers).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove mzXML file format support entirely:
- Delete embedded jrap/stax library (20 files, ~5,800 lines)
- Delete MzXMLSpectraMap, MzXMLSpectraIterator, MzXMLToMgfConverter
- Delete MzXMLToMgf utility and mzXML test resources (38MB)
- Remove MZXML from SpecFileFormat enum, SpectraAccessor, ParamManager
- Update misc/scripts/ui classes to remove mzXML code paths

mzXML is a legacy format superseded by mzML. Users with mzXML files
can convert to mzML using msconvert (ProteoWizard).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- StaxMzMLParser: use ConcurrentHashMap for thread-safe spectrum cache,
  fix class-level doc (preload-all, not bounded LRU), check index before
  preloading, propagate exceptions instead of returning null
- StaxMzMLSpectraIterator: throw NoSuchElementException when exhausted
- SpectraAccessor: throw exception instead of System.exit(-1),
  validate specFormat is non-null in constructor
- SelectSpectra: update stale .mzXML reference to .mzML
- pom.xml: fix duplicate <manifest>, remove stale comments, note
  fastutil is required by jmzidentml at runtime

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Write search results directly to TSV from in-memory objects, bypassing
mzIdentML serialization. Output is column-identical to MzIDToTsv
(verified by diff on test.mgf search). This avoids generating large
.mzid files when only TSV is needed downstream (e.g. OpenMS
MSGFPlusAdapter, Percolator).

- New DirectTSVWriter class with same score/protein/mod logic as
  MZIdentMLGen but streaming tab-delimited output
- New -outputFormat parameter: 0=mzid (default), 1=tsv, 2=both
- Includes fixed + variable mods, MGF Title column, decoy filtering
- Backwards compatible: default remains mzid

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When -addFeatures 1 is used with -outputFormat tsv, the TSV now
includes all PSMFeatureFinder columns needed for Percolator:
ExplainedIonCurrentRatio, NTermIonCurrentRatio, CTermIonCurrentRatio,
MS2IonCurrent, MS1IonCurrent, IsolationWindowEfficiency,
NumMatchedMainIons, and all error statistics (MeanError/StdevError
for All and Top7, both absolute and relative).

These features were previously only available as UserParams in mzid
and were not extracted by OpenMS's addMSGFFeatures() — now they are
directly accessible as TSV columns.

The peptide modification format (M+15.995) is already compatible with
OpenMS MSGFPlusAdapter's modifySequence_() converter which transforms
it to bracket notation M[+15.995] for AASequence.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace ConvertToMgf-based tests (class removed in PR #7) with
StaxMzMLParser and SpectraAccessor mzML parsing tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: native TSV output — bypass mzIdentML for OpenMS/Percolator pipelines
perf: replace jmzml JAXB parser with StAX-based mzML reader
* chore: add CI/release packaging and benchmark scaffolding

Split infra and repository maintenance updates into a dedicated reviewable change set, including workflow automation, Docker packaging, benchmark scripts/docs, and project documentation updates. Exclude large local benchmark artifacts and keep this PR focused on non-hot-path code organization and release hygiene.

Made-with: Cursor

* chore: keep benchmark folder local-only

Remove benchmark scripts/docs from this branch and ignore the entire benchmark directory so local benchmarking assets do not appear in review PRs.

Made-with: Cursor

* docs: keep single canonical primitives plan

Fold memory-reduction guidance into the balanced primitives plan and remove the old duplicate plan file so review and maintenance use one canonical document.

Made-with: Cursor

* chore: narrow PR1 plans to scope-only docs

Remove unrelated strategy and optimization plan documents from PR1 so this branch stays focused on infra/packaging cleanup. Keep only the plans index file in this PR.

Made-with: Cursor

* chore: remove legacy ZippedReleases folder

Delete the obsolete Windows release helper scripts and reference files under ZippedReleases from the repository.

Made-with: Cursor

* chore: remove legacy extlib dependency jar

Delete the obsolete jrap/stax legacy jar under extlib as part of repository cleanup.

Made-with: Cursor

* fix: address copilot review feedback for PR11

Align docs with actual supported legacy formats, update release pipeline to build from tag version with tests, and fix Docker build JDK requirement.

Made-with: Cursor

* chore: minor packaging/docs hygiene for PR1

Normalize ignore files, shrink Docker build context, align agent README with dev/CI, and clarify release workflow step naming.

Made-with: Cursor

* docs: trim examples folder to small referenced artifacts

Remove duplicate Tryp_Pig_Bov DB/index copies (tests use src/test/resources),
drop large unlinked Excel/PNG teaching files, and add docs/examples/README.md
so the directory purpose is obvious. Link the index from the main README.

Made-with: Cursor

* chore: remove IntelliJ IDEA tips screenshots from docs

Made-with: Cursor

* docs: replace legacy HTML manuals with Markdown

Convert docs/*.html to GitHub-flavored Markdown (pandoc), fix internal links,
add docs/README.md as the documentation index, and remove unused style.css.

Made-with: Cursor

* docs: strip leftover HTML span wrappers from converted Markdown

Made-with: Cursor
Add a workflow-dispatch benchmark pipeline on a fixed self-hosted runner profile, with public-data download, metrics emission, and baseline TSV comparison under benchmark/ci/PXD001819 for future dataset expansion.

Made-with: Cursor
Use uppercase PXD001819 naming in workflow-visible labels/artifacts and update README to state mzXML is not available in this fork.

Made-with: Cursor
- run_ci.sh: count only opening <SpectrumIdentificationItem> tags for
  sii_count (prior substring match double-counted closing tags and
  picked up SpectrumIdentificationItemRef)
- run_ci.sh: always emit peak_rss_kb and cpu_percent (NA when GNU time
  does not expose them) so metrics file format is consistent
- compare_metrics.py: support an `optional` column; optional missing/NA
  metrics warn instead of failing CI
- baseline.tsv: add optional column, mark peak_rss_kb optional, fix
  ubuntu-latest note to reference the self-hosted runner, widen
  sii_count floor to match the de-duplicated count
- README pointers: update stale references to a non-existent
  benchmark/run_pxd001819_benchmark.sh script
- benchmark/README.md: describe the actual committed CI scaffold
  instead of an uncommitted local harness layout

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- extract_metrics.py: stream-parse mzIdentML with ElementTree.iterparse
  so SII counting and PSM 1% FDR counting no longer rely on line-shaped
  regex matches over XML
- run_ci.sh: use a bash array for SEARCH_ARGS (safe against future
  flags with spaces), atomic .part downloads, validate cached gzip,
  default MSGFPLUS_THREADS to 8 to match the workflow, drop the
  always-zero java_exit metric, and emit integer wall_time_sec
- workflow: pin Python via actions/setup-python@v5 so self-hosted
  runners have a known 3.11 interpreter for the helper scripts
- compare_metrics.py: add test_compare_metrics.py covering
  in-range pass, out-of-range fail, missing required/optional,
  NA, non-numeric, and empty-range rows (7 tests, all passing)
- .gitignore: drop redundant benchmark/** patterns (already covered
  by benchmark/* + ci/ allowlist); add __pycache__/ and *.pyc
- docs: describe new helper and test scripts in both READMEs
fix(benchmark): harden PXD001819 scaffold per review feedback
Two follow-ups from the PR #25 review thread:

1. BOM strip in BufferedLineReader (Copilot finding).
   The constructor wrapped the FileInputStream in
   UnicodeBOMInputStream but never called skipBOM() -- BOM bytes
   leaked into line 1, breaking config / mod / FASTA files saved by
   editors that prepend a UTF-8 BOM (Windows Notepad, some VS Code
   configurations, etc.). Fix: chain .skipBOM() in the super(...)
   call. The BOM-stripping path is per-instance; the wrapper still
   detects the BOM in its constructor, we just now consume it.

   New test BufferedLineReaderTest pins the contract: a file with a
   UTF-8 BOM followed by `ParentMassTolerance=20ppm` returns the
   bare key=value as line 1 (not `ParentMassTolerance=20ppm`),
   and a no-BOM file is unchanged.

2. Delete unused MSGFResult.java.
   Verified zero callers in src/main and src/test before removal.
   The reviewer flagged this as a small follow-up to land alongside
   the modernization sweep.

Verified: scoped sweep (BufferedLineReaderTest,
MSGFPlusOptionsConfigFileTest, MSGFPlusOptionsActivationMethodTest,
SearchParamsTest, TestPrecursorCalScaffolding, TestRunManifestWriter,
TestDirectPinWriter, TestMSUtils, TestMisc, TestMinSpectraPerThread):
55 tests, 0 failures, 0 errors, 2 skipped.
Convert eight value-shaped types to Java records, per the audit:
small immutable carriers with no inheritance constraints, where the
record's auto-generated equals/hashCode/toString and accessor methods
replace hand-rolled boilerplate. No behavioral change; net trim of
~80 LOC plus a clearer surface.

Converted:

- cli.IntRange -- record IntRange(int min, int max). Compact
  constructor keeps the min<=max validation. parse() and
  Converter (picocli ITypeConverter) preserved.
- cli.PrecursorTolerance -- record PrecursorTolerance(Tolerance left,
  Tolerance right). Compact constructor enforces the matching-unit
  and non-negative invariants in-place; parse() / Converter retained.
- msutil.CvParamInfo -- record CvParamInfo(String accession, String
  name, String value, String unitAccession, String unitName). The
  stored hasUnit field is gone; hasUnit() is now derived from
  unitAccession != null. Compatibility getters (getAccession(),
  getValue(), getUnitAccession(), getUnitName(), getHasUnit()) keep
  existing call sites untouched.
- msutil.Annotation -- record Annotation(AminoAcid prevAA, Peptide
  peptide, AminoAcid nextAA). Setters (which had no live callers)
  removed; the parsing constructor delegates via this(...).
  isProteinNTerm/isProteinCTerm and the dotted toString are kept.
- msgf.ProfilePeak<T extends Matter> -- record. Setters were unused;
  Comparable<ProfilePeak<T>> is preserved.
- msutil.Atom -- record Atom(String code, double mass, int
  nominalMass, String name). Static atomArr table + atomMap +
  static initializer block all kept verbatim. getCode() / getName() /
  getMass() / getNominalMass() retained as compatibility wrappers.
- msdbsearch.CompactSuffixArray.RangeMetadata -- record (4 file/int
  fields). Three call sites updated from md.numEntries to
  md.numEntries() etc.
- msgf.MassListComparator.MatchedPair<T extends Matter> -- record.
  getMass1/getMass2 retained as compatibility wrappers.

Call-site updates in SearchParams.parse: tol.left/.right, isotope.min/.max,
specIdx.min/.max, ms.min/.max all switched to record accessor methods
(.left()/.right()/.min()/.max()).

Verified: scoped sweep (incl. BufferedLineReaderTest,
MSGFPlusOptionsConfigFileTest, MSGFPlusOptionsActivationMethodTest,
SearchParamsTest, TestPrecursorCalScaffolding, TestRunManifestWriter,
TestDirectPinWriter, TestMinSpectraPerThread, TestMSUtils, TestSA,
TestMisc, TestCandidatePeptideGrid + ConsideringMetCleavage):
80 tests, 0 failures, 0 errors, 3 skipped.
perf(search): instrumentation + dead-sync removal + per-task buffers + ForkJoinPool toggle
… SHIPPED.md

- Add astral-speed-improvements.md design (gate B: 1.3-1.5x Astral wall,
  no PSM regression). TMT-as-inner-loop, Astral-as-phase-gate workflow.
- Add SHIPPED.md short retrospective covering PRs #15-#22 (primitives +
  Achievements A+B), PR #23 (speed-v2 + output consolidation), PR #24
  (Astral OOM + BuildSA scaling), PR #25 (search-sync-cleanup +
  parameter-modernization), and the abandoned fragment-index experiment.
- Remove search-sync-cleanup.md, parameter-modernization.md, and
  parameter-modernization-flag-inventory.md — work shipped, content
  folded into SHIPPED.md.
- Update README.md to point at the new active/history layout.
…ng model

Replace the short-horizon astral-speed-improvements.md (drafted on this
branch in 878b0cb) with the longer-horizon astral-speed-5x-roadmap.md
as the single active design.

Shipping-model change (new §0 in the roadmap): this iteration ships as
milestone commits on feat/astral-speed-improvements with one closing PR
at the end, not a PR per phase. Each phase milestone commit follows the
template:

  feat(astral-speed): MILESTONE Phase <id> — <one-line achievement>

  <2-4 lines: TMT inner-loop wall delta, Astral phase-gate result,
  recall delta, RSS delta if any>

The small-wins from the deleted plan (graph-skeleton memoization in
PrimitiveAminoAcidGraph + Tier-1.5 GF candidate cap in
DBScanner.computeSpecEValue) are retained as the §0 "Iteration 0.5
fallback" — used only when a big-win phase fails its kill gate.

Original short-horizon plan recoverable via:
  git show 878b0cb:.claude/plans/astral-speed-improvements.md
…reset

Phase A (in-engine MS2 deisotoping + dense-peak retention cap + GF
candidate cap + scorer hot-path optimization) was attempted on this
branch from 2026-04-27 to 2026-04-28. None of three independent
optimization angles moved Astral wall above run-to-run noise; TMT's
1.41x apparent win came with -0.25% target and -4.6% decoy drift,
which is not clean enough to justify shipping the surface area.

Branch was hard-reset to eee9fa6 (the 5x roadmap state before any Phase
A work) to drop 11 code commits + their uncommitted in-progress
extensions. This commit captures the empirical findings as a
retrospective doc so future agents can read the measurement data,
understand why Phase A failed Astral, and pick from Phase B/C/E or
workload-retargeting alternatives without repeating the attempt.

- Add .claude/plans/astral-phase-a-retrospective.md with six Astral
  measurements, two TMT measurements, lessons (TMT-not-Astral-proxy,
  JIT-already-optimizes, native-count-drift-is-leading-indicator), and
  a concrete list of what's still untried.
- Update astral-speed-5x-roadmap.md section 11 + 12 to flag Phase A as
  attempted-and-reverted, with original recommendation preserved as a
  blockquote for context.
- Update SHIPPED.md "Abandoned" section with a Phase A entry
  pointing at the retrospective.
- Update README.md "Active" list to surface the retrospective.

Reverted code is recoverable via:
  git show 5cdd21e:src/...
and walks back through 11 commits (b78e275 through 5cdd21e).
…nder noise

Follow-up to Phase A iteration. Tested ForkJoin pool as a smart-default
on Astral after thread-scaling sweep showed default ThreadPoolExecutor
anti-scales past 6 threads.

Findings:
- Default executor: 4t=690s, 6t=675s, 8t=884s (+28% regression at 8t).
  Anti-scaling at 8t reproduces across measurements; this is real.
- ForkJoin opt-in: 4t=872s (regression), 8t=521s (1.32x vs 4t). Pool
  choice is thread-count dependent.
- Smart-default change implemented (numThreads >= 8 -> ForkJoin) and
  scope-tested (9/9 pass), but confirmation runs showed +65% wall on
  the same JAR vs the morning measurement. Machine state had degraded
  across hours of benchmarking; can't isolate ForkJoin win from noise.
- Reverted. The underlying -Dmsgfplus.useForkJoin=true opt-in remains
  in dev unchanged.

Lessons added to retrospective:
- Need stable benchmark environment + multi-run statistics for any
  Astral wall-time claim. Single-measurement deltas at the 30% level
  are below the noise floor.
- The default-executor anti-scaling at 8t IS worth fixing. Future
  attempt: jfr print --events jdk.JavaMonitorWait on 8t default-executor
  profile to localise the contention point in
  ThreadPoolExecutorWithExceptions.

Files: .claude/plans/astral-phase-a-retrospective.md (added Phase E
section + updated "what's untried" entry); .claude/plans/SHIPPED.md
(added Phase E to "Abandoned"). Source code unchanged from eee9fa6
(verified via git status; MSGFPlus.java reverted before commit).
… not enough

JFR analysis of the morning 4t Astral profile found zero JavaMonitorEnter
contention events and 100% RUNNABLE samples, ruling out lock contention
as the 8t regression cause. But 588K GCPhaseParallel events suggested
GC parallel-phase coordination as a candidate.

Tested with -Xmx16g (double heap) at 8t and 4t:
- 8t-Xmx16g: 776s wall, 5067MB peak RSS, 182 GC pauses
- 4t-Xmx16g: 870s wall, 6120MB peak RSS, 184 GC pauses

Bigger heap helped 8t by ~12% wall and dropped peak RSS ~17% (G1GC ran
fewer collections), but only ~4% at 4t. GC contributes to the 8t
regression but is not the entire story. Even with Xmx16g, 8t (776s) is
still slower than morning's 4t-Xmx8g baseline (690s).

The afternoon's 4t-Xmx8g run (904s) vs morning's same configuration
(690s) is +31% on the same JAR / same args / same machine — confirming
the day's accumulated machine-state degradation dwarfs any code-level
signal. Six hours of benchmarking has hit the noise floor.

No actionable recommendation to ship. Source code stays at dev tip.
The Phase E section in astral-phase-a-retrospective.md is updated with
the GC-pressure findings.
…ere noise

Replication batch (3 Astral runs back-to-back, 2026-04-29 morning,
quieter machine):

  4t default:     963.1s
  8t default:     918.3s   (FASTER than 4t — opposite of yesterday)
  8t ForkJoin:    978.8s   (no advantage)

All three within 6.5% of each other. Native counts bit-identical.

This proves:
- The "default executor anti-scales past 6 threads" finding was wrong.
  It was a one-day correlation between morning-quiet-machine + 4t and
  afternoon-noisy-machine + 8t, not a real algorithmic relationship.
- The 521s ForkJoin-8t measurement was an outlier. Three independent
  re-measurements (yesterday afternoon 861s, today 978s) put 8t-FJ at
  ~900s in normal machine state.
- The 690s 4t-default baseline was an outlier in the other direction.
  Today's same-machine 4t-default was 963s.

Conclusion: no Phase E shippable change exists on this benchmark setup.
Wall-time variance from machine state (~30%) dwarfs any code-level
signal we can produce on a single developer workstation.

Updated retrospective with the corrected Phase E section + replication
batch data. Updated SHIPPED.md Phase E entry to reflect the disproof.
Source code stays at dev tip.

End of iteration. Five doc commits stay on the branch as the deliverable.
The PXD001819 CI workflow (.github/workflows/benchmark-pxd001819.yml)
has been silently broken since PR #23 removed mzIdentML output
entirely. run_ci.sh wrote -o ci_output.mzid, MS-GF+ produced a .pin
under that filename, and extract_metrics.py tried to parse it as
mzIdentML. The CI never produced valid metrics.

Switch the whole pipeline to .pin, which is the only modern output
format:

- run_ci.sh: -o $PIN (was -o $MZID); error message updated.
- extract_metrics.py: rewritten parse_pin() counts native target /
  decoy rows by streaming the .pin (column 2 = label, {1, -1}) one
  line at a time. Drops the iterparse XML dependency. Replaces
  sii_count + psm_1pct_fdr metrics with native_target_count +
  native_decoy_count.
- baseline.tsv: new metric names + ranges. Wide first-green-run
  windows (14000-35000 targets, 8000-20000 decoys); to be tightened
  after the first self-hosted runner produces consistent numbers.
- test_compare_metrics.py: 7 existing comparator tests updated to
  use the new metric names; 3 new ParsePinTest cases covering
  label counts, empty input, and malformed rows. 10/10 pass.
- benchmark/ci/README.md: documents the PIN-based pipeline, why
  native counts are the right CI gate (deterministic across runs;
  catches search-code regressions cleanly without Percolator
  dependency), and points future iterations at the multi-run
  protocol the astral-phase-a-retrospective.md asks for.

The CI gate is now search-correctness (deterministic counts), not
sensitivity (which needs Percolator + a separate gate). Sensitivity
regressions surface via a downstream Percolator run on the .pin
artifact — out of scope for this CI workflow.

This is the meta-fix the Astral retrospective named as the next
iteration's prerequisite: stable benchmarks must work before any
speed-optimization claim is defensible.
…ication

Phase B (calibrated precursor-window tightening) is already implemented in
dev — MSGFPlus.runMSGFPlus:396-423 computes tightened tolerance via
MassCalibrator.tightenedTolerancePpm() and passes it to per-task
ScoredSpectraMap. What's missing per astral-next-experiments.md is the
telemetry to verify it's actually shrinking the pairing fan-out.

This commit adds PhaseBTelemetry, an opt-in counter class that records:
- pairing_calls: number of times DBScanner.dbSearch hit the
  pepMassSpecKeyMap.subMap(leftThr, rightThr) site
- matched_speckeys: total SpecKeys returned across those calls

Enable with -Dmsgfplus.phaseBTelemetry=true. Off by default; OFF-mode
behaviour is bit-identical (single load+branch when disabled). Not
exposed as a CLI flag — this is a developer diagnostic for Phase B
verification, mirroring the approach used for -Dmsgfplus.useForkJoin.

At the end of search MSGFPlus.runMSGFPlus prints (when enabled):
  [Phase B telemetry] pairing_calls=N matched_speckeys=M mean_per_call=X.XX

Files:
- src/main/java/edu/ucsd/msjava/msdbsearch/PhaseBTelemetry.java (new) —
  thread-safe LongAdder counters, opt-in via system property, JVM-scoped.
- src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java — record at
  DBScanner.dbSearch:489 (one line, guarded by enabled() short-circuit).
- src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java — print summary after
  task wall summary (around line 587), only when enabled.
- src/test/java/edu/ucsd/msjava/msdbsearch/TestPhaseBTelemetry.java (new)
  — 5 tests: counter math, mean-zero edge case, reset, 8-thread
  concurrency stress (80k ops total), system-property contract.

Tests: 14/14 pass for the relevant scoped suite (TestPhaseBTelemetry +
TestPrecursorCalScaffolding + TestConcurrentMSGFPlus); OFF-mode
unchanged.

Plan update: astral-next-experiments.md §4 annotated to note Phase B
core is shipped + telemetry is now wired. Future agent runs the bench
with -Dmsgfplus.phaseBTelemetry=true and the existing -precursorCal
auto/on, then verifies §4.5 Phase B success gate (matched SpecKey count
drops materially after tightening; Astral wall improves >=10%).
…uals

Two bugs in MassCalibrator surfaced under Phase B telemetry:

1. The pre-pass propagated the user's -ti window (e.g. -1,2 on Astral) to
   the pre-pass DBScanner. Spectra whose precursor was selected as the
   M+1 or M+2 isotope produced 500-1000 ppm "residuals" (the 1.003 Da
   isotope shift on a 2 kDa peptide). These contaminated the median +
   MAD aggregation. Astral measured sigma=1166 ppm — useless for any
   tightening decision. Fix: hard-code the pre-pass to iso=[0,0]. The
   pre-pass exists to LEARN the shift, not to find every match; clean
   monoisotopic matches are what makes the residual distribution
   meaningful.

2. As belt-and-suspenders, residuals beyond +/-50 ppm are now rejected
   in extractResiduals(). Any modern instrument's true mass accuracy is
   well under 50 ppm; values above are almost always isotope-shift or
   charge-state mistakes. New constant MAX_REASONABLE_RESIDUAL_PPM=50.

Drop the now-unused minIsotopeError / maxIsotopeError fields and
constructor parameters from MassCalibrator. The single caller
(MSGFPlus.runMSGFPlus) is updated.

Verified on remote pride-linux-vm.ebi.ac.uk (Astral ProteoBench Module 8):

  Before fix:  Precursor mass shift 1.398 ppm from 371 PSMs (sigma 1166 ppm)
  After fix:   Precursor mass shift 0.978 ppm from 393 PSMs (sigma 3.987 ppm)

The shift values are similar; the sigma is the dramatic signal that
the residual distribution is now clean. 0.978 ppm is the typical
Astral instrument bias; 3.987 ppm is the true post-calibration
spread (still wider than expected; might reflect long-peptide tail).

OFF-mode (precursorCal=off) is bit-identical to dev-tip on three
back-to-back Astral runs (548/560/551s wall, all 89479/46792 native
counts). 32/32 scoped tests pass (TestMassCalibrator,
TestPhaseBTelemetry, TestPrecursorCalScaffolding).

NOTE: this commit only fixes the calibrator's correctness. Phase B
tightening still does not fire on Astral because the tightening
formula k=3 * sigma + margin = 12.46 ppm exceeds the 10 ppm user
window. Tuning k or rethinking the formula is a separate decision;
see retrospective for the next-iteration discussion.
Phase B's tightening formula min(userPpm, max(floorPpm, k*sigma + marginPpm))
previously used hard-coded MassCalibrator.DEFAULT_TIGHTENED_WINDOW_*
constants (k=3, floor=2 ppm, margin=0.5 ppm). Surface them as system
properties so falsification sweeps can vary k without recompiling:

  -Dmsgfplus.tighteningSigmaMultiplier=<float>  (default 3.0)
  -Dmsgfplus.tighteningFloorPpm=<float>         (default 2.0)
  -Dmsgfplus.tighteningMarginPpm=<float>        (default 0.5)

Defaults unchanged; OFF-mode and PrecursorCal=AUTO production behaviour
is bit-identical when no override is set.

Verified on remote pride-linux-vm.ebi.ac.uk Astral ProteoBench Module 8
with the new -Dmsgfplus.tighteningSigmaMultiplier=2 override:

| Run                | Wall  | Targets | Decoys | mean_per_call |
|--------------------|-------|---------|--------|---------------|
| OFF (baseline)     | 538s  |  89479  | 46792  | 0.77          |
| AUTO @ k=2 rep1    | 537s  |  89453  | 46685  | 0.63          |
| AUTO @ k=2 rep2    | 531s  |  89453  | 46685  | 0.63          |

Tightening fires for the first time on Astral (10.000 ppm -> 8.474 ppm).
Pairing fan-out drops 18% (mean_per_call 0.77 -> 0.63). matched_speckeys
drops 15.4%. Native counts shift slightly (-26 targets, -107 decoys =
-0.03% / -0.23%, within noise). Wall delta: 0 +/- 2% (within machine
noise floor of three OFF replicates 538/560/551).

Conclusion: Phase B's cheap-score-pairing reduction does NOT translate
to Astral wall savings. The bottleneck is downstream of pairing (GF /
scoring). Phase B remains a valid lever for workloads where calibrated
sigma is small relative to user tolerance (typical of older / drifted
instruments) but is not the next Astral wall lever.

Next step: stratification analysis (Plan B) to determine whether a
cleaner subset of Astral pre-pass PSMs has materially tighter sigma.
If yes -> calibrator quality improves for workloads where Phase B
DOES pay off, even if Astral itself doesn't benefit. If no ->
Phase B is at its theoretical ceiling.
…FIDENT_PSMS

Plan B (post-falsification): the previous calibrator returned all 393
SpecEValue<=1e-6 PSMs to the median+MAD aggregator. Per-PSM stratification
analysis (using the calibrationDebugTsv emitter) on Astral revealed:

  Stratum                      | n   | sigma (ppm)
  -----------------------------|-----|------------
  All confident PSMs           | 393 | 3.987
  Charge=2 only                | 355 | 2.923
  Mass < 1 kDa                 | 289 | 2.470
  TOP 50% by spec_eValue       | 196 | 0.951
  TOP 200 by spec_eValue       | 200 | 0.994

The "best half by spec_eValue" axis is the dominant explanatory variable —
4x tighter sigma. The worst-half PSMs (eValue near the 1e-6 threshold)
contribute residual scatter, not real instrument-bias signal.

Fix: extractResiduals now collects (residual, spec_eValue) pairs, sorts
by spec_eValue ascending (most confident first), and keeps the top
MIN_CONFIDENT_PSMS (200) for the median+MAD aggregation. The
absolute-residual filter (|residual| > 50 ppm) stays as belt-and-suspenders.

Astral measurement on remote pride-linux-vm.ebi.ac.uk (5 OFF replicates +
3 stratified-AUTO replicates):

  Run                   | Wall  | Sigma     | Window         | Targets | Decoys | T/D
  ----------------------|-------|-----------|----------------|---------|--------|------
  OFF baseline (median) | 551 s | -         | 10 ppm         |  89479  | 46792  | 1.91
  AUTO stratified       | 494 s | 0.994 ppm | 10 -> 3.48 ppm |  89580  | 45292  | 1.98

  Median wall: -10.4% (gate was >=10%, hit)
  Targets:     +101 (+0.11%, NOT a regression)
  Decoys:      -1500 (-3.2%, FDR-favorable)
  T/D ratio:   +3.6% (the tighter window rejects more decoys than targets)
  Reproducibility: 3 stratified reps bit-identical on native counts;
                   wall variance +/-1.2% across reps.

Phase B is now demonstrably alive on Astral when the calibrator's
residual subset is clean. The previous "Phase B does nothing on Astral"
finding was a calibrator-quality issue, not a Phase B logic issue.

Also keeps the calibration-debug-TSV emitter
(-Dmsgfplus.calibrationDebugTsv=<path>) committed for future
stratification work on other workloads. 32/32 scoped tests pass.
Phase B with the stratified calibrator (top-200 by spec_eValue) delivers
-10.4% Astral wall (median 551 -> 494s across 5 OFF + 3 AUTO reps),
+0.11% targets, -3.2% decoys, T/D ratio 1.91 -> 1.98 (sensitivity-
favorable). Plan section 4 annotated; SHIPPED.md "Active" section
documents the win + the four enabling commits (telemetry, calibrator
iso=0, configurable formula constants, stratification).
The previous SHIPPED.md still listed astral-speed-improvements.md as
the Active doc; that file was deleted during a prior reset. Replace
with the actual Active state: Phase B shipped with 4 enabling commits
on feat/astral-speed-improvements, the Astral measurement table, and
pointers to next-experiments.md and astral-speed-5x-roadmap.md.
Replace Astral-only result table with three-dataset comparison after
remote validation on TMT (PXD007683 Lumos) and PXD001819 (Velos):

  Workload  | Window | Sigma  | Tightened    | Wall   | Targets | T/D
  ----------|--------|--------|--------------|--------|---------|------
  Astral    | 10 ppm | 0.99   | -> 3.48 ppm  | -10.4% |  +0.11% | +3.6%
  TMT       | 20 ppm | 2.05   | -> 6.67 ppm  | -18.0% |  -2.05% | +1.3%
  PXD001819 |  5 ppm | 2.15   | no-tighten   |   ~0%  |  +0.17% | +0.5%

Pattern: Phase B wins when calibrated sigma is materially smaller than
user window (Astral, TMT); safely no-ops when sigma is comparable to
window (PXD001819). TMT -2.05% target drift is a known yellow flag for
broader rollout; mitigations (instrument-aware k, stricter
stratification) noted in the doc but out of scope for this commit.
After Phase B's three-dataset validation (Astral -10.4%, TMT -18.0%,
PXD001819 safe no-op), the next-experiments plan §5 calls for exact
prefix mass-interval pruning. This commit writes the design before any
code change so the implementation has a clear contract:

- Hook: between addResidue (extends prefix) and the variant-mass-fetch
  loop in DBScanner.dbSearch (around line 412)
- Bound: [prefixMinMass + R_min * minAaMass,
          prefixMaxMass + R_max * (maxAaMass + maxResidueModMass) + maxFixedTermModMass]
- Intersection test: pepMassSpecKeyMap.subMap(...) widened by max
  precursor tolerance; if empty, branch is dead -> break
- Telemetry-first: implement prune-counter only at Checkpoint 1, run on
  Astral, decide based on prune rate before adding the break
- Exact-by-construction: no recall risk, only bookkeeping-vs-savings
  trade-off

Acceptance: prune rate >=5%, Astral wall -5% vs Phase B baseline,
native counts bit-identical.

Kill: prune rate <1%, or wall flat despite pruning, or correctness
drift.

Composes with Phase B (commit aac389c): Phase B reduced matched_speckeys
per pairing call; Experiment 2 reduces the number of pairing calls.
Different attack surfaces; should stack.
…heckpoint 1+2 measured)

Plan-§5 exact prefix mass-interval pruning: at every SA-walk residue
extension in DBScanner.dbSearch, compute the reachable final-peptide-
mass interval and ask whether any spectrum window in pepMassSpecKeyMap
can intersect it. If not, the branch is dead.

The bound is exact by construction:
  reachableMin = prefixMinMass + R_min * minAaMassBound
  reachableMax = prefixMaxMass + R_max * maxAaMassBound
where R_min / R_max are the residue counts implied by minPeptideLength /
maxPeptideLength, and minAaMassBound / maxAaMassBound are cached from
aaSet.getAllAminoAcidArr() at DBScanner construction (every aa+mod
variant in the set, so any future residue's mass is bounded).

Two opt-in flags:
  -Dmsgfplus.experiment2Telemetry=true  -> count would-be prunes
  -Dmsgfplus.experiment2Pruning=true    -> actually break out of SA walk

Default OFF -> the bound is not computed, OFF-mode bit-identical to
dev-tip. Enabled as a developer diagnostic, mirroring PhaseBTelemetry
and -Dmsgfplus.useForkJoin patterns. Not exposed as CLI flags.

Astral measurements on remote pride-linux-vm.ebi.ac.uk:

  Run                          | Wall  | Targets | Decoys | E2 prune
  -----------------------------|-------|---------|--------|---------
  Phase B baseline (3 reps)    | 494s  | 89580   | 45292  | n/a
  OFF baseline (5 reps median) | 551s  | 89479   | 46792  | n/a
  phaseB + E2 telemetry only   | 571s  | 89580   | 45292  | 12.22%
  phaseB + E2 pruning ON       | 549s  | 89580*  | 45292* | 1.84%
  OFF + E2 pruning ON          | 611s  | 89479*  | 46792* | 1.76%
  (* = bit-identical to baseline -> exact-by-construction validated)

Verdict per plan §7 acceptance gates:
- Native target/decoy bit-identical: PASS (exact)
- Astral prune rate >= 5%: PASS (12.22% Checkpoint 1, 1.84% with break)
- Astral wall improves >= 5%: FAIL (regresses +11% in both
  Phase-B+E2-prune and OFF+E2-prune cases)

The bookkeeping cost (~40 ns per prefix evaluation, ~1.4B
evaluations on Astral = ~55s) exceeds the savings from skipping
doomed branches. Pure cheap-score-loop savings are ~1-2s (the
pruned pairings had zero matches anyway -- mean_per_call near zero
for those). The +55s gap is the pure overhead.

Checkpoint 3 path forward (left as follow-on, NOT in this commit):
- Cache globalMinSpecMass / globalMaxSpecMass once per task; short-circuit
  the bound test for prefixes whose reachable interval falls outside
  the global range -> avoids the TreeMap.subMap call (~150ns) on most
  prefix evaluations.
- Skip bound evaluation when peptideLengthIndex < minPeptideLength + N
  (N TBD); the prune-rate-by-length distribution would tell us where
  short-prefix evaluations are dead weight vs useful.
- Avoid per-prefix grid scan for prefix min/max mass; cache
  incrementally as variants are added.

Code stays in dev as opt-in scaffolding. Phase B remains the
iteration's shippable Astral wall lever (-10.4% measured median).

Files:
- src/main/java/edu/ucsd/msjava/msdbsearch/Experiment2Telemetry.java (new)
- src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java (cached aa-mass
  bounds + bound-test hook)
- src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java (telemetry summary)
- src/test/java/edu/ucsd/msjava/msdbsearch/TestExperiment2Telemetry.java (5 tests)
Checkpoint 2 measurement on Astral remote shows the bound test is
exact-by-construction (native counts bit-identical) and prunes 1.84%
of prefix evaluations after break — but the per-prefix bookkeeping
cost (~40 ns × 1.4 B = ~55 s) exceeds the savings (the pruned
pairings had zero matches, so saved cheap-score work is ~1-2 s).
Wall regresses +11% vs Phase B alone.

Code lands as opt-in scaffolding (commit 4241fbb), defaults OFF, so
OFF-mode is bit-identical to dev-tip. Future agents can pursue
Checkpoint 3 (cache global spec-mass range short-circuit; skip bound
test for short prefixes; incremental grid-mass caching) without
rebuilding the scaffolding.
…ed double[]

The Checkpoint 2 measurement showed Experiment 2 pruning was correct
(native counts bit-identical) but regressed wall +11 % because the
TreeMap.subMap(low, high).isEmpty() bound test allocated a view object
and walked at least one entry per call (~150 ns × 1.4 B evaluations =
~210 s of pure overhead).

Replace with ScoredSpectraMap.hasSpecMassInRange(double, double):
- Lazily materialise a sorted double[] from pepMassSpecKeyMap.keySet()
  on first call (the source map is read-only after preProcessSpectra
  finishes, so a cached snapshot is safe).
- Use Arrays.binarySearch + a single-element range check (~30 ns).
- 5x cheaper per call; no allocation per query.

Astral measurement on remote pride-linux-vm.ebi.ac.uk (3 runs each
config, all native counts bit-identical to baseline = exact-by-
construction validated):

  Run                          | TreeMap | binary-search | gain
  -----------------------------|---------|---------------|-------
  Phase B + E2 telemetry only  | 571 s   |     531 s     | -40 s
  Phase B + E2 pruning ON      | 549 s   |     511 s     | -38 s
  OFF + E2 pruning ON          | 611 s   |     559 s     | -52 s

Verdict against plan §7 wall gate (>= 5 % improvement vs Phase B
baseline 494 s):
- Phase B + E2 pruning: 511 vs 494 = +3.4 % (still slight regression)
- OFF + E2 pruning: 559 vs 551 = +1.5 % (break-even within noise)

The optimization closes ~75 % of the previous overhead but the
remaining ~17 s gap on Phase B-tightened data isn't covered by the
modest savings (matched_speckeys per skipped pairing is small once
Phase B has already narrowed the window from 10 ppm to 3.48 ppm).

Code stays as opt-in (-Dmsgfplus.experiment2Pruning=true). Default
remains OFF -> bit-identical to dev-tip on dev-tip OFF mode + Phase
B's calibration shift on AUTO. Phase B (commit aac389c and ancestors)
is the iteration's durable Astral wall lever (-10.4 %).

Future Checkpoint 4 paths (left as follow-on, not in this commit):
- Skip bound test for short prefixes (peptideLengthIndex below some
  threshold N where prunes are statistically rare)
- Cache prefixMin/Max incrementally instead of scanning grid variants
  on every evaluation (~10 ns saved per call)
Binary-search optimization closes ~75 % of the Checkpoint 2 overhead
gap. Phase B + E2 pruning narrows from +11 % to +3.4 % wall regression
vs Phase B alone. OFF + E2 pruning narrows from +11 % to +1.5 % vs OFF
(break-even in noise). Still misses the +5 % wall gate; Phase B stays
the iteration's deliverable.
…deLength (Checkpoint 4)

Move the Experiment 2 mass-interval pruning hook from before the
'peptideLengthIndex < minPeptideLength' continue to after it. Short prefixes
(length < minPeptideLength) have huge reachable intervals — R_max * maxAaMass
is many kDa wide — so the bound test almost never prunes there but still pays
the binary-search bookkeeping. Skipping that range trims ~25-30% of bound-test
evaluations.

Also simplifies reachableMin = prefixMin once we're past minPeptideLength
(R_min collapses to 0).

Exact-by-construction property preserved: the moved test is still a sound
necessary condition on extending the prefix to a viable final peptide. OFF mode
unaffected. Off by default (still gated on -Dmsgfplus.experiment2Pruning=true).
…pped, bench pending

Records the Checkpoint 4 code change (commit 8478651): bound test now skipped
for prefixes shorter than minPeptideLength, where the reachable interval is
too wide to prune anyway. Tests pass; Astral wall measurement pending remote
socket re-establishment. Math says best-case is +3-7s gap vs Phase B alone —
still inside noise, not a clear ≥5% graduation. Verdict updated to: four
checkpoints, none clear the gate; Phase B is the durable lever.
…-2.27% Astral wall

Replaces the "bench pending" Checkpoint 4 entry with the actual 5-trial
interleaved confirmation:

  phaseB_only:    mean 519.8s, σ 6.06s (522, 518, 517, 529, 513)
  phaseB+e2:      mean 508.0s, σ 4.85s (504, 507, 514, 503, 512)
  Δ = 11.8s = -2.27% vs Phase B alone
  5/5 trials phaseB+E2 < phaseB_only; Welch's t ~3.4 (p ~ 0.01)
  Native counts bit-identical 89580T / 45292D across all 10 runs

The effect is real and statistically significant but below the plan's >=5%
graduation gate. Experiment 2 stays opt-in (-Dmsgfplus.experiment2Pruning=true);
Phase B (-10.4% Astral wall vs OFF) remains the durable lever.
…real improvement

The Astral 5-trial bench showed:
  - Phase B (calibrated precursor-window tightening): -10.4% wall vs OFF, durable
    win, validated on 3 datasets (Astral / TMT / PXD001819). KEPT.
  - Experiment 2 (mass-interval pruning, Checkpoints 1-4): real -2.27% on top of
    Phase B but below the >=5% default-on gate. Runtime scaffolding REMOVED;
    retrospective stays in .claude/plans/experiment-2-mass-interval-pruning.md.
  - Catalog-backed scanner (MassIndexedPeptideCatalog + CatalogDBScanner):
    n=1 trial showed 4.56x slowdown (2186s vs 479s) with bit-identical native
    counts. Root cause: scanner iterates all 7M entries linearly and allocates
    a fresh CandidatePeptideGrid per entry instead of querying the mass slabs
    the catalog builds. REMOVED in current form; would need a spectrum-major
    redesign + grid pool to be viable.

Removed:
  - src/main/java/edu/ucsd/msjava/msdbsearch/CatalogDBScanner.java
  - src/main/java/edu/ucsd/msjava/msdbsearch/MassIndexedPeptideCatalog.java
  - src/main/java/edu/ucsd/msjava/msdbsearch/MatchScanner.java
  - src/main/java/edu/ucsd/msjava/msdbsearch/PeptideCatalogSource.java
  - src/main/java/edu/ucsd/msjava/msdbsearch/Experiment2Telemetry.java
  - src/main/java/edu/ucsd/msjava/msdbsearch/PhaseBTelemetry.java
  - tests for the above

Reverted to HEAD (catalog source-awareness):
  - DatabaseMatch (no protein-source list)
  - MSGFPlusPSMSet (no source-aware decoy decision)
  - DirectPinWriter / DirectTSVWriter (no catalog source path)
  - CompactFastaSequence (no protein-entry digest helper)
  - ConcurrentMSGFPlus (no catalog scanner construction)
  - TestConcurrentMSGFPlus (revert null arg)

Cleanup kept:
  - MSGFPlus: drop Phase B / Experiment 2 telemetry print blocks
  - DBScanner: drop Experiment 2 prefix-mass bound hook (and its bookkeeping)
  - ScoredSpectraMap: drop hasSpecMassInRange (Experiment 2 binary-search helper)
  - MassCalibrator: drop unused -Dmsgfplus.calibrationDebugTsv emitter

Phase B remains untouched: MassCalibrator iso=0 pre-pass + outlier filter +
spec_eValue stratification, and MSGFPlus tightening at min(userPpm, max(floor,
k*sigma + margin)) with k=3, floor=2 ppm, margin=0.5 ppm. Branch state:
51/51 scoped tests pass.
CLAUDE.md flags as a footgun: pre-passes that mutate shared Spectrum objects
cause silent PSM-count drift in the main pass. The calibration pre-pass at
MassCalibrator#collectResiduals invoked ScoredSpectraMap#preProcessSpectra,
which calls spec.setCharge(charge) and scorer.getScoredSpectrum(spec) on
objects from the shared SpectraAccessor — exactly the warned pattern.

Fix:

- ScoredSpectraMap gains an opt-in builder isolateSpectrumState() that
  routes all spectrum-mutating sites through a new prepareSpectrumForScoring
  helper. Default behavior (main search path) is unchanged: in-place
  setCharge, no allocation.
- When isolation is enabled, each call clones the Spectrum (precursor +
  per-peak deep clone via getCloneWithoutPeakList + Peak.clone), applies
  setCharge to the clone only, and scores the clone. The shared Spectrum
  cached on SpectraAccessor is untouched.
- MassCalibrator's pre-pass opts in:
    new ScoredSpectraMap(...).isolateSpectrumState()
- TestScoredSpectraMapIsolation pins the contract: default path returns the
  same instance and propagates setCharge; isolated path returns a different
  instance, leaves the original's charge unchanged, and applies setCharge to
  the clone.

Validation: 3-dataset x 3-trial x 2-arm bench on the isolated machine
(pride-linux-vm.ebi.ac.uk):

  Astral      OFF  508.7s +/- 5.0s  T=89479 D=46792
              AUTO 469.0s +/- 9.2s  T=89580 D=45292   wall -7.8%, +101 T
  TMT         OFF  248.0s +/- 6.6s  T=28790 D=14768
              AUTO 240.3s +/- 1.5s  T=28201 D=14285   wall -3.1%
  PXD001819   OFF  101.3s +/- 0.6s  T=28037 D=11022
              AUTO 100.3s +/- 2.1s  T=28084 D=10985   wall -1.0%, +47 T

Native target/decoy counts are deterministic across all 3 trials per arm
(zero per-trial variance), confirming the isolation prevents the
shared-state mutation that would otherwise produce silent drift. OFF-mode
counts on each dataset match the dev baseline exactly.
…diff)

These directories are out of scope for the precursor-window-tightening PR.
Restore them to match dev exactly so they no longer appear in the PR's
cumulative diff against dev. After this commit:

  git diff origin/dev..HEAD -- .claude/plans benchmark/ci

returns empty.

Removed (branch-only additions, never on dev):
  .claude/plans/SHIPPED.md
  .claude/plans/astral-next-experiments.md
  .claude/plans/astral-phase-a-retrospective.md
  .claude/plans/astral-speed-5x-roadmap.md
  .claude/plans/experiment-2-mass-interval-pruning.md

Restored to dev's version (branch had not picked up these dev additions):
  .claude/plans/parameter-modernization-flag-inventory.md
  .claude/plans/parameter-modernization.md
  .claude/plans/search-sync-cleanup.md

Restored to dev's version (branch had local edits):
  .claude/plans/README.md
  benchmark/ci/README.md
  benchmark/ci/PXD001819/baseline.tsv
  benchmark/ci/PXD001819/extract_metrics.py
  benchmark/ci/PXD001819/run_ci.sh
  benchmark/ci/PXD001819/test_compare_metrics.py
…operties

Adds two override knobs for the calibration pre-pass:

  -Dmsgfplus.maxSampled=<int>          (default 500)
  -Dmsgfplus.minConfidentPsms=<int>    (default 200)

Both fall back to the historical defaults on unset/non-numeric/non-positive
values, so existing behavior is unchanged. Internal constants moved from
private to public DEFAULT_* so callers and tests can reference them.

CalibrationStats.hasReliableStats() no longer hard-codes the
MIN_CONFIDENT_PSMS threshold; the calibrator now emits confidentPsmCount=0
as the "unreliable, do not apply" sentinel and hasReliableStats checks
count > 0. Behavior is identical for all existing callers.

Defaults intentionally unchanged. A 9-trial Astral A/B (3 trials each at
minConfidentPsms = 200 / 500 / 1000, all with maxSampled=3000) showed:

  config            sigma ppm   tighten ppm   wall s    targets   decoys
  (500,  200) HIST  0.994       3.482         469.0     89580     45292
  (3000, 200)       0.454       2.000         470.3     89249     44538
  (3000, 500)       0.511       2.033         471.7     89212     44555
  (3000, 1000)      0.729       2.687         469.7     89524     44817

Key findings:
  - Wall time is statistically equivalent across all four configurations
    (~470 s, well inside run-to-run sigma of ~25 s).
  - Sigma DECREASES as minConfidentPsms DROPS at fixed maxSampled
    (0.454 < 0.511 < 0.729 ppm for 200 < 500 < 1000). Confirms the prior
    that stratification depth, not pool size, drives sigma quality.
  - Lower minConfidentPsms produces an over-tight window: (3000, 200)
    hits the 2.0 ppm floor and loses 331 native targets vs the historical
    (500, 200) default.
  - No configuration tested beats the historical (500, 200) defaults end
    to end, which were already validated on 3 datasets in PR #26.

Conclusion: ship the configurability for users who want to tune for
specific instruments / file sizes, but do not change the defaults.

26 unit tests verify property parsing (default fallback, valid int, trim,
non-numeric rejection, non-positive rejection, exported constant values).
Phase B: calibrated precursor-window tightening (-7.8% Astral wall)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants