Conversation
…Java 17 - Change skipTests from true to false so mvn test actually runs - Update maven-compiler-plugin source/target from 1.8 to 17 (matches runtime) - Add missing compile dependencies: jmzml 1.7.11, fastutil 8.5.12, slf4j-api 1.7.36, logback-classic 1.2.12, commons-io 2.15.1 (master code references these classes but they were not declared) - @ignore TestMzML test that requires Windows-specific DMS files Result: 120 tests run, 53 active, 67 skipped, 0 failures Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…shold In MZIdentMLGen.addSpectrumIdentificationResults(), change `break` to `continue` when a match has DeNovoScore below the minimum threshold. The `break` was incorrectly stopping emission of all subsequent matches for that spectrum, silently dropping valid PSMs from the mzid output. Also add null safety check for spectrum index lookup — if a spectrum index is not found in the spectrum file, log a warning and skip instead of throwing a NullPointerException. Add TestMZIdentMLGen with two integration tests: - testMzidScoreCompleteness: runs MSGF+ search, verifies every SII has all 4 score CVParams (RawScore, DeNovoScore, SpecEValue, EValue) - testMzidStructuralValidity: verifies output mzid has required mzIdentML structure elements Closes MSGFPlus#157 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add new -msLevel CLI parameter to filter spectra by MS level. Accepts single value (e.g., -msLevel 2) or comma-separated range (e.g., -msLevel 2,3). Default is 2 (MS2 only). Changes: - ParamManager: add MS_LEVEL enum and registration - IntRangeParameter: enable single-value parsing, fix typo - SearchParams: add minMSLevel/maxMSLevel fields - SpecKey: filter spectra by MS level in getSpecKeyList() - SpectraAccessor: add setMSLevelRange(), wire to parsers - MzMLAdapter/MzXMLSpectraMap: fix maxMSLevel to be inclusive - MSGFPlus/MSGFDB/MSGFDBLib: wire MS level parameters - pom.xml: remove fastutil shade filter (jmzml 1.7.11 needs full fastutil) Tests: TestIntRangeParameter (9 tests), TestMSLevelFiltering (6 tests) Benchmark (TMT 1.1GB, TDA): Baseline: 1245s, 6654 PSMs@1%FDR -msLevel 2: 957s (-23%), 6936 PSMs@1%FDR (+4.2%) Closes MSGFPlus#159 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat(MSGFPlus#159): add -msLevel parameter for MS level filtering
fix(MSGFPlus#157): preserve PSM scores when DeNovoScore is below threshold
fix: enable test suite and fix broken build dependencies
Remove standalone scripts, legacy tools, and unused classes that are not referenced by the core MSGF+ search pipeline, reducing codebase by ~22,000 lines. Deleted entire packages: - ims/ (9 files) — legacy IMS utilities - ipa/ (5 files) — unused isotope pattern analysis - msgf2d/ (8 files) — abandoned 2D scoring experiment - msdictionary/ (7 files) — unused genome dictionary tool - mstag/ (3 files) — unused sequence tagging - scripts/ (6 files) — standalone CLI utilities - msutil/test/ (3 files) — misplaced test classes - msgf/test/ (2 files) — legacy test stubs - msgf/analysis/ (1 file) — unused ROC generator Cleaned mixed packages: - misc/: removed 59 standalone scripts, kept 5 core utilities - msgf/: removed 6 unused graph/scoring classes - msutil/: removed 9 unused filter/annotation classes - msdbsearch/: removed 4 standalone DB tools - parser/: removed 9 legacy format parsers (InsPecT, Mascot, etc.) - ui/: removed 6 legacy entry points (MSGF, MSGFLib, etc.) - mzid/: removed 1 unused adapter stub - msscorer/: removed 1 unused stats class - suffixarray/: removed 1 unused mass array class Also removed dead test methods and cleaned dangling imports. Tests: 119 run, 0 failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite README.md: - Full parameter reference tables covering all 30+ flags organized by category (core search, fragmentation, enzyme, filtering, etc.) - Quick start examples for basic and TMT searches - Modification file format documentation with examples - Build-from-source instructions - Updated requirements to Java 17+ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add parameter docs to README and CI/CD workflow
Remove dead code: 150 unused classes, -22K lines
Write search results directly to TSV from in-memory objects, bypassing mzIdentML serialization. Output is column-identical to MzIDToTsv (verified by diff on test.mgf search). This avoids generating large .mzid files when only TSV is needed downstream (e.g. OpenMS MSGFPlusAdapter, Percolator). - New DirectTSVWriter class with same score/protein/mod logic as MZIdentMLGen but streaming tab-delimited output - New -outputFormat parameter: 0=mzid (default), 1=tsv, 2=both - Includes fixed + variable mods, MGF Title column, decoy filtering - Backwards compatible: default remains mzid Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When -addFeatures 1 is used with -outputFormat tsv, the TSV now includes all PSMFeatureFinder columns needed for Percolator: ExplainedIonCurrentRatio, NTermIonCurrentRatio, CTermIonCurrentRatio, MS2IonCurrent, MS1IonCurrent, IsolationWindowEfficiency, NumMatchedMainIons, and all error statistics (MeanError/StdevError for All and Top7, both absolute and relative). These features were previously only available as UserParams in mzid and were not extracted by OpenMS's addMSGFFeatures() — now they are directly accessible as TSV columns. The peptide modification format (M+15.995) is already compatible with OpenMS MSGFPlusAdapter's modifySequence_() converter which transforms it to bracket notation M[+15.995] for AASequence. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the jmzml JAXB-based MzMLUnmarshaller with a lightweight StAX streaming parser that extracts only the 11 fields MSGF+ needs. The new parser builds a spectrum index in a single pass, then preloads all spectra into memory on first random access, eliminating repeated XML parsing during the search phase. Benchmark (TMT 1.1GB mzML, target-decoy, 4 threads): - Wall time: 957s -> 853s (-10.9%) - PSMs at 1% FDR: 6,936 (unchanged) - Score completeness: 100% (unchanged) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…port - Remove jmzml (JAXB-based mzML parser) dependency from pom.xml - Delete old jmzml-dependent classes: MzMLAdapter, MzMLSpectraMap, MzMLSpectraIterator, SpectrumConverter - Add referenceableParamGroupRef resolution to StaxMzMLParser: builds a map of param groups during index pass, resolves refs during spectrum parsing (critical for files that define polarity, MS level, etc. in referenceable groups) - Move turnOffLogs() utility to StaxMzMLParser, update all callers - Keep fastutil dependency (needed by jmzidml at runtime) JAR size reduced from 39.5MB to 38MB. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jmzReader (uk.ac.ebi.pride.tools:jmzreader:2.0.6) had zero imports anywhere in the codebase — a dead dependency from earlier development. All spectrum file format parsing uses custom implementations: mzML (StaxMzMLParser), mzXML (embedded jrap/stax), MGF/MS2/PKL (custom parsers). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove mzXML file format support entirely: - Delete embedded jrap/stax library (20 files, ~5,800 lines) - Delete MzXMLSpectraMap, MzXMLSpectraIterator, MzXMLToMgfConverter - Delete MzXMLToMgf utility and mzXML test resources (38MB) - Remove MZXML from SpecFileFormat enum, SpectraAccessor, ParamManager - Update misc/scripts/ui classes to remove mzXML code paths mzXML is a legacy format superseded by mzML. Users with mzXML files can convert to mzML using msconvert (ProteoWizard). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- StaxMzMLParser: use ConcurrentHashMap for thread-safe spectrum cache, fix class-level doc (preload-all, not bounded LRU), check index before preloading, propagate exceptions instead of returning null - StaxMzMLSpectraIterator: throw NoSuchElementException when exhausted - SpectraAccessor: throw exception instead of System.exit(-1), validate specFormat is non-null in constructor - SelectSpectra: update stale .mzXML reference to .mzML - pom.xml: fix duplicate <manifest>, remove stale comments, note fastutil is required by jmzidentml at runtime Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Write search results directly to TSV from in-memory objects, bypassing mzIdentML serialization. Output is column-identical to MzIDToTsv (verified by diff on test.mgf search). This avoids generating large .mzid files when only TSV is needed downstream (e.g. OpenMS MSGFPlusAdapter, Percolator). - New DirectTSVWriter class with same score/protein/mod logic as MZIdentMLGen but streaming tab-delimited output - New -outputFormat parameter: 0=mzid (default), 1=tsv, 2=both - Includes fixed + variable mods, MGF Title column, decoy filtering - Backwards compatible: default remains mzid Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When -addFeatures 1 is used with -outputFormat tsv, the TSV now includes all PSMFeatureFinder columns needed for Percolator: ExplainedIonCurrentRatio, NTermIonCurrentRatio, CTermIonCurrentRatio, MS2IonCurrent, MS1IonCurrent, IsolationWindowEfficiency, NumMatchedMainIons, and all error statistics (MeanError/StdevError for All and Top7, both absolute and relative). These features were previously only available as UserParams in mzid and were not extracted by OpenMS's addMSGFFeatures() — now they are directly accessible as TSV columns. The peptide modification format (M+15.995) is already compatible with OpenMS MSGFPlusAdapter's modifySequence_() converter which transforms it to bracket notation M[+15.995] for AASequence. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace ConvertToMgf-based tests (class removed in PR #7) with StaxMzMLParser and SpectraAccessor mzML parsing tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: native TSV output — bypass mzIdentML for OpenMS/Percolator pipelines
perf: replace jmzml JAXB parser with StAX-based mzML reader
* chore: add CI/release packaging and benchmark scaffolding Split infra and repository maintenance updates into a dedicated reviewable change set, including workflow automation, Docker packaging, benchmark scripts/docs, and project documentation updates. Exclude large local benchmark artifacts and keep this PR focused on non-hot-path code organization and release hygiene. Made-with: Cursor * chore: keep benchmark folder local-only Remove benchmark scripts/docs from this branch and ignore the entire benchmark directory so local benchmarking assets do not appear in review PRs. Made-with: Cursor * docs: keep single canonical primitives plan Fold memory-reduction guidance into the balanced primitives plan and remove the old duplicate plan file so review and maintenance use one canonical document. Made-with: Cursor * chore: narrow PR1 plans to scope-only docs Remove unrelated strategy and optimization plan documents from PR1 so this branch stays focused on infra/packaging cleanup. Keep only the plans index file in this PR. Made-with: Cursor * chore: remove legacy ZippedReleases folder Delete the obsolete Windows release helper scripts and reference files under ZippedReleases from the repository. Made-with: Cursor * chore: remove legacy extlib dependency jar Delete the obsolete jrap/stax legacy jar under extlib as part of repository cleanup. Made-with: Cursor * fix: address copilot review feedback for PR11 Align docs with actual supported legacy formats, update release pipeline to build from tag version with tests, and fix Docker build JDK requirement. Made-with: Cursor * chore: minor packaging/docs hygiene for PR1 Normalize ignore files, shrink Docker build context, align agent README with dev/CI, and clarify release workflow step naming. Made-with: Cursor * docs: trim examples folder to small referenced artifacts Remove duplicate Tryp_Pig_Bov DB/index copies (tests use src/test/resources), drop large unlinked Excel/PNG teaching files, and add docs/examples/README.md so the directory purpose is obvious. Link the index from the main README. Made-with: Cursor * chore: remove IntelliJ IDEA tips screenshots from docs Made-with: Cursor * docs: replace legacy HTML manuals with Markdown Convert docs/*.html to GitHub-flavored Markdown (pandoc), fix internal links, add docs/README.md as the documentation index, and remove unused style.css. Made-with: Cursor * docs: strip leftover HTML span wrappers from converted Markdown Made-with: Cursor
Add a workflow-dispatch benchmark pipeline on a fixed self-hosted runner profile, with public-data download, metrics emission, and baseline TSV comparison under benchmark/ci/PXD001819 for future dataset expansion. Made-with: Cursor
Use uppercase PXD001819 naming in workflow-visible labels/artifacts and update README to state mzXML is not available in this fork. Made-with: Cursor
Made-with: Cursor
- run_ci.sh: count only opening <SpectrumIdentificationItem> tags for sii_count (prior substring match double-counted closing tags and picked up SpectrumIdentificationItemRef) - run_ci.sh: always emit peak_rss_kb and cpu_percent (NA when GNU time does not expose them) so metrics file format is consistent - compare_metrics.py: support an `optional` column; optional missing/NA metrics warn instead of failing CI - baseline.tsv: add optional column, mark peak_rss_kb optional, fix ubuntu-latest note to reference the self-hosted runner, widen sii_count floor to match the de-duplicated count - README pointers: update stale references to a non-existent benchmark/run_pxd001819_benchmark.sh script - benchmark/README.md: describe the actual committed CI scaffold instead of an uncommitted local harness layout Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- extract_metrics.py: stream-parse mzIdentML with ElementTree.iterparse so SII counting and PSM 1% FDR counting no longer rely on line-shaped regex matches over XML - run_ci.sh: use a bash array for SEARCH_ARGS (safe against future flags with spaces), atomic .part downloads, validate cached gzip, default MSGFPLUS_THREADS to 8 to match the workflow, drop the always-zero java_exit metric, and emit integer wall_time_sec - workflow: pin Python via actions/setup-python@v5 so self-hosted runners have a known 3.11 interpreter for the helper scripts - compare_metrics.py: add test_compare_metrics.py covering in-range pass, out-of-range fail, missing required/optional, NA, non-numeric, and empty-range rows (7 tests, all passing) - .gitignore: drop redundant benchmark/** patterns (already covered by benchmark/* + ci/ allowlist); add __pycache__/ and *.pyc - docs: describe new helper and test scripts in both READMEs
fix(benchmark): harden PXD001819 scaffold per review feedback
Two follow-ups from the PR #25 review thread: 1. BOM strip in BufferedLineReader (Copilot finding). The constructor wrapped the FileInputStream in UnicodeBOMInputStream but never called skipBOM() -- BOM bytes leaked into line 1, breaking config / mod / FASTA files saved by editors that prepend a UTF-8 BOM (Windows Notepad, some VS Code configurations, etc.). Fix: chain .skipBOM() in the super(...) call. The BOM-stripping path is per-instance; the wrapper still detects the BOM in its constructor, we just now consume it. New test BufferedLineReaderTest pins the contract: a file with a UTF-8 BOM followed by `ParentMassTolerance=20ppm` returns the bare key=value as line 1 (not `ParentMassTolerance=20ppm`), and a no-BOM file is unchanged. 2. Delete unused MSGFResult.java. Verified zero callers in src/main and src/test before removal. The reviewer flagged this as a small follow-up to land alongside the modernization sweep. Verified: scoped sweep (BufferedLineReaderTest, MSGFPlusOptionsConfigFileTest, MSGFPlusOptionsActivationMethodTest, SearchParamsTest, TestPrecursorCalScaffolding, TestRunManifestWriter, TestDirectPinWriter, TestMSUtils, TestMisc, TestMinSpectraPerThread): 55 tests, 0 failures, 0 errors, 2 skipped.
Convert eight value-shaped types to Java records, per the audit: small immutable carriers with no inheritance constraints, where the record's auto-generated equals/hashCode/toString and accessor methods replace hand-rolled boilerplate. No behavioral change; net trim of ~80 LOC plus a clearer surface. Converted: - cli.IntRange -- record IntRange(int min, int max). Compact constructor keeps the min<=max validation. parse() and Converter (picocli ITypeConverter) preserved. - cli.PrecursorTolerance -- record PrecursorTolerance(Tolerance left, Tolerance right). Compact constructor enforces the matching-unit and non-negative invariants in-place; parse() / Converter retained. - msutil.CvParamInfo -- record CvParamInfo(String accession, String name, String value, String unitAccession, String unitName). The stored hasUnit field is gone; hasUnit() is now derived from unitAccession != null. Compatibility getters (getAccession(), getValue(), getUnitAccession(), getUnitName(), getHasUnit()) keep existing call sites untouched. - msutil.Annotation -- record Annotation(AminoAcid prevAA, Peptide peptide, AminoAcid nextAA). Setters (which had no live callers) removed; the parsing constructor delegates via this(...). isProteinNTerm/isProteinCTerm and the dotted toString are kept. - msgf.ProfilePeak<T extends Matter> -- record. Setters were unused; Comparable<ProfilePeak<T>> is preserved. - msutil.Atom -- record Atom(String code, double mass, int nominalMass, String name). Static atomArr table + atomMap + static initializer block all kept verbatim. getCode() / getName() / getMass() / getNominalMass() retained as compatibility wrappers. - msdbsearch.CompactSuffixArray.RangeMetadata -- record (4 file/int fields). Three call sites updated from md.numEntries to md.numEntries() etc. - msgf.MassListComparator.MatchedPair<T extends Matter> -- record. getMass1/getMass2 retained as compatibility wrappers. Call-site updates in SearchParams.parse: tol.left/.right, isotope.min/.max, specIdx.min/.max, ms.min/.max all switched to record accessor methods (.left()/.right()/.min()/.max()). Verified: scoped sweep (incl. BufferedLineReaderTest, MSGFPlusOptionsConfigFileTest, MSGFPlusOptionsActivationMethodTest, SearchParamsTest, TestPrecursorCalScaffolding, TestRunManifestWriter, TestDirectPinWriter, TestMinSpectraPerThread, TestMSUtils, TestSA, TestMisc, TestCandidatePeptideGrid + ConsideringMetCleavage): 80 tests, 0 failures, 0 errors, 3 skipped.
perf(search): instrumentation + dead-sync removal + per-task buffers + ForkJoinPool toggle
… SHIPPED.md - Add astral-speed-improvements.md design (gate B: 1.3-1.5x Astral wall, no PSM regression). TMT-as-inner-loop, Astral-as-phase-gate workflow. - Add SHIPPED.md short retrospective covering PRs #15-#22 (primitives + Achievements A+B), PR #23 (speed-v2 + output consolidation), PR #24 (Astral OOM + BuildSA scaling), PR #25 (search-sync-cleanup + parameter-modernization), and the abandoned fragment-index experiment. - Remove search-sync-cleanup.md, parameter-modernization.md, and parameter-modernization-flag-inventory.md — work shipped, content folded into SHIPPED.md. - Update README.md to point at the new active/history layout.
…ng model Replace the short-horizon astral-speed-improvements.md (drafted on this branch in 878b0cb) with the longer-horizon astral-speed-5x-roadmap.md as the single active design. Shipping-model change (new §0 in the roadmap): this iteration ships as milestone commits on feat/astral-speed-improvements with one closing PR at the end, not a PR per phase. Each phase milestone commit follows the template: feat(astral-speed): MILESTONE Phase <id> — <one-line achievement> <2-4 lines: TMT inner-loop wall delta, Astral phase-gate result, recall delta, RSS delta if any> The small-wins from the deleted plan (graph-skeleton memoization in PrimitiveAminoAcidGraph + Tier-1.5 GF candidate cap in DBScanner.computeSpecEValue) are retained as the §0 "Iteration 0.5 fallback" — used only when a big-win phase fails its kill gate. Original short-horizon plan recoverable via: git show 878b0cb:.claude/plans/astral-speed-improvements.md
…reset Phase A (in-engine MS2 deisotoping + dense-peak retention cap + GF candidate cap + scorer hot-path optimization) was attempted on this branch from 2026-04-27 to 2026-04-28. None of three independent optimization angles moved Astral wall above run-to-run noise; TMT's 1.41x apparent win came with -0.25% target and -4.6% decoy drift, which is not clean enough to justify shipping the surface area. Branch was hard-reset to eee9fa6 (the 5x roadmap state before any Phase A work) to drop 11 code commits + their uncommitted in-progress extensions. This commit captures the empirical findings as a retrospective doc so future agents can read the measurement data, understand why Phase A failed Astral, and pick from Phase B/C/E or workload-retargeting alternatives without repeating the attempt. - Add .claude/plans/astral-phase-a-retrospective.md with six Astral measurements, two TMT measurements, lessons (TMT-not-Astral-proxy, JIT-already-optimizes, native-count-drift-is-leading-indicator), and a concrete list of what's still untried. - Update astral-speed-5x-roadmap.md section 11 + 12 to flag Phase A as attempted-and-reverted, with original recommendation preserved as a blockquote for context. - Update SHIPPED.md "Abandoned" section with a Phase A entry pointing at the retrospective. - Update README.md "Active" list to surface the retrospective. Reverted code is recoverable via: git show 5cdd21e:src/... and walks back through 11 commits (b78e275 through 5cdd21e).
…nder noise Follow-up to Phase A iteration. Tested ForkJoin pool as a smart-default on Astral after thread-scaling sweep showed default ThreadPoolExecutor anti-scales past 6 threads. Findings: - Default executor: 4t=690s, 6t=675s, 8t=884s (+28% regression at 8t). Anti-scaling at 8t reproduces across measurements; this is real. - ForkJoin opt-in: 4t=872s (regression), 8t=521s (1.32x vs 4t). Pool choice is thread-count dependent. - Smart-default change implemented (numThreads >= 8 -> ForkJoin) and scope-tested (9/9 pass), but confirmation runs showed +65% wall on the same JAR vs the morning measurement. Machine state had degraded across hours of benchmarking; can't isolate ForkJoin win from noise. - Reverted. The underlying -Dmsgfplus.useForkJoin=true opt-in remains in dev unchanged. Lessons added to retrospective: - Need stable benchmark environment + multi-run statistics for any Astral wall-time claim. Single-measurement deltas at the 30% level are below the noise floor. - The default-executor anti-scaling at 8t IS worth fixing. Future attempt: jfr print --events jdk.JavaMonitorWait on 8t default-executor profile to localise the contention point in ThreadPoolExecutorWithExceptions. Files: .claude/plans/astral-phase-a-retrospective.md (added Phase E section + updated "what's untried" entry); .claude/plans/SHIPPED.md (added Phase E to "Abandoned"). Source code unchanged from eee9fa6 (verified via git status; MSGFPlus.java reverted before commit).
… not enough JFR analysis of the morning 4t Astral profile found zero JavaMonitorEnter contention events and 100% RUNNABLE samples, ruling out lock contention as the 8t regression cause. But 588K GCPhaseParallel events suggested GC parallel-phase coordination as a candidate. Tested with -Xmx16g (double heap) at 8t and 4t: - 8t-Xmx16g: 776s wall, 5067MB peak RSS, 182 GC pauses - 4t-Xmx16g: 870s wall, 6120MB peak RSS, 184 GC pauses Bigger heap helped 8t by ~12% wall and dropped peak RSS ~17% (G1GC ran fewer collections), but only ~4% at 4t. GC contributes to the 8t regression but is not the entire story. Even with Xmx16g, 8t (776s) is still slower than morning's 4t-Xmx8g baseline (690s). The afternoon's 4t-Xmx8g run (904s) vs morning's same configuration (690s) is +31% on the same JAR / same args / same machine — confirming the day's accumulated machine-state degradation dwarfs any code-level signal. Six hours of benchmarking has hit the noise floor. No actionable recommendation to ship. Source code stays at dev tip. The Phase E section in astral-phase-a-retrospective.md is updated with the GC-pressure findings.
…ere noise Replication batch (3 Astral runs back-to-back, 2026-04-29 morning, quieter machine): 4t default: 963.1s 8t default: 918.3s (FASTER than 4t — opposite of yesterday) 8t ForkJoin: 978.8s (no advantage) All three within 6.5% of each other. Native counts bit-identical. This proves: - The "default executor anti-scales past 6 threads" finding was wrong. It was a one-day correlation between morning-quiet-machine + 4t and afternoon-noisy-machine + 8t, not a real algorithmic relationship. - The 521s ForkJoin-8t measurement was an outlier. Three independent re-measurements (yesterday afternoon 861s, today 978s) put 8t-FJ at ~900s in normal machine state. - The 690s 4t-default baseline was an outlier in the other direction. Today's same-machine 4t-default was 963s. Conclusion: no Phase E shippable change exists on this benchmark setup. Wall-time variance from machine state (~30%) dwarfs any code-level signal we can produce on a single developer workstation. Updated retrospective with the corrected Phase E section + replication batch data. Updated SHIPPED.md Phase E entry to reflect the disproof. Source code stays at dev tip. End of iteration. Five doc commits stay on the branch as the deliverable.
The PXD001819 CI workflow (.github/workflows/benchmark-pxd001819.yml) has been silently broken since PR #23 removed mzIdentML output entirely. run_ci.sh wrote -o ci_output.mzid, MS-GF+ produced a .pin under that filename, and extract_metrics.py tried to parse it as mzIdentML. The CI never produced valid metrics. Switch the whole pipeline to .pin, which is the only modern output format: - run_ci.sh: -o $PIN (was -o $MZID); error message updated. - extract_metrics.py: rewritten parse_pin() counts native target / decoy rows by streaming the .pin (column 2 = label, {1, -1}) one line at a time. Drops the iterparse XML dependency. Replaces sii_count + psm_1pct_fdr metrics with native_target_count + native_decoy_count. - baseline.tsv: new metric names + ranges. Wide first-green-run windows (14000-35000 targets, 8000-20000 decoys); to be tightened after the first self-hosted runner produces consistent numbers. - test_compare_metrics.py: 7 existing comparator tests updated to use the new metric names; 3 new ParsePinTest cases covering label counts, empty input, and malformed rows. 10/10 pass. - benchmark/ci/README.md: documents the PIN-based pipeline, why native counts are the right CI gate (deterministic across runs; catches search-code regressions cleanly without Percolator dependency), and points future iterations at the multi-run protocol the astral-phase-a-retrospective.md asks for. The CI gate is now search-correctness (deterministic counts), not sensitivity (which needs Percolator + a separate gate). Sensitivity regressions surface via a downstream Percolator run on the .pin artifact — out of scope for this CI workflow. This is the meta-fix the Astral retrospective named as the next iteration's prerequisite: stable benchmarks must work before any speed-optimization claim is defensible.
…ication Phase B (calibrated precursor-window tightening) is already implemented in dev — MSGFPlus.runMSGFPlus:396-423 computes tightened tolerance via MassCalibrator.tightenedTolerancePpm() and passes it to per-task ScoredSpectraMap. What's missing per astral-next-experiments.md is the telemetry to verify it's actually shrinking the pairing fan-out. This commit adds PhaseBTelemetry, an opt-in counter class that records: - pairing_calls: number of times DBScanner.dbSearch hit the pepMassSpecKeyMap.subMap(leftThr, rightThr) site - matched_speckeys: total SpecKeys returned across those calls Enable with -Dmsgfplus.phaseBTelemetry=true. Off by default; OFF-mode behaviour is bit-identical (single load+branch when disabled). Not exposed as a CLI flag — this is a developer diagnostic for Phase B verification, mirroring the approach used for -Dmsgfplus.useForkJoin. At the end of search MSGFPlus.runMSGFPlus prints (when enabled): [Phase B telemetry] pairing_calls=N matched_speckeys=M mean_per_call=X.XX Files: - src/main/java/edu/ucsd/msjava/msdbsearch/PhaseBTelemetry.java (new) — thread-safe LongAdder counters, opt-in via system property, JVM-scoped. - src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java — record at DBScanner.dbSearch:489 (one line, guarded by enabled() short-circuit). - src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java — print summary after task wall summary (around line 587), only when enabled. - src/test/java/edu/ucsd/msjava/msdbsearch/TestPhaseBTelemetry.java (new) — 5 tests: counter math, mean-zero edge case, reset, 8-thread concurrency stress (80k ops total), system-property contract. Tests: 14/14 pass for the relevant scoped suite (TestPhaseBTelemetry + TestPrecursorCalScaffolding + TestConcurrentMSGFPlus); OFF-mode unchanged. Plan update: astral-next-experiments.md §4 annotated to note Phase B core is shipped + telemetry is now wired. Future agent runs the bench with -Dmsgfplus.phaseBTelemetry=true and the existing -precursorCal auto/on, then verifies §4.5 Phase B success gate (matched SpecKey count drops materially after tightening; Astral wall improves >=10%).
…uals Two bugs in MassCalibrator surfaced under Phase B telemetry: 1. The pre-pass propagated the user's -ti window (e.g. -1,2 on Astral) to the pre-pass DBScanner. Spectra whose precursor was selected as the M+1 or M+2 isotope produced 500-1000 ppm "residuals" (the 1.003 Da isotope shift on a 2 kDa peptide). These contaminated the median + MAD aggregation. Astral measured sigma=1166 ppm — useless for any tightening decision. Fix: hard-code the pre-pass to iso=[0,0]. The pre-pass exists to LEARN the shift, not to find every match; clean monoisotopic matches are what makes the residual distribution meaningful. 2. As belt-and-suspenders, residuals beyond +/-50 ppm are now rejected in extractResiduals(). Any modern instrument's true mass accuracy is well under 50 ppm; values above are almost always isotope-shift or charge-state mistakes. New constant MAX_REASONABLE_RESIDUAL_PPM=50. Drop the now-unused minIsotopeError / maxIsotopeError fields and constructor parameters from MassCalibrator. The single caller (MSGFPlus.runMSGFPlus) is updated. Verified on remote pride-linux-vm.ebi.ac.uk (Astral ProteoBench Module 8): Before fix: Precursor mass shift 1.398 ppm from 371 PSMs (sigma 1166 ppm) After fix: Precursor mass shift 0.978 ppm from 393 PSMs (sigma 3.987 ppm) The shift values are similar; the sigma is the dramatic signal that the residual distribution is now clean. 0.978 ppm is the typical Astral instrument bias; 3.987 ppm is the true post-calibration spread (still wider than expected; might reflect long-peptide tail). OFF-mode (precursorCal=off) is bit-identical to dev-tip on three back-to-back Astral runs (548/560/551s wall, all 89479/46792 native counts). 32/32 scoped tests pass (TestMassCalibrator, TestPhaseBTelemetry, TestPrecursorCalScaffolding). NOTE: this commit only fixes the calibrator's correctness. Phase B tightening still does not fire on Astral because the tightening formula k=3 * sigma + margin = 12.46 ppm exceeds the 10 ppm user window. Tuning k or rethinking the formula is a separate decision; see retrospective for the next-iteration discussion.
Phase B's tightening formula min(userPpm, max(floorPpm, k*sigma + marginPpm)) previously used hard-coded MassCalibrator.DEFAULT_TIGHTENED_WINDOW_* constants (k=3, floor=2 ppm, margin=0.5 ppm). Surface them as system properties so falsification sweeps can vary k without recompiling: -Dmsgfplus.tighteningSigmaMultiplier=<float> (default 3.0) -Dmsgfplus.tighteningFloorPpm=<float> (default 2.0) -Dmsgfplus.tighteningMarginPpm=<float> (default 0.5) Defaults unchanged; OFF-mode and PrecursorCal=AUTO production behaviour is bit-identical when no override is set. Verified on remote pride-linux-vm.ebi.ac.uk Astral ProteoBench Module 8 with the new -Dmsgfplus.tighteningSigmaMultiplier=2 override: | Run | Wall | Targets | Decoys | mean_per_call | |--------------------|-------|---------|--------|---------------| | OFF (baseline) | 538s | 89479 | 46792 | 0.77 | | AUTO @ k=2 rep1 | 537s | 89453 | 46685 | 0.63 | | AUTO @ k=2 rep2 | 531s | 89453 | 46685 | 0.63 | Tightening fires for the first time on Astral (10.000 ppm -> 8.474 ppm). Pairing fan-out drops 18% (mean_per_call 0.77 -> 0.63). matched_speckeys drops 15.4%. Native counts shift slightly (-26 targets, -107 decoys = -0.03% / -0.23%, within noise). Wall delta: 0 +/- 2% (within machine noise floor of three OFF replicates 538/560/551). Conclusion: Phase B's cheap-score-pairing reduction does NOT translate to Astral wall savings. The bottleneck is downstream of pairing (GF / scoring). Phase B remains a valid lever for workloads where calibrated sigma is small relative to user tolerance (typical of older / drifted instruments) but is not the next Astral wall lever. Next step: stratification analysis (Plan B) to determine whether a cleaner subset of Astral pre-pass PSMs has materially tighter sigma. If yes -> calibrator quality improves for workloads where Phase B DOES pay off, even if Astral itself doesn't benefit. If no -> Phase B is at its theoretical ceiling.
…FIDENT_PSMS
Plan B (post-falsification): the previous calibrator returned all 393
SpecEValue<=1e-6 PSMs to the median+MAD aggregator. Per-PSM stratification
analysis (using the calibrationDebugTsv emitter) on Astral revealed:
Stratum | n | sigma (ppm)
-----------------------------|-----|------------
All confident PSMs | 393 | 3.987
Charge=2 only | 355 | 2.923
Mass < 1 kDa | 289 | 2.470
TOP 50% by spec_eValue | 196 | 0.951
TOP 200 by spec_eValue | 200 | 0.994
The "best half by spec_eValue" axis is the dominant explanatory variable —
4x tighter sigma. The worst-half PSMs (eValue near the 1e-6 threshold)
contribute residual scatter, not real instrument-bias signal.
Fix: extractResiduals now collects (residual, spec_eValue) pairs, sorts
by spec_eValue ascending (most confident first), and keeps the top
MIN_CONFIDENT_PSMS (200) for the median+MAD aggregation. The
absolute-residual filter (|residual| > 50 ppm) stays as belt-and-suspenders.
Astral measurement on remote pride-linux-vm.ebi.ac.uk (5 OFF replicates +
3 stratified-AUTO replicates):
Run | Wall | Sigma | Window | Targets | Decoys | T/D
----------------------|-------|-----------|----------------|---------|--------|------
OFF baseline (median) | 551 s | - | 10 ppm | 89479 | 46792 | 1.91
AUTO stratified | 494 s | 0.994 ppm | 10 -> 3.48 ppm | 89580 | 45292 | 1.98
Median wall: -10.4% (gate was >=10%, hit)
Targets: +101 (+0.11%, NOT a regression)
Decoys: -1500 (-3.2%, FDR-favorable)
T/D ratio: +3.6% (the tighter window rejects more decoys than targets)
Reproducibility: 3 stratified reps bit-identical on native counts;
wall variance +/-1.2% across reps.
Phase B is now demonstrably alive on Astral when the calibrator's
residual subset is clean. The previous "Phase B does nothing on Astral"
finding was a calibrator-quality issue, not a Phase B logic issue.
Also keeps the calibration-debug-TSV emitter
(-Dmsgfplus.calibrationDebugTsv=<path>) committed for future
stratification work on other workloads. 32/32 scoped tests pass.
Phase B with the stratified calibrator (top-200 by spec_eValue) delivers -10.4% Astral wall (median 551 -> 494s across 5 OFF + 3 AUTO reps), +0.11% targets, -3.2% decoys, T/D ratio 1.91 -> 1.98 (sensitivity- favorable). Plan section 4 annotated; SHIPPED.md "Active" section documents the win + the four enabling commits (telemetry, calibrator iso=0, configurable formula constants, stratification).
The previous SHIPPED.md still listed astral-speed-improvements.md as the Active doc; that file was deleted during a prior reset. Replace with the actual Active state: Phase B shipped with 4 enabling commits on feat/astral-speed-improvements, the Astral measurement table, and pointers to next-experiments.md and astral-speed-5x-roadmap.md.
Replace Astral-only result table with three-dataset comparison after remote validation on TMT (PXD007683 Lumos) and PXD001819 (Velos): Workload | Window | Sigma | Tightened | Wall | Targets | T/D ----------|--------|--------|--------------|--------|---------|------ Astral | 10 ppm | 0.99 | -> 3.48 ppm | -10.4% | +0.11% | +3.6% TMT | 20 ppm | 2.05 | -> 6.67 ppm | -18.0% | -2.05% | +1.3% PXD001819 | 5 ppm | 2.15 | no-tighten | ~0% | +0.17% | +0.5% Pattern: Phase B wins when calibrated sigma is materially smaller than user window (Astral, TMT); safely no-ops when sigma is comparable to window (PXD001819). TMT -2.05% target drift is a known yellow flag for broader rollout; mitigations (instrument-aware k, stricter stratification) noted in the doc but out of scope for this commit.
After Phase B's three-dataset validation (Astral -10.4%, TMT -18.0%,
PXD001819 safe no-op), the next-experiments plan §5 calls for exact
prefix mass-interval pruning. This commit writes the design before any
code change so the implementation has a clear contract:
- Hook: between addResidue (extends prefix) and the variant-mass-fetch
loop in DBScanner.dbSearch (around line 412)
- Bound: [prefixMinMass + R_min * minAaMass,
prefixMaxMass + R_max * (maxAaMass + maxResidueModMass) + maxFixedTermModMass]
- Intersection test: pepMassSpecKeyMap.subMap(...) widened by max
precursor tolerance; if empty, branch is dead -> break
- Telemetry-first: implement prune-counter only at Checkpoint 1, run on
Astral, decide based on prune rate before adding the break
- Exact-by-construction: no recall risk, only bookkeeping-vs-savings
trade-off
Acceptance: prune rate >=5%, Astral wall -5% vs Phase B baseline,
native counts bit-identical.
Kill: prune rate <1%, or wall flat despite pruning, or correctness
drift.
Composes with Phase B (commit aac389c): Phase B reduced matched_speckeys
per pairing call; Experiment 2 reduces the number of pairing calls.
Different attack surfaces; should stack.
…heckpoint 1+2 measured) Plan-§5 exact prefix mass-interval pruning: at every SA-walk residue extension in DBScanner.dbSearch, compute the reachable final-peptide- mass interval and ask whether any spectrum window in pepMassSpecKeyMap can intersect it. If not, the branch is dead. The bound is exact by construction: reachableMin = prefixMinMass + R_min * minAaMassBound reachableMax = prefixMaxMass + R_max * maxAaMassBound where R_min / R_max are the residue counts implied by minPeptideLength / maxPeptideLength, and minAaMassBound / maxAaMassBound are cached from aaSet.getAllAminoAcidArr() at DBScanner construction (every aa+mod variant in the set, so any future residue's mass is bounded). Two opt-in flags: -Dmsgfplus.experiment2Telemetry=true -> count would-be prunes -Dmsgfplus.experiment2Pruning=true -> actually break out of SA walk Default OFF -> the bound is not computed, OFF-mode bit-identical to dev-tip. Enabled as a developer diagnostic, mirroring PhaseBTelemetry and -Dmsgfplus.useForkJoin patterns. Not exposed as CLI flags. Astral measurements on remote pride-linux-vm.ebi.ac.uk: Run | Wall | Targets | Decoys | E2 prune -----------------------------|-------|---------|--------|--------- Phase B baseline (3 reps) | 494s | 89580 | 45292 | n/a OFF baseline (5 reps median) | 551s | 89479 | 46792 | n/a phaseB + E2 telemetry only | 571s | 89580 | 45292 | 12.22% phaseB + E2 pruning ON | 549s | 89580* | 45292* | 1.84% OFF + E2 pruning ON | 611s | 89479* | 46792* | 1.76% (* = bit-identical to baseline -> exact-by-construction validated) Verdict per plan §7 acceptance gates: - Native target/decoy bit-identical: PASS (exact) - Astral prune rate >= 5%: PASS (12.22% Checkpoint 1, 1.84% with break) - Astral wall improves >= 5%: FAIL (regresses +11% in both Phase-B+E2-prune and OFF+E2-prune cases) The bookkeeping cost (~40 ns per prefix evaluation, ~1.4B evaluations on Astral = ~55s) exceeds the savings from skipping doomed branches. Pure cheap-score-loop savings are ~1-2s (the pruned pairings had zero matches anyway -- mean_per_call near zero for those). The +55s gap is the pure overhead. Checkpoint 3 path forward (left as follow-on, NOT in this commit): - Cache globalMinSpecMass / globalMaxSpecMass once per task; short-circuit the bound test for prefixes whose reachable interval falls outside the global range -> avoids the TreeMap.subMap call (~150ns) on most prefix evaluations. - Skip bound evaluation when peptideLengthIndex < minPeptideLength + N (N TBD); the prune-rate-by-length distribution would tell us where short-prefix evaluations are dead weight vs useful. - Avoid per-prefix grid scan for prefix min/max mass; cache incrementally as variants are added. Code stays in dev as opt-in scaffolding. Phase B remains the iteration's shippable Astral wall lever (-10.4% measured median). Files: - src/main/java/edu/ucsd/msjava/msdbsearch/Experiment2Telemetry.java (new) - src/main/java/edu/ucsd/msjava/msdbsearch/DBScanner.java (cached aa-mass bounds + bound-test hook) - src/main/java/edu/ucsd/msjava/cli/MSGFPlus.java (telemetry summary) - src/test/java/edu/ucsd/msjava/msdbsearch/TestExperiment2Telemetry.java (5 tests)
Checkpoint 2 measurement on Astral remote shows the bound test is exact-by-construction (native counts bit-identical) and prunes 1.84% of prefix evaluations after break — but the per-prefix bookkeeping cost (~40 ns × 1.4 B = ~55 s) exceeds the savings (the pruned pairings had zero matches, so saved cheap-score work is ~1-2 s). Wall regresses +11% vs Phase B alone. Code lands as opt-in scaffolding (commit 4241fbb), defaults OFF, so OFF-mode is bit-identical to dev-tip. Future agents can pursue Checkpoint 3 (cache global spec-mass range short-circuit; skip bound test for short prefixes; incremental grid-mass caching) without rebuilding the scaffolding.
…ed double[] The Checkpoint 2 measurement showed Experiment 2 pruning was correct (native counts bit-identical) but regressed wall +11 % because the TreeMap.subMap(low, high).isEmpty() bound test allocated a view object and walked at least one entry per call (~150 ns × 1.4 B evaluations = ~210 s of pure overhead). Replace with ScoredSpectraMap.hasSpecMassInRange(double, double): - Lazily materialise a sorted double[] from pepMassSpecKeyMap.keySet() on first call (the source map is read-only after preProcessSpectra finishes, so a cached snapshot is safe). - Use Arrays.binarySearch + a single-element range check (~30 ns). - 5x cheaper per call; no allocation per query. Astral measurement on remote pride-linux-vm.ebi.ac.uk (3 runs each config, all native counts bit-identical to baseline = exact-by- construction validated): Run | TreeMap | binary-search | gain -----------------------------|---------|---------------|------- Phase B + E2 telemetry only | 571 s | 531 s | -40 s Phase B + E2 pruning ON | 549 s | 511 s | -38 s OFF + E2 pruning ON | 611 s | 559 s | -52 s Verdict against plan §7 wall gate (>= 5 % improvement vs Phase B baseline 494 s): - Phase B + E2 pruning: 511 vs 494 = +3.4 % (still slight regression) - OFF + E2 pruning: 559 vs 551 = +1.5 % (break-even within noise) The optimization closes ~75 % of the previous overhead but the remaining ~17 s gap on Phase B-tightened data isn't covered by the modest savings (matched_speckeys per skipped pairing is small once Phase B has already narrowed the window from 10 ppm to 3.48 ppm). Code stays as opt-in (-Dmsgfplus.experiment2Pruning=true). Default remains OFF -> bit-identical to dev-tip on dev-tip OFF mode + Phase B's calibration shift on AUTO. Phase B (commit aac389c and ancestors) is the iteration's durable Astral wall lever (-10.4 %). Future Checkpoint 4 paths (left as follow-on, not in this commit): - Skip bound test for short prefixes (peptideLengthIndex below some threshold N where prunes are statistically rare) - Cache prefixMin/Max incrementally instead of scanning grid variants on every evaluation (~10 ns saved per call)
Binary-search optimization closes ~75 % of the Checkpoint 2 overhead gap. Phase B + E2 pruning narrows from +11 % to +3.4 % wall regression vs Phase B alone. OFF + E2 pruning narrows from +11 % to +1.5 % vs OFF (break-even in noise). Still misses the +5 % wall gate; Phase B stays the iteration's deliverable.
…deLength (Checkpoint 4) Move the Experiment 2 mass-interval pruning hook from before the 'peptideLengthIndex < minPeptideLength' continue to after it. Short prefixes (length < minPeptideLength) have huge reachable intervals — R_max * maxAaMass is many kDa wide — so the bound test almost never prunes there but still pays the binary-search bookkeeping. Skipping that range trims ~25-30% of bound-test evaluations. Also simplifies reachableMin = prefixMin once we're past minPeptideLength (R_min collapses to 0). Exact-by-construction property preserved: the moved test is still a sound necessary condition on extending the prefix to a viable final peptide. OFF mode unaffected. Off by default (still gated on -Dmsgfplus.experiment2Pruning=true).
…pped, bench pending Records the Checkpoint 4 code change (commit 8478651): bound test now skipped for prefixes shorter than minPeptideLength, where the reachable interval is too wide to prune anyway. Tests pass; Astral wall measurement pending remote socket re-establishment. Math says best-case is +3-7s gap vs Phase B alone — still inside noise, not a clear ≥5% graduation. Verdict updated to: four checkpoints, none clear the gate; Phase B is the durable lever.
…-2.27% Astral wall Replaces the "bench pending" Checkpoint 4 entry with the actual 5-trial interleaved confirmation: phaseB_only: mean 519.8s, σ 6.06s (522, 518, 517, 529, 513) phaseB+e2: mean 508.0s, σ 4.85s (504, 507, 514, 503, 512) Δ = 11.8s = -2.27% vs Phase B alone 5/5 trials phaseB+E2 < phaseB_only; Welch's t ~3.4 (p ~ 0.01) Native counts bit-identical 89580T / 45292D across all 10 runs The effect is real and statistically significant but below the plan's >=5% graduation gate. Experiment 2 stays opt-in (-Dmsgfplus.experiment2Pruning=true); Phase B (-10.4% Astral wall vs OFF) remains the durable lever.
…real improvement
The Astral 5-trial bench showed:
- Phase B (calibrated precursor-window tightening): -10.4% wall vs OFF, durable
win, validated on 3 datasets (Astral / TMT / PXD001819). KEPT.
- Experiment 2 (mass-interval pruning, Checkpoints 1-4): real -2.27% on top of
Phase B but below the >=5% default-on gate. Runtime scaffolding REMOVED;
retrospective stays in .claude/plans/experiment-2-mass-interval-pruning.md.
- Catalog-backed scanner (MassIndexedPeptideCatalog + CatalogDBScanner):
n=1 trial showed 4.56x slowdown (2186s vs 479s) with bit-identical native
counts. Root cause: scanner iterates all 7M entries linearly and allocates
a fresh CandidatePeptideGrid per entry instead of querying the mass slabs
the catalog builds. REMOVED in current form; would need a spectrum-major
redesign + grid pool to be viable.
Removed:
- src/main/java/edu/ucsd/msjava/msdbsearch/CatalogDBScanner.java
- src/main/java/edu/ucsd/msjava/msdbsearch/MassIndexedPeptideCatalog.java
- src/main/java/edu/ucsd/msjava/msdbsearch/MatchScanner.java
- src/main/java/edu/ucsd/msjava/msdbsearch/PeptideCatalogSource.java
- src/main/java/edu/ucsd/msjava/msdbsearch/Experiment2Telemetry.java
- src/main/java/edu/ucsd/msjava/msdbsearch/PhaseBTelemetry.java
- tests for the above
Reverted to HEAD (catalog source-awareness):
- DatabaseMatch (no protein-source list)
- MSGFPlusPSMSet (no source-aware decoy decision)
- DirectPinWriter / DirectTSVWriter (no catalog source path)
- CompactFastaSequence (no protein-entry digest helper)
- ConcurrentMSGFPlus (no catalog scanner construction)
- TestConcurrentMSGFPlus (revert null arg)
Cleanup kept:
- MSGFPlus: drop Phase B / Experiment 2 telemetry print blocks
- DBScanner: drop Experiment 2 prefix-mass bound hook (and its bookkeeping)
- ScoredSpectraMap: drop hasSpecMassInRange (Experiment 2 binary-search helper)
- MassCalibrator: drop unused -Dmsgfplus.calibrationDebugTsv emitter
Phase B remains untouched: MassCalibrator iso=0 pre-pass + outlier filter +
spec_eValue stratification, and MSGFPlus tightening at min(userPpm, max(floor,
k*sigma + margin)) with k=3, floor=2 ppm, margin=0.5 ppm. Branch state:
51/51 scoped tests pass.
CLAUDE.md flags as a footgun: pre-passes that mutate shared Spectrum objects
cause silent PSM-count drift in the main pass. The calibration pre-pass at
MassCalibrator#collectResiduals invoked ScoredSpectraMap#preProcessSpectra,
which calls spec.setCharge(charge) and scorer.getScoredSpectrum(spec) on
objects from the shared SpectraAccessor — exactly the warned pattern.
Fix:
- ScoredSpectraMap gains an opt-in builder isolateSpectrumState() that
routes all spectrum-mutating sites through a new prepareSpectrumForScoring
helper. Default behavior (main search path) is unchanged: in-place
setCharge, no allocation.
- When isolation is enabled, each call clones the Spectrum (precursor +
per-peak deep clone via getCloneWithoutPeakList + Peak.clone), applies
setCharge to the clone only, and scores the clone. The shared Spectrum
cached on SpectraAccessor is untouched.
- MassCalibrator's pre-pass opts in:
new ScoredSpectraMap(...).isolateSpectrumState()
- TestScoredSpectraMapIsolation pins the contract: default path returns the
same instance and propagates setCharge; isolated path returns a different
instance, leaves the original's charge unchanged, and applies setCharge to
the clone.
Validation: 3-dataset x 3-trial x 2-arm bench on the isolated machine
(pride-linux-vm.ebi.ac.uk):
Astral OFF 508.7s +/- 5.0s T=89479 D=46792
AUTO 469.0s +/- 9.2s T=89580 D=45292 wall -7.8%, +101 T
TMT OFF 248.0s +/- 6.6s T=28790 D=14768
AUTO 240.3s +/- 1.5s T=28201 D=14285 wall -3.1%
PXD001819 OFF 101.3s +/- 0.6s T=28037 D=11022
AUTO 100.3s +/- 2.1s T=28084 D=10985 wall -1.0%, +47 T
Native target/decoy counts are deterministic across all 3 trials per arm
(zero per-trial variance), confirming the isolation prevents the
shared-state mutation that would otherwise produce silent drift. OFF-mode
counts on each dataset match the dev baseline exactly.
…diff) These directories are out of scope for the precursor-window-tightening PR. Restore them to match dev exactly so they no longer appear in the PR's cumulative diff against dev. After this commit: git diff origin/dev..HEAD -- .claude/plans benchmark/ci returns empty. Removed (branch-only additions, never on dev): .claude/plans/SHIPPED.md .claude/plans/astral-next-experiments.md .claude/plans/astral-phase-a-retrospective.md .claude/plans/astral-speed-5x-roadmap.md .claude/plans/experiment-2-mass-interval-pruning.md Restored to dev's version (branch had not picked up these dev additions): .claude/plans/parameter-modernization-flag-inventory.md .claude/plans/parameter-modernization.md .claude/plans/search-sync-cleanup.md Restored to dev's version (branch had local edits): .claude/plans/README.md benchmark/ci/README.md benchmark/ci/PXD001819/baseline.tsv benchmark/ci/PXD001819/extract_metrics.py benchmark/ci/PXD001819/run_ci.sh benchmark/ci/PXD001819/test_compare_metrics.py
…operties
Adds two override knobs for the calibration pre-pass:
-Dmsgfplus.maxSampled=<int> (default 500)
-Dmsgfplus.minConfidentPsms=<int> (default 200)
Both fall back to the historical defaults on unset/non-numeric/non-positive
values, so existing behavior is unchanged. Internal constants moved from
private to public DEFAULT_* so callers and tests can reference them.
CalibrationStats.hasReliableStats() no longer hard-codes the
MIN_CONFIDENT_PSMS threshold; the calibrator now emits confidentPsmCount=0
as the "unreliable, do not apply" sentinel and hasReliableStats checks
count > 0. Behavior is identical for all existing callers.
Defaults intentionally unchanged. A 9-trial Astral A/B (3 trials each at
minConfidentPsms = 200 / 500 / 1000, all with maxSampled=3000) showed:
config sigma ppm tighten ppm wall s targets decoys
(500, 200) HIST 0.994 3.482 469.0 89580 45292
(3000, 200) 0.454 2.000 470.3 89249 44538
(3000, 500) 0.511 2.033 471.7 89212 44555
(3000, 1000) 0.729 2.687 469.7 89524 44817
Key findings:
- Wall time is statistically equivalent across all four configurations
(~470 s, well inside run-to-run sigma of ~25 s).
- Sigma DECREASES as minConfidentPsms DROPS at fixed maxSampled
(0.454 < 0.511 < 0.729 ppm for 200 < 500 < 1000). Confirms the prior
that stratification depth, not pool size, drives sigma quality.
- Lower minConfidentPsms produces an over-tight window: (3000, 200)
hits the 2.0 ppm floor and loses 331 native targets vs the historical
(500, 200) default.
- No configuration tested beats the historical (500, 200) defaults end
to end, which were already validated on 3 datasets in PR #26.
Conclusion: ship the configurability for users who want to tune for
specific instruments / file sizes, but do not change the defaults.
26 unit tests verify property parsing (default fallback, valid int, trim,
non-numeric rejection, non-positive rejection, exported constant values).
Phase B: calibrated precursor-window tightening (-7.8% Astral wall)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request adds comprehensive project documentation and internal investigation records to the MS-GF+ codebase. It introduces a high-level project context file, detailed documentation of command-line flags, and a new system for tracking technical investigations and plans. The main themes are improved onboarding/documentation, enhanced transparency into known issues and technical debt, and groundwork for future development and parameter modernization.
Project documentation and onboarding:
.claude/CLAUDE.mdwith an overview of the MS-GF+ project, directory structure, build instructions, conventions, and performance-sensitive invariants. This serves as a central onboarding and context document for the codebase..claude/plans/README.mdand.claude/plans/parameter-modernization-flag-inventory.mdto document implementation plans, design documents, and a detailed inventory of all command-line flags and their parsing semantics. This supports the upcoming parameter modernization effort. [1] [2]Investigations and technical debt tracking:
.claude/investigations/README.mdas an index for tracked issues, bugs, and behaviors needing further analysis, establishing a process for technical investigations.