diff --git a/docs/training-scoring-models.md b/docs/training-scoring-models.md
new file mode 100644
index 00000000..cc12e827
--- /dev/null
+++ b/docs/training-scoring-models.md
@@ -0,0 +1,181 @@
+# Training MS-GF+ scoring models
+
+MS-GF+ ships with a set of pre-trained scoring models (`.param` files in
+`src/main/resources/`) that cover the common combinations of activation
+method, instrument type, enzyme, and protocol. The bundled set includes
+HCD/QExactive/Tryp, HCD/HighRes/Tryp/TMT, and so on. If your data does
+not match any of the bundled combinations, or if you want a model
+specifically tuned for your instrument, you can train your own.
+
+This page describes the recovered training entry point on this fork and
+the end-to-end workflow.
+
+## When to train a new model
+
+You need a custom model only when:
+
+- Your activation/instrument/enzyme/protocol combination has no bundled
+  `.param` file. Run a search with `-inst HighRes -m HCD -e Tryp` and
+  watch the startup log: if MS-GF+ falls back to a generic model, that's
+  the signal.
+- Your instrument's fragmentation pattern differs materially from the
+  bundled training data (e.g. a new-generation Astral run vs. a
+  2014-era Q Exactive).
+- You want to compare an in-house trained model against the bundled one
+  to quantify the gain.
+
+For most users on standard tryptic HCD runs, the bundled model is fine,
+and Phase B's calibrated precursor-window tightening (the
+`-precursorCal auto` flag) is the bigger lever.
+
+## What you need
+
+1. **Spectra**: one or more mzML or MGF files of MS/MS data from the
+   instrument and acquisition mode you want to train for.
+2. **A protein database**: a FASTA file covering the species in your
+   training data.
+3. **A modifications file**: standard MS-GF+ `Mods.txt` format.
+4. **A target/decoy MS-GF+ search of (1) against (2) at standard 1%
+   FDR**: this provides the annotated PSMs the trainer learns from.
+   The trainer needs **a few hundred** confidently identified PSMs to
+   produce a stable model; thousands are better.
+
+The mzID input that upstream MS-GF+ supported was removed from this
+fork. The trainer now accepts only TSV PSM lists. The standard MS-GF+
+TSV writer (`DirectTSVWriter`, the one you get with the default search
+output) produces TSVs that already match the trainer's expected format.
+
+## Workflow
+
+### Step 1 — Search your training data
+
+Run an MS-GF+ search the same way you would for any project, with
+`-tda 1` (target-decoy on) so the output has a `QValue` column the
+trainer can filter on. Use a precursor mass tolerance appropriate for
+the instrument; for high-resolution data, `-precursorCal auto` is
+recommended so Phase B's calibration tightens the window before the
+search.
+
+Example:
+
+```sh
+java -Xmx16G -jar MSGFPlus.jar \
+    -s training.mzML \
+    -d training.fasta \
+    -mod Mods.txt \
+    -t 10ppm \
+    -tda 1 \
+    -inst HighRes \
+    -m HCD \
+    -e Tryp \
+    -precursorCal auto \
+    -addFeatures 1 \
+    -o training.tsv
+```
+
+The output `training.tsv` is the input to the trainer.
+
+### Step 2 — Train the scoring model
+
+Invoke `ScoringParamGen` with the same activation method, instrument
+type, enzyme, and protocol you used in step 1. The trainer pulls the
+high-confidence PSMs (default `QValue ≤ 0.01`), looks up each PSM's
+spectrum in the directory you supply, and writes a `.param` file in
+the current working directory.
+
+```sh
+java -Xmx4G -cp MSGFPlus.jar edu.ucsd.msjava.ui.ScoringParamGen \
+    -i training.tsv \
+    -d /path/to/spectra-directory \
+    -m HCD \
+    -inst HighRes \
+    -e Tryp \
+    -protocol Standard
+```
+
+The output filename is derived from the data type: e.g. for the
+arguments above, `HCD_HighRes_Tryp_Standard.param`.
+
+### Step 3 — Use the model
+
+Drop the `.param` file into `src/main/resources/` (and rebuild the JAR)
+or supply it via the appropriate parameter file.
+MS-GF+ will pick it up when the search's
+`(activationMethod, instrumentType, enzyme, protocol)` tuple matches
+the filename.
+
+## TSV input format
+
+`AnnotatedSpectra` (the trainer's TSV reader) requires these columns,
+identified by header name (case-insensitive):
+
+| Column | Required | Description |
+|---|---|---|
+| `#SpecFile` | yes | Spectrum file name (matched by basename against `-d`). |
+| `SpecID` | yes | Index or scan identifier; passed to `SpectraAccessor.getSpectrumById`. |
+| `Peptide` | yes | Peptide sequence. May include `K.PEPTIDE.K` flanking residues; flankers are stripped. |
+| `Charge` | yes | Integer charge state. |
+| `FDR` / `EFDR` / `QValue` / `SpecQValue` | yes (any one) | Used to filter rows; default threshold is 0.01. |
+
+Extra columns are ignored. The `Peptide` field accepts standard MS-GF+
+modification syntax (e.g. `K.PEP+57.021M+15.995IDE.K`). The trainer
+matches each PSM's `(SpecID, Charge)` against the spectrum file and
+verifies that the peptide's theoretical mass is within 5 Da of the
+spectrum's precursor mass; mismatches are reported and abort the run
+(unless `-dropErrors 1` is supplied).
+
+## CLI reference
+
+```
+java -cp MSGFPlus.jar edu.ucsd.msjava.ui.ScoringParamGen [options]
+
+Required:
+  -i               Training result TSV files (mzID input not supported in this build)
+  -d               Directory holding the spectrum files referenced by the TSVs
+  -m               Activation method (CID, ETD, HCD, UVPD, etc.)
+  -inst            Instrument type (LowRes, HighRes, QExactive, etc.)
+  -e               Enzyme name (Tryp, Chymotryp, LysC, AspN, etc.)
+
+Optional:
+  -protocol        Protocol (default: NoProtocol/automatic)
+  -thread          Worker threads for parsing PSMs (default: 1)
+  -dropErrors 0|1  Drop datasets with errors instead of failing (default: 0)
+  -mgf 0|1         Also emit aggregated .mgf (default: 0)
+```
+
+## Notes and limitations
+
+- **mzID input was removed**: the upstream MS-GF+ `ui/ScoringParamGen`
+  CLI accepted `.mzid` files via `MzIDParser`.
+  That class was deleted from this fork in commit `9bf01c8`. If your
+  existing pipeline produces mzID, pre-convert to TSV (e.g. with the
+  upstream MS-GF+ JAR's `MzIDToTsv`) before passing to the trainer.
+- **No `params/ParamManager`**: the upstream CLI used the (now-removed)
+  `ParamManager` framework. The recovered entry point parses arguments
+  manually; option semantics match the upstream CLI, but the help text
+  is shorter.
+- **Output goes to the current directory**: the `.param` file lands
+  wherever you launched the JVM. There is no `-o` option; `cd` to the
+  intended output directory before invoking, or copy the file
+  afterwards.
+- **Minimum training data**: the trainer applies an internal dedup
+  (keyed on `(peptide, charge)`, capped at 3 spectra per pair) before
+  the partition step. Empirically (see `TestScoringParamGenSmoke`), the
+  partition step refuses to fit under ~360 dedup-surviving spectra at a
+  single charge, silently emitting an empty partition set and aborting
+  the model write. To stay above the floor, plan on **≥ 200 unique
+  peptide identifications in the dominant charge state**, and
+  preferably ≥ 500 unique peptides for a model with usable rank
+  distributions.
+
+## See also
+
+- The recovered code lives at:
+  - `src/main/java/edu/ucsd/msjava/ui/ScoringParamGen.java`
+  - `src/main/java/edu/ucsd/msjava/msscorer/ScoringParameterGeneratorWithErrors.java`
+  - `src/main/java/edu/ucsd/msjava/msscorer/ScoringParameterGenerator.java`
+  - `src/main/java/edu/ucsd/msjava/msutil/AnnotatedSpectra.java`
+  - `src/main/java/edu/ucsd/msjava/misc/TrainScoringParameters.java`
+- The original upstream documentation page is still live at
+  ; the command-line semantics here match it apart from the mzID input
+  caveat above.
diff --git a/src/main/java/edu/ucsd/msjava/misc/TrainScoringParameters.java b/src/main/java/edu/ucsd/msjava/misc/TrainScoringParameters.java new file mode 100644 index 00000000..466e3619 --- /dev/null +++ b/src/main/java/edu/ucsd/msjava/misc/TrainScoringParameters.java @@ -0,0 +1,164 @@ +package edu.ucsd.msjava.misc; + +import edu.ucsd.msjava.msscorer.NewRankScorer; +import edu.ucsd.msjava.msscorer.NewScorerFactory.SpecDataType; +import edu.ucsd.msjava.msscorer.ScoringParameterGeneratorWithErrors; +import edu.ucsd.msjava.msutil.ActivationMethod; +import edu.ucsd.msjava.msutil.AminoAcidSet; +import edu.ucsd.msjava.msutil.Enzyme; +import edu.ucsd.msjava.msutil.InstrumentType; +import edu.ucsd.msjava.msutil.Protocol; + +import java.io.BufferedInputStream; +import java.io.File; +import java.io.FileInputStream; +import java.io.InputStream; +import java.util.Calendar; + +/** + * Internal harness for batch-training MS-GF+ param files from a fixed + * directory layout. Restored verbatim from upstream; not intended as a + * customer-facing entry point. 
+ */ +public class TrainScoringParameters { + + private static final String PARAM_DIR = System.getProperty("user.home") + "/Research/Data/TrainingMSGFPlus/new"; + private static final String BACKUP_DIR = System.getProperty("user.home") + "/Research/Data/TrainingMSGFPlus/backup"; + private static final String SPEC_DIR = System.getProperty("user.home") + "/Research/Data/TrainingMSGFPlus/AnnotatedSpectra"; + + public static void main(String argv[]) throws Exception { +// backup(); +// createParamFiles(); + testParamFiles(); + } + + public static void backup() throws Exception { + File paramDir = new File(PARAM_DIR); + boolean paramExists = false; + for (File paramFile : paramDir.listFiles()) { + if (paramFile.getName().endsWith(".param")) + paramExists = true; + } + if (!paramExists) { + System.out.println("No param file to backup."); + return; + } + Calendar calendar = Calendar.getInstance(); + String dateStr = calendar.get(Calendar.MONTH) + "_" + calendar.get(Calendar.DAY_OF_MONTH) + "_" + calendar.get(Calendar.YEAR); + String backupDirName = "ParamBackup_" + dateStr; + File backupDir = new File(BACKUP_DIR + "/" + backupDirName); + if (backupDir.exists()) { + System.out.println("Backup directory already exists: " + backupDir.getPath()); + System.exit(-1); + } + backupDir.mkdirs(); + System.out.println(backupDir.getPath() + " is created."); + + boolean backupSuccess = true; + for (File paramFile : paramDir.listFiles()) { + if (paramFile.getName().endsWith(".param")) { + File newFile = new File(backupDir, paramFile.getName()); + boolean isBackupSuccessful = paramFile.renameTo(newFile); + System.out.println("Moving " + paramFile.getPath() + " to " + newFile.getPath() + (isBackupSuccessful ? " succeeded." 
: " failed.")); + if (!isBackupSuccessful) { + backupSuccess = false; + break; + } + } + } + if (backupSuccess) + System.out.println("Backup complete."); + else { + backupDir.delete(); + System.out.println(backupDir.getPath() + " is deleted."); + System.out.println("Backup failed."); + System.exit(0); + } + } + + public static void createParamFiles() throws Exception { + File specDir = new File(SPEC_DIR); + if (!specDir.exists()) { + System.err.println("Training spectra directory doesn't exist:" + specDir.getPath()); + System.exit(-1); + } + + AminoAcidSet aaSet = AminoAcidSet.getStandardAminoAcidSetWithFixedCarbamidomethylatedCys(); + for (File specFile : specDir.listFiles()) { + String specFileName = specFile.getName(); + if (specFileName.endsWith(".mgf")) { + String id = specFileName.substring(0, specFileName.lastIndexOf('.')); + String[] token = id.split("_"); + if (token.length != 3 && token.length != 4) { + System.err.println("Wrong file name: " + specFile.getName()); + System.exit(-1); + } + String actMethodStr = token[0]; + String instTypeStr = token[1]; + String enzymeStr = token[2]; + String protocolStr = null; + if (token.length == 4) + protocolStr = token[3]; + + ActivationMethod actMethod = ActivationMethod.get(actMethodStr); + if (actMethod == null) { + System.err.println("Unrecognized ActivationMethod: " + actMethodStr + "(" + specFileName + ")"); + System.exit(-1); + } + InstrumentType instType = InstrumentType.get(instTypeStr); + if (instType == null) { + System.err.println("Unrecognized InstrumentType: " + instTypeStr + "(" + specFileName + ")"); + System.exit(-1); + } + Enzyme enzyme = Enzyme.getEnzymeByName(enzymeStr); + if (enzyme == null) { + System.err.println("Unrecognized Enzyme: " + enzymeStr + "(" + specFileName + ")"); + System.exit(-1); + } + + Protocol protocol = null; + if (protocolStr != null) { + protocol = Protocol.get(protocolStr); + if (protocol == null) { + System.err.println("Unrecognized Protocol: " + protocolStr + "(" + 
specFileName + ")"); + System.exit(-1); + } + } else + protocol = Protocol.AUTOMATIC; + + if (actMethod == null || instType == null || enzyme == null || protocol == null) { + System.err.println("Wrong file name: " + specFile.getName()); + System.exit(-1); + } + + SpecDataType dataType = new SpecDataType(actMethod, instType, enzyme, protocol); + System.out.println("Processing " + dataType.toString()); + ScoringParameterGeneratorWithErrors.generateParameters( + specFile, + dataType, + aaSet, + new File(PARAM_DIR), + false, + false, + false); + } + } + System.out.println("Successfully generated parameters!"); + } + + public static void testParamFiles() throws Exception { + for (File f : new File(PARAM_DIR).listFiles()) { + if (f.getName().endsWith(".param")) { + System.out.println("Reading " + f.getName()); + InputStream is = new BufferedInputStream(new FileInputStream(f)); + NewRankScorer scorer = new NewRankScorer(new BufferedInputStream(is)); + System.out.println(scorer.getSpecDataType()); + if (!f.getName().substring(0, f.getName().lastIndexOf('.')).equals(scorer.getSpecDataType().toString())) { + System.out.println(f.getName().substring(0, f.getName().lastIndexOf('.')) + " != " + scorer.getSpecDataType().toString()); + System.out.println("********* Mismatch **********"); + } + } + } + System.out.println("Read Success"); + } +} diff --git a/src/main/java/edu/ucsd/msjava/msscorer/ScoringParameterGenerator.java b/src/main/java/edu/ucsd/msjava/msscorer/ScoringParameterGenerator.java new file mode 100644 index 00000000..9cef6e1a --- /dev/null +++ b/src/main/java/edu/ucsd/msjava/msscorer/ScoringParameterGenerator.java @@ -0,0 +1,718 @@ +package edu.ucsd.msjava.msscorer; + +import edu.ucsd.msjava.mgf.MgfSpectrumParser; +import edu.ucsd.msjava.msgf.Histogram; +import edu.ucsd.msjava.msgf.NominalMass; +import edu.ucsd.msjava.msgf.Tolerance; +import edu.ucsd.msjava.msscorer.NewScorerFactory.SpecDataType; +import edu.ucsd.msjava.msutil.*; + +import java.io.File; +import 
java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.HashSet; +import java.util.Hashtable; +import java.util.TreeSet; + +/** + * This only supports low accuracy fragment ions. + * + * @author sangtaekim + */ +public class ScoringParameterGenerator extends NewRankScorer { + private static final float MIN_OFFSET_MASS = -120; // for ion types + private static final float MAX_OFFSET_MASS = 38; + private static final float MIN_PRECURSOR_OFFSET = -300; // for precursors + private static final float MAX_PRECURSOR_OFFSET = 30; + private static final int MIN_NUM_SPECTRA_PER_PARTITION = 400; // 400 + private static final int MIN_NUM_SPECTRA_FOR_PRECURSOR_OFF = 150; + + private static final float MIN_PRECURSOR_OFFSET_PROBABILITY = 0.15f; // 0.15 + private static final float MIN_ION_OFFSET_PROBABILITY = 0.15f; // 0.15, for ion types + private static final int MAX_RANK = 150; + private static final int NUM_SEGMENTS_PER_SPECTRUM = 2; // 2 + + + private static final int[] smoothingRanks = {3, 5, 10, 20, 50, Integer.MAX_VALUE}; //Ranks around which smoothing occurs + private static final int[] smoothingWindowSize = {0, 1, 2, 3, 4, 5}; //Smoothing windows for each smoothing rank + + private static final int NUM_NOISE_IONS = 10; + protected static final int MAX_CHARGE = 20; + + public static void main(String argv[]) { + File specFile = null; + File outputFile = null; + boolean isText = false; + AminoAcidSet aaSet = AminoAcidSet.getStandardAminoAcidSetWithFixedCarbamidomethylatedCys(); + int numSpecsPerPeptide = 1; + int errorScalingFactor = 10; + + // Fragmentation method + ActivationMethod activationMethod = null; + InstrumentType instType = null; + Enzyme enzyme = null; + + for (int i = 0; i < argv.length; i += 2) { + if (!argv[i].startsWith("-") || i + 1 >= argv.length) + printUsageAndExit("Invalid parameter!"); + if (argv[i].equalsIgnoreCase("-i")) { + specFile = new File(argv[i + 1]); + if (!specFile.exists()) { + 
printUsageAndExit(argv[i + 1] + " doesn't exist."); + } + int posDot = specFile.getName().lastIndexOf('.'); + if (posDot >= 0) { + String extension = specFile.getName().substring(posDot); + if (!extension.equalsIgnoreCase(".mgf")) + printUsageAndExit("Invalid spectrum format: " + argv[i + 1]); + } else + printUsageAndExit("Invalid spectrum format: " + argv[i + 1]); + } else if (argv[i].equalsIgnoreCase("-o")) { + outputFile = new File(argv[i + 1]); + } else if (argv[i].equalsIgnoreCase("-t")) { + outputFile = new File(argv[i + 1]); + isText = true; + } else if (argv[i].equalsIgnoreCase("-fixMod")) { + if (argv[i + 1].equalsIgnoreCase("0")) + aaSet = AminoAcidSet.getStandardAminoAcidSet(); + else if (argv[i + 1].equalsIgnoreCase("1")) + aaSet = AminoAcidSet.getStandardAminoAcidSetWithFixedCarbamidomethylatedCys(); + else if (argv[i + 1].equalsIgnoreCase("2")) + aaSet = AminoAcidSet.getStandardAminoAcidSetWithFixedCarboxymethylatedCys(); + else + printUsageAndExit("Invalid -fixMod parameter: " + argv[i + 1]); + } else if (argv[i].equalsIgnoreCase("-pep")) { + numSpecsPerPeptide = Integer.parseInt(argv[i + 1]); + } else if (argv[i].equalsIgnoreCase("-err")) { + errorScalingFactor = Integer.parseInt(argv[i + 1]); + } else if (argv[i].equalsIgnoreCase("-m")) { + if (argv[i + 1].equalsIgnoreCase("1")) { + activationMethod = ActivationMethod.CID; + } else if (argv[i + 1].equalsIgnoreCase("2")) { + activationMethod = ActivationMethod.ETD; + } else if (argv[i + 1].equalsIgnoreCase("3")) { + activationMethod = ActivationMethod.HCD; + } else if (argv[i + 1].equalsIgnoreCase("4")) { + activationMethod = ActivationMethod.UVPD; + } else { + printUsageAndExit("Invalid activation method: " + argv[i + 1]); + } + } else if (argv[i].equalsIgnoreCase("-inst")) { + if (argv[i + 1].equalsIgnoreCase("0")) { + instType = InstrumentType.LOW_RESOLUTION_LTQ; + } else if (argv[i + 1].equalsIgnoreCase("1")) { + instType = InstrumentType.TOF; + } else if (argv[i + 1].equalsIgnoreCase("2")) { + 
instType = InstrumentType.HIGH_RESOLUTION_LTQ; + } else { + printUsageAndExit("Invalid instrument type: " + argv[i + 1]); + } + } else if (argv[i].equalsIgnoreCase("-e")) { + if (argv[i + 1].equalsIgnoreCase("0")) + enzyme = null; + else if (argv[i + 1].equalsIgnoreCase("1")) + enzyme = Enzyme.TRYPSIN; + else if (argv[i + 1].equalsIgnoreCase("2")) + enzyme = Enzyme.CHYMOTRYPSIN; + else if (argv[i + 1].equalsIgnoreCase("3")) + enzyme = Enzyme.LysC; + else if (argv[i + 1].equalsIgnoreCase("4")) + enzyme = Enzyme.LysN; + else if (argv[i + 1].equalsIgnoreCase("5")) + enzyme = Enzyme.GluC; + else if (argv[i + 1].equalsIgnoreCase("6")) + enzyme = Enzyme.ArgC; + else if (argv[i + 1].equalsIgnoreCase("7")) + enzyme = Enzyme.AspN; + else + printUsageAndExit("Invalid enzyme: " + argv[i + 1]); + } else + printUsageAndExit("Invalid parameters!"); + } + if (specFile == null) + printUsageAndExit("missing annotatedMgfFileName!"); + if (outputFile == null) + printUsageAndExit("missing outputFileName!"); + if (activationMethod == null) + printUsageAndExit("missing activationMethod!"); + if (instType == null) + printUsageAndExit("missing instrumentType!"); + + generateParameters(specFile, activationMethod, instType, enzyme, Protocol.AUTOMATIC, numSpecsPerPeptide, errorScalingFactor, outputFile, aaSet, isText, false); + } + + public static void printUsageAndExit(String message) { + System.err.println(message); + System.out.println("usage: java -Xmx2000M -cp MSGF.jar edu.ucsd.msjava.msscorer.ScoringParameterGenerator\n" + + "\t-i annotatedMgfFileName (*.mgf)\n" + + "\t-o outputFileName (e.g. 
CID_Tryp.param)\n" + + "\t-m FragmentationMethodID (1: CID, 2: ETD, 3: HCD, 4: UVPD)\n" + + "\t-inst InstrumentID (0: Low-res LCQ/LTQ, 1: TOF , 2: High-res LTQ)\n" + + "\t-e EnzymeID (0: No enzyme, 1: Trypsin (Default), 2: Chymotrypsin, 3: Lys-C, 4: Lys-N, 5: Glu-C, 6: Arg-C, 7: Asp-N)\n" + + "\t[-fixMod 0/1/2] (0: NoCysteineProtection, 1: CarbamidomethyC (default), 2: CarboxymethylC)\n" + + "\t[-pep numPeptidesPerSpec] (default: 1)\n" + + "\t[-err errorScalingFactor] (default: 10)" + ); + System.exit(0); + } + + public static void generateParameters( + File specFile, + ActivationMethod activationMethod, + InstrumentType instType, + Enzyme enzyme, + Protocol protocol, + int numSpecsPerPeptide, + int errorScalingFactor, + File outputFile, + AminoAcidSet aaSet, + boolean isText, + boolean verbose) { + SpectraContainer container = new SpectraContainer(specFile.getPath(), new MgfSpectrumParser().aaSet(aaSet)); + + // multiple spectra with the same peptide -> one spec per peptide + HashMap> pepSpecMap = new HashMap>(); + SpectraContainer specContOnePerPep = new SpectraContainer(); + for (Spectrum spec : container) { + String pep = spec.getAnnotationStr() + ":" + spec.getCharge(); + if (pep != null && pep.length() > 0) { + ArrayList specList = pepSpecMap.get(pep); + if (specList == null) { + specList = new ArrayList(); + pepSpecMap.put(pep, specList); + } + if (specList.size() < numSpecsPerPeptide) + specList.add(spec); + } + } + for (ArrayList specList : pepSpecMap.values()) + for (Spectrum spec : specList) + specContOnePerPep.add(spec); + + SpecDataType dataType = new SpecDataType(activationMethod, instType, enzyme, protocol); + ScoringParameterGenerator gen = new ScoringParameterGenerator(specContOnePerPep, dataType); + + gen.tolerance(new Tolerance(1 / Constants.INTEGER_MASS_SCALER / 2)); + + gen.partition(NUM_SEGMENTS_PER_SPECTRUM); + if (verbose) + System.out.println("Partition: " + gen.partitionSet.size()); + + gen.precursorOFF(MIN_PRECURSOR_OFFSET_PROBABILITY); + 
if (verbose) + System.out.println("PrecursorOFF Done."); + + gen.filterPrecursorPeaks(); + if (verbose) + System.out.println("Filtering Done."); + + gen.selectIonTypes(MIN_ION_OFFSET_PROBABILITY); + if (verbose) + System.out.println("Ion types selected."); + + gen.generateRankDist(MAX_RANK); + if (verbose) + System.out.println("Rank distribution computed."); + + gen.smoothing(); + if (verbose) + System.out.println("Smoothing complete."); + + if (!isText) + gen.writeParameters(outputFile); + else + gen.writeParametersPlainText(outputFile); + + if (verbose) + System.out.println("Writing Done."); + } + + private SpectraContainer specContainer; + + public ScoringParameterGenerator(SpectraContainer specContainer, SpecDataType dataType) { + this.specContainer = specContainer; + super.dataType = dataType; + } + + public void partition(int numSegments) { + super.numSegments = numSegments; + chargeHist = new Histogram(); + partitionSet = new TreeSet(); + + Hashtable> parentMassMap = new Hashtable>(); + for (Spectrum spec : specContainer) { + int charge = spec.getCharge(); + if (charge <= 0) + continue; + chargeHist.add(charge); + if (spec.getAnnotation() != null) { + ArrayList precursorList = parentMassMap.get(charge); + if (precursorList == null) { + precursorList = new ArrayList(); + parentMassMap.put(charge, precursorList); + } + precursorList.add(spec.getPrecursorMass()); + } + } + + for (int c = chargeHist.minKey(); c <= chargeHist.maxKey(); c++) { + ArrayList parentMassList = parentMassMap.get(c); + if (parentMassList == null) + continue; + + int numSpec = parentMassList.size(); + if (numSpec < Math.round(MIN_NUM_SPECTRA_PER_PARTITION * 0.9f)) + continue; + + Collections.sort(parentMassList); + int bestSetSize = 0; + int smallestRemainder = MIN_NUM_SPECTRA_PER_PARTITION; + for (int i = Math.round(MIN_NUM_SPECTRA_PER_PARTITION * 0.9f); i <= Math.round(MIN_NUM_SPECTRA_PER_PARTITION * 1.1f); i++) { + int remainder = numSpec % i; + if (i - remainder < remainder) + 
remainder = i - remainder; + if (remainder < smallestRemainder || (remainder == smallestRemainder && Math.abs(MIN_NUM_SPECTRA_PER_PARTITION - i) < Math.abs(MIN_NUM_SPECTRA_PER_PARTITION - bestSetSize))) { + bestSetSize = i; + smallestRemainder = remainder; + } + } + int num = 0; + for (int i = 0; i == 0 || i < Math.round(numSpec / (float) bestSetSize); i++) { + if (num != 0) { + for (int seg = 0; seg < numSegments; seg++) + partitionSet.add(new Partition(c, parentMassList.get(num), seg)); + } else { + for (int seg = 0; seg < numSegments; seg++) + partitionSet.add(new Partition(c, 0f, seg)); + } + num += bestSetSize; + } + } + } + + private void precursorOFF(float minProbThreshold) { + if (chargeHist == null) { + assert (false) : "partition() must have been called before"; + return; + } + precursorOFFMap = new java.util.TreeMap>(); + numPrecurOFF = 0; + + for (int charge = chargeHist.minKey(); charge <= chargeHist.maxKey(); charge++) { + if (chargeHist.get(charge) < MIN_NUM_SPECTRA_FOR_PRECURSOR_OFF) + continue; + ArrayList precursorOffsetList = new ArrayList(); + int numSpecs = 0; + Hashtable> histList = new Hashtable>(); + for (int c = charge; c >= 2; c--) + histList.put(c, new Histogram()); + + for (Spectrum spec : specContainer) { + if (spec.getAnnotation() == null) + continue; + if (spec.getCharge() != charge) + continue; + numSpecs++; + spec = filter.apply(spec); + float precursorNeutralMass = spec.getPrecursorMass(); + for (int c = charge; c >= 2; c--) { + float precursorMz = (precursorNeutralMass + c * (float) Composition.ChargeCarrierMass()) / c; + ArrayList peakList = spec.getPeakListByMassRange( + precursorMz + MIN_PRECURSOR_OFFSET / (float) c - mme.getToleranceAsDa(precursorMz + MIN_PRECURSOR_OFFSET / (float) c) / 2, + precursorMz + MAX_PRECURSOR_OFFSET / (float) c + mme.getToleranceAsDa(precursorMz + MAX_PRECURSOR_OFFSET / (float) c) / 2); + + int prevMassIndexDiff = Integer.MIN_VALUE; + for (Peak p : peakList) { + float peakMass = p.getMz(); + int 
massIndexDiff = NominalMass.toNominalMass(peakMass - precursorMz); + if (massIndexDiff > prevMassIndexDiff) { + histList.get(c).add(massIndexDiff); + prevMassIndexDiff = massIndexDiff; + } + } + } + } + + for (int c = charge; c >= 2; c--) { + ArrayList keyList = new ArrayList(histList.get(c).keySet()); + Collections.sort(keyList); + for (Integer key : keyList) { + float prob = (histList.get(c).get(key)) / (float) numSpecs; + if (prob > minProbThreshold) { + precursorOffsetList.add(new PrecursorOffsetFrequency((charge - c), NominalMass.getMassFromNominalMass(key), prob)); + } + } + } + precursorOFFMap.put(charge, precursorOffsetList); + numPrecurOFF += precursorOffsetList.size(); + } + } + + private void filterPrecursorPeaks() { + if (this.precursorOFFMap == null) + return; + for (Spectrum spec : specContainer) { + for (PrecursorOffsetFrequency off : this.getPrecursorOFF(spec.getCharge())) + spec.filterPrecursorPeaks(mme, off.getReducedCharge(), off.getOffset()); + } + } + + private Pair getPrecursorMassRange(Partition partition) { + float minParentMass = partition.getParentMass(); + float maxParentMass = Float.MAX_VALUE; + Partition higherPartition = partitionSet.higher(partition); + if (higherPartition != null) { + if (higherPartition.getCharge() == partition.getCharge() && higherPartition.getSegNum() == partition.getSegNum()) { + maxParentMass = higherPartition.getParentMass(); + } + } + return new Pair(minParentMass, maxParentMass); + } + + private void selectIonTypes(float minProbThreshold) { + if (partitionSet == null) { + assert (false) : "partition() must have been called before!"; + return; + } + + fragOFFTable = new HashMap>(); + insignificantFragOFFTable = new HashMap>(); + + for (Partition partition : partitionSet) { + int charge = partition.getCharge(); + Pair parentMassRange = getPrecursorMassRange(partition); + int seg = partition.getSegNum(); + + int numSpec = 0; + Hashtable> prefixIonFreq = new Hashtable>(); + Hashtable> suffixIonFreq = new 
Hashtable>(); + for (int c = 1; c <= charge; c++) { + prefixIonFreq.put(c, new Histogram()); + suffixIonFreq.put(c, new Histogram()); + } + + int numCleavages = 0; + for (Spectrum spec : specContainer) { + if (spec.getAnnotation() == null) + continue; + if (spec.getCharge() != charge) + continue; + + float curParentMass = spec.getPrecursorMass(); + if (curParentMass < parentMassRange.getFirst() || curParentMass >= parentMassRange.getSecond()) + continue; + + Peptide annotation = spec.getAnnotation(); + numCleavages += annotation.size() - 1; + numSpec++; + spec = filter.apply(spec); + + for (int c = 1; c <= charge; c++) { + for (int direction = 0; direction < 2; direction++) { + double accurateMass = 0; + Hashtable> ionFreq = null; + for (int i = 0; i < annotation.size() - 1; i++) { + if (direction == 0) { + accurateMass += annotation.get(i).getAccurateMass(); + ionFreq = prefixIonFreq; + } else if (direction == 1) { + accurateMass += annotation.get(annotation.size() - 1 - i).getAccurateMass(); + ionFreq = suffixIonFreq; + } + float mass = (float) (accurateMass / c); + ArrayList peakList = spec.getPeakListByMassRange( + mass + MIN_OFFSET_MASS / (float) c - mme.getToleranceAsDa(mass), + mass + MAX_OFFSET_MASS / (float) c + mme.getToleranceAsDa(mass)); + int prevIntOffset = Integer.MIN_VALUE; + for (Peak p : peakList) { + float peakMz = p.getMz(); + int segNum = getSegmentNum(peakMz, curParentMass); + if (segNum != seg) + continue; + float offset = peakMz - mass; + int intOffset = NominalMass.toNominalMass(offset); + if (intOffset > prevIntOffset) { + ionFreq.get(c).add(intOffset); + prevIntOffset = intOffset; + } + } + } + } + } + } + + float maxProb = 0; + int maxCharge = 0; + int maxDirection = 0; + float maxOffset = 0; + + ArrayList fragmentOffsetFrequencyList = new ArrayList(); + ArrayList insignificantFragmentOffsetFrequencyList = new ArrayList(); + for (int c = 1; c <= charge; c++) { + for (int direction = 0; direction < 2; direction++) { + ArrayList keyList; + 
if (direction == 0) + keyList = new ArrayList(prefixIonFreq.get(c).keySet()); + else + keyList = new ArrayList(suffixIonFreq.get(c).keySet()); + + Collections.sort(keyList); + for (Integer key : keyList) { + float offset = NominalMass.getMassFromNominalMass(key); + int freq; + if (direction == 0) + freq = prefixIonFreq.get(c).get(key); + else + freq = suffixIonFreq.get(c).get(key); + float prob = freq / (float) numCleavages * numSegments; + if (prob > maxProb) { + maxProb = prob; + maxCharge = c; + maxDirection = direction; + maxOffset = offset; + } + if (prob > minProbThreshold) { + if (direction == 0) + fragmentOffsetFrequencyList.add(new FragmentOffsetFrequency(new IonType.PrefixIon(c, offset), prob)); + else + fragmentOffsetFrequencyList.add(new FragmentOffsetFrequency(new IonType.SuffixIon(c, offset), prob)); + } else { + if (direction == 0) + insignificantFragmentOffsetFrequencyList.add(new FragmentOffsetFrequency(new IonType.PrefixIon(c, offset), prob)); + else + insignificantFragmentOffsetFrequencyList.add(new FragmentOffsetFrequency(new IonType.SuffixIon(c, offset), prob)); + } + } + } + } + + if (fragmentOffsetFrequencyList.size() == 0) { + if (maxDirection == 0) + fragmentOffsetFrequencyList.add(new FragmentOffsetFrequency(new IonType.PrefixIon(maxCharge, maxOffset), maxProb)); + else + fragmentOffsetFrequencyList.add(new FragmentOffsetFrequency(new IonType.SuffixIon(maxCharge, maxOffset), maxProb)); + } + + Collections.sort(insignificantFragmentOffsetFrequencyList); + ArrayList noiseOffsetFrequencyList = new ArrayList(NUM_NOISE_IONS); + + int numNoise = 0; + for (FragmentOffsetFrequency off : insignificantFragmentOffsetFrequencyList) { + if (off.getIonType().getCharge() == 1) + noiseOffsetFrequencyList.add(off); + if (++numNoise >= NUM_NOISE_IONS) + break; + } + Collections.sort(fragmentOffsetFrequencyList, Collections.reverseOrder()); + fragOFFTable.put(partition, fragmentOffsetFrequencyList); + insignificantFragOFFTable.put(partition, 
noiseOffsetFrequencyList); + } + } + + private void generateRankDist(int maxRank) { + if (partitionSet == null) { + assert (false) : "partition() must have been called!"; + return; + } + + rankDistTable = new HashMap>(); + this.maxRank = maxRank; + + for (Partition partition : partitionSet) { + int charge = partition.getCharge(); + IonType[] ionTypes = getIonTypes(partition); + if (ionTypes == null || ionTypes.length == 0) + continue; + Pair parentMassRange = getPrecursorMassRange(partition); + int seg = partition.getSegNum(); + + int numSpec = 0; + Hashtable> rankDist = new Hashtable>(); + Hashtable rankDistMaxRank = new Hashtable(); + Hashtable rankDistUnexplained = new Hashtable(); + + for (IonType ion : ionTypes) { + rankDist.put(ion, new Histogram()); + rankDistMaxRank.put(ion, 0f); + rankDistUnexplained.put(ion, 0f); + } + rankDist.put(IonType.NOISE, new Histogram()); + + float[] noiseDist = new float[maxRank + 2]; + int numMaxRankPeaks = 0; + int totalCleavageSites = 0; + + for (Spectrum spec : specContainer) { + int numExplainedPeaks = 0; + if (spec.getAnnotation() == null) + continue; + if (spec.getCharge() != charge) + continue; + float curParentMass = spec.getPrecursorMass(); + if (curParentMass < parentMassRange.getFirst() || curParentMass >= parentMassRange.getSecond()) + continue; + + Peptide annotation = spec.getAnnotation(); + spec.setRanksOfPeaks(); + numSpec++; + numMaxRankPeaks += spec.size() - maxRank + 1; + totalCleavageSites += annotation.size() - 1; + int prmMassIndex = 0; + int srmMassIndex = 0; + + HashSet explainedPeakSet = new HashSet(); + Hashtable numExplainedMaxRankPeaks = new Hashtable(); + for (IonType ion : ionTypes) { + numExplainedMaxRankPeaks.put(ion, 0); + } + + int numSignalBinsAtThisSegment = 0; + for (int i = 0; i < annotation.size() - 1; i++) { + prmMassIndex += NominalMass.toNominalMass(annotation.get(i).getMass()); + srmMassIndex += NominalMass.toNominalMass(annotation.get(annotation.size() - 1 - i).getMass()); + + float 
prm = NominalMass.getMassFromNominalMass(prmMassIndex); + float srm = NominalMass.getMassFromNominalMass(srmMassIndex); + for (IonType ion : ionTypes) { + float theoMass; + if (ion instanceof IonType.PrefixIon) + theoMass = ion.getMz(prm); + else + theoMass = ion.getMz(srm); + + int segNum = super.getSegmentNum(theoMass, curParentMass); + if (segNum == seg) { + numSignalBinsAtThisSegment++; + Peak p = spec.getPeakByMass(theoMass, mme); + if (p != null) { + numExplainedPeaks++; + int rank = p.getRank(); + if (rank >= maxRank) { + rank = maxRank; + numExplainedMaxRankPeaks.put(ion, numExplainedMaxRankPeaks.get(ion) + 1); + } + explainedPeakSet.add(p); + rankDist.get(ion).add(rank); + } else { + rankDist.get(ion).add(maxRank + 1); + } + } + } + } + + ArrayList unexplainedPeaksAtThisSegment = new ArrayList(); + int numPeaksAtThisSegment = 0; + int numMaxRankPeaksAtThisSegment = 0; + for (Peak p : spec) { + if (super.getSegmentNum(p.getMz(), curParentMass) == seg) { + numPeaksAtThisSegment++; + if (p.getRank() >= maxRank) + numMaxRankPeaksAtThisSegment++; + if (!explainedPeakSet.contains(p)) + unexplainedPeaksAtThisSegment.add(p); + } + } + + float midMassThisSegment = (1f / numSegments * seg + 1f / numSegments / 2) * annotation.getParentMass(); + float numBinsAtThisSegment = annotation.getParentMass() / numSegments / mme.getToleranceAsDa(midMassThisSegment) / 2; + + for (Peak p : unexplainedPeaksAtThisSegment) { + int rank = p.getRank(); + float noiseFreq = (annotation.size() - 1) / numSegments / numBinsAtThisSegment; + if (rank >= maxRank) + noiseDist[maxRank] += noiseFreq / numMaxRankPeaksAtThisSegment; + else + noiseDist[rank] += noiseFreq; + } + + for (IonType ion : ionTypes) { + if (numMaxRankPeaksAtThisSegment > 0) { + Float prevSumFreq = rankDistMaxRank.get(ion); + float curFreq = numExplainedMaxRankPeaks.get(ion) / (float) numMaxRankPeaksAtThisSegment; + rankDistMaxRank.put(ion, prevSumFreq + curFreq); + } + } + + noiseDist[maxRank + 1] += (numBinsAtThisSegment 
- numPeaksAtThisSegment) * (annotation.size() - 1) / numSegments / numBinsAtThisSegment; + } + + HashMap freqDist = new HashMap(); + for (IonType ion : ionTypes) { + Float[] dist = new Float[maxRank + 1]; + Histogram hist = rankDist.get(ion); + for (int i = 1; i <= maxRank - 1; i++) { + Integer num = hist.get(i); + dist[i - 1] = (num / (float) numSpec); + } + dist[maxRank - 1] = rankDistMaxRank.get(ion) / numSpec; + dist[maxRank] = hist.get(maxRank + 1) / (float) numSpec; + freqDist.put(ion, dist); + } + + // noise + Float[] dist = new Float[maxRank + 1]; + for (int i = 1; i <= maxRank + 1; i++) + dist[i - 1] = noiseDist[i] / numSpec; + freqDist.put(IonType.NOISE, dist); + + rankDistTable.put(partition, freqDist); + } + } + + protected void smoothing() { + smoothingRankDistTable(); + } + + protected void smoothingRankDistTable() { + if (rankDistTable == null) + return; + assert (smoothingRanks.length == smoothingWindowSize.length); + for (Partition partition : rankDistTable.keySet()) { + HashMap table = this.rankDistTable.get(partition); + for (IonType ion : table.keySet()) { + Float[] freq = table.get(ion); + Float[] smoothedFreq = new Float[freq.length]; + int smoothingIndex = 0; + for (int i = 0; i < freq.length - 2; i++) { + if (smoothingIndex < smoothingRanks.length - 1 && + i == smoothingRanks[smoothingIndex]) + smoothingIndex++; + int windowSize = smoothingWindowSize[smoothingIndex]; + float sumFrequencies = 0; + int numIndicesSummed = 0; + for (int d = -windowSize; d <= windowSize; d++) { + int index = i + d; + if (index < 0 || index > freq.length - 3) + continue; + sumFrequencies += freq[index]; + numIndicesSummed++; + } + while (sumFrequencies == 0 && windowSize < freq.length - 4) { + windowSize++; + int index = i - windowSize; + if (index >= 0) { + sumFrequencies += freq[index]; + numIndicesSummed++; + } + index = i + windowSize; + if (index <= freq.length - 3) { + sumFrequencies += freq[index]; + numIndicesSummed++; + } + } + if (sumFrequencies != 0) + 
smoothedFreq[i] = sumFrequencies / numIndicesSummed; + else + assert (false); + } + for (int i = 0; i < freq.length - 2; i++) + freq[i] = smoothedFreq[i]; + if (freq[freq.length - 1] == 0) + freq[freq.length - 1] = Float.MIN_VALUE; + if (freq[freq.length - 2] == 0) + freq[freq.length - 2] = freq[freq.length - 3]; + } + } + } +} diff --git a/src/main/java/edu/ucsd/msjava/msscorer/ScoringParameterGeneratorWithErrors.java b/src/main/java/edu/ucsd/msjava/msscorer/ScoringParameterGeneratorWithErrors.java new file mode 100644 index 00000000..3c1c6e6e --- /dev/null +++ b/src/main/java/edu/ucsd/msjava/msscorer/ScoringParameterGeneratorWithErrors.java @@ -0,0 +1,841 @@ +package edu.ucsd.msjava.msscorer; + +import edu.ucsd.msjava.mgf.MgfSpectrumParser; +import edu.ucsd.msjava.msgf.Histogram; +import edu.ucsd.msjava.msgf.IntHistogram; +import edu.ucsd.msjava.msgf.NominalMass; +import edu.ucsd.msjava.msgf.Tolerance; +import edu.ucsd.msjava.msscorer.NewScorerFactory.SpecDataType; +import edu.ucsd.msjava.msutil.*; +import edu.ucsd.msjava.msutil.IonType.PrefixIon; + +import java.io.File; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.HashSet; +import java.util.Hashtable; +import java.util.TreeMap; +import java.util.TreeSet; + +/** + * This only supports low accuracy fragment ions. 
+ * + * @author sangtaekim + */ +public class ScoringParameterGeneratorWithErrors extends NewRankScorer { + private static final float MIN_PRECURSOR_OFFSET = -300; // for precursors + private static final float MAX_PRECURSOR_OFFSET = 30; + private static final int MIN_NUM_SPECTRA_PER_PARTITION = 400; + private static final int MIN_NUM_SPECTRA_FOR_PRECURSOR_OFF = 150; + private static final int MAX_NUM_PARTITIONS_PER_CHARGE = 30; + + private static final float MIN_PRECURSOR_OFFSET_PROBABILITY = 0.15f; + private static final float MIN_ION_OFFSET_PROBABILITY = 0.15f; + private static final float MIN_MAIN_ION_OFFSET_PROBABILITY = 0.01f; + + private static final int MAX_RANK = 150; + private static final int NUM_SEGMENTS_PER_SPECTRUM = 2; + + private static final int[] smoothingRanks = {3, 5, 10, 20, 50, Integer.MAX_VALUE}; + private static final int[] smoothingWindowSize = {0, 1, 2, 3, 4, 5}; + + private static final float DECONVOLUTION_MASS_TOLERANCE = 0.02f; + protected static final int MAX_CHARGE = 20; + + public static void generateParameters( + File specFile, + SpecDataType dataType, + AminoAcidSet aaSet, + File outputDir, + boolean isText, + boolean verbose, + boolean singlePartition + ) { + SpectraContainer container = new SpectraContainer(specFile.getPath(), new MgfSpectrumParser().aaSet(aaSet)); + generateParameters(container, dataType, aaSet, outputDir, isText, verbose, singlePartition); + } + + public static void generateParameters( + SpectraContainer container, + SpecDataType dataType, + AminoAcidSet aaSet, + File outputDir, + boolean isText, + boolean verbose) { + generateParameters(container, dataType, aaSet, outputDir, isText, verbose, false); + } + + public static void generateParameters( + SpectraContainer container, + SpecDataType dataType, + AminoAcidSet aaSet, + File outputDir, + boolean isText, + boolean verbose, + boolean singlePartition) { + if (verbose) + System.out.println("Number of annotated PSMs: " + container.size()); + + String 
paramFileName = dataType.toString() + ".param"; + + File outputFile; + if (outputDir != null) + outputFile = new File(outputDir, paramFileName); + else + outputFile = new File(paramFileName); + + if (verbose) + System.out.println("Output file name: " + outputFile.getAbsolutePath()); + int errorScalingFactor = 0; + boolean applyDeconvolution = false; + + if (dataType.getInstrumentType() == InstrumentType.HIGH_RESOLUTION_LTQ + || dataType.getInstrumentType() == InstrumentType.TOF + || dataType.getInstrumentType().isHighResolution()) { + errorScalingFactor = 100; + applyDeconvolution = true; + if (verbose) + System.out.println("High-precision MS/MS data: " + + "errorScalingFactor(" + errorScalingFactor + ") " + + "chargeDeconvolution(" + applyDeconvolution + ")"); + } + + boolean considerPhosLoss = false; + if (dataType.getProtocol().getName().equals("Phosphorylation")) { + considerPhosLoss = true; + if (verbose) + System.out.println("Consider H3PO4 loss."); + } + + boolean consideriTRAQLoss = false; + if (dataType.getProtocol().getName().equals("iTRAQ")) { + consideriTRAQLoss = true; + if (verbose) + System.out.println("Consider iTRAQ loss."); + } + + boolean considerTMTLoss = false; + if (dataType.getProtocol().getName().equals("TMT")) { + considerTMTLoss = true; + if (verbose) + System.out.println("Consider TMT loss."); + } + + if (dataType.getProtocol().getName().equals("iTRAQPhospho")) { + considerPhosLoss = true; + consideriTRAQLoss = true; + if (verbose) + System.out.println("Consider iTRAQ and H3PO4 loss."); + } + + HashSet<String> pepSet = new HashSet<String>(); + for (Spectrum spec : container) + pepSet.add(spec.getAnnotationStr()); + + if (verbose) + System.out.println("Number of unique peptides: " + pepSet.size()); + int numSpecsPerPeptide; + if (pepSet.size() < 2000) { + numSpecsPerPeptide = 3; + } else { + numSpecsPerPeptide = 1; + } + if (verbose) + System.out.println("Consider " + numSpecsPerPeptide + " spectra per peptide."); + + HashMap<String, ArrayList<Spectrum>> pepSpecMap = new HashMap<String, ArrayList<Spectrum>>(); + for 
(Spectrum spec : container) { + if (spec.getAnnotationStr() == null) + continue; + String pep = spec.getAnnotationStr() + ":" + spec.getCharge(); + if (pep != null && pep.length() > 0) { + ArrayList<Spectrum> specList = pepSpecMap.get(pep); + if (specList == null) { + specList = new ArrayList<Spectrum>(); + pepSpecMap.put(pep, specList); + } + if (specList.size() < numSpecsPerPeptide) + specList.add(spec); + } + } + + SpectraContainer specContOnePerPep = new SpectraContainer(); + for (ArrayList<Spectrum> specList : pepSpecMap.values()) { + for (Spectrum spec : specList) { + specContOnePerPep.add(spec); + } + } + + ScoringParameterGeneratorWithErrors gen = new ScoringParameterGeneratorWithErrors( + specContOnePerPep, + dataType, + considerPhosLoss, + consideriTRAQLoss, + considerTMTLoss, + applyDeconvolution); + + gen.tolerance(new Tolerance(0.5f)); + + if (singlePartition) + gen.partition(2, true); + else + gen.partition(NUM_SEGMENTS_PER_SPECTRUM, false); + if (verbose) + System.out.println("Partition: " + gen.partitionSet.size()); + + gen.precursorOFF(MIN_PRECURSOR_OFFSET_PROBABILITY); + if (verbose) + System.out.println("PrecursorOFF Done."); + + gen.filterPrecursorPeaks(); + if (verbose) + System.out.println("Filtering Done."); + + if (applyDeconvolution) { + gen.deconvoluteSpectra(); + if (verbose) + System.out.println("Deconvolution Done."); + } + + gen.selectIonTypes(); + if (verbose) + System.out.println("Ion types selected."); + + gen.generateRankDist(MAX_RANK); + if (verbose) + System.out.println("Rank distribution computed."); + + gen.generateErrorDist(errorScalingFactor); + if (verbose) + System.out.println("Error distribution computed."); + + gen.smoothing(); + if (verbose) + System.out.println("Smoothing complete."); + + gen.writeParameters(outputFile); + gen.writeParametersPlainText(new File(outputFile.getPath() + ".txt")); + + if (verbose) + System.out.println("Writing Done."); + } + + private SpectraContainer specContainer; + private final boolean considerPhosLoss; + private 
final boolean consideriTRAQLoss; + private final boolean considerTMTLoss; + + public ScoringParameterGeneratorWithErrors(SpectraContainer specContainer, SpecDataType dataType, boolean considerPhosLoss, boolean consideriTRAQLoss, boolean considerTMTLoss, boolean applyDeconvolution) { + this.specContainer = specContainer; + this.considerPhosLoss = considerPhosLoss; + this.consideriTRAQLoss = consideriTRAQLoss; + this.considerTMTLoss = considerTMTLoss; + super.dataType = dataType; + super.applyDeconvolution = applyDeconvolution; + super.deconvolutionErrorTolerance = DECONVOLUTION_MASS_TOLERANCE; + } + + public void partition(int numSegments, boolean singlePartition) { + super.numSegments = numSegments; + chargeHist = new Histogram(); + partitionSet = new TreeSet(); + + Hashtable> parentMassMap = new Hashtable>(); + for (Spectrum spec : specContainer) { + int charge = spec.getCharge(); + if (charge <= 0) + continue; + chargeHist.add(charge); + if (spec.getAnnotation() != null) { + ArrayList precursorList = parentMassMap.get(charge); + if (precursorList == null) { + precursorList = new ArrayList(); + parentMassMap.put(charge, precursorList); + } + precursorList.add(spec.getPrecursorMass()); + } + } + + for (int c = chargeHist.minKey(); c <= chargeHist.maxKey(); c++) { + ArrayList parentMassList = parentMassMap.get(c); + if (parentMassList == null) + continue; + + int numSpec = parentMassList.size(); + if (numSpec < Math.round(MIN_NUM_SPECTRA_PER_PARTITION * 0.9f)) + continue; + + int partitionSize = Math.max(numSpec / MAX_NUM_PARTITIONS_PER_CHARGE, MIN_NUM_SPECTRA_PER_PARTITION); + + Collections.sort(parentMassList); + int bestSetSize = 0; + + if (singlePartition) + bestSetSize = numSpec; + else { + int smallestRemainder = partitionSize; + for (int i = Math.round(partitionSize * 0.9f); i <= Math.round(partitionSize * 1.1f); i++) { + int remainder = numSpec % i; + if (i - remainder < remainder) + remainder = i - remainder; + if (remainder < smallestRemainder || 
(remainder == smallestRemainder && Math.abs(partitionSize - i) < Math.abs(partitionSize - bestSetSize))) { + bestSetSize = i; + smallestRemainder = remainder; + } + } + } + int num = 0; + for (int i = 0; i == 0 || i < Math.round(numSpec / (float) bestSetSize); i++) { + if (num != 0) { + for (int seg = 0; seg < numSegments; seg++) + partitionSet.add(new Partition(c, parentMassList.get(num), seg)); + } else { + for (int seg = 0; seg < numSegments; seg++) + partitionSet.add(new Partition(c, 0f, seg)); + } + num += bestSetSize; + } + } + } + + private void precursorOFF(float minProbThreshold) { + if (chargeHist == null) { + assert (false) : "partition() must have been called before"; + return; + } + precursorOFFMap = new TreeMap>(); + numPrecurOFF = 0; + + for (int charge = chargeHist.minKey(); charge <= chargeHist.maxKey(); charge++) { + if (chargeHist.get(charge) < MIN_NUM_SPECTRA_FOR_PRECURSOR_OFF) + continue; + ArrayList precursorOffsetList = new ArrayList(); + int numSpecs = 0; + Hashtable> histList = new Hashtable>(); + for (int c = charge; c >= 2; c--) + histList.put(c, new Histogram()); + + for (Spectrum spec : specContainer) { + if (spec.getAnnotation() == null) + continue; + if (spec.getCharge() != charge) + continue; + numSpecs++; + spec = filter.apply(spec); + float precursorNeutralMass = spec.getPrecursorMass(); + for (int c = charge; c >= 2; c--) { + float precursorMz = (precursorNeutralMass + c * (float) Composition.ChargeCarrierMass()) / c; + ArrayList peakList = spec.getPeakListByMassRange( + precursorMz + MIN_PRECURSOR_OFFSET / (float) c - mme.getToleranceAsDa(precursorMz + MIN_PRECURSOR_OFFSET / (float) c) / 2, + precursorMz + MAX_PRECURSOR_OFFSET / (float) c + mme.getToleranceAsDa(precursorMz + MAX_PRECURSOR_OFFSET / (float) c) / 2); + + int prevMassIndexDiff = Integer.MIN_VALUE; + for (Peak p : peakList) { + float peakMass = p.getMz(); + int massIndexDiff = NominalMass.toNominalMass(peakMass - precursorMz); + if (massIndexDiff > prevMassIndexDiff) 
{ + histList.get(c).add(massIndexDiff); + prevMassIndexDiff = massIndexDiff; + } + } + } + } + + for (int c = charge; c >= 2; c--) { + ArrayList keyList = new ArrayList(histList.get(c).keySet()); + Collections.sort(keyList); + for (Integer key : keyList) { + float prob = (histList.get(c).get(key)) / (float) numSpecs; + if (prob > minProbThreshold) { + precursorOffsetList.add(new PrecursorOffsetFrequency((charge - c), NominalMass.getMassFromNominalMass(key), prob)); + } + } + } + precursorOFFMap.put(charge, precursorOffsetList); + numPrecurOFF += precursorOffsetList.size(); + } + } + + private void filterPrecursorPeaks() { + if (this.precursorOFFMap == null) + return; + for (Spectrum spec : specContainer) { + for (PrecursorOffsetFrequency off : this.getPrecursorOFF(spec.getCharge())) + spec.filterPrecursorPeaks(mme, off.getReducedCharge(), off.getOffset()); + } + } + + private void deconvoluteSpectra() { + SpectraContainer newSpecContainer = new SpectraContainer(); + for (Spectrum spec : specContainer) { + newSpecContainer.add(spec.getDeconvolutedSpectrum(DECONVOLUTION_MASS_TOLERANCE)); + } + specContainer = newSpecContainer; + } + + private Pair getPrecursorMassRange(Partition partition) { + float minParentMass = partition.getParentMass(); + float maxParentMass = Float.MAX_VALUE; + Partition higherPartition = partitionSet.higher(partition); + if (higherPartition != null) { + if (higherPartition.getCharge() == partition.getCharge() && higherPartition.getSegNum() == partition.getSegNum()) { + maxParentMass = higherPartition.getParentMass(); + } + } + return new Pair(minParentMass, maxParentMass); + } + + private void selectIonTypes() { + if (partitionSet == null) { + assert (false) : "partition() must have been called before!"; + return; + } + + fragOFFTable = new HashMap>(); + + for (Partition partition : partitionSet) { + int charge = partition.getCharge(); + Pair parentMassRange = getPrecursorMassRange(partition); + + SpectraContainer curPartContainer = new 
SpectraContainer(); + for (Spectrum spec : specContainer) { + if (spec.getAnnotation() == null) + continue; + if (spec.getCharge() != charge) + continue; + + float curParentMass = spec.getPrecursorMass(); + if (curParentMass < parentMassRange.getFirst() || curParentMass >= parentMassRange.getSecond()) + continue; + + curPartContainer.add(spec); + } + + ArrayList signalFragmentOffsetFrequencyList = new ArrayList(); + + int seg = partition.getSegNum(); + IonType[] allIonTypes = IonType.getAllKnownIonTypes(Math.min(charge, 3), true, considerPhosLoss, consideriTRAQLoss, considerTMTLoss).toArray(new IonType[0]); + + IonProbability probGen = new IonProbability( + curPartContainer.iterator(), + allIonTypes, + mme) + .filter(filter) + .segment(seg, numSegments); + + float[] ionProb = probGen.getIonProb(); + + float signalThreshold = MIN_ION_OFFSET_PROBABILITY; + for (int i = 0; i < allIonTypes.length; i++) { + if (ionProb[i] >= signalThreshold) + signalFragmentOffsetFrequencyList.add(new FragmentOffsetFrequency(allIonTypes[i], ionProb[i])); + } + + if (signalFragmentOffsetFrequencyList.size() == 0) { + int maxIndex = -1; + float maxIonProb = Float.MIN_VALUE; + for (int i = 0; i < allIonTypes.length; i++) { + if (ionProb[i] > MIN_MAIN_ION_OFFSET_PROBABILITY && ionProb[i] > maxIonProb) { + maxIndex = i; + maxIonProb = ionProb[i]; + } + } + if (maxIndex >= 0) + signalFragmentOffsetFrequencyList.add(new FragmentOffsetFrequency(allIonTypes[maxIndex], maxIonProb)); + } + + Collections.sort(signalFragmentOffsetFrequencyList, Collections.reverseOrder()); + fragOFFTable.put(partition, signalFragmentOffsetFrequencyList); + } + super.determineIonTypes(); + } + + private void generateRankDist(int maxRank) { + if (partitionSet == null) { + assert (false) : "partition() must have been called!"; + return; + } + + rankDistTable = new HashMap>(); + this.maxRank = maxRank; + + for (Partition partition : partitionSet) { + int charge = partition.getCharge(); + IonType[] ionTypes = 
getIonTypes(partition); + if (ionTypes == null || ionTypes.length == 0) + continue; + + Pair parentMassRange = getPrecursorMassRange(partition); + int seg = partition.getSegNum(); + + int numSpec = 0; + Hashtable> rankDist = new Hashtable>(); + Hashtable rankDistMaxRank = new Hashtable(); + Hashtable rankDistUnexplained = new Hashtable(); + + for (IonType ion : ionTypes) { + rankDist.put(ion, new Histogram()); + rankDistMaxRank.put(ion, 0f); + rankDistUnexplained.put(ion, 0f); + } + rankDist.put(IonType.NOISE, new Histogram()); + + float[] noiseDist = new float[maxRank + 2]; + int numMaxRankPeaks = 0; + int totalCleavageSites = 0; + + for (Spectrum spec : specContainer) { + int numExplainedPeaks = 0; + if (spec.getAnnotation() == null) + continue; + if (spec.getCharge() != charge) + continue; + float curParentMass = spec.getPrecursorMass(); + if (curParentMass < parentMassRange.getFirst() || curParentMass >= parentMassRange.getSecond()) + continue; + + Peptide annotation = spec.getAnnotation(); + spec.setRanksOfPeaks(); + numSpec++; + numMaxRankPeaks += spec.size() - maxRank + 1; + totalCleavageSites += annotation.size() - 1; + int prmMassIndex = 0; + int srmMassIndex = 0; + + HashSet explainedPeakSet = new HashSet(); + Hashtable numExplainedMaxRankPeaks = new Hashtable(); + for (IonType ion : ionTypes) { + numExplainedMaxRankPeaks.put(ion, 0); + } + + int numSignalBinsAtThisSegment = 0; + for (int i = 0; i < annotation.size() - 1; i++) { + prmMassIndex += NominalMass.toNominalMass(annotation.get(i).getMass()); + srmMassIndex += NominalMass.toNominalMass(annotation.get(annotation.size() - 1 - i).getMass()); + + float prm = NominalMass.getMassFromNominalMass(prmMassIndex); + float srm = NominalMass.getMassFromNominalMass(srmMassIndex); + for (IonType ion : ionTypes) { + float theoMass; + if (ion instanceof IonType.PrefixIon) + theoMass = ion.getMz(prm); + else + theoMass = ion.getMz(srm); + + int segNum = super.getSegmentNum(theoMass, curParentMass); + if (segNum == 
seg) { + numSignalBinsAtThisSegment++; + Peak p = spec.getPeakByMass(theoMass, mme); + if (p != null) { + numExplainedPeaks++; + int rank = p.getRank(); + if (rank >= maxRank) { + rank = maxRank; + numExplainedMaxRankPeaks.put(ion, numExplainedMaxRankPeaks.get(ion) + 1); + } + explainedPeakSet.add(p); + rankDist.get(ion).add(rank); + } else { + rankDist.get(ion).add(maxRank + 1); + } + } + } + } + + ArrayList unexplainedPeaksAtThisSegment = new ArrayList(); + int numPeaksAtThisSegment = 0; + int numMaxRankPeaksAtThisSegment = 0; + for (Peak p : spec) { + if (super.getSegmentNum(p.getMz(), curParentMass) == seg) { + numPeaksAtThisSegment++; + if (p.getRank() >= maxRank) + numMaxRankPeaksAtThisSegment++; + if (!explainedPeakSet.contains(p)) + unexplainedPeaksAtThisSegment.add(p); + } + } + + float midMassThisSegment = (1f / numSegments * seg + 1f / numSegments / 2) * annotation.getParentMass(); + float numBinsAtThisSegment = annotation.getParentMass() / numSegments / mme.getToleranceAsDa(midMassThisSegment) / 2; + + for (Peak p : unexplainedPeaksAtThisSegment) { + int rank = p.getRank(); + float noiseFreq = (annotation.size() - 1) / numSegments / numBinsAtThisSegment; + if (rank >= maxRank) + noiseDist[maxRank] += noiseFreq / numMaxRankPeaksAtThisSegment; + else + noiseDist[rank] += noiseFreq; + } + + for (IonType ion : ionTypes) { + if (numMaxRankPeaksAtThisSegment > 0) { + Float prevSumFreq = rankDistMaxRank.get(ion); + float curFreq = numExplainedMaxRankPeaks.get(ion) / (float) numMaxRankPeaksAtThisSegment; + rankDistMaxRank.put(ion, prevSumFreq + curFreq); + } + } + + noiseDist[maxRank + 1] += (numBinsAtThisSegment - numPeaksAtThisSegment) * (annotation.size() - 1) / numSegments / numBinsAtThisSegment; + } + + HashMap freqDist = new HashMap(); + for (IonType ion : ionTypes) { + Float[] dist = new Float[maxRank + 1]; + Histogram hist = rankDist.get(ion); + for (int i = 1; i <= maxRank - 1; i++) { + Integer num = hist.get(i); + dist[i - 1] = (num / (float) 
numSpec); + } + dist[maxRank - 1] = rankDistMaxRank.get(ion) / numSpec; + dist[maxRank] = hist.get(maxRank + 1) / (float) numSpec; + freqDist.put(ion, dist); + } + + // noise + Float[] dist = new Float[maxRank + 1]; + for (int i = 1; i <= maxRank + 1; i++) + dist[i - 1] = noiseDist[i] / numSpec; + freqDist.put(IonType.NOISE, dist); + + rankDistTable.put(partition, freqDist); + } + } + + private void generateErrorDist(int errorScalingFactor) { + this.errorScalingFactor = errorScalingFactor; + if (errorScalingFactor > 0) { + generateIonErrorDist(); + generateNoiseErrorDist(); + } + } + + private void generateIonErrorDist() { + ionErrDistTable = new HashMap(); + ionExistenceTable = new HashMap(); + for (Partition partition : partitionSet) { + int charge = partition.getCharge(); + Pair parentMassRange = getPrecursorMassRange(partition); + int seg = partition.getSegNum(); + if (seg != super.getNumSegments() - 1) + continue; + IonType mainIon = this.getMainIonType(partition); + IntHistogram errHist = new IntHistogram(); + int[] edgeCount = new int[4]; + int numSpecs = 0; + for (Spectrum spec : specContainer) { + if (spec.getAnnotation() == null) + continue; + if (spec.getCharge() != charge) + continue; + + float curParentMass = spec.getPrecursorMass(); + if (curParentMass < parentMassRange.getFirst() || curParentMass >= parentMassRange.getSecond()) + continue; + + numSpecs++; + Peptide peptide; + + peptide = spec.getAnnotation(); + + int intResidueMass = 0; + float[] obsMass = new float[peptide.size() + 1]; + + obsMass[0] = mainIon.getOffset(); + for (int i = 0; i < peptide.size() - 1; i++) { + if (mainIon instanceof PrefixIon) + intResidueMass += peptide.get(i).getNominalMass(); + else + intResidueMass += peptide.get(peptide.size() - 1 - i).getNominalMass(); + + float theoMass = mainIon.getMz(NominalMass.getMassFromNominalMass(intResidueMass)); + Peak p = spec.getPeakByMass(theoMass, mme); + if (p != null) + obsMass[i + 1] = p.getMz(); + else + obsMass[i + 1] = -1; + } 
+ + obsMass[peptide.size()] = mainIon.getMz(peptide.getMass()); + for (int i = 1; i <= peptide.size(); i++) { + if (obsMass[i] >= 0) { + if (obsMass[i - 1] >= 0) { + AminoAcid aa; + if (mainIon instanceof PrefixIon) + aa = peptide.get(i - 1); + else + aa = peptide.get(peptide.size() - i); + + float expMass = obsMass[i] - obsMass[i - 1]; + float theoMass = aa.getMass() / mainIon.getCharge(); + float diff = expMass - theoMass; + int diffIndex = Math.round(diff * errorScalingFactor); + if (diffIndex > errorScalingFactor) + diffIndex = errorScalingFactor; + else if (diffIndex < -errorScalingFactor) + diffIndex = -errorScalingFactor; + errHist.add(diffIndex); + edgeCount[3]++; + } else + edgeCount[1]++; + } else { + if (obsMass[i - 1] >= 0) + edgeCount[2]++; + else + edgeCount[0]++; + } + } + } + + Float[] ionErrHist = new Float[2 * errorScalingFactor + 1]; + float[] smoothedHist = errHist.getSmoothedHist(errorScalingFactor); + for (int i = -errorScalingFactor; i <= errorScalingFactor; i++) + ionErrHist[i + errorScalingFactor] = smoothedHist[i + errorScalingFactor] / (float) errHist.totalCount(); + + Float[] ionExistence = new Float[edgeCount.length]; + int sumEdgeCount = 0; + for (int i = 0; i < edgeCount.length; i++) + sumEdgeCount += edgeCount[i]; + for (int i = 0; i < edgeCount.length; i++) + ionExistence[i] = edgeCount[i] / (float) sumEdgeCount; + + for (int i = 0; i < this.numSegments; i++) { + Partition part = new Partition(partition.getCharge(), partition.getParentMass(), i); + if (partitionSet.contains(part)) { + ionErrDistTable.put(part, ionErrHist); + ionExistenceTable.put(part, ionExistence); + } + } + } + } + + private void generateNoiseErrorDist() { + this.noiseErrDistTable = new HashMap(); + AminoAcidSet aaSet = AminoAcidSet.getStandardAminoAcidSetWithFixedCarbamidomethylatedCys(); + AminoAcid aaK = aaSet.getAminoAcid('K'); + AminoAcid aaQ = aaSet.getAminoAcid('Q'); + int heaviestAANominalMass = aaSet.getMaxNominalMass(); + float[] nominalMass = new 
float[heaviestAANominalMass + 1]; + for (AminoAcid aa : aaSet) + nominalMass[aa.getNominalMass()] = aa.getMass(); + + for (Partition partition : partitionSet) { + int charge = partition.getCharge(); + Pair parentMassRange = getPrecursorMassRange(partition); + int seg = partition.getSegNum(); + if (seg != super.getNumSegments() - 1) + continue; + + IntHistogram errHist = new IntHistogram(); + int numSpecs = 0; + for (Spectrum spec : specContainer) { + if (spec.getAnnotation() == null) + continue; + if (spec.getCharge() != charge) + continue; + + float curParentMass = spec.getPrecursorMass(); + if (curParentMass < parentMassRange.getFirst() || curParentMass >= parentMassRange.getSecond()) + continue; + + Spectrum noiseSpec = (Spectrum) spec.clone(); + + numSpecs++; + + for (int i = 0; i < noiseSpec.size() - 1; i++) { + Peak p1 = noiseSpec.get(i); + float p1Mass = p1.getMz(); + int nominalP1 = NominalMass.toNominalMass(p1.getMz()); + for (int j = i + 1; j < noiseSpec.size(); j++) { + Peak p2 = noiseSpec.get(j); + float p2Mass = p2.getMz(); + int nominalP2 = NominalMass.toNominalMass(p2.getMz()); + int nominalDiff = nominalP2 - nominalP1; + if (nominalDiff > heaviestAANominalMass) + break; + if (nominalMass[nominalDiff] == 0) + continue; + + float diff = p2Mass - p1Mass; + float aaMass = nominalMass[nominalDiff]; + if (nominalDiff == 128) { + if (Math.abs(diff - aaQ.getMass()) > Math.abs(diff - aaK.getMass())) + aaMass = aaK.getMass(); + else + aaMass = aaQ.getMass(); + } + float err = diff - aaMass; + errHist.add(Math.round(err * errorScalingFactor)); + } + } + } + Float[] noiseErrHist = new Float[2 * errorScalingFactor + 1]; + float[] smoothedHist = errHist.getSmoothedHist(errorScalingFactor); + for (int i = -errorScalingFactor; i <= errorScalingFactor; i++) + noiseErrHist[i + errorScalingFactor] = smoothedHist[i + errorScalingFactor] / (float) errHist.totalCount(); + + for (int i = 0; i < this.numSegments; i++) { + Partition part = new 
Partition(partition.getCharge(), partition.getParentMass(), i); + if (partitionSet.contains(part)) { + noiseErrDistTable.put(part, noiseErrHist); + } + } + } + } + + protected void smoothing() { + smoothingRankDistTable(); + } + + protected void smoothingRankDistTable() { + if (rankDistTable == null) + return; + assert (smoothingRanks.length == smoothingWindowSize.length); + for (Partition partition : rankDistTable.keySet()) { + HashMap table = this.rankDistTable.get(partition); + for (IonType ion : table.keySet()) { + Float[] freq = table.get(ion); + Float[] smoothedFreq = new Float[freq.length]; + int smoothingIndex = 0; + for (int i = 0; i < freq.length - 2; i++) { + if (smoothingIndex < smoothingRanks.length - 1 && + i == smoothingRanks[smoothingIndex]) + smoothingIndex++; + int windowSize = smoothingWindowSize[smoothingIndex]; + float sumFrequencies = 0; + int numIndicesSummed = 0; + for (int d = -windowSize; d <= windowSize; d++) { + int index = i + d; + if (index < 0 || index > freq.length - 3) + continue; + sumFrequencies += freq[index]; + numIndicesSummed++; + } + while (sumFrequencies == 0 && windowSize < freq.length - 4) { + windowSize++; + int index = i - windowSize; + if (index >= 0) { + sumFrequencies += freq[index]; + numIndicesSummed++; + } + index = i + windowSize; + if (index <= freq.length - 3) { + sumFrequencies += freq[index]; + numIndicesSummed++; + } + } + if (sumFrequencies != 0) + smoothedFreq[i] = sumFrequencies / numIndicesSummed; + else + assert (false); + } + for (int i = 0; i < freq.length - 2; i++) + freq[i] = smoothedFreq[i]; + if (freq[freq.length - 1] == 0) + freq[freq.length - 1] = Float.MIN_VALUE; + if (freq[freq.length - 2] == 0) + freq[freq.length - 2] = freq[freq.length - 3]; + } + } + } +} diff --git a/src/main/java/edu/ucsd/msjava/msutil/AnnotatedSpectra.java b/src/main/java/edu/ucsd/msjava/msutil/AnnotatedSpectra.java new file mode 100644 index 00000000..8a39b7ac --- /dev/null +++ 
b/src/main/java/edu/ucsd/msjava/msutil/AnnotatedSpectra.java @@ -0,0 +1,325 @@ +package edu.ucsd.msjava.msutil; + +import edu.ucsd.msjava.mgf.BufferedLineReader; +import edu.ucsd.msjava.misc.ExceptionCapturer; +import edu.ucsd.msjava.misc.ProgressData; +import edu.ucsd.msjava.misc.ProgressReporter; +import edu.ucsd.msjava.misc.ThreadPoolExecutorWithExceptions; + +import java.io.File; +import java.io.IOException; +import java.io.PrintStream; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; +import java.util.concurrent.TimeUnit; + +/** + * Restored trainer support class. mzID input has been excised in this fork + * (the {@code mzid/} package and its {@code MzIDParser} were removed); only + * pre-converted TSV result files are accepted. + */ +public class AnnotatedSpectra implements ProgressReporter, ExceptionCapturer { + private File[] resultFiles; + private File specDir; + private AminoAcidSet aaSet; + private float fdrThreshold = 0.01f; + private ProgressData progress; + private boolean dropErrorDatasets = false; + private Throwable exception = null; + + @Override + public void setProgressData(ProgressData data) { + progress = data; + } + + @Override + public ProgressData getProgressData() { + return progress; + } + + @Override + public boolean hasException() { + return exception != null; + } + + @Override + public Throwable getException() { + return exception; + } + + public void setDropErrorDatasets(boolean dropErrors) { + dropErrorDatasets = dropErrors; + } + + private SpectraContainer annotatedSpectra; + + public AnnotatedSpectra(File[] resultFiles, File specDir, AminoAcidSet aaSet) { + this.resultFiles = resultFiles; + this.specDir = specDir; + this.aaSet = aaSet; + this.progress = null; + } + + public AnnotatedSpectra(File[] resultFiles, File specDir, AminoAcidSet aaSet, float fdrThreshold, boolean dropErrors) { + this.resultFiles = resultFiles; + this.specDir = specDir; 
+        this.aaSet = aaSet;
+        this.fdrThreshold = fdrThreshold;
+        this.dropErrorDatasets = dropErrors;
+        this.progress = null;
+    }
+
+    public static class ConcurrentAnnotatedSpectraParser extends AnnotatedSpectra implements Runnable {
+        private List<Spectrum> results;
+        private List<String> errors;
+
+        public ConcurrentAnnotatedSpectraParser(File[] resultFiles, File specDir, AminoAcidSet aaSet, float fdrThreshold, boolean dropErrors, List<Spectrum> resultList, List<String> errorList) {
+            super(resultFiles, specDir, aaSet, fdrThreshold, dropErrors);
+            results = resultList;
+            errors = errorList;
+        }
+
+        @Override
+        public void run() {
+            String result = parse();
+            results.addAll(getAnnotatedSpecContainer());
+            if (result != null) {
+                errors.add(result);
+            }
+        }
+    }
+
+    public AnnotatedSpectra fdrThreshold(float fdrThreshold) {
+        this.fdrThreshold = fdrThreshold;
+        return this;
+    }
+
+    public SpectraContainer getAnnotatedSpecContainer() {
+        return annotatedSpectra;
+    }
+
+    public String parse(int numThreads, boolean dropErrors) {
+        if (numThreads <= 1) {
+            // Honor the dropErrors flag on the single-threaded path too:
+            // parse() reads the instance field, not this method's argument.
+            dropErrorDatasets = dropErrors;
+            return parse();
+        }
+
+        List<Spectrum> results = Collections.synchronizedList(new ArrayList<Spectrum>());
+        List<String> errors = Collections.synchronizedList(new ArrayList<String>());
+
+        // Thread pool
+        ThreadPoolExecutorWithExceptions executor = ThreadPoolExecutorWithExceptions.newFixedThreadPool(numThreads);
+        executor.setTaskName("Parse");
+
+        try {
+            List<List<File>> taskFiles = new ArrayList<List<File>>();
+            for (int i = 0; i < numThreads; i++) {
+                taskFiles.add(new ArrayList<File>());
+            }
+            for (int i = 0; i < resultFiles.length; i++) {
+                taskFiles.get(i % numThreads).add(resultFiles[i]);
+            }
+            for (int i = 0; i < numThreads; i++) {
+                List<File> thisTaskFiles = taskFiles.get(i);
+                System.out.println("Task " + (i + 1) + ": " + thisTaskFiles.size() + " files.");
+                ConcurrentAnnotatedSpectraParser parser = new ConcurrentAnnotatedSpectraParser(thisTaskFiles.toArray(new File[0]), specDir, aaSet, fdrThreshold, dropErrors, results, errors);
+                executor.execute(parser);
+            }
+            taskFiles.clear();
+
+            
executor.outputProgressReport(); + executor.shutdown(); + + try { + executor.awaitTerminationWithExceptions(Long.MAX_VALUE, TimeUnit.NANOSECONDS); + } catch (InterruptedException e) { + if (!executor.HasThrownData()) { + e.printStackTrace(); + } + } + + executor.outputProgressReport(); + } catch (OutOfMemoryError ex) { + ex.printStackTrace(); + executor.shutdownNow(); + return "Task terminated; results incomplete. Please run again with a greater amount of memory, using \"-Xmx4G\", for example."; + } catch (Exception ex) { + ex.printStackTrace(); + executor.shutdownNow(); + return "Task terminated; results incomplete. Please run again."; + } catch (Throwable ex) { + ex.printStackTrace(); + executor.shutdownNow(); + return "Task terminated; results incomplete. Please run again."; + } + annotatedSpectra = new SpectraContainer(); + annotatedSpectra.addAll(results); + String errorList = null; + for (String error : errors) { + if (errorList == null) { + errorList = error; + } else { + errorList += "\n" + error; + } + } + return errorList; + } + + public String parse() { + if (progress == null) { + progress = new ProgressData(); + } + annotatedSpectra = new SpectraContainer(); + + System.out.println("Using " + resultFiles.length + " result files:"); + for (File resultFile : resultFiles) + System.out.println("\t" + resultFile.getName()); + + int count = 0; + int total = resultFiles.length; + String aggErrs = null; + for (File resultFile : resultFiles) { + String errMsg = parseFile(resultFile); + count++; + progress.report(count, total); + if (errMsg != null) { + String msg = "Error while parsing " + resultFile.getName() + ": " + errMsg; + if (dropErrorDatasets) { + System.out.println(msg); + if (aggErrs == null) { + aggErrs = msg; + } else { + aggErrs += "\n" + msg; + } + } else { + exception = new Exception(msg); + return msg; + } + } + } + return null; + } + + public void writeToMgf(PrintStream out) { + if (annotatedSpectra != null) { + for (Spectrum spec : 
annotatedSpectra)
+                spec.outputMgf(out);
+        }
+    }
+
+    public String parseFile(File resultFile) {
+        System.out.println("Parsing " + resultFile.getName());
+        File tsvResultFile;
+        if (resultFile.getName().endsWith(".mzid")) {
+            // mzID input was supported by upstream via MzIDParser; the mzid/ package
+            // was removed in this fork. Pre-convert .mzid to .tsv and pass the .tsv.
+            return "mzid input is not supported in this build; pre-convert to .tsv and supply that instead.";
+        } else if (resultFile.getName().endsWith(".tsv")) {
+            tsvResultFile = resultFile;
+        } else {
+            return "Unrecognized result file extension (expected .tsv): " + resultFile.getName();
+        }
+
+        BufferedLineReader in;
+        try {
+            in = new BufferedLineReader(tsvResultFile.getPath());
+        } catch (IOException e) {
+            e.printStackTrace();
+            return "Unable to open " + tsvResultFile.getPath();
+        }
+
+        String s = in.readLine();
+        if (s == null || !s.startsWith("#")) {
+            return "Not a valid tsv result file";
+        }
+
+        int specIdCol = -1;
+        int specFileCol = -1;
+        int pepCol = -1;
+        int fdrCol = -1;
+        int chargeCol = -1;
+
+        String[] label = s.split("\t");
+        for (int i = 0; i < label.length; i++) {
+            if (label[i].equalsIgnoreCase("#SpecFile"))
+                specFileCol = i;
+            else if (label[i].equalsIgnoreCase("SpecID"))
+                specIdCol = i;
+            else if (label[i].equalsIgnoreCase("Peptide"))
+                pepCol = i;
+            else if (label[i].equalsIgnoreCase("FDR") || label[i].equalsIgnoreCase("EFDR") || label[i].equalsIgnoreCase("QValue") || label[i].equalsIgnoreCase("SpecQValue"))
+                fdrCol = i;
+            else if (label[i].equalsIgnoreCase("Charge"))
+                chargeCol = i;
+        }
+        if (specIdCol < 0 || specFileCol < 0 || pepCol < 0)
+            return "Not a valid mzid file";
+        if (fdrCol < 0)
+            return "QValue is missing";
+
+        ArrayList<String> resultList = new ArrayList<String>();
+        while ((s = in.readLine()) != null) {
+            String[] token = s.split("\t");
+            if (token.length <= specIdCol || token.length <= specFileCol || token.length <= pepCol || token.length <= fdrCol)
+                continue;
+
+            float fdr 
= Float.parseFloat(token[fdrCol]);
+
+            if (fdr <= fdrThreshold) {
+                resultList.add(s);
+            }
+        }
+
+        Iterator<String> itr = resultList.iterator();
+        List<Spectrum> annotatedResults = new ArrayList<Spectrum>();
+
+        HashMap<String, SpectraAccessor> specAccessorMap = new HashMap<String, SpectraAccessor>();
+        while (itr.hasNext()) {
+            String str = itr.next();
+            String[] token = str.split("\t");
+
+            String pep = token[pepCol];
+            if (pep.matches(".\\..+\\.."))
+                pep = pep.substring(pep.indexOf('.') + 1, pep.lastIndexOf('.'));
+
+            String specFileName = token[specFileCol];
+            specFileName = new File(specFileName).getName();
+
+            int charge = Integer.parseInt(token[chargeCol]);
+
+            SpectraAccessor specAccessor = specAccessorMap.get(specFileName);
+            if (specAccessor == null) {
+                File specFile = new File(specDir.getPath() + File.separator + specFileName);
+                specAccessor = new SpectraAccessor(specFile);
+                specAccessorMap.put(specFileName, specAccessor);
+            }
+
+            String specId = token[specIdCol];
+            Spectrum spec = specAccessor.getSpectrumById(specId);
+
+            if (spec == null)
+                return specFileName + ":" + specId + " is not available!";
+            else {
+                Peptide peptide = new Peptide(pep, aaSet);
+                spec.setCharge(charge);
+
+                if (Math.abs(spec.getPeptideMass() - peptide.getMass()) < 5) {
+                    spec.setAnnotation(peptide);
+                    annotatedResults.add(spec);
+                } else {
+                    return "parent mass doesn't match " + specFileName + ":" + specId + " " + peptide.toString() + " " + spec.getPeptideMass() + " != " + peptide.getMass();
+                }
+            }
+        }
+
+        try {
+            in.close();
+        } catch (IOException e) {
+            e.printStackTrace();
+        }
+        annotatedSpectra.addAll(annotatedResults);
+        return null;
+    }
+}
diff --git a/src/main/java/edu/ucsd/msjava/ui/ScoringParamGen.java b/src/main/java/edu/ucsd/msjava/ui/ScoringParamGen.java
new file mode 100644
index 00000000..b335342f
--- /dev/null
+++ b/src/main/java/edu/ucsd/msjava/ui/ScoringParamGen.java
@@ -0,0 +1,205 @@
+package edu.ucsd.msjava.ui;
+
+import edu.ucsd.msjava.msscorer.NewScorerFactory.SpecDataType;
+import 
edu.ucsd.msjava.msscorer.ScoringParameterGeneratorWithErrors;
+import edu.ucsd.msjava.msutil.ActivationMethod;
+import edu.ucsd.msjava.msutil.AminoAcidSet;
+import edu.ucsd.msjava.msutil.AnnotatedSpectra;
+import edu.ucsd.msjava.msutil.Enzyme;
+import edu.ucsd.msjava.msutil.InstrumentType;
+import edu.ucsd.msjava.msutil.Protocol;
+
+import java.io.BufferedOutputStream;
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.PrintStream;
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Trainer entry point. The MS-GF+ {@code params/ParamManager} CLI framework
+ * was removed from this fork; this restored entry point uses a simple
+ * {@code args[]} parser instead. Inputs that depended on the deleted
+ * {@code mzid/} package are not accepted - supply pre-converted .tsv files.
+ */
+public class ScoringParamGen {
+
+    public static final int VERSION = 8831;
+    public static final String DATE = "02/04/2013";
+
+    public static void main(String argv[]) {
+        if (argv.length == 0 || hasHelpFlag(argv)) {
+            printUsageInfo(null);
+            return;
+        }
+
+        String errMsg = run(argv);
+        if (errMsg != null) {
+            System.err.println("[Error] " + errMsg);
+            System.out.println();
+            printUsageInfo(null);
+        }
+    }
+
+    private static boolean hasHelpFlag(String[] argv) {
+        for (String a : argv) {
+            if (a.equalsIgnoreCase("-h") || a.equalsIgnoreCase("--help") || a.equalsIgnoreCase("-help"))
+                return true;
+        }
+        return false;
+    }
+
+    static String run(String[] argv) {
+        // -i <results.tsv>[,results2.tsv,...]  (one or more comma-separated training result TSVs)
+        // -d <specDir>             (directory holding the spectrum files referenced by the TSVs)
+        // -m <activationMethod>    (e.g. CID, ETD, HCD, UVPD)
+        // -inst <instrumentType>   (e.g. LowRes, HighRes)
+        // -e <enzymeName>          (e.g. Tryp)
+        // -protocol <protocol>     (e.g. 
NoProtocol, optional, default automatic)
+        // -thread <numThreads>     (default 1)
+        // -dropErrors 0|1          (default 0)
+        // -mgf 0|1                 (default 0; if 1, also writes .mgf)
+        File[] resultFiles = null;
+        File specDir = null;
+        ActivationMethod activationMethod = null;
+        InstrumentType instType = null;
+        Enzyme enzyme = null;
+        Protocol protocol = null;
+        int numThreads = 1;
+        boolean dropErrors = false;
+        boolean createMgf = false;
+
+        for (int i = 0; i < argv.length; i += 2) {
+            if (!argv[i].startsWith("-") || i + 1 >= argv.length) {
+                return "Invalid parameter: " + argv[i];
+            }
+            String key = argv[i];
+            String val = argv[i + 1];
+            if (key.equalsIgnoreCase("-i")) {
+                String[] paths = val.split(",");
+                List<File> files = new ArrayList<File>(paths.length);
+                for (String p : paths) {
+                    File f = new File(p);
+                    if (!f.exists())
+                        return "Input file does not exist: " + p;
+                    files.add(f);
+                }
+                resultFiles = files.toArray(new File[0]);
+            } else if (key.equalsIgnoreCase("-d")) {
+                specDir = new File(val);
+                if (!specDir.exists() || !specDir.isDirectory())
+                    return "Spectrum directory does not exist or is not a directory: " + val;
+            } else if (key.equalsIgnoreCase("-m")) {
+                activationMethod = ActivationMethod.get(val);
+                if (activationMethod == null)
+                    return "Unrecognized activation method: " + val;
+            } else if (key.equalsIgnoreCase("-inst")) {
+                instType = InstrumentType.get(val);
+                if (instType == null)
+                    return "Unrecognized instrument type: " + val;
+            } else if (key.equalsIgnoreCase("-e")) {
+                enzyme = Enzyme.getEnzymeByName(val);
+                if (enzyme == null)
+                    return "Unrecognized enzyme: " + val;
+            } else if (key.equalsIgnoreCase("-protocol")) {
+                protocol = Protocol.get(val);
+                if (protocol == null)
+                    return "Unrecognized protocol: " + val;
+            } else if (key.equalsIgnoreCase("-thread")) {
+                try {
+                    numThreads = Integer.parseInt(val);
+                } catch (NumberFormatException e) {
+                    return "-thread must be an integer";
+                }
+            } else if (key.equalsIgnoreCase("-dropErrors")) {
+                dropErrors = "1".equals(val);
+            } else if 
(key.equalsIgnoreCase("-mgf")) { + createMgf = "1".equals(val); + } else { + return "Unknown option: " + key; + } + } + + if (resultFiles == null) return "missing -i (training result TSV files)"; + if (specDir == null) return "missing -d (spectrum directory)"; + if (activationMethod == null) return "missing -m (activation method)"; + if (instType == null) return "missing -inst (instrument type)"; + if (enzyme == null) return "missing -e (enzyme)"; + if (protocol == null) protocol = Protocol.AUTOMATIC; + + long time = System.currentTimeMillis(); + AminoAcidSet aaSet = AminoAcidSet.getStandardAminoAcidSet(); + + AnnotatedSpectra annotatedSpec = new AnnotatedSpectra(resultFiles, specDir, aaSet); + System.out.println("Reading training PSMs..."); + String parseErr = annotatedSpec.parse(numThreads, dropErrors); + if (parseErr != null) { + if (dropErrors) { + System.out.println("Datasets with errors (dropped): " + parseErr); + } else { + return parseErr; + } + } + if (annotatedSpec.getAnnotatedSpecContainer() == null + || annotatedSpec.getAnnotatedSpecContainer().isEmpty()) { + return "No results to train on. 
Exiting."; + } + System.out.println("Done."); + + SpecDataType dataType = new SpecDataType(activationMethod, instType, enzyme, protocol); + + if (createMgf) { + String mgfFileName = dataType.toString() + ".mgf"; + File mgfFile = new File(mgfFileName); + System.out.println("Creating " + mgfFile.getPath()); + try { + PrintStream mgfOut = new PrintStream(new BufferedOutputStream(new FileOutputStream(mgfFile))); + annotatedSpec.writeToMgf(mgfOut); + mgfOut.close(); + } catch (FileNotFoundException e) { + e.printStackTrace(); + } + } + + ScoringParameterGeneratorWithErrors.generateParameters( + annotatedSpec.getAnnotatedSpecContainer(), + dataType, + aaSet, + new File("."), + false, + true); + + System.out.format("ScoringParamGen complete (total elapsed time: %.2f sec)\n", + (System.currentTimeMillis() - time) / (float) 1000); + return null; + } + + public static void printUsageInfo(String message) { + if (message != null) { + System.err.println(message); + } + System.out.println("ScoringParamGen v" + VERSION + " (" + DATE + ")"); + System.out.println("Trains MS-GF+ scoring parameters (.param) from annotated PSM training data."); + System.out.println(); + System.out.println("Usage: java -Xmx2000M -cp MSGFPlus.jar edu.ucsd.msjava.ui.ScoringParamGen [options]"); + System.out.println(); + System.out.println("Required:"); + System.out.println(" -i Training result TSV files (mzID input not supported in this build)"); + System.out.println(" -d Directory holding the spectrum files referenced by the TSVs"); + System.out.println(" -m Activation method (e.g. CID, ETD, HCD, UVPD)"); + System.out.println(" -inst Instrument type (e.g. LowRes, HighRes, QExactive)"); + System.out.println(" -e Enzyme name (e.g. 
Tryp, Chymotryp, LysC, AspN)"); + System.out.println(); + System.out.println("Optional:"); + System.out.println(" -protocol Protocol (default: NoProtocol/automatic)"); + System.out.println(" -thread Worker threads for parsing PSMs (default: 1)"); + System.out.println(" -dropErrors 0|1 Drop datasets with errors instead of failing (default: 0)"); + System.out.println(" -mgf 0|1 Also emit aggregated .mgf (default: 0)"); + System.out.println(); + System.out.println("Notes:"); + System.out.println(" * The output .param file is written to the current working directory."); + System.out.println(" * mzID input was supported by upstream MS-GF+ but the mzid/ package was removed"); + System.out.println(" from this fork; pre-convert .mzid to .tsv if needed."); + } +} diff --git a/src/test/java/edu/ucsd/msjava/msutil/TestAnnotatedSpectraRecovery.java b/src/test/java/edu/ucsd/msjava/msutil/TestAnnotatedSpectraRecovery.java new file mode 100644 index 00000000..b5fc5001 --- /dev/null +++ b/src/test/java/edu/ucsd/msjava/msutil/TestAnnotatedSpectraRecovery.java @@ -0,0 +1,232 @@ +package edu.ucsd.msjava.msutil; + +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import java.io.File; +import java.io.IOException; +import java.nio.charset.StandardCharsets; +import java.nio.file.Files; +import java.nio.file.Path; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNotNull; +import static org.junit.Assert.assertNull; +import static org.junit.Assert.assertTrue; + +/** + * Integration tests for the restored {@link AnnotatedSpectra} TSV/spectrum + * loader. Covers: + * + * - Extension validation: .mzid is rejected with the explicit "not supported + * in this build" message (the mzid/ package was removed); .tsv is the + * only accepted format. + * - TSV header validation: required columns must be present; missing columns + * produce specific errors. 
+ * - End-to-end TSV + spectrum-file integration using the existing + * src/test/resources/iprg-2013/F13.mgf fixture (real MS2 data, 146K lines). + * Exercises SpectraAccessor lookup against an MGF spec dir without + * requiring any specific instrument data; the recovered code is + * instrument-agnostic. Astral-format mzML files are too large to ship as + * test fixtures, but the integration boundary is identical: parse a TSV + * row, look up the named SpecID via SpectraAccessor, return a graceful + * error if not found. + * + * Does NOT exercise the actual scoring-parameter trainer + * (ScoringParameterGeneratorWithErrors.generateParameters); that needs + * thousands of confidently-annotated PSMs and is out of scope for a recovery + * smoke test. + */ +public class TestAnnotatedSpectraRecovery { + + @Rule + public TemporaryFolder tmp = new TemporaryFolder(); + + private static final String F13_MGF_RESOURCE_PATH = "src/test/resources/iprg-2013/F13.mgf"; + + private File writeTsv(String content) throws IOException { + File f = tmp.newFile("results.tsv"); + Files.write(f.toPath(), content.getBytes(StandardCharsets.UTF_8)); + return f; + } + + private AnnotatedSpectra newWith(File... 
resultFiles) { + return new AnnotatedSpectra(resultFiles, tmp.getRoot(), AminoAcidSet.getStandardAminoAcidSet()); + } + + @Test + public void rejectsMzidExtensionWithExplicitMessage() throws IOException { + File mzid = tmp.newFile("results.mzid"); + Files.write(mzid.toPath(), "".getBytes(StandardCharsets.UTF_8)); + + AnnotatedSpectra parser = newWith(mzid); + String err = parser.parseFile(mzid); + + assertNotNull("mzid input should be rejected", err); + assertTrue("error should call out that mzid is unsupported in this build; got: " + err, + err.contains("mzid input is not supported in this build")); + } + + @Test + public void rejectsUnknownExtension() throws IOException { + File txt = tmp.newFile("results.txt"); + Files.write(txt.toPath(), "irrelevant".getBytes(StandardCharsets.UTF_8)); + + AnnotatedSpectra parser = newWith(txt); + String err = parser.parseFile(txt); + + assertNotNull(err); + assertTrue("error should mention the unrecognized extension; got: " + err, + err.startsWith("Unrecognized result file extension")); + } + + @Test + public void rejectsTsvWithoutHeaderHash() throws IOException { + // Header line must start with "#" per MS-GF+ result format. A blank + // first line (or a line that doesn't start with #) is rejected. + File tsv = writeTsv("SpecFile\tSpecID\tPeptide\tCharge\tQValue\nfoo\t1\tK.PEPTIDE.K\t2\t0.001\n"); + + AnnotatedSpectra parser = newWith(tsv); + String err = parser.parseFile(tsv); + + assertEquals("Not a valid tsv result file", err); + } + + @Test + public void rejectsTsvMissingRequiredColumns() throws IOException { + // Header has only #SpecFile and Charge; SpecID and Peptide are required. + File tsv = writeTsv("#SpecFile\tCharge\nfoo\t2\n"); + + AnnotatedSpectra parser = newWith(tsv); + String err = parser.parseFile(tsv); + + // Note: the upstream error message reads "Not a valid mzid file" even + // though the input is .tsv — that string predates the mzID excision. 
+ // Keeping it as-is to stay faithful to the upstream parser; the test + // documents the actual behavior so future readers aren't surprised. + assertNotNull(err); + assertTrue("missing required columns should be flagged; got: " + err, + err.contains("Not a valid")); + } + + @Test + public void rejectsTsvMissingFdrColumn() throws IOException { + // Header has SpecID/SpecFile/Peptide but no QValue/EFDR/SpecQValue/FDR. + File tsv = writeTsv("#SpecFile\tSpecID\tPeptide\tCharge\nfoo\t1\tK.PEPTIDE.K\t2\n"); + + AnnotatedSpectra parser = newWith(tsv); + String err = parser.parseFile(tsv); + + assertEquals("QValue is missing", err); + } + + @Test + public void acceptsAnyOfTheFdrColumnAliases() throws IOException { + // The parser recognises FDR / EFDR / QValue / SpecQValue interchangeably + // as the FDR column. Verify each alias produces a passing header check + // (no error). Use parse() (not parseFile() directly) because parseFile + // expects the container to have been initialised by parse() first. + for (String fdrAlias : new String[]{"FDR", "EFDR", "QValue", "SpecQValue"}) { + File tsv = tmp.newFile("results-" + fdrAlias + ".tsv"); + Files.write(tsv.toPath(), + ("#SpecFile\tSpecID\tPeptide\tCharge\t" + fdrAlias + "\n") + .getBytes(StandardCharsets.UTF_8)); + AnnotatedSpectra parser = newWith(tsv); + String err = parser.parse(); + // No data rows -> no lookups -> parser returns null (success). + assertNull("FDR column alias '" + fdrAlias + "' should be accepted; got: " + err, err); + } + } + + @Test + public void filtersRowsAboveFdrThreshold() throws IOException { + // Default fdrThreshold is 0.01. A row with QValue=0.99 is dropped; + // no spectrum lookup is attempted, so the parser succeeds without + // touching the spec dir. 
+ File tsv = writeTsv( + "#SpecFile\tSpecID\tPeptide\tCharge\tQValue\n" + + "irrelevant.mgf\tindex=0\tK.PEPTIDE.K\t2\t0.99\n"); + + AnnotatedSpectra parser = newWith(tsv); + String err = parser.parse(); + + assertNull("rows above FDR threshold should be dropped silently; got: " + err, err); + assertNotNull(parser.getAnnotatedSpecContainer()); + assertTrue("no PSMs should have passed the filter", + parser.getAnnotatedSpecContainer().isEmpty()); + } + + @Test + public void looksUpSpectraInRealMgfFixtureAndReportsMissingId() throws IOException { + // Drive the full TSV-row parse path against the real F13.mgf fixture. + // The TSV references a SpecID that does not exist in F13.mgf, so the + // SpectraAccessor returns null and the parser produces a clean + // "is not available!" error rather than crashing. + Path mgfSrc = new File(F13_MGF_RESOURCE_PATH).toPath(); + assertTrue("test fixture missing: " + F13_MGF_RESOURCE_PATH, Files.exists(mgfSrc)); + + File specDir = tmp.newFolder("specs"); + Path mgfDest = new File(specDir, "F13.mgf").toPath(); + Files.copy(mgfSrc, mgfDest); + + File tsv = writeTsv( + "#SpecFile\tSpecID\tPeptide\tCharge\tQValue\n" + + "F13.mgf\tindex=99999999\tK.PEPTIDE.K\t2\t0.001\n"); + + AnnotatedSpectra parser = new AnnotatedSpectra( + new File[]{tsv}, specDir, AminoAcidSet.getStandardAminoAcidSet()); + String err = parser.parseFile(tsv); + + assertNotNull("missing spec ID should yield an error", err); + assertTrue("error should name the file and id; got: " + err, + err.contains("F13.mgf:index=99999999") && err.contains("is not available")); + } + + @Test + public void parseTopLevelHandlesValidEmptyTsvCleanly() throws IOException { + // The high-level parse() method aggregates per-file results; an empty + // (header-only) TSV produces no PSMs and no error. 
+ File tsv = writeTsv("#SpecFile\tSpecID\tPeptide\tCharge\tQValue\n"); + + AnnotatedSpectra parser = newWith(tsv); + String err = parser.parse(); + + assertNull("empty but well-formed TSV should not error; got: " + err, err); + assertNotNull(parser.getAnnotatedSpecContainer()); + assertTrue(parser.getAnnotatedSpecContainer().isEmpty()); + } + + @Test + public void parseAggregatesMultipleFiles() throws IOException { + // Two empty-but-valid TSVs — both succeed, container is empty, + // no exceptions propagate. Proves the multi-file iteration loop works. + File tsv1 = writeTsv("#SpecFile\tSpecID\tPeptide\tCharge\tQValue\n"); + File tsv2 = tmp.newFile("results2.tsv"); + Files.write(tsv2.toPath(), + "#SpecFile\tSpecID\tPeptide\tCharge\tQValue\n".getBytes(StandardCharsets.UTF_8)); + + AnnotatedSpectra parser = newWith(tsv1, tsv2); + String err = parser.parse(); + + assertNull("aggregating empty TSVs should not error; got: " + err, err); + assertTrue(parser.getAnnotatedSpecContainer().isEmpty()); + } + + @Test + public void parseFailFastOnFirstErrorWhenDropErrorsDisabled() throws IOException { + // Two TSVs, first is invalid (.mzid) → parse() returns the first + // error and stops. 
+ File mzid = tmp.newFile("first.mzid"); + Files.write(mzid.toPath(), "".getBytes(StandardCharsets.UTF_8)); + File tsv = writeTsv("#SpecFile\tSpecID\tPeptide\tCharge\tQValue\n"); + + AnnotatedSpectra parser = new AnnotatedSpectra( + new File[]{mzid, tsv}, tmp.getRoot(), AminoAcidSet.getStandardAminoAcidSet()); + String err = parser.parse(); + + assertNotNull(err); + assertTrue("should surface the mzid-rejection message; got: " + err, + err.contains("mzid input is not supported in this build")); + } +} diff --git a/src/test/java/edu/ucsd/msjava/ui/TestScoringParamGenRecovery.java b/src/test/java/edu/ucsd/msjava/ui/TestScoringParamGenRecovery.java new file mode 100644 index 00000000..2ad6d563 --- /dev/null +++ b/src/test/java/edu/ucsd/msjava/ui/TestScoringParamGenRecovery.java @@ -0,0 +1,221 @@ +package edu.ucsd.msjava.ui; + +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import java.io.File; +import java.io.IOException; +import java.nio.charset.StandardCharsets; +import java.nio.file.Files; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNotNull; +import static org.junit.Assert.assertNull; +import static org.junit.Assert.assertTrue; + +/** + * Integration tests for the restored {@link ScoringParamGen} CLI front door. + * + * Scope: the recovery surface only — argument parsing, validation of required + * options, error paths for invalid enums (activation method, instrument, + * enzyme, protocol), and the TSV/spectrum-dir existence checks. The actual + * .param model generation is not exercised; that requires real annotated + * training PSMs which are not part of this fixture set. 
+ */ +public class TestScoringParamGenRecovery { + + @Rule + public TemporaryFolder tmp = new TemporaryFolder(); + + private File makeEmptyTsv() throws IOException { + File f = tmp.newFile("results.tsv"); + Files.write(f.toPath(), new byte[0]); + return f; + } + + private File makeSpecDir() throws IOException { + return tmp.newFolder("spectra"); + } + + @Test + public void runWithoutArgsReportsMissingI() { + assertEquals("missing -i (training result TSV files)", + ScoringParamGen.run(new String[]{})); + } + + @Test + public void runWithSingleDanglingArgReportsInvalidParameter() { + // The arg parser requires key/value pairs; a lone "-i" is malformed. + assertEquals("Invalid parameter: -i", + ScoringParamGen.run(new String[]{"-i"})); + } + + @Test + public void runWithBareTokenReportsInvalidParameter() { + // First arg must start with "-". "junk" is a bare positional, rejected. + assertEquals("Invalid parameter: junk", + ScoringParamGen.run(new String[]{"junk", "value"})); + } + + @Test + public void runWithUnknownOptionRejected() throws IOException { + File tsv = makeEmptyTsv(); + String err = ScoringParamGen.run(new String[]{"-i", tsv.getPath(), "-bogus", "x"}); + assertEquals("Unknown option: -bogus", err); + } + + @Test + public void runWithMissingInputFileRejected() { + String err = ScoringParamGen.run(new String[]{"-i", "/nonexistent/path/to/results.tsv"}); + assertNotNull(err); + assertTrue("error should mention the bad path; got: " + err, + err.startsWith("Input file does not exist:")); + } + + @Test + public void runWithMissingSpecDirRejected() throws IOException { + File tsv = makeEmptyTsv(); + String err = ScoringParamGen.run(new String[]{ + "-i", tsv.getPath(), + "-d", "/nonexistent/directory"}); + assertNotNull(err); + assertTrue("error should mention the missing dir; got: " + err, + err.startsWith("Spectrum directory does not exist")); + } + + @Test + public void runWithInvalidActivationMethodRejected() throws IOException { + File tsv = makeEmptyTsv(); 
+ File dir = makeSpecDir(); + String err = ScoringParamGen.run(new String[]{ + "-i", tsv.getPath(), + "-d", dir.getPath(), + "-m", "FAKE_ACTIVATION"}); + assertEquals("Unrecognized activation method: FAKE_ACTIVATION", err); + } + + @Test + public void runWithInvalidInstrumentRejected() throws IOException { + File tsv = makeEmptyTsv(); + File dir = makeSpecDir(); + String err = ScoringParamGen.run(new String[]{ + "-i", tsv.getPath(), + "-d", dir.getPath(), + "-m", "HCD", + "-inst", "FAKE_INSTRUMENT"}); + assertEquals("Unrecognized instrument type: FAKE_INSTRUMENT", err); + } + + @Test + public void runWithInvalidEnzymeRejected() throws IOException { + File tsv = makeEmptyTsv(); + File dir = makeSpecDir(); + String err = ScoringParamGen.run(new String[]{ + "-i", tsv.getPath(), + "-d", dir.getPath(), + "-m", "HCD", + "-inst", "QExactive", + "-e", "FAKE_ENZYME"}); + assertEquals("Unrecognized enzyme: FAKE_ENZYME", err); + } + + @Test + public void runWithInvalidProtocolRejected() throws IOException { + File tsv = makeEmptyTsv(); + File dir = makeSpecDir(); + String err = ScoringParamGen.run(new String[]{ + "-i", tsv.getPath(), + "-d", dir.getPath(), + "-m", "HCD", + "-inst", "QExactive", + "-e", "Tryp", + "-protocol", "FAKE_PROTOCOL"}); + assertEquals("Unrecognized protocol: FAKE_PROTOCOL", err); + } + + @Test + public void runWithNonNumericThreadRejected() throws IOException { + File tsv = makeEmptyTsv(); + File dir = makeSpecDir(); + String err = ScoringParamGen.run(new String[]{ + "-i", tsv.getPath(), + "-d", dir.getPath(), + "-m", "HCD", + "-inst", "QExactive", + "-e", "Tryp", + "-thread", "abc"}); + assertEquals("-thread must be an integer", err); + } + + @Test + public void runWithMissingDReportedAfterIIsValid() throws IOException { + File tsv = makeEmptyTsv(); + String err = ScoringParamGen.run(new String[]{"-i", tsv.getPath()}); + assertEquals("missing -d (spectrum directory)", err); + } + + @Test + public void runWithMissingMReportedAfterIDIsValid() throws 
IOException { + File tsv = makeEmptyTsv(); + File dir = makeSpecDir(); + String err = ScoringParamGen.run(new String[]{ + "-i", tsv.getPath(), + "-d", dir.getPath()}); + assertEquals("missing -m (activation method)", err); + } + + @Test + public void runWithAllRequiredArgsButEmptyTsvFailsAtParse() throws IOException { + // All CLI args validate; the empty TSV (no header line) fails inside + // AnnotatedSpectra.parseFile with "Not a valid tsv result file". + // This proves the CLI hands off to the parser correctly post-recovery. + File tsv = makeEmptyTsv(); + File dir = makeSpecDir(); + String err = ScoringParamGen.run(new String[]{ + "-i", tsv.getPath(), + "-d", dir.getPath(), + "-m", "HCD", + "-inst", "QExactive", + "-e", "Tryp"}); + assertNotNull("expected a parse error to bubble up; got null", err); + assertTrue("error should come from TSV parsing; got: " + err, + err.contains("Not a valid tsv result file") + || err.startsWith("Error while parsing")); + } + + @Test + public void runWithNoMatchingPsmsExitsCleanlyBeforeTraining() throws IOException { + // Build a valid-shaped TSV whose only row has FDR > 0.01 (the trainer's + // FDR floor). AnnotatedSpectra accepts the file, filters the row out, + // and the CLI returns "No results to train on" *without* invoking the + // ScoringParameterGeneratorWithErrors trainer. Proves the full + // arg-parse + handoff + early-exit chain works end to end. + File tsv = tmp.newFile("noresults.tsv"); + String content = String.join("\t", "#SpecFile", "SpecID", "Peptide", "Charge", "QValue") + "\n" + + String.join("\t", "synth.mgf", "index=0", "K.PEPTIDE.K", "2", "0.99") + "\n"; + Files.write(tsv.toPath(), content.getBytes(StandardCharsets.UTF_8)); + File dir = makeSpecDir(); + String err = ScoringParamGen.run(new String[]{ + "-i", tsv.getPath(), + "-d", dir.getPath(), + "-m", "HCD", + "-inst", "QExactive", + "-e", "Tryp"}); + assertEquals("No results to train on. 
Exiting.", err); + } + + @Test + public void mainWithEmptyArgsDoesNotThrow() { + // Smoke check: empty args triggers printUsageInfo and returns. Should + // not throw or call System.exit. + ScoringParamGen.main(new String[]{}); + } + + @Test + public void mainWithHelpFlagDoesNotThrow() { + ScoringParamGen.main(new String[]{"-h"}); + ScoringParamGen.main(new String[]{"--help"}); + ScoringParamGen.main(new String[]{"-help"}); + } +} diff --git a/src/test/java/msgfplus/TestScoringParamGenSmoke.java b/src/test/java/msgfplus/TestScoringParamGenSmoke.java new file mode 100644 index 00000000..259ea5d6 --- /dev/null +++ b/src/test/java/msgfplus/TestScoringParamGenSmoke.java @@ -0,0 +1,387 @@ +package msgfplus; + +import edu.ucsd.msjava.msscorer.NewRankScorer; +import edu.ucsd.msjava.msscorer.NewScorerFactory.SpecDataType; +import edu.ucsd.msjava.msscorer.ScoringParameterGeneratorWithErrors; +import edu.ucsd.msjava.msutil.ActivationMethod; +import edu.ucsd.msjava.msutil.AminoAcidSet; +import edu.ucsd.msjava.msutil.Composition; +import edu.ucsd.msjava.msutil.Enzyme; +import edu.ucsd.msjava.msutil.InstrumentType; +import edu.ucsd.msjava.msutil.Peak; +import edu.ucsd.msjava.msutil.Peptide; +import edu.ucsd.msjava.msutil.Protocol; +import edu.ucsd.msjava.msutil.SpectraContainer; +import edu.ucsd.msjava.msutil.Spectrum; +import org.junit.Rule; +import org.junit.Test; +import org.junit.rules.TemporaryFolder; + +import java.io.File; +import java.util.ArrayList; +import java.util.List; +import java.util.Random; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertNotNull; +import static org.junit.Assert.assertTrue; + +/** + * End-to-end smoke test for the recovered scorer trainer. 
+ *
+ * The other recovery tests
+ * ({@link edu.ucsd.msjava.ui.TestScoringParamGenRecovery},
+ * {@link edu.ucsd.msjava.msutil.TestAnnotatedSpectraRecovery})
+ * cover argument parsing and TSV-error paths but stop short of actually
+ * invoking
+ * {@link ScoringParameterGeneratorWithErrors#generateParameters(SpectraContainer,
+ * SpecDataType, AminoAcidSet, File, boolean, boolean)}.
+ *
+ * This test closes that gap. It builds a synthetic in-memory
+ * {@link SpectraContainer} of annotated spectra (no MGF/mzML on disk, no
+ * MS-GF+ search), feeds it directly to the trainer, and verifies that:
+ *
+ * 1. {@code generateParameters} runs to completion without throwing.
+ * 2. A {@code .param} file is produced on disk and is non-empty.
+ * 3. The companion {@code .param.txt} debug dump is also produced.
+ * 4. The binary {@code .param} parses cleanly via {@link NewRankScorer},
+ *    proving the on-disk format is round-trippable by the same code that
+ *    MS-GF+ uses to load shipped scoring models at search time.
+ *
+ * Approach rationale (Approach A in the planning notes): synthetic spectra
+ * are sufficient because the trainer's invariants are all numeric --- it
+ * computes b/y ion frequencies, rank distributions, and precursor offset
+ * frequencies from (peptide, peak list) pairs without verifying that the
+ * peptide is the "correct" assignment. A grid of distinct tryptic peptides
+ * with realistic b/y peak placement at varied intensities exercises every
+ * code path in the trainer (partition, precursorOFF, ion-type selection,
+ * rank distribution, smoothing, write) at low cost.
+ *
+ * The {@code MIN_NUM_SPECTRA_PER_PARTITION = 400} threshold inside the
+ * trainer is the lower bound on input size, but the trainer also dedupes
+ * to {@code numSpecsPerPeptide = 3} spectra per (peptide, charge) pair
+ * before partitioning. So the unique-peptide count is the dominant knob:
+ * 200 unique peptides * 3 reps = 600 spectra retained, comfortably above
+ * the 360 floor. The {@code IonProbability.getIonProb} path further
+ * requires {@code numObservedPeaks + numMissingPeaks > 1000} per ion type
+ * per segment for that ion type to be selected; with peptides of length
+ * 8-12 and full b+y ion ladders, 200 unique peptides produce well over
+ * that threshold.
+ *
+ * We use {@link InstrumentType#LOW_RESOLUTION_LTQ} ("LowRes") because:
+ * - it sets {@code errorScalingFactor = 0} inside the trainer, skipping
+ *   the (slow, optional) ion-error and noise-error distribution stages,
+ * - it sets {@code applyDeconvolution = false}, skipping a full container
+ *   rebuild,
+ * - the resulting .param file still exercises every required write path
+ *   (charge histogram, partition table, precursor OFF, fragment OFF, rank
+ *   distributions, no error dist).
+ *
+ * Runtime target: under 60s on a developer laptop.
+ */
+public class TestScoringParamGenSmoke {
+
+    @Rule
+    public TemporaryFolder tmp = new TemporaryFolder();
+
+    /**
+     * Residues used as the variable "body" of synthetic tryptic peptides.
+     * Excludes K and R (those are reserved for the C-terminus to keep the
+     * generated sequences tryptic) and excludes I/L collisions / U/O
+     * non-standard codes. Cysteine is also omitted to avoid the standard
+     * AA set's lack of fixed carbamidomethylation distorting masses in a
+     * way that the trainer's nominal-mass binning might handle awkwardly.
+     */
+    private static final char[] BODY_RESIDUES =
+            {'A', 'D', 'E', 'F', 'G', 'H', 'L', 'M', 'N', 'P',
+             'Q', 'S', 'T', 'V', 'W', 'Y'};
+
+    /**
+     * Generates a deterministic pool of {@code count} distinct synthetic
+     * tryptic peptides. Each peptide has length 8-12, body residues drawn
+     * from {@link #BODY_RESIDUES}, and ends in K or R.
+     *
+     * The trainer keeps at most {@code numSpecsPerPeptide} (= 3 for
+     * fixtures with under 2000 unique peptides) {@code (peptide, charge)}
+     * pairs in its training set, so the unique-peptide count is the real
+     * upper bound on usable training spectra after dedup. To clear the
+     * {@code MIN_NUM_SPECTRA_PER_PARTITION} = 400 partitioning floor we
+     * need ~140+ unique peptides (140 * 3 = 420), which this generator
+     * supplies easily.
+     */
+    private static List<String> generateSyntheticTrypticPeptides(int count, long seed) {
+        Random rnd = new Random(seed);
+        java.util.LinkedHashSet<String> peps = new java.util.LinkedHashSet<String>();
+        // Defensive guard: we ask for distinct peptides, so make sure the
+        // alphabet is large enough that collisions don't loop forever.
+        // 16^8 ~= 4e9 distinct length-8 bodies; we won't realistically
+        // exceed that.
+        while (peps.size() < count) {
+            int len = 8 + rnd.nextInt(5); // 8..12 residues total
+            StringBuilder sb = new StringBuilder(len);
+            for (int i = 0; i < len - 1; i++) {
+                sb.append(BODY_RESIDUES[rnd.nextInt(BODY_RESIDUES.length)]);
+            }
+            sb.append(rnd.nextBoolean() ? 'K' : 'R');
+            peps.add(sb.toString());
+        }
+        return new ArrayList<String>(peps);
+    }
+
+    /**
+     * Builds a single synthetic annotated MS2 spectrum for the given peptide.
+     *
+     * @param peptideStr     the unmodified peptide sequence (e.g.
+     *                       "AAGDPLQNK")
+     * @param charge         precursor charge state (typically 2)
+     * @param aaSet          amino acid set used for residue masses
+     * @param noisePeakCount number of random noise peaks to scatter alongside
+     *                       the b/y ions; small (3-5) is enough to keep the
+     *                       trainer's noise-distribution stage exercised
+     * @param rnd            shared RNG for reproducible noise placement
+     */
+    private static Spectrum buildSyntheticSpectrum(String peptideStr,
+                                                   int charge,
+                                                   AminoAcidSet aaSet,
+                                                   int noisePeakCount,
+                                                   Random rnd) {
+        Peptide pep = new Peptide(peptideStr, aaSet);
+        float parentMass = pep.getParentMass(); // M = sum(residues) + H2O
+        float proton = (float) Composition.ChargeCarrierMass();
+        float precursorMz = (parentMass + charge * proton) / charge;
+
+        Spectrum spec = new Spectrum();
+        spec.setPrecursor(new Peak(precursorMz, 1.0f, charge));
+        spec.setActivationMethod(ActivationMethod.HCD);
+        spec.setAnnotation(pep);
+
+        // b ion ladder: prefix mass + proton.
+        float prefix = 0f;
+        for (int i = 0; i < pep.size() - 1; i++) {
+            prefix += pep.get(i).getMass();
+            float bMz = prefix + proton;
+            // Vary intensities so setRanksOfPeaks produces a non-trivial ranking.
+            float intensity = 100f + (i * 17f) + rnd.nextFloat() * 50f;
+            spec.add(new Peak(bMz, intensity, 1));
+        }
+
+        // y ion ladder: suffix mass + H2O + proton.
+        float suffix = 0f;
+        float h2oPlusProton = (float) Composition.H2O + proton;
+        for (int i = 0; i < pep.size() - 1; i++) {
+            suffix += pep.get(pep.size() - 1 - i).getMass();
+            float yMz = suffix + h2oPlusProton;
+            float intensity = 200f + (i * 13f) + rnd.nextFloat() * 50f;
+            spec.add(new Peak(yMz, intensity, 1));
+        }
+
+        // A few low-intensity noise peaks to keep the noise distribution
+        // (rankDistTable's IonType.NOISE entry, generated by generateRankDist)
+        // populated with non-zero counts.
+        for (int i = 0; i < noisePeakCount; i++) {
+            float mz = 100f + rnd.nextFloat() * (precursorMz - 100f);
+            float intensity = 5f + rnd.nextFloat() * 15f;
+            spec.add(new Peak(mz, intensity, 1));
+        }
+
+        spec.setRanksOfPeaks();
+        // Peaks must be in ascending m/z order for getPeakByMass binary search.
+        java.util.Collections.sort(spec, new Peak.MassComparator());
+        return spec;
+    }
+
+    /**
+     * Build {@code uniquePeptides * spectraPerPeptide} synthetic annotated
+     * spectra at the given charge state.
+     *
+     * The trainer dedupes to {@code numSpecsPerPeptide} per (peptide,
+     * charge) pair internally (= 3 for fixtures under 2000 unique
+     * peptides), so {@code spectraPerPeptide} should be 3 to ensure all
+     * generated spectra are retained after dedup. {@code uniquePeptides}
+     * therefore directly determines how many spectra survive into
+     * partition().
+     */
+    private static SpectraContainer buildSyntheticContainer(int uniquePeptides,
+                                                            int spectraPerPeptide,
+                                                            int charge,
+                                                            AminoAcidSet aaSet,
+                                                            long seed) {
+        // Deterministic RNG so the test is reproducible across runs / CI.
+        Random rnd = new Random(seed);
+        List<String> peptides = generateSyntheticTrypticPeptides(uniquePeptides, seed);
+        SpectraContainer container = new SpectraContainer();
+        for (String pep : peptides) {
+            for (int rep = 0; rep < spectraPerPeptide; rep++) {
+                container.add(buildSyntheticSpectrum(pep, charge, aaSet, 4, rnd));
+            }
+        }
+        return container;
+    }
+
+    @Test
+    public void generatesParamFileFromSyntheticAnnotatedSpectra() throws Exception {
+        // The trainer's MIN_NUM_SPECTRA_PER_PARTITION (400) is the dominant
+        // input-size constraint, and the trainer dedupes to 3 spectra per
+        // (peptide, charge) pair before partitioning. 200 unique peptides
+        // * 3 reps = 600 retained spectra, comfortably above the floor and
+        // well under 60s end-to-end.
+        final int uniquePeptides = 200;
+        final int spectraPerPeptide = 3;
+        final int charge = 2;
+        final int totalSpectra = uniquePeptides * spectraPerPeptide;
+
+        AminoAcidSet aaSet = AminoAcidSet.getStandardAminoAcidSet();
+        SpectraContainer container = buildSyntheticContainer(
+                uniquePeptides, spectraPerPeptide, charge, aaSet, 42L);
+
+        // Sanity: the container is well-populated and every spectrum is annotated.
+        assertEquals(totalSpectra, container.size());
+        for (Spectrum s : container) {
+            assertNotNull("every synthetic spectrum must carry a Peptide annotation", s.getAnnotation());
+            assertEquals(charge, s.getCharge());
+            assertTrue("every synthetic spectrum must have b/y peaks", s.size() > 0);
+        }
+
+        // LowRes: errorScalingFactor=0, no deconvolution. Tryp + Standard
+        // protocol = the simplest happy path through the trainer.
+        SpecDataType dataType = new SpecDataType(
+                ActivationMethod.HCD,
+                InstrumentType.LOW_RESOLUTION_LTQ,
+                Enzyme.TRYPSIN,
+                Protocol.STANDARD);
+
+        File outDir = tmp.newFolder("trainerOut");
+
+        long t0 = System.currentTimeMillis();
+        ScoringParameterGeneratorWithErrors.generateParameters(
+                container,
+                dataType,
+                aaSet,
+                outDir,
+                /* isText = */ false,
+                /* verbose = */ false);
+        long elapsedMs = System.currentTimeMillis() - t0;
+
+        // Soft runtime guard: if we ever drift over ~45s on this 600-spectrum
+        // input something has likely regressed badly. The constraint in the
+        // test plan is <60s end-to-end.
+        assertTrue("trainer took too long: " + elapsedMs + " ms (expected < 45000)",
+                elapsedMs < 45_000);
+
+        // The .param file is written as "<dataType.toString()>.param" inside
+        // outDir. For non-Standard protocols, dataType.toString() includes a
+        // trailing "_" suffix; the Standard protocol omits it per
+        // SpecDataType.toString().
+        String expectedName = dataType.toString() + ".param";
+        File paramFile = new File(outDir, expectedName);
+        assertTrue("expected .param file was not produced at " + paramFile.getAbsolutePath(),
+                paramFile.exists() && paramFile.isFile());
+        assertTrue(".param file is empty", paramFile.length() > 0);
+
+        File paramTextFile = new File(outDir, expectedName + ".txt");
+        assertTrue("companion .param.txt debug dump was not produced",
+                paramTextFile.exists() && paramTextFile.length() > 0);
+
+        // Round-trip: reload via NewRankScorer. This exercises the same code
+        // path MS-GF+ uses at search time to read a shipped .param resource,
+        // so a successful constructor here proves the trainer's binary
+        // output is structurally valid (charge histogram, partitions,
+        // precursor OFF, fragment OFF, rank distributions, error dist
+        // section absent because errorScalingFactor=0).
+        NewRankScorer reloaded = new NewRankScorer(paramFile.getPath());
+        assertNotNull(reloaded);
+        assertNotNull("reloaded scorer must expose its dataType", reloaded.getSpecDataType());
+        assertEquals("activation method round-tripped",
+                ActivationMethod.HCD, reloaded.getSpecDataType().getActivationMethod());
+        assertEquals("instrument type round-tripped",
+                InstrumentType.LOW_RESOLUTION_LTQ, reloaded.getSpecDataType().getInstrumentType());
+        assertEquals("enzyme round-tripped",
+                Enzyme.TRYPSIN, reloaded.getSpecDataType().getEnzyme());
+        assertNotNull("reloaded scorer must expose a non-null partition set",
+                reloaded.getParitionSet());
+        assertTrue("reloaded scorer must have at least one partition",
+                reloaded.getParitionSet().size() > 0);
+    }
+
+    @Test
+    public void debugTextDumpReferencesActivationAndEnzyme() throws Exception {
+        // Smaller second @Test that re-runs the trainer with a slightly
+        // different shape and inspects the human-readable .param.txt to
+        // confirm the trainer wrote sensible header lines.
+        // This run is kept cheap with a deliberately smaller input, since
+        // the trainer dominates the suite's runtime and we already paid for
+        // a full-size run above.
+        final int uniquePeptides = 180;
+        final int spectraPerPeptide = 3;
+        final int charge = 2;
+        AminoAcidSet aaSet = AminoAcidSet.getStandardAminoAcidSet();
+        SpectraContainer container = buildSyntheticContainer(
+                uniquePeptides, spectraPerPeptide, charge, aaSet, 1337L);
+
+        SpecDataType dataType = new SpecDataType(
+                ActivationMethod.CID,
+                InstrumentType.LOW_RESOLUTION_LTQ,
+                Enzyme.TRYPSIN,
+                Protocol.STANDARD);
+        File outDir = tmp.newFolder("trainerOut2");
+
+        ScoringParameterGeneratorWithErrors.generateParameters(
+                container, dataType, aaSet, outDir, false, false);
+
+        File paramTextFile = new File(outDir, dataType.toString() + ".param.txt");
+        assertTrue("companion .param.txt debug dump was not produced",
+                paramTextFile.exists() && paramTextFile.length() > 0);
+
+        String contents = new String(java.nio.file.Files.readAllBytes(paramTextFile.toPath()));
+        assertTrue(".param.txt should mention the activation method; got first 200 chars: "
+                        + contents.substring(0, Math.min(200, contents.length())),
+                contents.contains("CID"));
+        assertTrue(".param.txt should mention the enzyme",
+                contents.contains("Tryp"));
+        assertTrue(".param.txt should include a charge histogram section",
+                contents.contains("ChargeHistogram"));
+    }
+
+    /**
+     * Documents the trainer's effective lower bound on input size. After the
+     * (peptide, charge) dedup ({@code numSpecsPerPeptide = 3} for fixtures
+     * under 2000 unique peptides), the trainer needs ~360 surviving spectra
+     * (= {@code MIN_NUM_SPECTRA_PER_PARTITION * 0.9}) at one charge to
+     * produce any partition at all; below that, partition() skips the
+     * charge entirely and {@code writeParameters} dies via assert.
+     *
+     * 130 unique peptides * 3 reps = 390 retained spectra -- just clears
+     * the floor with margin for the rounding in the {@code 0.9f} factor.
+     * The test makes the implicit contract explicit so future readers don't
+     * accidentally shrink fixtures below it.
+     */
+    @Test
+    public void trainerHandlesInputSizeJustAboveMinimumPartitionThreshold() throws Exception {
+        final int uniquePeptides = 130;
+        final int spectraPerPeptide = 3;
+        final int charge = 2;
+        AminoAcidSet aaSet = AminoAcidSet.getStandardAminoAcidSet();
+        SpectraContainer container = buildSyntheticContainer(
+                uniquePeptides, spectraPerPeptide, charge, aaSet, 7L);
+
+        SpecDataType dataType = new SpecDataType(
+                ActivationMethod.HCD,
+                InstrumentType.LOW_RESOLUTION_LTQ,
+                Enzyme.TRYPSIN,
+                Protocol.STANDARD);
+        File outDir = tmp.newFolder("trainerOut3");
+
+        ScoringParameterGeneratorWithErrors.generateParameters(
+                container, dataType, aaSet, outDir, false, false);
+
+        File paramFile = new File(outDir, dataType.toString() + ".param");
+        assertTrue("trainer should still emit a .param at the input-size boundary",
+                paramFile.exists() && paramFile.length() > 0);
+
+        NewRankScorer reloaded = new NewRankScorer(paramFile.getPath());
+        assertNotNull(reloaded);
+        assertNotNull(reloaded.getParitionSet());
+        // With 390 spectra at one charge and numSegments=2, the partition
+        // routine produces 2 partitions (one per segment).
+        assertTrue("expected at least one partition; got " + reloaded.getParitionSet().size(),
+                reloaded.getParitionSet().size() >= 1);
+    }
+}