Skip to content

feat: prediction-score histogram + plot_rank error bars/class colors (prototype #312)#318

Draft
breimanntools wants to merge 6 commits into
masterfrom
feat/prediction-plots
Draft

feat: prediction-score histogram + plot_rank error bars/class colors (prototype #312)#318
breimanntools wants to merge 6 commits into
masterfrom
feat/prediction-plots

Conversation

@breimanntools

@breimanntools breimanntools commented Jun 30, 2026

Copy link
Copy Markdown
Owner

Status: SOLID — both figures work, new local tests green (96 in the two plotting test files; 412 across plotting + relevant api meta-tests), full ripple done within the locked __init__ policy. Overnight prototype for maintainer review.

Part of #305 / prototype for #312.

What this adds (two additive plotting things, ported from the original G-Sec project's scripts/plot_cpp_pred.py)

  1. plot_prediction_hist(df_pred, score=, group=, ...) — internal (aaanalysis/plotting/_plot_prediction_hist.py).
    Class-separated histogram of a 0-100 prediction score (port of _plot_pred1): sns.histplot with stacked/layered class bars, canonical sample colors (substrate=green, etc.), integer y-ticks, despine. [0, 1] probabilities are auto-rescaled to a [0, 100] percent axis. Returns (fig, ax).
    Not re-exported in aaanalysis/__init__.py (feat: prediction-score histogram + plot_rank error bars/class colors #312 is not on the wire-to-public-API list). Reached via from aaanalysis.plotting._plot_prediction_hist import plot_prediction_hist. A # TODO(#305): re-export ... (CONFIRM-FIRST, maintainer review) marker is left in the module.

  2. plot_rank extended additively with col_class= / col_std= (port of plot_pred3_top_hits).
    Passing col_class switches to a ranked-candidates horizontal-bar variant: named candidates (the DataFrame index) as bars colored by class, optional per-item error bars (col_std), vertical cutoff lines (threshold). When col_class is None the existing scatter path is byte-identical — guarded by TestPlotRankDefaultRegression (asserts the new args at their defaults reproduce the legacy call's offsets/colors/labels/lines).

What to review

  • API-naming decision (open question): per the locked spec I used score= / group= for plot_prediction_hist, which is inconsistent with the sibling plot_rank's col_score= / col_group=. Before a public re-export, the maintainer should pick one convention (I'd lean col_score/col_group for package consistency).
  • plot_rank two-mode design: one function now has a scatter mode and a horizontal-bar mode, selected by col_class. In bar mode col_group is unused and col_class carries the classes; candidate y-labels come from the DataFrame index. Reasonable, or should the bar variant be a separate plot_rank_candidates sibling? (The issue offers "extend plot_rank or add a sibling".)
  • Auto-rescale of [0, 1][0, 100] in the histogram (faithful to the original) — confirm this implicit behavior is wanted vs. requiring percent input.

Full ripple

  • numpydoc on both (named Returns fig, ax; versionchanged note on plot_rank).
  • Unit tests: positive + negative per new arg; plot_prediction_hist returns Axes; plot_rank default-output regression. tests/unit/plotting_tests/test_plot_prediction_hist.py (new) + additions to test_plot_rank.py.
  • Release notes Unreleased entry (docs/source/index/release_notes.rst, Plotting section).

TODO (deferred, needs the public symbol)

  • Re-export plot_prediction_hist in aaanalysis/__init__.py + plotting/__init__.py __all__ (CONFIRM-FIRST).
  • Example notebook examples/<abbr>_plot_prediction_hist.ipynb and cheat-sheet entry (public-symbol only).
  • Optionally add a Raises section to plot_rank (pre-existing advisory, unchanged by this PR).

Local gate run with the worktree ENV recipe (PYTHONPATH=$WT $MAIN/.venv/bin/python, verified aaanalysis.__file__ points at the worktree). No edits to __init__.py, pyproject.toml, _data/, workflows, config.py, or template_classes.py.

🤖 Generated with Claude Code


Critical self-review (review-and-improve pass, #312)

Went back over the whole diff adversarially, rendered every variant to PNG, and fixed the following concrete defects. Updated commit: 0cad7e9f.

Defects found and fixed

  1. The "byte-identical scatter" regression test was a tautology. test_new_args_none_match_legacy_call compared plot_rank(..., col_std=None, col_class=None) against the plain call — but those args default to None, so it was the same call twice and would pass even if the scatter branch were rewritten. Replaced TestPlotRankDefaultRegression with a frozen golden-value guard: the substrate group's exact (rank, score) offsets, the group→color mapping (COLOR_POS/COLOR_REL_NEG/COLOR_NEG), legend labels/sizes, and that thresholds are drawn as horizontal constant-y dashed lines — all pinned as literals independent of the current implementation, so any change to the sort/rank/color/threshold logic makes them fail.

  2. API-naming inconsistency (the open question) resolved. plot_prediction_hist used score=/group= while the sibling plot_rank uses col_score=/col_group=. Renamed the histogram params to col_score/col_group for plotting-family consistency and updated docstring, tests, and the inline example.

  3. The [0,1]→[0,100] rescale was silent magic and unsafe. It ran pd.to_numeric(errors="coerce"), which would silently turn a non-numeric score column into NaN and drop it. Now the score column is validated as numeric up front (clear ValueError otherwise), the rescale only fires on a genuine probability (finite max ≤ 1), and it emits a verbose-gated notice (ut.check_verbose(False) + ut.print_out) so the behavior is explicit, not hidden. Added tests: non-numeric score raises, and already-percent data (max > 1) is left untouched.

  4. Reused ut.check_lim for binrange instead of the hand-rolled _check_binrange helper (dropped it) — same validation via the shared house helper, less bespoke code.

  5. Integer count ticks made robust. The manual get_yticks() filter could emit an odd/sparse or sub-zero tick set; replaced with MaxNLocator(integer=True), matplotlib's idiomatic integer locator (verified integer ticks on the rendered figures).

Verified

  • 102 tests in the two plotting files green; 356 green across test_param_coverage, test_backend_import_hygiene, and the full plotting_tests/ dir (worktree aaanalysis.__file__ confirmed).
  • Rendered stacked / layered+KDE / probability histograms and the ranked-candidates bar + scatter modes: correct 0–100 bin range, integer y-ticks, despined, house-palette class colors, error-bar caps and vertical/horizontal thresholds all legible.
  • plot_prediction_hist stays internal (no __init__/__all__ touched; TODO kept); library calls no plt.show()/tight_layout()/plot_settings(); both functions return (fig, ax) per the family.

Residual (deliberately not changed)

  • plot_rank's two-mode design (col_class toggles bars; col_group then unused) is the maintainer's open API question — left as-is per the issue's "extend vs. sibling" choice.
  • Error-bar caps (black) and threshold lines (grey) use structural neutrals, consistent with the pre-existing scatter threshold style, not category-palette colors — kept.
  • _resolve_group_colors is shared by importing it from _plot_rank into _plot_prediction_hist (same subpackage, passes import-hygiene); a future tidy could hoist it into a shared plotting helper, out of scope here.
  • Pre-existing RAISES-UNDOCUMENTED advisory on plot_rank (house style omits Raises sections) — unchanged.

Iterative review log

  • Round 2 (correctness & edge cases): plot_prediction_hist all-NaN score column raised a cryptic seaborn No objects to concatenate; now a clean ValueError (finite-value guard). plot_rank silently accepted a non-numeric col_score (both modes) and col_std; added numeric-dtype validation matching the sibling histogram. +4 tests (106 in the two plotting files).
  • Round 3 (plot quality): rendered both figures to PNG under aa.plot_settings() and default rcParams. Histogram was clean (integer y-ticks, despine, house palette, gcfs fonts). The ranked-candidates bar mode had a legend collision — the class legend (default best loc) overlapped the top bars. Moved it outside the axes (right, frameon=False); verified no tight_layout clipping. +1 guard test.
  • Round 4 (efficiency & simplicity): the bar helper recomputed df[col_score].to_numpy() twice (barh + errorbar); hoisted it to one local. The per-group scatter loop and the single frame copies are idiomatic and cannot be vectorized without changing the byte-identical output, so no further change. Output identical.
  • Round 5 (guides, docs & tests): docstring checker + drift clean (only the pre-existing house-style RAISES-UNDOCUMENTED advisory); no ADR refs; all See-Also xrefs (plot_get_clist, comp_per_protein_ap, comp_detection_metrics) resolve; every public param validated; every new param has positive+negative tests; full area gate green (plotting dir + param_coverage + import-hygiene + plot-return-contract = 361). One fix: dropped the versionchanged:: 1.1.0 on plot_rank (redundant, since the function is itself versionadded:: 1.1.0, an unreleased version — nothing to have "changed" from).

breimanntools and others added 6 commits July 1, 2026 01:29
…de (prototype #312)

Add two additive prediction-result figures ported from the original G-Sec
project's scripts/plot_cpp_pred.py:

- plot_prediction_hist (internal, aaanalysis/plotting/_plot_prediction_hist.py):
  class-separated histogram of a 0-100 prediction score (port of _plot_pred1).
  [0, 1] probabilities are auto-rescaled to a percent axis. NOT re-exported in
  __init__ (#312 is not on the wire-to-public-API list); reached via submodule
  import. TODO(#305) marker left for a CONFIRM-FIRST re-export.
- plot_rank: additive col_class / col_std ranked-candidates mode (port of
  plot_pred3_top_hits) - named candidates as horizontal bars colored by class,
  optional per-item error bars, vertical cutoff lines. The default scatter path
  stays byte-identical (col_class=None); guarded by a regression test.

Full ripple: numpydoc on both, positive+negative unit tests per new arg,
regression test proving plot_rank default output unchanged, release-notes
Unreleased entry. Example notebook + cheat-sheet entry deferred (need the public
symbol) and flagged as TODO.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ssion (#312)

- Rename plot_prediction_hist params score/group -> col_score/col_group for
  consistency with the plotting family (plot_rank uses col_score/col_group).
- Make the [0,1]->[0,100] rescale explicit and safe: reject a non-numeric score
  column with a clear ValueError (no silent to_numeric coerce), and emit a
  verbose-gated notice when a probability is rescaled.
- Reuse ut.check_lim for binrange (drop the bespoke _check_binrange helper).
- Robust integer count ticks via MaxNLocator(integer=True).
- Replace the tautological plot_rank "default unchanged" test with a frozen
  golden-value regression (offsets/labels/colors/horizontal thresholds) that
  fails if the scatter branch is altered; add non-numeric-score and
  percent-not-rescaled tests for the histogram.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… overlap (#312)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sionadded 1.1.0) (#312)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant