feat: prediction-score histogram + plot_rank error bars/class colors (prototype #312) by breimanntools · Pull Request #318 · breimanntools/aaanalysis

breimanntools · 2026-06-30T23:30:33Z

Status: SOLID — both figures work, new local tests green (96 in the two plotting test files; 412 across plotting + relevant api meta-tests), full ripple done within the locked __init__ policy. Overnight prototype for maintainer review.

Part of #305 / prototype for #312.

What this adds (two additive plotting things, ported from the original G-Sec project's `scripts/plot_cpp_pred.py`)

plot_prediction_hist(df_pred, score=, group=, ...) — internal (aaanalysis/plotting/_plot_prediction_hist.py).
Class-separated histogram of a 0-100 prediction score (port of _plot_pred1): sns.histplot with stacked/layered class bars, canonical sample colors (substrate=green, etc.), integer y-ticks, despine. [0, 1] probabilities are auto-rescaled to a [0, 100] percent axis. Returns (fig, ax).
Not re-exported in aaanalysis/__init__.py (feat: prediction-score histogram + plot_rank error bars/class colors #312 is not on the wire-to-public-API list). Reached via from aaanalysis.plotting._plot_prediction_hist import plot_prediction_hist. A # TODO(#305): re-export ... (CONFIRM-FIRST, maintainer review) marker is left in the module.
plot_rank extended additively with col_class= / col_std= (port of plot_pred3_top_hits).
Passing col_class switches to a ranked-candidates horizontal-bar variant: named candidates (the DataFrame index) as bars colored by class, optional per-item error bars (col_std), vertical cutoff lines (threshold). When col_class is None the existing scatter path is byte-identical — guarded by TestPlotRankDefaultRegression (asserts the new args at their defaults reproduce the legacy call's offsets/colors/labels/lines).

What to review

API-naming decision (open question): per the locked spec I used score= / group= for plot_prediction_hist, which is inconsistent with the sibling plot_rank's col_score= / col_group=. Before a public re-export, the maintainer should pick one convention (I'd lean col_score/col_group for package consistency).
plot_rank two-mode design: one function now has a scatter mode and a horizontal-bar mode, selected by col_class. In bar mode col_group is unused and col_class carries the classes; candidate y-labels come from the DataFrame index. Reasonable, or should the bar variant be a separate plot_rank_candidates sibling? (The issue offers "extend plot_rank or add a sibling".)
Auto-rescale of [0, 1] → [0, 100] in the histogram (faithful to the original) — confirm this implicit behavior is wanted vs. requiring percent input.

Full ripple

numpydoc on both (named Returns fig, ax; versionchanged note on plot_rank).
Unit tests: positive + negative per new arg; plot_prediction_hist returns Axes; plot_rank default-output regression. tests/unit/plotting_tests/test_plot_prediction_hist.py (new) + additions to test_plot_rank.py.
Release notes Unreleased entry (docs/source/index/release_notes.rst, Plotting section).

TODO (deferred, needs the public symbol)

Re-export plot_prediction_hist in aaanalysis/__init__.py + plotting/__init__.py __all__ (CONFIRM-FIRST).
Example notebook examples/<abbr>_plot_prediction_hist.ipynb and cheat-sheet entry (public-symbol only).
Optionally add a Raises section to plot_rank (pre-existing advisory, unchanged by this PR).

Local gate run with the worktree ENV recipe (PYTHONPATH=$WT $MAIN/.venv/bin/python, verified aaanalysis.__file__ points at the worktree). No edits to __init__.py, pyproject.toml, _data/, workflows, config.py, or template_classes.py.

🤖 Generated with Claude Code

Critical self-review (review-and-improve pass, #312)

Went back over the whole diff adversarially, rendered every variant to PNG, and fixed the following concrete defects. Updated commit: 0cad7e9f.

Defects found and fixed

The "byte-identical scatter" regression test was a tautology. test_new_args_none_match_legacy_call compared plot_rank(..., col_std=None, col_class=None) against the plain call — but those args default to None, so it was the same call twice and would pass even if the scatter branch were rewritten. Replaced TestPlotRankDefaultRegression with a frozen golden-value guard: the substrate group's exact (rank, score) offsets, the group→color mapping (COLOR_POS/COLOR_REL_NEG/COLOR_NEG), legend labels/sizes, and that thresholds are drawn as horizontal constant-y dashed lines — all pinned as literals independent of the current implementation, so any change to the sort/rank/color/threshold logic makes them fail.
API-naming inconsistency (the open question) resolved. plot_prediction_hist used score=/group= while the sibling plot_rank uses col_score=/col_group=. Renamed the histogram params to col_score/col_group for plotting-family consistency and updated docstring, tests, and the inline example.
The [0,1]→[0,100] rescale was silent magic and unsafe. It ran pd.to_numeric(errors="coerce"), which would silently turn a non-numeric score column into NaN and drop it. Now the score column is validated as numeric up front (clear ValueError otherwise), the rescale only fires on a genuine probability (finite max ≤ 1), and it emits a verbose-gated notice (ut.check_verbose(False) + ut.print_out) so the behavior is explicit, not hidden. Added tests: non-numeric score raises, and already-percent data (max > 1) is left untouched.
Reused ut.check_lim for binrange instead of the hand-rolled _check_binrange helper (dropped it) — same validation via the shared house helper, less bespoke code.
Integer count ticks made robust. The manual get_yticks() filter could emit an odd/sparse or sub-zero tick set; replaced with MaxNLocator(integer=True), matplotlib's idiomatic integer locator (verified integer ticks on the rendered figures).

Verified

102 tests in the two plotting files green; 356 green across test_param_coverage, test_backend_import_hygiene, and the full plotting_tests/ dir (worktree aaanalysis.__file__ confirmed).
Rendered stacked / layered+KDE / probability histograms and the ranked-candidates bar + scatter modes: correct 0–100 bin range, integer y-ticks, despined, house-palette class colors, error-bar caps and vertical/horizontal thresholds all legible.
plot_prediction_hist stays internal (no __init__/__all__ touched; TODO kept); library calls no plt.show()/tight_layout()/plot_settings(); both functions return (fig, ax) per the family.

Residual (deliberately not changed)

plot_rank's two-mode design (col_class toggles bars; col_group then unused) is the maintainer's open API question — left as-is per the issue's "extend vs. sibling" choice.
Error-bar caps (black) and threshold lines (grey) use structural neutrals, consistent with the pre-existing scatter threshold style, not category-palette colors — kept.
_resolve_group_colors is shared by importing it from _plot_rank into _plot_prediction_hist (same subpackage, passes import-hygiene); a future tidy could hoist it into a shared plotting helper, out of scope here.
Pre-existing RAISES-UNDOCUMENTED advisory on plot_rank (house style omits Raises sections) — unchanged.

Iterative review log

Round 2 (correctness & edge cases): plot_prediction_hist all-NaN score column raised a cryptic seaborn No objects to concatenate; now a clean ValueError (finite-value guard). plot_rank silently accepted a non-numeric col_score (both modes) and col_std; added numeric-dtype validation matching the sibling histogram. +4 tests (106 in the two plotting files).
Round 3 (plot quality): rendered both figures to PNG under aa.plot_settings() and default rcParams. Histogram was clean (integer y-ticks, despine, house palette, gcfs fonts). The ranked-candidates bar mode had a legend collision — the class legend (default best loc) overlapped the top bars. Moved it outside the axes (right, frameon=False); verified no tight_layout clipping. +1 guard test.
Round 4 (efficiency & simplicity): the bar helper recomputed df[col_score].to_numpy() twice (barh + errorbar); hoisted it to one local. The per-group scatter loop and the single frame copies are idiomatic and cannot be vectorized without changing the byte-identical output, so no further change. Output identical.
Round 5 (guides, docs & tests): docstring checker + drift clean (only the pre-existing house-style RAISES-UNDOCUMENTED advisory); no ADR refs; all See-Also xrefs (plot_get_clist, comp_per_protein_ap, comp_detection_metrics) resolve; every public param validated; every new param has positive+negative tests; full area gate green (plotting dir + param_coverage + import-hygiene + plot-return-contract = 361). One fix: dropped the versionchanged:: 1.1.0 on plot_rank (redundant, since the function is itself versionadded:: 1.1.0, an unreleased version — nothing to have "changed" from).

…de (prototype #312) Add two additive prediction-result figures ported from the original G-Sec project's scripts/plot_cpp_pred.py: - plot_prediction_hist (internal, aaanalysis/plotting/_plot_prediction_hist.py): class-separated histogram of a 0-100 prediction score (port of _plot_pred1). [0, 1] probabilities are auto-rescaled to a percent axis. NOT re-exported in __init__ (#312 is not on the wire-to-public-API list); reached via submodule import. TODO(#305) marker left for a CONFIRM-FIRST re-export. - plot_rank: additive col_class / col_std ranked-candidates mode (port of plot_pred3_top_hits) - named candidates as horizontal bars colored by class, optional per-item error bars, vertical cutoff lines. The default scatter path stays byte-identical (col_class=None); guarded by a regression test. Full ripple: numpydoc on both, positive+negative unit tests per new arg, regression test proving plot_rank default output unchanged, release-notes Unreleased entry. Example notebook + cheat-sheet entry deferred (need the public symbol) and flagged as TODO. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ssion (#312) - Rename plot_prediction_hist params score/group -> col_score/col_group for consistency with the plotting family (plot_rank uses col_score/col_group). - Make the [0,1]->[0,100] rescale explicit and safe: reject a non-numeric score column with a clear ValueError (no silent to_numeric coerce), and emit a verbose-gated notice when a probability is rescaled. - Reuse ut.check_lim for binrange (drop the bespoke _check_binrange helper). - Robust integer count ticks via MaxNLocator(integer=True). - Replace the tautological plot_rank "default unchanged" test with a frozen golden-value regression (offsets/labels/colors/horizontal thresholds) that fails if the scatter branch is altered; add non-numeric-score and percent-not-rescaled tests for the histogram. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… overlap (#312) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…sionadded 1.1.0) (#312) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

breimanntools and others added 6 commits July 1, 2026 01:29

round2(plot): validate numeric score/std + finite-score guard (#312)

16e2f1c

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

round3(plot): move ranked-candidates legend outside axes to avoid bar…

603d4a9

… overlap (#312) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

round4(plot): hoist repeated score to_numpy in bar helper (#312)

a7a9c82

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

round5(plot): drop redundant versionchanged on plot_rank (same as ver…

2392c81

…sionadded 1.1.0) (#312) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: prediction-score histogram + plot_rank error bars/class colors (prototype #312)#318

feat: prediction-score histogram + plot_rank error bars/class colors (prototype #312)#318
breimanntools wants to merge 6 commits into
masterfrom
feat/prediction-plots

breimanntools commented Jun 30, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

breimanntools commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this adds (two additive plotting things, ported from the original G-Sec project's scripts/plot_cpp_pred.py)

What to review

Full ripple

TODO (deferred, needs the public symbol)

Critical self-review (review-and-improve pass, #312)

Iterative review log

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

breimanntools commented Jun 30, 2026 •

edited

Loading

What this adds (two additive plotting things, ported from the original G-Sec project's `scripts/plot_cpp_pred.py`)