feat: prediction-score histogram + plot_rank error bars/class colors (prototype #312)#318
Draft
breimanntools wants to merge 6 commits into
Draft
feat: prediction-score histogram + plot_rank error bars/class colors (prototype #312)#318breimanntools wants to merge 6 commits into
breimanntools wants to merge 6 commits into
Conversation
…de (prototype #312) Add two additive prediction-result figures ported from the original G-Sec project's scripts/plot_cpp_pred.py: - plot_prediction_hist (internal, aaanalysis/plotting/_plot_prediction_hist.py): class-separated histogram of a 0-100 prediction score (port of _plot_pred1). [0, 1] probabilities are auto-rescaled to a percent axis. NOT re-exported in __init__ (#312 is not on the wire-to-public-API list); reached via submodule import. TODO(#305) marker left for a CONFIRM-FIRST re-export. - plot_rank: additive col_class / col_std ranked-candidates mode (port of plot_pred3_top_hits) - named candidates as horizontal bars colored by class, optional per-item error bars, vertical cutoff lines. The default scatter path stays byte-identical (col_class=None); guarded by a regression test. Full ripple: numpydoc on both, positive+negative unit tests per new arg, regression test proving plot_rank default output unchanged, release-notes Unreleased entry. Example notebook + cheat-sheet entry deferred (need the public symbol) and flagged as TODO. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ssion (#312) - Rename plot_prediction_hist params score/group -> col_score/col_group for consistency with the plotting family (plot_rank uses col_score/col_group). - Make the [0,1]->[0,100] rescale explicit and safe: reject a non-numeric score column with a clear ValueError (no silent to_numeric coerce), and emit a verbose-gated notice when a probability is rescaled. - Reuse ut.check_lim for binrange (drop the bespoke _check_binrange helper). - Robust integer count ticks via MaxNLocator(integer=True). - Replace the tautological plot_rank "default unchanged" test with a frozen golden-value regression (offsets/labels/colors/horizontal thresholds) that fails if the scatter branch is altered; add non-numeric-score and percent-not-rescaled tests for the histogram. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… overlap (#312) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sionadded 1.1.0) (#312) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Status: SOLID — both figures work, new local tests green (96 in the two plotting test files; 412 across plotting + relevant api meta-tests), full ripple done within the locked
__init__policy. Overnight prototype for maintainer review.Part of #305 / prototype for #312.
What this adds (two additive plotting things, ported from the original G-Sec project's
scripts/plot_cpp_pred.py)plot_prediction_hist(df_pred, score=, group=, ...)— internal (aaanalysis/plotting/_plot_prediction_hist.py).Class-separated histogram of a 0-100 prediction score (port of
_plot_pred1):sns.histplotwith stacked/layered class bars, canonical sample colors (substrate=green, etc.), integer y-ticks, despine.[0, 1]probabilities are auto-rescaled to a[0, 100]percent axis. Returns(fig, ax).Not re-exported in
aaanalysis/__init__.py(feat: prediction-score histogram + plot_rank error bars/class colors #312 is not on the wire-to-public-API list). Reached viafrom aaanalysis.plotting._plot_prediction_hist import plot_prediction_hist. A# TODO(#305): re-export ... (CONFIRM-FIRST, maintainer review)marker is left in the module.plot_rankextended additively withcol_class=/col_std=(port ofplot_pred3_top_hits).Passing
col_classswitches to a ranked-candidates horizontal-bar variant: named candidates (the DataFrame index) as bars colored by class, optional per-item error bars (col_std), vertical cutoff lines (threshold). Whencol_class is Nonethe existing scatter path is byte-identical — guarded byTestPlotRankDefaultRegression(asserts the new args at their defaults reproduce the legacy call's offsets/colors/labels/lines).What to review
score=/group=forplot_prediction_hist, which is inconsistent with the siblingplot_rank'scol_score=/col_group=. Before a public re-export, the maintainer should pick one convention (I'd leancol_score/col_groupfor package consistency).plot_ranktwo-mode design: one function now has a scatter mode and a horizontal-bar mode, selected bycol_class. In bar modecol_groupis unused andcol_classcarries the classes; candidate y-labels come from the DataFrame index. Reasonable, or should the bar variant be a separateplot_rank_candidatessibling? (The issue offers "extendplot_rankor add a sibling".)[0, 1]→[0, 100]in the histogram (faithful to the original) — confirm this implicit behavior is wanted vs. requiring percent input.Full ripple
Returns fig, ax;versionchangednote onplot_rank).plot_prediction_histreturns Axes;plot_rankdefault-output regression.tests/unit/plotting_tests/test_plot_prediction_hist.py(new) + additions totest_plot_rank.py.docs/source/index/release_notes.rst, Plotting section).TODO (deferred, needs the public symbol)
plot_prediction_histinaaanalysis/__init__.py+plotting/__init__.py__all__(CONFIRM-FIRST).examples/<abbr>_plot_prediction_hist.ipynband cheat-sheet entry (public-symbol only).Raisessection toplot_rank(pre-existing advisory, unchanged by this PR).Local gate run with the worktree ENV recipe (
PYTHONPATH=$WT $MAIN/.venv/bin/python, verifiedaaanalysis.__file__points at the worktree). No edits to__init__.py,pyproject.toml,_data/, workflows,config.py, ortemplate_classes.py.🤖 Generated with Claude Code
Critical self-review (review-and-improve pass, #312)
Went back over the whole diff adversarially, rendered every variant to PNG, and fixed the following concrete defects. Updated commit:
0cad7e9f.Defects found and fixed
The "byte-identical scatter" regression test was a tautology.
test_new_args_none_match_legacy_callcomparedplot_rank(..., col_std=None, col_class=None)against the plain call — but those args default toNone, so it was the same call twice and would pass even if the scatter branch were rewritten. ReplacedTestPlotRankDefaultRegressionwith a frozen golden-value guard: the substrate group's exact(rank, score)offsets, the group→color mapping (COLOR_POS/COLOR_REL_NEG/COLOR_NEG), legend labels/sizes, and that thresholds are drawn as horizontal constant-y dashed lines — all pinned as literals independent of the current implementation, so any change to the sort/rank/color/threshold logic makes them fail.API-naming inconsistency (the open question) resolved.
plot_prediction_histusedscore=/group=while the siblingplot_rankusescol_score=/col_group=. Renamed the histogram params tocol_score/col_groupfor plotting-family consistency and updated docstring, tests, and the inline example.The
[0,1]→[0,100]rescale was silent magic and unsafe. It ranpd.to_numeric(errors="coerce"), which would silently turn a non-numeric score column into NaN and drop it. Now the score column is validated as numeric up front (clearValueErrorotherwise), the rescale only fires on a genuine probability (finite max ≤ 1), and it emits a verbose-gated notice (ut.check_verbose(False)+ut.print_out) so the behavior is explicit, not hidden. Added tests: non-numeric score raises, and already-percent data (max > 1) is left untouched.Reused
ut.check_limforbinrangeinstead of the hand-rolled_check_binrangehelper (dropped it) — same validation via the shared house helper, less bespoke code.Integer count ticks made robust. The manual
get_yticks()filter could emit an odd/sparse or sub-zero tick set; replaced withMaxNLocator(integer=True), matplotlib's idiomatic integer locator (verified integer ticks on the rendered figures).Verified
test_param_coverage,test_backend_import_hygiene, and the fullplotting_tests/dir (worktreeaaanalysis.__file__confirmed).plot_prediction_histstays internal (no__init__/__all__touched; TODO kept); library calls noplt.show()/tight_layout()/plot_settings(); both functions return(fig, ax)per the family.Residual (deliberately not changed)
plot_rank's two-mode design (col_classtoggles bars;col_groupthen unused) is the maintainer's open API question — left as-is per the issue's "extend vs. sibling" choice.black) and threshold lines (grey) use structural neutrals, consistent with the pre-existing scatter threshold style, not category-palette colors — kept._resolve_group_colorsis shared by importing it from_plot_rankinto_plot_prediction_hist(same subpackage, passes import-hygiene); a future tidy could hoist it into a shared plotting helper, out of scope here.RAISES-UNDOCUMENTEDadvisory onplot_rank(house style omitsRaisessections) — unchanged.Iterative review log
plot_prediction_histall-NaN score column raised a cryptic seabornNo objects to concatenate; now a cleanValueError(finite-value guard).plot_ranksilently accepted a non-numericcol_score(both modes) andcol_std; added numeric-dtype validation matching the sibling histogram. +4 tests (106 in the two plotting files).aa.plot_settings()and default rcParams. Histogram was clean (integer y-ticks, despine, house palette, gcfs fonts). The ranked-candidates bar mode had a legend collision — the class legend (defaultbestloc) overlapped the top bars. Moved it outside the axes (right,frameon=False); verified notight_layoutclipping. +1 guard test.df[col_score].to_numpy()twice (barh + errorbar); hoisted it to one local. The per-group scatter loop and the single frame copies are idiomatic and cannot be vectorized without changing the byte-identical output, so no further change. Output identical.RAISES-UNDOCUMENTEDadvisory); no ADR refs; all See-Also xrefs (plot_get_clist,comp_per_protein_ap,comp_detection_metrics) resolve; every public param validated; every new param has positive+negative tests; full area gate green (plotting dir + param_coverage + import-hygiene + plot-return-contract = 361). One fix: dropped theversionchanged:: 1.1.0onplot_rank(redundant, since the function is itselfversionadded:: 1.1.0, an unreleased version — nothing to have "changed" from).