feat: aa.plot_eval_heatmap — house-preset evaluation heatmap (prototype #310) by breimanntools · Pull Request #317 · breimanntools/aaanalysis

breimanntools · 2026-06-30T23:30:26Z

Status: SOLID — works, green local tests, full ripple done. New public function
aa.plot_eval_heatmap, all 482 plotting+api tests pass, the example notebook
re-executes under nbmake, and a regression test pins byte-equivalence to the seaborn
block this consolidates.

What to review first

aaanalysis/plotting/_plot_eval_heatmap.py — the frontend + Validate block.
The See Also in the docstring documenting the relationship to aap.plot_eval
(this is the simple static sibling; aap.plot_eval is the adaptive sweep
summarizer). Confirm this framing is the intended reconciliation for feat: aa.plot_eval_heatmap — house-preset evaluation heatmap (consolidate duplicated seaborn blocks) #310's
"single static-grid heatmap entry point" requirement.
__init__.py / __all__ re-export (CONFIRM-FIRST, additive — on the
wire-to-public-API list for this epic).

Open API question

The issue asks to "decide: standalone helper vs a mode of aap.plot_eval." This
prototype chose a standalone top-level helper (aa.plot_eval_heatmap) and
documents the relationship rather than folding it into aap.plot_eval. If you'd
rather expose it as a mode/flag on aap.plot_eval, that is a small follow-up.

Part of #305 / prototype for #310.

Summary

Consolidates the hand-built seaborn evaluation-heatmap block — duplicated in the
γ-secretase notebook cells 12/25 and a further 3× in the original project — into one
call with the house preset:

aa.plot_eval_heatmap(df_eval, xlabel=None, ylabel=None, vmin=50, vmax=100,
                     cbar_label="Balanced accuracy [%]", ax=None)  # -> Axes

viridis, fixed [vmin, vmax] color limits, annot with fmt=".0f",
linewidth=0.1, labeled colorbar, no left/bottom ticks, horizontal tick labels.
Returns the Axes; library code never calls plt.show()/tight_layout().

Full ripple

numpydoc docstring with versionadded, named Returns, See Also, Examples
include.
Executed example notebook examples/plotting/plot_eval_heatmap.ipynb (inline
backend, embedded PNG outputs, plt.tight_layout()+plt.show() in the notebook
only).
21 unit tests: returns Axes; respects vmin/vmax/labels/cbar_label/ax;
annotation count; horizontal ticks; bad input → ValueError; and a
seaborn-block equivalence regression test (KPI).
Public API: re-exported via plotting/__init__.py, top-level __init__.py/__all__,
and docs/source/api.rst.
Release notes (Unreleased → Plotting) + cheat sheet entry.

Notes / not done

I did not re-author γ-secretase notebook cells 12/25 in place: that notebook is
the full paper pipeline (heavy CPP/dPULearn runs) and is concurrently modified in the
main checkout, so re-executing it overnight was out of scope. Equivalence to those
cells is instead proven by test_matches_raw_seaborn_block (same heatmap array,
clims, colormap, and annotations). Swapping the cells onto the new call is a trivial
maintainer follow-up.

🤖 Generated with Claude Code

Critical self-review (post-prototype pass)

Rendered the function to PNG under the house context (aa.plot_settings(),
font 18) and inspected — found and fixed real defects:

Defects found + fixed

X-tick labels overlapped into unreadable mush — the biggest defect, and it
was visible in the committed example notebook's own cell-2 output
(25 feature50 f...). Forcing x labels horizontal (rotation=0) with no sizing
control breaks for any non-trivial column-label length, including the realistic
γ-secretase case (scale61 scale60 scale75 CPP). Added xtick_rotation /
ytick_rotation (non-zero x rotation is right-aligned) and a figsize arg
(consistent with the plot_rank sibling). The example's long-label panel now
renders clean; re-executed with the inline backend.
linewidth=0.1 was an ambiguous kwargs pass-through that collides with
seaborn's own linewidths=0 default on pcolormesh. Switched to the documented
linewidths=0.1. Visual/equivalence unchanged (the KPI test asserts array /
clim / cmap / annotations, not line width).
Minor: the all-numeric column check computed select_dtypes twice — now
once; added the # I Helper Functions skeleton header; validate the new params
in the Validate block; tightened the See Also.

Confirmed correct (not changed)

Returns the Axes (not (fig, ax)) — matches issue feat: aa.plot_eval_heatmap — house-preset evaluation heatmap (consolidate duplicated seaborn blocks) #310's explicit
requirement; library code never calls plt.show() / tight_layout() /
plot_settings().
Annotation text contrast is auto-handled by seaborn (white on dark, dark on
light) — legible across the viridis range in the renders.
cmap="viridis" kept as a literal: plot_get_cmap only ships the diverging
CPP/SHAP palettes (wrong for a sequential 50-100 accuracy scale); viridis is the
house preset named in the issue, so no preset applies.

Tests / gates (worktree-authoritative import verified)

Added tests for xtick_rotation (rotation + ha), ytick_rotation, figsize,
and bad-type rejection. plotting_tests 312 passed, api_tests 175 passed
(param-coverage, backend-import-hygiene, return-contract all green), docstring
structural checker clean (only the sibling-consistent RAISES advisory), and
--nbmake on the example notebook passes.

Residual

Notebook cells 12/25 of the γ-secretase use case are still not re-authored onto
the new call (that notebook is heavy and concurrently edited in the main
checkout); equivalence to those cells stays proven by
test_matches_raw_seaborn_block. Trivial maintainer follow-up.

Iterative review log

Round 2 (correctness/edge): converged — no new defects. Adversarially probed NaN cells, all-NaN, inf, bool/datetime/object columns, duplicate column names, single-row/col, integer dtype. NaN cells render blank (only non-NaN annotated) and never raise; bool/datetime/dup-col grids are cleanly rejected via the Validate block. Added two edge tests (test_nan_cells_render_gracefully, test_integer_grid_accepted) to guard the NaN-graceful and integer-dtype behavior (a sweep table can legitimately carry NaN for a failed config). No stochastic path → reproducibility N/A.
Round 3 (plot quality): converged — no new defects. Rendered to PNG under both aa.plot_settings() (font 18) and default rcParams, across small (2×3), large (8×10), and long-label (45° rotated) grids. Verified: seaborn per-cell luminance contrast is legible across the whole viridis range (white on dark blue/teal, dark on green/yellow); annotation/tick/colorbar-label/colorbar-tick font sizes all scale correctly with rcParams (18/16.5/18/16.5 under plot_settings, 10 at default); colorbar is labeled; long labels right-align cleanly with no overlap/clipping; set_xticklabels renders with zero warnings (warnings-as-errors check passed); library code contains no plt.show/tight_layout/plot_settings. Committed example-notebook outputs re-verified clean. Non-square aspect kept deliberately to stay byte-equivalent to the consolidated seaborn block (ADR-0032).
Round 4 (efficiency/simplicity): simplified two Validate-block guards with identical behavior — len(df_eval) == 0 or df_eval.shape[1] == 0 → idiomatic df_eval.empty; and the all-numeric check from select_dtypes(include=...) + length-compare + an O(n²) membership comprehension (error path) down to a single select_dtypes(exclude="number").columns.tolist(). Draw path already fully vectorized (no loops/iterrows/copies); verified byte-identical rejection on empty/bool/datetime and acceptance on float/int grids.
Round 5 (guides/docs/tests): docstring checker reports 0 defects (the single RAISES-UNDOCUMENTED advisory is house-consistent — no plotting sibling documents a Raises section, so leaving it off matches the family). Validate block confirmed to cover every public param via a ut.check_*; no ADR refs, no print, bare ValueError, viridis literal justified (no matching ut preset for a sequential 50–100 scale). Added negative-type tests for ylabel/cbar_label/ytick_rotation so every public param now has positive+negative coverage. Full local gate green: heatmap tests + param-coverage + backend-import-hygiene + plot-return-contract + docstring-contracts (113 passed). Example notebook re-executes cleanly against the worktree code (verified cell-by-cell); the local --nbmake miss is only the worktree/editable-install split (kernel resolves the installed main-checkout package), which does not occur in CI where the merged package is pip-installed.

#310) Consolidate the hand-built seaborn evaluation-heatmap block (duplicated in the gamma-secretase notebook cells 12/25 and the original project) into one top-level function with the house preset: viridis, fixed [vmin, vmax] color limits, integer annotations, labeled colorbar, horizontal ticks. Returns the Axes; library code never calls plt.show()/tight_layout(). The simple static sibling of the adaptive aap.plot_eval (documented in See Also). - aaanalysis/plotting/_plot_eval_heatmap.py: frontend with Validate block - wire to public API: plotting/__init__.py + __init__.py/__all__ + api.rst - numpydoc + Examples include; executed example notebook - 21 unit tests incl. seaborn-block equivalence (KPI) + bad-input ValueError - release notes Unreleased + cheat sheet entry Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…, correct seaborn linewidths Critical-review pass on #310: - Long/normal-length column labels overlapped into unreadable mush because the x-tick labels were forced horizontal (rotation=0) with no sizing control. Add xtick_rotation / ytick_rotation (right-aligned when rotated) and a figsize arg; the committed example's long-label panel now renders clean. - Use seaborn's documented linewidths= (was the ambiguous kwargs-passthrough linewidth=, which collides with seaborn's own linewidths=0 default). - Compute the all-numeric column check once; add the # I Helper Functions skeleton header; validate the new params in the Validate block. - Tests + example notebook updated and re-executed (inline backend). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adversarial correctness pass found no defect; NaN cells render blank and never raise. Guard NaN-graceful annotation and integer-dtype acceptance. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Use idiomatic df.empty for the min-shape guard and a single select_dtypes(exclude='number') for the all-numeric guard (drops the length compare + O(n^2) membership comprehension). Behavior identical. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…bel/ytick_rotation Completes positive+negative coverage for every public param. Docstring checker clean (0 defects; RAISES-UNDOCUMENTED advisory is house-consistent — no plotting sibling documents Raises). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…for evaluation maps) Evaluation maps should read as a square grid of equal-sized cells; sns.heatmap defaulted square=False (cells stretched to the figure). Add a square: bool=True param (opt out with square=False). Additive; equivalence test unaffected (checks data/clim/cmap/annotations, not aspect).

breimanntools and others added 7 commits July 1, 2026 01:29

round2(plot_eval_heatmap): add NaN-graceful + integer-dtype edge tests

f8d1828

Adversarial correctness pass found no defect; NaN cells render blank and never raise. Guard NaN-graceful annotation and integer-dtype acceptance. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/master' into feat/plot-eval-heatmap

f5d6e9c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: aa.plot_eval_heatmap — house-preset evaluation heatmap (prototype #310)#317

feat: aa.plot_eval_heatmap — house-preset evaluation heatmap (prototype #310)#317
breimanntools wants to merge 7 commits into
masterfrom
feat/plot-eval-heatmap

breimanntools commented Jun 30, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

breimanntools commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Full ripple

Notes / not done

🤖 Generated with Claude Code

Critical self-review (post-prototype pass)

Iterative review log

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

breimanntools commented Jun 30, 2026 •

edited

Loading