Skip to content

feat: aa.plot_eval_heatmap — house-preset evaluation heatmap (prototype #310)#317

Draft
breimanntools wants to merge 7 commits into
masterfrom
feat/plot-eval-heatmap
Draft

feat: aa.plot_eval_heatmap — house-preset evaluation heatmap (prototype #310)#317
breimanntools wants to merge 7 commits into
masterfrom
feat/plot-eval-heatmap

Conversation

@breimanntools

@breimanntools breimanntools commented Jun 30, 2026

Copy link
Copy Markdown
Owner

Status: SOLID — works, green local tests, full ripple done. New public function
aa.plot_eval_heatmap, all 482 plotting+api tests pass, the example notebook
re-executes under nbmake, and a regression test pins byte-equivalence to the seaborn
block this consolidates.

What to review first

  • aaanalysis/plotting/_plot_eval_heatmap.py — the frontend + Validate block.
  • The See Also in the docstring documenting the relationship to aap.plot_eval
    (this is the simple static sibling; aap.plot_eval is the adaptive sweep
    summarizer). Confirm this framing is the intended reconciliation for feat: aa.plot_eval_heatmap — house-preset evaluation heatmap (consolidate duplicated seaborn blocks) #310's
    "single static-grid heatmap entry point" requirement.
  • __init__.py / __all__ re-export (CONFIRM-FIRST, additive — on the
    wire-to-public-API list for this epic).

Open API question

  • The issue asks to "decide: standalone helper vs a mode of aap.plot_eval." This
    prototype chose a standalone top-level helper (aa.plot_eval_heatmap) and
    documents the relationship rather than folding it into aap.plot_eval. If you'd
    rather expose it as a mode/flag on aap.plot_eval, that is a small follow-up.

Part of #305 / prototype for #310.

Summary

Consolidates the hand-built seaborn evaluation-heatmap block — duplicated in the
γ-secretase notebook cells 12/25 and a further 3× in the original project — into one
call with the house preset:

aa.plot_eval_heatmap(df_eval, xlabel=None, ylabel=None, vmin=50, vmax=100,
                     cbar_label="Balanced accuracy [%]", ax=None)  # -> Axes

viridis, fixed [vmin, vmax] color limits, annot with fmt=".0f",
linewidth=0.1, labeled colorbar, no left/bottom ticks, horizontal tick labels.
Returns the Axes; library code never calls plt.show()/tight_layout().

Full ripple

  • numpydoc docstring with versionadded, named Returns, See Also, Examples
    include.
  • Executed example notebook examples/plotting/plot_eval_heatmap.ipynb (inline
    backend, embedded PNG outputs, plt.tight_layout()+plt.show() in the notebook
    only).
  • 21 unit tests: returns Axes; respects vmin/vmax/labels/cbar_label/ax;
    annotation count; horizontal ticks; bad input → ValueError; and a
    seaborn-block equivalence regression test (KPI)
    .
  • Public API: re-exported via plotting/__init__.py, top-level __init__.py/__all__,
    and docs/source/api.rst.
  • Release notes (Unreleased → Plotting) + cheat sheet entry.

Notes / not done

  • I did not re-author γ-secretase notebook cells 12/25 in place: that notebook is
    the full paper pipeline (heavy CPP/dPULearn runs) and is concurrently modified in the
    main checkout, so re-executing it overnight was out of scope. Equivalence to those
    cells is instead proven by test_matches_raw_seaborn_block (same heatmap array,
    clims, colormap, and annotations). Swapping the cells onto the new call is a trivial
    maintainer follow-up.

🤖 Generated with Claude Code

Critical self-review (post-prototype pass)

Rendered the function to PNG under the house context (aa.plot_settings(),
font 18) and inspected — found and fixed real defects:

Defects found + fixed

  • X-tick labels overlapped into unreadable mush — the biggest defect, and it
    was visible in the committed example notebook's own cell-2 output
    (25 feature50 f...). Forcing x labels horizontal (rotation=0) with no sizing
    control breaks for any non-trivial column-label length, including the realistic
    γ-secretase case (scale61 scale60 scale75 CPP). Added xtick_rotation /
    ytick_rotation (non-zero x rotation is right-aligned) and a figsize arg
    (consistent with the plot_rank sibling). The example's long-label panel now
    renders clean; re-executed with the inline backend.
  • linewidth=0.1 was an ambiguous kwargs pass-through that collides with
    seaborn's own linewidths=0 default on pcolormesh. Switched to the documented
    linewidths=0.1. Visual/equivalence unchanged (the KPI test asserts array /
    clim / cmap / annotations, not line width).
  • Minor: the all-numeric column check computed select_dtypes twice — now
    once; added the # I Helper Functions skeleton header; validate the new params
    in the Validate block; tightened the See Also.

Confirmed correct (not changed)

  • Returns the Axes (not (fig, ax)) — matches issue feat: aa.plot_eval_heatmap — house-preset evaluation heatmap (consolidate duplicated seaborn blocks) #310's explicit
    requirement; library code never calls plt.show() / tight_layout() /
    plot_settings().
  • Annotation text contrast is auto-handled by seaborn (white on dark, dark on
    light) — legible across the viridis range in the renders.
  • cmap="viridis" kept as a literal: plot_get_cmap only ships the diverging
    CPP/SHAP palettes (wrong for a sequential 50-100 accuracy scale); viridis is the
    house preset named in the issue, so no preset applies.

Tests / gates (worktree-authoritative import verified)

  • Added tests for xtick_rotation (rotation + ha), ytick_rotation, figsize,
    and bad-type rejection. plotting_tests 312 passed, api_tests 175 passed
    (param-coverage, backend-import-hygiene, return-contract all green), docstring
    structural checker clean (only the sibling-consistent RAISES advisory), and
    --nbmake on the example notebook passes.

Residual

  • Notebook cells 12/25 of the γ-secretase use case are still not re-authored onto
    the new call (that notebook is heavy and concurrently edited in the main
    checkout); equivalence to those cells stays proven by
    test_matches_raw_seaborn_block. Trivial maintainer follow-up.

Iterative review log

  • Round 2 (correctness/edge): converged — no new defects. Adversarially probed NaN cells, all-NaN, inf, bool/datetime/object columns, duplicate column names, single-row/col, integer dtype. NaN cells render blank (only non-NaN annotated) and never raise; bool/datetime/dup-col grids are cleanly rejected via the Validate block. Added two edge tests (test_nan_cells_render_gracefully, test_integer_grid_accepted) to guard the NaN-graceful and integer-dtype behavior (a sweep table can legitimately carry NaN for a failed config). No stochastic path → reproducibility N/A.
  • Round 3 (plot quality): converged — no new defects. Rendered to PNG under both aa.plot_settings() (font 18) and default rcParams, across small (2×3), large (8×10), and long-label (45° rotated) grids. Verified: seaborn per-cell luminance contrast is legible across the whole viridis range (white on dark blue/teal, dark on green/yellow); annotation/tick/colorbar-label/colorbar-tick font sizes all scale correctly with rcParams (18/16.5/18/16.5 under plot_settings, 10 at default); colorbar is labeled; long labels right-align cleanly with no overlap/clipping; set_xticklabels renders with zero warnings (warnings-as-errors check passed); library code contains no plt.show/tight_layout/plot_settings. Committed example-notebook outputs re-verified clean. Non-square aspect kept deliberately to stay byte-equivalent to the consolidated seaborn block (ADR-0032).
  • Round 4 (efficiency/simplicity): simplified two Validate-block guards with identical behavior — len(df_eval) == 0 or df_eval.shape[1] == 0 → idiomatic df_eval.empty; and the all-numeric check from select_dtypes(include=...) + length-compare + an O(n²) membership comprehension (error path) down to a single select_dtypes(exclude="number").columns.tolist(). Draw path already fully vectorized (no loops/iterrows/copies); verified byte-identical rejection on empty/bool/datetime and acceptance on float/int grids.
  • Round 5 (guides/docs/tests): docstring checker reports 0 defects (the single RAISES-UNDOCUMENTED advisory is house-consistent — no plotting sibling documents a Raises section, so leaving it off matches the family). Validate block confirmed to cover every public param via a ut.check_*; no ADR refs, no print, bare ValueError, viridis literal justified (no matching ut preset for a sequential 50–100 scale). Added negative-type tests for ylabel/cbar_label/ytick_rotation so every public param now has positive+negative coverage. Full local gate green: heatmap tests + param-coverage + backend-import-hygiene + plot-return-contract + docstring-contracts (113 passed). Example notebook re-executes cleanly against the worktree code (verified cell-by-cell); the local --nbmake miss is only the worktree/editable-install split (kernel resolves the installed main-checkout package), which does not occur in CI where the merged package is pip-installed.

breimanntools and others added 7 commits July 1, 2026 01:29
#310)

Consolidate the hand-built seaborn evaluation-heatmap block (duplicated in the
gamma-secretase notebook cells 12/25 and the original project) into one
top-level function with the house preset: viridis, fixed [vmin, vmax] color
limits, integer annotations, labeled colorbar, horizontal ticks. Returns the
Axes; library code never calls plt.show()/tight_layout().

The simple static sibling of the adaptive aap.plot_eval (documented in See Also).

- aaanalysis/plotting/_plot_eval_heatmap.py: frontend with Validate block
- wire to public API: plotting/__init__.py + __init__.py/__all__ + api.rst
- numpydoc + Examples include; executed example notebook
- 21 unit tests incl. seaborn-block equivalence (KPI) + bad-input ValueError
- release notes Unreleased + cheat sheet entry

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, correct seaborn linewidths

Critical-review pass on #310:
- Long/normal-length column labels overlapped into unreadable mush because the
  x-tick labels were forced horizontal (rotation=0) with no sizing control. Add
  xtick_rotation / ytick_rotation (right-aligned when rotated) and a figsize arg;
  the committed example's long-label panel now renders clean.
- Use seaborn's documented linewidths= (was the ambiguous kwargs-passthrough
  linewidth=, which collides with seaborn's own linewidths=0 default).
- Compute the all-numeric column check once; add the # I Helper Functions
  skeleton header; validate the new params in the Validate block.
- Tests + example notebook updated and re-executed (inline backend).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adversarial correctness pass found no defect; NaN cells render blank and
never raise. Guard NaN-graceful annotation and integer-dtype acceptance.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Use idiomatic df.empty for the min-shape guard and a single
select_dtypes(exclude='number') for the all-numeric guard (drops the
length compare + O(n^2) membership comprehension). Behavior identical.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bel/ytick_rotation

Completes positive+negative coverage for every public param. Docstring
checker clean (0 defects; RAISES-UNDOCUMENTED advisory is house-consistent
— no plotting sibling documents Raises).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…for evaluation maps)

Evaluation maps should read as a square grid of equal-sized cells; sns.heatmap
defaulted square=False (cells stretched to the figure). Add a square: bool=True
param (opt out with square=False). Additive; equivalence test unaffected (checks
data/clim/cmap/annotations, not aspect).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant