feat: aa.plot_comparison grouped comparison barplot (prototype #311) by breimanntools · Pull Request #320 · breimanntools/aaanalysis

breimanntools · 2026-06-30T23:47:38Z

Status: SOLID — works, green local tests, full ripple done. One integration
step (re-authoring the live γ-secretase cell 28) is left for the maintainer
because the paper notebook needs xgboost, which is not installed in the agent
env; the identical figure is proven by the executed example notebook + unit tests.

Part of #305 / prototype for #311.

What this adds

Top-level aa.plot_comparison(df_eval, group=..., condition=..., value=..., baseline=50, baseline_label=None, annotate=True, group_order=None, condition_order=None, colors=None, bar_width=0.8, ax=None, figsize=(7, 4.2), xlabel=None, ylabel="Score", title=None, ylim=None, fontsize_annotations=10)
→ returns the Axes.

A grouped method × condition barplot built from a tidy eval frame:

- auto bar offsets / widths for N groups (bar count = groups × conditions);
- optional per-bar value labels (annotate);
- optional dashed chance / baseline line with an auto label (baseline / baseline_label);
- house colors via plot_get_clist; follows plotting.md (library plot code never calls plt.show() / tight_layout() / plot_settings(); returns the Axes).

Replaces the fully manual matplotlib code of a hand-built grouped barplot (the
x ± w/2 offsets, the per-bar ax.text loop, and axhline(50) in γ-secretase cell 28).
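
The automatic offsets generalize the hand-written x ± w/2 pattern to N groups. A minimal sketch of one way to compute them, assuming the bars at each condition share a total cluster width of bar_width and are centered on the tick. The helper name bar_offsets is hypothetical and is not taken from this PR:

```python
def bar_offsets(n_groups, bar_width=0.8):
    """Return (per-bar width, center offsets) for one cluster of bars.

    Hypothetical helper for illustration; not the library's implementation.
    """
    if n_groups < 1:
        raise ValueError("n_groups should be >= 1")
    w = bar_width / n_groups  # each group gets an equal slice of the cluster
    # Centers are spaced w apart and symmetric around the tick (offset 0)
    offsets = [(i - (n_groups - 1) / 2) * w for i in range(n_groups)]
    return w, offsets


w, offsets = bar_offsets(n_groups=2, bar_width=0.8)
# With two groups this reduces to the familiar x - w/2 and x + w/2 positions
two_group_case = offsets == [-w / 2, w / 2]
```

For an odd number of groups the middle bar sits exactly on the tick, and the bar count per figure stays groups × conditions.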

Files

- aaanalysis/plotting/_plot_comparison.py: the function (frontend + Validate block).
- aaanalysis/plotting/__init__.py, aaanalysis/__init__.py: re-export (#311 is on the approved wire-to-public-API list; additive __all__ entry).
- tests/unit/plotting_tests/test_plot_comparison.py: 33 tests (positive + negative per public param; bar count, baseline-when-set, annotations, ValueErrors).
- examples/plotting/plot_comparison.ipynb: executed, inline backend, display_df + plt.tight_layout()/plt.show(); renders the γ-sec cell-28 figure (61/60/75 vs 71/74/94, baseline 50) via one call.
- docs/source/api.rst (autosummary), docs/source/index/release_notes.rst (Unreleased), docs/_cheatsheet/content.py (cheat-sheet row).

Local gates (all green)

- tests/unit/plotting_tests/ → 319 passed (33 new).
- tests/unit/api_tests/ → 175 passed (param-coverage, abbreviation registry, backend-import-hygiene, utils barrel).
- docstring checker → 0 defects (1 advisory RAISES-UNDOCUMENTED, matching the sibling plot_rank which also omits a Raises section); doc/signature drift → 0.
- import aaanalysis resolves cleanly with plot_comparison in aa.__all__.

What to review

- API surface / param names (locked subset group/condition/value/baseline/annotate/ax plus the optional extras baseline_label, group_order, condition_order, colors, bar_width, ylim, fontsize_annotations, labels/title/figsize; mirrors the plot_rank extras pattern).
- Default ylabel="Score" (generic) vs leaving it None.
- Repeated (group, condition) cells are averaged (mean); confirm that policy vs erroring on duplicates.
- Return type is Axes (per the issue), unlike plot_rank which returns (fig, ax); confirm the intended house convention here.
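
To make the duplicate-cell question concrete, here is a small sketch of both policies on a toy frame. The condition names "A"/"B" and the arrangement of values are illustrative only, and check_no_duplicates is a hypothetical helper; neither is taken from the library code:

```python
import pandas as pd

# Toy tidy eval frame with one repeated (group, condition) cell: ("CPP", "A")
df_eval = pd.DataFrame({
    "group": ["Scale-based", "Scale-based", "CPP", "CPP", "CPP"],
    "condition": ["A", "B", "A", "A", "B"],
    "value": [61.0, 60.0, 71.0, 74.0, 94.0],
})

# Policy 1 (current behavior per this PR): average repeated cells
# into a group x condition grid
grid = (
    df_eval.groupby(["group", "condition"], sort=False)["value"]
    .mean()
    .unstack("condition")
)


# Policy 2 (the alternative under review): refuse duplicated cells outright
def check_no_duplicates(df, group="group", condition="condition"):
    dup = df.duplicated(subset=[group, condition], keep=False)
    if dup.any():
        cells = df.loc[dup, [group, condition]].drop_duplicates().values.tolist()
        raise ValueError(f"Repeated (group, condition) cells: {cells}")
```

Under policy 1 the ("CPP", "A") bar is drawn at the mean of 71 and 74; under policy 2 the same frame raises a ValueError before any plotting.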

γ-secretase cell-28 re-authoring (KPI — recommended drop-in for the maintainer)

The paper notebook can't be re-executed in the agent env (No module named 'xgboost'), so its committed outputs are left intact. With the maintainer's full
env, cell 28 collapses to:

aa.plot_settings()
df_eval = pd.DataFrame({
    "group":     ["Scale-based"] * 3 + ["CPP"] * 3,
    "condition": cols * 2,
    "value":     res_scale + res_cpp,
})
aa.plot_comparison(df_eval, baseline=50, baseline_label="random (50%)",
                   colors={"Scale-based": "tab:gray", "CPP": "tab:red"},
                   ylabel="Balanced accuracy [%]", ylim=(0, 108),
                   title="Feature engineering x data expansion")
plt.tight_layout(); plt.show()

This removes the manual offsets, the per-bar ax.text loop, and the axhline —
the issue KPI. The example notebook in this PR renders the equivalent figure.
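
For contrast, a hand-built version of such a figure might look roughly like the sketch below. This is a reconstruction for illustration, not the notebook's actual cell 28: the condition names are placeholders, and the assignment of 61/60/75 to Scale-based and 71/74/94 to CPP is assumed from the order in which the values are quoted above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

conditions = ["cond A", "cond B", "cond C"]  # placeholder names (assumption)
res_scale = [61, 60, 75]  # assumed to be the Scale-based scores
res_cpp = [71, 74, 94]    # assumed to be the CPP scores

x = np.arange(len(conditions))
w = 0.4  # width of a single bar

fig, ax = plt.subplots(figsize=(7, 4.2))
# Manual offsets: one group shifted left of the tick, the other right
bars_scale = ax.bar(x - w / 2, res_scale, width=w, color="tab:gray", label="Scale-based")
bars_cpp = ax.bar(x + w / 2, res_cpp, width=w, color="tab:red", label="CPP")

# Manual per-bar value labels
for bar in list(bars_scale) + list(bars_cpp):
    h = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, h + 1, f"{h:.0f}",
            ha="center", va="bottom")

# Manual chance line
ax.axhline(50, color="black", linestyle="--")

ax.set_xticks(x)
ax.set_xticklabels(conditions)
ax.set_ylabel("Balanced accuracy [%]")
ax.set_ylim(0, 108)
ax.legend()
```

Every piece marked "manual" here is what the single aa.plot_comparison call is meant to absorb.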

BLOCKED / TODO

Re-run γ-secretase use_case1_gamma_secretase.ipynb with xgboost installed and
swap cell 28 to the snippet above (could not execute here).

🤖 Generated with Claude Code

Critical self-review (review-and-improve pass)

Rendered the figure to PNG across several scenarios (percent, fractional AUC,
5 groups, tall bars) and inspected for real defects. Found and fixed:

- Value labels broke on fractional metrics (major). The hardcoded f"{h:.0f}"
  rendered AUC-style values in [0, 1] as "0"/"1" for every bar (0.62, 0.72,
  0.82, 0.92 all became "1"). Added _auto_annotation_fmt (no decimals for
  integer-valued scores, two decimals for fractional, one otherwise) plus an
  explicit annotation_fmt override.
- Legend overlapped the data (major, plot quality). The inside ax.legend()
  sat on top of the tallest bars and clipped their value labels (e.g. the "94"
  label vanished behind the legend box). Moved the legend outside the axes
  (upper right, frameless) so it never collides with bars or labels.
- Return type now (fig, ax) to match the sibling plot_rank (same epic)
  instead of a bare Axes. The open question flagged in the original PR body is
  resolved toward the family convention. Tests and the example notebook updated to
  unpack fig, ax.
- Cryptic error on non-distinct columns. Passing the same column for two of
  group/condition/value produced an opaque pandas groupby error; added an
  explicit up-front ValueError.
- Broken cross-reference. See Also pointed to a non-existent
  aaanalysis.plot_eval_heatmap; repointed to the real aaanalysis.pipe.plot_eval
  and added plot_rank.
- Minor: extend the y headroom slightly even when annotate=False; validate
  annotation_fmt in the Validate block.
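
The auto-precision rule can be sketched as follows. This is an illustration of the rule as described above, not the body of _auto_annotation_fmt: the function name, the choice to return a format spec such as ".2f", and the reading of "fractional" as "all values within [0, 1]" are assumptions.

```python
def auto_annotation_fmt_sketch(values):
    """Pick a format spec for bar labels from the plotted values.

    Assumed rule: integer-valued -> '.0f', all within [0, 1] -> '.2f',
    anything else -> '.1f'.
    """
    values = [float(v) for v in values]
    if all(v.is_integer() for v in values):
        return ".0f"
    if all(0.0 <= v <= 1.0 for v in values):
        return ".2f"
    return ".1f"


auc = [0.62, 0.72, 0.82, 0.92]

# The old hardcoded spec rounds every one of these AUC values to "1"
old_labels = [format(v, ".0f") for v in auc]

# The auto-selected spec keeps two decimals for fractional scores
fmt = auto_annotation_fmt_sketch(auc)
new_labels = [format(v, fmt) for v in auc]
```

Percent-style integer scores such as 61/60/75 still get the compact ".0f" labels under this rule, so the γ-secretase figure is unchanged.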

Tests: 39 unit tests (adds fractional-precision, mean-aggregation, annotation_fmt,
distinct-column, and (fig, ax) cases) plus param_coverage / backend-import-hygiene
green; docstring checker 0 defects (the advisory RAISES-UNDOCUMENTED is consistent
with plot_rank, which also omits a Raises section); the example notebook
re-executes under nbmake with embedded outputs.

Residual: the library default palette (plot_get_clist) is used for auto colors;
the tab:gray/tab:red literals appear only in user-facing example/test call sites
(demonstrating the colors arg), not in library code. The γ-secretase notebook
swap remains blocked on xgboost in the agent env (unchanged from the original PR).

Iterative review log

Round 2 (correctness & edge cases): fixed a duplicate entry in group_order/condition_order (cryptic TypeError / silent extra x-tick + wrong bar count) by de-duplicating the explicit order; added an up-front ValueError for a non-numeric value column (was a cryptic pandas mean error). +4 regression tests (dup group/condition order, partial grid, non-numeric value).
Round 3 (plot quality): rendered percent + fractional values, 2/5 groups, tall bars, and long condition names under plot_settings() + default rcParams. Found the free-floating chance (...) baseline text overlapped the bars when the baseline sat near bar heights (AUC vs 0.5); moved it into the outside legend via the axhline label. Added an additive xtick_rotation arg (default 0, output unchanged) so long condition names stay legible. Residual: on very dense clusters (>=5 groups) adjacent per-bar value labels can still touch — inherent to value labels on narrow bars.
Round 4 (efficiency & simplicity): hoisted the loop-invariant 0.01*max(heights_max,1) label gap out of the per-bar annotation loop and replaced .values.astype(float) with a single to_numpy(dtype=float) conversion. Output byte-identical (46 tests unchanged). No dead code / redundant validation found otherwise; the empty/distinct-column/bar_width==0 guards each cover a case check_df/check_number_range do not.
Round 5 (guides, docs & tests): frontend Validate covers every public param (including the new xtick_rotation, positive+negative tested); no ADR refs / print / tab: literals in library code; docstring checker 0 defects (RAISES advisory matches sibling plot_rank), doc/signature drift 0; (fig, ax) return matches sibling plot_rank. Refreshed + re-executed the example notebook (nbmake green, embedded PNGs) to cover xtick_rotation and the legend baseline. Full local gate: plotting + api meta-tests 507 passed.

Top-level aa.plot_comparison(df_eval, group=, condition=, value=, baseline=50, annotate=True, ax=None) draws a grouped method x condition barplot from a tidy eval frame: automatic bar offsets/widths for N groups, optional per-bar value labels, and an optional dashed chance/baseline line with a label. Returns the Axes.
Replaces the fully-manual matplotlib (x +/- w/2 offsets + per-bar ax.text loop + axhline) of a hand-built grouped barplot (gamma-secretase cell 28). House colors via plot_get_clist; follows plotting.md (no plt.show/tight_layout/plot_settings in library code; returns the Axes).
Full ripple: numpydoc + Examples include; executed example notebook examples/plotting/plot_comparison.ipynb (inline backend, display_df + plt.show); 33 unit tests (bar count = groups x conditions, baseline drawn when set, annotations present, bad input -> ValueError); re-exported in aaanalysis/__init__.py __all__; api.rst autosummary; release-notes Unreleased entry; cheat-sheet row.
Part of #305 / prototype for #311.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…x) return
- Value labels used a hardcoded {:.0f} that collapsed fractional metrics (AUC in [0,1]) to '0'/'1'; add auto-precision + an annotation_fmt override.
- Legend was drawn inside the axes and overlapped the tallest bars / their value labels; move it outside (upper right) for a publication-clean figure.
- Return (fig, ax) to match the sibling plot_rank instead of a bare Axes.
- Reject non-distinct group/condition/value columns with a clear ValueError (was a cryptic pandas groupby error); validate annotation_fmt.
- Fix a broken See Also cross-reference (plot_eval_heatmap -> pipe.plot_eval, plot_rank); extend headroom slightly when annotate is False.
- Update tests to unpack (fig, ax) and cover the new behaviors; re-execute the example notebook with embedded outputs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Fix two correctness edge cases found in adversarial review:
- A repeated entry in group_order/condition_order created a duplicate grid axis label, raising a cryptic TypeError (group_order) or silently drawing an extra x-tick + wrong bar count (condition_order). _resolve_order now de-duplicates the explicit order.
- A non-numeric 'value' column produced a cryptic pandas 'mean' TypeError; the Validate block now raises a clear ValueError up front.
Adds 4 regression tests (dup group/condition order, partial grid, non-numeric).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
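
An order-preserving de-duplication of the kind described for _resolve_order can be sketched as below. The function name and signature are hypothetical stand-ins, since the real helper's signature is not shown in this PR:

```python
def resolve_order_sketch(explicit_order, observed):
    """Return the category order to plot.

    Hypothetical stand-in for illustration; not the PR's _resolve_order.
    """
    if explicit_order is None:
        # Fall back to first-appearance order of the observed categories
        return list(dict.fromkeys(observed))
    # dict.fromkeys keeps the first occurrence of each entry, in order
    return list(dict.fromkeys(explicit_order))


# A repeated entry no longer yields a duplicate axis label or extra x-tick
order = resolve_order_sketch(["CPP", "Scale-based", "CPP"], observed=[])
```

Using dict.fromkeys rather than set() matters here because the caller's explicit order has to survive the de-duplication.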

Plot-quality pass (rendered percent/fractional, 2-5 groups, tall bars, long condition names under plot_settings + default rcParams):
- The free-floating 'chance (...)' baseline text obscured the bars whenever the baseline sat near the bar heights (e.g. AUC vs 0.5). Move it into the (already-outside) legend via the axhline label, so it can never overlap data.
- Add an additive xtick_rotation arg (default 0, output unchanged) so long condition names can be right-aligned/rotated to stay legible.
Updates the baseline-label tests to assert the legend entry; adds positive + negative xtick_rotation tests. 46 tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
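
The legend-based baseline label relies on plain matplotlib behavior: a label passed to axhline becomes a legend entry, and a legend anchored just past the right edge of the axes cannot cover the bars. A minimal sketch with illustrative values; the exact anchor coordinates and styling used in the library are not shown in this PR and are assumed here:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(7, 4.2))
ax.bar(["A", "B", "C"], [0.62, 0.72, 0.82], label="CPP")  # illustrative AUC-style bars

# The baseline label travels with the line into the legend,
# so no free-floating ax.text is needed near the bars
ax.axhline(0.5, color="black", linestyle="--", label="random (50%)")

# Anchor the legend's upper-left corner just outside the axes' upper-right corner
legend = ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0), frameon=False)
legend_labels = [t.get_text() for t in legend.get_texts()]
```

Because the legend now sits outside the axes, the caller's plt.tight_layout() (or bbox_inches="tight" when saving) is what keeps it inside the saved figure.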

… copy
Efficiency/clarity micro-pass (output byte-identical):
- Hoist the constant 0.01*max(heights_max,1) label gap out of the per-bar annotation loop (was recomputed for every bar).
- grid.loc[g].to_numpy(dtype=float) instead of .values.astype(float) (one conversion, no redundant intermediate copy).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…line wording
Docs/tests completeness pass:
- Refresh the executed example notebook for the round-3 behavior (baseline label now a legend entry) and demonstrate the new xtick_rotation param.
- Update the 'Further parameters' prose (baseline_label is a legend label; add xtick_rotation).
Re-executed with embedded outputs; nbmake green. Docstring checker 0 defects, doc/signature drift 0; full plotting + api meta-test gate 507 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

breimanntools and others added 6 commits July 1, 2026 01:31

breimanntools wants to merge 6 commits into master from feat/plot-comparison
