
feat: aa.plot_comparison grouped comparison barplot (prototype #311) #320

Draft
breimanntools wants to merge 6 commits into master from feat/plot-comparison


Conversation


@breimanntools breimanntools commented Jun 30, 2026

Owner

Status: SOLID — works, green local tests, full ripple done. One integration
step (re-authoring the live γ-secretase cell 28) is left for the maintainer
because the paper notebook needs xgboost, which is not installed in the agent
env; the identical figure is proven by the executed example notebook + unit tests.

Part of #305 / prototype for #311.

What this adds

Top-level aa.plot_comparison(df_eval, group=..., condition=..., value=..., baseline=50, baseline_label=None, annotate=True, group_order=None, condition_order=None, colors=None, bar_width=0.8, ax=None, figsize=(7, 4.2), xlabel=None, ylabel="Score", title=None, ylim=None, fontsize_annotations=10)
→ returns the Axes.

A grouped method × condition barplot built from a tidy eval frame:

  • auto bar offsets / widths for N groups (bar count = groups × conditions);
  • optional per-bar value labels (annotate);
  • optional dashed chance / baseline line with an auto label (baseline /
    baseline_label);
  • house colors via plot_get_clist; follows plotting.md (library plot code
    never calls plt.show() / tight_layout() / plot_settings(); returns the Axes).

Replaces the fully-manual matplotlib of a hand-built grouped barplot (the
x ± w/2 offsets + per-bar ax.text loop + axhline(50) in γ-secretase cell 28).
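
For reviewers who want the offset arithmetic spelled out, here is a minimal sketch of how centered per-group offsets can be derived for N groups. It is not the code in _plot_comparison.py: the helper name grouped_bar_offsets is hypothetical, and it assumes bar_width means the total width of one cluster, shared equally among the groups.

```python
def grouped_bar_offsets(n_groups, bar_width=0.8):
    """Return (offsets, width) for n_groups bars centered on one x-tick.

    Assumption: bar_width is the total cluster width; each bar gets an equal share.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be >= 1")
    width = bar_width / n_groups
    # Center the cluster on 0: the first bar sits (n-1)/2 widths to the left,
    # the last bar the same distance to the right.
    offsets = [(i - (n_groups - 1) / 2) * width for i in range(n_groups)]
    return offsets, width


# With two groups this reduces to the hand-written "x - w/2, x + w/2" pattern.
offsets, width = grouped_bar_offsets(2, bar_width=0.8)
print(offsets, width)
```

Bar i of a cluster would then be drawn at x + offsets[i] with the returned width, so the number of bars per tick equals the number of groups.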

Files

  • aaanalysis/plotting/_plot_comparison.py — the function (frontend + Validate block).
  • aaanalysis/plotting/__init__.py, aaanalysis/__init__.py: re-export (issue #311,
    "feat: aa.plot_comparison, grouped comparison barplot with value labels + chance
    line", is on the approved wire-to-public-API list; additive __all__ entry).
  • tests/unit/plotting_tests/test_plot_comparison.py — 33 tests (positive +
    negative per public param; bar count, baseline-when-set, annotations, ValueErrors).
  • examples/plotting/plot_comparison.ipynb — executed, inline backend, display_df
    + plt.tight_layout()/plt.show(); renders the γ-sec cell-28 figure
    (61/60/75 vs 71/74/94, baseline 50) via one call.
  • docs/source/api.rst (autosummary), docs/source/index/release_notes.rst
    (Unreleased), docs/_cheatsheet/content.py (cheat-sheet row).

Local gates (all green)

  • tests/unit/plotting_tests/ → 319 passed (33 new).
  • tests/unit/api_tests/ → 175 passed (param-coverage, abbreviation registry,
    backend-import-hygiene, utils barrel).
  • docstring checker → 0 defects (1 advisory RAISES-UNDOCUMENTED, matching the
    sibling plot_rank which also omits a Raises section); doc/signature drift → 0.
  • import aaanalysis resolves cleanly with plot_comparison in aa.__all__.

What to review

  • API surface / param names (locked subset group/condition/value/baseline/annotate/ax
    plus the optional extras baseline_label, group_order, condition_order, colors,
    bar_width, ylim, fontsize_annotations, labels/title/figsize; mirrors the
    plot_rank extras pattern).
  • Default ylabel="Score" (generic) vs leaving it None.
  • Repeated (group, condition) cells are averaged (mean); confirm that
    policy vs erroring on duplicates.
  • Return type is Axes (per the issue), unlike plot_rank which returns
    (fig, ax) — confirm the intended house convention here.
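
To make the duplicate-cell question concrete, the sketch below contrasts the two policies on a toy frame. The values and condition names are made up for illustration, and the pandas calls are only one way to express each policy, not necessarily what the function does internally.

```python
import pandas as pd

# Toy tidy frame (illustrative values): the cell (CPP, A) appears twice.
df_eval = pd.DataFrame({
    "group":     ["Scale-based", "Scale-based", "CPP", "CPP", "CPP"],
    "condition": ["A", "B", "A", "A", "B"],
    "value":     [61.0, 60.0, 70.0, 72.0, 74.0],
})

# Policy 1 (what this PR describes): repeated cells are averaged.
grid = df_eval.groupby(["group", "condition"])["value"].mean().unstack("condition")
print(grid)

# Policy 2 (the alternative under review): detect duplicates so they can be rejected.
dupes = df_eval.duplicated(subset=["group", "condition"], keep=False)
if dupes.any():
    cells = df_eval.loc[dupes, ["group", "condition"]].drop_duplicates()
    print("duplicate cells:", cells.values.tolist())
```

Averaging is convenient for cross-validation folds passed in one frame, while erroring protects against an accidental double concatenation; which trade-off fits the house convention is the open question.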

γ-secretase cell-28 re-authoring (KPI — recommended drop-in for the maintainer)

The paper notebook can't be re-executed in the agent env (No module named 'xgboost'), so its committed outputs are left intact. With the maintainer's full
env, cell 28 collapses to:

aa.plot_settings()
df_eval = pd.DataFrame({
    "group":     ["Scale-based"] * 3 + ["CPP"] * 3,
    "condition": cols * 2,
    "value":     res_scale + res_cpp,
})
aa.plot_comparison(df_eval, baseline=50, baseline_label="random (50%)",
                   colors={"Scale-based": "tab:gray", "CPP": "tab:red"},
                   ylabel="Balanced accuracy [%]", ylim=(0, 108),
                   title="Feature engineering x data expansion")
plt.tight_layout(); plt.show()

This removes the manual offsets, the per-bar ax.text loop, and the axhline, which
is the issue KPI. The example notebook in this PR renders the equivalent figure.
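
For context, the hand-built pattern being replaced looks roughly like the sketch below. This is a reconstruction from the description in this PR, not the notebook's actual cell 28: the condition names are hypothetical placeholders (the notebook's cols is not shown here), and it assumes the 61/60/75 triple belongs to the scale-based row and 71/74/94 to CPP.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

conditions = ["cond 1", "cond 2", "cond 3"]  # hypothetical names
res_scale = [61, 60, 75]                     # values quoted in this PR (assumed row assignment)
res_cpp = [71, 74, 94]

x = np.arange(len(conditions))
w = 0.4
fig, ax = plt.subplots(figsize=(7, 4.2))

# Manual x -/+ w/2 offsets, one ax.bar call per group.
bars_scale = ax.bar(x - w / 2, res_scale, width=w, color="tab:gray", label="Scale-based")
bars_cpp = ax.bar(x + w / 2, res_cpp, width=w, color="tab:red", label="CPP")

# Manual per-bar value labels.
for bar in list(bars_scale) + list(bars_cpp):
    h = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, h + 1, f"{h:.0f}",
            ha="center", va="bottom")

# Manual chance line.
ax.axhline(50, color="black", linestyle="--")

ax.set_xticks(x)
ax.set_xticklabels(conditions)
ax.set_ylim(0, 108)
ax.set_ylabel("Balanced accuracy [%]")
ax.legend()
```

Each of the three commented steps is what the single aa.plot_comparison call is meant to absorb.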

BLOCKED / TODO

  • Re-run γ-secretase use_case1_gamma_secretase.ipynb with xgboost installed and
    swap cell 28 to the snippet above (could not execute here).

🤖 Generated with Claude Code


Critical self-review (review-and-improve pass)

Rendered the figure to PNG across several scenarios (percent, fractional AUC,
5 groups, tall bars) and inspected for real defects. Found and fixed:

  • Value labels broke on fractional metrics (major). The hardcoded f"{h:.0f}"
    rendered AUC-style values in [0, 1] as "0"/"1" for every bar (0.62, 0.72,
    0.82, 0.92 all became "1"). Added _auto_annotation_fmt (no decimals for
    integer-valued scores, two decimals for fractional, one otherwise) plus an
    explicit annotation_fmt override.
  • Legend overlapped the data (major, plot quality). The inside ax.legend()
    sat on top of the tallest bars and clipped their value labels (e.g. the "94"
    label vanished behind the legend box). Moved the legend outside the axes
    (upper right, frameless) so it never collides with bars or labels.
  • Return type now (fig, ax) to match the sibling plot_rank (same epic)
    instead of a bare Axes — the open question flagged in the original PR body is
    resolved toward the family convention. Tests and the example notebook updated to
    unpack fig, ax.
  • Cryptic error on non-distinct columns. Passing the same column for two of
    group/condition/value produced an opaque pandas groupby error; added an
    explicit up-front ValueError.
  • Broken cross-reference. See Also pointed to a non-existent
    aaanalysis.plot_eval_heatmap; repointed to the real aaanalysis.pipe.plot_eval
    and added plot_rank.
  • Minor: extend the y headroom slightly even when annotate=False; validate
    annotation_fmt in the Validate block.
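
The label-precision rule can be sketched as follows. This is a hypothetical stand-in for _auto_annotation_fmt, not its actual body; in particular it assumes "fractional" means every value lies in [0, 1], and the returned format-string style is an assumption.

```python
import math


def auto_annotation_fmt(values):
    """Pick a format string for per-bar value labels (hypothetical heuristic)."""
    finite = [float(v) for v in values if math.isfinite(float(v))]
    if finite and all(v == round(v) for v in finite):
        return "{:.0f}"  # integer-valued scores such as 61, 74, 94
    if finite and all(0.0 <= v <= 1.0 for v in finite):
        return "{:.2f}"  # fractional metrics such as AUC in [0, 1]
    return "{:.1f}"      # anything else, e.g. 61.5


for vals in ([61, 60, 75], [0.62, 0.72, 0.82, 0.92], [61.5, 74.25]):
    fmt = auto_annotation_fmt(vals)
    print(fmt, [fmt.format(v) for v in vals])
```

Under the old hardcoded zero-decimal format, 0.62 and 0.92 both round to "1", which is the defect described in the first bullet; an explicit annotation_fmt would simply bypass this heuristic.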

Tests: 39 unit tests (adds fractional-precision, mean-aggregation, annotation_fmt,
distinct-column, and (fig, ax) cases) plus param_coverage / backend-import-hygiene
green; docstring checker 0 defects (the advisory RAISES-UNDOCUMENTED is consistent
with plot_rank, which also omits a Raises section); the example notebook
re-executes under nbmake with embedded outputs.

Residual: the library default palette (plot_get_clist) is used for auto colors;
the tab:gray/tab:red literals appear only in user-facing example/test call sites
(demonstrating the colors arg), not in library code. The γ-secretase notebook
swap remains blocked on xgboost in the agent env (unchanged from the original PR).

Iterative review log

  • Round 2 (correctness & edge cases): fixed a duplicate entry in group_order/condition_order (cryptic TypeError / silent extra x-tick + wrong bar count) by de-duplicating the explicit order; added an up-front ValueError for a non-numeric value column (was a cryptic pandas mean error). +4 regression tests (dup group/condition order, partial grid, non-numeric value).
  • Round 3 (plot quality): rendered percent + fractional values, 2/5 groups, tall bars, and long condition names under plot_settings() + default rcParams. Found the free-floating chance (...) baseline text overlapped the bars when the baseline sat near bar heights (AUC vs 0.5); moved it into the outside legend via the axhline label. Added an additive xtick_rotation arg (default 0, output unchanged) so long condition names stay legible. Residual: on very dense clusters (>=5 groups) adjacent per-bar value labels can still touch — inherent to value labels on narrow bars.
  • Round 4 (efficiency & simplicity): hoisted the loop-invariant 0.01*max(heights_max,1) label gap out of the per-bar annotation loop and replaced .values.astype(float) with a single to_numpy(dtype=float) conversion. Output byte-identical (46 tests unchanged). No dead code / redundant validation found otherwise; the empty/distinct-column/bar_width==0 guards each cover a case check_df/check_number_range do not.
  • Round 5 (guides, docs & tests): frontend Validate covers every public param (including the new xtick_rotation, positive+negative tested); no ADR refs / print / tab: literals in library code; docstring checker 0 defects (RAISES advisory matches sibling plot_rank), doc/signature drift 0; (fig, ax) return matches sibling plot_rank. Refreshed + re-executed the example notebook (nbmake green, embedded PNGs) to cover xtick_rotation and the legend baseline. Full local gate: plotting + api meta-tests 507 passed.
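
The two Round 2 guards can be illustrated with a short sketch. The helper names resolve_order and check_numeric_column are hypothetical stand-ins (the PR's own helper is _resolve_order, whose body is not shown here), and the error wording is an assumption.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def resolve_order(observed, explicit=None):
    """Return the plotting order for one axis, with repeated labels dropped."""
    source = observed if explicit is None else explicit
    # dict.fromkeys keeps the first occurrence of each label and preserves order.
    return list(dict.fromkeys(source))


def check_numeric_column(df, column):
    """Fail early with a clear message instead of a pandas error inside mean()."""
    if not is_numeric_dtype(df[column]):
        raise ValueError(f"'{column}' must be numeric, got dtype {df[column].dtype}")


print(resolve_order(["CPP", "Scale-based"], explicit=["Scale-based", "CPP", "CPP"]))

df_bad = pd.DataFrame({"value": ["high", "low"]})
try:
    check_numeric_column(df_bad, "value")
except ValueError as err:
    print(err)
```

De-duplicating rather than rejecting a repeated order entry keeps the call forgiving; raising a ValueError there instead would be an equally defensible choice.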

breimanntools and others added 6 commits July 1, 2026 01:31
Top-level aa.plot_comparison(df_eval, group=, condition=, value=, baseline=50,
annotate=True, ax=None) draws a grouped method x condition barplot from a tidy
eval frame: automatic bar offsets/widths for N groups, optional per-bar value
labels, and an optional dashed chance/baseline line with a label. Returns the
Axes. Replaces the fully-manual matplotlib (x +/- w/2 offsets + per-bar ax.text
loop + axhline) of a hand-built grouped barplot (gamma-secretase cell 28).

House colors via plot_get_clist; follows plotting.md (no plt.show/tight_layout/
plot_settings in library code; returns the Axes).

Full ripple: numpydoc + Examples include; executed example notebook
examples/plotting/plot_comparison.ipynb (inline backend, display_df + plt.show);
33 unit tests (bar count = groups x conditions, baseline drawn when set,
annotations present, bad input -> ValueError); re-exported in
aaanalysis/__init__.py __all__; api.rst autosummary; release-notes Unreleased
entry; cheat-sheet row.

Part of #305 / prototype for #311.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…x) return

- Value labels used a hardcoded {:.0f} that collapsed fractional metrics
  (AUC in [0,1]) to '0'/'1'; add auto-precision + an annotation_fmt override.
- Legend was drawn inside the axes and overlapped the tallest bars / their
  value labels; move it outside (upper right) for a publication-clean figure.
- Return (fig, ax) to match the sibling plot_rank instead of a bare Axes.
- Reject non-distinct group/condition/value columns with a clear ValueError
  (was a cryptic pandas groupby error); validate annotation_fmt.
- Fix a broken See Also cross-reference (plot_eval_heatmap -> pipe.plot_eval,
  plot_rank); extend headroom slightly when annotate is False.
- Update tests to unpack (fig, ax) and cover the new behaviors; re-execute the
  example notebook with embedded outputs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fix two correctness edge cases found in adversarial review:
- A repeated entry in group_order/condition_order created a duplicate grid
  axis label, raising a cryptic TypeError (group_order) or silently drawing
  an extra x-tick + wrong bar count (condition_order). _resolve_order now
  de-duplicates the explicit order.
- A non-numeric 'value' column produced a cryptic pandas 'mean' TypeError;
  the Validate block now raises a clear ValueError up front.
Adds 4 regression tests (dup group/condition order, partial grid, non-numeric).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Plot-quality pass — rendered percent/fractional, 2-5 groups, tall bars,
long condition names under plot_settings + default rcParams:
- The free-floating 'chance (...)' baseline text obscured the bars whenever
  the baseline sat near the bar heights (e.g. AUC vs 0.5). Move it into the
  (already-outside) legend via the axhline label, so it can never overlap data.
- Add an additive xtick_rotation arg (default 0, output unchanged) so long
  condition names can be right-aligned/rotated to stay legible.
Updates the baseline-label tests to assert the legend entry; adds positive+
negative xtick_rotation tests. 46 tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… copy

Efficiency/clarity micro-pass (output byte-identical):
- Hoist the constant 0.01*max(heights_max,1) label gap out of the per-bar
  annotation loop (was recomputed for every bar).
- grid.loc[g].to_numpy(dtype=float) instead of .values.astype(float) (one
  conversion, no redundant intermediate copy).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…line wording

Docs/tests completeness pass:
- Refresh the executed example notebook for the round-3 behavior (baseline
  label now a legend entry) and demonstrate the new xtick_rotation param.
- Update the 'Further parameters' prose (baseline_label is a legend label;
  add xtick_rotation).
Re-executed with embedded outputs; nbmake green. Docstring checker 0 defects,
doc/signature drift 0; full plotting + api meta-test gate 507 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>