feat: aa.plot_comparison grouped comparison barplot (prototype #311) #320
Draft
breimanntools wants to merge 6 commits into
Top-level aa.plot_comparison(df_eval, group=, condition=, value=, baseline=50, annotate=True, ax=None) draws a grouped method x condition barplot from a tidy eval frame: automatic bar offsets/widths for N groups, optional per-bar value labels, and an optional dashed chance/baseline line with a label. Returns the Axes.
Replaces the fully-manual matplotlib (x +/- w/2 offsets + per-bar ax.text loop + axhline) of a hand-built grouped barplot (gamma-secretase cell 28). House colors via plot_get_clist; follows plotting.md (no plt.show/tight_layout/plot_settings in library code; returns the Axes).
Full ripple:
- numpydoc + Examples include;
- executed example notebook examples/plotting/plot_comparison.ipynb (inline backend, display_df + plt.show);
- 33 unit tests (bar count = groups x conditions, baseline drawn when set, annotations present, bad input -> ValueError);
- re-exported in aaanalysis/__init__.py __all__;
- api.rst autosummary; release-notes Unreleased entry; cheat-sheet row.
Part of #305 / prototype for #311.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…x) return
- Value labels used a hardcoded {:.0f} that collapsed fractional metrics
(AUC in [0,1]) to '0'/'1'; add auto-precision + an annotation_fmt override.
- Legend was drawn inside the axes and overlapped the tallest bars / their
value labels; move it outside (upper right) for a publication-clean figure.
- Return (fig, ax) to match the sibling plot_rank instead of a bare Axes.
- Reject non-distinct group/condition/value columns with a clear ValueError
(was a cryptic pandas groupby error); validate annotation_fmt.
- Fix a broken See Also cross-reference (plot_eval_heatmap -> pipe.plot_eval,
plot_rank); extend headroom slightly when annotate is False.
- Update tests to unpack (fig, ax) and cover the new behaviors; re-execute the
example notebook with embedded outputs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
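The auto-precision rule from this commit can be sketched in isolation. This is a minimal illustration, not the PR's `_auto_annotation_fmt`: the helper name `auto_annotation_fmt`, the reading of "fractional" as "all values within [0, 1]", and the skipping of non-finite values are assumptions made for the example.

```python
import math


def auto_annotation_fmt(values):
    """Pick a str.format spec for bar value labels (hypothetical helper)."""
    # Ignore NaN/inf so missing cells do not influence the precision choice.
    finite = [float(v) for v in values if math.isfinite(float(v))]
    if all(v.is_integer() for v in finite):
        return "{:.0f}"  # integer-valued scores, e.g. 61, 75, 94
    if all(0.0 <= v <= 1.0 for v in finite):
        return "{:.2f}"  # assumption: "fractional" means within [0, 1], e.g. AUC
    return "{:.1f}"      # everything else


scores = (0.62, 0.72, 0.82, 0.92)
# The old hardcoded spec collapses fractional metrics to "1".
old_labels = ["{:.0f}".format(v) for v in scores]
# The auto-selected spec keeps two decimals for the same values.
fmt = auto_annotation_fmt(scores)
new_labels = [fmt.format(v) for v in scores]
print(old_labels)
print(new_labels)
```

An explicit `annotation_fmt` override, as added in this commit, would simply bypass a helper of this kind.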
Fix two correctness edge cases found in adversarial review:
- A repeated entry in group_order/condition_order created a duplicate grid axis label, raising a cryptic TypeError (group_order) or silently drawing an extra x-tick + wrong bar count (condition_order). _resolve_order now de-duplicates the explicit order.
- A non-numeric 'value' column produced a cryptic pandas 'mean' TypeError; the Validate block now raises a clear ValueError up front.
Adds 4 regression tests (dup group/condition order, partial grid, non-numeric).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
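The de-duplication described in this commit can be illustrated with an order-preserving unique pass. This is a sketch only: `resolve_order` is a hypothetical stand-in, and the PR's `_resolve_order` may validate or merge the explicit order differently.

```python
def resolve_order(observed, explicit=None):
    """Return unique labels in first-seen order (hypothetical helper)."""
    # Use the caller's explicit order when given, else the order in the data.
    source = observed if explicit is None else explicit
    # dict preserves insertion order, so this drops repeats but keeps order.
    return list(dict.fromkeys(source))


conditions = ["train", "test", "train", "holdout"]
default_order = resolve_order(conditions)
explicit_order = resolve_order(conditions, explicit=["test", "train", "test"])
print(default_order)
print(explicit_order)
```

With the repeat removed, each label maps to exactly one grid row or x-tick, which is what the bar-count tests rely on.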
Plot-quality pass (rendered percent/fractional, 2-5 groups, tall bars, and long condition names under plot_settings + default rcParams):
- The free-floating 'chance (...)' baseline text obscured the bars whenever the baseline sat near the bar heights (e.g. AUC vs 0.5). Move it into the (already-outside) legend via the axhline label, so it can never overlap data.
- Add an additive xtick_rotation arg (default 0, output unchanged) so long condition names can be right-aligned/rotated to stay legible.
Updates the baseline-label tests to assert the legend entry; adds positive + negative xtick_rotation tests. 46 tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… copy
Efficiency/clarity micro-pass (output byte-identical):
- Hoist the constant 0.01*max(heights_max,1) label gap out of the per-bar annotation loop (was recomputed for every bar).
- grid.loc[g].to_numpy(dtype=float) instead of .values.astype(float) (one conversion, no redundant intermediate copy).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…line wording
Docs/tests completeness pass:
- Refresh the executed example notebook for the round-3 behavior (baseline label now a legend entry) and demonstrate the new xtick_rotation param.
- Update the 'Further parameters' prose (baseline_label is a legend label; add xtick_rotation).
Re-executed with embedded outputs; nbmake green. Docstring checker 0 defects, doc/signature drift 0; full plotting + api meta-test gate 507 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Status: SOLID. Works, local tests green, full ripple done. One integration step (re-authoring the live γ-secretase cell 28) is left for the maintainer because the paper notebook needs `xgboost`, which is not installed in the agent env; the identical figure is proven by the executed example notebook + unit tests.
Part of #305 / prototype for #311.
What this adds
Top-level `aa.plot_comparison(df_eval, group=..., condition=..., value=..., baseline=50, baseline_label=None, annotate=True, group_order=None, condition_order=None, colors=None, bar_width=0.8, ax=None, figsize=(7, 4.2), xlabel=None, ylabel="Score", title=None, ylim=None, fontsize_annotations=10)` → returns the `Axes`.
A grouped method × condition barplot built from a tidy eval frame:
- automatic bar offsets/widths for N groups;
- optional per-bar value labels (`annotate`);
- optional dashed chance/baseline line with a label (`baseline` / `baseline_label`);
- house colors via `plot_get_clist`; follows `plotting.md` (library plot code never calls `plt.show()` / `tight_layout()` / `plot_settings()`; returns the Axes).
Replaces the fully-manual matplotlib of a hand-built grouped barplot (the `x ± w/2` offsets + per-bar `ax.text` loop + `axhline(50)` in γ-secretase cell 28).
Files
- `aaanalysis/plotting/_plot_comparison.py`: the function (frontend + Validate block).
- `aaanalysis/plotting/__init__.py`, `aaanalysis/__init__.py`: re-export (feat: aa.plot_comparison - grouped comparison barplot with value labels + chance line #311 is on the approved wire-to-public-API list; additive `__all__` entry).
- `tests/unit/plotting_tests/test_plot_comparison.py`: 33 tests (positive + negative per public param; bar count, baseline-when-set, annotations, ValueErrors).
- `examples/plotting/plot_comparison.ipynb`: executed, inline backend, `display_df` + `plt.tight_layout()` / `plt.show()`; renders the γ-sec cell-28 figure (61/60/75 vs 71/74/94, baseline 50) via one call.
- `docs/source/api.rst` (autosummary), `docs/source/index/release_notes.rst` (Unreleased), `docs/_cheatsheet/content.py` (cheat-sheet row).
Local gates (all green)
- `tests/unit/plotting_tests/` → 319 passed (33 new).
- `tests/unit/api_tests/` → 175 passed (param-coverage, abbreviation registry, backend-import-hygiene, utils barrel).
- Docstring checker → 0 defects (advisory `RAISES-UNDOCUMENTED`, matching the sibling `plot_rank` which also omits a Raises section); doc/signature drift → 0.
- `import aaanalysis` resolves cleanly with `plot_comparison` in `aa.__all__`.
What to review
- Signature (`group` / `condition` / `value` / `baseline` / `annotate` / `ax` plus the optional extras `baseline_label`, `group_order`, `condition_order`, `colors`, `bar_width`, `ylim`, `fontsize_annotations`, labels/title/figsize; mirrors the `plot_rank` extras pattern).
- Default `ylabel="Score"` (generic) vs leaving it `None`.
- Duplicate `(group, condition)` cells are averaged (mean); confirm that policy vs erroring on duplicates.
- Returns a bare `Axes` (per the issue), unlike `plot_rank` which returns `(fig, ax)`; confirm the intended house convention here.
γ-secretase cell-28 re-authoring (KPI: recommended drop-in for the maintainer)
The paper notebook can't be re-executed in the agent env (`No module named 'xgboost'`), so its committed outputs are left intact. With the maintainer's full env, cell 28 collapses to:
This removes the manual offsets, the per-bar `ax.text` loop, and the `axhline`, which is the issue KPI. The example notebook in this PR renders the equivalent figure.
BLOCKED / TODO
- Re-run `use_case1_gamma_secretase.ipynb` with `xgboost` installed and swap cell 28 to the snippet above (could not execute here).
🤖 Generated with Claude Code
Critical self-review (review-and-improve pass)
Rendered the figure to PNG across several scenarios (percent, fractional AUC, 5 groups, tall bars) and inspected for real defects. Found and fixed:
- The hardcoded `f"{h:.0f}"` rendered AUC-style values in `[0, 1]` as `"0"`/`"1"` for every bar (0.62, 0.72, 0.82, 0.92 all became `"1"`). Added `_auto_annotation_fmt` (no decimals for integer-valued scores, two decimals for fractional, one otherwise) plus an explicit `annotation_fmt` override.
- The in-axes `ax.legend()` sat on top of the tallest bars and clipped their value labels (e.g. the `"94"` label vanished behind the legend box). Moved the legend outside the axes (upper right, frameless) so it never collides with bars or labels.
- Now returns `(fig, ax)` to match the sibling `plot_rank` (same epic) instead of a bare `Axes`; the open question flagged in the original PR body is resolved toward the family convention. Tests and the example notebook updated to unpack `fig, ax`.
- Non-distinct `group`/`condition`/`value` produced an opaque pandas `groupby` error; added an explicit up-front `ValueError`.
- `See Also` pointed to a non-existent `aaanalysis.plot_eval_heatmap`; repointed to the real `aaanalysis.pipe.plot_eval` and added `plot_rank`.
- Extended headroom slightly when `annotate=False`; validate `annotation_fmt` in the Validate block.
Tests: 39 unit tests (adds fractional-precision, mean-aggregation, `annotation_fmt`, distinct-column, and `(fig, ax)` cases) plus `param_coverage` / backend-import-hygiene green; docstring checker 0 defects (the advisory `RAISES-UNDOCUMENTED` is consistent with `plot_rank`, which also omits a `Raises` section); the example notebook re-executes under `nbmake` with embedded outputs.
Residual: the library default palette (`plot_get_clist`) is used for auto colors; the `tab:gray`/`tab:red` literals appear only in user-facing example/test call sites (demonstrating the `colors` arg), not in library code. The γ-secretase notebook swap remains blocked on `xgboost` in the agent env (unchanged from the original PR).
Iterative review log
- Correctness: fixed repeated entries in `group_order`/`condition_order` (cryptic TypeError / silent extra x-tick + wrong bar count) by de-duplicating the explicit order; added an up-front `ValueError` for a non-numeric `value` column (was a cryptic pandas `mean` error). +4 regression tests (dup group/condition order, partial grid, non-numeric value).
- Plot quality: rendered percent/fractional, 2-5 groups, tall bars, and long condition names under `plot_settings()` + default rcParams. Found the free-floating `chance (...)` baseline text overlapped the bars when the baseline sat near bar heights (AUC vs 0.5); moved it into the outside legend via the axhline label. Added an additive `xtick_rotation` arg (default 0, output unchanged) so long condition names stay legible. Residual: on very dense clusters (>=5 groups) adjacent per-bar value labels can still touch, which is inherent to value labels on narrow bars.
- Efficiency/clarity: hoisted the constant `0.01*max(heights_max,1)` label gap out of the per-bar annotation loop and replaced `.values.astype(float)` with a single `to_numpy(dtype=float)` conversion. Output byte-identical (46 tests unchanged). No dead code / redundant validation found otherwise; the empty/distinct-column/bar_width==0 guards each cover a case `check_df`/`check_number_range` do not.
- Docs/tests completeness: public params covered (including the new `xtick_rotation`, positive+negative tested); no ADR refs / `print` / `tab:` literals in library code; docstring checker 0 defects (RAISES advisory matches sibling `plot_rank`), doc/signature drift 0; `(fig, ax)` return matches sibling `plot_rank`. Refreshed + re-executed the example notebook (nbmake green, embedded PNGs) to cover `xtick_rotation` and the legend baseline. Full local gate: plotting + api meta-tests 507 passed.