feat: ShapModelPlot.clustermap + shap_to_feat_imp + CPPPlot sample= (prototype for #313) #323
Draft
breimanntools wants to merge 6 commits into
…CPPPlot sample= shortcut

Ports the explanation-similarity clustermap from the original gamma-secretase project into a library-grade pro API, adds the shap_to_feat_imp normalization helper, and lets CPPPlot.ranking/profile/feature_map resolve a sample by name.

- ShapModelPlot.clustermap: correlation-of-SHAP-vectors clustermap with row/col class-color sidebars, a class legend, a labelled horizontal colorbar, and font via plot_gco; returns the seaborn ClusterGrid.
- ShapModelPlot.get_clusters: deterministic dendrogram cut (n_clusters / color_threshold), replacing the original dendrogram-color parsing.
- shap_to_feat_imp: signed impact (reusing the ShapModel backend) / absolute importance, both normalized to sum(|.|)=100.
- CPPPlot sample=: resolves col_imp=feat_impact_<entry> (+ TMD-JMD parts from df_parts for profile/feature_map) and sets shap_plot=True; default output unchanged when sample is None.

ShapModelPlot / shap_to_feat_imp stay unwired at the top level (TODO #305, CONFIRM-FIRST). pro-gated; tests skip cleanly when shap is absent. Part of #305 / prototype for #313.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…e-consistency test

- comp_shap_correlation: raise a clear ValueError naming samples with a constant (zero-variance) SHAP vector instead of an opaque scipy non-finite-distance error (covers clustermap + get_clusters).
- shap_to_feat_imp: raise on an all-zero vector instead of silently returning nan.
- Add a regression test proving get_clusters uses the same linkage the clustermap dendrogram draws, plus negative tests for both new guards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Under default Matplotlib rcParams the class legend overflowed the figure's right edge (~35px) and was clipped on a plain savefig without bbox_inches='tight'. Reserve a right margin via grid.gs.update(right=0.80) so the legend fits inside the canvas; verified under both default rcParams and plot_settings(). Add a regression test asserting the legend stays within the figure bounds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…Counter dedup

- Remove unused n_features unpacking (3 spots; only n_samples is used).
- df_parts.index[int(sample)] instead of list(df_parts.index)[int(sample)].
- Duplicate-name detection via collections.Counter (single pass) instead of an O(n^2) list(names).count(...) scan.

Output byte-identical; all tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The auto-discovering pro-contract meta-test requires every public *_pro symbol's one-line summary to carry the [pro] / aaanalysis[pro] install marker; shap_to_feat_imp lacked it and was failing test_pro_marker_in_summary. Add the marker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##           master     #323      +/-   ##
==========================================
+ Coverage   94.93%   94.95%   +0.01%
==========================================
  Files         185      187       +2
  Lines       17862    18008     +146
  Branches     3032     3054      +22
==========================================
+ Hits        16957    17099     +142
- Misses        598      599       +1
- Partials      307      310       +3

... and 1 file with indirect coverage changes
breimanntools commented Jul 1, 2026
breimanntools (Owner, Author) left a comment
The clustermap is not only for SHAP values but also for features or other numerical representations. Perhaps we need a plot_clustermap utility instead of assigning it to ShapModel. Should we make a general plotting class called AAPlot (aap), to which the prediction plots could be assigned as well (or AAPredPlot)? I do not know right now.
Part of #305 / prototype for #313.
Status
Draft, functionally complete and green locally. New unit tests (56 in the pro plot suite + CPPPlot sample=) pass; the full tests/unit/cpp_plot_tests/ suite (356) stays green; docstring, doc/signature-drift, param-coverage, backend import-hygiene, utils-barrel and agentic-docs checkers all pass.

ShapModelPlot / shap_to_feat_imp are intentionally not wired into the top-level aaanalysis namespace (see the TODO below); they are reachable via aaanalysis.explainable_ai_pro.

What this adds
- ShapModelPlot.clustermap: clusters samples by explanation similarity (Pearson correlation of per-sample SHAP vectors), with row/col class-color sidebars, a class legend, a labelled horizontal colorbar and font via plot_gco. Returns the seaborn ClusterGrid (multi-axes object; see the residual note).
- ShapModelPlot.get_clusters: deterministic dendrogram cut (n_clusters / color_threshold), the library-grade replacement for the original project's dendrogram-color parsing.
- shap_to_feat_imp: SHAP vector → signed feature impact (reusing the ShapModel per-sample backend, so it never diverges) / absolute importance, both normalized so sum(|.|) == 100.
- CPPPlot.ranking / profile / feature_map sample=: resolves col_imp="feat_impact_<entry>" (+ TMD-JMD parts from df_parts via SequenceFeature.get_seq_kws for the sequence-level plots) and sets shap_plot=True, replacing the manual col_imp=f"..." + **seq_kws plumbing from γ-secretase cells 30/32.
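The sum(|.|) == 100 normalization described for shap_to_feat_imp can be sketched as follows. This is a minimal illustration, not the library's implementation: normalize_to_100 and its signed flag are hypothetical names, and the signed branch here only rescales the raw SHAP vector, whereas the PR states that the real signed impact reuses the ShapModel per-sample backend.

```python
import numpy as np


def normalize_to_100(shap_vector, signed=True):
    """Scale a SHAP vector so that its absolute values sum to 100.

    Hypothetical helper for illustration only; not the aaanalysis API.
    """
    values = np.asarray(shap_vector, dtype=float)
    total = np.abs(values).sum()
    if total == 0:
        # An all-zero vector has no defined normalization (0/0 would give nan)
        raise ValueError("Normalization is undefined for an all-zero SHAP vector.")
    scaled = values / total * 100
    # Signed impact keeps the direction; absolute importance drops it
    return scaled if signed else np.abs(scaled)


# Toy SHAP vector with three features (values chosen for illustration)
impact = normalize_to_100([2.0, -1.0, 1.0])
importance = normalize_to_100([2.0, -1.0, 1.0], signed=False)
```

Raising on the all-zero case, rather than returning nan, matches the guard this PR describes adding in its second review round.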
Critical self-review
Defects found and fixed while reviewing the prior (uncommitted, untested) code:

- sample as an int position produced a wrong col_imp. The old resolve_sample_kws built col_imp=f"feat_impact_{sample}" from the raw value, so sample=0 looked for feat_impact_0, but impact columns are keyed by entry name (feat_impact_APP). Fixed: for profile / feature_map an int position is mapped to its entry name via df_parts.index; ranking has no df_parts to map a position, so it now accepts the entry name (str) only (annotation tightened to Optional[str], clear error otherwise). Covered by test_int_position_equals_name and test_int_position_rejected.
- … label to "Pearson correlation (r)", but that axis is the sample list, not a correlation scale. Removed; the correlation label now lives only on the colorbar.
- … colorbar rendered in the tiny default vertical slot and its centered label overflowed the figure's left edge; the class-color sidebars had no legend. Fixed with an explicit cbar_pos, a left-aligned colorbar label, and a house-style class legend (ut.plot_legend_) top-right. Rendered to PNG and inspected: block structure clear, dendrograms sane, nothing clipped.
- df_seq docstring diverged from the canonical baseline (docstring checker DFSEQ-BASELINE). Reworded to the canonical "DataFrame containing an entry column …".
- … numpy import / redundant np.asarray coercion in the backend (frontend already hands it a validated float array).
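The int-position fix in the first item can be sketched as below. It is a simplified stand-in under stated assumptions: resolve_col_imp is a hypothetical name (not the PR's resolve_sample_kws), a plain list of entry names stands in for df_parts.index, and every entry name other than APP is invented for the example.

```python
from typing import Optional, Sequence, Union


def resolve_col_imp(sample: Union[str, int],
                    entries: Optional[Sequence[str]] = None) -> str:
    """Map a sample (entry name or int position) to its impact column name.

    Hypothetical sketch: `entries` stands in for df_parts.index.
    """
    if isinstance(sample, bool) or not isinstance(sample, (str, int)):
        raise ValueError("'sample' should be an entry name (str) or an int position.")
    if isinstance(sample, int):
        if entries is None:
            # Mirrors the ranking case: without an index, a position cannot be mapped
            raise ValueError("An int 'sample' cannot be mapped without an index; "
                             "pass the entry name instead.")
        if not 0 <= sample < len(entries):
            raise ValueError(f"'sample' position {sample} is out of range.")
        sample = entries[sample]
    elif entries is not None and sample not in entries:
        raise ValueError(f"Entry '{sample}' not found.")
    # Impact columns are keyed by entry name, never by position
    return f"feat_impact_{sample}"


entries = ["APP", "NOTCH1", "CDH2"]  # illustrative entry names
by_name = resolve_col_imp("APP", entries)
by_position = resolve_col_imp(0, entries)
```

Both calls resolve to the same column, which is the property the PR's test_int_position_equals_name is described as covering.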
Verified byte-identical default behaviour: sample=None reproduces the existing ranking / profile / feature_map PNG exactly, and sample="<entry>" equals the explicit col_imp=... (+ **seq_kws) call (golden PNG-equality tests).

Residual concerns
- … FigAxResult family. clustermap returns a seaborn ClusterGrid, not (fig, ax), because the figure has several axes (heatmap, two dendrograms, colorbar, two sidebars) and callers need grid.fig / grid.ax_heatmap / grid.dendrogram_*.linkage. Documented in the Returns section; flagging it as a deliberate, sibling-inconsistent choice.
- get_clusters is beyond the three named #313 deliverables. It is a small, deterministic complement to clustermap (and its See Also); drop it if you prefer to keep the PR to the exact issue scope.
- method is validated only as a str (scipy raises on an unknown linkage method) rather than against an allow-list; kept thin on purpose.
- ShapModelPlot / shap_to_feat_imp don't need shap themselves, but importing them goes through explainable_ai_pro.__init__ → _shap_model (which imports shap), so they require the pro extra in practice. Core import aaanalysis is unaffected (ShapModel is try/except-stubbed and these two symbols are not wired top-level). Tests use pytest.importorskip("shap").

TODO (not in this PR)
- … ShapModelPlot / shap_to_feat_imp into top-level aaanalysis.__init__ with pro-gating (CONFIRM-FIRST; __init__.py / __all__ are confirm-first surfaces); kept as a TODO in explainable_ai_pro/__init__.py.
- … examples/*.rst for the new methods and switch the inline doctests to .. include::.

Iterative review log
Round 2 (correctness & edge cases): a sample with a constant/zero-variance SHAP vector produced an opaque ValueError: The condensed distance matrix must contain only finite values deep inside seaborn; now comp_shap_correlation detects the undefined correlation and raises a clear message naming the offending sample(s) (covers both clustermap and get_clusters). shap_to_feat_imp on an all-zero vector silently returned nan; now raises a clear "undefined" ValueError. Added a regression test proving get_clusters uses the exact same linkage the clustermap dendrogram draws (KPI: linkage + cluster assignment match), plus negative tests for both new guards (70 tests, all green). Verified out-of-range int sample= positions are already rejected upstream by get_seq_kws (friendly ValueError, not IndexError).

Round 3 (plot quality): rendered the clustermap to PNG under both default Matplotlib rcParams and aa.plot_settings(). Found the class legend overflowed the figure's right edge by ~35px under default rcParams (title clipped to "Cl"), so it was cut on a plain fig.savefig without bbox_inches="tight". Fixed by reserving a right margin (grid.gs.update(right=0.80)) so the legend fits inside the canvas; verified it now fits under both default and plot_settings() (font 18). Colorbar (horizontal, left-aligned "Correlation (r)" label, -1/0/1 ticks), dendrograms, block structure and fonts (plot_gco) all inspected clean. Added a regression test asserting the legend stays within the figure bounds under default rcParams.
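The zero-variance guard from Round 2 and a deterministic dendrogram cut can be sketched with NumPy and SciPy as below. This is a sketch under assumptions rather than the PR's code: the function names, the 1 - r distance and the average linkage are illustrative choices, and the actual comp_shap_correlation / get_clusters internals are not shown in this PR description.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def shap_correlation(shap_values, names):
    """Pearson correlation between per-sample SHAP vectors (one row per sample)."""
    X = np.asarray(shap_values, dtype=float)
    # A constant vector has zero variance, so its correlation is undefined
    constant = [name for name, row in zip(names, X) if row.max() == row.min()]
    if constant:
        raise ValueError(f"Correlation is undefined for constant SHAP vectors: {constant}")
    return np.corrcoef(X)


def cut_clusters(corr, n_clusters=2, method="average"):
    """Cut a correlation-based dendrogram into at most n_clusters groups."""
    dist = 1.0 - corr  # assumed distance: 0 for r = 1, 2 for r = -1
    np.fill_diagonal(dist, 0.0)
    # Keep the matrix symmetric and non-negative despite float round-off
    dist = np.clip((dist + dist.T) / 2, 0.0, None)
    Z = linkage(squareform(dist, checks=False), method=method)
    return fcluster(Z, t=n_clusters, criterion="maxclust")


# Toy data: s1/s2 rise together, s3/s4 fall together
names = ["s1", "s2", "s3", "s4"]
shap_values = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.5, 8.0],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.5, 4.0, 2.0],
]
corr = shap_correlation(shap_values, names)
labels = cut_clusters(corr, n_clusters=2)
```

Because the cut is computed from the linkage matrix itself, the same matrix can be reused for drawing and for assigning clusters, which is the kind of consistency the Round 2 regression test is described as checking. SciPy's fcluster also accepts criterion="distance" for a height-based cut; treating that as the analogue of color_threshold is an assumption here.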
Round 4 (efficiency & simplicity): removed dead n_features unpacking in three spots (check_match_shap_values_labels, clustermap, get_clusters; only n_samples is used); replaced list(df_parts.index)[int(sample)] (materialized the whole index to read one label) with a direct df_parts.index[int(sample)]; simplified the duplicate-name detection from an O(n^2) list(names).count(...) to a single collections.Counter pass. Output byte-identical (all 71 tests green, including the sample= golden PNG-equality tests).

Round 5 (guides, docs & tests): the auto-discovering pro-contract meta-test (test_docstring_contracts.py::test_pro_marker_in_summary) was failing for the new shap_to_feat_imp: its one-line summary lacked the required [pro] / aaanalysis[pro] install marker (Round 1 never ran this test). Fixed the summary. Re-verified the whole area: docstring checker + doc/signature-drift clean; no ADR refs / print() in the new .py; all raises are bare ValueError; every public param of clustermap / get_clusters / shap_to_feat_imp and the CPPPlot sample= / df_seq / df_parts shortcut has a Validate check and a positive+negative test; plot_gco fonts. Full local gate green: shap_model_plot + full cpp_plot suite + param-coverage + import-hygiene + return-contract + docstring-contracts + utils-barrel + agentic-docs (492 passed). The two internal symbols remain out of the top-level __all__ (TODO kept, CONFIRM-FIRST).

🤖 Generated with Claude Code