feat(feature_engineering_pro): CPPStructurePlot.map_structure — paint CPP feature impact on a 3D structure (pro) (#119) #278
Draft
breimanntools wants to merge 1 commit into
… CPP feature impact on a 3D structure (pro)

Add the pro plotting class CPPStructurePlot whose single method map_structure(df_feat, pdb=...|uniprot=...) paints per-residue CPP / CPP-SHAP feature impact onto a 3D protein structure. Each feature is mapped to the residues it spans (get_positions_, shifted to absolute numbers by `start`) and its col_imp is aggregated per residue with the same normalized sum as CPPPlot.profile; there is no re-implemented per-position loop.

Renders via py3Dmol (interactive, write_html) with a matplotlib mplot3d fallback (static, savefig), returning a thin StructureView delegator with a uniform show / write_html / savefig / _repr_html_ surface. Modes: impact (white->SHAP_POS/NEG ramp, reusing the package SHAP palette, with a sign*sqrt perceptual transform) and plddt (AlphaFold confidence). Focus: whole / fade / zoom; the window is auto-derived from feature positions or set via focus_region. AlphaFold auto-fetch when uniprot= is given.

Reuse, no duplication: the shared CPP position backend and the StructurePreprocessor structure parser (load_structure / _collect_chain_residues / _resolve_best_chain) are reused; the only new structure code is a thin chain-by-id Cα/pLDDT extractor (the encoder backend returns sequence-aligned arrays that drop absolute residue numbering). Gated on biopython; py3Dmol is imported lazily so the matplotlib fallback works without it.

Wiring: __init__.py export (pro-gated stub) + py3Dmol in _EXTRA_MODULES and the [pro] extra; pLDDT palette ut constants. Ripple: api.rst, release notes (v1.1.0), docstring guide (abbr csp), CONTEXT.md (CPPStructurePlot, StructureView), DEDICATED_OWNERS + abbreviation registry, a 69-test unit module (golden-value normalized-sum mapping, multi-chain selection, both backends, focus modes, fallback), and an executed example notebook.

py3Dmol selections are chain-qualified (no multi-chain residue-number leak), zoomTo is intersected with present residues, and an explicit chain still scores identity so a wrong chain + sequence warns.

Closes #119

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
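The commit mentions a white→SHAP_POS/NEG ramp with a sign*sqrt perceptual transform. The sketch below illustrates that idea under stated assumptions: values are scaled by an assumed `vmax` and clipped to [-1, 1], the RGB colors are hypothetical placeholders (not the package's `COLOR_SHAP_POS` / `COLOR_SHAP_NEG`), and the blend is a plain linear RGB interpolation, whereas the PR samples the package colormap via `plot_get_cmap_`.

```python
import math

# Hypothetical placeholder RGB colors; the PR reuses the package constants
# COLOR_SHAP_POS / COLOR_SHAP_NEG, whose actual values are not shown here.
POS_RGB = (255, 0, 82)
NEG_RGB = (0, 139, 251)
WHITE = (255, 255, 255)


def perceptual(value: float, vmax: float) -> float:
    """Scale value into [-1, 1] by vmax (assumed), then apply sign * sqrt(|x|)."""
    if vmax <= 0:
        return 0.0
    x = max(-1.0, min(1.0, value / vmax))  # clip so out-of-range values saturate
    return math.copysign(math.sqrt(abs(x)), x)


def impact_to_hex(value: float, vmax: float) -> str:
    """Blend white toward the positive or negative color by the transformed magnitude.

    Illustration only: a plain linear RGB blend, not the package's plot_get_cmap_.
    """
    t = perceptual(value, vmax)
    target = POS_RGB if t >= 0 else NEG_RGB
    weight = abs(t)
    rgb = [round(w + (c - w) * weight) for w, c in zip(WHITE, target)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


print(perceptual(0.25, 1.0))      # 0.5
print(impact_to_hex(1.0, 1.0))    # #ff0052
print(impact_to_hex(0.0, 1.0))    # #ffffff
```

Under these assumptions a residue at a quarter of the maximum impact already gets t = 0.5, i.e. it is half-saturated; the square root lifts small contributions that a linear ramp would leave almost white, while the sign picks the positive or negative color.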
breimanntools commented Jun 26, 2026
breimanntools (Owner, Author) left a comment:
I provide the ADAMTS7 HTML as an example of how the structure should be visualized!
Codecov Report
❌ Patch coverage is 80.80%, which is below the target coverage (90.00%), so the patch check has failed. You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## master #278 +/- ##
==========================================
- Coverage 96.13% 95.81% -0.32%
==========================================
Files 176 183 +7
Lines 16827 17202 +375
Branches 2877 2935 +58
==========================================
+ Hits 16176 16482 +306
- Misses 366 414 +48
- Partials 285 306 +21
... and 1 file with indirect coverage changes
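The table is internally consistent: on both sides, hits + misses + partials equals the line count. A small check, assuming Codecov's usual ratio of hits / (hits + misses + partials), reproduces the project-level percentages and the -0.32 delta. The net deltas (+306 hits over +375 lines, about 81.6%) do not reproduce the 80.80% patch figure, most likely because patch coverage counts only the lines touched by the diff, while the totals also include the file with indirect coverage changes.

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Coverage as a percentage, assuming the ratio hits / (hits + misses + partials)."""
    total = hits + misses + partials
    return 100.0 * hits / total if total else 0.0


# Values copied from the Coverage Diff table above.
master = {"hits": 16176, "misses": 366, "partials": 285}  # Lines: 16827
pr = {"hits": 16482, "misses": 414, "partials": 306}      # Lines: 17202

assert sum(master.values()) == 16827 and sum(pr.values()) == 17202

base = coverage_pct(**master)
head = coverage_pct(**pr)
print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f})")  # 96.13% -> 95.81% (-0.32)
```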
Summary
Adds the pro plotting class `CPPStructurePlot` (`aaanalysis/feature_engineering_pro/`) whose single method `map_structure(df_feat, pdb=...|uniprot=...)` paints per-residue CPP / CPP-SHAP feature impact onto a 3D protein structure. This is the "Issue 1" scope of the handoff (the static render class); the interactive live-prediction `.interactive()` stays a separate follow-up.

Closes #119
What it does
- Maps each `df_feat` feature to the residues it spans (`get_positions_`, shifted to absolute residue numbers by `start`) and aggregates `col_imp` per residue with the same normalized sum that `CPPPlot.profile` uses, never a re-implemented per-position loop.
- Renders via py3Dmol (interactive, `write_html`) with a matplotlib `mplot3d` fallback (static, `savefig`), returning a thin `StructureView` delegator with a uniform `show` / `write_html` / `savefig` / `_repr_html_` surface.
- Modes `"impact"` (white→`COLOR_SHAP_POS` / `COLOR_SHAP_NEG`, reusing the package SHAP palette so it matches `CPPPlot`, with a sign·sqrt perceptual transform) and `"plddt"` (AlphaFold confidence palette). Focus `whole` / `fade` / `zoom`, with the window auto-derived from the feature positions or set via `focus_region`.
- AlphaFold auto-fetch when `uniprot=` is given (via `StructurePreprocessor.fetch_alphafold` into a temp dir).

Design / reuse (no duplication)
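The per-residue normalized sum is the core of the reused position backend. A minimal sketch of that aggregation follows; it is not the package's `get_df_pos_` implementation. It assumes (a) each feature arrives as a list of 1-based positions plus its `col_imp` value, (b) `start` is the absolute residue number of relative position 1, and (c) "normalized" means a feature's importance is split evenly across the residues it spans before summing. `residue_impact` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def residue_impact(features: Iterable[Tuple[List[int], float]], start: int = 1) -> Dict[int, float]:
    """Hypothetical per-residue aggregation of feature importance.

    features: (positions, col_imp) pairs; positions are 1-based relative positions.
    start: absolute residue number of relative position 1 (assumed convention).
    Each feature's importance is split evenly over its positions (assumed
    normalization), and the shares are summed per absolute residue number.
    """
    per_residue: Dict[int, float] = defaultdict(float)
    for positions, imp in features:
        if not positions:
            continue  # a feature without positions contributes nothing
        share = imp / len(positions)
        for pos in positions:
            per_residue[pos + start - 1] += share
    return dict(per_residue)


features = [([1, 2, 3, 4], 2.0), ([3, 4], -1.0)]
impact = residue_impact(features, start=100)
print(impact)                 # {100: 0.5, 101: 0.5, 102: 0.0, 103: 0.0}
print(sum(impact.values()))   # 1.0
```

With this normalization the per-residue values always sum to the total feature importance (2.0 - 1.0 = 1.0 here), so a long feature does not outweigh a short one just by spanning more residues.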
- Reuses the shared CPP position backend (`get_positions_` / `get_df_pos_(value_type="sum")`) and the `StructurePreprocessor` structure backend (`load_structure` / `_collect_chain_residues` / `_resolve_best_chain`); the only new structure code is a thin chain-by-id Cα/pLDDT extractor (the encoder backend returns sequence-aligned arrays that drop absolute residue numbering). This follows the established `scan_motif → seq_analysis._backend` cross-package-reuse precedent; the reused `encode_pdb` helpers carry a `NOTE` marking them load-bearing.
- `StructureView` is the package's first non-`Axes` plotting return type, a deliberate, documented exception (a thin pure delegator).
- Gated on biopython (`Bio` import); py3Dmol is imported lazily so the matplotlib fallback works without it. `py3Dmol>=2.0` added to the `[pro]` extra.

Ripple
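The `[pro]`-extra wiring relies on py3Dmol being optional at import time. A minimal sketch of that lazy-import-with-fallback pattern follows, using only the standard library; `pick_backend` and `load_py3dmol` are hypothetical helper names and do not reproduce the package's `_EXTRA_MODULES` mechanism.

```python
import importlib
import importlib.util


def pick_backend(prefer: str = "py3dmol") -> str:
    """Return 'py3dmol' if requested and installed, otherwise fall back to 'matplotlib'."""
    # find_spec only locates the module; it does not import py3Dmol yet.
    if prefer == "py3dmol" and importlib.util.find_spec("py3Dmol") is not None:
        return "py3dmol"
    return "matplotlib"


def load_py3dmol():
    """Import py3Dmol only when the interactive backend is actually used."""
    try:
        return importlib.import_module("py3Dmol")
    except ImportError as exc:
        raise ImportError(
            "py3Dmol is required for the interactive backend; "
            "install the [pro] extra or use the matplotlib fallback."
        ) from exc


print(pick_backend("matplotlib"))  # matplotlib
```

Deferring the import keeps `import aaanalysis` cheap and lets the static `savefig` path work on machines where only matplotlib is installed; the interactive path fails with an actionable message instead of at package import.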
- `aaanalysis/__init__.py` export (pro-gated stub on `ImportError`) + `_EXTRA_MODULES["pro"]` gains `py3Dmol`.
- `ut` constants: pLDDT palette (`COLOR_PLDDT_*`, `LIST_COLOR_PLDDT`, `DICT_COLOR_PLDDT`, `COLOR_STRUCT_MISSING`).
- `api.rst`, `release_notes.rst` (v1.1.0 Unreleased), `docstring_guide.rst` (abbr `csp`), `CONTEXT.md` (CPPStructurePlot, StructureView terms).
- `DEDICATED_OWNERS` and the abbreviation `REGISTRY`.
- `examples/feature_engineering_pro/csp_map_structure.ipynb` (executed, embedded figures + tables).

Review
Ran a multi-agent code review (high+) and a security review on the diff. Security: no exploitable findings. Code-review fixes folded in:
- py3Dmol selections are chain-qualified (`{"chain", "resi"}`) so a multi-chain structure no longer paints residue N onto every chain; `zoomTo` is intersected with residues actually present (no silent empty-selection no-op).
- An explicit `chain` still scores sequence identity, so a wrong `chain=` + `sequence=` warns instead of silently mis-painting.
- Impact colors come from the shared colormap helper (`plot_get_cmap_`) rather than a divergent linear interpolation, keeping 3D colors consistent with the 2D CPP plots.

The cross-package backend reuse (flagged by the reviewer) is the intended, user-approved "reuse, no duplication" decision and is left as-is.
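The focus window and the chain-qualified selection fix can be sketched together. This assumes the window is taken from `focus_region` when given, otherwise from the lowest and highest residue with nonzero impact (plus an optional `pad`, a hypothetical parameter), and that the resulting spec uses 3Dmol's `chain` / `resi` selection keys. `focus_window` and `zoom_selection` are hypothetical names, not the PR's code.

```python
from typing import Dict, Iterable, Optional, Tuple


def focus_window(impact: Dict[int, float],
                 focus_region: Optional[Tuple[int, int]] = None,
                 pad: int = 0) -> Tuple[int, int]:
    """Use focus_region if given, else span the residues with nonzero impact (assumed rule)."""
    if focus_region is not None:
        return focus_region
    hits = [resi for resi, value in impact.items() if value != 0]
    if not hits:
        raise ValueError("no residue carries feature impact")
    return min(hits) - pad, max(hits) + pad


def zoom_selection(chain: str, window: Tuple[int, int],
                   present: Iterable[int]) -> Optional[dict]:
    """Chain-qualified selection restricted to residues actually present in the structure."""
    lo, hi = window
    resi = sorted(r for r in set(present) if lo <= r <= hi)
    if not resi:
        return None  # signal "nothing to zoom to" instead of passing an empty selection
    # Qualifying by chain keeps residue N of other chains out of the selection.
    return {"chain": chain, "resi": resi}


impact = {100: 0.5, 101: 0.5, 103: -0.2}
window = focus_window(impact)                             # (100, 103)
print(zoom_selection("A", window, [99, 100, 101, 104]))   # {'chain': 'A', 'resi': [100, 101]}
```

Returning `None` for an empty intersection lets the caller fall back (for example to the whole chain) instead of handing py3Dmol a selection that matches nothing.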
Tests
Full fast unit gate green locally (`-m "not regression and not integration and not e2e"`); docstring checkers and `nbmake` on the new notebook pass. Branch rebased on current `origin/master`.

🤖 Generated with Claude Code