feat(feature_engineering_pro): CPPStructurePlot.map_structure — paint CPP feature impact on a 3D structure (pro) (#119)#278

Draft
breimanntools wants to merge 1 commit into master from feat/cpp-structure-plot

Conversation

@breimanntools

Owner

Summary

Adds the pro plotting class CPPStructurePlot (aaanalysis/feature_engineering_pro/), whose single method map_structure(df_feat, pdb=...|uniprot=...) paints per-residue CPP / CPP-SHAP feature impact onto a 3D protein structure. This covers the "Issue 1" scope of the handoff (the static render class); the interactive live-prediction .interactive() method remains a separate follow-up.

Closes #119

What it does

  • Maps each df_feat feature to the residues it spans (get_positions_, shifted to absolute residue numbers by start) and aggregates col_imp per residue with the same normalized sum that CPPPlot.profile uses, rather than a re-implemented per-position loop.
  • Renders via py3Dmol (interactive, write_html) with a matplotlib mplot3d fallback (static, savefig), returning a thin StructureView delegator with a uniform show / write_html / savefig / _repr_html_ surface.
  • Modes: "impact" (white→COLOR_SHAP_POS/COLOR_SHAP_NEG, reusing the package SHAP palette so it matches CPPPlot, with a sign·sqrt perceptual transform) and "plddt" (AlphaFold confidence palette). Focus modes: whole / fade / zoom, with the window auto-derived from the feature positions or set via focus_region.
  • AlphaFold auto-fetch when uniprot= is given (via StructurePreprocessor.fetch_alphafold into a temp dir).
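
To make the mapping and the sign·sqrt transform concrete, here is a minimal sketch. It is not the package code: the function names are hypothetical, positions are assumed to be 1-based relative indices, and each feature's importance is assumed to be split evenly over the residues it spans (the actual normalization lives in the shared CPP position backend and may differ).

```python
import math
from collections import defaultdict


def aggregate_per_residue(features, start=1):
    """Sum feature importance per absolute residue number.

    features: iterable of (positions, importance), where positions are
    1-based relative indices (assumption for this sketch).
    """
    per_residue = defaultdict(float)
    for positions, importance in features:
        # Assumption: importance is shared evenly across spanned residues.
        share = importance / len(positions)
        for pos in positions:
            # Shift the relative position to an absolute residue number.
            per_residue[pos + start - 1] += share
    return dict(per_residue)


def sign_sqrt(value, vmax):
    """Scale a signed value to [-1, 1], compressing large magnitudes."""
    if vmax <= 0:
        return 0.0
    scaled = min(abs(value) / vmax, 1.0)
    # sqrt lifts small magnitudes; copysign restores the original sign.
    return math.copysign(math.sqrt(scaled), value)


impact = aggregate_per_residue([([1, 2], 4.0), ([2, 3], -2.0)], start=10)
colors_in = {res: sign_sqrt(val, vmax=2.0) for res, val in impact.items()}
```

The square root makes weakly contributing residues visible next to a few dominant ones, while the sign still selects the positive or negative end of the color ramp.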

Design / reuse (no duplication)

  • Reuses the shared CPP position backend (get_positions_ / get_df_pos_(value_type="sum")) and the StructurePreprocessor structure backend (load_structure / _collect_chain_residues / _resolve_best_chain) — the only new structure code is a thin chain-by-id Cα/pLDDT extractor (the encoder backend returns sequence-aligned arrays that drop absolute residue numbering). This follows the established scan_motif → seq_analysis._backend cross-package-reuse precedent; the reused encode_pdb helpers carry a NOTE marking them load-bearing.
  • StructureView is the package's first non-Axes plotting return type, a deliberate documented exception (a thin pure delegator).
  • Gated on biopython (top-level Bio import); py3Dmol imported lazily so the matplotlib fallback works without it. py3Dmol>=2.0 added to the [pro] extra.
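
The lazy-import fallback and the thin delegator described above could look roughly like this. This is a sketch under assumptions, not the StructureView implementation: `pick_backend` and `ViewSketch` are hypothetical names, and the wrapped handle is assumed to expose `write_html` (py3Dmol view) or `savefig` (matplotlib Figure).

```python
import importlib.util


def pick_backend(preferred="py3dmol"):
    """Return 'py3dmol' if the module can be found, else 'matplotlib'."""
    if preferred == "py3dmol" and importlib.util.find_spec("py3Dmol") is not None:
        return "py3dmol"
    return "matplotlib"


class ViewSketch:
    """Hypothetical thin delegator over a backend-specific handle."""

    def __init__(self, backend, handle):
        self.backend = backend
        self._handle = handle  # py3Dmol view or matplotlib Figure (assumed)

    def write_html(self, path):
        # Only the interactive backend can produce standalone HTML.
        if self.backend != "py3dmol":
            raise RuntimeError("write_html requires the py3Dmol backend")
        return self._handle.write_html(path)

    def savefig(self, path, **kwargs):
        # Only the static backend can write an image file.
        if self.backend != "matplotlib":
            raise RuntimeError("savefig requires the matplotlib backend")
        return self._handle.savefig(path, **kwargs)
```

Checking for the module with `find_spec` avoids importing py3Dmol at package import time, so the matplotlib path keeps working when the optional dependency is absent.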

Ripple

  • Public API: aaanalysis/__init__.py export (pro-gated stub on ImportError) + _EXTRA_MODULES["pro"] gains py3Dmol.
  • New ut constants: pLDDT palette (COLOR_PLDDT_*, LIST_COLOR_PLDDT, DICT_COLOR_PLDDT, COLOR_STRUCT_MISSING).
  • Docs: api.rst, release_notes.rst (v1.1.0 Unreleased), docstring_guide.rst (abbr csp), CONTEXT.md (CPPStructurePlot, StructureView terms).
  • Tests: 69-test unit module (golden-value normalized-sum mapping, per-param positive/negative, both backends, focus modes, fallback, multi-chain selection, color-ramp match); meta-tests updated (DEDICATED_OWNERS, abbreviation REGISTRY).
  • Example notebook examples/feature_engineering_pro/csp_map_structure.ipynb (executed, embedded figures + tables).
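
For readers unfamiliar with the "pro-gated stub on ImportError" pattern, a minimal sketch follows. The helper `make_pro_stub` and the module name `_hypothetical_pro_backend` are illustrative assumptions, not the names used in aaanalysis/__init__.py.

```python
def make_pro_stub(name, extra="pro", package="aaanalysis"):
    """Build a placeholder class that fails with an install hint."""

    def __init__(self, *args, **kwargs):
        raise ImportError(
            f"'{name}' needs optional dependencies; "
            f"install them with: pip install {package}[{extra}]"
        )

    # Create the class dynamically so the public name still exists.
    return type(name, (), {"__init__": __init__})


try:
    # Hypothetical module path; stands in for the real pro subpackage.
    from _hypothetical_pro_backend import CPPStructurePlot
except ImportError:
    CPPStructurePlot = make_pro_stub("CPPStructurePlot")
```

With this arrangement the top-level import of the package never fails; the error is deferred until a user actually instantiates the pro class, and it names the extra to install.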

Review

Ran a multi-agent code review (high) and a security review on the diff. Security: no exploitable findings. Code-review fixes folded in:

  • py3Dmol selections are chain-qualified ({"chain", "resi"}) so a multi-chain structure no longer paints residue N onto every chain; zoomTo is intersected with residues actually present (no silent empty-selection no-op).
  • An explicit chain still scores sequence identity, so a wrong chain= + sequence= warns instead of silently mis-painting.
  • The impact ramp now reuses the package SHAP palette (plot_get_cmap_) rather than a divergent linear interpolation, keeping 3D colors consistent with the 2D CPP plots.
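
The first fix can be illustrated with a small sketch of how chain-qualified selections and the zoom intersection might be built. The helper names are hypothetical; only the selection shape ({"chain", "resi"}) comes from the description above.

```python
def chain_selection(chain_id, residues):
    """Build a selection dict restricted to one chain."""
    return {"chain": chain_id, "resi": sorted(residues)}


def zoom_selection(chain_id, window, present):
    """Intersect an inclusive (first, last) window with present residues.

    Returns None when nothing overlaps, so the caller can skip zooming
    explicitly instead of passing an empty selection.
    """
    first, last = window
    hits = [res for res in range(first, last + 1) if res in present]
    if not hits:
        return None
    return chain_selection(chain_id, hits)


sel = zoom_selection("A", (8, 12), present={1, 2, 9, 10, 40})
```

Without the "chain" key, a selection on residue numbers alone would match the same numbers on every chain of a multi-chain structure. A dict like `sel` is the kind of argument one would then hand to the viewer's style and zoom calls.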

The cross-package backend reuse (flagged by the reviewer) is the intended, user-approved "reuse, no duplication" decision and is left as-is.

Tests

Full fast unit gate green locally (-m "not regression and not integration and not e2e"); docstring checkers and nbmake on the new notebook pass. Branch rebased on current origin/master.

🤖 Generated with Claude Code

… CPP feature impact on a 3D structure (pro)

Add the pro plotting class CPPStructurePlot whose single method
map_structure(df_feat, pdb=...|uniprot=...) paints per-residue CPP / CPP-SHAP
feature impact onto a 3D protein structure. Each feature is mapped to the
residues it spans (get_positions_, shifted to absolute numbers by `start`) and
its col_imp aggregated per residue with the same normalized-sum as
CPPPlot.profile — no re-implemented per-position loop. Renders via py3Dmol
(interactive, write_html) with a matplotlib mplot3d fallback (static, savefig),
returning a thin StructureView delegator with a uniform show / write_html /
savefig / _repr_html_ surface.

Modes impact (white->SHAP_POS/NEG ramp, reusing the package SHAP palette, with a
sign*sqrt perceptual transform) and plddt (AlphaFold confidence). Focus
whole / fade / zoom; window auto-derived from feature positions or set via
focus_region. AlphaFold auto-fetch when uniprot= is given.

Reuse, no duplication: the shared CPP position backend and the
StructurePreprocessor structure parser (load_structure / _collect_chain_residues
/ _resolve_best_chain) are reused; the only new structure code is a thin
chain-by-id Ca/pLDDT extractor (the encoder backend returns sequence-aligned
arrays that drop absolute residue numbering). Gated on biopython; py3Dmol
imported lazily so the matplotlib fallback works without it.

Wiring: __init__.py export (pro-gated stub) + py3Dmol in _EXTRA_MODULES and the
[pro] extra; pLDDT palette ut constants. Ripple: api.rst, release notes
(v1.1.0), docstring guide (abbr csp), CONTEXT.md (CPPStructurePlot,
StructureView), DEDICATED_OWNERS + abbreviation registry, a 69-test unit module
(golden-value normalized-sum mapping, multi-chain selection, both backends,
focus modes, fallback), and an executed example notebook.

py3Dmol selections are chain-qualified (no multi-chain residue-number leak),
zoomTo is intersected with present residues, and an explicit chain still scores
identity so a wrong chain + sequence warns.

Closes #119

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@breimanntools breimanntools left a comment

Owner Author

I provide the ADAMTS7 HTML as an example of how the structure should be visualized!

@codecov

codecov Bot commented Jun 26, 2026


Codecov Report

❌ Patch coverage is 80.80000% with 72 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.81%. Comparing base (ae64fe2) to head (f7c7643).
⚠️ Report is 3 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...sis/feature_engineering_pro/_cpp_structure_plot.py | 82.24% | 15 Missing and 4 partials ⚠️ |
| ...ture_engineering_pro/_backend/cpp_struct/render.py | 82.41% | 9 Missing and 7 partials ⚠️ |
| ...eature_engineering_pro/_backend/cpp_struct/view.py | 73.46% | 12 Missing and 1 partial ⚠️ |
| ...e_engineering_pro/_backend/cpp_struct/structure.py | 78.00% | 5 Missing and 6 partials ⚠️ |
| ...ture_engineering_pro/_backend/cpp_struct/colors.py | 77.14% | 7 Missing and 1 partial ⚠️ |
| ...ure_engineering_pro/_backend/cpp_struct/mapping.py | 82.14% | 2 Missing and 3 partials ⚠️ |

❌ Your patch check has failed because the patch coverage (80.80%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #278      +/-   ##
==========================================
- Coverage   96.13%   95.81%   -0.32%     
==========================================
  Files         176      183       +7     
  Lines       16827    17202     +375     
  Branches     2877     2935      +58     
==========================================
+ Hits        16176    16482     +306     
- Misses        366      414      +48     
- Partials      285      306      +21     

| Files with missing lines | Coverage Δ |
|---|---|
| aaanalysis/__init__.py | 97.53% <100.00%> (+0.19%) ⬆️ |
| aaanalysis/_constants.py | 100.00% <100.00%> (ø) |
| ...handling_pro/_backend/struct_preproc/encode_pdb.py | 96.00% <ø> (ø) |
| aaanalysis/feature_engineering_pro/__init__.py | 100.00% <100.00%> (ø) |
| ...ure_engineering_pro/_backend/cpp_struct/mapping.py | 82.14% <82.14%> (ø) |
| ...ture_engineering_pro/_backend/cpp_struct/colors.py | 77.14% <77.14%> (ø) |
| ...e_engineering_pro/_backend/cpp_struct/structure.py | 78.00% <78.00%> (ø) |
| ...eature_engineering_pro/_backend/cpp_struct/view.py | 73.46% <73.46%> (ø) |
| ...ture_engineering_pro/_backend/cpp_struct/render.py | 82.41% <82.41%> (ø) |
| ...sis/feature_engineering_pro/_cpp_structure_plot.py | 82.24% <82.24%> (ø) |

... and 1 file with indirect coverage changes

Components Coverage Δ
cpp_core 94.95% <ø> (ø)