Problem
The dominant data object in single-cell and spatial omics is AnnData (Scanpy /
scverse), yet AAanalysis has no bridge to it. A user holding a Scanpy marker-gene
or differential-expression result cannot turn the selected genes/proteins into a
df_seq, and there is no way to attach a CPP df_feat back onto an AnnData for
downstream omics plotting. The ecosystem epic (#210) enumerates proteomics
adapters but never the single-cell/scverse upstream, so that entire upstream is
currently unreachable: a user comparing AAanalysis to their existing scverse
stack finds no integration point, only a manual, error-prone copy of sequences in
and scores out.
Goal
Add thin, optional bidirectional adapters — from_anndata (selected
genes/proteins → df_seq) and to_anndata (df_feat → adata.var / varm / uns) — gated behind a new [omics] extra, with anndata never a core/required
dependency and no single-cell analysis reimplemented.
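The optional-dependency gate can be sketched as a lazy import that converts a missing anndata into an actionable ImportError. The helper names (_require_anndata, base_install_message) are hypothetical, and this uses the generic importlib idiom; it is not copied from the parquet path in #33. To simulate a base install without uninstalling anything, it relies on standard Python behavior: a None entry in sys.modules makes any import of that name fail.

```python
import importlib
import sys

_INSTALL_HINT = "pip install aaanalysis[omics]"


def _require_anndata():
    """Hypothetical guard: import anndata lazily or raise an actionable ImportError."""
    try:
        return importlib.import_module("anndata")
    except ImportError as err:
        raise ImportError(
            "The AnnData adapters (from_anndata / to_anndata) need the optional "
            f"'anndata' package. Install it with: {_INSTALL_HINT}"
        ) from err


def base_install_message() -> str:
    """Simulate a base install by blocking 'anndata' and return the error text."""
    sentinel = object()
    saved = sys.modules.get("anndata", sentinel)
    sys.modules["anndata"] = None  # a None entry makes any import of the name fail
    try:
        _require_anndata()
    except ImportError as err:
        return str(err)
    finally:
        # Restore whatever was there before (nothing, or an already-imported module).
        if saved is sentinel:
            del sys.modules["anndata"]
        else:
            sys.modules["anndata"] = saved
    raise AssertionError("expected ImportError on a simulated base install")


message = base_install_message()
print(message)
```

In a pytest suite, the same effect is available via monkeypatch.setitem(sys.modules, "anndata", None) combined with pytest.raises(ImportError, match=...). Because the import happens inside the adapter rather than at module top level, importing aaanalysis itself stays unaffected on a base install.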
Requirements
from_anndata(adata, var_key=..., sequence_key=..., group_key=None, ...)
returns a df_seq with entry + sequence (and optional label/group
columns) for a selected .var subset. Sequences come from a user-supplied .var column or a passed-in {id: sequence} mapping — no network
lookup here.
to_anndata(adata, df_feat, key="aaanalysis", ...) writes a per-protein
score to adata.var, the protein×feature matrix to adata.varm, and run
metadata to adata.uns, non-destructively (namespaced keys, original
slots untouched).
Place alongside read_fasta/to_fasta in aaanalysis/data_handling/
(flag at review if a dedicated adapters/ module is preferred).
New [omics] extra in pyproject.toml (anndata; mudata deferred).
On a base install (no [omics]), calling either adapter raises an
actionable ImportError naming pip install aaanalysis[omics], never a
bare crash (mirror the optional-engine pattern of the parquet path in #33,
"Add standardized export formats for outputs").
numpydoc docstrings (named Returns, per-method Examples included); one
example notebook built on a tiny synthetic AnnData so it runs under
the nbmake CI gate without pulling in scanpy/torch.
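The from_anndata requirement can be sketched with pandas alone, since AnnData.var is a pandas DataFrame indexed by var_names. This is a minimal sketch under stated assumptions, not the planned implementation: var_key is assumed to name a boolean .var column marking the selected rows, the mapping argument is given the hypothetical name sequences, and the optional group column is assumed to be called "group". A SimpleNamespace stand-in exposes .var so the example runs without anndata installed; the toy sequences are synthetic.

```python
from types import SimpleNamespace
from typing import Mapping, Optional

import pandas as pd


def from_anndata(adata, var_key: Optional[str] = None,
                 sequence_key: Optional[str] = None,
                 group_key: Optional[str] = None,
                 sequences: Optional[Mapping[str, str]] = None) -> pd.DataFrame:
    """Sketch: turn a selected .var subset into a df_seq ('entry', 'sequence', optional 'group')."""
    var = adata.var  # pandas DataFrame indexed by var_names
    # Assumption: var_key names a boolean .var column marking the selected rows.
    if var_key is not None:
        var = var.loc[var[var_key].astype(bool).to_numpy()]
    entries = [str(e) for e in var.index]
    # Sequences come from a .var column or a user-supplied {id: sequence} mapping; no network lookup.
    if sequence_key is not None:
        seqs = list(var[sequence_key])
    elif sequences is not None:
        seqs = [sequences.get(e) for e in entries]
    else:
        raise ValueError("Pass 'sequence_key' (a .var column) or a 'sequences' mapping.")
    # Fail loudly instead of emitting rows with empty or missing sequences.
    missing = [e for e, s in zip(entries, seqs) if not isinstance(s, str) or not s]
    if missing:
        raise ValueError(f"{len(missing)} selected entries have no sequence, e.g. {missing[:3]}")
    df_seq = pd.DataFrame({"entry": entries, "sequence": seqs})
    if group_key is not None:
        df_seq["group"] = list(var[group_key])
    return df_seq


# Stand-in exposing .var the way AnnData does, so the sketch runs without anndata.
var = pd.DataFrame(
    {"is_marker": [True, False, True],
     "seq": ["MKTAYIAK", "MSTNPKPQ", "MGSSHHHH"],
     "cluster": ["c0", "c1", "c0"]},
    index=["P1", "P2", "P3"],
)
adata = SimpleNamespace(var=var)
df_seq = from_anndata(adata, var_key="is_marker", sequence_key="seq", group_key="cluster")
print(df_seq)
```

The entry set of the result equals the selected .var subset (P1 and P3 here), which is exactly what the first acceptance criterion asserts.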
KPIs / Acceptance criteria
Round-trip on a small fixture: from_anndata(adata) yields a df_seq
whose entry set equals the selected .var subset (asserted); after to_anndata(adata, df_feat), reading the attached columns back recovers df_feat's contracted columns losslessly.
Base install (no [omics]) raises the actionable ImportError
(asserted) — anndata is absent from core dependencies (verified by a
base-install import test).
≥1 example notebook runs under the nbmake gate using a synthetic AnnData (no scanpy/torch resolved).
Scope / non-goals
Out: gene→protein sequence resolution (UniProt/network lookup) —
sequences are user-supplied; automatic resolution is a separate concern.
Out: the expression→sequence-signature enrichment orchestration
(ProteinSignatureEnrichment / "GSEA-for-protein-features"). That end-to-end
workflow — resolve gene→protein, run CPP, narrate the signature — is a
downstream application and lives in ProtXplain, not in AAanalysis core. The
adapter only moves data across the boundary.
Out: MuData / SpatialData (deferred follow-ups once AnnData works).
Out: any Scanpy analysis (clustering / DE / UMAP) — consumed, not cloned.
Dependencies
… upstream the epic did not enumerate)
… df_feat contract the to_anndata writer serializes), #33 Add standardized export formats for outputs (reuse its export/metadata sidecar shape for the uns record)
Standards checklist
… pyproject.toml (new [omics] extra) and __init__.py / __all__ (two new public symbols): flag for approval
… Returns, per-method Examples included)
… print() (ut.print_out); bare ValueError/RuntimeError; no aaanalysis._utils.* imports outside utils.py
… __init__.py (CONFIRM-FIRST); optional dep gated via [omics] extra with a clear install hint