🧬 DogNeo

Canine Neoantigen Prioritization Pipeline

From somatic variants to ranked vaccine candidates, purpose-built for dogs.

Quick Start · Usage · LLM Config · Architecture · License


⚠️ FOR RESEARCH USE ONLY (RUO): This tool is intended solely for computational research purposes. It does not provide clinical diagnoses, veterinary advice, or treatment recommendations.

Why DogNeo?

Cancer immunotherapy has shown remarkable results in human medicine, but dogs get cancer too. When a team used AI + mRNA to treat a dog's tumor, it highlighted a gap: there was no open-source, canine-specific neoantigen pipeline.

DogNeo fills that gap. It takes annotated somatic variants from a dog's tumor, generates candidate mutant peptides, predicts binding to DLA (Dog Leukocyte Antigen) molecules, and ranks neoantigens, giving researchers a starting point for personalized canine cancer vaccines.

Tumor/Normal DNA ──→ Variant Calling ──→ Peptide Generation ──→ DLA Binding ──→ Ranked Candidates
       RNA-seq ──→ Expression Quantification ─────────────────────┘
       DLA typing (KPR) ──────────────────────────────────────────┘

✨ Key Features

Feature | Description
🐕 Canine-specific | DLA alleles from IPD-MHC, CanFam3.1 proteome, KPR genotyping
🔬 Full pipeline | BWA → GATK → Mutect2/Strelka2 → VEP/SnpEff → Salmon → NetMHCpan → Ranking
🧪 Out-of-the-box demo | 3 commands to run a complete analysis on real canine osteosarcoma data
🤖 AI-assisted reports | 3-tier LLM backend (CLI free → Local → Cloud) for narrative interpretation
📦 Multi-format output | TSV, JSON, FASTA, and interactive HTML reports
♻️ Reproducible | Snakemake workflow with Docker containerization

🚀 Quick Start

pip install git+https://github.com/ImL1s/dogneo.git   # 1. Install
dogneo setup                                           # 2. Download reference data (~15 MB)
dogneo demo                                            # 3. Run demo pipeline ✨

That's it. The demo runs a full pipeline on bundled canine osteosarcoma data (8 published mutations across TP53, BRAF, KRAS, PIK3CA, PTEN) and generates ranked candidates in TSV, JSON, and FASTA formats.

📋 Expected output
🧬 DogNeo v0.1.0 - Running demo pipeline
📂 Using bundled demo data
📂 Using reference: ~/.dogneo/data/CanFam3.1.pep.all.fa
   8 total → 6 coding variants
   228 peptides from 6 variants
   912 unscored candidates

✅ Demo complete! Results: dogneo_demo_results/CANINE_OSA_DEMO
   📄 candidates.tsv   - Tab-separated candidates
   📄 candidates.json  - Structured JSON with metadata
   📄 candidates.fasta - Peptide sequences for wet lab
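
The FASTA export can be inspected with a few lines of standard-library Python. The sketch below is a generic FASTA reader, not DogNeo code; the header layout DogNeo writes is not described here, so the sample record names (`pep1`, `pep2`) are hypothetical.

```python
from io import StringIO
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}; the header is everything after '>'."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[header] += line
    return records


# Hypothetical sample; real records come from candidates.fasta.
sample = StringIO(">pep1\nACDEFGHIK\n>pep2\nLMNPQRST\nVWY\n")
peptides = read_fasta(sample)
lengths = {name: len(seq) for name, seq in peptides.items()}
print(lengths)
```

To read the demo output instead, pass an open file handle for dogneo_demo_results/CANINE_OSA_DEMO/candidates.fasta in place of the sample.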

📦 Installation

Prerequisites

Tool | Version | Check | Notes
Python | ≥ 3.10 | python --version | Required
pip | latest | pip --version | Included with Python
git | any | git --version | Required

Install from GitHub (Recommended)

pip install git+https://github.com/ImL1s/dogneo.git

With extras

pip install "dogneo[bio] @ git+https://github.com/ImL1s/dogneo.git"     # + pysam, pyvcf3
pip install "dogneo[llm] @ git+https://github.com/ImL1s/dogneo.git"     # + openai, anthropic
pip install "dogneo[all] @ git+https://github.com/ImL1s/dogneo.git"     # Everything

From source (development)

git clone https://github.com/ImL1s/dogneo.git
cd dogneo
pip install -e ".[all]"

Reference Data (~15 MB, one-time)

dogneo setup

This downloads the CanFam3.1 proteome from Ensembl FTP to ~/.dogneo/data/. DLA alleles are bundled with the package, so no additional download is needed.
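
A script that wraps DogNeo may want to confirm the cached proteome is present before running. This is a minimal sketch that assumes the file name printed in the demo log (CanFam3.1.pep.all.fa) and the ~/.dogneo/data/ location named above; the helper `find_cached_proteome` is hypothetical and not part of DogNeo.

```python
from pathlib import Path
from typing import Optional

PROTEOME_NAME = "CanFam3.1.pep.all.fa"  # file name as printed in the demo log


def find_cached_proteome(home: Optional[Path] = None) -> Optional[Path]:
    """Return the cached proteome path if it exists and is non-empty, else None."""
    base = Path(home) if home is not None else Path.home()
    candidate = base / ".dogneo" / "data" / PROTEOME_NAME
    if candidate.is_file() and candidate.stat().st_size > 0:
        return candidate
    return None


path = find_cached_proteome()
print(path if path else "No cached proteome found; run `dogneo setup` first.")
```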

📖 Usage

Rank neoantigens from your own VCF

# Minimal: auto-detects cached proteome + bundled DLA alleles
dogneo rank --vcf my_somatic_variants.vcf

# Full options
dogneo rank \
  --vcf somatic_annotated.vcf \
  --expression salmon_quant.sf \
  --alleles "DLA-88*001:01,DLA-88*501:01" \
  --protein-db CanFam3.1.pep.all.fa \
  --output-dir results/ \
  --formats tsv,json,fasta,html \
  --llm-tier cli
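
When the ranking step is driven from a Python script, the same call can be assembled as an argument list. The sketch below uses only the flags shown above; the helper name `build_rank_command` is hypothetical, and it builds the command without running it.

```python
import shlex
from typing import List, Optional, Sequence


def build_rank_command(
    vcf: str,
    expression: Optional[str] = None,
    alleles: Sequence[str] = (),
    protein_db: Optional[str] = None,
    output_dir: Optional[str] = None,
    formats: Sequence[str] = (),
    llm_tier: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `dogneo rank`, adding only the options that are set."""
    cmd = ["dogneo", "rank", "--vcf", vcf]
    if expression:
        cmd += ["--expression", expression]
    if alleles:
        cmd += ["--alleles", ",".join(alleles)]
    if protein_db:
        cmd += ["--protein-db", protein_db]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    if formats:
        cmd += ["--formats", ",".join(formats)]
    if llm_tier:
        cmd += ["--llm-tier", llm_tier]
    return cmd


cmd = build_rank_command(
    "somatic_annotated.vcf",
    expression="salmon_quant.sf",
    alleles=["DLA-88*001:01", "DLA-88*501:01"],
    output_dir="results/",
    formats=["tsv", "json"],
)
print(shlex.join(cmd))  # shell-quoted form, for logging
# To execute: subprocess.run(cmd, check=True)
```

Passing the list straight to `subprocess.run` avoids shell interpretation of the `*` in allele names.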

Generate AI-assisted report

# From existing candidates JSON
dogneo report \
  --input results/candidates.json \
  --output report.html \
  --llm-tier cli    # free, uses Gemini CLI / Claude Code

Re-rank with external binding predictions

# After running NetMHCpan on exported FASTA
dogneo rerank \
  --candidates results/candidates.json \
  --binding netmhcpan_output.tsv \
  --output-dir reranked/
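
Conceptually, re-ranking joins external affinity predictions onto existing candidates by peptide and allele. The sketch below illustrates that join only; it is not DogNeo's implementation. The field names (`peptide`, `allele`, `ic50_nM`) are hypothetical, and the 50 nM / 500 nM cut-offs are a convention borrowed from human MHC-I work that may not transfer to DLA alleles.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (peptide, allele)


def classify_ic50(ic50_nm: float) -> str:
    """Label affinity with the 50 nM / 500 nM cut-offs (an assumption, see above)."""
    if ic50_nm < 50:
        return "strong"
    if ic50_nm < 500:
        return "weak"
    return "non-binder"


def attach_binding(candidates: List[dict], binding: Dict[Key, float]) -> List[dict]:
    """Join external IC50 values onto candidates and sort tightest binders first."""
    merged = []
    for cand in candidates:
        ic50 = binding.get((cand["peptide"], cand["allele"]))
        merged.append({
            **cand,
            "ic50_nM": ic50,
            "binder": classify_ic50(ic50) if ic50 is not None else "unscored",
        })
    # Candidates without a prediction sort last.
    return sorted(merged, key=lambda c: (c["ic50_nM"] is None, c["ic50_nM"] or 0.0))


# Hypothetical inputs for illustration.
candidates = [
    {"peptide": "AAAAAAAAA", "allele": "DLA-88*001:01"},
    {"peptide": "CCCCCCCCC", "allele": "DLA-88*001:01"},
    {"peptide": "DDDDDDDDD", "allele": "DLA-88*501:01"},
]
binding = {
    ("AAAAAAAAA", "DLA-88*001:01"): 320.0,
    ("CCCCCCCCC", "DLA-88*001:01"): 12.5,
}
reranked = attach_binding(candidates, binding)
for row in reranked:
    print(row["peptide"], row["allele"], row["ic50_nM"], row["binder"])
```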

Design mRNA vaccine construct

# Generate codon-optimized mRNA from top candidates
dogneo design-mrna \
  --candidates results/candidates.json \
  --top-n 10 \
  --output-dir mrna_design/
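
Codon optimization starts from back-translating each peptide into RNA. The sketch below picks one fixed codon per amino acid to show the idea; it is not DogNeo's algorithm. Every codon is valid under the standard genetic code, but the choices are illustrative and are not measured canine codon frequencies. Construct elements such as UTRs and linkers are not modeled.

```python
# One illustrative codon per amino acid (standard genetic code).
# NOT measured canine codon usage; swap in a real usage table for actual work.
PREFERRED_CODON = {
    "A": "GCC", "C": "UGC", "D": "GAC", "E": "GAG", "F": "UUC",
    "G": "GGC", "H": "CAC", "I": "AUC", "K": "AAG", "L": "CUG",
    "M": "AUG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "CGG",
    "S": "AGC", "T": "ACC", "V": "GUG", "W": "UGG", "Y": "UAC",
}


def back_translate(peptide: str) -> str:
    """Map each residue to its chosen codon and join the codons into an RNA string."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in peptide.upper())
    except KeyError as exc:
        raise ValueError(f"Unsupported residue: {exc.args[0]}") from None


def gc_content(rna: str) -> float:
    """Fraction of G and C bases, a common sanity check on a designed sequence."""
    return sum(base in "GC" for base in rna) / len(rna)


rna = back_translate("MKV")
print(rna, round(gc_content(rna), 3))
```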

Launch interactive dashboard

dogneo ui              # Opens browser at localhost:8501
dogneo ui --demo       # Pre-load demo results

Run full Snakemake pipeline

dogneo run --config pipeline_config.yaml

CLI Reference

Command | Description
dogneo setup | Download reference data (CanFam3.1 proteome)
dogneo demo | Run full pipeline on bundled demo data
dogneo rank | Rank neoantigens from a VCF file (auto DLA binding estimation)
dogneo rerank | Import external binding results (NetMHCpan/MHCflurry) and re-score
dogneo design-mrna | Generate codon-optimized mRNA construct from top candidates
dogneo report | Generate HTML/Markdown report
dogneo ui | Launch interactive Streamlit dashboard
dogneo check-llm | Display status of all LLM backends
dogneo version | Show version

🤖 LLM Backend Configuration

The AI report layer is entirely optional: all computational analysis works without any LLM. LLMs are used only to generate narrative interpretations in HTML reports.

Three tiers, prioritized by cost

Tier | Backend | Cost | Privacy | Setup
1. CLI | Gemini CLI, Claude Code, Codex | Free | Data sent to cloud | Install any AI CLI tool
2. Local | llama-cpp (GGUF models) | Free | Fully offline | Download a GGUF model
3. Cloud | OpenAI, Anthropic, Google Gemini | Pay per token | Data sent to cloud | Set API key

Check available backends

$ dogneo check-llm

📡 CLI Backends (Tier 1 - Free):
   ✅ gemini
   ✅ claude
   ✅ codex
💾 Local Backends (Tier 2 - Offline):
   ⚪ No local model configured
☁️  Cloud Backends (Tier 3 - API):
   ❌ OpenAI  (OPENAI_API_KEY)
   ❌ Anthropic  (ANTHROPIC_API_KEY)

Environment Variables (Cloud tier only)

# Optional: only needed if using cloud LLM tier
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=AI...
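
The tier order (CLI, then Local, then Cloud) amounts to picking the first backend that is available. The sketch below shows that fallback logic under stated assumptions and is not the contents of DogNeo's router: the executable names are taken from the check-llm listing, the environment variable names from the block above, and `local_model_path` is a hypothetical setting.

```python
import os
import shutil
from typing import Callable, Mapping, Optional, Tuple

CLI_TOOLS = ("gemini", "claude", "codex")  # names as listed by check-llm (assumed executables)
CLOUD_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def pick_backend(
    which: Callable[[str], Optional[str]] = shutil.which,
    environ: Mapping[str, str] = os.environ,
    local_model_path: Optional[str] = None,  # hypothetical: path to a GGUF model
) -> Optional[Tuple[str, str]]:
    """Return (tier, identifier) for the first available backend, or None."""
    for tool in CLI_TOOLS:          # Tier 1: any AI CLI found on PATH
        if which(tool):
            return ("cli", tool)
    if local_model_path and os.path.isfile(local_model_path):
        return ("local", local_model_path)   # Tier 2: local model file
    for key in CLOUD_KEYS:          # Tier 3: a configured API key
        if environ.get(key):
            return ("cloud", key)
    return None


print(pick_backend())
```

Injecting `which` and `environ` keeps the function easy to test without touching the real PATH or environment.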

🔬 Pipeline Architecture

                          ┌──────────────┐
                          │  Tumor DNA   │
                          │  Normal DNA  │
                          │  RNA-seq     │
                          └──────┬───────┘
                                 │
                   ┌─────────────┼───────────┐
                   ▼             ▼           ▼
             ┌───────────┐ ┌──────────┐ ┌──────────┐
             │ BWA-MEM   │ │   STAR   │ │  Salmon  │
             │ alignment │ │ alignment│ │ quant.   │
             └─────┬─────┘ └─────┬────┘ └────┬─────┘
                   │             │           │
             ┌─────┴─────┐       │      Expression
             │   GATK    │       │         TPMs
             │ Mutect2   │       │           │
             │ Strelka2  │       │           │
             └─────┬─────┘       │           │
                   │             │           │
             ┌─────┴─────┐       │           │
             │ VEP/SnpEff│       │           │
             │ annotation│       │           │
             └─────┬─────┘       │           │
                   │             │           │
                   ▼             ▼           ▼
            ┌──────────────────────────────────┐
            │       DogNeo Core Engine         │
            │  ┌─────────────────────────────┐ │
            │  │ Variant Parsing & Filtering │ │
            │  │ Mutant Peptide Generation   │ │
            │  │ DLA Binding Prediction      │ │
            │  │ Multi-factor Ranking        │ │
            │  └─────────────────────────────┘ │
            └──────────────┬───────────────────┘
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
         ┌────────┐   ┌────────┐  ┌──────────┐
         │  TSV   │   │  JSON  │  │  HTML    │
         │  FASTA │   │        │  │  + LLM   │
         └────────┘   └────────┘  └──────────┘
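
The "Multi-factor Ranking" stage of the core engine can be pictured as a weighted combination of per-candidate evidence. The sketch below is illustrative only: the factors (binding, expression, variant allele fraction), the weights, and the expression transform are all hypothetical, since DogNeo's actual scoring is not specified here. The binding term uses the log-50k transform that MHC binding predictors commonly use for affinities.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    peptide: str
    ic50_nm: float  # predicted binding affinity in nM (lower binds tighter)
    tpm: float      # expression of the source transcript
    vaf: float      # tumor variant allele fraction, 0..1


# Hypothetical weights; the real factors and weights may differ.
WEIGHTS = {"binding": 0.5, "expression": 0.3, "clonality": 0.2}


def binding_score(ic50_nm: float) -> float:
    """Log-scale affinity in [0, 1]: 1.0 at <= 1 nM, 0.0 at >= 50,000 nM."""
    clipped = min(max(ic50_nm, 1.0), 50_000.0)
    return 1.0 - math.log(clipped) / math.log(50_000.0)


def expression_score(tpm: float) -> float:
    """Saturating transform so very high TPMs do not dominate the ranking."""
    tpm = max(tpm, 0.0)
    return tpm / (tpm + 10.0)


def composite_score(c: Candidate) -> float:
    """Weighted sum of the three normalized factors, each in [0, 1]."""
    return (
        WEIGHTS["binding"] * binding_score(c.ic50_nm)
        + WEIGHTS["expression"] * expression_score(c.tpm)
        + WEIGHTS["clonality"] * min(max(c.vaf, 0.0), 1.0)
    )


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Highest composite score first."""
    return sorted(candidates, key=composite_score, reverse=True)


# Hypothetical candidates for illustration.
pool = [
    Candidate("ACDEFGHIK", ic50_nm=800.0, tpm=5.0, vaf=0.30),
    Candidate("LMNPQRSTV", ic50_nm=25.0, tpm=40.0, vaf=0.40),
]
for cand in rank(pool):
    print(cand.peptide, round(composite_score(cand), 3))
```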

Computational Steps

Step | Tool | Description
1 | BWA-MEM / STAR | Sequence alignment (DNA / RNA)
2 | GATK | MarkDuplicates, BQSR
3 | Mutect2 + Strelka2 | Somatic variant calling
4 | VEP / SnpEff | Variant annotation
5 | Salmon / Kallisto | Expression quantification
6 | DogNeo | Mutant peptide windows (8–11 aa MHC-I, 15–17 aa MHC-II)
7 | KPR | DLA genotyping from RNA-seq
8 | NetMHCpan / DogNeo estimator | Binding prediction
9 | DogNeo | Multi-factor immunogenicity scoring
10 | DogNeo | TSV / JSON / FASTA / HTML report
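
Step 6 can be illustrated with a sliding window over the mutated protein. The sketch below handles a single amino-acid substitution and returns every window of the requested lengths that contains the mutated residue; the function name and signature are hypothetical, and frameshifts and indels are not covered. A substitution far from either protein end yields 8 + 9 + 10 + 11 = 38 MHC-I windows, which matches the demo log's 228 peptides from 6 variants (6 × 38). That match is an observation about the numbers, not documented behavior.

```python
from typing import Iterable, List


def mutant_windows(
    protein: str,
    pos: int,                      # 0-based index of the substituted residue
    alt: str,                      # replacement amino acid (one letter)
    lengths: Iterable[int] = range(8, 12),  # 8 to 11 aa, the MHC-I range in step 6
) -> List[str]:
    """Return all k-mers of the mutated protein that include position `pos`."""
    if not 0 <= pos < len(protein):
        raise ValueError("pos is outside the protein")
    if len(alt) != 1:
        raise ValueError("alt must be a single residue")
    mutated = protein[:pos] + alt + protein[pos + 1:]
    windows: List[str] = []
    for k in lengths:
        first = max(0, pos - k + 1)          # leftmost start that still covers pos
        last = min(pos, len(mutated) - k)    # rightmost start that fits in the protein
        for start in range(first, last + 1):
            windows.append(mutated[start:start + k])
    return windows


protein = "A" * 30  # toy sequence so the mutated residue is easy to spot
windows = mutant_windows(protein, pos=15, alt="W")
print(len(windows))
```

For MHC-II windows, pass `lengths=range(15, 18)`.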

🗂️ Project Structure

dogneo/
├── core/                 # Core computational modules
│   ├── variants.py       #   VCF parsing & somatic variant handling
│   ├── peptides.py       #   Mutant peptide window generation
│   ├── expression.py     #   RNA-seq expression quantification
│   ├── binding.py        #   MHC/DLA binding prediction wrappers
│   ├── ranking.py        #   Multi-factor neoantigen scoring
│   └── dla_typing.py     #   Canine DLA genotyping
├── data/                 # Reference & demo data
│   ├── manager.py        #   Auto-download CanFam3.1 from Ensembl FTP
│   └── demo/             #   Bundled demo data (VCF, expression, alleles)
├── pipeline/             # Snakemake workflow definitions
│   ├── Snakefile         #   Main workflow
│   └── config.yaml       #   Pipeline configuration template
├── llm/                  # Pluggable LLM layer (optional)
│   ├── router.py         #   3-tier routing (CLI → Local → Cloud)
│   ├── backends.py       #   OpenAI / Anthropic / Gemini backends
│   ├── cli_wrapper.py    #   Subprocess wrapper for AI CLIs
│   └── prompts.py        #   Prompt templates for neoantigen analysis
├── report/               # Report generation
│   └── generator.py      #   HTML/Markdown report with AI summary
├── export/               # Output format converters
│   └── exporters.py      #   FASTA, TSV, JSON exporters
└── cli.py                # Click-based CLI entrypoint

📊 Reference Data

Resource | Source | Notes
Canine proteome | Ensembl CanFam3.1 | Auto-downloaded via dogneo setup (~15 MB)
DLA alleles | IPD-MHC | 6 DLA-88 Class I alleles, bundled
DLA typing | KPR (paper) | RNA-seq based MHC-I genotyping
Canine genome | CanFam3.1, GSD_1.0, UMICH_Zoey_3.1 | Multiple assemblies supported

🤝 Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

# Development setup
git clone https://github.com/ImL1s/dogneo.git
cd dogneo
pip install -e ".[all]"
python -m pytest tests/ -v        # Run tests (195 passing)

⚖️ Legal Notices

RUO Disclaimer

All outputs are labeled "FOR RESEARCH USE ONLY - NOT FOR CLINICAL OR VETERINARY DIAGNOSTIC USE." This tool does not provide medical advice, diagnoses, or treatment recommendations. Users must obtain appropriate ethical approvals before using this tool with real animal samples.

Patent Notice

Existing patents (e.g., ASU WO2018223094A1) cover canine cancer vaccine methods. This tool performs computational analysis only and does not constitute vaccine design or manufacture.

🙏 Acknowledgments

DogNeo stands on the shoulders of incredible open-source tools:

Upstream: BWA · GATK · Mutect2 · Strelka2 · STAR · Salmon · NetMHCpan · MHCflurry · VEP · SnpEff

Inspired by: pVACtools · OpenVax/Vaxrank · nextNEOpi · MiroFish · Ollama

Motivated by: The story of Paul Conyngham using AI to develop an mRNA treatment for his dog Rosie's cancer, and the realization that accessible tools can accelerate veterinary oncology research.

📄 License

Apache License 2.0


Made with 🐕 for dogs everywhere fighting cancer.
