Canine Neoantigen Prioritization Pipeline
From somatic variants to ranked vaccine candidates, purpose-built for dogs.
Quick Start · Usage · LLM Config · Architecture · License
⚠️ FOR RESEARCH USE ONLY (RUO): This tool is intended solely for computational research purposes. It does not provide clinical diagnoses, veterinary advice, or treatment recommendations.
Cancer immunotherapy has shown remarkable results in human medicine, but dogs get cancer too. When a team used AI + mRNA to treat a dog's tumor, it highlighted a gap: there was no open-source, canine-specific neoantigen pipeline.
DogNeo fills that gap. It takes annotated somatic variants from a dog's tumor, generates candidate mutant peptides, predicts binding to DLA (Dog Leukocyte Antigen) molecules, and ranks neoantigens, giving researchers a starting point for personalized canine cancer vaccines.
Tumor/Normal DNA ──▶ Variant Calling ──▶ Peptide Generation ──▶ DLA Binding ──▶ Ranked Candidates
     RNA-seq ──▶ Expression Quantification ──────────────────────────┘
     DLA typing (KPR) ───────────────────────────────────────────────┘
| Feature | Description |
|---|---|
| 🐕 Canine-specific | DLA alleles from IPD-MHC, CanFam3.1 proteome, KPR genotyping |
| 🔬 Full pipeline | BWA → GATK → Mutect2/Strelka → VEP/SnpEff → Salmon → NetMHCpan → Ranking |
| 🧪 Out-of-the-box demo | 3 commands to run a complete analysis on real canine osteosarcoma data |
| 🤖 AI-assisted reports | 3-tier LLM backend (CLI free → Local → Cloud) for narrative interpretation |
| 📦 Multi-format output | TSV, JSON, FASTA, and interactive HTML reports |
| ♻️ Reproducible | Snakemake workflow with Docker containerization |
pip install git+https://github.com/ImL1s/dogneo.git  # 1. Install
dogneo setup  # 2. Download reference data (~15 MB)
dogneo demo  # 3. Run demo pipeline

✨ That's it. The demo runs a full pipeline on bundled canine osteosarcoma data (8 published mutations across TP53, BRAF, KRAS, PIK3CA, PTEN) and generates ranked candidates in TSV, JSON, and FASTA formats.
Expected output
🧬 DogNeo v0.1.0 - Running demo pipeline
Using bundled demo data
Using reference: ~/.dogneo/data/CanFam3.1.pep.all.fa
8 total → 6 coding variants
228 peptides from 6 variants
912 unscored candidates
✅ Demo complete! Results: dogneo_demo_results/CANINE_OSA_DEMO
candidates.tsv - Tab-separated candidates
candidates.json - Structured JSON with metadata
candidates.fasta - Peptide sequences for wet lab
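
The counts in the demo log can be sanity-checked with a little arithmetic. Assuming each of the 6 coding variants is a single-residue substitution that sits far enough from the protein ends, a window of length k can cover the mutated residue in exactly k positions, so the MHC-I lengths 8 to 11 give 8 + 9 + 10 + 11 = 38 peptides per variant and 6 × 38 = 228, which matches the log. The figure of 912 is 4 × 228, which would be consistent with each peptide being paired with four DLA alleles, but that is an inference rather than something the log states. The sketch below illustrates the window count; `mutant_windows` and the toy sequence are hypothetical and are not DogNeo's own API or data.

```python
def mutant_windows(protein: str, pos: int, alt: str, lengths=range(8, 12)) -> list[str]:
    """Return every k-mer (k in `lengths`) that covers the substituted residue.

    protein: wild-type sequence; pos: 0-based index of the mutated residue;
    alt: replacement amino acid. Hypothetical helper, not DogNeo's API.
    """
    mutant = protein[:pos] + alt + protein[pos + 1:]
    peptides = []
    for k in lengths:
        # A window of length k covers `pos` when it starts in [pos - k + 1, pos],
        # clipped so the window stays inside the protein.
        first = max(0, pos - k + 1)
        last = min(pos, len(mutant) - k)
        for start in range(first, last + 1):
            peptides.append(mutant[start:start + k])
    return peptides


# Toy 40-residue sequence with a substitution at index 20 (illustrative only).
toy = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRV"
windows = mutant_windows(toy, pos=20, alt="W")
per_variant = len(windows)       # 8 + 9 + 10 + 11 windows
demo_total = 6 * per_variant     # six coding variants, as in the demo log
```

Variants close to either end of a protein yield fewer windows, so the 38-per-variant figure is an upper bound for single substitutions.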
| Tool | Version | Check | Notes |
|---|---|---|---|
| Python | ≥ 3.10 | python --version | Required |
| pip | latest | pip --version | Included with Python |
| git | any | git --version | Required |
# Basic install
pip install git+https://github.com/ImL1s/dogneo.git

# With optional extras
pip install "dogneo[bio] @ git+https://github.com/ImL1s/dogneo.git"  # + pysam, pyvcf3
pip install "dogneo[llm] @ git+https://github.com/ImL1s/dogneo.git"  # + openai, anthropic
pip install "dogneo[all] @ git+https://github.com/ImL1s/dogneo.git"  # Everything

# From source (editable install)
git clone https://github.com/ImL1s/dogneo.git
cd dogneo
pip install -e ".[all]"

# Download reference data
dogneo setup

This downloads the CanFam3.1 proteome from Ensembl FTP to ~/.dogneo/data/. DLA alleles are bundled with the package, so no additional download is needed.
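
To inspect the downloaded proteome outside of DogNeo, a small FASTA reader is enough. This is a minimal sketch that assumes the cached file is an uncompressed FASTA at the path shown in the demo log (~/.dogneo/data/CanFam3.1.pep.all.fa); `load_proteome` is a hypothetical helper and not part of the package.

```python
from pathlib import Path


def load_proteome(path: Path) -> dict[str, str]:
    """Parse a protein FASTA file into {record_id: sequence}.

    The record ID is taken to be the first whitespace-delimited token
    of each header line (an assumption about the header layout).
    """
    records: dict[str, str] = {}
    current_id = None
    chunks: list[str] = []
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Store the record collected so far before starting a new one.
                if current_id is not None:
                    records[current_id] = "".join(chunks)
                tokens = line[1:].split()
                current_id = tokens[0] if tokens else ""
                chunks = []
            else:
                chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records
```

A call such as `load_proteome(Path.home() / ".dogneo" / "data" / "CanFam3.1.pep.all.fa")` would then return one entry per protein record, provided `dogneo setup` has already been run.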
# Minimal: auto-detects cached proteome + bundled DLA alleles
dogneo rank --vcf my_somatic_variants.vcf
# Full options
dogneo rank \
--vcf somatic_annotated.vcf \
--expression salmon_quant.sf \
--alleles "DLA-88*001:01,DLA-88*501:01" \
--protein-db CanFam3.1.pep.all.fa \
--output-dir results/ \
--formats tsv,json,fasta,html \
--llm-tier cli

# From existing candidates JSON
dogneo report \
--input results/candidates.json \
--output report.html \
--llm-tier cli  # free, uses Gemini CLI / Claude Code

# After running NetMHCpan on exported FASTA
dogneo rerank \
--candidates results/candidates.json \
--binding netmhcpan_output.tsv \
--output-dir reranked/

# Generate codon-optimized mRNA from top candidates
dogneo design-mrna \
--candidates results/candidates.json \
--top-n 10 \
--output-dir mrna_design/

dogneo ui  # Opens browser at localhost:8501
dogneo ui --demo  # Pre-load demo results

dogneo run --config pipeline_config.yaml

| Command | Description |
|---|---|
| dogneo setup | Download reference data (CanFam3.1 proteome) |
| dogneo demo | Run full pipeline on bundled demo data |
| dogneo rank | Rank neoantigens from a VCF file (auto DLA binding estimation) |
| dogneo rerank | Import external binding results (NetMHCpan/MHCflurry) and re-score |
| dogneo design-mrna | Generate codon-optimized mRNA construct from top candidates |
| dogneo report | Generate HTML/Markdown report |
| dogneo ui | Launch interactive Streamlit dashboard |
| dogneo check-llm | Display status of all LLM backends |
| dogneo version | Show version |
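
For scripted runs over many samples, the rank invocation from the usage examples can be assembled in Python. The sketch below only uses flags that appear in this README; `build_rank_command` is a hypothetical wrapper, and it assumes `dogneo` is on the PATH when the command is eventually executed.

```python
import shlex


def build_rank_command(vcf, expression=None, alleles=None, output_dir=None, formats=None):
    """Assemble an argv list for `dogneo rank` from the flags documented above."""
    argv = ["dogneo", "rank", "--vcf", str(vcf)]
    if expression:
        argv += ["--expression", str(expression)]
    if alleles:
        # The CLI takes a single comma-separated allele string.
        argv += ["--alleles", ",".join(alleles)]
    if output_dir:
        argv += ["--output-dir", str(output_dir)]
    if formats:
        argv += ["--formats", ",".join(formats)]
    return argv


cmd = build_rank_command(
    "somatic_annotated.vcf",
    expression="salmon_quant.sf",
    alleles=["DLA-88*001:01", "DLA-88*501:01"],
    output_dir="results/",
    formats=["tsv", "json", "fasta"],
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
# To execute: subprocess.run(cmd, check=True)
```

Passing the argument list directly to `subprocess.run` avoids a shell, so the `*` in allele names is never treated as a glob pattern.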
The AI report layer is entirely optional: all computational analysis works without any LLM. LLMs are used only to generate narrative interpretations in HTML reports.
| Tier | Backend | Cost | Privacy | Setup |
|---|---|---|---|---|
| 1. CLI | Gemini CLI, Claude Code, Codex | Free | Data sent to cloud | Install any AI CLI tool |
| 2. Local | llama-cpp (GGUF models) | Free | Fully offline | Download a GGUF model |
| 3. Cloud | OpenAI, Anthropic, Google Gemini | Pay per token | Data sent to cloud | Set API key |
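
The tier order in the table (CLI, then Local, then Cloud) amounts to a first-available fallback. The following is a minimal sketch of that idea, not the logic in DogNeo's own router: it assumes the Tier 1 executables are named `gemini`, `claude`, and `codex`, that a local model is identified by a file path, and that cloud backends are detected through the API-key environment variables mentioned in this README. `pick_tier` is a hypothetical name.

```python
import os
import shutil

CLI_TOOLS = ("gemini", "claude", "codex")  # assumed executable names (Tier 1)
CLOUD_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")  # Tier 3


def pick_tier(which=shutil.which, env=None, local_model_path=None):
    """Return (tier, detail) for the first usable backend, or (None, None).

    `which` and `env` are injectable so the function can be tested
    without touching the real PATH or environment.
    """
    env = os.environ if env is None else env
    # Tier 1: any AI CLI tool found on PATH.
    for tool in CLI_TOOLS:
        if which(tool):
            return "cli", tool
    # Tier 2: a local GGUF model file, if one was configured.
    if local_model_path and os.path.isfile(local_model_path):
        return "local", local_model_path
    # Tier 3: any cloud API key that is set and non-empty.
    for key in CLOUD_KEYS:
        if env.get(key):
            return "cloud", key
    return None, None
```

Calling `pick_tier()` with no arguments checks the real PATH and environment; a result of `(None, None)` means reports would be generated without a narrative section.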
$ dogneo check-llm
📡 CLI Backends (Tier 1 - Free):
  ✅ gemini
  ✅ claude
  ✅ codex
💾 Local Backends (Tier 2 - Offline):
  ⚪ No local model configured
☁️ Cloud Backends (Tier 3 - API):
  ❌ OpenAI (OPENAI_API_KEY)
  ❌ Anthropic (ANTHROPIC_API_KEY)

# Optional: only needed if using cloud LLM tier
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=AI...

            ┌──────────────┐
            │  Tumor DNA   │
            │  Normal DNA  │
            │   RNA-seq    │
            └──────┬───────┘
                   │
      ┌────────────┼────────────┐
      ▼            ▼            ▼
┌───────────┐ ┌──────────┐ ┌──────────┐
│  BWA-MEM  │ │   STAR   │ │  Salmon  │
│ alignment │ │ alignment│ │  quant.  │
└─────┬─────┘ └────┬─────┘ └────┬─────┘
      │            │            │
┌─────┴─────┐      │        Expression
│   GATK    │      │           TPMs
│  Mutect2  │      │            │
│ Strelka2  │      │            │
└─────┬─────┘      │            │
      │            │            │
┌─────┴─────┐      │            │
│ VEP/SnpEff│      │            │
│ annotation│      │            │
└─────┬─────┘      │            │
      │            │            │
      ▼            ▼            ▼
  ┌──────────────────────────────────┐
  │        DogNeo Core Engine        │
  │  ┌─────────────────────────────┐ │
  │  │ Variant Parsing & Filtering │ │
  │  │ Mutant Peptide Generation   │ │
  │  │ DLA Binding Prediction      │ │
  │  │ Multi-factor Ranking        │ │
  │  └─────────────────────────────┘ │
  └────────────────┬─────────────────┘
                   │
      ┌────────────┼────────────┐
      ▼            ▼            ▼
  ┌────────┐   ┌────────┐  ┌──────────┐
  │  TSV   │   │  JSON  │  │   HTML   │
  │ FASTA  │   │        │  │  + LLM   │
  └────────┘   └────────┘  └──────────┘
| Step | Tool | Description |
|---|---|---|
| 1 | BWA-MEM / STAR | Sequence alignment (DNA / RNA) |
| 2 | GATK | MarkDuplicates, BQSR |
| 3 | Mutect2 + Strelka2 | Somatic variant calling |
| 4 | VEP / SnpEff | Variant annotation |
| 5 | Salmon / Kallisto | Expression quantification |
| 6 | DogNeo | Mutant peptide windows (8–11 aa for MHC-I, 15–17 aa for MHC-II) |
| 7 | KPR | DLA genotyping from RNA-seq |
| 8 | NetMHCpan / DogNeo estimator | Binding prediction |
| 9 | DogNeo | Multi-factor immunogenicity scoring |
| 10 | DogNeo | TSV / JSON / FASTA / HTML report |
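
Step 5 feeds transcript-level expression into the later scoring steps. As an illustration of that hand-off, the sketch below reads TPM values from a Salmon `quant.sf` table, whose standard columns are Name, Length, EffectiveLength, TPM, and NumReads. The helper names, the transcript IDs in the example, and the 1.0 TPM cutoff are assumptions made for illustration and are not taken from DogNeo.

```python
import csv
import io


def read_salmon_tpm(handle) -> dict[str, float]:
    """Map transcript name -> TPM from a tab-separated Salmon quant.sf table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["TPM"]) for row in reader}


def is_expressed(tpm: dict[str, float], transcript_id: str, min_tpm: float = 1.0) -> bool:
    """True if the transcript meets the (assumed) TPM threshold; missing IDs count as 0."""
    return tpm.get(transcript_id, 0.0) >= min_tpm


# Toy table with made-up transcript IDs and values.
example = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "TRANSCRIPT_A\t1500\t1320.5\t12.4\t310\n"
    "TRANSCRIPT_B\t900\t720.0\t0.2\t3\n"
)
tpm = read_salmon_tpm(example)
```

With a real file, the same function can be called as `read_salmon_tpm(open("salmon_quant.sf", newline=""))`, ideally inside a `with` block.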
dogneo/
├── core/                   # Core computational modules
│   ├── variants.py         # VCF parsing & somatic variant handling
│   ├── peptides.py         # Mutant peptide window generation
│   ├── expression.py       # RNA-seq expression quantification
│   ├── binding.py          # MHC/DLA binding prediction wrappers
│   ├── ranking.py          # Multi-factor neoantigen scoring
│   └── dla_typing.py       # Canine DLA genotyping
├── data/                   # Reference & demo data
│   ├── manager.py          # Auto-download CanFam3.1 from Ensembl FTP
│   └── demo/               # Bundled demo data (VCF, expression, alleles)
├── pipeline/               # Snakemake workflow definitions
│   ├── Snakefile           # Main workflow
│   └── config.yaml         # Pipeline configuration template
├── llm/                    # Pluggable LLM layer (optional)
│   ├── router.py           # 3-tier routing (CLI → Local → Cloud)
│   ├── backends.py         # OpenAI / Anthropic / Gemini backends
│   ├── cli_wrapper.py      # Subprocess wrapper for AI CLIs
│   └── prompts.py          # Prompt templates for neoantigen analysis
├── report/                 # Report generation
│   └── generator.py        # HTML/Markdown report with AI summary
├── export/                 # Output format converters
│   └── exporters.py        # FASTA, TSV, JSON exporters
└── cli.py                  # Click-based CLI entrypoint
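
The tree describes `ranking.py` only as multi-factor neoantigen scoring; the factors and weights are not spelled out here. To make the idea concrete, the sketch below combines three plausible factors into one score. Everything in it is an assumption for illustration: the choice of factors (binding affinity, expression, variant allele fraction), the weights, the 100 TPM saturation point, and the class and function names. The binding term uses the log transform 1 - log(IC50)/log(50000) that is common for NetMHC-style affinity values.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str
    ic50_nm: float  # predicted binding affinity in nM, lower is stronger
    tpm: float      # expression of the source transcript
    vaf: float      # variant allele fraction in the tumor, 0..1


def binding_score(ic50_nm: float) -> float:
    """Map IC50 (nM) to 0..1 via 1 - log(IC50)/log(50000), clipped to [1, 50000]."""
    clipped = min(max(ic50_nm, 1.0), 50000.0)
    return 1.0 - math.log(clipped) / math.log(50000.0)


def expression_score(tpm: float, cap: float = 100.0) -> float:
    """Log-scale TPM into 0..1, saturating at `cap` TPM (assumed cap)."""
    return min(math.log1p(max(tpm, 0.0)) / math.log1p(cap), 1.0)


def composite(c: Candidate, w_bind: float = 0.5, w_expr: float = 0.3, w_vaf: float = 0.2) -> float:
    """Weighted sum of the three factor scores; the weights are illustrative."""
    return (w_bind * binding_score(c.ic50_nm)
            + w_expr * expression_score(c.tpm)
            + w_vaf * c.vaf)


# Toy candidates with placeholder peptides and made-up values.
candidates = [
    Candidate("AAAAAAAAA", ic50_nm=50.0, tpm=30.0, vaf=0.4),
    Candidate("CCCCCCCCC", ic50_nm=5000.0, tpm=30.0, vaf=0.4),
    Candidate("DDDDDDDDD", ic50_nm=50.0, tpm=0.0, vaf=0.4),
]
ranked = sorted(candidates, key=composite, reverse=True)
```

A weighted sum keeps each factor's contribution easy to audit; a real scoring scheme might instead apply hard filters (for example, dropping unexpressed transcripts) before ranking.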
| Resource | Source | Notes |
|---|---|---|
| Canine proteome | Ensembl CanFam3.1 | Auto-downloaded via dogneo setup (~15 MB) |
| DLA alleles | IPD-MHC | 6 DLA-88 Class I alleles, bundled |
| DLA typing | KPR (paper) | RNA-seq based MHC-I genotyping |
| Canine genome | CanFam3.1, GSD_1.0, UMICH_Zoey_3.1 | Multiple assemblies supported |
Contributions are welcome! Please feel free to submit issues and pull requests.
# Development setup
git clone https://github.com/ImL1s/dogneo.git
cd dogneo
pip install -e ".[all]"
python -m pytest tests/ -v  # Run tests (195 passing)

RUO Disclaimer
All outputs are labeled "FOR RESEARCH USE ONLY - NOT FOR CLINICAL OR VETERINARY DIAGNOSTIC USE." This tool does not provide medical advice, diagnoses, or treatment recommendations. Users must obtain appropriate ethical approvals before using this tool with real animal samples.
Patent Notice
Existing patents (e.g., ASU WO2018223094A1) cover canine cancer vaccine methods. This tool performs computational analysis only and does not constitute vaccine design or manufacture.
DogNeo stands on the shoulders of incredible open-source tools:
Upstream: BWA · GATK · Mutect2 · Strelka2 · STAR · Salmon · NetMHCpan · MHCflurry · VEP · SnpEff
Inspired by: pVACtools · OpenVax/Vaxrank · nextNEOpi · MiroFish · Ollama
Motivated by: The story of Paul Conyngham using AI to develop an mRNA treatment for his dog Rosie's cancer, and the realization that accessible tools can accelerate veterinary oncology research.
Made with love for dogs everywhere fighting cancer.