Microbial Community Mechanisms Knowledge Base
A LinkML-based knowledge base for modeling microbial community structure, function, and ecological interactions with evidence-based validation.
Model specific microbial communities as individual YAML files, combining:
- Rich, expressive YAML for agent consumption
- Validated evidence chains (anti-hallucination)
- Ontology-grounded terms (NCBITaxon, ENVO, CHEBI, GO)
- Causal graphs for ecological interactions
- Faceted browser for scientists
- KG export for integration (via Koza)
Adapted from: Monarch Initiative's dismech
Rich YAML Source
kb/communities/Human_Gut_Healthy_Adult.yaml
↓
Validation Stack
├── Schema validation (linkml-validate)
├── Term validation (linkml-term-validator + OAK)
└── Reference validation (snippets vs PubMed)
↓
Dual Output
├── Koza Transform → KGX Edges (for KG stacks)
└── Browser Export → Faceted Search (for scientists)
# Clone repository
git clone https://github.com/CultureBotAI/CommunityMech.git
cd CommunityMech
# Install dependencies (once implemented)
just install
# Or with uv
uv sync --group dev
# Schema validation
just validate kb/communities/Human_Gut_Healthy_Adult.yaml
# Reference validation (prevent hallucination)
just validate-references kb/communities/Human_Gut_Healthy_Adult.yaml
# Term validation (check ontology terms)
just validate-terms
# Export to KG (Koza transform)
just kgx-export
# Generate faceted browser
just gen-browser
# Generate HTML pages
just gen-html-all
Note: The commands above are planned for implementation. See COMMUNITY_MECH_PLAN.md for the development roadmap.
CommunityMech/
├── src/
│   └── communitymech/
│       ├── schema/
│       │   └── communitymech.yaml    # LinkML schema
│       ├── datamodel/                # Generated Python models
│       ├── export/
│       │   ├── kgx_export.py         # Koza transform to KG
│       │   └── browser_export.py     # Faceted browser export
│       ├── render.py                 # HTML page generator
│       └── templates/                # Jinja2 templates
├── kb/
│   └── communities/
│       ├── Human_Gut_Healthy_Adult.yaml
│       ├── Human_Gut_IBD_UC.yaml
│       ├── Soil_Grassland_Temperate.yaml
│       └── ...
├── conf/
│   ├── oak_config.yaml               # OAK ontology adapters
│   └── qc_config.yaml                # QC configuration
├── app/                              # Faceted browser
│   ├── index.html
│   ├── data.js
│   └── schema.js
├── pages/
│   └── communities/                  # Rendered HTML pages
├── tests/
│   └── test_communities.py
├── .claude/
│   └── skills/                       # Claude Code curation skills
├── justfile                          # Task runner
├── pyproject.toml
├── COMMUNITY_MECH_PLAN.md            # Full implementation plan
├── QUICK_START.md                    # Quick reference
└── README.md
- Implementation Plan - Complete 11-phase plan
- Quick Start Guide - Quick reference
- Schema Documentation - LinkML schema reference
- Cross-Repo Linking - Environmental linking to CultureMech and MediaIngredientMech
- Growth Media Linking - Cultivation-based media linking
CommunityMech publishes its community knowledge graph as KGX TSV on every release. The artifact has stable column shapes, deterministic edge IDs, and propagates literature evidence directly into edge metadata so downstream graphs preserve provenance.
- Latest release: GitHub Releases page → nodes.tsv.gz, edges.tsv.gz, and manifest.json attached to each release.
- CI artifacts: every push and release run uploads the same files to the workflow's Artifacts panel (.github/workflows/kgx-release.yaml, ~90-day retention).
- Local build: just kgx-export writes them to output/kgx/, and just kgx-validate runs structural checks.
nodes.tsv columns: id, category, name, description, provided_by
edges.tsv columns: id, subject, predicate, object, category, publications, supporting_text, knowledge_level, agent_type, primary_knowledge_source
Edge id values are deterministic UUID5 derived from (subject, predicate, object, qualifier) — re-running the export with the
same input produces byte-identical IDs.
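The deterministic-ID scheme described above can be sketched with Python's standard uuid module. This is a minimal illustration, not the exporter's actual code: the namespace URL, the tab separator, and the function name edge_id are all assumptions made for the example.

```python
import uuid

# Hypothetical namespace: the real exporter may derive its namespace differently.
EDGE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/communitymech/edges")


def edge_id(subject: str, predicate: str, obj: str, qualifier: str = "") -> str:
    """Return a UUID5 string that depends only on the four-part edge tuple."""
    # Tabs cannot occur inside a TSV cell, so they make an unambiguous separator.
    key = "\t".join((subject, predicate, obj, qualifier))
    return str(uuid.uuid5(EDGE_NAMESPACE, key))


if __name__ == "__main__":
    # "EX:community1" is a placeholder subject ID used only for this demo.
    print(edge_id("EX:community1", "biolink:has_part", "NCBITaxon:853"))
```

Because UUID5 is a hash of the namespace and the key, two runs over the same tuple yield the same ID, while changing any element of the tuple (including the qualifier) yields a different one.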
| Predicate | Subject category | Object category | Edge category |
|---|---|---|---|
| biolink:located_in | biolink:OrganismalEntity (community) | biolink:EnvironmentalFeature (ENVO) | biolink:OrganismToEnvironmentAssociation |
| biolink:has_part | biolink:OrganismalEntity (community) | biolink:OrganismTaxon (NCBITaxon) | biolink:OrganismToOrganismAssociation |
| biolink:related_to | biolink:OrganismalEntity (community) | biolink:ChemicalEntity (CHEBI metals/REE) | biolink:ChemicalEntityToOrganismalEntityAssociation |
| biolink:occurs_in | biolink:OrganismalEntity (community) | growth medium | biolink:Association |
Every edge derived from a community evidence claim carries:
- publications: pipe-separated CURIEs (e.g. PMID:18936492|DOI:10.1099/00207713-50-4-1539)
- supporting_text: pipe-separated verbatim snippets (the same ones that pass MIM's anti-hallucination Phase 1 validator)
- knowledge_level: knowledge_assertion (curator-asserted)
- agent_type: manual_agent
- primary_knowledge_source: infores:communitymech
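As a rough sketch of how a community's evidence list could be folded into these edge fields, the function below joins references and snippets with pipes and fills in the fixed provenance values. The function name, the de-duplication of references, and the replacement of literal pipes inside snippets are assumptions for illustration; the real emitter may handle these cases differently.

```python
def evidence_to_edge_fields(evidence: list) -> dict:
    """Collapse a list of evidence items into the five evidence columns."""
    publications = []
    snippets = []
    for item in evidence:
        reference = item.get("reference")
        if reference and reference not in publications:
            publications.append(reference)  # keep first-seen order, no duplicates
        snippet = item.get("snippet")
        if snippet:
            # Assumption: collapse whitespace and swap literal pipes so the
            # pipe-separated cell stays parseable.
            snippets.append(" ".join(snippet.replace("|", "/").split()))
    return {
        "publications": "|".join(publications),
        "supporting_text": "|".join(snippets),
        "knowledge_level": "knowledge_assertion",
        "agent_type": "manual_agent",
        "primary_knowledge_source": "infores:communitymech",
    }


if __name__ == "__main__":
    fields = evidence_to_edge_fields([
        {"reference": "PMID:18936492", "snippet": "F. prausnitzii represents more than 5%..."},
        {"reference": "DOI:10.1099/00207713-50-4-1539"},
    ])
    print(fields["publications"])
```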
- Column order in both TSVs is fixed; new columns will only be appended at the right.
- Edge IDs are stable across runs unless the underlying (subject, predicate, object, qualifier) tuple changes.
- infores:communitymech is the canonical knowledge-source id.
- Bare element names in metals_present / rare_earth_elements_present are auto-resolved to verified CHEBI atom CURIEs (26 elements covered as of 2026-05); see _ELEMENT_CHEBI in src/communitymech/export/kgx_export.py for the full table.
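The element auto-resolution mentioned in the last guarantee might look roughly like the lookup below. This is only a sketch: the table name ELEMENT_TO_CHEBI and the function resolve_element are hypothetical, the two entries are an illustrative subset believed to be the CHEBI atom entries for iron and zinc, and the authoritative mapping remains _ELEMENT_CHEBI in kgx_export.py.

```python
from typing import Optional

# Illustrative subset only. Verify these CURIEs against CHEBI before relying
# on them; the exporter's own _ELEMENT_CHEBI table is the source of truth.
ELEMENT_TO_CHEBI = {
    "iron": "CHEBI:18248",
    "zinc": "CHEBI:27363",
}


def resolve_element(value: str) -> Optional[str]:
    """Map a bare element name to a CHEBI CURIE, passing existing CURIEs through."""
    text = value.strip()
    if ":" in text:
        return text  # already looks like a CURIE, leave untouched
    return ELEMENT_TO_CHEBI.get(text.lower())  # None when the element is unknown


if __name__ == "__main__":
    for name in ("Iron", "zinc", "CHEBI:27363", "unobtainium"):
        print(name, "->", resolve_element(name))
```

Returning None for unknown names, rather than guessing, lets the caller decide whether to skip the edge or report the value for curation.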
The export is implemented as a custom Python emitter (no Koza
dependency for a 78-record / ~580-edge scale). Schema, evidence
slots, and edge selection are documented in:
../../culturebotai-claw/docs/proposals/phase3_communitymech_kgx_export_with_publications.md.
name: Human Gut Healthy Adult
ecological_state: HEALTHY
environment_term:
  preferred_term: human gut
  term:
    id: ENVO:0001998
    label: human gut environment
taxonomy:
  - taxon_term:
      preferred_term: Faecalibacterium prausnitzii
      term:
        id: NCBITaxon:853
        label: Faecalibacterium prausnitzii
    abundance_level: ABUNDANT
    functional_role: [KEYSTONE, CORE]
    evidence:
      - reference: PMID:18936492
        supports: SUPPORT
        snippet: "F. prausnitzii represents more than 5%..."
ecological_interactions:
  - name: Butyrate Production
    source_taxon:
      preferred_term: Faecalibacterium prausnitzii
      term:
        id: NCBITaxon:853
    metabolites:
      - preferred_term: butyrate
        term:
          id: CHEBI:30089
    downstream:
      - target: Host Colonocyte Energy
    evidence:
      - reference: PMID:18936492

- Every claim backed by PMID references
- Snippets validated against PubMed abstracts
- Prevents AI hallucination
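The snippet check can be pictured as a normalized substring match between the curated snippet and the abstract text. The sketch below assumes the abstract has already been fetched and is passed in as a string; the function names, the normalization rules, and the treatment of "..." as elided text are assumptions, not the validator's documented behavior.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace so line wrapping does not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def snippet_supported(snippet: str, abstract: str) -> bool:
    """True if every fragment of the snippet occurs, in order, in the abstract."""
    haystack = _normalize(abstract)
    # Treat "..." (or the single-character ellipsis) as a marker for elided text.
    parts = re.split(r"\.{3}|…", snippet)
    fragments = [f for f in (_normalize(p) for p in parts) if f]
    if not fragments:
        return False  # an empty snippet supports nothing
    position = 0
    for fragment in fragments:
        index = haystack.find(fragment, position)
        if index == -1:
            return False
        position = index + len(fragment)
    return True


if __name__ == "__main__":
    # Toy text for the demo, not a real PubMed abstract.
    toy_abstract = "In this toy abstract, species X represents more than 5% of the community."
    print(snippet_supported("species X represents more than 5%...", toy_abstract))
```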
- NCBITaxon - Microbial taxa
- ENVO - Environments
- CHEBI - Chemical entities/metabolites
- GO - Biological processes
- UBERON - Host anatomy
- Model ecological interactions as directed graphs
- Represent cross-feeding, competition, mutualism
- Visualize with D3.js/Cytoscape
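One way to view a community record as a directed graph is to link each source taxon to its interaction and each interaction to its downstream targets. The sketch below works on a plain dict shaped like the YAML example in this README; the edge direction (taxon to interaction to target) and the function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def build_interaction_graph(community: dict) -> dict:
    """Return an adjacency list: taxon -> interaction -> downstream target."""
    graph = defaultdict(list)
    for interaction in community.get("ecological_interactions", []):
        name = interaction["name"]
        source = interaction.get("source_taxon", {}).get("preferred_term")
        if source:
            graph[source].append(name)
        for step in interaction.get("downstream", []):
            graph[name].append(step["target"])
    return dict(graph)


def reachable(graph: dict, start: str) -> set:
    """Collect every node reachable from start with an iterative depth-first walk."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


if __name__ == "__main__":
    community = {
        "ecological_interactions": [{
            "name": "Butyrate Production",
            "source_taxon": {"preferred_term": "Faecalibacterium prausnitzii"},
            "downstream": [{"target": "Host Colonocyte Energy"}],
        }]
    }
    graph = build_interaction_graph(community)
    print(reachable(graph, "Faecalibacterium prausnitzii"))
```

An adjacency list like this can be handed to a graph library or serialized as nodes and links for a D3.js or Cytoscape view.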
- Rich YAML - For agent consumption (full context)
- Simple KG - For graph algorithms (Biolink edges)
- Environmental linking to CultureMech media and MediaIngredientMech ingredients
- Environment-based discovery via shared ENVO terms
- related_media for environmentally relevant media (complements growth_media)
- related_ingredients for environmentally significant compounds
- SPARQL query patterns for cross-repo joins
- Faceted browser (no coding required)
- Click-through to evidence
- Interactive visualizations
Current Phase: Foundation (Sprint 1)
- Repository created
- Implementation plan documented
- Reference implementation analyzed (Monarch dismech)
- Seed data added (35 communities)
- Schema design (LinkML schema with full validation)
- First example community YAML
- Cross-repo environmental linking (CultureMech + MediaIngredientMech)
- Validation stack setup
- Koza transform implementation
- Faceted browser adaptation
- HTML rendering
See COMMUNITY_MECH_PLAN.md for the complete roadmap.
This is a private repository during initial development. Once the core infrastructure is in place, we'll open it up for community contributions.
- Create a new branch for your feature/community
- Add/modify community YAML files
- Run validation: just qc
- Commit with evidence validation
- Create PR for review
# Schema validation
just validate kb/communities/YourCommunity.yaml
# Reference validation (anti-hallucination)
just validate-references kb/communities/YourCommunity.yaml
# Term validation (ontology checking)
just validate-terms-file kb/communities/YourCommunity.yaml
# Full QC
just qc
(TBD once published)
Based on the dismech framework:
- Monarch Initiative. (2024). dismech: Disorder Mechanisms Knowledge Base. https://github.com/monarch-initiative/dismech
BSD-3-Clause (matching the dismech framework)
- dismech - Disease mechanisms KB (inspiration)
- LinkML - Modeling framework
- Koza - KG transformation tool
- BugSigDB - Microbial signatures database
- OAK - Ontology access toolkit
For questions or collaboration inquiries, please open an issue.
Status: 🚧 Under active development