deep-research

Protocol-first, gated, multi-agent literature investigation. Every claim a synthesis ships is backed by a row in research_evidence with a DOI and a verbatim quote span. If a claim can't produce its quote, the claim gets cut.

That is the entire point.

Why this exists

LLM-generated literature reviews fail in a specific way: they read beautifully and contain claims that no actual paper makes. The drift is usually paraphrase, occasionally fabrication, and almost never caught by the author of the prompt.

This repo is a workflow that makes the failure mode impossible by construction:

  1. Protocol pre-registration — search strategy, inclusion criteria, effect measure, and analysis plan are written and approved BEFORE any search runs. Drift after this point is logged, not absorbed.
  2. Three concurrent search-oriented roles in Pass-1 — a Scout for coverage, a Skeptic actively hunting refutation, a Methodologist grading design. The Synthesizer doesn't see the corpus yet.
  3. A human spend gate between cheap Pass-1 (abstracts) and expensive Pass-2 (full text). The corpus the human approves is the corpus the Synthesizer gets.
  4. A Synthesizer with hard rules: every claim cites a row, every cited row has a quote_span, every numeric value matches verbatim. If the corpus doesn't support a claim, the doc says so explicitly.
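
The hard rules in item 4 lend themselves to a mechanical check. The following is a minimal sketch of such a check, not the repo's implementation: `Evidence`, `Claim`, `source_text`, and `check_claim` are hypothetical names invented for illustration, and only the DOI and `quote_span` requirements come from the text above.

```python
import re
from dataclasses import dataclass

# Hypothetical shapes for illustration; the real columns live in schema/schema.sql.
@dataclass(frozen=True)
class Evidence:
    row_id: int
    doi: str
    quote_span: str   # verbatim quote taken from the paper
    source_text: str  # extracted text the quote must appear in

@dataclass(frozen=True)
class Claim:
    text: str
    evidence_ids: tuple

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def check_claim(claim, evidence_by_id):
    """Return a list of rule violations; an empty list means the claim may ship."""
    problems = []
    if not claim.evidence_ids:
        problems.append("claim cites no evidence row")
    quotes = []
    for row_id in claim.evidence_ids:
        row = evidence_by_id.get(row_id)
        if row is None:
            problems.append(f"row {row_id} does not exist")
            continue
        if not row.doi:
            problems.append(f"row {row_id} has no DOI")
        if not row.quote_span:
            problems.append(f"row {row_id} has no quote_span")
        elif row.quote_span not in row.source_text:
            problems.append(f"row {row_id} quote is not verbatim")
        else:
            quotes.append(row.quote_span)
    # Every number in the claim must appear, character for character,
    # as a number in at least one cited quote.
    quoted_numbers = {n for q in quotes for n in NUMBER.findall(q)}
    for number in NUMBER.findall(claim.text):
        if number not in quoted_numbers:
            problems.append(f"number {number} is not in any cited quote")
    return problems
```

A claim with a non-empty result would be cut or rewritten rather than shipped. String matching of this kind catches missing quotes and altered numbers; it does not judge whether a paraphrase is faithful, which is what the adversarial re-read is for.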

The output is a structured document with native heading hierarchy, effect-size table, forest plot, blockquoted citations, and a full reference list. It is meant to be defensible under adversarial re-read.

What ships in v0.2.0

| Path | What's in it |
| --- | --- |
| SKILL.md | The full protocol, agent-runtime-agnostic. |
| schema/schema.sql | PostgreSQL schema (three append-only tables). |
| schema/schema_sqlite.sql | SQLite equivalent for local dev. |
| agents/scout.md | Pass-1 broad-recall role prompt. |
| agents/skeptic.md | Pass-1 refutation-hunting role prompt. |
| agents/methodologist.md | Pass-1 design-grading role prompt. |
| agents/synthesizer.md | Phase 4 no-fabrication role prompt. |
| agents/critic.md | Continuation mode critique role prompt. |
| lib/scholar.py | stdlib-urllib adapter over OpenAlex, Semantic Scholar, PubMed, arXiv, Europe PMC, Crossref, and Unpaywall. |
| lib/synthesis_doc_builder.py | python-docx + matplotlib helper that renders the synthesis doc (forest plot, PRISMA flow, stance heat-table) with a pluggable upload callback. |
| manifests/deep-research.v0.4.json | install-manifest-spec v0.4 declaration. |
| examples/cholesterol_primary_prevention/ | A real run, end to end. |
| tests/ | stdlib unittest smoke tests. Network-free. |
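
For readers wondering what "append-only" can look like in practice, here is a minimal sketch of one way to enforce it in SQLite with triggers. It is an illustration only: the column list is a simplified, hypothetical stand-in, `open_db` is an invented helper, and the shipped schema/schema_sqlite.sql may enforce the property differently.

```python
import sqlite3

# Hypothetical, simplified table; schema/schema_sqlite.sql is authoritative.
DDL = """
CREATE TABLE research_evidence (
    id INTEGER PRIMARY KEY,
    doi TEXT NOT NULL,
    quote_span TEXT NOT NULL
);
CREATE TRIGGER research_evidence_no_update
BEFORE UPDATE ON research_evidence
BEGIN
    SELECT RAISE(ABORT, 'research_evidence is append-only');
END;
CREATE TRIGGER research_evidence_no_delete
BEFORE DELETE ON research_evidence
BEGIN
    SELECT RAISE(ABORT, 'research_evidence is append-only');
END;
"""

def open_db(path=":memory:"):
    """Open a database and install the table plus its append-only triggers."""
    conn = sqlite3.connect(path)
    conn.executescript(DDL)
    return conn
```

With these triggers in place, INSERT succeeds while UPDATE and DELETE raise a `sqlite3.DatabaseError`, so a correction has to be recorded as a new row instead of an overwrite.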

What's new in v0.2.0

  • lib/scholar.py: six free academic APIs behind one normalized hit schema. stdlib only. Configure the polite-pool contact email at install time via scholar.configure(contact_email=...) or the SCHOLAR_CONTACT_EMAIL env var. Embedding-based dedup in multi_search is pluggable: wire your runtime's embeddings provider via scholar.set_embedding_deduper(fn), or leave it unregistered and fall back to DOI / title-hash dedup.
  • lib/synthesis_doc_builder.py: decoupled upload via dependency injection. Pass an Uploader callable ((local_path, name, mime_type) -> dict) and the helper hands your uploader the local .docx; with no uploader there is no upload and you keep the file. matplotlib + python-docx are soft deps gated behind the [viz] extra.
  • pyproject.toml: installable package. Core is stdlib-only.
  • Tests: tests/test_scholar_smoke.py and tests/test_synthesis_doc_builder_smoke.py. Run with python -m unittest discover tests.

A working reference of both modules also lives upstream in the Yep agent codebase.

Agent-runtime compatibility

The protocol is runtime-agnostic. The reference implementation uses the Claude Agent SDK, but the workflow only requires:

| Capability | What's needed |
| --- | --- |
| Subagent fan-out | Spawn 3 sibling agents in parallel with isolated contexts. |
| Persistent KV | Read/write the three research_* tables. PostgreSQL or SQLite. |
| Scholarly HTTP | Reach OpenAlex / Semantic Scholar / PubMed / arXiv / Crossref / Unpaywall. PDF fetch + extract-to-text. |
| Human-in-loop | Two checkpoints where execution blocks until the user approves. |
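
The human-in-loop requirement amounts to a blocking call that only returns on an explicit decision. A minimal sketch, assuming a terminal prompt is an acceptable channel; `Decision` and `run_gate` are hypothetical names, and a real runtime would likely route this through its own approval UI.

```python
from enum import Enum

class Decision(Enum):
    APPROVE = "approve"
    REVISE = "revise"
    ABORT = "abort"

def run_gate(summary, ask=input):
    """Block until a human gives a recognized decision.

    `ask` is injectable so the gate can be driven by something other than stdin.
    """
    while True:
        answer = ask(f"{summary}\napprove / revise / abort? ").strip().lower()
        try:
            return Decision(answer)
        except ValueError:
            # Unrecognized input never counts as approval; ask again.
            continue
```

The design choice worth keeping from this sketch is that there is no default: silence, a typo, or "yes" does not open the gate.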

Pipe the agents/*.md prompts through whatever orchestration layer your agent runtime provides.
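
As a rough picture of that wiring, the sketch below fans the three Pass-1 prompts out concurrently with the standard library. It assumes a caller-supplied `run_agent` callable with a hypothetical keyword signature; `fan_out_pass1` is an invented helper, and threads here only stand in for whatever context isolation your runtime actually provides.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

PASS1_ROLES = ("scout", "skeptic", "methodologist")

def fan_out_pass1(protocol, run_agent, prompt_dir="agents"):
    """Run the three Pass-1 roles concurrently, each with only its own prompt."""
    def run_role(role):
        prompt = Path(prompt_dir, f"{role}.md").read_text(encoding="utf-8")
        # run_agent comes from the caller's runtime (hypothetical signature).
        return role, run_agent(role=role, system_prompt=prompt, protocol=protocol)

    with ThreadPoolExecutor(max_workers=len(PASS1_ROLES)) as pool:
        # pool.map preserves input order and re-raises any worker exception.
        return dict(pool.map(run_role, PASS1_ROLES))
```

The Synthesizer is deliberately absent from `PASS1_ROLES`, matching the rule that it does not see the corpus during Pass-1.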

Quickstart

  1. Install (the [viz] extra pulls python-docx + matplotlib + numpy for the synthesis doc builder; omit it if you only need the protocol + scholar adapter):

    pip install "deep-research[viz]"
    # or, from a clone:
    pip install -e ".[viz]"
  2. Pick your database:

    # PostgreSQL
    psql "$DATABASE_URL" < schema/schema.sql
    
    # or SQLite (local dev)
    sqlite3 deep_research.db < schema/schema_sqlite.sql
  3. Read SKILL.md end to end. The protocol is short. The gates are not optional.

  4. Pick a question. Draft the protocol JSON per the schema in SKILL.md. Save it. Walk through Gate 1 with a human.

  5. On approval, fan out the three Pass-1 subagents with the prompts in agents/. Each writes its research_searches audit rows and research_evidence candidate rows. Wire lib.scholar.scholar(...) in for the search calls.

  6. Roll up the corpus. Walk through Gate 2. Approve, revise, or abort.

  7. On approval, run Pass-2 retrieval. Stage extracted text.

  8. Run the Synthesizer. Read the hard-rules block at the top of agents/synthesizer.md first. The Synthesizer's job is to NOT make anything up; that job is harder than it sounds. Use lib.synthesis_doc_builder.build_synthesis_doc(inputs, uploader=...) for the artifact — pass your own uploader for Drive/S3 wiring, or omit it to keep the local .docx.
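
For step 8, an uploader is any callable taking `(local_path, name, mime_type)` and returning a dict. The sketch below is a stand-in that copies the file to a local directory instead of Drive or S3; `make_local_uploader` is a hypothetical helper, and the keys of the returned dict are an assumption, since the text above does not say what the builder expects back.

```python
import shutil
from pathlib import Path

def make_local_uploader(dest_dir):
    """Build an uploader that copies the .docx into dest_dir (stand-in for Drive/S3)."""
    dest = Path(dest_dir)

    def upload(local_path, name, mime_type):
        dest.mkdir(parents=True, exist_ok=True)
        target = dest / name
        shutil.copyfile(local_path, target)
        # These keys are an assumption; check the builder's docstring
        # for what it expects an uploader to return.
        return {"name": name, "mime_type": mime_type, "location": str(target)}

    return upload
```

Following the call shape given in step 8, this would be passed as `build_synthesis_doc(inputs, uploader=make_local_uploader("out"))`; swapping in a cloud client means replacing only the body of `upload`.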

Worked example: cholesterol primary prevention

See examples/cholesterol_primary_prevention/.

This was a real run of the protocol on the question:

Does pharmacological LDL-lowering reduce all-cause mortality in strict primary prevention (no prior cardiovascular events, no known cardiovascular disease)?

The example ships the approved protocol, the run log (subagent counts per phase, retraction sweep result, top-graded candidates), and the synthesis document with all citations intact.

Continuation modes

Once a project is complete, the user can revisit it under one of four modes without spinning up a fresh project:

  • refresh — pull literature since the last cutoff.
  • deepen — same question + corpus; retry paywalled rows.
  • rescope — same corpus, new sub-question.
  • critique — adversarial re-read of the synthesis against its quotes.

Lineage is preserved. The prior synthesis stays as an immutable record. See the "Continuing a prior project" section in SKILL.md.
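
One way to picture lineage is a new immutable record that points at its parent. The sketch below is illustrative only: `Run`, `Mode`, and `continue_run` are hypothetical names, and the rule that only rescope changes the question is my reading of the mode list above, not something SKILL.md is quoted as saying.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Mode(Enum):
    REFRESH = "refresh"
    DEEPEN = "deepen"
    RESCOPE = "rescope"
    CRITIQUE = "critique"

@dataclass(frozen=True)  # frozen: a finished run cannot be edited in place
class Run:
    run_id: str
    question: str
    parent_id: Optional[str] = None
    mode: Optional[Mode] = None

def continue_run(prior, mode, new_id, question=None):
    """Create a child run that references `prior` instead of modifying it."""
    if question is not None and mode is not Mode.RESCOPE:
        raise ValueError("only rescope changes the question")
    return Run(new_id, question or prior.question, parent_id=prior.run_id, mode=mode)
```

Walking `parent_id` links back from any run recovers the full history of a project without touching earlier syntheses.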

License

Apache-2.0. See LICENSE.

Author

Maintained by Dimitri T (@drknowhow). This protocol was extracted from the Yep agent and generalized for reuse by other agent owners.
