Merged
72 changes: 72 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,72 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.2.0] - 2026-06-02

### Added
- `lib/scholar.py` — stdlib-urllib adapter over OpenAlex, Semantic Scholar,
PubMed, arXiv, Europe PMC, Crossref, and Unpaywall. Five actions:
`search`, `multi_search`, `get`, `find_doi`, `resolve_oa`. Returns a
uniform normalized hit schema across all sources.
- `lib/synthesis_doc_builder.py` — `python-docx` + `matplotlib` helper
that renders the synthesis doc (forest plot, PRISMA flow, stance
heat-table) with a pluggable upload callback.
- `pyproject.toml` — installable package. Core is stdlib-only; the
`[viz]` extra adds `python-docx`, `matplotlib`, and `numpy` for the
synthesis doc builder.
- `tests/test_scholar_smoke.py` — stdlib `unittest` coverage with
monkeypatched `urllib.request.urlopen`. Network-free.
- `tests/test_synthesis_doc_builder_smoke.py` — stdlib `unittest`
coverage; tests that require `python-docx` or `matplotlib` skip
cleanly when those soft deps are absent.

### Changed
- `lib/scholar.py` exposes `configure(contact_email, app_name)` and
reads the `SCHOLAR_CONTACT_EMAIL` environment variable at import time.
Without a contact email, polite-pool `mailto=` params are omitted and a
generic User-Agent is sent. Embedding-based dedup in `multi_search` is now
pluggable via `set_embedding_deduper(fn)`; without a registered deduper,
embedding modes degrade to a no-op.
- `lib/synthesis_doc_builder.py` decouples Drive/upload concerns
behind an `Uploader = Callable[[Path, str, str], dict]` parameter
on `build_synthesis_doc`. Without an uploader, the helper returns
the local `.docx` path and reports `uploaded=False`.
- `manifests/deep-research.v0.4.json` bumps `tool.version` to `0.2.0`.
- `SKILL.md` drops the v0.1 hedge — `lib/synthesis_doc_builder.py`
is now part of the shipped reference implementation.
- `README.md` "What ships" tables updated for v0.2.0; "Quickstart"
shows the new `pip install "deep-research[viz]"` form.

### Notes
- No runtime breaking changes for skill-bundle consumers — the SKILL
protocol surface is unchanged. The library additions are reference
code that orchestrators can wire into their own runtime.

## [0.1.1] - 2026-06-01

### Fixed
- Redacted author surname from `LICENSE`, `README.md`, and the manifest
to comply with the project's public_identity rule.
- Removed `$schema` property from `manifests/deep-research.v0.4.json`
to satisfy `additionalProperties: false` in the v0.4 manifest schema.
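Dropping `$schema` fixes validation because, with `additionalProperties: false`, any top-level key not listed under `properties` fails. The helper below is a simplified, hypothetical check: it looks at top-level keys only and ignores `patternProperties` and nested objects. The v0.4 manifest schema itself is not shown here, so any property names you test it with are your own.

```python
from typing import Any, Dict, List


def extra_properties(schema: Dict[str, Any], instance: Dict[str, Any]) -> List[str]:
    """Top-level keys that `additionalProperties: false` would reject.

    Simplified: ignores patternProperties, $ref, and nested objects.
    """
    if schema.get("additionalProperties", True) is not False:
        return []  # extra keys are allowed, so nothing is rejected
    allowed = set(schema.get("properties", {}))
    return sorted(key for key in instance if key not in allowed)
```

For full validation, use a real JSON Schema validator. This sketch only shows why an undeclared `$schema` key trips the rule.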

## [0.1.0] - 2026-06-01

### Added
- Initial public release.
- `SKILL.md`: full protocol, agent-runtime-agnostic.
- `schema/schema.sql` + `schema/schema_sqlite.sql`: three append-only
research tables.
- `agents/{scout,skeptic,methodologist,synthesizer,critic}.md`: role
prompts as plain text.
- `manifests/deep-research.v0.4.json`: install-manifest-spec v0.4
declaration.
- `examples/cholesterol_primary_prevention/`: a real run, end to end.

[0.2.0]: https://github.com/drknowhow/deep-research/releases/tag/v0.2.0
[0.1.1]: https://github.com/drknowhow/deep-research/releases/tag/v0.1.1
[0.1.0]: https://github.com/drknowhow/deep-research/releases/tag/v0.1.0
70 changes: 48 additions & 22 deletions README.md
@@ -34,7 +34,7 @@ The output is a structured document with native heading hierarchy,
effect-size table, forest plot, blockquoted citations, and a full
reference list. It is meant to be defensible under adversarial re-read.

## What ships in v0.1.0
## What ships in v0.2.0

| Path | What's in it |
|---|---|
@@ -46,20 +46,33 @@ reference list. It is meant to be defensible under adversarial re-read.
| `agents/methodologist.md` | Pass-1 design-grading role prompt. |
| `agents/synthesizer.md` | Phase 4 no-fabrication role prompt. |
| `agents/critic.md` | Continuation mode `critique` role prompt. |
| `lib/scholar.py` | stdlib-urllib adapter over OpenAlex, Semantic Scholar, PubMed, arXiv, Europe PMC, Crossref, and Unpaywall. |
| `lib/synthesis_doc_builder.py` | `python-docx` + `matplotlib` helper that renders the synthesis doc (forest plot, PRISMA flow, stance heat-table) with a pluggable upload callback. |
| `manifests/deep-research.v0.4.json` | install-manifest-spec v0.4 declaration. |
| `examples/cholesterol_primary_prevention/` | A real run, end to end. |

## What's NOT in v0.1.0 (lands in v0.2.0)

- `lib/scholar.py` — a stdlib-urllib adapter over OpenAlex, Semantic
Scholar, PubMed, arXiv, Europe PMC, Crossref, and Unpaywall.
- `lib/synthesis_doc_builder.py` — `python-docx` + `matplotlib` helper
that renders the synthesis doc (forest plot, PRISMA flow, stance heat-table)
with a pluggable upload callback.

A working reference implementation of both lives upstream in the Yep agent
codebase. Pointers in the SKILL.md text. v0.2.0 will extract them with
their agent-runtime couplings stripped.
| `tests/` | stdlib `unittest` smoke tests. Network-free. |

## What's new in v0.2.0

- **`lib/scholar.py`** — seven free academic APIs behind one normalized hit
schema. stdlib only. Configure the polite-pool contact email at install
time via `scholar.configure(contact_email=...)` or the
`SCHOLAR_CONTACT_EMAIL` env var. Embedding-based dedup in `multi_search`
is pluggable — wire your runtime's embeddings provider via
`scholar.set_embedding_deduper(fn)` or leave it unregistered and
fall back to DOI / title-hash dedup.
- **`lib/synthesis_doc_builder.py`** — decoupled upload via dependency
injection. Pass an `Uploader` callable
(`(local_path, name, mime_type) -> dict`) and the helper hands it the
finished local `.docx`; without an uploader, nothing is uploaded and you keep the file.
matplotlib + python-docx are soft deps gated behind the `[viz]` extra.
- **`pyproject.toml`** — installable package. Core is stdlib-only.
- **Tests** — `tests/test_scholar_smoke.py` and
`tests/test_synthesis_doc_builder_smoke.py`. Run with
`python -m unittest discover tests`.
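The fallback dedup path described above can be sketched as follows, assuming hits are dicts with optional `doi` and `title` keys. Only the name `set_embedding_deduper` comes from the text. `dedup`, `_title_hash`, and the deduper signature (a list of hits in, a filtered list out) are hypothetical, and the real `lib/scholar.py` may normalize differently.

```python
import hashlib
import re
from typing import Callable, Dict, List, Optional, Set, Tuple

Hit = Dict[str, Optional[str]]
# Assumed deduper shape: receives the exact-deduped hits, returns the ones to keep.
EmbeddingDeduper = Callable[[List[Hit]], List[Hit]]

_EMBEDDING_DEDUPER: Optional[EmbeddingDeduper] = None


def set_embedding_deduper(fn: Optional[EmbeddingDeduper]) -> None:
    """Register (or clear, with None) the optional embedding-based pass."""
    global _EMBEDDING_DEDUPER
    _EMBEDDING_DEDUPER = fn


def _title_hash(title: str) -> Optional[str]:
    # Case-, punctuation-, and whitespace-insensitive fingerprint; None if nothing is left.
    normalized = re.sub(r"[\W_]+", " ", title.casefold()).strip()
    if not normalized:
        return None
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def _keys(hit: Hit) -> Set[Tuple[str, str]]:
    keys: Set[Tuple[str, str]] = set()
    doi = (hit.get("doi") or "").strip().lower()
    if doi:
        keys.add(("doi", doi))
    title_hash = _title_hash(hit.get("title") or "")
    if title_hash:
        keys.add(("title", title_hash))
    return keys


def dedup(hits: List[Hit]) -> List[Hit]:
    """Drop hits sharing a DOI or title hash with an earlier hit, then
    apply the registered embedding deduper, if any."""
    seen: Set[Tuple[str, str]] = set()
    unique: List[Hit] = []
    for hit in hits:
        keys = _keys(hit)
        if keys & seen:
            continue  # duplicate of a hit already kept
        seen |= keys
        unique.append(hit)
    if _EMBEDDING_DEDUPER is not None:
        unique = _EMBEDDING_DEDUPER(unique)
    return unique
```

One design consequence: matching on title alone merges distinct papers that share a generic title, such as two unrelated editorials. A real implementation may want to restrict title-hash matching to hits that lack a DOI.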

A working reference of both modules also lives upstream in the
[Yep agent](https://yepgent.com) codebase.

## Agent-runtime compatibility

@@ -78,7 +91,16 @@ agent runtime provides.

## Quickstart

1. Pick your database:
1. Install (the `[viz]` extra pulls `python-docx` + `matplotlib` + `numpy`
for the synthesis doc builder; omit it if you only need the protocol +
scholar adapter):
```bash
pip install "deep-research[viz]"
# or, from a clone:
pip install -e ".[viz]"
```

2. Pick your database:
```bash
# PostgreSQL
psql "$DATABASE_URL" < schema/schema.sql
@@ -87,23 +109,27 @@ agent runtime provides.
sqlite3 deep_research.db < schema/schema_sqlite.sql
```

2. Read `SKILL.md` end to end. The protocol is short. The gates are not
3. Read `SKILL.md` end to end. The protocol is short. The gates are not
optional.

3. Pick a question. Draft the protocol JSON per the schema in `SKILL.md`.
4. Pick a question. Draft the protocol JSON per the schema in `SKILL.md`.
Save it. Walk through Gate 1 with a human.

4. On approval, fan out the three Pass-1 subagents with the prompts in
5. On approval, fan out the three Pass-1 subagents with the prompts in
`agents/`. Each writes its `research_searches` audit rows and
`research_evidence` candidate rows.
`research_evidence` candidate rows. Route the search calls through
`lib.scholar.scholar(...)`.

5. Roll up the corpus. Walk through Gate 2. Approve, revise, or abort.
6. Roll up the corpus. Walk through Gate 2. Approve, revise, or abort.

6. On approval, run Pass-2 retrieval. Stage extracted text.
7. On approval, run Pass-2 retrieval. Stage extracted text.

7. Run the Synthesizer. Read the hard-rules block at the top of
8. Run the Synthesizer. Read the hard-rules block at the top of
`agents/synthesizer.md` first. The Synthesizer's job is to NOT make
anything up; that job is harder than it sounds.
anything up; that job is harder than it sounds. Use
`lib.synthesis_doc_builder.build_synthesis_doc(inputs, uploader=...)`
for the artifact — pass your own uploader for Drive/S3 wiring, or
omit it to keep the local `.docx`.
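The Synthesizer's no-fabrication rule, together with the "verbatim quote spans" named in the `SKILL.md` description, implies a mechanical check: every quoted span must appear word for word in the staged text of the evidence it cites. The sketch below assumes claims are dicts with hypothetical `id`, `evidence_id`, and `quote` keys and that staged text is keyed by evidence id. Whitespace is collapsed before matching because text extraction often breaks lines mid-sentence.

```python
import re
from typing import Dict, List, Optional, Tuple


def _collapse_ws(text: str) -> str:
    # Treat any run of spaces/newlines/tabs as a single space.
    return re.sub(r"\s+", " ", text).strip()


def find_quote_span(quote: str, staged_text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of `quote` in the whitespace-collapsed staged text, or None."""
    haystack = _collapse_ws(staged_text)
    needle = _collapse_ws(quote)
    if not needle:
        return None
    start = haystack.find(needle)
    if start == -1:
        return None
    return start, start + len(needle)


def check_citations(claims: List[Dict[str, str]], staged: Dict[str, str]) -> List[str]:
    """Hypothetical gate: ids of claims whose quote is not verbatim in its cited source."""
    failures = []
    for claim in claims:
        source_text = staged.get(claim["evidence_id"], "")  # missing source counts as a failure
        if find_quote_span(claim["quote"], source_text) is None:
            failures.append(claim["id"])
    return failures
```

The offsets refer to the whitespace-collapsed text, not to the raw staged text. Mapping them back to raw offsets would need an offset table built during collapsing.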

## Worked example: cholesterol primary prevention

11 changes: 7 additions & 4 deletions SKILL.md
@@ -1,7 +1,7 @@
---
name: deep_research
description: Protocol-first, gated, multi-agent literature investigation. Use when a question warrants citation-grade synthesis with traceable provenance — "what does the evidence say about X", "meta-analysis on Y", "systematic review of Z". Two human gates, four-role crew, no-fabrication enforced via verbatim quote spans.
version: 0.1.0
version: 0.2.0
license: Apache-2.0
homepage: https://github.com/drknowhow/deep-research
---
@@ -221,9 +221,12 @@ The synthesis ships as a structured document with:
`— Author Year, DOI:...` in italic.
- References list with full DOI per row.

`lib/synthesis_doc_builder.py` (when ported in v0.2) produces a `.docx` with
all of the above via `python-docx` + `matplotlib`. v0.1.0 ships the data
model and rendering spec; orchestrators wire in their own document builder.
`lib/synthesis_doc_builder.py` produces a `.docx` with all of the above
via `python-docx` + `matplotlib`. Install the optional `[viz]` extra
(`pip install "deep-research[viz]"`) and pass an `Uploader` callable —
`(local_path, name, mime_type) -> {"doc_id": ..., "web_url": ...}` —
to wire upload into your runtime. Without an uploader, the helper
returns the local `.docx` path and lets you ship it yourself.
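Below is a hypothetical uploader matching the `(local_path, name, mime_type) -> dict` shape above and returning the `doc_id` / `web_url` keys mentioned there. It "publishes" by copying into a local folder so it runs anywhere. `make_folder_uploader` and the file-naming scheme are assumptions; a Drive or S3 backend would replace the copy with its own client call.

```python
import shutil
import uuid
from pathlib import Path
from typing import Callable, Dict

# Mirrors the callable shape described above: (local_path, name, mime_type) -> dict.
Uploader = Callable[[Path, str, str], Dict[str, str]]

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def make_folder_uploader(outbox: Path) -> Uploader:
    """Hypothetical uploader that 'publishes' by copying into a local folder."""
    outbox.mkdir(parents=True, exist_ok=True)

    def upload(local_path: Path, name: str, mime_type: str) -> Dict[str, str]:
        # A real backend would send mime_type as the object's Content-Type.
        doc_id = uuid.uuid4().hex
        target = outbox / f"{doc_id}-{name}"
        shutil.copyfile(local_path, target)
        return {"doc_id": doc_id, "web_url": target.resolve().as_uri()}

    return upload
```

Under those assumptions it would be passed as `build_synthesis_doc(inputs, uploader=make_folder_uploader(Path("published")))`. The closure keeps backend configuration (here, just the folder) out of the call signature the builder sees.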

UPDATE `research_evidence` rows the synthesizer cites to set
`supports_or_refutes` to the correct stance.
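Here is a sketch of that UPDATE step with the stdlib `sqlite3` module. The minimal table below is hypothetical: the real columns live in `schema/schema_sqlite.sql`, which is not shown. The stance values and `set_stances` are hypothetical too. Since the research tables are described as append-only, check that schema for any constraints on updates before relying on this.

```python
import sqlite3
from typing import Mapping

# Hypothetical minimal table for illustration only; use schema/schema_sqlite.sql in practice.
HYPOTHETICAL_DDL = """
CREATE TABLE research_evidence (
    id INTEGER PRIMARY KEY,
    supports_or_refutes TEXT
)
"""


def set_stances(conn: sqlite3.Connection, stances: Mapping[int, str]) -> int:
    """Set supports_or_refutes for each cited evidence id; return rows updated."""
    params = [(stance, evidence_id) for evidence_id, stance in stances.items()]
    with conn:  # commit on success, roll back if any UPDATE raises
        cur = conn.executemany(
            "UPDATE research_evidence SET supports_or_refutes = ? WHERE id = ?",
            params,
        )
    # For executemany, sqlite3 sums the per-statement modification counts.
    return cur.rowcount
```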
15 changes: 15 additions & 0 deletions lib/__init__.py
@@ -0,0 +1,15 @@
"""deep-research reference implementation library.

Two modules:

- ``lib.scholar`` — stdlib-urllib adapter over OpenAlex, Semantic
Scholar, PubMed, arXiv, Europe PMC, Crossref,
and Unpaywall.
- ``lib.synthesis_doc_builder`` — python-docx + matplotlib helper that renders
the synthesis doc (forest plot, PRISMA flow,
stance heat-table) with a pluggable upload
callback.

Both are agent-runtime-agnostic. Wire them into your own orchestration layer.
"""
__version__ = "0.2.0"