feat(v0.2.0): reference scholar + synthesis_doc_builder#2
Merged
Conversation
Ports the Python reference implementation into the public repo with
agent-runtime couplings stripped.
lib/scholar.py
- stdlib-urllib adapter over OpenAlex, Semantic Scholar, PubMed,
arXiv, Europe PMC, Crossref, and Unpaywall.
- Five actions: search, multi_search, get, find_doi, resolve_oa.
- Uniform normalized hit schema across all sources.
- Polite-pool contact email is no longer hard-coded:
* configure(contact_email, app_name) sets module-global UA + mailto.
* SCHOLAR_CONTACT_EMAIL env var honored at import time.
* Without configuration, mailto params are omitted (APIs still work,
polite-pool benefits forfeited).
- Embedding-based dedup in multi_search is now pluggable via
set_embedding_deduper(fn) — no hard dep on any embeddings module.
Unregistered = no-op pass-through; hash dedup remains the safety net.
lib/synthesis_doc_builder.py
- python-docx + matplotlib helper that renders forest plot, PRISMA
flow, stance heat-table, and assembles the .docx with native heading
hierarchy + tables.
- Drive upload decoupled behind a DI Uploader callable:
Uploader = Callable[[Path, str, str], dict]
build_synthesis_doc(inputs, *, uploader=None)
Without an uploader the helper returns the local .docx path and
uploaded=False; with one, doc_id / web_url come back populated.
- matplotlib + python-docx remain soft imports; RuntimeError on use
when missing, not ImportError at load.
pyproject.toml
- Installable package. Core deps: stdlib only.
- [viz] extra: python-docx>=1.0, matplotlib>=3.7, numpy>=1.24.
tests/
- test_scholar_smoke.py — 14 tests. urllib.request.urlopen monkey-
patched with per-source fake responses. Confirms normalized hit
schema, configure() mutates UA + email, find_doi best-match, OA
flattening, dedup behavior. Network-free.
- test_synthesis_doc_builder_smoke.py — 7 tests. tempfile.mkdtemp +
pathlib.Path only (no \ separator literals — Linux CI safe).
Asserts .docx valid zip with word/document.xml containing heading
text + table tag; uploader called with correct args + name;
uploader exception preserves local artifact. matplotlib/python-docx
tests SkipTest cleanly when soft deps absent.
Docs
- README "What ships in v0.1.0" -> "What ships in v0.2.0"; new
"What's new in v0.2.0" section; pip install "deep-research[viz]"
in Quickstart.
- SKILL.md: drops the "when ported in v0.2" hedge; bumps version: 0.2.0.
- manifests/deep-research.v0.4.json: tool.version 0.1.1 -> 0.2.0;
runtime.install.ref v0.1.0 -> v0.2.0; description rewritten to
name the shipped library surface.
- CHANGELOG.md: new file with 0.2.0, 0.1.1, 0.1.0 entries.
Abstraction surfaces in one breath
- Polite-pool: configure() + env var; no embedded contact details.
- Embedding dedup: register a runtime-specific function or accept the
no-op fallback.
- Document upload: pass an Uploader or accept "local file only".
- Soft deps: matplotlib + python-docx live behind [viz]; tests skip.
21/21 tests pass locally on Python 3.13 (Windows + matplotlib + python-docx
installed). No surname or yepgent.com references in the shipped tree.
There was a problem hiding this comment.
Pull request overview
Ports the v0.1 reference implementation into this repo by adding a stdlib-only scholarly search adapter (lib/scholar.py) and an optional-deps synthesis .docx builder (lib/synthesis_doc_builder.py), plus packaging/docs/tests updates to support a v0.2.0 release.
Changes:
- Add
lib/scholar.py(multi-source search/get/find_doi/resolve_oa with normalized hit schema) andlib/synthesis_doc_builder.py(docx + plots with optional uploader injection). - Add stdlib
unittestsmoke tests for both modules (network-free via monkeypatchedurlopen). - Introduce
pyproject.tomlpackaging and update README/SKILL/manifest/CHANGELOG to v0.2.0.
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
lib/scholar.py |
New unified scholarly API adapter with search, multi_search, get, find_doi, and resolve_oa actions. |
lib/synthesis_doc_builder.py |
New synthesis document builder producing .docx and optional plots, with optional uploader callback. |
lib/__init__.py |
Introduces library package marker + version string. |
tests/test_scholar_smoke.py |
Adds network-free smoke tests for scholar behaviors and normalization. |
tests/test_synthesis_doc_builder_smoke.py |
Adds smoke tests for doc builder with soft-dep skipping. |
tests/__init__.py |
Marks tests as a package (empty). |
pyproject.toml |
Adds installable packaging config and [viz] extras for doc builder deps. |
README.md |
Updates “what ships”, quickstart, and v0.2.0 feature descriptions. |
SKILL.md |
Bumps skill version and updates synthesis builder documentation. |
manifests/deep-research.v0.4.json |
Bumps tool version/ref and updates manifest description for v0.2.0. |
CHANGELOG.md |
Adds changelog entries up through v0.2.0. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| readme = "README.md" | ||
| requires-python = ">=3.11" | ||
| license = { text = "Apache-2.0" } | ||
| authors = [{ name = "Dimitri T", email = "" }] |
Comment on lines
+1061
to
+1064
| key = ( | ||
| f"doi:{h.get('doi')}" if h.get("doi") else | ||
| (f"title:{_title_hash_key(h.get('title'))}" if h.get("title") else None) | ||
| ) |
| # Per-source: arXiv (Atom XML over export.arxiv.org) | ||
| # --------------------------------------------------------------------------- | ||
|
|
||
| ARXIV_BASE = "http://export.arxiv.org/api/query" |
Comment on lines
+606
to
+613
| result: dict[str, Any] = { | ||
| "local_docx_path": str(docx_path), | ||
| "plots": {k: str(v) for k, v in plots.items()}, | ||
| "doc_id": None, | ||
| "web_url": None, | ||
| "uploaded": False, | ||
| "upload_error": None, | ||
| } |
Comment on lines
+74
to
+75
| A working reference of both modules also lives upstream in the | ||
| [Yep agent](https://yepgent.com) codebase. |
Comment on lines
+96
to
+100
| def setUp(self): | ||
| if not _has_python_docx(): | ||
| raise unittest.SkipTest("python-docx not installed (viz extra)") | ||
| self.tmp = Path(tempfile.mkdtemp(prefix="dr_smoke_")) | ||
|
|
|
|
||
|
|
||
| def _normalize_crossref_item(it: dict) -> dict: | ||
| doi = _normalize_doi(it.get("DOI")) |
| elif crossref_type in ("review-article",): | ||
| tier_hint = 3 | ||
| return { | ||
| "id": f"crossref:{doi or ''}", |
This was referenced Jun 2, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Ports the v0.1 reference implementation (
scholaradapter + synthesisdoc builder) into the public
deep-researchrepo with agent-runtimecouplings stripped behind clean abstraction surfaces.
What ships
lib/scholar.py— stdlib-urllib adapter over six free academicAPIs (OpenAlex, Semantic Scholar, PubMed, arXiv, Europe PMC, Crossref)
plus Unpaywall. Five actions:
search,multi_search,get,find_doi,resolve_oa. Uniform normalized hit schema.lib/synthesis_doc_builder.py—python-docx+matplotlibhelper that renders forest plot, PRISMA flow, stance heat-table, and
assembles a structured
.docxwith native heading hierarchy andtables.
pyproject.toml— installable package. Core deps: stdlib only.tests/— 21 stdlibunittesttests. Network-free.pathlib.Patheverywhere (no
\separator literals — Linux-CI safe).CHANGELOG.md— first entry for the repo; covers 0.2.0, 0.1.1,0.1.0.
Abstraction surfaces (the only API changes that matter)
mailto=…+ User-Agentscholar.configure(contact_email, app_name)+SCHOLAR_CONTACT_EMAILenvdeep-research-scholar/1.0; no mailto sentscholar.set_embedding_deduper(fn)build_synthesis_doc(inputs, *, uploader: Callable[[Path, str, str], dict]).docxreturned,uploaded=FalseRuntimeErroron usepip install "deep-research[viz]"Test plan
python -m unittest discover tests— 21/21 pass locally (Python3.13, Windows, with
[viz]extras installed).yepgent.com/dimitri@grep acrosslib/and root— clean.
python -c "import lib.scholar; lib.scholar.scholar('search', {'source':'openalex','query':'test','limit':1})"against the real OpenAlex API after merge.Out of scope
SKILL.mdprotocol semantics — only the v0.1 "lands inv0.2" hedge was removed.
schema/oragents/.manifest_version: 0.4).🤖 Generated with Claude Code