GitHub - fraware/verified-science-agent: Evidence-backed scientific AI report infrastructure

 ######################################################################################################

 __     __        _  __ _          _   ____       _                          _                    _   
 \ \   / /__ _ __(_)/ _(_) ___  __| | / ___|  ___(_) ___ _ __   ___ ___     / \   __ _  ___ _ __ | |_ 
  \ \ / / _ \ '__| | |_| |/ _ \/ _` | \___ \ / __| |/ _ \ '_ \ / __/ _ \   / _ \ / _` |/ _ \ '_ \| __|
   \ V /  __/ |  | |  _| |  __/ (_| |  ___) | (__| |  __/ | | | (_|  __/  / ___ \ (_| |  __/ | | | |_ 
    \_/ \___|_|  |_|_| |_|\___|\__,_| |____/ \___|_|\___|_| |_|\___\___| /_/   \_\__, |\___|_| |_|\__|
                                                                                 |___/                

 ######################################################################################################

Evidence-backed scientific AI report infrastructure

Treat every AI-generated scientific report like a software build artifact —
inputs, source records, claims, validation, provenance, and review status.

Documentation · API · Benchmark · Changelog

Overview

Verified Science Agent (VSA) turns scientific questions into inspectable ScientificReport JSON artifacts. Retrieval produces evidence. Generation produces claims. Validation checks claims against evidence. Models cannot invent source fields.

Mission: Scientific AI reports should be inspectable by engineers, readable by scientists, and shareable with reviewers — with signed or hashable outputs.

flowchart LR
  Q[Question] --> P[Subject parser]
  P --> C[Connectors]
  C --> E[Evidence ranking]
  E --> X[Claim extraction]
  X --> V[Validation]
  V --> R[Render / audit / review / export]

Core rule: retrieval → evidence · generation → claims · validation → proof

Quick start

git clone https://github.com/fraware/verified-science-agent.git
cd verified-science-agent
pip install -e ".[dev,ui,pdf,signing,api]"
make demo && pytest && vsa benchmark

Or run the full suite in one step:

make acceptance

make acceptance runs a demo build, all tests, and the 50-task evaluation suite.

Typical workflow

# Retrieve and build
vsa retrieve "BRCA1 c.68_69del"
vsa build examples/brca1_input.json --out reports/brca1_report.json --claim-mode rule

# Validate, audit, export
vsa validate reports/brca1_report.json
vsa audit reports/brca1_report.json --audit-mode rule --out reports/audit.json
vsa export reports/brca1_report.json --out-dir reports/bundle --audit-mode rule
vsa verify-bundle reports/bundle

# Attestation and review
vsa attest reports/brca1_report.json --out reports/attestation.json --subject-name report.json
vsa review start reports/brca1_report.json --reviewer you@example.com
vsa review approve-claim reports/brca1_report.json --reviewer you@example.com --claim C002
vsa verify-review reports/brca1_report.json

# Render, sign, serve
vsa render reports/brca1_report.json --format markdown --out reports/brca1_report.md
vsa sign reports/brca1_report.json
vsa serve --port 8000

make demo                      # build + validate + audit + export + verify-bundle
streamlit run ui/app.py        # interactive inspector with credibility warnings

Features

Area	What you get
Artifacts	Canonical `ScientificReport` schema (v1.2.0), provenance hashes, export bundles
Credibility	ClinVar ambiguity alerts, metadata-only warnings, AlphaFold predicted-structure labeling
Verification	Schema + semantic validation, rule/hybrid audit, SLSA/in-toto attestation
Review	Human review workflow with verifiable event chains
Benchmark	50-task evaluation suite with quality checks
API	REST server with optional `VSA_API_KEY` auth

CLI reference

Build and retrieve

Command	Description
`vsa retrieve "question"`	Retrieve evidence from databases
`vsa build input.json --out report.json`	Build a full ScientificReport
`vsa extract input.json`	Extract claims (rule or LLM)
`vsa benchmark`	Run 50-task benchmark suite (`--live` for network)

Validate and audit

Command	Description
`vsa validate report.json`	Schema + semantic validation
`vsa audit report.json`	Scientific audit (rule + optional LLM hybrid)
`vsa compare report_a.json report_b.json`	Diff two reports
`vsa compare-audit audit_a.json audit_b.json`	Diff audit artifacts

Export and verify

Command	Description
`vsa export report.json --out-dir dir/`	Full bundle: report, audit, provenance, sources/, manifest
`vsa verify-bundle dir/`	Verify manifest hashes and attestation
`vsa attest report.json --out attestation.json`	SLSA/in-toto provenance attestation
`vsa verify-attestation report.json attestation.json`	Verify attestation digest

Review

Command	Description
`vsa review start report.json --reviewer NAME`	Start human review session
`vsa review approve-claim ... --claim C001`	Approve specific claims
`vsa review verify report.json`	Verify review chain hashes
`vsa verify-review report.json`	Alias for `review verify`

Legacy flags remain supported: vsa review report.json --reviewer NAME --approve C001.

Render, sign, and serve

Command	Description
`vsa render report.json --format markdown\|html\|json\|pdf`	Render report
`vsa hash report.json`	Provenance hash chain
`vsa sign report.json`	Ed25519-sign report provenance hash
`vsa verify-signature report.json`	Verify Ed25519 signature
`vsa serve --port 8000`	Start REST API (requires `[api]` extra)

REST endpoint parity: docs/api.md

Scientific credibility

VSA enforces policies that make weak evidence hard to miss:

ClinVar ambiguity — ambiguous queries capped to low reliability; CLINVAR AMBIGUITY ALERT in report warnings
Metadata-only papers — SCIENTIFIC CREDIBILITY WARNING when all publication evidence is bibliographic only
AlphaFold — summaries always declare predicted structure; never treated as experimental
Materials Project — missing API key degrades with an explicit skip warning

Warnings surface in CLI output, markdown/HTML render, validation checks, and the Streamlit UI.

LLM modes (optional)

Copy .env.example to .env and add API keys. Never commit .env.

vsa build examples/brca1_input.json --out reports/brca1_report.json --claim-mode auto
vsa audit reports/brca1_report.json --audit-mode auto

Rule-based modes require no API keys and are used in CI:

vsa build examples/brca1_input.json --out reports/brca1_report.json --claim-mode rule
vsa audit reports/brca1_report.json --audit-mode rule

The LLM auditor evaluates only claim text and cited evidence in the payload — it cannot introduce new sources.

ScientificReport schema

Canonical artifact model (schema 1.2.0, also accepts 1.0.0 and 1.1.0):

ScientificReport
├── subject
├── claims[]
├── evidence[]
├── methods[]
├── provenance
├── validation_results
├── human_review
└── generated_outputs

Schema: src/vsa/schemas/scientific_report.schema.json · Field reference: docs/schema.md

Domains: genomics variants, proteins, papers, chemicals, materials, experiments.

Connectors

Read-only connectors with normalized evidence and file caching (.vsa_cache/):

Category	Sources
Literature	OpenAlex, Crossref, PubMed, Europe PMC, Semantic Scholar
Genomics / protein	ClinVar, UniProt, AlphaFold DB
Materials	Materials Project (`MATERIALS_PROJECT_API_KEY`)

Details: docs/connectors.md

Benchmark

50 evaluation tasks covering genomics, proteins, papers, materials, and edge cases:

vsa benchmark

Measures source recall/precision, citation integrity, evidence validity, review boundaries, contradiction detection, and bundle reproducibility. See docs/benchmark.md.

Repository layout

verified-science-agent/
├── src/vsa/            CLI, pipeline, connectors, validation, API
├── schemas/            JSON Schema (symlink to package schema)
├── benchmarks/         50 tasks + offline fixtures
├── examples/           Input files and good/bad report examples
├── docs/               Architecture, schema, connectors, API
├── ui/                 Streamlit inspector
├── scripts/            Full test suite helper
└── .github/workflows/  CI and release pipelines

Architecture: docs/architecture.md

Development

pip install -e ".[dev,signing,api]"
pytest
make demo

Safety notice

Research infrastructure only. Not a medical device, clinical decision system, or diagnostic platform.

Research infrastructure output. Not for diagnosis, treatment, or clinical decision-making without qualified expert review.

Human expert review is required before any clinical use.

License

MIT License — see LICENSE.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

Quick start

Features

CLI reference

Build and retrieve

Validate and audit

Export and verify

Review

Render, sign, and serve

Scientific credibility

LLM modes (optional)

ScientificReport schema

Connectors

Benchmark

Repository layout

Development

Safety notice

License

About

Uh oh!

Releases

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
.github/workflows		.github/workflows
benchmarks		benchmarks
docs		docs
examples		examples
integrations		integrations
reports		reports
schemas		schemas
scripts		scripts
src/vsa		src/vsa
tests		tests
ui		ui
.env.example		.env.example
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

Overview

Quick start

Features

CLI reference

Build and retrieve

Validate and audit

Export and verify

Review

Render, sign, and serve

Scientific credibility

LLM modes (optional)

ScientificReport schema

Connectors

Benchmark

Repository layout

Development

Safety notice

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Contributors

Uh oh!

Languages