glfharris/minerva

Minerva

LLM-based Single Best Answer Question Generation

[Screenshot: Minerva CLI generating an SBA question on lung compliance]

High-quality question banks for postgraduate medical examinations are expensive: providers often charge significant sums for access, with little reduction in price for candidates who need to resit. Minerva generates Single Best Answer (SBA) questions from your own reference material using retrieval-augmented generation, with the aim of producing questions that meet the standard of those written by human examiners.

Example output

A ventilated patient has been admitted to the intensive care unit after emergency major colorectal surgery and is receiving a continuous intravenous cardiovascular support drug. Twelve hours later the blood glucose concentration is 13 mmol L⁻¹, although he is not known to be diabetic.

Which drug infusion is most likely to be responsible for the hyperglycaemia?

A. Adrenaline
B. Dobutamine
C. Enoximone
D. Noradrenaline
E. Vasopressin

Correct: A. Adrenaline commonly causes hyperglycaemia by stimulating glycogenolysis and gluconeogenesis and reducing peripheral glucose uptake via adrenergic receptor effects.

Each question includes per-option explanations, an overall educational explanation, and matched curriculum node codes.
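
As a rough illustration of that structure, a question could be modelled as below. This is a sketch only: the class and field names (`Option`, `Question`, `lead_in`, `curriculum_codes`) are assumptions made for illustration and are not taken from Minerva's actual models. The stem is abbreviated and the explanations are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    label: str            # "A" to "E"
    text: str
    explanation: str      # per-option explanation
    is_correct: bool = False


@dataclass
class Question:
    stem: str
    lead_in: str
    options: list[Option]
    explanation: str      # overall educational explanation
    curriculum_codes: list[str] = field(default_factory=list)

    def correct_option(self) -> Option:
        """Return the single best answer; raise if there is not exactly one."""
        correct = [o for o in self.options if o.is_correct]
        if len(correct) != 1:
            raise ValueError(f"expected exactly one correct option, found {len(correct)}")
        return correct[0]


# Populated from the example output; explanation text is a placeholder.
names = ["Adrenaline", "Dobutamine", "Enoximone", "Noradrenaline", "Vasopressin"]
question = Question(
    stem="A ventilated patient is receiving a continuous intravenous cardiovascular support drug.",
    lead_in="Which drug infusion is most likely to be responsible for the hyperglycaemia?",
    options=[
        Option(label, name, "(placeholder)", is_correct=(name == "Adrenaline"))
        for label, name in zip("ABCDE", names)
    ],
    explanation="(placeholder)",
)
print(question.correct_option().label)
```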

How it works

flowchart TD
    subgraph Index["Reference indexing"]
        Docs["Reference PDFs / EPUBs"] --> Parse["Parse + chunk"]
        Parse --> Embed["Embed chunks"]
        Embed --> VectorDB[("LanceDB reference store")]
    end

    subgraph Curriculum["Curriculum context"]
        Curricula[("RCoA curriculum trees")]
        Topic["Topic + exam"]
        Node["Optional --node"]
        Topic --> CurrMatch["Match topic to curriculum node"]
        Curricula --> CurrMatch
        Node --> PromptContext["Curriculum breadcrumb"]
        CurrMatch --> PromptContext
    end

    subgraph Generation["Question generation"]
        VectorDB --> Retrieve["Retrieve relevant context"]
        PromptContext --> Agent["Pydantic AI agent"]
        Retrieve --> Agent
        Agent --> QS["QuestionSet"]
    end

    subgraph Review["Post-processing"]
        QS --> Match["Match questions to curriculum nodes"]
        Match --> Critique{"Critique?"}
        Critique -->|optional| Revised["Revised QuestionSet"]
        Critique -->|skip| Final["Final QuestionSet"]
        Revised --> Final
    end

    subgraph Outputs["Outputs and reuse"]
        Final --> JSON["JSON"]
        Final --> MD["Markdown"]
        JSON --> Validate["Validate"]
        JSON --> Quiz["Interactive quiz"]
        JSON --> History["Few-shot histories"]
    end

    subgraph Convert["Existing question conversion"]
        Existing["PDF / Markdown / text SBAs"] --> ConvertCmd["Convert"]
        ConvertCmd --> QS
    end
  • Reference indexing — reference PDFs and EPUBs are parsed, chunked, and embedded into a local LanceDB vector store using PubMedBERT (runs locally, no API key needed).
  • Curriculum context — a topic and exam can be matched to the most relevant RCoA curriculum node, or a node can be supplied directly with --node.
  • Retrieve + generate — a Pydantic AI agent searches the vector store for relevant material, then produces a structured QuestionSet using the retrieved context and exam-aware prompt.
  • Post-process — generated or converted questions are matched to curriculum nodes, and an optional critique pass can revise questions against SBA writing criteria.
  • Outputs and reuse — final QuestionSet files can be saved as JSON or Markdown, validated, used in the interactive quiz, or converted into few-shot example histories.
  • Convert — existing SBA questions from PDFs, Markdown, or inline text can be parsed into the same structured QuestionSet format.

Setup

Minerva uses uv as its package manager. Install it for your operating system before proceeding, then install dependencies:

uv sync

Copy .env.example to .env and fill in your API keys:

OPENAI_API_KEY=        # required if using OpenAI models
ANTHROPIC_API_KEY=     # required if using Anthropic/Claude models
MINERVA_MODEL=openai:gpt-5.5
LANCEDB_DIR=./lancedb

Embeddings use NeuML/pubmedbert-base-embeddings locally — no API key required for embedding.

Usage

1. Embed your reference documents (a folder of PDFs or EPUBs):

./mincli.py embed path/to/docs/

# Show per-file progress and chunk-level detail
./mincli.py embed path/to/docs/ --verbose

# Reset existing embeddings and re-embed from scratch
./mincli.py embed path/to/docs/ --reset

PDFs are split into overlapping 300-word chunks. Tables are extracted atomically as markdown to avoid splitting across chunks. Already-embedded files are skipped automatically on subsequent runs.
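
The overlapping chunking described above can be sketched roughly as follows. The 300-word chunk size comes from this README; the 50-word overlap and the function name `chunk_words` are assumptions for illustration, and table handling is not shown.

```python
def chunk_words(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `size` words.

    Each chunk repeats the last `overlap` words of the previous one.
    The overlap value is an assumed default, not Minerva's setting.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(700))
chunks = chunk_words(sample)
print([len(c.split()) for c in chunks])
```

Overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.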

2. Generate questions:

# Single question on a topic
./mincli.py create "Lung Compliance"

# Multiple questions, saved to a directory
./mincli.py create "Cardiac Output" --count 3 --output ./output

# Save to a specific file
./mincli.py create "Cardiac Output" --output ./output/cardiac.json

# With curriculum context (agent matches the best curriculum node automatically)
./mincli.py create "Rocuronium" --exam primary

# From a specific curriculum node by code (--exam inferred from node if omitted)
./mincli.py create --node 1_GA_P_6

# From a specific node with a custom topic
./mincli.py create "Rocuronium reversal" --node 1_GA_P_6

# Using Anthropic Claude
./mincli.py create "Pharmacokinetics" --model anthropic:claude-opus-4-6

# With a self-critique pass to improve question quality
./mincli.py create "Lung Compliance" --critique

# Show retrieval detail, critique feedback, diffs, and token usage
./mincli.py create "Lung Compliance" --critique --verbose

3. Critique saved questions:

# Run a critique pass on a previously generated file
./mincli.py critique output/1_GA_P_6_2026-04-30.json

# With feedback and diffs showing exactly what changed
./mincli.py critique output/1_GA_P_6_2026-04-30.json --verbose

# Save revised questions to a specific location
./mincli.py critique output/1_GA_P_6_2026-04-30.json -o output/revised/

The critique checks each question against SBA writing criteria (positive framing, homogeneous distractors, option length balance, explanation completeness) and saves a revised file alongside the original with a _critiqued suffix.
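
The naming convention for the revised file can be pictured with a small `pathlib` sketch. The helper name `critiqued_path` is hypothetical, and the sketch assumes the suffix is inserted directly before the file extension.

```python
from pathlib import Path


def critiqued_path(original: Path) -> Path:
    """Return a sibling path with `_critiqued` inserted before the extension."""
    return original.with_name(f"{original.stem}_critiqued{original.suffix}")


print(critiqued_path(Path("output/1_GA_P_6_2026-04-30.json")))
```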

4. Interactive quiz:

# Quiz from a saved file
./mincli.py quiz output/cardiac_output_2026-04-30.json

# Generate then quiz in one step
./mincli.py quiz --topic "Lung Compliance" --exam primary --count 5

5. Convert existing questions:

# Parse a PDF or markdown file of SBA questions into structured JSON
./mincli.py convert "Primary FRCA Sample SBAs.pdf" --exam primary --output ./output

# From a markdown file with a custom topic label
./mincli.py convert examples/questions.md --topic "Primary FRCA Pharmacology"

# Inline text
./mincli.py convert --text "A patient... Which drug? A. X B. Y ..." --topic "test"

Per-option explanations are generated where missing. Questions referencing images or ECGs are automatically skipped.
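
This README does not say how questions that reference images or ECGs are detected. One simple way to approximate the idea is a keyword filter, sketched below. The pattern and the function name `needs_visual` are assumptions, and a keyword match is a crude stand-in for whatever Minerva actually does.

```python
import re

# Assumption: a keyword heuristic, not Minerva's actual detection logic.
VISUAL_REFERENCE = re.compile(r"\b(images?|ecgs?|electrocardiograms?)\b", re.IGNORECASE)


def needs_visual(question_text: str) -> bool:
    """Return True if the question text appears to depend on an image or ECG."""
    return bool(VISUAL_REFERENCE.search(question_text))


questions = [
    "The ECG below was recorded on admission. What is the rhythm?",
    "Which drug infusion is most likely to be responsible for the hyperglycaemia?",
]
kept = [q for q in questions if not needs_visual(q)]
print(len(kept))
```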

6. Validate saved questions (no LLM call):

# Check structure, option counts, curriculum codes, etc.
./mincli.py validate output/lung_compliance_2026-05-02.json

# Validate multiple files at once
./mincli.py validate output/*.json
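
To give a feel for the kind of structural checks that can run without an LLM, here is a minimal sketch over a plain dictionary. The field names (`stem`, `options`, `label`, `is_correct`) and the five-option rule are assumptions based on the example output, not Minerva's actual schema or validation rules.

```python
EXPECTED_OPTIONS = 5  # assumption: options A to E, as in the example output


def validate_question(question: dict) -> list[str]:
    """Return a list of problems; an empty list means the question passed."""
    problems = []
    options = question.get("options", [])
    if not question.get("stem", "").strip():
        problems.append("missing stem")
    if len(options) != EXPECTED_OPTIONS:
        problems.append(f"expected {EXPECTED_OPTIONS} options, found {len(options)}")
    correct = [o for o in options if o.get("is_correct")]
    if len(correct) != 1:
        problems.append(f"expected exactly one correct option, found {len(correct)}")
    labels = [o.get("label") for o in options]
    if len(set(labels)) != len(labels):
        problems.append("duplicate option labels")
    return problems


good = {
    "stem": "A ventilated patient has been admitted to the intensive care unit.",
    "options": [{"label": l, "is_correct": l == "A"} for l in "ABCDE"],
}
bad = {"stem": "", "options": [{"label": "A"}, {"label": "A"}]}
print(validate_question(good))
print(validate_question(bad))
```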

7. Test retrieval (useful for debugging):

# Check curriculum node matching for a topic
./mincli.py match "Rocuronium"
./mincli.py match "Rocuronium" --exam final

# Show ancestor path and similarity scores
./mincli.py match "Rocuronium" --verbose

# Check what reference material would be retrieved
./mincli.py match "Rocuronium" --source docs

Maintenance: build few-shot examples from converted question sets:

uv run python scripts/make_history.py output/primary_frca_2026-05-02.json

Curriculum-aware generation

Minerva includes the full RCoA Primary and Final FRCA curriculum trees. When --exam is provided, the agent automatically matches the topic to the most relevant curriculum node using embedding similarity and includes the full curriculum breadcrumb in the prompt — helping the LLM target the right scope and depth for the exam standard.

You can also specify a node directly by code (--node 1_GA_P_6) to bypass automatic matching and pin generation to a specific curriculum item.
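
Matching by embedding similarity can be illustrated with a toy example. Everything in this sketch is a stand-in: the node codes, titles, and three-dimensional vectors are invented, whereas real embeddings would come from the PubMedBERT model and real nodes from the curriculum JSON. Cosine similarity is assumed as the similarity measure.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical nodes: code -> (ancestor path, toy embedding)
NODES = {
    "NODE_A": (["Pharmacology", "Neuromuscular blockers"], [0.9, 0.1, 0.0]),
    "NODE_B": (["Physiology", "Respiratory mechanics"], [0.1, 0.9, 0.2]),
}


def best_node(topic_vec: list[float]) -> tuple[str, str]:
    """Return the closest node code and its breadcrumb string."""
    code = max(NODES, key=lambda c: cosine(topic_vec, NODES[c][1]))
    return code, " > ".join(NODES[code][0])


topic_vec = [0.8, 0.2, 0.1]  # stand-in for the embedding of a topic
code, breadcrumb = best_node(topic_vec)
print(code, breadcrumb)
```

Supplying `--node` corresponds to skipping `best_node` and looking up the breadcrumb for a known code directly.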

Models

Model strings use provider:name format:

Provider     Model string
OpenAI       openai:gpt-5.5
Anthropic    anthropic:claude-opus-4-6
Ollama       ollama:qwen3.6

Set the default via MINERVA_MODEL in .env, or override per-run with --model.
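
The precedence described here (a per-run `--model` over `MINERVA_MODEL`) and the `provider:name` format can be sketched as below. The function name `resolve_model` and the final fallback to `openai:gpt-5.5` are assumptions for illustration. In practice the string is presumably handed to Pydantic AI rather than split by hand.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "openai:gpt-5.5"  # assumed fallback, mirroring the .env example


def resolve_model(
    cli_model: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> tuple[str, str]:
    """Choose a model string (CLI value, then MINERVA_MODEL, then default) and split it."""
    env = os.environ if env is None else env
    model = cli_model or env.get("MINERVA_MODEL") or DEFAULT_MODEL
    provider, sep, name = model.partition(":")
    if not sep or not provider or not name:
        raise ValueError(f"model string must look like provider:name, got {model!r}")
    return provider, name


print(resolve_model("anthropic:claude-opus-4-6", env={}))
print(resolve_model(env={"MINERVA_MODEL": "ollama:qwen3.6"}))
```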

Token usage is shown under --verbose.

Running locally with Ollama

Install Ollama and pull a model:

ollama pull qwen3.6

Then set OLLAMA_BASE_URL and MINERVA_MODEL in .env:

OLLAMA_BASE_URL=http://localhost:11434
MINERVA_MODEL=ollama:qwen3.6

No API key is required unless you use cloud-based models.

Testing

uv run pytest

The test suite covers pure functions across all modules (models, curriculum, embed, output, agent, prompts, validation, CLI, inputs, paths) and runs without any API keys or network access.

Adapting to Other Fields

To use Minerva in another domain:

  • Update the role prompt in minerva/prompts.py
  • Replace the few-shot examples in examples/
  • Supply embeddings from relevant reference material
  • Replace the curriculum JSON in data/ with your own structure

What's Next

Question quality is currently being validated against a set of human-written questions. Longer term, the goal is to make a freely accessible web platform that serves generated questions to anyone preparing for postgraduate exams.
