APRAG-Lab

APRAG-Lab is a local-first multimodal RAG benchmark workspace for teams that need to prove an answer is grounded before they trust it. Upload documents, images, audio, and video; build a searchable knowledge base; run Traditional RAG, Agentic RAG, and Hybrid Graph RAG against the same evidence; then compare the outputs with citations, traces, graph signals, queue progress, metrics, and grounding notes.

APRAG-Lab is built for teams that must verify grounding before a RAG system reaches production. It removes the repeated setup problem by keeping the app, queue, metadata store, and vector databases in Docker, while heavy model and media processing runs on the host laptop, where Ollama, FFmpeg, OCR, and transcription can use the machine's native resources.

[Diagram: APRAG-Lab multimodal RAG benchmark flow]

Why APRAG-Lab

  • Benchmark multiple RAG strategies side by side instead of trusting a single answer path.
  • Test real multimodal evidence with local models, local files, and local observability.
  • Inspect the full chain from upload to extraction, retrieval, graph usage, citation resolution, and export.
  • Keep heavy inference on host Ollama while Docker gives new users a repeatable app stack.
  • Use realistic sample data to validate the product before bringing private documents.

What It Does

  • Upload TXT, Markdown, CSV, JSON, PDF, DOCX, image, audio, and video files.
  • Extract text, tables, OCR text, video frames, audio, and transcripts.
  • Store metadata and graph state in SQLite.
  • Store embeddings in Chroma or Qdrant, with a SQLite/in-memory fallback for tests and offline development.
  • Use host Ollama for real LLM, VLM, and embedding models.
  • Use host media tooling for OCR, FFmpeg extraction, and transcription.
  • Run Traditional RAG, Agentic RAG, and Hybrid Graph RAG for the same question.
  • Compare quality, latency, citations, retrieval coverage, graph usage, and grounding.
  • Show upload and benchmark queue progress in the frontend.
  • Let users inspect source chunks, resolved citations, run traces, and local database dashboards.
  • Publish benchmark spans to OpenTelemetry/Jaeger and Phoenix, with optional LangSmith tracing.

Architecture

Docker runs the product shell:

  • web: React/Vite frontend at http://localhost:5173
  • api: FastAPI backend at http://localhost:8000
  • worker: default local benchmark and ingestion worker
  • redis: queue and job coordination
  • redis-commander: Redis dashboard
  • chroma: vector database, enabled with the vector profile
  • qdrant: vector database, enabled with the vector profile
  • rq-worker: optional RQ worker, enabled with the queue profile
  • celery-worker: optional Celery worker, enabled with the queue profile
  • jaeger: OpenTelemetry trace dashboard
  • phoenix: AI/RAG observability dashboard

The host laptop runs heavy processing:

  • Ollama for LLM, VLM, and embedding inference
  • FFmpeg and ffprobe for audio/video handling
  • Tesseract for OCR
  • faster-whisper or whisper.cpp for transcription

This split is intentional. Containers make the app reproducible, while local host inference avoids Docker memory limits and makes real benchmarking practical on a laptop.
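A quick way to confirm this wiring is to call host Ollama from inside a container. A minimal sketch, assuming the Compose default OLLAMA_BASE_URL target described under Configuration and a Python-based api image:

docker compose exec api python -c "import urllib.request; print(urllib.request.urlopen('http://host.docker.internal:11434/api/tags').read().decode()[:300])"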

Requirements

  • Docker Desktop with Docker Compose.
  • Ollama installed and running on the host.
  • Python 3.11 or newer for the host media runtime virtual environment.
  • FFmpeg and ffprobe on the host PATH.
  • Tesseract on the host PATH.
  • At least 16 GB of memory allocated to Docker Desktop, recommended for comfortable local use.

On macOS, the host tools can be installed with Homebrew:

brew install ffmpeg tesseract ollama

Start Ollama:

ollama serve

If Ollama is installed as a desktop app, opening the app is usually enough.

Model Profiles

Docker Compose sets APRAG_MODEL_PROFILE=auto. Auto mode preflights the host and currently selects the lite profile, which is the recommended first-run configuration:

  • LLM: qwen3:8b
  • VLM: qwen3-vl:4b
  • Embeddings: bge-m3
  • Transcription: faster-whisper:large-v3-turbo

Stronger profiles are available when the host has enough resources:

  • standard: qwen3:14b, qwen3-vl:8b, bge-m3
  • high_quality: qwen3:32b, qwen3-vl:8b, bge-m3

You can pull models before launch:

ollama pull qwen3:8b
ollama pull qwen3-vl:4b
ollama pull bge-m3

You can also start the app first and use Settings -> Pull missing models. The backend queues model pulls through the worker and shows progress in the UI.

The frontend Settings panel can save runtime overrides for the LLM, VLM, embedding model, transcription provider, transcription model, vector store, and provider mode. If an override names an Ollama model that is not installed, Settings -> Pull missing models asks the backend worker to pull it from host Ollama.
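Pulls can also be scripted against host Ollama's HTTP API directly. A minimal sketch using Ollama's standard /api/pull endpoint (this talks to Ollama itself, not to an APRAG-Lab endpoint):

curl http://localhost:11434/api/pull -d '{"name": "qwen3:8b"}'

The endpoint streams JSON progress lines until the pull completes.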

Setup From Scratch

From the project folder:

cd APRAG-Lab

Create the host media runtime virtual environment:

./scripts/setup_host_media_runtime.sh

Start the host media runtime in a separate terminal:

APRAG_HOST_DATA_DIR=$PWD/data .venv/host-media/bin/python scripts/host_media_runtime.py --host 0.0.0.0

Start the product containers:

docker compose --profile vector up -d --build

The default stack uses the built-in local worker. The optional RQ and Celery services are available for queue parity testing:

docker compose --profile vector --profile queue up -d --build

Only use the queue profile when APRAG_QUEUE_MODE is also set to rq or celery; normal local use should keep the Compose default APRAG_QUEUE_MODE=local.
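For example, an RQ parity run could be started like this, assuming docker-compose.yml reads APRAG_QUEUE_MODE from the shell environment (adjust if your Compose file hard-codes it):

APRAG_QUEUE_MODE=rq docker compose --profile vector --profile queue up -d --build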

Open the app:

http://localhost:5173

The first load creates a local benchmark project automatically.

How To Use The App

  1. Open http://localhost:5173.
  2. In Upload And Processing, choose source files.
  3. Choose the upload behavior:
    • Add to current knowledge base
    • Clear current data and start fresh
    • Create a new knowledge base
  4. Click Upload and watch the extraction queue progress.
  5. Use Project Sources to select or inspect files.
  6. Use Source Viewer to read extracted chunks and source content.
  7. In Benchmark Run, ask a question.
  8. Choose Independent or Follow-up mode.
  9. Click Run all RAG flows.
  10. Watch the RAG Flow Queue until Traditional, Agentic, and Hybrid Graph runs finish.
  11. Review the Comparison tab, individual flow tabs, citations, trace data, graph usage, and grounding notes.
  12. Use Settings to inspect provider health, model overrides, missing model pulls, resource profile, database dashboards, and monitoring dashboards.

The main frontend surfaces are Upload And Processing, Project Sources, Source Viewer, Benchmark Run, Run History, and Settings. Project Sources is expandable by source, Source Viewer opens extracted chunks and citations, Run History reopens previous benchmark runs, and Settings separates action controls from system information.

Run exports are available from the API:

http://localhost:8000/api/runs/{run_id}/export.json
http://localhost:8000/api/runs/{run_id}/export.md
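For example, to save both export formats for a finished run (substitute a real run id from Run History; RUN_ID here is only a placeholder variable):

RUN_ID=replace-with-a-real-run-id
curl -s "http://localhost:8000/api/runs/$RUN_ID/export.json" -o run-export.json
curl -s "http://localhost:8000/api/runs/$RUN_ID/export.md" -o run-export.md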

Local URLs

  • Frontend: http://localhost:5173
  • API health: http://localhost:8000/health
  • API docs: http://localhost:8000/docs
  • Host media runtime health: http://localhost:8765/health
  • SQLite query console: http://localhost:8000/api/database/sqlite-dashboard
  • Chroma API docs: http://localhost:8001/docs
  • Qdrant dashboard: http://localhost:6333/dashboard
  • Redis Commander: http://localhost:8083
  • Jaeger traces: http://localhost:16686
  • Phoenix AI observability: http://localhost:6006
  • LangSmith project: https://smith.langchain.com (available when LANGSMITH_TRACING=true and LANGSMITH_API_KEY are set)

The database and monitoring dashboard buttons are also available in the frontend Settings panel.

Monitoring And Logs

APRAG-Lab records three levels of benchmark observability:

  • In-app traceability: run queue events, per-pipeline Trace Viewer steps, citation resolution, metrics, graph state snapshots, and JSONL trace exports.
  • Local observability: OpenTelemetry spans are exported to Jaeger and Phoenix for API requests, benchmark pipeline runs, retrieval, fusion RAG, LLM calls, VLM calls, and embedding calls.
  • Optional external AI observability: LangSmith tracing can be enabled for teams that already use LangGraph/LangChain monitoring.

Local dashboards:

http://localhost:16686   Jaeger OpenTelemetry traces
http://localhost:6006    Phoenix AI/RAG traces

Raw local logs and run trace endpoints:

http://localhost:8000/api/diagnostics/logs?limit=200
http://localhost:8000/api/runs/{run_id}/trace.jsonl
http://localhost:8000/api/runs/{run_id}/trace-events
http://localhost:8000/api/runs/{run_id}/graph-states
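Since trace.jsonl is JSON Lines, one object per line, it works with standard line tools. A sketch, assuming jq is installed on the host and RUN_ID holds a real run id:

curl -s "http://localhost:8000/api/runs/$RUN_ID/trace.jsonl" | jq -c . | head -n 10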

To enable LangSmith, set these environment variables before starting Docker Compose:

export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY=your_langsmith_key
export LANGSMITH_PROJECT=APRAG-Lab

Sample Data

Realistic manual test files are in:

sample-data/manual-test-suite

The sample is a connected Northstar field-service case study. It uses one real-world image, one audio memo, one CSV, and one PDF to exercise the full multimodal path while keeping the scenario small enough for repeatable manual testing. The files share customers, ticket IDs, owners, firmware versions, incident metrics, SLA decisions, and mitigation actions.

Files:

  • 01_vaccine_administration_event.jpg: real-world clinical vaccine administration photo with no added text overlay, sourced from Wikimedia Commons / NIH public-domain media.
  • 02_lakeside_dispatch_memo.wav: spoken dispatch memo for INC-1043 and Priya Shah's field action.
  • 03_service_tickets.csv: ticket metrics, severity, downtime, inventory risk, owners, and SLA flags.
  • 04_incident_review.pdf: incident narrative, root cause, rollback decision, rollout plan, and customer outcomes.

Useful benchmark questions:

  • What caused the Lakeside Clinic outage, and which CSV ticket metrics prove it was the highest-risk case?
  • What does the vaccination image show, and how does it relate to the Lakeside vaccine freezer incident?
  • What did the audio dispatch memo say Priya Shah did for INC-1043?
  • Which customers were on firmware 4.8.2, and why did only Lakeside qualify for an SLA credit?
  • What did the review board decide about rollback versus staged firmware 4.8.3 rollout?
  • Was Pine Ridge Foods part of the firmware defect, or was it a different issue?

Regenerate the sample files when needed:

python3 scripts/create_manual_test_dataset.py

Supported upload extensions:

  • Text and documents: .txt, .md, .markdown, .pdf, .docx
  • Structured data: .csv, .json
  • Images: .png, .jpg, .jpeg, .webp, .gif
  • Audio: .wav, .mp3, .m4a, .ogg
  • Video: .mp4, .mov, .mkv, .webm

Upload limits enforced by the API:

  • Max file size: 200 MB
  • Max audio duration: 30 minutes
  • Max video duration: 15 minutes
  • Max PDF length: 300 pages
  • Max DOCX estimate: 300 structural units
  • Max images per upload run: 50
  • Max chunks per project: 10,000
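To pre-check a recording against the duration limits above, ffprobe on the host can report its length in seconds before you upload (the path is a placeholder):

ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 path/to/memo.wav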

Configuration

Common environment variables:

  • APRAG_MODEL_PROFILE=auto|lite|standard|high_quality
  • APRAG_PROVIDER_MODE=real|deterministic
  • APRAG_VECTOR_STORE=chroma|qdrant|sqlite
  • APRAG_MEDIA_RUNTIME=host|container
  • APRAG_QUEUE_MODE=inline|local|rq|celery
  • OLLAMA_BASE_URL=http://host.docker.internal:11434
  • APRAG_HOST_MEDIA_BASE_URL=http://host.docker.internal:8765
  • APRAG_DEFAULT_OLLAMA_LLM=qwen3:8b
  • APRAG_DEFAULT_OLLAMA_VLM=qwen3-vl:4b
  • APRAG_DEFAULT_EMBEDDINGS=bge-m3
  • APRAG_DEFAULT_TRANSCRIPTION=faster-whisper:large-v3-turbo
  • APRAG_OBSERVABILITY_ENABLED=true|false
  • APRAG_OTEL_EXPORTER_OTLP_ENDPOINTS=http://jaeger:4318/v1/traces,http://phoenix:6006/v1/traces
  • LANGSMITH_TRACING=true|false
  • LANGSMITH_API_KEY=...
  • LANGSMITH_PROJECT=APRAG-Lab

For normal local use, keep the defaults in docker-compose.yml.
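When you do need to deviate, overrides can be exported in the shell before bringing the stack up. A sketch, assuming docker-compose.yml passes these variables through to the containers:

export APRAG_MODEL_PROFILE=standard
export APRAG_VECTOR_STORE=qdrant
docker compose --profile vector up -d --build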

Additional runtime controls supported by the backend:

  • Model generation: APRAG_OLLAMA_TEMPERATURE, APRAG_OLLAMA_NUM_PREDICT, APRAG_VLM_TEMPERATURE, APRAG_VLM_NUM_PREDICT, APRAG_OLLAMA_THINKING
  • Model timeouts: APRAG_LLM_TIMEOUT, APRAG_VLM_TIMEOUT, APRAG_EMBEDDING_TIMEOUT
  • Model pulling: APRAG_OLLAMA_MODELS
  • Host media runtime: APRAG_HOST_DATA_DIR, APRAG_HOST_MEDIA_HOST, APRAG_HOST_MEDIA_PORT, APRAG_HOST_MEDIA_TIMEOUT, APRAG_HOST_MEDIA_HEALTH_TIMEOUT
  • Transcription: APRAG_TRANSCRIPTION_PROVIDER=faster-whisper|whisper.cpp, APRAG_FASTER_WHISPER_MODEL, APRAG_WHISPER_DEVICE, APRAG_WHISPER_COMPUTE_TYPE, WHISPER_CPP_MODEL
  • OCR and parsing: APRAG_OCR_PROVIDER=tesseract|easyocr, APRAG_EASYOCR_GPU, APRAG_DOC_PARSER=docling
  • Graph RAG: APRAG_USE_LANGGRAPH, APRAG_GRAPH_LLM_MAX_CHUNKS, APRAG_GRAPH_LLM_MAX_SUMMARIES, APRAG_GRAPH_LLM_TIMEOUT, APRAG_GRAPH_LLM_TEMPERATURE, APRAG_GRAPH_LLM_NUM_PREDICT
  • Storage and reliability: DATA_DIR, APRAG_SQLITE_TIMEOUT, APRAG_STRICT_VECTOR_STORE, APRAG_DISK_WARNING_BYTES, APRAG_RESOURCE_CHECK_PATH

Storage Layout

Persistent app data lives under data/:

  • data/sqlite/APRAG-Lab.db: projects, sources, chunks, runs, metrics, feedback, graph state, and job events.
  • data/projects/{project_id}: uploaded source files, derived OCR/transcript/frame artifacts, manifests, and run exports.
  • data/vector_store/chroma: local Chroma data when the API uses embedded Chroma instead of the Compose Chroma service.
  • data/logs/APRAG-Lab.log: local structured application logs.

Docker named volumes hold service-specific data for Qdrant, Chroma, and Phoenix when their containers are running.
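Because the metadata store is a single SQLite file, it can also be inspected offline with the standard sqlite3 CLI, assuming it is installed on the host (the schema is not documented here, so start with .tables):

sqlite3 data/sqlite/APRAG-Lab.db '.tables'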

Verification

Backend tests:

docker compose exec -T api pytest -q

Frontend tests:

docker compose exec -T web npm test -- --run

Frontend production build:

docker compose exec -T web npm run build

These commands assume the stack is already running. If it is stopped, start it first with:

docker compose --profile vector up -d --build

Provider health:

curl http://localhost:8000/api/settings/provider-health
curl http://localhost:8000/api/settings/resource-profile
curl http://localhost:8000/api/settings/models

Logs:

curl "http://localhost:8000/api/diagnostics/logs?limit=200"
tail -f data/logs/APRAG-Lab.log

Browser-level manual testing can be started with:

cd apps/web
APRAG_PLAYWRIGHT_HEADLESS=0 node scripts/manual-product-test.cjs

Stop And Clean Up

Stop containers:

docker compose down

Stop the host media runtime with Ctrl+C in its terminal.

Persistent local product data is stored under:

data/

Delete that folder only when you intentionally want to remove local projects, sources, extracted chunks, runs, logs, and SQLite metadata.

Troubleshooting

If the app says Ollama is unreachable:

curl http://localhost:11434/api/tags

If that fails, start Ollama and reload the app.

If models are missing, use Settings -> Pull missing models or run:

ollama pull qwen3:8b
ollama pull qwen3-vl:4b
ollama pull bge-m3

If media extraction fails, make sure the host media runtime is running and the tools exist:

ffmpeg -version
ffprobe -version
tesseract --version
curl http://localhost:8765/health

If vector dashboards do not open, make sure the app was started with:

docker compose --profile vector up -d --build

If Docker runs out of memory, keep Ollama on the host, use the lite model profile, close other heavy apps, and keep Docker Desktop memory near its available maximum.
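To see which containers are actually consuming memory, take a one-shot snapshot with the standard Docker CLI:

docker stats --no-stream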

Privacy

APRAG-Lab is local-first. By default, source files, extracted content, embeddings, model calls, traces, and benchmark results stay on the local machine. The default product path does not require a cloud LLM provider.