Extract structured data from invoices using a local LLM on your own hardware, with arithmetic validation and a human-in-the-loop approval gate. Zero document content leaves the machine.
A privacy-first document-processing pipeline. Invoices (PDF or image) go in, validated structured JSON comes out, clean documents are committed automatically, and a human only approves the ones the system flags. Built to show that AI automation can cut back-office work without sending sensitive financial documents to any cloud API.
Part of a local-AI series alongside Local AI Data Analyst.
Manual invoice data entry is slow, error-prone, and expensive. Full automation is risky, because a wrong total written into an accounting system is a real problem. This project takes the middle path: the model does the extraction, validation rules catch the obvious errors, and a person approves only the flagged cases. For regulated industries (finance, anything under data-residency rules), the local-only inference is the point: no customer document is ever sent to a third party.
The pipeline runs in four stages, orchestrated end-to-end by n8n:
- Ingest. A text-based PDF is read directly through its text layer. A scanned PDF or an image is run through Tesseract OCR. No vision model is needed, so it runs on a 4GB GPU.
- Extract. A local Ollama model (Qwen2.5) returns JSON constrained to a fixed schema using structured output at temperature 0.
- Validate. Required fields, date formats, and arithmetic
(
line items == subtotal,subtotal + tax == total) are checked. Any failure becomes a listed issue and flips the document to "needs review". - Route. Clean documents are logged automatically. Flagged documents are sent to a reviewer through Telegram with Approve / Disapprove buttons; n8n pauses until a button is pressed, then logs the outcome.
The same Python pipeline runs in evaluation and in production (n8n calls it over a local FastAPI endpoint), so the measured accuracy is the real behaviour.
PDF/Image -> Ingest (text layer | OCR) -> Extract (local LLM) -> Validate
-> needs_review? --no--> log (auto-passed)
--yes--> Telegram approval -> log (approved | rejected)
Evaluated on 50 hand-verified documents (35 clean digital invoices, 15 scanned receipts from the SROIE dataset).
| Metric | Value |
|---|---|
| Field-level accuracy (normalized) | ~100% on the labeled set |
| Auto-pass rate (no human needed) | 22% |
| Routed to human review | 78% |
Honest reading of these numbers: the ~100% field accuracy is optimistic. The
test set contains many scanned receipts that genuinely have no invoice_number,
due_date, or tax, so both the prediction and the ground truth are null and are
counted as correct. The headline accuracy therefore overstates performance on dense,
field-rich invoices. The more meaningful operational metric is the 22% auto-pass
rate: the validation layer is deliberately strict, so most documents are routed to
a quick human check rather than committed blindly.
These are estimates with stated assumptions, not measured production figures.
Assuming manual handling of one invoice (read, categorize, key in, sanity-check) takes ~3 minutes, and a flagged document takes ~20 seconds to review and approve via the Telegram prompt:
| Manual | This pipeline | |
|---|---|---|
| Per 1,000 invoices | ~50 hours | ~4.3 hours of review (78% × 20s) |
| Human touches | 1,000 | 780 |
The savings come from two places: 22% of documents need no human at all, and the remaining 78% become a one-tap approval instead of full manual entry.
All extraction and validation happen locally via Ollama. Invoice content is never sent to OpenAI, Anthropic, or any third party. The only external surfaces are the ones you choose to wire up (the Telegram approval message and the Google Sheets log), and those carry the extracted summary, not the raw document.
python -m venv .venv
.venv\Scripts\activate # Windows
pip install -r requirements.txt
ollama pull qwen2.5:7b # or qwen2.5:3b on a 4GB GPU (set MODEL_NAME)
# Install Tesseract (Windows). If it isn't on PATH:
set TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exeRun one document, evaluate, or serve the API:
python -m src.pipeline data/raw/invoice_001.jpg
python -m eval.run_eval
uvicorn api.main:app --port 8000The n8n workflow (n8n/workflow.json) wires a trigger to the API, branches on
needs_review, and handles the Telegram approval and Google Sheets logging. See
n8n/README.md for the node-by-node build. For local webhook callbacks (Telegram
approval buttons), the workflow is exposed during development through an ngrok tunnel
with WEBHOOK_URL set to the public URL.
- Tuned for invoices; receipts, purchase orders, and other document types are out of scope for v1.
- On a 4GB GPU the smaller
qwen2.5:3bmodel is used; accuracy is lower than 7B. - Field accuracy is reported only for scalar fields. Line-item extraction is not in the headline metric yet.
- Ambiguous date formats (DD/MM vs MM/DD) are normalized but can still be misread.
- The reported accuracy is inflated by null-heavy scanned receipts in the test set; a pure dense-invoice test set would likely score lower.