Skip to content

FaizarM/AI-Invoice-Extractor

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Local AI Invoice Extractor

Extract structured data from invoices using a local LLM on your own hardware, with arithmetic validation and a human-in-the-loop approval gate. Zero document content leaves the machine.

A privacy-first document-processing pipeline. Invoices (PDF or image) go in, validated structured JSON comes out, clean documents are committed automatically, and a human only approves the ones the system flags. Built to show that AI automation can cut back-office work without sending sensitive financial documents to any cloud API.

Part of a local-AI series alongside Local AI Data Analyst.

Why this exists

Manual invoice data entry is slow, error-prone, and expensive. Full automation is risky, because a wrong total written into an accounting system is a real problem. This project takes the middle path: the model does the extraction, validation rules catch the obvious errors, and a person approves only the flagged cases. For regulated industries (finance, anything under data-residency rules), the local-only inference is the point: no customer document is ever sent to a third party.

How it works

The pipeline runs in four stages, orchestrated end-to-end by n8n:

  1. Ingest. A text-based PDF is read directly through its text layer. A scanned PDF or an image is run through Tesseract OCR. No vision model is needed, so it runs on a 4GB GPU.
  2. Extract. A local Ollama model (Qwen2.5) returns JSON constrained to a fixed schema using structured output at temperature 0.
  3. Validate. Required fields, date formats, and arithmetic (line items == subtotal, subtotal + tax == total) are checked. Any failure becomes a listed issue and flips the document to "needs review".
  4. Route. Clean documents are logged automatically. Flagged documents are sent to a reviewer through Telegram with Approve / Disapprove buttons; n8n pauses until a button is pressed, then logs the outcome.

The same Python pipeline runs in evaluation and in production (n8n calls it over a local FastAPI endpoint), so the measured accuracy is the real behaviour.

PDF/Image -> Ingest (text layer | OCR) -> Extract (local LLM) -> Validate
   -> needs_review? --no--> log (auto-passed)
                     --yes--> Telegram approval -> log (approved | rejected)

Results

Evaluated on 50 hand-verified documents (35 clean digital invoices, 15 scanned receipts from the SROIE dataset).

Metric Value
Field-level accuracy (normalized) ~100% on the labeled set
Auto-pass rate (no human needed) 22%
Routed to human review 78%

Honest reading of these numbers: the ~100% field accuracy is optimistic. The test set contains many scanned receipts that genuinely have no invoice_number, due_date, or tax, so both the prediction and the ground truth are null and are counted as correct. The headline accuracy therefore overstates performance on dense, field-rich invoices. The more meaningful operational metric is the 22% auto-pass rate: the validation layer is deliberately strict, so most documents are routed to a quick human check rather than committed blindly.

Resource saving (illustrative)

These are estimates with stated assumptions, not measured production figures.

Assuming manual handling of one invoice (read, categorize, key in, sanity-check) takes ~3 minutes, and a flagged document takes ~20 seconds to review and approve via the Telegram prompt:

Manual This pipeline
Per 1,000 invoices ~50 hours ~4.3 hours of review (78% × 20s)
Human touches 1,000 780

The savings come from two places: 22% of documents need no human at all, and the remaining 78% become a one-tap approval instead of full manual entry.

Privacy

All extraction and validation happen locally via Ollama. Invoice content is never sent to OpenAI, Anthropic, or any third party. The only external surfaces are the ones you choose to wire up (the Telegram approval message and the Google Sheets log), and those carry the extracted summary, not the raw document.

Setup

python -m venv .venv
.venv\Scripts\activate          # Windows
pip install -r requirements.txt

ollama pull qwen2.5:7b          # or qwen2.5:3b on a 4GB GPU (set MODEL_NAME)

# Install Tesseract (Windows). If it isn't on PATH:
set TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exe

Run one document, evaluate, or serve the API:

python -m src.pipeline data/raw/invoice_001.jpg
python -m eval.run_eval
uvicorn api.main:app --port 8000

The n8n workflow (n8n/workflow.json) wires a trigger to the API, branches on needs_review, and handles the Telegram approval and Google Sheets logging. See n8n/README.md for the node-by-node build. For local webhook callbacks (Telegram approval buttons), the workflow is exposed during development through an ngrok tunnel with WEBHOOK_URL set to the public URL.

Limitations

  • Tuned for invoices; receipts, purchase orders, and other document types are out of scope for v1.
  • On a 4GB GPU the smaller qwen2.5:3b model is used; accuracy is lower than 7B.
  • Field accuracy is reported only for scalar fields. Line-item extraction is not in the headline metric yet.
  • Ambiguous date formats (DD/MM vs MM/DD) are normalized but can still be misread.
  • The reported accuracy is inflated by null-heavy scanned receipts in the test set; a pure dense-invoice test set would likely score lower.

About

Privacy-first invoice data extraction with a local LLM (Ollama), arithmetic validation, and human-in-the-loop approval via n8n + Telegram.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages