Historical Document Extractor is a local OCR pipeline for extracting data from historical NSDAP membership cards (Formular A3340), powered by Qwen3.5-9B running through llama.cpp.
The tool automatically reads and extracts data from NSDAP personal cards, including:
- Full name
- Birth date and place
- Occupation
- Residential address
- Membership number
- Joining/leaving/rejoining dates
- Region (Gau)
- Python 3.8+
- Llama.cpp server with Qwen3.5-9B
- Libraries:
requests,Pillow,tqdm,json
- Clone the repository
- Install dependencies:
pip install requests Pillow tqdm
Run the llama.cpp server with your selected Qwen model:
./llama-server -m "$SELECTED_MODEL" \
--ctx-size 131072 \
--jinja \
--tensor-split 24,12 \
--temp 0.7 \
--top-p 0.8 \
--top-k 20 \
--min-p 0.00 \
--chat-template-kwargs '{"enable_thinking": false}' \
--cache-type-k q8_0 --cache-type-v q8_0 \
-np 1 \
-fa on \
-ngl 99The server must be running at http://localhost:8080/v1/chat/completions
- Place document images in the
images/folder - Run the extractor:
python extract.py
- Results are saved to
extraction_results.json
{
"name": "Aatz Heinrich",
"birth_date": "16.10.00",
"birth_place": "Herrensohr",
"occupation": null,
"residence_city": "Herrensohr",
"street_address": "B., Bergstr.38",
"district": "O. G.",
"gau": "Saarpf.",
"membership_number": "2690739",
"joined_date": "25.8.33",
"left_date": "1.Aug.1933",
"rejoined_date": null,
"footer_note": "Weitere Überweisungen umseitig!",
"source_file": "A3340-MFKL-A0001-0076.png"
}Edit SERVER_URL in extract.py if you're using a different port.
MIT License