Historical Document Extractor

Historical Document Extractor is a local OCR pipeline for extracting data from historical NSDAP membership cards (Formular A3340), powered by Qwen3.5-9B running through llama.cpp.

📋 Description

The tool automatically reads and extracts data from NSDAP personal cards, including:

Full name
Birth date and place
Occupation
Residential address
Membership number
Joining/leaving/rejoining dates
Region (Gau)

🛠️ Requirements

Python 3.8+
Llama.cpp server with Qwen3.5-9B
Libraries: requests, Pillow, tqdm, json

🚀 Installation

Clone the repository
Install dependencies:
```
pip install requests Pillow tqdm
```

📦 Model Setup

Run the llama.cpp server with your selected Qwen model:

./llama-server -m "$SELECTED_MODEL" \
  --ctx-size 131072 \
  --jinja \
  --tensor-split 24,12 \
  --temp 0.7 \
  --top-p 0.8 \
  --top-k 20 \
  --min-p 0.00 \
  --chat-template-kwargs '{"enable_thinking": false}' \
  --cache-type-k q8_0 --cache-type-v q8_0 \
  -np 1 \
  -fa on \
  -ngl 99

The server must be running at http://localhost:8080/v1/chat/completions

📝 Usage

Place document images in the images/ folder
Run the extractor:
```
python extract.py
```
Results are saved to extraction_results.json

📄 Example Output

{
  "name": "Aatz Heinrich",
  "birth_date": "16.10.00",
  "birth_place": "Herrensohr",
  "occupation": null,
  "residence_city": "Herrensohr",
  "street_address": "B., Bergstr.38",
  "district": "O. G.",
  "gau": "Saarpf.",
  "membership_number": "2690739",
  "joined_date": "25.8.33",
  "left_date": "1.Aug.1933",
  "rejoined_date": null,
  "footer_note": "Weitere Überweisungen umseitig!",
  "source_file": "A3340-MFKL-A0001-0076.png"
}

⚙️ Configuration

Edit SERVER_URL in extract.py if you're using a different port.

📝 License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
images		images
README.md		README.md
extract.py		extract.py
extraction_results.json		extraction_results.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Historical Document Extractor

📋 Description

🛠️ Requirements

🚀 Installation

📦 Model Setup

📝 Usage

📄 Example Output

⚙️ Configuration

📝 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Historical Document Extractor

📋 Description

🛠️ Requirements

🚀 Installation

📦 Model Setup

📝 Usage

📄 Example Output

⚙️ Configuration

📝 License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages