PiMaV/DAMPF

DAMPF — Data Aggregation & Modular Processing Framework

TL;DR

Converts raw folder-based datasets into a structured SQLite database (and optional CSV/JSON exports) for querying, filtering, and analysis — usable standalone or as part of the WETTER framework.

WETTER Framework


This project is part of the WETTER framework, a modular toolkit for structured processing, analysis, and exploration of scientific imaging datasets.

Pipeline: Raw Data → DAMPF → KEIM → WOLKE → BLITZ

At its core, WETTER operates on a structured SQLite dataset:
DAMPF converts raw file-based archives into a queryable database, which is then extended (KEIM), explored (WOLKE), and visualized (BLITZ).

The central entry point, ecosystem overview, and module navigation are available here:
wetter.mess.engineering

Modules at a glance

  • DAMPF — Data Aggregation & Modular Processing Framework
    Semantic indexing of file and folder structures into a structured SQLite dataset (primary data source).

  • KEIM — Knowledge Extraction & Inference Module
    Scalar statistics, feature extraction, and analytical evaluation on top of the indexed dataset.

  • WOLKE — Web-Oriented Layout for Knowledge Exploration
    Browser-based exploration of structured metadata with filtering, selection, and dataset navigation.

  • BLITZ — Bulk Loading and Interactive Time series Zonal analysis
    High-performance interactive visualization and analysis of selected data.


What is DAMPF?

DAMPF is the entry point of the WETTER pipeline.
It performs semantic indexing of file and folder structures and produces a SQLite database as its primary output: one row per file, typed columns, and supporting tables (column_metadata, optional config). This turns raw archives into structured metadata that can be queried, filtered, and used programmatically (WOLKE, BLITZ, scripts).


DAMPF GUI


Purpose

DAMPF automatically structures large collections of measurement files based on folder and file naming conventions.

Semantic information such as date, run number, temperature, exposure time, camera ID, sample name, or modality is extracted from paths and filenames and mapped into a uniform schema.

The resulting dataset can be explored and analyzed by other tools in the WETTER ecosystem.

Typical use cases:

  • Scientific image data (PNG, TIFF, JPG)
  • NumPy arrays (NPY, NPZ)
  • CSV/text measurement data
  • HDF5 files
  • Mixed folder structures with experimental parameters in the path
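Handling these inputs starts with a recursive, suffix-filtered walk of the archive. A minimal sketch (the suffix set here is illustrative, not DAMPF's actual ALLOWED_SUFFIXES default):

```python
from pathlib import Path

# Illustrative allow-list; the real default lives in the indexer config.
ALLOWED_SUFFIXES = {".png", ".tiff", ".jpg", ".npy", ".npz", ".csv", ".h5"}

def crawl(root):
    """Recursively yield files whose suffix is on the allow-list."""
    for p in sorted(Path(root).rglob("*")):
        if p.is_file() and p.suffix.lower() in ALLOWED_SUFFIXES:
            yield p
```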

Indexing Pipeline

  1. Folder scan

    • Recursive
    • Filter on defined file types
  2. Tokenization

    • Split on _ - . /
    • Split at letter↔digit boundaries
    • Include folder names
  3. Rule-based parsing

    • Date detection (ISO, compact)
    • Run / Rep / Trial
    • Camera / Channel / ROI / Frame
    • Exposure (ns, us, ms, s)
    • Temperature (°C, K)
    • Frequency, voltage, current
    • Modality tags
    • Processing tags (raw, dark, flat, avg, ...)
  4. Structured schema

{
  "date": "2026-02-28",
  "run": 3,
  "temp": {"val": 20.0, "unit": "c"},
  "exposure": {"val": 2.0, "unit": "us"},
  "camera": 1,
  "modality": "ir",
  "processing": ["raw"],
  "path": "...",
  "confidence": 0.93
}
  5. Structured output
  • SQLite (primary for WETTER tools): one row per file in the data table, plus column_metadata and optional config.
  • Optional ASCII exports: index.jsonl, index.csv, index_meta.json for quick inspection, spreadsheets, or diffs.

Step 5 details match the Output section below (defaults, tables, column layout).
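Steps 2 and 3 can be sketched in a few lines. The exposure rule below is one illustrative rule, not the full rule set, and the helper names are hypothetical:

```python
import re

DELIMS = re.compile(r"[_\-./\\]+")        # step 2: split on _ - . /
BOUNDARY = re.compile(r"[A-Za-z]+|\d+")   # step 2: letter<->digit boundaries
EXPOSURE = re.compile(r"^(\d+(?:\.\d+)?)(ns|us|ms|s)$", re.IGNORECASE)

def tokenize(path):
    """Split a path into tokens on delimiters and letter/digit boundaries."""
    tokens = []
    for part in DELIMS.split(path):
        tokens += BOUNDARY.findall(part)
    return tokens

def parse_exposure(part):
    """One illustrative step-3 rule: '2us' -> {'val': 2.0, 'unit': 'us'}."""
    m = EXPOSURE.match(part)
    if m:
        return {"val": float(m.group(1)), "unit": m.group(2).lower()}
    return None

tokens = tokenize("2026-02-28_run3_cam1_2us_raw.tiff")
# -> ['2026', '02', '28', 'run', '3', 'cam', '1', '2', 'us', 'raw', 'tiff']
exposure = parse_exposure("2us")  # -> {'val': 2.0, 'unit': 'us'}
```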


Configuration

The GUI still reads and writes dampf.ini next to the executable (frozen) or in the current working directory (dev). A reference copy of the default layout is in config/dampf.ini.

CLI: Adjust in app/dampf/filename_semantic_indexer.py: ROOT_DIR, OUTPUT_DIR, ALLOWED_SUFFIXES, FOLDER_TOKEN_DEPTH. That script writes JSONL/CSV only; for SQLite use the GUI Save to DB or call write_index_to_sqlite from code.

GUI:

  • Set crawl root and output folder (default export folder: root/_index)
  • Set database path (default: root/<root_folder_name>.db, same rule as default_db_path_for_root in app/dampf/db_writer.py)
  • Save schema as project.json / project.ini
  • Reload via “Load root according to schema”

Naming conventions:


GUI and standalone EXE

  • Start: uv run dampf, or python -m dampf.gui (with app on PYTHONPATH, e.g. set PYTHONPATH=app before python), or dist\DAMPF.exe

  • Workflow: Root folder → Crawl → Adjust nomenclature → Save schema → Save to DB (primary) → optional Export JSONL/CSV to _index (or chosen output folder)

  • Schema handling: save as project.json / project.ini; reapply to new datasets

  • Outlier detection:

    • Identifies inconsistent paths / typos
    • Suggests corrections
    • Optional filtered view (“Outliers only”)
  • Build EXE: pyinstaller build/DAMPF.spec → dist\DAMPF.exe (or uv run python scripts/build_exe.py)
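Outlier detection of the kind described above could, for example, flag files whose token-pattern signature is rare across the dataset. A hedged sketch of such a frequency heuristic (not necessarily DAMPF's actual algorithm):

```python
from collections import Counter

def outliers(signatures, min_share=0.1):
    """Return indices of files whose token-pattern signature is rare.

    A signature here is a per-file summary of which semantic fields were
    parsed (hypothetical representation); patterns occurring in less than
    min_share of all files are flagged as likely typos or inconsistencies.
    """
    counts = Counter(signatures)
    total = len(signatures)
    rare = {sig for sig, n in counts.items() if n / total < min_share}
    return [i for i, sig in enumerate(signatures) if sig in rare]

sigs = ["date_run_cam"] * 19 + ["date_rnu_cam"]  # one typo'd path pattern
print(outliers(sigs))  # -> [19]
```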

Outputs integrate directly with:

  • WOLKE (metadata exploration)
  • BLITZ (image visualization)

Output

Each indexed file corresponds to one row in the primary store (SQLite data table).

  • SQLite (primary handoff for WOLKE / BLITZ): path is user-configurable; GUI default matches default_db_path_for_root in app/dampf/db_writer.py (<crawl_root>/<root_folder_name>.db, with sanitization and fallbacks).

    • data table (values)
    • column_metadata (units, traceability)
    • optional config
  • Optional ASCII exports in the output folder (default root/_index/):

    • index.jsonl — line-based, machine-readable
    • index.csv — spreadsheet-friendly
    • index_meta.json — column metadata (units, raw units)

Includes:

  • numeric-only value columns
  • separate unit storage
  • confidence score per file
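The JSONL export is easy to post-process line by line; a minimal sketch of filtering records on the per-file confidence score (the sample lines are stand-ins for real exported records):

```python
import io
import json

# Two fabricated index.jsonl lines; in practice, open the exported file.
jsonl = io.StringIO(
    '{"path": "a/run1/x.tiff", "run": 1, "confidence": 0.93}\n'
    '{"path": "a/run2/y.tiff", "run": 2, "confidence": 0.41}\n'
)

# One JSON object per line; keep only confidently parsed files.
records = [json.loads(line) for line in jsonl]
reliable = [r["path"] for r in records if r["confidence"] >= 0.9]
print(reliable)  # -> ['a/run1/x.tiff']
```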

Why rule-based?

File naming is typically semi-structured. Rule-based parsing is:

  • deterministic
  • reproducible
  • fast
  • easy to debug

ML/LLM approaches are useful when:

  • naming conventions vary strongly
  • ambiguities increase
  • adaptive parsing is required

The baseline is intentionally deterministic.


Extension

  • Add regex rules
  • Add domain-specific tokens
  • Refine confidence scoring
  • Optional: ML fallback for unknown tokens
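New regex rules could, for instance, live in a small registry that the parser consults per token; the pressure rule and registry shape below are hypothetical illustrations of the extension point, not DAMPF's internal API:

```python
import re

# Hypothetical rule registry: (field name, pattern, converter).
RULES = [
    (
        "pressure",
        re.compile(r"^(\d+(?:\.\d+)?)(mbar|pa)$", re.IGNORECASE),
        lambda m: {"val": float(m.group(1)), "unit": m.group(2).lower()},
    ),
]

def apply_rules(token):
    """Return (field, parsed value) for the first matching rule, else None."""
    for name, rx, conv in RULES:
        m = rx.match(token)
        if m:
            return name, conv(m)
    return None

print(apply_rules("250mbar"))  # -> ('pressure', {'val': 250.0, 'unit': 'mbar'})
```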

About

Developed and maintained by
M.E.S.S. — Mattern Engineering & Software Solutions

Parts of this work were influenced by prior research activities in an academic context, including work at INP Greifswald.
