Converts raw folder-based datasets into a structured SQLite database (and optional CSV/JSON exports) for querying, filtering, and analysis — usable standalone or as part of the WETTER framework.
This project is part of the WETTER framework, a modular toolkit for structured processing, analysis, and exploration of scientific imaging datasets.
Pipeline: Raw Data → DAMPF → KEIM → WOLKE → BLITZ
At its core, WETTER operates on a structured SQLite dataset:
DAMPF converts raw file-based archives into a queryable database, which is then extended (KEIM), explored (WOLKE), and visualized (BLITZ).
The central entry point, ecosystem overview, and module navigation are available here:
wetter.mess.engineering
- DAMPF — Data Aggregation & Modular Processing Framework
  Semantic indexing of file and folder structures into a structured SQLite dataset (primary data source).
- KEIM — Knowledge Extraction & Inference Module
  Scalar statistics, feature extraction, and analytical evaluation on top of the indexed dataset.
- WOLKE — Web-Oriented Layout for Knowledge Exploration
  Browser-based exploration of structured metadata with filtering, selection, and dataset navigation.
- BLITZ — Bulk Loading and Interactive Time series Zonal analysis
  High-performance interactive visualization and analysis of selected data.
DAMPF is the entry point of the WETTER pipeline.
It performs semantic indexing of file and folder structures and produces a SQLite database as its primary output: one row per file, typed columns, and supporting tables (`column_metadata`, optional `config`). This turns raw archives into structured metadata that can be queried, filtered, and used programmatically (WOLKE, BLITZ, scripts).
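For example, a minimal Python sketch of such programmatic access (the `data` table name is documented below; the `modality` and `temp` columns are assumptions, so check `column_metadata` in your own database for the actual layout):

```python
import sqlite3

# Query a DAMPF-generated database. "data" is the documented table name;
# the "modality" and "temp" columns below are assumptions -- inspect
# column_metadata in your own database for the real column layout.
con = sqlite3.connect("my_dataset.db")  # hypothetical path
con.row_factory = sqlite3.Row

rows = con.execute(
    "SELECT path, temp FROM data WHERE modality = ? AND temp < ?",
    ("ir", 25.0),
).fetchall()
for row in rows:
    print(row["path"], row["temp"])
con.close()
```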
DAMPF automatically structures large collections of measurement files based on folder and file naming conventions.
Semantic information such as date, run number, temperature, exposure time, camera ID, sample name, or modality is extracted from paths and filenames and mapped into a uniform schema.
The resulting dataset can be explored and analyzed by other tools in the WETTER ecosystem.
Typical use cases:
- Scientific image data (PNG, TIFF, JPG)
- NumPy arrays (NPY, NPZ)
- CSV/text measurement data
- HDF5 files
- Mixed folder structures with experimental parameters in the path
1. Folder scan
   - Recursive
   - Filter on defined file types
2. Tokenization
   - Split on `_ - . /`
   - Split at letter↔digit boundaries
   - Include folder names
3. Rule-based parsing
   - Date detection (ISO, compact)
   - Run / Rep / Trial
   - Camera / Channel / ROI / Frame
   - Exposure (ns, us, ms, s)
   - Temperature (°C, K)
   - Frequency, voltage, current
   - Modality tags
   - Processing tags (raw, dark, flat, avg, ...)
4. Structured schema

   ```json
   {
     "date": "2026-02-28",
     "run": 3,
     "temp": {"val": 20.0, "unit": "c"},
     "exposure": {"val": 2.0, "unit": "us"},
     "camera": 1,
     "modality": "ir",
     "processing": ["raw"],
     "path": "...",
     "confidence": 0.93
   }
   ```

5. Structured output
   - SQLite (primary for WETTER tools): one row per file in the `data` table, plus `column_metadata` and optional `config`.
   - Optional ASCII exports: `index.jsonl`, `index.csv`, `index_meta.json` for quick inspection, spreadsheets, or diffs.
Step 5 details match the Output section below (defaults, tables, column layout).
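To make steps 2 and 3 concrete, here is a small, self-contained Python sketch. It is not DAMPF's actual implementation: the rule table, the conversion lambdas, and the confidence heuristic are simplified assumptions based on the step descriptions above.

```python
import re

# Illustrative sketch of steps 2-3 (tokenization + rule-based parsing).
# Simplified assumptions, not DAMPF's real code; the letter<->digit split
# is folded into the per-token regexes here.

def tokenize(path: str) -> list[str]:
    """Split a path into tokens on _ - . / \\ (folder names included)."""
    return [t for t in re.split(r"[_\-./\\]+", path) if t]

RULES = [
    ("date", re.compile(r"^(20\d{2})(\d{2})(\d{2})$"),  # compact date
     lambda m: f"{m[1]}-{m[2]}-{m[3]}"),
    ("run", re.compile(r"^run(\d+)$", re.I), lambda m: int(m[1])),
    ("camera", re.compile(r"^cam(\d+)$", re.I), lambda m: int(m[1])),
    ("exposure", re.compile(r"^(\d+(?:\.\d+)?)(ns|us|ms|s)$", re.I),
     lambda m: {"val": float(m[1]), "unit": m[2].lower()}),
    ("temp", re.compile(r"^(-?\d+(?:\.\d+)?)(c|k)$", re.I),
     lambda m: {"val": float(m[1]), "unit": m[2].lower()}),
]

def parse(path: str) -> dict:
    record: dict = {"path": path}
    hits = 0
    tokens = tokenize(path)
    for token in tokens:
        for key, pattern, convert in RULES:
            m = pattern.match(token)
            if m and key not in record:
                record[key] = convert(m)
                hits += 1
                break
    # Toy confidence: fraction of tokens that matched a rule.
    record["confidence"] = round(hits / max(len(tokens), 1), 2)
    return record

print(parse(r"20260228/run3_2us_20c_cam1_ir_raw.tif"))
```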
The GUI still reads and writes `dampf.ini` next to the executable (frozen) or in the current working directory (dev). A reference copy of the default layout is in `config/dampf.ini`.
CLI:
Adjust `ROOT_DIR`, `OUTPUT_DIR`, `ALLOWED_SUFFIXES`, and `FOLDER_TOKEN_DEPTH` in `app/dampf/filename_semantic_indexer.py`. That script writes JSONL/CSV only; for SQLite, use the GUI's Save to DB or call `write_index_to_sqlite` from code.
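For orientation, those constants might look like this (illustrative values only; the names come from the docs above, the values and exact types are placeholders):

```python
# Top of app/dampf/filename_semantic_indexer.py -- placeholder values:
ROOT_DIR = "/data/measurements/campaign_2026"              # hypothetical crawl root
OUTPUT_DIR = ROOT_DIR + "/_index"                          # JSONL/CSV export folder
ALLOWED_SUFFIXES = {".png", ".tif", ".npy", ".csv", ".h5"} # file-type filter
FOLDER_TOKEN_DEPTH = 3  # how many parent folders contribute tokens
```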
GUI:
- Set crawl root and output folder (default export folder: `root/_index`)
- Set database path (default: `root/<root_folder_name>.db`, same rule as `default_db_path_for_root` in `app/dampf/db_writer.py`)
- Save schema as `project.json` / `project.ini`
- Reload via "Load root according to schema"
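That default-path rule can be expressed in a few lines. A simplified sketch (the real `default_db_path_for_root` in `app/dampf/db_writer.py` additionally sanitizes the name and handles fallback cases):

```python
from pathlib import Path

def default_db_path(root: Path) -> Path:
    # <crawl_root>/<root_folder_name>.db, per the rule described above.
    return root / f"{root.name}.db"

print(default_db_path(Path("/data/measurements/campaign_2026")))
# -> /data/measurements/campaign_2026/campaign_2026.db
```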
Naming conventions:

- Start: `uv run dampf`, or `python -m dampf.gui` (with `app` on `PYTHONPATH`, e.g. `set PYTHONPATH=app` before `python`), or `dist\DAMPF.exe`
- Workflow: Root folder → Crawl → Adjust nomenclature → Save schema → Save to DB (primary) → optional Export JSONL/CSV to `_index` (or chosen output folder)
- Schema handling: Save as `project.json` / `project.ini`; reapply to new datasets
- Outlier detection (see the sketch after this list):
  - Identifies inconsistent paths / typos
  - Suggests corrections
  - Optional filtered view ("Outliers only")
- Build EXE: `pyinstaller build/DAMPF.spec` → `dist\DAMPF.exe` (or `uv run python scripts/build_exe.py`)
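The exact outlier heuristic is not documented here; one common approach, shown below as a frequency-based sketch (an assumption, not necessarily DAMPF's method), flags values that occur in only a tiny fraction of rows, which catches typos and inconsistent naming:

```python
from collections import Counter

# Frequency-based outlier sketch (an assumption, not DAMPF's documented
# method): values appearing in <= max_share of rows are flagged as
# likely typos or inconsistent naming.
def find_outliers(values: list[str], max_share: float = 0.02) -> set[str]:
    counts = Counter(values)
    total = len(values)
    return {v for v, n in counts.items() if n / total <= max_share}

modalities = ["ir"] * 60 + ["vis"] * 39 + ["viz"]  # "viz" is a typo
print(find_outliers(modalities))  # -> {'viz'}
```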
Outputs integrate directly with:
- WOLKE (metadata exploration)
- BLITZ (image visualization)
Each indexed file corresponds to one row in the primary store (the SQLite `data` table).
- SQLite (primary handoff for WOLKE / BLITZ): path is user-configurable; the GUI default matches `default_db_path_for_root` in `app/dampf/db_writer.py` (`<crawl_root>/<root_folder_name>.db`, with sanitization and fallbacks).
  - `data` table (values)
  - `column_metadata` (units, traceability)
  - optional `config`
- Optional ASCII exports in the output folder (default `root/_index/`):
  - `index.jsonl` — line-based, machine-readable
  - `index.csv` — spreadsheet-friendly
  - `index_meta.json` — column metadata (units, raw units)
Includes:
- numeric-only value columns
- separate unit storage
- confidence score per file
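For quick inspection without SQLite, the JSONL export can be consumed line by line. A minimal sketch (field names follow the schema example above):

```python
import json
from pathlib import Path

# Read the optional JSONL export: one JSON record per line, one line per file.
records = [
    json.loads(line)
    for line in Path("_index/index.jsonl").read_text(encoding="utf-8").splitlines()
    if line.strip()
]

# Example filter: low-confidence rows that may need rule or schema adjustments.
for rec in records:
    if rec.get("confidence", 1.0) < 0.5:
        print(rec["path"], rec["confidence"])
```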
File naming is typically semi-structured. Rule-based parsing is:
- deterministic
- reproducible
- fast
- easy to debug
ML/LLM approaches are useful when:
- naming conventions vary widely
- ambiguities increase
- adaptive parsing is required
The baseline is intentionally deterministic.
- Add regex rules
- Add domain-specific tokens
- Refine confidence scoring
- Optional: ML fallback for unknown tokens
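Adding a rule can be as small as registering one more pattern. Continuing the illustrative RULES table from the parsing sketch above (a hypothetical `pressure` token; DAMPF's real rule registry may differ):

```python
import re

# Hypothetical domain-specific rule for tokens like "1p5mbar" or "100pa".
# Mirrors the illustrative RULES table in the parsing sketch above.
PRESSURE_RULE = (
    "pressure",
    re.compile(r"^(\d+(?:p\d+)?)(mbar|pa|torr)$", re.I),
    lambda m: {"val": float(m[1].replace("p", ".")), "unit": m[2].lower()},
)

# RULES.append(PRESSURE_RULE)  # register alongside the existing rules
```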
Developed and maintained by
M.E.S.S. — Mattern Engineering & Software Solutions
Parts of this work were influenced by prior research activities in an academic context, including work at INP Greifswald.
