I'm a Data Engineer based in Greece, working remotely at NIKI Digital Engineering on automotive data pipelines for BMW and AUDI, processing millions of sensor records daily using AWS and PySpark.
My background combines production data engineering with 5 years of scientific research, where I built ETL pipelines processing 20+ TB of satellite climate data during my PhD at the University of Ioannina.
- 🏢 Currently: Data Engineer @ NIKI Digital Engineering (external partner of BMW and AUDI) · Remote
- ☁️ Stack: AWS Glue · Athena · S3 · PySpark · Python · SQL
- 🔬 Background: PhD researcher — large-scale satellite data pipelines (ERA5, NASA, EUMETSAT)
- 📍 Location: Greece
NIKI Digital Engineering · Remote · Jun 2024 – Present
Data Engineering:
- Build production ETL pipelines for automotive data using AWS Glue and PySpark, processing millions of sensor records daily
- Design and maintain analytics tables used in production environments
- Develop SQL transformations and data quality validation frameworks for time-series vehicle data
- Collaborate with stakeholders to optimize pipeline performance and define data schemas
- Tech: Python, PySpark, SQL, AWS (Glue, S3, Athena), Docker, Git
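To give a concrete sense of the time-series validation mentioned above, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for Athena or Spark SQL. The table `sensor_readings`, its columns, and the 1-second gap threshold are hypothetical, not the production schema. The query flags readings that have a missing value or an unexpected gap since the previous reading of the same vehicle, using the `LAG` window function (SQLite 3.25 or newer).

```python
import sqlite3

# Hypothetical schema: one row per sensor reading, timestamps in epoch seconds.
SCHEMA = """
CREATE TABLE sensor_readings (
    vehicle_id TEXT NOT NULL,
    ts         INTEGER NOT NULL,
    speed_kmh  REAL
);
"""

# Flag rows with a NULL value, or a gap larger than :max_gap seconds
# since the previous reading of the same vehicle.
VALIDATION_SQL = """
WITH ordered AS (
    SELECT
        vehicle_id,
        ts,
        speed_kmh,
        ts - LAG(ts) OVER (PARTITION BY vehicle_id ORDER BY ts) AS gap_s
    FROM sensor_readings
)
SELECT vehicle_id, ts, gap_s, speed_kmh IS NULL AS missing_value
FROM ordered
WHERE speed_kmh IS NULL OR gap_s > :max_gap
ORDER BY vehicle_id, ts
"""


def find_issues(rows, max_gap=1):
    """Load rows into an in-memory table and return the flagged readings."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        con.executemany("INSERT INTO sensor_readings VALUES (?, ?, ?)", rows)
        return con.execute(VALIDATION_SQL, {"max_gap": max_gap}).fetchall()
    finally:
        con.close()


readings = [
    ("car-1", 0, 50.0),
    ("car-1", 1, 51.0),
    ("car-1", 5, 52.0),   # 4-second gap
    ("car-2", 0, None),   # missing value
    ("car-2", 1, 30.0),
]
for issue in find_issues(readings):
    print(issue)
```

The `LAG(...) OVER (PARTITION BY ... ORDER BY ...)` pattern is standard SQL, so a query of this shape should carry over to Spark SQL or Athena with small dialect changes (for example, how the threshold parameter is passed in).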
Test Automation:
- Develop automated test frameworks for ECU validation using EXAM and Python
- Perform hardware-in-the-loop (HIL) testing
University of Ioannina · Ioannina, Greece · Oct 2020 – Sep 2025
- Built CLARISC — a cloud database integrating EUMETSAT and NASA satellite datasets, processing 20+ TB of climate data
- Engineered ETL pipelines for large-scale satellite and reanalysis datasets (ERA5, CERES, MERRA-2) using Python, xarray, Dask, and SQL
- Developed EarthSense — an interactive web application to visualize research results
- Performed statistical analysis, data validation, and quality control on multi-dimensional time-series datasets
- Published 4 peer-reviewed papers in high-impact journals (Atmospheric Research, Climatic Change)
- Tech: Python (pandas, xarray, Dask, NumPy, SciPy), SQL, Flask, FastAPI, JavaScript, HPC, Linux, Docker, Git, Fortran
Key Projects:
- 🌍 EarthSense — Interactive web app for PhD climate research results
- 🛰️ NATEX — Satellite data viewer built with Python, Satpy, and Bokeh
- 🌐 Aether — Quick netCDF explorer
- 📊 ERMES — ERA5 and CAMS reanalysis data explorer
| Category | Tools |
|---|---|
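| Languages | Python · SQL · JavaScript · Fortran |
| Big Data | PySpark · Spark SQL · Dask |
| Cloud (AWS) | Glue · Athena · S3 |
| Data Formats | Parquet · JSON · netCDF |
| DevOps | Docker · Git · Linux |
| Currently Learning | |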
A PySpark tool that automatically profiles and validates a dataset before it enters a pipeline.
- Detects nulls, duplicates, outliers, schema mismatches
- Uses Spark SQL for column statistics and aggregations
- Outputs a structured JSON quality report
- Maps directly to AWS Glue → S3 → Athena production workflows
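The checks listed above can be illustrated without a Spark cluster. The snippet below is a plain-Python sketch over a list of dict records, not the tool's PySpark/Spark SQL code; the function name `profile`, the `expected_schema` mapping, and the 1.5 × IQR outlier rule are assumptions made for the example.

```python
import json
from statistics import quantiles


def profile(records, expected_schema, iqr_k=1.5):
    """Return a JSON quality report for a list of dict records.

    expected_schema is a hypothetical mapping of column name -> Python type,
    e.g. {"id": int, "speed": float}.
    """
    report = {"row_count": len(records), "columns": {}}

    # Schema mismatches: expected columns that never appear, and extra ones.
    seen = set().union(*(r.keys() for r in records))
    report["schema"] = {
        "missing_columns": sorted(set(expected_schema) - seen),
        "unexpected_columns": sorted(seen - set(expected_schema)),
    }

    # Exact duplicate rows (same set of column/value pairs).
    keys = [tuple(sorted(r.items())) for r in records]
    report["duplicate_rows"] = len(keys) - len(set(keys))

    for col, col_type in expected_schema.items():
        values = [r.get(col) for r in records]
        non_null = [v for v in values if v is not None]
        stats = {
            "nulls": len(values) - len(non_null),
            "type_mismatches": sum(not isinstance(v, col_type) for v in non_null),
        }
        # Outliers via the IQR rule, for numeric columns with at least 4 values.
        nums = [v for v in non_null
                if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if col_type in (int, float) and len(nums) >= 4:
            q1, _, q3 = quantiles(nums, n=4)
            spread = iqr_k * (q3 - q1)
            stats["outliers"] = sum(not (q1 - spread <= v <= q3 + spread) for v in nums)
        report["columns"][col] = stats

    return json.dumps(report, indent=2)


records = [
    {"id": 1, "speed": 50.0},
    {"id": 2, "speed": 52.0},
    {"id": 2, "speed": 52.0},                      # exact duplicate
    {"id": 3, "speed": None},                      # null
    {"id": 4, "speed": 51.0},
    {"id": 5, "speed": 49.0},
    {"id": "6", "speed": 900.0, "unit": "km/h"},   # wrong type, outlier, extra column
]
print(profile(records, {"id": int, "speed": float}))
```

A Spark version would compute the same per-column counts as aggregations over the DataFrame rather than Python loops, which is what lets it scale to large inputs.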
⚙️ etl-pipeline
A clean, production-style ETL pipeline: Extract → Transform → Load using PySpark + SQL.
- Cleans messy raw data (nulls, casing, invalid values)
- Enriches with calculated columns and business logic
- Spark SQL aggregations for analytics-ready output
- Writes partitioned Parquet, the same layout used with AWS S3 and Athena
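The Extract → Transform → Load flow above can be sketched in plain Python to show the shape of each stage. This is an illustration, not the project's PySpark code: the sample order records, the column names, and the use of CSV instead of Parquet (which would need PySpark or pyarrow) are all assumptions. The output uses the Hive-style `column=value` directory layout that partitioned writes produce.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical raw input: messy strings, inconsistent casing, bad values.
RAW = [
    {"order_id": "1", "country": " gr ", "qty": "2", "unit_price": "10.0"},
    {"order_id": "2", "country": "DE", "qty": "1", "unit_price": "25.5"},
    {"order_id": "3", "country": "de", "qty": "-4", "unit_price": "5.0"},  # invalid qty
    {"order_id": "4", "country": None, "qty": "3", "unit_price": "7.0"},   # null country
    {"order_id": "5", "country": "Gr", "qty": "3", "unit_price": "4.0"},
]


def extract():
    # Stand-in for reading raw files; returns rows as dicts of strings.
    return list(RAW)


def transform(rows):
    clean = []
    for r in rows:
        if r["country"] is None:
            continue  # drop rows missing the partition key
        qty = int(r["qty"])
        if qty <= 0:
            continue  # drop invalid quantities
        price = float(r["unit_price"])
        clean.append({
            "order_id": int(r["order_id"]),
            "country": r["country"].strip().upper(),  # normalise whitespace/casing
            "qty": qty,
            "unit_price": price,
            "revenue": round(qty * price, 2),  # calculated column
        })
    return clean


def aggregate(rows):
    # Equivalent of: SELECT country, SUM(revenue) FROM rows GROUP BY country
    totals = defaultdict(float)
    for r in rows:
        totals[r["country"]] += r["revenue"]
    return dict(totals)


def load(rows, out_dir):
    # Hive-style layout: out_dir/country=GR/part-00000.csv
    by_country = defaultdict(list)
    for r in rows:
        by_country[r["country"]].append(r)
    for country, part in by_country.items():
        part_dir = Path(out_dir) / f"country={country}"
        part_dir.mkdir(parents=True, exist_ok=True)
        with open(part_dir / "part-00000.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["order_id", "qty", "unit_price", "revenue"])
            writer.writeheader()
            for r in part:
                writer.writerow({k: v for k, v in r.items() if k != "country"})
    return sorted(p.name for p in Path(out_dir).iterdir())


rows = transform(extract())
print(aggregate(rows))  # {'GR': 32.0, 'DE': 25.5}
with tempfile.TemporaryDirectory() as tmp:
    print(load(rows, tmp))  # ['country=DE', 'country=GR']
```

In PySpark the same layout comes from `df.write.partitionBy("country").parquet(path)`, which creates one `country=<value>` subdirectory per value; Athena can then treat `country` as a partition column and skip irrelevant directories at query time.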