This project transforms raw effluent sensor data from a pulp mill into an interactive regulatory dashboard. It bridges the gap between process engineering and data science by implementing industrial standards directly into the ETL pipeline.
- Overall Compliance Rate: 36.7%
- Financial Risk: Calculated based on a $500/violation model.
- Primary Violation Driver: Turbidity (averaging 4.45 NTU during non-compliance).
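The two headline figures above follow from simple counting, which the Python sketch below illustrates. It is not the project's DAX. The 4.0 NTU limit, the `turbidity` field name, and the rule that each non-compliant sample counts as one violation are all assumptions made for illustration; only the $500 penalty comes from the model described here.

```python
# Penalty taken from the $500/violation model in this README.
PENALTY_PER_VIOLATION_USD = 500

# Hypothetical limit for illustration only; not the project's actual threshold.
HYPOTHETICAL_LIMITS = {"turbidity": 4.0}  # NTU


def summarize_compliance(readings, limits, penalty=PENALTY_PER_VIOLATION_USD):
    """Return (compliance rate in percent, financial risk in USD).

    Assumption: a sample is one violation if any monitored parameter
    exceeds its limit, regardless of how many parameters exceed.
    """
    violations = sum(
        1
        for reading in readings
        if any(reading[name] > limit for name, limit in limits.items())
    )
    total = len(readings)
    rate = 100.0 * (total - violations) / total if total else 0.0
    return rate, violations * penalty


# Made-up sample readings, not taken from the dataset.
sample = [
    {"turbidity": 3.2},
    {"turbidity": 4.6},
    {"turbidity": 5.1},
    {"turbidity": 3.9},
]
rate, risk = summarize_compliance(sample, HYPOTHETICAL_LIMITS)
print(f"{rate:.1f}% compliant, ${risk} at risk")  # 50.0% compliant, $1000 at risk
```

If the model instead charges per exceeded parameter, the `any(...)` test would become a count over parameters.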
Data source for case study: https://www.kaggle.com/datasets/adityakadiwal/water-potability
The pipeline was built in Power Query with a focus on data integrity:
- Locale-Aware Ingestion: Applied `en-US` locale settings to ensure decimal precision (protecting pH and Conductivity readings).
- Dynamic Mean Imputation: Automated null-handling for 491 missing sensor records using a scalable M-code script (`List.Accumulate`).
- Biological Sanity Checks: Verified all pH data falls within the 0.0–14.0 range to eliminate sensor artifacts.
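The three steps above can be sketched in Python to show the logic. This is not the project's M code, which lives in the scripts folder. `functools.reduce` stands in for `List.Accumulate` by folding over the column names with the table as the running state. The function names, the sample rows, and the `ph` and `Conductivity` column names are assumptions for illustration.

```python
from functools import reduce


def parse_en_us(text):
    """Parse a number in en-US notation: ',' groups thousands, '.' is the decimal mark."""
    text = text.strip()
    return float(text.replace(",", "")) if text else None


def fill_column_mean(rows, col):
    """Return new rows with None in `col` replaced by the mean of the non-null values."""
    present = [row[col] for row in rows if row[col] is not None]
    if not present:
        return rows  # nothing to average; leave the column untouched
    mean = sum(present) / len(present)
    return [{**row, col: mean if row[col] is None else row[col]} for row in rows]


def impute_means(rows, columns):
    """Fold over the column names, carrying the table as accumulated state."""
    return reduce(fill_column_mean, columns, rows)


def ph_out_of_range(rows, low=0.0, high=14.0):
    """Return rows whose pH lies outside the physically valid range."""
    return [
        row for row in rows
        if row["ph"] is not None and not (low <= row["ph"] <= high)
    ]


# Made-up sample rows, not taken from the dataset.
raw = [
    {"ph": parse_en_us("7.0"), "Conductivity": parse_en_us("1,204.5")},
    {"ph": None, "Conductivity": parse_en_us("400.0")},
    {"ph": parse_en_us("9.0"), "Conductivity": None},
]
cleaned = impute_means(raw, ["ph", "Conductivity"])
suspect = ph_out_of_range(cleaned)
```

Adding a column to the list passed to `impute_means` is the only change needed to cover another sensor, which is presumably the sense in which the fold-based approach scales.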
- `/data_raw`: Original sensor dataset.
- `/data_processed`: Final Excel dashboard featuring DAX measures and Root Cause Heatmaps.
- `/scripts`: Full M-code documentation for the ETL process.
- `/docs`: Audit logs, terminology, and industrial standards used for the model.