Multi-project data platform demonstrating production patterns used in real data engineering work.
Streaming ingestion with Redpanda (Kafka), medallion-layer transformations with dbt, polyglot implementations in Python and Go, and a hybrid-cloud bridge to BigQuery. Every project is self-contained, runs locally via Docker, and mirrors patterns from client work at retail and SaaS scale.
Live CV · Portfolio · LinkedIn
```
data-platform/
├── foundation/                   # Shared infrastructure (Docker services)
│   ├── docker-compose.yml        # PostgreSQL, Redpanda, Redis
│   └── shared/                   # Reusable libraries (messaging, database, models)
│
├── warehouse/                    # BigQuery + dbt medallion (staging → intermediate → marts)
│   └── models/
│       ├── staging/              # stg_events — source normalization + country null-fix
│       ├── intermediate/         # int_events — bot detection, PPP pricing, country backfill
│       └── marts/                # mart_funnel, mart_campaign_performance, mart_session_stats
│
├── projects/
│   ├── ecommerce-dbt/            # Python — Kafka → PostgreSQL → dbt
│   ├── go-ecommerce/             # Go — direct port of ecommerce-dbt for shadow deployment
│   ├── go-marketing-analytics/   # Go — multi-source marketing platform (GA4 + CRM + Ads)
│   └── hybrid-cloud-bridge/      # Python + GCP — GCS → BigQuery via Cloud Functions
│
├── scripts/                      # Setup, verification, and quality check scripts
└── tests/                        # Unit and integration tests
```
```mermaid
flowchart TB
    subgraph Foundation["Foundation (Docker)"]
        PG[(PostgreSQL<br/>:5433)]
        RP[Redpanda<br/>:19092]
        RD[(Redis<br/>:6379)]
    end
    subgraph Projects
        EC_PY[ecommerce-dbt<br/>Python]
        EC_GO[go-ecommerce<br/>Go]
        MKT[go-marketing-analytics<br/>Go]
        HB[hybrid-cloud-bridge<br/>Python]
    end
    subgraph Cloud["GCP"]
        BQ[(BigQuery)]
        GCS[(GCS)]
    end
    subgraph Warehouse["Warehouse (dbt)"]
        STG[Staging]
        INT[Intermediate]
        MRT[Marts]
    end
    EC_PY --> RP
    EC_GO --> RP
    MKT --> RP
    RP --> PG
    HB --> BQ
    HB --> GCS
    GCS --> BQ
    BQ --> STG
    STG --> INT
    INT --> MRT
```
Containerized shared infrastructure. Every project runs against the same local services with namespace isolation (separate Postgres databases, prefixed Kafka topics, prefixed Redis keys).
| Service | Purpose | Port | Web UI |
|---|---|---|---|
| PostgreSQL | Operational + warehouse DB | 5433 | — |
| Redpanda | Kafka-compatible streaming | 19092 | Console |
| Redis | Caching layer | 6379 | — |
See foundation/README.md for architecture details.
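The real helpers live in `foundation/shared/`; the function names below are hypothetical, but a minimal sketch of the prefixing convention might look like:

```python
# Hypothetical sketch of the namespace-isolation convention: every project
# derives its Kafka topics and Redis keys from its own prefix, so all
# projects can safely share one Redpanda broker and one Redis instance.

def topic_name(project: str, topic: str) -> str:
    """Prefix a Kafka topic with the owning project, e.g. 'ecommerce.orders'."""
    return f"{project}.{topic}"

def redis_key(project: str, *parts: str) -> str:
    """Prefix a Redis key, e.g. 'ecommerce:session:42'."""
    return ":".join((project, *parts))

print(topic_name("ecommerce", "orders"))        # ecommerce.orders
print(redis_key("ecommerce", "session", "42"))  # ecommerce:session:42
```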
End-to-end real-time e-commerce pipeline. Python data generator publishes to Redpanda, a Kafka consumer lands events into PostgreSQL, and dbt transforms them into analytics-ready tables.
Stack: Python, Redpanda (Kafka), PostgreSQL, dbt, Docker
Project details →
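The project's actual event schema and distributions live in its generator code; as an illustrative sketch (field names and weights here are assumptions), a tunable synthetic event source could look like:

```python
import random

# Illustrative synthetic event generator with tunable weights — field names
# and probabilities are assumptions, not the project's real schema.
EVENT_WEIGHTS = {"view": 0.70, "click": 0.25, "purchase": 0.05}

def generate_event(rng: random.Random) -> dict:
    event_type = rng.choices(list(EVENT_WEIGHTS), weights=list(EVENT_WEIGHTS.values()))[0]
    return {
        "event_type": event_type,
        "session_id": f"s{rng.randint(1, 1000)}",
        # Only purchases carry a monetary amount.
        "amount": round(rng.uniform(5, 200), 2) if event_type == "purchase" else None,
    }

rng = random.Random(42)  # seeded for reproducible runs
events = [generate_event(rng) for _ in range(5)]
```

Seeding the generator makes the Python and Go pipelines comparable run-to-run.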
Direct port of ecommerce-dbt to Go, designed to run side-by-side with the Python version against the same Redpanda topics and PostgreSQL schema. Uses separate Kafka consumer groups (`orders_ingestion_go` vs `orders_ingestion`), so both pipelines can process the same events concurrently — useful for performance comparisons and shadow deployments.
Highlights:
- Event generator mirroring the exact statistical distribution of the Python version
- Highly concurrent Kafka consumer using goroutines (one per topic)
- Bulk insert via `jackc/pgx` with `ON CONFLICT` deduplication
Stack: Go 1.25, segmentio/kafka-go, jackc/pgx, Redpanda, PostgreSQL
Project details →
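The Go consumer issues this through `jackc/pgx`; table and column names below are hypothetical, but the SQL shape of a replay-safe insert is worth seeing on its own:

```python
# Build a replay-safe insert: re-delivered Kafka messages hit the
# ON CONFLICT clause and are dropped instead of creating duplicate rows.
# (Table/column names are illustrative, not the project's real schema.)

def build_upsert_sql(table: str, columns: list[str], conflict_key: str) -> str:
    cols = ", ".join(columns)
    params = ", ".join(f"${i}" for i in range(1, len(columns) + 1))
    return (
        f"INSERT INTO {table} ({cols}) VALUES ({params}) "
        f"ON CONFLICT ({conflict_key}) DO NOTHING"
    )

print(build_upsert_sql("orders", ["order_id", "amount"], "order_id"))
```

Because the statement is idempotent, the consumer can safely reprocess a topic from the earliest offset.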
Real-time marketing data platform. Simulates GA4 page views, CRM lead lifecycle events (created → qualified → opportunity → won/lost), and paid-media ad spend — all unified through Redpanda ingestion and attributed across sources in PostgreSQL.
Highlights:
- Full GA4-style event tracking with UTM parameters
- Cross-source attribution: GA4 `utm_source` → CRM `lead_source` → Ads platform spend
- High-throughput Go services with goroutines, Kafka batching, and pgx connection pooling
- BigQuery client integration (`cloud.google.com/go/bigquery`) for warehouse uplift
Stack: Go 1.25, segmentio/kafka-go, jackc/pgx, cloud.google.com/go/bigquery, Redpanda, PostgreSQL, dbt
Project details →
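One concrete payoff of the cross-source join is cost-per-lead by source: count CRM leads per `lead_source` and divide the matching ad spend into them. A minimal sketch (record shapes beyond the `lead_source` key are assumptions):

```python
from collections import Counter

def cost_per_lead(crm_leads: list[dict], ad_spend: dict[str, float]) -> dict[str, float]:
    """Join CRM leads to ad spend on the shared source key
    (GA4 utm_source -> CRM lead_source -> ads platform spend)."""
    counts = Counter(lead["lead_source"] for lead in crm_leads)
    return {src: ad_spend[src] / counts[src] for src in counts if src in ad_spend}

leads = [
    {"lead_source": "google"},
    {"lead_source": "google"},
    {"lead_source": "facebook"},
]
spend = {"google": 100.0, "facebook": 40.0}
print(cost_per_lead(leads, spend))  # {'google': 50.0, 'facebook': 40.0}
```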
Hybrid-cloud data flow: local Docker producer mock uploads to GCS, GCS triggers a Cloud Function that loads Parquet into BigQuery. Demonstrates the ingestion half of the medallion stack before the dbt transformations take over.
Stack: Python, Docker, GCS, BigQuery, Cloud Functions
Project folder →
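The Cloud Function is triggered by a GCS object-finalize event; a sketch of deriving the BigQuery load target from that payload, assuming a `<dataset>/<table>/<file>.parquet` object-path convention (an assumption for illustration, not the project's documented layout):

```python
def load_target(event: dict) -> tuple[str, str]:
    """Map a GCS finalize event to (source URI, BigQuery table).
    Assumes objects are laid out as '<dataset>/<table>/<file>.parquet'."""
    bucket, name = event["bucket"], event["name"]
    dataset, table = name.split("/")[:2]
    return f"gs://{bucket}/{name}", f"{dataset}.{table}"

evt = {"bucket": "raw-events", "name": "analytics/events/part-0001.parquet"}
print(load_target(evt))
```

The returned URI and table name would then feed a BigQuery load job configured for Parquet.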
Production-grade dbt project targeting Google BigQuery, implementing a full medallion architecture with self-healing patterns.
| Layer | Model | Purpose |
|---|---|---|
| Staging | `stg_events` | Source normalization, NULL country → 'XX' placeholder |
| Intermediate | `int_events` | Business logic: bot detection, PPP pricing, country backfill |
| Marts | `mart_funnel` | Conversion funnel: View → Click → Purchase |
| | `mart_campaign_performance` | Revenue breakdown by campaign, device, country |
| | `mart_session_stats` | Session-level aggregations |
- Self-healing ingestion: window-function backfill of corrupt country codes in `int_events` without touching source data
- Bot detection: automated flagging of high-frequency sessions (>50 events/session)
- Regional pricing (PPP): tier-based value adjustment (US/UK/DE/JP = 100%, BR/FR/CA = 50%, others = 20%)
- Campaign hierarchy extraction: parsing campaign category from composite IDs (`cmp_US_blackfriday` → `Blackfriday`)
- Revenue attribution: separate streams for sales, clicks, and views
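Three of these row-level rules are simple enough to restate in a few lines. The real implementation is SQL inside the dbt models; this Python sketch only mirrors the thresholds and tiers listed above:

```python
# Plain-Python restatement of the int_events business rules
# (the production versions are SQL expressions in the dbt models).

TIER_FULL = {"US", "UK", "DE", "JP"}   # 100% of list value
TIER_HALF = {"BR", "FR", "CA"}         # 50%

def ppp_multiplier(country: str) -> float:
    """Tier-based regional pricing adjustment."""
    if country in TIER_FULL:
        return 1.0
    if country in TIER_HALF:
        return 0.5
    return 0.2  # all other countries

def is_bot_session(events_in_session: int) -> bool:
    """Flag high-frequency sessions (>50 events/session)."""
    return events_in_session > 50

def campaign_category(campaign_id: str) -> str:
    """Extract the category from a composite ID: 'cmp_US_blackfriday' -> 'Blackfriday'."""
    return campaign_id.split("_")[-1].capitalize()
```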
```mermaid
flowchart LR
    subgraph Bronze["Bronze"]
        SRC[(events)]
    end
    subgraph Silver["Silver"]
        STG[stg_events]
        INT[int_events]
    end
    subgraph Gold["Gold"]
        F1[mart_funnel]
        F2[mart_campaign_performance]
        F3[mart_session_stats]
    end
    SRC --> STG
    STG --> INT
    INT --> F1
    INT --> F2
    INT --> F3
    STG -.-|"NULL → XX"| STG
    INT -.-|"Bot Detection<br/>PPP Pricing"| INT
```
Stack: BigQuery, dbt, Python, GCS (Parquet)
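In the warehouse the country backfill is a window function over `int_events`; for intuition, here is the same idea written procedurally in Python (field names are assumptions):

```python
def backfill_country(events: list[dict]) -> list[dict]:
    """Replace 'XX' placeholders with the most recent valid country seen
    earlier in the same session — a procedural stand-in for the dbt
    window-function backfill. Assumes events are sorted by (session, time)."""
    last_valid: dict[str, str] = {}
    repaired = []
    for e in events:
        sid, country = e["session_id"], e["country"]
        if country != "XX":
            last_valid[sid] = country  # remember the last good value per session
        repaired.append({**e, "country": last_valid.get(sid, country)})
    return repaired
```

Like the dbt model, this repairs rows downstream without mutating the source data.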
- Docker Desktop (running)
- Python 3.10+ with `venv`
- Go 1.21+ (for Go projects)
- Git
```shell
# 1. Clone
git clone https://github.com/hailtr/data-platform.git
cd data-platform

# 2. Start infrastructure (PostgreSQL, Redpanda, Redis)
cd foundation && docker-compose up -d && cd ..

# 3. Python environment
python -m venv venv
source venv/bin/activate   # macOS/Linux
# venv\Scripts\activate    # Windows
pip install -r requirements.txt

# 4. Initialize the database
python scripts/init_database.py

# 5. (Optional) Run quality checks
./scripts/run_checks.sh    # macOS/Linux
# scripts\run_checks.bat   # Windows
```

```shell
# Check services are up
python scripts/check_services.py

# Redpanda Console
open http://localhost:8080    # macOS
# start http://localhost:8080 # Windows
```

- Docker containerization with namespace isolation across projects
- Multi-tenant platform design (DB-per-project, topic-prefixing, Redis key-prefixing)
- Shared-library pattern (`foundation/shared/`) for cross-project reuse
- Kafka-compatible streaming via Redpanda
- Consumer-group shadowing (run two pipeline implementations against the same topic concurrently)
- Bulk insert with `ON CONFLICT` dedup for idempotent replays
- BigQuery medallion architecture (staging → intermediate → marts)
- dbt: incremental models, window functions, cross-model macros
- Self-healing data patterns (country-code backfill via window functions)
- Bot detection and PPP-adjusted revenue attribution
- Goroutine-based concurrent Kafka consumers
- `jackc/pgx` connection pooling and bulk insert
- BigQuery client integration via `cloud.google.com/go/bigquery`
- Synthetic data generation with tunable statistical distributions
- Kafka consumer pipelines
- Quality gating via Black + Flake8 + Pytest
```shell
# All checks (format, lint, test)
./scripts/run_checks.sh    # macOS/Linux
scripts\run_checks.bat     # Windows
```

Tooling: Black (formatting), Flake8 (linting), Pytest (unit + integration tests).
MIT — see LICENSE.
Built by Rafael Ortiz — Senior Data Engineer. These patterns are derived from real client work on Snowflake → ClickHouse migrations, Microsoft Fabric lakehouses, and streaming backends. See the live CV for the full context.