rohith-66/README.md

Hi, I'm Rohith

MS Data Science @ Arizona State University · Graduating May 2026 · Open to Data Engineer & Data Analyst roles (F1 OPT)

I build data systems end to end — from raw, messy inputs to production-ready insights. My work spans cloud-native pipelines, ML forecasting, and AI-powered tooling.


Flagship Projects

ComorbidAlert — US County Comorbidity Forecasting System

Python · Prophet · LightGBM · SHAP · AWS S3 · Streamlit · Plotly

End-to-end ML pipeline forecasting diabetes-cardiac comorbidity risk across all 3,144 US counties — built to catch counties on worsening trajectories 2–3 years before they cross critical thresholds.

CDC PLACES + Census ACS + BRFSS → 3-Layer Comorbidity Index → Weighted Ensemble Forecast → Early Warning Alerts
  • 3-layer scoring model — L1 clinical burden · L2 social vulnerability · L3 trajectory
  • Weighted ensemble (Prophet + LightGBM) — WAPE 0.46%, outperforms both individual models
  • 830 early warning alerts — Critical / Warning / Watch tiers with plain-English reasons
  • Novel finding — Great Plains emerging cluster (NE/IA/SD) not documented in prior literature
  • Live Streamlit dashboard — choropleth map, county drill-down, alert table, key insights
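Below is a minimal sketch of the ensemble and its error metric. It assumes WAPE is the usual sum of absolute errors divided by the sum of absolute actuals. It also weights each model by the inverse of its validation WAPE. That weighting scheme and the helper names (`wape`, `inverse_error_weights`, `blend`) are hypothetical, because the README does not say how the Prophet and LightGBM weights were chosen.

```python
from typing import Dict, List, Sequence


def wape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Weighted absolute percentage error: sum(|actual - forecast|) / sum(|actual|)."""
    abs_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    total = sum(abs(a) for a in actual)
    return abs_error / total


def inverse_error_weights(errors: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical scheme: weight each model by 1 / WAPE, normalized to sum to 1."""
    inverse = {name: 1.0 / err for name, err in errors.items()}
    norm = sum(inverse.values())
    return {name: value / norm for name, value in inverse.items()}


def blend(forecasts: Dict[str, Sequence[float]], weights: Dict[str, float]) -> List[float]:
    """Point-by-point weighted average of per-model forecasts of equal length."""
    horizon = len(next(iter(forecasts.values())))
    return [
        sum(weights[model] * series[i] for model, series in forecasts.items())
        for i in range(horizon)
    ]


# Toy usage with made-up numbers (not results from ComorbidAlert).
validation_wape = {"prophet": 0.02, "lightgbm": 0.01}
weights = inverse_error_weights(validation_wape)  # lightgbm gets twice prophet's weight
combined = blend({"prophet": [10.0, 20.0], "lightgbm": [13.0, 23.0]}, weights)
print(weights, combined)
```

Inverse-error weighting lets the stronger model dominate without discarding the weaker one. Weights tuned on a held-out window or a stacked meta-model are common alternatives.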

🔗 Live Dashboard · GitHub Repo


DataFlow Studio — AI-Powered Data Engineering Pipeline

React · FastAPI · Claude AI · PySpark · Python · Vercel + Render

Upload any CSV and watch it flow through a full Bronze → Silver → Gold medallion architecture powered by Claude AI.

  • Bronze — Schema detection, null analysis, data quality profiling
  • Silver — AI-generated PySpark & SQL transformations with real code output
  • Gold — Auto-generated KPIs, charts, and business insights dashboard
  • Export — Cleaned CSV, production PySpark .py file, pipeline report
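The sketch below gives a rough idea of the Bronze step. It profiles a CSV with only Python's standard library, infers a coarse type for each column, and counts empty cells as nulls. It is an assumed stand-in, not DataFlow Studio's code: the function names and the int/float/string type ladder are hypothetical, and it makes no Claude API call.

```python
import csv
import io
from typing import Dict, List


def infer_type(values: List[str]) -> str:
    """Guess a column type from its non-empty values: int, then float, else string."""
    def all_parse(cast) -> bool:
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "string"


def profile_csv(text: str) -> Dict[str, dict]:
    """Bronze-style profile: inferred type, null count, and null ratio per column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    report: Dict[str, dict] = {}
    for col in reader.fieldnames or []:
        # Short rows yield None for missing fields; treat None and "" as null.
        raw = [(row.get(col) or "").strip() for row in rows]
        present = [v for v in raw if v != ""]
        nulls = len(raw) - len(present)
        report[col] = {
            "type": infer_type(present),
            "nulls": nulls,
            "null_ratio": nulls / len(raw) if raw else 0.0,
        }
    return report


sample = "id,price,city\n1,9.5,Tempe\n2,,Phoenix\n3,12,\n"
for column, stats in profile_csv(sample).items():
    print(column, stats)
```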

🔗 Live Demo · GitHub Repo


Lakehouse Pipeline — GCP + Spark + BigQuery

Apache Spark · Docker · Google Cloud Storage · BigQuery · Parquet

Production-style lakehouse architecture processing 80,000+ records per run.

Raw JSON (GCS Bronze) → Spark Transforms (Dockerized) → Parquet (Silver) → BigQuery Warehouse (Gold)
  • Parameterized daily batch pipeline — single command execution
  • Schema enforcement, nested JSON flattening, deduplication
  • Partitioned + clustered BigQuery modeling
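The flattening, schema-enforcement, and deduplication steps can be sketched in plain Python on newline-delimited JSON. This is a hypothetical stand-in for the Spark job. The `SCHEMA` mapping, the `_` key separator, and deduplicating on `event_id` are assumptions made for illustration. In PySpark, the analogous steps would typically select nested struct fields into flat columns, cast them, and call `DataFrame.dropDuplicates`.

```python
import json
from typing import Any, Dict, List, Optional, Set

# Hypothetical Silver schema: flattened column name -> Python type to cast to.
SCHEMA = {"event_id": str, "user_id": int, "user_country": str}


def flatten(record: Dict[str, Any], prefix: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested objects into one level by joining keys with `sep`; lists are kept as-is."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def enforce(row: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Keep only schema columns, cast each one, and reject rows with missing or bad values."""
    out: Dict[str, Any] = {}
    for column, cast in SCHEMA.items():
        value = row.get(column)
        if value is None:
            return None
        try:
            out[column] = cast(value)
        except (TypeError, ValueError):
            return None
    return out


def bronze_to_silver(lines: List[str]) -> List[Dict[str, Any]]:
    """Parse JSON lines, flatten, enforce the schema, and keep the first row per event_id."""
    seen: Set[str] = set()
    silver: List[Dict[str, Any]] = []
    for line in lines:
        row = enforce(flatten(json.loads(line)))
        if row is None or row["event_id"] in seen:
            continue
        seen.add(row["event_id"])
        silver.append(row)
    return silver


raw = [
    '{"event_id": "e1", "user": {"id": "42", "country": "US"}}',
    '{"event_id": "e1", "user": {"id": "42", "country": "US"}}',  # duplicate
    '{"event_id": "e2", "user": {"id": "oops", "country": "US"}}',  # fails int cast
]
print(bronze_to_silver(raw))  # → [{'event_id': 'e1', 'user_id': 42, 'user_country': 'US'}]
```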

🔗 GitHub Repo


Construction Portfolio — Cost Forecasting System

PostgreSQL · Power BI · 75,000+ Work Items

End-to-end performance analytics framework simulating a real-world project controls environment.

  • Planned vs. actual cost tracking with SQL window functions
  • CPI (Cost Performance Index) & EAC (Estimate at Completion) forecasting, portfolio risk classification (RAG framework)
  • What-if financial impact simulation
  • Identified cost overruns, flagged high-risk projects
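The cost metrics above follow standard earned value management definitions. CPI = EV / AC, and one common EAC formula is BAC / CPI, which assumes the current cost efficiency holds for the remaining work. The minimal sketch below uses RAG cut-offs (Green at CPI ≥ 0.95, Amber at ≥ 0.85) that are hypothetical, because the README does not state the project's thresholds.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    bac: float  # budget at completion
    ev: float   # earned value: budgeted cost of the work actually performed
    ac: float   # actual cost of the work performed


def cpi(p: Project) -> float:
    """Cost Performance Index: EV / AC (below 1.0 means over budget)."""
    return p.ev / p.ac


def eac(p: Project) -> float:
    """Estimate at Completion assuming current cost efficiency persists: BAC / CPI."""
    return p.bac / cpi(p)


def rag(p: Project, green: float = 0.95, amber: float = 0.85) -> str:
    """Classify a project by CPI using hypothetical cut-offs."""
    value = cpi(p)
    if value >= green:
        return "Green"
    if value >= amber:
        return "Amber"
    return "Red"


portfolio = [
    Project("A", bac=1000.0, ev=500.0, ac=500.0),
    Project("B", bac=1000.0, ev=400.0, ac=500.0),
]
for p in portfolio:
    print(p.name, round(cpi(p), 2), round(eac(p), 1), rag(p))
```

BAC / CPI is only one EAC variant. Another, AC + (BAC - EV), assumes that the remaining work will proceed at the budgeted rate.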

Core Stack

Data Engineering: PySpark · Apache Spark · Docker · GCP · BigQuery · AWS S3 · Parquet
ML & Forecasting: Prophet · LightGBM · Scikit-learn · SHAP · PyTorch
Languages: Python · SQL · PostgreSQL
AI Tooling: Claude API
BI & Visualization: Streamlit · Plotly · Power BI · Tableau
Backend: FastAPI · REST APIs
Frontend: React · Tailwind CSS
Tools: Git · Docker · Jupyter

Connect

LinkedIn Portfolio Email GitHub


Build systems that reduce uncertainty — not increase complexity.

Popular repositories

  1. Weeedoclever

    Online Student Portal: an all-in-one platform where students can find notes, assignments, question papers, and academic updates, mainly saving storage space and time.

    HTML

  2. -Predictive-Analytics-for-Optimizing-Urban-Bike-Sharing-Systems

    Machine learning model to forecast hourly bike rental demand, leveraging historical rental data, weather variables (temperature, humidity, precipitation), and temporal patterns. Utilized Random For…

    Jupyter Notebook

  3. CycleDemand

    Daily bike rental forecasting with LightGBM, XGBoost, SHAP explainability, and data storytelling.

  4. football-injury-analysis

    Decision-focused injury impact analytics (15,600+ records) analyzing availability loss, severity trends, and high-risk player profiles using SQL & Power BI.

  5. project-performance-analytics-forecasting

    End-to-end project performance and cost forecasting system using PostgreSQL and Power BI with CPI, EAC, risk classification, and scenario simulation.

  6. lakehouse

    Data lakehouse implementation with structured pipelines and analytics layers, supporting scalable data processing and transformation workflows.

    Python