🚍 TransportAnalytics

An end-to-end machine learning project built on Google Vertex AI that forecasts transit ridership revenue (ROI) and predicts the mode of transportation from factors such as fare, duration, and weather. The repository covers data ingestion, feature engineering, model training, and pipeline orchestration for deployment.

📁 Repository Structure

TransportAnalytics/
├── config/                    # Configuration files for pipeline, training, or GCP integration
│   └── .placeholder
├── data_ingestion/           # Scripts to fetch and upload datasets to Google Cloud
│   └── download_kaggle_and_upload_gcs.py
├── deployment/               # (Optional) Scripts for batch or real-time model predictions
│   └── .placeholder
├── notebooks/                # Jupyter notebooks for EDA and insights
│   ├── eda_mta_ridership.ipynb
│   └── eda_mode_choice.ipynb
├── pipeline/                 # Vertex AI pipeline orchestration scripts
│   └── vertex_pipeline.py
├── preprocessing/            # Feature engineering scripts and processed datasets
│   ├── feature_engineering.py
│   └── merged_feature_data.csv
├── training/                 # Model training scripts
│   ├── train_ridership_model.py
│   └── train_mode_classifier.py
├── .gitignore
└── README.md

🚀 Running the Pipeline on Google Vertex AI

✅ Prerequisites

- Google Cloud Project (your-gcp-projectid)
- Vertex AI API enabled
- BigQuery and Cloud Storage set up
- Service Account with Vertex AI permissions
- Python ≥ 3.8

🔧 Step-by-Step Setup

Clone the repository:

git clone https://github.com/YOUR_USERNAME/TransportAnalytics.git
cd TransportAnalytics

Create and activate a virtual environment (required in Cloud Shell or locally):

This ensures isolated package installations and avoids permission issues, especially in Google Cloud Shell.

python3 -m venv venv
source venv/bin/activate

Install dependencies:

pip install -r requirements.txt
# or manually:
pip install google-cloud-aiplatform kfp pandas scikit-learn

Download datasets from Kaggle and upload to GCS (required before feature engineering):

export KAGGLE_USERNAME=your_kaggle_username
export KAGGLE_KEY=your_kaggle_key
python data_ingestion/download_kaggle_and_upload_gcs.py

Note: This step is mandatory before running the feature engineering script.
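The ingestion step can be pictured roughly as follows. This is a minimal sketch, not the contents of download_kaggle_and_upload_gcs.py: the helper names, the "raw" prefix, and the bucket name are hypothetical, and it assumes the google-cloud-storage client library is installed and that application default credentials are configured.

```python
import os
from pathlib import Path


def require_kaggle_credentials(env=os.environ):
    """Fail early if the Kaggle credentials exported above are missing."""
    missing = [name for name in ("KAGGLE_USERNAME", "KAGGLE_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))


def blob_name_for(local_path, prefix="raw"):
    """Build the object name inside the bucket (the 'raw' prefix is an assumption)."""
    return f"{prefix.strip('/')}/{Path(local_path).name}"


def upload_to_gcs(local_path, bucket_name, prefix="raw"):
    """Upload one local file to Cloud Storage and return its gs:// URI."""
    # Imported lazily so the helpers above work without the GCS client installed.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name_for(local_path, prefix))
    blob.upload_from_filename(str(local_path))
    return f"gs://{bucket_name}/{blob.name}"
```

A call might look like `upload_to_gcs("data/mta_ridership.csv", "your-bucket-name")`, where both the file path and the bucket name are placeholders.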

Run feature engineering:

python preprocessing/feature_engineering.py

Train models locally (optional):

python training/train_ridership_model.py
python training/train_mode_classifier.py
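For orientation, the two model types named in this README can be fitted as in the sketch below. It runs on synthetic data and is not the logic of the training scripts: the feature columns, targets, and hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for three numeric features (e.g. fare, duration, weather code).
X = rng.random((200, 3))
# Hypothetical regression target: a revenue-like value driven by the first two features.
y_revenue = 5.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 0.1, 200)
# Hypothetical classification target: two transport modes encoded as 0 and 1.
y_mode = (X[:, 0] > 0.5).astype(int)

X_train, X_test, rev_train, rev_test, mode_train, mode_test = train_test_split(
    X, y_revenue, y_mode, test_size=0.25, random_state=0
)

regressor = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, rev_train)
classifier = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, mode_train)

r2 = regressor.score(X_test, rev_test)          # coefficient of determination on held-out rows
accuracy = classifier.score(X_test, mode_test)  # fraction of correctly predicted modes
print(f"R^2: {r2:.3f}, accuracy: {accuracy:.3f}")
```

Holding out a test split, as done here, gives a quick sanity check on both models before anything is submitted to Vertex AI.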

Compile and submit the Vertex AI pipeline:

python pipeline/vertex_pipeline.py

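Deploy the pipeline using the Python SDK:

from google.cloud import aiplatform

# Replace both placeholders with your own project ID and region (for example, us-central1).
aiplatform.init(project="your-gcp-projectid", location="your-gcp-project-location")

# template_path points at the compiled pipeline spec; adjust it if your compile step writes elsewhere.
pipeline_job = aiplatform.PipelineJob(
    display_name="ridership-forecast-pipeline",
    template_path="vertex_ridership_pipeline.json",
    enable_caching=True,  # reuse results of steps whose inputs have not changed
)

# run() submits the job and blocks until it finishes.
pipeline_job.run()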

🧠 Project Highlights

Datasets Used:
- MTA Ridership
- Multimodal Mode Choice
ML Models:
- RandomForestRegressor: Predict transit ridership revenue
- RandomForestClassifier: Predict mode of transport based on influencing factors
Feature Engineering Highlights:
- Encoding categorical variables like weather and mode
- Engineering features like fare_per_minute
- Merging datasets on common temporal dimensions
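The feature engineering steps listed above can be sketched with pandas as follows. The column names (date, fare, duration_min, weather, ridership) and the left join are assumptions for illustration; the actual schema used by feature_engineering.py may differ.

```python
import pandas as pd


def add_trip_features(trips: pd.DataFrame) -> pd.DataFrame:
    """Add fare_per_minute and one-hot encode the weather column."""
    out = trips.copy()
    # Treat non-positive durations as missing so the division cannot blow up.
    duration = out["duration_min"].where(out["duration_min"] > 0)
    out["fare_per_minute"] = out["fare"] / duration
    # One indicator column per weather category, e.g. weather_rain.
    return pd.get_dummies(out, columns=["weather"], prefix="weather")


def merge_on_date(trips: pd.DataFrame, ridership: pd.DataFrame) -> pd.DataFrame:
    """Join the two datasets on a shared date column, keeping every trip row."""
    return trips.merge(ridership, on="date", how="left")


trips = pd.DataFrame(
    {
        "date": ["2024-01-01", "2024-01-02"],
        "fare": [3.0, 6.0],
        "duration_min": [30, 0],
        "weather": ["rain", "clear"],
    }
)
ridership = pd.DataFrame({"date": ["2024-01-01"], "ridership": [1000]})

features = merge_on_date(add_trip_features(trips), ridership)
print(features)
```

A left join keeps trips that have no matching ridership row, leaving the ridership value missing for them; an inner join would drop those rows instead, so the choice depends on how the downstream models handle gaps.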

🧩 Coming Soon

- Cloud Function or batch endpoint for scoring
- BigQuery ML support
- Looker Studio dashboard for visualization

📬 Feedback & Contributions

Feel free to fork this repo, submit pull requests, or create issues. Contributions are welcome!
