Anran0716/DE-Project-Bikeshare


DE-Project-Bikeshare

Problem Statement

Philadelphia’s bikeshare system, Indego, has given communities across the city access to public bikes for over eight years. The goal of this project is to develop a data pipeline and dashboard that provide periodic analysis of bikeshare usage patterns and travel demand trends.

Indego

Datasets

Workflow

[Image: workflow diagram]

The pipeline extracts data from CSV files, loads it into BigQuery, and transforms it there with DBT, so the steps run in extract, load, transform order:

  • Extraction: Airflow, running in Docker containers, orchestrates data extraction from CSV files.
  • Loading: Extracted data is loaded into BigQuery using Airflow DAGs.
  • Transformation: DBT performs data cleaning, normalization, and aggregation directly in BigQuery.
  • Visualization: Transformed data is visualized using Tableau, providing actionable insights into bike share usage.
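The ordering of the steps above can be sketched as a small dependency graph. This is a minimal illustration using only the Python standard library, not the project's actual Airflow DAG; the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
PIPELINE = {
    "extract_csv": set(),
    "load_to_bigquery": {"extract_csv"},
    "dbt_transform": {"load_to_bigquery"},
    "refresh_dashboard": {"dbt_transform"},
}


def run_order(dependencies):
    """Return task names in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for task in run_order(PIPELINE):
        print(task)
```

An orchestrator such as Airflow expresses the same idea with task dependencies, and adds scheduling, retries, and logging on top.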

Tools & Technology

  • Cloud Provider: Google Cloud Platform (GCP)
  • Infrastructure as Code: Terraform
  • Orchestration: Apache Airflow (Containerized with Docker)
  • Data Processing: DBT for transformations
  • Storage & Querying: GCS, BigQuery
  • Visualization: Tableau

Dashboard & Visualization

Dashboard

The dashboard is built with Tableau to show key metrics about Indego ridership patterns and to identify peak demand periods and popular stations.

[Image: dashboard screenshot]

Key Insights:

  • Bike Usage: Riders tend to favor e-bikes and the monthly pass (Indego30).
  • Temporal Trends: Ridership peaks during the summer months, with the highest activity from May to September. There are more trips on weekdays than on weekends. Peak riding hours are around 8 AM (morning commute) and 5-6 PM (evening commute).
  • Station Activity: 15th & Spruce is the most popular origin/destination station. Other popular stations include 23rd & South, 12th & Chestnut, 17th & Locust, and 34th & Spruce. Trip activity is concentrated in central Philadelphia's downtown.
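Temporal summaries like the ones above come down to counting trips by hour and by day type. The sketch below shows one way to do that in plain Python; the timestamp format and the sample values are assumptions made for illustration, not real Indego data, and the project itself performs its aggregation in DBT and Tableau.

```python
from collections import Counter
from datetime import datetime


def summarize_trips(start_times):
    """Count trips by hour of day and by weekday versus weekend."""
    by_hour = Counter()
    by_day_type = Counter()
    for raw in start_times:
        ts = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
        by_hour[ts.hour] += 1
        # datetime.weekday() returns 5 for Saturday and 6 for Sunday.
        by_day_type["weekend" if ts.weekday() >= 5 else "weekday"] += 1
    return by_hour, by_day_type


# Made-up sample timestamps (assumed format), not real trip records.
SAMPLE = [
    "2024-06-03 08:05:00",  # Monday
    "2024-06-03 08:40:00",  # Monday
    "2024-06-03 17:15:00",  # Monday
    "2024-06-08 11:00:00",  # Saturday
]

if __name__ == "__main__":
    hours, day_types = summarize_trips(SAMPLE)
    print(hours.most_common(1))
    print(dict(day_types))
```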

General Guidance

Step 1: Setup

  • Set up the GCP environment, Terraform, and Docker, and configure the necessary IAM roles.

Step 2: Data Ingestion & Loading

  • Use Docker to containerize the pipeline.
  • Build Airflow DAGs for workflow automation.
  • Load the raw data into Google BigQuery.
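Before raw CSV rows are loaded, it can help to parse and type-check them. The following is a minimal sketch of that idea, assuming hypothetical column names (`trip_id`, `start_time`, `duration`) and a timestamp format that may differ from the actual source files.

```python
import csv
import io
from datetime import datetime


def clean_rows(csv_text):
    """Parse CSV text into typed rows, skipping rows that fail to parse."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            record = {
                "trip_id": int(row["trip_id"]),
                "start_time": datetime.strptime(
                    row["start_time"], "%Y-%m-%d %H:%M:%S"
                ).isoformat(),
                "duration_min": int(row["duration"]),
            }
        except (KeyError, TypeError, ValueError):
            # A real pipeline would likely log or quarantine bad rows.
            continue
        cleaned.append(record)
    return cleaned


# Made-up sample input; the second row has an invalid timestamp.
SAMPLE_CSV = """trip_id,start_time,duration
1,2024-06-03 08:05:00,12
2,not-a-date,7
"""

if __name__ == "__main__":
    print(clean_rows(SAMPLE_CSV))
```

Dropping bad rows silently is a simplification; whether to skip, repair, or fail on them is a design choice for the ingestion step.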

Step 3: Data Transformation

  • Implement data partitioning and indexing in BigQuery.
  • Optimize data transformations using DBT.
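As an illustration of the partitioning step, the sketch below builds a BigQuery DDL statement for a day-partitioned, clustered table. Clustering stands in for "indexing" here, which is an assumption about what that step means; the table and column names are hypothetical, and the function only produces a SQL string rather than running anything against BigQuery.

```python
def partitioned_table_ddl(table, source, timestamp_col, cluster_cols):
    """Build a BigQuery DDL statement for a day-partitioned, clustered table."""
    cluster = ", ".join(cluster_cols)
    return (
        f"CREATE OR REPLACE TABLE `{table}`\n"
        f"PARTITION BY DATE({timestamp_col})\n"
        f"CLUSTER BY {cluster} AS\n"
        f"SELECT * FROM `{source}`"
    )


if __name__ == "__main__":
    # Hypothetical dataset, table, and column names.
    print(
        partitioned_table_ddl(
            table="bikeshare.trips_partitioned",
            source="bikeshare.trips_raw",
            timestamp_col="start_time",
            cluster_cols=["start_station"],
        )
    )
```

Partitioning by trip date lets date-filtered queries scan fewer partitions, and clustering by a frequently filtered column such as the start station can further reduce the data scanned.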

Step 4: Data Visualization

  • Build Tableau dashboards for analytics and insights.

About

Bikeshare Trip Data Engineering & Analytics in Philadelphia
