This project demonstrates a modern data engineering pipeline that pulls data from the Spotify API, processes it with Python running on AWS Lambda, and automates ingestion into Snowflake using Snowpipe.
This pipeline automates the process of extracting data from Spotify, transforming it in Python, and ingesting it into Snowflake for further analysis.
- Spotify API Integration: Extracts music-related data (e.g. tracks, playlists, artists).
- AWS Lambda: Hosts the Python script to automate extraction and transformation.
- AWS CloudWatch Trigger: Automatically invokes the Lambda function on a schedule.
- Amazon S3: Stores the transformed data files (.csv or .json).
- Snowpipe: Automatically ingests the files from S3 into Snowflake.
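As a rough illustration of the transformation step, the sketch below flattens a Spotify playlist-items response into album, artist, and song rows and serializes them to CSV. It is not the code in `Transform_Spotify_Data.py`; the function names, the chosen columns, and the decision to keep only the first artist per song are assumptions made for this example. The field names follow the Spotify Web API playlist-items payload.

```python
import csv
import io


def flatten_playlist_items(payload):
    """Split a playlist-items payload into album, artist, and song rows.

    Hypothetical helper: the column selection is an assumption.
    """
    albums, artists, songs = {}, {}, []
    for item in payload.get("items", []):
        track = item.get("track")
        if not track:  # unavailable tracks can come back as null
            continue
        album = track["album"]
        # Key by id so an album shared by several tracks is stored once.
        albums[album["id"]] = {
            "album_id": album["id"],
            "album_name": album["name"],
            "release_date": album["release_date"],
            "total_tracks": album["total_tracks"],
        }
        for artist in track["artists"]:
            artists[artist["id"]] = {
                "artist_id": artist["id"],
                "artist_name": artist["name"],
            }
        songs.append({
            "song_id": track["id"],
            "song_name": track["name"],
            "duration_ms": track["duration_ms"],
            "popularity": track["popularity"],
            "added_at": item["added_at"],
            "album_id": album["id"],
            # Assumption: link each song to its first listed artist only.
            "artist_id": track["artists"][0]["id"] if track["artists"] else None,
        })
    return list(albums.values()), list(artists.values()), songs


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Inside a Lambda handler, each CSV string could then be written to S3 with boto3's `put_object`, where Snowpipe would pick it up.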
- Programming: Python 3
- Cloud Services: AWS Lambda, S3, IAM, CloudWatch
- Data Warehouse: Snowflake
- Data Integration: Spotify Web API, Snowpipe
- Others: boto3, requests
spotify-data-pipeline/
├── AWS lambda/
│   ├── Extract_Spotify_Data.py
│   └── Transform_Spotify_Data.py
├── snowflake/
│   ├── Snowflake_SQL.sql
│   └── Storage_Integration.sql
└── S3-Sample-Output/
    ├── raw-data/ → stores unprocessed data pulled from Spotify API
    │   ├── processed/
    │   └── to_process/
    └── transform-data/ → stores cleaned, transformed datasets ready for loading into Snowflake
        ├── album_data/
        ├── artist_data/
        └── songs_data/
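The folder names above suggest a flow in which raw files land in `raw-data/to_process/`, move to `raw-data/processed/` once transformed, and produce one output file per dataset under `transform-data/`. The sketch below builds S3 object keys for that layout. The flow itself is inferred from the folder names, and the file-name patterns and helper names are hypothetical.

```python
from datetime import datetime, timezone

# Prefixes taken from the sample S3 layout shown above.
RAW_TO_PROCESS = "raw-data/to_process/"
RAW_PROCESSED = "raw-data/processed/"
TRANSFORMED_PREFIXES = {
    "album": "transform-data/album_data/",
    "artist": "transform-data/artist_data/",
    "songs": "transform-data/songs_data/",
}
STAMP = "%Y%m%dT%H%M%S"  # assumed timestamp format for file names


def raw_key(now=None):
    """Key for a freshly extracted raw JSON file (hypothetical file name)."""
    now = now or datetime.now(timezone.utc)
    return f"{RAW_TO_PROCESS}spotify_raw_{now.strftime(STAMP)}.json"


def processed_key(key):
    """Map a to_process/ key to its processed/ counterpart."""
    if not key.startswith(RAW_TO_PROCESS):
        raise ValueError(f"not a to_process key: {key}")
    return RAW_PROCESSED + key[len(RAW_TO_PROCESS):]


def transformed_key(dataset, now=None):
    """Key for a transformed CSV; dataset is 'album', 'artist', or 'songs'."""
    now = now or datetime.now(timezone.utc)
    prefix = TRANSFORMED_PREFIXES[dataset]
    return f"{prefix}{dataset}_transformed_{now.strftime(STAMP)}.csv"
```

S3 has no native move operation, so relocating a raw file from `to_process/` to `processed/` with boto3 would typically be a `copy_object` call followed by `delete_object`.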
You can watch the demo video here: Spotify_Pipeline.mkv
This project was inspired by the course "Data Warehouse for Data Engineering with Snowflake" by Darshil Parmar. The course provided the foundational concepts and structure for integrating Spotify, AWS, and Snowflake.
All implementation, customization, and additional enhancements in this repository are my own work and reflect my personal understanding and learning.