Spotify Global Top 50 ETL Pipeline

Overview

This project builds an automated ETL pipeline that collects weekly Global Top 50 song data from the Spotify API, using the playlist published by Topsify (a Spotify channel), and stores it in Snowflake using AWS services. The pipeline is designed for a client who wants to track global music trends over time and use those insights for data-driven content creation in the music industry.

Project Value

By collecting data weekly for a year, the client will be able to uncover patterns in trending artists, genres, and albums. This will help them understand what makes a song successful and make data-informed decisions when creating new music content.

Architecture Diagram

Pipeline Description

🔄 Extract

Source: Spotify Top 50 Global Playlist (Topsify)
Trigger: Weekly via Amazon CloudWatch
Lambda: A Python function calls the Spotify API to extract the current playlist data and serializes it as JSON.
Raw Data Storage: The raw JSON is stored in Amazon S3.
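The extraction step above can be sketched as a Lambda handler. This is a minimal sketch, not the code in `Lambda_func_data_extraction.ipynb`: it assumes the `spotipy` client (`Spotify`, `SpotifyClientCredentials`, `playlist_tracks`) and `boto3` are available in the Lambda runtime, and the environment variable names (`SPOTIFY_CLIENT_ID`, `SPOTIFY_CLIENT_SECRET`, `PLAYLIST_ID`, `BUCKET`) and the S3 key layout in `build_raw_key` are hypothetical.

```python
import json
import os
from datetime import datetime, timezone


def build_raw_key(now: datetime, prefix: str = "raw_data/to_processed/") -> str:
    """Hypothetical S3 key layout: one timestamped JSON object per run."""
    return f"{prefix}spotify_raw_{now.strftime('%Y%m%dT%H%M%SZ')}.json"


def lambda_handler(event, context):
    # Imported inside the handler so build_raw_key can be tested without
    # spotipy/boto3 installed; in Lambda these come from the runtime or a layer.
    import boto3
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    # Client-credentials flow: no user login is needed to read a public playlist.
    sp = spotipy.Spotify(client_credentials_manager=SpotifyClientCredentials(
        client_id=os.environ["SPOTIFY_CLIENT_ID"],
        client_secret=os.environ["SPOTIFY_CLIENT_SECRET"],
    ))
    # playlist_tracks returns up to 100 items per page, enough for a Top 50 list.
    data = sp.playlist_tracks(os.environ["PLAYLIST_ID"])

    # Write the untouched API response to S3 as the raw JSON snapshot.
    key = build_raw_key(datetime.now(timezone.utc))
    boto3.client("s3").put_object(
        Bucket=os.environ["BUCKET"], Key=key, Body=json.dumps(data)
    )
    return {"statusCode": 200, "key": key}
```

Putting the run timestamp in the key means each weekly run writes a new object instead of overwriting the previous snapshot, so the history accumulates in S3.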

🔁 Transform

Trigger: An S3 object PUT (a new raw JSON file) starts the transformation
AWS Glue: Spark job performs data cleaning and transformation on raw JSON
Output: Transformed data stored back into Amazon S3 as CSV
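The cleaning logic of the Glue job is not described here, so the sketch below is an assumption about what such a step could do: read the bucket and key of the new raw file from the S3 PUT event, flatten the nested `playlist_tracks` JSON into separate album, artist and song tables, and serialize each as CSV. It uses plain Python (`csv`, `urllib.parse`) rather than Spark so the logic can be read and tested locally; in a Glue job the same flattening would typically be written with Spark DataFrame operations such as `explode`. The three-table split and all column names are hypothetical.

```python
import csv
import io
from urllib.parse import unquote_plus


def keys_from_s3_event(event: dict) -> list:
    """Return (bucket, key) pairs for every object in an S3 PUT notification."""
    # S3 URL-encodes object keys in event payloads (e.g. spaces arrive as '+').
    return [
        (rec["s3"]["bucket"]["name"], unquote_plus(rec["s3"]["object"]["key"]))
        for rec in event.get("Records", [])
    ]


def flatten_playlist(raw: dict) -> dict:
    """Split a raw playlist_tracks payload into album, artist and song rows."""
    albums, artists, songs = {}, {}, []
    for item in raw.get("items", []):
        track = item.get("track")
        if not track:  # drop items whose track is missing (null)
            continue
        album = track["album"]
        # Keyed by id so an album or artist on several tracks is kept once.
        albums[album["id"]] = {
            "album_id": album["id"],
            "name": album["name"],
            "release_date": album.get("release_date"),
            "total_tracks": album.get("total_tracks"),
        }
        for artist in track["artists"]:
            artists[artist["id"]] = {"artist_id": artist["id"], "name": artist["name"]}
        songs.append({
            "song_id": track["id"],
            "name": track["name"],
            "duration_ms": track.get("duration_ms"),
            "popularity": track.get("popularity"),
            "added_at": item.get("added_at"),
            "album_id": album["id"],
            "artist_id": track["artists"][0]["id"],  # first-listed artist
        })
    return {"albums": list(albums.values()),
            "artists": list(artists.values()),
            "songs": songs}


def to_csv(rows: list) -> str:
    """Serialize a list of same-shaped dicts as CSV text with a header row."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Splitting albums and artists out of the song rows keeps each CSV flat, which maps directly onto one Snowflake table per file.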

📥 Load

Snowpipe: Automatically ingests the transformed data from S3
Snowflake: Stores structured and queryable song data for downstream analysis
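The Snowflake setup itself lives in `Snowflake_code.sql` and is not reproduced here. As an assumption-labelled sketch, the helper below renders the kind of `CREATE PIPE ... AUTO_INGEST = TRUE AS COPY INTO ...` statement a Snowpipe load like this one would use; the table, stage and prefix names are hypothetical, and an external stage pointing at the transformed-data bucket is assumed to exist already.

```python
def snowpipe_ddl(table: str, stage: str, prefix: str) -> str:
    """Render a CREATE PIPE statement that auto-loads CSV files from an S3 stage."""
    # AUTO_INGEST = TRUE: load files as S3 event notifications arrive.
    # SKIP_HEADER = 1 drops the CSV header row; FIELD_OPTIONALLY_ENCLOSED_BY
    # handles fields that a CSV writer quoted because they contain commas.
    return (
        f"CREATE OR REPLACE PIPE {table}_pipe\n"
        "  AUTO_INGEST = TRUE\n"
        "AS\n"
        f"COPY INTO {table}\n"
        f"FROM @{stage}/{prefix}\n"
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '\"');"
    )


if __name__ == "__main__":
    # One pipe per hypothetical table, each watching its own S3 prefix.
    for name in ("albums", "artists", "songs"):
        print(snowpipe_ddl(name, "spotify_stage", f"{name}_data/"))
        print()
```

With `AUTO_INGEST = TRUE`, the bucket's S3 event notifications must target the SQS queue that Snowflake reports in the pipe's `notification_channel` (shown by `SHOW PIPES`); otherwise files land in S3 but are never loaded.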

Technologies Used

Spotify API (Data Source)
AWS Lambda (ETL Trigger + Extraction in Python)
AWS CloudWatch (Trigger for Lambda)
AWS S3 (Raw and Transformed Data Storage)
AWS Glue (Apache Spark-based Transformation)
Snowpipe & Snowflake (Data Warehouse & Auto-loading)

Key Features

Fully serverless and scalable architecture
Collects weekly updates without manual intervention
Enables year-long data accumulation for rich analytics
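The hands-off weekly cadence depends on a scheduled trigger. A minimal sketch of wiring one up with boto3, assuming the schedule is an EventBridge (formerly CloudWatch Events) rule; the rule name, target `Id` and `StatementId` are hypothetical, and the clients are passed in so the function can be exercised without AWS credentials.

```python
def schedule_weekly(events, lambda_client, rule_name: str,
                    function_name: str, function_arn: str) -> str:
    """Create a rule that invokes the extraction Lambda every 7 days."""
    # "rate(7 days)" fires weekly from creation time; a cron expression such as
    # "cron(0 0 ? * MON *)" would pin the run to Mondays at 00:00 UTC instead.
    rule_arn = events.put_rule(
        Name=rule_name, ScheduleExpression="rate(7 days)", State="ENABLED"
    )["RuleArn"]
    # Point the rule at the Lambda function.
    events.put_targets(
        Rule=rule_name, Targets=[{"Id": "extract-lambda", "Arn": function_arn}]
    )
    # Grant EventBridge permission to invoke the function from this rule only.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

In practice it would be called as `schedule_weekly(boto3.client("events"), boto3.client("lambda"), ...)` with the real function name and ARN.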

Repository Files

Lambda_func_data_extraction.ipynb
README.md
Snowflake_code.sql
glue_transformation_job.ipynb
spotipy_layer.zip

Repository: anushreebiswas/SoundPulse-Music-Trend-Tracker-with-AWS-and-Snowflake
