anushreebiswas/SoundPulse-Music-Trend-Tracker-with-AWS-and-Snowflake

Spotify Global Top 50 ETL Pipeline

Overview

This project builds an automated ETL pipeline that collects weekly Global Top 50 song data, published by Topsify (a channel on Spotify), through the Spotify API and stores it in Snowflake using AWS services. The pipeline is designed for a client who wants to track global music trends over time and use the insights for data-driven content creation in the music industry.

Project Value

By collecting data every week over a year, the client will be able to uncover patterns related to trending artists, genres, and albums. This will allow them to understand what makes a song successful and make data-informed decisions when creating new music content.

Architecture Diagram

[Figure: pipeline architecture diagram]

Pipeline Description

Extract

  • Source: Spotify Top 50 Global Playlist (Topsify)
  • Trigger: Weekly via Amazon CloudWatch
  • Lambda: A Python function calls the Spotify API to extract the current playlist data and transforms it into JSON format.
  • Raw Data Storage: Stored in Amazon S3 as JSON
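
The extract step above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual Lambda code: `raw_object_key`, `run_extract`, the `raw/` prefix, and the file naming are all hypothetical, and the Spotify call is passed in as a callable so the sketch makes no assumptions about how authentication is handled. The only AWS call used is `put_object` on an S3 client such as `boto3.client("s3")`.

```python
import json
from datetime import date


def raw_object_key(run_date: date, prefix: str = "raw") -> str:
    """Build a dated S3 key so each weekly snapshot is stored separately.

    The prefix and naming scheme are assumptions for illustration.
    """
    return f"{prefix}/top50_{run_date.isoformat()}.json"


def run_extract(fetch_playlist, s3_client, bucket: str, run_date: date) -> str:
    """Fetch the playlist, serialise it as JSON, and upload it to S3.

    fetch_playlist: callable returning the playlist payload as a dict
                    (a hypothetical wrapper around the Spotify API request).
    s3_client:      object exposing put_object(Bucket=..., Key=..., Body=...),
                    for example boto3.client("s3").
    Returns the S3 key that was written.
    """
    payload = fetch_playlist()
    key = raw_object_key(run_date)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
    )
    return key
```

Inside a real Lambda handler, `run_extract` would presumably be called with `date.today()` and a boto3 S3 client. Keeping the date in the key means a year of weekly runs accumulates as separate objects rather than overwriting one file.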

๐Ÿ” Transform

  • Trigger: S3 Object PUT triggers the transformation
  • AWS Glue: Spark job performs data cleaning and transformation on raw JSON
  • Output: Transformed data stored back into Amazon S3 as CSV
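
To make the transform step more concrete, here is a plain-Python stand-in for the kind of cleaning the Glue Spark job might perform. It is not the Spark job itself, and the README does not list the actual columns, so `CSV_COLUMNS` is an assumed schema. The input shape (`items[].track` with `artists` and `album`) follows the Spotify playlist-items response, and `object_from_s3_event` reads the standard S3 event notification structure.

```python
import csv
import io
from urllib.parse import unquote_plus

# Assumed output schema; the real job's columns are not listed in the README.
CSV_COLUMNS = ["track_id", "track_name", "artists", "album", "release_date", "popularity"]


def object_from_s3_event(event: dict) -> tuple[str, str]:
    """Return (bucket, key) from an S3 PUT event notification."""
    record = event["Records"][0]["s3"]
    # Object keys arrive URL-encoded in S3 event notifications.
    return record["bucket"]["name"], unquote_plus(record["object"]["key"])


def playlist_to_csv(payload: dict) -> str:
    """Flatten raw playlist JSON into CSV text, one row per unique track."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    seen = set()
    for item in payload.get("items", []):
        track = item.get("track")
        # Skip empty entries and tracks that were already written.
        if not track or track.get("id") in seen:
            continue
        seen.add(track["id"])
        album = track.get("album") or {}
        writer.writerow({
            "track_id": track["id"],
            "track_name": track.get("name", ""),
            "artists": "; ".join(a["name"] for a in track.get("artists", [])),
            "album": album.get("name", ""),
            "release_date": album.get("release_date", ""),
            "popularity": track.get("popularity", ""),
        })
    return out.getvalue()
```

Joining artist names with `"; "` keeps one row per track. A Spark job could instead explode the artists into separate rows if per-artist analysis is the main goal.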

Load

  • Snowpipe: Automatically ingests the transformed data from S3
  • Snowflake: Stores structured and queryable song data for downstream analysis
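
As a rough idea of what the load step involves, the helper below renders a Snowpipe definition with auto-ingest enabled. The pipe, table, and stage names are placeholders, and the file-format options are assumptions about the CSV layout. The project's real DDL is not shown in this README, so treat this only as a sketch of the general `CREATE PIPE ... AS COPY INTO` pattern.

```python
def snowpipe_ddl(pipe: str, table: str, stage: str) -> str:
    """Render a CREATE PIPE statement that auto-ingests CSV files from a stage.

    All identifiers are supplied by the caller and are assumed to be trusted
    constants; this helper does no quoting or escaping.
    """
    return (
        f"CREATE OR REPLACE PIPE {pipe} AUTO_INGEST = TRUE AS\n"
        f"COPY INTO {table}\n"
        f"FROM @{stage}\n"
        # Assumed CSV layout: one header row, optional double-quoted fields.
        "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 "
        "FIELD_OPTIONALLY_ENCLOSED_BY = '\"');"
    )


if __name__ == "__main__":
    # Hypothetical object names, for illustration only.
    print(snowpipe_ddl("top50_pipe", "top50_tracks", "top50_stage"))
```

With auto-ingest, Snowflake depends on S3 event notifications for the stage location, so the transformed-data bucket would also need its notifications pointed at the pipe's queue.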

Technologies Used

  • Spotify API (Data Source)
  • AWS Lambda (ETL Trigger + Extraction in Python)
  • AWS CloudWatch (Trigger for Lambda)
  • AWS S3 (Raw and Transformed Data Storage)
  • AWS Glue (Apache Spark-based Transformation)
  • Snowpipe & Snowflake (Data Warehouse & Auto-loading)

Key Features

  • Fully serverless and scalable architecture
  • Collects weekly updates without manual intervention
  • Enables year-long data accumulation for rich analytics
