🎶Comparative analysis of ensemble regression algorithms for predicting song popularity using Spotify Web API audio features.

KayteKatelyn/Predicting-Song-Popularity

🎶 Predicting-Song-Popularity

This project develops a machine learning pipeline that predicts song popularity from audio features and metadata retrieved through the Spotify Web API. The foundational implementation uses Random Forest regression to generate popularity scores and rank songs by those predictions. The project was then extended into a comparative study of three regression algorithms (Random Forest, XGBoost, and LightGBM) to determine which one predicts track popularity most accurately.

Key Features

  • Retrieves metadata and audio features using the Spotify Web API with Python batch processing.
  • Preprocesses the data (cleaning, missing-value and outlier handling, feature scaling) and engineers features for modelling.
  • Random Forest Regressor for popularity prediction (R² 0.99, RMSE 0.22), compared against XGBoost and LightGBM.
  • Ranks songs based on predicted popularity scores.

Tech Stack

  • Python - Primary programming language
  • Pandas - Data manipulation and preprocessing
  • Spotify Web API - Real-time data integration
  • scikit-learn - Random Forest & metrics
  • LightGBM - Gradient boosting implementation
  • XGBoost - Advanced gradient boosting
  • Matplotlib / Seaborn - Data visualization

Requirements

To run this project, you will need the following hardware and software.

Hardware Requirements:

  • Primary Memory: 8.00 GB
  • Secondary Memory: 1 TB
  • Processor: 10th Generation Intel Core i5

Software Requirements:

  • Python: Version 3.7
  • Google Colab, PyCharm, or any other preferred integrated development environment.
  • Libraries: spotipy, pandas, NumPy, scikit-learn, XGBoost, LightGBM, Matplotlib, Seaborn (csv and time are part of the Python standard library)

How It Works

Data Collection:

  • Integrated with Spotify Web API to retrieve song metadata
  • Extracted 10+ audio features per track (energy, danceability, acousticness, etc.)
  • Implemented Python batch processing for efficient large-scale data retrieval

📁GET THE DATA

Steps:

  • Create an application in the Spotify for Developers dashboard
  • Obtain the Client ID and Client Secret
  • Run the Python code to retrieve data from the Spotify Web API
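
The steps above could be sketched roughly as follows with spotipy. This is a minimal illustration, not the project's actual script: the helper names (`chunked`, `make_client`, `fetch_audio_features`) and the environment variable names are assumptions. It also assumes the application has access to the audio-features endpoint, which accepts up to 100 track IDs per request.

```python
import os
import time


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def make_client():
    """Build a spotipy client from the Client ID and Client Secret.

    The environment variable names are an assumption of this sketch.
    """
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    auth = SpotifyClientCredentials(
        client_id=os.environ["SPOTIFY_CLIENT_ID"],
        client_secret=os.environ["SPOTIFY_CLIENT_SECRET"],
    )
    return spotipy.Spotify(auth_manager=auth)


def fetch_audio_features(client, track_ids, batch_size=100, pause=0.0):
    """Fetch audio features in batches, skipping tracks returned as None."""
    rows = []
    for batch in chunked(list(track_ids), batch_size):
        for features in client.audio_features(tracks=batch) or []:
            if features is not None:
                rows.append(features)
        time.sleep(pause)  # optional delay between requests
    return rows
```

With credentials exported, `fetch_audio_features(make_client(), track_ids)` returns a list of dictionaries, one per track, that can be passed straight to `pandas.DataFrame`.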

Data Preprocessing:

  • Cleaned and normalised audio features using Pandas
  • Handled missing values and outliers
  • Standardized feature scaling for optimal model performance
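
One plausible way to carry out these preprocessing steps with pandas is shown below. The specific choices here (median imputation, clipping at 1.5 times the interquartile range, z-score scaling) and the column list are assumptions made for illustration. The README does not state which techniques the project actually uses.

```python
import pandas as pd

# Illustrative subset of Spotify audio-feature columns (an assumption).
FEATURES = ["energy", "danceability", "acousticness", "tempo"]


def preprocess(df, features=FEATURES):
    """Return a copy with gaps filled, outliers clipped, features standardized."""
    out = df.copy()
    for col in features:
        # Fill missing values with the column median.
        out[col] = out[col].fillna(out[col].median())
        # Clip values outside 1.5 * IQR of the quartiles.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
        # Standardize to zero mean and unit variance (constant columns become 0).
        std = out[col].std(ddof=0)
        out[col] = (out[col] - out[col].mean()) / std if std > 0 else 0.0
    return out
```

The function works on a copy, so the raw data stays available for inspection or for fitting the same statistics on a training split only.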

Model Comparison & Training

  • Built and trained three regression models: Random Forest, XGBoost, and LightGBM
  • Optimized hyperparameters for each model
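
A comparison along these lines might look like the sketch below. The hyperparameter values are placeholders rather than the tuned settings from the study, and the function names are hypothetical. XGBoost and LightGBM are added only when they are installed, so the sketch still runs with scikit-learn alone.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def build_models(seed=42):
    """Return the regressors to compare; boosters are optional dependencies."""
    models = {
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=seed)
    }
    try:
        from xgboost import XGBRegressor
        models["XGBoost"] = XGBRegressor(
            n_estimators=100, learning_rate=0.1, random_state=seed
        )
    except ImportError:
        pass
    try:
        from lightgbm import LGBMRegressor
        models["LightGBM"] = LGBMRegressor(
            n_estimators=100, learning_rate=0.1, random_state=seed, verbose=-1
        )
    except ImportError:
        pass
    return models


def compare_models(X, y, seed=42):
    """Fit each model on one train/test split and collect the four metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    results = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        mse = float(mean_squared_error(y_test, pred))
        results[name] = {
            "r2": float(r2_score(y_test, pred)),
            "rmse": float(np.sqrt(mse)),
            "mae": float(mean_absolute_error(y_test, pred)),
            "mse": mse,
        }
    return results
```

Hyperparameter tuning could be layered on top by wrapping each estimator in scikit-learn's `GridSearchCV` or `RandomizedSearchCV` before calling `fit`.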

Prediction & Ranking

  • Generates popularity scores (0-100) for any song
  • Ranks songs based on predicted popularity
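
The ranking step can be illustrated with a few lines of pandas. Clipping predictions to the 0-100 range is an assumption of this sketch (regressors can overshoot the target range slightly), and `rank_songs` is a hypothetical helper name.

```python
import pandas as pd


def rank_songs(names, predicted_scores):
    """Clip predictions to 0-100 and rank songs from most to least popular."""
    table = pd.DataFrame(
        {"song": list(names), "predicted_popularity": list(predicted_scores)}
    )
    table["predicted_popularity"] = table["predicted_popularity"].clip(0, 100)
    # A stable sort keeps the input order for songs with equal scores.
    table = table.sort_values(
        "predicted_popularity", ascending=False, kind="stable"
    ).reset_index(drop=True)
    table["rank"] = table.index + 1
    return table
```

Calling `rank_songs(song_names, model.predict(X))` yields a table with one row per song, ordered by predicted popularity.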

Visualization

  • Scatter plots: True vs Predicted popularity scores

[Figures: XGB Regressor, LightGBM Regressor, Random Forest Regressor]
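
A true-vs-predicted scatter plot of this kind can be produced with Matplotlib roughly as follows. The function name, styling, and the dashed y = x reference line are illustrative choices, not taken from the project's plotting code.

```python
import matplotlib
matplotlib.use("Agg")  # render without a display; drop this line in a notebook
import matplotlib.pyplot as plt


def plot_true_vs_predicted(y_true, y_pred, title, path=None):
    """Scatter true against predicted scores with a y = x reference line."""
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(y_true, y_pred, s=12, alpha=0.6)
    lo = min(min(y_true), min(y_pred))
    hi = max(max(y_true), max(y_pred))
    # Points on this line are predicted exactly.
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray")
    ax.set_xlabel("True popularity")
    ax.set_ylabel("Predicted popularity")
    ax.set_title(title)
    if path is not None:
        fig.savefig(path, dpi=150, bbox_inches="tight")
    return fig
```

The closer the points sit to the dashed line, the better the model's predictions match the true scores.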

Design Framework

[Figure: design framework diagram]

📈 Model Performance Comparison

Random Forest Regressor

| Metric   | Value |
|----------|-------|
| R² Score | 0.99  |
| RMSE     | 0.22  |
| MAE      | 0.16  |
| MSE      | 0.05  |

LightGBM Regressor ⭐ Best Overall

| Metric   | Value |
|----------|-------|
| R² Score | 0.99  |
| RMSE     | 0.20  |
| MAE      | 0.13  |
| MSE      | 0.04  |

XGBoost Regressor

| Metric   | Value |
|----------|-------|
| R² Score | 0.99  |
| RMSE     | 0.20  |
| MAE      | 0.15  |
| MSE      | 0.04  |
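
For reference, and assuming the standard definitions (the ones scikit-learn's metrics implement), the four reported metrics for true scores $y_i$, predictions $\hat{y}_i$, mean true score $\bar{y}$, and $n$ test samples are:

```latex
\begin{aligned}
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, &
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}}, \\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, &
R^2 &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}.
\end{aligned}
```

Because RMSE is the square root of MSE, the reported pairs are consistent with each other: 0.22² ≈ 0.05 and 0.20² = 0.04.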

Key Findings

  • LightGBM achieved the lowest errors (RMSE 0.20, MAE 0.13), narrowly ahead of XGBoost (RMSE 0.20, MAE 0.15) and Random Forest (RMSE 0.22, MAE 0.16).
  • All three algorithms reached an R² score of 0.99.

Note

Please make sure the required Python libraries and dependencies are installed.
This project is for educational purposes. Please respect Spotify's API terms of service when using this code.
