🎶 Predicting-Song-Popularity

This project develops a machine learning pipeline that predicts song popularity from Spotify audio features and metadata retrieved through the Spotify Web API. The foundational implementation uses Random Forest regression to generate popularity scores and rank songs by their predicted values. The project was later extended into a comparative study of three regression algorithms (Random Forest, XGBoost, and LightGBM) to determine which one predicts track success most effectively.

Key Features

Retrieves metadata and audio features using the Spotify Web API with Python batch processing.
Comprehensive data preprocessing and feature engineering to extract meaningful insights.
Random Forest Regressor for popularity prediction (R² ≈ 0.99), benchmarked against XGBoost and LightGBM.
Ranks songs based on predicted popularity scores.

Tech Stack

Python - Primary programming language
Pandas - Data manipulation and preprocessing
Spotify Web API - Real-time data integration
scikit-learn - Random Forest & metrics
LightGBM - Gradient boosting implementation
XGBoost - Advanced gradient boosting
Matplotlib / Seaborn - Data visualization

Requirements

To run this project, you will need the following hardware and software.

Hardware Requirements:

Primary Memory (RAM): 8 GB
Secondary Memory (storage): 1 TB
Processor: 10th Generation Intel Core i5

Software Requirements:

Python: Version 3.7
IDE: Google Colab, PyCharm, or any other integrated development environment you prefer
Libraries: spotipy, pandas, NumPy, scikit-learn, XGBoost, LightGBM, Matplotlib, Seaborn, and the standard-library modules csv and time

How It Works

Data Collection:

Integrated with the Spotify Web API to retrieve song metadata
Extracted 10+ audio features per track (energy, danceability, acousticness, etc.)
Implemented Python batch processing for efficient large-scale data retrieval
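The batch retrieval described above could look roughly like this. It is a minimal sketch, not the project's `1_DatasetCode.py`: it assumes an authenticated spotipy client `sp` and uses spotipy's `Spotify.tracks()` (up to 50 IDs per request) and `Spotify.audio_features()` (up to 100 IDs per request). The helper names `chunked` and `fetch_track_rows`, the chosen feature columns, and the fixed pause between batches are illustrative assumptions. It also assumes your Spotify app still has access to the audio-features endpoint, which Spotify restricted for newly registered apps in November 2024.

```python
import time
from typing import Iterator, List

# Hypothetical selection of audio-feature columns to keep for each track.
AUDIO_FEATURES = [
    "danceability", "energy", "key", "loudness", "mode", "speechiness",
    "acousticness", "instrumentalness", "liveness", "valence", "tempo",
]


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_track_rows(sp, track_ids: List[str], pause: float = 0.1) -> List[dict]:
    """Fetch metadata and audio features for `track_ids` in batches.

    `sp` is assumed to be a spotipy.Spotify client, or any object with
    compatible `tracks` and `audio_features` methods.
    """
    rows = []
    # tracks() accepts at most 50 IDs per request, so batch by 50;
    # audio_features() accepts up to 100, so one call covers each batch.
    for batch in chunked(track_ids, 50):
        tracks = sp.tracks(batch)["tracks"]
        features = sp.audio_features(batch)
        for track, feats in zip(tracks, features):
            if track is None or feats is None:  # unavailable track or features
                continue
            row = {
                "track_id": track["id"],
                "name": track["name"],
                "artist": track["artists"][0]["name"],
                "popularity": track["popularity"],  # the 0-100 target
            }
            row.update({name: feats.get(name) for name in AUDIO_FEATURES})
            rows.append(row)
        time.sleep(pause)  # simple throttle between requests
    return rows
```

Passing the returned rows to `pandas.DataFrame(rows)` and saving with `DataFrame.to_csv()` would give a file comparable to `spotifydataset.csv`, although the project's actual columns may differ.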

📁 Get the Data

`Spotify Web API`

Steps:

Create an application in the Spotify for Developers dashboard
Obtain the Client ID and Client Secret
Run the Python code to retrieve data from the Spotify Web API
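Steps 2 and 3 can be sketched as follows, assuming the spotipy library and the Client Credentials flow, which is sufficient for public track data and needs no user login. Reading the Client ID and Client Secret from the `SPOTIPY_CLIENT_ID` and `SPOTIPY_CLIENT_SECRET` environment variables (the names spotipy itself looks for) keeps secrets out of the code. The function names `load_credentials` and `make_client` are hypothetical.

```python
import os


def load_credentials():
    """Read the Client ID and Client Secret from environment variables."""
    client_id = os.environ.get("SPOTIPY_CLIENT_ID")
    client_secret = os.environ.get("SPOTIPY_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set SPOTIPY_CLIENT_ID and SPOTIPY_CLIENT_SECRET before running."
        )
    return client_id, client_secret


def make_client():
    """Build a spotipy client that authenticates with Client Credentials."""
    # Imported lazily so the credential check works even without spotipy.
    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    client_id, client_secret = load_credentials()
    auth = SpotifyClientCredentials(client_id=client_id, client_secret=client_secret)
    return spotipy.Spotify(auth_manager=auth)
```

With both variables exported, `sp = make_client()` returns an authenticated client, and a call such as `sp.search(q="track:Yesterday", type="track", limit=5)` is a quick way to confirm access before a large retrieval run.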

Data Preprocessing:

Cleaned and normalized audio features using Pandas
Handled missing values and outliers
Applied standard scaling (zero mean, unit variance) to the features
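A compact sketch of these preprocessing steps with pandas and scikit-learn follows. The specific choices are assumptions rather than the project's exact method: duplicates are removed by a hypothetical `track_id` column, rows with missing values are dropped, outliers are clipped to the 1.5 × IQR fences instead of being removed, and features are standardized with `StandardScaler`.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame, feature_cols, target_col="popularity"):
    """Clean, clip outliers, and standardize the feature columns.

    Returns (scaled feature frame, target series, fitted scaler).
    """
    feature_cols = list(feature_cols)
    # Drop duplicate tracks and rows missing any feature or the target.
    df = df.drop_duplicates(subset="track_id")
    df = df.dropna(subset=feature_cols + [target_col]).copy()

    # Clip each feature to the 1.5 * IQR fences to limit outlier influence.
    for col in feature_cols:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df[col] = df[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

    # Standardize every feature to zero mean and unit variance.
    scaler = StandardScaler()
    X = pd.DataFrame(
        scaler.fit_transform(df[feature_cols]),
        columns=feature_cols,
        index=df.index,
    )
    return X, df[target_col], scaler
```

Fitting the scaler on the full dataset, as this sketch does for brevity, leaks test-set statistics into training; in a real evaluation you would fit it on the training split only and reuse it to transform the test split.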

Model Comparison & Training

Built and trained three regression models: Random Forest, XGBoost, and LightGBM
Optimized hyperparameters for each model
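One way to train the three models and tune their hyperparameters is a cross-validated grid search, sketched below with scikit-learn's `GridSearchCV`. The parameter grids are illustrative placeholders, not the settings behind the reported results, and `XGBRegressor` and `LGBMRegressor` are imported only if XGBoost and LightGBM are installed, so the sketch still runs with Random Forest alone. `candidate_models` and `tune_models` are hypothetical names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV


def candidate_models(random_state=42):
    """Return {name: (estimator, param_grid)}; the grids are illustrative only."""
    models = {
        "RandomForest": (
            RandomForestRegressor(random_state=random_state),
            {"n_estimators": [50, 100], "max_depth": [None, 10]},
        ),
    }
    # XGBoost and LightGBM are optional so the sketch also runs without them.
    try:
        from xgboost import XGBRegressor

        models["XGBoost"] = (
            XGBRegressor(random_state=random_state),
            {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
        )
    except ImportError:
        pass
    try:
        from lightgbm import LGBMRegressor

        models["LightGBM"] = (
            LGBMRegressor(random_state=random_state, verbose=-1),
            {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
        )
    except ImportError:
        pass
    return models


def tune_models(X_train, y_train, cv=3):
    """Grid-search every candidate and return its best refitted estimator."""
    # Plain float arrays give all three libraries the same input type.
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    best = {}
    for name, (estimator, grid) in candidate_models().items():
        search = GridSearchCV(
            estimator, grid, cv=cv, scoring="neg_root_mean_squared_error"
        )
        search.fit(X_train, y_train)
        best[name] = search.best_estimator_  # refit on all of X_train
        print(f"{name}: {search.best_params_}, CV RMSE {-search.best_score_:.3f}")
    return best
```

When the grids grow large, `RandomizedSearchCV` is a common drop-in alternative that samples a fixed number of parameter combinations instead of trying all of them.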

Prediction & Ranking

Generates popularity scores (0-100) for any song
Ranks songs based on predicted popularity
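Ranking reduces to sorting by the predicted score. The sketch below assumes predictions are already on the 0-100 popularity scale (if the target was standardized, invert the scaling first) and clips them to that range; the function name `rank_songs` and the tie rule (tied songs share the better rank) are assumptions.

```python
import numpy as np
import pandas as pd


def rank_songs(songs: pd.DataFrame, predictions) -> pd.DataFrame:
    """Attach predicted popularity (clipped to 0-100) and rank songs by it."""
    ranked = songs.copy()
    ranked["predicted_popularity"] = np.clip(
        np.asarray(predictions, dtype=float), 0, 100
    )
    # A stable sort keeps tied songs in their original order.
    ranked = ranked.sort_values(
        "predicted_popularity", ascending=False, kind="stable"
    )
    # Rank 1 is the most popular; ties share the best (minimum) rank.
    ranked["rank"] = (
        ranked["predicted_popularity"]
        .rank(ascending=False, method="min")
        .astype(int)
    )
    return ranked.reset_index(drop=True)
```

For example, with hypothetical `test_songs` and `X_test`, `rank_songs(test_songs, model.predict(X_test)).head(10)` would list the ten songs predicted to be most popular.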

Visualization

Scatter plots of true vs. predicted popularity scores for the XGBoost, LightGBM, and Random Forest regressors.
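A true-vs-predicted scatter plot like these can be drawn with Matplotlib as sketched below. The dashed line marks y = x, so points lying close to it are accurate predictions. The helper name `plot_true_vs_predicted` and its styling are assumptions, not the notebooks' plotting code.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_true_vs_predicted(y_true, y_pred, title, ax=None):
    """Scatter true vs. predicted popularity with a y = x reference line."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if ax is None:
        _, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(y_true, y_pred, s=10, alpha=0.5)
    # Reference line spanning the combined range of both axes.
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())
    ax.plot([lo, hi], [lo, hi], "r--", linewidth=1, label="perfect prediction")
    ax.set_xlabel("True popularity")
    ax.set_ylabel("Predicted popularity")
    ax.set_title(title)
    ax.legend()
    return ax
```

To place the three models side by side, create `fig, axes = plt.subplots(1, 3, figsize=(15, 5))` and call the helper once per model with `ax=axes[i]`.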

Design Framework

📈 Model Performance Comparison

Metric	Random Forest	LightGBM ⭐ Best Overall	XGBoost
R² Score	0.99	0.99	0.99
RMSE	0.22	0.20	0.20
MAE	0.16	0.13	0.15
MSE	0.05	0.04	0.04

Higher R² is better; lower RMSE, MAE, and MSE are better.
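The four metrics in this comparison can be computed with scikit-learn as below; `regression_report` is a hypothetical helper. RMSE is the square root of MSE, which gives a quick consistency check on the results: 0.20² = 0.04 for LightGBM and XGBoost, and 0.22² ≈ 0.048, which rounds to 0.05, for Random Forest.

```python
import math

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred) -> dict:
    """Compute R², RMSE, MAE, and MSE for one model's predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "R2": r2_score(y_true, y_pred),
        "RMSE": math.sqrt(mse),  # RMSE is the square root of MSE
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
    }
```

Taking `math.sqrt` of the MSE avoids depending on a particular scikit-learn version, since the `squared=False` option of `mean_squared_error` was deprecated in favor of `root_mean_squared_error`. Usage would be `regression_report(y_test, model.predict(X_test))` for each trained model.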

Key Findings

LightGBM achieved the lowest errors overall (RMSE 0.20, MAE 0.13, MSE 0.04).
All three algorithms performed strongly, each reaching R² ≈ 0.99.

Note

Please make sure you have the necessary Python libraries and dependencies installed.
This project is for educational purposes. Please respect Spotify's API terms of service when using this code.

Repository Files

1_DatasetCode.py
2_CleanedDatasetCode.py
3_FeaturesDatasetCode.py
4_featureengineereddata.py
LGBMRegressorSpotify.ipynb
README.md
RandomForest.ipynb
XGBRegressorSpotify.ipynb
spotifydataset.csv
