S&P 500 Clustering Project

Overview

This project applies clustering techniques to S&P 500 stocks based on their historical adjusted close prices. The goal is to group stocks that share similar characteristics, using K-Means Clustering and a Gaussian Mixture Model as the primary algorithms. The project covers data preprocessing, cluster analysis, and validation, providing insight into market behavior and asset groupings.

Prerequisites

Install Node.js

Refer to https://nodejs.org/en/ to install Node.js.

Install Poetry

This project uses Poetry for dependency management and packaging. Poetry provides an easy and reliable way to manage project dependencies.

To get started, you need to have Poetry installed. You can install it by following the instructions in the official documentation.

Alternatively, you can use this command to install it:

curl -sSL https://install.python-poetry.org | python3 -

Installation

Clone the repo

git clone https://github.com/Fritozz-105/Cluster-Algorithm-Comparison.git

Install NPM packages

npm install

To run the application locally, use this command in the frontend directory:

npm run dev

And use this command in the backend directory:

python main.py

Project Features

Frontend

Navigation:
- Navigate to the cluster algorithm analysis page by pressing the Next button.
- Click the logo in the header to return to the start page.
Data Display:
- Fetches the CSV data returned by the K-Means Clustering and Gaussian Mixture Model algorithms and displays it in a scatterplot.
- Fetches the Silhouette Score, Calinski-Harabasz Index, and Davies-Bouldin Index for both algorithms and displays them in a table to compare their performance.

Backend

Data Collection and Cleaning:
- The list of S&P 500 tickers is fetched dynamically from Wikipedia.
- Historical adjusted close prices for each ticker are fetched using the yfinance library.
- Stocks with incomplete data are excluded to ensure consistency.
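The exclusion step above can be sketched as follows. This is a minimal illustration that uses only the Python standard library; the function name `drop_incomplete`, the dictionary layout, and the sample tickers are hypothetical and not taken from the project's code, which may instead filter a DataFrame returned by yfinance.

```python
import math
from typing import Dict, List, Optional


def drop_incomplete(
    prices: Dict[str, List[Optional[float]]], expected_len: int
) -> Dict[str, List[float]]:
    """Keep only tickers whose price history is full-length and gap-free."""
    complete: Dict[str, List[float]] = {}
    for ticker, series in prices.items():
        has_full_length = len(series) == expected_len
        # Treat both None and NaN as missing observations.
        has_no_gaps = all(p is not None and not math.isnan(p) for p in series)
        if has_full_length and has_no_gaps:
            complete[ticker] = list(series)
    return complete


# Hypothetical sample data: only "AAA" has a complete three-day history.
sample_prices = {
    "AAA": [10.0, 10.5, 10.2],
    "BBB": [20.0, None, 20.4],          # missing one day
    "CCC": [30.0, 30.3],                # shorter history
    "DDD": [40.0, float("nan"), 41.0],  # NaN placeholder
}
clean_prices = drop_incomplete(sample_prices, expected_len=3)
```

Dropping whole tickers rather than filling gaps keeps every remaining series the same length, which simplifies the later return and volatility calculations.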
Data Preprocessing:
- Daily returns are calculated for each stock to measure price changes.
- Rolling volatility is calculated over a 20-day window to assess risk.
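The two preprocessing steps above can be sketched as follows. This sketch assumes simple (percentage) returns and the sample standard deviation as the volatility measure; the README does not say whether the project uses simple or log returns, and the function names here are hypothetical.

```python
from statistics import stdev
from typing import List


def daily_returns(prices: List[float]) -> List[float]:
    """Simple return between each pair of consecutive closes."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def rolling_volatility(returns: List[float], window: int = 20) -> List[float]:
    """Sample standard deviation of returns over each trailing window."""
    if window < 2:
        raise ValueError("window must be at least 2")
    return [
        stdev(returns[i - window:i])
        for i in range(window, len(returns) + 1)
    ]


# Hypothetical prices; a 3-day window keeps the example short.
example_prices = [100.0, 102.0, 101.0, 103.0, 104.0]
example_returns = daily_returns(example_prices)
example_vol = rolling_volatility(example_returns, window=3)
```

With the 20-day window described above, the first volatility value only becomes available once 20 returns have been observed, so the output is shorter than the input by `window - 1` entries.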
Clustering Algorithm:
- K-Means Clustering and Gaussian Mixture Model are implemented from scratch to identify clusters in the data.
- The number of clusters is determined using the Elbow Method, which evaluates distortion as a function of cluster count.
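As an illustration of the ideas above, here is a minimal K-Means sketch (Lloyd's algorithm) in pure Python, together with the distortion values used by the Elbow Method. It is not the project's implementation: the function names, the random initialization from data points, and the toy points are assumptions made for this example.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: List[Point], k: int, iters: int = 100, seed: int = 0
) -> Tuple[List[List[float]], List[int], float]:
    """Return (centroids, labels, distortion) after Lloyd's iterations."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    # Distortion: total squared distance from points to their centroids.
    distortion = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, distortion


# Hypothetical toy data: two well-separated groups of three points.
toy_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
              (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
distortions: Dict[int, float] = {
    k: kmeans(toy_points, k)[2] for k in range(1, 5)
}
```

Plotting `distortions` against `k` gives the elbow curve: the drop from one to two clusters is large for this toy data, and adding further clusters yields only small improvements, which suggests two clusters.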
Cluster Validation:
- Within-cluster variances are calculated to measure cluster compactness and validate clustering performance.
- Other metrics are calculated for comparison, such as the Silhouette Score, Calinski-Harabasz Index, and Davies-Bouldin Index.
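To make one of these validation metrics concrete, the following is a small pure-Python sketch of the mean silhouette score using Euclidean distance. It is meant only as an illustration under those assumptions; the project may compute its metrics differently (for example with a library), and the function name and sample data are hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence

Point = Sequence[float]


def silhouette_score(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a coefficient of 0
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so divide by len(own) - 1).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: the first labeling matches the two visible groups,
# the second one mixes them.
sample_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                 (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
good_score = silhouette_score(sample_points, [0, 0, 0, 1, 1, 1])
mixed_score = silhouette_score(sample_points, [0, 1, 0, 1, 0, 1])
```

Scores lie between -1 and 1, and higher values indicate tighter, better-separated clusters, so the labeling that matches the two groups scores higher than the mixed one.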
