Data Mining Projects 📊

This repository contains multiple projects covering several important topics in Data Mining.

About ℹ️

Under The Supervision of Prof. Ehsan Nazerfard 👨‍🏫

Spring 2023 🌸

1. Data Preprocessing 🧹

This project applies a range of preprocessing techniques to show why understanding, cleaning, and refining a raw dataset matters. It covers:

Managing NaN values ❓
Processing non-numeric data through Label Encoding and One Hot Encoding 🔤➡️🔢
Implementing Data Augmentation 📈
Utilizing Upsampling and Downsampling methods ⚖️
Applying SMOTETomek and SMOTEENN approaches 🔄
Normalizing the data 📊
Conducting Principal Component Analysis (PCA) 📉
Creating plots and visualizations 📈📊
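Several of the steps listed above can be sketched in a few lines. This is a minimal illustration and not the project's actual notebook: the tiny table, its column names, and the choice of median imputation are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Tiny made-up table; column names and values are illustrative only.
df = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap", "Adelie", "Gentoo", "Chinstrap"],
    "island": ["Torgersen", "Biscoe", "Dream", "Dream", "Biscoe", "Dream"],
    "bill_length_mm": [39.1, 46.5, np.nan, 38.6, 47.2, 49.0],
    "body_mass_g": [3750.0, 5100.0, 3700.0, np.nan, 5300.0, 3800.0],
})
numeric_cols = ["bill_length_mm", "body_mass_g"]

# 1) Fill NaN values in numeric columns with the column median (an assumed strategy).
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# 2) Label-encode the target and one-hot encode a categorical feature.
y = LabelEncoder().fit_transform(df["species"])
X = pd.get_dummies(df.drop(columns="species"), columns=["island"])

# 3) Standardize every feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X.astype(float))

# 4) Project the standardized features onto the first two principal components.
X_pca = PCA(n_components=2).fit_transform(X_scaled)
print(X_pca.shape)
```

Label Encoding is used here for the target and One Hot Encoding for a nominal feature, since one-hot columns avoid implying an order between categories.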

Libraries: Scikit-learn, Pandas, Imbalanced-learn, Matplotlib 🐍

About the Dataset: The dataset used for this project is Palmer Penguins 🐧. It was collected to identify three penguin species (Adelie, Gentoo, and Chinstrap). Each penguin is described by 7 features.
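Class imbalance is the reason for the resampling steps in this project. The sketch below shows only plain random upsampling and downsampling with `sklearn.utils.resample`, as a simpler stand-in for the SMOTETomek and SMOTEENN combinations from Imbalanced-learn. The data is synthetic and the variable names are hypothetical.

```python
import numpy as np
from sklearn.utils import resample

seed = 0
# Made-up imbalanced data: 8 samples of class 0 and 3 samples of class 1.
X = np.arange(22, dtype=float).reshape(11, 2)
y = np.array([0] * 8 + [1] * 3)
X_major, X_minor = X[y == 0], X[y == 1]

# Upsampling: draw minority rows with replacement until both classes match.
X_minor_up = resample(X_minor, replace=True, n_samples=len(X_major), random_state=seed)
X_up = np.vstack([X_major, X_minor_up])
y_up = np.array([0] * len(X_major) + [1] * len(X_minor_up))

# Downsampling: keep a random subset of majority rows, without replacement.
X_major_down = resample(X_major, replace=False, n_samples=len(X_minor), random_state=seed)
X_down = np.vstack([X_major_down, X_minor])
y_down = np.array([0] * len(X_major_down) + [1] * len(X_minor))

print(np.bincount(y_up), np.bincount(y_down))
```

Random upsampling only duplicates existing rows, whereas SMOTE-based methods synthesize new minority samples, so the two can behave quite differently on real data.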

2. Regression and Classification 📈📊

This project applies several machine learning techniques to housing price data to illustrate how both regression and classification methods are used and how they perform.

Q-box analysis 📦
Comparison between Linear Regression and Polynomial Regression ➕➗📈
Calculation of Mean Squared Error 📐
Classification methods such as Decision Trees 🌳, Random Forests 🌲🌲🌲, K-Nearest Neighbors (KNN) 👨‍👩‍👧‍👦, Linear and Non-Linear Support Vector Machines (SVM) ⚔️
Multi-class classification employing Deep Learning techniques 🧠🤖
Utilization of a Confusion Matrix 🤔✅❌
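The comparison between Linear Regression and Polynomial Regression by Mean Squared Error, listed above, might look roughly like the following. The data is synthetic (a quadratic relation between a hypothetical `area` feature and `price`), so the numbers say nothing about the real houseprice.csv results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: price grows quadratically with area, plus Gaussian noise.
rng = np.random.default_rng(0)
area = rng.uniform(40, 200, size=200).reshape(-1, 1)
price = 0.05 * area[:, 0] ** 2 + 3.0 * area[:, 0] + rng.normal(0, 20, size=200)

# Simple hold-out split: first 150 rows for training, last 50 for testing.
train, test = slice(0, 150), slice(150, 200)

linear = LinearRegression().fit(area[train], price[train])
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(area[train], price[train])

# MSE is the mean of squared differences between true and predicted values.
mse_linear = mean_squared_error(price[test], linear.predict(area[test]))
mse_poly = mean_squared_error(price[test], poly.predict(area[test]))
print(f"linear MSE: {mse_linear:.1f}, polynomial MSE: {mse_poly:.1f}")
```

Because the synthetic target is quadratic by construction, the degree-2 model fits it better here; on real data the higher-degree model can also overfit.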

Libraries: Scikit-learn, TensorFlow, Pandas, NumPy, Matplotlib 🐍

About the Dataset: The dataset used for this project is House Price Prediction (houseprice.csv) 🏠. Each record includes the area, the number of rooms, whether the house has parking, storage, and an elevator, its address, and its price.
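For the classification side, a confusion matrix summarizes which classes a model mixes up. The sketch below assumes a hypothetical three-band price label derived from a synthetic `area` feature; neither the labeling rule nor the features are taken from the project itself.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for housing features: area and number of rooms.
rng = np.random.default_rng(0)
area = rng.uniform(40, 200, size=300)
rooms = rng.integers(1, 6, size=300)
X = np.column_stack([area, rooms])
# Hypothetical 3-class target: low / mid / high price band derived from area.
y = np.digitize(area, bins=[90, 150])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}
matrices = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Rows are true classes, columns are predicted classes.
    matrices[name] = confusion_matrix(y_test, model.predict(X_test), labels=[0, 1, 2])
    print(name)
    print(matrices[name])
```

The diagonal of each matrix counts correct predictions, so off-diagonal entries show which price bands a model confuses.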

3. K-means Algorithm 🎯

The goal of this project is to gain insight into clustering through hands-on exploration and to build clusters in Python.

Generation of a similarity matrix using cosine similarity and Euclidean distance 📐
Implementation of the K-means algorithm
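The two pairwise measures named above can be computed directly with NumPy. This is a minimal sketch with hypothetical function names, not the project's own code. Note that Euclidean distance is a dissimilarity: smaller values mean more similar points.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.clip(norms, 1e-12, None)  # guard against zero-length rows
    return unit @ unit.T

def euclidean_distance_matrix(X):
    """Pairwise Euclidean distance between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

points = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]])
cos = cosine_similarity_matrix(points)
dist = euclidean_distance_matrix(points)
print(np.round(cos, 3))
print(np.round(dist, 3))
```

Both matrices are symmetric; the cosine matrix has ones on its diagonal and the distance matrix has zeros.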

Simulated Result:

Libraries: Scikit-learn, Matplotlib, NumPy 🐍
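The README does not show the project's own K-means code. As a rough sketch of the standard iteration (Lloyd's algorithm), assuming random data points as initial centroids and two synthetic blobs as input:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct random data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels

# Two well-separated synthetic blobs around (0, 0) and (5, 5).
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
X = np.vstack([blob_a, blob_b])
centroids, labels = kmeans(X, k=2)
print(centroids)
```

K-means is sensitive to initialization on harder data, which is why library implementations such as Scikit-learn's `KMeans` run several initializations by default.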

4. Final Project: Persian Spotify 🎵🇮🇷

A project aimed at making various predictions using the Persian music dataset.

Data analysis and review, including Exploratory Data Analysis (EDA) and PCA visualization. 🔍📊
Application of regression to predict music popularity. 🎶📈
Classification of music into traditional and non-traditional categories. 🪕🎸

Libraries: Scikit-learn, Matplotlib, NumPy, Seaborn 🐍

About the Dataset: This dataset contains 10,632 songs from 69 Iranian artists. Each song is described by 32 features.
