
📊 A collection of hands-on Data Mining projects covering core concepts like classification, clustering, association rule mining, and text analysis, implemented in Python.


Amirbehnam1009/Data-Mining-Projects


Data Mining Projects 📊

This repository contains four projects covering core topics in data mining.

About ℹ️

Under the supervision of Prof. Ehsan Nazerfard 👨‍🏫

Spring 2023 🌸


1. Data Preprocessing 🧹

This project applies a range of preprocessing techniques to show why understanding, cleaning, and refining a raw dataset matters. It covers:

  1. Handling missing (NaN) values ❓
  2. Encoding non-numeric data with Label Encoding and One-Hot Encoding 🔤➡️🔢
  3. Data augmentation 📈
  4. Upsampling and downsampling ⚖️
  5. Combined resampling with SMOTETomek and SMOTEENN 🔄
  6. Data normalization 📊
  7. Principal Component Analysis (PCA) 📉
  8. Plots and visualizations 📈📊
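The project itself relies on Pandas, Scikit-learn, and Imbalanced-learn for these steps. As a library-free illustration only, the sketch below shows what mean imputation (item 1), one-hot encoding (item 2), min-max scaling (one common form of normalization, item 6), and random upsampling (item 4) compute. All function names and sample values are hypothetical. Random duplication is a simpler stand-in for the SMOTE-based methods in item 5, which synthesize new minority samples by interpolating between neighbors instead of copying rows.

```python
import math
import random

# Library-free sketch of a few preprocessing steps; names and data are hypothetical.


def impute_mean(column):
    """Replace NaN entries with the mean of the non-missing values."""
    present = [v for v in column if not math.isnan(v)]
    mean = sum(present) / len(present)
    return [mean if math.isnan(v) else v for v in column]


def one_hot(column):
    """Encode each category as a 0/1 indicator vector (categories sorted alphabetically)."""
    categories = sorted(set(column))
    return categories, [[1 if v == c else 0 for c in categories] for v in column]


def min_max_scale(column):
    """Linearly rescale values into [0, 1] (assumes the column is not constant)."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def random_upsample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Hypothetical penguin-like columns (not taken from the actual dataset).
bill_length = [39.1, float("nan"), 40.3, 46.5]
island = ["Torgersen", "Biscoe", "Dream", "Biscoe"]

filled = impute_mean(bill_length)
categories, encoded = one_hot(island)
scaled = min_max_scale(filled)
balanced_rows, balanced_labels = random_upsample(
    [[1.0], [2.0], [3.0]], ["Adelie", "Adelie", "Chinstrap"]
)
print(categories, encoded[0], scaled)
```

Label encoding would instead map each category to a single integer; one-hot encoding avoids implying an order between categories such as island names.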

Libraries: Scikit-learn, Pandas, Imbalanced-learn, Matplotlib 🐍

About the Dataset: This project uses the Palmer Penguins dataset 🐧, collected to study three penguin species (Adelie, Gentoo, and Chinstrap). Each penguin is described by 7 features.


2. Regression and Classification 📈📊

This project applies a variety of machine learning techniques to housing price data, illustrating both regression and classification methods.

  1. Q-box analysis 📦
  2. Comparison of Linear Regression and Polynomial Regression ➕➗📈
  3. Calculation of Mean Squared Error (MSE) 📏
  4. Classification with Decision Trees 🌳, Random Forests 🌲🌲🌲, K-Nearest Neighbors (KNN) 👨‍👩‍👧‍👦, and linear and non-linear Support Vector Machines (SVM) ⚔️
  5. Multi-class classification with deep learning 🧠🤖
  6. Evaluation with a confusion matrix 🤔✅❌
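Items 2 and 3 can be illustrated without any library. The sketch below fits polynomials by ordinary least squares, solving the normal equations (X^T X) w = X^T y with Gaussian elimination, and compares the MSE of a degree-1 (linear) fit with a degree-2 (polynomial) fit. The data is hypothetical, not drawn from houseprice.csv, and the helper names are illustrative; in the project, Scikit-learn models would normally handle the fitting.

```python
def design_row(x, degree):
    """Powers 1, x, x^2, ..., x^degree for one input value."""
    return [x ** p for p in range(degree + 1)]


def solve(a, b):
    """Solve the square system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_poly(xs, ys, degree):
    """Least-squares coefficients w via the normal equations (X^T X) w = X^T y."""
    rows = [design_row(x, degree) for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(w, x):
    """Evaluate the polynomial with coefficients w (lowest power first) at x."""
    return sum(c * x ** p for p, c in enumerate(w))


def mse(ys, preds):
    """Mean squared error: the average of squared residuals."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical data (not from houseprice.csv): price grows quadratically with area.
areas = [1, 2, 3, 4, 5, 6]
prices = [2 + 0.5 * a * a for a in areas]

linear = fit_poly(areas, prices, degree=1)
quadratic = fit_poly(areas, prices, degree=2)
mse_linear = mse(prices, [predict(linear, a) for a in areas])
mse_quadratic = mse(prices, [predict(quadratic, a) for a in areas])
print(f"linear MSE={mse_linear:.3f}, quadratic MSE={mse_quadratic:.3g}")
```

On this curved data the quadratic fit drives the MSE to (numerically) zero while the straight line cannot. For high polynomial degrees the normal equations become ill-conditioned, so QR- or SVD-based solvers are the safer choice.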

Libraries: Scikit-learn, TensorFlow, Pandas, NumPy, Matplotlib 🐍

About the Dataset: This project uses the House Price Prediction dataset (houseprice.csv) 🏠. Each record lists a house's area, number of rooms, whether it has parking, storage, and an elevator, its address, and its price.


3. K-Means Algorithm 🎯

This project builds intuition for clustering through hands-on exploration and implements clustering in Python.

  1. Generating a similarity matrix using cosine similarity and Euclidean distance 📏
  2. Implementing the K-means algorithm
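A minimal, library-free sketch of both steps: pairwise cosine-similarity and Euclidean-distance matrices, followed by Lloyd's algorithm for K-means. The points, the function names, and the deterministic farthest-first initialization are assumptions chosen for illustration and reproducibility; for comparison, Scikit-learn's KMeans defaults to k-means++ initialization.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (undefined for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def euclidean(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise(points, fn):
    """Matrix whose (i, j) entry is fn(points[i], points[j])."""
    return [[fn(p, q) for q in points] for p in points]


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the point
    farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid
    recomputation until the assignments stop changing."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids


# Hypothetical 2-D points forming two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
similarity = pairwise(points, cosine_similarity)
distances = pairwise(points, euclidean)
labels, centroids = kmeans(points, k=2)
print(labels)
```

Cosine similarity compares direction only, so points on the same ray from the origin score 1.0 regardless of their distance apart; Euclidean distance, which K-means minimizes, is sensitive to magnitude.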

Simulated Result: [Figures: C1, C2, C3, C4]

Libraries: Scikit-learn, Matplotlib, NumPy 🐍


4. Final Project: Persian Spotify 🎵🇮🇷

A project that makes several kinds of predictions from a Persian music dataset.

  1. Data analysis, including Exploratory Data Analysis (EDA) and PCA visualization 🔍📊
  2. Regression to predict song popularity 🎶📈
  3. Classification of songs as traditional or non-traditional 🪕🎸
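A PCA visualization like the one in item 1 typically projects each song's features onto the two directions of greatest variance. As a library-free sketch (the project itself lists Scikit-learn), the code below centers the data, forms the sample covariance matrix, and extracts the top principal components by power iteration with deflation. The function name and data are hypothetical, and power iteration is a swapped-in technique chosen for brevity; Scikit-learn's PCA is based on a singular value decomposition instead.

```python
import math
import random


def pca(rows, n_components=2, iters=1000, seed=0):
    """Return (components, projected): unit principal directions found by power
    iteration on the sample covariance matrix, and the centered rows projected on them."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start vector
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in any remaining direction
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v gives the eigenvalue (variance along v).
        eigval = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: subtract this component so the next pass finds the next one.
        cov = [
            [cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)
        ]
    projected = [
        [sum(r[j] * c[j] for j in range(d)) for c in components] for r in centered
    ]
    return components, projected


# Hypothetical rows: most of the variance lies along the first feature.
rows = [[1, 0.1], [2, -0.1], [3, 0.1], [4, -0.1], [5, 0.0]]
components, projected = pca(rows, n_components=2)
print(components)
```

The signs of the components are arbitrary, so a plot of `projected` may appear mirrored relative to another PCA implementation without any difference in meaning.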

Libraries: Scikit-learn, Matplotlib, NumPy, Seaborn 🐍

About the Dataset: This dataset contains 10,632 songs from 69 Iranian artists, each described by 32 features.
