A machine learning project 🚀 that predicts wine quality from physicochemical properties. Quality scores (on a 0-10 scale) are binarized into good/bad classes, and the best model, an XGBoost classifier, reaches 82% accuracy. 🍇
- 🔍 Comprehensive EDA with histograms, correlation heatmaps, and feature analysis
- 🤖 Multiple ML models comparison (XGBoost, SVM, Logistic Regression)
- ⚙️ Advanced preprocessing with missing value imputation and feature scaling (see the sketch after this list)
- 📊 Model evaluation using ROC-AUC scores and classification reports
- 🧹 Clean codebase with PEP8 compliance and modular structure
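A minimal preprocessing sketch: the file name `winequality.csv`, mean imputation, and min-max scaling are illustrative assumptions, not the project's only possible choices; `quality` is the target column.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv('winequality.csv')  # assumed file name

# Mean-impute any columns with missing values
df = df.fillna(df.mean())

# Scale the physicochemical features to [0, 1]; 'quality' is the target
features = df.drop('quality', axis=1)
scaled = MinMaxScaler().fit_transform(features)
```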
- 🐼 Pandas: Data handling
- 🔢 NumPy: Array operations
- 📊 Seaborn/Matplotlib: Data visualization
- 🤖 scikit-learn (sklearn): Machine learning tasks
- 🚀 XGBoost: Advanced boosting algorithm
The dataset contains 11 physicochemical features, plus a quality score that serves as the prediction target:
- 🍋 Fixed acidity
- 🌬️ Volatile acidity
- 🍊 Citric acid
- 🍬 Residual sugar
- 🧂 Chlorides
- 🫧 Free sulfur dioxide
- 🫧 Total sulfur dioxide
- ⚖️ Density
- 🧪 pH
- 🧪 Sulphates
- 🍷 Alcohol
- 🏆 Quality (target)
Each feature provides unique insight into the chemistry and characteristics of the wine, ultimately influencing its quality.
Explore key statistics such as mean, standard deviation, min, max, and quartiles for each wine feature. These insights help you understand data distribution, variability, and potential outliers in your dataset. 🧮
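With pandas, these statistics come from a single call (assuming the dataset is loaded into a DataFrame named `df`):

```python
# count, mean, std, min, quartiles, and max for every numeric column
print(df.describe().T)
```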
EDA is an approach to analyzing data with visual techniques: it helps uncover trends and patterns and check assumptions through statistical summaries and graphical representations. 🕵️‍♂️ Let's start by counting the null values in each column to gauge data quality and completeness. 🧐
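Checking for nulls is one line (again assuming the DataFrame is named `df`):

```python
# Number of missing values in each column
print(df.isnull().sum())
```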
Datasets sometimes contain redundant features that do not improve model performance, so we remove them before training.
The heatmap above shows that 'total sulfur dioxide' and 'free sulfur dioxide' are highly correlated, so we drop one of the pair ('total sulfur dioxide') before training.
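The heatmap and the drop step might look like this (a sketch; the column name follows the UCI dataset, and the 0.7 threshold is an illustrative choice):

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Highlight feature pairs whose correlation exceeds 0.7
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr() > 0.7, annot=True, cbar=False)
plt.show()

# The two sulfur dioxide features are highly correlated, so keep only one
df = df.drop('total sulfur dioxide', axis=1)
```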
| Model | 🏋️ Training Accuracy | 🧪 Validation Accuracy |
|---|---|---|
| Logistic Regression | 0.698 | 0.686 |
| XGBoost Classifier | 0.976 | 0.805 |
| SVC (RBF Kernel) | 0.720 | 0.707 |
- XGBoost Classifier delivered the highest validation accuracy! 🚀
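A sketch of the comparison loop, assuming `df` is the preprocessed DataFrame; the binarization threshold (quality > 5), split ratio, and random seed are assumptions and may differ from the project's notebook:

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Binarize the target: 1 = good wine, 0 = otherwise (assumed threshold)
y = (df['quality'] > 5).astype(int)
X = df.drop('quality', axis=1)

xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = [LogisticRegression(max_iter=1000),
          XGBClassifier(),
          SVC(kernel='rbf')]

# Fit each model and report train/validation accuracy
for model in models:
    model.fit(xtrain, ytrain)
    print(type(model).__name__)
    print('  training accuracy  :', accuracy_score(ytrain, model.predict(xtrain)))
    print('  validation accuracy:', accuracy_score(ytest, model.predict(xtest)))
```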
| Model | Training AUC | Validation AUC |
|---|---|---|
| Logistic Regression | 0.70 | 0.69 |
| XGBoost | 0.98 | 0.80 |
| SVC (RBF Kernel) | 0.72 | 0.71 |
Best Model (XGBoost) Classification Report:

```
              precision    recall  f1-score   support

           0       0.76      0.74      0.75       474
           1       0.86      0.86      0.86       826
```
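The AUC table and the report above can be reproduced along these lines (a sketch; `xgb` stands for the fitted XGBoost model from the comparison loop, and the variable names are assumptions):

```python
from sklearn.metrics import roc_auc_score, classification_report

xgb = models[1]  # the fitted XGBClassifier from the loop above

# ROC-AUC is computed from predicted probabilities of the positive class
print('Training AUC  :', roc_auc_score(ytrain, xgb.predict_proba(xtrain)[:, 1]))
print('Validation AUC:', roc_auc_score(ytest, xgb.predict_proba(xtest)[:, 1]))

# Per-class precision, recall, and F1 for the best model
print(classification_report(ytest, xgb.predict(xtest)))
```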
- 🍴 Fork the repository
- 🌿 Create your feature branch (`git checkout -b feature/AmazingFeature`)
- 💾 Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- 🚀 Push to the branch (`git push origin feature/AmazingFeature`)
- 🔄 Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
This project has been created as my final submission for Stanford’s Code in Place 2025! 🚀 The project applies the foundational Python and data science skills learned in Code in Place to a real-world machine learning challenge: predicting wine quality based on physicochemical features. The program uses a well-known dataset to train and evaluate several machine learning models, focusing on clean code, data analysis, and model comparison.
- Adityabaan Tripathy - Initial work
- Wine Quality Dataset - UCI Machine Learning Repository