
WinQ leverages machine learning to predict wine quality using key physicochemical features, delivering actionable insights with strong model accuracy. Developed for Stanford Code in Place 2025, this project showcases the power of Python and data science fundamentals in a real-world context.

🍷 WinQ: Machine Learning Tastes Wine Quality!

Python 3.8+ · License: MIT · ML

A machine learning project 🚀 that predicts wine quality from physicochemical properties (quality is scored on a 0-10 scale). The best model, an XGBoost classifier, reaches about 0.80 validation accuracy. 🍇

Features

  • 🔍 Comprehensive EDA with histograms, correlation heatmaps, and feature analysis
  • 🤖 Multiple ML models comparison (XGBoost, SVM, Logistic Regression)
  • ⚙️ Advanced preprocessing with missing value imputation and feature scaling
  • 📊 Model evaluation using ROC-AUC scores and classification reports
  • 🧹 Clean codebase with PEP8 compliance and modular structure

📚 Importing Libraries and Dataset

  • 🐼 Pandas: Data handling
  • 🔢 NumPy: Array operations
  • 📊 Seaborn/Matplotlib: Data visualization
  • 🤖 scikit-learn (sklearn): Machine learning tasks
  • 🚀 XGBoost: Advanced boosting algorithm
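The stack above can be pulled in with a single import block (a sketch; the try/except keeps it runnable when xgboost, which is a separate install, is missing):

```python
import numpy as np               # array operations
import pandas as pd              # data handling
import matplotlib.pyplot as plt  # plotting backend for seaborn
import seaborn as sns            # statistical visualisation

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn import metrics

try:
    from xgboost import XGBClassifier  # advanced boosting algorithm
except ImportError:                    # optional: pip install xgboost
    XGBClassifier = None
```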

🏅 First Five Rows of the Dataset

(figure: the first five rows of the dataset, as shown by df.head())

🗃️ Dataset

The dataset contains 11 physicochemical features, plus the quality score used as the prediction target:

  • 🍋 Fixed acidity
  • 🌬️ Volatile acidity
  • 🍊 Citric acid
  • 🍬 Residual sugar
  • 🧂 Chlorides
  • 🫧 Free sulfur dioxide
  • 🌫️ Total sulfur dioxide
  • ⚖️ Density
  • 🧪 pH
  • 🧪 Sulphates
  • 🍷 Alcohol
  • 🏆 Quality (target)

Each feature provides unique insight into the chemistry and characteristics of the wine, ultimately influencing its quality.
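Loading the data might look like the sketch below. The CSV file name is an assumption, so the inline rows (a stand-in copied from the first two records of the UCI red-wine dataset) keep the example self-contained:

```python
import pandas as pd

# In the project: df = pd.read_csv("winequality.csv")  # path is an assumption
# Stand-in sample with the same 12 columns, for illustration only.
df = pd.DataFrame({
    "fixed acidity": [7.4, 7.8], "volatile acidity": [0.70, 0.88],
    "citric acid": [0.00, 0.00], "residual sugar": [1.9, 2.6],
    "chlorides": [0.076, 0.098], "free sulfur dioxide": [11.0, 25.0],
    "total sulfur dioxide": [34.0, 67.0], "density": [0.9978, 0.9968],
    "pH": [3.51, 3.20], "sulphates": [0.56, 0.68],
    "alcohol": [9.4, 9.8], "quality": [5, 5],
})
print(df.head())   # first rows: a quick sanity check of the columns
```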

📊 Descriptive Statistical Measures of the Dataset

(figure: statistical summary table)

Explore key statistics such as mean, standard deviation, min, max, and quartiles for each wine feature. These insights help you understand data distribution, variability, and potential outliers in your dataset. 🧮
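All of those statistics come from a single pandas call, sketched here on a tiny stand-in frame:

```python
import pandas as pd

# Stand-in sample; in the project this is the full wine DataFrame.
df = pd.DataFrame({"alcohol": [9.4, 9.8, 10.0, 9.6],
                   "pH": [3.51, 3.20, 3.26, 3.16]})
summary = df.describe()   # count, mean, std, min, 25%, 50%, 75%, max
print(summary)
```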

🔍 Exploratory Data Analysis (EDA) 📊

EDA is an approach to analyzing data using visual techniques. It helps you discover trends, patterns, and check assumptions through statistical summaries and graphical representations. 🕵️‍♂️ Let’s start by checking the number of null values in each column of the dataset to ensure data quality and completeness. 🧐

(figure: null-value count per column)
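The null check (plus the mean-imputation fix the Features section mentions) can be sketched like this, using a tiny stand-in frame with deliberate gaps:

```python
import numpy as np
import pandas as pd

# Stand-in frame with missing values; the project runs this on the wine data.
df = pd.DataFrame({"alcohol": [9.4, np.nan, 10.0],
                   "pH": [3.51, 3.20, np.nan]})
missing = df.isnull().sum()    # null count per column
print(missing)

# A common fix: fill gaps with each column's mean.
df = df.fillna(df.mean())
```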

📈 Histograms for Continuous Data

(figure: histograms of the continuous features)
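One pandas call draws a histogram per continuous column. A sketch on synthetic data (the Agg backend and output file name are choices for running headlessly, not part of the project):

```python
import matplotlib
matplotlib.use("Agg")            # headless backend so this runs in scripts/CI
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # synthetic stand-in data
df = pd.DataFrame({"alcohol": rng.normal(10.4, 1.0, 200),
                   "pH": rng.normal(3.3, 0.15, 200)})
axes = df.hist(bins=20, figsize=(8, 4))   # one subplot per column
plt.tight_layout()
plt.savefig("histograms.png")
```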

🍇 Count Plot for Each Quality of Wine

(figure: count plot of wine quality scores)
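The count plot is one seaborn call; sketched here on a small stand-in series of quality scores:

```python
import matplotlib
matplotlib.use("Agg")            # headless backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"quality": [5, 5, 6, 6, 6, 7, 5, 6]})  # stand-in scores
ax = sns.countplot(x="quality", data=df)   # one bar per quality value
ax.set(xlabel="quality", ylabel="count")
plt.savefig("quality_counts.png")
```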

🔥 Heatmap for Highly Correlated Features

Datasets often contain redundant features that add no predictive value. Removing them before training keeps the model simpler and can improve generalisation.

(figure: correlation heatmap of the features)

From the above heatmap, we can see that 'total sulfur dioxide' and 'free sulfur dioxide' are highly correlated, so we keep only one of the pair and drop 'total sulfur dioxide' before training.
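A sketch of the correlation check and the drop, using synthetic data in which the two sulfur-dioxide columns are made nearly collinear on purpose (the 0.7 threshold is a common convention, not a value stated by the project):

```python
import matplotlib
matplotlib.use("Agg")            # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(1)
free = rng.normal(15.0, 5.0, 100)
df = pd.DataFrame({
    "free sulfur dioxide": free,
    "total sulfur dioxide": free * 2 + rng.normal(0, 1, 100),  # ~collinear
    "alcohol": rng.normal(10.0, 1.0, 100),
})
sns.heatmap(df.corr() > 0.7, annot=True)   # flag strongly correlated pairs
plt.savefig("heatmap.png")

# Keep one member of the correlated pair before training.
df = df.drop("total sulfur dioxide", axis=1)
```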


🤖 Model Performance Comparison

| Model | 🏋️ Training Accuracy | 🧪 Validation Accuracy |
| --- | --- | --- |
| Logistic Regression | 0.698 | 0.686 |
| XGBoost Classifier | 0.976 | 0.805 |
| SVC (RBF Kernel) | 0.720 | 0.707 |
  • XGBoost Classifier delivered the highest validation accuracy! 🚀
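The comparison loop behind a table like this can be sketched as follows. Synthetic features stand in for the wine data, and XGBoost is left as an optional extra so the sketch runs with scikit-learn alone; the split ratio and random seeds are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Stand-in features/labels; the project derives these from the wine DataFrame.
X, y = make_classification(n_samples=400, n_features=11, random_state=0)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=40)
scaler = MinMaxScaler()                 # feature scaling
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)        # fit on train only, to avoid leakage

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVC (RBF kernel)": SVC(kernel="rbf"),
    # add "XGBoost": XGBClassifier() here if xgboost is installed
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_val, model.predict(X_val))
    print(f"{name}: validation accuracy {scores[name]:.3f}")
```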

🧮 Confusion Matrix on Validation Data

(figure: confusion matrix on validation data)
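scikit-learn computes the matrix from true and predicted labels; a sketch with hand-made stand-in predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Stand-in labels; in the project these come from the best model
# evaluated on the validation split.
y_val = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])
cm = confusion_matrix(y_val, y_pred)   # rows: true class, cols: predicted
print(cm)
```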


📊 Model Performance (AUC)

| Model | Training AUC | Validation AUC |
| --- | --- | --- |
| Logistic Regression | 0.70 | 0.69 |
| XGBoost | 0.98 | 0.80 |
| SVM (RBF Kernel) | 0.72 | 0.71 |

Best Model (XGBoost) Classification Report:

| Class | Precision | Recall | F1-score | Support |
| --- | --- | --- | --- | --- |
| 0 | 0.76 | 0.74 | 0.75 | 474 |
| 1 | 0.86 | 0.86 | 0.86 | 826 |
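Both metrics come straight from sklearn.metrics; a sketch with stand-in labels and scores (the 0.5 decision threshold is the usual default, not a project-specific choice):

```python
import numpy as np
from sklearn.metrics import classification_report, roc_auc_score

y_val = np.array([0, 0, 1, 1, 1, 0, 1, 0])              # stand-in labels
y_score = np.array([0.2, 0.6, 0.8, 0.9, 0.3, 0.1, 0.7, 0.4])  # model scores
auc = roc_auc_score(y_val, y_score)                     # validation AUC
print(f"AUC: {auc:.3f}")
print(classification_report(y_val, (y_score >= 0.5).astype(int)))
```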


🤝 Contributing

  1. 🍴 Fork the repository
  2. 🌿 Create your feature branch (`git checkout -b feature/AmazingFeature`)
  3. 💾 Commit changes (`git commit -m 'Add some AmazingFeature'`)
  4. 🚀 Push to branch (`git push origin feature/AmazingFeature`)
  5. 🔄 Open Pull Request

📜 License

Distributed under the MIT License. See LICENSE for more information.


Code in Place 2025

(figure: Code in Place 2025 banner)

This project has been created as my final submission for Stanford’s Code in Place 2025! 🚀 The project applies the foundational Python and data science skills learned in Code in Place to a real-world machine learning challenge: predicting wine quality based on physicochemical features. The program uses a well-known dataset to train and evaluate several machine learning models, focusing on clean code, data analysis, and model comparison.

🙌 Credits
