
An end-to-end ML pipeline for classifying wine customer segments using the UCI Wine dataset. It leverages Kernel Principal Component Analysis (KPCA) to reduce 13 chemical features into 2 dimensions, followed by a Logistic Regression model.


samir-m0hamed/WineCustomerSegmentation


Wine Customer Segmentation

Kernel PCA & Logistic Regression



A streamlined Machine Learning approach to classify wine varieties based on chemical composition using Dimensionality Reduction and Linear Classification.


Technical Specifications

| Component | Specification |
| --- | --- |
| Dataset | UCI Wine Dataset (178 samples, 13 features) |
| Preprocessing | StandardScaler (z-score normalization) |
| Dim. Reduction | PCA (13D to 2D), retaining maximum variance |
| Model | Logistic Regression (linear classifier) |
| Metric | Accuracy score and confusion matrix |
| Visualization | Linear decision boundaries (2D projection) |

The Architecture

The project follows a fixed pipeline in which each stage feeds directly into the next:

graph LR
    A[Raw Data] --> B(Standardization)
    B --> C{PCA Transformation}
    C --> D[2 Principal Components]
    D --> E[Logistic Regression Training]
    E --> F[Prediction & Evaluation]
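The stages in the diagram map naturally onto a scikit-learn `Pipeline`. The following is a minimal sketch, not the repository's actual code. It assumes scikit-learn's bundled `load_wine` copy of the UCI Wine data in place of the project's data file, plain `PCA` as described in the sections below, and an 80/20 stratified split with a fixed seed, since the README does not state how the data was split.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: sklearn's bundled UCI Wine data stands in for the project's data file.
X, y = load_wine(return_X_y=True)

# Assumption: 80/20 stratified split with a fixed seed (not stated in the README).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Raw data -> standardization -> 2 principal components -> logistic regression.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(random_state=0),
)

# Fitting on the training split only keeps test data out of the scaler and PCA.
pipeline.fit(X_train, y_train)

test_accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Wrapping the scaler and PCA in the pipeline means both are fitted on the training split alone, which avoids leaking test-set statistics into the model.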

1. Data Transformation (PCA)

Instead of feeding 13 complex features directly into the model, we use Principal Component Analysis to extract the most informative signals.

  • Why? To mitigate the curse of dimensionality and enable 2D visualization.
  • Result: Two new features (PC1 & PC2) that lie along the directions of greatest variance in the standardized data.
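As an illustration of this step, the sketch below standardizes the 13 features and projects them onto two principal components, then reports how much variance each component retains. It assumes `load_wine` from scikit-learn as the data source and fits on the full dataset purely for demonstration; neither choice is confirmed by the repository.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: sklearn's bundled UCI Wine data (178 samples, 13 features).
X, _ = load_wine(return_X_y=True)

# PCA is sensitive to feature scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Project the 13 standardized features onto the top 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("Projected shape:", X_2d.shape)
print("Variance retained per component:", pca.explained_variance_ratio_)
print("Total variance retained:", pca.explained_variance_ratio_.sum())
```

If the kernel variant mentioned in the project description is wanted instead, `sklearn.decomposition.KernelPCA(n_components=2, kernel="rbf")` exposes the same `fit_transform` interface, though it does not provide `explained_variance_ratio_`.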

2. The Classifier (Logistic Regression)

Since the three classes become almost linearly separable in the 2D PCA projection, Logistic Regression provides a fast, interpretable, and highly accurate solution.

  • Outcome: The model draws linear boundaries between the three customer segments.
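To make the "linear boundaries" concrete, the sketch below fits a logistic regression on the 2D projection and reproduces its predictions by hand from the learned weights: each class gets one linear score `w · x + b`, and the highest score wins. The data loading via `load_wine` and fitting on the full dataset are assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Assumption: sklearn's bundled UCI Wine data stands in for the project's data file.
X, y = load_wine(return_X_y=True)
X_2d = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

clf = LogisticRegression().fit(X_2d, y)

# One weight vector (2 values) and one intercept per class.
print("coef_ shape:", clf.coef_.shape)
print("intercept_ shape:", clf.intercept_.shape)

# Each class score is a linear function of (PC1, PC2); the predicted class
# is the one with the largest score, so the boundaries between classes are
# straight lines where two scores are equal.
scores = X_2d @ clf.coef_.T + clf.intercept_
manual_pred = clf.classes_[np.argmax(scores, axis=1)]

print("Manual predictions match clf.predict:",
      np.array_equal(manual_pred, clf.predict(X_2d)))
```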

Performance & Results

The combination of PCA and Logistic Regression proved highly effective on this dataset.

| Metric | Score | Note |
| --- | --- | --- |
| Training Accuracy | ~97% | Model fits the training data well, with no sign of overfitting. |
| Testing Accuracy | 100% | Every sample in the held-out test set is classified correctly. |
| Dimensionality | 2 features | Reduced from the original 13 features. |

Observation: The visualization demonstrates distinct clusters for Customer Segments 1, 2, and 3, separated clearly by linear decision boundaries.
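Metrics of the kind reported above can be computed with `accuracy_score` and `confusion_matrix`. This is a hedged sketch: the data source (`load_wine`), the 80/20 stratified split, and the seed are all assumptions, so the numbers it prints will depend on those choices and need not match the table.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumptions: bundled Wine data, 80/20 stratified split, fixed seed.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = make_pipeline(StandardScaler(), PCA(n_components=2), LogisticRegression())
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Accuracy on both splits: a large gap would suggest overfitting.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, y_pred)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)

print(f"Training accuracy: {train_acc:.3f}")
print(f"Testing accuracy:  {test_acc:.3f}")
print("Confusion matrix:")
print(cm)
```

Off-diagonal entries in the confusion matrix show which segments are confused with each other; a purely diagonal matrix corresponds to 100% test accuracy.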

