A streamlined Machine Learning approach to classifying wine varieties by their chemical composition, using Dimensionality Reduction followed by Linear Classification.
| Component | Specification |
|---|---|
| Dataset | UCI Wine Dataset (178 Samples, 13 Features) |
| Preprocessing | StandardScaler (Z-score Normalization) |
| Dim. Reduction | PCA (13D to 2D), keeping the two directions of maximum variance |
| Model | Logistic Regression (Linear Classifier) |
| Metric | Accuracy Score & Confusion Matrix |
| Visualization | Linear Decision Boundaries (2D Projection) |
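A minimal sketch of the dataset and preprocessing rows, assuming the project uses scikit-learn (its class names match the components listed) and the library's bundled copy of the UCI Wine data. Note that `load_wine` encodes the three classes as 0, 1, 2 rather than 1, 2, 3. The scaler is fitted on all 178 samples here only to show what Z-score normalization computes; in a train/test workflow it should be fitted on the training split alone.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# scikit-learn ships the UCI Wine data; labels are 0, 1, 2
X, y = load_wine(return_X_y=True)
print(X.shape)         # (178, 13)
print(np.bincount(y))  # [59 71 48] samples per class

# Z-score normalization per feature: z = (x - mean) / std
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Same transform by hand; StandardScaler uses the population std (ddof=0)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(X_std, X_manual))  # True
```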
The project follows a fixed pipeline, so every sample passes through the same preprocessing steps before classification:
```mermaid
graph LR
    A[Raw Data] --> B(Standardization)
    B --> C{PCA Transformation}
    C --> D[2 Principal Components]
    D --> E[Logistic Regression Training]
    E --> F[Prediction & Evaluation]
```
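The diagram maps naturally onto a scikit-learn `Pipeline`. This is a sketch assuming that library; the 80/20 split, `random_state=0`, and stratification are illustrative choices, not necessarily the project's settings.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Hold out a test split before fitting anything (assumed split settings)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Standardization -> PCA (2 components) -> Logistic Regression
model = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("clf", LogisticRegression()),
])

# fit() learns scaler means/stds and PCA axes from training rows only,
# so no test-set statistics leak into preprocessing
model.fit(X_train, y_train)

# predict() reuses the fitted scaler and PCA to transform X_test first
y_pred = model.predict(X_test)
print(model.score(X_test, y_test))  # test accuracy
```

Wrapping the stages in one `Pipeline` also means tools such as `cross_val_score` or `GridSearchCV` refit the scaler and PCA inside each fold.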
Instead of feeding all 13 features directly into the model, we use Principal Component Analysis (PCA) to project them onto the few directions along which the standardized data varies most.
- Why? To mitigate the curse of dimensionality and enable 2D visualization.
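- Result: Two new features (PC1 & PC2), each a linear combination of the original 13, which retain as much of the total variance as any two-dimensional linear projection can.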
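To make "directions of maximum variance" concrete, here is a hedged NumPy sketch of what PCA computes: eigendecompose the covariance matrix of the standardized features and project onto the two eigenvectors with the largest eigenvalues. The helper `pca_2d` is hypothetical, and the result is cross-checked against scikit-learn's `PCA`, whose component signs may differ.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA


def pca_2d(X_std):
    """Hypothetical helper: project standardized data onto its top-2 principal axes."""
    cov = np.cov(X_std, rowvar=False)        # (13, 13) feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix; ascending order
    order = np.argsort(eigvals)[::-1]        # re-sort by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :2]              # PC1 and PC2 axes, shape (13, 2)
    explained_ratio = eigvals[:2] / eigvals.sum()
    scores = (X_std - X_std.mean(axis=0)) @ components  # (n_samples, 2)
    return scores, components, explained_ratio


X, _ = load_wine(return_X_y=True)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
scores, components, explained_ratio = pca_2d(X_std)
print(explained_ratio)  # share of total variance kept by PC1 and PC2

# Cross-check with scikit-learn (columns may differ only in sign)
ref = PCA(n_components=2).fit(X_std)
print(np.allclose(explained_ratio, ref.explained_variance_ratio_))  # True
```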
Because the three classes remain nearly linearly separable after projection onto two components, Logistic Regression is a fast, interpretable, and accurate choice.
- Outcome: The model learns linear decision boundaries (straight lines in the PC1/PC2 plane) between the three customer segments.
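Why the boundaries are straight lines: a fitted multiclass logistic regression gives each class k a linear score w_k · z + b_k over the two PC coordinates z and predicts the class with the highest score, so the boundary between classes j and k is where (w_j - w_k) · z + (b_j - b_k) = 0. The sketch below assumes scikit-learn, fits on all samples for illustration, and reads those lines off `coef_` and `intercept_`; `boundary_line` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
Z = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

clf = LogisticRegression().fit(Z, y)
W, b = clf.coef_, clf.intercept_  # W: (3 classes, 2 PCs), b: (3,)

# One linear score per class; the predicted class is the argmax
scores = Z @ W.T + b


def boundary_line(j, k):
    """Hypothetical helper: (slope, intercept) of PC2 = slope * PC1 + intercept
    where the scores of classes j and k tie. Only part of each line borders
    an actual region, since a third class may win there."""
    dw, db = W[j] - W[k], b[j] - b[k]
    return -dw[0] / dw[1], -db / dw[1]


for j, k in [(0, 1), (0, 2), (1, 2)]:
    slope, intercept = boundary_line(j, k)
    print(f"class {j} vs {k}: PC2 = {slope:.2f} * PC1 + {intercept:.2f}")
```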
The combination of PCA and Logistic Regression performed well on this dataset:
| Metric | Score | Note |
|---|---|---|
| Training Accuracy | ~97% | Model fits the data well without overfitting. |
| Testing Accuracy | 100% | Every held-out test sample classified correctly. |
| Dimensionality | 2 Features | Reduced from the original 13 features. |
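The metrics in the table can be computed along these lines, assuming scikit-learn and a hypothetical 80/20 stratified split standing in for the project's actual one. The exact numbers depend on the split, so none are claimed here.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y  # assumed split settings
)

model = make_pipeline(StandardScaler(), PCA(n_components=2), LogisticRegression())
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
# Rows = true class, columns = predicted class; off-diagonal cells are errors
cm = confusion_matrix(y_test, model.predict(X_test))

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f} on {len(y_test)} samples")
print(cm)
```

Under this assumed split the test set holds 36 samples, so a single misclassification moves test accuracy by about 2.8 percentage points (1/36); a perfect score on a split this small is encouraging but coarse.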
Observation: The 2D visualization shows distinct clusters for Customer Segments 1, 2, and 3, largely separated by linear decision boundaries.
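A sketch of how such a decision-boundary plot is typically produced, assuming scikit-learn and matplotlib: classify every point of a dense grid over the PC1/PC2 plane, shade it by predicted class, then overlay the projected samples. The grid step, padding, and output filename are arbitrary choices, and the model is fitted on all samples for display only.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
# Keep preprocessing separate so the classifier can be queried in PC space
prep = make_pipeline(StandardScaler(), PCA(n_components=2))
Z = prep.fit_transform(X)
clf = LogisticRegression().fit(Z, y)

# Dense grid covering the projected data (padding and step are arbitrary)
pad, step = 1.0, 0.02
pc1, pc2 = np.meshgrid(
    np.arange(Z[:, 0].min() - pad, Z[:, 0].max() + pad, step),
    np.arange(Z[:, 1].min() - pad, Z[:, 1].max() + pad, step),
)
grid_labels = clf.predict(np.c_[pc1.ravel(), pc2.ravel()]).reshape(pc1.shape)

fig, ax = plt.subplots(figsize=(6, 5))
# Shade each region by predicted class; the region edges are the linear boundaries
ax.contourf(pc1, pc2, grid_labels, alpha=0.3, levels=[-0.5, 0.5, 1.5, 2.5])
for k in np.unique(y):
    # load_wine labels 0-2, shifted by one to match the 1-3 numbering above
    ax.scatter(Z[y == k, 0], Z[y == k, 1], s=15, label=f"segment {k + 1}")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.legend()
fig.savefig("decision_boundaries.png", dpi=150)  # hypothetical output path
```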