A comprehensive machine learning project that classifies orange quality into three categories (Good, Medium, Poor) based on physical, chemical, and environmental characteristics. It features a trained logistic regression model with 98.6% cross-validation accuracy and an interactive Streamlit web application.
This school project demonstrates an end-to-end machine learning pipeline from data analysis to deployment. The system helps farmers and distributors automatically grade orange quality for better market placement and pricing decisions.
- 🤖 Machine Learning Model: Logistic Regression with 98.6% cross-validation accuracy
- 🛠️ Complete Pipeline: Integrated data preprocessing, training, and evaluation
- 🌐 Web Application: Interactive Streamlit app for real-time predictions
- 📊 Data Visualization: Comprehensive EDA and performance metrics
- 🔧 Production Ready: Model persistence and easy deployment
500 samples with the following features:
| Feature | Description | Type |
|---|---|---|
| `diameter` | Orange diameter (cm) | Numerical |
| `berat` | Weight (grams) | Numerical |
| `tebal_kulit` | Skin thickness (cm) | Numerical |
| `kadar_gula` | Sugar content (%) | Numerical |
| `asal_daerah` | Origin region | Categorical |
| `warna` | Skin color | Ordinal |
| `musim_panen` | Harvest season | Categorical |
| `kualitas` | Quality label (Target) | Categorical |

Quality Classes:
- 🟢 Bagus (Good) - Export quality
- 🟡 Sedang (Medium) - Local market quality
- 🔴 Jelek (Poor) - Industrial processing quality
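
As a quick sanity check before training, the schema above can be verified in pandas. This is a minimal sketch: the helper name `check_schema` and the two in-memory rows are illustrative only, and the label spellings (`Bagus`, `Sedang`, `Jelek`) are assumed to match the values stored in the `kualitas` column.

```python
import pandas as pd

# Column names taken from the feature table; label spellings are assumed
EXPECTED_COLUMNS = [
    "diameter", "berat", "tebal_kulit", "kadar_gula",
    "asal_daerah", "warna", "musim_panen", "kualitas",
]
EXPECTED_LABELS = {"Bagus", "Sedang", "Jelek"}


def check_schema(df: pd.DataFrame) -> list:
    """Return a list of human-readable schema problems (empty if none)."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if "kualitas" in df.columns:
        unknown = set(df["kualitas"].unique()) - EXPECTED_LABELS
        if unknown:
            problems.append(f"unexpected labels: {sorted(unknown)}")
    return problems


# Illustrative rows only, not taken from the real dataset
sample = pd.DataFrame({
    "diameter": [7.5, 6.1], "berat": [180.0, 140.0],
    "tebal_kulit": [0.5, 0.7], "kadar_gula": [12.0, 9.0],
    "asal_daerah": ["Jawa Barat", "Jawa Barat"],
    "warna": ["oranye", "oranye"],
    "musim_panen": ["kemarau", "kemarau"],
    "kualitas": ["Bagus", "Sedang"],
})
print(check_schema(sample))
```

On the real data, the same helper could be called as `check_schema(pd.read_csv("jeruk_balance_500.csv"))`.
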
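```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# numeric_features and categorical_features are lists of column names from the feature table
model = Pipeline([
    ('preprocessing', ColumnTransformer([
        ('scaler', StandardScaler(), numeric_features),
        ('ohe', OneHotEncoder(), categorical_features)
    ])),
    ('model', LogisticRegression())
])
```

| Metric | Score |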
|---|---|
| Cross-validation Accuracy | 98.6% |
| Test Accuracy | 100% |
| Precision | 99% |
| Recall | 99% |
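
A cross-validation accuracy like the one in the table is the kind of figure `cross_val_score` produces. The sketch below runs on synthetic data from `make_classification`, so its score will not match the table. The 5-fold stratified split is an assumption, since the fold count is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the orange data: 500 samples, 3 classes
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Scaling lives inside the pipeline so each fold is scaled on its own training part
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5 stratified folds is an assumption, not a documented setting of this project
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")

print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Keeping the scaler inside the pipeline avoids leaking statistics from the validation fold into training, which would otherwise inflate the reported accuracy.
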
Python 3.8+

```bash
pip install -r requirements.txt
```

- Clone the repository

```bash
git clone https://github.com/yourusername/citrus-quality-classification.git
cd citrus-quality-classification
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Run the Streamlit app

```bash
streamlit run app_jeruk.py
```

Train the model:

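```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the data and separate the features from the `kualitas` target
df = pd.read_csv('jeruk_balance_500.csv')
X, y = df.drop(columns='kualitas'), df['kualitas']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# `model` is the Pipeline shown earlier (preprocessing + LogisticRegression);
# a bare LogisticRegression cannot fit the raw categorical columns
model.fit(X_train, y_train)
```

Make predictions:
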
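```python
import pandas as pd

# Single prediction: a ColumnTransformer configured with column names needs a DataFrame
columns = ['diameter', 'berat', 'tebal_kulit', 'kadar_gula', 'asal_daerah', 'warna', 'musim_panen']
new_orange = pd.DataFrame([[7.5, 180.0, 0.5, 12.0, 'Jawa Barat', 'oranye', 'kemarau']], columns=columns)
prediction = model.predict(new_orange)         # predicted quality label
probability = model.predict_proba(new_orange)  # one probability per class, ordered as model.classes_
```

The Streamlit app provides an intuitive interface for quality prediction: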
Features:
- Interactive sliders for numerical features
- Pill selectors for categorical options
- Real-time quality predictions
- Probability visualization
- Business recommendations
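
The business recommendation step could be as simple as a lookup from the predicted class to its market channel, using the descriptions in the Quality Classes list. The following is a sketch under that assumption, not the app's actual logic: the function name `recommend` and the 0.6 confidence threshold are hypothetical.

```python
# Market channel per class, taken from the Quality Classes list
CHANNELS = {
    "Bagus": "Export quality",
    "Sedang": "Local market quality",
    "Jelek": "Industrial processing quality",
}


def recommend(classes, probabilities, min_confidence=0.6):
    """Pick the most probable class and describe its market channel.

    `classes` and `probabilities` are parallel sequences, e.g. `model.classes_`
    and one row of `model.predict_proba(...)`. The 0.6 threshold is an
    arbitrary illustrative value.
    """
    best = max(range(len(classes)), key=lambda i: probabilities[i])
    label, confidence = classes[best], probabilities[best]
    message = f"{label}: {CHANNELS[label]} ({confidence:.0%} confidence)"
    if confidence < min_confidence:
        message += "; low confidence, consider manual inspection"
    return label, message


label, message = recommend(["Bagus", "Jelek", "Sedang"], [0.85, 0.05, 0.10])
print(message)  # Bagus: Export quality (85% confidence)
```

In a Streamlit app, the returned message could be shown next to the probability chart.
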
```
citrus-quality-classification/
│
├── app_streamlit.py                  # Streamlit web application
├── model_klasifikasi_jeruk.joblib    # Trained model file
├── jeruk_balance_500.csv             # Dataset
├── requirements.txt                  # Dependencies
├── EDA_analysis.ipynb                # Exploratory Data Analysis
└── README.md                         # Project documentation
```

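
The `.joblib` file in the tree suggests that the trained model is saved with `joblib.dump` and reloaded by the app with `joblib.load`. The sketch below round-trips a small stand-in model through a temporary file. The file name `model_demo.joblib` and the synthetic data are illustrative, and the real app would presumably load `model_klasifikasi_jeruk.joblib` instead.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in model trained on synthetic data (not the orange dataset)
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_demo.joblib")  # hypothetical file name
    joblib.dump(model, path)      # serialize the fitted estimator to disk
    restored = joblib.load(path)  # what an app would do at startup

# The restored model should reproduce the original predictions exactly
same = (restored.predict(X) == model.predict(X)).all()
print("predictions match:", same)
```

Saving the whole fitted pipeline, rather than only the classifier, keeps the preprocessing and the model in sync when the file is loaded elsewhere.
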
- Programming: Python 3.8+
- Machine Learning: Scikit-learn, Pandas, NumPy
- Visualization: Matplotlib, Seaborn
- Web Framework: Streamlit
- Model Persistence: Joblib
The model identifies key factors affecting orange quality:
- Sugar content (`kadar_gula`)
- Weight (`berat`)
- Skin thickness (`tebal_kulit`)
- Diameter (`diameter`)

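
How these factors were derived is not stated here. One common, rough approach for a logistic regression on standardized inputs is to rank features by the mean absolute coefficient across classes. The sketch below shows that approach on synthetic data. The feature names are borrowed from the table purely as labels, so the printed ranking says nothing about the real dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

feature_names = ["diameter", "berat", "tebal_kulit", "kadar_gula"]

# Synthetic 3-class data standing in for the numeric orange features
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_scaled = StandardScaler().fit_transform(X)  # put coefficients on a comparable scale

model = LogisticRegression(max_iter=1000).fit(X_scaled, y)

# coef_ has shape (n_classes, n_features); average |coef| over the classes
importance = np.abs(model.coef_).mean(axis=0)
ranking = sorted(zip(feature_names, importance), key=lambda t: t[1], reverse=True)

for name, score in ranking:
    print(f"{name:12s} {score:.3f}")
```

Coefficient magnitudes are only comparable when the inputs share a scale, which is why the features are standardized first.
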
- Farmers: Better pricing decisions based on quality
- Distributors: Optimal market channel selection
- Exporters: Automated quality control for international standards
- Compare multiple algorithms (Random Forest, SVM, Neural Networks)
- Add feature importance analysis with SHAP values
- Develop REST API for integration
- Mobile app development
- Real-time image recognition for quality assessment
- Your Name - GitHub Profile
- School: SMKN 1 Purbalingga
- Course: Machine Learning
This project is licensed under the MIT License - see the LICENSE file for details.
- Dataset provided for educational purposes
- Instructors and peers for valuable feedback
- Open-source community for amazing libraries
⭐ If you find this project useful, please give it a star!
For questions or collaborations, feel free to reach out:
- Email: ibraramadanialzaki2@gmail.com
- LinkedIn: Your Profile
- Portfolio: Your Website
Made with 🍊 and ❤️ for Machine Learning
