This project focuses on predicting liver disease using machine learning models trained on the Indian Liver Patient Dataset (ILPD). The objective is to assist early diagnosis by analyzing routine clinical and biochemical parameters.
Liver disease is a major health concern and often remains undiagnosed until advanced stages. This project applies supervised machine learning techniques to classify patients as having liver disease or not, based on medical test results.
Multiple models are trained, evaluated, and compared to identify the best-performing approach.
- Dataset: Indian Liver Patient Dataset (ILPD)
- Records: 583
- Features include:
  - Age
  - Gender
  - Total Bilirubin
  - Direct Bilirubin
  - Alkaline Phosphatase
  - ALT (SGPT)
  - AST (SGOT)
  - Total Proteins
  - Albumin
  - Albumin/Globulin Ratio
- Target variable:
  - 1 → Liver Disease
  - 0 → No Liver Disease
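
As a rough illustration of the dataset layout described above, the sketch below loads a CSV with Pandas and separates the ten features from the target. The column names and the `load_ilpd` helper are hypothetical, and the four rows are made-up placeholders rather than real patient records; the header and layout of the actual `dataset.csv` may differ.

```python
import io

import pandas as pd

# Hypothetical column names; the real header in dataset.csv may differ.
COLUMNS = [
    "Age", "Gender", "Total_Bilirubin", "Direct_Bilirubin",
    "Alkaline_Phosphatase", "ALT", "AST",
    "Total_Proteins", "Albumin", "AG_Ratio", "Target",
]

# Made-up placeholder rows standing in for dataset.csv (not real patient data).
# The third row leaves AG_Ratio empty to show how a missing value appears.
SAMPLE_CSV = """\
45,Male,1.2,0.4,210,35,40,6.8,3.3,0.9,1
38,Female,0.7,0.2,160,20,22,7.1,3.9,1.2,0
60,Male,3.9,1.8,340,90,110,6.0,2.7,,1
29,Female,0.8,0.2,150,18,21,7.4,4.1,1.3,0
"""


def load_ilpd(source):
    """Read a headerless CSV and return (features, target)."""
    df = pd.read_csv(source, header=None, names=COLUMNS)
    X = df.drop(columns="Target")  # the ten clinical features
    y = df["Target"]               # 1 = liver disease, 0 = no liver disease
    return X, y


X, y = load_ilpd(io.StringIO(SAMPLE_CSV))
print(X.shape)                       # rows x feature columns
print(y.value_counts().to_dict())    # class balance
print(int(X.isna().sum().sum()))     # number of missing cells
```

To try this on the real file, pass its path to `load_ilpd` instead of the in-memory sample, and adjust `header` and `names` to match the file.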
The following six models were implemented and compared:
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Support Vector Machine (SVM)
- Decision Tree
- Random Forest
- XGBoost
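
One way to set up the six classifiers for a side-by-side comparison is to keep them in a dictionary keyed by name. This is a minimal sketch, not the notebook's actual code: the `build_models` helper and every hyperparameter value shown are assumptions chosen for illustration. XGBoost is imported lazily so the sketch still runs when the package is not installed.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=42):
    """Return a name -> estimator mapping (hyperparameters are illustrative)."""
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "SVM": SVC(kernel="rbf"),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
    }
    try:
        # Optional dependency: skipped if xgboost is not installed.
        from xgboost import XGBClassifier

        models["XGBoost"] = XGBClassifier(
            n_estimators=100, random_state=random_state
        )
    except ImportError:
        pass
    return models


models = build_models()
print(list(models))
```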
- Data loading and preprocessing
- Handling missing values and encoding categorical features
- Feature scaling and normalization
- Model training using multiple classifiers
- Performance evaluation and comparison
- Selection of the best-performing model
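
The workflow steps above can be sketched as a single scikit-learn pipeline. The example below is an assumption-laden illustration rather than the notebook's implementation: it uses synthetic stand-in data, hypothetical column names, median imputation, one-hot encoding for `Gender`, standard scaling, and only two of the six classifiers to keep it short.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (not the ILPD); column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(18, 80, n),
    "Gender": rng.choice(["Male", "Female"], n),
    "Total_Bilirubin": rng.gamma(2.0, 1.0, n),
    "AG_Ratio": rng.normal(1.0, 0.3, n),
})
# Blank out a few cells to mimic missing values.
df.loc[rng.choice(n, 5, replace=False), "AG_Ratio"] = np.nan
# Artificial label loosely tied to bilirubin, only so the models have signal.
y = (df["Total_Bilirubin"] + rng.normal(0, 0.5, n) > 2.0).astype(int)

numeric = ["Age", "Total_Bilirubin", "AG_Ratio"]
categorical = ["Gender"]

# Impute and scale numeric columns; one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=42
)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Train each candidate behind the same preprocessing and record test accuracy.
scores = {}
for name, clf in candidates.items():
    pipe = Pipeline([("prep", clone(preprocess)), ("clf", clf)])
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)

best = max(scores, key=scores.get)
print(scores)
print("Best by accuracy:", best)
```

Wrapping preprocessing and the classifier in one `Pipeline` means the imputer and scaler are fitted on the training split only, which avoids leaking test-set statistics into training.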
Models were evaluated using:
- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix
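
To make the relationship between these metrics concrete, the sketch below derives accuracy, precision, recall, and F1-score from the four confusion-matrix counts in plain Python. The helper names and the example labels are made up for illustration; in practice, `sklearn.metrics` provides `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `confusion_matrix` for the same purpose.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute the four summary metrics from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 1 = liver disease, 0 = no liver disease.
y_true = [1, 1, 1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1]

print(confusion_counts(y_true, y_pred))  # (tp, fp, fn, tn) = (4, 1, 1, 2)
print(classification_metrics(y_true, y_pred))
```

For a screening task like this one, recall on the disease class is often worth watching alongside accuracy, because a false negative corresponds to a missed patient.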
- Ensemble and tree-based models achieved higher accuracy than the other classifiers
- Bilirubin levels and liver enzyme values were strong predictors
- Random Forest and XGBoost performed better than linear models
- Comparing multiple models helped in selecting a reliable classifier
- Python
- Jupyter Notebook
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
- XGBoost
Liver-Disease-Prediction/
│
├── ILPD Final-6models.ipynb
├── dataset.csv
└── README.md
This project demonstrates the effectiveness of machine learning techniques in medical diagnosis. By comparing multiple models, the system identifies the most suitable approach for liver disease prediction, supporting early detection and better healthcare outcomes.
- Hyperparameter tuning
- Handling class imbalance using SMOTE
- Model deployment as a web application
- Integration with real-time clinical data
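
As a possible starting point for the hyperparameter tuning item, the sketch below runs a small grid search over a Random Forest with scikit-learn's `GridSearchCV`. Everything here is an assumption for illustration: the data is synthetic with a hypothetical class imbalance, the grid values are arbitrary, and `class_weight="balanced"` is included as a lightweight alternative to SMOTE, not as SMOTE itself (which would require the separate `imbalanced-learn` package).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in data (not the ILPD).
X, y = make_classification(
    n_samples=300,
    n_features=10,
    n_informative=5,
    weights=[0.3, 0.7],
    random_state=0,
)

# Illustrative grid; the values are arbitrary starting points.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
    "class_weight": [None, "balanced"],
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",
    cv=cv,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Scoring on F1 rather than accuracy is one reasonable choice when the classes are imbalanced, since accuracy alone can reward always predicting the majority class.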