This project applies machine learning techniques to predict loan approval status using the Kaggle "Loan Status Prediction" dataset. It demonstrates:
- Comprehensive data preprocessing (imputation, encoding, scaling).
- Handling class imbalance using SMOTE.
- Feature selection using Recursive Feature Elimination (RFE).
- Comparative analysis of five classification models (Logistic Regression, SVM, Naive Bayes, Random Forest, Decision Tree) on both full and RFE-selected feature sets.

Tools and techniques:
- Python: pandas, NumPy, scikit-learn, imbalanced-learn (imblearn)
- ML techniques: data cleaning, EDA, SMOTE, RFE, model evaluation (accuracy, precision, recall, F1, ROC AUC), classification algorithms.
The study found that using all features generally yielded slightly higher accuracy, with SVM and Naive Bayes performing best at about 74%. RFE, however, substantially improved Logistic Regression, which reached 73% accuracy on the selected features. The effect of feature selection therefore depends on the algorithm.
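
A comparison of this kind, with each model scored on all features and on an RFE-selected subset, might look like the sketch below. It runs on synthetic data from `make_classification`, so its numbers are unrelated to the results reported here. Using Logistic Regression as the RFE estimator, keeping 6 features, and leaving every model at its default hyperparameters are all assumptions made for illustration.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the preprocessed loan features.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# RFE repeatedly drops the weakest feature until 6 remain (assumed count).
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=6)
selector.fit(X_train, y_train)
mask = selector.support_  # boolean mask of the kept features

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    scores = {}
    # Evaluate a fresh copy of each model on both feature sets.
    for label, cols in (("all", slice(None)), ("rfe", mask)):
        fitted = clone(model).fit(X_train[:, cols], y_train)
        scores[label] = accuracy_score(y_test, fitted.predict(X_test[:, cols]))
    results[name] = scores
    print(f"{name:20s} all={scores['all']:.3f}  rfe={scores['rfe']:.3f}")
```

The same pattern extends to precision, recall, F1, and ROC AUC by swapping in the corresponding functions from `sklearn.metrics`.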
Project documentation: https://lapis-school-f5e.notion.site/Loan-Approval-Prediction-A-Comparative-Study-1ecca101e469809d8b9feab397c40a47?pvs=4