This project develops an end-to-end machine-learning solution for credit scoring, following the CRISP-ML(Q) framework. The workflow covers data exploration and quality assessment (univariate EDA, outlier detection, missing-value treatment), feature engineering (StandardScaler/MinMaxScaler scaling, ratio features, dummy encoding), sampling and class balancing (SMOTE/SMOTENC), and multivariate feature reduction (PCA, RFE). Multiple classification algorithms are then trained (Logistic Regression, Decision Tree, Random Forest, XGBoost, LightGBM, CatBoost, AdaBoost, etc.), evaluated with 10-fold cross-validation, and tuned with Bayesian optimization. Finally, variable importance is interpreted and the most effective model for predicting credit default is identified.
This work is part of the 1FIN14 – Financial Programming course.
The dataset used in this project contains historical loan applications, including applicant demographics, financial information, loan details, and the target variable indicating whether the loan was approved or resulted in default. The dataset is not included in this repository due to privacy and course-use restrictions.
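
Because the dataset is not distributed, the core of the workflow (scaling, feature reduction, a classifier, and 10-fold cross-validation) can only be illustrated here on synthetic data. The following is a minimal sketch using scikit-learn, not the project's actual code: the synthetic data, the 15% default rate, the number of PCA components, and the step names are all assumptions made for illustration. To keep the example dependent on scikit-learn alone, class imbalance is handled with `class_weight="balanced"` rather than SMOTE/SMOTENC.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the loan data (assumption: the real
# dataset is not public, so shapes and class ratio here are illustrative).
X, y = make_classification(
    n_samples=1000,
    n_features=12,
    n_informative=6,
    weights=[0.85, 0.15],  # roughly 15% positive ("default") cases
    random_state=42,
)

# Scaling and PCA live inside the pipeline so they are re-fitted on the
# training part of every fold, which avoids leaking validation data.
pipeline = Pipeline(
    steps=[
        ("scale", StandardScaler()),
        ("reduce", PCA(n_components=8)),
        # class_weight="balanced" is a stand-in for SMOTE/SMOTENC here.
        ("model", LogisticRegression(max_iter=1000, class_weight="balanced")),
    ]
)

# Stratified 10-fold cross-validation keeps the class ratio in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")

print(f"Mean ROC AUC over {len(scores)} folds: {scores.mean():.3f}")
```

Other estimators can be compared under the same folds by replacing the `"model"` step. If resampling such as SMOTE is used instead of class weights, it should likewise be applied only to the training portion of each fold.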