Loan Risk Classification

Purpose

This analysis aims to develop a machine learning model to predict whether a loan is healthy (Class 0) or risky (Class 1). By identifying high-risk loans, the bank can mitigate credit risk and improve loan approval decisions.

Data Overview

The dataset includes financial variables such as loan size, interest rate, borrower income, debt-to-income ratio, number of accounts, derogatory marks, and total debt. The target variable (loan_status) indicates loan health:

Class 0 (Healthy Loans): 18,759 samples
Class 1 (Risky Loans): 625 samples

Key Observation: The dataset is highly imbalanced, with healthy loans dominating. While desirable for a bank, this imbalance can hinder the model's ability to detect risky loans accurately.

The graph below shows that loan amounts are normally distributed, with most requests centered around $10,000 and fewer at the extremes.

Machine Learning Process

Data Preprocessing:
- Features (X) and labels (y) were separated.
- Data was split into training and testing sets, stratified to preserve class balance.
Model Training:
- A Logistic Regression model was implemented using the lbfgs solver.
- The model was trained on the stratified training dataset to ensure consistent results across imbalanced classes.
Evaluation:
- Performance was assessed using a confusion matrix and classification report, providing accuracy, precision, recall, and F1-score metrics.

Results

Key Metrics:

Overall Accuracy: 99%
Precision:
- Class 0: 1.00 (Perfect precision)
- Class 1: 0.87
Recall:
- Class 0: 1.00 (Perfect recall)
- Class 1: 0.89
F1-Score:
- Class 0: 1.00
- Class 1: 0.88

Visualization of Key Metrics

Insights:

The model excels at predicting healthy loans, achieving perfect precision and recall.
Performance for risky loans is strong but slightly lower due to class imbalance.

Recommendations

To improve the model's ability to predict risky loans:

Oversample Risky Loans: Increase the number of risky loan samples to balance the dataset.
Adjust Class Weights: Penalize misclassification of risky loans more heavily.
Precision-Recall Tradeoff: Focus on improving precision and recall for Class 1 (risky loans), as missing high-risk cases is costlier for the bank.
Ongoing Monitoring: Regularly validate the model with real-world loan data to ensure its performance remains robust over time.

Summary

The Logistic Regression model performs exceptionally well for healthy loans and reasonably well for risky loans. With targeted improvements, this model can provide reliable predictions, enabling better credit risk management and decision-making.

Repo Structure

Notebook: credit_risk_classification.ipynb

Documentation: README.md (includes analysis summary)

License: Contains the public license information. ___

Data Reference

This dataset is fictional and is used to showcase my machine learning skills in predicting healthy and risky loans, as well as applying data transformation techniques in a real-world context.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
Resources		Resources
Visualization		Visualization
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
credit_risk_classification.ipynb		credit_risk_classification.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Loan Risk Classification

Purpose

Data Overview

Machine Learning Process

Results

Key Metrics:

Visualization of Key Metrics

Insights:

Recommendations

Summary

Repo Structure

Data Reference

About

Uh oh!

Releases

Packages

Languages

License

LegallyNotBlonde/Loan_Risk_Predictor

Folders and files

Latest commit

History

Repository files navigation

Loan Risk Classification

Purpose

Data Overview

Machine Learning Process

Results

Key Metrics:

Visualization of Key Metrics

Insights:

Recommendations

Summary

Repo Structure

Data Reference

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages