Predicting medical insurance charges using regression models like Random Forest, KNN, and Linear Regression.

KumarRaju1313/insurance-cost-prediction

💸 Insurance Charges Prediction

This project predicts individual medical insurance charges based on demographic and lifestyle factors using machine learning regression models.


📁 Dataset

The dataset includes the following features:

  • age: Age of the policyholder
  • sex: Gender
  • bmi: Body Mass Index
  • children: Number of dependents
  • smoker: Smoker status
  • region: Residential region
  • charges: Target variable (Insurance premium)

The dataset is available in the data/ directory.


📊 Methodology

🧪 Data Exploration

  • View summary statistics and unique categorical values.
  • Visualize distributions of key variables like age, BMI, and charges.
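
The exploration steps above could be sketched roughly as follows. This is only an illustration: the data frame is a synthetic stand-in built from the column names listed under Dataset, and the category labels (`"yes"`/`"no"`, the region names) and all values are made up rather than taken from the project's data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (all values are invented).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "sex": rng.choice(["female", "male"], size=n),
    "bmi": rng.normal(30, 6, size=n).round(1),
    "children": rng.integers(0, 6, size=n),
    "smoker": rng.choice(["yes", "no"], size=n, p=[0.2, 0.8]),
    "region": rng.choice(["region_a", "region_b", "region_c"], size=n),
})
# Made-up target so that the summary has something to describe.
df["charges"] = (
    2000 + 250 * df["age"] + 20000 * (df["smoker"] == "yes")
    + rng.normal(0, 1000, size=n)
)

# Summary statistics for the numeric columns.
summary = df.describe()
print(summary)

# Unique values of each categorical column.
categorical_cols = ["sex", "smoker", "region"]
unique_values = {col: sorted(df[col].unique()) for col in categorical_cols}
print(unique_values)
```

With Matplotlib installed, `df[["age", "bmi", "charges"]].hist(bins=20)` would plot the distributions of the key variables.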

🧼 Data Preprocessing

  • Handle missing values (if any).
  • Encode categorical variables using one-hot encoding.
  • Standardize numerical columns so that all features are on a comparable scale.

🤖 Model Training

Train the following regression models:

  • K-Nearest Neighbors (KNN)
  • Linear Regression
  • Support Vector Machine (SVM)
  • Decision Tree Regressor
  • Random Forest Regressor
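
Training the five models might look like the sketch below. The data comes from `make_regression` purely so that the example runs on its own; the 80/20 split, the default hyperparameters, and the use of `SVR` as the SVM regressor are assumptions, not settings confirmed by the project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the preprocessed features.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# One estimator per model named in the list above (default settings).
models = {
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "Linear Regression": LinearRegression(),
    "SVM": SVR(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# fit() returns the estimator itself, so the dict holds the fitted models.
fitted = {name: model.fit(X_train, y_train) for name, model in models.items()}
predictions = {name: model.predict(X_test) for name, model in fitted.items()}
```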

📏 Model Evaluation

  • Metric: Mean Absolute Error (MAE) on the test set.
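
MAE is the average absolute difference between predicted and actual values, so it is expressed in the same units as charges. A small illustration with invented numbers, comparing a manual computation with scikit-learn's `mean_absolute_error`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Invented true and predicted charges, for illustration only.
y_true = np.array([1000.0, 2500.0, 4000.0, 12000.0])
y_pred = np.array([1200.0, 2000.0, 4500.0, 11000.0])

# Absolute errors are 200, 500, 500, 1000, so the mean is 550.
mae_manual = np.mean(np.abs(y_true - y_pred))
mae_sklearn = mean_absolute_error(y_true, y_pred)

print(mae_manual, mae_sklearn)  # 550.0 550.0
```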

📈 Results

| Model             | MAE        |
|-------------------|------------|
| KNN               | 3532.65    |
| Linear Regression | 4243.65    |
| SVM               | 8478.46    |
| Decision Tree     | 2817.28    |
| Random Forest     | 2575.89 ✅ |
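
To make the gaps between models easier to read, the reported MAE values can be ranked and expressed relative to the best model. The snippet below only re-uses the numbers reported in these results; it does not retrain anything.

```python
# MAE values as reported in the results.
reported_mae = {
    "KNN": 3532.65,
    "Linear Regression": 4243.65,
    "SVM": 8478.46,
    "Decision Tree": 2817.28,
    "Random Forest": 2575.89,
}

# Sort from lowest (best) to highest (worst) error.
ranking = sorted(reported_mae.items(), key=lambda item: item[1])
best_name, best_mae = ranking[0]

for name, mae in ranking:
    gap_pct = 100 * (mae - best_mae) / best_mae
    print(f"{name:<18} MAE={mae:>8.2f}  (+{gap_pct:.1f}% vs {best_name})")
```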

🔍 Conclusion

  • The Random Forest Regressor outperformed all other models with the lowest Mean Absolute Error of 2575.89.
  • Linear Regression (MAE 4243.65) and SVM (MAE 8478.46) had the highest errors and were the least effective models for this dataset.
  • Proper preprocessing (like encoding and scaling) had a significant impact on overall model performance.

⚙️ Tools Used

  • Python
  • Pandas
  • NumPy
  • Matplotlib & Seaborn
  • Scikit-learn
  • Jupyter Notebook
