This project analyzes and compares the performance of Support Vector Machine (SVM), Random Forest, and XGBoost in diagnosing depression from individual-level data. As the team leader, I led the implementation of the machine learning techniques and ensured a robust evaluation process that incorporated Non-Parametric Statistical Testing, Feature Engineering, Resampling, and Hyperparameter Tuning. The project was conducted as part of the Computational Intelligence coursework and received a grade of 97/100.
- Compare the performance of SVM, Random Forest, and XGBoost using F1-Score.
- Implement several statistical and modeling techniques and interpret the results through feature importances and SHAP analysis (see the interpretation sketch at the end of this section).
- Data Preprocessing: Cleaned and transformed the individual-level data to prepare it for model training (preprocessing sketch below).
- Exploratory Data Analysis: Visualized each feature by depression status and ran non-parametric statistical tests to confirm significant associations (statistical-test sketch below).
- Feature Engineering: Reduced the dimensionality of the feature set with PCA (PCA sketch below).
- Resampling: Trained each model under several data scenarios: the original cleaned dataset, an oversampled set, an undersampled set, and a random subsample (resampling sketch below).
- Hyperparameter Tuning: Optimized model performance using Grid Search and Bayesian Optimization (tuning sketch below).
- Model Implementation: Trained and evaluated SVM, Random Forest, and XGBoost, comparing them by F1-score (evaluation sketch below).
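
To make the steps above concrete, the sketches below are minimal Python illustrations rather than the project's actual code. First, the preprocessing step: the tiny DataFrame and its column names (`age`, `gender`, `sleep_hours`, `depression`) are hypothetical placeholders for the real individual-level data.

```python
# Sketch of the cleaning/encoding step; columns and values are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "age": [22, 25, None, 31],
    "gender": ["M", "F", "F", "M"],
    "sleep_hours": [6.0, 7.5, 5.0, None],
    "depression": [1, 0, 1, 0],
})

# Remove duplicates and impute missing numeric values with the median.
df = df.drop_duplicates()
df["age"] = df["age"].fillna(df["age"].median())
df["sleep_hours"] = df["sleep_hours"].fillna(df["sleep_hours"].median())

# One-hot encode categoricals and standardize numeric features.
df = pd.get_dummies(df, columns=["gender"], drop_first=True)
numeric_cols = ["age", "sleep_hours"]
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])

X, y = df.drop(columns=["depression"]), df["depression"]
print(X.head())
```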
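For the statistical analysis in the EDA step, a common non-parametric setup is a Mann-Whitney U test for numeric features and a chi-square test of independence for categorical ones; the report does not list the exact tests, so treat these as representative choices, and the data below is synthetic.

```python
# Sketch: non-parametric association tests between features and depression status.
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu, chi2_contingency

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "sleep_hours": rng.normal(7, 1.5, 200),
    "gender": rng.choice(["M", "F"], 200),
    "depression": rng.integers(0, 2, 200),
})

# Numeric feature: Mann-Whitney U compares its distribution across the two groups.
grp0 = df.loc[df["depression"] == 0, "sleep_hours"]
grp1 = df.loc[df["depression"] == 1, "sleep_hours"]
stat, p = mannwhitneyu(grp0, grp1)
print(f"Mann-Whitney U: statistic={stat:.1f}, p-value={p:.3f}")

# Categorical feature: chi-square test of independence on a contingency table.
table = pd.crosstab(df["gender"], df["depression"])
chi2, p, dof, _ = chi2_contingency(table)
print(f"Chi-square: statistic={chi2:.2f}, p-value={p:.3f}")
```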
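A minimal PCA sketch for the feature-engineering step; the 95% explained-variance threshold is an illustrative assumption, not necessarily the value used in the project.

```python
# Sketch: reduce feature dimensionality with PCA after scaling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

X_scaled = StandardScaler().fit_transform(X)   # PCA is scale-sensitive
pca = PCA(n_components=0.95)                   # keep 95% of the variance (assumed threshold)
X_reduced = pca.fit_transform(X_scaled)

print(f"Reduced from {X.shape[1]} to {X_reduced.shape[1]} components")
print("Explained variance ratios:", np.round(pca.explained_variance_ratio_, 3))
```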
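The resampling scenarios can be reproduced with imbalanced-learn's `RandomOverSampler` and `RandomUnderSampler`; these samplers and the 70% random-subsample fraction are assumptions, since the exact methods are not named above.

```python
# Sketch: build the dataset variants (original, oversampled, undersampled, random subsample).
import pandas as pd
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification

# Synthetic, mildly imbalanced stand-in for the cleaned dataset.
X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=42)
X, y = pd.DataFrame(X), pd.Series(y)

datasets = {"original": (X, y)}

# Oversample the minority class / undersample the majority class.
datasets["oversampled"] = RandomOverSampler(random_state=42).fit_resample(X, y)
datasets["undersampled"] = RandomUnderSampler(random_state=42).fit_resample(X, y)

# Plain random subsample of the original data (fraction is an assumption).
idx = X.sample(frac=0.7, random_state=42).index
datasets["random_sample"] = (X.loc[idx], y.loc[idx])

for name, (Xi, yi) in datasets.items():
    print(name, Xi.shape, yi.value_counts().to_dict())
```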
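For hyperparameter tuning, scikit-learn's `GridSearchCV` covers the grid search, and scikit-optimize's `BayesSearchCV` is one possible Bayesian optimizer (the library actually used is not stated); the search spaces below are illustrative.

```python
# Sketch: Grid Search for SVM and Bayesian optimization for XGBoost.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, random_state=42)

# Exhaustive grid search over a small SVM grid, scored by F1.
svm_grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]},
    scoring="f1",
    cv=5,
)
svm_grid.fit(X, y)
print("SVM best params:", svm_grid.best_params_, "F1:", round(svm_grid.best_score_, 3))

# Bayesian optimization over an assumed search space for XGBoost.
xgb_bayes = BayesSearchCV(
    XGBClassifier(eval_metric="logloss"),
    search_spaces={
        "n_estimators": Integer(100, 500),
        "max_depth": Integer(2, 8),
        "learning_rate": Real(0.01, 0.3, prior="log-uniform"),
    },
    n_iter=20,
    scoring="f1",
    cv=5,
    random_state=42,
)
xgb_bayes.fit(X, y)
print("XGB best params:", xgb_bayes.best_params_, "F1:", round(xgb_bayes.best_score_, 3))
```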
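Next, a minimal sketch of training the three classifiers and comparing their held-out F1-scores; the hyperparameters are illustrative defaults, not the tuned values from the project.

```python
# Sketch: train SVM, Random Forest, and XGBoost and compare held-out F1-scores.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "SVM": SVC(kernel="rbf", C=1.0),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "XGBoost": XGBClassifier(n_estimators=300, eval_metric="logloss"),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    score = f1_score(y_test, model.predict(X_test))
    print(f"{name}: F1 = {score:.3f}")
```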
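Finally, for the interpretation step, the sketch below inspects a fitted XGBoost model with its built-in feature importances and SHAP values; the synthetic data and generic feature names stand in for the project's dataset.

```python
# Sketch: feature importances + SHAP analysis on a fitted XGBoost model.
import pandas as pd
import shap
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(8)])

model = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X, y)

# Built-in (gain-based) feature importances.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))

# SHAP values attribute each prediction to per-feature contributions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)  # global importance + direction of effect
```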