This project compares several machine learning models for predicting diabetes outcomes on a large dataset of 100,000 (one lakh) records with multiple features. It also introduces a novel deep learning model, the Feature Utility, Grouping, and Adaptive fusion Network (FUGA-Net). The models compared are:
- Logistic Regression (optimized for large datasets)
- Decision Tree (with memory optimization)
- Random Forest (parallel processing enabled)
- KNN (optimized for large-scale data)
- Naive Bayes (memory-efficient implementation)
- SVM (with cache optimization)
- Ensemble (Voting Classifier with parallel processing)
- Feature Utility, Grouping, and Adaptive fusion Network (FUGA-Net) (novel deep learning model with feature fusion)
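As a rough illustration of how a parallel voting ensemble over some of the listed model families could be assembled with scikit-learn, here is a minimal sketch. It is not taken from `diabetes_analysis.py`: the helper name `build_voting_ensemble`, the choice of three base models, the hyperparameters, and the synthetic data are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_voting_ensemble(n_jobs=-1, random_state=42):
    """Hypothetical helper: soft-voting ensemble over three base models."""
    estimators = [
        # Scaling helps the logistic regression solver converge.
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        # n_jobs=-1 lets the forest use all available cores.
        ("forest", RandomForestClassifier(n_estimators=50, n_jobs=n_jobs,
                                          random_state=random_state)),
        ("nb", GaussianNB()),
    ]
    # voting="soft" averages predicted probabilities; base models are fit in parallel.
    return VotingClassifier(estimators=estimators, voting="soft", n_jobs=n_jobs)


# Synthetic stand-in for the diabetes dataset (assumption: binary target, 8 features).
X, y = make_classification(n_samples=2000, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

ensemble = build_voting_ensemble()
ensemble.fit(X_train, y_train)
accuracy = ensemble.score(X_test, y_test)
print(f"Ensemble accuracy on synthetic data: {accuracy:.3f}")
```

Soft voting requires every base model to expose `predict_proba`, which is why an SVM would need `probability=True` if it were added to this sketch.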
- data/: Contains the diabetes dataset (1 lakh records).
- outputs/plots/: Contains visualizations for model evaluation, including:
  - correlation/: Correlation heatmap (optimized computation).
  - distributions/: Feature distribution plots (with sampling).
  - confusion_matrices/: Confusion matrices for each model.
  - roc_curves/: ROC curves for each model and a combined plot (including FUGA-Net).
  - pr_curves/: Precision-Recall curves for each model (including FUGA-Net).
  - model_comparison/: Bar plots comparing model metrics.
- outputs/metrics/: Contains CSV files with model evaluation metrics (model_comparison_results.csv) and detailed metrics for FUGA-Net (fuga_net_detailed_metrics.txt).
- outputs/best_model/: Contains analysis files for key models, including fuga_net_analysis.txt and ensemble_analysis.txt.
- Python 3.8+
- Core packages:
  - numpy
  - pandas
  - scikit-learn
  - matplotlib
  - seaborn
  - torch
- Performance optimization:
  - numba
  - pyarrow
  - fastparquet
- Clone the repository.
- Install the required packages:

      pip install -r requirements.txt

- Run the FUGA-Net evaluation script to train and save its results:

      python evaluate_fuga_net.py

- Run the main analysis script to train traditional models and generate comparisons:

      python diabetes_analysis.py
- Memory-efficient data processing
- Parallel processing for faster computation
- Batch processing for large datasets
- Optimized model parameters
- Efficient visualization techniques
- Comprehensive model evaluation
- Novel Feature Utility, Grouping, and Adaptive fusion Network (FUGA-Net) implementation
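The batch processing item above can be pictured with a small, framework-agnostic sketch. It assumes only that a model exposes some callable mapping a 2-D array of rows to one score per row; `predict_in_batches` and `toy_scorer` are hypothetical names for illustration, not functions from this repository.

```python
import numpy as np


def predict_in_batches(predict_fn, X, batch_size=10_000):
    """Apply predict_fn to X in fixed-size row chunks and concatenate the results."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    outputs = []
    for start in range(0, len(X), batch_size):
        # Only one slice of the input is handed to the model at a time.
        outputs.append(np.asarray(predict_fn(X[start:start + batch_size])))
    if not outputs:
        return np.empty(0)
    return np.concatenate(outputs)


def toy_scorer(batch):
    """Stand-in model (hypothetical): logistic function of each row's mean."""
    return 1.0 / (1.0 + np.exp(-batch.mean(axis=1)))


rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 8)).astype(np.float32)  # same row count as the dataset
scores = predict_in_batches(toy_scorer, X, batch_size=16_384)
print(scores.shape)
```

Peak memory for intermediate results then scales with `batch_size` rather than with the full dataset, which matters most when the model runs on a GPU.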
The evaluation results for all models are compiled in outputs/metrics/model_comparison_results.csv. Detailed metrics for FUGA-Net are in outputs/metrics/fuga_net_detailed_metrics.txt. Visualizations, including combined ROC and individual PR curves, are available in the outputs/plots/ directory. Analysis of key models is in the outputs/best_model/ folder.
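For readers who want to see how the data behind ROC and Precision-Recall curves is typically derived, the following sketch computes it with scikit-learn for two simple models. The synthetic data, the model choice, and the `curves` dictionary layout are assumptions for illustration and do not reflect the project's actual results or file formats.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    average_precision_score,
    precision_recall_curve,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic, mildly imbalanced stand-in for the diabetes data (assumption).
X, y = make_classification(n_samples=2000, n_features=8, weights=[0.8, 0.2],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
}

curves = {}
for name, model in models.items():
    # Probability of the positive class drives both curve types.
    scores = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, scores)
    precision, recall, _ = precision_recall_curve(y_test, scores)
    curves[name] = {
        "fpr": fpr,
        "tpr": tpr,
        "precision": precision,
        "recall": recall,
        "roc_auc": roc_auc_score(y_test, scores),
        "average_precision": average_precision_score(y_test, scores),
    }
    print(f"{name}: ROC AUC={curves[name]['roc_auc']:.3f}, "
          f"AP={curves[name]['average_precision']:.3f}")
```

A combined ROC plot would draw each model's `fpr` against `tpr` on shared axes; a separate PR plot per model would use `recall` against `precision`.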
The project implements several techniques to keep memory use and runtime manageable:
- Efficient data types and downcasting
- Batch processing for large predictions (especially in FUGA-Net)
- Garbage collection and CUDA memory management
- Sampling strategies for visualization
- Parallel processing for faster computation
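Two of the items above, dtype downcasting and explicit memory cleanup, might look roughly like the sketch below. The helper `downcast_numeric` and the column names are hypothetical and chosen for illustration; the project's own implementation may differ.

```python
import gc

import numpy as np
import pandas as pd


def downcast_numeric(df):
    """Return a copy of df with numeric columns downcast to smaller dtypes."""
    out = df.copy()
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include=[np.floating]).columns:
        # Float downcasting stops at float32 and trades away some precision.
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


rng = np.random.default_rng(0)
n_rows = 100_000
# Hypothetical columns; the real dataset's schema is not shown in this document.
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=n_rows),         # int64 by default
    "glucose": rng.normal(120.0, 30.0, size=n_rows),  # float64 by default
    "outcome": rng.integers(0, 2, size=n_rows),
})

before = df.memory_usage(deep=True).sum()
small = downcast_numeric(df)
after = small.memory_usage(deep=True).sum()
print(f"Memory: {before / 1e6:.2f} MB -> {after / 1e6:.2f} MB")

# Drop the large original and ask Python to reclaim unreachable objects.
del df
gc.collect()

# If PyTorch is installed and a GPU is present, release cached CUDA blocks too.
try:
    import torch

    if torch.cuda.is_available():
        torch.cuda.empty_cache()
except ImportError:
    pass
```

Downcasting is safest for integer columns with a known range; for floats it is worth checking that reduced precision does not affect the model.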
This project is licensed under the MIT License - see the LICENSE file for details.