Early Diabetes Detection Using ML Algorithms: A Comparative Analysis with FUGA-Net (Feature Utility, Grouping, and Adaptive fusion Network).

ManakRaj-7/early-diabetes-detection-ml-comparison

Diabetes Prediction Model Comparison

This project compares several machine learning models for predicting diabetes outcomes on a dataset of 100,000 records (one lakh) with multiple features. It also introduces a novel deep learning model, the Feature Utility, Grouping, and Adaptive fusion Network (FUGA-Net), and evaluates it alongside the traditional models.

Models Included

  • Logistic Regression (optimized for large datasets)
  • Decision Tree (with memory optimization)
  • Random Forest (parallel processing enabled)
  • KNN (optimized for large-scale data)
  • Naive Bayes (memory-efficient implementation)
  • SVM (with cache optimization)
  • Ensemble (Voting Classifier with parallel processing)
  • Feature Utility, Grouping, and Adaptive fusion Network (FUGA-Net) (novel deep learning model with feature fusion)
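
As a rough illustration of how the traditional models above could be set up side by side, here is a minimal sketch using scikit-learn. The hyperparameters, the hard-voting choice, and the helper names `build_models` and `compare_models` are assumptions made for this example, not the settings used in `diabetes_analysis.py`. FUGA-Net is left out because its architecture is not described in this README.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(n_jobs=-1, random_state=42):
    """Return a dict of classifiers (hypothetical settings, for illustration)."""
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(max_depth=8, random_state=random_state),
        "Random Forest": RandomForestClassifier(
            n_estimators=100, n_jobs=n_jobs, random_state=random_state
        ),
        "KNN": KNeighborsClassifier(n_neighbors=5, n_jobs=n_jobs),
        "Naive Bayes": GaussianNB(),
        # cache_size is the kernel cache in MB
        "SVM": SVC(cache_size=500, random_state=random_state),
    }
    # Hard voting over fresh copies of the six base models (an assumption;
    # the repository may use a different subset or soft voting).
    models["Ensemble"] = VotingClassifier(
        estimators=[(name, clone(est)) for name, est in models.items()],
        voting="hard",
        n_jobs=n_jobs,
    )
    return models


def compare_models(X, y, n_jobs=-1, random_state=42):
    """Fit every model on one stratified split and return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    scores = {}
    for name, model in build_models(n_jobs, random_state).items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    # Synthetic stand-in data; the real dataset lives under data/.
    X_demo, y_demo = make_classification(n_samples=2000, n_features=8, random_state=0)
    for model_name, acc in compare_models(X_demo, y_demo).items():
        print(f"{model_name}: {acc:.3f}")
```

The demo runs on synthetic data only, so the printed accuracies say nothing about performance on the diabetes dataset.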

Project Structure

  • data/: Contains the diabetes dataset (100,000 records).
  • outputs/plots/: Contains visualizations for model evaluation, including:
    • correlation/: Correlation heatmap (optimized computation).
    • distributions/: Feature distribution plots (with sampling).
    • confusion_matrices/: Confusion matrices for each model.
    • roc_curves/: ROC curves for each model and a combined plot (including FUGA-Net).
    • pr_curves/: Precision-Recall curves for each model (including FUGA-Net).
    • model_comparison/: Bar plots comparing model metrics.
  • outputs/metrics/: Contains CSV files with model evaluation metrics (model_comparison_results.csv) and detailed metrics for FUGA-Net (fuga_net_detailed_metrics.txt).
  • outputs/best_model/: Contains analysis files for key models, including fuga_net_analysis.txt and ensemble_analysis.txt.
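
The quantities behind the confusion-matrix, ROC, and Precision-Recall plots listed above can be computed with scikit-learn's metric functions. The following is a minimal sketch, not the repository's plotting code; the helper name `evaluation_curves` and the 0.5 decision threshold are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    confusion_matrix,
    precision_recall_curve,
    roc_auc_score,
    roc_curve,
)


def evaluation_curves(y_true, y_score, threshold=0.5):
    """Compute the data needed for confusion-matrix, ROC, and PR plots.

    y_score holds predicted probabilities (or scores) for the positive class.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    # Hard labels are only needed for the confusion matrix.
    y_pred = (y_score >= threshold).astype(int)
    fpr, tpr, _ = roc_curve(y_true, y_score)
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    return {
        "confusion_matrix": confusion_matrix(y_true, y_pred),
        "roc": (fpr, tpr),
        "roc_auc": roc_auc_score(y_true, y_score),
        "pr": (precision, recall),
        "average_precision": average_precision_score(y_true, y_score),
    }


if __name__ == "__main__":
    # Tiny toy example with four samples.
    out = evaluation_curves([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
    print(out["confusion_matrix"])
    print(f"ROC AUC: {out['roc_auc']:.2f}")
```

The `(fpr, tpr)` and `(precision, recall)` arrays can be passed directly to `matplotlib.pyplot.plot` to draw one curve per model on shared axes.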

Requirements

  • Python 3.8+
  • Core packages:
    • numpy
    • pandas
    • scikit-learn
    • matplotlib
    • seaborn
    • torch
  • Performance optimization:
    • numba
    • pyarrow
    • fastparquet

Installation

  1. Clone the repository.
  2. Install the required packages:
    pip install -r requirements.txt

Usage

  1. Run the FUGA-Net evaluation script to train and save its results:
    python evaluate_fuga_net.py
  2. Run the main analysis script to train traditional models and generate comparisons:
    python diabetes_analysis.py

Key Features

  • Memory-efficient data processing
  • Parallel processing for faster computation
  • Batch processing for large datasets
  • Optimized model parameters
  • Efficient visualization techniques
  • Comprehensive model evaluation
  • Novel Feature Utility, Grouping, and Adaptive fusion Network (FUGA-Net) implementation

Results

The evaluation results for all models are compiled in outputs/metrics/model_comparison_results.csv. Detailed metrics for FUGA-Net are in outputs/metrics/fuga_net_detailed_metrics.txt. Visualizations, including combined ROC and individual PR curves, are available in the outputs/plots/ directory. Analysis of key models is in the outputs/best_model/ folder.
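
Once the scripts have run, the comparison CSV can be inspected with pandas. The sketch below ranks models by a chosen metric. The column names `Model` and `Accuracy` are assumptions, since this README does not list the file's columns; adjust them to match the actual header.

```python
from pathlib import Path

import pandas as pd


def rank_models(csv_source, metric, model_col="Model"):
    """Load a comparison table and sort it by `metric`, best first.

    `metric` and `model_col` are hypothetical column names.
    """
    df = pd.read_csv(csv_source)
    missing = {model_col, metric} - set(df.columns)
    if missing:
        raise KeyError(f"Columns not found in results file: {sorted(missing)}")
    return df.sort_values(metric, ascending=False).reset_index(drop=True)


if __name__ == "__main__":
    results_path = Path("outputs/metrics/model_comparison_results.csv")
    if results_path.exists():
        print(rank_models(results_path, metric="Accuracy"))
    else:
        print(f"{results_path} not found; run the analysis scripts first.")
```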

Memory Optimization

The project implements several memory optimization techniques:

  1. Efficient data types and downcasting
  2. Batch processing for large predictions (especially in FUGA-Net)
  3. Garbage collection and CUDA memory management
  4. Sampling strategies for visualization
  5. Parallel processing for faster computation
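
The first four techniques above might look roughly like the following sketch. All helper names (`downcast_numeric`, `predict_in_batches`, `sample_for_plot`, `release_memory`) and the default sizes are hypothetical and chosen for illustration; the repository's own implementation may differ.

```python
import gc

import numpy as np
import pandas as pd


def downcast_numeric(df):
    """Return a copy with numeric columns shrunk to the smallest safe dtype."""
    out = df.copy()
    for col in out.select_dtypes(include=[np.integer]).columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include=[np.floating]).columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    return out


def predict_in_batches(predict_fn, X, batch_size=10_000):
    """Apply predict_fn to X in fixed-size slices and concatenate the results."""
    parts = [
        predict_fn(X[start:start + batch_size])
        for start in range(0, len(X), batch_size)
    ]
    return np.concatenate(parts)


def sample_for_plot(df, max_rows=5_000, random_state=42):
    """Draw at most max_rows rows so plots stay cheap on large datasets."""
    return df.sample(n=min(max_rows, len(df)), random_state=random_state)


def release_memory():
    """Run garbage collection and, if PyTorch with CUDA is present, free its cache."""
    gc.collect()
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


if __name__ == "__main__":
    demo = pd.DataFrame({
        "age": np.arange(100_000, dtype=np.int64) % 90,
        "bmi": np.linspace(15.0, 45.0, 100_000),
    })
    small = downcast_numeric(demo)
    print(demo.memory_usage(deep=True).sum(), "->", small.memory_usage(deep=True).sum())
    release_memory()
```

Downcasting floats to 32-bit trades a small amount of precision for roughly half the memory, which is usually acceptable for tabular features but worth checking for columns with very large values.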

License

This project is licensed under the MIT License - see the LICENSE file for details.
