A SAS-based clinical analytics pipeline using the UCI Heart Disease dataset to identify high-signal risk factors through inferential testing and stepwise logistic regression (ROC-AUC: 0.9365).

Sreeya003/Identifying-Key-Determinants-of-Heart-Disease


🩺 Identifying Key Determinants of Heart Disease

📊 Project Overview

This repository hosts a SAS-based clinical analytics pipeline that identifies the physiological factors most strongly associated with heart disease. Using the UCI Heart Disease dataset, it screens candidate variables with inferential tests and then fits a stepwise logistic regression model.

The final model reaches a ROC-AUC of 0.9365 with 87% sensitivity and isolates thalassemia and major vessel blockage as the highest-signal predictors.

🚀 Key Features & Methodology

🛠️ Data Engineering & EDA

Cleaning: Processed and encoded mixed-type variables for seamless modeling.

Analysis: Performed deep Exploratory Data Analysis (EDA) to validate distributions and ensure modeling readiness.
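As an illustration of the kind of EDA checks described above, here is a minimal SAS sketch. It is not taken from the repository's scripts. It assumes the data has already been read into a dataset named work.heart, and the column names (age, chol, sex, cp, target) are assumptions based on the commonly distributed heart.csv layout; adjust them to match your file.

```sas
/* Hypothetical names: work.heart, age, chol, sex, cp, target. */

/* Summary statistics and missing-value counts for continuous variables. */
proc means data=work.heart n nmiss mean std min max;
    var age chol;
run;

/* Frequency tables for categorical variables, counting missing levels too. */
proc freq data=work.heart;
    tables sex cp target / missing;
run;
```

Checking N and NMISS first makes it easier to spot variables that need imputation or recoding before any testing or modeling.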

🧪 Inferential Screening

To eliminate noise and focus on "true" predictors, I conducted:

t-tests & ANOVA: For continuous physiological metrics (age, cholesterol, etc.).

Chi-Square Tests: To find significant associations between categorical risk factors and heart disease status.

Signal Capture: Only statistically significant variables were moved to the modeling phase.
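The screening step could look roughly like the sketch below. This is an assumption-laden example rather than the project's actual script: work.heart, the binary outcome target, and the predictors age, chol, sex, and cp are hypothetical names.

```sas
/* Hypothetical names: work.heart, target (0/1 outcome), age, chol, sex, cp. */

/* Two-sample t-tests: compare continuous metrics between outcome groups. */
proc ttest data=work.heart;
    class target;
    var age chol;
run;

/* Chi-square tests: association between each categorical factor and the outcome. */
proc freq data=work.heart;
    tables (sex cp)*target / chisq;
run;
```

Variables whose p-values fall below the chosen significance level would then be carried forward to the modeling phase.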

🤖 Predictive Modeling

Model: Stepwise Logistic Regression.

Performance: 📈 ROC-AUC: 0.9365

🎯 Sensitivity: 87%

Clinical Insight: Isolated thalassemia and major vessel blockage as the most critical risk factors.
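A stepwise logistic regression of this kind can be sketched in SAS/STAT as follows. The dataset name, variable names, entry and stay thresholds, and the 0.5 classification cutoff are all assumptions chosen for illustration; the repository's script may differ.

```sas
/* Hypothetical names and thresholds; not the repository's actual settings. */
ods graphics on;

proc logistic data=work.heart plots(only)=roc;
    /* Treat multi-level codes as categorical with reference coding. */
    class cp thal / param=ref;
    /* Model the probability that target equals 1. */
    model target(event='1') = age sex cp chol ca thal
          / selection=stepwise slentry=0.05 slstay=0.05
            ctable pprob=0.5;
run;
```

In the output, the c statistic in the "Association of Predicted Probabilities and Observed Responses" table corresponds to the ROC-AUC, and the classification table produced by CTABLE reports sensitivity at the chosen cutoff.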

🛠️ Technical Stack

-> Language: SAS (Base SAS, SAS/STAT)

-> Dataset: UCI Machine Learning Repository

-> Statistical Methods: Inferential Tests, Stepwise Selection, Logistic Regression

📂 Repository Structure

scripts/: SAS programs for cleaning, testing, and modeling.

data/: Processed dataset (or link to UCI source).

output/: Statistical reports and ROC curve visualizations.

🏁How to Run

πŸ“₯ Load the heart.csv into your SAS Environment.

βš™οΈ Run the preprocessing script to clean and encode variables.

πŸ”¬ Execute the inferential testing script to view p-values.

πŸ† Run the logistic regression script to generate the final model and AUC metrics.
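For the first step, one way to load the CSV is PROC IMPORT, sketched below. The file path and the output dataset name work.heart are placeholders, not values taken from the repository.

```sas
/* Placeholder path and dataset name; change both to match your setup. */
proc import datafile="/path/to/heart.csv"
    out=work.heart
    dbms=csv
    replace;
    getnames=yes;      /* use the first row as variable names */
    guessingrows=max;  /* scan all rows when inferring column types */
run;

/* Confirm the variables and types that were read in. */
proc contents data=work.heart;
run;
```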
