Sreeya003/Identifying-Key-Determinants-of-Heart-Disease: A SAS-based clinical analytics pipeline using the UCI Heart Disease dataset to identify high-signal risk factors through inferential testing and stepwise logistic regression (ROC-AUC: 0.9365).

🩺 Identifying Key Determinants of Heart Disease

📊 Project Overview

This repository hosts a SAS-based clinical analytics pipeline that identifies the physiological factors most strongly associated with heart disease. Using the UCI Heart Disease dataset, the project moves from raw clinical measurements to interpretable risk factors through inferential testing and stepwise logistic regression.

The final model discriminates well between patients with and without heart disease (ROC-AUC: 0.9365) and isolates thalassemia and major vessel blockage as the highest-signal predictors.

🚀 Key Features & Methodology

🛠️ Data Engineering & EDA

Cleaning: Processed the raw records and encoded the mixed-type (continuous and categorical) variables for modeling.

Analysis: Performed exploratory data analysis (EDA) to check variable distributions before modeling.
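The repository's own cleaning logic lives in its SAS program and is not reproduced here. As an illustration only, the cleaning step can be sketched in plain Python (standard library), assuming the standard 14-column layout of UCI's `processed.cleveland.data`, where missing `ca` and `thal` values are coded as `?` and the 0 to 4 `num` column is collapsed into a binary disease flag. The function names `parse_cleveland` and `one_hot` are hypothetical, and dropping incomplete rows is an assumed choice, not necessarily the one this project made.

```python
import csv
import io

# Column order of UCI's processed.cleveland.data (14 comma-separated fields).
COLUMNS = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
           "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num"]


def parse_cleveland(text):
    """Parse Cleveland records into dicts with a binary 'disease' target.

    Assumption: rows containing '?' (missing ca/thal) are dropped
    (complete-case analysis); the SAS pipeline may handle them differently.
    """
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:                      # skip blank lines
            continue
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
        if any(value.strip() == "?" for value in row):
            continue                     # missing values are coded as '?'
        record = {name: float(value) for name, value in zip(COLUMNS, row)}
        # num ranges 0-4; any value above 0 is treated as disease present.
        record["disease"] = int(record.pop("num") > 0)
        records.append(record)
    return records


def one_hot(records, column, levels):
    """Replace a categorical column with 0/1 indicators.

    The first entry in `levels` is the reference level and gets no column.
    """
    for record in records:
        value = record.pop(column)
        for level in levels[1:]:
            record[f"{column}_{int(level)}"] = int(value == level)
    return records
```

For example, `one_hot(parse_cleveland(text), "thal", [3.0, 6.0, 7.0])` would yield `thal_6` and `thal_7` indicators with `thal = 3` as the reference level. Imputing `ca` and `thal` instead of dropping rows is an equally reasonable design choice.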

🧪 Inferential Screening

To filter out noise and keep only statistically supported predictors, I conducted:

t-tests & ANOVA: For continuous physiological metrics (age, cholesterol, etc.).

Chi-Square Tests: To test categorical risk factors for association with disease status.

Signal Capture: Only statistically significant variables were moved to the modeling phase.
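The screening step rests on a few standard test statistics, of the kind that SAS procedures such as PROC TTEST and PROC FREQ (with the CHISQ option) report. The sketch below is plain Python, not the project's SAS code: it computes a Welch two-sample t statistic, a one-way ANOVA F statistic, and a Pearson chi-square statistic, then keeps variables below a significance cutoff. The helper names, the 0.05 cutoff, and the normal approximation used for the t-test p-value (reasonable only for large degrees of freedom) are assumptions for illustration.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)  # squared standard errors
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def normal_two_sided_p(z):
    """Two-sided p-value from the standard normal (large-df approximation to t)."""
    return math.erfc(abs(z) / math.sqrt(2))


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic with between- and within-group df."""
    grand = mean([v for g in groups for v in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = sum(len(g) for g in groups) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def chi_square(table):
    """Pearson chi-square statistic and df for an r x c table of observed counts.

    Assumes every row and column total is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_p_df1(stat):
    """Exact upper-tail p-value for df = 1, since chi-square(1) equals Z squared."""
    return math.erfc(math.sqrt(stat / 2))


def screen(p_values, alpha=0.05):
    """Names of variables whose p-value falls below alpha (assumed cutoff)."""
    return sorted(name for name, p in p_values.items() if p < alpha)
```

For example, a 2 x 2 table `[[10, 20], [20, 10]]` gives a chi-square statistic of about 6.67 with one degree of freedom and a p-value just under 0.01, so that variable would pass a 0.05 screen.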

🤖 Predictive Modeling

Model: Stepwise Logistic Regression.

Performance: 📈 ROC-AUC: 0.9365

🎯 Sensitivity: 87%

Clinical Insight: Thalassemia and major vessel blockage emerged as the strongest predictors in the final model.
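Both reported metrics can be computed from a model's predicted probabilities. As a hedged plain-Python sketch (not the project's SAS output): ROC-AUC is the probability that a randomly chosen diseased patient receives a higher predicted probability than a randomly chosen healthy one, with ties counted as half, which is what the concordance (c) statistic in PROC LOGISTIC output measures. Sensitivity depends on a classification cutoff that the README does not state; the 0.5 default below is an assumption, as are the function names.

```python
def roc_auc(scores, labels):
    """ROC-AUC via pairwise comparison (Mann-Whitney form); ties count 0.5.

    `labels` are 1 for disease present and 0 for absent.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(scores, labels, cutoff=0.5):
    """True-positive rate: share of diseased patients scored at or above cutoff."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    return tp / (tp + fn)


def specificity(scores, labels, cutoff=0.5):
    """True-negative rate: share of healthy patients scored below cutoff."""
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tn / (tn + fp)
```

The pairwise loop is quadratic, which is fine for a dataset of a few hundred patients; a rank-based formula scales better for larger data. Because sensitivity moves with the cutoff, it is worth reporting the cutoff alongside the 87% figure.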

🛠️ Technical Stack

-> Language: SAS (Base SAS, SAS/STAT)

-> Dataset: UCI Heart Disease dataset (Cleveland subset), UCI Machine Learning Repository

-> Statistical Methods: Inferential Tests, Stepwise Selection, Logistic Regression

📂 Repository Structure

scripts/: SAS programs for cleaning, testing, and modeling.

data/: Processed dataset (or link to UCI source).

output/: Statistical reports and ROC curve visualizations.

🏁 How to Run

📥 Load heart.csv into your SAS environment.

⚙️ Run the preprocessing script to clean and encode variables.

🔬 Execute the inferential testing script to view p-values.

🏆 Run the logistic regression script to generate the final model and AUC metrics.

📁 Files Currently in the Repository

DATA 5301 Group 7 Final Project Code F 1.sas

LICENSE

README.md

processed.cleveland (2).data