π©Ί Identifying Key Determinants of Heart Disease
πProject Overview
This repository hosts a SAS-based clinical analytics pipeline designed to uncover the physiological drivers behind heart disease. Using the UCI Heart Disease dataset, this project bridges the gap between raw medical data and actionable clinical insights through rigorous statistical validation.
The pipeline achieves high diagnostic accuracy, specifically isolating high-signal predictors like thalassemia and vessel blockage.
πKey Features & Methodology
π οΈ Data Engineering & EDA
Cleaning: Processed and encoded mixed-type variables for seamless modeling.
Analysis: Performed deep Exploratory Data Analysis (EDA) to validate distributions and ensure modeling readiness.
π§ͺ Inferential Screening To eliminate noise and focus on "true" predictors, I conducted:
t-tests & ANOVA: For continuous physiological metrics (age, cholesterol, etc.).
Chi-Square Tests: To find significant associations between categorical risks.
Signal Capture: Only statistically significant variables were moved to the modeling phase.
π€ Predictive Modeling
Model: Stepwise Logistic Regression.
Performance: π ROC-AUC: 0.9365
π― Sensitivity: 87%
π οΈTechnical Stack
-> Language: SAS (Base SAS, SAS/STAT)
-> Dataset: UCI Machine Learning Repository
-> Statistical Methods: Inferential Tests, Stepwise Selection, Logistic Regression
πRepository Structure
scripts/: SAS programs for cleaning, testing, and modeling.
data/: Processed dataset (or link to UCI source).
πHow to Run
π₯ Load the heart.csv into your SAS Environment.
βοΈ Run the preprocessing script to clean and encode variables.
π¬ Execute the inferential testing script to view p-values.
π Run the logistic regression script to generate the final model and AUC metrics.