This technical audit targets a dataset of 1,000 patient records to validate data integrity and assess the impact of lifestyle interventions on cardiovascular performance. The analysis identifies critical biometric inconsistencies and uses a frequentist two-sample t-test to check whether heart rate differs between active and sedentary cohorts.
A preliminary audit revealed structural flaws in the source data that would lead to "Garbage In, Garbage Out" (GIGO) scenarios.
- Audit of Faulty BMI: The original `BMI` column exhibited mathematical drift. I implemented a recovery field `BMI_Real` derived from raw `Weight_kg` and `Height_cm`.
- Structural Deconstruction: The `Blood_Pressure` string variable was parsed into `Systolic` and `Diastolic` integers, enabling granular clinical risk profiling.
- Validation Metric: A Pearson correlation matrix confirms the audit's success, showing a robust 0.82 correlation between weight and the corrected BMI.
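The three audit steps above can be sketched in Pandas as follows. This is a minimal illustration rather than the audit's actual notebook code: the rows are made-up toy values, and the `"systolic/diastolic"` string layout with a `/` separator is an assumption, since the source does not show the raw `Blood_Pressure` format.

```python
import pandas as pd

# Toy rows standing in for the real dataset (values invented for illustration).
df = pd.DataFrame({
    "Weight_kg": [70.0, 85.0, 60.0, 95.0],
    "Height_cm": [175.0, 180.0, 160.0, 170.0],
    "Blood_Pressure": ["120/80", "135/85", "110/70", "140/90"],  # assumed format
})

# BMI = weight (kg) / height (m)^2; Height_cm is converted to metres first.
df["BMI_Real"] = df["Weight_kg"] / (df["Height_cm"] / 100) ** 2

# Split the assumed "systolic/diastolic" strings into two integer columns.
# Every value here is well-formed; malformed rows would need cleaning first.
bp = df["Blood_Pressure"].str.split("/", expand=True)
df["Systolic"] = bp[0].astype(int)
df["Diastolic"] = bp[1].astype(int)

# Pearson correlation between weight and the recomputed BMI.
corr = df["Weight_kg"].corr(df["BMI_Real"])

print(df[["BMI_Real", "Systolic", "Diastolic"]].round(2))
print(f"Pearson r (Weight_kg vs BMI_Real): {corr:.2f}")
```

On the real data, comparing `BMI_Real` against the original `BMI` column row by row would show where the two disagree.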
The core objective was to determine if physical activity levels significantly shift the biometric mean of heart rate.
- Statistical Methodology: Two-sample Student's T-test.
- Null Hypothesis ($H_0$): No significant difference exists in heart rate between active and sedentary cohorts.
- The Verdict: Computed P-Value = 0.4614.
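- Strategic Insight: We fail to reject the null hypothesis. The difference in mean heart rate between the two cohorts is not statistically significant at conventional thresholds (for example, α = 0.05). This does not prove that activity has no effect; it is consistent with cardiovascular health in this cohort being a multifactorial system.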
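A minimal sketch of the test described above, using `scipy.stats.ttest_ind`. The heart-rate samples are synthetic (drawn from the same normal distribution) and are not the audit's data, so the printed p-value will not match 0.4614. The group sizes, the mean of 72 bpm, and the threshold `alpha = 0.05` are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic heart rates in bpm; NOT the audit's data (assumed mean, spread, size).
active = rng.normal(loc=72, scale=10, size=200)
sedentary = rng.normal(loc=72, scale=10, size=200)

# Student's two-sample t-test pools the variances (equal_var=True).
# Setting equal_var=False would run Welch's t-test instead.
t_stat, p_value = stats.ttest_ind(active, sedentary, equal_var=True)

alpha = 0.05  # assumed significance threshold
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, p = {p_value:.4f} -> {decision}")
```

If the two cohorts have clearly different variances or sizes, Welch's variant is usually the safer choice, though the source states that Student's test was used.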
- Environment: JupyterLab.
- Language: Python 3.10+.
- Libraries: Pandas (Auditing), Seaborn/Matplotlib (Visualization), Scipy.stats (Inferential Analysis).
Audit performed by André Vinagre.