According to the Centers for Disease Control and Prevention, one person dies every 36 seconds in the US from heart disease. The heart is among the most vital organs in the body, and any dysfunction in it affects other organs. Heart disease has increasingly become one of the leading causes of death in the world. For this reason, our team decided to build a prediction method for classifying heart disease. A dataset on this subject was extracted from the UCI Machine Learning Repository; it contains 14 attributes, which are explained in detail in the coming sections. The creators of this data were:
- Hungarian Institute of Cardiology, Budapest: Andras Janosi, M.D.
- University Hospital, Zurich, Switzerland: William Steinbrunn, M.D.
- University Hospital, Basel, Switzerland: Matthias Pfisterer, M.D.
- V.A. Medical Center, Long Beach and Cleveland Clinic Foundation: Robert Detrano, M.D., Ph.D.
Medical decisions are often made based on the doctor's intuition and experience. However, by applying data mining techniques to the data now available, we can extract rich information that helps significantly in avoiding biases and minimizing decision errors. The questions we tried to answer through data analytics are as follows:
- What are the key parameters that play a major role in indicating that an individual has a heart disease?
- What are the effects of outliers on our analysis?
- How can we know that our chosen model is working efficiently?
- What is our main class?
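The question about outliers can be made concrete with a small sketch. The snippet below uses the interquartile range (IQR) rule, which is one common way to flag outliers and is an assumption here, not necessarily the method used in this project. The function name `iqr_outliers` and the sample readings are hypothetical and for illustration only.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" treats the data as the whole population of interest.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up cholesterol readings in mg/dl (not taken from the dataset).
sample = [180, 195, 200, 210, 215, 220, 230, 240, 564]
print(iqr_outliers(sample))
```

Comparing model results with and without the flagged rows is one way to see how much the outliers influence the analysis.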
This dataset was extracted from the UCI Machine Learning Repository. It contains records of 918 patients, mostly above the age of 30. It has 11 attributes, explained as follows, and 1 target variable column:
- Age - age in years
- Sex - (1 = male; 0 = female)
- Chest Pain Type
  ● 0: Typical angina: chest pain related to decreased blood supply to the heart
  ● 1: Atypical angina: chest pain not related to the heart
  ● 2: Non-anginal pain: typically esophageal spasms (not heart related)
  ● 3: Asymptomatic: chest pain not showing signs of disease
- RestingBP - resting blood pressure (in mm Hg on admission to the hospital); anything above 130-140 is typically cause for concern
- Cholesterol - serum cholesterol in mg/dl
  ● serum = LDL + HDL + 0.2 * triglycerides
  ● above 200 is cause for concern
- FastingBS - (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
- RestingECG - resting electrocardiographic results
  ● 0: Nothing to note
  ● 1: ST-T wave abnormality
    - can range from mild symptoms to severe problems
    - signals non-normal heart beat
  ● 2: Possible or definite left ventricular hypertrophy
    - enlarged heart's main pumping chamber
- MaxHR - maximum heart rate achieved
- Exercise Angina - exercise induced angina (1 = yes; 0 = no)
- Oldpeak - ST depression induced by exercise relative to rest; looks at the stress of the heart during exercise (an unhealthy heart will stress more)
- ST Slope - the slope of the peak exercise ST segment
  ● 0: Upsloping: better heart rate with exercise (uncommon)
  ● 1: Flat sloping: minimal change (typical healthy heart)
- Heart Disease - whether the patient has heart disease (1 = yes; 0 = no)
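The codes, formula, and thresholds in the attribute list above can be sketched as a few small checks. This is a minimal illustration, not the project's actual code: the dictionary keys are assumed column names, the helper names are hypothetical, the blood pressure limit of 140 is an assumed choice from the stated 130-140 range, and the example record is made up. ST Slope is left out of the code table because only part of its coding is listed.

```python
# Allowed codes for the categorical attributes, as listed above.
# The keys are assumed column names, not confirmed by the source.
ALLOWED_CODES = {
    "Sex": {0, 1},
    "ChestPainType": {0, 1, 2, 3},
    "FastingBS": {0, 1},
    "RestingECG": {0, 1, 2},
    "ExerciseAngina": {0, 1},
    "HeartDisease": {0, 1},
}

def invalid_fields(record):
    """Return the names of coded fields whose value is not an allowed code."""
    return [name for name, codes in ALLOWED_CODES.items()
            if name in record and record[name] not in codes]

def serum_cholesterol(ldl, hdl, triglycerides):
    """Serum cholesterol (mg/dl) = LDL + HDL + 0.2 * triglycerides."""
    return ldl + hdl + 0.2 * triglycerides

def concern_flags(record, bp_limit=140, chol_limit=200):
    """Flag readings above the 'cause for concern' thresholds.

    bp_limit=140 is an assumption: the text gives a range of 130-140.
    """
    return {
        "RestingBP": record["RestingBP"] > bp_limit,
        "Cholesterol": record["Cholesterol"] > chol_limit,
    }

# Made-up record for illustration only (not a row from the dataset).
example = {"Sex": 1, "ChestPainType": 2, "FastingBS": 0, "RestingECG": 0,
           "ExerciseAngina": 0, "HeartDisease": 0,
           "RestingBP": 150, "Cholesterol": 190}
print(invalid_fields(example))
print(concern_flags(example))
```

Running such checks before modeling helps catch miscoded rows early, and the flags give a quick view of how many patients exceed the stated limits.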
Dataset URL: https://archive.ics.uci.edu/ml/datasets/heart+disease
Project by - Manvika Tuteja and Ahmed Alsaadi