This project builds a model that predicts the possibility of a heart attack in a patient. The model is deployed as an app on the Streamlit platform. It uses a Logistic Regression classification model, a machine learning (ML) method, to make the predictions. The repository consists mainly of Python code.
├── Datasets : Contains the dataset used
├── Models : Contains the model used in Heart_Attack_App Deploy
├── Statics : Contains all saved images (graphs/heatmap)
├── __pycache__ : Contains .pyc files
├── GitHub_url.txt : GitHub URL in .txt
├── Heart_Attack_App_deploy.py : App deploy in python format
├── Heart_Attack_Predictions.py : Code file in python format
└── README.md : Project Descriptions
This project is trained on the Heart Attack Analysis & Prediction Dataset.
Age (age)
: Age of the patient at the time of the health checkup

Sex (sex)
: 0 = female, 1 = male

Chest Pain (cp)
: 1 = typical angina, 2 = atypical angina, 3 = non-anginal pain, 4 = asymptomatic

Resting Blood Pressure (trestbps)
: Resting blood pressure of the patient in mmHg

Cholesterol (chol)
: Cholesterol of the patient in mg/dl

Fasting Blood Sugar (fbs)
: 1 = fasting blood sugar > 120 mg/dl (true), 0 = otherwise (false)

Resting ECG (restecg)
: 0 = normal, 1 = having ST-T wave abnormality, 2 = left ventricular hypertrophy

Max Heart Rate (thalach)
: Maximum heart rate achieved by the patient

Exercise induced angina (exang)
: 0 = no, 1 = yes

oldpeak
: ST depression induced by exercise relative to rest (float values)

slp
: Slope of the peak exercise ST segment: 0 = up-slope, 1 = flat, 2 = down-slope

No. of major vessels (caa)
: Number of major vessels colored by fluoroscopy, in the range 0 to 4

Thalassemia (thall)
: 1 = normal, 2 = fixed defect, 3 = reversible defect

output
: The prediction column for the diagnosis of heart attacks: 0 = no possibility of heart attack, 1 = possibility of heart attack
This project is created using Spyder as the main IDE. The main frameworks used in this project are Pandas, Matplotlib, Seaborn, Scikit-learn and Streamlit.
This project contains two .py files. The training and deployment files are Heart_Attack_Predictions.py and Heart_Attack_App_deploy.py, respectively. The flow of the project is as follows:
The useful libraries are imported and the data are loaded from the dataset.
The dataset is cleaned with the necessary steps, and duplicate rows are removed. The association between each feature and the target is computed using Logistic Regression (continuous features vs. the categorical target) and Cramér's V (categorical features vs. the categorical target). Based on these results, the selected features are age, trtbps, chol, thalachh, oldpeak, cp, exng, caa, and thall.
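The cleaning and feature-scoring step might look roughly like the sketch below. It is not the code from Heart_Attack_Predictions.py: the helper names `cramers_v` and `lr_score` are hypothetical, the data are synthetic, and the Cramér's V shown is the plain (bias-uncorrected) form.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def cramers_v(x, y):
    """Cramér's V (bias-uncorrected) between two categorical series."""
    observed = pd.crosstab(x, y).to_numpy().astype(float)
    n = observed.sum()
    # Expected counts under independence: outer product of the margins over n
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    min_dim = min(observed.shape) - 1
    return float(np.sqrt(chi2 / (n * min_dim)))


def lr_score(feature, target):
    """Accuracy of a one-feature logistic regression, used as an association score."""
    X = feature.to_numpy().reshape(-1, 1)
    model = LogisticRegression(max_iter=1000).fit(X, target)
    return float(model.score(X, target))


# Synthetic stand-in for the real dataset (assumption: values have no clinical meaning)
rng = np.random.default_rng(0)
n_rows = 200
output = rng.integers(0, 2, n_rows)
df = pd.DataFrame({
    "age": 50 + 10 * output + rng.normal(0, 5, n_rows),               # continuous, related to output
    "cp": np.where(rng.random(n_rows) < 0.8, output * 2, rng.integers(0, 4, n_rows)),  # categorical, related
    "noise": rng.integers(0, 3, n_rows),                              # categorical, unrelated
    "output": output,
})

# Simulate duplicated records, then remove them as described above
df = pd.concat([df, df.iloc[:5]], ignore_index=True)
df = df.drop_duplicates().reset_index(drop=True)

age_score = lr_score(df["age"], df["output"])
cp_v = cramers_v(df["cp"], df["output"])
noise_v = cramers_v(df["noise"], df["output"])
print(f"age (logistic regression accuracy): {age_score:.2f}")
print(f"cp (Cramér's V): {cp_v:.2f}")
print(f"noise (Cramér's V): {noise_v:.2f}")
```

Features whose score is clearly above what an unrelated column achieves would be kept; the exact threshold used in this project is not stated here.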
A few machine learning models suited to binary classification problems are selected and built into pipelines using both Min Max Scaler and Standard Scaler:
- Logistic regression (lr)
- K Neighbors Classifier (knn)
- Random Forest Classifier (rf)
- Support Vector Classifier (svc)
- Decision Tree Classifier (dt)
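A minimal sketch of how such scaler/classifier pipelines could be built and compared with Scikit-learn is shown here. The data come from `make_classification` as a stand-in for the real dataset, and the pipeline names (`mms_lr`, `ss_lr`, and so on) and hyperparameters are assumptions rather than the project's actual settings.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data with nine features (assumption)
X, y = make_classification(n_samples=300, n_features=9, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scalers = {"mms": MinMaxScaler(), "ss": StandardScaler()}
classifiers = {
    "lr": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "rf": RandomForestClassifier(random_state=0),
    "svc": SVC(),
    "dt": DecisionTreeClassifier(random_state=0),
}

# One pipeline per scaler/classifier combination (2 x 5 = 10 pipelines)
pipelines = {
    f"{scaler_name}_{clf_name}": Pipeline([("scaler", clone(scaler)), ("clf", clone(clf))])
    for scaler_name, scaler in scalers.items()
    for clf_name, clf in classifiers.items()
}

# Fit each pipeline and record its accuracy on the held-out test split
scores = {
    name: pipe.fit(X_train, y_train).score(X_test, y_test)
    for name, pipe in pipelines.items()
}

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print(f"best pipeline: {best_name}")
```

Putting the scaler inside the pipeline means it is fitted on the training split only, which avoids leaking test-set statistics into the model.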
The pipeline with the best accuracy score is Logistic Regression (lr) with Standard Scaler (ss), at 84% accuracy.
- The classification report as an image.
- The classification report as a table is shown below.

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0.0 | 0.89 | 0.75 | 0.81 | 44 |
| 1.0 | 0.80 | 0.91 | 0.85 | 47 |
| accuracy | | | 0.84 | 91 |
| macro avg | 0.84 | 0.83 | 0.83 | 91 |
| weighted avg | 0.84 | 0.84 | 0.83 | 91 |
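To illustrate how a report like this is produced, the sketch below feeds a confusion matrix into Scikit-learn's `classification_report`. The matrix itself is an assumption: it is one set of counts that is consistent with the rounded values and supports in the table, not a figure taken from the project.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    precision_recall_fscore_support,
)

# Assumed confusion matrix (rows = true class, columns = predicted class),
# chosen to be consistent with the rounded report; not taken from the project
confusion = np.array([[33, 11],
                      [4, 43]])

# Expand the counts into label vectors so the standard metric functions can be used
y_true, y_pred = [], []
for true_label in (0, 1):
    for pred_label in (0, 1):
        count = int(confusion[true_label, pred_label])
        y_true += [true_label] * count
        y_pred += [pred_label] * count

print(classification_report(y_true, y_pred))

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred)
```

The macro average is the unweighted mean of the per-class values, while the weighted average weights each class by its support (44 and 47 here).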
The model is then tested with a few cases.
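Testing a single case could look like the sketch below. The training data are random placeholders with a made-up label rule, and the patient values are hypothetical; only the nine feature names come from the feature selection step above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["age", "trtbps", "chol", "thalachh", "oldpeak", "cp", "exng", "caa", "thall"]

# Random placeholder training data (assumption: ranges and labels have no clinical meaning)
rng = np.random.default_rng(42)
n_rows = 200
train = pd.DataFrame({
    "age": rng.integers(29, 78, n_rows),
    "trtbps": rng.integers(94, 201, n_rows),
    "chol": rng.integers(126, 400, n_rows),
    "thalachh": rng.integers(71, 203, n_rows),
    "oldpeak": rng.uniform(0, 6, n_rows).round(1),
    "cp": rng.integers(0, 4, n_rows),
    "exng": rng.integers(0, 2, n_rows),
    "caa": rng.integers(0, 5, n_rows),
    "thall": rng.integers(0, 4, n_rows),
})
target = (train["thalachh"] > 150).astype(int)  # made-up label rule for the sketch

# Standard Scaler + Logistic Regression, the combination reported as best above
model = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(train[FEATURES], target)

# One hypothetical patient, with columns in the same order as during training
case = pd.DataFrame([{
    "age": 54, "trtbps": 130, "chol": 240, "thalachh": 180, "oldpeak": 1.2,
    "cp": 2, "exng": 0, "caa": 0, "thall": 2,
}])[FEATURES]

prediction = int(model.predict(case)[0])
probability = float(model.predict_proba(case)[0, 1])
print(f"predicted output: {prediction} (probability of class 1: {probability:.2f})")
```

Passing the case as a DataFrame with the training column names keeps the feature order explicit, which matters once the same model is called from the app.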
An app that predicts a person's chance of having a heart attack is then built using Streamlit.
The dataset is from Heart_Attack_Analysis & Predictions.