Medicine-Charges-for-Smokers-and-Non-Smokers-Analysis

This data analysis and data visualization project examines how average medical charges differ between smokers and non-smokers.

Contents:

Abstract

Part I: Analysis

  • Data cleaning and setting
  • Heatmap to identify the correlation model
  • Distribution of Charges for smokers and non-smokers
  • Gender wise analysis
  • Cost analysis for women smokers and non-smokers
  • Cost analysis for men smokers and non-smokers
  • Age distribution for smokers
  • Ratio of 18-year-old smokers and non-smokers
  • Charges for 18-year-old smokers and non-smokers
  • Jointplot - charges and age of the Non-smokers
  • Jointplot - charges and age of the smokers
  • Scatter Plot : Charges for smokers and non-smokers
  • BMI Distribution
  • Distribution of charges for patients with a BMI greater than 30
  • Distribution of charges for patients with a BMI less than 30
  • Analysis on Charges of Smokers and Non-smokers using the Scatter Plot
  • Smoker Parents who have children at their home
  • Distribution of Smoker and Non-smokers who have children
  • Health Effects of Smoking and Secondhand Smoke on Children
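
The comparisons listed in Part I come down to grouping charges by smoking status and summarizing each group. Below is a minimal sketch of that idea using only the Python standard library. The field names `smoker` and `charges`, the helper `mean_charges_by_group`, and the sample values are all assumptions made for illustration; they are not taken from the project's notebook or dataset.

```python
from statistics import mean

# Hypothetical records; the values are made up for illustration only
# and do not come from the project's dataset.
records = [
    {"smoker": "yes", "charges": 30000.0},
    {"smoker": "yes", "charges": 34000.0},
    {"smoker": "no", "charges": 8000.0},
    {"smoker": "no", "charges": 6000.0},
    {"smoker": "no", "charges": 10000.0},
]


def mean_charges_by_group(rows, group_key="smoker", value_key="charges"):
    """Return the average of `value_key` for each distinct `group_key` value."""
    groups = {}
    for row in rows:
        # Collect the charges of every row under its group label.
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {label: mean(values) for label, values in groups.items()}


averages = mean_charges_by_group(records)
for label, value in averages.items():
    print(f"smoker={label}: mean charges = {value:.2f}")
```

The same helper could be reused for the gender-wise or BMI-based comparisons by passing a different `group_key`, assuming the data carries such a field.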

Part II: Regression Models Based on the Dataset

  • Linear regression model
  • Data Visualization and Preprocessing
  • Heatmap showing the correlation among the attributes
  • Age v/s Count and Age v/s Charges
  • Age v/s charges [by boxenplot representation]
  • BMI v/s Count and Charge v/s BMI
  • Gender Distribution and Gender v/s Charges
  • Smokers having Children
  • Smokers v/s Charges
  • Region based Smokers analysis
  • Basic Linear Regression Model
  • Polynomial Regression - 2nd degree
  • Ridge Regression
  • Lasso Regression
  • Support Vector Regression Model
  • Decision Tree Regression
  • Random Forest Regression
  • Error Measurement
  • Checking the scores of RMSE, R2_score (training), R2_score (test) and Cross Validation
  • Which model is better in terms of Cross Validation score?
  • Which model is better in terms of R2_score (training and test)?
  • Which model is better in terms of RMSE?
  • Result: In terms of Cross Validation score, Linear Regression provides higher accuracy than the other models
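
The error measures named in Part II have standard definitions, and the sketch below shows them together with a simple k-fold cross-validation loop. It uses only the standard library and a one-feature least-squares line, so it illustrates the metrics rather than reproducing the project's models. The helper names (`rmse`, `r_squared`, `fit_line`, `k_fold_r2`) and the toy data are assumptions introduced here.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_r2(xs, ys, k=3):
    """Average test R^2 over k folds (fold i holds indices i, i+k, i+2k, ...)."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(xs), k))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        test_x = [x for i, x in enumerate(xs) if i in test_idx]
        test_y = [y for i, y in enumerate(ys) if i in test_idx]
        slope, intercept = fit_line(train_x, train_y)
        predictions = [slope * x + intercept for x in test_x]
        scores.append(r_squared(test_y, predictions))
    return sum(scores) / k


# Toy values, made up for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
print("RMSE:", rmse(y_true, y_pred))
print("R^2:", r_squared(y_true, y_pred))

# Perfectly linear toy data (y = 2x + 1), so every fold fits almost exactly.
xs = [float(x) for x in range(1, 10)]
ys = [2.0 * x + 1.0 for x in xs]
print("Mean cross-validated R^2:", k_fold_r2(xs, ys, k=3))
```

A lower RMSE and a higher R2 or cross-validation score indicate a better fit, which is the basis for the model comparison questions listed above.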

Conclusion:

1. The impact of smoking on medical care use was examined in a 30-month prospective population-based cohort study in Japan (N = 43,408).

2. Male smokers incurred 11% more medical costs than "never smokers", but for female smokers and never smokers the costs were almost the same.

3. This difference was mainly attributable to the increased use of inpatient medical care among smokers, especially among males, for whom the per-month cost of inpatient care was 33% higher in smokers.

Thank You! Stay Safe!

abhisheks008