Somerville-Happiness-Survey

Supervised models to predict happiness from categorical input features

The Somerville Happiness Survey data set [1] was used in this project. It was collected in 2015 by surveying 143 Somerville residents about their personal happiness and their satisfaction with city services. There are six attributes, X1 to X6, each taking values 1 to 5, and a binary decision attribute, as follows:

• D = decision attribute with values 0 (unhappy) and 1 (happy)
• X1 = the availability of information about the city services
• X2 = the cost of housing
• X3 = the overall quality of public schools
• X4 = your trust in the local police
• X5 = the maintenance of streets and sidewalks
• X6 = the availability of social community events
No missing values were observed in this data set, so no imputation or other handling is needed.
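
As a quick sanity check, the data can be loaded and inspected with pandas. This is a minimal sketch: the file name follows the UCI copy of the data set, but it and the column names are assumptions here, not confirmed by this repository.

```python
import pandas as pd

# Assumed file name, matching the UCI distribution of the survey data.
df = pd.read_csv("SomervilleHappinessSurvey2015.csv")

# Six categorical features (values 1 to 5) and the binary decision attribute.
X = df[["X1", "X2", "X3", "X4", "X5", "X6"]]
y = df["D"]

print(X.shape)          # expected: (143, 6)
print(df.isna().sum())  # expected: all zeros (no missing values)
```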

In this project, Decision Tree, Bagging, Random Forest, and Gradient Boosting methods were used to train classification models that predict happiness or unhappiness.

Models
Decision trees provide a hierarchical representation of the data that can be used for both classification and regression problems. They are easy to interpret, extremely fast, and can handle categorical input data. However, they are prone to overfitting, and a small change in the data can lead to a large change in the tree structure. Ensemble methods can remedy these issues.

Bootstrap aggregating, or Bagging, is an ensemble method that can be applied to decision trees to improve stability and accuracy. It repeatedly samples the data with replacement, trains a decision tree on each sample, and aggregates the trees' predictions (by majority vote for classification) to produce the final result. Random Forest extends Bagging: in addition to taking a random sample of the data for each tree, it considers only a random subset of features at each split, which decorrelates the trees and further mitigates overfitting.

Gradient Boosting is another ensemble technique, an extension of the Boosting method. Boosting builds an ensemble of trees sequentially, with each tree trying to reduce the errors made by the trees before it. Gradient Boosting does this via gradient descent: each new tree is fitted to the gradient of the loss with respect to the current ensemble's predictions.
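
The following is a minimal sketch of how these four classifiers could be trained and compared with scikit-learn, using the X and y loaded above. The hyperparameters are illustrative defaults, not the project's actual settings:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import cross_val_score

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    # Bagging: the default base estimator is a decision tree; each tree is
    # trained on a bootstrap sample and predictions are aggregated by vote.
    "Bagging": BaggingClassifier(n_estimators=100, random_state=0),
    # Random Forest: bagging plus a random feature subset at each split.
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Gradient Boosting: trees are added sequentially, each one fitted to
    # the gradient of the loss of the current ensemble.
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

With only 143 samples, cross-validated scores vary noticeably across folds, so the standard deviation is worth reporting alongside the mean.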

[1] Koczkodaj, W.W.; Li, F.; Wolny-Dominiak, A. RatingScaleReduction package: stepwise rating scale item reduction without predictability loss. The R Journal, 10(1): 43-55, 2018.
