The code shared here demonstrates various classification algorithms in Python.
The data used is uploaded to GitHub along with the code.
Tools used: Python, Spark MLlib
Logistic regression is a popular method for predicting a categorical response. It is a special case of generalized linear models that predicts the probability of each outcome.
In spark.ml, logistic regression can predict a binary outcome using binomial logistic regression, or a multiclass outcome using multinomial logistic regression.
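To make the binomial/multinomial distinction concrete, here is a minimal plain-Python sketch (not Spark code) of how a fitted model turns features into probabilities: binomial logistic regression applies the logistic (sigmoid) function to one linear score, and multinomial logistic regression applies softmax to one linear score per class. The function names and parameter values are hypothetical. In spark.ml the estimator is `pyspark.ml.classification.LogisticRegression`, whose `family` parameter selects `"binomial"` or `"multinomial"`.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binomial_probability(weights, intercept, features):
    """P(y = 1 | x) for binomial logistic regression: sigmoid(w . x + b)."""
    score = sum(w * x for w, x in zip(weights, features)) + intercept
    return sigmoid(score)


def multinomial_probabilities(coefficients, intercepts, features):
    """Per-class probabilities for multinomial logistic regression: softmax over one linear score per class."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(coefficients, intercepts)]
    top = max(scores)  # subtract the largest score before exp() for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical fitted parameters, for illustration only.
p = binomial_probability([1.5, -0.5], 0.2, [1.0, 2.0])
probs = multinomial_probabilities([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [0.0, 0.0, 0.0], [2.0, 1.0])
print(round(p, 3), [round(q, 3) for q in probs])
```

The predicted label is then the class with the highest probability; in the binomial case that amounts to comparing the probability with a threshold (0.5 by default in spark.ml).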
Decision trees are a popular family of classification and regression methods that predict a label by following a sequence of splits, each testing the value of a single feature.
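A decision tree is grown by repeatedly choosing the split that makes the resulting child nodes as pure as possible. The plain-Python sketch below (a simplification, not the spark.ml implementation) scores every candidate threshold on a single feature with Gini impurity, the default `impurity` of spark.ml's `DecisionTreeClassifier`. The function names and data are hypothetical; Spark also discretizes continuous features into at most `maxBins` candidate splits instead of trying every value.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted Gini impurity) of the best 'value <= threshold' split on one feature."""
    best_t, best_impurity = None, float("inf")
    n = len(values)
    for t in sorted(set(values)):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        if not right:  # a split must send at least one row to each child
            continue
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_t, best_impurity = t, impurity
    return best_t, best_impurity


# Hypothetical one-feature dataset: small values are class 0, large values class 1.
print(best_threshold([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0, 0, 0, 1, 1, 1]))  # → (3.0, 0.0)
```

A full tree applies this search to every feature, keeps the best split, and recurses into each child until a stopping rule such as a maximum depth is reached.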
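Random forests are ensembles of decision trees and, like single trees, are used for both classification and regression. Each tree is trained on a random subsample of the training data and considers a random subset of features at each split; combining the trees' predictions reduces the risk of overfitting compared with a single tree.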
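The ensemble idea can be sketched in plain Python as an illustration (not spark.ml's `RandomForestClassifier`): each tree here is a one-split "stump" trained on a bootstrap sample of the rows and on one randomly chosen feature, and the forest predicts by majority vote. The stump learner, the helper names and the data are hypothetical simplifications; spark.ml grows full trees, samples a feature subset at every split (controlled by `featureSubsetStrategy`) and sets the number of trees with `numTrees`.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """One-split tree on a single feature, using the threshold with the fewest training errors."""
    best = None
    for t in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= t]
        right = [y for row, y in zip(rows, labels) if row[feature] > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda row: left_label if row[feature] <= t else right_label


def train_forest(rows, labels, num_trees=15, seed=42):
    """Train each stump on a bootstrap sample (rows drawn with replacement) and one random feature."""
    rng = random.Random(seed)
    n, num_features = len(rows), len(rows[0])
    trees = []
    for _ in range(num_trees):
        sample = [rng.randrange(n) for _ in range(n)]
        feature = rng.randrange(num_features)
        trees.append(train_stump([rows[i] for i in sample], [labels[i] for i in sample], feature))
    return trees


def predict_forest(trees, row):
    """Majority vote over the trees' predicted labels."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]


# Hypothetical two-feature dataset with two well-separated classes.
rows = [(1, 1), (2, 2), (3, 1), (10, 10), (11, 12), (12, 11)]
labels = [0, 0, 0, 1, 1, 1]
forest = train_forest(rows, labels)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (20, 20)))
```

Because every stump sees a different sample and feature, individual trees make different mistakes, and the vote averages many of them out.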
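Gradient-boosted trees (GBTs) are a popular classification and regression method that uses ensembles of decision trees. Unlike a random forest, which trains its trees independently, a GBT model trains trees one at a time, each new tree fitted to reduce the errors of the ensemble built so far.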
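The sequential fitting can be sketched in plain Python for binary labels with a logistic (log) loss, the loss spark.ml's `GBTClassifier` uses; that classifier supports binary labels only. Each round computes the residuals `y - p` (the negative gradient of the log loss with respect to the current score), fits a one-split regression tree to them, and adds a scaled copy of that tree to the ensemble score. The single-feature stumps, the `learning_rate` value, the helper names and the data are hypothetical simplifications; Spark's corresponding settings are `maxIter` (number of trees) and `stepSize`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_regression_stump(xs, targets):
    """One-split regression tree on a 1-D feature: each leaf predicts the mean target of its rows."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        if not right:
            continue
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    if best is None:  # all feature values identical: fall back to a single constant leaf
        mean = sum(targets) / len(targets)
        return lambda x: mean
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def train_gbt(xs, ys, num_trees=20, learning_rate=0.5):
    """Gradient boosting with log loss for labels in {0, 1}; returns a function giving P(y = 1 | x)."""
    trees = []

    def score(x):
        return sum(learning_rate * tree(x) for tree in trees)

    for _ in range(num_trees):
        # Negative gradient of the log loss w.r.t. the current score: y - sigmoid(score).
        residuals = [y - sigmoid(score(x)) for x, y in zip(xs, ys)]
        trees.append(fit_regression_stump(xs, residuals))
    return lambda x: sigmoid(score(x))


# Hypothetical one-feature dataset: small values are class 0, large values class 1.
predict_proba = train_gbt([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0, 0, 0, 1, 1, 1])
print(round(predict_proba(1.0), 3), round(predict_proba(6.0), 3))
```

A smaller learning rate makes each tree's correction more cautious, so more trees are needed but the ensemble tends to overfit less.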
The multilayer perceptron classifier (MLPC) is a classifier based on a feedforward artificial neural network. An MLPC consists of multiple layers of nodes, each layer fully connected to the next.
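The forward pass of such a network can be sketched in plain Python: each fully connected layer computes weighted sums of the previous layer's outputs plus a bias, hidden layers apply the sigmoid function, and the output layer applies softmax to produce class probabilities, which are the activations the spark.ml documentation describes for `MultilayerPerceptronClassifier`. The weights and helper names are hypothetical, and training (done in Spark with backpropagation) is omitted. In spark.ml the layer sizes are given by the `layers` parameter, for example `[4, 5, 4, 3]` for 4 inputs, two hidden layers and 3 classes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    top = max(scores)  # shift by the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: output k is sum_j weights[k][j] * inputs[j] + biases[k]."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def mlp_forward(features, layers):
    """layers is a list of (weights, biases); sigmoid on hidden layers, softmax on the output layer."""
    activations = features
    for i, (weights, biases) in enumerate(layers):
        z = dense(activations, weights, biases)
        is_output = i == len(layers) - 1
        activations = softmax(z) if is_output else [sigmoid(v) for v in z]
    return activations


# Hypothetical 2-3-2 network: 2 input features, 3 hidden nodes, 2 output classes.
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),  # input -> hidden
    ([[1.0, -1.0, 0.5], [-0.5, 1.0, 0.2]], [0.0, 0.0]),          # hidden -> output
]
print(mlp_forward([1.0, 2.0], network))
```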
A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks.
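For the linear case, the hyperplane is `w . x + b = 0`, and a point is classified by the side of it the point falls on. The plain-Python sketch below learns `w` and `b` by subgradient descent on the L2-regularized hinge loss, with labels encoded as -1 and +1. This optimizer is a swapped-in simplification: spark.ml's `LinearSVC` supports binary classification with a linear SVM and minimizes the hinge loss with OWL-QN. The learning rate, regularization strength, helper names and data are hypothetical.

```python
def decision_value(w, b, x):
    """Signed score w . x + b; its sign gives the predicted side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(xs, ys, epochs=200, lr=0.1, reg=0.01):
    """Subgradient descent on reg/2 * ||w||^2 + hinge loss; ys must be -1 or +1."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            if y * decision_value(w, b, x) < 1:
                # Inside the margin or misclassified: the hinge term contributes -y * x (and -y for b).
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Outside the margin: only the regularization term shrinks w.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical linearly separable data, labels encoded as -1 / +1.
xs = [(-2.0, -1.0), (-1.0, -2.0), (-2.0, -2.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
ys = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(xs, ys)
print([predict(w, b, x) for x in xs])
```

The hinge loss is zero for points classified correctly with a margin of at least 1, so only points near or across the boundary (the support vectors) move the hyperplane.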
Naive Bayes classifiers are a family of simple probabilistic, multiclass classifiers based on applying Bayes’ theorem with strong (naive) independence assumptions between every pair of features.
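Under the independence assumption, a class's score is its log prior plus a sum of per-feature log-likelihoods. The plain-Python sketch below implements multinomial naive Bayes on count features with additive (Laplace) smoothing, matching the defaults of spark.ml's `NaiveBayes` (`modelType="multinomial"`, `smoothing=1.0`); it is an illustration rather than Spark's code, and the function names and toy counts are hypothetical.

```python
import math
from collections import defaultdict


def train_multinomial_nb(docs, labels, smoothing=1.0):
    """docs are feature-count vectors; returns {class: (log prior, [log P(feature j | class)])}."""
    num_features = len(docs[0])
    class_counts = defaultdict(int)
    feature_totals = defaultdict(lambda: [0.0] * num_features)
    for x, y in zip(docs, labels):
        class_counts[y] += 1
        for j, v in enumerate(x):
            feature_totals[y][j] += v
    model = {}
    for c, count in class_counts.items():
        totals = feature_totals[c]
        denom = sum(totals) + smoothing * num_features  # additive smoothing avoids log(0)
        log_likelihoods = [math.log((t + smoothing) / denom) for t in totals]
        model[c] = (math.log(count / len(docs)), log_likelihoods)
    return model


def predict_nb(model, x):
    """Pick the class maximising log P(c) + sum_j x_j * log P(feature j | c)."""
    def score(c):
        log_prior, log_likelihoods = model[c]
        return log_prior + sum(v * lp for v, lp in zip(x, log_likelihoods))
    return max(model, key=score)


# Hypothetical word-count vectors over a three-word vocabulary.
docs = [[3, 0, 1], [2, 0, 0], [0, 3, 1], [0, 2, 2]]
labels = ["spam", "spam", "ham", "ham"]
model = train_multinomial_nb(docs, labels)
print(predict_nb(model, [4, 0, 0]), predict_nb(model, [0, 4, 0]))
```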