Learning Algorithms

melanie edited this page Mar 9, 2018 · 3 revisions

Supervised Learning

Concepts

Supervised learning is the machine learning task of learning a function that maps from input variables to output variables. The target result is an approximate mapping function that generates predictions from new input data. It is called "supervised" because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process. Generally, it is known as learning from data that comes with labels.

Supervised learning is mainly used for regression and classification problems.

Approaches

  • Linear Regression
  • Logistic Regression
  • Support Vector Machine
  • Naive Bayes
  • Decision Trees
  • Random Forest
  • Neural Networks
    • Convolutional Neural Network
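As a minimal illustration of the supervised setting, the sketch below fits a linear regression with NumPy's least-squares solver; the dataset and the true weights are invented for the example:

```python
import numpy as np

# Toy labeled dataset: inputs X with a known linear mapping to targets y
# (the data and true weights are invented for this example).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                   # 100 samples, 2 features
true_w = np.array([2.0, -1.0])
y = X @ true_w + 0.01 * rng.normal(size=100)    # labels "supervise" the fit

# Learn the mapping function: ordinary least squares.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# The learned weights approximate the true mapping, so the model can
# generate predictions for new, unseen input data.
x_new = np.array([1.0, 1.0])
prediction = x_new @ w
```

Classification works the same way, except the labels are discrete classes rather than continuous targets.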

Unsupervised Learning

Concepts

Unsupervised learning is the machine learning task of inferring a function that describes hidden patterns in the input data without corresponding labels. The target result is the underlying structure or distribution of the data, modeled in order to learn more about the data. It is called "unsupervised" since there is no teacher supervising the learning process.

Unsupervised learning is mainly used for clustering and association problems.

Approaches

  • K-means Clustering
  • Principal Component Analysis
  • Singular Value Decomposition
  • Independent Component Analysis
  • Nonnegative Matrix Factorization
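As a sketch of the unsupervised setting, the example below runs a bare-bones K-means (Lloyd's algorithm) with NumPy on made-up unlabeled data; the farthest-point initialization is a simplification chosen to keep the sketch deterministic:

```python
import numpy as np

# Two well-separated 2-D clusters; the points carry no labels.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(3.0, 0.3, (50, 2))])

k = 2
# Simple deterministic initialization: the first point, then the point
# farthest from it.
centers = np.array([X[0], X[np.argmax(((X - X[0]) ** 2).sum(axis=1))]])

for _ in range(20):
    # Assignment step: each point joins its nearest center.
    labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=-1), axis=1)
    # Update step: each center moves to the mean of its assigned points.
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
```

The algorithm recovers the hidden cluster structure from the data alone, which is the point of the "no labels" setting.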

Non-Negative Matrix Factorization

Concepts

Matrix factorization finds two or more matrix factors whose product is a good approximation to the original matrix. In practice, the dimensions of the factor matrices are usually much smaller than those of the original matrix, which yields a compact representation of the data points and facilitates other learning tasks (e.g., clustering and classification).

Non-negative Matrix Factorization (NMF) adds the constraint that the factor matrices be nonnegative (i.e., all elements must be greater than or equal to zero). The nonnegativity constraint leads to a parts-based representation of the object, in the sense that it allows only additive combinations of the original data. NMF is an ideal dimension-reduction algorithm for image processing, face recognition, and document clustering, where it is natural to consider the object as a combination of parts forming a whole.

Characteristics
  • As an unsupervised learning algorithm (unlabeled data), plain NMF cannot exploit the limited knowledge that domain experts can often provide for real-world problems.
  • Since a small set of labeled data is relatively inexpensive and combining it with unlabeled data improves accuracy, extending NMF to semi-supervised learning has great practical value.
Theories
  • Since the factorization of a matrix is non-unique and can be performed in many ways, methods that incorporate different constraints have been developed.
    • PCA/SVD: decomposes the matrix as a linear combination of principal components (via eigenvalue decomposition)
    • NMF: enforces the constraint that the elements of the factor matrices must be nonnegative.
  • Suppose we have n data points, each of which is m-dimensional and represented by a vector. Placing the vectors as columns, the dataset is represented by a matrix X. NMF then aims to find two nonnegative matrix factors U and V whose product approximates the original matrix:

    X ≈ UV,  where U is m×k, V is k×n, U ≥ 0, V ≥ 0, and k is much smaller than m and n
  • The approximation is quantified by a cost function, which can be constructed from some distance measure.
    • Frobenius norm - the square of the Euclidean distance between the two matrices. The goal of NMF is then to solve:

      min ||X - UV||_F^2  subject to  U ≥ 0, V ≥ 0
    • Divergence of X from Y, where Y is the product UV:

      D(X || Y) = Σ_ij ( X_ij log(X_ij / Y_ij) - X_ij + Y_ij )

      This measure is not symmetric: the divergence of X from Y is not necessarily the same as the divergence of Y from X.
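As a concrete sketch of the factorization and both cost measures, the example below runs Lee and Seung's standard multiplicative updates for the Frobenius objective on a made-up nonnegative matrix:

```python
import numpy as np

# Toy nonnegative data matrix of exact rank k (invented for the example).
rng = np.random.default_rng(2)
m, n, k = 20, 30, 5
X = rng.random((m, k)) @ rng.random((k, n))

# Lee & Seung multiplicative updates for the Frobenius objective
#   min ||X - UV||_F^2  subject to U >= 0, V >= 0.
# Multiplicative updates keep U and V nonnegative at every step.
U = rng.random((m, k))
V = rng.random((k, n))
eps = 1e-12  # guards against division by zero
for _ in range(500):
    U *= (X @ V.T) / (U @ V @ V.T + eps)
    V *= (U.T @ X) / (U.T @ U @ V + eps)

# Frobenius cost: squared Euclidean distance between X and UV.
frob = np.linalg.norm(X - U @ V) ** 2

# Divergence of X from Y = UV (not symmetric in X and Y).
Y = U @ V + eps
div = np.sum(X * np.log((X + eps) / Y) - X + Y)
```

With k much smaller than m and n, the 20×5 and 5×30 factors give a far more compact representation than the original 20×30 matrix.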

Semi-supervised Learning

Concepts

Semi-supervised learning is the machine learning task of learning from a large amount of input data of which only a small amount is labeled. Semi-supervised settings are common with real-world data, since unlabeled data is cheap and easy to collect and store.

Approaches

  • Constrained Nonnegative Matrix Factorization

Constrained Nonnegative Matrix Factorization

  • Takes the label information as additional hard constraints
  • The data points from the same class are merged together in the new representation space, so the obtained representation is consistent with the labels of the original data and has more discriminating power.
  • Parameter free: avoids the cost of tuning parameters to get the best result
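A minimal sketch of how the hard label constraints can be encoded, assuming the common CNMF formulation in which the representation is written as V = AZ with A a label-indicator matrix (the data sizes and labels here are invented):

```python
import numpy as np

# Hypothetical setup: 6 data points, the first 3 labeled (classes 0, 1, 0),
# the rest unlabeled.
labels = [0, 1, 0]          # labels of the first l points
n, l, c = 6, 3, 2           # points, labeled points, classes

# Class-indicator block for the labeled points.
C = np.zeros((l, c))
C[np.arange(l), labels] = 1

# A = [[C, 0], [0, I]]: labeled points of the same class share a row of the
# auxiliary factor Z, while unlabeled points keep free rows.
A = np.zeros((n, c + n - l))
A[:l, :c] = C
A[l:, c:] = np.eye(n - l)

# Writing the representation as V = A @ Z forces same-class labeled points
# to have identical representations, whatever Z the factorization learns.
Z = np.random.default_rng(3).random((c + n - l, 4))
V = A @ Z
```

Because the constraint is built into the structure of A rather than into a penalty term, there is no trade-off weight to tune, which is what "parameter free" means above.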

Comparison: NMF vs CNMF

| NMF | CNMF |
| --- | --- |
| unsupervised learning | semi-supervised learning |
| does not incorporate the label information | takes the label information as constraints |
| no way to directly obtain a solution | parameter free; no cost of tuning parameters |