Neural-Networks

Neural networks are complex architectures that use the concepts of feedforward and backpropagation in order to optimize prediction: inference is performed using weights, which are varied based on backpropagation.

Feedforward

The weights are randomly initialized in the beginning, and the inputs are fed into the model. After a series of matrix operations, an output is obtained. This part of the model is called feedforward. Feedforward uses the weights to make a prediction, which can then be compared with the actual output in order to adjust the weights and bias accordingly.
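As a minimal sketch of one feedforward pass (assuming NumPy and a sigmoid activation, neither of which is specified above; all sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

n = 4                            # number of input features (arbitrary)
rng = np.random.default_rng(0)
W = rng.standard_normal((1, n))  # randomly initialized weights, [1 x n]
b = 0.0                          # scalar bias for a single output
x = rng.standard_normal((n, 1))  # one input sample, [n x 1]

y = sigmoid(W @ x + b)           # feedforward: weighted sum, then activation
print(y.shape)                   # (1, 1) -- a single predicted probability
```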

Backpropagation

The output from the model is compared with the actual output, and based on the difference, measured using a chosen metric, the weights and bias are changed using what is known as gradient descent. Gradient descent aims to make the model weights converge to values that give the least possible error in the metric used to compare the model predictions with the desired output.
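Concretely, each gradient descent step applies an update of the following form, where the learning rate η is introduced here for illustration (it is not part of the text above):

$$
w \leftarrow w - \eta \frac{\partial C}{\partial w}, \qquad b \leftarrow b - \eta \frac{\partial C}{\partial b}
$$

Here C is the cost measuring the difference between predictions and actual outputs, and η > 0 controls the step size.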

Logistic Regression with a single output

In a logistic regression model described by n features and one output, as shown below, each of the features is weighted and summed, and the weighted sum is then passed through an activation function that gives the probability of the input belonging to a specific class.

(Figure: a logistic regression model with n inputs and a single output.)

The expression for the output can be obtained as,

$$
y_i = \sigma\left(w_1 x_{1i} + w_2 x_{2i} + \dots + w_n x_{ni} + b\right)
$$

where the subscript 'i' represents the index of the sample that is passed through the model, since the model has to train on several samples to learn, and σ is the activation function. The weights and the input can be written in the form of matrices in order to make the expression simpler,

$$
W = \begin{bmatrix} w_1 & w_2 & \dots & w_n \end{bmatrix},
\qquad
X_i = \begin{bmatrix} x_{1i} \\ x_{2i} \\ \vdots \\ x_{ni} \end{bmatrix}
$$

The bias is a single scalar, since the number of outputs is 1. The matrix multiplication of the weights and inputs matrices gives a matrix with a single element, which is added to the bias and then passed through the activation function to get the output. The output can therefore be expressed as,

$$
y_i = \sigma\left(W X_i + b\right)
$$

If several samples of data are passed through the model, the input matrix needs to be extended to include all of them, and the output can be represented as a matrix consisting of the outputs of the different samples. If the number of samples is r, the input matrix can be represented as,

$$
X = \begin{bmatrix}
x_{11} & x_{12} & \dots & x_{1r} \\
x_{21} & x_{22} & \dots & x_{2r} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \dots & x_{nr}
\end{bmatrix}
$$

which is a matrix with dimensions [n x r], with each column representing one sample of data, and each row representing the index of the input. Similarly, the output matrix can be represented as,

$$
Y = \begin{bmatrix} y_1 & y_2 & \dots & y_r \end{bmatrix}
$$

which is a matrix with dimensions [1 x r], with each column representing the output of one sample of data. The weight matrix remains unchanged, as the same set of weights is used for the different samples. The bias matrix, however, is broadcast in order to have the same dimensions as the output,

$$
B = \begin{bmatrix} b & b & \dots & b \end{bmatrix}
$$

which is a matrix with dimensions [1 x r], with each column having the same value b, as the same bias is used for the different samples of data. Now, the output matrix can be expressed in terms of the weights matrix, the inputs matrix and the bias matrix as,

$$
Y = \sigma\left(W X + B\right)
$$

Dimensionally speaking, the matrix multiplication of the weights and inputs matrices gives a matrix of dimensions [1 x r]; the bias matrix, which has the same dimensions, is added to it; and the activation function is applied to each element individually to give the output matrix with dimensions [1 x r]. This is the vectorization of the model when the number of outputs is 1. When the number of outputs is more than 1, the outputs matrix has to be extended by increasing the number of rows, with each row corresponding to a distinct output of the model.
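A minimal NumPy sketch of this vectorized forward pass (the sigmoid activation and the sizes n and r are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n, r = 4, 10                     # n features, r samples (arbitrary)
rng = np.random.default_rng(0)
W = rng.standard_normal((1, n))  # weights, [1 x n]
X = rng.standard_normal((n, r))  # inputs, one sample per column, [n x r]
b = 0.5                          # scalar bias; NumPy broadcasts it to [1 x r]

Y = sigmoid(W @ X + b)           # vectorized feedforward over all samples
print(Y.shape)                   # (1, 10), i.e. [1 x r]
```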

Logistic Regression with multiple outputs

The output matrix when the number of outputs of the model is more than 1, say m, is given by,

$$
Y = \begin{bmatrix}
y_{11} & y_{12} & \dots & y_{1r} \\
y_{21} & y_{22} & \dots & y_{2r} \\
\vdots & \vdots & \ddots & \vdots \\
y_{m1} & y_{m2} & \dots & y_{mr}
\end{bmatrix}
$$

where each column represents the outputs for a distinct sample of data, and each row represents the index of the output of the model. The weights matrix needs to be extended, as a different set of weights is used for each output of the model, and the bias matrix needs to be extended, as a different bias is used for each output, while the inputs matrix remains the same.

The weights matrix is given by a matrix with dimensions [m x n], where each row corresponds to a set of weights that is used to obtain a distinct output of the model, and each column corresponds to the set of weights that is used on a specific input index.

$$
W = \begin{bmatrix}
w_{11} & w_{12} & \dots & w_{1n} \\
w_{21} & w_{22} & \dots & w_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
w_{m1} & w_{m2} & \dots & w_{mn}
\end{bmatrix}
$$

The bias matrix is given by a matrix with dimensions [m x r], where each row corresponds to the bias used to obtain a distinct output, and each column corresponds to a sample; within a row, the value is the same for all samples, since the same bias is used irrespective of the sample,

$$
B = \begin{bmatrix}
b_1 & b_1 & \dots & b_1 \\
b_2 & b_2 & \dots & b_2 \\
\vdots & \vdots & \ddots & \vdots \\
b_m & b_m & \dots & b_m
\end{bmatrix}
$$

The output matrix of the model can be expressed in terms of the weights matrix, inputs matrix and the bias matrix as,

$$
Y = \sigma\left(W X + B\right)
$$
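Extending the earlier sketch to m outputs (again with a sigmoid activation and arbitrary sizes as assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

m, n, r = 3, 4, 10               # m outputs, n features, r samples (arbitrary)
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))  # one row of weights per output, [m x n]
X = rng.standard_normal((n, r))  # inputs, one sample per column, [n x r]
b = rng.standard_normal((m, 1))  # one bias per output; broadcast to [m x r]

Y = sigmoid(W @ X + b)           # [m x n] @ [n x r] + [m x r] -> [m x r]
print(Y.shape)                   # (3, 10), i.e. [m x r]
```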

Backpropagation through a Logistic Regression Model

Let the cost function be denoted by C. The objective is to find the variation of the cost with respect to the weights. The cost is calculated from the actual value of the output to be predicted and the value predicted by the model.

Let the weighted sum of the inputs be given by z and the activated value of z be given by y. Therefore, the output of the model is y.

$$
\frac{\partial C}{\partial w_k} = \frac{\partial C}{\partial y} \cdot \frac{\partial y}{\partial z} \cdot \frac{\partial z}{\partial w_k}
$$

The above equation follows from the chain rule of derivatives. This expression has three terms,

  1. Derivative of the cost with respect to the output
  2. Derivative of the output with respect to the weighted sum of the inputs
  3. Derivative of the weighted sum of the inputs with respect to the weight

Finding the first term

If the cost function taken is the Mean Square Error function,

$$
C = \frac{1}{r} \sum_{i=1}^{r} \left(y_i - \hat{y}_i\right)^2
$$

which is the sum of the squares of the differences between the predicted values $y_i$ and the actual values $\hat{y}_i$, averaged over all r samples.
The derivative of the cost function with respect to the output is given by,

$$
\frac{\partial C}{\partial y_i} = 2\left(y_i - \hat{y}_i\right)
$$

(per sample; the 1/r averaging over samples is applied at the end, when the three terms are combined)
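A small sketch of this cost and its per-sample derivative (assuming NumPy; Y_hat holds the actual values):

```python
import numpy as np

def mse(Y, Y_hat):
    # Mean squared error, averaged over all r samples.
    return np.mean((Y - Y_hat) ** 2)

def dcost_dy(Y, Y_hat):
    # Per-sample derivative 2 * (y_i - y_hat_i); the 1/r averaging is
    # applied later, when all the chain-rule terms are combined.
    return 2.0 * (Y - Y_hat)
```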

Finding the second term

If the activation function taken is represented by σ, the value of y in terms of z is given by,

$$
y = \sigma(z)
$$

The derivative of the output (y) in terms of the weighted sum of the inputs (z) is given by,

$$
\frac{dy}{dz} = \sigma'(z)
$$
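For example, if σ is taken to be the sigmoid function (an assumption; the derivation here keeps the activation generic), this derivative has a convenient closed form:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = \sigma(z)\left(1 - \sigma(z)\right)
$$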

Finding the third term

The weighted sum of the inputs (z) in terms of the weights (w) is given by,

$$
z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b
$$

The derivative of the weighted sum of inputs (z) with respect to a specific weight is given by,

$$
\frac{\partial z}{\partial w_k} = x_k
$$

Therefore, the derivative of the weighted sum of the inputs with respect to a specific weight is the input that is multiplied by that weight in the expression for the weighted sum.
Combining the three terms, and averaging over all the samples,

$$
\frac{\partial C}{\partial w_k} = \frac{1}{r} \sum_{i=1}^{r} 2\left(y_i - \hat{y}_i\right) \sigma'(z_i)\, x_{ki}
$$

In order to find the variation of the cost with respect to the bias,

$$
\frac{\partial C}{\partial b} = \frac{\partial C}{\partial y} \cdot \frac{\partial y}{\partial z} \cdot \frac{\partial z}{\partial b}
$$

The first two terms are the same as those in the derivative of the cost with respect to the weights. In order to find the third term,

$$
\frac{\partial z}{\partial b} = 1
$$

Combining the three terms, and averaging over all the samples,

$$
\frac{\partial C}{\partial b} = \frac{1}{r} \sum_{i=1}^{r} 2\left(y_i - \hat{y}_i\right) \sigma'(z_i)
$$

The variations of the cost with respect to the weights and the bias are therefore given by,

$$
\frac{\partial C}{\partial w_k} = \frac{1}{r} \sum_{i=1}^{r} 2\left(y_i - \hat{y}_i\right) \sigma'(z_i)\, x_{ki},
\qquad
\frac{\partial C}{\partial b} = \frac{1}{r} \sum_{i=1}^{r} 2\left(y_i - \hat{y}_i\right) \sigma'(z_i)
$$

Using these two equations, the gradient matrix of the weights and the bias can be found.

$$
\frac{\partial C}{\partial W} = \frac{1}{r} \left[2\left(Y - \hat{Y}\right) \odot \sigma'(Z)\right] X^{T},
\qquad
\frac{\partial C}{\partial b} = \frac{1}{r} \sum_{i=1}^{r} \left[2\left(Y - \hat{Y}\right) \odot \sigma'(Z)\right]_i
$$

where $\odot$ denotes element-wise multiplication, $Z = WX + B$, and $\hat{Y}$ is the matrix of actual outputs; the sum for the bias runs over the columns (samples).
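Putting the pieces together, a minimal training-step sketch under the assumptions above (sigmoid activation, mean squared error cost; the data, sizes and learning rate are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

n, r = 4, 10                         # n features, r samples (arbitrary)
rng = np.random.default_rng(0)
W = rng.standard_normal((1, n))      # weights, [1 x n]
b = 0.0                              # scalar bias
X = rng.standard_normal((n, r))      # inputs, [n x r]
Y_hat = rng.integers(0, 2, (1, r))   # actual outputs, [1 x r]
eta = 0.1                            # learning rate (illustrative)

for step in range(1000):
    # Feedforward
    Z = W @ X + b                    # weighted sums, [1 x r]
    Y = sigmoid(Z)                   # predictions, [1 x r]

    # Backpropagation: dC/dZ = 2 * (Y - Y_hat) * sigma'(Z)
    dZ = 2.0 * (Y - Y_hat) * sigmoid_prime(Z)
    dW = (dZ @ X.T) / r              # gradient matrix of the weights, [1 x n]
    db = dZ.sum() / r                # gradient of the bias (scalar)

    # Gradient descent update
    W -= eta * dW
    b -= eta * db
```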
