Deep Learning
Weights
• Weights play an important role in a neural network; every node/neuron has its own set of
weights.
• Neural networks learn through their weights: by adjusting the weights, the network
decides whether certain features are important or not.
Hidden Layer
• These lie between the input layer and the output layer.
• In this layer, each neuron takes in a set of weighted inputs and produces an output with
the help of an activation function.
• In this step the activation function is applied; these neurons apply different
transformations to the input data (a small sketch of a single hidden neuron follows this list).
• There are many activation functions used in DL; some of them are ReLU, the Threshold
Function, and Sigmoid.
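As a rough sketch of what a single hidden neuron does, the snippet below computes a weighted sum of its inputs plus a bias and passes the result through a sigmoid activation. The input values, weights, and bias are made-up numbers chosen only for illustration.

import numpy as np

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# made-up example: three inputs feeding one hidden neuron
x = np.array([0.5, -1.0, 2.0])   # inputs from the previous layer
w = np.array([0.1, 0.4, -0.3])   # one weight per input
b = 0.2                          # bias of the neuron

z = np.dot(w, x) + b             # weighted sum of inputs plus bias
a = sigmoid(z)                   # output of the neuron after activation
print(a)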
Output Layer
• This is the last layer in the neural network and receives input from the last hidden layer.
Its output can be
• Continuous (e.g., a stock price)
• Binary (0 or 1)
• Categorical (Cat or Dog or Duck)
There are two phases in the Neural Network cycle, one is the training phase and the other is
the prediction phase.
• The process of finding the weight and bias values occurs in the training phase.
• The process where the neural network processes input to produce predictions comes
under the prediction phase.
• Consider the learning process of a neural network as an iterative cycle of a pass and a
return.
• The pass is the Forward Propagation of information and the return is the Backward
Propagation of information.
• In Forward Propagation, given some input data, we compute the dot product of the input
values with the assigned weights, add the bias, and apply the activation function to the
result in the hidden layer.
• The outputs of these neurons act as inputs for the next layer. This is repeated until we get
the final output vector y.
• The obtained output value is known as the predicted value.
• We compare the predicted value with the actual value; the difference between them is the
error, which is measured by the Cost Function.
• The Loss Function is inversely related to accuracy: the smaller the loss, the higher the
accuracy, so our goal is to minimize the loss function.
• A common formula for the loss function is the squared error, C = ½ (ŷ − y)², where ŷ is the
predicted value and y is the actual value.
• After calculating the Loss Function, we feed this information back into the neural network,
where it travels backwards through the weights and the weights are updated; this method is
called Back Propagation (an end-to-end sketch follows this list).
• This process is repeated many times so that the network learns suitable weights for the
features.
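The snippet below ties these steps together for a single training example on a tiny one-hidden-layer network: forward propagation, a squared-error loss, back propagation, and one weight update. All of the weights, inputs, the choice of loss, and the learning rate are illustrative assumptions, not values taken from these notes.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# made-up example: 2 inputs -> 2 hidden neurons -> 1 output
x = np.array([1.0, 0.5])                  # input vector
y = 1.0                                   # actual (target) value
W1 = np.array([[0.2, -0.4], [0.7, 0.1]])  # hidden-layer weights (assumed)
b1 = np.array([0.0, 0.1])
W2 = np.array([0.5, -0.3])                # output-layer weights (assumed)
b2 = 0.05
lr = 0.1                                  # learning rate (assumed)

# forward propagation: weighted sum + activation, layer by layer
h = sigmoid(W1 @ x + b1)                  # hidden activations
y_hat = sigmoid(W2 @ h + b2)              # predicted value

# loss compares the predicted value with the actual value
loss = 0.5 * (y_hat - y) ** 2

# back propagation: gradients of the loss w.r.t. the weights
d_out = (y_hat - y) * y_hat * (1 - y_hat)   # error signal at the output
grad_W2 = d_out * h
d_hidden = d_out * W2 * h * (1 - h)         # error signal at the hidden layer
grad_W1 = np.outer(d_hidden, x)

# one weight update (bias updates omitted for brevity)
W2 -= lr * grad_W2
W1 -= lr * grad_W1
print(loss, y_hat)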
Gradient Descent
• It is an optimization technique that is used to improve the neural network-based
models by minimizing the cost function.
• This process occurs in the backpropagation step.
• It allows us to adjust the weights of the features in order to reach the global minimum.
• A global minimum is a point where the function value is smaller than at all other
feasible points.
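A minimal sketch of gradient descent on an assumed one-dimensional cost function, cost(w) = (w − 3)², whose global minimum sits at w = 3; the starting weight and learning rate are arbitrary choices for illustration.

def cost(w):
    # assumed toy cost function with its global minimum at w = 3
    return (w - 3.0) ** 2

def grad(w):
    # derivative of the cost with respect to the weight
    return 2.0 * (w - 3.0)

w = 0.0       # initial weight (assumed)
lr = 0.1      # learning rate (assumed)
for step in range(50):
    w -= lr * grad(w)     # move the weight against the gradient
print(w, cost(w))         # w ends up close to 3, where the cost is minimal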
Activation Function:
• It’s just a function that is used to get the output of a node. It is also known as a Transfer
Function.
• It is used to determine the output of a neural network, like yes or no. It maps the resulting
values into a range such as 0 to 1 or -1 to 1 (depending upon the function).
• The Activation Functions can be basically divided into 2 types-
Linear Activation Function
Non-linear Activation Functions
Linear or Identity Activation Function
• The function is a line (linear); the output of the function is not confined to any range.
Equation : f(x) = x
Range : (-infinity to infinity)
Non-linear Activation Functions
• Non-linearity makes it easy for the model to generalize or adapt to a variety of data and to
differentiate between the outputs.
• The main terminologies needed to understand nonlinear functions are:
• Derivative or Differential: Change in y-axis w.r.t. change in x-axis. It is also known as
slope.
• Monotonic function: A function which is either entirely non-increasing or non-
decreasing.
• The Nonlinear Activation Functions are mainly divided on the basis of their range or
curves-
1. Sigmoid or Logistic Activation Function
• The Sigmoid Function curve looks like an S-shape.
• The sigmoid function is especially used for models where we have to predict a probability
as the output, since the probability of anything exists only in the range of 0 to 1.
• The softmax function is a more generalized logistic activation function which is used
for multiclass classification.
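A small sketch of the sigmoid and softmax functions in NumPy; the example inputs are arbitrary.

import numpy as np

def sigmoid(z):
    # maps any real value into (0, 1), so it can be read as a probability
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # generalization of the sigmoid for multiclass classification:
    # the outputs are non-negative and sum to 1
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

print(sigmoid(0.0))                        # 0.5
print(softmax(np.array([2.0, 1.0, 0.1])))  # three class probabilities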
2. Tanh or hyperbolic tangent Activation Function
• tanh is also like logistic sigmoid but better.
• The range of the tanh function is from (-1 to 1).
• tanh is also sigmoidal (s - shaped).
• The advantage is that the negative inputs will be mapped strongly negative and the
zero inputs will be mapped near zero in the tanh graph.
• The tanh function is mainly used for classification between two classes.
• Both tanh and logistic sigmoid activation functions are used in feed-forward nets.
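A quick illustration of the tanh mapping described above, using arbitrary example inputs.

import numpy as np

# tanh maps inputs into (-1, 1): strongly negative inputs go near -1,
# zero inputs stay near 0, and strongly positive inputs go near +1
z = np.array([-3.0, 0.0, 3.0])
print(np.tanh(z))   # approximately [-0.995, 0.0, 0.995]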
3. ReLU (Rectified Linear Unit) Activation Function
• The ReLU is the most used activation function.
• It is used in almost all convolutional neural networks and other deep learning models.
• The ReLU is half rectified (from the bottom).
• f(z) is zero when z is less than zero and f(z) is equal to z when z is above or equal to
zero.
• Range: [ 0 to infinity)
• The function and its derivative both are monotonic.
• But the issue is that all the negative values become zero immediately, which decreases the
ability of the model to fit or train on the data properly.
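A minimal sketch of ReLU, showing how every negative input is mapped to zero; the example values are arbitrary.

import numpy as np

def relu(z):
    # f(z) = 0 for z < 0, and f(z) = z for z >= 0
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))   # [0. 0. 0. 1.5] -- every negative value becomes zero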
4. Leaky ReLU
• It is an attempt to solve the dying ReLU problem.
• The leak (f(z) = a·z for z < 0) helps to increase the range of the ReLU function. Usually, the
value of a is 0.01 or so.
• When a is not fixed at 0.01 but is chosen randomly, it is called Randomized ReLU.
• Therefore the range of the Leaky ReLU is (-infinity to infinity).
• Both Leaky and Randomized ReLU functions are monotonic and their derivatives are
also monotonic in nature.
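A minimal sketch of Leaky ReLU with the usual a = 0.01; the example values are arbitrary.

import numpy as np

def leaky_relu(z, a=0.01):
    # f(z) = z for z >= 0, and f(z) = a*z for z < 0 (a is the small leak)
    return np.where(z >= 0, z, a * z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(z))   # [-0.02 -0.005 0. 1.5] -- negatives are scaled, not zeroed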
Multi-layer neural network:
Multilayer Perceptron
• The Multilayer Perceptron is a neural network where the mapping between inputs
and output is non-linear.
• A Multilayer Perceptron has input and output layers, and one or more hidden layers
with many neurons stacked together.
• Each neuron must have an activation function, such as ReLU or sigmoid.
• The Multilayer Perceptron falls under the category of feedforward algorithms; each linear
combination is propagated to the next layer.
• Each layer feeds the next one with the result of its computation.
This goes all the way through the hidden layers to the output layer (see the feedforward
sketch after this list).
• Backpropagation is the learning mechanism that allows the Multilayer Perceptron to
iteratively adjust the weights in the network, with the goal of minimizing the cost
function.
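A minimal sketch of the feedforward idea, where each layer's output becomes the next layer's input. The layer sizes, the random weights, and the use of ReLU at every layer are assumptions made only for illustration.

import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
layer_sizes = [3, 4, 4, 1]                    # input, two hidden layers, output (assumed)
weights = [rng.standard_normal((m, n)) * 0.5  # one weight matrix per layer
           for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(m) for m in layer_sizes[1:]]

a = np.array([0.2, -0.7, 1.0])                # example input
for W, b in zip(weights, biases):
    a = relu(W @ a + b)                       # each layer feeds the next
print(a)                                      # final output vector y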
XOR Problem:
• The XOR problem is not linearly separable, thus a single perceptron cannot solve it.
• Let’s analyze how this MLP works.
• We assume here that all the neurons use the threshold (step) activation function (i.e., the
function whose value is 1 for all non-negative inputs, and 0 for all negative inputs).
• The top hidden neuron is connected only to the first input x₁ with a connection
weight of 1, and it has a bias of -1.
• Therefore, this neuron fires only when x₁ = 1 (in which case its net input is
1 × 1 + (-1) = 0, which is non-negative).
• For example, let’s compute the forward propagation of this MLP for the inputs x₁ = 1
and x₂ = 0.
• The activations of the hidden neurons in this case are such that only the top hidden
neuron fires.
• The activation of the output neuron is therefore 1, i.e., the output neuron fires in this case,
which is what we expect the output of XOR to be for the inputs x₁ = 1 and x₂ = 0.
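The snippet below reproduces this forward pass with the threshold activation. Only the top hidden neuron's weight and bias are given in these notes; the weights for the other two hidden neurons and for the output neuron are one standard completion of the XOR network, assumed here purely for illustration.

import numpy as np

def step(z):
    # threshold activation: 1 for non-negative input, 0 for negative input
    return (z >= 0).astype(float)

# only the first row (weight 1 on x1, bias -1) comes from these notes;
# the remaining weights are an assumed completion of the XOR network
W_hidden = np.array([[1.0, 0.0],    # top neuron: fires when x1 = 1
                     [1.0, 1.0],    # middle neuron (assumed): fires when x1 AND x2
                     [0.0, 1.0]])   # bottom neuron (assumed): fires when x2 = 1
b_hidden = np.array([-1.0, -2.0, -1.0])
w_out = np.array([1.0, -2.0, 1.0])  # output neuron weights (assumed)
b_out = -1.0

def xor_mlp(x1, x2):
    x = np.array([x1, x2], dtype=float)
    h = step(W_hidden @ x + b_hidden)   # hidden activations
    return step(w_out @ h + b_out)      # output activation

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_mlp(x1, x2))  # 1 only for (1, 0) and (0, 1)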