
Multi Layer Perceptron

Neural Network

• Humans have an ability to identify patterns within the accessible information with an astonishingly high degree of accuracy.
• Whenever you see a car or a bicycle you can immediately recognize what it is. This is because we have learned over a period of time what a car and a bicycle look like and what their distinguishing features are.
• Artificial neural networks are computation systems that intend to imitate human learning capabilities via a complex architecture that resembles the human nervous system.
Human Nervous System (diagram)
Human Nervous System

• The human nervous system consists of billions of neurons. These neurons collectively process input received from the sensory organs and decide how to react to it.
• A typical neuron in the human nervous system has three main parts: dendrites, a nucleus, and an axon.
– The information passed to a neuron is received by the dendrites.
– The nucleus is responsible for processing this information.
– The output of a neuron is passed to other neurons via the axon, which is connected to the dendrites of other neurons further down the network.
Perceptron

• A perceptron is a simple binary classification algorithm, proposed by Cornell scientist Frank Rosenblatt.
• It helps to divide a set of input signals into two parts: "yes" and "no".
• But unlike many other classification algorithms, the perceptron was modeled after the essential unit of the human brain, the neuron, and has an uncanny ability to learn and solve complex problems.
Perceptron (diagram)
Perceptron

• A perceptron is a very simple learning machine. It can take in a few inputs, each of which has a weight to signify how important it is, and generate an output decision of "0" or "1".
• However, when combined with many other perceptrons, it forms an artificial neural network.
• A neural network can, theoretically, answer any question, given enough training data and computing power.
Multilayer Perceptron

• A multilayer perceptron (MLP) is a perceptron that teams up with additional perceptrons, stacked in several layers, to solve complex problems.
• Each perceptron in the first layer on the left (the input layer) sends outputs to all the perceptrons in the second layer (the hidden layer), and all perceptrons in the second layer send outputs to the final layer on the right (the output layer).
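
To make this layered structure concrete, here is a minimal forward-pass sketch in plain Python/NumPy; the layer sizes, random weights, and sigmoid activation are assumptions for illustration, not taken from the slides.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Assumed sizes: 3 inputs, 4 hidden neurons, 1 output neuron.
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input -> hidden weights
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights

    def forward(x):
        hidden = sigmoid(x @ W1 + b1)        # every input feeds every hidden neuron
        return sigmoid(hidden @ W2 + b2)     # every hidden neuron feeds the output

    print(forward(np.array([1.0, 0.5, -0.2])))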
Multilayer Perceptron (diagram)
Multilayer Perceptron

• Each layer can have a large number of perceptrons, and there can be multiple layers, so the multilayer perceptron can quickly become a very complex system.
• The multilayer perceptron has another, more common name: a neural network.
• A three-layer MLP, like the diagram on the previous slide, is called a Non-Deep or Shallow Neural Network.
• An MLP with four or more layers is called a Deep Neural Network.
Multilayer Perceptron

• One difference between an MLP and a neural network is that in the classic perceptron, the decision function is a step function and the output is binary.
• In neural networks that evolved from MLPs, other activation functions can be used which result in outputs of real values, usually between 0 and 1 or between -1 and 1.
• This allows for probability-based predictions or classification of items into multiple labels.
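
The difference is easy to see in a small sketch comparing a step function's binary output with a sigmoid's real-valued output; the sample values below are made up for illustration.

    import numpy as np

    def step(z):                   # classic perceptron: hard threshold, binary output
        return np.where(z >= 0, 1, 0)

    def sigmoid(z):                # smooth alternative: real values in (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(step(z))                 # [0 0 1 1 1]
    print(sigmoid(z))              # values between 0 and 1, usable as probabilities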
Structure of a Perceptron (diagram)
The Perceptron Learning Process
1. Takes the inputs, multiplies them by their weights, and computes their sum
2. Adds a bias factor, the number 1 multiplied by a weight
3. Feeds the sum through the activation function
4. The result is the perceptron output
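
Putting the four steps together, here is a minimal single-perceptron sketch in plain Python; the input, weight, and bias values are invented for illustration.

    inputs  = [0.7, 0.3, 0.5]
    weights = [0.4, -0.2, 0.9]
    bias_weight = 0.1                # weight applied to the constant input 1

    # Step 1: multiply inputs by their weights and compute the sum.
    total = sum(x * w for x, w in zip(inputs, weights))
    # Step 2: add the bias factor, the number 1 multiplied by a weight.
    total += 1 * bias_weight
    # Step 3: feed the sum through the activation function (a step function here).
    output = 1 if total >= 0 else 0
    # Step 4: the result is the perceptron output.
    print(output)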
Step-1 Weighted Sum
• Takes the inputs, multiplies them by their weights, and computes their sum.
• Why It's Important?
– The weights allow the perceptron to evaluate the relative importance of each of the inputs.
– Neural network algorithms learn by discovering better and better weights that result in a more accurate prediction.
– There are several algorithms used to fine-tune the weights; the most common is called backpropagation.
Step-2 Neural Network Bias

• Adds a bias factor, the number 1 multiplied by a weight.
• This is a technical step that makes it possible to move the activation function curve up and down, or left and right, on the number graph.
• It makes it possible to fine-tune the numeric output of the perceptron.
Step-3 Activation Function

• Feeds the sum through the activation function.
• The activation function maps the input values to the required output values.
• For example, input values could be between 1 and 100, and outputs can be 0 or 1. The activation function also helps the perceptron to learn, when it is part of a multilayer perceptron (MLP).
• Certain properties of the activation function, especially its non-linear nature, make it possible to train complex neural networks.
Step-4 Output

• The perceptron output is a classification decision.
• In a multilayer perceptron, the output of one layer's perceptrons is the input of the next layer.
• The output of the final perceptrons, in the "output layer", is the final prediction of the perceptron learning model.
Transformation

• From the Classic Perceptron to a Full-Fledged Deep Neural Network
• Although multilayer perceptrons (MLP) and neural networks are essentially the same thing, you need to add a few ingredients before an MLP becomes a full neural network. These are:
– Backpropagation
– Hyperparameters
– Advanced structures
Backpropagation
• The backpropagation algorithm allows you to perform a "backward pass", which helps tune the weights of the inputs.
• Backpropagation performs iterative backward passes which attempt to minimize the "loss", or the difference between the known correct prediction and the actual model prediction.
• With each backward pass, the weights move towards an optimum that minimizes the loss function and results in the most accurate prediction.
Backpropagation
• Backpropagation is an algorithm commonly used to train neural networks.
• When the neural network is initialized, weights are set for its individual elements, called neurons.
• Inputs are loaded, they are passed through the network of neurons, and the network provides an output for each one, given the initial weights.
• Backpropagation helps to adjust the weights of the neurons so that the result comes closer and closer to the known true result.
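
To make the backward pass concrete, here is a minimal backpropagation sketch for a one-hidden-layer network with sigmoid activations and squared-error loss; the network size, sample data, and learning rate are assumptions for the example.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)   # 2 inputs -> 3 hidden neurons
    W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # 3 hidden -> 1 output
    x = np.array([[0.5, -1.0]])                     # one training sample
    y = np.array([[1.0]])                           # known correct prediction
    lr = 0.5

    for _ in range(100):
        # Forward pass
        h = sigmoid(x @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward pass: propagate the error back through the network
        d_out = (out - y) * out * (1 - out)         # gradient at the output
        d_h = (d_out @ W2.T) * h * (1 - h)          # gradient at the hidden layer
        # Move each weight a small step against its gradient
        W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * x.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

    print(out)   # moves closer and closer to the known true result, 1.0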
Backpropagation (diagram)
Hyperparameters

• In a modern neural network, aspects of the multilayer structure such as the number of layers, initial weights, the type of activation function, and details of the learning process are treated as hyperparameters and tuned to improve the performance of the neural network.
• Tuning hyperparameters is an art, and can have a huge impact on the performance of a neural network.
Model vs. Hyperparameters

• Model parameters are internal to the neural network – for example, neuron weights. They are estimated or learned automatically from training samples. These parameters are also used to make predictions in a production model.
• Hyperparameters are external parameters set by the operator of the neural network – for example, selecting which activation function to use or the batch size used in training.
• Hyperparameters have a huge impact on the accuracy of a neural network, there may be different optimal values for different models, and it is non-trivial to discover those values.
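
The distinction is visible in code. Below is a hedged Keras sketch (the framework choice and every value are assumptions): the operator sets the hyperparameters up front, while the weights inside the Dense layers are the model parameters learned from training samples.

    import tensorflow as tf

    # Hyperparameters: chosen by the operator before training.
    hidden_units = 32
    activation = "relu"
    learning_rate = 0.01
    batch_size = 20

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hidden_units, activation=activation, input_shape=(4,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])

    # Model parameters: the weights the network will learn from data.
    print(model.count_params())
    # model.fit(X_train, y_train, batch_size=batch_size, epochs=10)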
Hyperparameters of Neural N/W

• Number of hidden layers
• Dropout
• Neural network activation function
• Weights initialization
Hyperparameters of Neural N/W

• Number of hidden layers –
– adding more hidden layers of neurons generally improves accuracy, up to a certain limit which can differ depending on the problem.
• Dropout –
– what percentage of neurons should be randomly "killed" during each epoch to prevent overfitting.
• Neural network activation function –
– which function should be used to process the inputs flowing into each neuron. The activation function can impact the network's ability to converge and learn for different ranges of input values, and also its training speed.
Hyperparameters of Neural N/W

• Weights initialization –
– it is necessary to set initial weights for the first forward pass. Two basic options are to set weights to zero or to randomize them.
– However, this can result in a vanishing or exploding gradient, which will make it difficult to train the model.
– To mitigate this problem, you can use a heuristic (a formula tied to the number of neuron layers) to determine the weights.
– A common heuristic used for the Tanh activation is called Xavier initialization, sketched below.
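
A minimal NumPy sketch of Xavier (Glorot) initialization, which draws weights uniformly from a range tied to the layer's fan-in and fan-out; the layer sizes below are assumptions.

    import numpy as np

    def xavier_uniform(fan_in, fan_out, rng):
        # Draw weights uniformly from [-limit, +limit] with
        # limit = sqrt(6 / (fan_in + fan_out)), keeping activation and
        # gradient magnitudes stable across layers (well suited to Tanh).
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=(fan_in, fan_out))

    rng = np.random.default_rng(0)
    W = xavier_uniform(100, 50, rng)   # assumed layer sizes
    print(W.min(), W.max())            # bounded, non-zero spread instead of zeros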
Hyperparameters of training algo

• Neural network learning rate
• Deep learning epoch, iterations and batch size
• Optimizer algorithm and neural network momentum
Neural Network Learning Rate

• How fast the backpropagation algorithm performs gradient descent.
• A higher learning rate makes the network train faster but might result in missing the minimum of the loss function; a lower learning rate trains more slowly but more precisely.
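
A toy sketch of how the learning rate scales each gradient descent step; the quadratic loss and the two rates are invented for illustration.

    # Gradient descent on a toy loss: loss(w) = (w - 3)^2, minimum at w = 3.
    def gradient(w):
        return 2.0 * (w - 3.0)

    for lr in (0.1, 0.9):
        w = 0.0
        for _ in range(20):
            w -= lr * gradient(w)     # each step moves against the gradient
        print(lr, round(w, 4))        # the higher rate oscillates around the minimum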
Epoch, iterations, batch size

• Deep learning epoch, iterations and batch size – these parameters determine the rate at which samples are fed to the model for training.
• An epoch is one full pass of the training samples through the model (forward pass) and back through backpropagation (backward pass) to update the weights.
• If the epoch cannot be run all together, due to the size of the sample set or the complexity of the network, it is split into batches, and the epoch is run in two or more iterations.
• The number of epochs and batches per epoch can significantly affect model fit, as shown on the next slide.
Epoch, iterations, batch size (diagram)
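
The arithmetic linking epochs, batches and iterations, as a small sketch with assumed numbers: the iterations per epoch are the sample count divided by the batch size, rounded up.

    num_samples = 1000
    batch_size = 200
    iterations_per_epoch = -(-num_samples // batch_size)   # ceiling division -> 5

    for epoch in range(3):                        # 3 epochs over the full data set
        for i in range(iterations_per_epoch):     # 5 iterations per epoch
            batch = slice(i * batch_size, (i + 1) * batch_size)
            # forward pass + backward pass on samples[batch] would run here
            pass
    print(iterations_per_epoch)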
Optimizer Algorithm

• Optimizer algorithm and neural network momentum – when a neural network trains, it uses an algorithm to determine the optimal weights for the model, called an optimizer.
• The basic option is Stochastic Gradient Descent, but there are other options.
• Another common technique is Momentum, which adds a fraction of the previous weight update to the current one, so that updates accumulate speed in a consistent direction.
• This speeds up training, with a reduced risk of oscillation. Other algorithms are Nesterov Accelerated Gradient, AdaDelta and Adam.
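
A minimal sketch of SGD with momentum on the same kind of toy loss as before; the coefficients are assumptions. The velocity term carries a fraction of the previous update into the current one.

    def gradient(w):                   # toy quadratic loss: (w - 3)^2
        return 2.0 * (w - 3.0)

    w, velocity = 0.0, 0.0
    lr, mu = 0.1, 0.9                  # learning rate and momentum coefficient

    for _ in range(50):
        velocity = mu * velocity - lr * gradient(w)   # keep a fraction of the last update
        w += velocity                                 # apply the accumulated step
    print(round(w, 4))                 # heads towards the minimum at w = 3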
Hyperparameter Tuning Methods

• Manual Hyperparameter Tuning
• Grid Search
• Random Search
• Bayesian Optimization
Manual Tuning

• Traditionally, hyperparameters were tuned manually by trial and error.
• This is still commonly done, and experienced operators can "guess" parameter values that will achieve very high accuracy for deep learning models.
• However, there is a constant search for better, faster and more automatic methods to optimize hyperparameters.
• Pros: Very simple and effective with skilled operators
• Cons: Not scientific; unknown whether you have fully optimized hyperparameters
Grid Search

• Grid search is slightly more sophisticated than manual tuning. It involves systematically testing multiple values of each hyperparameter, by automatically retraining the model for each value of the parameter.
• For example, you can perform a grid search for the optimal batch size by automatically training the model for batch sizes between 10 and 100 samples, in steps of 20.
• The model will run 5 times, and the batch size selected will be the one which yields the highest accuracy.
• Pros: Maps out the problem space and provides more opportunity for optimization
• Cons: Can be slow to run for large numbers of hyperparameter values
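
A sketch of the batch-size grid search described above; train_and_score is a hypothetical stand-in for retraining the model and returning its validation accuracy.

    def train_and_score(batch_size):
        # Hypothetical stand-in: retrain the model, return validation accuracy.
        return 1.0 - abs(batch_size - 50) / 100.0     # fake score, peaks at 50

    candidates = range(10, 101, 20)                   # 10, 30, 50, 70, 90: five runs
    best = max(candidates, key=train_and_score)
    print(best)                                       # the size with the highest score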
Random Search

• According to a 2012 research study by James Bergstra and Yoshua Bengio, testing randomized values of hyperparameters is actually more effective than manual search or grid search.
• In other words, instead of testing systematically to cover "promising areas" of the problem space, it is preferable to test random values drawn from the entire problem space.
• Pros: According to the study, provides higher accuracy with fewer training cycles, for problems with high dimensionality
• Cons: Results are unintuitive; it is difficult to understand "why" hyperparameter values were chosen
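
The same idea with random draws instead of a grid; train_and_score is again a hypothetical stand-in, and the value ranges are assumptions.

    import random

    random.seed(0)

    def train_and_score(batch_size, learning_rate):
        # Hypothetical stand-in for retraining the model.
        return 1.0 - abs(batch_size - 50) / 100.0 - abs(learning_rate - 0.01)

    trials = [(random.randint(10, 100), random.uniform(0.001, 0.1))
              for _ in range(5)]          # random values from the whole space
    best = max(trials, key=lambda t: train_and_score(*t))
    print(best)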
Comparing (diagram)
Bayesian Optimization
• Bayesian optimization (described by Shahriari et al.) is a technique which tries to approximate the trained model with different possible hyperparameter values.
• To simplify, Bayesian optimization trains the model with different hyperparameter values, and observes the function generated for the model by each set of parameter values.
• It does this over and over again, each time selecting hyperparameter values that are slightly different and can help plot the next relevant segment of the problem space.
Bayesian Optimization
• Similar to sampling methods in statistics, the algorithm ends up with a list of possible hyperparameter value sets and model functions, from which it predicts the optimal function across the entire problem space.
• Pros: The original study and practical experience from the industry show that Bayesian optimization results in significantly higher accuracy compared to random search.
• Cons: Like random search, results are not intuitive and difficult to improve on, even by trained operators
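
As one possible concrete form (a tool the slides do not name), here is a sketch using scikit-optimize's gp_minimize, which fits a surrogate model over the trials observed so far and uses it to pick the next hyperparameter values; the objective is a hypothetical stand-in for model training.

    from skopt import gp_minimize        # scikit-optimize, assumed available

    def objective(params):               # hypothetical stand-in: lower is better
        batch_size, learning_rate = params
        return abs(batch_size - 50) / 100.0 + abs(learning_rate - 0.01)

    result = gp_minimize(objective,
                         dimensions=[(10, 100),        # batch size range
                                     (1e-4, 1e-1)],    # learning rate range
                         n_calls=15, random_state=0)
    print(result.x, result.fun)          # best hyperparameters found and their loss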
In the real world...

• In a real neural network project, you will have three practical options:
– Performing manual optimization
– Leveraging hyperparameter optimization techniques in the deep learning framework of your choice. The framework will report on hyperparameter values discovered, their accuracy and validation scores
– Using third-party hyperparameter optimization tools
Advanced Structures
• Many neural networks use a complex structure that builds on the multilayer perceptron.
• For example, a bidirectional Recurrent Neural Network (RNN) uses two neural networks in parallel: one runs the training data from beginning to end, the other from the end to the beginning, which helps with language processing.
• A Convolutional Neural Network (CNN) uses a three-dimensional MLP: essentially, three multilayer perceptron structures that learn the same data point.
• This is useful for color images, which have three layers of "depth": red, green and blue.
Neural Network in Real World

• In the real world, perceptrons work under the hood. You will run neural networks using deep learning frameworks such as TensorFlow, Keras, and PyTorch.
• These frameworks ask you for hyperparameters such as the number of layers, activation function, and type of neural network, and construct the network of perceptrons automatically.
• When you work on real, production-scale deep learning projects, you will find that the operations side of things can become a bit daunting:
Neural Network in Real World

• Running experiments at scale and tracking results, source code, metrics, and hyperparameters.
– To succeed at deep learning you need to run large numbers of experiments and manage them correctly to see what worked.
• Running experiments across multiple machines –
– in most cases neural networks are computationally intensive. To work efficiently, you'll need to run experiments on multiple machines. This requires provisioning these machines and distributing the work.
Neural Network in Real World

• Manage training data –
– The more training data you provide, the better the model will learn and perform.
– There are files to manage and copy to the training machines.
– If your model's input is multimedia, those files can weigh anywhere from gigabytes to petabytes.
Activation Function

• Neural network activation functions are a crucial component of deep learning.
• Activation functions determine the output of a deep learning model, its accuracy, and also the computational efficiency of training a model, which can make or break a large-scale neural network.
• Activation functions also have a major effect on the neural network's ability to converge and the convergence speed; in some cases, activation functions might prevent neural networks from converging in the first place.
Activation Function

• Activation functions are mathematical equations that determine the output of a neural network.
• The function is attached to each neuron in the network, and determines whether it should be activated ("fired") or not, based on whether each neuron's input is relevant for the model's prediction.
• Activation functions also help normalize the output of each neuron to a range between 0 and 1 or between -1 and 1.
Activation Function

• An additional aspect of activation functions is that they must be computationally efficient, because they are calculated across thousands or even millions of neurons for each data sample.
• Modern neural networks use a technique called backpropagation to train the model, which places an increased computational strain on the activation function and its derivative function.
Common Activation Functions (diagram)
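
As a plain-NumPy companion to the chart, here are a few common activation functions together with the derivatives that backpropagation needs; the sample inputs are arbitrary.

    import numpy as np

    def sigmoid(z):   return 1.0 / (1.0 + np.exp(-z))    # output in (0, 1)
    def d_sigmoid(z): s = sigmoid(z); return s * (1 - s)

    def tanh(z):      return np.tanh(z)                   # output in (-1, 1)
    def d_tanh(z):    return 1.0 - np.tanh(z) ** 2

    def relu(z):      return np.maximum(0.0, z)           # cheap to compute
    def d_relu(z):    return (z > 0).astype(float)

    z = np.linspace(-2.0, 2.0, 5)
    print(sigmoid(z), tanh(z), relu(z), sep="\n")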
ANN and DNN

• Artificial Neural Networks (ANN) are composed of a large number of simple elements, called neurons, each of which makes simple decisions. Together, the neurons can provide accurate answers to some complex problems, such as natural language processing, computer vision, and AI.
• A neural network can be "shallow", meaning it has an input layer of neurons, only one "hidden layer" that processes the inputs, and an output layer that provides the final output of the model.
• A Deep Neural Network (DNN) commonly has between 2 and 8 additional layers of neurons.
Non-Deep Feedforward Neural Network (diagram)
Deep Neural Network (diagram)
Useful resources

• https://missinglink.ai
• https://machinelearningmastery.com
• https://www.allaboutcircuits.com
• https://medium.com
