ANN MODULE 1 Part2

Uploaded by yaminisatish461

Perceptron

A perceptron is a fundamental building block of neural networks and artificial neural systems. It is a simple computational unit inspired by the way biological neurons work in the human brain. The perceptron is a linear classifier used to solve binary classification problems, where the goal is to separate two classes of data points by finding an optimal decision boundary.
1. The Rosenblatt perceptron model was designed by Rosenblatt in 1958 to overcome the issues of the McCulloch-Pitts neuron model.

✓ It can process non-Boolean inputs, and it assigns a different weight to each input automatically.

2. It is a single-layer network.

3. The Rosenblatt perceptron can be seen as a set of inputs that are weighted and to which we apply an activation function.

Here's a breakdown of the components:

Weighted Sum of Inputs: Each input feature is multiplied by its corresponding weight. The products of these multiplications are then summed up to calculate the weighted sum of inputs.

Bias: The bias is a constant term added to the weighted sum. It helps the perceptron account for cases where all input values are zero.

Activation Function: The activation function takes the weighted sum plus bias as input and determines the final output of the perceptron. The choice of activation function can impact the perceptron's ability to learn complex patterns.

The most common activation function used in perceptrons is the step function or the sign function. The step function outputs 1 if the weighted sum plus bias is greater than or equal to zero, and it outputs 0 otherwise. This essentially makes the perceptron act as a simple threshold-based binary classifier.
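The weighted sum, bias, and step activation described above can be sketched in a few lines. This is a minimal illustration; the weights, bias, and the AND task below are illustrative choices, not from the text.

```python
# Minimal sketch of the perceptron described above: a weighted sum of
# inputs plus a bias, passed through a step (threshold) activation.

def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum plus bias is >= 0, else 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum + bias >= 0 else 0

# Example: a perceptron computing logical AND of two binary inputs.
and_weights, and_bias = [1.0, 1.0], -1.5
print(perceptron([1, 1], and_weights, and_bias))  # → 1
print(perceptron([1, 0], and_weights, and_bias))  # → 0
```

With these weights the unit fires only when both inputs are 1, which is exactly a threshold-based binary classifier.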

Perceptrons can be used to build more complex neural networks by combining them in layers. However, a single perceptron can only learn linear decision boundaries, which limits its ability to handle complex data. To overcome this limitation, multi-layer neural networks with nonlinear activation functions (e.g., sigmoid, tanh, ReLU) are used, enabling them to learn more intricate relationships in the data.

Linear Separability
⚫ Linearly separable: the two classes can be divided by a single straight line (hyperplane), e.g., AND, OR.

⚫ Linearly inseparable: no single line separates the classes, e.g., XOR.

⚫ Solution? Multilayer Perceptron
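XOR is the classic linearly inseparable case: no single line separates {(0,1),(1,0)} from {(0,0),(1,1)}, yet one hidden layer solves it. The sketch below uses hand-chosen weights for illustration (hidden unit 1 acts as OR, hidden unit 2 as NAND, and the output unit as AND); the particular values are assumptions, not from the text.

```python
# A hand-wired multilayer perceptron of step units computing XOR.

def step(x):
    return 1 if x >= 0 else 0

def xor_mlp(x1, x2):
    h1 = step(x1 + x2 - 0.5)      # hidden unit 1: OR
    h2 = step(-x1 - x2 + 1.5)     # hidden unit 2: NAND
    return step(h1 + h2 - 1.5)    # output unit: AND of h1, h2 -> XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))  # output is 1 exactly when a != b
```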

Multilayer Perceptron (MLP)

[Figure: input signals (external stimuli) enter the input layer, pass through adjustable weights, and the output layer produces the output values.]
Types of Layers
• The input layer.
– Introduces input values into the network.
– No activation function or other processing.
• The hidden layer(s).
– Perform classification of features.
– Two hidden layers are sufficient to solve any problem.
– More features may imply that more layers are needed.
• The output layer.
– Functionally just like the hidden layers.
– Outputs are passed on to the world outside the neural network.
Multilayer perceptron with two hidden layers

[Figure: input signals enter the input layer, flow through the first and second hidden layers, and leave the output layer as output signals.]
What does the middle layer hide?
◆ A hidden layer “hides” its desired output.
◆ Neurons in the hidden layer cannot be observed
through the input/output behaviour of the network.
◆ There is no obvious way to know what the desired
output of the hidden layer should be.
◆ Commercial ANNs incorporate three and sometimes
four layers, including one or two hidden layers.
◆ Each layer can contain from 10 to 1000 neurons.
◆ Experimental neural networks may have five or
even six layers, including three or four hidden layers,
and utilise millions of neurons.
Delta Rule

• The delta rule can be stated as follows: the adjustment made to the weight of an input neuron connection is proportional to the product of the error signal and the input value of the connection in question.
• The key idea behind the delta rule is to use gradient descent to search the hypothesis space of possible weight vectors to find the weights that best fit the training data.
• The delta rule is important because it provides the basis for the backpropagation algorithm, which can learn networks with many interconnected units.
Gradient Descent
Gradient descent is one of the most commonly used iterative optimization algorithms for training machine learning and deep learning models. It helps in finding a local minimum of a function.
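The idea can be sketched in one dimension: repeatedly step against the gradient until the minimum is reached. The function f(x) = (x − 3)², its gradient 2(x − 3), the starting point, and the learning rate below are illustrative assumptions.

```python
# Minimal gradient descent: x is moved a small step against the
# gradient on every iteration, sliding downhill toward a minimum.

def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)   # step against the gradient
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is f'(x) = 2(x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # converges toward 3, the minimizer of f
```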

Error

• An error exists at the output of a neuron j at iteration n (i.e., presentation of the nth training sample):
– ej(n) = tj(n) − yj(n)
• Define the instantaneous value of the error for neuron j as
– (1/2) ej²(n)
• The total error for the entire network is obtained by summing the instantaneous values over all neurons:
– E(n) = (1/2) Σj ej²(n)
Sigmoid Unit

[Figure: the kth sigmoid unit receives inputs x1 … xm through weights wk1 … wkm plus a bias bk, and produces output ok.]

ok = f(yk) = 1 / (1 + e^(−yk))
E  1
= 
wi wi 2 dD
(td − od ) 2


=
1

2 dD wi
(td − od ) 2

(td − od )
=
1

2 dD
2(td − o d )
wi
od
= −  (td − od )
dD wi
od yd
= −  (td − od ) chain rule
dD yd wi
n
( xiwi )
 1
= −  (td − od ) ( − yd
) i=0
Sigmoid
dD yd 1+ e wi function
= −  (td − od )od (1− od )xi Continue….
dD 35
Δwi = −η ∂E/∂wi = η Σ_{d∈D} (td − od) od (1 − od) xi

wi ← wi + η Σ_{d∈D} (td − od) od (1 − od) xi          Delta rule learning (training)

Incremental (per-sample) form:

Δwi = η (td − od) od (1 − od) xi

wi ← wi + η (td − od) od (1 − od) xi
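The batch update wi ← wi + η Σ_d (td − od) od (1 − od) xi above can be sketched for a single sigmoid unit. The toy task (learning logical OR), the learning rate η = 0.5, and the epoch count are illustrative assumptions.

```python
# Delta rule training of one sigmoid unit by batch gradient descent.
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def train_sigmoid_unit(data, eta=0.5, epochs=5000):
    n = len(data[0][0])
    w = [0.0] * (n + 1)                               # w[0] is the bias weight
    for _ in range(epochs):
        deltas = [0.0] * (n + 1)
        for x, t in data:
            xs = [1.0] + list(x)                      # fixed input x0 = 1 for the bias
            o = sigmoid(sum(wi * xi for wi, xi in zip(w, xs)))
            for i, xi in enumerate(xs):               # accumulate delta rule terms
                deltas[i] += eta * (t - o) * o * (1 - o) * xi
        w = [wi + d for wi, d in zip(w, deltas)]      # batch weight update
    return w

# Learn logical OR (linearly separable, so a single unit suffices).
or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_sigmoid_unit(or_data)
outputs = [sigmoid(w[0] + w[1] * a + w[2] * b) for (a, b), _ in or_data]
print([round(o, 2) for o in outputs])  # near 0 for (0,0), near 1 otherwise
```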
What is Backpropagation?

• Supervised Error Back-propagation Training: the mechanism of backward error transmission is used to modify the synaptic weights of the internal (hidden) and output layers.

• Based on the delta learning rule.

• One of the most popular algorithms for supervised training of multilayer feedforward networks.
The Back-Propagation Algorithm

Steps in Back propagation Algorithm
⚫ STEP ONE: initialize the weights and biases.
– The weights in the network are initialized to random numbers from the interval [-1,1].
– Each unit has a BIAS associated with it.
– The biases are similarly initialized to random numbers from the interval [-1,1].

⚫ STEP TWO: feed the training sample.

⚫ STEP THREE: propagate the inputs forward; we compute the net input and output of each unit in the hidden and output layers.

⚫ STEP FOUR: back propagate the error.

⚫ STEP FIVE: update weights and biases to reflect the propagated errors.

⚫ STEP SIX: terminating conditions.


Propagation through Hidden Layer (One Node)

[Figure: inputs x0 … xn with weights w0j … wnj form a weighted sum together with the bias of unit j; activation function f produces output y.]

⚫ The inputs to unit j are outputs from the previous layer. These are multiplied by their corresponding weights in order to form a weighted sum, which is added to the bias associated with unit j.
⚫ A nonlinear activation function f is applied to the net input.
Propagate the inputs forward
⚫ For unit j in the input layer, its output is equal to its input; that is, Oj = Ij for input unit j.
• The net input to each unit in the hidden and output layers is computed as follows.
• Given a unit j in a hidden or output layer, the net input is

Ij = Σi wij Oi + θj

where wij is the weight of the connection from unit i in the previous layer to unit j; Oi is the output of unit i from the previous layer; and θj is the bias of the unit.
Propagate the inputs forward

⚫ Each unit in the hidden and output layers takes its net input and then applies an activation function. The function symbolizes the activation of the neuron represented by the unit. It is also called a logistic, sigmoid, or squashing function.

⚫ Given a net input Ij to unit j, the output of unit j is computed as

Oj = f(Ij) = 1 / (1 + e^(−Ij))
Back propagate the error
⚫ When reaching the output layer, the error is computed and propagated backwards.
⚫ For a unit k in the output layer the error is computed by the formula:

Errk = Ok (1 − Ok)(Tk − Ok)

where Ok is the actual output of unit k (computed by the activation function):

Ok = 1 / (1 + e^(−Ik))

Tk is the true output based on the known class label of the training sample;

Ok(1 − Ok) is the derivative (rate of change) of the activation function.
Update weights and biases

⚫ Weights are updated by the following equations, where l is a constant between 0.0 and 1.0 reflecting the learning rate; this learning rate is fixed for the implementation.

Δwij = (l) Errj Oi

wij = wij + Δwij

• Biases are updated by the following equations:

Δθj = (l) Errj

θj = θj + Δθj
Update weights and biases
⚫ We are updating weights and biases after the presentation of each sample.
⚫ This is called case updating.

⚫ Epoch: one iteration through the training set is called an epoch.

⚫ Epoch updating: alternatively, the weight and bias increments could be accumulated in variables, and the weights and biases updated after all of the samples of the training set have been presented.

⚫ Case updating is more accurate.
Terminating Conditions
⚫ Training stops when:
• all Δwij in the previous epoch are below some threshold, or
• the percentage of samples misclassified in the previous epoch is below some threshold, or
• a prespecified number of epochs has expired.

• In practice, several hundred thousand epochs may be required before the weights converge.
Backpropagation Formulas

[Diagram: the input vector xi enters at the input nodes, flows through the hidden nodes to the output nodes, producing the output vector.]

Output nodes:  Errk = Ok (1 − Ok)(Tk − Ok)

Hidden nodes:  Errj = Oj (1 − Oj) Σk Errk wjk

Oj = 1 / (1 + e^(−Ij)),   Ij = Σi wij Oi + θj

Weight update: wij = wij + (l) Errj Oi

Bias update:   θj = θj + (l) Errj
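The formulas above can be sketched as one case-updating step for a network with a single hidden layer: forward pass, output and hidden errors, then weight and bias updates. The use of plain lists and the layer sizes are illustrative choices; the demo numbers are the initial weights of the worked example in these notes (x = (1, 0, 1), T6 = 1, l = 0.9).

```python
# One case-updating step of backpropagation for a 1-hidden-layer net.
import math

def sigmoid(I):
    return 1.0 / (1.0 + math.exp(-I))

def backprop_step(x, t, w_h, b_h, w_o, b_o, l=0.9):
    # Forward pass: I_j = sum_i w_ij O_i + theta_j, O_j = f(I_j)
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
         for ws, b in zip(w_h, b_h)]
    o = [sigmoid(sum(w * hj for w, hj in zip(ws, h)) + b)
         for ws, b in zip(w_o, b_o)]
    # Output errors: Err_k = O_k (1 - O_k)(T_k - O_k)
    err_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
    # Hidden errors: Err_j = O_j (1 - O_j) sum_k Err_k w_jk
    err_h = [hj * (1 - hj) * sum(e * w_o[k][j] for k, e in enumerate(err_o))
             for j, hj in enumerate(h)]
    # Updates: w_ij += l * Err_j * O_i and theta_j += l * Err_j
    w_o = [[w + l * err_o[k] * h[j] for j, w in enumerate(ws)]
           for k, ws in enumerate(w_o)]
    b_o = [b + l * e for b, e in zip(b_o, err_o)]
    w_h = [[w + l * err_h[j] * x[i] for i, w in enumerate(ws)]
           for j, ws in enumerate(w_h)]
    b_h = [b + l * e for b, e in zip(b_h, err_h)]
    return w_h, b_h, w_o, b_o

# One step on the worked example's initial weights:
w_h, b_h, w_o, b_o = backprop_step(
    x=[1, 0, 1], t=[1],
    w_h=[[0.2, 0.4, -0.5], [-0.3, 0.1, 0.2]], b_h=[-0.4, 0.2],
    w_o=[[-0.3, -0.2]], b_o=[0.1])
print([round(w, 3) for w in w_o[0]])  # → [-0.261, -0.138] (new w46, w56)
```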
Example of Back propagation
Input = 3, Hidden Neurons = 2, Output = 1

Initialize weights: random numbers from -1.0 to 1.0.

Initial input and weights:

x1  x2  x3 | w14   w15   w24   w25   w34   w35   w46   w56
1   0   1  | 0.2   -0.3  0.4   0.1   -0.5  0.2   -0.3  -0.2
Example (cont..)

⚫ Bias added to hidden + output nodes.
⚫ Initialize bias: random values from -1.0 to 1.0.

⚫ Bias (random):

θ4    θ5    θ6
-0.4  0.2   0.1
Net Input and Output Calculation

Unit j | Net input Ij                                  | Output Oj
4      | 0.2 + 0 − 0.5 − 0.4 = −0.7                    | Oj = 1/(1 + e^0.7) = 0.332
5      | −0.3 + 0 + 0.2 + 0.2 = 0.1                    | Oj = 1/(1 + e^(−0.1)) = 0.525
6      | (−0.3)(0.332) − (0.2)(0.525) + 0.1 = −0.105   | Oj = 1/(1 + e^0.105) = 0.475
Calculation of Error at Each Node

Unit j | Err j
6      | 0.475(1 − 0.475)(1 − 0.475) = 0.1311   (we assume T6 = 1)
5      | 0.525 × (1 − 0.525) × 0.1311 × (−0.2) = −0.0065
4      | 0.332 × (1 − 0.332) × 0.1311 × (−0.3) = −0.0087
Calculation of Weight and Bias Updates
Learning rate l = 0.9

Weight | New value
w46    | −0.3 + 0.9(0.1311)(0.332) = −0.261
w56    | −0.2 + 0.9(0.1311)(0.525) = −0.138
w14    | 0.2 + 0.9(−0.0087)(1) = 0.192
w15    | −0.3 + 0.9(−0.0065)(1) = −0.306
……..similarly
θ6     | 0.1 + 0.9(0.1311) = 0.218
……..similarly
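The worked example above can be reproduced numerically step by step (input x = (1, 0, 1), target T6 = 1, learning rate l = 0.9, with the initial weights and biases from the tables). Small differences in the last digit come from the slides rounding intermediate values.

```python
# Verifying the worked backpropagation example by direct computation.
import math

def sig(I):
    return 1.0 / (1.0 + math.exp(-I))

# Forward pass: net inputs and outputs
I4 = 0.2 * 1 + 0.4 * 0 + (-0.5) * 1 + (-0.4)   # = -0.7
I5 = -0.3 * 1 + 0.1 * 0 + 0.2 * 1 + 0.2        # = 0.1
O4, O5 = sig(I4), sig(I5)                      # ≈ 0.332, 0.525
I6 = -0.3 * O4 + (-0.2) * O5 + 0.1             # ≈ -0.105
O6 = sig(I6)                                   # ≈ 0.474 (slides round to 0.475)

# Backward pass: errors
Err6 = O6 * (1 - O6) * (1 - O6)                # ≈ 0.1311
Err5 = O5 * (1 - O5) * Err6 * (-0.2)           # ≈ -0.0065
Err4 = O4 * (1 - O4) * Err6 * (-0.3)           # ≈ -0.0087

# A few of the weight and bias updates with l = 0.9
w46 = -0.3 + 0.9 * Err6 * O4                   # ≈ -0.261
w56 = -0.2 + 0.9 * Err6 * O5                   # ≈ -0.138
w14 = 0.2 + 0.9 * Err4 * 1                     # ≈ 0.192
theta6 = 0.1 + 0.9 * Err6                      # ≈ 0.218
print(round(O6, 3), round(Err6, 4), round(w46, 3), round(theta6, 3))
```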
Applications
⚫ Domains and tasks where neural networks are successfully used:
• recognition
• control problems
• series prediction
• weather, financial forecasting
• categorization
• sorting of items (fruit, characters, …)
