Isch 4
Biological Neural Networks
Artificial Neural Network (ANN)
Analogy Between Biological and Artificial Neural Networks
Learning in ANN
➢ Supervised learning
➢ Uses a set of inputs for which the desired outputs are known
➢ Example: Back propagation algorithm
➢ Unsupervised learning
➢ Uses a set of inputs for which no desired outputs are known.
➢ The system is self-organizing; that is, it organizes itself internally. A human must
examine the final categories to assign meaning and determine the usefulness of
the results.
➢ Example: Self-organizing map
The Neuron as a simple computing
element: Diagram of a neuron
❑ The neuron computes the weighted sum of
the input signals and compares the result
with a threshold value, θ. If the net input
is less than the threshold, the neuron
output is –1. But if the net input is greater
than or equal to the threshold, the neuron
becomes activated and its output attains a
value +1.
❑ The neuron uses the following transfer or
activation function
❑ $X = \sum_{i=1}^{n} x_i w_i$
❑ $Y = \begin{cases} +1 & \text{if } X \ge \theta \\ -1 & \text{if } X < \theta \end{cases}$
❑ This type of activation function is called a
sign function
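A minimal Python sketch of this computation (the function and variable names here are illustrative, not from the slides):

def sign_neuron(inputs, weights, theta):
    # Net weighted input of the neuron
    net = sum(x * w for x, w in zip(inputs, weights))
    # Sign activation: +1 if the net input reaches the threshold, -1 otherwise
    return 1 if net >= theta else -1

# Example with two inputs, illustrative weights 0.5 and 0.9, threshold 0.3
print(sign_neuron([1, 1], [0.5, 0.9], 0.3))   # +1, since 1.4 >= 0.3
print(sign_neuron([0, 0], [0.5, 0.9], 0.3))   # -1, since 0.0 <  0.3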
Activation Functions for a Neuron
Perceptron
➢ Perceptron is one of the earliest learning systems.
➢ It can be regarded as a trainable classifier.
➢ Mark I perceptron was capable of making binary decisions.
➢ The idea was to start from a binary image (in the form of Pixels; zeros or ones)
➢ A number of associator units receive inputs from different sets of image pixels and produce binary
outputs in the following levels.
➢ The perceptron uses an error-correction learning algorithm in which the weights are modified after
erroneous decisions, as follows (the update is written out below):
➢ a) If the decision is correct, the weights are not adjusted.
➢ b) If the erroneous decision is the rth one in the list of possible decisions and the correct decision is the sth
one, some values aj are deducted from the weights wrj and added to the weights wsj (j = 1, 2, …, n).
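Restating the correction in (b) in symbols:

$w_{rj} \leftarrow w_{rj} - a_j, \qquad w_{sj} \leftarrow w_{sj} + a_j, \qquad j = 1, 2, \ldots, n$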
Perceptron
$\sum_{i=1}^{n} x_i w_i - \theta = 0$
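For two inputs this equation defines a straight line that splits the input plane into a $+1$ region and a $-1$ region. As an illustrative example (values chosen here, not taken from the slides): with $w_1 = 1$, $w_2 = 1$ and $\theta = 1.5$ the boundary is $x_1 + x_2 = 1.5$, which places $(1,1)$ on one side and $(0,0)$, $(0,1)$, $(1,0)$ on the other, i.e. exactly the separation required for logical AND.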
Perceptron
➢ The weights are updated at each iteration p = 1, 2, 3, … (the rule is written out below)
➢ α is the learning rate, a positive constant less than unity
➢ The perceptron learning rule was first proposed by
Rosenblatt in 1960. Using this rule we can derive a
perceptron training algorithm for the classification task
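In its usual statement, the rule updates each weight in proportion to the input and the output error at iteration $p$:

$e(p) = Y_d(p) - Y(p), \qquad w_i(p+1) = w_i(p) + \alpha \, x_i(p) \, e(p), \qquad i = 1, 2, \ldots, n$

where $Y_d(p)$ is the desired output and $Y(p)$ the actual output.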
Perceptron training algorithm
➢ Step 1 : Initialization
➢ Set initial weights 𝑤1 , 𝑤2 , …, 𝑤𝑛 and threshold θ to random
numbers in the range [-0.5, +0.5].
Perceptron training algorithm (cont.)
➢ Step 2 : Activation
Activate the perceptron by applying inputs x1(p), x2(p), …, xn(p) and
desired output Yd(p).
Calculate the actual output at iteration p = 1:
$Y(p) = \mathrm{step}\!\left[\sum_{i=1}^{n} x_i(p)\,w_i(p) - \theta\right]$
where step is the step activation function.
➢ Step 4: Iteration
➢ Increase iteration p by 1, go back to Step 2 and repeat the process
until convergence
Perceptron training algorithm
(pseudocode)
inputs: examples, a set of examples, each with input x = x1, …, xn and output y
        network, a perceptron with weights Wj, j = 0 … n, and activation function g
repeat
   for each e in examples do
      in ← Σ_{j=0..n} Wj xj[e]                 // net weighted input
      Err ← y[e] – g(in)                       // output error
      Wj ← Wj + α × Err × g′(in) × xj[e]       // weight update, for each j
until some stopping criterion is satisfied
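A runnable Python version of this pseudocode, shown here on the logical AND task as an illustration (names are our own; the step activation is used, so the g′(in) factor reduces to the classic perceptron error-correction rule):

import random

def train_perceptron(examples, n_inputs, alpha=0.1, epochs=100):
    # Step 1: initialise weights and threshold to random numbers in [-0.5, +0.5]
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    theta = random.uniform(-0.5, 0.5)
    for _ in range(epochs):
        for x, y_desired in examples:
            # Step 2: actual output = step function of the net weighted input
            net = sum(xi * wi for xi, wi in zip(x, w)) - theta
            y = 1 if net >= 0 else 0
            # Step 3: error-correction weight update
            err = y_desired - y
            w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
            theta = theta - alpha * err   # threshold acts as a weight on a fixed input of -1
    return w, theta

# Logical AND: only (1, 1) should produce 1
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, theta = train_perceptron(and_examples, n_inputs=2)
for x, _ in and_examples:
    net = sum(xi * wi for xi, wi in zip(x, w)) - theta
    print(x, "->", 1 if net >= 0 else 0)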
Multilayer Neural Network
➢ A multilayer perceptron is a feed-forward neural network with one or more
hidden layers
➢ The network consists of :
➢ Input Layer
➢ Hidden Layer
➢ Output Layer
➢ The input signal is propagated in a forward direction on a layer-by-layer basis, as sketched below
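A compact NumPy sketch of this layer-by-layer forward propagation (the shapes, names and random weights below are illustrative assumptions):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, layers):
    # layers is a list of (weight_matrix, threshold_vector) pairs, one per layer;
    # the output of each layer is fed forward as the input of the next.
    a = np.asarray(x, dtype=float)
    for W, theta in layers:
        a = sigmoid(a @ W - theta)
    return a

# Example: 2 inputs -> 2 hidden neurons -> 1 output neuron, random small weights
rng = np.random.default_rng(0)
layers = [(rng.uniform(-0.5, 0.5, (2, 2)), rng.uniform(-0.5, 0.5, 2)),
          (rng.uniform(-0.5, 0.5, (2, 1)), rng.uniform(-0.5, 0.5, 1))]
print(forward([1, 0], layers))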
Multilayer Neural Network
❑ The hidden layer “hides” its desired output. Neurons in the hidden layer cannot
be observed through the input/output behavior of the network. There is
no obvious way to know what the desired output of the hidden layer should
be.
❑ Commercial ANNs incorporate three and sometimes four layers, including one
or two hidden layers. Each layer can contain from 10 to 1000 neurons.
Experimental neural networks may have five or six layers, including three or
four hidden layers, and utilize millions of neurons.
Back Propagation
(Network diagram: input signals propagate forward through the layers; error signals propagate backward.)
Backpropagation algorithm
➢ 1. Build a network with the chosen number of input, hidden and output units.
➢ 2. Initialize all the weights to low random values.
➢ 3. Choose a single training pair at random.
➢ 4. Copy the input pattern to the input layer.
➢ 5. Cycle the network so that the activations from the inputs generate the activations in the
hidden and output layers.
➢ 6. Calculate the error derivative between the output activation and the target output.
➢ 7. Backpropagate the summed products of the weights and errors in the output layer in
order to calculate the error in the hidden units.
➢ 8. Update the weights attached to each unit according to the error in that unit, the output from
the unit below it and the learning parameters. Repeat from step 3 until the error is sufficiently
low or the network settles.
Back Propagation Training Algorithm
➢ Step 1: Initialization
➢ Set all the weights and threshold levels of the network to
random numbers uniformly distributed inside a small
range:
$\left(-\dfrac{2.4}{F_i},\ +\dfrac{2.4}{F_i}\right)$
where $F_i$ is the total number of inputs of neuron $i$ in the network.
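For instance, a neuron with $F_i = 4$ incoming connections would draw its weights uniformly from about $(-0.6, +0.6)$; a short NumPy sketch (illustrative):

import numpy as np
F_i = 4                                                   # assumed number of inputs of neuron i
w = np.random.uniform(-2.4 / F_i, 2.4 / F_i, size=F_i)    # weights drawn from (-0.6, +0.6)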
Back Propagation Training Algorithm
➢ Step 2: Activation
➢ Activate the back-propagation neural network by applying inputs
𝑥1 𝑝 , 𝑥2 𝑝 , …, 𝑥𝑛 𝑝 and desired outputs 𝑦𝑑,1 𝑝 , 𝑦𝑑,2 𝑝 ,…, 𝑦𝑑,𝑛 𝑝
➢ A) Calculate the actual output of the neurons in the hidden layer:
$y_j(p) = \mathrm{sigmoid}\!\left[\sum_{i=1}^{n} x_i(p)\,w_{ij}(p) - \theta_j\right]$
➢ Where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid
activation function
Back Propagation Training Algorithm
➢ Step 2: Activation (cont.)
➢ b) Calculate the actual outputs of the neurons in the output layer:
$y_k(p) = \mathrm{sigmoid}\!\left[\sum_{j=1}^{m} y_j(p)\,w_{jk}(p) - \theta_k\right]$
where m is the number of inputs of neuron k in the output layer.
Back Propagation Training Algorithm
➢ Step 3: Iteration
➢ Increase iteration p by one, go back to Step 2 and repeat the process
until the selected error criterion is satisfied.
➢ As an example, we may consider the three-layer back-propagation
network. Suppose that the network is required to perform the logical
operation Exclusive-OR. Recall that a single-layer perceptron could
not do this operation. Now we will apply the three-layer network.
Back Propagation Training Algorithm
inputs: examples, a set of examples, each with input vector x and output vector y
        network, a multilayer network with L layers, weights Wj,i , activation function g
repeat
   for each e in examples do
      for each node j in the input layer do aj ← xj[e]
      for l = 2 to L do
         for each node i in layer l do
            ini ← Σj Wj,i aj
            ai ← g(ini)
      for each node i in the output layer do
         Δi ← g′(ini) × (yi[e] – ai)
      for l = L – 1 to 1 do
         for each node j in layer l do
            Δj ← g′(inj) Σi Wj,i Δi
         for each node i in layer l + 1 do
            Wj,i ← Wj,i + α × aj × Δi
until some stopping criterion is satisfied
Example: Three-layer network for
solving the Exclusive-OR operation
❑ The effect of the threshold applied
to a neuron in the hidden layer is
represented by its weight, 𝜃,
connected to a fixed input equal
to -1
❑ The initial weights and threshold
levels are set randomly.
Example: Three-layer network for
solving the Exclusive-OR operation
We consider a training set where inputs 𝑥1 and 𝑥2 are equal to 1 and the desired output 𝑦𝑑,5 is
0. The actual outputs of neurons 3 and 4 in the hidden layer are calculated as:
$y_3 = \mathrm{sigmoid}(x_1 w_{13} + x_2 w_{23} - \theta_3), \qquad y_4 = \mathrm{sigmoid}(x_1 w_{14} + x_2 w_{24} - \theta_4)$
Now the actual output of neuron 5 in the output layer is determined as:
$y_5 = \mathrm{sigmoid}(y_3 w_{35} + y_4 w_{45} - \theta_5)$
Example: Three-layer network for
solving the Exclusive-OR operation
❑ The next step is weight training. To update the weights and threshold levels in our network,
we propagate the error, ℯ , from the output layer backward to the input layer.
❑ First, we calculate the error gradient for neuron 5 in the output layer
❑ Then we determine the weight corrections assuming that the learning rate parameter, 𝛼, is
equal to 0.1
Example: Three-layer network for
solving the Exclusive-OR operation
❑ Now we calculate the error gradients for neurons 3 and 4 in the hidden layer:
Example: Three-layer network for
solving the Exclusive-OR operation
❑ Finally, we update all weights and
thresholds.
❑ The training process is repeated until
the sum of squared errors is less
than 0.001.
Example: Three-layer network for
solving the Exclusive-OR operation
❑ The final result of the three-layer network learning is a set of weights and thresholds that solves the Exclusive-OR problem.
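A runnable sketch of the whole example: a 2-2-1 sigmoid network trained by back-propagation on Exclusive-OR until the sum of squared errors drops below 0.001. The initial weights here are drawn at random rather than set to the slide's specific values, and all names are illustrative; with a learning rate of 0.1 convergence may take several thousand epochs, and an unlucky initialisation can settle in a local minimum (rerun in that case):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
W_h  = rng.uniform(-0.5, 0.5, (2, 2))   # input -> hidden weights (neurons 3 and 4)
th_h = rng.uniform(-0.5, 0.5, 2)        # hidden thresholds
W_o  = rng.uniform(-0.5, 0.5, (2, 1))   # hidden -> output weights (neuron 5)
th_o = rng.uniform(-0.5, 0.5, 1)        # output threshold

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)            # Exclusive-OR targets
alpha = 0.1

sse, epoch = 1.0, 0
while sse > 0.001 and epoch < 200000:
    sse, epoch = 0.0, epoch + 1
    for x, y_d in zip(X, Y):
        # Forward pass
        y_hid = sigmoid(x @ W_h - th_h)
        y_out = sigmoid(y_hid @ W_o - th_o)
        e = y_d - y_out
        sse += float(e @ e)
        # Error gradients: sigmoid derivative times the (back-propagated) error
        delta_o = y_out * (1 - y_out) * e
        delta_h = y_hid * (1 - y_hid) * (W_o @ delta_o)
        # Weight and threshold corrections (threshold acts on a fixed input of -1)
        W_o  += alpha * np.outer(y_hid, delta_o)
        th_o -= alpha * delta_o
        W_h  += alpha * np.outer(x, delta_h)
        th_h -= alpha * delta_h

print("epochs:", epoch, " final SSE:", round(sse, 6))
for x in X:
    out = sigmoid(sigmoid(x @ W_h - th_h) @ W_o - th_o)
    print(x, "->", round(float(out[0]), 3))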
Gradient Descent
❑ Gradient descent is an iterative minimisation method. The gradient of the error function
always points in the direction of the steepest ascent of the error function, so gradient
descent takes steps in the opposite direction.
❑ The error gradient is determined as the derivative of the activation function multiplied by the error at the
neuron output.
For neuron k in the output layer:
$\delta_k(p) = \dfrac{\partial y_k(p)}{\partial X_k(p)} \cdot e_k(p)$
Where yk(p) is the output of neuron k at iteration p, and Xk(p) is the net weighted input of
neuron k at the same iteration.
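For the sigmoid activation used in this network, $\partial y_k(p)/\partial X_k(p) = y_k(p)\,[1 - y_k(p)]$, so the error gradient becomes:

$\delta_k(p) = y_k(p)\,[1 - y_k(p)]\; e_k(p), \qquad e_k(p) = y_{d,k}(p) - y_k(p)$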
Characteristics of ANN
❑ Adaptive learning
❑ Self-organization
❑ Error tolerance
❑ Real-time operation
❑ Parallel information processing
Benefits and Limitations of ANN
Benefits:
➢ Ability to tackle new kinds of problems
Limitations:
➢ Performs less well at tasks humans tend to find difficult
Thank You!!!