DL 02 Deep Feedforward Networks
Deep Learning
2. Deep Feedforward Networks
Lars Schmidt-Thieme
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
Outline
1. What is a Neural Network?
2. An example: XOR
3. Loss and Output Layer
4. Basic Feedforward Network Architecture
5. Backpropagation
1. What is a Neural Network?
▶ Feedforward networks
  (aka feedforward neural networks or multilayer perceptrons)
Why Feedforward?
▶ Given a feedforward network ŷ = f(x; θ):
▶ the input x is passed forward through a chain of intermediate computations to produce the output ŷ; there are no feedback connections from outputs back into the model.
Why Neural?
▶ The models are loosely inspired by neuroscience: each unit resembles a neuron, receiving inputs from many other units and computing its own activation value.
Why Network?
▶ A feedforward network is a directed acyclic graph:
▶ graph nodes are organized in layers,
▶ directed edges between nodes carry the parameters/weights,
▶ each node is a computational function,
▶ typically there are no intra-layer connections and no connections that skip layers (although both are possible),
▶ the input to the first layer is given (the features x),
▶ the output is the computation of the last layer (the prediction ŷ).
Figure 1: A feedforward neural network. Source: www.analyticsvidhya.com
Nonlinear Mapping
▶ We can easily solve linear regression, but not every problem is linear in the original features.
Figure 2: Mapping the feature x into a new dimensionality, x → φ(x) = (a, b) with a = x² and b = x. Left panel: f(x) = (x + 1)², nonlinear in x. Right panel: f(a, b) = a + 2b + 1, linear in the latent features (a, b).
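A minimal numpy sketch of this idea (variable names are illustrative): after mapping x to the latent features φ(x) = (x², x) plus a constant, ordinary least squares recovers the nonlinear target f(x) = (x + 1)² exactly.

  import numpy as np

  x = np.linspace(-10.0, 10.0, 101)        # raw one-dimensional feature
  y = (x + 1) ** 2                         # nonlinear target f(x) = (x + 1)^2

  # latent features phi(x) = (a, b) = (x^2, x), plus a constant column
  Phi = np.column_stack([x ** 2, x, np.ones_like(x)])

  # ordinary least squares on the mapped features
  coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
  print(coef)                              # ≈ [1. 2. 1.], i.e. f = a + 2 b + 1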
Nonlinear Mapping
f(x) = x² + 2eˣ + 3x − 5
▶ From which latent features can f be obtained as a linear combination (plus a constant)?
A. x²
B. x², x, eˣ
C. x, eˣ
D. x², x, eˣ, sin(x)
2. An example: XOR
▶ Our dataset:
  D^train := {((0, 0)ᵀ, 0), ((1, 0)ᵀ, 1), ((0, 1)ᵀ, 1), ((1, 1)ᵀ, 0)}
▶ Leading to the optimization problem:
  arg min_θ J(θ) := (1/4) Σ_{(x,y) ∈ D^train} (y − f(x; θ))²
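A small numpy sketch of this dataset and objective (the helper name J and the use of numpy are illustrative choices, not the lecture's notation):

  import numpy as np

  # the four XOR training examples
  X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
  y = np.array([0, 1, 1, 0])

  def J(f):
      """Mean squared error of a predictor f over the XOR training set."""
      preds = np.array([f(x) for x in X])
      return np.mean((y - preds) ** 2)

  # the constant predictor f(x) = 0.5 attains J = 0.25,
  # which is also the best any purely linear model can reach on XOR
  print(J(lambda x: 0.5))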
▶ One exact solution, with hidden layer h = relu(Wᵀx + c) and output ŷ = wᵀh + b:
  W = [1 1; 1 1] (a 2×2 matrix of ones), c = (0, −1)ᵀ, w = (1, −2)ᵀ, b = 0
With hidden activations h(1) = (0, 0)ᵀ, h(2) = (1, 0)ᵀ, h(3) = (1, 0)ᵀ, h(4) = (2, 1)ᵀ:
  ŷ(1) = wᵀh(1) + b = (1, −2)(0, 0)ᵀ + 0 = 0
  ŷ(2) = wᵀh(2) + b = (1, −2)(1, 0)ᵀ + 0 = 1
  ŷ(3) = wᵀh(3) + b = (1, −2)(1, 0)ᵀ + 0 = 1
  ŷ(4) = wᵀh(4) + b = (1, −2)(2, 1)ᵀ + 0 = 0
The computations of the final layer match exactly those of the XOR function.
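A short numpy sketch of the full forward pass with these parameter values; it reproduces the XOR outputs 0, 1, 1, 0 (the function name f is illustrative):

  import numpy as np

  W = np.array([[1.0, 1.0], [1.0, 1.0]])   # hidden-layer weights
  c = np.array([0.0, -1.0])                # hidden-layer bias
  w = np.array([1.0, -2.0])                # output weights
  b = 0.0                                  # output bias

  def f(x):
      h = np.maximum(0.0, W.T @ x + c)     # h = relu(W^T x + c)
      return w @ h + b                     # y_hat = w^T h + b

  X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
  print([f(x) for x in X])                 # -> [0.0, 1.0, 1.0, 0.0]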
3. Loss and Output Layer
4. Basic Feedforward Network Architecture
▶ Common hidden-layer activations, applied elementwise to the pre-activation z = Wᵀx + b:
  h = relu(z) = max(0, z)
  h = σ(z) = 1 / (1 + e⁻ᶻ)
  h = tanh(z) = 2σ(2z) − 1
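A small numpy sketch of these three activations (function names are illustrative); the last line checks the identity tanh(z) = 2σ(2z) − 1 numerically:

  import numpy as np

  def relu(z):
      return np.maximum(0.0, z)

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def tanh(z):
      return 2.0 * sigmoid(2.0 * z) - 1.0

  z = np.linspace(-3.0, 3.0, 7)
  print(relu(z))
  print(sigmoid(z))
  print(np.allclose(tanh(z), np.tanh(z)))   # True: 2*sigma(2z) - 1 equals tanh(z)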
5. Backpropagation
Computational Graphs
Figure: two simple computational graphs. Left: x, w → dot → z. Right: x, w → dot → z1; z1, b → + → z2.
  z = xᵀw
  z2 = z1 + b = xᵀw + b
Figure: two deeper computational graphs. Left: x, w → dot → z1; z1, b → + → z2; z2 → relu → z3. Right: x, w1 → dot → z1; z1, b1 → + → z2; z2 → relu → z3; z3, w2 → dot → z4; z4, b2 → + → z5; z5 → relu → z6.
  z3 = relu(xᵀw + b)
  z6 = relu(w2ᵀ relu(xᵀw1 + b1) + b2)
Forward Computation
▶ A computational graph (Z, E) is a DAG:
  ▶ Z: a set of node IDs,
  ▶ E ⊆ Z × Z: a set of directed edges.
▶ For every node z ∈ Z:
  ▶ T_z: the domain of the node (e.g., ℝ¹⁷),
  and additionally for every non-root node z ∈ Z:
  ▶ f_z : ∏_{z′ ∈ fanin(z)} T_{z′} → T_z, the node operation.
▶ Forward computation:
  given values v_z ∈ T_z for all root nodes z ∈ Z,
  compute a value for every node z ∈ Z via
  v_z := f_z((v_{z′})_{z′ ∈ fanin(z)}) =: f_z(v_fanin(z))
Note: fanin_(Z,E)(z) := {z′ ∈ Z | (z′, z) ∈ E}, the nodes with edges into z.
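A minimal Python sketch of this forward computation, assuming an illustrative dictionary-based graph representation (not the lecture's notation): each non-root node stores its operation f_z and its fanin, and values are filled in along a topological order.

  import numpy as np

  # graph for z3 = relu(x^T w + b): node -> (operation, fanin)
  ops = {
      "z1": (lambda x, w: x @ w, ["x", "w"]),
      "z2": (lambda z1, b: z1 + b, ["z1", "b"]),
      "z3": (lambda z2: np.maximum(0.0, z2), ["z2"]),
  }

  def forward(root_values, ops, order):
      """Fill in a value v_z for every node, given the values of the root nodes."""
      v = dict(root_values)
      for z in order:                          # non-root nodes in topological order
          f, fanin = ops[z]
          v[z] = f(*(v[p] for p in fanin))     # v_z := f_z(v_fanin(z))
      return v

  v = forward({"x": np.array([2.0, 1.0]), "w": np.array([1.0, -1.0]), "b": 0.5},
              ops, ["z1", "z2", "z3"])
  print(v["z3"])                               # relu(2*1 + 1*(-1) + 0.5) = 1.5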
Example (graph of z3 = relu(xᵀw + b)): T_x := ℝ², T_w := ℝ², T_b := ℝ, T_z1 = T_z2 = T_z3 := ℝ
  def topological_order(Z, E):
      """Return the nodes of the DAG (Z, E) in a topological order."""
      Z, E = set(Z), set(E)
      order = []
      while Z:
          # choose any root, i.e. a node with no incoming edges
          z = next(n for n in Z if not any(e[1] == n for e in E))
          Z.remove(z)                        # delete z from the graph ...
          E = {e for e in E if e[0] != z}    # ... together with its outgoing edges
          order.append(z)
      return order
▶ For any node we can then compute the gradients of its value w.r.t. its inputs.
Quiz: for the graph of z3 = relu(xᵀw + b) (x, w → dot → z1; z1, b → + → z2; z2 → relu → z3), given ∇_x z2 = (1, −1)ᵀ and ∇_w z2 = (2, 1)ᵀ, what are ∇_x z3 and ∇_w z3?
A. ∇_x z3 = (2, 1)ᵀ, ∇_w z3 = (1, −1)ᵀ
B. ∇_x z3 = (1, −1)ᵀ, ∇_w z3 = (2, 1)ᵀ
For the graph of z3 = relu(xᵀw + b):
▶ gradient values for all neighboring nodes:
  ∇_z2 z3 = 1, ∇_z1 z2 = 1, ∇_b z2 = 1,
  ∇_x z1 = (1, −1)ᵀ, ∇_w z1 = (2, 1)ᵀ
▶ gradient values for all node pairs:
  ∇_z1 z3 = 1, ∇_b z3 = 1,
  ∇_x z2 = (1, −1)ᵀ, ∇_w z2 = (2, 1)ᵀ,
  ∇_x z3 = (1, −1)ᵀ, ∇_w z3 = (2, 1)ᵀ
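A small numpy sketch that checks these gradients numerically, assuming (as the values above suggest) the example point x = (2, 1)ᵀ, w = (1, −1)ᵀ and a bias for which z2 > 0, so the relu is active; the finite-difference check is an illustrative addition:

  import numpy as np

  x, w, b = np.array([2.0, 1.0]), np.array([1.0, -1.0]), 0.5

  def z3(x, w, b):
      return max(0.0, x @ w + b)               # z3 = relu(x^T w + b)

  # chain rule (for z2 > 0): grad_x z3 = 1 * 1 * w,  grad_w z3 = 1 * 1 * x
  print(w, x)

  # finite-difference check of grad_x z3
  eps = 1e-6
  num = np.array([(z3(x + eps * e, w, b) - z3(x - eps * e, w, b)) / (2 * eps)
                  for e in np.eye(2)])
  print(num)                                    # ≈ [ 1. -1.], matching w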
Gradient Computation
▶ Compute the gradients ∇_z′ z of a single leaf node z w.r.t. all root nodes z′.
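A minimal sketch of how such a gradient computation can be organized as a backward pass over the running graph z3 = relu(xᵀw + b); the dictionary representation and the hand-written local gradients are illustrative assumptions, not a general autodiff implementation:

  import numpy as np

  # forward values for the running example (illustrative numbers)
  v = {"x": np.array([2.0, 1.0]), "w": np.array([1.0, -1.0]), "b": 0.5}
  v["z1"] = v["x"] @ v["w"]              # z1 = x^T w
  v["z2"] = v["z1"] + v["b"]             # z2 = z1 + b
  v["z3"] = max(0.0, v["z2"])            # z3 = relu(z2)

  def backward(v):
      """Gradients of z3 w.r.t. every node, traversing the graph in reverse order."""
      g = {"z3": 1.0}                                     # d z3 / d z3
      g["z2"] = g["z3"] * (1.0 if v["z2"] > 0 else 0.0)   # through relu
      g["z1"] = g["z2"]                                   # through +
      g["b"] = g["z2"]
      g["x"] = g["z1"] * v["w"]                           # through dot: d z1 / d x = w
      g["w"] = g["z1"] * v["x"]                           # through dot: d z1 / d w = x
      return g

  g = backward(v)
  print(g["x"], g["w"])    # -> [ 1. -1.] [2. 1.]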
Summary
▶ Feedforward neural networks are models for supervised prediction (regression, classification).
Further Readings
▶ Goodfellow et al. 2016, ch. 6
Acknowledgement: An earlier version of the slides for this lecture was written by my former postdoc Dr Josif Grabocka.
References
Charu C. Aggarwal. Neural Networks and Deep Learning: A Textbook. Springer International Publishing, 2018. ISBN 978-3-319-94462-3. doi: 10.1007/978-3-319-94463-0.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. The MIT Press, Cambridge, Massachusetts, November 2016. ISBN 978-0-262-03561-3.
Aston Zhang, Zachary C. Lipton, Mu Li, and Alexander Smola. Dive into Deep Learning. https://d2l.ai/, 2020.