Solution PDF
Department of Informatics
Technical University of Munich
Note:
• During the attendance check a sticker containing a unique code will be put on this exam.
• This code contains a unique number that associates this exam with your registration number.
• This number is printed both next to the code and to the signature field in the attendance check list.
Exam: IN2346 / Endterm Date: Thursday 8th August, 2019
Examiner: Prof. Dr. Leal-Taixé, Prof. Dr. Nießner Time: 08:00 – 09:30
Working instructions
• This exam consists of 20 pages with a total of 6 problems.
Please make sure now that you received a complete copy of the exam.
• Allowed resources:
– none
Problem 1 Multiple Choice (18 credits)
Mark your answer clearly with a cross in the corresponding box. Multiple correct answers per question are possible.
a) Your network is overfitting. What are good ways to approach this problem?
Increase the size of the validation set
b) A sigmoid layer
has a learnable parameter.
× is continuous and differentiable everywhere.
maps to values between -1 and 1.
c) Training error does not decrease. What could be a reason?

d)
× 60,344,232.
152.
e) What is the correct order of operations for an optimization with gradient descent?
bcdea
ebadc
eadbc
× edbac
f) Dropout
has trouble with tanh activations.
g) Consider a simple convolutional neural network with a single convolutional layer. Which of the following statements is true about this network?
All input nodes are connected to all output nodes.
It is scale invariant.
It is translation invariant.
It is rotation invariant.
h) You are building a model to predict the presence (labeled 1) or absence (labeled 0) of a tumor in a brain scan. The goal is to ultimately deploy the model to help doctors in hospitals. Which of these two metrics would you choose to use?
× Recall = (true positive examples) / (total positive examples).
i) Why would you want to use 1 × 1 convolutions? (check all that apply)
Predict binary class probabilities.
Problem 2 Short Questions (24 credits)
a) You are training a neural network with 15 fully-connected layers with a tanh nonlinearity. Explain the behavior of the gradient of the non-linearity with respect to very large positive inputs.

Because the tanh is almost flat for very large positive values (1pt), its gradient will be almost 0. (1pt)
Comment: Points deducted for saying "gradient saturates" without mentioning the small value of the gradient; a neuron saturates, but not the gradient.
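As an illustrative check (a minimal numpy sketch, not part of the graded solution), the tanh derivative 1 − tanh²(x) can be evaluated for increasingly large inputs:

```python
import numpy as np

# The gradient of tanh(x) is 1 - tanh(x)^2; it collapses towards 0 for large positive x.
x = np.array([0.0, 2.0, 5.0, 10.0])
grad = 1.0 - np.tanh(x) ** 2
print(grad)  # roughly [1.0, 7.1e-02, 1.8e-04, 8.2e-09]
```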
b) Why might this be a problem for training neural networks? Name and explain this phenomenon.

Vanishing gradient (1p): during backprop the gradient of the non-linearity is close to zero, which makes training/parameter updates much slower. (explanation is another 1p)
c) In modern architectures, another type of non-linearity is commonly used. Draw and name this non-linearity (1p) and explain why it helps solve the problem mentioned in the previous two questions (1p).

ReLU(x) = max(0, x): its gradient is 1 for all positive inputs, so it does not saturate and the gradient does not vanish for large activations.
d) Why do we often refer to L2-regularization as "weight decay"? Derive a mathematical expression that includes the weights W, the learning rate η, and the L2-regularization hyperparameter λ to explain your point.

Weight update with objective function J including the L2 term:
W ← W − η ∇W (J + ½ λ Σi Wi²) = W(1 − ηλ) − η ∇W J,
where η = learning rate and λ = regularization parameter with ηλ ≪ 1.
The value of W is pushed towards zero in each iteration.
Points: qualitative answer: 0.5; mathematical part: L2 loss 0.5, weight update formula 0.5, final result 0.5.
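A minimal numpy sketch of this update (illustrative only; grad_J stands in for whatever gradient the data loss provides):

```python
import numpy as np

eta, lam = 0.1, 0.01              # learning rate and L2 hyperparameter
W = np.random.randn(5, 5)         # weights
grad_J = np.random.randn(5, 5)    # placeholder for the data-loss gradient

# L2-regularized step: the factor (1 - eta*lam) shrinks ("decays") the weights each iteration.
W = W * (1.0 - eta * lam) - eta * grad_J
```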
e) You are solving the binary classification task of classifying images as cars vs. persons. You design a CNN with a single output neuron. Let the output of this neuron be z. The final output of your network, ŷ, is given by:
ŷ = σ(ReLU(z))
You classify all inputs with a final value ŷ ≥ 0.5 as car images. What problem are you going to encounter?

Using ReLU then sigmoid will cause all predictions to be positive (0.5p):
σ(ReLU(z)) ≥ 0.5 ∀z. (0.5p)
Writing "all predictions are 'cars'" is enough.
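An illustrative numeric check (assuming plain numpy):

```python
import numpy as np

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
z = np.linspace(-10.0, 10.0, 5)
y_hat = sigmoid(np.maximum(z, 0.0))   # sigmoid(ReLU(z))
print(y_hat)  # every entry is >= 0.5, so every input would be classified as "car"
```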
f) Suppose you initialize your weights w with uniform random distribution U(−α, α). The output s for a given input vector x is given by
si = Σj=0..n wij · xj .

Using Var(X · Y) = E(X)² Var(Y) + E(Y)² Var(X) + Var(X) Var(Y):
Var(si) = Var(Σj=0..n wij · xj) = Σj=0..n Var(wij) Var(xj) = n · Var(w) Var(x) (1p)
Correct result: 2p.
If only Var(w) = 1/n is written then 1p.
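An illustrative numpy check of the n · Var(w) · Var(x) relation, assuming zero-mean inputs with unit variance and Var(w) = 1/n (i.e. α = sqrt(3/n) for U(−α, α)):

```python
import numpy as np

n, trials = 100, 50_000
alpha = np.sqrt(3.0 / n)                      # U(-a, a) has variance a^2 / 3 = 1/n
w = np.random.uniform(-alpha, alpha, size=(trials, n))
x = np.random.randn(trials, n)                # zero mean, unit variance
s = (w * x).sum(axis=1)
print(s.var())                                # close to n * Var(w) * Var(x) = 1.0
```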
g) Consider 2 different models for image classification of the MNIST data set. The models are: (i) a 3-layer perceptron, (ii) LeNet. Which of the two models is more robust to translation of the digits in the images? Give a short explanation why.

LeNet: its convolutional layers share weights across spatial positions (and pooling adds further robustness), so a translated digit produces similar features, whereas the fully-connected perceptron ties each pixel to its own weight.
h) Consider the following one-dimensional data points with classes {0, 1}. Sketch a linear (0.5p) and logistic (0.5p) regression into the figures. Which model is more suitable for this task (1p)?

Plot linear regression (left) and logistic regression (right). Logistic regression is more suitable, since its output stays in [0, 1] and models the class probability.
j) You have 4000 cat and 100 dog images and want to train a neural network on these images to do binary classification. What problems do you foresee with this dataset distribution? Name two possible solutions.

The network prefers cats as they are more likely / imbalance between classes (1pt).
Solutions: leave out images, reweight the dataloader, reweight the loss function, collect more dog images, data augmentation for dogs (0.5pt per solution). No points for: dropout, regularization, batch norm, transfer learning, "get more data".
k) Why is initializing all the weights of a fully connected layer to the same value problematic during training?

If all weights are equal, nodes will learn the same thing during backpropagation, and this limits the capacity. (2p if correct)
If there is no mention of gradients/weight updating, e.g. by only saying "the network will not learn": 1.5p.
l) What is the difference between dropout for convolutional layers compared to dropout for fully connected layers? Explain both behaviours.

Conv: drop feature maps at random; fully connected: drop weights at random (1p each).
Problem 3 Optimization (12 credits)
a) Explain the concept behind RMSProp optimization. How does it help converging faster?

Mitigate the step size in directions with high-variance gradients (1). Can increase the learning rate (1).
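A minimal numpy sketch of an RMSProp step (parameter names are illustrative assumptions):

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=1e-3, beta=0.9, eps=1e-8):
    # Running average of squared gradients per parameter.
    cache = beta * cache + (1.0 - beta) * grad ** 2
    # Divide the step by its square root: high-variance directions get damped,
    # which in turn allows a larger global learning rate.
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache
```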
b)
Adam.
c) Why is it common to use a learning rate decay?

When far away (0.5p), one wants larger update steps to get closer to the solution; the closer you get, the less jitter/overshooting you want. (0.5p)
d) What is a saddle point? What is the advantage/disadvantage of Stochastic Gradient Descent (SGD) in dealing with saddle points?

Saddle point: the gradient is zero (0.5p), but it is neither a local minimum nor a local maximum (0.5p) (or: the gradient is zero and the function has a local maximum in one direction, but a local minimum in another direction).
SGD has noisier updates and can help escape from a saddle point (1p).
e)
Make the gradients less noisy.
f)
Limited GPU memory / faster compute (for each batch), so faster updates.
g) Your network's training curve diverges (assuming data loading is correct). Name one way to address the problem through hyperparameter change.

Decrease the learning rate.
h) What is an epoch?

A full run through the entire training set.
i)
Robbins-Monro condition: Σi=1..∞ αi = ∞ (1p) and Σi=1..∞ αi² < ∞ (1p); for example, αi = 1/i satisfies both.
Problem 4 Convolutional Neural Networks and Advanced Architectures (12 credits)
In the following we assume that the input of our network is a 224 × 224 × 3 color (RGB) image. The task is to perform image classification on 1000 classes. You design a network with the following structure: [CONV - RELU] x 20 - FC - FC. That is, you place 20 consecutive convolutional layers (including non-linear activations), followed by two fully-connected layers. Each layer will have its own number of filters and kernel size.
a) The first 3 convolutional layers each have 5 filters with kernels of size 3 × 3, applied with stride 1 and no padding. How large is the receptive field of a feature after the 3 convolutional operations?

Each additional 3 × 3 convolution (stride 1) grows the receptive field by 2: 3 → 5 → 7, so the receptive field is 7 × 7.
b) What are the dimensions of the feature map after the 3 convolutional operations from (a)?

224 − 2 (first conv layer) − 2 (second conv layer) − 2 (third conv layer) = 218, so the feature map is 218 × 218 × 5 (number of filters). (1p spatial size, 1p kernel size)
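A quick shape check (an illustrative PyTorch sketch with the layer sizes from (a)):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)                  # one RGB input image
layers = nn.Sequential(
    nn.Conv2d(3, 5, kernel_size=3), nn.ReLU(),   # 224 -> 222
    nn.Conv2d(5, 5, kernel_size=3), nn.ReLU(),   # 222 -> 220
    nn.Conv2d(5, 5, kernel_size=3), nn.ReLU(),   # 220 -> 218
)
print(layers(x).shape)                           # torch.Size([1, 5, 218, 218])
```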
c) What are the dimensions of the weight tensor of the first convolutional layer? (1p) What does each dimension represent? (1p)

5 × 3 × 3 × 3: number of filters × input channels × kernel height × kernel width.
d) After the 10th convolutional layer your feature map has size 100x100x224. You realize the next convolutional filter operation will involve too many multiplications that make your network training slow. However, the next layer requires identical spatial size of the feature map. Propose a solution for this problem (1p) and demonstrate your solution with an example (1p).

Use a 1 × 1 convolution to reduce the number of channels while keeping the spatial size. For example, a 1 × 1 convolution with 32 filters maps the 100x100x224 feature map to 100x100x32, after which the next layer needs far fewer multiplications.
e) Your network is now trained for the task of image classification. You now want to use the trained weights of this network for the task of image segmentation, for which you need a pixel-wise output. Which layers of your original network described above can you not reuse for the image segmentation task? (1p) Describe briefly how you would adapt the network for image segmentation given any input image size? (1p)

The FC layers, because they take a fixed input size (1p). Make it fully convolutional (1pt). Comment: mentioning only upscaling: 0.5p.
f) You decide to increase the number of layers substantially and therefore you switch to a ResNet architecture. Draw a ResNet block (1p). Describe all the operations inside the block (1pt). What is the advantage of using such a block in terms of training (1p)?

Drawing of a ResNet block (1pt).
The input x is passed through the convolutional layers to give F(x), and the final summation adds the skipped initial features: F(x) + x. (1p)
One of multiple solutions: skip connections provide highways for gradients and make the network easier to train / resolve the vanishing gradient problem. (1p)
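A minimal PyTorch-style sketch of such a block (an assumed illustration, not the drawing expected in the exam):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions F(x) plus an identity skip connection: out = ReLU(F(x) + x)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        f = self.conv2(self.relu(self.conv1(x)))  # F(x)
        return self.relu(f + x)                   # skip connection adds the input back
```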
Problem 5 Backpropagation and Convolutional Layers (12 credits)
Your friend is excited to try out those "Convolutional Layers" you were talking about from your lecture.
However, he seems to have some issues and requests your help for some theoretical computations on a toy
example.
Consider a neural network with a convolutional (without activation) and a max pooling layer. The convolutional
layer has a single filter with kernel size (1, 1), no bias, a stride of 1 and no padding. The filter weights are all
initialized to a value of 1. The max pooling layer has a kernel size of (2, 2) with stride 2, and 1 zero-padding.
You are given the following input image of dimensions (3, 2, 2), written as three 2 × 2 channels:
x = ( [[1, −0.5], [2, −2]], [[−2, 1], [−1.5, 1]], [[1, 0], [0, 0]] )
a) Compute the forward pass of this input and write down your calculations.

Forward pass: the 1 × 1 convolution with all weights 1 sums the three channels:
[[1, −0.5], [2, −2]] + [[−2, 1], [−1.5, 1]] + [[1, 0], [0, 0]] = [[0, 0.5], [0.5, −1]] (1p)
After max pooling (kernel 2 × 2, stride 2, zero-padding 1):
[[0, 0.5], [0.5, 0]] (1p)
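An illustrative numpy re-computation of this forward pass (padding and pooling written out explicitly):

```python
import numpy as np

x = np.array([[[1, -0.5], [2, -2]],
              [[-2, 1], [-1.5, 1]],
              [[1, 0], [0, 0]]])                      # shape (3, 2, 2)

conv = x.sum(axis=0)                                  # 1x1 conv, all weights 1: sum over channels
padded = np.pad(conv, 1)                              # zero-padding of 1 -> shape (4, 4)
pooled = padded.reshape(2, 2, 2, 2).max(axis=(1, 3))  # 2x2 max pooling with stride 2
print(conv)    # [[ 0.   0.5]  [ 0.5 -1. ]]
print(pooled)  # [[ 0.   0.5]  [ 0.5  0. ]]
```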
b) The ground truth is given by
y = [[0, 1], [1, 0]].
Calculate the binary cross-entropy with respect to the natural logarithm by summing over all output pixels of the forward pass computed in (a). You may assume log(0) ≈ −10^9. (Write down the equation and keep the logarithm for the final result.)

BCEloss = − Σi ti log si (0.5p for either this or the line below)
= −(log 0.5 + log 0.5) = −2 log 0.5
c) You don't recall learning the formula for backpropagation through convolutional layers, but those 1 × 1 convolutions seem suspicious. Write down the name of a common layer that is able to produce the same result as the convolutional layer used above.

Fully-connected layer
d) Update the kernel weights accordingly by using gradient descent with a learning rate of 1. (Write down your calculations!)

∂BCE/∂w1 = −∂[ln(2w1 − 1.5w2) + ln(−0.5w1 + w2)]/∂w1 = −2/(2w1 − 1.5w2) − (−0.5)/(−0.5w1 + w2) = −4 + 1 = −3
∂BCE/∂w2 = −∂[ln(2w1 − 1.5w2) + ln(−0.5w1 + w2)]/∂w2 = −(−1.5)/(2w1 − 1.5w2) − 1/(−0.5w1 + w2) = 3 − 2 = 1
∂BCE/∂w3 = 0

Update using gradient descent for w1/w2 (2p):
w1+ = w1 − lr · ∂BCE/∂w1 = 1 − 1 × (−3) = 4
w2+ = w2 − lr · ∂BCE/∂w2 = 1 − 1 × 1 = 0
w3+ = w3 − 0 = 1

1p if the person only wrote at least the gradient descent update rule.
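A small numpy finite-difference check of these gradients (illustrative only; the loss is written directly as a function of the three kernel weights):

```python
import numpy as np

def bce(w):
    w1, w2, w3 = w
    # Only the two target pixels contribute; w3 never appears because x3 = 0 at both positions.
    return -(np.log(2 * w1 - 1.5 * w2) + np.log(-0.5 * w1 + w2))

w = np.array([1.0, 1.0, 1.0])
eps = 1e-6
grad = np.array([(bce(w + eps * np.eye(3)[i]) - bce(w - eps * np.eye(3)[i])) / (2 * eps)
                 for i in range(3)])
print(grad)            # approximately [-3.  1.  0.]
print(w - 1.0 * grad)  # gradient descent step with lr = 1: approximately [4. 0. 1.]
```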
e) After helping your friend debug, you want to showcase the power of convolutional layers. Deduce what kind of 3 × 3 convolutional filter was used to generate the output (right) of the grayscale image (left) and write down its 3 × 3 values.

Vertical edge detector (1p)
1  0  −1
1  0  −1
1  0  −1   (1p)
f) He finally introduces you to his real problem. He wants to find 3 × 3 black crosses in grayscale images, i.e., each pixel has a value between 0 (black) and 1 (white). You notice that you can actually hand-craft such a filter. Write down the numerical values of a 3 × 3 filter that maximally highlights the position of black crosses.

−1   1  −1
 1  −1   1
−1   1  −1   (2p)
Flipping & scaling are OK, even though pixel values were given.
Problem 6 Recurrent Neural Networks and LSTMs (12 credits)
a) Consider a vanilla RNN cell of the form ht = tanh(V · ht−1 + W · xt). The figure below shows the input sequence x1, x2, and x3.
Given the dimensions xt ∈ R^4 and ht ∈ R^12, what is the number of parameters in the RNN cell? Neglect the bias parameter.

V ∈ R^(12×12) and W ∈ R^(12×4), so the number of parameters is 12 · 12 + 12 · 4 = 144 + 48 = 192.
b) If xt is the 0 vector, then ht = ht−1. Discuss whether this statement is correct.

False (1 pt): after the transformation with V and the non-linearity, xt = 0 does not lead to ht = ht−1; instead ht = tanh(V · ht−1) (1 pt). Full points require an explanation; solely the equation is not sufficient.
c) Now consider the following one-dimensional ReLU-RNN cell:
ht = ReLU(V · ht−1 + W · xt), with V = 1, W = 2, h0 = −3, and input sequence x1 = 1, x2 = 2, x3 = 0.

h1 = ReLU(1 · (−3) + 2 · 1) = 0 (1 pt)
h2 = ReLU(1 · 0 + 2 · 2) = 4 (1 pt)
h3 = ReLU(1 · 4 + 2 · 0) = 4 (1 pt)
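A tiny Python sketch of this forward pass (values taken from the solution above):

```python
def relu(v):
    return max(v, 0.0)

V, W = 1.0, 2.0              # recurrent and input weights
h = -3.0                     # h0
for x in [1.0, 2.0, 0.0]:    # x1, x2, x3
    h = relu(V * h + W * x)
    print(h)                 # prints 0.0, 4.0, 4.0
```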
d) Calculate the derivatives ∂h3/∂V, ∂h3/∂W, and ∂h3/∂x1 for the forward pass of the ReLU-RNN cell of (c). Use that ∂ReLU(x)/∂x evaluated at x = 0 equals 1.

ht = ReLU(V · ht−1 + W · xt) = ReLU(zt). Here z1 = −1, z2 = 4, z3 = 4, so ReLU′(z1) = 0 and ReLU′(z2) = ReLU′(z3) = 1, where ReLU′(z) denotes ∂ReLU(x)/∂x evaluated at x = z.

∂h3/∂V = ReLU′(z3) · h2 + ReLU′(z2) · V · h1 + ReLU′(z1) · V² · h0 = 1 · 4 + 1 · 1 · 0 + 0 · 1 · (−3) = 4 (1 pt)
∂h3/∂W = ReLU′(z3) · x3 + ReLU′(z2) · V · x2 + ReLU′(z1) · V² · x1 = 1 · 0 + 1 · 2 + 0 · 0 = 2 (1 pt)
∂h3/∂x1 = ReLU′(z3) · V · ReLU′(z2) · V · ReLU′(z1) · W = 1 · 1 · 1 · 1 · 0 · 2 = 0 (1 pt)

Only a correct and calculated result gives the point.
e) A Long Short-Term Memory (LSTM) unit is defined as
g1 = σ(W1 · xt + U1 · ht−1),
g2 = σ(W2 · xt + U2 · ht−1),
g3 = σ(W3 · xt + U3 · ht−1),
c̃t = tanh(Wc · xt + Uc · ht−1),
ct = g2 ◦ ct−1 + g3 ◦ c̃t,
ht = g1 ◦ ct.

g1 = output gate, g2 = forget gate, g3 = update gate (1 pt)
ct: cell state (1 pt)
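A minimal numpy sketch of one step of this cell, following the equations above (randomly initialized matrices, purely illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

d_x, d_h = 4, 12
rng = np.random.default_rng(0)
W1, W2, W3, Wc = (rng.standard_normal((d_h, d_x)) for _ in range(4))
U1, U2, U3, Uc = (rng.standard_normal((d_h, d_h)) for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    g1 = sigmoid(W1 @ x_t + U1 @ h_prev)   # output gate
    g2 = sigmoid(W2 @ x_t + U2 @ h_prev)   # forget gate
    g3 = sigmoid(W3 @ x_t + U3 @ h_prev)   # update gate
    c_tilde = np.tanh(Wc @ x_t + Uc @ h_prev)
    c_t = g2 * c_prev + g3 * c_tilde       # "◦" is element-wise multiplication
    h_t = g1 * c_t
    return h_t, c_t

h, c = lstm_step(np.ones(d_x), np.zeros(d_h), np.zeros(d_h))
```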
Additional space for solutions: clearly mark the (sub)problem your answers are related to and strike out invalid solutions.