Deep Learning
Unit-1
Fundamentals of Deep Learning: Artificial Intelligence, History of Machine learning: Probabilistic
Modeling, Early Neural Networks, Kernel Methods, Decision Trees, Random forests and Gradient
Boosting Machines, Fundamentals of Machine Learning: Four Branches of Machine Learning,
Evaluating Machine learning Models, Overfitting and Underfitting. [Text Book 2]
------------------------------------------------------------------------------------------------------------------------
The “deep” here stands for the idea of successive layers of representations. The number of
layers that contribute to a model of the data is called the depth of the model. Other appropriate
names for the field could have been layered representations learning and hierarchical
representations learning. Modern deep learning often involves tens or even hundreds of
successive layers of representations. In deep learning, these layered representations are
(almost always) learned via models called neural networks.
To do machine learning, we need THREE things:
o Input data points
o Examples of the expected output
o A way to measure whether the algorithm is doing a good job
A machine-learning model transforms its input data into meaningful outputs.
The central problem in machine learning and deep learning is to meaningfully
transform data.
Let us take an example to understand these THREE things. Consider an x-axis, a
y-axis, and some points represented by their coordinates in the (x, y) system,
as shown in the figure.
The network transforms the digit image into representations that are increasingly
different from the original image and increasingly informative about the final result.
Figure 4: The loss score is used as a feedback signal to adjust the weights
In this context, learning means finding a set of values for the weights of all
layers in a network, such that the network will correctly map example inputs
to their associated targets.
Finding the correct values for all of them may be a daunting (frightening) task,
because a change in one parameter will affect the other layers.
To control the neural network, first we have to observe the predicted value and
measure how far this output is from what we expected. This is the job of the
loss function of the network, also called the objective function. The loss
function takes the predictions of the network and the true target (what you
wanted the network to output) and computes a distance score.
Since the weights are initialized with random values, the loss score is
obviously high at first.
But with every example (item or image) the network processes, the weights
are adjusted a little in the correct direction, and the loss score decreases.
This is the training loop, which is repeated a sufficient number of times to
reduce the loss score. Then the outputs will be close to the targets.
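The following short Python sketch illustrates this training loop on a toy one-parameter model (the data, the model y = w*x + b, and the learning rate are invented for illustration and are not from the text): the weights start at random values, so the loss score starts high, and each pass adjusts them a little in the correct direction so the loss decreases.

import numpy as np

# Toy illustration of the training loop: a single "layer" y = w*x + b trained
# with a mean-squared-error loss. All values here are invented for illustration.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y_true = 3.0 * x + 2.0                       # targets the model should learn to map to

w, b = rng.normal(), rng.normal()            # weights initialized randomly -> loss starts high
lr = 0.1                                     # how far to adjust the weights on each step

for step in range(200):                      # the training loop, repeated many times
    y_pred = w * x + b                       # predictions of the model
    loss = np.mean((y_pred - y_true) ** 2)   # loss score: distance from the targets
    grad_w = np.mean(2 * (y_pred - y_true) * x)  # feedback signal for w
    grad_b = np.mean(2 * (y_pred - y_true))      # feedback signal for b
    w -= lr * grad_w                         # adjust the weights a little in the
    b -= lr * grad_b                         # correct direction -> loss decreases

print(round(w, 2), round(b, 2))              # close to 3.0 and 2.0 after training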
Applications of Deep Learning
In particular, deep learning has achieved the following breakthroughs, all in
historically difficult areas of machine learning:
o Near-human-level image classification
o Near-human-level speech recognition
o Near-human-level handwriting transcription
o Improved machine translation
o Improved text-to-speech conversion
o Digital assistants such as Google Now and Amazon Alexa
o Near-human-level autonomous driving
o Improved ad targeting and improved search results on the web
o Ability to answer natural-language questions
o Superhuman Go playing
Probabilistic Modeling:
Probabilistic modeling is the application of the principles of statistics to data
analysis. One of the best-known algorithms in this category is the Naive Bayes
algorithm; a closely related model is logistic regression.
Early Neural Networks:
The early neural networks have been replaced by modern neural networks, but they
laid the path to deep learning. The core ideas of neural networks were coined as
early as the 1950s, but in the form they then took they were ignored for decades.
This changed in the mid-1980s, when multiple people independently rediscovered the
Backpropagation algorithm, which revived interest in neural networks.
Kernel Methods:
The kernel methods are a group of classification algorithms. The support vector
machine (SVM) is one of the best-known algorithms in this category. The SVM was developed by
Vladimir Vapnik and Corinna Cortes in the 1990s at Bell Labs. SVMs aim at solving classification
problems by finding good decision boundaries between two sets of points belonging to two
different categories. This decision boundary can be linear or non-linear and separates the
two spaces belonging to the two categories. SVMs proceed to find these boundaries in
two steps:
1. The data is mapped to a new high-dimensional representation where the decision
boundary can be expressed as a hyperplane.
2. A good decision boundary (separation hyperplane) is computed by trying to maximize
the distance between the hyperplane and the closest data points from each class, a step
called maximizing the margin.
The process of mapping the data to a high-dimensional space can be carried out using
kernel methods. An example is given below.
Kernel methods are used to transform non-linearly separable data into linearly
separable data (e.g., by adding the feature y = power(x, 2)), as in the table below.
x        y = power(x, 2)
1.2      1.44
1.4      1.96
1.3      1.69
1.5      2.25
1.3      1.69
1.2      1.44
[Figure: plot of the original x values.]
But if we add a second feature using the polynomial expression y = power(x, 2), then the
dataset becomes linearly separable, as shown below.
[Figure: plot of the points after adding the y = power(x, 2) feature, showing that they are now linearly separable.]
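The same idea can be sketched in a few lines with scikit-learn. The x values and class labels below are invented for illustration (they are not the table values above); the point is that a class which depends on the magnitude of x cannot be separated by a single threshold on x, but becomes linearly separable once the second feature power(x, 2) is added.

import numpy as np
from sklearn.svm import SVC

# Invented toy data: one feature x, class 1 at both ends, class 0 in the middle,
# so the classes are not linearly separable in the original one-dimensional space.
x = np.array([-2.0, -1.5, -1.2, -0.3, 0.0, 0.4, 1.3, 1.6, 2.1])
labels = (np.abs(x) > 1.0).astype(int)

# Kernel-method idea: map the data to a higher-dimensional space by adding
# the second feature power(x, 2); there a linear decision boundary exists.
X_mapped = np.column_stack([x, x ** 2])

clf = SVC(kernel="linear", C=10.0).fit(X_mapped, labels)
print(clf.score(X_mapped, labels))   # accuracy on the mapped data, where a line separates the classes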
Decision Trees:
Decision trees are tree-like structures that let you classify input data points or
predict output values given inputs, as shown in Figure 7. A decision tree is a supervised
learning technique that can be used for both classification and regression problems, but
mostly it is preferred for solving classification problems. Decision trees are easy to
visualize and interpret. A decision tree contains three main elements: decision nodes,
branches, and leaf nodes. Decision nodes can have multiple branches, whereas leaf nodes
cannot contain any further branches.
Example:
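The original figure for this example is not reproduced here; as a substitute, the following minimal scikit-learn sketch (using the built-in Iris dataset, chosen only for illustration) shows how a decision tree builds decision nodes that test a feature and leaf nodes that hold a class.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Minimal decision-tree sketch on the Iris dataset (illustration only).
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Print the learned tree: decision nodes test a feature against a threshold,
# leaf nodes contain the predicted class.
print(export_text(tree, feature_names=list(iris.feature_names)))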
Random Forest:
Random Forest is a popular machine learning algorithm that belongs to the
supervised learning technique. It can be used for both Classification and Regression
problems in ML. It is a collection of a large number of specialized decision trees. It is based on
the concept of ensemble learning, which is the process of combining multiple classifiers to
solve a complex problem and to improve the performance of the model. A greater number
of trees in the forest leads to higher accuracy and prevents the problem of overfitting.
For the same data, different decision trees are created; instead of depending on one decision
tree, the random forest takes the decision from each tree, and the final output is predicted
based on the majority vote of the predictions.
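A minimal scikit-learn sketch of this idea is given below (the Iris dataset and 100 trees are arbitrary illustration choices): an ensemble of decision trees is trained, and the forest's prediction is the majority vote of the individual trees.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Minimal random-forest sketch (illustration only).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# An ensemble of 100 decision trees, each trained on a random sample of the data;
# the final prediction is the majority vote of the individual trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))   # accuracy of the majority-vote predictions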
Supervised learning:
This is the most common branch of machine learning: the model learns to map input data
to known targets (labels). Some application examples are:
Syntax tree prediction – Used to predict the syntax tree of a given sentence.
Object detection – Given a picture, it draws a bounding box around certain objects,
considering their internal features in the picture or image.
Image segmentation – Divides the image into sub-parts based on the pixel
intensity values.
Unsupervised learning:
This branch is used to find interesting information in the input data without the
knowledge of any known targets. It is mainly used in data visualization, data compression,
data denoising, and understanding the correlations present in the data. It is often treated
as the bread and butter of data analysts before they attempt to use any supervised learning
technique. There are two well-known categories of unsupervised learning, as follows:
Dimensionality reduction
Clustering
Self-supervised learning:
It is a specific type of supervised learning, but it deserves to be considered as a
different category. It is used to learn patterns without human involvement. Labels
are still involved, but they are generated from the input data using heuristic techniques.
For instance, autoencoders are a well-known instance of self-supervised learning,
where the generated targets are the inputs themselves, unmodified (a short sketch
is given after this list).
In the same way, predicting the next frame of a video when some past frames
are given is self-supervised learning.
It is also used in predicting the next words in a text, when the previous words are
given.
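Sketch of the autoencoder case mentioned above, in Keras. The layer sizes, the random stand-in data, and the training settings are arbitrary choices for illustration; the point is only that the targets passed to fit() are the inputs themselves, unmodified.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Minimal self-supervised sketch: an autoencoder whose targets are its own inputs.
inputs = keras.Input(shape=(784,))
encoded = layers.Dense(32, activation="relu")(inputs)        # compressed representation
decoded = layers.Dense(784, activation="sigmoid")(encoded)   # reconstruction of the input

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")

x = np.random.rand(256, 784).astype("float32")   # stand-in data, illustration only
autoencoder.fit(x, x, epochs=2, batch_size=32)   # the inputs are also the targets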
Reinforcement learning:
In reinforcement learning, an agent receives information about its environment and
learns to choose actions that will maximize some reward. This is mostly used in games to
predict the next move which minimizes the loss and maximizes the reward. Some of the
applications are as follows:
Self-driving cars
Robotics
Education
Evaluating Machine Learning Models:
To evaluate a machine-learning model, the available data is typically split into Training,
Validation and Test sets. Before starting the process, random shuffling can be done to
mix the data well.
K-FOLD VALIDATION
Here we split the data into K partitions of equal size. For each partition i, we train a
model on the remaining K – 1 partitions and evaluate it on partition i. The same process is
repeated K times. The final score of the model is the average of the K scores obtained.
This is preferred when the model shows significant variance on the test set; here, no
single fold alone has to serve as the validation set.
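A minimal sketch of the K-fold procedure with scikit-learn is given below; the dataset, the estimator, and K = 4 are arbitrary illustration choices.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kfold = KFold(n_splits=4, shuffle=True, random_state=0)   # K = 4 equal-sized partitions

scores = []
for train_idx, val_idx in kfold.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])                 # train on the remaining K - 1 partitions
    scores.append(model.score(X[val_idx], y[val_idx]))    # evaluate on partition i

print(np.mean(scores))   # final score = average of the K scores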
We can compute some error measure on the training set, called the training error, and we
try to reduce this training error. What separates machine learning from pure optimization is
that we want the generalization error, also called the test error, to be low as well. The
generalization error is defined as the expected value of the error on a new input.
We typically estimate the generalization error of a machine learning model by measuring its
performance on a test set of examples that were collected separately from the training set.
The test error can be computed using the MSE (Mean Squared Error) as follows (a short
numeric example is given after these steps):
1. Measure the distance of the observed y-values from the predicted y-values at each
value of x: (y - y`)
2. Square each of these distances.
3. Compute the average (mean) of the squared distances.
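In code, these steps amount to only a few lines; the y-values and predictions below are invented numbers for illustration.

import numpy as np

# Invented example values: observed targets y and the model's predictions y`.
y_true = np.array([3.1, 2.4, 5.0, 4.2])
y_pred = np.array([2.9, 2.8, 4.6, 4.5])

errors = y_true - y_pred    # 1. distance between observed and predicted values
squared = errors ** 2       # 2. square each of these distances
mse = squared.mean()        # 3. average the squared distances
print(mse)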
The factors determining how well a machine learning algorithm will perform are its ability to:
1. Make the training error small.
2. Make the gap between training error and test error small.
These two factors correspond to the two central challenges in machine learning: underfitting
and overfitting. Underfitting occurs when the model is not able to obtain a sufficiently low
error value on the training set; that is, the model has not learned enough from the training
data. Overfitting occurs when the gap between the training error and the test error is
too large. In this case the model has learned the training set too completely, which results
in a low training error, but when new items or samples are given, the difference between the
training error and the test error becomes large.
Capacity plays the major role in controlling underfitting and overfitting. Capacity is
nothing but the range of functions that the model can apply to fit the dataset. Models with
low capacity may struggle to fit the training set. Models with high capacity can overfit.
To prevent a model from learning misleading or irrelevant patterns found in the training
data, the best solution is to get more training data. A model trained on more data will
naturally generalize better. When that isn't possible, the next-best solution is to modulate
the quantity of information the model is allowed to store, or to add constraints on what
information it can store. The process of fighting overfitting this way is called
regularization. Let's review some of the most common regularization techniques:
1. Reducing the network's size
The simplest way to prevent overfitting is to reduce the size of the model: the number
of learnable parameters in the model. This is often referred to as the model's capacity.
2. Adding weight regularization
A simple model in this context is a model where the distribution of parameter values
has less entropy (or a model with fewer parameters, as you saw in the previous
section). Thus a common way to mitigate overfitting is to put constraints on the
complexity of a network by forcing its weights to take only small values, which
makes the distribution of weight values more regular. This is called weight
regularization. It is done by adding to the loss (cost) function of the network a cost
associated with having large weights. This cost comes in two flavors (a short Keras
sketch follows the list):
L1 regularization – The cost added is proportional to the absolute value of the
weight coefficients (the L1 norm of the weights).
L2 regularization – The cost added is proportional to the square of the value of the
weight coefficients (the L2 norm of the weights).
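In Keras, weight regularization is added as a per-layer argument, as in the sketch below; the layer sizes and the L2 factor 0.001 are arbitrary illustrative values.

from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Sketch of L2 weight regularization: every weight coefficient of the regularized
# layers adds 0.001 * weight ** 2 to the total loss of the network.
model = keras.Sequential([
    keras.Input(shape=(10000,)),
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])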
3. Adding dropout
Dropout is one of the most effective and most commonly used regularization
techniques for neural networks, developed by Geoff Hinton and his students at the
University of Toronto. Dropout, applied to a layer, consists of randomly dropping out
(setting to zero) a number of output features of the layer during training.
Let’s say a given layer would normally return a vector [0.2, 0.5, 1.3, 0.8, 1.1] for a
given input sample during training. After applying dropout, this vector will have a
few zero entries distributed at random: for example, [0, 0.5, 1.3, 0, 1.1].
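In Keras, dropout is introduced via the Dropout layer, applied to the output of the layer right before it; the rate 0.5 and the layer sizes below are illustrative choices.

from tensorflow import keras
from tensorflow.keras import layers

# Sketch of dropout: Dropout(0.5) randomly sets half of the previous layer's
# output features to zero during training (and is inactive at test time).
model = keras.Sequential([
    keras.Input(shape=(10000,)),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),                 # applied to the output of the Dense layer above
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])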
-------------------End of Unit-1-----------