Unit - 2
The basic building blocks of artificial neural networks are artificial neurons, or nodes, which are
interconnected in layers.
An Artificial Neural Network has an input layer, one or more hidden layers, and an output layer. The input layer receives data from the outside world that the neural network needs to analyze or learn from. This data then passes through one or more hidden layers that transform the input into a representation that is useful for the output layer. Finally, the output layer produces the network's response to the input data provided.
In most neural networks, units in one layer are connected to units in the next layer. Each of these connections has a weight that determines the influence of one unit on another. As data flows from unit to unit, the network learns more and more about the data, which eventually results in an output from the output layer.
1. Input Layer: This layer receives the raw input data. Each node in the input layer
represents a feature or attribute of the input data.
2. Hidden Layers: These are intermediate layers between the input and output
layers. Each node in a hidden layer performs a weighted sum of the inputs from
the previous layer, applies an activation function to the result, and passes the
output to the next layer. Multiple hidden layers allow neural networks to learn
complex relationships in the data.
3. Output Layer: This layer produces the final output of the neural network. The
number of nodes in the output layer depends on the type of problem being
solved. For example, in a binary classification problem, there might be one node
representing the probability of belonging to one class and another node
representing the probability of belonging to the other class.
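To make the weighted-sum-plus-activation computation concrete, here is a minimal NumPy sketch of a forward pass through one hidden layer and an output layer; the layer sizes, the sigmoid activation, and the random weights are illustrative choices, not part of the text above.
```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Toy input: one example with 3 features (values chosen arbitrarily).
x = np.array([0.5, -1.2, 3.0])

# Hidden layer: 4 units, each with one weight per input feature plus a bias.
W_hidden = np.random.randn(4, 3) * 0.1
b_hidden = np.zeros(4)

# Output layer: 1 unit (e.g. a binary-classification score).
W_out = np.random.randn(1, 4) * 0.1
b_out = np.zeros(1)

# Forward pass: weighted sum -> activation, layer by layer.
h = sigmoid(W_hidden @ x + b_hidden)   # hidden-layer activations
y = sigmoid(W_out @ h + b_out)         # network output in (0, 1)
print(y)
```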
Appropriate Problems for Learning Neural Networks
*Instances are represented by many attribute-value pairs: the target function to be learned is defined
over instances that can be described by a vector of predefined features.
*Training examples may contain errors: ANN learning methods are quite robust to noise
in the training data.
*Long training times are acceptable: Network training algorithms typically require longer
training times than, say, decision tree learning algorithms. Training times can range from
a few seconds to many hours, depending on factors such as the number of weights in
the network, the number of training examples considered, and the settings of various
learning algorithm parameters.
*Fast evaluation of the learned target function may be required. Although ANN learning
times are relatively long, evaluating the learned network, in order to apply it to a
subsequent instance, is typically very fast.
*The ability for humans to understand the learned target function is not important. The
weights learned by neural networks are often difficult for humans to interpret. Learned
neural networks are less easily communicated to humans than learned rules.
Perceptrons
"perceptrons," which are a type of artificial neuron used in artificial neural networks
*Limitations of the Perceptron
The perceptron model has some limitations that can make it unsuitable for certain types of problems:
Limited to linearly separable problems
Convergence issues with non-separable data
Requires labeled data
Sensitivity to input scaling
Lack of hidden layers
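To see the linear-separability limitation in practice, the sketch below (the update rule is the classic perceptron rule; the data and learning rate are illustrative) trains a perceptron on AND, which is linearly separable, and on XOR, which is not:
```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=50):
    # Weights and bias start at zero; y is expected to contain 0/1 labels.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            update = lr * (target - pred)
            w += update * xi
            b += update
            errors += int(update != 0)
        if errors == 0:          # converged: every example classified correctly
            return w, b, True
    return w, b, False           # never converged within the epoch budget

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])   # linearly separable
y_xor = np.array([0, 1, 1, 0])   # not linearly separable

print(train_perceptron(X, y_and)[2])  # True  -> the perceptron learns AND
print(train_perceptron(X, y_xor)[2])  # False -> the perceptron cannot learn XOR
```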
Multi-Layer Neural Network
To be precise, a fully connected multi-layered neural network is known as a Multi-Layer Perceptron (MLP). A multi-layered neural network consists of multiple layers of artificial neurons, or nodes. Unlike single-layer neural networks, most networks used today are multi-layered. MLPs are a type of artificial neural network with multiple layers of neurons, organized as follows:
1. Input Layer: The input layer consists of neurons that receive input signals from
the external environment or other systems. Each neuron in the input layer
represents a feature or attribute of the input data.
2. Hidden Layers: Hidden layers are intermediate layers between the input and
output layers. Each neuron in a hidden layer receives input from neurons in the
previous layer, performs a weighted sum of the inputs, applies an activation
function, and then passes the output to neurons in the next layer. The number of
hidden layers and the number of neurons in each hidden layer are configurable
parameters of the network architecture.
3. Output Layer: The output layer produces the final output of the neural network.
The number of neurons in the output layer depends on the nature of the task
being solved. For example, in binary classification, there may be one neuron
representing the probability of belonging to one class and another neuron
representing the probability of belonging to the other class. In multi-class
classification, there will be one neuron for each class.
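As a worked example of configuring these layers, the following sketch uses scikit-learn's MLPClassifier on a synthetic binary-classification dataset; the two hidden layers of 16 and 8 neurons and all other hyperparameters are illustrative assumptions, not values given in the text:
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic binary-classification data: 20 input features -> 20 input-layer nodes.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Two hidden layers (16 and 8 neurons) are configurable architecture choices;
# the output layer size is inferred from the number of classes.
clf = MLPClassifier(hidden_layer_sizes=(16, 8), activation="relu",
                    max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```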
Backpropagation algorithm
Backpropagation is an algorithm that propagates the error observed at the output nodes backward through the network toward the input nodes; hence it is simply referred to as the backward propagation of errors. By applying the chain rule layer by layer, it efficiently computes the gradient of the loss with respect to every weight in the network, and these gradients are then used to update the weights, typically via gradient descent.
In summary, the backpropagation algorithm has revolutionized the field of deep learning and
remains a fundamental tool for training neural networks. Its ability to efficiently compute gradients
enables the training of deep, complex models on large-scale datasets, leading to state-of-the-art
performance in various machine learning tasks.
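The following is a minimal sketch of backpropagation for a network with one hidden layer, sigmoid activations, and a squared-error loss; the XOR data, layer sizes, learning rate, and epoch count are all illustrative assumptions:
```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy dataset: XOR, which a single perceptron cannot learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer with 4 units, one output unit.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.5

for epoch in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)          # hidden activations
    out = sigmoid(h @ W2 + b2)        # network outputs

    # Backward pass: chain rule applied layer by layer.
    err = out - y                          # dLoss/dout for a squared-error loss
    d_out = err * out * (1 - out)          # delta at the output layer
    d_hid = (d_out @ W2.T) * h * (1 - h)   # delta propagated back to the hidden layer

    # Gradient-descent weight updates.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid;  b1 -= lr * d_hid.sum(axis=0)

print(out.round(3))   # typically approaches [[0], [1], [1], [0]]
```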
Backpropagation-trained networks have been applied successfully to practical tasks such as face recognition. Beyond these basics, several advanced neural network topics and architectures are widely used:
1. Deep Learning Architectures: Deep learning encompasses neural network
architectures with many layers, allowing them to learn hierarchical
representations of data. Advanced architectures include Convolutional Neural
Networks (CNNs) for image processing, Recurrent Neural Networks (RNNs) for
sequential data, Long Short-Term Memory networks (LSTMs) and Gated
Recurrent Units (GRUs) for handling long-term dependencies, and Transformers
for natural language processing tasks.
2. Transfer Learning and Fine-Tuning: Transfer learning involves leveraging neural
network models pre-trained on large datasets and fine-tuning them on a smaller,
task-specific dataset. This approach can significantly reduce the amount of
labeled data required to train a model and improve generalization performance
(a minimal fine-tuning sketch is given after this list).
3. Generative Adversarial Networks (GANs): GANs are a class of neural networks
that consist of two networks—a generator and a discriminator—that are trained
simultaneously through a min-max game. GANs can generate realistic synthetic
data, such as images, text, and audio, and have applications in image generation,
data augmentation, and domain adaptation.
4. Reinforcement Learning (RL): RL is a branch of machine learning where an
agent learns to make decisions by interacting with an environment to maximize
cumulative rewards. Deep RL combines deep neural networks with RL algorithms,
enabling the learning of complex policies for tasks such as game playing,
robotics, and autonomous driving.
5. Meta-Learning: Meta-learning, or learning to learn, involves training models that
can learn new tasks or adapt to new environments with minimal data or training
samples. Meta-learning algorithms aim to discover common patterns across
different tasks and leverage this knowledge to facilitate rapid learning of new
tasks.
6. Neuroevolution: Neuroevolution combines neural networks with evolutionary
algorithms to optimize network architectures or parameters through genetic
algorithms, evolutionary strategies, or other evolutionary computation
techniques. It is particularly useful for training neural networks in environments
with limited data or when manual design is challenging.
7. Adversarial Robustness: Adversarial attacks involve intentionally perturbing
input data to mislead neural network models into making incorrect predictions.
Adversarial training methods aim to improve the robustness of neural networks
against such attacks by augmenting training data with adversarial examples or
incorporating adversarial perturbations directly into the training process.
8. Capsule Networks: Capsule networks are a novel type of neural network
architecture designed to better capture hierarchical relationships and spatial
hierarchies in data. They use groups of neurons called capsules to represent
entities or parts of objects and have shown promise in tasks such as object
recognition and pose estimation.
9. Explainable AI (XAI): XAI techniques aim to provide insights into the decision-
making process of neural networks, making them more interpretable and
transparent to users. Methods include attention mechanisms, saliency maps, and
model distillation, which help understand which features or parts of input data
are relevant for making predictions.
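As referenced in item 2, here is a minimal transfer-learning and fine-tuning sketch; it assumes PyTorch and a recent torchvision are available, and the choice of ResNet-18, the two-class head, and the dummy batch are illustrative assumptions:
```python
import torch
import torch.nn as nn
from torchvision import models

# Load a network pre-trained on ImageNet (assumes the torchvision >= 0.13 weights API).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pre-trained feature extractor so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a new head for a 2-class task.
model.fc = nn.Linear(model.fc.in_features, 2)

# Fine-tune: only the new head's parameters are passed to the optimizer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of 224x224 RGB images.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(loss.item())
```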
Evaluating hypotheses:
Whenever you form a hypothesis from a given training data set, for example the hypothesis learned for the EnjoySport example, where the attributes of an instance decide whether a person will be able to enjoy their favorite sport, you need a way to judge how good that hypothesis is.
To test or evaluate how accurate the considered hypothesis is, we use different statistical measures. Evaluating hypotheses is an important step in training the model.
*To evaluate hypotheses precisely, focus on these points. When statistical methods are applied to estimate hypothesis accuracy, three questions arise:
First, given the observed accuracy of a hypothesis over a limited sample of data, how well does this estimate its accuracy over additional examples?
Second, if one hypothesis outperforms another over a sample of data, how likely is it that it is more accurate in general?
Third, what is the best strategy for using limited data to both learn a hypothesis and measure its accuracy?
Motivation:
There are situations where the accuracy of the model plays a decisive role in whether the model is adopted or not. For example, consider using a trained model for medical treatment: we need high accuracy in order to depend on the information the model provides.
When we need to learn a hypothesis and estimate its future accuracy based
on a small collection of data, we face two major challenges:
1. What is the best estimate of the accuracy of h over future instances taken from the
same distribution, given a hypothesis h and a data sample containing n examples
picked at random according to the distribution D?
2. What is the margin of error in this estimate of accuracy?
Sample Error:
The sample error of hypothesis h with respect to target function f and data sample S, denoted error_S(h), is the proportion of examples in S that h misclassifies:
error_S(h) = (1/n) · Σ_{x ∈ S} δ(f(x) ≠ h(x)),
where n is the number of examples in S and δ(f(x) ≠ h(x)) is 1 if f(x) ≠ h(x), and 0 otherwise.
True Error:
The true error of hypothesis h with respect to target function f and distribution D, denoted error_D(h), is the probability that h will misclassify an instance drawn at random according to D:
error_D(h) = Pr_{x ~ D} [ f(x) ≠ h(x) ].
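As a small illustration (the hypothesis predictions and labels below are made up), the following sketch computes the sample error over n examples and the standard approximate 95% confidence interval for the true error, error_S(h) ± 1.96·sqrt(error_S(h)(1 − error_S(h))/n):
```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up target labels f(x) and hypothesis predictions h(x) over n examples.
n = 200
f_x = rng.integers(0, 2, size=n)                      # target function values
h_x = np.where(rng.random(n) < 0.9, f_x, 1 - f_x)     # h agrees with f ~90% of the time

# Sample error: fraction of examples the hypothesis misclassifies.
sample_error = np.mean(h_x != f_x)

# Approximate 95% confidence interval for the true error error_D(h).
margin = 1.96 * np.sqrt(sample_error * (1 - sample_error) / n)
print(f"error_S(h) = {sample_error:.3f}, "
      f"95% CI for error_D(h): [{sample_error - margin:.3f}, {sample_error + margin:.3f}]")
```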
Basics of Sampling Theory
For estimating hypothesis accuracy, statistical methods are applied. This section covers evaluating hypotheses and the basics of sampling theory.
Let's look at the terminology involved and what each term means.
1)*Random Variable:
A random variable may be thought of as the name of a probabilistic experiment; its value is the outcome of that experiment. Any quantity whose outcome is not known with certainty in advance is modeled as a random variable.
2)*Probability Distribution:
A probability distribution is a statistical function that specifies all possible values of a random variable and their probabilities within a given range.
The range is bounded by the minimum and maximum possible values, and where a given value falls on the probability distribution is determined by the shape of the distribution.
The mean (average), standard deviation, skewness, and kurtosis of the distribution are among the parameters that describe this shape.
3)*Expected Value:
The expected value (EV) of a random variable is its long-run average value; in finance, for example, it is the value an investment is predicted to have at some time in the future.
In statistics and probability analysis, the expected value is computed by multiplying each possible outcome by the likelihood that it will occur and then summing all of those products: E[Y] = Σ_i y_i · Pr(Y = y_i).
By assessing expected values, investors can choose the scenario most likely to produce the desired result.
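A minimal sketch, with made-up outcomes and probabilities, of computing an expected value as the probability-weighted sum of outcomes:
```python
import numpy as np

# Made-up discrete random variable: possible payoffs and their probabilities.
values = np.array([10.0, 50.0, -20.0])
probs = np.array([0.5, 0.2, 0.3])      # must sum to 1

expected_value = np.sum(values * probs)   # E[Y] = sum_i y_i * Pr(Y = y_i)
print(expected_value)                     # 10*0.5 + 50*0.2 + (-20)*0.3 = 9.0
```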
4)*The Variance of a Random Variable:
In statistics, variance measures how far a set of values, or a random variable, deviates from its mean. It is calculated as the probability-weighted average of squared deviations from the expected value.
As a result, the greater the variance, the greater the difference between the values and the mean. A smaller variance, on the other hand, indicates that the values are closer to the mean.
The variance of a random variable Y is defined as
Var(Y) = E[(Y − E[Y])²] = Σ_i (y_i − E[Y])² · Pr(Y = y_i).
5)*Standard Deviation:
The standard deviation measures the dispersion of a dataset relative to its mean and is calculated as the square root of the variance.
It is determined by computing each data point's deviation from the mean, averaging the squared deviations, and taking the square root.
When data points are further from the mean, there is more variation within the data set; as a result, the larger the standard deviation, the more spread out the data is.
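A small sketch, reusing the made-up random variable from the expected-value example, of computing variance and standard deviation directly from their definitions:
```python
import numpy as np

# Same made-up random variable as in the expected-value sketch.
values = np.array([10.0, 50.0, -20.0])
probs = np.array([0.5, 0.2, 0.3])

ev = np.sum(values * probs)                        # E[Y]
variance = np.sum(probs * (values - ev) ** 2)      # E[(Y - E[Y])^2]
std_dev = np.sqrt(variance)                        # square root of the variance
print(variance, std_dev)
```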
7)*Normal Distribution:
The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about its mean, indicating that values near the mean occur more frequently than values far from the mean. On a graph, the normal distribution appears as a bell curve.
It is also referred to as a bell-shaped probability distribution that describes many natural phenomena.
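A quick sketch, with made-up parameters, that draws samples from a normal distribution and checks the familiar rule that roughly 68% of values fall within one standard deviation of the mean:
```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 100.0, 15.0                      # made-up mean and standard deviation
samples = rng.normal(loc=mu, scale=sigma, size=100_000)

within_one_sigma = np.mean(np.abs(samples - mu) <= sigma)
print(f"fraction within one standard deviation: {within_one_sigma:.3f}")  # ~0.683
```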
9)*Estimator:
An estimator is a random variable Y used to estimate some parameter p of an underlying population. The estimand is the quantity being estimated (i.e., the one you wish to know).
For example, suppose you need to discover the average height of pupils at a 1000-student school. You measure a group of 30 children and find that their average height is 56 inches. This sample mean is your estimator: you estimate the population mean (your estimand) to be around 56 inches using the sample mean.
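A small simulation of this idea, using a made-up population of 1000 heights: the mean of a random sample of 30 pupils serves as an estimator of the population mean.
```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up population: heights (in inches) of 1000 pupils.
population = rng.normal(loc=55.0, scale=4.0, size=1000)

# Estimator: the mean of a random sample of 30 pupils.
sample = rng.choice(population, size=30, replace=False)
sample_mean = sample.mean()

print(f"sample mean (estimator):    {sample_mean:.2f}")
print(f"population mean (estimand): {population.mean():.2f}")
```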
Comparing Machine Learning Algorithms
This section aims to provide readers with different angles from which to view ML algorithms. With these perspectives, algorithms can be compared on common ground and analysed easily. The discussion is written with two major ML tasks in mind: regression and classification.
Time complexity
Under the RAM model [1], the "time" an algorithm takes is measured by the number of elementary operations it performs. While users and developers may be more concerned with the wall-clock time an algorithm takes to train a model, it is fairer to use the standard worst-case computational time complexity to compare the time the models take to train. Using computational complexity has the benefit of ignoring differences such as the computing power and architecture used at runtime and the underlying programming language, allowing us to focus on the fundamental differences in the elementary operations of the algorithms.
Note that the time complexity can be very different during training and testing. For example, parametric models like linear regression can have long training times but are efficient at test time.
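As a rough illustration of this training-versus-test asymmetry (the synthetic dataset and the use of wall-clock timing are illustrative only; the text above argues for comparing worst-case complexity rather than timings), the sketch below times fitting and predicting with linear regression:
```python
import time
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200_000, 50))                      # arbitrary synthetic regression data
y = X @ rng.normal(size=50) + rng.normal(size=200_000)  # noisy linear target

model = LinearRegression()

start = time.perf_counter()
model.fit(X, y)                  # training: solves a least-squares problem
fit_time = time.perf_counter() - start

start = time.perf_counter()
model.predict(X)                 # testing: essentially a matrix-vector product
predict_time = time.perf_counter() - start

print(f"fit: {fit_time:.3f}s, predict: {predict_time:.3f}s")
```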
Space complexity
Sample complexity
Bias-variance tradeoff
Parallelizability
Parametricity