
DLT Unit-2


Unit-II

Introducing Deep Learning:

Biological and Machine Vision, Human and Machine Language,

Artificial Neural Networks, Training Deep Networks, Improving Deep Networks.

G. Muni Nagamani, Assistant Professor, CSE, ALIET, Vijayawada. 1


Introducing Deep Learning
• Deep learning is a subfield of AI and machine learning that uses artificial neural networks to learn from large datasets,
inspired by the human brain's functioning.

• The human visual system processes information in layers, from edges to objects. Deep learning models mimic this
with deep networks that learn abstract features from raw data.

• Convolutional Neural Networks are widely used in image and video processing tasks, excelling in image
classification, object detection, and segmentation, often surpassing traditional methods.

• In machines, vision involves understanding visual data from images or videos. Deep learning models trained on large,
annotated datasets can recognize patterns and objects, enabling tasks like scene understanding.

• Deep learning has enabled machines to match or exceed human performance on specific, well-defined tasks in areas
such as medical image analysis, autonomous driving, and facial recognition.

• Deep learning has significantly advanced NLP, with RNNs and Transformer models improving machine
translation, sentiment analysis, and chatbot development.
Biological and Machine Vision
• Computer vision relies on algorithms and artificial intelligence to process, analyze, and interpret visual data,
mimicking human visual perception.

• Human vision is a complex biological process involving the eyes, optic nerves, and brain, which work
together to perceive and interpret visual information.

• Computer vision operates through computational models, while human vision is driven by biological
mechanisms.

• Both computer vision and human vision aim to understand and interpret visual data, but they differ
significantly in methods and capabilities.

• Recognizing the differences and similarities between these two domains is crucial for advancing technology
and enhancing human visual perception.

What is Computer Vision?

• Computer vision is the ability of machines to understand and interpret visual data.

• It is a field of artificial intelligence that uses algorithms and computational models to analyze
images and videos.

• By mimicking human visual processing, computer vision can detect patterns, recognize
objects, and extract meaningful information from visual input.

• This technology has diverse applications, including robotics, autonomous vehicles,
surveillance systems, medical imaging, and augmented reality.



Fundamentals of Human Vision
• Human vision is a biological process that enables us to perceive and interpret the visual
world.

• The process starts with the eyes capturing light and sending signals to the brain for
processing.

• The cornea, pupil, and lens work together to focus light onto the retina.

• The retina contains specialized cells called cones and rods that detect light and convert it into
electrical signals.

• These electrical signals are transmitted through the optic nerves to the brain, where they are
processed to form our visual perception.



Computer Vision Vs. Human Vision
• Processing Mechanisms: Computer vision uses algorithms and computational models, while
human vision relies on complex neural networks and biological processes.

• Adaptability and Efficiency: Human vision is highly adaptable and efficient in recognizing
patterns, even in complex scenes or varied lighting conditions, whereas computer vision may
struggle in these situations.

• Handling Complex Scenes: Human vision integrates information from multiple sensory
channels to create a cohesive perception, while computer vision algorithms often focus on
specific visual features and may find it challenging to manage complex scenes or changing
conditions.
Computer Vision Tasks



Advantages and Applications of Computer Vision
• Processing Speed and Accuracy: Computer vision algorithms can process large amounts of visual data
quickly and, on well-defined tasks like object recognition and image classification, can match or
exceed human accuracy.

• Object Recognition: Computer vision excels at identifying and categorizing objects in images and videos,
with practical applications in surveillance systems, autonomous vehicles, and medical imaging.

• Applications in Various Fields: Computer vision is used in robotics for environmental interaction, in medical
imaging for diagnosis and treatment, and in augmented reality to enhance our interaction with the digital world.



Limitations of Computer Vision

• Contextual Understanding: Computer vision algorithms can recognize objects and patterns
but often struggle to grasp the context and meaning behind visual scenes, something human
vision does naturally.

• Handling Ambiguity: Human vision can interpret ambiguous or incomplete visual
information using past experience and reasoning, whereas computer vision
algorithms often struggle in these situations.



Future of Computer Vision

• Growing Importance: Computer vision is expected to become increasingly integral to our
lives as technology advances.

• Technological Advancements: Improvements in machine learning, deep learning, and neural
networks will enhance the capabilities of computer vision algorithms.

• Integration in Various Fields: We can anticipate greater integration of computer vision in
areas such as robotics, autonomous systems, and healthcare.

• Research and Innovation: The ongoing challenge of bridging the gap between computer
vision and human vision will continue to drive research and innovation.



Human and Machine Language
• Human language is a complex and dynamic system of
communication used by humans to express thoughts, ideas, and
emotions.

• Human languages exist in three forms: speech, writing, and
gesture.

• Machine language is a low-level language made up of binary
numbers (bits) that a computer can execute directly. It is also known
as machine code or object code and is extremely difficult for humans
to read. It is the only language a computer can execute without
translation.
Human and Machine Language
Human Language
• Natural and Evolved: Developed organically over thousands of years, deeply embedded in human culture
and society.
• Complex and Expressive: This complexity enables detailed and expressive communication, allowing
individuals to convey intricate meanings, emotions, and cultural context.
• Contextual: Relies heavily on context, including tone of voice, facial expressions, and body language, to
convey accurate meaning.
• Subjective and Emotional: Capable of expressing emotions, opinions, and personal experiences,
influenced by individual perspectives.

• Learning and Creativity: Acquired through exposure and practice, with the ability to invent new words
and expressions creatively.
• Ambiguity: Often ambiguous, with words or phrases having multiple meanings depending on context,
which can sometimes lead to misunderstandings.



Human and Machine Language
Machine Language
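• Artificial and Designed: Created by humans for specific purposes. In this comparison the term is
used broadly for the formal languages that instruct computers, including programming languages like
Python, Java, and C++, which must be compiled or interpreted before a processor can run them.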
• Formal and Precise: Structured with strict syntax and semantics to give clear, unambiguous
instructions to computers.
• Lack of Context: Machines do not inherently understand context or emotions; they follow
instructions exactly as programmed.
• Objective and Logical: Based purely on logic and algorithms, with an emphasis on objective
processing of data and task execution.
• Learned by Instruction: Machines are programmed to perform specific tasks and do not learn
languages organically like humans.
• No Ambiguity: Designed to avoid ambiguity, ensuring predictable and reliable behavior in
computational tasks.
Natural Language Processing
• Natural Language Processing (NLP) is a field of Artificial Intelligence (AI) and Computer Science
that is concerned with the interactions between computers and humans in natural language.

• The goal of NLP is to develop algorithms and models that enable computers to understand,
interpret, generate, and manipulate human languages.



Natural Language Processing

• Natural Language Processing (NLP) is a subfield of artificial intelligence that deals with the
interaction between computers and humans in natural language.

• It involves the use of computational techniques to process and analyze natural language data,
such as text and speech, with the goal of understanding the meaning behind the language.

• NLP is used in a wide range of applications, including machine translation, sentiment
analysis, speech recognition, chatbots, and text classification.



Natural Language Processing
Some common techniques used in NLP include:

• Tokenization: the process of breaking text into individual words or phrases.

• Part-of-speech tagging: the process of labeling each word in a sentence with its grammatical part
of speech.

• Named entity recognition: the process of identifying and categorizing named entities, such as
people, places, and organizations, in text.

• Sentiment analysis: the process of determining the sentiment of a piece of text, such as whether it
is positive, negative, or neutral.

• Machine translation: the process of automatically translating text from one language to another.

• Text classification: the process of categorizing text into predefined categories or topics.
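As a rough illustration of tokenization and sentiment analysis from the list above, the sketch below splits text with a regular expression and scores it against a tiny word list. The regex, the `POSITIVE` and `NEGATIVE` lexicons, and the function names are assumptions made for this example; practical systems use trained tokenizers and models.

```python
import re

# Hypothetical mini-lexicons, invented for this example only.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def sentiment(text):
    """Label text by counting lexicon hits among its tokens."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(tokenize("The movie was great!"))
print(sentiment("The movie was great!"))
```

A lexicon count like this ignores context and negation ("not good"), which is one reason learned models are preferred in practice.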
Working of Natural Language Processing (NLP)

• NLP typically involves using computational techniques to analyze and
understand human language. This can include tasks such as language understanding,
language generation, and language interaction.

• The field is divided into three parts:

• Speech Recognition: The translation of spoken language into text.

• Natural Language Understanding (NLU): The computer’s ability to understand what we
say.

• Natural Language Generation (NLG): The generation of natural language by a computer.



• Speech Recognition:

• First, the computer must take natural language and convert it into machine-readable language. This is
what speech recognition or speech-to-text does. This is the first step of NLU.

• Natural Language Understanding (NLU):

• The next and hardest step of NLP is the understanding part.

• First, the computer must work out the grammatical role of each word: whether it is
a noun or a verb, whether it is in the past or present tense, and so on. This is called
Part-of-Speech (POS) tagging, and it feeds into working out what the sentence means.

• Natural Language Generation (NLG):

• NLG is generally considered simpler than NLU. It converts a computer’s machine-readable representation
into text, which can then be converted into audible speech using text-to-speech technology.



Spam Detection

Sentiment Analysis
Question Answering
Question Answering focuses on building systems
that automatically answer the questions asked by
humans in a natural language.



Artificial Neural Networks

• The term "Artificial Neural Network" is derived from the biological neural networks that make up
the structure of the human brain.

• Similar to the human brain that has neurons interconnected to one another, artificial neural
networks also have neurons that are interconnected to one another in various layers of the
networks. These neurons are known as nodes.



Artificial Neural Networks

Biological Neural Network    Artificial Neural Network
Dendrites                    Inputs
Cell nucleus                 Nodes
Synapse                      Weights
Axon                         Output



The Architecture of an Artificial Neural Network



An artificial neural network primarily consists of three types of layers:

• Input Layer: As the name suggests, it accepts inputs in several different formats provided by
the programmer.
• Hidden Layer: The hidden layer sits between the input and output layers. It performs
the calculations that find hidden features and patterns.
• Output Layer: The input goes through a series of transformations in the hidden layers,
which finally results in the output conveyed by this layer.

• Each node computes the weighted sum of its inputs and adds a bias. The result is then
passed through a transfer (activation) function to produce the node's output.
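
The weighted-sum-plus-bias computation can be sketched for a single node as follows. The sigmoid is one common choice of transfer function and is an assumption here, as are the function name `neuron` and the sample values.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values only: the weighted sum here is 0.5 - 0.5 + 0.0 = 0.
out = neuron([1.0, 2.0], [0.5, -0.25], 0.0)
print(out)
```

A layer is just many such nodes sharing the same inputs, each with its own weights and bias.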



Types of Artificial Neural Networks
• Feedforward Neural Network: The feedforward neural network is one of the most basic
artificial neural networks. In it, the input travels in a single direction: it
enters the ANN through the input layer and exits through the output layer, with no feedback
connections. Backpropagation, where used, is a training procedure and does not change this
forward-only data flow.
• Convolutional Neural Network: A convolutional neural network has some similarities to the
feedforward neural network, in that the connections between units have weights that determine
the influence of one unit on another.
• A CNN, however, has one or more convolutional layers that apply a convolution operation to the
input and pass the result to the next layer. CNNs have
applications in speech and image processing and are particularly useful in computer vision.
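
To make the convolution operation concrete, here is a minimal pure-Python sketch of a "valid" 2D convolution as deep learning libraries compute it (strictly a cross-correlation, with no kernel flip). The function name `conv2d`, the toy image, and the edge kernel are illustrative assumptions.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
edge_kernel = [[-1, 1]]  # responds where intensity rises from left to right
print(conv2d(image, edge_kernel))
```

The output is non-zero only at the column where the image changes from 0 to 1, which is the sense in which a filter "extracts a feature".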



Types of Artificial Neural Networks
• Modular Neural Network: A Modular Neural Network contains a collection of different neural
networks that work independently towards obtaining the output with no interaction between them.
Each of the different neural networks performs a different sub-task by obtaining unique inputs
compared to other networks. The advantage of this modular neural network is that it breaks down
a large and complex computational process into smaller components, thus decreasing its
complexity while still obtaining the required output.
• Radial Basis Function Neural Network: Radial basis functions are functions whose value depends on
the distance of a point from a center.
• Recurrent Neural Network: The recurrent neural network saves the output of a layer and feeds
this output back as input at the next step, which helps it predict the next element of a sequence.
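
The feedback idea behind a recurrent network can be sketched with a single scalar unit whose state is fed back at every step. The tanh activation, the weight values, and the names `rnn_step` and `run_rnn` are assumptions chosen for illustration.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One step: the new state depends on the current input and the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    h = 0.0  # initial hidden state
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states stay non-zero:
# the unit "remembers" the earlier input through the fed-back state.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```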
Training Deep Networks
• The main purpose of a neural network is to receive a set of inputs, perform progressively
complex calculations on them, and produce an output that solves a real-world problem.

• A deep neural network (DNN) is an ANN with multiple hidden layers between the input
and output layers.

• The number of hidden layers needed depends on the kind of data you are dealing
with. The number of hidden layers is known as the depth of the neural network.



1. Data Preparation
• Data Collection: Gather a large and diverse dataset relevant to the problem you're solving
(e.g., images, text, or time-series data).

• Data Preprocessing: Clean the data by handling missing values, normalizing or


standardizing features, and encoding categorical variables.

• Data Augmentation: Apply techniques like rotation, flipping, scaling, and cropping to
artificially increase the size and diversity of the dataset, especially for image data.

• Train-Test Split: Split the dataset into training, validation, and test sets to evaluate the
model's performance during and after training.
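A minimal sketch of two of the steps above, feature normalization and a three-way split, using only the standard library. The 80/10/10 proportions, the fixed seed, and the function names are assumptions for this example, not values prescribed by the slides.

```python
import random

def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def train_val_test_split(samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a copy of the data and cut it into three disjoint parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

data = min_max_normalize(list(range(100)))
train, val, test = train_val_test_split(data)
print(len(train), len(val), len(test))
```

Shuffling before splitting matters when the raw data is ordered (for example by class or by date), otherwise the three sets may not be comparable.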
2. Model Design
• Select a Model Architecture: Choose a suitable neural network architecture based on the
problem. Common architectures include:
• Convolutional Neural Networks (CNNs) for image data.
• Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks for sequential
data.
• Fully Connected Networks for tabular data.
• Define the Layers: Specify the number of layers, types of layers (e.g., convolutional, dense,
dropout), and the number of neurons in each layer.
• Convolutional layer extracts features from spatial data using filters.
• Dense layer combines features learned from previous layers into final predictions.
• Dropout layer regularizes the model by randomly dropping neurons to prevent overfitting.
3. Choosing the Loss Function and Optimization Algorithm
• Loss Function: The loss function provides a numerical value that represents how far off the model's
predictions are from the actual target values. Select a loss function that aligns with your problem:
• Mean Squared Error (MSE) for regression tasks - measures the average of the squared differences between
predicted and actual values.
• Categorical Cross-Entropy for multi-class classification - measures the difference between the predicted probability
distribution and the actual distribution.
• Binary Cross-Entropy for binary classification.

• Optimizer is essential in deep learning because it drives the learning process by efficiently updating
the model's parameters to minimize the loss function. Common optimizers include:
• Stochastic Gradient Descent (SGD) – for simple problems
• Adaptive Moment Estimation(Adam) – for deep learning tasks
• RMSprop – non-stationary objectives i.e., data change over time
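The three loss functions named above can be written out directly. This is a plain-Python sketch for single examples or small lists; the `eps` clipping constant is an assumption added to avoid taking the logarithm of zero.

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average negative log-likelihood for 0/1 labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Negative log of the probability assigned to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

print(mse([1.0, 2.0], [1.0, 4.0]))
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
print(categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))
```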
4. Training the Model
• Forward Propagation: Pass the input data through the network to generate
predictions.
• Backpropagation: Calculate the gradient of the loss function with respect to each
weight by propagating the error backward through the network.
• Weight Updates: Update the weights of the network using the optimizer based on the
calculated gradients.
• Batch Training: Use mini-batches (small subsets of the dataset) to update the weights
after each batch. This approach balances convergence speed and generalization.
• Epochs: Repeat the forward and backward propagation process for several epochs (full
passes through the training dataset).
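
The loop structure described above (epochs, mini-batches, forward pass, gradients, weight update) can be sketched on a one-weight linear model, where the gradients are simple enough to write by hand. The toy data, learning rate, batch size, and the name `train_linear` are assumptions for illustration.

```python
import random

def train_linear(xs, ys, epochs=2000, batch_size=4, lr=0.2, seed=0):
    """Fit y = w*x + b with mini-batch gradient descent on the MSE loss."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):                      # one epoch = one full pass
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Forward propagation: prediction errors for the batch
            errors = [(w * xs[i] + b) - ys[i] for i in batch]
            # Gradients of the batch mean squared error
            grad_w = 2 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2 * sum(errors) / len(batch)
            # Weight update
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [i / 8 for i in range(8)]      # inputs already scaled to [0, 1)
ys = [2 * x + 1 for x in xs]        # noiseless toy targets
w, b = train_linear(xs, ys)
print(round(w, 3), round(b, 3))
```

In a deep network the two gradient lines are replaced by backpropagation through all layers, but the surrounding loop is the same.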



5. Monitoring and Adjusting
• Validation Set: Monitor the performance of the model on a validation set during training to
avoid overfitting.
• Early Stopping: Stop training if the performance on the validation set starts to degrade,
indicating potential overfitting.
• Learning Rate Scheduling: The learning rate is a hyperparameter that controls how much to
change the model in response to the estimated error each time the model weights are updated.
Adjust the learning rate during training to improve convergence.
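Early stopping and learning rate scheduling reduce to a few lines of bookkeeping. The sketch below assumes a step-decay schedule and a patience-based stopping rule; both function names and the default values are hypothetical choices, not taken from a specific library.

```python
def step_decay(initial_lr, epoch, drop=0.5, every=10):
    """Multiply the learning rate by `drop` once every `every` epochs."""
    return initial_lr * (drop ** (epoch // every))

def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop, or None if it never stops."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None

print(step_decay(0.1, 25))
print(early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.66, 0.7, 0.8]))
```

In the second call the validation loss last improves at epoch 2, so with a patience of 3 the rule fires at epoch 5.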
6. Regularization Techniques
• Dropout: Randomly deactivate a fraction of neurons during training to prevent overfitting.
• L1 (Lasso) / L2 (Ridge) Regularization: These techniques discourage overfitting by adding a
penalty on the size of the model's weights to the loss function.
7. Evaluation and Testing
• Test Set Evaluation: After training, evaluate the model on the test set to assess its
generalization to new data.
• Metrics: Use appropriate metrics (e.g., accuracy, precision, recall, F1-score) depending on
the task to evaluate performance.
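For a binary classifier, the metrics listed above follow from counts of true positives, false positives, and false negatives. This is a minimal sketch with made-up labels; the function name `classification_metrics` is an assumption.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))
```

Here accuracy (0.75) looks better than precision and recall (both 2/3), which is why accuracy alone can mislead on imbalanced data.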
8. Fine-Tuning and Optimization
• Hyperparameter Tuning: Experiment with different hyperparameters (e.g., learning rate,
batch size, number of layers) to find the optimal settings.
• Transfer Learning: If data is limited, consider using a pre-trained model and fine-tuning it
on your specific dataset.
9. Deployment
• Model Serialization: Save the trained model for later use in production environments.
• Integration: Integrate the model into an application or system where it can make predictions
on new, unseen data.
1. Data Preparation
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
# Load and preprocess the dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# Normalize the images to the range [0, 1]
train_images = train_images.astype('float32') / 255
test_images = test_images.astype('float32') / 255
# Reshape the images to include the channel dimension
train_images = train_images.reshape((train_images.shape[0], 28, 28, 1))
test_images = test_images.reshape((test_images.shape[0], 28, 28, 1))
# Convert the labels to one-hot encoded vectors
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
2. Model Design
from tensorflow.keras import layers, models
# Build the CNN model
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
3. Choosing the Loss Function and Optimization Algorithm
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
4. Training the Model
history = model.fit(train_images, train_labels, epochs=5, batch_size=64, validation_split=0.2)

5. Monitoring and Adjusting


# Monitor validation accuracy to prevent overfitting
# The model.fit() function already includes validation through `validation_split`
# You can use callbacks like EarlyStopping for further control
from tensorflow.keras.callbacks import EarlyStopping
# Early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=3)
# Train with early stopping
history = model.fit(train_images, train_labels, epochs=20, batch_size=64, validation_split=0.2, callbacks=[early_stopping])



6. Regularization Techniques
# Adding Dropout to prevent overfitting
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dropout(0.5)) # Dropout added here
model.add(layers.Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10, batch_size=64, validation_split=0.2)
7. Evaluation and Testing
# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc}")

8. Fine-Tuning and Optimization


# Hyperparameter tuning could involve experimenting with different optimizers, batch sizes, epochs, etc.
# For example, experimenting with the learning rate
from tensorflow.keras.optimizers import Adam
# Adjusting learning rate
model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
# Re-train the model with the adjusted learning rate
history = model.fit(train_images, train_labels, epochs=10, batch_size=64, validation_split=0.2)



9. Deployment

# Save the trained model
model.save('mnist_cnn_model.h5')
# To load the model later
loaded_model = models.load_model('mnist_cnn_model.h5')
# Making predictions with the loaded model
predictions = loaded_model.predict(test_images)



What Is A Feed Forward Neural Network?
• In a feed-forward neural network, there are no feedback loops or backward connections.
It consists simply of an input layer, one or more hidden layers, and an output layer.



Backpropagation Process in Deep Neural Network
• Backpropagation is one of the most important concepts in neural networks. Our task is to classify our data
as well as possible.
• For this, we have to update the weights and biases, but how can we do that in a deep neural
network?
• In a linear regression model, we use gradient descent to optimize the parameters. Here we also
use gradient descent, with backpropagation supplying the gradients.
• Backpropagation algorithms are a set of methods used to efficiently train artificial neural networks
following a gradient descent approach that exploits the chain rule.
• The main feature of backpropagation is the iterative, recursive, and efficient way in which it
calculates the weight updates that improve the network.
• Backpropagation requires the derivatives of the activation functions to be known at network
design time.
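
The chain rule at work can be seen on a deliberately tiny network: one input, one sigmoid hidden unit, and one linear output with a squared-error loss. The architecture, parameter values, and function names below are assumptions for illustration; the finite-difference check at the end is a common way to sanity-check hand-written gradients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, b1, w2, b2):
    """Tiny network: input -> one sigmoid hidden unit -> one linear output."""
    h = sigmoid(w1 * x + b1)
    y_hat = w2 * h + b2
    return h, y_hat

def loss(x, y, w1, b1, w2, b2):
    _, y_hat = forward(x, w1, b1, w2, b2)
    return 0.5 * (y_hat - y) ** 2

def backprop(x, y, w1, b1, w2, b2):
    """Gradients of the loss with respect to each parameter, via the chain rule."""
    h, y_hat = forward(x, w1, b1, w2, b2)
    d_yhat = y_hat - y              # dL/dy_hat
    d_w2 = d_yhat * h               # output layer weight
    d_b2 = d_yhat                   # output layer bias
    d_h = d_yhat * w2               # error propagated back to the hidden unit
    d_z = d_h * h * (1.0 - h)       # uses sigmoid'(z) = h * (1 - h)
    d_w1 = d_z * x
    d_b1 = d_z
    return d_w1, d_b1, d_w2, d_b2

params = (0.3, -0.1, 0.8, 0.2)
grads = backprop(1.5, 1.0, *params)

# Sanity check: compare d_w1 with a central finite-difference estimate
eps = 1e-6
numeric = (loss(1.5, 1.0, 0.3 + eps, -0.1, 0.8, 0.2)
           - loss(1.5, 1.0, 0.3 - eps, -0.1, 0.8, 0.2)) / (2 * eps)
print(grads[0], numeric)
```

The line computing `d_z` is where the derivative of the activation function is needed, which is why it must be known when the network is designed.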


Improving Deep Networks
• A deep learning model usually has parameters that must be set before training, called
hyperparameters. Their values strongly affect the model's results, so the values
that give the best results should be found.
• Finding the best combination is called hyperparameter tuning.
• Hyperparameter tuning can improve a neural network's accuracy and efficiency and is essential for getting
good results.
Here are a few methods that can be used to avoid overfitting during neural network hyperparameter tuning:
• Use a separate validation set to evaluate the model's performance during hyperparameter tuning.
• Use regularization techniques, such as weight decay (L2 regularization) or dropout, to prevent the model
from overfitting to the training data.
• Implement early stopping to terminate the training process if the model's performance on the validation
set starts to degrade.
Functions for Hyperparameter Tuning
• Several approaches can be used to perform hyperparameter tuning on neural networks:
• Grid search,
• Random search, and
• Bayesian optimization.



Grid Search
• Grid search is a hyperparameter tuning method involving specifying a grid of
hyperparameter values and training and evaluating the neural network model for each
combination of hyperparameter values.
• For example, if we want to tune the learning rate and the batch size of a neural network,
we can specify a grid of possible values for the learning rate (e.g., 0.1, 0.01, 0.001) and
the batch size (e.g., 32, 64, 128) and train and evaluate the model for each combination of
values. The combination of hyperparameters that gives the best results on the
validation set is then selected as the optimal set of hyperparameters.
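
The grid from the example above can be enumerated with `itertools.product`. In this sketch, `validation_loss` is a hypothetical stand-in for "train the model and return its validation loss"; its formula is invented so the example runs instantly.

```python
from itertools import product

def validation_loss(learning_rate, batch_size):
    """Hypothetical stand-in for a real training-and-validation run."""
    return abs(learning_rate - 0.01) * 10 + abs(batch_size - 64) / 64

def grid_search(grid, objective):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [32, 64, 128]}
best_params, best_score = grid_search(grid, validation_loss)
print(best_params, best_score)
```

With 3 values for each of 2 hyperparameters the search trains 9 models; the count multiplies with every hyperparameter added, which is the main cost of grid search.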



Random Search
• Random search is another hyperparameter tuning method. It
samples random combinations of hyperparameter
values and trains and evaluates the neural network model for
each combination.
• Random search can be more efficient than grid search, especially if the
best values for the model lie between the specified
grid values. For example, if the best learning rate is 0.05 and
the specified values are 0.01 and 0.1, grid search can never
try 0.05, while random search over the same range can sample
values close to it.
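
The learning-rate example above can be sketched as follows. The objective `validation_loss` is a hypothetical stand-in whose minimum is placed at 0.05 to mirror the text; sampling on a log scale, the search range, the trial count, and the seed are further assumptions.

```python
import math
import random

def validation_loss(learning_rate):
    """Hypothetical stand-in for a real training run; its minimum is at 0.05."""
    return (math.log10(learning_rate) - math.log10(0.05)) ** 2

def random_search(objective, low, high, trials=50, seed=0):
    """Sample learning rates log-uniformly between low and high and keep the best."""
    rng = random.Random(seed)
    best_lr, best_score = None, float("inf")
    for _ in range(trials):
        lr = 10 ** rng.uniform(math.log10(low), math.log10(high))
        score = objective(lr)
        if score < best_score:
            best_lr, best_score = lr, score
    return best_lr, best_score

# Grid search restricted to the two specified values
grid_best = min([0.01, 0.1], key=validation_loss)
random_best, random_score = random_search(validation_loss, 0.001, 1.0)
print(grid_best, validation_loss(grid_best))
print(random_best, random_score)
```

The grid can do no better than 0.1 here, while the random samples are free to land between the grid points.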
Bayesian optimization

• Bayesian optimization uses the scores of previous evaluations, together with a probabilistic model of the
objective, to make an informed decision in the following iterations. This lets the search focus on the
hyperparameters that significantly change the results rather than on those that
do not affect the result much.
• Bayesian optimization can be more efficient than grid search or random search, as it
can adaptively select the next set of hyperparameters to evaluate based on the
previous evaluations. However, choosing each new candidate can be more computationally expensive
and require more resources.



Optimization Algorithms For Training Neural Network
• Optimizers are algorithms or methods used to change the attributes of your neural network, such as its weights
and learning rate, in order to reduce the loss.

• Gradient Descent

• Gradient descent is the most basic but most widely used optimization algorithm. It is used heavily in linear
regression and classification algorithms. Backpropagation in neural networks is also used together with
gradient descent.

• Gradient descent is a first-order optimization algorithm: it depends on the first-order derivative of the
loss function. It determines in which direction the weights should be altered so that the function can reach a
minimum. Through backpropagation, the gradient of the loss is passed from one layer to the previous one, and the
model’s parameters, also known as weights, are modified so that the loss is minimized.

• Stochastic Gradient Descent

• It is a variant of gradient descent that updates the model’s parameters more frequently: the
parameters are altered after computing the loss on each training example. So, if the dataset contains 1000
rows, SGD will update the model parameters 1000 times in one pass over the dataset instead of once, as in
gradient descent.
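
The difference in update frequency can be counted directly. The sketch below fits a single weight on 1000 synthetic rows; the data, the learning rate, and the function names are assumptions chosen so the contrast is easy to see.

```python
def gradient(w, x, y):
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2 * (w * x - y) * x

def batch_gd_epoch(w, xs, ys, lr):
    """One epoch of batch gradient descent: a single update from the average gradient."""
    avg = sum(gradient(w, x, y) for x, y in zip(xs, ys)) / len(xs)
    return w - lr * avg, 1

def sgd_epoch(w, xs, ys, lr):
    """One epoch of SGD: one update per training example."""
    updates = 0
    for x, y in zip(xs, ys):
        w -= lr * gradient(w, x, y)
        updates += 1
    return w, updates

xs = [i / 1000 for i in range(1, 1001)]   # 1000 rows, inputs in (0, 1]
ys = [3.0 * x for x in xs]                # true weight is 3.0

w_gd, n_gd = batch_gd_epoch(0.0, xs, ys, lr=0.1)
w_sgd, n_sgd = sgd_epoch(0.0, xs, ys, lr=0.1)
print(n_gd, n_sgd)
print(w_gd, w_sgd)
```

After one pass, the SGD weight is already close to 3.0 while the batch version has taken only one small step, at the price of noisier individual updates on real data.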
Regularization
• Regularization in deep neural networks is a set of techniques used to prevent overfitting, which
occurs when a model learns to fit the training data very closely but performs poorly on unseen data.
• Regularization methods aim to encourage the model to generalize better, for example by adding
constraints or penalties to the loss function.
L1 and L2 Regularization:
L1 Regularization (Lasso): This adds a penalty term to the loss function that is proportional to the
absolute values of the model's weights.
L2 Regularization (Ridge): L2 regularization adds a penalty term to the loss function that is
proportional to the square of the model's weights.
Dropout: Dropout is a regularization technique that randomly deactivates (sets to zero) a fraction of
neurons during each forward and backward pass of training. This prevents any single neuron from
becoming overly specialized and encourages the network to rely on a more robust set of features.
Early Stopping: Early stopping is a simple but effective regularization technique. It involves
monitoring the model's performance on a validation dataset during training. If the performance starts
to degrade (indicating overfitting), training is stopped early to prevent the model from learning noise
in the data.
Data Augmentation: Data augmentation involves creating new training examples by applying
random transformations (e.g., rotations, flips, crops) to the existing training data. This increases the
diversity of the training set and helps the model generalize better.
Weight Constraints: You can apply constraints to the weights of the neural network to limit their
values.
Noise Injection: Adding noise to the input data or to the activations of neurons during training can
act as a form of regularization. Noise can help the model become more robust to variations in the
data.
DropConnect: Similar to dropout, DropConnect randomly sets a fraction of weights to zero during
each forward and backward pass.
Ensemble Methods: Combining the predictions of multiple neural networks (ensemble learning)
can lead to improved performance and act as a form of regularization. Techniques like bagging and
boosting can be applied to neural networks.
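
Three of the techniques above, the L1 penalty, the L2 penalty, and dropout, are small enough to write out. This is a minimal sketch: the regularization strength `lam`, the placeholder `data_loss`, and the "inverted dropout" rescaling (dividing kept activations by the keep probability) are assumptions for this example.

```python
import random

def l1_penalty(weights, lam):
    """Penalty proportional to the absolute values of the weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """Penalty proportional to the squares of the weights."""
    return lam * sum(w * w for w in weights)

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each activation with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

weights = [0.5, -1.0, 2.0]
data_loss = 0.25  # placeholder standing in for the unregularized loss
print(data_loss + l1_penalty(weights, lam=0.01))
print(data_loss + l2_penalty(weights, lam=0.01))
print(dropout([1.0, 1.0, 1.0, 1.0], rate=0.5, rng=random.Random(0)))
```

Dropout is applied only during training; at evaluation time the activations pass through unchanged, which the `training=False` branch reflects.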