DLT Unit-2
• The human visual system processes information in layers, from edges to objects. Deep learning models mimic this
with deep networks that learn abstract features from raw data.
• Convolutional Neural Networks are widely used in image and video processing tasks, excelling in image
classification, object detection, and segmentation, often surpassing traditional methods.
• In machines, vision involves understanding visual data from images or videos. Deep learning models trained on large,
annotated datasets can recognize patterns and objects, enabling tasks like scene understanding.
• Deep learning has enabled machines to achieve superhuman performance in areas such as medical image analysis,
autonomous driving, and facial recognition.
• Deep learning has significantly advanced NLP, with RNNs and Transformer models enhancing machine
translation, sentiment analysis, and chatbot development.
Biological and Machine Vision
• Computer vision relies on algorithms and artificial intelligence to process, analyze, and interpret visual data,
mimicking human visual perception.
• Human vision is a complex biological process involving the eyes, optic nerves, and brain, which work
together to perceive and interpret visual information.
• Computer vision operates through computational models, while human vision is driven by biological
mechanisms.
• Both computer vision and human vision aim to understand and interpret visual data, but they differ
significantly in methods and capabilities.
• Recognizing the differences and similarities between these two domains is crucial for advancing technology
and enhancing human visual perception.
• Computer vision is the ability of machines to understand and interpret visual data.
• It is a field of artificial intelligence that uses algorithms and computational models to analyze
images and videos.
• By mimicking human visual processing, computer vision can detect patterns, recognize
objects, and extract meaningful information from visual input.
• Human vision starts with the eyes capturing light and sending signals to the brain for
processing.
• The cornea, pupil, and lens work together to focus light onto the retina.
• The retina contains specialized cells called cones and rods that detect light and convert it into
electrical signals.
• These electrical signals are transmitted through the optic nerves to the brain, where they are
processed to form our visual perception.
• Adaptability and Efficiency: Human vision is highly adaptable and efficient in recognizing
patterns, even in complex scenes or varied lighting conditions, whereas computer vision may
struggle in these situations.
• Handling Complex Scenes: Human vision integrates information from multiple sensory
channels to create a cohesive perception, while computer vision algorithms often focus on
specific visual features and may find it challenging to manage complex scenes or changing
conditions.
Computer Vision Tasks
• Object Recognition: Computer vision excels at identifying and categorizing objects in images and videos,
with practical applications in surveillance systems, autonomous vehicles, and medical imaging.
• Applications in Various Fields: Computer vision is used in robotics for environmental interaction, in medical
imaging for diagnosis and treatment, and in augmented reality to enhance our interaction with the digital world.
• Contextual Understanding: Computer vision algorithms can recognize objects and patterns
but often struggle to grasp the context and meaning behind visual scenes, something human
vision does naturally.
• Research and Innovation: The ongoing challenge of bridging the gap between computer
vision and human vision will continue to drive research and innovation.
• Learning and Creativity: Human language is acquired through exposure and practice, with the ability to
invent new words and expressions creatively.
• Ambiguity: Natural language is often ambiguous, with words or phrases having multiple meanings depending
on context, which can sometimes lead to misunderstandings.
• The goal of NLP is to develop algorithms and models that enable computers to understand,
interpret, generate, and manipulate human languages.
• Natural Language Processing (NLP) is a subfield of artificial intelligence that deals with the
interaction between computers and humans in natural language.
• It involves the use of computational techniques to process and analyze natural language data,
such as text and speech, with the goal of understanding the meaning behind the language.
• Part-of-speech tagging: the process of labeling each word in a sentence with its grammatical part
of speech.
• Named entity recognition: the process of identifying and categorizing named entities, such as
people, places, and organizations, in text.
• Sentiment analysis: the process of determining the sentiment of a piece of text, such as whether it
is positive, negative, or neutral.
• Machine translation: the process of automatically translating text from one language to another.
• Text classification: the process of categorizing text into predefined categories or topics.
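As an illustration of the first two tasks above, here is a minimal sketch using the spaCy library (a library choice assumed here, not prescribed by these notes; the small English model must be installed separately):

import spacy

# Load a small English pipeline (install first: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple is opening a new office in Hyderabad.")

# Part-of-speech tagging: each token receives a grammatical label
for token in doc:
    print(token.text, token.pos_)

# Named entity recognition: spans labelled as ORG, GPE, PERSON, etc.
for ent in doc.ents:
    print(ent.text, ent.label_)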
Working of Natural Language Processing (NLP)
• First, the computer must take natural language and convert it into machine-readable language. This is
what speech recognition or speech-to-text does, and it is the first step of Natural Language Understanding (NLU).
• Next, the computer must comprehend the meaning of each word. It tries to figure out whether the word is
a noun or a verb, whether it is in the past or present tense, and so on. This is called Part-of-Speech
tagging (POS).
• Natural Language Generation (NLG) is much simpler to accomplish. NLG converts a computer’s machine-readable
language into text and can also convert that text into audible speech using text-to-speech technology.
• Sentiment Analysis: determining the sentiment of a piece of text, such as whether it is positive, negative,
or neutral.
• Question Answering: building systems that automatically answer questions asked by humans in a natural
language.
• The term "Artificial Neural Network" is derived from Biological neural networks that develop
the structure of a human brain.
• Similar to the human brain that has neurons interconnected to one another, artificial neural
networks also have neurons that are interconnected to one another in various layers of the
networks. These neurons are known as nodes.
• Input Layer: As the name suggests, it accepts inputs in several different formats provided by
the programmer.
• Hidden Layer: The hidden layer sits between the input and output layers. It performs all the
calculations needed to find hidden features and patterns.
• Output Layer: The input goes through a series of transformations in the hidden layers, finally
producing the output that is conveyed through this layer.
• The artificial neural network takes the inputs, computes their weighted sum, and adds a bias. The
result is then passed through a transfer (activation) function to produce the neuron’s output.
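This computation can be written as z = w1·x1 + w2·x2 + ... + wn·xn + b, with output y = f(z). A minimal NumPy sketch of a single artificial neuron (the input values, weights, and sigmoid activation below are illustrative assumptions):

import numpy as np

def sigmoid(z):
    # Transfer (activation) function: squashes the weighted sum into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.4, 0.7, -0.2])   # weights
b = 0.1                          # bias

z = np.dot(w, x) + b             # weighted sum of inputs plus bias
y = sigmoid(z)                   # neuron output
print(z, y)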
• A deep neural network (DNN) is an ANN with multiple hidden layers between the input
and output layers.
• The number of hidden layers depends on the kind of data you are dealing with; this number is
known as the depth of the neural network.
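A minimal sketch of a deep neural network with two hidden layers, assuming the tf.keras API (the layer sizes and input shape are illustrative, not prescribed):

from tensorflow import keras
from tensorflow.keras import layers

# Input layer -> two hidden layers -> output layer (depth: 2 hidden layers)
model = keras.Sequential([
    keras.Input(shape=(20,)),              # 20 input features (assumed)
    layers.Dense(64, activation="relu"),   # hidden layer 1
    layers.Dense(32, activation="relu"),   # hidden layer 2
    layers.Dense(1, activation="sigmoid"), # output layer for a binary task
])
model.summary()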
1. Data Preparation
• Data Augmentation: Apply techniques like rotation, flipping, scaling, and cropping to
artificially increase the size and diversity of the dataset, especially for image data.
• Train-Test Split: Split the dataset into training, validation, and test sets to evaluate the
model's performance during and after training.
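One common way to realize the train/validation/test split, sketched with scikit-learn (the proportions and random data are assumptions for illustration):

import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data purely for illustration: 1000 samples, 20 features
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

# Hold out 15% as the test set, then 15% of the remainder as validation
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.15, random_state=42)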
2. Model Design
• Select a Model Architecture: Choose a suitable neural network architecture based on the
problem. Common architectures include:
• Convolutional Neural Networks (CNNs) for image data.
• Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks for sequential
data.
• Fully Connected Networks for tabular data.
• Define the Layers: Specify the number of layers, types of layers (e.g., convolutional, dense,
dropout), and the number of neurons in each layer.
• Convolutional layer extracts features from spatial data using filters.
• Dense layer combines features learned from previous layers into final predictions.
• Dropout layer regularizes the model by randomly dropping neurons to prevent overfitting.
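A hedged sketch of such a layer definition for image data, assuming tf.keras (the filter counts, dropout rate, and 28x28 input are illustrative assumptions):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),               # e.g. grayscale images (assumed size)
    layers.Conv2D(32, (3, 3), activation="relu"), # convolutional layer: extracts spatial features
    layers.MaxPooling2D((2, 2)),                  # downsamples the feature maps
    layers.Flatten(),
    layers.Dropout(0.5),                          # dropout layer: randomly drops neurons
    layers.Dense(10, activation="softmax"),       # dense layer: combines features into predictions
])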
3. Choosing the Loss Function and Optimization Algorithm
• Loss Function: The loss function provides a numerical value that represents how far off the model's
predictions are from the actual target values. Select a loss function that aligns with your problem:
• Mean Squared Error (MSE) for regression tasks - measures the average of the squared differences between
predicted and actual values.
• Categorical Cross-Entropy for multi-class classification – measures the difference between the predicted
probability distribution and the actual distribution.
• Binary Cross-Entropy for binary classification.
• The optimizer is essential in deep learning because it drives the learning process by efficiently updating
the model's parameters to minimize the loss function. Common optimizers include:
• Stochastic Gradient Descent (SGD) – for simple problems
• Adaptive Moment Estimation (Adam) – for deep learning tasks
• RMSprop – for non-stationary objectives, i.e., data that change over time
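In tf.keras, the loss function and optimizer are wired together when the model is compiled; the sketch below is illustrative only (the tiny model, learning rate, and metric are assumptions):

from tensorflow import keras
from tensorflow.keras import layers

# Tiny multi-class model: 4 input features, 3 output classes (assumed)
model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(3, activation="softmax"),
])

# Categorical cross-entropy for multi-class classification, optimized with Adam
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# A regression model would instead use, e.g., loss="mse" with SGD or RMSprop.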
4. Training the Model
• Forward Propagation: Pass the input data through the network to generate
predictions.
• Backpropagation: Calculate the gradient of the loss function with respect to each
weight by propagating the error backward through the network.
• Weight Updates: Update the weights of the network using the optimizer based on the
calculated gradients.
• Batch Training: Use mini-batches (small subsets of the dataset) to update the weights
after each batch. This approach balances convergence speed and generalization.
• Epochs: Repeat the forward and backward propagation process for several epochs (full
passes through the training dataset).
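In a high-level framework such as tf.keras, forward propagation, backpropagation, and weight updates are handled inside fit(); the sketch below (with dummy data and an assumed batch size and epoch count) only shows where mini-batches and epochs appear:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Dummy data for illustration: 1000 samples, 4 features, 3 classes
X_train = np.random.rand(1000, 4).astype("float32")
y_train = np.random.randint(0, 3, size=1000)

model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# batch_size controls mini-batch training; epochs is the number of full passes
model.fit(X_train, y_train, batch_size=32, epochs=5, validation_split=0.1)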
• Bayesian optimization uses the previous scores and probabilities to make an informed decision in
the following iterations, allowing the search to focus on the hyperparameters that can significantly
change the results while not wasting effort on those that barely affect the result.
• Bayesian optimization can be more efficient than grid search or random search, as it
can adaptively select the next set of hyperparameters to evaluate based on the
previous evaluations. However, it can be more computationally expensive and require
more resources.
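A hedged sketch of Bayesian hyperparameter search using scikit-optimize's gp_minimize (an assumed library choice; keras-tuner and Optuna offer similar functionality). The quadratic objective stands in for a real training-and-validation run:

from skopt import gp_minimize
from skopt.space import Real

def objective(params):
    lr = params[0]
    # In practice: train the model with this learning rate and return the validation loss.
    # A synthetic quadratic is used here purely for illustration.
    return (lr - 0.01) ** 2

result = gp_minimize(
    objective,
    dimensions=[Real(1e-4, 1e-1, prior="log-uniform", name="learning_rate")],
    n_calls=20,        # each call uses previous evaluations to pick the next point
    random_state=0,
)
print(result.x, result.fun)   # best hyperparameter value and its score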
• Gradient Descent
• Gradient Descent is the most basic but most used optimization algorithm. It’s used heavily in linear
regression and classification algorithms. Backpropagation in neural networks also uses a gradient descent
algorithm.
• Gradient descent is a first-order optimization algorithm which depends on the first-order derivative of the
loss function. It calculates in which direction the weights should be altered so that the function can reach a
minimum. Through backpropagation, the loss is transferred from one layer to another, and the model’s
parameters, also known as weights, are modified depending on the loss so that it can be minimized.
• Stochastic Gradient Descent (SGD)
• SGD is a variant of Gradient Descent that updates the model’s parameters more frequently. The model
parameters are altered after the computation of the loss on each training example. So, if the dataset contains
1000 rows, SGD will update the model parameters 1000 times in one cycle through the dataset, instead of once
as in Gradient Descent.
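A minimal NumPy sketch contrasting the two update schemes for a one-parameter linear model with squared-error loss (the synthetic data, learning rate, and epoch counts are assumptions for illustration):

import numpy as np

# Synthetic data: y = 3x + noise
X = np.random.rand(1000)
y = 3.0 * X + 0.1 * np.random.randn(1000)
lr = 0.1

# Batch gradient descent: one weight update per full pass over all 1000 rows
w = 0.0
for epoch in range(50):
    grad = np.mean(2 * (w * X - y) * X)   # dLoss/dw averaged over the whole dataset
    w -= lr * grad

# Stochastic gradient descent: 1000 updates per pass, one per training example
w_sgd = 0.0
for epoch in range(5):
    for xi, yi in zip(X, y):
        grad = 2 * (w_sgd * xi - yi) * xi
        w_sgd -= lr * grad

print(w, w_sgd)   # both should approach the true slope of 3.0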
Regularization
• Regularization in deep neural networks is a set of techniques used to prevent overfitting, which
occurs when a model learns to fit the training data very closely but performs poorly on unseen data.
• Regularization methods aim to encourage the model to generalize better by adding constraints or
penalties to the loss function.
L1 and L2 Regularization:
L1 Regularization (Lasso): This adds a penalty term to the loss function that is proportional to the
absolute values of the model's weights.
L2 Regularization (Ridge): L2 regularization adds a penalty term to the loss function that is
proportional to the square of the model's weights.
Dropout: Dropout is a regularization technique that randomly deactivates (sets to zero) a fraction of
neurons during each forward and backward pass of training. This prevents any single neuron from
becoming overly specialized and encourages the network to rely on a more robust set of features.
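A sketch of L1, L2, and dropout regularization in tf.keras (the penalty strengths, dropout rate, and layer sizes are illustrative assumptions):

from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(20,)),
    # L2 (ridge) penalty on this layer's weights
    layers.Dense(64, activation="relu", kernel_regularizer=regularizers.l2(0.01)),
    # L1 (lasso) penalty on this layer's weights
    layers.Dense(32, activation="relu", kernel_regularizer=regularizers.l1(0.001)),
    # Dropout: randomly zeroes 30% of activations during training only
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),
])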
Early Stopping: Early stopping is a simple but effective regularization technique. It involves
monitoring the model's performance on a validation dataset during training. If the performance starts
to degrade (indicating overfitting), training is stopped early to prevent the model from learning noise
in the data.
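In tf.keras this is available as a callback; the monitored metric and patience below are assumptions, and the commented fit() call reuses names from the earlier training sketch:

from tensorflow import keras

# Stop training when the validation loss has not improved for 5 consecutive
# epochs and roll back to the best weights seen so far.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                           restore_best_weights=True)

# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=100, callbacks=[early_stop])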
Data Augmentation: Data augmentation involves creating new training examples by applying
random transformations (e.g., rotations, flips, crops) to the existing training data. This increases the
diversity of the training set and helps the model generalize better.
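A sketch of random image augmentation using tf.keras preprocessing layers (the specific transformations and rates are assumptions; these layers are active only during training):

from tensorflow import keras
from tensorflow.keras import layers

data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),   # random horizontal flips
    layers.RandomRotation(0.1),        # random rotations up to +/-10% of a full turn
    layers.RandomZoom(0.1),            # random zooms up to 10%
])

# Typically placed in front of the model, e.g. as its first layers, so that each
# epoch sees slightly different versions of the same training images.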
Weight Constraints: You can apply constraints to the weights of the neural network to limit their
values.
Noise Injection: Adding noise to the input data or to the activations of neurons during training can
act as a form of regularization. Noise can help the model become more robust to variations in the
data.
DropConnect: Similar to dropout, DropConnect randomly sets a fraction of weights to zero during
each forward and backward pass.
Ensemble Methods: Combining the predictions of multiple neural networks (ensemble learning)
can lead to improved performance and act as a form of regularization. Techniques like bagging and
boosting can be applied to neural networks.