
AD8703 BASICS OF COMPUTER VISION

What is Image Processing?

Image processing is the step that analyses an image and processes it digitally before it is given as input to a model. It is a way to convert an image into a digital form and perform certain mathematical operations on it, in order to obtain an enhanced image or to extract useful information from it.

Image processing involves the following three steps:

o Importing the image with an optical scanner or by digital photography.
o Analysing and manipulating the image, including data compression, image enhancement, and detecting patterns that are not visible to the human eye, as in satellite imagery.
o Producing the output, in which the result can be an altered image or a report based on the image analysis.

In short, image processing is a way to enhance the quality of an image or to gather useful insights from it, which can then be fed to an algorithm for prediction.

Q: What are the steps involved in Image Processing?

Let us now look at the different image processing operations one can perform; they are as follows:

Rearrange channels: from BGR to RGB –

Converting a BGR image to RGB and vice versa can have several reasons,
one of them being that several image processing libraries have different pixel
orderings.

OpenCV reads images in BGR order, whereas Matplotlib expects RGB. So whenever you need to display an image read with OpenCV using Matplotlib, you first have to convert it to RGB.
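As a minimal illustration (assuming an image file named sample.jpg, a hypothetical name), the conversion can be done with cv2.cvtColor before displaying with Matplotlib:

import cv2
import matplotlib.pyplot as plt

bgr = cv2.imread("sample.jpg")                 # OpenCV loads images in BGR order
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)     # reorder the channels for Matplotlib

plt.imshow(rgb)                                # Matplotlib expects RGB
plt.axis("off")
plt.show()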
Visualize channels separately -

This is an analysis step where you try to find out information hidden in
any one channel which might not be properly visible in the other channels. This
is more of a human analysis than a machine analysis.
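A small sketch of how the channels might be inspected side by side, again assuming a hypothetical sample.jpg:

import cv2
import matplotlib.pyplot as plt

image = cv2.cvtColor(cv2.imread("sample.jpg"), cv2.COLOR_BGR2RGB)
r, g, b = cv2.split(image)                     # one 2D array per channel

fig, axes = plt.subplots(1, 3)
for ax, channel, name in zip(axes, (r, g, b), ("Red", "Green", "Blue")):
    ax.imshow(channel, cmap="gray")            # view each channel as a grayscale intensity map
    ax.set_title(name)
    ax.axis("off")
plt.show()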

Visualize histograms

In an image processing context, the histogram of an image normally refers


to a histogram of the pixel intensity values. This histogram is a graph showing
the number of pixels in an image at each different intensity value found in that
image.

Histograms have many uses. One of the more common is to decide what
value of threshold to use when converting a grayscale image to a binary one by
thresholding.
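A hedged sketch of both ideas, computing the intensity histogram and then thresholding at an assumed value of 127:

import cv2
import matplotlib.pyplot as plt

gray = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)

# Histogram of pixel intensities: 256 bins, one per possible value.
hist = cv2.calcHist([gray], [0], None, [256], [0, 256])
plt.plot(hist)
plt.xlabel("Intensity value")
plt.ylabel("Number of pixels")
plt.show()

# A valley in the histogram suggests a threshold; 127 here is just an assumed value.
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)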

Crop an image

Image cropping is a common photo manipulation process which improves the overall composition by removing unwanted regions. It is widely used in photography, film processing, graphic design, and printing. Cropping allows us to focus on the subject alone rather than on its combination with the surroundings.
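Since a digital image is just an array, cropping can be sketched with a NumPy slice; the crop bounds below are arbitrary:

import cv2

image = cv2.imread("sample.jpg")
h, w = image.shape[:2]

# Keep only the central region of the image (rows first, then columns).
cropped = image[h // 4: 3 * h // 4, w // 4: 3 * w // 4]
cv2.imwrite("cropped.jpg", cropped)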

Image Augmentation

Image augmentation is a technique of altering the existing data to create more data for the model training process. In other words, it is the process of artificially expanding the available dataset for training a deep learning model.

Image Rotation

Rotation augmentation rotates the image about its centre by some angle, typically chosen between 1° and 359°.

Random Shifts

Shifting all the pixels of an image from one position to another is called shift augmentation.

There are two types of shifts:

Horizontal Shift Augmentation

Shifting all pixels of an image in the horizontal direction is called horizontal shift augmentation.

Vertical Shift Augmentation

Shifting all pixels of an image in the vertical direction is called vertical shift augmentation.
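A minimal sketch of both shift augmentations using tf.keras.preprocessing.image.ImageDataGenerator; the shift ranges and the random stand-in image are assumptions chosen only for illustration:

import numpy as np
import tensorflow as tf

image = np.random.randint(0, 256, size=(224, 224, 3)).astype("float32")  # stand-in image

# width_shift_range / height_shift_range give the maximum horizontal / vertical
# shift as a fraction of the image size; each call draws a random shift.
augmenter = tf.keras.preprocessing.image.ImageDataGenerator(
    width_shift_range=0.2,    # horizontal shift augmentation
    height_shift_range=0.2,   # vertical shift augmentation
)
shifted = augmenter.random_transform(image)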

Random Flips

Flipping means mirroring an image across a horizontal or vertical axis.

In a horizontal flip, the image is mirrored about a vertical axis; in a vertical flip, it is mirrored about a horizontal axis.

Horizontal Flip Augmentation:

Reversing the order of the pixel columns (mirroring the image left to right) is called horizontal flip augmentation.

Vertical Flip Augmentation:

Reversing the order of the pixel rows (mirroring the image top to bottom) is called vertical flip augmentation.
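Both flips can be sketched with OpenCV's cv2.flip (sample.jpg is again a hypothetical file name):

import cv2

image = cv2.imread("sample.jpg")

horizontal = cv2.flip(image, 1)    # flip about the vertical axis (left-right mirror)
vertical = cv2.flip(image, 0)      # flip about the horizontal axis (top-bottom mirror)
both = cv2.flip(image, -1)         # flip about both axes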

Random Scale

Scaling is used to change the visual appearance of an image, to alter the quantity of information stored in a scene representation, or as a low-level preprocessor in a multi-stage image processing chain that operates on features of a particular scale. Scaling is a special case of an affine transformation.

Image scaling is an essential part of image processing; images need to be scaled up or down for many reasons.

Assume we have an image of resolution width × height that we want to resize to new_width × new_height. We first introduce the scaling factors scale_x = new_width / width and scale_y = new_height / height. A scale factor < 1 indicates shrinking, while a scale factor > 1 indicates stretching.
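A minimal scaling sketch with cv2.resize, where the scale factors are assumed values:

import cv2

image = cv2.imread("sample.jpg")
scale_x, scale_y = 0.5, 0.5        # < 1 shrinks the image

# fx / fy are the scale factors; the new size is computed from them.
smaller = cv2.resize(image, None, fx=scale_x, fy=scale_y, interpolation=cv2.INTER_AREA)
larger = cv2.resize(image, None, fx=2.0, fy=2.0, interpolation=cv2.INTER_LINEAR)   # > 1 stretches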
Gaussian Noise

Gaussian noise is a statistical noise whose probability density function (PDF) equals the normal distribution, also known as the Gaussian distribution. A random Gaussian value is added to the image function to generate this noise. It is also called electronic noise because it arises in amplifiers and detectors.

Gaussian noise is typically spread across the whole signal: a noisy image has pixels whose values are the sum of the original pixel value and a random Gaussian noise value. The probability density function of a Gaussian distribution has a bell shape. Additive white Gaussian noise is the most common form of Gaussian noise encountered in applications.

Removing Gaussian noise involves smoothing the interior of each distinct region of an image. Classical linear filters such as the Gaussian filter reduce this noise efficiently but blur the edges significantly.

Filtering image data is a standard step in almost every image processing system. Filters remove noise from images while preserving their details; the choice of filter depends on the filter's behaviour and the type of data.

We all know that noise is an abrupt change in pixel values in an image. So when it comes to filtering images, the first intuition is to replace the value of each pixel with the average of the pixels around it. This process smooths the image.
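A hedged sketch of both steps, adding Gaussian noise with an assumed standard deviation and then smoothing it with a Gaussian filter:

import cv2
import numpy as np

image = cv2.imread("sample.jpg").astype(np.float32)

# Additive white Gaussian noise: each pixel receives a value drawn from N(0, sigma^2).
sigma = 15.0
noise = np.random.normal(loc=0.0, scale=sigma, size=image.shape)
noisy = np.clip(image + noise, 0, 255).astype(np.uint8)

# A Gaussian filter averages each pixel with its neighbours, reducing the noise
# but also blurring edges, as noted above.
denoised = cv2.GaussianBlur(noisy, ksize=(5, 5), sigmaX=1.5)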
Computer Vision Introduction

Computer vision is a subfield of artificial intelligence that deals with acquiring, processing, analyzing, and making sense of visual data such as digital images and videos. It is one of the most compelling types of artificial intelligence, and one we encounter regularly in our daily routines.

Computer vision helps to understand the complexity of the human vision system and trains computer systems to interpret and gain a high-level understanding of digital images or videos. In the early days, developing a machine with human-like visual intelligence was just a dream, but with the advancement of artificial intelligence and machine learning it has become possible: intelligent systems have been developed that can "see" and interpret the world around them, much like human eyes. The fiction of yesterday has become the fact of today. In this introduction we will discuss a few important concepts of computer vision, such as:

o What is Computer Vision?
o How does Computer Vision Work?
o The evolution of computer vision
o Applications of computer vision
o Challenges of computer vision

What is Computer Vision?

Computer vision is one of the most important fields of artificial intelligence (AI) and computer science engineering; it makes computer systems capable of extracting meaningful information from visual data such as videos and images. It also helps systems take appropriate actions and make recommendations based on the extracted information.

Artificial intelligence, more broadly, is the branch of computer science that deals with creating smart, intelligent systems that can behave and think like the human brain. So we can say that if artificial intelligence enables computer systems to think intelligently, computer vision makes them capable of seeing, analyzing, and understanding.

History of Computer Vision

Computer vision is not a new technology because scientists and experts have
been trying to develop machines that can see and understand visual data for
almost six decades. The evolution of computer vision is classified as follows:

1959: The first experiment related to computer vision was carried out in 1959, when researchers showed a cat an array of images. They found that its visual system responded first to hard edges or lines, which in practical terms means that image processing begins with simple shapes such as straight edges.

1960: In 1960, artificial intelligence was added as a field of academic study to solve human vision problems.

1963: This was another great achievement for scientists when they developed
computers that could transform 2D images into 3-D images.

1974: This year, optical character recognition (OCR) and intelligent character recognition (ICR) technologies were developed. OCR solved the problem of recognizing text printed in any font or typeface, whereas ICR can decipher handwritten text. These inventions underpin some of the most important applications in document and invoice processing, vehicle number plate recognition, mobile payments, machine translation, etc.

1982: In this year, the algorithm was developed to detect edges, corners, curves,
and other shapes. Further, scientists also developed a network of cells that could
recognize patterns.

2000: In this year, scientists worked on a study of object recognition.

2001: The first real-time face recognition application was developed.


2010: The ImageNet data set became available to use with millions of tagged
images, which can be considered the foundation for recent Convolutional Neural
Network (CNN) and deep learning models.

2012: CNN has been used as an image recognition technology with a reduced
error rate.

2014: COCO has also been developed to offer a dataset for object detection and
support future research.

How does Computer Vision Work?

Computer vision is a technique that extracts information from visual data such as images and videos. Although computer vision is meant to work much as human eyes and the brain do, exactly how the human brain operates and solves visual object recognition is still one of the biggest open questions for IT professionals.

On a certain level, computer vision is all about pattern recognition: it includes training machine systems to understand visual data such as images and videos.

Firstly, a vast amount of labeled visual data is provided to the machine to train it. This labeled data enables the machine to analyze different patterns across all the data points and relate them to their labels. For example, suppose we provide visual data of millions of dog images. The computer learns from this data, analyzes each photo, its shapes, the distances between shapes, colours, and so on, and hence identifies the patterns common to dogs and builds a model. As a result, this computer vision model can accurately decide, for each input image, whether the image contains a dog or not.

Task Associated with Computer Vision

Although computer vision has been utilized in so many fields, there are a
few common tasks for computer vision systems. These tasks are given below:

Object classification: Object classification is a computer vision


technique/task used to classify an image, such as whether an image contains a
dog, a person's face, or a banana. It analyzes the visual content (videos &
images) and classifies the object into the defined category. It means that we can
accurately predict the class of an object present in an image with image
classification.

Object Identification/detection: Object identification or detection uses


image classification to identify and locate the objects in an image or video. With
such detection and identification technique, the system can count objects in a
given image or scene and determine their accurate location and labeling. For
example, in a given image, one dog, one cat, and one duck can be easily detected
and classified using the object detection technique.

Object Verification: The system processes videos, finds the objects based on
search criteria, and tracks their movement.

Object Landmark Detection: The system defines the key points for the
given object in the image data.
Image Segmentation: Image segmentation goes beyond image classification: rather than only detecting which classes are present in an image, it classifies each pixel of the image to specify which object it belongs to. It tries to determine the role of each pixel in the image.

Object Recognition: In this, the system recognizes the object's location


with respect to the image.

How to learn computer Vision?

Computer vision builds on the basic concepts of machine learning, deep learning, and artificial intelligence. If you are eager to learn computer vision, you should work through the following:

1. Build your foundation:


o Before entering this field, you must have strong knowledge of
advanced mathematical concepts such as Probability, statistics,
linear algebra, calculus, etc.
o The knowledge of programming languages like Python would be an
extra advantage to getting started with this domain.
2. Digital Image Processing:
It would be best if you understood image editing tools and their functions,
such as histogram equalization, median filtering, etc. Further, you should
also know about compressing images and videos using JPEG and MPEG
files. Once you know the basics of image processing and restoration, you
can kick-start your journey into this domain.
3. Machine learning understanding
To enter this domain, you must deeply understand basic machine learning
concepts such as CNN, neural networks, SVM, recurrent neural networks,
generative adversarial neural networks, etc.
4. Basic computer vision: This is the step where you need to understand the mathematical models used in visual data formulation.
These are a few important prerequisites that are essentially required to start your career in computer vision technology. Once you are prepared with the above prerequisites, you can easily start learning and build a career in computer vision.

Applications of computer vision

Computer vision is one of the most advanced innovations of artificial intelligence and machine learning. With the increasing demand for AI and machine learning technologies, computer vision has become a centre of attraction across different sectors. It greatly impacts many industries, including retail, security, healthcare, automotive, agriculture, etc.

Below are some of the most popular applications of computer vision:


o Facial recognition: Computer vision has enabled machines to detect face
images of people to verify their identity. Initially, the machines are given
input data images in which computer vision algorithms detect facial
features and compare them with databases of face profiles. Popular social
media platforms like Facebook also use facial recognition to detect and
tag users. Further, various government spy agencies are employing this
feature to identify criminals in video feeds.
o Healthcare and Medicine: Computer vision has played an important role
in the healthcare and medicine industry. Traditional approaches for
evaluating cancerous tumors are time-consuming and have less accurate
predictions, whereas computer vision technology provides faster and more
accurate chemotherapy response assessments; doctors can identify cancer
patients who need faster surgery with life-saving precision.
o Self-driving vehicles: Computer vision technology has also contributed to
its role in self-driving vehicles to make sense of their surroundings by
capturing video from different angles around the car and then introducing
it into the software. This helps to detect other cars and objects, read traffic
signals, pedestrian paths, etc., and safely drive its passengers to their
destination.
o Optical character recognition (OCR):
Optical character recognition helps us extract printed or handwritten text
from visual data such as images. Further, it also enables us to extract text
from documents like invoices, bills, articles, etc.
o Machine inspection: Computer vision is vital in providing image-based automatic inspection. It detects defects, missing features, functional flaws, and other irregularities in manufactured products, and it helps determine inspection goals and choose lighting and material-handling techniques.

o Retail (e.g., automated checkouts): Computer vision is also being


implemented in the retail industries to track products, shelves, wages,
record product movements into the store, etc. This AI-based computer
vision technique automatically charges the customer for the marked
products upon checkout from the retail stores.
o 3D model building: 3D model building or 3D modeling is a technique to
generate a 3D digital representation of any object or surface using the
software. In this field also, computer vision plays its role in constructing
3D computer models from existing objects. Furthermore, 3D modeling has
a variety of applications in various places, such as Robotics, Autonomous
driving, 3D tracking, 3D scene reconstruction, and AR/VR.
o Medical imaging: Computer vision helps medical professionals make
better decisions regarding treating patients by developing visualization of
specific body parts such as organs and tissues. It helps them get more
accurate diagnoses and a better patient care system. E.g., Computed
Tomography (CT) or Magnetic Resonance Imaging (MRI) scanner to
diagnose pathologies or guide medical interventions such as surgical
planning or for research purposes.
o Automotive safety: Computer vision has added an important safety feature
in automotive industries. E.g., if a vehicle is taught to detect objects and
dangers, it could prevent an accident and save thousands of lives and
property.
o Surveillance: It is one of computer vision technology's most important and
beneficial use cases. Nowadays, CCTV cameras are almost fitted in every
place, such as streets, roads, highways, shops, stores, etc., to spot various
doubtful or criminal activities. It helps provide live footage of public
places to identify suspicious behavior, identify dangerous objects, and
prevent crimes by maintaining law and order.
o Fingerprint recognition and biometrics: Computer vision technology
detects fingerprints and biometrics to validate a user's identity. Biometrics
deals with recognizing persons based on physiological characteristics,
such as the face, fingerprint, vascular pattern, or iris, and behavioral traits,
such as gait or speech. It combines Computer Vision with knowledge of
human physiology and behavior.

How to become a computer vision engineer?

Computer vision is one of the world's most popular & high-demand


technologies. Although starting your career in this domain is not easy, if you
have a good command of machine learning basics, advanced mathematics
concepts, and the basics of computer vision, you can easily start your career as a
computer vision engineer.

The roles and responsibilities required to become a computer vision engineer are as follows:

o To create and implement a vision algorithm for working with image and
video content pixels
o To develop a data-based approach for better problem solutions.
o Whenever required, you have to work on various AI and ML tasks
required for computer vision, such as image processing.
o Experience in working on various real-time project scenarios for problem-
solving.
o Hierarchical problem decomposition, implementation of solutions, and
integration with other sub-systems.
o Should be capable of understanding business objectives and can connect
to technical solutions through effective system design and architecture.

Job description (JD) for Computer vision engineer

o The candidate must have cumulative work experience in visual data


processing and analysis using machine learning and deep learning.
o Hands-on experience with languages and AI/ML frameworks such as Python, C++, TensorFlow, PyTorch, Keras, etc.
o Candidates must have good experience in implementing AI techniques.
o Must have good written and verbal communication skills.
o Candidates should be aware of object detection techniques and models
such as YOLO, RCNN, etc.

Which programming language is best for computer vision?

o Computer vision engineers require in-depth knowledge of machine


learning and deep learning concepts with strong command over at least
one programming language. There are so many programming languages
that can be used in this domain, but Python is among the most popular.
However, one can also choose OpenCV with Python, OpenCV with C++,
or MATLAB to learn and implement computer vision applications.
o OpenCV with Python could be the most preferred choice for beginners
due to its flexibility, simple syntax, and versatility. Several reasons make Python the best programming language for computer vision, as follows:
o Easy-to-use: Python is very famous as it is easy to learn for entry-level
persons and professionals. Further, Python is also easily adaptable and
covers all business needs.
o Most used programming language: Python is one of the most popular
programming languages as it contains complete learning environments to
get started with machine learning, artificial intelligence, deep learning,
and computer vision.
o Debugging and visualization: Python has an in-built facility for debugging
via 'PDB' and visualization through Matplotlib.

Computer Vision Challenges

o Computer vision has emerged as one of the fastest-growing domains of artificial intelligence, but it still faces a few challenges before it becomes a leading technology. A few challenges observed while working with computer vision technology are listed below.
o Reasoning and analytical issues: All programming languages and technologies require the basic logic behind any task. To become a computer vision expert, you must have strong reasoning and analytical skills; without them, defining any attribute in visual content can be a big problem.
o Privacy and security: Privacy and security are among the most important factors for any country. Vision-powered surveillance raises serious privacy issues in many countries. It restricts users from accessing unauthorized content, and various countries also avoid such face recognition and detection techniques for privacy and security reasons.
o Duplicate and false content: Cyber security is always a big concern for all
organizations, and they always try to protect their data from hackers and
cyber fraud. A data breach can lead to serious problems, such as creating
duplicate images and videos over the internet.

Computer Vision Features

In general, a feature is an attribute or property that represents an element. In computer vision, features are image or video properties that can be used to describe and understand the content. Corners, edges, angles, and colours are examples of low-level features, whereas objects, scenes, and behaviours are examples of high-level features.

Features are crucial in computer vision because they facilitate the


representation and analysis of visual input. Algorithms can find patterns and
correlations in photos and videos by extracting and describing them with features
and then making predictions or conclusions based on this knowledge.
Computer vision algorithms may reach great levels of accuracy and
understanding of visual input by integrating low and high-level information.

Low-level Vision

Based on low-level image processing, low-level vision tasks can be performed, such as image matching, optical flow computation, and motion analysis. Image matching is essentially finding correspondences between two or more images; these images could be the same scene taken from different viewpoints, a moving scene taken by a fixed camera, or both. Constructing image correspondences is a fundamentally important problem in vision for both geometry recovery and motion recovery; without exaggeration, image matching is part of the foundation of vision. Optical flow is an image-based observation of motion, but it is not the true motion: since it only measures optical changes in images, an aperture problem is unavoidable. Nevertheless, based on optical flow, camera motion or object motion can be estimated.

Low-Level Features

Low-level features are essential aspects of an image or video that may be


retrieved simply by applying straightforward techniques. Contours, edges,
angles, and colors are instances of low-level characteristics. These features are
frequently unique to a single image or video and have little relevance on their
own.

Common strategies for learning low-level features include edge, corner,


color detection, and color histogram analysis. Simple procedures, such as
the Canny edge detector, the Harris corner detector, and the K-means clustering
algorithm, can be employed to execute these strategies and extract these features.
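A small sketch of two of these detectors in OpenCV; the thresholds and parameters below are assumed values, not tuned ones:

import cv2
import numpy as np

gray = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)

# Edges: Canny detector with assumed hysteresis thresholds.
edges = cv2.Canny(gray, threshold1=100, threshold2=200)

# Corners: Harris response map; large values mark corner-like neighbourhoods.
harris = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
corners = harris > 0.01 * harris.max()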

Middle-level Vision

There are two major aspects of middle-level vision: (1) inferring the geometry and (2) inferring the motion. These two aspects are not independent but highly related. A simple question is "can we estimate geometry from just one image?". The answer is obvious: we need at least two images, which could be taken from two cameras or come from the motion of the scene. Some fundamental parts of geometric vision include multiview geometry, stereo, and structure from motion (SfM), which go from 2D to 3D by inferring 3D scene information from 2D images. Based on that, geometric modelling constructs 3D models of objects and scenes so that 3D reconstruction and image-based rendering become possible. Another task of middle-level vision is to answer the question "how does the object move?". Firstly, we should know which areas in the images belong to the object, which is the task of image segmentation. Image segmentation has been a challenging fundamental problem in computer vision for decades. Segmentation can be based on spatial similarities and continuities, but for a static image the uncertainty cannot be fully overcome; when motion continuities are also considered, the uncertainty of segmentation can be alleviated. On top of that are visual tracking and visual motion capture, which estimate 2D and 3D motions, including deformable motions and articulated motions.

High-level Vision

High-level vision infers semantics, for example object recognition and scene understanding. A question that has been challenging for many decades is how to achieve invariant recognition, i.e., recognizing a 3D object from different view directions. There have been two approaches to recognition: model-based recognition and learning-based recognition, and historically the two approaches have developed in a spiral, each borrowing from the other. Even higher-level vision is image understanding and video understanding. We are interested in answering questions like "Is there a car in the image?", "Is this video a drama or an action film?", or "Is the person in the video jumping?". Based on the answers to these questions, we should be able to fulfil different tasks in intelligent human-computer interaction, intelligent robots, smart environments, and content-based multimedia.

High-Level Features

High-level characteristics, on the contrary, are more conceptual and


thematically significant. They are generated from low-level feature pairings and
contain more complicated details about an image’s or video’s topic. Items,
scenarios, and interactions are instances of high-level features. These
characteristics are more generic and may be applied to a broader variety of
photos and videos.

Convolutional neural networks (CNNs) and recurrent neural


networks (RNNs) are two major strategies for learning high-level features. These
techniques may learn to extract and describe high-level features by being trained
on big datasets of annotated images or videos.

Furthermore, pre-trained models that have previously been trained with


large chunks of data are another technique for learning high-level characteristics.
So, these models may be fine-tuned for special purposes and tasks and, therefore,
can deliver accurate results with hardly any training.
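As a sketch of this idea, a pre-trained backbone such as MobileNetV2 (chosen here only as an example) can be used as a high-level feature extractor by dropping its classification head; the random stand-in batch is an assumption for illustration:

import numpy as np
import tensorflow as tf

# MobileNetV2 pre-trained on ImageNet, without the classification head,
# acts as a high-level feature extractor.
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, pooling="avg", weights="imagenet"
)

batch = np.random.rand(1, 224, 224, 3).astype("float32") * 255.0   # stand-in batch of one image
batch = tf.keras.applications.mobilenet_v2.preprocess_input(batch)
features = backbone.predict(batch)        # one 1280-dimensional high-level feature vector
print(features.shape)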

Differences Between Low and High-Level Features

The main difference between low- and high-level features is that low-level features are characteristics extracted directly from an image, such as colours, edges, and textures, whereas high-level features are built on top of low-level features and denote more semantically meaningful concepts.

Fundamentals of Image Formation

Image formation is the analog-to-digital conversion of an image with the help of 2D sampling and quantization techniques, carried out by capturing devices such as cameras. In general, we see a 2D view of the 3D world, and the formation of the digital image follows the same idea: it is basically a conversion of the 3D world (the analog scene) into a 2D digital image.

Generally, a frame grabber or a digitizer is used for sampling and


quantizing the analog signals.
Imaging:

The mapping of a 3D world object into a 2D digital image plane is


called imaging. In order to do so, each point on the 3D object must correspond to
the image plane. We all know that light reflects from every object that we see
thus enabling us to capture all those light-reflecting points in our image plane.

Various factors determine the quality of the image like spatial factors or
the lens of the capturing device.

Color and Pixelation:

In digital imaging, a frame grabber, which acts like a sensor, is placed at the image plane. The light reflected by the 3D object is focused onto it, and the continuous image is pixelated; the light that falls on the sensor generates an electronic signal.

Each pixel that is formed may be coloured or grey depending on the intensity of the sampling and quantization of the reflected light and of the electronic signal generated from it.

All these pixels together form a digital image. The density of these pixels determines the image quality: the higher the density, the clearer and higher-resolution the image we get.
Forming a Digital Image:

In order to form or create an image that is digital in nature, the continuous data must be converted into digital form. Two main steps are required to do so (a small sketch follows this list):

 Sampling (2D): Sampling determines the spatial resolution of the digital image, and the sampling rate determines the quality of the digitized image. The magnitude of the sampled image is recorded as a value in image processing; sampling relates to the coordinate values of the image.

 Quantization: Quantization determines the number of grey levels in the digital image. The transition from the continuous values of the image function to their digital equivalents is called quantization; it relates to the intensity values of the image.

 A human observer needs a high number of quantization levels to perceive the fine shading details of an image: more quantization levels result in a clearer image.
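A minimal sketch of both steps on a grayscale image (the file name and the chosen sampling and quantization factors are assumptions):

import cv2

gray = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)

# Coarser spatial sampling: keep every 4th pixel in each direction (lower resolution).
sampled = gray[::4, ::4]

def quantize(img, levels):
    # Map the 256 possible intensities down to `levels` evenly spaced grey values.
    step = 256 // levels
    return (img // step) * step

coarse = quantize(gray, 4)     # only 4 grey levels: visible banding
fine = quantize(gray, 64)      # 64 grey levels: close to the original shading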

Transformation

2D transformation in computer graphics is the process of modifying and re-positioning existing graphics in two dimensions. Transformations change an object's position, size, orientation, shape, etc.; the three basic rigid transformations are reflections, rotations, and translations.

Types of Transformations:
o Translation.
o Scaling.
o Rotating.
o Reflection.
o Shearing.

Affine transformation

An affine transformation is a combination of linear transformations and translations. By linear transformation we mean that lines are mapped to new lines, preserving their parallelism, and pixels are mapped to new pixels without disrupting the distance ratios. Affine transformations are also used in satellite image processing, data augmentation for images, and so on.

These transformations are performed by multiplying the pixel coordinates by a matrix M. Different transformations require different kernel matrices, which give the respective transformations when multiplied with the image coordinates. The affine transformation consists of the following transformations:

o Scaling
o Translation
o Shear
o Rotation

Note: A combination of these transformations is also an affine transformation.

Mathematical background

As mentioned earlier, matrix multiplication and addition play a big role in affine transformations. We first take a point X, with coordinates x and y, from the image and represent it as a vector with a third dimension set to 1. It is important to include this third dimension because otherwise the transformation would not be linear.

    X = [x, y, 1]^T

Now, if we want to transform this point X into X', we multiply X by a matrix M:

    X' = M · X,   where   M = | a  b  c |
                              | d  e  f |
                              | g  h  i |

Scaling

When we scale an image, we shrink or expand it. The matrix M for scaling is as follows:

    M = | Sx  0   0 |
        | 0   Sy  0 |
        | 0   0   1 |

Here Sx and Sy are the scaling parameters for the x and y axes, respectively.

Now we'll use tf.keras.preprocessing.image.apply_affine_transform to apply the transformations. The details of the parameters can be found in the official documentation.

PROGRAM:
import cv2
import tensorflow as tf

image = cv2.imread("Detective.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
transformation = tf.keras.preprocessing.image.apply_affine_transform(
    image,           # input image
    zx=0.5,          # scaling parameter along x
    zy=0.5,          # scaling parameter along y
    row_axis=0,      # index of the row axis in the image array
    col_axis=1,      # index of the column axis
    channel_axis=2   # index of the channel axis
)

We use the OpenCV library to read the input image, so the image is read in BGR order and we convert it to RGB. zx and zy are the scaling parameters, while row_axis, col_axis, and channel_axis tell the function the order of the image axes; we may not need these if we use another library to read the image.
The output of scaling is as follows:
Translation
Translation means picking up the image and placing it at a new position. Using the same column-vector convention as above, the kernel for translation is as follows:

    M = | 1  0  tx |
        | 0  1  ty |
        | 0  0  1  |

Here tx and ty are the translation parameters along the x and y axes, respectively.

Code implementation

Let's look at the code below:


transformation = tf.keras.preprocessing.image.apply_affine_transform(
    image,
    tx=-400,
    ty=400,
    row_axis=0,
    col_axis=1,
    channel_axis=2
)

Here tx and ty are the translation parameters.

The output of the translation transform is as follows:

The translated image

Shear

In this transformation, the image is slanted in the x or y direction.

The kernel for horizontal shear is:

    | 1   Sh  0 |
    | 0   1   0 |
    | 0   0   1 |

And the kernel for vertical shear is:

    | 1   0   0 |
    | Sv  1   0 |
    | 0   0   1 |

Here Sh and Sv are the parameters for horizontal and vertical shearing, respectively.

Code implementation

Let's look at the code below:

transformation = tf.keras.preprocessing.image.apply_affine_transform(
image,
shear=30,
row_axis=0,
col_axis=1,
channel_axis=2
)
 In the implementation provided by tf.keras.preprocessing.image.apply_affine_transform, we provide a shear angle (in degrees) rather than the kernel entries directly.

The output of the shearing transform is as follows:

Rotation
In rotation, we rotate the image by an angle θ. The kernel for this transformation is as follows:

    |  cos(θ)  sin(θ)  0 |
    | −sin(θ)  cos(θ)  0 |
    |  0       0       1 |

Code implementation

Let's look at the code below:

transformation = tf.keras.preprocessing.image.apply_affine_transform(
image,
theta=90,
row_axis=0,
col_axis=1,
channel_axis=2
)

 We rotate the image by the angle given in theta (in degrees).

The output of the rotation transform is as follows:


Projective transformation

The projective transformation shows how the perceived objects change when
the view point of the observer changes. This transformation allows creating
perspective distortion. The affine transformation is used for scaling, skewing
and rotation. Graphics Mill for .NET supports both these classes of
transformations.

Difference Between Projective and Affine Transformations

The sole difference between these two transformations is in the last line of the
transformation matrix. For affine transformations, the first two elements of this
line should be zeros. But this leads to different properties of the two operations:

o The projective transformation does not preserve parallelism, length, and


angle. But it still preserves collinearity and incidence.
o Since the affine transformation is a special case of the projective
transformation, it has the same properties. However unlike projective
transformation, it preserves parallelism.

A projective transformation can be represented as the transformation of an arbitrary quadrangle (i.e., a system of four points) into another one. An affine transformation is a transformation of a triangle; since the last row of the matrix is fixed, three points are enough. The image below illustrates the difference.

A projective transformation cannot always be calculated through a matrix multiplication: if the transformation matrix is singular, it leads to problems. The transformation matrix is singular when it represents a non-convex quadrangle. A shape is convex when every point that lies between two points belonging to the shape also belongs to the shape. Put more simply, if

o the quadrangle is self-intersecting, or
o some vertex lies "inside" the quadrangle, or
o some vertices are situated at the same point,

then the quadrangle is non-convex. The figure below demonstrates some examples of non-convex quadrangles:

When you carry out such a transformation, make sure that its matrix is not singular; Graphics Mill for .NET is not able to apply such transformations.

How to Apply Projective Transformations


To create the matrix for the projective transformation, use
the Matrix.FromProjectivePoints method. To initialize the existing instance of
the Matrix class, use the Matrix.FillFromProjectivePoints method.
Important

Be careful when choosing destination points. As the transformation matrix


should be non-singular, do not specify points that form a self-intersecting
quadrangle.

Here is an example of perspective distortion effect.


Visual Basic

Dim bitmap As New Aurigma.GraphicsMill.Bitmap("C:\image.jpg")


Dim source As System.Drawing.PointF() = { _
New System.Drawing.PointF(0.0F, 0.0F), _
New System.Drawing.PointF(0.0F, bitmap.Height), _
New System.Drawing.PointF(bitmap.Width, bitmap.Height), _
New System.Drawing.PointF(bitmap.Width, 0.0F)}

Dim target As System.Drawing.PointF() = { _


New System.Drawing.PointF(0.0F, 0.0F), _
New System.Drawing.PointF(0.0F, bitmap.Height), _
New System.Drawing.PointF(bitmap.Width * 0.75F, bitmap.Height - 50.0F), _
New System.Drawing.PointF(bitmap.Width * 0.75F, 80.0F)}

Dim matrix As Aurigma.GraphicsMill.Transforms.Matrix = _
Aurigma.GraphicsMill.Transforms.Matrix.FromProjectivePoints(source, target)

Dim transform As New _
Aurigma.GraphicsMill.Transforms.ApplyMatrixTransform(matrix)

transform.ApplyTransform(bitmap)
C#

Aurigma.GraphicsMill.Bitmap bitmap = new Aurigma.GraphicsMill.Bitmap(@"C:\image.jpg");

System.Drawing.PointF [] source = {
new System.Drawing.PointF(0f, 0f),
new System.Drawing.PointF(0f, bitmap.Height),
new System.Drawing.PointF(bitmap.Width, bitmap.Height),
new System.Drawing.PointF(bitmap.Width, 0f)};

System.Drawing.PointF [] target = {
new System.Drawing.PointF(0f, 0f),
new System.Drawing.PointF(0f, bitmap.Height),
new System.Drawing.PointF(bitmap.Width * 0.75f, bitmap.Height - 50f),
new System.Drawing.PointF(bitmap.Width * 0.75f, 80f)};

Aurigma.GraphicsMill.Transforms.Matrix matrix =
Aurigma.GraphicsMill.Transforms.Matrix.FromProjectivePoints(
source, target);

Aurigma.GraphicsMill.Transforms.ApplyMatrixTransform transform =
new Aurigma.GraphicsMill.Transforms.ApplyMatrixTransform(matrix);

transform.ApplyTransform(bitmap);

The image that will be produced will look as follows (resized version).
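For readers working in Python rather than Graphics Mill, a roughly equivalent sketch with OpenCV is given below; the corner coordinates simply mirror the example above, and image.jpg is a placeholder file name:

import cv2
import numpy as np

image = cv2.imread("image.jpg")
h, w = image.shape[:2]

# Map the four source corners onto a quadrangle whose right edge is pulled inwards,
# mimicking the perspective distortion of the Graphics Mill example.
source = np.float32([[0, 0], [0, h], [w, h], [w, 0]])
target = np.float32([[0, 0], [0, h], [0.75 * w, h - 50], [0.75 * w, 80]])

matrix = cv2.getPerspectiveTransform(source, target)
warped = cv2.warpPerspective(image, matrix, (w, h))
cv2.imwrite("warped.jpg", warped)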

Fourier Transform:
Fourier
Fourier was a mathematician who, in 1822, introduced the Fourier series and the Fourier transform, which convert a signal into the frequency domain.

Fourier transform
The Fourier transform states that non-periodic signals whose area under the curve is finite can be represented as integrals of sines and cosines after being multiplied by certain weights. The Fourier transform has many applications, including image compression (e.g., JPEG compression), filtering, and image analysis.

Which one is applied to images?
Now the question is which one is applied to images: the Fourier series or the Fourier transform. The answer lies in what images are. Images are non-periodic, and since they are non-periodic, the Fourier transform is used to convert them into the frequency domain.

Discrete Fourier transform
Since we are dealing with images, and in fact digital images, we work with the discrete Fourier transform.

Each Fourier term of a sinusoid includes three things:

Spatial Frequency
Magnitude
Phase

The spatial frequency directly relates to the brightness of the image, the magnitude of the sinusoid directly relates to the contrast (the difference between the maximum and minimum pixel intensity), and the phase contains the colour information.

The formula for the 2-dimensional discrete Fourier transform of an M x N image is:

    F(u, v) = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) · e^{−j2π(ux/M + vy/N)}

The discrete Fourier transform is the sampled Fourier transform, so it contains a finite set of samples that represent the image. In the formula above, f(x, y) denotes the image and F(u, v) denotes its discrete Fourier transform. The 2-dimensional inverse discrete Fourier transform, which converts the Fourier transform back to the image, is:

    f(x, y) = (1 / (M·N)) Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} F(u, v) · e^{j2π(ux/M + vy/N)}
Now consider an image for which we compute the FFT magnitude spectrum, then the shifted FFT magnitude spectrum, and finally the logarithm of that shifted spectrum:

Original Image
The Fourier transform magnitude spectrum
The shifted Fourier transform
The shifted magnitude spectrum (log scale)
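These steps can be sketched in NumPy (assuming a grayscale sample.jpg, a placeholder name):

import cv2
import numpy as np

gray = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)

fft = np.fft.fft2(gray)                # 2D discrete Fourier transform of the image
shifted = np.fft.fftshift(fft)         # move the zero-frequency term to the centre
magnitude = np.abs(shifted)            # shifted magnitude spectrum
log_magnitude = np.log1p(magnitude)    # log scale makes the spectrum easier to visualise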

Properties of Fourier Transform:


Linearity:
The Fourier transform is linear: if we multiply a function by a constant, the Fourier transform of the resulting function is multiplied by the same constant, and the Fourier transform of a sum of two or more functions is the sum of their Fourier transforms.
 Case I. If h(x) -> H(f), then a·h(x) -> a·H(f)
 Case II. If h(x) -> H(f) and g(x) -> G(f), then h(x) + g(x) -> H(f) + G(f)

Scaling:
Scaling changes the range of the independent variable. If we stretch a function by a factor in the time domain, its Fourier transform is squeezed by the same factor in the frequency domain.
 If f(t) -> F(w), then f(at) -> (1/|a|) F(w/a)

Differentiation:
Differentiating a function with respect to time corresponds to multiplying its Fourier transform by jw.
 If f(t) -> F(w), then f'(t) -> jw F(w)

Convolution:
The Fourier transform of the convolution of two functions is the point-wise product of their Fourier transforms.
 If f(t) -> F(w) and g(t) -> G(w), then f(t) ⊛ g(t) -> F(w) · G(w)

Frequency Shift:
There is a duality between the time and frequency domains: multiplying a function by a complex exponential in time shifts its spectrum in frequency.
 If f(t) -> F(w), then f(t) exp[jw't] -> F(w − w')

Time Shift:
A shift in the time variable also affects the frequency function: a linear displacement in time corresponds to a linear phase factor in the frequency domain.
 If f(t) -> F(w), then f(t − t') -> F(w) exp[−jwt']
Convolution
Computer vision is a branch that involves various complex algorithms and techniques used to load, handle, preprocess, and analyze the image dataset that will be used to train the final model. Computer vision covers many well-known tasks such as object detection, image segmentation, face recognition, etc.

For computer vision, convolutional neural networks are used, which are the type of neural network that deals with image datasets. They accept images as input, load them, preprocess them, and apply different techniques to extract information from them.

Convolutional neural networks are fundamentally the same as artificial neural networks; the term "artificial" is simply replaced by "convolutional", which means that convolutions, or convolutional operations, are involved in these networks.

What is a Convolution? How is it relevant? Why use Convolution?


These are some of the questions every data scientist encounters at least once in their deep learning journey. I have these questions now and then.

So, mathematically speaking, convolution is an operator on two functions (matrices) that produces a third function (matrix): the input modified by the other function so that it carries different features (values in the matrix).

In computer vision, convolution is generally used to extract or create a feature map (with the help of kernels) from the input image.

BASIC TERMINOLOGIES

Figure 2: 2D Convolution (GIF by Vincent)


In the above image, the blue matrix is the input and the green matrix is the
output. Whereas we have a kernel moving through the input matrix to get/extract
the features. So let’s first understand the input matrix.

Input Matrix: An image is made up of pixels, where each pixel is in the


inclusive range of [0, 255]. So we can represent an image, in terms of a matrix,
where every position represents a pixel. The pixel value represents how bright it
is, i.e. pixel -> 0 is black and pixel -> 255 is white (highest brightness). A
grayscale image has a single matrix of pixels, i.e. it doesn't have any colour,
whereas a coloured image (RGB) has 3 channels, and each channel represents its
colour density.

Figure 3: Grayscale image of digit 8; shape (24 x 16); every pixel (0–255)
represents the brightness/density of the colour (Image by Author)

The above image is of shape: (24, 16) where height = 24 and width = 16.
Similarly, we have a coloured image (RGB) having 3 channels and it can be
represented in a matrix of shape: (height, width, channels)

Now we know the first input to the convolution operator, but how do we transform this input and get the output feature matrix? Here comes the term 'kernel', which acts on the input image to produce the required output.
Kernel: In an image, we can say that a pixel surrounding another pixel has
similar values. So to harness this property of the image we have kernels. A
kernel is a small matrix that uses the power of localisation to extract the required
features from the given image (input). Generally, a kernel is much smaller than
the input image. We have different kernels for different tasks like blurring,
sharpening or edge detection.

The convolution happens between the input image and the given kernel. It
is the sliding dot product between the kernel and the localised section of the
input image.

Figure 4: Input (5, 5); Kernel (3, 3); Convolution 2D (Image by Author)

In the above image, for the first convolution, we will select a 3x3 region in
the image (sequential order) and do a dot product with the kernel. Ta-da! This is
the first convolution we did and we will move the region of interest, pixel by
pixel(stride).
Figure 5: Convolution between the kernel and selected region in the
image (Image by Author)

The dimension of the region of interest (ROI) is always equal to the


kernel’s dimension. We move this ROI after every dot product and continue to
do so until we have the complete output matrix.
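The sliding dot product described above can be sketched directly in NumPy; this is a plain illustrative implementation, not an optimised one:

import numpy as np

def convolve2d(image, kernel, stride=1):
    """Sliding dot product of `kernel` over `image` (no padding)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            roi = image[i * stride: i * stride + kh, j * stride: j * stride + kw]
            out[i, j] = np.sum(roi * kernel)   # dot product of the ROI with the kernel
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5 x 5 input
kernel = np.ones((3, 3)) / 9.0                     # simple averaging kernel
print(convolve2d(image, kernel))                   # 3 x 3 output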

Figure 6: 2D Convolution with the horizontal kernel (Image by Author)

So by now, we know how to perform a convolution, what exactly is the


input, and what the kernel looks like. But after every dot product, we slide the
ROI by some pixels (can skip 1, 2, 3 … pixels). This functionality is controlled
by a parameter called stride.
Stride: It is a parameter which controls/modifies the amount of movement of the ROI in the image. A stride greater than 1 is used to decrease the output dimension: intuitively, it skips some of the overlap between successive dot products, which reduces the final shape of the output.

Figure 7: Example of a 2D Convolution (GIF by Vincent)

In the above image, we always move the ROI by 1 pixel and perform the
dot product with the kernel. But if we increase the stride, let’s say stride = 2,
then the output matrix would be of dimension -> 2 x 2 (Figure 8).
Figure 8: Stride = 2 in a 2D convolution (GIF by Vincent)

CONVOLUTION FOR MULTI-CHANNEL IMAGE

Now, we know what the input looks like, and what are some of the
general parameters like kernel, and stride. But how does the convolution work if
we have multiple channels in the image, i.e. image is coloured or to be more
precise if the input matrix is of shape: (height, width, channels), where the
channel is greater than 1.

Figure 9: Coloured image; RGB (BGR) channels (Image by Author)


Now you may have a question, we only have a single kernel and how to
use it on a stack of the 2D matrices (in our case, three 2D matrices stacked
together). So here we will introduce the term ‘filter’.

Generally people interchange filters and kernels, but in reality they are different.

Filter: It is a group of kernels which is used for the convolution of the image.
For eg: in a coloured image we have 3 channels, and for each channel, we would
have a kernel (to extract the features), and a group of such kernels is known as a
filter. For a grayscale image (or a 2D matrix) the term filter is equal to
a kernel. In a filter, all the kernels can be the same or different from each other.
The specific kernel can be used to extract specific features.

So how does the convolution takes place?

Figure 10: 2D Convolution on a coloured image (GIF by Author)

Every channel is convolved with its own kernel (exactly as in the convolution on a grayscale image) to extract features. All the kernels must have the same dimensions. As a result, we have multiple output matrices (one per channel), which are combined (by matrix addition) into a single output.
Figure 11: Aggregation of the output of multiple channels into one (Image by
Author)

So the output is the feature map (extracted features) of the given image.
We can further use this output with classical machine learning algorithms for
classification/regression tasks, or this output can also be used as one of the
variations in the given image.

PADDING & OUTPUT DIMENSIONS

At this juncture, we have a pretty decent understanding of how the convolution between the image and the kernels takes place. But this isn't enough: sometimes we need the output to have exactly the same dimensions as the input, i.e., we need to convert the input into a different form of the same size. This is where padding comes in.

Padding: It helps keep the output size constant; without it, the kernel produces a smaller output, which can create a bottleneck in some scenarios. Padding also helps retain the information at the border of the image. In padding, we pad the boundary of the image (the edges) with fake pixels. Most often we use zero-padding, i.e., adding dark pixels at the edges of the matrix, so that the information at the original edge pixels is not lost.

Figure 12: Example of padding (GIF by Vincent)

We can also add multiple padding, i.e. instead of a single-pixel pad at


every edge, we can use n number of pixels at the boundary. This is normally
decided by the kernel dimension used in the convolution.
Figure 13: Padding of 2 pixels in a 2D convolution (GIF by Vincent)

Congratulations! We have covered the foundational concepts of the convolution operation. But how do we know the output shape of the matrix? There is a simple formula for the shape of the output matrix (a small code check follows this list):

    w_out = (w_i − w_k + 2·p_w) / s_w + 1
    h_out = (h_i − h_k + 2·p_h) / s_h + 1

Where:
w_i -> width of the input image, h_i -> height of the input image
w_k -> width of the kernel, h_k -> height of the kernel
s_w -> stride for width, s_h -> stride for height
p_w -> padding along the width of the image, p_h -> padding along the height of the image

Generally, the padding, stride, and kernel in a convolution are symmetric (equal for height and width), which reduces the formula to:

    o = (i − k + 2p) / s + 1

Where:
i -> input shape (height = width)
k -> kernel shape
p -> padding along the edges of the image
s -> stride for the convolution (for the sliding dot product)
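A quick code check of this formula against the earlier examples (a sketch, with the numbers taken from the 5 x 5 input / 3 x 3 kernel case):

def conv_output_size(i, k, p=0, s=1):
    # o = (i - k + 2p) / s + 1 for a symmetric input, kernel, padding and stride
    return (i - k + 2 * p) // s + 1

print(conv_output_size(i=5, k=3))         # 3: matches the 5 x 5 input with a 3 x 3 kernel
print(conv_output_size(i=5, k=3, s=2))    # 2: stride 2 shrinks the output further
print(conv_output_size(i=5, k=3, p=1))    # 5: padding of 1 keeps the input size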

So, now what? In the upcoming section, we are going to learn about
different types of kernels, which are designed to perform a specific operation on
the image.

IMAGE KERNELS

Convolutions can be used in two different ways; either with a learnable


kernel in a Convolutional Neural Network with the help of gradient descent or
with a pre-defined kernel to convert the given image. Today, we would be
focusing on the latter one and learning what all different types of transformations
are possible with the help of kernels.

Let’s consider this as an input image.


Figure 16: Input Image (matrix representing the image) (Image by Victor)

Now, with the help of kernels and using convolution on the input image and the
given kernel, we would transform the given image into the required form.
Following are a few transformations, that can be done with the help of custom
kernels in image convolutions.

1. Detecting Horizontal and Vertical lines

2. Edge Detection

3. Blur, sharpen, outline, emboss and various other transformations found in


Photoshop

4. Erode and Dilation

So let’s first have a look at one of the most basic kernels, and it is the blur
kernel. There are many variations of the blur kernel, but the most famous are
Gaussian and Box blur.
Figure 17: Blur Kernel (Image by Author)

Figure 18: Gaussian Blur on the given image (Image by Victor)

Similarly, other kernels help transform the image into its required form. There is a great interactive illustration for trying out different kernels on the above image: https://setosa.io/ev/image-kernels/. Generally, though, there are some kernels that we as data scientists should get a hold of, namely the line detection kernels, which primarily include horizontal, vertical, and diagonal lines.
Figure 19: Line Detection Kernels (Image by Author)

From the above figure, the intuition behind these line detection kernels is clear. Suppose the task is to detect horizontal lines in an image, and let's construct a kernel for it; in general, a 3 x 3 kernel is a good option to start with. To detect horizontal lines, we can emphasise the centre row of the neighbourhood and subtract the rows above and below it. In that case a horizontal line becomes clearly visible, because we have decreased the pixel values in its vicinity along the direction parallel to the type of line we wish to detect.
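A hedged sketch of applying such a horizontal-line kernel with OpenCV's filter2D (the kernel values below are just one common choice):

import cv2
import numpy as np

gray = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# A basic horizontal-line kernel: the centre row is emphasised and the rows
# above and below are subtracted.
horizontal_kernel = np.array([[-1, -1, -1],
                              [ 2,  2,  2],
                              [-1, -1, -1]], dtype=np.float32)

response = cv2.filter2D(gray, ddepth=-1, kernel=horizontal_kernel)   # strong values where horizontal lines lie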

But in what ratio should we decrease the pixels in the vicinity or increase
the pixel for the line to detect? Actually, it all depends on the use case along with
a few iterations with the kernels, to get a good match.

Let’s play with the given horizontal kernel and its convolution results.
Figure 20: Horizontal Line Detection (Image by Author)

Surprised! So from figure 20, we can easily state that if we increase the
values in the basic horizontal kernel, then only a few lines are detected. This is
the power of kernels on the image. Moreover, we can also increase/decrease the
density for the given kernel by symmetric modification of the given values.


Filtering

Filtering refers to linear transforms that change the frequency content of
signals. Depending on whether the high (or low) frequencies are attenuated, the filtering
process is called low (or high) pass.

Low Pass Filtering Example: the two-point moving average. Recall the linear time-invariant system

y(n) = [s(n) + s(n − 1)] / 2,

where s(n) is a periodic signal with period N ≥ 2 (so s(−1) = s(N − 1)). The output y is a
smoother signal: the higher frequency components are damped and the low frequency
components are maintained. To see this, let s(n) = sin(2πfn/N). Then:

y(n) = (1/2) s(n) + (1/2) s(n − 1)
     = (1/2) sin(2πfn/N) + (1/2) sin(2πf(n − 1)/N)
     = (1/2)(1 + cos(2πf/N)) sin(2πfn/N) − (1/2) sin(2πf/N) cos(2πfn/N)
     = A1 sin(2πfn/N) − A2 cos(2πfn/N),

where A1 = (1/2)(1 + cos(2πf/N)) and A2 = (1/2) sin(2πf/N).

At low frequency, f ≈ 0, we have A1 ≈ 1 and A2 ≈ 0, so y(n) ≈ sin(2πfn/N) = s(n):
low frequency input is almost preserved. At high frequency, f ≈ N/2 (near the
Nyquist frequency), we have A1 ≈ 0 and A2 ≈ 0, so y(n) ≈ 0. High frequency input is almost
zeroed out.
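
The same damping behaviour can be checked numerically with a short NumPy sketch (our own construction, using circular indexing for the periodic boundary):

import numpy as np

N = 64
n = np.arange(N)

def moving_average(s):
    # y(n) = [s(n) + s(n - 1)] / 2, with periodic (circular) indexing
    return 0.5 * (s + np.roll(s, 1))

low = np.sin(2 * np.pi * 1 * n / N)              # f = 1, a low frequency
high = np.sin(2 * np.pi * (N // 2 - 1) * n / N)  # f close to N/2, near Nyquist

print(np.max(np.abs(moving_average(low) - low)))  # close to 0: low frequency almost preserved
print(np.max(np.abs(moving_average(high))))       # close to 0: high frequency almost cancelled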

Signal Filtering

Consider the following signal:

Image by Author

An essential concept in signal processing is the superposition of
signals, which revolves around breaking down a signal into simpler components.
Keep in mind that the whole purpose of this concept is to understand
how signals and systems work.

By breaking down the previous signal into simpler ones, we obtain the following
components:

Image by Author
After obtaining these, we can then analyze how they are affected by the system
individually. In doing so, we are using information about how the system
responds to an impulse, which is exactly what a convolution combines with the
input. You are probably familiar with this term if you came to this article;
however, knowing what you are actually doing when performing a convolution
might help you in the future when solving more complex problems. Another term
you are probably very familiar with is the filter kernel, which is nothing more than
the system’s impulse response. This is very important because, if we know how a
system responds to an impulse, its output can be calculated for any given input signal.

The convolution operation on signals follows the same process as a
convolution on an image: the filter has to pass through the total length of the
signal with a given size and stride. The following image illustrates the
convolution of a signal x with a kernel h.

Image by Author
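
As a minimal sketch of that sliding process (the signal and kernel values are invented for illustration), NumPy's convolve moves the kernel h across the signal x with stride 1:

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])  # input signal
h = np.array([0.25, 0.5, 0.25])                    # filter kernel (impulse response)

# mode="same" keeps the output the same length as the input
y = np.convolve(x, h, mode="same")
print(y)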

Low-pass Filtering vs. High-pass Filtering

Now that you understand the process of filtering and what it actually means, it is
also important to be able to tell the difference between low-pass and high-pass
filtering.

 Low-pass filtering — it can be seen as a smoothing filter, used to
attenuate high frequencies and preserve the lower ones. You can easily
distinguish it from the other filters since it only has positive values (1
direction).

Image by Author

 High-pass filtering — contrary to low-pass filtering, this one is used to
attenuate the low frequencies and preserve the high ones. The idea is to
highlight the parts of the signal where it changes the most. You can easily
identify them as high-pass as they have both positive and negative values
(2 directions).

Image by Author
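
A hedged one-dimensional illustration of the two behaviours (the kernels below are common textbook choices, not necessarily the ones shown in the figures):

import numpy as np

signal = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0], dtype=float)  # a step up and back down

low_pass = np.array([1/3, 1/3, 1/3])     # only positive values: smoothing
high_pass = np.array([-1.0, 2.0, -1.0])  # positive and negative values: change detector

print(np.convolve(signal, low_pass, mode="same"))   # the edges become gradual slopes
print(np.convolve(signal, high_pass, mode="same"))  # non-zero only at the transitions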

It is extremely important to analyze these types of filters when applied to
signals, since doing so builds the foundations you need to deeply understand
their effects on image processing (2D).
With this being said, at this point, even if you did not previously know the difference
between these two types of filters, you can already begin to understand the
different effects that they will have when applied to an image.

Image Filtering

Before CNN (Convolutional Neural Network) models were proposed,
there was a need to manually extract images' features and then feed them into a
neural network or any other type of classifier. These features could be simple,
such as the number of white and black pixels, or a bit more complex, like
brightness, intensity, and energy. This was especially hard given that there was
no way to know which features would be the most meaningful for the classifier,
besides trying different combinations of features and analyzing how the
model’s performance behaves; this can be a very time-consuming
process. From personal experience, coming up with new and meaningful
features to extract is not an easy task and can sometimes lead to adding features
that negatively impact the model’s performance. To obtain more distinctive
features from an image, it is also common to apply filters to it and then extract
features from the resulting image. The Gabor filter (high-pass) is often used to
detect edges in an image, as shown in the following image.

Image by Author
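
For readers who want to try this, a minimal OpenCV sketch follows (the file name and all Gabor parameters are arbitrary illustrative choices, not values taken from the text):

import cv2

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file name

# 21x21 Gabor kernel: sigma 4, orientation 0 rad, wavelength 10, aspect ratio 0.5, phase 0
gabor = cv2.getGaborKernel((21, 21), 4.0, 0.0, 10.0, 0.5, 0.0, ktype=cv2.CV_32F)

# The filtered image responds to structures at the kernel's orientation and scale
response = cv2.filter2D(img, -1, gabor)
cv2.imwrite("gabor_response.jpg", response)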
This is where CNNs come in and completely change the feature extraction
process. Nowadays, if you have a large amount of data, you don’t need to
perform any feature extraction manually since the model will find the most
meaningful features on its own.

When carrying out this process the old way, you had to try different
combinations of features to evaluate the model’s performance. Now, the CNN
model is doing the same, but much more rigorously, automatically and on a
much larger scale.

With this being said, one of the main downsides of any neural network
model is interpretability. If there are 4096 filters in a convolutional model, you
have no idea what they mean or which features they are extracting. All you
know is that they extract very good features and that those features are the most
meaningful for that particular problem. While this method works perfectly well for
an immense number of real-world problems, if you ever want your solution to
have a chance at being accepted by governments, health entities, and other
sensitive areas, your model must be able to explain its decisions. People
would rather have a model that is 70% accurate and explains every decision it
makes than a 99% accurate model that does not output any type of explanation.

In addition, it is worth showing a few more examples of what features can be extracted with
the two types of filters mentioned in this article.

If you take into consideration the definition of each filter type in the
signals section, the same effect applies to images. On the one hand, low-pass
filters have all positive values and are commonly used to smooth images or to
perform blurring.
Smoothing filter. Image by Author

But on the other hand, high-pass filters are commonly used to perform edge
detection. If you remember their definition, high-pass filters preserve the high
frequencies. In images, they highlight zones where the difference between pixel
values is high.

Edge detection filter. Image by Author
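
To make the contrast concrete, here is a short, hedged sketch (file name hypothetical) applying an all-positive averaging kernel and a mixed-sign Laplacian-style kernel to the same image:

import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file name

# Low-pass: all positive weights summing to 1 -> smoothing / blurring
smooth_kernel = np.ones((5, 5), np.float32) / 25.0
smoothed = cv2.filter2D(img, -1, smooth_kernel)

# High-pass: positive centre, negative neighbours -> responds where pixel values change
edge_kernel = np.array([[ 0, -1,  0],
                        [-1,  4, -1],
                        [ 0, -1,  0]], np.float32)
edges = cv2.filter2D(img, -1, edge_kernel)

cv2.imwrite("smoothed.jpg", smoothed)
cv2.imwrite("edges.jpg", edges)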

Image enhancement
Image enhancement is a powerful tool that can be used to improve the
visual quality of images and to equip machine vision systems with more
accurate information. In its simplest form, image enhancement involves altering
an image to make it appear clearer, sharper, and more visually appealing. By
improving the visibility of important features in an image, applying algorithms
such as contrast stretching, noise reduction, and edge detection can help
maximize performance for a variety of computer vision tasks.
In this blog post, we’ll explore what exactly Image Enhancement is and
how different types of Image Enhancement techniques affect Machine Vision
Systems. If you’re involved in the technical field, then you may have heard of
image enhancement and its benefits for machine vision. But, what exactly is
image enhancement?

What is Image enhancement?


Image enhancement is the technique of enhancing an image's visual
appeal. It entails changing an image's visual elements to make it more
aesthetically pleasing or to deliver its message more effectively. Different
methods, such as modifying the image's contrast, brightness, color balance, or
sharpness, can be used to achieve this. Image enhancement can be used for a
wide range of purposes, including improving the visual appeal of a photograph
or making features in a medical image more visible. It can be applied to
many kinds of images, including photographs, medical images, and satellite images.
The best strategy for image enhancement depends on the specifics of the image
and the desired outcome; there are numerous methods and techniques available.

Different methods to perform Image enhancement

The best way to utilize image enhancement will depend on the particular
qualities of the image and the intended results, and there are many different methods
and techniques available. Typical techniques for improving images include the following:

1. Adjusting the contrast: Changing the spectrum of tonal values within an
image to emphasize highlights and shadows or to reveal hidden details is
known as adjusting contrast. This can be accomplished using methods
like histogram stretching or equalization (a small code sketch covering a
few of these techniques follows this list).
2. Adjusting the brightness: This entails altering an image's overall
brightness or darkness. Techniques like tone mapping and gamma
correction can be used to achieve this.
3. Adjusting the color balance: This involves changing the relative
proportions of the primary colors in an image in order to achieve a more
natural or aesthetically pleasing color balance. Techniques like color
correction and white balancing can be used to achieve this.
4. Sharpening: Sharpening improves the clarity and definition of an image
by enhancing its edges and fine details. Techniques
like frequency-domain filters and unsharp masking can be used for this.
5. Filtering: Filtering is the process of performing a mathematical operation
on an image in order to draw attention to specific details or eliminate
unwanted elements. Low pass, high pass, and edge detection filters are
just a few of the numerous types of filters that can be used to improve
images.
6. Resampling: Resampling entails adding or removing pixels to alter the
resolution of an image. This can be helpful for enhancing an image's
sharpness or clarity, or for scaling it to fit a particular size or aspect ratio.
7. Deblurring: Deblurring is the process of removing blur or noise from
an image in order to improve its sharpness and clarity. Techniques
like deconvolution or image restoration are used for this.
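
As promised above, here is a minimal, illustrative OpenCV sketch of a few of these techniques (the parameter values and the file name are assumptions, not prescriptions): a linear rescale for contrast and brightness, histogram equalization for contrast, and unsharp masking for sharpening.

import cv2

img = cv2.imread("input.jpg")  # hypothetical file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Contrast and brightness: out = alpha * in + beta (alpha > 1 raises contrast, beta shifts brightness)
adjusted = cv2.convertScaleAbs(img, alpha=1.3, beta=20)

# Contrast via histogram equalization (expects an 8-bit single-channel image)
equalized = cv2.equalizeHist(gray)

# Sharpening via unsharp masking: add back the detail removed by a Gaussian blur
blurred = cv2.GaussianBlur(img, (0, 0), 3)
sharpened = cv2.addWeighted(img, 1.5, blurred, -0.5, 0)

cv2.imwrite("adjusted.jpg", adjusted)
cv2.imwrite("equalized.jpg", equalized)
cv2.imwrite("sharpened.jpg", sharpened)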

Now, let's take a look at some of the most prevalent applications of image
enhancement.

Applications of image enhancement

There are several uses for image enhancement, including:


o Photography: By changing a photograph's contrast, brightness, color
balance, or sharpness, image enhancement can enhance its aesthetic
appeal.
o Medical imaging: In medical images such as X-rays, CT scans, and MRIs, image
enhancement can be utilized to improve the visibility of details. This
can aid in more accurate diagnosis of medical problems.
o Aerial and satellite imaging: The quality and contrast of aerial photographs can be improved via image
enhancement, which makes it simpler to view and interpret details on
the ground.
o Forensics: In order to recognize criminals or acquire evidence, forensic images like
fingerprints or footage from security cameras can be enhanced using
image processing techniques.
o Military and defense: To help detect potential threats or collect
intelligence, image enhancement can be employed to improve the contrast
and resolution of photos captured by military drones or satellites.
o Astronomy: By enhancing the contrast and resolution of telescope
photographs, astronomers can better view and understand the finer
details of celestial objects.
o Industrial inspection: Image enhancement can be applied to
photographs captured during industrial inspections, such as evaluating
welds or looking for flaws in items, to improve the visibility of details.
o Environmental monitoring: Image enhancement can increase the quality and contrast
of photographs captured when monitoring the environment, such as
looking for oil spills or keeping an eye on vegetation.

Process for doing Image Enhancement

The specific process will depend on the features of the image and the desired
outcome. There are many different ways and techniques for image
enhancement. The general steps that are usually taken throughout the process of
image enhancement are listed below:
1. Preprocessing: In this phase, any noise or artifacts that could interfere
with the enhancement process are removed from the image in order to
prepare it for enhancement. Techniques like noise reduction or filtering
can be used to achieve this.
2. Analysis: The properties of the image are examined in this step to
ascertain which elements need to be improved. Analyzing the image's
contrast, brightness, color balance, or other visual elements may be
necessary for this.
3. Enhancement: To increase the image's visual quality, use the proper
enhancement techniques in this phase. This could entail making
adjustments to the image's contrast, brightness, color balance, or
sharpness, as well as adding filters to accentuate specific details or
eliminate unwanted elements.
4. Post-processing: In this final stage, the enhanced image is examined for
any artifacts or distortions that may have been introduced during the
enhancement process, and any necessary adjustments are applied to produce
the final improved image. A minimal end-to-end sketch of these four steps follows this list.
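
As mentioned in step 4, the four-step flow might look as follows in OpenCV (the file name, the percentile-based analysis and the contrast threshold are our own illustrative assumptions, not a standard recipe):

import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file name

# 1. Preprocessing: a mild Gaussian blur to suppress noise before analysis
denoised = cv2.GaussianBlur(img, (3, 3), 0)

# 2. Analysis: inspect the intensity spread to decide whether contrast needs help
lo, hi = np.percentile(denoised, [2, 98])
needs_contrast = (hi - lo) < 128  # illustrative threshold, not a standard value

# 3. Enhancement: equalize the histogram only if the image is low-contrast
enhanced = cv2.equalizeHist(denoised) if needs_contrast else denoised

# 4. Post-processing: make sure the result is still a valid 8-bit image before saving
final = np.clip(enhanced, 0, 255).astype(np.uint8)
cv2.imwrite("enhanced.jpg", final)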

How does Image enhancement affect machine vision?


By enhancing the visibility and clarity of picture details, image
enhancement can considerably boost the performance of machine vision systems. In
order to make decisions or carry out tasks, machine vision systems depend
on computer algorithms to evaluate and interpret images. By making it simpler
for the computer vision system to recognize and extract pertinent elements
from the image, image enhancement can help improve the reliability and
accuracy of these algorithms.

For instance, in a machine vision system used for quality assurance in
a manufacturing environment, image enhancement can help make product flaws or
blemishes more visible. The machine vision system can detect and
classify flaws more easily when the brightness and sharpness of the
image are enhanced, leading to more accurate and trustworthy inspection findings.

Similarly, image enhancement can increase the accuracy of a
machine vision system used for object detection or tracking by making it
simpler to discern between various objects or features. By altering the color
balance or using filters to highlight particular features, the machine vision
system can distinguish between similar objects or track objects as they move
within the picture.

Overall, image enhancement makes it simpler for machine vision systems
to extract pertinent data from images and to make decisions based on that
information, which leads to a significant improvement in performance and
accuracy.

Advantages of image enhancement:


Enhancements are used to make the visual interpretation and
understanding of imagery easier. The advantage of digital imagery is that it allows us
to manipulate the digital pixel values in an image.

Importance of enhancement techniques in image processing:


The aim of image enhancement is to improve the interpretability or
perception of information in images for human viewers, or to provide 'better'
input for other automated image processing techniques.

Histograms
A histogram is a graph that shows the frequency of anything.
Usually a histogram has bars that represent the frequency of occurrence of data in the
whole data set.
A histogram has two axes, the x axis and the y axis.
The x axis contains the events whose frequency you have to count.
The y axis contains the frequency.
The different heights of the bars show the different frequencies of occurrence of the data.
Usually a histogram looks like this.

Now we will see an example of how a histogram is built.


Example
Consider a class of programming students to whom you are teaching
Python.
At the end of the semester, you get the results shown in the table below. But the table is
very messy and does not show the overall result of the class. So you have to make
a histogram of the results, showing the overall frequency of occurrence of
grades in your class. Here is how you are going to do it.
Result sheet

Name Grade

John A

Jack D

Carter B

Tommy A

Lisa C+

Derek A-

Tom B+

Histogram of result sheet


Now what you have to do is find what goes on the
x axis and what goes on the y axis.
One thing is certain: the y axis contains the frequency. So what
goes on the x axis? The x axis contains the events whose frequency has to be
counted. In this case, the x axis contains the grades.
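
As a small aside (the grades are the ones from the result sheet above; the plotting call assumes Matplotlib is available), the counting and plotting can be done in a few lines of Python:

from collections import Counter
import matplotlib.pyplot as plt

grades = ["A", "D", "B", "A", "C+", "A-", "B+"]  # from the result sheet above

counts = Counter(grades)  # frequency of each grade: this is the y axis
print(counts)

plt.bar(list(counts.keys()), list(counts.values()))  # grades on the x axis, frequency on the y axis
plt.xlabel("Grade")
plt.ylabel("Frequency")
plt.show()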
Now let us see how a histogram is used for an image.
Histogram of an image
The histogram of an image, like other histograms, also shows frequency. But
an image histogram shows the frequency of pixel intensity values. In an image
histogram, the x axis shows the gray level intensities and the y axis shows the
frequency of these intensities.
For example

The histogram of the above picture of Einstein would be something like this.
The x axis of the histogram shows the range of pixel values. Since it is an 8
bpp image, it has 256 levels (shades) of gray. That is
why the range of the x axis starts at 0 and ends at 255, marked in steps of 50.
The y axis shows the count of these intensities.
As you can see from the graph, most of the bars with high
frequency lie in the first half, which is the darker portion. That means
that the image we have is dark, and this can be confirmed from the image
itself.
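
A minimal sketch for computing and plotting such an image histogram (the file name is hypothetical) could use OpenCV and Matplotlib:

import cv2
import matplotlib.pyplot as plt

img = cv2.imread("einstein.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file name

# 256 bins, one per gray level of an 8 bpp image, over the range [0, 256)
hist = cv2.calcHist([img], [0], None, [256], [0, 256])

plt.plot(hist)
plt.xlabel("Gray level intensity (0-255)")
plt.ylabel("Number of pixels")
plt.show()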
Applications of Histograms:
Histograms have many uses in image processing. The first use, as discussed
above, is the analysis of the image: we can make predictions about an
image just by looking at its histogram. It is like looking at an X-ray of a bone.
The second use of histograms is for brightness purposes. Histograms
have wide application in adjusting image brightness, and not only brightness: histograms
are also used in adjusting the contrast of an image.
Another important use of the histogram is to equalize an image.
And last but not least, histograms are widely used in thresholding, which is
mostly used in computer vision.
Histogram Processing Techniques:
Histogram Sliding
In histogram sliding, the complete histogram is shifted
rightwards or leftwards. When a histogram is shifted towards the right or left,
clear changes are seen in the brightness of the image. The brightness of the
image is defined by the intensity of light emitted by a particular light
source.

Histogram Stretching
In histogram stretching, the contrast of an image is increased. The contrast of
an image is defined by the difference between the maximum and minimum values of pixel
intensity.
If we want to increase the contrast of an image, its histogram
is stretched so that it covers the full dynamic range.
From the histogram of an image, we can check whether the image has low or high
contrast.

Histogram Equalization
Histogram equalization is used for equalizing all the pixel values of an image.
The transformation is done in such a way that a uniform, flattened histogram is
produced.
Histogram equalization increases the dynamic range of pixel values and aims for
an equal count of pixels at each level, which produces a flat histogram and a high
contrast image.
While stretching a histogram, the shape of the histogram remains the same, whereas in
histogram equalization the shape of the histogram changes, and it generates only
one (unique) result.
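
The three techniques above can be sketched in a few lines of NumPy/OpenCV (a hedged illustration; the shift amount and the file name are arbitrary assumptions):

import cv2
import numpy as np

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file name

# Histogram sliding: add a constant to every pixel, which shifts the histogram to the right
slid = np.clip(img.astype(np.int16) + 50, 0, 255).astype(np.uint8)

# Histogram stretching: linearly map [min, max] of the image onto the full range [0, 255]
lo, hi = int(img.min()), int(img.max())
stretched = ((img - lo) * (255.0 / max(hi - lo, 1))).astype(np.uint8)

# Histogram equalization: redistribute intensities toward a flat histogram
equalized = cv2.equalizeHist(img)

cv2.imwrite("slid.jpg", slid)
cv2.imwrite("stretched.jpg", stretched)
cv2.imwrite("equalized.jpg", equalized)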
