AD8703 Basic of Computer vision UNIT 1
The final stage produces the result, which can be either a processed image or a
report based on the image analysis.
Now let’s see the different kinds of image processing operations one can
perform. They are as follows:
Converting a BGR image to RGB and vice versa can have several reasons,
one of them being that several image processing libraries have different pixel
orderings.
OpenCV reads images in BGR order, whereas Matplotlib expects images in RGB
order. So if you read an image with OpenCV and want to display it with
Matplotlib, you first have to convert the image to RGB mode.
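As a minimal sketch of this conversion (assuming OpenCV and Matplotlib are installed; "input.jpg" is just a placeholder file name):

import cv2
import matplotlib.pyplot as plt

# OpenCV loads images in BGR channel order
bgr_image = cv2.imread("input.jpg")   # placeholder path

# Convert BGR -> RGB so Matplotlib shows the colours correctly
rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)

plt.imshow(rgb_image)
plt.axis("off")
plt.show()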
Visualize channels separately -
This is an analysis step where you try to find out information hidden in
any one channel which might not be properly visible in the other channels. This
is more of a human analysis than a machine analysis.
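A possible sketch of this kind of channel inspection (again with a placeholder file name, assuming OpenCV and Matplotlib):

import cv2
import matplotlib.pyplot as plt

image = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)  # placeholder path

# Split the RGB image into its three channels
r, g, b = cv2.split(image)

# Display each channel as a grayscale image for visual inspection
for i, (name, channel) in enumerate(zip(["Red", "Green", "Blue"], [r, g, b]), start=1):
    plt.subplot(1, 3, i)
    plt.imshow(channel, cmap="gray")
    plt.title(name)
    plt.axis("off")
plt.show()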
Visualize histograms
Histograms have many uses. One of the more common is to decide what
value of threshold to use when converting a grayscale image to a binary one by
thresholding.
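For example, a minimal sketch of inspecting a grayscale histogram and then thresholding (the threshold value 127 below is only an illustrative guess; in practice it is chosen by looking at the histogram):

import cv2
import matplotlib.pyplot as plt

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path

# Plot the histogram of pixel intensities to see where a threshold could
# separate foreground from background
plt.hist(gray.ravel(), bins=256, range=(0, 256))
plt.xlabel("Pixel intensity")
plt.ylabel("Count")
plt.show()

# Apply a threshold chosen by inspecting the histogram
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)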
Crop an image
Image Augmentation
Image Rotation
Random Shifts
Random Flips
Random Scale
A scale factor <1 indicates shrinking while a scale factor >1 indicates stretching.
Gaussian Noise
A noisy image has pixels whose values are the sum of the original pixel values
and a random Gaussian noise value. The probability density function of a
Gaussian distribution has a bell shape. Additive white Gaussian noise is the
most common form of Gaussian noise used in practice.
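A minimal sketch of adding Gaussian noise to an image (the standard deviation of 25 is an arbitrary illustrative choice):

import cv2
import numpy as np

image = cv2.imread("input.jpg")   # placeholder path

# Zero-mean Gaussian noise with a chosen standard deviation
noise = np.random.normal(loc=0.0, scale=25.0, size=image.shape)

# Add the noise to the original pixel values and clip back to the valid range
noisy = np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8)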
Computer vision is not a new technology because scientists and experts have
been trying to develop machines that can see and understand visual data for
almost six decades. The evolution of computer vision is classified as follows:
1959: The first experiments in computer vision began in 1959, when researchers
showed a cat an array of images. They found that the visual system responds
first to hard edges or lines; scientifically, this means that image processing
begins with simple shapes such as straight edges.
1963: Another great achievement came when scientists developed computers that
could transform 2D images into 3D forms.
1974: This year, optical character recognition (OCR) and intelligent character
recognition (ICR) technologies were successfully developed. OCR solved the
problem of recognizing text printed in any font or typeface, whereas ICR can
decipher handwritten text. These inventions are among the greatest
achievements in document and invoice processing, vehicle number plate
recognition, mobile payments, machine translation, etc.
1982: In this year, algorithms were developed to detect edges, corners, curves,
and other shapes. Further, scientists also developed a network of cells that
could recognize patterns.
2012: Convolutional neural networks (CNNs) came into use as an image
recognition technology, with a greatly reduced error rate.
2014: The COCO dataset was also developed to offer a dataset for object
detection and to support future research.
Although computer vision has been utilized in so many fields, there are a
few common tasks for computer vision systems. These tasks are given below:
Object Verification: The system processes videos, finds the objects based on
search criteria, and tracks their movement.
Object Landmark Detection: The system defines the key points for the
given object in the image data.
Image Segmentation: Image segmentation does not just detect the classes present
in an image, as image classification does; it classifies each pixel of the
image to specify which object it belongs to. It tries to determine the role of
each pixel in the image.
Computer vision builds on the basic concepts of machine learning, deep
learning, and artificial intelligence. If you are eager to learn computer
vision, you should be able to do the following:
o To create and implement a vision algorithm for working with image and
video content pixels
o To develop a data-based approach for better problem solutions.
o Whenever required, you have to work on various AI and ML tasks
required for computer vision, such as image processing.
o Experience in working on various real-time project scenarios for problem-
solving.
o Hierarchical problem decomposition, implementation of solutions, and
integration with other sub-systems.
o Should be capable of understanding business objectives and connecting them
to technical solutions through effective system design and architecture.
Low-level Vision
Low-Level Features
Middle-level Vision
There are two major aspects in middle-level vision: (1) inferring the
geometry and (2) inferring the motion. These two aspects are not independent
but highly related. A simple question is “can we estimate geometry based on just
one image?”. The answer is no; we need at least two images. They could be taken
from two cameras or come from the motion of the scene. Some fundamental parts
of geometric vision include multiview geometry, stereo, and structure from
motion (SfM), which take the step from 2D to 3D by inferring 3D scene
information from 2D images. Based on that, geometric modelling constructs 3D
models of objects and scenes, so that 3D reconstruction and image-based
rendering become possible.
Another task of middle-level vision is to answer the question “how does the
object move?”. First, we should know which areas in the images belong to the
object, which is the task of image segmentation. Image segmentation has been a
challenging fundamental problem in computer vision for decades. Segmentation
can be based on spatial similarities and continuities; however, this
uncertainty cannot be fully resolved for a static image. When motion
continuity is also considered, we hope the uncertainty of segmentation can be
alleviated. On top of that are visual tracking and visual motion capture,
which estimate 2D and 3D motions, including deformable and articulated motions.
High-level Vision
High-Level Features
The main difference between low-level and high-level features lies in the fact
that the former are characteristics extracted directly from an image, such as
colors, edges, and textures. In contrast, the latter are built on top of
low-level features and denote more semantically meaningful concepts. This basic
difference can be easily understood from the diagram below:
The formation of the analog image takes place in the same way. Digitization is
basically a conversion from the 3D world (our analog scene) to a 2D
representation, which is our digital image.
Various factors determine the quality of the image, such as spatial factors or
the lens of the capturing device.
All of these pixels together form a digital image. The density of these pixels
determines the image quality: the higher the density, the clearer and
higher-resolution the image we will get.
Forming a Digital Image:
Sampling (2D): Sampling determines the spatial resolution of the digital image,
and the sampling rate determines the quality of the digitized image. The
magnitude of the sampled image is expressed as a value in image processing and
is related to the coordinate values of the image.
Quantization: A human observer needs a large number of quantization levels to
perceive the fine shading details of an image. The more quantization levels,
the clearer the image.
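To see the effect of the number of quantization levels, here is a small illustrative sketch that re-quantizes a grayscale image to a chosen number of levels (8 levels is an arbitrary choice):

import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path

levels = 8                 # number of quantization levels to keep
step = 256 // levels

# Map every pixel to the centre of its quantization bin
quantized = ((gray.astype(np.int32) // step) * step + step // 2).astype(np.uint8)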
Transformation
Types of Transformations:
o Translation.
o Scaling.
o Rotating.
o Reflection.
o Shearing.
Affine transformation
o Scaling
o Translation
o Shear
o Rotation
Mathematical background
As mentioned earlier, matrix multiplication and addition play a big role in affine
transformations. We first take a point X with coordinates x and y from the image
and represent it as a vector, but with a third dimension set to 1 (homogeneous
coordinates). It is important to include this third dimension because otherwise
translation could not be expressed as a matrix multiplication, i.e. the
transformation would not be linear.
$$X = \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
Now, if we want to transform this point X into X′, we multiply X by a matrix M:
$$X' = MX = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
Scaling
When we scale an image, we shrink or expand it. The M matrix for scaling is as
follows:
$$M_{\text{scale}} = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Here $s_x$ and $s_y$ are the scaling parameters for the x and y axes, respectively.
PROGRAM:
import cv2
import tensorflow as tf

image = cv2.imread("Detective.png")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
transformation = tf.keras.preprocessing.image.apply_affine_transform(
    image,           # input image
    zx=0.5,          # scaling factor along one spatial axis
    zy=0.5,          # scaling factor along the other spatial axis
    row_axis=0,      # which array axis holds the image rows
    col_axis=1,      # which array axis holds the image columns
    channel_axis=2   # which array axis holds the colour channels
)
We use the OpenCV library to read the input image, so the image is read in
BGR, which we then convert into RGB. The zx and zy arguments are the scaling
parameters, and the row_axis, col_axis and channel_axis arguments describe the
ordering of the image axes; these may not need changing if some other library
is used to read the image.
The output of scaling is as follows:
Translation
Translation means to pick up the image and place it at a new position. The
matrix for translation is as follows:
$$M_{\text{translate}} = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}$$
Here $t_x$ and $t_y$ are the translation parameters along the x and y axes,
respectively; they sit in the last column so that multiplying the homogeneous
point by $M_{\text{translate}}$ adds the offsets to x and y.
Code implementation
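The translation snippet is not reproduced in these notes; a minimal sketch using the same tf.keras helper as the scaling example above could look like the following (the shift values 40 and 20 are purely illustrative, and the exact axis each of tx and ty shifts follows the library's convention):

transformation = tf.keras.preprocessing.image.apply_affine_transform(
    image,          # image read and converted to RGB earlier
    tx=40,          # shift (in pixels) along one spatial axis
    ty=20,          # shift (in pixels) along the other spatial axis
    row_axis=0,
    col_axis=1,
    channel_axis=2
)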
Shear
Code implementation
transformation = tf.keras.preprocessing.image.apply_affine_transform(
image,
shear=30,
row_axis=0,
col_axis=1,
channel_axis=2
)
In the implementation provided
by tf.keras.preprocessing.image.apply_affine_transform, we provide the
shearing angle (in degrees) through the shear parameter.
Rotation
In rotation, we rotate the image by an angle θ. The matrix for this transformation
is as follows:
$$M_{\text{rotate}} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Code implementation
transformation = tf.keras.preprocessing.image.apply_affine_transform(
image,
theta=90,
row_axis=0,
col_axis=1,
channel_axis=2
)
The projective transformation shows how the perceived objects change when
the viewpoint of the observer changes. This transformation allows you to create
perspective distortion. The affine transformation is used for scaling, skewing
and rotation. Graphics Mill for .NET supports both of these classes of
transformations.
The sole difference between the two transformations lies in the last row of the
transformation matrix. For affine transformations, the first two elements of
this row must be zeros, and this leads to the different properties of the two
operations. When you carry out such a transformation, make sure that its
matrix is not singular; Graphics Mill for .NET is not able to apply singular
transformations.
C#
// 'bitmap' is assumed to be an Aurigma.GraphicsMill.Bitmap loaded earlier.
// Corners of the original image...
System.Drawing.PointF[] source = {
    new System.Drawing.PointF(0f, 0f),
    new System.Drawing.PointF(0f, bitmap.Height),
    new System.Drawing.PointF(bitmap.Width, bitmap.Height),
    new System.Drawing.PointF(bitmap.Width, 0f)};
// ...and the quadrilateral those corners should be mapped onto.
System.Drawing.PointF[] target = {
    new System.Drawing.PointF(0f, 0f),
    new System.Drawing.PointF(0f, bitmap.Height),
    new System.Drawing.PointF(bitmap.Width * 0.75f, bitmap.Height - 50f),
    new System.Drawing.PointF(bitmap.Width * 0.75f, 80f)};
// Build the projective matrix that maps the source points onto the target
// points and apply it to the bitmap.
Aurigma.GraphicsMill.Transforms.Matrix matrix =
    Aurigma.GraphicsMill.Transforms.Matrix.FromProjectivePoints(
        source, target);
Aurigma.GraphicsMill.Transforms.ApplyMatrixTransform transform =
    new Aurigma.GraphicsMill.Transforms.ApplyMatrixTransform(matrix);
transform.ApplyTransform(bitmap);
The image that will be produced will look as follows (resized version).
Fourier Transform:
Fourier
Fourier was a French mathematician who, in 1822, introduced the Fourier series
and the Fourier transform, which convert a signal into the frequency domain.
Fourier transform
The Fourier transform states that non-periodic signals whose area under the
curve is finite can also be represented as integrals of sines and cosines, each
multiplied by a certain weight.
The Fourier transform has many applications, including image compression
(e.g. JPEG compression), filtering, and image analysis.
Which one is applied to images?
Now the question is which one is applied to images: the Fourier series or the
Fourier transform. The answer lies in what images are. Images are non-periodic,
and since they are non-periodic, the Fourier transform is used to convert them
into the frequency domain.
Discrete Fourier transform
Since we are dealing with images, and in fact digital images, we will be
working with the discrete Fourier transform (DFT).
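A minimal sketch of computing the 2D DFT of an image with NumPy (the log-magnitude is taken only to make the spectrum easier to visualize):

import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path

# 2D discrete Fourier transform of the image
spectrum = np.fft.fft2(gray)

# Shift the zero-frequency component to the centre and take the log magnitude
magnitude = np.log1p(np.abs(np.fft.fftshift(spectrum)))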
BASIC TERMINOLOGIES
Figure 3: Grayscale image of digit 8; shape (24 x 16); every pixel (0–255)
represents the brightness/density of the colour (Image by Author)
The above image is of shape: (24, 16) where height = 24 and width = 16.
Similarly, we have a coloured image (RGB) having 3 channels and it can be
represented in a matrix of shape: (height, width, channels)
Now we know the first input to the convolution operator, but how do we
transform this input to get the output feature matrix? Here comes the term
‘kernel’, which acts on the input image to produce the required output.
Kernel: In an image, the pixels surrounding a given pixel tend to have
similar values. To harness this property of the image we have kernels. A
kernel is a small matrix that uses the power of localisation to extract the required
features from the given image (input). Generally, a kernel is much smaller than
the input image. We have different kernels for different tasks like blurring,
sharpening or edge detection.
The convolution happens between the input image and the given kernel. It
is the sliding dot product between the kernel and the localised section of the
input image.
Figure 4: Input (5, 5); Kernel (3, 3); Convolution 2D (Image by Author)
In the above image, for the first convolution we select a 3x3 region of the
image (in sequential order) and take its dot product with the kernel. Ta-da! This is
the first convolution we did, and we then move the region of interest pixel by
pixel (this step size is called the stride).
Figure 5: Convolution between the kernel and selected region in the
image (Image by Author)
In the above image, we always move the ROI by 1 pixel and perform the
dot product with the kernel. But if we increase the stride, let’s say stride = 2,
then the output matrix would be of dimension -> 2 x 2 (Figure 8).
Figure 8: Stride = 2 in a 2D convolution (GIF by Vincent)
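To make the stride arithmetic concrete, here is a small illustrative sketch (pure NumPy, with toy data) of the sliding dot product described above:

import numpy as np

def conv2d(image, kernel, stride=1):
    # Sliding dot product (cross-correlation) of a 2D image with a 2D kernel
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(region * kernel)   # dot product with the kernel
    return out

image = np.arange(25).reshape(5, 5)    # toy 5 x 5 input
kernel = np.ones((3, 3))               # toy 3 x 3 kernel
print(conv2d(image, kernel, stride=1).shape)   # (3, 3)
print(conv2d(image, kernel, stride=2).shape)   # (2, 2), as described above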
Now we know what the input looks like and what some of the general
parameters, such as the kernel and the stride, are. But how does convolution work if
the image has multiple channels, i.e. the image is coloured, or, to be more
precise, if the input matrix has shape (height, width, channels) with channels
greater than 1?
Generally people use the terms filter and kernel interchangeably, but in reality they are different.
Filter: a group of kernels used for the convolution of the image.
For example, in a coloured image we have 3 channels, and for each channel we would
have a kernel (to extract the features); a group of such kernels is known as a
filter. For a grayscale image (a 2D matrix) the term filter is equal to
a kernel. In a filter, all the kernels can be the same or different from each other.
A specific kernel can be used to extract specific features.
So the output is the feature map (the extracted features) of the given image.
We can further use this output with classical machine learning algorithms for
classification/regression tasks, or the output can itself be treated as a
transformed variation of the given image.
Figure 14: Calculation of width and height of the output matrix (Image by
Author)
Where:
w_i -> width of the input image, h_i -> height of the input image
w_k -> width of the kernel, h_k -> height of the kernel
s_w -> stride for width, s_h -> stride for height
p_w -> padding along the width of the image, p_h -> padding along the height
of the image
Generally, the padding, stride and kernel in a convolution are symmetric (equal
for height and width) which converts the above formula into:
Figure 15: Calculation of the width/height of the symmetric input image and
other parameters (Image by Author)
Where:
i -> input shape (height = width)
k -> kernel shape
p -> padding along the edges of the image
s -> stride for the convolution (for sliding dot product)
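Since the referenced figures are not reproduced here, the standard output-size formulas they illustrate can be written, using the symbols defined above, as:

$$w_o = \left\lfloor \frac{w_i - w_k + 2p_w}{s_w} \right\rfloor + 1, \qquad h_o = \left\lfloor \frac{h_i - h_k + 2p_h}{s_h} \right\rfloor + 1$$

and, in the symmetric case,

$$o = \left\lfloor \frac{i - k + 2p}{s} \right\rfloor + 1$$

where $w_o$, $h_o$ (or $o$) denote the width and height of the output feature map.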
So, now what? In the upcoming section, we are going to learn about
different types of kernels, which are designed to perform a specific operation on
the image.
IMAGE KERNELS
Now, with the help of kernels and using convolution on the input image and the
given kernel, we would transform the given image into the required form.
Following are a few transformations, that can be done with the help of custom
kernels in image convolutions.
1. Blurring
2. Edge Detection
So let’s first have a look at one of the most basic kernels, and it is the blur
kernel. There are many variations of the blur kernel, but the most famous are
Gaussian and Box blur.
Figure 17: Blur Kernel (Image by Author)
Similarly, other kernels help transform the image into its required form.
There is a great interactive illustration for trying out different kernels on the
above image: https://setosa.io/ev/image-kernels/. But generally, there are some
kernels that we as data scientists should get a hold of, namely the line detection
kernels, which primarily cover horizontal, vertical and diagonal lines.
Figure 19: Line Detection Kernels (Image by Author)
From the above graph, the intuition of these line detection kernels is very
clear. Suppose the task is to detect horizontal lines in an image. Let’s construct a
kernel for the same. So in general a 3 x 3 kernel is a good option to start with.
Now, to detect horizontal lines (following the above intuition), what if we subtract
all the neighbouring pixels above and below, assuming the line is horizontal? In that
case the horizontal line becomes easily visible, because we have decreased the pixel
values in its vicinity along lines parallel to the type of
line we wish to detect.
But in what ratio should we decrease the neighbouring pixels, or increase
the pixels on the line we want to detect? It depends on the use case, along with
a few iterations of experimenting with the kernels to get a good match.
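For illustration, a minimal sketch (assuming OpenCV; the kernel values below are one common choice, not the only one) that applies such a horizontal line detection kernel:

import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)   # placeholder path

# A basic horizontal line detection kernel: the centre row is emphasised,
# the rows above and below are subtracted
horizontal_kernel = np.array([[-1, -1, -1],
                              [ 2,  2,  2],
                              [-1, -1, -1]], dtype=np.float32)

# filter2D slides the kernel over the image (the dot product described above)
response = cv2.filter2D(gray, -1, horizontal_kernel)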
Let’s play with the given horizontal kernel and its convolution results.
Figure 20: Horizontal Line Detection (Image by Author)
Surprised! So from figure 20, we can easily state that if we increase the
values in the basic horizontal kernel, then only a few lines are detected. This is
the power of kernels on the image. Moreover, we can also increase/decrease the
density for the given kernel by symmetric modification of the given values.
Filtering
Signal Filtering
By breaking down the previous signal into simpler ones, we obtain the following
components:
After obtaining these, we can then analyze how each of them is affected by the system
individually. Moreover, this is closely tied to how the system responds
to an impulse, which is where convolution comes in. You are probably familiar
with this term if you came to this article; however, knowing what you are
actually doing when performing a convolution might help you in the future
when solving more complex problems. Another term you are probably very
familiar with is the filter kernel, which is nothing more than the system’s impulse
response. This is very important because, if we know how a system responds to
an impulse, its output can be calculated for any given input signal.
Now that you understand the process of filtering and what it actually means, it is
also important to be able to tell the difference between low-pass and high-pass
filtering.
Image Filtering
This is where CNNs come in and completely change the feature extraction
process. Nowadays, if you have a large amount of data, you don’t need to
perform any feature extraction manually since the model will find the most
meaningful features on its own.
When carrying out this process the old way, you had to try different
combinations of features to evaluate the model’s performance. Now, the CNN
model is doing the same, but much more rigorously, automatically and on a
much larger scale.
With this being said, one of the main downsides of any neural network
model is interpretability. If there are 4096 filters in a convolutional model, you
have no idea what they mean and which features they are extracting. All you
know is that they extract very good features and that they are the most
meaningful for that certain problem. While this method works perfectly to solve
an immense amount of real-world problems, if you ever want your solution to
have a chance at being accepted by the government, health entities, and many
more sensitive areas, your model must be able to explain its decisions. People
would rather have a model that is 70% accurate and explains every decision it
makes than a 99% accurate model that does not output any type of explanation.
If you take into consideration the definition of each filter type in the
signals section, the same effect applies to images. On the one hand, low-pass
filters have all positive values and are commonly used to smooth images or to
perform blurring.
Smoothing filter. Image by Author
But on the other hand, high-pass filters are commonly used to perform edge
detection. If you remember the definition, high-pass filters preserve the high
frequencies; in images, they highlight zones where the difference between pixel
values is large.
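A minimal sketch of both filter types applied to an image (the box blur and the Laplacian are only two of many possible choices):

import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path

# Low-pass (smoothing): an all-positive box kernel that averages neighbours
low_pass = cv2.blur(gray, (5, 5))

# High-pass (edge detection): the Laplacian responds where pixel values
# change quickly, i.e. at edges
high_pass = cv2.Laplacian(gray, cv2.CV_64F)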
Image enhancement
Image enhancement is a powerful tool that can be used to improve the
visual quality of images and to equip machine vision systems with more
accurate information. In its simplest form, image enhancement involves altering
an image to make it appear clearer, sharper, and more visually appealing. By
improving the visibility of important features in an image, applying algorithms
such as contrast stretching, noise reduction, and edge detection can help
maximize performance for a variety of computer vision tasks.
In this blog post, we’ll explore what exactly Image Enhancement is and
how different types of Image Enhancement techniques affect Machine Vision
Systems. If you’re involved in the technical field, then you may have heard of
image enhancement and its benefits for machine vision. But, what exactly is
image enhancement?
The best way to utilize image enhancement will depend on the particular
qualities of the image and the intended results. There are many different
methods and techniques available; typical techniques for improving images
include contrast stretching, noise reduction, and edge detection, as mentioned
above.
Now, let's take a look at some of the most prevalent applications of image
enhancement.
The specific process will depend on the features of the image and the desired
outcome. There are many different ways and techniques for image
enhancement. The general steps that are usually taken throughout the process of
image enhancement are listed below:
1. Preprocessing: In this phase, any noise or artifacts that could interfere
with the enhancement process are removed from the image in order to
prepare it for enhancement. Techniques like noise reduction or filtering
can be used to achieve this.
2. Analysis: The properties of the image are examined in this step to
ascertain which elements need to be improved. Analyzing the image's
contrast, brightness, color balance, or other visual elements may be
necessary for this.
3. Enhancement: To increase the image's visual quality, use the proper
enhancement techniques in this phase. This could entail making
adjustments to the image's contrast, brightness, color balance, or
sharpness, as well as adding filters to accentuate specific details or
eliminate unwanted elements.
4. Post-processing: In this last stage, any artifacts or distortions that may
have been generated during the enhancement process are examined in the
enhanced image. To create the final improved image, any necessary
modifications are applied.
For instance, image enhancement can help make product flaws or blemishes
more visible in a machine vision system used mainly for quality assurance in
a manufacturing environment. By enhancing the brightness and sharpness of the
image, the machine vision system can detect and classify flaws more easily,
leading to more accurate and trustworthy inspection results.
Histograms
A histogram is a graph that shows the frequency of anything. Usually a
histogram has bars that represent how often each value occurs in the whole data set.
A histogram has two axes, the x axis and the y axis.
The x axis contains the events whose frequency you have to count, and the
y axis contains the frequency.
The different heights of the bars show the different frequencies of occurrence of the data.
For example, consider the following data of students and their grades; a
histogram of the grades would show how many students obtained each grade.
Name Grade
John A
Jack D
Carter B
Tommy A
Lisa C+
Derek A-
Tom B+
The histogram of a grayscale picture, such as the Einstein image used in the
original illustration, would look something like the following.
The x axis of the histogram shows the range of pixel values. Since it is an 8
bpp image, it has 256 levels, or shades, of gray. That is
why the range of the x axis starts at 0 and ends at 255, with tick marks every
50. The y axis shows the count of these intensities.
As you can see from the graph, most of the bars with high
frequency lie in the first half, which is the darker portion. That means
that the image we have is dark overall, and this can be confirmed from the image
itself.
Applications of Histograms:
Histograms have many uses in image processing. The first, as has
also been discussed above, is the analysis of the image. We can predict things
about an image just by looking at its histogram, much like looking at an X-ray
of a bone.
The second use of histograms is for brightness adjustment: histograms
have wide application in image brightness, and they
are also used in adjusting the contrast of an image.
Another important use of histograms is to equalize an image.
And last but not least, histograms have wide use in thresholding, which is
mostly used in computer vision.
Histogram Processing Techniques:
Histogram Sliding
In histogram sliding, the complete histogram is shifted to the
right or to the left. When a histogram is shifted right or left,
clear changes are seen in the brightness of the image. The brightness of the
image is defined by the intensity of light emitted by a particular light
source.
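A minimal sketch of histogram sliding (the offset of 50 gray levels is an arbitrary illustrative choice):

import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path
shift = 50

# Slide the histogram to the right (brighten) and to the left (darken),
# clipping so the values stay in the valid 0-255 range
brighter = np.clip(gray.astype(np.int16) + shift, 0, 255).astype(np.uint8)
darker = np.clip(gray.astype(np.int16) - shift, 0, 255).astype(np.uint8)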
Histogram Stretching
In histogram stretching, the contrast of an image is increased. The contrast of
an image is defined by the difference between the maximum and minimum pixel
intensity values.
If we want to increase the contrast of an image, its histogram is stretched so
that it covers the full dynamic range.
From the histogram of an image, we can check whether the image has low or high
contrast.
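A minimal sketch of histogram (contrast) stretching, assuming the image is not perfectly uniform (otherwise the denominator below would be zero):

import cv2
import numpy as np

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)   # placeholder path

# Linearly map the current [min, max] intensity range onto the full [0, 255] range
lo, hi = gray.min(), gray.max()
stretched = ((gray - lo) / (hi - lo) * 255).astype(np.uint8)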
Histogram Equalization
Histogram equalization is used to equalize the distribution of pixel values of
an image. The transformation is done in such a way that a uniform, flattened
histogram is produced.
Histogram equalization increases the dynamic range of pixel values and makes
an equal count of pixels at each level which produces a flat histogram with high
contrast image.
While stretching the histogram, the shape of the histogram remains the same,
whereas in histogram equalization the shape of the histogram changes; and since
equalization has no free parameters, it generates only one output image.
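A minimal sketch of histogram equalization with OpenCV's built-in routine:

import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path

# Redistribute the intensities so the cumulative histogram becomes
# approximately linear, which flattens the histogram and raises contrast
equalized = cv2.equalizeHist(gray)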