CV Unit 2
Syllabus content
Unit 4: Basic Image and Digital Image Processing
Camera:
Light:
Color:
Real World Computer Vision Applications
OpenCV, short for Open Source Computer Vision Library, is an open-source computer vision and machine
learning software library. Originally developed by Intel, it is now maintained by a community of developers
under the OpenCV Foundation.
OpenCV is one of the most popular computer vision libraries. If you want to start your journey in the field
of computer vision, then a thorough understanding of the concepts of OpenCV is of paramount importance.
To understand the basic functionality of the Python OpenCV module, we will intuitively cover its most basic and
important concepts:
1. Reading an image
2. Extracting the RGB values of a pixel
3. Extracting the Region of Interest (ROI)
4. Resizing the Image
5. Rotating the Image
6. Drawing a Rectangle
7. Displaying text, etc.
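A minimal sketch of these basic operations, assuming an image file named input.jpg exists in the working directory (the filename, pixel positions and coordinates below are placeholders):

import cv2

# Reading an image (OpenCV loads it as a BGR array)
img = cv2.imread("input.jpg")

# Extracting the (B, G, R) values of the pixel at row 100, column 50
(b, g, r) = img[100, 50]

# Extracting a Region of Interest (ROI) by slicing the array
roi = img[60:160, 320:420]

# Resizing the image to 800 x 600 pixels
resized = cv2.resize(img, (800, 600))

# Rotating the image by 45 degrees about its centre
(h, w) = img.shape[:2]
M = cv2.getRotationMatrix2D((w // 2, h // 2), 45, 1.0)
rotated = cv2.warpAffine(img, M, (w, h))

# Drawing a rectangle and displaying text
cv2.rectangle(img, (320, 60), (420, 160), (0, 255, 0), 2)
cv2.putText(img, "OpenCV", (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

cv2.imshow("Result", img)
cv2.waitKey(0)
cv2.destroyAllWindows()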
What is an Edge?
In computer vision, an edge in an image is a significant local change in the image's
brightness, hue, or intensity. Edges are often associated with discontinuities in the image's
intensity or its first derivative. They can also be defined as a set of connected pixels that form a
boundary between two different regions.
Edges can be distinguished from noise by their long-range structure. They also have properties
such as gradient and orientation. Discontinuities in an image's brightness can be caused by
changes in depth, surface orientation, scene illumination, or material properties.
Edge detection is an important task in object recognition. When two non-parallel edges meet,
they form a corner.
Line Detection
An image in a photograph is called a raw image, and in order to extract useful information
from it, it must be put in a certain form. The first step in preparing the picture for higher-
level processing is called pre-processing. The purpose of pre-processing is two-fold: to
eliminate undesirable features that will hinder further processing and to extract the
desirable features that represent useful information in the image. Unwanted image
attributes include noise (insignificant lines and contours) and the presence of featureless
space. The important features include surface details and boundaries such as lines, edges,
and vertices.
Edge Detection
The first step in object pre-processing is edge detection. To isolate an image from its
background and neighboring images, you must first recognize its edges. An edge in an
image is an image contour across which the image's brightness or hue changes abruptly,
perhaps in the magnitude or in the rate of change of the magnitude. These edges are
modeled as mathematical discontinuities.
Two methods of edge detection are Thresholding using Histograms and Gaussian
Convolution.
Thresholding
Thresholding is a process by which the intensity resolution of a picture is reduced
(to be displayed, for example, on a computer that does not support as high an
intensity resolution as the picture). The threshold should be between the average
intensity of the object and the average intensity of the background or other object.
Histograms aid in determining this value.
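As a quick sketch, global thresholding in OpenCV might look like this (the value 127 is an arbitrary placeholder; in practice the threshold would be chosen from the histogram, as described below):

import cv2

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# pixels above 127 become white (255), the rest become black (0)
ret, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

cv2.imshow("Thresholded", binary)
cv2.waitKey(0)
cv2.destroyAllWindows()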
Histograms
Histograms are used to separate an image from its background and to separate objects of
different colors, by locating the changes in intensity in the picture. A histogram is used in
one of two ways. The first is as a graph of the frequency of occurrence of each level of
intensity in an image. It indicates the picture's changes in intensity by finding the values of
all the pixels in the picture and then plotting the number of pixels that have each value. If
the image is a high-contrast image, where the difference between the object and the
background is large, the graph will contain several well-defined peaks.
The threshold lies between these peaks, in one of the troughs. The second way in which
histograms are utilized is to record the value of every pixel as it appears in the
picture. This method is employed more for edge detection: as the resolution of the
graph increases, a more defined edge or threshold can be located by closer
examination of the change in the pixels' intensities. Notice the abrupt intensity
changes in this close-up:
One issue that may arise in edge detection with histograms is that of noise. In the
histogram shown, there is more than one peak; some of the peaks could be
mistaken for an edge. These subsidiary peaks occur because of the presence of
noise in the image. Below is an example of a noisy image, and its results after edge
detection:
To try out edge detection for yourself, check out the Edge Detector Demo at
Carnegie Mellon University.
Edges provide strong visual clues that can help the recognition process.
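A short sketch of computing and plotting such an intensity histogram with OpenCV and matplotlib; a threshold can then be picked by inspecting the troughs between peaks (the filename is a placeholder):

import cv2
import matplotlib.pyplot as plt

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# frequency of occurrence of each of the 256 intensity levels
hist = cv2.calcHist([img], [0], None, [256], [0, 256])

plt.plot(hist)
plt.xlabel("Intensity level")
plt.ylabel("Number of pixels")
plt.show()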
Dilation:
Dilation expands the image pixels, i.e., it is used for expanding an element A by
using a structuring element B.
Dilation adds pixels to object boundaries.
The value of the output pixel is the maximum value of all the pixels in the
neighborhood. A pixel is set to 1 if any of the neighboring pixels have the value 1.
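A minimal sketch of dilation (and, for comparison, erosion) in OpenCV, assuming a binary input image; the 5x5 kernel plays the role of the structuring element B:

import cv2
import numpy as np

img = cv2.imread("binary_input.jpg", cv2.IMREAD_GRAYSCALE)

# 5x5 square structuring element
kernel = np.ones((5, 5), np.uint8)

# dilation: each output pixel is the maximum over its 5x5 neighborhood
dilated = cv2.dilate(img, kernel, iterations=1)

# erosion: each output pixel is the minimum over its 5x5 neighborhood
eroded = cv2.erode(img, kernel, iterations=1)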
Dilation vs. Erosion:
Dilation increases the size of the objects, whereas erosion decreases the size of the objects.
Dilation fills holes and broken areas, whereas erosion removes small anomalies.
Dilation connects areas that are separated by spaces smaller than the structuring element, whereas erosion reduces the brightness of bright objects.
Dilation increases the brightness of the objects, whereas erosion removes objects smaller than the structuring element.
Dilation follows the distributive, duality, translation and decomposition properties, and erosion likewise follows properties such as duality.
Dilation of A by B is the Minkowski sum, written A ⊕ B; erosion, written A ⊖ B, is the dual of dilation.
Dilation is performed first in the Closing operation and second in the Opening operation, whereas erosion is performed second in the Closing operation and first in the Opening operation.
Opening vs. Closing:
Opening is a process in which an erosion operation is performed first and then a dilation operation; closing is a process in which a dilation operation is performed first and then an erosion operation.
Opening eliminates the thin protrusions of the obtained image; closing eliminates the small holes in the obtained image.
Opening is used for smoothing the contour and breaking narrow isthmuses; closing is used for fusing narrow breaks and removing internal noise from the obtained image.
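A minimal sketch of opening and closing with OpenCV's morphologyEx, again assuming a binary input and a 5x5 structuring element:

import cv2
import numpy as np

img = cv2.imread("binary_input.jpg", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((5, 5), np.uint8)

# opening: erosion followed by dilation (removes small protrusions and noise)
opened = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)

# closing: dilation followed by erosion (fills small holes and breaks)
closed = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)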
Perspective Transformation
When human eyes see nearby objects, they look bigger compared with those that are far away. This is called
perspective in a general way, whereas a transformation is the transfer of an object, etc., from one state to
another. So overall, perspective transformation deals with the conversion of the 3d world into a 2d image. This is the
same principle on which human vision works and the same principle on which a camera works.
We will see in detail why this happens: objects which are near to you look bigger, while those that are far
away look smaller, even though they look bigger when you reach them.
Frame of reference:
Object
World
Camera
Image
Pixel
Object coordinate frame
Object coordinate frame is used for modeling objects. For example, checking if a particular object is in a
proper place with respect to the other object. It is a 3d coordinate system.
World coordinate frame is used for co-relating objects in a 3 dimensional world. It is a 3d coordinate system.
Camera co-ordinate frame is used to relate objects with respect of the camera. It is a 3d coordinate system.
Image coordinate frame is not a 3d coordinate system; rather, it is a 2d system. It is used to describe how 3d points are mapped onto a
2d image plane. The pixel coordinate frame is likewise a 2d system, in which image positions are indexed by discrete pixel coordinates.
In this projection,
Y = size of the 3d object
y = size of the 2d image
f = focal length of the camera
Z = distance of the object from the camera
Now there are two different angles formed in this transform, both represented by Q.
The first angle satisfies

tan(Q) = -y / f

where the minus sign denotes that the image is inverted. The second angle that is formed satisfies:

tan(Q) = Y / Z

Equating the two gives -y / f = Y / Z, which rearranges to the perspective projection equation y = -(f * Y) / Z.
From this equation, we can see that when the rays of light reflected from the object pass through the camera, an inverted image is formed.
For example
Suppose an image has been taken of a person 5 m tall, standing at a distance of 50 m from the camera,
and we have to find the size of the image of the person, given a camera with a focal length of 50 mm.
Solution:
Since the focal length is in millimetres, we have to convert everything to millimetres in order to calculate it.
So,
Y = 5000 mm
f = 50 mm
Z = 50000 mm

y = -(f * Y) / Z = -(50 * 5000) / 50000 = -5 mm

The minus sign again indicates that the image is inverted.
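The same calculation as a quick sketch in Python:

# perspective projection: y = -(f * Y) / Z, all lengths in millimetres
Y = 5000.0   # object height
f = 50.0     # focal length
Z = 50000.0  # object distance

y = -(f * Y) / Z
print(y)  # -5.0 mm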
Image Pyramids
Image pyramids are one of the most beautiful concepts in image processing. Normally, we
work with images at their default resolution, but many times we need to change the resolution
(lower it) or resize the original image; in that case image pyramids come in handy.
The pyrUp() function increases the size to double the original size and the pyrDown() function
decreases the size to half. If we keep the original image as a base image and go on
applying the pyrDown function on it and keep the images in a vertical stack, it will look like a
pyramid. The same is true for upscaling the original image with the pyrUp function.
Once we scale down, if we rescale the image back to the original size, we lose some information and the resolution
of the new image is much lower than the original one.
Below is an example of Image Pyramiding –
import cv2
import matplotlib.pyplot as plt

img = cv2.imread("images/input.jpg")
layer = img.copy()
for i in range(4):
    plt.subplot(2, 2, i + 1)
    # matplotlib expects RGB, but OpenCV loads images as BGR
    plt.imshow(cv2.cvtColor(layer, cv2.COLOR_BGR2RGB))
    cv2.imshow(str(i), layer)
    # generate the next (half-sized) pyramid level
    layer = cv2.pyrDown(layer)
plt.show()
cv2.waitKey(0)
cv2.destroyAllWindows()
Output:
Reading an Image
Images are represented as arrays consisting of pixel values. 8-bit images have pixel values ranging from 0
(black) to 255 (white). Depending on the color scale there are various channels in an image, each channel
representing the pixel values for one particular color. RGB (Red, green, blue) is the most commonly used
color scale and all images I’ve used in my examples are RGB images.
We can easily read the image array using the imread function from OpenCV. One thing to remember here is that OpenCV reads images in BGR (blue, green, red) channel order rather than RGB.
Cropping
Cropping is a widely used augmentation technique. However, be careful not to crop important parts of the
image (pretty obvious, but easy to miss when you have too many images of various different sizes). Since
images are represented using arrays, cropping is equivalent to taking a slice out of an array:
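A minimal sketch (the variable name, filename and slice bounds are placeholders):

import cv2
import matplotlib.pyplot as plt

im = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)

# keep rows 50-199 and columns 100-299 (a rectangular region of interest)
cropped = im[50:200, 100:300]
plt.imshow(cropped)
plt.show()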
Resizing
Most deep learning model architectures expect all input images to be of the same dimensions.
resized = cv2.resize(im, (120,90))
plt.imshow(resized)
Flipping image
This is another very popular image augmentation technique. The only thing to remember here is that the
flipping should make sense for your use case. For example, if you’re classifying building types, you wouldn’t
encounter any inverted buildings in your test set so it doesn’t make sense to do a vertical flip in this case.
import numpy as np

# flip vertically (axis 0 flips rows); use axis 1 for a horizontal flip
flip_v = np.flip(im, 0)
plt.imshow(flip_v)
Rotate Image:
In most cases, it is okay to rotate the image by a small angle. The naive way of doing this might change the
image dimensions or cut off the corners of the image. Hence, a better way of rotating is by doing an affine transform using OpenCV. An affine transformation
preserves collinearity and ratios of distances (e.g., the midpoint of a line segment continues to remain the
midpoint even after transformation). You can also fill the borders by using the BORDER_REFLECT flag.
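A minimal sketch of such a rotation (the 10-degree angle and filename are arbitrary choices):

import cv2

im = cv2.imread("input.jpg")
(h, w) = im.shape[:2]

# rotate 10 degrees about the image centre, keeping the same output size
M = cv2.getRotationMatrix2D((w / 2, h / 2), 10, 1.0)
rotated = cv2.warpAffine(im, M, (w, h), borderMode=cv2.BORDER_REFLECT)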
Brightness and Contrast:
A simple way to adjust brightness and contrast is the linear transform new_pixel = alpha * old_pixel + beta. Here alpha (>0) is called gain and beta is called bias; these parameters are said to control contrast and brightness
respectively. Since we represent images using arrays, this function can be applied to each pixel by traversing through
the array.
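A quick sketch using OpenCV's convertScaleAbs, which applies exactly this per-pixel linear transform and clips the result to the 0-255 range (the alpha and beta values are arbitrary):

import cv2

im = cv2.imread("input.jpg")

# increase contrast by 30% and brightness by 40 intensity levels
adjusted = cv2.convertScaleAbs(im, alpha=1.3, beta=40)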
Object detection is a very popular computer vision problem that involves finding a bounding box enclosing
the object of interest. Displaying the bounding box on the picture can help us visually inspect the problem
and requirements. One thing to remember while dealing with these problems is that if you’re planning to flip
the image, make sure you flip the box coordinates accordingly too. Here’s an easy way to display a bounding box on an image:
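A minimal sketch with cv2.rectangle (the box coordinates are placeholders for a real detection):

import cv2
import matplotlib.pyplot as plt

im = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)

# box given as (x_min, y_min) and (x_max, y_max) corners
cv2.rectangle(im, (100, 50), (300, 250), (255, 0, 0), 2)
plt.imshow(im)
plt.show()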
Often we want to inspect multiple images in one go. It can be easily done using subplots in matplotlib.
Although not widely used in computer vision, it’s nice to know how to convert color images to greyscale.
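A quick sketch covering both points: converting to greyscale and inspecting the two versions side by side with subplots:

import cv2
import matplotlib.pyplot as plt

im = cv2.cvtColor(cv2.imread("input.jpg"), cv2.COLOR_BGR2RGB)
gray = cv2.cvtColor(im, cv2.COLOR_RGB2GRAY)

plt.subplot(1, 2, 1)
plt.imshow(im)
plt.subplot(1, 2, 2)
plt.imshow(gray, cmap="gray")
plt.show()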
Blur:
This technique can be useful in making your model more robust to image quality issues. If a model can
perform well on blurred images, it may indicate the model is doing well in general.
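A minimal sketch of Gaussian blurring (the 15x15 kernel size is an arbitrary choice; larger kernels blur more):

import cv2

im = cv2.imread("input.jpg")

# kernel size must be odd; sigma is derived from the kernel size when set to 0
blurred = cv2.GaussianBlur(im, (15, 15), 0)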
Thresholding
Thresholding is one of the segmentation techniques that generates a binary image (a binary image is one
whose pixels have only two values – 0 and 1 and thus requires only one bit to store pixel intensity) from a
given grayscale image by separating it into two regions based on a threshold value. Hence pixels having
intensity values greater than the said threshold will be treated as white or 1 in the output image and the
others will be black or 0.
Adaptive thresholding can be used to convert grayscale images to binary, separate objects from their
backgrounds, and improve segmentation.
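A minimal sketch contrasting global and adaptive thresholding (the block size of 11 and constant of 2 are illustrative values, not ones prescribed by the text):

import cv2

gray = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# global threshold: one value for the whole image
_, global_bin = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# adaptive threshold: value computed per 11x11 neighbourhood, minus the constant 2
adaptive_bin = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY, 11, 2)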
To understand an object in an image, we first need to find its shape, determined by its contour. The
boundary or contour marks the outline of an object in an image. So, detecting contours plays a vital role in
applications for identifying and segmenting objects in an image.
A contour consists of the pixels in an object’s boundary:
These pixels are usually of the same color, differentiating them from the rest.
3. Contour Representation
We represent contours with chain codes and shape numbers. These parameters help in clear representation
and a better understanding of object contour. However, computing these parameters is purely optional in the
process of finding contours.
We can trace contours with chain codes. A chain code indicates the directions of tracing along the
boundary. Tracing starts from the selected initial point and proceeds clockwise.
There are two types of chain codes, 4-directional and 8-directional:
They differ in the number of directions along which we can trace a contour and specify a unique number for
each direction. Let’s see how an object can be represented with chain codes:
The image after sampling shows the boundary pixels. These are the contour points. Assuming the starting
position is top-left, we get the chain code moving clockwise. In our example, the 4-directional and 8-
directional chain codes are and .
3.2. The Shape Numbers and First Differences
A shape number represents a normalized version of the corresponding chain code’s first difference.
The first difference shows how many counterclockwise directional changes were made in the chain code.
We compute the first difference by considering adjacent pairs one at a time. For example, the 4-
directional chain code pair needs no (counterclockwise) directional changes, so their first difference is .
However, the chain code pair takes changes, so the first difference is .
We concatenate the differences of the consecutive pairs to get the first difference of a chain code. For
example, the first difference of the four-directional code is .
A shape number is the same as the first difference, except that it starts from the lowest number in the first
difference. For instance, assuming the object’s first difference is , its shape number is . The differences
before the first lowest number are cyclically shifted to the left:
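As a quick sketch of the computation, assuming a 4-directional chain code stored as a list of integers (the example code below is illustrative, not the one from the figure):

def first_difference(chain, directions=4):
    # counterclockwise steps between consecutive codes, treating the chain as circular
    n = len(chain)
    return [(chain[(i + 1) % n] - chain[i]) % directions for i in range(n)]

def shape_number(chain, directions=4):
    # the shape number is the cyclic rotation of the first difference
    # that forms the smallest sequence
    diff = first_difference(chain, directions)
    rotations = [diff[i:] + diff[:i] for i in range(len(diff))]
    return min(rotations)

example = [0, 0, 3, 3, 2, 2, 1, 1]   # a small square traced clockwise
print(first_difference(example))
print(shape_number(example))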
OpenCV (Open Source Computer Vision) is a computer vision library that contains various functions to
perform operations on Images or videos. OpenCV library can be used to perform multiple operations on
videos.
Let’s see how to detect the corner in the image.
The cv2.goodFeaturesToTrack() method finds the N strongest corners in the image by the Shi-Tomasi method. Note that
the image should be a grayscale image. Specify the number of corners you want to find and the quality level
(a value between 0 and 1), which denotes the minimum corner quality below which candidates are rejected.
Then provide the minimum Euclidean distance between detected corners.
Syntax : cv2.goodFeaturesToTrack(image, maxCorners, qualityLevel, minDistance[, corners[, mask[,
blockSize[, useHarrisDetector[, k]]]]])
Image before corner detection:
plt.imshow(img), plt.show()
Finding and Drawing Contours with OpenCV
When we join all the points on the boundary of an object, we get a contour. Typically, a specific contour
refers to boundary pixels that have the same color and intensity. OpenCV makes it really easy to find and
draw contours in images. It provides two simple functions:
1. findContours()
2. drawContours()
findContours() can record contour points in one of two approximation modes:
1. CHAIN_APPROX_SIMPLE, which compresses horizontal, vertical and diagonal segments and stores only their end points.
2. CHAIN_APPROX_NONE, which stores all of the boundary points.
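A minimal sketch of finding and drawing contours, assuming OpenCV 4.x (where findContours returns two values) and a placeholder filename:

import cv2

img = cv2.imread("input.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# retrieve all contours, keeping only the end points of straight segments
contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

# draw every contour (-1) on a copy of the original image in green
output = img.copy()
cv2.drawContours(output, contours, -1, (0, 255, 0), 2)
cv2.imshow("Contours", output)
cv2.waitKey(0)
cv2.destroyAllWindows()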
Hough transform is a feature extraction method for detecting simple shapes such as circles and lines in an image. A “simple”
shape is one that can be represented by only a few parameters. For example, a line can be represented by two parameters (slope,
intercept) and a circle by three parameters: the coordinates of the center and the radius (x, y, r). The Hough transform does an
excellent job of finding such shapes in an image.
The main advantage of using the Hough transform is that it is insensitive to occlusion. Let’s see how Hough transform works
by way of an example.
From high school math class we know the polar form of a line is represented as:

ρ = x cos(θ) + y sin(θ)    ... (1)

Here ρ represents the perpendicular distance of the line from the origin in pixels, and θ is the angle, measured in radians,
which this perpendicular makes with the x-axis, as shown in the figure above.
You may be tempted to ask why we did not use the familiar equation of the line given below:

y = mx + c

The reason is that the slope, m, can take values between -∞ and +∞. For the Hough transform, the parameters need to be
bounded.
You may also have a follow-up question. In the (ρ, θ) form, θ is bounded, but can’t ρ take a value between 0 and +∞? That
may be true in theory, but in practice, ρ is also bounded because the image itself is finite.
Accumulator
When we say that a line in 2D space is parameterized by ρ and θ, it means that if we pick any (ρ, θ), it corresponds to a
line. Imagine a 2D array where the x-axis has all possible θ values and the y-axis has all possible ρ values. Any bin in this 2D
array corresponds to one line.
Fig2 Accumulator
This 2D array is called an accumulator because we will use the bins of this array to collect evidence about which lines exist in
the image. The top left cell corresponds to (-R, 0) and the bottom right corresponds to (R, π).
We will see in a moment that the value inside the bin (ρ, θ) will increase as more evidence is gathered about the presence of a
line with parameters ρ and θ.
First, we need to create an accumulator array. The number of cells you choose to have is a design decision. Let’s say you chose
a 10×10 accumulator. It means that ρ can take only 10 distinct values and θ can take only 10 distinct values, and therefore you
will be able to detect 100 different kinds of lines. The size of the accumulator will also depend on the resolution of the image.
But if you are just starting, don’t worry about getting it perfectly right. Pick a number like 20×20 and see what results you get.
Now that we have set up the accumulator, we want to collect evidence for every cell of the accumulator because every cell of
the accumulator corresponds to one line.
How do we collect evidence?
The idea is that if there is a visible line in the image, an edge detector should fire at the boundaries of the line. These edge
pixels provide evidence for the presence of a line.
For every edge pixel (x, y) in the above array, we vary the values of θ from 0 to π and plug them into equation 1 to obtain a value
for ρ.
In the figure below we vary θ for three pixels (represented by the three colored curves) and obtain the values for ρ using
equation 1.
As you can see, these curves intersect at a point, indicating that a line with those parameters ρ and θ is passing
through them.
Typically, we have hundreds of edge pixels and the accumulator is used to find the intersection of all the curves generated by
the edge pixels.
Let’s say our accumulator is 20×20 in size. So, there are 20 distinct values of θ, and for every edge pixel (x, y) we can
calculate 20 (ρ, θ) pairs by using equation 1. The bin of the accumulator corresponding to each of these 20 pairs is
incremented. We do this for every edge pixel, and now we have an accumulator that has all the evidence about all possible lines
in the image. We can simply select the bins in the accumulator above a certain threshold to find the lines in the image. If the
threshold is higher, you will find fewer strong lines, and if it is lower, you will find a large number of lines including some
weak ones.

HoughLine: How to Detect Lines using OpenCV
Syntax:
lines = cv2.HoughLines(image, rho, theta, threshold)
Parameters:
image – 8-bit, single-channel binary source image (usually the output of an edge detector such as Canny)
rho – distance resolution of the accumulator, in pixels
theta – angle resolution of the accumulator, in radians
threshold – accumulator threshold; only lines receiving more than this number of votes are returned
Code:
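A minimal sketch, assuming an input file road.png and a 200-vote threshold (both placeholders):

import cv2
import numpy as np

# Read image and compute an edge map first
img = cv2.imread("road.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# 1-pixel and 1-degree accumulator resolution, 200-vote threshold
lines = cv2.HoughLines(edges, 1, np.pi / 180, 200)

# Draw each detected (rho, theta) line on the original image
if lines is not None:
    for line in lines:
        rho, theta = line[0]
        a, b = np.cos(theta), np.sin(theta)
        x0, y0 = a * rho, b * rho
        pt1 = (int(x0 + 1000 * (-b)), int(y0 + 1000 * a))
        pt2 = (int(x0 - 1000 * (-b)), int(y0 - 1000 * a))
        cv2.line(img, pt1, pt2, (0, 0, 255), 2)

# Show result
cv2.imshow("Detected lines", img)
cv2.waitKey(0)
cv2.destroyAllWindows()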
Syntax :
cv2.goodFeaturesToTrack(image, maxCorners, qualityLevel, minDistance[, corners[, mask[,
blockSize[, useHarrisDetector[, k]]]]])
import numpy as np
import cv2
import matplotlib.pyplot as plt

img = cv2.imread('corner1.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# detect the 25 strongest corners with the Shi-Tomasi method
corners = cv2.goodFeaturesToTrack(gray, 25, 0.01, 10)
corners = corners.astype(int)
for i in corners:
    x, y = i.ravel()
    cv2.circle(img, (int(x), int(y)), 3, (0, 0, 255), -1)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB)), plt.show()
Parameters :
gray_img – Grayscale image with integral values
maxc – Maximum number of corners we want (give a negative value to get all the corners)
Q – Quality level parameter (preferred value = 0.01)
maxD – Maximum distance (preferred value = 10)
Image Transformations
Consider, for example, remote sensing images. These images are captured using satellites, and different operations are
applied to them. These operations aim at image transformations that are helpful in further analysis of the
image. No matter which method we adopt, we get a new image generated from one or more source images.
These are called image transformations.
In basic image transformation, we apply arithmetic operations to our image data. For example, image
subtraction is performed to detect the changes between two images of the same area captured at different
times. Let’s explore more about the types of image transformation.
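A minimal sketch of change detection by image subtraction, assuming two registered images of the same area (the filenames are placeholders):

import cv2

img1 = cv2.imread("scene_t1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene_t2.jpg", cv2.IMREAD_GRAYSCALE)

# absolute per-pixel difference highlights regions that changed between the two dates
change = cv2.absdiff(img1, img2)

cv2.imshow("Change map", change)
cv2.waitKey(0)
cv2.destroyAllWindows()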
A generative adversarial network (GAN) is a machine learning (ML) model in which two neural
networks compete with each other, using deep learning methods to become more accurate in their
predictions. GANs typically run unsupervised and learn through a zero-sum game framework,
where one player's gain equals the other player's loss.
The two neural networks that make up a GAN are referred to as the generator and the discriminator. Typically, the
generator is a deconvolutional (upsampling) neural network and the discriminator is a convolutional neural network.
The goal of the generator is to artificially manufacture outputs that could easily be mistaken for real data.
The goal of the discriminator is to identify which of the outputs it receives have been artificially created.
Essentially, generative models create their own training data. While the generator is trained to produce
false data, the discriminator network is taught to distinguish between the generator's manufactured data
and true examples. If the discriminator rapidly recognizes the fake data that the generator produces -- such
as an image that isn't a human face -- the generator suffers a penalty. As the feedback loop between the
adversarial networks continues, the generator will begin to produce higher-quality and more believable
output and the discriminator will become better at flagging data that has been artificially created. For
instance, a generative adversarial network can be trained to create realistic-looking images of human faces
that don't belong to any real person.
The first step in establishing a GAN is to identify the desired end output and gather an initial training data
set based on those parameters. This data is then randomized and input into the generator until it acquires
basic accuracy in producing outputs.
Next, the generated samples or images are fed into the discriminator along with actual data points from the
original concept. After the generator and discriminator models have processed the data, optimization
with backpropagation starts. The discriminator filters through the information and returns a probability
between 0 and 1 to represent each image's authenticity -- 1 correlates with real images and 0 correlates
with fake. These values are then checked for success, and the process is repeated until the desired outcome is
reached.
This creates a double feedback loop where the discriminator is in a feedback loop with the ground truth of
the images and the generator is in a feedback loop with the discriminator.
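The training loop described above can be sketched as follows. This is a minimal illustration that assumes PyTorch; the tiny multilayer-perceptron networks, the learning rates and the random placeholder batch standing in for real images are all illustrative choices:

import torch
import torch.nn as nn

latent_dim, img_dim = 64, 28 * 28

generator = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.ReLU(),
    nn.Linear(128, img_dim), nn.Tanh(),
)
discriminator = nn.Sequential(
    nn.Linear(img_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

loss_fn = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

for step in range(1000):
    real = torch.rand(32, img_dim) * 2 - 1      # placeholder "real" batch
    noise = torch.randn(32, latent_dim)
    fake = generator(noise)

    # Discriminator: push real samples toward 1 and fake samples toward 0
    d_loss = loss_fn(discriminator(real), torch.ones(32, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: try to make the discriminator output 1 for fakes
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()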
Types of GANs
GANs come in a variety of forms and can be used for various tasks. The following are the most common
GAN types:
Vanilla GAN. This is the simplest of all GANs, and its algorithm tries to optimize the objective
using stochastic gradient descent, a method that learns from a data set by going
through one example (or small batch) at a time. It consists of a generator and a discriminator, both
implemented as straightforward multi-layer perceptrons. The discriminator seeks to determine the likelihood that
the input belongs to a particular class, while the generator learns the distribution of the data.
Conditional GAN. By applying class labels, this kind of GAN enables the conditioning of the
network with new and specific information. As a result, during GAN training, the network receives
the images with their actual labels, such as "rose," "sunflower" or "tulip" to help it learn how to
distinguish between them.
Deep convolutional GAN. This GAN uses deep convolutional neural networks to produce
higher-resolution images. Convolutions are a technique for
drawing out important information from the data. They work particularly well with
images, enabling the network to quickly absorb the essential details.
CycleGAN. This is the most common GAN architecture and is generally used to learn how to
transform between images of various styles. For instance, a network can be taught how to alter an
image from winter to summer or from an image of a horse to a zebra. One of the most well-known
applications of CycleGAN is FaceApp, which alters human faces into various age groups.
StyleGAN. Researchers from Nvidia released StyleGAN in December 2018 and proposed
significant improvements to the original generator architecture models. StyleGAN can produce
photorealistic, high-quality photos of faces, but users can modify the model to alter the appearance
of the images that are produced.
Super resolution GAN. With this type of GAN, a low-resolution image can be changed into a more
detailed one. Super-resolution GANs increase the image resolution by filling in blurry spots.
GANs are becoming a popular ML model for online retail sales because of their ability to understand and
recreate visual content with increasingly remarkable accuracy. They can be used for a variety of tasks,
including anomaly detection, data augmentation, picture synthesis, and text-to-image and image-to-image
translation.
They can also translate photos from image sketches or semantic images, which is especially useful in the
healthcare industry for diagnoses.
GAN examples
GANs are used to generate a wide range of data types, including images, music and text. The following
are popular real-world examples of GAN:
Generating human faces. GANs can produce accurate representations of human faces. For
example, StyleGAN2 from Nvidia can produce excellent, photorealistic images of people that don't
exist. These pictures are so lifelike that many people believe they're actual individuals.
Developing new fashion designs. GANs can be used to create new fashion designs that reflect
existing ones. For instance, clothing retailer H&M used GANs to create new apparel designs for its
merchandise.
Creating realistic animal images. GANs can also generate realistic images of animals. For
example, BigGAN, a GAN model developed by Google researchers, can produce high-quality
images of animals such as birds and dogs.
Video game character creation. GANs can be used to create new characters for video games. For
example, Nvidia created new characters using GANs for the well-known video game Final
Fantasy XV.
Generating realistic three-dimensional (3D) objects. GANs are also capable of producing
realistic 3D objects. For example, researchers at the Massachusetts Institute of Technology have used GANs to create
3D models of chairs and other furniture that appear to have been designed by people.
These models can be applied to architectural visualization or video games.