CCV-Preview
1 author:
Reinhard Klette
Shandong Academy of Sciences
All content following this page was uploaded by Reinhard Klette on 15 December 2013.
Reinhard Klette
1 Book Preface
tributions to Section 10.3), Yizhe Lin (Figure 6.19), Dongwei Liu (Figure 2.16),
Yan Liu (permission to publish Figure 1.6), Rocío Lizárraga (permission to
publish Figure 5.1, bottom row), Peter Meer (comments on Subsection 2.4.2), James
Milburn (contributions to Section 4.4), Pedro Real (comments on geometric and
topologic subjects), Mahdi Rezaei (contributions to face detection in Chapter 10,
including text and figures, and Exercise 10.2), Bodo Rosenhahn (Figure 7.9,
right), John Rugis (definition of similarity curvature and Exercises 7.2 and 7.6),
James Russell (contributions to Subsection 5.1.1), Jorge Sanchez (contribution
to Example 9.1, Figures 9.1, right, and 9.5), Konstantin Schauwecker (comments
on feature detectors and RANSAC plane detection, Figures 6.10, right,
7.19, 9.9, and 2.23), Karsten Scheibe (contributions to Chapter 6, in particular
to Section 6.1.4, and Figure 7.1), Karsten Schlüns (contributions to Section 7.4),
I thank my wife, Gisela Klette, for authoring Subsection 3.2.4 about the
Euclidean distance transform, and for her critical views on the structure and
details of the book while it was written at CIMAT Guanajuato between mid-July
and the beginning of November 2013, during a sabbatical leave from The
University of Auckland, New Zealand.
Concise Computer Vision 5
2 Contents
Page numbers are for the submitted manuscript, and will certainly change in the
finalized book (after the editing process by Springer).
1 Image Data . 1
1.1 Images in the Spatial Domain . 1
1.1.1 Pixels and Windows . 2
1.1.2 Image Values and Basic Statistics . 3
1.1.3 Spatial and Temporal Data Measures . 8
1.1.4 Step-Edges . 10
1.2 Images in the Frequency Domain . 14
1.2.1 Discrete Fourier Transform . 15
1.2.2 Inverse Discrete Fourier Transform . 16
1.2.3 The Complex Plane . 17
1.2.4 Image Data in the Frequency Domain . 19
1.2.5 Phase-Congruency Model for Image Features . 25
1.3 Color and Color Images . 27
1.3.1 Color Definitions . 28
1.3.2 Color Perception, Visual Deficiencies, and Gray-Levels . 32
1.3.3 Color Representations . 36
1.4 Exercises . 41
1.4.1 Programming Exercises . 41
1.4.2 Non-Programming Exercises . 43
2 Image Processing . 45
2.1 Point, Local, and Global Operators . 45
2.1.1 Gradation Functions . 45
2.1.2 Local Operators . 48
2.1.3 Fourier Filtering . 51
2.2 Three Procedural Components . 54
2.2.1 Integral Images . 54
2.2.2 Regular Image Pyramids . 55
2.2.3 Scan Orders . 57
2.3 Classes of Local Operators . 59
2.3.1 Smoothing . 59
2.3.2 Sharpening . 62
2.3.3 Basic Edge Detectors . 64
2.3.4 Basic Corner Detectors . 69
2.3.5 Removal of Illumination Artefacts . 72
2.4 Advanced Edge Detectors . 75
2.4.1 LoG and DoG, and Their Scale Spaces . 75
2.4.2 Embedded Confidence . 80
2.4.3 The Kovesi Algorithm . 83
2.5 Exercises . 88
3 Image Analysis . 91
3.1 Basic Image Topology . 91
3.1.1 4- and 8-Adjacency for Binary Images . 92
3.1.2 Topologically-Sound Pixel Adjacency . 96
3.1.3 Border Tracing . 100
3.2 Geometric 2D Shape Analysis . 103
3.2.1 Area . 103
3.2.2 Length . 106
3.2.3 Curvature . 109
3.2.4 Distance Transform (by Gisela Klette) . 112
3.3 Image Value Analysis . 119
3.3.1 Co-Occurrence Matrices and Measures . 120
3.3.2 Moment-Based Region Analysis . 122
3.4 Detection of Lines and Circles . 125
3.4.1 Lines . 125
3.4.2 Circles . 131
3.5 Exercises . 132
3.5.1 Programming Exercises . 132
3.5.2 Non-Programming Exercises . 136
Symbols . 425
Index . 427
Persons . 439
A stereo matcher is often defined by the data-cost and smoothness-cost terms
used, and by the control structure that determines how those terms are applied
for minimizing the total error of the calculated labeling function f.
Smoothness terms are defined rather generically, and we present possible
control structures later in this chapter. Data-cost calculation is the “core
component” of a stereo matcher. We define a few data-cost functions with a
particular focus on ensuring some invariance with respect to lighting artefacts
in recorded images, or to brightness differences between left and right images.
Zero-Mean Version. Instead of calculating a data-cost function such as
$E_{SSD}(x, l)$ or $E_{SAD}(x, l)$ on the original image data, we first
calculate the mean $\bar{B}_x$ of a used window $W_x^{l,k}(B)$ and the mean
$\bar{M}_{x+d}$ of a used window $W_{x+d}^{l,k}(M)$, subtract $\bar{B}_x$ from
all intensity values in $W_x^{l,k}(B)$ and $\bar{M}_{x+d}$ from all values in
$W_{x+d}^{l,k}(M)$, and then calculate the data-cost function in its zero-mean
version. This is one option for reducing the impact of lighting artefacts
(i.e. for not depending on the ICA).
We indicate this way of processing by starting the subscript of the data-cost
function with a Z. For example, $E_{ZSSD}$ and $E_{ZSAD}$ are the zero-mean SSD
and the zero-mean SAD data-cost functions, respectively, formally defined by

$$E_{ZSSD}(x, d) = \sum_{i=-l}^{l} \sum_{j=-k}^{k} \left[ \left( B_{x+i,y+j} - \bar{B}_x \right) - \left( M_{x+d+i,y+j} - \bar{M}_{x+d} \right) \right]^2 \tag{1}$$

$$E_{ZSAD}(x, d) = \sum_{i=-l}^{l} \sum_{j=-k}^{k} \left| \left[ B_{x+i,y+j} - \bar{B}_x \right] - \left[ M_{x+d+i,y+j} - \bar{M}_{x+d} \right] \right| \tag{2}$$
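The two zero-mean costs of Equations (1) and (2) can be illustrated by a minimal sketch in plain Python. The helper names `window_values`, `zero_mean`, `zssd`, and `zsad` are hypothetical, and we assume that images are lists of rows indexed as `img[y][x]` and that both windows lie fully inside their images.

```python
def window_values(img, x, y, l, k):
    """Values of the (2l+1) x (2k+1) window centred at column x, row y.

    Assumption: img is a list of rows, accessed as img[y][x].
    """
    return [img[y + j][x + i] for j in range(-k, k + 1) for i in range(-l, l + 1)]


def zero_mean(values):
    """Subtract the window mean from every value of the window."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def zssd(base, match, x, y, d, l=1, k=1):
    """Zero-mean SSD between the base window at x and the match window at x + d."""
    b = zero_mean(window_values(base, x, y, l, k))
    m = zero_mean(window_values(match, x + d, y, l, k))
    return sum((bv - mv) ** 2 for bv, mv in zip(b, m))


def zsad(base, match, x, y, d, l=1, k=1):
    """Zero-mean SAD between the base window at x and the match window at x + d."""
    b = zero_mean(window_values(base, x, y, l, k))
    m = zero_mean(window_values(match, x + d, y, l, k))
    return sum(abs(bv - mv) for bv, mv in zip(b, m))


# Illustrative data: the match image shows the base pattern shifted by one
# column and brightened by a constant offset of 10.
base = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
match = [[0, 11, 12, 13],
         [0, 14, 15, 16],
         [0, 17, 18, 19]]

cost_zssd = zssd(base, match, x=1, y=1, d=1)
cost_zsad = zsad(base, match, x=1, y=1, d=1)
```

In this toy example, both costs vanish at the correct disparity because the constant brightness offset is removed by the mean subtraction, whereas a plain SSD on the same windows would not be zero.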
NCC Data Cost. The normalized cross correlation (NCC) was defined in
Insert 2.5 for comparing two images. The NCC is already defined by zero-mean
normalization, but we add the Z to the index for uniformity in notation. The
NCC data cost is defined by
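$$E_{ZNCC}(x, d) = 1 - \frac{\sum_{i=-l}^{l} \sum_{j=-k}^{k} \left( B_{x+i,y+j} - \bar{B}_x \right) \left( M_{x+d+i,y+j} - \bar{M}_{x+d} \right)}{\sqrt{\sigma_{B,x}^2 \cdot \sigma_{M,x+d}^2}} \tag{3}$$

where

$$\sigma_{B,x}^2 = \sum_{i=-l}^{l} \sum_{j=-k}^{k} \left( B_{x+i,y+j} - \bar{B}_x \right)^2 \tag{4}$$

$$\sigma_{M,x+d}^2 = \sum_{i=-l}^{l} \sum_{j=-k}^{k} \left( M_{x+d+i,y+j} - \bar{M}_{x+d} \right)^2 \tag{5}$$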
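Equations (3) to (5) can be sketched in plain Python as follows. The function names are hypothetical, images are assumed to be lists of rows indexed as `img[y][x]`, and returning a cost of 1 for a window of constant intensity (where the denominator is zero) is our own convention, not part of the definition.

```python
import math


def zero_mean_window(img, x, y, l, k):
    """Window values around (x, y) with the window mean subtracted.

    Assumption: img is a list of rows, accessed as img[y][x].
    """
    values = [img[y + j][x + i] for j in range(-k, k + 1) for i in range(-l, l + 1)]
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def zncc_cost(base, match, x, y, d, l=1, k=1):
    """NCC data cost: 1 minus the normalized cross correlation of both windows."""
    b = zero_mean_window(base, x, y, l, k)
    m = zero_mean_window(match, x + d, y, l, k)
    numerator = sum(bv * mv for bv, mv in zip(b, m))
    sigma_b_sq = sum(bv ** 2 for bv in b)   # Equation (4)
    sigma_m_sq = sum(mv ** 2 for mv in m)   # Equation (5)
    denominator = math.sqrt(sigma_b_sq * sigma_m_sq)
    if denominator == 0:
        # Assumed convention: a constant window carries no correlation.
        return 1.0
    return 1.0 - numerator / denominator


# Illustrative data: the match window equals 2 * base + 10, shifted by one column.
base = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
match = [[0, 12, 14, 16],
         [0, 18, 20, 22],
         [0, 24, 26, 28]]

cost = zncc_cost(base, match, x=1, y=1, d=1)
```

The example suggests why this cost is attractive under lighting changes: a gain and an offset applied to the whole window leave the cost close to zero, while a contrast-inverted window would yield a cost close to 2.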
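with

$$\rho(u, v, d) = \begin{cases} 0 & \text{if } B_{uv} \perp \bar{B}_x \text{ and } M_{u+d,v} \perp \bar{M}_{x+d} \\ 1 & \text{otherwise} \end{cases} \tag{7}$$

where ⊥ stands for either < or >, the same relation in both comparisons. By
using $B_x$ instead of $\bar{B}_x$, and $M_{x+d}$ instead of $\bar{M}_{x+d}$, we
have the census data-cost function $E_{CEN}$ (without zero-mean normalization).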
$$b_x = [-1, -1, +1, -1, -1, +1, -1, -1, +1]^\top \tag{8}$$

$$m_{x+d} = [-1, -1, +1, +1, -1, +1, -1, -1, -1]^\top \tag{9}$$

$$c_{x,d} = [0, 0, 0, 1, 0, 0, 0, 0, 1]^\top \tag{10}$$
Vector $c_{x,d}$ shows exactly the positions where vectors $b_x$ and $m_{x+d}$
differ in values; the number of positions where two vectors differ is known as
the Hamming distance of those two vectors.
Observation 11 The zero-mean normalized census data cost $E_{ZCEN}(x, d)$ equals
the Hamming distance between vectors $b_x$ and $m_{x+d}$.
By adapting the definition of both vectors $b_x$ and $m_{x+d}$ to the census
data-cost function $E_{CEN}$, we can also obtain those costs as the Hamming
distance.
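The relation between the sign vectors, the difference vector of Equation (10), and the Hamming distance can be checked with a short Python sketch. All function names are hypothetical. We also assume that a window value equal to the window mean is mapped to −1; the text above does not specify how such ties are handled.

```python
def sign_vector(values):
    """Map each window value to +1 if it is above the window mean, else to -1.

    Assumption: values equal to the mean are mapped to -1.
    """
    mean = sum(values) / len(values)
    return [1 if v > mean else -1 for v in values]


def difference_vector(b, m):
    """Entry 1 where both vectors differ, and entry 0 where they agree."""
    return [0 if bi == mi else 1 for bi, mi in zip(b, m)]


def hamming_distance(b, m):
    """Number of positions at which both vectors differ."""
    return sum(difference_vector(b, m))


# The vectors of Equations (8) and (9).
b_x = [-1, -1, +1, -1, -1, +1, -1, -1, +1]
m_xd = [-1, -1, +1, +1, -1, +1, -1, -1, -1]

c_xd = difference_vector(b_x, m_xd)   # reproduces the vector of Equation (10)
cost = hamming_distance(b_x, m_xd)    # the two vectors differ at two positions

# A hypothetical 3 x 3 window (listed row by row) whose sign vector equals b_x.
window = [1, 2, 9, 1, 2, 9, 1, 2, 9]
```

Here `sign_vector(window)` gives the vector `b_x`, because the window mean is 4 and only the value 9 lies above it.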
===================================
Hamming. The US-American mathematician R. W. Hamming (1915 – 1998)
contributed to computer science and telecommunications. The Hamming code,
Hamming window, Hamming numbers, and the Hamming distance are all named
after him.
===================================
By replacing values “−1” by “0” in vectors $b_x$ and $m_{x+d}$, the Hamming
distance for the resulting binary vectors can be calculated very time-efficiently.¹
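One common way to exploit the binary representation is to pack each vector into a single integer, so that the Hamming distance becomes an XOR followed by a count of set bits. The following Python sketch illustrates this idea under our own naming (`pack_bits` and `hamming_distance_bits` are hypothetical); it is not necessarily the technique described in the cited reference.

```python
def pack_bits(vector):
    """Replace -1 by 0, keep +1 as 1, and pack all entries into one integer."""
    word = 0
    for value in vector:
        word = (word << 1) | (1 if value > 0 else 0)
    return word


def hamming_distance_bits(a, b):
    """Count the bits in which both integers differ (XOR, then count the ones)."""
    return bin(a ^ b).count("1")


# The vectors of Equations (8) and (9), packed into 9-bit integers.
b_bits = pack_bits([-1, -1, +1, -1, -1, +1, -1, -1, +1])   # 0b001001001
m_bits = pack_bits([-1, -1, +1, +1, -1, +1, -1, -1, -1])   # 0b001101000

distance = hamming_distance_bits(b_bits, m_bits)
```

For the two example vectors the XOR has exactly two set bits, in agreement with the two entries equal to 1 in Equation (10). In a compiled language, the bit count would typically map to a single population-count instruction.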
4 Index of Subjects
Page numbers are, as in the list of contents, for the submitted manuscript, and
will certainly change in the finalized book (after the editing process by Springer).
Index
$G_{\max}$, 4
Ω, 2, 5
atan2, 21, 67
pos, 86, 87
Altar, 62
AnnieYukiTim, 27, 172, 214
Aussies, 10, 180, 215
Crossing, 293, 294, 317
Donkey, 22
Emma, 52, 55
Fibers, 21
Fountain, 4
Kiri, 99
LightAndTrees, 72
MainRoad, 72
Michoacan, 399
MissionBay, 176, 214
Monastry, 178
Neuschwanstein, 5
NorthLeft, 72
NorthRight, 72
Odense, 182, 189
OldStreet, 10
PobleEspanyol, 99
RagingBull, 46
Rangitoto, 99
RattusRattus, 173
Rocio, 172, 401
SanMiguel, 3, 8
Set1Seq1, 49, 61, 68, 70, 71, 77, 82
Set2Seq1, 74, 167
SouthLeft, 45
SouthRight, 45
Spring, 206, 207, 215
Straw, 21
Taroko, 10
Tomte, 99
Uphill, 45
Wiper, 45
WuhanU, 14
Xochicalco, 215
Yan, 6, 172
bicyclist, 141, 210, 355
motorway, 210
queenStreet, 163
tennisball, 141, 208, 209
1D, 15
2D, 15
3D, 6
absolute difference, 297
AC, 21
accumulated cost, 294
accuracy
– sub-cell, 129
– subpixel, 124
AD, 297, 315, 318
AdaBoost, 402
adaptive boosting, 402
adjacency
-, 92
-, 136
-, 92
¹ See [H. S. Warren. Hacker’s Delight. Pages 65–72, Addison-Wesley Longman, New York, 2002].
5 Index of Persons
Page numbers are for the submitted manuscript, and will certainly change in the
finalized book (after the editing process by Springer).
Index