Singular Value Decomposition Geometry
\[
A = U \Sigma V^* \quad\Longleftrightarrow\quad A V = U \Sigma. \qquad (4)
\]
Considering each column of $V$ separately, the latter is the same as
\[
A v_j = \sigma_j u_j, \qquad j = 1, \ldots, n. \qquad (5)
\]
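The relation (5) can be visualized and checked numerically. The following MATLAB sketch (the 2-by-2 test matrix and all variable names are chosen here purely for illustration) plots the image of the unit circle under $A$, which is an ellipse whose semi-axes are $\sigma_1 u_1$ and $\sigma_2 u_2$:

    % Sketch: the unit circle is mapped by A onto an ellipse with semi-axes sigma_j*u_j
    A = [2 1; 1 2];                          % arbitrary 2-by-2 example matrix
    [U, S, V] = svd(A);
    t = linspace(0, 2*pi, 200);
    circ = [cos(t); sin(t)];                 % points on the unit circle
    ell  = A * circ;                         % their images under A
    plot(ell(1,:), ell(2,:)), hold on, axis equal
    quiver(0, 0, S(1,1)*U(1,1), S(1,1)*U(2,1), 0)   % semi-axis sigma_1*u_1
    quiver(0, 0, S(2,2)*U(1,2), S(2,2)*U(2,2), 0)   % semi-axis sigma_2*u_2
    norm(A*V(:,1) - S(1,1)*U(:,1))           % ~ 0, confirming A*v_1 = sigma_1*u_1 from (5)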
[Figure: geometric interpretation of the SVD: the unit circle with the right singular vectors $v_1$, $v_2$ is mapped by $A$ onto an ellipse with semi-axes $\sigma_1 u_1$ and $\sigma_2 u_2$.]
Full SVD:
\[
A = \begin{bmatrix} \hat{U} & \tilde{U} \end{bmatrix} \begin{bmatrix} \hat{\Sigma} \\ O \end{bmatrix} V^* = U \Sigma V^*. \qquad (6)
\]
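The shapes of the factors in (6) can be inspected directly in MATLAB; the sketch below (with an arbitrary random test matrix, not taken from the notes) contrasts the full SVD with the reduced one:

    A = randn(5, 3);                         % arbitrary tall test matrix (m = 5, n = 3)
    [U, S, V]    = svd(A);                   % full SVD: U is 5x5, S is 5x3 with zero rows at the bottom
    [Uh, Sh, Vh] = svd(A, 'econ');           % reduced SVD: Uh is 5x3, Sh is 3x3
    norm(A - U*S*V'), norm(A - Uh*Sh*Vh')    % both ~ 0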
2.0.5
and $\hat{\Sigma} = \|A\|_2$. Then, clearly, we have found a reduced SVD, i.e., $A = \hat{U} \hat{\Sigma} V^*$. The full SVD is obtained by extending $\hat{U}$ to $U$ by the Gram-Schmidt algorithm and adding the necessary zeros to $\Sigma$.
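This extension step can be mimicked numerically. In the sketch below the completion of $\hat{U}$ to a unitary $U$ is done with MATLAB's null (any orthonormal completion, e.g., by Gram-Schmidt, would do just as well); the test matrix is again arbitrary:

    A = randn(5, 3);
    [Uh, Sh, V] = svd(A, 'econ');            % reduced SVD
    Ut = null(Uh');                          % orthonormal basis of the complement of range(Uh)
    U  = [Uh, Ut];                           % unitary (here: orthogonal) 5x5 matrix
    S  = [Sh; zeros(2, 3)];                  % add the necessary zero rows to Sigma
    norm(A - U*S*V')                         % ~ 0: the full SVD reproduces A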
We now assume an SVD exists for the case $(m-1, n-1)$ and show it also exists for $(m, n)$. To this end we pick $v_1 \in \mathbb{C}^n$ such that $\|v_1\|_2 = 1$ and
\[
\|A\|_2 = \sup_{\substack{v_1 \in \mathbb{C}^n \\ \|v_1\|_2 = 1}} \|A v_1\|_2 > 0.
\]
Now we take
\[
u_1 = \frac{A v_1}{\|A v_1\|_2}. \qquad (7)
\]
Further, let $\tilde{U}_1$ and $\tilde{V}_1$ be chosen such that $U_1 = [u_1 \; \tilde{U}_1]$ and $V_1 = [v_1 \; \tilde{V}_1]$ are unitary. Then
\[
U_1^* A V_1 = \begin{bmatrix} u_1 & \tilde{U}_1 \end{bmatrix}^* A \begin{bmatrix} v_1 & \tilde{V}_1 \end{bmatrix}
= \begin{bmatrix} u_1^* A v_1 & u_1^* A \tilde{V}_1 \\ \tilde{U}_1^* A v_1 & \tilde{U}_1^* A \tilde{V}_1 \end{bmatrix}.
\]
Now $u_1^* A v_1 = \|A v_1\|_2 = \|A\|_2 =: \sigma_1$ by (7), and $\tilde{U}_1^* A v_1 = \|A v_1\|_2 \, \tilde{U}_1^* u_1 = 0$ since the columns of $\tilde{U}_1$ are orthogonal to $u_1$. Moreover, $u_1^* A \tilde{V}_1 = 0^T$: otherwise the first row of $U_1^* A V_1$ would have 2-norm larger than $\sigma_1$, contradicting $\|U_1^* A V_1\|_2 = \|A\|_2 = \sigma_1$ (the 2-norm is invariant under unitary transformations). We abbreviate $\tilde{A} = \tilde{U}_1^* A \tilde{V}_1$ and can write the block form
\[
U_1^* A V_1 = \begin{bmatrix} \sigma_1 & 0^T \\ 0 & \tilde{A} \end{bmatrix}
\quad \text{or} \quad
A = U_1 \begin{bmatrix} \sigma_1 & 0^T \\ 0 & \tilde{A} \end{bmatrix} V_1^*.
\]
By the induction hypothesis $\tilde{A}$ has an SVD, $\tilde{A} = U_2 \Sigma_2 V_2^*$, so that
\[
A = U_1 \begin{bmatrix} 1 & 0^T \\ 0 & U_2 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0^T \\ 0 & \Sigma_2 \end{bmatrix} \begin{bmatrix} 1 & 0^T \\ 0 & V_2 \end{bmatrix}^* V_1^*,
\]
which is an SVD of $A$ since products of unitary matrices are unitary.
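One step of this construction is easy to check numerically. In the following sketch (random test matrix; $v_1$ is taken from MATLAB's svd purely for convenience) we build $U_1$ and $V_1$ and verify the block structure of $U_1^* A V_1$:

    A  = randn(4, 3);
    [~, S, V] = svd(A);
    v1 = V(:, 1);                            % a maximizer of ||A*v||_2 over unit vectors
    u1 = A*v1 / norm(A*v1);                  % as in (7)
    U1 = [u1, null(u1')];                    % extend u1 to an orthonormal basis of C^m
    V1 = [v1, null(v1')];                    % extend v1 to an orthonormal basis of C^n
    B  = U1' * A * V1;                       % should have the block form [sigma_1 0'; 0 Atilde]
    B(1,1) - S(1,1)                          % ~ 0
    norm(B(1, 2:end)), norm(B(2:end, 1))     % both ~ 0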
2.1
We now discuss the use of the SVD to diagonalize systems of linear equations. Consider the linear system
\[
A x = b
\]
with $A \in \mathbb{C}^{m \times n}$. Using the SVD we can write
\[
A x = U \Sigma V^* x
\]
and expand the right-hand side in terms of the columns of $U$, i.e.,
\[
b = U b' \qquad \Longleftrightarrow \qquad b' = U^* b.
\]
With $x' = V^* x$ the system therefore turns into the diagonal system $\Sigma x' = b'$.
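For a square nonsingular $A$ this diagonalization immediately yields a solver. A minimal MATLAB sketch (the 2-by-2 system is made up for illustration) reads:

    A = [4 1; 2 3];  b = [1; 2];             % small made-up system
    [U, S, V] = svd(A);
    bprime = U' * b;                         % b' = U^* b
    xprime = S \ bprime;                     % diagonal system Sigma * x' = b'
    x = V * xprime;                          % transform back: x = V x'
    norm(A*x - b)                            % ~ 0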
2.1.1 Connection to Eigenvalues
If $A \in \mathbb{C}^{m \times m}$ is square with a linearly independent set of eigenvectors (i.e., nondefective), then
\[
A X = X \Lambda \qquad \Longleftrightarrow \qquad X^{-1} A X = \Lambda,
\]
where $X$ contains the eigenvectors of $A$ as its columns and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$ is a diagonal matrix of the eigenvalues of $A$.
If we compare this eigen-decomposition of A to the SVD we see that the SVD is a
generalization: A need not be square, and the SVD always exists (whereas even a square
matrix need not have an eigen-decomposition). The price we pay is that we require
two unitary matrices U and V instead of only X (which is in general not unitary).
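The two decompositions are easy to compare in MATLAB. The sketch below uses a symmetric positive definite example (chosen only for illustration), for which the singular values coincide with the eigenvalues up to ordering:

    A = [2 1; 1 3];                          % symmetric positive definite, hence nondefective
    [X, Lambda] = eig(A);                    % A*X = X*Lambda
    [U, S, V]   = svd(A);
    norm(A*X - X*Lambda)                     % ~ 0
    [sort(diag(Lambda), 'descend'), diag(S)] % identical columns for this particular A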
2.1.2
A number of theoretical facts about the matrix $A$ can be obtained via the SVD. They are summarized in

Theorem 2.2 Assume $A \in \mathbb{C}^{m \times n}$, $p = \min(m, n)$, and $r \le p$ denotes the number of positive singular values of $A$. Then

1. $\mathrm{rank}(A) = r$.

2. $\mathrm{range}(A) = \mathrm{range}(U(:, 1:r))$ and $\mathrm{null}(A) = \mathrm{range}(V(:, r+1:n))$.

3. $\|A\|_2 = \sigma_1$ and $\|A\|_F = \sqrt{\sigma_1^2 + \sigma_2^2 + \ldots + \sigma_r^2}$.

4. The eigenvalues of $A^* A$ are the $\sigma_i^2$ and the $v_i$ are the corresponding (orthonormalized) eigenvectors. The eigenvalues of $A A^*$ are the $\sigma_i^2$ and possibly $m - n$ zeros. The corresponding orthonormalized eigenvectors are given by the $u_i$.
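All four items can be verified numerically. The following sketch uses a random matrix whose third column is a linear combination of the first two, so that $r = 2$ (the matrix and tolerance are chosen only for illustration):

    A = randn(5, 3);  A(:,3) = A(:,1) + A(:,2);   % rank-2 test matrix
    [U, S, V] = svd(A);
    s = diag(S);
    r = nnz(s > 1e-12);                      % number of (numerically) positive singular values
    [rank(A), r]                             % item 1
    norm(A * V(:, r+1:end))                  % item 2: columns r+1,...,n of V span null(A)
    [norm(A), s(1)]                          % item 3: 2-norm equals sigma_1
    [norm(A, 'fro'), norm(s)]                % item 3: Frobenius norm
    [sort(eig(A'*A), 'descend'), s.^2]       % item 4: eigenvalues of A^*A are the sigma_i^2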
2.1.3 Low-rank Approximation
Theorem 2.3 The $m \times n$ matrix $A$ can be decomposed into a sum of $r$ rank-one matrices:
\[
A = \sum_{j=1}^{r} \sigma_j u_j v_j^*. \qquad (8)
\]
Moreover, for $\nu < r$ the best rank-$\nu$ approximation of $A$ in the 2-norm is given by the partial sum
\[
A_\nu = \sum_{j=1}^{\nu} \sigma_j u_j v_j^*.
\]
In fact,
\[
\|A - A_\nu\|_2 = \sigma_{\nu+1}. \qquad (9)
\]
Proof The representation (8) of the SVD follows immediately from the full SVD (6) by splitting $\Sigma$ into a sum of diagonal matrices $\Sigma_j = \mathrm{diag}(0, \ldots, 0, \sigma_j, 0, \ldots, 0)$.
Formula (9) for the approximation error follows from the fact that $U^* A V = \Sigma$ and the expansion for $A_\nu$, so that $U^* (A - A_\nu) V = \mathrm{diag}(0, \ldots, 0, \sigma_{\nu+1}, \ldots)$ and $\|A - A_\nu\|_2 = \sigma_{\nu+1}$ by the invariance of the 2-norm under unitary transformations and item 3 of the previous theorem.
The claim regarding the best approximation property is a little more involved, and omitted.
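The error formula (9) is easy to confirm numerically; a small sketch (random test matrix, truncation index chosen arbitrarily) is:

    A = randn(6, 4);
    [U, S, V] = svd(A);
    nu = 2;                                  % truncation index
    Anu = U(:, 1:nu) * S(1:nu, 1:nu) * V(:, 1:nu)';   % nu-th partial sum of (8)
    [norm(A - Anu), S(nu+1, nu+1)]           % the two numbers agree, as claimed in (9)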
Remark There are many possible rank-$\nu$ decompositions of $A$ (e.g., by taking partial sums of the LU or QR factorization). Theorem 2.3, however, says that the $\nu$-th partial sum of the SVD captures as much of the energy of $A$ (measured in the 2-norm) as possible. This fact gives rise to many applications in image processing, data compression, data mining, and other fields. See, e.g., the Matlab scripts svd_compression.m and qr_compression.m.
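A rough sketch in the spirit of those scripts (not the scripts themselves; a smooth synthetic test matrix generated by peaks stands in for an image here) is:

    A = peaks(200);                          % synthetic 200x200 "image" with rapidly decaying singular values
    [U, S, V] = svd(A);
    k = 10;                                  % keep only the k largest singular values
    Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';
    norm(A - Ak) / norm(A)                   % relative 2-norm error = sigma_{k+1}/sigma_1
    imagesc(Ak), colormap gray, axis image   % display the rank-k approximation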
A geometric interpretation of Theorem 2.3 is given by the best approximation of a
hyperellipsoid by lower-dimensional ellipsoids. For example, the best approximation of
a given hyperellipsoid by a line segment is given by the line segment corresponding to
the hyperellipsoid's longest axis. Similarly, the best approximation by an ellipse is given
by that ellipse whose axes are the longest and second-longest axis of the hyperellipsoid.
2.1.4
We now list a simplistic algorithm for computing the SVD of a matrix $A$. It can be used fairly easily for manual computation of small examples. For a given $m \times n$ matrix $A$ the procedure is as follows (a MATLAB transcription of the steps is given after the list):

1. Form $A^* A$.

2. Find the eigenvalues and orthonormalized eigenvectors of $A^* A$, i.e.,
\[
A^* A = V \Lambda V^*.
\]

3. Sort the eigenvalues according to their magnitude, and let $\sigma_j = \sqrt{\lambda_j}$, $j = 1, \ldots, n$.

4. Find the first $r$ columns of $U$ via $u_j = \frac{1}{\sigma_j} A v_j$, $j = 1, \ldots, r$; if the full SVD is needed, extend $u_1, \ldots, u_r$ to an orthonormal basis of $\mathbb{C}^m$.
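A direct MATLAB transcription of these steps might look as follows; it is fine for small examples such as the one below, but it is not a numerically sound way to compute the SVD:

    A = [1 2; 2 2; 2 1];                     % the example matrix treated below
    [V, Lambda] = eig(A'*A);                 % steps 1 and 2
    [lambda, idx] = sort(diag(Lambda), 'descend');   % step 3: sort the eigenvalues
    V = V(:, idx);
    sigma = sqrt(lambda);
    r = nnz(sigma > 1e-12);
    U = A * V(:, 1:r) * diag(1./sigma(1:r)); % step 4: u_j = A*v_j / sigma_j
    norm(A - U * diag(sigma(1:r)) * V(:, 1:r)')      % ~ 0 (reduced SVD)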
Example For
\[
A = \begin{bmatrix} 1 & 2 \\ 2 & 2 \\ 2 & 1 \end{bmatrix}
\]
the procedure yields:

1.
\[
A^* A = \begin{bmatrix} 9 & 8 \\ 8 & 9 \end{bmatrix}.
\]

2. The eigenvalues of $A^* A$ are $\lambda_1 = 17$ and $\lambda_2 = 1$, with orthonormalized eigenvectors
\[
v_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad
v_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \qquad \text{i.e.,} \quad
V = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.
\]

3. $\sigma_1 = \sqrt{17}$ and $\sigma_2 = 1$, so that
\[
\Sigma = \begin{bmatrix} \sqrt{17} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}.
\]

4.
\[
u_1 = \frac{1}{\sigma_1} A v_1 = \frac{1}{\sqrt{17}} \begin{bmatrix} 1 & 2 \\ 2 & 2 \\ 2 & 1 \end{bmatrix} \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{\sqrt{34}} \begin{bmatrix} 3 \\ 4 \\ 3 \end{bmatrix}
\]
and
\[
u_2 = \frac{1}{\sigma_2} A v_2 = \begin{bmatrix} 1 & 2 \\ 2 & 2 \\ 2 & 1 \end{bmatrix} \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix},
\]
so that so far
\[
U = \begin{bmatrix} \frac{3}{\sqrt{34}} & -\frac{1}{\sqrt{2}} & u_3(1) \\[2pt] \frac{4}{\sqrt{34}} & 0 & u_3(2) \\[2pt] \frac{3}{\sqrt{34}} & \frac{1}{\sqrt{2}} & u_3(3) \end{bmatrix}.
\]
The remaining column $u_3$ is determined by requiring $U$ to be unitary, i.e., $u_j^* u_3 = \delta_{j3}$, $j = 1, 2, 3$. This gives
\[
u_3 = \frac{1}{\sqrt{17}} \begin{bmatrix} 2 \\ -3 \\ 2 \end{bmatrix},
\]
so that
\[
A = U \Sigma V^* =
\begin{bmatrix} \frac{3}{\sqrt{34}} & -\frac{1}{\sqrt{2}} & \frac{2}{\sqrt{17}} \\[2pt] \frac{4}{\sqrt{34}} & 0 & -\frac{3}{\sqrt{17}} \\[2pt] \frac{3}{\sqrt{34}} & \frac{1}{\sqrt{2}} & \frac{2}{\sqrt{17}} \end{bmatrix}
\begin{bmatrix} \sqrt{17} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\[2pt] \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix}.
\]
The corresponding reduced SVD is
\[
A = \hat{U} \hat{\Sigma} V^* =
\begin{bmatrix} \frac{3}{\sqrt{34}} & -\frac{1}{\sqrt{2}} \\[2pt] \frac{4}{\sqrt{34}} & 0 \\[2pt] \frac{3}{\sqrt{34}} & \frac{1}{\sqrt{2}} \end{bmatrix}
\begin{bmatrix} \sqrt{17} & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\[2pt] \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix}.
\]
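The hand computation can be checked against MATLAB's built-in svd (individual columns of $U$ and $V$ may differ from the result above by a factor of $-1$):

    A = [1 2; 2 2; 2 1];
    [U, S, V] = svd(A)                       % S(1,1) = sqrt(17) = 4.1231..., S(2,2) = 1
    norm(A - U*S*V')                         % ~ 0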