Spectral and Algebraic Graph Theory
Spectral and Algebraic Graph Theory
Daniel A. Spielman
Yale University
Preface
• One must convey how the coordinates of eigenvectors correspond to vertices in a graph.
This is obvious to those who understand it, but it can take a while for students to grasp.
• One must introduce necessary linear algebra and show some interesting interpretations of
graph eigenvalues.
• One must derive the eigenvalues of some example graphs to ground the theory.
I find that one has to do all these at once. For this reason my first few lectures jump between
developing theory and examining particular graphs. For this book I have decided to organize the
material differently, mostly separating examinations of particular graphs from the development of
the theory. To help the reader reconstruct the flow of my courses, I give three orders that I have
used for the material:
put orders here
There are many terrific books on Spectral Graph Theory. The four that influenced me the most
are
Other books that I find very helpful and that contain related material include
“Spectra of Graphs” by Dragos Cvetkovic, Michael Doob, and Horst Sachs, and
For those needing an introduction to linear algebra, a perspective that is compatible with this
book is contained in Gil Strang’s “Introduction to Linear Algebra.” For more advanced topics in
linear algebra, I recommend “Matrix Analysis” by Roger Horn and Charles Johnson, as well as
their “Topics in Matrix Analysis”. For treatments of physical systems related to graphs, the topic
of Part III, I recommend Gil Strang’s “Introduction to Applied Mathematics”, Sydney H. Gould’s
“Variational Methods for Eigenvalue Problems”, and “Markov Chains and Mixing Times” by
Levin, Peres and Wilmer.
I include some examples in these notes. All of these have been generated inside Jupyter notebooks
using the Julia language. Some of them require use of the package Laplacians.jl. A simple search
will produce good instructions for installing Julia and packages for it. The notebooks used in this
book may be found at http://cs-www.cs.yale.edu/homes/spielman/sagt.
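For readers who want to run the examples, the following is a minimal setup sketch, assuming a working Julia installation; the package names are simply the ones used in the code in these notes (Laplacians.jl and Plots.jl), and this is not an official installation guide.

# A minimal setup sketch (assumes a working Julia installation; package
# versions may differ from those used to generate the notebooks).
using Pkg
Pkg.add("Laplacians")   # graph routines such as path_graph and lap used below
Pkg.add("Plots")        # used for the figures

using Laplacians, Plots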
Contents
Preface v
Contents vi
Notation xxii
1 Introduction 2
1.1 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Matrices for Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 A spreadsheet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 An operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.3 A quadratic form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Spectral Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Some examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4.1 Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Highlights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5.1 Spectral Graph Drawing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5.2 Graph Isomorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5.3 Platonic Solids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5.4 The Fiedler Value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.5 Bounding Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
5 Comparing Graphs 39
5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.2 The Loewner order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.3 Approximations of Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.4 The Path Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.4.1 Bounding λ2 of a Path Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
5.5 The Complete Binary Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6 Fundamental Graphs 47
6.1 The complete graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.2 The star graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.3 Products of graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
6.3.1 The Hypercube . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.4 Bounds on λ2 by test vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.5 The Ring Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6.6 The Path Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
7 Cayley Graphs 55
7.1 Cayley Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
7.2 Paley Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
7.3 Eigenvalues of the Paley Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
7.4 Generalizing Hypercubes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
7.5 A random set of generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.7 Non-Abelian Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.8 Eigenvectors of Cayley Graphs of Abelian Groups . . . . . . . . . . . . . . . . . . . . 62
VI Algorithms 245
Bibliography 363
Index 375
Notation
This section lists the notation that I try to use throughout the book. I sometimes fall into
different notations when the conventions surrounding a result are so strong that failing to follow
them would make it difficult for experts to understand this book, or would cause cognitive stress.
I almost always treat vectors as functions, and thus write x (i) for the ith component of the vector
x . I place subscripts on vectors, like x i , to indicate the ith vector in a set of vectors.
I An identity matrix
J An all-1s matrix
D The diagonal matrix of weighted degrees of a graph
L Laplacian Matrix
M Adjacency Matrix or Generic Matrix
N Normalized Laplacian Matrix
W The diagonal matrix of edge weights, or the Walk Matrix, M D^{-1}
W̃ The Lazy Walk Matrix, I/2 + W/2
A^+ The Moore-Penrose pseudoinverse of A.
A^{+/2} The square root of A^+.
Chapter 1
Introduction
In this chapter we present essential background on graphs and spectral theory. We also provide a
brief introduction to some of the ideas of spectral graph theory, describe some of the topics
covered in this book, and try to give some useful intuition about graph spectra.
1.1 Graphs
First, we recall that a graph G = (V, E) is specified by its vertex set, V , and edge set E. (I will use the words “vertex” and “node” interchangeably. Sorry about that.) In an
undirected graph, the edge set is a set of unordered pairs of vertices. Unless otherwise specified,
all graphs will be undirected, simple (having no loops or multiple edges) and finite. We will
sometimes assign weights to edges. These will usually be positive real numbers. If no weights
have been specified, we view all edges as having weight 1. This is an arbitrary choice, and we
should remember that it has an impact.
Graphs (also called “networks”) are typically used to model connections or relations between
things, where “things” are vertices. When the edges in a graph are more important than the
vertices, we may just specify an edge set E and ignore the ambient vertex set.
Common “natural” examples of graphs are:
• Friendship graphs: people are vertices, edges exist between pairs of people who are friends
(assuming the relation is symmetric).
• Network graphs: devices, routers and computers are vertices, edges exist between pairs that
are connected.
• Circuit graphs: electronic components, such as transistors, are vertices: edges exist between
pairs connected by wires.
• Protein-Protein Interaction graphs: proteins are vertices. Edges exist between pairs that
interact. These should really have weights indicating the strength and nature of interaction.
So should most other graphs.
• The path on n vertices. The vertices are {1, . . . n}. The edges are (i, i + 1) for 1 ≤ i < n.
• The ring on n vertices. The vertices are {1, . . . n}. The edges are all those in the path, plus
the edge (1, n).
• The hypercube on 2k vertices. The vertices are elements of {0, 1}k . Edges exist between
vertices that differ in only one coordinate.
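As a quick illustration (not part of the original notes), here is one way to build adjacency matrices for the path, ring, and hypercube in plain Julia; the helper names path_adj, ring_adj, and hypercube_adj are made up for this sketch and are not routines from Laplacians.jl.

# Adjacency matrix of the path on n vertices: edges (i, i+1) for 1 <= i < n.
path_adj(n) = [abs(i - j) == 1 ? 1.0 : 0.0 for i in 1:n, j in 1:n]

# Ring on n vertices: the path plus the edge (1, n).
function ring_adj(n)
    M = path_adj(n)
    M[1, n] = M[n, 1] = 1.0
    return M
end

# Hypercube on 2^k vertices: vertices are bit strings in {0,1}^k, and edges
# join strings that differ in exactly one coordinate.
function hypercube_adj(k)
    n = 2^k
    [count_ones((i - 1) ⊻ (j - 1)) == 1 ? 1.0 : 0.0 for i in 1:n, j in 1:n]
end

path_adj(4)   # the same 4-by-4 matrix that appears in Section 1.4.1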
1.2 Matrices for Graphs

The naive view of a matrix is that it is essentially a spreadsheet—a table we use to organize
numbers. This is like saying that a car is an enclosed metal chair with wheels. It says nothing
about what it does!
We will use matrices to do two things. First, we will view a matrix M as providing a function
that maps a vector x to the vector M x . That is, we view M as an operator. Second, we use the
matrix M to define a quadratic form: a function that maps a vector x to a number x T M x .
1.2.1 A spreadsheet
We will usually write V for the set of vertices of a graph, and let n denote the number of vertices.
There are times that we will need to order the vertices and assign numbers to them. In this case,
they will usually be {1, . . . , n}. For example, if we wish to draw a matrix as a table, then we need
to decide which vertex corresponds to which row and column.
The most natural matrix to associate with a graph G is its adjacency matrix², M_G, whose entries M_G(a, b) are given by

M_G(a, b) = \begin{cases} 1 & \text{if } (a, b) ∈ E \\ 0 & \text{otherwise.} \end{cases}
It is important to realize that we index the rows and columns of the matrix by vertices, rather
than by numbers. Almost every statement that we make will remain true under renaming of
vertices. The first row of a matrix has no special importance. To understand this better see the
exercises at the end of this section.
While the adjacency matrix is the most natural matrix to associate with a graph, I find it the
least useful. Eigenvalues and eigenvectors are most meaningful when used to understand a
natural operator or a natural quadratic form. The adjacency matrix provides neither.
² I am going to try to always use the letter M for the adjacency matrix, in contrast with my past practice which was to use A. I will use letters like a and b to denote vertices.
1.2.2 An operator
The most natural operator associated with a graph G is probably its diffusion operator. This
operator describes the diffusion of stuff among the vertices of a graph. Imagine a process in which
each vertex can contain some amount of stuff (such as a gas). At each time step, the stuff at a
vertex will be uniformly distributed to its neighbors. None of the stuff that was at a vertex
remains at the vertex, but stuff can enter from other vertices. This is a discrete-time and slightly
unnatural notion of diffusion, but it provides a nice matrix.
To construct the diffusion matrix, let D G be the diagonal matrix in which D G (a, a) is the degree
of vertex a. We will usually write d (a) for the degree of vertex a. In an unweighted graph, the
degree of a vertex is the number of edges attached to it. In the case of a weighted graph, we use
the weighted degree: the sum of the weights of the edges attached to the vertex a. Algebraically,
we can obtain the vector of degrees from the expression

d \stackrel{def}{=} M_G 1,

where 1 is the all-1s vector. The diffusion (or walk) matrix of G is then

W_G \stackrel{def}{=} M_G D_G^{-1}.

Of course, when the graph is regular, that is when every vertex has the same degree, W_G is merely a rescaling of M_G³.
Formally⁴, we use a vector p ∈ IR^V to indicate how much stuff is at each vertex, with p(a) being the amount of stuff at vertex a. After one time step, the distribution of stuff at each vertex will be W_G p. To see this, first consider the case when p is an elementary unit vector, δ_a, where we define δ_a to be the vector for which δ_a(a) = 1, and for every other vertex b, δ_a(b) = 0. The vector D_G^{-1} δ_a has the value 1/d(a) at vertex a, and is zero everywhere else. So, the vector M_G D_G^{-1} δ_a
has value 1/d (a) at every vertex b that is a neighbor of a, and is zero everywhere else. If this is
not immediately obvious, think about it until it is.
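Here is a small sketch, not from the book's notebooks, that carries out one diffusion step W_G p = M_G D_G^{-1} p on the path with four vertices and confirms the description above when p = δ_a.

using LinearAlgebra

# Path on 4 vertices, as in Section 1.4.1, built by hand.
M = [0 1 0 0; 1 0 1 0; 0 1 0 1; 0 0 1 0.0]
d = M * ones(4)              # the degree vector d = M*1
D = Diagonal(d)
W = M * inv(D)               # the diffusion / walk matrix W_G = M_G D_G^{-1}

p = [0.0, 1.0, 0.0, 0.0]     # all of the stuff starts at vertex 2
W * p                        # returns [0.5, 0.0, 0.5, 0.0]: half to each neighbor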
It is sometimes more convenient to consider a lazy random walk . These are usually defined to be
walks that stay put with probability one half and take a step with probability one half. The
matrix corresponding to this operator is given by
W̃_G \stackrel{def}{=} I/2 + W_G/2.
One of the purposes of spectral theory is to provide an understanding of what happens when one
repeatedly applies a linear operator like W G .
³ I think this is why researchers got away with studying the adjacency matrix for so long.
⁴ We write IR^V instead of IR^n to emphasize that each coordinate of the vector corresponds to a vertex of the graph.
1.2.3 A quadratic form

The most natural quadratic form associated with a graph is defined in terms of its Laplacian matrix,

L_G \stackrel{def}{=} D_G − M_G.

Given a function on the vertices, x ∈ IR^V, the Laplacian quadratic form of a weighted graph in which edge (a, b) has weight w_{a,b} > 0 is

x^T L_G x = \sum_{(a,b) ∈ E} w_{a,b} (x(a) − x(b))^2.   (1.1)
This form measures the smoothness of the function x . It will be small if the function x does not
jump too much over any edge.
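The following sketch (with made-up edges and weights) checks equation (1.1) numerically: the quadratic form x^T L_G x computed from the matrix agrees with the sum over edges.

using LinearAlgebra

# A small weighted graph on 3 vertices; the edges and weights are made up.
edges   = [(1, 2), (2, 3)]
weights = [2.0, 0.5]
n = 3

M = zeros(n, n)
for ((a, b), w) in zip(edges, weights)
    M[a, b] = M[b, a] = w
end
L = Diagonal(M * ones(n)) - M                 # L_G = D_G - M_G

x = [1.0, -2.0, 0.5]
x' * L * x ≈ sum(w * (x[a] - x[b])^2 for ((a, b), w) in zip(edges, weights))   # true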
We use the notation x (a) to denote the coordinate of vector x corresponding to vertex a. Other
people often use subscripts for this, like x a . We usually use subscripts to name vectors.
There are many possible definitions of Laplacians with negative edge weights. So, we will only
define them when we need them.
1.3 Spectral Theory

We now review the highlights of the spectral theory for symmetric matrices. Almost all of the matrices we consider will be symmetric or will be similar⁵ to symmetric matrices.
We recall that a vector ψ is an eigenvector of a matrix M with eigenvalue λ if
M ψ = λψ. (1.2)
That is, λ is an eigenvalue if and only if λI − M is a singular matrix. Thus, the eigenvalues are
the roots of the characteristic polynomial of M :
det(xI − M ).
Theorem 1.3.1. [The Spectral Theorem] If M is an n-by-n, real, symmetric matrix, then there exist real numbers λ_1, . . . , λ_n and n mutually orthogonal unit vectors ψ_1, . . . , ψ_n such that ψ_i is an eigenvector of M of eigenvalue λ_i, for each i.
This is the great fact about symmetric matrices. If the matrix is not symmetric, it might not have n eigenvalues. And, even if it has n eigenvalues, their eigenvectors will not be orthogonal⁶. If M is not symmetric, its eigenvalues and eigenvectors might be the wrong thing to study.
⁵ A matrix M is similar to a matrix B if there is a non-singular matrix X such that X^{-1} M X = B. In this case, M and B have the same eigenvalues. See the exercises at the end of this section.
⁶ You can prove that if the eigenvectors are orthogonal, then the matrix is symmetric.
Recall that the eigenvectors are not uniquely determined, although the eigenvalues are. If ψ is an
eigenvector, then −ψ is as well. Some eigenvalues can be repeated. If λi = λi+1 , then ψ i + ψ i+1
will also be an eigenvector of eigenvalue λi . The eigenvectors of a given eigenvalue are only
determined up to an orthogonal transformation.
Definition 1.3.2. A matrix is positive definite if it is symmetric and all of its eigenvalues are
positive. It is positive semidefinite if it is symmetric and all of its eigenvalues are nonnegative.
We always number the eigenvalues of the Laplacian from smallest to largest. Thus, λ1 = 0. We
will refer to λ2 , and in general λk for small k, as low-frequency eigenvalues. λn is a high-frequency
eigenvalue. We will see why soon.
1.4 Some examples

Before we get to any theorems, we will examine evidence that the eigenvalues and eigenvectors of graphs are meaningful by looking at some examples. These were produced in Julia using a Jupyter notebook. You may find the notebook on the book homepage.
1.4.1 Paths
A path graph has vertices {1, . . . , n} and edges (i, i + 1) for 1 ≤ i < n. Here is the adjacency
matrix of a path graph on 4 vertices.
M = path_graph(4)
Matrix(M)
0.0 1.0 0.0 0.0
1.0 0.0 1.0 0.0
0.0 1.0 0.0 1.0
0.0 0.0 1.0 0.0
Matrix(lap(M))
1.0 -1.0 0.0 0.0
-1.0 2.0 -1.0 0.0
0.0 -1.0 2.0 -1.0
0.0 0.0 -1.0 1.0
L = lap(path_graph(10))
E = eigen(Matrix(L))
println(E.values)
[0.0, 0.097887, 0.381966, 0.824429, 1.38197, 2.0, 2.61803, 3.17557, 3.61803, 3.90211]
E.vectors[:,1]
0.31622776601683755
0.31622776601683716
0.31622776601683766
0.3162277660168381
0.31622776601683855
0.3162277660168381
0.3162277660168385
0.31622776601683805
0.3162277660168378
0.3162277660168378
The eigenvector of λ2 is the lowest frequency eigenvector, as we can see that it increases
monotonically along the path:
v2 = E.vectors[:,2]
-0.44170765403093937
-0.39847023129620024
-0.316227766016838
-0.20303072371134553
-0.06995961957075425
0.06995961957075386
0.2030307237113457
0.31622776601683766
0.3984702312961997
0.4417076540309382
plot(v2,marker=5,legend=false)
xlabel!("vertex number")
ylabel!("value in eigenvector")
The x-axis is the name/number of the vertex, and the y-axis is the value of the eigenvector at
that vertex. Now, let’s look at the next few eigenvectors.
Plots.plot(E.vectors[:,2],label="v2",marker = 5)
Plots.plot!(E.vectors[:,3],label="v3",marker = 5)
Plots.plot!(E.vectors[:,4],label="v4",marker = 5)
xlabel!("Vertex Number")
ylabel!("Value in Eigenvector")
You may now understand why we refer to these as the low-frequency eigenvectors. The curves
they trace out resemble the low-frequency modes of vibration of a string. The reason for this is
that the path graph can be viewed as a discretization of the string, and its Laplacian matrix is a
discretization of the Laplace operator. We will relate the low-frequency eigenvalues to
connectivity.
In contrast, the highest frequency eigenvector alternates between positive and negative values at every vertex. We will see that the high-frequency eigenvectors may be related to problems of graph coloring and finding independent sets.
Plots.plot(E.vectors[:,10],label="v10",marker=5)
xlabel!("Vertex Number")
ylabel!("Value in Eigenvector")
1.5 Highlights
We now attempt to motivate this book, and the course on which it is based, by surveying some of
its highlights.
1.5.1 Spectral Graph Drawing

We can often use the low-frequency eigenvectors to obtain a nice drawing of a graph. For example, here is a 3-by-4 grid graph, and its first two non-trivial eigenvectors. Looking at them suggests that they might provide nice coordinates for the vertices.
M = grid2(3,4)
L = lap(M)
E = eigen(Matrix(L))
V = E.vectors[:,2:3]
-0.377172 0.353553
-0.15623 0.353553
0.15623 0.353553
0.377172 0.353553
-0.377172 -1.66533e-16
-0.15623 -4.16334e-16
0.15623 -5.82867e-16
0.377172 2.77556e-16
-0.377172 -0.353553
-0.15623 -0.353553
0.15623 -0.353553
0.377172 -0.353553
In the figure below, we use these eigenvectors to draw the graph. Vertex a has been plotted at coordinates (ψ_2(a), ψ_3(a)). That is, we use ψ_2 to provide a horizontal coordinate for every vertex, and ψ_3 to obtain a vertical coordinate. We then draw the edges as straight lines.
plot_graph(M,V[:,1],V[:,2])
Let’s do a fancier example that should convince you something interesting is going on. We begin
by generating points by sampling them from the Yale logo.
@load "yale.jld2"
scatter(xy[:,1],xy[:,2],legend=false)
We then construct a graph on the points by forming their Delaunay triangulation⁷, and use the
edges of the triangles to define a graph on the points.
Since the vertices came with coordinates, it was easy to draw a nice picture of the graph. But,
what if we just knew the graph, and not the coordinates? We could generate coordinates by
computing two eigenvectors, and using each as a coordinate. Below, we plot vertex a at position
ψ 2 (a), ψ 3 (a), and again draw the edges as straight lines.
⁷ While it does not make sense to cover Delaunay triangulations in this book, they are fascinating and I recommend that you look them up.
plot_graph(a,xy[:,1],xy[:,2])
That’s a great way to draw a graph if you start out knowing nothing about it⁸. Note that the middle of the picture is almost planar, although edges do cross near the boundaries.
⁸ It’s the first thing I do whenever I meet a strange graph.
1.5.2 Graph Isomorphism

It is important to note that the eigenvalues do not change if we relabel the vertices. Moreover, if we permute the vertices then the eigenvectors are similarly permuted. That is, if P is a permutation matrix and L ψ = λ ψ, then

(P L P^T)(P ψ) = λ (P ψ),

because P^T P = I. To prove it by experiment, let’s randomly permute the vertices, and plot the
permuted graph.
Random.seed!(1)
p = randperm(size(a,1))
M = a[p,p]
E = eigen(Matrix(lap(M)))
V = E.vectors[:,2:3]
plot_graph(M,V[:,1],V[:,2], dots=false)
Note that this picture is slightly different from the previous one: it has flipped vertically. That’s
because eigenvectors are only determined up to signs, and that’s only if they have multiplicity 1.
This gives us a very powerful heuristic for testing if one graph is a permutation of another (this is
the famous “Graph Isomorphism Testing Problem”). First, check if the two graphs have the same
sets of eigenvalues. If they don’t, then they are not isomorphic. If they do, and the eigenvalues
have multiplicity one, then draw the pictures above. If the pictures are the same, up to horizontal
or vertical flips, and no vertex is mapped to the same location as another, then by lining up the
pictures we can recover the permutation.
As some vertices can map to the same location, this heuristic doesn’t always work. We will learn about the extent to which it does. In particular, we will see in Chapter 39 that if every eigenvalue of two graphs G and H has multiplicity 1, then we can efficiently test whether or not they are isomorphic.
These algorithms have been extended to handle graphs in which the multiplicity of every
eigenvalue is bounded by a constant [BGM82]. But, there are graphs in which every non-trivial
eigenvalue has large multiplicity. We will learn how to construct and analyze these, as they
constitute fundamental examples and counter-examples to many natural conjectures. For
example, here are the eigenvalues of a Latin Square Graph on 25 vertices. These are a type of
Strongly Regular Graph.
M = latin_square_graph(5);
println(eigvals(Matrix(lap(M))))
[0.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0,
15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
All Latin Square Graphs of the same size have the same eigenvalues, whether or not they are
isomorphic. We will learn some surprisingly fast (but still not polynomial time) algorithms for
checking whether or not Strongly Regular Graphs are isomorphic.
1.5.3 Platonic Solids

Of course, some graphs are not meant to be drawn in two dimensions. For example, let’s try this with the dodecahedron.
M = read_graph("dodec.txt")
spectral_drawing(M)
You will notice that this looks like what you would get if you squashed the dodecahedron down to
the plane. The reason is that we really shouldn’t be drawing this picture in two dimensions: the
smallest non-zero eigenvalue of the Laplacian has multiplicity three.
E = eigen(Matrix(lap(M)))
println(E.values)
So, we can’t reasonably choose just two eigenvectors. We should be choosing three that span the
eigenspace. If we do, we would get the canonical representation of the dodecahedron in three
dimensions.
x = E.vectors[:,2]
y = E.vectors[:,3]
z = E.vectors[:,4]
plot_graph(M, x, y, z; setaxis=false)
As you would guess, this happens for all Platonic solids. In fact, if you properly re-weight the
edges, it happens for every graph that is the one-skeleton of a convex polytope [Lov01]. Let me
state that more concretely. Given a convex polytope in IRd , we can treat its 1-skeleton as a graph
on its vertices. There is always a way of assigning positive weights to edges so that the
second-smallest Laplacian eigenvalue has multiplicity d, and so that the corresponding eigenspace
is spanned by the coordinate vectors of the vertices of the polytope.
We finish this section by contemplating an image of the high-frequency eigenvectors of the
dodecahedron. This code plots them in three dimensions, although we can only print them in two.
Observe that vertices are approximately opposite their neighbors.
x = E.vectors[:,20]
y = E.vectors[:,19]
z = E.vectors[:,18]
plot_graph(M, x, y, z; setaxis=false);
1.5.4 The Fiedler Value

The second-smallest eigenvalue of the Laplacian matrix of a graph is zero if and only if the graph is disconnected. If G is disconnected, then we can partition it into two graphs G_1 and G_2 with no edges between them, and then write

L_G = \begin{pmatrix} L_{G_1} & 0 \\ 0 & L_{G_2} \end{pmatrix}.

As the eigenvalues of L_G are the union, with multiplicity, of the eigenvalues of L_{G_1} and L_{G_2}, we see that L_G inherits a zero eigenvalue from each. Conversely, if G is connected then we can show that the only vectors x for which x^T L_G x = 0 are the constant vectors. If x is not constant and G is connected then there must be an edge (a, b) for which x(a) ≠ x(b). And, this edge will contribute a positive term to the sum (1.1).
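A quick numerical check of this fact, using a tiny stand-in lap helper rather than the one from Laplacians.jl: a disconnected graph has a repeated zero eigenvalue, and adding an edge that connects the pieces makes λ2 positive.

using LinearAlgebra

lap(M) = Diagonal(M * ones(size(M, 1))) - M   # tiny stand-in for Laplacians.lap

# Two disjoint edges: {1,2} and {3,4} with nothing between them.
M = [0 1 0 0; 1 0 0 0; 0 0 0 1; 0 0 1 0.0]
eigvals(Symmetric(lap(M)))     # eigenvalue 0 appears twice: the graph is disconnected

# Connect the two pieces with the edge (2, 3).
M[2, 3] = M[3, 2] = 1.0
eigvals(Symmetric(lap(M)))     # eigenvalue 0 now appears once, so λ2 > 0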
Fiedler suggested that we make this qualitative observation quantitative and think of λ2 as a
measure of how well connected the graph is. For this reason, he called it the “Algebraic
Connectivity” of a graph, and we call it the “Fiedler value”.
Fiedler [Fie73] proved that the further λ_2 is from 0, the better connected the graph is. In Chapter 21 we will prove the ultimate extension of this result: Cheeger’s inequality.
In short, we say that a graph is poorly connected if one can cut off many vertices by removing
only a few edges. We measure how poorly connected it is by the ratio of these quantities (almost).
Cheeger’s inequality gives a tight connection between this ratio and λ_2. If λ_2 is small, then for some t, the set of vertices

S_t \stackrel{def}{=} {i : ψ_2(i) < t}

may be removed by cutting many fewer than |S_t| edges. This spectral graph partitioning heuristic has proved very successful in practice.
In general, it will be interesting to turn qualitative statements like this into quantitative ones. For
example, the smallest eigenvalue of the diffusion matrix is −1 if and only if the graph is bipartite. One can relate the magnitude of this eigenvalue to how far a graph is from being bipartite [Tre09].
1.5.5 Bounding Eigenvalues

We will often be interested in the magnitudes of certain eigenvalues. For this reason, we will learn multiple techniques for proving bounds on eigenvalues. The most prominent of these will be proofs by test vectors and proofs by comparison with simpler graphs.
We will prove that graphs that can be drawn nicely must have small Fiedler value, and we will
prove very tight results for planar graphs.
We will also see how to use the graph Laplacian to draw planar graphs: Tutte [Tut63] proved that if one reasonably fixes the locations of the vertices on a face of a planar graph and then lets the others settle into the positions obtained by treating the edges as springs, then one obtains a planar drawing of the graph!
Spectral graph theory is one of the main tools we use for analyzing random walks on graphs. We
will devote a few chapters to this theory, connect it to Cheeger’s inequality, and use tools
developed to study random walks to derive a fascinating proof of Cheeger’s inequality.
1.5.8 Expanders
We will be particularly interested in graphs that are very well connected. These are called
expanders. Roughly speaking, expanders are sparse graphs (say a number of edges linear in the
number of vertices), in which λ2 is bounded away from zero by a constant. They are among the
most important examples of graphs, and play a prominent role in Theoretical Computer Science.
Expander graphs have numerous applications. We will see how to use random walks on expander
graphs to construct pseudo-random generators about which one can actually prove something. We
will also use them to construct good error-correcting codes.
Error-correcting codes and expander graphs are both fundamental objects of study in the field of
Extremal Combinatorics and are extremely useful. We will also use error-correcting codes to
construct crude expander graphs. In Chapter 30 we will see a simple construction of good
expanders. The best expanders are the Ramanujan graphs. These were first constructed by Margulis [Mar88] and by Lubotzky, Phillips and Sarnak [LPS88]. In Chapters ?? and ?? we will prove that there exist infinite families of bipartite Ramanujan graphs.
We will ask what it means for one graph to approximate another. Given graphs G and H, we will
measure how well G approximates H by the closeness of their Laplacian quadratic forms. We will
see that expanders are precisely the sparse graphs that provide good approximations of the
complete graph, and we will use this perspective for most of our analysis of expanders. We will
show that every graph can be well-approximated by a sparse graph through a process called
sparsification.
We will also ask how well a graph can be approximated by a tree, and see that low-stretch
spanning-trees provide good approximations under this measure.
Our motivation for this material is the need to design fast algorithms for solving systems of linear
equations in Laplacian matrices and for computing their eigenvectors. This first problem arises in
numerous contexts, including the solution of elliptic PDEs by the finite element method, the
solution of network flow problems by interior point algorithms, and in classification problems in
Machine Learning.
In fact, our definition of graph approximation is designed to suit the needs of the Preconditioned
Conjugate Gradient algorithm.
1.6 Exercises
The following exercises are intended to help you get back in practice at doing linear algebra. You
should solve all of them.
1. Orthogonal eigenvectors. Let M be a symmetric matrix, and let ψ and φ be vectors so
that
M ψ = µψ and M φ = νφ.
Prove that if µ ≠ ν then ψ must be orthogonal to φ. Note that your proof should exploit the
symmetry of M , as this statement is false otherwise.
2. Invariance under permutations.
Let Π be a permutation matrix. That is, there is a permutation π : V → V so that

Π(u, v) = \begin{cases} 1 & \text{if } u = π(v), \text{ and} \\ 0 & \text{otherwise.} \end{cases}
Prove that if
M ψ = λψ,
then
ΠM ΠT (Πψ) = λ(Πψ).
That is, permuting the coordinates of the matrix merely permutes the coordinates of the
eigenvectors, and does not change the eigenvalues.
3. Invariance under rotations.
Let Q be an orthogonal matrix. That is, a matrix such that Q T Q = I . Prove that if
M ψ = λψ,
then
QM Q T (Qψ) = λ(Qψ).
4. Similar Matrices.
A matrix M is similar to a matrix B if there is a non-singular matrix X such that
X −1 M X = B. Prove that similar matrices have the same eigenvalues.
5. Spectral decomposition.
Let M be a symmetric matrix with eigenvalues λ1 , . . . , λn and let ψ 1 , . . . , ψ n be a corresponding
set of orthonormal column eigenvectors. Let Ψ be the orthogonal matrix whose ith column is ψ i .
Prove that
Ψ T M Ψ = Λ,
where Λ is the diagonal matrix with λ_1, . . . , λ_n on its diagonal. Conclude that

M = Ψ Λ Ψ^T = \sum_i λ_i ψ_i ψ_i^T.
6. Traces.
Recall that the trace of a matrix A, written Tr (A), is the sum of the diagonal entries of A. Prove
that for two matrices A and B,
Tr (AB) = Tr (BA) .
Note that the matrices do not need to be square for this to be true. They can be rectangular
matrices of dimensions n × m and m × n.
Use this fact and the previous exercise to prove that

Tr(A) = \sum_{i=1}^{n} λ_i,
where λ1 , . . . , λn are the eigenvalues of A. You are probably familiar with this fact about the
trace, or it may have been the definition you were given. This is why I want you to remember
how to prove it.
7. The characteristic polynomial.
Let M be an n-by-n symmetric matrix and let p(x) = det(xI − M) be its characteristic polynomial. Write

p(x) = \sum_{k=0}^{n} x^{n−k} c_k (−1)^k.

Prove that

c_k = \sum_{S ⊆ [n], |S|=k} det(M(S, S)).

Here, we write [n] to denote the set {1, . . . , n}, and M(S, S) to denote the submatrix of M with rows and columns indexed by S.
8. Reversing products.
Let M be a d-by-n matrix. Prove that the multiset of nonzero eigenvalues of M M T is the same
as the multiset of nonzero eigenvalues of M T M .
Chapter 2
Eigenvalues and Optimization
One of the reasons that the eigenvalues of matrices have meaning is that they arise as the solution
to natural optimization problems. The formal statement of this is given by the Courant-Fischer
Theorem. We begin by using the Spectral Theorem to prove the Courant-Fischer Theorem. We
then prove the Spectral Theorem in a form that is almost identical to Courant-Fischer.
The Rayleigh quotient of a vector x with respect to a matrix M is defined to be

\frac{x^T M x}{x^T x}.   (2.1)

The Rayleigh quotient of an eigenvector is its eigenvalue: if M ψ = µψ, then

\frac{ψ^T M ψ}{ψ^T ψ} = \frac{ψ^T µψ}{ψ^T ψ} = µ.
The Courant-Fischer Theorem tells us that the vectors x that maximize the Rayleigh quotient are
exactly the eigenvectors of the largest eigenvalue of M . In fact it supplies a similar
characterization of all the eigenvalues of a symmetric matrix.
Theorem (Courant-Fischer). Let M be a symmetric matrix with eigenvalues µ_1 ≥ µ_2 ≥ · · · ≥ µ_n. Then

µ_k = \max_{\substack{S ⊆ IR^n \\ dim(S)=k}} \min_{\substack{x ∈ S \\ x ≠ 0}} \frac{x^T M x}{x^T x} = \min_{\substack{T ⊆ IR^n \\ dim(T)=n−k+1}} \max_{\substack{x ∈ T \\ x ≠ 0}} \frac{x^T M x}{x^T x},

where the maximization and minimization are over subspaces S and T of IR^n.
Be warned that we will often neglect to include the condition x ≠ 0, but we always intend it.
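As a sanity check on these characterizations, the sketch below (not from the book's notebooks) verifies that the Rayleigh quotient of an arbitrary vector lies between the extreme eigenvalues, and that it equals the eigenvalue at an eigenvector.

using LinearAlgebra

A = Symmetric(randn(6, 6))            # a random symmetric matrix
E = eigen(A)                          # eigenvalues come back in increasing order

rayleigh(M, x) = (x' * M * x) / (x' * x)

x = randn(6)
minimum(E.values) <= rayleigh(A, x) <= maximum(E.values)   # true for every nonzero x

rayleigh(A, E.vectors[:, end]) ≈ maximum(E.values)         # true at an eigenvector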
As with many proofs in Spectral Graph Theory, we begin by expanding a vector x in the basis of
eigenvectors of M . Let’s recall how this is done.
Let ψ 1 , . . . , ψ n be an orthonormal basis of eigenvectors of M corresponding to µ1 , . . . , µn . As
these are an orthonormal basis, we may write

x = \sum_i c_i ψ_i, where c_i = ψ_i^T x.
There are many ways to verify this. We let Ψ be the matrix whose columns are ψ 1 , . . . , ψ n , and
recall that the matrix Ψ is said to be orthogonal if its columns are orthonormal vectors. Also
recall that the orthogonal matrices are exactly those matrices Ψ for which Ψ Ψ T = I , and that
this implies that Ψ^T Ψ = I. We now verify that

\sum_i c_i ψ_i = \sum_i ψ_i (ψ_i^T x) = \left( \sum_i ψ_i ψ_i^T \right) x = Ψ Ψ^T x = I x = x.
When confused by orthonormal bases, just pretend that they are the basis of elementary unit
vectors. For example, you know that

x = \sum_i x(i) δ_i, and that x(i) = δ_i^T x.
The first step in the proof is to express the Laplacian quadratic form of x in terms of the
expansion of x in the eigenbasis.
Then,

x^T M x = \sum_{i=1}^{n} c_i^2 µ_i.
Proof. Compute:

x^T M x = \left( \sum_i c_i ψ_i \right)^T M \left( \sum_j c_j ψ_j \right)
        = \left( \sum_i c_i ψ_i \right)^T \left( \sum_j c_j µ_j ψ_j \right)
        = \sum_{i,j} c_i c_j µ_j ψ_i^T ψ_j
        = \sum_i c_i^2 µ_i,

as

ψ_i^T ψ_j = \begin{cases} 0 & \text{for } i ≠ j \\ 1 & \text{for } i = j. \end{cases}
To prove the first characterization, let S be the span of ψ_1, . . . , ψ_k, which has dimension k. Every x ∈ S has c_i = 0 for i > k, so x^T M x = \sum_{i ≤ k} c_i^2 µ_i ≥ µ_k \sum_i c_i^2 = µ_k x^T x. So,

\min_{x ∈ S} \frac{x^T M x}{x^T x} ≥ µ_k.
To show that this is in fact the maximum, we will prove that for all subspaces S of dimension k,

\min_{x ∈ S} \frac{x^T M x}{x^T x} ≤ µ_k.
Let T be the span of ψ_k, . . . , ψ_n. As T has dimension n − k + 1, every S of dimension k has an intersection with T of dimension at least 1. So,

\min_{x ∈ S} \frac{x^T M x}{x^T x} ≤ \min_{x ∈ S ∩ T} \frac{x^T M x}{x^T x} ≤ \max_{x ∈ T} \frac{x^T M x}{x^T x}.
Any x in T may be expressed as

x = \sum_{i=k}^{n} c_i ψ_i,

and so for x in T,

\frac{x^T M x}{x^T x} = \frac{\sum_{i=k}^{n} µ_i c_i^2}{\sum_{i=k}^{n} c_i^2} ≤ \frac{\sum_{i=k}^{n} µ_k c_i^2}{\sum_{i=k}^{n} c_i^2} = µ_k.
We begin the second proof by showing that the Rayleigh quotient is maximized at an eigenvector
of µ1 .
Theorem 2.2.1. Let M be a symmetric matrix and let x be a non-zero vector that maximizes
the Rayleigh quotient with respect to M :
\frac{x^T M x}{x^T x}.
Then, M x = µ1 x , where µ1 is the largest eigenvalue of M . Conversely, the minimum is achieved
by eigenvectors of the smallest eigenvalue of M .
Proof. We first observe that the maximum is achieved: as the Rayleigh quotient is homogeneous,
it suffices to consider unit vectors x . As the set of unit vectors is a closed and compact set, the
maximum is achieved on this set.
Now, let x be a non-zero vector that maximizes the Rayleigh quotient. We recall that the
gradient of a function at its maximum must be the zero vector. Let’s compute that gradient.
We have¹

∇ x^T x = 2x,

and

∇ x^T M x = 2M x.

So,

∇ \frac{x^T M x}{x^T x} = \frac{(x^T x)(2M x) − (x^T M x)(2x)}{(x^T x)^2}.
In order for this to be zero, we must have

(x^T x) M x = (x^T M x) x,

which implies

M x = \frac{x^T M x}{x^T x} x.
That is, the gradient is zero if and only if x is an eigenvector of M with eigenvalue equal to its Rayleigh quotient. As x maximizes the Rayleigh quotient, this eigenvalue must be µ_1.
¹ In case you are not used to computing gradients of functions of vectors, you can derive these directly by reasoning like

\frac{∂}{∂ x(a)} x^T x = \frac{∂}{∂ x(a)} \sum_b x(b)^2 = 2 x(a).
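One can also check the critical-point computation of Theorem 2.2.1 numerically. The following sketch (using nothing beyond the standard library) approximates the gradient of the Rayleigh quotient by finite differences at an eigenvector and finds that it is essentially zero.

using LinearAlgebra

A = Symmetric(randn(5, 5))
ψ = eigen(A).vectors[:, end]              # an eigenvector of the largest eigenvalue
r(x) = (x' * A * x) / (x' * x)            # the Rayleigh quotient

h = 1e-6
grad = map(1:5) do j                      # central finite differences, coordinate by coordinate
    e = zeros(5); e[j] = h
    (r(ψ + e) - r(ψ - e)) / (2h)
end
norm(grad)                                # ≈ 0: ψ is a critical point of r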
We now prove the Spectral Theorem by generalizing this characterization to all of the eigenvalues
of M . The idea is to always use Theorem 2.2.1 to show that a vector is an eigenvector. To do
this, we must modify the matrix for each vector.
Theorem 2.2.2. Let M be an n-dimensional real symmetric matrix. There exist numbers
µ1 , . . . , µn and orthonormal vectors ψ 1 , . . . , ψ n such that M ψ i = µi ψ i . Moreover,
ψ_1 ∈ \arg\max_{‖x‖=1} x^T M x,

and for 2 ≤ i ≤ n,

ψ_i ∈ \arg\max_{‖x‖=1, \; x^T ψ_j = 0 \text{ for } j < i} x^T M x.   (2.2)

Similarly,

ψ_i ∈ \arg\min_{‖x‖=1, \; x^T ψ_j = 0 \text{ for } j > i} x^T M x.
Proof. We use Theorem 2.2.1 to obtain ψ_1 and µ_1, and would like to proceed by induction. But first, we reduce to the case of positive definite matrices.

By Theorem 2.2.1, we also know that there is a µ_n such that

µ_n = \min_x \frac{x^T M x}{x^T x}.

Now consider the matrix M̃ = M + (1 − µ_n) I. For all x such that ‖x‖ = 1,

x^T M̃ x = x^T M x + 1 − µ_n ≥ 1.
Thus M̃ is positive definite. As M̃ has the same eigenvectors as M, with each eigenvalue shifted by 1 − µ_n, it suffices to prove the theorem for M̃; so we may assume that M is positive definite. Having found ψ_1, . . . , ψ_k, define

M_k \stackrel{def}{=} M − \sum_{i=1}^{k} µ_i ψ_i ψ_i^T.

For every x orthogonal to ψ_1, . . . , ψ_k we have M_k x = M x and x^T M_k x = x^T M x, and so

\max_{‖x‖=1, \; x^T ψ_j = 0 \text{ for } j ≤ k} x^T M x = \max_{‖x‖=1, \; x^T ψ_j = 0 \text{ for } j ≤ k} x^T M_k x.   (2.3)

Now, let y be a unit vector that maximizes y^T M_k y. We know from Theorem 2.2.1 that y is an eigenvector of M_k. Call its eigenvalue µ. We now show that y must be orthogonal to each of ψ_1, . . . , ψ_k. Let

ỹ = y − \sum_{i=1}^{k} ψ_i (ψ_i^T y)

be the part of y orthogonal to ψ_1, . . . , ψ_k. If the subtracted component is nonzero, then ‖ỹ‖ < ‖y‖. As ỹ^T M ỹ > 0 and y^T M_k y = ỹ^T M ỹ, this would imply that for the unit vector ŷ = ỹ / ‖ỹ‖,

ŷ^T M_k ŷ > y^T M_k y,

a contradiction. As y is orthogonal to ψ_1, . . . , ψ_k and it is an eigenvector of M_k, it is also an eigenvector of M:

M y = M_k y = µ y,

and by (2.3)

y ∈ \arg\max_{‖x‖=1, \; x^T ψ_j = 0 \text{ for } j ≤ k} x^T M x.
2.3 Notes
2.4 Exercise
1. A tighter characterization.
Tighten Theorem 2.2.2 by proving that for every sequence of orthonormal vectors x_1, . . . , x_n such that x_i^T M x_i = µ_i for all i, each x_i is an eigenvector of M.
Chapter 3
The Laplacian and Graph Drawing

We begin this chapter by establishing the equivalence of multiple expressions for the Laplacian. The Laplacian matrix of a weighted graph G = (V, E, w), w : E → IR^+, is designed to capture the Laplacian quadratic form:

x^T L_G x = \sum_{(a,b) ∈ E} w_{a,b} (x(a) − x(b))^2.   (3.1)
We will now use this quadratic form to derive the structure of the matrix. To begin, consider a
graph with just two vertices and one edge of weight 1. Let’s call it G1,2 . We have
x T LG1,2 x = (x (1) − x (2))2 . (3.2)
Consider the vector δ 1 − δ 2 , where δ a is the elementary unit vector with a 1 in coordinate a. We
have
x (1) − x (2) = δ T1 x − δ T2 x = (δ 1 − δ 2 )T x ,
so

(x(1) − x(2))^2 = \left( (δ_1 − δ_2)^T x \right)^2 = x^T (δ_1 − δ_2)(δ_1 − δ_2)^T x = x^T \begin{pmatrix} 1 & −1 \\ −1 & 1 \end{pmatrix} x.

Thus,

L_{G_{1,2}} = \begin{pmatrix} 1 & −1 \\ −1 & 1 \end{pmatrix}.
Now, let Ga,b be the graph with just one edge between a and b. It can have as many other
vertices as you like. The Laplacian of Ga,b can be written in the same way:
LGa,b = (δ a − δ b )(δ a − δ b )T .
This is the matrix that is zero except at the intersection of the rows and columns indexed by a and b, where it looks like

\begin{pmatrix} 1 & −1 \\ −1 & 1 \end{pmatrix}.
You can check that this agrees with the definition of the Laplacian from Section 1.2.3:

L_G = D_G − M_G,

where

D_G(a, a) = \sum_b w_{a,b}.
This formula turns out to be useful when we view the Laplacian as an operator. For every vector
x we have

(L_G x)(a) = d(a) x(a) − \sum_{(a,b) ∈ E} w_{a,b} x(b) = \sum_{(a,b) ∈ E} w_{a,b} (x(a) − x(b)).   (3.3)
From (3.1), we see that if all entries of x are the same, then x T Lx equals zero. From (3.3), we
can immediately see that L1 = 0, so the constant vectors are eigenvectors of eigenvalue zero. If
the graph is connected, these are the only eigenvectors of eigenvalue zero.
Lemma 3.1.1. Let G = (V, E) be a graph, and let 0 = λ1 ≤ λ2 ≤ · · · ≤ λn be the eigenvalues of
its Laplacian matrix, L. Then, λ2 > 0 if and only if G is connected.
Proof. Suppose that G is connected and that ψ is an eigenvector of L of eigenvalue 0. Then ψ^T L ψ = 0, and so by (3.1) every term w_{a,b} (ψ(a) − ψ(b))^2 must be zero. Thus, for every pair of vertices (a, b) connected by an edge, we have ψ(a) = ψ(b). As every pair of vertices a and b are connected by a path, we may inductively apply this fact to show that ψ(a) = ψ(b) for all vertices a and b. Thus, ψ must be a constant vector. We conclude that the eigenspace of eigenvalue 0 has dimension 1.
The idea of drawing graphs using eigenvectors demonstrated in Section 1.5.1 was suggested by
Hall [Hal70] in 1970.
To explain Hall’s approach, we first consider the problem of drawing a graph on a line. That is,
mapping each vertex to a real number. It isn’t easy to see what a graph looks like when you do
this, as all of the edges sit on top of one another. One can fix this either by drawing the edges of
the graph as curves, or by wrapping the line around a circle.
Let x ∈ IRV be the vector that describes the assignment of a real number to each vertex. We
would like vertices that are neighbors to be close to one another. So, Hall suggested that we
choose an x minimizing

x^T L x = \sum_{(a,b) ∈ E} (x(a) − x(b))^2.   (3.4)
Unless we place restrictions on x , the solution will be degenerate. For example, all of the vertices
could map to 0. To avoid this, and to fix the scale of the embedding overall, we require

\sum_{a ∈ V} x(a)^2 = ‖x‖^2 = 1.   (3.5)
Even with this restriction, another degenerate solution is possible: it could be that every vertex maps to 1/\sqrt{n}. To prevent this from happening, we impose the additional restriction that

\sum_a x(a) = 1^T x = 0.   (3.6)
On its own, this restriction fixes the shift of the embedding along the line. When combined with
(3.5), it guarantees that we get something interesting.
As 1 is the eigenvector of smallest eigenvalue of the Laplacian, Theorem 2.2.2 implies that a unit
eigenvector of λ2 minimizes x T Lx subject to (3.5) and (3.6).
Of course, we really want to draw a graph in two dimensions. So, we will assign two coordinates
to each vertex given by x and y . As opposed to minimizing (3.4), we will minimize the sum of
the squares of the lengths of the edges in the embedding:

\sum_{(a,b) ∈ E} \left\| \begin{pmatrix} x(a) \\ y(a) \end{pmatrix} − \begin{pmatrix} x(b) \\ y(b) \end{pmatrix} \right\|^2.

As before, we require

‖x‖^2 = 1 and ‖y‖^2 = 1,

and that both x and y be orthogonal to 1.
However, this still leaves us with the degenerate solution x = y = ψ 2 . To ensure that the two
coordinates are different, Hall introduced the restriction that x be orthogonal to y. To embed a graph in k dimensions, we find k orthonormal vectors x_1, . . . , x_k that are orthogonal to 1 and minimize \sum_i x_i^T L x_i. A natural choice for these is ψ_2 through ψ_{k+1}, and this choice achieves objective function value \sum_{i=2}^{k+1} λ_i.
The following theorem says that this choice is optimal. It is a variant of [Fan49, Theorem 1].
Theorem 3.2.1. Let L be a Laplacian matrix and let x_1, . . . , x_k be orthonormal vectors that are all orthogonal to 1. Then

\sum_{i=1}^{k} x_i^T L x_i ≥ \sum_{i=2}^{k+1} λ_i,

and this inequality is tight only when x_i^T ψ_j = 0 for all j such that λ_j > λ_{k+1}.
as λ_j ≥ λ_{k+1} for j > k + 1. This inequality is only tight when (ψ_j^T x_i)^2 = 0 for j such that λ_j > λ_{k+1}.
¹ This theorem is really about majorization, which is easily established through multiplication by a doubly-stochastic matrix.
where the inequality follows from the facts that λ_j − λ_{k+1} ≤ 0 and \sum_{i=1}^{k} (ψ_j^T x_i)^2 ≤ 1. This inequality is tight under the same conditions as the previous one.
The beautiful pictures that we sometimes obtain from Hall’s graph drawing should convince you
that eigenvectors of the Laplacian should reveal a lot about the structure of graphs. But, it is
worth pointing out that there are many graphs for which this approach does not produce nice
images, and there are in fact graphs that can not be nicely drawn. Expander graphs are good
examples of these.
Many other approaches to graph drawing borrow ideas from Hall’s work: they try to minimize
some function of the distances of the edges subject to some constraints that keep the vertices well
separated. However, very few of these have compactly describable solutions, or even solutions
that can provably be computed in polynomial time. The algorithms that implement them
typically use a gradient based method to attempt to minimize the function of the distances
subject to constraints. This means that relabeling the vertices could produce very different
drawings! Thus, one must be careful before using these images to infer some truth about a graph.
Chapter 4
Adjacency, Interlacing, and Perron-Frobenius
In this chapter, we examine the meaning of the smallest and largest eigenvalues of the adjacency
matrix of a graph. Note that the largest eigenvalue of the adjacency matrix corresponds to the
smallest eigenvalue of the Laplacian. Our focus in this chapter will be on the features that
adjacency matrices possess but which Laplacians do not. Where the smallest eigenvector of the
Laplacian is a constant vector, the largest eigenvector of an adjacency matrix, called the Perron
vector, need not be. The Perron-Frobenius theory tells us that the largest eigenvector of an
adjacency matrix is non-negative, and that its eigenvalue is an upper bound on the absolute value of the smallest eigenvalue. These are equal precisely when the graph is bipartite.
We will examine the relation between the largest adjacency eigenvalue and the degrees of vertices
in the graph. This is made more meaningful by the fact that we can apply Cauchy’s Interlacing
Theorem to adjacency matrices. We will use it to prove a theorem of Wilf [Wil67] which says that a graph can be colored using at most 1 + ⌊µ_1⌋ colors. We will learn more about eigenvalues and
graph coloring in Chapter 19.
We will denote the eigenvalues of M by µ1 , . . . , µn . But, we order them in the opposite direction
than we did for the Laplacian: we assume
µ1 ≥ µ2 ≥ · · · ≥ µ n .
The reason for this convention is so that µ_i corresponds to the ith Laplacian eigenvalue, λ_i. If G is a d-regular graph, then D = dI,

L = dI − M,

and so

λ_i = d − µ_i.
Thus the largest adjacency eigenvalue of a d-regular graph is d, and its corresponding eigenvector
is the constant vector. We could also prove that the constant vector is an eigenvector of
eigenvalue d by considering the action of M as an operator (4.1): if x (a) = 1 for all a, then
(M x )(b) = d for all b.
We now examine µ1 for graphs which are not necessarily regular. Let G be a graph, let dmax be
the maximum degree of a vertex in G, and let dave be the average degree of a vertex in G.
Lemma 4.2.1.
dave ≤ µ1 ≤ dmax .
Proof. The lower bound follows by considering the Rayleigh quotient with the all-1s vector:

µ_1 = \max_x \frac{x^T M x}{x^T x} ≥ \frac{1^T M 1}{1^T 1} = \frac{\sum_{a,b} M(a, b)}{n} = \frac{\sum_a d(a)}{n} = d_{ave}.
To prove the upper bound, let φ_1 be an eigenvector of eigenvalue µ_1. Let a be the vertex on which φ_1 takes its maximum value, so φ_1(a) ≥ φ_1(b) for all b, and we may assume without loss of generality that φ_1(a) > 0 (use −φ_1 if φ_1 is strictly negative). We have

µ_1 = \frac{(M φ_1)(a)}{φ_1(a)} = \frac{\sum_{b : b ∼ a} φ_1(b)}{φ_1(a)} = \sum_{b : b ∼ a} \frac{φ_1(b)}{φ_1(a)} ≤ \sum_{b : b ∼ a} 1 = d(a) ≤ d_{max}.   (4.2)
Lemma 4.2.2. If G is connected and µ_1 = d_{max}, then G is d_{max}-regular.

Proof. If we have equality in (4.2), then it must be the case that d(a) = d_{max} and φ_1(b) = φ_1(a) for all (a, b) ∈ E. Thus, we may apply the same argument to every neighbor of a. As the graph is connected, we may keep applying this argument to neighbors of vertices to which it has already been applied to show that φ_1(c) = φ_1(a) and d(c) = d_{max} for all c ∈ V.
The technique used in these last two proofs will appear many times in this Chapter.
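A quick numerical check of Lemma 4.2.1 on a random graph (a sketch, with an arbitrary edge probability):

using LinearAlgebra, Random

Random.seed!(1)
n = 20
U = triu(Float64.(rand(n, n) .< 0.2), 1)   # random upper-triangular 0/1 pattern
M = U + U'                                 # a random simple graph on 20 vertices

d  = M * ones(n)
μ1 = maximum(eigvals(Symmetric(M)))

(sum(d) / n, μ1, maximum(d))               # d_ave ≤ μ1 ≤ d_max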
We can strengthen the lower bound in Lemma 4.2.1 by proving that µ1 is at least the average
degree of every subgraph of G. We will prove this by applying Cauchy’s Interlacing Theorem.
For a graph G = (V, E) and S ⊂ V , we define the subgraph induced by S, written G(S), to be the
graph with vertex set S and all edges in E connecting vertices in S:
{(a, b) ∈ E : a ∈ S and b ∈ S} .
For a symmetric matrix M whose rows and columns are indexed by a set V , and a S ⊂ V , we
write M (S) for the symmetric submatrix with rows and columns in S.
Theorem 4.3.1 (Cauchy’s Interlacing Theorem). Let A be an n-by-n symmetric matrix and let B be a principal submatrix of A of dimension n − 1 (that is, B is obtained by deleting the same row and column from A). Then,

α_1 ≥ β_1 ≥ α_2 ≥ β_2 ≥ · · · ≥ α_{n−1} ≥ β_{n−1} ≥ α_n,

where α_1 ≥ · · · ≥ α_n are the eigenvalues of A and β_1 ≥ · · · ≥ β_{n−1} are the eigenvalues of B.
Proof. Without loss of generality we will assume that B is obtained from A by removing its first
row and column. We now apply the Courant-Fischer Theorem, which tells us that
α_k = \max_{\substack{S ⊆ IR^n \\ dim(S)=k}} \min_{x ∈ S} \frac{x^T A x}{x^T x}.

The same theorem applied to B gives

β_k = \max_{\substack{S ⊆ IR^{n−1} \\ dim(S)=k}} \min_{x ∈ S} \frac{x^T B x}{x^T x} = \max_{\substack{S ⊆ IR^n, \; dim(S)=k \\ x(1)=0 \text{ for all } x ∈ S}} \min_{x ∈ S} \frac{x^T A x}{x^T x}.
We see that the right-hand expression is taking a maximum over a special family of subspaces of
dimension k: all the vectors in the family must have first coordinate 0. As the maximum over all
subspaces of dimension k can only be larger, we immediately have
αk ≥ βk .
We may prove the inequalities in the other direction, such as βk ≥ αk+1 , by replacing A and B
with −A and −B.
Lemma 4.3.2. For every S ⊆ V , let dave (S) be the average degree of G(S). Then,
dave (S) ≤ µ1 .
Proof. If M is the adjacency matrix of G, then M (S) is the adjacency matrix of G(S). Lemma
4.2.1 says that dave (S) is at most the largest eigenvalue of the adjacency matrix of G(S), and
Theorem 4.3.1 says that this is at most µ1 .
If we remove the vertex of smallest degree from a graph, the average degree can increase. On the
other hand, Cauchy’s Interlacing Theorem says that µ1 can only decrease when we remove a
vertex.
Lemma 4.3.2 is a good demonstration of Cauchy’s Theorem. But, using Cauchy’s Theorem to prove it was overkill. A more direct way to prove it is to emulate the proof of Lemma 4.2.1, but computing the quadratic form in the characteristic vector of S instead of 1.
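Cauchy's Interlacing Theorem is easy to check numerically; the sketch below uses a random symmetric matrix, which covers adjacency matrices of induced subgraphs as a special case.

using LinearAlgebra

A = Symmetric(randn(6, 6))
B = Symmetric(Matrix(A)[2:end, 2:end])      # delete the first row and column

α = sort(eigvals(A), rev = true)            # α1 ≥ ... ≥ α6
β = sort(eigvals(B), rev = true)            # β1 ≥ ... ≥ β5

all(α[k] >= β[k] >= α[k + 1] for k in 1:5)  # true: the eigenvalues interlace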
We now apply Lemma 4.3.2 to obtain an upper bound on the chromatic number of a graph.
Recall that a coloring of a graph is an assignment of colors to vertices in which adjacent vertices
have distinct colors. A graph is said to be k-colorable if it can be colored with only k colors¹. The
chromatic number of a graph, written χ(G), is the least k for which G is k-colorable. The
bipartite graphs are exactly the graph of chromatic number 2.
It is easy to show that every graph is (dmax + 1)-colorable. Assign colors to the vertices
one-by-one. As each vertex has at most dmax neighbors, there is always some color one can assign
that vertex that is different than those assigned to its neighbors. The following theorem of Wilf
[Wil67] improves upon this bound.
Theorem 4.4.1.

χ(G) ≤ ⌊µ_1⌋ + 1.
Proof. We prove this by induction on the number of vertices in the graph. To ground the induction, consider the graph with one vertex and no edges. It has chromatic number 1 and largest eigenvalue zero². Now, assume the theorem is true for all graphs on n − 1 vertices, and let G be a graph on n vertices. By Lemma 4.2.1, G has a vertex of degree at most ⌊µ_1⌋. Let a be such a vertex and let S = V \ {a}. By Theorem 4.3.1, the largest eigenvalue of G(S) is at most µ_1, and so our induction hypothesis implies that G(S) has a coloring with at most ⌊µ_1⌋ + 1 colors. Let c be any such coloring. We just need to show that we can extend c to a. As a has at most ⌊µ_1⌋ neighbors, there is some color in {1, . . . , ⌊µ_1⌋ + 1} that does not appear among its neighbors, and which it may be assigned. Thus, G has a coloring with ⌊µ_1⌋ + 1 colors.
The simplest example in which this theorem improves over the naive bound of d_{max} + 1 is the path graph on 3 vertices: it has d_{max} = 2 but µ_1 < 2. Thus, Wilf’s theorem tells us that it can be colored with 2 colors. Star graphs provide more extreme examples. A star graph with n vertices has d_{max} = n − 1 but µ_1 = \sqrt{n − 1}.
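To see the gap between the two bounds numerically, here is a sketch for the star graph; the helper star_adj is made up for the illustration.

using LinearAlgebra

# Star graph on n vertices: vertex 1 joined to every other vertex.
function star_adj(n)
    M = zeros(n, n)
    M[1, 2:n] .= 1.0
    M[2:n, 1] .= 1.0
    return M
end

n  = 10
M  = star_adj(n)
μ1 = maximum(eigvals(Symmetric(M)))
dmax = maximum(M * ones(n))

(μ1, floor(Int, μ1 + 1e-9) + 1, Int(dmax) + 1)
# μ1 = sqrt(n - 1) = 3, so Wilf's bound ⌊μ1⌋ + 1 = 4, far below d_max + 1 = 10.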
¹ To be precise, we often identify these k colors with the integers 1 through k. A k-coloring is then a function c : V → {1, . . . , k} such that c(a) ≠ c(b) for all (a, b) ∈ E.
² If this makes you uncomfortable, you could instead ground the induction with both graphs on two vertices.
The eigenvector corresponding to the largest eigenvalue of the adjacency matrix of a graph is
usually not a constant vector. However, it is always a positive vector if the graph is connected.
This follows from the Perron-Frobenius theory (discovered independently by Perron [Per07] and
Frobenius [Fro12]). In fact, the Perron-Frobenius theory says much more, and it can be applied to
adjacency matrices of strongly connected directed graphs. Note that these need not even be
diagonalizable!
In the symmetric case, the theory is made much easier by both the spectral theory and the
characterization of eigenvalues as extreme values of Rayleigh quotients. For a treatment of the
general Perron-Frobenius theory, we recommend Seneta [Sen06] or Bapat and Raghavan [BR97].
Theorem 4.5.1. [Perron-Frobenius, Symmetric Case] Let G be a connected weighted graph, let M be its adjacency matrix, and let µ_1 ≥ µ_2 ≥ · · · ≥ µ_n be its eigenvalues. Then

a. the eigenvalue µ_1 has a strictly positive eigenvector,

b. µ_1 ≥ −µ_n, and

c. µ_1 > µ_2.
Before proving Theorem 4.5.1, we will prove a lemma that will be used in the proof. It says that
non-negative eigenvectors of non-negative adjacency matrices of connected graphs must be strictly
positive.
Lemma 4.5.2. Let G be a connected weighted graph (with non-negative edge weights), let M be
its adjacency matrix, and assume that some non-negative vector φ is an eigenvector of M . Then,
φ is strictly positive.
Proof. If φ is not strictly positive, there is some vertex a for which φ(a) = 0. As G is connected,
there must be some edge (b, c) for which φ(b) = 0 but φ(c) > 0. Let µ be the eigenvalue of φ. As
φ(b) = 0, we obtain a contradiction from
µ φ(b) = (M φ)(b) = \sum_{(b,z) ∈ E} w_{b,z} φ(z) ≥ w_{b,c} φ(c) > 0,
where the inequalities follow from the fact that the terms wb,z and φ(z) are non-negative.
So, we conclude that φ must be strictly positive.
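Here is a small numerical illustration of the lemma and of part a of the theorem: for a connected graph, the eigenvector of the largest adjacency eigenvalue has entries of a single sign (the ring on 5 vertices below is built by hand).

using LinearAlgebra

# Adjacency matrix of the ring on 5 vertices (connected), built by hand.
M = [0 1 0 0 1; 1 0 1 0 0; 0 1 0 1 0; 0 0 1 0 1; 1 0 0 1 0.0]

φ1 = eigen(Symmetric(M)).vectors[:, end]   # eigenvector of the largest eigenvalue

all(φ1 .> 0) || all(φ1 .< 0)               # true: every entry has the same sign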
Proof of Theorem 4.5.1. Let φ1 be an eigenvector of µ1 of norm 1, and construct the vector x
such that
x (u) = |φ1 (u)| , for all u.
We will show that x is an eigenvector of eigenvalue µ1 .
As x is a unit vector and

x^T M x = \sum_{a,b} M(a, b) |φ_1(a)| |φ_1(b)| ≥ \sum_{a,b} M(a, b) φ_1(a) φ_1(b) = µ_1,

the Rayleigh quotient of x is at least µ_1. As µ_1 is the maximum possible Rayleigh quotient for a unit vector, the Rayleigh quotient of x must be µ_1 and Theorem 2.2.1 implies that x must be an eigenvector of µ_1. As x is non-negative, Lemma 4.5.2 implies that it is strictly positive.
To prove part b, let φ_n be the eigenvector of µ_n and let y be the vector for which y(u) = |φ_n(u)|. In the spirit of the previous argument, we can again show that

|µ_n| = |φ_n^T M φ_n| ≤ \sum_{a,b} M(a, b) y(a) y(b) ≤ µ_1 y^T y = µ_1.   (4.3)
A similar argument applies to part c: if φ_2 is a unit eigenvector of µ_2 and y(u) = |φ_2(u)|, then

µ_2 = φ_2^T M φ_2 ≤ y^T M y ≤ µ_1.
Finally, we show that for a connected graph G, µn = −µ1 if and only if G is bipartite. In fact, if
µn = −µ1 , then µn−i = −µi+1 for every i.
Proof. Consider the conditions necessary to achieve equality in (4.3). First, y must be an
eigenvector of eigenvalue µ1 . Thus, y must be strictly positive, φn can not have any zero values,
and there must be an edge (a, b) for which φn (a) < 0 < φn (b). It must also be the case that all of
the terms in X
M (a, b)φn (a)φn (b)
(a,b)∈E
have the same sign, and we have established that this sign must be negative. Thus, for every edge
(a, b), φn (a) and φn (b) must have different signs. That is, the signs provide the bipartition of the
vertices.
Proposition 4.5.4. If G is bipartite then the eigenvalues of its adjacency matrix are symmetric
about zero.
Proof. As G is bipartite, we may divide its vertices into sets S and T so that all edges go between S and T. Let φ be an eigenvector of M with eigenvalue µ. Define the vector x by

x(a) = \begin{cases} φ(a) & \text{if } a ∈ S, \text{ and} \\ −φ(a) & \text{if } a ∈ T. \end{cases}

One can check that M x = −µ x, so −µ is also an eigenvalue of M.
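The symmetry of the spectrum is easy to observe numerically; the sketch below uses the path on 6 vertices, which is bipartite.

using LinearAlgebra

# Path on 6 vertices: bipartite, with parts {1, 3, 5} and {2, 4, 6}.
n = 6
M = [abs(i - j) == 1 ? 1.0 : 0.0 for i in 1:n, j in 1:n]

μ = eigvals(Symmetric(M))
sort(μ) ≈ sort(-μ)                         # true: the spectrum is symmetric about zero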
Chapter 5
Comparing Graphs

5.1 Overview

It is rare that one can analytically determine the eigenvalues of an abstractly defined graph. Usually one is only able to prove loose bounds on some eigenvalues.

In this lecture we will see a powerful technique that allows one to compare one graph with another, and prove things like lower bounds on the smallest eigenvalues of Laplacians. It often goes by the name “Poincaré Inequalities” (see [DS91, SJ89, GLM99]), or “Graphic inequalities”.
I begin by recalling an extremely useful piece of notation that is used in the Optimization
community. For a symmetric matrix A, we write
A ≽ 0
if A is positive semidefinite. That is, if all of the eigenvalues of A are nonnegative, which is equivalent to
v^T A v ≥ 0, for all v.
We similarly write
A ≽ B
if
A − B ≽ 0,
which is equivalent to
v^T A v ≥ v^T B v
for all v.
The relation ≽ is called the Loewner partial order. It applies to some pairs of symmetric matrices, while others are incomparable. But, for all pairs to which it does apply, it acts like an order. For example, we have
A ≽ B and B ≽ C implies A ≽ C,
and
A ≽ B implies A + C ≽ B + C,
for symmetric matrices A, B and C.
We will overload this notation by defining it for graphs as well. Thus, we write
G ≽ H
if L_G ≽ L_H. When we write this, we are always describing an inequality on Laplacian matrices. For example, if G = (V, E) is a graph and H = (V, F) is a subgraph of G, then
L_G ≽ L_H.
To see this, recall the Laplacian quadratic form:
x^T L_G x = Σ_{(u,v)∈E} w_{u,v} (x(u) − x(v))^2.
It is clear that dropping edges can only decrease the value of the quadratic form. The same holds
for decreasing the weights of edges.
This notation is particularly useful when we consider some multiple of a graph, such as when we write
G ≽ c · H,
for some c > 0. What is c · H? It is the same graph as H, but the weight of every edge is multiplied by c.
We usually use this notation for the inequalities it implies on the eigenvalues of L_G and L_H.
Lemma 5.2.1. If G and H are graphs such that
G ≽ c · H,
then
λ_k(G) ≥ c λ_k(H),
for all k.
Corollary 5.2.2. Let G be a graph and let H be obtained by either adding an edge to G or increasing the weight of an edge in G. Then, for all i,
λ_i(G) ≤ λ_i(H).
We consider one graph to be a good approximation of another if their Laplacian quadratic forms are similar. For example, we will say that H is a c-approximation of G if
c · H ≽ G ≽ (1/c) · H.
Surprising approximations exist. For example, random regular and random Erdös-Rényi graphs are good approximations of complete graphs. We will encounter infinite families of expander graphs that for every ε > 0 provide a d > 0 such that for all n > 0 there is a d-regular graph Gn that is a (1 + ε)-approximation of Kn. As d is fixed, such a graph has many fewer edges than a
complete graph!
In Chapters ?? and ?? we will also prove that every graph can be well-approximated by a sparse
graph.
By now you should be wondering, “how do we prove that G ≽ c · H for some graphs G and H?” Not too many ways are known. We’ll do it by proving some inequalities of this form for some of the simplest graphs, and then extending them to more general graphs. For example, we will prove
(n − 1) · Pn ≽ G1,n,
where Pn is the path from vertex 1 to vertex n, and G1,n is the graph with just the edge (1, n).
All of these edges are unweighted.
The following very simple proof of this inequality was discovered by Sam Daitch.
Lemma 5.4.1.
(n − 1) · Pn ≽ G1,n.
Proof. We need to show that for every x ∈ IR^n, (n − 1) x^T L_{Pn} x ≥ x^T L_{G1,n} x. For 1 ≤ a ≤ n − 1, set
∆(a) = x(a + 1) − x(a).
The inequality we need to prove then becomes
(n − 1) Σ_{a=1}^{n−1} ∆(a)^2 ≥ (Σ_{a=1}^{n−1} ∆(a))^2.
But, this is just the Cauchy-Schwarz inequality. I’ll remind you that Cauchy-Schwarz follows from the fact that the inner product of two vectors is at most the product of their norms. In this case, those vectors are ∆ and the all-ones vector of length n − 1:
(Σ_{a=1}^{n−1} ∆(a))^2 = (1_{n−1}^T ∆)^2 ≤ (‖1_{n−1}‖ ‖∆‖)^2 = ‖1_{n−1}‖^2 ‖∆‖^2 = (n − 1) Σ_{i=1}^{n−1} ∆(i)^2.
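Here is a small Julia check of the lemma (a sketch using only LinearAlgebra; the Laplacians are built inline rather than taken from any package): the matrix (n − 1)·L_{Pn} − L_{G1,n} should have no negative eigenvalues beyond numerical error.

```julia
using LinearAlgebra

# Laplacian of the unweighted path on n vertices.
function path_laplacian(n)
    L = zeros(n, n)
    for a in 1:n-1
        L[a, a] += 1;  L[a+1, a+1] += 1
        L[a, a+1] -= 1; L[a+1, a] -= 1
    end
    return L
end

n  = 10
LP = path_laplacian(n)

LG = zeros(n, n)                    # Laplacian of the graph with the single edge (1, n)
LG[1, 1] = LG[n, n] = 1
LG[1, n] = LG[n, 1] = -1

# (n-1)·Pn ⪰ G_{1,n}: the smallest eigenvalue of the difference is (numerically) nonnegative.
println(minimum(eigvals(Symmetric((n - 1) * LP - LG))) > -1e-10)
```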
In Lemma 6.6.1 we will prove that λ2 (Pn ) ≈ π 2 /n2 . For now, we demonstrate the power of
Lemma 5.4.1 by using it to prove a lower bound on λ2 (Pn ) that is very close to this.
To prove a lower bound on λ2 (Pn), we will prove that some multiple of the path is at least the complete graph. To this end, write
L_{Kn} = Σ_{a<b} L_{Ga,b}.
For every pair a < b, we will use the inequality
G_{a,b} ≼ (b − a) Σ_{c=a}^{b−1} G_{c,c+1} ≼ (b − a) Pn. (5.2)
This inequality says that G_{a,b} is at most (b − a) times the part of the path connecting a to b, and that this part of the path is less than the whole.
Summing inequality (5.2) over all edges (a, b) ∈ Kn gives
Kn = Σ_{a<b} G_{a,b} ≼ Σ_{a<b} (b − a) Pn.
Now,
Σ_{1≤a<b≤n} (b − a) = Σ_{c=1}^{n−1} c(n − c) = n(n + 1)(n − 1)/6.
So,
L_{Kn} ≼ (n(n + 1)(n − 1)/6) L_{Pn}.
Applying Lemma 5.2.1, we obtain
6/((n + 1)(n − 1)) ≤ λ2 (Pn).
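A quick Julia check of this bound (a sketch reusing the hypothetical path_laplacian helper from the previous snippet) compares it with the true λ2(Pn) and with the value π²/n² mentioned below:

```julia
using LinearAlgebra

n  = 50
LP = path_laplacian(n)                    # hypothetical helper from the previous sketch
λ2 = eigvals(Symmetric(LP))[2]            # second-smallest Laplacian eigenvalue
println("bound 6/((n+1)(n-1)) = ", 6 / ((n + 1) * (n - 1)),
        "   actual λ2 = ", λ2, "   π²/n² = ", pi^2 / n^2)
```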
Figure 5.1: T1, T2 and T3. Node 1 is at the top, 2 and 3 are its children. Some other nodes have been labeled as well.
Here Td denotes the complete binary tree of depth d.
Let’s first upper bound λ2 (Td) by constructing a test vector x. Set x(1) = 0, x(2) = 1, and x(3) = −1. Then, for every vertex a that we can reach from node 2 without going through node 1, we set x(a) = 1. For all the other nodes, we set x(a) = −1.
(The figure shows these values on T3: 0 at the root, 1 on the subtree under node 2, and −1 everywhere else.)
As the two subtrees have the same size, x ⊥ 1. Only the two edges (1, 2) and (1, 3) contribute to the quadratic form, so x^T L_{Td} x = 2, while x^T x = n − 1. This gives λ2 (Td) ≤ 2/(n − 1).
We will again prove a lower bound by comparing Td to the complete graph. For each pair of vertices a < b, let Td^{a,b} denote the unique path in Td from a to b. This path will have length at most 2d ≤ 2 log2 n. So, we have
Kn = Σ_{a<b} G_{a,b} ≼ Σ_{a<b} (2d) Td^{a,b} ≼ Σ_{a<b} (2 log2 n) Td = n(n − 1)(log2 n) Td,
which implies
λ2 (Td) ≥ 1/((n − 1) log2 n).
Using the generalization of Lemma 5.4.1 presented in the next section, one can improve this lower
bound to 1/cn for some constant c.
We now generalize the inequality in Lemma 5.4.1 to weighted path graphs. Allowing for weights on the edges of the path greatly extends its applicability. The generalization states that if w1, . . . , w_{n−1} are the weights of the path edges, then
G1,n ≼ (Σ_a 1/w_a) Σ_{a=1}^{n−1} w_a G_{a,a+1}.
Proof. Let x ∈ IR^n and set ∆(a) as in the proof of Lemma 5.4.1. Now, set
γ(a) = ∆(a) √(w_a).
Then,
Σ_a ∆(a) = γ^T w^{−1/2},
‖w^{−1/2}‖^2 = Σ_a 1/w_a,
and
‖γ‖^2 = Σ_a ∆(a)^2 w_a.
So,
x^T L_{G1,n} x = (Σ_a ∆(a))^2 = (γ^T w^{−1/2})^2
≤ ‖γ‖^2 ‖w^{−1/2}‖^2 = (Σ_a ∆(a)^2 w_a)(Σ_a 1/w_a) = (Σ_a 1/w_a) x^T (Σ_{a=1}^{n−1} w_a L_{G_{a,a+1}}) x.
5.7 Exercises
‖v‖^2 ≤ ‖v + t1‖^2,
Chapter 6
Fundamental Graphs
We will bound and derive the eigenvalues of the Laplacian matrices of some fundamental graphs,
including complete graphs, star graphs, ring graphs, path graphs, and products of these that yield
grids and hypercubes. As all these graphs are connected, they all have eigenvalue zero with
multiplicity one. We will have to do some work to compute the other eigenvalues.
We will see in Part IV that the Laplacian eigenvalues that reveal the most about a graph are the
smallest and largest ones. To interpret the smallest eigenvalues, we will exploit a relation between
λ2 and the isoperimetric ratio of a graph that is derived in Chapter 20, and which we state here
for convenience:
For every S ⊂ V ,
θ(S) ≥ λ2 (1 − s),
where s = |S| / |V | and
θ(S) := |∂(S)| / |S|
is the isoperimetric ratio of S.
Lemma 6.1.1. The Laplacian of Kn has eigenvalue 0 with multiplicity 1 and n with multiplicity
n − 1.
Proof. To compute the non-zero eigenvalues, let ψ be any non-zero vector orthogonal to the all-1s vector, so
Σ_a ψ(a) = 0. (6.1)
We now compute the first coordinate of L_{Kn} ψ. Using (3.3), the expression for the action of the Laplacian as an operator, we find
(L_{Kn} ψ)(1) = Σ_{b≥2} (ψ(1) − ψ(b)) = (n − 1)ψ(1) − Σ_{b=2}^{n} ψ(b) = nψ(1), by (6.1).
As the choice of coordinate was arbitrary, we have Lψ = nψ. So, every vector orthogonal to the
all-1s vector is an eigenvector of eigenvalue n.
We often think of the Laplacian of the complete graph as being a scaling of the identity. For
every x orthogonal to the all-1s vector, Lx = nx .
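A two-line Julia check of this lemma (a sketch; the Laplacian of Kn is written directly as nI − J):

```julia
using LinearAlgebra

n  = 7
LK = n * Matrix(1.0I, n, n) - ones(n, n)             # Laplacian of K_n is nI - J
println(round.(eigvals(Symmetric(LK)), digits=8))    # one eigenvalue 0 and n-1 copies of n
```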
Now, let’s see how our bound on the isoperimetric ratio works out. Let S ⊂ [n]. Every vertex in S has n − |S| edges connecting it to vertices not in S. So,
θ(S) = |S|(n − |S|) / |S| = n − |S| = λ2 (L_{Kn})(1 − s),
where s = |S|/n. Thus, Theorem 20.1.1 is sharp for the complete graph.
Lemma 6.2.1. Let G = (V, E) be a graph, and let a and b be vertices of degree one that are both
connected to another vertex c. Then, the vector ψ = δ a − δ b is an eigenvector of LG of eigenvalue
1.
Proof. Just multiply LG by ψ, and check (using (3.3)) vertex-by-vertex that it equals ψ.
As eigenvectors of different eigenvalues are orthogonal, this implies that every eigenvector φ with eigenvalue different from 1 satisfies φ(a) = φ(b).
Lemma 6.2.2. The Laplacian of the star graph Sn has eigenvalue 0 with multiplicity 1, eigenvalue 1 with multiplicity
n − 2, and eigenvalue n with multiplicity 1.
Proof. Applying Lemma 6.2.1 to vertices i and i + 1 for 2 ≤ i < n, we find n − 2 linearly
independent eigenvectors of the form δ i − δ i+1 , all with eigenvalue 1. As 0 is also an eigenvalue,
only one eigenvalue remains to be determined.
Recall that the trace of a matrix equals both the sum of its diagonal entries and the sum of its
eigenvalues. We know that the trace of LSn is 2n − 2, and we have identified n − 1 eigenvalues
that sum to n − 2. So, the remaining eigenvalue must be n.
To determine the corresponding eigenvector, recall that it must be orthogonal to the other
eigenvectors we have identified. This tells us that it must have the same value at each of the
points of the star. Let this value be 1, and let x be the value at vertex 1. As the eigenvector is
orthogonal to the constant vectors, it must be that
(n − 1) + x = 0,
so x = −(n − 1).
We now define a product on graphs. If we apply this product to two paths, we obtain a grid. If
we apply it repeatedly to one edge, we obtain a hypercube.
Definition 6.3.1. Let G = (V, E, v) and H = (W, F, w) be weighted graphs. Then G × H is the graph with vertex set V × W and edge set
((a, b), (â, b)) with weight v_{a,â}, where (a, â) ∈ E, and
((a, b), (a, b̂)) with weight w_{b,b̂}, where (b, b̂) ∈ F.
Figure 6.1: An m-by-n grid graph is the product of a path on m vertices with a path on n vertices.
This is a drawing of a 5-by-4 grid made using Hall’s algorithm.
Theorem 6.3.2. Let G = (V, E, v) and H = (W, F, w) be weighted graphs with Laplacian
eigenvalues λ1 , . . . , λn and µ1 , . . . , µm , and eigenvectors α1 , . . . , αn and β 1 , . . . , β m , respectively.
Then, for each 1 ≤ i ≤ n and 1 ≤ j ≤ m, G × H has an eigenvector γ i,j of eigenvalue λi + µj
such that
γ i,j (a, b) = αi (a)β j (b).
Proof sketch: applying the Laplacian of G × H to γ_{i,j} at a vertex (a, b), the edges coming from E contribute λi αi(a)βj(b) and the edges coming from F contribute µj αi(a)βj(b), so (L_{G×H} γ_{i,j})(a, b) = (λi + µj) αi(a)βj(b).
An alternative approach to defining the graph product and proving Theorem 6.3.2 is via
Kronecker products. G × H is the graph with Laplacian matrix
(LG ⊗ I W ) + (I V ⊗ LH ).
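Here is a Julia sketch of this Kronecker formulation (again assuming the hypothetical path_laplacian helper from an earlier snippet): the eigenvalues of the grid Laplacian are exactly the pairwise sums λi + µj.

```julia
using LinearAlgebra

LG = path_laplacian(5)                           # hypothetical helper from an earlier sketch
LH = path_laplacian(4)
Lgrid = kron(LG, Matrix(1.0I, 4, 4)) + kron(Matrix(1.0I, 5, 5), LH)

sums = sort(vec([a + b for a in eigvals(Symmetric(LG)), b in eigvals(Symmetric(LH))]))
println(isapprox(sums, eigvals(Symmetric(Lgrid)); atol=1e-8))    # true
```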
The d-dimensional hypercube graph, Hd , is the graph with vertex set {0, 1}d , with edges between
vertices whose names differ in exactly one bit. The hypercube may also be expressed as the
product of the one-edge graph with itself d − 1 times.
Let H1 be the graph with vertex set {0, 1} and one edge between those vertices. Its Laplacian
matrix has eigenvalues 0 and 2. As Hd = Hd−1 × H1 , we may use this to calculate the eigenvalues
and eigenvectors of Hd for every d.
The eigenvectors of H1 are
(1, 1)^T and (1, −1)^T,
with eigenvalues 0 and 2, respectively. Thus, if ψ is an eigenvector of H_{d−1} with eigenvalue λ, then the stacked vectors
(ψ, ψ) and (ψ, −ψ)
are eigenvectors of Hd with eigenvalues λ and λ + 2, respectively. This means that Hd has eigenvalue 2i for each 0 ≤ i ≤ d, with multiplicity (d choose i). Moreover, each eigenvector of Hd can be identified with a vector y ∈ {0, 1}^d:
ψ_y(x) = (−1)^{y^T x},
where x ∈ {0, 1}d ranges over the vertices of Hd . Each y ∈ {0, 1}d−1 indexing an eigenvector of
Hd−1 leads to the eigenvectors of Hd indexed by (y , 0) and (y , 1).
Using Theorem 20.1.1 and the fact that λ2 (Hd ) = 2, we can immediately prove the following
isoperimetric theorem for the hypercube.
Corollary 6.3.3.
θHd ≥ 1.
In particular, for every set of at most half the vertices of the hypercube, the number of edges on
the boundary of that set is at least the number of vertices in that set.
This result is tight, as you can see by considering one face of the hypercube, such as all the
vertices whose labels begin with 0. It is possible to prove this by more concrete combinatorial
means. In fact, very precise analyses of the isoperimetry of sets of vertices in the hypercube can
be obtained. See [Har76] or [Bol86].
If we can guess an approximation of ψ 2 , we can often plug it in to the Laplacian quadratic form
to obtain a good upper bound on λ2 . The Courant-Fischer Theorem tells us that every vector v
orthogonal to 1 provides an upper bound on λ2:
λ2 ≤ (v^T L v)/(v^T v).
When we use a vector v in this way, we call it a test vector.
Let’s see what a test vector can tell us about λ2 of a path graph on n vertices. I would like to use the vector that assigns a to vertex a as a test vector, but it is not orthogonal to 1. So, we will use the next best thing. Let x be the vector such that x(a) = (n + 1) − 2a, for 1 ≤ a ≤ n. This vector
satisfies x ⊥ 1, so
λ2 (Pn) ≤ Σ_{1≤a<n} (x(a) − x(a + 1))^2 / Σ_a x(a)^2
= Σ_{1≤a<n} 2^2 / Σ_a (n + 1 − 2a)^2
= 4(n − 1) / ((n + 1)n(n − 1)/3)    (clearly, the denominator is n^3/c for some constant c)
= 12/(n(n + 1)). (6.2)
We will soon see that this bound is of the right order of magnitude. Thus, Theorem 20.1.1 does
not provide a good bound on the isoperimetric ratio of the path graph. The isoperimetric ratio is
minimized by the set S = {1, . . . , n/2}, which has θ(S) = 2/n. However, the upper bound
provided by Theorem 20.1.1 is of the form c/n. Cheeger’s inequality, which appears in Chapter
21, will tell us that the error of this approximation can not be worse than quadratic.
The Courant-Fischer theorem is not as helpful when we want to prove lower bounds on λ2. To prove lower bounds, we need the form with a maximum on the outside, which gives
λ2 ≥ max_{S: dim(S)=n−1} min_{v∈S} (v^T L v)/(v^T v).
So, to prove a lower bound on λ2 we would need to lower bound
min_{v∈S} (v^T L v)/(v^T v)
over a space S of large dimension. We will see a technique that lets us prove such lower bounds next lecture.
But, first we compute the eigenvalues and eigenvectors of the path graph exactly.
The ring graph on n vertices, Rn, may be viewed as having a vertex set corresponding to the integers modulo n. In this case, we view the vertices as the numbers 0 through n − 1, with edges (a, a + 1), computed modulo n. Its Laplacian has eigenvectors
x_k(a) = cos(2πka/n) and y_k(a) = sin(2πka/n),
for 0 ≤ k ≤ n/2, ignoring y_0 which is the all-zero vector, and for even n ignoring y_{n/2} for the same reason. Eigenvectors x_k and y_k have eigenvalue 2 − 2 cos(2πk/n).
Note that x_0 is the all-ones vector. When n is even, we only have x_{n/2}, which alternates ±1.
Proof. We will first see that x 1 and y 1 are eigenvectors by drawing the ring graph on the unit
circle in the natural way: plot vertex a at point (cos(2πa/n), sin(2πa/n)).
You can see that the average of the neighbors of a vertex is a vector pointing in the same
direction as the vector associated with that vertex. This should make it obvious that both the x
and y coordinates in this figure are eigenvectors of the same eigenvalue. The same holds for all k.
Alternatively, we can verify that these are eigenvectors by a simple computation.
Figure 6.2: The ring graph drawn on the unit circle.
We will derive the eigenvalues and eigenvectors of the path graph from those of the ring graph.
To begin, I will number the vertices of the ring a little differently, as in Figure 6.3.
Lemma 6.6.1. Let Pn = (V, E) where V = {1, . . . , n} and E = {(a, a + 1) : 1 ≤ a < n}. The Laplacian of Pn has the same eigenvalues as R2n, excluding 2. That is, Pn has eigenvalues 2(1 − cos(πk/n)), for 0 ≤ k < n, with eigenvectors obtained from those of R2n as described in the proof below.
Proof. We derive the eigenvectors and eigenvalues by treating Pn as a quotient of R2n : we will
identify vertex a of Pn with vertices a and a + n of R2n (under the new numbering of R2n ). These
are pairs of vertices that are above each other in the figure that I drew.
Figure 6.3: The ring graph R8 with its vertices renumbered so that vertex a and vertex a + n sit directly above or below each other.
Under this identification, one can check that
(I_n  I_n) L_{R2n} (I_n ; I_n) = 2 L_{Pn},
where (I_n ; I_n) denotes the 2n-by-n matrix formed by stacking two n-by-n identity matrices, and (I_n  I_n) is its transpose.
If there is an eigenvector ψ of R2n with eigenvalue λ for which ψ(a) = ψ(a + n) for 1 ≤ a ≤ n,
then the above equation gives us a way to turn this into an eigenvector of Pn : Let φ ∈ IRn be the
vector for which
φ(a) = ψ(a), for 1 ≤ a ≤ n.
Then,
(I_n ; I_n) φ = ψ,   L_{R2n} (I_n ; I_n) φ = λψ,   and   (I_n  I_n) L_{R2n} (I_n ; I_n) φ = 2λφ.
So, if we can find such a vector ψ, then the corresponding φ is an eigenvector of Pn of eigenvalue
λ.
As you’ve probably guessed, we can find such vectors ψ. I’ve drawn one in Figure 6.3. For each of the two-dimensional eigenspaces of R2n, we get one such vector. These provide eigenvectors of
eigenvalue
2(1 − cos(πk/n)),
for 1 ≤ k < n. Thus, we now know n − 1 distinct eigenvalues. The last, of course, is zero.
The type of quotient used in the above argument is known as an equitable partition. You can find
an extensive exposition of these in Godsil’s book [God93].
Chapter 7
Cayley Graphs
Ring graphs and hypercubes are types of Cayley graph. In general, the vertices of a Cayley graph
are the elements of some group Γ. In the case of the ring, the group is the set of integers modulo
n. The edges of a Cayley graph are specified by a set S ⊂ Γ, which are called the generators of
the Cayley graph. The set of generators must be closed under inverse. That is, if s ∈ S, then
s−1 ∈ S. Vertices u, v ∈ Γ are connected by an edge if there is an s ∈ S such that
u ◦ s = v,
where ◦ is the group operation. In the case of Abelian groups, like the integers modulo n, this
would usually be written u + s = v. The generators of the ring graph are {1, −1}.
The d-dimensional hypercube, Hd , is a Cayley graph over the additive group (Z/2Z)d : that is the
set of vectors in {0, 1}d under addition modulo 2. The generators are given by the vectors in
{0, 1}d that have a 1 in exactly one position. This set is closed under inverse, because every
element of this group is its own inverse.
We require S to be closed under inverse so that the graph is undirected:
u+s=v ⇐⇒ v + (−s) = u.
Cayley graphs over Abelian groups are particularly convenient because we can find an
orthonormal basis of eigenvectors without knowing the set of generators. They just depend on the
group1 . Knowing the eigenvectors makes it much easier to compute the eigenvalues. We give the
computations of the eigenvectors in sections 7.4 and 7.8.
We will now examine two exciting types of Cayley graphs: Paley graphs and generalized
hypercubes.
1
More precisely, the characters always form an orthonormal set of eigenvectors, and the characters just depend
upon the group. When two different characters have the same eigenvalue, we obtain an eigenspace of dimension
greater than 1. These eigenspaces do depend upon the choice of generators.
The Paley graphs are Cayley graphs over the group of integers modulo a prime, p, where p is equivalent to 1 modulo 4. Such a group is often written Z/p.
I should begin by reminding you a little about the integers modulo p. The first thing to remember
is that the integers modulo p are actually a field, written Fp . That is, they are closed under both
addition and multiplication (completely obvious), have identity elements under addition and
multiplication (0 and 1), and have inverses under addition and multiplication. It is obvious that
the integers have inverses under addition: −x modulo p plus x modulo p equals 0. It is a little less
obvious that the integers modulo p have inverses under multiplication (except that 0 does not
have a multiplicative inverse). That is, for every x ≠ 0, there is a y such that xy = 1 modulo p.
When we write 1/x, we mean this element y.
The generators of the Paley graphs are the squares modulo p (usually called the quadratic
residues). That is, the set of numbers s such that there exists an x for which x^2 ≡ s mod p. Thus, the
vertex set is {0, . . . , p − 1}, and there is an edge between vertices u and v if u − v is a square
modulo p. I should now prove that −s is a quadratic residue if and only if s is. This will hold
provided that p is equivalent to 1 modulo 4. To prove that, I need to tell you one more thing
about the integers modulo p: their multiplicative group is cyclic.
Fact 7.2.1. For every prime p, there exists a number g such that for every number x between 1
and p − 1, there is a unique i between 1 and p − 1 such that
x ≡ g^i mod p.
In particular, g^{p−1} ≡ 1.
We use this to show that when p is equivalent to 1 modulo 4, −1 is a square modulo p.
Proof. We know that 4 divides p − 1. Let s = g^{(p−1)/4}. I claim that s^2 = −1. This will follow from s^4 = 1.
To see this, consider the equation
x^2 − 1 ≡ 0 mod p.
As the numbers modulo p are a field, it can have at most 2 solutions. Moreover, we already know two solutions, x = 1 and x = −1. As s^4 = 1, we know that s^2 must be one of 1 or −1. However, it cannot be the case that s^2 = 1, because then the powers of g would begin repeating after the (p − 1)/2 power, and thus could not represent every number modulo p.
We now understand a lot about the squares modulo p (formally called quadratic residues). The squares are exactly the elements g^i where i is even. As g^i g^j = g^{i+j}, the fact that −1 is a square implies that s is a square if and only if −s is a square. So, S is closed under negation, and the
Cayley graph of Z/p with generator set S is in fact a graph. As |S| = (p − 1)/2, it is regular of
degree
d = (p − 1)/2.
It will prove simpler to compute the eigenvalues of the adjacency matrix of the Paley Graphs.
Since these graphs are regular, this will immediately tell us the eigenvalues of the Laplacian. Let
L be the Laplacian matrix of the Paley graph on p vertices. A remarkable feature of Paley graphs
is that L2 can be written as a linear combination of L, J and I , where J is the all-1’s matrix. We
will prove that
L^2 = pL + ((p − 1)/4) J − (p(p − 1)/4) I. (7.1)
The proof will be easiest if we express L in terms of a matrix X defined by the quadratic character:
χ(x) = 1 if x is a quadratic residue modulo p, 0 if x = 0, and −1 otherwise.
This is called a character because it satisfies χ(xy) = χ(x)χ(y). We will use this to define a
matrix X by
X (u, v) = χ(u − v).
An elementary calculation, which I skip, reveals that
X = pI − 2L − J . (7.2)
Lemma 7.3.1.
X 2 = pI − J .
Proof. The diagonal entries of X 2 are the squares of the norms of the columns of X . As each
contains (p − 1)/2 entries that are 1, (p − 1)/2 entries that are −1, and one entry that is 0, its
squared norm is p − 1.
To handle the off-diagonal entries, we observe that X is symmetric, so the off-diagonal entries are
the inner products of columns of X . That is,
X^2(u, v) = Σ_x χ(u − x)χ(v − x) = Σ_y χ(y)χ((v − u) + y),
where we have set y = u − x. For convenience, set w = v − u, so we can write this more simply. As we are considering a non-diagonal entry, w ≠ 0. The term in the sum for y = 0 is zero. When y ≠ 0, χ(y) ∈ {±1}, so
χ(y)χ(w + y) = χ(w + y)/χ(y) = χ((w + y)/y) = χ(w/y + 1).
Now, as y varies over {1, . . . , p − 1}, w/y varies over all of {1, . . . , p − 1}. So, w/y + 1 varies over all elements other than 1. This means that
Σ_y χ(y)χ((v − u) + y) = (Σ_{z=0}^{p−1} χ(z)) − χ(1) = 0 − 1 = −1.
This gives us a quadratic equation that every eigenvalue of L other than 0 must obey. Let φ be an eigenvector of L of eigenvalue λ ≠ 0. As φ is orthogonal to the all-1s vector, Jφ = 0. So,
λ^2 φ = L^2 φ = pLφ − (p(p − 1)/4) φ = (pλ − p(p − 1)/4)φ.
So, we find
λ^2 − pλ + p(p − 1)/4 = 0.
This gives
λ = (p ± √p)/2.
Note that Paley graphs have only two nonzero Laplacian eigenvalues. This places them within the special family of Strongly Regular Graphs, which we will study later in the semester.
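Here is a Julia sketch that builds a small Paley graph from the definition (no graph package assumed) and checks that its nonzero Laplacian eigenvalues are (p ± √p)/2:

```julia
using LinearAlgebra

p = 13                                          # a prime equivalent to 1 modulo 4
squares = Set(x^2 % p for x in 1:p-1)           # quadratic residues modulo p

A = [mod(u - v, p) in squares ? 1.0 : 0.0 for u in 0:p-1, v in 0:p-1]
L = Diagonal(vec(sum(A, dims=2))) - A

vals = eigvals(Symmetric(Matrix(L)))
println(sort(unique(round.(vals, digits=8))))   # ≈ 0, (p-√p)/2, (p+√p)/2
println(((p - sqrt(p)) / 2, (p + sqrt(p)) / 2))
```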
To generalize the hypercube, we will consider Cayley graphs over the same group, but with more
generators. Recall that we view the vertex set as the vectors in {0, 1}d , modulo 2. Each
generator, g 1 , . . . , g k , is in the same group.
Let G be the Cayley graph with these generators. To be concrete, set V = {0, 1}^d, and note that G has edge set
{(x, x + g_j) : x ∈ V, 1 ≤ j ≤ k}.
Using the analysis of products of graphs, we derived a set of eigenvectors of Hd . We will now
verify that these are eigenvectors for all generalized hypercubes. Knowing these will make it easy
to describe the eigenvalues.
For each b ∈ {0, 1}^d, define the function ψ_b from V to the reals given by
ψ_b(x) = (−1)^{b^T x}.
When we write b T x , you might wonder if we mean to take the sum over the reals or modulo 2.
As both b and x are {0, 1}-vectors, you get the same answer either way you do it.
While it is natural to think of b as being a vertex, that is the wrong perspective. Instead, you
should think of b as indexing a Fourier coefficient (if you don’t know what a Fourier coefficient is,
just don’t think of it as a vertex).
The eigenvectors and eigenvalues of the graph are determined by the following theorem. As this
graph is k-regular, the eigenvectors of the adjacency and Laplacian matrices will be the same.
Lemma 7.4.1. For each b ∈ {0, 1}^d the vector ψ_b is an eigenvector of the Laplacian matrix with eigenvalue
k − Σ_{i=1}^{k} (−1)^{b^T g_i}.
Proof. Let L be the Laplacian matrix of the graph. For any vector ψ_b for b ∈ {0, 1}^d and any vertex x ∈ V, we compute
(Lψ_b)(x) = kψ_b(x) − Σ_{i=1}^{k} ψ_b(x + g_i)
= kψ_b(x) − Σ_{i=1}^{k} ψ_b(x)ψ_b(g_i)
= ψ_b(x) (k − Σ_{i=1}^{k} ψ_b(g_i)).
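A Julia sketch of this lemma (the dimension, the number of generators, and the generators themselves are my own random choices): the Laplacian spectrum of the resulting Cayley graph matches the predicted values k − Σ_i (−1)^{bᵀg_i} over all b ∈ {0,1}^d.

```julia
using LinearAlgebra

d, k = 4, 6
n = 2^d
verts = [digits(x, base=2, pad=d) for x in 0:n-1]      # the vertices of {0,1}^d
gens  = [rand(0:1, d) for _ in 1:k]                    # k random generators

A = zeros(n, n)                                        # adjacency matrix (multi-edges add)
for (i, x) in enumerate(verts), g in gens
    j = 1 + sum(mod.(x .+ g, 2) .* 2 .^ (0:d-1))       # index of the vertex x + g (mod 2)
    A[i, j] += 1
end
L = Diagonal(vec(sum(A, dims=2))) - A

pred = sort([k - sum((-1)^(b' * g) for g in gens) for b in verts])
println(isapprox(sort(eigvals(Symmetric(L))), pred; atol=1e-8))    # true
```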
We will now show that if we choose the set of generators uniformly at random, for k some constant multiple of the dimension, then we obtain a graph that is a good approximation of the complete graph. That is, all the eigenvalues of the Laplacian will be close to k. This construction comes from the work of Alon and Roichman [AR94]. We will set k = cd, for some c > 1. Think of c = 2, c = 10, or c = 1 + ε.
For b ∈ {0, 1}^d but not all zero, and for g chosen uniformly at random from {0, 1}^d, b^T g modulo 2 is uniformly distributed in {0, 1}, and so (−1)^{b^T g} is uniformly distributed in {±1}. Recall from Lemma 7.4.1 that λ_b = k − Σ_{i=1}^{k} (−1)^{b^T g_i}.
The right-hand part is a sum of independent, uniformly chosen ±1 random variables. So, we
know it is concentrated around 0, and thus λb will be concentrated around k. To determine how
concentrated the sum actually is, we use a Chernoff bound. There are many forms of Chernoff
bounds. We will not use the strongest, but settle for one which is simple and which gives results
that are qualitatively correct.
Theorem 7.5.1. Let x1, . . . , xk be independent ±1 random variables. Then, for all t > 0,
Pr[|Σ_i x_i| ≥ t] ≤ 2e^{−t^2/2k}.
This becomes very small when t is a constant fraction of k. In fact, it becomes so small that it is
unlikely that any eigenvalue deviates from k by more than t.
Theorem 7.5.2. With high probability, all of the nonzero eigenvalues of the generalized hypercube differ from k by at most
k√(2/c),
where k = cd.
Proof. Let t = k√(2/c). Then, for every nonzero b,
Pr[|k − λ_b| ≥ t] ≤ 2e^{−t^2/2k} ≤ 2e^{−k/c} = 2e^{−d}.
Now, the probability that there is some b for which λ_b violates these bounds is at most the sum of these terms:
Pr[∃b : |k − λ_b| ≥ t] ≤ Σ_{b∈{0,1}^d, b≠0} Pr[|k − λ_b| ≥ t] ≤ (2^d − 1)·2e^{−d},
which is always less than 1 and goes to zero exponentially quickly as d grows.
We initially suggested thinking of c = 2 or c = 10. The above bound works for c = 10. To get a
useful bound for c = 2, we need to sharpen the analysis. A naive sharpening will work down to
c = 2 ln 2. To go lower than that, you need a stronger Chernoff bound.
7.6 Conclusion
We have now seen that a random generalized hypercube of degree k probably has all non-zero Laplacian eigenvalues between
k(1 − √(2/c)) and k(1 + √(2/c)).
If we let n be the number of vertices, and we now multiply the weight of every edge by n/k, we obtain a graph with all nonzero Laplacian eigenvalues between
n(1 − √(2/c)) and n(1 + √(2/c)).
Thus, this is essentially a (1 + √(2/c))-approximation of the complete graph on n vertices. But, the
degree of every vertex is only c log2 n. Expanders are infinite families of graphs that are
constant-factor approximations of complete graphs, but with constant degrees.
We know that random regular graphs are probably expanders. If we want explicit constructions,
we need to go to non-Abelian groups.
Explicit constructions that achieve bounds approaching those of random generalized hypercubes
come from error-correcting codes.
Explicit constructions allow us to use these graphs in applications that require us to implicitly
deal with a very large graph. In Chapter 31, we will see how to use such graphs to construct
pseudo-random generators.
In the homework, you will show that it is impossible to make constant-degree expander graphs
from Cayley graphs of Abelian groups. The best expanders are constructed from Cayley graphs of
2-by-2 matrix groups. In particular, the Ramanujan expanders of Margulis [Mar88] and
Lubotzky, Phillips and Sarnak [LPS88] are Cayley graphs over the Projective Special Linear
Groups PSL(2, p), where p is a prime. These are the 2-by-2 matrices modulo p with determinant
1, in which we identify A with −A.
They provided a very concrete set of generators. For a prime p equivalent to 1 modulo 4, it is known that there are p + 1 solutions to the equation
a_0^2 + a_1^2 + a_2^2 + a_3^2 = p,
where a_0 is odd and a_1, a_2 and a_3 are even. We obtain a generator for each such solution of the form:
(1/√p) [ a_0 + i·a_1   a_2 + i·a_3 ;  −a_2 + i·a_3   a_0 − i·a_1 ],
where i is an integer that satisfies i^2 = −1 modulo p.
Even more explicit constructions, which do not require solving equations, may be found
in [ABN+ 92].
The wonderful thing about Cayley graphs of Abelian groups is that we can construct an
orthonormal basis of eigenvectors for these graphs without even knowing the set of generators S.
That is, the eigenvectors only depend upon the group. Related results also hold for Cayley graphs
of arbitrary groups, and are related to representations of the groups. See [Bab79] for details.
As Cayley graphs are regular, it won’t matter which matrix we consider. For simplicity, we will
consider adjacency matrices.
Let n be an integer and let G be a Cayley graph on Z/n with generator set S. When S = {±1},
we get the ring graphs. For general S, I think of these as generalized Ring graphs. Let’s first see
that they have the same eigenvectors as the Ring graphs.
Recall that we proved that the vectors x_k(u) = cos(2πku/n) and y_k(u) = sin(2πku/n) were eigenvectors of the ring graphs, for 1 ≤ k ≤ n/2.
Let’s just do the computation for the y_k, as the x_k are similar. For every u modulo n, we have
(A y_k)(u) = Σ_{g∈S} y_k(u + g)
= (1/2) Σ_{g∈S} (y_k(u + g) + y_k(u − g))
= (1/2) Σ_{g∈S} (sin(2πk(u + g)/n) + sin(2πk(u − g)/n))
= (1/2) Σ_{g∈S} 2 sin(2πku/n) cos(2πkg/n)
= sin(2πku/n) Σ_{g∈S} cos(2πkg/n)
= y_k(u) Σ_{g∈S} cos(2πkg/n).
The second equality uses the fact that S is closed under negation.
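A Julia sketch of this computation, with a generator set of my own choosing (closed under negation): both the sine and cosine vectors are eigenvectors of the adjacency matrix with eigenvalue Σ_{g∈S} cos(2πkg/n).

```julia
using LinearAlgebra

n = 12
S = [1, -1, 3, -3]                                       # generators, closed under negation
A = [mod(u - v, n) in mod.(S, n) ? 1.0 : 0.0 for u in 0:n-1, v in 0:n-1]

k  = 2
yk = [sin(2pi * k * u / n) for u in 0:n-1]               # the sine eigenvector
xk = [cos(2pi * k * u / n) for u in 0:n-1]               # the cosine eigenvector
lam = sum(cos(2pi * k * g / n) for g in S)

println(isapprox(A * yk, lam * yk; atol=1e-8))           # true
println(isapprox(A * xk, lam * xk; atol=1e-8))           # true
```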
Chapter 8

Eigenvalues of Random Graphs

Notation: at many points during this chapter, we will write [n] to indicate the set {1, 2, . . . , n}.
In this chapter we examine the adjacency matrix eigenvalues of Erdös-Rényi random graphs.
These are graphs in which each edge is chosen to appear with probability p, and the choices are
made independently for each edge. We will find that these graphs typically have one large eigenvalue around pn, and that all of the others probably have absolute value at most (1 + o(1))·2√(p(1 − p)n). In fact, their distribution within this region follows Wigner’s [Wig58] semicircle law: their histogram looks like a semicircle.
For example, let’s consider p = 1/2. Here is the histogram of all but the largest eigenvalue of a
random graph on 4,000 vertices.
The following image is the histogram of the 99 smallest adjacency eigenvalues of 10,000 random
graphs on 100 vertices.
Note that the eigenvalues are almost never outside [−10, 10]. We are going to prove something like
that today.
Here is a histogram of the second-largest eigenvalues of 10,000 matrices on 1,000 vertices.
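The histograms above were produced in Julia notebooks; the following sketch shows the computation behind them (no plotting package is assumed, and the sizes are smaller than in the figures):

```julia
using LinearAlgebra

n, p = 500, 0.5
U = triu(rand(n, n) .< p, 1)            # independent edge coin-flips above the diagonal
M = float.(U + U')                      # adjacency matrix of a sample from G(n, p)

vals = eigvals(Symmetric(M))
println("largest eigenvalue ≈ pn:  ", vals[end], "  vs  ", p * n)
println("largest |other eigenvalue|: ", max(abs(vals[1]), abs(vals[end-1])),
        "  vs  2√(p(1-p)n) = ", 2 * sqrt(p * (1 - p) * n))
```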
Let M be the adjacency matrix of the random graph, and write M = p(J − I) + R, so that R = M − p(J − I). The reason we write it this way is that the expectation of every entry of R, and thus of R, is zero. We will show that with high probability all of the eigenvalues of R are small, and thus we can view M as being approximately p(J − I).
As p(J − I ) has one eigenvalue of pn and n − 1 eigenvalues of −p, the bulk of the distribution of
eigenvalues of M is very close to the distribution of eigenvalues of R, minus p. To see this, first
subtract pI from R. This shifts all the eigenvalues down by p. We must now add pJ . As J is a
rank-1 matrix, we can show that the eigenvalues of pJ + (R − pI ) interlace the eigenvalues of
R − pI (see the exercise at the end of the chapter). So, the largest eigenvalue moves up a lot, and
all the other n − 1 move up to at most the next eigenvalue.
A good way to characterize the general shape of a distribution is through its moments. Let
ρ1 , . . . , ρn be the eigenvalues of R. Their first moment is simply their sum, which is the trace of
R and thus zero. Their kth moment is
Σ_{i=1}^{n} ρ_i^k = Tr R^k.
One of the easiest ways to reason about the distribution of the eigenvalues is to estimate the
expectations of the traces of powers of R. This is called Wigner’s trace method.
Before proceeding with our analysis, we recall a formula for the entries of a product of matrices. For matrices A and B whose rows and columns are indexed by V,
(AB)(a, b) = Σ_{c∈V} A(a, c)B(c, b).
Recall that the norm of a matrix R, written ‖R‖, is the maximum of the absolute value of its eigenvalues. It is called a norm because for all x,
‖Rx‖ ≤ ‖R‖ ‖x‖.
We will focus on proving an upper bound on the norm of R that holds with high probability. We will do this by estimating the trace of a high power of R. For an even power l, this should be relatively close to ‖R‖^l. In particular, we use the fact that for every even l,
‖R‖^l ≤ Tr R^l.
We will prove in Theorem 8.3.2 that for every even l such that np(1 − p) ≥ 2l^8,
E Tr R^l ≤ 2n(4np(1 − p))^{l/2}.
This will allow us to show that the norm of R is usually less than u where
u := (2n(4np(1 − p))^{l/2})^{1/l} = (2n)^{1/l} · 2√(np(1 − p)).
For every ε > 0,
Pr[‖R‖ ≥ (1 + ε)u] ≤ Pr[Tr R^l ≥ (1 + ε)^l u^l] ≤ (E Tr R^l)/((1 + ε)^l u^l) ≤ (1 + ε)^{−l},
by Markov’s inequality.
To understand this probability, remember that for small ε, (1 + ε) is approximately exp(ε). So, (1 + ε)^{−l} is approximately exp(−εl). This probability becomes small when l > 1/ε. Concretely, for ε < 1/2, 1 + ε ≥ exp(4ε/5). Thus, we can take ε approximately (n/2)^{−1/8}. While this bound is not very useful for n that we encounter in practice, it is nice asymptotically. The bound can be substantially improved by more careful arguments, as we explain at the end of the Chapter.
We should also examine the term (2n)^{1/l}: it equals exp(ln(2n)/l), which is at most 1 + 2 ln(2n)/l when ln(2n)/l < 1/2. Thus, for l ≫ ln(2n) this term is close to 1.
Recall that the trace is the sum of the diagonal entries in a matrix. By expanding the formula for
matrix multiplication, one can also show
R^l(a_0, a_0) = Σ_{a_1,...,a_{l−1}∈V} R(a_0, a_{l−1}) Π_{i=1}^{l−1} R(a_{i−1}, a_i), (8.1)
and so
E R^l(a_0, a_0) = Σ_{a_1,...,a_{l−1}∈V} E [R(a_0, a_{l−1}) Π_{i=1}^{l−1} R(a_{i−1}, a_i)].
To simplify this expression, we will recall that if X and Y are independent random variables, then E(XY) = E(X)E(Y). So, to the extent that the terms in this product are independent, we can distribute this expectation across this product. As the entries of R are independent, up to the symmetry condition, the only terms that are dependent are those that are identical. So, if {b_j, c_j}_j is the set of distinct pairs that occur in (8.1), with pair j occurring d_j times, then the expectation of the product equals Π_j E[R(b_j, c_j)^{d_j}]. This expectation is zero if any d_j = 1, and each factor satisfies |E[R(b_j, c_j)^{d_j}]| ≤ p(1 − p) for d_j ≥ 2.
So, ERl (a0 , a0 ) is at most the sum over sequences a1 , . . . , al−1 such that each pair in (8.1) appears
at least twice, times p(1 − p) for each pair that appears in the sequence.
To describe this more carefully, we say that a sequence a_0, a_1, . . . , a_l is a closed walk of length l on n vertices if each a_i ∈ {1, . . . , n} and a_l = a_0. In addition, we say that it is significant if every pair {b, c} that occurs as some {a_i, a_{i+1}} does so for at least two indices i. Let W_{n,l,k} denote the number of significant closed walks of length l on n vertices such that a_1, . . . , a_{l−1} contains exactly k distinct elements. As a sequence with k distinct elements must contain at least k distinct pairs, we obtain the following upper bound on the trace.
Lemma 8.3.1.
E Tr R^l ≤ Σ_{k=1}^{l/2} W_{n,l,k} (p(1 − p))^k.
Proof. Let
t_k = n^{k+1} 2^l l^{4(l−2k)} (p(1 − p))^k,
which, by the bound (8.4) proved below, is an upper bound on W_{n,l,k}(p(1 − p))^k. We will show that the sequence t_k is geometrically increasing, and thus the sum is dominated by its largest term. We compute
t_{k+1}/t_k = n p(1 − p) l^{−8} ≥ 2,
by the assumption that np(1 − p) ≥ 2l^8. Thus,
E Tr R^l ≤ Σ_{k=1}^{l/2} t_k ≤ 2 t_{l/2} = 2n(4np(1 − p))^{l/2}.
Of course, better bounds on W_{n,l,k} provide better bounds on the trace. Vu [Vu07] proves that
W_{n,l,k} ≤ n^{k+1} (l choose 2k) (k + 1)^{2(l−2k)} 2^{2k}.
Our goal is to prove an upper bound on Wn,l,k . We will begin by proving a crude upper bound,
and then refine it.
As it is tricky to obtain a clean formula for the number of such walks, we will instead derive ways
of describing such walks, and then count how many such descriptions there can be.
Let S ⊂ {1, . . . , l − 1} be the set of i for which a_i does not appear earlier in the walk:
a_i ∉ {a_j : j < i}.
We can describe the walk by giving a_0, the set S, the map
σ : S → [n]
that records the vertex a_i for each i ∈ S, and the map
τ : ([l − 1] \ S) → S ∪ {0}
that records, for each other index, an earlier position holding the same vertex. There are at most (k + 1)^{l−1−k} choices for τ. See figure 8.1 for an example.
While not every choice of a_0, S, σ and τ corresponds to a significant walk, every significant walk with k distinct elements corresponds to some a_0, S, σ and τ. Thus,
W_{n,l,k} ≤ (l − 1 choose k) n^{k+1} (k + 1)^{l−1−k}. (8.3)
This bound is too loose to obtain the result we desire. It merely allows us to prove that ‖R‖ ≤ c√(np(1 − p) log n) for some constant c. This bound is loosest when k = l/2. This is fortunate both because this is the case in which it is easiest to tighten the bound, and because the computation in Theorem 8.3.2 is dominated by this term.
Consider the graph with edges (ai−1 , ai ) for i ∈ S. This graph must be a tree because it contains
exactly the edges from which the walk first hits each vertex. Formally, this is because the graph
contains k edges, touches k + 1 vertices, and we can prove by induction on the elements in S that
it is connected, starting with a0 . See Figure 8.2.
We can use this tree to show that, when k = l/2, every pair {ai−1 , ai } that appears in the walk
must appear exactly twice: the walk only takes l steps, and each pair of the k = l/2 in the tree
must be covered at least twice.
We now argue that when l = 2k the map τ is completely unnecessary: the walk is determined by
S and σ alone. That is, for every i 6∈ S there is only one edge that the walk can follow. For i 6∈ S,
the tuple {ai−1 , ai } must have appeared exactly once before in the walk. We will show that at
step i the vertex ai−1 is adjacent to exactly one edge with this property.
To this end, we keep track of the graph of edges that have appeared exactly once before in the
walk. We could show by induction that at step i this graph is precisely a path from a0 to ai−1 .
But, we take an alternate approach. Consider the subgraph of the tree edges that have been used
exactly once up to step i. We will count both its number of vertices, v, and its number of edges, e.
Now that we know Wn,l,l/2 is much less than the bound suggested by (8.3), we should suspect
that Wn,l,k is also much lower when k is close to l/2. To show this, we extend the idea used in the
previous argument to show that with very little information we can determine where the walk
goes for almost every step not in S.
We say that the ith step in the walk is extra if the pair {ai−1 , ai } is not a tree edge or if it appears
at least twice in the walk before step i. Let x denote the number of extra steps. As each of the
tree edges appears in at least two steps, the number of extra steps is at most l − 2k. We will use τ
to record the destination vertex ai of each extra step, again by indicating its position in S.
During the walk, we keep track of the set of tree edges that have been used exactly once. Let T be the set of steps in which a_{i−1} is adjacent to exactly one tree edge that has been used exactly once and the walk follows that edge. That is, the edge is {a_{i−1}, a_i} and we can infer a_i
given i ∈ T . If ai−1 is adjacent to exactly one tree edge that has been used exactly once but the
walk does not follow that edge, then step i is extra: it either follows an edge that is not in the
tree or it follows a tree edge that has been used at least twice.
This leaves us to account for the steps in which ai−1 is adjacent to more than one tree edge that
has been used exactly once and follows such an edge. In this case, we call step i ambiguous,
because we need some way to indicate which of those edges it used. In ambiguous steps we also
use τ to record a_i. Every step not in S or T is extra or ambiguous. So,
τ : ([l − 1] \ (S ∪ T)) → S ∪ {0}.
The data a_0, S, T, σ, and τ determine the walk. It remains to determine how many ways we can choose them.
We will show that the number of ambiguous steps is at most the number of extra steps. This implies that |[l − 1] \ (S ∪ T)| ≤ 2x. Thus, the number of possible maps τ is at most
(k + 1)^{2x}.
The number of ways to choose S and the set [l − 1] \ (S ∪ T) is at most
(l − 1 choose k)(l − 1 − k choose 2x) ≤ 2^{l−1} (l − 1 − k)^{2x}.
Thus
W_{n,l,k} ≤ n^{k+1} 2^{l−1} (l − k − 1)^{2x} (k + 1)^{2x} ≤ n^{k+1} 2^{l−1} (lk)^{2(l−2k)} ≤ n^{k+1} 2^l l^{4(l−2k)}. (8.4)
We now finish by arguing that the number of ambiguous steps is at most the number of extra steps. As before, keep track of the subgraph of the tree edges that have been used exactly once up to step i. We will count both its number of vertices, v, and its number of edges, e. At step i we include a_i in this subgraph regardless of whether it is adjacent to any of the graph’s edges, so
initially v = 1 and e = 0. The walk ends in the same state.
For steps in which i ∈ S, both v and e increase by 1. For steps in which i ∈ T , the vertex ai−1 has
degree one in this graph. When we follow the edge (ai−1 , ai ), we remove it from this graph. As
ai−1 is no longer adjacent to any edge of the graph, both v and e decrease by 1.
At ambiguous steps i, we decrement e. But, because ai−1 was adjacent to at least two tree edges
that had been used exactly once, it is not removed from the graph and v does not decrease. The
ambiguous steps may be compensated by extra steps. An extra step does not change e, but it can decrease v. This happens when a_{i−1} is not adjacent to any tree edges that have been used exactly once, but a_i is. Thus, a_{i−1} contributes 1 to v during step i − 1, but is removed from the count as soon as the walk moves to a_i. The walk starts and ends with v − e = 1; the steps in S and T do not change this difference, ambiguous steps increase it, and extra steps can decrease it. So, the number of extra steps must be at least the number of ambiguous steps.
8.5 Notes
The proof in this chapter is a slight simplification and weakening of a result due to Vu [Vu07]. The result was first claimed by Füredi and Komlós [FK81]. However, there were a few mistakes in
their paper. Vu’s paper also provides concentration results that lower bound µ2 , whereas the
argument in this chapter merely provides an upper bound.
8.6 Exercise
1. Interlacing.
Let A be a symmetric matrix with eigenvalues α1 ≥ α2 ≥ . . . ≥ αn . Let B = A + x x T for some
vector x and let β1 ≥ β2 ≥ . . . ≥ βn be the eigenvalues of B. Prove that for all i
βi ≥ αi ≥ βi+1 .
Chapter 9

Strongly Regular Graphs
9.1 Introduction
In this and the next lecture, I will discuss strongly regular graphs. Strongly regular graphs are
extremal in many ways. For example, their adjacency matrices have only three distinct
eigenvalues. If you are going to understand spectral graph theory, you must have these in mind.
In many ways, strongly-regular graphs can be thought of as the high-degree analogs of expander
graphs. However, they are much easier to construct.
The Paley graphs we encountered in Chapter ?? are Strongly Regular Graphs.
Many times someone has asked me for a matrix of 0s and 1s that “looked random”, and strongly regular graphs provided a reasonable answer.
Warning: I will use the letters that are standard when discussing strongly regular graphs. So λ
and µ will not be eigenvalues in this lecture.
9.2 Definitions
A graph G is a strongly regular graph with parameters (k, λ, µ) if:
1. every vertex has exactly k neighbors;
2. there exists an integer λ such that for every pair of vertices x and y that are neighbors in G,
there are λ vertices z that are neighbors of both x and y;
3. there exists an integer µ such that for every pair of vertices x and y that are not neighbors
in G, there are µ vertices z that are neighbors of both x and y.
These conditions are very strong, and it might not be obvious that there are any non-trivial
graphs that satisfy these conditions. Of course, the complete graph and disjoint unions of
complete graphs satisfy these conditions.
For the rest of this lecture, we will only consider strongly regular graphs that are connected and
that are not the complete graph. I will now give you some examples.
The pentagon (the 5-cycle) is a strongly regular graph with parameters
n = 5, k = 2, λ = 0, µ = 1.
For a positive integer n, the lattice graph Ln is the graph with vertex set {1, . . . n}2 in which
vertex (a, b) is connected to vertex (c, d) if a = c or b = d. Thus, the vertices may be arranged at
the points in an n-by-n grid, with vertices being connected if they lie in the same row or column.
Alternatively, you can understand this graph as the line graph of a bipartite complete graph
between two sets of n vertices.
It is routine to see that the parameters of this graph are:
k = 2(n − 1), λ = n − 2, µ = 2.
A Latin Square is an n-by-n grid, each entry of which is a number between 1 and n, such that no
number appears twice in any row or column. For example,
1 2 3
2 3 1
3 1 2
Let me remark that the number of different latin squares of size n grows very quickly, at least as
fast as n!(n − 1)!(n − 2)! . . . 2!.
From such a latin square, we construct a Latin Square Graph. It will have n2 nodes, one for each
cell in the square. Two nodes are joined by an edge if
So, such a graph has degree k = 3(n − 1). Any two nodes in the same row will both be neighbors of each of the other n − 2 nodes in their row. They will have two more common neighbors: the nodes in their columns holding the other’s number. So, they have n common neighbors. The same obviously holds for columns, and is easy to see for nodes that have the same number. So, every pair of nodes that are neighbors have exactly λ = n common neighbors.
On the other hand, consider two vertices that are not neighbors, say (1, 1) and (2, 2). They lie in
different rows, lie in different columns, and hold different numbers. The vertex (1, 1) has two
common neighbors of (2, 2) in its row: the vertex (1, 2) and the vertex holding the same number
as (2, 2). Similarly, it has two common neighbors of (2, 2) in its column. Finally, we can find two
more common neighbors of (2, 2) that are in different rows and columns by looking at the nodes
that hold the same number as (1, 1), but which are in the same row or column as (2, 2). So, µ = 6.
We will consider the adjacency matrices of strongly regular graphs. Let A be the adjacency
matrix of a strongly regular graph with parameters (k, λ, µ). We already know that A has an
eigenvalue of k with multiplicity 1. We will now show that A has just two other eigenvalues.
To prove this, first observe that the (u, v) entry of A2 is the number of common neighbors of
vertices u and v. For u = v, this is just the degree of vertex u. We will use this fact to write A2 as
a linear combination of A, I and J. To this end, observe that the adjacency matrix of the complement of A (the graph with non-edges where A has edges) is J − I − A. So,
A^2 = kI + λA + µ(J − I − A).
For an eigenvector v of A of eigenvalue other than k, v is orthogonal to the all-1s vector, so Jv = 0 and
A^2 v = (λ − µ)Av + (k − µ)v.
Hence any such eigenvalue θ satisfies
θ^2 = (λ − µ)θ + k − µ.
The eigenvalues of A other than k are those θ that satisfy this quadratic equation, and so are
given by
(λ − µ ± √((λ − µ)^2 + 4(k − µ))) / 2.
These eigenvalues are always denoted r and s, with r > s. By convention, the multiplicity of the eigenvalue r is always denoted f, and the multiplicity of s is always denoted g.
For example, for the pentagon we have
r = (√5 − 1)/2,   s = −(√5 + 1)/2.
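A Julia sketch checking this on the pentagon (adjacency matrix built directly; parameters k = 2, λ = 0, µ = 1 as above):

```julia
using LinearAlgebra

A = [mod(u - v, 5) in (1, 4) ? 1.0 : 0.0 for u in 0:4, v in 0:4]   # the 5-cycle

k, λ, µ = 2, 0, 1
r = (λ - µ + sqrt((λ - µ)^2 + 4 * (k - µ))) / 2
s = (λ - µ - sqrt((λ - µ)^2 + 4 * (k - µ))) / 2

println(round.(eigvals(Symmetric(A)), digits=4))    # s, s, r, r, k
println(round.([s, r, k * 1.0], digits=4))
```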
For the Latin square graphs, we have r = n − 3 and s = −3.
We will now show that every regular connected graph with at most 3 eigenvalues must be a
strongly regular graph. Let G be k-regular, and let its eigenvalues other than k be r and s. As G
is connected, its adjacency eigenvalue k has multiplicity 1.
Then, for every vector v orthogonal to 1, we have
(A − rI)(A − sI)v = 0.
This means that (A − rI)(A − sI) = βJ for some number β, and so
A^2 − (r + s)A + rsI = βJ  ⟹
A^2 = (r + s)A − rsI + βJ
= (r + s + β)A + β(J − A − I) + (β − rs)I.
So, the number of common neighbors of two nodes just depends on whether or not they are neighbors, which implies that G is strongly regular.
We will now see that, unless f = g, both r and s must be integers. We do this by observing a few
identities that they both must satisfy. First, from the quadratic equation above, we know that
r+s=λ−µ (9.1)
and
rs = µ − k. (9.2)
As the trace of an adjacency matrix is zero, and is also the sum of the eigenvalues times their
multiplicities, we know
k + f r + gs = 0. (9.3)
So, it must be the case that s < 0. Equation (9.2) then gives r > 0.
If f 6= g, then equations (9.3) and (9.1) provide independent constraints on r and s, and so
together they determine r and s. As the coefficients in both equations are integers, they tell us
that both r and s are rational numbers. From this, and the fact that r and s are the roots of a
quadratic equation with integral coefficients, we may conclude that r and s are in fact integers.
Let me remind you as to why.
Lemma 9.8.1. If θ is a rational number that satisfies
θ^2 + bθ + c = 0,
where b and c are integers, then θ is an integer.
Proof. Write θ = x/y, where the greatest common divisor of x and y is 1. We then have
(x/y)2 + b(x/y) + c = 0,
which implies
x2 + bxy + cy 2 = 0,
which implies that y divides x2 . As we have assumed the greatest common divisor of x and y is 1,
this implies y = 1.
It is natural to ask what the eigenspaces can tell us about a strongly regular graph. But, we will
find that they don’t tell us anything we don’t already know.
Let u_1, . . . , u_f be an orthonormal set of eigenvectors of the eigenvalue r, and let U be the matrix containing these vectors as columns. Recall that U is only determined up to an orthonormal transformation. That is, we could equally take UQ for any f-by-f orthonormal matrix Q.
To the ith vertex, we associate the vector
x_i := (u_1(i), . . . , u_f(i)).
While the vectors U are determined only up to orthogonal transformations, these transformations don’t affect the geometry of these vectors. For example, for vertices i and j, the distance between x_i and x_j is ‖x_i − x_j‖, and
‖x_i − x_j‖^2 = ‖x_i‖^2 + ‖x_j‖^2 − 2 x_i x_j^T.
On the other hand,
‖x_i Q − x_j Q‖^2 = ‖x_i Q‖^2 + ‖x_j Q‖^2 − 2(x_i Q)(x_j Q)^T = ‖x_i Q‖^2 + ‖x_j Q‖^2 − 2 x_i QQ^T x_j^T = ‖x_i‖^2 + ‖x_j‖^2 − 2 x_i x_j^T.
In fact, all the geometrical information about the vectors x i is captured by their Gram matrix,
whose (i, j) entry is x i x Tj . This matrix is also given by
UUT .
Let W be a matrix whose columns form an orthonormal basis of the eigenspace of s, and let P be a polynomial of degree 2 for which P(r) = 1 and P(s) = P(k) = 0. Then,
P(A) = P(r) UU^T + P(s) WW^T + P(k) (1/n) J = UU^T.
That is, the Gram matrix of the point set x 1 , . . . , x n is a linear combination of the identity, A
and A2 . So, the distance between any pair of points in this set just depends on whether or not the
corresponding vertices are neighbors in G.
In particular, this means that the point set x 1 , . . . , x n is a two-distance point set: a set of points
such that there are only two different distances between them. Next lecture, we will use this fact
to prove a lower bound on the dimensions f and g.
For a positive integer n, the triangular graph Tn may be defined to be the line graph of the
complete graph on n vertices. To be more concrete, its vertices are the subsets of size 2 of
{1, . . . , n}. Two of these sets are connected by an edge if their intersection has size 1.
You are probably familiar with some triangular graphs. T3 is the triangle, T4 is the skeleton of the
octahedron, and T5 is the complement of the Petersen graph.
Let’s verify that these are strongly-regular, and compute their parameters. As the construction is completely symmetric, we may begin by considering any vertex, say the one labeled by the set {1, 2}. Every vertex labeled by a set of the form {1, i} or {2, i}, for i ≥ 3, will be connected to this set. So, this vertex, and every vertex, has degree 2(n − 2).
For any neighbor of {1, 2}, say {1, 3}, every other vertex of the form {1, i} for i ≥ 4 will be a neighbor of both of these, as will the set {2, 3}. Carrying this out in general, we find that
λ = (n − 3) + 1 = n − 2.
Finally, any non-neighbor of {1, 2}, say {3, 4}, will have 4 common neighbors with {1, 2}: the vertices {1, 3}, {1, 4}, {2, 3} and {2, 4}. So, µ = 4.
Recall from last lecture that each eigenspace of a strongly regular graph supplies a set of points on the unit sphere such that the distance between a pair of points just depends on whether or not they are adjacent. If the graph is connected and not the complete graph, then we can show that
these distances are greater than zero, so no two vertices map to the same unit vector. If we take
the corresponding point sets for two strongly regular graphs with the same parameters, we can
show that the graphs are isomorphic if and only if there is an orthogonal transformation that
maps one point set to the other. In low dimensions, it is easy to find such an orthogonal
transformation if one exists.
Consider the eigenspace of r, which we recall has dimension f . Fix any set of f independent
vectors corresponding to f vertices. An orthogonal transformation is determined by its action on
these vectors. So, if there is an orthogonal transformation that maps one vector set onto the
other, we will find it by examining all orthogonal transformations determined by mapping these f
vectors to f vectors in the other set. Thus, we need only examine n^f · f! transformations. This would be helpful if f were small. Unfortunately, it is not. We will now prove that both f and g must be at least √(2n) − 2.
Let x_1, . . . , x_n be a set of unit vectors in IR^f such that there are two values α, β < 1 for which, for all i ≠ j,
⟨x_i, x_j⟩ = α or β.
For each i, define the polynomial
p_i(y) = (y^T x_i − α)(y^T x_i − β),
for y ∈ IR^f. We first note that each polynomial p_i is a polynomial of degree 2 in f variables (the
coordinates of y). As each f-variate polynomial of degree 2 can be expressed in the form
a + Σ_i b_i y_i + Σ_{i≤j} c_{i,j} y_i y_j,
we see that the vector space of degree-2 polynomials in f variables has dimension
1 + 2f + (f choose 2).
To prove a lower bound on f , we will show that these polynomials are linearly independent.
Assume by way of contradiction that they are not. Then, without loss of generality, there exist
coefficients γ_1, . . . , γ_n with γ_1 ≠ 0 and
Σ_i γ_i p_i(y) = 0 for all y.
Physical Metaphors
Chapter 10

Random Walks on Graphs
We will examine how the eigenvalues of a graph govern the convergence of a random walk.
We will consider random walks on undirected graphs. Let’s begin with the definitions. Let
G = (V, E, w) be a weighted undirected graph. A random walk on a graph is a process that
begins at some vertex, and at each time step moves to another vertex. When the graph is
unweighted, the vertex the walk moves to is chosen uniformly at random among the neighbors of
the present vertex. When the graph is weighted, it moves to a neighbor with probability
proportional to the weight of the corresponding edge. While the transcript (the list of vertices in
the order they are visited) of a particular random walk is sometimes of interest, it is often more
productive to reason about the expected behavior of a random walk. To this end, we will
investigate the probability distribution over vertices after a certain number of steps.
We will let the vector p t ∈ IRV denote the probability distribution at time t. We write p t (a) to
indicate the value of p t at a vertex a—the probability of being at vertex a at time t. A
probability vector p is a vector such that p(a) ≥ 0, for all a ∈ V , and
X
p(a) = 1.
a
Our initial probability distribution, p_0, will typically be concentrated on one vertex. That is, there will be some vertex a for which p_0(a) = 1. In this case, we say that the walk starts at a.
To derive a p t+1 from p t , note that the probability of being at a vertex a at time t + 1 is the sum
over the neighbors b of a of the probability that the walk was at b at time t, times the probability
it moved from b to a in time t + 1. We can state this algebraically as
p_{t+1}(a) = Σ_{b:(a,b)∈E} (w(a, b)/d(b)) p_t(b), (10.1)
where d(b) = Σ_a w(a, b) is the weighted degree of vertex b.
We may write this in matrix form using the walk matrix of the graph, which is given by
W := M D^{−1}.
We then have
p_{t+1} = W p_t.
To see why this holds, consider how W acts as an operator on an elementary unit vector:
M D^{−1} δ_b = M (δ_b / d(b)) = Σ_{a∼b} (w_{a,b}/d(b)) δ_a.
We will often consider lazy random walks, which are the variant of random walks that stay put
with probability 1/2 at each time step, and walk to a random neighbor the other half of the time.
These evolve according to the equation
p_{t+1}(a) = (1/2) p_t(a) + (1/2) Σ_{b:(a,b)∈E} (w(a, b)/d(b)) p_t(b), (10.2)
and satisfy
p_{t+1} = W̃ p_t,
where W̃ is the lazy walk matrix, given by
W̃ := (1/2) I + (1/2) W = (1/2) I + (1/2) M D^{−1}.
While the walk matrices are not symmetric, they are similar to symmetric matrices. We will see
that this implies that they have n real eigenvalues, although their eigenvectors are generally not
orthogonal. Define the normalized adjacency matrix by
\[
A \stackrel{\text{def}}{=} D^{-1/2} W D^{1/2} = D^{-1/2} M D^{-1/2} .
\]
So, A is symmetric. Of course, $\widetilde{W}$ has the same eigenvectors as W. In particular, d is an eigenvector of W of eigenvalue 1, as
\[
M D^{-1} d = M \mathbf{1} = d .
\]
So, the Perron-Frobenius theorem (Theorem 4.5.1) tells us that all the eigenvalues of W lie
between −1 and 1. As we did in Proposition 4.5.3, one can show that G is bipartite if and only if
−1 is an eigenvalue of A.
As $\widetilde{W} = W/2 + I/2$, this implies that all the eigenvalues of $\widetilde{W}$ lie between 0 and 1. We denote the eigenvalues of $\widetilde{W}$ and $I/2 + A/2$ by
\[
1 = \omega_1 \ge \omega_2 \ge \cdots \ge \omega_n \ge 0 .
\]
While the letter ω is not the Greek equivalent of "w", we use it because it looks like one.
From Claim 10.2.1, we now know that
\[
\psi_1 \stackrel{\text{def}}{=} \frac{d^{1/2}}{\|d^{1/2}\|}
\]
is a unit-norm eigenvector of A of eigenvalue 1.
Regardless of the starting distribution, the lazy random walk on a connected graph always
converges to one distribution: the stable distribution. This is the other reason that we forced our
random walk to be lazy. Without laziness1 , there can be graphs on which the random walks never
converge. For example, consider a non-lazy random walk on a bipartite graph. Every other step will bring it to the other side of the graph. So, if the walk starts on one side of the graph, its distribution at time t will depend upon the parity of t.
In the stable distribution, every vertex is visited with probability proportional to its weighted
degree. We denote the vector encoding this distribution by π, where
\[
\pi \stackrel{\text{def}}{=} d / (\mathbf{1}^T d) .
\]
We have already seen that π is a right-eigenvector of eigenvalue 1. To show that the lazy random
walk converges to π, we will exploit the fact that all the eigenvalues other than 1 are in [0, 1).
And, we expand the vectors p t in the eigenbasis of A, after first multiplying by D −1/2 .
¹Strictly speaking, any nonzero probability of staying put at any vertex in a connected graph will guarantee convergence. We don't really need a half probability at every vertex.
Note that
\[
c_1 = \psi_1^T (D^{-1/2} p_0) = \frac{(d^{1/2})^T}{\|d^{1/2}\|} (D^{-1/2} p_0) = \frac{\mathbf{1}^T p_0}{\|d^{1/2}\|} = \frac{1}{\|d^{1/2}\|},
\]
as p 0 is a probability vector. One of the reasons we do not expand in a basis of eigenvectors of $\widetilde{W}$ is that such a basis, not being orthogonal, does not allow such a nice expression for the coefficients.
We have
\begin{align*}
p_t &= \widetilde{W}^t p_0 \\
    &= D^{1/2} D^{-1/2}\, \widetilde{W}^t\, D^{1/2} D^{-1/2} p_0 \\
    &= D^{1/2} \left( D^{-1/2} \widetilde{W} D^{1/2} \right)^t D^{-1/2} p_0 .
\end{align*}
As $D^{-1/2} \widetilde{W} D^{1/2} = I/2 + A/2$ has eigenvectors $\psi_i$ with eigenvalues $\omega_i$, expanding $D^{-1/2} p_0 = \sum_i c_i \psi_i$ gives
\[
p_t = D^{1/2} c_1 \psi_1 + D^{1/2} \sum_{i \ge 2} \omega_i^t c_i \psi_i .
\]
As 0 ≤ ωi < 1 for i ≥ 2, the right-hand term must go to zero. On the other hand,
ψ 1 = d 1/2 /kd 1/2 k, so
\[
D^{1/2} c_1 \psi_1 = D^{1/2} \left( \frac{1}{\|d^{1/2}\|} \frac{d^{1/2}}{\|d^{1/2}\|} \right) = \frac{d}{\|d^{1/2}\|^2} = \frac{d}{\sum_a d(a)} = \pi .
\]
This is a perfect example of one of the main uses of spectral theory: to understand what happens
when we repeatedly apply an operator.
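The following short Julia experiment (again a made-up graph, not one of the book's notebooks) repeatedly applies the lazy walk matrix and checks that the distribution converges to $\pi = d/(\mathbf{1}^T d)$:

```julia
using LinearAlgebra

M = [0.0 1 3 1;
     1   0 2 0;
     3   2 0 1;
     1   0 1 0]
d = vec(sum(M, dims=1))
W  = M * Diagonal(1 ./ d)
Wl = 0.5I + 0.5W              # lazy walk matrix

π_stable = d / sum(d)          # stable distribution π = d / (1ᵀ d)
p0 = [1.0, 0, 0, 0]            # walk started at vertex 1
pt = Wl^200 * p0               # two hundred lazy steps
println(maximum(abs.(pt - π_stable)))   # should be very small
```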
The rate of convergence of a lazy random walk to the stable distribution is dictated by ω2 : a
small value of ω2 implies fast convergence.
There are many ways of measuring convergence of a random walk. We will do so point-wise.
Assume that the random walk starts at some vertex a ∈ V . For every vertex b, we will bound how
far p t (b) can be from π(b).
We need merely prove an upper bound on the magnitude of the right-hand term. To this end,
recall that
\[
c_i = \psi_i^T D^{-1/2} \delta_a = \frac{1}{\sqrt{d(a)}}\, \psi_i^T \delta_a .
\]
So,
\[
\delta_b^T D^{1/2} \sum_{i \ge 2} \omega_i^t c_i \psi_i = \sqrt{\frac{d(b)}{d(a)}}\; \delta_b^T \sum_{i \ge 2} \omega_i^t \psi_i \psi_i^T \delta_a .
\]
We bound the magnitude of the last factor:
\begin{align*}
\left| \delta_b^T \sum_{i \ge 2} \omega_i^t \psi_i \psi_i^T \delta_a \right|
&= \left| \sum_{i \ge 2} \omega_i^t\, \delta_b^T \psi_i \psi_i^T \delta_a \right| \\
&\le \sum_{i \ge 2} \omega_i^t \left| \delta_b^T \psi_i \psi_i^T \delta_a \right| \\
&\le \omega_2^t \sum_{i \ge 2} \left| \delta_b^T \psi_i \psi_i^T \delta_a \right| \\
&\le \omega_2^t \sum_{i \ge 1} \left| \delta_b^T \psi_i \psi_i^T \delta_a \right| \\
&\le \omega_2^t \sqrt{\sum_{i \ge 1} \left( \delta_b^T \psi_i \right)^2} \sqrt{\sum_{i \ge 1} \left( \delta_a^T \psi_i \right)^2} \qquad \text{by Cauchy--Schwarz} \\
&= \omega_2^t ,
\end{align*}
as the $\psi_i$ form an orthonormal basis, so each sum under the square roots equals 1.
The walk matrix is closely related to the normalized Laplacian, which is defined by
\[
N \stackrel{\text{def}}{=} D^{-1/2} L D^{-1/2} = I - A .
\]
We let 0 ≤ ν1 ≤ ν2 ≤ · · · ≤ νn denote the eigenvalues of N , and note that they have the same
eigenvectors as A. Other useful relations include
νi = 2 − 2ωi , ωi = 1 − νi /2,
and
\[
\widetilde{W} = I - \frac{1}{2}\, D^{1/2} N D^{-1/2} .
\]
The normalized Laplacian is positive semidefinite and has the same rank as the ordinary
(sometimes called “combinatorial”) Laplacian. There are many advantages of working with the
normalized Laplacian: the mean of its eigenvalues is 1, so they are always on a degree-independent
scale. One can prove that νn ≤ 2, with equality if and only if the graph is bipartite.
The bound in Theorem 10.4.1 can be expressed in the eigenvalues of the normalized Laplacian as
\[
|p_t(b) - \pi(b)| \le \sqrt{\frac{d(b)}{d(a)}}\, (1 - \nu_2/2)^t
\]
for all vertices b. The walk can be considered mixed once $p_t(b)$ is within $\pi(b)/2 = d(b)/(2\, d(V))$ of $\pi(b)$ for every b, where $d(V) = \mathbf{1}^T d$. Using the approximation $1 - x \approx \exp(-x)$, we see that this should happen once
\begin{align*}
\sqrt{\frac{d(b)}{d(a)}}\, (1 - \nu_2/2)^t \le \frac{d(b)}{2\, d(V)}
&\iff (1 - \nu_2/2)^t \le \frac{\sqrt{d(b)\, d(a)}}{2\, d(V)} \\
&\iff \exp(-t \nu_2/2) \le \frac{\sqrt{d(b)\, d(a)}}{2\, d(V)} \\
&\iff -t \nu_2/2 \le \ln\!\left( \frac{\sqrt{d(b)\, d(a)}}{2\, d(V)} \right) \\
&\iff t \ge \frac{2}{\nu_2} \ln\!\left( \frac{2\, d(V)}{\sqrt{d(b)\, d(a)}} \right).
\end{align*}
So, for graphs in which all degrees are approximately constant, this upper bound on the time to
mix is approximately ln(n)/ν2 . For some graphs the ln n term does not appear. Note that
multiplying all edge weights by a constant does not change any of these expressions.
While we have explicitly worked out λ2 for many graphs, we have not done this for ν2 . The
following lemma will allow us to relate bounds on λ2 to bounds on ν2 :
By the Courant–Fischer Theorem,
\[
\nu_i = \min_{\dim(S) = i}\; \max_{x \in S}\; \frac{x^T N x}{x^T x} .
\]
Setting $x = D^{1/2} y$ transforms this into
\[
\min_{\dim(T) = i}\; \max_{y \in T}\; \frac{y^T L y}{y^T D y} .
\]
So,
\[
\min_{\dim(T) = i}\; \max_{y \in T}\; \frac{y^T L y}{y^T D y}
\;\ge\;
\min_{\dim(T) = i}\; \max_{y \in T}\; \frac{y^T L y}{d_{\max}\, y^T y}
\;=\;
\frac{1}{d_{\max}} \min_{\dim(T) = i}\; \max_{y \in T}\; \frac{y^T L y}{y^T y}
\;=\;
\frac{\lambda_i}{d_{\max}} .
\]
The other bound may be proved similarly.
10.6 Examples
We now do some examples. For each we think about the random walk in two ways: by reasoning
directly about how a random walk should behave and by examining ν2 .
As with the path, ν2 for the tree is within a constant of λ2 for the tree, and so is approximately
c/n for some constant c. To understand the random walk on Tn , first note that whenever it is at a
vertex, it is twice as likely to step towards a leaf as it is to step towards the root. So, if the walk
starts at a leaf, there is no way the walk can mix until it reaches the root. The height of the walk
is like a sum of ±1 random variables, except that they are twice as likely to be −1 as they are to
be 1, and that their sum never goes below 0. One can show that we need to wait approximately n
steps before such a walk will hit the root. Once it does hit the root, the walk mixes rapidly.
The dumbbell graph Dn consists of two complete graphs on n vertices, joined by one edge called
the “bridge”. So, there are 2n vertices in total, and all vertices have degree n − 1 or n.
To understand the random walk on this graph, consider starting it at some vertex that is not
attached to the bridge edge. After the first step the walk will be well mixed on the vertices in the
side on which it starts. Because of this, the chance that it finds the edge going to the other side is
only around 1/n2 : there is only a 1/n chance of being at the vertex attached to the bridge edge,
and only a 1/n chance of choosing that edge when at that vertex. So, we must wait some multiple
of n2 steps before there is a reasonable chance that the walk reaches the other side of the graph.
The isoperimetric ratio of this graph is
\[
\theta_{D_n} \sim \frac{1}{n} .
\]
Using the test vector that is 1 on one complete graph and −1 on the other, we can show that
\[
\lambda_2(D_n) \lesssim 1/n .
\]
Proof. For every pair of vertices (a, b), let P(a, b) be a path in G of length at most r. By the path inequality, the graph consisting of just the edge (a, b) satisfies $(a,b) \preccurlyeq r \cdot P(a,b) \preccurlyeq r \cdot G$. Summing over all pairs gives
\[
K_n \preccurlyeq r \binom{n}{2} G,
\]
and so
\[
n \le r \binom{n}{2} \lambda_2(G),
\]
from which the lemma follows.
The diameter of Dn is 3, so we have $\lambda_2(D_n) \ge 2/(3(n-1))$. As every vertex of Dn has degree at least n − 1, we may conclude $\nu_2(D_n) \gtrsim 2/(3(n-1)^2)$.
We define the bolas2 graph Bn to be a graph containing two n-cliques connected by a path of
length n. The bolas graph has a value of ν2 that is almost as small as possible. Equivalently,
random walks on a bolas graph mix almost as slowly as possible.
The analysis of the random walk on a bolas is similar to that on a dumbbell, except that when
the walk is on the first vertex of the path the chance that it gets to the other end before moving
back to the clique at which we started is only 1/n. So, we must wait around n3 steps before there
is a reasonable chance of getting to the other side.
Next lecture, we will learn that we can upper bound ν2 with a test vector using the fact that
\[
\nu_2 = \min_{x \perp d} \frac{x^T L x}{x^T D x} .
\]
To prove an upper bound on ν2 , form a test vector that is n/2 on one clique, −n/2 on the other,
and increases by 1 along the path. We can use the symmetry of the construction to show that this
vector is orthogonal to d . The numerator of the generalized Rayleigh quotient is n, and the
denominator is the sum of the squares of the entries of the vectors times the degrees of the
vertices, which is some constant times n4 . This tells us that ν2 is at most some constant over n3 .
To see that ν2 must be at least some constant over n3 , and in fact that this must hold for every
graph, apply Lemmas 10.5.1 and 10.6.1.
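As a rough numerical check of these estimates, here is a Julia sketch that builds a bolas graph (with a helper we wrote for this illustration; nothing here comes from Laplacians.jl) and compares ν₂ of its normalized Laplacian to 1/n³:

```julia
using LinearAlgebra

# Bolas graph B_n: two n-cliques joined by a path with n edges.
function bolas_adjacency(n)
    N = 3n - 1                       # n + (n-1) + n vertices
    A = zeros(N, N)
    for i in 1:n, j in 1:n           # first clique on vertices 1..n
        i != j && (A[i, j] = 1.0)
    end
    for i in N-n+1:N, j in N-n+1:N   # second clique on vertices 2n..3n-1
        i != j && (A[i, j] = 1.0)
    end
    for i in n:N-n                   # path from vertex n to vertex 2n
        A[i, i+1] = 1.0
        A[i+1, i] = 1.0
    end
    return A
end

n = 10
A = bolas_adjacency(n)
d = vec(sum(A, dims=1))
L = Diagonal(d) - A
Nmat = Diagonal(1 ./ sqrt.(d)) * L * Diagonal(1 ./ sqrt.(d))   # normalized Laplacian
ν = eigvals(Symmetric(Nmat))
println((ν[2], 1 / n^3))     # ν₂ is within a constant factor of 1/n³
```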
10.7 Diffusion
There are a few types of diffusion that people study in a graph, but the most common is closely
related to random walks. In a diffusion process, we imagine that we have some substance that can
occupy the vertices, such as a gas or fluid. At each time step, some of the substance diffuses out
of each vertex. If we say that half the substance stays at a vertex at each time step, and the other
half is distributed among its neighboring vertices, then the distribution of the substance will
evolve according to equation (10.2). That is, probability mass obeys this diffusion equation.
People often consider finer time steps in which smaller fractions of the mass leave the vertices. In the limit, this results in continuous random walks that are modeled by the matrix exponential: if the walk stays put with probability $1 - \epsilon$ in each step, and we view each step as taking time $\epsilon$, then the transition matrix of the walk after time t will be
\[
e^{-t (I - W)} .
\]
These are in many ways more natural than discrete time random walks.
²A bolas is a hunting weapon consisting of two balls or rocks tied together with a cord.
Chapter 11

Walks, Springs, and Resistor Networks
11.1 Overview
In this lecture we will see how the analysis of random walks, spring networks, and resistor
networks leads to the consideration of systems of linear equations in Laplacian matrices. The
main purpose of this lecture is to introduce concepts and language that we will use extensively in
the rest of the course.
The theme of this whole lecture will be harmonic functions on graphs. These will be defined in
terms of a weighted graph G = (V, E, w) and a set of boundary vertices B ⊆ V . We let S = V − B
(I use “-” for set-minus). We will assume throughout this lecture that G is connected and that B
is nonempty.
A function x : V → R is said to be harmonic at a vertex a if the value of x at a is the weighted
average of its values at the neighbors of a where the weights are given by w:
\[
x(a) = \frac{1}{d_a} \sum_{b \sim a} w_{a,b}\, x(b) . \tag{11.1}
\]
Consider the standard (not lazy) random walk on the graph G. Recall that when the walk is at a
vertex a, the probability it moves to a neighbor b is
\[
\frac{w_{a,b}}{d_a} .
\]
Distinguish two special nodes in the graph that we will call s and t, and run the random walk
until it hits either s or t. We view s and t as the boundary, so B = {s, t}.
Let x (a) be the probability that a walk that starts at a will stop at s, rather than at t. We have
the boundary conditions x (s) = 1 and x (t) = 0. For every other node a the chance that the walk
stops at s is the sum over the neighbors b of a of the chance that the walk moves to b, times the
chance that a walk from b stops at s. That is,
\[
x(a) = \sum_{b \sim a} \frac{w_{a,b}}{d_a}\, x(b) .
\]
We begin by imagining that every edge of a graph G = (V, E) is an ideal spring or rubber band.
They are joined together at the vertices. Given such a structure, we will pick a subset of the
vertices B ⊆ V and fix the location of every vertex in B. For example, you could nail each vertex
in B onto a point in the real line, or onto a board in IR2 . We will then study where the other
vertices wind up.
We can use Hooke’s law to figure this out. To begin, assume that each rubber band is an ideal
spring with spring constant 1. If your graph is weighted, then the spring constant of each edge
should be its weight. If a rubber band connects vertices a and b, then Hooke’s law tells us that
the force it exerts at node a is in the direction of b and is proportional to the distance between a
and b. Let x (a) be the position of each vertex a. You should begin by thinking of x (a) being in
IR, but you will see that it is just as easy to make it a vector in IR2 or IRk for any k.
The force the rubber band between a and b exerts on a is
x (b) − x (a).
In a stable configuration, all of the vertices that have not been nailed down must experience a
zero net force. That is
\[
\sum_{b \sim a} (x(b) - x(a)) = 0
\iff \sum_{b \sim a} x(b) = d_a\, x(a)
\iff \frac{1}{d_a} \sum_{b \sim a} x(b) = x(a) .
\]
In a stable configuration, every vertex that is not on the boundary must be the average of its
neighbors.
In the weighted case, we would have for each a ∈ V − B
\[
\frac{1}{d_a} \sum_{b \sim a} w_{a,b}\, x(b) = x(a) .
\]
If we rewrite this equation as
\[
d_a\, x(a) - \sum_{b \sim a} w_{a,b}\, x(b) = 0, \tag{11.2}
\]
we see that it corresponds to the row of the Laplacian matrix corresponding to vertex a. So, we may find a solution to the equations (11.1) by solving a system of equations in the submatrix of the Laplacian indexed by vertices in V − B.
To be more concrete, I will set up those equations. For each vertex a ∈ B, let its position be fixed to f(a). Then, we can re-write equation (11.2) as
\[
d_a\, x(a) - \sum_{b \not\in B : (a,b) \in E} w_{a,b}\, x(b) = \sum_{b \in B : (a,b) \in E} w_{a,b}\, f(b),
\]
¹It can only fail to be unique if there is a connected component that contains no vertices of B.
for each a ∈ V − B. So, all of the boundary terms wind up in the right-hand vector.
Let S = V − B. We now see that this is an equation of the form
\[
L(S, S)\, x(S) = M(S, B)\, f(B),
\]
where M is the weighted adjacency matrix of G.
By L(S, S) I mean the submatrix of L indexed by rows and columns of S, and by x (S) I mean
the sub-vector of x indexed by S.
We can then write the condition that entries of B are fixed to f by
x (B) = f (B).
We have reduced the problem to that of solving a system of equations in a submatrix of the
Laplacian.
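Here is a minimal Julia sketch of this reduction on a made-up weighted graph: fix values on B, solve the system in L(S, S), and check that the solution is harmonic at the internal vertices.

```julia
using LinearAlgebra

# A made-up weighted graph on 5 vertices.
M = [0.0 2 0 0 1;
     2   0 1 0 0;
     0   1 0 3 0;
     0   0 3 0 1;
     1   0 0 1 0]
d = vec(sum(M, dims=1))
L = Diagonal(d) - M

B = [1, 5]                  # boundary vertices, nailed down
S = [2, 3, 4]               # internal vertices
f = [0.0, 1.0]              # boundary values: x(1) = 0, x(5) = 1

xS = L[S, S] \ (M[S, B] * f)   # solve L(S,S) x(S) = M(S,B) f(B)
x = zeros(5); x[B] = f; x[S] = xS

# Each internal vertex is the weighted average of its neighbors.
println(all(x[a] ≈ sum(M[a, b] * x[b] for b in 1:5) / d[a] for a in S))
```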
Submatrices of Laplacians are a lot like Laplacians, except that they are positive definite. To see
this, note that all of the off-diagonals of the submatrix of L agree with all the off-diagonals of the
Laplacian of the induced subgraph on the internal vertices. But, some of the diagonals are larger:
the diagonals of nodes in the submatrix account for both edges in the induced subgraph and
edges to the vertices in B.
Claim 11.5.1. Let L be the Laplacian of G = (V, E, w), let B ⊆ V , and let S = V − B. Then,
\[
L(S, S) = L_{G(S)} + X_S ,
\]
where G(S) is the subgraph induced on the vertices in S and $X_S$ is the diagonal matrix with entries
\[
X_S(a, a) = \sum_{b \sim a,\, b \in B} w_{a,b}, \qquad \text{for } a \in S.
\]
Lemma 11.5.2. Let L be the Laplacian matrix of a connected graph and let X be a nonnegative,
diagonal matrix with at least one nonzero entry. Then, L + X is positive definite.
Proof. We will prove that $x^T(L + X)x > 0$ for every nonzero vector x. As both L and X are positive semidefinite, we have
\[
x^T (L + X) x \;\ge\; \max\!\left( x^T L x,\; x^T X x \right).
\]
If $x^T L x > 0$ we are done. Otherwise, x must be a nonzero constant vector, in which case $x^T X x > 0$, as X is nonnegative with at least one nonzero entry.
Lemma 11.5.3. Let L be the Laplacian matrix of a connected graph G = (V, E, w), let B be a
nonempty, proper subset of V , and let S = V − B. Then, L(S, S) is positive definite.
Proof. Let S_1, . . . , S_k be the connected components of vertices of G(S). We can use these to write L(S, S) as a block matrix with blocks equal to L(S_i, S_i). By Claim 11.5.1, each of these blocks can be written
\[
L(S_i, S_i) = L_{G(S_i)} + X_{S_i} .
\]
As G is connected, there must be some vertex in S_i with an edge to a vertex not in S_i. This implies that $X_{S_i}$ is not the zero matrix, and so we can apply Lemma 11.5.2 to prove that L(S_i, S_i) is positive definite, and hence invertible.
As the matrix L(S, S) is invertible, the equations have a solution, and it must be unique.
11.6 Energy
Physics also tells us that the vertices will settle into the position that minimizes the potential
energy. The potential energy of an ideal linear spring with constant w when stretched to length l
is
\[
\frac{1}{2}\, w\, l^2 .
\]
So, the potential energy in a configuration x is given by
\[
\mathcal{E}(x) \stackrel{\text{def}}{=} \frac{1}{2} \sum_{(a,b) \in E} w_{a,b}\, (x(a) - x(b))^2 . \tag{11.3}
\]
For any x that minimizes the energy, the partial derivative of the energy with respect to each
variable must be zero. In this case, the variables are x (a) for a ∈ S. The partial derivative with
respect to x (a) is
\[
\frac{1}{2} \sum_{b \sim a} w_{a,b} \cdot 2\, (x(a) - x(b)) = \sum_{b \sim a} w_{a,b}\, (x(a) - x(b)) .
\]
Theorem 11.6.1. Let G = (V, E, w) be a connected, weighted graph, let B ⊂ V , and let
S = V − B. Given x (B), E (x ) is minimized by setting x (S) so that x is harmonic on S.
We now consider a related physical model of a graph in which we treat every edge as a resistor. If
the graph is unweighted, we will assume that each resistor has resistance 1. If an edge e has
weight we , we will give the corresponding resistor resistance re = 1/we . The reason is that when
the weight of an edge is very small, the edge is barely there, so it should correspond to very high
resistance. Having no edge corresponds to having a resistor of infinite resistance.
I now let v ∈ IRV be a vector of potentials (voltages) at vertices. Given these potentials, we can
figure out how much current flows on each edge by the formula
\[
i(a, b) = \frac{1}{r_{a,b}} \left( v(a) - v(b) \right) = w_{a,b} \left( v(a) - v(b) \right).
\]
That is, we adopt the convention that current flows from high voltage to low voltage. We would
like to write this equation in matrix form. The one complication is that each edge comes up twice
in i . So, to treat i as a vector we will have each edge show up exactly once as (a, b) when a < b.
We now define the signed edge-vertex adjacency matrix of the graph U to be the matrix with
rows indexed by edges and columns indexed by vertices such that
\[
U((a, b), c) =
\begin{cases}
1 & \text{if } a = c \\
-1 & \text{if } b = c \\
0 & \text{otherwise.}
\end{cases}
\]
We then have
\[
i = W U v .
\]
Also recall that resistor networks cannot hold current. So, all the current entering a vertex a from
edges in the graph must exit a to an external source. Let i ext ∈ IRV denote the external currents,
where i ext (a) is the amount of current entering the graph through node a. We then have
\[
i_{\mathrm{ext}}(a) = \sum_{b \sim a} i(a, b) .
\]
It is often helpful to think of the nodes a for which $i_{\mathrm{ext}}(a) \neq 0$ as being boundary nodes. We will call the other nodes internal. Combining this with $i = W U v$ gives $i_{\mathrm{ext}} = U^T i = U^T W U v = L v$. Let's see what the equation
\[
i_{\mathrm{ext}} = L v \tag{11.4}
\]
means for the internal nodes. If the graph is unweighted and a is an internal node, then the ath row of this equation is
\[
0 = (\delta_a^T L)\, v = \sum_{b \sim a} (v(a) - v(b)) = d_a\, v(a) - \sum_{b \sim a} v(b) .
\]
That is,
\[
v(a) = \frac{1}{d_a} \sum_{b \sim a} v(b),
\]
which means that v is harmonic at a. Of course, the same holds in weighted graphs.
We are often interested in applying (11.4) in the reverse: given a vector of external currents i ext
we solve for the induced voltages by
v = L−1 i ext .
This at first appears problematic, as the Laplacian matrix does not have an inverse. The way
around this problem is to observe that we are only interested in solving these equations for vectors
i ext for which the system has a solution. In the case of a connected graph, this equation will have
a solution if the sum of the values of i ext is zero. That is, if the current going in to the circuit
equals the current going out. These are precisely the vectors that are in the span of the Laplacian.
To obtain the solution to this equation, we multiply i ext by the Moore-Penrose pseudo-inverse of
L.
Definition 11.8.1. The pseudo-inverse of a symmetric matrix L, written $L^+$, is the matrix that has the same span as L and that satisfies
\[
L L^+ = \Pi,
\]
where Π is the orthogonal projection onto the span of L.
It is easy to find a formula for the pseudo-inverse. First, let Ψ be the matrix whose ith column is
ψ i and let Λ be the diagonal matrix with λi on the ith diagonal. Recall that
\[
L = \Psi \Lambda \Psi^T = \sum_i \lambda_i \psi_i \psi_i^T .
\]
Claim 11.8.2.
\[
L^+ = \sum_{i > 1} (1/\lambda_i)\, \psi_i \psi_i^T .
\]
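A quick sanity check of Claim 11.8.2 in Julia (on a small made-up graph): build $L^+$ from the spectral decomposition and verify that $L L^+$ is the projection orthogonal to the all-ones vector (the span condition for a connected graph Laplacian) and that it matches pinv.

```julia
using LinearAlgebra

M = [0.0 1 1 0;
     1   0 1 1;
     1   1 0 1;
     0   1 1 0]
L = Diagonal(vec(sum(M, dims=1))) - M

λ, Ψ = eigen(Symmetric(L))                      # λ[1] ≈ 0 since the graph is connected
Lplus = sum((1 / λ[i]) * Ψ[:, i] * Ψ[:, i]' for i in 2:4)

Π = I - ones(4, 4) / 4                          # projection orthogonal to the all-ones vector
println(maximum(abs.(L * Lplus - Π)))           # ≈ 0
println(maximum(abs.(Lplus - pinv(L))))         # agrees with the Moore-Penrose pseudo-inverse
```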
11.9 Exercise
Moreover, this holds for any symmetric matrix, not just Laplacians.
Chapter 12

Effective Resistance and Schur Complements
The effective resistance between two vertices a and b in an electrical network is the resistance of
the entire network when we treat it as one complex resistor. That is, we reduce the rest of the
network to a single edge. In general, we will see that if we wish to restrict our attention to a
subset of the vertices, B, and if we require all other vertices to be internal, then we can construct
a network just on B that factors out the contributions of the internal vertices. The process by
which we do this is Gaussian elimination, and the Laplacian of the resulting network on B is
called a Schur complement.
We now know that if a resistor network has external currents i ext , then the voltages induced at
the vertices will be given by
v = L+ i ext .
Consider what this means when i ext corresponds to a flow of one unit from vertex a to vertex b.
The resulting voltages are
v = L+ (δ a − δ b ).
Now, let c and d be two other vertices. The potential difference between c and d is
\[
v(c) - v(d) = (\delta_c - \delta_d)^T v = (\delta_c - \delta_d)^T L^+ (\delta_a - \delta_b).
\]
As $L^+$ is symmetric, this equals
\[
(\delta_a - \delta_b)^T L^+ (\delta_c - \delta_d).
\]
So, the potential difference between c and d when we flow one unit from a to b is the same as the
potential difference between a and b when we flow one unit from c to d.
The effective resistance between vertices a and b is the resistance between a and b when we view
the entire network as one complex resistor.
To figure out what this is, recall the equation
\[
i(a, b) = \frac{v(a) - v(b)}{r_{a,b}},
\]
which holds for one resistor. We use the same equation to define the effective resistance of the
whole network between a and b. That is, we consider an electrical flow that sends one unit of
current into node a and removes one unit of current from node b. We then measure the potential
difference between a and b that is required to realize this current, define this to be the effective
resistance between a and b, and write it $R_{\mathrm{eff}}(a, b)$. As it equals the potential difference between a and b in a flow of one unit of current from a to b, we have
\[
R_{\mathrm{eff}}(a, b) \stackrel{\text{def}}{=} (\delta_a - \delta_b)^T L^+ (\delta_a - \delta_b).
\]
We will eventually show that effective resistance is a distance. For now, we observe that effective
resistance is the square of a Euclidean distance.
To this end, let L+/2 denote the square root of L+ . Recall that every positive semidefinite matrix
has a square root: the square root of a symmetric matrix M is the symmetric matrix M 1/2 such
that $(M^{1/2})^2 = M$. If
\[
M = \sum_i \lambda_i \psi_i \psi_i^T
\]
is the spectral decomposition of M, then
\[
M^{1/2} = \sum_i \lambda_i^{1/2} \psi_i \psi_i^T .
\]
We now have
\[
(\delta_a - \delta_b)^T L^+ (\delta_a - \delta_b)
= \left( L^{+/2} (\delta_a - \delta_b) \right)^T \left( L^{+/2} (\delta_a - \delta_b) \right)
= \left\| L^{+/2} (\delta_a - \delta_b) \right\|^2
= \left\| L^{+/2} \delta_a - L^{+/2} \delta_b \right\|^2
= \mathrm{dist}(L^{+/2} \delta_a, L^{+/2} \delta_b)^2 .
\]
As you would imagine, we can also define the effective resistance through effective spring
constants. In this case, we view the network of springs as one large compound network. If we
define the effective spring constant of s, t to be the number w so that when s and t are stretched
to distance l the potential energy in the spring is wl2 /2, then we should define the effective spring
constant to be twice the minimum possible energy of the network,
\[
2\, \mathcal{E}(x) = \sum_{(a,b) \in E} w_{a,b}\, (x(a) - x(b))^2,
\]
when x (s) is fixed to 0 and x (t) is fixed to 1. From Theorem 11.6.1, we know that this vector will
be harmonic on V − {s, t}.
Fortunately, we already know how to compute such a vector x. Set
\[
y = \frac{1}{R_{\mathrm{eff}}(s, t)}\, L^+ (\delta_t - \delta_s).
\]
We have
y (t) − y (s) = (δ t − δ s )T L+ (δ t − δ s )/Reff (s, t) = 1,
and y is harmonic on V − {s, t}. So, we choose
x = y − 1y (s).
The vector x satisfies x (s) = 0, x (t) = 1, and it is harmonic on V − {s, t}. So, it is the vector
that minimizes the energy subject to the boundary conditions.
To finish, we compute the energy to be
\begin{align*}
x^T L x = y^T L y
&= \frac{1}{(R_{\mathrm{eff}}(s,t))^2}\, \left( L^+ (\delta_t - \delta_s) \right)^T L \left( L^+ (\delta_t - \delta_s) \right) \\
&= \frac{1}{(R_{\mathrm{eff}}(s,t))^2}\, (\delta_t - \delta_s)^T L^+ L L^+ (\delta_t - \delta_s) \\
&= \frac{1}{(R_{\mathrm{eff}}(s,t))^2}\, (\delta_t - \delta_s)^T L^+ (\delta_t - \delta_s) \\
&= \frac{1}{R_{\mathrm{eff}}(s,t)} .
\end{align*}
As the weights of edges are the reciprocals of their resistances, and the spring constant
corresponds to the weight, this is the formula we would expect.
Resistor networks have an analogous quantity: the energy dissipation (into heat) when current
flows through the network. It has the same formula. The reciprocal of the effective resistance is
sometimes called the effective conductance.
12.3 Monotonicity
Rayleigh’s Monotonicity Principle tells us that if we alter the spring network by decreasing some
of the spring constants, then the effective spring constant between s and t will not increase. In
terms of effective resistance, this says that if we increase the resistance of some resistors then the
effective resistance can not decrease. This sounds obvious. But, it is in fact a very special
property of linear elements like springs and resistors.
Theorem 12.3.1. Let G = (V, E, w) be a weighted graph and let $\widehat{G} = (V, E, \widehat{w})$ be another weighted graph with the same edges and such that
\[
\widehat{w}_{a,b} \le w_{a,b}
\]
for all (a, b) ∈ E. For vertices s and t, let $c_{s,t}$ be the effective spring constant between s and t in G and let $\widehat{c}_{s,t}$ be the analogous quantity in $\widehat{G}$. Then,
\[
\widehat{c}_{s,t} \le c_{s,t} .
\]
Proof. Let x be the vector of minimum energy in G such that x (s) = 0 and x (t) = 1. Then, the
energy of x in $\widehat{G}$ is no greater:
\[
\widehat{c}_{s,t} \le \sum_{(a,b) \in E} \widehat{w}_{a,b}\, (x(a) - x(b))^2 \le \sum_{(a,b) \in E} w_{a,b}\, (x(a) - x(b))^2 = c_{s,t} .
\]
Similarly, if we let $\widehat{R}_{\mathrm{eff}}(s, t)$ be the effective resistance in $\widehat{G}$ between s and t, then $\widehat{R}_{\mathrm{eff}}(s, t) \ge R_{\mathrm{eff}}(s, t)$. That is, increasing the resistance of resistors in the network cannot decrease effective resistances.
While this principle seems very simple and intuitively obvious, it turns out to fail in just slightly
more complicated situations.
In the case of a path graph with n vertices and edges of weight 1, the effective resistance between
the extreme vertices is n − 1.
In general, if a path consists of edges of resistance r1,2 , . . . , rn−1,n then the effective resistance
between the extreme vertices is
r1,2 + · · · + rn−1,n .
To see this, set the potential of vertex i to
\[
v(i) = r_{1,2} + r_{2,3} + \cdots + r_{i-1,i},
\]
with v(1) = 0. Ohm's law then tells us that the current flow over the edge (i, i + 1) will be
\[
\frac{v(i+1) - v(i)}{r_{i,i+1}} = 1.
\]
If we have k parallel edges between two nodes s and t of resistances r1 , . . . , rk , then the effective
resistance is
\[
R_{\mathrm{eff}}(s, t) = \frac{1}{1/r_1 + \cdots + 1/r_k} .
\]
To see this, impose a potential difference of 1 between s and t. This will induce a flow of
1/ri = wi on edge i. So, the total flow will be
\[
\sum_i 1/r_i = \sum_i w_i .
\]
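Both rules are easy to confirm numerically. The Julia sketch below (a made-up example) computes $R_{\mathrm{eff}}(a,b) = (\delta_a - \delta_b)^T L^+ (\delta_a - \delta_b)$ for a path of resistances 1, 2, 3 and for two disjoint two-edge paths in parallel.

```julia
using LinearAlgebra

# Effective resistance between vertices a and b from the Laplacian pseudo-inverse.
function reff(M, a, b)
    L = Diagonal(vec(sum(M, dims=1))) - M
    δ = zeros(size(M, 1)); δ[a] = 1; δ[b] = -1
    return δ' * pinv(L) * δ
end

# Series: a path with edge weights 1, 1/2, 1/3, i.e. resistances 1, 2, 3.
Mpath = [0 1 0 0; 1 0 1/2 0; 0 1/2 0 1/3; 0 0 1/3 0.0]
println(reff(Mpath, 1, 4) ≈ 1 + 2 + 3)

# Parallel: vertices 1 and 4 joined by two disjoint two-edge paths of
# total resistances 2 (through vertex 2) and 3 (through vertex 3).
Mpar = [0   1   1/2 0;
        1   0   0   1;
        1/2 0   0   1;
        0   1   1   0.0]
println(reff(Mpar, 1, 4) ≈ 1 / (1/2 + 1/3))
```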
We have shown that the impact of the entire network on two vertices can be reduced to a network
with one edge between them. We will now see that we can do the same for a subset of the
vertices. We will do this in two ways: first by viewing L as an operator, and then by considering
it as a quadratic form.
Let B be the subset of nodes that we would like to understand (B stands for boundary). All
nodes not in B will be internal. Call them I = V − B.
As an operator, the Laplacian maps vectors of voltages to vectors of external currents. We want
to examine what happens if we fix the voltages at vertices in B, and require the rest to be
harmonic. Let v (B) ∈ IRB be the voltages at B. We want the matrix LB such that
i B = LB v (B)
is the vector of external currents at vertices in B when we impose voltages v (B) at vertices of B.
As the internal vertices will have their voltages set to be harmonic, they will not have any
external currents.
The remarkable fact that we will discover is that LB is in fact a Laplacian matrix, and that it is
obtained by performing Gaussian elimination to remove the internal vertices. Warning: LB is
not a submatrix of L. To prove this, we will move from V to B by removing one vertex at a time.
We’ll start with a graph G = (V, E, w), and we will set B = {2, . . . , n}, and we will treat vertex 1
as internal. Let N denote the set of neighbors of vertex 1.
We want to compute Lv given v (b) for b ∈ B, and that
\[
v(1) = \frac{1}{d(1)} \sum_{a \in N} w_{1,a}\, v(a). \tag{12.1}
\]
That is, we want to substitute the value on the right-hand side for v (1) everywhere that it
appears in the equation i ext = Lv . The variable v (1) only appears in the equation for i ext (a)
when a ∈ N . When it does, it appears with coefficient w1,a . Recall that the equation for i ext (b) is
\[
i_{\mathrm{ext}}(b) = d(b)\, v(b) - \sum_{c \sim b} w_{b,c}\, v(c).
\]
For b ∈ N we expand this by making the substitution for v (1) given by (12.1).
\begin{align*}
i_{\mathrm{ext}}(b) &= d(b)\, v(b) - w_{b,1}\, v(1) - \sum_{c \sim b,\, c \neq 1} w_{b,c}\, v(c) \\
&= d(b)\, v(b) - w_{b,1} \frac{1}{d(1)} \sum_{a \in N} w_{1,a}\, v(a) - \sum_{c \sim b,\, c \neq 1} w_{b,c}\, v(c) \\
&= d(b)\, v(b) - \sum_{a \in N} \frac{w_{b,1}\, w_{a,1}}{d(1)}\, v(a) - \sum_{c \sim b,\, c \neq 1} w_{b,c}\, v(c).
\end{align*}
To finish, observe that b ∈ N , so we are counting b in the middle sum above. Removing the
double-count gives
\[
i_{\mathrm{ext}}(b) = \left( d(b) - w_{b,1}^2 / d(1) \right) v(b) - \sum_{a \in N,\, a \neq b} \frac{w_{b,1}\, w_{a,1}}{d(1)}\, v(a) - \sum_{c \sim b,\, c \neq 1} w_{b,c}\, v(c).
\]
We will show that these revised equations have two interesting properties: they are the result of
applying Gaussian elimination to eliminate vertex 1, and the resulting equations are Laplacian.
Let's look at exactly how the matrix has changed. In the row for vertex b, the edge to vertex 1 was removed, and edges to every vertex a ∈ N were added with weights $w_{b,1} w_{a,1} / d(1)$. And, the diagonal was decreased by $w_{b,1}^2 / d(1)$. Overall, the star of edges based at 1 was removed, and a clique on N was added in which edge (a, b) has weight
\[
\frac{w_{b,1}\, w_{1,a}}{d(1)} .
\]
To see that this new system of equations comes from a Laplacian, we observe that

1. It is symmetric.

2. The off-diagonal entries are nonpositive, as the added entries $-w_{b,1} w_{a,1}/d(1)$ are nonpositive.

3. The sum of the changes in diagonal and off-diagonal entries is zero, so the row-sum is still zero. This follows from
\[
w_{b,1} - \frac{w_{b,1}^2}{d(1)} - \sum_{a \in N,\, a \neq b} \frac{w_{b,1}\, w_{a,1}}{d(1)} = w_{b,1} - \sum_{a \in N} \frac{w_{b,1}\, w_{a,1}}{d(1)} = 0.
\]
We now do this in terms of the quadratic form. That is, we will compute the matrix LB so that
v (B)T LB v (B) = v T Lv ,
given that v is harmonic at vertex 1 and agrees with v (B) elsewhere. The quadratic form that we
want to compute is thus given by
\[
\begin{pmatrix} \frac{1}{d(1)} \sum_{b \sim 1} w_{1,b}\, v(b) \\[4pt] v(B) \end{pmatrix}^{\!T}
L
\begin{pmatrix} \frac{1}{d(1)} \sum_{b \sim 1} w_{1,b}\, v(b) \\[4pt] v(B) \end{pmatrix}.
\]
So that we can write this in terms of the entries of the Laplacian matrix, note that
d (1) = L(1, 1), and so
\[
v(1) = \frac{1}{d(1)} \sum_{b \sim 1} w_{1,b}\, v(b) = -\frac{1}{L(1,1)}\, L(1, B)\, v(B).
\]
The quadratic form expands to
\[
v(B)^T L(B,B)\, v(B) + L(1,1)\, v(1)^2 + 2\, v(1)\, L(1,B)\, v(B).
\]
Substituting $v(1) = -(1/L(1,1))\, L(1,B)\, v(B)$, this equals
\begin{align*}
& v(B)^T L(B,B)\, v(B) + (L(1,B)\, v(B))^2 / L(1,1) - 2\, (L(1,B)\, v(B))^2 / L(1,1) \\
&= v(B)^T L(B,B)\, v(B) - (L(1,B)\, v(B))^2 / L(1,1).
\end{align*}
Thus,
\[
L_B = L(B,B) - \frac{L(B,1)\, L(1,B)}{L(1,1)} .
\]
To see that this is the matrix that appears in rows and columns 2 through n when we eliminate
the entries in the first column of L by adding multiples of the first row, note that we eliminate
entry L(a, 1) by adding −L(a, 1)/L(1, 1) times the first row of the matrix to L(a, :). Doing this
for all rows in B = {2, . . . , n} results in this formula.
We can again check that LB is a Laplacian matrix. It is clear from the formula that it is
symmetric and that the off-diagonal entries are negative. To check that the constant vectors are
in the nullspace, we can show that the quadratic form is zero on those vectors. If v (B) is a
constant vector, then v (1) must equal this constant, and so v is a constant vector and the value
of the quadratic form is 0.
We can of course use the same procedure to eliminate many vertices. We begin by partitioning
the vertex set into boundary vertices B and internal vertices I. We can then use Gaussian
elimination to eliminate all of the internal vertices. You should recall that the submatrices
produced by Gaussian elimination do not depend on the order of the eliminations. So, you may
conclude that the matrix LB is uniquely defined.
Or, observe that to eliminate the entries in row a ∈ B and columns in S, using the rows in S, we
need to add those rows, L(S, :) to row L(a, :) with coefficients c so that
L(a, S) + cL(S, S) = 0.
This gives
c = −L(a, S)L(S, S)−1 ,
and thus row a becomes
L(a, :) − L(a, S)L(S, S)−1 L(S, :).
When the internal vertices S are required to be harmonic, they carry no external currents, so $0 = i_{\mathrm{ext}}(S) = L(S,S)\, v(S) + L(S,B)\, v(B)$, which implies
\[
v(S) = -L(S,S)^{-1} L(S,B)\, v(B) = L(S,S)^{-1} M(S,B)\, v(B),
\]
as M(S, B) = −L(S, B), because off-diagonal blocks of the Laplacian equal the negative of the corresponding blocks in the adjacency matrix. This gives
i ext (B) = L(B, S)v (S) + L(B, B)v (B) = −L(B, S)L(S, S)−1 L(S, B)v (B) + L(B, B)v (B),
and so
i ext (B) = LB v (B), where LB = L(B, B) − L(B, S)L(S, S)−1 L(S, B)
is the Schur complement.
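The following Julia sketch (our own toy example) forms this Schur complement for a small graph and checks two things: that $L_B$ is again a Laplacian (zero row sums, nonpositive off-diagonals), and that it preserves effective resistances between the boundary vertices.

```julia
using LinearAlgebra

M = [0.0 1 2 0 0;
     1   0 1 1 0;
     2   1 0 0 1;
     0   1 0 0 3;
     0   0 1 3 0]
L = Diagonal(vec(sum(M, dims=1))) - M

B = [1, 2, 3]                                   # boundary vertices
S = [4, 5]                                      # internal vertices to eliminate
LB = L[B, B] - L[B, S] * (L[S, S] \ L[S, B])    # Schur complement

println(maximum(abs.(sum(LB, dims=2))))          # row sums ≈ 0
println(all(LB[i, j] <= 1e-12 for i in 1:3, j in 1:3 if i != j))

# Effective resistances between boundary vertices are unchanged.
reff(A, a, b) = (δ = zeros(size(A, 1)); δ[a] = 1; δ[b] = -1; δ' * pinv(A) * δ)
println(reff(L, 1, 2) ≈ reff(LB, 1, 2))
```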
This gives us a way of understanding how Gaussian elimination solves a system of equations like
i ext = Lv . It constructs a sequence of graphs, G2 , . . . , Gn , so that Gi is the effective network on
vertices i, . . . , n. It then solves for the entries of v backwards. Given v (i + 1), . . . , v (n) and
i ext (i), we can solve for v (i). If i ext (i) = 0, then v (i) is set to the weighted average of its
neighbors. If not, then we need to take i ext (i) into account here and in the elimination as well. In
the case in which we fix some vertices and let the rest be harmonic, there is no such complication.
We claim that the effective resistance is a distance. The only non-trivial part to prove is the
triangle inequality, (4).
From the previous section, we know that it suffices to consider graphs with only three vertices: we
can reduce any graph to one on just vertices a, b and c without changing the effective resistances
between them.
Proof. Let
z = wa,b , y = wa,c , and x = wb,c .
If we eliminate vertex c, we create an edge between vertices a and b of weight
\[
\frac{xy}{x+y} .
\]
Adding this to the edge that is already there produces weight $z + \frac{xy}{x+y}$, so
\[
R_{\mathrm{eff}}(a, b) = \frac{1}{z + \frac{xy}{x+y}} = \frac{1}{\frac{zx + zy + xy}{x+y}} = \frac{x+y}{zx + zy + xy} .
\]
Working symmetrically, we find that we need to prove that for all positive x, y, and z
\[
\frac{x+y}{zx + zy + xy} + \frac{y+z}{zx + zy + xy} \ge \frac{x+z}{zx + zy + xy},
\]
which is of course true.
Chapter 13

Random Spanning Trees
13.1 Introduction
In this chapter we present one of the most fundamental results in Spectral Graph Theory: the Matrix-Tree Theorem. It relates the number of spanning trees of a connected graph to the
determinants of principal minors of the Laplacian. We then extend this result to relate the
fraction of spanning trees that contain a given edge to the effective resistance of the entire graph
between the edge’s endpoints.
13.2 Determinants
To begin, we review some facts about determinants of matrices and characteristic polynomials.
We first recall the Leibniz formula for the determinant of a square matrix A:
\[
\det(A) = \sum_{\pi} \mathrm{sgn}(\pi) \left( \prod_{i=1}^{n} A(i, \pi(i)) \right), \tag{13.1}
\]
where the sum is over all permutations π of {1, . . . , n}.
Elementary row operations do not change the determinant. If the columns of A are the vectors
a 1 , . . . , a n , then for every c
\[
\det\left( a_1, a_2, \ldots, a_n \right) = \det\left( a_1, a_2, \ldots, a_n + c\, a_1 \right).
\]
This fact gives us two ways of computing the determinant. The first comes from the fact that we
can apply elementary row operations to transform A into an upper triangular matrix, and (13.1)
tells us that the determinant of an upper triangular matrix is the product of its diagonal entries.
The second comes from the observation that the determinant is the volume of the parallelepiped with axes $a_1, \ldots, a_n$: the polytope whose corners are the origin and $\sum_{i \in S} a_i$ for every S ⊆ {1, . . . , n}. Let $\Pi_{a_1}$ be the symmetric projection orthogonal to $a_1$. As this projection amounts to subtracting off a multiple of $a_1$ and elementary row operations do not change the determinant,
\[
\det\left( a_1, a_2, \ldots, a_n \right) = \det\left( a_1, \Pi_{a_1} a_2, \ldots, \Pi_{a_1} a_n \right).
\]
The volume of this parallelepiped is $\|a_1\|$ times the volume of the parallelepiped formed by the vectors $\Pi_{a_1} a_2, \ldots, \Pi_{a_1} a_n$. I would like to write this as a determinant, but must first deal with
the fact that these are n − 1 vectors in an n dimensional space. The way we first learn to handle
this is to project them into an n − 1 dimensional space where we can take the determinant.
Instead, we will employ other elementary symmetric functions of the eigenvalues. These appear as coefficients of the characteristic polynomial
\[
\det(xI - A) = \sum_{k=0}^{n} (-1)^k\, \sigma_k(A)\, x^{n-k},
\]
where $\sigma_k(A)$ is the kth elementary symmetric function of the eigenvalues of A, counted with algebraic multiplicity:
\[
\sigma_k(A) = \sum_{|S| = k} \prod_{i \in S} \lambda_i .
\]
Thus, σ1 (A) is the trace and σn (A) is the determinant. From this formula, we know that these
functions are invariant under similarity transformations.
In Exercise 3 from Lecture 2, you were asked to prove that
\[
\sigma_k(A) = \sum_{|S| = k} \det(A(S, S)). \tag{13.3}
\]
This follows from applying the Leibniz formula (13.1) to det(xI − A).
If we return to the vectors $\Pi_{a_1} a_2, \ldots, \Pi_{a_1} a_n$ from the previous section, we see that the volume of their parallelepiped may be written
\[
\sigma_{n-1}\left( \mathbf{0}_n,\; \Pi_{a_1} a_2,\; \ldots,\; \Pi_{a_1} a_n \right),
\]
Recall that the matrices BB T and B T B have the same eigenvalues, up to some zero eigenvalues
if they are rectangular. So,
σk (BB T ) = σk (B T B).
This gives us one other way of computing the absolute value of the product of the nonzero
eigenvalues of the matrix
\[
\left( \Pi_{a_1} a_2,\; \ldots,\; \Pi_{a_1} a_n \right).
\]
We can instead compute their square by computing the determinant of the square matrix
\[
\begin{pmatrix} (\Pi_{a_1} a_2)^T \\ \vdots \\ (\Pi_{a_1} a_n)^T \end{pmatrix}
\begin{pmatrix} \Pi_{a_1} a_2 & \cdots & \Pi_{a_1} a_n \end{pmatrix}.
\]
When B is a singular matrix of rank k, $\sigma_k(B)$ acts as the determinant of B restricted to its span. Thus, there are situations in which $\sigma_k$ is multiplicative. For example, if A and B both have rank k and the range of A is orthogonal to the nullspace of B, then
\[
\sigma_k(AB) = \sigma_k(A)\, \sigma_k(B). \tag{13.4}
\]
We will use this identity in the case that A and B are symmetric and have the same nullspace.
We will state a slight variant of the standard Matrix-Tree Theorem. Recall that a spanning tree of a graph is a subgraph that is a tree and that contains all of the vertices.
Theorem 13.4.1. Let G = (V, E, w) be a connected, weighted graph. Then
\[
\sigma_{n-1}(L_G) = n \sum_{\text{spanning trees } T}\; \prod_{e \in T} w_e .
\]
Thus, the eigenvalues allow us to count the sum over spanning trees of the product of the weights
of edges in those trees. When all the edge weights are 1, we just count the number of spanning
trees in G.
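As a quick numerical illustration (our own, for the unweighted complete graph K₄, which has 4^{4−2} = 16 spanning trees by Cayley's formula), one can compare $\sigma_{n-1}(L_G)/n$ with the determinant of a principal minor:

```julia
using LinearAlgebra

n = 4
M = ones(n, n) - I                  # adjacency matrix of K_4
L = Diagonal(vec(sum(M, dims=1))) - M

λ = eigvals(Symmetric(L))
println(prod(λ[2:end]) / n)          # σ_{n-1}(L_G)/n ≈ 16 spanning trees

# The more familiar Matrix-Tree statement: any principal (n-1)×(n-1) minor.
println(det(L[2:end, 2:end]))        # ≈ 16
```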
We first prove this in the case that G is just a tree.
Lemma 13.4.2. Let G = (V, E, w) be a weighted tree. Then,
\[
\sigma_{n-1}(L_G) = n \prod_{e \in E} w_e .
\]
Proof of Theorem 13.4.1 . As in the previous lemma, let LG = U T W U and B = W 1/2 U . So,
\begin{align*}
\sigma_{n-1}(L_G) &= \sigma_{n-1}(B^T B) \\
&= \sigma_{n-1}(B B^T) \\
&= \sum_{|S| = n-1,\, S \subseteq E} \sigma_{n-1}\left( B(S,:)\, B(S,:)^T \right) && \text{(by (13.3))} \\
&= \sum_{|S| = n-1,\, S \subseteq E} \sigma_{n-1}\left( B(S,:)^T B(S,:) \right) \\
&= \sum_{|S| = n-1,\, S \subseteq E} \sigma_{n-1}\left( L_{G_S} \right),
\end{align*}
where by GS we mean the graph containing just the edges in S. As S contains n − 1 edges, this
graph is either disconnected or a tree. If it is disconnected, then its Laplacian has at least two
zero eigenvalues and σn−1 (LGS ) = 0. If it is a tree, we apply the previous lemma. Thus, the sum
equals
\[
\sum_{\text{spanning trees } T \subseteq E} \sigma_{n-1}(L_{G_T}) = n \sum_{\text{spanning trees } T}\; \prod_{e \in T} w_e .
\]
The leverage score of an edge e, written $\ell_e$, is defined to be $w_e R_{\mathrm{eff}}(e)$. That is, the weight of the edge times the effective resistance between its endpoints. The leverage score serves as a measure of how important the edge is. For example, if removing an edge disconnects the graph, then $R_{\mathrm{eff}}(e) = 1/w_e$, as all current flowing between its endpoints must use the edge itself, and $\ell_e = 1$.
Consider sampling a random spanning tree with probability proportional to the product of the
weights of its edges. We will now show that the probability that edge e appears in the tree is
exactly its leverage score.
Theorem 13.5.1. If we choose a spanning tree T with probability proportional to the product of
its edge weights, then for every edge e
\[
\Pr[e \in T] = \ell_e .
\]
For simplicity, you might want to begin by thinking about the case where all edges have weight 1.
Recall that the effective resistance of edge e = (a, b) is
\[
(\delta_a - \delta_b)^T L_G^+ (\delta_a - \delta_b),
\]
and so
\[
\ell_{a,b} = w_{a,b}\, (\delta_a - \delta_b)^T L_G^+ (\delta_a - \delta_b).
\]
We can write a matrix Γ that has all these terms on its diagonal by letting U be the edge-vertex
adjacency matrix, W be the diagonal edge weight matrix, B = W 1/2 U , and setting
\[
\Gamma = B L_G^+ B^T .
\]
The rows and columns of Γ are indexed by edges, and for each edge e,
\[
\Gamma(e, e) = \ell_e .
\]
For off-diagonal entries corresponding to edges (a, b) and (c, d), we have
\[
\Gamma((a,b), (c,d)) = \sqrt{w_{a,b}}\, \sqrt{w_{c,d}}\; (\delta_a - \delta_b)^T L_G^+ (\delta_c - \delta_d).
\]
Claim 13.5.2. The matrix Γ is a symmetric projection matrix and has trace n − 1.
Proof. The matrix Γ is clearly symmetric. To show that it is a projection, it suffices to show that
all of its eigenvalues are 0 or 1. This is true because, excluding the zero eigenvalues, Γ has the
same eigenvalues as
\[
L_G^+ B^T B = L_G^+ L_G = \Pi,
\]
where Π is the projection orthogonal to the all 1 vector. As Π has n − 1 eigenvalues that are 1,
so does Γ.
This is a good sanity check on Theorem 13.5.1: every spanning tree has n − 1 edges, and thus the
probabilities that each edge is in the tree must sum to n − 1.
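A small Julia check of this fact on a made-up weighted graph: compute $\ell_e = w_e R_{\mathrm{eff}}(e)$ for every edge and verify that the scores sum to n − 1.

```julia
using LinearAlgebra

M = [0.0 2 1 0;
     2   0 1 1;
     1   1 0 3;
     0   1 3 0]
L = Diagonal(vec(sum(M, dims=1))) - M
Lp = pinv(L)

edges = [(a, b) for a in 1:4 for b in a+1:4 if M[a, b] > 0]
lev(a, b) = M[a, b] * (Lp[a, a] + Lp[b, b] - 2Lp[a, b])   # w_e * Reff(e)
ℓ = [lev(a, b) for (a, b) in edges]

println(ℓ)
println(sum(ℓ) ≈ 4 - 1)    # leverage scores sum to n - 1
```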
We also obtain another formula for the leverage score. As a symmetric projection is its own
square,
\[
\Gamma(e, e) = \Gamma(e, :)\, \Gamma(e, :)^T = \|\Gamma(e, :)\|^2 .
\]
This is the formula I introduced in Section ??. If we flow 1 unit from a to b, the potential difference between c and d is $(\delta_a - \delta_b)^T L_G^+ (\delta_c - \delta_d)$. If we plug these potentials into the
Laplacian quadratic form, we obtain the effective resistance. Thus this formula says
\[
w_{a,b}\, R_{\mathrm{eff}}(a,b) = \sum_{(c,d) \in E} w_{a,b}\, w_{c,d} \left( (\delta_a - \delta_b)^T L_G^+ (\delta_c - \delta_d) \right)^2 .
\]
Proof of Theorem 13.5.1. Let Span(G) denote the set of spanning trees of G. For an edge e,
\begin{align*}
\Pr_T[e \in T] &= \sum_{T \in \mathrm{Span}(G) : e \in T} \frac{\sigma_{n-1}(L_{G_T})}{\sigma_{n-1}(L_G)} \\
&= \sum_{T \in \mathrm{Span}(G) : e \in T} \sigma_{n-1}(L_{G_T})\, \sigma_{n-1}(L_G^+) \\
&= \sum_{T \in \mathrm{Span}(G) : e \in T} \sigma_{n-1}(L_{G_T} L_G^+),
\end{align*}
T ∈Span(G):e∈T
by (13.4). Recalling that the subsets of n − 1 edges that are not spanning trees contribute 0
allows us to re-write this sum as
\[
\sum_{|S| = n-1,\, e \in S} \sigma_{n-1}(L_{G_S} L_G^+).
\]
\begin{align*}
\sigma_{n-1}\left( \Gamma(S,:)\, \Gamma(:,S) \right)
&= \|\gamma_e\|^2\, \sigma_{n-2}\left( \Gamma(S,:)\, \Pi_{\gamma_e}\, \Gamma(:,S) \right)
 = \|\gamma_e\|^2\, \sigma_{n-2}\left( (\Gamma \Pi_{\gamma_e} \Gamma)(S, S) \right) \\
&= \|\gamma_e\|^2\, \sigma_{n-2}\left( \Gamma \Pi_{\gamma_e} \Gamma \right) \\
&= \|\gamma_e\|^2 \\
&= \ell_e .
\end{align*}
Chapter 14

Approximating Effective Resistances
In this chapter, we will see how to use the Johnson-Lindenstrauss Lemma, one of the major
techniques for dimension reduction, to approximately represent and compute effective resistances.
Throughout this chapter, G = (V, E, w) will be a connected, weighted graph with n vertices and
m edges.
We begin by considering the problem of building a data structure from which one can quickly
estimate the effective resistance between every pair of vertices a, b ∈ V . To do this, we exploit the
fact from Section 12.1 that effective resistances can be expressed as squares of Euclidean distances:
\begin{align*}
R_{\mathrm{eff}}(a, b) &= (\delta_a - \delta_b)^T L^+ (\delta_a - \delta_b) \\
&= \left\| L^{+/2} (\delta_a - \delta_b) \right\|^2 \\
&= \left\| L^{+/2} \delta_a - L^{+/2} \delta_b \right\|^2 \\
&= \mathrm{dist}\left( L^{+/2} \delta_a,\, L^{+/2} \delta_b \right)^2 .
\end{align*}
One other way of expressing the above terms is through a matrix norm. For a positive semidefinite matrix A, the matrix norm in A is defined by
\[
\| x \|_A = \sqrt{x^T A x} = \left\| A^{1/2} x \right\| .
\]
It is worth observing that this is in fact a norm: it is zero when x is zero, it is symmetric, and it obeys the triangle inequality: for x + y = z,
\[
\| z \|_A \le \| x \|_A + \| y \|_A .
\]
The Johnson-Lindenstrauss Lemma [JL84] tells us that every Euclidean metric on n points is
well-approximated by a Euclidean metric in O(log n) dimensions, regardless of the original
dimension of the points. Johnson and Lindenstrauss proved this by applying a random orthogonal
projection to the points. As is now common, we will analyze the simpler operation of applying a
random matrix of Gaussian random variables (also known as Normal variables). All Gaussian
random variables that appear in this chapter will have mean 0.
We recall that a Gaussian random variable of variance 1 has probability density
\[
p(x) = \frac{1}{\sqrt{2\pi}} \exp(-x^2/2),
\]
and that a Gaussian random variable of variance σ 2 has probability density
\[
p(x) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp(-x^2 / 2\sigma^2).
\]
The distribution of such a variable is written N (0, σ 2 ), where the 0 corresponds to the mean
being 0. A variable with distribution N (0, σ 2 ) may be obtained by sampling one with distribution
N (0, 1), and then multiplying it by σ. Gaussian random variables have many special properties,
some of which we will see in this chapter. For those who are not familiar with them, we begin by
mentioning that they are the limit of a binomial distribution. If X is the sum of n ±1 random
variables for large n, then
\[
\Pr\left[ X / \sqrt{n} = t \right] \to p(t).
\]
Theorem 14.1.1. Let $x_1, \ldots, x_n$ be vectors in IR^k. For any $\epsilon, \delta > 0$, let $d = 8 \ln(n^2/\delta)/\epsilon^2$. If R is a d-by-k matrix of independent N(0, 1/d) variables, then with probability at least 1 − δ, for all a ≠ b,
\[
(1 - \epsilon)\, \mathrm{dist}(x_a, x_b)^2 \le \mathrm{dist}(R x_a, R x_b)^2 \le (1 + \epsilon)\, \mathrm{dist}(x_a, x_b)^2 .
\]
Thus, if we set $d = 8 \ln(n^2/\delta)/\epsilon^2$, let R be a d-by-n matrix of independent N(0, 1/d) variables, and set $y_a = R L^{+/2} \delta_a$ for each a ∈ V, then with probability at least 1 − δ we will have that for every a and b, $R_{\mathrm{eff}}(a, b)$ is within a $1 \pm \epsilon$ factor of $\mathrm{dist}(y_a, y_b)^2$. Whereas writing all effective resistances would require $\binom{n}{2}$ numbers, storing $y_1, \ldots, y_n$ only requires nd numbers.

We remark that the 8 in the theorem can be replaced with a constant that tends towards 4 as ε goes to zero.
Note that the naive way of computing one effective resistance requires solving one Laplacian
system: (δ a − δ b )T L+ (δ a − δ b ). We will see that we can approximate all of them by solving a
logarithmic number of such systems.
If we could quickly multiply a vector by L+/2 , then this would give us a fast way of approximately
computing all effective resistances. All we would need to do is multiply each of the d rows of R by
L+/2 . This would provide the matrix RL+/2 , from which we could compute RL+/2 δ a by selecting
the ath column. This leads us to ask how quickly we can multiply a vector by L+/2 . Cheng,
Cheng, Liu, Peng and Teng [CCL+ 15] show that this can be done in nearly-linear time. In this
section, we will present a more elementary approach that merely requires solving systems of
equations in Laplacian matrices. We will see in Chapter ?? that this can be done very quickly.
The key is to realize that we do not actually need to multiply by the square root of the
pseudoinverse of the Laplacian. Any matrix M such that M T M = L+ will suffice.
Recall that we can write L = U T W U , where U is the signed edge-vertex adjacency matrix and
W is the diagonal matrix of edge weights. We then have
\[
L^+ U^T W^{1/2} \cdot W^{1/2} U L^+ = L^+ L L^+ = L^+ .
\]
So,
\[
\left\| W^{1/2} U L^+ (\delta_a - \delta_b) \right\|^2 = R_{\mathrm{eff}}(a, b).
\]
Now, we let R be a d-by-m matrix of independent N (0, 1/d) entries, and compute
RW 1/2 U L+ = (RW 1/2 U )L+ .
This requires multiplying d vectors in IRm by W 1/2 U , and solving d systems of linear equations
in L. We then set
y a = (RW 1/2 U )L+ δ a .
Each of these is a vector in d dimensions, and with high probability ky a − y b k2 is a good
approximation of Reff (a, b).
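Here is a minimal Julia sketch of this scheme on a made-up graph. To keep it short we apply the pseudo-inverse directly instead of a fast Laplacian solver, but the structure is as described above: project $W^{1/2}U$ with a d-by-m Gaussian matrix R and then apply $L^+$.

```julia
using LinearAlgebra, Random

Random.seed!(1)

M = [0.0 1 2 0 0;              # made-up weighted adjacency matrix
     1   0 1 1 0;
     2   1 0 0 1;
     0   1 0 0 3;
     0   0 1 3 0]
n = size(M, 1)
L = Diagonal(vec(sum(M, dims=1))) - M

edges = [(a, b) for a in 1:n for b in a+1:n if M[a, b] > 0]
m = length(edges)
U = zeros(m, n)                 # signed edge-vertex adjacency matrix
for (k, (a, b)) in enumerate(edges)
    U[k, a] = 1; U[k, b] = -1
end
W = Diagonal([M[a, b] for (a, b) in edges])

d = 200                                      # projection dimension
R = randn(d, m) / sqrt(d)                    # independent N(0, 1/d) entries
Y = (R * sqrt(W) * U) * pinv(L)              # column a is the vector y_a

Lp = pinv(L)
reff(a, b) = Lp[a, a] + Lp[b, b] - 2Lp[a, b]
approx(a, b) = sum(abs2, Y[:, a] - Y[:, b])
println(approx(1, 4) / reff(1, 4))           # close to 1
```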
One way to remember this is to recall that for a N (0, σ 2 ) random variable r, Er2 = σ 2 , and the
variance of the sum of independent random variables is the sum of their variances. The above
claim adds the fact that the sum is also Gaussian.
In particular, if x is an arbitrary vector and r is a vector of independent N (0, 1) random
variables, then $x^T r$ is a Gaussian random variable of variance $\|x\|^2$. This follows because $x(i)\, r(i)$ has variance $x(i)^2$, and
\[
x^T r = \sum_i x(i)\, r(i).
\]
Finally, the probability that $X - d > \epsilon d$ or $X - d < -\epsilon d$ is at most the sum of these probabilities, which is at most $2 \exp(-t)$.
We remark that for small ε the term $\epsilon^2 d / \sqrt{8}$ dominates, and the upper bound approaches $\epsilon^2 d / \sqrt{2}$. If one pushes this into the proof below, we see that it suffices to project into a space of dimension just a little more than $4 \ln(n^2/\delta)/\epsilon^2$, instead of $8 \ln(n^2/\delta)/\epsilon^2$.
Proof of Theorem 14.1.1. First consider an arbitrary a and b, and let $\Delta = \|x_a - x_b\|^2$. Then $R(x_a - x_b)$ is a d-dimensional vector of $N(0, \sigma^2)$ variables, where $\sigma^2 = \Delta/d$. Thus, Corollary 14.3.3 tells us that the probability that $\|R(x_a - x_b)\|^2$ is not within a $1 \pm \epsilon$ factor of Δ is at most
\[
2 \exp(-\epsilon^2 d / 8) \le 2 \exp(-\ln(n^2/\delta)) = \frac{2\delta}{n^2} .
\]
As there are $\binom{n}{2}$ possible choices for a and b, the probability that there is one such that
\[
\| R(x_a - x_b) \|^2 \not\in (1 \pm \epsilon)\, \| x_a - x_b \|^2
\]
is at most
\[
\binom{n}{2} \frac{2\delta}{n^2} < \delta .
\]
Chapter 15

Tutte's Theorem: How to Draw a Graph
We prove Tutte’s theorem [Tut63], which shows how to use spring embeddings to obtain planar
drawings of 3-connected planar graphs. One begins by selecting a face, and then nailing down the
positions of its vertices to the corners of a strictly convex polygon. Of course, the edges of the
face should line up with the edges of the polygon. Every other vertex goes where the springs say it should: to the center of gravity of its neighbors. Tutte proved that the result is a planar embedding of the planar graph. Here is an image of such an embedding.
The presentation in this lecture is based on notes given to me by Jim Geelen. I begin by
recalling some standard results about planar graphs that we will assume.
A planar drawing of a graph G = (V, E) consists of a mapping from the vertices to the plane,
z : V → IR2 , along with interior-disjoint curves for each edge. The curve for edge (a, b) starts at
z (a), ends at z (b), never crosses itself, and its interior does not intersect the curve for any other
edge. A graph is planar if it has a planar drawing. There can, of course, be many planar drawings
of a graph.
If one removes the curves corresponding to the edges in a planar drawing, one divides the plane
into connected regions called faces. In a 3-connected planar graph, the sets of vertices and edges
that border each face are the same in every planar drawing. There are planar graphs that are not 3-connected, like those in Figures 15.1 and 15.2, in which different planar drawings result in combinatorially different faces. We will only consider 3-connected planar graphs.
Figure 15.1: Planar graphs that are merely one-connected. Edge (c, d) appears twice on a face in
each of them.
Figure 15.2: Two different planar drawings of a planar graph that is merely two-connected. Vertices
g and h have switched positions, and thus appear in different faces in each drawing.
We state a few properties of 3-connected planar graphs that we will use. We will not prove these
properties, as we are more concerned with algebra and these properly belong in a class on
combinatorial graph theory.
Claim 15.1.1. Let G = (V, E) be a 3-connected planar graph. Then, there exists a set of faces F ,
each of which corresponds to a cycle in G, so that no vertex appears twice in a face, no edge
appears twice in a face, and every edge appears in exactly two faces.
We call the face on the outside of the drawing the outside face. The edges that lie along the
Figure 15.3: 3-connected planar graphs. Some faces of the graph on the left are abf , f gh, and
af he. The outer face is abcde. The graph on the right is obtained by contracting edge (g, h).
Another standard fact about planar graphs is that they remain planar under edge contractions.
Contracting an edge (a, b) creates a new graph in which a and b become the same vertex, and all
edges that went from other vertices to a or b now go to the new vertex. Contractions also preserve
3-connectivity. Figure 15.3 depicts a 3-connected planar graph and the result of contracting an
edge.
A graph H = (W, F ) is a minor of a graph G = (V, E) if H can be obtained from G by
contracting some edges and possibly deleting other edges and vertices. This means that each
vertex in W corresponds to a connected subset of vertices in G, and that there is an edge between
two vertices in W precisely when there is some edge between the two corresponding subsets. This
leads to Kuratowski’s Theorem [Kur30], one of the most useful characterizations of planar graphs.
Theorem 15.1.2. A graph G is planar if and only if it does not have a minor isomorphic to the
complete graph on 5 vertices, K5 , or the bipartite complete graph between two sets of 3 vertices,
K3,3 .
Figure 15.4: The Petersen graph appears on the left. On the right is a minor of the Petersen graph that is isomorphic to K5, proving that the Petersen graph is not planar.
We will use one other important fact about planar graphs, whose utility in this context was
observed by Jim Geelen.
Lemma 15.1.3. Let (a, b) be an edge of a 3-connected planar graph and let S1 and S2 be the sets
of vertices on the two faces containing (a, b). Let P be a path in G that starts at a vertex of
S1 − {a, b}, ends at a vertex of S2 − {a, b}, and that does not intersect a or b. Then, every path in
G from a to b either intersects a vertex of P or the edge (a, b).
Proof. Let s1 and s2 be the vertices at the ends of the path P . Consider a planar drawing of G
and the closed curve in the plane that follows the path P from s1 to s2, and then connects s1 to s2 by moving inside the faces S1 and S2, where the path only intersects the curve for edge (a, b). This curve separates vertex a from vertex b. Thus, every path in G that connects a to b must intersect this curve. This means that it must either consist of just edge (a, b), or it must intersect a vertex of P. See Figure 15.5.
Figure 15.5: A depiction of Lemma 15.1.3. S1 = abcde, S2 = abf , and the path P starts at d, ends
at f , and contains the other unlabeled vertices.
This is a good time to remind you what exactly a convex polygon is. A subset C ⊆ IR2 is convex
if for every two points x and y in C, the line segment between x and y is also in C. A convex
polygon is a convex region of IR2 whose boundary is comprised of a finite number of straight lines.
It is strictly convex if in addition the angle at every corner is less than π. We will always assume
that the corners of a strictly convex polygon are distinct. Two corners form an edge of the
polygon if the interior of the polygon is entirely on one side of the line through those corners.
This leads to another definition of a strictly convex polygon: a convex polygon is strictly convex if, for every edge, all of the corners of the polygon other than the two defining the edge lie entirely on one side of the line through that edge. In particular, none of the other corners lie on the line.
Definition 15.2.1. Let G = (V, E) be a 3-connected planar graph. We say that z : V → IR2 is a
Tutte embedding if
a. There is a face F of G such that z maps the vertices of F to the corners of a strictly convex
polygon so that every edge of the face joins consecutive corners of the polygon;
b. Every vertex not in F lies at the center of gravity of its neighbors.
We will prove Tutte’s theorem by proving that every face of G is embedded as a strictly convex
polygon. In fact, we will not use the fact that every non-boundary vertex is exactly the average of
its neighbors. We will only use the fact that every non-boundary vertex is inside the convex hull
of its neighbors. This corresponds to allowing arbitrary spring constants in the embedding.
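To make the computation concrete, here is a Julia sketch (our own small example, the graph of the cube) that nails the outer face to a square and solves the harmonic equations L(S, S) x(S) = M(S, B) x(B) in each coordinate, as in Chapter 11.

```julia
using LinearAlgebra

# The graph of the cube: 3-connected and planar; outer face 1-2-3-4.
edges = [(1,2),(2,3),(3,4),(4,1),(5,6),(6,7),(7,8),(8,5),(1,5),(2,6),(3,7),(4,8)]
M = zeros(8, 8)
for (a, b) in edges
    M[a, b] = 1; M[b, a] = 1
end
L = Diagonal(vec(sum(M, dims=1))) - M

B = [1, 2, 3, 4]                     # outer face, nailed to the corners of a square
S = [5, 6, 7, 8]                     # free vertices
zB = [1.0 1; -1 1; -1 -1; 1 -1]      # one row of coordinates per boundary vertex

zS = L[S, S] \ (M[S, B] * zB)        # harmonic extension, both coordinates at once
println(zS)                          # the inner face lands on a smaller square inside
```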
Theorem 15.2.2. Let G = (V, E) be a 3-connected planar graph, and let z be a Tutte embedding
of G. If we represent every edge of G as the straight line between the embedding of its endpoints,
then we obtain a planar drawing of G.
Note that if the graph were not 3-connected, then the embedding could be rather degenerate. If
there are two vertices a and b whose removal disconnects the graph into two components, then all
of the vertices in one of those components will embed on the line segment from a to b.
Henceforth, G will always be a 3-connected planar graph and z will always be a Tutte embedding.
The proof of Theorem 15.2.2 will be easy once we rule out certain degeneracies. There are two
types of degeneracies that we must show can not happen. The most obvious is that we can not
have z (a) = z (b) for any edge (a, b). The fact that this degeneracy can not happen will be a
consequence of Lemma 15.4.1.
The other type of degeneracy is when there is a vertex a such that all of its neighbors lie on one
line in IR2 . We will rule out such degeneracies in this section.
We first observe two simple consequences of the fact that every vertex must lie at the average of
its neighbors.
Claim 15.3.1. Let a be a vertex and let ℓ be any line in IR² through z(a). If a has a neighbor that lies on one side of ℓ, then it has a neighbor that lies on the other.
Claim 15.3.2. All vertices not in F must lie strictly inside the convex hull of the polygon of
which the vertices in F are the corners.
Proof. For every vertex a not in F , we can show that the position of a is a weighted average of
the positions of vertices in F by eliminating every vertex not in F ∪ {a}. As we learned in Lecture
13, this results in a graph in which all the neighbors of a are in F , and thus the position of a is
some weighted average of the position of the vertices in F . As the graph is 3-connected, we can
show that this average must assign nonzero weights to at least 3 of the vertices in F .
Note that it is also possible to prove Claim 15.3.2 by showing that one could reduce the potential
energy by moving vertices inside the polygon. See Claim 8.8.1 from my lecture notes from 2015.
Lemma 15.3.3. Let H be a halfspace in IR2 (that is, everything on one side of some line). Then
the subgraph of G induced on the vertices a such that z (a) ∈ H is connected.
Proof. Let t be a vector so that we can write the line ℓ in the form t^T x = µ, with the halfspace
consisting of those points x for which t^T x ≥ µ. Let a be a vertex such that z (a) ∈ H and let b be
a vertex that maximizes t^T z (b). So, z (b) is as far from the line defining the halfspace as possible.
By Claim 15.3.2, b must be on the outside face, F .
For every vertex c, define t(c) = t^T z (c). We will see that there is a path in G from a to b along
which the function t never decreases, and thus all the vertices along the path lie in the halfspace.
We first consider the case in which t(a) = t(b). In this case, we also know that a ∈ F . As the
vertices in F embed to a strictly convex polygon, this implies that (a, b) is an edge of that
polygon, and the edge (a, b) itself provides the required path from a to b.
If t(a) < t(b), it suffices to show that there is a path from a to some other vertex c for which
t(c) > t(a) and along which t never decreases: we can then proceed from c to obtain a path to b.
Let U be the set of all vertices u reachable from a for which t(u) = t(a). As the graph is
connected, there must be a vertex u ∈ U that has a neighbor c ∉ U . By Claim 15.3.1, u must have
a neighbor c for which t(c) > t(u). Thus, a path from a through U to c suffices.
Lemma 15.3.4. There is no vertex a such that z (a) and the images of all of a's neighbors lie on a
single line.
Proof. This is trivially true for vertices in F , as no three of them are collinear.
Assume by way of contradiction that there is a vertex a that is collinear with all of its neighbors.
Let ℓ be that line, and let S+ and S− be all the vertices that lie above and below the line,
respectively. Lemma 15.3.3 tells us that both sets S+ and S− are connected. Let U be the set of
vertices u reachable from a and such that all of u's neighbors lie on ℓ. The vertex a is in U . Let W
be the set of nodes that lie on ℓ that are neighbors of vertices in U , but which themselves are not
in U . As vertices in W are not in U , Claim 15.3.1 implies that each vertex in W has neighbors in
both S + and S − . As the graph is 3-connected, and removing the vertices in W would disconnect
U from the rest of the graph, there are at least 3 vertices in W . Let w1 , w2 and w3 be three of the
vertices in W .
We will now obtain a contradiction by showing that G has a minor isomorphic to K3,3 . The three
vertices on one side are w1 , w2 , and w3 . The other three are obtained by contracting the vertex
sets S + , S − , and U .
Lemma 15.4.1. Let (a, b) be any non-boundary edge of the graph, and let ℓ be a line through
z (a) and z (b) (there is probably just one). Let F0 and F1 be the faces that border edge (a, b) and
let S0 and S1 be the vertices on those faces, other than a and b. Then all the vertices of S0 and S1
lie on opposite sides of ℓ, and none lie on ℓ.
Note: if z (a) = z (b), then we can find a line passing through them and one of the vertices of S0 .
This leads to a contradiction, and thus rules out this type of degeneracy.
Proof. Assume by way of contradiction that the lemma is false. Without loss of generality, we
may then assume that there are vertices of both S0 and S1 on or below the line ℓ. Let s0 and s1
be such vertices. By Lemma 15.3.4 and Claim 15.3.1, we know that both s0 and s1 have
neighbors that lie strictly below the line ℓ. By Lemma 15.3.3, we know that there is a path P
that connects s0 and s1 on which all vertices other than s0 and s1 lie strictly below ℓ.
On the other hand, we can similarly show that both a and b have neighbors above the line ℓ,
and that they are joined by a path that lies strictly above ℓ. Thus, this path cannot consist of the
edge (a, b) and must be disjoint from P . This contradicts Lemma 15.1.3.
So, we now know that the embedding z contains no degeneracies, that every face is embedded as
a strictly convex polygon, and that the two faces bordering each edge embed on opposites sides of
that edge. This is all we need to know to prove Tutte’s Theorem. We finish the argument in the
proof below.
Proof of Theorem 15.2.2. We say that a point of the plane is generic if it does not lie on any z (a)
or on any segment of the plane corresponding to an edge (a, b). We first prove that every generic
point lies in exactly one face of G.
Begin with a point that is outside the polygon on which F is drawn. Such a point lies only in the
outside face. For any other generic point we can draw a curve between these points that never
intersects a z (a) and never crosses the intersection of the drawings of edges. That is, it only
crosses drawings of edges in their interiors. By Lemma 15.4.1, when the curve does cross such an
edge it moves from one face to another. So, at no point does it ever appear in two faces.
Now, assume by way of contradiction that the drawings of two edges cross. There must be some
generic point near their intersection that lies in at least two faces. This would be a
contradiction.
15.5 Notes
This is the simplest proof of Tutte’s theorem that I have seen. Over the years, I have taught
many versions of Tutte’s proof by building on expositions by Lovász [LV99] and Geelen [Gee12],
and an alternative proof of Gortler, Gotsman and Thurston [GGT06].
Chapter 16
The Lovász-Simonovits Approach to Random Walks
16.1 Introduction
These notes are still very rough, and will be finished later.
For a vector f and an integer k, we define f {k} to be the sum of the largest k entries of f . For
convenience, we define f {0} = 0. Symbolically, you can define this by setting π to be a
permutation for which
f (π(1)) ≥ f (π(2)) ≥ ... ≥ f (π(n)),
and then setting
f {k} = ∑_{i=1}^{k} f (π(i)).
For real number x between 0 and n, we define f {x} by making it be piece-wise linear between
consecutive integers. This means that for x between integers k and k + 1, the slope of f {} at x is
f (π(k + 1)). As these slopes are monotone nonincreasing, the function f {x} is concave.
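Here is a tiny Julia illustration of f {x} (my own; the function name and the example vector are made up):

# A tiny illustration (my own; not from the book) of f{x}: the sum of the
# largest k entries of f, extended piecewise linearly between integers.
function fcurly(f::Vector{<:Real}, x::Real)
    s = sort(f, rev=true)                 # f(π(1)) ≥ f(π(2)) ≥ ...
    k = floor(Int, x)
    val = sum(s[1:k]; init=0.0)           # f{k}, with f{0} = 0
    k < length(s) && (val += (x - k) * s[k+1])   # slope f(π(k+1)) between k and k+1
    return val
end

f = [0.5, 0.1, 0.3, 0.1]
@show fcurly(f, 2)      # 0.5 + 0.3 = 0.8
@show fcurly(f, 2.5)    # 0.8 + 0.5 * 0.1 = 0.85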
We will prove the following theorem of Lovász and Simonovits [LS90] on the behavior of W f .
Theorem 16.1.1. Let W be the transition matrix of the lazy random walk on a d-regular graph
with conductance at least φ. Let g = W f . Then for all integers 0 ≤ k ≤ n
g {k} ≤ (1/2) (f {k − φh} + f {k + φh}) ,
where h = min(k, n − k).
I remark that this theorem has a very clean extension to irregular, weighted graphs. I just present
this version to simplify the exposition.
We can use this theorem to bound the rate of convergence of random walks in a graph. Let p t be
the probability distribution of the walk after t steps, and plot the curves p t {x}. The theorem tells
us that these curves lie beneath each other, and that each curve lies beneath a number of chords
drawn across the previous. The walk is uniformly mixed when the curve reaches a straight line
from (0, 0) to (n, 1). This theorem tells us how quickly the walks approach the straight line.
Today, we will use the theorem to prove a variant of Cheeger’s inequality.
We believe that larger conductance should imply faster mixing. In the case of Theorem 16.1.1, it
should imply lower curves. This is because wider chords lie beneath narrower ones.
Claim 16.2.1. Let h(x) be a concave function, and let z > y > 0. Then,
(1/2) (h(x − z) + h(x + z)) ≤ (1/2) (h(x − y) + h(x + y)) .
Claim 16.2.2. Let f be a vector, let k ∈ [0, n], and let α1 , . . . , αn be numbers between 0 and 1
such that ∑_i αi = k. Then,
∑_i αi f (i) ≤ f {k}.
This should be obvious, and most of you proved something like this when solving problem 2 on
homework 1. It is true because the way one would maximize this sum is by setting αi to 1 for the
indices i of the k largest entries of f .
Throughout this lecture, we will only consider lazy random walks on regular graphs. For a set S
and a vertex a, we define γ(a, S) to be the probability that a walk that is at vertex a moves to S
in one step. If a is not in S, this equals one half the fraction of edges from a to S. It is one half
because there is a one half probability that the walk stays at a. Similarly, if a is in S, then γ(a, S)
equals one half plus one half the fraction of edges of a that end in S.
16.3 Warm up
We warm up by proving that the curves must lie under each other.
For a vector f and a set S, we define
f (S) = ∑_{a∈S} f (a).
We first show that for g = W f and every x ∈ [0, n], g {x} ≤ f {x}.
Proof. As the function g {x} is piecewise linear between integers, it suffices to prove this at integers
k. Let k be an integer and let S be any set of size k. As g = W f ,
g (S) = ∑_{a∈V} γ(a, S) f (a).
As each γ(a, S) lies between 0 and 1 and ∑_{a∈V} γ(a, S) = k, Claim 16.2.2 implies g (S) ≤ f {k}.
As this holds for every set S of size k, g {k} ≤ f {k}.
Our proof of the main theorem improves the previous argument by exploiting the conductance
through the following lemma.
Lemma 16.4.1. Let S be any set of k vertices. Then
∑_{a∉S} γ(a, S) = (φ(S)/2) min(k, n − k).
Proof. For a 6∈ S, γ(a, S) equals half the fraction of the edges from a that land in S. And, the
number of edges leaving S equals dφ(S) min(k, n − k).
Lemma 16.4.2. Let W be the transition matrix of the lazy random walk on a d-regular graph,
and let g = W f . For every set S of size k with conductance at least φ,
g (S) ≤ (1/2) (f {k − φh} + f {k + φh}) ,
where h = min(k, n − k).
Proof. To ease notation, define γ(a) = γ(a, S). We prove the theorem by rearranging the formula
g (S) = ∑_{a∈V} γ(a) f (a).
Recall that ∑_{a∈V} γ(a) = k.
For every vertex a define
α(a) = γ(a) − 1/2 if a ∈ S, and α(a) = 0 if a ∉ S;
β(a) = 1/2 if a ∈ S, and β(a) = γ(a) if a ∉ S.
We now come to the point in the argument where we exploit the laziness of the random walk,
which manifests as the fact that γ(a) ≥ 1/2 for a ∈ S, and so 0 ≤ α(a) ≤ 1/2 for all a. Similarly,
0 ≤ β(a) ≤ 1/2 for all a. So, we can write
∑_{a∈V} α(a)f (a) = (1/2) ∑_{a∈V} (2α(a))f (a), and ∑_{a∈V} β(a)f (a) = (1/2) ∑_{a∈V} (2β(a))f (a).
We can set
z = ∑_{a∉S} γ(a)
and write
∑_{a∈V} (2α(a)) = k − 2z and ∑_{a∈V} (2β(a)) = k + 2z.
As each 2α(a) and each 2β(a) lies between 0 and 1, Claim 16.2.2 gives ∑_{a∈V} (2α(a))f (a) ≤ f {k − 2z}
and ∑_{a∈V} (2β(a))f (a) ≤ f {k + 2z}, so g (S) ≤ (1/2)(f {k − 2z} + f {k + 2z}). Lemma 16.4.1 and the
assumption that φ(S) ≥ φ tell us that 2z ≥ φh, so Claim 16.2.1 and the concavity of f {·} give the
claimed bound.
Theorem 16.1.1 follows by applying Lemma 16.4.2 to sets S for which g (S) = g {k}, for each
integer k between 0 and n.
Reid Andersen observed that the technique of Lovász and Simonovits can be used to give a new
proof of Cheeger’s inequality. I will state and prove the result for the special case of d-regular
graphs that we consider in this lecture. But, one can of course generalize this to irregular,
weighted graphs.
Theorem 16.5.1. Let G be a d-regular graph with lazy random walk matrix W , and let
ω2 = 1 − λ be the second-largest eigenvalue of W . Then there is a subset of vertices S for which
φ(S) ≤ √(8λ).
Chapter 17
Monotonicity and its Failures
17.1 Disclaimer
These notes are not necessarily an accurate representation of what happened in class. They are a
combination of what I intended to say with what I think I said. They have not been carefully
edited.
17.2 Overview
Consider a spring network. As in last lecture, we model it by a weighted graph G = (V, E, w),
where wa,b is the spring constant of the edge (a, b). Recall that a stronger spring constant results
in a stronger connection between a and b.
Now, let s and t be arbitrary vertices in V . We can view the network as a large, complex spring
connecting s to t. We then ask for the spring constant of this complex spring. We call it the
effective spring constant between s and t.
To determine what it is, we recall the definition of the spring constant for an ordinary spring: the
potential energy in a spring connecting a to b is the spring constant times the square of the
length of the spring, divided by 2. We use this definition to determine the effective spring
constant between s and t.
Recall again that if we fix the positions of s and t on the real line, say to 0 and 1, then the
remaining vertices will settle into the position x that minimizes the total potential energy
(1/2) ∑_{(a,b)∈E} wa,b (x (a) − x (b))² . (17.1)
As s and t are separated by a distance of 1, we may define twice this quantity to be the effective
spring constant of the entire network between s and t. To verify that this definition is consistent,
we should consider what happens if the displacement between s and t is something other than 1.
If we fix the position of s to 0 and the position of t to y, then the homogeneity of the expression
for energy (17.1) tells us that the vector y x will minimize the energy subject to the boundary
conditions. Moreover, the energy in this case will be y²/2 times the effective spring constant.
17.4 Monotonicity
Rayleigh’s Monotonicity Principle tells us that if we alter the spring network by decreasing some
of the spring constants, then the effective spring constant between s and t will not increase.
Theorem 17.4.1. Let G = (V, E, w) be a weighted graph and let Ĝ = (V, E, ŵ) be another
weighted graph with the same edges and such that
ŵa,b ≤ wa,b
for all (a, b) ∈ E. For vertices s and t, let cs,t be the effective spring constant between s and t in
G and let ĉs,t be the analogous quantity in Ĝ. Then,
ĉs,t ≤ cs,t .
Proof. Let x be the vector of minimum energy in G such that x (s) = 0 and x (t) = 1. Then, the
energy of x in Ĝ is no greater:
(1/2) ∑_{(a,b)∈E} ŵa,b (x (a) − x (b))² ≤ (1/2) ∑_{(a,b)∈E} wa,b (x (a) − x (b))² = cs,t .
While this principle seems very simple and intuitively obvious, it turns out to fail in just slightly
more complicated situations. Before we examine them, I will present the analogous material for
electrical networks.
There are two (equivalent) ways to define the effective resistance between two vertices in a
network of resistors. The first is to start with the formula
V = IR,
We set i_ext = δs − δt , which corresponds to a flow of 1 from s to t. We then solve for the voltages
that realize this flow:
L v = i_ext ,
by
v = L⁺ i_ext .
We thus have
v (s) − v (t) = i_ext^T v = i_ext^T L⁺ i_ext , which we define to be Reff (s, t).
This agrees with the other natural approach to defining effective resistance: twice the energy
dissipation when we flow one unit of current from s to t.
Theorem 17.5.1. Let i be the electrical flow of one unit from vertex s to vertex t in a graph G.
Then,
Reff (s, t) = E (i ) .
Theorem 17.5.2 (Rayleigh’s Monotonicity). The effective resistance between a pair of vertices
cannot be decreased by increasing the resistance of some edges.
17.6 Examples
In the case of a path graph with n vertices and edges of weight 1, the effective resistance between
the extreme vertices is n − 1.
In general, if a path consists of edges of resistance r(1, 2), . . . , r(n − 1, n) then the effective
resistance between the extreme vertices is
r(1, 2) + r(2, 3) + · · · + r(n − 1, n).
Ohm’s law then tells us that if one unit of current flows between the extreme vertices, the
potential drop over the edge (i, i + 1) will be r(i, i + 1), and summing these drops gives the formula above.
If we have k parallel edges between two nodes s and t of resistances r1 , . . . , rk , then the effective
resistance is
1
Reff (s, t) = .
1/r1 + · · · + 1/rk
Again, to see this, note that when one unit of current flows from s to t, the flow over the ith edge
will be
(1/ri ) / (1/r1 + · · · + 1/rk ),
so that every edge has the same potential drop, namely 1/(1/r1 + · · · + 1/rk ) = Reff (s, t).
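As a quick numerical sanity check, here is a small Julia sketch (my own, not from the book's notebooks) that verifies the series and parallel formulas using Reff (s, t) = (δs − δt )^T L⁺ (δs − δt ); the particular resistances are my own choices.

# A quick check (my own) of the series and parallel formulas.
using LinearAlgebra

function effective_resistance(L, s, t)
    n = size(L, 1)
    ext = zeros(n); ext[s] = 1.0; ext[t] = -1.0
    v = pinv(L) * ext        # voltages realizing a unit flow from s to t
    return v[s] - v[t]
end

# path with resistances 1, 2, 3 (edge weights are reciprocals of resistances)
r = [1.0, 2.0, 3.0]
Lpath = zeros(4, 4)
for i in 1:3
    w = 1 / r[i]
    Lpath[i, i] += w; Lpath[i+1, i+1] += w
    Lpath[i, i+1] -= w; Lpath[i+1, i] -= w
end
@show effective_resistance(Lpath, 1, 4)   # ≈ 1 + 2 + 3 = 6

# two parallel edges of resistance 2 and 3 between the same pair of vertices
w = 1/2 + 1/3
Lpar = [w -w; -w w]
@show effective_resistance(Lpar, 1, 2)    # ≈ 1 / (1/2 + 1/3) = 1.2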
We will now exhibit a breakdown of monotonicity in networks of nonlinear elements. In this case,
we will consider a network of springs and wires. For examples in electrical networks with resistors
and diodes or for networks of pipes with valves, see [PP03] and [CH91].
There will be 4 important vertices in the network that I will describe, a, b, c and d. Point a is
fixed in place at the top of my apparatus. Point d is attached to an object of weight 1. The
network has two springs of spring constant 1: one from point a to point b and one from point c to
point d. There is a very short wire connecting point b to point c.
As each spring is supporting one unit of weight, each is stretched to length 1. So, the distance
from point a to point d is 2.
I now add two more wires to the network. One connects point a to point c and the other connects
point b to point d. Both have lengths 1 + ε, and so are slack. Thus, the addition of these wires
does not change the position of the weight.
I now cut the small wire connecting point b to point c. While you would expect that removing
material from the supporting structure would cause the weight to go down, it will in fact move
up. To see why, let’s analyze the resulting structure. It consists of two supports in parallel. One
consists of a spring from point a to point b followed by a wire of length 1 + ε from point b to point d.
The other has a wire of length 1 + ε from point a to point c followed by a spring from point c to
point d. Each of these is supporting the weight, and so each carries half the weight. This means
that the length of the springs will be 1/2. So, the distance from a to d should be essentially 3/2.
This sounds like a joke, but we will see in class that it is true. The measurements that we get will
not be exactly 2 and 3/2, but that is because it is difficult to find ideal springs at Home Depot.
In the example with resistors and diodes, one can increase electrical flow between two points by
cutting a wire!
I will now explain some analogous behavior in traffic networks. We will examine these more
formally in the next lecture.
We will use a very simple model of a road in a traffic network. It will be a directed edge between
two vertices. The rate at which traffic can flow on a road will depend on how many cars are on
the road: the more cars, the slower the traffic. I will assume that our roads are linear. That is,
when a road has flow f , the time that it takes traffic to traverse the road is
af + b,
for some nonnegative constants a and b. I call this the characteristic function of the road.
We first consider an example of Pigou consisting of two roads between two vertices, s and t. The
slow road will have characteristic function 1: think of a very wide super-highway that goes far out
of the way. No matter how many cars are on it, the time from s to t will always be 1. The fast
road is better: its characteristic is f . Now, assume that there is 1 unit of traffic that would like to
go from s to t.
A global planner that could dictate the route that everyone takes could minimize the average time
of the traffic going from s to t by assigning half of the traffic to take the fast road and half of the
traffic to take the slow road. In this case, half of the traffic will take time 1 and half will take time
1/2, for an average travel time of 3/4. To see that this is optimal, let f be the fraction of traffic
that takes the fast road. Then, the average travel time will be
f · f + (1 − f ) · 1 = f ² − f + 1.
This is minimized when its derivative with respect to f is zero, that is, when
2f − 1 = 0,
so the optimum is again f = 1/2.
On the other hand, this is not what people will naturally do if they have perfect information and
freedom of choice. If a f < 1 fraction of the flow is going along the fast road, then those travelling
on the fast road will get to t faster than those going on the slow road. So, anyone going on the
slow road would rather take the fast road. So, all of the traffic will wind up on the fast road, and
it will become not-so-fast. All of the traffic will take time 1.
We call this the Nash Optimal solution, because it is what everyone will do if they are only
maximizing their own benefit. You should be concerned that this is worse than what they could
achieve if they allowed some authority to dictate their routes. For example, the authority could dictate
that half the cars go each way every-other day, or one way in the morning and another at night.
Let’s see an even more disturbing example.
We now examine Braess’s Paradox, which is analogous to the troubling example we saw with
springs and wires. This involves a network with 4 vertices, s, c, d, and t. All the traffic starts at
s and wants to go to t. There are fast roads from s to c and from d to t, each with characteristic
function f , and slow roads from s to d and from c to t, each with characteristic function 1. If half
of the traffic goes through route sct and the other half goes through route sdt, then all the traffic
will go from s to t in time 3/2. Moreover, no one can improve their lot by taking a different route,
so this is a Nash equilibrium.
We now consider what happens if some well-intentioned politician decides to build a very fast
road connecting c to d. Let’s say that its characteristic function is 0. This opens up a faster
route: traffic can go from s to c to d to t. If no one else has changed route, then this traffic will
reach t in 1 unit of time. Unfortunately, once everyone realizes this all the traffic will take this
route, and everyone will now require 2 units of time to reach t.
Let’s prove that formally. Let p1 , p2 and p3 be the fractions of traffic going over routes sct, sdt,
and scdt, respectively. The cost of route sct is p1 + p3 + 1. The cost of route sdt is p2 + p3 + 1.
And, the cost of route scdt is (p1 + p3 ) + (p2 + p3 ) = 1 + p3 . So, as long as p3 is less than 1, the
cheapest route will be scdt. So, all the traffic will go that way, and the cost of every route will be 2.
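The following small Julia check (my own sketch; it assumes the fast/slow road assignment used in the calculation above) reproduces these route costs.

# A small check (my own) of the Braess example: fast roads cost their flow,
# slow roads cost 1, and the new road c→d costs 0.
route_costs(p1, p2, p3) = (
    sct  = (p1 + p3) + 1,               # fast s→c, then slow c→t
    sdt  = 1 + (p2 + p3),               # slow s→d, then fast d→t
    scdt = (p1 + p3) + 0 + (p2 + p3),   # fast, new road, fast
)
social_cost(p) = sum(x * c for (x, c) in zip(p, route_costs(p...)))

@show route_costs(0.5, 0.5, 0.0)   # both used routes cost 3/2 before the new road
@show route_costs(0.0, 0.0, 1.0)   # once everyone switches, every route costs 2
@show social_cost((0.5, 0.5, 0.0)), social_cost((0.0, 0.0, 1.0))   # 1.5 vs 2.0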
In any traffic network, we can measure the average amount of time it takes traffic to go from s to
t under the optimal flow. We call this the cost of the social optimum, and denote it by Opt(G).
When we let everyone pick the route that is best for themselves, the resulting solution is a Nash
Equilibrium, and we denote it by Nash(G).
The “Price of Anarchy” is the cost to society of letting everyone do their own thing. That is, it is
the ratio
Nash(G)
.
Opt(G)
In these examples, the ratio was 4/3. In the next lecture, we will show that the ratio is never
more than 4/3 when the cost functions are linear. If there is time today, I will begin a more
formal analysis of Opt(G) and Nash(G) that we will need in our proof.
Let the set of s-t paths be P1 , . . . , Pk , and let αi be the fraction of the traffic that flows on path
Pi . In the Nash equilibrium, no car will go along a sub-optimal path. Assuming that each car has
a negligible impact on the traffic flow, this means that every path Pi that has non-zero flow must
have minimal cost. That is, for all i such that αi > 0 and all j
c(Pi ) ≤ c(Pj ).
Society in general cares more about the average time it takes to get from s to t. If we have a flow
that makes this average time low, everyone could rotate through all the routes and decrease the
total time that they spend in traffic. So, the social cost of the flow f is
c(α1 , . . . , αk ) := ∑_i αi c(Pi ) = ∑_i αi ∑_{e∈Pi} ce (fe ) = ∑_e ce (fe ) ∑_{i : e∈Pi} αi = ∑_e ce (fe ) fe .
Theorem 17.12.1. All local minima of the social cost function are global minima. Moreover, the
set of global minima is convex.
Proof. Recall that the cost function of each edge has the form ce (f ) = ae f + be , and that we
assumed that ae and be are both at least zero. The cost ce (fe ) fe contributed by each edge is then
a convex function of fe . It is strictly convex if ae > 0, but that does not matter for this theorem.
If you take two flows, say f^0 and f^1, the line segment of flows between them contains the flows of
the form f^t where
f^t_e = t f^1_e + (1 − t) f^0_e ,
for 0 ≤ t ≤ 1.
By the convexity of each cost function, the cost of the flow f^t is at most t times the cost of f^1
plus (1 − t) times the cost of f^0, and so at most the maximum of the two costs. So, if f^1 is the
global optimum and f^0 is any other flow with higher cost, then for every 0 < t ≤ 1 the flow f^t
will have a social cost lower than that of f^0. This means that f^0 cannot be a local optimum.
Similarly, if both f^0 and f^1 are global optima, then f^t must be as well.
Chapter 18
Dynamic and Nonlinear Networks
18.1 Disclaimer
These notes are not necessarily an accurate representation of what happened in class. They are a
combination of what I intended to say with what I think I said. They have not been carefully
edited.
18.2 Overview
In this lecture we will consider two generalizations of resistor networks: resistor networks with
non-linear resistors and networks whose resistances change over time. While they were introduced
over 50 years ago, non-linear resistor networks seem to have been recently rediscovered in the
Machine Learning community. We will discuss how they can be used to improve the technique we
learned in Lecture 13 for semi-supervised learning.
The material on time-varying networks that I will present comes from Cameron Musco’s senior
thesis from 2012.
A non-linear resistor network, as defined by Duffin [Duf47], is like an ordinary resistor network
but the resistances depend on the potential differences across them. In fact, it might be easier not
to talk about resistances, and just say that the amount of flow across an edge increases as the
potential difference across the edge does. For every resistor e, there is a function
φe (v)
that gives the flow over resistor e when there is a potential difference of v between its terminals.
We will restrict our attention to functions φ that are
a. continuous,
b. monotone increasing, and
c. odd, that is, φe (−v) = −φe (v).
Note that condition c implies that φe (0) = 0. For an ordinary resistor of resistance r, we have
φe (v) = v/r.
18.4 Energy
We will show that the setting of the voltages that minimizes the total energy provides the flow I
claimed exists.
For each edge e, define the energy function Φe (v) = ∫₀^v φe (t) dt. In the case of linear resistors,
where φe (v) = v/r, this gives
Φe (v) = v²/2r,
which is exactly the energy function we introduced in Lecture 13.
The conditions on φe imply that
d. Φe is strictly convex1 ,
e. Φe (0) = 0, and
f. Φe (−x) = Φe (x).
¹That is, for all x ≠ y and all 0 < λ < 1, Φe (λx + (1 − λ)y) < λΦe (x) + (1 − λ)Φe (y).
We remark that a function that is strictly convex has a unique minimum, and that a sum of
strictly convex functions is strictly convex.
Theorem 18.4.1. Let G = (V, E) be a non-linear resistor network with functions φe satisfying
conditions a, b and c for every e ∈ E. For every set S ⊆ V and fixed voltages wa for a ∈ S, there
exists a setting of voltages va for a 6∈ S that result in a flow of current that satisfies the flow-in
equals flow-out conditions at every a 6∈ S. Moreover, these voltages are unique.
Proof. Define the total energy of a setting v of the voltages to be Φ(v ) = ∑_{(a,b)∈E} Φ(a,b) (va − vb ).
As each of the functions Φ(a,b) is strictly convex, Φ is as well. So, Φ has a minimum subject to
the fixed voltages. At this minimum point, we know that for every a ∉ S
0 = ∂Φ(v )/∂va = ∑_{b:(a,b)∈E} ∂Φ(a,b) (va − vb )/∂va = ∑_{b:(a,b)∈E} φ(a,b) (va − vb ).
That is, the flow into a equals the flow out of a.
In Lecture 13, I suggested an approach to estimating a function f on the vertices of a graph given
its values at a set S ⊆ V :
min_{x : x(a)=f (a) for a∈S} ∑_{(a,b)∈E} (x(a) − x(b))² .
Moreover, we saw that we can minimize such a function by solving a system of linear equations.
Unfortunately, there are situations in which this approach does not work very well. In general,
this should not be surprising: sometimes the problem is just unsolvable. But, there are cases in
which it would be reasonable to solve the learning problem in which this approach fails.
Better results are sometimes obtained by modifying the penalty function. For example, Bridle
and Zhu [BZ13] (and, essentially, Herbster and Lever [HL09]) suggest
min_{x : x(a)=f (a) for a∈S} ∑_{(a,b)∈E} |x(a) − x(b)|^p .
We can establish a corresponding, although different, energy for the flows. Let ψ be the inverse of
φ. We then define the flow-energy of an edge that carries a flow of f to be
Ψ(f ) := ∫₀^f ψ(t) dt.
If we minimize the sum of the flow-energies over the space of flows, we again recover the unique
valid flow in the network. (The function Φ is implicit in the work of Duffin. The dual Ψ comes
from Millar [Mil51]).
In the classical case, Φ and Ψ are the same. While they are not the same here, their sum behaves
the same way: we will later prove that when v = ψ(f ),
Ψ(f ) + Φ(v) = f v,
and that in general Ψ(f ) + Φ(v) ≥ f v.
Theorem 18.6.1. Under the conditions of Theorem 18.4.1, let fext be the vector of external flows
resulting from the induced voltages. Let f be the flow on the edges that is compatible with fext and
that minimizes
Ψ(f ) := ∑_{(a,b)∈E} Ψ(a,b) (f(a,b) ).
Then, f is the flow induced by the voltages shown to exist in Theorem 18.4.1.
Sketch. We first show that f is a potential flow. That is, that there exist voltages v so that for
every edge (a, b), f(a,b) = φ(a,b) (va − vb ). The theorem then follows by the uniqueness established
in Theorem 18.4.1.
To prove that f is a potential flow, we consider the potential difference that the flow “wants” to
induce on each edge, ψ(f(a,b) ). There exist vertex potentials that agree with these desired
potential differences if and only if for every pair of vertices and for every pair of paths between
them, the sum of the desired potential differences along the edges in the paths is the same. To see
this, arbitrarily fix the potential of one vertex, such as s. We may then set the potential of any
other vertex a by summing the desired potential differences along the edges in any path from s.
Equivalently, the desired potential differences are realizable if and only if the sum of these desired
potential differences is zero around every cycle. To show that this is the case, we use the
minimality of the flow. Because Ψ(f ) is strictly convex, small changes to the optimum have a
negligible effect on its value (that is, the first derivative is zero). So, pushing an amount of flow
around any cycle will not change the value of Ψ(f ). That is, the sum of the derivatives around
any cycle will be zero. As
(∂/∂f ) Ψe (f ) = ψe (f ),
this means that the sum of the desired potential differences around every cycle is zero.
Proof of the identity Ψ(f ) + Φ(v) = f v when v = ψ(f ). One can prove this through “integration
by parts”. But, I prefer a picture. In the
following two figures, the curve is the plot of φ. In the first figure, the shaded region is the
integral of φ between 0 and v (2 in this case). In the second figure, the shaded region is the
integral of ψ between 0 and φ(v) (just turn the picture on its side). It is clear that these are
complementary parts of the rectangle between the axes and the point (v, φ(v)).
The bottom line is that almost all of the classical theory can be carried over to nonlinear networks.
We now turn our attention to networks of resistors whose resistance changes over time. We
consider a natural model in which edges get “worn out”: as they carry more flow their resistance
increases. One physical model that does this is a thermistor. A thermistor is a resistor whose
resistance increases with its temperature. These are used in thermostats.
Remember the “energy dissipation” of a resistor? The energy dissipates as heat. So, the
temperature of a resistor increases in proportion to its resistance times the square of the flow
through it. To prevent the temperatures of the resistors from going to infinity, we will assume that there is an
ambient temperature TA , and that they tend to the ambient temperature. I will denote by Te the
temperature of resistor e, and I will assume that there is a constant αe for each resistor so that its
resistance
re = αe Te . (18.1)
[Figure: two plots of the curve f = φ(v) for v between 0 and 2 (with f between 0 and 0.8),
showing the complementary shaded regions, the integral of φ up to v and the integral of ψ up to
φ(v), described in the proof above.]
Now, assume that we would like to either flow a current between two vertices s and t, or that we
have fixed the potentials of s and t. Given the temperature of every resistor at some moment, we
can compute all their resistances, and then compute the resulting electrical flow as we did in
Lecture 13. Let fe be the resulting flow on resistor e. The temperature of e will increase by re fe2 ,
and it will also increase in proportion to the difference between its present temperature and the
ambient temperature.
This gives us the following differential equation for the change in the temperature of a resistor:
∂Te /∂t = re fe ² − (Te − TA ). (18.2)
Ok, there should probably be some constant multiplying the (Te − TA ) term. But, since I haven’t
specified the units of temperature we can just assume that the constant is 1.
By substituting in (18.1) we can eliminate the references to resistance. We thus obtain
∂Te /∂t = αe Te fe ² − (Te − TA ).
There are now two natural questions to ask: does the system converge, and if so, what does it
converge to? If we choose to impose a current flow between s and t, the system does not need to
converge. For example, impose a current flow of 1 through a single resistor e between vertices s
and t, with αe = 2. Then fe = 1, and we find
∂Te /∂t = αe Te fe ² − (Te − TA ) = 2Te − (Te − TA ) = Te + TA .
So, the temperature of the resistor will go to infinity.
For this reason, I prefer to just fix the voltages of certain vertices. Under these conditions, we can
prove that the system will converge. While I do not have time to prove this, we can examine what
it will converge to.
If the system converges, that is if the voltages at the nodes converge along with the potential
drops and flows across edges, then
0 = ∂Te /∂t = αe Te fe ² − (Te − TA ).
To turn this into a relationship between fe and ve , we apply the identity fe re = ve , which
becomes fe αe Te = ve , to obtain
0 = ve fe − Te + TA .
To eliminate the last occurrence of Te , we then multiply by fe and apply the same identity to
produce
0 = ve fe2 − ve /αe + fe TA .
The solutions of this equation in fe are given by
fe = ± √( 1/αe + (TA /2ve )² ) − TA /2ve .
The correct choice of sign is the one that gives this the same sign as ve :
fe = (1/2ve ) ( √( (2ve )²/αe + TA ² ) − TA ). (18.3)
When ve is small this approaches zero, so we define it to be zero when ve is zero. As ve becomes
large this expression approaches αe^{−1/2}. Similarly, when ve becomes very negative, this
approaches −αe^{−1/2}. If we now define
φe (ve ) = (1/2ve ) ( √( (2ve )²/αe + TA ² ) − TA ),
we see that this function satisfies properties a, b and c. Theorem 18.4.1 then tells us that a stable
solution exists.
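A quick numerical look (my own sketch, not from the book's notebooks) at this flow function in the case αe = 1 and a particular small TA that I chose for illustration:

# A quick look (my own sketch) at the thermistor flow function (18.3) with αe = 1.
phi_e(v; TA = 0.1) = v == 0 ? 0.0 : (sqrt((2v)^2 + TA^2) - TA) / (2v)

for v in (-1.0, -0.01, 0.01, 1.0)
    println("v = ", v, "   f = ", round(phi_e(v), digits = 3))
end
# The flow is near ±1 when |v| is well above TA, and near v/TA ≈ 0 when |v| is
# well below it; the function is continuous, increasing, and odd (properties a, b, c).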
We now observe that when the ambient temperature is low, a thermistor network produces a
minimum s-t cut in a graph. The weights of the edges in the graph are related to αe . For
simplicity, we will just examine the case when all αe = 1. If we take the limit as TA approaches
zero, then the behavior of φe is
φe (ve ) = 0 if ve = 0, 1 if ve > 0, and −1 if ve < 0.
We will obtain similar behavior for small TA : if there is a non-negligible potential drop across an
edge, then the flow on that edge will be near 1. So, every edge will either have a flow near 1 or a
negligible potential drop. When an edge has a flow near 1, its energy will be near 1. On the other
hand, the energy of edges with negligible potential drop will be near 0.
So, in the limit of small temperatures, the energy minimization problem becomes
min_{v : v(s)=0, v(t)=1} ∑_{(a,b)∈E} |v(a) − v(b)| .
One can show that the minimum is achieved when all of the voltages are 0 or 1, in which case the
energy is the number of edges going between voltage 0 and 1. That is, the minimum is achieved
by a minimum s-t cut.
Part IV
Chapter 19
Independent Sets and Coloring
19.1 Overview
In this lecture we will see how high-frequency eigenvalues of the Laplacian and Adjacency matrix
can be related to independent sets and graph coloring. Recall that we number the Laplacian
matrix eigenvalues in increasing order:
0 = λ1 ≤ λ2 ≤ · · · ≤ λn .
We call the adjacency matrix eigenvalues µ1 , . . . , µn , and number them in the reverse order:
µ1 ≥ · · · ≥ µn .
A coloring of a graph is an assignment of one color to every vertex in a graph so that each edge
connects vertices of different colors. We are interested in coloring graphs while using as few colors
as possible. Formally, a k-coloring of a graph is a function c : V → {1, . . . , k} so that for all
(u, v) ∈ E, c(u) ≠ c(v). A graph is k-colorable if it has a k-coloring. The chromatic number of a
graph, written χ(G), is the least k for which G is k-colorable. A graph G is 2-colorable if and only
if it is bipartite. Determining whether or not a graph is 3-colorable is an NP-complete problem []
The famous 4-Color Theorem [AH77a, AH77b] says that every planar graph is 4-colorable.
A set of vertices S is independent if there are no edges between vertices in S. In particular, each
color class in a coloring is an independent set. The size of the largest independent set in a graph,
which we call its independence number is written α(G). As a k-colorable graph with n vertices
must have a color class of size at least n/k,
α(G) ≥ n / χ(G).
The problem of finding large independent sets in a graph is NP-Complete, and it is very difficult
to even approximate the size of the largest independent set in a graph [FK98]. However, for some
carefully chosen graphs spectral analysis provides very good bounds on the sizes of independent
sets.
One of the first results in spectral graph theory was Hoffman’s [Hof70] proof of the following upper
bound on the size of an independent set in a graph G.
Theorem 19.3.1. Let G = (V, E) be a d-regular graph, and let µn be its smallest adjacency
matrix eigenvalue. Then
α(G) ≤ n · (−µn ) / (d − µn ).
Recall that µn < 0. Otherwise this theorem would not make sense. We will prove a generalization
of Hoffman’s theorem due to Godsil and Newman [GN08]:
Theorem 19.3.2. Let S be an independent set in G, and let dave (S) be the average degree of a
vertex in S. Then,
|S| ≤ n (1 − dave (S)/λn ).
This is a generalization because in the d-regular case dave = d and λn = d − µn . So, these bounds
are the same for regular graphs:
1 − dave (S)/λn = (λn − d)/λn = −µn /(d − µn ).
Let S be an independent set of vertices, let 1S be the characteristic vector of S, and let d(S) be
the sum of the degrees of vertices in S. Set s = |S| /n and consider the vector
x = 1S − s1.
The reason that we subtracted s1 from 1S is that this minimizes the norm of the result. We
compute
x^T LG x = ∑_{(a,b)∈E} (1S (a) − 1S (b))² = |∂(S)| = d(S) = dave (S) |S| ,
because S is independent, and
x^T x = |S| (1 − s)² + (n − |S|)s² = |S| (1 − s) = n(s − s²).
Thus,
λn ≥ x^T LG x / x^T x = dave (S) |S| / (n(s − s²)) = dave (S) s n / (n(s − s²)) = dave (S) / (1 − s).
Re-arranging terms, this gives
1 − dave (S)/λn ≥ s,
which is equivalent to the claim of the theorem.
We will use the computation of the norm of x often, so we will make it a claim.
Claim 19.3.4. For a vector x of length n, the value of t that minimizes the norm of x − t1 is
t = 1T x /n.
Let’s examine what Hoffman’s bound on the size of the largest independent set tells us about
Paley graphs.
If G is a Paley graph and S is an independent set, we have n = p, d = (p − 1)/2, and
λn = (p + √p)/2, so Hoffman’s bound tells us that
|S| ≤ n (1 − dave (S)/λn ) = p (1 − (p − 1)/(p + √p)) = p (√p + 1)/(p + √p) = √p.
One can also show that every clique in a Paley graph has size at most √p.
A graph is called a k-Ramsey graph if it contains no clique or independent set of size k. It is a
challenge to find large k-Ramsey graphs. Equivalently, it is challenging to find k-Ramsey graphs
on n vertices for which k is small. In one of the first papers on the Probabilistic Method in
Combinatorics, Erdös proved that a random graph on n vertices in which each edge is included
with probability 1/2 is probably 2 log2 n Ramsey [Erd47].
However, constructing explicit Ramsey graphs has proved much more challenging. Until recently,
Paley graphs were among the best known. A recent construction of Barak, Rao, Shaltiel and
Wigderson [BRSW12] constructs explicit graphs that are 2^{(log n)^{o(1)}} -Ramsey.
As a k-colorable graph must have an independent set of size at least n/k, an upper bound on the
sizes of independent sets gives a lower bound on its chromatic number. However, this bound is
not always a good one.
For example, consider a graph on 2n vertices consisting of a clique (complete graph) on n vertices
and n vertices of degree 1, each of which is connected to a different vertex in the clique. The
chromatic number of this graph is n, because each of the vertices in the clique must have a
different color. However, the graph also has an independent set of size n, which would only give a
lower bound of 2 on the chromatic number.
Hoffman proved the following lower bound on the chromatic number of a graph that does not
require the graph to be regular. Numerically, it is obtained by dividing n by the bound in
Theorem 19.3.1. But, the proof is very different because that theorem only applies to regular
graphs.
Theorem 19.5.1.
χ(G) ≥ (µ1 − µn )/(−µn ) = 1 + µ1 /(−µn ).
The proof of this theorem relies on the following inequality whose proof we defer to Section 19.6.
To state it, we introduce the notation λmax (M ) and λmin (M ) to indicate the largest and smallest
eigenvalues of the matrix M .
Proof of Theorem 19.5.1. Let G be a k-colorable graph. After possibly re-ordering the vertices,
the adjacency matrix of G can be written
[  0         M1,2      · · ·   M1,k  ]
[  M1,2^T    0         · · ·   M2,k  ]
[  ⋮          ⋮         ⋱       ⋮    ]          (19.1)
[  M1,k^T    M2,k^T    · · ·   0     ]
Lemma 19.5.2, applied to this matrix M , tells us that
(k − 1)λmin (M ) + λmax (M ) ≤ 0.
Rearranging terms gives k ≥ 1 + λmax (M )/(−λmin (M )) = 1 + µ1 /(−µn ), which proves the theorem.
To return to our example of the n clique with n degree-1 vertices attached, I examined an
example with n = 6. We find µ1 = 5.19 and µ12 = −1.62. This gives a lower bound on the
chromatic number of 4.2, which implies a lower bound of 5. We can improve the lower bound by
re-weighting the edges of the graph. For example, if we give weight 2 to all the edges in the clique
and weight 1 to all the others, we obtain a bound of 5.18, which implies a lower bound of 6 and so
agrees with the chromatic number of this graph, which is 6.
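The following Julia sketch (my own, not one of the book's notebooks) reproduces these numbers.

# A sketch (my own) of Hoffman's lower bound on the chromatic number for K6
# with six pendant vertices, unweighted and with the clique edges reweighted.
using LinearAlgebra

n = 6
A = zeros(2n, 2n)
for i in 1:n, j in 1:n
    i != j && (A[i, j] = 1.0)            # clique on vertices 1..6
end
for i in 1:n
    A[i, n+i] = 1.0; A[n+i, i] = 1.0     # pendant vertex n+i attached to vertex i
end

hoffman(M) = (μ = eigvals(Symmetric(M)); 1 + μ[end] / (-μ[1]))

@show hoffman(A)                 # ≈ 4.2, so χ ≥ 5
W = copy(A); W[1:n, 1:n] .*= 2   # weight 2 on the clique edges
@show hoffman(W)                 # ≈ 5.2, so χ ≥ 6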
To prove Lemma 19.5.2, we begin with the case of k = 2. The general case follows from this one
by induction. While the lemma in the case k = 2 when there are zero blocks on the diagonal
follows from Proposition 4.5.4, we require the general statement for induction.
We first consider the case in which neither x 1 nor x 2 is an all-zero vector. In this case, we set
y = [ (‖x1 ‖/‖x2 ‖)⁻¹ x1 ; −(‖x1 ‖/‖x2 ‖) x2 ] = [ (‖x2 ‖/‖x1 ‖) x1 ; −(‖x1 ‖/‖x2 ‖) x2 ].
y T Ay ≥ λmin (A).
We have
as x is a unit vector.
We now return to the case in which ‖x2 ‖ = 0 (or ‖x1 ‖ = 0, which is really the same case).
Lemma 4.3.1 tells us that λmax (B) ≤ λmax (A). So, it must be the case that x 1 is an eigenvector
of eigenvalue λmax (A) of B, and thus λmax (B) = λmax (A). To finish the proof, also observe that
Lemma 4.3.1 implies
λmax (D) ≥ λmin (D) ≥ λmin (A).
Proof of Lemma 19.5.2. For k = 2, this is exactly Lemma 19.6.1. For k > 2, we apply induction.
Let
B = [  M1,1        M1,2       · · ·   M1,k−1    ]
    [  M1,2^T      M2,2       · · ·   M2,k−1    ]
    [   ⋮           ⋮          ⋱       ⋮        ]
    [  M1,k−1^T    M2,k−1^T   · · ·   Mk−1,k−1  ]
Lemma 4.3.1 now implies λmin (B) ≥ λmin (A).
Applying Lemma 19.6.1 to B and the kth row and column of A, we find
Chapter 20
Graph Partitioning
Computer Scientists are often interested in cutting, partitioning, and finding clusters of vertices in
graphs. This usually means finding a set of vertices that is connected to the rest of the graph by a
small number of edges. There are many ways of balancing the size of the set of vertices with the
number of edges. We will examine isoperimetric ratio and conductance, and will find that they
are intimately related to the second-smallest eigenvalue of the Laplacian and the normalized
Laplacian. The motivations for measuring these range from algorithm design to data analysis.
Let S be a subset of the vertices of a graph. One way of measuring how well S can be separated
from the graph is to count the number of edges connecting S to the rest of the graph. These
edges are called the boundary of S, which we formally define by
∂(S) := {(a, b) ∈ E : a ∈ S, b ∉ S} .
We are less interested in the total number of edges on the boundary than in the ratio of this
number to the size of S itself. For now, we will measure this in the most natural way: by the
number of vertices in S. We will call this ratio the isoperimetric ratio of S, and define it by
θ(S) := |∂(S)| / |S| .
The isoperimetric ratio of a graph1 is the minimum isoperimetric ratio over all sets of at most
half the vertices:
θG := min_{|S|≤n/2} θ(S).
We will now derive a lower bound on θG in terms of λ2 . We will present an upper bound, known
as Cheeger’s Inequality in Chapter 21.
¹Other authors call this the isoperimetric number.
Theorem 20.1.1. For every S ⊆ V ,
|∂(S)| ≥ λ2 |S| (1 − |S| /n).
Proof. As
λ2 = min_{x : x^T 1 = 0} (x^T LG x ) / (x^T x ),
for every non-zero x orthogonal to 1 we know that
x^T LG x ≥ λ2 x^T x .
To exploit this inequality, we need a vector related to the set S. A natural choice is 1S , the
characteristic vector of S,
1S (a) = 1 if a ∈ S, and 0 otherwise.
We find
1S^T LG 1S = ∑_{(a,b)∈E} (1S (a) − 1S (b))² = |∂(S)| .
However, 1S is not orthogonal to 1. To fix this, set s = |S| /n and use
x = 1S − s1,
so
x (a) = 1 − s for a ∈ S, and −s otherwise.
We have x^T 1 = 0, and
x^T LG x = ∑_{(a,b)∈E} ((1S (a) − s) − (1S (b) − s))² = |∂(S)| .
As in the proof of Theorem 19.3.2, x^T x = |S| (1 − s), so
|∂(S)| = x^T LG x ≥ λ2 x^T x = λ2 |S| (1 − |S| /n).
This theorem says that if λ2 is big, then G is very well connected: the boundary of every small set
of vertices is at least λ2 times something just slightly smaller than the number of vertices in the
set.
Re-arranging terms slightly, Theorem 20.1.1 can be stated as
θ(S) · |V | / |V − S| = (|∂(S)| / |S|) · (|V | / |V − S|) ≥ λ2 .
20.2 Conductance
The formula for conductance is appropriately normalized for weighted graphs. Instead of counting
the edges on the boundary, we count the sum of their weights. Similarly, the denominator
depends upon the sum of the weighted degrees of the vertices in S. We write d(S) for the sum of
the degrees of the vertices in S. Thus, d(V ) is twice the sum of the weights of edges in the graph.
For a set of edges F , we write w(F ) for the sum of the weights of edges in F . We define the
conductance of S to be
φ(S) := w(∂(S)) / min(d(S), d(V − S)).
Note that many similar, although sometimes slightly different, definitions appear in the literature.
For example, one could instead use
d(V ) w(∂(S)) / ( d(S) d(V − S) ),
which appears below in (20.3).
We define the conductance of a graph G to be
φG := min_{S⊂V} φ(S).
The conductance of a graph is more useful in many applications than the isoperimetric number. I
usually find that conductance is the more useful quantity when you are concerned about edges,
and that isoperimetric ratio is most useful when you are concerned about vertices. Conductance
is particularly useful when studying random walks in graphs.
It seems natural to try to relate the conductance to the following generalized Rayleigh quotient:
(y^T L y ) / (y^T D y ). (20.1)
If we make the change of variables
D^{1/2} y = x ,
then this ratio becomes
(x^T D^{−1/2} L D^{−1/2} x ) / (x^T x ).
That is an ordinary Rayleigh quotient, which we understand a little better. The matrix in the
middle is called the normalized Laplacian (see [Chu97]). We reserve the letter N for this matrix:
N := D^{−1/2} L D^{−1/2} .
This matrix often proves more useful when examining graphs in which nodes have different
degrees. We will let 0 = ν1 ≤ ν2 ≤ · · · ≤ νn denote the eigenvalues of N .
The eigenvector of eigenvalue 0 of N is d 1/2 , by which I mean the vector whose entry for vertex u
is the square root of the degree of u. Observe that
N d^{1/2} = D^{−1/2} L D^{−1/2} d^{1/2} = D^{−1/2} L 1 = 0.
Applying the change of variables x = D^{1/2} y to the Courant-Fischer characterization of ν2 ,
we find
ν2 = min_{y ⊥ d} (y^T L y ) / (y^T D y ).
We will now show that
ν2 /2 ≤ φG . (20.2)
Proof. We would again like to use 1S as a test vector. But, it is not orthogonal to d . To fix
this, we subtract a constant. Set
y = 1S − σ1,
where
σ = d(S)/d(V ).
You should now check that y^T d = 0:
y^T d = 1S^T d − σ 1^T d = d(S) − (d(S)/d(V )) d(V ) = 0.
It remains to compute y^T L y and y^T D y . As before, y^T L y = w(∂(S)). If you remember the
previous computation, you would guess that y^T D y = d(S)(1 − σ) = d(S)d(V − S)/d(V ), and you
would be right:
y^T D y = ∑_{u∈S} d(u)(1 − σ)² + ∑_{u∉S} d(u)σ² = d(S)(1 − σ)² + d(V − S)σ² = d(S)(1 − σ).
So,
ν2 ≤ (y^T L y ) / (y^T D y ) = w(∂(S)) d(V ) / ( d(S) d(V − S) ). (20.3)
Proof of (20.2). As the larger of d(S) and d(V − S) is at least half of d(V ), we find
ν2 ≤ 2 w(∂(S)) / min(d(S), d(V − S)) = 2φ(S).
As this holds for every S ⊂ V , we conclude ν2 /2 ≤ φG .
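To make these quantities concrete, here is a small Julia example (my own, not one of the book's notebooks) computing N , ν2 , and the conductance of a candidate set on a ring of 8 vertices; the graph and the set S are my own choices.

# A small example (my own) illustrating ν2/2 ≤ φ(S) on a ring of 8 vertices.
using LinearAlgebra

n = 8
A = zeros(n, n)
for i in 1:n
    j = mod1(i + 1, n)
    A[i, j] = 1.0; A[j, i] = 1.0
end
d = vec(sum(A, dims = 2))
L = Diagonal(d) - A
N = Diagonal(d .^ -0.5) * L * Diagonal(d .^ -0.5)   # normalized Laplacian
ν = eigvals(Symmetric(Matrix(N)))

S = collect(1:4)                          # half the ring
cut = sum(A[a, b] for a in S, b in setdiff(1:n, S))
φS = cut / min(sum(d[S]), sum(d) - sum(d[S]))
@show ν[2] / 2, φS                        # ≈ 0.146 ≤ 0.25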
20.4 Notes
There are many variations on the definitions used in this chapter. For example, sometimes one
wants to measure the number of vertices on the boundary of a set, rather than the number of
edges. The ratio of the number of boundary vertices to internal vertices is often called expansion.
But, authors are not consistent about these and related terms. Cut ratio is sometimes used
instead of isoperimetric ratio. When reading anything in this area, be sure to check the formulas
for the definitions.
Chapter 21
Cheeger’s Inequality
In the last chapter we learned that φ(S) ≥ ν2 /2 for every S ⊆ V . Cheeger’s inequality is a partial
converse. It says that there exists a set of vertices S for which
φ(S) ≤ √(2ν2 ),
and provides an algorithm for using the eigenvector of ν2 to find such a set.
Cheeger [Che70] first proved his famous inequality for manifolds. Many discrete versions of
Cheeger’s inequality were proved in the late 80’s [SJ89, LS88, AM85, Alo86, Dod84, Var85]. Some
of these consider the walk matrix instead of the normalized Laplacian, and some consider the
isoperimetric ratio instead of conductance. The proof in this Chapter follows an approach
developed by Trevisan [Tre11].
Cheeger’s inequality proves that if we have a vector y , orthogonal to d , for which the generalized
Rayleigh quotient (20.1) is small, then one can obtain a set of small conductance from y . We
obtain such a set by carefully choosing a real number τ , and setting
Sτ = {a : y (a) ≤ τ } .
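A sketch of this sweep cut in Julia (my own, not one of the book's notebooks): compute the eigenvector of ν2 , undo the change of variables, sort the vertices, and take the best threshold cut.

# A sketch (my own) of the sweep cut over threshold sets of the ν2 eigenvector.
using LinearAlgebra

function sweep_cut(A::Matrix{Float64})
    n = size(A, 1)
    d = vec(sum(A, dims = 2))
    N = Diagonal(d .^ -0.5) * (Diagonal(d) - A) * Diagonal(d .^ -0.5)
    ψ = eigvecs(Symmetric(N))[:, 2]        # eigenvector of ν2
    y = ψ ./ sqrt.(d)                       # y = D^{-1/2} ψ, so d' * y = 0
    order = sortperm(y)
    best, bestS = Inf, Int[]
    for k in 1:n-1
        S = order[1:k]                      # S_τ for τ between consecutive values of y
        cut = sum(A[a, b] for a in S, b in setdiff(1:n, S))
        φ = cut / min(sum(d[S]), sum(d) - sum(d[S]))
        if φ < best
            best, bestS = φ, S
        end
    end
    return bestS, best
end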
We then set
z = y − y (j)1.
This vector z satisfies z (j) = 0. And, the following lemma tells us that
(z^T L z ) / (z^T D z ) ≤ (y^T L y ) / (y^T D y ).
Lemma 21.1.2. Let v_s = y + s1. Then, the minimum of v_s^T D v_s is achieved at the s for which
v_s^T d = 0.
Proof. The derivative with respect to s is 2 d^T v_s , and this is zero at the minimum.
Theorem 21.1.3. Let G be a weighted graph, let L be its Laplacian, and let d be its vector of
weighted degrees. Let z be a vector that is centered with respect to d . Then, there is a number τ
for which the set Sτ = {a : z (a) < τ } satisfies
φ(Sτ ) ≤ √( 2 (z^T L z ) / (z^T D z ) ).
We assume that the vertices are ordered so that z (1) ≤ z (2) ≤ · · · ≤ z (n), and that z is
normalized so that
z (1)² + z (n)² = 1.
This can be achieved by multiplying z by a constant. We begin our proof of Cheeger’s inequality
by defining
ρ = (z^T L z ) / (z^T D z ).
So, we need to show that there is a τ for which φ(Sτ ) ≤ √(2ρ).
Recall that
w(∂(S))
φ(S) = .
min(d(S), d(V − S))
We will define a distribution on τ for which we can prove that
E [w(∂(Sτ ))] ≤ √(2ρ) E [min(d(Sτ ), d(V − Sτ ))] .
We choose τ from the interval [z (1), z (n)] with probability density 2 |τ |. This is a probability
distribution, as
∫_{z(1)}^{z(n)} 2 |t| dt = z (1)² + z (n)² = 1,
as z (1) ≤ 0 ≤ z (n).
Similarly, the probability that τ lies in the interval [a, b] is
∫_a^b 2 |t| dt = sgn(b) b² − sgn(a) a²,
where sgn(x) = 1 if x > 0, 0 if x = 0, and −1 if x < 0.
Lemma 21.1.4.
E_τ [w(∂(Sτ ))] = ∑_{(a,b)∈E} wa,b Pr_τ [(a, b) ∈ ∂(Sτ )] ≤ ∑_{(a,b)∈E} wa,b |z (a) − z (b)| (|z (a)| + |z (b)|). (21.1)
|z (a)² − z (b)²| = |(z (a) − z (b))(z (a) + z (b))| ≤ |z (a) − z (b)| (|z (a)| + |z (b)|).
Lemma 21.1.5.
E_τ [min(d(Sτ ), d(V − Sτ ))] = z^T D z .
That is, for a < j, a is in the smaller set if τ < 0; and, for a ≥ j, a is in the smaller set if τ ≥ 0.
So,
E_τ [min(d(Sτ ), d(V − Sτ ))] = ∑_{a<j} Pr [z (a) < τ and τ < 0] d(a) + ∑_{a≥j} Pr [z (a) > τ and τ ≥ 0] d(a)
= ∑_{a<j} Pr [z (a) < τ < 0] d(a) + ∑_{a≥j} Pr [z (a) > τ ≥ 0] d(a)
= ∑_{a<j} z (a)² d(a) + ∑_{a≥j} z (a)² d(a)
= ∑_a z (a)² d(a)
= z^T D z .
We have now shown that E_τ [min(d(Sτ ), d(V − Sτ ))] = z^T D z , and that
E_τ [w(∂(Sτ ))] ≤ ∑_{(a,b)∈E} wa,b |z (a) − z (b)| (|z (a)| + |z (b)|).
We may use the Cauchy-Schwarz inequality to upper bound the term above by
√( ∑_{(a,b)∈E} wa,b (z (a) − z (b))² ) · √( ∑_{(a,b)∈E} wa,b (|z (a)| + |z (b)|)² ). (21.2)
We have defined ρ so that the term under the left-hand square root is at most
z^T L z ≤ ρ z^T D z .
The term under the right-hand square root satisfies
∑_{(a,b)∈E} wa,b (|z (a)| + |z (b)|)² ≤ 2 ∑_{(a,b)∈E} wa,b (z (a)² + z (b)²) = 2 ∑_a d(a) z (a)² = 2 z^T D z .
Combining these bounds gives E_τ [w(∂(Sτ ))] ≤ √(2ρ) z^T D z = √(2ρ) E_τ [min(d(Sτ ), d(V − Sτ ))],
so there must be some τ for which φ(Sτ ) ≤ √(2ρ).
Chapter 22
Local Graph Clustering
Local graph clustering algorithms discover small clusters of low conductance near a given input
vertex. Imagine that a graph has a cluster S that is not too big (d (S) is small relative to d (V ))
and that has low conductance. Also imagine that we know some vertex a ∈ S. Local
clustering algorithms give us a way of computing a cluster nearby S of similar size and
conductance. They are not guaranteed to work for all a ∈ S. But, we can show that they work for
“most” a ∈ S, where we have to measure “most” by weighted degree. In this chapter, we will see
an elegant analysis due to Kwok, Lau and Lee [KLL16] of a random-walk based local graph
clustering algorithm suggested by Spielman and Teng [ST04, ST13].
Most local clustering algorithms can be implemented to run on unweighted graphs in time
depending on d (S), rather than on the size of the graph. This means that they can find the
cluster without having to examine the entire graph! Many of the developments in these
algorithms have improved the running time, the size of the set returned, and the conductance of
the set returned. The end of the chapter contains pointers to major advances in these algorithms.
In this chapter, we focus on proving that we can find a cluster approximately as good as S,
without optimizing parameters or run time.
The input to the algorithm is a target set size, s, a conductance bound φ, and a seed vertex, a.
We will prove that if G contains a set S with d (S) ≤ s ≤ d (V )/32 and φ(S) ≤ φ, then there is an
a ∈ S such that when the algorithm is run with these parameters, it will return a set T with
d (T ) ≤ 16s and φ(T ) ≤ √( 8 ln(8s) φ ). For the rest of this chapter we will assume that G does
contain a set S that satisfies these conditions.
Here is the algorithm.
1. Set p 0 = δ a .
2. Set t = 1/(2φ) (we will assume that t is an integer).
3. Set y = D^{−1} W̃^t p 0 .
4. Return the set of the form Tτ = {b : y (b) > τ } that has least conductance among those with
d (Tτ ) ≤ 8s.
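The following Julia sketch (my own, with simplified parameter handling; not one of the book's notebooks) implements these steps for an unweighted graph given as a dense adjacency matrix.

# A sketch (my own) of the local clustering procedure above.
using LinearAlgebra

function local_cluster(A::Matrix{Float64}, seed::Int, ϕ::Float64, s::Float64)
    n = size(A, 1)
    d = vec(sum(A, dims = 2))
    Wlazy = 0.5 * (I + A * Diagonal(1 ./ d))    # lazy walk matrix (I + M D⁻¹)/2
    t = ceil(Int, 1 / (2ϕ))
    p = zeros(n); p[seed] = 1.0                 # p0 = δ_seed
    for _ in 1:t
        p = Wlazy * p
    end
    y = p ./ d                                  # y = D⁻¹ W̃ᵗ p0
    order = sortperm(y, rev = true)             # sweep over sets Tτ = {b : y(b) > τ}
    best, bestT = Inf, Int[]
    for k in 1:n-1
        T = order[1:k]
        sum(d[T]) > 8s && break                 # keep only sets with d(T) ≤ 8s
        cut = sum(A[a, b] for a in T, b in setdiff(1:n, T))
        φT = cut / min(sum(d[T]), sum(d) - sum(d[T]))
        if φT < best
            best, bestT = φT, T
        end
    end
    return bestT, best
end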
Recall that the stable distribution of the random walk on a graph is d /(1T d ). So, to measure
how close a probability distribution p is to the stable distribution, we could ask how close D −1 p
is to being constant. In this chapter, we will measure this by the generalized Rayleigh quotient
(p^T D^{−1} L D^{−1} p) / (p^T D^{−1} p).
Recall that the lazy walk matrix is
W̃ = (1/2)(I + W ) = (1/2)(I + M D^{−1}).
We call a vertex a ∈ S good if
d (a)/d (S) ≥ 1/(2 |S|) and 1_S^T W̃^t δ_a ≥ 1/2.
The second inequality says that after t steps the lazy walk that starts at a will be in S with
probability at least 1/2. In this section we show that S contains a good vertex. We will then show
that the local clustering algorithm succeeds if it begins at a good vertex.
Consider the distribution on vertices that corresponds to choosing a vertex at random from S
with probability proportional to its degree:
(
def d (a)/d (S), for a ∈ S
pS =
0, otherwise.
The following lemma says that if we start a walk from a random vertex in S chosen with
probability proportional to degree, then the probability it is outside S on the tth step of the lazy
walk is at most tφ(S)/2.
Lemma 22.2.1. Let S be a set with d (S) ≤ d (V )/2. Let p_t = W̃^t p_S . Then
1_{V −S}^T p_t ≤ t φ(S)/2.
Proof. We will upper bound the probability that the lazy walk leaves S in each step by φ(S)/2. In
the first step, the probability that the lazy walk leaves S is exactly the sum over vertices a in S of
the probability the walk begins at a times the probability it follows an edge to a vertex not in S:
∑_{a∈S} p_S (a) · (1/2) ∑_{b∼a, b∉S} wa,b /d (a) = (1/(2 d (S))) ∑_{a∈S} ∑_{b∼a, b∉S} wa,b = (1/2) w(∂(S))/d (S) = (1/2) φ(S).
We now wish to show that in every future step the probability that the lazy walk leaves S is at
most this large. To this end, let p 0 = p S , and define
p_i = W̃ p_{i−1} .
We now show by induction that for every a ∈ V , p i (a) ≤ d (a)/d (S). This is true for p 0 , and in
fact the inequality is tight for a ∈ S. To establish the induction, note that all entries of W̃ and
p_{i−1} are nonnegative. So, the assumption that p_{i−1} (a) ≤ d (a)/d (S) for every a implies that
δ_a^T p_i = δ_a^T W̃ p_{i−1} ≤ δ_a^T W̃ d /d (S) = δ_a^T d /d (S) = d (a)/d (S).
Thus, the probability that the walk transitions from a vertex in S to a vertex not in S at step i
satisfies
∑_{a∈S} p_i (a) · (1/2) ∑_{b∼a, b∉S} wa,b /d (a) ≤ ∑_{a∈S} p_S (a) · (1/2) ∑_{b∼a, b∉S} wa,b /d (a) = (1/2) φ(S).
b6∈S b6∈S
As
∑_{a∈S} (d (a)/d (S)) b_a < ∑_{a∈S} (1/(2 |S|)) b_a ≤ ∑_{a∈S} 1/(2 |S|) = 1/2.
By slightly loosening the constants in the definition of “good”, we could prove that most vertices
of S are good, where “most” is defined by sampling with probability proportional to degree.
p^T D^{−1} p ≥ (1_S^T p)² / d (S).
Proof. Write
1_S^T p = ∑_{a∈S} p(a) = ∑_{a∈S} √(d (a)) · p(a)/√(d (a)),
and apply the Cauchy-Schwarz inequality to obtain
1_S^T p ≤ √( ∑_{a∈S} d (a) ) · √( ∑_{a∈S} p(a)²/d (a) ) ≤ √(d (S)) √(p^T D^{−1} p).
Squaring both sides gives the claim.
In particular, if the walk starts at a good vertex a, then 1_S^T p_t ≥ 1/2, and so
p_t^T D^{−1} p_t ≥ 1/(4 d (S)).
The following lemma allows us to measure how close a walk is to convergence merely in terms of
the quadratic form p_t^T D^{−1} p_t and the number of steps t.
(p_t^T D^{−1} L D^{−1} p_t) / (p_t^T D^{−1} p_t) ≤ (1/t) ln( (p_0^T D^{−1} p_0) / (p_t^T D^{−1} p_t) ).
Theorem 22.4.2. [Power Means Inequality] For k > h > 0, nonnegative numbers w1 , . . . , wn that
sum to 1, and nonnegative numbers λ1 , . . . , λn ,
( ∑_{i=1}^n wi λi^k )^{1/k} ≥ ( ∑_{i=1}^n wi λi^h )^{1/h} .
and set
γ = 1/∑_i ci² = 1/(z_0^T z_0),
so that ∑_i γ ci² = 1. We have
z_t = D^{−1/2} W̃^t p_0 = D^{−1/2} W̃^t D^{1/2} z_0 = (D^{−1/2} W̃ D^{1/2})^t z_0 .
The eigenvalues ωi of D^{−1/2} W̃ D^{1/2} = I − N /2 satisfy νi = 2 − 2ωi .
Thus,
z_t = ∑_i ci ωi^t ψ_i ,
and
z_t^T N z_t = ∑_i ci² νi ωi^{2t} = 2 ∑_i ci² ωi^{2t} − 2 ∑_i ci² ωi^{2t+1} .
Thus,
(γ z_t^T N z_t) / (γ z_t^T z_t) = ( 2 ∑_i γci² ωi^{2t} − 2 ∑_i γci² ωi^{2t+1} ) / ( ∑_i γci² ωi^{2t} )
= 2 − 2 ( ∑_i γci² ωi^{2t+1} ) / ( ∑_i γci² ωi^{2t} ).
To upper bound this last term, we recall that ∑_i γci² = 1 and apply the Power Means Inequality
to show
( ∑_i γci² ωi^{2t+1} )^{1/(2t+1)} ≥ ( ∑_i γci² ωi^{2t} )^{1/(2t)}
⟹ ∑_i γci² ωi^{2t+1} ≥ ( ∑_i γci² ωi^{2t} )^{1+1/(2t)}
⟹ ( ∑_i γci² ωi^{2t+1} ) / ( ∑_i γci² ωi^{2t} ) ≥ ( ∑_i γci² ωi^{2t} )^{1/(2t)} .
This implies
2 − 2 ( ∑_i γ c_i^2 ω_i^{2t+1} ) / ( ∑_i γ c_i^2 ω_i^{2t} ) ≤ 2 − 2 ( ∑_i γ c_i^2 ω_i^{2t} )^{1/(2t)} = 2 − 2 ( z_t^T z_t / z_0^T z_0 )^{1/(2t)}.
To finish the proof, let R = z_t^T z_t / z_0^T z_0, and note that for all R > 0, R^{1/(2t)} ≥ 1 − ln(1/R)/(2t), as e^x ≥ 1 + x.
So,
2 − 2 ( z_t^T z_t / z_0^T z_0 )^{1/(2t)} ≤ 2 − 2 (1 − ln(1/R)/(2t)) = ln(1/R)/t = (1/t) ln( z_0^T z_0 / z_t^T z_t ).
22.5 Rounding
def
To apply Cheeger’s inequality, Theorem 21.1.3, we first change variables from p t to y = D −1 p t .
As 1T p t = 1, the vector y satisfies d T y = 1, and
(p_t^T D^{-1} L D^{-1} p_t) / (p_t^T D^{-1} p_t) = (y^T L y) / (y^T D y).
So that we can be sure that the algorithm underlying Theorem 21.1.3 will find a set T that is not
too big, we will round to zero all the small entries of y and call the result x . While this is not
necessary for the algorithm, it does facilitate analysis.
Define
x(a) = max(0, y(a) − 1/(16s)).       (22.1)
If s ≤ d (V )/32, then x will be balanced with respect to d . This is because at most half its entries
(measured by degree) will be positive. Formally,
∑_{a: y(a)>1/(16s)} d(a) = ∑_{a: p_t(a)>d(a)/(16s)} d(a) < ∑_{a: p_t(a)>d(a)/(16s)} 16s · p_t(a) ≤ ∑_a 16s · p_t(a) ≤ 16s.
Tτ = {a : y (a) > τ } ,
Then, x T Dx ≥ y T Dy − 2.
Moreover, as shifting y and rounding entries to zero cannot increase the length of any edge,
x^T L x ≤ y^T L y.
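As a concrete illustration of this rounding step, here is a minimal Julia sketch, assuming the degrees d, the walk distribution p_t, and a target size s are given; the function name and inputs are mine, not part of the text.

# Rounding step (22.1): form y = D^{-1} p_t and zero out all entries below 1/(16s).
function truncate_walk(d::Vector, pt::Vector, s)
    y = pt ./ d
    x = max.(0, y .- 1 / (16s))
    return x
end

# As computed above, when s ≤ d(V)/32 the degree mass of the support of x is at most 16s:
# sum(d[a] for a in eachindex(x) if x[a] > 0) ≤ 16s.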
22.6 Notes
Explain where these come from, and give some references to where they are used in practice.
Chapter 23
Spectral Partitioning in a Stochastic Block Model
In this chapter, we show how eigenvectors can be used to partition graphs drawn from certain natural models. These are called stochastic block models or planted partition models, depending on the community and application.
The simplest model of this form is for the graph bisection problem. This is the problem of
partitioning the vertices of a graph into two equal-sized sets while minimizing the number of
edges bridging the sets. To create an instance of the planted bisection problem, we first choose a
partition of the vertices into equal-sized sets X and Y . We then choose probabilities p > q, and
place edges between vertices with the following probabilities:
Pr[(u, v) ∈ E] = p if u ∈ X and v ∈ X, p if u ∈ Y and v ∈ Y, and q otherwise.
The expected number of edges crossing between X and Y will be q|X||Y|. If p is sufficiently larger than q, for example if p = 1/2 and q = p − 24/√n, we will show that the partition can be approximately recovered from the second eigenvector of the adjacency matrix of the graph. The result, of course, extends to other values of p and q. This will be a crude version of an analysis of
McSherry [McS01].
If p is too close to q, then the partition given by X and Y will not be the smallest. For example, if q = p − ε/√n for small ε, then one cannot hope to distinguish between X and Y.
McSherry analyzed more general models than this, including planted coloring problems, and
sharp results have been obtained in a rich line of work. See, for example,
[MNS14, DKMZ11, BLM15, Mas14, Vu14].
McSherry’s analysis treats the adjacency matrix of the generated graph as a perturbation of one
ideal probability matrix. In the probability matrix the second eigenvector provides a clean
partition of the two blocks. McSherry shows that the difference between the generated matrix and
the ideal one is small, and so the generated matrix can be viewed as a small perturbation of the
ideal one. He then uses matrix perturbation theory to show that the second eigenvector of the
generated matrix will probably be close to the second eigenvector of the original, and so it reveals
the partition. The idea of using perturbation theory to analyze random objects generated from
nice models has been very powerful.
Warning: stochastic block models have been the focus of a lot of research lately, and there are
now very good algorithms for solving problems on graphs generated from these models. But,
these are just models and very little real data resembles that produced by these models. So, there
is no reason to believe that algorithms that are optimized for these models will be useful in
practice. Nevertheless, some of them are.
As long as we don’t tell our algorithm, we can choose X = {1, . . . , n/2} and
Y = {n/2 + 1, . . . , n}. Let’s do this for simplicity.
Define the matrix A whose diagonal entries are 0, whose remaining entries within X and within Y are p, and whose entries between X and Y are q. In block form,
A = [ p J_{n/2}   q J_{n/2} ;  q J_{n/2}   p J_{n/2} ] − p I_n,
where we write J_{n/2} for the square all-1s matrix of size n/2.
The adjacency matrix of the planted partition graph is obtained by setting M (a, b) = 1 with
probability A(a, b), subject to M (a, b) = M (b, a) and M (a, a) = 0. So, this is a random graph,
but the probabilities of some edges are different from others.
We will study a very simple algorithm for finding an approximation of the planted bisection:
compute ψ 2 , the eigenvector of the second-largest eigenvalue of M . Then, set
S = {a : ψ 2 (a) < 0}. We guess that S is one of the sets in the bisection. We will show that under
reasonable conditions on p and q, S will be mostly right. For example, we might consider p = 1/2 and q = 1/2 − 12/√n. Intuitively, the reason this works is that M is a slight perturbation of A, and so the eigenvectors of M should look like the eigenvectors of A.
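The following Julia sketch runs this simple algorithm on a synthetic planted bisection; the parameters, the seed, and the way I count errors are illustrative choices of mine.

using LinearAlgebra, Random

Random.seed!(1)
n = 1000
p, q = 1/2, 1/2 - 12/sqrt(n)
inX(a) = a <= n ÷ 2                        # X = {1,...,n/2}, Y = {n/2+1,...,n}

M = zeros(Int, n, n)
for a in 1:n, b in (a+1):n
    if rand() < (inX(a) == inX(b) ? p : q)
        M[a, b] = 1; M[b, a] = 1
    end
end

E = eigen(Symmetric(float.(M)))            # eigenvalues in increasing order
psi2 = E.vectors[:, n-1]                   # eigenvector of the second-largest eigenvalue
S = findall(x -> x < 0, psi2)              # our guess at one side of the bisection

err = min(length(symdiff(S, 1:n÷2)), length(symdiff(S, (n÷2+1):n)))
println("misclassified vertices: $err of $n")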
To simplify some formulas, we henceforth work with
M̂ = M + pI   and   Â = A + pI.
Â 1 = (n/2)(p + q) 1,
and so the corresponding eigenvalue is
α1 = (n/2)(p + q).
The second eigenvector of Â has two values: one on X and one on Y. Let's be careful to make this a unit vector. We take
φ2(a) = 1/√n for a ∈ X, and φ2(a) = −1/√n for a ∈ Y.
Then,
Â φ2 = (n/2)(p − q) φ2,
and the corresponding eigenvalue is
α2 = (n/2)(p − q).
As Â has rank 2, all the other eigenvalues of Â are zero.
We can use bounds similar to those proved in Chapter 8 to show that it is unlikely that R has large norm. The bounds that we proved on the norm of a matrix in which entries are chosen from {1 − p, −p} apply equally well if each entry (a, b) is chosen from {1 − q_{a,b}, −q_{a,b}}, as long as q_{a,b} < p and all have expectation 0, because (8.2) still applies. For a sharp result, we appeal to a theorem of Vu [Vu07, Theorem 1.4], which implies the following.
Theorem 23.1.1. There exist constants c1 and c2 such that with probability approaching 1,
‖R‖ ≤ 2√(p(1 − p)n) + c1 (p(1 − p)n)^{1/4} ln n,
provided that
p ≥ c2 ln^4(n) / n.
Corollary 23.1.2. There exists a constant c0 such that with probability approaching 1,
‖R‖ ≤ 3√(pn),
provided that
p ≥ c0 ln^4(n) / n.
In fact, Alon, Krivelevich and Vu [AKV02] prove that the probability that the norm of R exceeds
this value by more than t is exponentially small in t. However, we will not need that fact for this
lecture.
So, we can view µ2 as a perturbation of α2 . We need a stronger fact, which is that we can view
ψ 2 as a perturbation of φ2 .
The Davis-Kahan theorem [DK70] says that ψ2 will be close to φ2, in angle, if the norm of R is significantly less than the distance between α2 and the other eigenvalues of Â. That is, the eigenvector does not move too much if its corresponding eigenvalue is isolated.
The angle is never more than π/2, because this theorem is bounding the angle between the
eigenspaces rather than a particular choice of eigenvectors. We will prove and use a slightly
weaker statement in which we replace 2θ with θ.
23.3 Partitioning
Consider
δ = ψ 2 − φ2 ,
and let θ be the angle between them. For every vertex a that is misclassified by ψ2, we have |δ(a)| ≥ 1/√n. So, if ψ2 misclassifies k vertices, then
‖δ‖ ≥ √(k/n).
As φ2 and ψ2 are unit vectors, we may apply the crude inequality
‖δ‖ ≤ √2 sin θ
(the √2 disappears as θ gets small).
To combine this with the perturbation bound, we assume q > p/3, and find
min_{j≠2} |α2 − αj| = (n/2)(p − q).
Assuming that ‖R‖ ≤ 3√(pn), we find
sin θ ≤ 2‖R‖ / ((n/2)(p − q)) ≤ 2 · 3√(pn) / ((n/2)(p − q)) = 12√p / (√n (p − q)),
which implies
k ≤ 288 p / (p − q)^2.
So, we expect to misclassify at most a constant number of vertices if p and q remain constant as n grows large. An interesting case to consider is p = 1/2 and q = p − 24/√n. This gives
288 p / (p − q)^2 = n/4,
so we expect to misclassify at most a constant fraction of the vertices. Of course, once one gets most of the vertices correct it should be possible to use them to better classify the rest. Many of the advances in the study of algorithms for this problem involve better and more rigorous ways of doing this.
more than 1, we may also assume that αi has multiplicity 1 as an eigenvalue, and that ψ i is a
unit vector in the nullspace of B.
Our assumption that αi = 0 also leads to |βi | ≤ kRk by Weyl’s inequality (23.1).
Expand ψ_i in the eigenbasis of A, as
ψ_i = ∑_j c_j φ_j,   where c_j = φ_j^T ψ_i.
Setting
δ = min_{j≠i} |α_j|,
we compute
‖A ψ_i‖^2 = ∑_j c_j^2 α_j^2 ≥ ∑_{j≠i} c_j^2 δ^2 = δ^2 ∑_{j≠i} c_j^2 = δ^2 (1 − c_i^2) = δ^2 sin^2 θ_i.
So,
sin θ_i ≤ 2‖R‖ / δ.
It may seem surprising that the amount by which eigenvectors move depends upon how close
their respective eigenvalues are to the other eigenvalues. However, this dependence is necessary.
To see why, first consider the matrices
[ 1+ε  0 ; 0  1 ]   and   [ 1  0 ; 0  1+ε ].
While these two matrices are very close, their leading eigenvectors are (1, 0) and (0, 1), which are 90 degrees from each other.
The heart of the problem is that there is no unique eigenvector of an eigenvalue that has
multiplicity greater than 1.
If you would like to know more about bounding norms and eigenvalues of random matrices, I
recommend [Ver10] and [Tro12].
Chapter 24
Nodal Domains
24.1 Overview
In today’s lecture we will justify some of the behavior we observed when using eigenvectors to
draw graphs in the first lecture. First, recall some of the drawings we made of graphs:
We will show that the subgraphs obtained in the right and left halves of each image are connected.
Path graphs exhibited more interesting behavior: their kth eigenvector changes sign k − 1 times:
[Plots of eigenvectors v2, v3, v4 and v10 of the path graph: value in eigenvector against vertex number.]
Here are the analogous plots for a path graph with edge weights randomly chosen in [0, 1]:
[Analogous plots for the randomly weighted path: eigenvectors v2, v3, v4 and v11, value in eigenvector against vertex number.]
using Random, SparseArrays, LinearAlgebra, Plots
using Laplacians   # provides lap

Random.seed!(1)
M = spdiagm(1 => rand(10))   # random weights on the edges of a path
M = M + M'                   # symmetric weighted adjacency matrix
L = lap(M)
E = eigen(Matrix(L))
Plots.plot(E.vectors[:,2], label="v2", marker=5)
Plots.plot!(E.vectors[:,3], label="v3", marker=5)
Plots.plot!(E.vectors[:,4], label="v4", marker=5)
xlabel!("Vertex Number")
ylabel!("Value in Eigenvector")
savefig("rpath2v24.pdf")
We see that the kth eigenvector still changes sign k − 1 times. We will see that this always happens.
These are some of Fiedler’s theorems about “nodal domains”. Nodal domains are the connected
parts of a graph on which an eigenvector is negative or positive.
In this lecture, we will make use of Sylvester's law of inertia, which is a powerful generalization of this fact. I will state and prove it now.
Theorem 24.2.2 (Sylvester's Law of Inertia). Let A be any symmetric matrix and let B be any
non-singular matrix. Then, the matrix BAB T has the same number of positive, negative and zero
eigenvalues as A.
Note that if the matrix B were orthonormal, or if we used B −1 in place of B T , then these
matrices would have the same eigenvalues. What we are doing here is different, and corresponds
to a change of variables.
Proof. It is clear that A and BAB T have the same rank, and thus the same number of zero
eigenvalues.
We will prove that A has at least as many positive eigenvalues as BAB^T. One can similarly prove that A has at least as many negative eigenvalues, which proves the theorem.
Let γ1, . . . , γk be the positive eigenvalues of BAB^T and let Y_k be the span of the corresponding eigenvectors. Now, let S_k be the span of the vectors B^T y, for y ∈ Y_k. As B is non-singular, S_k has dimension k. Let α1 ≥ · · · ≥ αn be the eigenvalues of A. By the Courant-Fischer Theorem, we have
we have
x T Ax x T Ax y T BAB T y γk y T y
αk = maxn min ≥ min = min ≥ > 0.
S⊆IR x ∈S xTx x ∈Sk x T x y ∈Yk y T BB T y y T BB T y
dim(S)=k
So, A has at least k positive eigenvalues. (The point here is that the denominators are always positive, so we only need to think about the numerators.)
To finish, either apply the symmetric argument to the negative eigenvalues, or apply the same
argument with B −1 .
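Here is a quick numerical illustration of the theorem (my own, not from the text); the matrices are arbitrary random examples.

using LinearAlgebra, Random

Random.seed!(2)
# Count the (positive, negative, zero) eigenvalues of a symmetric matrix.
function inertia(M; tol = 1e-10)
    ev = eigvals(Symmetric(Matrix(M)))
    (count(x -> x > tol, ev), count(x -> x < -tol, ev), count(x -> abs(x) <= tol, ev))
end

A = randn(6, 6); A = (A + A') / 2      # an arbitrary symmetric matrix
B = randn(6, 6)                        # almost surely non-singular
println(inertia(A))
println(inertia(B * A * B'))           # the same signature as A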
Theorem 24.3.1. Let T be a weighted tree graph on n vertices, let LT have eigenvalues
0 = λ1 < λ2 · · · ≤ λn , and let ψ k be an eigenvector of λk . If there is no vertex u for which
ψ k (u) = 0, then there are exactly k − 1 edges for which ψ k (u)ψ k (v) < 0.
One can extend this theorem to accommodate zero entries and prove that the eigenvector changes sign k − 1 times. We will just prove this theorem for weighted path graphs.
Our analysis will rest on an understanding of Laplacians of paths that are allowed to have negative edge weights.
Lemma 24.3.2. Let M be the Laplacian matrix of a weighted path that can have negative edge weights:
M = ∑_{1≤a<n} w_{a,a+1} L_{a,a+1},
where the weights w_{a,a+1} are non-zero and we recall that L_{a,b} is the Laplacian of the edge (a, b).
The number of negative eigenvalues of M equals the number of negative edge weights.
We now perform a change of variables that will diagonalize the matrix M. Let δ(1) = x(1), and for every a > 1 let δ(a) = x(a) − x(a − 1). Every variable x(1), . . . , x(n) can be expressed as a linear combination of the variables δ(1), . . . , δ(n). In particular, x = Lδ, where L is the lower-triangular matrix with ones on and below the diagonal.
By Sylvester's Law of Inertia,
L^T M L
has the same number of positive, negative, and zero eigenvalues as M. On the other hand,
δ^T L^T M L δ = ∑_{1≤a<n} w_{a,a+1} (δ(a+1))^2.
So, this matrix clearly has one zero eigenvalue, and as many negative eigenvalues as there are
negative wa,a+1 .
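Here is a short Julia check of this lemma; the weights are an arbitrary example of mine.

using LinearAlgebra

# Laplacian of a weighted path with (possibly negative) weight w[a] on edge (a, a+1).
function path_lap(w)
    n = length(w) + 1
    L = zeros(n, n)
    for a in 1:length(w)
        L[a, a] += w[a];   L[a+1, a+1] += w[a]
        L[a, a+1] -= w[a]; L[a+1, a] -= w[a]
    end
    return L
end

w = [1.0, -0.5, 2.0, -1.5, 0.3]
ev = eigvals(Symmetric(path_lap(w)))
println("negative edge weights: ", count(x -> x < 0, w))
println("negative eigenvalues:  ", count(x -> x < -1e-10, ev))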
Proof of Theorem 24.3.1. We assume that λ_k has multiplicity 1. One can prove the theorem without this assumption, but we will skip that.
Let Ψ k denote the diagonal matrix with ψ k on the diagonal, and let λk be the corresponding
eigenvalue. Consider the matrix
M = Ψ k (LP − λk I )Ψ k .
The matrix LP − λk I has one zero eigenvalue and k − 1 negative eigenvalues. As we have
assumed that ψ k has no zero entries, Ψ k is non-singular, and so we may apply Sylvester’s Law of
Intertia to show that the same is true of M .
I claim that
M = ∑_{(u,v)∈E} w_{u,v} ψ_k(u) ψ_k(v) L_{u,v}.
To see this, first check that this agrees with the previous definition on the off-diagonal entries. To
verify that these expression agree on the diagonal entries, we will show that the sum of the entries
in each row of both expressions agree. As we know that all the off-diagonal entries agree, this
implies that the diagonal entries agree. We compute
Ψ k (LP − λk I )Ψ k 1 = Ψ k (LP − λk I )ψ k = Ψ k (λk ψ k − λk ψ k ) = 0.
As L_{u,v} 1 = 0, the row sums agree. Lemma 24.3.2 now tells us that the matrix M, and thus L_P − λ_k I, has as many negative eigenvalues as there are edges (u, v) for which ψ_k(u)ψ_k(v) < 0.
There are a few more facts from linear algebra that we will need for the rest of this lecture. We
stop to prove them now.
Proof. Consider the matrix A = σI − M , for some large σ. For σ sufficiently large, this matrix
will be non-negative, and the graph of its non-zero entries is connected. So, we may apply the
Perron-Frobenius theory to A to conclude that its largest eigenvalue α1 has multiplicity 1, and
the corresponding eigenvector v 1 may be assumed to be strictly positive. We then have
λ1 = σ − α1 , and v 1 is an eigenvector of λ1 .
We will often use the following elementary consequence of the Courant-Fischer Theorem. I will
assign it as homework.
Theorem 24.4.2 (Eigenvalue Interlacing). Let A be an n-by-n symmetric matrix and let B be a
principal submatrix of A of dimension n − 1 (that is, B is obtained by deleting the same row and
column from A). Then,
α1 ≥ β1 ≥ α2 ≥ β2 ≥ · · · ≥ αn−1 ≥ βn−1 ≥ αn ,
where α1 ≥ α2 ≥ · · · ≥ αn and β1 ≥ β2 ≥ · · · ≥ βn−1 are the eigenvalues of A and B, respectively.
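A numerical illustration of the interlacing theorem (not a proof), using an arbitrary random symmetric matrix:

using LinearAlgebra, Random

Random.seed!(3)
A = randn(6, 6); A = (A + A') / 2
B = A[1:5, 1:5]                        # delete the last row and column

alpha = sort(eigvals(Symmetric(A)), rev = true)
beta  = sort(eigvals(Symmetric(B)), rev = true)
for k in 1:5
    @assert alpha[k] >= beta[k] >= alpha[k+1]
end
println("interlacing holds for this example")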
Given a graph G = (V, E) and a subset of vertices, W ⊆ V, recall that the graph induced by G on W is the graph with vertex set W and edge set
{(i, j) ∈ E : i ∈ W and j ∈ W}.
W_k = {i ∈ V : ψ_k(i) ≥ 0}.
Proof. To see that W_k is non-empty, recall that ψ_1 = 1 and that ψ_k is orthogonal to ψ_1. So, ψ_k must have both positive and negative entries.
Assume that G(Wk ) has t connected components. After re-ordering the vertices so that the
vertices in one connected component of G(Wk ) appear first, and so on, we may assume that LG
and ψ k have the forms
L_G = [ B_1 0 ⋯ 0 C_1 ; 0 B_2 ⋯ 0 C_2 ; ⋮ ⋱ ⋮ ; 0 0 ⋯ B_t C_t ; C_1^T C_2^T ⋯ C_t^T D ],   ψ_k = [ x_1 ; x_2 ; ⋮ ; x_t ; y ],
and the eigenvalue equation L_G ψ_k = λ_k ψ_k holds in this block form.
The first t sets of rows and columns correspond to the t connected components. So, x i ≥ 0 for
1 ≤ i ≤ t and y < 0 (when I write this for a vector, I mean it holds for each entry). We also know
that the graph of non-zero entries in each Bi is connected, and that each Ci is non-positive, and
has at least one non-zero entry (otherwise the graph G would be disconnected).
We will now prove that the smallest eigenvalue of B_i is smaller than λ_k. We know that
B_i x_i + C_i y = λ_k x_i.
As C_i is non-positive and y is negative, C_i y ≥ 0, so
B_i x_i = λ_k x_i − C_i y ≤ λ_k x_i
and
x_i^T B_i x_i ≤ λ_k x_i^T x_i.
If x i has any zero entries, then the Perron-Frobenius theorem tells us that x i cannot be an
eigenvector of smallest eigenvalue, and so the smallest eigenvalue of Bi is less than λk . On the
other hand, if x i is strictly positive, then x Ti Ci y > 0, and
x Ti Bi x i = λk x Ti x i − x Ti Ci y < λk x Ti x i .
We remark that Fiedler actually proved a somewhat stronger theorem. He showed that the same
holds for
W = {i : ψ k (i) ≥ t} ,
for every t ≤ 0.
This theorem breaks down if we instead consider the set
W = {i : ψ k (i) > 0} .
[Figure: a small example with eigenvector entries 1, 0, −3, and 1 illustrating the breakdown.]
Chapter 25
The Second Eigenvalue of Planar Graphs
25.1 Overview
Spectral Graph theory first came to the attention of many because of the success of using the
second Laplacian eigenvector to partition planar graphs and scientific meshes
[DH72, DH73, Bar82, PSL90, Sim91].
In this lecture, we will attempt to explain this success by proving, at least for planar graphs, that
the second smallest Laplacian eigenvalue is small. One can then use Cheeger’s inequality to prove
that the corresponding eigenvector provides a good cut.
This was already known for the model case of a 2-dimensional grid. If the grid is of size √n-by-√n, then it has λ2 ≈ c/n. Cheeger's inequality then tells us that it has a cut of conductance √c/√n. And, this is in fact the cut that goes right across the middle of one of the axes, which is the cut of minimum conductance.
Theorem 25.1.1 ([ST07]). Let G be a planar graph with n vertices of maximum degree d, and let
λ2 be the second-smallest eigenvalue of its Laplacian. Then,
λ2 ≤ 8d/n.
The proof will involve almost no calculation, but will use some special properties of planar
graphs. However, this proof has been generalized to many planar-like graphs, including the
graphs of well-shaped 3d meshes.
We typically upper bound λ2 by evidencing a test vector. Here, we will upper bound λ2 by
evidencing a test embedding. The bound we apply is:
λ2 = min_{v_1,...,v_n ∈ ℝ^d : ∑_i v_i = 0} ( ∑_{(i,j)∈E} ‖v_i − v_j‖^2 ) / ( ∑_i ‖v_i‖^2 ).       (25.1)
Similarly,
∑_i ‖v_i‖^2 = ∑_i x_i^2 + ∑_i y_i^2 + · · · + ∑_i z_i^2,
and, as each coordinate vector is orthogonal to the constant vector,
( ∑_{(i,j)∈E} (x_i − x_j)^2 ) / ( ∑_i x_i^2 ) ≥ λ2.
For an example, consider the natural embedding of the square with corners (±1, ±1).
The key to applying this embedding lemma is to obtain the right embedding of a planar graph.
Usually, the right embedding of a planar graph is given by Koebe’s embedding theorem, which I
will now explain. I begin by considering one way of generating planar graphs. Consider a set of
circles {C1 , . . . , Cn } in the plane such that no pair of circles intersects in their interiors. Associate
a vertex with each circle, and create an edge between each pair of circles that meet at a boundary.
See Figure 25.2. The resulting graph is clearly planar. Koebe’s embedding theorem says that
every planar graph results from such an embedding.
Theorem 25.2.2 (Koebe). Let G = (V, E) be a planar graph. Then there exists a set of circles
{C1 , . . . , Cn } in IR2 that are interior-disjoint such that circle Ci touches circle Cj if and only if
(i, j) ∈ E.
This is an amazing theorem, which I won’t prove today. You can find a beautiful proof in the
book “Combinatorial Geometry” by Agarwal and Pach.
Such an embedding is often called a kissing disk embedding of the graph. From a kissing disk
embedding, we obtain a natural choice of v i : the center of disk Ci . Let ri denote the radius of
this disk. We now have an easy upper bound on the numerator of (25.1): ‖v_i − v_j‖^2 = (r_i + r_j)^2 ≤ 2r_i^2 + 2r_j^2. On the other hand, it is trickier to obtain a lower bound on ∑_i ‖v_i‖^2. In fact, there are graphs whose kissing disk embeddings result in
(25.1) = Θ(1).
These graphs come from triangles inside triangles inside triangles. . . Such a graph is depicted
below:
[Figure: the graph and its kissing disks.]
We will fix this problem by lifting the planar embeddings to the sphere by stereographic
projection. Given a plane, IR2 , and a sphere S tangent to the plane, we can define the
stereographic projection map, Π, from the plane to the sphere as follows: let s denote the point
where the sphere touches the plane, and let n denote the opposite point on the sphere. For any
point x on the plane, consider the line from x to n. It will intersect the sphere somewhere. We
let this point of intersection be Π(x ).
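To make the map concrete, here is a small Julia sketch of Π for one particular choice of mine: the unit sphere centered at (0, 0, 1), tangent to the plane z = 0 at the origin s = (0, 0, 0), with opposite point n = (0, 0, 2). The formula comes from intersecting the line through x and n with the sphere.

using LinearAlgebra

# Stereographic projection of the plane point (x1, x2, 0) onto the unit sphere
# centered at (0, 0, 1); the pole n = (0, 0, 2) is the image of the point at infinity.
function stereo(x1, x2)
    r2 = x1^2 + x2^2
    return [4x1, 4x2, 2r2] ./ (r2 + 4)
end

p = stereo(3.0, -1.0)
println(norm(p - [0, 0, 1]))   # ≈ 1: the image lies on the sphere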
The fundamental fact that we will exploit about stereographic projection is that it maps circles to
circles! So, by applying stereographic projection to a kissing disk embedding of a graph in the
plane, we obtain a kissing disk embedding of that graph on the sphere. Let Di = Π(Ci ) denote
the image of circle Ci on the sphere. We will now let v i denote the center of Di , on the sphere.
If we had ∑_i v_i = 0, the rest of the computation would be easy. For each i, ‖v_i‖ = 1, so the denominator of (25.1) is n. Let r_i denote the straight-line distance from v_i to the boundary of D_i. We then have (see Figure 25.2)
‖v_i − v_j‖^2 ≤ (r_i + r_j)^2 ≤ 2r_i^2 + 2r_j^2.
So, the numerator of (25.1) is at most 2d ∑_i r_i^2. On the other hand, a theorem of Archimedes tells us that the area of the cap encircled by D_i is exactly π r_i^2. Rather than proving it, I will convince you that it has to be true because it is true when r_i is small, it is true when the cap is a hemisphere and r_i = √2, and it is true when the cap is the whole sphere and r_i = 2.
As the caps are disjoint, we have
∑_i π r_i^2 ≤ 4π,
which implies that the numerator of (25.1) is at most
∑_{(a,b)∈E} ‖v_a − v_b‖^2 ≤ ∑_{(a,b)∈E} (2r_a^2 + 2r_b^2) ≤ 2d ∑_a r_a^2 ≤ 8d.
min_{v_1,...,v_n ∈ ℝ^d : ∑_i v_i = 0} ( ∑_{(i,j)∈E} ‖v_i − v_j‖^2 ) / ( ∑_i ‖v_i‖^2 ) ≤ 8d/n.
Note that there is enough freedom in our construction to believe that we could prove such a
thing: we can put the sphere anywhere on the plane, and we could even scale the image in the
plane before placing the sphere. By carefully combining these two operations, it is clear that we
can place the center of gravity of the v i s close to any point on the boundary of the sphere. It
turns out that this is sufficient to prove that we can place it at the origin.
We need a nice family of maps that transform our kissing disk embedding on the sphere. It is
particularly convenient to parameterize these by a point ω inside the sphere. For any point α on
the surface of the unit sphere, I will let Πα denote the stereographic projection from the plane
tangent to the sphere at α.
I will also define Π_α^{-1}. To handle the point −α, I let Π_α^{-1}(−α) = ∞, and Π_α(∞) = −α. We also define the map that dilates the plane tangent to the sphere at α by a factor of a: D_α^a. We then define the following map from the sphere to itself:
f_ω(x) = Π_{ω/‖ω‖}( D_{ω/‖ω‖}^{1−‖ω‖}( Π_{ω/‖ω‖}^{-1}(x) ) ).
For α ∈ S and ω = aα, this map pushes everything on the sphere to a point close to α. As a
approaches 1, the mass gets pushed closer and closer to α.
Instead of proving that we can achieve (25.2), I will prove a slightly simpler theorem. The proof
of the theorem we really want is similar, but about just a few minutes too long for class. We will
prove
Theorem 25.3.1. Let v_1, . . . , v_n be points on the unit sphere. Then, there exists an ω such that
∑_i f_ω(v_i) = 0.
The reason that this theorem is different from the one that we want to prove is that if we apply a
circle-preserving map from the sphere to itself, the center of the circle might not map to the
center of the image circle.
To show that we can achieve ∑_i v_i = 0, we will use the following topological lemma, which
follows immediately from Brouwer’s fixed point theorem. In the following, we let B denote the
ball of points of norm less than 1, and S the sphere of points of norm 1.
Lemma 25.3.2. Let φ : B → B be a continuous map that is the identity on S. Then, there exists an ω ∈ B such that
φ(ω) = 0.
Proof of Lemma 25.3.2. Let b be the map that sends z ∈ B to z / kz k. The map b is continuous
at every point other than 0. Now, assume by way of contradiction that 0 is not in the image of φ,
and let g(z ) = −b(φ(z )). By our assumption, g is continuous and maps B to B. However, it is
clear that g has no fixed point, contradicting Brouwer’s fixed point theorem.
Lemma 25.3.2 was our motivation for defining the maps f_ω in terms of ω ∈ B. Now consider setting
φ(ω) = (1/n) ∑_i f_ω(v_i).
The only thing that stops us from applying Lemma 25.3.2 at this point is that φ is not defined on
S, because fω was not defined for ω ∈ S. To fix this, we define for α ∈ S
f_α(z) = α if z ≠ −α, and f_α(z) = −α otherwise.
φ(ω) = 0.
To finish the proof, we need to get rid of this ε. That is, we wish to show that ω is bounded away from S, say by µ, for all sufficiently small ε. If that is the case, then we will have dist(ω, v_i) ≥ µ > 0 for all sufficiently small ε. So, for ε < µ and sufficiently small, h_ω(v_i) = 1 for all i, and we recover the ε = 0 case.
One can verify that this holds provided that the points v i are distinct and there are at least 3 of
them.
Finally, recall that this is not exactly the theorem we wanted to prove: this theorem deals with
v i , and not the centers of caps. The difficulty with centers of caps is that they move as the caps
move. However, this can be overcome by observing that the centers remain inside the caps, and
move continuously with ω. For a complete proof, see [ST07, Theorem 4.2]
This result has been improved in many ways. Jonathan Kelner [Kel06] generalized this result to
graphs of bounded genus. Kelner, Lee, Price and Teng [KLPT09] obtained analogous bounds for
λk for k ≥ 2. Biswal, Lee and Rao [BLR10] developed an entirely new set of techniques to prove
these results. Their techniques improve these bounds, and extend them to graphs that do not
have Kh minors for any constant h.
Chapter 26
Planar Graphs 2, the Colin de Verdière Number
26.1 Introduction
In this lecture, I will introduce the Colin de Verdière number of a graph, and sketch the proof that
it is three for planar graphs. Along the way, I will recall two important facts about planar graphs:
2. Planar graphs are the graphs that do not have K5 or K3,3 minors.
The Colin de Verdière graph parameter essentially measures the maximum multiplicity of the
second eigenvalue of a generalized Laplacian matrix of the graph. It is less than or equal to three
precisely for planar graphs.
We say that M is a Generalized Laplacian Matrix of a graph G = (V, E) if M can be expressed as M = L + D where L is the Laplacian matrix of a weighted version of G and D is an arbitrary diagonal matrix. That is, we impose the restrictions:
The Colin de Verdière graph parameter, which we denote cdv(G), is the maximum multiplicity of the second-smallest eigenvalue of a Generalized Laplacian Matrix M of G satisfying the following condition, known as the Strong Arnold Property.
For every non-zero n-by-n matrix X such that X(i, j) = 0 for i = j and (i, j) ∈ E,
M X 6= 0.
That latter restriction will be unnecessary for the results we will prove in this lecture.
Colin de Verdière [dV90] proved that cdv(G) is at most 2 if and only if the graph G is
outerplanar. That is, it is a planar graph in which every vertex lies on one face. He also proved
that it is at most 3 if and only if G is planar. Lovász and Schrijver [LS98] proved that it is at
most 4 if and only if the graph is linkless embeddable.
In this lecture, I will sketch proofs from two parts of this work:
The first requires the construction of a matrix, which we do using the representation of the graph
as a convex polytope. The second requires a proof that no Generalized Laplacian Matrix of the
graph has a second eigenvalue of high multiplicity. We prove this by using graph minors.
Let me begin by giving two definitions of convex polytope: as the convex hull of a set of points
and as the intersection of half-spaces.
Let x 1 , . . . , x n ∈ IRd (think d = 3). Then, the convex hull of x 1 , . . . , x n is the set of points
{ ∑_i a_i x_i : ∑_i a_i = 1 and all a_i ≥ 0 }.
Alternatively, given vectors y_1, . . . , y_m, the set of points x such that y_i^T x ≤ 1 for all i is a convex polytope. Moreover, every convex polytope containing the origin in its interior can be described in this way. Each vector y_i defines a face of the polytope consisting of those points x in the polytope such that y_i^T x = 1.
The vertices of a convex polytope are those points x in the polytope that cannot be expressed
non-trivially as a convex combination of any points other than themselves. The edges (or 1-faces)
of a convex polytope are the line segments on the boundary of the polytope that go between two
vertices of the polytope and such that every point on the edge cannot be expressed non-trivially
as the convex hull of any vertices other than these two.
Theorem 26.3.1 (Steinitz’s Theorem). For every three-connected planar graph G = (V, E), there
exists a set of vectors x 1 , . . . , x n ∈ IR3 such that the line segment from x i to x j is an edge of the
convex hull of the vectors if and only if (i, j) ∈ E.
That is, every planar graph may be represented by the edges of a three-dimensional convex
polytope. We will use this representation to construct a Generalized Laplacian Matrix M whose
second-smallest eigenvalue has multiplicity 3.
Let G = (V, E) be a planar graph, and let x 1 , . . . , x n ∈ IR3 be the vectors given by Steinitz’s
Theorem. For 1 ≤ i ≤ 3, let v i ∈ IRn be the vector given by
v i (j) = x j (i).
So, the vector v i contains the ith coordinate of each vector x 1 , . . . , x n .
We will now see how to construct a generalized Laplacian matrix M having the vectors v 1 , v 2 and
v 3 in its nullspace. One can also show that the matrix M has precisely one negative eigenvalue.
But, we won’t have time to do that in this lecture. You can find the details in [Lov01].
Our construction will exploit the vector cross product. Recall that for two vectors x and y in ℝ^3 it is possible to define a vector x × y that is orthogonal to both x and y, and whose length is the area of the parallelogram with sides x and y. This determines the cross product up to sign.
You should recall that the sign is determined by an ordering of the basis of IR3 , or by the right
hand rule. Also recall that
x × y = −y × x ,
(x 1 + x 2 ) × y = x 1 × y + x 2 × y , and
x × y = 0 if and only if x and y are parallel.
We will now specify the entries M (i, j) for (i, j) ∈ E. An edge (i, j) is on the boundary of two
faces of the polytope are y_a and y_b. So,
y_a^T x_i = y_a^T x_j = y_b^T x_i = y_b^T x_j = 1.
So,
(y_a − y_b)^T x_i = (y_a − y_b)^T x_j = 0.
This implies that y_a − y_b is parallel to x_i × x_j.
Assume y_a comes before y_b in the clockwise order about vertex x_i. So, y_b − y_a points in the same direction as x_i × x_j. Set M(i, j) so that
M(i, j) x_i × x_j = y_a − y_b.
This sum counts the difference y b − y a between each adjacent pair of faces that touch x i . By
going around x i in counter-clockwise order, we see that each of these vectors occurs once
positively and once negatively in the sum, so the sum is zero.
Thus, x i and x̂ i are parallel, and we may set M (i, i) so that
M (i, i)x i + x̂ i = 0.
This implies that the coordinate vectors are in the nullspace of M, as the ith row of M applied to the matrix whose rows are x_1^T, . . . , x_n^T is
M(i, i) x_i + ∑_{j∼i} M(i, j) x_j = M(i, i) x_i + x̂_i = 0.
One can also show that the matrix M has precisely one negative eigenvalue, so the multiplicity of
its second-smallest eigenvalue is 3.
I will now show you that cdv(G) ≤ 3 for every 3-connected planar graph G. To begin, I mention
one other characterization of planar graphs.
First, observe that if G is a planar graph, it remains planar when we remove an edge. Also
observe that if (u, v) is an edge, then the graph obtained by contracting (u, v) to one vertex is
also planar. Any graph H that can be obtained by removing and contracting edges from a graph
G is called a minor of G. It is easy to show that every minor of a planar graph is also planar.
Kuratowski’s Theorem tells us that a graph is planar if and only if it does not have K5 or K3,3
(the complete bipartite graph between two sets of 3 vertices) as a minor. We will just use the fact
that a planar graph does not have K3,3 as a minor.
26.6 cdv(G) ≤ 3
We will now prove that if G is a 3-connected planar graph, then cdv(G) ≤ 3. Assume, by way of contradiction, that there is a generalized Laplacian matrix M of G whose second eigenvalue λ2 has
[Figure: the face with vertices a, a′, b, b′, c, c′ and the surrounding positive and negative vertices a±, b±, c±.]
Now, contract every edge on the path from a to a′, on the path from b to b′, and on the path from c to c′. Also, contract all the vertices for which v is positive and contract all the vertices for which v is negative (which we can do because these sets are connected). Finally, contract every edge in the face F that does not involve one of a, b, or c. We obtain a graph with a triangular face abc such that each of a, b, and c has an edge to the positive supervertex and the negative supervertex.
[Figure 26.2: The set of positive and negative vertices that will be contracted. Vertex f has been inserted.]
To do this, we add one additional vertex f inside the face and connected to each of a, b, and c.
This does not violate planarity because a, b, and c were contained in a face. In fact, we can add f
before we do the contractions. By throwing away all other edges, we have constructed a K3,3
minor, so the graph cannot be planar.
[Figure 26.3: The edges in the cycle have been contracted, as have all the positive and negative vertices. After contracting the paths between a and a′, between b and b′, and between c and c′, we obtain a K3,3 minor.]
Part V
Expander Graphs
Chapter 27
Properties of Expander Graphs
27.1 Overview
We say that a d-regular graph is a good expander if all of its adjacency matrix eigenvalues are small. To quantify this, we set a threshold ε > 0, and require that each adjacency matrix eigenvalue, other than d, has absolute value at most εd. This is equivalent to requiring all non-zero eigenvalues of the Laplacian to be within εd of d.
In this lecture, we will:
Random d-regular graphs are expander graphs. Explicitly constructed expander graphs have
proved useful in a large number of algorithms and theorems. We will see some applications of
them next week.
One way of measuring how well two matrices A and B approximate each other is to measure the
operator norm of their difference: A − B. Since I consider the operator norm by default, I will
just refer to it as the norm. Recall that the norm of a matrix M is defined to be its largest
singular value:
kM x k
kM k = max ,
x kx k
where the norms in the fraction are the standard Euclidean vector norms. The norm of a
symmetric matrix is just the largest absolute value of one of its eigenvalues. It can be very
different for a non symmetric matrix.
For this lecture, we define an ε-expander to be a d-regular graph whose adjacency matrix eigenvalues satisfy |µ_i| ≤ εd for i ≥ 2. As the Laplacian matrix eigenvalues are given by λ_i = d − µ_i, this is equivalent to |d − λ_i| ≤ εd for i ≥ 2. It is also equivalent to
(1 − ε)H ≼ G ≼ (1 + ε)H,
where we recall that H ≼ G means that for all x,
x^T L_H x ≤ x^T L_G x.
I warn you that this definition is not symmetric. When I require a symmetric definition, I usually use the condition (1 + ε)^{-1} H ≼ G instead of (1 − ε)H ≼ G.
If G is an ε-expander, then for all x ∈ ℝ^V that are orthogonal to the constant vectors,
(1 − ε) d x^T x ≤ x^T L_G x ≤ (1 + ε) d x^T x.
On the other hand, for the complete graph K_n, we know that all x orthogonal to the constant vectors satisfy
x^T L_{K_n} x = n x^T x.
Let H be the graph
H = (d/n) K_n,
so
x^T L_H x = d x^T x.
So, G is an ε-approximation of H.
This tells us that LG − LH is a matrix of small norm. Observe that
There are many ways in which expander graphs act like random graphs. Conversely, one can prove
that a random d-regular graph is an expander graph with reasonably high probability [Fri08].
We will see that all sets of vertices in an expander graph act like random sets of vertices. To make
this precise, imagine creating a random set S ⊂ V by including each vertex in S independently
with probability α. How many edges do we expect to find between vertices in S? Well, for every
edge (u, v), the probability that u ∈ S is α and the probability that v ∈ S is α, so the probability
that both endpoints are in S is α2 . So, we expect an α2 fraction of the edges to go between
vertices in S. We will show that this is true for all sufficiently large sets S in an expander.
In fact, we will prove a stronger version of this statement for two sets S and T . Imagine including
each vertex in S independently with probability α and each vertex in T with probability β. We
allow vertices to belong to both S and T . For how many ordered pairs (u, v) ∈ E do we expect to
have u ∈ S and v ∈ T ? Obviously, it should hold for an αβ fraction of the pairs.
For a graph G = (V, E), define
E⃗(S, T) = {(u, v) : u ∈ S, v ∈ T, (u, v) ∈ E}.
We have put the arrow above the E in the definition, because we are considering ordered pairs of vertices. When S and T are disjoint, E⃗(S, T) contains each edge between S and T once, while E⃗(S, S) contains each edge inside S twice.
Observe that when α and β are greater than ε, the term on the right is less than αβdn.
In class, we will just prove this in the case that S and T are disjoint.
χ_S^T L_G χ_T = d|S ∩ T| − |E⃗(S, T)|.
This is almost as good as the bound we are trying to prove. To prove the claimed bound, recall that L_H x = L_H(x + c1) for all c. So, let x_S and x_T be the result of orthogonalizing χ_S and χ_T with respect to the constant vectors. By Claim 2.4.2 (from Lecture 2), ‖x_S‖^2 = n(α − α^2). So, we obtain the improved bound
while
‖x_S‖ ‖x_T‖ = n √((α − α^2)(β − β^2)).
So, we may conclude
|E⃗(S, T) − αβdn| ≤ εdn √((α − α^2)(β − β^2)).
We remark that when S and T are disjoint, the same proof goes through even if G is irregular and weighted, if we replace E⃗(S, T) with
w(S, T) = ∑_{(u,v)∈E, u∈S, v∈T} w(u, v).
We only need the fact that G ε-approximates (d/n)K_n. See [BSS12] for details.
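The following Julia sketch checks the bound numerically on a random d-regular multigraph built from d random perfect matchings; this construction, the parameters, and the use of edge multiplicities as weights are my own illustrative choices, and ε is computed from the actual spectrum of the generated graph.

using LinearAlgebra, Random

Random.seed!(4)
n, d = 200, 10
A = zeros(Int, n, n)
for _ in 1:d                               # union of d random perfect matchings
    perm = randperm(n)
    for i in 1:2:n
        a, b = perm[i], perm[i+1]
        A[a, b] += 1; A[b, a] += 1
    end
end

mu = sort(eigvals(Symmetric(float.(A))), rev = true)
epsG = maximum(abs.(mu[2:end])) / d        # this graph is an epsG-expander

S = randsubseq(1:n, 0.3)                   # a random set S
T = randsubseq(setdiff(1:n, S), 0.4)       # a random set T disjoint from S
alpha, beta = length(S)/n, length(T)/n
EST = sum(A[s, t] for s in S for t in T)   # weighted count of ordered pairs in E⃗(S, T)
println(abs(EST - alpha*beta*d*n), " ≤ ", epsG*d*n*sqrt((alpha-alpha^2)*(beta-beta^2)))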
The reason for the name expander graph is that small sets of vertices in expander graphs have
unusually large numbers of neighbors. For S ⊂ V , let N (S) denote the set of vertices that are
neighbors of vertices in S. The following theorem, called Tanner’s Theorem, provides a lower
bound on the size of N (S).
|N(S)| ≥ |S| / (ε^2 (1 − α) + α).
Note that when α is much less than ε^2, the term on the right is approximately |S|/ε^2, which can
be much larger than |S|. We will derive Tanner’s theorem from Theorem 27.3.1.
Proof. Let R = N(S) and let T = V − R. Then, there are no edges between S and T. Let |T| = βn and |R| = γn, so γ = 1 − β. By Theorem 27.3.1, it must be the case that
αβdn ≤ εdn √((α − α^2)(β − β^2)).
The lower bound on γ now follows by re-arranging terms. Dividing through by dn and squaring both sides gives
α^2 β^2 ≤ ε^2 (α − α^2)(β − β^2)  ⟺
αβ ≤ ε^2 (1 − α)(1 − β)  ⟺
β/(1 − β) ≤ ε^2 (1 − α)/α  ⟺
(1 − γ)/γ ≤ ε^2 (1 − α)/α  ⟺
1/γ ≤ (ε^2 (1 − α) + α)/α  ⟺
γ ≥ α / (ε^2 (1 − α) + α).
If instead of N(S) we consider N(S) − S, then T and S are disjoint, so the same proof goes through for weighted, irregular graphs that ε-approximate (d/n)K_n.
Consider applying Tanner's Theorem with S = {v} for some vertex v. As v has exactly d neighbors, we find
ε^2 (1 − 1/n) + 1/n ≥ 1/d,
from which we see that ε must be at least √((1/d − 1/n)/(1 − 1/n)), which is essentially 1/√d. But, how small can it be?
The Ramanujan graphs, constructed by Margulis [Mar88] and Lubotzky, Phillips and Sarnak [LPS88] achieve
ε ≤ 2√(d − 1)/d.
We will see that if we keep d fixed while we let n grow, ε cannot be smaller than this bound in the limit.
We will prove an upper bound on λ2 by constructing a suitable test function.
As a first step, choose two vertices v and u in V whose neighborhoods do not overlap. Consider the vector x defined by
x(i) = 1 if i = u, 1/√d if i ∈ N(u), −1 if i = v, −1/√d if i ∈ N(v), and 0 otherwise.
we find
y^T L y / y^T y = d + √d.
This is not so impressive, as it merely tells us that ε ≥ 1/√d, which we already knew. But, we
can improve this argument by pushing it further. We do this by modifying it in two ways. First,
we extend x to neighborhoods of neighborhoods of u and v. Second, instead of basing the
construction at vertices u and v, we base it at two edges. This way, each vertex has d − 1 edges to
those that are farther away from the centers of the construction.
The following theorem is attributed to A. Nilli [Nil91], but we suspect it was written by N. Alon.
Theorem 27.5.1. Let G be a d-regular graph containing two edges (u0 , u1 ) and (v0 , v1 ) that are
at distance at least 2k + 2. Then
λ2 ≤ d − 2√(d − 1) + (2√(d − 1) − 1)/(k + 1).
and
Y_0 = ∑_{i=0}^{k−1} |V_i| (d − 1)^{−i+1} (1 − 1/√(d − 1))^2 + |V_k| (d − 1)^{−k+1},   and   Y_1 = ∑_{i=0}^{k} |V_i| (d − 1)^{−i}.
By my favorite inequality, it suffices to prove upper bounds on X_0/X_1 and Y_0/Y_1. So, consider
( ∑_{i=0}^{k−1} |U_i| (d − 1)^{−i+1} (1 − 1/√(d − 1))^2 + |U_k| (d − 1)^{−k+1} ) / ( ∑_{i=0}^{k} |U_i| (d − 1)^{−i} ).
To upper bound the Rayleigh quotient, we observe that the left-most of these terms contributes
( ∑_{i=0}^{k} |U_i| (d − 1)^{−i} (d − 2√(d − 1)) ) / ( ∑_{i=0}^{k} |U_i| (d − 1)^{−i} ) = d − 2√(d − 1).
To bound the remaining term,
|U_k| (d − 1)^{−k} (2√(d − 1) − 1),
note that
|U_k| ≤ (d − 1)^{k−i} |U_i|.
So, we have
|U_k| / (d − 1)^k ≤ (1/(k + 1)) ∑_{i=0}^{k} |U_i| / (d − 1)^i.
What can we say about λ_n? In a previous iteration of this course, I falsely asserted that the same proof tells us that
λ_n ≥ d + 2√(d − 1) − (2√(d − 1) − 1)/(k + 1).
But, the proof did not work.
Another question is how well a graph of average degree d can approximate the complete graph.
That is, let G be a graph with dn/2 edges, but let G be irregular. While I doubt that irregularity
helps one approximate the complete graph, I do not know how to prove it.
We can generalize this question further. Let G = (V, E, w) be a weighted graph with dn/2 edges.
Can we prove that G cannot approximate a complete graph any better than the Ramanujan
graphs do? I conjecture that for every d and every β > 0 there is an n0 so that for every graph of
average degree d on n ≥ n0 vertices,
λ2/λ_n ≤ (d − 2√(d − 1)) / (d + 2√(d − 1)) + β.
Chapter 28
A Brief Introduction to Coding Theory
This chapter gives a short introduction to the combinatorial view of error-correcting codes. Our
motivation is twofold: good error-correcting codes provide choices for the generators of
generalized hypercubes that have high expansion, and in the next chapter we learn how to use
expander graphs to construct good error-correcting codes.
We begin and end the chapter with a warning: the combinatorial, worst-case view of coding
theory presented herein was very useful in the first few decades of the field. But, the problem of
error-correction is at its heart probabilistic and great advances have been made by avoiding the
worst-case formulation. For readers who would like to understand this perspective, we recommend
“Modern Coding Theory” by Richardson and Urbanke. For those who wish to learn more about
the worst-case approach, we recommend “The Theory of Error-Correcting Codes” by
MacWilliams and Sloane.
28.1 Coding
Error-correcting codes are used to compensate for noise and interference in communication. They
are used in practically all digital transmission and data storage schemes. We will only consider
the problem of storing or transmitting bits1 , or maybe symbols from some small discrete alphabet.
The only type of interference we will consider is the flipping of bits. Thus, 0101 may become
1101, but not 010. More noise means more bits are flipped.
In our model problem, a transmitter wants to send m bits, which means that the transmitter’s
message is an element of Fm2 . But, if the transmitter wants the receiver to correctly receive the
message in the presence of noise, the transmitter should not send the plain message. Rather, the
transmitter will send n > m bits, encoded in such a way that the receiver can figure out what the
message was even if there is a little bit of noise.
¹Everything is bits. You think that's air you're breathing?
A naive way of doing this would be for the transmitter to send every bit 3 times. If only 1 bit
were flipped during transmission, then the receiver would be able to figure out which one it was.
But, this is a very inefficient coding scheme. Much better approaches exist.
28.2 Notation
Recall that the Generalized Hypercubes we encountered in Section 7.4 have vertex set F_2^k and are defined by d ≥ k generators, g_1, . . . , g_d ∈ F_2^k. For each b ∈ F_2^k, the graph defined by these generators has an adjacency matrix eigenvalue given by
µ_b = ∑_{i=1}^{d} (−1)^{g_i^T b}.
Let G be the d-by-k matrix whose ith row is g_i^T. As (−1)^x = 1 − 2x for x ∈ {0, 1},
µ_b = ∑_{i=1}^{d} (−1)^{g_i^T b} = d − 2|Gb|.
The eigenvalue of d comes from b = 0. If Gb has Hamming weight close to d/2 for every other vector b, then all the other eigenvalues of the adjacency matrix will be small. We will see that this
condition is satisfied when G is the generator matrix of a good code.
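A small Julia illustration of this formula; the generator matrix below is an arbitrary example of mine, not a particularly good code.

# Eigenvalues of the generalized hypercube on F_2^k defined by the rows of G,
# computed as mu_b = d - 2|Gb| over all b in F_2^k.
G = [1 0 0; 0 1 0; 0 0 1; 1 1 0; 0 1 1]          # d = 5 generators in F_2^3
d, k = size(G)
mus = [d - 2 * count(isodd, G * [(idx >> j) & 1 for j in 0:k-1]) for idx in 0:2^k-1]
println(sort(mus, rev = true))                    # the largest eigenvalue, d, comes from b = 0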
The first idea in coding theory was the parity bit. It allows one to detect one error. Let’s say that
the transmitter wants to send b1 , . . . , bm . If the transmitter constructs
b_{m+1} = ∑_{i=1}^{m} b_i mod 2,       (28.1)
and sends
b1 , . . . , bm+1 ,
then the receiver will be able to detect one error, as it would cause (28.1) to be violated. But, the receiver won't know where the error is, and so won't be able to figure out the correct message unless it requests a retransmission. And, of course, the receiver wouldn't be able to detect 2 errors.
Hamming codes combine parity bits in an interesting way to enable the receiver to correct one
error. Let’s consider the first interesting Hamming code, which transmits 4-bit messages by
sending 7 bits in such a way that any one error can be corrected. Note that this is much better
than repeating every bit 3 times, which would require 12 bits.
For reasons that will be clear soon, we will let b3 , b5 , b6 , and b7 be the bits that the transmitter
would like to send. The parity bits will be chosen by the rules
b4 = b5 + b6 + b7
b2 = b3 + b6 + b7
b1 = b3 + b5 + b7 .
All additions, of course, are modulo 2. The transmitter will send the codeword b1 , . . . , b7 .
If we write the bits as a vector, then we see that they satisfy the linear equations
[ 0 0 0 1 1 1 1 ; 0 1 1 0 0 1 1 ; 1 0 1 0 1 0 1 ] (b_1, b_2, b_3, b_4, b_5, b_6, b_7)^T = (0, 0, 0)^T.
For example, to transmit the message 1010, we set
b3 = 1, b5 = 0, b6 = 1, b7 = 0,
and then compute
b1 = 1, b2 = 0, b4 = 1.
Let’s see what happens if some bit is flipped. Let the received transmission be c1 , . . . , c7 , and
assume that ci = bi for all i except that c6 = 0. This means that the parity check equations that
involved the 6th bit will now fail to be satisfied, or
[ 0 0 0 1 1 1 1 ; 0 1 1 0 0 1 1 ; 1 0 1 0 1 0 1 ] c = (1, 1, 0)^T.
Note that this is exactly the pattern of entries in the 6th column of the matrix. This will happen
in general. If just one bit is flipped, and we multiply the received transmission by the matrix, the
product will be the column of the matrix containing the flipped bit. As each column is different,
we can tell which bit it was. To make this even easier, the columns have been arranged to be the
binary representations of their index. For example, 110 is the binary representation of 6.
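Here is the whole procedure in a few lines of Julia, using the parity-check matrix above; the message 1010 and the flipped bit are the ones from the example in the text.

H = [0 0 0 1 1 1 1;
     0 1 1 0 0 1 1;
     1 0 1 0 1 0 1]

b = [1, 0, 1, 1, 0, 1, 0]            # the codeword for the message 1010: b1=1, b2=0, b4=1
@assert all(iszero, mod.(H * b, 2))  # all parity checks are satisfied

c = copy(b); c[6] = 1 - c[6]         # flip the 6th bit
s = mod.(H * c, 2)                   # the syndrome is the 6th column of H
println("flipped bit: ", 4*s[1] + 2*s[2] + s[3])   # 110 in binary, i.e. 6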
C : F_2^m → F_2^n,
for n larger than m. Every string in the image of C is called a codeword. We will also abuse
notation by identifying C with the set of codewords.
We define the rate of the code to be
r = m/n.
The rate of a code tells you how many bits of information you receive for each codeword bit. Of
course, codes of higher rate are more efficient.
The Hamming distance between two words c_1 and c_2 is the number of bits in which they differ. It will be written
dist(c_1, c_2) = |c_1 − c_2|.
The minimum distance of a code is
d = min_{c_1 ≠ c_2 ∈ C} dist(c_1, c_2)
(here we have used C to denote the set of codewords). It should be clear that if a code has large
minimum distance then it is possible to correct many errors. In particular, it is possible to correct
any number of errors less than d/2. To see why, let c be a codeword, and let r be the result of
flipping e < d/2 bits of c. As dist(c, r ) < d/2, c will be the closest codeword to r . This is
because for every c 1 6= c,
d ≤ dist(c 1 , c) ≤ dist(c 1 , r ) + dist(r , c) < dist(c 1 , r ) + d/2 implies d/2 < dist(c 1 , r ).
One of the early goals of coding theory was to construct asymptotically good sequences of codes.
Of course, one also needs to derive codes that have concise descriptions and that can be encoded
and decoded efficiently.
A big step in this direction was the use of linear codes . In the same way that we defined
Hamming codes, we may define a linear code as the set of vectors c ∈ Fn2 such that M c = 0, for
some (n − m)-by-n matrix M . In this chapter, we will instead define a code by its generator
matrix. Given an n-by-m matrix G, we define the code C_G to be the set of vectors of the form Gb, where b ∈ F_2^m. One may view b as the message to be transmitted, and Gb as its encoding.
A linear code is called linear because the sum of two codewords in the code is always another
codeword. In particular, 0 is always a codeword and the minimum distance of the code equals the
minimum Hamming weight of a non-zero codeword, as
dist(c_1, c_2) = |c_1 − c_2| = |c_1 + c_2|
over F_2.
We now pause to make the connection back to generalized hypercubes: if C_G has minimum relative distance δ and maximum relative distance 1 − δ, then the corresponding generalized hypercube is a (1 − 2δ)-expander.
In the early years of coding theory, there were many papers published that contained special
constructions of codes such as the Hamming code. But, as the number of bits to be transmitted
became larger and larger, it became more and more difficult to find such exceptional codes. Thus,
an asymptotic approach became reasonable. In his paper introducing coding theory, Shannon
[Sha48] proved that random codes are asymptotically good. A few years later, Elias [Eli55]
suggested using random linear codes.
We will now see that random linear codes are asymptotically good with high probability. We
consider a code of the form CG , where G is an n-by-m matrix with independent uniformly chosen
F2 entries. Clearly, the rate of the code will be m/n.
So, the minimum distance of C_G is
min_{0 ≠ b ∈ F_2^m} dist(0, Gb) = min_{0 ≠ b ∈ F_2^m} |Gb|,
where by |c| we mean the number of 1s in c. This is sometimes called the weight of c.
Here’s what we can say about the minimum distance of a random linear code. The following
argument is a refinement of the Chernoff based argument that appears in Section 7.5.
Lemma 28.6.1. Let G be a random n-by-m matrix. For any d, the probability that C_G has minimum distance at least d is at least
1 − (2^m / 2^n) ∑_{i=0}^{d} (n choose i).
Proof. It suffices to upper bound the probability that there is some non-zero b ∈ F_2^m for which |Gb| ≤ d.
To this end, fix some non-zero vector b in F_2^m. Each entry of Gb is the inner product of a row of G with b. As each row of G consists of random F_2 entries, each entry of Gb is chosen uniformly from F_2. As the rows of G are chosen independently, we see that Gb is a uniform random vector in F_2^n. Thus, the probability that |Gb| is at most d is precisely
(1/2^n) ∑_{i=0}^{d} (n choose i).
As the probability that one of a number of events holds is at most the sum of the probabilities
that each holds (the “union bound”),
Pr_G[ ∃ b ∈ F_2^m, b ≠ 0 : |Gb| ≤ d ] ≤ ∑_{0 ≠ b ∈ F_2^m} Pr_G[ |Gb| ≤ d ]
≤ (2^m − 1) (1/2^n) ∑_{i=0}^{d} (n choose i)
≤ (2^m / 2^n) ∑_{i=0}^{d} (n choose i).
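The following brute-force Julia sketch estimates the minimum distance of a random linear code for small parameters; m, n, and the seed are illustrative choices of mine.

using Random

Random.seed!(5)
n, m = 20, 10
G = rand(0:1, n, m)                  # a random n-by-m generator matrix over F_2

# Minimum weight of a nonzero codeword Gb over F_2, by brute force over all b ≠ 0.
weight(idx) = count(isodd, G * [(idx >> j) & 1 for j in 0:m-1])
println("minimum distance: ", minimum(weight(idx) for idx in 1:2^m-1))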
Reed-Solomon Codes are one of the workhorses of coding theory. They are simple to describe, and easy to encode and decode.
However, Reed-Solomon Codes are not binary codes. Rather, they are codes whose symbols are
elements of a finite field. If you don’t know what a finite field is, don’t worry (yet). For now, we
will just consider prime fields, Fp . These are the numbers modulo a prime p. Recall that such
numbers may be added, multiplied, and divided.
A message in a Reed-Solomon code over a field Fp is identified with a polynomial of degree m − 1.
That is, the message f1 , . . . , fm is viewed as providing the coefficients of the polynomial
Q(x) = ∑_{i=0}^{m−1} f_{i+1} x^i.
A Reed-Solomon code is encoded by evaluating it over every element of the field. That is, the
codeword is
Q(0), Q(1), Q(2), . . . , Q(p − 1).
Sometimes, it is evaluated at a subset of the field elements.
We will now see that the minimum distance of such a Reed-Solomon code is at least p − m. We show this using the following standard fact from algebra.
Lemma 28.7.1. Let Q be a polynomial of degree at most m − 1 over a field F_p. If there exist distinct field elements x_1, . . . , x_m such that
Q(x_i) = 0 for all i,
then Q is identically zero.
Theorem 28.7.2. The minimum distance of the Reed-Solomon code is at least p − m.
Proof. Let Q_1 and Q_2 be two different polynomials of degree at most m − 1. For a polynomial Q, let
E(Q) = (Q(0), Q(1), . . . , Q(p − 1))
be its encoding. If
dist(E(Q_1), E(Q_2)) ≤ p − k,
then there exist field elements x_1, . . . , x_k such that
Q_1(x_j) = Q_2(x_j).
Now, consider the polynomial
Q1 (x) − Q2 (x).
It also has degree at most m − 1, and it is zero at k field elements. Lemma 28.7.1 tells us that if
k ≥ m, then Q1 − Q2 is exactly zero, which means that Q1 = Q2 . Thus, for distinct Q1 and Q2 , it
must be the case that
dist(E(Q1 ), E(Q2 )) > p − m.
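A toy Julia encoder over a small prime field illustrates the theorem; the field size and the two messages are arbitrary choices of mine.

# Encode the message (f_1, ..., f_m) over F_p as the evaluations of
# Q(x) = f_1 + f_2 x + ... + f_m x^(m-1) at x = 0, 1, ..., p-1.
p, m = 13, 4
encode(f) = [mod(sum(f[i+1] * x^i for i in 0:m-1), p) for x in 0:p-1]

c1 = encode([1, 5, 0, 2])
c2 = encode([1, 5, 1, 2])
println("distance: ", count(c1 .!= c2), " (at least p - m = ", p - m, ")")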
However, Reed-Solomon codes do not provide an asymptotically good family. If one represents
each field element by log2 p bits in the obvious way, then the code has length p log2 p, but can
only correct at most p errors. That said, one can find an asymptotically good family by encoding
each field element with its own small error-correcting code.
Next lecture, we will see how to make asymptotically good codes out of expander graphs. In the
following lecture, we will use good error-correcting codes to construct graphs.
28.8 Caution
Chapter 29
Expander Codes
In this chapter we will learn how to use expander graphs to construct and decode asymptotically good error-correcting codes.
Our construction of error-correcting codes will exploit bipartite expander graphs (as these give a much cleaner construction than the general case). Let's begin by examining what a bipartite expander graph should look like. Its vertex set will have two parts, U and V, each having n vertices. Every vertex will have degree d, and every edge will go from a vertex in U to a vertex in V.
In the same way that we view ordinary expanders as approximations of complete graphs, we will view bipartite expanders as approximations of complete bipartite graphs. That is, if we let K_{n,n} denote the complete bipartite graph, then we want a d-regular bipartite graph G such that
\[ (1-\epsilon)\,\frac{d}{n} K_{n,n} \ \preceq\ G\ \preceq\ (1+\epsilon)\,\frac{d}{n} K_{n,n}. \]
As the eigenvalues of the Laplacian of (d/n) K_{n,n} are 0 and 2d with multiplicity 1 each, and d otherwise, this means that we want a d-regular graph G whose Laplacian spectrum satisfies
\[ \lambda_1 = 0, \quad \lambda_{2n} = 2d, \quad \text{and} \quad |\lambda_i - d| \le \epsilon d \ \text{ for all } 1 < i < 2n. \]
We can obtain such a graph by taking the double-cover of an ordinary expander graph.
Definition 29.1.1. Let G = (V, E) be a graph. The double-cover of G is the graph with vertex set
V × {0, 1} and edges
((a, 0), (b, 1)) , for (a, b) ∈ E.
Proposition 29.1.2. Let H be the double-cover of G. Then, for every eigenvalue λi of the
Laplacian of G, H has a pair of eigenvalues,
λi and 2d − λi .
The easiest way to prove this is to observe that if A is the adjacency matrix of G, then the adjacency matrix of H is
\[ \begin{pmatrix} 0 & A \\ A & 0 \end{pmatrix}. \]
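Here is a small Julia sketch (not from the text) that forms the double-cover from an adjacency matrix and checks this eigenvalue pairing numerically on a ring graph.

```julia
# A minimal sketch: form the double-cover of a graph from its adjacency matrix A
# and check Proposition 29.1.2 numerically on a small example (a ring graph here).
using LinearAlgebra

n, d = 8, 2
A = zeros(Int, n, n)                       # adjacency matrix of the ring on n vertices
for i in 1:n
    j = mod1(i + 1, n)
    A[i, j] = 1; A[j, i] = 1
end

H = [zeros(Int, n, n) A; A zeros(Int, n, n)]   # adjacency matrix of the double-cover

LG = d * I - A                              # Laplacians (both graphs are d-regular)
LH = d * I - H
lg = sort(eigvals(Symmetric(float(LG))))
lh = sort(eigvals(Symmetric(float(LH))))
# The spectrum of LH should be the multiset {λ, 2d - λ : λ an eigenvalue of LG}.
println(sort(vcat(lg, 2d .- lg)) ≈ lh)
```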
Our analysis of error-correcting codes will exploit the following theorem, which is analogous to
Theorem 10.2.1.
Theorem 29.1.3. Let G = (U ∪ V, E) be a d-regular bipartite graph that ε-approximates (d/n) K_{n,n}. Then, for all S ⊆ U and T ⊆ V,
\[ \left|\, |E(S,T)| - \frac{d}{n}|S|\,|T|\, \right| \le \epsilon d \sqrt{|S|\,|T|}. \]
Let G(S ∪ T ) denote the graph induced on vertex set S ∪ T . We use the following simple corollary
of Theorem 29.1.3.
Corollary 29.1.4. For S ⊆ U with |S| = σn and T ⊆ V with |T| = τn, the average degree of vertices in G(S ∪ T) is at most
\[ \frac{2d\sigma\tau}{\sigma+\tau} + \epsilon d. \]
Proof. The average degree of a graph is twice its number of edges, divided by the number of vertices. In our case, this is at most
\[ \frac{2d}{n}\,\frac{|S|\,|T|}{|S|+|T|} + 2\epsilon d\,\frac{\sqrt{|S|\,|T|}}{|S|+|T|}. \]
The first term equals 2dστ/(σ + τ), and the second is at most εd because √(|S||T|) ≤ (|S| + |T|)/2.
Our construction of error-correcting codes will require two ingredients: a d-regular bipartite
expander graph G on 2n vertices, and a linear error correcting code C0 of length d. We will
combine these to construct an error correcting code of length dn. We think of the code C0 as
being a small code that drives the construction. This is reasonable as we will keep d a small
constant while n grows.
In our construction of the code, we associate one bit with each edge of the graph. As the graph
has dn edges, this results in dn bits, which we label y1 , . . . , ydn . We now describe the code by
listing the linear constraints its codewords must satisfy. Each vertex requires that the bits on its
attached edges resemble a codeword in the code C0 . That is, each vertex should list its attached
edges in some order (which order doesn’t matter, but it should be fixed). As a vertex has d
attached edges, it is easy to require that the d bits on these edges are a codeword in the code C0 .
Let r_0 be the rate of the code C_0. This means that the space of codewords has dimension r_0 d. But, since C_0 is a linear code, it means that its codewords are exactly the vectors that satisfy some set of d(1 − r_0) linear equations. As there are 2n vertices in the graph, the constraints imposed by
each vertex impose 2nd(1 − r0 ) linear constraints on the dn bits. Thus, the vector space of
codewords that satisfy all of these constraints has dimension at least
dn − 2dn(1 − r0 ) = dn(2r0 − 1),
and the code we have constructed has rate at least
r = 2r0 − 1.
So, this rate will be a non-zero constant as long as r0 > 1/2.
For the rest of the lecture, we will let C denote the resulting expander code.
29.3 Encoding
We have described the set of codewords, but have not said how one should encode. As the code is
linear, it is relatively easy to find a way to encode it. In particular, one may turn the above
description of the code into a matrix M with dn columns and 2dn(1 − r0 ) rows such that the
codewords are precisely those y such that
M y = 0.
So, the codewords form a vector space of dimension dn(2r0 − 1), and so there is a matrix G with
dn(2r0 − 1) columns and dn rows for which the codewords are precisely the vectors Gx , for
x ∈ {0, 1}dn(2r0 −1) . In fact, there are many such matrices G, and they are called generator
matrices for the code. Such a matrix G may be computed from M by elementary linear algebra.
29.4 Minimum Distance

We will now see that if C_0 is a good code, then C has large minimum distance. Let δ_0 d be the minimum distance of the code C_0. You should think of δ_0 as being a constant. Let δ denote the minimum relative distance of C, so that the minimum distance of C is δ dn.

Theorem. If ε ≤ δ_0/2, then
\[ \delta \ge \delta_0^2/2. \]
Proof. As C is a linear code, it suffices to prove that C has no nonzero codewords of small
Hamming weight. To this end, we identify a codeword with the set of edges on which its bits are
1. Let F be such a set of edges, and let |F | = φdn. As the minimum distance of C0 is δ0 d, every
vertex v that is attached to an edge of F must be attached to at least δ0 d edges of F . Let S be
the subset of vertices of U adjacent to edges in F , and let T be the corresponding subset of V .
We have just argued that every vertex in G(S ∪ T ) must have degree at least δ0 d, and so in
particular the average degree of G(S ∪ T ) is at least δ0 d.
We may also use this fact to see that
\[ |S|,\ |T| \ \le\ \frac{|F|}{\delta_0 d}. \]
Setting σ = |S|/n and τ = |T|/n, the previous inequality becomes
\[ \sigma,\ \tau \ \le\ \frac{\phi}{\delta_0}. \]
By Corollary 29.1.4, the average degree of G(S ∪ T) is at most 2dστ/(σ + τ) + εd ≤ d√(στ) + εd ≤ dφ/δ_0 + εd. As this average degree is at least δ_0 d, we obtain
\[ \delta_0(\delta_0 - \epsilon) \le \phi. \]
The assumption ε ≤ δ_0/2 then yields
\[ \phi \ge \delta_0^2/2. \]
As we assumed that F was the set of edges corresponding to a codeword and that |F| = φdn, we have shown that the minimum relative distance of C is at least δ_0²/2.
29.5 Decoding
We will convert an algorithm that corrects errors in C0 into an algorithm for correcting errors in
C. The construction is fairly simple. We first apply the decoding algorithm at every vertex in U .
We then do it at every vertex in V . We alternate in this fashion until we produce a codeword.
To make this more concrete, assume that we have an algorithm A that corrects up to δ0 d/2 errors
in the code C0 . That is, on input any word r ∈ {0, 1}d , A outputs another word in {0, 1}d with
the guarantee that if there is a c ∈ C0 such that dist(c, r ) ≤ δ0 d/2, then A outputs c. We apply
the transformation A independently to the edges attached to each vertex of U . We then do the
same for V , and then alternate sides for a logarithmic number of iterations. We refer to these alternating operations as U- and V-decoding steps.
We will prove that if ε ≤ δ_0/3 then this algorithm will correct up to δ_0² dn/18 errors in at most
log_{4/3} n iterations. The idea is to keep track of which vertices are attached to edges that contain
errors, rather than keeping track of the errors themselves. We will exploit the fact that any vertex
that is attached to few edges in error will correct those errors. Let S be the set of vertices
attached to edges in error after a U -decoding step. We will show that the set T of vertices
attached to edges in error after the next V -decoding step will be much smaller.
Lemma 29.5.1. Assume that ε ≤ δ_0/3. Let F ⊂ E be a set of edges, let S be the subset of vertices in U attached to edges in F, and let T be the subset of vertices in V attached to at least δ_0 d/2 edges in F. If
\[ |S| \le \delta_0 n / 9, \]
then
\[ |T| \le \tfrac{3}{4}\, |S|. \]
Proof. Let |S| = σn and |T| = τn. We have |F| ≥ (δ_0 d/2)|T|. As the average degree of G(S ∪ T) is twice the number of edges in the subgraph divided by the number of vertices, it is at least
\[ \frac{\delta_0 d\, |T|}{|S| + |T|} = \frac{\delta_0 d\, \tau}{\sigma + \tau}. \]
This, with Corollary 29.1.4, implies
\[ \delta_0 \tau \le 2\sigma\tau + \epsilon(\sigma + \tau), \]
which becomes
\[ \tau \le \frac{\epsilon\sigma}{\delta_0 - 2\sigma - \epsilon}. \]
Recalling that σ ≤ δ_0/9 and ε ≤ δ_0/3, we obtain
\[ \tau \ \le\ \sigma\, \frac{\delta_0/3}{\delta_0\,(4/9)} \ =\ \frac{3}{4}\,\sigma. \]
Lemma 29.5.2. Assume that ε ≤ δ_0/3. Let F be the set of edges in error after a U-decoding step, and let S be the set of vertices in U attached to F. Now, perform a V-decoding step and let T be the set of vertices in V attached to edges in error afterwards. If
\[ |S| \le \delta_0 n/9, \]
then
\[ |T| \le \tfrac{3}{4}\,|S|. \]
Proof. Every vertex in V that outputs an error after the V -decoding step must be attached to at
least δ0 d/2 edges of F . Moreover, each of these edges is attached to a vertex of S. Thus, the
lemma follows immediately from Lemma 29.5.1.
Theorem 29.5.3. If ε ≤ δ_0/3, then the proposed decoding algorithm will correct every set of at most
\[ \frac{\delta_0^2}{18}\, dn \]
errors.
Proof. Let F denote the set of edges that are initially in error. Let S denote the set of vertices that output errors after the first U-decoding step. Every vertex in S must be adjacent to at least δ_0 d/2 edges in F, so
\[ |F| \le \frac{\delta_0^2}{18}\, dn \ \implies\ |S| \le \frac{|F|}{\delta_0 d / 2} \le \delta_0 n / 9. \]
After this point, we may apply Lemma 29.5.2 to show that the decoding process converges in at
most log4/3 n iterations.
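The following Julia sketch (not from the text) shows the shape of this alternating procedure. It treats the decoder for C_0 as a hypothetical black box `decode0` that corrects up to δ_0 d/2 errors in a length-d word, and assumes `Uedges[u]` and `Vedges[v]` list the d edge indices attached to each vertex in a fixed order.

```julia
# A minimal sketch of the alternating expander-code decoder, under the assumptions above.
function expander_decode(y, Uedges, Vedges, decode0; rounds=ceil(Int, log(4/3, length(y))))
    y = copy(y)
    for _ in 1:rounds
        for side in (Uedges, Vedges)            # a U-decoding step, then a V-decoding step
            for edges in side
                y[edges] = decode0(y[edges])    # replace the bits on this vertex's edges
            end                                  # by the nearest codeword of C0
        end
    end
    return y
end
```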
Gallager [Gal63] first used graphs to construct error-correcting codes. His graphs were also bipartite, with one set of vertices representing bits and the other set of vertices representing constraints. Tanner [Tan81] was the first to put the vertices on the edges. The use of expansion in analyzing these codes was pioneered by Sipser and Spielman [SS96]. The construction we present here is due to Zemor [Zem01], although he presents a tighter analysis. Improved constructions and analyses may be found in [BZ02, BZ05, BZ06, AS06].
Surprisingly, encoding these codes is slower than decoding them. The matrix G will be dense, leading to an encoding algorithm that takes time Θ((dn)²). Of course, one would prefer to encode
them in time O(dn). Using Ramanujan expanders and the Fast Fourier Transform over the
appropriate groups, Lafferty and Rockmore [LR97] reduced the time for encoding to O(d2 n4/3 ).
Spielman [Spi96a] modifies the code construction to obtain codes with similar performance that
may be encoded in linear time.
Related ideas have been used to design codes that approach channel capacity. See [?, ?, ?, ?].
Chapter 30

A Simple Construction of Expander Graphs

30.1 Overview
Our goal is to prove that for every ε > 0 there is a d for which we can efficiently construct an infinite family of d-regular ε-expanders. I recall that these are graphs whose adjacency matrix eigenvalues satisfy |µ_i| ≤ εd and whose Laplacian matrix eigenvalues satisfy |d − λ_i| ≤ εd, for all i ≥ 2. Viewed as a function of ε, the d that we obtain in this construction is rather large. But, it is a constant. The challenge here is to construct infinite families with fixed d and ε.
Before we begin, I remind you that in Lecture 5 we showed that random generalized hypercubes were expanders of degree f(ε) log n, for some function f. The reason they do not solve today's problem is that their degrees depend on the number of vertices. However, today's construction will require some small expander graph, and these graphs or graphs like them can serve in that role. So that we can obtain a construction for every number of vertices n, we will exploit random generalized ring graphs. Their analysis is similar to that of random generalized hypercubes.
Claim 30.1.1. There exists a function f(ε) so that for every ε > 0 and every sufficiently large n, the Cayley graph with group Z/n and a random set of at least f(ε) log n generators is an ε-expander with high probability.
I am going to present the simplest construction of expanders that I have been able to find. By
“simplest”, I mean optimizing the tradeoff of simplicity of construction with simplicity of
analysis. It is inspired by the Zig-Zag product and replacement product constructions presented
by Reingold, Vadhan and Wigderson [RVW02].
For those who want the quick description, here it is. Begin with an expander. Take its line graph.
Observe that the line graph is a union of cliques. So, replace each clique by a small expander. We
need to improve the expansion slightly, so square the graph. Square one more time. Repeat.
The analysis will be simple because all of the important parts are equalities, which I find easier to
understand than inequalities.
While this construction requires the choice of two expanders of constant size, it is explicit in the
sense that we can obtain a simple implicit representation of the graph: if the name of a vertex in
the graph is written using b bits, then we can compute its neighbors in time polynomial in b.
We will first show that we can obtain a family of ε-expanders, for every ε > 0, from a family of β-expanders for any β < 1. The reason is that squaring a graph makes it a better expander, although at the cost of increasing its degree.
Given a graph G, we define the graph G2 to be the graph in which vertices u and v are connected
if they are at distance 2 in G. Formally, G2 should be a weighted graph in which the weight of an
edge is the number of such paths. When first thinking about this, I suggest that you ignore the
issue. When you want to think about it, I suggest treating such weighted edges as multiedges.
We may form the adjacency matrix of G² from the adjacency matrix of G. Let M be the adjacency matrix of G. Then M²(u,v) is the number of paths of length 2 between u and v in G, and M²(v,v) is always d. We will eliminate those self-loops. So,
\[ M_{G^2} = M_G^2 - d I_n. \]
If G has no cycles of length up to 4, then all of the edges in its square will have weight 1. The
following claim is immediate from this definition.
Claim 30.2.1. The adjacency matrix eigenvalues of G² are precisely
\[ \mu_i^2 - d, \]
where µ_1, ..., µ_n are the adjacency matrix eigenvalues of G.
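A small Julia sketch (not from the text) of this squaring operation, checking Claim 30.2.1 on a ring graph:

```julia
# A minimal sketch: square a d-regular graph at the level of adjacency matrices,
# M_{G^2} = M^2 - d*I, and compare the spectra numerically.
using LinearAlgebra

n, d = 10, 2
M = zeros(n, n)
for i in 1:n
    j = mod1(i + 1, n)
    M[i, j] = 1; M[j, i] = 1
end

M2 = M * M - d * I                      # adjacency matrix of G^2 (self-loops removed)

mu  = sort(eigvals(Symmetric(M)))
mu2 = sort(eigvals(Symmetric(M2)))
println(sort(mu .^ 2 .- d) ≈ mu2)       # eigenvalues of G^2 are exactly μ_i^2 - d
```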
Lemma 30.2.2. If {G_i}_i is an infinite family of d-regular β-expanders for β ≥ 1/√(d − 1), then {G_i²}_i is an infinite family of d(d − 1)-regular β²-expanders.
We remark that the case of β > 1/√(d − 1), or even larger, is the case of interest. We are not expecting to work with graphs that beat the Ramanujan bound, 2√(d − 1)/d.
So, by squaring enough times, we can convert a family of β-expanders for any β < 1 into a family of ε-expanders.
To measure the qualities of the graphs that appear in our construction, we define a quantity that we will call the relative spectral gap of a d-regular graph:
\[ r(G) \ \stackrel{\mathrm{def}}{=}\ \min\left( \frac{\lambda_2(G)}{d},\ \frac{2d - \lambda_n(G)}{d} \right). \]
The graphs with larger relative spectral gaps are better expanders. An ε-expander has relative spectral gap at least 1 − ε, and vice versa. Because we can square graphs, we know that it suffices to find an infinite family of graphs with relative spectral gap strictly greater than 0.
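Here is a minimal Julia sketch (not from the text) that computes the relative spectral gap from a Laplacian; the complete graph is used as a hypothetical test case.

```julia
# A minimal sketch: compute r(G) = min(λ2/d, (2d - λn)/d) from the Laplacian spectrum.
using LinearAlgebra

function relative_spectral_gap(L, d)
    lams = sort(eigvals(Symmetric(float(L))))
    return min(lams[2] / d, (2d - lams[end]) / d)
end

n = 6
L = n * I - ones(n, n)                    # Laplacian of K_n, which is (n-1)-regular
println(relative_spectral_gap(L, n - 1))  # prints (n-2)/(n-1) = 0.8
```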
We now state exactly how squaring impacts the relative spectral gap of a graph.
Corollary 30.3.1. If G has relative spectral gap β, then G2 has relative spectral gap at least
2β − β 2 .
Our construction will leverage small expanders to make bigger expanders. To begin, we need a
way to make a graph bigger and still say something about its spectrum.
We use the line graph of a graph. Let G = (V, E) be a graph. The line graph of G is the graph whose vertices are the edges of G, in which two are connected if they share an endpoint in G. That is, ((u, v), (w, z)) is an edge of the line graph if one of {u, v} is the same as one of {w, z}. The line graph is often written L(G), but we won't do that in this class so that we can avoid confusion with the Laplacian.
Let G be a d-regular graph with n vertices, and let H be its line graph¹. As G has dn/2 edges, H has dn/2 vertices. Each vertex of H, say (u, v), has degree 2(d − 1): d − 1 neighbors for the other edges attached to u and d − 1 for the other edges attached to v. In fact, if we consider just one vertex u of G, then all vertices of H of the form (u, v) will be connected to one another. That is, H contains a d-clique for every vertex of G.
We see that each vertex of H is contained in exactly two of these cliques.
Here is the great fact about the spectrum of the line graph.
Lemma 30.4.1. Let G be a d-regular graph with n vertices, and let H be its line graph. Then the
spectrum of the Laplacian of H is the same as the spectrum of the Laplacian of G, except that it
has dn/2 − n extra eigenvalues of 2d.
Before we prove this lemma, we need to recall the factorization of a Laplacian as the product of
the signed edge-vertex adjacency matrix times its transpose. We reserved the letter U for this
matrix, and defined it by
\[ U\big((a,b),\, c\big) = \begin{cases} 1 & \text{if } a = c \\ -1 & \text{if } b = c \\ 0 & \text{otherwise.} \end{cases} \]
Define the matrix |U| to be the matrix obtained by replacing every entry of U by its absolute value. Now, consider |U|ᵀ|U|. It looks just like the Laplacian, except that all of its off-diagonal entries are 1 instead of −1. So,
\[ |U|^T |U| = D_G + M_G = dI + M_G, \]
as G is d-regular. We will also consider the matrix |U||U|ᵀ. This is a matrix with nd/2 rows and nd/2 columns, indexed by edges of G. The entry at the intersection of row (u, v) and column (w, z) is
\[ (\delta_u + \delta_v)^T (\delta_w + \delta_z). \]
So, it is 2 if these are the same edge, 1 if they share a vertex, and 0 otherwise. That is,
\[ |U|\,|U|^T = 2 I_{nd/2} + M_H. \]
Moreover, |U||U|ᵀ and |U|ᵀ|U| have the same eigenvalues, except that |U||U|ᵀ has nd/2 − n extra eigenvalues of 0.
1
If G has multiedges, which is how we interpret integer weights, then we include a vertex in the line graph for
each of those multiedges. These will be connected to each other by edges of weight two—one for each vertex that
they share. All of the following statements then work out.
\[
\begin{aligned}
\lambda_i \text{ is an eigenvalue of } D_G - M_G &\implies d - \lambda_i \text{ is an eigenvalue of } M_G \\
&\implies 2d - \lambda_i \text{ is an eigenvalue of } D_G + M_G \\
&\implies 2d - \lambda_i \text{ is an eigenvalue of } 2I_{nd/2} + M_H \\
&\implies 2(d-1) - \lambda_i \text{ is an eigenvalue of } M_H \\
&\implies \lambda_i \text{ is an eigenvalue of } D_H - M_H.
\end{aligned}
\]
Of course, this last matrix is the Laplacian matrix of H. We can similarly show that the extra dn/2 − n zero eigenvalues of 2I_{nd/2} + M_H become 2d in L_H.
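The following Julia sketch (not from the text) builds |U| for K_4, forms M_H = |U||U|ᵀ − 2I, and checks Lemma 30.4.1 numerically.

```julia
# A minimal sketch: line-graph spectrum via the unsigned incidence matrix |U|.
using LinearAlgebra

n = 4
edges = [(a, b) for a in 1:n for b in a+1:n]        # K_4 is 3-regular with 6 edges
d = n - 1

absU = zeros(length(edges), n)
for (e, (a, b)) in enumerate(edges)
    absU[e, a] = 1; absU[e, b] = 1                  # |U| has a 1 for each endpoint
end

MG = absU' * absU - d * I                           # since |U|'|U| = dI + M_G
MH = absU * absU' - 2I                              # since |U||U|' = 2I + M_H
LG = d * I - MG
LH = 2(d - 1) * I - MH                              # H is 2(d-1)-regular

lg = sort(eigvals(Symmetric(LG)))
lh = sort(eigvals(Symmetric(LH)))
# spectrum of L_H = spectrum of L_G plus dn/2 - n extra eigenvalues equal to 2d
println(sort(vcat(lg, fill(2d, length(edges) - n))) ≈ lh)
```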
While the line graph operation preserves λ2 , it causes the degree of the graph to grow. So, we are
going to need to do more than just take line graphs to construct expanders.
Proposition 30.5.1. Let G be a d-regular graph with d ≥ 7 and let H be its line graph. Then,
\[ r(H) = \frac{\lambda_2(G)}{2(d-1)} \ \ge\ r(G)/2. \]
Proof. For G a d-regular graph other than K_{d+1}, λ_2(G) ≤ d + 1. By the Perron-Frobenius theorem (Lemma 6.A.1), λ_max(G) ≤ 2d (with equality if and only if G is bipartite). So, λ_max(H) = 2d and λ_2(H) = λ_2(G) ≤ d + 1. So, the term in the definition of the relative spectral gap corresponding to the largest eigenvalue of H satisfies
\[ \frac{2 \cdot 2(d-1) - \lambda_{max}(H)}{2(d-1)} = \frac{2d - 4}{2(d-1)} = \frac{d-2}{d-1}, \]
which for d ≥ 7 is at least λ_2(G)/(2(d − 1)). Thus
\[ r(H) = \frac{\lambda_2(H)}{2(d-1)} = \frac{\lambda_2(G)}{2(d-1)} \ \ge\ \frac{\lambda_2(G)}{2d} \ \ge\ r(G)/2. \]
While the line graph of G has more vertices, its degree is higher and its relative spectral gap is
approximately half that of G. We can improve the relative spectral gap by squaring. In the next
section, we show how to lower the degree.
Our next step will be to construct approximations of line graphs. We already know how to
approximate complete graphs: we use expanders. As line graphs are sums of complete graphs, we
will approximate them by sums of expanders. That is, we replace each clique in the line graph by
an expander on d vertices. Since d will be a constant in our construction, we will be able to get
these small expanders from known constructions, like the random generalized ring graphs.
Let G be a d-regular graph and let Z be a graph on d vertices of degree k (we will use a low-degree expander). We define the graph G L Z to be the graph obtained by forming the line graph of G, H, and then replacing every d-clique in H by a copy of Z. Actually, this does not uniquely define G L Z, as there are many ways to replace a d-clique by a copy of Z. But, any choice will work. Note that every vertex of G L Z has degree 2k.
Lemma 30.6.1. Let G be a d-regular graph, let H be the line graph of G, and let Z be a k-regular α-expander. Then,
\[ (1-\alpha)\,\frac{k}{d}\, H \ \preceq\ G \mathbin{L} Z \ \preceq\ (1+\alpha)\,\frac{k}{d}\, H. \]
Proof. Write H as a sum of cliques, H = Σ_{i=1}^{n} H_i, where H_i is the d-clique on the edges of G attached to the ith vertex of G. Let Z_i be the graph obtained by replacing H_i with a copy of Z, on the same set of vertices. As Z is an α-expander, (1 − α)(k/d) L_{H_i} ≼ L_{Z_i} ≼ (1 + α)(k/d) L_{H_i}. To prove the lower bound, we compute
\[ L_{G \mathbin{L} Z} = \sum_{i=1}^{n} L_{Z_i} \ \succeq\ (1-\alpha)\,\frac{k}{d} \sum_{i=1}^{n} L_{H_i} = (1-\alpha)\,\frac{k}{d}\, L_H. \]
The upper bound follows in the same way. It follows (using λ_2(H) = λ_2(G) and λ_max(H) = 2d) that
\[ \lambda_2(G \mathbin{L} Z) \ge (1-\alpha)\,\frac{k\, \lambda_2(G)}{d}, \]
and
\[ \lambda_{max}(G \mathbin{L} Z) \le (1+\alpha)\, 2k. \]
So,
\[ \min\Big( \lambda_2(G \mathbin{L} Z),\ 2(2k) - \lambda_{max}(G \mathbin{L} Z) \Big) \ \ge\ \min\Big( (1-\alpha)\frac{k\lambda_2(G)}{d},\ (1-\alpha)2k \Big) = (1-\alpha)\frac{k\lambda_2(G)}{d}, \]
as λ_2(G) ≤ d. So,
\[ r(G \mathbin{L} Z) \ \ge\ \frac{1}{2k}\,(1-\alpha)\, k\, r(G) = \frac{1-\alpha}{2}\, r(G). \]
So, the relative spectral gap of G L Z is a little less than half that of G. But, the degree of G L Z
is 2k, which we will arrange to be much less than the degree of G, d.
We will choose k and d so that squaring this graph improves its relative spectral gap, but still leaves its degree less than d. If G has relative spectral gap β, then G² has relative spectral gap at least
\[ 2\beta - \beta^2. \]
It is easy to see that when β is small, this gap is approximately 2β. This is not quite enough to compensate for the loss of (1 − α)/2 in the corollary above, so we will have to square the graph once more.
To make the inductive construction work, we need for Z to be a graph of degree k whose number of vertices equals the degree of G. This degree is approximately 16k⁴; it is exactly 2k(2k − 1)(2k(2k − 1) − 1), the degree obtained by squaring a 2k-regular graph twice.
I’ll now carry out the computation of relative spectral gaps with more care. Let’s assume that G0
has a relative spectral gap of β ≥ 4/5, and assume, by way of induction, that ρ(Gi ) ≥ 4/5. Also
assume that Z is a 1/6-expander. We then find
So, Gi L Z is a 2/3-expander. Our analysis of graph squares then tells us that Gi+1 is a
(2/3)4 -expander. So,
r(Gi+1 ) ≥ 1 − (2/3)4 = 65/81 > 4/5.
By induction, we conclude that every Gi has relative spectral gap at least 4/5.
To improve their relative spectral gaps of the graphs we produce, we can just square them a few
times.
There is a better construction technique, called the Zig-Zag product [RVW02]. The Zig-Zag
construction is a little trickier to understand, but it achieves better expansion. I chose to present
the line-graph based construction because its analysis is very closely related to an analysis of the
Zig-Zag product.
Chapter 31

PSRGs via Random Walks on Graphs

31.1 Overview
There are three major approaches to designing pseudo-random generators (PSRGs). The most
common is to use quick procedures that seem good enough. This is how the PSRGs that are
standard in most languages arise. Cryptographers and Complexity Theorists try to design PSRGs
that work for every polynomial-time algorithm. For example, one can construct PSRGs from
cryptographic functions with the guarantee that if the output of a polynomial-time algorithm
differs from random when using the PSRG, then one can use it to break the cryptographic
function (see [HILL99, Gol07]). In this chapter we consider the construction of PSRGs that can
be proved to work for specific algorithms or algorithms of specific forms. In particular, we will see
Impagliazzo and Zuckerman's [IZ89] approach of using random walks on expanders to run
the same algorithm many times. We are going to perform a very crude analysis that is easy to
present. Rest assured that much tighter analyses are possible and much better PSRGs have been
constructed since.
Pseudo-random number generators take a seed which is presumably random (or which has a lot of
randomness in it), and then generate a long string of random bits that are supposed to act
random. We should first discuss why we would actually want such a thing. I can think of two
reasons.
1. Random bits are scarce. This might be surprising. After all, if you look at the last few bits
of the time that I last hit a key, it is pretty random. Similarly, the low-order bits of the
temperature of the processor in my computer seem pretty random. While these bits are
pretty random, there are not too many of them.
Many randomized algorithms need a lot of random bits. Sources such as these just do not
produce random bits with a frequency sufficient for many applications.
2. If you want to re-run an algorithm, say to de-bug it, it is very convenient to be able to use
the same set of random bits by re-running the PSRG with the same seed. If you use truly
random bits, you can’t do this.
You may also wonder how good the standard pseudo-random number generators are. The first
answer is that the default ones, such as rand in C, are usually terrible. There are many
applications, such as those in my thesis, for which these generators produce behavior that is very
different from what one would expect from truly random bits (yes, this is personal). On the other
hand, one can use cryptographic functions to create bits that will act random for most purposes,
unless one can break the underlying cryptography [HILL99]. But, the resulting generators are
usually much slower than the fastest pseudo-random generators. Fundamentally, it comes down to
a time-versus-quality tradeoff. The longer you are willing to wait, the better the pseudo-random
bits you can get.
In today’s lecture we will require an infinite family of d-regular 1/10-expander graphs. We require
that d be a constant, that the graphs have 2^r vertices for all sufficiently large r, and that we can
construct the neighbors of a vertex in time polynomial in r. That is, we need the graphs to have a
simple explicit description. One can construct expanders families of this form using the
techniques from last lecture. For today’s purposes, the best expanders are the Ramanujan graphs
produced by Margulis [Mar88] and Lubotzky, Phillips and Sarnak [LPS88]. Ramanujan graphs of
degree d = 400 are 1/10-expanders. See also the work of Alon, Bruck, Naor, Naor and
Roth [ABN+ 92] for even more explicit constructions.
While the explicit Ramanujan graphs only exist in certain sizes, none of which are exactly 2^r vertices, some of them have just a little more than 2^r vertices. It is possible to trim these to make them work, say by ignoring all steps in which the vertex does not correspond to r bits.
Imagine you are given a black box that takes r bits as input and then outputs either 0 or 1.
Moreover, let’s assume that the black box is very consistent: we know that it returns the same
answer at least 99% of the time. If it almost always returns 0, we will call it a 0-box and if it
almost always returns 1, we will call it a 1-box. Our job is to determine whether a given box is a
0 or 1 box. We assume that r is big, so we don't have time to test the box on all 2^r settings of r bits. Instead, we could pick r bits at random, and check what the box returns. If it says "1", then it is probably a 1-box. But, what if we want more than 99% confidence? We could check the box on many choices of r random bits, and report the majority value returned by the box.¹ But, this
seems to require a new set of random bits for each run. In this lecture, we will prove that 9 new
bits per run suffice. Note that the result would be interesting for any constant other than 9.
1
Check for yourself that running it twice doesn’t help
Since we will not make any assumptions about the black box, we will use truly random bits the
first time we test it. But, we will show that we only need 9 new random bits for each successive
test. In particular, we will show that if we use our PSRG to generate bits for t + 1 tests, then the probability that the majority answer is wrong decreases exponentially in t.
You are probably wondering why we would want to do such a thing. The reason is to increase the
accuracy of randomized algorithms. There are many randomized algorithms that provide weak
guarantees, such as being correct 99% or 51% of the time. To obtain accurate answers from such
algorithms, we run them many times with fresh random bits. You can view such an algorithm as having two inputs: the problem to be solved and its random bits. The black box is the behavior of the algorithm when the problem to be solved is fixed, so it is just working on the random bits.
Let r be the number of bits that our black box takes as input. So, the space of random bits is {0,1}^r. Let X ⊂ {0,1}^r be the settings of the random bits on which the box gives the minority answer, and let Y be the settings on which it gives the majority answer.
Our pseudo-random generator will use a random walk on a 1/10-expander graph whose vertex set is {0,1}^r. Recall that we can use d = 400. For the first input we feed to the black box, we will
require r truly random bits. We treat these bits as a vertex of our graph. For each successive test,
we choose a random neighbor of the present vertex, and feed the corresponding bits to the box.
That is, we choose a random i between 1 and 400, and move to the ith neighbor of the present
vertex. Note that we only need log2 400 ≈ 9 random bits to choose the next vertex. So, we will
only need 9 new bits to generate each input we feed to the box after the first.
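Here is a minimal Julia sketch (not from the text) of this generator. The functions `neighbor` and `box` are hypothetical stand-ins for the expander's neighbor map and the algorithm being amplified, and r is assumed to be moderate.

```julia
# A minimal sketch of the expander-walk PSRG: one truly random vertex (r bits),
# then roughly 9 fresh random bits per step to pick a neighbor in a 400-regular expander.
function majority_by_walk(box, neighbor, r, t)
    v = rand(0:(2^r - 1))                 # r truly random bits for the first test
    ones_seen = box(v)
    for _ in 1:t
        v = neighbor(v, rand(1:400))      # about log2(400) ≈ 9 new random bits per step
        ones_seen += box(v)
    end
    return ones_seen > (t + 1) / 2 ? 1 : 0
end
```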
Assume that we are going to test the box t + 1 times. Our pseudo-random generator will begin at
a truly random vertex v, and then take t random steps. Recall that we defined X to be the set of
vertices on which the box outputs the minority answer, and we assume that |X| ≤ 2^r/100. If we
report the majority of the outcomes of the t + 1 outputs of the box, we will return the correct
answer as long as the random walk is inside X less than half the time. To analyze this, let v0 be
the initial random vertex, and let v1 , . . . , vt be the vertices produced by the t steps of the random
walk. Let T = {0, ..., t} be the time steps, and let S = {i : v_i ∈ X}. We will prove
\[ \Pr\left[ |S| > t/2 \right] \ \le\ \left( \frac{2}{\sqrt{5}} \right)^{t+1}. \]
To begin our analysis, recall that the initial distribution of our random walk is p_0 = 1/n, the uniform distribution on the n = 2^r vertices. Let χ_X and χ_Y be the characteristic vectors of X and Y, respectively, and let D_X = diag(χ_X) and D_Y = diag(χ_Y). Let
\[ W = \frac{1}{d} M \qquad (31.1) \]
be the transition matrix of the ordinary random walk on G. We are not using the lazy random
walk: it would be silly to use the lazy random walk for this problem, as there is no benefit to
re-running the experiment with the same random bits as before. Let ω1 , . . . , ωn be the eigenvalues
of W . As the graph is a 1/10-expander, |ωi | ≤ 1/10 for all i ≥ 2.
Let’s see how we can use these matrices to understand the probabilities under consideration. For
a probability vector p on vertices, the probability that a vertex chosen according to p is in X
may be expressed
χTX p = 1T D X p.
The second form will be more useful, as
DXp
is the vector obtained by zeroing out the events in which the vertices are not in X. If we then
want to take a step in the graph G, we multiply by W . That is, the probability that the walk
starts at vertex in X, and then goes to a vertex i is q (i) where
q = W D X p 0.
Continuing this way, we see that the probability that the walk is in X at precisely the times i ∈ R is
\[ \mathbf{1}^T D_{Z_t} W\, D_{Z_{t-1}} W \cdots D_{Z_1} W\, D_{Z_0}\, p_0, \]
where
\[ Z_i = \begin{cases} X & \text{if } i \in R \\ Y & \text{otherwise.} \end{cases} \]
We will prove that this probability is at most (1/5)^{|R|}. It will then follow that
\[
\Pr\left[|S| > t/2\right] \ \le\ \sum_{|R| > t/2} \Pr\left[\text{the walk is in } X \text{ at precisely the times in } R\right]
\ \le\ \sum_{|R| > t/2} \left(\frac{1}{5}\right)^{|R|}
\ \le\ 2^{t+1} \left(\frac{1}{5}\right)^{(t+1)/2}
\ =\ \left(\frac{2}{\sqrt{5}}\right)^{t+1}.
\]
Recall that the operator norm of a matrix M (also called the 2-norm) is defined by
\[ \|M\| = \max_{v \neq 0} \frac{\|M v\|}{\|v\|}. \]
The matrix norm measures how much a vector can increase in size when it is multiplied by M .
When M is symmetric, the 2-norm is just the largest absolute value of an eigenvalue of M (prove
this for yourself). It is also immediate that
kM 1 M 2 k ≤ kM 1 k kM 2 k .
You should also verify this yourself. As D X , D Y and W are symmetric, they each have norm 1.
Warning 31.7.1. While the largest eigenvalue of a walk matrix is 1, the norm of an asymmetric walk matrix can be larger than 1. For instance, consider the walk matrix of the path on 3 vertices. Verify that it has norm √2.
Lemma 31.7.2.
\[ \|D_X W\| \le 1/5. \]
Let’s see why this implies the theorem. For any set R, let Zi be as defined above. As p 0 = W p 0 ,
we have
\[ \mathbf{1}^T D_{Z_t} W\, D_{Z_{t-1}} W \cdots D_{Z_1} W\, D_{Z_0}\, p_0 = \mathbf{1}^T (D_{Z_t} W)(D_{Z_{t-1}} W) \cdots (D_{Z_0} W)\, p_0. \]
Now,
\[ \|D_{Z_i} W\| \le \begin{cases} 1/5 & \text{for } i \in R, \text{ and} \\ 1 & \text{for } i \notin R. \end{cases} \]
So,
\[ \left\| (D_{Z_t} W)(D_{Z_{t-1}} W) \cdots (D_{Z_0} W) \right\| \le (1/5)^{|R|}. \]
As ‖p_0‖ = 1/√n and ‖1‖ = √n, we may conclude
\[ \mathbf{1}^T (D_{Z_t} W)(D_{Z_{t-1}} W) \cdots (D_{Z_0} W)\, p_0 \ \le\ \|\mathbf{1}\|\, \left\| (D_{Z_t} W) \cdots (D_{Z_0} W) \right\|\, \|p_0\| \ \le\ (1/5)^{|R|}. \]
Proof of Lemma 31.7.2. Let x be any vector, and write
\[ x = c\mathbf{1} + y, \]
where y is orthogonal to 1. As W 1 = 1,
\[ D_X W \mathbf{1} = \chi_X. \]
This implies
\[ \|D_X W c\mathbf{1}\| = c\,\|\chi_X\| = c \sqrt{|X|} \le c\sqrt{n}/10. \]
We will now show that ‖W y‖ ≤ ‖y‖/10. The easiest way to see this is to consider the matrix
\[ W - J/n, \]
where we recall that J is the all-1 matrix. This matrix is symmetric and all of its eigenvalues have absolute value at most 1/10. So, it has norm at most 1/10. Moreover, (W − J/n)y = W y, which implies ‖W y‖ ≤ ‖y‖/10. Another way to prove this is to expand y in the eigenbasis of W, as in the proof of Lemma 2.1.3.
Finally, as 1 is orthogonal to y,
\[ \|x\| = \sqrt{c^2 n + \|y\|^2}. \]
So,
\[ \|D_X W x\| \ \le\ \|D_X W c\mathbf{1}\| + \|D_X W y\| \ \le\ c\sqrt{n}/10 + \|y\|/10 \ \le\ \|x\|/10 + \|x\|/10 \ \le\ \|x\|/5. \]
31.9 Conclusion
Observe that this is a very strange proof. When considering probabilities, it seems that it would
be much more natural to sum them. But, here we consider 2-norms of probability vectors.
31.10 Notes
For the best results on the number of bits one needs for each run of an algorithm, see [?].
For tighter results on the concentration on variables drawn from random walks on expanders, see
Gillman [Gil98]. For matrices, see [GLSS18].
Part VI
Algorithms
Chapter 32

Sparsification by Random Sampling

32.1 Overview
Two weeks ago, we learned that expander graphs are sparse approximations of the complete graph. This week we will learn that every graph can be approximated by a sparse graph. Today, we will see how a sparse approximation can be obtained by careful random sampling: every graph on n vertices has an ε-approximation with only O(ε^{-2} n log n) edges (a result of myself and Srivastava [SS11]). We will prove this using a matrix Chernoff bound due to Tropp [Tro12].
We originally proved this theorem using a concentration bound of Rudelson [Rud99]. This
required an argument that used sampling with replacement. When I taught this result in 2012, I
asked if one could avoid sampling with replacement. Nick Harvey pointed out to me the argument
that avoids replacement that I am presenting today.
In the next lecture, we will see that the log n term is unnecessary. In fact, almost every graph can
be approximated by a sparse graph almost as well as the Ramanujan graphs approximate
complete graphs.
32.2 Sparsification

We say that a graph H is an ε-approximation of a graph G if
\[ (1-\epsilon) L_G \ \preceq\ L_H \ \preceq\ (1+\epsilon) L_G. \]
We will show that every graph G has a good approximation by a sparse graph. This is a very
strong statement, as graphs that approximate each other have a lot in common. For example,
1. the effective resistance between all pairs of vertices are similar in the two graphs,
2. the eigenvalues of the graphs are similar,
3. the boundaries of all sets are similar, as these are given by χTS LG χS , and
We will prove this by using a very simple random construction. We first carefully¹ choose a probability p_{a,b} for each edge (a,b). We then include each edge (a,b) with probability p_{a,b}, independently. If we do include edge (a,b), we give it weight w_{a,b}/p_{a,b}. We will show that our choice of probabilities ensures that the resulting graph H has at most 4n ln n/ε² edges and is an ε-approximation of G with high probability.
The reason we employ this sort of sampling, blowing up the weight of an edge by dividing by the probability that we choose it, is that it preserves the matrix in expectation. Let L_{a,b} denote the elementary Laplacian on edge (a,b) with weight 1, so that
\[ L_G = \sum_{(a,b)\in E} w_{a,b}\, L_{a,b}. \]
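Here is a minimal Julia sketch (not from the text) of this sampling step; the edge list, weights, and probabilities are assumed to be given (the text will take the probabilities proportional to leverage scores).

```julia
# A minimal sketch: keep edge e with probability p[e] and, if kept, give it weight w[e]/p[e].
function sample_sparsifier(edges, w, p)
    kept = Tuple{Int,Int}[]
    wts  = Float64[]
    for (e, (a, b)) in enumerate(edges)
        if rand() < p[e]
            push!(kept, (a, b))
            push!(wts, w[e] / p[e])       # reweighting keeps the Laplacian correct in expectation
        end
    end
    return kept, wts
end
```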
The main tool that we will use in our analysis is a theorem about the concentration of random
matrices. These may be viewed as matrix analogs of the Chernoff bound that we saw in Lecture
5. These are a surprisingly recent development, with the first ones appearing in the work of
Rudelson and Vershynin [Rud99, RV07] and Ahlswede and Winter [AW02]. The best present
source for these bounds is Tropp [Tro12], in which the following result appears as Corollary 5.2.
Theorem 32.3.1. Let X_1, ..., X_m be independent random n-dimensional symmetric positive semidefinite matrices so that ‖X_i‖ ≤ R almost surely. Let X = Σ_i X_i and let µ_min and µ_max be the minimum and maximum eigenvalues of
\[ \mathbb{E}[X] = \sum_i \mathbb{E}[X_i]. \]
Then,
\[ \Pr\left[ \lambda_{min}\Big(\sum_i X_i\Big) \le (1-\epsilon)\mu_{min} \right] \ \le\ n \left( \frac{e^{-\epsilon}}{(1-\epsilon)^{1-\epsilon}} \right)^{\mu_{min}/R}, \quad \text{for } 0 < \epsilon < 1, \text{ and} \]
\[ \Pr\left[ \lambda_{max}\Big(\sum_i X_i\Big) \ge (1+\epsilon)\mu_{max} \right] \ \le\ n \left( \frac{e^{\epsilon}}{(1+\epsilon)^{1+\epsilon}} \right)^{\mu_{max}/R}, \quad \text{for } 0 < \epsilon. \]
It is important to note that the matrices X 1 , . . . , X m can have different distributions. Also note
that as the norms of these matrices get bigger, the bounds above become weaker. As the
1
For those who can’t stand the suspense, we reveal that we will choose the probabilities to be proportional to
leverage scores of the edges.
expressions above are not particularly easy to work with, we often use the following
approximations.
\[ \frac{e^{-\epsilon}}{(1-\epsilon)^{1-\epsilon}} \ \le\ e^{-\epsilon^2/2}, \quad \text{for } 0 < \epsilon < 1, \text{ and} \]
\[ \frac{e^{\epsilon}}{(1+\epsilon)^{1+\epsilon}} \ \le\ e^{-\epsilon^2/3}, \quad \text{for } 0 < \epsilon < 1. \]
Chernoff (and Hoeffding and Bernstein) bounds rarely come in exactly the form you want.
Sometimes you can massage them into the needed form. Sometimes you need to prove your own.
For this reason, you may some day want to spend a lot of time reading how these are proved.
Before applying the matrix Chernoff bound, we make a transformation that will cause µ_min = µ_max = 1. For positive definite matrices A and B, we have
\[ A \preceq (1+\epsilon) B \iff B^{-1/2} A B^{-1/2} \preceq (1+\epsilon) I. \]
The same thing holds for singular semidefinite matrices that have the same nullspace:
\[ L_H \preceq (1+\epsilon) L_G \iff L_G^{+/2} L_H L_G^{+/2} \preceq (1+\epsilon)\, L_G^{+/2} L_G L_G^{+/2}, \]
where L_G^{+/2} is the square root of the pseudo-inverse of L_G. Let
\[ \Pi = L_G^{+/2} L_G L_G^{+/2}, \]
which is the projection onto the range of L_G. We now know that L_H is an ε-approximation of L_G if and only if L_G^{+/2} L_H L_G^{+/2} is an ε-approximation of Π.
As multiplication by a fixed matrix is a linear operation and expectation commutes with linear operations,
\[ \mathbb{E}\, L_G^{+/2} L_H L_G^{+/2} = L_G^{+/2} \left(\mathbb{E}\, L_H\right) L_G^{+/2} = L_G^{+/2} L_G L_G^{+/2} = \Pi. \]
So, we really just need to show that this random matrix is probably close to its expectation, Π. It
would probably help to pretend that Π is in fact the identity, as it will make it easier to
understand the analysis. In fact, you don’t have to pretend: you could project all the vectors and
matrices onto the span of Π and carry out the analysis there.
Let
\[ X_{a,b} = \begin{cases} (w_{a,b}/p_{a,b})\, L_G^{+/2} L_{a,b} L_G^{+/2} & \text{with probability } p_{a,b} \\ 0 & \text{otherwise,} \end{cases} \]
so that
\[ L_G^{+/2} L_H L_G^{+/2} = \sum_{(a,b) \in E} X_{a,b}. \]
The leverage score of edge (a,b) is
\[ \ell_{a,b} = w_{a,b}\, (\delta_a - \delta_b)^T L_G^{+} (\delta_a - \delta_b), \]
which is w_{a,b} times the effective resistance between a and b. We choose the sampling probabilities to be p_{a,b} = ℓ_{a,b}/R, where R is a quantity we will set below. As we can quickly approximate the effective resistance of every edge, we can quickly compute sufficient probabilities.
Recall that the leverage score of an edge equals the probability that the edge appears in a random spanning tree. As every spanning tree has n − 1 edges, this means that the sum of the leverage scores is n − 1, and thus
\[ \sum_{(a,b) \in E} p_{a,b} = \frac{n-1}{R} \le \frac{n}{R}. \]
(a,b)∈E
This is a very clean bound on the expected number of edges in H. One can use a Chernoff bound
(on real variables rather than matrices) to prove that it is exponentially unlikely that the number
of edges in H is more than any small multiple of this.
For your convenience, I recall another proof that the sum of the leverage scores is n − 1:
\[
\begin{aligned}
\sum_{(a,b)\in E} \ell_{a,b} &= \sum_{(a,b)\in E} w_{a,b}\, \mathrm{Reff}(a,b) \\
&= \sum_{(a,b)\in E} w_{a,b}\, (\delta_a - \delta_b)^T L_G^{+} (\delta_a - \delta_b) \\
&= \sum_{(a,b)\in E} w_{a,b}\, \mathrm{Tr}\left( L_G^{+} (\delta_a - \delta_b)(\delta_a - \delta_b)^T \right) \\
&= \mathrm{Tr}\left( L_G^{+} \sum_{(a,b)\in E} w_{a,b}\, (\delta_a - \delta_b)(\delta_a - \delta_b)^T \right) \\
&= \mathrm{Tr}\left( L_G^{+} \sum_{(a,b)\in E} w_{a,b}\, L_{a,b} \right) \\
&= \mathrm{Tr}\left( L_G^{+} L_G \right) \\
&= \mathrm{Tr}(\Pi) \\
&= n - 1.
\end{aligned}
\]
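A small Julia sketch (not from the text) that computes leverage scores with the pseudo-inverse and checks that they sum to n − 1 on a random weighted complete graph:

```julia
# A minimal sketch: ℓ_{a,b} = w_{a,b} (δ_a - δ_b)' L⁺ (δ_a - δ_b), and Σ ℓ = n - 1.
using LinearAlgebra

n = 5
edges = [(a, b) for a in 1:n for b in a+1:n]          # complete graph on 5 vertices
w = rand(length(edges)) .+ 0.5                         # arbitrary positive weights

L = zeros(n, n)
for (e, (a, b)) in enumerate(edges)
    L[a, a] += w[e]; L[b, b] += w[e]
    L[a, b] -= w[e]; L[b, a] -= w[e]
end

Lplus = pinv(L)
lev = [w[e] * (Lplus[a, a] + Lplus[b, b] - 2 * Lplus[a, b]) for (e, (a, b)) in enumerate(edges)]
println(sum(lev) ≈ n - 1)
```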
We will choose
\[ R = \frac{\epsilon^2}{3.5 \ln n}. \]
Thus, the number of edges in H will be at most 4n(ln n)/ε² with high probability.
We have
\[ \sum_{(a,b)\in E} \mathbb{E}\, X_{a,b} = \Pi. \]
For the lower bound, we need to remember that we can just work orthogonal to the all-1s vector, and so treat the smallest eigenvalue of Π as 1. We then find that
\[ \Pr\left[ \lambda_{min}\Big( \sum_{a,b} X_{a,b} \Big) \le 1 - \epsilon \right] \ \le\ n \exp\left( -\epsilon^2/2R \right) = n \exp\left( -(3.5/2)\ln n \right) = n^{-3/4}. \]
We finally return to deal with the fact that there might be some edges for which pa,b ≥ 1 and so
definitely appear in H. There are two natural ways to deal with these—one that is easiest
algorithmically and one that simplifies the proof. The algorithmically natural way to handle these
is to simply include these edges in H, and remove them from the analysis above. This requires a
small adjustment to the application of the Matrix Chernoff bound, but it does go through.
From the perspective of the proof, the simplest way to deal with these is to split each such X_{a,b} into many independent random edges: k = ⌊ℓ_{a,b}/R⌋ that appear with probability exactly 1, and one more that appears with probability ℓ_{a,b}/R − k. This does not change the expectation of their sum, or the expected number of edges once we remember to add together the weights of edges that appear multiple times. The rest of the proof remains unchanged.
If I have time in class, I will sketch a way to quickly approximate the effective resistances of every
edge in the graph. The basic idea, which can be found in [SS11] and which is carried out better in
[KLP12], is that we can compute the effective resistance of an edge (a, b) from the solution to a
logarithmic number of systems of random linear equations in L_G. That is, after solving a logarithmic number of systems of linear equations in L_G, we have information from which we can estimate all of the effective resistances.
In order to sparsify graphs, we do not actually need estimates of effective resistances that are
always accurate. We just need a way to identify many edges of low effective resistance, without
listing any that have high effective resistance. I believe that better algorithms for doing this
remain to be found. Current fast algorithms that make progress in this direction and that exploit
such estimates may be found in [KLP12, Kou14, CLM+ 15, LPS15]. These, however, rely on fast
Laplacian equation solvers. It would be nice to be able to estimate effective resistances without
these. A step in this direction was recently taken in the works [CGP+ 18, LSY18], which quickly
decompose graphs into the union of short cycles plus a few edges.
Chapter 33

Linear Sized Sparsifiers

33.1 Overview
In this lecture, we will prove a slight simplification of the main result of [BSS12, BSS14]. This will tell us that every graph with n vertices has an ε-approximation with approximately 4ε^{-2} n edges. To translate this into a relation between approximation quality and average degree, note that such a graph has average degree d_ave = 8ε^{-2}. So,
\[ \epsilon \approx \frac{2\sqrt{2}}{\sqrt{d}}, \]
which is about twice what you would get from a Ramanujan graph. Interestingly, this result even works for average degree just a little bit more than 1.
In the last lecture, we considered the Laplacian matrix of a graph G times the square root of the pseudoinverse on either side. That is,
\[ L_G^{+/2} \Big( \sum_{(a,b)\in E} w_{a,b}\, L_{(a,b)} \Big) L_G^{+/2}. \]
Today, it will be convenient to view this as a sum of outer products of vectors. Set
\[ v_{(a,b)} = \sqrt{w_{a,b}}\; L_G^{+/2} (\delta_a - \delta_b). \]
Then,
\[ L_G^{+/2} \Big( \sum_{(a,b)\in E} w_{a,b}\, L_{(a,b)} \Big) L_G^{+/2} = \sum_{(a,b)\in E} v_{(a,b)} v_{(a,b)}^T = \Pi, \]
where we recall that Π = (1/n) L_{K_n} is the projection orthogonal to the constant vectors.
The problem of sparsification is then the problem of finding a small subset of these vectors, S ⊆ E, along with scaling factors, c : S → ℝ, so that
\[ (1-\epsilon)\Pi \ \preceq\ \sum_{(a,b)\in S} c_{a,b}\, v_{(a,b)} v_{(a,b)}^T \ \preceq\ (1+\epsilon)\Pi. \]
If we project onto the span of the Laplacian, then the sum of the outer products of the vectors v_{(a,b)} becomes the identity, and our goal is to find a set S and scaling factors c_{a,b} so that
\[ (1-\epsilon) I_{n-1} \ \preceq\ \sum_{(a,b)\in S} c_{a,b}\, v_{(a,b)} v_{(a,b)}^T \ \preceq\ (1+\epsilon) I_{n-1}. \]
That is, so that all the eigenvalues of the matrix in the middle lie between (1 − ε) and (1 + ε).
Theorem 33.3.1. Let v_1, ..., v_m be vectors whose outer products sum to the identity. Then, for every ε > 0 there exists a set S along with scaling factors c_i so that
\[ (1-\epsilon)^2 I \ \preceq\ \sum_{i\in S} c_i\, v_i v_i^T \ \preceq\ (1+\epsilon)^2 I, \]
and
\[ |S| \le n/\epsilon^2. \]
The condition that the sum of the outer products of the vectors sums to the identity has a name, isotropic position. I now mention one important property of vectors in isotropic position.
Lemma 33.3.2. Let v_1, ..., v_m be vectors in isotropic position. Then, for every matrix M,
\[ \sum_i v_i^T M v_i = \mathrm{Tr}(M). \]
Proof. We have
\[ v^T M v = \mathrm{Tr}\left( v v^T M \right), \]
so
\[ \sum_i v_i^T M v_i = \mathrm{Tr}\Big( \sum_i v_i v_i^T M \Big) = \mathrm{Tr}\Big( \big(\sum_i v_i v_i^T\big) M \Big) = \mathrm{Tr}(I M) = \mathrm{Tr}(M). \]
Today, we will prove that we can find a set of 6n vectors for which all eigenvalues lie between n and 13n. If you divide all scaling factors by √13 n, this puts the eigenvalues between 1/√13 and √13. You can tighten the argument to prove Theorem 33.3.1.
We will prove this theorem by an iterative argument in which we choose one vector at a time to add to the set S. We will set the scaling factor of a vector when we add it to S. It is possible that we will add a vector to S more than once, in which case we will increase its scaling factor each time. Throughout the argument we will maintain the invariant that the eigenvalues of the scaled sum of outer products lie in the interval [l, u], where l and u are quantities that will change with each addition to S. At the start of the algorithm, when S is empty, we will have
\[ l_0 = -n \quad \text{and} \quad u_0 = n. \]
Each time we add a vector to S, we will increase l by δ_L and u by δ_U, where
\[ \delta_L = 1/3 \quad \text{and} \quad \delta_U = 2. \]
We will need to understand what happens to a matrix when we add the outer product of a vector.
Theorem 33.4.1 (Sherman-Morrison). Let A be a nonsingular symmetric matrix, let v be a vector, and let c be a real number. Then,
\[ (A - c\, v v^T)^{-1} = A^{-1} + c\, \frac{A^{-1} v v^T A^{-1}}{1 - c\, v^T A^{-1} v}. \]
Proof. The easiest way to prove this is to multiply out the right-hand side against A − c v vᵀ, gathering the scalar s = vᵀA^{-1}v as it appears:
\[ (A - c\, v v^T)\left( A^{-1} + c\, \frac{A^{-1} v v^T A^{-1}}{1 - c s} \right) = I + \frac{c\, v v^T A^{-1}}{1-cs} - c\, v v^T A^{-1} - \frac{c^2 s\, v v^T A^{-1}}{1-cs} = I + c\, v v^T A^{-1}\, \frac{1 - (1-cs) - cs}{1-cs} = I. \]
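A small Julia sketch (not from the text) that checks the Sherman-Morrison formula numerically:

```julia
# A minimal sketch: verify (A - c v v')^{-1} = A^{-1} + c A^{-1} v v' A^{-1} / (1 - c v' A^{-1} v)
# on a random symmetric positive definite matrix.
using LinearAlgebra

n = 6
B = randn(n, n); A = B' * B + I           # a random positive definite matrix
v = randn(n)
c = 0.3

lhs = inv(A - c * v * v')
rhs = inv(A) + c * (inv(A) * v) * (v' * inv(A)) / (1 - c * v' * inv(A) * v)
println(lhs ≈ rhs)
```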
To prove the main theorem we need a good way to measure progress. We would like to keep all
the eigenvalues of the matrix we have constructed at any point to lie in a nice range. But, more
than that, we need them to be nicely distributed within this range. To enforce this, we need to
measure how close the eigenvalues are to the limits.
For an upper bound u on the eigenvalues of A, we use the upper barrier function
\[ \Phi^{u}(A) = \sum_i \frac{1}{u - \lambda_i} = \mathrm{Tr}\left( (uI - A)^{-1} \right), \]
and for a lower bound l on the eigenvalues, we define the analogous lower barrier function
\[ \Phi_{l}(A) = \sum_i \frac{1}{\lambda_i - l} = \mathrm{Tr}\left( (A - lI)^{-1} \right). \]
This is positive whenever l is smaller than all the eigenvalues, goes to infinity as l approaches the smallest eigenvalue, and decreases as l becomes smaller. In particular,
\[ l + 1/\Phi_l(A) \le \lambda_1. \qquad (33.3) \]
The analog of (33.1) is the following.
Claim 33.5.1. Let l be a lower bound on A and let δ < 1/Φ_l(A). Then,
\[ \Phi_{l+\delta}(A) \ \le\ \frac{1}{1/\Phi_l(A) - \delta}. \]
The most important thing to understand about the barrier functions is how they change when we add a vector to S. The Sherman-Morrison theorem tells us what happens when we change A to A + c v vᵀ:
\[ \Phi^{u}(A + c\, v v^T) = \mathrm{Tr}\left( (uI - A - c\, v v^T)^{-1} \right) = \Phi^{u}(A) + \frac{c\, v^T (uI - A)^{-2} v}{1 - c\, v^T (uI - A)^{-1} v}. \]
This increases the upper barrier function, and we would like to counteract this increase by increasing u at the same time. If we advance u to û = u + δ_U, then we find
\[
\begin{aligned}
\Phi^{u+\delta_U}(A + c\, v v^T) &= \Phi^{u+\delta_U}(A) + \frac{c\, v^T (\hat{u}I - A)^{-2} v}{1 - c\, v^T (\hat{u}I - A)^{-1} v} \\
&= \Phi^{u}(A) - \left( \Phi^{u}(A) - \Phi^{u+\delta_U}(A) \right) + \frac{v^T (\hat{u}I - A)^{-2} v}{1/c - v^T (\hat{u}I - A)^{-1} v}.
\end{aligned}
\]
We would like for this to be less than Φ^u(A). If we commit to how much we are going to increase u, then this gives an upper bound on how large c can be. We want
\[ \Phi^{u}(A) - \Phi^{u+\delta_U}(A) \ \ge\ \frac{v^T (\hat{u}I - A)^{-2} v}{1/c - v^T (\hat{u}I - A)^{-1} v}, \]
which is equivalent to
\[ \frac{1}{c} \ \ge\ \frac{v^T (\hat{u}I - A)^{-2} v}{\Phi^{u}(A) - \Phi^{u+\delta_U}(A)} + v^T (\hat{u}I - A)^{-1} v. \]
Define
\[ U_A = \frac{((u + \delta_U) I - A)^{-2}}{\Phi^{u}(A) - \Phi^{u+\delta_U}(A)} + ((u + \delta_U) I - A)^{-1}. \]
We have established a clean condition for when we can add cv v T to S and increase u by δU
without increasing the upper barrier function.
Lemma 33.6.1. If
\[ \frac{1}{c} \ge v^T U_A v, \]
then
\[ \Phi^{u+\delta_U}(A + c\, v v^T) \le \Phi^{u}(A). \]
The miracle in the above formula is that the condition in the lemma just involves the vector v as
the argument of a quadratic form.
We also require the following analog for the lower barrier function. The difference is that increasing l by setting ˆl = l + δ_L increases the barrier function, and adding a vector decreases it. Define
\[ L_A = \frac{(A - \hat{l} I)^{-2}}{\Phi_{l+\delta_L}(A) - \Phi_{l}(A)} - (A - \hat{l} I)^{-1}. \]
If
\[ \frac{1}{c} \le v^T L_A v, \]
then
\[ \Phi_{l+\delta_L}(A + c\, v v^T) \le \Phi_{l}(A). \]
If we fix the vector v and an increment δ_L, then this gives a lower bound on the scaling factor by which we need to multiply it for the lower barrier function not to increase.
It remains to show that there exists a vector v_i and a scaling factor c so that
\[ v_i^T U_A v_i \le v_i^T L_A v_i. \]
We do this by bounding the sums of these quadratic forms over all i. For the upper barrier we have the following bound, which follows from Lemma 33.3.2.
Lemma 33.7.1.
\[ \sum_i v_i^T U_A v_i \ \le\ \frac{1}{\delta_U} + \Phi^{u}(A). \]
The analysis for the lower barrier is similar, but the second term is slightly more complicated.
Lemma 33.7.2.
\[ \sum_i v_i^T L_A v_i \ \ge\ \frac{1}{\delta_L} - \frac{1}{1/\Phi_l(A) - \delta_L}. \]
So, for there to exist a v_i that we can add to S with scale factor c so that neither barrier function increases, we just need that
\[ \frac{1}{\delta_U} + \Phi^{u}(A) \ \le\ \frac{1}{\delta_L} - \frac{1}{1/\Phi_l(A) - \delta_L}. \]
If this holds, then there is a v_i so that
\[ v_i^T U_A v_i \le v_i^T L_A v_i. \]
We then set c so that
\[ v_i^T U_A v_i \ \le\ \frac{1}{c} \ \le\ v_i^T L_A v_i. \]
We now finish the proof by checking that the numbers I gave earlier satisfy the necessary conditions. At the start both barrier functions are less than 1, and we need to show that this holds throughout the algorithm. At every step, we will have by induction
\[ \frac{1}{\delta_U} + \Phi^{u}(A) \ \le\ \frac{1}{2} + 1 = \frac{3}{2}, \]
and
\[ \frac{1}{\delta_L} - \frac{1}{1/\Phi_l(A) - \delta_L} \ \ge\ 3 - \frac{1}{1 - 1/3} = \frac{3}{2}. \]
So, there is always a v_i that we can add to S and a scaling factor c so that both barrier functions remain upper bounded by 1.
If we now do this for 6n steps, we will have
\[ l = -n + 6n/3 = n \quad \text{and} \quad u = n + 2 \cdot 6n = 13n. \]
The bound stated at the beginning of the lecture comes from tightening the analysis. In particular, it is possible to improve Lemma 33.7.2 so that it says
\[ \sum_i v_i^T L_A v_i \ \ge\ \frac{1}{\delta_L} - \Phi_l(A). \]
I recommend the paper for details.
Chapter 34

Iterative Solvers for Linear Equations

We introduce basic iterative solvers for systems of linear equations: Richardson iteration and Chebyshev's method. We discuss Conjugate Gradient in the next chapter, and iterative refinement and preconditioning in Chapter 36.
Throughout, we consider a system of linear equations
\[ A x = b, \]
where A is a symmetric positive definite matrix.
To get started, we will examine a simple, but sub-optimal, iterative method, Richardson's iteration. The idea of the method is to find an iterative process that has the solution to Ax = b as a fixed point, and which converges. We observe that if Ax = b, then for any α,
\[ \alpha A x = \alpha b \ \implies\ x + (\alpha A - I)x = \alpha b \ \implies\ x = (I - \alpha A)x + \alpha b. \]
This leads us to the following iterative process:
\[ x_t = (I - \alpha A)\, x_{t-1} + \alpha b, \qquad (34.1) \]
where we will take x 0 = 0. We will show that this converges if
I − αA
has norm less than 1, and that the convergence rate depends on how much the norm is less than
1. This is analogous to our analysis of random walks on graphs from Chapter 10.
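Here is a minimal Julia sketch (not from the text) of Richardson's iteration (34.1), with the step size α = 2/(λ_1 + λ_n) that makes ‖I − αA‖ smallest; the test system is a hypothetical random positive definite matrix.

```julia
# A minimal sketch of Richardson's iteration x_t = (I - αA) x_{t-1} + α b.
using LinearAlgebra

function richardson(A, b; iters=200)
    lams = eigvals(Symmetric(Matrix(A)))
    α = 2 / (lams[1] + lams[end])
    x = zero(b)
    for _ in 1:iters
        x = (I - α * A) * x + α * b       # the update (34.1)
    end
    return x
end

B = randn(20, 20); A = B' * B + I         # a random positive definite system
b = randn(20)
println(norm(A * richardson(A, b) - b))    # residual shrinks with the number of iterations
```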
As we are assuming A is symmetric, I − αA is symmetric as well, and so its norm is the
maximum absolute value of its eigenvalues. Let 0 < λ1 ≤ λ2 . . . ≤ λn be the eigenvalues of A.
Then, the eigenvalues of I − αA are
1 − αλi ,
and the norm of I − αA is
\[ \max_i |1 - \alpha\lambda_i| = \max\left( |1 - \alpha\lambda_1|,\ |1 - \alpha\lambda_n| \right). \]
This norm is minimized by taking α = 2/(λ_1 + λ_n), for which it equals 1 − 2λ_1/(λ_n + λ_1). To see how quickly the iterates converge, compute
\[ x - x_t = \left( (I - \alpha A)x + \alpha b \right) - \left( (I - \alpha A)x_{t-1} + \alpha b \right) = (I - \alpha A)(x - x_{t-1}). \]
So,
\[ x - x_t = (I - \alpha A)^t (x - x_0) = (I - \alpha A)^t x, \]
and
\[ \| x - x_t \| \ \le\ \left\| (I - \alpha A)^t \right\| \|x\| = \|I - \alpha A\|^t \|x\| \ \le\ \left( 1 - \frac{2\lambda_1}{\lambda_n + \lambda_1} \right)^t \|x\| \ \le\ e^{-2\lambda_1 t/(\lambda_n + \lambda_1)} \|x\|. \]
So, to guarantee a relative error
\[ \frac{\|x - x_t\|}{\|x\|} \le \epsilon, \]
it suffices to run for t ≥ ((λ_n + λ_1)/(2λ_1)) ln(1/ε) iterations.
34.3 Expanders
Let’s pause a moment to consider the problem of solving systems in the Laplacians of expander
graphs. These are singular, but we know that their nullspace is spanned by the constant vectors.
So, if we work orthogonal to the constant vectors their effective smallest eigenvalue is λ_2. If the graph is an ε-expander, then its condition number, λ_n/λ_2, will be approximately 1 + 2ε. Thus, we can solve systems of linear equations in this Laplacian very quickly.
This should make intuitive sense: the Laplacian of an expander is an approximation of the
Laplacian of a complete graph. And, the Laplacians of complete graphs act as multiples of the
identity on the space orthogonal to constant vectors.
In contrast, Gaussian elimination on expanders is slow: it takes time Ω(n3 ) and requires space
Ω(n2 ) [LRT79].
1
For general matrices, the condition number is defined to be the ratio of the largest to smallest singular value.
Thinking about ‖x − x_t‖ is a little awkward because we do not know x. For this reason, people often measure the quality of approximation of a solution to a system of linear equations by ‖b − Ax_t‖. For this quantity, the same sort of convergence results hold. First observe that each iterate is a polynomial in A applied to b:
\[
\begin{aligned}
x_0 &= 0, \\
x_1 &= \alpha b, \\
x_2 &= (I - \alpha A)\alpha b + \alpha b, \\
x_3 &= (I - \alpha A)^2 \alpha b + (I - \alpha A)\alpha b + \alpha b, \ \text{ and in general} \\
x_t &= \sum_{i=0}^{t-1} (I - \alpha A)^i\, \alpha b.
\end{aligned}
\]
So x_t = p_t(A)b for a polynomial p_t, and b − Ax_t = (I − A p_t(A))b, so the error in this measure is controlled by ‖A p_t(A) − I‖.
To get some idea of why this should be an approximation of A^{-1}, consider the limit as t goes to infinity. Assuming that the infinite sum converges, we obtain
\[ \alpha \sum_{i=0}^{\infty} (I - \alpha A)^i = \alpha \left( I - (I - \alpha A) \right)^{-1} = \alpha (\alpha A)^{-1} = A^{-1}. \]
So, the Richardson iteration can be viewed as a truncation of this infinite summation.
This leads us to the question of whether we can find better polynomial approximations to A−1 .
The reason I ask is that the answer is yes! As A, pt (A) and I all commute, the matrix
Apt (A) − I
is symmetric and its norm is the maximum absolute value of its eigenvalues. So, it suffices to find
a polynomial p_t such that
\[ |\lambda_i\, p_t(\lambda_i) - 1| \le \epsilon, \]
for all eigenvalues λ_i of A.
To reformulate this, define
\[ q_t(x) = 1 - x\, p_t(x). \]
Then, it suffices to find a polynomial q_t of degree t + 1 for which
\[ q_t(0) = 1, \quad \text{and} \quad |q_t(x)| \le \epsilon \ \text{ for } \lambda_1 \le x \le \lambda_n. \]
We will see that there are polynomials of degree roughly √κ ln(1/ε) that satisfy these conditions and thus allow us to compute solutions of accuracy ε. In terms of the condition number of A, this is a quadratic improvement over Richardson's first-order method.
Theorem 34.6.1. For every t ≥ 1, and 0 < λ_min ≤ λ_max, there exists a polynomial q_t(x) such that q_t(0) = 1 and
\[ |q_t(x)| \le \epsilon \quad \text{for all } x \in [\lambda_{min}, \lambda_{max}], \]
for
\[ \epsilon \le 2\left(1 + 2/\sqrt{\kappa}\right)^{-t} \le 2 e^{-2t/\sqrt{\kappa}}, \]
where
\[ \kappa = \frac{\lambda_{max}}{\lambda_{min}}. \]
I’d now like to explain how we find these better polynomials. The key is to transform one of the
most fundamental polynomials: the Chebyshev polynomials. These polynomials are as small as
possible on [−1, 1], and grow quickly outside this interval. We will translate the interval [−1, 1] to
obtain the polynomials we need.
The tth Chebyshev polynomial, Tt (x) has degree t, and may be defined by setting
T0 (x) = 1, T1 (x) = x,
and for t ≥ 2
Tt (x) = 2xTt−1 (x) − Tt−2 (x).
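A small Julia sketch (not from the text) that evaluates T_t by this recurrence and checks it against cos(t · acos(x)):

```julia
# A minimal sketch: evaluate the Chebyshev polynomial T_t(x) by the three-term recurrence.
function cheb(t, x)
    told, tcur = one(x), x                 # T_0 and T_1
    t == 0 && return told
    for _ in 2:t
        told, tcur = tcur, 2x * tcur - told
    end
    return tcur
end

x = 0.3
println(cheb(5, x) ≈ cos(5 * acos(x)))
```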
These polynomials are best understood by realizing that they are the polynomials for which
\[ T_t(\cos\theta) = \cos(t\theta) \quad \text{and} \quad T_t(\cosh\theta) = \cosh(t\theta). \]
It might not be obvious that one can express cos(tθ) as a polynomial in cos(θ). To see this, and the correctness of the above formulas, recall that
\[ \cos(\theta) = \frac{1}{2}\left( e^{i\theta} + e^{-i\theta} \right), \quad \text{and} \quad \cosh(\theta) = \frac{1}{2}\left( e^{\theta} + e^{-\theta} \right). \]
To verify that the hyperbolic identity satisfies the stated recurrence with x = cosh(θ), compute
\[
\begin{aligned}
2x\, T_{t-1}(x) - T_{t-2}(x) &= 2 \cdot \frac{1}{2}\left( e^{\theta} + e^{-\theta} \right) \cdot \frac{1}{2}\left( e^{(t-1)\theta} + e^{-(t-1)\theta} \right) - \frac{1}{2}\left( e^{(t-2)\theta} + e^{-(t-2)\theta} \right) \\
&= \frac{1}{2}\left( e^{t\theta} + e^{-t\theta} \right) + \frac{1}{2}\left( e^{(t-2)\theta} + e^{-(t-2)\theta} \right) - \frac{1}{2}\left( e^{(t-2)\theta} + e^{-(t-2)\theta} \right) \\
&= \frac{1}{2}\left( e^{t\theta} + e^{-t\theta} \right) = \cosh(t\theta).
\end{aligned}
\]
The computation for cos(tθ) is the same with θ replaced by iθ. Thus,
\[ T_t(x) = \begin{cases} \cos(t\, \mathrm{acos}(x)) & \text{for } |x| \le 1, \text{ and} \\ \cosh(t\, \mathrm{acosh}(x)) & \text{for } x \ge 1. \end{cases} \]
Claim 34.7.1. For x ∈ [−1, 1], |Tt (x)| ≤ 1.
Proof. For x ∈ [−1, 1], there is a θ so that cos(θ) = x. We then have Tt (x) = cos(tθ), which must
also be between −1 and 1.
To compute the values of the Chebyshev polynomials outside [−1, 1], we use the hyperbolic cosine function. Hyperbolic cosine maps the real line to [1, ∞) and is symmetric about the origin. So, the inverse of hyperbolic cosine may be viewed as a map from [1, ∞) to [0, ∞), and satisfies
\[ \mathrm{acosh}(x) = \ln\left( x + \sqrt{x^2 - 1} \right), \quad \text{for } x \ge 1. \]
We will use the following properties of the Chebyshev polynomials:
1. T_t has degree t.
2. |T_t(x)| ≤ 1 for x ∈ [−1, 1].
3. T_t(x) is monotonically increasing for x ≥ 1.
4. T_t(x) ≥ (1/2)(x + √(x² − 1))^t for x ≥ 1.
To express q_t(x) in terms of a Chebyshev polynomial, we should map the range on which we want q_t to be small, [λ_min, λ_max], to [−1, 1]. We will accomplish this with the linear map
\[ l(x) = \frac{\lambda_{max} + \lambda_{min} - 2x}{\lambda_{max} - \lambda_{min}}, \]
and define
\[ q_t(x) \ \stackrel{\mathrm{def}}{=}\ \frac{T_t(l(x))}{T_t(l(0))}. \]
We know that |T_t(l(x))| ≤ 1 for x ∈ [λ_min, λ_max]. To bound q_t(x) for x in this range, we must compute T_t(l(0)). We have
\[ l(0) \ge 1 + 2/\kappa(A), \]
and so by properties 3 and 4 of Chebyshev polynomials,
\[ T_t(l(0)) \ge \left(1 + 2/\sqrt{\kappa}\right)^t / 2. \]
Thus,
\[ |q_t(x)| \le 2\left(1 + 2/\sqrt{\kappa}\right)^{-t}, \]
for x ∈ [λ_min, λ_max], and so all eigenvalues of q_t(A) will have absolute value at most 2(1 + 2/√κ)^{-t}.
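A small Julia sketch (not from the text) that builds q_t from the recurrence and checks the bound of Theorem 34.6.1 numerically for hypothetical values of λ_min, λ_max, and t:

```julia
# A minimal sketch: q_t(x) = T_t(l(x)) / T_t(l(0)) is small on [λmin, λmax].
function cheb(t, x)
    told, tcur = 1.0, float(x)
    t == 0 && return told
    for _ in 2:t
        told, tcur = tcur, 2x * tcur - told
    end
    return tcur
end

λmin, λmax, t = 1.0, 100.0, 25
l(x) = (λmax + λmin - 2x) / (λmax - λmin)        # the linear map sending [λmin, λmax] to [-1, 1]
q(x) = cheb(t, l(x)) / cheb(t, l(0))

κ = λmax / λmin
xs = range(λmin, λmax; length=1000)
println(maximum(abs.(q.(xs))), " ≤ ", 2 * (1 + 2 / sqrt(κ))^(-t))
```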
One might at first think that these techniques do not apply to Laplacian systems, as these are
always singular. However, we can apply these techniques without change if b is in the span of L.
That is, if b is orthogonal to the all-1s vector and the graph is connected. In this case the
eigenvalue λ1 = 0 has no role in the analysis, and it is replaced by λ2 . One way of understanding
this is to just view L as an operator acting on the space orthogonal to the all-1s vector.
By considering the example of the Laplacian of the path graph, one can show that it is impossible
√
to do much better than the κ iteration bound that I claimed at the end of the last section. To
see this, first observe that when one multiplies a vector x by L, the entry (Lx )(i) just depends on
x (i − 1), x (i), and x (i + 1). So, if we apply a polynomial of degree at most t, x t (i) will only
depend on b(j) with i − t ≤ j ≤ i + t. This tells us that we will need a polynomial of degree on
the order of n to solve such a system.
On the other hand, √(λ_n/λ_2) is on the order of n as well. So, we should not be able to solve the
system with a polynomial whose degree is significantly less than √(λ_n/λ_2).
34.10 Warning
The polynomial-based approach that I have described here only works in infinite precision
arithmetic. In finite precision arithmetic one has to be more careful about how one implements
these algorithms. This is why the descriptions of methods such as the Chebyshev method found
in Numerical Linear Algebra textbooks are more complicated than that presented here. The
algorithms that are actually used are mathematically identical in infinite precision, but they
actually work. The problem with the naive implementations are the typical experience: in
double-precision arithmetic the polynomial approach to Chebyshev will fail to solve linear systems
in random positive definite matrices in 60 dimensions!
Chapter 35
The Conjugate Gradient and Diameter
We introduce the matrix norm as the measure of convergence of iterative methods, and show how
the Conjugate Gradient method efficiently minimizes it. We finish by relating the rate of
convergence of any iterative method on a Laplacian matrix to the diameter of the underlying
graph.
My description of the Conjugate Gradient method is inspired by Vishnoi’s [Vis12]. It is the
simplest explanation of the Conjugate Gradient that I have seen.
Recall from Chapter 14 that for a positive semidefinite matrix A, the matrix norm in A is defined
by
‖x‖_A = √(x^T A x) = ‖A^{1/2} x‖.
For many applications, the right way to measure the quality of approximation of a system of
linear equations Ax = b is by ‖x − x_t‖_A. Many algorithms naturally produce bounds on the
error in the matrix norm. And, for many applications that use linear equation solvers as
subroutines, this is the measure of accuracy in the subroutine that most naturally translates to
accuracy of the outside algorithm.
We should observe that both the Richardson and Chebyshev methods achieve error ε in the
A-norm. Let p be a polynomial such that
‖p(A)A − I‖ ≤ ε.
Then, for x_t = p(A)b,
‖x − x_t‖_A = ‖x − p(A)Ax‖_A = ‖A^{1/2}(I − p(A)A)x‖ ≤ ‖I − p(A)A‖ ‖A^{1/2}x‖ ≤ ε ‖x‖_A,
where we have used that p(A)A commutes with A^{1/2}.
The analysis above works because these methods produce x t by applying a linear operator, p(A),
to b that commutes with A. While most of the algorithms we use to solve systems of equations in
A will be linear operators, they will typically not commute with A. But, they will produce small
error in the A-norm.
The following theorem shows that a linear operator Z is an ε-approximation of A^{−1} if and only if
it produces at most ε error in the A-norm when used to solve systems of linear equations in A.
Theorem 35.1.1. Let A and Z be positive definite matrices. Then
‖ZAx − x‖_A ≤ ε ‖x‖_A   (35.1)
for all x if and only if
(1 − ε)A^{−1} ≼ Z ≼ (1 + ε)A^{−1}.
Proof. The assertion that (35.1) holds for all x is equivalent to the assertion that for all x,
‖A^{1/2}(ZA − I)x‖ ≤ ε ‖A^{1/2}x‖.
Setting y = A^{1/2}x, this is in turn equivalent to
‖(A^{1/2}ZA^{1/2} − I)y‖ ≤ ε ‖y‖, for all y.
As A^{1/2}ZA^{1/2} − I is symmetric, this holds if and only if all of its eigenvalues have absolute value at most ε, which is equivalent to
(1 − ε)I ≼ A^{1/2}ZA^{1/2} ≼ (1 + ε)I.
The claimed statement follows from multiplying on the left and right by A^{−1/2}.
The iterative methods that we consider begin with the vector b, and then perform multiplications
by A and take linear combinations with vectors that have already been produced. So, after t
iterations they produce a vector that is in the span of
{b, Ab, A^2 b, …, A^t b}.
This span is called a Krylov subspace. Let p_0, …, p_t be a basis of it, and write x_t = Σ_{i=0}^{t} c_i p_i. Minimizing ‖x − x_t‖_A over this subspace is equivalent to minimizing
(1/2) x_t^T A x_t − b^T x_t.   (35.2)
We would like to find the coefficients c_i that minimize (35.2). Expanding x_t gives
(1/2) x_t^T A x_t − b^T x_t = (1/2) (Σ_{i=0}^{t} c_i p_i)^T A (Σ_{i=0}^{t} c_i p_i) − b^T (Σ_{i=0}^{t} c_i p_i)
= (1/2) Σ_{i=0}^{t} c_i² p_i^T A p_i − Σ_{i=0}^{t} c_i b^T p_i + (1/2) Σ_{i≠j} c_i c_j p_i^T A p_j.
To simplify the selection of the optimal constants c_i, the Conjugate Gradient will compute a basis
p_0, …, p_t that makes the rightmost term 0. That is, it will compute a basis such that p_i^T A p_j = 0
for all i ≠ j. Such a basis is called an A-orthogonal basis.
When the last term is zero, the objective function becomes
Σ_{i=0}^{t} ( (1/2) c_i² p_i^T A p_i − c_i b^T p_i ).
So, the terms corresponding to different i do not interact, and we can minimize the sum by
minimizing each term individually. The term
(1/2) c_i² p_i^T A p_i − c_i b^T p_i
is minimized at
c_i = (b^T p_i) / (p_i^T A p_i).
It remains to describe how we compute this A-orthogonal basis. The algorithm begins by setting
p 0 = b.
The next vector should be Ap_0, but A-orthogonalized with respect to p_0. That is,
p_1 = Ap_0 − ((Ap_0)^T A p_0 / (p_0^T A p_0)) p_0.
It is immediate that
p_0^T A p_1 = 0.
In general, we set
p_{t+1} = Ap_t − Σ_{i=0}^{t} ((Ap_t)^T A p_i / (p_i^T A p_i)) p_i.   (35.3)
Let's verify that p_{t+1} is A-orthogonal to p_j for j ≤ t, assuming that p_0, …, p_t are A-orthogonal.
We have
p_j^T A p_{t+1} = p_j^T A A p_t − Σ_{i=0}^{t} ((Ap_t)^T A p_i / (p_i^T A p_i)) p_j^T A p_i
= p_j^T A² p_t − ((Ap_t)^T A p_j / (p_j^T A p_j)) p_j^T A p_j
= 0,
as p_j^T A² p_t = (Ap_t)^T A p_j.
The computation of p_{t+1} is greatly simplified by the observation that all but two of the terms in
the sum (35.3) are zero: for i < t − 1,
(Ap_t)^T A p_i = 0.
To see this, note that (Ap_t)^T A p_i = p_t^T A (Ap_i), and that Ap_i lies in the span of p_0, …, p_{i+1}, each of which is A-orthogonal to p_t when i + 1 < t.
The computation of x_t by
x_t = Σ_{i=0}^{t} ((b^T p_i) / (p_i^T A p_i)) p_i
only requires O(t) additional such operations.
In fact, only t multiplications by A are required to compute p 0 , . . . , p t and x 1 , . . . , x t : every term
in the expressions for these vectors can be derived from the products Ap i . Thus, the Conjugate
Gradient algorithm can find the x t in the t + 1st Krylov subspace that minimizes the error in the
A-norm in time O(tn) plus the time required to perform t multiplications by A.
Caution: the algorithm that I have presented here differs from the Conjugate Gradient as it is
actually implemented: the implemented Conjugate Gradient re-arranges this computation to keep the
norms of the vectors involved reasonably small. Without this adjustment, the algorithm that I've
described will fail in practice, as the vectors p_i become too large.
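For concreteness, here is a minimal Julia sketch of the procedure as described in this section: it builds the A-orthogonal vectors p_i explicitly and accumulates x_t from the coefficients c_i. As cautioned above, this is not the stable form of the Conjugate Gradient used in practice, and all names here are my own.

using LinearAlgebra

# Textbook-style Conjugate Gradient: build an A-orthogonal basis p_0, ..., p_t
# of the Krylov subspace and accumulate x_t = sum_i (b'p_i / p_i'A p_i) p_i.
# Numerically fragile; for illustration only.
function naive_cg(A, b, t)
    ps = [b]
    x = (dot(b, b) / dot(b, A * b)) * b
    for _ in 1:t
        w = A * ps[end]
        p = copy(w)
        for q in ps                       # A-orthogonalize against all previous p_i
            p -= (dot(w, A * q) / dot(q, A * q)) * q
        end
        push!(ps, p)
        x += (dot(b, p) / dot(p, A * p)) * p
    end
    return x
end

# Example on a small random positive definite system.
n = 20
M = randn(n, n); A = M'M + I
b = randn(n)
x = naive_cg(A, b, 15)
println(norm(A * x - b) / norm(b))        # relative residual; should be small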
The Conjugate Gradient is at least as good as the Chebyshev iteration, in that it finds a vector of
smaller error in the A-norm in any given number of iterations. The optimality property of the
Conjugate Gradient causes it to perform remarkably well.
For example, one can see that it should never require more than n iterations, as the vector x is
always in the nth Krylov subspace. Here's an easy way to see this. Let the distinct eigenvalues of A
be λ_1, …, λ_k. Now, consider the polynomial
q(x) = ∏_{i=1}^{k} (λ_i − x) / ∏_{i=1}^{k} λ_i.
It has degree k, equals 1 at 0, and vanishes at every eigenvalue of A, so q(A) = 0. Writing 1 − q(x) = x p(x) for a polynomial p of degree k − 1, we find p(A)b = x. So, the Conjugate Gradient finds the exact solution after at most k iterations. In particular, it converges quickly on matrices with few distinct eigenvalues, such as the
Laplacian of the hypercube. While there are other fast algorithms that exploit the special structure
of the hypercube, CG works well when one has a graph that is merely very close to the hypercube.
In general, CG works especially quickly on matrices in which the eigenvalues appear in just a few
clusters, and on matrices in which there are just a few extreme eigenvalues. We will learn more
about this in the next lecture.
This would be a good time to re-examine what we want when our matrix is a Laplacian. The
Laplacian does not have an inverse. Rather, we want a polynomial in the Laplacian that
approximates its pseudo-inverse (which we defined back in Lecture 8). If we were exactly solving
the system of linear equations, we would have found a polynomial p such that
p(L)b = x,
for every b orthogonal to the all-1s vector; that is, Lp(L) = Π, where Π is the projection onto the space orthogonal to the all-1s vector. Instead, we settle for a polynomial for which
‖p(L)L − Π‖ ≤ ε.
Our intuition tells us that if we can quickly solve linear equations in the Laplacian matrix of a
graph by an iterative method, then the graph should have small diameter. We now make that
intuition precise.
If s and t are vertices that are at distance greater than d from each other, then
χ_s^T L^d χ_t = 0.
On the other hand, if L only has k distinct eigenvalues other than 0, then we can form a
polynomial p of degree k − 1 such that
Lp(L) = Π.
Theorem 35.6.1. Let G be a connected graph whose Laplacian has at most k distinct eigenvalues
other than 0. Then, the diameter of G is at most k.
Proof. Let d be the diameter of the graph and let s and t be two vertices at distance d from each
other. We have
e_s^T Π e_t = −1/n,
which is nonzero. On the other hand, we have just described a polynomial in L with zero constant term, given by
Lp(L), that has degree k and such that
Lp(L) = Π.
If d were greater than k, then e_s^T L^j e_t = 0 for every 1 ≤ j ≤ k, and so
e_s^T Lp(L) e_t = 0,
a contradiction. So, d ≤ k.
Theorem 35.6.2. Let G = (V, E) be a connected graph, and let λ2 ≤ · · · ≤ λn be its Laplacian
eigenvalues. Then, the diameter of G is at most
( (1/2)√(λ_n/λ_2) + 1 ) ln 2n.
Chapter 36
Preconditioning Laplacians
Throughout this chapter, B will be a preconditioner for A: a matrix in which it is easy to solve systems of equations and that approximates A well. We measure the quality of the approximation by the largest α and smallest β for which
αB ≼ A ≼ βB,
and call κ(A, B) = β/α the relative condition number.
Lemma 36.0.1. Let α and β be as defined above. Then, α and β are the smallest and largest
eigenvalues of B −1 A, excluding possible zero eigenvalues corresponding to a common nullspace of
A and B.
We need to exclude the common nullspace when A and B are the Laplacian matrices of
connected graphs. If these matrices have different nullspaces, then α = 0 or β = ∞, and the condition
number β/α is infinite.
Proof of Lemma 36.0.1. We just prove the statement for β, in the case where neither matrix is
singular. We have A ≼ βB if and only if x^T A x ≤ β x^T B x for every x. So, the smallest β for which this holds is
max_{x≠0} (x^T A x)/(x^T B x) = max_{y≠0} (y^T B^{−1/2} A B^{−1/2} y)/(y^T y),
(substituting y = B^{1/2}x), which is the largest eigenvalue of B^{−1/2} A B^{−1/2}.
Recall that the eigenvalues of B −1 A are the same as those of B −1/2 AB −1/2 and A1/2 B −1 A1/2 .
‖x̃ − x‖_A ≤ ε ‖x‖_A.
We will now see how to use a very good preconditioner to solve a system of equations. Let's
consider a preconditioner B that satisfies
(1 − ε)B ≼ A ≼ (1 + ε)B.
Using one solve in B, we can compute x̃ = B^{−1}b. As b = Ax,
‖B^{−1}b − x‖_A = ‖A^{1/2}B^{−1}b − A^{1/2}x‖ = ‖A^{1/2}B^{−1}Ax − A^{1/2}x‖ = ‖(A^{1/2}B^{−1}A^{1/2} − I)A^{1/2}x‖ ≤ ε ‖A^{1/2}x‖ = ε ‖x‖_A,
as the condition on B implies (1 − ε)A^{−1} ≼ B^{−1} ≼ (1 + ε)A^{−1}, and so all eigenvalues of A^{1/2}B^{−1}A^{1/2} − I have absolute value at most ε.
Remark: This result crucially depends upon the use of the A-norm. It fails under the Euclidean
norm.
If we want a better solution, we can just compute the residual and solve the problem in the
residual. That is, we set
x_1 = B^{−1}b,
and compute
r_1 = b − Ax_1 = A(x − x_1).
We then use one solve in B to compute a vector x_2 such that
‖(x − x_1) − x_2‖_A ≤ ε ‖x − x_1‖_A ≤ ε² ‖x‖_A.
So, x_1 + x_2, our new estimate of x, differs from x by at most an ε² factor in the A-norm. Continuing in this
way, we can find an ε^k approximation of x after solving k linear systems in B. This procedure is
called iterative refinement.
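A minimal Julia sketch of iterative refinement, assuming we are handed a routine solveB that applies B^{−1} for a good preconditioner B of A; the function and variable names are my own, and the toy preconditioner below is only for illustration.

using LinearAlgebra

# Iterative refinement: repeatedly solve in the preconditioner B against the
# current residual and add the correction.
function iterative_refinement(A, b, solveB, k)
    x = zero(b)
    r = copy(b)
    for _ in 1:k
        x += solveB(r)        # one solve in B
        r = b - A * x         # new residual
    end
    return x
end

# Toy example: B is A with its entries rounded, so B approximates A well.
n = 50
M = randn(n, n); A = M'M + n * I
B = round.(A; digits = 1)
b = randn(n)
x = iterative_refinement(A, b, r -> B \ r, 5)
println(norm(A * x - b) / norm(b))   # should be very small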
The iterative methods we studied last class can also be shown to produce good approximate
solutions in the matrix norm. Given a matrix A, these produce ε-approximate solutions after t
iterations if there is a polynomial q of degree t for which q(0) = 1 and |q(λ_i)| ≤ ε for all
eigenvalues λ_i of A. To see this, recall that we can define p(x) so that q(x) = 1 − xp(x), and set
x̃ = p(A)b,
to get
‖x̃ − x‖_A = ‖p(A)b − x‖_A = ‖p(A)Ax − x‖_A ≤ ‖p(A)A − I‖ ‖A^{1/2}x‖ ≤ ε ‖x‖_A.
‖x − x_t‖_A = ‖A^{1/2}x − A^{1/2}x_t‖.
So, we find
‖x − x_t‖_A ≤ ‖q_t(A^{1/2}B^{−1}A^{1/2})(A^{1/2}x)‖ ≤ ε ‖x‖_A.
The Preconditioned Conjugate Gradient (PCG) is a magical algorithm that after t steps (each of
which involves solving a system in B, multiplying a vector by A, and performing a constant
number of vector operations) produces the vector x t that minimizes
kx t − x kA
over all vectors x_t that can be written in the form p_t(B^{−1}A)B^{−1}b for a polynomial p_t of degree at most t.
That is, the algorithm finds the best possible solution among all iterative methods of the form we
have described. We first bound the quality of PCG by saying that it is at least as good as
Preconditioned Chebyshev, but it has the advantage of not needing to know α and β. We will
then find an improved analysis.
Vaidya [Vai90] had the remarkable idea of preconditioning the Laplacian matrix of a graph by the
Laplacian matrix of a subgraph. If H is a subgraph of G, then
L_H ≼ L_G.
It is relatively easy to show that linear equations in the Laplacian matrices of trees can be solved
exactly in linear time. One can either do this by finding an LU -factorization with a linear number
of non-zeros, or by viewing the process of solving the linear equation as a dynamic program that
passes up once from the leaves of the tree to a root, and then back down.
We will now show that a special type of tree, called a low-stretch spanning tree provides a very
good preconditioner. To begin, let T be a spanning tree of G. Write
L_G = Σ_{(u,v)∈E} w_{u,v} L_{u,v} = Σ_{(u,v)∈E} w_{u,v} (χ_u − χ_v)(χ_u − χ_v)^T.
To evaluate this last term, we need to know the value of (χ_u − χ_v)^T L_T^{−1} (χ_u − χ_v). You already
know something about it: it is the effective resistance in T between u and v. In a tree, this equals
the distance in T between u and v, when we view the length of an edge as the reciprocal of its
weight. This is because it is the resistance of a path of resistors in series. Let T (u, v) denote the
path in T from u to v, and let w1 , . . . , wk denote the weights of the edges on this path. As we
view the weight of an edge as the reciprocal of its length,
(χ_u − χ_v)^T L_T^{−1} (χ_u − χ_v) = Σ_{i=1}^{k} 1/w_i.   (36.1)
Even better, the term (36.1) is something that has been well-studied. It was defined by Alon,
Karp, Peleg and West [AKPW95] to be the stretch of the unweighted edge (u, v) with respect to
the tree T . Moreover, the stretch of the edge (u, v) with weight wu,v with respect to the tree T is
defined to be exactly
w_{u,v} Σ_{i=1}^{k} 1/w_i,
where again w_1, …, w_k are the weights on the edges of the unique path in T from u to v. A
sequence of works, beginning with [AKPW95], has shown that every graph G has a spanning tree
in which the sum of the stretches of the edges is low. The best result so far is due to [AN12], who
prove the following theorem.
Theorem 36.5.1. Every weighted graph G has a spanning tree subgraph T such that the sum of
the stretches of all edges of G with respect to T is at most O(m log n log log n),
where m is the number of edges of G. Moreover, one can compute this tree in time
O(m log n log log n).
We can show that the Preconditioned Conjugate Gradient will actually run in closer to O(m^{1/3})
iterations. Since the trace is the sum of the eigenvalues, we know that for every β > 0, L_T^{−1}L_G has
at most
Tr(L_T^{−1}L_G) / β
eigenvalues that are larger than β.
To exploit this fact, we use the following lemma. It basically says that we can ignore the largest
eigenvalues of B −1 A if we are willing to spend one iteration for each.
Lemma 36.6.1. Let λ_1, …, λ_n be positive numbers such that all of them are at least α and at
most k of them are more than β. Then, for every t ≥ k, there exists a polynomial p(X) of degree t
such that p(0) = 1 and
|p(λ_i)| ≤ 2 (1 + 2/√(β/α))^{−(t−k)},
for all λ_i.
Proof. Let r(X) be the polynomial we constructed using Chebyshev polynomials of degree t − k
for which
|r(X)| ≤ 2 (1 + 2/√(β/α))^{−(t−k)},
for all X between α and β. Now, set
p(X) = r(X) ∏_{i: λ_i > β} (1 − X/λ_i).
This new polynomial is zero at every λ_i greater than β, and for X between α and β,
|p(X)| = |r(X)| ∏_{i: λ_i > β} |1 − X/λ_i| ≤ |r(X)|,
as each factor 1 − X/λ_i lies strictly between 0 and 1 when 0 < X ≤ β < λ_i.
Applying this lemma to the analysis of the Preconditioned Conjugate Gradient, with
β = Tr(L_T^{−1}L_G)^{2/3} and k = Tr(L_T^{−1}L_G)^{1/3},
we find that the algorithm produces ε-approximate solutions within
O(Tr(L_T^{−1}L_G)^{1/3} ln(1/ε)) = O(m^{1/3} log n ln(1/ε))
iterations.
This result is due to Spielman and Woo [SW09].
We now have three families of algorithms for solving systems of equations in Laplacian matrices
in nearly-linear time.
• By subgraph preconditioners. These basically work by adding back edges to the low-stretch
trees. The resulting systems can no longer be solved directly in linear time. Instead, we use
Gaussian elimination to eliminate the degree 1 and 2 vertices to reduce to a smaller system,
and then solve that system recursively. The first nearly linear time algorithm of this form
ran in time O(m log^c n log(1/ε)), for some constant c [ST14]. An approach of this form was
first made practical (and much simpler) by Koutis, Miller, and Peng [KMP11]. The
asymptotically fastest method also works this way. It runs in time
O(m log^{1/2} m (log log n)^c log(1/ε)) [CKM+14] (Cohen, Kyng, Miller, Pachocki, Peng, Rao, Xu).
• By sparsification (see my notes from Lecture 19 from 2015). These algorithms work rather
differently, and do not exploit low-stretch spanning trees. They appear in the papers
[PS14, KLP+ 16].
There are other algorithms that are often fast in practice, but for which we have no theoretical
analysis. I suggest the Algebraic Multigrid of Livne and Brandt, and the Combinatorial Multigrid
of Yiannis Koutis.
36.8 Questions
I conjecture that it is possible to construct spanning trees of even lower stretch. Does every graph
have a spanning tree of average stretch 2 log2 n? I do not see any reason this should not be true. I
also believe that this should be achievable by a practical algorithm. The best code that I know for
computing low-stretch spanning trees, and which I implemented in Laplacians.jl, is a heuristic
based on the algorithm of Alon, Karp, Peleg and West. However, I do not know an analysis of
their algorithm that gives stretch better than O(m 2^{√(log n)}). The theoretically better low-stretch
trees of Abraham and Neiman are obtained by improving constructions of [EEST08, ABN08].
However, they seem too complicated to be practical.
The eigenvalues of L_H^{−1}L_G are called generalized eigenvalues. The relation between generalized
eigenvalues and stretch is the first result of which I am aware that establishes a combinatorial
interpretation of generalized eigenvalues. Can you find any others?
Chapter 37
Augmented Spanning Tree Preconditioners
under construction
The first algorithms that solved Laplacian systems in nearly linear time used augmented spanning
tree preconditioners. These are formed by adding edges of G back to a spanning tree of G. Vaidya
[Vai90] first suggested doing this with maximum spanning trees. The first nearly linear time
solvers were developed by Spielman and Teng [ST14] by augmenting low stretch spanning trees.
The elegant algorithm described in this chapter is from two papers by Koutis, Miller, and Peng
[KMP10, KMP11]. It solves systems to accuracy ε in time Õ(m log n log ε^{−1}).
Using the Iterative Refinement algorithm from the previous chapter, we know that it suffices to
show this with any constant ε < 1. You should assume throughout this chapter that ε is some
absolute constant like 1/20.
I recall that Õ is like O-notation, but it hides low order logarithmic terms. That is, when we
write f(n) ≤ Õ(g(n)), we mean that there is a constant c such that f(n) ≤ O(g(n) log^c g(n)). For
example, in this notation we can say that every graph G has a spanning tree T of average stretch
Õ(log n). In this chapter we will want to specify that many statements are true given some
choice of constants c. For this purpose, we will often let c be a constant, but not the same
constant, where it appears throughout the chapter. We do this instead of using O-notation, as it
simplifies making the constants explicit later.
37.1 Recursion
Let H be obtained by adding a few edges back to a spanning tree T of G. As a large fraction of
the vertices of T will have degree 1 or 2, the same is true of H. We can eliminate these degree 1 and 2 vertices by Gaussian elimination.
This means that we can solve a system of equations in L_H by solving systems in U^T, L_H̃, and U.
As elimination of a degree 1 vertex only decreases the degree of its neighbor and the elimination
of a degree 2 vertex does not change the degrees of its neighbors, the matrix U has at most 2n
nonzero entries. As U is upper triangular, systems in U and U^T can be solved in time
proportional to their number of nonzero entries, O(n). This inspires a recursive algorithm for
solving equations in L_G: we construct a good preconditioner H with many degree 1 and 2
vertices, and then solve systems in L_H by approximately solving systems in L_H̃.
We now explore this idea in a little more detail. First observe that because we are applying a
recursive algorithm, we will not solve systems in L_H̃ exactly. Rather, we will be applying an
algorithm to approximately solve these systems. The one guarantee we make about this algorithm
is that it acts as a linear operator. That is, the action of this algorithm corresponds to
multiplication by some matrix Z that we never construct. But, we know that for some
Lemma 37.1.1. Let T be a tree on n vertices. Then, more than half the vertices of T have degree
1 or 2.
Proof. The number of edges in T is n − 1, so the sum of the degrees is 2n − 2 and the average degree is less than 2.
As Σ_v (deg(v) − 2) = −2, T must contain more vertices of degree 1 than vertices of degree at least 3. The remaining
vertices have degree 2, so more than half of the vertices have degree 1 or 2.
We learned last lecture that if we keep eliminating degree 1 vertices from trees, then we will
eventually eliminate all the vertices. An analogous fact is true for a graph that equals a tree plus
k edges.
1
Whether this matrix is actually upper triangular depends on the ordering of the vertices. We assume, without
loss of generality, that the vertices are ordered so that the matrix is upper triangular.
Lemma 37.1.2. Let H be a tree on n vertices plus k edges. If we eliminate degree 1 and 2
vertices of the tree that do not touch the extra k edges until none remain, we will be left with at
most 4k vertices and 5k edges.
Proof. If we eliminate a degree 1 or 2 vertex of the tree that does not touch one of the extra k
edges, we will obtain a graph that looks like a tree on one fewer vertex, plus k edges. As a tree on
4k vertices must have at least 2k + 1 vertices of degree 1 or 2, at least one of these does not touch
one of the extra k edges, and so can be eliminated.
Koutis, Miller, and Peng observe that we do not necessarily have to produce a subgraph H of G
that looks like a tree plus a few edges. All we really need is for H to have many fewer edges than
G. This still leaves the question of how we will find such an H that is a good approximation of G.
The trick is to use a variant of the random-sampling based approach of Chapter 32. But, we
avoid the cost of computing effective resistances of edges by estimating them by their stretches, at
the cost of a worse approximation.
We begin by formally stating the result of that chapter for graphs.
Theorem 37.2.1. Let G = (V, E, w) be a graph, let ε > 0, and for every edge (a, b) let
p_{a,b} ∈ (0, 1] satisfy
p_{a,b} ≥ min(1, (4 ln n / ε²) w_{a,b} Reff_G(a, b)).
Form the random graph H = (V, F, u) by setting, for every edge independently,
u_{a,b} = w_{a,b}/p_{a,b} with probability p_{a,b}, and u_{a,b} = 0 with probability 1 − p_{a,b}.
Proof. Rayleigh’s Monotonicity Theorem tells us that Reff G (a, b) ≤ Reff T (a, b), and this latter
term equals StretchT (a, b).
The problem with sampling edges with probability proportional to their effective resistance, or
stretches, is that this will produce too many edges. Koutis, Miller, and Peng solve this problem
by multiplicatively increasing the weights of the edges in a low-stretch spanning tree of G. Define
G̃ = G + (s − 1)T.
That is, G̃ is the same as G, but every edge in the tree T has its weight multiplied by s.
For every edge (a, b) ∈ T we set p_{a,b} = 1, and for every edge (a, b) ∉ T we set
p_{a,b} ≥ min(1, (4 ln n / s²) w_{a,b} Stretch_T(a, b)).
Setting p_{a,b} equal to this bound, the expected number of off-tree edges that appear in H is at most (4 ln n / s²) Σ_{(a,b)∉T} w_{a,b} Stretch_T(a, b) ≤ 4mσ ln n / s², where σ denotes the average stretch of the edges of G with respect to T.
So, by making s a little more than some constant times σ ln n, we can make sure that the number
of edges of H not in T is less than the number of edges of G not in T.
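Here is a small Julia sketch of this sampling rule, assuming we are given, for each off-tree edge, its weight and its stretch with respect to T; this is my own illustrative code, not the Laplacians.jl implementation.

# Keep every tree edge; keep each off-tree edge (a,b) independently with
# probability p_ab = min(1, (4 ln n / s^2) * w_ab * stretch_ab), reweighted
# to w_ab / p_ab when kept.  `weights` and `stretches` describe the off-tree edges.
function sample_offtree(weights, stretches, n, s)
    newweights = zeros(length(weights))
    for i in eachindex(weights)
        p = min(1.0, (4 * log(n) / s^2) * weights[i] * stretches[i])
        if rand() < p
            newweights[i] = weights[i] / p
        end
    end
    return newweights   # zero entries correspond to edges dropped from H
end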
But, we need to solve systems in G, not G̃. To this end, we use the following multiplicative
property of condition numbers.
Claim 37.2.3.
κ(L_G, L_H) ≤ κ(L_G, L_G̃) κ(L_G̃, L_H).
As G̃ differs from G by having the weights of some edges multiplied by s, κ(L_G, L_G̃) ≤ s. Thus,
we will have κ(L_G, L_H) ≤ s(1 + ε)/(1 − ε), and to get ε-accurate solutions to systems in L_G we
will need to solve some constant times κ(L_G, L_H)^{1/2} systems in L_H. As we are going to keep ε
constant, this will be around s^{1/2}.
To make an efficient algorithm for solving systems in G out of an algorithm for solving systems in
H, it would be easiest if the cost of the solves in H is less than the cost of a multiply by G. As we
will solve the system in H around s^{1/2} times, it seems natural to ensure that the number of edges
of H that are not in T is at most the number of edges in G divided by s^{1/2}. That is, we want
(4mσ ln n / s²) s^{1/2} ≤ m,
which holds when
s ≥ c(σ ln n)²,
for some constant c. We will now show that such a choice of c yields an algorithm for solving
linear equations in L_G to constant accuracy in time Õ(m log² n).
We now describe the recursion. Let G0 = G, the input graph. We will eventually solve systems in
Gi by recursively solving systems in Gi+1 . Each system Gi+1 will have fewer edges than Gi , and
thus we can use a brute force solve when the system becomes small enough. We will bound the
running time of solvers for systems in Gi in terms of the number of edges that are not in their
spanning trees. We denote this by oi = mi − (ni − 1). There is some issue with o0 , so let’s assume
without much loss of generality that G_0 does not have any degree 1 or 2 vertices, and thus
o_0 ≥ n_0.
Form G̃_i by multiplying a low-stretch spanning tree of G_i by s, and use random sampling to
produce H_i. We know that the number of off-tree edges in H_i is at most a 1/(cσ ln n) fraction of
the number of off-tree edges in G_i. If the number of off-tree edges in H_i is less than n_i/4, then we
know that after eliminating degree 1 and 2 vertices we will be left with a graph having at most
4n_i vertices and 5n_i edges. We let G_{i+1} be this graph. If the number of off-tree edges in H_i is
more than n_i/4, then we just set G_{i+1} = H_i.
In this way, we ensure that oi+1 ≤ oi /(cσ ln n). We can now prove by backwards induction on i
that the time required to solve systems of equations in LGi is at most O(oi σ ln n). A solve in Gi
to constant accuracy requires performing O(s1/2 ) solves in Gi+1 and as many multiplies by LGi .
By induction we know that this takes time at most
Chapter 38
Fast Laplacian Solvers by Sparsification
38.1 Overview
We will see how sparsification allows us to solve systems of linear equations in Laplacian matrices
and their sub-matrices in nearly linear time. By "nearly-linear", I mean time
O(m log^c(nκε^{−1}) log ε^{−1}) for systems with m nonzero entries, n dimensions, condition number κ,
and accuracy ε.
This algorithm comes from [PS14].
In today's lecture, I will find it convenient to define matrix approximations slightly differently
from previous lectures. Today, I define A ≈_ε B to mean
e^{−ε} A ≼ B ≼ e^{ε} A.
Note that this relation is symmetric in A and B, and that for small ε, e^{ε} ≈ 1 + ε.
The advantage of this definition is that approximations compose: if A ≈_ε B and B ≈_δ C, then A ≈_{ε+δ} C.
I begin by describing the idea behind the algorithm. This idea won’t quite work. But, we will see
how to turn it into one that does.
We will work with matrices that look like M = L + X where L is a Laplacian and X is a
non-zero, non-negative diagonal matrix. Such matrices are called M-matrices. A symmetric
M-matrix is a matrix M with nonpositive off-diagonal entries such that M 1 is nonnegative and
nonzero. We have encountered M-matrices before without naming them. If G = (V, E) is a graph,
S ⊂ V , and G(S) is connected, then the submatrix of LG indexed by rows and columns in S is an
M-matrix. Algorithmically, the problems of solving systems of equations in Laplacians and
symmetric M-matrices are equivalent.
The sparsification results that we learned for Laplacians translate over to M-matrices. Every
M-matrix M can be written in the form X + L where L is a Laplacian and X is a nonnegative
diagonal matrix. If L̂ ≈_ε L, then it is easy to show (too easy for homework) that
X + L̂ ≈_ε X + L.
In Lecture 7, Lemma 7.3.1, we proved that if X has at least one nonzero entry and if L is
connected, then X + L is nonsingular. We write such a matrix in the form M = D − A where D
is positive diagonal and A is nonnegative, and note that its being nonsingular and positive
semidefinite implies
D − A ≻ 0 ⟺ D ≻ A.   (38.1)
Using the Perron-Frobenius theorem, one can also show that
D ≻ −A.   (38.2)
We only need O(log(κε^{−1})) terms of this product to obtain a good approximation of (I − B)^{−1}.
The obstacle to quickly applying a series like this is that the matrices I + B^{2^j} are probably dense.
We know how to solve this problem: we can sparsify them! I'm not saying that flippantly. We
actually do know how to sparsify matrices of this form.
But, simply sparsifying the matrices I + B^{2^j} does not solve our problem, because approximation
is not preserved by products. That is, even if A ≈ Â and B ≈ B̂, ÂB̂ could be a very poor
approximation of AB. In fact, since the product ÂB̂ is not necessarily symmetric, we haven't
even defined what it would mean for it to approximate AB.
We will now derive a way of expanding (I − B)−1 that is amenable to approximation. We begin
with an alternate derivation of the series we saw before. Note that
(I − B)(I + B) = (I − B 2 ),
and so
(I − B) = (I − B 2 )(I + B)−1 .
Taking the inverse of both sides gives
(I − B)−1 = (I + B)(I − B 2 )−1 .
We can then apply the same expansion to (I − B 2 )−1 to obtain
(I − B)−1 = (I + B)(I + B 2 )(I − B 4 )−1 .
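A small Julia check of this identity (my own code): truncating the product (I + B)(I + B²)(I + B⁴)⋯ after k factors leaves an error of exactly I − B^{2^k}, which is tiny once ‖B‖ < 1.

using LinearAlgebra

# Approximate (I - B)^(-1) by the product (I + B)(I + B^2)...(I + B^(2^(k-1))).
function inv_by_product(B, k)
    P = Matrix{Float64}(I, size(B)...)
    Bpow = copy(B)
    for _ in 1:k
        P = P * (I + Bpow)
        Bpow = Bpow * Bpow
    end
    return P
end

n = 30
M = randn(n, n); M = (M + M') / 2
B = 0.9 * M / opnorm(M)                              # symmetric, with norm 0.9 < 1
println(opnorm(inv_by_product(B, 8) * (I - B) - I))  # equals the norm of B^256: tiny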
38.5 D and A
Unfortunately, we are going to need to stop writing matrices in terms of I and B, and return to
writing them in terms of D and A. The reason this is unfortunate is that it makes for longer
expressions.
The analog of (38.3) is
(D − A)^{−1} = (1/2)( D^{−1} + (I + D^{−1}A)(D − AD^{−1}A)^{−1}(I + AD^{−1}) ).   (38.4)
In order to be able to work with this expression inductively, we need to check that the middle
matrix is an M-matrix.
Proof. As the off-diagonal entries of this matrix are symmetric and nonpositive, it suffices to
prove that M1 ≥ 0 and M1 ≠ 0, where M = D − AD^{−1}A. To compute the row sums, set
d = D1 and a = A1,
and recall that d − a = (D − A)1 is nonnegative and nonzero. As D^{−1}a ≤ 1 entrywise and A is nonnegative,
(D − AD^{−1}A)1 = d − AD^{−1}a ≥ d − A1 = d − a ≥ 0,
and this vector is nonzero wherever d − a is.
We will apply transformations like this many times during our algorithm. To keep track of
progress, I say that (D, A) is an (α, β)-pair if
a. D is positive diagonal,
For our initial matrix M = D − A, we know that there is some number κ > 0 for which (D, A) is
a (1 − κ, 1 − κ)-pair.
At the end of our recursion we will seek a (1/4, 1/4)-pair. When we have such a pair, we can just
approximate D − A by D: if (D, A) is a (1/4, 1/4)-pair, then
M ≈_{1/3} D.
Proof. We have
M = D − A ≼ (1 + 1/4)D ≼ e^{1/4} D,
and
M = D − A ≽ D − (1/4)D = (3/4)D ≽ e^{−1/3} D.
Proof. From Lecture 14, Lemma 3.1, we know that the condition of the lemma is equivalent to
the assertion that all eigenvalues of D −1 A have absolute value at most α, and that the conclusion
is equivalent to the assertion that all eigenvalues of D −1 AD −1 A lie between 0 and α2 , which is
immediate as they are the squares of the eigenvalues of D −1 A.
So, if we start with matrices D and A that are a (1 − κ, 1 − κ)-pair, then after applying this
transformation approximately log κ−1 + 2 times we obtain a (1/4, 0)-pair. But, the matrices in
this pair could be dense. To keep them sparse, we need to figure out how approximating D − A
degrades its quality.
a. (D, A) is a (1 − κ, 0)-pair,
b. D − A ≈_ε D̂ − Â, and
c. D ≈_ε D̂,
then (D̂, Â) is a (1 − κe^{−2ε}, 3ε)-pair.
e^{2ε} D̂ ≽ e^{ε} D ≽ e^{ε}(D − A) ≽ D̂ − Â.
It remains to confirm that sparsification satisfies the requirements of this lemma. The reason this
might not be obvious is that we allow A to have nonnegative diagonal elements. While this does
not interfere with condition b, you might be concerned that it would interfere with condition c. It
need not.
Let C be the diagonal of A, and let L be the Laplacian of the graph with adjacency matrix
A − C, and set X so that X + L = D − A. Let L̃ be a sparse ε-approximation of L. By
computing the quadratic form in elementary unit vectors, you can check that the diagonals of L
and L̃ approximate each other. Now, write L̃ = D̃ − Ã, where Ã has zero diagonal, and set
D̂ = D̃ + C and Â = Ã + C.
You might wonder why we bother to keep diagonal elements in a matrix like A. It seems simpler
to get rid of them. However, we want (D, A) to be an (α, β)-pair, and subtracting C
from both of them would make β worse. This might not matter too much, as we have good control
over β. But, I don't yet see a nice way to carry out a proof that exploits this.
D_1 − A_1 ≈_ε D_0 − A_0 D_0^{−1} A_0,
M_i = D_i − A_i,
M_i ≈_ε D_i − A_{i−1} D_{i−1}^{−1} A_{i−1}.
For the i such that κ_i is small, κ_{i+1} is approximately twice κ_i. So, for k = 2 + log_2(1/κ) and ε close
to zero, we can guarantee that (D_k, A_k) is a (1/4, 1/4)-pair.
We now see how this construction allows us to approximately solve systems of equations in
D_0 − A_0, and how we must set ε for it to work. For every 0 ≤ i < k, we have
(D_i − A_i)^{−1} = (1/2)( D_i^{−1} + (I + D_i^{−1}A_i)(D_i − A_iD_i^{−1}A_i)^{−1}(I + A_iD_i^{−1}) ) ≈_ε (1/2)( D_i^{−1} + (I + D_i^{−1}A_i)(D_{i+1} − A_{i+1})^{−1}(I + A_iD_i^{−1}) ),
and
(D_k − A_k)^{−1} ≈_{1/3} D_k^{−1}.
The dominant cost of the resulting algorithm will be the multiplication of vectors by 2k matrices
of O(n/ε²) entries, with a total cost of
O(n (log_2(1/κ))³).
In the above construction, I just assumed that appropriate sparsifiers exist, rather than
constructing them efficiently. To construct them efficiently, we need two ideas. The first is that
we need to be able to quickly approximate effective resistances so that we can use the sampling
algorithm from Lecture 17.
The second is to observe that we do not actually want to form the matrix AD^{−1}A before
sparsifying it, as that could take too long. Instead, we express it as a sum of cliques that have
succinct descriptions, and we form the sum of approximations of each of those.
38.8 Improvements
The fastest known algorithms for solving systems of equations run in time O(m √(log n) log ε^{−1})
[CKM+ 14]. The algorithm I have presented here can be substantially improved by combining it
with Cholesky factorization. This both gives an efficient parallel algorithm, and proves the
existence of an approximate inverse for every M-matrix that has a linear number of nonzeros
[LPS15].
Chapter 39
Testing Isomorphism of Graphs with Distinct Eigenvalues
39.1 Introduction
I will present an algorithm of Leighton and Miller [LM82] for testing isomorphism of graphs in
which all eigenvalues have multiplicity 1. This algorithm was never published, as the results were
technically subsumed by those in a paper of Babai, Grigoriev and Mount [BGM82], which gave a
polynomial time algorithm for testing isomorphism of graphs in which all eigenvalues have
multiplicity bounded by a constant.
I present the weaker result in the interest of simplicity.
Testing isomorphism of graphs is a notorious problem. Until very recently, the fastest known
algorithm for it took time 2^{O(√(n log n))} (see [Bab81, BL83, ZKT85]). Babai [Bab16] recently
announced a breakthrough that reduces the complexity to 2^{(log n)^{O(1)}}.
However, testing graph isomorphism seems easy in almost all practical instances. Today’s lecture
and one next week will give you some idea as to why.
Recall that two graphs G = (V, E) and H = (V, F ) are isomorphic if there exists a permutation π
of V such that
(a, b) ∈ E ⇐⇒ (π(a), π(b)) ∈ F.
Of course, we can express this relation in terms of matrices associated with the graphs. It doesn’t
matter much which matrices we use. So for this lecture we will use the adjacency matrices.
Every permutation may be realized by a permutation matrix. For the permutation π, this is the
matrix Π with entries given by
Π(a, b) = 1 if π(a) = b, and Π(a, b) = 0 otherwise.
For a vector ψ, we see that
(Πψ)(a) = ψ(π(a)).
Let A be the adjacency matrix of G and let B be the adjacency matrix of H. We see that G and
H are isomorphic if and only if there exists a permutation matrix Π such that
ΠAΠT = B.
If G and H are isomorphic, then A and B must have the same eigenvalues. However, there are
many pairs of graphs that are non-isomorphic but which have the same eigenvalues. We will see
some tricky ones next lecture. But, for now, we note that if A and B have different eigenvalues,
then we know that the corresponding graphs are non-isomorphic, and we don’t have to worry
about them.
For the rest of this lecture, we will assume that A and B have the same eigenvalues, and that
each of these eigenvalues has multiplicity 1. We will begin our study of this situation by
considering some cases in which testing isomorphism is easy.
Recall that we can write
A = Ψ ΛΨ T ,
where Λ is the diagonal matrix of eigenvalues of A and Ψ is an orthonormal matrix holding its
eigenvectors. If B has the same eigenvalues, we can write
B = ΦΛΦ T .
If ΠAΠ^T = B, then
ΠΨΛΨ^TΠ^T = ΦΛΦ^T.
As each entry of Λ is distinct, this looks like it would imply ΠΨ = Φ. But, the eigenvectors
(columns of Φ and Ψ ) are only determined up to sign. So, it just implies
ΠΨ = ΦS, for some diagonal matrix S with ±1 entries on its diagonal.
Lemma 39.3.1. Let A = Ψ ΛΨ T and B = ΦΛΦ T where Λ is a diagonal matrix with distinct
entries and Ψ and Φ are orthogonal matrices. A permutation matrix Π satisfies ΠAΠT = B if
and only if there exists a diagonal ±1 matrix S for which
ΠΨ = ΦS .
Our algorithm for testing isomorphism will determine all such matrices S . Let S be the set of all
diagonal ±1 matrices. We will find diagonal matrices S ∈ S such that the set of rows of ΦS is
the same as the set of rows of Ψ . As the rows of Ψ are indexed by vertices a ∈ V , we will write
the row indexed by a as the row-vector
def
v a = (ψ 1 (a), . . . , ψ n (a)).
Similarly denote the rows of Φ by vectors u a . In this notation, we are searching for matrices
S ∈ S for which the set of vectors {v_a}_{a∈V} is identical to the set of vectors {u_a S}_{a∈V}. We have
thus transformed the graph isomorphism problem into a problem about vectors.
A single eigenvector ψ_i determines the isomorphism if ψ_i(a) ≠ ψ_i(b) for all a ≠ b and there is a canonical way to choose a
sign for the vector ψ_i. For example, if the sum of the entries in ψ_i is not zero, we can choose its
sign to make the sum positive. In fact, unless ψ_i and −ψ_i have exactly the same set of values,
there is a canonical choice of the sign for this vector.
Even if there is no canonical choice of sign for this vector, it leaves at most two choices for the
isomorphism.
The graph isomorphism problem is complicated by the fact that there can be many isomorphisms
from one graph to another. So, any algorithm for finding isomorphisms must be able to find many
of them.
Recall that an automorphism of a graph is an isomorphism from the graph to itself. These form a
group which we denote aut(G): if Π and Γ are automorphisms of A then so is ΠΓ. Let A ⊆ S
denote the corresponding set of diagonal ±1 matrices. The set A is in fact a group and is
isomorphic to aut(G).
Here is a way to make this isomorphism very concrete: Lemma 39.3.1 implies that the
Π ∈ aut(G) and the S ∈ A are related by
Π = ΨSΨT and S = Ψ T ΠΨ .
As diagonal matrices commute, we have that for every Π1 and Π2 in aut(G) and for
S 1 = Ψ T Π1 Ψ and S 2 = Ψ T Π2 Ψ ,
Π1 Π2 = Ψ S 1 Ψ T Ψ S 2 Ψ T = Ψ S 1 S 2 Ψ T = Ψ S 2 S 1 Ψ T = Ψ S 2 Ψ T Ψ S 1 Ψ T = Π2 Π1 .
Thus, the automorphism group of a graph with distinct eigenvalues is commutative, and it is
isomorphic to a subgroup of S.
It might be easier to think about these subgroups by realizing that they are isomorphic to
subspaces of (Z/2Z)n . Let f : S → (Z/2Z)n be the function that maps the group of diagonal
matrices with ±1 entries to vectors t modulo 2 by setting t(i) so that S (i, i) = (−1)t(i) . You
should check that this is a group homomorphism: f (S 1 S 2 ) = f (S 1 ) + f (S 2 ). You should also
confirm that f is invertible.
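A tiny Julia illustration of this correspondence, with my own function names: multiplying diagonal ±1 matrices corresponds to adding their bit vectors modulo 2.

using LinearAlgebra

# Map a diagonal +/-1 matrix S to its bit vector t, with S(i,i) = (-1)^t(i), and back.
to_bits(S) = [s == -1 ? 1 : 0 for s in diag(S)]
from_bits(t) = Diagonal([(-1)^ti for ti in t])

S1 = Diagonal([1, -1, -1, 1])
S2 = Diagonal([-1, -1, 1, 1])
println(to_bits(S1 * S2) == mod.(to_bits(S1) + to_bits(S2), 2))   # true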
For today’s lecture, we will focus on the problem of finding the group of automorphisms of a
graph with distinct eigenvalues. We will probably save the slight extension to finding
isomorphisms for homework. Note that we will not try to list all the isomorphisms, as there could
be many. Rather, we will give a basis of the corresponding subspace of (Z/2Z)n .
Recall that the orbit of an element under the action of a group is the set of elements to which it is
mapped by the elements of the group. Concretely, the orbit of a vertex a in the graph is the set of
vertices to which it can be mapped by automorphisms. We will discover the orbits by realizing
that the orbit of a vertex a is the set of b for which v a S = v b for some S ∈ A.
The set of orbits of vertices forms a partition of the vertices. We say that a partition of the
vertices is valid if every orbit is contained entirely within one set in the partition. That is, each
class of the partition is a union of orbits. Our algorithm will proceed by constructing a valid
partition of the vertices and then splitting classes in the partition until each is exactly an orbit.
Recall that a set is stabilized by a group if the set is unchanged when the group acts on all of its
members. We will say that a group G ⊆ S stabilizes a set of vertices C if it stabilizes the set of
vectors {v a }a∈C . Thus, A is the group that stabilizes V .
An orbit is stabilized by A, and so are unions of orbits and thus classes of valid partitions. We
would like to construct the subgroup of S that stabilizes each orbit Cj . However, I do not yet see
how to do that directly. Instead, we will construct a particular valid partition of the vertices, and
find for each class C_j in the partition the subgroup A_j ⊆ S that stabilizes C_j, where here we
are considering the actions of matrices S ∈ S on vectors v_a. In fact, A_j will act transitively on
the class C_j. As A stabilizes every orbit, and thus every union of orbits, it is a subgroup of A_j. In
fact, A is exactly the intersection of all the groups Aj .
We now observe that we can use linear algebra to efficiently construct A from the groups Aj by
exploiting the isomorphism between S and (Z/2)n . Each subgroup Aj is isomorphic to a
subgroup of (Z/2)n . Each subgroup of (Z/2)n is precisely a vector space modulo 2, and thus may
be described by a basis. It will eventually become clear that by “compute Aj ” we mean to
compute such a basis. From the basis, we may compute a basis of the nullspace. The subgroup of
(Z/2)n corresponding to A is then the nullspace of the span of the nullspaces of the subspaces
corresponding to the Aj . We can compute all these using Gaussian elimination.
We may begin by dividing vertices according to the absolute values of their entries in
eigenvectors. That is, if |ψ_i(a)| ≠ |ψ_i(b)| for some i, then we may place vertices a and b in
different classes, as there can be no S ∈ S for which v_a S = v_b. The partition that we obtain this
way is thus valid, and is the starting point of our algorithm.
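A minimal Julia sketch of this initial partition (my own code and names): group the vertices by the vector of absolute values of their eigenvector entries, rounding to guard against floating-point noise.

using LinearAlgebra

# Initial valid partition: vertices a and b can only lie in the same orbit
# if |psi_i(a)| = |psi_i(b)| for every eigenvector psi_i.
function partition_by_abs(A; digits = 8)
    Psi = eigen(Symmetric(float.(A))).vectors
    classes = Dict{Vector{Float64},Vector{Int}}()
    for a in 1:size(A, 1)
        key = round.(abs.(Psi[a, :]); digits = digits)
        push!(get!(classes, key, Int[]), a)
    end
    return collect(values(classes))
end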
Thus, an unbalanced vector tells us that all vertices for which ψ i (a) = x are in different orbits
from those for which ψ i (a) = −x. This lets us refine classes.
We now extend this idea in two ways. First, we say that ψ_i is unbalanced on a class C if there is
some value x for which the number of vertices a ∈ C with ψ_i(a) = x differs from the number with ψ_i(a) = −x.
By the same reasoning, we can infer that the sign of S (i, i) must be fixed to 1. Assuming, as will
be the case, that C is a class in a valid partition and thus a union of orbits, we are now able to
split C into two smaller classes
The partition we obtain by splitting C into C1 and C2 is thus also valid. Of course, it is only
useful if both sets are non-empty.
Finally, we consider vectors formed from products of eigenvectors. For R ⊆ {1, . . . , n}, define ψ R
to be the component-wise product of the ψ i for i ∈ R:
ψ_R(a) = ∏_{i∈R} ψ_i(a).
We say that the vector ψ_R is unbalanced on class C if there is some value x for which the number of vertices a ∈ C with ψ_R(a) = x differs from the number with ψ_R(a) = −x.
An unbalanced vector of this form again tells us that the vertices in the two sets belong to
different orbits. So, if both sets are nonempty we can use such a vector to split the class C in two
to obtain a more refined valid partition. It also provides some relations between the entries of S ,
but we will not exploit those.
We say that a vector is balanced if it is not unbalanced.
We say that a subset of the vertices C ⊆ V is balanced if every non-constant product of
eigenvectors is balanced on C. Thus, orbits are balanced. Our algorithm will partition the
vertices into balanced classes.
My confusion over this lecture stemmed from thinking that all balanced classes must be orbits.
But, I don’t know if this is true.
Question: Is every balanced class an orbit of A?
Let Cj be a balanced class. By definition, the product of every subset of eigenvectors is either
constant or balanced on Cj . We say that a subset of eigenvectors Q is independent on Cj if all
products of subsets of eigenvectors in Q are balanced on Cj (except for the empty product). In
particular, none of these eigenvectors is zero or constant on Cj . Construct a matrix MCj ,Q whose
rows are indexed by vertices a ∈ C_j, whose columns are indexed by subsets R ⊆ Q, and whose
entries are given by
M_{C_j,Q}(a, R) = sgn(ψ_R(a)), where I recall that sgn(x) = 1 if x > 0, sgn(x) = −1 if x < 0, and sgn(x) = 0 if x = 0.
Proof. Let R_1 and R_2 index two columns of M_{C,Q}. That is, R_1 and R_2 are two different subsets of
Q. Let R_0 be their symmetric difference. We have
M_{C,Q}(a, R_1) M_{C,Q}(a, R_2) = M_{C,Q}(a, R_0) for every a ∈ C,
as the eigenvectors in R_1 ∩ R_2 contribute squared, and hence positive, factors to the product ψ_{R_1}(a) ψ_{R_2}(a).
As all the nonempty products of subsets of eigenvectors in Q are balanced on C, M_{C,Q}(a, R_0) is
positive for half the a ∈ C and negative for the other half. So,
M_{C,Q}(:, R_1)^T M_{C,Q}(:, R_2) = Σ_{a∈C} M_{C,Q}(a, R_1) M_{C,Q}(a, R_2) = Σ_{a∈C} M_{C,Q}(a, R_0) = 0.
Lemma 39.9.2. If C is a balanced class of vertices and Q is a maximal set of eigenvectors that
are independent on C, then for every a and b in C there is an i ∈ Q for which ψ_i(a) ≠ ψ_i(b).
Proof. Assume by way of contradiction that this does not hold. There must still be some eigenvector i
for which ψ_i(a) ≠ ψ_i(b), as distinct vertices cannot agree in every eigenvector. We will show that if we added i to Q, the product of every subset would
still be balanced. As we already know this for subsets of Q, we just have to prove it for subsets of
the form R ∪ {i}, where R ⊆ Q. As ψ_h(a) = ψ_h(b) for every h ∈ Q, ψ_R(a) = ψ_R(b). This implies
ψ_{R∪{i}}(a) ≠ ψ_{R∪{i}}(b). Thus, ψ_{R∪{i}} is not uniform on C, and so it must be balanced on C. This contradicts the maximality of Q.
Lemma 39.9.3. If C is a balanced class of vertices and Q is a maximal set of eigenvectors that
are independent on C, then the rows of MC,Q are orthogonal.
Proof. Let a and b be in C. From Lemma 39.9.2 we know that there is an i ∈ Q for which
ψ_i(a) = −ψ_i(b). To prove that the rows M_{C,Q}(a, :) and M_{C,Q}(b, :) are orthogonal, we compute
M_{C,Q}(a, :) M_{C,Q}(b, :)^T = Σ_{R⊆Q} sgn(ψ_R(a)) sgn(ψ_R(b)).
Pairing each R not containing i with R ∪ {i}, the two terms in each pair cancel, as sgn(ψ_i(a)) = −sgn(ψ_i(b)). So the sum equals 0.
Corollary 39.9.4. Let C be a balanced subset of vertices. Then the size of C is a power of 2. If
Q is an independent set of eigenvectors on C, then |Q| ≤ log2 |C|.
Proof. Let C be an orbit and let Q be a maximal set of eigenvectors that are independent on C.
As the rows and columns of MC,Q are both orthogonal, MC,Q must be square. This implies that
|C| = 2|Q| . If we drop the assumption that Q is maximal, we still know that all the columns of
MC,Q are orthogonal. This matrix has 2|Q| columns. As they are vectors in |C| dimensions, there
can be at most |C| of them.
We can now describe the structure of a balanced subset of vertices C. We call a maximal set of
eigenvectors that are independent on C a base for C. Every other eigenvector j is either constant
on C or becomes constant when multiplied by the product of some subset R of eigenvectors in Q.
In either case, we can write
ψ_j(a) = γ ∏_{i∈R} ψ_i(a)   for all a ∈ C,   (39.1)
Apply f to map this subgroup of S to (Z/2)^n, and let B be an n-by-log_2(|C|) matrix containing a
basis of the subspace in its columns. Any independent subset of log2 (|C|) rows of B will form a
basis of the row-space, and is isomorphic to a base for C of the eigenvectors.
39.10 Algorithms
Let Cj be a balanced class. We just saw how to compute Aj , assuming that we know Cj and a
base Q for it. Of course, by “compute” we mean computing a basis of f (Aj ). We now show how
to find a base for a balanced class Cj . We do this by building up a set Q of eigenvectors that are
independent on Cj . To do this, we go through the eigenvectors in order. For each eigenvector ψ i ,
we must determine whether or not its values on Cj can be expressed as a product of eigenvectors
already present in Q. If it can be, then we record this product as part of the structure of Aj . If
not, we add i to Q.
The eigenvector ψ_i is a product of eigenvectors in Q on C_j if and only if there is a constant γ and
y_h ∈ {0, 1} for h ∈ Q such that
ψ_i(a) = γ ∏_{h∈Q} (ψ_h(a))^{y_h},   for all a ∈ C_j.
We can tell whether or not these equations have a solution using linear algebra modulo 2. Let B
be the matrix over Z/2 such that
ψ i (a) = (−1)B(i,a) .
Then, the above equations become
B(i, a) = Σ_{h∈Q} y_h B(h, a)   for all a ∈ C_j.
Thus, we can solve for the coefficients yh in polynomial time, if they exist. If they do not, we add
i to Q.
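Here is a sketch, in Julia with my own names, of the mod-2 linear algebra used in this step: given the 0/1 rows B(h, ·) for h ∈ Q (restricted to the columns a ∈ C_j) and the target row B(i, ·), it decides whether the target is a mod-2 combination of the rows and, if so, returns the coefficients y_h.

# Decide whether `target` is a mod-2 combination of the given 0/1 `rows`.
# Returns a 0/1 coefficient vector y with target == sum_h y[h]*rows[h] (mod 2),
# or `nothing` if no such combination exists.  Gaussian elimination over Z/2Z.
function solve_mod2(rows::Vector{Vector{Int}}, target::Vector{Int})
    m = length(rows)
    pivots = Dict{Int,Tuple{Vector{Int},Vector{Int}}}()  # pivot column => (reduced row, coefficients)
    function reducevec(v, c)
        p = findfirst(==(1), v)
        while p !== nothing && haskey(pivots, p)
            r, rc = pivots[p]
            v = mod.(v .+ r, 2)
            c = mod.(c .+ rc, 2)
            p = findfirst(==(1), v)
        end
        return v, c, p
    end
    for h in 1:m
        v, c, p = reducevec(mod.(rows[h], 2), [j == h ? 1 : 0 for j in 1:m])
        p === nothing || (pivots[p] = (v, c))
    end
    v, c, p = reducevec(mod.(target, 2), zeros(Int, m))
    return p === nothing ? c : nothing
end

rows = [[1, 0, 1, 0], [0, 1, 1, 0]]
println(solve_mod2(rows, [1, 1, 0, 0]))   # [1, 1]: the sum of both rows
println(solve_mod2(rows, [0, 0, 0, 1]))   # nothing: not in the span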
Once we have determined a base Q and how to express on Cj the values of every other
eigenvector as a product of eigenvectors in Q, we have determined A_j.
It remains to explain how we partition the vertices into balanced classes. Consider applying the
above procedure to a class Cj that is not balanced. We will discover that Cj is not balanced by
finding a product of eigenvectors that is neither constant nor balanced on Cj . Every time we add
an eigenvector ψ i to Q, we will examine every product of vectors in Q to check if any are
unbalanced on Cj . We can do this efficiently, because there are at most 2|Q| ≤ |Cj | such products
to consider. As we have added ψ i to Q, none of the products of vectors in Q can be constant on
C_j. If we find a product that is not balanced on C_j, then it must also be non-constant, and thus
provides a way of splitting class C_j into two.
We can now summarize the entire algorithm. We first compute the partition by absolute values of
entries described in section 39.7. We then go through the classes of the partition one-by-one. For
each, we use the above procedure until we have either split it in two or we have determined that it
is balanced and we have computed its automorphism group. If we do split the class in two, we
refine the partition and start over. As the total number of times we split classes is at most n, this
algorithm runs in polynomial time.
After we have computed a partition into balanced classes and have computed their
automorphism groups, we combine them to find the automorphism group of the entire graph as
described at the end of section 39.6.
Chapter 40
Testing Isomorphism of Strongly Regular Graphs
40.1 Introduction
In the last lecture we saw how to test isomorphism of graphs in which every eigenvalue is distinct.
So, in this lecture we will consider the opposite case: graphs that only have 3 distinct eigenvalues.
These are the strongly regular graphs.
Our algorithm for testing isomorphism of these will not run in polynomial time. Rather, it takes
time n^{O(n^{1/2} log n)}. This is at least much faster than the naive algorithm of checking all n! possible
permutations. In fact, this was the best known running time for general algorithms for graph
isomorphism until three years ago.
40.2 Definitions
A graph is strongly regular with parameters (d, α, β) if it is d-regular, every pair of adjacent vertices has exactly α common neighbors, and every pair of distinct non-adjacent vertices has exactly β common neighbors.
These conditions are very strong, and it might not be obvious that there are any non-trivial
graphs that satisfy these conditions. Of course, the complete graph and disjoint unions of
complete graphs satisfy these conditions. Before proceeding, I warn you that there is a standard
notation in the literature about strongly regular graphs, and I am trying not to use it. In this
literature, d becomes k, α becomes λ and β becomes µ. Many other letters are bound as well.
For the rest of this lecture, we will only consider strongly regular graphs that are connected and
that are not the complete graph. I will now give you some examples.
The Paley graphs we encountered are strongly regular. The simplest of these is the pentagon. It
has parameters
n = 5, d = 2, α = 0, β = 1.
For a positive integer n, the lattice graph Ln is the graph with vertex set {1, . . . n}2 in which
vertex (a, b) is connected to vertex (c, d) if a = c or b = d. Thus, the vertices may be arranged at
the points in an n-by-n grid, with vertices being connected if they lie in the same row or column.
Alternatively, you can understand this graph as the product of two complete graphs on n vertices.
The parameters of this graph are:
d = 2(n − 1), α = n − 2, β = 2.
A Latin square is an n-by-n grid, each entry of which is a number between 1 and n, such that no
number appears twice in any row or column. For example,
1 2 3 4      1 2 3 4      1 2 3 4
4 1 2 3      2 1 4 3      2 4 1 3
3 4 1 2      3 4 1 2      3 1 4 2
2 3 4 1      4 3 2 1      4 3 2 1
are Latin squares. Let me remark that the number of different Latin squares of size n grows very
quickly—at least as fast as n!(n − 1)!(n − 2)! . . . 2!. Two Latin squares are said to be isomorphic if
there is a renumbering of their rows, columns, and entries, or a permutation of these, that makes
them the same. As this provides 6(n!)3 isomorphisms, and this is much less than the number of
Latin squares, there must be many non-isomorphic Latin squares of the same size. Two of the
Latin squares above are isomorphic to each other, but the third is not.
From such a Latin square, we construct a Latin square graph. It will have n2 nodes, one for each
cell in the square. Two nodes are joined by an edge if they lie in the same row, if they lie in the same column, or if they hold the same number.
So, such a graph has degree d = 3(n − 1). Any two nodes in the same row are both neighbors
of every other node in their row, which gives n − 2 common neighbors. They will have two more common neighbors: the nodes
in their columns holding the other's number. So, they have n common neighbors. The same
obviously holds for columns, and is easy to see for nodes that hold the same number. So, every
pair of nodes that are neighbors have exactly α = n common neighbors.
On the other hand, consider two vertices that are not neighbors, say (1, 1) and (2, 2). They lie in
different rows, lie in different columns, and we are assuming that they hold different numbers.
The vertex (1, 1) has two common neighbors of (2, 2) in its row: the vertex (1, 2) and the vertex
holding the same number as (2, 2). Similarly, it has two common neighbors of (2, 2) in its column.
Finally, we can find two more common neighbors of (2, 2) that are in different rows and columns
by looking at the nodes that hold the same number as (1, 1), but which are in the same row or
column as (2, 2). So, β = 6.
We will consider the adjacency matrices of strongly regular graphs. Let A be the adjacency
matrix of a strongly regular graph with parameters (d, α, β). We already know that A has an
eigenvalue of d with multiplicity 1. We will now show that A has just two other eigenvalues.
To prove this, first observe that the (a, b) entry of A² is the number of common neighbors of
vertices a and b. For a = b, this is just the degree of vertex a. We will use this fact to write A² as
a linear combination of A, I and J, the all-1s matrix. To this end, observe that the adjacency
matrix of the complement of A (the graph with non-edges where A has edges) is J − I − A. So,
A² = αA + β(J − I − A) + dI.
For an eigenvector v of A that is orthogonal to the all-1s vector, Jv = 0, and so
A²v = (α − β)Av + (d − β)v.
Thus, every eigenvalue λ of A other than d satisfies
λ² = (α − β)λ + d − β.
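As a small check (my own Julia code), the pentagon has parameters d = 2, α = 0, β = 1, and its adjacency eigenvalues other than d are exactly the roots of this quadratic.

using LinearAlgebra

# Pentagon: the cycle on 5 vertices, strongly regular with d=2, alpha=0, beta=1.
A = [abs(i - j) in (1, 4) ? 1.0 : 0.0 for i in 1:5, j in 1:5]
d, alpha, beta = 2, 0, 1
println(sort(eigvals(Symmetric(A))))
# Roots of lambda^2 = (alpha - beta)*lambda + (d - beta):
disc = sqrt((alpha - beta)^2 + 4 * (d - beta))
println(((alpha - beta - disc) / 2, (alpha - beta + disc) / 2))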
One can prove that every connected regular graph whose adjacency (or Laplacian) matrix has just
three distinct eigenvalues is a strongly regular graph.
The problem of testing isomorphism of graphs is often reduced to the problem of giving each
vertex in a graph a unique name. If we have a way of doing this that does not depend upon the
initial ordering of the vertices, then we can use it to test graph isomorphism: find the unique
names of vertices in both graphs, and then see if it provides an isomorphism. For example,
consider the graph below.
[Figure: a small graph with each vertex labeled by its degree.]
The degrees distinguish between many nodes, but not all of them. We may refine this labeling by
appending the labels of every neighbor of a node.
[Figure: the same graph with each vertex labeled by its degree followed by the multiset of its neighbors' degrees, e.g. "2, {1, 3}".]
Now, every vertex has its own unique label. If we were given another copy of this graph, we could
use these labels to determine the isomorphism between them. This procedure is called refinement,
and it can be carried out until it stops producing new labels. However, it is clear that this
procedure will fail to produce unique labels if the graph has automorphisms, or if it is a regular
graph. In these cases, we need a way to break symmetry.
The procedure called individualization breaks symmetry arbitrarily. It chooses some nodes in the
graph, arbitrarily, to give their own unique names. Ideally, we pick one vertex to give a unique
name, and then refine the resulting labeling. We could then pick another troubling vertex, and
continue. We call a set of vertices S ⊂ V a distinguishing set if individualizing this set of nodes
results in a unique name for every vertex, after refinement. How would we use a distinguishing set
to test isomorphism? Assume that S is a distinguishing set for G = (V, E). To test if H = (W, F )
is isomorphic to G, we could enumerate over every possible set of |S| vertices of W , and check if
they are a distinguishing set for H. If G and H are isomorphic, then H will also have an
isomorphic distinguishing set that we can use to find an isomorphism between G and H. We would have to check (n choose |S|) sets, and try |S|! labelings for each, so we had better hope that S is small.
We will now prove a result of Babai [Bab80] which says that every strongly regular graph has a distinguishing set of size O(√n log n). Babai's result won't require any refinement beyond naming
every vertex by the set of individualized nodes that are its neighbors. So, we will prove that a set
of nodes S is a distinguishing set by proving that for every pair of distinct vertices a and b, either
there is an s ∈ S that is a neighbor of a but not of b, or the other way around. This will suffice to
distinguish a and b. As our algorithm will work in a brute-force fashion, enumerating over all sets
of a given size, we merely need to show that such a set S exists. We will do so by proving that a
random set of vertices probably works.
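As a toy illustration of this brute-force viewpoint, the following Julia sketch (my own; the graph and the set S are arbitrary choices, not examples from the text) checks whether a candidate set S is distinguishing in the weak sense used here: every pair of vertices outside S is separated by its set of neighbors in S.

# Does naming each vertex by its neighbors inside S give distinct names to all
# pairs of vertices outside S?  (Vertices in S are individualized, hence distinct.)
function distinguishes(A::AbstractMatrix, S::Vector{Int})
    nv = size(A, 1)
    name(v) = Set(s for s in S if A[v, s] != 0)
    all(name(a) != name(b) for a in 1:nv, b in 1:nv
        if a < b && !(a in S) && !(b in S))
end

C5 = [abs(i - j) in (1, 4) ? 1 : 0 for i in 1:5, j in 1:5]   # the 5-cycle
println(distinguishes(C5, [1, 2]))                           # true

The existence proof below only needs such a set to exist, so a brute-force search like this suffices for the isomorphism test.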
I first observe that it suffices to consider strongly-regular graphs with d < n/2, as the complement of a strongly regular graph is also a strongly regular graph (that would have been too easy to assign as a homework problem). We should also observe that every strongly-regular graph has diameter 2, and so d ≥ √(n − 1).
Lemma 40.8.1. Let G = (V, E) be a connected strongly regular graph with n vertices and degree
d < n/2. Then for every pair of vertices a and b, there are at least d/3 vertices that are neighbors
of a but not b.
If we choose a set S of (3√n + 2) ln n² vertices at random, the probability that none of them is in T is at most
(1 − 1/(3√n + 2))^{(3√n + 2) ln n²} ≤ 1/n².
So, the probability that a random set of this many nodes fails to distinguish some pair is at most 1/2, as there are fewer than n²/2 pairs.
Z0 = {w : w ∼ a, w ≁ z}, and Z1 = {w : w ≁ b, w ∼ z}.
So,
d − β ≤ 2(d − α − 1) =⇒ 2(α + 1) ≤ d + β. (40.2)
This tells us that if α is close to d, then β is also.
We require one more relation between α and β. We obtain this relation by picking any vertex a, and counting the pairs b, z such that b ∼ z, a ∼ b and a ≁ z.
Every node b that is a neighbor of a has α neighbors in common with a, and so has d − α − 1 neighbors that are not neighbors of a. This gives d(d − α − 1) such pairs. On the other hand, there are n − d − 1 nodes z that are not neighbors of a, and each of them has β neighbors in common with a, giving (n − d − 1)β such pairs. Combining, we find
(n − d − 1)β = d(d − α − 1). (40.3)
As d < n/2, this equation tells us
d(d − α − 1) ≥ dβ =⇒ d − α − 1 ≥ β. (40.4)
Thus, for every a ≠ b the number of vertices that are neighbors of a but not of b is at least min(d − α − 1, d − β) ≥ d/3.
40.9 Notes
You should wonder if we can make this faster by analyzing refinement steps. In [Spi96b], I improved the running time bound to 2^{O(n^{1/3} log n)} by analyzing two refinement phases. The
algorithm required us to handle certain special families of strongly regular graphs separately:
Latin square graphs and Steiner graphs. Algorithms for testing isomorphism of strongly regular
graphs were recently improved by Babai, Chen, Sun, Teng, and Wilmes [BCS+ 13, BW13, SW15].
The running times of all these algorithms are subsumed by that in Babai’s breakthrough
algorithm for testing graph isomorphism [Bab16].
Part VII
Interlacing Families
Chapter 41
Bipartite Ramanujan Graphs
41.1 Overview
Margulis [Mar88] and Lubotzky, Phillips and Sarnak [LPS88] presented the first explicit
constructions of infinite families of Ramanujan graphs. These had degrees p + 1, for primes p.
There have been a few other explicit constructions, [Piz90, Chi92, JL97, Mor94], all of which
produce graphs of degree q + 1 for some prime power q. Over this lecture and the next we will
prove the existence of infinite families of bipartite Ramanujan graphs of every degree. While today's
proof of existence does not lend itself to an explicit construction, it is easier to understand than
the presently known explicit constructions.
We think that much stronger results should be true. There is good reason to think that random d-regular graphs should be Ramanujan [MNS08]. And, Friedman [Fri08] showed that a random d-regular graph is almost Ramanujan: for sufficiently large n, such a graph is a 2√(d − 1) + ε approximation of the complete graph with high probability, for every ε > 0.
In today’s lecture, we will use the method of interlacing families of polynomials to prove (half) a
conjecture of Bilu and Linial [BL06] that every bipartite Ramanujan graph has a 2-lift that is also
Ramanujan. This theorem comes from [MSS15b], but today’s proof is informed by the techniques
of [HPS15]. We will use theorems about the matching polynomials of graphs that we will prove
next lecture.
In the same way that a Ramanujan graph approximates the complete graph, a bipartite
Ramanujan graph approximates a complete bipartite graph. We say that a d-regular graph is a
bipartite Ramanujan graph if all of its adjacency matrix eigenvalues, other than d and −d, have absolute value at most 2√(d − 1). The eigenvalue of d is a consequence of being d-regular and the eigenvalue
eigenvalue of −d is a consequence of being bipartite. In particular, recall that the adjacency
matrix eigenvalues of a bipartite graph are symmetric about the origin. This is a special case of
the following claim, which you can prove when you have a sparse moment.
We remark that one can derive bipartite Ramanujan graphs from ordinary Ramanujan
graphs—just take the double cover. However, we do not know any way to derive ordinary
Ramanujan graphs from the bipartite ones.
As opposed to reasoning directly about eigenvalues, we will work with characteristic polynomials.
For a matrix M, we write its characteristic polynomial in the variable x as
χx(M) := det(xI − M).
41.2 2-Lifts
A 2-lift of G = (V, E) is specified by a signed adjacency matrix S: a symmetric matrix with S(u, v) ∈ {±1} when (u, v) ∈ E and S(u, v) = 0 otherwise. The 2-lift GS has two vertices, (u, 0) and (u, 1), for every vertex u of G. If S(u, v) = −1, then GS has the two edges ((u, 0), (v, 1)) and ((u, 1), (v, 0)), just like the double-cover. If S(u, v) = 1, then GS has the two edges ((u, 0), (v, 0)) and ((u, 1), (v, 1)). You should check that G_{−A} is the double-cover of G and that G_A consists of two disjoint copies of G.
Prove that the eigenvalues of the adjacency matrix of GS are the union of the eigenvalues of A and the eigenvalues of S.
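Here is a small Julia sketch of my own (assuming the edge convention just described: parallel edges where S(u, v) = 1 and crossing edges where S(u, v) = −1) that builds GS in block form and confirms the eigenvalue claim numerically.

using LinearAlgebra

A = [0 1 1 0; 1 0 1 1; 1 1 0 1; 0 1 1 0.0]                   # a small graph
S = copy(A); S[1, 2] = S[2, 1] = -1; S[2, 4] = S[4, 2] = -1   # sign two of its edges

# Adjacency matrix of the 2-lift: (A+S)/2 keeps the +1 edges inside each copy,
# and (A-S)/2 places the -1 edges between the two copies.
lift(A, S) = [(A + S) / 2 (A - S) / 2; (A - S) / 2 (A + S) / 2]

ev_lift = sort(eigvals(Symmetric(lift(A, S))))
ev_union = sort([eigvals(Symmetric(A)); eigvals(Symmetric(S))])
println(ev_lift ≈ ev_union)                                    # true

The block form also explains the exercise: vectors of the form (x, x) recover the eigenvalues of A, and vectors of the form (x, −x) recover the eigenvalues of S.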
Theorem 41.2.1. Every d-regular graph G has a signed adjacency matrix S for which the minimum eigenvalue of S is at least −2√(d − 1).
We can use this theorem to build infinite families of bipartite Ramanujan graphs, because their eigenvalues are symmetric about the origin. Thus, if µn ≥ −2√(d − 1), then we know that |µi| ≤ 2√(d − 1) for all 1 < i < n. Note that every 2-lift of a bipartite graph is also a bipartite graph.
We will prove Theorem 41.2.1 by considering a random 2-lift. In particular, we consider the
expected characteristic polynomial of a random signed adjacency matrix S :
ES [χx (S )] . (41.1)
Godsil and Gutman [GG81] proved that this is equal to the matching polynomial of G! We will
learn more about the matching polynomial next lecture.
For now, we just need the following bound on its zeros, which was proved by Heilmann and Lieb [HL72].
Theorem 41.3.1. The zeros of the matching polynomial of a graph of maximum degree at most d are real and have absolute value at most 2√(d − 1).
Now that we know that the smallest zero of (41.1) is at least −2√(d − 1), all we need to do is to show that there is some signed adjacency matrix whose smallest eigenvalue is at least this bound.
This is not necessarily as easy as it sounds, because the smallest zero of the average of two
polynomials is not necessarily related to the smallest zeros of those polynomials. We will show
that, in this case, it is.
Instead of directly reasoning about the characteristic polynomials of signed adjacency matrices S ,
we will work with characteristic polynomials of dI − S. It suffices for us to prove that there exists an S for which the largest eigenvalue of dI − S is at most d + 2√(d − 1).
Fix an ordering on the m edges of the graph, associate each S with a vector σ ∈ {±1}m , and
define
pσ (x) = χx (dI − S ).
The expected polynomial is the average of all these polynomials.
We define two vectors for each edge in the graph. If the ith edge is (a, b), then we define
v_{i,σi} = δa − σi δb.
A short computation shows that
∑_{i=1}^{m} v_{i,σi} v_{i,σi}^T = dI − S,
where S is the signed adjacency matrix corresponding to σ. So, for every σ ∈ {±1}^m,
pσ(x) = χx( ∑_{i=1}^{m} v_{i,σi} v_{i,σi}^T ).
Here is the problem we face. We have a large family of polynomials, say p1 (x), . . . , pm (x), for
which we know each pi is real-rooted and that their sum is real rooted. We would like to show
that there is some polynomial pi whose largest zero is at most the largest zero of the sum. This is
not true in general. But, it is true in our case because the polynomials form an interlacing family.
For a polynomial p(x) = ∏_{i=1}^{n} (x − λi) of degree n and a polynomial q(x) = ∏_{i=1}^{n−1} (x − µi) of degree n − 1, we say that q(x) interlaces p(x) if
λn ≤ µ_{n−1} ≤ λ_{n−1} ≤ · · · ≤ λ2 ≤ µ1 ≤ λ1.
If r(x) = ∏_{i=1}^{n} (x − µi) has degree n, we write r(x) → p(x) if
µn ≤ λn ≤ µ_{n−1} ≤ · · · ≤ λ2 ≤ µ1 ≤ λ1.
That is, if the zeros of p and r interlace, with the zeros of p being larger. We also make these
statements if they hold of positive multiples of p, r and q.
The following lemma gives the examples of interlacing polynomials that motivate us.
Lemma 41.5.1. Let A be a symmetric matrix and let v be a vector. For a real number t let
pt(x) = χx(A + tvv^T).
Then, for t > 0, p0(x) → pt(x), and there is a monic degree n − 1 polynomial q(x) so that for all t,
pt(x) = p0(x) − tq(x).
Proof. The fact that p0 (x) → pt (x) for t > 0 follows from the Courant-Fischer Theorem.
We first establish the existence of q(x) in the case that v = δ1. As the matrix tδ1δ1^T is zero everywhere except for the element t in the upper left entry, and the determinant is linear in each entry of the matrix,
χx(A + tδ1δ1^T) = χx(A) − t·χx(A^{(1)}),
where A^{(1)} is the submatrix of A obtained by removing its first row and column. The polynomial q(x) = χx(A^{(1)}) has degree n − 1.
For arbitrary v, let Q be a rotation matrix for which Qv = δ1. As determinants, and thus characteristic polynomials, are unchanged by multiplication by rotation matrices,
χx(A + tvv^T) = χx(Q(A + tvv^T)Q^T) = χx(QAQ^T + tδ1δ1^T) = χx(QAQ^T) − tq(x) = χx(A) − tq(x),
where now q(x) = χx((QAQ^T)^{(1)}).
For a polynomial p, let λmax (p) denote its largest zero. When polynomials interlace, we can relate
the largest zero of their sum to the largest zero of at least one of them.
Lemma 41.5.2. Let p1(x), p2(x) and r(x) be polynomials so that r(x) → pi(x). Then, r(x) → p1(x) + p2(x), and there is an i ∈ {1, 2} for which
λmax(pi) ≤ λmax(p1 + p2).
Proof. Let µ1 be the largest zero of r(x). As each polynomial pi(x) has a positive leading coefficient, each is eventually positive, and so is their sum. As each has exactly one zero that is at least µ1, each is nonpositive at µ1, and the same is also true of their sum. Let λ be the largest zero of p1 + p2. We have established that λ ≥ µ1.
If pi (λ) = 0 for some i, then we are done. If not, there is an i for which pi (λ) > 0. As pi only has
one zero larger than µ1 , and it is eventually positive, the largest zero of pi must be less than λ.
If p1, . . . , pm are polynomials such that there exists an r(x) for which r(x) → pi(x) for all i, then these polynomials are said to have a common interlacing. Such polynomials satisfy the natural generalization of Lemma 41.5.2.
The polynomials pσ (x) do not all have a common interlacing. However, they satisfy a property
that is just as useful: they form an interlacing family. Rather than defining these in general, we
will just explain the special case we need for today’s theorem.
We define polynomials that correspond to fixing the signs of the first k edges and then choosing
the rest at random. We indicate these by shorter sequences σ ∈ {±1}^k. For k < m and σ ∈ {±1}^k we define
pσ(x) := E_{ρ∈{±1}^{m−k}} [p_{σ,ρ}(x)].
So,
p∅(x) = E_{σ∈{±1}^m} [pσ(x)].
We view the strings σ, and thus the polynomials pσ, as vertices in a complete binary tree. The nodes with σ of length m are the leaves, and ∅ corresponds to the root. For σ of length less than m, the children of σ are (σ, 1) and (σ, −1). We call such a pair of nodes siblings. We will
eventually prove in Lemma 41.6.1 that all the polynomials pσ (x) are real rooted and in Corollary
41.6.2 that every pair of siblings has a common interlacing.
But first, we show that this implies that there is a leaf indexed by σ ∈ {±1}^m for which λmax(pσ) ≤ λmax(p∅).
Proof. Corollary 41.6.2 and Lemma 41.5.2 imply that every non-leaf node in the tree has a child whose largest zero is at most the largest zero of that node. Starting at the root of the tree, we
whose largest zero is at most the largest zero of that node. Starting at the root of the tree, we
find a node whose largest zero is at most the largest zero of p∅ . We then proceed down the tree
until we reach a leaf, at each step finding a node labeled by a polynomial whose largest zero is at
most the largest zero of the previous polynomial. The leaf we reach, σ, satisfies the desired
inequality.
We can now use Lemmas 41.5.1 and 41.5.2 to show that every σ ∈ {±1}^{m−1} has a child (σ, s) for which λmax(p_{σ,s}) ≤ λmax(pσ). Let
A = ∑_{i=1}^{m−1} v_{i,σi} v_{i,σi}^T.
The children of σ, (σ, 1) and (σ, −1), have polynomials p_{(σ,1)} and p_{(σ,−1)} that equal χx(A + v_{m,1} v_{m,1}^T) and χx(A + v_{m,−1} v_{m,−1}^T), respectively.
By Lemma 41.5.1, χx(A) → χx(A + v_{m,s} v_{m,s}^T) for s ∈ {±1}, and Lemma 41.5.2 implies that there is an s for which the largest zero of p_{(σ,s)} is at most the largest zero of their average, which is pσ.
To extend this argument to nodes higher up in the tree, we will prove the following statement.
Lemma 41.6.1. Let A be a symmetric matrix and let w_{i,s} be vectors for 1 ≤ i ≤ k and s ∈ {0, 1}. Then the polynomial
∑_{ρ∈{0,1}^k} χx( A + ∑_{i=1}^{k} w_{i,ρi} w_{i,ρi}^T )
is real rooted.
Corollary 41.6.2. For every k < n and σ ∈ {±1}k , the polynomials pσ,s (x) for s ∈ {±1} are real
rooted and have a common interlacing.
To prove Lemma 41.6.1, we use the following two lemmas which are known collectively as
Obreschkoff’s Theorem [Obr63].
Lemma 41.7.1. Let p and q be polynomials of degree n and n − 1, and let pt (x) = p(x) − tq(x).
If pt is real rooted for all t ∈ IR, then q interlaces p.
Proof Sketch. Recall that the roots of a polynomial are continuous functions of its coefficients, and
thus the roots of pt are continuous functions of t. We will use this fact to obtain a contradiction.
For simplicity, I just consider the case in which all of the roots of p and q are distinct. (I thank Sushant Sachdeva for helping me work out this particularly simple proof.) If they are not, one can prove this by dividing out their common divisors.
If p and q do not interlace, then p must have two roots that do not have a root of q between
them. Let these roots of p be λi+1 and λi . Assume, without loss of generality, that both p and q
are positive between these roots. We now consider the behavior of pt for positive t.
As we have assumed that the roots of p and q are distinct, q is positive at these roots, and so pt is
negative at λi+1 and λi . If t is very small, then pt will be close to p in value, and so there must be
some small t0 for which pt0 (x) > 0 for some λi+1 < x < λi . This means that pt0 must have two
roots between λi+1 and λi .
As q is positive on the entire closed interval [λi+1 , λi ], when t is large pt will be negative on this
entire interval, and thus have no roots inside. As we vary t between t0 and infinity, the two roots
at t0 must vary continuously and cannot cross λi+1 or λi . This means that they must become
complex, contradicting our assumption that pt is always real rooted.
Lemma 41.7.2. Let p and q be polynomials of degree n and n − 1 that interlace and have positive
leading coefficients. For every t > 0, define pt (x) = p(x) − tq(x). Then, pt (x) is real rooted and
p(x) → pt (x).
Proof Sketch. For simplicity, I consider the case in which all of the roots of p and q are distinct.
One can prove the general case by dividing out the common repeated roots.
To see that the largest root of pt is larger than λ1 , note that q(x) is positive for all x > µ1 , and
λ1 > µ1 . So, pt (λ1 ) = p(λ1 ) − tq(λ1 ) < 0. As pt is monic, it is eventually positive and it must have
a root larger than λ1 .
We will now show that for every i ≥ 1, pt has a root between λ_{i+1} and λi. As this gives us n − 1 more roots, it accounts for all n roots of pt. For i odd, we know that q(λi) > 0 and q(λ_{i+1}) < 0. As p is zero at both of these points, pt(λi) < 0 and pt(λ_{i+1}) > 0, which means that pt has a root between λi and λ_{i+1}. The case of even i is similar.
Lemma 41.7.3. Let p0(x) and p1(x) be degree n monic polynomials for which there is a third polynomial r(x) such that
r(x) → p0(x) and r(x) → p1(x).
Then
r(x) → (1/2)p0(x) + (1/2)p1(x),
and the latter is a real rooted polynomial.
Sketch. Assume for simplicity that all the roots of r are distinct and different from the roots of p0
and p1 . Let µn < µn−1 < · · · < µ1 be the roots of r. Our assumptions imply that both p0 and p1
are negative at µi for odd i and positive for even i. So, the same is true of their average. This
tells us that their average must have at least n − 1 real roots between µn and µ1 . As their average
is monic, it must be eventually positive and so must have a root larger than µ1 . That accounts for
all n of its roots.
Proof of Lemma 41.6.1. We prove this by induction on k. Assuming that we have proved it for k − 1, we now prove it for k. Let u be any vector and let t ∈ IR. Define
pt(x) = ∑_{ρ∈{0,1}^{k−1}} χx( A + ∑_{i=1}^{k−1} w_{i,ρi} w_{i,ρi}^T + t·uu^T ).
41.8 Conclusion
The major open problem left by this work is establishing the existence of regular (non-bipartite)
Ramanujan graphs. The reason we can not prove this using the techniques in this lecture is that
the interlacing techniques only allow us to reason about the largest or smallest eigenvalue of a
matrix, but not both.
To see related papers establishing the existence of Ramanujan graphs, see [MSS15d, HPS15]. For
a survey on this and related material, see [MSS14].
41.9 Overview
The coefficients of the matching polynomial of a graph count the numbers of matchings of various
sizes in that graph. It was first defined by Heilmann and Lieb [HL72], who proved that it has
some amazing properties, including that it is real rooted. They also proved that all roots of the matching polynomial of a graph of maximum degree d are at most 2√(d − 1). Our proofs today
come from a different approach to the matching polynomial that appears in the work of Godsil
[God93, God81]. A theorem of Godsil and Gutman [GG81] implies that the expected
characteristic polynomial of a randomly signed adjacency matrix is the matching polynomial of a
graph. Last lecture we used these results to establish the existence of infinite families of bipartite
Ramanujan graphs.
41.10 2√(d − 1)
We begin by explaining where the number 2√(d − 1) comes from: it is an upper bound on the eigenvalues of a tree of maximum degree at most d. One can also show that the largest eigenvalue of a d-ary tree approaches 2√(d − 1) as the depth of the tree (and number of vertices) increases.
We prove this statement in two steps. The first is similar to proofs we saw at the beginning of the
semester.
Lemma 41.10.1. Let M be a (not necessarily symmetric) nonnegative matrix. Let s = ‖M1‖_∞ be the maximum row sum of M. Then, |λ| ≤ s for every eigenvalue of M.
Proof. Let Mψ = λψ, and let a be an entry of ψ of largest absolute value. Then,
|λ| |ψ(a)| = |λψ(a)| = |(Mψ)(a)| = |∑_b M(a, b)ψ(b)| ≤ ∑_b M(a, b) |ψ(a)| ≤ s |ψ(a)|.
This implies |λ| ≤ s.
Theorem 41.10.2. Let T be a tree in which every vertex has degree at most d, and let M_T be its adjacency matrix. Then, all eigenvalues of M_T have absolute value at most 2√(d − 1).
Proof. Let M be the adjacency matrix of T. Choose some vertex to be the root of the tree, and define its height to be 0. For every other vertex a, define its height, h(a), to be its distance to the root. Define D to be the diagonal matrix with
D(a, a) = (√(d − 1))^{h(a)}.
Recall that the eigenvalues of M are the same as the eigenvalues of DM D −1 . We will use the
fact that all eigenvalues of a nonnegative matrix are upper bounded in absolute value by its
maximum row sum.
So, we need to prove that all row sums of DMD^{−1} are at most 2√(d − 1). There are three types of vertices to consider. First, the row of the root has up to d entries that are all 1/√(d − 1). For d ≥ 2, d/√(d − 1) ≤ 2√(d − 1). The intermediate vertices have one entry in their row that equals √(d − 1), and up to d − 1 entries that are equal to 1/√(d − 1), for a total of at most 2√(d − 1). Finally, every leaf has only one nonzero entry in its row, and that entry equals √(d − 1).
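The bound is easy to test numerically. The following Julia sketch (mine, not from the notes) builds the adjacency matrix of a complete binary tree, which has maximum degree d = 3, and compares its spectral radius with 2√(d − 1) ≈ 2.83.

using LinearAlgebra

depth = 8
nv = 2^(depth + 1) - 1                       # vertices of a complete binary tree
M = zeros(nv, nv)
for v in 2:nv
    p = v ÷ 2                                # parent of v in heap order
    M[v, p] = M[p, v] = 1
end

d = 3
println(maximum(abs.(eigvals(Symmetric(M)))), " ≤ ", 2sqrt(d - 1))

Increasing the depth makes the largest eigenvalue approach 2√(d − 1) from below, as claimed above.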
A matching in a graph G = (V, E) is a subgraph of G in which every vertex has degree 1. We say
that a matching has size k if it has k edges. We let
mk (G)
denote the number of matchings in G of size k. Throughout this lecture, we let |V| = n. Observe that m1(G) is the number of edges in G, and that m_{n/2}(G) is the number of perfect matchings in G. By convention we set m0(G) = 1, as the empty set is a matching with no edges. Computing the number of perfect matchings is a #P-hard problem [Val79]. This means that it is much harder than solving NP-hard problems, so you shouldn't expect to do it quickly on large graphs.
The matching polynomial of G, written µx[G], is
µx[G] := ∑_{k=0}^{n/2} x^{n−2k} (−1)^k mk(G).
Lemma 41.11.1. Let G be a graph and let S be a uniform random signed adjacency matrix of G.
Then,
E [χx (S )] = µx [G] .
E[χx(S)] = E[det(xI − S)]
= E[det(xI + S)]   (as S and −S have the same distribution)
= E[ ∑_{π∈Sn} sgn(π) x^{|{a : π(a)=a}|} ∏_{a:π(a)≠a} S(a, π(a)) ]
= ∑_{π∈Sn} sgn(π) x^{|{a : π(a)=a}|} E[ ∏_{a:π(a)≠a} S(a, π(a)) ].
As E[S(a, π(a))] = 0 for every a so that π(a) ≠ a, the only way we can get a nonzero contribution from a permutation π is if for all a so that π(a) ≠ a,
a. (a, π(a)) is an edge of G, and
b. π(π(a)) = a.
The latter condition guarantees that whenever S(a, π(a)) appears in the product, S(π(a), a) does as well. As these entries are constrained to be the same, their product is 1.
Thus, the only permutations that count are the involutions (the permutations in which all cycles have length 1 or 2). These correspond exactly to the matchings in the graph. Finally, the sign of an involution is (−1) raised to its number of two-cycles, which is exactly its number of edges.
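Lemma 41.11.1 is easy to confirm by brute force on a small graph. The sketch below is my own (it assumes the packages Polynomials.jl and Combinatorics.jl): it averages det(xI − S) over all 2^m signings and compares the result with the matching polynomial computed directly from its definition.

using LinearAlgebra, Polynomials, Combinatorics

A = [0 1 1 0; 1 0 1 1; 1 1 0 1; 0 1 1 0]
n = size(A, 1)
edges = [(i, j) for i in 1:n for j in i+1:n if A[i, j] == 1]
m = length(edges)

# Matching polynomial from the definition: sum_k x^(n-2k) (-1)^k m_k(G).
ismatching(E) = length(unique(vcat([collect(e) for e in E]...))) == 2length(E)
mk(k) = k == 0 ? 1 : count(ismatching, combinations(edges, k))
mu = sum(mk(k) * (-1)^k * Polynomial([zeros(n - 2k); 1.0]) for k in 0:n÷2)

# Expected characteristic polynomial over all 2^m signings of the edges.
function signedadj(signs)
    S = zeros(n, n)
    for (s, (i, j)) in zip(signs, edges)
        S[i, j] = S[j, i] = s
    end
    S
end
expchi = sum(fromroots(eigvals(Symmetric(signedadj(s))))
             for s in Iterators.product(fill((-1, 1), m)...)) / 2^m

println(maximum(abs.(coeffs(mu - expchi))))                    # ≈ 0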
We will prove that the matching polynomial of every d-regular graph divides the matching
polynomial of a larger tree of maximum degree d.
The matching polynomials of trees are very special—they are exactly the same as the
characteristic polynomial of the adjacency matrix.
Theorem 41.11.2. Let G be a tree and let M be its adjacency matrix. Then
µx [G] = χx (M ).
Proof. Expand
χx (M ) = det(xI − M )
by summing over permutations. We obtain
∑_{π∈Sn} sgn(π) x^{|{a:π(a)=a}|} ∏_{a:π(a)≠a} (−M(a, π(a))).
We will prove that the only permutations that contribute to this sum are those for which
π(π(a)) = a for every a. And, these correspond to matchings.
We begin by establishing some fundamental properties of the matching polynomial. For graphs G
and H on different vertex sets, we write G ∪ H for their disjoint union.
Lemma 41.12.1. Let G and H be graphs on different vertex sets. Then,
µx [G ∪ H] = µx [G] µx [H] .
For a a vertex of G = (V, E), we write G − a for the graph G(V − {a}). This notation will prove
very useful when reasoning about matching polynomials. Fix a vertex a of G, and divide the
matchings in G into two classes: those that involve vertex a and those that do not. The number
of matchings of size k that do not involve a is mk (G − a). On the other hand, those that do
involve a connect a to one of its neighbors. To count these, we enumerate the neighbors b of a. A
matching of size k that includes edge (a, b) can be written as the union of (a, b) and a matching of
size k − 1 in G − a − b. So, the number of matchings of size k that involve a is
∑_{b∼a} m_{k−1}(G − a − b).
Lemma 41.12.2.
µx[G] = x·µx[G − a] − ∑_{b∼a} µx[G − a − b].
Godsil proves that the matching polynomial of a graph is real rooted by proving that it divides
the matching polynomial of a tree. Moreover, the maximum degree of vertices in the tree is at
most the maximum degree of vertices in the graph. As the matching polynomial of a tree is the
same as its characteristic polynomial, and all zeros of the √
characteristic polynomial of a tree of
maximum degree at most d have absolute value at most 2 d −√1, all the zeros of the matching
polynomial of a d-regular graph have absolute value at most 2 d − 1.
The tree that Godsil uses is the path tree of G starting at a vertex of G. For a a vertex of G, the
path tree of G starting at a, written Ta(G), is a tree whose vertices correspond to paths in G that
start at a and do not contain any vertex twice. One path is connected to another if one extends
the other by one vertex. For example, here is a graph and its path tree starting at a.
The term on the upper-right hand side is a little odd. It is a forest obtained by removing the root of the tree Ta(G). We may write it as a disjoint union of trees as
Ta(G) − a = ∪_{b∼a} Tb(G − a).
Theorem 41.13.2. For every vertex a of G, the polynomial µx [G] divides the polynomial
µx [Ta (G)].
Proof. We prove this by induction on the number of vertices in G, using as our base case graphs
with at most 2 vertices. We then know, by induction, that for b ∼ a,
µx[G − a] divides µx[Tb(G − a)].
As
Ta(G) − a = ∪_{b∼a} Tb(G − a),
µx[Tb(G − a)] divides µx[Ta(G) − a].
Thus,
µx [G − a] divides µx [Ta (G) − a] ,
and so
µx [Ta (G) − a]
µx [G − a]
is a polynomial in x. To finish the proof, we apply Theorem 45.4.1, which implies
µx[Ta(G)] / µx[Ta(G) − a] = µx[G] / µx[G − a],
and so µx[Ta(G)] = µx[G] · (µx[Ta(G) − a] / µx[G − a]) is a polynomial multiple of µx[G].
Proof of Theorem 45.4.1. If G is a tree, then the left and right sides are identical, and so the
equality holds. As the only graphs on fewer than 3 vertices are trees, the theorem holds for all
graphs on at most 2 vertices. We will now prove it by induction on the number of vertices.
We may use Lemma 45.3.2 to expand the left-hand side:
µx[G] / µx[G − a] = ( x·µx[G − a] − ∑_{b∼a} µx[G − a − b] ) / µx[G − a] = x − ∑_{b∼a} µx[G − a − b] / µx[G − a].
To simplify this expression, we examine these graphs carefully. By the observation we made before the proof,
Tb(G − a) − b = ∪_{c∼b, c≠a} Tc(G − a − b).
Similarly,
Ta(G) − a = ∪_{c∼a} Tc(G − a),
which implies
µx[Ta(G) − a] = ∏_{c∼a} µx[Tc(G − a)].
Let ab be the vertex in Ta (G) corresponding to the path from a to b. We also have
Ta(G) − a − ab = ( ∪_{c∼a, c≠b} Tc(G − a) ) ∪ ( ∪_{c∼b, c≠a} Tc(G − a − b) )
= ( ∪_{c∼a, c≠b} Tc(G − a) ) ∪ ( Tb(G − a) − b ),
which implies
µx[Ta(G) − a − ab] = ( ∏_{c∼a, c≠b} µx[Tc(G − a)] ) · µx[Tb(G − a) − b].
Thus,
µx[Ta(G) − a − ab] / µx[Ta(G) − a] = ( ∏_{c∼a, c≠b} µx[Tc(G − a)] · µx[Tb(G − a) − b] ) / ∏_{c∼a} µx[Tc(G − a)]
= µx[Tb(G − a) − b] / µx[Tb(G − a)].
Chapter 42
Expected Characteristic Polynomials
42.1 Overview
Over the next few lectures, we will see two different proofs that infinite families of bipartite
Ramanujan graphs exist. Both proofs will use the theory of interlacing polynomials, and will
consider the expected characteristic polynomials of random matrices. In today’s lecture, we will
see a proof that some of these polynomials are real rooted.
At present, we do not know how to use these techniques to prove the existence of infinite families
of non-bipartite Ramanujan graphs.
The material in today’s lecture comes from [MSS15d], but the proof is inspired by the treatment
of that work in [HPS15].
We will build Ramanujan graphs on n vertices of degree d, for every d and even n. We begin by
considering a random graph on n vertices of degree d. When n is even, the most natural way to
generate such a graph is to choose d perfect matchings uniformly at random, and to then take
their sum. I should mention one caveat: some edge could appear in many of the matchings. In
this case, we add the weights of the corresponding edges together. So, the weight of an edge is the
number of matchings in which it appears.
Let M be the adjacency matrix of some perfect matching on n vertices. We can generate the
adjacency matrix of a random perfect matching by choosing a permutation matrix Π uniformly at
random, and then forming ΠMΠ^T. The sum of d independent uniform random perfect matchings
is then
∑_{i=1}^{d} Πi M Πi^T.
In today’s lecture, we will consider the expected characteristic polynomial of such a graph. For a
matrix M , we let
def
χx (M ) = det(xI − M )
denote the characteristic polynomial of M in the variable x.
For simplicity, we will consider the expected polynomial of the sum of just two graphs. For
generality, we will let them be any graphs, or any symmetric matrices.
Our goal for today is to prove that these expected polynomials are real rooted.
Theorem 42.2.1. Let A and B be symmetric n-by-n matrices and let Π be a uniform random
permutation matrix. Then,
E_Π [χx(A + ΠBΠ^T)]
has only real roots.
So that you will be properly surprised by this, I remind you that the sum of real rooted polynomials need not have real roots. For example, both (x − 2)² and (x + 2)² have only real roots, but their sum, 2x² + 8, has no real roots.
Theorem 42.2.1 also holds for sums of many matrices. But, for simplicity, we restrict ourselves to
considering the sum of two.
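Before developing the machinery, here is a quick numerical sanity check of Theorem 42.2.1 of my own (assuming the packages Combinatorics.jl and Polynomials.jl): for small random symmetric A and B, the average of χx(A + ΠBΠ^T) over all permutations has only (numerically) real roots.

using LinearAlgebra, Combinatorics, Polynomials

n = 5
A = Symmetric(randn(n, n)); B = Symmetric(randn(n, n))

avg = sum(permutations(1:n)) do p
    P = Matrix{Float64}(I, n, n)[p, :]             # permutation matrix for p
    fromroots(eigvals(Symmetric(Matrix(A) + P * B * P')))
end / factorial(n)

println(maximum(abs.(imag.(roots(avg)))))          # ≈ 0: the average is real rooted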
42.3 Interlacing
Let p(x) be a polynomial of degree n with roots λ1 ≥ λ2 ≥ · · · ≥ λn, and let q(x) be a polynomial of degree n − 1 with roots µ1 ≥ · · · ≥ µ_{n−1}. We say that q interlaces p if
λ1 ≥ µ1 ≥ λ2 ≥ µ2 ≥ · · · ≥ λ_{n−1} ≥ µ_{n−1} ≥ λn.
We have seen two important examples of interlacing in this class so far. A real rooted polynomial
and its derivative interlace. Similarly, the characteristic polynomial of a symmetric matrix and
the characteristic polynomial of a principal submatrix interlace.
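The second example is easy to see numerically; here is a short Julia check of my own.

using LinearAlgebra

n = 6
A = Symmetric(randn(n, n))
λ = sort(eigvals(A), rev = true)                          # λ1 ≥ ... ≥ λn
μ = sort(eigvals(Symmetric(Matrix(A)[1:n-1, 1:n-1])), rev = true)
println(all(λ[i] ≥ μ[i] ≥ λ[i+1] for i in 1:n-1))         # true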
When p and q have the same degree, we also say that they interlace if their roots alternate. But,
now there are two ways in which their roots can do so, depending on which polynomial has the
largest root. If
p(x) = ∏_{i=1}^{n} (x − λi)  and  q(x) = ∏_{i=1}^{n} (x − µi),
we write q → p if p and q interlace and for every i the ith root of p is at least as large as the ith root of q. That is, if
λ1 ≥ µ1 ≥ λ2 ≥ µ2 ≥ · · · ≥ λn ≥ µn.
Lemma 42.3.1. Let p and q be polynomials of degree n and n − 1 that interlace and have positive
leading coefficients. For every t > 0, define pt (x) = p(x) − tq(x). Then, pt (x) is real rooted and
p(x) → pt (x).
Proof Sketch. For simplicity, I consider the case in which all of the roots of p and q are distinct.
One can prove the general case by dividing out the common repeated roots.
To see that the largest root of pt is larger than λ1 , note that q(x) is positive for all x > µ1 , and
λ1 > µ1 . So, pt (λ1 ) = p(λ1 ) − tq(λ1 ) < 0. As pt is monic, it is eventually positive and it must have
a root larger than λ1 .
We will now show that for every i ≥ 1, pt has a root between λ_{i+1} and λi. As this gives us n − 1 more roots, it accounts for all n roots of pt. For i odd, we know that q(λi) > 0 and q(λ_{i+1}) < 0. As p is zero at both of these points, pt(λi) < 0 and pt(λ_{i+1}) > 0, which means that pt has a root between λi and λ_{i+1}. The case of even i is similar.
Lemma 42.3.2. Let p and q be polynomials of degree n and n − 1, and let pt(x) = p(x) − tq(x). If pt is real rooted for all t ∈ IR, then q interlaces p.
Proof Sketch. Recall that the roots of a polynomial are continuous functions of its coefficients, and thus the roots of pt are continuous functions of t. We will use this fact to obtain a contradiction. For simplicity, I again just consider the case in which all of the roots of p and q are distinct.
If p and q do not interlace, then p must have two roots that do not have a root of q between
them. Let these roots of p be λi+1 and λi . Assume, without loss of generality, that both p and q
are positive between these roots. We now consider the behavior of pt for positive t.
As we have assumed that the roots of p and q are distinct, q is positive at these roots, and so pt is
negative at λi+1 and λi . If t is very small, then pt will be close to p in value, and so there must be
some small t0 for which pt0 (x) > 0 for some λi+1 < x < λi . This means that pt0 must have two
roots between λi+1 and λi .
As q is positive on the entire closed interval [λi+1 , λi ], when t is large pt will be negative on this
entire interval, and thus have no roots inside. As we vary t between t0 and infinity, the two roots
at t0 must vary continuously and cannot cross λi+1 or λi . This means that they must become
complex, contradicting our assumption that pt is always real rooted.
Together, Lemmas 42.3.1 and 42.3.2 are known as Obreschkoff's Theorem [Obr63].
The following example will be critical.
Lemma 42.3.3. Let A be an n-dimensional symmetric matrix and let v be a vector. Let
pt(x) = χx(A + tvv^T).
Then there is a polynomial q(x) of degree n − 1, not depending on t, for which pt(x) = χx(A) − tq(x).
Proof. Consider the case in which v = δ1. It suffices to consider this case as determinants, and thus characteristic polynomials, are unchanged by multiplication by rotation matrices. Then, we know that
χx(A + tδ1δ1^T) = det(xI − A − tδ1δ1^T).
Now, the matrix tδ1δ1^T is zeros everywhere except for the element t in the upper left entry. So, as the determinant is linear in each entry of the matrix,
χx(A + tδ1δ1^T) = χx(A) − t·χx(A^{(1)}),
where A^{(1)} is the submatrix of A obtained by removing its first row and column.
We know that χx(A + tvv^T) is real rooted for all t, and we can easily show using the Courant-Fischer Theorem that for t > 0 it interlaces χx(A) from above. Lemmas 42.3.1 and 42.3.2 tell us that these facts imply each other.
We need one other fact about interlacing polynomials.
Lemma 42.3.4. Let p0(x) and p1(x) be two degree n monic polynomials for which there is a third polynomial r(x) that has the same degree as p0 and p1 and so that
p0(x) → r(x) and p1(x) → r(x).
Then, for every 0 ≤ s ≤ 1, the polynomial ps(x) = (1 − s)p0(x) + s·p1(x) is real rooted.
Sketch. Assume for simplicity that all the roots of r are distinct. Let µ1 > µ2 > · · · > µn be the
roots of r. Our assumptions imply that both p0 and p1 are positive at µi for odd i and negative
for even i. So, the same is true of their sum ps . This tells us that ps must have at least n − 1 real
roots.
We can also show that ps has a root that is less than µn . One way to do it is to recall that the
complex roots of a polynomial with real coefficients come in conjugate pairs. So, ps can not have
only one complex root.
We now return to our goal. We wish to show that
∑_{Π∈Sn} χx(A + ΠBΠ^T)
is a real rooted polynomial for all symmetric matrices A and B, where Sn is the set of n-by-n permutation matrices. We will do this by proving it for smaller sets of permutation matrices. To begin, we know it for S = {I}. We will build up larger sets by swapping coordinates.
This will actually result in a distribution on permutations, so we consider σ : Sn → IR≥0 and consider sums of the form
∑_Π σ(Π) χx(A + ΠBΠ^T).
For coordinates i and j, let Γi,j be the permutation matrix that just swaps i and j. We call such a
permutation a swap. We need the following important fact about the action of swaps on matrices.
Lemma 42.4.1. Let A be a symmetric matrix. Then, for all i and j, there are vectors u and v
so that
Γi,j AΓi,j = A − uu T + v v T .
Proof. Without loss of generality, let i = 1 and j = 2. We prove that A − Γi,j AΓi,j has rank 2
and trace 0.
We can write this difference in the form
[ a11 − a22, a12 − a21, a13 − a23, a14 − a24, … ;
  a21 − a12, a22 − a11, a23 − a13, a24 − a14, … ;
  a31 − a32, a32 − a31, 0, … ;
  a41 − a42, a42 − a41, 0, … ;
  ⋮ ]
= [ α, β, y^T ; −β, −α, −y^T ; y, −y, 0_{n−2} ]
for some numbers α, β and some column vector y of length n − 2. If α ≠ β then the sum of the first two rows is equal to (c, −c, 0, . . . , 0) for some c ≠ 0, and every other row is a scalar multiple of this. On the other hand, if α = β then the first two rows are linearly dependent, and all of the other rows are multiples of (1, −1, 0, . . . , 0).
Lemma 42.4.2. Let σ be such that for all symmetric matrices A and B,
px(A, B) := ∑_{Π∈S} σ(Π) χx(A + ΠBΠ^T)
is real rooted. Then, for every 0 < s < 1 and pair of vectors u and v , for every symmetric A and
B the polynomial
(1 − s)px (A, B) + spx (A − uu T + v v T , B)
is real rooted.
Proof. Define
rt(x) = px(A + tvv^T, B).
By assumption, rt(x) is real rooted for every t ∈ IR. By Lemma 42.3.3, we can write
rt(x) = r0(x) − tq(x),
where q(x) has degree n − 1 and both r0 and q have positive leading coefficients. So, by Lemma 42.3.2, q(x) interlaces r0(x) = px(A, B). Lemma 42.3.1 then tells us that
px(A, B) → px(A + vv^T, B).
An identical argument, with uu^T in place of vv^T, shows that
px(A − uu^T + vv^T, B) → px(A + vv^T, B).
This tells us that px(A, B) and px(A − uu^T + vv^T, B) both interlace r1(x) from below. We finish by applying Lemma 42.3.4 to conclude that every convex combination of these polynomials is real rooted.
Corollary 42.4.3. Let σ be such that for all symmetric matrices A and B,
px(A, B) := ∑_{Π∈S} σ(Π) χx(A + ΠBΠ^T)
is real rooted. Then, for every 0 < s < 1 and for every symmetric A and B the polynomial
∑_{Π∈S} [ s·σ(Π) χx(A + ΠBΠ^T) + (1 − s)·σ(Π) χx(A + Γi,j ΠBΠ^T Γi,j^T) ]
is real rooted.
We will build a random permutation out of random swaps. A random swap is specified by coordinates i and j and a swap probability s. It is a random matrix that is equal to the identity with probability 1 − s and to Γi,j with probability s. Let S be a random swap.
In the language of random swaps, we can express Corollary 42.4.3 as follows.
Corollary 42.5.1. Let Π be a random permutation matrix drawn from a distribution so that for all symmetric matrices A and B,
E[χx(A + ΠBΠ^T)]
is real rooted. Let S be a random swap. Then,
E[χx(A + SΠBΠ^T S^T)]
is also real rooted.
All that remains is to show that a uniform random permutation can be assembled out of random
swaps. The trick to doing this is to choose the random swaps with swap probabilities other than
1/2. If you didn’t do this, it would be impossible as there are n! permutations, which is not a
power of 2.
Lemma 42.5.2. For every n, there exists a finite sequence of random swaps S1, . . . , Sk so that
S1 S2 · · · Sk
is a uniformly distributed random permutation matrix.
Chapter 43
Quadrature for the Finite Free Convolution
43.1 Overview
The material in today’s lecture comes from [MSS15d] and [MSS15a]. My goal today is to prove
simple analogs of the main quadrature results, and then give some indication of how the other
quadrature statements are proved. I will also try to explain what led us to believe that these
results should be true.
Recall that last lecture we considered the expected characteristic polynomial of a random matrix
of the form A + ΠBΠT , where A and B are symmetric. We do not know a nice expression for
this expected polynomial for general A and B. However, we will see that there is a very nice
expression when A and B are Laplacian matrices or the adjacency matrices of regular graphs.
In Free Probability [Voi97], one studies operations on matrices in a large dimensional limit. These
matrices are determined by the moments of their spectrum, and thus the operations are
independent of the eigenvectors of the matrices. We consider a finite dimensional analog.
For n-dimensional symmetric matrices A and B, we consider the expected characteristic polynomial
E_{Q∈O(n)} [χx(A + QBQ^T)],
where O(n) is the group of n-by-n orthonormal matrices, and Q is a random orthonormal matrix chosen according to the Haar measure. In case you are not familiar with “Haar measure”, I'll quickly explain the idea. It captures our most natural idea of a random orthonormal matrix. For
example, if A is a Gaussian random symmetric matrix, and V is its matrix of eigenvectors, then V is a random orthonormal matrix chosen according to Haar measure. Formally, it is the measure that is invariant under group operations, which in this case are multiplication by orthonormal matrices. That is, the Haar measure is the measure under which for every S ⊆ O(n) and P ∈ O(n), S has the same measure as {QP : Q ∈ S}.
This expected characteristic polynomial does not depend on the eigenvectors of A and B, and thus can be written as a function of the characteristic polynomials of these matrices. To see this, write A = VDV^T and B = UCU^T, where U and V are the orthonormal eigenvector matrices and C and D are the diagonal matrices of eigenvalues. We have
χx(VDV^T + QUCU^T Q^T) = χx(D + V^T QUCU^T Q^T V) = χx(D + (V^T QU)C(V^T QU)^T).
In today’s lecture, we will establish the following formula for the finite free convolution.
Then,
n
X X (n − i)!(n − j)!
p(x) n q(x) = xn−k (−1)k ai bj . (43.1)
n!(n − i − j)!
k=0 i+j=k
This convolution was studied by Walsh [Wal22], who proved that when p and q are real rooted, so is p ⊞_n q.
Our interest in the finite free convolution comes from the following theorem, whose proof we will
also sketch today.
Theorem 43.2.2. Let A and B be n-dimensional symmetric matrices with constant row sums. If A1 = a1 and B1 = b1, we may write their characteristic polynomials as
χx(A) = (x − a)p(x) and χx(B) = (x − b)q(x).
We then have
E_{Π∈Sn} [χx(A + ΠBΠ^T)] = (x − (a + b)) (p(x) ⊞_{n−1} q(x)).
We describe this theorem as a quadrature result, because it obtains an integral over a continuous
space as a sum over a finite number of points.
Before going in to the proof of the theorem, I would like to explain why one might think
something like this could be true. The first answer is that it was a lucky guess. We hoped that
this expectation would have a nice formula. The nicest possible formula would be a bi-linear map:
a function that is linear in p when q is held fixed, and vice versa. So, we computed some examples
by holding B and q fixed and varying A. We then observed that the coefficients of the resulting
expected polynomial are in fact a linear functions of the coefficients of p. Once we knew this, it
didn’t take too much work to guess the formula.
I now describe the main quadrature result we will prove today. Let B(n) be the nth
hyperoctahedral group. This is the group of symmetries of the generalized octahedron in n
dimensions. It may be described as the set of matrices that can be written in the form DΠ, where
D is a diagonal matrix of ±1 entries and Π is a permutation. It looks like the family of
permutation matrices, except that both 1 and −1 are allowed as nonzero entries. B(n) is a
subgroup of O(n).
The main quadrature result is the following.
Theorem 43.2.3. For all symmetric matrices A and B,
E_{Q∈O(n)} [χx(A + QBQ^T)] = E_{P∈B(n)} [χx(A + PBP^T)].
We will use this result to prove Theorem 43.2.1. The proof of Theorem 43.2.2 is similar to the
proof of Theorem 43.2.3. So, we will prove Theorem 43.2.3 and then explain the major differences.
43.3 Quadrature
In general, quadrature formulas allow one to evaluate integrals of a family of functions over a
fixed continuous domain by summing the values of those functions at a fixed number of points.
There is an intimate connection between families of orthogonal polynomials and quadrature
formulae that we unfortunately do not have time to discuss.
The best known quadrature formula allows us to evaluate the integral of a polynomial around the unit circle in the complex plane. For a polynomial p(x) of degree less than n,
(1/2π) ∫_{θ=0}^{2π} p(e^{iθ}) dθ = (1/n) ∑_{k=0}^{n−1} p(ω^k),
where ω = e^{2πi/n}.
To verify this, it suffices to consider monomials p(x) = x^k. And, for 0 < |k| < n, the corresponding sum is a sum of nth roots of unity distributed symmetrically about the unit circle. So,
∑_{j=0}^{n−1} ω^{jk} = 0.
We used this fact in the start of the semester when we computed the eigenvectors of the ring
graph and observed that all but the dominant are orthogonal to the all-1s vector.
On the other hand, for p(x) = 1 both the integral and sum are 1.
We will use an alternative approach to quadrature on groups, encapsulated by the following lemma.
Lemma 43.3.1. For every n, every function p(x) = ∑_{|k|<n} c_k x^k, and every θ ∈ [0, 2π],
∑_{j=0}^{n−1} p(e^{i(2πj/n+θ)}) = ∑_{j=0}^{n−1} p(e^{i(2πj/n)}).
This identity implies the quadrature formula above, and has the advantage that it can be
experimentally confirmed by evaluating both sums for a random θ.
Proof. We again evaluate the sums monomial-by-monomial. For p(x) = x^k, with |k| < n, we have
∑_{j=0}^{n−1} (e^{i(2πj/n+θ)})^k = e^{iθk} ∑_{j=0}^{n−1} (e^{i(2πj/n)})^k.
For k = 0, both sides equal n. For 0 < |k| < n, both sums are 0, so the identity holds in every case.
Proof of Theorem 43.2.3. First, observe that it suffices to consider determinants; define f(Q) = f_{A,B}(Q) := det(A + QBQ^T). As the Haar measure on O(n) is invariant under multiplication by any fixed P ∈ B(n), we have, for every P ∈ B(n),
∫_{Q∈O(n)} det(A + QBQ^T) = ∫_{Q∈O(n)} f(Q) = ∫_{Q∈O(n)} f(QP).
So, "Z # Z
EP∈B(n) f (QP) = f (Q).
Q∈O(n) Q∈O(n)
On the other hand, as B(n) is discrete we can reverse the order of integration to obtain
Z Z Z
f (Q) = EP∈B(n) [f (QP)] = EP∈B(n) [f (P)] = EP∈B(n) [f (P)] ,
Q∈O(n) Q∈O(n) Q∈O(n)
To prove Theorem 43.4.1, we need to know a little more about the orthogonal group. We divide
the orthonormal matrices into two types, those of determinant 1 and those of determinant −1.
The orthonormal matrices of determinant 1 form the special orthogonal group, SO(n), and every
matrix in O(n) may be written in the form DQ where Q ∈ SO(n) and D is a diagonal matrix in
which the first entry is ±1 and all others are 1. Every matrix in SO(n) may be expressed as a
product of 2-by-2 rotation matrices. That is, for every Q ∈ SO(n) there are matrices Qi,j for 1 ≤ i < j ≤ n so that Qi,j is a rotation in the span of δi and δj and so that
Q = ∏_{1≤i<j≤n} Qi,j.
If you learned the QR-factorization of a matrix, then you learned an algorithm for computing this
decomposition.
These facts about the structure of O(n) tell us that it suffices to prove Theorem 43.4.1 for the
special cases in which Q = diag(−1, 1, 1, . . . , 1) and when Q is rotation of the plane spanned by δ i
and δ j . As the diagonal matrix is contained in B(n), the result is immediate in that case.
For simplicity, consider the case i = 1 and j = 2, and let Rθ denote the rotation by angle θ in the first two coordinates:
Rθ := [ cos θ, sin θ, 0 ; −sin θ, cos θ, 0 ; 0, 0, I_{n−2} ].
The hyperoctahedral group B(n) contains the matrices Rθ for θ ∈ {0, π/2, π, 3π/2}. As B(n) is a group, for these θ we know
E_{P∈B(n)} [f_{A,B}(Rθ P)] = E_{P∈B(n)} [f_{A,B}(P)],
as the sets of matrices in the two expectations are identical. This identity implies
(1/4) ∑_{j=0}^{3} E_{P∈B(n)} [f_{A,B}(R_{2πj/4} P)] = E_{P∈B(n)} [f(P)].
We will prove the following lemma, and then show it implies Theorem 43.4.1.
Lemma 43.5.2. For every symmetric A and B, there exist c_{−2}, c_{−1}, c_0, c_1, c_2 so that
f_{A,B}(Rθ) = ∑_{k=−2}^{2} c_k (e^{iθ})^k.
Proof. We need to express f(Rθ) as a function of e^{iθ}. To this end, recall that
cos θ = (1/2)(e^{iθ} + e^{−iθ}) and sin θ = (−i/2)(e^{iθ} − e^{−iθ}).
From these identities, we see that all two-by-two rotation matrices can be simultaneously diagonalized by writing
[ cos θ, sin θ ; −sin θ, cos θ ] = U [ e^{iθ}, 0 ; 0, e^{−iθ} ] U^*,
where
U = (1/√2) [ 1, 1 ; i, −i ],
and we recall that U^* is the conjugate transpose:
U^* = (1/√2) [ 1, −i ; 1, i ].
Let Dθ be the diagonal matrix having its first two entries e^{iθ} and e^{−iθ}, and the rest 1, and let U_n be the matrix with U in its upper 2-by-2 block and 1s on the diagonal beneath. So,
Rθ = U_n Dθ U_n^*.
Now, examine
The term eiθ only appears in the first row and column of this matrix, and the term e−iθ only
appears in the second row and column. As a determinant can be expressed as a sum of products
of matrix entries with one in each row and column, it is immediate that this determinant can be
expressed in terms of ekiθ for |k| ≤ 4. As each such product can have at most 2 terms of the form
eiθ and at most two of the form e−iθ , we have |k| ≤ 2.
The difference between Theorem 43.2.3 and Theorem 43.2.2 is that the first involves a sum over
the isometries of hyperoctahedron, while the second involves a sum over the symmetries of the
regular n-simplex in n − 1 dimensions. The proof of the appropriate quadrature theorem for the
symmetries of the regular simplex is very similar to the proof we just saw, except that rotations of
the plane through δ i and δ j are replaced by rotations of the plane parallel to the affine subspace
spanned by triples of vertices of the simplex.
To establish the formula in Theorem 43.2.1, we observe that it suffices to compute the formula for
diagonal matrices, and that Theorem 43.2.3 makes this simple. Every matrix in B(n) can be
written as a product ΠD where D is a ±1 diagonal matrix. If B is the diagonal matrix with
entries µ1 , . . . , µn , then ΠDBDΠT = ΠBΠT , which is the diagonal matrix with entries
µπ(1) , . . . , µπ(n) , where π is the permutation corresponding to Π.
Let A be diagonal with entries λ1, . . . , λn. For a subset S of {1, . . . , n}, define
λ_S = ∏_{i∈S} λ_i.
We then have
a_i = ∑_{|S|=i} λ_S.
Let
p ⊞_n q = ∑_{k=0}^{n} x^{n−k} (−1)^k c_k.
As opposed to expanding this out, let's just figure out how often the product λ_S µ_T appears. We must have |T| = n − |S|, and then this term appears for each permutation such that π(T) ∩ S = ∅. This happens for a 1/(n choose i) fraction of the permutations, giving the formula
c_n = ∑_{i=0}^{n} (1/(n choose i)) ∑_{|S|=i} λ_S ∑_{|T|=n−i} µ_T = ∑_{i=0}^{n} (1/(n choose i)) a_i b_{n−i} = ∑_{i=0}^{n} (i!(n − i)!/n!) a_i b_{n−i}.
For general c_k and i + j = k, we see that λ_S and µ_T appear together whenever π(T) is disjoint from S. The probability of this happening is
(n−i choose j) / (n choose j) = (n − i)!(n − j)! j! / (n!(n − i − j)! j!) = (n − i)!(n − j)! / (n!(n − i − j)!),
and so
c_k = ∑_{i+j=k} [(n − i)!(n − j)! / (n!(n − i − j)!)] a_i b_j.
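The derivation above is easy to test in Julia. The sketch below is my own (it assumes the packages Combinatorics.jl and Polynomials.jl): for random diagonal A and B it averages χx(A + ΠBΠ^T) over all permutations and compares the coefficients with formula (43.1).

using LinearAlgebra, Combinatorics, Polynomials

n = 4
lam = randn(n); mu = randn(n)                  # diagonal entries of A and B

# Average of chi_x(A + Pi B Pi^T) over all permutations of the diagonal of B.
avg = sum(fromroots(lam .+ mu[p]) for p in permutations(1:n)) / factorial(n)

# Coefficients a_i, b_j are elementary symmetric functions of the lam's and mu's.
esym(v, k) = k == 0 ? 1.0 : sum(prod(v[S]) for S in combinations(1:length(v), k))
a = [esym(lam, i) for i in 0:n]
b = [esym(mu, j) for j in 0:n]
c = [sum(factorial(n - i) * factorial(n - j) / (factorial(n) * factorial(n - i - j)) *
         a[i+1] * b[j+1] for i in 0:k, j in 0:k if i + j == k) for k in 0:n]
conv = Polynomial([(-1)^(n - m) * c[n-m+1] for m in 0:n])  # p ⊞_n q from (43.1)

println(maximum(abs.(coeffs(avg - conv))))                 # ≈ 0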
43.7 Question
For which discrete subgroups of O(n) does a result like Theorem 43.2.3 hold? Can it hold for a substantially smaller subgroup than the symmetries of the simplex (which has size (n + 1)! in n dimensions)?
Chapter 44
Ramanujan Graphs of Every Size
44.1 Overview
We will mostly prove that there are Ramanujan graphs of every number of vertices and degree.
The material in today’s lecture comes from [MSS15d] and [MSS15a]. In those papers, we prove
that for every even n and degree d < n there is a bipartite Ramanujan graph of degree d on n
vertices. A bipartite Ramanujan graph of degree d is an approximation of a complete bipartite graph. Its adjacency matrix thus has eigenvalues d and −d, and all other eigenvalues bounded in absolute value by 2√(d − 1).
The difference between this result and that which we prove today is that we will show that for every d < n there is a d-regular (multi)graph whose second adjacency matrix eigenvalue is at most 2√(d − 1). This bound is sufficient for many applications of expanders, but not all. We will
not control the magnitude of the negative eigenvalues. The reason will simply be for simplicity:
the proofs to bound the negative eigenvalues would take more lectures.
Next week we will see a different technique that won’t produce a multigraph and that will
produce a bipartite Ramanujan graph.
We will consider the sum of d random perfect matchings on n vertices. This produces a d-regular
graph that might be a multigraph. Friedman [Fri08] proves that such a graph is probably very close to being Ramanujan if n is big enough relative to d. In particular, he proves that for all d and ε > 0 there is an n0 so that for all n > n0, such a graph will probably have all eigenvalues other than µ1 bounded in absolute value by 2√(d − 1) + ε. We remove the asymptotics and the ε, but merely prove the existence of one such graph. We do not estimate the probability with which a random graph satisfies the bound.
In Chapter 42, we learned that this polynomial is real rooted. In Chapter 43, we learned a technique that allows us to compute this polynomial. Today we will prove that the second largest root of this polynomial is at most 2√(d − 1). First, we show why this matters: it implies that there is some choice of the matrices Π1, . . . , Πd so that the resulting polynomial has second largest root at most 2√(d − 1). These matrices provide the desired graph.
The general problem we face is the following. We have a large family of polynomials, say
p1 (x), . . . , pm (x), for which we know each pi is real-rooted and such that their sum is real rooted.
We would like to show that there is some polynomial pi whose largest root is at most the largest
root of the sum, or rather we want to do this for the second-largest root. This is not true in
general. But, it is true in our case. We will show that it is true whenever the polynomials form
what we call an interlacing family.
Recall from Chapter 42 that we say that for monic degree n polynomials p(x) and r(x), p(x) → r(x) if the roots of p and r interlace, with the roots of r being larger. We proved that if p1(x) → r(x) and p2(x) → r(x), then every convex combination of p1 and p2 is real rooted. If we go through the proof, we will also see that for all 0 ≤ s ≤ 1,
s·p1(x) + (1 − s)·p2(x) → r(x).
Proceeding by induction, we can show that if pi(x) → r(x) for each i, then every convex combination of these polynomials interlaces r(x), and is thus real rooted. That is, for every s1, . . . , sm so that si ≥ 0 (but not all are zero),
∑_i si pi(x) → r(x).
Polynomials that satisfy this condition are said to have a common interlacing. By a technique analogous to the one we used to prove Lemma 42.3.2, one can prove that the polynomials p1, . . . , pm have a common interlacing if and only if every convex combination of these polynomials is real rooted.
Lemma 44.3.1. Let p1 , . . . , pm be polynomials so that pi (x) → r(x), and let s1 , . . . , sm ≥ 0 be not
identically zero. Define
p∅(x) = ∑_{i=1}^{m} si pi(x).
Then, there is an i so that the largest root of pi (x) is at most the largest root of p∅ (x). In general,
for every j there is an i so that the jth largest root of pi (x) is at most the jth largest root of p∅ (x).
Proof. We prove this for the largest root. The proof for the others is similar. Let λ1 and λ2 be
the largest and second-largest roots of r(x). Each polynomial pi (x) has exactly one root between
λ1 and λ2 , and is positive at all x > λ1 . Now, let µ be the largest root of p∅ (x). We can see that
µ must lie between λ1 and λ2 . We also know that
∑_i si pi(µ) = 0.
If pi (µ) = 0 for some i, then we are done. If not, there is an i for which pi (µ) > 0. As pi only has
one root larger than λ2 , and it is eventually positive, the largest root of pi must be less than µ.
Our polynomials do not all have a common interlacing. However, they satisfy a property that is just as useful: they form an interlacing family. We say that a set of polynomials p1, . . . , pm forms an interlacing family if there is a rooted tree T in which
• every leaf is labeled by one of the polynomials pi,
• every internal vertex is labeled by a nonnegative linear combination of the labels of its children, and
• the labels of the children of every internal vertex have a common interlacing.
The last condition guarantees that every internal vertex is labeled by a real rooted polynomial.
Note that the same label is allowed to appear at many leaves.
Lemma 44.3.2. Let p1 , . . . , pm be an interlacing family, let T be the tree witnessing this, and let
p∅ be the polynomial labeling the root of the tree. Then, for every j there exists an i for which the
jth largest root of pi is at most the jth largest root of p∅ .
Proof. By Lemma 44.3.1, there is a child of the root whose label has a jth largest root that is
smaller than the jth largest root of p∅ . If that child is not a leaf, then we can proceed down the
tree until we reach a leaf, at each step finding a node labeled by a polynomial whose jth largest
root is at most the jth largest root of the previous polynomial.
Our construction of permutations by sequences of random swaps provides the required interlacing
family.
Theorem 44.3.3. For permutation matrices Π1, . . . , Πd, let
p_{Π1,...,Πd}(x) = χx(Π1 M Π1^T + · · · + Πd M Πd^T).
These polynomials form an interlacing family.
Recall from the last lecture that for n-dimensional symmetric matrices A and B with uniform row sums a and b and characteristic polynomials (x − a)p(x) and (x − b)q(x),
E_Π [χx(A + ΠBΠ^T)] = (x − (a + b)) (p(x) ⊞_{n−1} q(x)).
This formula extends to sums of many such matrices. It is easy to show that
χx(M) = (x − 1)^{n/2} (x + 1)^{n/2} = (x − 1)p(x), where p(x) := (x − 1)^{n/2−1} (x + 1)^{n/2}.
So,
p∅(x) := E [p_{Π1,...,Πd}(x)] = (x − d) (p(x) ⊞_{n−1} p(x) ⊞_{n−1} · · · ⊞_{n−1} p(x)),
where p(x) appears d times above.
We would like to prove a bound on the largest root of this polynomial in terms of the largest roots of p(x). This effort turns out not to be productive. To see why, consider matrices A = aI and B = bI. It is clear that A + ΠBΠ^T = (a + b)I for every Π. This tells us that
(x − a)^n ⊞_n (x − b)^n = (x − (a + b))^n.
So, the largest roots can add. This means that if we are going to obtain useful bounds on the roots of the sum, we are going to need to exploit facts about the distribution of the roots of p(x).
As in Lecture ??, we will use the barrier functions, just scaled a little differently.
For
p(x) = ∏_{i=1}^{d} (x − λi),
define the Cauchy transform of p at x to be
G_p(x) = (1/d) ∑_{i=1}^{d} 1/(x − λi) = (1/d) · p′(x)/p(x).
For those who are used to Cauchy transforms, I remark that this is the Cauchy transform of the
uniform distribution on the roots of p(x). As we will be interested in upper bounds on the
Cauchy transform, we will want a number u so that for all x > u, Gp (x) is less than some
specified value. That is, we want the inverse Cauchy transform, which we define to be
K_p(w) := max { x : G_p(x) = w }.
For a real rooted polynomial p, and thus for real λ1, . . . , λd, it is the value of x that is larger than all the λi for which G_p(x) = w. For w = ∞, it is the largest root of p. But, it is larger for finite w.
We will prove the following bound on the Cauchy transforms.
Theorem 44.4.1. For degree n polynomials p and q and for w > 0,
K_{p ⊞_n q}(w) ≤ K_p(w) + K_q(w) − 1/w.
For w = ∞, this says that the largest root of p ⊞_n q is at most the sum of the largest roots of p and q. But, this is obvious.
To explain the 1/w term in the above expression, consider $q(x) = x^n$. As this is the characteristic polynomial of the all-zero matrix, $p \boxplus_n q = p(x)$. We have
$$G_q(x) = \frac{1}{n}\,\frac{n x^{n-1}}{x^n} = \frac{1}{x}.$$
So,
$$\mathcal{K}_q(w) = \max\{x : 1/x = w\} = 1/w.$$
Thus,
$$\mathcal{K}_q(w) - 1/w = 0.$$
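These transforms are easy to play with numerically. In the sketch below the names Gp and Kp are mine, not the text's: Gp evaluates the Cauchy transform directly from a list of roots, and Kp inverts it by bisection above the largest root. For the roots of q(x) = x^n it returns approximately 1/w, matching the computation above.

```julia
# Cauchy transform of the uniform distribution on a given list of roots
Gp(x, roots) = sum(1 ./ (x .- roots)) / length(roots)

# inverse Cauchy transform: the point above max(roots) where Gp equals w, found by
# bisection (Gp decreases from +∞ to 0 to the right of the largest root)
function Kp(w, roots)
    m = maximum(roots)
    step = 1.0
    while Gp(m + step, roots) > w     # find a point where Gp has dropped below w
        step *= 2
    end
    lo, hi = m + 1e-12, m + step
    for _ in 1:200
        mid = (lo + hi) / 2
        Gp(mid, roots) > w ? (lo = mid) : (hi = mid)
    end
    (lo + hi) / 2
end

Kp(0.5, zeros(5))   # the roots of x^5: returns ≈ 1/w = 2.0
```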
I will defer the proof of this theorem to next lecture (or maybe the paper [MSS15a]), and now just
show how we use it.
As this is an upper bound on the largest root of $p \boxplus_{n-1} \cdots \boxplus_{n-1} p$, we wish to set w to minimize this expression. As
$$G_{\chi(M)}(x) = \frac{x}{x^2 - 1},$$
we have
$$\mathcal{K}_{\chi(M)}(w) = x \quad\text{if and only if}\quad w = \frac{x}{x^2 - 1}.$$
So,
$$d\,\mathcal{K}_{\chi(M)}(w) - \frac{d-1}{w} \le dx - (d-1)\,\frac{x^2 - 1}{x}.$$
The choice of x that minimizes this is $\sqrt{d-1}$, at which point it becomes
$$d\sqrt{d-1} - \frac{(d-1)(d-2)}{\sqrt{d-1}} = d\sqrt{d-1} - (d-2)\sqrt{d-1} = 2\sqrt{d-1}.$$
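A quick numerical check of this minimization, with d = 4 chosen arbitrarily:

```julia
d = 4
f(x) = d * x - (d - 1) * (x^2 - 1) / x
f(sqrt(d - 1)), 2 * sqrt(d - 1)               # both ≈ 3.4641
minimum(f, range(1.0, 4.0, length = 10_000))  # also ≈ 3.4641
```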
I will not have time to go through the proof of Theorem 44.4.1. So, I’ll just tell you a little about
it. We begin by transforming statements about the inverse Cauchy transform into statements
about the roots of polynomials.
As $G_p(x) = \frac{1}{d}\,\frac{p'(x)}{p(x)}$,
$$G_p(x) = w \iff p(x) - \frac{1}{wd}\,p'(x) = 0.$$
This tells us that $\mathcal{K}_p(w)$ is the largest root of the polynomial $U_{1/wd}\,p(x)$, where
$$U_\alpha \stackrel{\mathrm{def}}{=} 1 - \alpha\,\partial_x.$$
We, of course, also need to exploit an expression for the finite free convolution. Last lecture, we proved that if
$$p(x) = \sum_{i=0}^{n} x^{n-i} (-1)^i a_i \quad\text{and}\quad q(x) = \sum_{i=0}^{n} x^{n-i} (-1)^i b_i,$$
then
$$p(x) \boxplus_n q(x) = \sum_{k=0}^{n} x^{n-k} (-1)^k \sum_{i+j=k} \frac{(n-i)!\,(n-j)!}{n!\,(n-i-j)!}\, a_i b_j. \tag{44.2}$$
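Formula (44.2) translates directly into code. The sketch below is mine (the name freeconv is not from the text); the vectors a and b hold a_0, . . . , a_n and b_0, . . . , b_n in the sign convention above, and the exact factorials limit it to small n.

```julia
# finite free convolution p ⊞_n q via (44.2); a[i+1] holds a_i, with
# p(x) = Σ_i x^(n-i) (-1)^i a_i, and likewise for b and the returned vector
function freeconv(a, b)
    n = length(a) - 1
    c = zeros(n + 1)
    for k in 0:n, i in 0:k
        j = k - i
        c[k + 1] += factorial(n - i) * factorial(n - j) /
                    (factorial(n) * factorial(n - i - j)) * a[i + 1] * b[j + 1]
    end
    return c
end

# convolving with q(x) = x^n (that is, b = [1, 0, ..., 0]) returns p unchanged
a = [1.0, 0.0, 3.0, 1.0]
freeconv(a, [1.0, 0.0, 0.0, 0.0]) == a    # true
```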
From this, one can derive a formula that plays better with derivatives:
$$p(x) \boxplus_n q(x) = \frac{1}{n!}\sum_{i=0}^{n} (n-i)!\, b_i\, p^{(i)}(x).$$
This equation allows us to understand what happens when p and q have different degrees. For example, applying $U_\alpha$ to $q(x) = x^{n-1}$ gives $x^{n-2}\bigl(x - \alpha(n-1)\bigr)$, so
$$\mathrm{maxroot}\,\bigl(U_\alpha\, q(x)\bigr) = \alpha(n-1).$$
So, in this case (44.1) says
The proof of Theorem 44.4.1 has two major ingredients. We begin by proving the above
inequality. We then show that the extreme case for the inequality is when q(x) = (x − b)n for
some b. To do this, we consider an arbitrary real rooted polynomial q, and then modify it to make
two of its roots the same. This leads to an induction on degree, which is essentially handled by
the following result.
Lemma 44.6.2. If p(x) has degree n and the degree of q(x) is less than n, then
$$p \boxplus_n q = \frac{1}{n}\,(\partial_x p) \boxplus_{n-1} q.$$
I would like to reflect on the fundamental difference between considering expected characteristic polynomials and the distributions of the roots of random polynomials. Let A be a symmetric matrix of dimension 3k with k eigenvalues equal to each of 1, 0, and −1. If you consider A + ΠAΠ^T for a random Π, the resulting matrix will almost certainly have an eigenvalue at 2 and an eigenvalue at −2. In fact, the chance that it does not is exponentially small in k. However, all the roots of the expected characteristic polynomial of this matrix are strictly bounded away from 2. You could verify this by computing the Cauchy transform of this polynomial.
$$A + \Pi_1 A \Pi_1^T + \Pi_2 A \Pi_2^T.$$
Chapter 45

Matching Polynomials of Graphs

45.1 Overview
The coefficients of the matching polynomial of a graph count the numbers of matchings of various
sizes in that graph. It was first defined by Heilmann and Lieb [HL72], who proved that it has
some amazing properties, including that it is real rooted. They also proved that all roots of the matching polynomial of a graph of maximum degree d are at most $2\sqrt{d-1}$. In the next lecture, we will use this fact to derive the existence of Ramanujan graphs.
Our proofs today come from a different approach to the matching polynomial that appears in the work of Godsil [God93, God81]. My hope is that someone can exploit Godsil’s approach to connect the $2\sqrt{d-1}$ bound from today’s lecture with that from last lecture. In today’s lecture, $2\sqrt{d-1}$ appears as an upper bound on the spectral radius of a d-ary tree. Infinite d-ary trees appear as the graphs of free groups in free probability. I feel like there must be a formal relation between these that I am missing.
A matching in a graph G = (V, E) is a subgraph of G in which every vertex has degree 1. We say that a matching has size k if it has k edges. We let $m_k(G)$ denote the number of matchings in G of size k. Throughout this lecture, we let |V| = n. Observe that $m_1(G)$ is the number of edges in G, and that $m_{n/2}(G)$ is the number of perfect matchings in G. By convention we set $m_0(G) = 1$, as the empty set is a matching with no edges. Computing the number of perfect matchings is a #P-hard problem, believed to be even harder than solving NP-hard problems, so you shouldn’t expect to do it quickly on large graphs. The matching polynomial of G is
$$\mu_x[G] \stackrel{\mathrm{def}}{=} \sum_{k \ge 0} (-1)^k m_k(G)\, x^{n-2k}.$$
We begin by establishing some fundamental properties of the matching polynomial. For graphs G and H on different vertex sets, we write G ∪ H for their disjoint union. Every matching in G ∪ H is the union of a matching in G with a matching in H, so
$$\mu_x[G \cup H] = \mu_x[G]\,\mu_x[H].$$
For a vertex a of G = (V, E), we write G − a for the induced graph G(V − {a}). This notation will prove very useful when reasoning about matching polynomials. Fix a vertex a of G, and divide the matchings in G into two classes: those that involve vertex a and those that do not. The number of matchings of size k that do not involve a is $m_k(G - a)$. On the other hand, those that do involve a connect a to one of its neighbors. To count these, we enumerate the neighbors b of a. A matching of size k that includes edge (a, b) can be written as the union of (a, b) and a matching of size k − 1 in G − a − b. So, the number of matchings that involve a is
$$\sum_{b \sim a} m_{k-1}(G - a - b).$$
So,
$$m_k(G) = m_k(G - a) + \sum_{b \sim a} m_{k-1}(G - a - b).$$
Lemma 45.3.2.
$$\mu_x[G] = x\,\mu_x[G - a] - \sum_{b \sim a} \mu_x[G - a - b].$$
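Lemma 45.3.2 gives a recursive (exponential-time, but fine for small examples) way to compute the matching polynomial. The following Julia sketch is mine: it takes an adjacency matrix and returns the coefficients of µx[G] with the highest power of x first.

```julia
# matching polynomial via the recursion of Lemma 45.3.2:
#   mu[G] = x * mu[G - a] - Σ_{b ~ a} mu[G - a - b],   taking a to be vertex 1
function matchingpoly(A::AbstractMatrix)
    n = size(A, 1)
    n == 0 && return [1.0]                        # the empty graph has mu = 1
    rest = collect(2:n)
    p = [matchingpoly(A[rest, rest]); 0.0]        # x * mu[G - a]
    for b in 2:n
        A[1, b] == 0 && continue
        keep = setdiff(rest, b)
        q = matchingpoly(A[keep, keep])           # mu[G - a - b]
        p[end-length(q)+1:end] .-= q              # subtract, aligned at the constant term
    end
    return p
end

# a single edge gives x^2 - 1, and the triangle gives x^3 - 3x
matchingpoly([0 1; 1 0]), matchingpoly([0 1 1; 1 0 1; 1 1 0])
```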
The matching polynomials of trees are very special: they are exactly the characteristic polynomials of the adjacency matrices. That is, if G is a tree, then
$$\mu_x[G] = \chi_x(A_G).$$
Proof. Expand
$$\chi_x(A_G) = \det(xI - A_G)$$
by summing over permutations. We obtain
$$\sum_{\pi \in S_n} (-1)^{\mathrm{sgn}(\pi)}\, x^{|\{a : \pi(a) = a\}|} \prod_{a : \pi(a) \neq a} \bigl(-A_G(a, \pi(a))\bigr).$$
We will prove that the only permutations that contribute to this sum are those for which π(π(a)) = a for every a. And, these correspond to matchings.
If π is a permutation for which there is an a so that π(π(a)) ≠ a, then there are a = a1, . . . , ak with k > 2 so that π(ai) = ai+1 for 1 ≤ i < k, and π(ak) = a1. For this term to contribute, it must be the case that AG(ai, ai+1) = 1 for all i, and that AG(ak, a1) = 1. For k > 2, this would be a cycle of length k in G. However, G is a tree and so cannot have a cycle.
So, the only permutations that contribute are the involutions: the permutations π that are their own inverses. An involution has only fixed points and cycles of length 2. Each cycle of length 2 that contributes a nonzero term corresponds to an edge in the graph. Thus, the number of contributing involutions with k cycles of length 2 is equal to the number of matchings with k edges. As the sign of an involution with k cycles of length 2 is $(-1)^k$, the coefficient of $x^{n-2k}$ is $(-1)^k m_k(G)$.
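The theorem is easy to check on a small tree. The path on 4 vertices has m0 = 1, m1 = 3, and m2 = 1 (only the matching {(1, 2), (3, 4)}), so its matching polynomial is x^4 − 3x^2 + 1; the Julia sketch below (charcoeffs is my helper, not from the text) confirms that the characteristic polynomial of its adjacency matrix has the same coefficients.

```julia
using LinearAlgebra

# multiply out ∏_i (x - λ_i) to get the coefficients of the characteristic polynomial
charcoeffs(M) = foldl((c, λ) -> [c; 0.0] .- [0.0; λ .* c],
                      eigvals(Symmetric(float(M))); init = [1.0])

A = [0 1 0 0;          # adjacency matrix of the path 1 - 2 - 3 - 4
     1 0 1 0;
     0 1 0 1;
     0 0 1 0]

round.(charcoeffs(A), digits = 8)   # ≈ [1, 0, -3, 0, 1], i.e. x^4 - 3x^2 + 1
```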
Godsil proves that the matching polynomial of a graph is real rooted by proving that it divides
the matching polynomial of a tree. As the matching polynomial of a tree is the same as the
characteristic polynomial of its adjacency matrix, it is real rooted. Thus, the matching
polynomial of the graph is as well. The tree that Godsil uses is the path tree of G starting at a vertex of G. For a vertex a of G, the path tree of G starting at a, written Ta(G), is a tree whose vertices correspond to paths in G that start at a and do not contain any vertex twice. One path is connected to another if one extends the other by one vertex. For example, here is a graph and its path tree starting at a.
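Since the figure is easiest to recreate by computer, here is a small Julia sketch (the names pathtree and grow are mine) that constructs Ta(G) from an adjacency matrix: its vertices are the self-avoiding paths starting at a, and each path is joined to its one-vertex extensions.

```julia
# build the path tree T_a(G); returns the tree's adjacency matrix and the list of
# paths (one per tree vertex), the first of which is the root path [a]
function pathtree(A::AbstractMatrix, a::Integer)
    paths = Vector{Vector{Int}}()
    treeedges = Tuple{Int,Int}[]
    function grow(path, parent)
        push!(paths, path)
        id = length(paths)
        parent > 0 && push!(treeedges, (parent, id))
        for b in 1:size(A, 1)
            if A[path[end], b] != 0 && !(b in path)
                grow([path; b], id)               # extend the path by one vertex
            end
        end
    end
    grow([a], 0)
    T = zeros(Int, length(paths), length(paths))
    for (u, v) in treeedges
        T[u, v] = T[v, u] = 1
    end
    return T, paths
end

# for the triangle, T_1(G) has 5 vertices: [1], [1,2], [1,2,3], [1,3], [1,3,2]
T, paths = pathtree([0 1 1; 1 0 1; 1 1 0], 1)
```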
The term on the upper-right hand side is a little odd. It is a forest obtained by removing the root of the tree Ta(G). We may write it as a disjoint union of trees as
$$T_a(G) - a = \bigcup_{b \sim a} T_b(G - a).$$
Proof. If G is a tree, then the left and right sides are identical, and so the inequality holds. As
the only graphs on less than 3 vertices are trees, the theorem holds for all graphs on at most 2
vertices. We will now prove it by induction on the number of vertices.
We may use Lemma 45.3.2 to expand the reciprocal of the left-hand side:
$$\frac{\mu_x[G]}{\mu_x[G-a]} = \frac{x\,\mu_x[G-a] - \sum_{b\sim a}\mu_x[G-a-b]}{\mu_x[G-a]} = x - \sum_{b\sim a}\frac{\mu_x[G-a-b]}{\mu_x[G-a]}.$$
To simplify this expression, we examine these graphs carefully. By the observation we made before the proof,
$$T_b(G - a) - b = \bigcup_{c \sim b,\, c \neq a} T_c(G - a - b).$$
Similarly,
$$T_a(G) - a = \bigcup_{c \sim a} T_c(G - a),$$
which implies
$$\mu_x[T_a(G) - a] = \prod_{c \sim a} \mu_x[T_c(G - a)].$$
Let ab be the vertex in Ta(G) corresponding to the path from a to b. We also have
$$T_a(G) - a - ab = \bigcup_{c \sim a,\, c \neq b} T_c(G - a) \ \cup \bigcup_{c \sim b,\, c \neq a} T_c(G - a - b) = \bigcup_{c \sim a,\, c \neq b} T_c(G - a) \ \cup\ \bigl(T_b(G - a) - b\bigr),$$
which implies
$$\mu_x[T_a(G) - a - ab] = \prod_{c \sim a,\, c \neq b} \mu_x[T_c(G - a)]\;\mu_x[T_b(G - a) - b].$$
Thus,
$$\frac{\mu_x[T_a(G) - a - ab]}{\mu_x[T_a(G) - a]} = \frac{\prod_{c \sim a,\, c \neq b} \mu_x[T_c(G - a)]\;\mu_x[T_b(G - a) - b]}{\prod_{c \sim a} \mu_x[T_c(G - a)]} = \frac{\mu_x[T_b(G - a) - b]}{\mu_x[T_b(G - a)]}.$$
Theorem 45.4.2. For every vertex a of G, the polynomial µx [G] divides the polynomial
µx [Ta (G)].
Proof. We again prove this by induction on the number of vertices in G, using as our base case graphs with at most 2 vertices. We then know, by induction, that for b ∼ a,
$$\mu_x[G - a] \text{ divides } \mu_x[T_b(G - a)].$$
As
$$T_a(G) - a = \bigcup_{b \sim a} T_b(G - a),$$
$\mu_x[T_b(G - a)]$ divides $\mu_x[T_a(G) - a]$. Thus,
$$\mu_x[G - a] \text{ divides } \mu_x[T_a(G) - a],$$
and so
$$\frac{\mu_x[T_a(G) - a]}{\mu_x[G - a]}$$
is a polynomial in x. To finish the proof, we apply Theorem 45.4.1, which implies that $\mu_x[T_a(G)]/\mu_x[G]$ equals this ratio, and so is also a polynomial in x; that is, $\mu_x[G]$ divides $\mu_x[T_a(G)]$.
If every vertex of G has degree at most d, then the same is true of Ta(G). We will show that the norm of the adjacency matrix of a tree in which every vertex has degree at most d is at most $2\sqrt{d-1}$. Thus, all of the roots of the matching polynomial of a graph of maximum degree d are at most $2\sqrt{d-1}$.
Theorem 45.5.1. Let T be a tree in which every vertex has degree at most d. Then, all eigenvalues of $A_T$ have absolute value at most $2\sqrt{d-1}$.
Proof. Let A be the adjacency matrix of T. Choose some vertex to be the root of the tree, and define its height to be 0. For every other vertex a, define its height, h(a), to be its distance to the root. Define D to be the diagonal matrix with
$$D(a, a) = \left(\sqrt{d-1}\right)^{h(a)}.$$
Recall that the eigenvalues of A are the same as the eigenvalues of $DAD^{-1}$. We will use the fact that all eigenvalues of a nonnegative matrix are upper bounded in absolute value by its maximum row sum.
So, we need to prove that all row sums of $DAD^{-1}$ are at most $2\sqrt{d-1}$. There are three types of vertices to consider. First, the row of the root has up to d entries that are all $1/\sqrt{d-1}$. For $d \ge 2$, $d/\sqrt{d-1} \le 2\sqrt{d-1}$. The intermediate vertices have one entry in their row that equals $\sqrt{d-1}$, and up to $d-1$ entries that are equal to $1/\sqrt{d-1}$, for a total of at most $2\sqrt{d-1}$. Finally, every leaf has only one nonzero entry in its row, and that entry equals $\sqrt{d-1}$.
When combined with Theorem 45.4.2, this tells us that the matching polynomial of a graph with all degrees at most d has all of its roots bounded in absolute value by $2\sqrt{d-1}$.
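A quick numerical illustration of Theorem 45.5.1 in Julia: the complete binary tree has maximum degree 3, and its adjacency norm indeed stays below 2√2.

```julia
using LinearAlgebra

n = 2^6 - 1                                 # complete binary tree on 63 vertices
A = zeros(n, n)
for i in 1:(n ÷ 2), c in (2i, 2i + 1)       # vertex i has children 2i and 2i + 1
    A[i, c] = A[c, i] = 1.0
end
opnorm(A) <= 2 * sqrt(3 - 1)                # true: the norm is below 2√(d-1) with d = 3
```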
Chapter 46

Bipartite Ramanujan Graphs of Every Degree

46.1 Overview
In today’s lecture, we will prove the existence of infinite families of bipartite Ramanujan graphs of every degree. We do this by proving (half of) a conjecture of Bilu and Linial [BL06]: that every bipartite Ramanujan graph has a 2-lift that is also Ramanujan.
Today’s theorem comes from [MSS15b], and the proof is informed by the techniques of [HPS15].
We will use theorems about the matching polynomials of graphs that we proved last lecture.
46.2 2-Lifts
just like the double-cover. If S(u, v) = 1, then GS has the two edges (u1, v1) and (u2, v2), where u1 and u2 denote the two copies of a vertex u; if S(u, v) = −1, then GS has the two edges (u1, v2) and (u2, v1).
You should check that $G_{-A}$ is the double-cover of G and that $G_A$ consists of two disjoint copies of G.
Prove that the eigenvalues of the adjacency matrix of GS are the union of the eigenvalues of A and the eigenvalues of S.
Write $A_+$ for the adjacency matrix of the positively signed edges and $A_-$ for the adjacency matrix of the negatively signed edges, so that
$$S = A_+ - A_- \quad\text{and}\quad A = A_+ + A_-.$$
To show that these are the eigenvectors with the claimed eigenvalues, compute
$$\begin{pmatrix} A_+ & A_- \\ A_- & A_+ \end{pmatrix}\begin{pmatrix} \psi_i \\ \psi_i \end{pmatrix} = \begin{pmatrix} A_+\psi_i + A_-\psi_i \\ A_-\psi_i + A_+\psi_i \end{pmatrix} = \begin{pmatrix} A\psi_i \\ A\psi_i \end{pmatrix} = \lambda_i \begin{pmatrix} \psi_i \\ \psi_i \end{pmatrix},$$
and
$$\begin{pmatrix} A_+ & A_- \\ A_- & A_+ \end{pmatrix}\begin{pmatrix} \phi_i \\ -\phi_i \end{pmatrix} = \begin{pmatrix} A_+\phi_i - A_-\phi_i \\ A_-\phi_i - A_+\phi_i \end{pmatrix} = \begin{pmatrix} S\phi_i \\ -S\phi_i \end{pmatrix} = \mu_i \begin{pmatrix} \phi_i \\ -\phi_i \end{pmatrix},$$
where the block matrix is the adjacency matrix of GS, ψi is an eigenvector of A with eigenvalue λi, and φi is an eigenvector of S with eigenvalue µi.
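This is easy to verify numerically. In the Julia sketch below the particular graph (a 4-cycle) and signing are just examples; it builds the adjacency matrix of the 2-lift from A+ and A− and checks that its spectrum is the union of the spectra of A and S.

```julia
using LinearAlgebra

A = [0 1 0 1; 1 0 1 0; 0 1 0 1; 1 0 1 0]        # the 4-cycle
S = float(copy(A)); S[1, 2] = S[2, 1] = -1.0    # flip the sign of one edge

Aplus  = (A .+ S) ./ 2                          # positively signed edges
Aminus = (A .- S) ./ 2                          # negatively signed edges
lift   = [Aplus Aminus; Aminus Aplus]           # adjacency matrix of the 2-lift G_S

sort(eigvals(Symmetric(lift))) ≈
    sort([eigvals(Symmetric(float(A))); eigvals(Symmetric(S))])   # true
```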
We can use this theorem to build infinite families of bipartite Ramanujan graphs, because their eigenvalues are symmetric about the origin. Thus, if $\mu_2 \le 2\sqrt{d-1}$, then we know that $|\mu_i| \le 2\sqrt{d-1}$ for all $1 < i < n$. Note that the 2-lift of a bipartite graph is also a bipartite graph.
We will prove Theorem 46.2.1 by considering a random 2-lift, and then applying the method of interlacing polynomials. In particular, we consider
$$\mathbb{E}\left[\chi_x(S)\right]. \tag{46.1}$$
Godsil and Gutman [GG81] proved that this is equal to the matching polynomial of G!
Lemma 46.3.1. Let G be a graph and let S be a uniform random signed adjacency matrix of G. Then,
$$\mathbb{E}\left[\chi_x(S)\right] = \mu_x[G].$$
Proof.
$$\mathbb{E}\left[\chi_x(S)\right] = \mathbb{E}\left[\det(xI - S)\right] = \mathbb{E}\left[\sum_{\pi \in S_n} (-1)^{\mathrm{sgn}(\pi)}\, x^{|\{a : \pi(a) = a\}|} \prod_{a : \pi(a) \neq a} \bigl(-S(a, \pi(a))\bigr)\right] = \sum_{\pi \in S_n} (-1)^{\mathrm{sgn}(\pi)}\, x^{|\{a : \pi(a) = a\}|}\, \mathbb{E}\left[\prod_{a : \pi(a) \neq a} \bigl(-S(a, \pi(a))\bigr)\right].$$
As $\mathbb{E}[S(a, \pi(a))] = 0$ for every a so that π(a) ≠ a, the only way we can get a nonzero contribution from a permutation π is if for all a so that π(a) ≠ a,
a. (a, π(a)) is an edge of G, and
b. π(π(a)) = a.
CHAPTER 46. BIPARTITE RAMANUJAN GRAPHS OF EVERY DEGREE 361
The latter condition guarantees that whenever S (a, π(a)) appears in the product, S (π(a), a) does
as well. As these entries are constrained to be the same, their product is 1.
Thus, the only permutations that count are the involutions. As we saw last lecture, these
correspond exactly to the matchings in the graph.
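For a small graph we can average over all signings exactly and watch the cancellation happen. A Julia sketch for the triangle, whose matching polynomial is x^3 − 3x (m0 = 1, m1 = 3); charcoeffs is my helper, not from the text.

```julia
using LinearAlgebra, Statistics

charcoeffs(M) = foldl((c, λ) -> [c; 0.0] .- [0.0; λ .* c],
                      eigvals(Symmetric(float(M))); init = [1.0])

edges = [(1, 2), (1, 3), (2, 3)]                # the triangle
polys = map(0:2^length(edges)-1) do bits        # one signed adjacency matrix per sign pattern
    S = zeros(3, 3)
    for (t, (i, j)) in enumerate(edges)
        s = ((bits >> (t - 1)) & 1 == 1) ? 1.0 : -1.0
        S[i, j] = S[j, i] = s
    end
    charcoeffs(S)
end
mean(polys)                                     # ≈ [1, 0, -3, 0]: the coefficients of x^3 - 3x
```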
Thus, we know that the largest root of (46.1) is at most $2\sqrt{d-1}$. So, all we need to do is to show that there is some signed adjacency matrix whose largest eigenvalue is at most this bound. We do this via the method of interlacing polynomials.
To this end, choose an ordering on the m edges of the graph. We can now associate each S with a vector σ ∈ {±1}^m. Define
$$p_\sigma(x) = \chi_x(S).$$
The expected polynomial is the average of all these polynomials.
To form an interlacing family, we will form a tree that has the polynomials p_σ at the leaves. The intermediate nodes will correspond to choices of the first few signs. That is, for k < m and σ ∈ {±1}^k we define
$$p_\sigma(x) \stackrel{\mathrm{def}}{=} \mathbb{E}_{\rho \in \{\pm 1\}^{m-k}}\left[p_{\sigma,\rho}(x)\right].$$
So, p∅ is the polynomial at the root of the tree. It remains to show that all pairs of siblings in the
tree have a common interlacing.
Polynomials indexed by σ and τ are siblings if σ and τ have the same length, and only differ in
their last index. To show that they have a common interlacing, we recall a few results from
Lecture 22.
Lemma 46.3.2. [Lemma 22.3.3] Let A be an n-dimensional symmetric matrix and let v be a vector. Let
$$p_t(x) = \chi_x(A + t\,vv^T).$$
Then there is a degree n − 1 polynomial q(x) so that
$$p_t(x) = \chi_x(A) - t\,q(x).$$
Lemma 46.3.3. [Lemma 22.3.2] Let p and q be polynomials of degree n and n − 1, and let $p_t(x) = p(x) - t\,q(x)$. If $p_t$ is real rooted for all $t \in \mathbb{R}$, then p and q interlace.
Lemma 46.3.4. [Lemma 22.3.1] Let p and q be polynomials of degree n and n − 1 that interlace and have positive leading coefficients. For every t > 0, define $p_t(x) = p(x) - t\,q(x)$. Then, $p_t(x)$ is real rooted and
$$p(x) \to p_t(x).$$
Lemma 46.3.5. Let $p_0(x)$ and $p_1(x)$ be two degree n monic polynomials for which there is a third polynomial r(x) that has the same degree as $p_0$ and $p_1$ and so that
$$p_0(x) \to r(x) \quad\text{and}\quad p_1(x) \to r(x).$$
Then for all 0 ≤ s ≤ 1,
$$p_s(x) \stackrel{\mathrm{def}}{=} s\,p_1(x) + (1-s)\,p_0(x)$$
is a real rooted polynomial.
is real rooted. Moreover, for every vector u in the support of $v_k$, all the polynomials
$$\mathbb{E}\left[\chi_x\!\left(A + uu^T + \sum_{i=1}^{k-1} v_i v_i^T\right)\right]$$
Proof. We prove this by induction on k. Assuming that we have proved it for k, we now prove it for k + 1. Let u be any vector and let $t \in \mathbb{R}$. Define
$$p_t(x) = \mathbb{E}\left[\chi_x\!\left(A + t\,uu^T + \sum_{i=1}^{k} v_i v_i^T\right)\right].$$
We now apply this result with each u from the support of $v_{k+1}$ to conclude (via Lemma ??) that
$$\mathbb{E}\left[\chi_x\!\left(A + \sum_{i=1}^{k} v_i v_i^T\right)\right] \;\to\; \mathbb{E}\left[\chi_x\!\left(A + v_{k+1} v_{k+1}^T + \sum_{i=1}^{k} v_i v_i^T\right)\right],$$
To apply this theorem to the matrices S, we must write them as a sum of outer products of random vectors. While we cannot do this, we can do something just as good. For each edge (a, b) of G, let $v_{a,b}$ be the random vector that is $\delta_a - \delta_b$ with probability 1/2 and $\delta_a + \delta_b$ with probability 1/2. The random matrix S is distributed according to
$$\sum_{(a,b)\in E} v_{a,b}\, v_{a,b}^T - dI.$$
Subtracting dI shifts the roots by d, and so does not impact any results we have proved about
interlacing or real rootedness.
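A tiny Julia check of this decomposition for the 4-cycle, which is 2-regular (the names here are mine): summing the random rank-one terms and subtracting dI yields a random signed adjacency matrix of the graph.

```julia
using LinearAlgebra

edges = [(1, 2), (2, 3), (3, 4), (4, 1)]        # the 4-cycle, d = 2
n, d = 4, 2
δ(i) = (e = zeros(n); e[i] = 1.0; e)            # elementary basis vector

v = [rand(Bool) ? δ(a) - δ(b) : δ(a) + δ(b) for (a, b) in edges]
S = sum(w * w' for w in v) - d * I              # a random signed adjacency matrix of the 4-cycle
```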
Bibliography
[ABN+ 92] Noga Alon, Jehoshua Bruck, Joseph Naor, Moni Naor, and Ron M. Roth.
Construction of asymptotically good low-rate error-correcting codes through
pseudo-random graphs. IEEE Transactions on Information Theory, 38(2):509–516,
March 1992. 61, 240
[ABN08] I. Abraham, Y. Bartal, and O. Neiman. Nearly tight low stretch spanning trees. In
Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer
Science, pages 781–790, Oct. 2008. 282
[AC88] Noga Alon and Fan Chung. Explicit construction of linear sized tolerant networks.
Discrete Mathematics, 72:15–19, 1988. 209
[AH77a] Kenneth Appel and Wolfgang Haken. Every planar map is four colorable part i.
discharging. Illinois Journal of Mathematics, 21:429–490, 1977. 152
[AH77b] Kenneth Appel and Wolfgang Haken. Every planar map is four colorable part ii.
reducibility. Illinois Journal of Mathematics, 21:491–567, 1977. 152
[AKPW95] Noga Alon, Richard M. Karp, David Peleg, and Douglas West. A graph-theoretic
game and its application to the k-server problem. SIAM Journal on Computing,
24(1):78–100, February 1995. 279
[AKV02] Noga Alon, Michael Krivelevich, and Van H. Vu. On the concentration of eigenvalues
of random symmetric matrices. Israel Journal of Mathematics, 131(1):259–267, 2002.
180
[AM85] Noga Alon and V. D. Milman. λ1 , isoperimetric inequalities for graphs, and
superconcentrators. J. Comb. Theory, Ser. B, 38(1):73–88, 1985. 164
[AN12] Ittai Abraham and Ofer Neiman. Using petal-decompositions to build a low stretch
spanning tree. In Proceedings of the 44th Annual ACM Symposium on the Theory of
Computing (STOC ’12), pages 395–406, 2012. 279
[AR94] Noga Alon and Yuval Roichman. Random cayley graphs and expanders. Random
Structures & Algorithms, 5(2):271–284, 1994. 59
[AS06] A. Ashikhmin and V. Skachek. Decoding of expander codes at rates close to capacity.
IEEE Transactions on Information Theory, 52(12):5475–5485, Dec. 2006. 230
[AW02] R. Ahlswede and A. Winter. Strong converse for identification via quantum channels.
Information Theory, IEEE Transactions on, 48(3):569–579, 2002. 247
[AZLO15] Zeyuan Allen-Zhu, Zhenyu Liao, and Lorenzo Orecchia. Spectral sparsification and
regret minimization beyond matrix multiplicative updates. In Proceedings of the
Forty-Seventh Annual ACM on Symposium on Theory of Computing, pages 237–245.
ACM, 2015. 259
[Bab79] László Babai. Spectra of cayley graphs. Journal of Combinatorial Theory, Series B,
pages 180–189, 1979. 62
[Bab80] László Babai. On the complexity of canonical labeling of strongly regular graphs.
SIAM Journal on Computing, 9(1):212–216, 1980. 309
[Bar82] Earl R. Barnes. An algorithm for partitioning the nodes of a graph. SIAM Journal
on Algebraic and Discrete Methods, 3(4):541–550, 1982. 192
[BCS+ 13] László Babai, Xi Chen, Xiaorui Sun, Shang-Hua Teng, and John Wilmes. Faster
canonical forms for strongly regular graphs. In 2013 IEEE 54th Annual Symposium
on Foundations of Computer Science, pages 157–166. IEEE, 2013. 312
[BGM82] László Babai, D Yu Grigoryev, and David M Mount. Isomorphism of graphs with
bounded eigenvalue multiplicity. In Proceedings of the fourteenth annual ACM
symposium on Theory of computing, pages 310–324. ACM, 1982. 14, 295
[BL83] László Babai and Eugene M Luks. Canonical labeling of graphs. In Proceedings of the
fifteenth annual ACM symposium on Theory of computing, pages 171–183. ACM,
1983. 295
[BL06] Yonatan Bilu and Nathan Linial. Lifts, discrepancy and nearly optimal spectral gap*.
Combinatorica, 26(5):495–519, 2006. 314, 315, 358, 360
[BLR10] P. Biswal, J. Lee, and S. Rao. Eigenvalue bounds, spectral partitioning, and metrical
deformations via flows. Journal of the ACM, 2010. to appear. 198
[BMS93] Richard Beigel, Grigorii Margulis, and Daniel A. Spielman. Fault diagnosis in a small
constant number of parallel testing rounds. In SPAA ’93: Proceedings of the fifth
annual ACM symposium on Parallel algorithms and architectures, pages 21–29, New
York, NY, USA, 1993. ACM. 209
[Bol86] Béla Bollobás. Combinatorics: set systems, hypergraphs, families of vectors, and
combinatorial probability. Cambridge University Press, 1986. 51
[BR97] R. B. Bapat and T. E. S. Raghavan. Nonnegative Matrices and Applications.
Number 64 in Encyclopedia of Mathematics and its Applications. Cambridge
University Press, 1997. 36
[BRSW12] Boaz Barak, Anup Rao, Ronen Shaltiel, and Avi Wigderson. 2-source dispersers for $n^{o(1)}$ entropy, and Ramsey graphs beating the Frankl-Wilson construction. Annals of Mathematics, pages 1483–1543, 2012. 155
[BSS12] Joshua Batson, Daniel A Spielman, and Nikhil Srivastava. Twice-Ramanujan
sparsifiers. SIAM Journal on Computing, 41(6):1704–1721, 2012. 210, 252
[BSS14] Joshua Batson, Daniel A Spielman, and Nikhil Srivastava. Twice-ramanujan
sparsifiers. SIAM Review, 56(2):315–334, 2014. 252
[BW13] László Babai and John Wilmes. Quasipolynomial-time canonical form for steiner
designs. In Proceedings of the forty-fifth annual ACM symposium on Theory of
computing, pages 261–270. ACM, 2013. 312
[BZ02] A. Barg and G. Zemor. Error exponents of expander codes. IEEE Transactions on
Information Theory, 48(6):1725–1729, Jun 2002. 230
[BZ05] A. Barg and G. Zemor. Concatenated codes: serial and parallel. IEEE Transactions
on Information Theory, 51(5):1625–1634, May 2005. 230
[BZ06] A. Barg and G. Zemor. Distance properties of expander codes. IEEE Transactions on
Information Theory, 52(1):78–90, Jan. 2006. 230
[BZ13] Nick Bridle and Xiaojin Zhu. p-voltages: Laplacian regularization for semi-supervised
learning on high-dimensional data. In Eleventh Workshop on Mining and Learning
with Graphs (MLG2013), 2013. 146
[CCL+ 15] Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. Efficient
sampling for gaussian graphical models via spectral sparsification. In Peter Grünwald,
Elad Hazan, and Satyen Kale, editors, Proceedings of The 28th Conference on
Learning Theory, volume 40 of Proceedings of Machine Learning Research, pages
364–390, Paris, France, 03–06 Jul 2015. PMLR. 119
[CFM94] F. R. K. Chung, V. Faber, and T. A. Manteuffel. On the diameter of a graph from
eigenvalues associated with its Laplacian. SIAM Journal on Discrete Mathematics,
7:443–457, 1994. 274
[CGP+ 18] Timothy Chu, Yu Gao, Richard Peng, Sushant Sachdeva, Saurabh Sawlani, and
Junxing Wang. Graph sparsification, spectral sketches, and faster resistance
computation, via short cycle decompositions. arXiv preprint arXiv:1805.12051, 2018.
251
[CH91] Joel E Cohen and Paul Horowitz. Paradoxical behaviour of mechanical and electrical
networks. 1991. 138
[Che70] J. Cheeger. A lower bound for smallest eigenvalue of the Laplacian. In Problems in
Analysis, pages 195–199, Princeton University Press, 1970. 164
[Chi92] Patrick Chiu. Cubic Ramanujan graphs. Combinatorica, 12(3):275–285, 1992. 314
[Chu97] F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, 1997. 161
[CKM+ 14] Michael B. Cohen, Rasmus Kyng, Gary L. Miller, Jakub W. Pachocki, Richard Peng, Anup B. Rao, and Shen Chen Xu. Solving sdd linear systems in nearly $m \log^{1/2} n$ time. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, STOC ’14, pages 343–352, New York, NY, USA, 2014. ACM. 281, 294
[CLM+ 15] Michael B Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng,
and Aaron Sidford. Uniform sampling for matrix approximation. In Proceedings of
the 2015 Conference on Innovations in Theoretical Computer Science, pages 181–190.
ACM, 2015. 251
[dCSHS11] Marcel K. de Carli Silva, Nicholas J. A. Harvey, and Cristiane M. Sato. Sparse sums
of positive semidefinite matrices. CoRR, abs/1107.0088, 2011. 259
[DH72] W. E. Donath and A. J. Hoffman. Algorithms for partitioning graphs and computer
logic based on eigenvectors of connection matrices. IBM Technical Disclosure
Bulletin, 15(3):938–944, 1972. 192
[DH73] W. E. Donath and A. J. Hoffman. Lower bounds for the partitioning of graphs. IBM
Journal of Research and Development, 17(5):420–425, September 1973. 192
[DK70] Chandler Davis and William Morton Kahan. The rotation of eigenvectors by a
perturbation. iii. SIAM Journal on Numerical Analysis, 7(1):1–46, 1970. 180
[DKMZ11] Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová.
Asymptotic analysis of the stochastic block model for modular networks and its
algorithmic applications. Physical Review E, 84(6):066106, 2011. 177
[DS91] Persi Diaconis and Daniel Stroock. Geometric bounds for eigenvalues of Markov
chains. The Annals of Applied Probability, 1(1):36–61, 1991. 39
[Duf47] R. J. Duffin. Nonlinear networks. IIa. Bull. Amer. Math. Soc, 53:963–971, 1947. 143
[dV90] Colin de Verdière. Sur un nouvel invariant des graphes et un critère de planarité. J.
Combin. Theory Ser. B, 50:11–21, 1990. 200
[EEST08] Michael Elkin, Yuval Emek, Daniel A. Spielman, and Shang-Hua Teng. Lower-stretch
spanning trees. SIAM Journal on Computing, 32(2):608–628, 2008. 282
[Eli55] Peter Elias. Coding for noisy channels. IRE Conv. Rec., 3:37–46, 1955. 220
[Erd47] Paul Erdös. Some remarks on the theory of graphs. Bulletin of the American
Mathematical Society, 53(4):292–294, 1947. 155
[FK98] Uriel Feige and Joe Kilian. Zero knowledge and the chromatic number. Journal of
Computer and System Sciences, 57(2):187–199, 1998. 153
[Fri08] Joel Friedman. A Proof of Alon’s Second Eigenvalue Conjecture and Related
Problems. Number 910 in Memoirs of the American Mathematical Society. American
Mathematical Society, 2008. 209, 314, 344
[Fro12] Georg Frobenius. Über matrizen aus nicht negativen elementen. 1912. 36
[Gal63] R. G. Gallager. Low Density Parity-Check Codes. MIT Press, Cambridge, MA, 1963.
229
[GGT06] Steven J Gortler, Craig Gotsman, and Dylan Thurston. Discrete one-forms on meshes
and applications to 3d mesh parameterization. Computer Aided Geometric Design,
23(2):83–112, 2006. 129
[Gil98] David Gillman. A chernoff bound for random walks on expander graphs. SIAM
Journal on Computing, 27(4):1203–1220, 1998. 244
[GLM99] S. Guattery, T. Leighton, and G. L. Miller. The path resistance method for bounding
the smallest nontrivial eigenvalue of a Laplacian. Combinatorics, Probability and
Computing, 8:441–460, 1999. 39
[GLSS18] Ankit Garg, Yin Tat Lee, Zhao Song, and Nikhil Srivastava. A matrix expander
chernoff bound. In Proceedings of the 50th Annual ACM SIGACT Symposium on
Theory of Computing, pages 1102–1114. ACM, 2018. 244
[GN08] C.D. Godsil and M.W. Newman. Eigenvalue bounds for independent sets. Journal of
Combinatorial Theory, Series B, 98(4):721 – 734, 2008. 153
[God93] Chris Godsil. Algebraic Combinatorics. Chapman & Hall, 1993. 54, 322, 352
[Har76] Sergiu Hart. A note on the edges of the n-cube. Discrete Mathematics, 14(2):157–163,
1976. 51
[HILL99] Johan Håstad, Russell Impagliazzo, Leonid A. Levin, and Michael Luby. A
pseudorandom generator from any one-way function. SIAM Journal on Computing,
28(4):1364–1396, 1999. 239, 240
[HL09] Mark Herbster and Guy Lever. Predicting the labelling of a graph via minimum
p-seminorm interpolation. In Proceedings of the 2009 Conference on Learning Theory
(COLT), 2009. 146
[Hof70] A. J. Hoffman. On eigenvalues and colorings of graphs. In Graph Theory and its
Applications, pages 79–92. Academic Press, New York, 1970. 153
[HPS15] Chris Hall, Doron Puder, and William F Sawin. Ramanujan coverings of graphs.
arXiv preprint arXiv:1506.02335, 2015. 314, 321, 329, 358
[IZ89] R. Impagliazzo and D. Zuckerman. How to recycle random bits. In 30th annual IEEE
Symposium on Foundations of Computer Science, pages 248–253, 1989. 239
[JL84] William B Johnson and Joram Lindenstrauss. Extensions of lipschitz mappings into a
hilbert space. Contemporary mathematics, 26(189-206):1, 1984. 118
[JL97] Bruce W Jordan and Ron Livné. Ramanujan local systems on graphs. Topology,
36(5):1007–1024, 1997. 314
[Kel06] Jonathan A. Kelner. Spectral partitioning, eigenvalue bounds, and circle packings for
graphs of bounded genus. SIAM J. Comput., 35(4):882–902, 2006. 198
[KLL16] Tsz Chiu Kwok, Lap Chi Lau, and Yin Tat Lee. Improved cheeger’s inequality and
analysis of local graph partitioning using vertex expansion and expansion profile. In
Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete
algorithms, pages 1848–1861. Society for Industrial and Applied Mathematics, 2016.
169
[KLP12] Ioannis Koutis, Alex Levin, and Richard Peng. Improved spectral sparsification and
numerical algorithms for sdd matrices. In STACS’12 (29th Symposium on Theoretical
Aspects of Computer Science), volume 14, pages 266–277. LIPIcs, 2012. 251
[KLP+ 16] Rasmus Kyng, Yin Tat Lee, Richard Peng, Sushant Sachdeva, and Daniel A
Spielman. Sparsified cholesky and multigrid solvers for connection laplacians. In
Proceedings of the forty-eighth annual ACM symposium on Theory of Computing,
pages 842–850. ACM, 2016. 281
[KLPT09] Jonathan A. Kelner, James Lee, Gregory Price, and Shang-Hua Teng. Higher
eigenvalues of graphs. In Proceedings of the 50th IEEE Symposium on Foundations of
Computer Science, 2009. 198
[KMP10] I. Koutis, G.L. Miller, and R. Peng. Approaching optimality for solving sdd linear
systems. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE
Symposium on, pages 235 –244, 2010. 283
[KMP11] I. Koutis, G.L. Miller, and R. Peng. A nearly $m \log n$ time solver for sdd linear systems. In Foundations of Computer Science (FOCS), 2011 52nd Annual IEEE Symposium on, pages 590–598, 2011. 281, 283
[Kou14] Ioannis Koutis. Simple parallel and distributed algorithms for spectral graph
sparsification. In Proceedings of the 26th ACM Symposium on Parallelism in
Algorithms and Architectures, SPAA ’14, pages 61–66, New York, NY, USA, 2014.
ACM. 251
[KS16] Rasmus Kyng and Sushant Sachdeva. Approximate gaussian elimination for
laplacians-fast, sparse, and simple. In Foundations of Computer Science (FOCS),
2016 IEEE 57th Annual Symposium on, pages 573–582. IEEE, 2016. 281
[Kur30] Casimir Kuratowski. Sur le probleme des courbes gauches en topologie. Fundamenta
mathematicae, 15(1):271–283, 1930. 124
[LM82] F. Tom Leighton and Gary Miller. Certificates for graphs with distinct eigenvalues.
Manuscript, 1982. 295
[LM00] Beatrice Laurent and Pascal Massart. Adaptive estimation of a quadratic functional
by model selection. Annals of Statistics, pages 1302–1338, 2000. 120
[Lov01] László Lovász. Steinitz representations of polyhedra and the Colin de Verdière number. Journal of Combinatorial Theory, Series B, 82(2):223 – 236, 2001. 15, 201
[LPS15] Yin Tat Lee, Richard Peng, and Daniel A. Spielman. Sparsified cholesky solvers for
SDD linear systems. CoRR, abs/1506.08204, 2015. 251, 294
[LR97] John D. Lafferty and Daniel N. Rockmore. Spectral techniques for expander codes. In
STOC ’97: Proceedings of the twenty-ninth annual ACM symposium on Theory of
computing, pages 160–167, New York, NY, USA, 1997. ACM. 230
[LRT79] Richard J Lipton, Donald J Rose, and Robert Endre Tarjan. Generalized nested
dissection. SIAM journal on numerical analysis, 16(2):346–358, 1979. 262
[LS88] Gregory F. Lawler and Alan D. Sokal. Bounds on the l2 spectrum for Markov chains
and Markov processes: A generalization of Cheeger’s inequality. Transactions of the
American Mathematical Society, 309(2):557–580, 1988. 164
[LS90] L. Lovász and M. Simonovits. The mixing rate of Markov chains, an isoperimetric
inequality, and computing the volume. In IEEE, editor, Proceedings: 31st Annual
Symposium on Foundations of Computer Science: October 22–24, 1990, St. Louis,
Missouri, volume 1, pages 346–354, 1109 Spring Street, Suite 300, Silver Spring, MD
20910, USA, 1990. IEEE Computer Society Press. 130
[LS98] László Lovász and Alexander Schrijver. A Borsuk theorem for antipodal links and a spectral characterization of linklessly embeddable graphs. Proceedings of the American Mathematical Society, 126(5):1275–1285, 1998. 200
[LS15] Yin Tat Lee and He Sun. Constructing linear-sized spectral sparsification in
almost-linear time. arXiv preprint arXiv:1508.03261, 2015. 259
[LSY18] Yang P Liu, Sushant Sachdeva, and Zejun Yu. Short cycles via low-diameter
decompositions. arXiv preprint arXiv:1810.05143, 2018. 251
[LV99] László Lovász and Katalin Vesztergombi. Geometric representations of graphs. Paul
Erdos and his Mathematics, 1999. 129
[Mas14] Laurent Massoulié. Community detection thresholds and the weak ramanujan
property. In Proceedings of the 46th Annual ACM Symposium on Theory of
Computing, pages 694–703. ACM, 2014. 177
[Mil51] William Millar. Cxvi. some general theorems for non-linear systems possessing
resistance. Philosophical Magazine, 42(333):1150–1160, 1951. 146
[MNS08] Steven J. Miller, Tim Novikoff, and Anthony Sabelli. The distribution of the largest
nontrivial eigenvalues in families of random regular graphs. Experiment. Math.,
17(2):231–244, 2008. 314
[MNS14] Elchanan Mossel, Joe Neeman, and Allan Sly. Belief propagation, robust
reconstruction and optimal recovery of block models. In Proceedings of The 27th
Conference on Learning Theory, pages 356–370, 2014. 177
[MSS14] Adam W. Marcus, Daniel A. Spielman, and Nikhil Srivastava. Ramanujan graphs
and the solution of the Kadison-Singer problem. In Proceedings of the International
Congress of Mathematicians, 2014. 259, 321
[MSS15b] Adam W. Marcus, Daniel A. Spielman, and Nikhil Srivastava. Interlacing families I:
Bipartite Ramanujan graphs of all degrees. Ann. of Math., 182-1:307–325, 2015. 314,
358
[MSS15c] Adam W. Marcus, Daniel A. Spielman, and Nikhil Srivastava. Interlacing families II:
Mixed characteristic polynomials and the Kadison-Singer problem. Ann. of Math.,
182-1:327–350, 2015. 259
[MSS15d] Adam W Marcus, Nikhil Srivastava, and Daniel A Spielman. Interlacing families IV:
Bipartite Ramanujan graphs of all sizes. arXiv preprint arXiv:1505.08010, 2015.
appeared in Proceedings of the 56th IEEE Symposium on Foundations of Computer
Science. 321, 329, 336, 344
[Nil91] A. Nilli. On the second eigenvalue of a graph. Discrete Math, 91:207–210, 1991. 212
[Obr63] Nikola Obrechkoff. Verteilung und berechnung der Nullstellen reeller Polynome. VEB
Deutscher Verlag der Wissenschaften, Berlin, 1963. 320, 331
[Per07] Oskar Perron. Zur theorie der matrices. Mathematische Annalen, 64(2):248–263,
1907. 36
[Piz90] Arnold K Pizer. Ramanujan graphs and Hecke operators. Bulletin of the AMS, 23(1),
1990. 314
[PP03] Claude M Penchina and Leora J Penchina. The braess paradox in mechanical, traffic,
and other networks. American Journal of Physics, 71:479, 2003. 138
[PS14] Richard Peng and Daniel A. Spielman. An efficient parallel solver for SDD linear
systems. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA,
May 31 - June 03, 2014, pages 333–342, 2014. 281, 288
[PSL90] A. Pothen, H. D. Simon, and K.-P. Liou. Partitioning sparse matrices with
eigenvectors of graphs. SIAM J. Matrix Anal. Appl., 11:430–452, 1990. 192
[Rud99] M. Rudelson. Random vectors in the isotropic position. Journal of Functional Analysis, 164(1):60 – 72, 1999. 246, 247
[RV07] Mark Rudelson and Roman Vershynin. Sampling from large matrices: An approach
through geometric functional analysis. J. ACM, 54(4):21, 2007. 247
[RVW02] Omer Reingold, Salil Vadhan, and Avi Wigderson. Entropy waves, the zig-zag graph
product, and new constant-degree expanders. Annals of Mathematics,
155(1):157–187, 2002. 231, 238
[Sen06] Eugene Seneta. Non-negative matrices and Markov chains. Springer Science &
Business Media, 2006. 36
[Sha48] Claude Elwood Shannon. A mathematical theory of communication. Bell system
technical journal, 27(3):379–423, 1948. 220
[Sim91] Horst D. Simon. Partitioning of unstructured problems for parallel processing.
Computing Systems in Engineering, 2:135–148, 1991. 192
[SJ89] Alistair Sinclair and Mark Jerrum. Approximate counting, uniform generation and
rapidly mixing Markov chains. Information and Computation, 82(1):93–133, July
1989. 39, 164
[Spi96a] D.A. Spielman. Linear-time encodable and decodable error-correcting codes. IEEE
Transactions on Information Theory, 42(6):1723–1731, Nov 1996. 230
[Spi96b] Daniel A. Spielman. Faster isomorphism testing of strongly regular graphs. In STOC
’96: Proceedings of the twenty-eighth annual ACM symposium on Theory of
computing, pages 576–584, New York, NY, USA, 1996. ACM. 312
[SS96] M. Sipser and D.A. Spielman. Expander codes. IEEE Transactions on Information
Theory, 42(6):1710–1722, Nov 1996. 230
[SS11] D.A. Spielman and N. Srivastava. Graph sparsification by effective resistances. SIAM
Journal on Computing, 40(6):1913–1926, 2011. 246, 251
[ST04] Daniel A. Spielman and Shang-Hua Teng. Nearly-linear time algorithms for graph
partitioning, graph sparsification, and solving linear systems. In Proceedings of the
thirty-sixth annual ACM Symposium on Theory of Computing, pages 81–90, 2004.
Full version available at http://arxiv.org/abs/cs.DS/0310051. 169
[ST07] Daniel A. Spielman and Shang-Hua Teng. Spectral partitioning works: Planar graphs
and finite element meshes. Linear Algebra and its Applications, 421:284–305, 2007.
192, 198
[ST13] Daniel A Spielman and Shang-Hua Teng. A local clustering algorithm for massive
graphs and its application to nearly linear time graph partitioning. SIAM Journal on
Computing, 42(1):1–26, 2013. 169
[ST14] Daniel A. Spielman and Shang-Hua Teng. Nearly-linear time algorithms for
preconditioning and solving symmetric, diagonally dominant linear systems. SIAM.
J. Matrix Anal. & Appl., 35:835–885, 2014. 269, 281, 283
[SW09] Daniel A. Spielman and Jaeoh Woo. A note on preconditioning by low-stretch
spanning trees. CoRR, abs/0903.2816, 2009. Available at
http://arxiv.org/abs/0903.2816. 281
[SW15] Xiaorui Sun and John Wilmes. Faster canonical forms for primitive coherent
configurations. In Proceedings of the forty-seventh annual ACM symposium on
Theory of computing, pages 693–702. ACM, 2015. 312
[Tan81] R. Michael Tanner. A recursive approach to low complexity codes. IEEE
Transactions on Information Theory, 27(5):533–547, September 1981. 229
[Tan84] R. Michael Tanner. Explicit concentrators from generalized n-gons. SIAM Journal
Alg. Disc. Meth., 5(3):287–293, September 1984. 211
[Tre09] Luca Trevisan. Max cut and the smallest eigenvalue. In STOC ’09: Proceedings of
the 41st annual ACM symposium on Theory of computing, pages 263–272, 2009. 17
[Tre11] Luca Trevisan. Lecture 4 from cs359g: Graph partitioning and expanders, stanford
university, January 2011. available at
http://theory.stanford.edu/ trevisan/cs359g/lecture04.pdf. 164
[Tro12] Joel A Tropp. User-friendly tail bounds for sums of random matrices. Foundations of
Computational Mathematics, 12(4):389–434, 2012. 183, 246, 247
[Tut63] W. T. Tutte. How to draw a graph. Proc. London Mathematical Society, 13:743–768,
1963. 17, 122
[Vai90] Pravin M. Vaidya. Solving linear equations with symmetric diagonally dominant
matrices by constructing good preconditioners. Unpublished manuscript UIUC 1990.
A talk based on the manuscript was presented at the IMA Workshop on Graph
Theory and Sparse Matrix Computation, October 1991, Minneapolis., 1990. 278, 283
[Val79] Leslie G Valiant. The complexity of computing the permanent. Theoretical computer
science, 8(2):189–201, 1979. 323
[van95] Hein van der Holst. A short proof of the planarity characterization of Colin de
Verdière. Journal of Combinatorial Theory, Series B, 65(2):269 – 272, 1995. 203
[Var85] N. Th. Varopoulos. Isoperimetric inequalities and Markov chains. Journal of
Functional Analysis, 63(2):215 – 239, 1985. 164
[Ver10] Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices.
arXiv preprint arXiv:1011.3027, 2010. 183
[Vis12] Nisheeth K. Vishnoi. Lx = b, 2012. available at
http://research.microsoft.com/en-us/um/people/nvishno/Site/Lxb-Web.pdf.
268
[Voi97] Dan V Voiculescu. Free probability theory. American Mathematical Society, 1997. 336
[Vu07] Van Vu. Spectral norm of random matrices. Combinatorica, 27(6):721–736, 2007. 68,
71, 179
[Vu14] Van Vu. A simple svd algorithm for finding hidden partitions. arXiv preprint
arXiv:1404.3918, 2014. 177
[Wal22] JL Walsh. On the location of the roots of certain types of polynomials. Transactions
of the American Mathematical Society, 24(3):163–180, 1922. 337
[Wig58] Eugene P Wigner. On the distribution of the roots of certain symmetric matrices.
Ann. Math, 67(2):325–327, 1958. 63
[Wil67] Herbert S. Wilf. The eigenvalues of a graph and its chromatic number. J. London
math. Soc., 42:330–332, 1967. 32, 35
Index

δ, 4
d_ave, 33
d_max, 33
approximation of graphs, 41
boundary, 159
Cauchy’s Interlacing Theorem, 34
centered vector, 164
chromatic number, 35, 152
coloring, 35
conductance, 161
Courant-Fischer Theorem, 21
Laplacian, 5
lazy random walk, 4, 84
lazy walk matrix, 84
linear codes, 220
Loewner partial order, 39
norm of matrix, 65
normalized adjacency matrix, 84
normalized Laplacian, 87, 161
orthogonal matrix, 22
path graph, 6
Perron vector, 32
positive definite, 6
positive semidefinite, 6
Rayleigh quotient, 21
regular, 4
Schur complement, 101, 108
square root of a matrix, 102
vertex-induced subgraph, 34
walk matrix, 84