Clustering Lecture
Distance Measure
In this algorithm, the similarity between instances is captured by a distance measure.
The originally proposed measure of distance is the Euclidean distance.
Given $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$:

$$d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
Instance    x      f(x)
1           1.0    1.5
2           1.0    4.5
3           2.0    1.5
4           2.0    3.5
5           3.0    2.5
6           5.0    6.0
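As a quick illustration (a minimal sketch of my own, not from the slides), the Euclidean distance between instances 1 and 2 in the table can be computed as follows:

```python
import math

# The six instances from the table: (x, f(x)) pairs.
instances = {
    1: (1.0, 1.5),
    2: (1.0, 4.5),
    3: (2.0, 1.5),
    4: (2.0, 3.5),
    5: (3.0, 2.5),
    6: (5.0, 6.0),
}

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Distance between instance 1 and instance 2: sqrt(0^2 + 3^2) = 3.0
print(euclidean(instances[1], instances[2]))  # 3.0
```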
[Figure: scatter plot of the six instances in the x–f(x) plane]
Three possible two-cluster outcomes and their squared errors:

Cluster Centers     Cluster Points     Squared Error
(2.67, 4.67)        2, 4, 6            14.50
(2.00, 1.83)        1, 3, 5
(1.50, 1.50)        1, 3               15.94
(2.75, 4.125)       2, 4, 5, 6
(1.80, 2.70)        1, 2, 3, 4, 5       9.60
(5.00, 6.00)        6
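As a check (my own sketch, assuming the `instances` dictionary and `euclidean` function from the earlier example), the squared error of the third outcome can be recomputed directly; it prints roughly 9.60:

```python
# Squared error of the third outcome: clusters {1,2,3,4,5} and {6}.
clusters = {
    (1.8, 2.7): [1, 2, 3, 4, 5],
    (5.0, 6.0): [6],
}

sse = sum(
    euclidean(center, instances[i]) ** 2
    for center, members in clusters.items()
    for i in members
)
print(round(sse, 2))  # 9.6
```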
General Considerations
Works best when the clusters in the data are of approximately equal size.
Attribute significance cannot be determined.
Lacks explanation capabilities.
Requires real-valued data. Categorical data can be converted to real values, but the distance function needs to be worked out carefully.
We must select the number of clusters present in the data.
Data normalization may be required if attribute ranges vary significantly (a sketch follows this list).
Alternative distance measures may generate different clusters.
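One common normalization option (my own sketch; the slides do not prescribe a specific method) is min-max normalization, which rescales each attribute to the range [0, 1]:

```python
def min_max_normalize(column):
    """Rescale a list of attribute values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant attribute: map everything to 0.0
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

print(min_max_normalize([1.0, 1.0, 2.0, 2.0, 3.0, 5.0]))
# [0.0, 0.0, 0.25, 0.25, 0.5, 1.0]
```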
K-means Clustering
Complexity is O(n * K * I * d), where n is the number of points, K the number of clusters, I the number of iterations, and d the number of attributes.
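The following is a minimal K-means sketch of my own (not the slides' reference implementation), written with NumPy and run on the six instances from the earlier table:

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """Plain K-means: returns (centroids, labels) for an (n, d) array of points."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct points at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return centroids, labels

# The six instances from the earlier table.
data = np.array([[1.0, 1.5], [1.0, 4.5], [2.0, 1.5],
                 [2.0, 3.5], [3.0, 2.5], [5.0, 6.0]])
centroids, labels = kmeans(data, k=2)
print(centroids, labels)
```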
[Figure: the original points, an optimal clustering, and a sub-optimal clustering of the same data]
[Figure: successive K-means iterations (iterations 2–6), showing cluster assignments and centroids being updated at each step]
$$\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}^2(m_i, x)$$

where $C_i$ is the $i$-th cluster and $m_i$ is its centroid (mean).
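Since K-means can land in a sub-optimal solution, a common remedy (my own sketch, assuming the `kmeans` function and `data` array from the earlier example) is to run it several times with different random seeds and keep the run with the lowest SSE:

```python
def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return float(((points - centroids[labels]) ** 2).sum())

# Several random restarts; keep the clustering with the smallest SSE.
best = min(
    (kmeans(data, k=2, seed=s) for s in range(10)),
    key=lambda cl: sse(data, *cl),
)
print(sse(data, *best))
```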
[Figure: K-means iterations (iterations 2–5) for a different choice of initial centroids, ending in the sub-optimal clustering]
If there are K "real" clusters, then the chance of selecting one initial centroid from each cluster is small.
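To make this concrete, here is the standard back-of-the-envelope calculation (under the simplifying assumption, not stated on the slide, that each of the K true clusters contains the same number n of points and the K initial centroids are drawn uniformly at random):

$$P(\text{one initial centroid in each cluster}) \approx \frac{K!\, n^{K}}{(Kn)^{K}} = \frac{K!}{K^{K}}$$

For K = 10 this is $10!/10^{10} \approx 0.00036$, so a random initialization almost never starts with one centroid in every true cluster.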
Hierarchical Clustering
Produces a set of nested clusters organized as a hierarchical tree.
Can be visualized as a dendrogram: a tree-like diagram that records the sequences of merges or splits.
[Figure: example of nested clusters and the corresponding dendrogram]
Strengths of Hierarchical Clustering
Do not have to assume any particular number of clusters.
Any desired number of clusters can be obtained by cutting the dendrogram at the proper level (see the sketch after this list).
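A minimal sketch (my own, using SciPy's hierarchical-clustering utilities, which the slides do not mention) of cutting a dendrogram to obtain a chosen number of clusters:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six 2-D points (the instances from the K-means example).
data = np.array([[1.0, 1.5], [1.0, 4.5], [2.0, 1.5],
                 [2.0, 3.5], [3.0, 2.5], [5.0, 6.0]])

# Build the full merge hierarchy with single linkage (MIN).
Z = linkage(data, method="single")

# "Cut" the dendrogram so that exactly 2 clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)   # cluster id (1 or 2) for each of the six points
```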
Hierarchical Clustering
Two main types of hierarchical clustering:
Agglomerative: start with the points as individual clusters and, at each step, merge the closest pair of clusters until only one cluster (or K clusters) remains.
Divisive: start with one all-inclusive cluster and, at each step, split a cluster until each cluster contains a single point (or there are K clusters).
Agglomerative Clustering Algorithm
More popular hierarchical clustering technique.
Basic algorithm is straightforward:
1. Compute the proximity matrix.
2. Let each data point be a cluster.
3. Repeat:
4. Merge the two closest clusters.
5. Update the proximity matrix.
6. Until only a single cluster remains.
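A from-scratch sketch of these six steps (my own illustrative code, using single-link / MIN as the cluster proximity; not the slides' implementation):

```python
import math

def single_link(c1, c2, points):
    """MIN proximity: distance between the two closest points across clusters."""
    return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

def agglomerative(points):
    """Return the sequence of merges performed, from n clusters down to 1."""
    clusters = [[i] for i in range(len(points))]    # step 2: every point is a cluster
    merges = []
    while len(clusters) > 1:                        # steps 3/6: repeat until one cluster
        # steps 1 & 4: find and merge the two closest clusters
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]], points),
        )
        merges.append((clusters[a], clusters[b]))
        clusters[a] = clusters[a] + clusters[b]     # step 5: proximities are recomputed
        del clusters[b]                             #         from the merged cluster
    return merges

points = [(1.0, 1.5), (1.0, 4.5), (2.0, 1.5), (2.0, 3.5), (3.0, 2.5), (5.0, 6.0)]
for left, right in agglomerative(points):
    print(left, "+", right)
```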
Starting Situation
Start with clusters of individual points (p1, p2, p3, p4, p5, ...) and a proximity matrix.
[Figure: individual points p1–p5 and their proximity matrix]
Intermediate Situation
After some merging steps, we have some clusters (C1, C2, C3, C4, C5).
[Figure: current clusters and the updated proximity matrix]
Intermediate Situation
We want to merge the two closest clusters (C2 and C5) and update the proximity matrix.
[Figure: clusters C1–C5 and their proximity matrix before the merge]
After Merging
The question is: how do we update the proximity matrix?
[Figure: proximity matrix after merging C2 and C5, with the entries for the new cluster C2 U C5 marked "?"]
How to Define Inter-Cluster Similarity?
MIN
MAX
Group Average
Distance Between Centroids
Other methods driven by an objective function (Ward's Method uses squared error)
(A code sketch of the first three criteria follows.)
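The first three criteria can be written down directly (a sketch with my own function names; the point-to-point distance is the Euclidean distance defined earlier):

```python
import math

def dist(p, q):
    """Euclidean point-to-point distance."""
    return math.dist(p, q)

def d_min(c1, c2):
    """MIN (single link): closest pair of points across the two clusters."""
    return min(dist(p, q) for p in c1 for q in c2)

def d_max(c1, c2):
    """MAX (complete link): farthest pair of points across the two clusters."""
    return max(dist(p, q) for p in c1 for q in c2)

def d_group_average(c1, c2):
    """Group average: mean of all pairwise distances across the two clusters."""
    return sum(dist(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))
```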
Hierarchical Clustering: MIN (Single Link)
Similarity matrix for the example:

      I1    I2    I3    I4    I5
I1  1.00  0.90  0.10  0.65  0.20
I2  0.90  1.00  0.70  0.60  0.50
I3  0.10  0.70  1.00  0.40  0.30
I4  0.65  0.60  0.40  1.00  0.80
I5  0.20  0.50  0.30  0.80  1.00
[Figure: nested clusters produced by MIN and the corresponding dendrogram]
Strength of MIN
Can handle non-elliptical shapes.
[Figure: original points and the two clusters found by MIN]
Limitations of MIN
Sensitive to noise and outliers.
[Figure: original points and the two clusters found by MIN]
Hierarchical Clustering: MAX (Complete Link)
Uses the same similarity matrix as above.
[Figure: nested clusters produced by MAX and the corresponding dendrogram]
Strength of MAX
Less susceptible to noise and outliers.
[Figure: original points and the two clusters found by MAX]
Limitations of MAX
Tends to break large clusters.
Biased towards globular clusters.
[Figure: original points and the two clusters found by MAX]
Group Average: the proximity of two clusters is the average of the pairwise proximities between points in the two clusters.

$$\mathrm{proximity}(\mathrm{Cluster}_i, \mathrm{Cluster}_j) = \frac{\displaystyle\sum_{p_i \in \mathrm{Cluster}_i} \sum_{p_j \in \mathrm{Cluster}_j} \mathrm{proximity}(p_i, p_j)}{|\mathrm{Cluster}_i| \times |\mathrm{Cluster}_j|}$$
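For instance (a worked example of my own, using the similarity matrix shown earlier), the group-average proximity between clusters $\{I_1, I_2\}$ and $\{I_4, I_5\}$ is

$$\frac{0.65 + 0.20 + 0.60 + 0.50}{2 \times 2} = \frac{1.95}{4} \approx 0.49.$$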
Hierarchical Clustering: Group Average
Uses the same similarity matrix as before.
[Figure: nested clusters produced by Group Average and the corresponding dendrogram]
Limitations of Group Average
Biased towards globular clusters.
Hierarchical Clustering: Comparison
[Figure: the same data set clustered with MIN, MAX, Group Average, and Ward's Method]