CS60050: Machine Learning

Autumn 2024

Sudeshna Sarkar
Clustering
K-Means
26-27 Sep 2024
Supervised learning vs. unsupervised learning
• Supervised learning: discover discriminative patterns in the data that relate the data attributes to a target (class) attribute.
• These patterns are then used to predict the values of the target attribute in future data instances.
• Unsupervised learning: the data have no target attribute.
• Discover the patterns that describe all of the data (generative patterns).
Clustering
Unsupervised learning: the data have no target attribute.
– Requires data, but no labels
– Detects patterns, e.g. in:
• Grouping emails or search results
• Customer shopping patterns
• Regions of images
– Useful when you don't know what you're looking for
– But: you can get gibberish
Applications
• Segmenting customers with similar market characteristics: pricing, loyalty, spending behaviors, etc.
• Grouping products based on their properties
• Identifying customers with similar energy-use profiles, where x = a time series of energy usage
• Clustering weblog data to discover groups of similar access patterns
• Recognizing communities in social networks
• Finding the top 20 topics on Twitter
Clustering Algorithms
Clustering Examples
Customer Segmentation
• Group customers based on their demographics and activity:
• Purchase history
• Demographics
• Content engagement
• Behavior
• Customer lifecycle stage
• Cater to customer groups for promotion, recommendation, and product development strategies
INCOME  SPEND
233     150
250     187
204     172
236     178
354     163
192     148
294     153
263     173
199     162
168     174
239     160
275     139
266     171
211     144

[Figure: "Customer Segments" scatter plot of Annual Spend vs. Annual Income]
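As an illustration of this example, here is a minimal scikit-learn sketch that clusters the income/spend table above. The choice of k = 3, the feature scaling, and the variable names are assumptions for illustration, not from the slides.

```python
# Minimal sketch: segmenting the income/spend table with scikit-learn's KMeans.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

data = np.array([
    [233, 150], [250, 187], [204, 172], [236, 178], [354, 163],
    [192, 148], [294, 153], [263, 173], [199, 162], [168, 174],
    [239, 160], [275, 139], [266, 171], [211, 144],
], dtype=float)  # columns: annual income, annual spend

X = StandardScaler().fit_transform(data)          # put both features on the same scale
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                             # cluster index for each customer
```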
Applications: News Clustering
Fundamental Aspects of clustering
• A clustering algorithm
• Partitional clustering
• Hierarchical clustering
•…
• A distance (similarity, or dissimilarity) function
• Euclidean, cosine, Mahalanobis
• Clustering quality
• Inter-cluster distance ⇒ maximized
• Intra-cluster distance ⇒ minimized
• The quality of a clustering result depends on the algorithm, the distance function, and the application.
How many clusters? An illustration
This data set has four natural clusters.
[Figure: scatter plot of the points in the X1–X2 plane, showing four natural clusters]
Aspects of clustering
Similarity / Distance Measures
Depends on the problem domain and data type:
• Customers
• Time series
• Text
• Images

Similarity or distance measures:
• Euclidean distance
• Manhattan distance
• Cosine similarity
• Pearson correlation
• …
Distance / Similarity measures
[Figure: two points x in the x–y plane, illustrating the distance/similarity between them]
Similarity measures
(Dis)similarity measures
Types of Clustering: Hard vs Soft
• Exclusive (Hard)
• Non-overlapping subsets
• Each item is a member of a single cluster

• Overlapping (Soft)
• Potentially overlapping subsets
• An item can simultaneously belong to multiple clusters
Challenges in Clustering
• Data is very large
• High-dimensional data space
• Data space is not Euclidean (e.g. NLP problems)
K-means Clustering
Clustering by Partitioning

K-means algorithm (MacQueen, 1967)

Given K:
1. Initialization: Randomly choose K data points (seeds) to be the initial cluster centres.
2. Cluster Assignment: Assign each data point to the closest cluster centre.
3. Move Centroid: Re-compute the cluster centres using the current cluster memberships.
4. If a convergence criterion is not met, go to 2.
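Below is a minimal NumPy sketch of the algorithm just listed (random seeds, assignment, centroid update, repeat until the assignments stop changing). The function and parameter names are illustrative, not from the slides.

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Plain K-means on an (n, d) array X; returns (centres, labels)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # 1. Initialization: pick k distinct data points as the initial centres.
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iters):
        # 2. Cluster assignment: each point goes to its closest centre (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # 4. Convergence check: stop when the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # 3. Move centroid: mean of the points currently assigned to each cluster.
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres, labels
```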
[Figures: 1. Random Initialization → 2. Cluster Assignment → 3. Move Centroid]
K-Means & Its Stopping Criterion
Given K:
1. Initialization: Randomly choose K data points (seeds) to be the initial cluster centres.
2. While not converged:
   I. Cluster Assignment: Assign each data point to the closest cluster centre.
   II. Move Centroid: Re-compute the cluster centres using the current cluster memberships.
Typical stopping criteria: the cluster assignments no longer change, the centroids move by less than a small threshold, the decrease in SSE falls below a threshold, or a maximum number of iterations is reached.
K-Means Optimization Objective
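For reference, a standard way to write the objective that K-means minimizes is the within-cluster sum of squared errors (SSE); the notation below is ours, not the slide's:

$$ J \;=\; \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2, \qquad \mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i $$

The cluster-assignment step minimizes J over the assignments with the centres fixed; the move-centroid step minimizes J over the centres with the assignments fixed.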

K-Means Convergence Property

Convergence of K-Means
Convergence of K-Means (summary)
Convergence Property
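A brief sketch of why the algorithm converges, in terms of the SSE objective J above (a standard argument, stated here in our own words):
• The cluster-assignment step never increases J, since each point is reassigned to its nearest centre.
• The move-centroid step never increases J, since the mean of a cluster minimizes the sum of squared distances to its points: $\mu_k = \arg\min_{\mu} \sum_{x_i \in C_k} \lVert x_i - \mu \rVert^2$.
• J is bounded below by 0 and there are only finitely many ways to partition n points into K clusters, so the assignments must eventually stop changing.
The fixed point reached this way is a local optimum of J, not necessarily the global one.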

K-means illustrated
Picking Cluster Seeds (Initial Values)
1. Lloyd's Method: Random initialization.
2. K-Means++: Iteratively construct a random sample with good spacing across the dataset.
Picking Cluster Seeds (Initial Values)
1. Lloyd's Method: Random initialization.
   May converge at a local optimum.
2. K-Means++: Iteratively construct a random sample with good spacing across the dataset.
To mitigate local optima with random initialization:
1. Perform multiple runs, each with a different set of randomly chosen seeds.
2. Select the configuration that gives the minimum SSE.
Picking Cluster Seeds (K-means++)
• Choose centres at random from the data points.
• Weight the probability of choosing each centre according to its squared distance from the closest centre already chosen.

In other words:
• Choose the 1st centre uniformly at random.
• Choose the 2nd centre far from the first (weighted by squared distance).
• Choose the 3rd centre far from the first and second.
• … and so on.
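A minimal NumPy sketch of the K-means++ seeding rule just described; the function name and the use of squared Euclidean distance are assumptions for illustration.

```python
import numpy as np

def kmeans_pp_seeds(X, k, seed=0):
    """K-means++ seeding: return k initial centres chosen from the rows of X."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # First centre: uniformly at random from the data points.
    centres = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from every point to its closest centre chosen so far.
        d2 = np.min(
            np.linalg.norm(X[:, None, :] - np.array(centres)[None, :, :], axis=2) ** 2,
            axis=1,
        )
        # Next centre: sampled with probability proportional to that squared distance.
        probs = d2 / d2.sum()
        centres.append(X[rng.choice(len(X), p=probs)])
    return np.array(centres)
```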
How to select K?
1: Use cross validation to select K.
   • What should we optimize?
2: Let the domain expert look at the clustering and decide.
3: The "knee" solution:
   • Plot the objective function values for different values of K.
   • "Knee finding" or "elbow finding".
[Figure: objective function vs. K for K = 1…6; figure from slide by Eamonn Keogh]
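A sketch of the knee/elbow procedure, assuming scikit-learn and matplotlib are available: run K-means for a range of K and plot the objective (SSE, exposed by scikit-learn as inertia_) against K.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def elbow_plot(X, k_values=range(1, 7)):
    """Plot SSE (inertia) against K to look for the 'knee'/'elbow'."""
    sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
           for k in k_values]
    plt.plot(list(k_values), sse, marker="o")
    plt.xlabel("K")
    plt.ylabel("Objective function (SSE)")
    plt.show()
```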


K-Means Time Complexity
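For reference, a standard back-of-the-envelope count (assuming Euclidean distance in d dimensions): each iteration computes n·K distances of cost O(d) for the assignment step plus O(nd) for the centroid update, so t iterations cost O(t · K · n · d). In practice t is usually small, which is what makes K-means fast and scalable.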

K-Means Pros and Cons


K-means Getting Stuck (Varying K)
K-means is not able to properly cluster some data sets.
Changing the features or the distance function (kernel) may help.
Some bad cases for k-means
• Clusters may overlap
• Some clusters may be "wider" than others
• Clusters may not be linearly separable

Slide credit: CMU MLD Aarti Singh


Hierarchical Clustering
Hierarchical Algorithms

Agglomerative (bottom-up): Start with each point as a cluster. Clusters are combined based on their "closeness".
Divisive (top-down): Start with one cluster including all points and recursively split each cluster.
Types of hierarchical clustering
1. Divisive (top-down) hierarchical clustering
2. Agglomerative (bottom-up) hierarchical clustering

Slide credit: Min Zhang
Hierarchical Clustering: Example
1. C = {1},{2},{3},{4},{5},{6},{7}
2. C = {1,6},{2},{3},{4},{5},{7}
3. C = {1,6},{2,4},{3},{5},{7}
4. C = {1,6},{2,4},{3},{5,7}
5. C = {1,6},{2,4,5,7},{3}
6. C = {1,6,3},{2,4,5,7}
7. C = {1,6,3,2,4,5,7}
Dendrogram: Hierarchical Clustering
• Input set S
• Nodes represent subsets of S

Features of the tree:
• The root is S.
• The leaves are the individual elements of S.
• The internal nodes are defined as the union of their children.
Dendrogram: Definition


Hierarchical clustering
Cutting the dendrogram at different heights yields different numbers of clusters K.
[Figure: dendrogram cut at several heights, each cut giving a different K]
Example (leaf order: 1 4 5 8 3 2 9 6 7):
• Height = 0 → 9 clusters
• Height = 1 → 8 clusters
• Height = 2 → 7 clusters
• Height = 3 → 6 clusters
• Height = 4 → 5 clusters
• Height = 5 → 4 clusters
• Height = 6 → 3 clusters
• Height = 7 → 2 clusters
• Height = 8 → 1 cluster
Hierarchical Agglomerative clustering

Different definitions of the distance lead to different algorithms.


Distance Measures
Real variables:
• Euclidean
• Cosine
• Correlation
• Manhattan
• Minkowski
• Mahalanobis
• …
Discrete variables:
• Hamming
• Jaccard
• …
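For concreteness, a few of these measures written out in NumPy (Hamming and Jaccard shown for discrete/binary vectors). These are illustrative implementations, not the slides' notation.

```python
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    return np.sum(np.abs(x - y))

def minkowski(x, y, p=3):
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def mahalanobis(x, y, cov):
    d = x - y
    return np.sqrt(d @ np.linalg.inv(cov) @ d)

def cosine_similarity(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def hamming(x, y):
    # Fraction of positions at which the two discrete vectors differ.
    return np.mean(x != y)

def jaccard(x, y):
    # For binary (0/1) vectors: |intersection| / |union|.
    x, y = x.astype(bool), y.astype(bool)
    return np.sum(x & y) / np.sum(x | y)
```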
Linkage: Definition

Initialization
• Each individual point is taken as a cluster.
• Construct the distance/proximity matrix.
[Figure: points p1–p12 and the corresponding distance/proximity matrix]
Intermediate State
After some merging steps, we have some clusters (C1–C5) and the corresponding distance/proximity matrix.
[Figure: clusters C1–C5 and their distance/proximity matrix]
Intermediate State
Merge the two closest clusters (C2 and C5) and update the distance matrix.
[Figure: clusters C1–C5, with C2 and C5 about to be merged]
After Merging
Update the distance matrix: the distances from the new cluster C2 ∪ C5 to C1, C3, and C4 (marked "?") must be recomputed.
[Figure: clusters C1, C2 ∪ C5, C3, C4 and the partially updated distance matrix]
Closest Pair
• A few ways to measure the distance between two clusters:
• Single-link
  • Similarity of the most similar pair of points
• Complete-link
  • Similarity of the least similar pair of points
• Centroid
  • Clusters whose centroids (centers of gravity) are the most similar
• Average-link
  • Average similarity (e.g. cosine) between pairs of elements
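A small NumPy sketch of these four cluster-distance (linkage) definitions, using Euclidean distance between points (an assumption; the slide phrases some of them in terms of similarity).

```python
import numpy as np

def pairwise_dists(A, B):
    """Euclidean distances between every point of A (m, d) and every point of B (n, d)."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def single_link(A, B):      # distance of the closest (most similar) pair
    return pairwise_dists(A, B).min()

def complete_link(A, B):    # distance of the farthest (least similar) pair
    return pairwise_dists(A, B).max()

def average_link(A, B):     # average over all cross-cluster pairs
    return pairwise_dists(A, B).mean()

def centroid_link(A, B):    # distance between the clusters' centroids
    return np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))
```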
Distance between two clusters: single link
The distance between two clusters is the distance between the closest pair of points, one from each cluster.
It can result in long and thin clusters.
Single-link clustering: example
• Determined by one pair of points, i.e., by one link in the proximity graph.
[Figure: points 1–5 with nested single-link clusters]
Complete link method
Complete-link clustering: example
• The distance between clusters is determined by the two most distant points in the different clusters.
[Figure: points 1–5 with nested complete-link clusters]
Computational Complexity
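For reference, the standard analysis for agglomerative clustering: storing the full proximity matrix takes O(n²) space, and the naive algorithm performs n − 1 merges, each rescanning the matrix, for O(n³) time overall; with priority queues this drops to O(n² log n), and single link can be computed in O(n²). This is why hierarchical clustering is usually applied to moderately sized data sets.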

Average Link Clustering
A compromise between single and complete link. Less susceptible to noise and outliers.
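A usage sketch with SciPy, assuming scipy and matplotlib are available: build the linkage matrix with single, complete, or average link, draw the dendrogram, and cut it to obtain a flat clustering. The toy data and the choice of 4 clusters are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.default_rng(0).normal(size=(20, 2))    # toy data: 20 points in 2-D

Z = linkage(X, method="average")     # also: "single", "complete", "centroid", "ward"
dendrogram(Z)                        # visualize the merge tree
plt.show()

labels = fcluster(Z, t=4, criterion="maxclust")       # cut to get (at most) 4 flat clusters
print(labels)
```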
