Week 9


Learning Outcomes

• Understand the concept and applications of Unsupervised Learning
• Understand the concept of Partitional Clustering and apply it to the lab assignment and coursework
• K-Means
• Understand Hierarchical Clustering and draw Dendrograms
• Single-linkage clustering
• Complete-linkage clustering
• Average-linkage clustering
• Centroid method
• Combine K-means with hierarchical clustering
Supervised Learning
[Figure: known data (e.g., apple images) and known responses are used to train a model; the trained model is then tested on new data (e.g., apples and bananas).]
Supervised Learning
Supervised Learning is basically of two types:
• Classification
• Used when the target variable is categorical, i.e., has 2 or more classes (yes/no, true/false, apple/banana).
• Regression
• Used when there is a relationship between two or more variables in which a change in one variable is associated with a change in another variable.
Supervised Learning - Classification
Spam Filtering
[Figure: the model learns from labelled spam and non-spam emails by scanning their content, then performs a categorical separation of a new email into the spam or non-spam category.]
Supervised Learning - Regression
Weather Prediction
[Figure: the model learns the relationship between temperature and humidity (%) from past data, then predicts the humidity for new data.]
Supervised Learning Applications
Signature Recognition, Risk Assessment, Image Classification, Face Detection, Fraud Detection, Attack Detection, Visual Recognition, Spam Detection, Weather Forecasting
Unsupervised Learning

[Figure: known data without labels is fed to a model, which uses pattern recognition to produce a response.]
Unsupervised Learning
Unsupervised Learning is basically of two types
• Clustering
• A method of dividing the objects into clusters such that objects in a cluster
should be as similar as possible, and objects in different clusters should be as
dissimilar as possible.
• Association
• A method for discovering interesting relations between variables in large collections of data.
Unsupervised Learning - Clustering

[Figure: customers plotted by internet usage against total call duration form two clusters, A and B.]
A telecom service provider uses such clusters to provide personalized data and call plans to keep the customers.
Unsupervised Learning - Association
Customer 1 and Customer 2 both bought bread and milk, along with other items (fruit, corn, candy). If a new customer purchases bread, he is most likely to purchase milk as well.
Unsupervised Learning Applications
Delivery Optimization, Store Segmentation, Product Segmentation, Customer Segmentation, Market Research, Identification of Human Errors during Data Entry, Identifying Accident-Prone Areas, Similarity Detection, Recommendation Systems, Anomaly Detection, Search Engines
Summary of Classical Machine Learning
Machine Learning is divided into Supervised Learning (Classification, Regression) and Unsupervised Learning (Clustering, Association).

• Clustering is a technique for finding similarity groups in data, called clusters.


• Similar data instances in same cluster
• Dissimilar data in different clusters
• Clustering is an example of unsupervised learning
• No labels assigned to data points/instances
• Clustering algorithms find patterns in the given data
Types of Clustering
• Partitional Clustering: K-means, Fuzzy C-means
• Hierarchical Clustering: Agglomerative, Divisive
Types of Clustering
• Partitional Clustering (e.g., K-means): divide objects into clusters such that each object is in only one cluster, not several clusters (hard clustering).
Types of Clustering
• Partitional Clustering (e.g., Fuzzy C-means): divide objects into clusters such that an object can belong to more than one cluster (soft clustering).
Hierarchical Clustering
• Clusters have a tree-type structure.
• Hierarchical clustering is either Agglomerative or Divisive.

Hierarchical Agglomerative Clustering
[Figure: starting from the individual objects a, b, c, d, e, f, g, clusters are merged step by step — (de), (fg), (defg), (cdefg), (bcdefg) — until a single cluster (abcdefg) remains.]
Bottom-Up Approach: begin with each object as a separate cluster, and then merge them into larger clusters.
Hierarchical Clustering - Divisive
[Figure: starting from the single cluster (abcdefg), clusters are split step by step — (bcdefg), (cdefg), (defg), then (de) and (fg) — until each object a, b, c, d, e, f, g is its own cluster.]
Top-Down Approach: begin with all objects in one cluster, and then divide them into smaller clusters.
Distance Measure of K-means Clustering

• Distance measure is used to determine the similarity between two objects


• Distance measure influences the shape of the clusters
• Distance measures supported by K-means:
• Euclidean distance measure
• Squared Euclidean distance measure
• Manhattan distance measure
• Cosine distance measure
Distance Measure in K-means Clustering
1. Euclidean distance measure
• The Euclidean distance is a straight line; it is the distance between two points p and q in Euclidean space:
$d = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$
[Figure: the straight-line Euclidean distance between points P(x1, y1) and Q(x2, y2).]
Distance Measure in K-means Clustering
2. Squared Euclidean distance measure
• The squared Euclidean distance measure uses the same equation as the Euclidean distance measure, but without the square root:
$d = \sum_{i=1}^{n} (q_i - p_i)^2$
Distance Measure in K-means Clustering
3. Manhattan distance measure
• The Manhattan distance is the sum of the distances between two points measured along axes at right angles:
$d = \sum_{i=1}^{n} |q_i - p_i|$
[Figure: the Manhattan (city-block) path between points P(x1, y1) and Q(x2, y2), measured along the axes.]
Distance Measure in K-means Clustering
4. Cosine distance measure
• The cosine distance measures the angle between two vectors p and q:
$\cos\theta = \frac{\sum_{i=1}^{n} p_i q_i}{\sqrt{\sum_{i=1}^{n} p_i^2}\,\sqrt{\sum_{i=1}^{n} q_i^2}}$
• Strictly, this expression is the cosine similarity; the cosine distance is usually taken as $d = 1 - \cos\theta$.
[Figure: the angle between vectors p and q.]
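The four distance measures above can be written directly in code. The following is a minimal NumPy sketch (the library choice and the helper names such as euclidean and cosine_distance are illustrative assumptions, not part of the slides); cosine is written as 1 minus the cosine similarity, the usual convention for a cosine distance.

    import numpy as np

    def euclidean(p, q):
        # Straight-line distance between points p and q.
        p, q = np.asarray(p, float), np.asarray(q, float)
        return np.sqrt(np.sum((q - p) ** 2))

    def squared_euclidean(p, q):
        # Same as the Euclidean distance, but without the square root.
        p, q = np.asarray(p, float), np.asarray(q, float)
        return np.sum((q - p) ** 2)

    def manhattan(p, q):
        # Sum of absolute coordinate differences (city-block distance).
        p, q = np.asarray(p, float), np.asarray(q, float)
        return np.sum(np.abs(q - p))

    def cosine_distance(p, q):
        # 1 minus the cosine of the angle between the two vectors.
        p, q = np.asarray(p, float), np.asarray(q, float)
        cos_sim = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))
        return 1.0 - cos_sim

    p, q = [1, 2], [4, 6]
    print(euclidean(p, q))          # 5.0
    print(squared_euclidean(p, q))  # 25.0
    print(manhattan(p, q))          # 7.0
    print(cosine_distance(p, q))    # small value: the vectors point in similar directions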
Clustering Basics
• Clustering algorithm
• Partitional clustering
• Hierarchical clustering
• Distance function
• Decides which cluster is the nearest
• Can be the Euclidean distance
• Clustering quality depends on
• The algorithm
• The distance function
• The application
• Inter-cluster distance → maximized
• Intra-cluster distance → minimized
K-means Algorithm
• Partitional clustering
• Partitions the given data into k clusters.
• Each cluster has a cluster center, called centroid.
• k is specified by the user
• Each data point is a vector $X = \{x_1, x_2, \ldots, x_n\}$, i.e., n-dimensional data with n attributes
• Attributes could be weighted or non-weighted
K-means Algorithm
• User decides on the k value
• Given k:
1) Randomly choose k data points as the initial centroids (cluster centers)
2) Assign each data point to the closest centroid
3) Re-compute the centroids using the current cluster memberships.
4) If a convergence criterion is not met, go to step 2

• Different convergence criteria can be used


• Based on the application
• No further change in the centroid
K-means: an example
[Figures: with K = 3, centers are initialized randomly; each point is assigned to the nearest center and the centers are readjusted; the assign/readjust cycle repeats until an assignment step produces no changes, and the algorithm is done.]
K-means Clustering Algorithm
• Step 1
• Randomly select K cluster centroids; C is the set of all centroids:
$C = \{c_1, c_2, \ldots, c_k\}$
• Step 2
• Calculate the Euclidean distance from each data point x to the centroids and assign the data point to the one centroid with the minimum distance:
$\arg\min_{c_i \in C} d(x, c_i)^2$
• Step 3
• Calculate the new centroid for each cluster:
$c_i = \frac{1}{|S_i|} \sum_{x_j \in S_i} x_j$
where $c_i$ is the new centroid and $S_i$ is the set of all data points $x_j$ assigned to the $i$-th cluster
• Step 4
• Repeat Step 2 and Step 3 until the cluster assignments are stable
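As a concrete illustration of Steps 1–4, here is a minimal NumPy sketch of K-means, assuming the data is a NumPy array X with one data point per row; it is meant to mirror the steps above, not to be an optimized or production implementation.

    import numpy as np

    def kmeans(X, k, max_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        # Step 1: randomly select k data points as the initial centroids.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # Step 2: assign each point to the centroid with the minimum (squared Euclidean) distance.
            dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Step 3: the new centroid of each cluster is the mean of the points assigned to it.
            new_centroids = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                      else centroids[i] for i in range(k)])
            # Step 4: stop once the centroids (and hence the assignments) no longer change.
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        return labels, centroids

    # Example: three well-separated 2-D blobs.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in ([0, 0], [4, 0], [2, 3])])
    labels, centroids = kmeans(X, k=3)
    print(centroids)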
Strengths of K-means
• Strengths:
• Simple: easy to understand and to implement
• Efficient: Time complexity: O(tkn),
where n is the number of data points,
k is the number of clusters, and
t is the number of iterations.
• Since both k and t are usually small, K-means is considered a linear algorithm.
• K-means is the most popular clustering algorithm.
• It terminates at a local optimum
• The global optimum is hard to find.
Weakness of K-means
• The algorithm is only applicable if the mean is defined.
• For categorical data, k-modes is used, where the centroid is represented by the most frequent values.
• The user needs to specify k.
• The algorithm is sensitive to outliers
• Outliers are data points that are very far away from other data points.
• Outliers could be errors in the data recording or some special data points with
very different values
• The algorithm is very sensitive to the initial assignment of the centroids
Outliers in K-means
[Figures: the effect of an outlier on K-means clustering, and the desired output with the outlier marked separately.]
Sensitive to initial points in K-means
[Figures: an example clustering by K-means; what if the initial centroids are different? A different starting point produces another clustering.]
A small code sketch of this sensitivity follows.
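A small scikit-learn sketch of this sensitivity (the library and the synthetic data are assumptions, not from the slides): each run uses a single random initialization, so different seeds can converge to different local optima with different WSS (inertia) values; whether they actually differ depends on the data and the seeds.

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Six overlapping blobs make multiple local optima more likely.
    X, _ = make_blobs(n_samples=600, centers=6, cluster_std=1.5, random_state=7)

    for seed in range(5):
        # n_init=1: one random initialization per run, so the seed really matters.
        km = KMeans(n_clusters=6, init="random", n_init=1, random_state=seed).fit(X)
        print(f"seed={seed}: WSS (inertia) = {km.inertia_:.1f}")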
Discovering non hyper-ellipsoids
• K-means is not suitable for finding clusters that are not hyper-ellipsoids.
[Figure: K-means may cluster such data as shown; it cannot identify the two obvious clusters.]
A small code sketch of this failure mode follows.
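A hedged illustration of this failure mode using scikit-learn's two-moons data (an assumption; the slides do not specify a data set): the two interleaving half-circles are two obvious clusters, but they are not hyper-ellipsoids, so K-means typically cuts across them.

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_moons

    # Two interleaving half-circles: obvious clusters that are not hyper-ellipsoids.
    X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

    # Compare with the true grouping under the better of the two label matchings;
    # a noticeable fraction of points usually ends up in the "wrong" moon.
    mismatch = min((labels != y_true).mean(), (labels == y_true).mean())
    print(f"fraction of points assigned to the other moon: {mismatch:.2f}")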
How to Choose the Optimum Number of Clusters?

• The most well-known method: the Elbow method for determining the optimal number of clusters
• A heuristic used in determining the number of clusters in a data set
• Within-Cluster Sum of Squares (WSS)
• The sum of the squared Euclidean distances between each member of a cluster and its centroid, computed for each k:
$\mathrm{WSS} = \sum_{i=1}^{n} \lVert x_i - c_i \rVert^2$
where $x_i$ is a data point, $c_i$ is the centroid of its cluster, and $n$ is the total number of data points
Calculating WSS for a range of values for k
• Calculate the WSS for different values of k, from 1 to a maximum k
• Plot WSS vs. k
• Choose the k at which the decrease in WSS first starts to diminish
• the plot looks like an arm with a clear elbow at k = 3
[Figure: WSS (from about 40000 down towards 0) plotted against k = 1 to 10, with a clear elbow at k = 3.]
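The elbow plot above can be reproduced with a short scikit-learn/matplotlib sketch (the libraries and the synthetic blob data are assumptions, not from the slides); a fitted KMeans model exposes the WSS as inertia_.

    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=500, centers=3, random_state=42)

    ks = range(1, 11)
    wss = []
    for k in ks:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        wss.append(km.inertia_)  # inertia_ is the WSS for this k

    plt.plot(list(ks), wss, marker="o")
    plt.xlabel("k (number of clusters)")
    plt.ylabel("WSS")
    plt.title("Elbow method")
    plt.show()
    # With three true blobs, the curve should show a clear elbow near k = 3.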
Summary
• Despite its weaknesses, K-means is still the most popular clustering algorithm due to its simplicity and efficiency; other clustering algorithms have their own lists of weaknesses.
• There is no clear evidence that any other clustering algorithm performs better in general, although some may be more suitable for specific types of data or applications.
• Comparing different clustering algorithms is a difficult task. No one knows the correct clusters!
Hierarchical Clustering
• Hierarchy of clusters with a tree structure
• Useful for cases where there is a hierarchy of classes
• Hierarchical Clustering video (44 min): https://www.youtube.com/watch?v=9U4h6pZw6f8&feature=emb_rel_pause
[Figure: an example hierarchy of buildings — Commercial (Mall, Tradehub, Industrial, Hawker) and Residential (Condo, HDB, Landed).]


Types of hierarchical clustering
• Agglomerative (bottom up) clustering: It builds the dendrogram (tree) from
the bottom level, and
• merges the most similar (or nearest) pair of clusters
• stops when all the data points are merged into a single cluster (i.e., the root cluster).
• Divisive (top down) clustering: It starts with all data points in one cluster,
the root.
• Splits the root into a set of child clusters. Each child cluster is recursively divided
further
• stops when only singleton clusters of individual data points remain, i.e., each cluster
with only a single point
Dendrograms
• A dendrogram is a diagram representing a tree.
• It illustrates the nested sequence of clusters produced by hierarchical clustering.
[Figure: a dendrogram with merge levels 1–4 leading up to the final (root) cluster.]
Agglomerative Clustering
It is more popular than divisive methods.
• At the beginning, each data point forms a cluster (also called a node).
• Merge nodes/clusters that have the least distance.
• Go on merging
• Eventually all nodes belong to one cluster

• Time complexity: at least O(n²)


Calculating the distance between 2 clusters
• There are a few ways to measure the distance between two clusters, resulting in different variations of the algorithm:
• Single-linkage clustering
• Complete-linkage clustering
• Average-linkage clustering
• WPGMA (Weighted Pair Group Method with Arithmetic Mean)
• UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
• Centroid method
Single-Linkage Clustering
• In the beginning of the agglomerative clustering process, each
element is in a cluster of its own.
• The clusters are then sequentially combined into larger clusters, until
all elements end up being in the same cluster.
• At each step, the two clusters separated by the shortest distance are
combined.
• Distance between two clusters is the shortest distance between a pair
of elements from two clusters.

Source: https://en.wikipedia.org/wiki/Single-linkage_clustering
Single-Linkage Clustering
Working Example
• Five elements (a, b, c, d, e) and the following matrix D1 of pairwise distances between them:

      a    b    c    d    e
  a   0   16   20   35   25
  b  16    0   32   30   22
  c  20   32    0   28   39
  d  35   30   28    0   50
  e  25   22   39   50    0

• First Step
• The minimum entry of D1 is D1(a,b) = 16 → merge a and b into Cluster (a,b).
• Let u denote the node to which a and b are now connected. Setting δ(a,u) = δ(b,u) = D1(a,b)/2 ensures that elements a and b are equidistant from u.
• The branches joining a and b to u then have lengths δ(a,u) = δ(b,u) = D1(a,b)/2 = 16/2 = 8.
• The height of the dendrogram at the first step is 8.
Single-Linkage Clustering
Working Example
• Second Step
• D2((a,b),c) = min(D1(a,c), D1(b,c)) = min(20, 32) = 20
• D2((a,b),d) = min(D1(a,d), D1(b,d)) = min(35, 30) = 30
• D2((a,b),e) = min(D1(a,e), D1(b,e)) = min(25, 22) = 22
• Update the initial proximity matrix D1 to a new proximity matrix D2:

          (a,b)   c    d    e
  (a,b)     0    20   30   22
  c        20     0   28   39
  d        30    28    0   50
  e        22    39   50    0
Single-Linkage Clustering
Working Example
• Second Step (continued)
• D2((a,b),c) = 20 is the lowest value of D2, so we join Cluster (a,b) with element c → Cluster ((a,b),c).
• Let v denote the node to which (a,b) and c are now connected.
• δ(a,v) = δ(b,v) = δ(c,v) = D2((a,b),c)/2 = 20/2 = 10
• The height of the dendrogram at the second step is 10.
Single-Linkage Clustering
Working Example
• Third Step
• D3(((a,b),c),d) = min(D2((a,b),d), D2(c,d)) = min(30, 28) = 28
• D3(((a,b),c),e) = min(D2((a,b),e), D2(c,e)) = min(22, 39) = 22
• Update the proximity matrix D2 to a new proximity matrix D3:

              ((a,b),c)   d    e
  ((a,b),c)       0      28   22
  d              28       0   50
  e              22      50    0
Single-Linkage Clustering
Working Example
• Third Step (continued)
• D3(((a,b),c),e) = 22 is the lowest value of D3, so we join Cluster ((a,b),c) with element e → Cluster (((a,b),c),e).
• Let w denote the node to which ((a,b),c) and e are now connected.
• δ(a,w) = δ(b,w) = δ(c,w) = δ(e,w) = D3(((a,b),c),e)/2 = 22/2 = 11
• The height of the dendrogram at the third step is 11.
Single-Linkage Clustering
Working Example
• Final Step
• D4((((a,b),c),e),d) = min(D3(((a,b),c),d), D3(e,d)) = min(28, 50) = 28
• Update the proximity matrix D3 to a new proximity matrix D4:

                  (((a,b),c),e)   d
  (((a,b),c),e)        0         28
  d                   28          0
Single-Linkage Clustering
Working Example
• Final Step (continued)
• D4((((a,b),c),e),d) = 28 is the lowest (and only) value of D4, so we join Cluster (((a,b),c),e) with element d → Cluster ((((a,b),c),e),d).
• Let r denote the (root) node to which (((a,b),c),e) and d are now connected.
• δ(a,r) = δ(b,r) = δ(c,r) = δ(e,r) = δ(d,r) = D4((((a,b),c),e),d)/2 = 28/2 = 14
• The height of the dendrogram at the final step, from the root to a, b, c, d, e, is 14.
[Figure: the final dendrogram joins a and b at height 8, adds c at height 10, adds e at height 11, and adds d at the root at height 14.]
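The worked example can be checked with SciPy (an assumed library choice, not part of the slides). Note that SciPy reports the merge distances 16, 20, 22, 28, whereas the slides plot half of each merge distance as the branch length (8, 10, 11, 14); the shape of the dendrogram is the same.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, linkage
    from scipy.spatial.distance import squareform

    labels = ["a", "b", "c", "d", "e"]
    D1 = np.array([[ 0, 16, 20, 35, 25],
                   [16,  0, 32, 30, 22],
                   [20, 32,  0, 28, 39],
                   [35, 30, 28,  0, 50],
                   [25, 22, 39, 50,  0]], dtype=float)

    # linkage() expects a condensed distance matrix; method="single" gives single-linkage.
    Z = linkage(squareform(D1), method="single")
    print(Z[:, 2])  # merge distances: [16. 20. 22. 28.]

    dendrogram(Z, labels=labels)
    plt.ylabel("distance")
    plt.show()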
Complete-Linkage Clustering
1. At the beginning of the process, each element is in a cluster of its own.
2. The distance between two clusters is the longest distance between a pair of elements from the two clusters. Using the same D1 as above:
   D2((a,b),c) = max(D1(a,c), D1(b,c)) = max(20, 32) = 32
   D2((a,b),d) = max(D1(a,d), D1(b,d)) = max(35, 30) = 35
   D2((a,b),e) = max(D1(a,e), D1(b,e)) = max(25, 22) = 25
3. At each step, the two clusters separated by the shortest distance are combined. D2((a,b),e) = 25 is the lowest value of D2, so we join Cluster (a,b) with element e → Cluster ((a,b),e).
4. Repeat Step 2 and Step 3 so that the clusters are sequentially combined into larger clusters until all elements end up in the same cluster.
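The Step 2 values above can be verified with a few lines of NumPy (an assumed helper, not part of the slides): after a and b are merged, complete linkage takes the maximum of their two rows for every remaining element.

    import numpy as np

    labels = ["a", "b", "c", "d", "e"]
    D1 = np.array([[ 0, 16, 20, 35, 25],
                   [16,  0, 32, 30, 22],
                   [20, 32,  0, 28, 39],
                   [35, 30, 28,  0, 50],
                   [25, 22, 39, 50,  0]], dtype=float)

    # Rows 0 and 1 correspond to a and b; take the element-wise maximum for c, d, e.
    for j in range(2, 5):
        print(f"D2((a,b),{labels[j]}) = max({D1[0, j]:.0f}, {D1[1, j]:.0f}) "
              f"= {max(D1[0, j], D1[1, j]):.0f}")
    # D2((a,b),c) = max(20, 32) = 32
    # D2((a,b),d) = max(35, 30) = 35
    # D2((a,b),e) = max(25, 22) = 25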
Average Linkage Clustering and Centroid method

• Average-Linkage Clustering:
• A compromise between the sensitivity of complete-linkage clustering to outliers and the tendency of single-linkage clustering to form long chains that do not correspond to the intuitive notion of clusters as compact, spherical objects.
• In this method, the distance between two clusters is the average of all pair-wise distances between the data points in the two clusters.
• Centroid method: in this method, the distance between two clusters is the distance between their centroids.
Buckshot Algorithm
• Another way to get an efficient implementation:
• Cluster a sample, then assign the entire set.
• Buckshot combines Hierarchical Agglomerative Clustering (HAC) and K-means clustering.
• First randomly take a sample of √n instances.
• Run group-average HAC on this sample, which takes only O(n) time.
• Use the results of HAC as initial seeds for K-means.
• The overall algorithm is O(n) and avoids the problems of bad seed selection.
A code sketch of this combination follows.
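A minimal sketch of the Buckshot idea using scikit-learn (an assumed library; the function name buckshot and the synthetic data are illustrative): group-average HAC is run on a random sample of about √n points, and the resulting cluster means are used as the initial centroids for K-means on the full data set.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering, KMeans
    from sklearn.datasets import make_blobs

    def buckshot(X, k, seed=0):
        rng = np.random.default_rng(seed)
        n = len(X)
        # Cluster a random sample of about sqrt(n) points with group-average HAC.
        sample = X[rng.choice(n, size=int(np.sqrt(n)), replace=False)]
        hac_labels = AgglomerativeClustering(n_clusters=k, linkage="average").fit_predict(sample)
        # Use the HAC cluster means as the initial seeds for K-means on the full set.
        seeds = np.array([sample[hac_labels == i].mean(axis=0) for i in range(k)])
        return KMeans(n_clusters=k, init=seeds, n_init=1).fit_predict(X)

    X, _ = make_blobs(n_samples=2500, centers=4, random_state=3)
    labels = buckshot(X, k=4)
    print(np.bincount(labels))  # cluster sizes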
Summary
• Unsupervised Learning
• Concept and Applications
• Partitional Clustering
• K-means
• Distance Measure in K-means Clustering
• Elbow method
• Understand Hierarchical Clustering and draw Dendrogram
• Single-linkage clustering
• Buckshot Algorithm
• Combine HAC and K-means clustering
