Week 9


Learning Outcomes

• Understand the concept and applications of Unsupervised Learning
• Understand the concept of Partitional Clustering and apply it to the lab assignment and coursework
• K-Means
• Understand Hierarchical Clustering and draw Dendrograms
• Single-linkage clustering
• Complete-linkage clustering
• Average-linkage clustering
• Centroid method
• Combine K-means with hierarchical clustering
Supervised Learning
[Figure: known data (e.g., apple images) and known responses are used to train a model; the trained model is then tested on new data (e.g., apples and bananas).]
Supervised Learning
Supervised Learning is basically of two types:
• Classification
• Used when the target variable is categorical, i.e., has 2 or more classes (yes/no, true/false, apple/banana).
• Regression
• Used when there is a relationship between two or more variables in which a change in one variable is associated with a change in another variable.
Supervised Learning - Classification
Spam Filtering
[Figure: the model learns from labelled spam and non-spam emails by scanning their content, then performs a categorical separation of a new email into the spam or non-spam category.]
Supervised Learning - Regression
Weather Prediction
[Figure: the model learns the relationship between temperature and humidity (%) from past data, then predicts the humidity for new data.]
Supervised Learning Applications
Signature Recognition, Risk Assessment, Image Classification, Face Detection, Fraud Detection, Attack Detection, Visual Recognition, Spam Detection, Weather Forecasting
Unsupervised Learning

[Figure: known data without labels is fed to a model, which uses pattern recognition to produce a response.]
Unsupervised Learning
Unsupervised Learning is basically of two types
• Clustering
• A method of dividing the objects into clusters such that objects in a cluster
should be as similar as possible, and objects in different clusters should be as
dissimilar as possible.
• Association
• A method for discovering interesting relations between variables in large collections of data.
Unsupervised Learning - Clustering

[Figure: customers plotted by internet usage against total call duration form two clusters, A and B.]
A telecom service provider uses such clusters to provide personalized data and call plans to keep the customers.
Unsupervised Learning - Association
Customer 1 and Customer 2 both bought bread and milk, along with other items (fruit, corn, candy). If a new customer purchases bread, he is most likely to purchase milk as well.
Unsupervised Learning Applications
Delivery Optimization, Store Segmentation, Product Segmentation, Customer Segmentation, Market Research, Identification of Human Errors during Data Entry, Identifying Accident-Prone Areas, Similarity Detection, Recommendation Systems, Anomaly Detection, Search Engines
Summary of Classical Machine Learning
Machine Learning is divided into Supervised Learning (Classification, Regression) and Unsupervised Learning (Clustering, Association).

• Clustering is a technique for finding similarity groups in data, called clusters.


• Similar data instances in same cluster
• Dissimilar data in different clusters
• Clustering is an example of unsupervised learning
• No labels assigned to data points/instances
• Clustering algorithms find patterns in the given data
Types of Clustering
• Partitional Clustering: K-means, Fuzzy C-means
• Hierarchical Clustering: Agglomerative, Divisive
Types of Clustering
• Partitional Clustering (e.g., K-means): divide objects into clusters such that each object is in only one cluster, not several clusters (hard clustering).
Types of Clustering
• Partitional Clustering (e.g., Fuzzy C-means): divide objects into clusters such that an object can belong to more than one cluster (soft clustering).
Hierarchical Clustering
• Clusters have a tree-type structure.
• Hierarchical clustering is either Agglomerative or Divisive.

Hierarchical Agglomerative Clustering
[Figure: starting from the individual objects a, b, c, d, e, f, g, clusters are merged step by step — (de), (fg), (defg), (cdefg), (bcdefg) — until a single cluster (abcdefg) remains.]
Bottom-Up Approach: begin with each object as a separate cluster, and then merge them into larger clusters.
Hierarchical Clustering - Divisive
[Figure: starting from the single cluster (abcdefg), clusters are split step by step — (bcdefg), (cdefg), (defg), then (de) and (fg) — until each object a, b, c, d, e, f, g is its own cluster.]
Top-Down Approach: begin with all objects in one cluster, and then divide them into smaller clusters.
Distance Measure of K-means Clustering

• Distance measure is used to determine the similarity between two objects


• Distance measure influences the shape of the clusters
• Distance measures supported by K-means:
• Euclidean distance measure
• Squared Euclidean distance measure
• Manhattan distance measure
• Cosine distance measure
Distance Measure in K-means Clustering
1. Euclidean distance measure
• The Euclidean distance is a straight line; it is the distance between two points p and q in Euclidean space:
$d = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$
[Figure: the straight-line Euclidean distance between points P(x1, y1) and Q(x2, y2).]
Distance Measure in K-means Clustering
2. Squared Euclidean distance measure
• The squared Euclidean distance measure uses the same equation as the Euclidean distance measure, but without the square root:
$d = \sum_{i=1}^{n} (q_i - p_i)^2$
Distance Measure in K-means Clustering
3. Manhattan distance measure
• The Manhattan distance is the sum of the distances between two points measured along axes at right angles:
$d = \sum_{i=1}^{n} |q_i - p_i|$
[Figure: the Manhattan (city-block) path between points P(x1, y1) and Q(x2, y2), measured along the axes.]
Distance Measure in K-means Clustering
4. Cosine distance measure
• The cosine distance measures the angle between two vectors p and q:
$\cos\theta = \frac{\sum_{i=1}^{n} p_i q_i}{\sqrt{\sum_{i=1}^{n} p_i^2}\,\sqrt{\sum_{i=1}^{n} q_i^2}}$
• Strictly, this expression is the cosine similarity; the cosine distance is usually taken as $d = 1 - \cos\theta$.
[Figure: the angle between vectors p and q.]
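The four distance measures above can be written directly in code. The following is a minimal NumPy sketch (the library choice and the helper names such as euclidean and cosine_distance are illustrative assumptions, not part of the slides); cosine is written as 1 minus the cosine similarity, the usual convention for a cosine distance.

    import numpy as np

    def euclidean(p, q):
        # Straight-line distance between points p and q.
        p, q = np.asarray(p, float), np.asarray(q, float)
        return np.sqrt(np.sum((q - p) ** 2))

    def squared_euclidean(p, q):
        # Same as the Euclidean distance, but without the square root.
        p, q = np.asarray(p, float), np.asarray(q, float)
        return np.sum((q - p) ** 2)

    def manhattan(p, q):
        # Sum of absolute coordinate differences (city-block distance).
        p, q = np.asarray(p, float), np.asarray(q, float)
        return np.sum(np.abs(q - p))

    def cosine_distance(p, q):
        # 1 minus the cosine of the angle between the two vectors.
        p, q = np.asarray(p, float), np.asarray(q, float)
        cos_sim = np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q))
        return 1.0 - cos_sim

    p, q = [1, 2], [4, 6]
    print(euclidean(p, q))          # 5.0
    print(squared_euclidean(p, q))  # 25.0
    print(manhattan(p, q))          # 7.0
    print(cosine_distance(p, q))    # small value: the vectors point in similar directions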
Clustering Basics
• Clustering algorithm
• Partitional clustering
• Hierarchical clustering
• Distance function
• Decides which cluster is the nearest
• Can be the Euclidean distance
• Clustering quality depends on
• The algorithm
• The distance function
• The application
• Inter-cluster distance → maximized
• Intra-cluster distance → minimized
K-means Algorithm
• Partitional clustering
• Partitions the given data into k clusters.
• Each cluster has a cluster center, called centroid.
• k is specified by the user
• Each data point is a vector $X = \{x_1, x_2, \ldots, x_n\}$, i.e., n-dimensional data with n attributes
• Attributes could be weighted or non-weighted
K-means Algorithm
• User decides on the k value
• Given k:
1) Randomly choose k data points as the initial centroids (cluster centers)
2) Assign each data point to the closest centroid
3) Re-compute the centroids using the current cluster memberships.
4) If a convergence criterion is not met, go to step 2

• Different convergence criteria can be used


• Based on the application
• No further change in the centroid
K-means: an example
[Figures: with K = 3, centers are initialized randomly; each point is assigned to the nearest center and the centers are readjusted; the assign/readjust cycle repeats until an assignment step produces no changes, and the algorithm is done.]
K-means Clustering Algorithm
• Step 1
• Randomly select K cluster centroids; C is the set of all centroids:
$C = \{c_1, c_2, \ldots, c_k\}$
• Step 2
• Calculate the Euclidean distance from each data point x to the centroids and assign the data point to the one centroid with the minimum distance:
$\arg\min_{c_i \in C} d(x, c_i)^2$
• Step 3
• Calculate the new centroid for each cluster:
$c_i = \frac{1}{|S_i|} \sum_{x_j \in S_i} x_j$
where $c_i$ is the new centroid and $S_i$ is the set of all data points $x_j$ assigned to the $i$-th cluster
• Step 4
• Repeat Step 2 and Step 3 until the cluster assignments are stable
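As a concrete illustration of Steps 1–4, here is a minimal NumPy sketch of K-means, assuming the data is a NumPy array X with one data point per row; it is meant to mirror the steps above, not to be an optimized or production implementation.

    import numpy as np

    def kmeans(X, k, max_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        # Step 1: randomly select k data points as the initial centroids.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            # Step 2: assign each point to the centroid with the minimum (squared Euclidean) distance.
            dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Step 3: the new centroid of each cluster is the mean of the points assigned to it.
            new_centroids = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                      else centroids[i] for i in range(k)])
            # Step 4: stop once the centroids (and hence the assignments) no longer change.
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        return labels, centroids

    # Example: three well-separated 2-D blobs.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in ([0, 0], [4, 0], [2, 3])])
    labels, centroids = kmeans(X, k=3)
    print(centroids)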
Strengths of K-means
• Strengths:
• Simple: easy to understand and to implement
• Efficient: Time complexity: O(tkn),
where n is the number of data points,
k is the number of clusters, and
t is the number of iterations.
• Since both k and t are usually small, K-means is considered a linear algorithm.
• K-means is the most popular clustering algorithm.
• It terminates at a local optimum
• The global optimum is hard to find.
Weakness of K-means
• The algorithm is only applicable if the mean is defined.
• For categorical data, k-modes is used, where the centroid is represented by the most frequent values.
• The user needs to specify k.
• The algorithm is sensitive to outliers
• Outliers are data points that are very far away from other data points.
• Outliers could be errors in the data recording or some special data points with
very different values
• The algorithm is very sensitive to the initial assignment of the centroids
Outliers in K-means
[Figures: the effect of an outlier on K-means clustering, and the desired output with the outlier marked separately.]
Sensitive to initial points in K-means
[Figures: an example clustering by K-means; what if the initial centroids are different? A different starting point produces another clustering.]
A small code sketch of this sensitivity follows.
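A small scikit-learn sketch of this sensitivity (the library and the synthetic data are assumptions, not from the slides): each run uses a single random initialization, so different seeds can converge to different local optima with different WSS (inertia) values; whether they actually differ depends on the data and the seeds.

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Six overlapping blobs make multiple local optima more likely.
    X, _ = make_blobs(n_samples=600, centers=6, cluster_std=1.5, random_state=7)

    for seed in range(5):
        # n_init=1: one random initialization per run, so the seed really matters.
        km = KMeans(n_clusters=6, init="random", n_init=1, random_state=seed).fit(X)
        print(f"seed={seed}: WSS (inertia) = {km.inertia_:.1f}")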
Discovering non hyper-ellipsoids
• K-means is not suitable for finding clusters that are not hyper-ellipsoids.
[Figure: K-means may cluster such data as shown; it cannot identify the two obvious clusters.]
A small code sketch of this failure mode follows.
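A hedged illustration of this failure mode using scikit-learn's two-moons data (an assumption; the slides do not specify a data set): the two interleaving half-circles are two obvious clusters, but they are not hyper-ellipsoids, so K-means typically cuts across them.

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_moons

    # Two interleaving half-circles: obvious clusters that are not hyper-ellipsoids.
    X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

    # Compare with the true grouping under the better of the two label matchings;
    # a noticeable fraction of points usually ends up in the "wrong" moon.
    mismatch = min((labels != y_true).mean(), (labels == y_true).mean())
    print(f"fraction of points assigned to the other moon: {mismatch:.2f}")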
How to Choose the Optimum Number of Clusters?

• The most well-known method: the Elbow method for determining the optimal number of clusters
• A heuristic used in determining the number of clusters in a data set
• Within-Cluster Sum of Squares (WSS)
• The sum of the squared Euclidean distances between each member of a cluster and its centroid, computed for each k:
$\mathrm{WSS} = \sum_{i=1}^{n} \lVert x_i - c_i \rVert^2$
where $x_i$ is a data point, $c_i$ is the centroid of its cluster, and $n$ is the total number of data points
Calculating WSS for a range of values for k
• Calculate the WSS for different values of k, from 1 to a maximum k
• Plot WSS vs. k
• Choose the k at which the decrease in WSS first starts to diminish
• the plot looks like an arm with a clear elbow at k = 3
[Figure: WSS (from about 40000 down towards 0) plotted against k = 1 to 10, with a clear elbow at k = 3.]
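The elbow plot above can be reproduced with a short scikit-learn/matplotlib sketch (the libraries and the synthetic blob data are assumptions, not from the slides); a fitted KMeans model exposes the WSS as inertia_.

    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=500, centers=3, random_state=42)

    ks = range(1, 11)
    wss = []
    for k in ks:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        wss.append(km.inertia_)  # inertia_ is the WSS for this k

    plt.plot(list(ks), wss, marker="o")
    plt.xlabel("k (number of clusters)")
    plt.ylabel("WSS")
    plt.title("Elbow method")
    plt.show()
    # With three true blobs, the curve should show a clear elbow near k = 3.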
Summary
• Despite its weaknesses, K-means is still the most popular clustering algorithm due to its simplicity and efficiency; other clustering algorithms have their own lists of weaknesses.
• There is no clear evidence that any other clustering algorithm performs better in general, although some may be more suitable for specific types of data or applications.
• Comparing different clustering algorithms is a difficult task. No one knows the correct clusters!
Hierarchical Clustering
• Hierarchy of clusters with a tree structure
• Useful for cases where there is a hierarchy of classes
• Hierarchical Clustering video (44 min): https://www.youtube.com/watch?v=9U4h6pZw6f8&feature=emb_rel_pause
[Figure: an example hierarchy of buildings — Commercial (Mall, Tradehub, Industrial, Hawker) and Residential (Condo, HDB, Landed).]


Types of hierarchical clustering
• Agglomerative (bottom up) clustering: It builds the dendrogram (tree) from
the bottom level, and
• merges the most similar (or nearest) pair of clusters
• stops when all the data points are merged into a single cluster (i.e., the root cluster).
• Divisive (top down) clustering: It starts with all data points in one cluster,
the root.
• Splits the root into a set of child clusters. Each child cluster is recursively divided
further
• stops when only singleton clusters of individual data points remain, i.e., each cluster
with only a single point
Dendrograms
• A dendrogram is a diagram representing a tree.
• It illustrates the nested sequence of clusters produced by hierarchical clustering.
[Figure: a dendrogram with merge levels 1–4 leading up to the final (root) cluster.]
Agglomerative Clustering
It is more popular than divisive methods.
• At the beginning, each data point forms a cluster (also called a node).
• Merge nodes/clusters that have the least distance.
• Go on merging
• Eventually all nodes belong to one cluster

• Time complexity: at least O(n²)


Calculating the distance between 2 clusters
• There are a few ways to measure the distance between two clusters, resulting in different variations of the algorithm:
• Single-linkage clustering
• Complete-linkage clustering
• Average-linkage clustering
• WPGMA (Weighted Pair Group Method with Arithmetic Mean)
• UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
• Centroid method
Single-Linkage Clustering
• In the beginning of the agglomerative clustering process, each
element is in a cluster of its own.
• The clusters are then sequentially combined into larger clusters, until
all elements end up being in the same cluster.
• At each step, the two clusters separated by the shortest distance are
combined.
• Distance between two clusters is the shortest distance between a pair
of elements from two clusters.

Source: https://en.wikipedia.org/wiki/Single-linkage_clustering
Single-Linkage Clustering
Working Example
• Five elements (a, b, c, d, e) and the following matrix D1 of pairwise distances between them:

      a    b    c    d    e
  a   0   16   20   35   25
  b  16    0   32   30   22
  c  20   32    0   28   39
  d  35   30   28    0   50
  e  25   22   39   50    0

• First Step
• The minimum entry of D1 is D1(a,b) = 16 → merge a and b into Cluster (a,b).
• Let u denote the node to which a and b are now connected. Setting δ(a,u) = δ(b,u) = D1(a,b)/2 ensures that elements a and b are equidistant from u.
• The branches joining a and b to u then have lengths δ(a,u) = δ(b,u) = D1(a,b)/2 = 16/2 = 8.
• The height of the dendrogram at the first step is 8.
Single-Linkage Clustering
Working Example
• Second Step
• D2((a,b),c) = min(D1(a,c), D1(b,c)) = min(20, 32) = 20
• D2((a,b),d) = min(D1(a,d), D1(b,d)) = min(35, 30) = 30
• D2((a,b),e) = min(D1(a,e), D1(b,e)) = min(25, 22) = 22
• Update the initial proximity matrix D1 to a new proximity matrix D2:

          (a,b)   c    d    e
  (a,b)     0    20   30   22
  c        20     0   28   39
  d        30    28    0   50
  e        22    39   50    0
Single-Linkage Clustering
Working Example
• Second Step (continued)
• D2((a,b),c) = 20 is the lowest value of D2, so we join Cluster (a,b) with element c → Cluster ((a,b),c).
• Let v denote the node to which (a,b) and c are now connected.
• δ(a,v) = δ(b,v) = δ(c,v) = D2((a,b),c)/2 = 20/2 = 10
• The height of the dendrogram at the second step is 10.
Single-Linkage Clustering
Working Example
• Third Step
• D3(((a,b),c),d) = min(D2((a,b),d), D2(c,d)) = min(30, 28) = 28
• D3(((a,b),c),e) = min(D2((a,b),e), D2(c,e)) = min(22, 39) = 22
• Update the proximity matrix D2 to a new proximity matrix D3:

              ((a,b),c)   d    e
  ((a,b),c)       0      28   22
  d              28       0   50
  e              22      50    0
Single-Linkage Clustering
Working Example
• Third Step (continued)
• D3(((a,b),c),e) = 22 is the lowest value of D3, so we join Cluster ((a,b),c) with element e → Cluster (((a,b),c),e).
• Let w denote the node to which ((a,b),c) and e are now connected.
• δ(a,w) = δ(b,w) = δ(c,w) = δ(e,w) = D3(((a,b),c),e)/2 = 22/2 = 11
• The height of the dendrogram at the third step is 11.
Single-Linkage Clustering
Working Example
• Final Step
• D4((((a,b),c),e),d) = min(D3(((a,b),c),d), D3(e,d)) = min(28, 50) = 28
• Update the proximity matrix D3 to a new proximity matrix D4:

                  (((a,b),c),e)   d
  (((a,b),c),e)        0         28
  d                   28          0
Single-Linkage Clustering
Working Example
• Final Step (continued)
• D4((((a,b),c),e),d) = 28 is the lowest (and only) value of D4, so we join Cluster (((a,b),c),e) with element d → Cluster ((((a,b),c),e),d).
• Let r denote the (root) node to which (((a,b),c),e) and d are now connected.
• δ(a,r) = δ(b,r) = δ(c,r) = δ(e,r) = δ(d,r) = D4((((a,b),c),e),d)/2 = 28/2 = 14
• The height of the dendrogram at the final step, from the root to a, b, c, d, e, is 14.
[Figure: the final dendrogram joins a and b at height 8, adds c at height 10, adds e at height 11, and adds d at the root at height 14.]
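The worked example can be checked with SciPy (an assumed library choice, not part of the slides). Note that SciPy reports the merge distances 16, 20, 22, 28, whereas the slides plot half of each merge distance as the branch length (8, 10, 11, 14); the shape of the dendrogram is the same.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram, linkage
    from scipy.spatial.distance import squareform

    labels = ["a", "b", "c", "d", "e"]
    D1 = np.array([[ 0, 16, 20, 35, 25],
                   [16,  0, 32, 30, 22],
                   [20, 32,  0, 28, 39],
                   [35, 30, 28,  0, 50],
                   [25, 22, 39, 50,  0]], dtype=float)

    # linkage() expects a condensed distance matrix; method="single" gives single-linkage.
    Z = linkage(squareform(D1), method="single")
    print(Z[:, 2])  # merge distances: [16. 20. 22. 28.]

    dendrogram(Z, labels=labels)
    plt.ylabel("distance")
    plt.show()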
Complete-Linkage Clustering
1. At the beginning of the process, each element is in a cluster of its own.
2. The distance between two clusters is the longest distance between a pair of elements from the two clusters. Using the same D1 as above:
   D2((a,b),c) = max(D1(a,c), D1(b,c)) = max(20, 32) = 32
   D2((a,b),d) = max(D1(a,d), D1(b,d)) = max(35, 30) = 35
   D2((a,b),e) = max(D1(a,e), D1(b,e)) = max(25, 22) = 25
3. At each step, the two clusters separated by the shortest distance are combined. D2((a,b),e) = 25 is the lowest value of D2, so we join Cluster (a,b) with element e → Cluster ((a,b),e).
4. Repeat Step 2 and Step 3 so that the clusters are sequentially combined into larger clusters until all elements end up in the same cluster.
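The Step 2 values above can be verified with a few lines of NumPy (an assumed helper, not part of the slides): after a and b are merged, complete linkage takes the maximum of their two rows for every remaining element.

    import numpy as np

    labels = ["a", "b", "c", "d", "e"]
    D1 = np.array([[ 0, 16, 20, 35, 25],
                   [16,  0, 32, 30, 22],
                   [20, 32,  0, 28, 39],
                   [35, 30, 28,  0, 50],
                   [25, 22, 39, 50,  0]], dtype=float)

    # Rows 0 and 1 correspond to a and b; take the element-wise maximum for c, d, e.
    for j in range(2, 5):
        print(f"D2((a,b),{labels[j]}) = max({D1[0, j]:.0f}, {D1[1, j]:.0f}) "
              f"= {max(D1[0, j], D1[1, j]):.0f}")
    # D2((a,b),c) = max(20, 32) = 32
    # D2((a,b),d) = max(35, 30) = 35
    # D2((a,b),e) = max(25, 22) = 25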
Average Linkage Clustering and Centroid method

• Average-Linkage Clustering:
• A compromise between the sensitivity of complete-linkage clustering to outliers and the tendency of single-linkage clustering to form long chains that do not correspond to the intuitive notion of clusters as compact, spherical objects.
• In this method, the distance between two clusters is the average of all pair-wise distances between the data points in the two clusters.
• Centroid method: in this method, the distance between two clusters is the distance between their centroids.
Buckshot Algorithm
• Another way to get an efficient implementation:
• Cluster a sample, then assign the entire set.
• Buckshot combines Hierarchical Agglomerative Clustering (HAC) and K-means clustering.
• First randomly take a sample of √n instances.
• Run group-average HAC on this sample, which takes only O(n) time.
• Use the results of HAC as initial seeds for K-means.
• The overall algorithm is O(n) and avoids the problems of bad seed selection.
A code sketch of this combination follows.
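A minimal sketch of the Buckshot idea using scikit-learn (an assumed library; the function name buckshot and the synthetic data are illustrative): group-average HAC is run on a random sample of about √n points, and the resulting cluster means are used as the initial centroids for K-means on the full data set.

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering, KMeans
    from sklearn.datasets import make_blobs

    def buckshot(X, k, seed=0):
        rng = np.random.default_rng(seed)
        n = len(X)
        # Cluster a random sample of about sqrt(n) points with group-average HAC.
        sample = X[rng.choice(n, size=int(np.sqrt(n)), replace=False)]
        hac_labels = AgglomerativeClustering(n_clusters=k, linkage="average").fit_predict(sample)
        # Use the HAC cluster means as the initial seeds for K-means on the full set.
        seeds = np.array([sample[hac_labels == i].mean(axis=0) for i in range(k)])
        return KMeans(n_clusters=k, init=seeds, n_init=1).fit_predict(X)

    X, _ = make_blobs(n_samples=2500, centers=4, random_state=3)
    labels = buckshot(X, k=4)
    print(np.bincount(labels))  # cluster sizes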
Summary
• Unsupervised Learning
• Concept and Applications
• Partitional Clustering
• K-means
• Distance Measure in K-means Clustering
• Elbow method
• Understand Hierarchical Clustering and draw Dendrogram
• Single-linkage clustering
• Buckshot Algorithm
• Combine HAC and K-means clustering
