Clustering-Part1
1
What is Cluster Analysis?
• Given a set of objects, place them in groups such that the objects in
a group are similar (or related) to one another and different from (or
unrelated to) the objects in other groups
Intra-cluster distances are minimized; inter-cluster distances are maximized.
2
Applications of Cluster Analysis
• Understanding
• Group related documents for
browsing, group genes and proteins
that have similar functionality, or
group stocks with similar price
fluctuations
• Summarization
• Reduce the size of large data sets
Clustering precipitation
in Australia
Examples of Clustering Applications
• Marketing: Help marketers discover distinct groups in their
customer bases, and then use this knowledge to develop targeted
marketing programs
• Land use: Identification of areas of similar land use in an earth
observation database
• Insurance: Identifying groups of motor insurance policy holders with
a high average claim cost
• City-planning: Identifying groups of houses according to their house
type, value, and geographical location
• Earthquake studies: Observed earthquake epicenters should be clustered along continental faults
4
Requirements of Clustering in Data Mining
• Scalability
• Ability to deal with different types of attributes
• Discovery of clusters with arbitrary shape
• Minimal requirements for domain knowledge to determine input
parameters
• Ability to deal with noise and outliers
• Insensitive to order of input records
• High dimensionality
• Incorporation of user-specified constraints
• Interpretability and usability
5
Notion of a Cluster can be Ambiguous
6
Types of Clustering
• A clustering is a set of clusters
• Partitional Clustering
• A division of data objects into non-overlapping subsets (clusters)
• Hierarchical clustering
• A set of nested clusters organized as a hierarchical tree
7
Partitional Clustering
8
Hierarchical Clustering
9
Other Distinctions Between Sets of Clusters
• Exclusive versus non-exclusive
• In non-exclusive clusterings, points may belong to multiple clusters.
• Can belong to multiple classes or could be ‘border’ points
• Fuzzy clustering (one type of non-exclusive)
• In fuzzy clustering, a point belongs to every cluster with some weight between 0 and 1
• Weights must sum to 1
• Probabilistic clustering has similar characteristics
• Partial versus complete
• In some cases, we only want to cluster some of the data
10
Types of Clusters
• Well-separated clusters
• Prototype-based clusters
• Contiguity-based clusters
• Density-based clusters
11
Types of Clusters: Well-Separated
• Well-Separated Clusters:
• A cluster is a set of points such that any point in a cluster is closer (or more
similar) to every other point in the cluster than to any point not in the
cluster.
3 well-separated clusters
12
Types of Clusters: Prototype-Based
• Prototype-based
• A cluster is a set of objects such that an object in a cluster is closer (more
similar) to the prototype or “center” of a cluster, than to the center of any
other cluster
• The center of a cluster is often a centroid, the average of all the points in the
cluster, or a medoid, the most “representative” point of a cluster
4 center-based clusters
13
Types of Clusters: Contiguity-Based
• Contiguous Cluster (Nearest neighbor or Transitive)
• A cluster is a set of points such that a point in a cluster is closer (or more
similar) to one or more other points in the cluster than to any point not in
the cluster.
8 contiguous clusters
14
Types of Clusters: Density-Based
• Density-based
• A cluster is a dense region of points, separated from other regions of high density by regions of low density.
• Used when the clusters are irregular or intertwined, and when noise and
outliers are present.
6 density-based clusters
15
Types of Clusters: Objective Function
• Clusters Defined by an Objective Function
• Finds clusters that minimize or maximize an objective function.
• Enumerate all possible ways of dividing the points into clusters and evaluate the 'goodness' of each potential set of clusters by using the given objective function. (NP-hard)
• Can have global or local objectives.
• Hierarchical clustering algorithms typically have local objectives
• Partitional algorithms typically have global objectives
• A variation of the global objective function approach is to fit the data to a
parameterized model.
• Parameters for the model are determined from the data.
• Mixture models assume that the data is a 'mixture' of a number of statistical distributions (see the sketch below).
16
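To make the parameterized-model idea concrete, here is a minimal sketch that fits a two-component Gaussian mixture with scikit-learn. The library choice, the synthetic data, and all parameter values are assumptions for illustration, not part of the slides.

```python
# Minimal mixture-model sketch (illustrative only): fit a 2-component Gaussian
# mixture and read off soft (probabilistic) cluster memberships.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 2-D data drawn from two Gaussian blobs (made up for the example)
X = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(100, 2)),
               rng.normal([3.0, 3.0], 0.5, size=(100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
hard_labels = gmm.predict(X)         # hard cluster assignments
soft_weights = gmm.predict_proba(X)  # membership weights, one row per point, summing to 1
print(gmm.means_)                    # estimated component means (the model parameters)
```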
Characteristics of the Input Data Are Important
• Type of proximity or density measure
• Central to clustering
• Depends on data and application
17
What Is Good Clustering?
• A good clustering method will produce high
quality clusters with
• high intra-class similarity
• low inter-class similarity
• The quality of a clustering result depends on
both the similarity measure used by the
method and its implementation.
18
Clustering Algorithms
• K-means and its variants
• Hierarchical clustering
• Density-based clustering
19
Partitioning Clustering Approach
• Partitioning algorithms construct a partition of a database of N objects into a set of K clusters.
• The partitioning clustering algorithm usually adopts the iterative optimization paradigm.
• It starts with an initial partition and uses an iterative control strategy.
• It tries swapping data points between clusters to see whether such a swap improves the quality of the clustering.
• When no swap yields any improvement, a locally optimal partitioning has been found.
• In principle, the optimal partition is achieved by minimizing the sum of squared distances from each object to the "representative object" (center) of its cluster.
20
K-means algorithm
• Given the cluster number K, the K-means algorithm is carried out in three steps after initialization (a code sketch follows below):
• Initialization: set K seed points (randomly)
1. Assign each object to the cluster of the nearest seed point, measured with a specific distance metric
2. Compute new seed points as the centroids (means) of the current clusters
3. Repeat steps 1 and 2 until no assignment changes (convergence)
21
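A minimal NumPy sketch of these three steps follows. It is an illustration rather than the slides' own code: the function name, the random seeding, and the Euclidean distance choice are assumptions, and empty clusters are not handled.

```python
# Illustrative k-means sketch: random seeds, assign to nearest centroid,
# recompute centroids, repeat until the centroids stop moving.
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialization: pick k random data points as the seed centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Step 1: assign each object to the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: recompute each centroid as the mean of its assigned objects
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 3: stop when the centroids (and hence the assignments) no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```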
K-means - Example
• Problem:
• Suppose we have 4 types of medicines and each has two attributes (weight and pH index). Our goal is to group these objects into K = 2 groups of medicines (a run of the k-means sketch on this data follows below).

Medicine  Weight  pH-Index
A         1       1
B         2       1
C         4       3
D         5       4

(Figure: scatter plot of A, B, C, D in the weight / pH-index plane.)
22
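As a quick check, the kmeans() sketch above can be run on this data (variable names are ours; for this data set the iterations converge to grouping A and B together and C and D together).

```python
# Running the kmeans() sketch on the medicine data with K = 2
import numpy as np

X = np.array([[1, 1],   # A: weight 1, pH index 1
              [2, 1],   # B
              [4, 3],   # C
              [5, 4]])  # D
labels, centroids = kmeans(X, k=2)
print(labels)      # e.g. [0 0 1 1]: {A, B} and {C, D}
print(centroids)   # e.g. [[1.5 1. ] [4.5 3.5]]
```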
K-means - Example
• Step 1: Use initial seed points for partitioning
23
K-means - Example
• Step 2: Compute new centroids of the current partition
Knowing the members of each
cluster, now we compute the new
centroid of each group based on
these new memberships.
24
K-means - Example
• Step 2: Renew membership based on new centroids
25
K-means - Example
• Step 3: Repeat the first two steps until convergence
Knowing the members of each
cluster, now we compute the new
centroid of each group based on
these new memberships.
26
K-means - Example
• Step 3: Repeat the first two steps until convergence
27
Strengths of k-means
• Strengths:
• Simple: easy to understand and to implement
• Efficient: Time complexity: O(tkn), where n is the
number of data points, k is the number of clusters,
and t is the number of iterations.
28
Weaknesses of k-means
• The algorithm is only applicable if the mean is
defined.
• The user needs to specify k.
• The algorithm is sensitive to outliers
• Outliers are data points that are very far away
from other data points.
• Outliers could be errors in the data recording or
some special data points with very different
values.
29
Weaknesses of k-means: Problems with outliers
30
Weaknesses of k-means: To deal with outliers
• One method is to remove some data points
in the clustering process that are much
further away from the centroids than other
data points.
• To be safe, we may want to monitor these
possible outliers over a few iterations and then
decide to remove them.
• Another method is to perform random
sampling. Since in sampling we only choose
a small subset of the data points, the
chance of selecting an outlier is very small.
• Assign the rest of the data points to the clusters
by distance or similarity comparison, or
classification
31
Weaknesses of k-means
• The algorithm is sensitive to initial seeds.
32
Weaknesses of k-means
• If we use different seeds: good results
There are some
methods to help
choose good seeds
33
Weaknesses of k-means
• The k-means algorithm is not suitable for discovering clusters that
are not hyper-ellipsoids (or hyper-spheres).
34
The K-Medoids Clustering Method
• Find representative objects, called medoids, in clusters
• PAM (Partitioning Around Medoids)
• The algorithm is intended to find a sequence of objects
called medoids that are centrally located in clusters
• The goal of the algorithm is to minimize the average
dissimilarity of objects to their closest selected object.
• PAM works effectively for small data sets, but does not
scale well for large data sets
• CLARA
• CLARANS
35
PAM Partition Around Medoids
1) Pick k of the data items at random as the initial medoids
2) For each pair (m, h) of a medoid m and a non-medoid object h, calculate the impact of swapping them on clustering quality
3) Perform the swap with the smallest (most negative) cost, i.e. the pair with the best impact on clustering quality, and repeat; stop when no swap improves the clustering
36
Swapping Cost
• For each pair of a medoid m and a non-medoid object h, measure whether h is better than m as a medoid
• For example, we can use the squared-error criterion
• Compute E_h - E_m (the error after the swap minus the error before)
• Negative: swapping brings benefit
• Choose the swap with the minimum swapping cost (a code sketch follows below)
37
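The sketch below illustrates this swap evaluation (it is not the slides' code). It assumes a precomputed pairwise distance matrix D, uses the total distance of every object to its closest medoid as the error criterion, and returns the swap with the most negative cost change E_h - E_m.

```python
# Illustrative PAM swap-cost sketch, assuming a full pairwise distance matrix D
import numpy as np

def total_cost(D, medoids):
    # Sum, over all objects, of the distance to the closest medoid
    return D[:, list(medoids)].min(axis=1).sum()

def best_swap(D, medoids):
    medoids = list(medoids)
    non_medoids = [i for i in range(len(D)) if i not in medoids]
    E_m = total_cost(D, medoids)                 # error before any swap
    best = (0.0, None, None)                     # (cost change, medoid m, non-medoid h)
    for m in medoids:
        for h in non_medoids:
            candidate = [h if x == m else x for x in medoids]
            change = total_cost(D, candidate) - E_m   # E_h - E_m
            if change < best[0]:
                best = (change, m, h)
    return best   # a negative change means the swap improves the clustering
```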
K-medoids Example
Data points:
X1 (2, 6)    X6 (6, 4)
X2 (3, 4)    X7 (7, 3)
X3 (3, 8)    X8 (7, 4)
X4 (5, 7)    X9 (8, 5)
X5 (6, 2)    X10 (7, 6)

Assume k = 2. Select X5 and X9 as the initial medoids.

Distances:
Object  to X5  to X9
X1      8      7
X2      5      6
X3      9      8
X4      7      6
X6      2      3
X7      2      3
X8      3      2
X10     5      2
38
K-medoids Example
• So, now let us choose some other point to be a medoid instead of X5 (6, 2). Let us randomly choose X1 (2, 6).
• Now the new medoid set is: (2, 6) and (8, 5). Repeating the same task as earlier:

Replace X5 by X1:
Object  Before  to X1  to X9  Change
X1      7       0      0      -7
X2      5       3      6      -2
X3      8       3      8      -5
X4      6       4      6      -2
X5      0       8      5      +5
X6      2       6      3      +1
X7      2       8      3      +1
X8      2       7      2       0
X9      0       0      0       0
X10     2       5      2       0
Total change: -9

Current clustering: {X1, X2, X3, X4}, {X5, X6, X7, X8, X9, X10}
39
K-medoids Properties
40
CLARA (Clustering Large Applications)
• CLARA (Clustering Large
Applications) uses a
sampling-based method to deal
with large data sets
• A random sample should closely
represent the original data
• The chosen medoids will likely be
similar to what would have been
chosen from the whole data set
41
CLARA (Clustering Large Applications)
• Draw multiple samples of the data set
• Apply PAM to each sample
• Return the best clustering
42
CLARA Properties
43
CLARA - Algorithm
• Set mincost to MAXIMUM;
• Repeat q times // draws q samples
• Create S by drawing s objects randomly from D;
• Generate the set of medoids K from S by applying the
PAM algorithm;
• Compute cost(K,D)
• If cost(K, D)<mincost
Mincost = cost(K, D);
Bestset = K;
• Endif;
• Endrepeat;
• Return Bestset;
44
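A rough Python sketch of this pseudo-code is shown below for illustration. It reuses total_cost and best_swap from the PAM sketch earlier; the pam helper, the default sample size s, and the number of samples q are assumptions, not values from the slides.

```python
# Illustrative CLARA sketch: run PAM on q random samples and keep the medoid
# set with the lowest cost on the whole data set D (a full distance matrix).
import numpy as np

def pam(D, k, seed=0):
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(len(D), size=k, replace=False))
    while True:
        change, m, h = best_swap(D, medoids)     # from the PAM sketch above
        if change >= 0:                          # no improving swap: local optimum
            return medoids
        medoids[medoids.index(m)] = h

def clara(D, k, q=5, s=40, seed=0):
    rng = np.random.default_rng(seed)
    mincost, bestset = np.inf, None
    for _ in range(q):                           # draw q samples
        sample = rng.choice(len(D), size=min(s, len(D)), replace=False)
        sub_medoids = pam(D[np.ix_(sample, sample)], k)
        medoids = [sample[i] for i in sub_medoids]   # map sample indices back to D
        cost = total_cost(D, medoids)            # cost(K, D) on the full data set
        if cost < mincost:
            mincost, bestset = cost, medoids
    return bestset, mincost
```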
Complexity of CLARA
• Set mincost to MAXIMUM;   O(1)
• Repeat q times   O(t(s-k)²k + (n-k)k)
• Create S by drawing s objects randomly from D;   O(1)
• Generate the set of medoids K from S by applying the PAM algorithm;   O(t(s-k)²k)
• Compute cost(K, D)   O((n-k)k)
• If cost(K, D) < mincost   O(1)
  Mincost = cost(K, D);
  Bestset = K;
  Endif;
• Endrepeat;
• Return Bestset;
45
CLARANS (“Randomized” CLARA)
• CLARANS (A Clustering Algorithm based on
Randomized Search)
• The clustering process can be presented as searching a
graph where every node is a potential solution, that
is, a set of k medoids
• Two nodes are neighbours if their sets differ by only
one medoid
• Each node can be assigned a cost that is defined to be
the total dissimilarity between every object and the
medoid of its cluster
• The problem corresponds to searching for a minimum on the graph
• At each step, all neighbours of the current node are searched; the neighbour which corresponds to the deepest descent in cost is chosen as the next solution
46
CLARANS (“Randomized” CLARA)
• CLARANS (A Clustering Algorithm
based on Randomized Search)
• The clustering process can be
presented as searching a graph
where every node is a potential
solution, that is, a set of k medoids
• Graph Abstraction
• Every node is a potential solution
(k-medoid)
• Two nodes are adjacent if they differ
by one medoid
• Every node has k(n−k) adjacent nodes
47
CLARANS (“Randomized” CLARA)
• For large values of n and k, examining k(n-k) neighbours is time
consuming.
• At each step, CLARANS draws a sample of neighbours to examine.
• Note that CLARA draws a sample of nodes at the beginning of the search and stays within it; CLARANS therefore has the benefit of not confining the search to a restricted area.
• If a local optimum is found, CLARANS starts with a new randomly selected node in search of a new local optimum. The number of local optima to search for is a parameter.
• It is more efficient and scalable than both PAM and CLARA; returns
higher quality clusters.
48
CLARANS
(Figure: the CLARANS search. From each of numlocal randomly chosen starting nodes, the current node C is compared with no more than maxneighbor randomly selected neighbour nodes N, moving to any better neighbour found; when none is better, C is a local minimum. The best of the numlocal local minima is returned as the best node.)
49
CLARANS - Algorithm
• Set mincost to MAXIMUM;
• For i = 1 to numlocal do   // find numlocal local optima
• Randomly select a node as the current node C in the graph;
• J = 1;   // counter of examined neighbours
• Repeat
  Randomly select a neighbour N of C;
  If Cost(N, D) < Cost(C, D)
    Assign N as the current node C;
    J = 1;
  Else J++;
  Endif;
• Until J > maxneighbor
• Update mincost and bestnode with Cost(C, D) if applicable;
• End For
• Return bestnode;
(A code sketch follows below.)
50
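The sketch below mirrors this pseudo-code for illustration (again assuming a precomputed distance matrix D and reusing total_cost from the PAM sketch); the parameter names follow the slides' numlocal and maxneighbor.

```python
# Illustrative CLARANS sketch: numlocal random restarts, and at each current
# node at most maxneighbor randomly chosen neighbours are examined.
import numpy as np

def clarans(D, k, numlocal=2, maxneighbor=50, seed=0):
    rng = np.random.default_rng(seed)
    n = len(D)
    mincost, bestnode = np.inf, None
    for _ in range(numlocal):                    # find numlocal local optima
        current = list(rng.choice(n, size=k, replace=False))
        cost_c = total_cost(D, current)          # from the PAM sketch above
        j = 1
        while j <= maxneighbor:
            # A neighbour differs from the current node by exactly one medoid
            m = current[rng.integers(k)]
            h = int(rng.integers(n))
            while h in current:
                h = int(rng.integers(n))
            neighbour = [h if x == m else x for x in current]
            cost_n = total_cost(D, neighbour)
            if cost_n < cost_c:                  # move to the better neighbour
                current, cost_c, j = neighbour, cost_n, 1
            else:
                j += 1
        if cost_c < mincost:                     # keep the best local optimum
            mincost, bestnode = cost_c, current
    return bestnode, mincost
```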
Hierarchical Clustering
• Hierarchical Clustering Approach
• A typical clustering analysis approach via partitioning data set
sequentially
• Construct nested partitions layer by layer via grouping objects into
a tree of clusters (without the need to know the number of
clusters in advance)
• Use (generalised) distance matrix as clustering criteria
• Agglomerative vs Divisive
• Two sequential clustering strategies for constructing a tree of clusters
• Agglomerative: a bottom-up strategy
• Initially each data object is in its own (atomic) cluster
• Then merge these atomic clusters into larger and larger clusters
• Divisive: a top-down strategy
• Initially all objects are in one single cluster
• Then the cluster is subdivided into smaller and smaller clusters
51
Hierarchical Clustering
• Agglomerative approach
Initialization: each object is a cluster
Iteration: merge the two clusters which are most similar to each other;
until all objects are merged into a single cluster
(Diagram: a, b, c, d, e → ab, de → cde → abcde)
52
Hierarchical Clustering
• Divisive Approaches
Initialization: all objects stay in one cluster
Iteration: select a cluster and split it into two sub-clusters;
until each leaf cluster contains only one object
(Diagram: abcde → ab, cde → de → a, b, c, d, e)
53
Dendrogram
• A binary tree that shows how clusters are
merged/split hierarchically
• Each node on the tree is a cluster; each leaf
node is a singleton cluster
54
Dendrogram
• A clustering of the data objects is obtained by
cutting the dendrogram at the desired level,
then each connected component forms a cluster
55
Dendrogram
• A clustering of the data objects is obtained by
cutting the dendrogram at the desired level, then
each connected component forms a cluster
56
How to Merge Clusters?
• How to measure the distance between clusters?
Single-link
Complete-link
Average-link
Centroid distance
57
How to Define Inter-Cluster Distance
Single-link
Complete-link
Average-link
Centroid distance

Single-link: the distance between two clusters is represented by the distance of the closest pair of data objects belonging to different clusters.
58
How to Define Inter-Cluster Distance
Single-link
Complete-link
Average-link
Centroid distance

Complete-link: the distance between two clusters is represented by the distance of the farthest pair of data objects belonging to different clusters.
59
How to Define Inter-Cluster Distance
Single-link
Complete-link
Average-link
Centroid distance

Average-link: the distance between two clusters is represented by the average distance of all pairs of data objects belonging to different clusters.
60
How to Define Inter-Cluster Distance
Single-link
Complete-link
Average-link
Centroid distance

Centroid distance: the distance between two clusters is represented by the distance between the means m_i, m_j of the clusters C_i, C_j.
61
Cluster Distance Measures
Example: Given a data set of five objects characterized by a single continuous feature, assume that there are two clusters: C1 = {a, b} and C2 = {c, d, e}, with feature values a = 1, b = 2, c = 4, d = 5, e = 6.

Distance matrix:
    a  b  c  d  e
a   0  1  3  4  5
b   1  0  2  3  4
c   3  2  0  1  2
d   4  3  1  0  1
e   5  4  2  1  0

Single link:   dist(C1, C2) = d(b, c) = 2
Complete link: dist(C1, C2) = d(a, e) = 5
Average:       dist(C1, C2) = (3 + 4 + 5 + 2 + 3 + 4) / 6 = 3.5
(These values can be checked with the short computation below.)
62
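The three inter-cluster distances for this example can be verified with a few lines of Python (a small illustrative computation over the values above):

```python
# Single-, complete-, and average-link distances between C1 = {a, b} and
# C2 = {c, d, e} for the one-dimensional example above.
points = {'a': 1, 'b': 2, 'c': 4, 'd': 5, 'e': 6}
C1, C2 = ['a', 'b'], ['c', 'd', 'e']

pair_dists = [abs(points[p] - points[q]) for p in C1 for q in C2]
print(min(pair_dists))                      # single link   -> 2
print(max(pair_dists))                      # complete link -> 5
print(sum(pair_dists) / len(pair_dists))    # average link  -> 3.5
```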
Agglomerative Algorithm
• The Agglomerative algorithm is carried out in three steps:
1) Convert all object features into
a distance matrix
2) Set each object as a cluster
(thus if we have N objects, we
will have N clusters at the
beginning)
3) Repeat until number of cluster
is one (or known # of clusters)
▪ Merge two closest clusters
▪ Update “distance matrix”
63
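The three steps above can be illustrated with SciPy (a library choice the slides do not prescribe). The coordinates below are made up, but they are chosen so that D and F merge at distance 0.50 and A and B at 0.71, matching the dendrogram walk-through on a later slide.

```python
# Illustrative agglomerative clustering with SciPy: distance matrix, repeated
# merging of the two closest clusters (single link), and a dendrogram cut.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from scipy.spatial.distance import pdist

names = list('ABCDEF')
X = np.array([[1.0, 1.0],    # A   (assumed coordinates, for illustration)
              [1.5, 1.5],    # B
              [5.0, 5.0],    # C
              [3.0, 4.0],    # D
              [4.0, 4.0],    # E
              [3.0, 3.5]])   # F

dists = pdist(X, metric='euclidean')    # step 1: distance matrix (condensed form)
Z = linkage(dists, method='single')     # steps 2-3: merge the two closest clusters until one remains
tree = dendrogram(Z, labels=names, no_plot=True)   # dendrogram structure (plot it with matplotlib if desired)

clusters = fcluster(Z, t=2, criterion='maxclust')  # cut the dendrogram into 2 clusters
print(dict(zip(names, clusters)))
```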
Example
• Problem: clustering analysis with agglomerative algorithm
data matrix
Euclidean distance
distance matrix
64
Example
• Merge two closest clusters (iteration 1)
65
Example
• Update distance matrix (iteration 1)
66
Example
• Merge two closest clusters (iteration 2)
67
Example
• Update distance matrix (iteration 2)
68
Example
• Merge two closest clusters/update distance matrix (iteration 3)
69
Example
• Merge two closest clusters/update distance matrix (iteration 4)
70
Example
• Final result (meeting termination condition)
71
Example
• Dendrogram tree representation
1. In the beginning we have 6 clusters: A, B, C, D, E and F
2. We merge clusters D and F into cluster (D, F) at distance 0.50
3. We merge cluster A and cluster B into (A, B) at distance 0.71
(Dendrogram figure; the vertical axis shows the object lifetime.)
73
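Using the linkage matrix Z from the SciPy sketch shown after the agglomerative algorithm (with its assumed coordinates), the first two rows reproduce exactly these merges:

```python
# First two merges recorded in Z: [cluster i, cluster j, merge distance, size]
print(Z[:2])
# [[3.     5.     0.5    2.   ]   -> D and F merge at distance 0.50
#  [0.     1.     0.7071 2.   ]]  -> A and B merge at distance 0.71
```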
Hierarchical Clustering: Comparison
(Figure: the same six points clustered hierarchically with MIN (single link), MAX (complete link), and Group Average, showing the different nested clusterings each produces.)
74
Which Distance Measure is Better?
• Each method has both advantages and disadvantages;
application-dependent, single-link and complete-link
are the most common methods
• Single-link
• Can find irregular-shaped clusters
• Sensitive to outliers, suffers the so-called chaining effects
• In order to merge two groups, only need one pair of points to be
close, irrespective of all others. Therefore clusters can be too spread
out, and not compact enough
• Average-link and centroid distance
• Robust to outliers
• Tend to break large clusters
75
AGNES
• AGNES : Agglomerative Nesting
• Use single-link method
• Merge nodes that have the least dissimilarity
• Eventually all objects belong to the same cluster
76
UPGMA
• UPGMA: Un-weighted Pair-Group Method Average.
• Merge Strategy:
• Average-link approach
• The distance between two clusters is measured by the average distance
between two objects belonging to different clusters.
d_avg(C_i, C_j) = (1 / (n_i n_j)) · Σ_{p ∈ C_i} Σ_{q ∈ C_j} d(p, q)

where n_i, n_j are the numbers of objects in clusters C_i, C_j.
77
DIANA
• DIANA: Divisive Analysis
• First, all of the objects form one cluster.
• The cluster is split according to some principle, such as the minimum
Euclidean distance between the closest neighboring objects in the
cluster.
• The cluster splitting process repeats until, eventually, each new
cluster contains a single object, or a termination condition is met.
78
(Figure: splitting a cluster C into sub-clusters C1 and C2.)
4. Choose the object Ok with the greatest D score.
5. If Dk > 0, move Ok from C2 to C1, and repeat steps 3-5.
6. Otherwise, stop the splitting process.
79