CH-5 DM Classification

Classification is the process of organizing data into categories based on learning patterns from training data. Decision trees are a popular classification algorithm that organize attributes into a tree structure to predict an object's class. Decision trees work by splitting instances into branches based on attribute values and assigning a class label to leaf nodes.

Uploaded by

addis alemayhu

Classification

Classification: Definition
• Classification is the process of finding a model, which
describes and distinguishes data classes or concepts,
for the purpose of being able to use the model to
predict the class of objects whose class label is
unknown
• Generally, classification is a data mining technique
used to predict group membership for data instances.
• Given a collection of records (the training set), each
record contains a set of attributes, one of which
is the class
– Find a model for the class attribute as a function of the values
of the other attributes
Classification: Definition
• The goal of classification is to accurately
predict the target class for each case in the data
– For example, a classification model could be used
to identify loan applicants as low, medium, or high
credit risks
• A classification task begins with a data set in
which the class assignments are known.
– For example, a classification model that predicts
credit risk could be developed based on observed
data for many loan applicants over a period of time

Classification: Definition
– In addition to the historical credit rating, the data
might track employment history, home ownership
or rental, years of residence, number and type of
investments, and so on
– Credit rating would be the target, the other
attributes would be the predictors, and the data for
each customer would constitute a case
• The simplest type of classification problem is
binary classification
– In binary classification, the target attribute has
only two possible values: for example, high credit
rating or low credit rating
Classification: Definition
• In the model building (training) process, a classification
algorithm finds relationships between the values of the
predictors and the values of the target
– Different classification algorithms use different techniques for
finding relationships
– These relationships are summarized in a model, which can then be
applied to a different data set in which the class assignments are
unknown
• Classification models are tested by comparing the
predicted values to known target values in a set of test
data
• The historical data for a classification project is typically
divided into two data sets: one for building the model
(training data set); the other for testing the model (test
data set)
Illustrating Classification Task

Training Set
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Induction: Training Set → Learning algorithm → Learn Model → Model

Test Set
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Deduction: Test Set → Apply Model → predicted Class
Classification
• There are different classification algorithms, such as
– Decision tree,
– Naïve Bayes method,
– Bayesian belief network,
– Artificial neural network,
– Support vector machine, etc.
• Most classification algorithms involve two steps:
1. Model construction
2. Model usage
• Some other classification algorithms, such as the K-Nearest
Neighbor approach, do not require any model
April 27, 2024 Data Mining: Concepts and Techniques 7
Classification
1. Model construction
• Refers to describing a set of predetermined classes
using a training data set
• The training data is a set of tuples, where each
tuple/sample is assumed to belong to a predefined
class, as determined by the class label attribute
• The model is represented as classification rules,
decision trees, or mathematical formulae
2. Model usage:
• Refers to using the model for classifying future or
unknown objects


Classification Process (1): Model
Construction

Training Data → Classification Algorithms → Classifier (Model)

Training Data
NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Classifier (Model)
IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
Classification Process (2): Use
the Model in Prediction

Testing Data → Classifier; Unseen Data → Classifier

Testing Data
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen Data
(Jeff, Professor, 4): Tenured?
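The model-usage step can be sketched in a few lines of Python. This is only an illustration: the rule is the one shown on the model-construction slide, the rows are the testing data above, and the function names `predict_tenured` and `evaluate` are hypothetical helpers introduced here.

```python
# Testing data from the slide: (name, rank, years, actual tenured label)
TEST_DATA = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def predict_tenured(rank: str, years: int) -> str:
    """Apply the rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


def evaluate(rows):
    """Return (number of correct predictions, number of rows)."""
    correct = sum(
        predict_tenured(rank, years) == label for _, rank, years, label in rows
    )
    return correct, len(rows)


correct, total = evaluate(TEST_DATA)
print(f"{correct} of {total} test tuples classified correctly")
print("Jeff tenured?", predict_tenured("Professor", 4))
```

On this testing data the rule misclassifies Merlisa (7 years but not tenured), which is why a model is checked against test data before it is applied to unseen tuples such as Jeff.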
Metrics for Performance Evaluation…
                      PREDICTED CLASS
                      Class=Yes   Class=No
ACTUAL   Class=Yes    a (TP)      b (FN)
CLASS    Class=No     c (FP)      d (TN)

• Most widely-used metric:

$$\text{Accuracy} = \frac{a + d}{a + b + c + d} \times 100 = \frac{TP + TN}{TP + TN + FP + FN} \times 100$$
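As a small sketch of the accuracy metric, the function below takes the four cells of the confusion matrix under their usual names (TP, FN, FP, TN for a, b, c, d). The function name and the counts in the example are hypothetical, chosen only to show the arithmetic.

```python
def accuracy(tp: int, fn: int, fp: int, tn: int) -> float:
    """Percentage of tuples whose predicted class equals the actual class."""
    total = tp + fn + fp + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return 100.0 * (tp + tn) / total


# Hypothetical counts: 40 + 45 correct predictions out of 100 tuples.
print(accuracy(tp=40, fn=10, fp=5, tn=45))
```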
Classification by Decision Tree
Induction
• Decision tree induction is the learning of decision
trees from class-labeled training tuples
• A decision tree is a flow-chart-like tree structure,
where
– each internal node (nonleaf node) denotes a test
on an attribute,
– each branch represents an outcome of the test, and
– each leaf node (or terminal node) holds a class
label
• The topmost node in a tree is the root node
• Instances are classified starting at the root node
and sorted down the tree based on their feature values
How are decision trees used for classification?
• Given a tuple, X, for which the associated
class label is unknown,
– the attribute values of the tuple are tested against
the decision tree
– a path is traced from the root to a leaf node, which
holds the class prediction for that tuple
• Decision trees can easily be converted to
classification rules

Decision Tree

Decision tree classifier
• Decision tree performs classification by
constructing a tree based on training instances
with leaves having class labels
– The tree is traversed for each test instance to find a
leaf, and the class of the leaf is the predicted class
• Widely used learning method
• It has been applied to classify:
– medical patients by disease
– equipment malfunctions by cause
– loan applicants by likelihood of payment
Decision Trees
• Tree where internal nodes are simple decision rules on one or more
attributes and leaf nodes are predicted class labels; i.e. a Boolean
classifier for the input instance
Given an instance of an object or situation, which is specified by a
set of properties, the tree returns a "yes" or "no" decision about that
instance

Attribute_1
  value-1 → Attribute_2
    value-5 → Class2
    value-4 → Class3
  value-2 → Class1
  value-3 → Attribute_2
    value-6 → Class4
    value-7 → Class5
Algorithm for Decision Tree Induction
• Basic algorithm (a greedy algorithm, i.e., nonbacktracking)
– Tree is constructed in a top-down recursive divide-and-conquer
manner
– At start, all the training examples/tuples are at the root
– Attributes are categorical (if continuous-valued, they are
discretized in advance)
– Examples are partitioned recursively based on selected
attributes
– Optimal attributes are selected on the basis of a heuristic or
statistical measure (e.g., information gain)

• Conditions for stopping partitioning

– All samples (tuples) for a given node belong to the same class
– There are no remaining attributes on which the tuples may be
further partitioned
– There are no samples (tuples) left for a given branch
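The basic algorithm and its three stopping conditions can be sketched as a short recursive function. This is a minimal illustration, not a reference implementation: the rows are the weather data used in the worked example of these slides, information gain is the selection measure, and labelling a leaf by majority vote when attributes run out or a branch is empty is an assumption added here (the slides only state when to stop). The names `build_tree`, `gain`, `majority` and `DOMAINS` are hypothetical.

```python
from collections import Counter
from math import log2

ATTRS = ["outlook", "temperature", "humidity", "windy"]
# Each row: (outlook, temperature, humidity, windy, play_football)
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
# All values each attribute can take, keyed by column index.
DOMAINS = {i: sorted({r[i] for r in ROWS}) for i in range(len(ATTRS))}


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain(rows, idx):
    """Information gain of splitting rows on the attribute in column idx."""
    remainder = 0.0
    for value in {r[idx] for r in rows}:
        subset = [r[-1] for r in rows if r[idx] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder


def majority(rows):
    return Counter(r[-1] for r in rows).most_common(1)[0][0]


def build_tree(rows, attr_idxs):
    classes = {r[-1] for r in rows}
    if len(classes) == 1:          # stop: all tuples belong to the same class
        return classes.pop()
    if not attr_idxs:              # stop: no attributes left (assumed: majority vote)
        return majority(rows)
    best = max(attr_idxs, key=lambda i: gain(rows, i))  # greedy choice, never revisited
    remaining = [i for i in attr_idxs if i != best]
    branches = {}
    for value in DOMAINS[best]:
        subset = [r for r in rows if r[best] == value]
        # stop: no tuples for this branch (assumed: majority class of the parent)
        branches[value] = build_tree(subset, remaining) if subset else majority(rows)
    return {ATTRS[best]: branches}


print(build_tree(ROWS, [0, 1, 2, 3]))
```

The tree is returned as nested dictionaries of the form `{attribute: {value: subtree_or_class}}`, which keeps the sketch short and easy to print.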
Attribute Selection Measure
• An attribute selection measure is a heuristic for selecting the splitting
criterion that “best” separates a given data partition, D, of class-
labeled training tuples into individual classes.
• Attribute selection measures are also known as splitting rules because they determine how the
tuples at a given node are to be split
• The measure provides a ranking for each attribute describing the given training
tuples
• Decision tree induction employs an attribute selection measure such as
• Information gain
– Select the attribute with the highest information gain
• First, compute the disorder using entropy: the expected information needed to
classify objects into classes
• Second, measure the information gain: how much the disorder
of a set is reduced by knowing the value of a particular attribute
• GINI index
– An alternative to information gain that measures the impurity of the partitions an attribute produces in the
classification task
– Select the attribute with the smallest GINI value
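The slides do not give a formula for the GINI index, so the sketch below assumes the commonly used definition, one minus the sum of squared class proportions, with a split scored by the size-weighted average over its partitions. The function names and the example partitions are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Assumed definition: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_split(partitions):
    """Size-weighted GINI of the partitions produced by one attribute."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * gini(p) for p in partitions)


# Hypothetical node with 9 "yes" and 5 "no" tuples, split three ways.
node = ["yes"] * 9 + ["no"] * 5
split = [
    ["yes", "yes", "no", "no", "no"],
    ["yes", "yes", "yes", "yes"],
    ["yes", "yes", "yes", "no", "no"],
]
print(round(gini(node), 3), round(gini_split(split), 3))
```

A pure partition scores 0, so the attribute whose split gives the smallest weighted value is preferred.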
Entropy
• The entropy measures the disorder of a set S containing a total of n
examples, of which n+ are positive and n- are negative. The expected
information needed to classify a tuple in S is given by:
$$D(n_+, n_-) = -\frac{n_+}{n}\log_2\frac{n_+}{n} - \frac{n_-}{n}\log_2\frac{n_-}{n} = \text{Entropy}(S)$$
OR, for a data partition D with m classes,
$$\text{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$$
• where p_i is the probability that an arbitrary tuple in D belongs to
class C_i and is estimated by |C_{i,D}|/|D|
• A log function to the base 2 is used, because the information is
encoded in bits
Entropy
• How much more information would we still need (after the
partitioning) in order to arrive at an exact classification? This
amount is measured by
$$\text{Info}_A(D) = \sum_{i=1}^{v} \frac{|D_i|}{|D|} \times \text{Info}(D_i)$$
• Info_A(D) is the expected information required to classify a
tuple from D based on the partitioning by A, where A splits D into partitions D_1, ..., D_v
• Some useful properties of the entropy:
• D(n, m) = D(m, n)
• D(0, m) = 0
• D(S) = 0 means that all the examples in S have the same class
• D(m, m) = 1
• D(S) = 1 means that half the examples in S are of one class and half
are of the opposite class
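The properties listed above are easy to check numerically. The sketch below implements the two-class entropy D(n+, n-) directly; the function name `D` mirrors the slide notation, and treating 0 · log2(0) as 0 is the usual convention, stated here as an assumption.

```python
from math import isclose, log2


def D(n_pos: int, n_neg: int) -> float:
    """Two-class entropy in bits for n_pos positive and n_neg negative examples."""
    n = n_pos + n_neg
    result = 0.0
    for count in (n_pos, n_neg):
        if count:  # assumed convention: 0 * log2(0) contributes 0
            p = count / n
            result -= p * log2(p)
    return result


print(D(0, 5))                     # all examples in one class
print(D(4, 4))                     # evenly split
print(isclose(D(3, 7), D(7, 3)))   # symmetric in its arguments
```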
Information Gain
• Information gain is defined as the difference between the
original information requirement (i.e., based on just the
proportion of classes) and the new requirement (i.e.,
obtained after partitioning on A)
• The information gain measures the expected reduction in
entropy due to splitting on an attribute A
$$\text{GAIN}_{split} = \text{Entropy}(D) - \sum_{i=1}^{v} \frac{|D_i|}{|D|}\,\text{Entropy}(D_i)$$
The parent node, D, is split into v partitions;
|D_i| is the number of records in partition i
Example: Decision Tree for “play football or not”. Use the
weather training data set given below to construct a decision tree
Outlook   Temperature  Humidity  Windy  Play_football
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No
The Process of Constructing a Decision Tree
• Select an attribute to place at the root of the decision
tree and make one branch for every possible value
• Repeat the process recursively for each branch
• Which attribute should be placed at a certain node is
based on the information gained by placing a certain
attribute at this node
• In the weather data example, there are 9 instances for
which the decision to play_football is “yes” and
5 instances for which the decision to play_football
is “no”. Then, the expected information required to
know the result of the decision is
$$-\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940 \text{ bits}$$
Information Further Required If
“Outlook” Is Placed at the Root
Outlook
sunny: yes, yes, no, no, no
overcast: yes, yes, yes, yes
rainy: yes, yes, yes, no, no

Information further required:
$$\text{Info}_{outlook}(D) = \frac{5}{14}\times 0.971 + \frac{4}{14}\times 0 + \frac{5}{14}\times 0.971 = 0.693 \text{ bits}$$
Attribute Selection by Information Gain
• Class P: play_football = “yes”
• Class N: play_football = “no”
• E(P, N) = E(9, 5) = 0.940
• Compute the entropy for Outlook:

outlook   p_i  n_i  E(p_i, n_i)
sunny     2    3    0.971
overcast  4    0    0
rainy     3    2    0.971

$$E(outlook) = \frac{5}{14}E(2,3) + \frac{4}{14}E(4,0) + \frac{5}{14}E(3,2) = 0.69$$
Hence
$$Gain(outlook) = E(P, N) - E(outlook) = 0.25$$
Similarly
Gain(temperature) = 0.029
Gain(humidity) = 0.151
Gain(windy) = 0.048
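The gains above can be recomputed from the weather table with a short script. This is a sketch for checking the arithmetic; the helper names are hypothetical, and because it rounds rather than truncates, the last printed digit may differ slightly from the slide values (for example for humidity).

```python
from collections import Counter
from math import log2

ATTRS = ["outlook", "temperature", "humidity", "windy"]
# Each row: (outlook, temperature, humidity, windy, play_football)
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def entropy(labels):
    """E(P, N) generalised to any number of classes, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_after_split(rows, idx):
    """Weighted entropy of the partitions induced by the attribute in column idx."""
    total = 0.0
    for value in {r[idx] for r in rows}:
        subset = [r[-1] for r in rows if r[idx] == value]
        total += len(subset) / len(rows) * entropy(subset)
    return total


def gain(rows, idx):
    return entropy([r[-1] for r in rows]) - info_after_split(rows, idx)


print(f"E(P, N) = {entropy([r[-1] for r in ROWS]):.3f}")
print(f"E(outlook) = {info_after_split(ROWS, 0):.3f}")
for i, name in enumerate(ATTRS):
    print(f"Gain({name}) = {gain(ROWS, i):.3f}")
```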
The Strategy for Selecting an Attribute
to Place at a Node
• Select the attribute that gives us the largest
information gain
• In this example, it is the attribute “Outlook”

Outlook
sunny: 2 “yes”, 3 “no”
overcast: 4 “yes”
rainy: 3 “yes”, 2 “no”
The Recursive Procedure for Constructing a
Decision Tree
• The operation discussed above is applied to each
branch recursively to construct the decision tree
• For example, for the branch “Outlook = Sunny”, we
evaluate the information gained by applying each of
the remaining 3 attributes
– Gain(Outlook=sunny;Temperature) = 0.971 – 0.4 = 0.571
– Gain(Outlook=sunny;Humidity) = 0.971 – 0 = 0.971
– Gain(Outlook=sunny;Windy) = 0.971 – 0.951 = 0.02
The Recursive Procedure for Constructing a
Decision Tree
• Similarly, we also evaluate the information
gained by applying each of the remaining 3
attributes for the branch “Outlook = rainy”.
– Gain(Outlook=rainy;Temperature) = 0.971 – 0.951 = 0.02
– Gain(Outlook=rainy;Humidity) = 0.971 – 0.951 = 0.02
– Gain(Outlook=rainy;Windy) = 0.971 – 0 = 0.971
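The per-branch gains can be checked the same way by restricting the weather data to one Outlook value before computing the gain of each remaining attribute. The sketch below is illustrative only, and `branch_gains` is a hypothetical helper name.

```python
from collections import Counter
from math import log2

ATTRS = ["outlook", "temperature", "humidity", "windy"]
# Each row: (outlook, temperature, humidity, windy, play_football)
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain(rows, idx):
    remainder = 0.0
    for value in {r[idx] for r in rows}:
        subset = [r[-1] for r in rows if r[idx] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder


def branch_gains(rows, outlook_value):
    """Gain of each remaining attribute inside one Outlook branch."""
    branch = [r for r in rows if r[0] == outlook_value]
    return {ATTRS[i]: gain(branch, i) for i in (1, 2, 3)}


for value in ("sunny", "rainy"):
    gains = branch_gains(ROWS, value)
    best = max(gains, key=gains.get)
    print(value, {k: round(v, 3) for k, v in gains.items()}, "-> split on", best)
```

The overcast branch needs no further split because all four of its tuples are “yes”.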
Output: A Decision Tree for “play_football”
Outlook
  sunny → humidity
    high → no
    normal → yes
  overcast → yes
  rainy → windy
    false → yes
    true → no

Classification Rules
IF outlook= “sunny” & humidity= “high” THEN play_football = “no”
IF outlook= “sunny” & humidity= “normal” THEN play_football = “yes”
IF outlook= “overcast” THEN play_football = “yes”
IF outlook= “rainy” & windy= “false” THEN play_football = “yes”
IF outlook= “rainy” & windy= “true” THEN play_football = “no”
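To illustrate how a tree is traversed and how it maps onto IF-THEN rules, the sketch below encodes the “play_football” tree as nested dictionaries. The dictionary layout and the names `classify` and `extract_rules` are assumptions made for this example, not part of the slides.

```python
# Tree layout (assumed for this sketch): {attribute: {value: subtree_or_class_label}}
TREE = {
    "outlook": {
        "sunny": {"humidity": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"windy": {"false": "yes", "true": "no"}},
    }
}


def classify(tree, instance):
    """Follow the path from the root to a leaf and return its class label."""
    node = tree
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches[instance[attribute]]
    return node


def extract_rules(tree, conditions=()):
    """Return one (conditions, class) pair per root-to-leaf path."""
    if not isinstance(tree, dict):
        return [(conditions, tree)]
    attribute, branches = next(iter(tree.items()))
    rules = []
    for value, child in branches.items():
        rules.extend(extract_rules(child, conditions + ((attribute, value),)))
    return rules


for conditions, label in extract_rules(TREE):
    tests = " & ".join(f'{a} = "{v}"' for a, v in conditions)
    print(f'IF {tests} THEN play_football = "{label}"')

print(classify(TREE, {"outlook": "rainy", "humidity": "high", "windy": "true"}))
```

Each root-to-leaf path yields exactly one rule, so this tree produces the five rules listed above.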
Pros and Cons of decision trees
Pros
• Reasonable training time
• Fast application
• Easy to interpret
• Easy to implement
• Can handle large number of features
Cons
• Cannot handle complicated relationships between features
• Problems with lots of missing data

Why decision tree induction in data mining?

• Relatively faster learning speed (than other classification methods)
• Convertible to simple and easy-to-understand classification if-then-else rules
• Comparable classification accuracy with other methods
• Does not require any prior knowledge of data distribution, works well
on noisy data
Logarithmic Table
