Unit 2 Notes
Programme : BCA
Semester : V
Subject Code : BCAT311
Subject : Operating System
Topic : Decision Tree, Naïve Bayes, Support Vector Machines Classifier, Rule-Based Classifier
Faculty : Ms. Shilpi Bansal
© Institute of Information Technology and Management, D-29, Institutional Area, Janakpuri, New Delhi-110058
List of Topics
Classification

Classification
[Figure: a classification algorithm learns a classifier (model) from the training data; the classifier is then applied to testing data and to unseen data.]

Example: the classifier learned from the tenure data below is applied to the unseen tuple (Jeff, Professor, 4) to answer "Tenured?"

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
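As an illustration (not part of the original slides), here is a minimal Python sketch of this workflow using scikit-learn and pandas; the one-hot encoding of RANK via get_dummies is our own choice.

# Minimal sketch of the workflow in the figure above:
# learn a classifier from training data, then apply it to an unseen tuple.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Training data (the tenure table above)
train = pd.DataFrame({
    "rank":    ["Assistant Prof", "Associate Prof", "Professor", "Assistant Prof"],
    "years":   [2, 7, 5, 7],
    "tenured": ["no", "no", "yes", "yes"],
})

X = pd.get_dummies(train[["rank", "years"]], dtype=int)   # one-hot encode RANK
y = train["tenured"]

clf = DecisionTreeClassifier().fit(X, y)                  # "Classification Algorithm" -> "Classifier"

# Unseen tuple (Jeff, Professor, 4): encode it the same way as the training data
jeff = pd.get_dummies(pd.DataFrame({"rank": ["Professor"], "years": [4]}), dtype=int)
jeff = jeff.reindex(columns=X.columns, fill_value=0)

print(clf.predict(jeff))   # e.g. ['yes'] -- Tenured?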
Decision Tree Induction: An Example

Training data set: buys_computer. The data set follows the style of Quinlan's ID3 "Playing Tennis" example.

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no

Resulting tree: the root tests age?; the <=30 branch tests student? (no → no, yes → yes), the 31…40 branch predicts yes, and the >40 branch tests credit_rating? (excellent → no, fair → yes).
Algorithm for Decision Tree Induction

Basic algorithm (a greedy algorithm)
  Tree is constructed in a top-down, recursive, divide-and-conquer manner
  At start, all the training examples are at the root
  Attributes are categorical (if continuous-valued, they are discretized in advance)
  Examples are partitioned recursively based on selected attributes
  Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)

Conditions for stopping partitioning
  All samples for a given node belong to the same class
  There are no remaining attributes for further partitioning (majority voting is used to label the leaf)
  There are no samples left

A minimal recursive sketch of this procedure is given below.
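The following is an illustrative ID3-style sketch of this greedy procedure, not taken from the slides; the list-of-dicts data format and the helper names are our own assumptions.

# Illustrative ID3-style induction: greedy, top-down, divide-and-conquer.
import math
from collections import Counter

def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def info_gain(rows, attr, target):
    total = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder

def build_tree(rows, attrs, target):
    classes = [r[target] for r in rows]
    # Stop: all samples at this node belong to the same class
    if len(set(classes)) == 1:
        return classes[0]
    # Stop: no remaining attributes -> majority voting labels the leaf
    if not attrs:
        return Counter(classes).most_common(1)[0][0]
    # Greedy step: pick the attribute with the highest information gain
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = build_tree(subset, [a for a in attrs if a != best], target)
    return tree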
Brief Review of Entropy
[Figure: entropy of a two-class distribution (m = 2) as a function of the class probability p, reaching its maximum of 1 bit at p = 0.5.]
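For reference, a small Python snippet (our own, not from the slides) evaluating the binary entropy curve:

# Binary (m = 2) entropy: H(p) = -p*log2(p) - (1-p)*log2(1-p)
import math

def binary_entropy(p):
    if p in (0.0, 1.0):          # define 0*log2(0) = 0
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:<4}  H(p) = {binary_entropy(p):.3f}")
# H(p) peaks at 1 bit when p = 0.5 and drops to 0 for pure nodes.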
Attribute Selection Measure: Information Gain (ID3/C4.5)

Select the attribute with the highest information gain.
Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}| / |D|.

Expected information (entropy) needed to classify a tuple in D:
    Info(D) = -Σ_{i=1}^{m} p_i log₂(p_i)

Information needed (after using A to split D into v partitions) to classify D:
    Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) × Info(D_j)

Information gained by branching on attribute A:
    Gain(A) = Info(D) - Info_A(D)
Attribute Selection: Information Gain

Class P: buys_computer = "yes" (9 tuples); Class N: buys_computer = "no" (5 tuples)

    Info(D) = I(9,5) = -(9/14) log₂(9/14) - (5/14) log₂(5/14) = 0.940

Splitting on age:

age     p_i  n_i  I(p_i, n_i)
<=30    2    3    0.971
31…40   4    0    0
>40     3    2    0.971

    Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

Here (5/14) I(2,3) means "age <=30" covers 5 out of the 14 samples, with 2 yes's and 3 no's. Hence

    Gain(age) = Info(D) - Info_age(D) = 0.246

Similarly, on the same training data as above:
    Gain(income) = 0.029
    Gain(student) = 0.151
    Gain(credit_rating) = 0.048

So age has the highest gain and is selected as the splitting attribute at the root. A short numeric check of these figures appears below.
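A quick verification of these numbers in Python (the helper name I() is our own):

# Verify the information-gain figures from the example (0.940, 0.694, 0.246).
import math

def I(*counts):
    """Expected information I(p, n, ...) for a node with the given class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

info_D = I(9, 5)                                                      # 0.940
info_age = (5/14) * I(2, 3) + (4/14) * I(4, 0) + (5/14) * I(3, 2)     # 0.694
gain_age = info_D - info_age                                          # 0.246

print(f"Info(D)     = {info_D:.3f}")
print(f"Info_age(D) = {info_age:.3f}")
print(f"Gain(age)   = {gain_age:.3f}")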
Computing Information-Gain for Continuous-Valued Attributes

Let attribute A be a continuous-valued attribute.
Must determine the best split point for A:
  Sort the values of A in increasing order
  Typically, the midpoint between each pair of adjacent values is considered as a possible split point:
  (a_i + a_{i+1})/2 is the midpoint between the values a_i and a_{i+1}
  The split point giving the maximum information gain (minimum expected information requirement) is chosen

C4.5 normalizes information gain by the split information of the attribute:
  GainRatio(A) = Gain(A) / SplitInfo(A)

A small sketch of the split-point search is given below.
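An illustrative Python sketch of the split-point search; the toy (value, label) data is made up for demonstration:

# Search for the best binary split point of a continuous attribute.
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def best_split_point(pairs):
    """pairs: list of (attribute_value, class_label). Returns (split, gain)."""
    pairs = sorted(pairs)                          # sort the values of A
    labels = [lab for _, lab in pairs]
    base = entropy(labels)
    best = (None, -1.0)
    for i in range(len(pairs) - 1):
        a, b = pairs[i][0], pairs[i + 1][0]
        if a == b:
            continue
        split = (a + b) / 2                        # midpoint between adjacent values
        left = [lab for v, lab in pairs if v <= split]
        right = [lab for v, lab in pairs if v > split]
        info_A = (len(left) / len(pairs)) * entropy(left) + \
                 (len(right) / len(pairs)) * entropy(right)
        gain = base - info_A
        if gain > best[1]:
            best = (split, gain)
    return best

data = [(25, "no"), (28, "no"), (30, "no"), (35, "yes"), (40, "yes"), (45, "yes")]
print(best_split_point(data))   # the midpoint 32.5 gives the maximal gain here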
Naïve Bayes Classifier: Training Dataset

The training data is the buys_computer table used above. Class-conditional probabilities are estimated by counting, e.g.:
    P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6

A worked sketch of the counting and classification is given below.
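A count-based Python sketch of this estimation and of a full naïve Bayes classification; the query tuple X is our own illustrative choice, not taken from the slides:

# Count-based Naïve Bayes on the buys_computer training data above.
from collections import Counter

data = [  # (age, income, student, credit_rating, buys_computer)
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31…40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31…40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no",  "excellent", "yes"),
    ("31…40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]
attrs = ["age", "income", "student", "credit_rating"]

def cond_prob(attr_index, value, cls):
    rows = [r for r in data if r[-1] == cls]
    return sum(1 for r in rows if r[attr_index] == value) / len(rows)

print(cond_prob(0, "<=30", "no"))    # 3/5 = 0.6, as on the slide

def classify(x):  # x: dict of attribute -> value
    prior = Counter(r[-1] for r in data)
    scores = {}
    for cls, n in prior.items():
        p = n / len(data)                         # P(C)
        for i, a in enumerate(attrs):
            p *= cond_prob(i, x[a], cls)          # P(x_i | C), conditional independence
        scores[cls] = p
    return max(scores, key=scores.get), scores

print(classify({"age": "<=30", "income": "medium",
                "student": "yes", "credit_rating": "fair"}))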
TYPES of Naïve Bayes
There are three types of Naïve Bayes model, described below:

Gaussian: The Gaussian model assumes that continuous features follow a normal distribution. If predictors take continuous values instead of discrete ones, the model assumes they are sampled from a Gaussian distribution.

Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e., deciding which category a particular document belongs to, such as sports, politics, education, etc. The classifier uses word frequencies as the predictors.

Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictors are independent Boolean variables, e.g., whether a particular word is present in a document or not. This model is also widely used for document classification tasks.

A usage sketch of all three variants is given below.
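A brief usage sketch of the three variants in scikit-learn; the toy arrays are made up purely for illustration:

# Quick sketch of the three Naïve Bayes variants in scikit-learn.
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])

# Gaussian: continuous features assumed normally distributed per class
X_cont = np.array([[1.0, 2.1], [0.9, 1.8], [3.2, 4.0], [3.0, 4.2]])
print(GaussianNB().fit(X_cont, y).predict([[3.1, 4.1]]))

# Multinomial: word-count features (document classification)
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 2], [0, 3, 1]])
print(MultinomialNB().fit(X_counts, y).predict([[0, 2, 1]]))

# Bernoulli: binary present/absent word indicators
X_bin = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bin, y).predict([[0, 1, 1]]))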
TOPIC
Support Vector Machines Classifier; Rule-Based Classifier

Rule-Based Classifier
• Rule: (Condition) → y
  – where
    • Condition is a conjunction of attribute tests
    • y is the class label
  – LHS: rule antecedent or condition
  – RHS: rule consequent
  – Examples of classification rules:
      (Blood Type = Warm) ∧ (Lay Eggs = Yes) → Birds
      (Taxable Income < 50K) ∧ (Refund = Yes) → Evade = No

(Example)
Name Blood Type Give Birth Can Fly Live in Water Class
human warm yes no no mammals
python cold no no no reptiles
salmon cold no no yes fishes
whale warm yes no yes mammals
frog cold no no sometimes amphibians
komodo cold no no no reptiles
bat warm yes yes no mammals
pigeon warm no yes no birds
cat warm yes no no mammals
leopard shark cold yes no yes fishes
turtle cold no no sometimes reptiles
penguin warm no no sometimes birds
porcupine warm yes no no mammals
eel cold no no yes fishes
salamander cold no no sometimes amphibians
gila monster cold no no no reptiles
platypus warm no no no mammals
owl warm no yes no birds
dolphin warm yes no yes mammals
eagle warm no yes no birds
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Application of Rule-Based Classifier
• A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Name          Blood Type  Give Birth  Can Fly  Live in Water  Class
hawk          warm        no          yes      no             ?
grizzly bear  warm        yes         no       no             ?

Rule R1 covers the hawk, so it is classified as a bird; rule R3 covers the grizzly bear, so it is classified as a mammal. A small sketch that applies these rules in code follows.
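An illustrative Python sketch that applies rules R1–R5 to these two instances; the dict-based rule representation is our own choice:

# Illustrative rule-based classifier applying R1-R5 above.
RULES = [
    ({"Give Birth": "no",  "Can Fly": "yes"},        "Birds"),       # R1
    ({"Give Birth": "no",  "Live in Water": "yes"},  "Fishes"),      # R2
    ({"Give Birth": "yes", "Blood Type": "warm"},    "Mammals"),     # R3
    ({"Give Birth": "no",  "Can Fly": "no"},         "Reptiles"),    # R4
    ({"Live in Water": "sometimes"},                 "Amphibians"),  # R5
]

def covers(condition, instance):
    """A rule covers an instance if the instance satisfies every attribute test."""
    return all(instance.get(attr) == value for attr, value in condition.items())

def classify(instance):
    """Return the classes of all rules that fire."""
    return [cls for condition, cls in RULES if covers(condition, instance)]

hawk = {"Blood Type": "warm", "Give Birth": "no", "Can Fly": "yes", "Live in Water": "no"}
bear = {"Blood Type": "warm", "Give Birth": "yes", "Can Fly": "no", "Live in Water": "no"}

print(classify(hawk))   # ['Birds']    -- R1 fires
print(classify(bear))   # ['Mammals']  -- R3 fires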
Rule Coverage and Accuracy
  Coverage of a rule: fraction of records that satisfy the antecedent of the rule
  Accuracy of a rule: fraction of covered records whose class matches the consequent

Example: (Status = Single) → No
  Coverage = 40%, Accuracy = 50%

A sketch of how these two quantities are computed is given below.
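A small Python sketch of the computation; the 10-record data set is hypothetical, chosen only so that the numbers come out to 40% and 50%:

# Coverage and accuracy of a rule over a record set.
records = [
    {"Status": "Single",   "Class": "No"},
    {"Status": "Married",  "Class": "No"},
    {"Status": "Single",   "Class": "No"},
    {"Status": "Married",  "Class": "No"},
    {"Status": "Divorced", "Class": "Yes"},
    {"Status": "Married",  "Class": "No"},
    {"Status": "Divorced", "Class": "No"},
    {"Status": "Single",   "Class": "Yes"},
    {"Status": "Married",  "Class": "No"},
    {"Status": "Single",   "Class": "Yes"},
]

antecedent = lambda r: r["Status"] == "Single"   # (Status = Single)
consequent = "No"                                #  -> No

covered = [r for r in records if antecedent(r)]
coverage = len(covered) / len(records)
accuracy = sum(1 for r in covered if r["Class"] == consequent) / len(covered)

print(f"Coverage = {coverage:.0%}, Accuracy = {accuracy:.0%}")   # Coverage = 40%, Accuracy = 50%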
Decision Trees vs. Rules
From trees to rules
  Easy: converting a tree into a set of rules
  One rule for each leaf:
    Antecedent contains a condition for every node on the path from the root to the leaf
    Consequent is the class assigned by the leaf
  Straightforward, but the rule set might be overly complex

A sketch of this leaf-to-rule conversion is given below.
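As an illustration (not from the slides), the sketch below walks a fitted scikit-learn decision tree and prints one rule per leaf; Iris is used only as a convenient built-in dataset:

# One rule per leaf: walk the tree from the root to each leaf, collecting the
# path conditions as the antecedent and the leaf class as the consequent.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
tree = clf.tree_

def leaf_rules(node=0, conditions=()):
    if tree.children_left[node] == -1:             # leaf node
        cls = iris.target_names[np.argmax(tree.value[node])]
        antecedent = " AND ".join(conditions) or "TRUE"
        print(f"IF {antecedent} THEN class = {cls}")
        return
    name = iris.feature_names[tree.feature[node]]
    thr = tree.threshold[node]
    leaf_rules(tree.children_left[node],  conditions + (f"{name} <= {thr:.2f}",))
    leaf_rules(tree.children_right[node], conditions + (f"{name} > {thr:.2f}",))

leaf_rules()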
Decision Trees vs. Rules
From rules to trees
• More difficult: transforming a rule set into a tree
– Tree cannot easily express disjunction between rules
• Example:
If a and b then x
If c and d then x
How does a rule-based classifier work?

Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
lemur          warm        yes         no       no             ?
turtle         cold        no          no       sometimes      ?
dogfish shark  cold        yes         no       yes            ?

The lemur triggers only rule R3, so it is classified as a mammal. The turtle triggers both R4 and R5. The dogfish shark triggers none of the rules.
• Exhaustive rules
  – There exists a rule for each combination of attribute values.
  – This ensures that every record is covered by at least one rule.
• The rule set R1–R5 above is not exhaustive (the dogfish shark is covered by no rule), and a record can also be covered by more than one rule, as with the turtle below (covered by both R4 and R5).

Name    Blood Type  Give Birth  Can Fly  Live in Water  Class
turtle  cold        no          no       sometimes      ?
Building Classification Rules: Sequential Covering

[Figure: the region covered by rule R2 shrinks after a new term is added to its antecedent.]

Selecting a test
  Goal: maximizing accuracy
  t: total number of instances covered by the rule
  p: positive examples of the class covered by the rule
  t − p: number of errors made by the rule
  Select the test that maximizes the ratio p/t (the rule's accuracy)

Growing a rule (the figures for this example are not reproduced in these notes):
  The initial rule isn't very accurate, getting only 4 right out of the 12 instances it covers, so it needs further refinement.
  [Figure: modified rule and the resulting data after adding a term.]
  Should we stop here? Perhaps. But if we are going for exact rules, no matter how complex they become, we refine further.
  [Figure: the rule after further refinement, and the final result.]

A minimal sequential-covering sketch is given below.
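A minimal PRISM-style sequential-covering sketch in Python; the toy weather data and the concrete implementation of the p/t growth criterion are our own illustrative choices:

# Grow one rule at a time for a target class by greedily adding the
# attribute=value test with the highest p/t, then remove the covered
# instances and repeat until no positive instances remain.
def covers(rule, row):
    return all(row[a] == v for a, v in rule.items())

def learn_one_rule(rows, attrs, target, cls):
    rule = {}
    candidates = list(attrs)
    while candidates:
        covered = [r for r in rows if covers(rule, r)]
        if covered and all(r[target] == cls for r in covered):
            break                                   # rule is exact; stop refining
        best, best_ratio = None, -1.0
        for a in candidates:
            for v in {r[a] for r in covered}:
                sub = [r for r in covered if r[a] == v]        # t = len(sub)
                p = sum(1 for r in sub if r[target] == cls)    # positives covered
                if sub and p / len(sub) > best_ratio:
                    best, best_ratio = (a, v), p / len(sub)
        if best is None:
            break
        rule[best[0]] = best[1]
        candidates.remove(best[0])
    return rule

def sequential_covering(rows, attrs, target, cls):
    rules, remaining = [], list(rows)
    while any(r[target] == cls for r in remaining):
        rule = learn_one_rule(remaining, attrs, target, cls)
        if not rule:
            break
        rules.append(rule)
        remaining = [r for r in remaining if not covers(rule, r)]
    return rules

data = [
    {"outlook": "sunny",    "windy": "no",  "play": "yes"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "rainy",    "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "yes", "play": "no"},
]
print(sequential_covering(data, ["outlook", "windy"], "play", "yes"))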