U4 ML Updated
• The logic behind a decision tree is easy to understand because the model
mirrors a tree-like structure.
Decision Tree Terminologies
• Root Node: The root node is where the decision tree starts. It represents the
entire dataset, which further gets divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes; the tree cannot be
split further after reaching a leaf node.
• Splitting: Splitting is the process of dividing a decision node/root node into
sub-nodes according to the given conditions.
• Branch/Sub-Tree: A sub-tree formed by splitting the tree.
• Pruning: Pruning is the process of removing unwanted branches from the
tree.
• Parent/Child node: A node that is divided into sub-nodes is called the parent
of those sub-nodes, and the sub-nodes are its children. (A minimal
data-structure sketch follows this list.)
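The terminology above maps directly onto a simple data structure. Below is a minimal sketch (the Node class and the job-offer branch values are illustrative assumptions, not from any particular library): a node holding a class label is a leaf, while a node holding a splitting attribute with child branches is a decision node, and each child is the root of a sub-tree.

```python
# Minimal illustration of decision tree terminology (illustrative sketch).
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    attribute: Optional[str] = None               # splitting attribute (root/decision node)
    label: Optional[str] = None                   # class label (leaf node)
    children: dict = field(default_factory=dict)  # branch value -> child Node (sub-tree)

# The root is the parent of its children; a child with a label is a leaf.
root = Node(attribute="Salary",
            children={"below expectation": Node(label="Declined offer"),
                      "meets expectation": Node(attribute="Distance from office")})
print(root.children["below expectation"].label)   # a leaf: "Declined offer"
```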
How does the Decision Tree algorithm Work?
• In a decision tree, to predict the class of a given record, the algorithm starts from the
root node of the tree. It compares the value of the root attribute with the corresponding
attribute of the record (from the real dataset) and, based on the comparison, follows the
branch and jumps to the next node.
• At the next node, the algorithm again compares the record's attribute value with those of
the sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree.
The complete process can be better understood using the algorithm below (a code
sketch follows the steps):
• Step-1: Begin the tree with the root node, say S, which contains the complete
dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
• Step-3: Divide S into subsets that contain the possible values of the best
attribute.
• Step-4: Generate the decision tree node that contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset created in
Step-3. Continue this process until a stage is reached where you cannot classify the
nodes any further; the final node is called a leaf node.
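The five steps translate into a short recursive procedure. The sketch below is an assumed, self-contained illustration (dict-based records, with Gini impurity standing in for the ASM), not a specific library's implementation:

```python
# Recursive decision tree construction following Steps 1-5 (illustrative sketch).
from collections import Counter

def gini(labels):
    # Gini impurity of a list of class labels (one possible ASM ingredient).
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_attribute(rows, attributes, target):
    # Step-2: pick the attribute whose split yields the lowest weighted impurity.
    def weighted_impurity(attr):
        return sum(len([r for r in rows if r[attr] == v]) / len(rows) *
                   gini([r[target] for r in rows if r[attr] == v])
                   for v in {r[attr] for r in rows})
    return min(attributes, key=weighted_impurity)

def build_tree(rows, attributes, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attributes:      # cannot classify further:
        return Counter(labels).most_common(1)[0][0]  # leaf node (majority label)
    best = best_attribute(rows, attributes, target)  # Step-2: ASM
    node = {best: {}}                                # Step-4: decision node
    for value in {r[best] for r in rows}:            # Step-3: split S into subsets
        subset = [r for r in rows if r[best] == value]
        rest = [a for a in attributes if a != best]
        node[best][value] = build_tree(subset, rest, target)  # Step-5: recurse
    return node

rows = [{"Outlook": "Sunny", "Play": "No"}, {"Outlook": "Overcast", "Play": "Yes"},
        {"Outlook": "Rainy", "Play": "Yes"}, {"Outlook": "Sunny", "Play": "No"}]
print(build_tree(rows, ["Outlook"], "Play"))
# e.g. {'Outlook': {'Sunny': 'No', 'Overcast': 'Yes', 'Rainy': 'Yes'}}
```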
How does the Decision Tree algorithm Work?
• Example: Suppose there is a candidate who has a job offer and wants
to decide whether he should accept the offer or not. To solve this
problem, the decision tree starts with the root node (the Salary attribute,
selected by ASM). The root node splits further into the next decision node
(distance from the office) and one leaf node, based on the
corresponding labels. The next decision node further splits into
one decision node (cab facility) and one leaf node. Finally, the
decision node splits into two leaf nodes (Accepted offer and Declined
offer).
[Diagram: decision tree for the job-offer example, splitting on Salary,
Distance from office, and Cab facility]
Attribute Selection Measures
• While implementing a decision tree, the main issue is how to select the best
attribute for the root node and for the sub-nodes. To solve this problem,
there is a technique called the Attribute Selection Measure (ASM). Using this
measure, we can easily select the best attribute for the nodes of the tree.
There are two popular ASM techniques (a small sketch of both follows this list):
• Information Gain(IG)
• Gini Index(GI)
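Both measures can be computed in a few lines. The helper names below are an assumed sketch, not a library API; the example numbers come from the Play dataset used later in this unit (10 Yes / 4 No overall, split on Outlook):

```python
# Entropy, Gini index, and information gain over lists of class labels (sketch).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini_index(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent_labels, child_label_lists):
    # IG = entropy(parent) - weighted average entropy of the children.
    n = len(parent_labels)
    weighted = sum(len(c) / n * entropy(c) for c in child_label_lists)
    return entropy(parent_labels) - weighted

parent = ["Yes"] * 10 + ["No"] * 4                 # 10 Yes, 4 No overall
children = [["Yes"] * 5,                           # Overcast: 5 Yes, 0 No
            ["Yes"] * 2 + ["No"] * 2,              # Rainy:    2 Yes, 2 No
            ["Yes"] * 3 + ["No"] * 2]              # Sunny:    3 Yes, 2 No
print(information_gain(parent, children))          # ~0.23
print(gini_index(parent))                          # ~0.41
```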
Information Gain
• ID3 (Iterative Dichotomiser 3) was developed in 1986 by Ross Quinlan. The algorithm
creates a multiway tree, finding for each node (i.e. in a greedy manner) the categorical
feature that will yield the largest information gain for categorical targets. Trees are grown
to their maximum size and then a pruning step is usually applied to improve the ability of
the tree to generalize to unseen data.
• C4.5 is the successor to ID3 and removed the restriction that features must be categorical
by dynamically defining a discrete attribute (based on numerical variables) that partitions
the continuous attribute value into a discrete set of intervals. C4.5 converts the trained
trees (i.e. the output of the ID3 algorithm) into sets of if-then rules. The accuracy of each
rule is then evaluated to determine the order in which they should be applied. Pruning is
done by removing a rule’s precondition if the accuracy of the rule improves without it.
• CART (Classification and Regression Trees) is very similar to C4.5, but it differs in that it
supports numerical target variables (regression) and does not compute rule sets. CART
constructs binary trees using the feature and threshold that yield the largest information
gain at each node.
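As a concrete reference point, scikit-learn's DecisionTreeClassifier uses an optimised version of CART; the data below is a toy stand-in, not from the slides:

```python
# CART in practice via scikit-learn (toy data for illustration).
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0, 1], [1, 1], [1, 0], [0, 0]]                # two binary features
y = [0, 1, 1, 0]                                    # class follows feature f0
clf = DecisionTreeClassifier(criterion="entropy")   # "entropy" ~ information gain
clf.fit(X, y)
print(export_text(clf, feature_names=["f0", "f1"])) # binary splits only (CART)
```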
Naïve Bayes Classifier Algorithm
• Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to
determine the probability of a hypothesis with prior knowledge. It depends
on the conditional probability.
• The formula for Bayes' theorem is given as:

  P(A|B) = P(B|A) * P(A) / P(B)
• Where,
• P(A|B) is Posterior probability: Probability of hypothesis A on the observed
event B.
• P(B|A) is Likelihood probability: Probability of the evidence given that the
hypothesis is true.
• P(A) is Prior Probability: Probability of hypothesis before observing the
evidence.
• P(B) is Marginal Probability: Probability of Evidence.
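The formula is a one-liner in code. A minimal sketch with placeholder numbers (the function name is an assumption for illustration):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B) (sketch, placeholder numbers).
def posterior(p_b_given_a, p_a, p_b):
    return p_b_given_a * p_a / p_b

print(posterior(p_b_given_a=0.8, p_a=0.4, p_b=0.5))  # 0.64
```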
Working of Naïve Bayes' Classifier:
• Working of Naïve Bayes' Classifier can be understood with the help of the
below example:
• Suppose we have a dataset of weather conditions and a corresponding target
variable "Play". Using this dataset, we need to decide whether we should
play or not on a particular day according to the weather conditions.
To solve this problem, we need to follow the steps below:
• Convert the given dataset into frequency tables.
• Generate a likelihood table by finding the probabilities of the given features.
• Now, use Bayes' theorem to calculate the posterior probability.
• Problem: If the weather is sunny, should the player play or not?
• Solution: To solve this, first consider the below dataset:
Dataset

No.   Outlook     Play
0     Rainy       Yes
1     Sunny       Yes
2     Overcast    Yes
3     Overcast    Yes
4     Sunny       No
5     Rainy       Yes
6     Sunny       Yes
7     Overcast    Yes
8     Rainy       No
9     Sunny       No
10    Sunny       Yes
11    Rainy       No
12    Overcast    Yes
13    Overcast    Yes
Frequency table for the weather conditions:

Weather     Yes   No
Overcast     5     0
Rainy        2     2
Sunny        3     2
Total       10     4

Likelihood table for the weather conditions:

Weather     Yes            No             P(Weather)
Overcast     5              0             5/14 = 0.35
Rainy        2              2             4/14 = 0.29
Sunny        3              2             5/14 = 0.35
Total       10/14 = 0.71    4/14 = 0.29
Applying Bayes' theorem
• P(Yes|Sunny)= P(Sunny|Yes)*P(Yes)/P(Sunny)
• P(Sunny|Yes)= 3/10= 0.3
• P(Sunny)= 0.35
• P(Yes)=0.71
• So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60
• P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)
• P(Sunny|No)= 2/4= 0.5
• P(No)= 0.29
• P(Sunny)= 0.35
• So P(No|Sunny)= 0.5*0.29/0.35 = 0.41
• From the above calculation, we can see that P(Yes|Sunny) > P(No|Sunny).
• Hence, on a sunny day, the player can play the game.
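The whole worked example can be reproduced from the raw dataset in a few lines. This is an assumed sketch in plain Python (no library), using the 14 rows listed above:

```python
# Reproduce P(Yes|Sunny) from the 14-row weather dataset (sketch).
outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
        "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]

n = len(play)
p_yes = play.count("Yes") / n                    # P(Yes)   = 10/14 ~ 0.71
p_sunny = outlook.count("Sunny") / n             # P(Sunny) =  5/14 ~ 0.35
p_sunny_given_yes = sum(o == "Sunny" and p == "Yes"
                        for o, p in zip(outlook, play)) / play.count("Yes")  # 3/10

print(p_sunny_given_yes * p_yes / p_sunny)       # P(Yes|Sunny) = 0.6
```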
Advantages & Disadvantages
Advantages of Naïve Bayes Classifier:
• Naïve Bayes is one of the fastest and simplest ML algorithms for predicting
the class of a dataset.
• It can be used for binary as well as multi-class classification.
• It performs well in multi-class predictions as compared to the other
algorithms.
• It is a very popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
• Naive Bayes assumes that all features are independent or unrelated,
so it cannot learn the relationship between features.
Applications of Naïve Bayes Classifier:
• It is used for Credit Scoring.
• It is used in medical data classification.
• It can be used in real-time predictions because Naïve Bayes Classifier
is an eager learner.
• It is used in Text classification such as Spam filtering and Sentiment
analysis.
Types of Naïve Bayes Model:
• There are three types of Naive Bayes Model, which are given below:
• Gaussian: The Gaussian model assumes that features follow a normal distribution.
This means if predictors take continuous values instead of discrete, then the
model assumes that these values are sampled from the Gaussian distribution.
• Multinomial: The Multinomial Naïve Bayes classifier is used when the data is
multinomially distributed. It is primarily used for document classification
problems, i.e., deciding which category a particular document belongs to, such
as sports, politics, or education.
The classifier uses the frequencies of words as the predictors.
• Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier,
but the predictor variables are independent Boolean variables, such as whether
a particular word is present in a document or not. This model is also well known
for document classification tasks. (A short sketch of all three follows this list.)
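scikit-learn ships one class per model; the snippet below pairs each with the kind of feature it expects (the arrays are toy stand-ins, not real data):

```python
# The three Naive Bayes variants in scikit-learn (toy data for illustration).
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])
GaussianNB().fit(np.array([[1.2], [0.9], [3.1], [2.8]]), y)         # continuous values
MultinomialNB().fit(np.array([[3, 0], [2, 1], [0, 4], [1, 5]]), y)  # word counts
BernoulliNB().fit(np.array([[1, 0], [1, 1], [0, 1], [0, 1]]), y)    # word present or not
```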
What are Bayesian networks?
• A Bayesian network (also known as a Bayes network, Bayes net, belief
network, or decision network) is a probabilistic graphical model that
represents a set of variables and their conditional dependencies via a
directed acyclic graph (DAG).
• Bayesian networks are ideal for taking an event that occurred and predicting
the likelihood that any one of several possible known causes was the
contributing factor.
• For example, a Bayesian network could represent the probabilistic
relationships between diseases and symptoms. Given symptoms, the
network can be used to compute the probabilities of the presence of
various diseases.
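For the smallest possible version of the disease/symptom example, take a two-node network Disease → Symptom. The conditional probability numbers below are made up purely for illustration; the posterior is computed by enumeration:

```python
# Two-node Bayesian network Disease -> Symptom (illustrative, made-up CPTs).
p_disease = 0.01                       # prior: P(Disease = true)
p_sym_given = {True: 0.9, False: 0.1}  # CPT:   P(Symptom = true | Disease)

# Enumerate the joint over Disease with Symptom observed true, then normalise.
joint_true = p_disease * p_sym_given[True]           # P(D = true,  S = true)
joint_false = (1 - p_disease) * p_sym_given[False]   # P(D = false, S = true)
print(joint_true / (joint_true + joint_false))       # P(D = true | S = true) ~ 0.083
```

Even with a 90%-accurate symptom, the low prior keeps the posterior small; this is exactly the kind of reasoning a Bayesian network automates over larger DAGs.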
[Figure: A simple Bayesian network with conditional probability tables]
Inference