
Decision Tree Learning

Dr. Shaifu Gupta


shaifu.gupta@iitjammu.ac.in
Contents

● Decision Trees concept
● Entropy, Information gain
● ID3 algorithm
● CART algorithm, Gini Impurity
● C4.5 algorithm
● Overfitting
● Methods to reduce overfitting
● Random Forest
Decision Trees

A tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision.

Decision trees are a supervised learning algorithm.


Decision Tree: structure
Examples
Learning a classification tree

What criteria should a decision tree algorithm use to split on variables/columns?
Entropy

Used to measure uncertainty / disorder.

Example: a mixed column of 15 labels
Positive (1): ⅔ [10/15]
Negative (0): ⅓ [5/15]

The more mixed the (1)s and (0)s in a column, the higher the entropy.
Entropy

Entropy(S) = − Σ_i p(i) · log_b p(i), where p(i) is the proportion of class i in S.

Here b = 2 irrespective of the number of classes, so entropy is measured in bits.


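As a quick sanity check of this formula, here is a minimal Python sketch (NumPy assumed; the function name is illustrative) computing the entropy of the 10-positive / 5-negative example above:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, using log base 2 (bits)."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

# 10 positives and 5 negatives, as in the mixed-column example
labels = np.array([1] * 10 + [0] * 5)
print(entropy(labels))  # ~0.918 bits
```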
Entropy

What will the entropy be for all positives or all negatives?

● Entropy is 0 if all the members belong to the same class.
● Entropy is 1 when the collection contains an equal number of +ve and -ve examples.
● Entropy is between 0 and 1 when the collection contains unequal numbers of +ve and -ve examples.

Goal: find the best attribute to split on when building a decision tree, based on the reduction in entropy.

Keep splitting on variables/columns until the target column is no longer mixed.
Information gain

● Use entropy to measure the quality of a split.
● Compute the entropy of each branch, and determine the quality of the split by weighting the entropy of each branch by how many elements it has.
● Subtract this weighted entropy from the entropy before the split to measure the reduction -> information gain.

Gain(T, A) = Entropy(T) − Σ_v (|T_v| / |T|) · Entropy(T_v)

where T = the target column, A = the variable (column) we are testing, v = each value in A, and T_v = the subset of examples for which A = v.

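A minimal sketch of this weighted-entropy calculation, reusing the entropy helper above (illustrative, not a library API):

```python
import numpy as np

def information_gain(target, feature):
    """Gain(T, A) = Entropy(T) - sum_v (|T_v| / |T|) * Entropy(T_v)."""
    total_entropy = entropy(target)
    weighted = 0.0
    for v in np.unique(feature):
        subset = target[feature == v]          # T_v: rows where A = v
        weighted += (len(subset) / len(target)) * entropy(subset)
    return total_entropy - weighted
```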

Putting it all together: ID3 algorithm

● ID3: Iterative Dichotomizer 3
● Follows a greedy approach by selecting the attribute that yields the maximum information gain.
● The steps in the ID3 algorithm are as follows (a simplified sketch in code follows this list):
  ○ Calculate the entropy of the target (using all training examples).
  ○ For each attribute/feature:
    ■ Calculate the entropy for each of its categorical values.
    ■ Calculate the information gain of the feature.
  ○ Split on the feature with the maximum information gain.
  ○ Repeat on each branch until we get the desired tree.
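A simplified sketch of these steps, assuming pandas data with categorical attributes and reusing the entropy and information_gain helpers above. Real implementations add tie-breaking, missing-value handling, and extra stopping criteria:

```python
import pandas as pd

def id3(df, target, attributes):
    """Build a tree as nested dicts {attribute: {value: subtree_or_leaf}}."""
    labels = df[target]
    if labels.nunique() == 1:          # pure node -> leaf with that class
        return labels.iloc[0]
    if not attributes:                 # nothing left to split on -> majority class
        return labels.mode()[0]
    # greedy choice: attribute with maximum information gain
    best = max(attributes,
               key=lambda a: information_gain(labels.values, df[a].values))
    tree = {best: {}}
    for value, subset in df.groupby(best):
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, target, remaining)
    return tree
```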
Gini Impurity -> CART Algorithm

One of the other methods used in decision tree algorithms to decide the optimal split from a root node, and subsequent splits.

The lower the Gini impurity, the better the split.

G = Σ_i p(i) · (1 − p(i)) = 1 − Σ_i p(i)²

where p(i) is the probability of class i.
Gini Impurity: Example

The left branch has only blues, so G_left = 0.

The right branch has 1 blue and 5 greens, so G_right = 1 − (1/6)² − (5/6)² ≈ 0.278.

The quality of the split is obtained by weighting the impurity of each branch by how many elements it has:
0.4 · 0 + 0.6 · 0.278 ≈ 0.167

The amount of impurity "removed" with this split (the Gini gain):
0.5 − 0.167 = 0.333

Higher Gini gain = better split.

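A small sketch reproducing this calculation (NumPy assumed; 5 blue and 5 green points overall, split into a 4-element left branch and a 6-element right branch):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_i p(i)^2."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return 1.0 - np.sum(probs ** 2)

left   = ["blue"] * 4                      # only blues -> impurity 0
right  = ["blue"] * 1 + ["green"] * 5      # 1 blue, 5 greens -> ~0.278
parent = left + right                      # 5 blue, 5 green -> 0.5

weighted = (len(left) / 10) * gini(left) + (len(right) / 10) * gini(right)
print(gini(parent) - weighted)             # Gini gain ~0.333
```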

Gini Impurity on continuous data
Gini Impurity vs Entropy

Gini impurity is more efficient than entropy in terms of computing power. Entropy is computationally more complex since it uses logarithms; consequently, the Gini index is faster to calculate.
C4.5 Algorithm

● ID3 is applicable to discrete datasets.
● Extended to C4.5:
  ○ Handles both continuous and discrete attributes.
  ○ Prunes trees after creation, to reduce overfitting.
● C4.5 uses the Gain Ratio as its splitting criterion (sketched below).


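The Gain Ratio normalizes information gain by the split information (intrinsic information) of the attribute, which penalizes attributes with many distinct values. A minimal sketch under that standard definition, reusing the helpers above:

```python
import numpy as np

def split_info(feature):
    """SplitInfo(A) = -sum_v (|T_v| / |T|) * log2(|T_v| / |T|)."""
    _, counts = np.unique(feature, return_counts=True)
    fractions = counts / counts.sum()
    return -np.sum(fractions * np.log2(fractions))

def gain_ratio(target, feature):
    si = split_info(feature)
    return information_gain(target, feature) / si if si > 0 else 0.0
```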
Example
Overfitting

The model loses some generalization capability. Overfitting happens when the learning algorithm continues to develop hypotheses that reduce training set error at the cost of an increased test set error.

Causes:
● Presence of noise
● Lack of representative instances
Overfitting due to noise
Overfitting due to lack of samples
Identify overfitting

Relation between error and model complexity


Avoid overfitting in decision trees

Identify and remove subtrees that are likely to be due to noise.

● Early stopping: stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data (e.g., the depth exceeds a limit, or the information gain of a split is insufficient); see the scikit-learn sketch after this list.
● Post-pruning: allow the tree to overfit the data, and then post-prune the tree.

Select the "best" tree by:

● measuring performance over the training data, or
● measuring performance over a separate validation data set.
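As one illustration of early stopping, scikit-learn's DecisionTreeClassifier exposes hyperparameters that cap tree depth and require a minimum impurity decrease before a split is made (the values below are arbitrary, and X_train / y_train are assumed to exist):

```python
from sklearn.tree import DecisionTreeClassifier

# Stop growing the tree early rather than fitting the training data perfectly.
clf = DecisionTreeClassifier(
    max_depth=4,                 # depth limit
    min_samples_leaf=5,          # every leaf must keep at least 5 examples
    min_impurity_decrease=0.01,  # split only if impurity drops by at least this much
)
# clf.fit(X_train, y_train)
```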
Post-Pruning (Reduced Error Pruning)

● Consider each of the decision nodes in the tree to be a candidate for pruning.
● Pruning a decision node: remove the subtree rooted at that node, make it a leaf node, and assign it the most common classification of the training examples affiliated with that node.
● Nodes are removed only if the resulting pruned tree performs no worse than the original over the validation set.
● Pruning of nodes continues until further pruning is harmful (i.e., decreases the accuracy of the tree over the validation set).
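A simplified sketch of reduced error pruning on the nested-dict tree produced by the id3 sketch above; the predict and accuracy helpers are illustrative, not part of any library:

```python
def predict(tree, row):
    """Walk the nested-dict tree for one example (a pandas Series)."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute].get(row[attribute])   # unseen value -> None
    return tree

def accuracy(tree, df, target):
    """Fraction of rows in df that the (sub)tree classifies correctly."""
    return (df.apply(lambda r: predict(tree, r), axis=1) == df[target]).mean()

def prune(tree, df_train, df_val, target):
    """Greedily replace subtrees with leaves while validation accuracy does not drop."""
    if not isinstance(tree, dict) or df_train.empty or df_val.empty:
        return tree                                  # already a leaf, or no data here
    attribute = next(iter(tree))
    for value, subtree in list(tree[attribute].items()):
        tr = df_train[df_train[attribute] == value]
        va = df_val[df_val[attribute] == value]
        tree[attribute][value] = prune(subtree, tr, va, target)
    # candidate leaf: most common class of the training examples at this node
    leaf = df_train[target].mode()[0]
    if accuracy(leaf, df_val, target) >= accuracy(tree, df_val, target):
        return leaf                                  # no worse on validation -> prune
    return tree
```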
Example
ID3 variation for regression
Real-valued features/attributes

Create a discrete attribute to test a continuous one:

Temperature = 24.5 °C
(Temperature > 22.0 °C) ∈ {true, false}

Where to set the threshold?


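One common way to choose the threshold is to sort the values and evaluate the information gain of candidate thresholds placed between consecutive distinct values. A minimal sketch, reusing information_gain from earlier (NumPy assumed; the data below are made up for illustration):

```python
import numpy as np

def best_threshold(values, target):
    """Try midpoints between consecutive sorted values; return the best one."""
    order  = np.argsort(values)
    values = np.asarray(values)[order]
    target = np.asarray(target)[order]
    best_gain, best_t = 0.0, None
    for a, b in zip(values[:-1], values[1:]):
        if a == b:
            continue
        t = (a + b) / 2.0
        gain = information_gain(target, values > t)  # boolean attribute "value > t"
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# illustrative temperatures (°C) with binary play / don't-play labels
temps  = [18.0, 20.5, 22.0, 24.5, 27.0, 30.0]
labels = [0, 0, 1, 1, 1, 0]
print(best_threshold(temps, labels))   # e.g. threshold ~21.25
```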
Random forest

● Utilizes ensemble learning (combines many classifiers) to provide solutions.
● Consists of many decision trees.
● Predicts by taking the average (regression) or majority vote (classification) of the outputs of the individual trees.
● Reduces overfitting compared to a single decision tree.
● Trained through bagging, or bootstrap aggregating.


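For illustration, a random forest classifier in scikit-learn (the dataset and hyperparameter values are arbitrary choices, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each fit on a bootstrap sample of the training data
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))    # accuracy of the majority-vote prediction
```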
Random forest vs decision tree

The main difference between the decision tree algorithm and the random forest algorithm is that establishing root nodes and segregating nodes is done randomly in the latter.
Random forest: Bagging

● The random forest classifier divides the training dataset into subsets (bootstrap samples drawn with replacement).
● These subsets are given to the individual decision trees in the random forest.
● Each decision tree produces its own output.

Note: not well suited to classification problems with a skewed class distribution.

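A minimal sketch of the bagging idea itself: each tree is fit on a bootstrap sample (rows drawn with replacement) of the training set, and predictions are combined by a per-sample majority vote. This illustrates the concept, not how scikit-learn implements it internally (integer class labels assumed):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees(X, y, n_trees=25, seed=0):
    """Fit n_trees decision trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))   # sample rows with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def majority_vote(trees, X):
    """Combine the trees' predictions by a per-sample majority vote."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```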