
Decision Tree Learning

Dr. Shaifu Gupta


shaifu.gupta@iitjammu.ac.in
Contents

● Decision Trees concept
● Entropy, Information gain
● ID3 algorithm
● CART algorithm, Gini Impurity
● C4.5 algorithm
● Overfitting
● Methods to reduce overfitting
● Random Forest
Decision Trees

A tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision.

Decision trees are a supervised learning algorithm.


Decision Tree: structure
Examples
Learning a classification tree

What criteria should a decision tree algorithm use to split on variables/columns?
Entropy

Used to measure uncertainty / disorder.

Example: a mixed column of 15 labels
Positive (1): ⅔ [10/15]
Negative (0): ⅓ [5/15]

The more mixed the (1)s and (0)s in a column, the higher the entropy.
Entropy

Entropy(S) = − Σ_i p(i) · log_b p(i), where p(i) is the proportion of class i in S.

Here b = 2 irrespective of the number of classes, so entropy is measured in bits.


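As a quick sanity check of this formula, here is a minimal Python sketch (NumPy assumed; the function name is illustrative) computing the entropy of the 10-positive / 5-negative example above:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, using log base 2 (bits)."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

# 10 positives and 5 negatives, as in the mixed-column example
labels = np.array([1] * 10 + [0] * 5)
print(entropy(labels))  # ~0.918 bits
```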
Entropy

What will the entropy be for all positives or all negatives?

● Entropy is 0 if all the members belong to the same class.
● Entropy is 1 when the collection contains an equal number of +ve and -ve examples.
● Entropy is between 0 and 1 when the collection contains unequal numbers of +ve and -ve examples.

Goal: find the best attribute to split on when building a decision tree, based on the reduction in entropy.

Keep splitting on variables/columns until the target column is no longer mixed.
Information gain

● Use entropy to measure the quality of a split.
● Compute the entropy of each branch, and determine the quality of the split by weighting the entropy of each branch by how many elements it has.
● Subtract this weighted entropy from the entropy before the split to measure the reduction -> information gain.

Gain(T, A) = Entropy(T) − Σ_v (|T_v| / |T|) · Entropy(T_v)

where T = the target column, A = the variable (column) we are testing, v = each value in A, and T_v = the subset of examples for which A = v.

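A minimal sketch of this weighted-entropy calculation, reusing the entropy helper above (illustrative, not a library API):

```python
import numpy as np

def information_gain(target, feature):
    """Gain(T, A) = Entropy(T) - sum_v (|T_v| / |T|) * Entropy(T_v)."""
    total_entropy = entropy(target)
    weighted = 0.0
    for v in np.unique(feature):
        subset = target[feature == v]          # T_v: rows where A = v
        weighted += (len(subset) / len(target)) * entropy(subset)
    return total_entropy - weighted
```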

Putting it all together: ID3 algorithm

● ID3: Iterative Dichotomizer 3
● Follows a greedy approach by selecting the attribute that yields the maximum information gain.
● The steps in the ID3 algorithm are as follows (a simplified sketch in code follows this list):
  ○ Calculate the entropy of the target (using all training examples).
  ○ For each attribute/feature:
    ■ Calculate the entropy for each of its categorical values.
    ■ Calculate the information gain of the feature.
  ○ Split on the feature with the maximum information gain.
  ○ Repeat on each branch until we get the desired tree.
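A simplified sketch of these steps, assuming pandas data with categorical attributes and reusing the entropy and information_gain helpers above. Real implementations add tie-breaking, missing-value handling, and extra stopping criteria:

```python
import pandas as pd

def id3(df, target, attributes):
    """Build a tree as nested dicts {attribute: {value: subtree_or_leaf}}."""
    labels = df[target]
    if labels.nunique() == 1:          # pure node -> leaf with that class
        return labels.iloc[0]
    if not attributes:                 # nothing left to split on -> majority class
        return labels.mode()[0]
    # greedy choice: attribute with maximum information gain
    best = max(attributes,
               key=lambda a: information_gain(labels.values, df[a].values))
    tree = {best: {}}
    for value, subset in df.groupby(best):
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, target, remaining)
    return tree
```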
Gini Impurity -> CART Algorithm

One of the other methods used in decision tree algorithms to decide the optimal split from a root node, and subsequent splits.

The lower the Gini impurity, the better the split.

G = Σ_i p(i) · (1 − p(i)) = 1 − Σ_i p(i)²

where p(i) is the probability of class i.
Gini Impurity: Example

The left branch has only blues, so G_left = 0.

The right branch has 1 blue and 5 greens, so G_right = 1 − (1/6)² − (5/6)² ≈ 0.278.

The quality of the split is obtained by weighting the impurity of each branch by how many elements it has:
0.4 · 0 + 0.6 · 0.278 ≈ 0.167

The amount of impurity "removed" with this split (the Gini gain):
0.5 − 0.167 = 0.333

Higher Gini gain = better split.

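A small sketch reproducing this calculation (NumPy assumed; 5 blue and 5 green points overall, split into a 4-element left branch and a 6-element right branch):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_i p(i)^2."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return 1.0 - np.sum(probs ** 2)

left   = ["blue"] * 4                      # only blues -> impurity 0
right  = ["blue"] * 1 + ["green"] * 5      # 1 blue, 5 greens -> ~0.278
parent = left + right                      # 5 blue, 5 green -> 0.5

weighted = (len(left) / 10) * gini(left) + (len(right) / 10) * gini(right)
print(gini(parent) - weighted)             # Gini gain ~0.333
```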

Gini Impurity on continuous data
Gini Impurity vs Entropy

Gini impurity is more efficient than entropy in terms of computing power. Entropy is computationally more complex since it uses logarithms; consequently, the Gini index is faster to calculate.
C4.5 Algorithm

● ID3 is applicable to discrete datasets.
● Extended to C4.5:
  ○ Handles both continuous and discrete attributes.
  ○ Prunes trees after creation, to reduce overfitting.
● C4.5 uses the Gain Ratio as its splitting criterion (sketched below).


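The Gain Ratio normalizes information gain by the split information (intrinsic information) of the attribute, which penalizes attributes with many distinct values. A minimal sketch under that standard definition, reusing the helpers above:

```python
import numpy as np

def split_info(feature):
    """SplitInfo(A) = -sum_v (|T_v| / |T|) * log2(|T_v| / |T|)."""
    _, counts = np.unique(feature, return_counts=True)
    fractions = counts / counts.sum()
    return -np.sum(fractions * np.log2(fractions))

def gain_ratio(target, feature):
    si = split_info(feature)
    return information_gain(target, feature) / si if si > 0 else 0.0
```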
Example
Overfitting

The model loses some generalization capability. Overfitting happens when the learning algorithm continues to develop hypotheses that reduce training set error at the cost of an increased test set error.

Causes:
● Presence of noise
● Lack of representative instances
Overfitting due to noise
Overfitting due to lack of samples
Identify overfitting

Relation between error and model complexity


Avoid overfitting in decision trees

Identify and remove subtrees that are likely to be due to noise.

● Early stopping: stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data (e.g., the depth exceeds a limit, or the information gain of a split is insufficient); see the scikit-learn sketch after this list.
● Post-pruning: allow the tree to overfit the data, and then post-prune the tree.

Select the "best" tree by:

● measuring performance over the training data, or
● measuring performance over a separate validation data set.
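As one illustration of early stopping, scikit-learn's DecisionTreeClassifier exposes hyperparameters that cap tree depth and require a minimum impurity decrease before a split is made (the values below are arbitrary, and X_train / y_train are assumed to exist):

```python
from sklearn.tree import DecisionTreeClassifier

# Stop growing the tree early rather than fitting the training data perfectly.
clf = DecisionTreeClassifier(
    max_depth=4,                 # depth limit
    min_samples_leaf=5,          # every leaf must keep at least 5 examples
    min_impurity_decrease=0.01,  # split only if impurity drops by at least this much
)
# clf.fit(X_train, y_train)
```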
Post-Pruning (Reduced Error Pruning)

● Consider each of the decision nodes in the tree to be a candidate for pruning.
● Pruning a decision node: remove the subtree rooted at that node, make it a leaf node, and assign it the most common classification of the training examples affiliated with that node.
● Nodes are removed only if the resulting pruned tree performs no worse than the original over the validation set.
● Pruning of nodes continues until further pruning is harmful (i.e., decreases the accuracy of the tree over the validation set).
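A simplified sketch of reduced error pruning on the nested-dict tree produced by the id3 sketch above; the predict and accuracy helpers are illustrative, not part of any library:

```python
def predict(tree, row):
    """Walk the nested-dict tree for one example (a pandas Series)."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute].get(row[attribute])   # unseen value -> None
    return tree

def accuracy(tree, df, target):
    """Fraction of rows in df that the (sub)tree classifies correctly."""
    return (df.apply(lambda r: predict(tree, r), axis=1) == df[target]).mean()

def prune(tree, df_train, df_val, target):
    """Greedily replace subtrees with leaves while validation accuracy does not drop."""
    if not isinstance(tree, dict) or df_train.empty or df_val.empty:
        return tree                                  # already a leaf, or no data here
    attribute = next(iter(tree))
    for value, subtree in list(tree[attribute].items()):
        tr = df_train[df_train[attribute] == value]
        va = df_val[df_val[attribute] == value]
        tree[attribute][value] = prune(subtree, tr, va, target)
    # candidate leaf: most common class of the training examples at this node
    leaf = df_train[target].mode()[0]
    if accuracy(leaf, df_val, target) >= accuracy(tree, df_val, target):
        return leaf                                  # no worse on validation -> prune
    return tree
```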
Example
ID3 variation for regression
Real-valued features/attributes

Create a discrete attribute to test a continuous one:

Temperature = 24.5 °C
(Temperature > 22.0 °C) ∈ {true, false}

Where to set the threshold?


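One common way to choose the threshold is to sort the values and evaluate the information gain of candidate thresholds placed between consecutive distinct values. A minimal sketch, reusing information_gain from earlier (NumPy assumed; the data below are made up for illustration):

```python
import numpy as np

def best_threshold(values, target):
    """Try midpoints between consecutive sorted values; return the best one."""
    order  = np.argsort(values)
    values = np.asarray(values)[order]
    target = np.asarray(target)[order]
    best_gain, best_t = 0.0, None
    for a, b in zip(values[:-1], values[1:]):
        if a == b:
            continue
        t = (a + b) / 2.0
        gain = information_gain(target, values > t)  # boolean attribute "value > t"
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# illustrative temperatures (°C) with binary play / don't-play labels
temps  = [18.0, 20.5, 22.0, 24.5, 27.0, 30.0]
labels = [0, 0, 1, 1, 1, 0]
print(best_threshold(temps, labels))   # e.g. threshold ~21.25
```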
Random forest

● Utilizes ensemble learning (combines many classifiers) to provide solutions.
● Consists of many decision trees.
● Predicts by taking the average (regression) or majority vote (classification) of the outputs of the individual trees.
● Reduces overfitting compared to a single decision tree.
● Trained through bagging, or bootstrap aggregating.


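For illustration, a random forest classifier in scikit-learn (the dataset and hyperparameter values are arbitrary choices, not from the slides):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each fit on a bootstrap sample of the training data
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))    # accuracy of the majority-vote prediction
```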
Random forest vs decision tree

The main difference between the decision tree algorithm and the random forest algorithm is that establishing root nodes and segregating nodes is done randomly in the latter.
Random forest: Bagging

● The random forest classifier divides the training dataset into subsets (bootstrap samples drawn with replacement).
● These subsets are given to the individual decision trees in the random forest.
● Each decision tree produces its own output.

Note: not well suited to classification problems with a skewed class distribution.

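A minimal sketch of the bagging idea itself: each tree is fit on a bootstrap sample (rows drawn with replacement) of the training set, and predictions are combined by a per-sample majority vote. This illustrates the concept, not how scikit-learn implements it internally (integer class labels assumed):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees(X, y, n_trees=25, seed=0):
    """Fit n_trees decision trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))   # sample rows with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def majority_vote(trees, X):
    """Combine the trees' predictions by a per-sample majority vote."""
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```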