
Decision Trees

Arun Kumar

IIT Ropar

1 / 14
Outline

1 Elements of Information Theory

2 Decision Tree Classification for Categorical Data

3 Decision Tree Regression

2 / 14
History

• Information theory was introduced in 1948 by Shannon.

• The theory came into existence in connection with the problem of transmission of information along communication channels.

• "Information" in itself is a very general, qualitative, subjective and not very precise concept.

• However, information theory has developed into a quantitative, precise, objective and very useful theory.

3 / 14
Shannon’s Information

• Let the PMF of the rv X be given.

• The question posed by Shannon is the following: "Can we find a measure of how uncertain we are of the outcome?"

• Shannon then assumed that if such a function, denoted H(p_1, ..., p_n), exists, it is reasonable to expect that it will have the following properties.
  a. H should be continuous in all the p_i.
  b. If all the p_i are equal, i.e., p_i = 1/n, then H should have a maximum value, and this maximum value should be a monotonically increasing function of n.
  c. If a choice is broken down into successive choices, the quantity H should be the weighted sum of the individual values of H.

• The entropy function is defined by H(p_1, p_2, ..., p_n) = -\sum_{i=1}^{n} p_i \log(p_i).

4 / 14
Measure of Impurity

Entropy
Entropy for a set S is given by

  H(S) = -\sum_{c \in C} p(c) \log_2 p(c),

where C is the set of classes in S and p(c) are the proportions of the different classes.

Gini
Gini impurity for a set S, where the target variable takes N different labels, is given by

  Gini(S) = \sum_{i \neq j} p(i) p(j) = 1 - \sum_{i=1}^{N} p(i)^2,

where p(i) are the proportions of the different labels in the set S.
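As a quick illustration of these two impurity measures (not part of the original slides; the function names and example set are my own), a minimal Python sketch that computes entropy and Gini impurity from the class proportions of a list of labels:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum over classes c of p(c) * log2 p(c)."""
    n = len(labels)
    return -sum((cnt / n) * log2(cnt / n) for cnt in Counter(labels).values())

def gini(labels):
    """Gini(S) = 1 - sum over labels i of p(i)^2."""
    n = len(labels)
    return 1 - sum((cnt / n) ** 2 for cnt in Counter(labels).values())

# Example: a set with 9 positive and 5 negative instances.
S = ["yes"] * 9 + ["no"] * 5
print(entropy(S))  # ~0.940
print(gini(S))     # ~0.459
```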

5 / 14
Decision Tree Introduction

• The Decision Tree algorithm belongs to the family of supervised learning algorithms.
• A decision tree can be used for solving classification as well as regression problems.
• Decision trees classify instances by sorting them down the tree from the root node to some leaf node. The leaf node assigns the class of the instance.
• Each node in the tree specifies a test of some attribute of the instance, and each branch emanating from that node corresponds to one of the possible values of that attribute; a small sketch of this sorting procedure is given below.
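To make the sorting procedure concrete, here is a small sketch (my own illustration, not from the slides) in which a tree is stored as nested (attribute, branches) pairs and an instance is classified by walking from the root to a leaf:

```python
# A hypothetical weather-style tree: internal nodes are (attribute, {value: subtree})
# pairs and leaves are plain class labels.
tree = ("Outlook", {
    "Sunny":    ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain":     ("Wind", {"Strong": "No", "Weak": "Yes"}),
})

def classify(node, instance):
    """Sort the instance down the tree until a leaf (a class label) is reached."""
    while isinstance(node, tuple):                # still at an internal node
        attribute, branches = node
        node = branches[instance[attribute]]      # follow the branch for this attribute value
    return node

print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))  # "Yes"
```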

6 / 14
Sample Decision Tree

7 / 14
Algorithms to build decision trees

• ID3 (Iterative Dichotomiser 3): uses entropy and information gain as metrics.
• CART (Classification and Regression Trees): uses the Gini index as metric.
• Decision tree regression: builds the tree by recursive binary splitting, with the RSS as splitting criterion.
• Others

8 / 14
ID3 Algorithm based on Weather Data

¹ Based on "Machine Learning", by T. Mitchell, Ch. 3
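The weather table on this slide appears only as an image; as a hedged sketch of the ID3 selection step, the snippet below computes the information gain of each attribute on what I assume is the standard play-tennis data from Mitchell, Ch. 3, and picks the attribute with the highest gain as the root split:

```python
from collections import Counter
from math import log2

# Play-tennis training examples, assumed from Mitchell, "Machine Learning", Table 3.2:
# (Outlook, Temperature, Humidity, Wind) -> PlayTennis
data = [
    (("Sunny", "Hot", "High", "Weak"), "No"),          (("Sunny", "Hot", "High", "Strong"), "No"),
    (("Overcast", "Hot", "High", "Weak"), "Yes"),      (("Rain", "Mild", "High", "Weak"), "Yes"),
    (("Rain", "Cool", "Normal", "Weak"), "Yes"),       (("Rain", "Cool", "Normal", "Strong"), "No"),
    (("Overcast", "Cool", "Normal", "Strong"), "Yes"), (("Sunny", "Mild", "High", "Weak"), "No"),
    (("Sunny", "Cool", "Normal", "Weak"), "Yes"),      (("Rain", "Mild", "Normal", "Weak"), "Yes"),
    (("Sunny", "Mild", "Normal", "Strong"), "Yes"),    (("Overcast", "Mild", "High", "Strong"), "Yes"),
    (("Overcast", "Hot", "Normal", "Weak"), "Yes"),    (("Rain", "Mild", "High", "Strong"), "No"),
]
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(data, attr_index):
    """Gain(S, A) = H(S) - sum over values v of (|S_v| / |S|) * H(S_v)."""
    labels = [y for _, y in data]
    gain = entropy(labels)
    for value in {x[attr_index] for x, _ in data}:
        subset = [y for x, y in data if x[attr_index] == value]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

# ID3 chooses the attribute with the highest information gain as the root split.
for i, name in enumerate(attributes):
    print(name, round(information_gain(data, i), 3))
# Outlook has the largest gain (about 0.247), so it becomes the root node.
```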
9 / 14
Final Decision Tree

10 / 14
Recursive Binary Tree Splitting Algorithm

• Suppose there are p predictors.
• We find the predictor X_j and the cutpoint s such that splitting the predictor space into the regions {X | X_j < s} and {X | X_j ≥ s} leads to the greatest reduction in the Residual Sum of Squares (RSS).
• In detail, for any j and s, define the pair of half-planes R_1(j, s) = {X | X_j < s} and R_2(j, s) = {X | X_j ≥ s}, and search for the pair (j, s) that minimizes the RSS

  \sum_{i:\, x_i \in R_1(j,s)} (y_i - \hat{y}_{R_1})^2 + \sum_{i:\, x_i \in R_2(j,s)} (y_i - \hat{y}_{R_2})^2,

where \hat{y}_{R_1} is the mean response for the training observations in R_1(j, s), and \hat{y}_{R_2} is the mean response for the training observations in R_2(j, s).²
² Based on "An Introduction to Statistical Learning with Applications in R", Chapter 8, Page 306
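As a sketch of this search (my own illustration; the function and variable names are not from the slides), the best pair (j, s) can be found by brute force over each predictor and each observed cutpoint value:

```python
import numpy as np

def best_split(X, y):
    """Find (j, s) minimizing the RSS of the split {X_j < s} vs. {X_j >= s}.

    X: (n, p) array of predictors; y: (n,) array of responses.
    """
    best_j, best_s, best_rss = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] < s], y[X[:, j] >= s]
            if len(left) == 0 or len(right) == 0:
                continue  # skip degenerate splits that leave one region empty
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if rss < best_rss:
                best_j, best_s, best_rss = j, s, rss
    return best_j, best_s, best_rss

# A regression tree is then grown by applying best_split recursively to each resulting region.
```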
11 / 14
Data

12 / 14
Final Decision Tree Based On Python Sklearn
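The data and the fitted tree on these slides appear only as images; below is a hedged sketch of the typical scikit-learn calls that produce such a tree plot (X and y are placeholders, not the actual data from the "Data" slide):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor, plot_tree

# Placeholder data; the real predictors and responses from the "Data" slide would go here.
X = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = np.array([2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 30.0, 31.0, 60.0, 62.0])

# Splits are chosen greedily to minimize the squared error (RSS), as described above.
reg = DecisionTreeRegressor(max_depth=2, random_state=0)
reg.fit(X, y)

plot_tree(reg, feature_names=["x"], filled=True)
plt.show()
```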

13 / 14
References

• Beazley, D. M. (2009). Python: Essential Reference (4th ed.). Pearson Education, Inc.

• James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer New York.

• Mitchell, T. M. (2017). Machine Learning. McGraw Hill Education.

• https://www.superdatascience.com/

14 / 14
