Decision Tree
Outline: Supervised & Unsupervised learning, What is a decision tree, Entropy, Information Gain,
Pre-pruning, Post-pruning, Advantages
Supervised Learning
Supervised learning is a type of machine learning algorithm that learns from labeled data.
Labeled data is data that has been tagged with a correct answer or classification.
In supervised learning we teach or train the machine using data that is well-labelled,
which means the data is already tagged with the correct answer.
After that, the machine is provided with a new set of examples (data) so that the supervised
learning algorithm analyses the training data (the set of training examples) and produces a
correct outcome from the labeled data.
Examples: classification and regression are supervised learning tasks. By contrast, clustering, where the
algorithm groups similar data points together based on certain characteristics, and dimensionality
reduction, where the algorithm reduces the number of features while preserving relevant information, are
examples of unsupervised learning.
What is Decision Tree
A decision tree is a popular machine learning algorithm that is used for both classification and regression
tasks. It is a tree-like model where each internal node represents a decision based on a feature, each branch
represents the outcome of the decision, and each leaf node represents the final predicted output for a given
input.
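As a quick illustration of how such a tree is used in practice, here is a minimal sketch using scikit-learn's
DecisionTreeClassifier; the tiny weather-style dataset and its numeric encoding are invented purely for
demonstration.

# Minimal sketch: training and querying a decision tree classifier with scikit-learn.
# The tiny weather-style dataset below is invented for illustration only.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [outlook, humidity], encoded as numbers.
# outlook: 0 = sunny, 1 = overcast, 2 = rainy; humidity: 0 = normal, 1 = high
X = [[0, 1], [0, 0], [1, 1], [2, 1], [2, 0], [1, 0]]
y = [0, 1, 1, 0, 1, 1]  # 0 = "don't play", 1 = "play"

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

print(tree.predict([[0, 0]]))  # predicted class for a new instance
print(export_text(tree, feature_names=["outlook", "humidity"]))  # the learned decisions, as text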
Components of Decision Tree
A decision tree consists of several key components that together define its structure and behavior. Here are
the main components of a decision tree:
Root Node:
- The topmost node in the tree, representing the initial decision or test based on a specific feature. It is the
starting point for the decision-making process.
Internal Nodes:
- Nodes within the tree, excluding the leaf nodes. Each internal node corresponds to a decision based on a
specific feature and serves as a branching point in the tree.
Decision Node:
- In a decision tree, a decision node is a point where the tree makes a decision about the input data. It
represents a test or condition that is applied to the input features, and based on the outcome of that test,
the data is routed down one of the branches leading out of the node.
Edges:
- Edges connect nodes in the tree and represent the outcome of a decision. Each edge leads from
one node to another and corresponds to a particular outcome of the decision associated with the parent
node.
Subtree:
- A subtree is a portion of the entire decision tree that is itself a valid decision tree. Internal nodes and
leaf nodes, along with their connecting branches, form subtrees within the larger tree.
ID3 VS CART
ID3 (Iterative Dichotomiser 3) and CART (Classification and Regression Trees) are both algorithms
used in machine learning for building decision trees, but they have some differences in terms of their
approach and applications.
Objective:
• ID3: Primarily designed for classification tasks. It builds a decision tree by recursively splitting the
dataset based on the most informative attribute at each step, aiming to create branches that result in
pure subsets.
• CART: Can be used for both classification and regression tasks. It constructs binary trees by
recursively splitting the dataset based on the feature that provides the best separation according to a
specified criterion.
Categorical vs. Numeric Attributes:
• ID3: Primarily handles categorical attributes. It selects the attribute that maximizes information gain or
minimizes entropy.
• CART: Can handle both categorical and numeric attributes. It uses different criteria for determining the best
split depending on the type of task (e.g., Gini impurity for classification and mean squared error for
regression).
Splitting Criteria:
• ID3: Uses information gain and entropy as the criterion for selecting the best attribute to split the data.
• CART: Uses Gini impurity for classification tasks (to measure the node's impurity) and mean squared error for
regression tasks.
Tree Structure:
• ID3: Tends to create deeper trees, which can be more prone to overfitting.
• CART: Typically results in shallower trees, and it employs pruning techniques to control overfitting.
Output:
• ID3: Outputs categorical class labels only, since it is designed for classification tasks.
• CART: Can output both categorical labels and numeric values, making it versatile for classification
and regression tasks.
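To make the difference in splitting criteria concrete, here is a minimal sketch using scikit-learn's CART
implementation; the toy data is invented for illustration. The classifier can be configured with Gini impurity
or entropy, while the regressor minimises squared error.

# Sketch of CART-style trees in scikit-learn (toy data invented for illustration).
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[1], [2], [3], [4], [5], [6]]

# Classification tree: Gini impurity (default) or "entropy" as the split criterion.
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X, [0, 0, 0, 1, 1, 1])

# Regression tree: splits chosen to minimise mean squared error ("squared_error").
reg = DecisionTreeRegressor(criterion="squared_error", random_state=0)
reg.fit(X, [1.1, 1.9, 3.2, 3.9, 5.1, 6.2])

print(clf.predict([[2.5]]), reg.predict([[2.5]]))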
Decision trees are of two types: classification trees and regression trees.
Classification
Classification in decision trees involves building a tree structure that helps classify instances or data
points into different classes or categories. The decision tree is constructed based on features or attributes
of the data, and each internal node of the tree represents a decision based on a specific attribute.
If the output of the data is categorical (a label), we use the classification technique to split the nodes of
the decision tree.
Entropy
• Entropy is used for checking the impurity or uncertainty present in the data. It is used to evaluate the
quality of a split.
• Formula of entropy: assume the resulting decision tree classifies instances into two categories, which we
will call positive and negative. Given a set S containing positive and negative examples, the entropy of S is
Entropy(S) = -p(+) log2 p(+) - p(-) log2 p(-)
where p(+) and p(-) are the proportions of positive and negative examples in S.
• A classic worked example is a table that forecasts whether a match will be played or not according to the
weather conditions (see the problem setup below).
• Graph of entropy: entropy is 0 when the set is pure (all one class) and reaches its maximum of 1 (for two
classes) when the classes are evenly split.
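A short sketch of how this entropy formula can be computed in Python; the helper name and the class counts
are chosen here only for illustration.

# Sketch: entropy of a set from its class counts (illustrative helper, not a library API).
from math import log2

def entropy(counts):
    """Entropy of a set given the number of examples in each class."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([9, 5]))   # e.g. 9 "play" and 5 "don't play" instances -> about 0.940
print(entropy([7, 7]))   # evenly split -> 1.0 (maximum uncertainty)
print(entropy([14, 0]))  # pure set -> entropy 0 (no uncertainty)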
Gini Impurity
• Gini impurity is a measure used in decision tree algorithms to quantify a dataset's impurity
level or disorder.
• It indicates how much information a particular feature or variable gives us about the final outcome.
• To minimize the depth of the decision tree when we traverse a path, we need to select the optimal attribute
for splitting at each tree node.
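For a set with class proportions p(i), Gini impurity is 1 - Σ p(i)², so it is 0 for a pure set and 0.5 at worst
for two classes. A small illustrative sketch (helper name chosen here):

# Sketch: Gini impurity of a set, Gini(S) = 1 - sum_i p(i)^2 (illustrative helper).
def gini(counts):
    """Gini impurity of a set given the number of examples in each class."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([9, 5]))    # about 0.459
print(gini([7, 7]))    # evenly split -> 0.5 (maximum for two classes)
print(gini([14, 0]))   # pure set -> 0.0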
Steps to make a Decision Tree
1. Take the entire dataset as input.
2. Calculate the entropy of the target variable, as well as of the predictor attributes.
3. Calculate the information gain of each attribute and choose the attribute with the highest information gain
as the root node.
4. Repeat the same procedure on every branch until the decision node of each branch is finalised.
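The sketch below illustrates the attribute-selection step (ID3-style information gain) on a tiny made-up
dataset; all names and values are for illustration only.

# Sketch: choosing the root attribute by information gain (ID3-style), on toy data.
from math import log2
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy of the whole set minus the weighted entropy after splitting on one attribute."""
    base = entropy(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return base - remainder

# Toy weather-style data: [outlook, humidity] -> play / don't play
rows = [["sunny", "high"], ["sunny", "normal"], ["overcast", "high"],
        ["rainy", "high"], ["rainy", "normal"], ["overcast", "normal"]]
labels = ["no", "yes", "yes", "no", "yes", "yes"]

gains = {i: information_gain(rows, labels, i) for i in range(2)}
print(gains)  # the attribute with the largest gain becomes the root node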
PROBLEM
Example dataset: each instance records the Outlook, Temperature, and Humidity, and the outcome is whether
the match will be played or not.
Regression
Regression refers to using a decision tree algorithm to predict a continuous numeric output rather
than class labels.
While decision trees are commonly associated with classification tasks, they can also be adapted for
regression tasks.
The process of building a decision tree for regression is similar to that for classification, but instead
of predicting discrete classes at the leaf nodes, it predicts continuous values.
Continuous data is a type of quantitative data that can take any value within a given range.
For example: height, weight, time, temperature.
Regression Problem
• At first we take one of the input attributes as the root node.
• Then we split the output values and arrange them according to the condition of the root node.
• The next step is to calculate the variance (MSE/SSR) of the output values for that root node.
• Formula of variance/MSE: Var(S) = (1/n) Σ (y_i - ȳ)², where ȳ is the mean of the output values in S.
• Next we calculate the variance reduction for that root node: the variance of the parent node minus the
weighted average variance of its child nodes.
• Next we calculate the variance reduction for the other candidate root nodes.
• After calculating the variance reduction for all candidate root nodes, we select as the main root node of
the decision tree the one whose variance reduction is largest.
• After selecting the root node, we split the node into binary branches according to the conditions.
• This process continues until the decision tree has been built.
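A minimal sketch of the variance-reduction computation for one candidate split; the toy target values and
helper names are chosen here for illustration.

# Sketch: variance reduction for one candidate split of a regression tree (toy data).
def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(parent, children):
    """Variance of the parent minus the size-weighted variance of the child subsets."""
    n = len(parent)
    weighted = sum(len(c) / n * variance(c) for c in children)
    return variance(parent) - weighted

# Toy target values, split into two groups by some condition on a candidate root attribute.
parent = [10.0, 12.0, 30.0, 32.0]
left, right = [10.0, 12.0], [30.0, 32.0]

print(variance_reduction(parent, [left, right]))  # larger reduction -> better split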
Key Differences from Classification Trees:
• Output at Leaf Nodes: Instead of class labels, regression trees output continuous values representing the
predicted outcome.
• Splitting Criteria: The decision criteria for selecting features and thresholds are typically based on
minimizing the variance or mean squared error of the target variable within the subsets created by the
splits.
• Evaluation Metrics: Common evaluation metrics for regression trees include mean squared error (MSE),
mean absolute error (MAE), or other measures of prediction accuracy.
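As an illustration of these points, the sketch below (with toy data invented here) fits a scikit-learn
regression tree and evaluates it with MSE and MAE.

# Sketch: a regression tree predicting continuous values, evaluated with MSE and MAE (toy data).
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1.2, 1.9, 3.1, 3.8, 5.2, 6.1]

reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
pred = reg.predict(X)

print("MSE:", mean_squared_error(y, pred))
print("MAE:", mean_absolute_error(y, pred))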
PRUNING
Pruning consists of a set of techniques that can be used to simplify a decision tree and enable it to
generalise better.
Pruning of decision trees falls into two general forms: pre-pruning and post-pruning.
Post-Pruning:
• This technique is used when the decision tree has grown to a very large depth and the model shows
overfitting. The fully grown tree is trimmed back after training, for example with cost-complexity pruning
(cost_complexity_pruning / the ccp_alpha parameter in scikit-learn).
Pre-Pruning:
• This technique limits the growth of the tree during training, by constraining parameters such as
max_depth and min_samples_split.
Advantages of Pruning:
• Improved Generalization: Pruning helps create a simpler and more generalizable tree, reducing the risk
of overfitting to the training data.
• Reduced Complexity: A pruned tree is often smaller and easier to interpret than an unpruned one,
making it more suitable for practical applications.
• Faster Prediction: Smaller trees typically lead to faster prediction times for new instances.
• Increased Robustness: Pruned trees are less sensitive to noise in the training data.
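A minimal sketch of both forms in scikit-learn (the synthetic dataset and parameter values are illustrative
only): pre-pruning constrains growth up front, while post-pruning grows the full tree and then trims it with
cost-complexity pruning.

# Sketch: pre-pruning vs post-pruning with scikit-learn (synthetic data, illustrative values).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Pre-pruning: limit growth during training.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_split=10, random_state=0)
pre_pruned.fit(X, y)

# Post-pruning: grow the full tree, then prune with cost-complexity pruning (ccp_alpha).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # pick one alpha along the pruning path
post_pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X, y)

print("pre-pruned depth: ", pre_pruned.get_depth())
print("post-pruned depth:", post_pruned.get_depth())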
Advantages of the Decision Tree
• It is simple to understand, as it follows the same process that a human follows while making any
decision in real life.
• It can handle both categorical and numerical data, and the tree structure is easy to visualise and interpret.
Limitations of the Decision Tree
1. Overfitting: a deep, unpruned tree may fit noise in the training data; this can be mitigated by pruning or
by the Random Forest algorithm.
2. Instability: small changes in the training data can produce a very different tree.
3. Sensitive to noise: noisy features or labels can distort the splits.
4. For more class labels, the computational complexity of the decision tree may increase.
Conclusion
Decision trees are powerful tools for classification and regression tasks, providing a clear and interpretable
way to make predictions. Their ability to handle both categorical and numerical data, along with their simplicity
and visual representation, makes them widely used in various fields. However, decision trees can be prone to
overfitting, and the choice of hyperparameters, such as tree depth, is crucial. Ensemble methods like
Random Forests and Gradient Boosting can enhance performance and mitigate overfitting. Ultimately, the
suitability of a decision tree depends on the specific characteristics of the dataset and the goals
of the analysis.
THANK YOU