2 - Decision Tree
Classification is a supervised learning task, i.e., the model is trained on examples whose input and output values are known, and it assigns data to predefined groups (classes) that need not share similar properties. Decision tree induction was developed by Ross Quinlan, whose original decision tree algorithm is known as ID3 (Iterative Dichotomiser). A decision tree is a classifier in the form of a tree structure, where:
Decision node: specifies a test on a single attribute
Leaf node: indicates the value of the target attribute
Arc/edge: a split on one attribute
Path: a conjunction of tests that leads to the final decision
ID3, C4.5, and CART are greedy algorithms for the induction of decision trees. Each algorithm uses an attribute selection measure to select the attribute tested at each nonleaf node in the tree. Pruning algorithms attempt to improve accuracy by removing tree branches reflecting noise in the data. Early decision tree algorithms typically assume that the data are memory resident, which is a limitation when mining very large databases. Several scalable algorithms, such as SLIQ, SPRINT, and RainForest, have been proposed to address this issue.
Key Requirements:
Attribute-value description: the object or case must be expressible in terms of a fixed collection of properties or attributes (e.g., hot, mild, cold).
Predefined classes (target values): the target function has discrete output values (Boolean or multiclass).
Sufficient data: enough training cases must be provided to learn the model.
CLASSIFICATION TYPES
1) Decision Tree
2) Bayesian classification
3) Rule-based classification
1. Decision Tree Induction: Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree structure, where each internal node (nonleaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node. Decision trees classify instances or examples by starting at the root of the tree and following branches until a leaf node is reached. They can generate understandable rules, perform classification without much computation, handle both continuous and categorical variables, and provide a clear indication of which fields are most important for prediction or classification. However, they are not well suited to predicting continuous attributes, and they perform poorly when there are many classes and little training data.
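To make the root-to-leaf traversal concrete, here is a minimal sketch (not part of the original notes) of a tree represented as nested Python dictionaries, using the age, student, and credit rating attributes of the buys computer example; the branch outcomes are assumed to follow the standard AllElectronics tree shown later in this section.

```python
# A minimal sketch of root-to-leaf classification with a hand-built tree.
# Internal nodes are dicts holding a test attribute and branches; leaves are class labels.

tree = {
    "attribute": "age",
    "branches": {
        "<=30":   {"attribute": "student",
                   "branches": {"no": "no", "yes": "yes"}},
        "31..40": "yes",                           # leaf node holding a class label
        ">40":    {"attribute": "credit_rating",   # assumed branch from the standard example
                   "branches": {"excellent": "no", "fair": "yes"}},
    },
}

def classify(node, x):
    """Start at the root and follow the branch matching x's attribute value until a leaf."""
    while isinstance(node, dict):                  # internal (non-leaf) node: a test
        node = node["branches"][x[node["attribute"]]]
    return node                                    # leaf node: the predicted class

print(classify(tree, {"age": "<=30", "student": "yes"}))   # -> yes
```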
Let A be the splitting attribute. Based on the training data, A has v distinct values, {a1, a2, ..., av}.
1. A is discrete-valued: In this case, the outcomes of the test at node N correspond directly to the known values of A.
2. A is continuous-valued: In this case, the test at node N has two possible outcomes, corresponding to the conditions A <= split_point and A > split_point, respectively.
3. A is discrete-valued and a binary tree must be produced (as dictated by the attribute selection measure or algorithm being used): In this case, the test at node N is of the form "A in S_A?", where S_A is the splitting subset for A.
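As a rough illustration of these three cases, the following sketch (with assumed parameter names such as split_point and splitting_subset, which the notes do not fix) shows how a tuple's value of A would be mapped to a branch under each kind of test.

```python
# A sketch of the three branching rules for splitting attribute A.

def branch_of(value, case, split_point=None, splitting_subset=None):
    if case == "discrete":            # 1. one outcome per known value of A
        return value
    if case == "continuous":          # 2. two outcomes: A <= split_point or A > split_point
        return "<= split_point" if value <= split_point else "> split_point"
    if case == "discrete_binary":     # 3. two outcomes: A in S_A or A not in S_A
        return "in S_A" if value in splitting_subset else "not in S_A"

# e.g. branch_of(35, "continuous", split_point=30)                      -> "> split_point"
#      branch_of("fair", "discrete_binary", splitting_subset={"fair"})  -> "in S_A"
```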
[Figure: decision tree rooted at the test age?, with branches <=30 (leading to a student? test with outcomes no and yes), 31..40 (leaf: yes), and >40.]
Fig: Decision tree for the concept buys computer, indicating whether a customer at AllElectronics is likely to purchase a computer. Each internal (non-leaf) node represents a test on an attribute. Each leaf node represents a class (either buys computer = yes or buys computer = no).
Algorithm: Generate_decision_tree. Generate a decision tree from the training tuples of data partition D.
Input:
i. Data partition, D, which is a set of training tuples and their associated class labels;
ii. Attribute_list, the set of candidate attributes;
iii. Attribute_selection_method, a procedure to determine the splitting criterion that best partitions the data tuples into individual classes. This criterion consists of a splitting_attribute and, possibly, either a split_point or a splitting_subset.
Output: A decision tree.
Algorithm
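A rough Python transcription of the basic Generate_decision_tree procedure described above, under the simplifying assumptions that all attributes are discrete-valued and multiway splits are allowed; attribute_selection_method stands in for any of the measures discussed in the next section.

```python
# Sketch of Generate_decision_tree for discrete-valued attributes with multiway splits.
# D is a list of (tuple_dict, class_label) pairs.

from collections import Counter

def majority_class(D):
    """Most frequent class label in partition D."""
    return Counter(label for _, label in D).most_common(1)[0][0]

def generate_decision_tree(D, attribute_list, attribute_selection_method):
    labels = [label for _, label in D]
    if len(set(labels)) == 1:                 # all tuples belong to the same class: leaf
        return labels[0]
    if not attribute_list:                    # no attributes left: majority-vote leaf
        return majority_class(D)

    A = attribute_selection_method(D, attribute_list)     # choose the splitting attribute
    node = {"attribute": A, "branches": {}}
    remaining = [a for a in attribute_list if a != A]      # remove A (multiway split)

    for v in {tup[A] for tup, _ in D}:        # one branch per known value of A
        Dj = [(tup, label) for tup, label in D if tup[A] == v]
        # In the full algorithm, an empty partition Dj would become a leaf labeled with
        # the majority class of D; here Dj is never empty since v is taken from D itself.
        node["branches"][v] = generate_decision_tree(Dj, remaining,
                                                     attribute_selection_method)
    return node
```

For example, passing attribute_selection_method=lambda D, attrs: max(attrs, key=lambda A: info_gain(D, A)) reproduces ID3-style splitting, with info_gain as sketched under Attribute Selection Measure below.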
2. Attribute Selection Measure: An attribute selection measure is a heuristic for selecting the splitting criterion that best separates a given data partition, D, of class-labeled training tuples into individual classes. i. Information gain: ID3 uses information gain as its attribute selection measure: select the attribute with the highest information gain. This measure is based on pioneering work by Claude Shannon on information theory, which studied the value or "information content" of messages. Let node N represent or hold the tuples of partition D.
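Using the notation of this section, Info(D) = -Σ p_i log2(p_i) is the entropy of D, Info_A(D) = Σ_j (|Dj|/|D|) Info(Dj) is the expected information required after splitting D on attribute A, and Gain(A) = Info(D) - Info_A(D). A minimal sketch of these computations, assuming D is a list of (tuple, class label) pairs as in the algorithm sketch above:

```python
# A minimal sketch of information gain (ID3's attribute selection measure).

from collections import Counter
from math import log2

def info(D):
    """Info(D) = -sum(p_i * log2(p_i)), the entropy of partition D."""
    counts = Counter(label for _, label in D)
    total = len(D)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def info_gain(D, A):
    """Gain(A) = Info(D) - Info_A(D): the expected reduction in entropy from splitting on A."""
    total = len(D)
    partitions = {}
    for tup, label in D:
        partitions.setdefault(tup[A], []).append((tup, label))
    info_A = sum((len(Dj) / total) * info(Dj) for Dj in partitions.values())
    return info(D) - info_A
```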
ii. Gain ratio: The information gain measure is biased toward tests with many outcomes; that is, it prefers to select attributes having a large number of values. C4.5 uses an extension of information gain known as gain ratio, which attempts to overcome this bias.
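The gain ratio normalizes information gain by the split information of the attribute: SplitInfo_A(D) = -Σ_j (|Dj|/|D|) log2(|Dj|/|D|), and GainRatio(A) = Gain(A) / SplitInfo_A(D). Continuing the sketch above (info_gain is the function defined there):

```python
# Gain ratio, building on the information gain sketch above.

from collections import Counter
from math import log2

def split_info(D, A):
    """SplitInfo_A(D) = -sum((|Dj|/|D|) * log2(|Dj|/|D|)) over the partitions induced by A."""
    total = len(D)
    counts = Counter(tup[A] for tup, _ in D)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def gain_ratio(D, A):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D); undefined when SplitInfo_A(D) is 0."""
    return info_gain(D, A) / split_info(D, A)
```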
iii. Gini index: The Gini index is used in CART. Using the notation described above, the Gini index measures the impurity of D, a data partition or set of training tuples, as Gini(D) = 1 - Σ_{i=1}^{m} p_i^2, where p_i is the probability that a tuple in D belongs to class C_i and is estimated by |C_i,D|/|D|.
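Continuing the same sketch, the Gini index and the weighted Gini of a binary split (CART considers binary splits, choosing the attribute and split that maximize the reduction Gini(D) - Gini_A(D)) can be computed as:

```python
# Gini index, continuing the attribute selection sketches above.

from collections import Counter

def gini(D):
    """Gini(D) = 1 - sum(p_i^2), the impurity of partition D."""
    counts = Counter(label for _, label in D)
    total = len(D)
    return 1 - sum((c / total) ** 2 for c in counts.values())

def gini_split(D1, D2):
    """Weighted Gini of a binary split of D into D1 and D2, i.e. Gini_A(D)."""
    n = len(D1) + len(D2)
    return (len(D1) / n) * gini(D1) + (len(D2) / n) * gini(D2)
```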
3. Tree Pruning: Pruning algorithms attempt to improve accuracy by removing tree branches reflecting noise in the data. There are two common approaches to tree pruning: prepruning and postpruning.
i. In the prepruning approach, a tree is "pruned" by halting its construction early (e.g., by deciding not to further split or partition the subset of training tuples at a given node).
ii. In postpruning, subtrees are removed from a fully grown tree. A subtree at a given node is pruned by removing its branches and replacing it with a leaf. The leaf is labeled with the most frequent class among the subtree being replaced.
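As one concrete illustration (scikit-learn is an assumption about tooling, not something these notes prescribe), its DecisionTreeClassifier supports both styles: prepruning through growth limits such as max_depth and min_samples_leaf, and postpruning through cost-complexity pruning controlled by ccp_alpha.

```python
# Illustration of prepruning vs. postpruning using scikit-learn (assumed library).

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Prepruning: halt construction early by limiting depth and minimum leaf size.
prepruned = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X, y)

# Postpruning: grow the tree fully, then prune subtrees whose removal costs little
# accuracy relative to their complexity (a larger ccp_alpha prunes more aggressively).
postpruned = DecisionTreeClassifier(ccp_alpha=0.02).fit(X, y)

print(prepruned.get_depth(), postpruned.get_depth())
```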
4. Scalability and Decision Tree Induction: The efficiency of existing decision tree algorithms, such as ID3, C4.5, and CART, has been well established for relatively small data sets. Efficiency becomes an issue of concern when these algorithms are applied to the mining of very large real-world databases, because they require the training tuples to reside in main memory. The topics below cover (a) repetition and (b) replication, two forms of redundancy that can appear in induced trees, and (c) SLIQ and (d) SPRINT, two scalable decision tree algorithms designed for disk-resident data.
a) Repetition
Fig: An example of subtree repetition, where an attribute is repeatedly tested along a given branch of the tree (e.g., age).
b) Replication
Fig: An example of subtree replication, where duplicate subtrees exist within a tree (e.g., the subtree headed by the node credit rating?).
c) SLIQ
Fig: Attribute list and class list data structures used in SLIQ for the tuple data.
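In SLIQ, each attribute has a (typically disk-resident) attribute list of (attribute value, record identifier) pairs, while a single memory-resident class list maps each record identifier to its class label and current tree node. A toy sketch of these structures (the tuple values below are invented for illustration):

```python
# Toy sketch of the SLIQ attribute list / class list structures.

tuples = [
    {"RID": 1, "age": 25, "credit_rating": "fair",      "class": "no"},
    {"RID": 2, "age": 38, "credit_rating": "excellent", "class": "yes"},
    {"RID": 3, "age": 45, "credit_rating": "fair",      "class": "yes"},
]

# Disk-resident attribute lists, pre-sorted on the attribute value: (value, RID).
attribute_lists = {
    attr: sorted((t[attr], t["RID"]) for t in tuples)
    for attr in ("age", "credit_rating")
}

# Memory-resident class list, indexed by record id: class label and current tree node.
class_list = {t["RID"]: {"class": t["class"], "node": "root"} for t in tuples}
```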
d) SPRINT
Fig: Attribute list data structure used in SPRINT for the tuple data
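SPRINT removes the memory-resident class list: each attribute list entry carries the class label itself as (attribute value, class label, RID), so the lists can be partitioned and distributed along with the tree. A sketch reusing the toy tuples from the SLIQ example above:

```python
# Toy sketch of SPRINT attribute lists: (value, class label, RID), no separate class list.

sprint_attribute_lists = {
    attr: sorted((t[attr], t["class"], t["RID"]) for t in tuples)
    for attr in ("age", "credit_rating")
}
```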
Issues of Classification
1. Accuracy
2. Training time
3. Robustness
4. Interpretability
5. Scalability

Typical applications
1. Credit approval
2. Target marketing
3. Medical diagnosis
4. Fraud detection
5. Weather forecasting
6. Stock marketing