UNIT-3 ML notes

Decision trees are supervised machine learning algorithms used for classification and regression, structured as a tree with nodes representing tests on attributes and leaf nodes indicating class labels. They are easy to interpret and versatile, but can suffer from overfitting and sensitivity to data changes. Applications include customer segmentation, fraud detection, and medical diagnosis, with the CART algorithm commonly used for tree construction.

Uploaded by srinu


UNIT-3
DECISION TREE FOR CLASSIFICATION:
 Decision trees are supervised machine learning algorithms used for both classification
and regression. They are represented as a tree-like structure in which each internal node
represents a test on an attribute, each branch represents an outcome of that test, and each
leaf node represents a class label or a predicted value.
How Decision Trees Work for Classification:
 Tree Structure: A decision tree starts with a root node, which represents the entire
dataset.
 Splitting: Each internal node represents a test on a specific attribute, and the branches
represent the possible outcomes of that test.
 Leaf Nodes: The leaf nodes represent the final classifications or predictions.
 Decision Rules: The path from the root node to a leaf node defines a decision rule.
 Goal: The goal of a decision tree is to find the optimal set of splits that best separates
the data into distinct classes.
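The root-to-leaf decision path described above can be sketched as ordinary conditional logic. The attributes, thresholds, and class labels below are purely illustrative assumptions, not taken from these notes:

```python
# A minimal sketch of how a fitted decision tree classifies one sample:
# each if-test is an internal node, each return is a leaf node, and the
# path taken from the first test to a return is one decision rule.

def classify(sample):
    """Classify a loan applicant as 'approve' or 'reject'.

    `sample` is a dict with the (hypothetical) attributes
    'income' (numeric) and 'has_collateral' (bool).
    """
    # Root node: test on the 'income' attribute.
    if sample["income"] >= 50_000:
        return "approve"        # leaf node
    # Internal node: test on the 'has_collateral' attribute.
    if sample["has_collateral"]:
        return "approve"        # leaf node
    return "reject"             # leaf node

print(classify({"income": 60_000, "has_collateral": False}))  # approve
print(classify({"income": 30_000, "has_collateral": False}))  # reject
```

A real learner induces these tests and thresholds from data; the sketch only shows the structure the notes describe.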
Key Concepts in Decision Trees:
 Root Node: The starting point of the tree, representing the entire dataset.
 Internal Nodes: Nodes that represent tests on attributes.
 Branches: Connections between nodes, representing the outcomes of tests.
 Leaf Nodes: Nodes that represent the final classifications or predictions.
 Splitting: The process of dividing the dataset into subsets based on attribute values.
 Pruning: A technique used to simplify the tree by removing branches that are not
important for prediction.
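Of these concepts, splitting is the one that drives tree construction. A minimal sketch of splitting a dataset on one categorical attribute, where each resulting subset corresponds to one branch out of the decision node (the toy rows and attribute names are assumptions):

```python
from collections import defaultdict

def split_by_attribute(rows, attribute):
    """Splitting: divide the dataset into subsets, one per value of
    `attribute`. Each subset becomes one branch of the decision node."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row)
    return dict(subsets)

data = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rain",  "play": "yes"},
    {"outlook": "sunny", "play": "yes"},
]
branches = split_by_attribute(data, "outlook")
# two branches: 'sunny' (2 rows) and 'rain' (1 row)
```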
Advantages of Decision Trees:
 Easy to Understand and Interpret: The tree structure makes the decision-making
process easy to visualize and understand.
 Versatile: Can be used for both classification and regression problems.
 Can Handle Both Numerical and Categorical Data: Splits can be defined by thresholds
on numeric attributes or by membership in categories.

 Foundation for Ensemble Methods: Decision trees are the basis for more advanced
techniques like Random Forests and Gradient Boosting.
Disadvantages of Decision Trees:
 Overfitting: Decision trees can become overly complex and fit the training data too well,
leading to poor performance on unseen data.
 Sensitive to Small Changes in Data: A small change in the training data can lead to a
different tree structure.
Applications of Decision Trees:
 Customer Segmentation: Classifying customers into different groups based on their
characteristics.
 Fraud Detection: Identifying fraudulent transactions.
 Medical Diagnosis: Predicting the likelihood of a disease based on patient symptoms.
 Risk Assessment: Assessing the risk of lending money to a borrower.

o Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier, where internal nodes represent
the features of a dataset, branches represent the decision rules and each leaf node
represents the outcome.
o In a Decision tree, there are two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make decisions and have multiple branches,
whereas Leaf nodes are the outputs of those decisions and do not contain any further
branches.
o The decisions or tests are performed on the basis of the features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification
and Regression Tree algorithm.
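CART chooses splits by minimizing Gini impurity. A small sketch of that criterion — the formula G = 1 − Σ p_k² over the class proportions p_k is standard; the toy labels are assumptions:

```python
from collections import Counter

def gini(labels):
    """Gini impurity, the split criterion CART minimizes:
    G = 1 - sum(p_k^2) over the class proportions p_k.
    0.0 means a pure node; higher means a more mixed node."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini(["yes", "yes", "yes"]))       # 0.0 (pure node)
print(gini(["yes", "no", "yes", "no"]))  # 0.5 (maximally mixed, 2 classes)
```

CART evaluates candidate splits by the impurity of the resulting subsets, weighted by subset size, and keeps the split with the lowest weighted impurity.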

o A decision tree simply asks a question and, based on the answer (Yes/No), splits
further into subtrees.

Decision Tree Terminologies


o Root Node: Root node is from where the decision tree starts. It represents the entire
dataset, which further gets divided into two or more homogeneous sets.
o Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split any
further once a leaf node is reached.
o Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
o Branch/Sub Tree: A subtree formed by splitting a node of the tree.
o Pruning: Pruning is the process of removing the unwanted branches from the tree.
o Parent/Child node: A node that splits into sub-nodes is called the parent node, and
the sub-nodes are called its child nodes.

o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values of the best attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in
Step-3. Continue this process until a stage is reached where the nodes cannot be
classified further; these final nodes are the leaf nodes.
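Steps 1–5 can be sketched as a short recursive routine. This is an illustrative implementation under stated assumptions — weighted Gini impurity as the attribute-selection measure and a hypothetical toy dataset — not the exact procedure from these notes:

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_attribute(rows, attributes, target):
    # Step-2: pick the attribute whose split yields the lowest
    # weighted Gini impurity (one simple attribute-selection measure).
    def weighted_gini(attr):
        n = len(rows)
        total = 0.0
        for value in {r[attr] for r in rows}:
            subset = [r[target] for r in rows if r[attr] == value]
            total += len(subset) / n * gini(subset)
        return total
    return min(attributes, key=weighted_gini)

def build_tree(rows, attributes, target):
    labels = [r[target] for r in rows]
    # Stop: node is pure or no attributes remain -> leaf with majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, attributes, target)        # Step-2
    remaining = [a for a in attributes if a != attr]
    node = {attr: {}}
    for value in {r[attr] for r in rows}:                  # Step-3
        subset = [r for r in rows if r[attr] == value]
        node[attr][value] = build_tree(subset, remaining, target)  # Step-5
    return node

# Hypothetical toy dataset for illustration only.
data = [
    {"outlook": "sunny",    "windy": "no",  "play": "no"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "rain",     "windy": "no",  "play": "yes"},
    {"outlook": "rain",     "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
]
tree = build_tree(data, ["outlook", "windy"], "play")
```

On this toy data the routine splits first on `outlook` (lowest weighted impurity), makes `sunny` and `overcast` leaves immediately, and recurses on `windy` for the mixed `rain` subset — mirroring Steps 1–5 above.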
