UNIT-3 ML notes
UNIT-3
DECISION TREE FOR CLASSIFICATION:
Decision trees are a supervised machine learning algorithm used for both classification
and regression, represented as a tree-like structure where each node represents a test on
an attribute, each branch represents the outcome of the test, and each leaf node represents
a class label or a predicted value.
How Decision Trees Work for Classification:
Tree Structure: A decision tree starts with a root node, which represents the entire
dataset.
Splitting: Each internal node represents a test on a specific attribute, and the branches
represent the possible outcomes of that test.
Leaf Nodes: The leaf nodes represent the final classifications or predictions.
Decision Rules: The path from the root node to a leaf node defines a decision rule.
Goal: The goal of a decision tree is to find the optimal set of splits that best separates
the data into distinct classes.
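As a concrete illustration of the process above, here is a minimal sketch of training a decision tree classifier with scikit-learn (this assumes scikit-learn is installed; the Iris dataset, split ratio, and random seeds are illustrative choices, not part of the notes):

```python
# Illustrative sketch: decision tree classification with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X_train, y_train)             # learn the splits from the training data
accuracy = clf.score(X_test, y_test)  # fraction of correct test predictions
print(f"Test accuracy: {accuracy:.2f}")
```

Each internal node of the fitted tree holds a test on one feature, and each path from the root to a leaf corresponds to one decision rule, exactly as described above.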
Key Concepts in Decision Trees:
Root Node: The starting point of the tree, representing the entire dataset.
Internal Nodes: Nodes that represent tests on attributes.
Branches: Connections between nodes, representing the outcomes of tests.
Leaf Nodes: Nodes that represent the final classifications or predictions.
Splitting: The process of dividing the dataset into subsets based on attribute values.
Pruning: A technique used to simplify the tree by removing branches that are not
important for prediction.
Advantages of Decision Trees:
Easy to Understand and Interpret: The tree structure makes the decision-making
process easy to visualize and understand.
Versatile: Can be used for both classification and regression problems.
Can Handle Both Numerical and Categorical Data: Decision trees can work with
different types of data.
Foundation for Ensemble Methods: Decision trees are the basis for more advanced
techniques like Random Forests and Gradient Boosting.
Disadvantages of Decision Trees:
Overfitting: Decision trees can become overly complex and fit the training data too well,
leading to poor performance on unseen data.
Sensitive to Small Changes in Data: A small change in the training data can lead to a
different tree structure.
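Overfitting is usually countered by restricting the tree's growth (a form of pruning). A minimal sketch with scikit-learn, assuming it is installed; the values `max_depth=3` and `min_samples_leaf=5` are illustrative and would normally be tuned by cross-validation:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# An unrestricted tree keeps splitting until every leaf is pure,
# so it can memorize noise in the training data.
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Limiting depth and minimum leaf size forces simpler, more general splits.
pruned_tree = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=5, random_state=0).fit(X, y)

print("full depth:", full_tree.get_depth())
print("pruned depth:", pruned_tree.get_depth())
```

The shallower tree typically gives up a little training accuracy in exchange for better performance on unseen data.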
Applications of Decision Trees:
Customer Segmentation: Classifying customers into different groups based on their
characteristics.
Fraud Detection: Identifying fraudulent transactions.
Medical Diagnosis: Predicting the likelihood of a disease based on patient symptoms.
Risk Assessment: Assessing the risk of lending money to a borrower.
o Decision Tree is a supervised learning technique that can be used for both
classification and regression problems, though it is mostly preferred for solving
classification problems. It is a tree-structured classifier, where internal nodes represent
the features of a dataset, branches represent the decision rules, and each leaf node
represents the outcome.
o In a decision tree, there are two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make decisions and have multiple branches,
whereas leaf nodes are the outputs of those decisions and do not contain any further
branches.
o The decisions or tests are performed on the basis of the features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification
and Regression Tree algorithm.
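CART chooses splits by minimizing node impurity, most commonly the Gini impurity. A small self-contained sketch of how that measure is computed at a node (the label values are just illustrative):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions.
    0.0 means the node is pure (a single class); higher means more mixed."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini(["yes", "yes", "no", "no"]))  # maximally mixed two-class node -> 0.5
print(gini(["yes", "yes", "yes"]))       # pure node -> 0.0
```

At each internal node, CART evaluates candidate splits and keeps the one whose resulting child subsets have the lowest weighted impurity.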
o A decision tree simply asks a question and, based on the answer (Yes/No), splits
further into subtrees.
o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets containing the possible values of the best attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively build new decision trees using the subsets created in Step-3.
Continue this process until a stage is reached where the nodes cannot be classified
further; such a final node is called a leaf node.
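The steps above can be sketched as a bare-bones recursive splitter. This is a toy illustration, not full CART: it takes attributes in order instead of scoring them with an ASM, branches on exact categorical values, and uses a majority vote at leaves; all function and data names are hypothetical:

```python
from collections import Counter

def majority(labels):
    """Most common class label in a node (used for leaf predictions)."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, features):
    # Steps 1 & 5: stop when the node is pure or no features remain -> leaf.
    if len(set(labels)) == 1 or not features:
        return majority(labels)
    # Step 2: pick an attribute (here simply the first one; a real ASM
    # would score each candidate, e.g. by Gini impurity or information gain).
    feat, rest = features[0], features[1:]
    tree = {"feature": feat, "branches": {}}
    # Step 3: partition the rows by the chosen attribute's values.
    for value in set(row[feat] for row in rows):
        sub_rows = [r for r in rows if r[feat] == value]
        sub_labels = [l for r, l in zip(rows, labels) if r[feat] == value]
        # Steps 4 & 5: attach a node for this value and recurse on the subset.
        tree["branches"][value] = build_tree(sub_rows, sub_labels, rest)
    return tree

# Toy dataset: predict whether to play based on the outlook.
rows = [{"outlook": "sunny"}, {"outlook": "rain"}, {"outlook": "sunny"}]
labels = ["no", "yes", "no"]
tree = build_tree(rows, labels, ["outlook"])
print(tree)
```

The returned nested dictionary mirrors the tree structure described above: one decision node testing "outlook", with a leaf label at the end of each branch.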