Decision Tree
Outline: Supervised & Unsupervised learning, What is a decision tree, Entropy, Information Gain,
Pre-pruning, Post-pruning, Advantages
Supervised Learning
Supervised learning is a type of machine learning algorithm that learns from labeled data.
Labeled data is data that has been tagged with a correct answer or classification.
In supervised learning we teach or train the machine using data that is well-labelled,
which means the data is already tagged with the correct answer.
After that, the machine is provided with a new set of examples (data) so that the supervised
learning algorithm analyses the training data (the set of training examples) and produces a
correct outcome from the labeled data.
Examples: classification and regression are supervised learning tasks. By contrast, clustering, where the
algorithm groups similar data points together based on certain characteristics, and dimensionality
reduction, where the algorithm reduces the number of features while preserving relevant information, are
examples of unsupervised learning.
What is Decision Tree
A decision tree is a popular machine learning algorithm that is used for both classification and regression
tasks. It is a tree-like model where each internal node represents a decision based on a feature, each branch
represents the outcome of the decision, and each leaf node represents the final predicted output for a given
input.
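As a quick illustration of how such a tree is used in practice, here is a minimal sketch using scikit-learn's
DecisionTreeClassifier; the tiny weather-style dataset and its numeric encoding are invented purely for
demonstration.

# Minimal sketch: training and querying a decision tree classifier with scikit-learn.
# The tiny weather-style dataset below is invented for illustration only.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [outlook, humidity], encoded as numbers.
# outlook: 0 = sunny, 1 = overcast, 2 = rainy; humidity: 0 = normal, 1 = high
X = [[0, 1], [0, 0], [1, 1], [2, 1], [2, 0], [1, 0]]
y = [0, 1, 1, 0, 1, 1]  # 0 = "don't play", 1 = "play"

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

print(tree.predict([[0, 0]]))  # predicted class for a new instance
print(export_text(tree, feature_names=["outlook", "humidity"]))  # the learned decisions, as text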
Components of Decision Tree
A decision tree consists of several key components that together define its structure and behavior. Here are
the main components of a decision tree:
Root Node:
- The topmost node in the tree, representing the initial decision or test based on a specific feature. It is the
starting point for the decision-making process.
Internal Nodes:
- Nodes within the tree, excluding the leaf nodes. Each internal node corresponds to a decision based on a
specific feature and serves as a branching point in the tree.
Decision Node:
- In a decision tree, a decision node is a point where the tree makes a decision about the input data. It
represents a test or condition that is applied to the input features, and based on the outcome of that test,
the data is routed down one of the branches leading out of the node.
Edges:
- Edges connect nodes in the tree and represent the outcome of a decision. Each edge leads from
one node to another and corresponds to a particular outcome of the decision associated with the parent
node.
Subtree:
- A subtree is a portion of the entire decision tree that is itself a valid decision tree. Internal nodes and
leaf nodes, along with their connecting branches, form subtrees within the larger tree.
ID3 VS CART
ID3 (Iterative Dichotomiser 3) and CART (Classification and Regression Trees) are both algorithms
used in machine learning for building decision trees, but they have some differences in terms of their
approach and applications.
Objective:
• ID3: Primarily designed for classification tasks. It builds a decision tree by recursively splitting the
dataset based on the most informative attribute at each step, aiming to create branches that result in
pure subsets.
• CART: Can be used for both classification and regression tasks. It constructs binary trees by
recursively splitting the dataset based on the feature that provides the best separation according to a
specified criterion.
Categorical vs. Numeric Attributes:
• ID3: Primarily handles categorical attributes. It selects the attribute that maximizes information gain or
minimizes entropy.
• CART: Can handle both categorical and numeric attributes. It uses different criteria for determining the best
split depending on the type of task (e.g., Gini impurity for classification and mean squared error for
regression).
Splitting Criteria:
• ID3: Uses information gain and entropy as the criterion for selecting the best attribute to split the data.
• CART: Uses Gini impurity for classification tasks (to measure the node's impurity) and mean squared error for
regression tasks.
Tree Structure:
• ID3: Tends to create deeper trees, which can be more prone to overfitting.
• CART: Typically results in shallower trees, and it employs pruning techniques to control overfitting.
Output:
• ID3: Outputs categorical class labels only, since it is designed for classification tasks.
• CART: Can output both categorical labels and numeric values, making it versatile for classification
and regression tasks.
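To make the difference in splitting criteria concrete, here is a minimal sketch using scikit-learn's CART
implementation; the toy data is invented for illustration. The classifier can be configured with Gini impurity
or entropy, while the regressor minimises squared error.

# Sketch of CART-style trees in scikit-learn (toy data invented for illustration).
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[1], [2], [3], [4], [5], [6]]

# Classification tree: Gini impurity (default) or "entropy" as the split criterion.
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X, [0, 0, 0, 1, 1, 1])

# Regression tree: splits chosen to minimise mean squared error ("squared_error").
reg = DecisionTreeRegressor(criterion="squared_error", random_state=0)
reg.fit(X, [1.1, 1.9, 3.2, 3.9, 5.1, 6.2])

print(clf.predict([[2.5]]), reg.predict([[2.5]]))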
Decision trees are of two types: classification trees and regression trees.
Classification
Classification in decision trees involves building a tree structure that helps classify instances or data
points into different classes or categories. The decision tree is constructed based on features or attributes
of the data, and each internal node of the tree represents a decision based on a specific attribute.
If the output of the data is categorical (a label), we use the classification technique to split the nodes of
the decision tree.
Entropy
• Entropy is used for checking the impurity or uncertainty present in the data. It is used to evaluate the
quality of a split.
• Formula of entropy: assume the resulting decision tree classifies instances into two categories, which we
will call positive and negative. Given a set S containing positive and negative examples, the entropy of S is
Entropy(S) = -p(+) log2 p(+) - p(-) log2 p(-)
where p(+) and p(-) are the proportions of positive and negative examples in S.
• A classic worked example is a table that forecasts whether a match will be played or not according to the
weather conditions (see the problem setup below).
• Graph of entropy: entropy is 0 when the set is pure (all one class) and reaches its maximum of 1 (for two
classes) when the classes are evenly split.
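A short sketch of how this entropy formula can be computed in Python; the helper name and the class counts
are chosen here only for illustration.

# Sketch: entropy of a set from its class counts (illustrative helper, not a library API).
from math import log2

def entropy(counts):
    """Entropy of a set given the number of examples in each class."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([9, 5]))   # e.g. 9 "play" and 5 "don't play" instances -> about 0.940
print(entropy([7, 7]))   # evenly split -> 1.0 (maximum uncertainty)
print(entropy([14, 0]))  # pure set -> entropy 0 (no uncertainty)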
Gini Impurity
• Gini impurity is a measure used in decision tree algorithms to quantify a dataset's impurity
level or disorder.
• It indicates how much information a particular feature or variable gives us about the final outcome.
• To minimize the depth of the decision tree when we traverse a path, we need to select the optimal attribute
for splitting at each tree node.
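For a set with class proportions p(i), Gini impurity is 1 - Σ p(i)², so it is 0 for a pure set and 0.5 at worst
for two classes. A small illustrative sketch (helper name chosen here):

# Sketch: Gini impurity of a set, Gini(S) = 1 - sum_i p(i)^2 (illustrative helper).
def gini(counts):
    """Gini impurity of a set given the number of examples in each class."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([9, 5]))    # about 0.459
print(gini([7, 7]))    # evenly split -> 0.5 (maximum for two classes)
print(gini([14, 0]))   # pure set -> 0.0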
Steps to make a Decision Tree
1. Take the entire dataset as input.
2. Calculate the entropy of the target variable, as well as of the predictor attributes.
3. Calculate the information gain of each attribute and choose the attribute with the highest information gain
as the root node.
4. Repeat the same procedure on every branch until the decision node of each branch is finalised.
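The sketch below illustrates the attribute-selection step (ID3-style information gain) on a tiny made-up
dataset; all names and values are for illustration only.

# Sketch: choosing the root attribute by information gain (ID3-style), on toy data.
from math import log2
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy of the whole set minus the weighted entropy after splitting on one attribute."""
    base = entropy(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return base - remainder

# Toy weather-style data: [outlook, humidity] -> play / don't play
rows = [["sunny", "high"], ["sunny", "normal"], ["overcast", "high"],
        ["rainy", "high"], ["rainy", "normal"], ["overcast", "normal"]]
labels = ["no", "yes", "yes", "no", "yes", "yes"]

gains = {i: information_gain(rows, labels, i) for i in range(2)}
print(gains)  # the attribute with the largest gain becomes the root node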
PROBLEM
Example dataset: each instance records the Outlook, Temperature, and Humidity, and the outcome is whether
the match will be played or not.
Regression
Regression refers to using a decision tree algorithm to predict a continuous numeric output rather
than class labels.
While decision trees are commonly associated with classification tasks, they can also be adapted for
regression tasks.
The process of building a decision tree for regression is similar to that for classification, but instead
of predicting discrete classes at the leaf nodes, it predicts continuous values.
Continuous data is a type of quantitative data that can take any value within a given range.
For example: height, weight, time, temperature.
Regression Problem
• At first we take one of the input attributes as the root node.
• Then we split the output values and arrange them according to the condition of the root node.
• The next step is to calculate the variance (MSE/SSR) of the output values for that root node.
• Formula of variance/MSE: Var(S) = (1/n) Σ (y_i - ȳ)², where ȳ is the mean of the output values in S.
• Next we calculate the variance reduction for that root node: the variance of the parent node minus the
weighted average variance of its child nodes.
• Next we calculate the variance reduction for the other candidate root nodes.
• After calculating the variance reduction for all candidate root nodes, we select as the main root node of
the decision tree the one whose variance reduction is largest.
• After selecting the root node, we split the node into binary branches according to the conditions.
• This process continues until the decision tree has been built.
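A minimal sketch of the variance-reduction computation for one candidate split; the toy target values and
helper names are chosen here for illustration.

# Sketch: variance reduction for one candidate split of a regression tree (toy data).
def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(parent, children):
    """Variance of the parent minus the size-weighted variance of the child subsets."""
    n = len(parent)
    weighted = sum(len(c) / n * variance(c) for c in children)
    return variance(parent) - weighted

# Toy target values, split into two groups by some condition on a candidate root attribute.
parent = [10.0, 12.0, 30.0, 32.0]
left, right = [10.0, 12.0], [30.0, 32.0]

print(variance_reduction(parent, [left, right]))  # larger reduction -> better split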
Key Differences from Classification Trees:
• Output at Leaf Nodes: Instead of class labels, regression trees output continuous values representing the
predicted outcome.
• Splitting Criteria: The decision criteria for selecting features and thresholds are typically based on
minimizing the variance or mean squared error of the target variable within the subsets created by the
splits.
• Evaluation Metrics: Common evaluation metrics for regression trees include mean squared error (MSE),
mean absolute error (MAE), or other measures of prediction accuracy.
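As an illustration of these points, the sketch below (with toy data invented here) fits a scikit-learn
regression tree and evaluates it with MSE and MAE.

# Sketch: a regression tree predicting continuous values, evaluated with MSE and MAE (toy data).
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1.2, 1.9, 3.1, 3.8, 5.2, 6.1]

reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
pred = reg.predict(X)

print("MSE:", mean_squared_error(y, pred))
print("MAE:", mean_absolute_error(y, pred))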
PRUNING
Pruning consists of a set of techniques that can be used to simplify a decision tree and enable it to
generalise better.
Pruning of decision trees falls into two general forms: pre-pruning and post-pruning.
Post-Pruning:
• This technique is used when the decision tree has grown to a very large depth and the model shows
overfitting. The fully grown tree is trimmed back after training, for example with cost-complexity pruning
(cost_complexity_pruning / the ccp_alpha parameter in scikit-learn).
Pre-Pruning:
• This technique limits the growth of the tree during training, by constraining parameters such as
max_depth and min_samples_split.
Advantages of Pruning:
• Improved Generalization: Pruning helps create a simpler and more generalizable tree, reducing the risk
of overfitting to the training data.
• Reduced Complexity: A pruned tree is often smaller and easier to interpret than an unpruned one,
making it more suitable for practical applications.
• Faster Prediction: Smaller trees typically lead to faster prediction times for new instances.
• Increased Robustness: Pruned trees are less sensitive to noise in the training data.
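A minimal sketch of both forms in scikit-learn (the synthetic dataset and parameter values are illustrative
only): pre-pruning constrains growth up front, while post-pruning grows the full tree and then trims it with
cost-complexity pruning.

# Sketch: pre-pruning vs post-pruning with scikit-learn (synthetic data, illustrative values).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Pre-pruning: limit growth during training.
pre_pruned = DecisionTreeClassifier(max_depth=3, min_samples_split=10, random_state=0)
pre_pruned.fit(X, y)

# Post-pruning: grow the full tree, then prune with cost-complexity pruning (ccp_alpha).
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # pick one alpha along the pruning path
post_pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X, y)

print("pre-pruned depth: ", pre_pruned.get_depth())
print("post-pruned depth:", post_pruned.get_depth())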
Advantages of the Decision Tree
• It is simple to understand, as it follows the same process that a human follows while making any
decision in real life.
• It can handle both categorical and numerical data, and the tree structure is easy to visualise and interpret.
Limitations of the Decision Tree
1. Overfitting: a deep, unpruned tree may fit noise in the training data; this can be mitigated by pruning or
by the Random Forest algorithm.
2. Instability: small changes in the training data can produce a very different tree.
3. Sensitive to noise: noisy features or labels can distort the splits.
4. For more class labels, the computational complexity of the decision tree may increase.
Conclusion
Decision trees are powerful tools for classification and regression tasks, providing a clear and interpretable
way to make predictions. Their ability to handle both categorical and numerical data, along with their simplicity
and visual representation, makes them widely used in various fields. However, decision trees can be prone to
overfitting, and the choice of hyperparameters, such as tree depth, is crucial. Ensemble methods like
Random Forests and Gradient Boosting can enhance performance and mitigate overfitting. Ultimately, the
suitability of a decision tree depends on the specific characteristics of the dataset and the goals
of the analysis.
THANK YOU