Decision Tree & Random Forest

Clustering

Clustering
• Clustering is the process of grouping a set of objects into classes of similar objects.
• It groups data points that are close (or similar) to each other.
• Clustering is unsupervised machine learning: there are no predefined classes.
• A good clustering method will produce clusters with:
  o High intra-class similarity
  o Low inter-class similarity
Types of Clustering
Centroid-based clustering
o The centroid of a cluster is the arithmetic mean of all the points in the cluster.
o Centroid-based clustering organizes the data into non-hierarchical clusters.
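A minimal sketch of centroid-based clustering using scikit-learn's KMeans (the availability of scikit-learn is an assumption here, and the toy points are made up for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D points forming two loose groups (illustrative data).
X = np.array([[1, 2], [1, 4], [2, 3],
              [8, 8], [9, 9], [8, 10]])

# Fit 2 centroids; each centroid is the arithmetic mean of its cluster's points.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # the two centroids
```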
Clustering
Density-based clustering
o Density-based clustering connects contiguous areas of high example density into clusters.
o This allows for the discovery of any number of clusters of any shape.
o Outliers are not assigned to clusters.
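A minimal density-based sketch with scikit-learn's DBSCAN; the eps and min_samples values below are arbitrary choices for this toy data, not recommendations:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one far-away point that should become an outlier.
X = np.array([[1, 1], [1, 2], [2, 1],
              [8, 8], [8, 9], [9, 8],
              [25, 25]])

# Points within eps of each other (with at least min_samples neighbours)
# are connected into one cluster; everything else is labelled -1 (noise).
db = DBSCAN(eps=2.0, min_samples=2).fit(X)
print(db.labels_)  # e.g. [0 0 0 1 1 1 -1]: two clusters and one outlier
```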
Clustering
Distribution-based clustering
o This approach assumes the data is composed of probabilistic distributions, such as Gaussian distributions.
o A distribution-based algorithm might, for example, cluster the data into three Gaussian distributions.
o As distance from a distribution's center increases, the probability that a point belongs to that distribution decreases.
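A minimal distribution-based sketch using a Gaussian mixture model from scikit-learn; the three components mirror the three-Gaussian example above, and the sample data is synthetic:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 1-D data drawn from three Gaussians with different means.
X = np.concatenate([rng.normal(0, 1, 100),
                    rng.normal(5, 1, 100),
                    rng.normal(10, 1, 100)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
# predict_proba gives the probability that a point belongs to each
# distribution; it falls off with distance from the component's center.
print(gmm.means_.ravel())          # roughly 0, 5, 10 (in some order)
print(gmm.predict_proba([[4.0]]))  # soft membership for a new point
```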
Clustering
Hierarchical clustering
o Hierarchical clustering creates a tree of clusters.
o It is well suited to hierarchical data.
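A minimal hierarchical sketch with scikit-learn's AgglomerativeClustering, which builds the cluster tree bottom-up by repeatedly merging the closest clusters (toy data and illustrative parameters):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 1], [1, 2], [2, 1],
              [8, 8], [8, 9], [9, 8]])

# Agglomerative clustering: start with every point as its own cluster,
# then merge the two closest clusters until only 2 remain.
hc = AgglomerativeClustering(n_clusters=2, linkage="ward").fit(X)
print(hc.labels_)  # e.g. [0 0 0 1 1 1]
```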
Decision Tree
Decision Tree
• A decision tree is a supervised learning algorithm used for classification and regression modeling.
• It is mostly preferred for solving classification problems.
• A decision tree is a non-parametric supervised learning algorithm.
• It is a tree-structured classifier:
• Internal nodes represent the features of a dataset.
• Branches represent the decision rules.
• Each leaf node represents the final decision/outcome.
Decision Tree
• A decision tree starts with a root node, which does not have any incoming branches.
• The outgoing branches from the root node then feed into the internal nodes (decision nodes).
• Decision nodes are used to make decisions and have multiple branches.
• Leaf nodes are the outputs of those decisions and do not contain any further branches.
Decision Tree
• A decision tree is a graphical representation for obtaining all the possible solutions to a problem/decision based on given conditions.
• To build a tree, the CART (Classification and Regression Trees) algorithm can be used.
• A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
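scikit-learn's DecisionTreeClassifier implements an optimized version of CART; here is a minimal sketch on a toy yes/no dataset (the features and values below are made up for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy features: [salary_in_lakhs, distance_from_office_km]; label: accept offer?
X = [[60, 5], [45, 30], [80, 10], [30, 2], [70, 25], [50, 8]]
y = ["yes", "no", "yes", "no", "no", "yes"]

# CART splits each node on the feature/threshold that best reduces
# Gini impurity (criterion="gini" is the default).
tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
print(export_text(tree, feature_names=["salary", "distance"]))
print(tree.predict([[65, 7]]))  # ask the questions, follow the branches
```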
Decision Tree Example
Decision Tree Terminologies
• Root node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
• Decision (or internal) node: Decision nodes are used to make decisions and have multiple branches.
• Leaf (external or terminal) node: Leaf nodes are the final output nodes; the tree cannot be segregated further after reaching a leaf node.
Decision Tree
• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the given conditions.
• Branch/sub-tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing unwanted branches from the tree.
• Parent/child node: A node that is divided into sub-nodes is called the parent node, and the sub-nodes are called the child nodes.
Decision Tree – Solving Example 1
Decision Tree – Solving Example 2
• Suppose there is a candidate who has a job offer and wants to decide whether to accept the offer or not.
Decision Tree - Algorithm
• Step 1: Begin the tree with the root node, say S, which contains the complete dataset.
• Step 2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM), such as:
  o Information Gain
  o Gini Index
Decision Tree - Algorithm
• Step 3: Divide S into subsets that contain the possible values for the best attribute.
• Step 4: Generate the decision tree node that contains the best attribute.
• Step 5: Recursively make new decision trees using the subsets of the dataset created in Step 3. Continue this process until a stage is reached where the nodes cannot be classified further; such a final node is called a leaf node.
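As a sketch of how the attribute selection measures in Step 2 might be computed, here is a self-contained illustration; the function names and toy labels are hypothetical:

```python
from collections import Counter
import math

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the splits."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# Toy split: 10 labels at the root, divided by some attribute into two branches.
parent = ["yes"] * 6 + ["no"] * 4
left, right = ["yes"] * 5 + ["no"] * 1, ["yes"] * 1 + ["no"] * 3
print(f"Gini(parent) = {gini(parent):.3f}")  # 1 - (0.6^2 + 0.4^2) = 0.480
print(f"Information gain = {information_gain(parent, [left, right]):.3f}")
```

The attribute whose split yields the highest information gain (or the lowest weighted Gini impurity) is chosen as the best attribute for that node.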
Decision Tree
Advantages
• It is simple to understand, as it follows the same process a human follows when making a decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think through all the possible outcomes for a problem.
• It requires less data cleaning compared to other algorithms.
Decision Tree
Disadvantages
• A decision tree can contain many layers, which makes it complex.
• It may have an overfitting issue, which can be addressed using the Random Forest algorithm.
• With more class labels, the computational complexity of the decision tree may increase.
Random Forest
Random Forest
• Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique.
• It can be used for both classification and regression problems in ML.
• It is based on the concept of ensemble learning: combining multiple classifiers to solve a complex problem and to improve the performance of the model.
Random Forest
• Random Forest is a classifier that contains a number of decision trees built on various subsets of the given dataset and aggregates their predictions to improve predictive accuracy.
• A greater number of trees in the forest generally leads to higher accuracy and reduces the risk of overfitting.
Random Forest - Algorithm
• Step 1: Select K random data points from the training set.
• Step 2: Build a decision tree associated with the selected data points (subset).
• Step 3: Choose the number N of decision trees that you want to build.
• Step 4: Repeat Steps 1 & 2 until N trees are built.
• Step 5: For a new data point, find the prediction of each decision tree, and assign the new data point to the category that wins the majority vote.
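A minimal sketch of this build-many-trees-and-vote idea with scikit-learn's RandomForestClassifier; n_estimators corresponds to N, and the synthetic dataset is illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data (illustrative only).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# N = 100 trees, each fit on a random bootstrap sample of the training set;
# class predictions are combined by majority vote across the trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print(f"Test accuracy: {forest.score(X_test, y_test):.3f}")
```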
Random Forest
Advantages
• Random Forest is capable of performing both classification and regression tasks.
• It is capable of handling large datasets with high dimensionality.
• It enhances the accuracy of the model and reduces the overfitting issue.
Disadvantages
• Although random forest can be used for both classification and regression tasks, it is less suitable for regression tasks.
Decision Tree vs Random Forest
