
CSL503 Data Warehousing and Mining Lab Sem VII

Roll No.-
Name –

EXPERIMENT 4

Title: Implementation of the Bayesian (Naïve Bayes) classification algorithm

Prerequisite: Clustering, Data Mining concepts

Mapping with CO: To apply Data Mining algorithms on a given dataset for a real-time case study
and evaluate their performance using Accuracy Measures. (CSL503.5)

Objective: To apply the Naïve Bayes algorithm on a given dataset.

Outcome: To implement the Naïve Bayes algorithm and calculate the accuracy of the model.

Instructions:
- Explain the dataset used as input
- Code must be implemented using the Python language
- Dataset preprocessing, visualization and algorithm implementation must be done
- Accuracy of the algorithm must be calculated

Deliverables: 1. Explain the Naïve Bayes Algorithm

 Naive Bayes is a statistical classification technique based on Bayes'
Theorem. It is one of the simplest supervised learning algorithms.
The Naive Bayes classifier is a fast, accurate and reliable algorithm,
achieving high accuracy and speed on large datasets.
The Naive Bayes classifier assumes that the effect of a particular feature in
a class is independent of the other features. For example, whether a loan
applicant is desirable or not depends on his/her income, previous loan and
transaction history, age, and location. Even if these features are
interdependent, they are still considered independently. This
assumption simplifies computation, and that is why the method is considered
naive. The assumption is called class conditional independence.
 P(h): the probability of hypothesis h being true (regardless of the
data). This is known as the prior probability of h.
 P(D): the probability of the data (regardless of the hypothesis). This
is known as the prior probability of D, or the evidence.
 P(h|D): the probability of hypothesis h given the data D. This is
known as the posterior probability.
 P(D|h): the probability of the data D given that hypothesis h is
true. This is known as the likelihood.
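These quantities combine via Bayes' Theorem: P(h|D) = P(D|h) · P(h) / P(D). A minimal numeric sketch (the prior, likelihood and evidence values below are made up purely for illustration):

```python
# Bayes' Theorem: posterior = likelihood * prior / evidence
p_h = 0.3          # P(h): prior probability of the hypothesis (hypothetical value)
p_d_given_h = 0.8  # P(D|h): likelihood of the data under h (hypothetical value)
p_d = 0.5          # P(D): prior probability of the data, the evidence (hypothetical value)

p_h_given_d = p_d_given_h * p_h / p_d  # P(h|D): posterior probability
print(round(p_h_given_d, 2))  # 0.48
```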

The Naive Bayes classifier calculates the probability of an event in the following
steps:

 Step 1: Calculate the prior probability for the given class labels.
 Step 2: Find the likelihood probability of each attribute for each
class.
 Step 3: Put these values into the Bayes formula and calculate the posterior
probability.
 Step 4: See which class has the higher posterior probability, and assign
the input to that class.
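The four steps above can be traced by hand on a toy dataset (the weather/play table below is a made-up textbook-style example, not this experiment's dataset):

```python
# Toy training data: weather observation -> play? (hypothetical example)
data = [("sunny", "no"), ("sunny", "no"), ("sunny", "no"),
        ("overcast", "yes"), ("rainy", "yes"), ("rainy", "yes"),
        ("overcast", "yes"), ("sunny", "yes"), ("rainy", "yes"),
        ("rainy", "yes")]

labels = [y for _, y in data]
# Step 1: prior probability of each class label
prior = {c: labels.count(c) / len(labels) for c in set(labels)}

# Step 2: likelihood P(weather = value | class) for each class
def likelihood(value, c):
    in_class = [x for x, y in data if y == c]
    return in_class.count(value) / len(in_class)

# Step 3: unnormalised posterior P(class | sunny) = P(sunny | class) * P(class)
posterior = {c: likelihood("sunny", c) * prior[c] for c in prior}

# Step 4: assign the input to the class with the higher posterior
print(max(posterior, key=posterior.get))  # prints "no" (0.30 beats ~0.10)
```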

Advantages-

 It is not only a simple approach but also a fast and accurate method
for prediction.
 Naive Bayes has a very low computation cost.
 It can work efficiently on a large dataset.
 It performs well with discrete response variables compared to
continuous ones.
 It can be used with multi-class prediction problems.
 It also performs well on text analytics problems.
 When the assumption of independence holds, a Naive Bayes classifier
performs better than other models such as logistic regression.

Disadvantages-

 It assumes independent features. In practice, it is almost
impossible for the model to get a set of predictors that are entirely
independent.
 If there is no training tuple of a particular class, this causes a zero
posterior probability. In this case, the model is unable to make
predictions. This problem is known as the Zero Probability/Frequency
Problem.
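The zero-frequency problem above is commonly handled with Laplace (add-one) smoothing; in scikit-learn's MultinomialNB this corresponds to the alpha parameter (alpha=1 by default). A minimal sketch of the manual calculation, with made-up counts:

```python
# Laplace (add-one) smoothing: avoid zero likelihoods for unseen values.
# Hypothetical counts: the word "refund" never appears in class "ham".
count_refund_in_ham = 0
total_words_in_ham = 100
vocabulary_size = 50

# Without smoothing the likelihood is 0, which zeroes the whole posterior.
unsmoothed = count_refund_in_ham / total_words_in_ham
print(unsmoothed)  # 0.0

# Add 1 to every count and the vocabulary size to the denominator.
smoothed = (count_refund_in_ham + 1) / (total_words_in_ham + vocabulary_size)
print(round(smoothed, 4))  # 0.0067 -- small, but no longer zero
```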

2. Explain Conditional Probability

 Conditional probability is defined as the likelihood of an event or
outcome occurring given that another event or outcome has already
occurred. It is calculated by dividing the probability of both events
occurring together by the probability of the conditioning event:
P(A|B) = P(A and B) / P(B). Equivalently, the joint probability is
obtained by multiplying the probability of the preceding event by the
conditional probability of the succeeding event.
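The definition can be checked with a small sketch using made-up survey counts:

```python
# Conditional probability from counts: P(A|B) = P(A and B) / P(B).
# Hypothetical survey of 100 loan applicants.
total = 100
has_loan = 40               # event B: applicant has a previous loan
has_loan_and_default = 10   # event A and B: has a loan AND defaulted

p_b = has_loan / total                     # P(B) = 0.40
p_a_and_b = has_loan_and_default / total   # P(A and B) = 0.10
p_a_given_b = p_a_and_b / p_b              # P(A|B)
print(round(p_a_given_b, 2))  # 0.25

# Equivalently (multiplication rule): P(A and B) = P(B) * P(A|B)
assert abs(p_b * p_a_given_b - p_a_and_b) < 1e-12
```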

3. Readable screenshots of code and output


 Code:-

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Load the 20 Newsgroups dataset (you can replace this with your own dataset)
newsgroups = fetch_20newsgroups(subset='all',
                                remove=('headers', 'footers', 'quotes'))

# Convert text data to numerical features using CountVectorizer
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(newsgroups.data)
y = newsgroups.target

# Split the data into training and testing sets (80/20)
split_ratio = 0.8
split_index = int(split_ratio * X.shape[0])
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

# Initialize the Multinomial Naive Bayes model
nb_classifier = MultinomialNB()

# Train the model on the training data
nb_classifier.fit(X_train, y_train)

# Predict the labels for the test data
y_pred = nb_classifier.predict(X_test)

# Calculate accuracy and print the classification report
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred,
                               target_names=newsgroups.target_names)
print("Accuracy:", accuracy)
print("Classification Report:\n", report)
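The same vectorize-then-classify pattern works on any labelled text corpus. A self-contained toy sketch (the spam/ham corpus below is made up for illustration and is not this experiment's dataset) also shows how new, unseen text is classified by reusing the fitted vectorizer:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus: spam vs ham (illustration only)
docs = ["win money now", "cheap money offer", "meeting at noon",
        "lunch meeting tomorrow", "win a cheap prize", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(docs)          # learn the vocabulary from training text
clf = MultinomialNB().fit(X, labels)

# New text must be transformed with the SAME fitted vectorizer, never re-fit.
print(clf.predict(vec.transform(["cheap money prize"]))[0])       # spam
print(clf.predict(vec.transform(["meeting notes tomorrow"]))[0])  # ham
```

Reusing the fitted vectorizer matters because the classifier only understands the column layout learned during training; fitting a new vectorizer on the test text would scramble the feature indices.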

Output:-

Conclusion: From the above experiment, we have learnt how the Naïve Bayes
algorithm works and what its definition is. From this experiment we can
conclude that the Naïve Bayes classifier was implemented in Python and the
accuracy of the model was calculated.

References:
Paulraj Ponniah, "Data Warehousing: Fundamentals for IT Professionals", Wiley India.
http://www.oracle.com/webfolder/technetwork/tutorials/obe/db/10g/r2/owb/owb10gr2_gs/owb/lesson3/starandsnowflake.htm
