Naïve Bayes Classifier Algorithm

o The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and used for solving classification problems.
o It is mainly used in text classification tasks that involve a high-dimensional training dataset.
o The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps build fast machine learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
o Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and article classification.

Why is it called Naïve Bayes?

The name Naïve Bayes combines the two words Naïve and Bayes, which can be described as:

o Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of the other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually contributes to identifying it as an apple, without depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.

Bayes' Theorem:

o Bayes' theorem is also known as Bayes' Rule or Bayes' law. It is used to determine the probability of a hypothesis given prior knowledge, and it depends on conditional probability.
o The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) · P(A) / P(B)

Where,

P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.

P(B|A) is the likelihood: the probability of the evidence given that the hypothesis is true.

P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
P(B) is Marginal Probability: Probability of Evidence.
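To make the formula concrete, here is a minimal Python sketch (the numbers are made up for illustration and are not taken from the weather dataset below):

# Hypothetical example of Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a = 0.3          # prior probability of hypothesis A (assumed value)
p_b_given_a = 0.8  # likelihood of evidence B given A (assumed value)
p_b = 0.5          # marginal probability of evidence B (assumed value)

p_a_given_b = p_b_given_a * p_a / p_b  # posterior probability
print(p_a_given_b)  # 0.48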

Working of Naïve Bayes' Classifier:

Working of Naïve Bayes' Classifier can be understood with the help of the below
example:

Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play on a particular day according to the weather conditions. To solve this problem, we need to follow the steps below:

1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.

Problem: If the weather is sunny, should the player play or not?

Solution: To solve this, first consider the below dataset:

Outlook Play
0 Rainy Yes
1 Sunny Yes
2 Overcast Yes
3 Overcast Yes
4 Sunny No
5 Rainy Yes
6 Sunny Yes
7 Overcast Yes
8 Rainy No
9 Sunny No
10 Sunny Yes
11 Rainy No
12 Overcast Yes
13 Overcast Yes
Frequency table for the weather conditions:

Weather     Yes   No
Overcast     5     0
Rainy        2     2
Sunny        3     2
Total       10     4
Likelihood table for the weather conditions:

Weather    No            Yes            P(Weather)
Overcast   0             5              5/14 = 0.35
Rainy      2             2              4/14 = 0.29
Sunny      2             3              5/14 = 0.35
All        4/14 = 0.29   10/14 = 0.71
Applying Bayes' theorem:

P(Yes|Sunny)= P(Sunny|Yes)*P(Yes)/P(Sunny)

P(Sunny|Yes)= 3/10= 0.3

P(Sunny)= 0.35

P(Yes)=0.71

So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60

P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)

P(Sunny|No) = 2/4 = 0.5

P(No)= 0.29

P(Sunny)= 0.35

So P(No|Sunny)= 0.5*0.29/0.35 = 0.41

From the above calculation, we can see that P(Yes|Sunny) > P(No|Sunny).

Hence, on a sunny day, the player can play the game.
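The same calculation can be reproduced in a few lines of Python. This is a minimal sketch that counts outcomes directly from the Outlook/Play table above (plain Python, no libraries assumed):

outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
        "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]

n = len(play)
p_yes = play.count("Yes") / n          # P(Yes) = 10/14
p_sunny = outlook.count("Sunny") / n   # P(Sunny) = 5/14
p_sunny_given_yes = sum(1 for o, p in zip(outlook, play)
                        if o == "Sunny" and p == "Yes") / play.count("Yes")  # 3/10

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))  # ~0.6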

Advantages of Naïve Bayes Classifier:

o Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
o It can be used for binary as well as multi-class classification.
o It performs well on multi-class prediction problems compared to many other algorithms.
o It is one of the most popular choices for text classification problems.

Disadvantages of Naïve Bayes Classifier:

o Naive Bayes assumes that all features are independent or unrelated, so it cannot learn the relationship between features.

Applications of Naïve Bayes Classifier:

o It is used for credit scoring.
o It is used in medical data classification.
o It can be used for real-time predictions because the Naïve Bayes classifier is an eager learner.
o It is used in text classification, such as spam filtering and sentiment analysis.

Types of Naïve Bayes Model:

There are three types of Naive Bayes Model, which are given below:

o Gaussian: The Gaussian model assumes that features follow a normal distribution. If the predictors take continuous values instead of discrete ones, the model assumes that these values are sampled from a Gaussian distribution.
o Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e., deciding which category a particular document belongs to, such as sports, politics, or education. The classifier uses the frequency of words as the predictors.
o Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present in a document or not. This model is also popular for document classification tasks.
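All three variants are available in scikit-learn's sklearn.naive_bayes module. A minimal sketch, using a tiny made-up feature matrix, is given below (GaussianNB suits continuous features; MultinomialNB expects counts; BernoulliNB expects binary 0/1 features):

from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# Tiny illustrative data: 4 samples, 2 continuous features, 2 classes (values are made up)
X = [[1.0, 2.1], [0.9, 1.8], [3.2, 4.0], [3.0, 4.2]]
y = [0, 0, 1, 1]

model = GaussianNB().fit(X, y)       # swap in MultinomialNB/BernoulliNB for count/binary data
print(model.predict([[1.1, 2.0]]))   # -> [0]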

Python Implementation of the Naïve Bayes algorithm:

Now we will implement a Naive Bayes Algorithm using Python. So for this, we will
use the "user_data" dataset, which we have used in our other classification model.
Therefore we can easily compare the Naive Bayes model with the other models.

Steps to implement:

o Data pre-processing step
o Fitting Naive Bayes to the Training set
o Predicting the test result
o Testing the accuracy of the result (creation of a confusion matrix)
o Visualizing the test set result
1) Data Pre-processing step:

In this step, we will pre-process/prepare the data so that we can use it efficiently in our code. It is similar to what we did in the earlier data pre-processing step. The code for this is given below:

# Importing the libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('user_data.csv')
x = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
In the above code, we have loaded the dataset into our program using dataset = pd.read_csv('user_data.csv'). The loaded dataset is divided into training and test sets, and then we have scaled the feature variables.

The output for the dataset is given as:


2) Fitting Naive Bayes to the Training Set:

After the pre-processing step, now we will fit the Naive Bayes model to the Training
set. Below is the code for it:
# Fitting Naive Bayes to the Training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)
In the above code, we have used the GaussianNB classifier to fit it to the training
dataset. We can also use other classifiers as per our requirement.

Output:

Out[6]: GaussianNB(priors=None, var_smoothing=1e-09)

3) Prediction of the test set result:

Now we will predict the test set results. For this, we will create a new prediction vector y_pred and use the predict function to make the predictions.

# Predicting the Test set results
y_pred = classifier.predict(x_test)
Output:
The above output shows the prediction vector y_pred and the real vector y_test. We can see that some predictions differ from the real values; these are the incorrect predictions.

4) Creating Confusion Matrix:

Now we will check the accuracy of the Naive Bayes classifier using the Confusion
matrix. Below is the code for it:

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
Output:

As we can see in the above confusion matrix output, there are 7+3= 10 incorrect
predictions, and 65+25=90 correct predictions.
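The accuracy can be read straight off the confusion matrix. The short sketch below does the same bookkeeping in code; it assumes cm, y_test, and y_pred from the previous steps:

# Accuracy = correct predictions / all predictions
from sklearn.metrics import accuracy_score
print(cm.trace() / cm.sum())           # (65 + 25) / 100 = 0.9 from the matrix above
print(accuracy_score(y_test, y_pred))  # same value computed directly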

5) Visualizing the training set result:

Next, we will visualize the training set result using the Naïve Bayes classifier. Below is the code for it:

# Visualising the Training set results
from matplotlib.colors import ListedColormap
x_set, y_set = x_train, y_train
X1, X2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(X1, X2, classifier.predict(nm.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(X1.min(), X1.max())
mtp.ylim(X2.min(), X2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Naive Bayes (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
Output:


In the above output we can see that the Naïve Bayes classifier has separated the data points with a smooth boundary. The boundary is curved because we have used the GaussianNB classifier in our code.

6) Visualizing the Test set result:

# Visualising the Test set results
from matplotlib.colors import ListedColormap
x_set, y_set = x_test, y_test
X1, X2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(X1, X2, classifier.predict(nm.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('purple', 'green')))
mtp.xlim(X1.min(), X1.max())
mtp.ylim(X2.min(), X2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('purple', 'green'))(i), label=j)
mtp.title('Naive Bayes (test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()
Output:

Naive Bayes Classifiers

Naive Bayes classifiers are a family of algorithms based on Bayes' Theorem. Despite the "naive" assumption of feature independence, these classifiers are widely used for their simplicity and efficiency in machine learning. This article delves into their theory, implementation, and applications, shedding light on their practical utility despite the oversimplified assumptions.

What are Naive Bayes Classifiers?
Naive Bayes classifiers are a collection of classification algorithms based on Bayes' Theorem. It is not a single algorithm but a family of algorithms that all share a common principle: every pair of features being classified is independent of each other. To start with, let us consider a dataset.
One of the simplest and most effective classification algorithms, the Naïve Bayes classifier aids in the rapid development of machine learning models with fast prediction capabilities.
The Naïve Bayes algorithm is used for classification problems and is widely used in text classification. In text classification tasks, the data is high-dimensional (each word represents one feature). It is used in spam filtering, sentiment detection, rating classification, etc. The advantage of naïve Bayes is its speed: it is fast, and making predictions is easy even with high-dimensional data.
This model predicts the probability that an instance belongs to a class given a set of feature values, so it is a probabilistic classifier. It is "naive" because it assumes that one feature in the model is independent of the existence of another feature; in other words, each feature contributes to the prediction with no relation to the others. In the real world, this condition is rarely satisfied. The algorithm uses Bayes' theorem for both training and prediction.
Why is it Called Naive Bayes?
The “Naive” part of the name indicates the simplifying assumption made by the Naïve
Bayes classifier. The classifier assumes that the features used to describe an
observation are conditionally independent, given the class label. The “Bayes” part of
the name refers to Reverend Thomas Bayes, an 18th-century statistician and
theologian who formulated Bayes’ theorem.
Consider a fictional dataset that describes the weather conditions for playing a game of golf. Given the weather conditions, each tuple classifies the conditions as fit ("Yes") or unfit ("No") for playing golf. Here is a tabular representation of our dataset.

     Outlook   Temperature  Humidity  Windy  Play Golf
0    Rainy     Hot          High      False  No
1    Rainy     Hot          High      True   No
2    Overcast  Hot          High      False  Yes
3    Sunny     Mild         High      False  Yes
4    Sunny     Cool         Normal    False  Yes
5    Sunny     Cool         Normal    True   No
6    Overcast  Cool         Normal    True   Yes
7    Rainy     Mild         High      False  No
8    Rainy     Cool         Normal    False  Yes
9    Sunny     Mild         Normal    False  Yes
10   Rainy     Mild         Normal    True   Yes
11   Overcast  Mild         High      True   Yes
12   Overcast  Hot          Normal    False  Yes
13   Sunny     Mild         High      True   No

The dataset is divided into two parts, namely, feature matrix and the response
vector.
 Feature matrix contains all the vectors (rows) of the dataset, in which each vector consists of the values of the dependent features. In the above dataset, the features are 'Outlook', 'Temperature', 'Humidity' and 'Windy'.
 Response vector contains the value of the class variable (prediction or output) for each row of the feature matrix. In the above dataset, the class variable name is 'Play golf'.
Assumption of Naive Bayes
The fundamental Naive Bayes assumption is that each feature makes an independent and equal contribution to the outcome:
 Feature independence: The features of the data are conditionally independent of
each other, given the class label.
 Continuous features are normally distributed: If a feature is continuous, then it
is assumed to be normally distributed within each class.
 Discrete features have multinomial distributions: If a feature is discrete, then it
is assumed to have a multinomial distribution within each class.
 Features are equally important: All features are assumed to contribute equally to
the prediction of the class label.
 No missing data: The data should not contain any missing values.
With relation to our dataset, this concept can be understood as:
 We assume that no pair of features are dependent. For example, the temperature
being ‘Hot’ has nothing to do with the humidity or the outlook being ‘Rainy’ has
no effect on the winds. Hence, the features are assumed to be independent.
 Secondly, each feature is given the same weight(or importance). For example,
knowing only temperature and humidity alone can’t predict the outcome accurately.
None of the attributes is irrelevant and assumed to be contributing equally to the
outcome.
The assumptions made by Naive Bayes are not generally correct in real-world situations. In fact, the independence assumption is never exactly correct, but it often works well in practice. Now, before moving to the formula for Naive Bayes, it is important to know about Bayes' theorem.
Bayes’ Theorem
Bayes’ Theorem finds the probability of an event occurring given the probability of
another event that has already occurred. Bayes’ theorem is stated mathematically as
the following equation:
P(A|B) = P(B|A) · P(A) / P(B)

where A and B are events and P(B) ≠ 0.
 Basically, we are trying to find the probability of event A, given that event B is true. Event B is also termed the evidence.
 P(A) is the prior probability of A (the probability of the event before the evidence is seen). The evidence is an attribute value of an unknown instance (here, event B).
 P(B) is the marginal probability: the probability of the evidence.
 P(A|B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.
 P(B|A) is the likelihood, i.e. the probability that the evidence will be observed given that the hypothesis is true.
Now, with regards to our dataset, we can apply Bayes’ theorem in following way:
P(y|X) = P(X|y) · P(y) / P(X)

where y is the class variable and X is a dependent feature vector (of size n):

X = (x1, x2, x3, ..., xn)
Just to clear, an example of a feature vector and corresponding class variable can be:
(refer 1st row of dataset)
X = (Rainy, Hot, High, False)
y = No
So basically, P(y|X) here means the probability of "not playing golf" given that the weather conditions are "Rainy outlook", "Hot temperature", "High humidity" and "no wind".
Now, it's time to put a naive assumption into Bayes' theorem, which is independence among the features. So now, we split the evidence into independent parts.
Now, if any two events A and B are independent, then,
P(A,B) = P(A)P(B)
Hence, we reach to the result:
P(y|x1, ..., xn) = [P(x1|y) P(x2|y) ... P(xn|y) · P(y)] / [P(x1) P(x2) ... P(xn)]

which can be expressed as:

P(y|x1, ..., xn) = P(y) ∏i=1..n P(xi|y) / [P(x1) P(x2) ... P(xn)]

Now, as the denominator remains constant for a given input, we can remove that term:

P(y|x1, ..., xn) ∝ P(y) ∏i=1..n P(xi|y)

Now, we need to create a classifier model. For this, we find the probability of the given set of inputs for all possible values of the class variable y and pick the output with maximum probability. This can be expressed mathematically as:

y = argmax_y P(y) ∏i=1..n P(xi|y)
So, finally, we are left with the task of calculating P(y) and P(xi|y).
Please note that P(y) is also called the class probability and P(xi|y) is called the conditional probability.
The different naive Bayes classifiers differ mainly in the assumptions they make regarding the distribution of P(xi|y).
Let us try to apply the above formula manually on our weather dataset. For this, we need to do some precomputations on our dataset.
We need to find P(xi|yj) for each xi in X and yj in y. All these calculations have been demonstrated in the tables below:
[Figure: likelihood tables for each feature (tables 1-4) and the class-probability table (table 5).]

So, in the figure above, we have calculated P(xi|yj) for each xi in X and yj in y manually in tables 1-4. For example, the probability of playing golf given that the temperature is cool, i.e. P(temp. = cool | play golf = Yes) = 3/9.
Also, we need the class probabilities P(y), which have been calculated in table 5. For example, P(play golf = Yes) = 9/14.
So now, we are done with our pre-computations and the classifier is ready!
Let us test it on a new set of features (let us call it today):
today = (Sunny, Hot, Normal, False)
P(Yes|today) = P(Sunny Outlook|Yes) · P(Hot Temperature|Yes) · P(Normal Humidity|Yes) · P(No Wind|Yes) · P(Yes) / P(today)

and the probability of not playing golf is given by:

P(No|today) = P(Sunny Outlook|No) · P(Hot Temperature|No) · P(Normal Humidity|No) · P(No Wind|No) · P(No) / P(today)

Since P(today) is common to both probabilities, we can ignore it and find the proportional probabilities as:

P(Yes|today) ∝ (3/9) · (2/9) · (6/9) · (6/9) · (9/14) ≈ 0.02116

and

P(No|today) ∝ (3/5) · (2/5) · (1/5) · (2/5) · (5/14) ≈ 0.0068

These numbers can be converted into probabilities by normalizing them so that they sum to 1:

P(Yes|today) = 0.02116 / (0.02116 + 0.0068) ≈ 0.76

and

P(No|today) = 0.0068 / (0.02116 + 0.0068) ≈ 0.24

Since

P(Yes|today) > P(No|today)
So, prediction that golf would be played is ‘Yes’.
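The decision rule above can also be written out directly in Python. Here is a minimal from-scratch sketch for the golf dataset; it recomputes the conditional probabilities by counting rows of the table, so the exact intermediate numbers may differ slightly from the figures quoted above:

# Golf dataset from the table above: (Outlook, Temperature, Humidity, Windy) -> Play Golf
data = [
    (("Rainy", "Hot", "High", False), "No"),       (("Rainy", "Hot", "High", True), "No"),
    (("Overcast", "Hot", "High", False), "Yes"),   (("Sunny", "Mild", "High", False), "Yes"),
    (("Sunny", "Cool", "Normal", False), "Yes"),   (("Sunny", "Cool", "Normal", True), "No"),
    (("Overcast", "Cool", "Normal", True), "Yes"), (("Rainy", "Mild", "High", False), "No"),
    (("Rainy", "Cool", "Normal", False), "Yes"),   (("Sunny", "Mild", "Normal", False), "Yes"),
    (("Rainy", "Mild", "Normal", True), "Yes"),    (("Overcast", "Mild", "High", True), "Yes"),
    (("Overcast", "Hot", "Normal", False), "Yes"), (("Sunny", "Mild", "High", True), "No"),
]

def predict(x):
    scores = {}
    for label in ("Yes", "No"):
        rows = [features for features, y in data if y == label]
        score = len(rows) / len(data)  # class prior P(y)
        for i, value in enumerate(x):
            # conditional probability P(x_i | y), estimated by counting
            score *= sum(1 for r in rows if r[i] == value) / len(rows)
        scores[label] = score
    return max(scores, key=scores.get), scores

print(predict(("Sunny", "Hot", "Normal", False)))  # -> ('Yes', {...})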
The method that we discussed above is applicable to discrete data. In the case of continuous data, we need to make some assumptions regarding the distribution of values of each feature. The different naive Bayes classifiers differ mainly in the assumptions they make regarding the distribution of P(xi|y).

Types of Naive Bayes Model

There are three main types of Naive Bayes model:

Gaussian Naive Bayes classifier
In Gaussian Naive Bayes, the continuous values associated with each feature are assumed to be distributed according to a Gaussian distribution. A Gaussian distribution is also called a Normal distribution; when plotted, it gives a bell-shaped curve which is symmetric about the mean of the feature values.

The likelihood of the features is assumed to be Gaussian, hence the conditional probability is given by:

P(xi | y) = (1 / √(2π σy²)) · exp( −(xi − μy)² / (2 σy²) )
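As a quick sanity check of this formula, here is a tiny sketch using only the Python standard library; the mean and standard deviation are made-up values, not estimated from the golf data:

import math

def gaussian_likelihood(x, mu, sigma):
    # P(x | y) for a Gaussian with class-specific mean mu and standard deviation sigma
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# e.g. likelihood of a temperature reading of 20 under an assumed class with mu=18, sigma=3
print(round(gaussian_likelihood(20, 18, 3), 4))  # ≈ 0.1065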
The updated table of class-conditional probabilities for the Outlook feature is as follows:

Outlook     Yes   No   P(Outlook|Yes)   P(Outlook|No)

Sunny        3     2        3/9              2/5
Rainy        2     3        2/9              3/5
Overcast     4     0        4/9              0/5
Total        9     5       100%             100%

Now, we look at an implementation of the Gaussian Naive Bayes classifier using scikit-learn.

Python

# load the iris dataset
from sklearn.datasets import load_iris
iris = load_iris()

# store the feature matrix (X) and response vector (y)
X = iris.data
y = iris.target

# splitting X and y into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)

# training the model on the training set
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# making predictions on the testing set
y_pred = gnb.predict(X_test)

# comparing actual response values (y_test) with predicted response values (y_pred)
from sklearn import metrics
print("Gaussian Naive Bayes model accuracy(in %):", metrics.accuracy_score(y_test, y_pred)*100)
Output:
Gaussian Naive Bayes model accuracy(in %): 95.0

Multinomial Naive Bayes

Feature vectors represent the frequencies with which certain events have been
generated by a multinomial distribution. This is the event model typically used for
document classification.

Bernoulli Naive Bayes

In the multivariate Bernoulli event model, features are independent booleans (binary variables) describing the inputs. Like the multinomial model, this model is popular for document classification tasks, where binary term-occurrence features (i.e. whether a word occurs in a document or not) are used rather than term frequencies (i.e. how often a word occurs in the document).
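To see the difference concretely, the short sketch below vectorizes two made-up sentences both ways with scikit-learn; binary=True in CountVectorizer gives Bernoulli-style presence/absence features, while the default counts suit the multinomial model:

from sklearn.feature_extraction.text import CountVectorizer

docs = ["free free prize now", "meeting at noon"]

counts = CountVectorizer().fit_transform(docs).toarray()             # word counts (multinomial)
binary = CountVectorizer(binary=True).fit_transform(docs).toarray()  # word presence (Bernoulli)
print(counts)  # "free" is counted twice in the first message
print(binary)  # the same cell is just 1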
Advantages of Naive Bayes Classifier
 Easy to implement and computationally efficient.
 Effective in cases with a large number of features.
 Performs well even with limited training data.
 It performs well in the presence of categorical features.
 For numerical features, the data is assumed to come from a normal distribution.
Disadvantages of Naive Bayes Classifier
 Assumes that features are independent, which may not always hold in real-world
data.
 Can be influenced by irrelevant attributes.
 May assign zero probability to unseen events, leading to poor generalization (Laplace smoothing, sketched below, is the usual fix).
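In scikit-learn, smoothing is controlled by the alpha parameter of MultinomialNB: it adds a pseudo-count to every word, so a word never seen with a class still receives a small non-zero likelihood. A minimal sketch on a made-up two-message corpus:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

X = CountVectorizer().fit_transform(["win cash prize", "lunch at noon"])
y = ["spam", "ham"]

# alpha=1.0 is Laplace (add-one) smoothing; smaller alpha means less smoothing
clf = MultinomialNB(alpha=1.0).fit(X, y)
print(clf.predict(X))  # -> ['spam' 'ham'] on the training messages themselves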
Applications of Naive Bayes Classifier
 Spam Email Filtering: Classifies emails as spam or non-spam based on features.
 Text Classification: Used in sentiment analysis, document categorization, and
topic classification.
 Medical Diagnosis: Helps in predicting the likelihood of a disease based on
symptoms.
 Credit Scoring: Evaluates creditworthiness of individuals for loan approval.
 Weather Prediction: Classifies weather conditions based on various factors.
As we reach the end of this article, here are some important points to ponder:
 In spite of their apparently over-simplified assumptions, naive Bayes classifiers
have worked quite well in many real-world situations, famously document
classification and spam filtering. They require a small amount of training data to
estimate the necessary parameters.
 Naive Bayes learners and classifiers can be extremely fast compared to more
sophisticated methods. The decoupling of the class conditional feature distributions
means that each distribution can be independently estimated as a one dimensional
distribution. This in turn helps to alleviate problems stemming from the curse of
dimensionality.
Conclusion
In conclusion, Naive Bayes classifiers, despite their simplified assumptions, prove
effective in various applications, showcasing notable performance in document
classification and spam filtering. Their efficiency, speed, and ability to work with
limited data make them valuable in real-world scenarios, compensating for their naive
independence assumption.
Author's Note
Hello everyone 👋🏻,
Welcome to a beginner-friendly guide to Naive Bayes classification! This notebook
has been carefully crafted to serve as a comprehensive companion for those who are
taking their first steps into the world of machine learning.
If you're just starting out with machine learning, this guide is designed specifically
for you. We'll walk through the Naive Bayes classification technique in a way that's
easy to understand, even if you're new to this exciting field 🤩.
By the time you finish this guide, you'll have a solid grasp of how Naive Bayes
works and how it can be used to make predictions and when you should use it 🙌.
Let's dive in and unlock the power of Naive Bayes classification for beginners!
Naïve Bayes Classification : Spam Email Detection
Classifier to identify spam emails from legitimate ones
What is Naive Bayes Classification?
The Bayes Theorem, formulated by the English statistician and philosopher Thomas Bayes and published posthumously in 1763, serves as the fundamental principle of conditional probability. This theorem states that the likelihood of an event occurring, given the occurrence of another event, is equal to the conditional probability of the second event given the first event, multiplied by the probability of the first event itself, divided by the probability of the second event.
Naive Bayes is a popular classification approach that is rooted in Bayes' theory. The
posterior class probability of a test data point can be calculated using class-conditional
density estimation and class prior probability. The test data will then be assigned to
the class with the highest posterior class probability.

In [1]:

# Try it out!
# Uncomment the code to run this cell
# Calculating Conditional Probability using Bayes Theorem:
# -----------------------------------------------------------------------------

# P_A = float(input("Enter the probability of event A = "))                          # Probability of event A
# P_B_given_A = float(input("Enter the probability of event B given event A = "))    # Probability of event B given event A
# P_B_given_not_A = float(input("Enter the probability of event B given not A = "))  # Probability of event B given the complement of A

# # Calculate the complement of event A
# P_not_A = 1 - P_A

# # Calculate the probability of event B (law of total probability)
# P_B = P_B_given_A * P_A + P_B_given_not_A * P_not_A

# # Calculate the conditional probability using Bayes' theorem
# P_A_given_B = (P_B_given_A * P_A) / P_B

# # Print the results
# print(f"P(A|B) = {P_A_given_B:.2f}")

Why is it called Naive Bayes?
This classification methodology makes a naive assumption that features are
independent of each other. For instance, consider the Titanic Dataset. When
classifying using naive Bayes, we assume that data labels like age, gender, class, and
cabin are all independent of each other.
It's important to note that this assumption is made to simplify our task, but in reality,
these features may or may not be interrelated. The assumption helps us handle the
complexity of the model, even though real-world data relationships might be more
intricate.
When should we consider using Naive Bayes?
If we are given a situation where multiple events occur at the same time and it is difficult to handle or understand them all at once, we may make a naive assumption that all the events are independent of each other. This helps us solve the problem simply and with comparatively less effort.
Where can we use Naive Bayes Classification?
A few interesting projects could be:
1. Spam Detection
2. Character Recognition
3. Weather Prediction
4. News Article Categorization
5. Face Detection
What are the types of Naive Bayes Classification?
1. Bernoulli: The Bernoulli model is suitable when our feature vectors are binary, meaning they can only take two values (usually 0 and 1). In the context of text classification with a 'bag of words' model, the 1s represent "word occurs in the document" and the 0s represent "word does not occur in the document". This model is useful when we want to represent the presence or absence of certain features in our data.
2. Gaussian: In classification, Gaussian is a method that assumes the features we use to describe data (like measurements or characteristics) follow a normal distribution. This means that most of the data points cluster around the average value, and fewer data points deviate far from this average.
3. Multinomial: Multinomial is used when we are dealing with discrete counts. For example, in text classification, instead of just checking if a word occurs in a document (like in Bernoulli), we now count how many times a word appears in the document. It's like counting how many times a specific outcome (word) is observed over several trials (words in the document).
Why is Naive Bayes a better classifier ?
Naive Bayes is a superior classifier because it employs probabilities to make
predictions. Unlike other classifiers that rely on manually coded rules, Naive Bayes
considers multiple features together, which makes it more accurate, especially with
complex and large datasets.
Naive Bayes uses probabilistic methods; thus, it can adapt to changes in data over
time, giving it an edge over classifiers that struggle to maintain and update their fixed
rules.
Project: Spam Email Detection

Utilizing machine learning techniques to build a robust and efficient spam email detection system by implementing a Naive Bayes classifier.
Github : https://github.com/Satarupa22-SD/Spam_Detection (feel free to fork and use
the project)
In [2]:

# Importing the necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
In [3]:

# Reading the data from the .csv file
data = pd.read_csv('/kaggle/input/email-spam-detection-dataset-classification/spam.csv', encoding='latin-1')
In [4]:

# Display the first 5 rows
data.head()
Out[4]:

     v1                                                 v2 Unnamed: 2 Unnamed: 3 Unnamed: 4
0   ham  Go until jurong point, crazy.. Available only ...        NaN        NaN        NaN
1   ham                      Ok lar... Joking wif u oni...        NaN        NaN        NaN
2  spam  Free entry in 2 a wkly comp to win FA Cup fina...        NaN        NaN        NaN
3   ham  U dun say so early hor... U c already then say...        NaN        NaN        NaN
4   ham  Nah I don't think he goes to usf, he lives aro...        NaN        NaN        NaN

Displaying the first 5 rows gives us an idea of how the data is arranged in the table. This helps us estimate which features are necessary and which are not. Note: the label ham denotes non-spam emails.
Data Preprocessing
What is Data Preprocessing ?
In simpler terms, data preprocessing refers to cleaning the data. It is like chopping and washing the veggies before cooking them.
Definition: Data preprocessing is the act of cleaning, converting, and organizing raw data so that it can be fed into a machine learning or data analysis algorithm in a more usable and structured form.
Why is it necessary?
Quality assurance: Raw data may have errors, inconsistencies, or missing numbers.
By addressing these challenges, preprocessing ensures data quality.
Better Results: Accurate, dependable insights are generated by good data. Clean,
well-organized data helps algorithms function better.
Feature Engineering: By combining existing features, you can construct new, useful
ones that improve the model's capacity to grasp the data.
Reduced Noise: Outliers or extreme values might cause results to be distorted.
Preprocessing assists in identifying and dealing with them.
Standardization: Different data sources may have varying units or scales. Data is
more similar after preprocessing.
Missing Values: Algorithms may struggle to handle missing values. Preprocessing aids in the filling or removal of missing data (see the small sketch after this list).
Efficiency: Preparing data correctly saves time and computational resources during
analysis.
In [5]:

# Drop the columns with NaN values
data = data.drop(columns=['Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4'], axis=1)
We do not need the Not-a-Number (NaN) values, as they do not
provide any insights into the data or impact other features. Therefore,
we are dropping them.
In [6]:

# Rename columns for clarity
data.columns = ['label', 'text']
In [7]:

# Displaying the first 5 rows to get a basic understanding of the data
print(data.head())
label text
0 ham Go until jurong point, crazy.. Available only ...
1 ham Ok lar... Joking wif u oni...
2 spam Free entry in 2 a wkly comp to win FA Cup fina...
3 ham U dun say so early hor... U c already then say...
4 ham Nah I don't think he goes to usf, he lives aro...
Separate Features and Target Labels
A typical dataset consists of input features and corresponding target labels. The input
features are the attributes or variables that are used to make predictions, while the
target labels are the values we are trying to predict.
Let us consider a simple example. Imagine you're trying to teach a computer to tell whether a fruit is an apple or an orange based on its color and size. In this case, the features are the color and size of the fruit, and the labels are whether the fruit is an apple or an orange.
Key Terms :
 train_test_split: This function from the sklearn.model_selection module is used to
split the data into training and testing sets.
 X_train: This variable holds the subset of input features that will be used for training
the model.
 X_test: This variable holds the subset of input features that will be used for testing
the model.
 y_train: This variable holds the corresponding target labels for the training set.
 y_test: This variable holds the corresponding target labels for the testing set.
 test_size=0.2: This parameter indicates that 20% of the data will be allocated for
testing, and the remaining 80% will be used for training.
 random_state=42: This parameter is used to seed the random number generator,
ensuring that the data is split in a reproducible manner. Using the same seed will
produce the same split each time you run the code.
In [8]:

# Separate features (X) and target labels (y)
X = data.drop('label', axis=1)
y = data['label']

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Train the Classifier (Multinomial Naive Bayes)
In [9]:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

Why are we performing count vectorization?
We're utilizing the MultinomialNB() classifier for this project, which exclusively
accepts numeric values. However, our X_train and X_test datasets comprise text data
(email messages). This is where CountVectorizer() comes in. It is being used here to
convert the provided text into a vector, considering the frequency (count) of each
word appearing throughout the entire text. This transformation is essential to enable
the classifier to work with the text data effectively.
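To see concretely what this transformation produces, here is a tiny sketch on two made-up messages (the vocabulary and matrix are purely illustrative, and a reasonably recent scikit-learn is assumed for get_feature_names_out):

from sklearn.feature_extraction.text import CountVectorizer

toy = CountVectorizer()
matrix = toy.fit_transform(["free prize waiting", "are we meeting today"])
print(toy.get_feature_names_out())  # the learned vocabulary
print(matrix.toarray())             # one row per message, one column per word (counts)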
In [10]:

# Create a CountVectorizer instance
vectorizer = CountVectorizer()
In [11]:

# Fit and transform the training data (X_train)
X_train_vectorized = vectorizer.fit_transform(X_train['text'])

# Transform the test data (X_test)
X_test_vectorized = vectorizer.transform(X_test['text'])
In [12]:

# Train the Multinomial Naive Bayes classifier
classifier = MultinomialNB()
classifier.fit(X_train_vectorized, y_train)

Out[12]:
MultinomialNB()
Make Predictions on the Test Data
In this step, we are predicting the accuracy of our model by evaluating how precisely
it can predict outcomes on new, unseen data.
In [13]:

# Make predictions on the test data
y_pred = classifier.predict(X_test_vectorized)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
classification_rep = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(classification_rep)
Accuracy: 0.98
Confusion Matrix:
[[963 2]
[ 16 134]]
Classification Report:
              precision    recall  f1-score   support

         ham       0.98      1.00      0.99       965
        spam       0.99      0.89      0.94       150

    accuracy                           0.98      1115
   macro avg       0.98      0.95      0.96      1115
weighted avg       0.98      0.98      0.98      1115

Visualizing the Data


Understanding raw numbers or datasets can often be challenging. Therefore, it is
crucial to visually represent our data. By generating visual representations of data,
complex patterns, trends, and relationships become easier to comprehend than when
dealing with raw numbers alone. Visualization also aids in identifying anomalies
within the data. In the code snippet below, we have visualized the data using a
histogram that displays the distribution of spam and non-spam emails.
In [14]:

import matplotlib.pyplot as plt

# Count the number of spam and non-spam emails in the test set
spam_counts = y_test.value_counts()

# Plot the histogram
plt.figure(figsize=(8, 6))
plt.bar(spam_counts.index, spam_counts.values, color=['green', 'red'])
plt.xlabel('Email Type')
plt.ylabel('Number of Emails')
plt.title('Number of Spam and Non-Spam Emails')
plt.xticks([0, 1], ['ham (Non-Spam)', 'spam'])
plt.show()
