Pgm5 With Output
Consider a fictional dataset that describes the weather conditions for playing a game of golf. Given
the weather conditions, each tuple classifies the conditions as fit ("Yes") or unfit ("No") for playing
golf.
SL.NO. | OUTLOOK | TEMPERATURE | HUMIDITY | WINDY | PLAY GOLF
[Table: rows of the weather dataset, omitted in this copy]
The dataset is divided into two parts, namely, the feature matrix and the response vector.
The feature matrix contains all the vectors (rows) of the dataset, in which each vector consists of the
values of the independent features. In the above dataset, the features are 'Outlook', 'Temperature',
'Humidity' and 'Windy'.
The response vector contains the value of the class variable (prediction or output) for each row of the
feature matrix. In the above dataset, the class variable name is 'Play golf'.
Assumption:
The fundamental Naive Bayes assumption is that each feature makes an independent and equal
contribution to the outcome.
We assume that no pair of features is dependent. For example, the temperature being 'Hot'
has nothing to do with the humidity, and the outlook being 'Rainy' has no effect on the wind.
Hence, the features are assumed to be independent.
Secondly, each feature is given the same weight (or importance). For example, knowing the
temperature and humidity alone cannot predict the outcome accurately. None of the attributes is
irrelevant, and each is assumed to contribute equally to the outcome.
Note: The assumptions made by Naive Bayes are not generally correct in real-world situations. In
fact, the independence assumption is never exactly correct, but it often works well in practice.
Now, before moving to the formula for Naive Bayes, it is important to know about Bayes’ theorem.
Bayes’ Theorem
Bayes’ Theorem finds the probability of an event occurring given the probability of another event
that has already occurred. Bayes’ theorem is stated mathematically as the following equation:
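P(A | B) = P(B | A) * P(A) / P(B)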
Basically, we are trying to find the probability of event A, given that event B is true. Event B is
also termed the evidence.
P(A) is the prior probability of A (i.e. the probability of the event before the evidence is
seen). The evidence is an attribute value of an unknown instance (here, it is event B).
P(A | B) is the posterior probability of A, i.e. the probability of the event after the evidence is seen.
P(B | A) is the likelihood, i.e. the probability of the evidence given that event A has occurred.
Now, with regard to our dataset, we can apply Bayes' theorem in the following way:
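P(y | X) = P(X | y) * P(y) / P(X)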
where y is the class variable and X is the feature vector (of size n), with:
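X = (x1, x2, x3, ..., xn)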
Just to be clear, an example of a feature vector and its corresponding class variable is (refer to the 1st
row of the dataset):
X = (Rainy, Hot, High, False) and y = No
So basically, P(y | X) here means the probability of "not playing golf" given that the weather
conditions are "Rainy outlook", "Temperature is hot", "High humidity" and "No wind".
Naive assumption
Now, it is time to put the naive assumption into Bayes' theorem, which is independence among the
features. So now, we split the evidence into its independent parts.
Now, if any two events A and B are independent, then,
P(A,B) = P(A)P(B)
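Hence, applying this to the feature vector, we reach the result:
P(y | x1, ..., xn) = [ P(x1 | y) * P(x2 | y) * ... * P(xn | y) * P(y) ] / [ P(x1) * P(x2) * ... * P(xn) ]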
Now, as the denominator remains constant for a given input, we can remove that term:
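P(y | x1, ..., xn) ∝ P(y) * P(x1 | y) * P(x2 | y) * ... * P(xn | y)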
Now, we need to create a classifier model. For this, we find the probability of the given set of inputs for
all possible values of the class variable y and pick the output with the maximum probability. This
can be expressed mathematically as:
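y = argmax over y of P(y) * P(x1 | y) * P(x2 | y) * ... * P(xn | y)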
So, finally, we are left with the task of calculating P(y) and P(xi | y).
Please note that P(y) is also called class probability and P(xi | y) is called conditional probability.
The different naive Bayes classifiers differ mainly by the assumptions they make regarding the
distribution of P(xi | y).
Let us try to apply the above formula manually on our weather dataset. For this, we need to do some
precomputations on our dataset.
We need to find P(xi | yj) for each xi in X and yj in y. All these calculations have been demonstrated
in the tables below:
So, in the tables above (tables 1-4), we have calculated P(xi | yj) for each xi in X and yj in y manually.
For example, the probability of playing golf given that the temperature is cool, i.e. P(temp = Cool |
play golf = Yes), is 3/9.
Also, we need to find the class probabilities P(y), which have been calculated in table 5. For
example, P(play golf = Yes) = 9/14.
So now, we are done with our pre-computations and the classifier is ready!
Let us test it on a new set of weather conditions (call it 'today'). We need P(Yes | today) and
P(No | today). Since P(today) is common to both probabilities, we can ignore it and find proportional
probabilities as:
P(Yes | today) ∝ P(Outlook | Yes) * P(Temperature | Yes) * P(Humidity | Yes) * P(Windy | Yes) * P(Yes)
and
P(No | today) ∝ P(Outlook | No) * P(Temperature | No) * P(Humidity | No) * P(Windy | No) * P(No)
where each conditional probability is read off tables 1-4 for today's feature values and the class
probabilities come from table 5.
These two numbers can be converted into probabilities by making their sum equal to 1 (normalization):
P(Yes | today) = proportional(Yes) / (proportional(Yes) + proportional(No))
and
P(No | today) = proportional(No) / (proportional(Yes) + proportional(No))
Since we predict the class with the larger probability, the final prediction is whichever of 'Yes' or
'No' ends up with the higher normalized value.
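To make the procedure above concrete, here is a small Python sketch of the manual calculation. The
function name naive_bayes_predict and all table values except P(Temperature = Cool | Yes) = 3/9 and
P(Yes) = 9/14 (quoted above) are illustrative placeholders, not values taken from the dataset's tables.

def naive_bayes_predict(likelihoods, class_probs, instance):
    # proportional probability for each class: P(y) * product of P(xi | y)
    proportional = {}
    for label, prior in class_probs.items():
        p = prior
        for feature, value in instance.items():
            p *= likelihoods[feature][value][label]
        proportional[label] = p
    # normalize so that the probabilities sum to 1
    total = sum(proportional.values())
    return {label: p / total for label, p in proportional.items()}

# placeholder likelihood tables (hypothetical numbers, for illustration only)
likelihoods = {
    'Outlook':     {'Sunny':  {'Yes': 2/9, 'No': 3/5}},
    'Temperature': {'Cool':   {'Yes': 3/9, 'No': 1/5}},
    'Humidity':    {'Normal': {'Yes': 6/9, 'No': 1/5}},
    'Windy':       {False:    {'Yes': 6/9, 'No': 2/5}},
}
class_probs = {'Yes': 9/14, 'No': 5/14}

today = {'Outlook': 'Sunny', 'Temperature': 'Cool', 'Humidity': 'Normal', 'Windy': False}
print(naive_bayes_predict(likelihoods, class_probs, today))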
The method that we discussed above is applicable to discrete data. In the case of continuous data, we
need to make some assumptions regarding the distribution of the values of each feature. The different
naive Bayes classifiers differ mainly by the assumptions they make regarding the distribution of
P(xi | y).
Now, we discuss one such classifier here: Gaussian Naive Bayes.
The likelihood of the features is assumed to be Gaussian; hence, the conditional probability is given by:
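P(xi | y) = (1 / sqrt(2π σ²_y)) * exp( -(xi - μ_y)² / (2 σ²_y) )
where μ_y and σ²_y are the mean and variance of the feature xi computed for class y.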
When dealing with continuous data, a typical assumption is that the continuous values associated
with each class are distributed according to a Gaussian distribution. For example, suppose the
training data contains a continuous attribute, x. We first segment the data by the class, and then
compute the mean and variance of x in each class. Let μ_k be the mean of the values in x associated
with class Ck, and let σ²_k be the variance of the values in x associated with class Ck. Suppose we
have collected some observation value v. Then, the probability distribution of v given a class Ck,
p(x = v | Ck), can be computed by plugging v into the equation for a Normal distribution parameterized
by μ_k and σ²_k. That is:
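p(x = v | Ck) = (1 / sqrt(2π σ²_k)) * exp( -(v - μ_k)² / (2 σ²_k) )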
The above method is adopted in our implementation of the program.
Pima Indian Diabetes Dataset
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The
objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on
certain diagnostic measurements included in the dataset.
APPLICATION AREAS:
Real-time Prediction: Naive Bayes is an eager learning classifier and it is fast. Thus, it
can be used for making predictions in real time.
Multi-class Prediction: This algorithm is also well known for its multi-class prediction
capability. Here we can predict the probability of multiple classes of the target variable.
Text Classification / Spam Filtering / Sentiment Analysis: Naive Bayes classifiers are mostly
used in text classification (due to their good results in multi-class problems and the independence
assumption) and have a higher success rate compared to other algorithms. As a result, they are widely
used in spam filtering (identifying spam e-mail) and sentiment analysis (in social media analysis,
to identify positive and negative customer sentiments).
Recommendation Systems: A Naive Bayes classifier and collaborative filtering together
build a recommendation system that uses machine learning and data mining techniques to
filter unseen information and predict whether a user would like a given resource or not.
zip:
The zip() function returns a zip object, which is an iterator of tuples where the first
item of each passed iterator is paired together, then the second item of each
passed iterator is paired together, and so on.
If the passed iterators have different lengths, the iterator with the fewest items decides
the length of the new iterator.
Syntax
zip(iterator1, iterator2, iterator3 ...)
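For instance, a call like the following (the variable names a and b are only illustrative) produces the
output shown below:

a = ('John', 'Charles', 'Mike')
b = ('Jenny', 'Christy', 'Monica')
print(tuple(zip(a, b)))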
output:
(('John', 'Jenny'), ('Charles', 'Christy'),
('Mike', 'Monica'))
Gaussian distribution:
import csv
import random
import math
def loadcsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting strings into numbers for processing
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset
def separatebyclass(dataset):
    separated = {}  # dictionary of classes 1 and 0
    # creates a dictionary of classes 1 and 0 where the values are
    # the instances belonging to each class
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated
def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)
def summarizebyclass(dataset):
    separated = separatebyclass(dataset)
    # print(separated)
    summaries = {}
    for classvalue, instances in separated.items():
        # for key, value in dic.items()
        # summaries is a dic of tuples (mean, std) for each class value
        summaries[classvalue] = summarize(instances)  # summarize is used to calculate the mean and std
    return summaries
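# The functions below (splitdataset, summarize, calculateprobability,
# calculateclassprobabilities, predict, getpredictions, getaccuracy) are
# referenced above and in main() but are missing from this copy of the listing.
# This is a minimal sketch of them; the names of the helpers that main() does not
# call directly are assumed, following the naming style of the rest of the program.

def splitdataset(dataset, splitratio):
    # randomly split the dataset into a training set and a test set
    trainsize = int(len(dataset) * splitratio)
    trainset = []
    copy = list(dataset)
    while len(trainset) < trainsize:
        index = random.randrange(len(copy))
        trainset.append(copy.pop(index))
    return [trainset, copy]

def summarize(instances):
    # (mean, stdev) of every attribute column; zip(*instances) groups the data column-wise
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*instances)]
    del summaries[-1]  # drop the summary of the class column
    return summaries

def calculateprobability(x, mean_value, stdev_value):
    # Gaussian probability density of x for the given class mean and stdev
    exponent = math.exp(-(math.pow(x - mean_value, 2) / (2 * math.pow(stdev_value, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev_value)) * exponent

def calculateclassprobabilities(summaries, inputvector):
    # multiply the Gaussian likelihoods of each attribute for every class
    probabilities = {}
    for classvalue, classsummaries in summaries.items():
        probabilities[classvalue] = 1
        for i in range(len(classsummaries)):
            mean_value, stdev_value = classsummaries[i]
            x = inputvector[i]
            probabilities[classvalue] *= calculateprobability(x, mean_value, stdev_value)
    return probabilities

def predict(summaries, inputvector):
    # pick the class with the highest probability
    probabilities = calculateclassprobabilities(summaries, inputvector)
    bestlabel, bestprob = None, -1
    for classvalue, probability in probabilities.items():
        if bestlabel is None or probability > bestprob:
            bestprob = probability
            bestlabel = classvalue
    return bestlabel

def getpredictions(summaries, testset):
    # predict the class of every instance in the test set
    predictions = []
    for i in range(len(testset)):
        predictions.append(predict(summaries, testset[i]))
    return predictions

def getaccuracy(testset, predictions):
    # percentage of test instances whose class was predicted correctly
    correct = 0
    for i in range(len(testset)):
        if testset[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testset))) * 100.0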
def main():
    filename = 'naivedata.csv'
    splitratio = 0.67
    dataset = loadcsv(filename)
    # split the data into training and test sets
    trainingset, testset = splitdataset(dataset, splitratio)
    print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset), len(trainingset), len(testset)))
    # prepare model
    summaries = summarizebyclass(trainingset)
    # print(summaries)
    # test model
    predictions = getpredictions(summaries, testset)  # predictions for the test data using the trained model
    accuracy = getaccuracy(testset, predictions)
    print('Accuracy of the classifier is : {0}%'.format(accuracy))

main()
output:
1. Split 768 rows into train=514 and test=254 rows
Accuracy of the classifier is : 73.62204724409449%
2. Split 768 rows into train=514 and test=254 rows
Accuracy of the classifier is : 75.19685039370079%