ML Lab Observation
Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data
from a .CSV file.
FIND-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x
For each attribute constraint ai in h
If the constraint ai is satisfied by x
Then do nothing
Else replace ai in h by the next more general constraint that is
satisfied by x
3. Output hypothesis h
Training Examples:
Example Sky AirTemp Humidity Wind Water Forecast EnjoySport
Program:
import csv

num_attributes = 6
a = []

print("\n The Given Training Data Set \n")
with open('enjoysport.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# Initialise the hypothesis with the first training instance
for j in range(0, num_attributes):
    hypothesis[j] = a[0][j]

# Generalise the hypothesis over every positive training example
for i in range(0, len(a)):
    if a[i][num_attributes] == 'yes':
        for j in range(0, num_attributes):
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'
            else:
                hypothesis[j] = a[i][j]
    print(" For Training instance No:{0} the hypothesis is ".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for the given training examples:\n")
print(hypothesis)
Candidate-Elimination Algorithm (negative training example case):
• If d is a negative example
  • Remove from S any hypothesis inconsistent with d
  • For each hypothesis g in G that is not consistent with d
    • Remove g from G
    • Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
    • Remove from G any hypothesis that is less general than another hypothesis in G
Program:
import numpy as np
import pandas as pd

data = pd.DataFrame(data=pd.read_csv('enjoysport.csv'))
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)
target = np.array(data.iloc[:, -1])
print(target)
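The listing above only loads the data; the training routine itself is not shown in the source. A minimal sketch of a learn() function that maintains the specific boundary S and the general boundary G as described above is given below (the function and variable names are illustrative, not from the source listing).

def learn(concepts, target):
    # Start with the first instance as the specific boundary
    specific_h = concepts[0].copy()
    # The general boundary starts as the most general hypothesis
    general_h = [['?' for _ in range(len(specific_h))] for _ in range(len(specific_h))]

    for i, h in enumerate(concepts):
        if target[i] == "yes":            # positive example: generalise S
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        else:                             # negative example: specialise G
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'

    # Drop hypotheses in G that remained fully general
    general_h = [g for g in general_h if g != ['?' for _ in range(len(specific_h))]]
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final)
print("Final General_h:", g_final)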
Data Set:
Sky AirTemp Humidity Wind Water Forecast EnjoySport
Output:
Final Specific_h:
['sunny' 'warm' '?' 'strong' '?' '?']
Final General_h:
[['sunny', '?', '?', '?', '?', '?'],
['?', 'warm', '?', '?', '?', '?']]
3. Write a program to demonstrate the working of the decision tree based ID3
algorithm. Use an appropriate data set for building the decision tree and apply this
knowledge to classify a new sample.
ID3 Algorithm
Examples are the training examples. Target_attribute is the attribute whose value is
to be predicted by the tree. Attributes is a list of other attributes that may be tested
by the learned decision tree. Returns a decision tree that correctly classifies the
given Examples.
∙ Create a Root node for the tree
∙ If all Examples are positive, Return the single-node tree Root with label = +
∙ If all Examples are negative, Return the single-node tree Root with label = –
∙ If Attributes is empty, Return the single-node tree Root with label = most common value of Target_attribute in Examples
∙ Otherwise Begin
∙ A ← the attribute from Attributes that best* classifies Examples
∙ The decision attribute for Root ← A
∙ For each possible value, vi, of A,
∙ Add a new tree branch below Root, corresponding to the test A = vi
∙ Let Examples_vi be the subset of Examples that have value vi for A
∙ If Examples_vi is empty
∙ Then below this new branch add a leaf node with label = most common value
of Target_attribute in Examples
∙ Else below this new branch add the subtree
ID3(Examples_vi, Target_attribute, Attributes – {A})
∙ End
∙ Return Root
ENTROPY:
Entropy measures the impurity of a collection of examples.
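For a collection of examples S containing c distinct target-class values, with p_i the proportion of S belonging to class i, entropy is computed as

Entropy(S) = Σ (i = 1..c) −p_i log2(p_i)

For a boolean classification this reduces to Entropy(S) = −p+ log2(p+) − p− log2(p−).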
INFORMATION GAIN:
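Information gain measures the expected reduction in entropy caused by partitioning the examples according to an attribute A:

Gain(S, A) = Entropy(S) − Σ (v ∈ Values(A)) (|S_v| / |S|) · Entropy(S_v)

where S_v is the subset of S for which attribute A has value v. At each node, ID3 selects the attribute with the highest information gain; the program below implements these two measures in entropy() and compute_gain().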
import math
import csv

def load_csv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    headers = dataset.pop(0)
    return dataset, headers

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

def subtables(data, col, delete):
    dic = {}
    coldata = [row[col] for row in data]
    attr = list(set(coldata))
    counts = [0] * len(attr)
    r = len(data)
    c = len(data[0])
    # Count how many rows take each value of the attribute in column col
    for x in range(len(attr)):
        for y in range(r):
            if data[y][col] == attr[x]:
                counts[x] += 1
    # Build a subtable (list of rows) for each attribute value
    for x in range(len(attr)):
        dic[attr[x]] = [[0 for i in range(c)] for j in range(counts[x])]
        pos = 0
        for y in range(r):
            if data[y][col] == attr[x]:
                if delete:
                    del data[y][col]
                dic[attr[x]][pos] = data[y]
                pos += 1
    return attr, dic
def entropy(S):
    attr = list(set(S))
    if len(attr) == 1:   # all examples belong to one class: no impurity
        return 0
    # Proportion of examples in each class value
    counts = [0] * len(attr)
    for i in range(len(attr)):
        counts[i] = sum([1 for x in S if attr[i] == x]) / (len(S) * 1.0)
    sums = 0
    for cnt in counts:
        sums += -1 * cnt * math.log(cnt, 2)
    return sums
def compute_gain(data, col):
    attr, dic = subtables(data, col, delete=False)
    total_size = len(data)
    entropies = [0] * len(attr)
    ratio = [0] * len(attr)
    # Gain(S, A) = Entropy(S) - weighted sum of the subtable entropies
    total_entropy = entropy([row[-1] for row in data])
    for x in range(len(attr)):
        ratio[x] = len(dic[attr[x]]) / (total_size * 1.0)
        entropies[x] = entropy([row[-1] for row in dic[attr[x]]])
        total_entropy -= ratio[x] * entropies[x]
    return total_entropy
def build_tree(data, features):
    lastcol = [row[-1] for row in data]
    if (len(set(lastcol))) == 1:
        # All examples have the same class: return a leaf node
        node = Node("")
        node.answer = lastcol[0]
        return node
    n = len(data[0]) - 1
    gains = [0] * n
    for col in range(n):
        gains[col] = compute_gain(data, col)
    # Split on the attribute with the highest information gain
    split = gains.index(max(gains))
    node = Node(features[split])
    fea = features[:split] + features[split + 1:]
    attr, dic = subtables(data, split, delete=True)
    for x in range(len(attr)):
        child = build_tree(dic[attr[x]], fea)
        node.children.append((attr[x], child))
    return node
def print_tree(node, level):
    if node.answer != "":
        print(" " * level, node.answer)
        return
    print(" " * level, node.attribute)
    for value, n in node.children:
        print(" " * (level + 1), value)
        print_tree(n, level + 2)

def classify(node, x_test, features):
    if node.answer != "":
        print(node.answer)
        return
    pos = features.index(node.attribute)
    for value, n in node.children:
        if x_test[pos] == value:
            classify(n, x_test, features)
'''Main program'''
dataset, features = load_csv("data3.csv")
node1 = build_tree(dataset, features)
print_tree(node1, 0)
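To classify a new sample, as the experiment statement requires, the classify() function defined above can be applied to a test instance. The file name data3_test.csv below is illustrative, not part of the source listing; any row with the same attribute order as the training file works.

# Hypothetical test file with the same attribute columns (no target column needed)
testdata, features_test = load_csv("data3_test.csv")
for xtest in testdata:
    print("The test instance:", xtest)
    print("The label for the test instance:", end=" ")
    classify(node1, xtest, features)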
Output:
Outlook
 rain
  Wind
   strong
    no
   weak
    yes
 overcast
  yes
 sunny
  Humidity
   normal
    yes
   high
    no
Program (Artificial Neural Network using Backpropagation):
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)  # normalise X column-wise by its maximum
y = y / 100                 # scale target values to the range 0-1
#Sigmoid Function
def sigmoid (x):
return 1/(1 + np.exp(-x))
#Variable initialization
epoch=5000 #Setting training iterations
lr=0.1 #Setting learning rate
inputlayer_neurons = 2   #number of features in the data set
hiddenlayer_neurons = 3  #number of neurons in the hidden layer
output_neurons = 1       #number of neurons in the output layer
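# NOTE: the original listing omits the sigmoid derivative and the weight/bias
# initialisation that the forward and backward passes below rely on; a typical
# completion is sketched here (random uniform initialisation is an assumption).
def derivatives_sigmoid(x):
    return x * (1 - x)

# Weight and bias initialisation
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))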
#Forward Propagation
hinp1=np.dot(X,wh)
hinp=hinp1 + bh
hlayer_act = sigmoid(hinp)
outinp1=np.dot(hlayer_act,wout)
outinp= outinp1+ bout
output = sigmoid(outinp)
#Backpropagation
EO = y-output
outgrad = derivatives_sigmoid(output)
d_output = EO* outgrad
EH = d_output.dot(wout.T)
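# NOTE: the remaining backpropagation and update steps are not shown in the
# source listing; in the full program the forward, backward and update steps
# run inside a training loop such as `for i in range(epoch):`.
hiddengrad = derivatives_sigmoid(hlayer_act)   # gradient at the hidden layer
d_hiddenlayer = EH * hiddengrad
# Update the weights using the learning rate
wout += hlayer_act.T.dot(d_output) * lr
wh += X.T.dot(d_hiddenlayer) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)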
Output:
Input:
[[0.66666667 1.        ]
 [0.33333333 0.55555556]
 [1.         0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.89726759]
[0.87196896]
[0.9000671]]
8. Write a program to implement k-Nearest Neighbour algorithm to classify the iris
data set. Print both correct and wrong predictions. Java/Python ML library classes
can be used for this problem.
Training algorithm:
∙ For each training example (x, f(x)), add the example to the list training_examples
Classification algorithm:
∙ Given a query instance xq to be classified,
∙ Let x1 . . . xk denote the k instances from training_examples that are nearest to xq
∙ Return f^(xq) = the most common value of f among x1 . . . xk (for a discrete-valued
target such as the iris class); for a real-valued target function, return the mean of
f(x1) . . . f(xk).
Data Set:
Iris Plants Dataset: the dataset contains 150 instances (50 in each of three classes).
Number of Attributes: 4 numeric, predictive attributes and the class.
Program:
""" Splits the dataset into 70% train data and 30% test data.
This means that out of total 150 records, the training set will
contain 105 records and the test set contains 45 of those
records """
x_train, x_test, y_train, y_test =
train_test_split(x,y,test_size=0.3)
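The fragment above shows only the train/test split; the rest of the program is not included in the source. A minimal sketch using scikit-learn that produces the kind of confusion matrix and accuracy metrics shown below is given here (k = 1 and the exact print formatting are assumptions).

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets
from sklearn import metrics

# Load the iris data set: x holds the 4 numeric attributes, y the class labels
iris = datasets.load_iris()
x = iris.data
y = iris.target

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

# Train the k-Nearest Neighbour classifier and predict the test set
classifier = KNeighborsClassifier(n_neighbors=1)
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)

# Print correct and wrong predictions
for i in range(len(x_test)):
    result = "Correct" if y_pred[i] == y_test[i] else "Wrong"
    print(result, "- predicted:", iris.target_names[y_pred[i]],
          "actual:", iris.target_names[y_test[i]])

print("Confusion Matrix")
print(metrics.confusion_matrix(y_test, y_pred))
print("Accuracy Metrics")
print(metrics.classification_report(y_test, y_pred))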
Output:
Confusion Matrix
[[20 0 0]
[ 0 10 0]
[ 0 1 14]]
Accuracy Metrics
Basic knowledge
Confusion Matrix
True positives: data points labelled as positive that are actually positive
False positives: data points labelled as positive that are actually negative
True negatives: data points labelled as negative that are actually negative
False negatives: data points labelled as negative that are actually positive
Accuracy: how often the classifier is correct; Accuracy = (TP + TN) / (TP + TN + FP + FN)
F1-Score: the harmonic mean of precision and recall; F1 = 2 × (Precision × Recall) / (Precision + Recall)
Support: the number of actual occurrences of a class in the data set.
Support = TP + FN
Example:
∙ Support_A = TP_A + FN_A = 30 + (20 + 10) = 60
9. Implement the non-parametric Locally Weighted Regression algorithm in
order to fit data points. Select appropriate data set for your experiment and
draw graphs.
Regression:
∙ Regression is a technique from statistics that is used to predict values of a
desired target quantity when the target quantity is continuous.
∙ In regression, we seek to identify (or estimate) a continuous variable y associated
with a given input vector x.
∙ y is called the dependent variable.
∙ x is called the independent variable.
Loess/Lowess Regression:
Loess regression is a nonparametric technique that uses local weighted regression
to fit a smooth curve through points in a scatter plot.
Lowess Algorithm:
∙ Locally weighted regression is a very powerful nonparametric model used in
statistical learning.
∙ Given a dataset X, y, we attempt to find a model parameter β(x) that
minimizes residual sum of weighted squared errors.
∙ The weights are given by a kernel function (k or w) which can be chosen arbitrarily
Algorithm
1. Read the given data sample into X and the curve (linear or non-linear) into Y.
2. Set the value of the smoothening (free) parameter τ.
3. Set the point of interest x0 (a point, or set of points, in the domain of X).
4. Determine the weight of each training point using the Gaussian kernel:
   w(x, x0) = exp( −(x − x0)² / (2τ²) )
5. Determine the local model parameter β by weighted least squares:
   β = (XᵀWX)⁻¹ XᵀWy, where W is the diagonal matrix of the weights
6. Prediction at the point of interest: ŷ = x0 · β
Program
import numpy as np
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from bokeh.io import push_notebook

def local_regression(x0, X, Y, tau):
    # add a bias (intercept) term to the query point and the data
    x0 = np.r_[1, x0]
    X = np.c_[np.ones(len(X)), X]
    # fit the locally weighted model by solving the weighted normal equations
    xw = X.T * radial_kernel(x0, X, tau)
    beta = np.linalg.pinv(xw @ X) @ xw @ Y  # @ is matrix multiplication (dot product)
    # predict the value at the query point
    return x0 @ beta

# Weight (radial kernel) bias function
def radial_kernel(x0, X, tau):
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))
n = 1000
# generate dataset
X = np.linspace(-3, 3, num=n)
print("The Data Set (10 Samples) X :\n", X[1:10])
Y = np.log(np.abs(X ** 2 - 1) + .5)
print("The Fitting Curve Data Set (10 Samples) Y :\n", Y[1:10])
# jitter X
X += np.random.normal(scale=.1, size=n)
print("Normalised (10 Samples) X :\n", X[1:10])
show(gridplot([
[plot_lwr(10.), plot_lwr(1.)],
[plot_lwr(0.1), plot_lwr(0.01)]]))
Output: the printed data samples and a grid of scatter plots of the generated data with the locally weighted regression fit for τ = 10, 1, 0.1 and 0.01.

Alternative Program (locally weighted regression on the tips data set):
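This second listing is incomplete in the source: the imports and the kernel() function used by localWeight() are missing. A typical completion (the alias np1 and the Gaussian kernel form are assumptions consistent with the calls below) is:

import numpy as np1
import pandas as pd
import matplotlib.pyplot as plt

def kernel(point, xmat, k):
    # Gaussian weights: nearby points get weight close to 1, distant points close to 0
    m, n = np1.shape(xmat)
    weights = np1.mat(np1.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np1.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights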
def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    # weighted least-squares solution for the local model parameters
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np1.shape(xmat)
    ypred = np1.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred
# load data points
data = pd.read_csv('tips.csv')
bill = np1.array(data.total_bill)
tip = np1.array(data.tip)
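# NOTE: the source listing jumps from loading the data to plotting; the matrix
# construction, the regression call and the sort used by ax.plot() below are
# reconstructed here (the smoothing parameter k = 0.5 is an assumption).
mbill = np1.mat(bill)
mtip = np1.mat(tip)
m = np1.shape(mbill)[1]
one = np1.mat(np1.ones(m))
X = np1.hstack((one.T, mbill.T))      # add a bias column of ones
ypred = localWeightRegression(X, mtip, 0.5)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]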
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(bill,tip, color='green')
ax.plot(xsort[:,1],ypred[SortIndex], color = 'red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show();
10. Assuming a set of documents that need to be classified, use the naïve Bayesian
Classifier model to perform this task. Built-in Java classes/API can be used to write
the program. Calculate the accuracy, precision, and recall for your data set.
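The source provides no program for this experiment. A minimal Python sketch using scikit-learn is given below; the file name naivetext.csv and its two columns (message text and a pos/neg label) are assumptions, not part of the source.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# Assumed CSV layout: column 0 = document text, column 1 = label ('pos'/'neg')
msg = pd.read_csv('naivetext.csv', names=['message', 'label'])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum

xtrain, xtest, ytrain, ytest = train_test_split(X, y)

# Convert the documents into a bag-of-words count matrix
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)

# Train the naive Bayes classifier and predict the test documents
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

print('Accuracy:', metrics.accuracy_score(ytest, predicted))
print('Precision:', metrics.precision_score(ytest, predicted))
print('Recall:', metrics.recall_score(ytest, predicted))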
Experiment-11: Apply EM algorithm to cluster a Heart Disease Data Set. Use the same
data set for clustering using k-Means algorithm. Compare the results of these two algorithms
and comment on the quality of clustering. You can add Java/Python ML library classes/API in
the program.
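The listing below works on the iris data set and omits its imports; a typical set of imports and setup, assumed from the calls that follow, is:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn import preprocessing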
iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
X.columns = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y = pd.DataFrame(iris.target)
y.columns = ['Targets']
model = KMeans(n_clusters=3)
model.fit(X)
plt.figure(figsize=(14,7))
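# NOTE: the source listing omits the steps between plt.figure() and the GMM
# prediction; a typical completion (colour choices and subplot layout assumed):
colormap = np.array(['red', 'lime', 'black'])

# Plot the real classes and the K-Means clusters for comparison
plt.subplot(2, 2, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real Classification')

plt.subplot(2, 2, 2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_], s=40)
plt.title('K Means Classification')

# Standardise the data and fit a Gaussian mixture model (EM algorithm)
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xs = pd.DataFrame(scaler.transform(X), columns=X.columns)
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)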
y_gmm = gmm.predict(xs)
#y_cluster_gmm
plt.subplot(2, 2, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_gmm], s=40)
plt.title('GMM Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
Theory
A Bayesian network is a directed acyclic graph in which each edge corresponds to a
conditional dependency, and each node corresponds to a unique random variable.
A Bayesian network consists of two major parts: a directed acyclic graph and a set of
conditional probability distributions.
• The nodes of the directed acyclic graph represent the random variables.
• The conditional probability distribution of a node (random variable) is defined for
every possible outcome of its preceding causal node(s).
For illustration, consider the following example. Suppose we attempt to turn on our
computer, but the computer does not start (observation/evidence). We would like to
know which of the possible causes of computer failure is more likely. In this
simplified illustration, we assume only two possible causes of this misfortune:
electricity failure and computer malfunction.
The corresponding directed acyclic graph is depicted in the figure below.
Fig: Directed acyclic graph representing two independent possible causes of a computer failure.
The goal is to calculate the posterior conditional probability distribution of each of
the possible unobserved causes given the observed evidence, i.e. P [Cause |
Evidence].
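By Bayes' theorem, this posterior is computed as

P(Cause | Evidence) = P(Evidence | Cause) · P(Cause) / P(Evidence),

with the required probabilities read off the conditional probability distributions attached to the network's nodes.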
Data Set:
Title: Heart Disease Databases
The Cleveland database contains 76 attributes, but all published experiments refer to
using a subset of 14 of them. In particular, the Cleveland database is the only one
that has been used by ML researchers to this date. The "Heartdisease" field refers to
the presence of heart disease in the patient. It is integer valued from 0 (no presence)
to 4.
Database: 0 1 2 3 4 Total
Cleveland: 164 55 36 35 13 303
Attribute Information:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
• Value 1: typical angina
• Value 2: atypical angina
• Value 3: non-anginal pain
• Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
• Value 0: normal
• Value 1: having ST-T wave abnormality (T wave inversions and/or ST
elevation or depression of > 0.05 mV)
• Value 2: showing probable or definite left ventricular hypertrophy by Estes'
criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
• Value 1: upsloping
• Value 2: flat
• Value 3: downsloping
12. thal: 3 = normal; 6 = fixed defect; 7 = reversable defect
13. Heartdisease: integer valued from 0 (no presence) to 4.
Program:
import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination
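The source listing stops after the imports. A minimal sketch of the remaining steps (reading the data, defining the network structure, fitting the conditional probability distributions by maximum likelihood, and querying with variable elimination) is given below; the file name heart.csv, the column names, the chosen edges, and the evidence value are illustrative assumptions.

# Read the Cleveland heart disease data (column names assumed to match the
# attribute list above, with '?' marking missing values)
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

# Define an assumed network structure over a few of the attributes
model = BayesianModel([
    ('age', 'heartdisease'),
    ('sex', 'heartdisease'),
    ('exang', 'heartdisease'),
    ('cp', 'heartdisease'),
    ('heartdisease', 'restecg'),
    ('heartdisease', 'chol'),
])

# Learn the conditional probability distributions by maximum likelihood
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Infer the probability of heart disease given some observed evidence
infer = VariableElimination(model)
q = infer.query(variables=['heartdisease'], evidence={'restecg': 1})
print(q)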
Output: