
Ex.No: 1
Linear Discriminant Analysis (LDA)

Aim:

To implement Linear Discriminant Analysis using Python

Algorithm:

1. Import Required Libraries:

- Import the necessary libraries for performing Linear Discriminant Analysis, loading the
dataset, splitting data, and evaluating the model's accuracy.

2. Load and Prepare Dataset:

- Load the Iris dataset using `load_iris()` function from `sklearn.datasets`.

- Separate the feature data (`X`) and target labels (`y`).

3. Split Data into Training and Testing Sets:

- Use `train_test_split()` from `sklearn.model_selection` to split the dataset into training and
testing sets.

- Set aside 20% of the data for testing, and use a random seed (`random_state`) for
reproducibility.

4. Create and Fit LDA Model:

- Create an instance of `LinearDiscriminantAnalysis()` from `sklearn.discriminant_analysis`.

- Fit the LDA model to the training data using the `fit()` method.

5. Make Predictions:

- Use the trained LDA model to predict the labels for the testing data using the `predict()`
method.

6. Calculate Accuracy:

- Compare the predicted labels (`y_pred`) with the actual labels (`y_test`) using the
`accuracy_score()` function from `sklearn.metrics`.

- Calculate the accuracy of the model's predictions.

7. Print Accuracy:
- Print the calculated accuracy of the LDA model's predictions on the testing data.

Program:

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and fit LDA model
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

# Make predictions
y_pred = lda.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("LDA Accuracy:", accuracy)

Output:
LDA Accuracy: 1.0
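
Note: beyond classification, the same fitted LDA model can also serve as a supervised dimensionality-reduction step. The short sketch below (an optional illustration reusing the lda and X objects from the program above, not part of the exercise output) projects the 4-dimensional iris features onto the discriminant axes.

# Project the iris features onto the LDA discriminant axes
# (at most n_classes - 1 = 2 components for the 3-class iris data)
X_lda = lda.transform(X)
print("Reduced shape:", X_lda.shape)   # expected: (150, 2)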
Ex.No: 2
Decision Tree Classifier

Aim:
To find the best hyperparameters for a decision tree classifier and to demonstrate the process of creating a pruned decision tree with those hyperparameters.
Algorithm:

1. Import Necessary Libraries:


o Import libraries, including NumPy for data generation, Scikit-Learn for the
decision tree classifier, and metrics for accuracy measurement.
2. Load or Generate Data:
o Load your dataset or generate sample data for the demonstration. Ensure you
have feature matrix X and target labels y.
3. Split Data into Training and Testing Sets:
o Use train_test_split to divide the data into training and testing sets.
4. Vary Fitting Parameters and Find the Best:
o Initialize variables best_accuracy and best_params.
o Loop through different combinations of hyperparameters (e.g., max_depth,
min_samples_split, min_samples_leaf).
o Create a decision tree classifier with the current parameter combination and
train it on the training data.
o Make predictions on the test data and calculate accuracy.
o Update best_accuracy and best_params if the current model has better
accuracy.
5. Print the Best Parameters and Accuracy:
o Display the best hyperparameters and the corresponding accuracy.
6. Create and Train a Pruned Decision Tree:
o Create a new decision tree classifier with the best hyperparameters.
o Train the pruned decision tree on the training data.
7. Test the Pruned Decision Tree:
o Make predictions using the pruned decision tree on the test data.
o Calculate the accuracy of the pruned decision tree.
8. Print Pruned Decision Tree Accuracy:
o Display the accuracy of the pruned decision tree.

Program:
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load your dataset or generate sample data (replace this with your data)
# X and y should be your feature matrix and target labels
X, y = np.random.rand(100, 2), np.random.choice([0, 1], size=100)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

best_accuracy = 0
best_params = {}

for max_depth in [None, 10, 20, 30]:
    for min_samples_split in [2, 5, 10]:
        for min_samples_leaf in [1, 2, 5]:
            clf = DecisionTreeClassifier(
                max_depth=max_depth,
                min_samples_split=min_samples_split,
                min_samples_leaf=min_samples_leaf
            )
            clf.fit(X_train, y_train)
            y_pred = clf.predict(X_test)
            accuracy = accuracy_score(y_test, y_pred)

            if accuracy > best_accuracy:
                best_accuracy = accuracy
                best_params = {
                    "max_depth": max_depth,
                    "min_samples_split": min_samples_split,
                    "min_samples_leaf": min_samples_leaf
                }

print("Best Parameters:", best_params)
print("Best Accuracy:", best_accuracy)

# Create and train a pruned decision tree with the best parameters
pruned_tree = DecisionTreeClassifier(**best_params)
pruned_tree.fit(X_train, y_train)

# Test the pruned decision tree on new data
y_pred_pruned = pruned_tree.predict(X_test)
accuracy_pruned = accuracy_score(y_test, y_pred_pruned)

print("Pruned Decision Tree Accuracy:", accuracy_pruned)

Output
Best Parameters: {'max_depth': 10, 'min_samples_split': 10, 'min_samples_leaf': 5}
Best Accuracy: 0.675
Pruned Decision Tree Accuracy: 0.625
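
Note: the manual triple loop above can equivalently be written with scikit-learn's built-in GridSearchCV. The sketch below is an optional alternative (not part of the original exercise) that searches the same parameter grid with 5-fold cross-validation, reusing X_train, y_train, X_test and y_test from the program above.

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
}

# Cross-validated search over the same grid used in the program above
grid = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Test Accuracy:", grid.score(X_test, y_test))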
Ex.No: 3
Candidate Elimination Algorithm

Aim: For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm in Python to output a description of the set of all hypotheses consistent with the training examples.

Algorithm:

1. Initialize two sets: G (general hypotheses) with a wildcard hypothesis '?' and S
(specific hypotheses) with a bottom hypothesis '⊥'.
2. For each training example 'd', do the following:

a. If 'd' is a positive example:

i. Remove any hypothesis 'h' from G that is inconsistent with 'd'.

ii. For each hypothesis 's' in S that is inconsistent with 'd': - Remove 's' from S. - Add
to S all minimal generalizations of 's' that are consistent with 'd' and have a
generalization in G. - Remove any hypothesis from S that has a more specific
hypothesis 'h' in S.

b. If 'd' is a negative example:

i. Remove any hypothesis 'h' from S that is inconsistent with 'd'.

ii. For each hypothesis 'g' in G that is inconsistent with 'd': - Remove 'g' from G. - Add
to G all minimal specializations of 'g' that are consistent with 'd' and have a
specialization in S. - Remove any hypothesis from G that has a more general
hypothesis in G.

3. Continue this process for all training examples.


4. The algorithm will converge to a version space with G containing the most general
hypotheses and S containing the most specific hypotheses.

Program:

import numpy as np
import pandas as pd

data = pd.read_csv('enjoysport.csv')
concepts = np.array(data.iloc[:, 0:-1])
print("\nInstances are:\n", concepts)
target = np.array(data.iloc[:, -1])
print("\nTarget Values are: ", target)

def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("\nInitialization of specific_h and general_h")
    print("\nSpecific Boundary: ", specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print("\nGeneric Boundary: ", general_h)

    for i, h in enumerate(concepts):
        print("\nInstance", i + 1, "is ", h)
        if target[i] == "yes":
            print("Instance is Positive ")
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'

        if target[i] == "no":
            print("Instance is Negative ")
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'

        print("Specific Boundary after ", i + 1, "Instance is ", specific_h)
        print("Generic Boundary after ", i + 1, "Instance is ", general_h)
        print("\n")

    # Drop rows of general_h that stayed fully general ('?' in every position)
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)

print("Final Specific_h: ", s_final, sep="\n")
print("Final General_h: ", g_final, sep="\n")

Output:

Instances are:
[[‘sunny’ ‘warm’ ‘normal’ ‘strong’ ‘warm’ ‘same’]
[‘sunny’ ‘warm’ ‘high’ ‘strong’ ‘warm’ ‘same’]
[‘rainy’ ‘cold’ ‘high’ ‘strong’ ‘warm’ ‘change’]
[‘sunny’ ‘warm’ ‘high’ ‘strong’ ‘cool’ ‘change’]]

Target Values are: [‘yes’ ‘yes’ ‘no’ ‘yes’]

Initialization of specific_h and general_h

Specific Boundary: [‘sunny’ ‘warm’ ‘normal’ ‘strong’ ‘warm’ ‘same’]

Generic Boundary: [[‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’,
‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’]]
Instance 1 is [‘sunny’ ‘warm’ ‘normal’ ‘strong’ ‘warm’ ‘same’] Instance is Positive

Specific Boundary after 1 Instance is [‘sunny’ ‘warm’ ‘normal’ ‘strong’ ‘warm’ ‘same’]

Generic Boundary after 1 Instance is [[‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’],
[‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’,
‘?’, ‘?’]]

Instance 2 is [‘sunny’ ‘warm’ ‘high’ ‘strong’ ‘warm’ ‘same’] Instance is Positive

Specific Boundary after 2 Instance is [‘sunny’ ‘warm’ ‘?’ ‘strong’ ‘warm’ ‘same’]

Generic Boundary after 2 Instance is [[‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’],
[‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’,
‘?’, ‘?’]]

Instance 3 is [‘rainy’ ‘cold’ ‘high’ ‘strong’ ‘warm’ ‘change’] Instance is Negative

Specific Boundary after 3 Instance is [‘sunny’ ‘warm’ ‘?’ ‘strong’ ‘warm’ ‘same’]

Generic Boundary after 3 Instance is [[‘sunny’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘warm’, ‘?’, ‘?’, ‘?’,
‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’,
‘?’, ‘?’, ‘?’, ‘same’]]

Instance 4 is [‘sunny’ ‘warm’ ‘high’ ‘strong’ ‘cool’ ‘change’] Instance is Positive

Specific Boundary after 4 Instance is [‘sunny’ ‘warm’ ‘?’ ‘strong’ ‘?’ ‘?’]

Generic Boundary after 4 Instance is [[‘sunny’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘warm’, ‘?’, ‘?’, ‘?’,
‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘?’,
‘?’, ‘?’, ‘?’, ‘?’]]

Final Specific_h: [‘sunny’ ‘warm’ ‘?’ ‘strong’ ‘?’ ‘?’]

Final General_h: [[‘sunny’, ‘?’, ‘?’, ‘?’, ‘?’, ‘?’], [‘?’, ‘warm’, ‘?’, ‘?’, ‘?’, ‘?’]]
Ex.No: 4
Locally Weighted Regression Algorithm

Aim: To implement the non-parametric Locally Weighted Regression algorithm in Python in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

Locally Weighted Regression Algorithm


Regression:

 Regression is a technique from statistics that is used to predict values of a desired target quantity when the target quantity is continuous.
o In regression, we seek to identify (or estimate) a continuous variable y associated with a given input vector x.
 y is called the dependent variable.
 x is called the independent variable.

Loess/Lowess Regression:

Loess regression is a nonparametric technique that uses local weighted regression to fit a
smooth curve through points in a scatter plot.
Lowess Algorithm:

Locally weighted regression is a very powerful nonparametric model used in statistical learning.

Given a dataset X, y, we attempt to find the model parameters β(x) that minimize the residual sum of weighted squared errors.

The weights are given by a kernel function (k or w), which can be chosen arbitrarily.

Algorithm:
1. Read the given data sample into X and the target values (the linear or non-linear curve) into Y.

2. Set the value of the smoothing (free) parameter τ.

3. Set the point of interest (query point) x0, drawn from X.

4. Determine the diagonal weight matrix W using the Gaussian kernel:
   w(j, j) = exp( -||xj - x0||^2 / (2 τ^2) )

5. Determine the value of the model parameter β using the weighted least-squares solution:
   β = (X^T W X)^-1 (X^T W y)

6. Prediction = x0 · β

Python Program
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def kernel(point, xmat, k):
    m, n = np.shape(xmat)
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k**2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

# load data points
data = pd.read_csv('10-dataset.csv')
bill = np.array(data.total_bill)
tip = np.array(data.tip)

# prepare the design matrix: add a column of ones to bill
mbill = np.mat(bill)
mtip = np.mat(tip)

m = np.shape(mbill)[1]
one = np.mat(np.ones(m))
X = np.hstack((one.T, mbill.T))

# set the smoothing parameter k (tau) here
ypred = localWeightRegression(X, mtip, 0.5)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()

Output:
(A scatter plot of total bill versus tip with the locally weighted regression fit drawn as a curve through the points.)
Ex.No: 5 Decision Tree ID3 Algorithm

Aim: To Write a program to demonstrate the working of the decision tree based ID3
algorithm. Use an appropriate data set for building the decision tree and apply this knowledge
to classify a new sample.

Algorithm:

1. Create a root node for the decision tree.


2. Check if all examples are positive:
o If all examples have a positive target attribute, return a single-node tree with
the label "+".
3. Check if all examples are negative:
o If all examples have a negative target attribute, return a single-node tree with
the label "-".
4. Check if the list of attributes (Attributes) is empty:
o If no attributes are left to split on, return a single-node tree with the label being
the most common value of the target attribute in the examples.
5. If none of the above conditions are met, the algorithm proceeds with attribute
selection:
o Select the attribute A from the list of attributes (Attributes) that best classifies
the examples based on some criterion (e.g., information gain or Gini
impurity). The selection criterion depends on the algorithm variant.
6. Set the decision attribute for the current root node to A.
7. For each possible value vi of attribute A:
o Add a new tree branch below the current root node, corresponding to the test
A = vi.
8. Create a subset of examples, denoted as Examples_vi, that have the value vi for
attribute A.
9. Check if Examples_vi is empty:
o If Examples_vi is empty, add a leaf node below the current branch with the
label being the most common value of the target attribute in the original
examples.
10. If Examples_vi is not empty:
o Recursively call the ID3 algorithm with the subset Examples_vi, the target
attribute, and the list of attributes excluding A (i.e., Attributes - {A}).
o Add the subtree returned by the recursive call below the current branch.
11. Repeat steps 7 to 10 for all possible values of attribute A.
12. Return the root node of the decision tree.
Program:
import pandas as pd
import math
import numpy as np

data = pd.read_csv("Dataset/4-dataset.csv")
features = [feat for feat in data]
features.remove("answer")

Create a class named Node with four members children, value, isLeaf and pred.

class Node:
    def __init__(self):
        self.children = []
        self.value = ""
        self.isLeaf = False
        self.pred = ""

Define a function called entropy to find the entropy of the dataset.

def entropy(examples):
    pos = 0.0
    neg = 0.0
    for _, row in examples.iterrows():
        if row["answer"] == "yes":
            pos += 1
        else:
            neg += 1
    if pos == 0.0 or neg == 0.0:
        return 0.0
    else:
        p = pos / (pos + neg)
        n = neg / (pos + neg)
        return -(p * math.log(p, 2) + n * math.log(n, 2))
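
As a quick sanity check (not part of the original program), the entropy of a split with 9 positive and 5 negative examples — the familiar play-tennis proportions — is about 0.940 bits:

import math

pos, neg = 9, 5                                      # hypothetical class counts
p, n = pos / (pos + neg), neg / (pos + neg)
print(-(p * math.log(p, 2) + n * math.log(n, 2)))    # prints approximately 0.940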

Define a function named info_gain to find the gain of the attribute

def info_gain(examples, attr):
    uniq = np.unique(examples[attr])
    #print ("\n",uniq)
    gain = entropy(examples)
    #print ("\n",gain)
    for u in uniq:
        subdata = examples[examples[attr] == u]
        #print ("\n",subdata)
        sub_e = entropy(subdata)
        gain -= (float(len(subdata)) / float(len(examples))) * sub_e
        #print ("\n",gain)
    return gain

Define a function named ID3 to get the decision tree for the given dataset

def ID3(examples, attrs):
    root = Node()

    max_gain = 0
    max_feat = ""
    for feature in attrs:
        #print ("\n",examples)
        gain = info_gain(examples, feature)
        if gain > max_gain:
            max_gain = gain
            max_feat = feature
    root.value = max_feat
    #print ("\nMax feature attr",max_feat)
    uniq = np.unique(examples[max_feat])
    #print ("\n",uniq)
    for u in uniq:
        #print ("\n",u)
        subdata = examples[examples[max_feat] == u]
        #print ("\n",subdata)
        if entropy(subdata) == 0.0:
            newNode = Node()
            newNode.isLeaf = True
            newNode.value = u
            newNode.pred = np.unique(subdata["answer"])
            root.children.append(newNode)
        else:
            dummyNode = Node()
            dummyNode.value = u
            new_attrs = attrs.copy()
            new_attrs.remove(max_feat)
            child = ID3(subdata, new_attrs)
            dummyNode.children.append(child)
            root.children.append(dummyNode)

    return root

Define a function named printTree to draw the decision tree

def printTree(root: Node, depth=0):
    for i in range(depth):
        print("\t", end="")
    print(root.value, end="")
    if root.isLeaf:
        print(" -> ", root.pred)
    print()
    for child in root.children:
        printTree(child, depth + 1)

Define a function named classify to classify the new example

def classify(root: Node, new):
    for child in root.children:
        if child.value == new[root.value]:
            if child.isLeaf:
                print("Predicted Label for new example", new, " is:", child.pred)
                return
            else:
                classify(child.children[0], new)

Finally, call the ID3, printTree and classify functions

root = ID3(data, features)
print("Decision Tree is:")
printTree(root)
print("------------------")

new = {"outlook": "sunny", "temperature": "hot", "humidity": "normal", "wind": "strong"}
classify(root, new)

Output:

Decision Tree is:


outlook
overcast -> ['yes']

rain
wind
strong -> ['no']

weak -> ['yes']

sunny
humidity
high -> ['no']

normal -> ['yes']

------------------
Predicted Label for new example {'outlook': 'sunny', 'temperature': 'hot',
'humidity': 'normal', 'wind': 'strong'} is: ['yes']
Ex.No: 6
Support Vector Machine (SVM)

Aim: To implement a Support Vector Machine (SVM) classifier in Python

Algorithm:

1. Importing the Necessary libraries


2. Importing the dataset
3. Splitting the dataset into the Training set and Test set
4. Feature Scaling
5. Training the Support Vector Machine (SVM) Classification model on the Training set
6. Support Vector Machine (SVM) classifier model
7. Display the results (confusion matrix and accuracy)

Program:

import numpy as np
import pandas as pd

dataset = pd.read_csv('Data.csv')

X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

from sklearn.svm import SVC
classifier = SVC(kernel='linear', random_state=0)
classifier.fit(X_train, y_train)
# Fitted classifier (repr echoed by the notebook):
# SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
#     decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
#     kernel='linear', max_iter=-1, probability=False, random_state=0,
#     shrinking=True, tol=0.001, verbose=False)

from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy Score:", accuracy)

Output

[[102, 5]
[ 5, 59]]

Accuracy Score: 0.89
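
Note: a single new observation can be classified with the same fitted objects, provided it is first scaled with the already-fitted StandardScaler. The lines below are an optional illustration reusing X, sc and classifier from the program above; the first row of the raw feature matrix is used as a stand-in "new" sample.

# A "new" raw sample must go through the same scaler before prediction
new_sample = X[:1]                       # stand-in for an unseen observation
new_sample_scaled = sc.transform(new_sample)
print("Predicted class:", classifier.predict(new_sample_scaled))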


Ex.No: 7 K-Nearest Neighbour Algorithm using iris Data Set

Aim: To Write a program to implement k-Nearest Neighbour algorithm to classify the iris
data set. Print both correct and wrong predictions.

Algorithm:

1. Import Necessary Libraries:


- Import the required Python libraries such as NumPy, Pandas, scikit-learn's
KNeighborsClassifier, train_test_split, and metrics for evaluation.

2. Define Column Names:


- Create a list called `names` that contains the names of columns in the dataset. These
names will be used as headers when reading the dataset.

3. Read the Dataset:


- Read the dataset from a CSV file ("9-dataset.csv") into a Pandas DataFrame. Assign the
column names from `names` to the dataset.

4. Split Data into Features and Target:


- Split the dataset into feature variables (X) and the target variable (y).
- `X` contains all columns except the last one, which is the "Class" column.
- `y` contains only the "Class" column, representing the labels or classes you want to
predict.

5. Print First Few Rows of Features:


- Display the first few rows of the feature dataset (`X`) to provide a preview of the data.

6. Split Data into Training and Testing Sets:


- Split the data into training and testing sets using the `train_test_split` function.
- `Xtrain` and `ytrain` contain 90% of the data for training.
- `Xtest` and `ytest` contain 10% of the data for testing.

7. Create and Train the KNN Classifier:


- Create a K-nearest neighbors (KNN) classifier with `n_neighbors=5`.
- Train the classifier using the training data (`Xtrain` and `ytrain`) with the `.fit()` method.

8. Make Predictions:
- Use the trained KNN classifier to make predictions on the testing data (`Xtest`).
- Store the predicted labels in the `ypred` variable.

9. Evaluate the Model:
- Calculate and print the confusion matrix, classification report, and accuracy of the classifier using metrics from scikit-learn.
10. Display Results:
- Print a separator line to mark the end of the script execution.

K-Nearest Neighbor Algorithm

Training algorithm:

 For each training example (x, f(x)), add the example to the list of training examples.

Classification algorithm:
o Given a query instance xq to be classified,
 Let x1 . . . xk denote the k instances from the training examples that are nearest to xq.
 Return the most common value of f among these neighbours:
   f̂(xq) ← argmax_v Σ (i = 1 to k) δ(v, f(xi)), where δ(a, b) = 1 if a = b and 0 otherwise.
 For a real-valued target function, f̂(xq) is instead the mean value (1/k) Σ (i = 1 to k) f(xi) of the k nearest training examples.
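
The same rule can be written directly in a few lines of NumPy. The sketch below is a from-scratch illustration of the majority-vote rule (separate from the scikit-learn program later in this exercise); the toy points and labels are made up for demonstration.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    # Euclidean distance from the query point to every training example
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(distances)[:k]
    # Majority vote among the labels of the k nearest neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage with made-up 2-D points
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))   # expected: A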

Data Set:

Iris Plants Dataset: Dataset contains 150 instances (50 in each of three classes) Number of
Attributes: 4 numeric, predictive attributes and the Class.

Python Program to Implement and Demonstrate KNN Algorithm


import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe
dataset = pd.read_csv("9-dataset.csv", names=names)
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
print(X.head())
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)

classifier = KNeighborsClassifier(n_neighbors=5).fit(Xtrain, ytrain)

ypred = classifier.predict(Xtest)
i = 0
print("\n-------------------------------------------------------------------------")
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print("-------------------------------------------------------------------------")
for label in ytest:
    print('%-25s %-25s' % (label, ypred[i]), end="")
    if (label == ypred[i]):
        print(' %-25s' % ('Correct'))
    else:
        print(' %-25s' % ('Wrong'))
    i = i + 1
print("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n", metrics.confusion_matrix(ytest, ypred))
print("-------------------------------------------------------------------------")
print("\nClassification Report:\n", metrics.classification_report(ytest, ypred))
print("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest, ypred))
print("-------------------------------------------------------------------------")

Output
sepal-length sepal-width petal-length petal-width
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2

-------------------------------------------------------------------------
Original Label Predicted Label Correct/Wrong
-------------------------------------------------------------------------
Iris-versicolor Iris-versicolor Correct
Iris-virginica Iris-versicolor Wrong
Iris-virginica Iris-virginica Correct
Iris-versicolor Iris-versicolor Correct
Iris-setosa Iris-setosa Correct
Iris-versicolor Iris-versicolor Correct
Iris-setosa Iris-setosa Correct
Iris-setosa Iris-setosa Correct
Iris-virginica Iris-virginica Correct
Iris-virginica Iris-versicolor Wrong
Iris-virginica Iris-virginica Correct
Iris-setosa Iris-setosa Correct
Iris-virginica Iris-virginica Correct
Iris-virginica Iris-virginica Correct
Iris-versicolor Iris-versicolor Correct
-------------------------------------------------------------------------

Confusion Matrix:
[[4 0 0]
[0 4 0]
[0 2 5]]
-------------------------------------------------------------------------

Classification Report:
precision recall f1-score support

Iris-setosa 1.00 1.00 1.00 4


Iris-versicolor 0.67 1.00 0.80 4
Iris-virginica 1.00 0.71 0.83 7

avg / total 0.91 0.87 0.87 15

-------------------------------------------------------------------------
Accuracy of the classifier is 0.87
-------------------------------------------------------------------------
Ex.No: 8 Implement the Bayesian network

Aim: Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using a standard Heart Disease Data Set.

Description:

A Bayesian network is a directed acyclic graph in which each edge corresponds to a conditional dependency, and each node corresponds to a unique random variable.

Consider the following example. Suppose we attempt to turn on our computer, but the computer does not start (observation/evidence). We would like to know which of the possible causes of computer failure is more likely. In this simplified illustration, we assume only two possible causes of this misfortune: electricity failure and computer malfunction.

The corresponding directed acyclic graph is depicted in the figure below.

The goal is to calculate the posterior conditional probability distribution of each of the possible unobserved causes given the observed evidence, i.e. P[Cause | Evidence].
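
To make the idea concrete, the short sketch below computes P[Cause | Evidence] for this two-cause example by direct enumeration of the joint distribution. All prior and conditional probabilities in it are illustrative assumptions, not values from the exercise.

# Hypothetical priors and conditionals for the computer-failure example
p_elec = 0.05    # P(electricity failure)
p_malf = 0.10    # P(computer malfunction)
p_fail = {       # P(computer does not start | electricity failure, malfunction)
    (True, True): 1.0,
    (True, False): 1.0,
    (False, True): 0.9,
    (False, False): 0.01,
}

# Enumerate the joint distribution and condition on the evidence "does not start"
joint = {}
for e in (True, False):
    for m in (True, False):
        prior = (p_elec if e else 1 - p_elec) * (p_malf if m else 1 - p_malf)
        joint[(e, m)] = prior * p_fail[(e, m)]

evidence = sum(joint.values())
p_elec_given_fail = sum(p for (e, m), p in joint.items() if e) / evidence
p_malf_given_fail = sum(p for (e, m), p in joint.items() if m) / evidence
print("P(electricity failure | no start) =", round(p_elec_given_fail, 3))
print("P(computer malfunction | no start) =", round(p_malf_given_fail, 3))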
Algorithm:

1. Import the required libraries.


o Import NumPy, Pandas, and CSV for data handling.
o Import classes and functions from pgmpy for Bayesian network modeling.
2. Load the heart disease dataset from a CSV file and replace missing values ('?') with
NaN.
3. Print the first few rows of the dataset and the attribute datatypes.
4. Define the structure of the Bayesian network using the BayesianModel class. This
step specifies the dependencies between variables. In this case, the network structure
is defined with the following edges:
o 'age' -> 'heartdisease'
o 'sex' -> 'heartdisease'
o 'exang' -> 'heartdisease'
o 'cp' -> 'heartdisease'
o 'heartdisease' -> 'restecg'
o 'heartdisease' -> 'chol'
5. Use Maximum Likelihood Estimation to learn Conditional Probability Distributions
(CPDs) for the Bayesian network based on the dataset. This step estimates the
probabilities of the variables given their parents.
6. Create an instance of the VariableElimination class for performing inference with the
Bayesian network.
7. Perform two inference queries:

a. Calculate the probability of 'heartdisease' given the evidence 'restecg=1' and display
the result.

b. Calculate the probability of 'heartdisease' given the evidence 'cp=2' and display the
result.

Program:

import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination

heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

print('Sample instances from the dataset are given below')
print(heartDisease.head())

print('\n Attributes and datatypes')
print(heartDisease.dtypes)

model = BayesianModel([('age', 'heartdisease'), ('sex', 'heartdisease'),
                       ('exang', 'heartdisease'), ('cp', 'heartdisease'),
                       ('heartdisease', 'restecg'), ('heartdisease', 'chol')])
print('\nLearning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

print('\n Inferencing with Bayesian Network:')
HeartDiseasetest_infer = VariableElimination(model)

print('\n 1. Probability of HeartDisease given evidence= restecg')
q1 = HeartDiseasetest_infer.query(variables=['heartdisease'], evidence={'restecg': 1})
print(q1)

print('\n 2. Probability of HeartDisease given evidence= cp ')
q2 = HeartDiseasetest_infer.query(variables=['heartdisease'], evidence={'cp': 2})
print(q2)

Output:
(The two queries print the posterior probability distribution of 'heartdisease' given restecg = 1 and given cp = 2, respectively.)
Ex.No: 9 Artificial Neural Network by implementing the Single-layer
Perceptron

Aim:

Write a program to build an Artificial Neural Network by implementing the Single-layer Perceptron.

Algorithm:

1. Import the necessary libraries, including NumPy, scikit-learn, and the Iris dataset
from scikit-learn.
2. Load the Iris dataset and convert it into a binary classification problem by setting the
target labels to 1 if the class is not 0 (versicolor or virginica) and 0 otherwise (setosa).
3. Split the dataset into training and testing sets using the train_test_split function.
4. Create a single-layer perceptron using Perceptron() from scikit-learn and train it
on the training data.
5. Make predictions on the test data using the trained perceptron.
6. Calculate the accuracy of the perceptron by comparing the predicted labels with the
true labels and print the accuracy.

Program:

# Import necessary libraries
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

# Load a dataset (in this example, we'll use the Iris dataset)
data = load_iris()
X = data.data
y = (data.target != 0).astype(int)  # Convert labels to binary (1 if not class 0, 0 otherwise)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create and train a single-layer perceptron
perceptron = Perceptron()
perceptron.fit(X_train, y_train)

# Make predictions on the test data
y_pred = perceptron.predict(X_test)

# Calculate the accuracy of the perceptron
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Output:

Accuracy: 1.0
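
Note: the learned decision boundary can be inspected after training; the lines below are an optional follow-up reusing the perceptron object from the program above.

# Weights (one per input feature) and bias of the trained single-layer perceptron
print("Weights:", perceptron.coef_)
print("Bias:", perceptron.intercept_)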
Ex.No: 10 Artificial Neural Network by implementing the Multi-layer
Perceptron

Aim: Build an Artificial Neural Network by implementing the Multi-layer Perceptron and
test the same using appropriate data sets.

Algorithm:

1. Import the necessary libraries, including NumPy, scikit-learn, and the Iris dataset
from scikit-learn.
2. Load the Iris dataset for a multi-class classification problem, where we predict the
class of iris flowers.
3. Split the dataset into training and testing sets using the train_test_split function.
4. Create a multi-layer perceptron (MLP) using MLPClassifier from scikit-learn. In
this example, we specify two hidden layers with 100 and 50 neurons, respectively.
You can adjust the architecture of the MLP by modifying the hidden_layer_sizes
parameter.
5. Train the MLP on the training data.
6. Make predictions on the test data using the trained MLP.
7. Calculate the accuracy of the MLP by comparing the predicted labels with the true
labels and print the accuracy.

Program:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Load a dataset (in this example, we'll use the Iris dataset)
data = load_iris()
X = data.data
y = data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create and train a multi-layer perceptron (MLP)
mlp = MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=1000, random_state=42)
mlp.fit(X_train, y_train)

# Make predictions on the test data
y_pred = mlp.predict(X_test)

# Calculate the accuracy of the MLP
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Output:

Accuracy: 1.0
Ex.No: 11
Radial Basis Function (RBF) network

Aim:
To build a Radial Basis Function (RBF) network with five neurons to calculate a fitness function using Python.
Algorithm:

1. Import necessary libraries, such as NumPy.


2. Define your dataset, including input data and target values.
3. Specify the number of neurons in your RBF network (in this example, num_neurons = 5).
4. Define the centroids and spread for the RBF neurons. These values need to be
determined based on your specific problem. In this example, we use random initial
centroids and equal spreads for simplicity.
5. Define the Radial Basis Function (RBF) as a Gaussian function.
6. Calculate the RBF activations for each data point and neuron by iterating through your
dataset and applying the RBF function to each data point and neuron using their
respective centroids and spreads.
7. Solve for the weights using the pseudo-inverse method, which involves taking the
pseudo-inverse of the RBF activations matrix and multiplying it with the target values.
8. Calculate the fitness function for a new input by applying each RBF neuron to the new
input and weighting the results with the calculated weights.
9. Print the fitness value for the given input

Program:
import numpy as np

# Define your dataset (input data and target values)
# Replace these with your actual data
data = np.array([[0.1, 0.2], [0.4, 0.5], [0.6, 0.7], [0.8, 0.9]])
target = np.array([1.0, 2.0, 3.0, 4.0])

# Number of neurons
num_neurons = 5

# Define the centroids and spread for the RBF neurons
# You can choose these values based on your problem
centroids = np.random.rand(num_neurons, 2)  # Random initial centroids
spread = np.ones(num_neurons)               # You can adjust the spread as needed

# Define the RBF as a Gaussian function: exp(-||x - c||^2 / (2 s^2))
def rbf(x, c, s):
    return np.exp(-np.linalg.norm(x - c) ** 2 / (2 * s**2))

# Calculate the RBF activations for each data point and neuron
rbf_activations = np.zeros((len(data), num_neurons))
for i in range(len(data)):
    for j in range(num_neurons):
        rbf_activations[i, j] = rbf(data[i], centroids[j], spread[j])

# Solve for the weights using the pseudo-inverse method
weights = np.linalg.pinv(rbf_activations).dot(target)

# Calculate the fitness function for a new input
new_input = np.array([0.3, 0.4])
output = sum([weights[j] * rbf(new_input, centroids[j], spread[j]) for j in range(num_neurons)])

print("Fitness:", output)

Output:

Fitness: 1.0047912368185284
Ex.No: 12
Gaussian Mixture Models

Aim:

To implement a Gaussian Mixture Model using Python.

Algorithm:

1. Import Required Libraries:


o Import NumPy for data manipulation, Matplotlib for data visualization, and
Scikit-Learn for GMM.
2. Data Generation:
o Generate synthetic data for clustering using make_blobs from Scikit-Learn.
You can replace this with your own dataset.
3. Create and Fit GMM:
o Create a GMM model with the desired number of components (clusters) and a
random seed for reproducibility.
o Fit the GMM model to the data using the fit method.
4. Cluster Assignments:
o Predict cluster assignments for each data point using the predict method.
5. Retrieve Gaussian Component Information:
o Obtain the means and covariances of the Gaussian components (clusters) using
the means_ and covariances_ attributes of the GMM model.
6. Data Visualization:
o Create a scatter plot to visualize the clustered data points with color-coded
clusters.
o Overlay cluster centers (means) with red "x" markers.
7. Display the Plot:
o Show the plot with the clustered data and cluster centers using plt.show().

Program:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs

# Generate synthetic data (you can replace this with your dataset)
n_samples = 300
n_features = 2
n_clusters = 3
X, _ = make_blobs(n_samples=n_samples, n_features=n_features, centers=n_clusters, random_state=42)

# Create and fit a Gaussian Mixture Model
gmm = GaussianMixture(n_components=n_clusters, random_state=42)
gmm.fit(X)

# Predict cluster assignments for each data point
labels = gmm.predict(X)

# Get the means and covariances of the Gaussian components
means = gmm.means_
covariances = gmm.covariances_

# Plot the data points with color-coded clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(means[:, 0], means[:, 1], s=100, color='red', marker='x', label='Cluster Centers')
plt.legend()
plt.title('Gaussian Mixture Model Clustering')
plt.show()

Output:
(A scatter plot of the three color-coded clusters with red 'x' markers at the cluster centers.)
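
Note: because a GMM is probabilistic, each point also receives soft membership probabilities rather than only a hard cluster label. The lines below are an optional follow-up reusing the fitted gmm and X from the program above.

# Posterior (soft) cluster-membership probabilities of the first five points
probs = gmm.predict_proba(X[:5])
print(np.round(probs, 3))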
Ex.No: 13
Learning Vector Quantization (LVQ)

Aim:

To implement Learning Vector Quantization (LVQ) with a chosen number of neurons and a learning rate.

Description:
Learning Vector Quantization (LVQ) is a supervised, prototype-based learning algorithm used for vector quantization and classification. It is similar to k-means, but it uses class labels to update the prototypes, allowing a more flexible, non-uniform quantization.

Algorithm:

1. Generate Synthetic Data:


o Generate synthetic data for demonstration purposes, including features ( X) and
labels (y). You should replace this with your own dataset.
2. LVQ Initialization:
o Initialize the LVQ model with random neuron weights (neurons) and
associated labels (neuron_labels).
o Each neuron represents a prototype vector with a specific label.
3. LVQ Training:
o Iterate through training epochs.
o For each data point x in the dataset:
 Find the nearest neuron among the prototypes by calculating the
Euclidean distance.
 Determine the label of the data point.
 Update the winning neuron based on the learning rate
(learning_rate).
 If the winning neuron's label matches the data point's label, move the
neuron closer to the data point (x) by adding learning_rate * (x -
neuron).
 If the labels don't match, move the neuron away from the data point ( x)
by subtracting learning_rate * (x - neuron).
4. Example Usage:
o Set the parameters for the LVQ algorithm, including the number of samples,
features, classes, neurons, learning rate, and training epochs.
o Generate synthetic data and initialize the LVQ model with prototypes.
o Train the LVQ model on the data.
o The trained prototypes (trained_neurons) can be used for clustering or
classification tasks.
Program:
import numpy as np
import random

# Generate synthetic data (you can replace this with your dataset)
def generate_data(num_samples, num_features, num_classes):
    X = np.random.rand(num_samples, num_features)
    y = np.random.randint(0, num_classes, num_samples)
    return X, y

# LVQ initialization
def initialize_lvq(X, y, num_neurons, learning_rate):
    num_features = X.shape[1]
    neurons = np.random.rand(num_neurons, num_features)
    neuron_labels = np.array([random.choice(np.unique(y)) for _ in range(num_neurons)])
    return neurons, neuron_labels

# LVQ training
def lvq_train(X, y, neurons, neuron_labels, learning_rate, epochs):
    for epoch in range(epochs):
        for i in range(len(X)):
            x = X[i]
            label = y[i]

            # Find the nearest neuron
            distances = np.linalg.norm(neurons - x, axis=1)
            winner_index = np.argmin(distances)

            # Update the winning neuron
            if neuron_labels[winner_index] == label:
                neurons[winner_index] += learning_rate * (x - neurons[winner_index])
            else:
                neurons[winner_index] -= learning_rate * (x - neurons[winner_index])

    return neurons

# Example usage
num_samples = 100
num_features = 2
num_classes = 3
num_neurons = 5
learning_rate = 0.1
epochs = 100

X, y = generate_data(num_samples, num_features, num_classes)
neurons, neuron_labels = initialize_lvq(X, y, num_neurons, learning_rate)
trained_neurons = lvq_train(X, y, neurons, neuron_labels, learning_rate, epochs)

print("Trained Neuron Prototypes:")
print(trained_neurons)

Output:

Trained Neuron Prototypes:
[[ 2.29103335e+14 1.02182575e+14]
[-9.60898210e+13 -2.46332797e+14]
[-4.54141623e+13 -2.54609438e+14]
[ 3.03237505e+13 -2.20322438e+14]
[-9.31526282e+13 -2.26094512e+14]]
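
Note: once trained, the prototypes can classify a new sample by assigning it the label of the nearest neuron. The lines below are an optional follow-up reusing trained_neurons and neuron_labels from the program above; the query point is made up.

# Classify a new sample with the trained LVQ prototypes:
# the predicted class is the label of the nearest prototype.
x_new = np.array([0.5, 0.5])
nearest = np.argmin(np.linalg.norm(trained_neurons - x_new, axis=1))
print("Predicted class for", x_new, "is", neuron_labels[nearest])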
