ML NEW Final Format
Ex.No: 1
Linear Discriminant Analysis (LDA)
Aim:
To implement Linear Discriminant Analysis (LDA) on the Iris dataset and evaluate the classification accuracy of the model.
Algorithm:
1. Import Libraries:
- Import the necessary libraries for performing Linear Discriminant Analysis, loading the dataset, splitting data, and evaluating the model's accuracy.
2. Load the Dataset:
- Load the Iris dataset and store the feature matrix in `X` and the target labels in `y`.
3. Split the Data:
- Use `train_test_split()` from `sklearn.model_selection` to split the dataset into training and testing sets.
- Set aside 20% of the data for testing, and use a random seed (`random_state`) for reproducibility.
4. Train the LDA Model:
- Fit the LDA model to the training data using the `fit()` method.
5. Make Predictions:
- Use the trained LDA model to predict the labels for the testing data using the `predict()`
method.
6. Calculate Accuracy:
- Compare the predicted labels (`y_pred`) with the actual labels (`y_test`) using the
`accuracy_score()` function from `sklearn.metrics`.
7. Print Accuracy:
- Print the calculated accuracy of the LDA model's predictions on the testing data.
Program:
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the LDA model
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

# Make predictions
y_pred = lda.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("LDA Accuracy:", accuracy)
Output:
LDA Accuracy: 1.0
Ex.No: 2
Decision Tree Classifier
Aim:
To find the best hyperparameters for the decision tree classifier and demonstrate the process of creating a pruned decision tree with those hyperparameters.
Algorithm:
1. Import the necessary libraries and load (or generate) the dataset.
2. Split the data into training and testing sets.
3. Search over candidate values of `max_depth`, `min_samples_split`, and `min_samples_leaf`, recording the combination that gives the best test accuracy.
4. Train a pruned decision tree with the best parameters and report its accuracy.
Program:
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load your dataset or generate sample data (replace this with your data)
# X and y should be your feature matrix and target labels
X, y = np.random.rand(100, 2), np.random.choice([0, 1], size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
best_accuracy = 0
best_params = {}
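# The search loop itself is missing from the original listing; the grid of
# candidate values below is illustrative (it contains the best parameters
# reported in the output).
for max_depth in [5, 10, None]:
    for min_samples_split in [2, 5, 10]:
        for min_samples_leaf in [1, 2, 5]:
            tree = DecisionTreeClassifier(max_depth=max_depth,
                                          min_samples_split=min_samples_split,
                                          min_samples_leaf=min_samples_leaf)
            tree.fit(X_train, y_train)
            acc = accuracy_score(y_test, tree.predict(X_test))
            if acc > best_accuracy:
                best_accuracy = acc
                best_params = {'max_depth': max_depth,
                               'min_samples_split': min_samples_split,
                               'min_samples_leaf': min_samples_leaf}
print("Best Parameters:", best_params)
print("Best Accuracy:", best_accuracy)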
# Create and train a pruned decision tree with the best parameters
pruned_tree = DecisionTreeClassifier(**best_params)
pruned_tree.fit(X_train, y_train)
print("Pruned Decision Tree Accuracy:", accuracy_score(y_test, pruned_tree.predict(X_test)))
Output:
Best Parameters: {'max_depth': 10, 'min_samples_split': 10, 'min_samples_leaf': 5}
Best Accuracy: 0.675
Pruned Decision Tree Accuracy: 0.625
Ex.No: 3
Candidate Elimination Algorithm
Aim: For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm in Python to output a description of the set of all hypotheses consistent with the training examples.
Algorithm:
1. Initialize G (the general boundary) with the maximally general hypothesis '?' and S (the specific boundary) with the maximally specific hypothesis '⊥'.
2. For each training example 'd', do the following:
i. If 'd' is a positive example: remove from G any hypothesis inconsistent with 'd'. For each hypothesis 's' in S that is inconsistent with 'd': remove 's' from S; add to S all minimal generalizations of 's' that are consistent with 'd' and have a more general hypothesis in G; remove from S any hypothesis that is more general than another hypothesis in S.
ii. If 'd' is a negative example: remove from S any hypothesis inconsistent with 'd'. For each hypothesis 'g' in G that is inconsistent with 'd': remove 'g' from G; add to G all minimal specializations of 'g' that are consistent with 'd' and have a more specific hypothesis in S; remove from G any hypothesis that is more specific than another hypothesis in G.
Program:
import numpy as np
import pandas as pd

data = pd.read_csv('enjoysport.csv')
concepts = np.array(data.iloc[:, 0:-1])
print("\nInstances are:\n", concepts)
target = np.array(data.iloc[:, -1])
print("\nTarget Values are: ", target)

def learn(concepts, target):
    # Initialise the specific boundary to the first instance and the
    # general boundary to the all-'?' hypotheses
    specific_h = concepts[0].copy()
    general_h = [['?' for _ in range(len(specific_h))] for _ in range(len(specific_h))]
    print("\nGeneric Boundary: ", general_h)
    for i, h in enumerate(concepts):
        print("\nInstance", i + 1, "is ", h)
        if target[i] == "yes":
            print("Instance is Positive ")
            # Generalise S and drop the conflicting constraints from G
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "no":
            print("Instance is Negative ")
            # Specialise G against the negative instance
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("Specific Boundary after", i + 1, "Instance is", specific_h)
        print("Generic Boundary after", i + 1, "Instance is", general_h)
    # Drop the unchanged all-'?' rows from the general boundary
    general_h = [g for g in general_h if g != ['?' for _ in range(len(specific_h))]]
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("\nFinal General_h:", g_final)
Output:
Instances are:
[['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
['sunny' 'warm' 'high' 'strong' 'warm' 'same']
['rainy' 'cold' 'high' 'strong' 'warm' 'change']
['sunny' 'warm' 'high' 'strong' 'cool' 'change']]
Generic Boundary: [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Instance 1 is ['sunny' 'warm' 'normal' 'strong' 'warm' 'same'] Instance is Positive
Specific Boundary after 1 Instance is ['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
Generic Boundary after 1 Instance is [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Specific Boundary after 2 Instance is ['sunny' 'warm' '?' 'strong' 'warm' 'same']
Generic Boundary after 2 Instance is [['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Specific Boundary after 3 Instance is ['sunny' 'warm' '?' 'strong' 'warm' 'same']
Generic Boundary after 3 Instance is [['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', 'same']]
Specific Boundary after 4 Instance is ['sunny' 'warm' '?' 'strong' '?' '?']
Generic Boundary after 4 Instance is [['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]
Final General_h: [['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]
Ex.No: 4
Locally Weighted Regression Algorithm
Aim:
To implement the non-parametric Locally Weighted Regression algorithm in Python to fit data points and draw the resulting graph.
Description:
Regression is a technique from statistics that is used to predict values of a desired target quantity when the target quantity is continuous.
- In regression, we seek to identify (or estimate) a continuous variable y associated with a given input vector x.
- y is called the dependent variable.
- x is called the independent variable.
Loess/Lowess Regression:
Loess regression is a nonparametric technique that uses locally weighted regression to fit a smooth curve through points in a scatter plot.
Lowess Algorithm:
Given a dataset X, y, we attempt to find a model parameter β(x) that minimizes the residual sum of weighted squared errors.
The weights are given by a kernel function (k or w), which can be chosen arbitrarily.
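With a Gaussian kernel (one common choice; the formulas below are the standard closed-form solution, stated here for reference), the weight of training point x(i) relative to a query point x, the local parameter, and the prediction are:

w(i)(x) = exp(-||x - x(i)||^2 / (2k^2))
β(x) = (XᵀWX)⁻¹ XᵀW y
ŷ = x β(x)

where W is the diagonal matrix of the weights w(i)(x) and k controls how quickly the weights fall off with distance.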
Algorithm:
1. Read the given data sample to X and the curve (linear or nonlinear) to Y.
2. Set the value of the smoothing parameter k (the kernel bandwidth).
3. For each point x0 to be predicted, compute the diagonal weight matrix W using the kernel above.
4. Compute the local parameter β(x0) = (XᵀWX)⁻¹ XᵀW y.
5. Predict ŷ0 = x0 β(x0) and plot the fitted curve over the scatter of the data.
Python Program
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
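# The helpers below (kernel, localWeight, localWeightRegression) are not in the
# original listing; this is a minimal sketch assuming the standard Gaussian-kernel
# formulation of locally weighted regression described above.
def kernel(point, xmat, k):
    m = np.shape(xmat)[0]
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    # Closed-form weighted least squares: (X^T W X)^-1 X^T W y
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m = np.shape(xmat)[0]
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred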
# Load the tips dataset (assumed to have 'total_bill' and 'tip' columns)
data = pd.read_csv('tips.csv')
bill = np.array(data.total_bill)
tip = np.array(data.tip)
mbill = np.mat(bill)
mtip = np.mat(tip)

m = np.shape(mbill)[1]
one = np.mat(np.ones(m))
X = np.hstack((one.T, mbill.T))
# Set the bandwidth k here
ypred = localWeightRegression(X, mtip, 0.5)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
Output:
A scatter plot of Total bill vs. Tip with the fitted locally weighted regression curve drawn in red.
Ex.No: 5 Decision Tree ID3 Algorithm
Aim: To write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
Algorithm:
Read the dataset and collect the feature names (every column except the class label "answer").
import math
import numpy as np
import pandas as pd
data = pd.read_csv("Dataset/4-dataset.csv")
features = [feat for feat in data]
features.remove("answer")
Create a class named Node with four members children, value, isLeaf and pred.
class Node:
    def __init__(self):
        self.children = []
        self.value = ""
        self.isLeaf = False
        self.pred = ""
Define a function called entropy to find the entropy of the dataset.
def entropy(examples):
    pos = 0.0
    neg = 0.0
    for _, row in examples.iterrows():
        if row["answer"] == "yes":
            pos += 1
        else:
            neg += 1
    if pos == 0.0 or neg == 0.0:
        return 0.0
    else:
        p = pos / (pos + neg)
        n = neg / (pos + neg)
        return -(p * math.log(p, 2) + n * math.log(n, 2))
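The ID3 function below calls an information-gain helper that the listing does not define; a minimal sketch in the same style, using the entropy function above:

def info_gain(examples, attr):
    uniq = np.unique(examples[attr])
    gain = entropy(examples)
    # Subtract the weighted entropy of each split on this attribute
    for u in uniq:
        subdata = examples[examples[attr] == u]
        gain -= (float(len(subdata)) / float(len(examples))) * entropy(subdata)
    return gain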
Define a function named ID3 that builds the decision tree for the given dataset.
def ID3(examples, attrs):
    root = Node()
    # Choose the attribute with the highest information gain
    max_gain = 0
    max_feat = ""
    for feature in attrs:
        gain = info_gain(examples, feature)
        if gain > max_gain:
            max_gain = gain
            max_feat = feature
    root.value = max_feat
    uniq = np.unique(examples[max_feat])
    for u in uniq:
        subdata = examples[examples[max_feat] == u]
        if entropy(subdata) == 0.0:
            # Pure subset: create a leaf node holding the prediction
            newNode = Node()
            newNode.isLeaf = True
            newNode.value = u
            newNode.pred = np.unique(subdata["answer"])
            root.children.append(newNode)
        else:
            # Mixed subset: recurse on the remaining attributes
            dummyNode = Node()
            dummyNode.value = u
            new_attrs = attrs.copy()
            new_attrs.remove(max_feat)
            child = ID3(subdata, new_attrs)
            dummyNode.children.append(child)
            root.children.append(dummyNode)
    return root
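The output below also prints the tree and classifies a new example, but the listing omits that code. A minimal sketch of both helpers (the names and traversal logic are assumptions in the style of the program above):

def printTree(root, depth=0):
    # Indent according to depth to show the tree structure
    print("    " * depth + str(root.value), end="")
    if root.isLeaf:
        print(" ->", root.pred)
    else:
        print()
    for child in root.children:
        printTree(child, depth + 1)

def classify(root, new):
    # Follow the branch whose value matches the sample's attribute value
    for child in root.children:
        if child.value == new[root.value]:
            if child.isLeaf:
                print("Predicted Label for new example", new, "is:", child.pred)
            else:
                classify(child.children[0], new)
            break

# Build the tree, print it, and classify a new sample
root = ID3(data, features)
printTree(root)
classify(root, {'outlook': 'sunny', 'temperature': 'hot', 'humidity': 'normal', 'wind': 'strong'})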
Output:
outlook
    rain
        wind
            strong -> ['no']
    sunny
        humidity
            high -> ['no']
------------------
Predicted Label for new example {'outlook': 'sunny', 'temperature': 'hot', 'humidity': 'normal', 'wind': 'strong'} is: ['yes']
Ex.No: 6
Support Vector Machine (SVM)
Aim:
To write a program to implement a Support Vector Machine classifier on a given dataset and evaluate its predictions with a confusion matrix.
Algorithm:
1. Import the necessary libraries and read the dataset from 'Data.csv'.
2. Separate the feature matrix `X` and the target labels `y`.
3. Split the data into training and testing sets.
4. Train an SVM classifier on the training data.
5. Predict the labels of the testing data and print the confusion matrix.
Program:
import numpy as np
import pandas as pd
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
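The listing stops after loading the data; a minimal completion consistent with the confusion-matrix output below, where the linear kernel, the 75/25 split, and random_state are illustrative assumptions:

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Split the data, train an SVM classifier, and report the confusion matrix
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
classifier = SVC(kernel='linear', random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
print(confusion_matrix(y_test, y_pred))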
Output:
[[102, 5]
[ 5, 59]]
Ex.No: 7
k-Nearest Neighbour (KNN) Algorithm
Aim: To write a program to implement the k-Nearest Neighbour algorithm to classify the Iris data set. Print both correct and wrong predictions.
Algorithm:
1. Import the necessary libraries and load the Iris dataset.
2. Split the dataset into training (`Xtrain`, `ytrain`) and testing (`Xtest`, `ytest`) sets.
3. Create a KNN classifier and fit it to the training data.
4. Make Predictions:
- Use the trained KNN classifier to make predictions on the testing data (`Xtest`).
- Store the predicted labels in the `ypred` variable.
5. Evaluate:
- Compare `ypred` with `ytest`; print the confusion matrix, classification report, and accuracy.
Training algorithm:
- For each training example (x, f(x)), add the example to the list training_examples.
Classification algorithm:
- Given a query instance xq to be classified:
- Let x1 ... xk denote the k instances from training_examples that are nearest to xq.
- Return f̂(xq) = (1/k) Σ(i=1 to k) f(xi), the mean value of f over the k nearest training examples (for classification, the majority vote of their labels is used instead).
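To make the classification step concrete, here is a small from-scratch sketch of the procedure above (Euclidean distance with a majority vote; the helper name and parameters are illustrative):

import numpy as np
from collections import Counter

def knn_predict(Xtrain, ytrain, xq, k=5):
    # Distances from the query instance xq to every training example
    dists = np.linalg.norm(Xtrain - xq, axis=1)
    # Indices of the k nearest training examples
    nearest = np.argsort(dists)[:k]
    # Majority vote over the labels of the k nearest neighbours
    return Counter(ytrain[nearest]).most_common(1)[0][0]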
Data Set:
Iris Plants Dataset: the dataset contains 150 instances (50 in each of three classes). Number of Attributes: 4 numeric, predictive attributes and the Class.
Sample Data
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
import pandas as pd

# Load the Iris dataset (assumes a CSV with four feature columns and a class column)
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
dataset = pd.read_csv('iris.csv', names=names)

X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
print(X.head())

# A 10% test split yields the 15 predictions shown in the output
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)

classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(Xtrain, ytrain)

ypred = classifier.predict(Xtest)
i = 0
print ("\n-----------------------------------------------------------------
--------")
print ('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label',
'Correct/Wrong'))
print ("-------------------------------------------------------------------
------")
for label in ytest:
print ('%-25s %-25s' % (label, ypred[i]), end="")
if (label == ypred[i]):
print (' %-25s' % ('Correct'))
else:
print (' %-25s' % ('Wrong'))
i = i + 1
print ("-------------------------------------------------------------------
------")
print("\nConfusion Matrix:\n",metrics.confusion_matrix(ytest, ypred))
print ("-------------------------------------------------------------------
------")
print("\nClassification Report:\n",metrics.classification_report(ytest,
ypred))
print ("-------------------------------------------------------------------
------")
print('Accuracy of the classifer is %0.2f' %
metrics.accuracy_score(ytest,ypred))
print ("-------------------------------------------------------------------
------")
Output:
sepal-length sepal-width petal-length petal-width
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
-------------------------------------------------------------------------
Original Label Predicted Label Correct/Wrong
-------------------------------------------------------------------------
Iris-versicolor Iris-versicolor Correct
Iris-virginica Iris-versicolor Wrong
Iris-virginica Iris-virginica Correct
Iris-versicolor Iris-versicolor Correct
Iris-setosa Iris-setosa Correct
Iris-versicolor Iris-versicolor Correct
Iris-setosa Iris-setosa Correct
Iris-setosa Iris-setosa Correct
Iris-virginica Iris-virginica Correct
Iris-virginica Iris-versicolor Wrong
Iris-virginica Iris-virginica Correct
Iris-setosa Iris-setosa Correct
Iris-virginica Iris-virginica Correct
Iris-virginica Iris-virginica Correct
Iris-versicolor Iris-versicolor Correct
-------------------------------------------------------------------------
Confusion Matrix:
[[4 0 0]
[0 4 0]
[0 2 5]]
-------------------------------------------------------------------------
Classification Report:
precision recall f1-score support
-------------------------------------------------------------------------
Accuracy of the classifier is 0.87
-------------------------------------------------------------------------
Ex.No: 8 Implement the Bayesian Network
Aim: Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using a standard Heart Disease Data Set.
Description:
A Bayesian network is a directed acyclic graph in which each edge corresponds to a conditional dependency, and each node corresponds to a unique random variable.
Consider the following example. Suppose we attempt to turn on our computer, but the computer does not start (observation/evidence). We would like to know which of the possible causes of computer failure is more likely. In this simplified illustration, we assume only two possible causes of this misfortune: electricity failure and computer malfunction.
The goal is to calculate the posterior conditional probability distribution of each of the possible unobserved causes given the observed evidence, i.e. P[Cause | Evidence].
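This two-cause example can be expressed directly as a small pgmpy network; a sketch where the variable names and all probabilities are made up purely for illustration:

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Both causes influence whether the computer starts (state 0 = ok/starts, 1 = fail)
model = BayesianModel([('Electricity', 'Starts'), ('Malfunction', 'Starts')])
cpd_e = TabularCPD('Electricity', 2, [[0.9], [0.1]])
cpd_m = TabularCPD('Malfunction', 2, [[0.95], [0.05]])
cpd_s = TabularCPD('Starts', 2,
                   [[0.99, 0.10, 0.05, 0.01],   # P(starts | causes)
                    [0.01, 0.90, 0.95, 0.99]],  # P(does not start | causes)
                   evidence=['Electricity', 'Malfunction'], evidence_card=[2, 2])
model.add_cpds(cpd_e, cpd_m, cpd_s)

# P[Cause | Evidence]: probability of electricity failure given the computer did not start
infer = VariableElimination(model)
print(infer.query(variables=['Electricity'], evidence={'Starts': 1}))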
Algorithm:
1. Read the heart disease dataset and replace the missing values ('?') with NaN.
2. Define the structure of the Bayesian network over the relevant attributes.
3. Learn the conditional probability distributions (CPDs) from the data using Maximum Likelihood Estimation.
4. Perform inference with Variable Elimination:
a. Calculate the probability of 'heartdisease' given the evidence 'restecg=1' and display the result.
b. Calculate the probability of 'heartdisease' given the evidence 'cp=2' and display the result.
Program:
import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?',np.nan)
model = BayesianModel([('age', 'heartdisease'), ('sex', 'heartdisease'),
                       ('exang', 'heartdisease'), ('cp', 'heartdisease'),
                       ('heartdisease', 'restecg'), ('heartdisease', 'chol')])
print('\nLearning CPD using Maximum likelihood estimators')
model.fit(heartDisease,estimator=MaximumLikelihoodEstimator)
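The listing ends after fitting the model; the inference step that the algorithm describes would look like the following, using pgmpy's VariableElimination (already imported above; the variable names here are illustrative):

print('\nInferencing with Bayesian Network:')
HeartDisease_infer = VariableElimination(model)

print('\n1. Probability of heartdisease given evidence restecg=1')
q1 = HeartDisease_infer.query(variables=['heartdisease'], evidence={'restecg': 1})
print(q1)

print('\n2. Probability of heartdisease given evidence cp=2')
q2 = HeartDisease_infer.query(variables=['heartdisease'], evidence={'cp': 2})
print(q2)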
Output:
Ex.No: 9 Artificial Neural Network by implementing the Single-layer
Perceptron
Aim:
To build an Artificial Neural Network by implementing the Single-layer Perceptron and test the same using an appropriate data set.
Algorithm:
1. Import the necessary libraries, including NumPy, scikit-learn, and the Iris dataset
from scikit-learn.
2. Load the Iris dataset and convert it into a binary classification problem by setting the
target labels to 1 if the class is not 0 (versicolor or virginica) and 0 otherwise (setosa).
3. Split the dataset into training and testing sets using the train_test_split function.
4. Create a single-layer perceptron using Perceptron() from scikit-learn and train it
on the training data.
5. Make predictions on the test data using the trained perceptron.
6. Calculate the accuracy of the perceptron by comparing the predicted labels with the
true labels and print the accuracy.
Program:
# Load a dataset (in this example, we'll use the Iris dataset)
data = load_iris()
X = data.data
y = (data.target != 0).astype(int)  # Convert labels to binary (1 if not class 0, 0 otherwise)
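The listing above only loads and relabels the data; a minimal completion of steps 3-6 of the algorithm, where the split ratio and random_state are illustrative choices:

import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Split, train a single-layer perceptron, and evaluate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
perceptron = Perceptron()
perceptron.fit(X_train, y_train)
y_pred = perceptron.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))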
Output:
Accuracy: 1.0
Ex.No: 10 Artificial Neural Network by implementing the Multi-layer
Perceptron
Aim: Build an Artificial Neural Network by implementing the Multi-layer Perceptron and
test the same using appropriate data sets.
Algorithm:
1. Import the necessary libraries, including NumPy, scikit-learn, and the Iris dataset
from scikit-learn.
2. Load the Iris dataset for a multi-class classification problem, where we predict the
class of iris flowers.
3. Split the dataset into training and testing sets using the train_test_split function.
4. Create a multi-layer perceptron (MLP) using MLPClassifier from scikit-learn. In
this example, we specify two hidden layers with 100 and 50 neurons, respectively.
You can adjust the architecture of the MLP by modifying the hidden_layer_sizes
parameter.
5. Train the MLP on the training data.
6. Make predictions on the test data using the trained MLP.
7. Calculate the accuracy of the MLP by comparing the predicted labels with the true
labels and print the accuracy.
Program:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
# Load a dataset (in this example, we'll use the Iris dataset)
data = load_iris()
X = data.data
y = data.target
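The listing stops after loading the data; a completion of steps 3-7, using the two hidden layers of 100 and 50 neurons named in the algorithm (max_iter and random_state are illustrative):

# Split, train the MLP with hidden layers of 100 and 50 neurons, and evaluate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
mlp = MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=1000, random_state=42)
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))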
Output:
Accuracy: 1.0
Ex.No: 11
Radial Basis Function (RBF) network
Aim:
To build a Radial Basis Function (RBF) network with five neurons and use it to calculate a fitness function in Python.
Algorithm:
1. Define a Gaussian radial basis function rbf(x, c, s).
2. Choose the number of neurons and initialise a centroid and spread for each neuron.
3. Compute the RBF activation of every data point for every neuron.
4. Combine the activations with the output weights to obtain the fitness value and print it.
Program:
import numpy as np

# Gaussian radial basis function
def rbf(x, c, s):
    return np.exp(-np.linalg.norm(x - c) ** 2 / (2 * s ** 2))

# Sample data (replace this with your dataset)
data = np.random.rand(10, 2)

# Number of neurons
num_neurons = 5

# Centroids and spreads for each neuron (illustrative initialisation)
centroids = np.random.rand(num_neurons, 2)
spread = np.ones(num_neurons)

# Calculate the RBF activations for each data point and neuron
rbf_activations = np.zeros((len(data), num_neurons))
for i in range(len(data)):
    for j in range(num_neurons):
        rbf_activations[i, j] = rbf(data[i], centroids[j], spread[j])

# Combine activations with random output weights to obtain the fitness value
# (the weighting scheme is an assumption; the original listing omits it)
weights = np.random.rand(num_neurons)
output = np.mean(rbf_activations @ weights)
print("Fitness:", output)
Output:
Fitness: 1.0047912368185284
Ex.No: 12
Gaussian Mixture Models
Aim:
To implement a Gaussian Mixture Model to cluster a synthetic dataset using scikit-learn.
Algorithm:
1. Import the necessary libraries and generate synthetic data with make_blobs.
2. Create a GaussianMixture model with the chosen number of components.
3. Fit the model to the data.
4. Predict the cluster assignment of each point and visualise the clusters.
Program:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs
# Generate synthetic data (you can replace this with your dataset)
n_samples = 300
n_features = 2
n_clusters = 3
X, _ = make_blobs(n_samples=n_samples, n_features=n_features, centers=n_clusters, random_state=42)
# Create and fit a Gaussian Mixture Model
gmm = GaussianMixture(n_components=n_clusters, random_state=42)
gmm.fit(X)
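The model is fitted but never used; a short completion that assigns clusters and draws the scatter plot matplotlib was imported for (the plot styling is an illustrative choice):

# Predict cluster assignments and visualise them
labels = gmm.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=labels, s=20)
plt.title('Gaussian Mixture Model clustering')
plt.show()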
Output:
A scatter plot of the synthetic data points coloured by their GMM cluster assignments.
Ex.No: 13
Linear Vector Quantization
Aim:
To implement Linear Vector Quantization with hidden neurons and a learning rate.
Description
Linear Vector Quantization, more commonly called Learning Vector Quantization (LVQ), is a prototype-based learning algorithm used for vector quantization and classification. It is similar to k-means, but it uses class labels to adjust the prototypes, allowing more flexible, non-uniform quantization.
Algorithm:
1. Generate (or load) a labelled dataset X, y.
2. Initialise the neuron (prototype) vectors randomly and assign each a class label.
3. For each training sample, find the nearest neuron; move it toward the sample if their labels match and away otherwise, scaled by the learning rate.
4. Repeat for the chosen number of epochs and return the trained neurons.
Program:
import numpy as np
import random

# Generate synthetic data (you can replace this with your dataset)
def generate_data(num_samples, num_features, num_classes):
    X = np.random.rand(num_samples, num_features)
    y = np.random.randint(0, num_classes, num_samples)
    return X, y

# LVQ initialization
def initialize_lvq(X, y, num_neurons, learning_rate):
    num_features = X.shape[1]
    neurons = np.random.rand(num_neurons, num_features)
    neuron_labels = np.array([random.choice(np.unique(y)) for _ in range(num_neurons)])
    return neurons, neuron_labels
# LVQ training (the update rule below is the standard LVQ1 rule,
# which the original listing omits)
def lvq_train(X, y, neurons, neuron_labels, learning_rate, epochs):
    for epoch in range(epochs):
        for i in range(len(X)):
            x = X[i]
            label = y[i]
            # Find the nearest neuron (winner) by Euclidean distance
            winner = np.argmin(np.linalg.norm(neurons - x, axis=1))
            # Attract the winner if the labels match, repel it otherwise
            if neuron_labels[winner] == label:
                neurons[winner] += learning_rate * (x - neurons[winner])
            else:
                neurons[winner] -= learning_rate * (x - neurons[winner])
    return neurons
# Example usage
num_samples = 100
num_features = 2
num_classes = 3
num_neurons = 5
learning_rate = 0.1
epochs = 100

X, y = generate_data(num_samples, num_features, num_classes)
neurons, neuron_labels = initialize_lvq(X, y, num_neurons, learning_rate)
neurons = lvq_train(X, y, neurons, neuron_labels, learning_rate, epochs)
print("Trained neurons:\n", neurons)
print("Neuron labels:", neuron_labels)
Output: