To Study the NumPy, Pandas and Matplotlib Libraries in Python
CODE: -
import numpy as np
# Creating an array
arr = np.array([1, 2, 3, 4])
print(arr)
# Reshaping an array
arr_reshaped = arr.reshape(2, 2)
print(arr_reshaped)
OUTPUT: -
Pandas: - Pandas is used for data manipulation and analysis. It provides powerful data
structures like DataFrame (similar to an Excel spreadsheet) and Series
(one-dimensional data) that make it easier to handle structured data.
Key Concepts in Pandas:
Series: One-dimensional labeled array capable of holding any data type.
DataFrame: Two-dimensional labeled data structure with columns of potentially
different types.
Reading/Writing Data: Learn how to read data from various formats (CSV,
Excel, SQL, JSON) and export them.
Indexing and Selecting Data: Filter rows/columns, access subsets of data using
.loc[] and .iloc[].
Data Manipulation: Perform operations like sorting, filtering, group by, and
merging datasets.
Missing Data Handling: Handle NaN values, fill missing data, or drop missing
rows.
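The concepts listed above can be tried on a small frame. The sketch below is illustrative only: the names, cities and ages are made-up sample values, and one age is left missing on purpose so that the missing-data calls have something to act on.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; one Age is left missing on purpose
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'City': ['Paris', 'Berlin', 'Paris', 'Berlin'],
    'Age': [28, 24, np.nan, 35],
})

# Series: a single column of the DataFrame
ages = df['Age']

# Label-based selection with .loc: rows where City is Paris, two columns
paris = df.loc[df['City'] == 'Paris', ['Name', 'Age']]

# Position-based selection with .iloc: first two rows, first column
first_names = df.iloc[:2, 0]

# Missing data: count NaN values, then fill them with the column mean
missing_count = int(df['Age'].isna().sum())
df_filled = df.fillna({'Age': df['Age'].mean()})

# Sorting and group-by aggregation
by_age = df_filled.sort_values('Age', ascending=False)
mean_age_by_city = df_filled.groupby('City')['Age'].mean()

print(paris)
print(by_age)
print(mean_age_by_city)
```

Filling with the mean is only one option; dropping incomplete rows with `dropna()` is the other approach named in the list.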
CODE: -
import pandas as pd
# Creating a DataFrame
data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]}
df = pd.DataFrame(data)
print(df)
OUTPUT: -
Matplotlib: - Matplotlib is a plotting library used for creating static, animated, and
interactive visualizations in Python. It is often used along with NumPy and Pandas to
visualize data.
Key Concepts in Matplotlib:
Basic Plots: Learn to create line plots, scatter plots, bar charts, histograms, and
more.
Customization: Add titles, labels, legends, and grid lines to plots.
Subplots: Create multiple plots in a single figure using plt.subplot().
Figure and Axes: Learn the hierarchy of Figure and Axes objects and how to
manipulate them.
Saving Figures: Save your visualizations to image formats like PNG, PDF, etc.
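As a rough sketch of the Figure and Axes hierarchy, the example below puts two plots in one figure and saves it. It uses `plt.subplots()`, the object-oriented counterpart of `plt.subplot()`; the output file name and the non-interactive `Agg` backend are choices made for this example only.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

# One Figure holding two Axes, side by side
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Left Axes: line plot with title, labels, legend and grid
axes[0].plot(x, y, label='y values')
axes[0].set_title('Line Plot')
axes[0].set_xlabel('X-axis')
axes[0].set_ylabel('Y-axis')
axes[0].legend()
axes[0].grid(True)

# Right Axes: bar chart of the same data
axes[1].bar(x, y)
axes[1].set_title('Bar Chart')

fig.tight_layout()
# Save the figure; the file name is arbitrary
fig.savefig('example_plots.png')
```

Changing the extension in `savefig` (for example to `.pdf`) selects a different output format.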
CODE: -
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y)
plt.title('Simple Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
EXPERIMENT – 02
AIM: - To perform data preprocessing and data summarization on the Iris dataset.
THEORY: - The Iris dataset is a popular dataset in machine learning for classification tasks,
especially when learning data preprocessing and analysis techniques. Here’s a guide on how to
perform data preprocessing and data summarization on the Iris dataset using Pandas and
NumPy.
Steps Involved:
1. Loading the Iris Dataset
2. Data Preprocessing:
   - Checking for missing values
   - Handling missing values (if any)
   - Encoding categorical variables (if needed)
3. Data Summarization:
   - Descriptive statistics
   - Visualizing distributions
   - Understanding the relationships between features
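The preprocessing and summarization steps above can be sketched with Pandas alone. To keep the example self-contained it uses six rows typed by hand rather than the full 150-row dataset, and the column names are an assumption (they follow the common snake_case naming). Plotting is left out here.

```python
import pandas as pd

# Six rows typed by hand for illustration; not the full dataset
iris_sample = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepal_width': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'petal_length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    'petal_width': [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    'species': ['setosa', 'setosa', 'versicolor', 'versicolor', 'virginica', 'virginica'],
})
numeric_cols = list(iris_sample.columns[:-1])

# Step 2: preprocessing
# Check for missing values in every column
missing_per_column = iris_sample.isnull().sum()
# Fill missing numeric values with the column median (a no-op when nothing is missing)
iris_sample[numeric_cols] = iris_sample[numeric_cols].fillna(
    iris_sample[numeric_cols].median())
# Encode the categorical species column as integer codes
iris_sample['species_code'] = iris_sample['species'].astype('category').cat.codes

# Step 3: summarization
# Descriptive statistics for the numeric features
summary = iris_sample[numeric_cols].describe()
# Mean of each feature per species
species_means = iris_sample.groupby('species')[numeric_cols].mean()
# Pairwise correlations show relationships between features
correlations = iris_sample[numeric_cols].corr()

print(missing_per_column)
print(summary)
print(species_means)
print(correlations)
```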
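CODE: -
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
# Load the Iris dataset (assumed here to be seaborn's copy, which is downloaded on first use)
iris = sns.load_dataset('iris')
# Check for missing values and print descriptive statistics
print(iris.isnull().sum())
print(iris.describe())
# Feature Scaling: standardize every column except the last one ('species')
scaler = StandardScaler()
iris_scaled = pd.DataFrame(scaler.fit_transform(iris.iloc[:, :-1]),
                           columns=iris.columns[:-1])
print(iris_scaled.head())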
OUTPUT: -
EXPERIMENT – 04
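AIM: - To implement k-means clustering.
THEORY: - K-means clustering is an unsupervised learning algorithm: it operates on unlabeled,
unclassified data without supervision. It partitions the data into k clusters by assigning each
point to its nearest cluster centroid and then recomputing each centroid as the mean of its points.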
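To make the mechanics concrete, here is a minimal NumPy sketch of the standard k-means iteration (Lloyd's algorithm): assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The function name `kmeans`, the random initialisation from data points and the choice `k=2` are assumptions made for this example; it is not how scikit-learn's `KMeans` is implemented internally.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: index of the nearest centroid for every point
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```

The numeric label given to each cluster depends on the random start, so only the grouping of points is meaningful, not the label values themselves.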
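CODE: -
# Importing necessary libraries
import numpy as np
from sklearn.cluster import KMeans
# Sample data
X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])
# Fit k-means (2 clusters is an assumed choice; the original value is not shown)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)
# Cluster label of each point and the final centroids
print(kmeans.labels_)
print(kmeans.cluster_centers_)
OUTPUT: -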
EXPERIMENT – 05
AIM: - To implement data classification using KNN.
THEORY: - KNN, or the k-nearest neighbor algorithm, is a machine learning algorithm that uses
proximity to compare one data point with a set of data it was trained on and has memorized to
make predictions.
KNN is also known as a "lazy learner" because it doesn't learn a discriminative function
from the training data, but instead "memorizes" it.
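The "lazy learner" idea can be shown in a few lines of plain Python: nothing is fitted ahead of time, and all the work happens when a query arrives. The function name `knn_predict`, the sample points and the choice `k=3` are assumptions made for this sketch.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Training" is just keeping the data; distances are computed at query time
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points in two classes
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ['A', 'A', 'A', 'B', 'B', 'B']

print(knn_predict(train_points, train_labels, (1.2, 1.4)))  # A
print(knn_predict(train_points, train_labels, (6.8, 6.2)))  # B
```

An odd `k` is a common choice for two-class problems because it avoids tied votes.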
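CODE: -
# Implementing KNN
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load and split the data (the Iris dataset and the 80/20 split are assumed choices)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the classifier (k = 3 is an assumed choice)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
# Make predictions
predictions = knn.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print(f'Accuracy: {accuracy * 100:.2f}%')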
OUTPUT: -
EXPERIMENT – 06
AIM: - To implement a decision tree using the ID3 algorithm.
THEORY: - ID3 (Iterative Dichotomiser 3) builds a decision tree top-down. At each node it
computes the entropy of the class labels, selects the attribute with the highest information
gain (the largest reduction in entropy), and creates one branch per value of that attribute.
The process repeats on each branch until all samples at a node share one class or no
attributes remain.
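ID3, the algorithm named in the aim, picks the attribute to split on by information gain: the entropy of the labels minus the weighted entropy that remains after the split. The sketch below computes both quantities in plain Python; the toy weather data and the helper names are assumptions made for illustration.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for fv, lab in zip(feature_values, labels) if fv == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data
outlook = ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Overcast']
windy = ['No', 'Yes', 'No', 'No', 'Yes', 'Yes']
play = ['No', 'No', 'Yes', 'Yes', 'No', 'Yes']

gains = {
    'Outlook': information_gain(outlook, play),
    'Windy': information_gain(windy, play),
}
# ID3 would split on the feature with the largest gain
best_feature = max(gains, key=gains.get)
print(gains)
print(best_feature)
```

On this toy data the labels are evenly split, so the starting entropy is 1 bit, and `Outlook` removes two thirds of it while `Windy` removes very little.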
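CODE: -
# Importing necessary libraries
import pandas as pd
import numpy as np
class DecisionTreeID3:
    def __init__(self):
        self.tree = None
    def entropy(self, y):
        value_counts = y.value_counts(normalize=True)
        return -np.sum(value_counts * np.log2(value_counts + 1e-9))
    def information_gain(self, X, y, feature):
        # Parent entropy minus the size-weighted entropy of each subset
        return self.entropy(y) - sum(len(s) / len(y) * self.entropy(s) for _, s in y.groupby(X[feature]))
    def build(self, X, y):
        if y.nunique() == 1 or X.shape[1] == 0:
            return y.mode()[0]  # leaf: majority class
        best = max(X.columns, key=lambda f: self.information_gain(X, y, f))
        return {best: {v: self.build(X[X[best] == v].drop(columns=best), y[X[best] == v])
                       for v in X[best].unique()}}
    def fit(self, X, y):
        self.tree = self.build(X, y)
    def predict_one(self, tree, instance):
        if not isinstance(tree, dict):
            return tree  # leaf: class label
        feature = next(iter(tree))
        feature_value = instance[feature]
        if feature_value in tree[feature]:
            return self.predict_one(tree[feature][feature_value], instance)
        else:
            return None  # or handle unseen feature values
    def predict(self, X):
        return X.apply(lambda row: self.predict_one(self.tree, row), axis=1)
# Toy data, assumed for illustration (the original dataset is not shown), converted to a DataFrame
data = {'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Overcast'], 'Windy': ['No', 'Yes', 'No', 'No', 'Yes', 'Yes'], 'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'No', 'Yes']}
df = pd.DataFrame(data)
X, y = df.drop(columns='PlayTennis'), df['PlayTennis']
model = DecisionTreeID3()
model.fit(X, y)
predictions = model.predict(X)
# Display predictions
print("Predictions:")
print(predictions.tolist())
OUTPUT: -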
EXPERIMENT – 07
AIM: - To implement a decision tree using the CART algorithm.
THEORY: - Classification and Regression Trees (CART) is a decision tree algorithm that is used
for both classification and regression tasks. It is a supervised learning algorithm that learns
from labeled data to predict unseen data.
The CART algorithm works via the following process:
1. The best split point of each input is obtained.
2. Based on the best split points of each input in Step 1, the new “best” split point
is identified.
3. Split the chosen input according to the “best” split point.
4. Continue splitting until a stopping rule is satisfied or no further desirable
splitting is available.
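The first three steps of this process can be sketched in plain Python for numeric inputs. The sketch assumes the Gini impurity as the split-quality measure (the usual choice for CART classification) and uses midpoints between sorted values as candidate thresholds; the function names and the sample rows are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split_for_feature(values, labels):
    """Step 1: best threshold for one numeric input, as (weighted Gini, threshold)."""
    best = (float('inf'), None)
    distinct = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[0]:
            best = (score, threshold)
    return best

def best_split(rows, labels):
    """Step 2: compare the per-input winners; returns (input index, threshold, score)."""
    candidates = []
    for j in range(len(rows[0])):
        score, threshold = best_split_for_feature([row[j] for row in rows], labels)
        candidates.append((score, j, threshold))
    score, j, threshold = min(candidates)
    return j, threshold, score

# Hypothetical data: two numeric inputs, two classes
rows = [(2.0, 7.0), (3.0, 1.0), (4.0, 6.0), (6.0, 2.0), (7.0, 8.0), (8.0, 3.0)]
labels = ['A', 'A', 'A', 'B', 'B', 'B']

feature, threshold, score = best_split(rows, labels)
# Step 3: split the data on the chosen input and threshold
left_rows = [row for row in rows if row[feature] <= threshold]
right_rows = [row for row in rows if row[feature] > threshold]
print(feature, threshold, score)
```

A full tree would apply `best_split` recursively to `left_rows` and `right_rows` until a stopping rule such as a maximum depth or a pure node is reached.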
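CODE: -
# Importing necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
from sklearn import metrics
import matplotlib.pyplot as plt
# Toy PlayTennis data, assumed for illustration (the original dataset is not shown)
df = pd.DataFrame({
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast', 'Sunny', 'Sunny', 'Rain'],
    'Wind': ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Weak', 'Weak'],
    'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes']})
# One-hot encode the categorical features and map the target to 0/1
df_encoded = pd.get_dummies(df.drop(columns='PlayTennis'))
df_encoded['PlayTennis'] = (df['PlayTennis'] == 'Yes').astype(int)
# Features
X = df_encoded.drop(columns='PlayTennis')
# Target (PlayTennis)
y = df_encoded['PlayTennis']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train the tree with the Gini criterion used by CART
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
# Evaluate accuracy
accuracy = metrics.accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
# Plot the fitted tree
tree.plot_tree(clf, feature_names=list(X.columns), filled=True)
plt.show()
VISUALIZATION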
EXPERIMENT – 08
AIM: - To implement a decision tree using the C4.5 algorithm.
THEORY: - The C4.5 algorithm is used in data mining as a decision tree classifier that generates
a decision from a sample of data (univariate or multivariate predictors). It extends ID3 by
selecting splits with the gain ratio instead of plain information gain and by handling
continuous attributes and missing values.
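C4.5 ranks candidate splits by gain ratio, which divides the information gain by the entropy of the split itself so that attributes with many distinct values are not favoured automatically. The following is a minimal sketch for categorical attributes; the helper names and the toy data, including the deliberately useless `row_id` attribute, are assumptions made for illustration.

```python
from collections import Counter
import math

def entropy(items):
    """Shannon entropy (in bits) of the value distribution in a list."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())

def information_gain(feature_values, labels):
    """Label entropy minus the weighted label entropy after the split."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for fv, lab in zip(feature_values, labels) if fv == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def gain_ratio(feature_values, labels):
    """Information gain divided by split information (entropy of the partition sizes)."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0  # the feature has a single value and cannot split the data
    return information_gain(feature_values, labels) / split_info

# Hypothetical toy data; row_id is unique per row, so it splits the data perfectly but uselessly
row_id = [1, 2, 3, 4, 5, 6]
outlook = ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Overcast']
play = ['No', 'No', 'Yes', 'Yes', 'No', 'Yes']

print(information_gain(row_id, play), information_gain(outlook, play))
print(gain_ratio(row_id, play), gain_ratio(outlook, play))
```

On this data `row_id` has the higher raw information gain, but `outlook` has the higher gain ratio, which is the behaviour the normalisation is meant to produce.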
CODE: -
# Import required libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree
# Load a dataset (we'll use the famous iris dataset for simplicity)
iris = datasets.load_iris()
X = iris.data # features
y = iris.target # target (labels)
# Split into training and test sets (the 70/30 split is an assumed choice)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize the Decision Tree Classifier with 'entropy' criterion (for C4.5-like behavior)
clf = DecisionTreeClassifier(criterion='entropy', random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
# Evaluate accuracy
accuracy = metrics.accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
# Plot the fitted tree
plot_tree(clf, feature_names=iris.feature_names, class_names=list(iris.target_names), filled=True)
plt.show()
VISUALIZATION
EXPERIMENT – 09
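AIM: - To implement a multi-layer neural network.
THEORY: - A multi-layer neural network (multi-layer perceptron) consists of an input layer, one
or more hidden layers and an output layer of fully connected neurons. Each hidden layer applies
a weighted sum followed by a non-linear activation function, and the weights are learned by
backpropagation, which uses gradient descent to minimize a loss function.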
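For the multi-layer network named in the aim, the forward pass can be sketched in plain NumPy. The layer sizes follow this experiment's code (2 inputs, hidden layers of 16, 32 and 16 units, 2 output classes); the ReLU and softmax activations, the random untrained weights and the sample inputs are assumptions made for this sketch, and no training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Layer sizes: 2 inputs, hidden layers of 16, 32 and 16 units, 2 output classes
sizes = [2, 16, 32, 16, 2]
# Randomly initialised, untrained weights and zero biases
weights = [rng.normal(0.0, 0.5, size=(m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Forward pass: weighted sum + ReLU on hidden layers, softmax on the output."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    return softmax(a @ weights[-1] + biases[-1])

batch = np.array([[0.5, -1.0], [2.0, 0.3], [-0.7, 0.9]])  # hypothetical inputs
probs = forward(batch)
print(probs.shape)
print(probs.sum(axis=1))
```

Each row of `probs` is a probability distribution over the two classes; training would adjust `weights` and `biases` so that these probabilities match the labels.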
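CODE: -
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
# Generate and prepare the data (sample size, noise and split are assumed values)
X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = torch.tensor(scaler.fit_transform(X_train), dtype=torch.float32)
X_test = torch.tensor(scaler.transform(X_test), dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
class MultiLayerNN(nn.Module):
    def __init__(self):
        super(MultiLayerNN, self).__init__()
        # Input layer (2 inputs), three hidden layers (16, 32, 16 neurons), and output layer (2 classes)
        self.fc1 = nn.Linear(2, 16)
        self.fc2 = nn.Linear(16, 32) # New hidden layer with 32 neurons
        self.fc3 = nn.Linear(32, 16)
        self.fc4 = nn.Linear(16, 2) # Output layer
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        return self.fc4(x)
model = MultiLayerNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)
# Train the network (epoch count and learning rate are assumed values)
for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)
    loss.backward()
    optimizer.step()
with torch.no_grad():
    outputs = model(X_test)
_, predicted = torch.max(outputs, 1)
# Calculate the accuracy
accuracy = accuracy_score(y_test, predicted.numpy())
print(f'Accuracy: {accuracy * 100:.2f}%')
OUTPUT: -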