
EXPERIMENT – 01

AIM: - To study the NumPy, Pandas and Matplotlib libraries in Python.


THEORY: -
 NumPy (Numerical Python): - NumPy is a library for numerical computing in Python. It provides support for multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays efficiently.
Key Concepts in NumPy:
i. ndarray: The central object in NumPy. It is a multi-dimensional array that allows fast
array operations.
ii. Array Creation: Learn how to create arrays using functions like np.array(), np.zeros(),
np.ones(), np.linspace(), and np.arange().
iii. Array Manipulation: Reshape, slice, and perform operations like sum, max, min, etc.
iv. Broadcasting: Learn how NumPy handles arithmetic operations on arrays of different
shapes (a short sketch follows this list).
v. Vectorization: Using array operations instead of loops for performance optimization.
vi. Mathematical Functions: Learn various mathematical operations like trigonometry,
logarithmic, exponential, etc.
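
A minimal sketch of broadcasting and vectorization (illustrative, not part of the original experiment code):

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
b = np.array([10, 20, 30])            # shape (3,)

# Broadcasting: b is stretched across each row of a
print(a + b)       # [[11 22 33] [14 25 36]]

# Vectorization: one array expression instead of a Python loop
print(np.sqrt(a))  # element-wise square root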

CODE: -
import numpy as np

# Creating an array
arr = np.array([1, 2, 3, 4])
print(arr)

# Reshaping an array
arr_reshaped = arr.reshape(2, 2)
print(arr_reshaped)

OUTPUT: -

 Pandas: - Pandas is used for data manipulation and analysis. It provides powerful data
structures like DataFrame (similar to an Excel spreadsheet) and Series (one-
dimensional data) that make it easier to handle structured data.
Key Concepts in Pandas:
 Series: One-dimensional labeled array capable of holding any data type.
 DataFrame: Two-dimensional labeled data structure with columns of potentially
different types.
 Reading/Writing Data: Learn how to read data from various formats (CSV,
Excel, SQL, JSON) and export them.
 Indexing and Selecting Data: Filter rows/columns, access subsets of data using
.loc[] and .iloc[] (a short sketch follows this list).
 Data Manipulation: Perform operations like sorting, filtering, group by, and
merging datasets.
 Missing Data Handling: Handle NaN values, fill missing data, or drop missing
rows.
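
A minimal sketch of selection and missing-data handling (the column names here are illustrative, not from the experiment):

import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'], 'Age': [28, np.nan, 35]})

# .loc selects by label/condition, .iloc by integer position
print(df.loc[df['Age'] > 25, 'Name'])  # John, Peter (the NaN row is excluded)
print(df.iloc[0:2])                    # first two rows by position

# Missing data: fill NaN with the column mean, or drop such rows entirely
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df.dropna())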

CODE: -
import pandas as pd

# Creating a DataFrame
data = {'Name': ['John', 'Anna', 'Peter'], 'Age': [28, 24, 35]}
df = pd.DataFrame(data)
print(df)

OUTPUT: -

 Matplotlib: - Matplotlib is a plotting library used for creating static, animated, and
interactive visualizations in Python. It is often used along with NumPy and Pandas to
visualize data.
Key Concepts in Matplotlib:
 Basic Plots: Learn to create line plots, scatter plots, bar charts, histograms, and
more.
 Customization: Add titles, labels, legends, and grid lines to plots.
 Subplots: Create multiple plots in a single figure using plt.subplot() (a short
sketch follows this list).
 Figure and Axes: Learn the hierarchy of Figure and Axes objects and how to
manipulate them.
 Saving Figures: Save your visualizations to image formats like PNG, PDF, etc.
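
A minimal sketch of subplots and figure saving (the file name is illustrative):

import matplotlib.pyplot as plt

# One Figure containing two Axes, side by side
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].plot([1, 2, 3, 4], [10, 20, 25, 30])
axes[0].set_title('Line')
axes[1].bar(['a', 'b', 'c'], [3, 7, 5])
axes[1].set_title('Bar')
fig.tight_layout()
fig.savefig('plots.png')  # PNG here; PDF, SVG, etc. also work
plt.show()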

CODE: -
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.plot(x, y)
plt.title('Simple Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
EXPERIMENT – 02
AIM: - To perform data preprocessing and data summarization on iris dataset
THEORY: - The Iris dataset is a popular dataset in machine learning for classification tasks,
especially when learning data preprocessing and analysis techniques. Here's a guide on how to
perform data preprocessing and data summarization on the Iris dataset using Pandas, Seaborn
and Matplotlib.
Steps Involved:
1. Loading the Iris Dataset
2. Data Preprocessing:
o Checking for missing values
o Handling missing values (if any)
o Encoding categorical variables (if needed)
3. Data Summarization:
o Descriptive statistics
o Visualizing distributions
o Understanding the relationships between features

CODE: -
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Load the iris dataset
iris = sns.load_dataset('iris')

# Data Preprocessing: check for missing values
print("Missing values per column:\n", iris.isnull().sum())

# Encoding the categorical variable
iris['species'] = iris['species'].astype('category')
iris['species_encoded'] = iris['species'].cat.codes

# Data Summarization: descriptive statistics
print("\nSummary statistics:\n", iris.describe())

# Visualize feature distributions
iris.hist(bins=20, figsize=(10, 8))
plt.show()

# Pairplot for relationships between features
sns.pairplot(iris, hue='species')
plt.show()

# Correlation matrix (numeric columns only; 'species' is categorical)
corr_matrix = iris.corr(numeric_only=True)
print("\nCorrelation matrix:\n", corr_matrix)

# Heatmap of the correlation matrix
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.show()
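
A complementary summarization step (a sketch, reusing the iris DataFrame from above): per-species feature means via groupby.

# Compact per-class summary: mean of each numeric feature by species
print(iris.groupby('species', observed=True).mean(numeric_only=True))
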
OUTPUT: -
EXPERIMENT – 03
AIM: - To perform data preprocessing and data visualization on iris dataset
THEORY: - The Iris dataset is a popular dataset in machine learning for classification tasks,
especially when learning data preprocessing and analysis techniques. Here's a guide on how to
perform data preprocessing and data visualization on the Iris dataset using Pandas, Seaborn
and scikit-learn.
Steps:
1. Loading the Iris Dataset
2. Data Preprocessing:
o Checking for missing values
o Handling missing values (if any)
o Encoding categorical variables
o Feature scaling (normalization/standardization)
3. Data Visualization:
o Histograms
o Pair plots
o Box plots
o Correlation matrix heat-map

CODE: -
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

# Load the iris dataset
iris = sns.load_dataset('iris')

# Data Preprocessing: check for missing values
print("Missing values per column:\n", iris.isnull().sum())

# Encoding the categorical variable
iris['species'] = iris['species'].astype('category')
iris['species_encoded'] = iris['species'].cat.codes

# Feature Scaling (standardize the four numeric feature columns)
scaler = StandardScaler()
iris_scaled = pd.DataFrame(scaler.fit_transform(iris.iloc[:, :-2]),
                           columns=iris.columns[:-2])

# Data Visualization: histograms for all features
iris.hist(bins=20, figsize=(10, 8))
plt.show()

# Pair plot to see relationships between features
sns.pairplot(iris, hue='species')
plt.show()

# Box plot to see distribution and outliers by species
plt.figure(figsize=(10, 8))
sns.boxplot(data=iris, x='species', y='sepal_length')
plt.show()

# Correlation matrix and heatmap (numeric feature columns only)
corr_matrix = iris.iloc[:, :-2].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.show()
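
A quick check of the scaling step (a sketch, using the iris_scaled frame from above): after standardization each feature should have mean ≈ 0 and standard deviation ≈ 1.

# Means should be ~0 and standard deviations ~1 after StandardScaler
print(iris_scaled.describe().loc[['mean', 'std']])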

OUTPUT: -
EXPERIMENT – 04
AIM: - To implement k-means clustering.
THEORY: - K-means clustering is an unsupervised learning algorithm: given unlabeled,
unclassified data, it partitions the points into k clusters, without supervision, by repeatedly
assigning each point to its nearest centroid and moving each centroid to the mean of its
assigned points.

CODE: -
# Importing necessary libraries
import numpy as np
from sklearn.cluster import KMeans

# Sample data
X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])

# Initialize the KMeans model with k=2 clusters
# (n_init set explicitly; its default changed across scikit-learn versions)
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10)

# Fit the model to the data
kmeans.fit(X)

# Get and print the cluster labels for each data point
labels = kmeans.labels_
print(labels)

# Get and print the coordinates of the cluster centroids
centroids = kmeans.cluster_centers_
print(centroids)
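
Choosing k is outside the scope of this experiment; a common heuristic is the elbow method, sketched below (reusing the X array from above) with the model's inertia_, the within-cluster sum of squared distances:

# Fit k = 1..5 and watch where the inertia stops dropping sharply
for k in range(1, 6):
    km = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    print(k, km.inertia_)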

OUTPUT: -
EXPERIMENT – 05
AIM: - To implement data classification using KNN.
THEORY: - KNN, or the k-nearest neighbor algorithm, is a machine learning algorithm that uses
proximity to make predictions: it compares a new data point with the training data it has
memorized and predicts the majority label among the point's k nearest neighbors.
KNN is also known as a "lazy learner" because it doesn't learn a discriminative function
from the training data, but instead "memorizes" it.

CODE: -

# Importing necessary libraries
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features
y = iris.target  # Labels

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Create a KNN classifier with k=4
knn = KNeighborsClassifier(n_neighbors=4)
knn.fit(X_train, y_train)  # Fit the classifier to the training data

# Make predictions
predictions = knn.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
print(f'Accuracy: {accuracy * 100:.2f}%')
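
The experiment fixes k = 4; a sketch of choosing k by cross-validation instead (cross_val_score is scikit-learn's, reusing X and y from above):

from sklearn.model_selection import cross_val_score

# 5-fold cross-validated accuracy for a few candidate values of k
for k in [1, 3, 5, 7, 9]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(k, scores.mean())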
OUTPUT: -
EXPERIMENT – 06
AIM: - To implement a decision tree using the ID3 algorithm.
THEORY: - ID3 (Iterative Dichotomiser 3) builds a decision tree top-down. At each node it
measures the entropy of the target labels, computes the information gain of every remaining
feature, splits on the feature with the highest gain, and recurses until a subset is pure or
no features remain. The key quantities, entropy and information gain, are worked through below.
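
For reference (a worked example, not in the original text): entropy is
Entropy(S) = -Σ p_i log2(p_i), and information gain is
Gain(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) Entropy(S_v).
On the Play Tennis labels used below (9 Yes, 5 No):
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940 bits.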

CODE: -
# Importing necessary libraries
import pandas as pd
import numpy as np
import pprint

class DecisionTreeID3:
    def __init__(self):
        self.tree = None

    def entropy(self, y):
        value_counts = y.value_counts(normalize=True)
        return -np.sum(value_counts * np.log2(value_counts + 1e-9))

    def information_gain(self, X, y, feature):
        parent_entropy = self.entropy(y)
        values = X[feature].unique()
        weighted_entropy = 0
        for value in values:
            subset = y[X[feature] == value]
            weighted_entropy += (len(subset) / len(y)) * self.entropy(subset)
        return parent_entropy - weighted_entropy

    def best_feature(self, X, y):
        gains = {feature: self.information_gain(X, y, feature)
                 for feature in X.columns}
        return max(gains, key=gains.get)

    def build_tree(self, X, y):
        if len(y.unique()) == 1:  # pure node: all labels agree
            return y.iloc[0]
        if X.empty:               # no features left: majority vote
            return y.mode()[0]
        best_feat = self.best_feature(X, y)
        tree = {best_feat: {}}
        for value in X[best_feat].unique():
            subset_X = X[X[best_feat] == value].drop(columns=best_feat)
            subset_y = y[X[best_feat] == value]
            tree[best_feat][value] = self.build_tree(subset_X, subset_y)
        return tree

    def fit(self, X, y):
        self.tree = self.build_tree(X, y)

    def predict_one(self, tree, instance):
        if not isinstance(tree, dict):  # reached a leaf
            return tree
        feature = next(iter(tree))
        feature_value = instance[feature]
        if feature_value in tree[feature]:
            return self.predict_one(tree[feature][feature_value], instance)
        else:
            return None  # or handle unseen feature values

    def predict(self, X):
        return X.apply(lambda row: self.predict_one(self.tree, row), axis=1)

# Create the dataset
data = {
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast',
                'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild',
                    'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal',
                 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
    'Windy': [False, True, False, False, False, True, True, False, False, False,
              True, True, False, True],
    'Play Tennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes',
                    'Yes', 'Yes', 'Yes', 'Yes', 'No']
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Split features and target
X = df.drop('Play Tennis', axis=1)
y = df['Play Tennis']

# Instantiate and fit the model
model = DecisionTreeID3()
model.fit(X, y)

# Make predictions on the training set
predictions = model.predict(X)
print("Predictions:")
print(predictions.tolist())

# Display the decision tree structure
print("\nDecision Tree Structure:")
pprint.pprint(model.tree)

OUTPUT: -
EXPERIMENT – 07
AIM: - To implement a decision tree using the CART algorithm.
THEORY: - Classification and Regression Trees (CART) is a decision tree algorithm used
for both classification and regression tasks. It is a supervised learning algorithm that
learns from labeled data to predict unseen data; splits are scored with Gini impurity
(a worked example follows the list below).
The CART algorithm works via the following process:
 The best-split point of each input is obtained.
 Based on the best-split points of each input in Step 1, the new “best” split point
is identified.
 Split the chosen input according to the “best” split point.
 Continue splitting until a stopping rule is satisfied or no further desirable
splitting is available.
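
For reference (a worked example, not in the original text): CART scores candidate splits
with Gini impurity, Gini(S) = 1 - Σ p_i². On the Play Tennis labels used below
(9 Yes, 5 No): Gini(S) = 1 - (9/14)² - (5/14)² ≈ 0.459; the best split is the one that
most reduces the weighted Gini of the child nodes.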

CODE: -
# Importing necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import load_iris
from sklearn import tree
from sklearn import metrics
import matplotlib.pyplot as plt

# Load dataset (using the iris dataset as an example)
data = load_iris()
X = data.data
y = data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Create the Decision Tree Classifier (Gini criterion, as in CART)
clf = DecisionTreeClassifier(criterion='gini', random_state=42)

clf.fit(X_train, y_train)     # Train the model
y_pred = clf.predict(X_test)  # Make predictions

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Classification Report:\n", report)

# Sample PlayTennis dataset
data = {
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast',
                'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild',
                    'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal',
                 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
    'Windy': [False, True, False, False, False, True, True, False, False, False,
              True, True, False, True],
    'Play Tennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes',
                    'Yes', 'Yes', 'Yes', 'Yes', 'No']
}

# Convert to pandas DataFrame
df = pd.DataFrame(data)

# Convert categorical features into numeric codes for scikit-learn
df_encoded = df.apply(lambda col: col.astype('category').cat.codes)

# Features (Outlook, Temperature, Humidity, Windy)
X = df_encoded[['Outlook', 'Temperature', 'Humidity', 'Windy']]

# Target (Play Tennis)
y = df_encoded['Play Tennis']

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# Initialize the DecisionTreeClassifier with the Gini criterion (CART)
clf = DecisionTreeClassifier(criterion='gini')

clf.fit(X_train, y_train)     # Train the classifier
y_pred = clf.predict(X_test)  # Predict on the test set

# Evaluate accuracy
accuracy = metrics.accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Visualize the decision tree
plt.figure(figsize=(12, 8))
tree.plot_tree(clf, filled=True,
               feature_names=['Outlook', 'Temperature', 'Humidity', 'Windy'],
               class_names=['No', 'Yes'])
plt.show()
OUTPUT: -

VISUALIZATION
EXPERIMENT – 08
AIM: - To implement a decision tree using the C4.5 algorithm.
THEORY: - The C4.5 algorithm is a decision tree classifier used in data mining to generate
decisions from a sample of data (univariate or multivariate predictors). It extends ID3 by
selecting splits with the gain ratio (information gain normalized by the split's intrinsic
information) and by supporting continuous attributes and pruning.
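
For reference (not in the original text): C4.5 ranks splits by the gain ratio,
GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A), where
SplitInfo(S, A) = -Σ_v (|S_v| / |S|) log2(|S_v| / |S|).
Note that scikit-learn does not implement C4.5 itself; DecisionTreeClassifier with
criterion='entropy', used below, gives C4.5-like behavior based on plain information gain.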

CODE: -
# Import required libraries
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn import datasets
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

# Load a dataset (we'll use the famous iris dataset for simplicity)
iris = datasets.load_iris()
X = iris.data    # features
y = iris.target  # target (labels)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# Initialize the Decision Tree Classifier with the 'entropy' criterion
# (for C4.5-like behavior)
clf = DecisionTreeClassifier(criterion='entropy')

# Train the classifier
clf.fit(X_train, y_train)

# Make predictions on the test data
y_pred = clf.predict(X_test)

# Evaluate the accuracy of the classifier
accuracy = metrics.accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Sample PlayTennis dataset
data = {
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast',
                'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'],
    'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild',
                    'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
    'Humidity': ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal',
                 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
    'Windy': [False, True, False, False, False, True, True, False, False, False,
              True, True, False, True],
    'Play Tennis': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes',
                    'Yes', 'Yes', 'Yes', 'Yes', 'No']
}

# Convert to pandas DataFrame
df = pd.DataFrame(data)

# Convert categorical features into numeric codes for scikit-learn
df_encoded = df.apply(lambda col: col.astype('category').cat.codes)

# Features (Outlook, Temperature, Humidity, Windy)
X = df_encoded[['Outlook', 'Temperature', 'Humidity', 'Windy']]

# Target (Play Tennis)
y = df_encoded['Play Tennis']

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# Initialize the DecisionTreeClassifier with the entropy criterion
clf = DecisionTreeClassifier(criterion='entropy')

# Train the classifier
clf.fit(X_train, y_train)

# Predict on the test set
y_pred = clf.predict(X_test)

# Evaluate accuracy
accuracy = metrics.accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Visualize the decision tree
plt.figure(figsize=(12, 8))
plot_tree(clf, filled=True,
          feature_names=['Outlook', 'Temperature', 'Humidity', 'Windy'],
          class_names=['No', 'Yes'])
plt.show()
OUTPUT: -

VISUALIZATION
EXPERIMENT – 09
AIM: - To implement a multi-layer neural network.
THEORY: - A multi-layer neural network (multi-layer perceptron) stacks linear layers with
non-linear activation functions (here ReLU) between them. A forward pass transforms the
input layer by layer into class scores, a loss function compares those scores with the true
labels, and backpropagation computes the gradients that an optimizer (here Adam) uses to
update the weights.

CODE: -
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Generate a sample dataset (a binary classification dataset)
X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# Normalize the data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert to PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.long)
y_test = torch.tensor(y_test, dtype=torch.long)

# Define the multi-layer neural network
class MultiLayerNN(nn.Module):
    def __init__(self):
        super(MultiLayerNN, self).__init__()
        # Input layer (2 inputs), two hidden layers (16 neurons each),
        # and an output layer (2 classes)
        self.fc1 = nn.Linear(2, 16)   # Layer 1
        self.fc2 = nn.Linear(16, 16)  # Layer 2
        self.fc3 = nn.Linear(16, 2)   # Output layer

    def forward(self, x):
        x = F.relu(self.fc1(x))  # Apply ReLU activation to Layer 1
        x = F.relu(self.fc2(x))  # Apply ReLU to Layer 2
        x = self.fc3(x)          # No activation on the output layer (logits)
        return x

# Initialize the network, define the loss function and the optimizer
model = MultiLayerNN()
criterion = nn.CrossEntropyLoss()  # For multi-class classification
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training the neural network
num_epochs = 100
for epoch in range(num_epochs):
    model.train()  # Set the model to training mode

    # Forward pass: compute predicted labels
    outputs = model(X_train)
    loss = criterion(outputs, y_train)

    # Backward pass: compute the gradients
    optimizer.zero_grad()
    loss.backward()

    # Update the weights
    optimizer.step()

    # Print the loss every 10 epochs
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# Evaluating the model
model.eval()  # Set the model to evaluation mode
with torch.no_grad():
    # Predictions on the test set
    outputs = model(X_test)
    _, predicted = torch.max(outputs, 1)

# Calculate the accuracy
accuracy = accuracy_score(y_test.numpy(), predicted.numpy())
print(f'Accuracy: {accuracy * 100:.2f}%')

# A deeper variant of the network (renamed here to avoid shadowing the
# class trained above; defined but not trained in this experiment)
class DeeperMultiLayerNN(nn.Module):
    def __init__(self):
        super(DeeperMultiLayerNN, self).__init__()
        # Input layer (2 inputs), three hidden layers (16, 32, 16 neurons),
        # and an output layer (2 classes)
        self.fc1 = nn.Linear(2, 16)
        self.fc2 = nn.Linear(16, 32)  # New hidden layer with 32 neurons
        self.fc3 = nn.Linear(32, 16)
        self.fc4 = nn.Linear(16, 2)   # Output layer

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))  # New hidden layer
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x
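
A usage sketch for the deeper variant (assuming the tensors, loss and training loop from above; not part of the original run):

# Train the deeper network with the same loop structure as before
deep_model = DeeperMultiLayerNN()
optimizer = optim.Adam(deep_model.parameters(), lr=0.01)
# ...then re-run the training loop above with deep_model in place of model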

OUTPUT: -
