DATA SCIENCE LAB
KARAMADAI, COIMBATORE
DEPARTMENT OF
Name :
Register Number :

EX. NO. | DATE | NAME OF THE EXPERIMENT | PAGE NO. | MARK | SIGN
5 | | Reading data from text files, Excel and the web and exploring various commands for doing descriptive analytics on the IRIS dataset | | |
AIM:
To download, install and explore the features of the NumPy, SciPy, Jupyter, Statsmodels and pandas packages.
ALGORITHM:
Step 1: Go to the command prompt.
Step 2: Type pip install numpy.
Step 3: The NumPy package gets installed.
Step 4: Type pip install scipy; the SciPy package gets installed.
Step 5: Type pip install jupyter; the Jupyter packages get installed.
Step 6: Type pip install statsmodels; the statsmodels package gets installed.
Step 7: Type pip install pandas; the pandas package gets installed.
INSTALLATION PROCESS:
NumPy installation: pip install numpy
SciPy installation: pip install scipy
Jupyter installation: pip install jupyter
Statsmodels installation: pip install statsmodels
Pandas installation: pip install pandas
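A simple way to explore whether the installations worked is to import each package and print its version (Jupyter can be checked separately with jupyter --version at the command prompt). A minimal check of this kind is shown below; the version numbers printed will depend on your environment.

import numpy, scipy, pandas, statsmodels

# print the installed version of each package
print("NumPy:", numpy.__version__)
print("SciPy:", scipy.__version__)
print("pandas:", pandas.__version__)
print("statsmodels:", statsmodels.__version__)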
RESULT:
Thus the NumPy, SciPy, Jupyter, Statsmodels and pandas packages were downloaded, installed and explored successfully.
AIM:
To write a Python code to implement the concept of NumPy arrays.
ALGORITHM:
Step 1: Import the numpy library.
Step 2: Create 1-D and 2-D NumPy arrays and an array of random values.
Step 3: Perform element-wise multiplication and matrix multiplication.
Step 4: Compute aggregate values (sum, mean, maximum, minimum) of an array.
Step 5: Print the results.
PROGRAM:
import numpy as np
# create a 1-D NumPy array
a = np.array([1, 2, 3, 4, 5])
# create a 2-D NumPy array
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# create an array of 5 random values in [0, 1)
c = np.random.rand(5)
# element-wise product and matrix product
g = np.array([1, 2, 3]) * np.array([4, 5, 6])
h = np.array([[1, 2], [3, 4]]) @ np.array([[5, 6], [7, 8]])
# aggregates of a 1-D array
i = np.array([1, 2, 3, 4, 5, 6])
# print the results
print("1-D Array:"); print(a)
print("2-D Array:"); print(b)
print("Random Array:"); print(c)
print("Element-wise multiplication:"); print(g)
print("Matrix multiplication:"); print(h)
print(i.sum()); print(i.mean()); print(i.max()); print(i.min())
OUTPUT:
1-D Array:
[1 2 3 4 5]
2-D Array:
[[1 2 3]
[4 5 6]
[7 8 9]]
Random Array:
[0.21995867 0.92288075 0.69384057 0.7043604 0.80637838]
Element-wise multiplication:
[ 4 10 18]
Matrix multiplication:
[[19 22]
 [43 50]]
21
3.5
6
1
RESULT:
Thus the program working with NumPy arrays was executed successfully.
AIM:
To write a Python code to work with pandas DataFrames.
ALGORITHM:
Step 1: Import the pandas library.
Step 2: Load data into a DataFrame.
Step 3: Explore the DataFrame.
Step 4: Select data.
Step 5: Manipulate data.
Step 6: Clean data.
Step 7: Save the modified DataFrame.
PROGRAM:
import pandas as pd
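Only the import statement of this program survives in the record. The following is a minimal sketch reconstructed from the OUTPUT shown below: the DataFrame contents are taken from that output, while the specific calls (set_index, sort_index, to_csv and the file name 'output.csv') are assumptions.

# create a DataFrame (values taken from the output shown below)
df = pd.DataFrame({'Name': ['John', 'Jane', 'Bob', 'Mary'],
                   'Age': [30, 25, 40, 35]})
print(df)  # explore the DataFrame

# manipulate data: add a Salary column, then select and display data
df['Salary'] = [50000, 60000, 70000, 80000]
print(df.set_index('Name').sort_index())

# a second DataFrame used later in the output
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
                       'Age': [25, 30, 35, 40, 45],
                       'Gender': ['Female', 'Male', 'Male', 'Male', 'Female']})
print(people)

# save the modified DataFrame (file name assumed)
df.to_csv('output.csv', index=False)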
OUTPUT:
Name Age
0 John 30
1 Jane 25
2 Bob 40
3 Mary 35
Age Salary
Name
Bob 40 70000
Jane 25 60000
John 30 50000
Mary 35 80000
Name Age Gender
0 Alice 25 Female
1 Bob 30 Male
2 Charlie 35 Male
3 David 40 Male
4 Eve 45 Female
RESULT:
Thus the program working with the pandas DataFrame was executed successfully.
AIM:
To write a Python code to implement data sampling methods.
ALGORITHM:
Step 1: Import pandas, numpy and random.
Step 2: Generate a sample dataset with two columns, 'A' and 'B'.
Step 3: Implement four different sampling methods (as in the program below):
Simple random sampling: use the sample() method from pandas to randomly select 30 rows from the dataset.
Systematic sampling: select every 10th row of the dataset using Python's slicing syntax.
Stratified sampling: group the dataset by quantiles of the 'B' column using the groupby() method from pandas, then select a random sample of 20% of the rows from each group using the sample() method.
Cluster sampling: randomly select 5 clusters of values from the 'A' column and include all rows with those values (using isin() in the program below).
Step 4: Output the results of each sampling method using the print() function.
PROGRAM:
import random
import numpy as np
import pandas as pd
# generate a sample dataset with two columns, 'A' and 'B'
data = pd.DataFrame({'A': range(1, 101), 'B': np.random.rand(100)})
# Simple random sampling: 30 randomly chosen rows
random_sample = data.sample(n=30)
# Systematic sampling: every 10th row
systematic_sample = data.iloc[::10, :]
# Stratified sampling: 20% of the rows from each tercile of 'B'
strata = data.groupby(pd.qcut(data['B'], 3))
stratified_sample = strata.apply(lambda x: x.sample(frac=0.2))
# Cluster sampling: all rows whose 'A' value falls in 5 randomly chosen clusters
clusters = random.sample(list(data['A'].unique()), 5)
cluster_sample = data[data['A'].isin(clusters)]
print("Simple random sample:\n", random_sample)
print("Systematic sample:\n", systematic_sample)
print("Stratified sample:\n", stratified_sample)
print("Cluster sample:\n", cluster_sample)
OUTPUT:
Simple random sample:
A B
79 80 0.878277
40 41 0.639264
57 58 0.897447
58 59 0.600354
13 14 0.661578
95 96 0.246993
7 8 0.934867
94 95 0.812213
24 25 0.837017
49 50 0.186842
0 1 0.940231
42 43 0.394464
33 34 0.793838
60 61 0.181043
54 55 0.190086
56 57 0.773640
74 75 0.228341
4 5 0.514767
34 35 0.640982
87 88 0.102709
53 54 0.594242
23 24 0.689938
72 73 0.800255
52 53 0.898425
65 66 0.530389
61 62 0.322569
77 78 0.029112
80 81 0.596407
35 36 0.699136
99 100 0.637643
Systematic sample:
A B
0 1 0.940231
10 11 0.721191
20 21 0.242574
30 31 0.275564
40 41 0.639264
50 51 0.663985
60 61 0.181043
70 71 0.409256
80 81 0.596407
90 91 0.133356
Stratified sample:
A B
B
(0.012400000000000001, 0.334] 60 61 0.181043
69 70 0.177240
96 97 0.124787
87 88 0.102709
19 20 0.122518
31 32 0.118424
93 94 0.152851
(0.334, 0.641] 42 43 0.394464
80 81 0.596407
14 15 0.444850
83 84 0.633580
75 76 0.475987
82 83 0.416136
66 67 0.340407
(0.641, 0.967] 44 45 0.814840
28 29 0.836442
46 47 0.680723
32 33 0.653128
57 58 0.897447
86 87 0.837541
10 11 0.721191
Cluster sample:
A B
2 3 0.621991
3 4 0.576675
4 5 0.514767
6 7 0.308789
9 10 0.013366
RESULT:
Thus the implementation of the sampling methods was executed successfully.
EX: NO: 5
DATE:
READING DATA FROM TEXT FILES, EXCEL AND THE WEB AND EXPLORING VARIOUS COMMANDS FOR DOING DESCRIPTIVE ANALYTICS ON THE IRIS DATA SET
AIM:
To read data from text files, Excel and the web and to explore various commands for doing descriptive analytics on the Iris data set.
ALGORITHM:
Step 1: Import the pandas library as pd and the requests library.
Step 2: From the io library, import the BytesIO function.
Step 3: Read data from a text file called iris.txt using the pd.read_csv() function. Assign
the resulting DataFrame to iris_txt. The file has no header row, so header=None is passed
as an argument. The column names are specified as a list of strings using the names
argument.
Step 4: Read data from an Excel file called iris.xlsx using the pd.read_excel()
function. Assign the resulting DataFrame to iris_excel.
Step 5: Read data from a CSV file from the web using the requests.get() function to
retrieve the file contents, and then pass the contents to the pd.read_csv() function using
BytesIO to create a file-like object. Assign the resulting DataFrame to iris_web. The file
has no header row, so header=None is passed as an argument. The column names are
specified as a list of strings using the names argument.
Step 6: Concatenate the three DataFrames using pd.concat(), and assign the result to
iris. ignore_index=True is passed as an argument to reset the index of the concatenated
DataFrame.
Step 7: Display the descriptive statistics of the entire dataset using iris.describe().
Step 8: Group the data by species and display the mean values for each species
using iris.groupby('species').mean().
Step 9: Create a box plot for each variable by species using
iris.boxplot(by='species', figsize=(10, 8)).
PROGRAM:
import pandas as pd
import requests
from io import BytesIO
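The statements after the imports did not survive in this record. The sketch below follows Steps 3-9 of the algorithm; the column names and the web URL (the UCI Iris data file) are assumptions, as is the use of matplotlib to display the box plot.

import matplotlib.pyplot as plt

cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

# Step 3: read the local text file (no header row)
iris_txt = pd.read_csv('iris.txt', header=None, names=cols)

# Step 4: read the local Excel file
iris_excel = pd.read_excel('iris.xlsx')

# Step 5: read a CSV file from the web (URL assumed)
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
response = requests.get(url)
iris_web = pd.read_csv(BytesIO(response.content), header=None, names=cols)

# Step 6: concatenate the three DataFrames
iris = pd.concat([iris_txt, iris_excel, iris_web], ignore_index=True)

# Step 7: descriptive statistics of the entire dataset
print(iris.describe())

# Step 8: mean values of each variable for each species
print(iris.groupby('species').mean())

# Step 9: box plot of each variable by species
iris.boxplot(by='species', figsize=(10, 8))
plt.show()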
OUTPUT:
AIM:
ALGORITHM:
PROGRAM:
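No program is recorded for this exercise; only the output below remains. A minimal sketch that produces output in this format, assuming scikit-learn's built-in Iris data, is:

import pandas as pd
from sklearn import datasets

# load the built-in Iris data and wrap it in a DataFrame
iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# display the first few rows (column names match the output below)
print(iris_df.head())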
OUTPUT:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
AIM:
To use the diabetes data set from UCI and the Pima Indians diabetes data set and perform the following:
a) Implement Univariate analysis: Frequency, Mean, Median, Mode, Variance,
Standard Deviation, Skewness and Kurtosis from UCI dataset.
b) Bivariate analysis: Linear and Logistic Regression Modeling.
c) Multiple Regression Analysis.
ALGORITHM:
Step 1: Download the Pima Indians Diabetes dataset.
Link: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database?resource=download
Step 2: Install the packages.
Step 3: Open PyCharm and type the following commands.
Step 4: The output will be displayed.
PROGRAM:
a) Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and Kurtosis.
import pandas as pd
import numpy as np
from scipy.stats import skew, kurtosis
import statistics as st

# load the Pima Indians Diabetes dataset (file name assumed)
diabetes_df = pd.read_csv('diabetes.csv')

# compute univariate statistics for every column
for col in diabetes_df.columns:
    print("Column:", col)
    # Frequency
    freq = diabetes_df[col].value_counts()
    # Mean
    mean = diabetes_df[col].mean()
    # Median
    median = diabetes_df[col].median()
    # Mode
    mode = diabetes_df[col].mode()
    # Variance
    variance = diabetes_df[col].var()
    # Standard deviation
    std_dev = diabetes_df[col].std()
    # Skewness
    skewness = skew(diabetes_df[col])
    # Kurtosis
    kurt = kurtosis(diabetes_df[col])
    print("Frequency:", freq)
    print("Mean:", mean)
    print("Median:", median)
    print("Mode:", mode)
    print("Variance:", variance)
    print("Standard Deviation:", std_dev)
    print("Skewness:", skewness)
    print("Kurtosis:", kurt)
Output:
Column: Outcome
Frequency: Outcome
0 500
1 268
Name: count, dtype: int64
Mean: 0.3489583333333333
Median: 0.0
Mode: 0 0
Name: Outcome, dtype: int64
Variance: 0.22748261625380273
Standard Deviation: 0.47695137724279896
Skewness: 0.6337757030614577
Kurtosis: -1.5983283582089547
(768, 9)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Pregnancies 768 non-null int64
1 Glucose 768 non-null int64
2 BloodPressure 768 non-null int64
3 SkinThickness 768 non-null int64
4 Insulin 768 non-null int64
5 BMI 768 non-null float64
6 DiabetesPedigreeFunction 768 non-null float64
7 Age 768 non-null int64
8 Outcome 768 non-null int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
None
Pregnancies 3.845052
Glucose 120.894531
BloodPressure 69.105469
SkinThickness 20.536458
Insulin 79.799479
BMI 31.992578
DiabetesPedigreeFunction 0.471876
Age 33.240885
Outcome 0.348958
dtype: float64
Pregnancies 3.0000
Glucose 117.0000
BloodPressure 72.0000
SkinThickness 23.0000
Insulin 30.5000
BMI 32.0000
DiabetesPedigreeFunction 0.3725
Age 29.0000
Outcome 0.0000
dtype: float64
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \
0 1.0 99 70.0 0.0 0.0 32.0
1 NaN 100 NaN NaN NaN NaN
LINEAR REGRESSION:
PROGRAM:
import pandas as pd
from sklearn import datasets, linear_model
from sklearn.model_selection import train_test_split  # to split our data into training and testing sets
diabetes = datasets.load_diabetes()
diabetes.keys()
df = pd.DataFrame(diabetes['data'], columns=diabetes['feature_names'])
x = df
y = diabetes['target']
# splitting our data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=101)
# fit a linear regression model so that the weights and intercept can be printed
model = linear_model.LinearRegression()
model.fit(x_train, y_train)
print("\nWeights :", model.coef_)
print("\nIntercept", model.intercept_)
Output:
LOGISTIC REGRESSION:
PROGRAM:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets,linear_model
from sklearn.metrics import mean_squared_error
diabetes=datasets.load_diabetes()
diabetes.keys()
df=pd.DataFrame(diabetes['data'],columns=diabetes['feature_names'])
x=df
y=diabetes['target']
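The statements that fit the model and compute the figures in the output below were not captured in this record. A minimal sketch, assuming the continuous target is binarized at its median so that a logistic regression can be fitted, and that RMSE and r^2 are then reported with sklearn.metrics, is given here; it will not reproduce the exact numbers shown below.

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# binarize the target at its median (assumption) so classification applies
y_bin = (y > np.median(y)).astype(int)
x_train, x_test, y_train, y_test = train_test_split(x, y_bin, test_size=0.3, random_state=101)

# fit the logistic regression model and predict on the test set
log_model = LogisticRegression(max_iter=1000)
log_model.fit(x_train, y_train)
pred = log_model.predict(x_test)

# report r^2 and RMSE of the predictions
print("r^2 :", r2_score(y_test, pred))
print("RMSE :", np.sqrt(mean_squared_error(y_test, pred)))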
Output:
r^2 : -0.44401265478624397
RMSE : 94.65723681369009
RMSE : 58.00932552866432
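The AIM also lists c) Multiple Regression Analysis, but no program for it appears in this record. A minimal sketch, assuming the Pima Indians diabetes CSV (file name diabetes.csv) and an ordinary least squares fit of Glucose on several of the other columns with statsmodels, could look like this:

import pandas as pd
import statsmodels.api as sm

# load the Pima Indians Diabetes dataset (file name assumed)
diabetes_df = pd.read_csv('diabetes.csv')

# multiple regression: model Glucose as a function of several predictors
X = sm.add_constant(diabetes_df[['BMI', 'Age', 'Insulin', 'BloodPressure']])
y = diabetes_df['Glucose']

model = sm.OLS(y, X).fit()
print(model.summary())   # coefficients, R-squared, p-values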
AIM:
To apply and explore various plotting functions on UCI data sets.
a) Normal Curves.
b) Density and Contour Plots.
c) Correlation and Scatter Plots.
d) Histograms.
e) Three Dimensional Plotting.
ALGORITHM:
Step 1: Download the Heart dataset from Kaggle.
Link: https://www.kaggle.com/datasets/zhaoyingzhu/heartcsv
Step 2: Save it in Downloads or any other folder and install the packages.
Step 3: Apply the following commands on the dataset.
Step 4: The output will be displayed.
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt

# load the dataset (file name assumed)
dataset = pd.read_csv('iris.csv')

# Plot a bar chart of the mean petal width for each class
class_means = dataset.groupby('class')['petal-width'].mean()
class_means.plot(kind='bar')
plt.title('Mean Petal Width for Each Class')
plt.xlabel('Class')
plt.ylabel('Mean Petal Width')
plt.show()
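The recorded program only covers the bar chart above, while the AIM also asks for normal curves, density and contour plots, correlation and scatter plots, histograms and three-dimensional plotting. A minimal sketch against the Heart dataset is given below; the file name Heart.csv and the column names Age, Chol and Thalach are assumptions.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm, gaussian_kde

heart = pd.read_csv('Heart.csv').dropna()   # file and column names assumed
age, chol, thalach = heart['Age'], heart['Chol'], heart['Thalach']

# a) Normal curve fitted to Age
grid = np.linspace(age.min(), age.max(), 200)
plt.plot(grid, norm.pdf(grid, age.mean(), age.std()))
plt.title('Normal Curve of Age')
plt.show()

# b) Density plot of Age and a contour plot of the joint density of Age and Chol
age.plot(kind='density', title='Density of Age')
plt.show()
xi, yi = np.mgrid[age.min():age.max():100j, chol.min():chol.max():100j]
zi = gaussian_kde(np.vstack([age, chol]))(np.vstack([xi.ravel(), yi.ravel()]))
plt.contour(xi, yi, zi.reshape(xi.shape))
plt.title('Contour of the joint density of Age and Chol')
plt.show()

# c) Correlation matrix and a scatter plot
print(heart.select_dtypes(include='number').corr())
plt.scatter(age, chol)
plt.xlabel('Age'); plt.ylabel('Chol')
plt.show()

# d) Histogram of Chol
plt.hist(chol, bins=20)
plt.title('Histogram of Chol')
plt.show()

# e) Three-dimensional scatter plot
ax = plt.figure().add_subplot(projection='3d')
ax.scatter(age, chol, thalach)
ax.set_xlabel('Age'); ax.set_ylabel('Chol'); ax.set_zlabel('Thalach')
plt.show()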
OUTPUT:
AIM:
To create an insight Geographic Data with Basemap.
ALGORITHM:
Step 1: Install Basemap; if it downloads as a zip file, extract it.
Step 2: Import the packages.
Step 3: Save the files in Downloads or any other folder.
Step 4: Apply the following commands.
Step 5: The output will be displayed.
PROGRAM:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100)
m.bluemarble(scale=0.5)
m.etopo(scale=0.5, alpha=0.5)
RESULT:
Thus the insight into geographic data was created with Basemap successfully.
AIM
To write a Python program to perform arithmetic operations between two pandas Series.
ALGORITHM
STEP 1: Start
STEP 2: Import pandas package
STEP 3: Initialise ds1 and ds2
STEP 4: For addition, calculate ds1+ds2
STEP 5: For subtraction, calculate ds1-ds2
STEP 6: For multiplication, calculate ds1*ds2
STEP 7: For division, calculate ds1/ds2
STEP 8: Print the desired results
STEP 9: Stop
PROGRAM
import pandas as pd
ds1 = pd.Series([2, 4, 6, 8, 10])
ds2 = pd.Series([1, 3, 5, 7, 9])
print("Add two series")
ds = ds1 + ds2
print(ds)
print("Subtract two series")
ds = ds1 - ds2
print(ds)
print("Multiply two series")
ds = ds1 * ds2
print(ds)
print("Divide two series")
ds = ds1 / ds2
print(ds)
OUTPUT:
Add two series
0 3
1 7
2 11
3 15
4 19
dtype: int64
Subtract two series
0 1
1 1
2 1
3 1
4 1
dtype: int64
Multiply two series
0 2
1 12
2 30
3 56
4 90
dtype: int64
Divide two series
0 2.000000
1 1.333333
2 1.200000
3 1.142857
4 1.111111
dtype: float64
RESULT
Thus the program to perform arithmetic operations between two pandas Series has been executed successfully.
EX: NO:
DATE:
SCATTER PLOTS USING MATPLOTLIB AND SEABORN WITH POKEMON DATASET
AIM
To perform scatter plots in Python using the Matplotlib and Seaborn libraries with the Pokemon dataset.
ALGORITHM:
Step 1: Download the Pokemon dataset from Kaggle.
Link: https://www.kaggle.com/datasets/rounakbanik/pokemon
Step 2: Save it in Downloads or any other folder and install the packages.
Step 3: Apply the following commands on the dataset.
Step 4: The output will be displayed.
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv("pokemon.csv")
data.shape
data.head()
g1 = data.loc[data.generation==1,:]
# dataframe.plot.scatter() method
g1.plot.scatter('attack', 'defense');
# The ';' is to avoid showing a message before showing the plot
# plt.scatter() function
plt.scatter('attack', 'defense', data=g1);
g1.plot.scatter('attack', 'defense', s = 40, c = 'orange', marker = 's', figsize=(8,5.5));
plt.figure(figsize=(10,7)) # Specify size of the chart
plt.scatter('attack', 'defense', data=data[data.is_legendary==1], marker = 'x', c = 'magenta')
plt.scatter('attack', 'defense', data=data[data.is_legendary==0], marker = 'o', c = 'blue')
plt.legend(('Yes', 'No'), title='Is legendary?')
plt.show()
plt.figure(figsize=(10,7))
sns.scatterplot(x = 'attack', y = 'defense', s = 70, hue ='is_legendary', data=data);
# hue represents color
plt.figure(figsize=(10,7))
sns.scatterplot(x = 'attack', y = 'defense', s = 50, hue = 'is_legendary', style ='is_legendary',
data=data);
# style represents marker
plt.figure(figsize=(11,7))
sns.scatterplot(x = 'attack', y = 'defense', s = 50, hue = 'type1', data=data)
plt.legend(bbox_to_anchor=(1.02, 1))
# move legend to outside of the chart
plt.title('Defense vs Attack for All Pokemons', fontsize=16)
plt.xlabel('Attack', fontsize=12)
plt.ylabel('Defense', fontsize=12)
plt.show()
water = data[data.type1 == 'water']
water.plot.scatter('height_m', 'weight_kg', figsize=(10,6))
plt.grid(True) # add gridlines
plt.show()
water.plot.scatter('height_m', 'weight_kg', figsize=(10,6))
plt.grid(True)
for index, row in water.nlargest(5, 'height_m').iterrows():
    plt.annotate(row['name'],                                     # text to show
                 xy=(row['height_m'], row['weight_kg']),          # the point to annotate
                 xytext=(row['height_m']+0.2, row['weight_kg']),  # where to show the text
                 fontsize=12)
plt.xlim(0, )  # x-axis has minimum 0
plt.ylim(0, )  # y-axis has minimum 0
plt.show()