Scientific Computing
Topics covered
1) Data analysis and numerical computing with NumPy
2) Introduction to SciPy
3) Introduction to Pandas
4) Introduction to scikit-learn
NumPy stands for ‘Numerical Python’. It is a package for data analysis and scientific computing
with Python. NumPy provides a multidimensional array object, along with functions and tools for
working with these arrays. The powerful n-dimensional array in NumPy speeds up data
processing. NumPy interfaces easily with other Python packages and provides tools for
integrating with other programming languages such as C and C++.
One of the key features of NumPy is its N-dimensional array object, or ndarray, which is a fast,
flexible container for large datasets in Python. Arrays enable you to perform mathematical
operations on whole blocks of data using similar syntax to the equivalent operations between
scalar elements.
NumPy arrays are used to store lists of numerical data, vectors and matrices. The NumPy library
has a large set of routines (built-in functions) for creating, manipulating, and transforming
NumPy arrays. The Python language also has an array data structure, but it is not as versatile, efficient,
and useful as the NumPy array.
The NumPy array is officially called ndarray but is commonly known as array. In the rest of the chapter,
“array” refers to the NumPy array. The following are a few differences between a list
and an array.
List vs. Array
1) List: Can have elements of different data types, for example [1, 3.4, 'hello', 'a@'].
   Array: All elements of an array are of the same data type, for example an array of floats may be [1.2, 5.4, 2.7].
2) List: Elements of a list are not stored contiguously in memory.
   Array: Array elements are stored in contiguous memory locations. This makes operations on arrays faster than on lists.
3) List: Lists do not support element-wise operations (for example, addition, multiplication, etc.) because elements may not be of the same type.
   Array: Arrays support element-wise operations. For example, if A1 is an array, it is possible to say A1/3 to divide each element of the array by 3.
4) List: Lists can contain objects of different data types, so Python must store the type information for every element along with its value. Thus lists take more space in memory and are less efficient.
   Array: A NumPy array takes up less space in memory than a list because arrays do not need to store the data type of each element separately.
5) List: The list is a part of core Python.
   Array: The array (ndarray) is a part of the NumPy library.
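The element-wise behaviour described above can be sketched in a few lines. The names values, divided, and repeated are illustrative and do not come from the notes; A1 follows the name used in the comparison.

```python
import numpy as np

values = [3.0, 6.0, 9.0]   # a plain Python list (illustrative data)
A1 = np.array(values)      # the same data as a NumPy array

# Dividing the array by 3 divides every element
divided = A1 / 3
print(divided)

# A list has no element-wise arithmetic: multiplying by 2 repeats the list instead
repeated = values * 2
print(repeated)

# All elements of the array share a single data type
print(A1.dtype)
```

Dividing the list itself (values / 3) would raise a TypeError, which is why the array form is preferred for numerical work.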
One of the reasons NumPy is so important for numerical computations in Python is because it is
designed for efficiency on large arrays of data. There are a number of reasons for this:
• NumPy internally stores data in a contiguous block of memory, independent of other built-in
Python objects. NumPy’s library of algorithms written in the C language can operate on this
memory without any type checking or other overhead. NumPy arrays also use much less memory
than built-in Python sequences.
• NumPy operations perform complex computations on entire arrays without the need for Python
for loops.
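As a rough illustration of the second point, the sketch below computes the same total twice: once with an explicit Python for loop and once with a single whole-array expression. The array size and variable names are arbitrary choices made for this example.

```python
import numpy as np

# Illustrative data: the integers 0 .. 99999 stored as 64-bit integers
numbers = np.arange(100_000, dtype=np.int64)

# Version 1: an explicit Python loop over the individual elements
total_loop = 0
for n in numbers.tolist():
    total_loop += n * 2

# Version 2: one vectorized expression, with no Python-level loop
total_vectorized = int((numbers * 2).sum())

print(total_loop == total_vectorized)
```

Both versions give the same result; the vectorized form is shorter and usually runs much faster on large arrays.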
NumPy Arrays
There are several ways to create arrays. To create an array and use its methods, we first need
to import the NumPy library.
# NumPy is loaded as np (any alias can be used); the module name numpy must be written in lowercase
import numpy as np
NumPy’s array() function converts a given list into an array. For example,
#Create an array called array1 from the given list.
array1 = np.array([10,20,30])
Write a Python program that converts a list to an array using NumPy and displays the array.
import numpy as np
array1 = np.array([10, 20, 30])
print(array1)
# A list with mixed types is converted to one common type (here every element becomes a string)
array2 = np.array([5, -7.4, 'a', 7.2])
print(array2)
Write a program that uses the NumPy library to perform some basic operations with arrays.
# Import the NumPy library under the alias np.
import numpy as np
# Create a 2x3 array called data filled with random numbers drawn from a standard normal
# distribution (mean = 0, variance = 1).
data = np.random.randn(2, 3)
#Print the array:
print(data)
#Element-wise multiplication by 10:
print(data * 10)
#Element-wise addition of the array with itself:
print(data + data)
Output
array([[-0.2047,  0.4789, -0.5194],
       [-0.5557,  1.9658,  1.3934]])
array([[ -2.0471,   4.7894,  -5.1944],
       [ -5.5573,  19.6578,  13.9341]])
array([[-0.4094,  0.9579, -1.0389],
       [-1.1115,  3.9316,  2.7868]])
In the first example, the array elements are printed. In the second, all of the elements have been multiplied
by 10. In the third, the corresponding values in each “cell” in the array have been added to each
other. Because the values are random, the numbers will differ on every run.
The easiest way to create an array is to use the array function. This accepts any sequence-like
object (including other arrays) and produces a new NumPy array containing the passed data.
Write a program to create a NumPy array from a Python list and then print it.
import numpy as np
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
print(arr1)
Output: [6.  7.5 8.  0.  1. ]
Nested sequences, like a list of equal-length lists, will be converted into a multidimensional array:
Write a Python program to create a 2-dimensional NumPy array from a nested Python list and
then print it.
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
print(arr2)
Table for a short list of standard array creation functions is given below.
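A few of the standard array creation functions can be sketched as follows. This selection (zeros, ones, arange, eye, full) is an illustrative subset chosen for this example, and the variable names are arbitrary.

```python
import numpy as np

zeros = np.zeros(3)           # 1D array of three 0.0 values
ones = np.ones((2, 3))        # 2x3 array filled with 1.0
counting = np.arange(5)       # integers 0, 1, 2, 3, 4
identity = np.eye(2)          # 2x2 identity matrix
filled = np.full((2, 2), 7)   # 2x2 array where every element is 7

for name, array in [("zeros", zeros), ("ones", ones), ("arange", counting),
                    ("eye", identity), ("full", filled)]:
    print(name)
    print(array)
```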
Write a program to create NumPy arrays with specified data types and then print the data
types of these arrays.
import numpy as np
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr2 = np.array([1, 2, 3], dtype=np.int32)
print(arr1.dtype)
print(arr2.dtype)
Output
float64
int32
In this sequence of code snippets, an integer array is converted to a floating-point array
using NumPy's astype method.
Write a Python program to create a NumPy array, check its data type, and convert it to a
different data type.
import numpy as np
# Create an integer array (the values are illustrative)
arr = np.array([1, 2, 3, 4, 5])
print(arr.dtype) # an integer dtype, typically int64
float_arr = arr.astype(np.float64)
print(float_arr.dtype) # Output: float64
Write a Python program to create a NumPy array with floating-point numbers, and then convert
it to an array of integers (the decimal part is truncated).
import numpy as np
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
print(arr) # Output: [ 3.7 -1.2 -2.6 0.5 12.9 10.1]
int_arr = arr.astype(np.int32)
print(int_arr) # Output: [ 3 -1 -2 0 12 10]
If you assign a scalar value to a slice, as in arr[5:8] = 12, the value is propagated (or broadcast)
to the entire selection. An important first distinction from Python’s built-in lists is
that array slices are views on the original array.
This means that the data is not copied, and any modifications to the view will be reflected in the
source array.
This sequence of code snippets demonstrates array creation, slicing, and assignment in numpy.
Here's the step-by-step explanation:
import numpy as np
# Create a 1D array with values from 0 to 9
arr = np.arange(10)
print(arr)
# Output:
# [0 1 2 3 4 5 6 7 8 9]
# Access and print a slice of the array from index 5 to 7 (index 8 is excluded)
print(arr[5:8])
# Output:
# [5 6 7]
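The scalar assignment arr[5:8] = 12 mentioned earlier can be tried directly. This is a minimal sketch that rebuilds the same ten-element array so that it runs on its own.

```python
import numpy as np

arr = np.arange(10)

# Assign one scalar to a slice: the value is broadcast to every selected element
arr[5:8] = 12
print(arr)
# [ 0  1  2  3  4 12 12 12  8  9]
```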
If you want a copy of a slice of an ndarray instead of a view, you will need to explicitly copy the
array—for example, arr[5:8].copy().
This program demonstrates that slices in NumPy are views into the original array. Modifying the
slice directly affects the original array, which can be useful for efficiently updating portions of an
array without creating a copy.
import numpy as np
# Create a 1D array with values from 0 to 9
arr = np.arange(10)
print("Original array:")
print(arr)
# Output:
# Original array:
# [0 1 2 3 4 5 6 7 8 9]
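# Create a slice `arr_slice` from `arr` containing the elements at indices 5 to 7 (index 8 is excluded)
arr_slice = arr[5:8]
print("\nSlice of the original array (arr_slice):")
print(arr_slice)
# Output
# Slice of the original array (arr_slice):
# [5 6 7]
# Assign 64 to every element of the slice; because the slice is a view, arr changes too
arr_slice[:] = 64
print("\nModified arr_slice with slice assignment:")
print(arr_slice)
# Output:
# Modified arr_slice with slice assignment:
# [64 64 64]
print(arr)
# [ 0  1  2  3  4 64 64 64  8  9]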
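For contrast with the view behaviour, the following sketch takes an explicit copy of the same slice using copy(). Changing the copy leaves the original array untouched. The name arr_copy is illustrative.

```python
import numpy as np

arr = np.arange(10)

# An explicit copy owns its own data instead of sharing memory with arr
arr_copy = arr[5:8].copy()
arr_copy[:] = 64

print(arr_copy)   # the copy was modified
print(arr)        # the original array is unchanged
```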
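# Accessing the element at the first row, third column using two equivalent indexing methods
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
element_0_2_method1 = arr2d[0][2]
element_0_2_method2 = arr2d[0, 2]
print(element_0_2_method1, element_0_2_method2) # both are 3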
Write Python code that demonstrates various slicing and indexing operations on 1D and 2D NumPy
arrays, as well as how to modify specific parts of these arrays.
import numpy as np
# Creating a 1D array
arr = np.array([0, 1, 2, 3, 4, 64, 5])
print("1D array:")
print(arr)
# Creating a 2D array
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("\n2D array:")
print(arr2d)
# Slicing the first two rows and columns from the second to the end
print("\nFirst two rows, columns from second to the end:")
print(arr2d[:2, 1:])
Output
1D array:
[ 0  1  2  3  4 64  5]

2D array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

First two rows, columns from second to the end:
[[2 3]
 [5 6]]
2D array slicing
Write a program that demonstrates how to create and manipulate a NumPy array by filling it with specific
values and selecting rows using both positive and negative indices.
import numpy as np
# Create an empty array with shape (8, 4) and fill each row with its row index
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
print("Original array:")
print(arr)
Output
Original array:
[[0. 0. 0. 0.]
[1. 1. 1. 1.]
[2. 2. 2. 2.]
[3. 3. 3. 3.]
[4. 4. 4. 4.]
[5. 5. 5. 5.]
[6. 6. 6. 6.]
[7. 7. 7. 7.]]
This code demonstrates how to use both positive and negative indices to select specific rows from
a NumPy array. The use of negative indices allows you to access rows from the end of the array,
which can be a useful feature in certain scenarios.
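The row selection itself can be sketched as below. The specific index lists [4, 3, 0, 6] and [-3, -5, -7] are assumptions chosen for illustration, since the notes do not state which rows were selected.

```python
import numpy as np

# Rebuild the 8x4 array in which every row is filled with its row index
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i

# Positive indices: select rows 4, 3, 0 and 6, in that order
picked = arr[[4, 3, 0, 6]]
print(picked)

# Negative indices count from the end: -3 is row 5, -5 is row 3, -7 is row 1
from_end = arr[[-3, -5, -7]]
print(from_end)
```

Passing a list of row numbers in this way returns a new array containing the chosen rows in the requested order.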
2) Introduction to SciPy
When working with Python for data science, machine learning, web development, or other
programming tasks, there are several essential libraries that provide powerful tools and
functionalities.
SciPy is an open-source Python library used for solving mathematical, scientific, engineering,
and technical problems. It allows users to manipulate and visualize data using a
wide range of high-level Python commands. SciPy is built on the Python NumPy extension.
SciPy is pronounced “Sigh Pi.”
It provides additional utility functions for optimization, statistics, and signal processing.
SciPy is an open-source scientific Python library, distributed under the
BSD license, for performing mathematical, scientific, and engineering computations.
NumPy vs SciPy
NumPy
NumPy is written in C and is used for mathematical or numeric calculation.
Its array operations are generally faster than the equivalent pure-Python code.
NumPy is one of the most useful libraries in data science for performing basic calculations.
NumPy mainly provides the array data type, which supports basic operations
like sorting, shaping, indexing, etc.
SciPy
SciPy is built on top of NumPy.
The SciPy linear algebra module is fully featured, while NumPy
contains only a few linear algebra features.
Most new data science features are available in SciPy rather than NumPy.
SciPy is the core package for scientific routines in Python; it is meant to operate efficiently on
NumPy arrays, so that NumPy and SciPy work hand in hand.
SciPy is built on the NumPy array framework and takes scientific programming to a whole new
level by supplying advanced mathematical functions like integration, ordinary differential
equation solvers, special functions, optimizations, and more.
It’s a package that utilizes NumPy arrays and manipulations to take on standard problems that
scientists and engineers commonly face: integration, determining a function’s maxima or
minima, finding eigenvectors for large sparse matrices, testing whether two distributions are the
same, and much more.
The SciPy package contains various toolboxes dedicated to common issues in scientific
computing. Its different submodules correspond to different applications, such as interpolation,
integration, optimization, image processing, statistics, special functions, etc.
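To give a flavour of these submodules, the sketch below uses three of them on small problems with known answers: scipy.integrate for a definite integral, scipy.optimize for a minimum, and scipy.linalg for a linear system. The functions and numbers are illustrative choices, not examples taken from the notes.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Integration: the integral of x**2 from 0 to 1 is exactly 1/3
area, abs_error = integrate.quad(lambda x: x**2, 0.0, 1.0)
print("Integral:", area)

# Optimization: (x - 2)**2 + 1 has its minimum at x = 2
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)
print("Minimum at x =", result.x)

# Linear algebra: solve 3x + y = 9 and x + 2y = 8, whose solution is x = 2, y = 3
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)
print("Solution:", solution)
```

Each call takes and returns ordinary Python numbers or NumPy arrays, which is what lets the two libraries work together.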
3) Introduction to Pandas
Pandas is a powerful, open-source data manipulation and analysis library for Python. It provides the data
structures and functions needed to work efficiently with structured data. The two primary
data structures in Pandas are Series and DataFrame.
Pandas is well suited for working with tabular data, such as spreadsheets or SQL tables,
and is an essential tool for data analysts, scientists, and engineers working
with structured data in Python.
It is built on top of the NumPy library, which means that many of the structures of NumPy
are used or replicated in Pandas.
The data produced by Pandas is often used as input for plotting functions in Matplotlib,
statistical analysis in SciPy, and machine learning algorithms in scikit-learn.
Pandas Series
A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer,
string, float, Python objects, etc.). The axis labels are collectively called the index.
A Pandas Series can be thought of as a column in an Excel sheet. Labels need not be unique but must
be of a hashable type.
The object supports both integer and label-based indexing and provides a host of methods for
performing operations involving the index.
import pandas as pd
import numpy as np
# simple array
data = np.array(['g', 'e', 'e', 'k', 's'])
ser = pd.Series(data)
print("Pandas Series:\n", ser)
Output
Pandas Series:
0 g
1 e
2 e
3 k
4 s
dtype: object
This Pandas Series obj is a one-dimensional array-like object that holds a sequence of values (4,
7, -5, 3) and provides each value with a unique integer index. It's a fundamental data structure in
Pandas for handling and manipulating data efficiently in Python.
import pandas as pd
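A minimal sketch of creating the Series described above is given here. The values 4, 7, -5, 3 and the name obj come from the description. Since no labels are passed, Pandas supplies the default integer index.

```python
import pandas as pd

# Create a Series from a list; no index is given, so the default labels 0..3 are used
obj = pd.Series([4, 7, -5, 3])
print(obj)
```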
The Pandas Series object obj contains both its values ([4, 7, -5, 3]) accessed via obj.values and its
index (RangeIndex(start=0, stop=4, step=1)) accessed via obj.index. These attributes allow for
direct access to the data and its indexing information, crucial for operations and manipulations
within Pandas and other analytical tasks in Python.
import pandas as pd
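The two attributes can be inspected as in the sketch below, which rebuilds obj from the values given in the text so that it runs on its own.

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# The underlying data as a NumPy array
print(obj.values)

# The index object that labels each value
print(obj.index)
```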
Output
[ 4  7 -5  3]
RangeIndex(start=0, stop=4, step=1)
Using NumPy functions or NumPy-like operations, such as filtering with a boolean array, scalar
multiplication, or applying math functions, will preserve the index-value link:
import pandas as pd
import numpy as np
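A sketch of these operations follows. It assumes that obj2 was built with the labels d, b, a, c and the values 6, 7, -10, 3, which is how obj2 is printed in these notes. The names positive, doubled, and exponentials are illustrative.

```python
import numpy as np
import pandas as pd

# Assumed construction of obj2, matching the labels and values in its printed form
obj2 = pd.Series([6, 7, -10, 3], index=["d", "b", "a", "c"])

positive = obj2[obj2 > 0]      # filtering with a boolean array keeps the labels
doubled = obj2 * 2             # scalar multiplication keeps the labels
exponentials = np.exp(obj2)    # a NumPy math function keeps the labels

print(positive)
print(doubled)
print(exponentials)

print("Pandas Series obj2:")
print(obj2)
```

In every result each value is still attached to its original label, even when some rows have been filtered out.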
output
Pandas Series obj2:
d 6
b 7
a -10
c 3
dtype: int64
DataFrame
A DataFrame represents a rectangular table of data and contains an ordered collection of columns,
each of which can be a different value type (numeric, string, boolean, etc.). The DataFrame has
both a row and column index; it can be thought of as a dict of Series all sharing the same index.
Under the hood, the data is stored as one or more two-dimensional blocks rather than a list, dict,
or some other collection of one-dimensional arrays.
There are many ways to construct a DataFrame, though one of the most common is from a dict
of equal-length lists or NumPy arrays:
import pandas as pd
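A minimal sketch of this construction is shown below. The column names and values in data are hypothetical and were made up for this example.

```python
import pandas as pd

# Hypothetical data: a dict of equal-length lists, one list per column
data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002, 2003],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9, 3.2],
}

# Each dict key becomes a column; the rows get the default integer index 0..5
frame = pd.DataFrame(data)
print(frame)
```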
Output
Columns can be modified by assignment. For example, the empty 'debt' column could be assigned
a scalar value or an array of values:
import pandas as pd
import numpy as np
frame2 = pd.DataFrame(data)
frame2.index = ['one', 'two', 'three', 'four', 'five', 'six']
print("Initial DataFrame:")
print(frame2)
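The column assignment can be sketched as follows. The contents of data are hypothetical and were made up for this example. Listing 'debt' in the columns argument is assumed here as the way the empty column is created.

```python
import numpy as np
import pandas as pd

# Hypothetical data with six rows, to match the six index labels
data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002, 2003],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9, 3.2],
}

# 'debt' is not a key in data, so that column starts out filled with missing values
frame2 = pd.DataFrame(data, columns=["year", "state", "pop", "debt"])
frame2.index = ["one", "two", "three", "four", "five", "six"]
initially_empty = bool(frame2["debt"].isnull().all())

# Assign a scalar: every row receives the same value
frame2["debt"] = 16.5
print(frame2)

# Assign an array: its length must match the number of rows
frame2["debt"] = np.arange(6.0)
print(frame2)
```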
4) Introduction to scikit-learn
scikit-learn is one of the most widely used and trusted general-purpose Python machine learning
toolkits. It contains a broad selection of standard supervised and unsupervised machine learning
methods with tools for model selection and evaluation, data transformation, data loading, and
model persistence. These models can be used for classification, clustering, prediction, and other
common tasks.
There are excellent online and printed resources for learning about machine learning and how to
apply libraries like scikit-learn and TensorFlow to solve real-world problems. In this section, I
will give a brief flavor of the scikit-learn API style.
At the time of this writing, scikit-learn does not have deep pandas integration, though there are
some add-on third-party packages that are still in development. pandas can be very useful for
massaging datasets prior to model fitting.
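The scikit-learn API style (create an estimator, fit it, then predict) can be sketched with a tiny example. The toy data below is hypothetical and is not the Titanic dataset. Pandas holds the table and scikit-learn receives plain NumPy arrays.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: the label is 1 when the feature is positive
train = pd.DataFrame({
    "feature": [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0],
    "label": [0, 0, 0, 1, 1, 1],
})
test = pd.DataFrame({"feature": [-2.5, 2.5]})

# Convert the prepared columns to NumPy arrays for model fitting
X_train = train[["feature"]].to_numpy()
y_train = train["label"].to_numpy()
X_test = test[["feature"]].to_numpy()

model = LogisticRegression()      # create the estimator
model.fit(X_train, y_train)       # learn from the training data
y_pred = model.predict(X_test)    # predict labels for unseen rows
print(y_pred)
```

Most scikit-learn estimators follow the same fit and predict pattern, so other models can usually be swapped in with few changes.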
datasets/titanic/train.csv
datasets/titanic/test.csv
import pandas as pd
Output
Handling missing data is crucial before feeding it to libraries like statsmodels and scikit-learn.
Here's a step-by-step guide to identify and handle missing values in the Titanic dataset:
Test Dataset
PassengerId 0
Pclass 0
Name 0
Sex 0
Age 86
SibSp 0
Parch 0
Ticket 0
Fare 1
Cabin 327
Embarked 0
dtype: int64
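Counts like those above can be produced with isnull().sum(), and the gaps can then be filled before modelling. The sketch below uses a small hypothetical stand-in for the test data rather than the real file. Filling with the median is one common choice, assumed here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few rows of the Titanic test data
test = pd.DataFrame({
    "Pclass": [3, 1, 2, 3],
    "Sex": ["male", "female", "male", "female"],
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Fare": [7.25, 71.28, np.nan, 8.05],
})

# Step 1: count the missing values in every column
missing_counts = test.isnull().sum()
print(missing_counts)

# Step 2: fill missing numeric values with the median of each column
test["Age"] = test["Age"].fillna(test["Age"].median())
test["Fare"] = test["Fare"].fillna(test["Fare"].median())

# Step 3: encode the text column as a number so that a model can use it
test["IsFemale"] = (test["Sex"] == "female").astype(int)
print(test)
```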
Question Bank
1) What are the most common NumPy data types?
2) How to create NumPy arrays?
3) How do you identify the data type of an array?
4) List the advantages of NumPy over Python lists.
5) How do you create an array with all values as zeros or ones?
6) How to add and multiply matrices using NumPy?
7) How to find the transpose of the matrix using NumPy?
8) What is array slicing and how do you do it in NumPy?
9) Describe the operations that NumPy can execute.
10) What is the output of the below code snippet?
11) Write a Python program to create a 2-dimensional NumPy array from a nested Python list
and then print it.
12) Write a short note on SciPy.
13) List the packages of SciPy.
14) Write a short note on scikit-learn.
15) Write a Python program that loads the Titanic dataset, performs data preprocessing, and
prepares the data for model training.
16) Write a Python program that fits a Logistic Regression model on the prepared
Titanic dataset and makes predictions on the test set.
17) Write short notes on Pandas.
18) Write a program that demonstrates the creation of a Series using the Pandas library.