Numpy in Pandas
1. NumPy in Python
NumPy is a Python library used for working with arrays.
It also has functions for working in the domain of linear algebra, Fourier transforms, and matrices.
NumPy was created in 2005 by Travis Oliphant. It is an open source project and you can use it freely.
NumPy can be used to perform a wide variety of mathematical operations on arrays.
It adds powerful data structures to Python that enable efficient calculations with arrays and matrices, and
it supplies an enormous library of high-level mathematical functions that operate on these arrays and
matrices.
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of
generic data.
Arbitrary data types can be defined in NumPy, which allows it to integrate seamlessly and speedily
with a wide variety of databases.
The import numpy portion of the code tells Python to bring the NumPy library into your current
environment.
The as np portion of the code then tells Python to give NumPy the alias np.
This allows you to call NumPy functions by typing np. followed by the function name, for example np.array.
NumPy is a very popular Python library for large multi-dimensional array and matrix processing, with the
help of a large collection of high-level mathematical functions.
It is very useful for fundamental scientific computations in Machine Learning.
NumPy is a module for Python.
The name is an acronym for "Numeric Python" or "Numerical Python".
It is pronounced /'nʌmpai/ (NUM-py) or, less often, /'nʌmpi/ (NUM-pee). It
is an extension module for Python, mostly written in C.
A NumPy array is a grid of values, all of the same type, and is indexed by a tuple of non-negative integers.
The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the
size of the array along each dimension.
NumPy is a general-purpose library for working with large arrays and matrices.
Scrapy is the most popular high-level Python framework for extracting data from websites.
Matplotlib is a standard data visualization library that together with NumPy, SciPy, and IPython provides
features similar to MATLAB.
NumPy (short for Numerical Python) provides an efficient interface to store and operate on dense data
buffers.
In some ways, NumPy arrays are like Python's built-in list type, but NumPy arrays provide much more
efficient storage and data operations as the arrays grow larger in size.
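One visible consequence of this difference is that arithmetic operators work element-wise on NumPy arrays. On lists they keep their sequence meaning; for example, * repeats the list. A minimal sketch with illustrative values:

```python
import numpy as np

values = [1, 2, 3]
as_list = values * 2             # list repetition: [1, 2, 3, 1, 2, 3]
as_array = np.array(values) * 2  # element-wise multiplication: [2 4 6]

print(as_list)
print(as_array)
```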
NumPy Features
High-performance N-dimensional array object
Tools for integrating code from C/C++ and Fortran
A multidimensional container for generic data
Linear algebra, Fourier transform, and random number capabilities
Broadcasting functions
Data type definition capability to work with varied databases
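Broadcasting, listed above, lets NumPy combine arrays of different but compatible shapes without copying the data by hand. A minimal sketch; the names column and row are illustrative and do not come from the original notebook:

```python
import numpy as np

# A (3, 1) column and a (4,) row are stretched to a common (3, 4) shape.
column = np.array([[0], [10], [20]])
row = np.array([1, 2, 3, 4])
grid = column + row

print(grid.shape)  # (3, 4)
print(grid)
```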
NumPy arrays are faster than Python lists mainly because of how they are stored:
A NumPy array is a collection of elements of a single (homogeneous) data type stored in contiguous memory locations.
A Python list, on the other hand, is a collection of references to objects of possibly heterogeneous types, and those objects are stored in non-contiguous memory locations.
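The storage model can be inspected directly. This sketch only shows the memory layout of an array: one dtype and one contiguous buffer. It does not measure speed, and the array arr is illustrative:

```python
import numpy as np

arr = np.arange(1_000, dtype=np.int64)

# Every element has the same fixed size and lives in one contiguous block.
print(arr.flags['C_CONTIGUOUS'])  # True
print(arr.itemsize)               # 8 bytes per element
print(arr.nbytes)                 # 8000 = 8 bytes * 1000 elements
```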
ndarray attributes
Every array in NumPy is an object of ndarray class.
The properties of an array can be inspected through the ndarray attributes.
The most important attributes of an ndarray are ndarray.ndim, ndarray.shape, ndarray.size,
ndarray.dtype, and ndarray.itemsize.
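A short sketch of the five attributes, using an illustrative 2 x 3 float array (the name matrix is hypothetical):

```python
import numpy as np

matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

print(matrix.ndim)      # 2: number of dimensions (the rank)
print(matrix.shape)     # (2, 3): size along each dimension
print(matrix.size)      # 6: total number of elements
print(matrix.dtype)     # float64
print(matrix.itemsize)  # 8: bytes per element
```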
The kind of an ndarray.dtype is identified by a one-character code:
i - integer
b - boolean
u - unsigned integer
f - float
c - complex float
m - timedelta
M - datetime
O - object
S - (byte) string
U - Unicode string
V - fixed chunk of memory for other types (void)
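The kind code of any array is available as arr.dtype.kind. The sketch below builds one small illustrative array per kind and prints its dtype and code:

```python
import numpy as np

# One small sample array per kind; the dtypes are chosen for illustration.
samples = {
    'bool': np.array([True, False]),
    'int': np.array([1, 2], dtype=np.int64),
    'uint': np.array([1, 2], dtype=np.uint8),
    'float': np.array([1.5, 2.5]),
    'complex': np.array([1 + 2j]),
    'timedelta': np.array([90], dtype='timedelta64[s]'),
    'datetime': np.array(['2024-01-01'], dtype='datetime64[D]'),
    'object': np.array([1, 'a'], dtype=object),
    'bytes': np.array([b'ab', b'c']),
    'unicode': np.array(['ab', 'c']),
}

for name, arr in samples.items():
    print(f'{name:10} {str(arr.dtype):16} kind={arr.dtype.kind}')
```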
Importing the NumPy library
In [1]:
import numpy as np
You can find more information about NumPy by running the following command (remove the # to execute it).
In [2]:
# help(np)
np.array(object, dtype=None)
In [3]:
numpy_array = np.array([0.577, 1.618, 2.718, 3.14, 6, 28, 37, 1729])
print(numpy_array)
print(numpy_array.dtype)
In [4]:
[[5.770e-01 1.618e+00]
[2.718e+00 3.140e+00]
[6.000e+00 2.800e+01]
[3.700e+01 1.729e+03]]
float64
The following code gives a TypeError.
In [5]:
numpy_array = np.array(0.577, 1.618, 2.718, 3.14, 6, 28, 37, 1729)
# This error is due to calling array() with multiple arguments instead of a single list of values
print(numpy_array)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11868/2649635874.py in <module>
----> 1 numpy_array = np.array(0.577, 1.618, 2.718, 3.14, 6, 28, 37, 1729) # This error is due to calling array() with multiple arguments instead of a single list of values
2 print(numpy_array)
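For comparison, here is a sketch of the failing call next to the correct one. The exact wording of the TypeError message may differ between NumPy versions, so the code only checks the exception type:

```python
import numpy as np

values = [0.577, 1.618, 2.718, 3.14, 6, 28, 37, 1729]

raised = False
try:
    np.array(*values)  # eight positional arguments: TypeError
except TypeError as exc:
    raised = True
    print('TypeError:', exc)

numpy_array = np.array(values)  # a single list argument works
print(numpy_array.shape)        # (8,)
```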
In [ ]:
[[ 0 1 1]
[ 2 3 5]
[ 8 13 21]]
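The cell that printed the 3 x 3 array above has no code in this export. Assuming it was built from the first nine Fibonacci numbers, a nested list (one inner list per row) is one way to produce it. np.array([...]).reshape(3, 3) would work equally well:

```python
import numpy as np

# Each inner list becomes one row of the 2D array.
fibonacci_grid = np.array([[0, 1, 1],
                           [2, 3, 5],
                           [8, 13, 21]])

print(fibonacci_grid)
print(fibonacci_grid.shape)  # (3, 3)
```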
arange()
In [ ]:
numpy_array = np.arange(0, 100, 5, int)
print(numpy_array)
print(len(numpy_array))
[ 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95]
20
In [6]:
[[ 0 5 10 15 20]
[25 30 35 40 45]
[50 55 60 65 70]
[75 80 85 90 95]]
In [7]:
[[ 0 5 10 15 20]
[25 30 35 40 45]
[50 55 60 65 70]
[75 80 85 90 95]]
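The code for the two cells above was lost, and both printed the same 4 x 5 array. One plausible pair, assumed here rather than taken from the original, is an explicit reshape(4, 5) and a reshape(4, -1) that lets NumPy infer the second dimension. A target shape whose size does not match the element count raises ValueError:

```python
import numpy as np

flat = np.arange(0, 100, 5)     # 20 elements: 0, 5, ..., 95
explicit = flat.reshape(4, 5)   # both dimensions given
inferred = flat.reshape(4, -1)  # -1 asks NumPy to infer the missing 5
print(np.array_equal(explicit, inferred))  # True

reshape_failed = False
try:
    flat.reshape(3, 7)          # 3 * 7 = 21 elements, but flat has 20
except ValueError as exc:
    reshape_failed = True
    print('ValueError:', exc)
```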
In [8]:
# Because float arguments are passed, the elements of the array are of type float.
numpy_array = np.arange(0, 3.14, 0.3)
print(numpy_array)
[0. 0.3 0.6 0.9 1.2 1.5 1.8 2.1 2.4 2.7 3. ]
In [9]:
# Type of the object returned by arange()
numpy_array = np.arange(2, 37, 3)
print(numpy_array)
print(type(numpy_array))
[ 2 5 8 11 14 17 20 23 26 29 32 35]
<class 'numpy.ndarray'>
zeros()
In [10]:
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
<class 'numpy.ndarray'>
In [11]:
[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]]
<class 'numpy.ndarray'>
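The cells for the two zeros() outputs above are missing. The sketch below produces arrays of the same shapes, assuming the second cell passed dtype=int. The same pattern with np.ones((3, 4), dtype=int) gives the integer ones() output further down:

```python
import numpy as np

float_zeros = np.zeros((3, 4))           # default dtype is float64
int_zeros = np.zeros((3, 4), dtype=int)  # integer zeros

print(float_zeros)
print(int_zeros)
print(float_zeros.dtype, int_zeros.dtype)
```

The default integer dtype is platform dependent. It is int32 on Windows before NumPy 2.0 and int64 on most other setups, which matches the int32 reported later in this notebook.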
ones()
In [14]:
numpy_ones = np.ones((3, 4))
print(numpy_ones)
print(type(numpy_ones))
[[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]]
<class 'numpy.ndarray'>
In [15]:
[[1 1 1 1]
[1 1 1 1]
[1 1 1 1]]
<class 'numpy.ndarray'>
empty()
It creates an array of the given shape without initializing its entries, so the values are whatever happens to be in memory and may change from call to call.
In [18]:
numpy_empty = np.empty((4, 4))
print(numpy_empty)
In [19]:
[[1.23160026e-311+1.05730048e-321j 0.00000000e+000+0.00000000e+000j
1.42413554e-306+5.02034658e+175j 1.21004824e-071+4.23257854e+175j]
[3.41996727e-032+4.90863814e-062j 9.83255598e-072+4.25941885e-096j
1.12855837e+277+8.93168725e+271j 7.33723594e+223+1.70098498e+256j]
[5.49109388e-143+1.06396443e+224j 3.96041428e+246+1.16318408e-028j
1.89935647e-052+9.85513351e+165j 1.08805205e-071+4.18109207e-062j]
[2.24151504e+174+3.36163259e-067j 5.41760579e-067+3.18070066e-028j
3.93896263e-062+5.74015544e+180j 1.94919988e-153+1.02847381e-307j]]
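The complex values printed above suggest that the original cell passed a complex dtype. This is an assumption, since that code is missing. Because np.empty() leaves the contents undefined, only the shape and dtype can be relied on, so the sketch fills the array before using it:

```python
import numpy as np

# Only shape and dtype are guaranteed; the contents are leftover memory.
scratch = np.empty((4, 4), dtype=complex)
print(scratch.shape, scratch.dtype)  # (4, 4) complex128

scratch.fill(0)  # initialize explicitly before reading the values
print(scratch)
```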
linspace()
In [22]:
numpy_linspace = np.linspace(1, 37, 37)
print(numpy_linspace)
In [23]:
[ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36 37]
In [24]:
[ True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True]
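The cells behind the integer and boolean outputs above are missing. The sketch below is consistent with them, assuming the integer array came from dtype=int and the boolean array from an element-wise comparison with np.arange(1, 38). Note that linspace() includes its end point, while arange() excludes its stop value:

```python
import numpy as np

spaced = np.linspace(1, 37, 37)             # 37 evenly spaced floats, 37 included
as_int = np.linspace(1, 37, 37, dtype=int)  # same values as integers
stepped = np.arange(1, 38)                  # stop value 38 is excluded

print(spaced.dtype, as_int.dtype)
print(spaced == stepped)                # element-wise comparison: boolean array
print(np.array_equal(as_int, stepped))  # True
```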
reshape()
In [26]:
[[ 1 2 3 4 5 6]
[ 7 8 9 10 11 12]
[13 14 15 16 17 18]
[19 20 21 22 23 24]
[25 26 27 28 29 30]
[31 32 33 34 35 36]]
In [50]:
numpy_exercises = np.arange(16).reshape(4, 4)
print(numpy_exercises)
print(numpy_exercises.size)
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]]
16
In [28]:
1 """
2 If the array size is very large then only corners of the array are printed.
3 Then the central part of the array is skipped.
4 """
5 numpy_exercises = np.arange(10000).reshape(200, 50)
6 print(numpy_exercises)
[[ 0 1 2 ... 47 48 49]
[ 50 51 52 ... 97 98 99]
[ 100 101 102 ... 147 148 149]
...
[9850 9851 9852 ... 9897 9898 9899]
[9900 9901 9902 ... 9947 9948 9949]
[9950 9951 9952 ... 9997 9998 9999]]
set_printoptions()
np.set_printoptions(threshold=sys.maxsize)
In [29]:
1 """
2 To enable numpy to print the entire array, use 'set_printoptions()' method.
3 """
4 import sys
5 numpy_exercises = np.arange(10000).reshape(200, 50)
6 np.set_printoptions(threshold = sys.maxsize)
7 print(numpy_exercises)
[[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13
14 15 16 17 18 19 20 21 22 23 24 25 26 27
28 29 30 31 32 33 34 35 36 37 38 39 40 41
42 43 44 45 46 47 48 49]
[ 50 51 52 53 54 55 56 57 58 59 60 61 62 63
64 65 66 67 68 69 70 71 72 73 74 75 76 77
78 79 80 81 82 83 84 85 86 87 88 89 90 91
92 93 94 95 96 97 98 99]
[ 100 101 102 103 104 105 106 107 108 109 110 111 112 113
114 115 116 117 118 119 120 121 122 123 124 125 126 127
128 129 130 131 132 133 134 135 136 137 138 139 140 141
142 143 144 145 146 147 148 149]
[ 150 151 152 153 154 155 156 157 158 159 160 161 162 163
164 165 166 167 168 169 170 171 172 173 174 175 176 177
178 179 180 181 182 183 184 185 186 187 188 189 190 191
192 193 194 195 196 197 198 199]
[ 200 201 202 203 204 205 206 207 208 209 210 211 212 213
214 215 216 217 218 219 220 221 222 223 224 225 226 227
228 229 230 231 232 233 234 235 236 237 238 239 240 241
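np.set_printoptions() changes the setting for the rest of the session. If full printing is needed only once, the np.printoptions context manager restores the previous options when the block exits. The sketch below uses np.array2string to capture the text instead of printing 10,000 numbers:

```python
import sys

import numpy as np

big = np.arange(10000).reshape(200, 50)
default_threshold = np.get_printoptions()['threshold']

# Inside the with-block the whole array is rendered; afterwards the old
# threshold applies again and the output is summarized with '...'.
with np.printoptions(threshold=sys.maxsize):
    full_text = np.array2string(big)

summary_text = np.array2string(big)
print('...' in full_text, '...' in summary_text)
```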
indexing
In [30]:
# Using a 1D array
special_nums = np.array([0.577, 1.618, 2.718, 3.14, 6, 37, 1729])
print(special_nums[0])
print(special_nums[-1])  # Negative indices count from the end; -1 is the last element.
print(special_nums[-3])
print(special_nums[2])
print(special_nums[5])
0.577
1729.0
6.0
2.718
37.0
In [31]:
# Using a 2D array
special_nums = np.array([0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]).reshape(3, 4)
print(special_nums)
print(len(special_nums))  # len() returns the number of rows, 3.
print(special_nums[1, 1])  # Row index 1, column index 1 (second row, second column), so the output is 5.
print(special_nums[2, 3])  # Row index 2, column index 3 (third row, fourth column), so the output is 89.
[[ 0 1 1 2]
[ 3 5 8 13]
[21 34 55 89]]
3
5
89
Addition
In [32]:
[[20. 27.5]
[35. 42.5]]
[[0.577 1.618]
[2.718 3.14 ]]
Added array
[[20.577 29.118]
[37.718 45.64 ]]
Subtraction
In [33]:
[[20. 27.5]
[35. 42.5]]
[[0.577 1.618]
[2.718 3.14 ]]
Subtracted array
[[19.423 25.882]
[32.282 39.36 ]]
Multiplication
In [34]:
[[20. 27.5]
[35. 42.5]]
[[0.577 1.618]
[2.718 3.14 ]]
Multiplied array
[[ 11.54 44.495]
[ 95.13 133.45 ]]
Division
In [35]:
[[20. 27.5]
[35. 42.5]]
[[0.577 1.618]
[2.718 3.14 ]]
Divided array
[[34.66204506 16.99629172]
[12.87711553 13.53503185]]
Floor division
In [36]:
[[20. 27.5]
[35. 42.5]]
[[0.577 1.618]
[2.718 3.14 ]]
Divisor array
[[34. 16.]
[12. 13.]]
Modulus
In [37]:
[[20. 27.5]
[35. 42.5]]
[[0.577 1.618]
[2.718 3.14 ]]
Modulus array
[[0.382 1.612]
[2.384 1.68 ]]
Exponentiation
In [38]:
[[20. 27.5]
[35. 42.5]]
[[0.577 1.618]
[2.718 3.14 ]]
Exponentiated array
[[5.63241060e+00 2.13226068e+02]
[1.57317467e+04 1.29760121e+05]]
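The code cells for the arithmetic examples above are missing. The sketch below reuses the two input arrays defined in the dot() cell and reproduces the element-wise results:

```python
import numpy as np

# Inputs as defined in the dot() cell of this notebook.
numpy_arange = np.arange(20, 50, 7.5).reshape(2, 2)  # [[20. 27.5] [35. 42.5]]
numpy_array = np.array([0.577, 1.618, 2.718, 3.14]).reshape(2, 2)

# Each operator is applied element by element to matching positions.
results = {
    'added': numpy_arange + numpy_array,
    'subtracted': numpy_arange - numpy_array,
    'multiplied': numpy_arange * numpy_array,
    'divided': numpy_arange / numpy_array,
    'floor divided': numpy_arange // numpy_array,
    'modulus': numpy_arange % numpy_array,
    'exponentiated': numpy_arange ** numpy_array,
}

for name, value in results.items():
    print(name)
    print(value)
```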
@ operator
[[20. 27.5]
[35. 42.5]]
[[0.577 1.618]
[2.718 3.14 ]]
Product of @ operator
[[ 86.285 118.71 ]
[135.71 190.08 ]]
dot()
new_matrix = matrix_1.dot(matrix_2)
In [40]:
numpy_arange = np.arange(20, 50, 7.5).reshape(2, 2)
numpy_array = np.array([0.577, 1.618, 2.718, 3.14]).reshape(2, 2)
print(numpy_arange)
print()
print(numpy_array)
print()
print('Product of the function dot()')
array = numpy_arange.dot(numpy_array)  # dot() function
print(array)
[[20. 27.5]
[35. 42.5]]
[[0.577 1.618]
[2.718 3.14 ]]
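The product printed by the dot() cell is cut off in this export. The sketch below computes it and checks that, for 2D arrays, dot() and the @ operator give the same matrix product. The element-wise * gives a different result:

```python
import numpy as np

numpy_arange = np.arange(20, 50, 7.5).reshape(2, 2)
numpy_array = np.array([0.577, 1.618, 2.718, 3.14]).reshape(2, 2)

with_operator = numpy_arange @ numpy_array
with_dot = numpy_arange.dot(numpy_array)
elementwise = numpy_arange * numpy_array  # NOT a matrix product

print(with_dot)
print(np.allclose(with_operator, with_dot))  # True
```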
Relational operations
numpy_array<100
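The code and output of the relational-operations cell are missing. In this minimal sketch, a comparison returns a boolean array, which can then be used as a mask. The values reuse the numbers from the first np.array() example:

```python
import numpy as np

numbers = np.array([0.577, 1.618, 2.718, 3.14, 6, 28, 37, 1729])

mask = numbers < 100         # element-wise comparison: boolean array
print(mask)
print(numbers[mask])         # boolean indexing keeps only the True positions
count = np.count_nonzero(mask)
print(count)                 # 7
```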
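The following in-place addition raises a UFuncTypeError, because the float64 result cannot be cast back into the integer array numpy_array_one.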
In [56]:
numpy_array_one = np.ones((2, 2), dtype=int)
numpy_array_two = np.arange(2, 20.0)
numpy_array_one += numpy_array_two
print(numpy_array_one)
---------------------------------------------------------------------------
UFuncTypeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_11868/3374378000.py in <module>
1 numpy_array_one = np.ones((2, 2), dtype=int)
2 numpy_array_two = np.arange(2, 20.0)
----> 3 numpy_array_one += numpy_array_two
4 print(numpy_array_one)
UFuncTypeError: Cannot cast ufunc 'add' output from dtype('float64') to dtype('int32') with casting rule
'same_kind'
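There are two ways around the error, sketched here with arrays of matching shape. The original numpy_array_two has 18 elements, so it could not be added to a 2 x 2 array in any case. One option is to add out of place, which allocates a new float64 result. The other is to convert the target to float first. The names ints and floats are illustrative:

```python
import numpy as np

ints = np.ones((2, 2), dtype=int)
floats = np.arange(2, 6.0).reshape(2, 2)  # [[2. 3.] [4. 5.]], same shape as ints

cast_failed = False
try:
    ints += floats  # in-place: the float64 result would have to fit into ints
except TypeError as exc:  # UFuncTypeError is a subclass of TypeError
    cast_failed = True
    print(type(exc).__name__)

total = ints + floats          # out of place: a new float64 array
as_float = ints.astype(float)  # or convert the target to float first...
as_float += floats             # ...after which in-place addition is allowed
print(total)
```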
Unary operations
array_name.min()
array_name.max()
array_name.sum()
etc.
In [92]:
[[ 1 2 3 4 5 6]
[ 7 8 9 10 11 12]
[13 14 15 16 17 18]
[19 20 21 22 23 24]]
The minimum element of the certain array is 1.
The maximum element of the certain array is 24.
The sum of the elements of the array is 300.
The mean of the elements of the array is 12.5.
The standard deviation of the elements of the array is 6.922186552431729.
The variance of the elements of the array is 47.916666666666664.
The length of the elements of the array is 4.
The shape of the array is (4, 6).
The dtype of the array is int32.
The type of the array is <class 'numpy.ndarray'>.
The minimum numbers of every row are [ 1 7 13 19].
The maximum numbers of every row are [ 6 12 18 24].
The minimum numbers of every column are [1 2 3 4 5 6].
The maximum numbers of every column are [19 20 21 22 23 24].
The sum of the numbers in each column are [40 44 48 52 56 60].
The sum of the numbers in each row are [ 21 57 93 129].
The mean of the numbers in each column is [10. 11. 12. 13. 14. 15.].
The mean of the numbers in each row is [ 3.5 9.5 15.5 21.5].
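The code for the statistics above is missing. The sketch below reproduces them, assuming the array is np.arange(1, 25).reshape(4, 6). Here, axis=1 reduces across each row and axis=0 reduces down each column:

```python
import numpy as np

grid = np.arange(1, 25).reshape(4, 6)

print(grid.min(), grid.max(), grid.sum(), grid.mean())  # whole-array values
print(grid.std(), grid.var())
print(grid.min(axis=1))   # minimum of every row: 1, 7, 13, 19
print(grid.min(axis=0))   # minimum of every column: 1 ... 6
print(grid.sum(axis=0))   # column sums
print(grid.sum(axis=1))   # row sums
print(grid.mean(axis=1))  # row means
```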
In [111]:
numpy_array = np.arange(1, 25, 3)
print('The main array is', numpy_array)
numpy_array_addition = numpy_array + 6
print('By using the + operator, the array is', numpy_array_addition)
numpy_array_subtraction = numpy_array - 6
print('By using the - operator, the array is', numpy_array_subtraction)
numpy_array_multiplication = numpy_array * 6
print('By using the * operator, the array is', numpy_array_multiplication)
numpy_array_division = numpy_array / 6
print('By using the / operator, the array is', numpy_array_division)
numpy_array_floor_division = numpy_array // 6
print('By using the // operator, the array is', numpy_array_floor_division)
numpy_array_modulus = numpy_array % 6
print('By using the % operator, the array is', numpy_array_modulus)
numpy_array_exponentiation = numpy_array ** 6
print('By using the ** operator, the array is', numpy_array_exponentiation)
concatenate()
In [135]:
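The concatenate() cell is empty in this export. In this minimal sketch with illustrative arrays, the axis argument selects the dimension to join along, and axis=None flattens the inputs first:

```python
import numpy as np

first = np.array([[1, 2], [3, 4]])
second = np.array([[5, 6]])

stacked_rows = np.concatenate((first, second), axis=0)  # shape (3, 2)
side_by_side = np.concatenate((first, first), axis=1)   # shape (2, 4)
flattened = np.concatenate((first, second), axis=None)  # 1D: [1 2 3 4 5 6]

print(stacked_rows)
print(side_by_side)
print(flattened)
```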
Splitting of 1D arrays
np.array_split(array_name, number_of_splits)
In [143]:
array_special_nums = np.array([0.577, 1.618, 2, 2.718, 3.14, 6, 28, 37, 1729])
print('Before splitting the array \n', array_special_nums)
new_array = np.array_split(array_special_nums, 3)
print('After splitting the array \n', new_array)
Splitting of 2D arrays
In [144]:
array_special_nums = np.array([[0.577, 1.618], [2, 2.718], [3.14, 6], [13, 28], [37, 1729]])
print('Before splitting the array \n', array_special_nums)
new_array = np.array_split(array_special_nums, 2)
print('After splitting the array \n', new_array)
In [147]:
array_special_nums = np.array([[0.577, 1.618], [2, 2.718], [3.14, 6], [13, 28], [37, 1729]])
print('Before splitting the array \n', array_special_nums)
new_array = np.array_split(array_special_nums, 2, axis=0)
print('After splitting the array \n', new_array)
array_special_nums = np.array([[0.577, 1.618], [2, 2.718], [3.14, 6], [13, 28], [37, 1729]])
print('Before splitting the array \n', array_special_nums)
new_array = np.array_split(array_special_nums, 2, axis=1)
print('After splitting the array \n', new_array)
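np.array_split() accepts a number of sections that does not divide the length evenly, while np.split() raises ValueError in that case. A sketch with the same nine numbers:

```python
import numpy as np

nums = np.array([0.577, 1.618, 2, 2.718, 3.14, 6, 28, 37, 1729])

uneven = np.array_split(nums, 4)  # 9 elements into 4 parts: sizes 3, 2, 2, 2
sizes = [len(part) for part in uneven]
print(sizes)

split_failed = False
try:
    np.split(nums, 4)             # np.split requires an equal division
except ValueError as exc:
    split_failed = True
    print('ValueError:', exc)
```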
In [2]:
import numpy as np
array_special_nums = np.array([0.577, 1.618, 2, 2.718, 3.14, 6, 28, 37, 1729])
splitted_array = np.array_split(array_special_nums, 5)
print('Before indexing\n', splitted_array)
print('After indexing\n', splitted_array[0:2])
print('After splitting\n', splitted_array[2:5])
print('After splitting\n', splitted_array[1])
print('After splitting\n', splitted_array[2])
Before indexing
[array([0.577, 1.618]), array([2. , 2.718]), array([3.14, 6. ]), array([28., 37.]), array([1729.])]
After indexing
[array([0.577, 1.618]), array([2. , 2.718])]
After splitting
[array([3.14, 6. ]), array([28., 37.]), array([1729.])]
After splitting
[2. 2.718]
After splitting
[3.14 6. ]
Copy of array
# With assignment
array_old = np.array([0, 1, 1, 2, 3, 5, 8, 13, 21, 34])
new_array = array_old
print('The old array is', array_old, 'and the id of the old array is', id(array_old))
print('The new array is', new_array, 'and the id of the new array is', id(new_array), 'which is same as that of the old array.')
print(id(array_old))
for i in array_old:
    print(i, end=' ')
print()
print(id(new_array))
for i in new_array:
    print(i, end=' ')
The old array is [ 0 1 1 2 3 5 8 13 21 34] and the id of the old array is 2017306300016
The new array is [ 0 1 1 2 3 5 8 13 21 34] and the id of the new array is 2017306300016 which is same as that of the old array.
2017306300016
0 1 1 2 3 5 8 13 21 34
2017306300016
0 1 1 2 3 5 8 13 21 34
In [20]:
The old array is [ 0 1 1 2 3 5 8 13 21 34] and the id of the old array is 2017306299056
The new array is [ 0 1 1 2 3 5 8 13 21 34] and the id of the new array is 2017306299248 which is different from that of the old array.
In [21]:
The old array is [ 0 1 1 2 3 5 8 13 21 34] and the id of the old array is 2017306297904
The new array is [ 0 1 1 2 3 5 8 13 21 34] and the id of the new array is 2017306296944 which is different from that of the old array.
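The code for the two cells above is missing. Given the differing ids, they presumably used view() and copy(), though this is an assumption. A different id alone does not show whether the data is shared. A view is a new array object over the same buffer, while a copy has its own buffer. A sketch:

```python
import numpy as np

array_old = np.array([0, 1, 1, 2, 3, 5, 8, 13, 21, 34])
as_view = array_old.view()  # new array object, same data buffer
as_copy = array_old.copy()  # new array object, new data buffer

array_old[0] = 99           # change the original after both were made
print(as_view[0], as_copy[0])                # 99 0
print(np.shares_memory(array_old, as_view))  # True
print(np.shares_memory(array_old, as_copy))  # False
```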
np.where(array_name == element_to_search)
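The np.where() cells are empty in this export. With one argument, np.where() returns a tuple holding one index array per dimension. With three arguments, it chooses element-wise between two values. In the sketch below, the labels example is illustrative:

```python
import numpy as np

fibonacci_nums = np.array([0, 1, 1, 2, 3, 5, 8, 13, 21, 34])

positions = np.where(fibonacci_nums == 1)  # tuple: one index array per dimension
print(positions[0])                        # indices 1 and 2

labels = np.where(fibonacci_nums % 2 == 0, 'even', 'odd')  # element-wise choice
print(labels)
```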
np.searchsorted(name_array, value)
In [28]:
fibonacci_nums = np.array([0, 1, 1, 2, 3, 5, 8, 13, 21, 34])
new_array = np.searchsorted(fibonacci_nums, 2)
print(f' The element 2 in fibonacci numbers is in index {new_array}.')
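np.searchsorted() performs a binary search on an already sorted array. It returns the index at which the value would be inserted to keep the order, so it returns an index even for values that are not present. A sketch:

```python
import numpy as np

fibonacci_nums = np.array([0, 1, 1, 2, 3, 5, 8, 13, 21, 34])

index_of_two = np.searchsorted(fibonacci_nums, 2)         # 3
left = np.searchsorted(fibonacci_nums, 1)                 # 1: before the first 1
right = np.searchsorted(fibonacci_nums, 1, side='right')  # 3: just past the last 1
missing = np.searchsorted(fibonacci_nums, 4)              # 5: 4 is absent, would go before 5

print(index_of_two, left, right, missing)
```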
Sorting
np.sort(name_array)
In [29]:
Before sorting
[21 34 0 2 5 3 8 1 1 13]
After sorting
[ 0 1 1 2 3 5 8 13 21 34]
In [44]:
Before sorting
['Banana' 'Orange' 'Erdberry' 'Apple' 'Pineapple' 'Kiwi']
After sorting
['Apple' 'Banana' 'Erdberry' 'Kiwi' 'Orange' 'Pineapple']
In [47]:
Before sorting
[[21 34 0]
[ 2 5 3]
[ 8 1 1]]
After sorting
[[ 0 21 34]
[ 2 3 5]
[ 1 1 8]]
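The sorting cells above have no code in this export. The sketch below reproduces the 2D result, with values taken from the printed output, and also shows the axis options and np.argsort():

```python
import numpy as np

grid = np.array([[21, 34, 0], [2, 5, 3], [8, 1, 1]])

by_row = np.sort(grid)             # default axis=-1: each row sorted
by_column = np.sort(grid, axis=0)  # each column sorted
flat = np.sort(grid, axis=None)    # flattened, then sorted
order = np.argsort(np.array([21, 34, 0, 2]))  # indices that would sort the array

print(by_row)
print(by_column)
print(flat)
print(order)
```

np.sort() returns a sorted copy and leaves grid unchanged; calling grid.sort() instead sorts in place.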
Statistics
np.mean(array)
np.max(array)
np.min(array)
np.sum(array)
np.std(array)
np.var(array)
np.median(array)
Mathematical functions
In [59]:
print(np.pi)
print(np.e)
print(np.nan)
print(np.inf)
print(-np.inf)
3.141592653589793
2.718281828459045
nan
inf
-inf
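NaN and infinity behave differently from ordinary numbers, so NumPy offers dedicated tests for them. A short sketch with illustrative values:

```python
import numpy as np

values = np.array([1.0, np.nan, np.inf, -np.inf])

nan_equal = np.nan == np.nan  # False: NaN never equals anything, itself included
print(nan_equal)
print(np.isnan(values))       # True only at the NaN position
print(np.isinf(values))       # True at +inf and -inf
print(np.nanmax(np.array([1.0, np.nan, 3.0])))  # 3.0, the NaN is ignored
```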
In [81]:
[[ 1 2 3]
[11 12 13]
[14 15 16]]
2
(3, 3)
9
In [93]:
# Accessing
print(numpy_array[0, 0])
print(numpy_array[0, 1])
print(numpy_array[0, 2])
print(numpy_array[1, 0])
print(numpy_array[1, 1])
print(numpy_array[1, 2])
print(numpy_array[2, 0])
print(numpy_array[2, 1])
print(numpy_array[2, 2])
print(numpy_array[0][0:3])
print(numpy_array[1][1:3])
print(numpy_array[2][0:2])
1
2
3
11
12
13
14
15
16
[1 2 3]
[12 13]
[14 15]
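Single-bracket indexing with slices does the same job in one step. It can also select columns, which the chained form numpy_array[i][j] cannot do directly. A sketch using the 3 x 3 array printed earlier:

```python
import numpy as np

numpy_array = np.array([[1, 2, 3], [11, 12, 13], [14, 15, 16]])

print(numpy_array[1, 1:3])   # same as numpy_array[1][1:3]: [12 13]
print(numpy_array[:, 0])     # first column: [ 1 11 14]
print(numpy_array[0:2, 1:])  # top-right 2 x 2 block
```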