B14_LT2_07_Numpy Matplotlib Pandas
NUMPY, MATPLOTLIB, PANDAS
Lê Ngọc Hiếu
hieu.ln@ou.edu.vn
Objectives
• Understand the concepts of the NUMPY, MATPLOTLIB and PANDAS libraries and the operations they provide.
• Master the operations of the NUMPY, MATPLOTLIB and PANDAS libraries.
• Apply the NUMPY, MATPLOTLIB and PANDAS libraries to data processing and data analysis problems.

Contents
1. Numpy Package
2. Matplotlib Package
3. Pandas Package
4. Exercises
Numpy Package

What is Numpy?
• NumPy is a Python library used for working with arrays.
• It also has functions for working in the domain of linear algebra, Fourier transform, and matrices.
• NumPy was created in 2005 by Travis Oliphant. It is an open
source project and you can use it freely.
• NumPy stands for Numerical Python.
Why use Numpy?
• In Python we have lists that serve the purpose of arrays,
but they are slow to process.
• NumPy aims to provide an array object that is up to 50x
faster than traditional Python lists.
• The array object in NumPy is called ndarray; it provides a lot of supporting functions that make working with ndarray very easy.
• Arrays are very frequently used in data science, where
speed and resources are very important.

Install and Start Using Numpy
• Already included in Anaconda.
• If you wish to install Numpy, open a Command Prompt window (CMD) and type: pip install numpy
• To use numpy, import the numpy package before using its functions.
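A minimal sketch of the import step. The np alias is the usual convention rather than a requirement, and printing np.__version__ is just one way to confirm the installation works.

```python
# Import NumPy under its conventional alias
import numpy as np

# Confirm the package is available by printing its version string
print(np.__version__)

# Create a small array to check that basic functionality works
arr = np.array([1, 2, 3])
print(arr)  # [1 2 3]
```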

Numpy Array
• A numpy array is a grid of values, all of the same type, and
is indexed by a tuple of nonnegative integers.
• The number of dimensions is the rank of the array.
• Syntax to get the rank of a numpy array:
<array_name>.ndim
• The shape of an array is a tuple of integers giving the size
of the array along each dimension.
• Syntax to get the shape of a numpy array:
<array_name>.shape
• We can initialize numpy arrays from nested Python lists.
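A short sketch of initializing an array from nested lists and reading its rank and shape; the values are arbitrary.

```python
import numpy as np

# Initialize a 2-D array (rank 2) from a nested Python list
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.ndim)   # 2 -> number of dimensions (rank)
print(a.shape)  # (2, 3) -> 2 rows, 3 columns
print(a[0, 2])  # 3 -> elements are indexed by a tuple of non-negative integers
```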

Create Numpy Array
• There are several ways to create Numpy arrays.
• Consider three ways:
• Convert from a Python list or tuple using the array function.
• Create an array filled with initial values using the ones or zeros function.
• Create a sequence of numbers using the arange or linspace function.

Create Numpy Array using array function
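A minimal sketch of the array function applied to a list, a tuple and a nested list (the slide's own example is not preserved; these values are illustrative).

```python
import numpy as np

# Convert a Python list into a 1-D ndarray
from_list = np.array([1, 2, 3, 4])

# Convert a Python tuple; the result is also an ndarray
from_tuple = np.array((5.0, 6.0, 7.0))

# Nested lists become multidimensional arrays
matrix = np.array([[1, 2], [3, 4]])

print(type(from_list))   # <class 'numpy.ndarray'>
print(from_tuple.dtype)  # float64
print(matrix.shape)      # (2, 2)
```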

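Create Numpy Array using ones or zeros function
Syntax:
• <var_name> = np.zeros(shape)
• <var_name> = np.ones(shape)
• shape is a tuple giving the size of each dimension, e.g. (nrows, ncolumns) for a 2-D array or (depth, nrows, ncolumns) for a 3-D array.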
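A small sketch of zeros and ones with 2-D and 3-D shape tuples; the sizes are arbitrary.

```python
import numpy as np

# 2-D array of zeros with 2 rows and 3 columns (float64 by default)
z = np.zeros((2, 3))

# 3-D array of ones: 2 blocks, each with 3 rows and 4 columns
o = np.ones((2, 3, 4))

print(z)
print(o.shape)  # (2, 3, 4)
print(o.ndim)   # 3
```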

Create Numpy array using arange or linspace function
Syntax:
• <var_name> = np.arange(start, end, step)
• <var_name> = np.linspace(start, end, number_of_elements)
• arange stops before end; linspace includes end by default.
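A quick comparison of the two functions, showing that arange excludes the end value while linspace includes it.

```python
import numpy as np

# arange: values from start up to (but not including) stop, spaced by step
a = np.arange(0, 10, 2)
print(a)  # [0 2 4 6 8]

# linspace: a fixed number of evenly spaced values; stop IS included by default
b = np.linspace(0, 1, 5)
print(b)  # [0.   0.25 0.5  0.75 1.  ]
```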

Array Indexing - Slicing
• Similar to Python lists, numpy arrays can be sliced.
• Since arrays may be multidimensional, you can specify a slice for each dimension of the array; trailing dimensions that are left out are taken whole.
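A sketch of multidimensional slicing on an arbitrary 3x4 array. It also shows that a basic slice is a view, so writing to it changes the original array.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

# One slice per dimension: rows 0-1, columns 1-2
b = a[:2, 1:3]
print(b)
# [[2 3]
#  [6 7]]

# Omitted trailing dimensions are taken whole: a[1] is the entire second row
print(a[1])  # [5 6 7 8]

# A basic slice is a view of the original data, so writing to it changes a
b[0, 0] = 77
print(a[0, 1])  # 77
```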
Array Indexing - Integer array indexing

Integer array indexing allows you to construct arbitrary arrays using the data from another array.
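A minimal example of integer array indexing. One integer array is passed per dimension, and the pairs of indices select individual elements.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# Pick elements (0,0), (1,1) and (2,0) using two integer index arrays
picked = a[[0, 1, 2], [0, 1, 0]]
print(picked)  # [1 4 5]

# Equivalent result built element by element
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))  # [1 4 5]

# The same element can be selected more than once
repeated = a[[0, 0], [1, 1]]
print(repeated)  # [2 2]
```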

NumPy Data Types
• Basic Data Types in Python:
• strings - used to represent text data; the text is given in quote marks, e.g. "ABCD"
• integer - used to represent integer numbers, e.g. -1, -2, -3
• float - used to represent real numbers, e.g. 1.2, 42.42
• boolean - used to represent True or False.
• complex - used to represent complex numbers, e.g. 1.0 + 2.0j, 1.5 + 2.5j

NumPy Data Types
• NumPy has some extra data types and refers to each data type with a one-character code:
• i - integer
• b - boolean
• u - unsigned integer
• f - float
• c - complex float
• m - timedelta
• M - datetime
• O - object
• S - string
• U - unicode string
NumPy Data Types
• Checking the Data Type of an Array:
• The NumPy array object has a property called dtype that returns the data type of the array.
• Creating Arrays With a Defined Data Type:
• The array() function can take an optional argument dtype that allows us to define the expected data type of the array elements.
• Converting Data Type on Existing Arrays:
• The astype() method creates a copy of the array and allows you to specify the data type as a parameter.
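A sketch of the three operations above, using the one-character codes listed earlier ('i', 'S', 'b'). The array values are illustrative only.

```python
import numpy as np

# Checking the data type: dtype.kind gives the one-character category code
arr = np.array([1, 2, 3])
print(arr.dtype.kind)  # i

# Creating an array with a defined data type ('S' = byte string)
s = np.array([1, 2, 3], dtype='S')
print(s)  # [b'1' b'2' b'3']

# Converting an existing array: astype() returns a converted copy
floats = np.array([1.7, 2.1, 3.9])
ints = floats.astype('i')                  # 'i' -> int32; truncates toward zero
bools = np.array([1, 0, 3]).astype(bool)   # nonzero -> True
print(ints, ints.dtype)  # [1 2 3] int32
print(bools)             # [ True False  True]
print(floats)            # original array is unchanged
```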
NumPy Array Copy vs View
• The main difference between a copy and a view of an array
is that the copy is a new array, and the view is just a view
of the original array.

• The copy owns the data: any changes made to the copy will not affect the original array, and any changes made to the original array will not affect the copy.

• The view does not own the data and any changes made to
the view will affect the original array, and any changes
made to the original array will affect the view.
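A minimal sketch of the copy/view difference. The base attribute is shown as one way to check whether an array owns its data.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
c = x.copy()  # new array that owns its data
v = x.view()  # shares data with x

x[0] = 42

print(c)  # [1 2 3 4 5]      -> copy unaffected
print(v)  # [42  2  3  4  5] -> view reflects the change

# base is None for an array that owns its data, and points to the owner for a view
print(c.base is None)  # True
print(v.base is x)     # True
```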
NumPy Array Reshaping
• Reshaping means changing the shape of an array.

• The shape of an array is the number of elements in each dimension.

• By reshaping we can add or remove dimensions or change the number of elements in each dimension.
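A short reshape sketch on 12 elements. The -1 placeholder (NumPy infers that dimension) is an extra feature not mentioned on the slide.

```python
import numpy as np

arr = np.arange(1, 13)  # 1-D array with 12 elements

# 1-D -> 2-D: 4 rows of 3 elements
two_d = arr.reshape(4, 3)

# 1-D -> 3-D: 2 blocks of 3 rows x 2 columns
three_d = arr.reshape(2, 3, 2)

# -1 lets NumPy infer that dimension; reshape(-1) flattens back to 1-D
flat = three_d.reshape(-1)

print(two_d.shape, three_d.shape, flat.shape)  # (4, 3) (2, 3, 2) (12,)

# The total number of elements must stay the same: arr.reshape(5, 3) raises ValueError
```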

NumPy Array Iterating – Use for loop
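A minimal sketch of iterating with for loops. Looping over a 2-D array yields its rows, so nested loops are needed to reach each scalar.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# A for loop over a 2-D array yields its rows (1-D arrays)
for row in arr:
    print(row)

# Nested loops reach the individual scalar elements
total = 0
for row in arr:
    for value in row:
        total += value
print(total)  # 21
```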

NumPy Joining Array

NumPy Joining Array - concatenate
• Concatenation refers to joining. This function is used to join two or more arrays that have the same shape, except along the specified axis.
• Syntax: numpy.concatenate((array1, array2, ...), axis)
• If axis is not explicitly passed, it is taken as 0.
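A small sketch of concatenate with the default axis and with axis=1; the 2x2 inputs are arbitrary.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# axis omitted -> 0: rows of b follow rows of a, result shape (4, 2)
rows = np.concatenate((a, b))

# axis=1: columns placed side by side, result shape (2, 4)
cols = np.concatenate((a, b), axis=1)

print(rows)
print(cols)
```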

NumPy Joining Array - stack
• This function joins a sequence of arrays, which must all have the same shape, along a new axis.
• Syntax: numpy.stack(arrays, axis)
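A sketch showing that stack creates a new axis, unlike concatenate; the position of that axis depends on axis.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Two 1-D arrays of length 3 become one 2-D array
s0 = np.stack((a, b), axis=0)  # new first axis  -> shape (2, 3)
s1 = np.stack((a, b), axis=1)  # new second axis -> shape (3, 2)

print(s0)
print(s1)
```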

NumPy Joining Array - hstack
• A variant of the numpy.stack function that stacks arrays horizontally (column-wise) to make a single array.
• Syntax: numpy.hstack((array1, array2, …, arrayn))

NumPy Joining Array - vstack
• A variant of the numpy.stack function that stacks arrays vertically (row-wise) to make a single array.
• Syntax: numpy.vstack((array1, array2, …, arrayn))
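A side-by-side sketch of hstack and vstack. Both take a single tuple (or list) of arrays rather than separate positional arguments.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# hstack: join horizontally; for 1-D inputs the arrays are placed end to end
h = np.hstack((a, b))
print(h)  # [1 2 3 4 5 6]

# vstack: join vertically, stacking the arrays as rows
v = np.vstack((a, b))
print(v)
# [[1 2 3]
#  [4 5 6]]
```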

NumPy Splitting Array - numpy.split
• Syntax: numpy.split(array, indices_or_sections, axis)
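A sketch of the two forms of indices_or_sections (an integer number of equal parts, or a list of split points), plus a split along axis=1.

```python
import numpy as np

arr = np.arange(1, 7)  # [1 2 3 4 5 6]

# An integer splits into that many equal parts (it must divide the length evenly)
parts = np.split(arr, 3)
print(parts)  # [array([1, 2]), array([3, 4]), array([5, 6])]

# A list of indices gives split points: arr[:2], arr[2:5], arr[5:]
pieces = np.split(arr, [2, 5])

# Splitting a 2-D array along axis=1 (columns)
m = np.arange(12).reshape(3, 4)
left, right = np.split(m, 2, axis=1)
print(left.shape, right.shape)  # (3, 2) (3, 2)
```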

NumPy Searching Arrays - where() method
• The where() function searches an array for a certain value and returns the indexes that get a match.
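A minimal sketch of np.where with a single condition. It returns a tuple with one index array per dimension.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])

# Indexes where the value equals 4
idx = np.where(arr == 4)
print(idx)  # (array([3, 5, 6]),)

# Any boolean condition works, e.g. even values
even_idx = np.where(arr % 2 == 0)[0]
print(even_idx)  # [1 3 5 6]
```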

NumPy Searching Arrays - searchsorted() method
• The searchsorted() method performs a binary search in a sorted array and returns the index where the specified value would be inserted to maintain the sort order.
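A short sketch of searchsorted on an already-sorted array. The side parameter chooses between the leftmost and rightmost insertion point.

```python
import numpy as np

# searchsorted assumes the array is already sorted in ascending order
arr = np.array([6, 7, 8, 9])

left = np.searchsorted(arr, 7)                 # leftmost insertion point
right = np.searchsorted(arr, 7, side='right')  # rightmost insertion point
many = np.searchsorted(arr, [2, 4, 10])        # several values at once

print(left, right)  # 1 2
print(many)         # [0 0 4]
```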

NumPy Sorting Arrays
• Syntax: numpy.sort(a, axis=-1, kind=None, order=None)
• a: Array to be sorted.
• axis: int or None, optional. Axis along which to sort. If None, the array is flattened before sorting. The default is -1, which sorts along the last axis.
• kind: {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional. Sorting algorithm; the default is 'quicksort'.
• order: str or list of str, optional. When a is an array with fields defined, this argument specifies which fields to compare first, second, etc.
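A sketch of the axis and order parameters. The structured array of names and ages is hypothetical data used only to show order.

```python
import numpy as np

a = np.array([[3, 7, 2],
              [9, 1, 8]])

by_row = np.sort(a)             # axis=-1: each row sorted
by_col = np.sort(a, axis=0)     # each column sorted
flat = np.sort(a, axis=None)    # flattened, then sorted
print(by_row)  # [[2 3 7] [1 8 9]]
print(by_col)  # [[3 1 2] [9 7 8]]
print(flat)    # [1 2 3 7 8 9]

# order: sort a structured array by a named field (hypothetical data)
dtype = [('name', 'U10'), ('age', int)]
people = np.array([('Anna', 30), ('Binh', 25), ('Chi', 35)], dtype=dtype)
by_age = np.sort(people, order='age')
print(by_age['name'])  # ['Binh' 'Anna' 'Chi']
```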

Numpy Array Math
Basic Elementwise Operators
• Elementwise sum: + or numpy.add
• Elementwise difference: - or numpy.subtract
• Elementwise product: * or numpy.multiply
• Elementwise division: / or numpy.divide
• Elementwise square root: numpy.sqrt
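A minimal sketch of the elementwise operators on two small float arrays. Each operator and its function form produce the same result, and * is not a matrix product.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

total = x + y        # same as np.add(x, y)
diff = x - y         # same as np.subtract(x, y)
prod = x * y         # same as np.multiply(x, y): elementwise, NOT a matrix product
quot = x / y         # same as np.divide(x, y)
roots = np.sqrt(x)   # elementwise square root

print(total)  # [[ 6.  8.] [10. 12.]]
print(prod)   # [[ 5. 12.] [21. 32.]]
```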

Numpy Array Math
• Inner products of vectors
• Multiply a vector by a matrix
• Multiply matrices
• Syntax: given matrices/vectors a and b:
• numpy.dot(a, b)
• a.dot(b)
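A sketch of the three uses of dot listed above. The arithmetic is written out in the comments so the results can be checked by hand.

```python
import numpy as np

v = np.array([9, 10])
w = np.array([11, 12])
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

inner = v.dot(w)       # inner product: 9*11 + 10*12 = 219
mat_vec = np.dot(x, v) # [1*9 + 2*10, 3*9 + 4*10] = [29 67]
mat_mat = x.dot(y)     # [[19 22] [43 50]]

print(inner, mat_vec, mat_mat)
```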

Full list of Numpy mathematical functions
• Link: https://numpy.org/doc/stable/reference/routines.math.html
• Categories:
• Trigonometric functions
• Hyperbolic functions
• Rounding
• Sums, products, differences
• Exponents and logarithms
• Other special functions
• Floating point routines
• Rational routines
• Arithmetic operations
• Handling complex numbers
• Extrema Finding
• Miscellaneous

Numpy - Broadcasting
• The term broadcasting refers to the ability of NumPy to
treat arrays of different shapes during arithmetic
operations.
• Element-to-element operations normally require the two arrays to have the same shape.
• However, operations on arrays of dissimilar shapes are still possible in NumPy because of its broadcasting capability.
• The smaller array is broadcast across the larger array so that they have compatible shapes: comparing dimensions from the last one backwards, each pair of sizes must be equal or one of them must be 1.
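A minimal broadcasting sketch. A 1-D row is added to every row of a 2-D array, and an incompatible pair of shapes raises ValueError. The values are arbitrary.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)
v = np.array([10, 20, 30])  # shape (3,)

# v is broadcast across each row of x: trailing sizes 3 and 3 match
result = x + v
print(result)
# [[11 22 33]
#  [14 25 36]]

# A scalar is broadcast to every element
doubled = x * 2

# (2, 3) and (2,) are NOT compatible: trailing sizes 3 and 2 differ
failed = False
try:
    x + np.array([1, 2])
except ValueError as err:
    failed = True
    print("broadcast error:", err)
```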


Figure from: https://www.tutorialspoint.com/numpy/numpy_broadcasting.htm

Matplotlib Package

Reference:
https://www.w3schools.com/python/matplotlib_intr
What is Matplotlib?
• Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility.

• Matplotlib was created by John D. Hunter.

• Matplotlib is open source, and we can use it freely.

• Matplotlib is mostly written in Python; a few segments are written in C, Objective-C and JavaScript for platform compatibility.
Install Matplotlib
• Already included in Anaconda.
• If you wish to install Matplotlib, open a Command Prompt window (CMD) and type: pip install matplotlib
• To use Matplotlib, import the Matplotlib package before using its functions.

Matplotlib Pyplot
• Most of the Matplotlib utilities lie under the pyplot submodule, and are usually imported under the plt alias:
• Syntax: import matplotlib.pyplot as plt

Basic Plot Types
• Line plot
• Scatter plot

Line plot
• The plot() function is used to draw points (markers) in a
diagram.
• By default, the plot() function draws a line from point to
point.
• Basic syntax: plt.plot(xpoints, ypoints)
• xpoints is an array containing the points on the x-axis.
• ypoints is an array containing the points on the y-axis.
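A minimal line-plot sketch. It assumes the non-interactive Agg backend and saves to an in-memory buffer so it runs without a display; in a notebook or desktop session you would normally call plt.show() instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

xpoints = np.array([0, 6])
ypoints = np.array([0, 250])

# Draws a straight line from (0, 0) to (6, 250)
lines = plt.plot(xpoints, ypoints)

# Render the figure into memory (use plt.show() in an interactive session)
buf = io.BytesIO()
plt.savefig(buf, format="png")
```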

plot() function with default X-Points
• If the points in the x-axis are not specified, they will get the
default values 0, 1, 2, 3, …

Plotting Options
• All options:
https://matplotlib.org/2.1.2/api/_as_gen/matplotlib.pyplot.plot.html

Plot Label and Title
• To set a label for the x- and y-axis:
• xlabel()
• ylabel()
• To set a title for the plot:
• title()
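A sketch combining the default x-points with axis labels and a title. The label and title strings are illustrative, and the Agg backend is assumed so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

ypoints = np.array([3, 8, 1, 10])

# No x values are given, so x defaults to 0, 1, 2, 3
plt.plot(ypoints)

plt.xlabel("Index")
plt.ylabel("Value")
plt.title("Default x-points")

ax = plt.gca()
print(ax.get_xlabel(), ax.get_ylabel(), ax.get_title())
```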

Legends

Legend Position

Location String    Location Code
'best'             0
'upper right'      1
'upper left'       2
'lower left'       3
'lower right'      4
'right'            5
'center left'      6
'center right'     7
'lower center'     8
'upper center'     9
'center'           10
Legend Position - bbox_to_anchor
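A sketch of positioning a legend with a location string from the table and bbox_to_anchor. The sine/cosine data and the anchor point (1.02, 1) are assumptions chosen for illustration; with loc='upper left', that anchor places the legend just outside the right edge of the axes.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")

# loc names which corner of the legend box is placed at bbox_to_anchor;
# (1.02, 1) is in axes coordinates, i.e. slightly right of the top-right corner
legend = ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1))

# bbox_inches="tight" keeps the outside legend inside the saved image
buf = io.BytesIO()
fig.savefig(buf, format="png", bbox_inches="tight")
```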

Scatter Plots

Customizing Markers in Scatter Plots
• Four main features of the markers used in a scatter plot
that can be customized:
• Size
• Color
• Shape (https://matplotlib.org/stable/api/markers_api.html#module-matplotlib.markers)
• Transparency
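A sketch that sets the four marker features through plt.scatter's s, c, marker and alpha parameters. The random points (fixed seed) and the chosen values are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(30)
y = rng.random(30)

points = plt.scatter(
    x, y,
    s=80,          # size: marker area in points^2
    c="tab:red",   # color
    marker="^",    # shape: triangle (see the matplotlib.markers link for all codes)
    alpha=0.5,     # transparency: 0 = invisible, 1 = opaque
)
```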

ColorMap
Available ColorMaps:
https://www.w3schools.com/python/matplotlib_scatter.asp

plt.scatter(x, y, c=colors, cmap='Accent')

plt.scatter(x, y, c=colors, cmap='Blues')
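A self-contained version of the colormap calls, with a colorbar showing how values map to colors. The x, y and colors arrays are assumptions, since the slides' data is not preserved.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
y = x ** 2
colors = np.linspace(0, 100, 10)  # one numeric value per point

# Each value in colors is mapped to a color of the 'Blues' colormap
points = plt.scatter(x, y, c=colors, cmap="Blues")
plt.colorbar(points)  # show the value-to-color scale
```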

Bar Plot

Matplotlib Multiple Bar Chart

Create Multiple Bar Chart
• Syntax: plt.bar(x, height, width=0.8, bottom=None, *, align='center', data=None, **kwargs)
• The parameters are defined below:
• x: the x-coordinates of the bars.
• height: the height(s) of the bars.
• width: the width(s) of the bars (default 0.8).
• bottom: the y-coordinate(s) of the bases of the bars (default 0).
• align: alignment of the bars to the x-coordinates: 'center' centers each bar on x, 'edge' aligns its left edge with x.
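A sketch of a multiple (grouped) bar chart. It shifts each series by half a bar width so the bars sit side by side; the categories and sales figures are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

labels = ["A", "B", "C", "D"]
sales_2022 = [20, 34, 30, 35]  # hypothetical data
sales_2023 = [25, 32, 34, 20]  # hypothetical data

x = np.arange(len(labels))  # one slot per category
width = 0.35                # width of each bar

# Offset the two series left and right of each slot's center
bars1 = plt.bar(x - width / 2, sales_2022, width=width, label="2022")
bars2 = plt.bar(x + width / 2, sales_2023, width=width, label="2023")

plt.xticks(x, labels)  # category names under each group
plt.legend()
```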

Matplotlib Histograms

A histogram is a graph showing frequency distributions.
Syntax to create a histogram plot:
matplotlib.pyplot.hist(x, bins=None, range=None, density=False,
weights=None, cumulative=False, bottom=None, histtype='bar',
align='mid', orientation='vertical', rwidth=None, log=False, color=None,
label=None, stacked=False, *, data=None, **kwargs)
Matplotlib Histograms - Options
• bins:
• If bins is an integer, it defines the number of equal-width bins in the range.
• If bins is a sequence, it defines the bin edges, including the left edge of the first bin and the right edge of the last bin; in this case, bins may be unequally spaced. All but the last (right-hand-most) bin are half-open.
• Example: if bins is [1, 2, 3, 4], then the first bin is [1, 2) (including 1 but excluding 2) and the second is [2, 3). The last bin is [3, 4], which includes 4.
• rwidth (default: None):
• The relative width of the bars as a fraction of the bin width. If None, the width is computed automatically.
plt.hist(commutes, bins=10, edgecolor='black')
plt.hist(commutes, bins=20, edgecolor='black')
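A self-contained histogram sketch with integer and sequence bins. The commutes data on the slides is not shown, so normally distributed commute times (fixed seed) stand in for it as an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical commute times in minutes
rng = np.random.default_rng(42)
commutes = rng.normal(loc=30, scale=8, size=200)

# bins as an integer: 10 equal-width bins spanning the data range
counts, edges, patches = plt.hist(commutes, bins=10, edgecolor="black")

# bins as a sequence: explicit, unequally spaced edges; rwidth leaves gaps between bars
plt.figure()
counts2, edges2, _ = plt.hist(commutes, bins=[0, 20, 30, 40, 80], rwidth=0.9)
```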
Matplotlib Pie Charts
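A minimal pie chart sketch (the slides' own examples are not preserved). Each wedge's size is its value divided by the sum of all values; the fruit labels and values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

y = np.array([35, 25, 25, 15])  # hypothetical values
labels = ["Apples", "Bananas", "Cherries", "Dates"]

# autopct prints each wedge's percentage; startangle rotates the first wedge
wedges, texts, autotexts = plt.pie(y, labels=labels, autopct="%1.1f%%", startangle=90)
plt.legend(title="Fruits")
```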

Pandas Package

Introduction

Pandas is a Python library.

Pandas is used to analyze data.
What is Pandas?
• Pandas is a Python library used for working with data sets.

• It has functions for analyzing, cleaning, exploring, and manipulating data.

• The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008.

Why Use Pandas?
• Pandas allows us to analyze big data and draw conclusions based on statistical theories.

• Pandas can clean messy data sets and make them readable and relevant.

• Relevant data is very important in data science.

What Can Pandas Do?
• Pandas gives you answers about the data.

• For example:
• Is there a correlation between two or more columns?
• What is the average value?
• What is the max value?
• What is the min value?
• Pandas is also able to delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
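A sketch answering the questions above on a small hypothetical DataFrame with one missing value, then cleaning it with dropna().

```python
import pandas as pd

# Small hypothetical dataset; None becomes NaN (an empty value)
df = pd.DataFrame({
    "Duration": [60, 60, 45, 45, 60],
    "Pulse": [110, 117, 103, 109, None],
    "Calories": [409.1, 479.0, 340.0, 282.4, 406.0],
})

print(df.corr())              # pairwise correlation between columns
print(df["Calories"].mean())  # average value
print(df["Calories"].max())   # max value
print(df["Calories"].min())   # min value

# Cleaning: drop rows that contain empty (NaN) values
clean = df.dropna()
print(len(clean))  # 4
```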

Pandas Getting Started
• Install Pandas: pip install pandas
• Import Pandas:
• import pandas
• import pandas as pd

Pandas Series
• A Pandas Series is like a column in a table.
• It is a one-dimensional array holding data of any type.
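A short sketch of creating a Series. It also shows that one Series can hold values of different types, stored with the generic object dtype; the values are illustrative.

```python
import pandas as pd

# A Series built from a list of numbers
numbers = pd.Series([1, 7, 2])

# A Series can hold data of any type; mixed values are stored with dtype object
mixed = pd.Series([1, "two", 3.0, True])

print(numbers.ndim)  # 1 -> one-dimensional
print(mixed.dtype)   # object
```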

Pandas Series - Labels
• If nothing else is specified, the values are labeled with their
index number.
• First value has index 0, second value has index 1 etc.
• This label can be used to access a specified value.

Output: 7
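A minimal sketch of accessing a value by its default label. The data [1, 7, 2] is an assumption chosen so that the lookup produces the output 7 shown above.

```python
import pandas as pd

# Hypothetical data: the value at default label 1 is 7
myvar = pd.Series([1, 7, 2])

# Default labels are 0, 1, 2, ...
print(myvar[1])  # 7
```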

Pandas Series - Create Labels
• With the index argument, you can name your own labels.

Output: 7
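The same lookup with custom labels passed through the index argument. The labels "x", "y", "z" and the data are assumptions.

```python
import pandas as pd

a = [1, 7, 2]
myvar = pd.Series(a, index=["x", "y", "z"])

# Access a value by its custom label
print(myvar["y"])  # 7
```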

Key/Value Objects as Series
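A sketch of building a Series from a dictionary, where the keys become the labels. The calorie values are hypothetical.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# Dictionary keys become the labels, values become the data
myvar = pd.Series(calories)
print(myvar["day2"])  # 380

# The index argument selects (and orders) only the listed keys
subset = pd.Series(calories, index=["day1", "day2"])
print(len(subset))  # 2
```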

Pandas DataFrames
• A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.
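A minimal DataFrame sketch built from a dictionary of equal-length lists; the data is hypothetical.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}

# Each dictionary key becomes a column; rows get the default labels 0, 1, 2
df = pd.DataFrame(data)
print(df)
print(df.shape)  # (3, 2) -> 3 rows, 2 columns
```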

Access to DataFrame Elements
• Syntax: <dataframe>.loc[row_index, column_index] (the chained form <dataframe>.loc[row_index][column_index] also works for reading values)
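A sketch of loc on a small hypothetical DataFrame. It selects a row, a list of rows and a single element, and shows the chained form.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

row0 = df.loc[0]                # row 0 as a Series
rows = df.loc[[0, 1]]           # rows 0 and 1 as a DataFrame
value = df.loc[1, "duration"]   # single element: 40

# The chained form gives the same value when reading
chained = df.loc[1]["duration"]

print(value, chained)  # 40 40
```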

Named Indexes
• With the index argument, you can name your own indexes.
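A sketch of named row indexes; the names "day1" to "day3" and the data are hypothetical. Once rows are named, loc uses those names.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}

# Name the rows with the index argument
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# loc now uses the named index
print(df.loc["day2"])
print(df.loc["day2", "calories"])  # 380
```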

Pandas Read CSV
• What are CSV (comma-separated values) files?
• A simple way to store big data sets.
• CSV files contain plain text and are a well-known format that can be read by everyone, including Pandas.

Load the CSV into a DataFrame
• Use the read_csv() function.
• Syntax: pandas.read_csv(csv_filename)

Link to download ‘data.csv’:


https://www.w3schools.com/python/pandas/da
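A self-contained read_csv sketch. An in-memory string stands in for data.csv so it runs without the download. The column names and rows are assumptions; with the real file you would call pd.read_csv("data.csv").

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of a CSV file
csv_text = """Duration,Pulse,Maxpulse,Calories
60,110,130,409.1
60,117,145,479.0
45,103,135,340.0
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())  # first rows
print(df.shape)   # (3, 4)
```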
Pandas Read JSON
• Use the read_json() function.
• Syntax: pandas.read_json(json_filename)

Link to download ‘data.json’:


https://www.w3schools.com/python/pandas/da
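A matching read_json sketch. A hypothetical JSON string wrapped in StringIO stands in for data.json; with the real file you would call pd.read_json("data.json"). Each top-level key becomes a column and each inner key a row label.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of a JSON file
json_text = '{"Duration": {"0": 60, "1": 60, "2": 45}, "Pulse": {"0": 110, "1": 117, "2": 103}}'

df = pd.read_json(io.StringIO(json_text))
print(df)
print(df.shape)  # (3, 2)
```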