Chapter 3 Exploratory Data Analysis

Uploaded by

barnabas


EXPLORATORY DATA ANALYSIS

CHAPTER 3

“Introduction to Data Science: Practical Approach with R and Python”

B.Uma Maheswari and R Sujatha
Copyright @ 2021 Wiley India Pvt. Ltd. All rights reserved.
LEARNING OBJECTIVES
Apply the steps in data pre-processing.
Understand data by inspecting and visualizing it.
Learn the concept of outliers and how to deal with them.
Deal with missing values during data pre-processing.
Understand the concept of standardization.
Apply R and Python programming for data analysis.
DATA SCIENCE PROCESS MODEL
Objectives of EDA
To develop an understanding of the data
To identify trends and patterns
To understand the relationships between variables
To decide on the appropriate models to be executed on the data
To find answers to questions relating to the data
To test assumptions
STEPS IN DATA PRE-PROCESSING
DATASET DESCRIPTION
S.No. Column Name Description
1 phoneno Phone number of the customer
2 age Age of the customer (1->18-30, 2->31-40, 3->41-50, 4->Above 50)
3 gender Gender of the customer (0->Male, 1->Female)
4 zipcode Zip code of the area where the customer lives
5 calls Number of calls made by the customer per month
6 sms Number of SMS sent by the customer per month
7 mms Number of MMS sent by the customer per month
8 charges Monthly charges paid by the customer
9 coverage Number of days out of coverage
10 complaint Type of complaint (0->No problem, 1->Recharge issues, 2->Problems in the offer/package, 3->Network problem, 4->Call dropping)
11 sim Single or dual SIM (0->Single SIM, 1->Dual SIM)
12 phone Type of phone (0->Android, 1->iOS)
13 prepost Prepaid or postpaid (0->Prepaid, 1->Postpaid)
14 churn Customer churn (0->No churn, 1->Churn)
UNDERSTANDING THE DATA

Load the dataset
Dimensions of the data (dim, nrow, ncol, names)
Structure of the dataset
Summary of the dataset
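The steps above (load, dimensions, structure, summary) can be sketched in Python using only the standard library. This is a minimal illustration, not the chapter's own code: the three-row sample and its values are made up, and only the column names are taken from the dataset description.

```python
import csv
import io
import statistics

# Hypothetical three-row sample; column names follow the dataset description,
# the values are invented for illustration only.
SAMPLE_CSV = """phoneno,age,gender,calls,charges,churn
9000000001,1,0,120,350.0,0
9000000002,3,1,45,199.5,1
9000000003,2,0,80,275.0,0
"""

# Load the dataset: each row becomes a dict keyed by column name.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Dimensions of the data (comparable to dim, nrow, ncol, names).
n_rows = len(rows)
column_names = list(rows[0].keys())
n_cols = len(column_names)

# Structure: the csv module reads every field as text, so inspect one raw record.
first_record = rows[0]

# Summary of one numeric column after converting it from text.
calls = [int(r["calls"]) for r in rows]
calls_summary = {
    "min": min(calls),
    "median": statistics.median(calls),
    "mean": statistics.mean(calls),
    "max": max(calls),
}

print("dimensions:", (n_rows, n_cols))
print("names:", column_names)
print("first record:", first_record)
print("calls summary:", calls_summary)
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file object.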
CONTINUOUS AND CATEGORICAL VARIABLES
Continuous variables are quantitative variables that can be measured and can take any of an
infinite number of values within a range. Mean, median and mode can be calculated for
continuous variables. Examples: height, weight, speed of a vehicle.
Categorical variables are variables whose values fall into a finite number of distinct
groups. Examples: gender, pass/fail.
In simple words, if the values of a variable are measured it is a continuous variable, and
if the values can only be counted by category it is a categorical variable.
NORMAL DISTRIBUTION

Line drawing to be drawn
RIGHT SKEWED AND LEFT SKEWED

Line drawing to be drawn

DATA VISUALIZATION
Histogram (continuous variables)
Barplot (categorical variables)
Boxplot (continuous variables)
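To see why a bar plot suits categorical variables and a histogram suits continuous ones, it helps to look at the numbers each plot draws. The sketch below computes them without any plotting library. The sample values and the helper `histogram_counts` are hypothetical, introduced only for this illustration.

```python
from collections import Counter

# Hypothetical values, invented for illustration.
gender = [0, 1, 0, 0, 1, 0]                            # categorical: 0 = Male, 1 = Female
charges = [120.0, 340.5, 275.0, 410.0, 199.5, 305.0]   # continuous

# Bar plot data: one bar per category, bar height = frequency of that category.
bar_heights = Counter(gender)

def histogram_counts(values, n_bins):
    """Count values in n_bins equal-width bins (assumes the values are not all equal)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin instead of opening a new one.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Histogram data: the continuous range is cut into bins, bar height = values per bin.
hist = histogram_counts(charges, n_bins=3)

print("bar heights:", dict(bar_heights))
print("histogram counts:", hist)
```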
BOXPLOT
A box plot provides a good representation of the distribution of quantitative data. It is also known as a box-and-whisker plot. It is used in exploratory data analysis to draw inferences from the data.
A box plot divides the data into quartiles.
The first 25% of the data lies between the minimum value and the start of the box, which is the first quartile (Q1). This segment is drawn as the lower whisker.
The second 25% of the data lies between the start of the box and the median, which is the second quartile (Q2).
The third 25% of the data lies between the median and the end of the box, which is the third quartile (Q3).
The last 25% of the data lies between the end of the box and the maximum value. This segment is drawn as the upper whisker.
The lengths of the whiskers and the position of the median indicate the skewness of the data.
The plot shows the interquartile range (IQR), which is the difference between the 75th and the 25th percentiles (Q3 - Q1).
A box plot also indicates the presence of outliers.
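The quantities a box plot draws (minimum, Q1, median, Q3, maximum and the IQR) can be computed directly. The following is a minimal sketch using Python's `statistics.quantiles`. The sample data is hypothetical, and the `"inclusive"` method is one of several quartile conventions, so other tools may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical sample, chosen so that the quartiles fall on actual data points.
data = [1, 3, 4, 6, 7, 9, 15, 18, 20]

# quantiles(n=4) returns the three cut points Q1, Q2 (median) and Q3.
# method="inclusive" uses the sample minimum and maximum as the 0th and
# 100th percentiles; this convention is an assumption of the sketch.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range: the width of the box.
iqr = q3 - q1

five_number_summary = {
    "min": min(data),
    "Q1": q1,
    "median": q2,
    "Q3": q3,
    "max": max(data),
}

print(five_number_summary)
print("IQR:", iqr)
```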
BOX PLOT AND OUTLIERS
[Figure: box plot showing the minimum value, 1st quartile, median (2nd quartile), 3rd quartile, maximum value, whiskers and outliers]
OUTLIER TREATMENT
[Figure labels: first 25% of the data, second 25% of the data, third 25% of the data, last 25% of the data]
DEALING WITH MISSING VALUES
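Two common ways of handling missing values are dropping the affected records and imputing a replacement value. The sketch below shows both under stated assumptions: the data is hypothetical, `None` marks a missing entry, and median imputation is chosen only as an example, not as the procedure the chapter prescribes.

```python
import statistics

# Hypothetical monthly charges; None marks a missing value.
charges = [350.0, None, 275.0, 199.5, None, 410.0]

# Step 1: find out how much is missing.
observed = [c for c in charges if c is not None]
n_missing = len(charges) - len(observed)

# Option A: drop the records with a missing value.
dropped = list(observed)

# Option B: impute each missing value with the median of the observed values.
median_charge = statistics.median(observed)
imputed = [median_charge if c is None else c for c in charges]

print("missing:", n_missing)
print("after dropping:", dropped)
print("after median imputation:", imputed)
```

Dropping keeps only genuine observations but shrinks the dataset, while imputation keeps every record at the cost of introducing estimated values.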
STANDARDIZING DATA

This process is also called feature scaling.

This is usually done when there are large differences in the range of values across the
columns of a dataset. It ensures that the variables are on the same scale.
This can be done in two ways: normalization and standardization.
Normalization uses the minimum and maximum values of a column, while standardization uses
its mean and standard deviation.
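The two scaling approaches can be written as short functions. This sketch assumes the usual definitions: min-max normalization maps a value x to (x - min) / (max - min), and standardization maps it to the z-score (x - mean) / standard deviation. The function names and sample data are hypothetical, and the population standard deviation is used here as one of two common choices.

```python
import statistics

def min_max_normalize(values):
    """Rescale values to the range [0, 1] using the minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score_standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    # Population standard deviation; statistics.stdev (sample) is the other common choice.
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical number of calls per month.
calls = [10, 20, 30, 40, 50]

normalized = min_max_normalize(calls)
standardized = z_score_standardize(calls)

print("normalized:", normalized)
print("standardized:", [round(z, 3) for z in standardized])
```

Both functions assume the column is not constant, since a constant column would make the denominator zero.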
MEAN
MEDIAN
MODE
VARIANCE AND STANDARD DEVIATION
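The measures named in the headings above can be computed with Python's `statistics` module. This is a small sketch on hypothetical data. It shows both the population and the sample versions of the variance, since which one applies depends on whether the data is the whole population or a sample from it.

```python
import statistics

# Hypothetical number of SMS per month for six customers.
sms = [2, 4, 4, 5, 7, 8]

mean = statistics.mean(sms)      # sum of values divided by their count
median = statistics.median(sms)  # middle value; average of the two middle values here
mode = statistics.mode(sms)      # most frequent value

# Population variance divides the squared deviations by n,
# sample variance divides by n - 1.
population_variance = statistics.pvariance(sms)
sample_variance = statistics.variance(sms)

# The standard deviation is the square root of the variance.
population_sd = statistics.pstdev(sms)

print("mean:", mean, "median:", median, "mode:", mode)
print("population variance:", population_variance)
print("sample variance:", sample_variance)
print("population standard deviation:", population_sd)
```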
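The IQR can also be used to identify suspected outliers.
In general, a suspected outlier can exist in the following two ranges:
Below Q1 - 1.5 × IQR = 4 - 16.5 = -12.5
Above Q3 + 1.5 × IQR = 15 + 16.5 = 31.5
Here Q1 = 4 and Q3 = 15, so IQR = 15 - 4 = 11 and 1.5 × IQR = 16.5.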
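The arithmetic above is consistent with Q1 = 4 and Q3 = 15, which give IQR = 11 and 1.5 × IQR = 16.5. The sketch below reproduces the two limits and uses them to flag values. The function name `iqr_fences` and the list of readings are hypothetical, and the multiplier 1.5 is the conventional choice implied by the numbers.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the lower and upper limits beyond which values are suspected outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles implied by the worked numbers: Q1 = 4, Q3 = 15.
lower, upper = iqr_fences(4, 15)

# Hypothetical readings to check against the limits.
readings = [2, 9, 15, 28, 40]
suspected = [x for x in readings if x < lower or x > upper]

print("lower limit:", lower)
print("upper limit:", upper)
print("suspected outliers:", suspected)
```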
Dependent Variables
Independent Variables

A sample dataset
