Data Visualization and Story Telling Notes

The document discusses data visualization and storytelling. It provides an agenda for making people familiar with data visualization and how to communicate interpretations of data through visualization. It then discusses concepts like time, events, artifacts, data collection, and pattern identification from data. Examples on climate data from tree rings are provided. Steps for understanding data like selecting relevant data and identifying patterns are outlined. Finally, it discusses what storytelling is, the basic structure of a data story, and exploring different data types through various visualizations in Tableau.

Uploaded by

aakash verma


Data visualization and Story telling

Session 1:

The agenda is to become familiar with data visualization and with how to communicate through it. Plotting graphs is not the end of the task; we should also be able to communicate our interpretation of the data.

Role of data:

What is time, according to you?

1) A sequence of events

2) Continuous

3) Time is the base: all activities happen in time.

4) The start and end of events

To summarize, time is continuous in nature, and every event can be marked within it. Events may leave certain artifacts, which can take the form of other events, data, etc.

How will you know about past events and the impact of those events?
We should collect data. But what does data mean? Data is a set of data points whose analysis helps us form a picture of the artifact.

Try to understand the outcomes (artifacts) of events before going to the data points. Then analyze the data points that can help to explain those outcomes and support better decisions. Another approach is to look for patterns in the data that can support decisions.

Example of climate data:

Do we have a store of past climate data?

Trees store climate data. Researchers study them to find the age of a tree.

Every year a tree adds one growth ring, and the width of the ring differs depending on that year's climate.
Try to find the artifacts in the data.

Whenever you want to understand an event, follow the steps below:

1) Select the relevant data.

2) Process the data, for example by merging sources.

3) Clean the data, for example by removing inconsistencies or filling in missing values.

4) Transform the data; for example, daily sales data needs to be transformed to obtain weekly or monthly sales. Columns are dimensions, attributes, etc.

5) Identify patterns in the data using data mining techniques, visualization, etc. We then try to convert each pattern into an insight.
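
The five steps above can be sketched in a few lines of Python. This is only an illustration: the two sales lists, their values, and the choice to drop (rather than fill in) the missing value are assumptions made for the example, not part of the class material.

```python
from collections import defaultdict
from datetime import date

# Step 1: select the relevant data (hypothetical daily sales from two stores).
store_a = [(date(2024, 1, 5), 120.0), (date(2024, 1, 20), None), (date(2024, 2, 3), 90.0)]
store_b = [(date(2024, 1, 9), 80.0), (date(2024, 2, 14), 150.0)]

# Step 2: process the data, here by merging both sources into one list.
merged = store_a + store_b

# Step 3: clean the data, here by dropping rows whose amount is missing.
cleaned = [(day, amount) for day, amount in merged if amount is not None]

# Step 4: transform daily sales into monthly totals.
monthly = defaultdict(float)
for day, amount in cleaned:
    monthly[(day.year, day.month)] += amount

# Step 5: look for a pattern, here simply the month with the highest sales.
best_month = max(monthly, key=monthly.get)

print(dict(monthly))
print(best_month)
```

Filling in the missing value (for example with the store's average) would be an equally valid cleaning choice; which one is right depends on why the value is missing.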
Identify the plane problem:

We should focus on the missing part of the data: why no planes returned with bullet holes in the areas that are bullet-free on the returned planes, such as areas 1 and 3.
We should also choose an appropriate scale when plotting graphs, so that the chart is clear.

What is storytelling:

A story is a sense-making process built around a phenomenon. You make sense of an insight by keeping in view how it came about.

Simply saying that sales have gone down does not make sense on its own. We should have insight into how and why it happened.
A data story has the basic structure below:

1) In the past

2) Then something happened

3) As a result

How should we understand the data:

Try to understand what the data is about and what each value or column represents. Identify the type of the data, for example quantitative or qualitative.

The first step is to identify the entity in the data. An entity is any virtual or physical object. Example: in data about a batch of students, each entity is a student.

Then understand the attributes of each entity. Example: the name, age, percentage marks, etc. of each student. Note the type of each attribute (quantitative or qualitative), how the data was generated, and how the attribute values are calculated.

Attributes are also called variables, columns, characteristics, or (in machine learning) features.

Types of Attributes:

1) Qualitative (Categorical)

a) Nominal variable: related to names. The values of a nominal attribute are names of things or some kind of symbol. Examples: gender, occupation. There is no order (rank, position) among the values of a nominal attribute. From a nominal attribute we can only infer whether two entities are the same or different, nothing more.

b) Binary (binomial) variable: a variable that has only 2 states or values. Example: work experience (yes or no). There are two types of binary variable:

symmetric: both labels have equal relevance. For example, in our batch, experienced and inexperienced carry the same weight.

asymmetric: the labels have different relevance, such as COVID positive and COVID negative. We can represent them with 1 and 0.

c) Ordinal attributes: have a meaningful sequence or ranking, but the magnitude of the difference between values is not known. Example: grades A, B, C, D, F. The order shows which value ranks higher but does not indicate by how much: we can say A is better than B, but not how much better.

2) Quantitative (Numerical): a numeric attribute is quantitative because it is a measurable quantity.

a) Interval-scaled variable: has values whose differences are interpretable but which has no absolute zero. Example: a temperature of 0 degrees is still some temperature, not the absence of temperature. Similarly, we cannot say that a temperature of 20 degrees is double one of 10 degrees.

b) Ratio-scaled variable: has a fixed zero point. We can compare values as equal or not equal and also take ratios. Examples: age, income, height, weight.
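
The difference between interval and ratio scales can be checked with a small calculation. The sketch below assumes the temperatures in the example are in degrees Celsius (the notes do not name the unit) and converts them to Kelvin, which is a ratio scale because its zero is absolute zero. The helper name `celsius_to_kelvin` is introduced here for illustration only.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15

warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on an interval scale: 10 degrees warmer.
difference = warm_c - cool_c

# The naive Celsius ratio suggests "twice as hot", which is misleading.
naive_ratio = warm_c / cool_c

# On the Kelvin scale the ratio is meaningful, and it is nowhere near 2.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(difference, naive_ratio, round(kelvin_ratio, 3))
```

The difference of 10 degrees holds on both scales, but the ratio changes completely once the zero point is meaningful.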
Exploring Nominal and Ordinal Attributes:

• Frequency Charts

• Mode

• Cross Tabulation

• Pie chart
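
As a rough illustration of the first three techniques in this list, the sketch below builds a frequency table, finds the mode, and cross-tabulates two categorical attributes using only the Python standard library. The student records are made up for the example.

```python
from collections import Counter

# Hypothetical student records: (gender, grade).
students = [("F", "A"), ("M", "B"), ("F", "B"), ("M", "B"), ("F", "A"), ("M", "C")]

# Frequency table for one attribute (the data behind a frequency chart).
grade_counts = Counter(grade for _, grade in students)

# Mode: the most frequent category.
mode_grade, mode_count = grade_counts.most_common(1)[0]

# Cross tabulation: counts for each (gender, grade) combination.
cross_tab = Counter(students)

print(grade_counts)
print(mode_grade, mode_count)
print(cross_tab[("F", "A")], cross_tab[("M", "B")])
```

Note that the mode is the only measure of central tendency that applies to a nominal attribute, since its values cannot be ordered or averaged.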

Exploring Numeric Attributes:

• Mean, Mode, Median ---Measure of central tendency

• Range, standard deviation, interquartile range --- Measure of dispersion

• Outliers, Boxplots, five number summary (minimum, Q1,Q2,Q3,maximum)

• Correlation among variables

The mean (average) of a data set is found by adding all the numbers in the data set and then dividing by the number of values in the set. The median is the middle value when the data set is ordered from least to greatest. The mode is the value that occurs most often in the data set.
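
A quick way to check these three definitions is Python's built-in `statistics` module. The marks below are invented for the example.

```python
import statistics

# Hypothetical percentage marks of five students.
marks = [55, 60, 60, 70, 95]

mean_mark = statistics.mean(marks)      # (55 + 60 + 60 + 70 + 95) / 5
median_mark = statistics.median(marks)  # middle value of the sorted list
mode_mark = statistics.mode(marks)      # most frequent value

print(mean_mark, median_mark, mode_mark)
```

Here the single high mark of 95 pulls the mean above the median, which is why the median is often preferred when the data contains extreme values.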

Q1 = 25th percentile, Q2 = 50th percentile (the median), Q3 = 75th percentile

Interquartile range (IQR): the difference Q3 - Q1. It is used to find the outliers (extremely high or extremely low values) in the data.
The process to find outliers is below:

1) Compute the IQR.
2) Identify the maximum limit.
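
The notes do not say how the limit is derived from the IQR. A widely used convention, assumed here, is to place the limits 1.5 times the IQR beyond the quartiles: maximum limit = Q3 + 1.5 × IQR, and correspondingly minimum limit = Q1 - 1.5 × IQR. The sketch below applies that convention with the standard library; the function name `iqr_limits`, the multiplier `k`, and the data are illustrative assumptions.

```python
import statistics

def iqr_limits(values, k=1.5):
    """Return (lower_limit, upper_limit) using the k * IQR convention."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical demand values with one extreme observation.
demand = [10, 12, 13, 14, 15, 16, 17, 19, 60]

lower, upper = iqr_limits(demand)
outliers = [v for v in demand if v < lower or v > upper]

print(lower, upper, outliers)
```

Different tools compute quartiles slightly differently, so the exact limits may not match a box plot drawn elsewhere.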
Session 2:

Introduction to Tableau:

Tableau was founded in California in 2003 by academics. It was acquired by Salesforce in 2019.

It has different products:

• Tableau Reader
• Tableau Desktop
• Tableau Mobile
• Tableau Public
Zogo bike sharing systems case analysis: analyze demand across different seasons, weather conditions, etc.

Class case.xlsx

Most of the fields hold categorical values, because the data dictionary defines them as categories.

Registered users can be distinguished from casual users.

The file has 2 sheets, so we can see how to merge sheets in Tableau.

Merging data from different data sources is also possible.

Data sources in Tableau

Choose Excel as the data source and load the class case file. After importing, you can see the 2 sheets loaded.
How to merge two data sources

Drag and drop the sheets onto the table space and apply a union of the 2 sheets. Drop the second sheet in the union box; otherwise it will become a join.
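
The difference between a union and a join is easier to see outside Tableau. The Python sketch below is not Tableau code; it only mimics the two operations on small, made-up tables. All field names and values are hypothetical.

```python
# Two hypothetical sheets that share the same columns.
sheet_1 = [{"day": "2011-01-01", "demand": 985}, {"day": "2011-01-02", "demand": 801}]
sheet_2 = [{"day": "2012-01-01", "demand": 2294}]

# Union: stack the rows of both sheets; the set of columns stays the same.
union_rows = sheet_1 + sheet_2

# Join: match rows from two tables on a shared key; columns are added side by side.
weather_by_day = {"2011-01-01": "clear", "2011-01-02": "mist"}
joined_rows = [
    {**row, "weather": weather_by_day[row["day"]]}
    for row in sheet_1
    if row["day"] in weather_by_day  # inner join: keep only matching keys
]

print(len(union_rows), sorted(union_rows[0]))
print(len(joined_rows), sorted(joined_rows[0]))
```

A union makes the table longer (more rows), while a join makes it wider (more columns), which is why dropping a sheet in the wrong place changes the result.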
Connection: there are 2 types of connection

Live: a live connection automatically reflects any changes in the source.

Extract: an extract is a static copy of the data. It is usually faster than a live connection.

Filtering can be done much as in Excel. When Tableau infers data types, it gives high priority to string.

Convert a numeric field to a non-numeric one by creating a group:

Right-click on the field and click create group.

After grouping, Tableau creates a new column for the group.

Create calculated field: derive a column based on a formula.

Explore charting:

A bar chart is used when we want to plot a numeric variable against a categorical variable. Example: demand across different seasons.

All categorical fields are dimensions (attributes).

All numeric variables are measures.

The Columns shelf corresponds to the X axis of a graph.

The Rows shelf corresponds to the Y axis of a graph.

A stacked bar chart can be made in Tableau by dragging the dimension to Color in the Marks section.

Demand against temperature: a scatter plot is required, as both values are numeric.

For a scatter plot we need to turn off aggregation of measures from the Analysis menu at the top.

A trend line shows the correlation in the chart. In the example the correlation is positive.

The trend line is based on a regression between the two variables. On hover it shows the regression details.
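
To see what a trend line computes, the sketch below fits a straight line by ordinary least squares and reports the correlation coefficient. It assumes a linear trend model, which is only one of the model types a charting tool may offer, and the temperature and demand values are made up. The function name `fit_trend_line` is introduced here for illustration.

```python
import math

def fit_trend_line(xs, ys):
    """Return (slope, intercept, r) for a least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r

# Hypothetical temperature and demand readings.
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
demand = [120.0, 180.0, 260.0, 310.0, 380.0]

slope, intercept, r = fit_trend_line(temperature, demand)
print(slope, intercept, round(r, 3))
```

A positive slope and an r close to 1 correspond to the upward trend line of a positive correlation; a correlation alone does not show that temperature causes demand.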

Try to make scales unit-free (normalized), as was done for temperature in the example.
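
The notes do not say which method was used to make temperature unit-free. One common option, assumed here, is min-max scaling, which maps every value into the range 0 to 1. The function name and the values are illustrative.

```python
def min_max_scale(values):
    """Rescale values to the range [0, 1]; the result has no unit."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical temperature readings in degrees.
temperatures = [5.0, 10.0, 15.0, 25.0]
scaled = min_max_scale(temperatures)

print(scaled)
```

After scaling, variables measured in different units can share one axis or be compared directly.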

A line chart is well suited to representing a time series, such as sales over a period of time.

Outliers in a numeric column can be shown with a box plot. You need to remove aggregation for a box plot.
To draw a box plot against a category, drag the category (for example season) into Columns.
A histogram (frequency distribution) should be used to see the distribution of the data. Skewness means the distribution has a longer tail on one side, often because of outliers. The example data is right skewed.
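
A rough numeric check for right skew is to compare the mean with the median: a long right tail pulls the mean above the median. The sketch below also counts values per bucket, which is the data a histogram draws. The values and the bucket width of 5 are assumptions made for the example.

```python
import statistics
from collections import Counter

# Hypothetical values with one large observation forming a right tail.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

mean_value = statistics.mean(values)
median_value = statistics.median(values)

# Heuristic only: a mean well above the median suggests right skew.
looks_right_skewed = mean_value > median_value

# Histogram counts: bucket each value by the lower edge of its width-5 bin.
bin_width = 5
histogram = Counter((v // bin_width) * bin_width for v in values)

print(mean_value, median_value, looks_right_skewed)
print(sorted(histogram.items()))
```

This comparison is a quick heuristic rather than a formal skewness measure, so it is best used together with the histogram itself.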

Time series: a series of data points collected over a period of time, usually shown as a line chart.
A bubble chart is a different representation of the same information as a bar chart, and can sometimes work better.
Session 3:

Data-driven decision making:

Data comes in different formats.

Machine learning: predicts what can happen tomorrow. Machine learning typically works on structured data.
Deep learning: learns the important features from common patterns in the data.
We should try to add more features to the data to help identify the targets.

When the data includes a labelled target column, it is called supervised data; otherwise it is unsupervised data.
The example here is unsupervised learning, as we can't identify what each entity attribute means.

A bin is a range of values; binning converts a number into a small set of categories.
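
Binning can be sketched as a lookup against a list of bin edges. The edges (10 and 20), the labels, and the function name `to_bin` below are hypothetical choices for illustration.

```python
from bisect import bisect_right

def to_bin(value, edges, labels):
    """Map a number to the label of the bin it falls into.

    edges must be sorted, and labels needs one more entry than edges.
    A value equal to an edge goes into the higher bin.
    """
    return labels[bisect_right(edges, value)]

# Hypothetical temperature bins: below 10, from 10 up to 20, and 20 or above.
edges = [10, 20]
labels = ["cold", "mild", "hot"]

binned = [to_bin(t, edges, labels) for t in [5, 10, 19.5, 25]]
print(binned)
```

Binning trades precision for simplicity: the categories are easier to chart and compare, but the exact values are lost.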

Explaining deep learning models is very difficult.
