Data Visualization and Story Telling Notes
Session 1:
The agenda is to become familiar with data visualization and with how to communicate through it. Just
plotting graphs does not finish the task; we should also be able to communicate the interpretation of the data.
Role of data:
1)Sequence of events
2)Continuous
To summarize, time is continuous in nature, and every event can be placed at a point in time. Events
may leave behind artifacts, which can take the form of other events, data, etc.
How will you know about past events and the impact of these events?
We should collect data. But what does data mean? Here, data means data points whose analysis can help
us form a picture of the artifact.
Try to understand the outcomes (artifacts) of events before going to the data points. Then analyze the data
points that can help you understand the outcomes and take better decisions. Another approach is to look for
patterns in the data that can help in taking decisions.
A tree stores climate data. Researchers use it to find the age of the tree.
Every year a tree adds one growth ring to its stem, and the width of the ring varies with that year's climate.
Try to find such artifacts in the data.
Whenever you want to understand any event, below are the steps:
4) Transform the data. For example, daily sales data needs to be transformed to identify monthly or weekly sales.
5) Identify the patterns in the data using data mining techniques, visualization, etc. From the patterns we try to
derive insights.
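The transformation in step 4 can be sketched as follows. This is a minimal illustration, not the method used in class: the dates and the flat amount of 100 units per day are made up, and the weekly grouping assumes ISO calendar weeks.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical daily sales: 60 consecutive days starting 2024-01-01,
# with a made-up amount of 100 units per day.
start = date(2024, 1, 1)
daily_sales = {start + timedelta(days=i): 100 for i in range(60)}

monthly = defaultdict(int)  # (year, month) -> total sales
weekly = defaultdict(int)   # (ISO year, ISO week) -> total sales
for day, amount in daily_sales.items():
    monthly[(day.year, day.month)] += amount
    iso_year, iso_week, _ = day.isocalendar()
    weekly[(iso_year, iso_week)] += amount

print(dict(monthly))
print(dict(weekly))
```

The same daily records feed both totals; only the grouping key changes.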
Identify plane problem:
We should focus on the missing parts: why planes did not return with hits in the bullet-free areas, such as 1 and 3.
We should choose an appropriate scale when plotting graphs, so that the chart is clear.
A story is a sense-making process with the help of a phenomenon: you make sense of an insight by keeping in
mind how it happens.
Simply saying that sales have gone down does not make sense on its own. We should have insight into how and
why it happened.
The data story has the basic structure below:
3) As a result
Try to understand what the data is about and what each value or column represents. What is the
type of the data, for example quantitative or qualitative?
The first step is to know the entity of the data. An entity is any virtual or physical object. Example: in a
batch of student data, each entity is a student.
Then understand the attributes of each entity. Example: name, age, percentage marks, etc. of each
student. Identify the type of each attribute (quantitative or qualitative), how the data is generated, and how
the attribute values are calculated.
1) Qualitative (Categorical)
a) Nominal variable: related to names. The values of a nominal attribute are names of things or some
kind of symbol. Examples: gender, occupation. There is no order (rank, position) among the values of a
nominal attribute. With a nominal attribute we can only infer whether two entities are the same or not,
nothing more.
b) Binomial (binary) variable: any variable that has only 2 states or values. Example: work
experience (yes or no). There are two types of binomial variable:
symmetric: both labels have equal relevance. For example, in our batch, experienced and
inexperienced carry the same weight.
asymmetric: the labels have different relevance, like COVID positive and COVID negative. We can
represent them with 1 or 0.
c) Ordinal attributes: have a meaningful sequence or ranking, but the magnitude between values is not
known. Example: grades A, B, C, D, F. The order of the values shows what is more important but does not
indicate how much more important it is. We can say A is better than B, but not by how much.
2) Quantitative (Numeric)
a) Interval scaled variable: has values whose differences are interpretable but which do not have an
absolute zero. Example: a temperature of 0 degrees is still some temperature, not the absence of
temperature. Similarly, we can't say that a temperature of 20 degrees is double that of 10 degrees.
b) Ratio scaled variable: has a fixed zero point. We can say equal, not equal, and compute ratios. Examples:
age, income, height, weight.
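A small worked example of the interval versus ratio distinction. It assumes the temperatures are in degrees Celsius (the notes do not say which unit) and uses the Kelvin scale, which has an absolute zero, for comparison. The ages are made up.

```python
# Celsius is an interval scale: differences are meaningful, ratios are not.
c_low, c_high = 10.0, 20.0
diff = c_high - c_low          # 10 degrees: interpretable
naive_ratio = c_high / c_low   # 2.0, but this does NOT mean "twice as hot"

# Kelvin has an absolute zero, so it behaves as a ratio scale.
k_low, k_high = c_low + 273.15, c_high + 273.15
true_ratio = k_high / k_low    # about 1.035, nowhere near 2

# Age is a ratio-scaled attribute: 40 years really is twice 20 years.
age_ratio = 40 / 20

print(diff, naive_ratio, round(true_ratio, 3), age_ratio)
```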
Exploring Nominal and Ordinal Attributes:
• Frequency Charts
• Mode
• Cross Tabulation
• Pie Chart
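The first three techniques in the list above can be sketched with the Python standard library. The student records are made up for illustration; gender is treated as a nominal attribute and grade as an ordinal one.

```python
from collections import Counter

# Hypothetical student records: (gender, grade); the values are made up.
students = [
    ("F", "A"), ("M", "B"), ("F", "B"), ("M", "B"),
    ("F", "A"), ("M", "C"), ("F", "B"), ("M", "A"),
]

# Frequency table of one attribute, and its mode (most frequent value).
grade_freq = Counter(grade for _, grade in students)
grade_mode = grade_freq.most_common(1)[0][0]

# Cross tabulation: counts for every (gender, grade) combination.
cross_tab = Counter(students)

grades = sorted(grade_freq)
print("gender", *grades)
for gender in sorted({g for g, _ in students}):
    print(gender, *(cross_tab[(gender, grade)] for grade in grades))
```

A pie chart would simply draw the shares in `grade_freq` as slices of a circle.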
The mean (average) of a data set is found by adding all numbers in the data set and then dividing
by the number of values in the set. The median is the middle value when a data set is ordered
from least to greatest. The mode is the number that occurs most often in a data set.
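These three measures can be checked on a small data set using Python's `statistics` module. The marks are made up.

```python
import statistics

marks = [55, 60, 60, 72, 88]  # made-up percentage marks

mean_marks = statistics.mean(marks)      # (55 + 60 + 60 + 72 + 88) / 5
median_marks = statistics.median(marks)  # middle value of the ordered list
mode_marks = statistics.mode(marks)      # value that occurs most often

print(mean_marks, median_marks, mode_marks)
```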
Interquartile range (IQR): used to find the outliers (extremely high or extremely low values) in the data.
The process to find outliers is below:
1) Compute IQR
2) Identify the maximum limit.
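A minimal sketch of an IQR-based outlier check. The notes do not state how the limits are computed; the code assumes the common convention of placing them 1.5 times the IQR beyond the first and third quartiles. The data values are made up.

```python
import statistics

values = [10, 12, 12, 13, 14, 15, 16, 18, 95]  # made-up data; 95 looks extreme

# Quartiles; method="inclusive" treats the data as the whole population.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: limits sit 1.5 * IQR beyond the quartiles.
upper_limit = q3 + 1.5 * iqr
lower_limit = q1 - 1.5 * iqr

outliers = [v for v in values if v < lower_limit or v > upper_limit]
print(iqr, lower_limit, upper_limit, outliers)
```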
Session 2:
Introduction to Tableau:
Tableau Reader
Tableau Desktop
Tableau Mobile
Tableau Public
Zogo bike sharing system case analysis: analyze demand across different seasons, weather conditions,
etc.
Class case.xlsx
Choose Excel as the data source and load the class case file. After importing, you can see the 2 tabs (sheets) loaded.
How to merge two data sources:
Drag and drop the sheets onto the table space and apply a union of the 2 sheets. Drop the second sheet in the
union box; otherwise it will become a join.
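The difference between a union and a join can be illustrated outside Tableau. The sketch below uses plain Python lists of dictionaries as stand-ins for sheets. It is an analogy, not Tableau's implementation, and all rows and column names are made up.

```python
# Two hypothetical sheets with the same columns (rows are made up).
sheet1 = [{"day": 1, "season": "spring", "demand": 120},
          {"day": 2, "season": "spring", "demand": 150}]
sheet2 = [{"day": 3, "season": "summer", "demand": 200}]

# Union: stack the rows; the set of columns stays the same.
union_rows = sheet1 + sheet2

# Join: match rows on a key and put their columns side by side.
# Only days present in both tables are kept (an inner join).
weather = [{"day": 1, "weather": "clear"}, {"day": 3, "weather": "rain"}]
weather_by_day = {row["day"]: row for row in weather}
joined_rows = [{**row, **weather_by_day[row["day"]]}
               for row in union_rows if row["day"] in weather_by_day]

print(len(union_rows), len(joined_rows))
```

A union adds rows, while a join adds columns, which is why dropping the sheet in the wrong place changes the result.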
Connection: 2 types of connections
Filtering can be done in a way similar to Excel. When Tableau reads the data, it infers each column's data type, giving high priority to the string type.
A bar chart is used when we want to plot a numeric attribute against a categorical attribute. Example: demand
across different seasons.
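What a bar chart computes can be sketched as one aggregated number per category. The records below are made up, and the text "bars" only stand in for the chart that Tableau would draw.

```python
from collections import defaultdict

# Made-up (season, demand) records.
records = [("spring", 120), ("summer", 200), ("spring", 150),
           ("fall", 180), ("summer", 220), ("winter", 90)]

# Aggregate the numeric attribute for each category.
total_by_season = defaultdict(int)
for season, demand in records:
    total_by_season[season] += demand

# One text "bar" per category; each '#' stands for 50 units of demand.
for season, total in total_by_season.items():
    print(f"{season:<7} {'#' * (total // 50)} {total}")
```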
A stacked bar chart can be created in Tableau by dragging a dimension onto Color in the Marks section.
A trend line shows the correlation between the two plotted variables; in the class example the correlation is positive.
The trend line is based on a regression between the two variables. Hovering over it shows the regression details.
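A minimal sketch of the idea behind a trend line, assuming a simple least-squares linear regression (the notes do not say which model Tableau fitted). The temperature and demand values are made up and lie exactly on a straight line, so the correlation comes out as 1.

```python
# Made-up (temperature, demand) pairs lying on demand = 10 * temp + 50.
temps = [10, 15, 20, 25, 30]
demand = [150, 200, 250, 300, 350]

n = len(temps)
mean_x = sum(temps) / n
mean_y = sum(demand) / n

# Sums of squares and cross-products around the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temps, demand))
sxx = sum((x - mean_x) ** 2 for x in temps)
syy = sum((y - mean_y) ** 2 for y in demand)

slope = sxy / sxx                    # change in demand per degree
intercept = mean_y - slope * mean_x  # fitted demand at temperature 0
r = sxy / (sxx * syy) ** 0.5         # Pearson correlation coefficient

print(slope, intercept, r)
```

A positive slope and a positive `r` correspond to the upward-sloping trend line described above.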
Try to make scales unit free, as was done for temperature in the example.
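One way to make a scale unit free is min-max normalization, sketched below. This is an assumption for illustration: the notes do not say which method was applied to temperature in the class example, and the values are made up.

```python
temps_c = [5.0, 12.0, 20.0, 41.0]  # made-up temperatures in degrees Celsius

# Min-max normalization: map the smallest value to 0 and the largest to 1.
t_min, t_max = min(temps_c), max(temps_c)
scaled = [(t - t_min) / (t_max - t_min) for t in temps_c]

print(scaled)
```

After scaling, the values no longer depend on whether the original unit was Celsius or Fahrenheit.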
A line chart is well suited to representing a time series, such as sales over a period of time.
Outliers in a numeric column can be shown with a box plot. You need to turn off aggregation of the measure for a box plot.
To draw a box plot per category, drag the category (for example, season) into Columns.
A histogram (frequency distribution) should be used to see the distribution of the data. Skewness indicates a
long tail on one side, often caused by outliers. The distribution in the class example is right skewed.
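A small sketch of a frequency distribution and a rough skewness check. The demand counts are made up, and comparing the mean with the median is only a rule of thumb for the direction of the skew, not a formal skewness statistic.

```python
import statistics

# Made-up hourly demand counts with a long tail to the right.
demand = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]

# Frequency distribution with bins of width 5: [0, 5), [5, 10), ...
bin_width = 5
hist = {}
for d in demand:
    start = (d // bin_width) * bin_width
    hist[start] = hist.get(start, 0) + 1

mean_d = statistics.mean(demand)
median_d = statistics.median(demand)

# Rule of thumb: a mean pulled above the median suggests a right skew.
right_skewed = mean_d > median_d

print(hist, mean_d, median_d, right_skewed)
```

Here the single large value (20) drags the mean to the right of the median, which is the pattern a right-skewed histogram shows.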
Time series: represents a series of data points collected over a period of time. It is usually shown with a line chart.
A bubble chart is a different representation of the same data as a bar chart, and in some cases a clearer one.
Session 3:
When the data comes with column headers that tell us what each attribute means, it is called supervised data. Otherwise it is unsupervised data.
The example below is unsupervised, as we can't identify what each attribute of an entity means.