Data Visualization and Story Telling Notes

The document discusses data visualization and storytelling. It provides an agenda for making people familiar with data visualization and how to communicate interpretations of data through visualization. It then discusses concepts like time, events, artifacts, data collection, and pattern identification from data. Examples on climate data from tree rings are provided. Steps for understanding data like selecting relevant data and identifying patterns are outlined. Finally, it discusses what storytelling is, the basic structure of a data story, and exploring different data types through various visualizations in Tableau.

Uploaded by

aakash verma
Copyright
© All Rights Reserved

Data visualization and Story telling

Session 1:

The agenda is to become familiar with data visualization and with how to communicate through it. Just
plotting graphs does not finish the task. We should be able to communicate the interpretation of the data.

Role of data:

What is time, according to you?

1)Sequence of events

2)Continuous

3)Time is the base. All activities happen in time.

4)Start and end of events

To summarize, time is continuous in nature and all events can be marked within it. Events
may leave certain artifacts, which can take the form of other events, data, etc.

How will you know about past events and the impact of these events?
We should collect data. But what does data mean? Data means data points whose analysis can help
us form a perception of the artifact.

Try to understand the outcomes/artifacts of events before going to the data points. Then analyze the data
points that can help you understand the outcomes and take better decisions. Another approach is to understand the
patterns in the data that can help in taking decisions.

Example of Climate data:

Do we have a store of past climatic data?

Trees store climate data. Research is done to find the age of a tree.

Every year a tree adds one growth ring, and the width of the ring differs depending on that year's climate.
Try to find the artifacts in the data.

Whenever you want to understand any event, follow the steps below:

1) Select the relevant data.

2) Process the data, for example by merging.

3) Clean the data, for example by removing inconsistencies or filling in missing values.

4) Transform the data; for example, daily sales data needs to be transformed to identify weekly or monthly sales.

Columns are dimensions, attributes, etc.

5) Identify patterns in the data using data mining techniques, visualization, etc. From the patterns we try to
derive insights.
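The select, clean, and transform steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales records, the region filter, and the choice to fill a missing amount with the mean of the known amounts are all hypothetical and are not taken from the class material.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records: (day, region, amount).
# None marks a missing value.
daily_sales = [
    (date(2024, 1, 5), "North", 120.0),
    (date(2024, 1, 20), "North", None),
    (date(2024, 2, 3), "North", 90.0),
    (date(2024, 2, 14), "South", 200.0),
    (date(2024, 2, 28), "North", 110.0),
]

# Step 1: select the relevant data (here, only the North region).
selected = [(d, amt) for d, region, amt in daily_sales if region == "North"]

# Step 3: clean the data by filling missing amounts with the mean
# of the known amounts (one of several possible choices).
known = [amt for _, amt in selected if amt is not None]
fill_value = sum(known) / len(known)
cleaned = [(d, fill_value if amt is None else amt) for d, amt in selected]

# Step 4: transform daily sales into monthly totals.
monthly = defaultdict(float)
for d, amt in cleaned:
    monthly[(d.year, d.month)] += amt

monthly_totals = dict(monthly)
for (year, month), total in sorted(monthly_totals.items()):
    print(f"{year}-{month:02d}: {total:.2f}")
```

Pattern identification (step 5) would then start from the monthly totals rather than from the raw daily rows.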
Identify plane problem:

We should focus on the missing part: the planes that did not return, and why areas such as 1 and 3 show few bullet holes.
We should choose an appropriate scale when plotting graphs so that the chart is clear.

What is storytelling:

A story is a sense-making process built around a phenomenon. You make sense of an insight by keeping in view how it
happened.

Simply saying that sales have gone down does not make sense of anything. We should have insight into how and why it
happened.
A data story has the following basic structure:

1) In the past

2) Then something happened

3) As a result

How should we understand the data:

Try to understand what the data is about and what each value/column represents. What is the
type of the data, for example quantitative or qualitative?

The first step is to know the entity of the data. An entity is any virtual or physical object. Example: in a batch of
student data, each entity is a student.

Then understand the attributes of each entity. Example: name, age, percentage marks, etc. of each
student. Understand the type of each attribute (quantitative or qualitative), how the data is generated, and how attribute
values are calculated.

Attributes are also called variables, columns, features (in machine learning), or characteristics.


Types of Attributes:

1) Qualitative (Categorical)

a) Nominal variable: related to names. The values of a nominal attribute are names of things or some
kind of symbol. Examples: gender, occupation. There is no order (rank, position) among the values of a
nominal attribute. With a nominal attribute we can only infer whether two entities are the same or not,
nothing more.

b) Binary variable (called binomial in class): a variable that has only 2 states or values. Example: work
experience (yes or no). There are two types of binary variable:

symmetric: both labels have equal relevance. For example, in our batch, experienced and
inexperienced carry the same weight.

asymmetric: the labels have different relevance, like COVID positive and COVID negative. We can represent them with 1
or 0.
c) Ordinal attributes: have a meaningful sequence or ranking, but the magnitude between values is not
known. Example: grades A, B, C, D, F. The order of the values shows which is better but doesn't
indicate by how much: we can say A is better than B, but not how much better.

2) Quantitative (Numerical): a numeric attribute is quantitative because it is a measurable quantity.

a) Interval-scaled variable: has values whose differences are interpretable but has no
absolute zero. Example: a temperature of 0 degrees (in Celsius or Fahrenheit) is still some temperature, not the
absence of temperature. Similarly, we can't say that a temperature of 20 degrees is double that of 10 degrees.

b) Ratio-scaled variable: has a fixed zero point. We can test equal or not equal and also take ratios. Examples:
age, income, height, weight.
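
A small worked example may make the interval versus ratio distinction concrete. The sketch below assumes the temperatures are in Celsius and converts them to Kelvin, a scale that does have an absolute zero; the function name is illustrative only.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale with a true zero)."""
    return celsius + 273.15

# Differences are meaningful on an interval scale.
difference = 20 - 10

# The naive ratio of the Celsius readings suggests "twice as hot"...
naive_ratio = 20 / 10

# ...but on a scale with an absolute zero the ratio is only about 1.035.
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(difference, naive_ratio, round(kelvin_ratio, 3))
```

The 10 degree difference is the same on both scales, which is why differences are interpretable for interval data while ratios are not.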
Exploring Nominal and Ordinal Attributes:

• Frequency Charts

• Mode

• Cross Tabulation

• Pie chart
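
As a rough illustration of the first three items in this list, the sketch below computes a frequency table, the mode, and a cross tabulation for a handful of hypothetical student records (the records and names are made up for the example).

```python
from collections import Counter

# Hypothetical student records: (gender, grade).
students = [
    ("F", "A"), ("M", "B"), ("F", "B"),
    ("M", "B"), ("F", "A"), ("M", "C"),
]

# Frequency table: how often each grade occurs.
grade_counts = Counter(grade for _, grade in students)

# Mode: the most frequent grade.
mode_grade = grade_counts.most_common(1)[0][0]

# Cross tabulation: counts for each (gender, grade) combination.
cross_tab = Counter(students)

print(dict(grade_counts))
print(mode_grade)
print(cross_tab[("F", "A")], cross_tab[("M", "B")])
```

A frequency chart or pie chart is simply a plot of `grade_counts`.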

Exploring Numeric Attributes:

• Mean, Mode, Median ---Measure of central tendency

• Range, standard deviation, interquartile range --- Measure of dispersion

• Outliers, Boxplots, five number summary (minimum, Q1,Q2,Q3,maximum)

• Correlation among variables

The mean (average) of a data set is found by adding all numbers in the data set and then dividing
by the number of values in the set. The median is the middle value when a data set is ordered
from least to greatest. The mode is the number that occurs most often in a data set. Created
by Sal Khan.
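
The three definitions above can be checked on a small made-up data set with Python's standard `statistics` module; the values are arbitrary and chosen only so that the three measures differ.

```python
import statistics

# Arbitrary example values, already ordered from least to greatest.
values = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(values)      # sum of values / number of values
median_value = statistics.median(values)  # middle of the ordered data (average of 3 and 5)
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)
```

With an even number of values, the median is the average of the two middle values.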

Q1 is the 25th percentile, Q2 the 50th (the median), and Q3 the 75th.

Interquartile range (IQR): used to find the outliers (extremely high or extremely low values) in the data.
The process to find outliers is as follows:

1) Compute the IQR (Q3 minus Q1).
2) Identify the maximum limit.
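
The notes do not spell out how the limits are computed. A minimal sketch, assuming the common 1.5 × IQR convention (an assumption, not something stated in the notes) and made-up data, could look like this:

```python
import statistics

# Made-up data with one extremely high value.
data = [10, 12, 13, 14, 15, 16, 18, 19, 60]

# Quartiles; the "inclusive" method treats the data as the whole population.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Step 1: compute the IQR.
iqr = q3 - q1

# Step 2: identify the limits (assumed convention: 1.5 * IQR beyond the quartiles).
upper_limit = q3 + 1.5 * iqr
lower_limit = q1 - 1.5 * iqr

# Values outside the limits are flagged as outliers.
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(q1, q2, q3, iqr, lower_limit, upper_limit, outliers)
```

These are the same quantities a box plot displays: the box spans Q1 to Q3 and points beyond the limits are drawn individually.
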
Session 2:

Introduction of tableau:

Tableau was founded in California in 2003 by academics. It was recently acquired by Salesforce.

It has different products.

• Tableau Reader
• Tableau Desktop
• Tableau Mobile
• Tableau Public
Zogo bike sharing systems case analysis: analyze demand across different seasons, weather conditions,
etc.

Class case.xlsx

Most of the values are categorical, as the data dictionary defines them as categories.

Registered users can be distinguished from casual users.

We have 2 sheets, so we can see how to merge sheets in Tableau.

Merging data from different data sources is also possible.

Data sources in Tableau

Choose Excel as the data source and load the class case file. After importing, you can see the 2 sheets loaded.
How to merge two data sources

Drag and drop the sheets onto the table space and apply a union of the 2 sheets. Drop the second sheet in the union box;
otherwise it will become a join.
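
The difference between a union and a join can be sketched outside Tableau with plain Python lists. The two sheets, their column names, and their values below are hypothetical; the point is only that a union stacks rows while a join matches rows on a key and places columns side by side.

```python
# Two hypothetical sheets with the same columns.
sheet_2011 = [{"day": 1, "demand": 100}, {"day": 2, "demand": 130}]
sheet_2012 = [{"day": 1, "demand": 150}, {"day": 2, "demand": 170}]

# Union: rows of the second sheet are appended below the first.
union_rows = sheet_2011 + sheet_2012

# Join: rows are matched on a key ("day") and their columns combined.
by_day_2012 = {row["day"]: row for row in sheet_2012}
join_rows = [
    {
        "day": row["day"],
        "demand_2011": row["demand"],
        "demand_2012": by_day_2012[row["day"]]["demand"],
    }
    for row in sheet_2011
    if row["day"] in by_day_2012
]

print(len(union_rows), "rows after union")
print(len(join_rows), "rows after join:", join_rows)
```

A union keeps the column count and grows the row count; a join does the opposite, which is why dropping the sheet in the wrong place changes the result.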
Connection: 2 types of connections

Live: a live connection automatically reflects any changes in the source.

Extract: an extract is a static copy of the data. It is usually faster than live.

Filtering can be done much as in Excel. When reading data types, Tableau gives high priority to string.

Convert a numeric field to a non-numeric one by creating a group.

Right-click on the field and choose to create a group.


After grouping, Tableau creates a new column for the group.

Create calculated field: derive a column based on a formula.


Exploring charts:

A bar chart is used when we want to plot a numeric variable against a categorical variable. Example: demand across
different seasons.

All categorical variables are dimensions (attributes).

All numeric variables are measures.

Columns correspond to the X axis of a graph.

Rows correspond to the Y axis of a graph.

A stacked bar chart can be made in Tableau by dragging a dimension to Color in the Marks section.

Demand against temperature: a scatter plot is required, as both are numeric values.


In a scatter plot we need to turn off aggregation from the Analysis menu at the top.

A trend line shows the correlation in the chart. In the example, the correlation is positive.

The trend line is based on a regression between the two variables. Hovering over it shows the regression details.
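
To show what a linear trend line computes, here is a minimal sketch of an ordinary least-squares fit and the Pearson correlation. The temperature and demand numbers are invented for illustration, and this is not claimed to reproduce Tableau's exact output.

```python
from statistics import mean

# Invented observations: temperature versus bike demand.
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
demand = [120.0, 150.0, 210.0, 240.0, 280.0]

mean_x = mean(temperature)
mean_y = mean(demand)

# Sums of squares and cross products around the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(temperature, demand))
sxx = sum((x - mean_x) ** 2 for x in temperature)
syy = sum((y - mean_y) ** 2 for y in demand)

# Least-squares line: demand = intercept + slope * temperature.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Pearson correlation: its sign matches the sign of the slope.
correlation = sxy / (sxx * syy) ** 0.5

print(f"demand = {intercept:.1f} + {slope:.1f} * temperature, r = {correlation:.3f}")
```

A positive slope and a correlation close to 1 correspond to the upward-sloping trend line described above.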

Try to make scales unit-free, as was done for temperature in the example.
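
One common way to make a scale unit-free is min-max scaling, which maps values to the range 0 to 1. Whether the class example used exactly this formula is an assumption; the function name and the sample temperatures below are illustrative.

```python
def min_max_scale(values):
    """Rescale values to the 0..1 range: (v - min) / (max - min)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values are equal, so there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]

# Illustrative temperatures in degrees Celsius.
scaled = min_max_scale([-8.0, 0.0, 15.5, 39.0])
print([round(s, 3) for s in scaled])
```

After scaling, the smallest value becomes 0 and the largest becomes 1, regardless of the original unit.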

A line chart is well suited to representing a time series, like sales over a period of time.

Outliers can be plotted in a box plot for a numeric column. You need to remove aggregation for a box plot.
For a box plot against a category, drag the category (for example, season) into Columns.
A histogram (frequency distribution) should be used to see the distribution of the data. Skewness means the
distribution is asymmetric, with a longer tail (often containing outliers) on one side. In the example, it
is right-skewed.

Time series: a series of data points collected over a period of time. Use a line chart.
A bubble chart is a different, and sometimes better, representation of what a bar chart shows.
Session 3:

Data driven decision making:

Data comes in different formats.


Machine learning: predicts what can happen tomorrow. Machine learning works on structured data.
Deep learning: deep learning means learning the important features from common patterns.
We should try to add more features to the data to identify the targets.

When the data has labels for the target, it is called supervised data; otherwise it is unsupervised data.
The example is unsupervised learning, as we can't identify what each attribute of an entity means.

A bin is a small category into which we convert a numeric value.
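
Binning can be sketched as a function that maps a number to a category. The cut points (10 and 25 degrees) and the labels below are arbitrary choices made for the example, not values from the class.

```python
def bin_temperature(celsius: float) -> str:
    """Map a numeric temperature to one of three bins (cut points are arbitrary)."""
    if celsius < 10:
        return "cold"
    if celsius < 25:
        return "mild"
    return "hot"

temperatures = [3.5, 12.0, 24.9, 25.0, 31.0]
binned = [bin_temperature(t) for t in temperatures]
print(binned)
```

Once binned, the numeric column can be treated as a categorical one, for example in a bar chart or a cross tabulation.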


Explaining deep learning models is very difficult.
