Univariate and Multivariate Data Exploration
DATA SETS
● The most popular dataset used to learn
data science is probably the Iris dataset,
introduced by Ronald Fisher in his seminal
work on discriminant analysis,
● “The use of multiple measurements in
taxonomic problems” (Fisher, 1936)
● The Iris dataset contains 150 observations of
three different species:
● Iris setosa, Iris virginica, and Iris versicolor, with 50
observations each.
● Each observation consists of four attributes: sepal
length, sepal width, petal length, and petal width.
● The fifth attribute, the label, is the name of the
species observed.
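The record layout described above can be sketched as a small Python type. This is only an illustration: the names `Species`, `IrisObservation`, and `features` are hypothetical and do not come from any library, and the single sample row uses the values commonly listed as the first observation in distributed copies of the dataset (measurements in centimetres).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Species(Enum):
    """The label attribute: one of three species (names are illustrative)."""
    SETOSA = "Iris setosa"
    VERSICOLOR = "Iris versicolor"
    VIRGINICA = "Iris virginica"


@dataclass(frozen=True)
class IrisObservation:
    """One observation: four numeric attributes plus the species label."""
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: Species

    def features(self) -> Tuple[float, float, float, float]:
        # Return only the four numeric attributes, without the label.
        return (self.sepal_length, self.sepal_width,
                self.petal_length, self.petal_width)


# Sample row, assumed to match the first record of the usual dataset file.
sample = IrisObservation(5.1, 3.5, 1.4, 0.2, Species.SETOSA)
print(sample.features(), sample.species.value)
```

Keeping the label separate from the four measurements mirrors the usual split between input attributes and the class to be predicted.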
1.1 Types of Data
● Data come in different formats and types
● For example, the temperature in weather data
can be expressed in any of the following
formats:
– Numeric: in degrees Celsius (31 °C, 33.3 °C), Fahrenheit (100
°F, 101.45 °F), or on the Kelvin scale
– Ordered labels, as in hot, mild, or cold
– Number of days within a year below 0 °C (10 days in a
year below freezing)
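The three representations of temperature listed above can be sketched in a few lines of Python. The function names, the label thresholds (`cold_below`, `hot_from`), and the sample readings are assumptions made for illustration; the slides do not define where "cold" ends or "hot" begins.

```python
from typing import Iterable, List


def celsius_to_fahrenheit(c: float) -> float:
    # Numeric representation on the Fahrenheit scale.
    return c * 9 / 5 + 32


def celsius_to_kelvin(c: float) -> float:
    # Numeric representation on the Kelvin scale.
    return c + 273.15


def temperature_label(c: float, cold_below: float = 10.0,
                      hot_from: float = 25.0) -> str:
    # Ordered label; the two thresholds are assumed, not standard values.
    if c < cold_below:
        return "cold"
    if c >= hot_from:
        return "hot"
    return "mild"


def days_below_freezing(daily_celsius: Iterable[float]) -> int:
    # Count representation: number of days with a reading below 0 °C.
    return sum(1 for t in daily_celsius if t < 0)


# Hypothetical daily readings in degrees Celsius.
readings: List[float] = [-3.0, 4.5, 12.0, 31.0, -0.5]

labels = [temperature_label(t) for t in readings]
print(labels)
print(days_below_freezing(readings))
```

The same underlying quantity ends up as a continuous number, an ordered category, or an integer count, which is why the data type has to be identified before choosing an analysis.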
Types of Data (contd.)
● Numerical or Continuous