Statistics and Data
Outline
Relevance of Statistics
Course Details
Relevance of Statistics
Case: Pepsi’s Exclusivity Agreement
• A large university with a total enrollment of
about 50,000 students has offered Pepsi an
exclusivity agreement that would give Pepsi
exclusive rights to sell its products at all
university facilities for the next year with an
option for future years.
• In return, the university would receive 35% of
the on-campus revenues and an additional lump
sum of $200,000 per year.
• Pepsi has been given 2 weeks to respond.
• The market for soft drinks is measured in
terms of 12-ounce cans.
Source: https://99designs.com/icon-button-design/contests/icon-button-design-wanted-guessing-game-167222
Profit-Loss Calculation
• Suppose the current market share were around
25%.
• With exclusivity, Pepsi would sell to the whole
market: 88,000 cans per week (its current 22,000
is 25% of 88,000), or 3,520,000 cans per year
(88,000 × 40 weeks).
• The profit or loss can be calculated.
Source: https://www.score.org/resource/12-month-profit-and-loss-projection
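The profit-loss calculation can be sketched as follows. The 35% revenue share, the $200,000 lump sum, and the 3,520,000 cans per year come from the case; the price and cost per can are not given in these slides, so the values used below are hypothetical placeholders, and the function name is illustrative only.

```python
def annual_profit(cans_per_year, price_per_can, cost_per_can,
                  revenue_share=0.35, lump_sum=200_000):
    """Annual profit to Pepsi under the exclusivity agreement (a sketch)."""
    revenue = cans_per_year * price_per_can
    # The university receives a share of revenue plus a fixed lump sum.
    university_payment = revenue_share * revenue + lump_sum
    production_cost = cans_per_year * cost_per_can
    return revenue - university_payment - production_cost


# Market size implied by a 25% share: 22,000 is 25% of 88,000.
weekly_market = 22_000 / 0.25
cans_per_year = 3_520_000  # figure quoted in the case

# Hypothetical price and cost per can (assumptions, not from the case).
profit = annual_profit(cans_per_year, price_per_can=1.00, cost_per_can=0.25)
print(f"Weekly market: {weekly_market:,.0f} cans")
print(f"Annual profit: ${profit:,.0f}")
```

A negative result would indicate a loss. Because the outcome depends heavily on the assumed market share, the case turns on estimating the true weekly market.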
Case 1: Market Survey
• The only problem is that Pepsi does not know
how many soft drinks are sold weekly at the
university.
• Pepsi assigned a recent university graduate to
survey the university's students to supply the
missing information.
• Accordingly, she organizes a survey that asks 500
students to keep track of the number of soft drinks
they purchase in the next 7 days.
Source: https://getthematic.com/insights/customer-survey-design/
Simple Random Sample
• A simple random sample is a sample
of n observations which has the
same probability of being selected
from the population as any other
sample of n observations.
• Most statistical methods presume
simple random samples.
• However, in some situations
other sampling methods have an
advantage over simple random
samples.
Source: https://www.statisticshowto.com/simple-random-sample/
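A simple random sample can be drawn with Python's standard library, as in the minimal sketch below. The student IDs 1 to 50,000 are a hypothetical sampling frame; the sample size of 500 matches the survey in the case.

```python
import random

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Hypothetical sampling frame: one ID per enrolled student.
population = list(range(1, 50_001))

# Draw 500 students without replacement; every subset of size 500
# is equally likely to be chosen.
sample = rng.sample(population, k=500)
print(sample[:5])
```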
Stratified Random Sampling
• Divide the population into mutually
exclusive and collectively exhaustive
groups, called strata.
• Randomly select observations from each
stratum, with the number drawn proportional
to the stratum’s size.
• Advantages:
  - Guarantees that each population
    subdivision is represented in the
    sample.
  - Parameter estimates typically have greater
    precision than those estimated from
    simple random sampling.
Source: https://www.netquest.com/blog/en/random-sampling-stratified-sampling
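The two steps above can be sketched with proportional allocation. The strata (class years) and their sizes are made up for illustration; only the total of 50,000 students and the sample size of 500 echo the case.

```python
import random

rng = random.Random(0)

# Hypothetical strata: student IDs grouped by class year (sizes are made up).
strata = {
    "first_year": list(range(0, 20_000)),
    "second_year": list(range(20_000, 35_000)),
    "third_year": list(range(35_000, 45_000)),
    "fourth_year": list(range(45_000, 50_000)),
}
population_size = sum(len(members) for members in strata.values())
sample_size = 500

stratified_sample = {}
for name, members in strata.items():
    # Proportional allocation: the stratum's share of the sample
    # equals its share of the population.
    n_stratum = round(sample_size * len(members) / population_size)
    stratified_sample[name] = rng.sample(members, n_stratum)

for name, drawn in stratified_sample.items():
    print(name, len(drawn))
```

With other stratum sizes, rounding can make the per-stratum counts sum to slightly more or less than the target sample size, so a real design would need a rule for the remainder.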
Cluster Sampling
• Divide population into mutually exclusive
and collectively exhaustive groups, called
clusters.
• Randomly select clusters.
• Sample every observation in those randomly
selected clusters.
• Advantages and disadvantages:
  - Less expensive than other sampling
    methods.
  - Less precision than simple random
    sampling or stratified sampling.
  - Useful when clusters occur naturally in
    the population.
Source: https://www.netquest.com/blog/en/cluster-sampling
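A minimal sketch of the cluster sampling steps follows. The clusters here are hypothetical residence halls of 50 students each; nothing about halls appears in the case.

```python
import random

rng = random.Random(1)

# Hypothetical clusters: 100 residence halls with 50 students each.
clusters = {
    hall: [f"hall{hall}-student{i}" for i in range(50)]
    for hall in range(100)
}

# Step 1: randomly select clusters.
chosen_halls = rng.sample(sorted(clusters), k=5)

# Step 2: sample every observation in the selected clusters.
cluster_sample = [student for hall in chosen_halls for student in clusters[hall]]
print(chosen_halls, len(cluster_sample))
```

Only the five chosen halls need to be visited, which is where the cost saving comes from; the price is that students within one hall may resemble each other more than students drawn across the whole campus.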
A Simple Representation of Survey Data (First 8 Rows)
Student ID    No. of Cans Purchased in a Week
1             14
2             10
3             8
4             6
5             9
6             12
7             13
8             4
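As an illustration of how the survey supplies the missing information, the sketch below scales the sample mean up to the 50,000 enrolled students. It uses only the eight rows shown here, so the result is illustrative; the actual estimate would use all 500 responses.

```python
from statistics import mean

# First 8 rows of the survey: cans purchased in a week, per student.
cans = [14, 10, 8, 6, 9, 12, 13, 4]
enrollment = 50_000  # total enrollment stated in the case

sample_mean = mean(cans)  # a statistic: average cans per student per week
estimated_weekly_market = sample_mean * enrollment
print(sample_mean, estimated_weekly_market)
```

For these eight rows the sample mean is 9.5 cans per student, which scales to 475,000 cans per week.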
Decision-Making
[Diagram: the sample is a subset of the population; a parameter describes the population and a statistic describes the sample.]
• Sample
A sample is a set of data drawn from the population.
It should be large enough to be informative, but it is smaller than the population.
Parameter and Statistic
• Parameter
A descriptive measure of a population.
• Statistic
A descriptive measure of a sample.
Need for Sampling
• Too expensive to gather information on the entire population.
• Often impossible to gather information on the entire population.
Two Branches of Statistics
Data Types
• Cross-Sectional
• Time Series
Case 1: Survey Data
Cross-sectional Data
• Data collected by recording a characteristic of many subjects at the same point in time, or without
regard to differences in time.
• Subjects might include individuals, households, firms, industries, regions, and countries.
Student ID    No. of Cans Purchased in a Week
1             14
2             10
3             8
4             6
5             9
6             12
7             13
8             4
Time Series Data
• Data collected by recording a characteristic of a subject over several time periods.
[Figure: e-3 Wheeler Registrations in India, January 2013 to January 2022.]
Source: https://www.thehindu.com/opinion/op-ed/indias-ev-ambition-rides-on-three-wheels/article65480119.ece
Case 2: Tween Survey
• Luke McCaffrey owns a ski resort two hours outside Boston.
• Luke is in need of a new marketing manager.
• Luke is particularly interested in serving the needs of the “tween” population
(children aged 8 to 12 years old).
• He believes that tween spending power has grown over the past few years, and
he wants their skiing experience to be memorable so that they want to return.
Tween Survey
• At the end of last year’s ski season, Luke asked 20 tweens four specific
questions:
Q1. On your car drive to the resort, which music streaming service
was playing?
Q2. Rate the quality of the food at the resort on a scale of 1 to 4.
Q3. What time should the main dining area close?
Q4. How much of your own money did you spend at the resort today?
Tween Survey Data
Variables and Scales of Measurement
Variable
• A variable is the general characteristic being observed on an object of
interest.
Types of Variables
• Qualitative: gender, race, political affiliation
• Quantitative: test scores, age, weight
  - Discrete
  - Continuous
Discrete Variable
• A discrete variable assumes a countable number of distinct values.
• Examples: Number of children in a family, number of points scored in a
basketball game.
Continuous Variable
• A continuous variable can assume an infinite number of values within
some interval.
• Examples: Weight, height, investment return.
Scales of Measurement
Qualitative Variables
- Nominal
- Ordinal
Quantitative Variables
- Interval
- Ratio
Nominal Scale
• The least sophisticated level of measurement.
• Data are simply categories for grouping the data.
• Solution: The music streaming responses are nominal data: the values differ merely in
name or label.
Tweens Survey
• How are the data based on the ratings of the food quality similar to or
different from the music streaming data?
• Solution: These are ordinal since they can be both categorized and ranked.
Interval Scale
• Differences between values are equal and meaningful. Thus, the
arithmetic operations of addition and subtraction are meaningful.
• No “absolute 0” or starting point defined. Meaningful ratios may not be
obtained.
Interval Scale
• For example, consider the Fahrenheit
scale of temperature.
• This scale is interval because the data
can be ranked and differences (+ or -)
are meaningful.
• But there is no “absolute 0”: 0°F does not
mean the absence of temperature.
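A short worked example shows why ratios are not meaningful on an interval scale. The temperatures 80°F, 60°F, and 40°F are arbitrary illustrative values; the conversion formula is the standard Fahrenheit-to-Celsius one.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (temp_f - 32) * 5 / 9


a_f, c_f, b_f = 80.0, 60.0, 40.0  # illustrative temperatures
a_c, c_c, b_c = (fahrenheit_to_celsius(t) for t in (a_f, c_f, b_f))

# The ratio of two temperatures changes with the unit, so "twice as hot"
# has no meaning on an interval scale.
print("Ratio in Fahrenheit:", a_f / b_f)
print("Ratio in Celsius:   ", a_c / b_c)

# Differences behave consistently: equal gaps in Fahrenheit remain
# equal gaps in Celsius.
print("Gap ratio in Fahrenheit:", (a_f - c_f) / (c_f - b_f))
print("Gap ratio in Celsius:   ", (a_c - c_c) / (c_c - b_c))
```

Here 80°F is twice 40°F numerically, yet the same two temperatures in Celsius have a ratio of about 6, while the equal 20°F gaps stay equal after conversion.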
Ratio Scale
• The strongest level of measurement.
• Differences between values are equal and meaningful.
• There is an “absolute 0” or defined starting point. “0” does mean
“the absence of …” Thus, meaningful ratios may be obtained.
Ratio Scale
• The following variables are measured on a ratio scale:
  - General examples: weight and distance
  - Business examples: sales, profits, and inventory levels
Tween Survey
• How are the time data classified? In what ways do the time data differ from
ordinal data? What is a potential weakness of this measurement scale?
• Solution: Clock time responses are on an interval scale. With this type of data,
we can calculate meaningful differences; however, there is no apparent zero
point, so ratios of clock times are not meaningful.
Tween Survey
• What is the measurement scale of the money data? Why is it considered the
most sophisticated form of data?
• Solution: Since the tweens’ responses are in dollar amounts, these are ratio-scaled
data. Ratio-scaled data have a natural zero point, which allows the calculation of
meaningful ratios.
Synopsis of Tween Survey
• 60% of the tweens listened to Spotify. The resort may want to direct its
advertising dollars to this streaming service.
• 55% of the tweens felt that the food was, at best, fair.
• 95% of the tweens would like the dining area to remain open later.
• 85% of the tweens spent their own money at the lodge.
Course Details
Course Plan
• Introduction
• Descriptive Statistics
• Introduction to Probability and Probability Distributions
• Sampling Distribution and Interval Estimation
5/6 Questions
Total Marks: 50