Unit 3 Part II
What is a “Sample Size”?
• A sample is a part of the population chosen for a survey or experiment; the sample
size is the number of individuals in that sample.
• For example, you might take a survey of dog owners’ brand preferences. You won’t
want to survey all the millions of dog owners in the country (because it’s too
expensive or time consuming), so you take a sample, perhaps several thousand
owners. The sample represents all dog owners’ brand preferences. If you choose
your sample wisely, it will be a good representation.
When Error Can Creep In
• When you survey only a small sample of the population, uncertainty creeps into
your statistics.
• When you survey only a certain percentage of the true population, you can never be
100% sure that your statistics are a complete and accurate representation of the
population.
• For example, you might state that your results are at a 90% confidence level. That
means that if you were to repeat your survey over and over, about 90% of the
resulting interval estimates would contain the true population value.
How to Find a Sample Size in Statistics
• A sample is a subset of the total population in statistics.
• Finding a sample size can be one of the most challenging tasks in statistics and
depends upon many factors including the size of your original population.
Some methods for estimating the sample size
Histogram of above example
How about a family of three?
Probability Distributions (PD)
• Discrete PD: Bernoulli, Binomial, Poisson, Geometric
• Continuous PD: Uniform, Normal, Exponential, Triangular, Lognormal, Beta
Discrete Probability Distributions
Probability Mass Function (PMF)
• A mathematical function f(x) specifying the probability that the random
variable X takes the value x.
• xi represents the i-th value of X.
Properties:
• 0 ≤ f(xi) ≤ 1 for every i
• The values f(xi) sum to 1 over all i
Apprentice example
• Teams were required to select an artist (mainstream or avant-garde) and
sell their art for the most money possible.
Discrete Probability Distributions
Bernoulli Distribution
• two possible outcomes each with a constant probability of
occurrence
• typically “success” is x = 1 and “failure” x = 0
• p is the probability of a success outcome
E[X] = p
Var[X] = p(1 − p)
Discrete Probability Distributions
Example: Using the Bernoulli Distribution
Model whether an individual responds positively to a telemarketing
promotion.
• You have a box with 20 red and 80 white marbles.
• You ask individuals exposed to the telemarketing promotion to select a
marble and then replace it.
• If the customer selects a red marble, the customer makes a purchase.
• If the customer selects a white marble, the customer does not make a
purchase.
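The marble box above can be sketched in a few lines of Python. This is only an illustration: the names `red`, `white`, and `bernoulli_pmf` are made up for this sketch and do not come from the slides.

```python
# Marble box from the example: red means "purchase", white means "no purchase".
red, white = 20, 80
p = red / (red + white)  # probability of success (drawing a red marble)


def bernoulli_pmf(x, p):
    """Return P(X = x) for a Bernoulli variable, where x is 0 or 1."""
    if x not in (0, 1):
        raise ValueError("x must be 0 (failure) or 1 (success)")
    return p if x == 1 else 1 - p


expected_value = p          # E[X] = p
variance = p * (1 - p)      # Var[X] = p(1 - p)

print("P(purchase)    =", bernoulli_pmf(1, p))
print("P(no purchase) =", bernoulli_pmf(0, p))
print("E[X] =", expected_value, " Var[X] =", round(variance, 4))
```

Because each marble is replaced after it is drawn, p stays constant from one customer to the next, which is what the Bernoulli model requires.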
Discrete Probability Distributions
Binomial Distribution
• Models n independent replications of a Bernoulli experiment
• X represents the number of successes in these n experiments
Example: Computing Binomial Probabilities
• Suppose 10 individuals receive the telemarketing promotion.
• Each individual has a 0.2 probability of making a purchase.
• Find the probability that exactly 3 of the 10 individuals make a
purchase.
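One way to work this example is with the standard binomial formula, f(x) = C(n, x) p^x (1 − p)^(n − x), which is not written out on the slide and is assumed here. The helper name `binomial_pmf` is hypothetical.

```python
import math


def binomial_pmf(x, n, p):
    """Return P(X = x): probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


n = 10   # individuals receiving the promotion
p = 0.2  # probability that any one individual makes a purchase

prob_exactly_3 = binomial_pmf(3, n, p)
print(f"P(X = 3) = {prob_exactly_3:.4f}")

# Sanity check: the probabilities over x = 0..n should sum to 1.
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
print(f"Sum over all x = {total:.4f}")
```

Under these assumptions the probability that exactly 3 of the 10 individuals make a purchase comes out at roughly 0.20.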
Discrete Probability Distributions
Poisson Distribution
• Models the number of occurrences in some unit of measure (often
time or distance).
• There is no limit on the number of occurrences.
• The average number of occurrences per unit is a constant denoted as λ.
Discrete Probability Distributions
Example: Computing Poisson Probabilities
Suppose the average number of customers arriving at a Subway restaurant
during lunch hour is λ = 12 per hour. The probability that exactly x customers
arrive during the hour is given by the Poisson distribution,
f(x) = e^(−λ) λ^x / x!. Find the probability that exactly 5 arrive during
lunch hour:
f(5) = e^(−12) (12^5) / 5!
= (0.000006144)(248,832)/120
= 0.0127
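The arithmetic in this example is easy to check numerically. The sketch below uses the standard Poisson formula f(x) = e^(−λ) λ^x / x!; the function name `poisson_pmf` is an assumption made for illustration.

```python
import math


def poisson_pmf(x, lam):
    """Return P(X = x) for a Poisson variable with mean rate lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)


lam = 12  # average customer arrivals per lunch hour

prob_exactly_5 = poisson_pmf(5, lam)
print(f"e^-12    = {math.exp(-lam):.9f}")
print(f"12^5     = {lam**5:,}")
print(f"P(X = 5) = {prob_exactly_5:.4f}")
```

With an average of 12 arrivals per hour, seeing exactly 5 is unlikely: the probability is a little over 1%.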
Continuous Probability Distributions
Probability density function
• A curve described by a mathematical function that characterizes a
continuous random variable
• The total area under the curve equals 1.
Continuous Probability Distributions
Uniform Distribution
• All outcomes between a minimum a and a maximum b are equally likely.
• Expected Value = (a + b)/2
• Variance = (b − a)^2/12
Continuous Probability Distributions
Example: Computing Uniform Probabilities
• Sales revenue for a product varies uniformly each week between $1,000 and
$2,000.
• f(x) = 1/(2000 − 1000) = 1/1000 for 1000 ≤ x ≤ 2000, so the total area
under f(x) is 1.
Continuous Probability Distributions
Example: (continued) Computing Uniform Probabilities
• Find the probability sales revenue will be less
than $1,300.
• P(X < 1300) = (1300 − 1000)/1000 = 0.30
Continuous Probability Distributions
Example (continued): Uniform Probabilities
• Find the probability that revenue will be between
$1,500 and $1,700.
• P(1500 < X < 1700) = (1700 − 1500)/1000 = 0.20
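Both uniform probabilities in this example are areas of rectangles under f(x) = 1/1000, so they can be sketched with a small cumulative-probability helper. The name `uniform_cdf` is hypothetical, and the expected value and variance use the standard uniform formulas (a + b)/2 and (b − a)^2/12.

```python
def uniform_cdf(x, a, b):
    """Return P(X <= x) for a uniform variable on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


a, b = 1000, 2000  # weekly sales revenue range in dollars

p_below_1300 = uniform_cdf(1300, a, b)
p_1500_to_1700 = uniform_cdf(1700, a, b) - uniform_cdf(1500, a, b)

expected_value = (a + b) / 2
variance = (b - a) ** 2 / 12

print(f"P(X < 1300)        = {p_below_1300:.2f}")
print(f"P(1500 < X < 1700) = {p_1500_to_1700:.2f}")
print(f"Expected value     = {expected_value:.0f}")
print(f"Variance           = {variance:.2f}")
```

For a continuous variable, P(X < x) and P(X ≤ x) are the same, so the helper can be used for either form of the question.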
Normal Distribution
- f(x) is a bell-shaped curve
- Characterized by 2 parameters
μ (mean)
σ² (variance)
- Properties
1. Symmetric
2. Mean = Median = Mode
3. Unbounded
4. Empirical rule applies (about 68%, 95%, and 99.7% of values fall within 1, 2,
and 3 standard deviations of the mean)
Continuous Probability Distributions
Example: Using Z-table to Compute Normal Probabilities
• The distribution for customer demand (units per month) is normal with:
mean = 750
stdev. = 100
• Find the probability that demand will be:
a) at most 900 units/month
b) more than 700 units/month
c) between 700 and 900 units/month
Continuous Probability Distributions
Example: Computing Probabilities with the Standard Normal Tables
a) P(X ≤ 900): z = (900 − 750)/100 = 1.50; the Z-table value is 0.9332,
so P(X ≤ 900) = 0.9332
b) P(X > 700): z = (700 − 750)/100 = −0.50; the Z-table value is 0.3085,
so P(X > 700) = 1 − 0.3085 = 0.6915
c) P(700 < X < 900) = 0.9332 − 0.3085 = 0.6247
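The Z-table values in this example can be cross-checked with `NormalDist` from Python's standard `statistics` module. This is a sketch for verification only; the variable names are chosen for illustration.

```python
from statistics import NormalDist

# Customer demand (units per month): normal with mean 750 and stdev 100.
demand = NormalDist(mu=750, sigma=100)

# z-scores used for the table lookups
z_900 = (900 - 750) / 100
z_700 = (700 - 750) / 100

p_at_most_900 = demand.cdf(900)                  # a) P(X <= 900)
p_more_than_700 = 1 - demand.cdf(700)            # b) P(X > 700)
p_between = demand.cdf(900) - demand.cdf(700)    # c) P(700 < X < 900)

print(f"z for 900 = {z_900:.2f}, z for 700 = {z_700:.2f}")
print(f"a) P(X <= 900)       = {p_at_most_900:.4f}")
print(f"b) P(X > 700)        = {p_more_than_700:.4f}")
print(f"c) P(700 < X < 900)  = {p_between:.4f}")
```

The computed values agree with the table results to four decimal places, apart from small rounding differences that can arise when table entries are subtracted.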
Continuous Probability Distributions
Exponential Distribution
• Models the time between randomly occurring events (arrivals, machine failures,
etc.)
• Figure: exponential distribution plotted with λ = 1
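As a rough illustration of how the exponential distribution is used, the sketch below computes the probability that the next event occurs within a given time. It assumes the standard exponential results P(T ≤ t) = 1 − e^(−λt) and mean 1/λ, which are not stated on the slide; the function name `exponential_cdf` and the choice t = 1 are hypothetical.

```python
import math


def exponential_cdf(t, lam):
    """Return P(T <= t) for an exponential variable with rate lam."""
    if t < 0:
        return 0.0
    return 1 - math.exp(-lam * t)


lam = 1.0  # rate of events per unit of time, matching the lambda = 1 case

mean_time_between_events = 1 / lam
p_within_one_unit = exponential_cdf(1.0, lam)

print(f"Mean time between events = {mean_time_between_events:.2f}")
print(f"P(T <= 1)                = {p_within_one_unit:.4f}")
```

With λ = 1, about 63% of events arrive within one time unit of the previous event.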
• A sampling distribution describes how the values of a statistic are distributed across
repeated samples. Technically, any statistic can be used to paint the picture; some
common ones are:
• Mean
• Mean absolute value of the deviation from the mean
• Range
• Standard deviation of the sample
• Unbiased estimate of variance
• Variance of the sample
• Proportion
How measures of central tendency and spread are affected by changes to the data set
What happens to measures of central tendency and spread when we add a constant
value to every value in the data set? To answer this question, let’s pretend we have
the data set 3, 3, 7, 9, 13, and let’s calculate our measures for the set.
• Mean: (3+3+7+9+13)/5=7
• Median: 7
• Mode: 3
• Range: 13-3=10
• IQR: 11-3=8
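Now add 6 to every value, which gives the data set 9, 9, 13, 15, 19, and
recalculate:
• Mean: (9+9+13+15+19)/5=13
• Median: 13
• Mode: 9
• Range: 19-9=10
• IQR: 17-9=8
What we see is that adding 6 to the entire data set also adds 6 to the mean,
median, and mode, but that the range and IQR stay the same.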
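The effect of adding a constant can be checked with a short script. This is a sketch under one assumption: the IQR is computed by taking the medians of the lower and upper halves of the sorted data (excluding the middle value when the count is odd), which reproduces the 11 − 3 = 8 shown above. The helper names `iqr` and `summarize` are made up for this example.

```python
from statistics import mean, median, mode


def iqr(values):
    """Interquartile range using the median-of-halves method."""
    s = sorted(values)
    half = len(s) // 2
    lower = s[:half]
    # Skip the middle value when the number of data points is odd.
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return median(upper) - median(lower)


def summarize(values):
    """Return the measures of central tendency and spread for a data set."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
        "iqr": iqr(values),
    }


data = [3, 3, 7, 9, 13]
shifted = [x + 6 for x in data]  # add the constant 6 to every value

before = summarize(data)
after = summarize(shifted)

for name in before:
    print(f"{name:>6}: {before[name]} -> {after[name]}")
```

The measures of central tendency each move by 6, while the two measures of spread are unchanged, because shifting every value by the same amount does not change the distances between values.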