05 Probability Distributions
(CS40003)
Lecture #5
Probability Distributions
Binomial distribution
Multinomial distribution
Hypergeometric distribution
Poisson distribution
CS 40003: Data Analytics 3
Today’s discussion
• Continuous probability distribution
Continuous uniform probability distribution
Normal distribution
Chi-squared distribution
Gamma distribution
Exponential distribution
Lognormal distribution
Weibull distribution
Probability deals with predicting the likelihood of future events.
Statistics involves the analysis of the frequency of past events.
Example: Consider there is a drawer containing 100 socks: 30 red, 20 blue and
50 black socks.
We can use probability to answer questions about the selection of a
random sample of these socks.
PQ1. What is the probability that we draw two blue socks or two red socks from
the drawer?
PQ2. What is the probability that we pull out three socks or have a matching pair?
PQ3. What is the probability that we draw five socks and they are all black?
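PQ3 can be worked out by counting, assuming the five socks are drawn without replacement and every set of five socks is equally likely. The sketch below is only illustrative; the function name `prob_all_black` is not from the lecture.

```python
from math import comb

# Drawer contents as stated in the example.
RED, BLUE, BLACK = 30, 20, 50
TOTAL = RED + BLUE + BLACK  # 100 socks


def prob_all_black(draws: int) -> float:
    """P(all drawn socks are black), assuming draws without replacement."""
    return comb(BLACK, draws) / comb(TOTAL, draws)


p_five_black = prob_all_black(5)
print(f"P(five black socks) = {p_five_black:.4f}")  # about 0.0281
```

If the socks were instead returned to the drawer after each draw, the answer would be 0.5 raised to the power 5.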
SQ1: A random sample of 10 socks from the drawer produced one blue, four red, and five
black socks. What is the total population of black, blue, or red socks in the drawer?
SQ2: We randomly sample 10 socks, write down the number of black socks, and
then return the socks to the drawer. The process is repeated five times. The mean
number of black socks over these trials is 7. What is the true number of black socks in
the drawer?
etc.
In statistics, we are given data and asked what kind of model is likely to have
generated it.
Example 4.2: In the “Measles Study”, we define a random variable X as the number
of parents in a married couple who have had childhood measles.
This random variable can take the values 0, 1, or 2.
Note:
A random variable is not exactly the same as a variable defining data.
The probability that the random variable takes a given value can be computed
using the rules governing probability.
For example, the probability that X = 1, meaning that either the mother or the father,
but not both, has had measles, is 0.32. Symbolically, it is denoted as P(X=1) = 0.32
Probability Distribution
Definition 4.2: Probability distribution
A probability distribution is a definition of the probabilities of the values of a
random variable.
Example 4.3: Given that p = 0.2 is the probability that a person (aged between
17 and 35) has had childhood measles, the probability distribution of X, the number
of parents in a couple who have had measles, is given by
X    Probability
0    0.64
1    0.32
2    0.04

[Figure: Measles Study, bar chart of f(x) against x = 0, 1, 2 with heights 0.64, 0.32, 0.04]
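The three probabilities in the measles table can be reproduced from a single per-person probability. The sketch below assumes that each parent has had measles independently with probability p = 0.2, which is the value consistent with 0.64, 0.32 and 0.04; the variable names are illustrative.

```python
# Assumption: each parent independently has had childhood measles with
# probability p = 0.2 (the value that reproduces the table).
p = 0.2

distribution = {
    0: (1 - p) ** 2,     # neither parent has had measles
    1: 2 * p * (1 - p),  # exactly one parent (mother or father, not both)
    2: p ** 2,           # both parents have had measles
}

for x, prob in distribution.items():
    print(f"P(X={x}) = {prob:.2f}")

# A valid probability distribution must sum to 1.
total = sum(distribution.values())
```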
Simulation studies can often eliminate the need for costly experiments and are
also used to study problems where actual experimentation is impossible.
Examples 4.4:
1) In a study testing the effectiveness of a new drug, the number of cured
patients among all the patients who use the drug approximately follows a
binomial distribution.
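The function for computing the probability for the binomial probability
distribution is given by
$$f(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$
for x = 0, 1, 2, ..., n
Here, $f(x) = P(X = x)$, where $X$ denotes “the number of successes” and $X = x$ denotes
$x$ successes in $n$ trials, each with success probability $p$.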
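The binomial probability function can be evaluated directly. The following is a minimal sketch using only the standard library; the function name `binomial_pmf` is illustrative, and the value p = 0.2 used in the example call is an assumption taken from the measles study.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x): x successes in n independent trials, success probability p."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Measles study: n = 2 parents per couple, p = 0.2 (assumed).
pmf = [binomial_pmf(x, 2, 0.2) for x in range(3)]
for x, prob in enumerate(pmf):
    print(f"f({x}) = {prob:.2f}")
```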
Ten couples are simulated with ten pairs of random digits, one digit per parent:
15 38 68 39 49 54 19 79 38 14
If the value of a digit is 0 or 1, the outcome is “had childhood measles”; otherwise
(digits 2 to 9), the outcome is “did not”.
For example, the first pair (i.e., 15) represents a couple for which x = 1. The
relative frequency distribution for this sample is
x 0 1 2
f(x)=P(X=x) 0.7 0.3 0.0
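The tally behind this sample distribution can be reproduced mechanically. The sketch below applies the stated rule (digit 0 or 1 means “had childhood measles”) to the ten digit pairs; the helper name `parents_with_measles` is illustrative.

```python
from collections import Counter

# The ten digit pairs from the example, one pair per simulated couple.
pairs = ["15", "38", "68", "39", "49", "54", "19", "79", "38", "14"]


def parents_with_measles(pair: str) -> int:
    """Count the digits in the pair that are 0 or 1 (had childhood measles)."""
    return sum(1 for digit in pair if digit in "01")


counts = Counter(parents_with_measles(pair) for pair in pairs)
relative_freq = {x: counts.get(x, 0) / len(pairs) for x in (0, 1, 2)}
print(relative_freq)  # {0: 0.7, 1: 0.3, 2: 0.0}
```

With only ten couples the sample frequencies differ from the probabilities 0.64, 0.32 and 0.04; a larger set of random digit pairs would be expected to come closer.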
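If a given trial can result in the k outcomes $E_1, E_2, \ldots, E_k$ with probabilities $p_1, p_2, \ldots, p_k$, then the
probability distribution of the random variables $X_1, X_2, \ldots, X_k$ representing the number of
occurrences for $E_1, E_2, \ldots, E_k$ in n independent trials is
$$f(x_1, x_2, \ldots, x_k) = \binom{n}{x_1, x_2, \ldots, x_k}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$
where $\binom{n}{x_1, x_2, \ldots, x_k} = \frac{n!}{x_1!\, x_2! \cdots x_k!}$,
$\sum_{i=1}^{k} x_i = n$ and $\sum_{i=1}^{k} p_i = 1$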
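A small sketch of the multinomial probability function follows. The function name `multinomial_pmf` is illustrative. The example call reuses the sock proportions (0.3 red, 0.2 blue, 0.5 black) and assumes, unlike a real draw from the drawer, that each sock is replaced before the next draw so that the trials are independent.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """P(X_1 = x_1, ..., X_k = x_k) for n = sum(counts) independent trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    n = sum(counts)
    # Multinomial coefficient n! / (x_1! x_2! ... x_k!), kept as an exact integer.
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(p**x for p, x in zip(probs, counts))


# 10 draws with replacement: 4 red, 1 blue, 5 black.
p_sample = multinomial_pmf([4, 1, 5], [0.3, 0.2, 0.5])
print(f"P(4 red, 1 blue, 5 black) = {p_sample:.4f}")
```

With k = 2 outcomes the function reduces to the binomial case.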
Example 4.8:
Probability of observing three red cards in 5 draws from an ordinary deck of 52
playing cards.
You draw one card, note the result, and then return it to the deck.
The deck is reshuffled well before the next draw is made.
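• The hypergeometric distribution does not require independence and is based on
sampling done without replacement.
For a sample of size $n$ drawn from $N$ items, of which $k$ are labelled success, the probability of $x$ successes is
$$h(x; N, n, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$$
with $\max(0,\, n-(N-k)) \le x \le \min(n, k)$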
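The card example can be computed both ways to see the effect of replacement. The sketch below assumes the standard forms of the binomial and hypergeometric probability functions; the function names and the symbols N (deck size), n (draws) and k (red cards in the deck) are illustrative choices.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Sampling with replacement: independent draws, success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def hypergeometric_pmf(x: int, N: int, n: int, k: int) -> float:
    """Sampling without replacement: x successes in n draws from N items,
    of which k are labelled success."""
    if x < max(0, n - (N - k)) or x > min(n, k):
        return 0.0
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


# Three red cards in 5 draws from a 52-card deck containing 26 red cards.
with_replacement = binomial_pmf(3, 5, 26 / 52)
without_replacement = hypergeometric_pmf(3, 52, 5, 26)
print(f"with replacement:    {with_replacement:.4f}")     # 0.3125
print(f"without replacement: {without_replacement:.4f}")  # about 0.3251
```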
Example 4.9:
Number of clients visiting a ticket selling counter in a metro station.
1. $\mu = \sum_{x} x\, f(x)$ [ $\mu$ is the mean ]
2. $\sigma^2 = \sum_{x} (x-\mu)^2 f(x)$ [ $\sigma^2$ is the variance ]
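2. Hypergeometric distribution
The hypergeometric distribution function is characterized by the size of a sample $n$,
the number of items $N$, and the $k$ items labelled success. Then
$\mu = \frac{nk}{N}$ and $\sigma^2 = \frac{N-n}{N-1} \cdot n \cdot \frac{k}{N}\left(1 - \frac{k}{N}\right)$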
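As a check, the mean and variance of a hypergeometric distribution can be computed by summing over its probability function and compared with the standard closed forms nk/N and ((N - n)/(N - 1)) n (k/N)(1 - k/N). In this sketch n is the sample size, N the number of items and k the number labelled success; these symbols, the helper names, and the card-deck numbers are illustrative assumptions.

```python
from math import comb


def hypergeometric_pmf(x: int, N: int, n: int, k: int) -> float:
    """P(X = x) when n items are sampled without replacement from N items,
    k of which are labelled success."""
    if x < max(0, n - (N - k)) or x > min(n, k):
        return 0.0
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def mean_and_variance(pmf, support):
    """Mean and variance of a discrete distribution by direct summation."""
    mu = sum(x * pmf(x) for x in support)
    var = sum((x - mu) ** 2 * pmf(x) for x in support)
    return mu, var


N, n, k = 52, 5, 26  # 5 cards drawn from a deck with 26 red cards
mu, var = mean_and_variance(lambda x: hypergeometric_pmf(x, N, n, k), range(n + 1))

mu_closed = n * k / N
var_closed = (N - n) / (N - 1) * n * (k / N) * (1 - k / N)
print(mu, mu_closed)
print(var, var_closed)
```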
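[Figure: Discrete Probability distribution, f(x) plotted at the points x1, x2, x3, x4 on the X=x axis]
[Figure: Continuous Probability Distribution, f(x) plotted as a curve over the X=x axis]
[Figure: f(x) against X=x, with the points a, b and A, B marked on the horizontal axis]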
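Note (continuous uniform distribution with density $f(x) = \frac{1}{B-A}$ on the interval (A,B)):
a) $\int_{A}^{B} f(x)\,dx = \frac{1}{B-A}(B-A) = 1$
b) $P(c < X < d) = \frac{d-c}{B-A}$, where both $c$ and $d$ are in the interval (A,B)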
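For a continuous uniform distribution on (A, B), the probability of a sub-interval is its length divided by B - A. The sketch below assumes the density 1/(B - A) on the interval and zero elsewhere; the function names, the endpoint symbols c and d, and the example numbers are illustrative.

```python
def uniform_pdf(x: float, A: float, B: float) -> float:
    """Density of the continuous uniform distribution on (A, B)."""
    return 1.0 / (B - A) if A <= x <= B else 0.0


def uniform_prob(c: float, d: float, A: float, B: float) -> float:
    """P(c < X < d); the interval is clipped to (A, B) before measuring it."""
    lower, upper = max(c, A), min(d, B)
    return max(upper - lower, 0.0) / (B - A)


# Hypothetical example: X uniform on (0, 10).
print(uniform_prob(2, 5, 0, 10))  # (5 - 2) / (10 - 0) = 0.3
```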
[Figure: Normal curves with µ1 < µ2 and σ1 = σ2]
[Figure: Normal curves with µ1 = µ2 and σ1 < σ2]
[Figure: Normal curves with µ1 < µ2 and σ1 < σ2]
Properties of Normal Distribution
• The curve is symmetric about a vertical axis through the mean µ.
• The random variable x can take any value from −∞ to ∞.
• The most frequently used descriptive parameters, µ and σ, define the curve itself.
• The mode, which is the point on the horizontal axis where the curve is a
maximum, occurs at x = µ.
• The total area under the curve and above the horizontal axis is equal to 1.
$z = \frac{x - \mu}{\sigma}$ [Z-transformation]
[Figure: the normal curve f(x: µ, σ), centered at x=µ, and the standard normal curve f(z: 0, 1), with µ=0 and σ=1]
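The Z-transformation lets any normal probability be read off the standard normal curve. The sketch below assumes the usual form z = (x - µ)/σ and evaluates the standard normal cumulative distribution with the error function from the standard library; the function names and the values µ = 10, σ = 5 are hypothetical.

```python
from math import erf, sqrt


def z_score(x: float, mu: float, sigma: float) -> float:
    """Z-transformation: map x from N(mu, sigma) to the standard normal scale."""
    return (x - mu) / sigma


def standard_normal_cdf(z: float) -> float:
    """P(Z <= z) for the standard normal distribution f(z: 0, 1)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X < b) for X ~ N(mu, sigma), computed through z-scores."""
    return standard_normal_cdf(z_score(b, mu, sigma)) - standard_normal_cdf(
        z_score(a, mu, sigma)
    )


# Hypothetical example: probability of falling within one sigma of the mean.
p_one_sigma = normal_prob_between(5, 15, mu=10, sigma=5)
print(f"P(mu - sigma < X < mu + sigma) = {p_one_sigma:.4f}")  # about 0.6827
```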
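Gamma function: $\Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha-1} e^{-x}\,dx$ for $\alpha > 0$
Further, $\Gamma(\alpha) = (\alpha-1)\,\Gamma(\alpha-1)$ for $\alpha > 1$, so that $\Gamma(n) = (n-1)!$ for a positive integer $n$
Note:
$\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}$ [An important property]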
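The continuous random variable $x$ has a gamma distribution with parameters $\alpha$ and
$\beta$ such that:
$$f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$
where $\alpha > 0$ and $\beta > 0$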
[Figure: gamma density curves f(x) against x for (α=1, β=1), (α=2, β=1) and (α=4, β=1)]
Note:
1) The mean and variance of the gamma distribution are
$\mu = \alpha\beta$ and $\sigma^2 = \alpha\beta^2$
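The mean and variance of a gamma distribution can be checked numerically. The sketch below assumes the shape-scale form of the density, x^(α-1) e^(-x/β) / (β^α Γ(α)) for x > 0, under which the mean is αβ and the variance is αβ². The midpoint-rule integrator, the cut-off at x = 60 and the parameter values are illustrative choices.

```python
from math import exp, gamma


def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma density with shape alpha and scale beta (assumed parameterization)."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * exp(-x / beta) / (beta**alpha * gamma(alpha))


def integrate(f, a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


alpha, beta = 2.0, 1.0
upper = 60.0  # the density is negligible beyond this point for these parameters

mean = integrate(lambda x: x * gamma_pdf(x, alpha, beta), 0.0, upper)
variance = integrate(lambda x: (x - mean) ** 2 * gamma_pdf(x, alpha, beta), 0.0, upper)
print(f"numerical mean     = {mean:.4f}, alpha*beta    = {alpha * beta}")
print(f"numerical variance = {variance:.4f}, alpha*beta^2 = {alpha * beta**2}")
```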
[Figure: lognormal density curves f(x) against x for (µ=0, σ=1) and (µ=1, σ=1)]
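The continuous random variable $x$ has a Weibull distribution with parameters $\alpha$ and
$\beta$ such that
$$f(x; \alpha, \beta) = \begin{cases} \alpha\beta\, x^{\beta-1} e^{-\alpha x^{\beta}} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$
where $\alpha > 0$ and $\beta > 0$
The mean and variance of the Weibull distribution are:
$\mu = \alpha^{-1/\beta}\,\Gamma\left(1 + \frac{1}{\beta}\right)$ and $\sigma^2 = \alpha^{-2/\beta}\left\{\Gamma\left(1 + \frac{2}{\beta}\right) - \left[\Gamma\left(1 + \frac{1}{\beta}\right)\right]^2\right\}$
[Figure: Weibull density curves f(x) against x for β=1, β=2 and β=3.5]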
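The Weibull density and its moments can be sketched in a few lines. This example assumes the parameterization f(x) = αβ x^(β-1) exp(-α x^β) for x > 0, for which the mean is α^(-1/β) Γ(1 + 1/β) and the variance is α^(-2/β) {Γ(1 + 2/β) - [Γ(1 + 1/β)]²}. Other texts and libraries use a scale parameter instead of α, so the formulas would need adapting there. Function names are illustrative, and α = 1 is an assumed value.

```python
from math import exp, gamma


def weibull_pdf(x: float, alpha: float, beta: float) -> float:
    """Weibull density in the (alpha, beta) form assumed above."""
    if x <= 0:
        return 0.0
    return alpha * beta * x ** (beta - 1) * exp(-alpha * x**beta)


def weibull_mean(alpha: float, beta: float) -> float:
    return alpha ** (-1 / beta) * gamma(1 + 1 / beta)


def weibull_variance(alpha: float, beta: float) -> float:
    return alpha ** (-2 / beta) * (gamma(1 + 2 / beta) - gamma(1 + 1 / beta) ** 2)


# The three shapes shown in the figure, with alpha = 1 assumed.
for beta in (1.0, 2.0, 3.5):
    print(
        f"beta={beta}: mean={weibull_mean(1.0, beta):.4f}, "
        f"variance={weibull_variance(1.0, beta):.4f}"
    )
```

With β = 1 the density reduces to the exponential form α exp(-αx), whose mean and variance are both 1 when α = 1.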