Unit 3 Probability Distributions - 21MA41
UNIT-III
PROBABILITY DISTRIBUTIONS
• To apply the knowledge of the statistical analysis and theory of probability in the study of
uncertainties.
• To use probability theory to solve random physical phenomena and implement appropriate
distribution models.
Introduction:
In this unit, discrete and continuous probability distributions are discussed. A discrete
probability distribution is used when the sample space is discrete (finite or countably
infinite), whereas a continuous probability distribution is used when the sample space is
continuous, i.e., defined on a continuous interval.
In discrete distributions, the variables are distributed according to some definite probability
law which can be expressed mathematically. The present study will also enable us to fit a
mathematical model or a function of the form y = p(x) to the observed data. Among discrete
distributions, the Binomial and Poisson distributions are discussed. Among continuous
distributions, the Exponential, Normal and Weibull distributions are discussed.
Bernoulli distribution: A random variable X which takes two values 0 and 1, with
probabilities q and p respectively, i.e., P(X = 1) = p, P(X = 0) = q, q = 1 − p, is called a
Bernoulli variate and is said to have a Bernoulli distribution.
The probability of getting a head or a tail on tossing a fair coin is 1/2. If a coin is tossed
thrice, the sample space is S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}.
The event of getting one head and two tails is {HTT, TTH, THT}, so its probability is 3/8.
Each of these outcomes has probability (1/2) × (1/2) × (1/2) = $(1/2)^3$, so their total
probability is $3 \times (1/2)^3 = 3/8$.
Similarly, if a trial is repeated n times, and p is the probability of a success and q that of
a failure, then the probability of r successes and n − r failures in a given order is
$p^r q^{n-r}$. But these r successes and n − r failures can occur in any of ${}^nC_r$ ways, in
each of which the probability is the same. Thus the probability of exactly r successes is
$$P(X = r) = {}^nC_r\, p^r q^{n-r}.$$
This is the binomial distribution. The probability of at least r successes in n trials is the
sum of the probabilities of r, r + 1, ..., n successes:
$$ {}^nC_r\, p^r q^{n-r} + {}^nC_{r+1}\, p^{r+1} q^{n-r-1} + \cdots + {}^nC_n\, p^n. $$
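The binomial formula above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `binomial_pmf` and `binomial_at_least` are illustrative choices, not part of the notes.

```python
from math import comb

def binomial_pmf(n, r, p):
    """P(exactly r successes in n independent trials with success probability p)."""
    q = 1.0 - p
    return comb(n, r) * p**r * q**(n - r)

def binomial_at_least(n, r, p):
    """P(at least r successes): sum of the pmf over r, r + 1, ..., n."""
    return sum(binomial_pmf(n, k, p) for k in range(r, n + 1))

# One head in three tosses of a fair coin (the coin example above).
one_head = binomial_pmf(3, 1, 0.5)
# At least one head in three tosses.
at_least_one_head = binomial_at_least(3, 1, 0.5)
print(one_head, at_least_one_head)
```

For the coin example the first value agrees with the 3/8 obtained by listing the outcomes.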
Poisson distribution:
Poisson distribution was discovered by the French mathematician and physicist Simeon Denis
Poisson (1781-1840) who published it in 1837. Poisson distribution is a limiting case of the
binomial distribution under the following conditions:
i) n, the number of trials is indefinitely large, i.e., 𝑛 → ∞.
ii) p, the constant probability of success for each trial is indefinitely small, i.e., 𝑝 → 0.
iii) 𝑛𝑝 = 𝜆, (say), is finite. Thus p = 𝜆 /n, q = 1 - 𝜆 /n, where 𝜆 is a positive real number.
The probability function of the Poisson distribution is given by
$$p(x, \lambda) = P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!},$$
where λ is known as the parameter of the Poisson distribution.
Definition: A random variable X is said to follow a Poisson distribution if it assumes only
non-negative integer values and its probability mass function is given by
$$p(x, \lambda) = P(X = x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!}, & x = 0, 1, 2, \ldots;\ \lambda > 0 \\ 0, & \text{otherwise.} \end{cases}$$
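The limiting relation between the binomial and Poisson distributions can be illustrated numerically. The sketch below holds np = λ fixed while n grows and records the largest gap between the two probability functions over x = 0, ..., 5. The value λ = 2 and the helper names are assumptions made only for this illustration.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variate with parameter lam."""
    return exp(-lam) * lam**x / factorial(x)

def binomial_pmf(n, x, p):
    """P(X = x) for a binomial variate with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

lam = 2.0  # illustrative value of np, held fixed
gaps = []
for n in (10, 100, 10000):
    p = lam / n  # p shrinks as n grows so that n * p stays equal to lam
    gap = max(abs(binomial_pmf(n, x, p) - poisson_pmf(x, lam)) for x in range(6))
    gaps.append(gap)
    print(n, gap)
```

The printed gap should shrink as n increases, which is what conditions (i) to (iii) describe.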
3. Poisson distribution occurs when there are events which do not occur as outcomes of a definite
number of trials (unlike that in binomial) of an experiment but which occur at random points
of time and space wherein our interest lies only in the number of occurrences of the event, not
in its non-occurrences.
4. Following are some instances where Poisson distribution may be successfully employed.
i) Number of deaths from a disease (not in the form of an epidemic) such as heart attack or
cancer or due to snake bite.
ii) Number of suicides reported in a particular city.
iii) The number of defective items in a package manufactured by a reputable firm.
iv) Number of faulty blades in a packet of 100.
v) Number of air accidents in some unit of time.
vi) Number of printing mistakes on each page of a book.
vii) Number of telephone calls received at a particular telephone exchange in some unit of
time or connections to wrong numbers in a telephone exchange.
viii) Number of cars passing a crossing per minute during the busy hours of a day.
ix) The number of fragments received by a surface area 't' from a fragment atom bomb.
x) The emission of radioactive (alpha) particles.
Problems
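$$p(\lambda, x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, 3, \ldots$$
With λ = 2,
$$p(2, x) = \frac{e^{-2}\, 2^{x}}{x!}, \quad x = 0, 1, 2, 3, \ldots$$
Here, the problem is about finding the probability of the event X > 2, namely,
$$P(X > 2) = 1 - \sum_{x=0}^{2} \frac{e^{-2}\, 2^{x}}{x!} = 1 - e^{-2}(1 + 2 + 2) = 1 - 5e^{-2} \approx 0.3233$$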
2. A Car-hire firm has two cars it hires out daily. The number of demands for a car on
each day is distributed as Poisson variate with mean 1.5. Obtain the proportion of days
on which i) there was no demand ii) demand is refused.
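Solution: Here λ = 1.5
i) Proportion of days with no demand:
$$P(X = 0) = \frac{e^{-1.5}(1.5)^{0}}{0!} = 0.2231$$
ii) Demand is refused when more than two cars are requested on a day:
$$P(X > 2) = 1 - [P(0) + P(1) + P(2)] = 1 - e^{-1.5}\left(1 + 1.5 + \frac{(1.5)^{2}}{2!}\right) = 1 - 3.625\,(0.2231) \approx 0.1913$$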
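The car-hire figures can be reproduced with a few lines of code. This is a minimal sketch in which `poisson_pmf` is an illustrative helper; the fourth decimal place may differ slightly from a hand calculation that rounds $e^{-1.5}$ before multiplying.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variate with parameter lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 1.5  # mean number of demands per day
# (i) no demand at all
no_demand = poisson_pmf(0, lam)
# (ii) demand refused: more than 2 requests for the 2 available cars
refused = 1.0 - sum(poisson_pmf(x, lam) for x in range(3))
print(round(no_demand, 4), round(refused, 4))
```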
3. Assuming that the probability of an individual being killed in a mine accident during a
year is 1/2400. Use Poisson distribution to calculate the probability that in a mine
employing 200 miners there will be at least one fatal accident in a year?
Solution: Here p = 1/2400, n = 200, λ = np = 200/2400 ≈ 0.083
$$P(X \ge 1) = 1 - P(X = 0) = 1 - e^{-0.083} = 0.0796$$
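4. In a Poisson distribution if P(2) = (2/3) P(1), find P(0). Find also its mean and standard
deviation.
Solution: P(2) = (2/3) P(1) gives $\frac{e^{-\lambda}\lambda^{2}}{2!} = \frac{2}{3}\, e^{-\lambda}\lambda$, so λ = 4/3 and
$P(0) = e^{-4/3} \approx 0.2636$.
Mean = $\mu = \lambda = \frac{4}{3}$ and standard deviation = $\sigma = \sqrt{\lambda} = \sqrt{\frac{4}{3}}$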
5. The incidence of occupational disease in an industry is such that the workmen have a 10%
chance of suffering from it. What is the probability that in a group of seven, 5 or more will
suffer from it.
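Solution: p = 10% = 0.1, n = 7
$\lambda = \mu = np = 0.1 \times 7 = 0.7$
$$P(x \ge 5) = P(5) + P(6) + P(7) = \frac{e^{-\lambda}\lambda^{5}}{5!} + \frac{e^{-\lambda}\lambda^{6}}{6!} + \frac{e^{-\lambda}\lambda^{7}}{7!}$$
$$= \frac{e^{-0.7}(0.7)^{5}}{5!} + \frac{e^{-0.7}(0.7)^{6}}{6!} + \frac{e^{-0.7}(0.7)^{7}}{7!} = 0.0008$$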
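Since n = 7 is small, the Poisson value here is only a rough approximation to the underlying binomial probability. The sketch below, with variable names chosen for illustration, computes both so that the two can be compared.

```python
from math import comb, exp, factorial

n, p = 7, 0.1
lam = n * p  # Poisson parameter used in the solution above

# Poisson approximation: P(5) + P(6) + P(7)
poisson_tail = sum(exp(-lam) * lam**x / factorial(x) for x in range(5, 8))

# Exact binomial probability of 5 or more successes among 7 workmen
binomial_tail = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(5, n + 1))

print(poisson_tail, binomial_tail)
```

Both values are small, but they differ by a factor of roughly four, which is a reminder that the Poisson approximation is intended for large n and small p.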
Exercise:
1. For a Poisson variable 3P(2) = P(4) , find standard deviation.
2. If the probability of a bad reaction from a certain injection is 0.001, determine the chance that
out of 2000 individuals more than two will get a bad reaction.
3. Fit a Poisson distribution to the set of observations given below.
x:     0    1    2    3    4
f(x):  122  60   15   2    1
4. In a certain factory turning out razor blades there is a small chance of 0.002 for any blade to be
defective. The blades are supplied in packets of 10. Use Poisson distribution to calculate the
approximate number of packets containing no defective, one defective and two defective blades
respectively in a consignment of 10,000 packets.
5. A manufacturer of cotter pins knows that 5% of his product is defective. If he sells cotter pins
in boxes of 100 and guarantees that not more than 10 pins will be defective, what is the
approximate probability that a box will fail to meet the guaranteed quality?
Answers: 1. 2.45  2. 0.32  3. $f(x) = \frac{e^{-0.5}(0.5)^{x}}{x!}$; for N = 200, the expected frequency is N · f(x).
4. 9802, 196, 2  5. $1 - e^{-5}\sum_{x=0}^{10} \frac{5^{x}}{x!}$
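Fitting a Poisson distribution, as in Exercise 3, amounts to estimating λ by the sample mean and then scaling the probabilities by the total frequency N. A minimal sketch under that assumption, using the data of Exercise 3 (the variable names are illustrative):

```python
from math import exp, factorial

xs = [0, 1, 2, 3, 4]
freqs = [122, 60, 15, 2, 1]

N = sum(freqs)                                        # total number of observations
mean = sum(x * f for x, f in zip(xs, freqs)) / N      # sample mean, used as lambda

# Expected (theoretical) frequencies N * f(x) under the fitted Poisson law
expected = [N * exp(-mean) * mean**x / factorial(x) for x in xs]

for x, observed, fitted in zip(xs, freqs, expected):
    print(x, observed, round(fitted, 1))
```

The fitted frequencies can then be compared with the observed ones column by column.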
Exponential distribution
Many experiments involve the measurement of time X between an initial point of time and
the occurrence of some phenomenon of interest. Exponential distribution deals with such type
of continuous random variable X.
A continuous random variable X assuming non-negative values is said to have an exponential
distribution with parameter λ > 0 if its probability density function is given by
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
Examples such as the time between two successive job arrivals, the duration of telephone calls,
the lifetime of a component or a product, and the service time at a server in a queue can be
modelled by the exponential distribution.
Mean and variance of Exponential distribution
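$$\text{Mean} = \mu = \int_{0}^{\infty} x f(x)\,dx = \int_{0}^{\infty} x\,\lambda e^{-\lambda x}\,dx = \lambda \int_{0}^{\infty} x e^{-\lambda x}\,dx$$
$$= \lambda \left[ x \cdot \frac{e^{-\lambda x}}{(-\lambda)} - 1 \cdot \frac{e^{-\lambda x}}{(-\lambda)^{2}} \right]_{x=0}^{\infty}$$
$$\mu = \frac{1}{\lambda}$$
$$\text{Variance} = \sigma^{2} = \int_{0}^{\infty} x^{2} f(x)\,dx - \mu^{2} = \int_{0}^{\infty} x^{2} \lambda e^{-\lambda x}\,dx - \left(\frac{1}{\lambda}\right)^{2} = \frac{2}{\lambda^{2}} - \frac{1}{\lambda^{2}}$$
$$\sigma^{2} = \left(\frac{1}{\lambda}\right)^{2}$$
Standard deviation = $\sigma = \frac{1}{\lambda}$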
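These results for the mean and variance can be sanity-checked by numerical integration. The sketch below uses a simple midpoint rule and truncates the infinite range at a point where the tail is negligible. The rate λ = 0.25, the truncation point and the helper `integrate` are assumptions made for the illustration.

```python
from math import exp

def integrate(g, a, b, steps=200_000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

lam = 0.25                 # illustrative rate, so the mean should be 1/lam = 4
upper = 50.0 / lam         # truncation point; the tail beyond it is about e^(-50)

def pdf(x):
    """Exponential density lam * exp(-lam * x) for x >= 0."""
    return lam * exp(-lam * x)

mean = integrate(lambda x: x * pdf(x), 0.0, upper)
second_moment = integrate(lambda x: x * x * pdf(x), 0.0, upper)
variance = second_moment - mean**2

print(mean, variance)      # compare with 1/lam and 1/lam**2
```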
Problems:
1. Let the mileage (in thousands of miles) of a particular tyre be a random variable X having the
probability density
$$f(x) = \begin{cases} \dfrac{1}{20}\, e^{-x/20}, & x > 0 \\ 0, & x \le 0. \end{cases}$$
Find the probability that one of these tyres will last (i) at most 10,000 miles
2. The length of time for one person to be served at a cafeteria is a random variable X having an
exponential distribution with a mean of 4 minutes. Find the probability that a person is served
in less than 3 minutes on at least 4 of the next 6 days.
Solution: Given, Mean = 4, i.e., Mean = 4 = $\frac{1}{\lambda}$ → λ = $\frac{1}{4}$
The probability density function is $f(x) = \lambda e^{-\lambda x} = \frac{1}{4} e^{-x/4}$
$$P(x < 3) = 1 - P(x \ge 3) = 1 - \int_{3}^{\infty} f(x)\,dx = 1 - \int_{3}^{\infty} \frac{1}{4} e^{-x/4}\,dx = 1 - e^{-3/4} = 0.5276$$
Let D represent the number of days on which a person is served in less than 3 minutes. Then,
using the binomial distribution, the probability that a person is served in less than 3 minutes
on at least 4 of the next 6 days is
$$P(D \ge 4) = P(D = 4) + P(D = 5) + P(D = 6)$$
$$= {}^{6}C_{4} (1 - e^{-3/4})^{4} (e^{-3/4})^{2} + {}^{6}C_{5} (1 - e^{-3/4})^{5} (e^{-3/4})^{1} + {}^{6}C_{6} (1 - e^{-3/4})^{6} (e^{-3/4})^{0}$$
= 0.3968
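The two stages of this solution, an exponential probability for a single day followed by a binomial count over six days, can be sketched as follows. The variable names are illustrative only.

```python
from math import comb, exp

mean_service = 4.0                 # minutes
lam = 1.0 / mean_service

# Stage 1: probability that service takes less than 3 minutes on a given day
p = 1.0 - exp(-lam * 3.0)

# Stage 2: D ~ Binomial(6, p); probability that D is at least 4
days = 6
at_least_4 = sum(comb(days, d) * p**d * (1 - p)**(days - d) for d in range(4, days + 1))

print(round(p, 4), round(at_least_4, 4))
```

The single-day probability comes out near 0.53 and the final answer near 0.397, in line with the binomial sum above.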
3. The increase in sales per day in a shop is exponentially distributed with Rs 800 as the average.
If sales tax is paid at the rate of 6%, find the probability that increase in sales tax return from
that shop will exceed Rs 30 per day.
Solution: Given, Mean = 800
i.e., Mean = 800 = $\frac{1}{\lambda}$ → λ = $\frac{1}{800}$
The probability density function is $f(x) = \lambda e^{-\lambda x} = \frac{1}{800}\, e^{-x/800}$
Let X denote the increase in sales per day. The sales tax on X is $\frac{6}{100} X$.
Given that the sales tax exceeds Rs 30 per day, i.e., $\frac{6}{100} X > 30$, i.e., X > 500.
Probability of sales tax exceeding Rs 30 = Probability of sales per day exceeding Rs 500
$$= P(X > 500) = 1 - P(X \le 500) = 1 - \int_{0}^{500} f(x)\,dx = 1 - \left(1 - e^{-500/800}\right) = e^{-0.625} \approx 0.5353$$
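The conversion from a tax threshold to a sales threshold, and the resulting tail probability, can be sketched as below. The helper `exponential_survival` is an illustrative name; it relies on the fact that P(X > x) = $e^{-x/\text{mean}}$ for an exponential variable.

```python
from math import exp

def exponential_survival(x, mean):
    """P(X > x) for an exponential variable with the given mean (rate = 1/mean)."""
    return exp(-x / mean) if x > 0 else 1.0

mean_sales = 800.0       # Rs, average increase in sales per day
tax_rate = 0.06          # 6% sales tax
tax_threshold = 30.0     # Rs

# The tax exceeds Rs 30 exactly when sales exceed 30 / 0.06 = Rs 500
sales_threshold = tax_threshold / tax_rate
prob = exponential_survival(sales_threshold, mean_sales)

print(sales_threshold, round(prob, 4))
```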
4. After the appointment of a new sales manager, the daily sales in a two-wheeler showroom are
exponentially distributed with mean 4. If 2 days are selected at random, what is the probability
that (i) on both days the sales exceed 5 units, (ii) the sales exceed 5 units on at least 1 of the 2 days?
Solution: Given, Mean = 4, i.e., Mean = 4 = $\frac{1}{\lambda}$ → λ = $\frac{1}{4}$
The probability density function is $f(x) = \lambda e^{-\lambda x} = \frac{1}{4}\, e^{-x/4}$
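One way the remaining steps might go is sketched below. It assumes that sales on the two selected days are independent, which the problem statement does not say explicitly, and uses P(X > 5) = $e^{-5\lambda}$ for a single day.

```python
from math import exp

lam = 1.0 / 4.0                       # rate, from mean = 4

# Probability that sales exceed 5 units on a single day
p_over_5 = exp(-lam * 5.0)

# (i) both days exceed 5 units (independence of the two days is assumed)
both_days = p_over_5**2

# (ii) at least one of the two days exceeds 5 units
at_least_one_day = 1.0 - (1.0 - p_over_5)**2

print(round(p_over_5, 4), round(both_days, 4), round(at_least_one_day, 4))
```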
Exercise:
1. The sales per day in a shop are exponentially distributed with average sale amounting to Rs
100 and net profit is 8%. Find the probability that net profit exceed Rs 30 on 2 consecutive
days.
2. Let X and Y have common p.d.f. $\alpha e^{-\alpha x}$, 0 < x < ∞, α > 0. Find the p.d.f. of
(i) 3 + 2X (ii) X – Y.
3. If X has exponential distribution with mean 2, find P(X < 1|X < 2).
4. The life (in years) of a certain electrical switch has an exponential distribution with an average
life of 2 years. If 100 of these switches are installed in different systems, find the probability
that at most 30 fail during the first year.
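Answers: 1. $(e^{-3.75})^{2}$  2. (i) $\frac{\alpha}{2} \exp\left(-\frac{\alpha (x - 3)}{2}\right),\ x > 3$; (ii) $\frac{\alpha}{2} \exp(-\alpha |x|),\ \forall x$
3. $\frac{1 - e^{-\lambda}}{1 - e^{-2\lambda}}$, where λ = $\frac{1}{2}$.  4. $P(X \le 30) = \sum_{x=0}^{30} {}^{100}C_{x}\, (0.394)^{x} (0.606)^{100-x}$, where 0.394 ≈ $1 - e^{-1/2}$ is the probability that a switch fails during the first year.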
Normal distribution
The normal distribution was first discovered in 1733 by the French-born mathematician Abraham
de Moivre, who obtained this continuous distribution as a limiting case of the binomial
distribution and applied it to problems arising in games of chance.
Among all the distributions of a continuous random variable, the most popular and widely used
one is the normal distribution. Most of the work in correlation and regression analysis,
Problems:
2. A sample of 100 battery cells, tested to find the length of life, gave the following
results: mean = 12 hrs, standard deviation = 3 hrs. Assuming the data to be normally
distributed, what percentage of battery cells are expected to have a life of (i) more than
15 hrs, (ii) less than 6 hrs, (iii) between 10 and 14 hrs?
Solution: (i) When x = 15, for the given mean μ = 12 hrs and standard deviation σ = 3 hrs:
$$P(x > 15) = P\left(\frac{X - \mu}{\sigma} > \frac{15 - \mu}{\sigma}\right) = P\left(\frac{X - \mu}{\sigma} > \frac{15 - 12}{3}\right)$$
= P(z > 1)
= 0.5 − 0.3413
= 0.1587 ≈ 16%
(ii) When x = 6:
$$P(x < 6) = P\left(\frac{X - \mu}{\sigma} < \frac{6 - \mu}{\sigma}\right) = P\left(\frac{X - \mu}{\sigma} < \frac{6 - 12}{3}\right)$$
= P(z < −2)
= 0.5 − 0.4772
= 0.0228 = 2.28%
Fourth Semester, Statistics and Probability for Data Science (21MA41), Department of Mathematics
(iii) $$P(10 < x < 14) = P\left(\frac{10 - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{14 - \mu}{\sigma}\right)$$
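The three parts of the battery problem can be checked without tables by writing the standard normal distribution function in terms of the error function. The sketch below is one way to do this; `normal_cdf` is an illustrative helper built on `math.erf`, not a function from the notes.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal variable with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

mu, sigma = 12.0, 3.0   # battery life in hours

more_than_15 = 1.0 - normal_cdf(15.0, mu, sigma)                             # part (i)
less_than_6 = normal_cdf(6.0, mu, sigma)                                     # part (ii)
between_10_14 = normal_cdf(14.0, mu, sigma) - normal_cdf(10.0, mu, sigma)    # part (iii)

for label, value in (("> 15 hrs", more_than_15),
                     ("< 6 hrs", less_than_6),
                     ("10 to 14 hrs", between_10_14)):
    print(label, round(100 * value, 2), "%")
```

Parts (i) and (ii) should agree with the table-based values above, and part (iii) comes out at a little under 50%.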
Here, $-0.6 = \frac{70 - \mu}{\sigma}$ gives μ − 0.6σ = 70
and $1.4 = \frac{88 - \mu}{\sigma}$ gives μ + 1.4σ = 88
4. The marks X obtained in mathematics by 1000 students is normally distributed with mean
78% and standard deviation 11%. Determine how many students got marks above 90%.
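Solution: Here, mean = 78% = 0.78 and standard deviation = 11% = 0.11.
Thus, $z = \frac{X - \mu}{\sigma} = \frac{X - 0.78}{0.11}$
For X = 0.9, write $z = \frac{0.9 - 0.78}{0.11} = 1.09$
P(X > 0.9) = P(z > 1.09) = 1 − F(1.09)
= 1 − 0.86214 = 0.13786
Hence about 0.13786 × 1000 ≈ 138 students got marks above 90%.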
5. X is a normal variate with mean 30 and standard deviation 5. Find the probabilities that (i)
26 ≤ X ≤ 40 (ii) X ≥ 45 (iii) |X − 30| > 5.
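Solution: Given, mean = 30 and standard deviation = 5
Thus, $z = \frac{X - \mu}{\sigma} = \frac{X - 30}{5}$
(i) For X = 26, $z = \frac{26 - 30}{5} = -0.8$ and
for X = 40, $z = \frac{40 - 30}{5} = 2$
P(26 ≤ X ≤ 40) = P(−0.8 ≤ z ≤ 2) = F(2) − F(−0.8)
= 0.76539
(ii) For X = 45, $z = \frac{45 - 30}{5} = 3$
P(X ≥ 45) = P(z ≥ 3) = 1 − F(3) = 0.00135
(iii) P(|X − 30| > 5) = 1 − P(|X − 30| ≤ 5)
= 1 − P(−5 ≤ X − 30 ≤ 5)
= 1 − P(25 ≤ X ≤ 35)
= 1 − P(−1 ≤ z ≤ 1)
= 1 − (F(1) − F(−1))
= 1 − 0.68269 = 0.31731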
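The distribution function F used in this problem can be evaluated directly instead of being read from tables. A minimal sketch, assuming F is the standard normal distribution function and writing it with `math.erf` (the helper name `F` mirrors the notation above):

```python
from math import erf, sqrt

def F(z):
    """Standard normal distribution function P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 30.0, 5.0

def z_score(x):
    """Standardise x using the mean and standard deviation of the problem."""
    return (x - mu) / sigma

part_i = F(z_score(40.0)) - F(z_score(26.0))     # P(26 <= X <= 40)
part_ii = 1.0 - F(z_score(45.0))                 # P(X >= 45)
part_iii = 1.0 - (F(1.0) - F(-1.0))              # P(|X - 30| > 5)

print(round(part_i, 5), round(part_ii, 5), round(part_iii, 5))
```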
Exercise:
1. In a test of 2000 electric bulbs, it was found that the life of a particular make was
normally distributed with an average life of 2040 hours and standard deviation of 60
hours. Estimate the number of bulbs likely to burn for (i) more than 2150 hours (ii)
less than 1950 hours (iii) more than 1920 hours but less than 2060 hours.
2. Assume that the reduction of a person’s oxygen consumption during a period of
transcendental meditation (T M) is a continuous random variable X normally
distributed with mean 37.6 cc/min and standard deviation 4.6 cc/min. Determine the
probability that during a period of T M a person’s oxygen consumption will be reduced
by (i) at least 44.5 cc/min (ii) at most 35 cc/min (iii) anywhere from 30 cc/min to 40
cc/min.
3. An analog signal received at a detector (measured in micro volts) may be modeled as a
Gaussian random variable N (200, 256) at a fixed point in time. What is the probability
that the signal will exceed 240 micro volts? What is the probability that the signal is
larger than 240 micro volts, given that it is larger than 210 micro volts.
4. In an examination it is laid down that a student passes if he secures 30 percent or more
marks. He is placed in the first, second or third division according as he secures 60%
or more marks, between 45% to 60% marks and marks between 30% and 45%
Weibull Distribution:
The Weibull distribution has been used extensively in recent years to deal with failure
problems, such as a fuse burning out, a steel column buckling or a heat-sensing device
failing. It was introduced by the Swedish physicist Waloddi Weibull in 1939.
A continuous random variable X with probability density function
$$f(x; \alpha, \beta) = \begin{cases} \dfrac{\beta}{\alpha} \left(\dfrac{x}{\alpha}\right)^{\beta - 1} e^{-(x/\alpha)^{\beta}}, & x > 0 \\ 0, & x \le 0 \end{cases}$$
where β > 0 is the shape parameter and α > 0 is the scale parameter, is said to have the
Weibull distribution.
The graphs of Weibull distribution for 𝛼 = 1 and various values of the parameter 𝛽 are as
shown in the figure.
We observe that the curves change considerably in shape for different values of the parameter
β. If we let β = 1, the Weibull distribution reduces to the exponential distribution. For values
of β > 1, the curves become somewhat bell shaped and resemble the normal curve, but show
some skewness.
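The density above and its reduction to the exponential case can be sketched in code. The survival function P(X > x) = $e^{-(x/\alpha)^{\beta}}$ and the mean $\alpha\,\Gamma(1 + 1/\beta)$ used below are standard results for this parametrisation that are not derived in these notes; the function names and the values α = 4, x = 2 are illustrative.

```python
from math import exp, gamma

def weibull_pdf(x, alpha, beta):
    """Weibull density with scale alpha > 0 and shape beta > 0."""
    if x <= 0:
        return 0.0
    return (beta / alpha) * (x / alpha)**(beta - 1) * exp(-(x / alpha)**beta)

def weibull_survival(x, alpha, beta):
    """P(X > x) = exp(-(x/alpha)^beta) for x > 0 (standard result, assumed here)."""
    return exp(-(x / alpha)**beta) if x > 0 else 1.0

def weibull_mean(alpha, beta):
    """Mean alpha * Gamma(1 + 1/beta) (standard result, assumed here)."""
    return alpha * gamma(1.0 + 1.0 / beta)

alpha = 4.0
# With beta = 1 the density coincides with the exponential density with lambda = 1/alpha
weibull_at_2 = weibull_pdf(2.0, alpha, 1.0)
exponential_at_2 = (1.0 / alpha) * exp(-2.0 / alpha)
print(weibull_at_2, exponential_at_2, weibull_mean(alpha, 1.0))
```

With β = 1 the two densities agree and the mean equals α, matching the exponential mean 1/λ with λ = 1/α.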
Exercise:
1. Suppose that the service life (in hours) of a semiconductor is a random variable having
Weibull distribution with 𝛼 = 1600 and 𝛽 = 0.5. What is the probability that a
semiconductor will still be in operating condition after 4,000 hours? Also find the mean
and standard deviation of the service life of the semiconductor.
2. Suppose that the lifetime (in hours) of a certain kind of emergency backup battery is a
random variable X having the Weibull distribution with α = 10 and β = 0.5.
Find
(a) the mean and variance of the lifetime of these batteries
(b) the probability that such a battery will last for more than 300 hrs.
Video Links:
https://www.youtube.com/watch?v=82Ad1orN-NA
https://www.youtube.com/watch?v=c06FZ2Yq9rk
https://www.youtube.com/watch?v=N-IVFB8Rlfo
https://www.youtube.com/watch?v=d5iAWPnrH6w
https://www.youtube.com/watch?v=vjXLH7FXrj8