Probability & Statistics PDF
for
Probability and Statistics
AAOC ZC111
M.S.Radhakrishnan
Narendra Saini
Ashok Jitawat
Contents
INTRODUCTION, SAMPLE SPACES & EVENTS
Probability
Events
AXIOMS OF PROBABILITY
Some elementary consequences of the Axioms
Finite Sample Space (in which all outcomes are equally likely)
CONDITIONAL PROBABILITY
Independent events
Theorem on Total Probability
BAYES' THEOREM
MATHEMATICAL EXPECTATION & DECISION MAKING
RANDOM VARIABLES
Discrete Random Variables
Binomial Distribution
Cumulative Binomial Probabilities
Binomial Distribution – Sampling with replacement
Mode of a Binomial distribution
Hypergeometric Distribution (Sampling without replacement)
Binomial distribution as an approximation to the Hypergeometric Distribution
THE MEAN AND VARIANCE OF PROBABILITY DISTRIBUTIONS
The mean of a Binomial Distribution
Digression
Chebyshev's theorem
Law of large numbers
Poisson Distribution
Poisson approximation to binomial distribution
Cumulative Poisson distribution
Poisson Process
The Geometric Distribution
Multinomial Distribution
Simulation
CONTINUOUS RANDOM VARIABLES
Probability Density Function (pdf)
Normal Distribution
Normal Approximation to Binomial Distribution
Correction for Continuity
Other Probability Densities
The uniform Distribution
Gamma Function
Properties of Gamma Function
The Gamma Distribution
Exponential Distribution
Beta Distribution
The Log-Normal Distribution
Conditional Distribution
Independence
Two-Dimensional Continuous Random Variables
Marginal and Conditional Densities
Independence
The Cumulative Distribution Function
Properties of Expectation
Sample Mean
Sample Variance
SAMPLING DISTRIBUTION
Statistical Inference
Statistics
Probability
Let E be a random experiment (where we ‘know’ all possible outcomes but can’t predict
what the particular outcome will be when the experiment is conducted). The set of all
possible outcomes is called a sample space for the random experiment E.
Example 1:
Toss two coins and observe the sequence of heads and tails. A sample space for this
experiment could be S = {HH , TH , HT , TT }. If however we only observe the number
of heads got, the sample space would be S = {0, 1, 2}.
Example 2:
Toss two fair dice and observe the two numbers on the top. A sample space would be
S = {(i, j) : i, j = 1, 2, …, 6}, the set of 36 ordered pairs (1,1), (1,2), …, (6,6).
If, however, we are interested only in the sum of the two numbers on the top, the
sample space could be S = {2, 3, …, 12}.
Example 3:
Example 4:
Events
Example 5:
Suppose a balanced die is rolled and we observe the number on the top. Let A be the
event: an even number occurs.
Thus in symbols,
A = {2,4,6} ⊂ S = {1,2,3,4,5,6}
Two events are said to be mutually exclusive if they cannot occur together; that is there
is no element common between them.
In the above example if B is the event: an odd number occurs, i.e. B = {1,3,5} , then A and
B are mutually exclusive.
Solved Examples
Example 1:
(a) region 2 (b) regions 1 and 3 together (c) regions 3, 5, 6 and 8 together.
[Figure: Venn diagram of three events A, B and C, with the regions numbered 1 to 8]
Solution:
(a) Since this region is contained in A and B but not in C, it represents the event that
the shaft is too large and the windings improper but the electrical connections are
satisfactory.
(b) Since this region is common to B and C, it represents the event that the windings
are improper and the electrical connections are unsatisfactory.
(c) Since this is the entire region outside A, it represents the event that the shaft
size is not too large.
Example 2:
A carton of 12 rechargeable batteries contains one that is defective. In how many ways can
the inspector choose three of the batteries and
Solution:
(a) The one defective battery can be chosen in one way and two good ones can be chosen in
$\binom{11}{2} = 55$ ways. Hence one defective and two good can be chosen in 1 × 55 = 55
ways.
(b) Three good ones can be chosen in $\binom{11}{3} = 165$ ways.
AXIOMS OF PROBABILITY
(i) 0 ≤ P ( A) ≤ 1
(ii) P (S ) = 1
(iii) If A and B are any two mutually exclusive events, then
P ( A ∪ B ) = P ( A) + P ( B )
(iv) If {A1, A2, …, An, …} is a sequence of pairwise mutually exclusive
events, then P(A1 ∪ A2 ∪ … ∪ An ∪ …) = P(A1) + P(A2) + … + P(An) + …
Axiom 1 says that the probability of an event is always a number between 0 and 1.
Axiom 2 says that the probability of the certain event S is 1. Axiom 3 says that the
probability is an additive set function.
1. P(φ ) = 0
Proof: S = S ∪ φ. Now S and φ are disjoint.
Hence P(S) = P(S) + P(φ) ⇒ P(φ) = 0. Q.E.D.
Proof: By induction on n.
Def.: If A is an event
A′ the complementary event = S-A (It is the shaded portion in the figure below)
3. P ( A′) = 1 − P ( A)
Proof: S = A ∪ A ′
Now P ( S ) = P ( A) + P ( A′) as A and A ′ are disjoint or 1 = P ( A) + P ( A′) .
Thus P ( A′) = 1 − P ( A) . Q.E.D.
4. Probability is a subtractive set function; i.e.
If A ⊂ B, then P(B − A) = P(B) − P(A).
∴ P( A ∪ B ) = P( A) + P ( B) − P( A ∩ B ) . Q.E.D.
Proof:
P(A ∪ B ∪ C) = P(A ∪ B) + P(C) − P((A ∪ B) ∩ C )
= P(A) + P(B) − P(A ∩ B) + P(C) − P((A ∪ B) ∩ C)
= P(A) + P(B) + P(C) − P(A ∩ B) − P((A ∩ C) ∪ (B ∩ C))
= P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)
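More generally,
$P(A_1 \cup A_2 \cup \dots \cup A_n) = \sum_{i} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \dots + (-1)^{n-1} P(A_1 \cap A_2 \cap \dots \cap A_n)$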
Finite Sample Space (in which all outcomes are equally likely)
Example 1:
If a card is drawn from a well-shuffled pack of 52 cards, find the probability of drawing
(a) a red king  Ans: $\frac{2}{52}$
(b) a 3, 4, 5 or 6  Ans: $\frac{16}{52}$
(c) a black card  Ans: $\frac{1}{2}$
(d) a red ace or a black queen  Ans: $\frac{4}{52}$
Example 2:
When a pair of balanced dice is thrown, find the probability of getting a sum equal to
(a) 7  Ans: $\frac{6}{36} = \frac{1}{6}$ (The total number of equally likely outcomes is
36 and the favourable number of outcomes is 6, namely (1,6), (2,5), …, (6,1).)
(b) 11  Ans: $\frac{2}{36}$
(c) 7 or 11  Ans: $\frac{8}{36}$
(d) 2, 3 or 12  Ans: $\frac{1}{36} + \frac{2}{36} + \frac{1}{36} = \frac{4}{36}$.
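The answers to Example 2 can be checked by listing all 36 outcomes and counting the favourable ones. The following is only a sketch; the helper name prob_sum_in is an illustrative choice and not part of the notes.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (i, j) of throwing two balanced dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob_sum_in(targets):
    """Probability that the sum of the two top faces lies in `targets`."""
    favourable = sum(1 for i, j in outcomes if i + j in targets)
    return Fraction(favourable, len(outcomes))

print(prob_sum_in({7}))         # part (a)
print(prob_sum_in({11}))        # part (b)
print(prob_sum_in({7, 11}))     # part (c)
print(prob_sum_in({2, 3, 12}))  # part (d)
```

Fractions are printed in lowest terms, so 6/36 appears as 1/6.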
Example 3:
10 persons in a room are wearing badges marked 1 through 10. 3 persons are chosen at
random and asked to leave the room simultaneously and their badge nos are noted. Find
the probability that
(a) the smallest badge number is 5.
(b) the largest badge number is 5.
Solution:
(a) 3 persons can be chosen in 10C3 equally likely ways. If the smallest badge
number is to be 5, the badge numbers should be 5 and any two of the 5
numbers 6, 7, 8, 9,10. Now 2 numbers out of 5 can be chosen in 5C2 ways.
Hence the probability that the smallest badge number is 5 is 5C2 /10C3 .
(b) Ans. 4C2 /10C3 .
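As a check on Example 3, one can enumerate all 10C3 groups of badges and count those whose smallest (or largest) badge number is 5. This is a brute-force sketch; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Every way of choosing 3 of the badges 1..10 (10C3 = 120 groups).
groups = list(combinations(range(1, 11), 3))

# Count the groups whose smallest / largest badge number is 5.
p_min_5 = Fraction(sum(1 for g in groups if min(g) == 5), len(groups))
p_max_5 = Fraction(sum(1 for g in groups if max(g) == 5), len(groups))

# Compare with the counting argument: 5C2/10C3 and 4C2/10C3.
print(p_min_5 == Fraction(comb(5, 2), comb(10, 3)))
print(p_max_5 == Fraction(comb(4, 2), comb(10, 3)))
```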
Example 4:
A lot consists of 10 good articles, 4 articles with minor defects and 2 with major defects.
Two articles are chosen at random. Find the probability that
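(a) both are good  Ans: $\frac{\binom{10}{2}}{\binom{16}{2}}$
(b) both have major defects  Ans: $\frac{\binom{2}{2}}{\binom{16}{2}}$
(c) at least one is good  Ans: 1 − P(none is good) = $1 - \frac{\binom{6}{2}}{\binom{16}{2}}$
(d) exactly one is good  Ans: $\frac{\binom{10}{1}\binom{6}{1}}{\binom{16}{2}}$
(e) at most one is good  Ans: P(none is good) + P(exactly one is good) = $\frac{\binom{6}{2}}{\binom{16}{2}} + \frac{\binom{10}{1}\binom{6}{1}}{\binom{16}{2}}$
(f) neither has major defects  Ans: $\frac{\binom{14}{2}}{\binom{16}{2}}$
(g) neither is good  Ans: $\frac{\binom{6}{2}}{\binom{16}{2}}$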
Example 5:
From 6 positive and 8 negative integers, 4 integers are chosen at random and multiplied.
Find the probability that their product is positive.
Solution:
The product is positive if all the 4 integers are positive or all of them are negative or two
of them are positive and the other two are negative. Hence the probability is
$\frac{\binom{6}{4}}{\binom{14}{4}} + \frac{\binom{8}{4}}{\binom{14}{4}} + \frac{\binom{6}{2}\binom{8}{2}}{\binom{14}{4}}$
Example 6:
If, A, B are mutually exclusive events and if P(A) = 0.29, P(B) = 0.43, then
Example 7:
Example 8:
Example 9:
Three newspapers are published in a city. A recent survey of readers indicated the
following:
Ans. $1 - P(A \cup B \cup C) = 1 - \frac{20 + 16 + 14 - 8 - 5 - 4 + 2}{100} = 0.65$
(c) reads at least A and B given he reads at least one of the papers.
P(At least reading A and B given he reads at least one of the papers)
$= \frac{P(A \cap B)}{P(A \cup B \cup C)} = \frac{8}{35}$
CONDITIONAL PROBABILITY
$P(A \mid B)$ = probability of A given B $= \frac{P(A \cap B)}{P(B)}$ if $P(B) \neq 0$.
Similarly we define $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$ if $P(A) \neq 0$.
Example 10
A bag contains 4 red balls and 6 black balls. 2 balls are chosen at random one by one
without replacement. Find the probability that both are red.
Solution
Let A be the event that the first ball drawn is red, B the event the second ball drawn is
red. Hence the probability that both balls drawn are red =
$P(A \cap B) = P(A) \times P(B \mid A) = \frac{4}{10} \times \frac{3}{9} = \frac{2}{15}$
Independent events:
[Figure: Venn diagram showing A ∩ B and A′ ∩ B as mutually exclusive events]
Example 11
Solution
If Ai is the event of getting a head in the ith toss, A1, A2, …, A8 are independent and
$P(A_i) = \frac{1}{2}$ for all i. Hence P(getting all heads) =
$P(A_1) P(A_2) \dots P(A_8) = \left(\frac{1}{2}\right)^8$
Example 12
It is found that in manufacturing a certain article, defects of one type occur with
probability 0.1 and defects of other type occur with probability 0.05. Assume
independence between the two types of defects. Find the probability that an article chosen
at random has exactly one type of defect given that it is defective.
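Let A be the event that the article has exactly one type of defect and let B be the
event that the article is defective.
Required $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
Here P(B) = 0.1 + 0.05 − 0.1 × 0.05 = 0.145 and P(A ∩ B) = P(A) = 0.1 + 0.05 − 2 × 0.1 × 0.05 = 0.14.
∴ Probability $= \frac{0.14}{0.145}$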
[Note: If A and B are two events, probability that exactly only one of them occurs
is P(A) + P(B) – 2P(A ∩ B)]
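The arithmetic of Example 12 can be reproduced with exact fractions. This is a sketch under the independence assumption stated in the example; the variable names are illustrative.

```python
from fractions import Fraction

p1 = Fraction(10, 100)  # probability of a defect of the first type
p2 = Fraction(5, 100)   # probability of a defect of the other type

p_both = p1 * p2                      # independence of the two defect types
p_defective = p1 + p2 - p_both        # at least one type of defect
p_exactly_one = p1 + p2 - 2 * p_both  # formula from the note above

# P(exactly one type of defect | article is defective)
answer = p_exactly_one / p_defective
print(p_exactly_one, p_defective, answer)
```

The ratio 0.14/0.145 reduces to 28/29.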
Example 13
P (A fails) = 0.2
Solution
(a) $P(A \text{ fails} \mid B \text{ has failed}) = \frac{P(A \text{ and } B \text{ failed})}{P(B \text{ failed})} = \frac{0.15}{0.30} = \frac{1}{2}$
Example 14
A binary number is a number having digits 0 and 1. Suppose a binary number is made up
of ‘n’ digits. Suppose the probability of forming an incorrect binary digit is p. Assume
independence between errors. What is the probability of forming an incorrect binary
number?
Example 15
A question paper consists of 5 Multiple choice questions each of which has 4 choices (of
which only one is correct). If a student answers all the five questions randomly, find the
probability that he answers all questions correctly.
Ans. $\left(\frac{1}{4}\right)^5$.
Let B1, B2, …, Bn be n mutually exclusive events of which one must occur. If A is any
other event, then
$P(A) = P(B_1) P(A \mid B_1) + P(B_2) P(A \mid B_2) + \dots + P(B_n) P(A \mid B_n)$
Example 16
There are 2 urns. The first one has 4 red balls and 6 black balls. The second has 5 red
balls and 4 black balls. A ball is chosen at random from the 1st and put in the 2nd. Now a
ball is drawn at random from the 2nd urn. Find the probability it is red.
Solution:
Let B1 be the event that the first ball drawn is red and B2 be the event that the first ball
drawn is black. Let A be the event that the second ball drawn is red. By the theorem on
total probability,
$P(A) = P(B_1) P(A \mid B_1) + P(B_2) P(A \mid B_2) = \frac{4}{10} \times \frac{6}{10} + \frac{6}{10} \times \frac{5}{10} = \frac{54}{100} = 0.54$.
Example 17:
A consulting firm rents cars from three agencies D, E, F. 20% of the cars are rented from
D, 20% from E and the remaining 60% from F. If 10% of cars rented from D, 12% of
cars rented from E, 4% of cars rented from F have bad tires, find the probability that a
car rented from the consulting firm will have bad tires.
Example 18:
A bolt factory has three divisions B1, B2, B3 that manufacture bolts. 25% of output is
from B1, 35% from B2 and 40% from B3. 5% of the bolts manufactured by B1 are
defective, 4% of the bolts manufactured by B2 are defective and 2% of the bolts
manufactured by B3 are defective. Find the probability that a bolt chosen at random from
the factory is defective.
Ans. $\frac{25}{100} \times \frac{5}{100} + \frac{35}{100} \times \frac{4}{100} + \frac{40}{100} \times \frac{2}{100}$
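The theorem on total probability used in Examples 16 and 18 amounts to a weighted sum. The sketch below evaluates it with exact fractions; the function name total_probability is an illustrative choice.

```python
from fractions import Fraction

def total_probability(priors, conditionals):
    """Sum of P(B_i) * P(A | B_i) over mutually exclusive, exhaustive B_i."""
    if sum(priors) != 1:
        raise ValueError("the events B_i must be exhaustive")
    return sum(pb * pa for pb, pa in zip(priors, conditionals))

# Example 16: first ball red (4/10) or black (6/10), then a draw from urn 2.
p_red = total_probability(
    [Fraction(4, 10), Fraction(6, 10)],
    [Fraction(6, 10), Fraction(5, 10)],
)

# Example 18: output shares of divisions B1, B2, B3 and their defect rates.
p_defective = total_probability(
    [Fraction(25, 100), Fraction(35, 100), Fraction(40, 100)],
    [Fraction(5, 100), Fraction(4, 100), Fraction(2, 100)],
)
print(float(p_red), float(p_defective))
```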
BAYES’ THEOREM
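Let B1, B2, …, Bn be n mutually exclusive events of which one of them must occur.
If A is any event, then
$P(B_i \mid A) = \frac{P(B_i) P(A \mid B_i)}{P(B_1) P(A \mid B_1) + P(B_2) P(A \mid B_2) + \dots + P(B_n) P(A \mid B_n)}$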
Example 19
Miss ‘X’ is fond of seeing films. The probability that she sees a film on the day before
the test is 0.7. Miss X is any way good at studies. The probability that she maxes the test
is 0.3 if she sees the film on the day before the test and the corresponding probability is
0.8 if she does not see the film. If Miss ‘X’ maxed the test, find the probability that she
saw the film on the day before the test.
Solution
Let B1 be the event that Miss X saw the film before the test and let B2 be the
complementary event. Let A be the event that she maxed the test.
Required $P(B_1 \mid A) = \frac{P(B_1) P(A \mid B_1)}{P(B_1) P(A \mid B_1) + P(B_2) P(A \mid B_2)} = \frac{0.7 \times 0.3}{0.7 \times 0.3 + 0.3 \times 0.8} = \frac{0.21}{0.45} = \frac{7}{15}$
Example 20
At an electronics firm, it is known from past experience that the probability a new worker
who attended the company’s training program meets the production quota is 0.86. The
corresponding probability for a new worker who did not attend the training program is
0.35. It is also known that 80% of all new workers attend the company’s training
program. Find probability that a new worker who met the production quota would have
attended the company’s training programme.
Solution
Let B1 be the event that a new worker attended the company’s training programme. Let
B2 be the complementary event, namely a new worker did not attend the training
programme. Let A be the event that a new worker met the production quota. Then we
want $P(B_1 \mid A) = \frac{0.8 \times 0.86}{0.8 \times 0.86 + 0.2 \times 0.35}$.
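Bayes' theorem as used in Examples 19 and 20 can be written as a small function. This is a sketch with exact fractions; the name posterior and its argument order are illustrative assumptions, not notation from the notes.

```python
from fractions import Fraction

def posterior(priors, likelihoods, i):
    """P(B_i | A) from the P(B_j) and P(A | B_j), by Bayes' theorem."""
    joint = [pb * pa for pb, pa in zip(priors, likelihoods)]
    return joint[i] / sum(joint)

# Example 19: B1 = saw the film (0.7), B2 = did not (0.3).
p_film = posterior(
    [Fraction(7, 10), Fraction(3, 10)],
    [Fraction(3, 10), Fraction(8, 10)],
    0,
)

# Example 20: B1 = attended training (0.8), B2 = did not (0.2).
p_trained = posterior(
    [Fraction(8, 10), Fraction(2, 10)],
    [Fraction(86, 100), Fraction(35, 100)],
    0,
)
print(p_film, p_trained, float(p_trained))
```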
Example 21
A printing machine can print any one of n letters L1, L2,……….Ln. It is operated by
electrical impulses, each letter being produced by a different impulse. Assume that there
is a constant probability p that any impulse prints the letter it is meant to print. Also
assume independence. One of the impulses is chosen at random and fed into the machine
twice. Both times, the letter L1 was printed. Find the probability that the impulse chosen
was meant to print the letter L1.
Solution:
Let B1 be the event that the impulse chosen was meant to print the letter L1. Let B2 be the
complementary event. Let A be the event that both the times the letter L1 was printed.
$P(B_1) = \frac{1}{n}$. $P(A \mid B_1) = p^2$. Now the probability that an impulse prints a wrong letter is
$(1 - p)$. Since there are $n - 1$ ways of printing a wrong letter, $P(A \mid B_2) = \left(\frac{1-p}{n-1}\right)^2$. Hence
$P(B_1 \mid A) = \frac{P(B_1) \times P(A \mid B_1)}{P(B_1) \times P(A \mid B_1) + P(B_2) \times P(A \mid B_2)} = \frac{\frac{1}{n} p^2}{\frac{1}{n} p^2 + \left(1 - \frac{1}{n}\right) \left(\frac{1-p}{n-1}\right)^2}$.
This is the required probability.
Miscellaneous problems
1 (a). Suppose the digits 1,2,3 are written in a random order. Find probability that at
least one digit occupies its proper place.
Solution
There are 3! = 6 ways of arranging 3 digits (123, 132, 213, 231, 312, 321), out of which in 4
arrangements at least one digit occupies its proper place. Hence the probability is
$\frac{4}{3!} = \frac{4}{6}$.
(Remark. An arrangement like 231, where no digit occupies its proper place, is
called a derangement.)
(b) Same as (a) but with 4 digits 1,2,3,4  Ans. $\frac{15}{24}$ (Try proving this.)
Solution
Let A1 be the Event 1st digit occupies its proper place
A2 be the Event 2nd digit occupies its proper place
A3 be the Event 3rd digit occupies its proper place
A4 be the Event 4th digit occupies its proper place
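P(at least one digit occupies its proper place) $= P(A_1 \cup A_2 \cup A_3 \cup A_4)$
$= 1 - \frac{1}{2} + \frac{1}{6} - \frac{1}{24}$
$= \frac{24 - 12 + 4 - 1}{24} = \frac{15}{24}$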
Solution
Let A1 be the Event 1st digit occupies its proper place
A2 be the Event 2nd digit occupies its proper place
……………………
An be the Event nth digit occupies its proper place
= P(A1∪A2 ∪ … ∪An)
$= \binom{n}{1} \frac{(n-1)!}{n!} - \binom{n}{2} \frac{(n-2)!}{n!} + \binom{n}{3} \frac{(n-3)!}{n!} - \dots + (-1)^{n-1} \frac{1}{n!}$
$= 1 - \frac{1}{2!} + \frac{1}{3!} - \frac{1}{4!} + \dots + (-1)^{n-1} \frac{1}{n!} \approx 1 - e^{-1}$ (for n large).
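For small n the alternating series above can be compared with a direct count over all n! arrangements. The sketch below does this; the function names are illustrative and the brute-force count is only practical for small n.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial

def p_some_fixed_point(n):
    """Brute force: P(at least one of n digits occupies its proper place)."""
    hits = sum(
        1
        for perm in permutations(range(n))
        if any(perm[i] == i for i in range(n))
    )
    return Fraction(hits, factorial(n))

def alternating_series(n):
    """1 - 1/2! + 1/3! - ... + (-1)^(n-1)/n! as an exact fraction."""
    return sum(Fraction((-1) ** (k - 1), factorial(k)) for k in range(1, n + 1))

for n in (3, 4, 7):
    print(n, p_some_fixed_point(n), alternating_series(n))

# For larger n the series is already very close to 1 - 1/e.
print(float(alternating_series(10)), 1 - exp(-1))
```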
2. In a party there are ‘n’ married couples. If each male chooses at random a
female for dancing, find the probability that no man chooses his wife.
Ans. $1 - \left(1 - \frac{1}{2!} + \frac{1}{3!} - \frac{1}{4!} + \dots + (-1)^{n-1} \frac{1}{n!}\right)$.
3. A and B play the following game. They throw alternately a pair of dice.
Whosoever gets sum of the two numbers on the top as seven wins the game
and the game stops. Suppose A starts the game. Find the probability (a) A
wins the game (b) B wins the game.
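Solution
A wins the game if he gets seven in the 1st throw or in the 3rd throw or in the
5th throw or …. Hence
$P(A \text{ wins}) = \frac{1}{6} + \frac{5}{6} \times \frac{5}{6} \times \frac{1}{6} + \frac{5}{6} \times \frac{5}{6} \times \frac{5}{6} \times \frac{5}{6} \times \frac{1}{6} + \dots = \frac{1/6}{1 - (5/6)^2} = \frac{6}{36 - 25} = \frac{6}{11}$.
P(B wins) = complementary probability $= \frac{5}{11}$.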
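The geometric series in this solution can be checked numerically. The sketch below compares the closed form with a long partial sum; the choice of 200 terms is arbitrary.

```python
from fractions import Fraction

p = Fraction(6, 36)  # P(sum of the two dice is 7) on a single throw
q = 1 - p            # P(sum is not 7)

# Closed form of p + q^2 p + q^4 p + ... (A throws 1st, 3rd, 5th, ...).
p_a_wins = p / (1 - q**2)
p_b_wins = 1 - p_a_wins

# Partial sum of the same series, as a sanity check on the closed form.
partial = sum(q ** (2 * k) * p for k in range(200))

print(p_a_wins, p_b_wins, float(partial))
```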
4. Birthday Problem
There are n persons in a room. Assume that nobody is born on 29th Feb.
Assume that any one birthday is as likely as any other birth day. Find the
probability that no two persons will have same birthday.
Solution
If n > 365, at least two will have the same birthday and hence the probability
that no two will have the same birthday is 0.
Ans. $\frac{6!}{6^6}$
(b) What is probability that exactly ‘n’ throws are needed? (n > 6)
6. Polya’s urn problem
An urn contains g green balls and r red balls. A ball is chosen at random and
its color is noted. Then the ball is returned to the urn and c more balls of same
color are added. Now a ball is drawn. Its color is noted and the ball is
replaced. This process is repeated.
Ans. $\frac{g}{g+r}$
(b) Find the probability that the 2nd ball drawn is green.
Ans. $\frac{g}{g+r} \times \frac{g+c}{g+r+c} + \frac{r}{g+r} \times \frac{g}{g+r+c} = \frac{g}{g+r}$
(c) Find the probability that the nth ball drawn is green.
The surprising answer is $\frac{g}{g+r}$.
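The claim in part (c) can be checked for particular values of g, r and c by tracking the exact distribution of the number of green balls drawn so far. This is only a sketch; the function name p_nth_green and the test values g = 3, r = 5, c = 2 are illustrative assumptions.

```python
from fractions import Fraction

def p_nth_green(g, r, c, n):
    """Exact P(n-th ball drawn is green) in Polya's urn scheme."""
    # dist[k] = probability that k of the draws made so far were green.
    dist = {0: Fraction(1)}
    for t in range(n - 1):
        total = g + r + t * c  # balls in the urn before draw t + 1
        new = {}
        for k, pr in dist.items():
            p_green = Fraction(g + k * c, total)
            new[k + 1] = new.get(k + 1, 0) + pr * p_green
            new[k] = new.get(k, 0) + pr * (1 - p_green)
        dist = new
    total = g + r + (n - 1) * c  # balls in the urn before the n-th draw
    return sum(pr * Fraction(g + k * c, total) for k, pr in dist.items())

for n in range(1, 7):
    print(n, p_nth_green(3, 5, 2, n))
```

For these values every line prints the same fraction g/(g+r) = 3/8.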
7. There are n urns and each urn contains a white and b red balls. A ball is
chosen from Urn 1 and put into Urn 2. Now a ball is chosen at random from
urn 2 and put into urn 3 and this is continued. Finally a ball drawn from Urn n.
Find the probability that it is white.
Solution
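Let $p_r$ be the probability that the ball drawn from urn r is white. Then
$p_r = p_{r-1} \times \frac{a+1}{a+b+1} + (1 - p_{r-1}) \times \frac{a}{a+b+1}$; r = 2, 3, …, n.
This is a recurrence relation for $p_r$. Noting that $p_1 = \frac{a}{a+b}$, we can find $p_n$.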
MATHEMATICAL EXPECTATION & DECISION MAKING
Suppose we roll a die n times. What is the average of the n numbers that appear on the
top? If the number i appears $n_i$ times (i = 1, 2, …, 6), the average is
$\frac{1 \times n_1 + 2 \times n_2 + \dots + 6 \times n_6}{n} = 1 \times \frac{n_1}{n} + 2 \times \frac{n_2}{n} + \dots + 6 \times \frac{n_6}{n}$
Here clearly n1, n2, …, n6 are unknown. But by the relative frequency definition of
probability, we may approximate $\frac{n_1}{n}$ by P(getting 1 on the top) = $\frac{1}{6}$, $\frac{n_2}{n}$ by
P(getting 2 on the top) = $\frac{1}{6}$, and so on. So we can ‘expect’ the average of the n
numbers to be $\frac{7}{2} = 3.5$. We call this the Mathematical Expectation of the number
on the top.
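The relative-frequency argument can be illustrated by comparing the exact expectation with the average of many simulated rolls. This is a sketch only; the seed and the number of rolls are arbitrary choices.

```python
import random
from fractions import Fraction

# Exact expectation: each face 1..6 has probability 1/6.
expectation = sum(x * Fraction(1, 6) for x in range(1, 7))

# Simulated average of n rolls of a balanced die (fixed seed for repeatability).
rng = random.Random(0)
n = 100_000
average = sum(rng.randint(1, 6) for _ in range(n)) / n

print(expectation, average)
```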
Definition
Problems
1. If a service club sells 4000 raffle tickets for a cash prize of $800, what is the
mathematical expectation of a person who buys one of these tickets?
Solution. $800 \times \frac{1}{4000} + 0 \times \frac{3999}{4000} = \frac{1}{5} = 0.2$
4000 5
2. A charitable organization raises funds by selling 2000 raffle tickets for a 1st prize
worth $5000 and a second prize $100. What is mathematical expectation of a
person who buys one of the tickets?
Solution. $5000 \times \frac{1}{2000} + 100 \times \frac{1}{2000} + 0 \times \frac{1998}{2000} = 2.55$
3. A game between 2 players is called fair if each player has the same mathematical
expectation. If some one gives us $5 whenever we roll a 1 or a 2 with a balanced
die, what we must pay him when we roll a 3, 4, 5 or 6 to make the game fair?
4. Gambler’s Ruin
A and B are betting on repeated flips of a balanced coin. At the beginning, A has
m dollars and B has n dollars. After each flip the loser pays the winner 1 dollar
and the game stops when one of them is ruined. Find probability that A will win
B’s n dollars before he loses his m dollars.
Solution.
Let p be the probability that A wins (so that 1-p is the probability that B wins).
Since the game is fair, A’s math exp = B’s math exp.
Thus n × p + 0 × (1 − p) = m(1 − p) + 0 × p, or p = m/(m + n).
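The result p = m/(m + n) can be checked by simulating the betting game. The sketch below is a Monte Carlo estimate, so it only approximates the exact value; the values m = 3, n = 2, the seed and the number of trials are arbitrary assumptions.

```python
import random

def a_wins(m, n, rng):
    """Play one game; return True if A ends up with all m + n dollars."""
    a = m  # A's current capital
    while 0 < a < m + n:
        a += 1 if rng.random() < 0.5 else -1  # balanced coin, 1 dollar stake
    return a == m + n

rng = random.Random(1)
m, n, trials = 3, 2, 20_000
estimate = sum(a_wins(m, n, rng) for _ in range(trials)) / trials

print(estimate, m / (m + n))
```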
5. An importer is offered a shipment of machines for $140,000. The probability that
he will sell them for $180,000, $170,000 (or) $150,000 are respectively 0.32,
0.55, and 0.13. What is his expected profit?
6. The manufacturer of a new battery additive has to decide whether to sell her
product for $0.80 a can or for $1.20 a can with a ‘double your money back if not
satisfied’ guarantee. How does she feel about the chances that a person will ask
for double his/her money back if
Solution. In the 1st case, she gets a fixed amount of $0.80 a can.
Let p be the prob that a person will ask for double his money back. In the 2nd case
her expected gain per can is 1.20(1 − p) + (1.20 − 2.40)p = 1.20 − 2.40p.
(a) happens if 0.80 > 1.20 − 2.40p, i.e. p > 1/6
(b) happens if 0.80 < 1.20 − 2.40p, i.e. p < 1/6
7. A manufacturer buys an item for $1.20 and sells it for $4.50. The probabilities for
a demand of 0, 1, 2, 3, 4, “5 or more” items are 0.05, 0.15, 0.30, 0.25, 0.15, 0.10
respectively. How many items he must stock to maximize his expected profit?
(a) Which job should the contractor choose to maximize his expected profit?
i. Exp. profit for job1 = 240,000 × 3/4 − 60,000 × 1/4 = 165,000
ii. Exp. profit for job2 = 360,000 × 1/2 − 90,000 × 1/2 = 135,000
Go in for job1.
(b) What job would the contractor probably choose if her business is in bad
shape and she goes broke unless she makes a profit of $300,000 on her
next job?
Ans: She takes job2, as only job2 can give her a profit of at least $300,000.
RANDOM VARIABLES
Example 1
Let E be the random experiment of tossing a fair coin 3 times. We see that there are
$2^3 = 8$ outcomes TTT, HTT, THT, TTH, HHT, HTH, THH, HHH all of which are
equally likely. Let X be the random variable that ‘counts’ the number of heads obtained.
Thus X can take only 4 values 0,1,2,3. We note that
$P(X = 0) = \frac{1}{8}$, $P(X = 1) = \frac{3}{8}$, $P(X = 2) = \frac{3}{8}$, $P(X = 3) = \frac{1}{8}$. This is called the
probability distribution of the rv X. Thus the probability distribution of a rv X is the
listing of the probabilities with which X takes all its values.
Example 2
Let E be the random experiment of rolling a pair of balanced dice. There are 36 possible
equally likely outcomes, namely (1,1), (1,2), …, (6,6). Let X be the rv that gives the sum
of the two nos on the top. Hence X takes 11 values, namely 2, 3, …, 12. We note that the
probability distribution of X is
$P(X = 2) = P(X = 12) = \frac{1}{36}$, $P(X = 3) = P(X = 11) = \frac{2}{36}$,
$P(X = 4) = P(X = 10) = \frac{3}{36}$,
$P(X = 5) = P(X = 9) = \frac{4}{36}$,
$P(X = 6) = P(X = 8) = \frac{5}{36}$, $P(X = 7) = \frac{6}{36} = \frac{1}{6}$.
Example 3
Let E be the random experiment of rolling a die till a 6 appears on the top. Let X be the
no of rolls needed to get the “first” six. Thus X can take values 1,2,3…… Here X takes
an infinite number of values. So it is not possible to list all the probabilities with which X
takes its values. But we can give a formula.
$P(X = x) = \left(\frac{5}{6}\right)^{x-1} \frac{1}{6}$ (x = 1, 2, …)
(Justification: X = x means the first (x−1) rolls gave a number other than 6 and
the xth roll gave the first 6. Hence $P(X = x) = \underbrace{\frac{5}{6} \times \frac{5}{6} \times \dots \times \frac{5}{6}}_{x-1 \text{ times}} \times \frac{1}{6} = \left(\frac{5}{6}\right)^{x-1} \frac{1}{6}$.)
We say X is a discrete rv if it can take only a finite number of values (as in examples 1, 2
above) or a “countably” infinite number of values (as in example 3).
On the other hand, the annual rainfall in a city, the lifelength of an electronic device, the
diameter of washers produced by a factory are all continuous random variables in the
sense they can take (theoretically at least) all values in an ‘interval’ of the x-axis. We
shall discuss continuous rvs a little later.
The first condition follows from the fact that the probability is always ≥ 0. The second
condition follows from the fact that the probability of the certain event = 1.
Example 4
Determine whether the following can be the probability distribution of a rv which can
take only 4 values 1,2,3 and 4.
Binomial Distribution
Let E be a random experiment having only 2 outcomes, say ‘success’ and ‘failure’.
Suppose that P(success) = p and so P(failure) = q (=1-p). Consider n independent
repetitions of E (This means the outcome in any one repetition is not dependent upon the
outcome in any other repetition). We also make the important assumption that P(success)
= p remains the same for all such independent repetitions of E. Let X be the rv that
’counts’ the number of successes obtained in n such independent repetitions of E. Clearly
X is a discrete rv that can take n+1 values namely 0,1,2,…,n. We note that there are
$2^n$ outcomes each of which is a ‘string’ of n letters each of which is an S or F (if n = 3, it
will be FFF, SFF, FSF, FFS, SSF, SFS, FSS, SSS).
X = x means that the outcome has x successes and (n−x) failures in some order. One such
will be SS…S FF…F (x S's followed by n−x F's). Since all
the repetitions are independent, the prob of this outcome will be $p^x q^{n-x}$. Exactly the same
prob would be associated with any other outcome for which X = x. But x successes can
occur out of n repetitions in $\binom{n}{x}$ mutually exclusive ways. Hence
$P(X = x) = \binom{n}{x} p^x q^{n-x}$ (x = 0, 1, …, n).
We say X has a Binomial distribution with parameters n ( ≡ the number of repetitions)
and p (Prob of success in any one repetition).
We denote P(X = x ) by b(x; n , p ) to show its dependence on x, n and p. The letter ‘b’
stands for binomial.
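The binomial probability b(x; n, p) is easy to evaluate directly. The sketch below defines it and checks it against the coin-tossing distribution of Example 1; the function name b simply mirrors the notation above.

```python
from math import comb

def b(x, n, p):
    """Binomial probability of x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Three tosses of a fair coin: P(X = 0), ..., P(X = 3).
coin = [b(x, 3, 0.5) for x in range(4)]
print(coin)

# The n + 1 probabilities add up to 1 (up to floating-point rounding).
print(sum(b(x, 15, 0.70) for x in range(16)))
```

The same function also satisfies b(x; n, p) = b(n − x; n, 1 − p), since counting x successes is the same as counting n − x failures.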
Since all the above (n+1) probabilities are the (n+1) terms in the expansion of the
binomial $(q + p)^n$, X is said to have a binomial distribution. We at once see that the sum
of all the binomial probabilities is $(q + p)^n = 1$.
The independent repetitions are usually referred to as the “Bernoulli” trials. We note that
b(x; n, p ) = b(n − x; n, q )
(LHS = Prob of getting x successes in n Bernoulli trials = prob of getting n-x failures in
n Bernoulli trials = R.H.S.)
Example 5 (Exercise 4.15 of your book)
During one stage in the manufacture of integrated circuit chips, a coating must be
applied. If 70% of the chips receive a thick enough coating find the probability that
among 15 chips.
Solution
Among 15 chips, let X be the number of chips that will have thick enough coatings.
Hence X is a rv having Binomial distribution with parameters n =15 and p = 0.70.
A food processor claims that at most 10% of her jars of instant coffee contain less coffee
than printed on the label. To test this claim, 16 jars are randomly selected and contents
weighed. Her claim is accepted if fewer than 3 of the 16 jars contain less coffee (note that
10% of 16 = 1.6 and rounds to 2). Find the probability that the food processor’s claim
will be accepted if the actual percent of the jars containing less coffee is
Solution:
Let X be the number of jars that contain less coffee (than printed on the label) among the
16 jars randomly chosen. Thus X is a random variable having a Binomial distribution
with parameters n = 16 and p (the prob of “success” = the prob that a jar chosen at
random will have less coffee).
Suppose there is an urn containing 10 marbles of which 4 are white and the rest are black.
Suppose 5 marbles are chosen with replacement. Let X be the rv that counts the no of
white marbles drawn. Thus X = 0,1,2,3,4 or 5 (Remember that we replace each marble in
the urn before drawing the next one. Hence we can draw 5 white marbles)
P(“Success”) = P(Drawing a white marble in any one of the 5 draws) = $\frac{4}{10}$ (remember
we draw with replacement).
Thus X has a Binomial distribution with parameters n = 5 and p = $\frac{4}{10}$.
Hence $P(X = x) = b\left(x; 5, \frac{4}{10}\right)$.
When n = 10, p = $\frac{1}{2}$, P(X = 5) is the greatest, or 5 is the mode.
Fact
$\frac{b(x+1; n, p)}{b(x; n, p)} = \frac{n-x}{x+1} \times \frac{p}{1-p}$, which is
> 1 if x < np − (1−p)
= 1 if x = np − (1−p)
< 1 if x > np − (1−p)
Thus so long as x < np − (1−p) the binomial probabilities increase and if x > np − (1−p) they
decrease. Hence if np − (1−p) = x0 is an integer, then the modes are x0 and x0 + 1. If np − (1−p)
is not an integer and if x0 = smallest integer ≥ np − (1−p), the mode is x0.
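The rule for the mode can be checked against a direct search for the largest binomial probability. The sketch below uses exact fractions so that ties are detected reliably; the function names are illustrative.

```python
from fractions import Fraction
from math import ceil, comb

def b(x, n, p):
    """Binomial probability of x successes in n trials (exact if p is a Fraction)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def modes_by_search(n, p):
    """All x at which b(x; n, p) attains its maximum."""
    probs = [b(x, n, p) for x in range(n + 1)]
    top = max(probs)
    return [x for x, pr in enumerate(probs) if pr == top]

def modes_by_rule(n, p):
    """Modes predicted by the rule based on np - (1 - p)."""
    t = n * p - (1 - p)
    if t == int(t):
        return [int(t), int(t) + 1]  # two modes when np - (1 - p) is an integer
    return [ceil(t)]                 # otherwise the smallest integer >= np - (1 - p)

for n, p in [(10, Fraction(1, 2)), (5, Fraction(1, 2)), (5, Fraction(4, 10))]:
    print(n, p, modes_by_search(n, p), modes_by_rule(n, p))
```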
An urn contains 10 marbles of which 4 are white. 5 marbles are chosen at random
without replacement. Let X be the rv that counts the number of white marbles drawn.
Thus X can take 5 values, namely 0,1,2,3,4. What is P(X = x)? Now out of 10 marbles 5
can be chosen in $\binom{10}{5}$ equally likely ways, out of which there will be $\binom{4}{x}\binom{6}{5-x}$ ways of
drawing x white marbles (and so 5−x red marbles). (Reason: out of 4 white marbles, x can
be chosen in $\binom{4}{x}$ ways and out of 6 red marbles, 5−x can be chosen in $\binom{6}{5-x}$ ways.)
Hence $P(X = x) = \frac{\binom{4}{x}\binom{6}{5-x}}{\binom{10}{5}}$, x = 0,1,2,3,4.
A box contains N marbles out of which a are white. n marbles are chosen without
replacement. Let X be the random variable that counts the number of white marbles
drawn. X can take the values 0,1,2……. n.
$P(X = x) = \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}}$, x = 0, 1, 2, …, n
(Note x must be less than or equal to a and n−x must be less than or equal to N−a.)
We say the rv X has a hypergeometric distribution with parameters n,a and N. We denote
P(X=x) by h (x;n,a,N).
Among the 12 solar collectors on display, 9 are flat plate collectors and the other three
are concentrating collectors. If a person chooses at random 4 collectors, find the prob that
3 are flat plate ones.
Ans $h(3; 4, 9, 12) = \frac{\binom{9}{3}\binom{3}{1}}{\binom{12}{4}}$
If 6 of 18 new buildings in a city violate the building code, what is the probability that a
building inspector, who randomly selects 4 of the new buildings for inspection, will catch
(a) None of the new buildings that violate the building code
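Ans $h(0; 4, 6, 18) = \frac{\binom{6}{0}\binom{12}{4}}{\binom{18}{4}} = \frac{\binom{12}{4}}{\binom{18}{4}}$
(b) One of the new buildings that violate the building code
Ans $h(1; 4, 6, 18) = \frac{\binom{6}{1}\binom{12}{3}}{\binom{18}{4}}$
(c) Two of the new buildings that violate the building code
Ans $h(2; 4, 6, 18) = \frac{\binom{6}{2}\binom{12}{2}}{\binom{18}{4}}$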
(d) At least three of the new buildings that violate the building code
A shipment of 120 burglar alarms contains 5 that are defective. If 3 of these alarms are
randomly selected and shipped to a customer, find the probability that the customer will
get one defective alarm.
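Solution
(a) $h(1; 3, 5, 120) = \frac{\binom{5}{1}\binom{115}{2}}{\binom{120}{3}} = \frac{5 \times 6555}{280840} = 0.1167$
(b) $h(1; 3, 5, 120) \approx b\left(1; 3, \frac{5}{120}\right) = \binom{3}{1} \frac{5}{120} \left(1 - \frac{5}{120}\right)^2 = 0.1148$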
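The burglar-alarm calculation can be reproduced by evaluating the hypergeometric probability and its binomial approximation side by side. The sketch below does so; the function names h and b mirror the notation of the notes.

```python
from math import comb

def h(x, n, a, N):
    """Hypergeometric probability: x marked items in a sample of n
    drawn without replacement from N items of which a are marked."""
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

def b(x, n, p):
    """Binomial probability of x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

exact = h(1, 3, 5, 120)       # sampling without replacement
approx = b(1, 3, 5 / 120)     # binomial approximation with p = a/N

print(round(exact, 4), round(approx, 4))
```

The two values agree to two decimal places here because the sample size 3 is small compared with the lot size 120.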
Among the 300 employees of a company, 240 are union members, while the others are
not. If 8 of the employees are chosen by lot to serve on the committee which
administrates the provident fund, find the prob that 5 of them will be union members
while the others are not.
Solution
THE MEAN AND VARIANCE OF PROBABILITY DISTRIBUTIONS
We know that the equation of a line can be written as y = mx + c. Here m is the slope and
c is the y intercept. Different m,c give different lines. Thus m and c characterize a line.
Similarly we define certain numbers that characterize a probability distribution.
Example 11
X      1    2    3
Prob   1/2  1/3  1/6
$\mu = 1 \times \frac{1}{2} + 2 \times \frac{1}{3} + 3 \times \frac{1}{6} = \frac{5}{3}$
Example 12
X      0  1
Prob   q  p
where q = 1 − p. Thus µ = 0 × q + 1 × p = p.
If X is a rv having binomial distribution with parameters n and p, then mean of X = µ = np.
If X is a rv having hypergeometric distribution with parameters N, n, a, then $\mu = n \frac{a}{N}$.
Digression
The mean of a rv X gives the “average” of the values taken by X. Thus, saying that the
average mark in a test is 40 means that students would have got marks less than 40
and greater than 40, but they average out to 40. But we do not get an idea about the
spread (≡ deviation from the mean) of the marks. This spread is measured by the
variance: informally speaking, the average of the squares of the deviations from the
mean.
Variance of X = $\sigma^2 = \sum_{x_i \in R_X} (x_i - \mu)^2 P(X = x_i)$
The positive square root σ of σ² is called the standard deviation of X and has the
same units as X and µ.
Example 13
For the rv X having the prob distribution given in example 11, the variance is
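$\left(1 - \frac{5}{3}\right)^2 \times \frac{1}{2} + \left(2 - \frac{5}{3}\right)^2 \times \frac{1}{3} + \left(3 - \frac{5}{3}\right)^2 \times \frac{1}{6}$
$= \frac{4}{9} \times \frac{1}{2} + \frac{1}{9} \times \frac{1}{3} + \frac{16}{9} \times \frac{1}{6} = \frac{5}{9}$
$\sigma^2 = E\left((X - \mu)^2\right) = E(X^2) - \mu^2$
Here $E(X^2) = 1^2 \times \frac{1}{2} + 2^2 \times \frac{1}{3} + 3^2 \times \frac{1}{6} = \frac{1}{2} + \frac{4}{3} + \frac{9}{6} = \frac{60}{18} = \frac{10}{3}$
∴ $\sigma^2 = \frac{10}{3} - \frac{25}{9} = \frac{5}{9}$.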
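Both routes to the variance in Example 13 can be verified with exact fractions. This is a small sketch; the dictionary dist simply encodes the distribution of Example 11.

```python
from fractions import Fraction

# Distribution of Example 11: value -> probability.
dist = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

mean = sum(x * p for x, p in dist.items())

# Variance from the definition: sum of (x - mean)^2 * P(X = x).
var_definition = sum((x - mean) ** 2 * p for x, p in dist.items())

# Variance from the shortcut E(X^2) - mean^2.
second_moment = sum(x**2 * p for x, p in dist.items())
var_shortcut = second_moment - mean**2

print(mean, second_moment, var_definition, var_shortcut)
```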
Example 14
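$E(X^2) = 0^2 \times q + 1^2 \times p = p$
∴ $\sigma^2 = p - p^2 = p(1 - p) = pq$
For the binomial distribution with parameters n and p, $\sigma^2 = npq$.
For the hypergeometric distribution with parameters N, n, a, $\sigma^2 = n \frac{a}{N} \left(1 - \frac{a}{N}\right) \frac{N-n}{N-1}$.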
CHEBYSHEV'S THEOREM
$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$
In words, the prob of getting a value which deviates from its mean µ by at least kσ is at
most $\frac{1}{k^2}$.
Note: Chebyshev's Theorem gives us an upper bound of the prob of an event. Mostly it is
of theoretical interest.
In one out of 6 cases, material for bullet proof vests fails to meet puncture standards. If
405 specimens are tested, what does Chebyshev theorem tell us about the prob of getting
at most 30 or at least 105 cases that do not meet puncture standards?
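Here $\mu = np = 405 \times \frac{1}{6} = \frac{135}{2}$
$\sigma^2 = npq = 405 \times \frac{1}{6} \times \frac{5}{6}$
∴ $\sigma = \frac{15}{2}$
Let X = no of cases out of 405 that do not meet puncture standards.
Reqd P(X ≤ 30 or X ≥ 105)
Now X ≤ 30 ⇒ $X - \mu \le -\frac{75}{2}$
X ≥ 105 ⇒ $X - \mu \ge \frac{75}{2}$
Thus X ≤ 30 or X ≥ 105 ⇒ $|X - \mu| \ge \frac{75}{2} = 5\sigma$
∴ $P(X \le 30 \text{ or } X \ge 105) = P(|X - \mu| \ge 5\sigma) \le \frac{1}{5^2} = \frac{1}{25} = 0.04$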
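To see how conservative the Chebyshev bound is in this example, it can be compared with the exact binomial probability of the same event. The sketch below assumes, as the example does, that X is binomial with n = 405 and p = 1/6; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb, sqrt

n, p = 405, Fraction(1, 6)
q = 1 - p

mu = n * p               # 135/2
sigma = sqrt(n * p * q)  # 15/2
k = float(105 - mu) / sigma
bound = 1 / k**2         # Chebyshev's upper bound

# Exact P(X <= 30 or X >= 105) for the binomial distribution.
exact_tail = float(
    sum(
        comb(n, x) * p**x * q ** (n - x)
        for x in range(n + 1)
        if x <= 30 or x >= 105
    )
)

print(k, bound, exact_tail)
```

The exact probability comes out far below the bound, which is consistent with the remark that the theorem is mostly of theoretical interest.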
Example 16 (Exercise 446 of your text)
How many times do we have to flip a balanced coin to be able to assert with a prob of at
most 0.01 that the difference between the proportion of tails and 0.50 will be at least
0.04?
Solution:
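Suppose we flip the coin n times and suppose X is the no of tails obtained. Thus the
proportion of tails = $\frac{X}{n} = \frac{\text{No of tails}}{\text{Total No of flips}}$. We must find n so that
$P\left(\left|\frac{X}{n} - 0.50\right| \ge 0.04\right) \le 0.01$
Now $\left|\frac{X}{n} - 0.50\right| \ge 0.04$ is equivalent to $|X - n \times 0.50| \ge 0.04n$.
We know $P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$
Here $\mu = 0.50n$, $\sigma = \sqrt{n \times 0.50 \times 0.50} = 0.50\sqrt{n}$ and $k\sigma = 0.04n$
∴ $k = \frac{0.04n}{0.50 \times \sqrt{n}} = 0.08\sqrt{n}$
∴ $P\left(\left|\frac{X}{n} - 0.50\right| \ge 0.04\right) = P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2} \le 0.01$
if $k^2 \ge \frac{1}{0.01} = 100$, or if $(0.08)^2 n \ge 100$,
or $n \ge \frac{100}{(0.08)^2} = 15625$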
Suppose a factory manufactures items. Suppose there is a constant prob p that an item is
defective. Suppose we choose n items at random and let X be the no of defectives found.
Then X is a rv having binomial distribution with parameters n and p.
Now, for any ε > 0,
$$P\left(\left|\frac{X}{n} - p\right| \ge \varepsilon\right) = P(|X - np| \ge n\varepsilon) = P(|X-\mu| \ge k\sigma) \quad (\text{where } k\sigma = n\varepsilon)$$
$$\le \frac{1}{k^2} \ (\text{by Chebyshev's theorem}) = \frac{\sigma^2}{n^2\varepsilon^2} = \frac{npq}{n^2\varepsilon^2} = \frac{pq}{n\varepsilon^2} \to 0 \text{ as } n \to \infty.$$
Thus the prob that the proportion of defective items differs from the actual prob p by
at least any given +ve no ε tends to 0 as n → ∞. (This is called the Law of Large
Numbers.)
This means that “most of the time” the proportion of defectives will be close to the actual
(unknown) prob p that an item is defective, for large n. So we can estimate p by X/n, the
(sample) proportion of defectives.
POISSON DISTRIBUTION
A random variable X is said to have a Poisson distribution with parameter λ > 0 if its
probability distribution is given by
$$P(X = x) = f(x; \lambda) = e^{-\lambda}\,\frac{\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$
Hence for n large, p small, the binomial prob b( x; n, p ) can be approximated by the
Poisson prob f ( x; λ ) where λ = np.
Example 17
$$b(3; 100, 0.03) \approx f(3; 3) = \frac{e^{-3}\, 3^3}{3!}$$
If 0.8% of the fuses delivered to an arsenal are defective, use the Poisson approximation
to determine the probability that 4 fuses will be defective in a random sample of 400.
Solution
If X is the number of defectives in a sample of 400, X has the binomial distribution with
parameters n = 400 and p = 0.8% = 0.008. Here λ = np = 400 × 0.008 = 3.2.
Thus P (4 out of 400 are defective)
$$\approx f(4; 3.2) = e^{-3.2}\,\frac{(3.2)^4}{4!} = F(4; 3.2) - F(3; 3.2) = 0.781 - 0.603$$
(from table 2 at the end of the text)
= 0.178
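The quality of the Poisson approximation in the fuse example can be checked directly. This is a small sketch using only the Python standard library; the function names are chosen for illustration.

```python
import math

def binom_pmf(x: int, n: int, p: float) -> float:
    """Exact binomial probability b(x; n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability f(x; lambda) = e^(-lambda) * lambda^x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)

n, p, x = 400, 0.008, 4
lam = n * p                      # lambda = np = 3.2

exact = binom_pmf(x, n, p)       # binomial value
approx = poisson_pmf(x, lam)     # Poisson approximation, about 0.178

print(f"binomial: {exact:.4f}   Poisson: {approx:.4f}")
```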
For various λ and x, F(x; λ) has been tabulated in table 2 (of your text book, pages 581
to 585). We use table 2 as follows.
Poisson Process
There are many situations in which events occur randomly in regular intervals of time.
For example in a time period t, let X t be the number of accidents at a busy road junction
in New Delhi; X t be the number of calls received at a telephone exchange; X t be the
number of radio active particles emitted by a radioactive source etc. In all such examples
we find X t is a discrete rv which can take non-ve integral values 0,1,2,….. The important
thing to note is that all such random variables have “same” distribution except that the
parameter(s) depend on time t.
Divide the time period t into n small time periods each of length ∆t . Hence by
assumptions above, we note that Xt = no of successes in time period t is a rv having
Binomial distribution with parameters n and p = α∆t . Hence
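$$P(X_t = x) = b(x; n, \alpha\Delta t) \to f(x; \lambda) \text{ as } n \to \infty$$
where $\lambda = np = n\,\alpha\,\Delta t = \alpha t$.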
(Note: For a more rigorous derivation of the distribution of Xt, you may see Meyer,
Introductory probability and statistical applications, pages 165-169).
Given that the switch board of a consultant’s office receives on the average 0.6 call per
minute, find the probability that
Solution
Example 20
Suppose that Xt, the number of particles emitted in t hours from a radio – active source
has a Poisson distribution with parameter 20t. What is the probability that exactly 5
particles are emitted during a 15 minute period?
Solution
15 minutes = 1/4 hour.
Hence if $X_{1/4}$ = no of particles emitted in 1/4 hour,
$$P\left(X_{1/4} = 5\right) = e^{-\frac{1}{4}\times 20}\,\frac{\left(\frac{1}{4}\times 20\right)^5}{5!} = e^{-5}\,\frac{5^5}{5!}$$
THE GEOMETRIC DISTRIBUTION
Suppose there is a random experiment having only two possible outcomes, called
‘success’ and ‘failure’. Assume that the prob of a success in any one ‘trial’ ( ≡ repetition
of the experiment) is p and remains the same for all trials. Also assume the trials are
independent. The experiment is repeated till a success is got. Let X be the rv that counts
the number of trials needed to get the 1st success. Clearly X = x if the first (x-1) trials
were failures and the xth trial gave the first success. Hence
$$P(X = x) = q^{x-1}\,p, \quad x = 1, 2, \ldots \quad (q = 1 - p)$$
We say X has a geometric distribution with parameter p (as the respective probabilities
form a geometric progression with common ratio q).
The mean of X is $\mu = \frac{1}{p}$ and the variance is $\sigma^2 = \frac{q}{p^2}$.
(For example suppose a die is rolled till a 6 is got. It is reasonable to expect that on an
average we will need $\frac{1}{1/6} = 6$ rolls as there are 6 nos!)
An expert hits a target 95% of the time. What is the probability that the expert will miss
the target for the first time on the fifteenth shot?
Solution
Here ‘Success’ means the expert misses the target. Hence p = P(Success) = 5% = 0.05. If
X is the rv that counts the no. of shots needed to get ‘a success’, we want
$$P(X = 15) = q^{14} \times p = (0.95)^{14} \times 0.05.$$
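The geometric probability in the target-shooting example is easy to evaluate numerically. The sketch below also checks the mean 1/p by summing a long (truncated) series; the function name is an assumption for illustration.

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(X = x) = q^(x-1) * p: first success on trial x."""
    return (1 - p) ** (x - 1) * p

p_miss = 0.05                                    # 'success' = the expert misses
first_miss_on_15th = geometric_pmf(15, p_miss)   # (0.95)^14 * 0.05

# Truncated series for the mean; the neglected terms are negligible here.
mean_trials = sum(x * geometric_pmf(x, p_miss) for x in range(1, 5001))

print(f"P(X = 15) = {first_miss_on_15th:.4f}")
print(f"mean number of shots until the first miss = {mean_trials:.2f}")
```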
Example 22
The probability of a successful rocket launching is 0.8. Launching attempts are made till
a successful launching has occurred. Find the probability that exactly 6 attempts will be
necessary.
Example 23
(a) $P(X \ge r) = q^{r-1}$, r = 1, 2, …
(b) $P(X \ge s + t \mid X > s) = P(X \ge t)$
Solution
(a) $P(X \ge r) = \sum_{x=r}^{\infty} q^{x-1}\,p = \frac{q^{r-1}\,p}{1-q} = q^{r-1}.$
(b) $P(X \ge s + t \mid X > s) = \frac{P(X \ge s+t)}{P(X > s)} = \frac{q^{s+t-1}}{q^{s}} = q^{t-1} = P(X \ge t).$
[Figure: customers arrive in a Poisson fashion at the service facility S and depart after service.]
There is a service facility. Customers arrive in a random fashion and get service if the
server is idle. Else they stand in a Queue and wait to get service.
Questions that one can ask are :
1. At any point of time on an average how many customers are in the system
(getting service and waiting to get service)?
2. What is the mean time a customer waits in the system?
3. What proportion of time a server is idle? And so on.
We shall consider only the simplest queueing system where there is only one server. We
assume that the population of customers is infinite and that there is no limit on the
number of customers that can wait in the queue.
We also assume that the customers arrive in a ‘Poisson fashion’ at the mean rate of α .
This means that X t the number of customers that arrive in a time period t is a rv having
Poisson distribution with parameter α t . We also assume that so long as the service
station is not empty, customers depart in a Poisson fashion at a mean rate of β . This
means, when there is at least one customer, Yt , the number of customers that depart
(after getting service) in a time period t is a r.v. having Poisson distribution with
parameter βt (where β > α ).
Further assumptions are : In a small time interval ∆t , there will be a single arrival or a
single departure but not both. (Note that by assumptions of Poisson process in a small
time interval ∆t , there can be at most one arrival and at most one departure). Let at time
t, N t be the number of customers in the system. Let P ( N t = n ) = p n (t ). We make another
assumption:
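$$\pi_0 = 1 - \frac{\alpha}{\beta}$$
$$\pi_n = \left(1 - \frac{\alpha}{\beta}\right)\left(\frac{\alpha}{\beta}\right)^n \quad (n = 0, 1, 2, \ldots)$$
Thus L = Mean number of customers in the system (getting service and waiting to get
service)
$$= \sum_{n=0}^{\infty} n\,\pi_n = \frac{\alpha}{\beta - \alpha}$$
$L_q$ = Mean number of customers in the queue (waiting to get service)
$$= \sum_{n=1}^{\infty} (n-1)\,\pi_n = \frac{\alpha^2}{\beta(\beta-\alpha)} = L - \frac{\alpha}{\beta}$$
W = Mean time a customer spends in the system
$$= \frac{1}{\beta - \alpha} = \frac{L}{\alpha}$$
$W_q$ = Mean time a customer spends waiting in the queue
$$= \frac{\alpha}{\beta(\beta-\alpha)} = \frac{L_q}{\alpha} = W - \frac{1}{\beta}.$$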
Example 24
Trucks arrive at a receiving dock in a Poisson fashion at a mean rate of 2 per hour. The
trucks can be unloaded at a mean rate of 3 per hour in a Poisson fashion (so long as the
receiving dock is not empty).
(a) What is the average number of trucks being unloaded and waiting to get
unloaded?
(b) What is the mean no of trucks in the queue?
(c) What is the mean time a truck spends waiting in the queue?
(d) What is the prob that there are no trucks waiting to be unloaded?
(e) What is the prob that an arriving truck need not wait to get unloaded?
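Solution
Here α = 2 and β = 3. Thus
(a) $L = \frac{\alpha}{\beta - \alpha} = \frac{2}{3-2} = 2$
(b) $L_q = \frac{\alpha^2}{\beta(\beta-\alpha)} = \frac{2^2}{3(1)} = \frac{4}{3}$
(c) $W_q = \frac{\alpha}{\beta(\beta-\alpha)} = \frac{2}{3}$ hr
(d) P(no trucks waiting to be unloaded)
$$= \pi_0 + \pi_1 = \left(1 - \frac{\alpha}{\beta}\right) + \frac{\alpha}{\beta}\left(1 - \frac{\alpha}{\beta}\right) = \left(1 - \frac{2}{3}\right) + \frac{2}{3}\left(1 - \frac{2}{3}\right) = \frac{1}{3} + \frac{2}{9} = \frac{5}{9}$$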
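The single-server queue formulas can be collected into one small function. The sketch below uses exact fractions and reproduces the truck figures for α = 2, β = 3; the function name `mm1_measures` and the dictionary keys are assumptions for illustration.

```python
from fractions import Fraction

def mm1_measures(alpha, beta):
    """Steady-state measures of the one-server queue (requires beta > alpha)."""
    a, b = Fraction(alpha), Fraction(beta)
    if not b > a:
        raise ValueError("a steady state needs beta > alpha")
    rho = a / b
    return {
        "L": a / (b - a),               # mean number in the system
        "Lq": a * a / (b * (b - a)),    # mean number in the queue
        "W": 1 / (b - a),               # mean time in the system
        "Wq": a / (b * (b - a)),        # mean time in the queue
        "pi": lambda n: (1 - rho) * rho ** n,   # P(n customers in the system)
    }

m = mm1_measures(2, 3)
no_truck_waiting = m["pi"](0) + m["pi"](1)
print(m["L"], m["Lq"], m["W"], m["Wq"], no_truck_waiting)
```

Working in `Fraction` keeps results such as 4/3 and 5/9 exact, so they can be compared directly with the hand calculation.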
Example 25
With reference to example 24, suppose that the cost of keeping a truck in the system is
Rs. 15/hour. If it were possible to increase the mean loading rate to 3.5 trucks per hour at
a cost of Rs. 12 per hour, would this be worth while?
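Solution
In the new scheme α = 2, β = 3.5, so $L = \frac{\alpha}{\beta-\alpha} = \frac{2}{1.5} = \frac{4}{3}$ (verify!)
∴ Net cost per hour to the dock = $\frac{4}{3} \times 15 + 12$ = Rs. 32 / hr.
In the old scheme L = 2, so the cost per hour was 2 × 15 = Rs. 30 / hr. Hence the change
would not be worthwhile.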
MULTINOMIAL DISTRIBUTION
$$= \frac{n!}{x_1!\, x_2! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$
Proof. The probability of getting $A_1$ $x_1$ times, $A_2$ $x_2$ times, …, $A_k$ $x_k$ times in any one way
is $p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$ as all the repetitions are independent. Now among the n repetitions
$A_1$ occurs $x_1$ times in $\binom{n}{x_1} = \frac{n!}{x_1!\,(n - x_1)!}$ ways.
Hence the total number of ways of getting $A_1$ $x_1$ times, $A_2$ $x_2$ times, …, $A_k$ $x_k$ times will
be
$$\frac{n!}{x_1!\,(n-x_1)!} \times \frac{(n-x_1)!}{x_2!\,(n-x_1-x_2)!} \times \cdots \times \frac{(n-x_1-x_2-\cdots-x_{k-1})!}{x_k!\,(n-x_1-x_2-\cdots-x_{k-1}-x_k)!}$$
$$= \frac{n!}{x_1!\, x_2! \cdots x_k!} \quad \text{as } x_1 + x_2 + \cdots + x_k = n \text{ and } 0! = 1$$
Hence
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$
Example 26
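A die is rolled 30 times. Find the probability of getting ‘1’ 2 times, ‘2’ 3 times, ‘3’ 4 times,
‘4’ 6 times, ‘5’ 7 times and ‘6’ 8 times.
Ans
$$\frac{30!}{2!\,3!\,4!\,6!\,7!\,8!} \times \left(\frac{1}{6}\right)^2 \left(\frac{1}{6}\right)^3 \left(\frac{1}{6}\right)^4 \left(\frac{1}{6}\right)^6 \left(\frac{1}{6}\right)^7 \left(\frac{1}{6}\right)^8$$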
The probabilities are, respectively, 0.40, 0.40, and 0.20 that in city driving a certain type
of imported car will average less than 10 kms per litre, anywhere between 10 and 15 kms
per litre, or more than 15 kms per litre. Find the probability that among 12 such cars
tested, 4 will average less than 10 kms per litre, 6 will average anywhere from 10 to 15
kms per litre and 2 will average more than 15 kms per litre.
Solution
$$\frac{12!}{4!\,6!\,2!}\,(.40)^4\,(.40)^6\,(.20)^2.$$
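The multinomial formula can be evaluated with a few lines of code. This is a minimal sketch; `multinomial_pmf` is a name chosen for illustration.

```python
import math

def multinomial_pmf(counts, probs):
    """n!/(x1! ... xk!) * p1^x1 ... pk^xk for the given counts and probabilities."""
    coefficient = math.factorial(sum(counts))
    for x in counts:
        coefficient //= math.factorial(x)   # exact integer division at each step
    probability = float(coefficient)
    for x, p in zip(counts, probs):
        probability *= p ** x
    return probability

# The 12 imported cars: 4, 6 and 2 cars in the three mileage classes.
cars = multinomial_pmf([4, 6, 2], [0.40, 0.40, 0.20])
print(f"{cars:.4f}")   # roughly 0.058
```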
Remark
1. Note that the different probabilities are the various terms in the expansion of the
multinomial
$$(p_1 + p_2 + \cdots + p_k)^n.$$
Hence the name multinomial distribution.
SIMULATION
Nowadays simulation techniques are being applied to many problems in Science and
Engineering. If the processes being simulated involve an element of chance, these
techniques are referred to as Monte Carlo methods. For example to study the distribution
of number of calls arriving at a telephone exchange, we can use simulation techniques.
Random Numbers : In simulation problems one uses the tables of random numbers to
“generate” random deviates (values assumed by a random variable). A table of random
numbers consists of many pages on which the digits 0,1,2,…,9 are distributed in such a
way that the probability of any one digit appearing is the same, namely 1/10 = 0.1.
Use of random numbers to generate ‘heads’ and ‘tails’. For example choose the 4th
column of the fourth page of table 7, start at the top and go down the page. Thus we get
6,2,7,5,5,0,1,8,6,3,… Now we can interpret this as H,H,T,T,T,H,T,H,H,T, because the
prob of getting an odd no. = the prob of getting an even no. = 0.5. Thus we associate
head to the occurrence of an even number and tail to that of an odd no. We can also
associate a head if we get 5,6,7,8, or 9 and tail otherwise. Then we can say we got
H,T,H,H,H,T,T,H,H,T,… In problems on simulation we shall adopt the second scheme
as it is easy to use and is easily ‘extendable’ for more than two outcomes. Suppose for
example, we have an experiment having 4 outcomes with prob. 0.1, 0.2, 0.3 and 0.4
respectively.
Thus to simulate the above experiment, we have to allot one of the 10 digits 0,1….9 to
the first outcome, two of them to the second outcome, three of them to the third outcome
and the remaining four to the fourth outcome. Though this can be done in a variety of
ways, we choose the simplest way as follows:
Note that there are 100 two-digit random numbers: 00, 01, …, 10, 11, …, 20, 21, …,
98, 99. Thus we associate the first 80 numbers (00, 01, …, 79) to the first outcome, the next
15 numbers (80, 81, …, 94) to the second outcome and the last 5 numbers (95, 96, …, 99)
to the 3rd outcome. Thus the above sequence of 2 digit random numbers would simulate
the outcomes:
$O_2, O_1, O_1, O_1, O_1, O_1, O_1, O_1, \ldots$
* Cumulative prob is got by adding all the probabilities at that position and above thus cumulative
prob at O2 = Prob of O1 + Prob O2 = 0.80 + 0.15 = 0.95.
** You observe the beginning random number is 00 for the 1st outcome; and for the remaining
outcomes, it is one more than the ending random numbers of the immediately preceding outcome.
Also the ending random number for each outcome is “one less than the cumulative probability”.
Similarly three digit random numbers are used if the prob of an outcome has 3 decimal
places. Read the example on page 133 of your text book.
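The allotment of random numbers to outcomes can be sketched in code. The example below builds the ranges for three outcomes with probabilities 0.80, 0.15 and 0.05 from the cumulative probabilities, then simulates a few outcomes. Python's pseudo-random generator stands in for the printed table of random numbers, which is an assumption of this sketch, as are the function names.

```python
import random

def build_ranges(probs, digits=2):
    """Inclusive (first, last) random-number range allotted to each outcome."""
    total = 10 ** digits
    ranges, start, cumulative = [], 0, 0.0
    for p in probs:
        cumulative += p
        end = round(cumulative * total) - 1   # "one less than the cumulative probability"
        ranges.append((start, end))
        start = end + 1                       # next outcome begins one higher
    return ranges

def outcome_for(number, ranges):
    """1-based index of the outcome whose range contains the random number."""
    for index, (low, high) in enumerate(ranges, start=1):
        if low <= number <= high:
            return index
    raise ValueError("random number outside all ranges")

ranges = build_ranges([0.80, 0.15, 0.05])     # 00-79, 80-94, 95-99
rng = random.Random(0)                        # seeded stand-in for the random number table
sample = [outcome_for(rng.randrange(100), ranges) for _ in range(10)]
print(ranges, sample)
```

For probabilities given to 4 decimal places the same function can be called with `digits=4`.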
Exercise 4.97 on page 136
No. of polluting spices   Probability   Cumulative Probability   Random Numbers
0                         0.2466        0.2466                   0000-2465
1                         0.3452        0.5918                   2466-5917
2                         0.2417        0.8335                   5918-8334
3                         0.1128        0.9463                   8335-9462
4                         0.0395        0.9858                   9463-9857
5                         0.0111        0.9969                   9858-9968
6                         0.0026        0.9995                   9969-9994
7                         0.0005        1.0000                   9995-9999
Starting with page 592, Row 14, Column 7, we read off the 4 digit random nos as :
In many situations, we come across random variables that take all values lying in a
certain interval of the x axis.
Example
(1) life length X of a bulb is a continuous random variable that can take all non-ve
real values.
(2) The time between two consecutive arrivals in a queuing system is a random
variable that can take all non-ve real values.
(3) The distance R of the point (where a dart hits) (from the centre) is a
continuous random variable that can take all values in the interval (0,a) where
a is the radius of the board.
It is clear that in all such cases, the probability that the random variable takes any one
particular value is meaningless. For example, when you buy a bulb, you ask the question:
what are the chances that it will work for at least 500 hours?
If X is a continuous random variable, the questions about the probability that X takes
values in an interval (a,b) are answered by defining a probability density function.
Def Let X be a continuous rv. A real function f(x) is called the prob density function of X
if
(2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
(3) $P(a \le X \le b) = \int_a^b f(x)\,dx.$
Remarks
1. $P(X = a) = P(a \le X \le a) = \int_a^a f(x)\,dx = 0$
This is proved using Mean value theorem.
$$P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
We denote the above by F(x) and call it the cumulative distribution function (cdf) of X.
Properties of cdf
1. 0 ≤ F ( x ) ≤ 1 for all x.
3. $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$; $F(+\infty) = \lim_{x \to \infty} F(x) = 1$.
4. $\frac{d}{dx} F(x) = \frac{d}{dx} \int_{-\infty}^{x} f(t)\,dt = f(x)$
(Thus we can get density function f(x) by differentiating the distribution function F(x)).
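If the prob density of a rv is given by $f(x) = kx^2$, 0 < x < 1 (and 0 elsewhere), find the
value of k and the probability that the rv takes on a value
(a) Between 1/4 and 3/4
(b) Greater than 2/3
Find the distribution function F(x) and hence answer the above questions.
Solution
$\int_{-\infty}^{\infty} f(x)\,dx = 1$ gives
$\int_0^1 kx^2\,dx = 1$, i.e. $k \times \frac{1}{3} = 1$ or k = 3.
(a) $P\left(\frac{1}{4} < X < \frac{3}{4}\right) = \int_{1/4}^{3/4} 3x^2\,dx = \left(\frac{3}{4}\right)^3 - \left(\frac{1}{4}\right)^3 = \frac{26}{64} = \frac{13}{32}$
(b) $P\left(X > \frac{2}{3}\right) = P\left(\frac{2}{3} < X < 1\right) = \int_{2/3}^{1} 3x^2\,dx = 1^3 - \left(\frac{2}{3}\right)^3 = \frac{19}{27}$
Distribution function:
Case (i) x ≤ 0. In this case f(t) = 0 for all t ≤ x.
$\therefore F(x) = \int_{-\infty}^{x} f(t)\,dt = 0.$
Case (ii) 0 < x < 1. In this case $f(t) = 3t^2$ between 0 and x and 0 for t < 0.
$\therefore F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_0^x 3t^2\,dt = x^3.$
Case (iii) x ≥ 1. In this case f(t) = 0 for t > 1.
$\therefore F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{1} f(t)\,dt = 1$ (by case ii)
Thus
$$F(x) = \begin{cases} 0 & x \le 0 \\ x^3 & 0 < x \le 1 \\ 1 & x > 1 \end{cases}$$
Now $P\left(\frac{1}{4} < X < \frac{3}{4}\right) = P\left(X < \frac{3}{4}\right) - P\left(X \le \frac{1}{4}\right) = P\left(X \le \frac{3}{4}\right) - P\left(X \le \frac{1}{4}\right)$
$$= F\left(\frac{3}{4}\right) - F\left(\frac{1}{4}\right) = \left(\frac{3}{4}\right)^3 - \left(\frac{1}{4}\right)^3 = \frac{13}{32}$$
$$P\left(X > \frac{2}{3}\right) = 1 - P\left(X \le \frac{2}{3}\right) = 1 - F\left(\frac{2}{3}\right) = 1 - \left(\frac{2}{3}\right)^3 = \frac{19}{27}$$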
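The distribution function for the density 3x² on (0, 1) can be coded directly and used to confirm both answers exactly. This is a short sketch using exact fractions; the function name `cdf` is chosen for illustration.

```python
from fractions import Fraction

def cdf(x) -> Fraction:
    """F(x) for the density f(x) = 3x^2 on (0, 1): 0, x^3 or 1 by cases."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x <= 1:
        return x ** 3
    return Fraction(1)

prob_between = cdf(Fraction(3, 4)) - cdf(Fraction(1, 4))   # P(1/4 < X < 3/4)
prob_greater = 1 - cdf(Fraction(2, 3))                     # P(X > 2/3)
print(prob_between, prob_greater)
```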
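$$f(x) = \begin{cases} x & 0 < x < 1 \\ 2 - x & 1 \le x < 2 \\ 0 & \text{elsewhere} \end{cases}$$
$$P(0.2 < X < 0.8) = \int_{0.2}^{0.8} x\,dx = \frac{(0.8)^2}{2} - \frac{(0.2)^2}{2} = 0.3$$
$$P(0.6 < X < 1.2) = \int_{0.6}^{1} x\,dx + \int_{1}^{1.2} (2 - x)\,dx = \left(\frac{1}{2} - \frac{(0.6)^2}{2}\right) + \left[-\frac{(2-x)^2}{2}\right]_{1}^{1.2}$$
$$= 0.32 + \left(\frac{1}{2} - \frac{(0.8)^2}{2}\right) = 0.32 + 0.18 = 0.5$$
Distribution function:
Case (i) x ≤ 0.
$\therefore F(x) = \int_{-\infty}^{x} f(t)\,dt = 0.$
Case (ii) 0 < x ≤ 1.
$$F(x) = \int_{-\infty}^{0} f(t)\,dt + \int_{0}^{x} f(t)\,dt = 0 + \int_0^x t\,dt = \frac{x^2}{2}$$
Case (iii) 1 < x ≤ 2.
$$\therefore F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{1} f(t)\,dt + \int_{1}^{x} f(t)\,dt = \frac{1}{2} \ (\text{by case ii}) + \int_1^x (2 - t)\,dt$$
$$= \frac{1}{2} + \frac{1}{2} - \frac{(2-x)^2}{2} = 1 - \frac{(2-x)^2}{2}$$
Case (iv) x > 2.
$$\therefore F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{2} f(t)\,dt + \int_{2}^{x} f(t)\,dt = 1 + 0 = 1$$
Thus
$$F(x) = \begin{cases} 0 & x \le 0 \\ \dfrac{x^2}{2} & 0 < x \le 1 \\ 1 - \dfrac{(2-x)^2}{2} & 1 < x \le 2 \\ 1 & x > 2 \end{cases}$$
$$P(0.6 < X < 1.2) = F(1.2) - F(0.6) = 1 - \frac{(0.8)^2}{2} - \frac{(0.6)^2}{2} = 0.5$$
$$P(X > 1.8) = 1 - P(X \le 1.8) = 1 - F(1.8) = 1 - \left[1 - \frac{(0.2)^2}{2}\right] = 0.02$$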
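$$\mu = E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$$
$$\sigma^2 = E\left[(X-\mu)^2\right] = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx = E\left(X^2\right) - \mu^2$$
Here $E\left(X^2\right) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$
Its mean $\mu = E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx = \int_0^1 x \cdot 3x^2\,dx = \frac{3}{4}.$
$$E\left(X^2\right) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^1 x^2 \cdot 3x^2\,dx = \frac{3}{5}$$
Hence $\sigma^2 = \frac{3}{5} - \left(\frac{3}{4}\right)^2 = 0.0375$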
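Example 4 The density of a rv X is
$$f(x) = \begin{cases} \dfrac{1}{20}\,e^{-x/20} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
$$\mu = E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx = \int_0^{\infty} x \cdot \frac{1}{20}\,e^{-x/20}\,dx = \left[x\left(-e^{-x/20}\right) - 20\,e^{-x/20}\right]_0^{\infty} = 20.$$
$$E\left(X^2\right) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^{\infty} x^2 \cdot \frac{1}{20}\,e^{-x/20}\,dx$$
$$= \left[x^2\left(-e^{-x/20}\right) - (2x)\left(20\,e^{-x/20}\right) + 2\left(-400\,e^{-x/20}\right)\right]_0^{\infty} = 800$$
$\therefore \sigma^2 = E\left(X^2\right) - \mu^2 = 800 - 400 = 400$
$\therefore \sigma = 20.$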
NORMAL DISTRIBUTION
A random variable X is said to have the normal distribution (or Gaussian Distribution) if
its density is
$$f\left(x; \mu, \sigma^2\right) = \frac{1}{\sqrt{2\pi}\,\sigma}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$
Here µ , σ are fixed (called parameters) and σ > 0. The graph of the normal density is
a bell shaped curve:
Figure
variance of X = $E\left[(X-\mu)^2\right] = \sigma^2$.
$$F(z) = P(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\,dt$$
and represents the area under the density upto z. It is the shaded portion in the figure.
Figure
We at once see from the symmetry of the graph that $F(0) = \frac{1}{2} = 0.5$ and
$$F(-z) = 1 - F(z)$$
F(z) for various positive z has been tabulated in table 3 (at the end of your book).
Definition of $z_\alpha$
$z_{0.05} = 1.645$
Important
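If X is normal with mean µ and variance σ², it can be shown that the standardized r.v.
$Z = \frac{X-\mu}{\sigma}$ has standard normal distribution. Thus questions about the prob that X
assumes a value between say a and b can be translated into the prob that Z assumes
values in a corresponding range. Specifically :
$$P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{b-\mu}{\sigma}\right) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right)$$
$$= F\left(\frac{b-\mu}{\sigma}\right) - F\left(\frac{a-\mu}{\sigma}\right)$$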
Given that X has a normal distribution with mean µ = 16.2 and variance σ² = 1.5625,
find the prob that it will take on a value
Here $\sigma = \sqrt{1.5625} = 1.25$.
(a) Thus $P(X > 16.8) = P\left(\frac{X-\mu}{\sigma} > \frac{16.8 - 16.2}{1.25}\right) = P\left(Z > \frac{0.6}{1.25}\right) = P(Z > 0.48)$
$= 1 - P(Z \le 0.48) = 1 - F(0.48) = 1 - 0.6844 = 0.3156$
(b) $P(X < 14.9) = P\left(\frac{X-\mu}{\sigma} < \frac{14.9 - 16.2}{1.25}\right) = P\left(Z < -\frac{1.3}{1.25}\right) = P(Z < -1.04)$
$= F(-1.04) = 1 - F(1.04) = 1 - 0.8508 = 0.1492$
(c) $P(13.6 < X < 18.8) = P\left(-\frac{2.6}{1.25} < Z < \frac{2.6}{1.25}\right) = P(-2.08 < Z < 2.08)$
$= F(2.08) - F(-2.08) = F(2.08) - (1 - F(2.08)) = 2F(2.08) - 1 = 2 \times 0.9812 - 1 = 0.9624$
(d) $P(16.5 < X < 16.7) = P\left(\frac{0.3}{1.25} < Z < \frac{0.5}{1.25}\right) = P(0.24 < Z < 0.4)$
$= F(0.4) - F(0.24) = 0.6554 - 0.5948 = 0.0606$
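Table look-ups of F(z) can be cross-checked numerically. The sketch below computes the standard normal cdf from the error function in the Python standard library and re-evaluates the four probabilities for µ = 16.2, σ = 1.25; the helper names `phi` and `normal_prob` are assumptions for illustration.

```python
import math

def phi(z: float) -> float:
    """Standard normal cdf F(z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_prob(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X < b) for X normal with mean mu and standard deviation sigma."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

mu, sigma = 16.2, math.sqrt(1.5625)            # sigma = 1.25

p_above = 1 - phi((16.8 - mu) / sigma)         # P(X > 16.8)
p_below = phi((14.9 - mu) / sigma)             # P(X < 14.9)
p_wide = normal_prob(13.6, 18.8, mu, sigma)    # P(13.6 < X < 18.8)
p_narrow = normal_prob(16.5, 16.7, mu, sigma)  # P(16.5 < X < 16.7)

print(f"{p_above:.4f} {p_below:.4f} {p_wide:.4f} {p_narrow:.4f}")
```

Small differences from the table-based answers in the last decimal place come from rounding z to two places before the look-up.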
Example 2
A rv X has a normal distribution with σ = 10. If the prob is 0.8212 that it will take on a
value < 82.5, what is the prob that it will take on a value > 58.3?
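Solution
Thus $P\left(\frac{X-\mu}{\sigma} < \frac{82.5 - \mu}{10}\right) = 0.8212$
Or $P\left(Z < \frac{82.5 - \mu}{10}\right) = 0.8212$, i.e. $F\left(\frac{82.5 - \mu}{10}\right) = 0.8212$
From table 3, $\frac{82.5 - \mu}{10} = 0.92$, so µ = 82.5 − 9.2 = 73.3.
Hence $P(X > 58.3) = P\left(\frac{X-\mu}{\sigma} > \frac{58.3 - 73.3}{10}\right) = P(Z > -1.5) = F(1.5) = 0.9332$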
In a Photographic process the developing time of prints may be looked upon as a r.v. X
having normal distribution with µ = 16.28 seconds and s.d. of 0.12 second. For which
value is the prob 0.95 that it will be exceeded by the time it takes to develop one of the
prints?
Solution
We want c such that P(X > c) = 0.95
i.e. $P\left(\frac{X-\mu}{\sigma} > \frac{c - 16.28}{0.12}\right) = 0.95$
i.e. $P\left(Z > \frac{c - 16.28}{0.12}\right) = 0.95$
Hence $P\left(Z \le \frac{c - 16.28}{0.12}\right) = 0.05$
$\therefore \frac{c - 16.28}{0.12} = -z_{0.05} = -1.645$, so c = 16.28 − 1.645 × 0.12 ≈ 16.08 seconds.
Thus when n is large, the binomial probabilities can be approximated using normal
distribution function.
A manufacturer knows that on the average 2% of the electric toasters that he makes will
require repairs within 90 days after they are sold. Use normal approximation to the
binomial distribution to determine the prob that among 1200 of these toasters at least 30
will require repairs within the first 90 days after they are sold?
Solution
Let X = No. of toasters (among 1200) that require repairs within the first 90 days after
they are sold. Hence X is a rv having Binomial Distribution with parameters n = 1200
and p = 2/100 = .02. Here np = 24 and $\sqrt{npq} = \sqrt{23.52} = 4.85$.
Required $P(X \ge 30) = P\left(\frac{X - np}{\sqrt{npq}} \ge \frac{30 - 24}{4.85}\right)$
Since for continuous rvs P(Z ≥ c) = P(Z > c) (which is not true for discrete rvs), when we
approximate binomial prob by normal prob, we must ensure that we do not ‘lose’ the end
point. This is achieved by what we call continuity correction: In the previous example,
P(X ≥ 30) also = P(X ≥ 29.5) (Read the justification given in your book on page 150
line 1 to 7).
$$= P\left(\frac{X - np}{\sqrt{npq}} \ge \frac{29.5 - 24}{4.85}\right) \approx P\left(Z \ge \frac{5.5}{4.85}\right) = P(Z \ge 1.13)$$
$= 1 - P(Z \le 1.13) = 1 - F(1.13) = 1 - 0.8708$
= .1292
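To see what the continuity correction buys, the sketch below compares the normal approximation with and without the correction against the exact binomial tail for the toaster example (n = 1200, p = 0.02). The helper `phi` is an assumption for illustration.

```python
import math

def phi(z: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 1200, 0.02
mu = n * p                              # 24
sigma = math.sqrt(n * p * (1 - p))      # about 4.85

without_correction = 1 - phi((30 - mu) / sigma)
with_correction = 1 - phi((29.5 - mu) / sigma)

# Exact P(X >= 30) = 1 - P(X <= 29); summing the lower tail keeps numbers small.
exact = 1 - sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(30))

print(f"no correction:   {without_correction:.4f}")
print(f"with correction: {with_correction:.4f}")
print(f"exact binomial:  {exact:.4f}")
```

The corrected value differs slightly from .1292 because z is not rounded to 1.13 before evaluating F.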
A safety engineer feels that 30% of all industrial accidents in her plant are caused by
failure of employees to follow instructions. Find approximately the prob that among 84
industrial accidents anywhere from 20 to 30 (inclusive) will be due to failure of
employees to follow instructions.
Solution
Let X = no. of accidents (among 84) due to failure of employees to follow instructions.
Thus X is a rv having Binomial distribution with parameters n = 84 and p = 0.3.
≈ P (− 1.36 ≤ Z ≤ 1.26 )
A r.v X is said to have uniform distribution over the interval (α , β ) if its density is given
by
$$f(x) = \begin{cases} \dfrac{1}{\beta - \alpha} & \alpha < x < \beta \\ 0 & \text{elsewhere} \end{cases}$$
Thus the graph of the density is a constant over the interval (α , β ).
If α < c < d < β ,
$$P(c < X < d) = \int_c^d \frac{1}{\beta - \alpha}\,dx = \frac{d - c}{\beta - \alpha}$$
The mean of X = E(X) = µ = $\frac{\alpha + \beta}{2}$ (mid point of the interval (α , β ))
The variance of X = σ² = $\frac{(\beta - \alpha)^2}{12}$. The cumulative distribution function is
$$F(x) = \begin{cases} 0 & x \le \alpha \\ \dfrac{x - \alpha}{\beta - \alpha} & \alpha < x \le \beta \\ 1 & x > \beta \end{cases}$$
Solution
(a) $P(0.010 < X < 0.015) = \frac{0.015 - 0.010}{0.025 - (-0.025)} = \frac{0.005}{0.050} = 0.1$
(b) $P(-0.012 < X < 0.012) = \frac{0.012 - (-0.012)}{0.025 - (-0.025)} = \frac{12}{25} = 0.48$
Example 7 (See exercise 5.47 on page 165)
From experience, Mr. Harris has found that the low bid on a construction job can be
regarded as a rv X having uniform density
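$$f(x) = \begin{cases} \dfrac{3}{4C} & \dfrac{2C}{3} < x < 2C \\ 0 & \text{elsewhere} \end{cases}$$
where C is his own estimate of the cost of the job. What percentage should Mr. Harris
add to his cost estimate when submitting bids to maximize his expected profit?
Solution
Suppose Mr. Harris adds k% of C when submitting his bid. Thus Mr. Harris gets a profit
$\frac{kC}{100}$ if he gets the contract, which happens if the lowest bid (by others) ≥ $C + \frac{kC}{100}$, and
gets no profit if the lowest bid < $C + \frac{kC}{100}$. Thus the prob that he gets the bid
$$= P\left(C + \frac{kC}{100} < X < 2C\right) = \frac{3}{4C} \times \left[2C - \left(C + \frac{kC}{100}\right)\right] = \frac{3}{4}\left(1 - \frac{k}{100}\right)$$
Thus the expected profit is
$$\frac{kC}{100} \times \frac{3}{4}\left(1 - \frac{k}{100}\right) + 0 \times (\ldots) = \frac{3C}{400}\left(k - \frac{k^2}{100}\right)$$
which is a maximum when $1 - \frac{2k}{100} = 0$, i.e. when k = 50.
Thus Mr. Harris’s expected profit is a maximum when he adds 50% of C to C, when
submitting bids.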
Gamma Function
This is one of the most useful functions in Mathematics. If x > 0, it is shown that the
improper integral $\int_0^{\infty} e^{-t}\, t^{x-1}\,dt$ converges to a finite real number which we denote by Γ(x)
(Capital gamma of x). Thus for all real no x > 0, we define
$$\Gamma(x) = \int_0^{\infty} e^{-t}\, t^{x-1}\,dt.$$
Properties of Gamma Function
1. Γ( x + 1) = xΓ( x ) , x > 0
2. Γ(1) = 1
4. $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}.$
5. Γ( x ) decreases in the interval (0,1) and increases in the interval (2, ∞ ) and has a
minimum somewhere between 1 and 2.
Let α , β be 2 +ve real numbers. A r.v X is said to have a Gamma Distribution with
parameters α , β if its density is
$$f(x) = \begin{cases} \dfrac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\; x^{\alpha - 1}\, e^{-x/\beta} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
Mean of X = E(X) = µ = αβ
Variance of X = σ² = αβ².
Exponential Distribution
$$f(x) = \begin{cases} \dfrac{1}{\beta}\, e^{-x/\beta} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
We also see easily that:
1. Mean of X = E(X) = β
2. Variance of X = σ² = β²
3. The cumulative distribution function of X is
$$F(x) = \begin{cases} 1 - e^{-x/\beta} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
$P(X > s) = 1 - F(s) = e^{-s/\beta}$ (by (3))
$$P(X > s + t \mid X > s) = \frac{P\left((X > s + t) \cap (X > s)\right)}{P(X > s)} = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-(s+t)/\beta}}{e^{-s/\beta}} = e^{-t/\beta} = P(X > t). \ \text{QED}$$
In a certain city, the daily consumption of electric power (in millions of kw hours) can be
treated as a r.v. X having a Gamma distribution with α = 3, β = 2. If the power plant in
the city has a daily capacity of 12 million kw hrs, what is the prob. that the power supply
will be inadequate on any given day?
Solution
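The power supply will be inadequate if demand exceeds the daily capacity. Hence the required prob
$$= P(X > 12) = \int_{12}^{\infty} f(x)\,dx$$
Now as α = 3, β = 2,
$$f(x) = \frac{1}{2^3\,\Gamma(3)}\; x^{3-1}\, e^{-x/2} = \frac{1}{16}\, x^2\, e^{-x/2}$$
Hence
$$P(X > 12) = \int_{12}^{\infty} \frac{1}{16}\, x^2\, e^{-x/2}\,dx = \frac{1}{16}\left[x^2\left(-2e^{-x/2}\right) - 2x\left(4e^{-x/2}\right) + 2\left(-8e^{-x/2}\right)\right]_{12}^{\infty}$$
$$= \frac{1}{16}\left[2 \times 12^2 \times e^{-6} + 8 \times 12 \times e^{-6} + 16\,e^{-6}\right] = \frac{400}{16}\,e^{-6} = 25\,e^{-6} = 0.062$$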
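The tail probability for the power consumption example can be verified numerically. For a positive integer α, repeated integration by parts gives a finite-sum closed form for P(X > x); the sketch below uses that form, which is a standard result and not taken from the text. The function name is an assumption for illustration.

```python
import math

def gamma_tail_integer_shape(x: float, alpha: int, beta: float) -> float:
    """P(X > x) for a Gamma(alpha, beta) rv when alpha is a positive integer.

    Uses P(X > x) = e^(-x/beta) * sum_{j < alpha} (x/beta)^j / j!.
    """
    t = x / beta
    return math.exp(-t) * sum(t**j / math.factorial(j) for j in range(alpha))

p_inadequate = gamma_tail_integer_shape(12, alpha=3, beta=2)
print(f"P(X > 12) = {p_inadequate:.4f}")   # equals 25 e^-6, about 0.062
```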
The amount of time that a surveillance camera will run without having to be reset is a r.v.
X having exponential distribution with β = 50 days. Find the prob that such a camera
Solution
The density of X is
$$f(x) = \frac{1}{50}\, e^{-x/50}, \quad x > 0 \ (\text{and } 0 \text{ elsewhere})$$
$$P(X < 20) = \int_0^{20} \frac{1}{50}\, e^{-x/50}\,dx = \left[-e^{-x/50}\right]_0^{20} = 1 - e^{-20/50} = 1 - e^{-2/5} = 0.3297$$
$$P(X > 60) = \int_{60}^{\infty} \frac{1}{50}\, e^{-x/50}\,dx = \left[-e^{-x/50}\right]_{60}^{\infty} = e^{-6/5} = 0.3012$$
Given a Poisson process with the average α arrivals per unit time, find the prob density
of the inter arrival time (i.e the time between two consecutive arrivals).
Solution
Let T be the time between two consecutive arrivals. Thus clearly T is a continuous r.v.
with values > 0. Now T > t ⇔ No arrival in time period t.
Thus $P(T > t) = P(X_t = 0) = e^{-\alpha t}$
Hence $F(t) = P(T \le t) = 1 - P(T > t) = 1 - e^{-\alpha t}$, t > 0
Hence the density of T is
$$f(t) = \frac{d}{dt} F(t) = \begin{cases} \alpha\, e^{-\alpha t} & \text{if } t > 0 \\ 0 & \text{elsewhere} \end{cases}$$
Hence we would say the IAT is a continuous rv. with exponential density with parameter
$\frac{1}{\alpha}$.
It is well-known that $B(x, y) = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x + y)}$, x, y > 0.
BETA DISTRIBUTION
A r.v. X is said to have a Beta distribution with parameters α , β > 0 if its density is
$$f(x) = \begin{cases} \dfrac{1}{B(\alpha, \beta)}\; x^{\alpha - 1}\,(1 - x)^{\beta - 1} & 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
(1) $E(X) = \mu = \frac{\alpha}{\alpha + \beta}$
(2) $V(X) = \sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2\,(\alpha + \beta + 1)}$
Example 11 (See Exercise 5.64)
If the annual proportion of erroneous income tax returns can be looked upon as a rv
having a Beta distribution with α = 2, β = 9, what is the prob that in any given year,
there will be fewer than 10% of erroneous returns?
Solution
Let X = annual proportion of erroneous income tax returns. Thus X has a Beta density
with α = 2, β = 9.
$$\therefore P(X < 0.1) = \int_0^{0.1} f(x)\,dx \quad (\text{Note the proportion can not be} < 0)$$
$$= \frac{1}{B(2, 9)} \int_0^{0.1} x^{2-1}\,(1 - x)^{9-1}\,dx$$
$$B(2, 9) = \frac{\Gamma(2)\,\Gamma(9)}{\Gamma(11)} = \frac{1 \times 8!}{10!} = \frac{1}{9 \times 10} = \frac{1}{90}$$
$$\int_0^{0.1} x\,(1 - x)^8\,dx = \int_0^{0.1} \left[(1 - x)^8 - (1 - x)^9\right]dx = \left[\frac{(1 - x)^9}{-9} - \frac{(1 - x)^{10}}{-10}\right]_0^{0.1}$$
$$= -\frac{(.9)^9}{9} + \frac{(.9)^{10}}{10} + \frac{1}{9} - \frac{1}{10} = \frac{1}{90} - (.9)^9 \times \frac{19}{900} = 0.00293$$
$\therefore P(X < 0.1) = 90 \times 0.00293 \approx 0.264$
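The Beta(2, 9) calculation can be checked step by step. The sketch below evaluates B(2, 9) from the gamma function, the integral of x(1 − x)⁸ from its antiderivative, and then their ratio, which is the probability asked for. The helper names are assumptions for illustration.

```python
import math

def beta_function(a: float, b: float) -> float:
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def antiderivative(x: float) -> float:
    """An antiderivative of x(1-x)^8 = (1-x)^8 - (1-x)^9."""
    return -(1 - x) ** 9 / 9 + (1 - x) ** 10 / 10

B = beta_function(2, 9)                                # Gamma(11) = 10!, so B = 1/90
integral = antiderivative(0.1) - antiderivative(0.0)   # about 0.00293
probability = integral / B                             # P(X < 0.1)

print(f"B(2,9) = {B:.6f}, integral = {integral:.5f}, P(X < 0.1) = {probability:.4f}")
```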
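$$f(x) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\beta}\; x^{-1}\, e^{-(\ln x - \alpha)^2 / 2\beta^2} & x > 0,\ \beta > 0 \\ 0 & \text{elsewhere} \end{cases}$$
It can be shown that if X has log-normal distribution, Y = ln X has a normal distribution
with mean µ = α and s.d. σ = β .
$$P(a < X < b) = P\left(\frac{\ln a - \alpha}{\beta} < Z < \frac{\ln b - \alpha}{\beta}\right) = F\left(\frac{\ln b - \alpha}{\beta}\right) - F\left(\frac{\ln a - \alpha}{\beta}\right)$$
and its variance = $e^{2\alpha + \beta^2}\left(e^{\beta^2} - 1\right)$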
More problems on Normal Distribution
Example 12
$P(X \le c) = 2\,P(X \ge c)$
Solution
$P(X \le c) = 2\,P(X \ge c)$
implies $P(X \le c) = 2\,(1 - P(X < c))$
Let $P(X \le c) = p$
Thus p = 2(1 − p), i.e. 3p = 2 or p = 2/3
Now $P(X \le c) = P\left(\frac{X-\mu}{\sigma} \le \frac{c-\mu}{\sigma}\right) = F\left(\frac{c-\mu}{\sigma}\right) = \frac{2}{3} = .6667$
implies $\frac{c-\mu}{\sigma} = 0.43$ (approx from Table 3)
∴ c = µ + 0.43σ
Example 13
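Suppose X is normal with mean 0 and sd 5. Find $P\left(1 < X^2 < 4\right)$
Solution
$$P\left(1 < X^2 < 4\right) = P(1 < |X| < 2) = P\left(\frac{1}{5} < |Z| < \frac{2}{5}\right) = P\left(|Z| < \frac{2}{5}\right) - P\left(|Z| < \frac{1}{5}\right)$$
$$= \left[2F\left(\frac{2}{5}\right) - 1\right] - \left[2F\left(\frac{1}{5}\right) - 1\right] = 2\left[F\left(\frac{2}{5}\right) - F\left(\frac{1}{5}\right)\right]$$
= 2 × (.0761) = 0.1522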
Example 14
The annual rain fall in a certain locality is a r.v. X having normal distribution with mean
29.5” and sd 2.5”. How many inches of rain (annually) is exceeded about 5% of the time?
Solution
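We want C such that P(X > C) = 0.05
i.e. $P\left(\frac{X-\mu}{\sigma} > \frac{C - 29.5}{2.5}\right) = 0.05$
Hence $\frac{C - 29.5}{2.5} = z_{0.05} = 1.645$, so C = 29.5 + 1.645 × 2.5 ≈ 33.6 inches.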
Example 15
A rocket fuel is to contain a certain percent (say X) of a particular compound. The
specification calls for X to lie between 30 and 35. The manufacturer will make a net
profit on the fuel per gallon which is the following function of X.
= 0.10 × .5899 + 0.05 × 0.3963 + (− 0.10) × 0.0138
= $0.077425
Suppose X, Y are 2 discrete rvs and suppose X can take values $x_1, x_2, \ldots$ and Y can take
values $y_1, y_2, \ldots$ We refer to the function f(x, y) = P(X = x, Y = y) as the joint prob
distribution of X and Y. The ordered pair (X, Y) is sometimes referred to as a two-
dimensional discrete r.v.
Example 16
Two cards are drawn at random from a pack of 52 cards. Let X be the number of aces
drawn and Y be the number of Queens drawn.
Solution
Clearly X can take any one of the three values 0,1,2 and Y one of the three values, 0,1,2.
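$$\begin{array}{c|ccc}
y \backslash x & 0 & 1 & 2 \\ \hline
0 & \dfrac{\binom{44}{2}}{\binom{52}{2}} & \dfrac{\binom{4}{1}\binom{44}{1}}{\binom{52}{2}} & \dfrac{\binom{4}{2}}{\binom{52}{2}} \\[2ex]
1 & \dfrac{\binom{4}{1}\binom{44}{1}}{\binom{52}{2}} & \dfrac{\binom{4}{1}\binom{4}{1}}{\binom{52}{2}} & 0 \\[2ex]
2 & \dfrac{\binom{4}{2}}{\binom{52}{2}} & 0 & 0
\end{array}$$
Justification
P(X = 0, Y = 0) = P(both cards are neither aces nor queens)
$$= \frac{\binom{44}{2}}{\binom{52}{2}}$$
P(X = 1, Y = 0)
= P(one ace and one other card which is neither ace nor a queen)
$$= \frac{\binom{4}{1}\binom{44}{1}}{\binom{52}{2}} \ \text{etc.}$$
Can we write down the distribution of X? X can take any one of the 3 values 0,1,2
What is P(X = 0)?
X = 0 means no ace is drawn but we might draw 2 queens, or 1 queen and one non queen
or 2 cards which are neither aces nor queens.
Thus
P(X = 0) = P(X = 0, Y = 0) + P(X = 0, Y = 1) + P(X = 0, Y = 2)
= Sum of the 3 prob in col. 1
$$= \frac{\binom{44}{2}}{\binom{52}{2}} + \frac{\binom{4}{1}\binom{44}{1}}{\binom{52}{2}} + \frac{\binom{4}{2}}{\binom{52}{2}} = \frac{\binom{48}{2}}{\binom{52}{2}} \ (\text{Verify!})$$
Similarly P(X = 1) = P(X = 1, Y = 0) + P(X = 1, Y = 1) + P(X = 1, Y = 2)
= Sum of the 3 probabilities in 2nd col.
$$= \frac{\binom{4}{1}\binom{44}{1}}{\binom{52}{2}} + \frac{\binom{4}{1}\binom{4}{1}}{\binom{52}{2}} + 0 = \frac{\binom{4}{1}\binom{48}{1}}{\binom{52}{2}} \ (\text{Verify!})$$
P(X = 2) = P(X = 2, Y = 0) + P(X = 2, Y = 1) + P(X = 2, Y = 2)
The distribution of X derived from the joint distribution of X and Y is referred to as the
marginal distribution of X.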
Example 17
                              x = -1   x = 0   x = 1   Marginal Distribution of Y
y = -1                        1/8      1/8     1/8     3/8
y = 0                         1/8      0       1/8     2/8
y = 1                         1/8      1/8     1/8     3/8
Marginal Distribution of X    3/8      2/8     3/8
Write the marginal distribution of X and Y. To get the marginal distribution of X, we find
the column totals and write them in the (bottom) margin. Thus the (marginal) distribution
of X is
X      -1    0     1
Prob   3/8   2/8   3/8
(Do you see why we call it the marginal distribution?)
Similarly to get the marginal distribution of Y, we find the 3 row totals and write them in
the (right) margin.
Y      Prob
-1     3/8
0      2/8
1      3/8
Thus $g(x) = P(X = x) = \sum_{\text{all } y} P(X = x, Y = y) = \sum_{\text{all } y} f(x, y)$
And $h(y) = P(Y = y) = \sum_{\text{all } x} P(X = x, Y = y) = \sum_{\text{all } x} f(x, y)$
Conditional Distribution
$$h(y \mid x) = P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)} = \frac{f(x, y)}{g(x)}$$
$$h(0 \mid 1) = P(Y = 0 \mid X = 1) = \frac{P(X = 1, Y = 0)}{P(X = 1)} = \frac{1/8}{3/8} = \frac{1}{3}$$
$$g(x \mid y) = P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{f(x, y)}{h(y)}$$
$$g(0 \mid 0) = P(X = 0 \mid Y = 0) = \frac{P(X = 0, Y = 0)}{P(Y = 0)} = \frac{0}{2/8} = 0$$
Independence
which is the same as saying of g(x|y) =g(x) for all x and y which is the same as saying
h( y | x ) = h( y ) for all x,y.
Example 18
         X = 2   X = 0   X = 1
Y = 2    0.1     0.2     0.1
Y = 0    0.05    0.1     0.15
Y = 1    0.1     0.1     0.1
Ans
X      2      0     1
Prob   0.25   0.4   0.35
(b) Find the marginal distribution of Y
Ans
Y      Prob
2      0.4
0      0.3
1      0.3
(c) Find P(X + Y = 2)
Ans X + Y = 2 if (X = 2, Y = 0) or (X = 1, Y = 1) or (X = 0, Y = 2)
(d) Find P(X − Y = 0)
Ans : X − Y = 0 if (X = 2, Y = 2) or (X = 0, Y = 0) or (X = 1, Y = 1)
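The marginal distributions and the event probabilities in this example can be read off mechanically from the joint table. The sketch below stores the joint probabilities in a dictionary keyed by (x, y), with columns X = 2, 0, 1 and rows Y = 2, 0, 1 as in the table, and uses exact fractions; the helper names are assumptions for illustration.

```python
from fractions import Fraction as Fr

# Joint distribution f(x, y); keys are (x, y).
joint = {
    (2, 2): Fr(10, 100), (0, 2): Fr(20, 100), (1, 2): Fr(10, 100),
    (2, 0): Fr(5, 100),  (0, 0): Fr(10, 100), (1, 0): Fr(15, 100),
    (2, 1): Fr(10, 100), (0, 1): Fr(10, 100), (1, 1): Fr(10, 100),
}

def marginal(joint, axis):
    """Sum the joint probabilities over the other variable (axis 0 = X, 1 = Y)."""
    totals = {}
    for key, prob in joint.items():
        totals[key[axis]] = totals.get(key[axis], Fr(0)) + prob
    return totals

def prob_of(joint, event):
    """P(event), where event is a predicate on (x, y)."""
    return sum(prob for (x, y), prob in joint.items() if event(x, y))

g = marginal(joint, 0)                                 # marginal distribution of X
h = marginal(joint, 1)                                 # marginal distribution of Y
p_sum_is_2 = prob_of(joint, lambda x, y: x + y == 2)   # P(X + Y = 2)
p_equal = prob_of(joint, lambda x, y: x - y == 0)      # P(X - Y = 0)
print(g, h, p_sum_is_2, p_equal)
```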
Let (X,Y) be a continuous 2-dimensional r.v. This means (X,Y) can take all values in a
certain region of the X,Y plane. For example, suppose a dart is thrown at a circular board
of radius 2. Then the position where the dart hits the board (X,Y) is a continuous two
dimensional r.v as it can take all values (x,y) such that x 2 + y 2 ≤ 4.
(ii) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dy\,dx = 1$
(iii) $P(a \le X \le b,\ c \le Y \le d) = \int_a^b \int_c^d f(x, y)\,dy\,dx.$
Example 19(a)
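$$f(x, y) = \begin{cases} \dfrac{1}{4} & 0 \le x \le 2,\ 0 \le y \le 2 \\ 0 & \text{elsewhere} \end{cases}$$
Find P(X + Y ≤ 1)
$$\therefore P(X + Y \le 1) = \int_{x=0}^{1} \int_{y=0}^{1-x} \frac{1}{4}\,dy\,dx = \int_0^1 \frac{1}{4}(1 - x)\,dx = \left[-\frac{1}{8}(1 - x)^2\right]_0^1 = \frac{1}{8}.$$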
Example 19(b)
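$$f(x, y) = \frac{1}{8}(6 - x - y), \quad 0 < x < 2,\ 2 < y < 4$$
Solution
$$P(X < 1, Y < 3) = \int_{x=0}^{1} \int_{y=2}^{3} f(x, y)\,dy\,dx = \int_{x=0}^{1} \int_{y=2}^{3} \frac{1}{8}(6 - x - y)\,dy\,dx$$
$$= \int_{x=0}^{1} \frac{1}{8}\left[(6 - x)\,y - \frac{y^2}{2}\right]_{y=2}^{3} dx = \frac{1}{8}\int_{x=0}^{1} \left[(6 - x) - \frac{5}{2}\right] dx$$
$$= \frac{1}{8}\left[-\frac{(6 - x)^2}{2} - \frac{5}{2}\,x\right]_0^1 = \frac{1}{8}\left[-\frac{25}{2} - \frac{5}{2} + 18\right] = \frac{3}{8}$$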
$$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$$
$$h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$
$$h(y \mid x) = \frac{f(x, y)}{g(x)} \quad (\text{defined only for those } x \text{ for which } g(x) \ne 0)$$
$$g(x \mid y) = \frac{f(x, y)}{h(y)} \quad (\text{defined only for those } y \text{ for which } h(y) \ne 0)$$
Independence
Example 20
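$$g(x) = \int_{y=2}^{4} \frac{1}{8}(6 - x - y)\,dy = \frac{1}{8}\left[(6 - x)\,y - \frac{y^2}{2}\right]_2^4 = \frac{1}{8}\left[2(6 - x) - 6\right], \quad 0 < x < 2$$
and = 0 elsewhere
$g(x) = \frac{1}{8}(6 - 2x) \ge 0$ for 0 < x < 2
Secondly
$$\int_0^2 g(x)\,dx = \frac{1}{8}\int_0^2 (6 - 2x)\,dx = \frac{1}{8}\left[6x - x^2\right]_0^2 = \frac{1}{8}[12 - 4] = 1$$
$$h(y) = \int_{x=0}^{2} \frac{1}{8}(6 - x - y)\,dx = \frac{1}{8}\left[(6 - y)\,x - \frac{x^2}{2}\right]_{x=0}^{2} = \frac{1}{8}\left[2(6 - y) - 2\right]$$
$$= \begin{cases} \dfrac{1}{8}(10 - 2y) & 2 < y < 4 \\ 0 & \text{elsewhere} \end{cases}$$
Again h(y) ≥ 0 and
$$\int_2^4 h(y)\,dy = \frac{1}{8}\int_2^4 (10 - 2y)\,dy = \frac{1}{8}\left[10y - y^2\right]_2^4 = \frac{1}{8}[20 - 12] = 1$$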
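The conditional density of Y for X = 1 is
$$h(y \mid 1) = \frac{f(1, y)}{g(1)} = \frac{\frac{1}{8}(6 - 1 - y)}{\frac{1}{8}(6 - 2)} = \frac{1}{4}(5 - y), \quad 2 < y < 4$$
and 0 elsewhere.
Again this is a valid density as $h(y \mid 1) \ge 0$ and
$$\int_{2}^{4} h(y \mid 1)\,dy = \frac{1}{4}\int_{2}^{4} (5 - y)\,dy = \frac{1}{4}\left[-\frac{(5 - y)^2}{2}\right]_{2}^{4} = \frac{1}{4}\left[-\frac{1}{2} + \frac{9}{2}\right] = 1$$
$$P(X < 1 \mid Y < 3) = \frac{P(X < 1, Y < 3)}{P(Y < 3)}$$
Now Numerator = $\frac{3}{8}$ (from Example 19(b)).
$$\text{Denominator} = P(Y < 3) = \int_{2}^{3} h(y)\,dy = \frac{1}{8}\int_{2}^{3} (10 - 2y)\,dy = \frac{1}{4}\int_{2}^{3} (5 - y)\,dy = \frac{1}{4}\left[-\frac{(5 - y)^2}{2}\right]_{2}^{3} = \frac{1}{4}\left[\frac{9}{2} - \frac{4}{2}\right] = \frac{5}{8}$$
$$\text{Hence } P(X < 1 \mid Y < 3) = \frac{3/8}{5/8} = \frac{3}{5}$$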
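The integrals of Examples 19(b) and 20 can be checked numerically. The sketch below assumes the density $\frac{1}{8}(6 - x - y)$ on 0 < x < 2, 2 < y < 4 and uses a hand-written midpoint rule (the helper `double_integral` is illustrative, not a library routine). Because the integrand is linear in x and y, the midpoint rule reproduces the exact values up to rounding.

```python
def f(x, y):
    """Joint density (6 - x - y)/8 on 0 < x < 2, 2 < y < 4, and 0 elsewhere."""
    return (6 - x - y) / 8 if 0 < x < 2 and 2 < y < 4 else 0.0

def double_integral(fn, x0, x1, y0, y1, n=200):
    """Midpoint-rule approximation of the integral of fn over a rectangle."""
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            y = y0 + (j + 0.5) * hy
            total += fn(x, y)
    return total * hx * hy

total_mass = double_integral(f, 0, 2, 2, 4)      # should be 1
numerator = double_integral(f, 0, 1, 2, 3)       # P(X < 1, Y < 3)
denominator = double_integral(f, 0, 2, 2, 3)     # P(Y < 3)
conditional = numerator / denominator            # P(X < 1 | Y < 3)
```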
Let f(x, y) be the joint density of (X,Y). We define the cumulative distribution function as
$$F(x, y) = P(X \le x, Y \le y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\,dv\,du.$$
Example 21 (See Exercise 5.77 on page 180)
$$f(x, y) = \begin{cases} \frac{6}{5}(x + y^2) & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Solution
Case (i) x ≤ 0
$$F(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\,dv\,du = 0$$
as f(u, v) = 0 for any u < 0.
Case (ii) y ≤ 0
Again F(x, y) = 0 whatever be x.
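Case (iii) 0 < x < 1, 0 < y < 1
$$F(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\,dv\,du = \int_{u=0}^{x}\int_{v=0}^{y} \frac{6}{5}(u + v^2)\,dv\,du \quad (\text{as } f(u, v) = 0 \text{ for } u < 0 \text{ or } v < 0)$$
$$= \frac{6}{5}\int_{u=0}^{x} \left[uv + \frac{v^3}{3}\right]_{0}^{y} du = \frac{6}{5}\int_{u=0}^{x} \left(uy + \frac{y^3}{3}\right) du = \frac{6}{5}\left(\frac{x^2 y}{2} + \frac{x y^3}{3}\right).$$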
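Case (iv) 0 < x < 1, y ≥ 1
$$F(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\,dv\,du = \int_{u=0}^{x}\int_{v=0}^{1} \frac{6}{5}(u + v^2)\,dv\,du = \frac{6}{5}\int_{u=0}^{x} \left(u + \frac{1}{3}\right) du = \frac{6}{5}\left(\frac{x^2}{2} + \frac{x}{3}\right)$$
Similarly, for x ≥ 1, 0 < y < 1,
$$F(x, y) = \frac{6}{5}\left(\frac{y}{2} + \frac{y^3}{3}\right)$$
Case (v) x ≥ 1, y ≥ 1
$$F(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\,dv\,du = \int_{u=0}^{1}\int_{v=0}^{1} \frac{6}{5}(u + v^2)\,dv\,du = \frac{6}{5}\int_{u=0}^{1} \left(u + \frac{1}{3}\right) du = \frac{6}{5}\left(\frac{1}{2} + \frac{1}{3}\right) = 1$$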
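Hence
$$P(0.2 < X < 0.5,\ 0.4 < Y < 0.6) = F(0.5, 0.6) - F(0.2, 0.6) - F(0.5, 0.4) + F(0.2, 0.4) \quad \text{(Why?)}$$
$$= \frac{6}{5}\left[\frac{(0.5)^2(0.6)}{2} + \frac{(0.5)(0.6)^3}{3} - \frac{(0.2)^2(0.6)}{2} - \frac{(0.2)(0.6)^3}{3} - \frac{(0.5)^2(0.4)}{2} - \frac{(0.5)(0.4)^3}{3} + \frac{(0.2)^2(0.4)}{2} + \frac{(0.2)(0.4)^3}{3}\right]$$
$$= \frac{6}{5}\left[\left[(0.5)^2 - (0.2)^2\right] \times 0.1 + (0.1)\left[(0.6)^3 - (0.4)^3\right]\right]$$
$$= \frac{6}{5} \times 0.1 \left[(0.5)^2 - (0.2)^2 + (0.6)^3 - (0.4)^3\right]$$
$$= \frac{6}{5} \times 0.1 \times [0.362] = 0.04344$$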
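The rectangle probability above can be reproduced from the joint cdf. This is a small sketch, assuming the density $\frac{6}{5}(x + y^2)$ on the unit square; the function names `F` and `rect_prob` are illustrative.

```python
def F(x, y):
    """Joint cdf for f(x, y) = (6/5)(x + y^2) on 0 < x < 1, 0 < y < 1."""
    # Clamping to [0, 1] covers all cases: 0 to the left of or below the
    # square, and the boundary value once x or y passes 1.
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    return 1.2 * (x * x * y / 2 + x * y ** 3 / 3)

def rect_prob(a, b, c, d):
    """P(a < X < b, c < Y < d) by inclusion-exclusion on the cdf."""
    return F(b, d) - F(a, d) - F(b, c) + F(a, c)

p = rect_prob(0.2, 0.5, 0.4, 0.6)
```

The inclusion-exclusion in `rect_prob` is the answer to the "Why?": F(b, d) counts the whole quadrant below and to the left of (b, d), the two subtracted terms remove the strips outside the rectangle, and F(a, c) adds back the corner that was removed twice.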
Example 22
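$$f(x, y) = \begin{cases} \frac{6}{5}(x + y^2) & 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Solution
$$g(x \mid y) = \frac{f(x, y)}{h(y)} \quad \text{where } h(y) \text{ is the marginal density of } Y.$$
Thus
$$h(y) = \int_{x=0}^{1} f(x, y)\,dx = \int_{x=0}^{1} \frac{6}{5}(x + y^2)\,dx = \frac{6}{5}\left(\frac{1}{2} + y^2\right), \quad 0 < y < 1.$$
Hence
$$g(x \mid y) = \frac{\frac{6}{5}(x + y^2)}{\frac{6}{5}\left(\frac{1}{2} + y^2\right)} = \frac{x + y^2}{\frac{1}{2} + y^2}, \quad 0 < x < 1 \quad \text{(and 0 elsewhere)}$$
$$\therefore g\left(x \mid \tfrac{1}{2}\right) = \frac{x + \frac{1}{4}}{\frac{1}{2} + \frac{1}{4}} = \frac{4}{3}\left(x + \frac{1}{4}\right), \quad 0 < x < 1$$
Hence
$$E\left(X \mid Y = \tfrac{1}{2}\right) = \int_{0}^{1} x\, g\left(x \mid \tfrac{1}{2}\right) dx = \frac{4}{3}\int_{0}^{1} x\left(x + \frac{1}{4}\right) dx = \frac{4}{3}\left[\frac{x^3}{3} + \frac{x^2}{8}\right]_{0}^{1} = \frac{4}{3}\left(\frac{1}{3} + \frac{1}{8}\right) = \frac{11}{18}$$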
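Example 23
Solution
$$f(x, y) = \frac{1}{\text{Area of the rhombus}} = \frac{1}{2} \text{ over the rhombus}$$
and 0 elsewhere.
For 0 < x < 1,
$$g(x) = \int_{y=x-1}^{1-x} \frac{1}{2}\,dy = 1 - x$$
For −1 < x < 0,
$$g(x) = \int_{y=-1-x}^{1+x} \frac{1}{2}\,dy = 1 + x$$
Thus
$$g(x) = \begin{cases} 1 + x & -1 < x < 0 \\ 1 - x & 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
$$h(y) = \begin{cases} 1 + y & -1 < y < 0 \\ 1 - y & 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
(c) For $x = \frac{1}{2}$, y ranges from $-\frac{1}{2}$ to $\frac{1}{2}$.
Thus the conditional density of Y for $X = \frac{1}{2}$ is
$$h\left(y \mid \tfrac{1}{2}\right) = \frac{f\left(\tfrac{1}{2}, y\right)}{g\left(\tfrac{1}{2}\right)} = \begin{cases} 1 & -\frac{1}{2} < y < \frac{1}{2} \\ 0 & \text{elsewhere} \end{cases}$$
For $x = \frac{1}{3}$, y ranges from $-\frac{2}{3}$ to $\frac{2}{3}$.
$$\therefore h\left(y \mid \tfrac{1}{3}\right) = \begin{cases} \dfrac{1/2}{2/3} = \dfrac{3}{4} & -\frac{2}{3} < y < \frac{2}{3} \\ 0 & \text{elsewhere} \end{cases}$$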
PROPERTIES OF EXPECTATIONS
Then
(a) E(aX + b) = a E(X) + b
Please note: whether we add X and Y or subtract Y from X, we must always add their variances.
Th. If X, Y are indep, E(XY) = E(X)E(Y) and Cov(X, Y) = 0
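Sample Mean
Let $X_1, X_2, \ldots, X_n$ be n indep r.v.s each having the same mean µ and same variance $\sigma^2$. We define
$$\bar{X} = \frac{X_1 + X_2 + \ldots + X_n}{n}$$
$\bar{X}$ is called the mean of the r.v.s $X_1, \ldots, X_n$. Please note that $\bar{X}$ is also a r.v.
Theorem
1. $E(\bar{X}) = \mu$
2. $\mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n}$.
Proof
(1) $$E(\bar{X}) = \frac{1}{n}\left[E(X_1) + E(X_2) + \ldots + E(X_n)\right] = \frac{1}{n}\underbrace{(\mu + \mu + \ldots + \mu)}_{n \text{ times}} = \mu$$
(2) $$\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \ldots + \mathrm{Var}(X_n)\right] = \frac{1}{n^2}\underbrace{(\sigma^2 + \sigma^2 + \ldots + \sigma^2)}_{n \text{ times}} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$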
Sample variance
Let $X_1, \ldots, X_n$ be n indep r.v.s each having the same mean µ and same variance $\sigma^2$. Let
$$\bar{X} = \frac{X_1 + X_2 + \ldots + X_n}{n}$$
be their sample mean. We define the sample variance as
$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2$$
$$E(S^2) = \sigma^2$$
Simulation
To simulate the values taken by a continuous r.v. X, we use the following theorem.
Theorem
Let X be a continuous r.v. with density f(x) and cumulative distribution function F(x). Let U = F(X). Then U is a r.v. having uniform distribution on (0,1).
In other words, U is a random number. Thus to simulate the value taken by X, we take a random number U from Table 7 (putting a decimal point before the number) and solve for X the equation
F(X) = U
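Example 24
Let X have uniform density on (α, β). Simulate the values of X using the 3-digit random numbers.
Solution
$$f(x) = \begin{cases} \dfrac{1}{\beta - \alpha} & \alpha < x < \beta \\ 0 & \text{elsewhere} \end{cases}$$
$$F(x) = \begin{cases} 0 & x \le \alpha \\ \dfrac{x - \alpha}{\beta - \alpha} & \alpha < x \le \beta \\ 1 & x > \beta \end{cases}$$
F(X) = U means $\dfrac{X - \alpha}{\beta - \alpha} = U$
$$\therefore X = \alpha + (\beta - \alpha)U$$
For example, U = .133 gives X = α + (β − α)(.133), etc.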
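$$f(x) = \begin{cases} \dfrac{1}{\beta} e^{-x/\beta} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
Hence the cumulative distribution function is
$$F(x) = \begin{cases} 0 & x \le 0 \\ 1 - e^{-x/\beta} & x > 0 \end{cases}$$
Solving F(X) = U for X gives
$$X = \beta \ln \frac{1}{1 - U}$$
Since U being a random number implies that 1 − U is also a random number, we can as well use the formula
$$X = \beta \ln \frac{1}{U} = -\beta \ln U.$$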
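The formula X = −β ln U translates directly into code. The following is a minimal sketch that replaces the printed random-number table with Python's pseudo-random generator (an assumption made for illustration); the seed and sample size are arbitrary.

```python
import math
import random

def exp_from_uniform(u, beta):
    """Map a random number u in (0, 1] to an exponential deviate with mean beta."""
    return -beta * math.log(u)

rng = random.Random(0)          # fixed seed so the run is repeatable
beta = 2.0
# rng.random() lies in [0, 1); using 1 - rng.random() avoids taking log(0).
sample = [exp_from_uniform(1.0 - rng.random(), beta) for _ in range(100_000)]
sample_mean = sum(sample) / len(sample)   # should be close to beta
```

With a 3-digit table entry such as 913 one would call `exp_from_uniform(0.913, beta)` instead of drawing from `rng`.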
Example 25
Solution
$$X = -2 \ln U$$
Taking 3-digit random numbers from Table 7, page 595, row 21, col. 3, we get the random numbers: 913, 516, 692, 007 etc.
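Example 26
$$f(x) = \begin{cases} |x| & -1 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Solution
Case (ii) −1 < x ≤ 0. In this case
$$F(x) = \int_{-1}^{x} -t\,dt = \frac{1 - x^2}{2}$$
Case (iii) 0 < x ≤ 1. In this case
$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{-1} 0\,dt + \int_{-1}^{0} -t\,dt + \int_{0}^{x} t\,dt = 0 + \frac{1}{2} + \frac{x^2}{2} = \frac{1 + x^2}{2}$$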
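Case (iv) x > 1. In this case F(x) = 1
Thus
$$F(x) = \begin{cases} 0 & x \le -1 \\ \dfrac{1 - x^2}{2} & -1 < x \le 0 \\ \dfrac{1 + x^2}{2} & 0 < x \le 1 \\ 1 & x > 1 \end{cases}$$
Case (i) $0 \le U < \frac{1}{2}$
$$F(X) = \frac{1 - X^2}{2} = U \quad \text{(why?)}$$
$$\therefore X = -\sqrt{1 - 2U} \quad \text{(why?)}$$
Case (ii) $\frac{1}{2} \le U < 1$
$$F(X) = \frac{1 + X^2}{2} = U$$
$$\therefore X = +\sqrt{2U - 1}$$
Thus the defining conditions are:
If $0 \le U < \frac{1}{2}$, $X = -\sqrt{1 - 2U}$
and
if $\frac{1}{2} \le U < 1$, $X = +\sqrt{2U - 1}$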
Let us consider the 3-digit random numbers on page 594, Row 17, Col. 5
$U = .726 \ge \frac{1}{2}$. Thus $X = +\sqrt{2 \times .726 - 1} = 0.672$
$U = .281 < \frac{1}{2}$. Thus $X = -\sqrt{1 - 2 \times .281} = -0.662$
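The two-branch rule of Example 26 can be written as a single function. This is a small sketch; the name `sample_abs_density` is illustrative.

```python
import math

def sample_abs_density(u):
    """Inverse-cdf map for the density f(x) = |x| on (-1, 1), for 0 <= u < 1."""
    if u < 0.5:
        # Left branch: F(x) = (1 - x^2)/2 with x <= 0, so take the negative root.
        return -math.sqrt(1 - 2 * u)
    # Right branch: F(x) = (1 + x^2)/2 with x >= 0, so take the positive root.
    return math.sqrt(2 * u - 1)

x1 = sample_abs_density(0.726)   # approximately  0.672
x2 = sample_abs_density(0.281)   # approximately -0.662
```

The sign of the root answers the "why?": for U < 1/2 the value of X must lie in (−1, 0], where F is below 1/2.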
Note: Most computers have built-in programs which generate random deviates from important distributions. In particular, we can obtain random deviates from a standard normal distribution. You may also want to study how to simulate values from a standard normal distribution by the Box-Muller-Marsaglia method given on page 190 of the text book.
Example 27
Suppose the number of hours it takes a person to learn how to operate a certain machine is a random variable having normal distribution with µ = 5.8 and σ = 1.2. Suppose it takes two persons to operate the machine. Simulate the time it takes four pairs of persons to learn how to operate the machine. That is, for each pair, calculate the maximum of the two learning times.
Solution
$$x_1 = \mu + \sigma z_1$$
$$x_2 = \mu + \sigma z_2$$
Note
$$z_1 = \sqrt{-2 \ln(u_2)}\,\cos(2\pi u_1)$$
$$z_2 = \sqrt{-2 \ln(u_2)}\,\sin(2\pi u_1)$$
  U1     U2     Z1       Z2       X1      X2
  .729   .016   -0.378   -2.851   5.346   2.379
etc.
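The transformation in the Note can be sketched as follows. The argument order (u1 inside the angle, u2 inside the logarithm) follows the formulas as written in these notes; the function name `box_muller` is illustrative.

```python
import math

def box_muller(u1, u2):
    """Turn two random numbers in (0, 1) into two standard normal deviates."""
    r = math.sqrt(-2.0 * math.log(u2))     # common radius term
    angle = 2.0 * math.pi * u1
    return r * math.cos(angle), r * math.sin(angle)

mu, sigma = 5.8, 1.2
z1, z2 = box_muller(0.729, 0.016)
x1, x2 = mu + sigma * z1, mu + sigma * z2   # two simulated learning times
pair_time = max(x1, x2)                     # the pair is ready when the slower learner is
```

Repeating this with three more pairs of random numbers gives the four simulated pair times asked for in Example 27.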
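Review Exercises
$$f(x) = \begin{cases} k(1 - x^2) & 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Solution
$$\int_{-\infty}^{\infty} f(x)\,dx = 1 \text{ gives } \int_{0}^{1} k(1 - x^2)\,dx = 1$$
$$\text{or } k\left(1 - \frac{1}{3}\right) = 1$$
$$\therefore k = \frac{3}{2}$$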
The cumulative distribution function F(x) of X is:
Case (i) x ≤ 0 ∴ F(x) = 0
Case (ii) 0 < x ≤ 1,
$$F(x) = \int_{0}^{x} k(1 - t^2)\,dt = \frac{3}{2}\left(x - \frac{x^3}{3}\right).$$
$$P(X > 0.5) = 1 - F(0.5) = 1 - \frac{3}{2}\left[(0.5) - \frac{(0.5)^3}{3}\right] = 0.3125$$
5.113: The burning time X of an experimental rocket is a r.v. having the normal distribution with µ = 4.76 sec and σ = 0.04 sec. What is the prob that this kind of rocket will burn
(a) < 4.66 sec
(b) > 4.80 sec
(c) anywhere from 4.70 to 4.82 sec?
Solution
(a) $$P(X < 4.66) = P\left(\frac{X - \mu}{\sigma} < \frac{4.66 - 4.76}{0.04}\right) = P(Z < -2.5) = 1 - P(Z < 2.5) = 1 - 0.9938 = 0.0062$$
(b) $$P(X > 4.80) = P\left(\frac{X - \mu}{\sigma} > \frac{4.80 - 4.76}{0.04}\right) = P(Z > 1) = 1 - 0.8413 = 0.1587$$
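All three parts of Exercise 5.113 reduce to evaluating the standard normal cdf after standardising. The sketch below computes the cdf from the error function instead of reading a table; the helper name `phi` is an illustrative choice.

```python
import math

def phi(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 4.76, 0.04
p_a = phi((4.66 - mu) / sigma)                             # P(X < 4.66)
p_b = 1.0 - phi((4.80 - mu) / sigma)                       # P(X > 4.80)
p_c = phi((4.82 - mu) / sigma) - phi((4.70 - mu) / sigma)  # P(4.70 < X < 4.82)
```

Part (c) standardises to P(−1.5 < Z < 1.5), which the same function handles as a difference of two cdf values.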
5.11 The prob density of the time (in milliseconds) between the emission of beta particles
is a r.v. X having the exponential density
(a) The time to observe a particle is more than 200 microseconds (= 200 × 10^-3 milliseconds)
(b) The time to observe a particle is < 10 microseconds
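Solution
(a) $$P(> 200 \text{ microsec}) = P\left(X > 200 \times 10^{-3} \text{ millisec}\right) = \int_{200 \times 10^{-3}}^{\infty} 0.25\,e^{-0.25x}\,dx = \left[-e^{-0.25x}\right]_{200 \times 10^{-3}}^{\infty} = e^{-50 \times 10^{-3}}$$
(b) $$P(X < 10 \text{ microseconds}) = P\left(X < 10 \times 10^{-3}\right) = \int_{0}^{10 \times 10^{-3}} 0.25\,e^{-0.25x}\,dx = \left[-e^{-0.25x}\right]_{0}^{10 \times 10^{-3}} = 1 - e^{-2.5 \times 10^{-3}}$$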
5.120: If n salespeople are employed in a door-to-door selling campaign, the gross sales volume in thousands of dollars may be regarded as a r.v. having the Gamma distribution with $\alpha = 100\sqrt{n}$ and $\beta = \frac{1}{2}$. If the sales costs are $5,000 per salesperson, how many salespersons should be employed to maximize the profit?
Solution
5.122: Let the times to breakdown for the processors of a parallel processing machine
have joint density
where X is the time for the first processor and Y is the time for the 2nd processor. Find
Solution
The marginal density of X is
$$g(x) = \int_{y=-\infty}^{\infty} f(x, y)\,dy = \int_{y=0}^{\infty} 0.04\,e^{-0.2x - 0.2y}\,dy = 0.2\,e^{-0.2x}\int_{y=0}^{\infty} 0.2\,e^{-0.2y}\,dy = 0.2\,e^{-0.2x}, \quad x > 0$$
(and = 0 if x ≤ 0)
Since X (& Y) have exponential distributions (with parameter $\frac{1}{0.2} = 5$), E(X) = E(Y) = 5.
$$E(X + Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x + y) f(x, y)\,dy\,dx = \int_{x=0}^{\infty}\int_{y=0}^{\infty} (x + y)(0.04)\,e^{-0.2x - 0.2y}\,dy\,dx$$
$$= \int_{x=0}^{\infty}\int_{y=0}^{\infty} x \cdot 0.04\,e^{-0.2x - 0.2y}\,dy\,dx + \int_{x=0}^{\infty}\int_{y=0}^{\infty} y \cdot 0.04\,e^{-0.2x - 0.2y}\,dy\,dx$$
= 5 + 5 = 10 (verify!)
= E(X) + E(Y)
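5.123: Two random variables are independent and each has binomial distribution with success prob 0.7 and 2 trials.
Solution
Let X, Y be independent and have Binomial distribution with parameters n = 2 and p = 0.7. Thus
$$P(X = k) = \binom{2}{k}(0.7)^k (0.3)^{2-k}, \quad k = 0, 1, 2$$
$$P(Y = r) = \binom{2}{r}(0.7)^r (0.3)^{2-r}, \quad r = 0, 1, 2$$
By independence,
$$P(X = k, Y = r) = \binom{2}{k}\binom{2}{r}(0.7)^{k+r}(0.3)^{4-(k+r)}, \quad 0 \le k, r \le 2$$
(b) $$P(Y > X) = P(Y = 2)\left[P(X = 0) + P(X = 1)\right] + P(Y = 1)P(X = 0)$$
$$= \binom{2}{2}(0.7)^2(0.3)^0\left[\binom{2}{0}(0.7)^0(0.3)^2 + \binom{2}{1}(0.7)^1(0.3)^1\right] + \binom{2}{1}(0.7)^1(0.3)^1\binom{2}{0}(0.7)^0(0.3)^2 = 0.2877$$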
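The joint distribution of Exercise 5.123 is small enough to enumerate. The sketch below builds it as a product of two binomial pmfs (valid because X and Y are independent) and sums the cells with Y > X; the helper `binom_pmf` is an illustrative name.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success prob p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 2, 0.7
# Independence: the joint pmf is the product of the two marginals.
joint = {(k, r): binom_pmf(k, n, p) * binom_pmf(r, n, p)
         for k in range(n + 1) for r in range(n + 1)}
p_y_gt_x = sum(q for (k, r), q in joint.items() if r > k)   # P(Y > X)
```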
5.124 If $X_1$ has mean −5 and variance 3 while $X_2$ has mean 1 and variance 4, and the two are independent, find
(a) $E(3X_1 + 5X_2 + 2)$
(b) $\mathrm{Var}(3X_1 + 5X_2 + 2)$
Ans:
(a) 3(−5) + 5(1) + 2 = −8
(b) 9 × 3 + 25 × 4 = 127
Sampling Distribution
Statistical Inference
Suppose we want to know the average height of an Indian or the average life length of a bulb manufactured by a company, etc. Obviously we cannot burn out every bulb and find the mean life length. One chooses at random, say, n bulbs, finds their life lengths $X_1, X_2, \ldots, X_n$ and takes the mean life length $\bar{X} = \dfrac{X_1 + X_2 + \ldots + X_n}{n}$ as an 'approximation' to the actual (unknown) mean life length. Thus we make a statement about the "population" (of all life lengths) by looking at a sample of it. This is the basis behind statistical inference. The whole theory of statistical inference tells us how close we are to the true (unknown) characteristic of the population.
In the above example, let X be the life length of a bulb manufactured by the company. Thus X is a r.v. which can assume values > 0. It will have a certain distribution and a certain mean µ etc. When we make n independent observations, we get n values $x_1, x_2, \ldots, x_n$. Clearly if we again take n observations, we would get $y_1, y_2, \ldots, y_n$. Thus we may say
Definition
Suppose there is a universe having a finite number of elements only (like the number of Indians, the number of females in USA who are blondes etc.). A sample of size n from the above is a subset of n elements such that each subset of n elements has the same prob of being selected.
Statistics
Whenever we sample, we use a characteristic of the sample to make a statement about the population. For example, suppose the true mean height of an Indian is µ (cms). To make a statement about µ, we randomly select n Indians, find their heights $\{X_1, X_2, \ldots, X_n\}$ and then their mean, namely
$$\bar{X} = \frac{X_1 + X_2 + \ldots + X_n}{n}$$
Definition: Let X be a r.v. Let $\{X_1, X_2, \ldots, X_n\}$ be a sample of size n from X. A statistic is a function of the sample $\{X_1, X_2, \ldots, X_n\}$.
1. The sample mean $\bar{X} = \dfrac{X_1 + X_2 + \ldots + X_n}{n}$
Definition
If $X_1, \ldots, X_n$ is a random sample of size n and if $\hat{X}$ is a statistic, then we remember $\hat{X}$ is also a r.v. Its distribution is referred to as the sampling distribution of $\hat{X}$.
The Sampling Distribution of the Sample Mean $\bar{X}$.
Suppose X is a r.v. with mean µ and variance $\sigma^2$. Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from X. Let $\bar{X} = \dfrac{X_1 + X_2 + \ldots + X_n}{n}$ be the sample mean. Then
(a) $E(\bar{X}) = \mu$.
(b) $V(\bar{X}) = \dfrac{\sigma^2}{n}$.
(c) If $X_1, \ldots, X_n$ is a random sample from a finite population with N elements, then
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}.$$
(e) Whatever be the distribution of X, if n is "large", $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has approximately the standard normal distribution. (This result is known as the central limit theorem.)
Explanation
The mean of a random sample of size n = 25 is used to estimate the mean of an infinite
population with standard deviation σ = 2.4. What can we assert about the prob that the
error will be less than 1.2 if we use
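Solution
(a) By Chebyshev's theorem, for any r.v. T,
$$P\left(|T - E(T)| < k\sqrt{\mathrm{Var}(T)}\right) \ge 1 - \frac{1}{k^2}$$
Taking $T = \bar{X}$, with $E(\bar{X}) = \mu$ and $\sqrt{\mathrm{Var}(\bar{X})} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{2.4}{5}$,
$$P\left(|\bar{X} - \mu| < k \cdot \frac{2.4}{5}\right) \ge 1 - \frac{1}{k^2}.$$
Desired: $P\left(|\bar{X} - \mu| < 1.2\right)$?
$$k \cdot \frac{2.4}{5} = 1.2 \text{ gives } k = \frac{5}{2}$$
$$\therefore P\left(|\bar{X} - \mu| < 1.2\right) \ge 1 - \frac{1}{25/4} = \frac{21}{25} = 0.84$$
(b) Central limit theorem says $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \dfrac{\bar{X} - \mu}{2.4/5}$ is approximately standard normal.
$$\text{Thus } P\left(|\bar{X} - \mu| < 1.2\right) = P\left(\frac{|\bar{X} - \mu|}{\sigma/\sqrt{n}} < \frac{1.2}{2.4/5}\right) \approx P\left(|Z| < \frac{5}{2}\right) = 2F\left(\frac{5}{2}\right) - 1 = 2(0.9938) - 1 = 0.9876$$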
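The two answers to this example can be compared side by side. The sketch below computes the Chebyshev lower bound and the central-limit approximation for the same event; the helper `phi` (standard normal cdf via the error function) is an illustrative stand-in for the table.

```python
import math

def phi(z):
    """Standard normal cdf, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

sigma, n, err = 2.4, 25, 1.2
k = err / (sigma / math.sqrt(n))          # error in units of the s.d. of the sample mean
chebyshev_lower_bound = 1.0 - 1.0 / k**2  # holds for any distribution
clt_approx = 2.0 * phi(k) - 1.0           # uses approximate normality of the sample mean
```

Chebyshev's bound is much weaker because it makes no assumption about the shape of the distribution, whereas the central limit theorem exploits the approximate normality of the sample mean.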
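A random sample of size 100 is taken from an infinite population having mean µ = 76 and variance $\sigma^2 = 256$. What is the prob that $\bar{X}$ will be between 75 and 78?
Solution
We use the central limit theorem, namely that $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is approximately standard normal.
$$\text{Required } P\left(75 < \bar{X} < 78\right) = P\left(\frac{75 - 76}{16/10} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{78 - 76}{16/10}\right)$$
$$\approx P\left(-\frac{10}{16} < Z < \frac{20}{16}\right) = P\left(-\frac{5}{8} < Z < \frac{5}{4}\right)$$
$$= F\left(\frac{5}{4}\right) - F\left(-\frac{5}{8}\right) = F\left(\frac{5}{4}\right) + F\left(\frac{5}{8}\right) - 1$$
= 0.8944 + 0.7340 − 1 = 0.6284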
Example 3 (See Exercise 6.17 on page 217)
If the distribution of weights of all men travelling by air between Dallas and El Paso has a mean of 163 pounds and a s.d. of 18 pounds, what is the prob. that the combined gross weight of 36 men travelling on a plane between these two cities is more than 6000 pounds?
Solution
Let X be the weight of a man travelling by air between D and E. It is given that X is a r.v. with mean E(X) = µ = 163 lbs and s.d. σ = 18 lbs.
Let $X_1, X_2, \ldots, X_{36}$ be the weights of 36 men travelling on a plane between these two cities. Thus we can regard $\{X_1, X_2, \ldots, X_{36}\}$ as a random sample of size 36 from X.
$$\text{Required } P(X_1 + X_2 + \ldots + X_{36} > 6000) = P\left(\bar{X} > \frac{6000}{36}\right)$$
$$= P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{\frac{1000}{6} - 163}{18/6}\right) \approx P\left(Z > \frac{22}{18}\right) \quad \text{(by central limit theorem)}$$
$$= 1 - P\left(Z \le \frac{22}{18}\right) = 1 - F(1.22)$$
= 1 − 0.8888 = 0.1112
The sampling distribution of the sample mean $\bar{X}$ (when σ is unknown).
Theorem
Let X be a r.v. having normal distribution with mean E(X) = µ. Let $\bar{X}$ be the sample mean and $S^2$ the sample variance of a random sample of size n from X.
Then the r.v. $t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ has (Student's) t-distribution with n − 1 degrees of freedom.
Remark
(1) The shape of the density curve of the t-distribution (with parameter ν, Greek nu) is like that of the standard normal distribution and is symmetrical about the y-axis.
$t_{\nu,\alpha}$ is that unique number such that $P(t > t_{\nu,\alpha}) = \alpha$.
By symmetry $t_{\nu,1-\alpha} = -t_{\nu,\alpha}$.
For ν large, $t_{\nu,\alpha} \approx z_{\alpha}$.
A random sample of size 25 from a normal population has the mean $\bar{x} = 47.5$ and the s.d. s = 8.4. Does this information tend to support or refute the claim that the mean of the population is µ = 42.1?
Solution:
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ has a t-distribution with parameter ν = n − 1 = 24. From the table, $t_{24,0.005} = 2.797$, so when µ = 42.1
$$P\left(\frac{\bar{X} - \mu}{s/\sqrt{n}} > 2.797\right) = 0.005$$
$$\text{Or } P\left(\bar{X} > 42.1 + 2.797 \times \frac{8.4}{5}\right) = 0.005$$
$$\text{Or } P\left(\bar{X} > 46.80\right) = 0.005$$
This means that when µ = 42.1, only in about 0.5 percent of the cases may we get an $\bar{X} > 46.80$. Since the observed mean is 47.5, we will have to refute the claim µ = 42.1 (in favour of µ > 42.1).
The following are the times between six calls for an ambulance (in a certain city) and the patient's arrival at the hospital: 27, 15, 20, 32, 18 and 26 minutes. Use these figures to judge the reasonableness of the ambulance service's claim that it takes on the average 20 minutes between the call for an ambulance and the patient's arrival at the hospital.
Solution
Let X = time (in minutes) between the call for an ambulance and the patient's arrival at the hospital. We assume X has a normal distribution. (When nothing is given, we assume normality.) We want to judge the reasonableness of the claim that E(X) = µ = 20 minutes. For this we recorded the times for 6 calls. So we have a random sample of size 6 from X with
$X_1 = 27, X_2 = 15, X_3 = 20, X_4 = 32, X_5 = 18, X_6 = 26$. Thus
$$\bar{X} = \frac{27 + 15 + 20 + 32 + 18 + 26}{6} = \frac{138}{6} = 23.$$
$$S^2 = \frac{1}{6 - 1}\left[(27 - 23)^2 + (15 - 23)^2 + (20 - 23)^2 + (32 - 23)^2 + (18 - 23)^2 + (26 - 23)^2\right] = \frac{1}{5}[16 + 64 + 9 + 81 + 25 + 9] = \frac{204}{5}$$
Hence $S = \sqrt{\dfrac{204}{5}}$
We calculate
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{23 - 20}{\sqrt{204/5}\,/\sqrt{6}} = 1.150$$
Since this value is well below $t_{5,0.025} = 2.571$, we can say that it is reasonable to assume that the average time is µ = 20 minutes.
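The t statistic for the ambulance data can be recomputed with the standard library. This is a small sketch; `statistics.stdev` uses the n − 1 divisor, matching the definition of S in these notes.

```python
import math
import statistics

times = [27, 15, 20, 32, 18, 26]   # minutes, from the example
mu0 = 20                           # claimed mean
n = len(times)
xbar = statistics.mean(times)
s = statistics.stdev(times)        # sample s.d. with n - 1 in the denominator
t = (xbar - mu0) / (s / math.sqrt(n))
```

The resulting `t` is then compared with a tabulated value of the t-distribution with n − 1 = 5 degrees of freedom.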
Example 6
A process for making certain bearings is under control if the diameters of the bearings
have a mean of 0.5000 cm. What can we say about this process if a sample of 10 of these
bearings has a mean diameter of 0.5060 cm and sd 0.0040 cm?
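$$\text{Hint: } P\left(-3.25 < \frac{\bar{X} - 0.5}{0.004/\sqrt{10}} < 3.25\right) = 0.99 \quad (\text{as } t_{9,0.005} = 3.25)$$
$$\text{or } P\left(0.496 < \bar{X} < 0.504\right) = 0.99$$
Since $\bar{x} = 0.506 > 0.504$, the process is not under control.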
Sampling Distribution of S2 (The sample variance)
Theorem
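If $S^2$ is the sample variance of a random sample of size n taken from a normal population with (population) variance $\sigma^2$, then
$$\chi^2 = \frac{(n - 1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
is a r.v. having the chi-square distribution with parameter ν = n − 1.
Remark
Since $S^2 > 0$, the r.v. has positive density only to the right of the origin. $\chi^2_{\nu,\alpha}$ is that unique number such that $P\left(\chi^2 > \chi^2_{\nu,\alpha}\right) = \alpha$ and is tabulated for some α's and ν's in Table 5.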
Solution
42.5 σ 2
42.5
$$= P\left(2.088 < \chi^2 < 16.925\right)$$
(From Table 5, $\chi^2_{9,0.05} = 16.919$, $\chi^2_{9,0.99} = 2.088$)
$$= P\left(\chi^2 > 2.088\right) - P\left(\chi^2 > 16.919\right) \quad \text{(approx)}$$
= 0.99 − 0.05 = 0.94 (approx)
Example 8 (See exercise 6.23 on page 213)
The claim that the variance of a normal population is σ 2 = 21.3 is rejected if the
variance of a random sample of size 15 exceeds 39.74. What is the prob that the claim
will be rejected even though σ 2 = 21.3 ?
Solution
$$\text{Required prob} = P\left(S^2 > 39.74\right) = P\left(\frac{(n - 1)S^2}{\sigma^2} > \frac{14 \times 39.74}{21.3}\right) = P\left(\chi^2 > 26.12\right)$$
= 0.025 (as from Table 5, $\chi^2_{14,0.025} = 26.12$)
Theorem
$$F = \frac{S_1^2}{S_2^2}$$
Remark
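$F_{\nu_1,\nu_2,\alpha}$ is that unique number such that $P\left(F > F_{\nu_1,\nu_2,\alpha}\right) = \alpha$ and is tabulated for α = 0.05 in Table 6(a) and for α = 0.01 in Table 6(b).
We also note the fact:
$$F_{\nu_2,\nu_1,\alpha} = \frac{1}{F_{\nu_1,\nu_2,1-\alpha}}$$
$$\text{Thus } F_{10,20,0.95} = \frac{1}{F_{20,10,0.05}} = \frac{1}{2.77} = 0.36$$
Example 9
(a) $$F_{12,15,0.95} = \frac{1}{F_{15,12,0.05}} = \frac{1}{2.62} = 0.38$$
(b) $$F_{6,20,0.99} = \frac{1}{F_{20,6,0.01}} = \frac{1}{7.40} = 0.135$$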
Solution
$$\text{Reqd } P\left(S_1^2 > 7S_2^2 \text{ or } S_2^2 > 7S_1^2\right) = P\left(\frac{S_1^2}{S_2^2} > 7 \text{ or } \frac{S_2^2}{S_1^2} > 7\right) = 2P(F > 7)$$
If two independent random samples of size n1 = 9 and n2 = 16 are taken from a normal
population, what is the prob that the variance of the first sample will be at least four times
as large as the variance of the second sample?
$$\text{Hint: Reqd prob} = P\left(S_1^2 > 4S_2^2\right) = P\left(\frac{S_1^2}{S_2^2} > 4\right) = P(F > 4)$$
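$$f(F) = \begin{cases} 6F(1 + F)^{-4} & F > 0 \\ 0 & F \le 0 \end{cases}$$
If random samples of size 5 are taken from two normal populations having the same variance, find the prob that the ratio of the larger to the smaller sample variance will exceed 3.
Solution
$$\text{Reqd prob} = 2P\left(\frac{S_1^2}{S_2^2} > 3\right) = 2P(F > 3)$$
$$= 2\int_{3}^{\infty} \frac{6F}{(1 + F)^4}\,dF = 12\int_{3}^{\infty} \left[\frac{1}{(1 + F)^3} - \frac{1}{(1 + F)^4}\right] dF$$
$$= 12\left[-\frac{1}{2(1 + F)^2} + \frac{1}{3(1 + F)^3}\right]_{3}^{\infty} = 12\left[\frac{1}{32} - \frac{1}{192}\right] = \frac{5 \times 12}{192} = \frac{5}{16}$$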
We shall discuss how we can make statement about the mean of a population from the
knowledge about the mean of a random sample. That is we ‘estimate’ the mean of a
population based on a random sample.
Point Estimation
Definition
Let θ be a parameter associated with the distribution of a r.v. A statistic $\hat{\theta}$ (based on a random sample of size n) is said to be an unbiased estimate (≡ estimator) of θ if $E(\hat{\theta}) = \theta$. That is, $\hat{\theta}$ will be on the average close to θ.
Example
Let X be a r.v. and µ the mean of X. If $\bar{X}$ is the sample mean then we know $E(\bar{X}) = \mu$. Thus we may say the sample mean $\bar{X}$ is an unbiased estimate of µ. (Note $\bar{X}$ is a r.v., a statistic: $\bar{X} = \dfrac{X_1 + X_2 + \ldots + X_n}{n}$ is a function of the random sample $(X_1, X_2, \ldots, X_n)$.) If $\omega_1, \omega_2, \ldots, \omega_n$ are any n non-negative numbers ≤ 1 such that $\omega_1 + \omega_2 + \ldots + \omega_n = 1$, then we can easily see that $\omega_1 X_1 + \omega_2 X_2 + \ldots + \omega_n X_n$ is also an unbiased estimate of µ. (Prove this.) $\bar{X}$ is got as a special case by taking $\omega_1 = \omega_2 = \ldots = \omega_n = \frac{1}{n}$. Thus we have a large number of unbiased estimates for µ.
Hence the question arises: if $\hat{\theta}_1, \hat{\theta}_2$ are both unbiased estimates of θ, which one do we prefer? The answer is given by the following definition.
Definition
Let $\hat{\theta}_1, \hat{\theta}_2$ be both unbiased estimates of the parameter θ. We say $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if $\mathrm{Var}(\hat{\theta}_1) \le \mathrm{Var}(\hat{\theta}_2)$.
Remark
That is, the above definition says: prefer that unbiased estimate which is "closer" to θ. Remember the variance is a measure of the "closeness" of $\hat{\theta}$ to θ.
Let $\bar{X}$ be the sample mean of a random sample of size n from a population with (unknown) mean µ. Suppose we use $\bar{X}$ to estimate µ. $\bar{X} - \mu$ is called the error in estimating µ by $\bar{X}$. Can we find an upper bound on this error? We know that if X is normal (or if n is large, by the Central Limit Theorem), $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is a r.v. having (approximately) the standard normal distribution. And we can say
$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$$
Thus we can say with prob (1 − α) that the max absolute error $|\bar{X} - \mu|$ in estimating µ by $\bar{X}$ is at most $z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$. (Here obviously we assume σ, the population s.d., is known. And $z_{\alpha/2}$ is that unique number such that $P\left(Z > z_{\alpha/2}\right) = \dfrac{\alpha}{2}$.)
We also say that we can say with 100(1 − α) percent confidence that the max. abs. error is at most $z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$. The book denotes this by E.
Estimation of n
Thus to find the size n of the sample so that we may say with 100(1 − α) percent confidence that the max. abs. error is a given quantity E, we solve for n the equation
$$z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = E$$
$$\text{or } n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
Example 1
What is the maximum error one can expect to make with prob 0.90 when using the mean
of a random sample of size n = 64 to estimate the mean of a population with σ 2 = 2.56 ?
Solution
Substituting n = 64, σ = 1.6 and $z_{\alpha/2} = z_{0.05} = 1.645$ (note 1 − α = 0.90 implies $\frac{\alpha}{2} = 0.05$) in the formula for the maximum error $E = z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ we get
$$E = 1.645 \times \frac{1.6}{\sqrt{64}} = 1.645 \times \frac{1.6}{8} = 1.645 \times 0.2 = 0.3290$$
Thus the maximum error one can expect to make with prob 0.90 is 0.3290.
Example 2
Solution
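Here 1 − α = 0.95 so that $\frac{\alpha}{2} = 0.025$, hence $z_{\alpha/2} = z_{0.025} = 1.96$
Thus we want n so that we can assert with prob 0.95 that the max error E = 3.0
$$\therefore n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.96 \times 20}{3}\right)^2 = 170.74$$
so we take n = 171.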
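The maximum-error and sample-size formulas can be wrapped in two short functions. This is a minimal sketch; the function names are illustrative, and the z values are the tabulated ones used in Examples 1 and 2.

```python
import math

def max_error(z, sigma, n):
    """E = z * sigma / sqrt(n): maximum error at the confidence level implied by z."""
    return z * sigma / math.sqrt(n)

def sample_size_for_mean(z, sigma, e):
    """Smallest integer n for which z * sigma / sqrt(n) does not exceed e."""
    return math.ceil((z * sigma / e) ** 2)

e_example_1 = max_error(1.645, 1.6, 64)                # Example 1
n_example_2 = sample_size_for_mean(1.96, 20, 3.0)      # Example 2
```

Rounding up with `math.ceil` matters: rounding 170.74 down to 170 would give a maximum error slightly larger than the required 3.0.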
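Small Samples
If the population is normal and we take a random sample of size n (n small) from it, we note
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \quad (\bar{X} = \text{sample mean},\ S = \text{sample s.d.})$$
has a t-distribution with n − 1 degrees of freedom, and $P\left(t > t_{n-1,\alpha/2}\right) = \dfrac{\alpha}{2}$. Thus if we use $\bar{X}$ to estimate µ, we can assert with prob (1 − α) that the max error is at most $E = t_{n-1,\alpha/2}\dfrac{S}{\sqrt{n}}$.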
Example 3
20 fuses were subjected to a 20% overload, and the times it took them to blow had a
mean x = 10.63 minutes and a s.d. S = 2.48 minutes. If we use x = 10.63 minutes as a
point estimate of the true average it takes for such fuses to blow with a 20% overload,
what can we assert with 95% confidence about the maximum error?
Solution
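Here n = 20 and $1 - \alpha = \frac{95}{100} = 0.95$ so that $\frac{\alpha}{2} = 0.025$ and $t_{19,0.025} = 2.093$.
Hence we can assert with 95% confidence (i.e. with prob 0.95) that the max error will be
$$E = t_{n-1,\alpha/2}\frac{S}{\sqrt{n}} = 2.093 \times \frac{2.48}{\sqrt{20}} = 1.16$$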
Interval Estimation
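If $\bar{X}$ is the mean of a random sample of size n from a population with known s.d. σ, then we know by the central limit theorem that
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
is (approximately) standard normal. So with prob (1 − α) we have $-z_{\alpha/2} < Z < z_{\alpha/2}$, or
$$\bar{X} - \frac{\sigma}{\sqrt{n}} z_{\alpha/2} < \mu < \bar{X} + \frac{\sigma}{\sqrt{n}} z_{\alpha/2}$$
Thus we can assert with prob (1 − α) (i.e. with (1 − α) × 100% confidence) that µ lies in the interval $\left(\bar{X} - \dfrac{\sigma}{\sqrt{n}} z_{\alpha/2},\ \bar{X} + \dfrac{\sigma}{\sqrt{n}} z_{\alpha/2}\right)$.
We refer to the above interval as a (1 − α)100% confidence interval for µ. The end points $\bar{X} \pm \dfrac{\sigma}{\sqrt{n}} z_{\alpha/2}$ are known as (1 − α)100% confidence limits for µ.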
Example 4
Suppose the mean of a random sample of size 25 from a normal population (with σ = 2 )
is x = 78.3. Obtain a 99% confidence interval for µ , the population mean.
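Solution
Here n = 25, σ = 2, $(1 - \alpha) = \frac{99}{100} = 0.99$
$$\therefore \frac{\alpha}{2} = 0.005 \quad \therefore z_{\alpha/2} = z_{0.005} = 2.575$$
$\bar{x} = 78.3$
Hence the 99% confidence interval for µ is
$$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = \left(78.3 - 2.575 \times \frac{2}{\sqrt{25}},\ 78.3 + 2.575 \times \frac{2}{\sqrt{25}}\right) = (77.27, 79.33)$$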
σ unknown
Suppose $\bar{X}$ is the sample mean and S is the sample s.d. of a random sample of size n taken from a normal population with (unknown) mean µ. Then we know the r.v.
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t-distribution with (n − 1) degrees of freedom. Thus we can say with prob 1 − α that
$$-t_{n-1,\alpha/2} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{n-1,\alpha/2}$$
$$\text{or } \bar{X} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}$$
Thus a (1 − α)100% confidence interval for µ is
$$\left(\bar{X} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right)$$
Note:
(1) If n is large, t has approx the standard normal distribution. In which case the (1 − α)100% confidence interval for µ will be
$$\left(\bar{x} - z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{S}{\sqrt{n}}\right)$$
(2) If nothing is mentioned, we assume that the sample is taken from a normal population so that the above is valid.
Example 5
Material manufactured continuously before being cut and wound into large rolls must be
monitored for thickness (caliper). A sample of ten measurements on paper, in mm,
yielded
32.2, 32.0, 30.4, 31.0, 31.2, 31.2, 30.3, 29.6, 30.5, 30.7
Solution
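Here n = 10
$\bar{x} = 30.91$, S = 0.7880
$$1 - \alpha = 0.95 \text{ or } \frac{\alpha}{2} = 0.025$$
$$\therefore t_{n-1,\alpha/2} = t_{9,0.025} = 2.262$$
Hence the 95% confidence interval for µ is
$$\left(30.91 - 2.262 \times \frac{0.7880}{\sqrt{10}},\ 30.91 + 2.262 \times \frac{0.7880}{\sqrt{10}}\right) = (30.35, 31.47)$$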
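The interval for the paper-thickness data can be recomputed from the raw measurements. In this sketch the critical value $t_{9,0.025} = 2.262$ is taken from a t-table rather than computed, and the variable names are illustrative.

```python
import math
import statistics

data = [32.2, 32.0, 30.4, 31.0, 31.2, 31.2, 30.3, 29.6, 30.5, 30.7]
t_crit = 2.262                      # t_{9, 0.025}, read from a t-table (not computed here)
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)          # sample s.d. with n - 1 in the denominator
half_width = t_crit * s / math.sqrt(n)
interval = (xbar - half_width, xbar + half_width)
```

Working from the unrounded mean (30.91) rather than 30.9 shifts both end points by about 0.01.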
Example 6:
Ten bearings made by a certain process have a mean diameter of 0.5060 cm with a sd of
0.0040 cm. Assuming that the data may be looked upon as a random sample from a
normal population, construct a 99% confidence interval for the actual average diameter of
bearings made by this process.
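Solution
Here n = 10, $\bar{x} = 0.5060$, S = 0.0040 and $(1 - \alpha) = \frac{99}{100} = 0.99$. Hence $\frac{\alpha}{2} = 0.005$
$$\therefore t_{n-1,\alpha/2} = t_{9,0.005} = 3.250$$
Hence the 99% confidence interval for µ is
$$\left(\bar{x} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}},\ \bar{x} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right) = \left(0.5060 - 3.250 \times \frac{0.0040}{\sqrt{10}},\ 0.5060 + 3.250 \times \frac{0.0040}{\sqrt{10}}\right) = (0.5019, 0.5101)$$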
Example 7
In a random sample of 100 batteries the lifetimes have a mean of 148.2 hours with a s.d.
of 24.9 hours. Construct a 76.60% confidence interval for the mean life of the batteries.
Solution
Example 8
A random sample of 100 teachers in a large metropolitan area revealed a mean weekly
salary of $487 with a sd of $48. With what degree of confidence can we assert that the
average weekly salary of all teachers in the metropolitan area is between $472 and $502?
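Solution
Here n = 100, $\bar{x} = 487$, S = 48.
$$\text{Thus } \bar{x} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}} = \$502$$
Since n is large, $t_{99,\alpha/2} \approx z_{\alpha/2}$.
$$\text{Thus we get } 487 + z_{\alpha/2}\frac{48}{10} = 502$$
$$\text{or } z_{\alpha/2} = \frac{15}{4.8} = 3.125$$
$$\therefore \frac{\alpha}{2} = 0.0009 \text{ or } 1 - \alpha = 0.9982$$
∴ We can assert with 99.82% confidence that the true mean salary will be between $472 and $502.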
Definition
Let X be a r.v. Let f(x, θ) = P(X = x) be the point prob function if X is discrete and let f(x, θ) be the pdf of X if X is continuous (here θ is a parameter). Let $X_1, X_2, \ldots, X_n$ be a random sample of size n from X. Then the likelihood function based on the random sample is defined as
$$L(\theta) = L(x_1, x_2, \ldots, x_n; \theta) = f(x_1, \theta)\,f(x_2, \theta) \cdots f(x_n, \theta).$$
Example 8
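$$\text{Thus } f(x, \lambda) = P(X = x) = e^{-\lambda}\frac{\lambda^x}{x!}; \quad x = 0, 1, 2, \ldots$$
Hence the likelihood function is
$$L(\lambda) = e^{-\lambda}\frac{\lambda^{x_1}}{x_1!}\; e^{-\lambda}\frac{\lambda^{x_2}}{x_2!} \cdots e^{-\lambda}\frac{\lambda^{x_n}}{x_n!} = \frac{e^{-n\lambda}\,\lambda^{x_1 + x_2 + \ldots + x_n}}{x_1!\,x_2! \cdots x_n!}; \quad x_i = 0, 1, 2, \ldots$$
To find $\hat{\lambda}$, the value of λ which maximizes L(λ), we use calculus.
$$\text{We get } \frac{1}{L(\lambda)}\frac{\partial L}{\partial \lambda} = -n + \frac{x_1 + \ldots + x_n}{\lambda}$$
$$= 0 \text{ gives } \lambda = \frac{x_1 + \ldots + x_n}{n}$$
We can 'easily' verify $\dfrac{\partial^2 L}{\partial \lambda^2}$ is < 0 for this λ.
Hence the MLE of λ is $\hat{\lambda} = \dfrac{x_1 + \ldots + x_n}{n} = \bar{x}$ (the sample mean)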
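The claim that the sample mean maximizes the Poisson likelihood can be checked numerically. The sketch below maximizes the log-likelihood over a grid of candidate values; the data `xs` are hypothetical counts invented purely for illustration, and the grid search stands in for the calculus argument above.

```python
import math

def poisson_log_likelihood(lam, xs):
    """ln L(lambda) for a Poisson sample; lgamma(x + 1) equals ln(x!)."""
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in xs)

xs = [2, 0, 3, 1, 4, 2]                       # hypothetical counts, for illustration only
grid = [k / 1000 for k in range(1, 10001)]    # candidate lambdas 0.001, 0.002, ..., 10.000
lam_hat = max(grid, key=lambda lam: poisson_log_likelihood(lam, xs))
sample_mean = sum(xs) / len(xs)
```

Maximizing ln L instead of L gives the same maximizer, because the logarithm is increasing, and it avoids very small products.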
We define a rv X as follows.
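  x      0       1
  Prob   1 − p   p
The point prob function f(x; p) of X is given by
$$f(x; p) = p^x (1 - p)^{1-x}; \quad x = 0, 1$$
Hence the likelihood function is
$$L(p) = f(x_1; p)\,f(x_2; p) \cdots f(x_n; p) = p^{x_1 + \ldots + x_n}(1 - p)^{n - (x_1 + x_2 + \ldots + x_n)}, \quad x_i = 0 \text{ or } 1 \text{ for all } i = 1, \ldots, n$$
$$= p^s (1 - p)^{n-s} \quad (s = x_1 + \ldots + x_n)$$
We get
$$\frac{1}{L}\frac{\partial L}{\partial p} = \frac{s}{p} - \frac{n - s}{1 - p}$$
$$\text{For maximum, } \frac{\partial L}{\partial p} = 0 \text{ or } \frac{s}{p} = \frac{n - s}{1 - p}$$
$$\text{(i.e.) } p = \frac{s}{n} = \frac{x_1 + x_2 + \ldots + x_n}{n}$$
(One can easily see this p makes $\dfrac{\partial^2 L}{\partial p^2} < 0$ so that L is maximum for this p.)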
Example 10
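Let $\{X_1, X_2, \ldots, X_n\}$ be a random sample of size n. Hence the likelihood function is
$$L(\beta) = f(x_1; \beta)\,f(x_2; \beta) \cdots f(x_n; \beta) = \frac{1}{\beta^n}\,e^{-(x_1 + x_2 + \ldots + x_n)/\beta} \quad (x_i > 0)$$
$$\frac{1}{L}\frac{\partial L}{\partial \beta} = -\frac{n}{\beta} + \frac{x_1 + \ldots + x_n}{\beta^2} = 0 \quad \text{(for maximum)}$$
$$\text{gives } \beta = \frac{x_1 + x_2 + \ldots + x_n}{n} = \bar{x}$$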
Example 11
Obtain the ML estimate of β based on a random sample $\{X_1, X_2, \ldots, X_n\}$ of size n from X.
Solution
$$\frac{1}{L}\frac{\partial L}{\partial \beta} = \frac{n}{\beta + 1} + \ln(x_1 \cdots x_n)$$
Setting this equal to 0 gives
$$\beta = -1 - \frac{n}{\ln(x_1 \cdots x_n)}$$
Example 12
(i.e.) The density of X is $f(x; \beta) = \dfrac{1}{\beta}$; 0 ≤ x ≤ β (and 0 elsewhere)
$$L(\beta) = f(x_1; \beta)\,f(x_2; \beta) \cdots f(x_n; \beta) = \frac{1}{\beta^n}; \quad 0 \le x_1 \le \beta,\ 0 \le x_2 \le \beta,\ \ldots,\ 0 \le x_n \le \beta$$
Estimation of Sample proportion
We have just seen above that if p = population proportion (i.e. proportion of persons, things etc. having a characteristic) then the ML estimate of p = sample proportion. Now we would like to find a (1 − α)100% confidence interval for p.
Large Samples
For example we can think of a population of all bulbs produced by a factory. Any bulb is either a "have" (i.e. defective) or a "have-not" (i.e. it is good) and p = proportion of haves = prob that a randomly chosen member is a "have".
To estimate p, we choose n members at random and count the number X of "haves". Thus X is a r.v. having binomial distribution with parameters n and p:
$$P(X = x) = f(x; p) = \binom{n}{x} p^x (1 - p)^{n-x}; \quad x = 0, 1, 2, \ldots, n$$
(i.e.) for large n, $\dfrac{X - np}{\sqrt{np(1 - p)}}$ has approx standard normal distribution. So we can say with prob (1 − α) that
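$$-z_{\alpha/2} < \frac{x - np}{\sqrt{np(1 - p)}} < z_{\alpha/2}$$
$$\text{or } -z_{\alpha/2} < \frac{\frac{x}{n} - p}{\sqrt{\frac{p(1 - p)}{n}}} < z_{\alpha/2}$$
$$\text{or } \frac{x}{n} - z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}} < p < \frac{x}{n} + z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}}$$
In the end points, we replace 'p' by the MLE $\dfrac{X}{n}$ (= sample proportion):
$$\frac{x}{n} - z_{\alpha/2}\sqrt{\frac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}} < p < \frac{x}{n} + z_{\alpha/2}\sqrt{\frac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}}$$
Remark: We can say with prob (1 − α) that the max error $\left|\dfrac{X}{n} - p\right|$ in approximating p by $\dfrac{X}{n}$ is
$$E = z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}}$$
We can replace p by $\dfrac{X}{n}$ and say the
$$\text{Max error} = z_{\alpha/2}\sqrt{\frac{\frac{X}{n}\left(1 - \frac{X}{n}\right)}{n}}$$
Or we note that p(1 − p) for 0 ≤ p ≤ 1 has maximum value $\frac{1}{4}$ (which is attained when $p = \frac{1}{2}$).
Thus we can also say with prob (1 − α) that the max error is at most
$$z_{\alpha/2}\sqrt{\frac{1}{4n}}$$
This last expression tells us that to assert with prob (1 − α) that the max error is E, n must be
$$\frac{1}{4}\left(\frac{z_{\alpha/2}}{E}\right)^2$$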
In a random sample of 400 industrial accidents, it was found that 231 were due at least
partially to unsafe working conditions. Construct a 99% confidence interval for the
corresponding true proportion p.
Solution
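Here n = 400, x = 231 and $z_{\alpha/2} = z_{0.005} = 2.575$. The 99% confidence interval for p is
$$\left(\frac{x}{n} - z_{\alpha/2}\sqrt{\frac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}},\ \frac{x}{n} + z_{\alpha/2}\sqrt{\frac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}}\right)$$
$$= \left(\frac{231}{400} - 2.575\sqrt{\frac{\frac{231}{400}\left(1 - \frac{231}{400}\right)}{400}},\ \frac{231}{400} + 2.575\sqrt{\frac{\frac{231}{400}\left(1 - \frac{231}{400}\right)}{400}}\right)$$
= (0.5139, 0.6411)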
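The large-sample interval for a proportion is easy to package as a function. This is a minimal sketch; the function name is illustrative and the critical value 2.575 is the tabulated $z_{0.005}$ used in Example 13.

```python
import math

def proportion_interval(x, n, z):
    """Large-sample confidence interval for p from x 'haves' among n members."""
    p_hat = x / n                                        # sample proportion (the MLE of p)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)  # z times the estimated s.d. of p_hat
    return p_hat - half_width, p_hat + half_width

low, high = proportion_interval(231, 400, 2.575)   # Example 13
```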
Example 14
Solution
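Here 1 − α = 0.95 so that $\frac{\alpha}{2} = 0.025$; hence $z_{\alpha/2} = 1.96$
Hence we can say with 95% confidence that the max. error is
$$E = z_{\alpha/2}\sqrt{\frac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}} = 1.96 \times \sqrt{\frac{0.38 \times 0.62}{250}} = 0.0602$$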
Example 15:
Among 100 fish caught in a large lake, 18 were inedible due to the pollution of the environment. If we use $\frac{18}{100} = 0.18$ as an estimate of the corresponding true proportion, with what confidence can we assert that the error of this estimate is at most 0.065?
Solution
We note
$$E = z_{\alpha/2}\sqrt{\frac{\frac{X}{n}\left(1 - \frac{X}{n}\right)}{n}} = z_{\alpha/2}\sqrt{\frac{0.18 \times 0.82}{100}} = z_{\alpha/2} \times 0.03842$$
$$\therefore z_{\alpha/2} = \frac{0.065}{0.03842} = 1.69$$
$$\text{Hence } \frac{\alpha}{2} = 1 - 0.9545 = 0.0455$$
∴ α = 0.0910 or 1 − α = 0.9090
So we can assert with (1 − α) × 100% = 90.9% confidence that the error is at most 0.065.
Example 16
What is the size of the smallest sample required to estimate an unknown proportion to
within a max. error of 0.06 with at least 95% confidence?
Solution
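Here E = 0.06; 1 − α = 0.95 or $\frac{\alpha}{2} = 0.025$
$$\therefore z_{\alpha/2} = z_{0.025} = 1.96$$
$$n = \frac{1}{4}\left(\frac{z_{\alpha/2}}{E}\right)^2 = \frac{1}{4}\left(\frac{1.96}{0.06}\right)^2 = 266.78$$
so we take n = 267.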
Remark
Read the relevant material in your text on pages 279-281 on finding the confidence interval for the proportion in the case of small samples.
In fact we may even have to decide whether the mean life is 500 hours or more (!)
In such situations, we have a statement whose truth or falsity we want to test. We then say we want to test the null hypothesis H0: the mean life length is 500 hours. (From here onwards, when we say we want to test a statement, it shall mean we want to test whether the statement is true.) We then have another (usually called alternative) hypothesis. We make some 'experiment' and on the basis of that we 'decide' whether to accept the null hypothesis or reject it. (When we reject the null hypothesis we automatically accept the alternative hypothesis.)
Example
Suppose we wish to test the null hypothesis H0: the mean life length of a bulb is 500
hours against the alternative H1: the mean life length is > 500 hours. Suppose we take a
random sample of 50 bulbs and find that the sample mean is 520 hours. Should we
accept H0 or reject H0? We have to note that even though the population mean is 500
hours the sample mean could be more or less. Similarly even though the population mean
is > 500 hours, say 550 hours, even then the sample mean could be less than 550 hours.
Thus whatever decision we may make, there is a possibility of making an error: that of
falsely rejecting H0 (when it should have been accepted) or falsely accepting H0 (when
it should have been rejected). We put this in a tabular form as follows:
               Accept H0          Reject H0
H0 is true     Correct decision   Type I error
H0 is false    Type II error      Correct decision
Thus the type I error is the error of falsely rejecting H0 and the type II error is the error of
falsely accepting H0. A good decision ( ≡ test) is one where the prob of making the errors
is small.
Notation
The prob of committing a type I error is denoted by α . It is also referred to as the size of
the test or the level of significance of the test. The prob of committing Type II error is
denoted by β .
Example 1
Suppose we want to test the null hypothesis µ = 80 against the alternative hyp µ = 83 on
the basis of a random sample of size n = 100 (assume that the population s.d. σ = 8.4 )
The null hyp. is rejected if the sample mean x̄ > 82; otherwise it is accepted. What is the
prob of type I error; the prob of type II error?
Solution
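We know that when µ = 80 (and σ = 8.4) the r.v. $\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ has a standard normal
distribution. Thus,
P (Type I error)
= P (Rejecting the null hyp when it is true)
= P (X̄ > 82 given µ = 80)
$$= P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} > \frac{82-80}{8.4/10}\right)$$
= P(Z > 2.38) = 1 − F(2.38) = 1 − 0.9913 = 0.0087
Thus in roughly about 1% of the cases we will be (falsely) rejecting H0. Recall this is also
called the size of the test or level of significance of the test.
P (Type II error)
= P (Accepting the null hyp when it is false)
= P (X̄ ≤ 82 given µ = 83)
$$= P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le \frac{82-83}{8.4/10}\right)$$
= P(Z ≤ −1.19) = 1 − F(1.19) = 1 − 0.8830 = 0.1170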
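The two error probabilities of Example 1 can also be computed directly. The sketch below assumes only the numbers given in the example (µ = 80 under H0, µ = 83 under H1, σ = 8.4, n = 100, reject when x̄ > 82); the variable names are illustrative, and `statistics.NormalDist` stands in for the normal table.

```python
import math
from statistics import NormalDist

mu0, mu1 = 80, 83          # means under H0 and H1
sigma, n = 8.4, 100
cutoff = 82                # reject H0 when the sample mean exceeds this

std_err = sigma / math.sqrt(n)                      # s.d. of the sample mean

# Type I error: sample mean > 82 although mu = 80
alpha = 1 - NormalDist(mu0, std_err).cdf(cutoff)
# Type II error: sample mean <= 82 although mu = 83
beta = NormalDist(mu1, std_err).cdf(cutoff)

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```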
In the previous example we rejected the null hypothesis when x̄ > 82, i.e. when x̄ lies in
the ‘region’ x̄ > 82 (of the x̄ axis). This portion of the horizontal axis is then called the
critical region and denoted by C. Thus the critical region for the above situation is
C = {x̄ > 82}, and remember we reject H0 when the (test) statistic X̄ lies in the critical
region (i.e. takes a value > 82). So the size of the critical region (≡ prob that X̄ lies in C)
is the size of the test or level of significance.
The shaded portion is the critical region. The portion ... is the region of false
acceptance of H0.
Let X be a r.v. having a normal distribution with (unknown) mean µ and (known) s.d. σ.
The following table gives the critical regions (criteria for rejecting H0: µ = µ0) for various
alternative hypotheses, where $Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$.

Alternative hypothesis H1   Reject H0 if                    Prob of Type I error   Prob of Type II error
µ = µ1 (< µ0)               Z < −Z_α                        α                      1 − F((µ0 − µ1)/(σ/√n) − Z_α)
µ < µ0                      Z < −Z_α                        α
µ = µ1 (> µ0)               Z > Z_α                         α                      F((µ0 − µ1)/(σ/√n) + Z_α)
µ > µ0                      Z > Z_α                         α
µ ≠ µ0                      Z < −Z_{α/2} or Z > Z_{α/2}     α

F(x) = cdf of the standard normal distribution.
Remark:
The prob of Type II error is blank in case H1 (the alternative hypothesis) is one of the
following three things: µ < µ0, µ > µ0, µ ≠ µ0. This is because the Type II error then
depends on the actual value of µ, and so we cannot give a single prob of its occurrence.
Example 2:
According to norms established for a mechanical aptitude test, persons who are 18 years
old should average 73.2 with a standard deviation of 8.6. If 45 randomly selected persons
averaged 76.7 test the null hypothesis µ = 73.2 against the alternative µ > 73.2 at the
0.01 level of significance.
Solution
= α = 0.01
Step IV Calculations
$$Z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} = \frac{76.7-73.2}{8.6/\sqrt{45}} = 2.73$$
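The calculation step of Example 2 can be sketched in Python. The comparison with the critical value is an assumption based on the table of critical regions (for H1: µ > µ0 reject H0 if Z > Z_α), using the table value Z_0.01 = 2.33; the helper name `z_statistic` is made up.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """Test statistic Z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

z = z_statistic(76.7, 73.2, 8.6, 45)
z_crit = 2.33              # Z_0.01 from the normal table
reject = z > z_crit        # one-sided test against mu > 73.2

print(f"Z = {z:.2f}, reject H0: {reject}")
```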
Example 3
It is desired to test the null hypothesis µ = 100 against the alternative hypothesis
µ < 100 on the basis of a random sample of size n = 40 from a population with σ = 12.
For what values of x must the null hypothesis be rejected if the prob of Type I error is to
be α = 0.01?
Solution
$Z_\alpha = Z_{0.01} = 2.33$. Hence from the table we reject H0 if $Z < -Z_\alpha = -2.33$, where
$$Z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} = \frac{\bar{x}-100}{12/\sqrt{40}} < -2.33$$
gives
$$\bar{x} < 100 - 2.33 \times \frac{12}{\sqrt{40}} = 95.58$$
Example 4
To test a paint manufacturer’s claim that the average drying time of his new “fast-drying”
paint is 20 minutes, a ‘random sample’ of 36 boards is painted with his new paint and his
claim is rejected if the mean drying time x is > 20.50 minutes. Find
Solution
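Here
$$\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} = \frac{\bar{X}-20}{2.4/\sqrt{36}} = \frac{6}{2.4}\left(\bar{X}-20\right)$$
is standard normal.
P (Type I error)
= P (X̄ > 20.50 when µ = 20)
$$= P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} > \frac{20.50-20}{2.4/\sqrt{36}}\right)$$
= P(Z > 1.25) = 1 − P(Z ≤ 1.25) = 1 − F(1.25)
= 1 − 0.8944 = 0.1056
P (Type II error when µ = 21)
= P (Accepting H0 when µ = 21)
= P (X̄ ≤ 20.50 when µ = 21)
$$= P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le \frac{20.50-21}{2.4/\sqrt{36}}\right) = P(Z \le -1.25) = P(Z > 1.25)$$
= 0.1056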
Example 5
It is desired to test the null hypothesis µ = 100 pounds against the alternative hypothesis
µ < 100 pounds on the basis of a random sample of size n = 50 from a population with
σ = 12. For what values of x̄ must the null hypothesis be rejected if the prob of type I
error is to be α = 0.01?
Solutions
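We want to test the null hypothesis H0: µ = 100 against the alt hypothesis H1: µ < 100
given σ = 12, n = 50.
Suppose we reject H0 when x̄ < C. Then
P (Type I error) = P (X̄ < C when µ = 100)
$$= P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < \frac{C-100}{12/\sqrt{50}}\right) = P\left(Z < \frac{C-100}{12/\sqrt{50}}\right) = F\left(\frac{C-100}{12/\sqrt{50}}\right) = 0.01$$
implies
$$\frac{C-100}{12/\sqrt{50}} = -2.33$$
$$\text{or}\quad C = 100 - \frac{12}{\sqrt{50}} \times 2.33 = 96.05$$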
Example 6
Suppose that for a given population with σ = 8.4 in², we want to test the null hypothesis
µ = 80.0 in² against the alternative hypothesis µ < 80.0 in² on the basis of a random
sample of size n = 100.
(a) If the null hypothesis is rejected for x̄ < 78.0 in² and otherwise it is accepted,
what is the probability of type I error?
(b) What is the answer to part (a) if the null hypothesis is µ ≥ 80 in² instead of
µ = 80.0 in²?
Solution
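(a) P (Type I error) = P (X̄ < 78.0 given µ = 80)
$$= P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < \frac{78.0-80.0}{8.4/\sqrt{100}}\right) = P\left(Z < -\frac{10}{4.2}\right)$$
$$= 1 - P\left(Z < \frac{10}{4.2}\right) = 1 - F(2.38)$$
= 1 − 0.9913 = 0.0087
(b) In this case we define the prob of type I error as the max prob of rejecting H0 when it is
true = max P (x̄ < 78.0 given µ is a number ≥ 80.0).
Now P (x̄ < 78.0 when the population mean is µ)
$$= P\left(\frac{\bar{x}-\mu}{\sigma/\sqrt{n}} < \frac{78.0-\mu}{8.4/\sqrt{100}}\right) = P\left(Z < \frac{10}{8.4}(78-\mu)\right)$$
= F(1.19(78 − µ))
This decreases as µ increases, so over µ ≥ 80 it is largest at µ = 80, where it equals
F(−2.38) = 0.0087, the same value as in part (a).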
Example 7
$$n = \frac{\sigma^{2}\left(Z_\alpha + Z_\beta\right)^{2}}{(\mu_1-\mu_0)^{2}}$$
(a) It is desired to test the null hypothesis µ = 40 against the alternative hypothesis
µ < 40 on the basis of a large random sample from a population with σ = 4.
If the prob of type I error is to be 0.05 and the prob of Type II error is to be 0.12
for µ = 38, find the required size of the sample.
(b) Suppose we want to test the null hypothesis µ = 64 against the alternative
hypothesis µ < 64 for a population with standard deviation σ = 7.2. How large a
sample must we take if α is to be 0.05 and β is to be 0.01 for µ = 61? Also for
what values of x̄ will the null hypothesis have to be rejected?
Solution
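(a)
$$n = \frac{16\,(1.645+1.175)^{2}}{(38-40)^{2}} = 31.81 \quad \therefore\; n \ge 32.$$
(b)
$$n = \frac{(7.2)^{2}\,(1.645+2.33)^{2}}{(61-64)^{2}} = 91.01 \quad \therefore\; n \ge 92.$$
We reject H0 if $Z < -Z_\alpha$, i.e. $\frac{\bar{X}-64}{7.2/\sqrt{92}} < -1.645$ or X̄ < 62.77.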
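The sample-size formula of Example 7 is easy to evaluate mechanically. The sketch below passes in the table values of Z_α and Z_β used in the solution (1.645, 1.175, 2.33) rather than computing them, so that it reproduces the hand calculation; the function name `required_sample_size` is hypothetical.

```python
import math

def required_sample_size(sigma, z_alpha, z_beta, mu0, mu1):
    """Smallest whole n with n >= sigma^2 (z_alpha + z_beta)^2 / (mu1 - mu0)^2."""
    n = sigma ** 2 * (z_alpha + z_beta) ** 2 / (mu1 - mu0) ** 2
    return math.ceil(n)

n_a = required_sample_size(4, 1.645, 1.175, 40, 38)     # part (a)
n_b = required_sample_size(7.2, 1.645, 2.33, 64, 61)    # part (b)
print(n_a, n_b)
```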
If X̄ is the sample mean and S the sample s.d. of a (small) random sample of size n from
a normal population (with mean µ0) we know that the statistic $t = \frac{\bar{X}-\mu_0}{S/\sqrt{n}}$ has a
t-distribution with (n − 1) degrees of freedom. Thus to test the null hypothesis H0: µ = µ0
against the alternative hypothesis H1: µ > µ0, we note that when H0 is true, i.e. when
µ = µ0, $P(t > t_{n-1,\alpha}) = \alpha$.
Thus if we reject the null hypothesis when $t > t_{n-1,\alpha}$, i.e. when $\bar{X} > \mu_0 + t_{n-1,\alpha}\frac{S}{\sqrt{n}}$, we
shall be committing a type I error with prob α.
The corresponding tests when the alternative hypothesis is µ < µ0 (and µ ≠ µ0) are
described below.

Alternative hypothesis H1   Reject H0 if
µ < µ0                      t < −t_{n−1,α}
µ ≠ µ0                      t < −t_{n−1,α/2} or t > t_{n−1,α/2}

where $t = \frac{\bar{X}-\mu_0}{s/\sqrt{n}}$ (n → sample size).
Example 8
A random sample of six steel beams has a mean compressive strength of 58,392 psi
(pounds per square inch) with a s.d. of 648 psi. Use this information and the level of
significance α = 0.05 to test whether the true average compressive strength of the steel
from which this sample came is 58,000 psi. Assume normality.
Solution
$$t = \frac{\bar{X}-\mu_0}{S/\sqrt{n}} = \frac{58{,}392-58{,}000}{648/\sqrt{6}} = 1.48$$
5. Decision
Since t observed = 1.48 ≤ 2.015, we cannot reject the null hypothesis. That is, we can
say the true average compressive strength is 58,000 psi.
Example 9
Test runs with six models of an experimental engine showed that they operated for
24,28,21,23,32 and 22 minutes with a gallon of a certain kind of fuel. If the prob of type I
error is to be at most 0.01, is this evidence against a hypothesis that on the average this
kind of engine will operate for at least 29 minutes per gallon with this kind of fuel?
Assume normality.
Solution
1. Null hypothesis H 0 : µ ≥ µ 0 = 29
Alt hypothesis: H 1 : µ < µ 0
2. Level of significance ≤ α = 0.01
3. Criterion: Reject the null hypothesis if $t < -t_{n-1,\alpha} = -t_{5,0.01} = -3.365$ (note n = 6),
where $t = \frac{\bar{X}-\mu_0}{S/\sqrt{n}}$.
4. Calculations
$$\bar{X} = \frac{24+28+21+23+32+22}{6} = 25$$
$$S^{2} = \frac{1}{6-1}\left[(24-25)^{2} + (28-25)^{2} + (21-25)^{2} + (23-25)^{2} + (32-25)^{2} + (22-25)^{2}\right] = 17.6$$
$$\therefore\; t = \frac{25-29}{\sqrt{17.6/6}} = -2.34$$
5. Decision
Since t obs = −2.34 ≥ −3.365, we cannot reject the null hypothesis. That is, we can
say that this kind of engine will operate for at least 29 minutes per gallon with this
kind of fuel.
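The calculations of Example 9 can be reproduced from the raw data. A minimal sketch: the helper name `t_statistic` is made up, and the critical value −3.365 is copied from the t table (Python's standard library has no t-distribution).

```python
import math
import statistics

def t_statistic(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), with s the sample s.d. (divisor n - 1)."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)
    return (mean - mu0) / (s / math.sqrt(n))

minutes = [24, 28, 21, 23, 32, 22]      # the six test runs
t = t_statistic(minutes, 29)
t_crit = -3.365                         # -t_{5, 0.01} from the table
reject = t < t_crit

print(f"t = {t:.2f}, reject H0: {reject}")
```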
Example 10
A random sample from a company’s very extensive files shows that orders for a certain
piece of machinery were filled, respectively in 10,12,19,14,15,18,11 and 13 days. Use the
level of significance α = 0.01 to test the claim that on the average such orders are filled
in 10.5 days. Choose the alternative hypothesis so that rejection of the null hypothesis
µ = 10.5 indicates that it takes longer than indicated. Assume normality.
Solution
$$\bar{X} = \frac{10+12+19+14+15+18+11+13}{8} = 14$$
$$S^{2} = \frac{1}{8-1}\left[(10-14)^{2} + (12-14)^{2} + (19-14)^{2} + (14-14)^{2} + (15-14)^{2} + (18-14)^{2} + (11-14)^{2} + (13-14)^{2}\right] = 10.29$$
$$\therefore\; t = \frac{14-10.5}{\sqrt{10.29/8}} = 3.09$$
5. Decision
Since t observed = 3.09 > 2.998 , we have to reject the null hypothesis .That is we can
say on the average, such orders are filled in more than 10.5 days.
Example 11
Solution
Example 12
In 64 randomly selected hours of production, the mean and the s.d. of the number of
acceptable pieces produced by an automatic stamping machine are
X = 1,038 and S = 146. At the 0.05 level of significance, does this enable us to reject the
null hypothesis µ = 1000 against the alt hypothesis µ > 1000 ?
Solution
REGRESSION AND CORRELATION
Regression
Although it is desirable to predict the quantity exactly in terms of the others, this is
seldom possible and in most cases, we have to be satisfied with predicting average or
expected values. Thus we would like to predict the average sales in terms of the money
spent on advertising, the average income of a college student in terms of the number of
years he/she has been out of the college.
Thus given two random variables X, Y and given that X takes the value x, the basic
problem of bivariate regression is to determine the conditional expected value E(Y|x) as a
function of x. In most cases, we may find that E(Y|x) is a linear function of x:
E(Y|x) = α + βx, where the constants α, β are called the regression coefficients.
Denoting E(X) = µ1, E(Y) = µ2, $\sqrt{\mathrm{Var}(X)} = \sigma_1$, $\sqrt{\mathrm{Var}(Y)} = \sigma_2$, cov(X,Y) = σ12,
$\rho = \frac{\sigma_{12}}{\sigma_1 \sigma_2}$, we can show:
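squares”. The method of least squares says: choose the constants a and b for which the
sum of the squares of the “vertical deviations” of the sample points (xi, yi) from the line
y = a + bx is a minimum. I.e. find a, b so that
$$T = \sum_{i=1}^{n} \left[y_i - (a + b x_i)\right]^{2}$$
is a minimum. Using 2-variable calculus, we should determine a, b so that
$\frac{\partial T}{\partial a} = 0$ and $\frac{\partial T}{\partial b} = 0$. Thus we get the following two equations:
$$\sum_{i=1}^{n} (-2)\left[y_i - (a + b x_i)\right] = 0 \quad\text{and}\quad \sum_{i=1}^{n} (-2x_i)\left[y_i - (a + b x_i)\right] = 0.$$
These simplify to
$$na + \left(\sum_{i=1}^{n} x_i\right) b = \sum_{i=1}^{n} y_i$$
$$\left(\sum_{i=1}^{n} x_i\right) a + \left(\sum_{i=1}^{n} x_i^{2}\right) b = \sum_{i=1}^{n} x_i y_i$$
Solving we get
$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^{2} - \left(\sum_{i=1}^{n} x_i\right)^{2}}\,; \qquad a = \frac{\sum_{i=1}^{n} y_i - \left(\sum_{i=1}^{n} x_i\right) b}{n}.$$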
These constants a and b are used to estimate the unknown regression coefficients α , β.
Now if x = xg, we predict y as yg = a + bxg.
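The formulas for a and b translate directly into code. The sketch below is an illustration only: the function name `least_squares_line` and the four data points are made up (they lie exactly on y = 1 + 2x, so the fit should recover a = 1 and b = 2).

```python
def least_squares_line(xs, ys):
    """Return (a, b) of the least squares line y = a + b x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

# Hypothetical data lying exactly on the line y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = least_squares_line(xs, ys)

x_g = 5
y_g = a + b * x_g          # prediction at x = x_g
print(a, b, y_g)
```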
Problem 1.
Various doses of a poisonous substance were given to groups of 25 mice and the
following results were observed:
(a) Find the equation of the least squares line fit to these data
(b) Estimate the number of deaths in a group of 25 mice who receive a 7 mg dose of
this poison.
Solution:
Thus the least square line that fits the given data is: y = -6.536 + 1.625 x
Problem 2:
The following are the scores that 12 students obtained in the midterm and final
examinations in a course in Statistics:
(a) Fit a straight line to the above data
(b) Hence predict the final exam score of a student who received a score of 84 in the
midterm examination.
Solution:
Thus the least square line that fits the given data is: y = 31.609 + 0.5816 x
Correlation
$$\rho = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}}.$$
(a) -1 ≤ ρ ≤ 1
(b) If Y is a linear function of X, ρ = ± 1
(c) If X and Y are independent, then ρ = 0
(d) If X, Y have bivariate normal distribution and if ρ = 0, then X and Y are
independent.
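If {(x1, y1), (x2, y2), …, (xn, yn)} is a random sample of size n from the 2-dimensional
random variable (X, Y), then the sample correlation coefficient, r, is defined by
$$r = \frac{\sum_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i-\bar{x})^{2}\;\sum_{i=1}^{n} (y_i-\bar{y})^{2}}} = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}},$$
where
$$S_{xx} = \sum_{i=1}^{n} (x_i-\bar{x})^{2} = \sum_{i=1}^{n} x_i^{2} - \frac{\left(\sum_{i=1}^{n} x_i\right)^{2}}{n}, \qquad S_{yy} = \sum_{i=1}^{n} (y_i-\bar{y})^{2} = \sum_{i=1}^{n} y_i^{2} - \frac{\left(\sum_{i=1}^{n} y_i\right)^{2}}{n},$$
$$S_{xy} = \sum_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}.$$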
Problem 3.
Calculate r for the data { (8, 3), (1, 4), (5, 0), (4, 2), (7, 1) }.
Solution
x̄ = 25/5 = 5, ȳ = 10/5 = 2.
$$\sum_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y}) = 3 \times 1 + (-4) \times 2 + 0 \times (-2) + (-1) \times 0 + 2 \times (-1) = -7$$
$$\sum_{i=1}^{n} (x_i-\bar{x})^{2} = 9 + 16 + 0 + 1 + 4 = 30$$
$$\sum_{i=1}^{n} (y_i-\bar{y})^{2} = 1 + 4 + 4 + 0 + 1 = 10$$
$$\text{Hence}\quad r = \frac{-7}{\sqrt{(30)(10)}} = -0.404.$$
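The same computation can be sketched in Python using the data of Problem 3. The function name `sample_correlation` is an assumption made for illustration.

```python
import math

def sample_correlation(points):
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x, _ in points)
    s_yy = sum((y - y_bar) ** 2 for _, y in points)
    return s_xy / math.sqrt(s_xx * s_yy)

r = sample_correlation([(8, 3), (1, 4), (5, 0), (4, 2), (7, 1)])
print(f"r = {r:.3f}")
```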
Problem 4.
The following are the measurements of the air velocity and evaporation coefficient of
burning fuel droplets in an impulse engine:
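Solution.
$$S_{xx} = \sum_{i=1}^{n} (x_i-\bar{x})^{2} = \sum_{i=1}^{n} x_i^{2} - \frac{\left(\sum_{i=1}^{n} x_i\right)^{2}}{n} = 532000 - \frac{(2000)^{2}}{10} = 132000$$
$$S_{yy} = \sum_{i=1}^{n} (y_i-\bar{y})^{2} = \sum_{i=1}^{n} y_i^{2} - \frac{\left(\sum_{i=1}^{n} y_i\right)^{2}}{n} = 9.1097 - \frac{(8.35)^{2}}{10} = 2.13745$$
$$S_{xy} = \sum_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 2175.4 - \frac{(2000)(8.35)}{10} = 505.4$$
$$\text{Hence}\quad r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{505.4}{\sqrt{(132000)(2.13745)}} = 0.9515.$$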
**************
A Review of
STATISTICS
and
PROBABILITY
M. GANESH
(Professor, Mathematics Group)
In these notes, you are going to study and learn Statistics and
Probability, presented at the basic level.
Probability and Statistics are like cousin sisters. They go hand in hand
and they enable and synergize each other. We will see more of this as we
proceed into the actual study of these topics.
In these notes, written especially for YOU, you will study and learn
the basics or the fundamentals of Statistics and Probability. These ideas will
be illustrated with appropriately chosen examples.
As in the case of Calculus notes, I will take you through a short and
smooth journey of Statistics and Probability, which will enable you to learn
the basic ideas painlessly and with clarity, so that you can embark on a
happy journey of your course through the prescribed Text Book.
So here are my BEST WISHES for a happy journey into the realm of
Statistics and Probability.
M. Ganesh
Table of Contents
1. Prologue
2. Chapter 1: Basic Statistics
3. Chapter 2: Standard Distributions
4. Chapter 3: Population Vs Sampling
5. Chapter 4: Estimation
6. Chapter 5: Correlation and Regression
7. Epilogue
PROLOGUE
1. Statistics, as was pointed out, deals with raw data; that is,
with numbers. It gives you methods for arranging the raw data into
manageable collections, and how to extract required information
about the data set.
occurrence of an event (and as a corollary, the businessman’s
chances of winning). Nowadays, we call it by the name
‘probability’.
Chapter 1
Basic Statistics
1.1 : Introduction
In this Chapter, we will study and learn certain of the above concepts
which are quite popular and often used.
Such a set of values is also called raw data. (Each value is known as a
datum and the collection of values as data.) The data are called raw because
they do not have any ‘meaning’ or ‘significance’ on their own, other than their
numeric values. Their ‘meaning’ depends on the context to which they are applied.
We illustrate this by two examples.
Example (1): Suppose you are told that this set of values represents the
number of runs scored by a batsman in 50 consecutive innings. Then, the
data set acquires a meaning and we feel it is meaningful.
Example (2): Again, suppose you are told that this set represents the
quantity of rainfall (in mm) in a certain place for a period of 50 months (the
rainfall is noted down on the same date of 50 consecutive months). Then the
same data set acquires a meaning, even though a different one.
Given the data set of Section 1.2, we compute the mean or average
(denoted by µ or x̄) as follows: Add all the 50 values and divide by 50. This
gives
Sum = 1566
Mean = Sum / 50 = 31.32
Example (3): Let us go back to Example (1). In this interpretation the mean
value of 31.32 represents the ‘average number of runs scored by the batsman
per innings’.
Example (4): Let us go back to Example (2). In this interpretation the mean
value of 31.32 represents the ‘average quantity of rainfall (in mm) in that
place per day’.
Figure 1 (plot of the data values, with a horizontal line marking the mean value)
Example (5): Consider the following data which represent the length (in cm)
of the index finger (right hand) of 30 men:
6.2, 6.5, 5.8, 7.2, 7.0, 5.9, 6.8, 6.4, 6.0, 6.1,
5.6, 5.8, 7.2, 7.5, 7.0, 6.9, 6.3, 6.1, 6.2, 6.0,
7.1, 7.1, 6.9, 6.5, 6.8, 6.6, 7.0, 7.9, 5.4, 6.3.
Let us determine the mean.
Sum = 197 and N = 30;
Therefore, mean = Sum / N = 6.57.
You should draw a diagram like the one shown above. Take the 30 numbers
1 to 30 along the X – axis and the data values along the Y – axis. Plot the
points and join them in sequence. Draw the horizontal line indicating the
mean. Now, answer the questions: What can you infer from this Figure?
Does it convey any meaningful information to you? Write down your
answers and compare with those of your friends.
1.4: The Mode of the Data Set
Consider the same data set given in the Section 1.2. Pick out the
maximum value / values. This maximum value represents the mode or
modal value of the data set. For this data set, the mode is 129. This value of
the mode conveys the information that the set of values in the data set cannot
go beyond the modal value. This is the peak value. A diagram like the one
given below conveys this simple idea. See Figure 2.
Figure 2
Example (6): Let us go back to Example (1). The modal value indicates the
highest number of runs scored by the batsman during the period. This he
scored in his 12th innings.
Example (7): Let us go back to Example (2). Here the modal value indicates
the maximum quantity of rainfall in that place during the period. The
maximum rainfall occurred on the 12th month.
Example (8): Let us go back to Example (5). The highest value is 7.9. So,
the modal value of this data set is 7.9.
In Examples (6), (7) and (8) the modal value occurs only once. Such a
data set or its plot is said to be unimodal. Figure 2 shows a unimodal plot.
Generally, the modal value may occur several times. In such cases, we
say that the data set or its plot is multi-modal. Its graph or plot will look
like the one shown in Figure 3.
Figure 3 (plot with the repeated modal value marked)
The value (which may or may not belong to the data set) which
bifurcates the data set (that is, divides it into two equal halves) is
called the median or median value. To be precise, arrange the values of the
data set in increasing order, and look at the middle value (or values). This
value (or the average of the two middle values) gives the median value of the data set.
Example (9): For a given data set, we have the sorted values in ascending
order as:
03, 10, 16, 18, 20, 21, 23, 25, 25, 29,
34, 34, 34, 34, 35, 35, 41, 41, 44, 44,
46, 46, 47, 47, 48, 50, 50, 50, 52, 52,
56, 57, 57, 59, 62, 63, 65, 66, 66, 69,
70, 72, 75, 77, 78, 80, 82, 82, 83, 96.
Since there are 50 values in the data set, we look at the 25th and 26th values.
Since the 25th and 26th values are 48 and 50 respectively, the median is given
by the average of these two values; that is by [48 + 50] / 2 = 49. Note
that this value 49 of the median does not belong to the data set.
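The median of Example (9) can be checked with Python's `statistics.median`, which averages the two middle values when the number of values is even. This is only a sketch; the variable names are illustrative.

```python
import statistics

# The 50 sorted values of Example (9)
data = [
    3, 10, 16, 18, 20, 21, 23, 25, 25, 29,
    34, 34, 34, 34, 35, 35, 41, 41, 44, 44,
    46, 46, 47, 47, 48, 50, 50, 50, 52, 52,
    56, 57, 57, 59, 62, 63, 65, 66, 66, 69,
    70, 72, 75, 77, 78, 80, 82, 82, 83, 96,
]

median = statistics.median(data)            # average of the 25th and 26th values
below = sum(1 for v in data if v < median)  # values under the median line
above = sum(1 for v in data if v > median)  # values over the median line
print(median, below, above)
```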
Now, let us plot the given data set in increasing order
and draw the horizontal line at 49. (The student should do this.) You will get
a plot like Figure 4. You will find that exactly 25 values (the first 25 values)
lie below this line and exactly 25 (the last 25 values) lie above this line.
Figure 4 (sorted data with a horizontal line marking the median)
Let us digress briefly from our mainstream ideas to discuss another
important concept associated with any statistical analysis of a data set. It is
called the frequency table. This idea is an important one and will reappear
in the later Chapters.
Table 1
Datum 00 01 02 03 04 05 06 07 08 09
Freq. 0 0 0 1 0 0 0 0 0 0
Datum 10 16 18 20 21 23 25 29 34 35
Freq. 1 1 1 1 1 1 2 1 4 2
Datum 41 44 46 47 48 50 52 56 57 59
Freq. 2 2 2 2 1 3 2 1 2 1
Datum 62 63 65 66 69 70 72 75 77 78
Freq 1 1 1 2 1 1 1 1 1 1
Datum. 80 82 83 96
Freq 1 2 1 1
The total of these frequencies is 50, as it should be. This frequency table can
be further compressed and expressed in a compact manner as shown below.
See Table 2.
Table 2
Class Frequency
00 -- 09 01
10 -- 19 03
20 -- 29 06
30 -- 39 06
40 -- 49 09
50 -- 59 09
60 -- 69 06
70 -- 79 05
80 -- 89 04
90 -- 99 01
Total 50
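A grouped frequency table like Table 2 can be built by counting how many values fall in each class of width 10. The sketch below uses `collections.Counter` on the data of Example (9); the idea of keying each class by `value // 10` is just one convenient choice.

```python
from collections import Counter

# The 50 values of Example (9)
data = [
    3, 10, 16, 18, 20, 21, 23, 25, 25, 29,
    34, 34, 34, 34, 35, 35, 41, 41, 44, 44,
    46, 46, 47, 47, 48, 50, 50, 50, 52, 52,
    56, 57, 57, 59, 62, 63, 65, 66, 66, 69,
    70, 72, 75, 77, 78, 80, 82, 82, 83, 96,
]

# Class k covers the values 10k .. 10k + 9
counts = Counter(v // 10 for v in data)

for k in range(10):
    print(f"{10 * k:02d} -- {10 * k + 9:02d}  {counts[k]:02d}")
print("Total", sum(counts.values()))
```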
1.7: Histogram
Histogram for Example (11)
This is another visual aid based on the frequency table, which helps in
taking decisions and assessing about profit / loss, above average / below
average, healthy / sick , etc.
Once we are given a data set and we have found the mean, then the
variance is determined based on the view point taken by us or the
information provided or available to us.
If we have information that the given data set represents a sample
from a population, then we compute the variance from the formula
$$\mathrm{Var} = \frac{\sum_k (x_k - m)^{2}}{N - 1}$$
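The standard deviation of the data set is given by the positive square
root of the variance. In the first case, where the data is considered as a population,
it is denoted by σ and in the second case it is denoted by s. That is, we have
$$\sigma = \sqrt{\frac{\sum_k (x_k - m)^{2}}{N}} \quad\text{and}\quad s = \sqrt{\frac{\sum_k (x_k - m)^{2}}{N - 1}}$$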
Example: Let us go back to Example (5). We have the length (in cm) of the
index finger (right hand) of 30 men:
6.2, 6.5, 5.8, 7.2, 7.0, 5.9, 6.8, 6.4, 6.0, 6.1,
5.6, 5.8, 7.2, 7.5, 7.0, 6.9, 6.3, 6.1, 6.2, 6.0,
7.1, 7.1, 6.9, 6.5, 6.8, 6.6, 7.0, 7.9, 5.4, 6.3.
We have already found the mean m for this data. It is 6.57. So we
compute the sum of the squared deviations of the data from the mean:
(0.37)² + (0.07)² + (0.77)² + (0.63)² + (0.43)² + (0.67)² + (0.23)² + (0.17)² +
(0.57)² + (0.47)² + (0.97)² + (0.77)² + (0.63)² + (0.93)² + (0.43)² + (0.33)² +
(0.27)² + (0.47)² + (0.37)² + (0.57)² + (0.53)² + (0.53)² + (0.33)² + (0.07)² +
(0.23)² + (0.03)² + (0.43)² + (1.33)² + (1.17)² + (0.27)² = 10.363
We close with two formulas for computing the variance and the
standard deviation from a frequency table. So, we start with a frequency table
whose columns are xk and fk. Here xk is the mid point of the class interval
and fk is the corresponding frequency. Let the mean be m. Then we have the
formulas:
Exercises:
1. Consider the data set of Section 1.2. Compute the variance and the SD.
2. For the data set of Example 9, compute the variance and the SD.
3. For the frequency Table (2) of Section 1.6, compute the variance and the
SD.
Chapter 2
STANDARD DISTRIBUTIONS
2.1: Introduction
Table – 1
Class Frequency
20-25 10
25-30 6
30-35 4
35-40 4
40-45 6
45-50 7
50-55 2
55-60 1
n = 40
Now, for each class, we can compute the relative frequency; for
example, for the class 20-25, the relative frequency is 10/40 = 1/4. Similarly,
for the class 25-30, the relative frequency is 6/40 = 3/20, and so on. We can
now tabulate this as follows. See Table – 2 given below.
Table – 2
Note that the sum of all relative frequencies is 1. Also, observe that
if we take another set of 40 employees of the same company and form the
relative frequency distribution, then we will get a completely different table.
Thus, tables like Table – 2 depend on the set of data. But, still, the sum of
all relative frequencies will be 1.
This gives rise to the question: Are there distributions which do not
depend on the data? Can we define them independent of any data? If so,
how?
Such questions are very natural to ask, but not so very easy to answer,
though answers are available to all of them. The answer is yes, and
that is precisely what we are going to study in this Chapter.
2.2: Discrete versus Continuous
The data arising from a real – life situation can be of either one of the
following two categories: discrete and continuous.
2.3.1: Binomial Distribution
For such situations, the chances of getting k successes out of n trials
can be calculated from the formula:
Solution: One way of doing this is as follows: take this pack and test each
one of the 100 bulbs! If we come across exactly 20 defective bulbs, then the
probability is 1; otherwise, it is zero. Of course, this is a straightforward
method.
But if the company wants to know this probability for 1000 packs of
100 bulbs each, then this method would be cumbersome and problematic.
So, we need methods to deal with such situations. Let us look at this
situation as a situation with two outcomes – either a bulb is good or
defective (note that it cannot be both or neither!). All the above given 5
conditions for a Binomial distribution are satisfied. (Verify this). So, we are
in a binomial situation. Thus, we can compute the probability, if we can find
n, k and p. We have here n = 100 and k = 20. But what is p? It is not given.
So, assume for the moment that p = 0.31. Then the required probability is
P(x = 20), whose value is given by
$$\frac{100!}{20!\,80!}\,(0.31)^{20}\,(0.69)^{80}.$$
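This expression is tedious to evaluate by hand but simple in Python. A minimal sketch, keeping the assumed value p = 0.31 from the text; the function name `binomial_pmf` is made up, and `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(k, n, p):
    """P(x = k) for a binomial distribution with parameters n and p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Exactly 20 defective bulbs in a pack of 100, with the assumed p = 0.31
prob = binomial_pmf(20, 100, 0.31)
print(f"{prob:.4f}")
```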
Note that, this probability is same for every pack of 100 bulbs of this
company. This also means that the chance of finding exactly 20 defective
bulbs in every pack of 100 bulbs is only 0.0046, which is very small. So, the
ADC company can be happy that the quality control is up to their
expectations.
Another method is to simply assume a value for p (note that the value
assumed should be between 0 and 1) which would be “realistic”. For
example, when USA sent a man into space for the first time, the
probability that the rocket would function normally or the probability that he
would survive in the unknown conditions of space or the probability that
he would land safely were not known, as there were no past records available
(naturally). Also you cannot send 100 people into space to collect the
required data. Similarly, what will happen after a nuclear holocaust is
anybody’s guess. So, we cannot know the probability of survival of a human
being or a nation or a species or oxygen in the atmosphere. And it will be the
height of stupidity to conduct experiments regarding this. Under such
circumstances, either we assume a value for the probability which is
“realistic” or we admit that probability theory is NOT applicable to such
situations.
It might appear to you that these are very extreme cases, which are
rare. If that is the case, consider the following examples from real life:
(i) A production company is thinking of launching a new product. In this
case, the probability that this new product will out beat its rivals or the
probability that it will garner 40% of the market, cannot be determined (that
is, not known a priori);
(ii) The government of a country is thinking of launching a new program for
agriculture. The probability of its success cannot be determined (that is, not
known a priori);
(iii) An educational institution is considering the introduction of a new
method for teaching. The probability of its success cannot be determined
(that is, not known a priori);
(iv) You are preparing to appear in an examination. The probability of your
success cannot be determined (that is, not known a priori). Even your past
records regarding the other examinations you have taken, are of no use here.
All these and many other examples are encountered quite often in real
life. So, even in such familiar situations, determining the probability is either
difficult or might not be possible.
This aspect should be kept firmly in the mind, when dealing with
applications of probability to real life situations.
End of Digression
--------------------------------------------------------------------------------------------
Example: 2: In a certain experiment, 6 rabbits are given a drug. It is known
that one-fifth of all rabbits which are given the drug develop certain
symptoms. Let us determine the chances that 4 out of these 6 rabbits develop
the symptoms.
It is given that one-fifth of all rabbits which are given the drug
develop symptoms. Thus
p = 1/5 = 0.2
From the tables, we get this value as 0.015. This means that, if we
repeat the above experiment many times, we will observe 4 rabbits
developing the symptoms about 1.5% of the time.
Example: 3: Go back to the above example 2. Let us now ask: what are the
chances that at most 2 rabbits develop symptoms.
Solution: Now, it may happen that (i) none of the rabbits develops
symptoms, that is, k = 0; or (ii) one of the rabbits develops symptoms, that
is, k = 1; or (iii) two of the rabbits develop symptoms, that is, k = 2. This is
the meaning of saying that “at most 2 rabbits develop symptoms”.
= 0.9011.
In other words, the chances are very high (about 90%) that at most 2
rabbits develop symptoms.
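Both rabbit examples (exactly 4, and at most 2, out of 6 with p = 0.2) can be checked without tables. A sketch only; the helper names `binomial_pmf` and `binomial_cdf` are made up for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(x = k) for a binomial distribution with parameters n and p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(x <= k): add up P(x = 0), P(x = 1), ..., P(x = k)."""
    return sum(binomial_pmf(j, n, p) for j in range(k + 1))

p_exactly_4 = binomial_pmf(4, 6, 0.2)     # Example 2
p_at_most_2 = binomial_cdf(2, 6, 0.2)     # Example 3
print(f"{p_exactly_4:.4f} {p_at_most_2:.4f}")
```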
Just as we have seen, for every data set we can compute the mean and
the variance, in a similar manner we can compute the mean and the variance
for every distribution. Note that a binomial distribution involves two
parameters n and p.
σ² = n p (1 – p)
σ² = 21.39
EXERCISES
Note: Keep your Text Book by your side (or any book will do)
Caution: In certain books, tables are given only for cumulative
distributions. Use it with caution, as we are dealing with the probability
directly.
1. An urn (or a box) contains 4 red and 6 blue balls. A ball is drawn at
random, its color noted and replaced before next drawing. Drawing a
red ball is considered as success. What are the chances of getting a red
ball in n number of drawings?
2. Two teams play a five game series. The chances of the home team
winning a particular game is 0.55. What are the chances that the home
team wins at least 3 games?
3. Of every 1000 parts produced by a machine, 10 are defective, on the
average. What are the chances that some, but not all, of a sample of
three of these parts turn out to be defective.
4. A shopkeeper has been getting over Rs. 200 a day, on the average, for
eight days out of every ten days over the past several months. What
are his chances of getting the same turn-over at least five out of the
next six days?
Note that, the examples (1) to (5) involve time period and examples
(5) to (8) involve space (length or area or volume).
1. Events which occur in one time interval do not depend on the
happening or non - happening of those occurring in any other non
over-lapping time interval.
2. The probability that an event occurs is proportional to the length of
the time or space units.
3. The probability that two or more events occur in a very small time or
space unit is supposed to be small enough that it can be neglected.
(2) P(x = k) = e^(−m) m^k / k!
The mean (µ) and the variance (σ2) can be easily derived. They are
given by
Mean of the Poisson distribution = µ = m
Variance of the Poisson distribution = σ2 = m.
Note : (1) The sum of all probabilities is 1. That is, if P(x = k) is as in
equation (1) then Σ P(x = k) = 1.
(2) The probability of having n or fewer occurrences (or
successes) is given by P(x ≤ n) = Σ P(x = k), where the sum is taken from
k = 0 to k = n.
(3) The probability of having n or more occurrences is
given by P(x ≥ n) = 1 − Σ P(x = k), where the sum is taken from k = 0 to
k = n – 1.
Example: 5: If a person receives 5 calls on the average during a day, what is
the probability that he will receive fewer than 5 calls tomorrow?
Solution: As discussed earlier, experience has shown that the
Poisson probability model is appropriate for this situation. The average value
m = 5 is given. Receiving fewer than 5 calls means x ≤ 4, so we need to
compute P(x ≤ 4). This is given by
P(x ≤ 4) = Σ P(x = k) (sum from k = 0 to k = 4) = 0.44049
(The value can be found from the table in any statistics book);
P(x = 5) = p(5) = 0.17547
Example: 6: A secretary claims that she averages one error per page. A
sample page is selected at random from some of her work, and five errors
are found. What is the probability of her making five or more errors on a
page if her claim is correct?
Solution: Assuming that the Poisson process is appropriate, we take m = 1
per page. The required probability is given by P(x ≥ 5). Thus we have
P(x ≥ 5) = 1 – P(x ≤ 4) = 1 – 0.9963 = 0.0037.
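The table values used in Examples 5 and 6 can also be obtained by evaluating the Poisson formula directly. The following Python sketch does this; the function names are illustrative choices and not part of the text.

```python
from math import exp, factorial

def poisson_pmf(k, m):
    """P(x = k) for a Poisson distribution with mean m."""
    return exp(-m) * m**k / factorial(k)

def poisson_cdf(n, m):
    """P(x <= n): sum of P(x = k) from k = 0 to k = n."""
    return sum(poisson_pmf(k, m) for k in range(n + 1))

# Example 5: m = 5 calls per day
p_fewer_than_5 = poisson_cdf(4, 5)
p_exactly_5 = poisson_pmf(5, 5)

# Example 6: m = 1 error per page, five or more errors
p_five_or_more = 1 - poisson_cdf(4, 1)

print(round(p_fewer_than_5, 5), round(p_exactly_5, 5), round(p_five_or_more, 4))
```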
In general, such problems do not end with the computation of the
required probability. The value must also be interpreted in the
given context. This is done as follows: The value of the probability
is very small, indicating that the secretary is an exceptional one (that is, she
makes very few errors!). But if she is not a very experienced person in this
area, then, in view of the small value of the probability, we may conclude
one of the following:
1. The Poisson model is correct and a near miracle has occurred;
2. The model is correct but the wrong average value m has been claimed;
3. The model is incorrect.
Conclusion (2) is probably the most plausible in this case.
EXERCISES
(1) A city has on the average, five traffic deaths per month. What is the
probability that this average is exceeded in any given month?
(2) A taxicab company has, on the average, 10 flat tyres per week. During
the past week they had 20. Assuming the Poisson model is
appropriate, what is the probability of having 20 or more flats during a
week? Would you suspect foul play?
Theoretically, although it may not be apparent from figures, the
curve never touches the X – axis. However, it approaches it so closely that
for practical purposes the area lying farther than ± 3σ from the mean µ can
be ignored without any loss.
Example:7: Determine standard scores for x = 18.3, 27.9, 34.4, 39.3 in the
normal distribution for which µ = 30.1 and σ = 2.4.
Solution:
For x = 18.3, z = (18.3 − 30.1) / 2.4 = −4.92
For x = 27.9, z = (27.9 − 30.1) / 2.4 = −0.92
For x = 34.4, z = (34.4 − 30.1) / 2.4 = 1.79
For x = 39.3, z = (39.3 − 30.1) / 2.4 = 3.83
(6) x = µ + z σ.
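The conversion between a value x and its standard score z, in both directions, can be sketched in a few lines of Python. The function names are illustrative; the data are those of Example 7.

```python
def standard_score(x, mu, sigma):
    """z = (x - mu) / sigma"""
    return (x - mu) / sigma

def from_standard_score(z, mu, sigma):
    """x = mu + z * sigma, the inverse conversion of equation (6)."""
    return mu + z * sigma

mu, sigma = 30.1, 2.4
values = [18.3, 27.9, 34.4, 39.3]
scores = [round(standard_score(x, mu, sigma), 2) for x in values]
print(scores)

# Converting a standard score back recovers the original value.
recovered = from_standard_score(standard_score(34.4, mu, sigma), mu, sigma)
print(recovered)
```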
The fact that any normal distribution can be related to the standard
normal distribution is of central importance. Because of this, the standard
normal distribution can be studied in detail and the results transferred to any
normal distribution. A table of cumulative values of the standard normal
distribution is usually given at the end of every statistics book.
Now in this table, the entries on the left and top give the value of z,
usually to two decimal places.
The integer part and the first decimal place are given in the column at
the left, and the second decimal place in the top row. The entries in the body
of the table are the areas under the normal curve between the mean (0) and
the given value z, correct to four decimal places. (Caution: In some books
the area is given from −∞ to the given value of z.)
For instance, if z = 1.62, to find the corresponding area, look down the
left column to find 1.6 and look along the top row to find 0.02. Then, the
entry which is in both the row of 1.6 and the column of 0.02 is 0.4474. This
means that the area under the standard normal curve between 0 and 1.62 is
0.4474. This gives the value of the probability P( 0 ≤ z ≤ 1.62).
You should see the graphs of these from your Text Book or from
some good book.
Finally, the area between –1.62 and 1.62 is twice the area between 0
and 1.62 or 0.8948. This is interpreted as: the probability that the variable of
the standard normal distribution has a value between – 1.62 and 1.62 is equal
to 0.8948. This is illustrated by the following
Example:11: Find the area under the standard normal curve between 0 and
z if z = 0.07, 0.83, 1.70, 2.56, −0.24, −1.12, −3.01
Solution:
z        Area between 0 and z
0.07     0.0279
0.83     0.2967
1.70     0.4554
2.56     0.4948
−0.24    0.0948
−1.12    0.3686
−3.01    0.4987
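When a printed table is not at hand, the same areas can be computed. The following Python sketch uses NormalDist from the standard library statistics module (Python 3.8 or later); the helper name area_from_mean is an illustrative choice. Because the curve is symmetric, the area for a negative z equals the area for the corresponding positive z.

```python
from statistics import NormalDist

def area_from_mean(z):
    """Area under the standard normal curve between 0 and z (always positive)."""
    return abs(NormalDist().cdf(z) - 0.5)

for z in (0.07, 0.83, 1.70, 2.56, -0.24, -1.12, -3.01):
    print(z, round(area_from_mean(z), 4))

# The area between -1.62 and 1.62 is twice the area between 0 and 1.62.
print(round(2 * area_from_mean(1.62), 4))
```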
Points to Remember:
1. The area under the curve is always positive.
2. The entries on the edge of the table (left and top) represent standard
deviations distance from the mean (standard scores).
3. The entries in the body of the table represent areas under the standard
normal curve between the mean and the given standard score (z –
value).
Example: 12: Find the standard scores for which the area under the standard
normal curve between it and the mean is 0.2019, 0.4908, 0.3621
Solution: From the table we can obtain the values of z
Area z
0.2019 0.53
0.4908 2.36
0.3621 1.09
Example:13: Find the area under the normal curve between z = −1.34 and z =
0.57, and between z = 0.59 and z = 1.27.
Solution: For z = −1.34 and z = 0.57, the values of the areas from the table
are respectively 0.4099 and 0.2157. Since these are on opposite sides of
the mean, they should be added together. Thus the area between z = −1.34
and z = 0.57 under the normal curve is 0.6256. For z = 0.59 and z = 1.27 the
corresponding areas are 0.2224 and 0.3980. Since they are on the same side
of the mean, their difference is the desired area. Thus the area under the normal
curve between z = 0.59 and z = 1.27 is 0.1756.
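The add-or-subtract rule of Example 13 and the reverse lookup of Example 12 can both be sketched with the cumulative distribution function. The code below is a minimal illustration using Python's statistics.NormalDist; the function names are assumptions made for this sketch, and the results may differ from table values in the fourth decimal place because the table entries are rounded.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_between(z1, z2):
    """Area under the standard normal curve between two standard scores."""
    lo, hi = sorted((z1, z2))
    return std_normal.cdf(hi) - std_normal.cdf(lo)

def z_for_area_from_mean(area):
    """Positive z for which the area between the mean and z equals `area`."""
    return std_normal.inv_cdf(0.5 + area)

print(round(area_between(-1.34, 0.57), 4))
print(round(area_between(0.59, 1.27), 4))
print(round(z_for_area_from_mean(0.2019), 2))
```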
EXERCISES
2. Find the values of x, for the following standard scores, in a normal
distribution with mean µ = 10.4 and variance σ2 = 11.8
5. Find the area under the standard normal curve between z and – z if
(a) z=1
(b) z = 1.96
(c) z = 1.28
6. Find the value of z for which 0.1230 of the area under the standard
normal curve lies to the right of z.
7. The mean of a normal distribution is 100. If the probability that the
variable assumes a value greater than 121.0 is .1446, what is the
standard deviation of the distribution?
8. A normal distribution has a standard deviation of 134. The probability
that the variable takes a value less than 1072 is 0.7734. What is the
mean of the distribution?
2.4.2: Applications of the Normal Distributions
The areas under the normal curve associated with z1 and z2 are 0.2734 and
0.3944. The area between 48 and 50, then is 0.3944 – 0.2734 = 0.1210.
Thus P(48 ≤ x ≤ 50) = 0.1210, where x is the age of the worker.
Here z = (15.5 − 10) / 3.14 = 1.75 and the associated area is 0.4599. So,
P(x > 15) = 0.0401 (why?).
Example:18: In a certain high rent district, the monthly rental for apartments
is approximately normally distributed with a mean of Rs 384.22 and a
standard deviation of 126.40. Above what value is the highest 30 percent of
the monthly rentals in this district?
Solution: Although the variable is discrete, its values are not given in
integers; so we do not apply the continuity correction. According to the table,
20% of the values lie between the mean and z for z = 0.52. Since 50% of the
values lie above the mean, 30 percent of the values lie above this point.
Thus, we have 0.52 = (x − 384.22) / 126.40 or x = 449.95.
Thus, about 30 percent of the rentals are above Rs. 449.95. A slightly more
accurate figure could be obtained with more detailed working.
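As a rough check of this example, the cut-off can be computed both from the table value z = 0.52 and from the inverse cumulative distribution function. The sketch below uses Python's statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

mean_rent, sd_rent = 384.22, 126.40

# Cut-off using the table value z = 0.52, as in the solution above.
cutoff_table = mean_rent + 0.52 * sd_rent

# Cut-off using the inverse cumulative distribution: the value below which
# 70 percent (and hence above which 30 percent) of the rentals lie.
cutoff_exact = NormalDist(mu=mean_rent, sigma=sd_rent).inv_cdf(0.70)

print(round(cutoff_table, 2), round(cutoff_exact, 2))
```

The second figure is slightly larger because the z value is not rounded to two decimal places.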
EXERCISES
2.4.3: Normal Approximations
2.5: Exponential Distributions
(7) F(x) = 1 − e^(−m x) for x ≥ 0
an event A has not occurred during the first N repetitions. Then the
probability that it will not occur during the next M repetitions is the
same as the probability that it will not occur during the first M
repetitions”. In other words, the information of no successes is forgotten
so far as subsequent developments are concerned.
Therefore,
Expected cost for Process I = (5) P(m>200) + (105) P(m ≤ 200)
= (5) Exp(- 2) + (105) [1 - Exp(- 2)]
= 91.466
By a similar computation for Process II, with m = 1 / 150, we get
Cost per fuse = 10 if m > 200
= 110 if m ≤ 200
Expected cost for Process II = (10) Exp( - 4/3) + (110) [1 - Exp ( - 4/3)]
= 83.64
Although the expected cost of Process I is slightly more than that of
Process II, we still prefer Process I, as the cost per fuse for Process II is
double that for Process I.
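The two expected costs can be reproduced with a short Python sketch. It assumes, as the term Exp(−2) suggests, that the rate for Process I is m = 1/100 and that the threshold is 200; the function and parameter names are illustrative.

```python
from math import exp

def expected_cost(rate, threshold, cost_if_longer, cost_if_shorter):
    """Expected cost when the lifetime is exponential with the given rate.

    P(lifetime > threshold) = exp(-rate * threshold).
    """
    p_longer = exp(-rate * threshold)
    return cost_if_longer * p_longer + cost_if_shorter * (1 - p_longer)

# Process I: rate 1/100 is an assumption inferred from Exp(-2) = exp(-200/100).
cost_process_1 = expected_cost(1 / 100, 200, 5, 105)
# Process II: rate 1/150, as stated in the text.
cost_process_2 = expected_cost(1 / 150, 200, 10, 110)

print(round(cost_process_1, 3), round(cost_process_2, 2))
```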
Solution: The required probability is P(x > 1 / m). We have
P(x > 1 / m) = P(x > 1 / 10) = Exp (- 10 / 10) = Exp ( - 1) . Find this value
using your calculator.
EXERCISES
APPENDIX
It can also be proved that mean = µ and variance = σ2 . Thus, the two
parameters µ and σ represent the mean and standard deviation of this
distribution.
2. Standard Normal Variate
The variable
(2) z = (x − µ) / σ
is called the standard normal variate (SNV) and has the distribution given
by
(3) y = (1 / √(2π)) exp(−z² / 2).
Note that the mean and standard deviation of this distribution are 0 and 1
respectively. A Diagram of this is shown below;
Note the following
1. When z lies between – 1.96 and 1.96, the corresponding area is .95.
We say that the probability of z lying between – 1.96 and 1.96 is 0.95
or p (−1.96 < z <1.96)= 0.95
3. Sigma Levels
Chapter 3
POPULATION VS SAMPLING
3.1: Introduction
Why should we at all sample the population? Why cannot we
consider the entire population for any observation or study? The
following reasons explain why it is necessary to do sampling.
But, for most of the practical purposes and statistical studies, sampling
is an essentially valid idea for the above said reasons.
3.2.1 Purposive Sampling
Some of the important sampling schemes covered under the
above sampling techniques are:
Our prime concern will lie in the simple random sampling which is of
great importance in sample surveys.
shuffled after each draw. The sampling units corresponding to the numbers
on the selected slips will constitute a random sample.
A random number table is given below.
295 266 413 992 979 279 795 911 317 056 244
167 952 415 451 396 720 353 561 300 269 323
707 483 340
Thus, the numbers obtained are 11, 24, 15, 13, and 03. This gives a
sample of size 5.
Remark: In this method a large number of digits are rejected and thus we
need large tables even to draw small samples. Sometimes we may exhaust
all the numbers of the table without being able to draw a sample.
This difficulty is overcome by assigning more than one number to each of
the sampling units. For instance, in Example 2 the first unit may be assigned
the numbers
1, 1 + 24, 1 + 2*24, 1 + 3*24, and so on
i.e. 1, 25, 49, 73, 97, 121 and so on
Similarly the 2nd unit may be assigned the numbers
2, 26, 50, 74, 98, 122 and so on.
Finally, the last unit may be assigned
0, 24, 48, 72, 96 and so on.
Example:3: The following table of ten random numbers of 2 digits each is
provided to the field investigator.
Table 2
34 96 61 85 49
78 50 02 27 13
How should he use this table to make a random selection of 5 plots out of
35?
Solution: In this case we shall first identify the 35 plots with the numbers 1
to 35. In the above table there are only 4 numbers (34, 02, 27 and 13) which are
less than 35, and accordingly we are not able to draw the desired sample
of size 5 from this table. In this case we shall assign more than one
number to each of the sample units, i.e. plots.
For example, the first plot will be assigned the numbers
01, 01 + 35, 01 + 70, ...
i.e. 1, 36, 71, 106, ...
Similarly the second plot is assigned the numbers
02, 02 + 35, 02 + 2 * 35, 02 + 3 * 35, ...
i.e. 2, 37, 72, 107, 142, ...
Finally, the last plot, i.e. the 35th plot, can be assigned the numbers
0, 35, 70, 105, 140, ...
If we select the first number from Table 2 and move row-wise, we get the
following table:
Random number   34  96  61  85  49  78
Plot number     34  26  26  15  14  08
Thus, the plots numbered 08, 14, 15, 26 and 34 constitute the
desired sample. (Note that repetitions have to be discarded).
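Assigning the numbers k, k + 35, k + 70, ... to plot k amounts to taking the remainder of each random number on division by 35, with remainder 0 standing for the last plot. The following Python sketch applies this rule to Table 2; the function name select_units is an illustrative choice.

```python
def select_units(random_numbers, population_size, sample_size):
    """Map random numbers to unit numbers by remainder and discard repetitions."""
    selected = []
    for r in random_numbers:
        unit = r % population_size
        if unit == 0:              # remainder 0 stands for the last unit
            unit = population_size
        if unit not in selected:   # repetitions are discarded
            selected.append(unit)
        if len(selected) == sample_size:
            break
    return selected

table_2 = [34, 96, 61, 85, 49, 78, 50, 2, 27, 13]  # read row-wise
sample = select_units(table_2, population_size=35, sample_size=5)
print(sorted(sample))
```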
3.4: Sample Mean & Sample Standard Deviation
Example:5: The ages of 25 people in a certain income bracket are
distributed as in the following frequency table:
Age        29  33  37  38  39  40  42  43  45  47  50  59  66
Frequency   1   1   3   4   2   3   2   2   3   1   1   1   1
x̄ = 1050 / 25 = 42
s = √( Σ (xi − x̄)² / (n − 1) ), where the sum is taken from i = 1 to n
x         x − x̄       (x − x̄)²
67.32     −70.28      4939.2784
108.97    −28.63      819.6469
17.64     −119.96     14390.4016
412.11    274.51      75355.7401
81.96     −55.64      3095.8096
688                   98600.9066
s = √(98600.9066 / 4) (or) s = 157.00
Thus s = √(1402 / 24) = 7.64
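For data given as a frequency table, each value is weighted by its frequency. The following Python sketch recomputes the mean and the standard deviation of the ages in Example 5; the variable names are illustrative.

```python
from math import sqrt

ages = [29, 33, 37, 38, 39, 40, 42, 43, 45, 47, 50, 59, 66]
freqs = [1, 1, 3, 4, 2, 3, 2, 2, 3, 1, 1, 1, 1]

n = sum(freqs)                                             # number of people
mean = sum(x * f for x, f in zip(ages, freqs)) / n         # weighted mean
sum_sq_dev = sum(f * (x - mean) ** 2 for x, f in zip(ages, freqs))
s = sqrt(sum_sq_dev / (n - 1))                             # sample standard deviation

print(n, mean, sum_sq_dev, round(s, 2))
```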
Sometimes it is not convenient to compute x − x̄ for each measure.
Shortcut formulas also utilize the mean.
Some of the shortcut formulas for calculation of the standard
deviation are given below (all sums are taken from i = 1 to n).
(1) s = √( (Σ xi² − n x̄²) / (n − 1) )
(2) s = √( (n Σ xi² − (Σ xi)²) / (n (n − 1)) )
(3) s = √( (Σ xi² − (Σ xi)² / n) / (n − 1) )
x̄ = 80.1 / 6 = 13.35
s² = (1173.23 − (80.1)² / 6) / 5 = 103.895 / 5 = 20.78
s = √20.78 = 4.56
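The shortcut formulas need only n, Σx and Σx². The Python sketch below applies formula (3) to the sums used in the calculation above and also checks, on a small made-up data set (an assumption for illustration), that the three shortcut formulas agree.

```python
from math import sqrt

def sd_from_sums(n, sum_x, sum_x2):
    """Shortcut formula (3): standard deviation from n, sum of x and sum of x squared."""
    return sqrt((sum_x2 - sum_x**2 / n) / (n - 1))

def sd_three_ways(data):
    """Evaluate shortcut formulas (1), (2) and (3) on the same data."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    mean = sum_x / n
    f1 = sqrt((sum_x2 - n * mean**2) / (n - 1))
    f2 = sqrt((n * sum_x2 - sum_x**2) / (n * (n - 1)))
    f3 = sd_from_sums(n, sum_x, sum_x2)
    return f1, f2, f3

s = sd_from_sums(6, 80.1, 1173.23)
print(round(s**2, 2), round(s, 2))

f1, f2, f3 = sd_three_ways([2, 4, 4, 5, 7])  # hypothetical data
print(f1, f2, f3)
```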
EXERCISES
0 - 2,999 324
(3) Use the best technique to determine the mean and standard
deviation of each of the following samples:
(1) 20, 22, 23, 26, 29, 30
(2) 28, 25, 20, 33, 27, 29, 23, 24, 21
Day          1    2    3    4    5    6    7    8    9    10
No. of cars  284  386  273  212  202  312  372  247  267  289
These measures have a mean of 284.4. We may use this figure as an estimate
of the number of cars which pass through the corner each day.
However, there are a number of hazards attached to such a case.
If we take a different sample (i.e. the experiment is repeated), it is
obvious that we would obtain a different mean. Statistical inference provides
us with a method of estimating the true mean to a desired degree of
accuracy. In order to put this method into practice, we must study the
theoretical sampling distribution.
Sample Weights X
1. 3.1, 3.4, 3.6 3.37
2. 3.1, 3.4, 2.8 3.10
3. 3.1, 3.4, 3.2 3.23
4. 3.1, 3.4, 3.9 3.47
5. 3.1, 3.6, 2.8 3.17
6. 3.1, 3.6, 3.2 3.30
7. 3.1, 3.6, 3.9 3.53
8. 3.1, 2.8, 3.2 3.03
9. 3.1, 2.8, 3.9 3.27
10. 3.1, 3.2, 3.9 3.40
11. 3.4, 3.6, 2.8 3.27
12. 3.4, 3.6, 3.2 3.40
13. 3.4, 3.6, 3.9 3.63
14. 3.4, 2.8, 3.2 3.13
15. 3.4, 2.8, 3.9 3.37
16. 3.4, 3.2, 3.9 3.50
17. 3.6, 2.8, 3.2 3.20
18. 3.6, 3.2, 3.9 3.57
19. 3.6, 2.8, 3.9 3.43
20. 2.8, 3.2, 3.9 3.30
x̄      p(x̄)    x̄      p(x̄)
3.03   1/20    3.37   2/20
3.10   1/20    3.40   2/20
3.13   1/20    3.43   1/20
3.17   1/20    3.47   1/20
3.20   1/20    3.50   1/20
3.23   1/20    3.53   1/20
3.27   2/20    3.57   1/20
3.30   2/20    3.63   1/20
The mean of this distribution µ * is found to be approximately 3.33.
The standard deviation σ * is 0.1586
Also, the mean of the population = (x1 + x2 + ... + x6) / 6 = 3.33
Note that the mean of the sampling distribution (µ*) is equal to the
population mean (µ).
The set of all random samples of size ‘n’ drawn from a population of
size N with mean µ and standard deviation σ has the mean µ* = µ, and its
standard deviation σ* is given by
σ* = (σ / √n) √((N − n) / (N − 1))
Note 1: If N is very large or infinite, the term √((N − n) / (N − 1)) can be
taken as 1.
2: As a rule, if the sample is drawn from an infinite population or
constitutes less than five percent of the population, then we have
σ* = σ / √n
although this should be used only for theoretically large populations. In
practice this is generally used even for small populations.
σ* (from the formula) = (σ / √n) √((N − n) / (N − 1))
σ* (from actual distribution) = 0.158
(Since the sample is such a large proportion of the population, the correction
factor must be used).
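The agreement between the formula and the actual distribution can be verified by listing all 20 samples. The following Python sketch does this for the six weights of the example (3.1, 3.4, 3.6, 2.8, 3.2, 3.9) with samples of size 3; the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

population = [3.1, 3.4, 3.6, 2.8, 3.2, 3.9]
N, n = len(population), 3

# Mean of every possible sample of size n drawn without replacement.
sample_means = [sum(s) / n for s in combinations(population, n)]

# Mean and standard deviation of the sampling distribution of the mean.
mu_star = sum(sample_means) / len(sample_means)
sigma_star = sqrt(sum((m - mu_star) ** 2 for m in sample_means) / len(sample_means))

# Population mean and standard deviation, then the formula with the
# finite population correction factor.
mu = sum(population) / N
sigma = sqrt(sum((x - mu) ** 2 for x in population) / N)
sigma_star_formula = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

print(len(sample_means), round(mu_star, 2), round(sigma_star, 4))
print(round(mu, 2), round(sigma_star_formula, 4))
```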
EXERCISES
a) 100
b) 144
c) 10,000
d) 36
e) 128
f) 1024
Example:10: A study has to be conducted to know the smoking habits of the
population in a town. Assuming that the chance of a person being a smoker
is 50%, find the sampling distribution of the total number of smokers, based
on a sample size of 4.
Solution: Given that the probability of a person being a smoker = ½. Also
n = 4.
Table 1
X            0     1    2    3    4
Probability  1/16  1/4  3/8  1/4  1/16
Study No.        1   2   3   4   5   6   7   8   9  10  11  12  13
No. of Smokers   2   3   2   2   4   1   0   3   2   1   3   2   1
Study No.       14  15  16  17  18  19  20  21  22  23  24  25
No. of Smokers   0   3   1   4   4   2   3   1   2   1   2   3
The relative frequency table for x is
Table 2
x                   0     1     2     3     4
Relative Frequency  2/25  6/25  8/25  6/25  3/25
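Tables 1 and 2 can be compared side by side. The following Python sketch computes the theoretical binomial probabilities for n = 4 and p = ½ and tallies the relative frequencies from the 25 studies listed above; the variable names are illustrative.

```python
from collections import Counter
from math import comb

n, p = 4, 0.5

# Table 1: theoretical sampling distribution of the number of smokers.
theoretical = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Number of smokers observed in studies 1 to 25.
observed = [2, 3, 2, 2, 4, 1, 0, 3, 2, 1,
            3, 2, 1, 0, 3, 1, 4, 4, 2, 3,
            1, 2, 1, 2, 3]

# Table 2: relative frequencies from the 25 studies.
counts = Counter(observed)
relative = {k: counts[k] / len(observed) for k in range(n + 1)}

for k in range(n + 1):
    print(k, theoretical[k], relative[k])
```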
The corresponding standard score is then given by z = (52 − 50) / 2 = 1.00. So, the
area under the normal curve to the right of z = 1.00 is equal to 0.5000 –
0.3413 = 0.1587. Thus, the probability that the sample mean will be greater
than 52 is 0.1587.
σ* = (0.400)(30) / ((10)(31.61)) = 0.038
Thus z = 0.100 / 0.038 = 2.63
And from the table we get the probability as 0.5000 – 0.4957 or 0.0043.
Thus the sampling procedure seems to be a good one.
EXERCISES
3. A random sample of size 64 is taken from a large normally distributed
population with mean 0.080 and standard deviation 0.004. What is the
probability that the maximum error is 0.001?
APPENDIX
1. Population:
By a ‘population’ we mean the total collection of items or elements
that fall within the scope of a statistical investigation. This is also called the
‘universe of discourse’ or simply ‘universe’. The purpose of defining a
statistical population is to provide explicit limits for the data collection
process and for the inferences and conclusions that may be drawn from the
study. Time and space limitations must be specified, and it should be clear
whether a particular element falls within or outside the universe. In
short, a population is a universal set.
2. Examples of populations:
Examples:
1. The expectation or mean of a population. It is denoted by µ .
2. The standard deviation of a population. It is denoted by σ.
3. Sample
Examples:
Examples:
1. The mean of a sample. It is denoted by x .
2. The standard deviation of a sample. It is denoted by s.
CAUTION: Consider a set of n data, x1, x2, …, xn. If this set is considered as
a population, then its mean and standard deviation are given by
µ = (1/n) Σ xi ,   σ = √( (1/n) Σ (xi − µ)² );
whereas, if this same set is considered as a sample, then its mean and
standard deviation are given by
x̄ = (1/n) Σ xi ,   s = √( (1/(n − 1)) Σ (xi − x̄)² ).
Note that µ = x̄ (i.e., the mean is the same whether the data are considered as a
population or a sample).
But σ ≠ s: i.e., the value of the standard deviation depends on the
interpretation of the data as population or sample.
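Python's standard statistics module makes the same distinction: pstdev divides by n (population) and stdev divides by n − 1 (sample). The data set in the sketch below is hypothetical, chosen only to illustrate the difference.

```python
from statistics import mean, pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data

mu = mean(data)          # the mean is the same under both interpretations
sigma = pstdev(data)     # data treated as a population: divide by n
s = stdev(data)          # data treated as a sample: divide by n - 1

print(mu, sigma, round(s, 4))
```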
Remember:
1. The sampling distributions depend on the sample size and describe the
distribution of a sample statistic.
2. When the population is normally distributed with mean µ and
standard deviation σ, the sampling distribution of x̄ is also normal
with mean µ and standard deviation σ / √n, where n is the (fixed)
sample size.
3. The sampling distribution of proportion has mean p and variance
p (1 – p) / n where p is the population proportion and n is the (fixed)
sample size.
Chapter 4
ESTIMATION
4.1 Introduction
There are two kinds of estimates that are usually used: 1. point
estimates and 2. interval estimates. Estimates which specify a single value for
a population parameter are called point estimates; estimates which specify a
range of values in an interval are called interval estimates.
Because there is a 90% chance that x lies between 25,000 and 30,000, this
interval (25,000, 30,000) is called a “90 percent confidence interval” for x.
This means that, if we watch the number of visitors on 100 days, then on about 90
of these days the number would be between 25,000 and 30,000. (This also
means that on about 10 of these 100 days the number may not fall within the
above limits.)
An ideal situation would be to have an interval with a small range and a
high confidence level. Unfortunately, this seldom happens in practice. We
will see that, if the interval is small, the confidence would be low, and if the
confidence is to be high, then the corresponding interval would have to be
large.
4.2 Estimation of µ
z = (x̄ − µ) / (σ / √n)
has standard normal distribution. We want to find the values a and b such
that
p(a ≤ z ≤ b) = 0.95
−1.96 ≤ z ≤ 1.96;
or −1.96 ≤ (x̄ − µ) / (σ / √n) ≤ 1.96
or −1.96 ≤ (900 − µ) / (140 / √25) ≤ 1.96
or 845.12 ≤ µ ≤ 954.88
In general we have
p(x̄ − 1.96 σ / √n ≤ µ ≤ x̄ + 1.96 σ / √n) = 0.95
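The general formula translates directly into code. The following Python sketch computes the 95% confidence interval for the figures used above (x̄ = 900, σ = 140, n = 25); the function name z_interval is an illustrative choice.

```python
from math import sqrt

def z_interval(xbar, sigma, n, z=1.96):
    """Confidence interval for the mean when sigma is known (z = 1.96 for 95%)."""
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(900, 140, 25)
print(round(low, 2), round(high, 2))
```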
4.2.2. Estimation of µ when σ is not known
In some investigations, an upper limit for the error of the estimate has
to be fixed in advance and a suitable sample size is determined, so that the
error does not exceed this limit.
Solution: If x̄ is to estimate µ to within 3 minutes, then |x̄ − µ| ≤ 3 and also
p(|x̄ − µ| ≤ 3) = 0.95; this means that 1.96 σ / √n ≤ 3
Or √n ≥ (1.96 × 12) / 3
Or n ≥ 61.4656
Note: Work out this problem the other way: that is, take σ = 12 and n = 62
and find 1.96 σ / √n. This value must be less than or equal to 3. Satisfy
yourself.
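The check suggested in the note can be sketched in Python as follows. The function name required_sample_size is an illustrative choice; the sample size is rounded up to the next whole number because n must be an integer.

```python
from math import ceil, sqrt

def required_sample_size(sigma, max_error, z=1.96):
    """Smallest n for which z * sigma / sqrt(n) does not exceed max_error."""
    return ceil((z * sigma / max_error) ** 2)

n = required_sample_size(sigma=12, max_error=3)
error_at_n = 1.96 * 12 / sqrt(n)   # maximum error actually achieved with this n

print(n, round(error_at_n, 3))
```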
EXERCISES
n = N × 40/100 = 2N/5. Given that p(|x̄ − µ| ≤ (3/10) s) = .99. But we also
have p(|x̄ − µ| ≤ 2.56 s / √n) = .99. Hence (3/10) s = 2.56 s / √n. Solve for n to find
N)
(A) Since the denominator of equation (1) also involves p, which is not
known and is to be estimated, this is not useful. However, our interest is
to find an estimate for p; so we replace the denominator by
√( (1/n) (x/n) (1 − x/n) ), taking the value x / n as an approximation for p.
This gives z = (x/n − p) / √( (1/n) (x/n) (1 − x/n) )
Hence, we get z = (0.4 – p) / 0.05. We know that −1.96 ≤ z ≤ 1.96 with
probability 0.95, which gives
0.302 ≤ p ≤ 0.498
Hence, the TV manufacturer can be sure with 95% confidence that 30.2% to
49.8% of the population of his locality owns color sets. In other words, he
can be sure (with 95% confidence) that at least 50% of the people do not own
color sets, so that it is profitable to start business.
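A minimal Python sketch of this interval is given below. The sample size n = 96 is a hypothetical value, chosen here only because it makes the estimated standard deviation √((1/n)(x/n)(1 − x/n)) equal to 0.05 when x / n = 0.4; the function name is also illustrative.

```python
from math import sqrt

def proportion_interval(p_hat, n, z=1.96):
    """Approximate confidence interval for a proportion, using p_hat = x / n."""
    se = sqrt(p_hat * (1 - p_hat) / n)   # estimated standard deviation of x / n
    return p_hat - z * se, p_hat + z * se

# n = 96 is an assumption made for illustration (it gives se = 0.05).
low, high = proportion_interval(0.4, 96)
print(round(low, 3), round(high, 3))
```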
EXERCISES
times he can be sure that the manufacturer is right. Will a sample size
of 81 work?
Solution: It is given that x̄ = 30, s = 17 and n = 16. Note that σ is not given
(therefore unknown) and n is less than 30. Since it is required to construct a
90% confidence interval, we take α = 0.90, so that 1 − α = 0.10 and
(1/2)(1 − α) = 0.05.
From the table, we find that, for v = 15 and 0.05, the value of t is 1.753.
This means that the 90% interval is given by −1.753 < t < 1.753 where
t = (x̄ − µ) / (s / √n) = (30 − µ) / (17 / 4)
Thus, the required interval is obtained as −1.753 < (30 − µ) / (17 / 4) < 1.753
Or 30 − (17 / 4) × 1.753 < µ < 30 + (17 / 4) × 1.753
Thus, on 90% of the occasions, the mean increase in sleep will be between
about 22.5 minutes and 37.5 minutes.
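The same steps can be sketched in Python. The standard library has no t distribution, so the critical value is passed in as read from a t table; the value 1.753 used below is the tabulated 0.05 upper-tail value for 15 degrees of freedom, and the function name is an illustrative choice.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit must be looked up in a t table for n - 1 degrees of freedom.
    """
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

low, high = t_interval(xbar=30, s=17, n=16, t_crit=1.753)
print(round(low, 2), round(high, 2))
```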
EXERCISES
1. 20 steel washers were tested for their diameters, giving x = 0.11 inches
and s = 0.002 inch. Find a 95% confidence interval for the true mean.
2. A health inspector tests 19 bottles of certain syrup for alcohol content,
and finds that the mean alcohol content is 2.7% with standard
deviation 0.13%. Find a 99% confidence interval for the true mean
content of alcohol.
Chapter 5
5.1: Introduction
5.2: Types of Correlation
There are two important types of correlation. They are (1) Positive
and Negative correlation and (2) Linear and Non – Linear correlation.
If the values of the two variables deviate in the same direction i.e. if
an increase (or decrease) in the values of one variable results, on an average,
in a corresponding increase (or decrease) in the values of the other variable
the correlation is said to be positive.
Graphs of Positive and Negative correlation: [scatter diagrams not reproduced]
Note:
(i) If the points are very close to each other, a fairly good amount of
correlation can be expected between the two variables. On the
other hand if they are widely scattered a poor correlation can be
expected between them.
(ii) If the points are scattered and they reveal no upward or downward
trend as in the case of (d) then we say the variables are
uncorrelated.
(iii) If there is an upward trend rising from the lower left hand corner
and going upward to the upper right hand corner, the correlation
obtained from the graph is said to be positive. Also, if there is a
downward trend from the upper left hand corner the correlation
obtained is said to be negative.
(iv) The graphs shown above are generally termed as scatter
diagrams.
Example:1: The following are the heights and weights of 15 students of a
class. Draw a graph to indicate whether the correlation is negative or
positive.
Since the points are dense (close to each other) we can expect a high
degree of correlation between the series of heights and weights. Further,
since the points reveal an upward trend, the correlation is positive. Arrange
the data in increasing order of height and check that, as height increases, the
weight also increases, except for some (stray) cases.
EXERCISES
(1) A Company has just brought out an annual report in which the capital
investment and profits were given for the past few years. Find the
type of correlation (if it exists).
Capital Investment (crores) 10 16 18 24 36 48 57
Profits (lakhs) 12 14 13 18 26 38 62
(2) Try to construct more examples on the positive and negative
correlations.
(3) Construct the scatter diagram of the data given below and indicate
the type of correlation.
X 2 4 6 8 10
Y 7 13 19 25 31
y = 3x +1
y = a + bx
where ‘a’ and ‘b’ are real numbers. This is nothing but a straight line when
plotted on a graph sheet with different values of x and y and for constant
values of a and b. Such relations generally occur in physical sciences but are
rarely encountered in economic and social sciences.
The relationship between two variables is said to be non – linear if
corresponding to a unit change in one variable, the other variable does not
change at a constant rate but changes at a fluctuating rate. In such cases, if
the data is plotted on a graph sheet we will not get a straight line curve. For
example, one may have a relation of the form
y = a + bx + cx2
r = (n Σxy − Σx Σy) / √( (n Σx² − (Σx)²) (n Σy² − (Σy)²) )
S. No. Weight Blood Pressure
1. 78 140
2. 86 160
3. 72 134
4. 82 144
5. 80 180
6. 86 176
7. 84 174
8. 89 178
9. 68 128
10. 71 132
Solution:
x     y      x²      y²       xy
78    140    6084    19600    10920
86    160    7396    25600    13760
72    134    5184    17956    9648
82    144    6724    20736    11808
80    180    6400    32400    14400
86    176    7396    30976    15136
84    174    7056    30276    14616
89    178    7921    31684    15842
68    128    4624    16384    8704
71    132    5041    17424    9372
796   1546   63826   243036   124206
Then
r = 11444 / √((4644)(40244))
= 0.8371
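The column totals and the value of r can be recomputed from the raw data. The following Python sketch implements the formula for r directly; the function name pearson_r is an illustrative choice.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r from the sums n, Σx, Σy, Σx², Σy² and Σxy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator

weight = [78, 86, 72, 82, 80, 86, 84, 89, 68, 71]
pressure = [140, 160, 134, 144, 180, 176, 174, 178, 128, 132]

r = pearson_r(weight, pressure)
print(round(r, 4))
```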
R = 1 − 6 (Σd²) / (n (n² − 1))
Example:3: The data given below are obtained from student records.
Calculate the rank correlation coefficient ‘R’ for the data.
Subject Grade Point Average (x) Graduate Record exam score (y)
1. 8.3 2300
2. 8.6 2250
3. 9.2 2380
4. 9.8 2400
5. 8.0 2000
6. 7.8 2100
7. 9.4 2360
8. 9.0 2350
9. 7.2 2000
10. 8.6 2260
Now we first arrange the data in descending order and then rank
1, 2, 3, ..., 10 accordingly. In case of a tie, the rank of each tied value is the
mean of all positions they occupy. In x, for instance, 8.6 occupies ranks 5 and
6, so each has a rank (5 + 6) / 2 = 5.5;
Similarly in ‘y’ 2000 occupies ranks 9 and 10, so each has rank
(9 + 10) / 2 = 9.5.
Now we come back to our formula R = 1 − 6 Σd² / (n (n² − 1))
R = 1 − 6 (12) / (10 (100 − 1))
= 1 − 0.0727 = 0.9273
Note: If we are provided with only ranks without giving the values of x and
y we can still find Spearman rank difference correlation R by taking the
difference of the ranks and proceeding in the above shown manner.
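The ranking procedure, including the treatment of ties, can be sketched in Python as follows. The function names are illustrative; the data are those of Example 3.

```python
def average_ranks(values):
    """Rank in descending order; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    return [
        sum(i + 1 for i, v in enumerate(ordered) if v == x) / ordered.count(x)
        for x in values
    ]

def spearman_r(xs, ys):
    """Rank correlation R = 1 - 6 Σd² / (n (n² - 1))."""
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2
                 for rx, ry in zip(average_ranks(xs), average_ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n**2 - 1))

gpa = [8.3, 8.6, 9.2, 9.8, 8.0, 7.8, 9.4, 9.0, 7.2, 8.6]
gre = [2300, 2250, 2380, 2400, 2000, 2100, 2360, 2350, 2000, 2260]

R = spearman_r(gpa, gre)
print(average_ranks(gpa))
print(round(R, 4))
```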
EXERCISES
2. The top and bottom number which may appear on a die are as
follows
Top 1 2 3 4 5 6
bottom 5 6 4 3 1 2
3. The ranks of two sets of variables (Heights and Weights) are given
below. Calculate the Spearman rank difference correlation
coefficient R.
1 2 3 4 5 6 7 8 9 10
Heights 2 6 8 4 7 4 9.5 4 1 9.5
Weights 9 1 9 4 5 9 2 7 6 3
5.5: Regression
Suppose we have a sample of size ‘n’ and it has two sets of measures,
denoted by x and y. We can predict the values of ‘y’ given the values of ‘x’
by using the equation, called the REGRESSION EQUATION.
y* = a + bx
b = (n Σxy − (Σx)(Σy)) / (n (Σx²) − (Σx)²)
a = (Σy − b Σx) / n
Solution:
We want to predict the final exam scores from the mid term scores. So
let us designate ‘y’ for the final exam scores and ‘x’ for the mid – term exam
scores. We open the following table for the calculations.
Stud    x     y     x²       xy
1       98    90    9604     8820
2       66    74    4356     4884
3       100   98    10,000   9800
4       96    88    9216     8448
5       88    80    7744     7040
6       45    62    2025     2790
7       76    78    5776     5928
8       60    74    3600     4440
9       74    86    5476     6364
10      82    80    6724     6560
Total   785   810   64,521   65,074
Therefore a = 40.6735
y* = 40.6735 + (0.5137) x
We can use this to find the projected or estimated final scores of the
students.
For example, for the midterm score of 50 the projected final score is
y* = 40.6735 + (0.5137) 50 = 40.6735 + 25.685 = 66.3585
which is quite a good estimate.
To give another example, consider the midterm score of 70. Then the
projected final score is
y* = 40.6735 + (0.5137) 70 = 40.6735 + 35.959 = 76.6325,
which is again a very good estimate.
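The regression coefficients can be recomputed from the raw scores with a few lines of Python. The sketch below implements the formulas for b and a given earlier; the function name fit_line is an illustrative choice.

```python
def fit_line(xs, ys):
    """Least squares coefficients (a, b) of the regression line y* = a + b x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
    a = (sum_y - b * sum_x) / n
    return a, b

midterm = [98, 66, 100, 96, 88, 45, 76, 60, 74, 82]
final = [90, 74, 98, 88, 80, 62, 78, 74, 86, 80]

a, b = fit_line(midterm, final)
print(round(a, 4), round(b, 4))

# Projected final scores for midterm scores of 50 and 70.
for x in (50, 70):
    print(x, round(a + b * x, 2))
```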
This brings us to the end of this chapter. We close with some problems for
you.
EXERCISES
1. The data given below are obtained from student records. Calculate the
regression equation and compute the estimated GRE scores for GPA = 7.5,
8.5..
Subject Grade Point Average (x) Graduate Record exam score (y)
11. 8.3 2300
12. 8.6 2250
13. 9.2 2380
14. 9.8 2400
15. 8.0 2000
16. 7.8 2100
17. 9.4 2360
18. 9.0 2350
19. 7.2 2000
20. 8.6 2260
3. A horse was subjected to a test of how many minutes it takes to reach a
point from the starting point. The horse was made to carry luggage of
various weights on 10 trials. The data collected are presented below in the
table.
Find the regression equation between the load and the time taken to reach
the goal. Estimate the time taken for loads of 35 Kgs, 23 Kgs, and 9
Kgs. Are the answers in agreement with your intuitive feelings? Justify.