Statistics and Probability
TOPIC 5: STATISTICS & PROBABILITY
Topic 5: Statistics & Probability
What is Statistics?
Definition: Statistics is the science of the collection, presentation, analysis and reasonable interpretation of data.
Statistics provides a rigorous scientific method for gaining insight into data.
When there are many measurements, simply looking at the raw data fails to give an informative account. Statistics, however, gives an immediate overall picture of the data through graphical presentation or numerical summaries, irrespective of the number of data points.
Statistics also helps us draw inferences and predict relationships between variables.
Frequency Distribution
Consider a data set of 26 children aged 1-6 years. The frequency distribution of the variable 'age' can be tabulated as follows:
Age          1   2   3   4   5   6
Frequency    5   3   7   5   4   2
Grouped Frequency Distribution of Age:
Age group    1-2   3-4   5-6
Frequency     8    12     6
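As a minimal Python sketch of how such tables can be produced, the snippet below counts each age and then sums the counts within class intervals. The raw `ages` list is hypothetical (built only to reproduce the frequencies in the table), and the grouping into 1-2, 3-4 and 5-6 years is an assumption inferred from the grouped totals.

```python
from collections import Counter

# Hypothetical raw data: 26 ages chosen to match the frequency table above.
ages = [1] * 5 + [2] * 3 + [3] * 7 + [4] * 5 + [5] * 4 + [6] * 2

# Frequency distribution: how many children have each age.
freq = Counter(ages)
freq_table = {age: freq[age] for age in range(1, 7)}

# Grouped frequency distribution with assumed class intervals 1-2, 3-4, 5-6.
groups = [(1, 2), (3, 4), (5, 6)]
grouped = {f"{lo}-{hi}": sum(freq[a] for a in range(lo, hi + 1)) for lo, hi in groups}

print(freq_table)  # {1: 5, 2: 3, 3: 7, 4: 5, 5: 4, 6: 2}
print(grouped)     # {'1-2': 8, '3-4': 12, '5-6': 6}
```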
Cumulative Frequency
Cumulative frequency of the age data above:
Age                    1   2   3   4   5   6
Frequency              5   3   7   5   4   2
Cumulative Frequency   5   8  15  20  24  26
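Each cumulative frequency is the running total of the frequency row up to that age. A short Python sketch using `itertools.accumulate` from the standard library reproduces the table:

```python
from itertools import accumulate

ages = [1, 2, 3, 4, 5, 6]
frequency = [5, 3, 7, 5, 4, 2]

# Cumulative frequency: running total of the frequencies up to each age.
cumulative = list(accumulate(frequency))

for age, f, cf in zip(ages, frequency, cumulative):
    print(age, f, cf)
print(cumulative)  # [5, 8, 15, 20, 24, 26]
```

The last cumulative frequency always equals the total number of observations (26 here), which is a quick check on the table.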
Data Presentation
Two types of statistical presentation of data:
Graphical Presentation
Numerical Presentation
Bar diagrams and pie charts are used for categorical variables.
Histograms, stem-and-leaf plots and box plots are used for numerical variables.
[Figure: bar chart of frequency by treatment group (groups 1, 2, 3)]
Treatment Group   Frequency   Relative Frequency   Percent
1                 15          15/60 = 0.250        25.0
2                 25          25/60 = 0.417        41.7
3                 20          20/60 = 0.333        33.3
Total             60          1.00                 100
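A sketch of how the relative-frequency and percent columns follow from the counts: relative frequency is f/n and percent is 100 × f/n. The counts come from the table; rounding to 3 and 1 decimal places is chosen here only to match its layout.

```python
counts = {1: 15, 2: 25, 3: 20}   # frequency per treatment group
total = sum(counts.values())      # n = 60

# Relative frequency = f / n; percent = 100 * f / n.
relative = {g: round(f / total, 3) for g, f in counts.items()}
percent = {g: round(100 * f / total, 1) for g, f in counts.items()}

for g in counts:
    print(g, counts[g], relative[g], percent[g])
print("Total", total, sum(counts.values()) / total, 100)
```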
[Figure: histogram of number of subjects by age in months (bins 40, 60, 80, 100, 120, 140, More)]
Numerical Presentation
A fundamental concept in summary statistics is that of a central value for a
set of observations and the extent to which the central value characterizes the
whole set of data. Measures of central value such as the mean or median must
be coupled with measures of data dispersion (e.g., average distance from the
mean) to indicate how well the central value characterizes the data as a whole.
Commonly used measures of central value are the mean, median, mode and geometric mean.
Mean: The mean is obtained by summing all the observations and dividing by the number of observations:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
For example, given the numbers 20, 30, 40: $\bar{x} = (20 + 30 + 40)/3 = 30$.
Median: The median is the middle value of the ordered data. That is, to find the median we order the data set and then take the middle value.
For example, if the data sorted in increasing order are {3, 5, 6, 7, 9}, the median is the middle value, 6.
If the number of observations is even, e.g., {9, 3, 6, 7, 5, 2}, the median is the average of the two middle values of the sorted sequence {2, 3, 5, 6, 7, 9}, in this case (5 + 6) / 2 = 5.5.
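A minimal Python sketch of the procedure just described: sort, then take the middle value, or the average of the two middle values when n is even. Python's standard `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([3, 5, 6, 7, 9]))      # 6
print(median([9, 3, 6, 7, 5, 2]))   # 5.5
```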
Mode: The mode is the value that occurs most frequently. In the age data above, age 3 has the highest frequency (7), so Mode = 3.
Mean or Median
The median is less sensitive to outliers (extreme scores) than
the mean and thus a better measure than the mean for highly
skewed distributions, e.g. family income.
For example, the mean of 20, 30, 40 and 990 is
(20 + 30 + 40 + 990) / 4 = 270.
The median of these four observations is
(30 + 40) / 2 = 35.
Here 3 of the 4 observations lie between 20 and 40, so the mean of 270 fails to give a realistic picture of most of the data: it is pulled up by the extreme value 990.
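The same comparison with Python's standard `statistics` module. Removing 990 shows how much a single extreme value moves the mean, while the median barely changes.

```python
import statistics

data = [20, 30, 40, 990]

print(statistics.mean(data))    # 270
print(statistics.median(data))  # 35.0

# Without the extreme value the mean drops sharply; the median barely moves.
trimmed = [20, 30, 40]
print(statistics.mean(trimmed), statistics.median(trimmed))  # 30 30
```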
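Variance: The sample variance is the sum of the squared deviations from the mean divided by n − 1:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
For example, for the data 5, 3, 7 (mean 5):
$$s^2 = \frac{(5-5)^2 + (3-5)^2 + (7-5)^2}{3 - 1} = \frac{8}{2} = 4$$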
Standard Deviation: The standard deviation is the square root of the variance.
For the example above, s = √4 = 2.
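A small Python sketch of the sample variance (divisor n − 1) and standard deviation for the data 5, 3, 7, cross-checked against the standard library's `statistics.variance` and `statistics.stdev`, which also use n − 1.

```python
import math
import statistics

data = [5, 3, 7]
n = len(data)
x_bar = sum(data) / n                                   # sample mean = 5.0

# Sample variance: squared deviations from the mean, divided by n - 1.
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)
s = math.sqrt(s2)                                       # standard deviation
print(s2, s)  # 4.0 2.0

# The standard library's sample variance and standard deviation agree.
assert math.isclose(statistics.variance(data), s2)
assert math.isclose(statistics.stdev(data), s)
```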
Q1 is the median of the first half of the ordered observations and Q3 is the
median of the second half of the ordered observations.
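The description leaves open what to do with the middle observation when n is odd. The sketch below assumes it is excluded from both halves; this is one of several conventions, so software such as Python's `statistics.quantiles` may give different quartiles for some data sets.

```python
def median(sorted_values):
    """Middle value of an already-sorted list (average of the two middle values if n is even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) as medians of the lower half, whole set and upper half.
    Assumed convention: for odd n the overall median belongs to neither half."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return median(lower), median(s), median(upper)

print(quartiles([3, 5, 6, 7, 9]))     # (4.0, 6, 8.0)
print(quartiles([9, 3, 6, 7, 5, 2]))  # (3, 5.5, 7)
```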
Coefficient of Variation (expressed as a percentage):
$$\text{CV} = \frac{s}{\bar{x}} \times 100$$
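A minimal sketch of the coefficient of variation, assuming the sample standard deviation s is used (the symbol is not legible in the slide). Reusing the data 5, 3, 7 (s = 2, mean 5) gives CV = 40%.

```python
import statistics

def coefficient_of_variation(values):
    """CV = sample standard deviation divided by the mean, times 100 (percent)."""
    return statistics.stdev(values) * 100 / statistics.mean(values)

print(coefficient_of_variation([5, 3, 7]))  # 40.0
```

Because CV is unit-free, it can compare the spread of variables measured on different scales, provided the mean is positive.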
Box Plot:
A box plot is a graph of the five-number summary (minimum, Q1, median, Q3, maximum).
The central box spans the quartiles, from Q1 to Q3.
[Figure: box plot of the distribution of age in months, marking min, Q1, median, Q3 and max]
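The numbers a box plot draws can be computed directly. This sketch returns the five-number summary using the same median-of-halves quartile rule (assumed convention: the overall median is excluded from both halves when n is odd); the data set is illustrative, not the age-in-months data behind the figure.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); Q1 and Q3 are medians of the lower
    and upper halves, excluding the overall median when n is odd (assumed convention)."""
    s = sorted(values)
    n = len(s)
    q1 = statistics.median(s[: n // 2])
    q3 = statistics.median(s[(n + 1) // 2 :])
    return s[0], q1, statistics.median(s), q3, s[-1]

summary = five_number_summary([9, 3, 6, 7, 5, 2])
print(summary)  # (2, 3, 5.5, 7, 9)
minimum, q1, med, q3, maximum = summary
print("box from", q1, "to", q3, "; IQR =", q3 - q1)
```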
Choosing a Summary
Data Shapes
The shape of a data set is described in two ways:
I. Skewness
II. Kurtosis
Skewness: measures the asymmetry of the data.
Positive or right skewed: longer right tail
Negative or left skewed: longer left tail
Kurtosis: measures the peakedness of the distribution of the data. The excess kurtosis of a normal distribution is 0 (its ordinary kurtosis is 3).
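One common way to put numbers on these ideas is the moment coefficient of skewness, g1 = m3 / m2^(3/2), and the excess kurtosis, g2 = m4 / m2^2 − 3, where m_k is the k-th central moment. The slides do not give formulas, so this choice (population moments, no small-sample correction) is an assumption, and the data sets are illustrative.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean)**k (population form, divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (> 0: longer right tail)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """g2 = m4 / m2**2 - 3, so a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

print(skewness([1, 2, 3, 4, 5]))         # 0.0 for symmetric data
print(skewness([1, 1, 1, 10]) > 0)       # True: right skewed
print(skewness([1, 10, 10, 10]) < 0)     # True: left skewed
print(excess_kurtosis([1, 2, 3, 4, 5]))  # about -1.3 (flatter than normal)
```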
END OF STATISTICS SUB-TOPIC
Probability
Formal study of uncertainty
The engine that drives statistics
History of Probability
For most of human history, probability, the formal study of the laws of chance, has been used for only one thing: gambling.
What is Probability?
In the Statistics sub-topic above, we looked at graphs and numerical measures to describe data sets, which were usually samples.
We measure "how often" using
Relative frequency = f/n
• As n gets larger,
Sample → Population
and "how often" = relative frequency → probability
Probability of an Event
The probability of an event A measures "how often" A will occur. We write P(A).
Suppose that an experiment is performed n times. The relative frequency for an event A is
$$\frac{\text{Number of times } A \text{ occurs}}{n} = \frac{f}{n}$$
If we let n get infinitely large,
$$P(A) = \lim_{n \to \infty} \frac{f}{n}$$
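The limit definition can be illustrated (not proved) with a simulation: as the number of tosses n grows, the relative frequency f/n of heads settles near the underlying probability. The fair-coin model and the fixed seed are assumptions made only for this illustration.

```python
import random

def relative_frequency(n, p=0.5, seed=2024):
    """Simulate n tosses of a coin with P(head) = p; return f/n for heads."""
    rng = random.Random(seed)                 # fixed seed for reproducibility
    f = sum(1 for _ in range(n) if rng.random() < p)
    return f / n

for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency(n))
```

Small n can give relative frequencies far from 0.5; the large-n values cluster tightly around it.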
P(A) must be between 0 and 1.
If event A can never occur, P(A) = 0. If event A
always occurs when the experiment is performed,
P(A) =1.
The sum of the probabilities for all simple events in
S equals 1.
• The probability of an event A is found by adding the probabilities of all the simple events contained in A.
Finding Probabilities
Probabilities can be found using
Estimates from empirical studies
Common sense estimates based on
equally likely events.
• Examples:
– Toss a fair coin. P(Head) = 1/2
– Suppose that 10% of Uganda's population is male. Then for a person selected at random, P(male) = 0.1
If all simple events are equally likely,
$$P(A) = \frac{n_A}{N} = \frac{\text{number of simple events in } A}{\text{total number of simple events}}$$
Example 1: Toss two fair coins. What is the probability of observing at least one head?
1st Coin   2nd Coin   Ei   P(Ei)
H          H          HH   1/4
H          T          HT   1/4
T          H          TH   1/4
T          T          TT   1/4
$$P(\text{at least 1 head}) = P(E_1) + P(E_2) + P(E_3) = \frac{1}{4} + \frac{1}{4} + \frac{1}{4} = \frac{3}{4}$$
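The same calculation as a Python sketch: list the equally likely simple events with `itertools.product`, give each probability 1/4 (exact `Fraction` arithmetic), and add the probabilities of the simple events that contain a head.

```python
from fractions import Fraction
from itertools import product

# Simple events for tossing two coins: HH, HT, TH, TT.
sample_space = ["".join(t) for t in product("HT", repeat=2)]
p = {e: Fraction(1, len(sample_space)) for e in sample_space}  # equally likely

# P(A) is the sum of the probabilities of the simple events contained in A.
p_at_least_one_head = sum(p[e] for e in sample_space if "H" in e)
print(sample_space)          # ['HH', 'HT', 'TH', 'TT']
print(p_at_least_one_head)   # 3/4
```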
Example 2:
A bowl contains three marbles, one red, one blue and
one green. A child selects two marbles at random. What
is the probability that at least one is red?
1st Marble   2nd Marble   Ei   P(Ei)
R            B            RB   1/6
R            G            RG   1/6
B            R            BR   1/6
B            G            BG   1/6
G            B            GB   1/6
G            R            GR   1/6
$$P(\text{at least 1 red}) = P(RB) + P(BR) + P(RG) + P(GR) = \frac{4}{6} = \frac{2}{3}$$
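A Python sketch of this sample space: `itertools.permutations` lists the six ordered draws of two marbles without replacement, each with probability 1/6, and the event "at least one red" collects the four draws that contain R.

```python
from fractions import Fraction
from itertools import permutations

# Ordered draws of 2 of the 3 marbles without replacement.
outcomes = ["".join(pair) for pair in permutations("RBG", 2)]
p_each = Fraction(1, len(outcomes))   # 6 equally likely simple events

p_at_least_one_red = sum(p_each for e in outcomes if "R" in e)
print(outcomes)             # ['RB', 'RG', 'BR', 'BG', 'GR', 'GB']
print(p_at_least_one_red)   # 2/3
```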
Example 3:
The sample space of throwing a pair of dice is:
Example 3 (Cont.):
Some simple events and their probabilities:
start thinking …
We need some counting rules
The mn Rule
If an experiment is performed in two stages, with m ways to accomplish the first stage and n ways to accomplish the second stage, then there are mn ways to accomplish the experiment.
This rule is easily extended to k stages, with the number of ways equal to
$$n_1 n_2 n_3 \cdots n_k$$
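Applied to the pair of dice above, the rule gives 6 × 6 = 36 outcomes. This sketch multiplies the number of ways per stage with `math.prod` (Python 3.8+); `count_ways` is a hypothetical helper name, and direct enumeration confirms the count.

```python
from itertools import product
from math import prod

def count_ways(ways_per_stage):
    """Hypothetical helper for the k-stage rule: multiply the ways at each stage."""
    return prod(ways_per_stage)

# Throwing a pair of dice: 2 stages with 6 ways each.
print(count_ways([6, 6]))  # 36

# Enumerating the outcomes directly gives the same number.
print(len(list(product(range(1, 7), repeat=2))))  # 36
```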
Permutations
The number of ways you can arrange
n distinct objects, taking them r at a time is:
$$P^n_r = \frac{n!}{(n-r)!}$$
where $n! = n(n-1)(n-2)\cdots(2)(1)$ and $0! = 1$.
Example: How many 3-digit lock combinations can
we make from the numbers 1, 2, 3, and 4?
Permutations
Example: A lock consists of five parts and can
be assembled in any order. A quality control
engineer wants to test each order for efficiency
of assembly.
How many orders are there?
The order of the choice is important!
$$P^5_5 = \frac{5!}{0!} = 5(4)(3)(2)(1) = 120$$
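A direct translation of the permutation formula into Python, cross-checked with `math.perm` (Python 3.8+); `n_p_r` is a hypothetical helper name. For the 3-digit lock question above, the sketch assumes digits cannot repeat, which is what the permutation formula counts.

```python
from math import factorial, perm

def n_p_r(n, r):
    """Number of ordered arrangements of r out of n distinct objects: n!/(n - r)!."""
    return factorial(n) // factorial(n - r)

print(n_p_r(5, 5))  # 120 assembly orders for the five-part lock
print(n_p_r(4, 3))  # 24, assuming lock digits from {1, 2, 3, 4} cannot repeat

# math.perm computes the same quantity.
assert n_p_r(5, 5) == perm(5, 5)
```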
Combinations
The number of distinct combinations of n distinct objects
that can be formed, taking them r at a time is
$$C^n_r = \frac{n!}{r!\,(n-r)!}$$
Example: Three members of a 5-person committee must be
chosen to form a subcommittee. How many different
subcommittees could be formed?
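The combination formula in Python, cross-checked with `math.comb` (Python 3.8+); `n_c_r` is a hypothetical helper name. For the committee question it gives C(5, 3) = 10 possible subcommittees.

```python
from math import comb, factorial

def n_c_r(n, r):
    """Number of unordered selections of r out of n distinct objects: n!/(r!(n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

print(n_c_r(5, 3))  # 10 possible 3-person subcommittees
assert n_c_r(5, 3) == comb(5, 3)
```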
Example 1 - Combinations
A box contains six marbles, four red and two green. A
child selects two marbles at random. What is the
probability that exactly one is red?
The order of the choice is not important!
$$C^6_2 = \frac{6!}{2!\,4!} = \frac{6(5)}{2(1)} = 15 \text{ ways to choose 2 marbles}$$
$$C^2_1 = \frac{2!}{1!\,1!} = 2 \text{ ways to choose 1 green marble}$$
$$C^4_1 = \frac{4!}{1!\,3!} = 4 \text{ ways to choose 1 red marble}$$
4 × 2 = 8 ways to choose 1 red and 1 green marble.
P(exactly one red) = 8/15
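The same counting in a few lines of Python, using `math.comb` for each factor and `Fraction` so the answer stays exact:

```python
from fractions import Fraction
from math import comb

ways_total = comb(6, 2)                 # 15 ways to choose 2 of the 6 marbles
ways_one_red = comb(4, 1) * comb(2, 1)  # 4 red choices x 2 green choices = 8
p_exactly_one_red = Fraction(ways_one_red, ways_total)
print(ways_total, ways_one_red, p_exactly_one_red)  # 15 8 8/15
```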
Example 2 - Combinations
A deck of cards consists of 52 cards: 13 "kinds" in each of four suits (spades, hearts, diamonds and clubs).
The 13 kinds are Ace (A), 2, 3, 4, 5, 6, 7, 8, 9, 10,
Jack (J), Queen (Q), King (K). In many poker games,
each player is dealt five cards from a well shuffled
deck.
There are
$$C^{52}_5 = \frac{52!}{5!\,(52-5)!} = \frac{52(51)(50)(49)(48)}{5(4)(3)(2)(1)} = 2{,}598{,}960$$
possible hands.
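A quick Python check of the number of five-card hands, both with `math.comb` and with the expanded product form shown above:

```python
from math import comb, prod

hands = comb(52, 5)
# Same value from the expanded form 52(51)(50)(49)(48) / 5(4)(3)(2)(1).
expanded = prod(range(48, 53)) // prod(range(1, 6))
print(hands, expanded)  # 2598960 2598960
```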
Example 2 (Cont.)
Four of a kind: 4 of the 5 cards are the same “kind”.
What is the probability of getting four of a kind in a five
card hand?
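The slides pose this question without a worked answer. One way to count these hands with the combination rule, offered as a sketch rather than the slides' own solution: choose the kind (13 ways), take all four cards of that kind, then pick any one of the remaining 48 cards.

```python
from fractions import Fraction
from math import comb

# 13 choices of kind, all 4 cards of that kind, then 1 of the other 48 cards.
four_of_a_kind = 13 * comb(4, 4) * comb(48, 1)
p_four = Fraction(four_of_a_kind, comb(52, 5))
print(four_of_a_kind, p_four)  # 624 1/4165
print(float(p_four))           # roughly 0.00024
```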
Example 3
One pair: two of the cards are of one kind, the other
three are of three different kinds. What is the
probability of getting one pair in a five card hand?
There are 13 possible choices for the kind of which to have a pair; given that choice, there are $C^4_2 = 6$ possible choices of two of the four cards of that kind.
Example 3 (Cont.)
There are 12 kinds remaining from which to
select the other three cards in the hand. We
must insist that the kinds be different from each
other and from the kind of which we have a pair,
or we could end up with a second pair, three or
four of a kind, or a full house.
Example 3 (Cont.)
There are $C^{12}_3 = 220$ ways to pick the kinds of the remaining three cards. There are 4 choices for the suit of each of those three cards, a total of $4^3 = 64$ choices for the suits of all three.
Therefore the number of "one pair" hands is
$$13 \times 6 \times 220 \times 64 = 1{,}098{,}240.$$
The probability is
$$\frac{1{,}098{,}240}{2{,}598{,}960} \approx .422569$$
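The one-pair count as a Python sketch, with one named factor per step of the argument above:

```python
from math import comb

pair_kind = 13                 # kind that forms the pair
pair_cards = comb(4, 2)        # 6 ways to choose 2 of its 4 cards
other_kinds = comb(12, 3)      # 220 ways to choose 3 different remaining kinds
other_suits = 4 ** 3           # 64 suit choices for those three cards

one_pair = pair_kind * pair_cards * other_kinds * other_suits
p_one_pair = one_pair / comb(52, 5)
print(one_pair)               # 1098240
print(round(p_one_pair, 6))   # 0.422569
```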
Event Relations
The intersection of two events, A and B, is the event that both A and B occur when the experiment is performed. We write A ∩ B.
[Figure: Venn diagram of A ∩ B]
The complement of an event A consists of all outcomes of the experiment that do not result in event A. We write A^C.
[Figure: Venn diagram of A and its complement A^C within the sample space S]
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{50}{120} + \frac{60}{120} - \frac{30}{120} = \frac{80}{120} = \frac{2}{3}$$
Check: $P(A \cup B) = (20 + 30 + 30)/120 = 80/120$
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{6}{36} + \frac{6}{36} - \frac{1}{36} = \frac{11}{36}$$
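The events behind these dice numbers are not stated in the text. As an assumption, take A = "first die shows 1" and B = "second die shows 1", which give exactly 6/36, 6/36 and 1/36. The sketch enumerates the 36 outcomes and checks the addition rule against a direct count.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 equally likely (die1, die2) outcomes

def prob(event):
    """P(event) = (number of outcomes in the event) / 36."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def a(s):   # hypothetical event A: first die shows 1
    return s[0] == 1

def b(s):   # hypothetical event B: second die shows 1
    return s[1] == 1

p_union = prob(lambda s: a(s) or b(s))
p_formula = prob(a) + prob(b) - prob(lambda s: a(s) and b(s))
print(p_union, p_formula)  # 11/36 11/36
```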
$$P(A^C) = 1 - P(A)$$
Two events, A and B, are said to be independent if the occurrence or nonoccurrence of one of the events does not change the probability of the occurrence of the other event.
Conditional Probabilities
• The probability that A occurs, given that event B has occurred, is called the conditional probability of A given B and is defined as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad \text{if } P(B) \neq 0$$
where the bar "|" is read "given".
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/36}{1/6} = \frac{1}{6} = P(A)$$
Defining Independence
We can redefine independence in terms of conditional
probabilities:
Two events A and B are independent if and only if
$$P(A \mid B) = P(A) \quad \text{or} \quad P(B \mid A) = P(B)$$
Otherwise, they are dependent.
In general (dependent or independent events),
$$P(A \cap B) = P(A)\,P(B \text{ given that } A \text{ occurred}) = P(A)\,P(B \mid A)$$
If A and B are independent,
$$P(A \cap B) = P(A)\,P(B)$$
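A sketch of conditional probability and the independence test on the 36 dice outcomes. The events are assumptions chosen to match the 1/36 and 1/6 used in the dice example: A = "first die shows 1", B = "second die shows 1". `cond_prob` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

def prob(event):
    return Fraction(sum(1 for s in S if event(s)), len(S))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) != 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return prob(lambda s: a(s) and b(s)) / p_b

def a(s):   # hypothetical event A: first die shows 1
    return s[0] == 1

def b(s):   # hypothetical event B: second die shows 1
    return s[1] == 1

print(cond_prob(a, b), prob(a))  # 1/6 1/6
# Independence check via the multiplication rule.
print(prob(lambda s: a(s) and b(s)) == prob(a) * prob(b))  # True
```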
If S1, S2, …, Sk are mutually exclusive events whose union is the sample space S (the Law of Total Probability), then
$$P(A) = P(A \cap S_1) + P(A \cap S_2) + \cdots + P(A \cap S_k)$$
$$= P(S_1)P(A \mid S_1) + P(S_2)P(A \mid S_2) + \cdots + P(S_k)P(A \mid S_k)$$
[Figure: Venn diagram showing event A split across the partition S1, S2, …, Sk]
With $P(H \mid F) = .08$ and $P(H \mid M) = .12$, and $P(M) = .51$, $P(F) = .49$, Bayes' Rule gives
$$P(M \mid H) = \frac{P(M)P(H \mid M)}{P(M)P(H \mid M) + P(F)P(H \mid F)} = \frac{.51(.12)}{.51(.12) + .49(.08)} = .61$$
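A numeric sketch of the calculation: the law of total probability gives the denominator P(H), and Bayes' Rule divides the M term by it. The inputs P(M) = .51, P(F) = .49, P(H|M) = .12 and P(H|F) = .08 are read from the garbled slide and should be treated as assumptions.

```python
# Values read from the slide (assumed): P(M) = .51, P(F) = .49,
# P(H | M) = .12, P(H | F) = .08.
p_m, p_f = 0.51, 0.49
p_h_given_m, p_h_given_f = 0.12, 0.08

# Law of total probability over the partition {M, F}.
p_h = p_m * p_h_given_m + p_f * p_h_given_f

# Bayes' Rule: probability of M given that H occurred.
p_m_given_h = p_m * p_h_given_m / p_h

print(round(p_h, 4), round(p_m_given_h, 2))  # 0.1004 0.61
```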
Example 2:
Tom and Dick are going to take
a driver's test at the nearest DMV office. Tom
estimates that his chances to pass the test are 70%
and Dick estimates his as 80%. Tom and Dick take
their tests independently.
Define D = {Dick passes the driving test}
T = {Tom passes the driving test}
T and D are independent.
P(T) = 0.7, P(D) = 0.8
Example 2 (Cont.):
What is the probability that at most one of the two friends will pass the test?
$$P(\text{at most one person passes}) = P(D^c \cap T^c) + P(D^c \cap T) + P(D \cap T^c)$$
$$= (1 - 0.8)(1 - 0.7) + (0.7)(1 - 0.8) + (0.8)(1 - 0.7) = .44$$
Alternatively,
$$P(\text{at most one person passes}) = 1 - P(\text{both pass}) = 1 - 0.8 \times 0.7 = .44$$
Example 2 (Cont.):
What is the probability that at least one of the two friends will pass the test?
$$P(\text{at least one person passes}) = P(D \cup T) = 0.8 + 0.7 - 0.8 \times 0.7 = .94$$
Alternatively,
$$P(\text{at least one person passes}) = 1 - P(\text{neither passes}) = 1 - (1 - 0.8) \times (1 - 0.7) = .94$$
Example 2 (Cont.):
Suppose we know that only one of the two friends passed the test. What is the probability that it was Dick?
$$P(D \mid \text{exactly one person passed}) = \frac{P(D \cap \text{exactly one person passed})}{P(\text{exactly one person passed})}$$
$$= \frac{P(D \cap T^c)}{P(D \cap T^c) + P(D^c \cap T)} = \frac{0.8 \times (1 - 0.7)}{0.8 \times (1 - 0.7) + (1 - 0.8) \times 0.7} = .63$$
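All three Tom-and-Dick answers can be read off one joint table. This sketch builds the four joint probabilities from P(T) = 0.7, P(D) = 0.8 and independence, then sums the outcomes that satisfy each condition; `prob` is a hypothetical helper name.

```python
from itertools import product

P_T, P_D = 0.7, 0.8   # P(Tom passes), P(Dick passes)

# Joint probability of each (tom_passes, dick_passes) outcome, using independence.
joint = {}
for tom, dick in product([True, False], repeat=2):
    pt = P_T if tom else 1 - P_T
    pd = P_D if dick else 1 - P_D
    joint[(tom, dick)] = pt * pd

def prob(condition):
    """Sum the joint probabilities of the outcomes satisfying condition(tom, dick)."""
    return sum(p for (tom, dick), p in joint.items() if condition(tom, dick))

at_most_one = prob(lambda t, d: t + d <= 1)
at_least_one = prob(lambda t, d: t + d >= 1)
exactly_one = prob(lambda t, d: t + d == 1)
dick_given_exactly_one = joint[(False, True)] / exactly_one

print(round(at_most_one, 2), round(at_least_one, 2), round(dick_given_exactly_one, 2))  # 0.44 0.94 0.63
```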
Random Variables:
A quantitative variable x is a random variable if the value that it assumes, corresponding to the outcome of an experiment, is a chance or random event.
Random variables can be discrete or continuous.
• Examples:
x = SAT score for a randomly selected student
x = number of people in a room at a randomly
selected time of day
x = number on the upper face of a randomly
tossed die
For a discrete random variable x with probability distribution p(x), we must have
$$0 \le p(x) \le 1 \quad \text{and} \quad \sum p(x) = 1$$
x (sum of two dice)   p(x)
2                     1/36
3                     2/36
4                     3/36
5                     4/36
6                     5/36
7                     6/36
8                     5/36
9                     4/36
10                    3/36
11                    2/36
12                    1/36
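The probabilities 5/36, 4/36, …, 1/36 match the distribution of the sum of two fair dice. A sketch that derives the full distribution by counting the 36 equally likely outcomes and checks the two requirements 0 ≤ p(x) ≤ 1 and Σ p(x) = 1:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each sum.
counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
p = {x: Fraction(counts[x], 36) for x in sorted(counts)}

for x in p:
    print(x, f"{counts[x]}/36")

# The two requirements for a probability distribution.
assert all(0 <= px <= 1 for px in p.values())
assert sum(p.values()) == 1
```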
Probability Distribution
Probability distributions can be used to describe the population, just as we described samples in the Statistics sub-topic.
Shape: Symmetric, skewed, mound-shaped…
Mean:
$$\mu = \sum x\,p(x)$$
Variance:
$$\sigma^2 = \sum (x - \mu)^2\,p(x)$$
Standard deviation:
$$\sigma = \sqrt{\sigma^2}$$
$$\sigma^2 = .28125 + .09375 + .09375 + .28125 = .75$$
$$\sigma = \sqrt{.75} \approx .866$$
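The slide's setup is not shown in the text. A distribution that reproduces exactly these four variance terms and the center μ = 1.5 is x = number of heads in three tosses of a fair coin, so the sketch assumes it. It applies the mean, variance and standard deviation formulas term by term and confirms that √.75 ≈ .866.

```python
import math

# Assumed distribution: x = number of heads in three tosses of a fair coin.
p = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
assert math.isclose(sum(p.values()), 1.0)

mu = sum(x * px for x, px in p.items())                  # mean
terms = [(x - mu) ** 2 * px for x, px in p.items()]      # (x - mu)^2 p(x)
var = sum(terms)                                         # variance
sigma = math.sqrt(var)                                   # standard deviation

print(mu, terms, var, round(sigma, 3))  # 1.5 [0.28125, 0.09375, 0.09375, 0.28125] 0.75 0.866
```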
• Shape? Symmetric; mound-shaped
• Outliers? None
• Center? μ = 1.5
• Spread? σ = .866
Key Concepts
I. Experiments and the Sample Space
1. Experiments, events, mutually exclusive events,
simple events
2. The sample space
II. Probabilities
1. Relative frequency definition of probability
2. Properties of probabilities
a. Each probability lies between 0 and 1.
b. Sum of all simple-event probabilities equals 1.
3. P(A), the sum of the probabilities for all simple events in A