Statistics and Probability


MTH 1110: Basic Mathematics

TOPIC 5

STATISTICS &
PROBABILITY
Topic 5: Statistics & Probability

What is Statistics?
Definition: The science of collecting, presenting, analyzing, and reasonably interpreting data.
• Statistics provides a rigorous scientific method for gaining insight into data.
• With many measurements, simply looking at the raw data fails to give an informative account. Statistics gives a quick overall picture of the data through graphical presentation or numerical summarization, irrespective of the number of data points.
• It helps us make inferences and predict relationships between variables.

Frequency Distribution
Consider a data set of 26 children aged 1 to 6 years. The frequency distribution of the variable 'age' can be tabulated as follows:

Frequency Distribution of Age

Age        1  2  3  4  5  6
Frequency  5  3  7  5  4  2

Grouped Frequency Distribution of Age

Age Group  1-2  3-4  5-6
Frequency    8   12    6

Cumulative Frequency
Cumulative frequency of the age data from the frequency distribution above:

Age                   1  2   3   4   5   6
Frequency             5  3   7   5   4   2
Cumulative Frequency  5  8  15  20  24  26

Age Group             1-2  3-4  5-6
Frequency               8   12    6
Cumulative Frequency    8   20   26
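
The cumulative frequency row is just a running total of the frequency row. A minimal Python sketch of that calculation for the ungrouped age table (the variable names are illustrative, not part of the slides):

```python
from itertools import accumulate

ages = [1, 2, 3, 4, 5, 6]
frequency = [5, 3, 7, 5, 4, 2]

# A running total of the frequencies gives the cumulative frequency.
cumulative = list(accumulate(frequency))

for age, f, cf in zip(ages, frequency, cumulative):
    print(age, f, cf)
```

The last cumulative value always equals the number of observations, here 26.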

Data Presentation
There are two types of statistical presentation of data:
• Graphical presentation
• Numerical presentation

Graphical Presentation: We look for the overall pattern and for striking deviations from that pattern. The overall pattern is usually described by the shape, center, and spread of the data. An individual value that falls outside the overall pattern is called an outlier.

• Bar diagrams and pie charts are used for categorical variables.

• Histograms, stem-and-leaf plots, and box plots are used for numerical variables.

Data Presentation – Categorical Variables


Bar Diagram: Lists the categories and presents the percent or count of individuals who fall in each category. Here the total is 60 subjects.

Figure 1: Bar Chart of Subjects in Treatment Groups (x-axis: Treatment Group; y-axis: Number of Subjects)

Treatment Group  Frequency  Proportion       Percent (%)
1                15         (15/60) = 0.250  25.0
2                25         (25/60) = 0.417  41.7
3                20         (20/60) = 0.333  33.3
Total            60         1.00             100

Data Presentation – Categorical Variables


Pie Chart: Lists the categories and presents the percent or count of individuals who fall in each category.

Figure 2: Pie Chart of Subjects in Treatment Groups (Group 1: 25%, Group 2: 42%, Group 3: 33%)

Treatment Group  Frequency  Proportion       Percent (%)
1                15         (15/60) = 0.250  25.0
2                25         (25/60) = 0.417  41.7
3                20         (20/60) = 0.333  33.3
Total            60         1.00             100

Data Presentation – Numerical Variables


Histogram: The overall pattern can be described by its shape, center, and spread. The following age distribution is right skewed. The center lies between 40 and 180. There are no outliers.

Figure 3: Age Distribution (x-axis: Age in Months, bins from 40 to 140 and "More"; y-axis: Number of Subjects, 0 to 16)

Numerical Presentation
• A fundamental concept in summary statistics is that of a central value for a set of observations, and the extent to which that central value characterizes the whole data set. Measures of central value such as the mean or median must be coupled with measures of dispersion (e.g., the average distance from the mean) to indicate how well the central value characterizes the data as a whole.

• To see how well a central value characterizes a set of observations, consider the following two data sets:
A: 30, 50, 70
B: 40, 50, 60
The mean of both data sets is 50. However, the observations in data set A lie farther from the mean than those in data set B. Thus the mean of data set B represents its data better than the mean of data set A does.

Methods of Center Measurement

• Center measurement is a summary measure of the overall level of a dataset.

• Commonly used measures are the mean, median, mode, geometric mean, etc.

Mean: The mean is obtained by summing all the observations and dividing by the number of observations.
For example, given the numbers 20, 30, 40:

Mean = (20 + 30 + 40) / 3 = 30.

Notation: Let x_1, x_2, ..., x_n be n observations of a variable x. Then the mean of this variable is

$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$

Methods of Center Measurement

Median: This is the middle value in an ordered sequence of observations.

• To find the median, order the data set and then take the middle value.

• With an even number of observations, the median is the average of the two middle values.

For example, to find the median of {9, 3, 6, 7, 5}, we first sort the data to get {3, 5, 6, 7, 9}, then choose the middle value, 6.

If the number of observations is even, e.g., {9, 3, 6, 7, 5, 2}, the median is the average of the two middle values of the sorted sequence {2, 3, 5, 6, 7, 9}: (5 + 6) / 2 = 5.5.

Methods of Center Measurement

Mode: This is the value that is observed most frequently.

• The mode is undefined for sequences in which no observation is repeated.

For example, given the set of numbers {9, 3, 6, 7, 5, 3}, Mode = 3.

• For {9, 3, 6, 7, 5}, no value is repeated, so the mode is undefined.

Mean or Median
The median is less sensitive to outliers (extreme values) than the mean, and is thus a better measure than the mean for highly skewed distributions, e.g. family income.
• For example, the mean of 20, 30, 40, and 990 is (20 + 30 + 40 + 990) / 4 = 270.
• The median of these four observations is (30 + 40) / 2 = 35.
Here 3 of the 4 observations lie between 20 and 40, so the mean of 270 fails to give a realistic picture of the bulk of the data. It is pulled up by the extreme value 990.
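
The comparison can be checked with Python's standard `statistics` module. This is only an illustrative sketch using the four observations from the slide:

```python
import statistics

data = [20, 30, 40, 990]

mean_value = statistics.mean(data)      # sum of observations / number of observations
median_value = statistics.median(data)  # average of the two middle values (even n)

print("mean:", mean_value)
print("median:", median_value)
```

Replacing 990 with a value such as 50 changes the mean substantially but moves the median very little, which is the sense in which the median is less sensitive to outliers.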

Methods of Variability Measurement

Variability (or dispersion) measures the amount of scatter in a dataset.

Commonly used measures: range, variance, standard deviation, interquartile range, coefficient of variation, etc.

Range: This is the difference between the largest and the smallest observations.
• For example, given the numbers 10, 5, 2, 100:
Range = 100 - 2 = 98.
It is a crude measure of variability.

Methods of Variability Measurement

Variance: The sample variance of a set of observations is the sum of the squared deviations of the observations from their mean, divided by n - 1 (roughly the average squared deviation). In symbols, the variance of the n observations x_1, x_2, ..., x_n is

$S^2 = \frac{(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$

Example: What is the variance of 5, 7, 3? The mean is (5 + 7 + 3) / 3 = 5, and the variance is

$S^2 = \frac{(5 - 5)^2 + (3 - 5)^2 + (7 - 5)^2}{3 - 1} = 4$

Standard Deviation: This is the square root of the variance.
• For the example above, the standard deviation is the square root of 4, which is 2.
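
As a quick check of the worked example, the standard `statistics` module computes the sample variance with the same n - 1 divisor. A minimal sketch:

```python
import statistics

data = [5, 7, 3]

# statistics.variance and statistics.stdev use the n - 1 divisor (sample versions).
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

print("variance:", sample_variance)
print("standard deviation:", sample_sd)
```

Note that `statistics.pvariance` divides by n instead and would give a different answer for the same data.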

Methods of Variability Measurement

Quartiles: Ordered data can be divided into four regions, each holding about a quarter of the observations. The cut points between these regions are known as quartiles.

In notation, the q-th quartile of a data set is the (q(n + 1) / 4)-th observation of the ordered data, where q is the desired quartile and n is the number of observations.

• The first quartile (Q1) is the cut point below which the first 25% of the data lie.

• The second quartile (Q2) is the cut point at 50% of the data. It is the upper bound of the second quarter (the data between the 25% and 50% points) and equals the median.
• The third quartile (Q3) is the 75% cut point. The third quarter of the data lies between the median and Q3.

Q1 is the median of the first half of the ordered observations and Q3 is the median of the second half of the ordered observations.

Methods of Variability Measurement

Example: Find the quartiles of the following number sequence:
3 6 7 11 13 22 30 40 44 50 52 61 68 80 94
The first quartile is Q1 = 11. The second quartile is Q2 = 40 (this is also the median). The third quartile is Q3 = 61.

In this example, Q1 is the ((15 + 1) / 4) × 1 = 4th observation of the data. The 4th observation is 11, so Q1 of this data is 11.

Inter-quartile Range: This is the difference between Q3 and Q1.

The inter-quartile range of the example above is:
61 - 11 = 50
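
The position rule q(n + 1)/4 can be sketched in Python as follows. The function name `quartile` is hypothetical, and the linear interpolation for fractional positions is an assumption: the slides only show a case where the position is a whole number.

```python
def quartile(sorted_data, q):
    """Return the q-th quartile using the q(n + 1)/4 position rule.

    Fractional positions are linearly interpolated (an assumption; the
    worked example only needs whole-number positions).
    """
    n = len(sorted_data)
    position = q * (n + 1) / 4      # 1-based position in the ordered data
    lower = int(position)           # whole part of the position
    fraction = position - lower
    if lower < 1:
        return sorted_data[0]
    if lower >= n:
        return sorted_data[-1]
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + fraction * (above - below)


data = sorted([3, 6, 7, 11, 13, 22, 30, 40, 44, 50, 52, 61, 68, 80, 94])
q1, q2, q3 = (quartile(data, q) for q in (1, 2, 3))
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Other software may use slightly different quartile conventions, so results can differ for small data sets where the position is not a whole number.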

Deciles and Percentiles

Deciles: If data is ordered and divided into 10 parts, the cut points are called deciles.
Percentiles: If data is ordered and divided into 100 parts, the cut points are called percentiles. The 25th percentile is Q1, the 50th percentile is the median (Q2), and the 75th percentile is Q3.

In notation, the p-th percentile of a data set is the (p(n + 1) / 100)-th observation of the ordered data, where p is the desired percentile and n is the number of observations.

Coefficient of Variation: The standard deviation of the data divided by its mean. It is usually expressed as a percentage.

$\text{Coefficient of Variation} = \frac{\text{standard deviation}}{\bar{x}} \times 100$

Five Number Summary

Five Number Summary: The five number summary of a distribution consists of the following, written in order from smallest to largest:

• The smallest (minimum) observation,
• The first quartile (Q1),
• The median (Q2),
• The third quartile (Q3),
• The largest (maximum) observation.

Box Plot:
A box plot is a graph of the five number summary. The central box spans the quartiles Q1 to Q3.

A line within the box marks the median. Lines extending above and below the box mark the smallest and the largest observations (i.e., the range).

Outlying samples may additionally be plotted outside the range.

Box Plot
Figure: Distribution of Age in Months (box plot marking min, q1, median, q3, and max; vertical axis from 0 to 160)

Choosing a Summary

• The five number summary is usually better than the mean and standard deviation for describing a skewed distribution or a distribution with extreme outliers.

• The mean and standard deviation are reasonable for symmetric distributions that are free of outliers.

Choosing a Summary (continued)

• In real life we cannot always expect the data to be symmetric. It is common practice to report the number of observations (n), mean, median, standard deviation, and range when summarizing data.

• We can include other summary statistics such as Q1, Q3, and the coefficient of variation if they are considered important for describing the data.

Data Shapes
• The shape of data is measured in 2 ways:
I. Skewness
II. Kurtosis
• Skewness: Measures the asymmetry of data
– Positive or right skewed: longer right tail
– Negative or left skewed: longer left tail
• Kurtosis: Measures the peakedness of the distribution of data. The excess kurtosis of the normal distribution is 0 (its kurtosis is 3).

END OF STATISTICS SUB-TOPIC

Probability
• The formal study of uncertainty
• The engine that drives statistics

Why Study Probability

• Nothing in life is certain
• We gauge the chances of successful outcomes in business, medicine, weather, and other everyday situations such as winning a lottery
• It provides a bridge between descriptive and inferential statistics

History of Probability
• For most of human history, probability, the formal study of the laws of chance, was used for only one thing: gambling.

What is Probability
• In the Statistics sub-topic (above), we looked at graphs and numerical measures to describe data sets, which were usually samples.
• We measure "how often" using
Relative frequency = f/n
• As n gets larger, the sample approaches the population, and "how often" (the relative frequency) approaches the probability:
Sample → Population
Relative frequency → Probability

Probabilistic vs Statistical Reasoning

• Suppose I know exactly the proportions of car makes in Kampala. Then I can find the probability that the first car I see in the street is a Ford. This is probabilistic reasoning: I know the population and predict the sample.
• Now suppose that I do not know the proportions of car makes in Kampala, but would like to estimate them. I observe a random sample of cars in the street and use it to estimate the proportions in the population. This is statistical reasoning.

Basic Concepts of Probability

• An experiment is the process by which an observation (or measurement) is obtained.
• An event is an outcome of an experiment, usually denoted by a capital letter.
– It is the basic element to which probability is applied.
– When an experiment is performed, a particular event either happens, or it doesn't!

Basic Concepts of Probability

• Experiment: Record an age
– A: person is 30 years old
– B: person is older than 65
• Experiment: Toss a die
– A: observe an odd number
– B: observe a number greater than 2

Basic Concepts (Cont.)

• Two events are mutually exclusive if, when one event occurs, the other cannot, and vice versa.
Experiment: Toss a die
– A: observe an odd number
– B: observe a number greater than 2
– C: observe a 6
– D: observe a 3
A and B: not mutually exclusive. C and D: mutually exclusive. What about B and C? B and D?

Basic Concepts (Cont.)

• An event that cannot be decomposed is called a simple event.
– Denoted by E with a subscript.
• Each simple event will be assigned a probability, measuring "how often" it occurs.
• The set of all simple events of an experiment is called the sample space, S.

Example: The die toss

Simple events:
Face 1: E1
Face 2: E2
Face 3: E3
Face 4: E4
Face 5: E5
Face 6: E6

Sample space: S = {E1, E2, E3, E4, E5, E6}

Example: The die toss

• An event is a collection of one or more simple events.
• The die toss:
– A: an odd number
– B: a number > 2

A = {E1, E3, E5}
B = {E3, E4, E5, E6}

Probability of an Event
• The probability of an event A measures "how often" A will occur. We write P(A).
• Suppose that an experiment is performed n times. The relative frequency for an event A is
$\frac{\text{Number of times } A \text{ occurs}}{n} = \frac{f}{n}$
• If we let n get infinitely large,
$P(A) = \lim_{n \to \infty} \frac{f}{n}$

Probability of an Event
• P(A) must be between 0 and 1.
• If event A can never occur, P(A) = 0. If event A always occurs when the experiment is performed, P(A) = 1.
• The sum of the probabilities for all simple events in S equals 1.

• The probability of an event A is found by adding the probabilities of all the simple events contained in A.

Finding Probabilities
• Probabilities can be found using
– Estimates from empirical studies
– Common sense estimates based on equally likely events.
• Examples:
– Toss a fair coin. P(Head) = 1/2
– Suppose that 10% of Uganda's population is male. Then for a person selected at random, P(male) = 0.1

Use of Simple Events

• The probability of an event A is equal to the sum of the probabilities of the simple events contained in A.
• If the simple events in an experiment are equally likely, you can calculate

$P(A) = \frac{n_A}{N} = \frac{\text{number of simple events in } A}{\text{total number of simple events}}$

Example 1: Simple Events

Toss a fair coin twice. What is the probability of observing at least one head?

1st Coin  2nd Coin  Ei        P(Ei)
H         H         E1 = HH   1/4
H         T         E2 = HT   1/4
T         H         E3 = TH   1/4
T         T         E4 = TT   1/4

P(at least 1 head) = P(E1) + P(E2) + P(E3) = 1/4 + 1/4 + 1/4 = 3/4

Example 2:
• A bowl contains three marbles, one red, one blue and one green. A child selects two marbles at random. What is the probability that at least one is red?

1st marble  2nd marble  Ei  P(Ei)
R           B           RB  1/6
R           G           RG  1/6
B           R           BR  1/6
B           G           BG  1/6
G           B           GB  1/6
G           R           GR  1/6

P(at least 1 red) = P(RB) + P(BR) + P(RG) + P(GR) = 4/6 = 2/3

Example 3:
• The sample space of throwing a pair of dice consists of the 36 ordered pairs (red die, green die), from (1,1) to (6,6).

Example 3 (Cont.):
• Some events and their probabilities:

Event             Simple events                          Probability
Dice add to 3     (1,2),(2,1)                            2/36
Dice add to 6     (1,5),(2,4),(3,3),(4,2),(5,1)          5/36
Red die shows 1   (1,1),(1,2),(1,3),(1,4),(1,5),(1,6)    6/36
Green die shows 1 (1,1),(2,1),(3,1),(4,1),(5,1),(6,1)    6/36
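
For equally likely simple events, these probabilities can be reproduced by listing the sample space and counting. The following Python sketch assumes each outcome is an ordered pair (red die, green die); the helper name `probability` is illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered pairs (red die, green die), each equally likely.
sample_space = list(product(range(1, 7), repeat=2))


def probability(event):
    """P(event) = (simple events in the event) / (total simple events)."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))


p_sum_3 = probability(lambda o: o[0] + o[1] == 3)
p_sum_6 = probability(lambda o: o[0] + o[1] == 6)
p_red_1 = probability(lambda o: o[0] == 1)
p_green_1 = probability(lambda o: o[1] == 1)

print(p_sum_3, p_sum_6, p_red_1, p_green_1)
```

`Fraction` prints results in lowest terms, so 2/36 appears as 1/18 and 6/36 as 1/6.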

Simple Events Counting rules:

• The sample space of throwing 3 dice has 216 entries, the sample space of throwing 4 dice has 1296 entries, …
• At some point, we have to stop listing and start thinking …
• We need some counting rules

The mn Rule
• If an experiment is performed in two stages, with m ways to accomplish the first stage and n ways to accomplish the second stage, then there are mn ways to accomplish the experiment.
• This rule is easily extended to k stages, with the number of ways equal to
n1 n2 n3 … nk

Example: Toss two coins. The total number of simple events is:
2 × 2 = 4

The mn Rule by Examples

Example: Toss three coins. The total number of simple events is: 2 × 2 × 2 = 8
Example: Toss two dice. The total number of simple events is: 6 × 6 = 36
Example: Toss three dice. The total number of simple events is: 6 × 6 × 6 = 216
Example: Two marbles are drawn from a dish containing two red and two blue marbles. The total number of simple events is: 4 × 3 = 12

Permutations
• The number of ways you can arrange n distinct objects, taking them r at a time, is:
$P^n_r = \frac{n!}{(n - r)!}$
where n! = n(n - 1)(n - 2)...(2)(1) and 0! = 1.
Example: How many 3-digit lock combinations can we make from the numbers 1, 2, 3, and 4, using each number at most once?

The order of the choice is important!
$P^4_3 = \frac{4!}{1!} = 4(3)(2) = 24$

Permutations
Example: A lock consists of five parts and can be assembled in any order. A quality control engineer wants to test each order for efficiency of assembly. How many orders are there?

The order of the choice is important!
$P^5_5 = \frac{5!}{0!} = 5(4)(3)(2)(1) = 120$

Combinations
• The number of distinct combinations of n distinct objects that can be formed, taking them r at a time, is
$C^n_r = \frac{n!}{r!(n - r)!}$
Example: Three members of a 5-person committee must be chosen to form a subcommittee. How many different subcommittees could be formed?

The order of the choice is not important!
$C^5_3 = \frac{5!}{3!(5 - 3)!} = \frac{5(4)(3)(2)(1)}{3(2)(1)(2)(1)} = \frac{5(4)}{(2)(1)} = 10$
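
The permutation and combination counts from the lock, assembly, and subcommittee examples can be checked with `math.perm` and `math.comb` from the Python standard library (available from Python 3.8). A minimal sketch:

```python
import math

# Ordered selections: P(n, r) = n! / (n - r)!
lock_codes = math.perm(4, 3)        # 3 numbers chosen in order from 4
assembly_orders = math.perm(5, 5)   # all 5 parts in order

# Unordered selections: C(n, r) = n! / (r! (n - r)!)
subcommittees = math.comb(5, 3)     # 3 members chosen from 5

print(lock_codes, assembly_orders, subcommittees)
```

The only difference between the two functions is whether order matters: every unordered choice of r objects corresponds to r! ordered arrangements.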

Example 1 - Combinations
• A box contains six marbles, four red and two green. A child selects two marbles at random. What is the probability that exactly one is red?

The order of the choice is not important!
$C^6_2 = \frac{6!}{2!4!} = \frac{6(5)}{2(1)} = 15$ ways to choose 2 marbles.
$C^4_1 = \frac{4!}{1!3!} = 4$ ways to choose 1 red marble.
$C^2_1 = \frac{2!}{1!1!} = 2$ ways to choose 1 green marble.
4 × 2 = 8 ways to choose 1 red and 1 green marble.
P(exactly one red) = 8/15

Example 2 - Combinations
• A deck of cards consists of 52 cards, 13 "kinds" each of four suits (spades, hearts, diamonds, and clubs). The 13 kinds are Ace (A), 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack (J), Queen (Q), King (K). In many poker games, each player is dealt five cards from a well shuffled deck.

There are
$C^{52}_5 = \frac{52!}{5!(52 - 5)!} = \frac{52(51)(50)(49)(48)}{5(4)(3)(2)(1)} = 2,598,960$
possible hands.

Example 2 (Cont.)
• Four of a kind: 4 of the 5 cards are the same "kind". What is the probability of getting four of a kind in a five card hand?

There are 13 possible choices for the kind of which to have four, and 52 - 4 = 48 choices for the fifth card. Once the kind has been specified, the four are completely determined: you need all four cards of that kind. Thus there are 13 × 48 = 624 ways to get four of a kind.
The probability = 624/2598960 = .000240096

Example 3
• One pair: two of the cards are of one kind, and the other three are of three different kinds. What is the probability of getting one pair in a five card hand?

There are 13 possible choices for the kind of which to have a pair; given the choice, there are $C^4_2 = 6$ possible choices of two of the four cards of that kind.

Example 3 (Cont.)
• There are 12 kinds remaining from which to select the other three cards in the hand. We must insist that the kinds be different from each other and from the kind of which we have a pair, or we could end up with a second pair, three or four of a kind, or a full house.

Example 3 (Cont.)
There are $C^{12}_3 = 220$ ways to pick the kinds of the remaining three cards. There are 4 choices for the suit of each of those three cards, a total of $4^3 = 64$ choices for the suits of all three. Therefore the number of "one pair" hands is
13 × 6 × 220 × 64 = 1,098,240.
The probability = 1098240/2598960 = .422569

Event Relations
The intersection of two events, A and B, is the event that both A and B occur when the experiment is performed. We write A ∩ B.

• If two events A and B are mutually exclusive, then P(A ∩ B) = 0.

Event Relations
The complement of an event A consists of all outcomes of the experiment that do not result in event A. We write A^C.

Example - Event Relations

Select a student from the classroom and record his/her hair color and gender.
• A: student has brown hair
• B: student is female
• C: student is male

What is the relationship between events B and C?

• A^C: Student does not have brown hair
• B ∩ C: Student is both male and female = ∅
• B ∪ C: Student is either male or female = all students = S

Probabilities for Unions & Complements

• There are special rules that will allow you to calculate probabilities for composite events.
The Additive Rule for Unions:
• For any two events, A and B, the probability of their union, P(A ∪ B), is

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

The Additive Rule

Example: Suppose that there were 120 students in the classroom, and that they could be classified as follows (half of the females are brown haired):

          Brown  Not Brown
Male      20     40
Female    30     30

A: brown hair, P(A) = 50/120
B: female, P(B) = 60/120

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
= 50/120 + 60/120 - 30/120
= 80/120 = 2/3

Check: P(A ∪ B) = (20 + 30 + 30)/120
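
The additive rule and the direct count can be compared programmatically. The sketch below stores the classroom table as a dictionary; the names `counts` and `prob` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction

# Classroom table: (gender, hair colour) -> number of students.
counts = {
    ("male", "brown"): 20, ("male", "not brown"): 40,
    ("female", "brown"): 30, ("female", "not brown"): 30,
}
total = sum(counts.values())  # 120 students


def prob(condition):
    """Probability that a randomly selected student satisfies the condition."""
    matching = sum(n for key, n in counts.items() if condition(key))
    return Fraction(matching, total)


p_a = prob(lambda k: k[1] == "brown")                # A: brown hair
p_b = prob(lambda k: k[0] == "female")               # B: female
p_a_and_b = prob(lambda k: k == ("female", "brown"))

p_union_rule = p_a + p_b - p_a_and_b
p_union_direct = prob(lambda k: k[1] == "brown" or k[0] == "female")

print(p_union_rule, p_union_direct)
```

Subtracting P(A ∩ B) corrects for the 30 brown-haired females, who would otherwise be counted once in A and again in B.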

The Additive Rule

A: red die shows 1
B: green die shows 1

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
= 6/36 + 6/36 - 1/36
= 11/36

Calculating Probabilities for Complements

• We know that for any event A:
– P(A ∩ A^C) = 0
• Since either A or A^C must occur,
P(A ∪ A^C) = 1
• so that P(A ∪ A^C) = P(A) + P(A^C) = 1

P(A^C) = 1 - P(A)

Example: Probabilities for Complements

Select a student at random from the classroom. Define:
A: male, P(A) = 60/120
B: female, P(B) = ?

          Brown  Not Brown
Male      20     40
Female    30     30

A and B are complementary, so that
P(B) = 1 - P(A) = 1 - 60/120 = 60/120

Calculating Probabilities of Intersections

In the previous example, we found P(A ∩ B) directly from the table. Sometimes this is impractical or impossible. The rule for calculating P(A ∩ B) depends on the idea of independent and dependent events.

Two events, A and B, are said to be independent if the occurrence or nonoccurrence of one of the events does not change the probability of the occurrence of the other event.

Conditional Probabilities
• The probability that A occurs, given that event B has occurred, is called the conditional probability of A given B and is defined as

$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad \text{if } P(B) \neq 0$

The vertical bar "|" is read as "given".

Conditional Probabilities: Two Dice Example

Toss a pair of fair dice. Define
• A: red die shows 1
• B: green die shows 1

P(A|B) = P(A and B)/P(B) = (1/36)/(1/6) = 1/6 = P(A)

P(A) does not change, whether B happens or not… A and B are independent!

Defining Independence
• We can redefine independence in terms of conditional probabilities:
Two events A and B are independent if and only if
P(A|B) = P(A) or P(B|A) = P(B)
Otherwise, they are dependent.

• Once you've decided whether or not two events are independent, you can use the following rule to calculate their intersection.

The Multiplicative Rule for Intersections

• For any two events, A and B, the probability that both A and B occur is

P(A ∩ B) = P(A) P(B given that A occurred) = P(A)P(B|A)

• If the events A and B are independent, then the probability that both A and B occur is

P(A ∩ B) = P(A) P(B)

Example 1: Multiplicative Rule for Intersections

In a certain population, 10% of the people can be classified as being high risk for a heart attack. Three people are randomly selected from this population. What is the probability that exactly one of the three is high risk?
Define H: high risk N: not high risk
P(exactly one high risk) = P(HNN) + P(NHN) + P(NNH)
= P(H)P(N)P(N) + P(N)P(H)P(N) + P(N)P(N)P(H)
= (.1)(.9)(.9) + (.9)(.1)(.9) + (.9)(.9)(.1) = 3(.1)(.9)^2 = .243
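
The same answer can be obtained by enumerating all eight H/N sequences for three people and adding the probabilities of those with exactly one H. This sketch assumes, as the example does, that the three selections are independent.

```python
from itertools import product

p = {"H": 0.1, "N": 0.9}  # H: high risk, N: not high risk

p_exactly_one = 0.0
for outcome in product("HN", repeat=3):     # e.g. ('H', 'N', 'N')
    if outcome.count("H") == 1:
        prob = 1.0
        for person in outcome:              # multiplicative rule (independence)
            prob *= p[person]
        p_exactly_one += prob

print(round(p_exactly_one, 3))
```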

Example 2: Multiplicative Rule for Intersections

Suppose we have additional information in the previous example. We know that only 49% of the population are female. Also, of the female patients, 8% are high risk. A single person is selected at random. What is the probability that it is a high risk female?
Define H: high risk F: female
From the example, P(F) = .49 and P(H|F) = .08.
Use the Multiplicative Rule:
P(high risk female) = P(H ∩ F)
= P(F)P(H|F) = .49(.08) = .0392

Law of Total Probability

Let S1, S2, S3, ..., Sk be mutually exclusive and exhaustive events (that is, one and only one must happen). Then the probability of any event A can be written as

P(A) = P(A ∩ S1) + P(A ∩ S2) + … + P(A ∩ Sk)
= P(S1)P(A|S1) + P(S2)P(A|S2) + … + P(Sk)P(A|Sk)

Law of Total Probability

Figure: Venn diagram of the sample space partitioned into S1, S2, …, Sk, with the event A split into the pieces A ∩ S1, A ∩ S2, …, A ∩ Sk.

Bayes' Rule

Let S1, S2, S3, ..., Sk be mutually exclusive and exhaustive events with prior probabilities P(S1), P(S2), …, P(Sk). If an event A occurs, the posterior probability of Si, given that A occurred, is

$P(S_i \mid A) = \frac{P(S_i) P(A \mid S_i)}{\sum_{j=1}^{k} P(S_j) P(A \mid S_j)} \quad \text{for } i = 1, 2, \ldots, k$

Proof

$P(A \mid S_i) = \frac{P(A \cap S_i)}{P(S_i)} \;\Rightarrow\; P(A \cap S_i) = P(S_i) P(A \mid S_i)$

$P(S_i \mid A) = \frac{P(A \cap S_i)}{P(A)} = \frac{P(S_i) P(A \mid S_i)}{\sum_{j=1}^{k} P(S_j) P(A \mid S_j)}$

The last step uses the Law of Total Probability to write P(A) as the sum in the denominator.

Bayes' Rule: Example

From a previous example, we know that 49% of the population are female. Of the female patients, 8% are high risk for heart attack, while 12% of the male patients are high risk. A single person is selected at random and found to be high risk. What is the probability that it is a male?
Define H: high risk F: female M: male
We know: P(F) = .49, P(M) = .51, P(H|F) = .08, P(H|M) = .12

$P(M \mid H) = \frac{P(M) P(H \mid M)}{P(M) P(H \mid M) + P(F) P(H \mid F)} = \frac{.51(.12)}{.51(.12) + .49(.08)} = .61$
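
The posterior probability in this example can be computed step by step: first P(H) from the Law of Total Probability, then the ratio. A minimal Python sketch with illustrative dictionary names:

```python
prior = {"M": 0.51, "F": 0.49}        # P(M), P(F)
likelihood = {"M": 0.12, "F": 0.08}   # P(H|M), P(H|F)

# Law of Total Probability: P(H) = sum of P(S) * P(H|S) over the groups.
p_h = sum(prior[s] * likelihood[s] for s in prior)

# Posterior probability that a high risk person is male.
posterior_male = prior["M"] * likelihood["M"] / p_h

print(round(p_h, 4), round(posterior_male, 2))
```

Being found high risk raises the probability of "male" from the prior .51 to about .61, because high risk is more common among males in this population.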

Example 2:
Tom and Dick are going to take a driver's test at the nearest DMV office. Tom estimates that his chances to pass the test are 70% and Dick estimates his as 80%. Tom and Dick take their tests independently.
Define D = {Dick passes the driving test}
T = {Tom passes the driving test}
T and D are independent.
P(T) = 0.7, P(D) = 0.8

Example 2 (Cont.):
What is the probability that at most one of the two friends will pass the test?
P(at most one person passes)
= P(D^c ∩ T^c) + P(D^c ∩ T) + P(D ∩ T^c)
= (1 - 0.8)(1 - 0.7) + (0.7)(1 - 0.8) + (0.8)(1 - 0.7)
= .44

Alternatively:
P(at most one person passes)
= 1 - P(both pass) = 1 - 0.8 × 0.7 = .44

Example 2 (Cont.):
What is the probability that at least one of the two friends will pass the test?
P(at least one person passes)
= P(D ∪ T)
= 0.8 + 0.7 - 0.8 × 0.7
= .94

Alternatively:
P(at least one person passes)
= 1 - P(neither passes) = 1 - (1 - 0.8) × (1 - 0.7) = .94

Example 2 (Cont.):
Suppose we know that only one of the two friends passed the test. What is the probability that it was Dick?

P(D | exactly one person passed)
= P(D ∩ exactly one person passed) / P(exactly one person passed)
= P(D ∩ T^c) / (P(D ∩ T^c) + P(D^c ∩ T))
= 0.8 × (1 - 0.7) / (0.8 × (1 - 0.7) + (1 - 0.8) × 0.7)
= .63
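
All three answers for Tom and Dick follow from the four joint outcomes, which can be computed directly because the two tests are independent. A short Python sketch (variable names are illustrative):

```python
p_d, p_t = 0.8, 0.7   # P(D): Dick passes, P(T): Tom passes

# Independence lets us multiply the individual probabilities.
p_both = p_d * p_t
p_neither = (1 - p_d) * (1 - p_t)
p_only_d = p_d * (1 - p_t)
p_only_t = (1 - p_d) * p_t

p_at_most_one = 1 - p_both
p_at_least_one = 1 - p_neither
p_d_given_exactly_one = p_only_d / (p_only_d + p_only_t)

print(round(p_at_most_one, 2), round(p_at_least_one, 2),
      round(p_d_given_exactly_one, 2))
```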

Random Variables:
• A quantitative variable x is a random variable if the value that it
assumes, corresponding to the outcome of an experiment, is a chance or
random event.
• Random variables can be discrete or continuous.

• Examples:
x = SAT score for a randomly selected student
x = number of people in a room at a randomly selected time of day
x = number on the upper face of a randomly tossed die

Probability Distribution for Discrete Random Variables:
The probability distribution for a discrete random
variable x resembles the relative frequency
distributions. It is a graph, table or formula that gives
the possible values of x and the probability p(x)
associated with each value.

We must have
$0 \le p(x) \le 1$ and $\sum p(x) = 1$

Example: Toss a fair coin three times and define x = number of heads.

Simple event   Probability   x
HHH            1/8           3
HHT            1/8           2
HTH            1/8           2
THH            1/8           2
HTT            1/8           1
THT            1/8           1
TTH            1/8           1
TTT            1/8           0

P(x = 0) = 1/8
P(x = 1) = 3/8
P(x = 2) = 3/8
P(x = 3) = 1/8

x    p(x)
0    1/8
1    3/8
2    3/8
3    1/8

[Figure: Probability Histogram for x]
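
The table of simple events can be generated rather than written out by hand. The sketch below assumes a fair coin, so that all eight outcomes are equally likely, and uses exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely outcomes of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))
p_outcome = Fraction(1, len(outcomes))

# p(x): add up the probability of every outcome that has x heads.
pmf = Counter()
for outcome in outcomes:
    pmf[outcome.count("H")] += p_outcome

for x in sorted(pmf):
    print(x, pmf[x])
```

Each printed line pairs a value of x with its probability as a fraction, which can be compared with the x, p(x) table in the example.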

Example 2: Toss two dice and define x = sum of two dice.

x     p(x)
2     1/36
3     2/36
4     3/36
5     4/36
6     5/36
7     6/36
8     5/36
9     4/36
10    3/36
11    2/36
12    1/36
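
A probability distribution can also be given as a formula instead of a table. For the sum of two fair dice, the table entries agree with p(x) = (6 - |x - 7|)/36. This closed form is not stated in the slides; the sketch below introduces it as an assumption and checks it against a direct count over all 36 rolls.

```python
from fractions import Fraction
from itertools import product

def p(x):
    """Closed-form p(x) for the sum of two fair dice, x = 2, ..., 12."""
    return Fraction(6 - abs(x - 7), 36)

# Cross-check the formula against a count over all 36 equally likely rolls.
counts = {}
for a, b in product(range(1, 7), repeat=2):
    counts[a + b] = counts.get(a + b, 0) + 1

for x in range(2, 13):
    assert p(x) == Fraction(counts[x], 36)

print(p(7))                             # 1/6, the same as 6/36
print(sum(p(x) for x in range(2, 13)))  # 1
```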

Probability Distribution
Probability distributions can be used to describe
the population, just as we described samples in
Chapter 2.
• Shape: symmetric, skewed, mound-shaped…
• Outliers: unusual or unlikely measurements
• Center and spread: mean and standard deviation. A population mean is
called $\mu$ and a population standard deviation is called $\sigma$.

The Mean & Standard Deviation


Let x be a discrete random variable with
probability distribution p(x). Then the mean,
variance and standard deviation of x are given
as:

Mean: $\mu = \sum x\,p(x)$
Variance: $\sigma^2 = \sum (x - \mu)^2\,p(x)$
Standard deviation: $\sigma = \sqrt{\sigma^2}$
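
The three formulas translate directly into code. The sketch below assumes the distribution is a dictionary mapping x to p(x) and applies the formulas to a single fair die, a case chosen only for illustration; the function names are hypothetical.

```python
import math
from fractions import Fraction

def mean(pmf):
    """mu = sum of x * p(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """sigma^2 = sum of (x - mu)^2 * p(x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def std_dev(pmf):
    """sigma = square root of the variance."""
    return math.sqrt(variance(pmf))

# One fair die: each face 1..6 has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

print(mean(die_pmf))               # 7/2
print(variance(die_pmf))           # 35/12
print(round(std_dev(die_pmf), 3))  # 1.708
```

Exact fractions keep the mean and variance free of rounding error; only the square root is converted to a decimal.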

The Mean & Standard Deviation


Toss a fair coin 3 times and record x, the number of heads.

x    p(x)    xp(x)    $(x - \mu)^2 p(x)$
0    1/8     0        $(-1.5)^2 (1/8)$
1    3/8     3/8      $(-0.5)^2 (3/8)$
2    3/8     6/8      $(0.5)^2 (3/8)$
3    1/8     3/8      $(1.5)^2 (1/8)$

$\mu = \sum x\,p(x) = \dfrac{12}{8} = 1.5$

$\sigma^2 = \sum (x - \mu)^2\,p(x) = .28125 + .09375 + .09375 + .28125 = .75$

$\sigma = \sqrt{.75} = .866$
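
The mean and standard deviation can also be approximated by simulation, in the spirit of the relative frequency view of probability. This is a rough sketch, assuming a fair coin; the seed and the number of trials are arbitrary choices made for this illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def heads_in_three_tosses():
    """Simulate three fair coin tosses and return the number of heads."""
    return sum(random.random() < 0.5 for _ in range(3))

# 100,000 trials is an arbitrary choice for this sketch.
samples = [heads_in_three_tosses() for _ in range(100_000)]

print(statistics.mean(samples))    # close to mu = 1.5
print(statistics.pstdev(samples))  # close to sigma = sqrt(.75), about 0.866
```

The simulated values differ slightly from run to run when the seed is changed, and they move closer to the exact values as the number of trials grows.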

The Mean & Standard Deviation


The probability distribution for x, the number of heads in
tossing 3 fair coins.

• Shape? Symmetric; mound-shaped
• Outliers? None
• Center? $\mu = 1.5$
• Spread? $\sigma = .866$



Key Concepts
I. Experiments and the Sample Space
1. Experiments, events, mutually exclusive events,
simple events
2. The sample space

II. Probabilities
1. Relative frequency definition of probability
2. Properties of probabilities
a. Each probability lies between 0 and 1.
b. Sum of all simple-event probabilities equals 1.
3. P(A), the sum of the probabilities for all simple events in A

Key Concepts (Cont.)


III. Counting Rules
1. mn Rule; extended mn Rule
2. Permutations: $P_r^n = \dfrac{n!}{(n - r)!}$
3. Combinations: $C_r^n = \dfrac{n!}{r!\,(n - r)!}$
IV. Event Relations
1. Unions and intersections
2. Events
a. Disjoint or mutually exclusive: $P(A \cap B) = 0$
b. Complementary: $P(A) = 1 - P(A^C)$

Key Concepts (Cont.)


3. Conditional probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
4. Independent and dependent events
5. Additive Rule of Probability:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
6. Multiplicative Rule of Probability:
$P(A \cap B) = P(A)\,P(B \mid A)$
7. Law of Total Probability
8. Bayes’ Rule

Key Concepts (Cont.)


V. Discrete Random Variables and Probability Distributions
1. Random variables, discrete and continuous
2. Properties of probability distributions:
$0 \le p(x) \le 1$ and $\sum p(x) = 1$
3. Mean or expected value of a discrete random variable:
Mean: $\mu = \sum x\,p(x)$
4. Variance and standard deviation of a discrete random variable:
Variance: $\sigma^2 = \sum (x - \mu)^2\,p(x)$
Standard deviation: $\sigma = \sqrt{\sigma^2}$
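
The Key Concepts list names the Law of Total Probability and Bayes' Rule without writing them out. As a hedged illustration, the sketch below uses the standard two-event forms, P(E) = P(D)P(E|D) + P(D^c)P(E|D^c) and P(D|E) = P(D)P(E|D)/P(E), and applies them to the Tom and Dick example with E = "exactly one of the two passes". The event name E and the variable names are assumptions introduced here, not notation from the slides.

```python
P_D = 0.8  # P(D): Dick passes, from Example 2
P_T = 0.7  # P(T): Tom passes, from Example 2

# E = "exactly one of the two passes". Because T and D are independent:
p_e_given_d = 1 - P_T   # given Dick passed, E means Tom failed
p_e_given_not_d = P_T   # given Dick failed, E means Tom passed

# Law of Total Probability: P(E) = P(D) P(E|D) + P(D^c) P(E|D^c)
p_e = P_D * p_e_given_d + (1 - P_D) * p_e_given_not_d

# Bayes' Rule: P(D|E) = P(D) P(E|D) / P(E)
p_d_given_e = P_D * p_e_given_d / p_e

print(round(p_e, 2))          # 0.38
print(round(p_d_given_e, 2))  # 0.63
```

The result matches the conditional probability worked out directly in Example 2, which is expected, since Bayes' Rule is a rearrangement of the conditional probability definition.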
END OF PROBABILITY SUB-TOPIC

END OF STATISTICS & PROBABILITY TOPIC
