
Statistic Lecture2023

The document outlines a course on the Principles of Statistics, covering fundamental concepts, data presentation, measures of central tendency, regression and correlation analysis, and probability. It emphasizes the importance of statistics in decision-making across various fields and explains key statistical terms and processes. The course is structured into chapters that detail statistical methods, data collection, and analysis techniques.

Uploaded by

ibraahimaxmad358

Course Name: Principle of Statistics

Table of Contents
Ch1: Fundamental Elements of Statistics
1.1 What are Statistics?
1.2 Why Study Statistics?
1.3 Who Uses Statistics?
1.4 Origin and Growth of Statistics
1.5 Four stages of statistical process
1.6 Functions of Statistics
1.7 Types of statistics
1.8 Types of Variables
1.9 Collecting Data and Obtaining Data
Ch2: Presentation of statistical data
2.1 some statistical terminology
2.2 Presentation of ungrouped data
2.3 Presentation of grouped data

Principle of Statistics Collected by: Eng Ali Sidow Osman Page 1


Ch3: Measures of Central Tendency
3.1 The mean Ungrouped and grouped Data
3.2 The median Ungrouped and grouped Data
3.3 The mode Ungrouped and grouped Data
3.4 The range
3.5 Mean Deviation Definition
3.6 Sample Variance
3.7 variance and standard deviation
3.8 standard deviation Grouped data
Ch4: Regression Analysis
4.1 Linear model assumptions
4.2 Simple linear regression
4.3 Multiple linear regression
4.4 Problems for Regression Analysis
(i) Regression equation of X on Y

(ii) Regression coefficient of Y on X

(iii) Regression equation of Y on X

Ch5: Correlation Analysis
5.1 Types of correlation coefficient formulas
5.2 What is Pearson Correlation?
5.3 Potential problems with Pearson correlation

Ch6: Probability
6.1 Introduction to Probability
6.2 Laws of Probability
6.3 Empirical Probability
• The Addition Rules for Probability
• The Multiplication Rules and Conditional Probability
• Conditional Probability

Chapter One
Fundamental Elements of Statistics
What are Statistics?
• Statistics is the science of collecting, organizing, presenting, analyzing,
and interpreting numerical data to assist in making more effective decisions.
• Statistics is the science of data. It involves collecting, classifying,
summarizing, organizing, analyzing, and interpreting numerical information.
• Statistics is a branch of mathematics that examines ways to collect,
analyze, interpret, and present data in a meaningful way.

Why Study Statistics?


1) Numerical information is everywhere.
2) Statistical techniques are used to make decisions that affect our daily lives.
3) Knowledge of statistical methods will help you understand how decisions
are made and give you a better understanding of how they affect you.
4) It develops an understanding of some basic ideas of statistical reliability
and stochastic processes (probability concepts).
5) Statistics is important in every aspect of society (government, people, and business).
6) It develops an appreciation for variability and how it affects products, processes,
and systems.
7) It helps in estimating the present and predicting the future.
8) It provides methods that can be used to solve problems and build knowledge.
9) Statistics turns data into information.
No matter what line of work you select, you will find yourself faced with
decisions where an understanding of data analysis is helpful.

Who Uses Statistics?


Statistical techniques are used extensively in marketing, accounting, and
quality control, and by consumers, professional sports people, hospital
administrators, educators, politicians, physicians, and many others.

Origin and Growth of Statistics:

• The words 'Statistics' and 'Statistical' are both derived from the Latin word
status, which means a political state.
• The theory of statistics as a distinct branch of scientific method is of
comparatively recent growth.
• Research, particularly into the mathematical theory of statistics, is proceeding
rapidly, and fresh discoveries are being made all over the world.

Four stages of statistical process


1) Collection of data: this is the first step and the foundation upon which the rest of the
statistical process is built.
2) Presentation of data: the mass of collected data should be presented in a suitable, concise
form.
3) Analysis of data:
4) Interpretation of data:

Functions of Statistics
There are many functions of statistics. Let us consider the following five
important functions:
1) Condensation:
2) Comparison:
3) Forecasting:
4) Estimation:
5) Tests of Hypothesis:

1. Key Statistical Concepts


1) Experimental unit: the object upon which we collect data.
2) Population: a collection of all possible individuals, objects, or
measurements of interest.
3) Variable: a property or characteristic of some event, object, or person that
can take on different values or amounts. Constants do not vary.
Variables may be:
a) independent or dependent;
b) discrete or continuous;
c) qualitative or quantitative.
4) Sample: a portion, or part, of the population of interest.
A measurement is a number or attribute computed for each member of a population or of a
sample. The measurements of sample elements are collectively called the sample data.

A parameter is a number that summarizes some aspect of the population as a whole.

Types of statistics
There are two main branches of statistics: descriptive and inferential. Descriptive
statistics says something only about the set of information that has been collected.
Inferential statistics makes predictions or comparisons about a larger group (a
population) using information gathered about a small part of that population. Thus, inferential
statistics involves generalizing beyond the data, something that descriptive statistics does not do.
1) Descriptive statistics: methods of organizing, summarizing, and presenting data in
an informative way.
EXAMPLE 1: The United States government reports the population of the United States
was 179,323,000 in 1960; 203,302,000 in 1970; 226,542,000 in 1980; 248,709,000 in 1990;
and 265,000,000 in 2000.
2) Inferential statistics: a decision, estimate, prediction, or generalization about a
population, based on a sample.

Note: In statistics the words population and sample have a broader meaning.
A population or sample may consist of individuals or objects.

Types of Variables
Quantitative data are measurements that are recorded on a naturally occurring
numerical scale. Quantitative data are further classified as either discrete or continuous.
Discrete data are numeric data that have a finite or countable number of possible values.
A classic example of discrete data is a finite subset of the counting numbers (1, 2, 3,
4, 5, 6, 7, 8), perhaps corresponding to response levels from Strongly Disagree to Strongly Agree.
Continuous data have infinitely many possible values within any interval: 1.4, 1.41, 1.414, 1.4142, 1.41421…
The real numbers are continuous, with no gaps or interruptions. Physically measurable
quantities such as length, volume, time, and mass are continuous.
Qualitative data are measurements for which there is no natural numerical scale;
they can only be classified into one of a group of categories and consist of
attributes, labels, or other nonnumeric characteristics.
Data analysis is a process of gathering, modeling, and transforming data with the
goal of highlighting useful information, suggesting conclusions, and supporting
decision making. Data analysis has multiple facets and approaches, encompassing
diverse techniques under a variety of names, in different business, science, and social
science domains.

Some definitions follow:
1) Raw data: data collected in original form.
2) Frequency: the number of times a certain value or class of values occurs.
3) Frequency distribution: the organization of raw data in table form with
classes and frequencies.
4) Categorical frequency distribution: a frequency distribution in which the data
are only nominal or ordinal.
5) Ungrouped frequency distribution: a frequency distribution of numerical data
in which the raw data are not grouped.
6) Grouped frequency distribution: a frequency distribution in which several
numbers are grouped into one class.
7) Class limits: separate one class in a grouped frequency distribution from
another. The limits could actually appear in the data, and there are gaps between the
upper limit of one class and the lower limit of the next.
8) Class boundaries: separate one class in a grouped frequency distribution from
another. Boundaries have one more decimal place than the raw data and
therefore do not appear in the data. There is no gap between the upper
boundary of one class and the lower boundary of the next class. The lower
class boundary is found by subtracting 0.5 units from the lower class limit and
the upper class boundary is found by adding 0.5 units to the upper class limit.
9) Class width: the difference between the upper and lower boundaries of any
class. The class width is also the difference between the lower limits of two
consecutive classes or the upper limits of two consecutive classes. It is not the
difference between the upper and lower limits of the same class.
10) Class mark (midpoint): the number in the middle of the class. It is found
by adding the upper and lower limits and dividing by two. It can also be found
by adding the upper and lower boundaries and dividing by two.
11) Cumulative frequency: the number of values less than the upper class
boundary for the current class. This is a running total of the frequencies.
12) Relative frequency: the frequency divided by the total frequency. This
gives the proportion (or, multiplied by 100, the percent) of values falling in the class.
13) Cumulative relative frequency (relative cumulative frequency): the
running total of the relative frequencies, or the cumulative frequency divided
by the total frequency. It gives the proportion of the values that are less than the
upper class boundary.
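
The definitions above can be tied together in a short Python sketch. The class limits and frequencies are made-up illustration data, not taken from the course, and the helper name `describe_classes` is hypothetical. The sketch assumes whole-number data, so each boundary sits 0.5 below or above a class limit.

```python
def describe_classes(limits, freqs):
    """Return one dict per class: boundaries, width, midpoint, and frequencies."""
    total = sum(freqs)
    rows, running = [], 0
    for (lower, upper), f in zip(limits, freqs):
        running += f  # running total = cumulative frequency
        lower_b, upper_b = lower - 0.5, upper + 0.5  # assumes whole-number data
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower_b, upper_b),
            "width": upper_b - lower_b,
            "midpoint": (lower + upper) / 2,
            "frequency": f,
            "cumulative": running,
            "relative": f / total,
            "cumulative_relative": running / total,
        })
    return rows

# Hypothetical data grouped into three classes (illustration only).
limits = [(10, 19), (20, 29), (30, 39)]
freqs = [4, 10, 6]
for row in describe_classes(limits, freqs):
    print(row)
```

Note that the width comes from the boundaries (19.5 minus 9.5), not from the limits of a single class, which matches definition 9.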

Collecting Data and Obtaining Data


Published source: book, journal, newspaper, or Web site.
Designed experiment: the researcher exerts strict control over the units.
Survey: a group of people are surveyed and their responses are recorded.
Observational study: units are observed in their natural setting and the variables of interest are recorded.

Samples
A representative sample exhibits characteristics typical of those possessed by
the population of interest.
A random sample of n experimental units is a sample selected from the
population in such a way that every different sample of size n has an equal
chance of selection
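
As a rough illustration of random sampling, the sketch below draws a simple random sample with Python's standard `random` module. The population of 100 numbered units and the seed value are assumptions made only for this example.

```python
import random

# Hypothetical population: experimental units labelled 1..100.
population = list(range(1, 101))

# A fixed seed is used only so the sketch gives the same sample on every run.
rng = random.Random(42)

# sample() draws without replacement, so no unit can be selected twice.
sample = rng.sample(population, 10)
print(sorted(sample))
```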

Measures of Central Tendency
A measure of central tendency is a descriptive statistic that describes
the average, or typical value of a set of scores
There are three common measures of central tendency:
1) the mean
2) the median
3) the mode
Mean: also known as the arithmetic mean or average. It is calculated by adding the
scores and dividing by the number of scores.

$$\mu = \frac{\sum x}{N} \ \text{(for a population)} \qquad \bar{x} = \frac{\sum x}{n} \ \text{(for a sample)}$$

Calculate the mean of the following data:
1 5 4 3 2
Sum the scores (ΣX):
1 + 5 + 4 + 3 + 2 = 15
Divide the sum (ΣX = 15) by the number of scores (N = 5):
15 / 5 = 3
Mean = X̄ = 3
Example 1: Compute the arithmetic mean of the first 6 odd natural
numbers.
Solution: The first 6 odd natural numbers are 1, 3, 5, 7, 9, and 11.
x̄ = (1 + 3 + 5 + 7 + 9 + 11) / 6 = 36 / 6 = 6
Thus, the arithmetic mean is 6.
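
The two calculations above can be checked with a minimal Python sketch. The variable names are illustrative only; `statistics.mean` is part of the Python standard library.

```python
from statistics import mean

scores = [1, 5, 4, 3, 2]
first_six_odd = [1, 3, 5, 7, 9, 11]

# Definition: sum of the values divided by the number of values.
mean_scores = sum(scores) / len(scores)

# The library function applies the same definition.
mean_odd = mean(first_six_odd)

print(mean_scores, mean_odd)
```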
Example 2: The data represent the number of textbooks purchased by a sample of
seven students: 10 4 7 5 7 8 9

$$\bar{x} = \frac{10 + 4 + 7 + 5 + 7 + 8 + 9}{7} = \frac{50}{7} \approx 7.14$$

Median: the number in the middle when the data are arranged in ascending or
descending order.
The median is a measure of central tendency that is more resistant to the effects of extreme
values than the mean. It is the value that occupies the middle position when the data
are put in rank order by magnitude.
Let n be the number of cases in your data.
If n is odd, the median is the middle number of the data values sorted by magnitude.
It occupies the $\left(\frac{n+1}{2}\right)$th position.
If n is even, the median is the average of the middle two numbers of the data sorted
by magnitude. It is the average of the numbers in the $\left(\frac{n}{2}\right)$th and $\left(\frac{n+2}{2}\right)$th positions.
How to Calculate the Median
Conceptually, it is easy to calculate the median.
Minor problems can occur in practice, so it is best to let a computer do it.
1) Sort the data from highest to lowest (or lowest to highest).
2) Find the score in the middle: middle position = (N + 1) / 2.
3) If N, the number of scores, is even, the median is the average of the middle two scores.

Example: Calculate the median age of the seven employees


53 32 61 57 39 44 57
To find the median, sort the data
32 39 44 53 57 57 61

The median age of the employees is 53 years.


Example (odd number of values): 1 3 4 8 10
The middle value is 4 (two values are higher and two are lower). This is the median.

Example (even number of values): 2 3 4 4 5 8 9 9

The two middle values are 4 and 5. The median is the average of these two values, or 4.5.
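
The odd and even cases can be written as one small function. This is a sketch only; the function name `median` is chosen for illustration, and Python's standard library offers `statistics.median` for the same job.

```python
def median(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the value in position (n + 1) / 2, counting from 1.
        return ordered[mid]
    # Even n: average of the values in positions n/2 and (n + 2)/2.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([53, 32, 61, 57, 39, 44, 57]))  # ages of the seven employees
print(median([1, 3, 4, 8, 10]))
print(median([2, 3, 4, 4, 5, 8, 9, 9]))
```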

Mode: the most frequent value. If two values occur the same (highest) number of times, the set is bimodal;
whenever several values share the highest frequency, the set has more than one mode.
Example:
Find the mode of the ages of the seven employees.
53 32 61 57 39 44 57
The mode is 57 because it occurs the most times
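
A minimal sketch of the counting step, using `collections.Counter` from the standard library. The helper name `modes` is hypothetical. It returns every value that shares the highest frequency, so a bimodal set gives two values.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

ages = [53, 32, 61, 57, 39, 44, 57]
print(modes(ages))
print(modes([1, 1, 2, 2, 3]))  # a made-up bimodal set
```

If every value occurs exactly once, this sketch returns all of them; by the usual convention such a set is said to have no mode.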

AVERAGES (MEAN, MEDIAN, AND MODE)


A. Finding the Mean
The mean of a set of values is the sum of the values divided by the number of values. It is also
called the average.
Example: Find the mean of 19, 13, 15, 25, and 18.

$$\frac{19 + 13 + 15 + 25 + 18}{5} = \frac{90}{5} = 18$$
When the mean is known and you must find a missing value, some simple rules of algebra must
be applied.
Example: Ali has received the following grades this term: 75, 87, 90, 88, and 79. If he wishes to
earn an 85 average, what must he score on his final test?
Set up the problem like this:

$$\frac{75 + 87 + 90 + 88 + 79 + s}{6} = 85$$

To solve:
1. Add the known values.

$$\frac{419 + s}{6} = 85$$

2. Next, isolate the unknown (s) on one side of the equation. To do this we
must use inverse operations to eliminate the numbers on the side of the equation with the
unknown (this means we do the opposite of what is being done).
Start with the 6. Since the expression 419 + s is divided by 6, we multiply it
by 6.
NOTE: Whatever you do to one side of the equation, you must do to the other side of the
equation as well. Therefore, we multiply the 85 by 6 too.

$$6 \times \frac{419 + s}{6} = 85 \times 6$$

The 6s on the left side of the equation cancel. This leaves the equation:

$$419 + s = 510$$

3. Now eliminate the 419 from the side of the equation with the unknown. Since 419 is
added to s, subtract it from both sides of the equation.

$$419 + s - 419 = 510 - 419$$

This leaves us with: s = 91
Answer: Ali will need to score 91 on his final test to earn an average of 85 for the
term.
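
The same algebra can be written as a one-line rule: the missing value equals the target mean times the total number of values, minus the sum of the known values. The function name `required_score` below is hypothetical and the code is only a sketch of that rule.

```python
def required_score(known_scores, target_mean):
    """Score needed on one more test so that the overall mean equals target_mean."""
    n_total = len(known_scores) + 1  # known tests plus the final test
    return target_mean * n_total - sum(known_scores)

grades = [75, 87, 90, 88, 79]
print(required_score(grades, 85))
```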

Notation:

Σ denotes the summation of a set of values.

x is the variable usually used to represent the individual data values.

n represents the number of values in a sample.

N represents the number of values in a population.

$\bar{x} = \frac{\sum x}{n}$ is the mean of a set of sample values.

$\mu = \frac{\sum x}{N}$ is the mean of all values in a population.

$\bar{x} = \frac{\sum (f \cdot x)}{\sum f}$ is the mean from a frequency distribution.

How to Find the Sample Mean

Sample Question: Find the sample mean for the following set of numbers: 12, 13,
14, 16, 17, 40, 43, 55, 56, 67, 78, 78, 79, 80, 81, 90, 99, 101, 102, 304, 306, 400, 401,
403, 404, and 405
Step 1:Add up all of the numbers:
12 + 13 + 14 + 16 + 17 + 40 + 43 + 55 + 56 + 67 + 78 + 78 + 79 + 80 + 81 + 90 + 99
+ 101 + 102 + 304 + 306 + 400 + 401 + 403 + 404 + 405 = 3744.
Step 2: Count the numbers of items in your data set. In this particular data set
there are 26 items.
Step 3: Divide the number you found in Step 1 by the number you found in Step 2.
3744/26 = 144.
That’s it!
Tip: If you have to show your working on a test, just place the two numbers into the
formula. Step 1 gives you Σx and Step 2 gives you n:
x̄ = (Σx) / n
= 3744 / 26
= 144

Assumed Mean Method Formula


Let x1, x2, x3, …, xn be the mid-points (class marks) of n class intervals and let f1,
f2, f3, …, fn be the respective frequencies. The formula of the assumed
mean method is:

$$\bar{x} = a + \frac{\sum f_i d_i}{\sum f_i}$$

Here,
a = assumed mean
fi = frequency of the ith class
di = xi – a = deviation of the ith class mark from the assumed mean
Σfi = total number of observations
xi = class mark = (upper class limit + lower class limit)/2
Assumed Mean Method Questions
If xi and fi are numerically large, the assumed mean method is preferred. Below are some examples
of calculating the mean of grouped data by this method.
Example 1:
The following table gives information about the marks obtained by 110 students in an examination.
Class 0-10 10-20 20-30 30-40 40-50
Frequency 12 28 32 25 13
Find the mean marks of the students using the assumed mean method.
Solution:

Class (CI)   Frequency (fi)   Class mark (xi)   di = xi – a        fidi
0-10         12               5                 5 – 25 = –20       –240
10-20        28               15                15 – 25 = –10      –280
20-30        32               25 = a            25 – 25 = 0        0
30-40        25               35                35 – 25 = 10       250
40-50        13               45                45 – 25 = 20       260
Total        Σfi = 110                                             Σfidi = –10

Assumed mean = a = 25
Mean of the data:
= a + (Σfidi / Σfi)
= 25 + (–10 / 110)
= 25 – (1/11)
= (275 – 1)/11
= 274/11
≈ 24.9
Hence, the mean mark of the students is approximately 24.9.
Example 2: The table below gives the percentage of female employees in the
departments of a company with various branches.
Percentage of female employees   Number of departments
5-15                             1
15-25                            2
25-35                            4
35-45                            4
45-55                            7
55-65                            11
65-75                            6

Find the mean percentage of female employees by the assumed mean method.
Solution:
Percentage of female   Number of          Class mark (xi)   di = xi – a   fidi
employees (CI)         departments (fi)
5-15                   1                  10                –30           –30
15-25                  2                  20                –20           –40
25-35                  4                  30                –10           –40
35-45                  4                  40 = a            0             0
45-55                  7                  50                10            70
55-65                  11                 60                20            220
65-75                  6                  70                30            180
Total                  Σfi = 35                                           Σfidi = 360

Assumed mean = a = 40
Mean = a + (Σfidi / Σfi)
= 40 + (360/35)
= 40 + (72/7)
≈ 40 + 10.29
= 50.29 (approx)
Hence, the mean percentage of female employees is approximately 50.29.
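
The two worked examples can be reproduced with a short Python sketch of the assumed mean method. The function name `assumed_mean` is chosen here for illustration. The last line shows that the choice of a changes only the arithmetic, not the result.

```python
def assumed_mean(class_marks, freqs, a):
    """Mean of grouped data: a + (sum of f*d) / (sum of f), where d = x - a."""
    sum_f = sum(freqs)
    sum_fd = sum(f * (x - a) for x, f in zip(class_marks, freqs))
    return a + sum_fd / sum_f

# Example 1: marks of 110 students, assumed mean a = 25.
marks = assumed_mean([5, 15, 25, 35, 45], [12, 28, 32, 25, 13], a=25)

# Example 2: percentage of female employees, assumed mean a = 40.
female = assumed_mean([10, 20, 30, 40, 50, 60, 70], [1, 2, 4, 4, 7, 11, 6], a=40)

print(round(marks, 2), round(female, 2))

# A different assumed mean gives the same answer (up to rounding error).
print(round(assumed_mean([5, 15, 25, 35, 45], [12, 28, 32, 25, 13], a=0), 2))
```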
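Formula to find the arithmetic mean of grouped data using an assumed mean:
Arithmetic mean = A + [∑fd / N]
Here A is the assumed mean, d = x – A is the deviation of each value from A, f is its frequency, and N = ∑f.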
Example 1: Calculate arithmetic mean for the following data.
X F
5 4
10 5
15 7
20 4
25 3
30 2
Solution: Now we have to use the formula given above to find the arithmetic mean.
Take the assumed mean A = 15

x      F    d = x – A   fd
5      4    –10         –40
10     5    –5          –25
15     7    0           0
20     4    5           20
25     3    10          30
30     2    15          30
Total  N = 25           ∑fd = 15

Arithmetic mean = A + [∑fd / N]
= 15 + (15/25)
= 15 + (3/5)
= (75 + 3)/5
= 78/5
= 15.6

Example 2: Calculate arithmetic mean for the following data.


Marks Number of students
65 6
70 11
75 3
80 5
85 4
90 7
95 10
100 4

Solution: Now we have to use the formula given above to find the arithmetic
mean. Take the assumed mean A = 80
X      F    d = x – A   fd
65     6    –15         –90
70     11   –10         –110
75     3    –5          –15
80     5    0           0
85     4    5           20
90     7    10          70
95     10   15          150
100    4    20          80
Total  N = 50           ∑fd = 105

Arithmetic mean = A + [∑fd / N]
= 80 + (105/50)
= 80 + (21/10)
= 80 + 2.1
= 82.1

Example 3: The following data give the number of boys of a particular age
in a class of 40 students. Calculate the mean age of the students
Age (in years) Number of students
13 3
14 8
15 9
16 11
17 6
18 3

Solution: Now we have to use the formula given above to find the
arithmetic mean.Take the assumed mean A = 16
X      F    d = x – A   fd
13     3    –3          –9
14     8    –2          –16
15     9    –1          –9
16     11   0           0
17     6    1           6
18     3    2           6
Total  N = 40           ∑fd = –22

Arithmetic mean = A + [∑fd / N]
= 16 + (–22/40)
= 16 – 0.55
= 15.45
Example 4: Calculate the Arithmetic mean of the following data:
Class (X) Frequency (F)
15 12
25 20
35 15
45 14
55 16
65 11
75 7
85 8

Solution: Now we have to use the formula given above to find the arithmetic mean.
Take the assumed mean A = 45
Class (x)   Frequency (F)   d = x – A   fd
15          12              –30         –360
25          20              –20         –400
35          15              –10         –150
45          14              0           0
55          16              10          160
65          11              20          220
75          7               30          210
85          8               40          320
Total       N = 103                     ∑fd = 0

Arithmetic mean = A + [∑fd / N]
= 45 + (0/103) = 45


Find the median of grouped and ungrouped data

1) 3, 8, 9, 4, 12, 34, 21, 7, 1
2) 12, 14, 10, 22, 18, 20

Solution
1) Arranged in ascending order: 1, 3, 4, 7, 8, 9, 12, 21, 34
n = 9 is an odd number, so the median is in position $\frac{n+1}{2} = \frac{9+1}{2} = \frac{10}{2} = 5$, that is, the 5th value. The median is 8.
2) Arranged in ascending order: 10, 12, 14, 18, 20, 22
n = 6 is an even number, so the median is the average of the values in positions $\frac{n}{2} = \frac{6}{2} = 3$ and $\frac{n+2}{2} = \frac{8}{2} = 4$, that is, the 3rd and 4th values.
Median = $\frac{14 + 18}{2} = \frac{32}{2} = 16$

Find the median of the following data

Marks obtained       20  29  28  33  42  38  43  25
Number of students    6  28  24  15   2   4   1  20

Solution
Step 1: Arrange the marks in ascending order and calculate the cumulative frequency.

Marks (X)   Frequency (F)   Cumulative Frequency (CF)
20          6               6
25          20              6 + 20 = 26
28          24              26 + 24 = 50
29          28              50 + 28 = 78
33          15              78 + 15 = 93
38          4               93 + 4 = 97
42          2               97 + 2 = 99
43          1               99 + 1 = 100
Σf = 100

Step 2: Σf = N = 100 is an even number, so the median is the average of the values in
positions N/2 = 100/2 = 50 and N/2 + 1 = 51.
The 50th value is 28 (CF = 50) and the 51st value is 29, so

Median = (28 + 29) / 2 = 57 / 2 = 28.5
The following Frequency distribution gives the monthly consumption of electricity of
68 consumers of a locality. Find the median of the following data.
Monthly Consumption Number of Consumers Cumulative Frequency
in (Unit)
65 – 85 4 4+0 = 4
85 – 105 5 4+5 = 9
105 – 125 13 9+13 = 22
125 – 145 20 22+20 = 42
145 – 165 14 42+14 = 56
165 – 185 8 56+8 = 64
185 –205 4 64+4 = 68

Σf= 68

Formula:

$$\text{Median} = L + \left(\frac{\frac{N}{2} - cf}{f}\right) \times h$$

where
L = lower limit of the median class
h = size (width) of the median class
f = frequency of the median class
N = sum of the frequencies
cf = cumulative frequency of the class just preceding the median class

N/2 = 68/2 = 34, so the median class is 125 – 145 (the first class whose cumulative frequency reaches 34).
L = 125, cf = 22, f = 20, h = 20.

$$\text{Median} = 125 + \left(\frac{\frac{68}{2} - 22}{20}\right) \times 20 = 125 + \left(\frac{34 - 22}{20}\right) \times 20 = 125 + \left(\frac{12}{20}\right) \times 20 = 125 + 12 = 137$$

Calculate the median from the following data:


Marks below: 10 20 30 40 50 60 70 80
No. of students 15 35 60 84 96 127 198 250

Answer
Marks (x) No. of students (f) C.F.
0-10 15 15
10-20 20 35
20-30 25 60
30-40 24 84
40-50 12 96
50-60 31 127
60-70 71 198
70-80 52 250
N = 250

As N = 250, $\frac{N}{2} = \frac{250}{2} = 125$.

As 127 is the first cumulative frequency greater than 125, the median class is 50-60.

$$\text{Median} = l + \left(\frac{\frac{N}{2} - C}{f}\right) \times h$$

Here, l = lower limit of the median class = 50
C = C.F. of the class preceding the median class = 96
h = upper limit – lower limit = 60 – 50 = 10
f = frequency of the median class = 31

$$\therefore \text{Median} = 50 + \left(\frac{\frac{250}{2} - 96}{31}\right) \times 10 = 50 + \left(\frac{125 - 96}{31}\right) \times 10 \approx 50 + 9.35 = 59.35$$


Find the median of the following data:
Marks (x)             10-20  20-30  30-40  40-50  50-60  60-70
No. of students (f)     7      9     10     12      5      7

Solution:

Marks (x)   No. of students (f)   C.F.
10-20       7                     7
20-30       9                     16
30-40       10                    26
40-50       12                    38
50-60       5                     43
60-70       7                     50
N = 50

Here, the total number of students is N = 50, so N/2 = 25.
As 26 is the first cumulative frequency greater than 25, the median class is 30-40,
with l = 30, C = 16, f = 10, and h = 10.

$$\text{Median} = l + \left(\frac{\frac{N}{2} - C}{f}\right) \times h = 30 + \left(\frac{25 - 16}{10}\right) \times 10 = 30 + 9 = 39$$

Hence, the median of the given data is 39.
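
The grouped-median steps (find N/2, locate the median class from the cumulative frequencies, then apply the formula) can be sketched in Python as follows. The function name `grouped_median` and its argument layout are assumptions made for this example; it also assumes all classes have the same width.

```python
def grouped_median(lower_limits, freqs, width):
    """Median of grouped data: l + ((N/2 - C) / f) * h, for equal-width classes."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("the frequency table is empty")
    half = n / 2
    cum = 0  # cumulative frequency of the classes before the current one (C)
    for lower, f in zip(lower_limits, freqs):
        if cum + f >= half:
            # This is the median class.
            return lower + (half - cum) / f * width
        cum += f

# Electricity consumption of 68 consumers.
print(grouped_median([65, 85, 105, 125, 145, 165, 185], [4, 5, 13, 20, 14, 8, 4], 20))

# Marks of 250 students (frequencies recovered from the "marks below" table).
print(round(grouped_median([0, 10, 20, 30, 40, 50, 60, 70],
                           [15, 20, 25, 24, 12, 31, 71, 52], 10), 2))

# Marks of 50 students.
print(grouped_median([10, 20, 30, 40, 50, 60], [7, 9, 10, 12, 5, 7], 10))
```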

The mode is the most commonly observed value in a set of data. For the normal distribution, the mode is
also the same value as the mean and median. In many cases, the modal value will differ from the average
value in the data.

In statistics, the mode is the value that occurs most often in a given set. In other words, the
value in a data set that has the highest frequency is called the
mode or modal value. It is one of the three measures of central tendency, along with the mean and median.
For example, the mode of the set {3, 7, 8, 8, 9} is 8. For a finite number of observations, we can
easily find the mode. A set of values may have one mode, more than one mode, or no mode at all.

How to Find the Mode or Modal Value


Finding the Mode ungrouped data

To find the mode, or modal value, it is best to put the numbers in order
and then count how many times each number occurs. The number that appears most often is
the mode.
Example 1: 3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29
In order these numbers are: 3, 5, 7, 12, 13, 14, 20, 23, 23, 23, 23, 29, 39, 40, 56
This makes it easy to see which numbers appear most often.
In this case the mode is 23.
Example 2: {19, 8, 29, 35, 19, 28, 15}
Arrange them in order: {8, 15, 19, 19, 28, 29, 35}
19 appears twice and all the rest appear only once, so 19 is the mode.

Mode by formula

$$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h$$

How to Calculate Mode Step by Step?

Step 1. Find the maximum class frequency.
Step 2. Find the class corresponding to this frequency. It is called the modal class.
Step 3. Find the class size (upper limit – lower limit).
Step 4. Calculate the mode using the formula, where
l = the lower limit of the modal class,
h = the size of the class interval (assuming classes are of equal size),
f1 = the frequency of the modal class,
f0 = the frequency of the class preceding the modal class,
f2 = the frequency of the class succeeding the modal class.
Example: The marks (out of 50) obtained by 40 students in a class are given in the table below.
Marks obtained       42  36  30  45  50
Number of students    7  10  13   8   2
The mode is 30, because it has the highest frequency (13 students).

Mode of Grouped Data


Example: In a class of 30 students marks obtained by students in statistics out of
50 is tabulated as below. Calculate the mode of data given.

Solution:
The maximum class frequency is 12 and the class interval corresponding to this
frequency is 20 – 30. Thus, the modal class is 20 – 30.
Lower limit of the modal class (l) = 20
Size of the class interval (h) = 10
Frequency of the modal class (f1) = 12
Frequency of the class preceding the modal class (f0) = 5
Frequency of the class succeeding the modal class (f2) = 8
Substituting these values in the formula we get

$$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h = 20 + \left(\frac{12 - 5}{2 \times 12 - 5 - 8}\right) \times 10 = 20 + \left(\frac{7}{11}\right) \times 10 \approx 20 + 6.36 = 26.36$$
Example: The following data give the observed lifetimes (in hours) of 225 electrical
components.
Lifetime (in hours)   0-20  20-40  40-60  60-80  80-100  100-120
Frequency (f)          10    35     52     61     38      29
Determine the modal lifetime (in hours) of the components.

The highest frequency is 61, so the modal class is 60-80, with l = 60, f1 = 61, f0 = 52, f2 = 38, and h = 20.

$$\text{Mode} = l + \left(\frac{f_1 - f_0}{2f_1 - f_0 - f_2}\right) \times h = 60 + \left(\frac{61 - 52}{2(61) - 52 - 38}\right) \times 20 = 60 + \left(\frac{9}{32}\right) \times 20 = 60 + 5.625 = 65.625$$
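
The four steps for the mode of grouped data can be sketched in Python. The function name `grouped_mode` is hypothetical. Two assumptions are made that the notes do not state: if several classes tie for the top frequency, the first one is used, and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode of grouped data: l + ((f1 - f0) / (2*f1 - f0 - f2)) * h."""
    # Steps 1 and 2: index of the modal class (first class with the top frequency).
    i = max(range(len(freqs)), key=freqs.__getitem__)
    f1 = freqs[i]
    # Assumption: a neighbour that does not exist counts as frequency 0.
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    # Step 4: apply the formula (width is the class size from Step 3).
    return lower_limits[i] + (f1 - f0) / (2 * f1 - f0 - f2) * width

# Lifetimes of 225 electrical components.
print(grouped_mode([0, 20, 40, 60, 80, 100], [10, 35, 52, 61, 38, 29], 20))

# Only the three frequencies quoted in the marks example (f0 = 5, f1 = 12, f2 = 8).
print(round(grouped_mode([10, 20, 30], [5, 12, 8], 10), 2))
```

The denominator is zero when the modal class and both neighbours have the same frequency; the sketch does not handle that case.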

In statistics, the range is the spread of your data from the lowest to the highest value in the
distribution. It is a commonly used measure of variability.
Along with measures of central tendency, measures of variability give you descriptive statistics for
summarizing your data set.
The range is calculated by subtracting the lowest value from the highest value. While a large range
means high variability, a small range means low variability in a distribution.

Calculate the range


The formula to calculate the range is:

R = H – L

where
• R = range
• H = highest value
• L = lowest value

The range is the easiest measure of variability to calculate. To find the range, follow these steps:

1. Order all values in your data set from low to high.
2. Subtract the lowest value from the highest value.
This process is the same regardless of whether your values are positive or negative, or whole
numbers or fractions.

Range example: Your data set is the ages of 8 participants.


Participant 1 2 3 4 5 6 7 8

Age 37 19 31 29 21 26 33 36

First, order the values from low to high to identify the lowest value (L) and the highest value (H).

Age 19 21 26 29 31 33 36 37

Then subtract the lowest from the highest value.

R = H – L = 37 – 19 = 18

The range of our data set is 18 years.

How useful is the range?


The range generally gives you a good indicator of variability when you have a distribution without
extreme values. When paired with measures of central tendency, the range can tell you about the
span of the distribution.
But the range can be misleading when you have outliers in your data set. One extreme value in the
data will give you a completely different range.
Range example with an outlier: One value in your data set is replaced with an outlier.

Age 19 21 26 29 31 33 36 61

Using the same calculation, we get a very different result this time:

R = H – L = 61 – 19 = 42

With an outlier, our range is now 42 years.
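
The effect of a single outlier on the range can be reproduced with a short Python sketch. The helper name `value_range` is an assumption made for this illustration; the data are the two age lists used above.

```python
def value_range(values):
    """Range R = H - L: highest value minus lowest value."""
    return max(values) - min(values)


ages = [37, 19, 31, 29, 21, 26, 33, 36]
ages_with_outlier = [19, 21, 26, 29, 31, 33, 36, 61]

print(value_range(ages))               # 18
print(value_range(ages_with_outlier))  # 42
```

Sorting is not required here because `max` and `min` find the extreme values directly.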


EXAMPLE – Range: The number of cappuccinos sold at the Starbucks location in
the Orange County Airport between 4 and 7 p.m. for a sample of 5 days last year
were 20, 40, 50, 60, and 80. Determine the range for the number of
cappuccinos sold.

Range = Largest value – Smallest value = 80 – 20 = 60

Mean Deviation Definition


The mean deviation is a statistical measure of the average absolute deviation
of the data values from their mean. The mean deviation of a data set can be
calculated using the procedure below.
Step 1: Find the mean of the given data values.
Step 2: Subtract the mean from each data value and ignore any minus sign (that is, take the
absolute value of each difference).
Step 3: Find the mean of the values obtained in Step 2.

Mean Deviation Formula


The formula to calculate the mean deviation for the given data set is given below.

Mean Deviation = [Σ |X – µ|]/N

Here,
Σ represents the sum of the values
X represents each value in the data set
µ represents the mean value of the data set
N represents the number of data values
| | represents the absolute value, which ignores the “-” sign
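EXAMPLE – Mean Deviation: The number of cappuccinos sold at the Starbucks location in
the Orange County Airport between 4 and 7 p.m. for a sample of 5 days last year were 20, 40, 50,
60, and 80. Determine the mean deviation for the number of cappuccinos sold.

Step 1: Find the mean number of cappuccinos sold per day: (20 + 40 + 50 + 60 + 80)/5 = 250/5 = 50
Step 2: Find the absolute deviation of each value from the mean: |20 – 50| = 30, |40 – 50| = 10, |50 – 50| = 0, |60 – 50| = 10, |80 – 50| = 30
Step 3: Find the mean of these absolute deviations: (30 + 10 + 0 + 10 + 30)/5 = 80/5 = 16

The mean deviation is 16 cappuccinos per day.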
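
The three-step mean deviation procedure can be sketched in Python as follows. The function name `mean_deviation` is an assumption for this illustration, and the data are the cappuccino sales from the example.

```python
def mean_deviation(values):
    """Mean deviation = sum(|X - mean|) / N."""
    mean = sum(values) / len(values)                      # Step 1: the mean
    absolute_deviations = [abs(x - mean) for x in values]  # Step 2: |X - mean|
    return sum(absolute_deviations) / len(values)          # Step 3: their mean


cappuccinos = [20, 40, 50, 60, 80]
md = mean_deviation(cappuccinos)
print(md)  # 16.0
```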

Example: Find the mean, variance and standard deviation for the following data:
7, 15, 12, 17, 20, 14, 9.

$$\bar{x} = \frac{\sum x}{n} \text{ (for a sample)} = \frac{7 + 15 + 12 + 17 + 20 + 14 + 9}{7} = \frac{94}{7} = 13.4$$

$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n} = \frac{(7 - 13.4)^2 + (15 - 13.4)^2 + (12 - 13.4)^2 + (17 - 13.4)^2 + (20 - 13.4)^2 + (14 - 13.4)^2 + (9 - 13.4)^2}{7}$$

$$= \frac{40.96 + 2.56 + 1.96 + 12.96 + 43.56 + 0.36 + 19.36}{7} = \frac{121.72}{7} = 17.4$$

$$SD = \sqrt{\text{Variance}} = \sqrt{17.4} = 4.2$$


Find the mean and the standard deviation for the following values: 78.2, 90.5, 98.1,
93.7, and 94.5. First find the mean, then organize the next steps in a table.

$$\bar{x} = \frac{\sum x}{n} = \frac{78.2 + 90.5 + 98.1 + 93.7 + 94.5}{5} = \frac{455}{5} = 91$$

x       x̄     (x − x̄)    (x − x̄)²
78.2    91    −12.8      163.84
90.5    91    −0.5       0.25
98.1    91    7.1        50.41
93.7    91    2.7        7.29
94.5    91    3.5        12.25
                         Σ(x − x̄)² = 234.04

Find the standard deviation:

$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{234.04}{5}} = 6.8$$

Mean and standard deviation of ungrouped data: Recovery times from shoulder injuries.

Time in weeks (x)   Frequency (f)   fx          fx²
1                   5               5           5
2                   8               16          32
3                   12              36          108
4                   19              76          304
5                   7               35          175
6                   4               24          144
7                   3               21          147
8                   2               16          128
                    Σf = 60         Σfx = 229   Σfx² = 1043

$$\bar{x} = \frac{\text{sum of weeks}}{\text{number of patients}} = \frac{\sum fx}{\sum f} = \frac{229}{60} = 3.82 \text{ weeks}$$

$$SD = \sqrt{\frac{\sum f \sum fx^2 - \left(\sum fx\right)^2}{\sum f \left(\sum f - 1\right)}} = \sqrt{\frac{(60)(1043) - (229)^2}{(60)(60 - 1)}} = \sqrt{\frac{62580 - 52441}{3540}} = \sqrt{\frac{10139}{3540}} = \sqrt{2.8641} = 1.69 \text{ weeks}$$
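
The frequency-table calculation above can be sketched in Python. The function name `frequency_table_stats` is an assumption for this illustration; it uses the same computational formula, with Σf(Σf − 1) in the denominator.

```python
from math import sqrt


def frequency_table_stats(values, freqs):
    """Return (n, sum_fx, sum_fx2, mean, sd) for a frequency table."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    mean = sum_fx / n
    # Computational formula with n(n - 1) in the denominator.
    sd = sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))
    return n, sum_fx, sum_fx2, mean, sd


weeks = [1, 2, 3, 4, 5, 6, 7, 8]
patients = [5, 8, 12, 19, 7, 4, 3, 2]
n, sum_fx, sum_fx2, mean_weeks, sd_weeks = frequency_table_stats(weeks, patients)
print(n, sum_fx, sum_fx2)                        # 60 229 1043
print(round(mean_weeks, 2), round(sd_weeks, 2))  # 3.82 1.69
```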
Find the variance and standard deviation for the following data.

No. of orders   Frequency (f)
10-12           4
13-15           12
16-18           20
19-21           14

Solution

No. of orders   Frequency (f)   Midpoint (x)   fx          fx²
10-12           4               11             44          484
13-15           12              14             168         2352
16-18           20              17             340         5780
19-21           14              20             280         5600
                N = 50                         Σfx = 832   Σfx² = 14216

$$SD = \sqrt{\frac{\sum f \sum fx^2 - \left(\sum fx\right)^2}{\sum f \left(\sum f - 1\right)}} = \sqrt{\frac{(50)(14216) - (832)^2}{(50)(50 - 1)}} = \sqrt{\frac{710800 - 692224}{2450}} = \sqrt{\frac{18576}{2450}} = \sqrt{7.5820} = 2.75$$

Thus the variance is 7.58, and the standard deviation of the number of orders received at the office of this
mail-order company during the past 50 days is 2.75.

Sample Variance
The sample variance is the sum of the squared deviations from the arithmetic mean, divided by n – 1.
EXAMPLE – Sample Variance
The hourly wages for a sample of part-time employees at Home Depot are: $12, $20,
$16, $18, and $19. What is the sample variance?
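
A minimal Python sketch of the sample variance for the wage data, assuming the usual n – 1 divisor for a sample. The function name `sample_variance` is an assumption for this illustration.

```python
def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


wages = [12, 20, 16, 18, 19]        # hourly wages in dollars
wage_variance = sample_variance(wages)
print(wage_variance)  # 10.0
```

The mean wage is $17, the squared deviations are 25, 9, 1, 1 and 4, and their sum of 40 divided by 4 gives 10 (in dollars squared).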

Standard Deviation Grouped Data

Score    Frequency (f)   Midpoint (x)   fx           x²      fx²
41-45    1               43             43           1849    1849
36-40    5               38             190          1444    7220
31-35    10              33             330          1089    10890
26-30    12              28             336          784     9408
21-25    10              23             230          529     5290
16-20    5               18             90           324     1620
11-15    3               13             39           169     507
6-10     3               8              24           64      192
1-5      1               3              3            9       9
         Σf = 50                        Σfx = 1285           Σfx² = 36985

$$SD = \sqrt{\frac{\sum f \sum fx^2 - \left(\sum fx\right)^2}{\sum f \left(\sum f - 1\right)}} = \sqrt{\frac{(50)(36985) - (1285)^2}{(50)(49)}} = \sqrt{\frac{1849250 - 1651225}{2450}} = \sqrt{\frac{198025}{2450}} = \sqrt{80.8265} = 8.99$$
Standard Deviation Grouped Data

Grade    Frequency (f)   Midpoint (m)   fm       x̄      (m − x̄)   (m − x̄)²   f(m − x̄)²
50-59    3               54.5           163.5    79.2   −24.7     610.09     1830.27
60-69    5               64.5           322.5    79.2   −14.7     216.09     1080.45
70-79    9               74.5           670.5    79.2   −4.7      22.09      198.81
80-89    12              84.5           1014     79.2   5.3       28.09      337.08
90-100   8               95             760      79.2   15.8      249.64     1997.12
         Σf = 37                        Σfm = 2930.5                         Σf(m − x̄)² = 5443.73

$$\bar{x} = \frac{\sum fm}{n} = \frac{2930.5}{37} = 79.2$$

$$s = \sqrt{\frac{\sum f(m - \bar{x})^2}{n - 1}} = \sqrt{\frac{5443.73}{37 - 1}} = \sqrt{\frac{5443.73}{36}} = 12.3$$

Summary of the calculation procedures:

1) Subtract the mean from each score.
2) Square each result.
3) Sum all the squares.
4) Divide the sum of squares by N to get the population variance.
If you divide the sum of squares by N – 1 instead, you get the sample variance, which is an estimate of the population variance.
5) The standard deviation is the positive square root of the variance.
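
The five steps above can be sketched in Python as follows. The function name `variance_and_sd` and its `sample` flag are assumptions for this illustration; the data are the seven values from the earlier worked example.

```python
from math import sqrt


def variance_and_sd(scores, sample=False):
    """Follow the five summary steps; divide by N - 1 when sample=True."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]       # 1) subtract the mean
    squared = [d ** 2 for d in deviations]        # 2) square each result
    sum_of_squares = sum(squared)                 # 3) sum the squares
    divisor = n - 1 if sample else n              # 4) divide by N or N - 1
    variance = sum_of_squares / divisor
    return variance, sqrt(variance)               # 5) positive square root


data = [7, 15, 12, 17, 20, 14, 9]
pop_var, pop_sd = variance_and_sd(data)
samp_var, samp_sd = variance_and_sd(data, sample=True)
print(round(pop_var, 1), round(pop_sd, 1))  # 17.4 4.2
```

Dividing by N – 1 always gives a slightly larger value than dividing by N, which matters most for small data sets.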

Grouped data: the weights in kilograms recorded by final-year students
are as follows. Calculate the mean, mean deviation, variance and standard
deviation.

W (kg)   f    x      fx      |x − x̄|   f|x − x̄|   f(x − x̄)²
54-57    5    55.5   277.5   11.44     57.20      654.368
58-61    7    59.5   416.5   7.44      52.08      387.4752
62-65    10   63.5   635     3.44      34.40      118.336
66-69    12   67.5   810     0.56      6.72       3.7632
70-73    6    71.5   429     4.56      27.36      124.7616
74-77    5    75.5   377.5   8.56      42.80      366.368
78-81    4    79.5   318     12.56     50.24      631.0144
82-85    1    83.5   83.5    16.56     16.56      274.2336
         50          3347              287.36     2560.32

Solution:

Step one: $$\text{Mean} = \frac{\sum fx}{n} = \frac{3347}{50} = 66.94 \text{ kg}$$

Step two: $$\text{Mean Deviation} = \frac{\sum f|x - \bar{x}|}{n} = \frac{287.36}{50} = 5.7472$$

$$\text{Variance} = \frac{\sum f(x - \bar{x})^2}{n} = \frac{2560.32}{50} = 51.2064$$

$$SD = \sqrt{51.2064} = 7.156$$
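
The grouped-data table can be reproduced with a short Python sketch that derives the midpoints from the class limits. The variable names are assumptions for this illustration, and the divisor is n, as in the worked solution.

```python
from math import sqrt

# Class limits (kg) and frequencies from the weights table.
classes = [(54, 57), (58, 61), (62, 65), (66, 69), (70, 73), (74, 77), (78, 81), (82, 85)]
freqs = [5, 7, 10, 12, 6, 5, 4, 1]

midpoints = [(low + high) / 2 for low, high in classes]
n = sum(freqs)
mean_weight = sum(f * x for f, x in zip(freqs, midpoints)) / n
mean_dev = sum(f * abs(x - mean_weight) for f, x in zip(freqs, midpoints)) / n
variance = sum(f * (x - mean_weight) ** 2 for f, x in zip(freqs, midpoints)) / n
sd = sqrt(variance)

print(round(mean_weight, 2), round(mean_dev, 4), round(variance, 4), round(sd, 3))
# 66.94 5.7472 51.2064 7.156
```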

Standard Deviation of Grouped Data

Standard Deviation of Grouped Data – Example


Refer to the frequency distribution for the Whitner Autoplex data used earlier. Compute the standard
deviation of the vehicle selling prices.


Ch3: Regression Analysis


6 Linear model assumptions
7 Simple linear regression
8 Multiple linear regression
9 Problems for Regression Analysis
(i) Regression equation of X on Y

(ii) Regression coefficient of Y on X

(iii) Regression equation of Y on X

Regression Analysis

Regression analysis is a set of statistical methods used for the estimation of
relationships between a dependent variable and one or more independent variables. It
can be utilized to assess the strength of the relationship between variables and for
modeling the future relationship between them.

Regression analysis includes several variations, such as linear, multiple linear, and
nonlinear. The most common models are simple linear and multiple linear. Nonlinear
regression analysis is commonly used for more complicated data sets in which the
dependent and independent variables show a nonlinear relationship.
Regression analysis offers numerous applications in various disciplines,
including finance.

Regression Analysis – Linear model assumptions

Linear regression analysis is based on six fundamental assumptions:

1. The relationship between the dependent and independent variables is linear, described by the slope and the intercept.
2. The independent variable is not random.
3. The expected value (mean) of the residual (error) is zero.
4. The variance of the residual (error) is constant across all observations.
5. The value of the residual (error) is not correlated across observations.
6. The residual (error) values follow the normal distribution.

Regression Analysis – Simple linear regression

Simple linear regression is a model that assesses the relationship between a
dependent variable and an independent variable. The simple linear model is
expressed using the following equation:

Y = a + bX + ϵ

Where:

1) Y – Dependent variable
2) X – Independent (explanatory) variable
3) a – Intercept
4) b – Slope
5) ϵ – Residual (error)
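
As an illustration of the model Y = a + bX + ϵ, the sketch below estimates a and b by ordinary least squares and then computes the residuals. The function name `fit_simple_linear` and the small data set are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def fit_simple_linear(xs, ys):
    """Least-squares estimates of the intercept a and slope b in Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept
    return a, b


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_simple_linear(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # the error term for each point
print(round(a, 2), round(b, 2))  # 2.2 0.6
```

With a least-squares fit that includes an intercept, the residuals sum to zero (up to rounding), which matches the assumption that the error has mean zero.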
Regression Analysis – Multiple linear regression

Multiple linear regression analysis is essentially similar to the simple linear model,
with the exception that multiple independent variables are used in the model. The
mathematical representation of multiple linear regression is:

Y = a + bX1 + cX2 + dX3 + ϵ

Where:

1) Y – Dependent variable
2) X1, X2, X3 – Independent (explanatory) variables
3) a – Intercept
4) b, c, d – Slopes
5) ϵ – Residual (error)
Multiple linear regression follows the same conditions as the simple linear model.
However, since there are several independent variables in multiple linear analysis,
there is another mandatory condition for the model:
Non-collinearity: Independent variables should show a minimum of correlation with
each other. If the independent variables are highly correlated with each other, it will
be difficult to assess the true relationships between the dependent and independent
variables.

Problems for Regression Analysis


Example1: Calculate the regression coefficient and obtain the lines of regression for
the following data

Solution:

Regression coefficient of X on Y

(i) Regression equation of X on Y

(ii) Regression coefficient of Y on X


(iii) Regression equation of Y on X

Y = 0.929X–3.716+11

= 0.929X+7.284

The regression equation of Y on X is Y= 0.929X + 7.284

Example2: Calculate the two regression equations of X on Y and Y on X from the data
given below, taking deviations from an actual means of X and Y.

Estimate the likely demand when the price is 20 Somali Shillings.

Solution:

Calculation of Regression equation

(i) Regression equation of X on Y

(ii) Regression Equation of Y on X

When X is 20, Y will be

= –0.25 (20) +44.25

= –5+44.25

= 39.25 (when the price is 20 Somali Shillings, the likely demand is 39.25)

Example3: Obtain regression equation of Y on X and estimate Y when X=55 from the
following

Solution:


(i) Regression coefficients of Y on X


(ii) Regression equation of Y on X

Y – 51.57 = 0.942(X – 48.29)

Y = 0.942X – 45.49 + 51.57

Y = 0.942X + 6.08

The regression equation of Y on X is Y = 0.942X + 6.08.

Estimation of Y when X = 55:

Y = 0.942(55) + 6.08 = 57.89
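
The step from the point-slope form Y – Ȳ = byx(X – X̄) to a prediction can be sketched in Python. The function name `line_from_means` is an assumption for this illustration; the means and slope are the values used in Example 3.

```python
def line_from_means(mean_x, mean_y, slope):
    """Convert Y - mean_y = slope * (X - mean_x) into Y = intercept + slope * X."""
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = line_from_means(48.29, 51.57, 0.942)
y_at_55 = intercept + slope * 55
print(round(intercept, 2), round(y_at_55, 2))  # 6.08 57.89
```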


Chapter 4: Correlation Analysis


10 Types of correlation coefficient formulas
11 What is Pearson Correlation?
12 Potential problems with Pearson correlation
Correlation Analysis

Correlation is a statistical measure that expresses the extent to which two
variables are linearly related (meaning they change together at a constant rate). It's a
common tool for describing simple relationships without making a statement about
cause and effect.
Example 1: An agricultural research organization tested whether using a particular chemical fertilizer would
lead to a corresponding increase in the food supply.
X 2 1 3 2 4 5 3
Y 4 3 4 3 6 5 5

Solution:
X     Y     XY    X²    Y²
2     4
1     3
3     4
2     3
4     6
5     5
3     5
Σ=    Σ=    Σ=    Σ=    Σ=

Example 2: The table below shows the time in hours spent studying (x) by 6 grade 11
students and their scores on a test (y). Solve for Pearson’s product-moment correlation
coefficient.
X 1 2 3 4 5 6
Y 5 10 15 15 25 35

Solution

X       Y        XY    X²    Y²
1       5
2       10
3       15
4       15
5       25
6       35
Σ=21    Σ=105    Σ=    Σ=    Σ=
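Use the following correlation coefficient formula:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$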


Correlation coefficients are used to measure how strong a relationship is between two variables.
There are several types of correlation coefficient, but the most popular is Pearson’s. Pearson’s
correlation (also called Pearson’s R) is a correlation coefficient commonly used in linear
regression. If you’re starting out in statistics, you’ll probably learn about Pearson’s R first. In fact,
when anyone refers to the correlation coefficient, they are usually talking about Pearson’s.

Correlation Coefficient Formula: Definition

Correlation coefficient formulas are used to find how strong a relationship is between data. The
formulas return a value between -1 and 1, where:

1) 1 indicates a perfect positive relationship.

2) -1 indicates a perfect negative relationship.

3) A result of zero indicates no linear relationship at all.

Graphs showing a correlation of -1, 0 and +1


Meaning
1) A correlation coefficient of 1 means that for every positive increase in one variable, there is
a positive increase of a fixed proportion in the other. For example, shoe sizes go up in
(almost) perfect correlation with foot length.
2) A correlation coefficient of -1 means that for every positive increase in one variable, there is
a decrease of a fixed proportion in the other. For example, the amount of gas in a
tank decreases in (almost) perfect correlation with speed.
3) Zero means that for every increase, there isn’t a positive or negative increase. The two just
aren’t linearly related.
The absolute value of the correlation coefficient gives us the relationship strength. The larger the
number, the stronger the relationship. For example, |-.75| = .75, which has a stronger relationship
than .65.

Types of correlation coefficient formulas


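There are several types of correlation coefficient formulas.
One of the most commonly used formulas is Pearson’s correlation coefficient
formula. If you’re taking a basic stats class, this is the one you’ll probably use:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$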

Two other formulas are commonly used: the sample correlation coefficient and the
population correlation coefficient.

Sample correlation coefficient

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

sx and sy are the sample standard deviations, and sxy is the sample covariance.

Population correlation coefficient

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

The population correlation coefficient uses σx and σy as the population standard
deviations, and σxy as the population covariance.

What is Pearson Correlation?


Correlation between sets of data is a measure of how well they are related. The most
common measure of correlation in stats is the Pearson Correlation. The full name is
the Pearson Product Moment Correlation (PPMC). It shows the linear
relationship between two sets of data. In simple terms, it answers the question: Can I
draw a line graph to represent the data? Two letters are used to represent the
Pearson correlation: Greek letter rho (ρ) for a population and the letter “r” for a
sample.

Potential problems with Pearson correlation


The PPMC is not able to tell the difference between dependent
variables and independent variables. For example, if you are trying to find the
correlation between a high calorie diet and diabetes, you might find a high correlation
of .8. However, you could also get the same result with the variables switched
around. In other words, you could say that diabetes causes a high calorie diet. That
obviously makes no sense. Therefore, as a researcher you have to be aware of the
data you are plugging in. In addition, the PPMC will not give you any information
about the slope of the line; it only tells you whether there is a relationship.

Real Life Example
Pearson correlation is used in thousands of real life situations. For example, scientists in
China wanted to know if there was a relationship between how weedy rice populations are
different genetically. The goal was to find out the evolutionary potential of the rice.
Pearson’s correlation between the two groups was analyzed. It showed a positive Pearson
Product Moment correlation of between 0.783 and 0.895 for weedy rice populations. This
figure is quite high, which suggested a fairly strong relationship.

How to Find Pearson’s Correlation Coefficients


By Hand
Example question: Find the value of the correlation coefficient from the following
table:
SUBJECT AGE X GLUCOSE LEVEL Y
1 43 99
2 21 65
3 25 79
4 42 75
5 57 87
6 59 81
Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and
y².

SUBJECT   AGE X   GLUCOSE LEVEL Y   XY      X²      Y²
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81
Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be
43 × 99 = 4,257.

SUBJECT   AGE X   GLUCOSE LEVEL Y   XY      X²      Y²
1         43      99                4257
2         21      65                1365
3         25      79                1975
4         42      75                3150
5         57      87                4959
6         59      81                4779
Step 3: Take the square of the numbers in the x column, and put the result in the
x² column.

SUBJECT   AGE X   GLUCOSE LEVEL Y   XY      X²      Y²
1         43      99                4257    1849
2         21      65                1365    441
3         25      79                1975    625
4         42      75                3150    1764
5         57      87                4959    3249
6         59      81                4779    3481

Step 4: Take the square of the numbers in the y column, and put the result in the
y² column.

SUBJECT   AGE X   GLUCOSE LEVEL Y   XY      X²      Y²
1         43      99                4257    1849    9801
2         21      65                1365    441     4225
3         25      79                1975    625     6241
4         42      75                3150    1764    5625
5         57      87                4959    3249    7569
6         59      81                4779    3481    6561
Step 5: Add up all of the numbers in the columns and put the result at the bottom of
the column. The Greek letter sigma (Σ) is a short way of saying “sum of”
or summation.

SUBJECT   AGE X   GLUCOSE LEVEL Y   XY      X²      Y²
1         43      99                4257    1849    9801
2         21      65                1365    441     4225
3         25      79                1975    625     6241
4         42      75                3150    1764    5625
5         57      87                4959    3249    7569
6         59      81                4779    3481    6561
Σ         247     486               20485   11409   40022

Step 6: Use the following correlation coefficient formula.

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

From our table:

• Σx = 247
• Σy = 486
• Σxy = 20,485
• Σx² = 11,409
• Σy² = 40,022
• n is the sample size, in our case n = 6

The correlation coefficient =

$$r = \frac{6(20485) - (247 \times 486)}{\sqrt{\left[6(11409) - 247^2\right] \times \left[6(40022) - 486^2\right]}} = \frac{2868}{5413.27} = 0.5298$$

The range of the correlation coefficient is from -1 to 1. Our result is 0.5298,
which means the variables have a moderate positive correlation.
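
The by-hand calculation can be checked with a short Python sketch that builds the same column sums. The function name `pearson_r` is an assumption for this illustration; the data are the age and glucose values from the table.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from the column sums used in the by-hand method."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


age = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]
r = pearson_r(age, glucose)
print(round(r, 4))  # 0.5298
```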
Example4: Find the means of X and Y variables and the coefficient of correlation
between them from the following two regression equations:
2Y – X – 50 = 0
3Y – 2X – 10 = 0
To find:
Mean of the variables X and Y
Correlation coefficient
Solution:
2y - x = 50 (i)
3y - 2x = 10 (ii)
Solving equations (i) and (ii) simultaneously, multiply equation (i) by 2:
4y - 2x = 100
3y - 2x = 10
Subtracting the second equation from the first, we get
y = 90
Putting the value of y in equation (i):
2(90) - x = 50
180 - x = 50
x = 180 - 50
x = 130
So, we get X̄ = 130 and Ȳ = 90.
Assume equation (i) is the regression equation of Y on X:
2y = x + 50
y = 0.5x + 25
So, byx = 0.5
Consider equation (ii) as the regression equation of X on Y:
2x = 3y - 10
x = 1.5y - 5
So, bxy = 1.5

r = √(byx × bxy) = √(0.5 × 1.5) = √0.75 = 0.866

So, the correlation coefficient is 0.866.

Example5: Find the means of X and Y variables and the coefficient of correlation
between them from the following two regression equations:
4X – 5Y + 33 = 0
20X – 9Y – 107 = 0
Solution:
To get the mean values we must solve the given equations.
4X – 5Y = -33 … (1)
20X – 9Y = 107 … (2)
(1) × 5 ⇒ 20X – 25Y = -165
20X – 9Y = 107
Subtracting (2) from (1) × 5: -16Y = -272
Y = 272/16 = 17, i.e., Ȳ = 17
Using Y = 17 in (1), we get
4X – 85 = -33
4X = 52
X = 13, i.e., X̄ = 13
Mean values are X̄ = 13, Ȳ = 17.
Let the regression line of Y on X be
4X – 5Y + 33 = 0
5Y = 4X + 33
Y = (1/5)(4X + 33)
Y = 0.8X + 6.6
∴ byx = 0.8
Let the regression line of X on Y be
20X – 9Y – 107 = 0
20X = 9Y + 107
X = (1/20)(9Y + 107)
X = 0.45Y + 5.35
∴ bxy = 0.45
The coefficient of correlation between X and Y is r = ±√(byx × bxy) = ±√(0.8 × 0.45) = ±√0.36 = ±0.6.
Since both byx and bxy are positive, take the positive sign: r = 0.6.
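
The procedure used in Examples 4 and 5 (solve the two lines for the means, read off the regression coefficients, then take r = ±√(byx × bxy)) can be sketched in Python. The function name `means_and_r` and the (a, b, c) encoding of each line as aX + bY + c = 0 are assumptions for this illustration, and the caller must decide which line is Y on X and which is X on Y.

```python
from math import copysign, sqrt


def means_and_r(line_y_on_x, line_x_on_y):
    """Each line is a tuple (a, b, c) meaning aX + bY + c = 0."""
    a1, b1, c1 = line_y_on_x
    a2, b2, c2 = line_x_on_y
    # The two regression lines intersect at (mean of X, mean of Y): Cramer's rule.
    det = a1 * b2 - a2 * b1
    mean_x = (b1 * c2 - b2 * c1) / det
    mean_y = (a2 * c1 - a1 * c2) / det
    byx = -a1 / b1   # slope when the first line is solved for Y
    bxy = -b2 / a2   # slope when the second line is solved for X
    # r takes the common sign of the two regression coefficients.
    r = copysign(sqrt(byx * bxy), byx)
    return mean_x, mean_y, byx, bxy, r


# Example 5: 4X - 5Y + 33 = 0 (Y on X) and 20X - 9Y - 107 = 0 (X on Y).
mean_x, mean_y, byx, bxy, r = means_and_r((4, -5, 33), (20, -9, -107))
print(mean_x, mean_y, round(byx, 2), round(bxy, 2), round(r, 2))  # 13.0 17.0 0.8 0.45 0.6
```

If the product byx × bxy came out greater than 1, the two lines would have been assigned the wrong way round and should be swapped.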
Example6
The following table shows the sales and advertisement expenditure of a firm.

Coefficient of correlation r = 0.9. Estimate the likely sales for a proposed advertisement
expenditure of 10 crore Somali Shillings.
Solution:

When advertisement expenditure is 10 crores, i.e., Y = 10, then sales X = 6(10) + 4 = 64,
which implies sales is 64.
Example7
There are two series of index numbers, P for price index and S for stock of the
commodity. The mean and standard deviation of P are 100 and 8 and of S are 103 and
4 respectively. The correlation coefficient between the two series is 0.4. With these
data obtain the regression lines of P on S and S on P.
Solution:
Let us consider X for price P and Y for stock S. Then the mean and SD of P are
X̄ = 100 and σx = 8, and the mean and SD of S are
Ȳ = 103 and σy = 4. The correlation coefficient between the series is r(X, Y) = 0.4.
Let the regression line of X on Y be
X – X̄ = r(σx/σy)(Y – Ȳ)
X – 100 = 0.4(8/4)(Y – 103)
X = 0.8Y + 17.6
Let the regression line of Y on X be
Y – Ȳ = r(σy/σx)(X – X̄)
Y – 103 = 0.4(4/8)(X – 100)
Y = 0.2X + 83

Example8
For 5 pairs of observations the following results are obtained: ∑X = 15, ∑Y = 25, ∑X² = 55,
∑Y² = 135, ∑XY = 83. Find the equations of the lines of regression and estimate the
value of X on the first line when Y = 12 and the value of Y on the second line if X = 8.
Solution:

Y – 5 = 0.8(X – 3)
Y = 0.8X + 2.6
When X = 8 the value of Y is estimated as
Y = 0.8(8) + 2.6
= 9
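
Both regression lines in Example 8 can be obtained directly from the five sums. The sketch below assumes the standard formulas byx = (nΣXY − ΣXΣY)/(nΣX² − (ΣX)²) and bxy = (nΣXY − ΣXΣY)/(nΣY² − (ΣY)²); the function name `regression_from_sums` is an assumption for this illustration.

```python
def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (mean_x, mean_y, byx, bxy) from the summary sums."""
    mean_x = sum_x / n
    mean_y = sum_y / n
    cross = n * sum_xy - sum_x * sum_y
    byx = cross / (n * sum_x2 - sum_x ** 2)   # regression coefficient of Y on X
    bxy = cross / (n * sum_y2 - sum_y ** 2)   # regression coefficient of X on Y
    return mean_x, mean_y, byx, bxy


mean_x, mean_y, byx, bxy = regression_from_sums(5, 15, 25, 55, 135, 83)
y_when_x_is_8 = mean_y + byx * (8 - mean_x)    # Y on X line: Y = 0.8X + 2.6
x_when_y_is_12 = mean_x + bxy * (12 - mean_y)  # X on Y line: X = 0.8Y - 1
print(round(y_when_x_is_8, 2), round(x_when_y_is_12, 2))  # 9.0 8.6
```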
Example9
The two regression lines are 3X+2Y=26 and 6X+3Y=31. Find the correlation
coefficient.
Solution:
Let the regression equation of Y on X be
3X+2Y = 26


Example9

In a laboratory experiment on correlation research study the equations of the two
regression lines were found to be 2X – Y + 1 = 0 and 3X – 2Y + 7 = 0. Find the means
of X and Y. Also work out the values of the regression coefficients and the correlation
between the two variables X and Y.


Solution:
Solving the two regression equations we get mean values of X and Y


Example10

For the given lines of regression 3X – 2Y = 5 and X – 4Y = 7, find

(i) Regression coefficients

(ii) Coefficient of correlation

Solution:

(i) First convert the given equations Y on X and X on Y in standard form and find their
regression coefficients respectively.

Given regression lines are

3X–2Y = 5 ... (1)

X–4Y = 7 ... (2)

Let the line of regression of X on Y be

3X–2Y = 5

3X = 2Y+5


Coefficient of correlation

Since the two regression coefficients are positive then the correlation coefficient is also
positive and it is given by


Exercise1

1. From the data given below

Find (a) the two regression equations, (b) the coefficient of correlation between
marks in Economics and Statistics, (c) the most likely marks in Statistics when the
marks in Economics are 30.

2. The heights (in cm.) of a group of fathers and sons are given below

Find the lines of regression and estimate the height of son when the height of the father
is 164 cm.

3. The following data give the height in inches (X) and the weight in lb. (Y) of a random
sample of 10 students from a large group of students of age 17 years:

Estimate weight of the student of a height 69 inches.

4. Obtain the two regression lines from the following data: N = 20, ∑X = 80, ∑Y = 40,
∑X² = 1680, ∑Y² = 320 and ∑XY = 480

5. Given the following data, what will be the possible yield when the rainfall is 29″?

Coefficient of correlation between rainfall and production is 0.8

6. The following data relate to advertisement expenditure (in lakhs of rupees) and their
corresponding sales (in crores of rupees)

Estimate the sales corresponding to advertising expenditure of Rs. 30 lakh.

7. You are given the following data:


If the Correlation coefficient between X and Y is 0.66, then find (i) the two regression
coefficients, (ii) the most likely value of Y when X=10

8. Find the equation of the regression line of Y on X, if the observations ( Xi, Yi) are the
following (1,4) (2,8) (3,2) ( 4,12) ( 5, 10) ( 6, 14) ( 7, 16) ( 8, 6) (9, 18)

9. A survey was conducted to study the relationship between expenditure on
accommodation (X) and expenditure on Food and Entertainment (Y) and the following
results were obtained:

Write down the regression equation and estimate the expenditure on Food and
Entertainment, if the expenditure on accommodation is Rs. 200.

10. For 5 observations of pairs of (X, Y) of variables X and Y the following results are
obtained: ∑X = 15, ∑Y = 25, ∑X² = 55, ∑Y² = 135, ∑XY = 83. Find the equations of the lines
of regression and estimate the values of X and Y if Y = 8; X = 12.

11. The two regression lines were found to be 4X–5Y+33=0 and 20X–9Y–107=0. Find
the mean values and coefficient of correlation between X and Y.

12. The equations of two lines of regression obtained in a correlation analysis are the
following 2X=8–3Y and 2Y=5–X. Obtain the value of the regression coefficients and
correlation coefficient.

Ch5: Empirical Probability

• The Addition Rules for Probability
• The Multiplication Rules and Conditional Probability
• Conditional Probability

Empirical probability, also known as experimental probability,
refers to a probability that is based on historical data. In other
words, empirical probability illustrates the likelihood of an event
occurring based on historical data.


Example1: In the travel survey just described, find the
probability that a person will travel by airplane over the Thanksgiving holiday.

Solution

$$P(E) = \frac{f}{n} = \frac{6}{50} = \frac{3}{25}$$

Example2: In a sample of 50 people, 21 had type O blood, 22 had type A blood, 5 had
type B blood, and 2 had type AB blood. Set up a frequency distribution and find the following
probabilities.

a. A person has type O blood.
b. A person has type A or type B blood.
c. A person has neither type A nor type O blood.
d. A person does not have type AB blood.


Solution

a. $$P(O) = \frac{f}{n} = \frac{21}{50}$$

b. $$P(A \text{ or } B) = \frac{22}{50} + \frac{5}{50} = \frac{27}{50}$$

c. $$P(\text{neither } A \text{ nor } O) = \frac{5}{50} + \frac{2}{50} = \frac{7}{50}$$

d. $$P(\text{not } AB) = 1 - P(AB) = 1 - \frac{2}{50} = \frac{48}{50} = \frac{24}{25}$$
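
The empirical probabilities for the blood-type sample can be reproduced with exact fractions in Python. The dictionary `blood_counts` and the helper `prob` are assumptions for this illustration.

```python
from fractions import Fraction

# Frequency distribution of the 50 people in the sample.
blood_counts = {"O": 21, "A": 22, "B": 5, "AB": 2}
n = sum(blood_counts.values())


def prob(*types):
    """Empirical probability P(E) = f / n for one or more blood types."""
    return Fraction(sum(blood_counts[t] for t in types), n)


p_o = prob("O")                  # a. type O
p_a_or_b = prob("A", "B")        # b. type A or type B
p_neither_a_nor_o = prob("B", "AB")  # c. neither A nor O leaves B and AB
p_not_ab = 1 - prob("AB")        # d. complement of type AB

print(p_o, p_a_or_b, p_neither_a_nor_o, p_not_ab)  # 21/50 27/50 7/50 24/25
```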
Example3: Hospital records indicated that knee
replacement patients stayed in the hospital for the
number of days shown in the distribution.

Find these probabilities.
a. A patient stayed exactly 5 days.
b. A patient stayed less than 6 days.
c. A patient stayed at most 4 days.
d. A patient stayed at least 5 days.

Solution

a. $$P(5) = \frac{56}{127}$$

b. $$P(\text{less than 6 days}) = \frac{15}{127} + \frac{32}{127} + \frac{56}{127} = \frac{103}{127}$$

c. $$P(\text{at most 4 days}) = \frac{15}{127} + \frac{32}{127} = \frac{47}{127}$$

d. $$P(\text{at least 5 days}) = \frac{56}{127} + \frac{19}{127} + \frac{5}{127} = \frac{80}{127}$$

The Addition Rules for Probability
Two events are mutually exclusive events if
they cannot occur at the same time (i.e., they
have no outcomes in common).

Example: Determine which events are
mutually exclusive and which are not, when a
single die is rolled. a. Getting an odd number
and getting an even number
b. Getting a 3 and getting an odd number
c. Getting an odd number and getting a
number less than 4
d. Getting a number greater than 4 and getting
a number less than 4
Solution
a. The events are mutually exclusive, since the first event can
be 1, 3, or 5 and the second event can be 2, 4, or 6.
b. The events are not mutually exclusive, since the first event
is a 3 and the second can be 1, 3, or 5. Hence, 3 is contained in
both events.
c. The events are not mutually exclusive, since the first event
can be 1, 3, or 5 and the second can be 1, 2, or 3. Hence, 1 and 3
are contained in both events.
d. The events are mutually exclusive, since the first event can
be 5 or 6 and the second event can be 1, 2, or 3.
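One way to check these answers is to write each event as a set of die outcomes and test whether the sets share any element. The sketch below is illustrative; the function name `mutually_exclusive` is an assumption, not something defined in the notes.

```python
# Each event is the set of die faces that make it occur.
odd = {1, 3, 5}
even = {2, 4, 6}
three = {3}
less_than_4 = {1, 2, 3}
greater_than_4 = {5, 6}


def mutually_exclusive(a, b):
    """Two events are mutually exclusive when they share no outcomes."""
    return a.isdisjoint(b)


pairs = {
    "a": (odd, even),
    "b": (three, odd),
    "c": (odd, less_than_4),
    "d": (greater_than_4, less_than_4),
}

results = {part: mutually_exclusive(a, b) for part, (a, b) in pairs.items()}
shared = {part: sorted(a & b) for part, (a, b) in pairs.items()}

for part in pairs:
    print(part, "mutually exclusive:", results[part], "shared:", shared[part])
```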
Example: Determine which events are mutually exclusive and which
are not when a single card is drawn from a deck.
a. Getting a 7 and getting a jack
b. Getting a club and getting a king
c. Getting a face card and getting an ace
d. Getting a face card and getting a spade
Solution
Only the events in parts a and c are mutually exclusive. In part b
the king of clubs belongs to both events, and in part d the jack,
queen, and king of spades belong to both events.
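The card version can be checked the same way by listing the 52 cards and intersecting the events. This is a sketch under the usual assumption of a standard deck with jacks, queens, and kings as the face cards; the names `DECK` and `event` are hypothetical helpers.

```python
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = set(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def event(predicate):
    """Set of cards for which the predicate is true."""
    return {card for card in DECK if predicate(card)}


sevens = event(lambda c: c[0] == "7")
jacks = event(lambda c: c[0] == "J")
kings = event(lambda c: c[0] == "K")
aces = event(lambda c: c[0] == "A")
clubs = event(lambda c: c[1] == "clubs")
spades = event(lambda c: c[1] == "spades")
face_cards = event(lambda c: c[0] in {"J", "Q", "K"})

# Cards that belong to both events in each part of the example.
overlaps = {
    "a": sevens & jacks,
    "b": clubs & kings,
    "c": face_cards & aces,
    "d": face_cards & spades,
}
exclusive_parts = sorted(p for p, common in overlaps.items() if not common)
print("Mutually exclusive parts:", exclusive_parts)
```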
Example: In a hospital unit there are 8 nurses and 5 physicians;
7 nurses and 3 physicians are females. If a staff person is
selected, find the probability that the subject is a nurse or a
male.

Solution
The unit has 13 staff members, of whom 3 are male (1 nurse and
2 physicians). The events are not mutually exclusive, since the
male nurse belongs to both, so
P(nurse or male) = P(nurse) + P(male) - P(male nurse)
= 8/13 + 3/13 - 1/13 = 10/13

Example: A single card is drawn at random from an ordinary deck
of cards. Find the probability that it is either an ace or a
black card.
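Because the two black aces are both aces and black cards, the events overlap, and the general addition rule P(A or B) = P(A) + P(B) - P(A and B) gives 4/52 + 26/52 - 2/52 = 28/52 = 7/13. The Python sketch below checks this by enumerating a standard 52-card deck (an assumption about what "ordinary deck" means) and comparing the rule with a direct count; the helper `p` is illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))

aces = {card for card in deck if card[0] == "A"}
black = {card for card in deck if card[1] in ("clubs", "spades")}


def p(event):
    """Classical probability: favorable outcomes over total outcomes."""
    return Fraction(len(event), len(deck))


# General addition rule: subtract the overlap so it is not counted twice.
by_rule = p(aces) + p(black) - p(aces & black)
# Direct count of cards that are an ace, black, or both.
by_count = p(aces | black)

print(by_rule, by_count)
```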


The Multiplication Rules and Conditional Probability
Two events A and B are independent events if the fact that A
occurs does not affect the probability of B occurring.


Example: A coin is flipped and a die is rolled. Find the
probability of getting a head on the coin and a 4 on the die.
Solution
P(head and 4) = P(head) · P(4) = 1/2 · 1/6 = 1/12
The problem in this example can also be solved by using the
sample space
H1 H2 H3 H4 H5 H6 T1 T2 T3 T4 T5 T6
The solution is 1/12, since there is only one way to get the
head-4 outcome.
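Both approaches, the multiplication rule and counting in the sample space, can be compared in a few lines of Python. This is a minimal sketch assuming a fair coin and a fair die; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space of (coin, die) pairs: H1 ... H6, T1 ... T6.
sample_space = list(product("HT", range(1, 7)))

# Count the outcomes that are a head together with a 4.
favorable = [outcome for outcome in sample_space if outcome == ("H", 4)]
p_by_counting = Fraction(len(favorable), len(sample_space))

# Multiplication rule for independent events: P(head) * P(4).
p_by_rule = Fraction(1, 2) * Fraction(1, 6)

print(len(sample_space), p_by_counting, p_by_rule)
```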

Example: A card is drawn from a deck and replaced; then a second
card is drawn. Find the probability of getting a queen and then
an ace.
Solution
P(queen and ace) = P(queen) · P(ace) = 4/52 · 4/52 = 16/2704 = 1/169
Example: An urn contains 3 red balls, 2 blue balls, and 5 white
balls. A ball is selected and its color noted. Then it is
replaced. A second ball is selected and its color noted. Find the
probability of each of these.
a. Selecting 2 blue balls
b. Selecting 1 blue ball and then 1 white ball
c. Selecting 1 red ball and then 1 blue ball
Solution
a. P(blue and blue) = P(blue) · P(blue) = 2/10 · 2/10 = 4/100 = 1/25
b. P(blue and white) = P(blue) · P(white) = 2/10 · 5/10 = 10/100 = 1/10
c. P(red and blue) = P(red) · P(blue) = 3/10 · 2/10 = 6/100 = 3/50
When the outcome or occurrence of the first event affects the
outcome or occurrence of the second event in such a way that the
probability is changed, the events are said to be dependent events.
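The difference between independent and dependent events can be seen by rerunning the urn example with and without replacement. The sketch below is illustrative only: the function `sequence_probability` is a hypothetical helper, and the without-replacement figures are an added variation that the notes do not work out.

```python
from fractions import Fraction

urn = {"red": 3, "blue": 2, "white": 5}  # contents from the urn example


def sequence_probability(colors, replace):
    """Probability of drawing the given colors in order.

    With replace=True every draw sees the full urn (independent events).
    With replace=False each drawn ball is removed, so later probabilities
    change (dependent events).
    """
    counts = dict(urn)
    p = Fraction(1)
    for color in colors:
        total = sum(counts.values())
        p *= Fraction(counts[color], total)
        if not replace:
            counts[color] -= 1
    return p


for draw in [("blue", "blue"), ("blue", "white"), ("red", "blue")]:
    with_repl = sequence_probability(draw, replace=True)
    without_repl = sequence_probability(draw, replace=False)
    print(draw, "with replacement:", with_repl,
          "without replacement:", without_repl)
```

With replacement the results match parts a to c of the solution; without replacement the second factor changes because the urn holds only 9 balls on the second draw.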
Example: Three cards are drawn from
an ordinary deck and not replaced.
Find the probability of these events.
a. Getting 3 jacks
b. Getting an ace, a king, and a queen in
order
c. Getting a club, a spade, and a heart in
order
d. Getting 3 clubs
Solution
a. P(3 jacks) = 4/52 · 3/51 · 2/50 = 24/132600 = 1/5525
b. P(ace and king and queen) = 4/52 · 4/51 · 4/50 = 64/132600 = 8/16575
c. P(club and spade and heart) = 13/52 · 13/51 · 13/50 = 2197/132600 = 169/10200
d. P(3 clubs) = 13/52 · 12/51 · 11/50 = 1716/132600 = 11/850
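Each answer multiplies three fractions whose denominators shrink from 52 to 51 to 50 because the cards are not replaced. The sketch below repeats that arithmetic with exact fractions; `p_draws` is a hypothetical helper that takes, for each draw, the number of favorable cards still in the deck.

```python
from fractions import Fraction


def p_draws(favorable_counts, deck_size=52):
    """Probability of a sequence of draws without replacement.

    favorable_counts[i] is the number of favorable cards remaining at
    draw i; the deck shrinks by one card after every draw.
    """
    p = Fraction(1)
    for i, favorable in enumerate(favorable_counts):
        p *= Fraction(favorable, deck_size - i)
    return p


p_three_jacks = p_draws([4, 3, 2])             # a. jack, jack, jack
p_ace_king_queen = p_draws([4, 4, 4])          # b. ace, then king, then queen
p_club_spade_heart = p_draws([13, 13, 13])     # c. club, then spade, then heart
p_three_clubs = p_draws([13, 12, 11])          # d. club, club, club

common_denominator = 52 * 51 * 50
print(common_denominator, p_three_jacks, p_ace_king_queen,
      p_club_spade_heart, p_three_clubs)
```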
Conditional Probability

Example: A box contains black chips and white chips. A person
selects two chips without replacement. If the probability of
selecting a black chip and a white chip is 15/56 and the
probability of selecting a black chip on the first draw is 3/8,
find the probability of selecting the white chip on the second
draw, given that the first chip selected was a black chip.
Solution Let
B = selecting a black chip
W = selecting a white chip
P(W | B) = P(B and W) / P(B) = (15/56) / (3/8) = 15/56 · 8/3 = 5/7
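The division in this solution is the conditional probability formula P(W | B) = P(B and W) / P(B). A small Python sketch of it follows; the function name `conditional` is illustrative, and the box of 3 black and 5 white chips used in the cross-check is an assumption that happens to be consistent with the given probabilities, not something stated in the example.

```python
from fractions import Fraction


def conditional(p_a_and_b, p_a):
    """P(B | A) = P(A and B) / P(A); undefined when P(A) is zero."""
    if p_a == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return p_a_and_b / p_a


p_black_and_white = Fraction(15, 56)
p_black_first = Fraction(3, 8)
p_white_given_black = conditional(p_black_and_white, p_black_first)

# Cross-check with an assumed box of 3 black and 5 white chips:
# after one black chip is removed, 5 of the remaining 7 chips are white.
assumed_black, assumed_white = 3, 5
p_direct = Fraction(assumed_white, assumed_black + assumed_white - 1)

print(p_white_given_black, p_direct)
```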
Example: A recent survey asked 100 people if they thought women
in the armed forces should be permitted to participate in combat.
The results of the survey are shown.

Find these probabilities.
a. The respondent answered yes, given that the respondent was a
female.
b. The respondent was a male, given that the respondent answered
no.
Solution
a. P(F) = 50/100

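FREQUENTLY USED FORMULAS
n = sample size; N = population size

Sample mean
x̄ = Σx / n

Population mean
μ = Σx / N

Sample standard deviation
s = √[ Σ(x - x̄)² / (n - 1) ]

Population standard deviation
σ = √[ Σ(x - μ)² / N ]

Sample mean for a frequency distribution
x̄ = Σxf / n, where n = Σf

Sample standard deviation for a frequency distribution
s = √[ Σ(x - x̄)² f / (n - 1) ]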
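As a rough illustration of the mean and standard deviation formulas listed above, the sketch below computes them by hand and compares the results with Python's standard `statistics` module. The data set is hypothetical and chosen only to give round numbers.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, not from the notes
n = len(data)

# Sample mean: sum of the values divided by the sample size.
mean = sum(data) / n

# Sample standard deviation divides by n - 1; population version divides by N.
squared_deviations = sum((x - mean) ** 2 for x in data)
sample_sd = math.sqrt(squared_deviations / (n - 1))
population_sd = math.sqrt(squared_deviations / n)

# Same data as a frequency distribution: value -> frequency.
freq = {2: 1, 4: 3, 5: 2, 7: 1, 9: 1}
n_f = sum(freq.values())
mean_f = sum(x * f for x, f in freq.items()) / n_f
sample_sd_f = math.sqrt(
    sum((x - mean_f) ** 2 * f for x, f in freq.items()) / (n_f - 1)
)

print(mean, sample_sd, population_sd, mean_f, sample_sd_f)
```

The frequency-distribution versions give the same values as the raw-data versions, since they only group equal observations together.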

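Sample coefficient of variation
CV = (s / x̄) · 100%

Range = Largest data value - smallest data value

Standard z value
z = (x - μ) / σ

Original x value
x = zσ + μ

Central limit theorem
For samples of size n, x̄ has mean μ and standard deviation σ/√n,
so z = (x̄ - μ) / (σ / √n)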

PROBABILITY FORMULAS
Probability of an event A
P(A) = f / n

where f = frequency of occurrence of event
n = sample size

Probability of the complement of event A

P(not A) = 1 - P(A)

Multiplication rule for independent events
P(A and B) = P(A) · P(B)


General multiplication rules
P(A and B) = P(A) · P(B | A)
P(A and B) = P(B) · P(A | B)

Addition rule for mutually exclusive events

P(A or B) = P(A) + P(B)

General addition rule

P(A or B) = P(A) + P(B) - P(A and B)

Permutation rule
P(n, r) = n! / (n - r)!

Combination rule
C(n, r) = n! / [r! (n - r)!]

Mean of a discrete probability distribution
μ = Σ x P(x)

Standard deviation of a discrete probability distribution
σ = √[ Σ (x - μ)² P(x) ]

BINOMIAL DISTRIBUTION FORMULAS
where r = number of successes; p = probability of success; q = 1 - p

Formula for a binomial probability distribution
P(r) = [n! / (r! (n - r)!)] · p^r · q^(n - r), where n = number of trials

Mean for a binomial distribution
μ = np

Standard deviation for a binomial distribution
σ = √(npq)
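The counting rules and the binomial formulas can be tried out with Python's `math.perm` and `math.comb` (available from Python 3.8). The sketch below is illustrative: `binomial_probability` is a hypothetical helper, and the values n = 6 and p = 0.5 are made up for the example.

```python
import math


def binomial_probability(n, r, p):
    """P(r successes in n trials) = C(n, r) * p^r * q^(n - r), q = 1 - p."""
    q = 1 - p
    return math.comb(n, r) * p ** r * q ** (n - r)


# Counting rules on a small case: choose 2 items out of 5.
permutations_5_2 = math.perm(5, 2)  # order matters
combinations_5_2 = math.comb(5, 2)  # order does not matter

# Hypothetical binomial setting: 6 trials, success probability 0.5.
n, p = 6, 0.5
distribution = [binomial_probability(n, r, p) for r in range(n + 1)]

mean = n * p                            # mu = np
std_dev = math.sqrt(n * p * (1 - p))    # sigma = sqrt(npq)

# Mean computed from the distribution itself: sum of r * P(r).
mean_from_distribution = sum(r * pr for r, pr in enumerate(distribution))

print(permutations_5_2, combinations_5_2, distribution, mean, std_dev)
```

Comparing `mean` with `mean_from_distribution` shows that μ = np agrees with the general formula for the mean of a discrete probability distribution.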

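CONFIDENCE INTERVALS
Confidence interval for a mean (large samples)
x̄ - z_c · σ/√n < μ < x̄ + z_c · σ/√n
(z_c = critical z value for the confidence level; s is used in place of σ when σ is unknown)

Confidence interval for a mean (Small samples)
x̄ - t_c · s/√n < μ < x̄ + t_c · s/√n, with d.f. = n - 1

Confidence interval for a proportion (where np > 5 and nq > 5)
p̂ - z_c · √(p̂q̂/n) < p < p̂ + z_c · √(p̂q̂/n), where p̂ = r/n and q̂ = 1 - p̂

SAMPLE SIZE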

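Sample size for estimating means
n = (z_c · σ / E)², where E = maximal margin of error and z_c = critical z value

Sample size for estimating proportions
n = p(1 - p) · (z_c / E)² with a preliminary estimate for p
n = (1/4) · (z_c / E)² with no preliminary estimate for p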
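A minimal sketch of the large-sample confidence interval and the two sample size calculations is given below. It assumes the standard normal-based formulas, x̄ ± z_c·σ/√n and n = (z_c·σ/E)², and uses `statistics.NormalDist` (Python 3.8+) to obtain the critical value; all function names and the numbers in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical z value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


def mean_interval(x_bar, sigma, n, confidence):
    """Large-sample confidence interval for a mean."""
    margin = z_critical(confidence) * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n with margin of error at most `margin` for a mean."""
    return math.ceil((z_critical(confidence) * sigma / margin) ** 2)


def sample_size_for_proportion(margin, confidence, p_estimate=None):
    """Sample size for a proportion; uses 1/4 when no estimate of p exists."""
    pq = 0.25 if p_estimate is None else p_estimate * (1 - p_estimate)
    return math.ceil(pq * (z_critical(confidence) / margin) ** 2)


# Hypothetical inputs, for illustration only.
low, high = mean_interval(x_bar=50, sigma=10, n=100, confidence=0.95)
n_mean = sample_size_for_mean(sigma=15, margin=3, confidence=0.95)
n_prop = sample_size_for_proportion(margin=0.05, confidence=0.95)
print(low, high, n_mean, n_prop)
```

Sample sizes are rounded up, since rounding down would give a margin of error slightly larger than requested.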

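REGRESSION AND CORRELATION

In all these formulas
SS_x = Σx² - (Σx)²/n
SS_y = Σy² - (Σy)²/n
SS_xy = Σxy - (Σx)(Σy)/n

Least squares line
y = a + bx, where b = SS_xy / SS_x and a = ȳ - b·x̄

Standard error of estimate
S_e = √[ (SS_y - b · SS_xy) / (n - 2) ]

Pearson product-moment correlation coefficient
r = SS_xy / √(SS_x · SS_y)

Coefficient of determination

r²

Confidence interval for y

yp - E < y < yp + E where yp is the predicted y value for x,
E = t_c · S_e · √[ 1 + 1/n + (x - x̄)² / SS_x ], and d.f. = n - 2

Spearman Rank correlation coefficient
r_s = 1 - 6Σd² / [n(n² - 1)], where d = difference between paired ranks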
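The regression and correlation quantities named above can be computed from sums of squares. The sketch below assumes the usual definitions SS_x = Σx² - (Σx)²/n, SS_y = Σy² - (Σy)²/n, and SS_xy = Σxy - (Σx)(Σy)/n; the function `regression_summary` and the five data points are hypothetical.

```python
import math


def regression_summary(xs, ys):
    """Least squares line y = a + bx, correlation r, and standard error."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n

    b = ss_xy / ss_x                    # slope
    a = sum_y / n - b * sum_x / n       # intercept: y-bar minus b times x-bar
    r = ss_xy / math.sqrt(ss_x * ss_y)  # Pearson correlation coefficient
    # Guard against a tiny negative value caused by floating-point rounding.
    s_e = math.sqrt(max(0.0, ss_y - b * ss_xy) / (n - 2))
    return {"a": a, "b": b, "r": r, "r_squared": r * r, "s_e": s_e}


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
summary = regression_summary(xs, ys)
print(summary)
```

For these points SS_x = 10, SS_y = 6, and SS_xy = 6, so the fitted line is y = 2.2 + 0.6x and the coefficient of determination is 0.6.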
