Basicstat 1011
Awol S.
Department of Statistics
College of Computing & Informatics
Haramaya University
Dire Dawa, Ethiopia
2013/2014
Contents
1 Introduction
1.1 Some Statistical Terms
1.2 Definition and Classification of Statistics
1.2.1 Definitions of Statistics
1.2.2 Stages in Statistical Investigation
1.2.3 Classification of Statistics
1.3 Applications, Uses and Limitations of Statistics
1.3.1 Applications of Statistics
1.3.2 Uses of Statistics
1.3.3 Limitations of Statistics
1.4 Types of Variables and Measurement Scales
1.4.1 Variable
1.4.2 Scales of Measurement
3.5 Median
3.6 Other Measures of Location: Quantiles
3.6.1 Quartiles
3.6.2 Deciles
3.6.3 Percentiles
3.7 Mode
5 Elementary Probability
5.1 Concept of Set
5.2 Basic Probability Terms
5.3 Counting Techniques
5.4 Definitions of Probability
5.5 Some Rules of Probability
5.6 Conditional Probability and Independence
5.6.1 Conditional Events
5.6.2 Independent Events
6 Probability Distributions
6.1 Random Variable
6.1.1 Probability Distribution
6.1.2 Expectations of a Random Variable
6.2 Common Discrete Distributions
6.2.1 The Binomial Distribution
6.2.2 The Poisson Distribution
6.3 Common Continuous Distributions
6.3.1 The Normal Distribution
6.3.2 Other Continuous Distributions
Introduction to Statistics - Stat 1011 es.awol@gmail.com
7 Sampling Techniques
7.1 Basic Concepts
7.2 Reasons for Sampling
7.3 Types of Errors
7.4 Types of Sampling Techniques
7.4.1 Probability Sampling Techniques
7.4.2 Non-probability Sampling Techniques
Chapter 1
Introduction
• Plural sense: Statistics are collections of facts (figures). This meaning of the word
is widely used when reference is made to facts and figures on a certain characteristic,
for example: sales statistics, labor statistics, employment statistics, etc. In this
sense the word "statistics" simply means "data". However, not all numerical data
are statistics. In order for numerical data to be identified as statistics, it must
• Singular sense: Statistics is a science that deals with the method of data collec-
tion, data organization, data presentation, data analysis and interpretation of results.
It refers to a subject matter that is concerned with extracting relevant information
from available data with the aim to make sound decisions. According to this mean-
ing, statistics is concerned with the development and application of methods and
techniques for collecting, organizing, presenting, analyzing data and interpreting re-
sults.
1. Collection of data: Data collection is the first stage in any statistical investiga-
tion. It involves the process of obtaining (gathering) a set of related measurements
or counts to meet predetermined objectives. Data may be available from existing
published sources which may have already been organized in some presentable form.
Such information is commonly referred to as secondary data. On the other hand, the
investigator may actually collect his or her own data. This is usually warranted when
information about some area of inquiry has not been ascertained. In such cases, the
data are said to be of primary form.
4. Analysis of data: The analysis of data is the extraction of summarized and comprehensive
numerical descriptions in order to reach conclusions or provide answers to
a problem. That is, the basic purpose of data analysis is to make the data useful for drawing
conclusions. The analysis may require anything from simple to sophisticated mathematical
techniques.
2. Inferential statistics: Inferential statistics includes the methods used to find out
something about a population based on a sample. It is concerned with drawing
statistically valid conclusions about the characteristics of the population based on
information obtained from a sample. In this form of statistical analysis, descriptive
statistics is linked with probability theory in order to generalize the results of the
sample to the population. Hypothesis testing, determining relationships
between variables and making predictions are also part of inferential statistics.
Example 1.1. Classify the following statements as descriptive or inferential statistics.
4. Of the students enrolled in Haramaya University this year, 74% are male and 26%
are female.
5. The chance of winning the Ethiopian National Lottery in any day is 1 out of 167000.
• In scientific research: There is hardly any advanced research going on without the
use of statistics in one form or another. Statistics are used extensively in medical,
pharmaceutical and agricultural research. The effectiveness of a new drug is de-
termined by statistical experimentation and evaluation. In agriculture, experiments
about crop yields, types of fertilizers and types of soils under different
environments are commonly designed and analyzed through statistical methods and
concepts. In marketing research, statistical tools are indispensable in studying
consumer behavior, the effects of various promotional strategies and so on. In economics,
statistics is used for modeling functional relationships between or among variables. In
education and agricultural extension, it is used to study the effects of training.
In decision making, statistics strengthens decisions made in
the face of uncertainty by providing relevant information.
• In quality control: Statistics is used in quality control so extensively that the
practice itself is known as statistical quality control. Statistical quality control
(SQC) consists of using statistical methods to gather and analyze data on the determination
and control of quality. Statistical methods help to check whether a product
satisfies a given standard. The technique deals primarily with samples taken
randomly so as to be representative of the entire population; these samples are analyzed
and inferences are made concerning the characteristics of the population from
which they were taken. The concept is similar to tasting one spoonful from
a pot of stew and deciding whether it needs more salt. The characteristics
of the samples are analyzed using statistical quality control and other statistical
techniques.
• In other areas: Statistics is commonly used by insurance companies, stock brokerage
houses, banks, public utility companies and so on. Statistics is also immensely
useful to politicians, since their chances of winning can be predicted by
selecting random samples of voters and studying their
attitudes on issues and policies.
Qualitative variables are those variables that do not assume numeric values. For example,
gender is a qualitative variable. Quantitative variables, on the other hand, are those
variables which assume numeric values. Height and
family size are examples of quantitative variables.
Quantitative variables are further classified into two types: discrete and continuous variables.
Discrete variables are those variables that assume whole number values and consist of distinct
and recognizable individual elements that can be counted. For example, family size, number
of children in a family and number of cars at a traffic light are discrete
variables. These variables assume a finite or countable number of possible values. The
values of these variables are obtained by counting (0, 1, 2, · · · ).
The other quantitative variables, continuous variables, take any value, including decimals.
These variables can theoretically assume an infinite number of possible values. Their values
are obtained by measuring. Examples of continuous variables are height, weight, time,
temperature, · · ·
Generally the values of a variable can be obtained either by counting for discrete variables,
by measuring for continuous variables or by making categories for qualitative variables.
Example 1.2. Classify each of the following variables as qualitative or quantitative and,
if it is quantitative, classify it as discrete or continuous.
Case 1:
Case 2:
Based on the numbers on the shirts, it is not possible to judge whether Mr B plays better.
But by using the test scores, it is possible to judge that Mr B did better in the exam. Also,
it is not possible to find the average shirt number because the numbers on the shirts are
simply codes, but it is possible to obtain the average test score.
1. Nominal variables: are those qualitative variables which show the category of individuals.
They reflect classification into mutually exclusive (non-overlapping) and
exhaustive categories (names of groups) without any associated ranking. Numbers
may be assigned to the categories simply for coding purposes. It is not possible to
compare individuals based on the numbers assigned to categories. This scale is the
weakest form of measurement. The only mathematical operation permissible on these
variables is counting. Some examples of nominal variables are gender, religion, ID
No, ethnicity, color, · · · .
2. Ordinal variables: are those qualitative variables whose values can be ordered
and ranked. Ranking and counting are the mathematical operations that can be done on
the values of these variables. However, the ranks only indicate which category
is greater or better; there is no precise difference between the values (categories) of
the variable. Examples: grade scores (A, B, C, D, F), academic qualifications (B.Sc.,
M.Sc., Ph.D.), strength (very weak, weak, strong, very strong), health status (very
sick, sick, cured).
3. Interval variables: are those quantitative variables that identify not only
which category is greater or better but also by how much. This is a stronger form of
measurement, but there is no true zero. Zero indicates a low value rather than absence. Examples:
temperature, where 0 °C does not mean there is no temperature but, rather, that it is very cold.
Similarly, if a student scores 0 in a certain course, it does not mean that the student
has no knowledge of the course at all.
4. Ratio variables: These scales are the highest form of measurement. Ratio variables
are quantitative variables for which, unlike interval variables, zero shows absence
of the characteristic. All mathematical operations are permissible on
the values of these variables. Examples: height, weight, income, amount of yield,
expenditure, consumption, · · · .
Chapter 2
2. Secondary data: When an investigator uses the data which has already been col-
lected by others, such data is called secondary data. This data is primary data for
the agency that collected it and becomes secondary data for someone else who uses
this data for his own purposes. The secondary data can be obtained from journals,
official reports, government publications, publications of professional and research
organizations and so on.
Based on the role of time, data can be classified as cross-sectional and time series.
1. Cross-sectional data: is a set of observations taken at a point of time.
2. Time series data: is a set of observations collected for a sequence of time usually
at equal intervals.
collection (why we need to collect data), the kind of data to be collected (what type of data
to collect), the source of data (where we can get the data) and the methods of data
collection (how we can collect the data).
Once these questions are answered, it becomes necessary to collect the information needed.
This information has to be collected from certain individuals, directly or indirectly. Such
a technique is known as survey method which is commonly used in social sciences, i.e.,
problems related to sociology, political science, psychology and various economic studies.
2.2.1 Questionnaire
The most common methods of data collection for survey are personal interview and self-
administered questionnaire. In these and other methods of data collection, it is necessary
to prepare a document, called questionnaire, which contains a set of questions to be an-
swered and is used to record the responses.
A questionnaire is a form containing a cover letter that introduces the person conducting
the survey and explains the objectives of the survey, and a set of related questions to be answered
by the respondents. One of the most important points in preparing it is that all questions
in it must be relevant to the objectives of the survey. In short, the following points
should be kept in mind while designing a questionnaire:
• The person conducting the survey should introduce himself or herself, state the objective(s)
of the survey, promise anonymity and include instructions on how to fill in the
form, as these are necessary for getting correct responses (cover letter).
• The number of questions should be as few as possible. Once the objectives of the sur-
vey are clearly defined only questions pertinent to the objectives should be included.
The time of the respondent should not be wasted by asking irrelevant questions. In
general 5 to 25 questions may be regarded as a fair number. If a lengthy questionnaire
is unavoidable, it should preferably be divided into two or more parts.
• Questions should be simple, short and easy to understand and they should convey
one and only one idea. Technical terms should be avoided.
• Leading questions should be completely avoided. If you ask a person "You do not
smoke cigarettes?", the person will almost automatically answer 'Yes, I do not'.
1. Whether the data are suitable for the purpose of investigation. This can be judged in
the light of the nature and scope of investigation.
2. If the data obtained are suitable for our purpose, it should then be checked whether the data
are adequate for the purpose of investigation. This can be judged in the light of the
time and geographical area covered by the available data.
3. Whether the data are reliable. The data obtained should be checked for accuracy.
If the data are based on a sample, one should see whether the sample is
properly representative of the population.
Once the above points are observed in the secondary data, it is ready to be used for further
statistical analysis.
Editing Data
Before further analysis, the collected data should be edited for completeness, consistency,
accuracy and homogeneity.
Classification of Data
The next important step towards organizing data is classification. Classification is the
separation of items according to similar characteristics and grouping them into various
groups. Data may be classified into four broad classes:
Tabulation of Data
A table is a systematic arrangement of data in rows and columns, which is easy to under-
stand and makes data fit for further analysis and drawing conclusions.
Tabulation should not be confused with classification, as the two differ in many ways.
The main purpose of classification is to divide the data into homogeneous groups, whereas
in tabulation the data are presented in rows and columns. Hence, classification is a
preliminary step prior to tabulation.
2. Title: There should be a title at the top of every table. The title should be clear,
concise and adequate. The title should answer the questions: What is the data?
Where is the data? How is the data classified? What is the time period of the data?
3. Caption: The caption labels the data presented in a column of the table. There
may be sub-captions in each caption.
5. Body: The body of the table is the most important part. The information given
in the rows and columns forms the body of the table. It contains the quantitative
information to be presented.
6. Footnote: Any explanatory note concerning the table itself, placed directly beneath
the table, is called a 'footnote'. The main purpose of a footnote is to clarify some of the
specific items given in the table or to explain any ambiguities or omissions in
the data shown in the table.
7. Source Note: If the data are collected from secondary sources, a source note is given
to disclose the sources from which the data were collected.
Though the format of a table has already been discussed, some guidelines for preparing a
table are as follows:
1. The table should contain the required number of rows and columns with stubs and
captions and the whole data should be accommodated within the cells formed corre-
sponding to these rows and columns.
2. If the quantity is zero, it should be entered as zero. Leaving blank space or putting
dash in place of zero is confusing and undesirable.
3. The unit of measurement should be given in parentheses either just below the
column's caption or along with the stub in the row.
4. If any figure in the table has to be specified for a particular purpose, it should be
marked with an asterisk or another symbol. The specification of the marked figure
should be explained beneath the table with the same mark.
There are three types of frequency distributions; categorical, ungrouped and grouped fre-
quency distributions.
1. Categorical frequency distribution: It is used when the variable is qualitative
i.e. either nominal or ordinal. Each category of the variable represents a single class
and the number of times each category repeats represents the frequency of that class.
Example 2.4. Consider the following age group and number of persons:
(a) Class Limits: The lowest and highest values that can be included in a class
are called class limits. The lowest values are called lower class limits and the
highest values are called upper class limits. For example: Class limit for the
first class is 1-25, where 1 is the lower class limit and 25 is the upper class limit
of the first class.
(b) Class Boundaries: Class boundaries are the class limits adjusted so that there is no gap
between the upper boundary of one class and the lower boundary of the next class. The lowest values
are called lower class boundaries (LCB) and the highest values are called upper class
boundaries (UCB). The class boundaries of the first class are 0.5-25.5, where the lower class
boundary is 0.5 and the upper class boundary is 25.5. Note that the UCB of
one class is the LCB of the next class.
(c) Class Width: It is the difference between the UCB and LCB of a class.
It is also the difference between the lower limits of two consecutive classes, or
the difference between the upper limits of two consecutive classes. That is,
$W = UCB_i - LCB_i$ or $W = LCL_i - LCL_{i-1}$ or $W = UCL_i - UCL_{i-1}$.
The class width of the above frequency distribution is $W = 25.5 - 0.5 = 25$ or
$W = 26 - 1 = 25$ or $W = 50 - 25 = 25$.
(d) Class Mark: is the halfway point between the class limits or the class boundaries
of a class.
\[ CM_i = \frac{LCL_i + UCL_i}{2} = \frac{LCB_i + UCB_i}{2} \]
Class marks of the above distribution are $CM_1 = 13$, $CM_2 = 38$, $CM_3 = 63$
and $CM_4 = 88$. Note also that $W = CM_i - CM_{i-1}$.
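The definitions of class limits, boundaries, width and mark can be checked with a short Python sketch. Only the first class (1-25) is stated explicitly in the text; the remaining limits (26-50, 51-75, 76-100) are an assumption inferred from the class width of 25 and the class marks 13, 38, 63 and 88.

```python
# Class limits: the first class comes from the text; the other three are
# inferred from W = 25 and the stated class marks (an assumption).
class_limits = [(1, 25), (26, 50), (51, 75), (76, 100)]

# Gap between the upper limit of one class and the lower limit of the next.
unit = class_limits[1][0] - class_limits[0][1]  # 26 - 25 = 1

# Boundaries close the gap by moving each limit half a unit outward.
boundaries = [(lcl - unit / 2, ucl + unit / 2) for lcl, ucl in class_limits]

# Width from boundaries, class mark from limits.
widths = [ucb - lcb for lcb, ucb in boundaries]
class_marks = [(lcl + ucl) / 2 for lcl, ucl in class_limits]

for (lcl, ucl), (lcb, ucb), w, cm in zip(class_limits, boundaries, widths, class_marks):
    print(f"{lcl}-{ucl}: boundaries {lcb}-{ucb}, width {w}, class mark {cm}")
```

Computing the class marks from the boundaries instead of the limits gives the same values, which is the second equality in the class mark formula.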
Relative frequencies are particularly helpful when comparing two or more frequency
distributions in which the numbers of cases under investigation are not equal. Percentage
distributions make such a comparison more meaningful, since percentages are relative
frequencies and hence the total number in the sample or population under consideration
becomes irrelevant.
3. Find the range (R). The range is the difference between the largest and the smallest
values of the variable.
4. Determine the number of classes (k) using Sturges' rule, $k = 1 + 3.322 \log_{10} N$, where
N is the total number of observations.
6. Put the smallest value of the data set as the LCL of the first class. Then obtain the
LCL of the second class by adding the class width W to the LCL of the first class.
Continue adding W until you get k classes.
Let X be the smallest observation. Thus, LCL1 = X and LCLi = LCLi−1 + W for
i = 2, 3, · · · , k.
7. Now obtain the UCLs of the frequency distribution by adding $W - U$ to the corresponding
LCLs: $UCL_i = LCL_i + (W - U)$ for $i = 1, 2, \cdots, k$.
8. Generate the class boundaries: $LCB_i = LCL_i - \frac{1}{2}U$ and $UCB_i = UCL_i + \frac{1}{2}U$ for
$i = 1, 2, \cdots, k$.
Example 2.5. Construct a grouped frequency distribution for the following scores of 56 students
(out of 40).
31 33 33 34 34 35 35 17 31 36 17 18 19 25 26 27 27 19 20 22 31 36 38 13 22 22 35 36 28 28
29 30 30 36 11 13 16 17 17 22 22 23 23 23 23 24 24 24 25 27 27 28 28 30 13 16
Solution:
2. U = 17 − 16 = 1
3. R = L − S = 38 − 11 = 27
4. K = 1 + 3.322 log N = 1 + 3.322 log 56 = 6.81 ≈ 7
5. W = R/K = 27/6.81 = 3.96 ≈ 4
6. W − U = 4 − 1 = 3
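The steps of Example 2.5 can be reproduced with the following Python sketch. It assumes that k and W are rounded to the nearest whole number, as the "≈" in the solution suggests, and that the unit of measurement U is 1; the variable names are illustrative only.

```python
import math

scores = [31, 33, 33, 34, 34, 35, 35, 17, 31, 36, 17, 18, 19, 25, 26, 27, 27, 19, 20, 22,
          31, 36, 38, 13, 22, 22, 35, 36, 28, 28, 29, 30, 30, 36, 11, 13, 16, 17, 17, 22,
          22, 23, 23, 23, 23, 24, 24, 24, 25, 27, 27, 28, 28, 30, 13, 16]

n = len(scores)
unit = 1                                  # U, taken from the solution (17 - 16 = 1)
data_range = max(scores) - min(scores)    # R = L - S
k_raw = 1 + 3.322 * math.log10(n)         # Sturges' rule, base-10 logarithm
k = round(k_raw)                          # number of classes
width = round(data_range / k_raw)         # class width W

# LCL_1 is the smallest value; each following LCL adds W.
lower_limits = [min(scores) + i * width for i in range(k)]
# UCL_i = LCL_i + (W - U)
upper_limits = [lcl + (width - unit) for lcl in lower_limits]
# Frequency of a class = number of scores between its limits (inclusive).
frequencies = [sum(lcl <= x <= ucl for x in scores)
               for lcl, ucl in zip(lower_limits, upper_limits)]

for lcl, ucl, f in zip(lower_limits, upper_limits, frequencies):
    print(f"{lcl}-{ucl} (boundaries {lcl - unit / 2}-{ucl + unit / 2}): frequency {f}")
```

With these data the sketch yields seven classes of width 4, starting at 11-14 and ending at 35-38, and the frequencies add up to the 56 observations.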
4. Classes should be standardized. Classes should follow a logical and chronological (increasing)
order.
5. Classes should be continuous. Even if there are no values in a class the class must
be included in the frequency distribution.
6. Open ended classes, where there is no lower limit of the first class or no upper
limit of the last class, should be avoided since this creates difficulty in analysis and
interpretation.
• Advantages:
• Disadvantages:
are represented on the Y-axis. The width of the bar represents nothing (it is
meaningless), but it should be equal for all bars. Also, each bar is separated by
an equal space.
Example 2.7. Construct simple bar chart for the following data.
Marital Status             Number of Individuals
Single                     10
Married                    7
Divorced                   3
Others (Widowed,· · · )    1
Total                      21
(b) Component Bar Diagram: is used when there is a desire to show how a total or
aggregate is divided into its component parts. The bars represent the total value of
a variable, with each total broken into its component parts, and different colors
are used for identification. In this type of diagram, a bar is subdivided into
parts in proportion to the size of each subdivision. These subdivided rectangles
are shaded differently by lines, dots and colors so that the components are easy to
compare. Sometimes the volumes of different attributes may
be greatly different. To make meaningful comparisons, the components of
the attributes are reduced to percentages. In that case each attribute will have
100 as its maximum volume. This sort of component bar diagram is known as a
percentage bar diagram.
Example 2.8. Construct component bar chart for the following data.
(c) Multiple Bars Diagram: used to display data on more than one variable. In
the multiple bars diagram two or more sets of inter-related data are interpreted.
Example 2.9. Construct multiple bar chart for the following data.
Year Coffee Butter Sugar
1997 12 10 7
1998 5 9 8
1999 10 12 7
2000 9 8 8
2. Pie Chart: A pie chart is popularly used in practice to show the percentage breakdown
of data. It is a circle representing the total, divided into slices whose
sizes are proportional to the frequency or relative frequency of each class, that is,
to the size of the parts that make up the total.
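As a minimal sketch of how slice sizes follow from frequencies, the Python snippet below converts counts into percentages and central angles. It reuses the marital status counts from Example 2.7 purely for illustration; the data of the pie chart example itself are not reproduced in this part of the notes.

```python
# Counts borrowed from Example 2.7 (illustrative reuse, not the pie chart example's data).
counts = {"Single": 10, "Married": 7, "Divorced": 3, "Others": 1}
total = sum(counts.values())

# Each slice takes the same share of 100% and of 360 degrees as its share of the total.
percentages = {label: 100 * f / total for label, f in counts.items()}
angles = {label: 360 * f / total for label, f in counts.items()}

for label in counts:
    print(f"{label}: {percentages[label]:.1f}% -> {angles[label]:.1f} degrees")
```

The angles always add up to 360 degrees, so a category with a larger frequency simply receives a proportionally wider slice.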
Solution:
bar. It is also called frequency curve if the points are joined by a smooth free hand
sketch.
Chapter 3
Collected data are usually not suitable, as they stand, for drawing conclusions about the mass from which
they were taken. Even though the data are somewhat summarized once they have been organized
into frequency distributions and presented in graphs and diagrams, we still
cannot easily make inferences because there are many groups. Hence, organizing
data into a frequency distribution is not sufficient; further condensation is needed,
particularly to compare two or more distributions. We may reduce the entire distribution
to one number that represents it. A single value which can be
considered typical or representative of a set of observations, and around which the
observations can be considered as centered, is called an 'average' (or average value or center
of location). Since such typical values tend to lie centrally within a set of observations
when arranged according to magnitude, averages are called measures of central tendency.
There are many types of measures of central tendency, each possessing particular properties
and each being typical in some unique way. The most frequently encountered ones are:
• Computed averages: Mean (Arithmetic Mean, Geometric Mean and Harmonic Mean)
• Mode
3. It should be defined rigidly which means it should have a definite value (it should be
unique).
6. It should be stable with regard to sampling. This means that if a number of samples
of the same size are drawn from a population, the measure of central tendency having
the minimum variation among the different calculated values should be preferred.
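• $\sum_{i=1}^{n} X_i Y_i = X_1 Y_1 + X_2 Y_2 + \cdots + X_n Y_n$
• $\sum_{i=1}^{n} c = nc$, where c is a constant.
• $\sum_{i=1}^{n} (X_i \pm c) = \sum_{i=1}^{n} X_i \pm nc$
• $\sum_{i=1}^{n} c X_i = c \sum_{i=1}^{n} X_i$
• $\sum_{i=1}^{n} (X_i \pm Y_i)^2 = \sum_{i=1}^{n} X_i^2 \pm 2 \sum_{i=1}^{n} X_i Y_i + \sum_{i=1}^{n} Y_i^2$
• $\sum_{i=1}^{n} X_i Y_i \neq \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i$
• $\sum_{i=1}^{n} X_i^2 \neq \left( \sum_{i=1}^{n} X_i \right)^2$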
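The summation rules above can be checked numerically. The following Python sketch uses small made-up lists X and Y and a constant c (all hypothetical values chosen for illustration); the last two rules are inequalities that hold in general, not for every possible data set.

```python
import math

# Hypothetical data chosen only to illustrate the rules.
X = [2.0, 4.0, 8.0]
Y = [1.0, 3.0, 5.0]
c = 3.0
n = len(X)

sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x ** 2 for x in X)
sum_y2 = sum(y ** 2 for y in Y)

# Sum of a constant: n * c
constant_rule = math.isclose(sum(c for _ in X), n * c)
# Sum of (X_i + c) = sum of X_i + n * c
shift_rule = math.isclose(sum(x + c for x in X), sum(X) + n * c)
# Sum of c * X_i = c * sum of X_i
scale_rule = math.isclose(sum(c * x for x in X), c * sum(X))
# Sum of (X_i + Y_i)^2 expands term by term
square_rule = math.isclose(sum((x + y) ** 2 for x, y in zip(X, Y)),
                           sum_x2 + 2 * sum_xy + sum_y2)
# In general the sum of products differs from the product of sums,
# and the sum of squares differs from the square of the sum.
product_differs = sum_xy != sum(X) * sum(Y)
square_differs = sum_x2 != sum(X) ** 2

print(constant_rule, shift_rule, scale_rule, square_rule, product_differs, square_differs)
```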
3.4 Mean
3.4.1 Arithmetic Mean
1. Simple arithmetic mean: The arithmetic mean is the simplest but most useful
measure of central tendency. It is nothing but the ’average’ which we compute in our
high school arithmetic. It is defined as the sum of all observations divided by the
total number of observations. The sample mean is denoted by X̄ (read as X bar)
while the population mean is represented by the Greek letter µ, mu.
• For a sample of n raw (individual) observations, $X_1, X_2, \cdots, X_n$:
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \]
• For grouped data (continuous or ungrouped frequency distributions):
\[ \bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} \]
where Xi is class mark of the ith class for grouped data or it is the ith class value
for ungrouped data and fi is the corresponding frequency.
To find the mean of the frequency distribution, the necessary calculations are as
follows:
Thus,
\[ \bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} = \frac{\sum_{i=1}^{7} f_i X_i}{\sum_{i=1}^{7} f_i} = \frac{1436}{56} = 25.64 \]
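The totals 1436 and 56 can be reproduced with a short Python sketch. The class marks and frequencies below are an assumption: they are reconstructed from the scores of Example 2.5 using the classes 11-14, 15-18, ..., 35-38, since the calculation table itself is not shown here.

```python
# Reconstructed from Example 2.5 (assumed classes 11-14, 15-18, ..., 35-38).
class_marks = [12.5, 16.5, 20.5, 24.5, 28.5, 32.5, 36.5]
frequencies = [4, 7, 8, 10, 12, 7, 8]

total_fx = sum(f * x for f, x in zip(frequencies, class_marks))  # sum of f_i * X_i
total_f = sum(frequencies)                                       # sum of f_i
grouped_mean = total_fx / total_f

print(f"sum f*X = {total_fx}, sum f = {total_f}, mean = {grouped_mean:.2f}")
```

Because each observation is replaced by its class mark, the grouped mean is an approximation of the mean of the raw scores.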
(a) The algebraic sum of the deviations of each value from the arithmetic mean is
zero. That is, $\sum_{i=1}^{n} (X_i - \bar{X}) = 0$.
(b) The sum of the squares of the deviations from the mean is less than the sum
of the squares of the deviations about any other value a; that
is, the sum of the squared deviations from the mean is a minimum:
\[ \sum_{i=1}^{n} (X_i - \bar{X})^2 < \sum_{i=1}^{n} (X_i - a)^2, \quad a \neq \bar{X} \]
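Properties (a) and (b) can be illustrated numerically. The Python sketch below uses a small hypothetical data set and a handful of alternative values a; it is a spot check on a few candidates, not a proof.

```python
import math

# Hypothetical observations used only for illustration.
values = [2.0, 4.0, 8.0, 10.0]
mean = sum(values) / len(values)

# Property (a): deviations from the mean sum to zero.
deviation_sum = sum(x - mean for x in values)

def sum_of_squares(a):
    """Sum of squared deviations of the observations about the value a."""
    return sum((x - a) ** 2 for x in values)

# Property (b): any value other than the mean gives a larger sum of squares.
candidates = [mean - 2, mean - 0.5, mean + 0.5, mean + 2]
is_minimum = all(sum_of_squares(mean) < sum_of_squares(a) for a in candidates)

print(deviation_sum, sum_of_squares(mean), is_minimum)
```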
Examples:
i. The mean weight of 50 women working in a factory is 48 kilograms. The
mean weight of 75 men working in the same factory is 58 kilograms. Find
the mean weight of all workers in the factory.
ii. The mean mark in statistics of 50 students in a class was 72 and that of the
35 boys was 75. Find the mean mark of the girls in the class. Ans:65
iii. The mean salary of 100 laborers working in a factory, running in two shifts
of 40 and 60 workers respectively is birr 380. The mean salary of the 40
laborers working in the morning shift is 350. Find the mean salary of the
60 laborers working in the evening shift.
Solutions:
i. $n_w = 50$, $\bar{X}_w = 48$, $n_m = 75$, $\bar{X}_m = 58$, $\bar{X}_c = ?$
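The three examples can be worked with a short Python sketch. The combined-mean formula is not reproduced in this part of the notes, so the sketch assumes the usual form, in which each group mean is weighted by its group size; the function names are illustrative.

```python
def combined_mean(sizes, means):
    """Mean of several groups pooled together, weighting each mean by its group size."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

def missing_group_mean(total_n, total_mean, known_n, known_mean):
    """Mean of the remaining group when the overall mean and one group mean are known."""
    return (total_n * total_mean - known_n * known_mean) / (total_n - known_n)

# i. 50 women with mean 48 kg and 75 men with mean 58 kg
all_workers = combined_mean([50, 75], [48, 58])
# ii. 50 students with mean 72, of whom 35 boys have mean 75
girls = missing_group_mean(50, 72, 35, 75)
# iii. 100 laborers with mean 380 birr, of whom 40 (morning shift) have mean 350
evening_shift = missing_group_mean(100, 380, 40, 350)

print(all_workers, girls, evening_shift)
```

Under this assumption the sketch gives 54 kilograms for all workers, 65 for the girls (matching the stated answer) and birr 400 for the evening shift.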
Example 3.4. The mean of 200 items was found to be 50. Later on it was discovered
that two items were wrongly read as 92 and 8 instead of the correct values 192 and
88 respectively. Find the correct mean.
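One way to approach Example 3.4 is to recover the reported total from the mean, replace the wrongly read items, and divide again. A minimal Python sketch of that reasoning (variable names are illustrative):

```python
n = 200
reported_mean = 50
wrong_values = [92, 8]       # values as they were wrongly read
correct_values = [192, 88]   # values that should have been used

# Total implied by the reported mean, then swap the wrong items for the correct ones.
reported_total = n * reported_mean
corrected_total = reported_total - sum(wrong_values) + sum(correct_values)
corrected_mean = corrected_total / n

print(corrected_total, corrected_mean)
```

The total rises from 10000 to 10180, so the corrected mean is 50.9.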
Example 3.5. A teacher assigns weights of 2 to the quiz, 3 to the midterm and 5 to the final exam. If
a student gets 90, 50 and 60 on the quiz, midterm and final exam respectively, what is
his/her average academic performance?
Solution: $X_i = 90, 50, 60$ and $w_i = 2, 3, 5$
\[ \bar{X}_w = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i} = \frac{\sum_{i=1}^{3} w_i X_i}{\sum_{i=1}^{3} w_i} = \frac{2(90) + 3(50) + 5(60)}{2 + 3 + 5} = \frac{630}{10} = 63 \]
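The weighted mean of Example 3.5 can be reproduced with a few lines of Python; the function name is illustrative.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum of w_i * X_i divided by the sum of w_i."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

scores = [90, 50, 60]   # quiz, midterm, final exam
weights = [2, 3, 5]

average_performance = weighted_mean(scores, weights)
simple_average = sum(scores) / len(scores)

print(average_performance, simple_average)
```

The weighted mean (63) is lower than the simple average of the three scores because the heaviest weight falls on the final exam, where the score is 60.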
The arithmetic mean fulfils all the characteristics of a good measure of central tendency, with
the exception that it is highly affected by extreme values. In addition, it cannot be calculated for
a frequency distribution with open-ended classes (a frequency distribution with no lower
boundary for the first class, no upper boundary for the last class, or both).
Geometric mean is defined as the nth root of the product of n positive numerical values.
where $X_i$ is the class mark of the ith class and $f_i$ is the corresponding class frequency,
$n = \sum_{i=1}^{k} f_i$.
But the above formula is used if $n = \sum_{i=1}^{k} f_i$ is small. If it is large, it is difficult to calculate
the nth root. Thus, to facilitate the computation, we make use of logarithms:
\[ GM = \text{antilog}\left( \frac{1}{n} \sum_{i=1}^{n} \log X_i \right) \] for ungrouped data and
\[ GM = \text{antilog}\left( \frac{1}{\sum_{i=1}^{k} f_i} \sum_{i=1}^{k} f_i \log X_i \right) \] for grouped data.
The disadvantage of geometric mean is that it will be meaningless if one or more obser-
vations are zero or negative. It is also affected by extreme values but not to the extent of
arithmetic mean.
$$GM = \sqrt[n]{\prod_{i=1}^{n} X_i} = \sqrt[3]{\prod_{i=1}^{3} X_i} = \sqrt[3]{2(4)(8)} = \sqrt[3]{64} = 4$$
Example 3.7. The price of a commodity increased by 5% from 1989 to 1990, 8% from
1990 to 1991 and by 77% from 1991 to 1992. Find the average price increase.
For increment, take the base line value as 100% and then add the % increase so as to get
the values in successive years.
Example 3.8. A machine depreciated by 10% each in the first two years and by 40% in
the third year. Find out the average rate of depreciation.
Like the previous one, take the base line value of the machine as 100% and then deduct
the % of depreciation so as to get the depreciated values in successive years.
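The base-line approach described for Examples 3.7 and 3.8 can be sketched as follows. The helper name `average_rate` is hypothetical, and treating depreciation as a negative percentage change is an assumption made only to reuse one function for both cases.

```python
import math

def average_rate(percent_changes):
    """Average percentage change per period.

    Each change is added to the base line of 100, the geometric mean of the
    resulting values is taken, and the base line is subtracted again.
    """
    values = [100 + p for p in percent_changes]
    gm = math.prod(values) ** (1 / len(values))
    return gm - 100

print(average_rate([5, 8, 77]))        # Example 3.7: successive price increases
print(average_rate([-10, -10, -40]))   # Example 3.8: depreciation as negative changes
```

A negative result is read as an average rate of decrease.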
Example 3.9. Decadal percentage growth of population in country A is given below. Find
the average rate growth.
Harmonic mean is defined as the inverse of the arithmetic mean of the reciprocals of the
values.
where $X_i$ is the class mark of the $i$th class and $f_i$ is the corresponding class frequency, $n = \sum_{i=1}^{k} f_i$.
Similar to the weighted arithmetic mean, there is also a weighted harmonic mean. It is given by:
$$HM = \frac{\sum_{i=1}^{n} w_i}{\sum_{i=1}^{n} \dfrac{w_i}{X_i}} = \frac{\sum_{i=1}^{n} w_i}{\dfrac{w_1}{X_1} + \dfrac{w_2}{X_2} + \cdots + \dfrac{w_n}{X_n}}$$
Harmonic mean is not affected by extreme values. But it cannot be calculated when one
or more observations are zero.
Example 3.10. Find the harmonic mean of 2, 4 and 8.
$X_i = 2, 4, 8$;
$$HM = \frac{3}{1/2 + 1/4 + 1/8} = \frac{3}{0.875} = 3.429$$
Example 3.11. In a factory a mechanic takes 15 days to fabricate a machine, the second
mechanic takes 18 days, the third takes 30 days and the fourth takes 90 days. Find the
average number of days taken by the workers to fabricate the machine.
$X_i = 15, 18, 30, 90$;
$$HM = \frac{4}{1/15 + 1/18 + 1/30 + 1/90} = \frac{4}{1/6} = 24 \text{ days}$$
Example 3.12. Suppose a train moves 100 km with a speed of 40 km per hour, then 150
km with a speed of 50 km per hour and the next 135 km with a speed of 45 km per hour.
Calculate the average speed of the train.
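For Example 3.12, one way to apply the weighted harmonic mean is to treat the speeds as the values $X_i$ and the distances as the weights $w_i$. That choice of weights and the function name `weighted_harmonic_mean` are assumptions of this sketch.

```python
def weighted_harmonic_mean(values, weights):
    """Sum of the weights divided by the sum of weight/value ratios."""
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when a value is zero")
    return sum(weights) / sum(w / x for x, w in zip(values, weights))

speeds = [40, 50, 45]         # km per hour
distances = [100, 150, 135]   # km travelled at each speed

# Total distance (385 km) divided by total travel time (8.5 hours)
print(weighted_harmonic_mean(speeds, distances))
```

With distances as weights, the denominator is the total travel time, so the result is total distance over total time.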
Example 3.16. The arithmetic mean of two observations is 36 and their harmonic mean
is 25. What is the geometric mean of the two observations?
3.5 Median
It has been pointed out that mean cannot be calculated whenever there is frequency distri-
bution with open-ended classes. Also the mean is to a great extent affected by the extreme
values. For instance, there are eight persons getting salaries as Birr 150, 225, 240, 260,
275, 290, 300 and 1500. The mean salary of the persons is Birr 405. This value is not a
good measure of central tendency because out of the eight people, seven get Birr 300 or
less. Hence, some better measure is preferable and median is one of them.
Median is the halfway point in a data set. It divides a data set into two equal parts such that half of the values are less than the median and half are greater than the median. Graphically, the median is located at the intersection point of the less than and more than cumulative frequency curves.
a. 4th value=209
The median value for grouped frequency distributions is given by the formula:
$$\tilde{X} = L_{\tilde{X}} + \frac{\frac{n}{2} - F_{\tilde{X}-1}}{f_{\tilde{X}}} \times w$$
where $n = \sum_{i=1}^{k} f_i$ is the total number of observations, $f_{\tilde{X}}$ is the frequency of the median class, $L_{\tilde{X}}$ is the lower class boundary of the median class, $F_{\tilde{X}-1}$ is the less than cumulative frequency just before the median class (that is, the sum of all the frequencies up to but not including the median class) and $w$ is the class width of the median class. The median class is the class corresponding to the minimum less than cumulative frequency which contains the value $\frac{n}{2}$.
Example 3.18. Find the median mark of the students score data and interpret it.
First calculate less than cumulative frequency of the frequency distribution and identify
the median class.
Class Boundaries fi LCF (Fi )
10.5-14.5 4 4
14.5-18.5 7 11
18.5-22.5 8 19
22.5-26.5 10 29
26.5-30.5 12 41
30.5-34.5 7 48
34.5-38.5 8 56
Total 56
The median class is the class having the less than cumulative frequency containing the
value n/2 = 56/2 = 28. This implies, 22.5-26.5 is the median class.
$$\tilde{X} = L_{\tilde{X}} + \frac{\frac{n}{2} - F_{\tilde{X}-1}}{f_{\tilde{X}}} \times w = 22.5 + \frac{28 - 19}{10} \times 4 = 22.5 + 3.6 = 26.1$$
Median is not influenced by extreme values. It can be calculated for a frequency distribution with open-ended classes, and it can even be located when the data are incomplete.
3.6.1 Quartiles
Quartiles are values that divide a data set into four equal parts. These values are denoted
by Q1 , Q2 and Q3 such that 25% of the data fall below Q1 , 50% below Q2 and 75% below Q3 .
Let $Q_i$ be the $i$th quartile ($i = 1, 2, 3$), then $Q_i = \left(\dfrac{i(n + 1)}{4}\right)^{\text{th}}$ value.
Example 3.20. Given the data: 420, 430, 435, 438, 441, 449, 490, 500, 510 and 515. Find
all the quartiles.
$$Q_i = \left(\frac{i(n + 1)}{4}\right)^{\text{th}} \text{ value}, \quad i = 1, 2, 3$$
$Q_1 = \left(\frac{10 + 1}{4}\right)^{\text{th}}$ value = 2.75th value = 2nd value + 0.75 (3rd value − 2nd value) = 430 + 0.75(435 − 430) = 433.75
$Q_2 = \left(\frac{2(10 + 1)}{4}\right)^{\text{th}}$ value = 5.5th value = 5th value + 0.5 (6th value − 5th value) = 441 + 0.5(449 − 441) = 445
$Q_3 = \left(\frac{3(10 + 1)}{4}\right)^{\text{th}}$ value = 8.25th value = 8th value + 0.25 (9th value − 8th value) = 500 + 0.25(510 − 500) = 502.5
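The positional rule used in Example 3.20 can be sketched as a small function. The name `quantile_by_position` and the `parts` argument (4 for quartiles, 10 for deciles, 100 for percentiles) are illustrative; the clamping at either end of the data is an assumption, since the notes do not say what to do when the position falls outside the observations.

```python
def quantile_by_position(data, i, parts):
    """i-th quantile at the i(n + 1)/parts-th value, with linear interpolation."""
    xs = sorted(data)
    n = len(xs)
    pos = i * (n + 1) / parts      # 1-based position, possibly fractional
    whole = int(pos)               # whole part of the position
    frac = pos - whole             # fractional part used for interpolation
    if whole < 1:                  # assumption: clamp to the smallest value
        return xs[0]
    if whole >= n:                 # assumption: clamp to the largest value
        return xs[-1]
    return xs[whole - 1] + frac * (xs[whole] - xs[whole - 1])

data = [420, 430, 435, 438, 441, 449, 490, 500, 510, 515]
for i in (1, 2, 3):
    print(f"Q{i} =", quantile_by_position(data, i, 4))
```

Other software often uses different interpolation conventions, so results from statistical packages may differ slightly from this rule.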
For a frequency distribution,
$$Q_i = L_{Q_i} + \frac{\frac{in}{4} - F_{Q_i-1}}{f_{Q_i}} \times w, \quad i = 1, 2, 3$$
where $n = \sum_{i=1}^{k} f_i$ is the total number of observations, $f_{Q_i}$ is the frequency of the $i$th quartile class, $L_{Q_i}$ is the lower class boundary of the $i$th quartile class, $F_{Q_i-1}$ is the less than cumulative frequency just before the $i$th quartile class and $w$ is the class width of the $i$th quartile class. The $i$th quartile class is the class corresponding to the minimum less than cumulative frequency which contains the value $\frac{in}{4}$.
Example 3.21. Calculate all quartiles for the students score data and interpret the results.
Calculate the less than cumulative frequencies $F_i$ first.
$$Q_i = L_{Q_i} + \frac{\frac{in}{4} - F_{Q_i-1}}{f_{Q_i}} \times w, \quad i = 1, 2, 3$$
$$Q_3 = L_{Q_3} + \frac{\frac{3n}{4} - F_{Q_3-1}}{f_{Q_3}} \times w = 30.5 + \frac{42 - 41}{7} \times 4 = 30.5 + 0.57 = 31.07$$
3.6.2 Deciles
Deciles are values that divide the data into ten equal parts. These values are denoted by
D1 , D2 , · · · , D9 such that 10% of the data fall below D1 , 20% below D2 , · · · , 90% below D9 .
Let $D_i$ be the $i$th decile ($i = 1, 2, \cdots, 9$), then $D_i = \left(\dfrac{i(n + 1)}{10}\right)^{\text{th}}$ value.
Example 3.22. Given the data: 420, 430, 435, 438, 441, 449, 490, 500, 510 and 515. Find
the 1st and 7th deciles.
$$D_i = \left(\frac{i(n + 1)}{10}\right)^{\text{th}} \text{ value}, \quad i = 1, 2, \cdots, 9$$
$D_1 = \left(\frac{10 + 1}{10}\right)^{\text{th}}$ value = 1.1th value = 1st value + 0.1 (2nd value − 1st value) = 420 + 0.1(430 − 420) = 421
$D_7 = \left(\frac{7(10 + 1)}{10}\right)^{\text{th}}$ value = 7.7th value = 7th value + 0.7 (8th value − 7th value) = 490 + 0.7(500 − 490) = 497
For a frequency distribution,
$$D_i = L_{D_i} + \frac{\frac{in}{10} - F_{D_i-1}}{f_{D_i}} \times w, \quad i = 1, 2, \cdots, 9$$
where $n = \sum_{i=1}^{k} f_i$ is the total number of observations, $f_{D_i}$ is the frequency of the $i$th decile class, $L_{D_i}$ is the lower class boundary of the $i$th decile class, $F_{D_i-1}$ is the less than cumulative frequency just before the $i$th decile class and $w$ is the class width of the $i$th decile class. The $i$th decile class is the class corresponding to the minimum less than cumulative frequency which contains the value $\frac{in}{10}$.
Example 3.23. Calculate the 5th and 8th deciles for the students score data and interpret
the results.
$$D_i = L_{D_i} + \frac{\frac{in}{10} - F_{D_i-1}}{f_{D_i}} \times w, \quad i = 1, 2, \cdots, 9$$
$$D_5 = L_{D_5} + \frac{\frac{5n}{10} - F_{D_5-1}}{f_{D_5}} \times w = 22.5 + \frac{28 - 19}{10} \times 4 = 22.5 + 3.6 = 26.1$$
3.6.3 Percentiles
Percentiles are values that divide a data set into 100 equal parts. These values are denoted
by P1 , P2 , · · · , P99 .
Let $P_i$ be the $i$th percentile ($i = 1, 2, \cdots, 99$), then $P_i = \left(\dfrac{i(n + 1)}{100}\right)^{\text{th}}$ value.
Example 3.24. Given the data: 420, 430, 435, 438, 441, 449, 490, 500, 510 and 515. Find
the 40th and 75th percentiles.
$$P_i = \left(\frac{i(n + 1)}{100}\right)^{\text{th}} \text{ value}, \quad i = 1, 2, \cdots, 99$$
$P_{40} = \left(\frac{40(10 + 1)}{100}\right)^{\text{th}}$ value = 4.4th value = 4th value + 0.4 (5th value − 4th value) = 438 + 0.4(441 − 438) = 439.2
$P_{75} = \left(\frac{75(10 + 1)}{100}\right)^{\text{th}}$ value = 8.25th value = 8th value + 0.25 (9th value − 8th value) = 500 + 0.25(510 − 500) = 502.5
For a frequency distribution,
$$P_i = L_{P_i} + \frac{\frac{in}{100} - F_{P_i-1}}{f_{P_i}} \times w, \quad i = 1, 2, \cdots, 99$$
where $n = \sum_{i=1}^{k} f_i$ is the total number of observations, $f_{P_i}$ is the frequency of the $i$th percentile class, $L_{P_i}$ is the lower class boundary of the $i$th percentile class, $F_{P_i-1}$ is the less than cumulative frequency just before the $i$th percentile class and $w$ is the class width of the $i$th percentile class. The $i$th percentile class is the class corresponding to the minimum less than cumulative frequency which contains the value $\frac{in}{100}$.
Example 3.25. Calculate the 30th and 90th percentiles for the students score data and
interpret the results.
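$$P_i = L_{P_i} + \frac{\frac{in}{100} - F_{P_i-1}}{f_{P_i}} \times w, \quad i = 1, 2, \cdots, 99$$
$P_{30}$ class: $30n/100 = 30(56)/100 = 16.80$, so the $P_{30}$ class is 18.5 − 22.5, whose frequency is 8.
$$P_{30} = L_{P_{30}} + \frac{\frac{30n}{100} - F_{P_{30}-1}}{f_{P_{30}}} \times w = 18.5 + \frac{16.80 - 11}{8} \times 4 = 18.5 + 2.9 = 21.4$$
$P_{90}$ class: $90n/100 = 90(56)/100 = 50.40$, so the $P_{90}$ class is 34.5 − 38.5.
$$P_{90} = L_{P_{90}} + \frac{\frac{90n}{100} - F_{P_{90}-1}}{f_{P_{90}}} \times w = 34.5 + \frac{50.40 - 48}{8} \times 4 = 34.5 + 1.2 = 35.7$$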
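Because the grouped formulas for the median, quartiles, deciles and percentiles share one structure, they can be sketched with a single function. The name `grouped_quantile` and the representation of classes as `(lower, upper)` boundary pairs are assumptions of this sketch; the data are the students score distribution tabulated in Example 3.18.

```python
def grouped_quantile(boundaries, freqs, i, parts):
    """L + ((i*n/parts - F_before) / f) * w for the class containing i*n/parts."""
    n = sum(freqs)
    target = i * n / parts
    cum_before = 0                            # less than cumulative frequency so far
    for (lower, upper), f in zip(boundaries, freqs):
        if cum_before + f >= target:          # first class whose LCF reaches the target
            return lower + (target - cum_before) / f * (upper - lower)
        cum_before += f
    raise ValueError("quantile position lies beyond the data")

boundaries = [(10.5, 14.5), (14.5, 18.5), (18.5, 22.5), (22.5, 26.5),
              (26.5, 30.5), (30.5, 34.5), (34.5, 38.5)]
freqs = [4, 7, 8, 10, 12, 7, 8]

print(round(grouped_quantile(boundaries, freqs, 1, 2), 2))   # median
print(round(grouped_quantile(boundaries, freqs, 3, 4), 2))   # third quartile
```

Calling it with `parts=10` or `parts=100` gives deciles and percentiles from the same table.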
Example 3.26. The life times (in hours) of eighty randomly selected light bulbs are summarized in the following table. Find all the quartiles, the 6th decile and the 65th percentile.
• $Q_i = P_{25i}$, $i = 1, 2, 3$
• $D_i = P_{10i}$, $i = 1, 2, \cdots, 9$
3.7 Mode
Mode is another measure of central tendency. It is the value that occurs most
frequently in a data set. For instance, if shoe size 7 has the maximum demand, size
No. 7 is the modal value of shoe sizes. Mode is denoted by $\hat{X}$. A data set may have one
mode (uni-modal), two modes (bi-modal), more than two modes (multi-modal) or no mode
at all (i.e. when all observations are equally frequent).
In ungrouped (individual series) cases, one can find mode by inspection. After arranging
the data in ascending or descending order, the value appearing most frequently (the most
frequent value) is taken as the modal value.
a. 110, 113, 116, 116, 118, 118, 118, 121 and 123.
b. 2, 3, 5, 7 and 8.
c. 15, 18, 18, 18, 20, 22, 24, 24, 24, 26 and 26
e. 1, 1, 0, 1, 0, 0, 0, 2, 4 and 3.
To find the modal value of each data set, just find the value having the highest frequency.
a. Since 118 occurs more than other values, the mode is 118.
b. Each value occurs once (equally frequent), the data has no mode.
c. 18 and 24 occur three times, hence the modal values are 18 and 24 (bi-modal).
e. The modal value here is 0, as it occurs more often than any other value.
In grouped (continuous) frequency distribution, the modal value is located in the class with
highest frequency and that class is the modal class.
$$\hat{X} = L_{\hat{X}} + \frac{f_{\hat{X}} - f_{\hat{X}-1}}{(f_{\hat{X}} - f_{\hat{X}-1}) + (f_{\hat{X}} - f_{\hat{X}+1})} \times w$$
where LX̂ is the lower class boundary of the modal class, fX̂ is frequency of the modal
class, fX̂−1 is the frequency just before the modal class, fX̂+1 is the frequency just after
the modal class and w is the class width of the modal class. The modal class is the class
corresponding to the largest frequency.
Example 3.28. Find the modal score of the students score data.
The class having highest frequency is ⇒ 26.5 − 30.5, hence it is the modal class.
$$\hat{X} = L_{\hat{X}} + \frac{f_{\hat{X}} - f_{\hat{X}-1}}{(f_{\hat{X}} - f_{\hat{X}-1}) + (f_{\hat{X}} - f_{\hat{X}+1})} \times w$$
$$\hat{X} = 26.5 + \frac{12 - 10}{(12 - 10) + (12 - 7)} \times 4 = 26.5 + 1.14 = 27.64$$
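The grouped mode calculation of Example 3.28 can be sketched as below. The function name `grouped_mode` is illustrative, and two conventions are assumptions of the sketch: a missing neighbouring class is treated as having frequency 0, and when several classes tie for the largest frequency the first one is used.

```python
def grouped_mode(boundaries, freqs):
    """Mode of a grouped frequency distribution from the modal class."""
    k = max(range(len(freqs)), key=freqs.__getitem__)   # index of the modal class
    f_mode = freqs[k]
    f_prev = freqs[k - 1] if k > 0 else 0                # assumption: 0 if no class before
    f_next = freqs[k + 1] if k + 1 < len(freqs) else 0   # assumption: 0 if no class after
    d1, d2 = f_mode - f_prev, f_mode - f_next
    if d1 + d2 == 0:
        raise ValueError("neighbouring classes have the same frequency as the modal class")
    lower, upper = boundaries[k]
    return lower + d1 / (d1 + d2) * (upper - lower)

boundaries = [(10.5, 14.5), (14.5, 18.5), (18.5, 22.5), (22.5, 26.5),
              (26.5, 30.5), (30.5, 34.5), (34.5, 38.5)]
freqs = [4, 7, 8, 10, 12, 7, 8]
print(round(grouped_mode(boundaries, freqs), 2))
```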
Example 3.29. What is the modal life time of the light bulbs given below in the table.
Mode is not affected by extreme values and can be calculated for open-ended classes. But
it often does not exist and its value may not be unique.
EXERCISES:
1. In a certain investigation, 460 persons were involved in the study, and based on
an enquiry on their age, it was known that 75% of them were 22 or more years.
The following frequency distribution shows the age composition of the persons under
study.
(a) Find the median and modal life of condensers and interpret them.
(b) Find the values of all quartiles.
(c) Compute the 5th decile, 25th percentile, 50th percentile and the 75th percentile
and interpret the results.
2. The mean annual salary of all employees in a company is 2500. The mean salary of
male and female is 2700 and 1700 respectively. Find the percentage of males and
females employed in the company.
(a) If 75% of the items were sold in birr 45 or less and most items were sold in birr
34, find the missing frequencies.
(b) If 25% of the items were sold in birr 45 or more and most items were sold in
birr 34, find the missing frequencies.
Summary
Different measures of central tendency and quantiles have been discussed in this chapter.
Out of mean, median and mode, the mean (average) is the most commonly used measure
of central tendency. But, the other two namely, the median and mode are not any less
important. Median is a largely used central measure in psychology, education and other
social sciences. Mode is a suitable average for qualitative information like attitude towards
disabled people, beauty or intelligence of certain individuals. It is a useful measure for
manufacturers.
Chapter 4
The following table displays the price of a certain commodity in four cities. Find the mean
and median prices of the four cities and interpret it.
A 30 30 30 30 30
B 28 29 30 31 32
C 10 15 30 45 50
D 0 5 30 55 60
All the four data sets have mean 30 and median is also 30. But by inspection it is ap-
parent that the four data sets differ remarkably from one another. So measures of central
tendency alone do not provide enough information about the nature of the data. Thus, to
have a clear picture of the data, one needs to have a measure of dispersion or variability
among observations in the data set.
Variation or dispersion may be defined as the extent of scatteredness of value around the
measures of central tendency. Thus, a measure of dispersion tells us the extent to which
the values of a variable vary about the measure of central tendency.
2. To compare two or more sets of data with regard to their variability. Two or
more data sets can be compared by calculating the same measure of variation having
the same units of measurement. A set with a smaller value possesses less variability, or is
more uniform (or more consistent).
The size of the absolute measures of dispersion depends upon the size of the values
in the data. That is, if the size of the values is larger, the value of the absolute
measures will also be larger. Therefore, an absolute measure of variation fails to
be appropriate for comparing two or more groups if the size of the data among the
groups is not the same.
Before giving the details of these measures of dispersion, it is worthwhile to point out that
a measure of dispersion (variation) is to be adjudged on the basis of all those properties of
good measures of central tendency. Hence, their repetition is superfluous.
MD is not much affected by extreme values. Its main drawback is that the algebraic negative signs of the deviations are ignored. MD is minimum when the deviations are taken from the median. The coefficients of mean deviation are:
$$CMD(\bar{X}) = \frac{MD(\bar{X})}{\bar{X}} \quad \text{and} \quad CMD(\tilde{X}) = \frac{MD(\tilde{X})}{\tilde{X}}$$
Example 4.1. Calculate the R, CR, QD, CQD, MD(X̄), MD(X̃), CMD(X̄) and CMD(X̃)
for the following data: 20, 28, 40, 12, 30, 15, 50.
Previously, we have obtained the following quantities for the students score data:
X̄ = 25.64, X̃ = 26.1, Q1 = 20, Q3 = 31.07
For a sample of $n$ elements, the sample variance and standard deviation, denoted by $S^2$ and $S$ respectively, are calculated using the formulae:
• For raw data, $S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ and $S = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
• For grouped data, $S^2 = \dfrac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{\sum_{i=1}^{k} f_i - 1}$ and $S = \sqrt{\dfrac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^2}{\sum_{i=1}^{k} f_i - 1}}$
Example 4.4. Find the variance and standard deviation of: 20, 28, 40, 12, 30, 15 and 50.
a. Take the data as a population.
b. Consider it as a sample.
a. $N = 7$, $\mu = 27.86$;
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{(20 - 27.86)^2 + \cdots + (50 - 27.86)^2}{7} = \frac{1120.86}{7} = 160.12$$
$$\Rightarrow \sigma = \sqrt{160.12} = 12.65$$
b. $n = 7$, $\bar{X} = 27.86$;
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{(20 - 27.86)^2 + \cdots + (50 - 27.86)^2}{6} = \frac{1120.86}{6} = 186.81$$
$$\Rightarrow S = \sqrt{186.81} = 13.67$$
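The two parts of Example 4.4 differ only in the divisor ($N$ for a population, $n - 1$ for a sample). As a quick check, Python's standard `statistics` module provides both versions; the variable names below are illustrative.

```python
import statistics

data = [20, 28, 40, 12, 30, 15, 50]

# a. Data treated as a population: divisor N
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# b. Data treated as a sample: divisor n - 1
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

print(round(pop_var, 2), round(pop_sd, 2))
print(round(samp_var, 2), round(samp_sd, 2))
```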
Example 4.5. Find the variance and standard deviation of the students score data.
The necessary calculation for calculating variance are as follows: X̄ = 25.64
Standard deviation is considered to be the best measure of dispersion because its unit of measurement is the same as that of the data and the exaggeration made by the variance is eliminated by taking its square root. In simple words, it explains the average amount of variation on either side of the mean. If the standard deviation of the data is small, the values are concentrated near the mean, and if it is large, the values are scattered away from the mean.
Similarly, the pooled population variance can be calculated using the formula:
$$\sigma_p^2 = \frac{\sum_{i=1}^{g} N_i \left[\sigma_i^2 + (\mu_i - \mu_c)^2\right]}{\sum_{i=1}^{g} N_i}$$
Example 4.7. The mean weight of 150 students is 60 kilograms. The mean weight of boys
is 70 kilograms with a standard deviation of 10 kilograms. For the girls, the mean
weight is 55 kilograms and the standard deviation 15 kilograms.
Example 4.8. A distribution consists of four parts characterized as follows. Find the
mean and standard deviation of the distribution. Ans: X̄c = 73.8 and σ = 11.93
Example 4.10. The following data are some of the particulars of the distribution of
weights of boys and girls in a class.
Boys Girls
Number 100 50
Mean 60 45
Variance 9 4
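As an illustration of the pooled population variance formula, the figures in the table of Example 4.10 can be combined as follows. The function name `combine_groups` is hypothetical, and the sketch assumes the tabulated variances are population variances.

```python
def combine_groups(sizes, means, variances):
    """Combined mean and pooled population variance of several groups."""
    total = sum(sizes)
    combined_mean = sum(n * m for n, m in zip(sizes, means)) / total
    pooled_var = sum(
        n * (v + (m - combined_mean) ** 2)      # within-group plus between-group part
        for n, m, v in zip(sizes, means, variances)
    ) / total
    return combined_mean, pooled_var

# Boys and girls from the table above: sizes, means and variances
mean_c, var_c = combine_groups([100, 50], [60, 45], [9, 4])
print(mean_c, round(var_c, 2), round(var_c ** 0.5, 2))
```

The second term inside the sum shows why the combined variance exceeds both group variances when the group means differ.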
• For sample: $CV = \dfrac{S}{\bar{X}} \times 100\%$
The distribution having the smaller CV is said to be less variable, more consistent or more uniform. For field experiments, the CV is generally reported. If it is small, it indicates more reliable experimental findings.
Example 4.11. Compare the variability of the following two sample data sets using stan-
dard deviation and coefficient of variation:
Example 4.12. The average IQ of statistics students is 110 with standard deviation 5 and
the average IQ of mathematics students is 106 with standard deviation 4. Which class is
less variable in terms of IQ?
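The comparison asked for in Example 4.12 can be sketched with the coefficient of variation formula. The function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: standard deviation relative to the mean."""
    return std_dev / mean * 100

cv_statistics = coefficient_of_variation(5, 110)    # statistics students
cv_mathematics = coefficient_of_variation(4, 106)   # mathematics students

print(round(cv_statistics, 2), round(cv_mathematics, 2))
print("less variable:", "mathematics" if cv_mathematics < cv_statistics else "statistics")
```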
Summary
A measure of dispersion, especially the variance, is the backbone of statistics. As a matter
of fact, statistics involves variance in almost every study in one way or another. Most of
the surveys and experiments are considered as a study of sample units. Hence, the formulae
for sampling are mostly used. All the formulae except the variance are not affected whether
we consider a population or a sample. Of course, the interpretation of values has to be
made accordingly.
4.3 Moments
Let $X$ be a variable that assumes the values $X_1, X_2, \cdots, X_N$.
$\bar{X} = \mu''_1 + A$
2. The $r$th moment about the origin (i.e., in (1) above $A = 0$) is defined as:
• $\mu'_r = \dfrac{\sum_{i=1}^{N} X_i^r}{N}$ for raw data
• $\mu'_r = \dfrac{\sum_{i=1}^{k} f_i X_i^r}{\sum_{i=1}^{k} f_i}$ for grouped data.
• µ0 = 1
• µ1 = 0
• µ2 = σ 2
Example 4.14. Find the first three central moments of the numbers 2, 3 and 7.
Example 4.15. Find the third moment about 3 of the numbers 2, 3 and 7.
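Moments about an arbitrary value, about the origin and about the mean differ only in the reference point, which the following sketch makes explicit. The function names `moment_about` and `central_moment` are illustrative, and the data are the numbers 2, 3 and 7 from Examples 4.14 and 4.15.

```python
def moment_about(data, r, a):
    """r-th moment of raw data about the value a (a = 0 gives moments about the origin)."""
    return sum((x - a) ** r for x in data) / len(data)

def central_moment(data, r):
    """r-th moment about the arithmetic mean."""
    mean = sum(data) / len(data)
    return moment_about(data, r, mean)

data = [2, 3, 7]
print([central_moment(data, r) for r in (1, 2, 3)])   # first three central moments
print(moment_about(data, 3, 3))                       # third moment about 3
```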
4.4 Skewness
4.4.1 Frequency Curves
So far it has been discussed that frequency curve is one of the graphical methods of data
presentation used for continuous data. It is a graph of smooth line segment joining the
intersection points of class marks and frequencies.
2. Positively skewed curve: If some observations are extremely large, the mean of the
distribution becomes greater than the median or mode. In such case, the distribution
is said to be positively skewed. In positively skewed distribution:
• The right tail of the frequency curve is more elongated, longest tail to the right
of the central point.
• More values are on the left of the mean.
• The extreme variation is towards large values (to the right).
• Smaller values are more frequent.
• Mean>Median>Mode
3. Negatively skewed curve: If some extremely small observations are present, the
mean is smaller than the other two averages (median and mode), and the distribution
is said to be negatively skewed.
Measures of Skewness
$$Sk_p = \frac{\bar{X} - \hat{X}}{S}$$
• If $Sk_p = 0$, the distribution is symmetrical.
• If $Sk_p > 0$, the distribution is positively skewed.
• If $Sk_p < 0$, the distribution is negatively skewed.
$$Sk_b = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
• If $Sk_b = 0$, the distribution is symmetrical.
• If $Sk_b > 0$, the distribution is positively skewed.
• If $Sk_b < 0$, the distribution is negatively skewed.
Example 4.16. Calculate the Pearson’s and Bowley’s coefficient of skewness for: 2,3,4,4,5,5,5,7,8,9.
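A possible way to work Example 4.16 is sketched below. It assumes the sample standard deviation $S$ in Pearson's coefficient and the $i(n+1)/4$ positional rule for the quartiles in Bowley's coefficient; the helper `quartile` is hypothetical and only handles positions that fall strictly inside the data, as they do here.

```python
import statistics

def quartile(sorted_xs, i):
    """i-th quartile at the i(n + 1)/4-th value (1-based), linearly interpolated."""
    pos = i * (len(sorted_xs) + 1) / 4
    k = int(pos)
    return sorted_xs[k - 1] + (pos - k) * (sorted_xs[k] - sorted_xs[k - 1])

data = sorted([2, 3, 4, 4, 5, 5, 5, 7, 8, 9])
mean = statistics.mean(data)
mode = statistics.mode(data)
s = statistics.stdev(data)                    # sample standard deviation S
q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))

sk_pearson = (mean - mode) / s
sk_bowley = (q3 + q1 - 2 * q2) / (q3 - q1)
print(round(sk_pearson, 3), round(sk_bowley, 3))
```

Both coefficients come out positive for these data, which points to a slight positive skew.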
Example 4.17. The mean, median and coefficient of variation of 100 observations are
found to be 90, 84 and 80 respectively. Find the coefficient of skewness.
4.5 Kurtosis
Kurtosis refers to the peakedness or flatness of a certain distribution with respect to the
normal distribution. It describes the degree of concentration of observations around the
mode of the distribution, whether the values are concentrated more around the mode (a
peaked curve) or away from the mode toward both the tails of the frequency curve. Two or
more distributions may have identical average, variation and skewness but they may show
different degrees of concentration of values of observations around the mode and hence
may show different degrees of peakedness.
A distribution which is neither more peaked nor flat topped is called mesokurtic.
Measures of Kurtosis
Example 4.18. The first four central moments of a distribution are 0, 2.5, 0.7 and 18.75.
Comment on the skewness and kurtosis of the distribution.
EXERCISES:
1. Find the range, quartile deviation, mean deviation about the mean, mean deviation
about the median, mean deviation about the mode, variance, standard deviation and
coefficient of variation for the following distribution.
2. Three independent distributions each of 100 members and standard deviation 4.5
units are located with their means at 12.1, 17.1 and 22.1 units respectively. Find the
standard deviation of the three distributions taken as a whole.
3. The first of the two groups has 100 items with mean 45 and variance 49. If the
combined has 250 items with mean 51 and variance 130, find the mean and standard
deviation of the second group.
4. Karl Pearson’s coefficient of skewness is +0.32. Its standard deviation is 6.5 and
mean is 29.6. Find the median and mode of the distribution.
5. For a distribution, Bowley’s coefficient of skewness is -0.56, the lower quartile is 16.4
and median is 24.2. What is the quartile deviation?
6. The first two moments of a distribution about the value 5 are 2 and 20. Find the
mean and variance of the distribution.
Chapter 5
Elementary Probability
As a general concept, probability is the measure of a chance that something will occur. Or
it may also be defined as a quantitative measure of uncertainty.
In describing which objects are contained in set A, two common methods are available.
These methods are:
1. Listing all objects of A. For example, A = {1, 2, 3, 4} describes the set consisting of
the positive integers 1, 2, 3 and 4.
2. Describing a set in words, for example, set A consists of all real numbers between 0
and 1, inclusive. It can be written as A = {x : 0 ≤ x ≤ 1}, that is, A is the set of
all x’s where x is a real number between 0 and 1, inclusive.
If every element of set A is also an element of set B, A is said to be a subset of B, written as A ⊂ B. Every set is a subset of itself, i.e., A ⊂ A. The empty set is a subset of every set.
If A ⊂ B and B ⊂ C, then A ⊂ C. If A ⊂ B and B ⊂ A, then A and B are said to be equal.
Now let us see some methods of combining sets in order to form a new set and develop the
main properties.
1. Union (Or): The set consisting of all elements in A or B or both is called the union
of A and B, written as A ∪ B. That is, A ∪ B = {x : x ∈ A or x ∈ B or both}.
The set A ∪ B is also called the sum of A and B.
Equivalent Sets
• Commutative laws:
– A∪B =B∪A
– A∩B =B∩A
• Associative laws:
– A ∪ (B ∪ C) = (A ∪ B) ∪ C
– A ∩ (B ∩ C) = (A ∩ B) ∩ C
• Distributive laws:
– A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
– A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
• Identity laws:
– A ∪ A = A, A ∩ A = A
– A ∪ U = U, A ∩ U = A
– A ∪ ∅ = A, A ∩ ∅ = ∅
• A ∪ A′ = U and A ∩ A′ = ∅
• ∅′ = U and U′ = ∅
• De-Morgan’s laws:
– (A ∪ B)′ = A′ ∩ B′
– (A ∩ B)′ = A′ ∪ B′
• A ⊂ B ⇔ B′ ⊂ A′ ⇒ A ∪ B = B and A ∩ B = A
Example 5.1. Let U = {a, b, c, d, e, f, g, h}. Let A = {a, d, e}, B = {d, e, g, h} and
C = {a, d, c, e, h}. Find A ∪ B, A ∩ B, A ∩ B′, A′ ∩ B, (A ∪ B)′, (A ∩ B)′, A ∩ (B ∪ C),
A ∪ (B ∩ C).
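The set operations in Example 5.1 map directly onto Python's built-in `set` type, which gives a quick way to check answers. The helper `complement` and the dictionary `results` are illustrative names.

```python
U = set("abcdefgh")
A = {"a", "d", "e"}
B = {"d", "e", "g", "h"}
C = {"a", "d", "c", "e", "h"}

def complement(s):
    """Complement relative to the universal set U."""
    return U - s

results = {
    "A ∪ B": A | B,
    "A ∩ B": A & B,
    "A ∩ B′": A & complement(B),
    "A′ ∩ B": complement(A) & B,
    "(A ∪ B)′": complement(A | B),
    "(A ∩ B)′": complement(A & B),
    "A ∩ (B ∪ C)": A & (B | C),
    "A ∪ (B ∩ C)": A | (B & C),
}
for name, value in results.items():
    print(name, "=", sorted(value))

# De-Morgan's laws hold for these sets
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)
```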
Example 5.3. In a survey conducted among 200 statistics major students, the number of
students who visited historical, religious and both sites are found to be 150, 130 and 80
respectively. Find the number of students who visited none of the sites.
Since S ⊂ S and ∅ ⊂ S, it follows that S and ∅ are also events. S is called the certain (sure)
event because every outcome is an element of S. The event ∅ is an impossible event because
no outcome of the experiment can be an element of ∅.
Definition:
Mutually Exclusive Events: Two events A and B are said to be mutually exclusive if
they cannot occur together, i.e., A ∩ B = ∅. For example, in the experiment of rolling a
die, odd numbers and even numbers are mutually exclusive events.
Let us now use the various methods of combining sets (that is, events) and obtain the new
sets (that is, events) which are introduced earlier. Consider the outcome s and events A
and B:
• If s ∈ A, then A occurs. A′ is the event which occurs if A does not occur.
• If s ∈ (A ∪ B), then one of the events A or B occurs or both occur.
• If s ∈ (A ∩ B), then both events A and B occur.
• If s ∈ (A′ ∩ B′), then neither A nor B occurs.
• If s ∈ (A′ ∩ B), then B occurs but not A.
• If s ∈ (A ∩ B′) ∪ (A′ ∩ B), then exactly one of the events A or B occurs.
• If s ∈ (A ∩ B)′ = A′ ∪ B′, then the events A and B do not both occur.
Example 5.4. There are 2 bus and 3 train routes from city X to city Z. In how
many ways can a person go from city X to city Z? Ans: 2 + 3 = 5 ways
2. Multiplication Rule: Suppose there is a sequence of k events, in which the ith
event has $n_i$ ($i = 1, 2, \cdots, k$) possibilities. Then the total number of possibilities of the
whole sequence is $n_1 \times n_2 \times \cdots \times n_k$.
Example 5.5. There are 2 bus routes from city X to city Y and 3 train routes from
city Y to city Z. In how many ways can a person go from city X to city Z? Ans:
2 × 3=6 ways
Example 5.6. There are 6 questions. Each question has 4 choices. How many
answer keys must be made? Ans: 4 × 4 × 4 × 4 × 4 × 4 = $4^6$ = 4096
Example 5.7. There are 5 hotels in a city. If 4 persons each check into a different hotel,
in how many ways can this be done? Ans: 5 × 4 × 3 × 2 = 120
Example 5.8. In how many ways can 6 persons be seated in a row? Ans: 6 × 5 × 4 ×
3 × 2 × 1 = 720
Example 5.9. Seven dice are rolled. How many different outcomes are there? Ans:
6 × 6 × 6 × 6 × 6 × 6 × 6 = $6^7$ = 279936
Example 5.10. In how many ways can 6 persons be seated in a row? Ans:
6! = 720
Example 5.11. Suppose a photographer must arrange 4 persons in a row for
a photograph. In how many different ways can the arrangement be done? Ans:
4! = 24
(b) Permutation Rule 2: The arrangement of n distinct objects in a specific order
using r objects at a time is called a permutation of n objects taking r objects
at a time, that is, ${}_nP_r$, where
$${}_nP_r = \frac{n!}{(n - r)!}, \quad 0 \le r \le n.$$
Example 5.12. In how many ways can 9 books be arranged on a shelf having
4 places? Ans: ${}_9P_4 = \dfrac{9!}{(9 - 4)!} = 9 \times 8 \times 7 \times 6 = 3024$
Example 5.13. How many 5 letter permutations can be formed from the letters
in the word ’DISCOVER’? Ans: ${}_8P_5 = \dfrac{8!}{(8 - 5)!} = 8 \times 7 \times 6 \times 5 \times 4 = 6720$
(c) Permutation Rule 3: The number of permutations of n objects in which $n_1$
are alike, $n_2$ are alike, $\cdots$, $n_r$ are alike is given by
$$\frac{n!}{n_1! \times n_2! \times \cdots \times n_r!}$$
where $n_1 + n_2 + \cdots + n_r = n$
Example 5.14. How many different permutations can be made from the letters
in the word
a. STATISTICS. Ans: $\dfrac{10!}{3! \times 3! \times 1! \times 2! \times 1!}$
b. MISSISSIPPI. Ans: $\dfrac{11!}{1! \times 4! \times 4! \times 2!}$
c. EXERCISES. Ans: $\dfrac{9!}{3! \times 1! \times 1! \times 1! \times 1! \times 2!}$
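The permutation rules can be evaluated with Python's standard `math` module (`math.perm` requires Python 3.8 or later). The helper `arrangements_with_repeats` is a hypothetical name for Permutation Rule 3, with the letter counts taken from the word itself.

```python
import math
from collections import Counter

def arrangements_with_repeats(word):
    """n! / (n1! * n2! * ... * nr!) where the ni are the counts of each distinct letter."""
    result = math.factorial(len(word))
    for count in Counter(word).values():
        result //= math.factorial(count)
    return result

print(math.perm(9, 4))    # Permutation Rule 2: 9 books, 4 places
print(math.perm(8, 5))    # 5-letter permutations of DISCOVER
for word in ("STATISTICS", "MISSISSIPPI", "EXERCISES"):
    print(word, arrangements_with_repeats(word))
```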
4. Combination: A combination is a selection of objects without regard to order.
Here, order does not matter.
The number of combinations of r objects selected from n objects is denoted by ${}_nC_r$ or $\binom{n}{r}$, where
$${}_nC_r = \binom{n}{r} = \frac{n!}{(n - r)! \times r!}; \quad 0 \le r \le n$$
Note: The difference between permutation and combination is that in combination, the or-
der of objects being selected (arranged) is not important, but order matters in permutation.
Example 5.15. In how many different ways can a secretary, a president and a manager
be selected from 5 persons?
Example 5.17. A committee of 5 persons must be selected from 5 men and 8 women.
How many ways can the selection be done if there are at least 3 women in the committee?
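The contrast between permutations and combinations, and the case split needed for Example 5.17, can be sketched with `math.perm` and `math.comb` (Python 3.8 or later). Reading "at least 3 women" as exactly 3, 4 or 5 women in the committee is the approach assumed here.

```python
import math

# Order matters (distinct posts) versus order does not matter (plain selection)
print(math.perm(5, 3), math.comb(5, 3))

# Example 5.17: committee of 5 from 5 men and 8 women with at least 3 women.
# Sum over the possible numbers of women w = 3, 4, 5; the men fill the rest.
ways = sum(math.comb(8, w) * math.comb(5, 5 - w) for w in range(3, 6))
print(ways)
```

Each term multiplies the ways to choose the women by the ways to choose the men (multiplication rule), and the terms are added because the cases are mutually exclusive (addition rule).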
(c) A die is rolled. What is the probability of getting i) an odd number, ii) a number
greater than 4.
(d) An urn contains 6 white and 3 black balls. If one ball is selected, what is the
probability that the selected ball is black.
(e) A family plans to have three children. Describe the sample space for all possible
gender combinations. What is the probability that the family will have two
boys?
(f) Two dice are rolled. Describe the sample space. What is the probability of
getting i) a sum of 10 or more, ii) a pair in which at least one number is 3, iii) a
sum of 8, 9 or 10, iv) one number less than 4.
Solutions:
(a) S = {1, 2, 3, 4, 5, 6},
E = getting number 6 = {6}. Thus n(S) = 6 and n(E) = 1
$$P(E) = \frac{n(E)}{n(S)} = \frac{1}{6}$$
(b) S = {HH, HT, TH, TT},
E = getting one head = {HT, TH}. Thus n(S) = 4 and n(E) = 2
$$P(E) = \frac{n(E)}{n(S)} = \frac{2}{4} = 0.5$$
(c) S = {1, 2, 3, 4, 5, 6}, n(S) = 6
i. E = getting an odd number = {1, 3, 5}. Thus n(E) = 3
$$P(E) = \frac{n(E)}{n(S)} = \frac{3}{6} = 0.5$$
ii. E = getting number > 4 = {5, 6}. Thus n(E) = 2
$$P(E) = \frac{n(E)}{n(S)} = \frac{2}{6}$$
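For larger sample spaces, such as the two dice in part (f), the ratio n(E)/n(S) can be found by enumeration. The sketch below assumes fair dice, so that all 36 ordered pairs are equally likely; the names `sample_space` and `probability` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two dice: all ordered pairs (first die, second die)
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """Classical probability n(E)/n(S), where `event` is a test applied to each outcome."""
    favourable = [s for s in sample_space if event(s)]
    return Fraction(len(favourable), len(sample_space))

print(probability(lambda s: sum(s) >= 10))          # i) a sum of 10 or more
print(probability(lambda s: 3 in s))                # ii) at least one number is 3
print(probability(lambda s: sum(s) in {8, 9, 10}))  # iii) a sum of 8, 9 or 10
```

`Fraction` keeps the answers exact and reduces them to lowest terms automatically.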
2. Empirical probability: It is based on relative frequency. Given a frequency
distribution, the probability of an event being in a given class is
$$P(E) = \frac{f}{\sum f}$$
where $f$ is the class frequency and $\sum f = n$ is the total number of observations.
The difference between classical and empirical probability is that the former uses
the sample space to determine the numerical probability while the latter is based on a
frequency distribution.
Outcome 1 2 3 4 5 6
Probability 1/6 1/6 1/6 1/6 1/6 1/6
⇒ $\sum p_i$ = 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 6/6 = 1.
1. If there are two events A and B, the probability that at least one of these events will
occur is the sum of the probability that each event will occur minus the probability
that both events will occur at the same time. That is, P (A ∪ B) = P (A) + P (B) −
P (A ∩ B).
Example: A part time student is taking two courses, namely economics and statis-
tics. The probability that the student will pass economics course is 0.60 and the
probability of passing statistics course is 0.70. The probability that the student will
pass both courses is 0.50. Find the probability that the student
Solution:
Example: Suppose that A and B are events for which P(A) = x, P(B) = y and
P(A ∩ B) = z. Express each of the following probabilities in terms of x, y and z:
a) P(A′ ∪ B′), b) P(A′ ∩ B), c) P(A′ ∩ B′), d) P(A ∩ B′).
Mutually exclusive events: If two events cannot occur simultaneously, that is, one "excludes"
the other, then the two events are said to be mutually exclusive. As a result, if
events A and B are mutually exclusive, then P(A ∩ B) = 0. For such events, the occurrence
of one prevents the occurrence of the other.
Example: What is the probability of getting both a head and a tail in a single toss of a coin?
Ans: P(H ∩ T) = 0.
If the events A and B are dependent on each other, the probability of event B occurring
given that event A has already occurred is called the conditional probability of B
given A:
$P(B/A) = \dfrac{P(A \cap B)}{P(A)}$, provided P(A) > 0.
⇒ P(A ∩ B) = P(A)P(B/A).
Similarly, the probability of event A occurring given that event B has already occurred
is called the conditional probability of A given B:
$P(A/B) = \dfrac{P(B \cap A)}{P(B)}$, provided P(B) > 0.
⇒ P(B ∩ A) = P(B)P(A/B).
Remarks:
i. 0 ≤ P (A/B) ≤ 1 or 0 ≤ P (B/A) ≤ 1
iii. P (A1 ∩A2 ∩· · ·∩An ) = P (A1 )P (A2 /A1 )P (A3 /A1 ∩A2 ) · · · P (An /A1 ∩A2 ∩· · ·∩An−1 )
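As a small numerical illustration of the addition rule and the conditional probability formulas, the sketch below reuses the figures from the part-time student example (0.60 for economics, 0.70 for statistics, 0.50 for both). The variable names are illustrative, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

p_e = Fraction(60, 100)     # P(E): passes economics
p_s = Fraction(70, 100)     # P(S): passes statistics
p_both = Fraction(50, 100)  # P(E ∩ S): passes both

p_either = p_e + p_s - p_both  # addition rule, P(E ∪ S)
p_e_given_s = p_both / p_s     # P(E/S) = P(E ∩ S) / P(S)
p_s_given_e = p_both / p_e     # P(S/E) = P(E ∩ S) / P(E)

# Independence would require P(E ∩ S) = P(E) P(S).
independent = p_both == p_e * p_s

print(p_either, p_e_given_s, p_s_given_e, independent)
```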
Examples:
1. Recall the previous example of a part time student who is taking two courses,
economics (E) and statistics (S). Find P(E/S) and P(S/E).
2. A package contains 12 resistors, 3 of which are defective. If 3 are selected, find the
probability of getting
3. Urn I contains 4 white balls and 5 red balls, and urn II contains 6 white balls and
8 red balls. A ball is chosen at random from urn I and put into urn II. Then a ball
is chosen at random from urn II. What is the probability that this ball is white?
4. An urn contains 6 green and 4 black balls. Another urn contains 7 green and 9 black
balls. Two balls are transferred from the first urn and placed in the second urn.
Then one ball is taken from the latter. What is the probability that the ball drawn
from the second urn is black?
Solutions:
1. P(E/S) = P(E ∩ S)/P(S) = 0.50/0.70 ≈ 0.714, P(S/E) = P(E ∩ S)/P(E) = 0.50/0.60 ≈ 0.833
2. 12C3
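Examples 3 and 4 can be approached by conditioning on what was transferred between the urns and then applying the multiplication rule to each case. The following is a sketch of that reasoning, assuming every ball is equally likely to be drawn; the variable names are hypothetical.

```python
from fractions import Fraction
from math import comb

# Example 3: one ball moves from urn I (4 white, 5 red) to urn II (6 white, 8 red).
p_white = (Fraction(4, 9) * Fraction(7, 15)     # a white ball was transferred
           + Fraction(5, 9) * Fraction(6, 15))  # a red ball was transferred

# Example 4: two balls move from urn 1 (6 green, 4 black) to urn 2 (7 green, 9 black).
p_black = Fraction(0)
for k in range(3):  # k = number of black balls among the two transferred
    p_transfer = Fraction(comb(4, k) * comb(6, 2 - k), comb(10, 2))
    p_black += p_transfer * Fraction(9 + k, 18)  # urn 2 now holds 18 balls

print(p_white, p_black)
```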
Example
1. A coin is tossed and a die is rolled. What is the probability of getting a head on the
coin or number 4 on the die?
2. An urn contains 6 white and 3 black balls. Three balls are drawn. What is the
probability that all the drawn balls will be black?
Solutions:
1. Let A= getting head on the coin ⇒ P (A) = 1/2. Let B= getting number 4 on the
die ⇒ P (B) = 1/6
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Since the coin and the die are independent, P(A ∩ B) = P(A)P(B) = 1/12. Thus
P(A ∪ B) = 1/2 + 1/6 − 1/12 = 7/12.
2. N = 9, n = 3
Let E1 = the first ball selected is black.
Let E2 = the second ball selected is black.
Let E3 = the third ball selected is black.
Example: Let A and B be two events associated with an experiment. Suppose that
P(A) = 0.4, P(A ∪ B) = 0.7 and P(B) = p. For what choices of p are A and B independent?
EXERCISE: A certain travel club has 1000 members. 60% of these members are males.
45% of these members pay by credit card when they travel including 175 females. If a
member is selected from the travel club at random, what is the probability that:
Are the sex of the member and the mode of payment statistically independent events?
Chapter 6
Probability Distributions
If the number of possible values of a random variable X (that is, RX) is finite or countably
infinite, the random variable is called a discrete random variable. That is, the possible values
of X may be listed as x1, x2, · · · , xn, · · · . In the finite case the list terminates and in the
countably infinite case the list continues indefinitely.
On the other hand, if the random variable assumes an uncountably infinite number of
possible values, the random variable is called a continuous random variable.
i. 0 ≤ p(xi ) ≤ 1
ii. $\sum_i p(x_i) = 1$
This function p defined above is called probability mass function (pmf ) of the random vari-
able X. The collection of pairs (xi , p(xi )), i = 1, 2, · · · is sometimes called the probability
distribution of X.
Examples:
1. Construct a probability distribution for the number of heads observed in tossing a
coin two times. Also plot the probability distribution using bar diagram.
2. Construct a probability distribution for the number of heads observed in tossing a
coin three times and plot it.
3. Construct a probability distribution for the number of girls if a family plans to have
four children.
Solutions:
1. S = {HH, HT, T H, T T }
Let X be the number of heads observed in tossing a coin two times. RX = {0, 1, 2}
x 0 1 2 Total
P (x) 1/4 2/4 1/4 1
x 0 1 2 3 Total
P (x) 1/8 3/8 3/8 1/8 1
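The probability distributions in these examples can also be built by listing the sample space and counting, which is the same procedure as the hand solution. Below is a minimal sketch assuming a fair coin; `heads_distribution` is a hypothetical helper name. For the four-children example, the same function applies if a girl and a boy are assumed equally likely (read "H" as "girl").

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n_tosses):
    """Return {x: P(X = x)} for X = number of heads in n_tosses fair tosses."""
    outcomes = list(product("HT", repeat=n_tosses))  # the sample space S
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for n in (2, 3, 4):
    print(n, heads_distribution(n))
```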
(b) $f(x) = \begin{cases} 2x, & 0 \le x \le 1; \\ 0, & \text{otherwise.} \end{cases}$
(c) $f(x) = \begin{cases} e^{-x}, & x \ge 0; \\ 0, & \text{otherwise.} \end{cases}$
Because continuous random variables differ from discrete random variables, continuous
probability distributions also differ from discrete ones. Some of the most important
differences are listed as follows:
1. The function f(x) does not give the probability that X = x as did p(x) in the discrete
case. This is because X can take on an uncountably infinite number of values and,
therefore, it is impossible to assign a positive probability to each value x. In fact,
the value of f(x) is not a probability at all; hence f(x) can take any nonnegative
value, including values greater than 1.
2. Since the area under the curve corresponding to a single point is zero, the probability
of obtaining exactly a specific value is zero. Thus, for a continuous random variable,
P (a ≤ X ≤ b) and P (a < X < b) are equivalent, which is certainly not true for
discrete distributions.
3. Finding areas under curves representing continuous probability distributions involves
the use of calculus and may become quite difficult. For some distributions, areas
cannot even be directly computed and require special numerical techniques. Of course
statistical computer programs easily calculate such probabilities.
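To make the "area under the curve" idea concrete, the sketch below approximates a few integrals with a simple midpoint rule for the two densities f(x) = 2x on [0, 1] and f(x) = e^(−x) on x ≥ 0. The function names, the step count and the truncation of the exponential tail at 50 are all illustrative assumptions; a statistical package would normally be used instead.

```python
import math

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def f_b(x):
    """Density 2x on [0, 1]; note f_b(1) = 2, a density value above 1."""
    return 2 * x if 0 <= x <= 1 else 0.0

def f_c(x):
    """Density e^(-x) on x >= 0."""
    return math.exp(-x) if x >= 0 else 0.0

area_b = integrate(f_b, 0, 1)    # total area should be close to 1
area_c = integrate(f_c, 0, 50)   # tail beyond 50 is negligible
p_interval = integrate(f_b, 0.25, 0.5)  # P(0.25 <= X <= 0.5) under f_b

print(area_b, area_c, p_interval)
```

The same value of `p_interval` is obtained whether the endpoints are included or not, which is the point of item 2 above.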
1. Find the mean number of heads observed in tossing a coin three times.
Solution:
x 0 1 2 3 Total
P (x) 1/8 3/8 3/8 1/8 1
$\mu = E(X) = \sum x\,p(x) = 0 \times 1/8 + 1 \times 3/8 + 2 \times 3/8 + 3 \times 1/8 = 1.5$
$\sigma^2 = E(X - \mu)^2 = \sum (x - \mu)^2 p(x) = (0 - 1.5)^2 \times 1/8 + (1 - 1.5)^2 \times 3/8 + (2 - 1.5)^2 \times 3/8 + (3 - 1.5)^2 \times 1/8 = 0.75$
$\Rightarrow \sigma = \sqrt{0.75} = 0.866$
2. $E(X) = \mu = \int x f(x)\,dx = \int_0^1 x\,dx = 0.5$ and $\sigma^2 = \int (x - \mu)^2 f(x)\,dx = \int_0^1 (x - 0.5)^2\,dx = ?$
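The expectation and variance formulas translate directly into sums over the probability distribution. The sketch below repeats the three-toss calculation with exact fractions, and then approximates the variance integral of item 2 numerically under the assumption (implied by the integrand) that f(x) = 1 on [0, 1]. Names such as `pmf` and `var_uniform` are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Distribution of the number of heads in three tosses of a fair coin.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mean = sum(x * p for x, p in pmf.items())                    # E(X)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E(X - mu)^2
std_dev = sqrt(variance)

# Item 2, assuming f(x) = 1 on [0, 1]: midpoint-rule value of the variance integral.
n = 10_000
var_uniform = sum(((i + 0.5) / n - 0.5) ** 2 for i in range(n)) / n

print(mean, variance, round(std_dev, 3), round(var_uniform, 4))
```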
1. Each trial has only two mutually exclusive outcomes or outcomes that can be reduced
to two. One of the outcomes is labeled as "success" and the other as "failure".
2. The outcome of each trial is independent. That is, the outcome of one trial does not
affect the outcome of another.
This is called the binomial distribution. The mean is E(X) = np and variance is V (X) =
npq.
Examples:
1. Suppose a coin is tossed 10 times. What is the probability of getting
2. The probability of a man kicking into the goal is 2/3. If a person kicks 5 times, what
is the probability of scoring
Find the average, variance and standard deviation of the number of goals.
Solution:
1. Let X be the number of heads observed in tossing a coin 10 times, RX = {0, 1, 2, · · · , 10}
$p = P(\text{Success}) = P(\text{Head}) = \dfrac{1}{2} \Rightarrow q = 1 - p = \dfrac{1}{2}$
X ∼ Bin(p = 0.5, n = 10)
$\Rightarrow P(X = x) = \binom{10}{x}\left(\dfrac{1}{2}\right)^{x}\left(\dfrac{1}{2}\right)^{10-x}; \quad x = 0, 1, 2, \cdots, 10$
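(a) $P(X = 3) = \binom{10}{3}\left(\dfrac{1}{2}\right)^{3}\left(\dfrac{1}{2}\right)^{10-3} = \dfrac{120}{1024} \approx 0.1172$
(b) $P(X = 0) = \binom{10}{0}\left(\dfrac{1}{2}\right)^{0}\left(\dfrac{1}{2}\right)^{10-0} = \dfrac{1}{1024} \approx 0.0010$
(c) P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 176/1024 ≈ 0.1719
(d) P(X ≥ 3) = P(X = 3) + P(X = 4) + · · · + P(X = 10) = 1 − P(X < 3) = 1 − 56/1024 ≈ 0.9453
(e) P(X > 3) = P(X = 4) + P(X = 5) + · · · + P(X = 10) = 1 − P(X ≤ 3) = 1 − 176/1024 ≈ 0.8281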
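The binomial probabilities in parts (a) to (e) follow mechanically from the probability mass function, so they are easy to evaluate in code. The sketch below is one way to do so; `binom_pmf` is a hypothetical helper, not a library function. The last two lines apply E(X) = np and V(X) = npq to the goal-kicking example (n = 5, p = 2/3).

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.5
p_eq_3 = binom_pmf(3, n, p)
p_eq_0 = binom_pmf(0, n, p)
p_le_3 = sum(binom_pmf(x, n, p) for x in range(4))      # x = 0, 1, 2, 3
p_ge_3 = 1 - sum(binom_pmf(x, n, p) for x in range(3))  # 1 - P(X < 3)
p_gt_3 = 1 - p_le_3                                     # 1 - P(X <= 3)

# Goal-kicking example: n = 5 kicks, p = 2/3.
goals_mean = 5 * (2 / 3)
goals_var = 5 * (2 / 3) * (1 / 3)

print(p_eq_3, p_eq_0, p_le_3, p_ge_3, p_gt_3, goals_mean, goals_var)
```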
$P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \cdots$
where λ is the average number of events per unit of time.
Properties:
The mean and the variance of the number of successes for a Poisson distribution are the
same, i.e., E(X) = Var(X) = λ.
Examples:
1. On average a typist commits 3 errors per page. Find the probability that she will
make
(a) no mistake.
(b) two mistakes.
(c) more than one mistake.
Solution:
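One way to evaluate the three probabilities is to plug λ = 3 into the Poisson probability mass function. The sketch below assumes the question concerns a single page, so that the number of errors X has a Poisson distribution with λ = 3; `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 3  # average number of errors per page

p_none = poisson_pmf(0, lam)                                     # (a) no mistake
p_two = poisson_pmf(2, lam)                                      # (b) two mistakes
p_more_than_one = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)  # (c) P(X > 1)

print(round(p_none, 4), round(p_two, 4), round(p_more_than_one, 4))
```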
If X is a continuous random variable having a normal distribution with mean µ and variance
σ², written as X ∼ N(µ, σ²), its density function is:
$f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty$
Knowing the values of these two parameters, µ and σ², completely determines the
distribution. Thus, the function describes a family of curves which may differ only with
regard to µ and σ², but have the same characteristics.
Several interesting features can be determined from this function without really evaluating
it. Some of these features are:
2. The curve is symmetric about the mean. This means that the number of units in
the data below the mean is the same as the number of units above the mean. This
means the mean and median have the same value.
3. The height of the curve is maximum at the mean value. Thus, the mean and mode
values coincide. This means the normal distribution has the same value for the mean,
median and mode.
4. The curve declines as we go in either direction from the mean, but never touches the
base (x-axis) so that the tails of the curve on both sides extend indefinitely.
5. The corresponding deciles, quartiles and percentiles are equidistant from the mean.
$P(x_1 < X < x_2) = \int_{x_1}^{x_2} \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx.$
But integration of this function is quite complicated and is never directly used to
calculate such probabilities. Fortunately, a normal distribution can easily be standardized,
which allows us to use a single table for any normal distribution.
Suppose X has a normal distribution with mean µ and variance σ², i.e., X ∼ N(µ, σ²). If
we define $Z = \dfrac{X - \mu}{\sigma}$, then Z will have a normal distribution with mean 0 and variance 1.
Such a normal distribution with mean µ = 0 and variance σ² = 1 is called a standard normal
distribution. Hence, the pdf of the standard normal variate Z is given by:
$f(z) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^{2}}, \quad -\infty < z < \infty$
Now for any standard normal variate Z, the probability (area) between two values z1 and
z2 is defined as:
$P(z_1 < Z < z_2) = \int_{z_1}^{z_2} \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^{2}}\,dz.$
Hence,
$P(x_1 < X < x_2) = P\left(\dfrac{x_1 - \mu}{\sigma} < \dfrac{X - \mu}{\sigma} < \dfrac{x_2 - \mu}{\sigma}\right) = P(z_1 < Z < z_2)$
The total area under the (standard) normal curve is 1. Hence, the area to the right and left
of the central value (µ = 0) of the standard normal distribution is 0.5 (as it is symmetric
about 0).
• P (Z > z) = P (Z < −z).
5. Find the area between -1 and 1.5; P (−1 < Z < 1.5).
Solution: P (−1 < Z < 1.5) = P (−1 < Z < 0) + P (0 < Z < 1.5) = P (0 < Z <
1) + P (0 < Z < 1.5) = 0.7745
Solutions:
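(a) $P(X > 15) = P\left(Z > \dfrac{15 - \mu}{\sigma}\right) = P\left(Z > \dfrac{15 - 10}{4.472}\right) = P(Z > 1.12) = P(Z > 0) - P(0 < Z < 1.12) = 0.1314$
(b) $P(5 < X < 15) = P\left(\dfrac{5 - \mu}{\sigma} < Z < \dfrac{15 - \mu}{\sigma}\right) = P\left(\dfrac{5 - 10}{4.472} < Z < \dfrac{15 - 10}{4.472}\right) = P(-1.12 < Z < 1.12) = 0.7372$
(c) $P(5 < X < 10) = P\left(\dfrac{5 - \mu}{\sigma} < Z < \dfrac{10 - \mu}{\sigma}\right) = P\left(\dfrac{5 - 10}{4.472} < Z < \dfrac{10 - 10}{4.472}\right) = P(-1.12 < Z < 0) = 0.3686$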
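Instead of a printed table, the standard normal area can be computed from the error function in Python's standard library, using the identity Φ(z) = ½[1 + erf(z/√2)]. The sketch below standardizes exactly as in the text; `phi` and `normal_between` are hypothetical helper names. Because a table lookup rounds z to two decimals (1.12 rather than 1.118), computed answers may differ slightly from the tabulated ones in the third decimal place.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z < z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_between(x1, x2, mu=0.0, sigma=1.0):
    """P(x1 < X < x2) for X ~ N(mu, sigma^2), by standardizing to Z."""
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    return phi(z2) - phi(z1)

p_z = normal_between(-1, 1.5)                         # P(-1 < Z < 1.5)
p_x = normal_between(5, 15, mu=10, sigma=sqrt(20))    # P(5 < X < 15), sigma = 4.472
p_upper = 1 - phi((15 - 10) / sqrt(20))               # P(X > 15)

print(round(p_z, 4), round(p_x, 4), round(p_upper, 4))
```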
If the concern is to find the values of z for given probability values, the form of notation
often called the zα notation can be adopted. According to this notation, zα , is the value
of z such that P (Z > zα ) = α. This definition results in the equivalent statements
P (Z < −zα ) = α and because of the symmetry of the normal distribution, P (−zα/2 < Z <
zα/2 ) = 1 − α.
Examples:
1. Find the value z associated with P(|Z| < z) = 0.10.
2. The IQ score of students is normally distributed with a mean of 120 and variance
400. What is the probability that a student will have an IQ
The χ2 Distribution
The χ² distribution is usually denoted by χ²(v), where v is the degrees of freedom. χ²
values are nonnegative. The shape of the distribution is different for each value of v. For
large values of v (usually greater than 30), the χ² distribution can be approximated by a
normal distribution.
The F Distribution
It is a continuous and right skewed distribution. It is indexed by two degrees of freedom
parameters v1 and v2; these are usually integers, and the distribution is written as F(v1, v2).
Chapter 7
Sampling Techniques
• Sample: is a subset of the population that is studied with the aim of estimating
the characteristics of the population.
• Sampling Frame: is a list of all elements of the population. The sampling frame
forms the basic material from which a sample is drawn. Hence, it should be complete
and up-to-date.
• Sampling Unit: The population may be regarded as consisting of units which are
to be used for the purpose of sampling. Each unit is regarded as individual and
indivisible when the selection is made. Such a unit is known as a sampling unit.
• Sample Size: It is the total number of elements in the sample. That is, the size of
the sample is the number of sampling units which are selected from the population
by a random method.
element in the population. The latter method is a study in which some elements which
are assumed representatives of the population are investigated. It is a statistical process
in which we select and examine a sample instead of considering the whole population.
In practice, it may not be possible to collect information on all units of the population. One
reason is lack of resources in terms of money, personnel and equipment. Another reason is
that a sample survey enables us to obtain results on time. Hence, for getting quick results,
sampling is preferred. Also, sampling helps to get data of good quality: because fewer
enumerators are needed, we can train and supervise them well in the process of data
collection. Moreover, complete investigation may be destructive in nature, and samples
reduce the damage caused by some tests in quality control. For example, in cooking food,
mothers check whether the food has enough salt, spices, butter and so on by
taking a small amount and tasting it. What would happen if the test used up everything
in the dish?
In probability sampling, each unit in the population has a known, non-zero chance of being
included in the sample (an equal chance in simple random sampling). In non-probability
sampling, the units are drawn using a certain amount of judgement.
(a) Lottery Method: This method is useful for comparatively small populations.
All members of the population are numbered or named on separate
pieces of paper of identical size and shape. These slips of paper are then
identically folded and mixed up in a container. The probability of a particular slip
being selected first out of the total number of N slips of paper is 1/N; for the second
draw, this probability is 1/(N − 1), since N − 1 slips of paper are left
in the container after the first slip has been drawn. Similarly, the probability
for the third draw is 1/(N − 2), and so on. The items are selected from the
container successively until the desired sample size is reached. This
constitutes a random sample called a simple random sample.
(b) Random Number Table Method: A random number table lists numbers
in random order, generated using a computer. In the lottery method,
the selection may be subject to human bias, as people may identify the slips (chits)
in many ways. The inconvenience of preparing slips of paper, shuffling them and
choosing the items one by one can be avoided by the use of a random number
table. The principle involved in this method is the same as that in the lottery
method.
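On a computer, both methods reduce to drawing units without replacement from the sampling frame using a pseudo-random number generator. The following is a minimal sketch, not a prescribed procedure: the frame of 50 numbered units, the sample size of 5 and the function name are all hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame without replacement."""
    rng = random.Random(seed)    # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)  # each unit can be chosen at most once

frame = list(range(1, 51))  # hypothetical frame: units numbered 1 to 50
sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```

Recording the seed plays the role of keeping the page and starting position used in a random number table, so that the selection can be audited later.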
of individual units directly. In such a case the whole population is divided into a
number of primary units called stages, each of which is composed of second stage of
units. A series of samples is then taken at successive stages. The sample size at
each stage is determined by the relative population size at each stage.
Nonprobability sampling gives rise to those methods where the subjects are selected delib-
erately. No probability is attached or can be computed for an item being selected.
2. Judgment Sampling: In this method, sampling units are selected on the judgement
of the person doing the study. The underlying assumption is that the units selected
truly represent the entire population. For example, to find out the potential of drip
irrigation technology, a researcher may go to the teachers of an agricultural university.
3. Convenience Sampling: Here, an investigator selects the sample at his own
convenience. This method is based on the assumption that the population is homogeneous
and that the individuals selected and interviewed have similar information with regard to
the characteristic under study. For example, persons selected from gas stations or petrol
pumps to collect information about the quality of gas or petrol, service or correctness
of the measurement, etc., are supposed to represent the population of gasoline
buyers.
Chapter 8
As noted earlier, one of the primary objectives of a statistical analysis is to use data from a
sample to make inferences about the population from which the sample was drawn. In this
section, the basic procedures for making such inferences are presented. Statistical inference
generally takes two forms, namely, estimation of the parameter and testing of a hypothesis.
8.1 Estimation
For the purpose of general discussion, let θ be the population parameter and θ̂ be the
corresponding statistic. As already stated, the parameter θ is unknown. The value of the
statistic θ̂ is computed from the random sample taken from the population.
The best estimator should be highly reliable and have desirable properties like unbiased-
ness, consistency, efficiency and sufficiency. These criteria are described as follows:
2. Consistency: It refers to the effect of sample size on the accuracy of the estimator. A
statistic is said to be consistent estimator of the population parameter if it approaches
the parameter as the sample size increases, that is, θ̂ → θ as n → N .
Hypothesis testing is a statistical procedure that uses sample data to decide whether an
assumption about the population parameter(s) is correct or not. It
starts by making a set of two statements about the parameter(s) in question. These are
usually expressed in the form of simple mathematical relationships involving the param-
eters. These two statements are exclusive and exhaustive, which means that one or the
other statement must be true. The first statement is called null hypothesis and is denoted
by H0 and, the second is called alternative hypothesis and is denoted by H1 .
H0 : θ = θ0 ⇔ θ − θ0 = 0.
– Two-sided test: H1 : θ ≠ θ0 ⇒ θ − θ0 ≠ 0
– One-sided test:
∗ Right tailed test: H1 : θ > θ0 ⇒ θ − θ0 > 0
∗ Left tailed test: H1 : θ < θ0 ⇒ θ − θ0 < 0
• Type I Error: It is the error that occurs when one rejects a null hypothesis which is
actually true. The probability of making such an error is denoted by α and is called the
significance level. This significance level (α) is the maximum acceptable probability
of rejecting a true null hypothesis.
• Type II Error: It is the error that occurs when one fails to reject a null hypothesis
which is actually false. The probability of making a type II error is denoted by β.
The power of a test is obtained as 1 − β, which is the probability of correctly rejecting
the null hypothesis when it is false.
• Step 3: Define a sample based test statistic (Tcal ) and rejection region (Ttab ) for H0 .
• Step 5: Conclusion.
In step 1, H0 and H1 are the null and alternative hypotheses, respectively, defined before
while α is the level of significance. The most common choices of significance levels are
α = 0.1, α = 0.05 and α = 0.01. In step 3, the test statistic is a sample statistic whose
sampling distribution can be specified for both the null and alternative hypothesis case
(although the sampling distribution when the alternative hypothesis is true may often
be quite complex). After specifying the appropriate significance level α, the sampling
distribution of this statistic is used to define the rejection region. The rejection (critical)
region is the range of values of a sample statistic that will lead to rejection of the null
hypothesis. It comprises the values of the test statistic for which (1) the probability
when the null hypothesis is true is less than or equal to the specified α and (2) probabilities
when H1 is true are greater than they are under H0. With regard to making a decision, for a
two-sided test reject H0 if |Tcal| ≥ Ttab, for a right tailed test reject H0 if Tcal ≥ Ttab and for
a left tailed test reject H0 if Tcal ≤ −Ttab.
2. Significance level = α.
5. Conclude.
Examples:
1. Assume that the average annual income for government employees in Ethiopia is
reported by the Ethiopian Statistical Agency Census Bureau to be birr 18750.00.
There was some doubt whether the average yearly income of government employees
in Ethiopia was representative of the national average. A random sample of 100
government employees in Ethiopia was taken and it was found that their average
salary was birr 19240.00 with a standard deviation of birr 2610.00. Can we say
that the average salary of government employees in Ethiopia is representative of the
national average at 5% level of significance?
2. A research done by a graduating student reports that the average score of Haramaya
University students in statistics course is less than 80. To test this claim, a random
sample of 10 students was taken and their scores in the course are recorded as: 65,
70, 80, 85, 60, 90, 80, 75, 85, 90. At 0.05 level of significance, test the validity of this
claim.
Solutions
1. Given µ0 = 18750, n = 100, x̄ = 19240 and s = 2610.
(a) H0 : µ = 18750
H1 : µ ≠ 18750
(b) α = 0.05 ⇒ Ztab = Zα/2 = Z0.025 = 1.96
(c) $Z_{cal} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{19240 - 18750}{2610/\sqrt{100}} = 1.877$
(d) Since |Zcal| < Ztab, H0 should not be rejected.
(e) Thus, the average salary of government employees in Ethiopia is not significantly
different from the national average at 5% level of significance.
2. Given µ0 = 80, n = 10
$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i = \dfrac{1}{10}(65 + 70 + \ldots + 90) = 78$
$s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = 106.67 \Rightarrow s = \sqrt{106.67} = 10.33$
(a) H0 : µ = 80
H1 : µ < 80
(b) α = 0.05 ⇒ ttab = tα(n − 1) = t0.05(9) = 1.833
(c) $t_{cal} = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{78 - 80}{10.33/\sqrt{10}} = -0.612$
(d) Since tcal > −ttab, H0 should not be rejected.
(e) Thus, there is not sufficient evidence at the 5% level of significance that the average
score of Haramaya University students in the statistics course is less than 80.
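The two test statistics above can be recomputed from the given data as a check on the arithmetic. In this sketch the critical values 1.96 and 1.833 are copied from the worked solutions rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Example 1: large sample, two-sided Z test of H0: mu = 18750.
z_cal = (19240 - 18750) / (2610 / sqrt(100))
reject_1 = abs(z_cal) >= 1.96  # Z_{0.025} from the text

# Example 2: small sample, left-tailed t test of H0: mu = 80.
scores = [65, 70, 80, 85, 60, 90, 80, 75, 85, 90]
x_bar = mean(scores)
s = stdev(scores)  # sample standard deviation (divisor n - 1)
t_cal = (x_bar - 80) / (s / sqrt(len(scores)))
reject_2 = t_cal <= -1.833  # t_{0.05}(9) from the text

print(round(z_cal, 3), reject_1, round(t_cal, 3), reject_2)
```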
Example: Construct the 95% confidence interval for the population mean of the previous
two examples.
Solutions:
1. $(\bar{x} - Z_{\alpha/2} \times s/\sqrt{n},\ \bar{x} + Z_{\alpha/2} \times s/\sqrt{n}) = (19240 - 1.96 \times 2610/\sqrt{100},\ 19240 + 1.96 \times 2610/\sqrt{100}) = (18728.44, 19751.56)$
2. $(\bar{x} - t_{\alpha/2}(n-1) \times s/\sqrt{n},\ \bar{x} + t_{\alpha/2}(n-1) \times s/\sqrt{n}) = (78 - 2.262 \times 10.33/\sqrt{10},\ 78 + 2.262 \times 10.33/\sqrt{10}) = (70.61, 85.39)$
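Both intervals have the same form, estimate ± critical value × standard error, so a single helper covers them. In the sketch below `mean_ci` is a hypothetical function name, and the critical values 1.96 and 2.262 are taken from the solutions above rather than computed.

```python
from math import sqrt

def mean_ci(x_bar, s, n, critical):
    """Confidence interval x_bar ± critical * s / sqrt(n) for a population mean."""
    margin = critical * s / sqrt(n)
    return x_bar - margin, x_bar + margin

ci_income = mean_ci(19240, 2610, 100, 1.96)  # Z_{0.025}
ci_scores = mean_ci(78, 10.33, 10, 2.262)    # t_{0.025}(9)

print([round(v, 2) for v in ci_income], [round(v, 2) for v in ci_scores])
```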
Chapter 9
The inferences we have made so far have concerned a parameter from a single population.
There may be situations that need comparison of parameters from different populations.
Comparative studies are designed to discover and evaluate the difference between effects
rather than the effect themselves. In such studies, we must perform an experiment, collect
informative data and then reach at a decision based on the results. In general discussion,
the statistical term 'treatment' is used to refer to the techniques that will be compared. In
performing an experiment, the basic units exposed to one or another treatment are called
experimental units (subjects). The characteristic recorded after the treatment is applied
to the units is called a response. The manner in which subjects are chosen and assigned to
treatments is called the experimental design.
For two paired variables X1 and X2 , the difference of the two variables, di = x1i − x2i
i = 1, 2, · · · , n, is treated as if it were a single sample. The null hypothesis is that the
true mean difference of the two variables is µd = µ1 − µ2 = D0 . The difference is typically
The steps to be followed are similar to those we have seen in the one sample case.
1. The null and alternative hypotheses to be tested are:
H0 : µd = 0
H1 : µd ≠ 0 or µd < 0 or µd > 0
Solution: Let µd be the population mean of the difference in the blood pressure of women.
The differences of the before-after blood pressures are:
Women 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Before (x1i ) 70 80 72 76 76 76 72 78 82 64 74 92 74 68 84
After (x2i ) 68 72 62 70 58 66 68 52 64 72 74 60 74 72 74
di = x1i − x2i 2 8 10 6 18 10 4 26 18 -8 0 32 0 -4 10
The sample mean of the differences is $\bar{d} = \dfrac{1}{n}\sum_{i=1}^{n} d_i = \dfrac{1}{15}(2 + 8 + \cdots + 10) = \dfrac{1}{15}(132) = 8.8$.
The variance of the differences is $s_d^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar{d})^2 = \dfrac{1}{15-1}\{(2 - 8.8)^2 + (8 - 8.8)^2 + \cdots + (10 - 8.8)^2\} = \dfrac{1}{14}(1686.4) = 120.457$, which implies the standard deviation $s_d = 10.98$.
1. Thus, the null and alternative hypotheses to be tested are:
H0 : µd = 0
H1 : µd > 0
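The summary statistics of the differences can be reproduced from the before and after readings in the table. The last line of the sketch computes the usual paired t statistic, t = d̄ / (s_d/√n) under H0 : µd = 0; that formula is stated here as an assumption, since the test-statistic step is not shown in this passage. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

before = [70, 80, 72, 76, 76, 76, 72, 78, 82, 64, 74, 92, 74, 68, 84]
after = [68, 72, 62, 70, 58, 66, 68, 52, 64, 72, 74, 60, 74, 72, 74]

d = [b - a for b, a in zip(before, after)]  # d_i = x_1i - x_2i
n = len(d)
d_bar = mean(d)        # sample mean of the differences
s2_d = variance(d)     # sample variance (divisor n - 1)
s_d = sqrt(s2_d)

# Assumed standard form of the paired t statistic with mu_d = 0 under H0.
t_cal = d_bar / (s_d / sqrt(n))

print(d_bar, round(s2_d, 3), round(s_d, 2), round(t_cal, 3))
```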
where $\bar{x}_1 = \dfrac{1}{n_1}\sum_{i=1}^{n_1} x_{1i}$ is the sample mean of the first group and $\bar{x}_2 = \dfrac{1}{n_2}\sum_{i=1}^{n_2} x_{2i}$ is
the sample mean of the second group, $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ is the pooled
variance of both groups (note $s_1^2 = \dfrac{1}{n_1 - 1}\sum_{i=1}^{n_1}(x_{1i} - \bar{x}_1)^2$ is the sample variance
of the first group and $s_2^2 = \dfrac{1}{n_2 - 1}\sum_{i=1}^{n_2}(x_{2i} - \bar{x}_2)^2$ is the sample variance of the second
group), n1 is the sample size of the first group and n2 is the sample size of the second group.
4. Decision:
• For a two sided test, H0 is rejected if |t| > tα/2 (n1 + n2 − 2).
• For a one sided case, H0 is rejected if |t| > tα (n1 + n2 − 2).
5. Conclude.
The (1 − α)100% confidence interval for the difference of the population means is:
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}.$
The above test statistic is only used when the two distributions have the same variance.
When the two population variances are assumed to be different and hence must be esti-
mated separately, the test statistic is a little bit modified as:
Example: Company officials were concerned about the length of time a particular drug
product retained its toxin’s potency. A random sample of 10 bottles of the product was
drawn from the production line and measured for potency. A second sample of 10 bottles
was obtained and stored in a regulated environment for a period of one year. The readings
obtained from each sample are given below.
Sample 1 10.2 10.5 10.3 10.8 9.8 10.6 10.7 10.2 10.0 10.6
Sample 2 9.8 9.6 10.1 10.2 10.1 9.7 9.5 9.6 9.8 9.9
Test the null hypothesis that the drug product retains its potency. Also, construct the
95% confidence interval for the difference of the population means.
Solution: Let µ1 be the mean potency of the product taken from the production line and
µ2 be the mean potency of the drug product that was retained for a year. The summary
statistics of the data are:
$\bar{x}_1 = \dfrac{1}{n_1}\sum_{i=1}^{n_1} x_{1i} = \dfrac{1}{10}(103.7) = 10.37$
$\bar{x}_2 = \dfrac{1}{n_2}\sum_{i=1}^{n_2} x_{2i} = \dfrac{1}{10}(98.3) = 9.83$
$s_1^2 = \dfrac{1}{n_1 - 1}\left[\sum_{i=1}^{n_1} x_{1i}^2 - \dfrac{1}{n_1}\Big(\sum_{i=1}^{n_1} x_{1i}\Big)^2\right] = \dfrac{1}{9}\left[1076.31 - \dfrac{1}{10}(103.7)^2\right] = 0.105$
$s_2^2 = \dfrac{1}{n_2 - 1}\left[\sum_{i=1}^{n_2} x_{2i}^2 - \dfrac{1}{n_2}\Big(\sum_{i=1}^{n_2} x_{2i}\Big)^2\right] = \dfrac{1}{9}\left[966.81 - \dfrac{1}{10}(98.3)^2\right] = 0.058$
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{(10 - 1)(0.105) + (10 - 1)(0.058)}{10 + 10 - 2} = 0.0815$
$s_p = 0.285$
1. The hypotheses to be tested are:
H0 : µ1 = µ2
H1 : µ1 ≠ µ2
2. The level of significance is α = 0.05. Thus t0.05/2(10 + 10 − 2) = t0.025(18) = 2.101.
3. The test statistic is:
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{(10.37 - 9.83) - 0}{0.285\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = 4.237$
4. Decision: Since |t| > t0.025(18), H0 is rejected.
5. Conclusion: There is a significant difference in the mean potency of the drug product
from the production line and the drug that was retained for one year.
The 95% confidence interval for the difference of the population means, µ1 − µ2 is:
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}(n_1 + n_2 - 2)\, s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}} = (10.37 - 9.83) \pm 2.101\,(0.285)\sqrt{\dfrac{1}{10} + \dfrac{1}{10}} = (0.272, 0.808).$
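The pooled-variance calculation can be carried out directly on the raw potency readings, which avoids rounding the intermediate summaries. The sketch below follows the formulas of this section; the critical value 2.101 is copied from the solution, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

sample_1 = [10.2, 10.5, 10.3, 10.8, 9.8, 10.6, 10.7, 10.2, 10.0, 10.6]
sample_2 = [9.8, 9.6, 10.1, 10.2, 10.1, 9.7, 9.5, 9.6, 9.8, 9.9]

n1, n2 = len(sample_1), len(sample_2)
x1, x2 = mean(sample_1), mean(sample_2)

# Pooled variance: weighted average of the two sample variances.
s2_p = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / (n1 + n2 - 2)
se = sqrt(s2_p * (1 / n1 + 1 / n2))  # standard error of x1 - x2

t_cal = (x1 - x2) / se               # test statistic under H0: mu1 = mu2
t_crit = 2.101                       # t_{0.025}(18), taken from the text
reject = abs(t_cal) > t_crit
ci = (x1 - x2 - t_crit * se, x1 - x2 + t_crit * se)

print(round(x1, 2), round(x2, 2), round(s2_p, 4), round(t_cal, 3), reject, ci)
```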
H0 : µ1 = µ2
H1 : µ1 > µ2
2. The level of significance is α = 0.05. The degrees of freedom for unequal variances
assumption is
Similarly, if the groups have common variance, the pooled variance (standard deviation)
can be calculated as shown before in the small sample case.
Example: For a random sample of 120 adult females born in country A, the mean height
was 62.7 inches with standard deviation 2.50 inches. For another random sample of 150
adult females born in country B, the mean height was 61.8 inches with standard deviation
2.62 inches. Would you reject the null hypothesis that there is no difference in height
between adult females born in the two countries at the 1% level of significance?
Solution: Let µ1 = the mean height of adult females born in country A and µ2 = the mean
height of adult females born in country B.
H0 : µ1 = µ2
H1 : µ1 ≠ µ2
5. Conclusion: There is a difference in the population mean height of adult females born in
the two countries.
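The intermediate steps of this example are not shown here, but the conclusion can be checked with a short calculation. The sketch assumes the usual large-sample statistic with separately estimated variances, Z = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2), and the two-sided 1% critical value Z0.005 = 2.576; both are stated as assumptions rather than taken from this passage.

```python
from math import sqrt

n1, x1, s1 = 120, 62.7, 2.50  # country A
n2, x2, s2 = 150, 61.8, 2.62  # country B

# Assumed large-sample Z statistic for the difference of two means.
z_cal = (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)

z_crit = 2.576  # Z_{0.005} for a two-sided test at alpha = 0.01
reject = abs(z_cal) > z_crit

print(round(z_cal, 3), reject)
```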
There are many types of observational classifications. If the observations are classified
on the basis of a single criterion, the classification is called one-way classification. If the
observations are classified on the basis of two criteria, it is called two-way classification.
Here, one-way ANOVA will be discussed. The principle underlying the one-way ANOVA is
that the total variability in a data set is partitioned into two components: the variability
between groups and the variability within groups. Each component represents a different
source of variation. The between-groups variation can be accounted for, while the
within-group variation is the unexplained (residual) variation that results from
uncontrolled biological variation and technical error.
Suppose there is one basic variable or criterion of classification with k groups. The null
hypothesis to be tested is that all the k group means are equal, and the alternative
hypothesis is that at least one of the group means is significantly different from the others.
That is,
H0 : µ1 = µ2 = · · · = µk
H1 : not H0
To construct the test statistic, the total sum of squares (TSS) is decomposed into the between
sum of squares (BSS) and the within (error) sum of squares (ESS).
\[ TSS = BSS + ESS \]
\[ \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^2 = \sum_{i=1}^{k} n_i(\bar{x}_i-\bar{x})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2 \]
The TSS has n − 1 degrees of freedom, the BSS has k − 1 degrees of freedom and the
ESS has n − k degrees of freedom, where n = n1 + n2 + · · · + nk is the total number of
observations. The ratios of the BSS and ESS to their corresponding degrees of freedom are
called the between mean square (BMS) and the error mean square (EMS), respectively. The
test statistic is the ratio of BMS to EMS, F = BMS/EMS, and the test is called an F test.
H0 is rejected if the calculated F exceeds the critical value Fα (k − 1, n − k).
Example: Suppose a university wishes to compare the effectiveness of four teaching
methods (Slide, Self Study, Lecture, Discussion) for a particular course. Twenty-four students
are randomly assigned to the teaching methods, with 5, 6, 6 and 7 students respectively.
After the students were taught with their assigned methods, a test (out of 20%) was given
and the performance of the students was recorded as follows:
Slide    Self Study    Lecture    Discussion
9        10            12         9
12       6             14         8
14       6             11         11
11       9             13         7
13       10            11         8
         5             16         6
                                  7
Construct the ANOVA table. Also test the hypothesis that there is no difference among
the four teaching methods.
S.V.       SS              df             MS                          F
Between    BSS = 124.87    4 − 1 = 3      BMS = 124.87/3 = 41.6233    F = 41.6233/3.7485 = 11.10
Within     ESS = 74.97     24 − 4 = 20    EMS = 74.97/20 = 3.7485
Total      TSS = 199.83    24 − 1 = 23
Then the calculated F value is compared with the critical value at the 5% level of
significance, F0.05 (3, 20) = 3.10. Since the calculated F exceeds this critical value, H0
should be rejected. This means that there is a difference among the teaching methods.
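The sums of squares in the ANOVA table can be recomputed directly from the raw scores. The following is a minimal sketch that applies the decomposition TSS = BSS + ESS to the teaching-method data. The function name `one_way_anova` and the dictionary layout are illustrative choices, and the critical value is still read from an F table.

```python
# Scores from the teaching-method example, one list per group
groups = {
    "Slide": [9, 12, 14, 11, 13],
    "Self Study": [10, 6, 6, 9, 10, 5],
    "Lecture": [12, 14, 11, 13, 11, 16],
    "Discussion": [9, 8, 11, 7, 8, 6, 7],
}


def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a dict of group -> values."""
    all_values = [x for values in groups.values() for x in values]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n

    bss = 0.0  # between-groups sum of squares
    ess = 0.0  # within-groups (error) sum of squares
    for values in groups.values():
        group_mean = sum(values) / len(values)
        bss += len(values) * (group_mean - grand_mean) ** 2
        ess += sum((x - group_mean) ** 2 for x in values)

    tss = sum((x - grand_mean) ** 2 for x in all_values)
    bms = bss / (k - 1)   # between mean square
    ems = ess / (n - k)   # error mean square
    return {"BSS": bss, "ESS": ess, "TSS": tss,
            "df_between": k - 1, "df_within": n - k,
            "BMS": bms, "EMS": ems, "F": bms / ems}


result = one_way_anova(groups)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Computing TSS separately, rather than as BSS + ESS, gives a quick check that the decomposition holds for the data.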
Mean Separation
In ANOVA, if the null hypothesis is rejected, there is a need to identify which pairs of
group means differ significantly and which do not. There are several methods of mean
separation; of these, Fisher's Least Significant Difference (LSD) test is considered here.
In this method, first sort the group means in ascending order to compare two means at a
time. For comparing µi and µj , compute
\[ LSD_{ij} = t_{\alpha/2}(n-k)\,\sqrt{EMS\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}. \]
Then, if |x̄i − x̄j | > LSDij , there is a significant difference between µi and µj . Otherwise,
no significant difference is observed.
Example: Recall the previous example and identify which pairs of teaching method
means differ significantly using LSD.
Solution:
Lecture: x̄3 = 12.833, Slide: x̄1 = 11.800, Discussion: x̄4 = 8.000, Self Study: x̄2 = 7.667
n1 = 5, n2 = 6, n3 = 6, n4 = 7
Therefore, lecture and slide teaching methods are better than the other two.
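The pairwise comparisons leading to this conclusion can be sketched as follows. The code assumes EMS = 74.97/20 from the ANOVA table and the tabulated value t0.025 (20) = 2.086 taken from a standard t table. The dictionary names are illustrative.

```python
from itertools import combinations
from math import sqrt

# Group means and sizes from the solution
means = {"Lecture": 12.833, "Slide": 11.800, "Discussion": 8.000, "Self Study": 7.667}
sizes = {"Lecture": 6, "Slide": 5, "Discussion": 7, "Self Study": 6}

ems = 74.97 / 20   # error mean square from the ANOVA table
t_crit = 2.086     # assumption: t_{0.025}(20) read from a t table

significant = []
for a, b in combinations(means, 2):
    # LSD for this pair depends on both group sizes
    lsd = t_crit * sqrt(ems * (1 / sizes[a] + 1 / sizes[b]))
    diff = abs(means[a] - means[b])
    if diff > lsd:
        significant.append((a, b))
    print(f"{a} vs {b}: |difference| = {diff:.3f}, LSD = {lsd:.3f}")

print("Significantly different pairs:", significant)
```

Under these assumptions, Lecture and Slide do not differ significantly from each other, nor do Discussion and Self Study, while every pair taken across these two sets does.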
Chapter 10
In the previous chapters we have been dealing with a single variable. In this chapter we
will deal with bivariate data, i.e., data involving two variables.
10.1 Correlation
Correlation is a statistical tool designed to measure the degree of relationship
(degree of association) between variables. If a change in one variable is accompanied by a
change in the other variable, then the variables are said to be correlated.
Correlation that involves only two variables is called simple correlation. The simplest way
to present bivariate data is to plot them on the XY plane. For a bivariate distribution (X, Y ),
the values (Xi , Yi ), i = 1, 2, · · · , N are plotted in the XY plane. This is known as a scatter
plot. It gives only a rough idea about the presence or absence of correlation and about its
nature (direct or indirect). It does not indicate the strength or degree of the relationship
between the two variables.
10.1.1 Covariance
Covariance is a measure of the joint variation between two variables, i.e., it
measures the way in which the values of the two variables vary together.
Recall that the sample and population variances of a variable X are calculated,
respectively, as:
\[ S_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})(X_i-\bar{X}) = S_{xx} \]
and
\[ \sigma_x^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i-\bar{X})^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i-\bar{X})(X_i-\bar{X}) = \sigma_{xx} \]
Similarly the sample covariance between two variables is defined as:
\[ S_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y}) = \frac{1}{n-1}\left[\sum_{i=1}^{n}X_iY_i - \frac{\sum_{i=1}^{n}X_i\,\sum_{i=1}^{n}Y_i}{n}\right]. \]
and
\[ \sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y}) = \frac{1}{N}\left[\sum_{i=1}^{N}X_iY_i - \frac{\sum_{i=1}^{N}X_i\,\sum_{i=1}^{N}Y_i}{N}\right]. \]
If the covariance is zero, there is no linear relationship between the two variables. If it is
negative, there is an indirect linear relationship between them. If the covariance is positive,
there is a direct linear relationship between the variables.
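As a quick illustration, the definitional and computational forms of the sample covariance can be compared in a few lines. The data are the six (X, Y) pairs from the heights example used later in this chapter. The function names are illustrative.

```python
def sample_covariance(x, y):
    """Definitional form: average product of deviations, divisor n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_covariance_shortcut(x, y):
    """Computational form based on the sums of X, Y and XY."""
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    return (sum_xy - sum(x) * sum(y) / n) / (n - 1)


x = [63, 65, 66, 67, 67, 68]
y = [66, 68, 65, 67, 69, 70]

s_xy = sample_covariance(x, y)
s_xy_short = sample_covariance_shortcut(x, y)
print(s_xy, s_xy_short)  # both forms give the same value
```

Both forms return a positive covariance for these data, which indicates a direct linear relationship.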
• If the value of r is near zero, there is no linear association between the two variables.
Limitations of r:
Solution:
No.      X           Y           X²             Y²             XY
1        63          66          3969           4356           4158
2        65          68          4225           4624           4420
3        66          65          4356           4225           4290
4        67          67          4489           4489           4489
5        67          69          4489           4761           4623
6        68          70          4624           4900           4760
Total    ΣXi = 396   ΣYi = 405   ΣXi² = 26152   ΣYi² = 27355   ΣXi Yi = 26740
⇒ r = 0.597
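The value of r can be reproduced from the column totals alone. The sketch below assumes the usual product-moment formula, r = [ΣXY − ΣXΣY/n] / sqrt([ΣX² − (ΣX)²/n][ΣY² − (ΣY)²/n]). The variable names are illustrative.

```python
from math import sqrt

# Column totals from the table
n = 6
sum_x, sum_y = 396, 405
sum_x2, sum_y2 = 26152, 27355
sum_xy = 26740

# Corrected sums of squares and cross products
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n

r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.4f}")
```

The divisors n − 1 in the covariance and in the two standard deviations cancel, so only the corrected sums are needed.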
Suppose that a group of n individuals is given grades or ranks with respect to two
characteristics. Let (Xi , Yi ), i = 1, 2, · · · , n be the ranks of the ith individual on the two
characteristics. Then, Spearman's rank correlation coefficient is given by:
\[ r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)} \quad \text{where } d_i = R_{X_i} - R_{Y_i}. \]
This formula is used when no ranks are repeated. For repeated ranks, a correction
factor is required. Ties between the two measurements of a pair create no problem.
If m is the number of times an item is repeated, then the factor m(m² − 1)/12 is added to
Σ d_i². For each repeated value, this correction factor (CF) is to be added.
\[ r_s = 1 - \frac{6\left(\sum_{i=1}^{n} d_i^2 + \sum CF\right)}{n(n^2-1)} \]
Note that −1 ≤ rs ≤ 1.
Example: The ranks of some 10 students in two courses Statistics and Economics are
given below. Calculate the rank correlation and interpret.
Statistics 5 2 9 8 1 10 3 4 6 7
Economics 10 5 1 3 8 6 2 7 9 4
Ans: rs = −0.31
Example: Obtain the rank correlation for the following data.
X 85 74 85 50 65 78 74 60 74 90
Y 78 91 78 58 60 72 80 55 68 70
Ans: rs = 0.545
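Both rank-correlation examples can be worked through with a short script. This is a sketch under two assumptions: tied values receive the average of the ranks they occupy, and the correction factor m(m² − 1)/12 is added once for every repeated value in either variable. The helper names are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank values (1 = largest); tied values share the mean of their positions."""
    positions = {}
    for pos, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value that occurs m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)


def spearman(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    cf = tie_correction(x) + tie_correction(y)
    return 1 - 6 * (sum_d2 + cf) / (n * (n * n - 1))


# First example: the data are already ranks with no ties
statistics_rank = [5, 2, 9, 8, 1, 10, 3, 4, 6, 7]
economics_rank = [10, 5, 1, 3, 8, 6, 2, 7, 9, 4]
rs_ranks = spearman(statistics_rank, economics_rank)

# Second example: raw scores with repeated values
x = [85, 74, 85, 50, 65, 78, 74, 60, 74, 90]
y = [78, 91, 78, 58, 60, 72, 80, 55, 68, 70]
rs_scores = spearman(x, y)

print(f"{rs_ranks:.3f} {rs_scores:.3f}")
```

Re-ranking data that are already ranks reverses their order but leaves every squared difference unchanged, so the same function serves both examples.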
A regression study that involves only two variables is called simple regression, and a
regression analysis that studies more than two variables is called multiple regression. If
the relationship between the two variables can be described by a straight line, then the
regression is known as linear regression; otherwise it is called non-linear.
A regression analysis involving only two variables that have a linear relationship is
called simple linear regression. This linear relationship between the two variables is
represented by a straight line.
A regression line is a line that gives the best estimate of one variable for any given value
of another variable. The regression line which is used to estimate the values of Y for any
given value of X is called the regression line of Y on X.
Model: Yi = β0 + β1 Xi + εi ; i = 1, 2, · · · , n
where
Yi is the ith actual value of the dependent variable.
β0 is the intercept.
β1 is the slope.
• β1 is the change in the value of the dependent variable when the value of the
independent variable increases by 1 unit. The sign of β1 is the same as that of the
covariance and the correlation coefficient. That is, there is a direct linear relationship
between the two variables if β1 is positive, there is an indirect linear relationship
between the two variables if β1 is negative, and there is no linear relationship between
the two variables if β1 is zero.
and
β̂0 = Ȳ − β̂1 X̄
Example: Recall the previous example on heights of sons and heights of fathers.
a. Estimate the regression model of height of sons on height of fathers.
b. Interpret the estimated parameters.
c. What would be the predicted height of the son if the father's height is 70 inches?
Solutions: β̂1 = 0.625 and β̂0 = 26.25
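These estimates can be reproduced as follows. The sketch assumes the usual least squares slope, β̂1 = Σ(Xi − X̄)(Yi − Ȳ)/Σ(Xi − X̄)², together with β̂0 = Ȳ − β̂1 X̄, and it takes X as the fathers' heights and Y as the sons' heights, which is the assignment consistent with the stated solutions.

```python
# Heights in inches; assumption: X = fathers, Y = sons
fathers = [63, 65, 66, 67, 67, 68]
sons = [66, 68, 65, 67, 69, 70]

n = len(fathers)
x_bar = sum(fathers) / n
y_bar = sum(sons) / n

# Corrected sum of cross products and corrected sum of squares of X
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fathers, sons))
s_xx = sum((x - x_bar) ** 2 for x in fathers)

beta1_hat = s_xy / s_xx                  # slope
beta0_hat = y_bar - beta1_hat * x_bar    # intercept


def predict(father_height):
    """Predicted height of the son from the fitted line."""
    return beta0_hat + beta1_hat * father_height


print(f"slope = {beta1_hat}, intercept = {beta0_hat}, prediction at 70 = {predict(70)}")
```

The slope is read as the estimated change in the son's height, in inches, for each additional inch of the father's height.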
The coefficient of determination tells how well the estimated model fits the data. For
simple linear regression (the two-variable case), it is defined as the square of the sample
correlation coefficient, and is denoted by r². Hence r² measures the proportion or percentage
of the variation in the dependent variable explained by the independent variable. Variation
means the sum of the squares of the deviations of a variable from its mean value.
Generally, r² is a nonnegative quantity which lies between 0 and 1, i.e., 0 ≤ r² ≤ 1. If
it approaches 1, the model fits the data well; if it approaches 0, there is little or no linear
relationship between the variables.
If we consider the example on heights of sons and their fathers, we had r = 0.597 which
implies r2 = 0.357. This means 35.7% of the variation in the heights of sons is explained
by the heights of the fathers.
Example: A study reported in a medical journal suggested that the peak heart rate
an individual can reach during intensive exercise decreases with age. A cardiologist
wants to do his own study. The next 9 patients were given a stress test on the treadmill
at 6 miles per hour, and their ages and heart rates were recorded as follows:
Age          30    30    40    20    20    45    30    45    50
Heart Rate   190   180   180   200   195   170   185   175   165
c. Can we predict the peak heart rate of an 80-year-old man who is given a similar
stress test? If so, what peak heart rate do you predict?
Solutions:
No.      X          Y           X²           Y²            XY
1        30         190         900          36100         5700
2        30         180         900          32400         5400
3        40         180         1600         32400         7200
4        20         200         400          40000         4000
5        20         195         400          38025         3900
6        45         170         2025         28900         7650
7        30         185         900          34225         5550
8        45         175         2025         30625         7875
9        50         165         2500         27225         8250
Total    ΣX = 310   ΣY = 1640   ΣX² = 11650  ΣY² = 299900  ΣXY = 55525
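From these totals the fitted line, r² and a prediction can be obtained as sketched below. The code assumes the least squares formulas in their sum-based form and treats any age outside the observed range of 20 to 50 years as extrapolation. The helper `within_observed_range` is a hypothetical name introduced for illustration.

```python
# Column totals from the table (X = age, Y = peak heart rate)
n = 9
sum_x, sum_y = 310, 1640
sum_x2, sum_y2 = 11650, 299900
sum_xy = 55525
age_min, age_max = 20, 50   # smallest and largest observed ages

# Corrected sums of squares and cross products
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n

beta1_hat = s_xy / s_xx                            # slope
beta0_hat = sum_y / n - beta1_hat * sum_x / n      # intercept
r_squared = s_xy**2 / (s_xx * s_yy)                # coefficient of determination


def within_observed_range(age):
    """True if the age lies inside the range of ages used to fit the line."""
    return age_min <= age <= age_max


age = 80
predicted = beta0_hat + beta1_hat * age
print(f"slope = {beta1_hat:.4f}, intercept = {beta0_hat:.3f}, r^2 = {r_squared:.4f}")
print(f"predicted rate at age {age}: {predicted:.1f}, "
      f"inside observed range: {within_observed_range(age)}")
```

The line returns a number for age 80, but that age lies well outside the ages observed, so the prediction is an extrapolation and should be treated with caution.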
THE END!!!