Chapter 1 & 2

Good

Uploaded by

simebehailu8
Copyright
© © All Rights Reserved

Chapter One

Introduction

1.1 Definition and classification of statistics


Definition:
• Statistics is a collection of numerical facts and data.
• Statistics is a mathematical science dealing with the methods of collecting, organizing,
presenting, analyzing and interpreting data.
• Statistics is a subject that deals with numbers and figures describing certain situations. It
primarily deals with numerical data obtained from surveys and summarizes these data in such a
way that the summary gives a good indication of the nature of the data.

The word “statistics” can be used in a plural or a singular sense. In its plural sense it refers to
numerical facts and data, as in the first definition above. The second definition above gives the
singular sense: statistics as a subject area or field of study, the science that deals with the
collection, processing, analysis, interpretation and presentation of numerical facts.

Statistics is not a new discipline; it is as old as human society itself. In earlier times,
however, the sphere of its utility was very restricted.

The word “statistics” is derived from the Latin for “state” indicating the historical importance of
governmental data gathering, which related to demographic information (military recruitment and tax
collecting). Thus, the scope of statistics in the ancient times was primarily limited to the collection of
demographic, property and wealth data of a country by governments for framing military and fiscal
policies.

Nowadays, statistics is used in almost every field of study, such as natural science, social science,
engineering, medicine and agriculture.

Classification: Statistics is broadly divided into two categories based on how the collected data are
used.

1. Descriptive Statistics
• deals with describing data without attempting to infer anything that goes beyond the given set of
data,
• consists of collection, organization, summarization and presentation of data.
Example 1: The mean blood pressure of a group of patients and the success rate of a surgical
procedure can be considered as descriptive statistics.

2. Inferential Statistics
Statistical inference is drawing conclusions about an entire population based on
data in a sample drawn from that population. From both frequentist and Bayesian
perspectives, there are three main goals of inference: estimation, hypothesis testing,
and prediction. Estimation and hypothesis testing deal with drawing conclusions
about unknown and unobservable population parameters.

Prediction is estimating the values of potentially observable but currently unobserved quantities. For
example, we might want to predict the number of “yesses” in
a future survey of 50 UI students. Prediction in statistical inference isn’t restricted to
predicting future observations, however. It may refer to estimating values that have
already occurred but were not measured. For example, we may want to use values
of acid rain deposition measured from rain gauges at specific sites to predict acid
rain deposition at other locations that have no rain gauges.
In summary, inferential statistics:
• deals with making inferences and/or drawing conclusions about a population based on data obtained
from a limited sample of observations,
• consists of performing hypothesis testing, determining relationships among variables and making
predictions.
Example 2: Estimating the mean blood pressure of all Americans, or the expected success rate of a
surgical procedure in patients who have not yet undergone the operation, from sample data is
inferential statistics.

1.2 Definition of some basic terms

a) Population: the totality (collection) of all objects or items under consideration.
Example: If you want to study the mean age of primary school teachers in Sodo town, all
primary school teachers in Sodo town constitute the population of your study.

b) Sample: a part of a population taken so that some generalization about the population can be
made. A sample should be representative of the population. Example: In the study of the mean age
of primary school teachers in Sodo town, all primary school teachers in the town constitute the
population; if you study only some of the teachers, the selected ones constitute your sample.

c) Parameter: a descriptive measure of a population, or a summary value calculated from a
population. Examples: the average, range and variance of the population.

d) Statistic: a descriptive measure of a sample, or a summary value calculated from a sample.
Examples: the average, range and variance of the sample.
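
The difference between a parameter and a statistic can be sketched in a few lines of Python. The ages below are hypothetical values invented purely for illustration (they are not data from these notes), and the sample size of 5 is an arbitrary choice.

```python
import random
from statistics import mean

# Hypothetical ages (in years) of ALL teachers in a small population.
# These numbers are made up for illustration only.
population_ages = [25, 31, 28, 45, 39, 52, 33, 29, 41, 36, 48, 27]

# Parameter: a summary value calculated from the whole population.
population_mean = mean(population_ages)

# Statistic: the same summary value calculated from a sample only.
random.seed(1)  # fixed seed so that the sketch is reproducible
sample_ages = random.sample(population_ages, k=5)
sample_mean = mean(sample_ages)

print("Population mean (parameter):", round(population_mean, 2))
print("Sample mean (statistic):   ", round(sample_mean, 2))
```

The sample mean usually differs from the population mean, and it changes from sample to sample, whereas the parameter is a fixed value of the population.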

1.3 Stages in Statistical Investigation

We have defined statistics, in its singular sense, as a science that deals with the collection,
organization (classification), presentation, analysis and interpretation of numerical facts. A
statistical investigation therefore passes through the following stages:

Data Collection: This is the stage where we gather information for our purpose.

Data Organization: This is the stage where we edit the data. A large mass of figures collected
from surveys frequently needs organization, because the collected data may contain irrelevant
figures, incorrect facts, omissions and mistakes.

Data Presentation: The organized data can now be presented in the form of tables, charts,
diagrams and graphs. At this stage, large data sets are presented in a summarized and
condensed manner.

Data Analysis: This is the stage where we critically study the data. The purpose of data
analysis is to dig out information useful for decision making.

Data Interpretation: This is the stage where we draw valid conclusions from the results
obtained through data analysis. If the data that have been analyzed are not properly interpreted,
the whole purpose of the investigation may be defeated and misleading conclusions may be
drawn.

1.4 Application and limitation of statistics

Uses of statistics
The science of statistics is essential for research and decision making in all aspects of
human life. The following are some of the purposes for which statistical analysis is required:
• To represent facts in the form of numerical data.
• To summarize a mass of data into a few presentable, understandable and precise
figures.
• To predict or forecast future trends.
• To help select a course of action among a number of alternatives.
• To help in formulating policies.

However, statistics has the following limitations.

a) It does not study qualitative characteristics directly. Examples: beauty, honesty, poverty and
standard of living.
b) It does not study a single individual but deals with aggregates of facts. Example: The
population size of a country for a single given year does not help us in comparative studies.
c) Statistical results are true only on the average. Examples: The probability of getting a head in
tossing a coin is 1/2. The germination percentage of a given variety of seed is 80%.
d) It is liable to misuse. Example: The number of car accidents caused in a city in a
particular year by women drivers is 10, while the number caused by men drivers is 40. Concluding
from these figures alone that women drivers are safer is misleading, because the comparison
ignores how many women and how many men drive.

1.5 TYPES OF VARIABLES AND MEASUREMENT SCALES

A variable is a characteristic of an object that can take different possible values.

There are two types of variables.

a) Quantitative variables: variables that can be quantified, i.e. that take numerical values.
Examples: height, area, income, temperature, etc.
b) Qualitative variables: variables that cannot be quantified directly. Examples: colour,
beauty, sex, location. Qualitative variables are also called categorical variables.
Hence we have two types of data: quantitative and qualitative data.

Quantitative variables can be further classified as
• Discrete variables, and
• Continuous variables

a) Discrete variables are variables whose values are counts.
Examples: number of students, number of households (family size), number of pages of a
book.
b) Continuous variables are variables that can take any value within an interval.
Examples: weight, length, volume, etc.



1.5 Measurement scales

There are four types of measurement scales for variables:

1. Nominal scale: “Nominal” comes from the Latin word for “name”. This is a scale for grouping
individuals into different categories.
Example 1: red, brown, black
Example 2: short, tall
Example 3: pass, fail
• On this scale, one category is simply different from another.
• +, -, *, / are impossible; comparison (ordering) is impossible.
2. Ordinal scale: “Ordinal” comes from the Latin word for “order”.
• It is a scale for grouping and ordering individuals into different categories.
• Data consisting of an ordering or ranking of measurements are said to be on
an ordinal scale of measurement.
Examples: military ranks, ranks in a race, ranks of college academic staff, etc.
• One value is different from, and greater/better/less than, another.
• +, -, *, / are impossible; comparison is possible.
Ordinal scale data convey more information than nominal scale data because relative
magnitudes are known; however, quantitative comparisons are impossible.

3. Interval scale: is a measurement scale in which:
• There is no true zero point (the zero point is arbitrary).
• There is no physical significance to the zero point.
• There is a constant interval size between any adjacent units on the measurement
scale.
Example: °C, °F (units for measuring temperature)
• On this measurement scale, one value is different from, and greater or less than, another by a
certain amount (addition and subtraction are possible, but multiplication and division are not).
37°C – 35°C = 2°C
45°C – 43°C = 2°C
40°C = 2(20°C), but this does not imply that an object at 40°C is twice as hot as an object
at 20°C.

• Interval scale data convey more information than nominal and ordinal scale data.

4. Ratio scale: is a measurement scale in which:
• There is a constant interval size between any adjacent units on the measurement scale.
• There exists a zero point on the measurement scale, and there is a physical significance to
this zero point.

Examples: height, weight, volume, etc.

• One value is different from, and larger/taller/better/less than, another by a certain amount and
by so many times.
• +, -, *, / are possible on this scale.
• This measurement scale provides more information than the interval scale of measurement.

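
The 40°C versus 20°C remark in the interval scale discussion can be checked numerically. The sketch below is only an illustration: the kelvin scale is not mentioned in these notes and is brought in here as an assumed example of a temperature scale with a true zero (a ratio scale).

```python
def celsius_to_kelvin(temp_c):
    """Convert degrees Celsius (interval scale) to kelvin (ratio scale)."""
    return temp_c + 273.15

hot_c, cool_c = 40.0, 20.0

# Differences are meaningful on an interval scale.
difference = hot_c - cool_c

# The ratio of the Celsius readings is 2, but it has no physical meaning,
# because the Celsius zero point is arbitrary.
naive_ratio = hot_c / cool_c

# On a scale with a true zero the ratio is meaningful, and it is far from 2.
kelvin_ratio = celsius_to_kelvin(hot_c) / celsius_to_kelvin(cool_c)

print("Difference in degrees Celsius:", difference)
print("Ratio of Celsius readings:", naive_ratio)
print("Ratio of kelvin readings:", round(kelvin_ratio, 3))
```

The kelvin ratio is only slightly above 1, which shows why "twice as hot" cannot be read off a Celsius ratio.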
1.6 Sources of data and methods of data collection

Not every aggregate of numbers can be called statistical data. An aggregate of numbers is
statistical data when the numbers are
• comparable,
• meaningful, and
• collected for a well-defined objective.
Raw data: collected data that have not been organized numerically.
Example: 25, 10, 32, 18, 6, 93, 4.
An array: an arrangement of raw numerical data in ascending or descending order of magnitude.
• It enables us to find the range of the data set easily, and it also gives us some idea about the
general characteristics of the distribution.
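
As a small illustration, the raw data given above can be turned into an array and the range read off its two ends. The variable names are chosen for this sketch only.

```python
# Raw data from the example above.
raw_data = [25, 10, 32, 18, 6, 93, 4]

# An array is the same data in ascending (or descending) order.
ascending = sorted(raw_data)
descending = sorted(raw_data, reverse=True)

# With the data in order, the range is the last value minus the first.
data_range = ascending[-1] - ascending[0]

print("Ascending array: ", ascending)
print("Descending array:", descending)
print("Range:", data_range)
```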

Any scientific investigation requires data related to the study. The required data can be obtained from
either a primary source or a secondary source.

Primary source: a source of data that supplies first-hand information for the immediate
purpose.

• Primary data: data originally collected for the immediate purpose.
- Primary data are more expensive to obtain than secondary data.
Secondary source: individuals or agencies that supply data originally collected for other
purposes by them or by others.
- Usually these are published or unpublished materials, records, reports, etc.
• Secondary data: data collected from a secondary source.

Methods of data collection


There are three major methods of data collection:
i. Observation or measurement
ii. Interviews and questionnaires
iii. The use of documentary sources

I. Observation or measurement
In this method, data are obtained through direct observation or measurement.
- It requires training the persons who measure, in order to ensure the use of a standard procedure.
- It provides accurate information but is expensive and inconvenient.
II. Interviews and questionnaires
Questionnaire: a written document that instructs the reader or listener to answer the questions
written on it.
There are three ways of collecting information under this method:
a) Face-to-face interviews (questionnaires in the charge of interviewers)
b) Telephone interviews
c) Mailed questionnaires (self-administered questionnaires returned by mail)
III. The use of documentary sources
This is the extraction of information from existing sources (e.g. hospital records).
Exercise
1. How does statistics help in your profession?
2. Differentiate between descriptive and inferential statistics.
3. Mention some limitations of statistics (discuss with examples).
4. Explain the difference between the following statistical terms, giving examples:
• Qualitative and quantitative variables
• Nominal and ordinal
• Parameter and statistic
• Secondary and primary data
5. Explain various methods of collecting primary and secondary data.
6. What is a questionnaire?
7. Classify the following data based on scale of measurement.
a. Months of the year: Meskerm, Tikimit, Hedare, …
b. The net wages of a group of workers
c. Socioeconomic status of a family when classified as low, middle and upper class.
d. The daily temperature of W/Sodo town for 30 days.

Chapter Two

Organization and Methods of Data Presentation

2.1 Classification and Tabulation of Data

Classification: the process of arranging items/data into classes or categories according to their
similarities and/or differences.

Classification eliminates inconsistency and brings out the points of similarity and/or dissimilarity
among the collected items/data.

Classification is necessary because it is not possible to draw inferences and conclusions directly
from a large set of unorganized raw data.

2.2 Frequency Distributions


Frequency: the number of times a certain value or set of values occurs in a specific group.

A frequency distribution is a table that presents data arranged into classes according to some
criterion, together with the number of items falling in each class (i.e. the corresponding frequencies).

Example: A frequency distribution presenting the number of males and females in a class

Sex       Frequency
Male      57
Female    39

Generally, there are two basic types of frequency distributions: Ungrouped and Grouped frequency
distributions.

1. Ungrouped frequency distribution

An ungrouped frequency distribution is a table of all the raw score values that could possibly
occur in the data, along with their corresponding frequencies. An ungrouped frequency distribution
is often constructed for a small data set or for a discrete variable.

Constructing an ungrouped frequency distribution

To construct an ungrouped frequency distribution, first find the smallest and the largest raw scores in
the collected data. Then make a table with one column listing the possible raw score values in order of
magnitude and another giving the number of times each value occurs, i.e. the frequency of that value.
Tallies can be used to make the counting easier.

Example: The following data are the ages in years of 20 women who attended health education last year:

30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 36, 32, 42, 30, 35, 37, 32, 30, and 41.
Construct a frequency distribution for these data.
STEP 1. Find the range of the data:
$\text{Range} = \text{Maximum observation} - \text{Minimum observation} = 42 - 29 = 13$
STEP 2. Construct a table, tally the data and complete the frequency column. The frequency
distribution is as follows.

Age    Tally    Frequency
29     /        1
30     ////     4
31     /        1
32     ///      3
33     /        1
35     //       2
36     //       2
37     /        1
39     /        1
41     ///      3
42     /        1
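
The two steps above can be reproduced with a short Python sketch. It uses `collections.Counter` from the standard library to do the tallying; the variable names are chosen for this illustration only.

```python
from collections import Counter

# Ages of the 20 women from the example above.
ages = [30, 41, 39, 41, 32, 29, 35, 31, 30, 36,
        33, 36, 32, 42, 30, 35, 37, 32, 30, 41]

# STEP 1: range of the data.
data_range = max(ages) - min(ages)

# STEP 2: count how many times each value occurs.
frequency = Counter(ages)

print("Range:", data_range)
print(f"{'Age':<5}{'Tally':<8}{'Frequency'}")
for age in sorted(frequency):
    count = frequency[age]
    # One slash per observation plays the role of the tally marks.
    print(f"{age:<5}{'/' * count:<8}{count}")
```

Only the values that actually occur are listed, as in the table above; values such as 34 or 38 would have a frequency of 0.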

2. Grouped frequency distribution

When the range of the data is large, the data must be grouped into classes. A grouped frequency
distribution is a frequency distribution in which several values of the data are grouped into each class.

Some Important Definitions

– Raw data: data collected in original form.
– Array: data arranged in ascending or descending order.
– Class: one of the different, non-overlapping groups into which the data are divided.
– Class limits: the values that separate one class in a grouped frequency distribution from another.
The limits could actually appear in the collected data, and there is a gap between the upper limit
of one class and the lower limit of the next class.
– Class boundaries: also separate one class in a grouped frequency distribution from another. The
boundaries have one more decimal place than the raw data and therefore do not appear in the
collected data. There is no gap between the upper boundary of one class and the lower boundary
of the next class. The lower class boundary (LCB) is found by subtracting half a unit of
measurement from the lower class limit (LCL), and the upper class boundary (UCB) is found by
adding half a unit of measurement to the upper class limit (UCL). That is,
$\mathrm{LCB} = \mathrm{LCL} - \tfrac{1}{2}U$ and $\mathrm{UCB} = \mathrm{UCL} + \tfrac{1}{2}U$

– Class width (W): the difference between the upper and lower boundaries of any class, or between the
lower limits of two consecutive classes, or between the upper limits of two consecutive classes.
N.B. Class width is not equal to the difference between the UCL and LCL of the same class.
– Class mark (M): the midpoint of a class interval, i.e.
$M_i = \dfrac{\mathrm{UCB}_i + \mathrm{LCB}_i}{2}$
– Unit of measurement (U): the smallest difference between any two values of the variable being
measured.
– Cumulative frequency (Cf) less than type: the total frequency of all values (observations) less than
or equal to the upper class boundary for the given class.
– Cumulative frequency (Cf) more than type: The total frequency of all values (observations)
greater than or equal to the lower class boundary for the given class.

A tabular arrangement of class intervals together with their corresponding cumulative frequency
(either less than or more than type; as defined above) is called a cumulative frequency distribution.
– Relative frequency: the frequency of a class divided by the total frequency (i.e. the sum of all
frequencies). Multiplied by 100, it gives the percentage of values falling in that class.
$\text{Relative frequency of a class} = \dfrac{\text{Frequency of that class}}{\text{Total frequency}}$
Note:
• The relative frequency shows what fraction or proportion of the total frequency belongs
to the corresponding class.
• The sum of all the relative frequencies in a frequency distribution is always 1.
– Relative cumulative frequency (less than type/more than type): the total of the relative
frequencies of the given class and all classes below/above it. Equivalently, it is the cumulative
frequency (less than type/more than type) divided by the total frequency. It gives the proportion
of values that are less than/more than the upper/lower class boundary.
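
As a quick check of the relative frequency definition, the sketch below applies it to the male/female frequency distribution given earlier in this section (57 males and 39 females). The dictionary names are chosen for this illustration only.

```python
# Frequencies from the male/female example earlier in this section.
frequencies = {"Male": 57, "Female": 39}

total_frequency = sum(frequencies.values())

# Relative frequency = frequency of the class / total frequency.
relative = {group: count / total_frequency
            for group, count in frequencies.items()}

# Multiplying by 100 gives the percentage of values in each class.
percentages = {group: 100 * value for group, value in relative.items()}

for group in frequencies:
    print(group, frequencies[group],
          round(relative[group], 3), f"{percentages[group]:.1f}%")
print("Sum of relative frequencies:", sum(relative.values()))
```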

Guidelines to construct a grouped frequency distribution


STEP 1. Find the maximum (Max) and the minimum (Min) observations, and then compute their range, R:
$R = \text{Max} - \text{Min}$
STEP 2. Fix the number of classes desired (k). There are two ways to fix k:
– Fix k arbitrarily between 6 and 20, or
– Use Sturges' formula: $k = 1 + 3.322 \log_{10} N$, where N is the total frequency, and
round this value of k up to an integer.
STEP 3. Find the class width (W) by dividing the range by the number of classes and rounding the
result up to an integer: $W = \dfrac{R}{k}$
STEP 4. Pick a suitable starting point less than or equal to the minimum value. This starting point is the
lower limit of the first class. Continue to add the class width to this lower limit to get the rest
of the lower limits.
STEP 5. Find the upper class limits. To find the upper class limit of the first class, subtract one unit of
measurement from the lower limit of the second class. Then continue to add the class width to
this upper limit so as to get the rest of the upper limits.
STEP 6. Compute the class boundaries as $\mathrm{LCB} = \mathrm{LCL} - \tfrac{1}{2}U$ and $\mathrm{UCB} = \mathrm{UCL} + \tfrac{1}{2}U$,
where LCL = lower class limit, UCL = upper class limit, LCB = lower class boundary and UCB =
upper class boundary. The class boundaries are also halfway between the upper limit of one class and
the lower limit of the next class.
STEP 7. Tally the data.
STEP 8. Find the frequencies.
STEP 9. (If necessary) Find the cumulative frequencies (more than and less than types).
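
Steps 2 to 6 can be sketched as small Python functions. This is only an illustration under stated assumptions: the function names are invented for this sketch, the unit of measurement is taken to be 1, and the summary figures used at the end (40 observations, minimum 21, maximum 68, starting point 20) are hypothetical.

```python
import math

def sturges_classes(n):
    """STEP 2: number of classes from Sturges' formula, rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_limits(minimum, maximum, k, start, unit=1):
    """STEPS 3 to 5: return the class width and a list of (LCL, UCL)."""
    width = math.ceil((maximum - minimum) / k)   # STEP 3
    limits = []
    for i in range(k):
        lower = start + i * width                # STEP 4
        upper = lower + width - unit             # STEP 5
        limits.append((lower, upper))
    return width, limits

def class_boundaries(limits, unit=1):
    """STEP 6: LCB = LCL - U/2 and UCB = UCL + U/2."""
    return [(lower - unit / 2, upper + unit / 2) for lower, upper in limits]

# Hypothetical summary figures, used only to exercise the functions.
n, minimum, maximum = 40, 21, 68
k = sturges_classes(n)
width, limits = class_limits(minimum, maximum, k, start=20)
boundaries = class_boundaries(limits)

print("k =", k, " W =", width)
for (lcl, ucl), (lcb, ucb) in zip(limits, boundaries):
    print(f"{lcl} - {ucl}    {lcb} - {ucb}")
```

After building the classes it is worth checking that the last upper limit is at least as large as the maximum observation; if it is not, one more class is needed.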

Example: The following are weights in pounds of 57 children at a day-care center:


68 63 42 27 30 36 28 32 79 27
22 23 24 25 44 65 43 25 74 51
36 42 28 31 28 25 45 12 57 51
12 32 49 38 42 27 31 50 38 21
16 24 69 47 23 22 43 27 49 28
23 19 46 30 43 49 12
Construct a grouped frequency distribution:
1. The smallest number is 12 and the largest is 79, so that
$R = 79 - 12 = 67$
2. Let us use k = 7 classes (Sturges' formula with N = 57 gives a value between 6 and 7, which
rounds up to 7).
3. Class width: $W = \dfrac{67}{7} = 9.57 \approx 10$ (rounded up)
4. Take 10 as the starting point (less than or equal to the smallest number) and add the class width
repeatedly to get the subsequent lower class limits. These are 10, 20, 30, 40, 50, 60, 70.
5. To find the upper class limit of the first class, subtract one unit of measurement from the
lower limit of the second class, i.e. 20 – 1 = 19. Then continue to add the class width to this
upper limit to get the rest of the upper limits. Thus the upper class limits are 19, 29, 39,
49, 59, 69, 79.
6. Compute the class boundaries: $\mathrm{LCB} = \mathrm{LCL} - \tfrac{1}{2}U$ and $\mathrm{UCB} = \mathrm{UCL} + \tfrac{1}{2}U$,
where U is the smallest difference between values in the raw data, usually equal to one.
$\mathrm{LCB}_1 = \mathrm{LCL}_1 - \tfrac{1}{2}U = 10 - 0.5 = 9.5$ and $\mathrm{UCB}_1 = \mathrm{UCL}_1 + \tfrac{1}{2}U = 19 + 0.5 = 19.5$. To get the
subsequent class boundaries, add the class width to both the lower and upper class boundaries.
7. Tally the data and find the frequencies by counting the number of observations belonging to
each class.
8. Calculate the cumulative frequencies (optional). Finally, the resulting frequency distribution
looks like:
Table 2.2
Weight (lb),    Class          Tally                      Frequency   Cumulative frequency   Cumulative frequency
class limits    boundaries                                            (less than type)       (more than type)
10 – 19         9.5 – 19.5     /////                      5           5                      57
20 – 29         19.5 – 29.5    ///// ///// ///// ////     19          24                     52
30 – 39         29.5 – 39.5    ///// /////                10          34                     33
40 – 49         39.5 – 49.5    ///// ///// ///            13          47                     23
50 – 59         49.5 – 59.5    ////                       4           51                     10
60 – 69         59.5 – 69.5    ////                       4           55                     6
70 – 79         69.5 – 79.5    //                         2           57                     2
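
The frequency and cumulative frequency columns of Table 2.2 can be recomputed from the raw weights with the following Python sketch. The class limits are the ones chosen in the example; the variable names are chosen for this illustration only.

```python
from itertools import accumulate

# Weights (lb) of the 57 children from the example above.
weights = [68, 63, 42, 27, 30, 36, 28, 32, 79, 27,
           22, 23, 24, 25, 44, 65, 43, 25, 74, 51,
           36, 42, 28, 31, 28, 25, 45, 12, 57, 51,
           12, 32, 49, 38, 42, 27, 31, 50, 38, 21,
           16, 24, 69, 47, 23, 22, 43, 27, 49, 28,
           23, 19, 46, 30, 43, 49, 12]

width, unit = 10, 1
lower_limits = [10, 20, 30, 40, 50, 60, 70]
upper_limits = [lcl + width - unit for lcl in lower_limits]

# Frequency of a class: number of observations between its limits.
frequencies = [sum(1 for w in weights if lcl <= w <= ucl)
               for lcl, ucl in zip(lower_limits, upper_limits)]

# Less than type: running total from the first class downwards.
less_than = list(accumulate(frequencies))
# More than type: running total from the last class upwards.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for row in zip(lower_limits, upper_limits, frequencies, less_than, more_than):
    lcl, ucl, f, cf_less, cf_more = row
    print(f"{lcl} - {ucl}  {lcl - unit / 2} - {ucl + unit / 2}  "
          f"{f:>3} {cf_less:>3} {cf_more:>3}")
```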

2.4. Diagrammatic and Graphic Presentation of Data

The data presented in a frequency distribution can also be displayed diagrammatically
or graphically.
Diagrams and graphs:
• are techniques for presenting data in visual displays using geometric figures;
• are visual aids which give a bird’s-eye view of a given set of numerical data;
• have greater attraction than mere figures (numbers);
• facilitate comparison of data;
• are easily understood by anyone, even without a statistical background.
Usually diagrams are appropriate for presenting discrete data, whereas graphs are appropriate
for presenting continuous data.
There are three common diagrammatic presentations of data: bar-diagrams/charts, pie-charts
and pictograms, and three common graphic presentations of data: the histogram, the frequency
polygon and the cumulative frequency polygon (ogive).

2.4.1. Bar-diagrams/ Bar-charts

• A bar-diagram is a series of equally spaced bars of equal width, with the height of each
bar representing the magnitude or frequency of observations in each group.
• Bar-diagrams are usually used to represent a one-way or simple frequency distribution.
• Bar-diagrams can be drawn either horizontally or vertically. Usually horizontal bar-diagrams
are used for qualitatively classified data, whereas vertical bar-diagrams are used
for quantitatively classified data.
Example: The mean serum cholesterol level of males and females

Figure2.1: Bar chart of serum cholesterol level of males and females

2.4.2. Pie-charts

A pie-chart is a circle that is divided into sectors or wedges according to the percentage of
frequencies in each category of the distribution. The angle of the sector of a class is obtained
by multiplying the ratio of the frequency of the class to the total frequency by 360°, i.e.
$\text{sector angle of a class} = \dfrac{\text{frequency of the class}}{\text{total frequency}} \times 360^\circ$
$\text{percentage of a class} = \dfrac{\text{frequency of the class}}{\text{total frequency}} \times 100\%$
• Note that pie-charts are usually used for depicting nominal level data.

Example: 57 medical doctors graduated from black line generalized hospital; they are
distributed among seven selected hospitals, say A, B, C, D, E, F and G.
Table 2.3
Selected     No. of doctors   Angle       Percentage
hospital     assigned         (degrees)   (%)
A            5                31.6        8.8
B            19               120.0       33.3
C            10               63.2        17.5
D            13               82.1        22.8
E            4                25.3        7.0
F            4                25.3        7.0
G            2                12.6        3.5
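
The angle and percentage columns follow directly from the two formulas above. A minimal Python sketch, with variable names chosen for this illustration only:

```python
# Number of doctors assigned to each hospital (Table 2.3).
doctors = {"A": 5, "B": 19, "C": 10, "D": 13, "E": 4, "F": 4, "G": 2}

total = sum(doctors.values())

# Sector angle = frequency / total frequency * 360 degrees.
angles = {h: count / total * 360 for h, count in doctors.items()}
# Percentage = frequency / total frequency * 100.
percentages = {h: count / total * 100 for h, count in doctors.items()}

for hospital, count in doctors.items():
    print(f"{hospital}  {count:>2}  {angles[hospital]:6.1f}  "
          f"{percentages[hospital]:5.1f}")
```

Because each value is rounded to one decimal place, the printed angles may not add up to exactly 360 and the printed percentages may not add up to exactly 100; the unrounded values do.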

[Figure: pie chart of the number of doctors assigned to hospitals A to G]

2.4.3. Pictograms
In pictograms, we represent the data by means of picture symbols. We choose a
suitable picture to represent a definite number of units in which the variable is measured.
Example: Draw a pictorial diagram to present the following data (number of patients in a
certain hospital for four years).

Year              1992   1993   1994   1995
No. of patients   2000   3000   5000   7000

Let a single picture (☺) represent one thousand patients.

1995  ☺☺☺☺☺☺☺
1994  ☺☺☺☺☺
1993  ☺☺☺
1992  ☺☺
Key: ☺ = 1000 patients
2.4.4. Histogram

A histogram is another way of presenting data, and it is more suitable for frequency
distributions with continuous classes. In drawing a histogram, we put the class boundaries of
each class on the horizontal axis and the corresponding frequencies on the vertical axis.
Example: Draw a histogram for the grouped frequency distribution above (weight of children).
Table 2.4
Class          Class   Frequency   Cumulative frequency   Cumulative frequency
boundaries     mark                (less than type)       (more than type)
9.5 – 19.5     14.5    5           5                      57
19.5 – 29.5    24.5    19          24                     52
29.5 – 39.5    34.5    10          34                     33
39.5 – 49.5    44.5    13          47                     23
49.5 – 59.5    54.5    4           51                     10
59.5 – 69.5    64.5    4           55                     6
69.5 – 79.5    74.5    2           57                     2

Figure 2.2: Histogram of weight of children
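
Where no plotting tool is at hand, the shape of the histogram can be previewed as text. The sketch below is only a rough stand-in for the figure: it prints the bars sideways, one row per class, so the class boundaries run down the page instead of along the horizontal axis.

```python
# Class boundaries and frequencies from Table 2.4.
boundaries = [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5),
              (49.5, 59.5), (59.5, 69.5), (69.5, 79.5)]
frequencies = [5, 19, 10, 13, 4, 4, 2]

# One '#' per observation; the bar length is the class frequency.
bars = ["#" * f for f in frequencies]

for (lower, upper), bar, f in zip(boundaries, bars, frequencies):
    print(f"{lower:4.1f} - {upper:4.1f} | {bar} {f}")
```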


2.4.5. Frequency Polygon

A frequency polygon is a line graph drawn by taking the frequencies of the classes along the
vertical axis and their respective class marks along the horizontal axis. The plotted points
are then joined by straight line segments.
Example: Present the data in the previous example (weight of children) using a frequency
polygon.

Figure 2.3: Frequency polygon of weight of children
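
The points of the frequency polygon are the pairs (class mark, frequency). A minimal Python sketch that computes them from the class boundaries of Table 2.4, with names chosen for this illustration only:

```python
# Class boundaries and frequencies from Table 2.4.
boundaries = [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5),
              (49.5, 59.5), (59.5, 69.5), (69.5, 79.5)]
frequencies = [5, 19, 10, 13, 4, 4, 2]

# Class mark = (UCB + LCB) / 2, the midpoint of each class.
class_marks = [(lcb + ucb) / 2 for lcb, ucb in boundaries]

# Points to plot: class mark on the horizontal axis, frequency on the vertical.
polygon_points = list(zip(class_marks, frequencies))

for mark, f in polygon_points:
    print(mark, f)
```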

2.4.6. Cumulative Frequency Polygon (Ogive)

A cumulative frequency polygon can be drawn on a less than or a more than cumulative frequency
basis. Place the class boundaries along the horizontal axis and the corresponding cumulative
frequencies (either less than or more than type) along the vertical axis.
Then join the plotted points by a free-hand curve.
Example: The data in the previous example can be presented using either a less than or a
more than cumulative frequency polygon, as given in (i) and (ii) below respectively.
(i) Less than type cumulative frequency polygon
Figure 2.4: Less than cumulative frequency polygon of weight of children

(ii) More than type cumulative frequency polygon

Figure 2.5: More than cumulative frequency polygon of weight of children
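
The points of the two ogives can be listed with the sketch below. It follows the definitions given earlier: less than type cumulative frequencies are paired with upper class boundaries, and more than type with lower class boundaries. Adding a zero point at the first lower boundary (less than type) and at the last upper boundary (more than type) is an assumption of this sketch, a common drawing convention that is not stated in these notes.

```python
from itertools import accumulate

# Class boundaries and frequencies from Table 2.4.
boundaries = [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5),
              (49.5, 59.5), (59.5, 69.5), (69.5, 79.5)]
frequencies = [5, 19, 10, 13, 4, 4, 2]

less_than = list(accumulate(frequencies))
more_than = list(accumulate(reversed(frequencies)))[::-1]

# Less than ogive: (upper class boundary, cumulative frequency),
# starting from zero at the first lower boundary (assumed convention).
less_than_points = [(boundaries[0][0], 0)] + [
    (ucb, cf) for (_, ucb), cf in zip(boundaries, less_than)]

# More than ogive: (lower class boundary, cumulative frequency),
# ending at zero at the last upper boundary (assumed convention).
more_than_points = [
    (lcb, cf) for (lcb, _), cf in zip(boundaries, more_than)
] + [(boundaries[-1][1], 0)]

print(less_than_points)
print(more_than_points)
```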


Exercise
1. Given the following raw data:
62 50 57 58 51 53 62 64 60 61
60 51 64 55 55 52 60 65 58 60
59 52 63 56 56 58 64 63 62 60
58 54 62 54 54 60 65 60 62 59
56 63 52 53 62 53 61 61 59 65
a) Construct simple frequency distribution table.
b) Construct grouped frequency distribution table.
2. If class mid-points in a frequency distribution of a group of persons are 25, 32, 39, 46, 53, 60, 67,
74 and 81, find (a) size of the class interval, and (b) the class boundaries.
3. In a sample study about coffee drinking habits in two villages A and B, the following information
was recorded:
A: Females were 40%. Total coffee drinkers were 45% and male non-coffee drinkers were 20%.
B: Males were 55%, male non-coffee drinkers were 30% and female coffee drinkers were 15%.
Present the above information in a tabular form.
4. The following table shows the marital status of males and females (18 years and older) in a certain
city. Draw a pie chart separately for males and females to display the data.

Marital status   Male (percent of total)   Female (percent of total)
Single           21                        16
Married          65                        73
Widowed          9                         4
Divorced         5                         7

5. Prepare (a) a histogram, (b) a frequency polygon and (c) an ogive for the following frequency
distribution of marks in a final examination.

Class       10-19   20-29   30-39   40-49   50-59   60-69   70-79   80-89
Frequency   6       12      20      14      12      8       6       2
