Chapter 04

The document discusses the importance of measures of dispersion in understanding the representativeness of data distributions, emphasizing that central tendency alone does not provide a complete picture. It outlines two types of dispersion measures: absolute and relative, detailing various methods such as range, quartile deviation, mean deviation, and standard deviation, along with their advantages and disadvantages. The document also highlights the significance of standard deviation in statistical analysis and its applications in various fields.

Uploaded by

nasifjamil.khan
Copyright
© © All Rights Reserved

Dispersion, Nature and Shape of Frequency Distribution 65

CHAPTER IV

DISPERSION, NATURE AND SHAPE OF FREQUENCY DISTRIBUTION

Central tendency is one characteristic of a distribution. Measures of central tendency give an idea of the central value, or location, of the distribution. But central tendency is not the only characteristic of a distribution: two distributions may differ even though they share the same central value. For example, the data set 0, 10 and 20 has 10 as both its mean and its median, and the mean and median of the series 5, 10, 15 are also 10, yet the deviations of the values from their mean are not the same in the two sets. The deviation of observations from their mean is called dispersion. A measure of dispersion (or variation) measures the extent to which individual values vary or deviate from the central value. It therefore indicates how well the central value represents the data.

Characteristics of an Ideal Measure of Dispersion:

The following are the requisites for an ideal measure of dispersion:
• It should be rigidly defined.
• It should be easy to understand and easy to calculate.
• It should be based on all the observations.
• It should be suitable for further algebraic treatment.
• It should be least affected by sampling fluctuation.
• It should be least affected by extreme values.

Importance of Measuring Dispersion:

Dispersion is an important characteristic of a distribution. Measures of dispersion are widely used for the accurate and efficient analysis of data. Their importance is as follows:
• A measure of dispersion is needed to judge the representativeness of the observations of a distribution; the representativeness of the mean cannot be judged without knowledge of the dispersion.
• Measures of dispersion help control the variation in data.
• Measures of dispersion give a comparative picture of different distributions.
• Measures of dispersion help control the quality of industrial products.
• Measures of dispersion are important for time series data such as rainfall and temperature, where central values are less important.

Measures of dispersion may be divided into two broad types:
(a) Absolute Measures and
(b) Relative Measures.

(a) Absolute Measures:
1. Range 2. Quartile Deviation 3. Mean Deviation and 4. Standard Deviation
• Absolute measures of dispersion retain the unit of measurement of the variable.

(b) Relative Measures:
1. Coefficient of Range 2. Coefficient of Quartile Deviation 3. Coefficient of Mean Deviation 4. Coefficient of Variation.
• Relative measures of dispersion have no unit because they are ratios of absolute measures to corresponding values expressed in the same unit.
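The chapter's opening example (the series 0, 10, 20 and 5, 10, 15) can be checked with a few lines of Python. This is only an illustrative sketch using the standard library; the helper name `deviations_from_mean` is an assumption introduced here and does not come from the text.

```python
from statistics import mean, median

# The two data sets quoted in the opening paragraph of the chapter.
series_a = [0, 10, 20]
series_b = [5, 10, 15]


def deviations_from_mean(values):
    """Return the signed deviation of every value from the arithmetic mean."""
    centre = mean(values)
    return [x - centre for x in values]


for label, data in (("A", series_a), ("B", series_b)):
    # Same mean and median, but the deviations differ in size.
    print(label, mean(data), median(data), deviations_from_mean(data))
```

Both series report the same mean and median, while the deviations of the first series are twice as large as those of the second, which is exactly the information a measure of central tendency cannot convey.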
66 Methods of Statistics Dispersion, Nature and Shape of Frequency Distribution 67

4.1. Absolute Measures of Dispersion:

Range:
Range is the absolute difference between the highest and lowest observations of a distribution. When a frequency distribution is arranged in order of magnitude, the range is the absolute difference between the mid-values of the last class and the first class.

Symbolically, $\text{Range} = |X_{\max} - X_{\min}| = |X_M - X_L|$

Range is the simplest and a crude measure of dispersion. It is based on the two extreme observations only.

Advantages of Range:
• It is very easy to understand and easy to calculate.
• It gives a quick idea of the variability of a set of data.
• It is based on the extreme observations only, so no detailed information is required.
• It is the simplest of all measures of dispersion.

Disadvantages of Range:
• It is very much affected by extreme values.
• It conveys information about only the two extreme values in a set of data.
• It cannot be computed for a data set having open-ended class intervals.

Uses of Range:
• Range is used in weather forecasting, for example to report the percentage of humidity in the air.
• It is used in reporting daily market prices of commodities.
• It is used in statistical quality control.

Quartile Deviation:
The quartile deviation is half of the difference between the upper quartile ($Q_3$) and the lower quartile ($Q_1$):

$$QD = \frac{Q_3 - Q_1}{2}$$

It is also known as the semi-interquartile range.

Advantages of Quartile Deviation:
• It is an easily understandable, location-based measure.
• It is superior to other measures in the sense that extreme values cannot affect the quartile deviation.
• For distributions with open-ended class intervals no other measure can be computed, but it is possible to compute the quartile deviation.

Disadvantages of Quartile Deviation:
• It is not a good measure of dispersion because it does not measure the deviation from any central value of the distribution.
• It is not based upon all the observations.
• It is more affected by sampling fluctuations.
• It is not suitable for further algebraic treatment.

Uses of Quartile Deviation:
• Quartile deviation is a location-based measure and can be profitably used where a rough estimate of the variation is desired.
• It is a suitable measure of dispersion when the frequency distribution has open-ended class intervals.

Mean Deviation:
The arithmetic mean of the absolute deviations of the given observations from their central value is called the mean deviation; it can be measured from the mean, the median or the mode.

The mean deviation of a distribution having observations $x_1, x_2, \dots, x_n$ may be defined as follows:

• Mean deviation from mean or simply mean deviation:
$$MD(\bar{x}) = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$

In the case of a frequency distribution

$$MD(\bar{x}) = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - \bar{x}|; \quad \text{where } N = \sum_{i=1}^{n} f_i$$

• Mean deviation from median:

$$MD(M_e) = \frac{1}{n}\sum |x_i - M_e|$$

In the case of a frequency distribution

$$MD(M_e) = \frac{1}{N}\sum f_i\,|x_i - M_e|$$

• Mean deviation from mode:

$$MD(M_o) = \frac{1}{n}\sum |x_i - M_o|$$

In the case of a frequency distribution

$$MD(M_o) = \frac{1}{N}\sum f_i\,|x_i - M_o|$$

Advantages of Mean Deviation:
• It is based on all the observations.
• It is rigidly defined and easy to understand.
• It is less affected by extreme values than the standard deviation.
• It is suitable for comparative discussion.

Disadvantages of Mean Deviation:
• It cannot be computed for open-ended class intervals.
• It is not amenable to further algebraic treatment.
• It is seldom used in statistical decision making.

Example 4.1:
Computing the mean deviation of the daily wages of a group of farm labourers (given in Example 3.1). The mean, median and mode are respectively $\bar{x} = 66.40$, $M_e = 66.43$, $M_o = 66.67$.

| Mid value ($x_i$) | $f_i$ | $\lvert x_i - \bar{x}\rvert$ | $\lvert x_i - M_e\rvert$ | $\lvert x_i - M_o\rvert$ | $f_i\lvert x_i - \bar{x}\rvert$ | $f_i\lvert x_i - M_e\rvert$ | $f_i\lvert x_i - M_o\rvert$ |
|---|---|---|---|---|---|---|---|
| 52.5 | 5 | 13.9 | 13.93 | 14.17 | 69.5 | 69.65 | 70.85 |
| 57.5 | 10 | 8.9 | 8.93 | 9.17 | 89.0 | 89.30 | 91.70 |
| 62.5 | 25 | 3.9 | 3.93 | 4.17 | 97.5 | 98.25 | 104.25 |
| 67.5 | 35 | 1.1 | 1.07 | 0.83 | 38.5 | 37.45 | 29.05 |
| 72.5 | 15 | 6.1 | 6.07 | 5.83 | 91.5 | 91.05 | 87.45 |
| 77.5 | 7 | 11.1 | 11.07 | 10.83 | 77.7 | 77.49 | 75.81 |
| 82.5 | 3 | 16.1 | 16.07 | 15.83 | 48.3 | 48.21 | 47.49 |
| Total | 100 | | | | 512.0 | 511.40 | 506.60 |

Mean deviation from mean: $MD(\bar{x}) = \frac{1}{N}\sum f_i\,|x_i - \bar{x}| = \frac{512.0}{100} = 5.12$

Mean deviation from median: $MD(M_e) = \frac{1}{N}\sum f_i\,|x_i - M_e| = \frac{511.4}{100} = 5.114$

Mean deviation from mode: $MD(M_o) = \frac{1}{N}\sum f_i\,|x_i - M_o| = \frac{506.6}{100} = 5.066$

Theorem 4.1: Mean Deviation from the Median is the Minimum.

Proof: Let $2n$ be the number of observations, arranged in order of magnitude as $x_1, x_2, \dots, x_k, x_{k+1}, \dots, x_n, x_{n+1}, \dots, x_{2n}$. Because the observations are arranged in order of magnitude, the median ($M_e$) lies between $x_n$ and $x_{n+1}$:

$$x_1 \le x_2 \le \dots \le x_k \le x_{k+1} \le \dots \le x_{n-1} \le x_n \le M_e \le x_{n+1} \le x_{n+2} \le \dots \le x_{2n-1} \le x_{2n}$$
The sum of the absolute deviations of the observations from the median is given by

$$\sum |x_i - M_e| = [(M_e - x_1) + (M_e - x_2) + \dots + (M_e - x_k)] + [(M_e - x_{k+1}) + (M_e - x_{k+2}) + \dots + (M_e - x_n)] + [(x_{n+1} - M_e) + (x_{n+2} - M_e) + \dots + (x_{2n} - M_e)] \qquad (1)$$

Again, the sum of the absolute deviations of the observations from any other value $x_k$ (taking $k \le n$; the case $k > n$ follows in the same way by symmetry) is given by

$$\sum |x_i - x_k| = [(x_k - x_1) + (x_k - x_2) + \dots + (x_k - x_k)] + [(x_{k+1} - x_k) + (x_{k+2} - x_k) + \dots + (x_n - x_k)] + [(x_{n+1} - x_k) + (x_{n+2} - x_k) + \dots + (x_{2n} - x_k)] \qquad (2)$$

Subtracting equation (1) from equation (2), the first brackets contribute $k(x_k - M_e)$, the last brackets contribute $n(M_e - x_k)$, and the middle brackets contribute $\sum_{i=k+1}^{n}(2x_i - x_k - M_e)$. Collecting terms,

$$\sum |x_i - x_k| - \sum |x_i - M_e| = 2\sum_{i=k+1}^{n}(x_i - x_k) \ge 2(n-k)(x_{k+1} - x_k) \ge 0 \quad [\text{since } x_i \ge x_{k+1} \ge x_k \text{ for } i > k]$$

$$\therefore \sum |x_i - x_k| - \sum |x_i - M_e| \ge 0$$

$$\Rightarrow \sum |x_i - x_k| \ge \sum |x_i - M_e|$$

$$\text{or, } \frac{1}{2n}\sum |x_i - x_k| \ge \frac{1}{2n}\sum |x_i - M_e|$$

$$\text{or, } MD(x_k) \ge MD(M_e)$$

i.e., $MD(M_e) \le MD(x_k)$ for all $k$ with $1 \le k \le 2n$.

[Note: This theorem is always true for ungrouped data but may not always be true for a grouped frequency distribution; see Example 4.1 above.]

Standard Deviation:
The arithmetic mean of the squares of the deviations of the observations of a series from their mean is known as the variance. The positive square root of the variance is called the standard deviation. The variance is denoted by $\sigma^2$ and the standard deviation by $\sigma$. The standard deviation may therefore be defined as the root mean square deviation from the mean.

For a set of observations $x_1, x_2, \dots, x_n$ the standard deviation is computed as

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

For frequency distributions,

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}; \quad \text{where } N = \sum_{i=1}^{n} f_i$$

The root mean square deviation from an arbitrary value $a$ is denoted by $s$ and is computed as

$$s = \sqrt{\frac{1}{N}\sum f_i (x_i - a)^2}$$

Standard Error: The standard deviation of the sampling distribution of a statistic (say, the mean) is known as the standard error. It is denoted by SE.

Let $x_1, x_2, \dots, x_n$ be the observations of a sample of size $n$; the standard error of the mean is given by

$$SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$

where $\sigma$ = population standard deviation and $\bar{x}$ = sample mean (statistic).

Advantages of Standard Deviation:
• It is rigidly defined.
• It is based upon all the observations.
• It is less affected by sampling fluctuation.
• It is suitable for further algebraic treatment.
• The standard deviation of a combined series can be obtained if the number of observations, the mean and the standard deviation of each series are known.

Disadvantages of Standard Deviation:
• It is not readily comprehensible; its computation requires a good deal of time and knowledge of mathematics.
• It is affected by extreme values.
• It cannot be computed for distributions having open-ended class intervals.

Uses of Standard Deviation:
Standard deviation is the most useful measure of dispersion, and its use is highly desirable in advanced statistical work. Sampling and the analysis of data are based on the standard deviation. Sampling, correlation analysis, the normal curve of errors, and the comparison of the variability and uniformity of two sets of data, all of which are of great use in statistical work, are analysed in terms of the standard deviation.
Thus the standard deviation is the most important measure of dispersion.

Difference between Mean Deviation and Standard Deviation:
• In computing the mean deviation (M.D.) we ignore the signs of the deviations by taking absolute values; in computing the standard deviation (s.d.) the signs need not be ignored, because the deviations are squared.
• M.D. can be computed from the mean, median or mode, but in computing s.d. we consider only the deviations from the mean.
• M.D. is not suitable for further algebraic treatment, but s.d. is.

Some Properties of Standard Deviation:
1. Standard deviation is independent of change of origin but not of scale.
2. Standard deviation is the least possible root mean square deviation.
3. For two observations, standard deviation is half of the range.

1. Standard Deviation is Independent of Change of Origin but not of Scale.

Proof:
Let $x_1, x_2, \dots, x_n$ be the mid-values of the classes of a frequency distribution, let $f_1, f_2, \dots, f_n$ be their corresponding frequencies, and let $u_i = \frac{x_i - a}{h}$, where $u_i$, $a$ and $h$ are the changed variate, origin and scale respectively.

$$u_i = \frac{x_i - a}{h} \Rightarrow \bar{u} = \frac{\bar{x} - a}{h}$$

Now the standard deviation of the new variable $u$ is

$$\sigma_u = \sqrt{\frac{1}{N}\sum f_i (u_i - \bar{u})^2}$$

$$= \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i \left(\frac{x_i - a}{h} - \frac{\bar{x} - a}{h}\right)^2}$$

$$= \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i \left(\frac{x_i - a - \bar{x} + a}{h}\right)^2}$$

$$= \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i \left(\frac{x_i - \bar{x}}{h}\right)^2} = \frac{1}{h}\sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}$$

$$= \frac{1}{h}\,\sigma_x$$

$$\therefore \sigma_x = h\,\sigma_u$$

The origin $a$ has dropped out and only the scale $h$ remains, so the standard deviation is independent of change of origin but not of scale.
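The result $\sigma_x = h\,\sigma_u$ can be verified numerically. The sketch below is illustrative only: the mid-values, frequencies, origin `a` and scale `h` are hypothetical numbers chosen for the demonstration, and `grouped_sd` is a helper name introduced here.

```python
import math


def grouped_sd(mid_values, freqs):
    """Population standard deviation of a frequency distribution."""
    n_total = sum(freqs)
    mean = sum(f * x for x, f in zip(mid_values, freqs)) / n_total
    variance = sum(f * (x - mean) ** 2 for x, f in zip(mid_values, freqs)) / n_total
    return math.sqrt(variance)


# Hypothetical mid-values and frequencies, used only for illustration.
x = [55, 65, 75, 85, 95]
f = [2, 5, 8, 4, 1]
a, h = 75, 10  # assumed new origin and scale

u = [(xi - a) / h for xi in x]   # u_i = (x_i - a) / h
shifted = [xi - a for xi in x]   # change of origin only

sigma_x = grouped_sd(x, f)
sigma_u = grouped_sd(u, f)

# A shift of origin leaves sigma unchanged; a change of scale divides it by h.
print(sigma_x, grouped_sd(shifted, f), h * sigma_u)
```

All three printed values agree up to floating-point rounding, which is the content of the property proved above.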
2. Standard Deviation is the Least Possible Root Mean Square Deviation.

Proof:
Let $x_1, x_2, \dots, x_n$ be the values of $n$ observations with corresponding frequencies $f_1, f_2, \dots, f_n$, and let $\bar{x}$ be the arithmetic mean of the observations.

We have

$$\sigma_x = \sqrt{\frac{1}{N}\sum f_i (x_i - \bar{x})^2} \quad \text{and} \quad s = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - a)^2}$$

The mean square deviation from an arbitrary value $a$ is given by

$$s^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - a)^2$$

$$\text{or, } N s^2 = \sum_{i=1}^{n} f_i (x_i - a)^2 = \sum_{i=1}^{n} f_i \left[(x_i - \bar{x}) + (\bar{x} - a)\right]^2$$

$$= \sum f_i (x_i - \bar{x})^2 + 2\sum f_i (x_i - \bar{x})(\bar{x} - a) + \sum f_i (\bar{x} - a)^2$$

$$= N\sigma_x^2 + 2(\bar{x} - a)\sum f_i (x_i - \bar{x}) + \sum f_i (\bar{x} - a)^2$$

$$= N\sigma_x^2 + 2(\bar{x} - a)\times 0 + N(\bar{x} - a)^2 \quad [\because \sum f_i (x_i - \bar{x}) = 0]$$

$$\therefore N s^2 = N\sigma_x^2 + \text{a non-negative quantity}$$

$$\Rightarrow N s^2 \ge N\sigma_x^2$$

$$\Rightarrow s^2 \ge \sigma_x^2$$

i.e., $\sigma \le s$, with equality only when $a = \bar{x}$. Proved.

3. For two Observations, Standard Deviation is half of the Range.

Proof:
Let $x_1$ and $x_2$ be two observations. Then $\bar{x} = \frac{x_1 + x_2}{2}$.

We have

$$\sigma^2 = \frac{1}{2}\sum_{i=1}^{2}(x_i - \bar{x})^2 = \frac{1}{2}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2\right]$$

$$= \frac{1}{2}\left[\left(x_1 - \frac{x_1 + x_2}{2}\right)^2 + \left(x_2 - \frac{x_1 + x_2}{2}\right)^2\right]$$

$$= \frac{1}{2}\left[\left(\frac{x_1 - x_2}{2}\right)^2 + \left(\frac{x_2 - x_1}{2}\right)^2\right]$$

$$= \left(\frac{x_1 - x_2}{2}\right)^2 = \left(\frac{|x_1 - x_2|}{2}\right)^2$$

$$\therefore \sigma = \frac{|x_1 - x_2|}{2} = \frac{1}{2}|x_1 - x_2| = \text{half of the range.}$$

• Working Formula of Standard Deviation:

Here,

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum x_i^2 - 2\bar{x}\sum x_i + \sum \bar{x}^2 = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2$$

$$= \sum x_i^2 - 2\left(\frac{\sum x_i}{n}\right)\left(\sum x_i\right) + n\left(\frac{\sum x_i}{n}\right)^2$$

$$= \sum x_i^2 - 2\,\frac{(\sum x_i)^2}{n} + \frac{(\sum x_i)^2}{n}$$

$$= \sum x_i^2 - \frac{(\sum x_i)^2}{n}$$

$$\therefore \sigma^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 = \frac{1}{n}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right] = \frac{1}{n}\sum x_i^2 - \left(\frac{\sum x_i}{n}\right)^2$$

$$\therefore \sigma = \sqrt{\frac{1}{n}\sum x_i^2 - \left(\frac{\sum x_i}{n}\right)^2}$$
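In case of grouped data

$$\sigma = \sqrt{\frac{1}{N}\sum f_i x_i^2 - \left(\frac{\sum f_i x_i}{N}\right)^2}; \quad \text{where } N = \sum_{i=1}^{n} f_i$$

$$= \sqrt{\frac{1}{N}\left[\sum f_i x_i^2 - \frac{(\sum f_i x_i)^2}{N}\right]}$$

Theorem 4.2: Standard Deviation is Smaller Than Range.

Proof:
Let $\bar{x}$ = mean and $R$ = range of $n$ observations $x_1, x_2, \dots, x_n$ that are not all equal.

Since the range is the difference between the highest and lowest observations of the distribution, and the mean lies between them, the range is greater than every deviation $|x_i - \bar{x}|$,

i.e., $R > |x_i - \bar{x}|$

We have

$$\sigma^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$$

$$= \frac{1}{n}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2\right]$$

$$< \frac{1}{n}\left[R^2 + R^2 + \dots + R^2\right] = \frac{1}{n}\cdot nR^2 = R^2$$

$$\therefore \sigma^2 < R^2$$

$$\Rightarrow \sigma < R \quad \text{Proved.}$$

Theorem 4.3: Standard Deviation is Greater than Mean Deviation from Mean.

Proof:
By definition,

$$\sigma^2 = \frac{1}{N}\sum f_i (x_i - \bar{x})^2 \quad \text{and} \quad MD(\bar{x}) = \frac{1}{N}\sum f_i\,|x_i - \bar{x}|$$

Let $|x_i - \bar{x}| = z_i$, so that $\bar{z} = \frac{1}{N}\sum f_i z_i = MD(\bar{x})$.

Now, $(z_i - \bar{z})^2 \ge 0$, since a square is never negative.

$$\Rightarrow \frac{1}{N}\sum f_i (z_i - \bar{z})^2 \ge 0$$

$$\Rightarrow \frac{1}{N}\sum f_i (z_i^2 + \bar{z}^2 - 2 z_i \bar{z}) \ge 0$$

$$\Rightarrow \frac{1}{N}\sum f_i z_i^2 + \frac{1}{N}\sum f_i \bar{z}^2 - \frac{1}{N}\,2\sum f_i z_i \bar{z} \ge 0$$

$$\Rightarrow \frac{1}{N}\sum f_i z_i^2 + \bar{z}^2 - 2\bar{z}^2 \ge 0$$

$$\Rightarrow \frac{1}{N}\sum f_i z_i^2 - \bar{z}^2 \ge 0$$

$$\Rightarrow \frac{1}{N}\sum f_i z_i^2 - \left(\frac{1}{N}\sum f_i z_i\right)^2 \ge 0$$

$$\Rightarrow \frac{1}{N}\sum f_i z_i^2 \ge \left(\frac{1}{N}\sum f_i z_i\right)^2$$

$$\Rightarrow \frac{1}{N}\sum f_i\,|x_i - \bar{x}|^2 \ge \left(\frac{1}{N}\sum f_i\,|x_i - \bar{x}|\right)^2$$

$$\Rightarrow \sqrt{\frac{1}{N}\sum f_i (x_i - \bar{x})^2} \ge \frac{1}{N}\sum f_i\,|x_i - \bar{x}| \quad [\because |x_i - \bar{x}|^2 = (x_i - \bar{x})^2]$$

$$\Rightarrow \sigma \ge MD(\bar{x}) \quad \text{Proved.}$$

• Standard Deviation of first 'n' Natural Numbers.

The first $n$ natural numbers are $1, 2, 3, \dots, n$.

$$\text{Variance, } \sigma^2 = \frac{1}{n}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$

$$= \frac{1}{n}\left[(1^2 + 2^2 + 3^2 + \dots + n^2) - \frac{(1 + 2 + 3 + \dots + n)^2}{n}\right]$$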
$$= \frac{1}{n}\left[\frac{n(n+1)(2n+1)}{6} - \frac{1}{n}\left\{\frac{n(n+1)}{2}\right\}^2\right]$$

$$= \frac{1}{n}\left[\frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)^2}{4}\right]$$

$$= \frac{2n(n+1)(2n+1) - 3n(n+1)^2}{12n}$$

$$= (n+1)\,\frac{2n(2n+1) - 3n(n+1)}{12n}$$

$$= (n+1)\,\frac{n^2 - n}{12n} = (n+1)\,\frac{n(n-1)}{12n}$$

$$= \frac{(n+1)(n-1)}{12} = \frac{n^2 - 1}{12}$$

$$\therefore \sigma = \sqrt{\frac{n^2 - 1}{12}}$$

• Standard Deviation of Combined Series.

Let $x_{1i}$ ($i = 1, 2, \dots, n_1$) and $x_{2j}$ ($j = 1, 2, \dots, n_2$) be two series with means $\bar{x}_1$ and $\bar{x}_2$ and variances $\sigma_1^2$ and $\sigma_2^2$ respectively.

The mean of the combined series is given by

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{N}; \quad \text{where } N = n_1 + n_2$$

By definition,

$$\sigma_1^2 = \frac{1}{n_1}\sum_{i=1}^{n_1}(x_{1i} - \bar{x}_1)^2 \quad \text{and} \quad \sigma_2^2 = \frac{1}{n_2}\sum_{j=1}^{n_2}(x_{2j} - \bar{x}_2)^2$$

$$\Rightarrow n_1\sigma_1^2 = \sum_{i=1}^{n_1}(x_{1i} - \bar{x}_1)^2 \quad \text{and} \quad n_2\sigma_2^2 = \sum_{j=1}^{n_2}(x_{2j} - \bar{x}_2)^2$$

The variance of the combined series is

$$\sigma^2 = \frac{1}{N}\sum_{k=1}^{N}(x_k - \bar{x})^2$$

$$\text{or, } N\sigma^2 = \sum_{k=1}^{N}(x_k - \bar{x})^2 = \sum_{i=1}^{n_1}(x_{1i} - \bar{x})^2 + \sum_{j=1}^{n_2}(x_{2j} - \bar{x})^2$$

Now,

$$\sum_{i=1}^{n_1}(x_{1i} - \bar{x})^2 = \sum\left[(x_{1i} - \bar{x}_1) + (\bar{x}_1 - \bar{x})\right]^2$$

$$= \sum\left[(x_{1i} - \bar{x}_1)^2 + (\bar{x}_1 - \bar{x})^2 + 2(x_{1i} - \bar{x}_1)(\bar{x}_1 - \bar{x})\right]$$

$$= \sum (x_{1i} - \bar{x}_1)^2 + \sum (\bar{x}_1 - \bar{x})^2 + 2(\bar{x}_1 - \bar{x})\sum (x_{1i} - \bar{x}_1)$$

$$= n_1\sigma_1^2 + n_1 d_1^2 + 2 d_1 \sum (x_{1i} - \bar{x}_1); \quad \text{putting } d_1 = \bar{x}_1 - \bar{x}$$

$$= n_1\sigma_1^2 + n_1 d_1^2 + 0; \quad [\text{since } \sum (x_{1i} - \bar{x}_1) = 0]$$

$$= n_1(\sigma_1^2 + d_1^2)$$

Similarly, we get

$$\sum_{j=1}^{n_2}(x_{2j} - \bar{x})^2 = n_2(\sigma_2^2 + d_2^2), \quad \text{where } d_2 = \bar{x}_2 - \bar{x}$$

$$\therefore N\sigma^2 = n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)$$

$$\Rightarrow \sigma^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2}{N} \qquad (i)$$

$$= \frac{n_1\sigma_1^2 + n_2\sigma_2^2 + n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2}{n_1 + n_2}$$

$$\therefore \sigma = \sqrt{\frac{n_1\sigma_1^2 + n_2\sigma_2^2 + n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2}{n_1 + n_2}}$$

Alternative Way:

$$d_1 = \bar{x}_1 - \bar{x} = \bar{x}_1 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{n_2(\bar{x}_1 - \bar{x}_2)}{n_1 + n_2}$$
$$\text{Similarly, } d_2 = \frac{n_1(\bar{x}_2 - \bar{x}_1)}{n_1 + n_2}$$

Putting the values of $d_1$ and $d_2$ in (i), we get after simplification

$$\sigma = \sqrt{\frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2} + \frac{n_1 n_2}{(n_1 + n_2)^2}(\bar{x}_1 - \bar{x}_2)^2} \qquad (ii)$$

Using (i) above, the relationship can be generalized for $k$ sets to

$$\sigma^2 = \frac{(n_1\sigma_1^2 + n_2\sigma_2^2 + \dots + n_k\sigma_k^2) + (n_1 d_1^2 + n_2 d_2^2 + \dots + n_k d_k^2)}{N} = \frac{\sum n_i\sigma_i^2 + \sum n_i d_i^2}{\sum n_i}$$

$$\text{i.e., } \sigma = \sqrt{\frac{\sum n_i\sigma_i^2 + \sum n_i d_i^2}{\sum n_i}}$$

where $n_i$ is the size of the $i$th set, $\bar{x}_i$ and $\sigma_i^2$ are the mean and variance respectively of the $i$th set, $d_i = \bar{x}_i - \bar{x}$, and $\sigma^2$ is the variance of the combined set.

Example 4.2:
The frequency distribution of the weight of tomatoes (Example 2.2) is reproduced below:

| Weights | 50-60 | 60-70 | 70-80 | 80-90 | 90-100 | 100-110 | 110-120 |
|---|---|---|---|---|---|---|---|
| No. of tomatoes | 5 | 9 | 13 | 20 | 19 | 9 | 5 |

Calculate the standard deviation by the direct method and the indirect method.

Solution:

Direct Method:

| Class interval | Frequency $f_i$ | Mid value of class $x_i$ | $f_i x_i$ | $f_i x_i^2$ |
|---|---|---|---|---|
| 50-60 | 5 | 55 | 275 | 15125 |
| 60-70 | 9 | 65 | 585 | 38025 |
| 70-80 | 13 | 75 | 975 | 73125 |
| 80-90 | 20 | 85 | 1700 | 144500 |
| 90-100 | 19 | 95 | 1805 | 171475 |
| 100-110 | 9 | 105 | 945 | 99225 |
| 110-120 | 5 | 115 | 575 | 66125 |
| Total | N = 80 | | 6860 | 607600 |

$$\text{Standard deviation } \sigma = \sqrt{\frac{1}{N}\left[\sum f_i x_i^2 - \frac{(\sum f_i x_i)^2}{N}\right]} = \sqrt{\frac{1}{80}\left[607600 - \frac{(6860)^2}{80}\right]} = \sqrt{\frac{19355}{80}} = 15.554$$

Indirect Method:
[We change the origin to x = 85 and the scale by dividing by 10.]

| Class interval | Mid value of class $x_i$ | Frequency $f_i$ | $u_i = \frac{x_i - 85}{10}$ | $f_i u_i$ | $f_i u_i^2$ |
|---|---|---|---|---|---|
| 50-60 | 55 | 5 | -3 | -15 | 45 |
| 60-70 | 65 | 9 | -2 | -18 | 36 |
| 70-80 | 75 | 13 | -1 | -13 | 13 |
| 80-90 | 85 | 20 | 0 | 0 | 0 |
| 90-100 | 95 | 19 | 1 | 19 | 19 |
| 100-110 | 105 | 9 | 2 | 18 | 36 |
| 110-120 | 115 | 5 | 3 | 15 | 45 |
| Total | | N = 80 | | 6 | 194 |
$$\sigma_u = \sqrt{\frac{1}{N}\left[\sum f_i u_i^2 - \frac{(\sum f_i u_i)^2}{N}\right]} = \sqrt{\frac{1}{80}\left[194 - \frac{(6)^2}{80}\right]} = \sqrt{\frac{1}{80}(193.55)} = 1.5554$$

$$\sigma_x = h\,\sigma_u = 10 \times 1.5554 = 15.554$$

[Note: The second method is generally known as the short-cut method. In the present age of electronic calculators it is no longer a short cut; rather, it is lengthier and more time consuming. That is why it is termed the indirect method here. However, the method is sometimes useful when the observations of the distribution are large.]

Example 4.3:
A student, while calculating the mean and standard deviation of 20 observations, obtained the mean as 68 and the standard deviation as 8. At the time of checking it was found that he had copied 96 instead of 69. What would be the actual values of the mean and standard deviation?

Solution: Here, $n = 20$, $\bar{x} = 68$ and $\sigma = 8$.

We know, $\bar{x} = \frac{\sum x_i}{n} \Rightarrow \sum x_i = n\bar{x} = 20 \times 68 = 1360$

Since the student copied 96 instead of 69, the actual sum of the observations is

$$\sum x_i = 1360 - 96 + 69 = 1333$$

$$\therefore \text{Actual mean, } \bar{x} = \frac{1333}{20} = 66.65$$

Again we know, $\sigma^2 = \frac{1}{n}\sum x_i^2 - \bar{x}^2$

$$\Rightarrow \sum x_i^2 = n(\sigma^2 + \bar{x}^2) = 20\,(8^2 + 68^2) = 93760$$

But the actual $\sum x_i^2 = 93760 - 96^2 + 69^2 = 89305$

$\therefore$ The actual standard deviation is

$$\sigma = \sqrt{\frac{89305}{20} - (66.65)^2} = \sqrt{4465.25 - 4442.2225} = 4.80 \text{ (approx.)}$$

Example 4.4:
Two sets of data having 200 and 250 observations have means 25 and 15 respectively and standard deviations 3 and 4 respectively. If the two sets are combined, what will be the mean and standard deviation of the combined set?

Solution: Given that

$n_1 = 200$, $\bar{x}_1 = 25$, $\sigma_1 = 3$ and
$n_2 = 250$, $\bar{x}_2 = 15$, $\sigma_2 = 4$

Let the mean and standard deviation of the combined set be $\bar{x}$ and $\sigma$ respectively.

We know that the combined mean for two sets of observations is

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{200 \times 25 + 250 \times 15}{200 + 250} = \frac{8750}{450} = 19.44$$

Again, the combined standard deviation for two sets of observations is

$$\sigma = \sqrt{\frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2} + \frac{n_1 n_2}{(n_1 + n_2)^2}(\bar{x}_1 - \bar{x}_2)^2}$$

$$= \sqrt{\frac{200 \times 3^2 + 250 \times 4^2}{200 + 250} + \frac{200 \times 250}{(200 + 250)^2}(25 - 15)^2}$$

$$= \sqrt{\frac{5800}{450} + \frac{5000000}{202500}} = \sqrt{12.89 + 24.69} = 6.13$$
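Examples 4.3 and 4.4 can be reproduced with two small helpers. This is a sketch under stated assumptions: the function names `corrected_mean_sd` and `combined_mean_sd` are introduced here for illustration, and the second one implements the general k-set formula built from relation (i), not the two-set shortcut (ii).

```python
import math


def corrected_mean_sd(n, mean, sd, wrong, right):
    """Recompute mean and sd after replacing one miscopied value (Example 4.3)."""
    total = n * mean - wrong + right
    total_sq = n * (sd ** 2 + mean ** 2) - wrong ** 2 + right ** 2
    new_mean = total / n
    return new_mean, math.sqrt(total_sq / n - new_mean ** 2)


def combined_mean_sd(sizes, means, sds):
    """Pool k sets: sigma^2 = sum(n_i * (sigma_i^2 + d_i^2)) / sum(n_i)."""
    total = sum(sizes)
    mean = sum(n * m for n, m in zip(sizes, means)) / total
    variance = sum(n * (s ** 2 + (m - mean) ** 2)
                   for n, m, s in zip(sizes, means, sds)) / total
    return mean, math.sqrt(variance)


mean_43, sd_43 = corrected_mean_sd(20, 68, 8, wrong=96, right=69)
mean_44, sd_44 = combined_mean_sd([200, 250], [25, 15], [3, 4])

print(round(mean_43, 2), round(sd_43, 2))  # Example 4.3
print(round(mean_44, 2), round(sd_44, 2))  # Example 4.4
```

Rounded to two decimals, the results match the worked answers above (66.65 and 4.80 for Example 4.3, 19.44 and 6.13 for Example 4.4).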
4.2. Relative Measures of Dispersion:

• Coefficient of Range: When the range is divided by the sum of the highest and lowest items of the data and expressed as a percentage, we get the coefficient of range (CR).

$$\text{Thus, } CR = \frac{x_m - x_l}{x_m + x_l} \times 100\%$$

where $x_m$ = the highest value of the data and $x_l$ = the lowest value of the data.

• Coefficient of Quartile Deviation: When the difference of $Q_3$ and $Q_1$ is divided by their sum and expressed as a percentage, we get the coefficient of quartile deviation (C.Q.D.).

$$\text{Thus, } CQD = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\%$$

where $Q_3$ and $Q_1$ are the upper and lower quartiles respectively.

• Coefficient of Mean Deviation:

$$\text{CMD based upon mean, } CMD(\bar{x}) = \frac{MD(\bar{x})}{\bar{x}} \times 100\%$$

$$\text{CMD based upon median, } CMD(M_e) = \frac{MD(M_e)}{M_e} \times 100\%$$

$$\text{CMD based upon mode, } CMD(M_o) = \frac{MD(M_o)}{M_o} \times 100\%$$

• Coefficient of Variation:
The coefficient of variation of a set of data is the ratio of the standard deviation to the mean, expressed as a percentage.

$$\text{Thus, } C.V. = \frac{\sigma_x}{\bar{x}} \times 100\%$$

[Note: To compare the variability of two series, we calculate the C.V. of each series. The series having the greater C.V. is said to be more variable (unstable) than the other, and the series having the smaller C.V. is said to be more consistent (stable, homogeneous) than the other. Thus the C.V. is of great practical significance and is the best measure for comparing the variability of two or more series.]

4.3. Moments:

Moments are constants which are used to determine some characteristics (e.g., nature, shape etc.) of frequency distributions. Moments about the mean are called central moments and those about an arbitrary value (other than the mean) are known as raw moments.

If $x_1, x_2, \dots, x_n$ occur with frequencies $f_1, f_2, \dots, f_n$ respectively, then the $r$th central moment is given by

$$\mu_r = \frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^r}{N}; \quad \text{where } N = \sum f_i;\ r = 1, 2, 3, 4, \dots$$

In particular, when $r = 0$:

$$\mu_0 = \frac{1}{N}\sum f_i (x_i - \bar{x})^0 = \frac{1}{N}\sum f_i = \frac{1}{N}(N) = 1$$

1st central moment, when $r = 1$:

$$\mu_1 = \frac{1}{N}\sum f_i (x_i - \bar{x})^1 = 0, \quad \text{since } \sum f_i (x_i - \bar{x}) = 0$$

[$\mu_1$ for any distribution is zero]

2nd central moment, when $r = 2$:

$$\mu_2 = \frac{1}{N}\sum f_i (x_i - \bar{x})^2 = \sigma^2$$

[The 2nd central moment $\mu_2$ is the variance]

3rd central moment, when $r = 3$:

$$\mu_3 = \frac{1}{N}\sum f_i (x_i - \bar{x})^3$$

4th central moment, when $r = 4$:

$$\mu_4 = \frac{1}{N}\sum f_i (x_i - \bar{x})^4 \quad \text{etc.}$$

• Raw Moment:
The $r$th raw moment about any arbitrary value 'a' is defined as
$$\mu'_r = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - a)^r$$

The $r$th raw moment about the origin ($a = 0$) is

$$\mu'_r = \frac{1}{N}\sum f_i x_i^r$$

• When $r = 1$,

$$\mu'_1 = \frac{1}{N}\sum f_i x_i = \bar{x}$$

[The first raw moment $\mu'_1$ is the arithmetic mean]

$$\mu'_2 = \frac{1}{N}\sum f_i x_i^2, \quad \mu'_3 = \frac{1}{N}\sum f_i x_i^3, \quad \mu'_4 = \frac{1}{N}\sum f_i x_i^4 \quad \text{etc.}$$

• Relation Between Central Moments and Raw Moments:
($r$th central moment in terms of raw moments)

$$\mu_r = \frac{1}{N}\sum f_i (x_i - \bar{x})^r$$

$$= \frac{1}{N}\sum f_i\left[x_i^r - \binom{r}{1}x_i^{r-1}\bar{x} + \binom{r}{2}x_i^{r-2}\bar{x}^2 - \binom{r}{3}x_i^{r-3}\bar{x}^3 + \dots + (-1)^{r-1}\binom{r}{r-1}x_i\,\bar{x}^{r-1} + (-1)^r\bar{x}^r\right]$$

$$= \frac{1}{N}\sum f_i x_i^r - \binom{r}{1}\frac{1}{N}\sum f_i x_i^{r-1}\,\bar{x} + \binom{r}{2}\frac{1}{N}\sum f_i x_i^{r-2}\,\bar{x}^2 - \binom{r}{3}\frac{1}{N}\sum f_i x_i^{r-3}\,\bar{x}^3 + \dots + (-1)^{r-1}\binom{r}{r-1}\frac{1}{N}\sum f_i x_i\,\bar{x}^{r-1} + (-1)^r\frac{1}{N}\sum f_i\,\bar{x}^r$$

Putting $\bar{x} = \mu'_1$, the 1st raw moment about the origin, we get

$$\mu_r = \mu'_r - \binom{r}{1}\mu'_{r-1}\mu'_1 + \binom{r}{2}\mu'_{r-2}(\mu'_1)^2 - \binom{r}{3}\mu'_{r-3}(\mu'_1)^3 + \dots + (-1)^{r-1}\binom{r}{r-1}\mu'_1(\mu'_1)^{r-1} + (-1)^r(\mu'_1)^r$$

• When $r = 2$

$$\mu_2 = \mu'_2 - \binom{2}{1}\mu'_{2-1}\mu'_1 + \binom{2}{2}\mu'_{2-2}(\mu'_1)^2 = \mu'_2 - 2\mu'_1\mu'_1 + \mu'_0(\mu'_1)^2$$

$$= \mu'_2 - 2(\mu'_1)^2 + (\mu'_1)^2 \quad \left[\text{since } \mu'_0 = \frac{1}{N}\sum f_i x_i^0 = 1\right]$$

$$= \mu'_2 - (\mu'_1)^2$$

• When $r = 3$

$$\mu_3 = \mu'_3 - \binom{3}{1}\mu'_{3-1}\mu'_1 + \binom{3}{2}\mu'_{3-2}(\mu'_1)^2 - \binom{3}{3}\mu'_{3-3}(\mu'_1)^3$$

$$= \mu'_3 - 3\mu'_2\mu'_1 + 3\mu'_1(\mu'_1)^2 - \mu'_0(\mu'_1)^3$$

$$= \mu'_3 - 3\mu'_2\mu'_1 + 3(\mu'_1)^3 - (\mu'_1)^3$$

$$= \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3$$

• When $r = 4$

$$\mu_4 = \mu'_4 - \binom{4}{1}\mu'_{4-1}\mu'_1 + \binom{4}{2}\mu'_{4-2}(\mu'_1)^2 - \binom{4}{3}\mu'_{4-3}(\mu'_1)^3 + \binom{4}{4}\mu'_{4-4}(\mu'_1)^4$$

$$= \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 4\mu'_1(\mu'_1)^3 + \mu'_0(\mu'_1)^4$$

$$= \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4$$

Central Moments are Independent of Change of Origin but not of Scale.

Proof:
Let $x_1, x_2, \dots, x_n$ be the mid-values of the classes of a frequency distribution and let $f_1, f_2, \dots, f_n$ be their corresponding frequencies.

Now the $r$th central moment is

$$\mu_{r(x)} = \frac{1}{N}\sum f_i (x_i - \bar{x})^r$$

We change the origin and scale of $x$ such that

$$u_i = \frac{x_i - a}{h} \Rightarrow \bar{u} = \frac{\bar{x} - a}{h}$$

Now, for the new variate $u$, we have

$$\mu_{r(u)} = \frac{1}{N}\sum f_i (u_i - \bar{u})^r$$

$$= \frac{1}{N}\sum f_i\left(\frac{x_i - a}{h} - \frac{\bar{x} - a}{h}\right)^r$$

$$= \frac{1}{N}\sum f_i\left(\frac{x_i - a - \bar{x} + a}{h}\right)^r = \frac{1}{h^r}\cdot\frac{1}{N}\sum f_i (x_i - \bar{x})^r$$
$$\Rightarrow \mu_{r(u)} = \frac{1}{h^r}\,\mu_{r(x)}$$

$$\Rightarrow \mu_{r(x)} = h^r\,\mu_{r(u)}$$

Hence, central moments are independent of the origin but dependent on the scale. Proved.

• Sheppard's Correction for Moments:
In calculating the moments of a grouped frequency distribution, we assume that all the values within a class interval are located at the mid-value of the class interval. If the distribution is symmetrical or moderately asymmetrical and the class intervals are small (not greater than one-twentieth of the range), this assumption is approximately true. In general, however, the assumption does not hold exactly, and some error, called grouping error, creeps into the calculation of the moments.

W.F. Sheppard proposed that if
(i) the frequency distribution is continuous and
(ii) the frequency tapers off to zero at both ends of the distribution,
the effect of grouping at the mid-points of the intervals can be corrected by the following formulae, known as Sheppard's Corrections:

$$\mu_2 \text{ (corrected)} = \mu_2 - \frac{h^2}{12}$$

$$\mu_3 \text{ (corrected)} = \mu_3$$

$$\mu_4 \text{ (corrected)} = \mu_4 - \frac{h^2}{2}\mu_2 + \frac{7}{240}h^4$$

where $h$ is the length of the class interval.

Example 4.5: The wages per hour of 100 farm labourers are given below:

| Wages (Taka) | 0-5 | 5-10 | 10-15 | 15-20 | 20-25 |
|---|---|---|---|---|---|
| No. of labourers | 10 | 15 | 40 | 25 | 10 |

Compute the first four central moments (use Sheppard's correction for the 2nd and 4th central moments).

Solution:

| Wages (Tk.) | No. of labourers $f_i$ | Mid value $x_i$ | $u_i = \frac{x_i - 12.5}{5}$ | $f_i u_i$ | $f_i u_i^2$ | $f_i u_i^3$ | $f_i u_i^4$ |
|---|---|---|---|---|---|---|---|
| 0-5 | 10 | 2.5 | -2 | -20 | 40 | -80 | 160 |
| 5-10 | 15 | 7.5 | -1 | -15 | 15 | -15 | 15 |
| 10-15 | 40 | 12.5 | 0 | 0 | 0 | 0 | 0 |
| 15-20 | 25 | 17.5 | 1 | 25 | 25 | 25 | 25 |
| 20-25 | 10 | 22.5 | 2 | 20 | 40 | 80 | 160 |
| Total | 100 | | | 10 | 120 | 10 | 360 |

$$\mu'_{1(u)} = \frac{1}{N}\sum f_i u_i = \frac{1}{100} \times 10 = 0.1$$

$$\mu'_{2(u)} = \frac{1}{N}\sum f_i u_i^2 = \frac{1}{100} \times 120 = 1.2$$

$$\mu'_{3(u)} = \frac{1}{N}\sum f_i u_i^3 = \frac{1}{100} \times 10 = 0.1$$

$$\mu'_{4(u)} = \frac{1}{N}\sum f_i u_i^4 = \frac{1}{100} \times 360 = 3.6$$

Now,

$$\mu_{2(u)} = \mu'_{2(u)} - \left(\mu'_{1(u)}\right)^2 = 1.2 - (0.1)^2 = 1.19$$

$$\mu_{3(u)} = \mu'_{3(u)} - 3\mu'_{2(u)}\mu'_{1(u)} + 2\left(\mu'_{1(u)}\right)^3 = 0.1 - 3(1.2)(0.1) + 2(0.1)^3 = -0.258$$

$$\mu_{4(u)} = \mu'_{4(u)} - 4\mu'_{3(u)}\mu'_{1(u)} + 6\mu'_{2(u)}\left(\mu'_{1(u)}\right)^2 - 3\left(\mu'_{1(u)}\right)^4$$

$$= 3.6 - 4(0.1)(0.1) + 6(1.2)(0.1)^2 - 3(0.1)^4$$

$$= 3.6 - 0.04 + 0.072 - 0.0003 = 3.6317$$
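The moments of the coded variable u in Example 4.5 can be recomputed from the general relation between central and raw moments. This is a minimal sketch; `raw_moments` and `central_from_raw` are names introduced here, and the only inputs are the u values and frequencies of the example.

```python
from math import comb


def raw_moments(values, freqs, max_order=4):
    """Raw moments about the origin, mu'_0 .. mu'_max_order."""
    n_total = sum(freqs)
    return [sum(f * v ** r for v, f in zip(values, freqs)) / n_total
            for r in range(max_order + 1)]


def central_from_raw(raw, r):
    """mu_r = sum over j of (-1)^j * C(r, j) * mu'_(r-j) * (mu'_1)^j."""
    return sum((-1) ** j * comb(r, j) * raw[r - j] * raw[1] ** j
               for j in range(r + 1))


# Coded values u_i = (x_i - 12.5) / 5 and frequencies from Example 4.5.
u = [-2, -1, 0, 1, 2]
f = [10, 15, 40, 25, 10]

raw = raw_moments(u, f)
central = {r: central_from_raw(raw, r) for r in (1, 2, 3, 4)}

print(raw)      # mu'_0 .. mu'_4 of u
print(central)  # mu_1 .. mu_4 of u
```

Up to floating-point rounding, the raw moments come out as 1, 0.1, 1.2, 0.1 and 3.6, and the central moments as 0, 1.19, -0.258 and 3.6317, in agreement with the hand computation. Multiplying the r-th central moment by h^r (here h = 5) converts it back to the original wage scale.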
First Four Central Moments of the Original Variable:

$$\mu_{1(x)} = 0$$

$$\mu_{2(x)} = h^2\,\mu_{2(u)} = (5)^2 (1.19) = 29.75$$

$$\mu_{3(x)} = h^3\,\mu_{3(u)} = (5)^3 (-0.258) = -32.25$$

$$\mu_{4(x)} = h^4\,\mu_{4(u)} = (5)^4 (3.6317) = 2269.8125$$

Application of Sheppard's correction for moments:

$$\mu_2 \text{ (corrected)} = \mu_2 - \frac{h^2}{12} = 29.75 - \frac{(5)^2}{12} = 27.667$$

$$\mu_3 \text{ (corrected)} = \mu_3 = -32.25$$

$$\mu_4 \text{ (corrected)} = \mu_4 - \frac{h^2}{2}\mu_2 + \frac{7}{240}h^4 = 2269.8125 - \frac{(5)^2}{2}(29.75) + \frac{7}{240}(5)^4 = 1916.1667$$

4.4. Skewness:

Skewness means lack of symmetry. For an asymmetric distribution it is the departure from symmetry.

Symmetrical Distribution: A distribution is said to be symmetrical if the frequencies are symmetrically distributed about the mean. For symmetrical distributions, values equidistant from the mean have equal frequency. For example, the following distribution is symmetrical about its mean 4.

| x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| f | 12 | 14 | 16 | 18 | 20 | 18 | 16 | 14 | 12 |

Again, for a symmetrical distribution mean = mode = median.

• A distribution is said to be skewed if
i) the mean, median and mode fall at different points,
ii) Q1 and Q3 are not equidistant from the median, and
iii) the curve drawn with the help of the given data is not symmetrical but elongated more to one side.

Skewness may be positive or negative. Skewness is said to be positive if the frequency curve is more elongated to the right side. In this case the mean of the distribution lies to the right of (is greater than) the mode,

i.e., $\bar{x} > M_e > M_o$.

On the other hand, the skewness is negative if the frequency curve is more elongated to the left side. In this case the mean of the distribution lies to the left of (is less than) the mode,

i.e., $M_o > M_e > \bar{x}$.

For symmetrical distributions the mean, median and mode are the same.

Proof*:
Let us consider the following symmetrical continuous frequency distribution with equal class intervals ($x_1 < x_2 < \dots < x_{n+1}$):

Table 4.1: Frequency distribution.

| Class interval | Mid value $y_i$ | Frequency $f_i$ | Cumulative frequency $F_i$ |
|---|---|---|---|
| $x_1 - x_2$ | $y_1$ | $f_1$ | $F_1$ |
| $x_2 - x_3$ | $y_2$ | $f_2$ | $F_2$ |
| $x_3 - x_4$ | $y_3$ | $f_3$ | $F_3$ |
| ⋮ | ⋮ | ⋮ | ⋮ |
| $x_{k-1} - x_k$ | $y_{k-1}$ | $f_{k-1}$ | $F_{k-1}$ |
| $x_k - x_{k+1}$ | $y_k$ | $f_k$ | $F_k$ |
| $x_{k+1} - x_{k+2}$ | $y_{k+1}$ | $f_{k+1}$ | $F_{k+1}$ |
| ⋮ | ⋮ | ⋮ | ⋮ |
| $x_{n-1} - x_n$ | $y_{n-1}$ | $f_{n-1}$ | $F_{n-1}$ |
| $x_n - x_{n+1}$ | $y_n$ | $f_n$ | $F_n$ |

[*Adapted with minor modification from an unpublished article by M. Amirul Islam providing a theoretical proof.]
Since the distribution is symmetrical, we will have $f_1 = f_n,\ f_2 = f_{n-1},\ \dots,\ f_{k-1} = f_{k+1}$, and $f_k$ will be the highest frequency. Let $h$ be the width of each class interval.

From the traditional formula of mode (in which $f_o$ is the frequency of the modal class and $f_1$, $f_2$ are the frequencies of the classes just before and after it) we get,

$$
\begin{aligned}
M_o &= L_o + \frac{f_o - f_1}{2f_o - f_1 - f_2}\,h \\
    &= x_k + \frac{h\,(f_k - f_{k-1})}{2f_k - f_{k-1} - f_{k+1}} && [\text{putting } L_o = x_k,\ f_o = f_k,\ f_1 = f_{k-1} \text{ and } f_2 = f_{k+1}] \\
    &= x_k + \frac{h\,(f_k - f_{k-1})}{2f_k - 2f_{k-1}} && [\text{since } f_{k-1} = f_{k+1}] \\
    &= x_k + \frac{h}{2}
\end{aligned}
$$

Since the distribution is symmetrical we get,

$$f_1 + f_2 + \dots + f_{k-1} + \tfrac{1}{2} f_k = \tfrac{1}{2} f_k + f_{k+1} + f_{k+2} + \dots + f_n$$

$$\Rightarrow \frac{N}{2} = f_1 + f_2 + \dots + f_{k-1} + \tfrac{1}{2} f_k, \quad \text{where } N = \sum_{i=1}^{n} f_i$$

$$\Rightarrow \frac{N}{2} = F_{k-1} + \tfrac{1}{2} f_k \qquad\qquad (1)$$

Again, from the traditional formula of median we get,

$$
\begin{aligned}
M_e &= L_m + \frac{\frac{N}{2} - F_m}{f_m}\,h \\
    &= x_k + \frac{\frac{N}{2} - F_{k-1}}{f_k}\,h && [\text{putting } L_m = x_k,\ f_m = f_k \text{ and } F_m = F_{k-1}] \\
    &= x_k + \frac{(F_{k-1} + \tfrac{1}{2} f_k) - F_{k-1}}{f_k}\,h && [\text{from equation (1)}] \\
    &= x_k + \frac{\tfrac{1}{2} f_k}{f_k}\,h \\
    &= x_k + \frac{h}{2}
\end{aligned}
$$

We know, for a frequency distribution,

$$\text{Arithmetic mean, } \bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i y_i \qquad\qquad (2)$$

For a symmetric distribution

$$
\begin{aligned}
\sum_{i=1}^{n} f_i y_i &= f_1 y_1 + f_2 y_2 + \dots + f_{k-1} y_{k-1} + f_k y_k + f_{k+1} y_{k+1} + \dots + f_n y_n \\
 &= (f_1 y_1 + f_n y_n) + (f_2 y_2 + f_{n-1} y_{n-1}) + \dots + (f_{k-1} y_{k-1} + f_{k+1} y_{k+1}) + f_k y_k \\
 &= f_1 (y_1 + y_n) + f_2 (y_2 + y_{n-1}) + \dots + f_{k-1} (y_{k-1} + y_{k+1}) + f_k y_k \qquad (3)
\end{aligned}
$$

[since $f_1 = f_n,\ f_2 = f_{n-1},\ \dots,\ f_{k-1} = f_{k+1}$]

As the distribution is symmetrical we will also have

$$(y_1 + y_n) = (y_2 + y_{n-1}) = \dots = (y_{k-1} + y_{k+1}) = 2y_k$$

Putting these values in equation (3) we get,

$$
\begin{aligned}
\sum_{i=1}^{n} f_i y_i &= 2y_k (f_1 + f_2 + \dots + f_{k-1}) + f_k y_k \\
 &= 2y_k (f_1 + f_2 + \dots + f_{k-1} + \tfrac{1}{2} f_k) \\
 &= 2y_k \cdot \frac{N}{2} && [\text{from equation (1)}] \\
 &= y_k N
\end{aligned}
$$

Putting this value in equation (2) we get,

$$\text{Arithmetic mean} = \frac{1}{N}\, y_k N = y_k$$

Since $y_k$ is the mid value of the class $(x_k - x_{k+1})$ having class interval $h$, we get

$$y_k = x_k + \frac{h}{2}$$

$$\therefore\ \bar{x} = x_k + \frac{h}{2}$$

Hence, Arithmetic Mean = Median = Mode. Proved.
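The result of the proof can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function name `grouped_mean_median_mode` and the illustrative symmetric frequencies (1, 2, 4, 2, 1 on classes of width 5) are assumptions made for the example. It applies the same traditional formulas for mean, median and mode used in the proof.

```python
def grouped_mean_median_mode(lower_limits, freqs, h):
    """Mean, median and mode of a grouped distribution with equal class width h.

    Hypothetical helper for illustration; it uses the textbook formulas and
    assumes a single modal class whose frequency exceeds both neighbours.
    """
    n_total = sum(freqs)
    mids = [lo + h / 2 for lo in lower_limits]
    mean = sum(f * y for f, y in zip(freqs, mids)) / n_total

    # Median class: first class whose cumulative frequency reaches N/2.
    cum = 0
    for lo, f in zip(lower_limits, freqs):
        if cum + f >= n_total / 2:
            median = lo + (n_total / 2 - cum) / f * h
            break
        cum += f

    # Modal class: the class with the highest frequency.
    k = max(range(len(freqs)), key=lambda i: freqs[i])
    f0 = freqs[k]
    f1 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0
    mode = lower_limits[k] + (f0 - f1) / (2 * f0 - f1 - f2) * h
    return mean, median, mode


# Illustrative symmetric distribution: classes 0-5, 5-10, ..., 20-25.
lower_limits = [0, 5, 10, 15, 20]
freqs = [1, 2, 4, 2, 1]
mean, median, mode = grouped_mean_median_mode(lower_limits, freqs, h=5)
print(mean, median, mode)  # 12.5 12.5 12.5
```

Here the middle class is 10-15, so $x_k + h/2 = 10 + 2.5 = 12.5$, which agrees with all three computed values.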

Position of Mean, Median and Mode :

The position of the arithmetic mean, median and mode of a symmetrical frequency distribution is shown in Figure 4.1.

[Fig. 4.1: frequency curve with $\bar{x}$, $M_o$ and $M_e$ marked]

For distributions of moderate skewness, there is an empirical relationship among the mean, median and mode:

Mean - Mode = 3(Mean - Median)
or, $\bar{x} - M_o = 3(\bar{x} - M_e)$

The positions of the arithmetic mean, median and mode of moderately asymmetrical distributions are shown in Fig. 4.2 and Fig. 4.3.

[Fig. 4.2: frequency curve with $M_o$, $M_e$ and $\bar{x}$ marked, in that order from left to right]

[Fig. 4.3: frequency curve with $\bar{x}$, $M_e$ and $M_o$ marked, in that order from left to right]

Karl Pearson's $\beta$ and $\gamma$ Co-efficients :
Karl Pearson defined the following co-efficients, based upon the first four central moments :

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} \quad \text{and} \quad \beta_2 = \frac{\mu_4}{\mu_2^2}$$

$$\gamma_1 = \pm\sqrt{\beta_1} \quad \text{and} \quad \gamma_2 = \beta_2 - 3$$

Measures of Skewness :
We may compare the nature, shape and size of two or more frequency distributions with the help of measures of skewness. The difference between mean and mode is considered as a measure of skewness. If $\bar{x} > M_e$ the skewness is said to be positive and if $\bar{x} < M_e$, the skewness is said to be negative. Skewness of distributions
having different units of measurement cannot be compared with the help of absolute measures of skewness. That is why relative measures of skewness are widely used.

Relative Measures of Skewness :

(1) Karl Pearson's formula :

$$S_k = \frac{\text{Mean} - \text{Mode}}{\text{s.d.}} = \frac{\bar{x} - M_o}{\sigma}$$

In case it is not possible to find the mode, or if a distribution has more than one mode, the following formula is used to measure skewness :

$$S_k = \frac{3(\text{Mean} - \text{Median})}{\text{s.d.}} = \frac{3(\bar{x} - M_e)}{\sigma}$$

(2) Bowley's formula :

$$S_k = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{(Q_3 - Q_2) + (Q_2 - Q_1)} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2M_e}{Q_3 - Q_1}$$

where $Q_1$, $Q_2$ and $Q_3$ are the 1st, 2nd and 3rd quartiles respectively.

(3) Kelly's formula :

$$S_k = \frac{D_9 + D_1 - 2M_e}{D_9 - D_1} \quad \text{or,} \quad S_k = \frac{P_{90} + P_{10} - 2M_e}{P_{90} - P_{10}}$$

where $D_1$ and $D_9$ are the 1st and 9th deciles and $P_{10}$ and $P_{90}$ are the 10th and 90th percentiles.

(4) Co-efficient of skewness based upon moments :

$$S_k = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}; \quad \text{where } \beta_1 = \frac{\mu_3^2}{\mu_2^3} \text{ and } \beta_2 = \frac{\mu_4}{\mu_2^2}$$

As both $\beta_1$ and $\beta_2$ are always non-negative, the above formula cannot indicate whether the skewness is positive or negative. In such a case the nature of the distribution will depend upon the value of $\mu_3$. If $\mu_3$ is positive, the skewness is considered to be positive, and if $\mu_3$ is negative, the skewness is treated as negative.

4.5. Kurtosis:
Like skewness, kurtosis is also an important shape characteristic of a frequency distribution. Two distributions may both be symmetrical and may have the same variability as measured by standard deviation, yet they may be relatively more or less flat-topped compared to the normal curve (discussed in Chapter VII). This relative flatness of the top, or the degree of peakedness, is called kurtosis and is measured by $\beta_2$. For the normal distribution, $\beta_2 = 3$. Hence the quantity $\beta_2 - 3$ is known as excess of kurtosis or simply kurtosis. On the basis of kurtosis, frequency curves are divided into the following three categories :

1) Leptokurtic ; a curve having a high peak.
2) Platykurtic ; a curve which is flat-topped.
3) Mesokurtic ; a curve which is neither too peaked nor too flat-topped.

For the normal distribution, $\beta_2 = 3$ and $\gamma_2 = 0$. Kurtosis is measured by $\gamma_2 = \beta_2 - 3$.

If a distribution has
(i) $\beta_2 > 3$, it is called leptokurtic
(ii) $\beta_2 < 3$, it is called platykurtic
(iii) $\beta_2 = 3$, it is called mesokurtic
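The relative measures of skewness and the $\beta_2$-based kurtosis categories discussed in this section can be sketched in Python as follows. This is only an illustration: the function names, the tolerance `tol` and all input numbers are hypothetical and do not come from the text.

```python
def pearson_skewness(mean, mode, sd):
    """Karl Pearson's coefficient: (mean - mode) / s.d."""
    return (mean - mode) / sd


def pearson_skewness_from_median(mean, median, sd):
    """Alternative form used when the mode is unavailable or not unique."""
    return 3 * (mean - median) / sd


def bowley_skewness(q1, q2, q3):
    """Bowley's quartile coefficient: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)


def kurtosis_type(mu2, mu4, tol=1e-9):
    """Classify a curve by comparing beta2 = mu4 / mu2**2 with 3."""
    gamma2 = mu4 / mu2 ** 2 - 3  # excess of kurtosis
    if gamma2 > tol:
        return "leptokurtic"
    if gamma2 < -tol:
        return "platykurtic"
    return "mesokurtic"


# Illustrative inputs (hypothetical values, chosen only for the example).
print(pearson_skewness(mean=12.0, mode=10.0, sd=4.0))                # 0.5
print(pearson_skewness_from_median(mean=12.0, median=11.0, sd=4.0))  # 0.75
print(bowley_skewness(q1=10.0, q2=14.0, q3=22.0))                    # positive skew
print(kurtosis_type(mu2=2.0, mu4=16.0))                              # leptokurtic
```

The small tolerance in `kurtosis_type` is a design choice for floating-point input; with exact arithmetic the comparison is simply against $\beta_2 = 3$.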
[Fig. 4.4: Different types of Kurtosis (leptokurtic, mesokurtic and platykurtic curves)]

Example 4.6 : A distribution of short-term agricultural credit disbursement from 10 branches of a bank is given below -

Amount of credit (Lac Taka) :  0-5   5-10   10-15   15-20   20-25
No. of branches             :   1     2       4       2       1

Calculate the first four central moments and the co-efficients of skewness and kurtosis, and thus comment on the shape and nature of the distribution.

Solution :

Amount of    No. of      Mid
credit       branches    value
(lac Tk.)    f_i         x_i      f_i x_i   x_i - x̄   f_i(x_i - x̄)   f_i(x_i - x̄)^2   f_i(x_i - x̄)^3   f_i(x_i - x̄)^4
0-5          1            2.5       2.5      -10        -10             100              -1000             10000
5-10         2            7.5      15.0       -5        -10              50               -250              1250
10-15        4           12.5      50.0        0          0               0                  0                 0
15-20        2           17.5      35.0        5         10              50                250              1250
20-25        1           22.5      22.5       10         10             100               1000             10000
Total        N=10                 125.0        0          0             300                  0             22500

$$\bar{x} = \frac{1}{N}\sum f_i x_i = \frac{1}{10} \times 125.0 = 12.5$$

$$\mu_1 = \frac{1}{N}\sum f_i (x_i - \bar{x}) = \frac{1}{10} \times (0) = 0 \quad (\mu_1 = 0 \text{ always})$$

$$\mu_2 = \frac{1}{N}\sum f_i (x_i - \bar{x})^2 = \frac{1}{10} \times (300) = 30.0$$

$$\mu_3 = \frac{1}{N}\sum f_i (x_i - \bar{x})^3 = \frac{1}{10} \times (0) = 0$$

$$\text{and } \mu_4 = \frac{1}{N}\sum f_i (x_i - \bar{x})^4 = \frac{1}{10} \times (22500) = 2250.0$$

$$\text{Now, } \beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{0}{(30)^3} = 0$$

$$\therefore\ \text{Coefficient of skewness } S_k = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)} = 0$$

Hence the distribution is symmetrical.

$$\text{Again, } \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{2250}{(30)^2} = 2.5 < 3$$

$$\therefore\ \gamma_2 = \beta_2 - 3 = 2.5 - 3 = -0.5$$

Since $\gamma_2 < 0$, the curve is platykurtic.

$\therefore$ The distribution is symmetrical and platykurtic.
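The arithmetic of Example 4.6 can be reproduced with a short Python sketch. The helper name `central_moment` is an assumption introduced for illustration; the mid values and frequencies are those of the example.

```python
import math

mids = [2.5, 7.5, 12.5, 17.5, 22.5]   # class mid values x_i
freqs = [1, 2, 4, 2, 1]               # number of branches f_i


def central_moment(xs, fs, r):
    """r-th central moment of a frequency distribution (hypothetical helper)."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n


mu2, mu3, mu4 = (central_moment(mids, freqs, r) for r in (2, 3, 4))

beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
gamma2 = beta2 - 3
# Moment-based coefficient of skewness as defined in the text.
sk = math.sqrt(beta1) * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))

print(mu2, mu3, mu4)             # 30.0 0.0 2250.0
print(beta1, beta2, gamma2, sk)  # 0.0 2.5 -0.5 0.0
```

The values match the hand calculation: $S_k = 0$ (symmetrical) and $\gamma_2 = -0.5 < 0$ (platykurtic).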
