Chapter 4 (Correlation part)

Dr. Manju, Associate Professor, CSE, IIUC

CHAPTER FOUR

CORRELATION

Dr. Mohammad Manjur Alam (Manju)


Associate Professor
Department of Computer Science and Engineering
International Islamic University Chittagong.
Email: manjuralam44@yahoo.com

The primary objective of correlation analysis is to measure the strength or degree of relationship
between two or more variables. If the change in one variable affects a change in the other
variable, the variables are said to be correlated.

For example, the production of paddy is dependent on the rainfall. Here production of paddy is
considered to be a dependent variable.

Types of Correlation

• Positive or negative
• Simple or multiple
• Linear or non-linear

Positive or negative

If the two variables deviate in the same direction, that is, if an increase (or decrease) in one
results in a corresponding increase (or decrease) in the other, the correlation is said to be direct or
positive. But if they constantly deviate in opposite directions, that is, if an increase (or decrease)
in one results in a corresponding decrease (or increase) in the other, the correlation is said to be
inverse or negative. If the variables are independent, there cannot be any correlation and the
variables are said to have zero correlation.

For example, the correlation between (1) the heights and weights of a group of persons, (2) the
income and expenditure is positive and the correlation between (1) price and demand of a
commodity, (2) the volume and pressure of a perfect gas is negative. And there is no correlation
between income and height.

Simple correlation and Multiple Correlation

Correlation only between two variables is called simple correlation. For example, correlation
between income and expenditure.

Under multiple correlation, three or more variables are studied. Ex. Qd = f(P, PC, PS, t, y)

Linear correlation and Non Linear correlation

Correlation is said to be linear when the amount of change in one variable tends to bear a
constant ratio to the amount of change in the other. The graph of the variables having a linear
relationship will form a straight line.

Example: X = 1, 2, 3, 4, 5, 6, 7, 8,

Y = 5, 7, 9, 11, 13, 15, 17, 19,

Y = 3 +2x

The correlation would be nonlinear if the amount of change in one variable does not bear a
constant ratio to the amount of change in the other variable.
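
The constant-ratio condition can be checked numerically. The following is a minimal sketch in Python; the helper name `is_linear` and the curved data set are assumptions made for illustration and are not part of the original notes. It compares the ratio of successive changes in Y to successive changes in X.

```python
def is_linear(xs, ys, tol=1e-9):
    """Return True if successive changes in y bear a constant ratio
    to the corresponding changes in x (consecutive x values must differ)."""
    ratios = [
        (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]
    return all(abs(r - ratios[0]) <= tol for r in ratios)


x = [1, 2, 3, 4, 5, 6, 7, 8]
y_linear = [3 + 2 * v for v in x]   # the example above: 5, 7, 9, ..., 19
y_curved = [v ** 2 for v in x]      # illustrative non-linear data (assumed)

print(is_linear(x, y_linear))  # True
print(is_linear(x, y_curved))  # False
```

For the linear series every ratio equals 2, the slope in Y = 3 + 2x; for the squared series the ratios are 3, 5, 7, ... and so are not constant.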

Methods of studying simple correlation

1. Scatter Diagram method;


2. Karl Pearson’s Coefficient of correlation;
3. Spearman’s Rank Correlation

Scatter diagram method

The diagrammatic way of representing bivariate data is called scatter diagram.

Suppose, (x1,y1), (x2,y2)………..(xn,yn) are n pairs of observations. If the values of the variables
x and y be plotted along the x-axis and y-axis respectively in the xy-plane, the diagram of dots so
obtained is known as scatter diagram.

Scatter diagrams for different values of r are as follows:


Interpretation of r


r = +1 indicates a perfect positive relationship between x and y. The scatter diagram will be as in
fig. 1.1.

r = -1 indicates a perfect negative relationship between x and y. The scatter diagram will be as in
fig. 1.2.

r = 0 means there is no linear relationship between x and y. In this case the two variables are
linearly uncorrelated. The scatter diagram will be as in fig. 1.5 and 1.6.

0 < r < 1 indicates a positive relationship between x and y. In this case the scatter diagram will be
as in fig. 1.3.

-1 < r < 0 indicates a negative relationship between x and y. In this case the scatter diagram will
be as in fig. 1.4.

Correlation coefficient

The numerical value by which we measure the strength of linear relationship between two or
more variables is called correlation coefficient.

Let (x1,y1), (x2,y2), ..., (xn,yn) be n pairs of observations. Then the correlation coefficient
between x and y is denoted by rxy and defined as

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad \ldots (1)
$$

Equation (1) is also called Karl Pearson’s coefficient of correlation, a formula given in 1890.

Algebraically (1) reduces to

$$
r = \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sqrt{\left[\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]\left[\sum_{i=1}^{n} y_i^2 - \dfrac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right]}}
$$
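
The algebraic reduction of (1) is stated without its intermediate steps. A short sketch of how it follows, using only the definitions of the means, $\bar{x} = \sum x_i / n$ and $\bar{y} = \sum y_i / n$:

$$
\begin{aligned}
\sum (x_i - \bar{x})(y_i - \bar{y})
&= \sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\bar{x}\bar{y} \\
&= \sum x_i y_i - n\bar{x}\bar{y} \\
&= \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}
\end{aligned}
$$

In the same way, $\sum (x_i - \bar{x})^2 = \sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}$, and similarly for y. Substituting these three sums into (1) gives the computational form.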

Assumptions of Pearson’s Correlation Coefficient

• There is a linear relationship between the two variables, i.e. when the two variables are plotted
on a scatter diagram, the points will form a straight line.
• A cause and effect relation exists between the different forces operating on the items of the two
variable series.

Comment on Correlation Coefficient

Here c denotes the value of the correlation coefficient.

1 = Perfect positive correlation

0.7 ≤ c < 1 = Strong positive correlation

0.4 ≤ c < 0.7 = Fairly positive correlation

0 < c < 0.4 = Weak positive correlation

0 = No correlation

0 > c > -0.4 = Weak negative correlation

-0.4 ≥ c > -0.7 = Fairly negative correlation

-0.7 ≥ c > -1 = Strong negative correlation

-1 = Perfect negative correlation
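
The verbal scale above can be written as a small lookup. This is a minimal sketch in Python; the function name `describe_correlation` is an assumption for illustration. It treats the scale as symmetric about zero, so that -0.4 and -0.7 fall in the "fairly" and "strong" bands just as +0.4 and +0.7 do.

```python
def describe_correlation(c):
    """Map a correlation coefficient c to the verbal scale listed above."""
    if not -1 <= c <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if c == 0:
        return "No correlation"
    sign = "positive" if c > 0 else "negative"
    size = abs(c)
    if size == 1:
        return f"Perfect {sign} correlation"
    if size >= 0.7:
        return f"Strong {sign} correlation"
    if size >= 0.4:
        return f"Fairly {sign} correlation"
    return f"Weak {sign} correlation"


print(describe_correlation(-0.94))  # Strong negative correlation
print(describe_correlation(0.6))    # Fairly positive correlation
print(describe_correlation(0.2))    # Weak positive correlation
```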


Properties of correlation coefficient

1. The correlation coefficient is independent of change of origin and scale of measurement.
2. The correlation coefficient lies between -1 and +1, i.e. -1 ≤ rxy ≤ 1.
3. The correlation coefficient is symmetric, i.e. rxy = ryx.
4. The correlation coefficient is the geometric mean of the regression coefficients, i.e.
$r_{xy} = \pm\sqrt{b_{yx} \times b_{xy}}$, taking the common sign of the two regression coefficients.
5. For two independent variables, the correlation coefficient is zero.
6. It is always unit free.
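
Properties 1, 3 and 4 can be checked numerically. The sketch below is in Python and rests on several assumptions: the data are made up for illustration, the helper names are hypothetical, the scale factors are taken to be positive, and the regression coefficients use the usual definitions $b_{yx} = S_{xy}/S_{xx}$ and $b_{xy} = S_{xy}/S_{yy}$, which are not given in this chapter.

```python
from math import sqrt


def deviation_sums(xs, ys):
    """Return (Sxy, Sxx, Syy): sums of products and squares of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy


def pearson_r(xs, ys):
    sxy, sxx, syy = deviation_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


x = [2, 4, 6, 8, 10]   # illustrative data (assumed)
y = [1, 3, 4, 8, 9]
r = pearson_r(x, y)

# Property 1: change of origin and (positive) scale leaves r unchanged
u = [(v - 6) / 2 for v in x]
w = [(v - 5) / 10 for v in y]
r_transformed = pearson_r(u, w)

# Property 3: symmetry, r_xy = r_yx
r_swapped = pearson_r(y, x)

# Property 4: geometric mean of the two regression coefficients
sxy, sxx, syy = deviation_sums(x, y)
b_yx = sxy / sxx
b_xy = sxy / syy
r_from_b = sqrt(b_yx * b_xy)   # both coefficients are positive here

print(round(r, 4), round(r_transformed, 4), round(r_swapped, 4), round(r_from_b, 4))
```

All four printed values agree (about 0.9791 for this data set).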

Advantages of Pearson’s Coefficient

• It summarizes in one value both the degree and the direction of correlation.

Limitations of Pearson’s Coefficient

• It always assumes a linear relationship.
• Interpreting the value of r is difficult.
• The value of the correlation coefficient is affected by extreme values.
• The method is time consuming.

Coefficient of Determination

A convenient way of interpreting the value of the correlation coefficient is to use the square of
the coefficient of correlation, which is called the Coefficient of Determination.

The Coefficient of Determination = r².

Suppose r = 0.9; then r² = 0.81. This would mean that 81% of the variation in the dependent variable
has been explained by the independent variable.

The maximum value of r² is 1 because it is possible to explain all of the variation in y but it is not
possible to explain more than all of it.

Coefficient of Determination = Explained variation / Total variation

An example of Coefficient of Determination

When r = 0.60, r² = 0.36 -----(1)

r = 0.30, r² = 0.09 -----(2)

This implies that in the first case 36% of the total variation is explained, whereas in the second case
9% of the total variation is explained.
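
The ratio "explained variation / total variation" can be illustrated numerically. The sketch below is in Python and makes assumptions that are not in the notes: the data are made up, the function name is hypothetical, and "explained variation" is taken to mean the variation of the fitted values of the least-squares line of y on x about the mean of y.

```python
from math import sqrt


def explained_over_total(xs, ys):
    """Explained variation / total variation for the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in xs]
    explained = sum((f - mean_y) ** 2 for f in fitted)
    total = sum((y - mean_y) ** 2 for y in ys)
    return explained / total


def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [2, 4, 6, 8, 10]   # illustrative data (assumed)
y = [1, 3, 4, 8, 9]

ratio = explained_over_total(x, y)
r_squared = pearson_r(x, y) ** 2
print(round(ratio, 4), round(r_squared, 4))   # the two values agree

# The values quoted in the text
print(round(0.9 ** 2, 2), round(0.6 ** 2, 2), round(0.3 ** 2, 2))  # 0.81 0.36 0.09
```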


Theorem: Show that the correlation coefficient lies between -1 and +1, i.e. -1 ≤ rxy ≤ 1.

Proof: Let (x1,y1), (x2,y2), ..., (xn,yn) be n pairs of observations. Then the correlation
coefficient between x and y is denoted by rxy and defined as

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad \ldots (1)
$$

Suppose $x_i - \bar{x} = X$ and $y_i - \bar{y} = Y$; therefore

$$
r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}}
$$

Let us consider the following expression, which is never negative because it is a sum of squares:

$$
\sum \left( \frac{X}{\sqrt{\sum X^2}} \pm \frac{Y}{\sqrt{\sum Y^2}} \right)^2 \ge 0
$$

or,

$$
\sum \left( \frac{X^2}{\sum X^2} \pm 2\,\frac{XY}{\sqrt{\sum X^2}\,\sqrt{\sum Y^2}} + \frac{Y^2}{\sum Y^2} \right) \ge 0
$$

or,

$$
\frac{\sum X^2}{\sum X^2} \pm 2\,\frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} + \frac{\sum Y^2}{\sum Y^2} \ge 0
$$

or, 1 ± 2r + 1 ≥ 0

or, 2(1 ± r) ≥ 0

or, (1 ± r) ≥ 0 ……(i)

From (i), 1 + r ≥ 0 [considering +ve sign]

or, r ≥ -1

or, -1 ≤ r …………(ii)

and 1 - r ≥ 0 [considering -ve sign]

or, 1 ≥ r

or, r ≤ 1 …………..(iii)

From (ii) and (iii) we get -1 ≤ r ≤ 1,

i.e. the correlation coefficient lies between -1 and +1.

Theorem: Show that for two independent variables the correlation coefficient is zero.

Proof: Let (x1,y1), (x2,y2), ..., (xn,yn) be n pairs of observations. Then the arithmetic
mean of $x_i$ is $\bar{x}$ and that of $y_i$ is $\bar{y}$. Since x and y are independent, therefore,

$$
\text{Covariance, } \mathrm{Cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} = 0
$$

or,

$$
\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 0
$$

We know,

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
= \frac{0}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
= 0 \quad \text{(proved)}
$$
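
The theorem runs in one direction only: independence gives r = 0, but r = 0 on its own says only that there is no linear relationship. A minimal Python sketch with made-up data (an assumption for illustration) shows a case where y is completely determined by x and r is still zero:

```python
from math import sqrt


def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]      # y depends on x exactly, but not linearly

r = pearson_r(x, y)
print(abs(r) < 1e-12)        # True: the positive and negative products cancel
```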

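Application Problem-1: If y = mx + c, then find the correlation coefficient between x and y.

Solution: Let (x1,y1), (x2,y2), ..., (xn,yn) be n pairs of observations. Then the correlation
coefficient between x and y is denoted by rxy and defined as

$$
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad \ldots (1)
$$

Now, y = mx + c, so that $y_i = m x_i + c$ and $\bar{y} = m\bar{x} + c$ ………..(ii)

Therefore,

$$
\begin{aligned}
r_{xy} &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(m x_i + c - m\bar{x} - c)}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (m x_i + c - m\bar{x} - c)^2}} \\
&= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(m x_i - m\bar{x})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (m x_i - m\bar{x})^2}} \\
&= \frac{m \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})}{|m| \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (x_i - \bar{x})^2}} \\
&= \frac{m}{|m|} \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
&= \frac{m}{|m|}
\end{aligned}
$$

Hence r = 1 when m > 0 and r = -1 when m < 0 (m ≠ 0), because $\sqrt{m^2} = |m|$.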
Procedure for computing the correlation coefficient

1. Calculate the means of the two series x and y.
2. Calculate the deviations of x and y from their respective means.
3. Square each deviation of x and y, then obtain the sums of the squared deviations, i.e. Σx² and Σy² (where x and y now denote the deviations).
4. Multiply each deviation of x by the corresponding deviation of y to obtain the products xy, then
obtain the sum of the products, i.e. Σxy.
5. Substitute the values in the formula.
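
The procedure can be followed step by step in code. This is a minimal Python sketch; the five data pairs are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
from math import sqrt

# Illustrative data (hypothetical, not taken from the text)
x = [2, 4, 6, 8, 10]
y = [1, 3, 4, 8, 9]
n = len(x)

# Step 1: means of the two series
mean_x = sum(x) / n
mean_y = sum(y) / n

# Step 2: deviations from the respective means
dx = [v - mean_x for v in x]
dy = [v - mean_y for v in y]

# Step 3: sums of squared deviations
sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)

# Step 4: sum of products of paired deviations
sum_dxdy = sum(a * b for a, b in zip(dx, dy))

# Step 5: substitute in the formula
r = sum_dxdy / sqrt(sum_dx2 * sum_dy2)

print(mean_x, mean_y)                # 6.0 5.0
print(sum_dx2, sum_dy2, sum_dxdy)    # 40.0 46.0 42.0
print(round(r, 4))                   # 0.9791
```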

Application Problem-1: A research physician recorded the temperature of the water and the reduction
in pulse rate when the faces of ten small children were submerged in cold water to control abnormally
rapid heartbeats. The results are presented in the following table. Calculate the correlation coefficient
between temperature of water and reduction in pulse rate.

Temperature of water       68  65  70  62  60  55  58  65  69  63
Reduction in pulse rate     2   5   1  10   9  13  10   3   4   6


Solution: Table for calculating the correlation coefficient.

xi    yi    xi²     yi²    xiyi
68     2    4624      4     136
65     5    4225     25     325
70     1    4900      1      70
62    10    3844    100     620
60     9    3600     81     540
55    13    3025    169     715
58    10    3364    100     580
65     3    4225      9     195
69     4    4761     16     276
63     6    3969     36     378
Σxi = 635   Σyi = 63   Σxi² = 40537   Σyi² = 541   Σxiyi = 3835

We know,

$$
r_{xy} = \frac{\sum x_i y_i - \dfrac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}}{\sqrt{\left[\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}\right]\left[\sum y_i^2 - \dfrac{\left(\sum y_i\right)^2}{n}\right]}}
$$

$$
= \frac{3835 - \dfrac{635 \times 63}{10}}{\sqrt{\left[40537 - \dfrac{635^2}{10}\right]\left[541 - \dfrac{63^2}{10}\right]}}
= \frac{-165.5}{\sqrt{214.5 \times 144.1}}
= -0.94
$$

The result, -0.94, indicates that temperature of water and reduction in pulse rate are strongly
negatively correlated.
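
The column totals and the final value can be reproduced with a few lines of Python. This is only a checking sketch that applies the computational formula to the ten pairs in the table; the variable names are chosen here for illustration.

```python
from math import sqrt

temperature = [68, 65, 70, 62, 60, 55, 58, 65, 69, 63]   # x
reduction = [2, 5, 1, 10, 9, 13, 10, 3, 4, 6]            # y
n = len(temperature)

sum_x = sum(temperature)
sum_y = sum(reduction)
sum_x2 = sum(v * v for v in temperature)
sum_y2 = sum(v * v for v in reduction)
sum_xy = sum(a * b for a, b in zip(temperature, reduction))

numerator = sum_xy - sum_x * sum_y / n
denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)   # 635 63 40537 541 3835
print(round(r, 2))                            # -0.94
```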

Assignment problem-1: Compute r for the following paired sets of values:

i. (x, y): (1, 2), (2, 3), (3, 5), (4, 4), (5, 7)

ii. (x, y): (1, 1), (2, 3), (3, 5), (4, 7), (5, 9)

iii. (x, y): (1, 10), (2, 8), (3, 6), (4, 4), (5, 2)

iv. (x, y): (2, 9), (3, 5), (4, 6), (5, 2), (6, 1)

v. (x, y): (-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)

Solution 1: (x, y): (1, 2), (2, 3), (3, 5), (4, 4), (5, 7)

The formula for finding the correlation coefficient is

$$
r_{xy} = \frac{\sum x_i y_i - \dfrac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}}{\sqrt{\left[\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}\right]\left[\sum y_i^2 - \dfrac{\left(\sum y_i\right)^2}{n}\right]}}
$$

Let us make a table to calculate the correlation coefficient.

xi    yi    xi²    yi²    xiyi
1      2     1      4      2
2      3     4      9      6
3      5     9     25     15
4      4    16     16     16
5      7    25     49     35
Σxi = 15   Σyi = 21   Σxi² = 55   Σyi² = 103   Σxiyi = 74

$$
r_{xy} = \frac{74 - \dfrac{15 \times 21}{5}}{\sqrt{\left[55 - \dfrac{15^2}{5}\right]\left[103 - \dfrac{21^2}{5}\right]}}
= \frac{11}{\sqrt{10 \times 14.8}}
= 0.90
$$

Comment: There exists a strong positive relationship between x and y.

Problems ii to v above are left as an assignment.

Assignment Problem-2: The following table gives the ages and blood pressure of 10 women:

Age in years (x)      56   42   36   47   49   42   72   63   55   60
Blood pressure (y)   147  125  118  128  125  140  155  160  149  150

(i) Draw a scatter diagram.

(ii) Find the correlation coefficient between x and y and comment.

Ans: Try yourself.

Assignment Problem-3: The scores of 12 students in their mathematics and physics classes are:

Mathematics   2  3  4  4  5  6  6  7  7  8  10  10
Physics       1  3  2  4  4  4  6  4  6  7   9  10

Find the correlation coefficient and interpret it.

Comment on the followings:

(i) r = 0 (ii) r = -1 (iii) r = 1 (iv) r ≥ 1 (v) r < -1

(i) r = 0 indicates that there is no linear correlation between x and y.

(ii) r = -1 indicates that the correlation between x and y is perfect negative.

(iii) r = 1 indicates that the correlation between x and y is perfect positive.

(iv) r ≥ 1, i.e. r = 1 or r > 1: r = 1 indicates perfect positive correlation, but r > 1 is not possible,
because the correlation coefficient lies between -1 and +1.

(v) r < -1 is not possible, because the correlation coefficient lies between -1 and +1.

Uses of correlation coefficient.

1. To find the relationship between two variables.
2. To find the relationship between a dependent variable and the combined influence of a group
of independent variables.
3. To solve many problems in biology.
4. In social studies, such as the relationship between crime and education, correlation analysis has
a definite role to play.
5. In economics it is especially widely used.

RANK CORRELATION

Rank correlation: In some situations it is difficult to measure the values of the variables of a
bivariate distribution numerically, but they can be ranked. The correlation coefficient between
the two sets of ranks is usually called the rank correlation coefficient, given by Spearman (1904). It is
denoted by R. This is the only method for finding the relationship between two qualitative variables
like beauty, honesty, intelligence, efficiency and so on.

When there are no ties, the formula for computing Spearman’s rank correlation coefficient is

$$
R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}
$$

Here, R= rank correlation coefficient, n = number of pairs of observations being ranked.

d = difference between rank of x and rank of y.

Remarks:

(i) We always have $\sum d = \sum (R_1 - R_2) = 0$.

(ii) Like the simple correlation coefficient, the rank correlation coefficient lies between -1 and +1.

Note: For finding the rank correlation coefficient, we may have two types of data:

• Actual observations are given
• Actual ranks are given

Interpretation of Rank Correlation Coefficient (R)

The value of rank correlation coefficient, R ranges from -1 to +1

If R = +1, then there is complete agreement in the order of the ranks and the ranks are in the
same direction

If R = -1, then there is complete agreement in the order of the ranks and the ranks are in the
opposite direction

If R = 0, then there is no correlation

Application Problem-1: Obtain the rank correlation co-efficient for the following data:

A: 80 75 90 70 65 60
B: 65 70 60 75 85 80

Solution: Here ranks of the score are not given. Let us start ranking from the highest value for
both the variables as shown in the table given below:

A     B     Rank of A (x)   Rank of B (y)   d = x - y   d²
80    65    2               5               -3          9
75    70    3               4               -1          1
90    60    1               6               -5          25
70    75    4               3                1          1
65    85    5               1                4          16
60    80    6               2                4          16
Total                                       Σd = 0      Σd² = 68

$$
R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 68}{6(6^2 - 1)} = -0.94
$$

Conclusion: There exists a strong negative relationship between A and B.
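
The ranking and the formula can be reproduced in a few lines. This is a minimal Python sketch for the no-ties case only; the function names are assumptions for illustration, and rank 1 is given to the highest value, as in the solution above.

```python
def ranks_descending(values):
    """Rank 1 = highest value. Assumes all values are distinct (no ties)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_no_ties(xs, ys):
    """Spearman's R = 1 - 6*sum(d^2) / (n(n^2 - 1)) for untied data."""
    rank_x = ranks_descending(xs)
    rank_y = ranks_descending(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


a = [80, 75, 90, 70, 65, 60]
b = [65, 70, 60, 75, 85, 80]

print(ranks_descending(a))                 # [2, 3, 1, 4, 5, 6]
print(ranks_descending(b))                 # [5, 4, 6, 3, 1, 2]
print(round(spearman_no_ties(a, b), 2))    # -0.94
```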

Application Problem -2: Obtain the rank correlation co-efficient for the following data:

Examiner   A  B  C  D  E
I          1  2  3  4  5
II         2  3  1  5  4


Solution: Here the ranks are given:

Ranking by examiner-I: R1   Ranking by examiner-II: R2   d = R1 - R2   d²
1                           2                            -1            1
2                           3                            -1            1
3                           1                             2            4
4                           5                            -1            1
5                           4                             1            1
Total                                                    Σd = 0        Σd² = 8

$$
R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 8}{5(5^2 - 1)} = 0.6
$$

Comment: There is a fairly positive rank correlation between the rankings of the two
examiners.

Repeated ranks or tied observations:

When ranks are repeated, the following formula is used for finding the rank correlation coefficient:

$$
R = 1 - \frac{6\left[\sum d^2 + \dfrac{1}{12}\left(m_1^3 - m_1\right) + \dfrac{1}{12}\left(m_2^3 - m_2\right) + \cdots\right]}{n(n^2 - 1)}
$$

where m1, m2, ... are the numbers of times each repeated value occurs.

Problems of equal ranks or ties in ranks:

Application Problem -3: The following data refer to the marks obtained by 8 students in
mathematics and statistics:

Marks in mathematics   20  80  40  12  28  20  15  60
Marks in statistics    30  60  20  30  50  30  40  10

Compute the rank correlation coefficient and comment.

Solution: Let the marks obtained in mathematics be x and the marks obtained in statistics be y.

Table for computation of rank correlation.

x     y     Rank of x (R1)   Rank of y (R2)   d = R1 - R2   d²
20    30    3.5              4                -0.5          0.25
80    60    8                8                 0            0
40    20    6                2                 4            16
12    30    1                4                -3            9
28    50    5                7                -2            4
20    30    3.5              4                -0.5          0.25
15    40    2                6                -4            16
60    10    7                1                 6            36
Total                                                       Σd² = 81.5

Here, m1 = 2 (the value 20 occurs twice in x), m2 = 3 (the value 30 occurs three times in y), n = 8.

$$
R = 1 - \frac{6\left[81.5 + \dfrac{1}{12}\left(2^3 - 2\right) + \dfrac{1}{12}\left(3^3 - 3\right)\right]}{8(8^2 - 1)}
= 1 - \frac{6 \times 84}{504}
= 0
$$
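
The tied-rank computation can be sketched in Python as follows. The function names are assumptions for illustration; rank 1 is given to the lowest value and tied values share the average of the positions they occupy, as in the table above. The marks are entered as they appear in the computation table.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 = lowest value; tied values share the mean of their positions."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1      # first position occupied by v
        count = ordered.count(v)          # how many times v occurs
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]


def spearman_with_ties(xs, ys):
    """R = 1 - 6[sum(d^2) + sum((m^3 - m)/12)] / (n(n^2 - 1))."""
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = sum(
        (m ** 3 - m) / 12
        for series in (xs, ys)
        for m in Counter(series).values()
        if m > 1
    )
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


maths = [20, 80, 40, 12, 28, 20, 15, 60]
stats = [30, 60, 20, 30, 50, 30, 40, 10]   # values as used in the computation table

print(average_ranks(maths))   # [3.5, 8.0, 6.0, 1.0, 5.0, 3.5, 2.0, 7.0]
print(average_ranks(stats))   # [4.0, 8.0, 2.0, 4.0, 7.0, 4.0, 6.0, 1.0]
print(spearman_with_ties(maths, stats))   # 0.0
```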

Merits of Spearman’s Rank Correlation

• This method is simpler to understand and easier to apply than Karl Pearson’s
correlation method.
• This method is useful where we can give the ranks but not the actual data (qualitative
terms).
• This method is to be used where the initial data are in the form of ranks.

Limitations of Spearman’s Rank Correlation

• It cannot be used for finding the correlation in a grouped frequency distribution.
• This method should not be applied where N exceeds 30, because the ranking and calculations become tedious.

Assignment problem-4:

The following figures relate to advertisement expenditure and profit:

Profit (Tk.Crore):x 25 28 27 33 31 10 16 16 18 23

Adv. Exp.(Tk. Lakh):y 87 91 92 95 93 52 68 72 78 86

(i)Draw a scatter diagram and comment

(ii) Calculate Karl Pearson’s and Spearman rank correlation coefficients and comment.

Assignment problem-5:

The following figures relate to advertisement expenditure and sales of a company:


Adv. Exp. (Tk. Lac)   62  67  73  78  85  78  91  92  96  98
Sales (Tk. Crore)     11  13  17  18  21  24  21  27  26  21

Calculate Karl Pearson’s correlation coefficient and Spearman’s rank correlation coefficient and comment.

Website:
http://www.pindling.org/Math/Statistics/Textbook/Examples/Chapter3/chapter3_examples.htm
