Correlation and Regression
Fakhruddin Khan
Correlation measures the strength of the
relationship between two or more variables.
• Example: Relationship between price and product of an object.
• Representation: $r_{12.3}$, $r_{13.2}$, $r_{23.1}$, ...
• $r_{12.3}$ = relation between variables 1 and 2, keeping 3 constant.
• In case of multiple correlation, three or more variables are studied
simultaneously.
• Representation: $R_{1.23}$, $R_{2.13}$, ...
• $-1 \le r < 0$: negative correlation
• $0 < r \le 1$: positive correlation
Measures of Correlation coefficients:
There are three measures of correlation:
• 1) Scatter Diagram
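• General method:
• $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$
• Shortcut method:
• $r = \dfrac{n\sum uv - \sum u \sum v}{\sqrt{\left[n\sum u^2 - (\sum u)^2\right]\left[n\sum v^2 - (\sum v)^2\right]}}$, where $u = x - a$, $v = y - b$ ($a$ and $b$ are assumed means)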
• $r = 0.70$
• Correlation is moderately strong and positive
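The general and shortcut methods should give the same value of r, because shifting x and y by constants does not change the correlation. The sketch below illustrates this under stated assumptions: the function names and the five data points are hypothetical and chosen only for illustration, and the formulas are the standard deviation-from-mean and assumed-mean forms of Karl Pearson's coefficient.

```python
from math import sqrt

def pearson_general(x, y):
    """r from deviations about the actual means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def pearson_shortcut(x, y, a, b):
    """r from deviations u = x - a, v = y - b about assumed means a, b."""
    n = len(x)
    u = [xi - a for xi in x]
    v = [yi - b for yi in y]
    su, sv = sum(u), sum(v)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    suu = sum(ui * ui for ui in u)
    svv = sum(vi * vi for vi in v)
    return (n * suv - su * sv) / sqrt((n * suu - su ** 2) * (n * svv - sv ** 2))

# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(round(pearson_general(x, y), 4))
print(round(pearson_shortcut(x, y, a=3, b=4), 4))
```

Any choice of a and b gives the same r; picking values close to the middle of the data only keeps the hand arithmetic small.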
Correlation:
• 0 < r < 0.25: weak correlation
• 0.25 ≤ r < 0.50: moderate correlation
• 0.50 ≤ r < 0.75: moderately strong correlation
• 0.75 ≤ r ≤ 1: strong correlation
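The four bands above can be turned into a small lookup. This is only a sketch: the function name is hypothetical, the labels paraphrase the list, and two choices are assumptions not stated in the slides: the bands are applied to the absolute value of r so that negative correlations are graded too, and each boundary value is placed in the higher band.

```python
def describe_strength(r):
    """Map |r| onto the four bands from the list above (boundaries are an assumption)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < 0.25:
        return "weak"
    if size < 0.50:
        return "moderate"
    if size < 0.75:
        return "moderately strong"
    return "strong"

print(describe_strength(0.70))
print(describe_strength(-0.80))
```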
Example 2: Calculate the coefficient of correlation for the following data:
X 50 60 58 47 49 33 65 43 46 68
Y 48 65 50 48 55 58 63 48 50 70
x      y    u = x - 49   v = y - 58    uv     u²     v²
50     48        1          -10        -10      1    100
60     65       11            7         77    121     49
58     50        9           -8        -72     81     64
47     48       -2          -10         20      4    100
49     55        0           -3          0      0      9
33     58      -16            0          0    256      0
65     63       16            5         80    256     25
43     48       -6          -10         60     36    100
46     50       -3           -8         24      9     64
68     70       19           12        228    361    144
Total           29          -25        407   1125    655
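• $r = \dfrac{n\sum uv - \sum u \sum v}{\sqrt{\left[n\sum u^2 - (\sum u)^2\right]\left[n\sum v^2 - (\sum v)^2\right]}}$, where $u = x - a$, $v = y - b$
• Over all ten (x, y) pairs: $n = 10$, $\sum u = 29$, $\sum v = -25$, $\sum uv = 407$, $\sum u^2 = 1125$, $\sum v^2 = 655$
• $r = \dfrac{10(407) - (29)(-25)}{\sqrt{\left[10(1125) - 29^2\right]\left[10(655) - (-25)^2\right]}} = \dfrac{4795}{\sqrt{10409 \times 5925}} \approx 0.61$ Ans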
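The arithmetic in this example can be checked by recomputing the column sums and r directly from the X and Y data. The sketch below assumes the shortcut (assumed-mean) formula with a = 49 and b = 58, as in the table headings; the variable names are illustrative.

```python
from math import sqrt

X = [50, 60, 58, 47, 49, 33, 65, 43, 46, 68]
Y = [48, 65, 50, 48, 55, 58, 63, 48, 50, 70]
a, b = 49, 58                     # assumed means from the table headings

n = len(X)
u = [x - a for x in X]            # deviations of X from a
v = [y - b for y in Y]            # deviations of Y from b

sum_u, sum_v = sum(u), sum(v)
sum_uv = sum(ui * vi for ui, vi in zip(u, v))
sum_uu = sum(ui * ui for ui in u)
sum_vv = sum(vi * vi for vi in v)

r = (n * sum_uv - sum_u * sum_v) / sqrt(
    (n * sum_uu - sum_u ** 2) * (n * sum_vv - sum_v ** 2)
)

print(sum_u, sum_v, sum_uv, sum_uu, sum_vv)
print(round(r, 2))
```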
P.E. $= 0.6745 \times \dfrac{1 - r^2}{\sqrt{n}}$
Probable error is used to interpret the value of 'r'.
i) If r < P.E., then r is not at all significant.
ii) If r > 6 P.E., then r is highly significant.
iii) If P.E. < r < 6 P.E., nothing can be concluded about the significance of 'r'.
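Example:
• If r = 0.6 and n = 64, then interpret 'r'.
Soln.: P.E. $= 0.6745 \times \dfrac{1 - (0.6)^2}{\sqrt{64}} = 0.6745 \times \dfrac{0.64}{8}$
$= 0.6745 \times 0.08 \approx 0.054$
Since $6 \times$ P.E. $\approx 0.324$ and $r = 0.6 > 0.324$, r is highly significant.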
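The probable-error test can be sketched in a few lines. The code assumes the standard formula P.E. = 0.6745 (1 - r²)/√n and the three-way rule listed above; the function names are hypothetical, and comparing the absolute value of r is an assumption made so that negative correlations are handled as well.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of a correlation coefficient r from n pairs."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

def interpret(r, n):
    """Apply the P.E. / 6 P.E. rule to |r|."""
    pe = probable_error(r, n)
    size = abs(r)
    if size < pe:
        return "not significant"
    if size > 6 * pe:
        return "highly significant"
    return "inconclusive"

pe = probable_error(0.6, 64)
print(round(pe, 3))
print(interpret(0.6, 64))
```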
• iii) Tie between ranks: two or more observations share the same rank.
i. Ranks are assigned:
• Example: In a singing competition, two judges ranked seven competitors as follows:
Competitors   1  2  3  4  5  6  7
Judge I       5  6  4  3  2  7  1
Judge II      6  4  5  1  2  7  3
Competitors   R1   R2   D = R1 - R2   D²
1              5    6       -1         1
2              6    4        2         4
3              4    5       -1         1
4              3    1        2         4
5              2    2        0         0
6              7    7        0         0
7              1    3       -2         4
Total ΣD² = 14
• Rank correlation: $\rho = 1 - \dfrac{6\sum D^2}{n(n^2 - 1)} = 1 - \dfrac{6 \times 14}{7(7^2 - 1)} = 1 - 0.25 = 0.75$
• Strongly correlated.
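When the ranks are already given, the rank correlation needs only the rank differences. A minimal sketch, assuming Spearman's formula without ties, ρ = 1 - 6ΣD²/(n(n² - 1)); the function name is illustrative.

```python
def spearman_from_ranks(r1, r2):
    """Rank correlation from two lists of ranks with no ties."""
    n = len(r1)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

judge_1 = [5, 6, 4, 3, 2, 7, 1]
judge_2 = [6, 4, 5, 1, 2, 7, 3]

rho = spearman_from_ranks(judge_1, judge_2)
print(rho)
```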
ii. Ranks are to be assigned:
• Example:
X 40 65 50 51 55 67 52 43 22 20
Y 34 35 45 55 60 50 43 48 23 21
x     y    R1   R2   D = R1 - R2   D²
40    34    8    8        0         0
65    35    2    7       -5        25
50    45    6    5        1         1
51    55    5    2        3         9
55    60    3    1        2         4
67    50    1    3       -2         4
52    43    4    6       -2         4
43    48    7    4        3         9
22    23    9    9        0         0
20    21   10   10        0         0
Total ΣD² = 56
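• Given n = 10 and D = R1 - R2
• Rank correlation: $\rho = 1 - \dfrac{6\sum D^2}{n(n^2 - 1)} = 1 - \dfrac{6 \times 56}{10(10^2 - 1)} = 1 - \dfrac{336}{990} \approx 0.66$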
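When only raw scores are given, the ranks have to be assigned first. The sketch below follows the table's convention that the largest value gets rank 1; the helper name is hypothetical and it assumes no repeated values in either series.

```python
def descending_ranks(values):
    """Rank 1 goes to the largest value; assumes no repeated values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

X = [40, 65, 50, 51, 55, 67, 52, 43, 22, 20]
Y = [34, 35, 45, 55, 60, 50, 43, 48, 23, 21]

r1 = descending_ranks(X)
r2 = descending_ranks(Y)

n = len(X)
sum_d2 = sum((p - q) ** 2 for p, q in zip(r1, r2))
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(r1)
print(r2)
print(sum_d2, round(rho, 2))
```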
• Find the rank correlation coefficient for the data displayed in the
table.
Students A B C D E F G H I J
Test-I 20 30 22 28 32 40 20 16 14 18
Test-II 32 32 48 36 44 48 28 20 24 28
SOLUTION:
Students   Test-I (x)   Test-II (y)   R1     R2     D = R1 - R2   D²
A              20           32        6.5    5.5        1          1
B              30           32        3      5.5       -2.5        6.25
C              22           48        5      1.5        3.5       12.25
D              28           36        4      4          0          0
E              32           44        2      3         -1          1
F              40           48        1      1.5       -0.5        0.25
G              20           28        6.5    7.5       -1          1
H              16           20        9     10         -1          1
I              14           24       10      9          1          1
J              18           28        8      7.5        0.5        0.25
Total ΣD² = 24
• n = 10
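• Rank correlation with tied ranks: $\rho = 1 - \dfrac{6\left[\sum D^2 + \sum_i \frac{m_i^3 - m_i}{12}\right]}{n(n^2 - 1)}$, where $m_i$ is the number of observations sharing the $i$-th tied rank
• Here, $m_1 = 2$, $m_2 = 2$, $m_3 = 2$, $m_4 = 2$
• $\rho = 1 - \dfrac{6\left[24 + 4 \times \frac{2^3 - 2}{12}\right]}{10(10^2 - 1)} = 1 - \dfrac{156}{990} = 1 - 0.16 = 0.84$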
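With tied scores, each tied value takes the average of the positions it occupies, and a correction term is added to ΣD² for every group of ties. The sketch below assumes the usual correction (m³ - m)/12 per group of m tied values and descending ranks as in the table; the helper names are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Descending ranks; tied values share the mean of the positions they occupy."""
    order = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        first = order.index(v) + 1           # first position held by v
        count = order.count(v)               # number of times v occurs
        ranks[v] = first + (count - 1) / 2   # mean of positions first..first+count-1
    return [ranks[v] for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

test_1 = [20, 30, 22, 28, 32, 40, 20, 16, 14, 18]
test_2 = [32, 32, 48, 36, 44, 48, 28, 20, 24, 28]

r1, r2 = average_ranks(test_1), average_ranks(test_2)
n = len(test_1)
sum_d2 = sum((p - q) ** 2 for p, q in zip(r1, r2))
correction = tie_correction(test_1) + tie_correction(test_2)
rho = 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

print(sum_d2, correction, round(rho, 2))
```

Test-I has one tied pair (20, 20) and Test-II has three (48, 32 and 28), which is where the four values m = 2 come from.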
• $Y = a + bx$: linear relation
• Example: $Y = 1 + 5x$; at $x = 0$, $y = 1$; at $x = 1$, $y = 6$; ...
• $Y = a + bx + cx^2$: non-linear relation
Uses and Types:
• Uses
• It is used to estimate the dependent variable with the help of
independent variables.
• It is used for future forecasting.
• The regression coefficient 'b' is used to calculate the correlation coefficient 'r'.
• Types:
• Simple regression
• Multiple regression
Simple Regression:
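• Two types of lines are there in regression: the line of y on x, $y - \bar{y} = b_{yx}(x - \bar{x})$, and the line of x on y, $x - \bar{x} = b_{xy}(y - \bar{y})$.
• $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ and
• $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y} = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$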
Properties:
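• The G.M. of the regression coefficients gives the correlation coefficient (r):
$r = \pm\sqrt{b_{yx} \times b_{xy}}$, where r takes the common sign of the two regression coefficients.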
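The geometric-mean property can be checked numerically. The sketch below is illustrative only: the function name and the five data points are hypothetical, and the coefficients are computed with the raw-sum forms $b_{yx} = (n\sum xy - \sum x\sum y)/(n\sum x^2 - (\sum x)^2)$ and $b_{xy} = (n\sum xy - \sum x\sum y)/(n\sum y^2 - (\sum y)^2)$.

```python
from math import copysign, sqrt

def regression_coefficients(x, y):
    """Return (b_yx, b_xy) computed from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sxx = sum(p * p for p in x)
    syy = sum(q * q for q in y)
    cross = n * sxy - sx * sy            # shared numerator of both coefficients
    b_yx = cross / (n * sxx - sx ** 2)
    b_xy = cross / (n * syy - sy ** 2)
    return b_yx, b_xy

# Hypothetical data, not taken from the slides
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b_yx, b_xy = regression_coefficients(x, y)
# r is the geometric mean of the coefficients, carrying their common sign
r = copysign(sqrt(b_yx * b_xy), b_yx)

print(round(b_yx, 3), round(b_xy, 3), round(r, 4))
```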
Age of husband (X): 18 19 20 21 22 23 24 25 26 27
Age of wife (Y):    17 17 18 18 19 19 19 20 21 22
Age of husband (X)   Age of wife (Y)   x = X - 22   y = Y - 19   xy   x²   y²
18                   17                   -4           -2         8   16    4
19                   17                   -3           -2         6    9    4
20                   18                   -2           -1         2    4    1
21                   18                   -1           -1         1    1    1
22 = A               19 = B                0            0         0    0    0
23                   19                    1            0         0    1    0
24                   19                    2            0         0    4    0
25                   20                    3            1         3    9    1
26                   21                    4            2         8   16    4
27                   22                    5            3        15   25    9
Total                                      5            0        43   85   24
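• Now, $\bar{X} = A + \dfrac{\sum x}{n} = 22 + \dfrac{5}{10} = 22.5$
• $\bar{Y} = B + \dfrac{\sum y}{n} = 19 + \dfrac{0}{10} = 19$
• Also, $b_{yx} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{10(43) - (5)(0)}{10(85) - 5^2} = \dfrac{430}{825} = 0.521$
• $b_{xy} = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = \dfrac{10(43) - (5)(0)}{10(24) - 0^2} = \dfrac{430}{240} = 1.792$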
• Regression equation of Y on X:
• $Y - \bar{Y} = b_{yx}(X - \bar{X})$
• $Y - 19 = 0.521(X - 22.5)$
• $Y = 0.521X + 7.3$ Ans ------ (i) first regression equation.
• At $X = 30$: $Y = 0.521 \times 30 + 7.3 \approx 23$ Ans
• Regression equation of X on Y:
• $X - \bar{X} = b_{xy}(Y - \bar{Y})$
• $X - 22.5 = 1.792(Y - 19)$
• $X = 1.792Y - 11.5$ Ans
• To find the correlation coefficient:
• $r = \sqrt{b_{yx} \times b_{xy}} = \sqrt{0.521 \times 1.792} \approx 0.97$
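The whole husband/wife example can be reproduced from the raw ages. The sketch below follows the same route as the table (deviations from the assumed means A = 22 and B = 19); the variable names are illustrative, and the coefficient formulas are the raw-sum forms $b_{yx} = (n\sum xy - \sum x\sum y)/(n\sum x^2 - (\sum x)^2)$ and $b_{xy} = (n\sum xy - \sum x\sum y)/(n\sum y^2 - (\sum y)^2)$.

```python
from math import sqrt

husband = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
wife = [17, 17, 18, 18, 19, 19, 19, 20, 21, 22]
A, B = 22, 19                          # assumed means used in the table

n = len(husband)
x = [h - A for h in husband]           # deviations of husband's age from A
y = [w - B for w in wife]              # deviations of wife's age from B

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(p * q for p, q in zip(x, y))
sum_xx = sum(p * p for p in x)
sum_yy = sum(q * q for q in y)

mean_x = A + sum_x / n                 # mean age of husbands
mean_y = B + sum_y / n                 # mean age of wives

cross = n * sum_xy - sum_x * sum_y
b_yx = cross / (n * sum_xx - sum_x ** 2)
b_xy = cross / (n * sum_yy - sum_y ** 2)

# Line of Y on X: Y = intercept_yx + b_yx * X
intercept_yx = mean_y - b_yx * mean_x
# Line of X on Y: X = intercept_xy + b_xy * Y
intercept_xy = mean_x - b_xy * mean_y

wife_at_30 = intercept_yx + b_yx * 30  # estimated wife's age for a 30-year-old husband
r = sqrt(b_yx * b_xy)                  # both coefficients are positive here

print(round(b_yx, 3), round(intercept_yx, 1))
print(round(b_xy, 3), round(intercept_xy, 1))
print(round(wife_at_30), round(r, 2))
```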