
Correlation and Regression

Correlation is used to measure the strength of the relationship between two or more variables. There are three main types of correlation: positive correlation, negative correlation, and no correlation. Correlation can be simple, partial, or multiple. It can also be linear or non-linear. Common methods for measuring correlation include scatter diagrams, Karl Pearson's correlation coefficient, and Spearman's rank correlation coefficient. Karl Pearson's coefficient values range from -1 to 1, indicating the strength and direction of correlation between variables. Spearman's rank correlation is useful when variables are qualitative or the data distribution is unknown.

Uploaded by

Shruti Das

CORRELATION

Fakhruddin Khan
Correlation is used to measure the strength of the relationship between two or more variables.
• Example: the relationship between the price and the production of a good.

• Correlation analysis deals with the following three tasks:
1. Measuring the relationship between variables.
2. Testing the relationship for its significance.
3. Giving a confidence interval for the population correlation measure.
Types of correlation:
• i) Positive correlation, negative correlation and no correlation
• ii) Simple correlation, partial correlation and multiple correlation
• iii) Linear correlation and non-linear correlation
i) Positive and Negative correlation
• If both variables x and y vary in the same direction, they are positively correlated.
• Directly proportional: $x \propto y$ or $y \propto x$
• If x increases then y also increases, and vice versa.
E.g.: Height and weight of a person.
• If the variables x and y vary in opposite directions, they are negatively correlated.
• Inversely proportional: $x \propto 1/y$ or $y \propto 1/x$
• If x increases then y decreases, and vice versa.
E.g.: Production and price of a good. If the price increases, the purchasing power of consumers decreases.
ii) Simple, Partial and Multiple correlation
• In simple correlation, only the relationship between two variables is studied.
• Notation: $r(x, y)$ or $r_{xy}$

• In partial correlation, three or more variables are involved. We study two variables at a time while keeping the third variable constant.
• Representation: $r_{12.3}$, $r_{13.2}$, $r_{23.1}$, ...
• $r_{12.3}$ = relation between 1 & 2 keeping 3 constant.

• In multiple correlation, three or more variables are studied simultaneously.
• Representation: $R_{1.23}$, $R_{2.13}$, ...
• $R_{1.23}$: 1 ↔ 2 and 1 ↔ 3 taken together
• $R_{2.13}$: 2 ↔ 1 and 2 ↔ 3 taken together
• In general: x ↔ y, x ↔ z, x ↔ k and so on.
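The partial and multiple correlations of three variables can be obtained from the three simple correlations. The sketch below uses the standard first-order formulas, which are not stated on the slides; the function names and the input values are hypothetical and chosen only for illustration.

```python
import math

def partial_corr(r12, r13, r23):
    """Partial correlation of variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

def multiple_corr(r12, r13, r23):
    """Multiple correlation of variable 1 with variables 2 and 3 taken together."""
    r_squared = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)
    return math.sqrt(r_squared)

# Hypothetical simple correlations (not taken from the slides).
r12, r13, r23 = 0.5, 0.5, 0.5
print(round(partial_corr(r12, r13, r23), 4))
print(round(multiple_corr(r12, r13, r23), 4))
```

With these inputs the partial correlation is smaller than the simple correlation of 0.5, because part of the association between variables 1 and 2 is shared with variable 3.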
Linear and non-linear correlation
• In linear correlation, a change in one variable is accompanied by a change in the other in a constant ratio, so the plotted points lie along a straight line.
• This is not so in the case of non-linear correlation, where the ratio of change varies.
Properties of Correlation (r):
• The value of r lies between -1 and 1: $-1 \le r \le 1$
• r = -1: the relationship is perfectly negative, i.e. 100% negative.
• r = 0: no linear correlation between the variables.
• r = 1: the relationship is perfectly positive, i.e. 100% positive.
• $-1 < r < 0$: negative correlation
• $0 < r < 1$: positive correlation
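A small numerical check of these properties may help. The sketch below computes r from deviations about the means for three made-up data sets; the helper name `pearson_r` and the data are assumptions for illustration. The third case is a reminder that r near 0 rules out only a linear relationship, since y there is fully determined by x.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]                      # illustrative values only
r_up = pearson_r(x, [2 * v + 1 for v in x])    # straight line, increasing
r_down = pearson_r(x, [-3 * v for v in x])     # straight line, decreasing
r_curve = pearson_r(x, [v * v for v in x])     # exact but non-linear relation
print(r_up, r_down, r_curve)
```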
Measures of Correlation:
There are three methods of measuring correlation:

• 1. Scatter diagram
• 2. Karl Pearson’s correlation coefficient
• 3. Spearman’s rank correlation coefficient

Scatter Diagram:
• The scatter diagram is a technique used to examine the relationship between two variables, one plotted on each axis (X and Y). If the variables are correlated, the points fall along a line or a curve. A scatter diagram, or scatter plot, gives an idea of the nature of the relationship.
• To draw it, each pair of values (x, y) is plotted as a point in the plane and marked with a dot or a cross.
Diagrams:
[Figures not reproduced: scatter plots illustrating $0 < r < 1$ (positive correlation), $-1 < r < 0$ (negative correlation), $r = -1$ (perfectly negative), $r = 1$ (perfectly positive) and $r = 0$ (no correlation).]


Karl Pearson’s Correlation:
• This is a mathematical method for finding the correlation coefficient.
• The value of the correlation coefficient r lies between -1 and 1.
• It is a relative measure and hence has no units of measurement.
• It is not affected by a change of origin or a change of scale (by a positive factor).


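Simple Correlation:

• General method:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

• Shortcut method:
$$r = \frac{n\sum uv - \sum u \sum v}{\sqrt{\left[n\sum u^2 - (\sum u)^2\right]\left[n\sum v^2 - (\sum v)^2\right]}}, \quad \text{where } u = x - a,\ v = y - b$$

Here a and b are the assumed means of x and y respectively.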


Example-1: Find Karl Pearson’s correlation coefficient for the table below:
x    20   16   12    8    4
y    22   14    4   12    8

• Solution: Using the general method formula, with n = 5:

  x     y     xy     x²     y²
 20    22    440    400    484
 16    14    224    256    196
 12     4     48    144     16
  8    12     96     64    144
  4     8     32     16     64
Sum:
 60    60    840    880    904

• $r = \frac{5(840) - (60)(60)}{\sqrt{\left[5(880) - 60^2\right]\left[5(904) - 60^2\right]}} = \frac{600}{\sqrt{800 \times 920}}$

• $= \frac{600}{857.9} = 0.699 \approx 0.70 > 0$

• The correlation is moderately strong and positive.
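Solution: Using the shortcut method formula, with assumed means a = 12 and b = 14:

  x     y    u = x - 12   v = y - 14    uv     u²     v²
 20    22        8            8         64     64     64
 16    14        4            0          0     16      0
 12     4        0          -10          0      0    100
  8    12       -4           -2          8     16      4
  4     8       -8           -6         48     64     36
Sum:             0          -10        120    160    204

• $r = \frac{5(120) - (0)(-10)}{\sqrt{\left[5(160) - 0^2\right]\left[5(204) - (-10)^2\right]}} = \frac{600}{\sqrt{800 \times 920}}$

• $= \frac{600}{857.9} = 0.70$
• The correlation is moderately strong and positive.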
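Both solutions of Example-1 can be checked with a few lines of Python. This is a minimal sketch, not the slides' own method of working: the function name `pearson_r` and its parameters `a` and `b` (the assumed means) are illustrative choices. With `a = b = 0` it reduces to the general method.

```python
import math

def pearson_r(xs, ys, a=0, b=0):
    """Karl Pearson's r from sums of u = x - a and v = y - b.

    a = b = 0 gives the general method; other values give the
    shortcut method with assumed means a and b.
    """
    u = [x - a for x in xs]
    v = [y - b for y in ys]
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(p * q for p, q in zip(u, v))
    suu = sum(p * p for p in u)
    svv = sum(q * q for q in v)
    return (n * suv - su * sv) / math.sqrt((n * suu - su**2) * (n * svv - sv**2))

# Data of Example-1
x = [20, 16, 12, 8, 4]
y = [22, 14, 4, 12, 8]

r_general = pearson_r(x, y)
r_shortcut = pearson_r(x, y, a=12, b=14)
print(round(r_general, 3), round(r_shortcut, 3))  # both round to 0.699
```

The two results agree because shifting the origin of x and y changes the individual sums but not the value of r.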
Strength of correlation:
• 0 < r < 0.25: weak correlation
• 0.25 ≤ r < 0.50: moderate correlation
• 0.50 ≤ r < 0.75: moderately strong correlation
• 0.75 ≤ r ≤ 1: strong correlation
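The four bands above can be written as a small lookup. This is only a sketch under stated assumptions: the labels paraphrase the scale, each boundary value is assigned to the higher band, and the absolute value of r is used so that negative correlations are graded the same way. None of these choices is specified on the slide.

```python
def strength_label(r):
    """Return the band of the scale above that |r| falls into."""
    size = abs(r)
    if size > 1:
        raise ValueError("r must lie between -1 and 1")
    if size < 0.25:
        return "weak"
    if size < 0.50:
        return "moderate"
    if size < 0.75:
        return "moderately strong"
    return "strong"

print(strength_label(0.70))   # the value of r found in Example-1
print(strength_label(-0.90))  # sign is ignored when grading strength
```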
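Example-2: Calculate the coefficient of correlation for the following data:
X    50   60   58   47   49   33   65   43   46   68
Y    48   65   50   48   55   58   63   48   50   70

Solution: Using the shortcut method with assumed means a = 49 and b = 58 (n = 10):

  x     y    u = x - 49   v = y - 58    uv      u²     v²
 50    48        1          -10        -10       1    100
 60    65       11            7         77     121     49
 58    50        9           -8        -72      81     64
 47    48       -2          -10         20       4    100
 49    55        0           -3          0       0      9
 33    58      -16            0          0     256      0
 65    63       16            5         80     256     25
 43    48       -6          -10         60      36    100
 46    50       -3           -8         24       9     64
 68    70       19           12        228     361    144
Sum:            29          -25        407    1125    655

• $r = \frac{n\sum uv - \sum u \sum v}{\sqrt{\left[n\sum u^2 - (\sum u)^2\right]\left[n\sum v^2 - (\sum v)^2\right]}}$, where u = x - a, v = y - b

• $= \frac{10(407) - (29)(-25)}{\sqrt{\left[10(1125) - 29^2\right]\left[10(655) - (-25)^2\right]}} = \frac{4795}{\sqrt{10409 \times 5925}} = \frac{4795}{7853.2} = 0.61$ Ans

Since r = 0.61, the correlation is moderately strong and positive.

Analysis: $PE = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - 0.61^2}{\sqrt{10}} = 0.134$

6PE = 6 × 0.134 = 0.80 > r, while PE < r, so nothing definite can be said about the significance of r.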


PROBABLE ERROR:
• It indicates the extent to which the correlation coefficient is dependable.

$$P.E. = 0.6745 \times \frac{1 - r^2}{\sqrt{n}}$$
Probable error is used to interpret the value of r:
i) If r < P.E., then r is not at all significant.
ii) If r > 6 P.E., then r is highly significant.
iii) If P.E. < r < 6 P.E., nothing definite can be said about the significance of r.
Example:
• If r = 0.6 and n = 64, interpret r.

Soln.: $PE = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - 0.36}{8}$
= 0.6745 × 0.08 = 0.054

Now 6PE = 6 × 0.054 = 0.32 < r, so r is highly significant. Ans
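The probable-error rule is easy to mechanise. The sketch below assumes the standard constant 0.6745 and applies the three-way rule to the absolute value of r; the function names and the wording of the returned labels are illustrative, not part of the slides.

```python
import math

def probable_error(r, n):
    """Probable error of r for n pairs of observations."""
    return 0.6745 * (1 - r**2) / math.sqrt(n)

def interpret(r, n):
    """Apply the three-way probable-error rule to |r|."""
    pe = probable_error(r, n)
    size = abs(r)
    if size < pe:
        return "not significant"
    if size > 6 * pe:
        return "highly significant"
    return "inconclusive"

pe = probable_error(0.6, 64)
verdict = interpret(0.6, 64)
print(round(pe, 3), round(6 * pe, 3), verdict)
```

For r = 0.6 and n = 64 the value of 6 × P.E. stays well below r, so the verdict is "highly significant".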


Spearman’s Rank Correlation Method
• When we don’t know the shape of the population distribution, or when the data are qualitative, we use the rank correlation method.
• It can also be used when the data are quantitative.

• It is defined by the formula: $\rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$ ($\rho$ = rho)

• where D = R1 - R2 = rank difference and n = number of pairs of observations
• R1 = rank of the first set of data
• R2 = rank of the second set of data
• Here also $\rho$ lies between -1 and 1, as in the case of Karl Pearson’s method.
Conditions:
• i) Ranks are assigned: all the data are already ranked.
• ii) Ranks are to be assigned: we have to rank the data ourselves.
• iii) Ties between ranks: some values are repeated, so their ranks are tied.
i) Ranks are assigned:
• Example: In a singing competition, two judges ranked seven competitors as follows:
Competitors   1   2   3   4   5   6   7
Judge I       5   6   4   3   2   7   1
Judge II      6   4   5   1   2   7   3

Competitor   R1   R2   D = R1 - R2   D²
    1         5    6       -1         1
    2         6    4        2         4
    3         4    5       -1         1
    4         3    1        2         4
    5         2    2        0         0
    6         7    7        0         0
    7         1    3       -2         4
                              ΣD² = 14

• Rank correlation: $\rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6(14)}{7(49 - 1)} = 1 - \frac{84}{336} = 1 - 0.25 = 0.75$

• Strongly correlated.
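When the ranks are already given, the formula needs only the rank differences. A minimal sketch follows, using the two judges' rankings from the example; the function name `spearman_from_ranks` is an illustrative choice.

```python
def spearman_from_ranks(r1, r2):
    """Spearman's rho from two lists of ranks with no ties."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n**2 - 1))

judge_1 = [5, 6, 4, 3, 2, 7, 1]
judge_2 = [6, 4, 5, 1, 2, 7, 3]

rho = spearman_from_ranks(judge_1, judge_2)
print(rho)  # 0.75
```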
ii) Ranks are to be assigned:
• Example:
X    40   65   50   51   55   67   52   43   22   20
Y    34   35   45   55   60   50   43   48   23   21

Ranking each series from the largest value (rank 1) to the smallest:

  x     y    R1   R2   D = R1 - R2   D²
 40    34     8    8        0         0
 65    35     2    7       -5        25
 50    45     6    5        1         1
 51    55     5    2        3         9
 55    60     3    1        2         4
 67    50     1    3       -2         4
 52    43     4    6       -2         4
 43    48     7    4        3         9
 22    23     9    9        0         0
 20    21    10   10        0         0
                              ΣD² = 56

• Given n = 10 and D = R1 - R2

• Rank correlation: $\rho = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6(56)}{10(100 - 1)}$

• $= 1 - \frac{336}{990} = 1 - 0.34 = 0.66$

• The correlation coefficient is 0.66, which means the variables are moderately correlated.
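The ranking step can be automated as well. The sketch below ranks each series from largest (rank 1) to smallest and then applies the rank formula. It assumes there are no repeated values, and the function names are illustrative.

```python
def ranks_descending(values):
    """Rank 1 for the largest value; assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho for raw data without ties."""
    r1, r2 = ranks_descending(xs), ranks_descending(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n**2 - 1))

x = [40, 65, 50, 51, 55, 67, 52, 43, 22, 20]
y = [34, 35, 45, 55, 60, 50, 43, 48, 23, 21]

print(ranks_descending(x))  # [8, 2, 6, 5, 3, 1, 4, 7, 9, 10]
rho = spearman_rho(x, y)
print(round(rho, 2))  # 0.66
```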



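iii) Ties between ranks: when ranks are repeated.

• Rank correlation: $\rho = 1 - \frac{6\left[\sum D^2 + \sum \frac{m(m^2 - 1)}{12}\right]}{n(n^2 - 1)}$, where m is the number of times a value is repeated, with one correction term for each group of tied values.

• Find the rank correlation coefficient for the data displayed in the table.

Students   A    B    C    D    E    F    G    H    I    J
Test-I    20   30   22   28   32   40   20   16   14   18
Test-II   32   32   48   36   44   48   28   20   24   28

SOLUTION:
Students   Test-I (x)   Test-II (y)    R1     R2      D      D²
   A           20           32         6.5    5.5     1      1
   B           30           32         3      5.5    -2.5    6.25
   C           22           48         5      1.5     3.5   12.25
   D           28           36         4      4       0      0
   E           32           44         2      3      -1      1
   F           40           48         1      1.5    -0.5    0.25
   G           20           28         6.5    7.5    -1      1
   H           16           20         9     10      -1      1
   I           14           24        10      9       1      1
   J           18           28         8      7.5     0.5    0.25
                                                   ΣD² = 24

• n = 10
• Here $m_1 = 2$, $m_2 = 2$, $m_3 = 2$, $m_4 = 2$: the value 20 in Test-I and the values 32, 48 and 28 in Test-II are each repeated twice, so each correction term is $\frac{2(2^2 - 1)}{12} = 0.5$.

• $\rho = 1 - \frac{6(24 + 4 \times 0.5)}{10(100 - 1)} = 1 - \frac{156}{990} = 1 - 0.16 = 0.84$

• Since the correlation coefficient is 0.84, the two tests are strongly positively correlated.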
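Tied values need two extra steps: average ranks and the correction term m(m² - 1)/12 for each group of m tied values. A sketch of both, applied to the two test scores above, is given here; the helper names are assumptions made for illustration.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 for the largest value; tied values share the mean of their ranks."""
    order = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(order, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of m tied values."""
    return sum(m * (m**2 - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_rho_ties(xs, ys):
    """Spearman's rho with the tie correction added to the sum of D^2."""
    r1, r2 = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n**2 - 1))

test_1 = [20, 30, 22, 28, 32, 40, 20, 16, 14, 18]
test_2 = [32, 32, 48, 36, 44, 48, 28, 20, 24, 28]

rho = spearman_rho_ties(test_1, test_2)
print(round(rho, 2))  # 0.84
```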
REGRESSION:
• M. M. Blair defines regression as the measure of the average relationship between two or more variables in terms of the original units of the data.

• Y = a + bx: linear relation, where Y is the dependent variable and x is the independent variable
• E.g. Y = 1 + 5x: x = 0 ⇒ y = 1; x = 1 ⇒ y = 6; ...

• Y = a + bx + cx²: non-linear relation
Uses and Types:
• Uses:
• It is used to estimate the dependent variable from the independent variables.
• It is used for forecasting future values.
• The regression coefficients can be used to calculate the correlation coefficient r.
• Types:
• Simple regression
• Multiple regression
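Simple Regression:
• There are two regression lines:

• The regression equation of y on x:
$$y - \bar{y} = b_{yx}(x - \bar{x})$$

• The regression equation of x on y:
$$x - \bar{x} = b_{xy}(y - \bar{y})$$
• Here $b_{yx}$ and $b_{xy}$ are called the regression coefficients.
The values of the regression coefficients are:

• $b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$ and

• $b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$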
Properties:
• The G.M. of the regression coefficients gives the correlation coefficient r:

$$r = \pm\sqrt{b_{yx} \times b_{xy}}$$

• The product of the regression coefficients never exceeds 1:

$$b_{yx} \times b_{xy} \le 1$$
• If $b_{yx}$ is negative, then $b_{xy}$ is also negative, and hence r is negative.

• $b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$, where $\sigma_x$ and $\sigma_y$ are the standard deviations of x and y respectively.

• A regression coefficient is an absolute measure.

• It is meant for estimation.
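The first two properties follow from writing each regression coefficient as r times a ratio of standard deviations. A short derivation, using $b_{yx}$ and $b_{xy}$ for the coefficients of y on x and x on y, and $\sigma_x$, $\sigma_y$ for the standard deviations (notation assumed here for clarity):

```latex
b_{yx}\, b_{xy}
  = \left( r\,\frac{\sigma_y}{\sigma_x} \right)
    \left( r\,\frac{\sigma_x}{\sigma_y} \right)
  = r^2
\quad\Longrightarrow\quad
r = \pm\sqrt{b_{yx}\, b_{xy}},
\qquad
0 \le b_{yx}\, b_{xy} = r^2 \le 1 .
```

The sign of r is the common sign of the two coefficients, since both equal r multiplied by a positive ratio.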


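Example: From the data given in the table, find the regression equations and the correlation coefficient. Also find the age of the wife when the husband’s age is 30.

Age of husband (X)   18   19   20   21   22   23   24   25   26   27
Age of wife (Y)      17   17   18   18   19   19   19   20   21   22

Taking assumed means A = 22 and B = 19 (n = 10):

Age of husband X   Age of wife Y   x = X - 22   y = Y - 19    xy    x²    y²
      18                17            -4           -2          8    16     4
      19                17            -3           -2          6     9     4
      20                18            -2           -1          2     4     1
      21                18            -1           -1          1     1     1
      22 = A            19 = B         0            0          0     0     0
      23                19             1            0          0     1     0
      24                19             2            0          0     4     0
      25                20             3            1          3     9     1
      26                21             4            2          8    16     4
      27                22             5            3         15    25     9
Total                                  5            0         43    85    24

• Now, $\bar{X} = A + \frac{\sum x}{n} = 22 + \frac{5}{10} = 22.5$
• $\bar{Y} = B + \frac{\sum y}{n} = 19 + \frac{0}{10} = 19$

• Also, $b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{10(43) - (5)(0)}{10(85) - 5^2} = \frac{430}{825} = 0.521$
• $b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = \frac{10(43) - (5)(0)}{10(24) - 0^2} = \frac{430}{240} = 1.792$
• Regression equation of Y on X:
• $Y - \bar{Y} = b_{yx}(X - \bar{X})$
• Y - 19 = 0.521(X - 22.5)
• Y = 0.521X + 7.3 Ans (i) first regression equation
• At X = 30: Y = 0.521 × 30 + 7.3 = 22.9 ≈ 23 Ans
• Regression equation of X on Y:
$X - \bar{X} = b_{xy}(Y - \bar{Y})$
X - 22.5 = 1.792(Y - 19)
X = 1.792Y - 11.55 Ans (ii) second regression equation
• To find the correlation coefficient:
• $r = \sqrt{b_{yx} \times b_{xy}}$

• $r = \sqrt{0.521 \times 1.792} = 0.97$, so the ages are strongly correlated.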
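The worked example can be cross-checked directly from the raw ages, without assumed means. The sketch below is illustrative only: the function name `regression_summary` and its return order are assumptions, and r is given the common sign of the two regression coefficients.

```python
import math

def regression_summary(xs, ys):
    """Return mean of x, mean of y, b_yx, b_xy and r for paired data."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    b_yx = (n * sxy - sx * sy) / (n * sxx - sx**2)  # slope of y on x
    b_xy = (n * sxy - sx * sy) / (n * syy - sy**2)  # slope of x on y
    r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
    return sx / n, sy / n, b_yx, b_xy, r

husband = list(range(18, 28))                      # ages 18 to 27
wife = [17, 17, 18, 18, 19, 19, 19, 20, 21, 22]

x_bar, y_bar, b_yx, b_xy, r = regression_summary(husband, wife)
wife_at_30 = y_bar + b_yx * (30 - x_bar)           # line of y on x at x = 30
print(round(b_yx, 3), round(b_xy, 3), round(r, 3), round(wife_at_30, 1))
```

The estimate uses the line of y on x because the wife's age is the variable being predicted.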


Question: HW.
• From the data given in the table, obtain the two lines of regression and the correlation coefficient. Also find the blood pressure when the age is 50 yrs.
Age (x)    56   42   72   39   63   47   52   49   40   42   68   60
B.P. (y)  127  112  140  118  129  116  130  125  115  120  135  133
