Lesson_10_Relationship_Between_Variables
ABC, an organization that stores a large amount of data, aims to analyze the data
and extract meaningful insights by determining the relationship between
variables.
Duration: 15 minutes
• What does correlation mean?
• How do you determine the relationship between variables?
Correlation
Correlation is a statistical measure that quantifies the extent to which two variables are linearly related.
A scatter diagram helps visually illustrate the relationship between variables, providing a clear
understanding of their interdependence.
Relationship Between Variables with Scatter Diagram
The nature of the scatter plot provides insights into the relationship between variables.
Quantifying relationships
Upward-Sloping
[Fig (a) and Fig (b): scatter diagrams in which the points slope upward.]
This indicates that as one variable increases, the other variable also increases.
Downward-Sloping
[Fig (c) and Fig (d): scatter diagrams in which the points slope downward.]
This indicates that as one variable increases, the other variable decreases.
Width of Bands
[Fig (e) and Fig (f): scatter diagrams illustrating the width of the band of points, followed by two further unlabeled panels.]
Duration: 15 minutes
• What does correlation mean?
Answer: Correlation refers to the statistical measure of the strength and
direction of the association between two variables. It indicates how closely
the variables are related to each other.
Duration: 15 minutes
• What are the types of correlation?
In a data set comprising n pairs of observations, the $x_j$ values correspond to one characteristic (X), while the $y_j$ values correspond to another characteristic (Y).
X: $x_1, \ldots, x_j, \ldots, x_n$, with mean $\bar{x}$
Y: $y_1, \ldots, y_j, \ldots, y_n$, with mean $\bar{y}$
Karl Pearson’s Coefficient of Correlation
$$r = \frac{\operatorname{cov}(X, Y)}{s_x \, s_y}$$
$s_x$ = standard deviation of X
$s_y$ = standard deviation of Y
Correlation Coefficient
The correlation coefficient, often represented by the symbol r, is a statistical measure that calculates
the strength and direction of the linear relationship between two variables.
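$$r = \frac{n \sum x_j y_j - \left(\sum x_j\right)\left(\sum y_j\right)}{\sqrt{\left(n \sum x_j^2 - \left(\sum x_j\right)^2\right)\left(n \sum y_j^2 - \left(\sum y_j\right)^2\right)}}$$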
Where,
n = total number of paired data points of x and y
Σ = sum of the values
Σx = sum of all x-values
Σy = sum of all y-values
Σxy = sum of the product of paired x and y values
Σx² and Σy² = sums of the squares of all x-values and y-values
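The sums-based formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `pearson_r` and the small data set are assumptions made for the example and do not come from the lesson.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from the sums n, Σx, Σy, Σxy, Σx², Σy²."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Illustrative data (an assumption, not taken from the lesson).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

Working from the raw sums mirrors the slide formula term by term, which makes it easy to check a hand calculation against the code.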
Discussion: Correlation and Covariance
Duration: 15 minutes
• What are the types of correlation?
Answer: The two types of correlation coefficients are Karl Pearson's
Coefficient of Correlation and Spearman's Rank Correlation Coefficient.
Practical Uses of Karl Pearson’s Coefficient of Correlation
Example 1: Test scores are used to identify candidates for specific posts in an organization.
Assume that individuals' test scores are highly correlated with their scores during later performance appraisals.
[Figure: scatter diagram of performance appraisal scores (vertical axis) against test scores (horizontal axis).]
Well-designed tests prove to be highly effective in personnel selection.
If there is a lack of correlation or a low correlation, it indicates the need to redesign the test.
Example 2: The sales in units for certain products are correlated with the demand for spare parts for these products.
Correlation can be used:
• To make plans for spare parts production in the future
• To find the correlation between income and credit card delinquency rate
• To find the correlation between cigarette smoking and longevity
Properties of Pearson’s Correlation Coefficient
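The correlation coefficient is a pure number that does not depend on the units in which the variables are measured. For instance, a correlation involving temperature remains the same whether the temperature is recorded in degrees Celsius or degrees Fahrenheit.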
The correlation coefficient always lies between -1 and +1:
$$-1 \le r \le 1$$
When the correlation coefficient (r) assumes either of the extreme values, -1 or +1, it indicates a perfect linear relationship between the two variables.
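The two properties, unit independence and the range from -1 to +1, can be checked numerically. The sketch below is only an illustration: the temperature and sales figures are hypothetical, and `pearson_r` is a helper name chosen for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical readings: temperature in degrees Celsius and units sold.
temps_c = [10, 15, 20, 25, 30]
units_sold = [12, 18, 21, 30, 33]

# The same temperatures expressed in degrees Fahrenheit.
temps_f = [c * 9 / 5 + 32 for c in temps_c]

r_celsius = pearson_r(temps_c, units_sold)
r_fahrenheit = pearson_r(temps_f, units_sold)

# Both values agree (up to floating-point rounding) and lie in [-1, 1].
print(round(r_celsius, 4), round(r_fahrenheit, 4))
```

Changing the unit is a linear rescaling with a positive factor, which cancels out of the ratio that defines r.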
To determine the correlation coefficient, one must examine a data set of two
variables representing student scores:
            Examiner X   Examiner Y
Student 1        0            0
Student 2        4            2
Student 3        7            3
Student 4       10           10
The scores of four students in a test are based on the independent assessments
conducted by examiners X and Y.
Spearman’s Rank Correlation Coefficient
For instance, examiner Y applies stricter penalties for incorrect answers. As a result, the correlation between the two examiners' scores is less than 1, with a calculated value of 0.9.
This indicates a high level of agreement between the examiners regarding the relative
performance of the candidates.
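The value of 0.9 can be reproduced from the four pairs of scores using r = cov(X, Y) / (s_x s_y). The sketch below is one way to do this; the variable names are chosen for the example, and the covariance and standard deviations are both computed with divisor n (an assumption about the convention, which does not affect the ratio as long as it is applied consistently).

```python
import statistics

# Scores given by examiners X and Y to the four students.
x_scores = [0, 4, 7, 10]
y_scores = [0, 2, 3, 10]

n = len(x_scores)
mean_x = statistics.mean(x_scores)
mean_y = statistics.mean(y_scores)

# Covariance with divisor n, matching the population standard deviations below.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_scores, y_scores)) / n
s_x = statistics.pstdev(x_scores)
s_y = statistics.pstdev(y_scores)

r = cov_xy / (s_x * s_y)
print(round(r, 2))  # prints 0.9
```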
To address scenarios like the previous example, Spearman introduced a measure known as the rank
correlation coefficient.
Example 1: Assign ranks to the students according to their scores, independently for each examiner.
            Examiner X           Examiner Y
            Score   Rank         Score   Rank
Student 1     0       1            0       1
Student 2     4       2            2       2
Student 3     7       3            3       3
Student 4    10       4           10       4
The ranks can also be assigned in the opposite order, with rank 1 given to the highest score:
            Examiner X           Examiner Y
            Score   Rank         Score   Rank
Student 1     0       4            0       4
Student 2     4       3            2       3
Student 3     7       2            3       2
Student 4    10       1           10       1
The correlation coefficient of the two sets of ranks is referred to as the rank correlation coefficient.
It can be calculated by applying the same formula used for Karl Pearson’s coefficient to the ranks.
Formula for Rank Correlation Coefficient
The rank coefficient (r) can be calculated using the following formula:
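$$r = 1 - \frac{6 \sum d_j^2}{n\,(n^2 - 1)}$$
where $d_j$ is the difference between the two ranks of the j-th observation and n is the number of pairs.
When the two sets of ranks agree completely, r = 1:
$$1 = 1 - \frac{6 \sum d_j^2}{n\,(n^2 - 1)} \;\Rightarrow\; \frac{6 \sum d_j^2}{n\,(n^2 - 1)} = 0 \;\Rightarrow\; \sum d_j^2 = 0$$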
When the rankings are diametrically opposite, the rank correlation is -1.
Rank 1   Rank 2
  1        4
  2        3
  3        2
  4        1
Rank correlation = -1
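The rank correlation formula, r = 1 - 6Σd²/(n(n² - 1)), can be sketched as follows. The helper names `ranks` and `spearman_r` are assumptions made for this example, and the sketch assumes there are no tied scores, since the formula in this form does not handle ties.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman_r(xs, ys):
    """Spearman's rank correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    rank_x = ranks(xs)
    rank_y = ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Examiner scores from the lesson: both examiners rank the students identically.
same_order = spearman_r([0, 4, 7, 10], [0, 2, 3, 10])

# Diametrically opposite rankings: 1, 2, 3, 4 against 4, 3, 2, 1.
opposite_order = spearman_r([1, 2, 3, 4], [4, 3, 2, 1])

print(same_order, opposite_order)  # prints 1.0 -1.0
```

Although Pearson's coefficient for the raw examiner scores is about 0.9, the rank correlation is exactly 1 because the two examiners order the students identically.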
Discussion: Spurious Correlation
Duration: 15 minutes
• What does spurious correlation mean?
Correlation Coefficient
Example 1: Cutting speed is a cause, and its impact on tool life is the effect.
[Diagram: cause and effect.]
Correlation and Causation
Example 2: The total number of students enrolled in schools across different cities may display a
correlation when observed over multiple years.
[Diagram: City 1 and City 2, labeled as a spurious correlation.]
Such correlations, in which the variables appear related but neither directly causes the other, are known as spurious correlations.
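A small numerical sketch shows how two unrelated series can still produce a high coefficient. The enrollment figures below are hypothetical and invented purely for illustration; both simply grow from year to year, and `pearson_r` is a helper name chosen for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical yearly school enrollment in two cities (not real data).
city_1 = [1000, 1050, 1100, 1180, 1250]
city_2 = [400, 430, 450, 490, 520]

r = pearson_r(city_1, city_2)

# r is close to 1 because both series rise over time,
# not because enrollment in one city drives enrollment in the other.
print(round(r, 3))
```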
Interpretation of Correlation
The action wherein one variable directly impacts another, creating an effect, is known as causation.
Regression
Given paired observations $(X_i, Y_i)$, a linear equation known as the regression of Y on X is used to predict Y for any given value of X.
Example of Regression
Consider the given data set and the scatter diagram, as shown:
X    2    3    4    5    6    7    8    9   10
Y   31   53   53   75   83   94   92  100  124
[Figure: scatter diagram of Y against X.]
The red points displayed in the graphic representation correspond to the data plots derived from the table.
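$$\bar{x} = \frac{\sum X}{n}, \qquad \bar{y} = \frac{\sum Y}{n}$$
$$\bar{x} = 6, \qquad \bar{y} = 78.3$$
$$r = \frac{\sum (X - \bar{x})(Y - \bar{y})}{n \, \sigma_x \sigma_y} = 0.97$$
Here $\sigma_x$ and $\sigma_y$ are the standard deviations of X and Y computed with divisor n.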
Prediction Value
A regression line, also known as a line of best fit, is a straight line that is used to visualize and quantify
the correlation between two variables in statistical analysis.
[Figure: scatter diagram of the data with the regression line.]
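A least-squares fit of the regression of Y on X for the data set in this example can be sketched as below. The function names `fit_line` and `predict` are assumptions made for the example; the slope is taken as Σ(X - x̄)(Y - ȳ) / Σ(X - x̄)², which is the standard least-squares estimate and is not spelled out in the lesson itself.

```python
def fit_line(xs, ys):
    """Least-squares regression of Y on X; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Estimated Y for a given X on the fitted line."""
    return intercept + slope * x

# Data set from the regression example in this lesson.
X = [2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [31, 53, 53, 75, 83, 94, 92, 100, 124]

intercept, slope = fit_line(X, Y)
print(round(intercept, 2), round(slope, 2))  # prints 17.33 10.17

# The fitted line passes through the point of means (6, 78.3).
print(round(predict(intercept, slope, 6), 1))  # prints 78.3
```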
The coefficient of determination is a statistical measure of how well changes in one variable can account for variations in another.
Source: Investopedia
Coefficient of Determination: Example
As cutting speeds change, predicted values will differ from the average tool life.
Quantifying Quality
When the quality of prediction is good, the error terms are small.
$Y - \bar{y}$ and $\mathrm{Est}(Y) - \bar{y}$
Hence, the above two values will tend to be close to each other.
When variables are highly correlated and the regression equation delivers a high-quality prediction, the unexplained variation tends to be low.
$$\text{Coefficient of determination} = \frac{\text{Explained variation}}{\text{Total variation}}$$
[Figure: scatter diagram of Y against X.]
Coefficient of Determination and Variations
The value of the coefficient of determination is non-negative and does not exceed unity.
When r = ±1: coefficient of determination = r² = 1
When r = ±0.9: coefficient of determination = r² = 0.81
The calculated coefficient of determination for the given data is as follows:
Unexplained variation / Total variation = 0.0523
Coefficient of determination = 1 - 0.0523 ≈ 0.95
This indicates that 95% of the variations in Y are explained by the regression equation.
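The split into explained and unexplained variation can be verified for the regression data set used earlier in this lesson. The sketch below fits the least-squares line and then computes each variation as a sum of squares; the variable names are chosen for the example, and the sum-of-squares definitions are the usual ones rather than something stated explicitly in the lesson.

```python
# Data set from the regression example in this lesson.
X = [2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [31, 53, 53, 75, 83, 94, 92, 100, 124]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Least-squares regression of Y on X.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
         / sum((x - mean_x) ** 2 for x in X))
intercept = mean_y - slope * mean_x
estimates = [intercept + slope * x for x in X]

# Variations measured as sums of squared deviations.
total = sum((y - mean_y) ** 2 for y in Y)                 # Y - mean
explained = sum((e - mean_y) ** 2 for e in estimates)     # Est(Y) - mean
unexplained = sum((y - e) ** 2 for y, e in zip(Y, estimates))  # Y - Est(Y)

r_squared = explained / total
unexplained_share = unexplained / total

print(round(r_squared, 4), round(unexplained_share, 4))
```

For a least-squares line the explained and unexplained variations add up to the total variation, so the two printed shares sum to 1.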
Discussion: Spurious Correlation
Duration: 15 minutes
• What does spurious correlation mean?
Answer: A spurious correlation is a term used in statistics to describe a
situation where two variables seem to be related to each other, but in
reality, there is no causal relationship between them.
Knowledge Check 1
____________ is used to analyze the correlation between two variables.
A. Scatter plot
B. Bar graph
C. Pie chart
D. Bubble chart
The formula to find Karl Pearson’s Correlation Coefficient is $r = \operatorname{cov}(X, Y) / (s_x \, s_y)$.
Knowledge Check 3
In Spearman’s Rank Correlation Coefficient, if rankings are diametrically opposite, the correlation between the rankings is ____________.
A. 0
B. -1
C. 0.5
D. +1
In Spearman’s Rank Correlation Coefficient, if rankings are diametrically opposite, the correlation
between the rankings is -1.
Thank You