Simpreg
MEAN MODEL
  n                              6
  Mean of Y                      53.667
  Std. dev. of Y                 5.955
  Std. error of the mean         2.431
  Lower 95% limit for the mean   47.417
  Upper 95% limit for the mean   59.916
Confidence interval for the mean = Mean +/- (t-value)*(std. error of mean)
Confidence interval for a prediction = Mean +/- (t-value)*(RMSE)
Note that the RMSE for the mean model is just the sample standard deviation of the dependent variable, which is also the sample standard deviation of the errors in this case.
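As a numerical check on these two formulas, here is a minimal Python sketch; it assumes numpy and scipy are available, and the variable names are illustrative rather than part of the worksheet:

    import numpy as np
    from scipy import stats

    y = np.array([45, 58, 50, 54, 62, 53])       # the Y data used throughout this workbook
    n = len(y)                                   # 6
    mean = y.mean()                              # 53.667
    sd = y.std(ddof=1)                           # 5.955 = RMSE of the mean model
    se_mean = sd / np.sqrt(n)                    # 2.431 = std. error of the mean
    t = stats.t.ppf(0.975, df=n - 1)             # 2.571 with 5 degrees of freedom

    ci_mean = (mean - t * se_mean, mean + t * se_mean)   # (47.417, 59.916)
    ci_pred = (mean - t * sd, mean + t * sd)             # (38.358, 68.975)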
SIMPLE REGRESSION OF Y ON X (same data)

Statistics of Y (the mean model, for comparison):
  Mean absolute deviation               4.333
  Sum of squared deviations (SST)       177.333
  Sample variance                       35.467
  Sample std. deviation                 5.955
  t-value (5 d.f., 95%)                 2.571
  X     Y     FCST      UPPER 95%
  18    45    50.252    68.975
  25    58    59.215    68.975
  15    50    46.411    68.975
  22    54    55.374    68.975
  24    62    57.935    68.975
  20    53    52.813    68.975

(FCST = forecast from the fitted regression line; UPPER 95% = the mean model's upper 95%
prediction limit, 53.667 + 2.571*5.955 = 68.975, which is the same for every observation.)
Summary statistics of the columns (ERR = Y - FCST):
                   X         Y         FCST      ERR
  Count            6         6
  Mean             20.667    53.667    53.667    0.000
  Std. dev.        3.777     5.955     4.836     3.475
  Variance         14.267    35.467    23.388    12.079
  Mean abs. error                                2.614
SQ ERR (squared forecast errors):
  27.587   1.476   12.879   1.887   16.528   0.035

  Average squared error (SSE/n)            10.065
  Mean absolute error                       2.614
  Sum of squared errors (SSE)              60.393
  Mean squared error (MSE = SSE/(n-2))     15.098
  Root mean squared error (RMSE)            3.886
  t-value (4 d.f., 95%)                     2.776
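The FCST column above is just a + b*X, where a and b are the least-squares intercept and slope. A minimal numpy sketch of that calculation (the names a, b, fcst, etc. are illustrative, not the worksheet's):

    import numpy as np

    x = np.array([18, 25, 15, 22, 24, 20])
    y = np.array([45, 58, 50, 54, 62, 53])

    # Least-squares slope and intercept for simple regression
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)   # 1.280
    a = y.mean() - b * x.mean()                                                 # 27.206

    fcst = a + b * x              # 50.252, 59.215, 46.411, 55.374, 57.935, 52.813
    err = y - fcst                # forecast errors
    sse = np.sum(err ** 2)        # 60.393
    mse = sse / (len(x) - 2)      # 15.098
    rmse = np.sqrt(mse)           # 3.886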
The difference between unadjusted and adjusted R-squared is that the former uses VAR(ERR) = SSE/(n-1) as its estimate of the error variance, whereas the latter uses MSE = SSE/(n-2). The former is a biased measure of the error variance, whereas the latter is an unbiased estimate, correcting for the fact that 2 coefficients have been estimated, not 1.
Also note that MSE is not just the sample mean of the squared errors: it is the sum of squared errors divided by n-p (here p = 2), not divided by n.
The RMSE for a regression model is also called the Standard Error of the Estimate (SEE).
The exact confidence interval for a prediction is equal to the prediction +/- (t-value)*(std. dev. of the prediction); however, the std. dev. of the prediction is NOT simply the RMSE of the model (unlike in the mean model). Rather, it includes an additional factor that depends on the standard errors of the coefficients and the values of the independent variables at that point.
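For simple regression that additional factor has a closed form: the std. dev. of a prediction at a new value x0 is SQRT(MSE * (1 + 1/n + (x0 - AVG(X))^2 / SUM((X - AVG(X))^2))). Here is a minimal sketch using that textbook formula; x0 and all variable names are illustrative, not from the worksheet:

    import numpy as np
    from scipy import stats

    x = np.array([18, 25, 15, 22, 24, 20])
    y = np.array([45, 58, 50, 54, 62, 53])
    n = len(x)

    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    mse = np.sum((y - (a + b * x)) ** 2) / (n - 2)     # 15.098

    x0 = 20                                            # an illustrative new value of X
    pred = a + b * x0                                  # 52.813
    # RMSE inflated by terms reflecting the estimation of the intercept and slope
    se_pred = np.sqrt(mse * (1 + 1/n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)))
    t = stats.t.ppf(0.975, df=n - 2)                   # 2.776
    interval = (pred - t * se_pred, pred + t * se_pred)    # roughly (41.1, 64.5)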
This worksheet shows the "brute force" calculation of regression coefficients, predictions, and confidence
intervals using matrix algebra. Essentially the same formulas would work for any number of data points and
independent variables, although the arrays would have to be reshaped.
The "Y" vector (dependent variable) and its deviations-from-mean and squared-deviations-from-mean:
  Y        Y-AVG(Y)    (Y-AVG(Y))^2
  45       -8.667       75.111
  58        4.333       18.778
  50       -3.667       13.444
  54        0.333        0.111
  62        8.333       69.444
  53       -0.667        0.444

Average values:
  53.667    0.000       29.556

(In the spreadsheet, the blue cells are "live": you can change their contents and see what happens.)
29.55556 = POPULATION VARIANCE of Y (named VARPY) is the average squared deviation of Y from its mean
35.46667 = SAMPLE VARIANCE of Y (named VARY) is the average squared deviation of Y from its mean ADJUSTED for the estimation of the mean from the finite sample
(i.e., it is the sum of squared deviations from the mean divided by N-1 rather than N)
Here is "X-transpose" (the X matrix transposed, named XT)
   1    1    1    1    1    1
  18   25   15   22   24   20
SQUARED ERRORS:
27.58704
1.476111
12.87938
1.887414
16.52764
0.034938
60.39252 = Sum of Squared Errors (SSE)
The SIMPLE AVERAGE OF THE SQUARED ERRORS is the sum of squared errors divided by N:
10.06542 (This is a BIASED estimate of the average size of a squared error)
R-SQUARED is equal to 1 minus the average squared error divided by the population variance of Y:
0.659441 (This is a BIASED estimate of the fraction of variance "explained" by the model)
The MEAN SQUARED ERROR (MSE) is equal to the Sum of Squared Errors divided by the # Degrees of Freedom:
15.09813 (This is an UNBIASED estimate of the average size of a squared error)
ADJUSTED R-SQUARED is equal to 1 minus the MSE divided by the sample variance of Y:
0.574301 (This is an UNBIASED estimate of the fraction of variance "explained" by the model)
The STANDARD ERROR OF THE ESTIMATE (SEE) is the square root of the MSE:
3.885631
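A minimal numpy sketch of the quantities just defined, taking SSE from above (the variable names are illustrative):

    import numpy as np

    y = np.array([45, 58, 50, 54, 62, 53])
    n = len(y)
    sse = 60.39252                         # Sum of Squared Errors from above
    sst = np.sum((y - y.mean()) ** 2)      # 177.333 = sum of squared deviations of Y

    avg_sq_err = sse / n                               # 10.065  (biased)
    r_squared = 1 - avg_sq_err / (sst / n)             # 0.6594  (biased)
    mse = sse / (n - 2)                                # 15.098  (unbiased)
    adj_r_squared = 1 - mse / (sst / (n - 1))          # 0.5743  (unbiased)
    see = np.sqrt(mse)                                 # 3.886 = Standard Error of the Estimate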
The COVARIANCE MATRIX OF THE COEFFICIENT ESTIMATES ("COVMAT") is equal to X-transpose-X-inverse
times the MSE:
92.917 -4.37422
-4.37422 0.211656
The STANDARD ERRORS OF THE COEFFICIENT ESTIMATES are the square roots of the diagonal elements
of the covariance matrix:
9.639347
0.460061
The T-STATISTICS OF THE COEFFICIENT ESTIMATES are the coefficients divided by their standard errors:
  Intercept:      27.206 / 9.639 = 2.822
  X coefficient:   1.280 / 0.460 = 2.783
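For reference, a minimal numpy sketch of the whole "brute force" matrix calculation described on this page (variable names are illustrative; covmat corresponds to the worksheet's COVMAT):

    import numpy as np

    x = np.array([18, 25, 15, 22, 24, 20])
    y = np.array([45.0, 58, 50, 54, 62, 53])
    X = np.column_stack([np.ones_like(x), x])     # 6x2 design matrix; X.T is the worksheet's XT

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                      # [27.206, 1.280] = intercept and slope

    errors = y - X @ beta
    sse = errors @ errors                         # 60.393
    mse = sse / (len(y) - X.shape[1])             # 15.098 (degrees of freedom = n - 2)

    covmat = mse * XtX_inv                        # [[92.917, -4.374], [-4.374, 0.2117]]
    std_errs = np.sqrt(np.diag(covmat))           # [9.639, 0.460]
    t_stats = beta / std_errs                     # [2.822, 2.783]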
[Chart: Y (data) and FCST (regression forecasts) plotted against X.]
Excel regression
ANOVA
                df    SS          MS         F          Significance F
  Regression     1    116.9408    116.9408   7.745383   0.049663
  Residual       4     60.39252    15.09813
  Total          5    177.3333

                Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
  Intercept     27.205607      9.639347         2.82235    0.047714   0.442436    53.96878
  X Variable 1   1.280374      0.460061         2.783053   0.049663   0.003037     2.55771
RESIDUAL OUTPUT

  Observation   Predicted Y   Residuals   Standard Residuals
  1             50.2523       -5.2523     -1.3517
  2             59.2150       -1.2150     -0.3127
  3             46.4112        3.5888      0.9236
  4             55.3738       -1.3738     -0.3536
  5             57.9346        4.0654      1.0463
  6             52.8131        0.1869      0.0481
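The same output can be reproduced in Python; here is a minimal sketch assuming the statsmodels package is available (statsmodels is not part of this worksheet):

    import numpy as np
    import statsmodels.api as sm

    x = np.array([18, 25, 15, 22, 24, 20])
    y = np.array([45, 58, 50, 54, 62, 53])

    X = sm.add_constant(x)              # adds the intercept column
    fit = sm.OLS(y, X).fit()

    print(fit.summary())                # regression statistics, ANOVA, coefficient table
    print(fit.params)                   # [27.206, 1.280]  = Intercept, X Variable 1
    print(fit.bse)                      # [9.639, 0.460]   = standard errors
    print(fit.conf_int(alpha=0.05))     # the Lower 95% / Upper 95% columns
    print(fit.fittedvalues)             # Predicted Y
    print(fit.resid)                    # Residuals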
[Residual plot: Residuals vs. X Variable 1.]
[Line fit plot: Y and Predicted Y vs. X Variable 1.]
SG regression