CN OLS
Example:
• Suppose you are interested in the average relationship between income (y)
and education (x).
• For the people with 12 years of schooling (x =12), what is the average
income (E(y|x=12))?
• For the people with x years of schooling, what is the average income
(E(y|x))?
• Regression model:
y = E(y|x) + ε,
where ε is a disturbance (error) term with E(ε|x) = 0.
• Regression analysis aims to estimate E(y|x).
• Conditional pdf:
f(y|x) = Pr(Y = y, given X = x) = f(x,y)/fx(x).
• Stochastic independence:
• X and Y are stochastically independent iff f(x,y) = fx(x)fy(y), for all x,y.
• Under this condition, f(y|x) = f(x,y)/fx(x) = [fx(x)fy(y)]/fx(x) = fy(y).
• Marginal pdf of x (in this example, f(x,y) = 1/4 for each (x,y) with x, y ∈ {0,1}):
fx(0) = Pr(X = 0, regardless of y) = f(0,1) + f(0,0) = 1/4 + 1/4 = 1/2.
fx(1) = Pr(X = 1, regardless of y) = f(1,1) + f(1,0) = 1/4 + 1/4 = 1/2.
• Conditional pdf:
f(y = 1| x = 1) = f(1,1)/fx(1) = (1/4)/(1/2) = 1/2;
f(y = 0| x = 1) = f(1,0)/fx(1) = 1/2.
→ f(y| x=1) = 1/2, for y = 0, 1.
• Stochastic independence:
fx(x) = fy(y) = 1/2; fx(x)fy(y) = 1/4 = f(x,y), for any x and y.
Thus, x and y are stochastically independent.
Means:
μx = E(x) = ΣxΣyxf(x,y) = Σxxfx(x).
μy = E(y) = ΣxΣyyf(x,y) = Σyyfy(y).
Variances:
σx² = E[(x − μx)²] = ΣxΣy(x − μx)²f(x,y) = Σx(x − μx)²fx(x)
    = E(x²) − [E(x)]² = Σx x²fx(x) − μx².
σy² = ΣxΣy(y − μy)²f(x,y) = Σy(y − μy)²fy(y)
    = E(y²) − [E(y)]² = Σy y²fy(y) − μy².
Covariance:
σxy = cov(x, y) = E[(x − μx)(y − μy)] = ΣxΣy(x − μx)(y − μy)f(x,y)
    = E(xy) − μxμy = ΣxΣy xy f(x,y) − μxμy.
Correlation Coefficient:
The correlation coefficient between x and y is defined by:
ρxy = σxy/(σxσy).
Theorem:
-1 ≤ ρxy ≤ 1.
Theorem:
If X and Y are stochastically independent, then σxy = 0, but not vice versa.
• var(y|x) = E[(y − E(y|x))²|x] = Σy(y − E(y|x))²f(y|x).
Regression model:
• Let ε = y - E(y|x) (deviation from conditional mean).
• y = E(y|x) + y - E(y| x) = E(y|x) + ε (regression model).
• E(y|x) = explained part of y by x.
ε = unexplained part of y (called disturbance term).
E(ε|x) = 0 and var(ε| x) = var(y|x).
Note:
• E(y|x) may vary with x, i.e., E(y|x) is a function of x.
• Thus, we can define Ex[E(y|x)], where Ex(•) is the expectation over x =
Σx•fx(x) or ∫Ω•fx(x)dx.
Note:
For discrete RV, X with x = x1, ...,
E(y) = ΣxE(y|x)fx(x) = E(y|x=x1)fx(x1) + E(y|x=x2)fx(x2) + ... .
Implication:
If you know the conditional mean of y and the marginal distribution of x, you can also find the unconditional mean of y.
Definition:
We say that y is homoskedastic if var(y|x) is constant.
[Figure: the conditional mean line E(y|x) = β1 + β2x plotted against x, with the conditional distributions of y shown at x = x1 and x = x2.]
Note:
• varx[E(y|x)] ≤ var(y), since Ex[var(y|x)] ≥ 0.
• var(y) = E[(y-E(y))2]
= total variation of y.
varx[E(y|x)] = Ex[(E(y|x)-E(y))2]
= a part of variation in y due to variation in E(y|x)
= variation in y explained by E(y|x).
Coefficient of Determination:
R2 = varx[E(y|x)]/var(y).
→ Measure of worthiness of knowing E(y|x).
→ 0 ≤ R2 ≤ 1.
Note:
• R2 = variation in y explained by E(y|x)/total variation of y.
• Wish R2 close to 1.
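Both facts above follow from the standard variance decomposition (law of total variance):
var(y) = Ex[var(y|x)] + varx[E(y|x)],
so varx[E(y|x)] = var(y) − Ex[var(y|x)] ≤ var(y), and hence 0 ≤ R² = varx[E(y|x)]/var(y) ≤ 1.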
[Figure: the regression line for the example below, with E(y|x = 4) = 4/3 and E(y|x = 8) = 2.]
• Joint and marginal pdfs:
Y\X      4      8      fy(y)
1        1/2    0      1/2
2        1/4    1/4    1/2
fx(x)    3/4    1/4
• Conditional pdf f(y|x) = f(x,y)/fx(x):
Y\X      4      8
1        2/3    0
2        1/3    1
• Conditional variance of Y:
• var(y|x=4) = Σy[y-E(y|x=4)]2f(y|x=4) = 6/27.
• var(y|x=8) = 0.
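As a check on the tables above:
E(y|x=4) = 1(2/3) + 2(1/3) = 4/3; E(y|x=8) = 1(0) + 2(1) = 2;
var(y|x=4) = (1 − 4/3)²(2/3) + (2 − 4/3)²(1/3) = 2/27 + 4/27 = 6/27;
Ex[E(y|x)] = (4/3)(3/4) + 2(1/4) = 3/2, which matches E(y) = Σy y fy(y) = 1(1/2) + 2(1/2) = 3/2.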
• Bivariate normal distribution:
(x; y) ~ N([μx; μy], [σx², ρxyσxσy; ρxyσxσy, σy²]).
f(x, y) = 1/[2πσxσy√(1 − ρxy²)]
× exp(−{1/[2(1 − ρxy²)]}{(x − μx)²/σx² − 2ρxy(x − μx)(y − μy)/(σxσy) + (y − μy)²/σy²}),
where x, y ∈ ℜ.
Facts:
• f x ( x ) ~ N ( μ x , σ x2 ) and f y ( y ) ~ N ( μ y , σ y2 ) .
→ Cov(x) is symmetric.
EX: If x is scalar, Cov(x) = E[(x-µ)2] = var(x).
EX: x = [x1,x2]′ ; E(x) = µ = [µ1, µ2]′
x - µ = [x1-µ1, x2-µ2]′
→ (x − µ)(x − µ)′ = [x1 − µ1; x2 − µ2](x1 − µ1, x2 − µ2)
= [(x1 − µ1)², (x1 − µ1)(x2 − µ2); (x2 − µ2)(x1 − µ1), (x2 − µ2)²].
→ E[(x-µ)(x-µ)′] = Cov(x).
For a random matrix B = [Bij]p×q, the expectation is taken element by element:
E(B) = [E(B11), E(B12), ..., E(B1q); :, :, :; E(Bp1), E(Bp2), ..., E(Bpq)].
Pdf of x:
f(x) = f(x1, ..., xn) = (2π)^(−n/2)|Σ|^(−1/2) exp[−(1/2)(x − µ)′Σ⁻¹(x − µ)],
where |Σ| = det(Σ).
EX: For n = 1 (x scalar, Σ = σx²), this reduces to
f(x) = [1/(√(2π)σx)] exp[−(x − μx)²/(2σx²)].
EX:
Assume that all the Xi (i = 1, ..., n) are iid with N(μx, σx²). Then, using (1) and (2), we can show that f(x) = f(x1, ..., xn) = ∏_{i=1}^n f(xi),
where f(xi) = [1/(√(2π)σx)] exp[−(xi − μx)²/(2σx²)].
Theorem:
(1) E(c′x) = c′E(x);
(2) var(c′x) = c′Cov(x)c.
Proof:
(1) E(c′x) = E(Σjcjxj) = E(c1x1 + ... + cnxn)
= c1E(x1) + ... + cnE(xn) = ΣjcjE(xj) = c′E(x).
(2) var(c′x) = E[(c′x - E(c′x))2] = E[{c′x - c′E(x)}2]
= E[{c′(x-E(x))}2] = E[{c′(x-E(x))}{c′(x-E(x))}]
= E[{c′(x-E(x))}{(x-E(x))′c}]
= E[c′(x-E(x))(x-E(x))′c] = c′E[(x-E(x))(x-E(x))′]c = c′Cov(x)c.
Remark:
(2) implies that Cov(x) is always positive semidefinite.
→ c′Cov(x)c ≥ 0, for any nonzero vector c.
Proof:
For any nonzero vector c, c′Cov(x)c = var(c′x) ≥ 0.
Definition:
Let B = [bij]n x n be a symmetric matrix, and c = [c1, ... , cn]′. Then, a scalar c′Bc
is called a quadratic form of B.
Definition:
• If c′Bc > (<) 0 for any nonzero vector c, B is called positive (negative)
definite.
• If c′Bc ≥ (≤ ) 0 for any nonzero c, B is called positive (negative)
semidefinite.
EX:
Show that B = [2, 1; 1, 2] is positive definite.
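One way to check this:
c′Bc = 2c1² + 2c1c2 + 2c2² = c1² + c2² + (c1 + c2)² > 0 for any nonzero c = (c1, c2)′, so B is positive definite.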
End of Digression
Example:
• Wish to find important determinants of individuals’ earnings and estimate
the size of the effect of each determinant.
• Data: (WAGE2.WF1 or WAGE2.TXT)
• Here,
• β2,o × 100 = %Δ in wage by one more year of education.
• (β3,o+2β4,oexper) × 100 = %Δ by one more year of exper.
• Issues:
• How to estimate βo’s?
• Estimated β's would not be equal to the true values of β (βo). How close would our estimates be to the true values?
As you may find, the above assumptions are unrealistic. But under the
assumptions, more intuitive discussions about the statistical properties of OLS
can be made. The statistical properties of OLS discussed later still hold even
under more realistic assumptions.
Notation:
• E ( xt1 ) is the group population mean of x1 for group t, while E(x1) is the
population mean of x1 for the whole population.
Comment:
• Usually, xt1 = 1 for all t. That is, β1 is an overall intercept term.
• E(εt | xt•) = 0.
• E(xt•εt) = Ext•[E(xt•εt | xt•)] = Ext•[xt•E(εt | xt•)] = Ext•(0) = 0.
Comment:
• No other β* such that E ( yt | xt i ) = xt′i βo = xt′i β* for all t.
• The uniqueness assumption of βo is called “identification” condition.
• Rules out perfect multicollinearity (perfect linear relationship among the
regressors):
• Suppose β = (β1, β2, β3)′ and xt3 = xt1 + xt2 for all t.
• Set β1,* = β1,o + a; β2,* = β2,o + a; β3,* = β3,o − a for an arbitrary a ∈ ℜ.
• Then,
xt•′β* = xt1β1,* + xt2β2,* + xt3β3,*
= xt1β1,o + xt2β2,o + xt3β3,o + a(xt1 + xt2 − xt3)
= xt•′βo.
• (SIC.2) rules out this possibility.
Comment:
• Moments such as E(yt²xt2²), E(xt3xt4³), E(xt4³), etc., exist.
• Rules out extreme outliers.
• We need this assumption for consistency and asymptotic normality of the
OLS estimator.
• SIC implies the Weak Ideal Conditions (WIC) that will be discussed
later.
• Violated if xt2 = t or xt2 = xt−1,2 + vt2.
Comment:
• ( yt , xt1 , xt 2 ,..., xtk )′ are iid (independently and identically distributed):
• T groups which are iid with
E[(yt; xt•)] = E[(y; x)] and Cov[(yt; xt•)] = Cov[(y; x)].
• One observation is drawn from each of the T group.
• Could be appropriate for cross-section data.
• Violated if time series data are used. That is why we add “strong” to the name of the conditions.
• If T < k , there are infinitely many β* such that xt′i βo = xt′i β* for all t. For
this case, the sample cannot identify β.
• Implies no autocorrelation: cov(ε t , ε s ) = 0 for all t ≠ s .
[Figure: scatter plots of HETY vs. X2 (heteroskedastic errors) and HOMY vs. X2 (homoskedastic errors).]
Comment:
• Optional. Not critical.
• This condition implies that β1,o is an overall intercept term.
• Need this assumption for convenient interpretation of empirical R2.
→ E(x•y) = E(x•x•′)βo
→ βo = [E(x•x•′)]⁻¹E(x•y),
where
E(x•x•′) = E[(1; x2)(1, x2)] = E[1, x2; x2, x2²] = [1, E(x2); E(x2), E(x2²)];
E(x•y) = E[(1; x2)y] = E[(y; x2y)] = [E(y); E(x2y)].
→ β2,o = cov(x2, y)/var(x2); β1,o = E(y) − β2,oE(x2).
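The last line uses the inverse of the 2×2 moment matrix:
[1, E(x2); E(x2), E(x2²)]⁻¹ = [1/var(x2)]·[E(x2²), −E(x2); −E(x2), 1],
so the second element of βo = [E(x•x•′)]⁻¹E(x•y) is β2,o = [E(x2y) − E(x2)E(y)]/var(x2) = cov(x2, y)/var(x2), and the first element simplifies to β1,o = E(y) − β2,oE(x2).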
Write the model as yt = xt1β1,o + wt•′βw,o + εt, where wt• = (xt2, ..., xtk)′ and βw,o = (β2,o, ..., βk,o)′. Suppose that this model satisfies (SIC.1)-(SIC.4) and (SIC.7). Then,
βo = [E(xt•xt•′)]⁻¹E(xt•yt) = [E(x•x•′)]⁻¹E(x•y).
And,
βw,o = [Cov(w•)]⁻¹Cov(w•, y).
• β2,o ≠ 0 means non-zero correlation between yt and xt2 (the regressor and the dependent variable). It does not mean that xt2 causes yt; β2,o ≠ 0 could instead mean that yt causes xt2.
• SIC do not talk about causality. SIC may hold even if yt determines xt • :
It can be the case that E ( edut | waget ) = β1,o + β 2,o waget .
• But, the regression model (1) is not meaningful if the x variables are not causal variables. We would like to know by how much the hourly wage rate increases with one more year of education. We would not be interested in how many more years of education an individual could have obtained if his/her current wage rate increased now by $1!
Definition:
Given observations on yt and the regressors xt1, ..., xtk, the OLS estimator β̂ = (β̂1, β̂2, ..., β̂k)′ minimizes:
ST(β) ≡ Σt(yt − xt1β1 − ... − xtkβk)² = Σt(yt − xt•′β)² = (y − Xβ)′(y − Xβ),
where y = (y1, y2, ..., yT)′ and X is the T×k matrix whose t'th row is xt•′ = (xt1, xt2, ..., xtk).
ST ( β1 , β 2 ) = Σt ( yt − xt1β1 − xt 2 β 2 ) 2 .
• The first order condition for minimization:
∂ST / ∂β1 = Σt2(yt-xt1β1-xt2β2)(-xt1) = 0 → Σt(xt1yt - xt12β1-xt1xt2β2) = 0
∂ST / ∂β 2 = Σt2(yt-xt1β1-xt2β2)(-xt2) = 0 → Σt(xt2yt - xt1xt2β1-xt22β2) = 0
→ Σtxt1yt = (Σtxt12)β1 + (Σtxt1xt2)β2
Σtxt2yt = (Σtxt1xt2)β1 + (Σtxt22)β2
→ (Σt xt1yt; Σt xt2yt) = [Σt xt1², Σt xt1xt2; Σt xt2xt1, Σt xt2²](β̂1; β̂2).
→ βˆ = (X′X)-1X′y.
∂ST(β)/∂β = −2X′y + 2X′Xβ = 0k×1
→ X′y − X′Xβ = 0k×1 (2)
∂²ST(β)/∂β∂β′ = [∂²ST(β)/∂βi∂βj]k×k = 2X′X,
which is a positive definite matrix for any value of β. That is, the function
ST(β) is globally convex. This indicates that βˆ indeed minimizes ST(β).
[Here, we use the fact that ∂²(β′Aβ)/∂β∂β′ = 2A for any symmetric matrix A.]
Definition:
• t'th residual: et = yt − xt′i βˆ (can be viewed as an estimate of εt).
• Vector of residuals: e = ( e1 ,..., eT )′ = y − X βˆ .
Theorem: X ′e = 0k ×1
Proof:
From the proof of the previous theorem,
X ′y − X ′X βˆ = 0k ×1 → X ′( y − X βˆ ) = 0k ×1 → X ′e = 0 .
Corollary:
If (SIC.7) holds ( xt1 = 1 for all t: β1 is the intercept), Σtet = 0.
Proof:
X′e = [x11, x21, ..., xT1; x12, x22, ..., xT2; :, :, :; x1k, x2k, ..., xTk][e1; e2; :; eT] = [Σt xt1et; Σt xt2et; :; Σt xtket] = [0; 0; :; 0]k×1.
With xt1 = 1 for all t, the first element gives Σt xt1et = Σt et = 0.
Facts:
1) P(A) and M(A) are both symmetric and idempotent:
P(A)′ = P(A), M(A)′ = M(A), P(A)P(A) = P(A), M(A)M(A) = M(A).
2) P(A) and M(A) are psd (positive semi-definite).
3) P(A)M(A) = 0T×T (orthogonal).
4) P(A)A = [A(A′A)-1A′]A = A.
5) M(A)A = [IT-P(A)]A = A - P(A)A = A - A = 0T×T.
End of Digression
Theorem: e = M(X)y.
<Proof> e = y − Xβ̂ = y − X(X′X)⁻¹X′y = [IT − X(X′X)⁻¹X′]y = M(X)y.
Comment:
β A is different from the OLS estimate of βA from a regression of y on XA.
Theorem:
Consider the following models:
(A) yt = β1 + β2xt2 + β3xt3 + error
(B) yt = α1 + α2xt2 + error
(C) xt3 = δ1 + δ2xt2 + error
Then, α 2 = β 2 + δ 2 β 3 .
β* = [X*′M(1T)X*]⁻¹X*′M(1T)y.
Observe that:
M (1T ) y = ( y1 − y y2 − y ... yT − y )′ .
Now, complete the proof by yourself.
Example:
• A simple regression model: yt = β1 + β2xt2 + εt, with β1,o = β2,o = 1.
• For population A, σo2 = 1. For population B, σo2 = 10.
[Figure: scatter plots of YA vs. X2 (population A, σo² = 1) and YB vs. X2 (population B, σo² = 10).]
Definition:
• "Fitted value" of yt: yˆ t = xt′i βˆ (an estimate of E ( yt | xt i ) ).
• SSE = e′e = (y − Xβ̂)′(y − Xβ̂) = y′y − 2β̂′X′y + β̂′X′X(X′X)⁻¹X′y = y′y − β̂′X′y.
Theorem:
Suppose that xt1 = 1 for all t (SIC.7). Then SST = SSR + SSE, where
SST = Σt(yt − ȳ)² = Σt yt² − Tȳ², SSR = Σt(ŷt − ȳ)², and SSE = Σt et² = e′e.
Implication:
Total variation of yt equals sum of explained and unexplained variations of yt.
Theorem:
Suppose that xt1 = 1, for all t (SIC.7). Then, R2 = SSR/SST and 0 ≤ R2 ≤ 1.
Note:
1) If (SIC.7) holds, then, R2 = 1 - (SSE/SST) = SSR/SST.
2) If (SIC.7) does not hold, then, 1 - (SSE/SST) ≠ SSR/SST.
3) 1 - (SSE/SST) can never be greater than 1, but it could be negative.
SSR/SST can never be negative, but it could be greater than 1.
Note:
• Some people use Ru² ≡ ŷ′ŷ/(y′y) = 1 − e′e/(y′y) when the model has no intercept term.
• 0 ≤ Ru² ≤ 1, since e′e + ŷ′ŷ = y′y. [Why? Try it at home.]
→ This holds even if (SIC.7) does not hold.
• If ȳ = 0, then Ru² = R².
Definition:
An estimator of the covariance between yt and ŷt (which can be viewed as an estimate of E(yt | xt•)) is defined by:
ecov(yt, ŷt) = [1/(T − 1)]Σt(yt − ȳ)(ŷt − ỹ),
where ỹ = T⁻¹Σt ŷt. Similarly, the estimators of var(yt) and var(ŷt) are defined by:
evar(yt) = [1/(T − 1)]Σt(yt − ȳ)²; evar(ŷt) = [1/(T − 1)]Σt(ŷt − ỹ)².
Then, the estimated correlation coefficient between yt and ŷt is defined by:
ρ̂ = ecov(yt, ŷt)/√[evar(yt)·evar(ŷt)].
Theorem:
When k increases, SSE never increases.
Proof:
Compare:
Model 1: y = Xβ + ε
Model 2: y = Xβ + Zγ + υ = Wξ + υ,
where W = [X,Z] and ξ = [β′,γ′]′.
SSE1 = SSE from M1 = y′M(X)y = y′y - y′P(X)y
SSE2 = SSE from M2 = y′M(W)y = y′y - y′P(W)y
= y′y - y′[P(X)+P{M(X)Z}]y
= y′y - y′P(X)y - y′P{M(X)Z}y
SSE1 - SSE2 = y′P{M(X)Z}y ≥ 0.
Comment:
If θ̂ is more efficient than θ̃, it means that the value of θ̂ obtained from a particular sample would generally be closer to the true value of θ (θo) than the value of θ̃.
• Consider x̃ = x1 (a single observation) as an alternative estimator of the mean: var(x̃) = var(x1) = σo².
• Thus, var(x̄) = σo²/T < σo² = var(x̃), if T > 1.
Gauss Exercise:
• From N(0,9), draw 1,000 random samples of size equal to T = 100.
• For each sample, compute x̄ (the sample mean) and x̃ (the first observation of the sample).
• Draw a histogram for each estimator.
• Gauss program name: mmonte.prg.
seed = 1;
tt = 100; @ # of observations @
iter = 1000; @ # of sets of different data @
storem = zeros(iter,1); @ stores the sample mean of each sample @
stores = zeros(iter,1); @ stores the first observation of each sample @
i = 1;
do while i <= iter; @ loop over the iter samples @
x = 3*rndns(tt,1,seed); @ T draws from N(0,9) @
m = meanc(x);
storem[i,1] = m;
stores[i,1] = x[1,1];
i = i + 1; endo;
output off ;
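Note: under this setup, var(x̄) = σo²/T = 9/100 = 0.09 while var(x̃) = var(x1) = 9, so the histogram of storem should be far more tightly concentrated around 0 than the histogram of stores.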
Definition: (Unbiasedness)
θˆ is unbiased iff E (θˆ) = θo :
E(θ̂) = [E(θ̂1); E(θ̂2); :; E(θ̂p)] = [θ1,o; θ2,o; :; θp,o] = θo.
Remark:
var(c′θ̃) ≥ var(c′θ̂).
Example:
• Let θ = (θ1, θ2)′. Suppose Cov(θ̂) = [1, 0; 0, 1]; Cov(θ̃) = [1.5, 1; 1, 1.5].
• Note that:
var(θ̂1) = 1 < 1.5 = var(θ̃1); var(θ̂2) = 1 < 1.5 = var(θ̃2).
• But,
Cov(θ̃) − Cov(θ̂) = [0.5, 1; 1, 0.5] ≡ A → |A1| = 0.5 > 0; |A2| = −0.75 < 0 (leading principal minors).
• A is neither positive nor negative semi-definite.
• θ̂ is not necessarily more efficient than θ̃.
• For example, suppose you wish to estimate θ1 − θ2 = c′θ (where c = (1,−1)′):
• var(c′θ̂) = c′Cov(θ̂)c = 2; var(c′θ̃) = c′Cov(θ̃)c = 1.
• That is, for the given c = (1,−1)′, c′θ̃ is more efficient than c′θ̂.
• This example is a case where relative efficiency of estimators depends on
c. For such cases, we can’t claim that one estimator is superior to others.
• Choosing c = (1, 0, ..., 0)′ shows var(θ̂1) ≤ var(θ̃1); choosing c = (0, 1, 0, ..., 0)′ shows var(θ̂2) ≤ var(θ̃2). Keep doing this until j = p.
Proj(yt | xt•) = xt•′βp.
Theorem:
E(xjep) = 0 for all j = 1, ..., k. That is, E(x•ep) = 0k×1.
Proof:
Recall X′e = 0k×1 → Σ_{t=1}^T xt•et = 0k×1. That is, E(x•ep) = (1/B)Σ_{t=1}^B xt•ep,t = 0.
Comment:
• E(x•ep) = 0 does not imply E(ep | x•) = 0, although the latter implies the former.
Proof:
βp = (Σ_{t=1}^B xt•xt•′)⁻¹ Σ_{t=1}^B xt•yt = [(1/B)Σ_{t=1}^B xt•xt•′]⁻¹ (1/B)Σ_{t=1}^B xt•yt.
Comment:
• Intuitively, the OLS estimator is a consistent estimator of β p .
Comment:
• The whole population consists of T groups, and each group has fixed xt•.
We draw yt from each group. The value of yt would change over different
trials, but the value of xt• remains the same.
• Can be replaced by the assumption that E(εt | x1•, ..., xT•) = 0 for all t (assumption of strictly exogenous regressors). This assumption holds as long as (SIC.1)-(SIC.4) hold. If you do not use (SIC.8), the distributions of β̂ and s² obtained below are conditional ones, conditional on x1•, x2•, ..., xT•.
Theorem:
Assume (SIC.1)-(SIC.6) and (SIC.8). Then,
• E ( βˆ ) = β o (unbiased)
• Cov ( βˆ ) = σ o2 ( X ′X ) −1
Numerical Exercise:
• yt = β1 + β2xt2 + β3xt3 + εt, T = 5:
y = (0, 0, 1, 1, 3)′; X = [1, −2, 4; 1, −1, 1; 1, 0, 0; 1, 1, 1; 1, 2, 4].
• Then,
X′X = [5, 0, 10; 0, 10, 0; 10, 0, 34]; X′y = (5, 7, 13)′; y′y = 11; ȳ = 1.
1) Compute β̂:
(X′X)⁻¹ = [17/35, 0, −1/7; 0, 1/10, 0; −1/7, 0, 1/14]
→ β̂ = (X′X)⁻¹X′y = (0.571, 0.7, 0.214)′.
3) Estimate Cov(β̂):
• SSE = y′y − β̂′X′y = 11 − (0.571, 0.7, 0.214)(5, 7, 13)′ = 0.46
• s² = SSE/(T − k) = 0.46/2 = 0.23, so the estimate of Cov(β̂) is s²(X′X)⁻¹ = 0.23(X′X)⁻¹.
• SST = y′y − Tȳ² = 11 − 5 = 6; SSR = SST − SSE = 5.54.
5) Compute R² and R̄²:
• R² = SSR/SST = 5.54/6 = 0.923
• R̄² = 1 − [(T − 1)/(T − k)](1 − R²) = 1 − [(5 − 1)/(5 − 3)](1 − 0.923) = 0.846.
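These numbers are easy to check by computer. Below is a minimal GAUSS sketch (a hypothetical check program, not one of the course's .prg files) that reproduces β̂, SSE, R², and R̄² for this data set.
new;
y = { 0, 0, 1, 1, 3 };                        @ 5x1 dependent variable @
x = { 1 -2 4, 1 -1 1, 1 0 0, 1 1 1, 1 2 4 };  @ 5x3 regressor matrix @
tt = rows(y); k = cols(x);
bhat = invpd(x'x)*(x'y);                      @ OLS: (X'X)^(-1)X'y @
e = y - x*bhat;                               @ residuals @
sse = e'e;
sst = y'y - tt*meanc(y)^2;
r2 = 1 - sse/sst;
r2bar = 1 - ((tt-1)/(tt-k))*(1 - r2);
print bhat; print sse; print r2; print r2bar;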
Lemma D.1:
βˆ = β o + ( X ′X ) −1 X ′ε .
Proof:
y = X βo + ε .
β̂ = (X′X)⁻¹X′y = (X′X)⁻¹X′(Xβo + ε) = βo + (X′X)⁻¹X′ε.
Theorem: (Unbiasedness)
E ( βˆ ) = β o .
Proof:
E ( βˆ ) = E [ β o + ( X ′X ) −1 X ′ε ] = β o + ( X ′X ) −1 X ′E (ε ) = β o .
2) Show Cov(β̂) = σo²(X′X)⁻¹:
Cov(β̂) = E[(β̂ − βo)(β̂ − βo)′] = (X′X)⁻¹X′E(εε′)X(X′X)⁻¹
= (X′X)⁻¹X′(σo²IT)X(X′X)⁻¹ = σo²(X′X)⁻¹X′IT X(X′X)⁻¹
= σo²(X′X)⁻¹X′X(X′X)⁻¹ = σo²(X′X)⁻¹.
3) Show E ( s 2 ) = σ o2 .
Lemma D.2:
SSE = e′e = y ′M ( X ) y = ε ′M ( X )ε .
Proof:
SSE = y′M(X)y = (Xβ+ε)′M(X)(Xβ + ε) = (β′X′+ε′)M(X)ε = ε′M(X)ε.
Theorem:
E ( SSE ) = (T − k )σ o2 .
Lemma D.3:
For Am×n and Bn×m, tr(AB) = tr(BA).
Lemma D.4:
If B is an idempotent n×n matrix,
rank(B) = tr(B).
[Comment]
• For Lemma D.4, many econometrics books assume B to be also symmetric.
But the matrix B does not have to be.
• An idempotent matrix does not have to be symmetric: For example,
[1/2, 1; 1/4, 1/2] and [1, a; 0, 0].
• Theorem DA.1:
The eigenvalues of an idempotent matrix, say B, are ones or zeros.
<Proof> λξ = Bξ = B²ξ = Bλξ = λ²ξ → λ² = λ → λ = 0 or 1.
• Theorem DA.3:
rank (B) = # of non-zero eigenvalues of B [See Greene.]
Example:
Let A be T×k (T > k). Show that rank[IT-A(A′A)-1A′] = T - k.
[Solution]
rank[IT-A(A′A)-1A′]
= tr(IT - A(A′A)-1A′)
= tr(IT) - tr[A(A′A)-1A′] = T - tr[(A′A)-1A′A]
= T - tr(Ik) = T - k.
End of Digression.
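With these lemmas in hand, a short sketch of why E(SSE) = (T − k)σo², as stated in the theorem above:
E(SSE) = E[ε′M(X)ε] = E[tr(ε′M(X)ε)] = E[tr(M(X)εε′)] = tr[M(X)E(εε′)] = σo² tr[M(X)]
= σo²{tr(IT) − tr[X(X′X)⁻¹X′]} = σo²{T − tr[(X′X)⁻¹X′X]} = σo²(T − k).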
Lemma D. 5:
Let zT×1 ~ N(μT×1, ΩT×T). Suppose that A is a k×T nonstochastic matrix. Then,
b + Az ~ N(b + Aμ, AΩA′).
Theorem: βˆ ~ N ( β o , σ o2 ( X ′X ) −1 )
Proof:
βˆ = β o + ( X ′X ) −1 X ′ε
→ βˆ ~ N(βo+(X′X)-1X′E(ε), (X′X)-1X′Cov(ε)X(X′X)-1)
= N ( β o , σ o2 ( X ′X ) −1 ) .
Lemma D.6:
Let Q be a T×T (nonstochastic) symmetric and idempotent matrix. Suppose
ε ~ N (0T ×1 , σ o2 I T ) . Then,
ε′Qε/σo² ~ χ²(r), r = tr(Q).
Lemma D.7:
Suppose that Q is a T×T (nonstochastic) symmetric and idempotent and B is a
m×T nonstochastic matrix. If ε ~ N (0T ×1 , σ o2 I T ) , Bε and ε′Qε are
stochastically independent iff BQ = 0mxT.
Proof: See Schmidt.
Theorem:
(T − k)s²/σo² = SSE/σo² ~ χ²(T − k).
→ var(s²) = 2σo⁴/(T − k).
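The var(s²) line follows from var[χ²(T − k)] = 2(T − k):
var(s²) = var[(σo²/(T − k))·((T − k)s²/σo²)] = [σo²/(T − k)]²·2(T − k) = 2σo⁴/(T − k).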
Remark:
Let θ = (β′, σ²)′ and θ̂ = (β̂′, s²)′. Then,
Cov(θ̂) = [σo²(X′X)⁻¹, 0k×1; 01×k, 2σo⁴/(T − k)].
Theorem: (Gauss-Markov)
Under (SIC.1) – (SIC.5) (ε may not be normal) and (SIC.8), βˆ is the best
linear unbiased estimator (BLUE) of β.
Comment:
Suppose that β̃ is an estimator which is linear in y; that is, there exists a T×k matrix C such that β̃ = C′y. Let us assume that E(β̃) = βo. Then, the above theorem implies that Cov(β̃) − Cov(β̂) is positive semidefinite.
• f(x1, ..., xT, θo) = ∏_{t=1}^T f(xt, θo).
→ LT(θ) = f(x1, ..., xT, θ) = ∏_{t=1}^T f(xt, θ).
→ lT(θ) = ln[∏_{t=1}^T f(xt, θ)] = Σt ln f(xt, θ).
Theorem:
If there is a function g(θ̂MLE) such that E[g(θ̂MLE)] = θo, then g(θ̂MLE) is the MVUE (minimum variance unbiased estimator) of θo.
Example:
• {x1, ... , xT} is a random sample from a population following a Poisson
distribution [i.e., f(x,θ) = e-θθx/x! (suppressing subscript “o” from θ)].
• Note that E(x) = var(x) = θo for Poisson distribution.
• lT(θ) = Σt ln[f(xt, θ)] = −θT + [ln(θ)]Σt xt − Σt ln(xt!).
• FOC of maximization: ∂lT/∂θ = −T + (1/θ)Σt xt = 0.
• Solving this, θ̂MLE = Σt xt/T = x̄.
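Since E(θ̂MLE) = E(x̄) = E(x) = θo, the sample mean is an unbiased function of the MLE, so by the MVUE result above x̄ is the MVUE of θo.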
Definition: (MLE)
MLE θˆMLE maximizes lT(θ) given data (vector) points x1, ... , xT. That is, θˆMLE
solves
∂lT(θ)/∂θ = [∂lT(θ)/∂θ1; ∂lT(θ)/∂θ2; :; ∂lT(θ)/∂θp] = [0; 0; :; 0]p×1.
Example:
• Let {x1, ... , xT} be a random sample from N ( μo , σ o2 ) .
• f(xt, θ) = [1/√(2πv)] exp[−(xt − μ)²/(2v)], where θ = (μ, v)′.
• ln[f(xt, θ)] = −(1/2)ln(2π) − (1/2)ln(v) − (xt − μ)²/(2v).
• lT(θ) = −(T/2)ln(2π) − (T/2)ln(v) − Σt(xt − μ)²/(2v).
• MLE solves FOC:
(1) ∂lT(θ)/∂μ = −(1/(2v))Σt 2(xt − μ)(−1) = Σt(xt − μ)/v = 0;
(2) ∂lT(θ)/∂v = −T/(2v) + Σt(xt − μ)²/(2v²) = 0.
• From (1):
(3) Σt(xt − μ) = 0 → Σt xt − Tμ = 0 → μ̂MLE = Σt xt/T = x̄.
θ̂MLE = (μ̂MLE; v̂MLE)′ = (x̄; (1/T)Σt(xt − x̄)²)′.
• Note that:
• E(μ̂MLE) = E(x̄) = E[(1/T)Σt xt] = (1/T)Σt E(xt) = (1/T)Σt μo = μo.
• E(v̂MLE) = [(T − 1)/T]σo² (by the fact that E[(1/(T − 1))Σt(xt − x̄)²] = σo²).
→ Let g(v̂MLE) = [T/(T − 1)]v̂MLE.
→ Clearly, E[g(v̂MLE)] = E[(1/(T − 1))Σt(xt − x̄)²] = σo².
→ Thus, g(v̂MLE) is the MVUE of σ².
• LT(θ) = ∏_{t=1}^T f(yt, θ | xt•).
Example:
• Assume that (yt, xt•′)′ are iid and f(yt, βo, vo | xt•) ~ N(xt•′βo, vo).
• f(yt, β, v | xt•) = [1/√(2πv)] exp[−(yt − xt•′β)²/(2v)].
• lT(β, v) = Σt ln f(yt, β, v | xt•)
= −(T/2)ln(2π) − (T/2)ln(v) − (1/(2v))Σt(yt − xt•′β)²
= −(T/2)ln(2π) − (T/2)ln(v) − (1/(2v))(y − Xβ)′(y − Xβ).
End of Digression
Proof:
We already know that E ( βˆ ) = β o and E ( s 2 ) = σ o2 . Thus, it is sufficient to
show that βˆ and s2 are MLE or some functions of MLE. Under (SIC.1) –
(SIC.6) and (SIC.8),
ε ~ N(0T×1, voIT) → y ~ N(Xβo,voIT), where vo = σ o2 .
Therefore, we have the following likelihood function of y:
LT(β, v) = (2π)^(−T/2)|vIT|^(−1/2) exp[−(1/2)(y − Xβ)′(vIT)⁻¹(y − Xβ)]
= (2π)^(−T/2)v^(−T/2) exp[−(1/(2v))(y − Xβ)′(y − Xβ)].
Then,
lT(β,v) = -(T/2)ln(2π) -(T/2)ln(v) - (y-Xβ)′(y-Xβ)/(2v)
= -(T/2)ln(2π) -(T/2)ln(v) - (1/2v)[y′y-2β′X′y+β′X′Xβ].
→ FOC: ∂lT(β,v)/∂β = -(1/2v)[-2X′y + 2X′Xβ] = 0k×1 (i)
∂lT(β,v)/∂v = -(T/2v) + (1/2v2)(y-Xβ)′(y-Xβ) = 0 (ii)
→ From (i), X′y - X′Xβ = 0k×1 → βˆ MLE = (X′X)-1X′y = βˆ .
→ From (ii), v̂MLE = SSE/T → s2 is a function of v̂MLE .
[s2 = [T/(T-k)] v̂MLE ]
where sR = √(R[s²(X′X)⁻¹]R′).
Theorem:
Under Ho: βj,o = βj*,
t = (β̂j − βj*)/se(β̂j) ~ t(T − k).
Proof:
Let R = [0 0 ... 1 ... 0]; that is, only the j′th entry of R equals 1. Let r = βj*. Then,
t = (Rβ̂ − r)/sR = (β̂j − βj*)/√[Rs²(X′X)⁻¹R′] = (β̂j − βj*)/√(s²[(X′X)⁻¹]jj) = (β̂j − βj*)/se(β̂j).
Comment:
• T-Statistics Theorem implies the following:
• Imagine that you collect billions and billions (b) of different samples.
• For each sample, compute the t statistic for the same hypothesis Ho.
Denote the population of these t statistics by {t[1], t[2], ..., t[b]}.
• The above theorem indicates that the population of t-statistics is
distributed as t(T-k).
• Here, you strongly believe that βj,o cannot be negative. If so, you would
regard negative t-statistics as evidence for Ho. So, your
acceptance/rejection decision depends on how positively large the value of
your t-statistic is.
• Choose a critical value (c = 1.701) as in the above graph at 5% significance
level. Then, reject Ho in favor of Ha, if t > c (=1.701). Do not reject Ho, if t
< c.
• Here, you strongly believe that βj,o cannot be positive. If so, you would
regard a positive value of a t-statistic as evidence favoring Ho. So, your
acceptance/rejection decision depends on how negatively large the value of
your t-statistic is.
• Choose a critical value (-c = -1.734) as in the above graph at a given
significance level. Then, reject Ho in favor of Ha, if t < -c (= -1.734). Do not
reject Ho, if t > -c.
3) Student t Distribution
• Let z ~ N(0,1) and y ~ χ2(k). Assume that z and y are stochastically
independent.
z
• Then, t = ~ t(k).
y/k
• If t ~ t(k2), then t² ~ f(1, k2).
• If f ~ f(k1, k2), then k1·f →d χ²(k1) as k2 → ∞.
Gauss Exercise:
• z ~ N(0,1); t ~ t(4); y ~ χ2(2); f ~ f(2,10).
• Gauss program name: dismonte.prg
/*
** Monte Carlo Program for z, x-square, t and f distribution
*/
seed = 1;
iter = 1000; @ # of random draws for each distribution @
@ draws (reconstructed; the original draw lines are missing): z ~ N(0,1), x ~ chi-square(2), t ~ t(4), f ~ f(2,10) @
z = rndns(iter,1,seed);
x = sumc(rndns(2,iter,seed)^2);
t = rndns(iter,1,seed)./sqrt(sumc(rndns(4,iter,seed)^2)/4);
f = (sumc(rndns(2,iter,seed)^2)/2)./(sumc(rndns(10,iter,seed)^2)/10);
@ Histograms @
library pgraph;
graphset;
ytics(0,6,0.1,0) ;
v = seqa(-8,0.1,220);
@ {a1,a2,a3}=histp(z,v); @
@ {b1,b2,b3}=histp(t,v); @
library pgraph;
graphset;
ytics(0,10,0.1,0);
w = seqa(0, 0.1, 330);
[Figure: histograms of the simulated draws — z ~ N(0,1), t ~ t(4), f ~ f(2,10).]
End of Digression
E[R(β̂ − βo)/σR] = 0; var[R(β̂ − βo)/σR] = 1.
→ q1 ≡ R(β̂ − βo)/σR ~ N(0,1).
Note that
q2 ≡ sR/σR = √[Rs²(X′X)⁻¹R′/(Rσo²(X′X)⁻¹R′)] = √(s²/σo²) = √[((T − k)s²/σo²)/(T − k)], which is of the form √[χ²(T − k)/(T − k)].
→ Under Ho: Rβo = r, t = (Rβ̂ − r)/sR = q1/q2 ~ t(T − k), since q1 and q2 are stochastically independent (β̂ and s² are independent).
Example:
• A model is given: yt = xt1β1,o + xt2β2,o + xt3β3,o + εt.
• Wish to test for Ho: β1,o = 0 and β2,o + β3,o = 1.
• Define:
R = [1, 0, 0; 0, 1, 1]; r = (0, 1)′.
Then, Ho → Rβo = r.
Comment:
F-Statistics Theorem implies the following:
• Imagine that you collect billions and billions (b) of different samples.
• For each sample, compute the F statistic for the same hypothesis Ho. Denote
the population of these F statistics as {F[1], F[2], ... , F[b]}.
• The above theorem indicates that the population of the F-statistics is
distributed as f(m,T-k).
Theorem:
β̃ = β̂ − (X′X)⁻¹R′[R(X′X)⁻¹R′]⁻¹(Rβ̂ − r).
Proof: See Greene.
Theorem:
Under Ho: Rβo - r = 0,
E(β̃) = βo.
Theorem:
Assume that (SIC.1)-(SIC.6) and (SIC.8) hold (whether (SIC.7) holds or not).
If Ho is correct, then β̃ is more efficient than β̂.
Proof: Show it by yourself.
Remark:
• Consider a model: yt = xt1β1 + xt2β2 + xt3β3 + xt4β4 + εt.
• Wish to test for Ho: β3,o = β4,o = 0.
• To find β̃, do OLS on:
(*) yt = xt1β1 + xt2β2 + εt.
• Denote the OLS estimates by β̃1 and β̃2. Then, the restricted OLS estimate of β is given by β̃ = [β̃1, β̃2, 0, 0]′.
• Also, set SSE from (*) as SSEr.
• Test Ho: β2,o + β3,o = 1 and β4,o = 0.
• yt = xt1β1 + xt2β2 + xt3β3 + xt4β4 + εt.
→ yt = xt1β1 + xt2β2 + xt3(1-β2) + εt.
→ yt - xt3 = xt1β1 + (xt2-xt3)β2 + εt . (**)
• Do OLS on (**) and get β̃1 and β̃2. Set β̃3 = 1 − β̃2 and β̃4 = 0. Set SSEr = SSE from OLS on (**).
→ t = (β̂2 − 0)/se(β̂2) = 11.77291; c = 1.96 at the 5% significance level.
Wald Test:
Equation: Untitled
Null Hypothesis: C(3)=0, C(4)=0
• Estimating elasticities:
• Let log ( L) and log( K ) be chosen values of log(Lt) and log(Kt).
[You may choose sample means.]
• Observe that ηQL = ∂ log(Q ) / ∂ log( L) = β2 + β4log(L) + β6log(K).
• Thus, a natural estimate of ηQL is given:
ηˆQL = βˆ2 + βˆ4 log( L) + βˆ6 log( K ) = R βˆ ,
→ E ( β ) = βo .
• Under Ho: β2,o = ... = βk,o = 0 (with xt1 = 1), the restricted regression keeps only the intercept, so
SSEr = Σt(yt − β̃1 − xt2·0 − ... − xtk·0)² = Σt(yt − β̃1)² = Σt(yt − ȳ)² = SST.
Observe that:
F = [(SSEr − SSEu)/(k − 1)]/[SSEu/(T − k)] = [(T − k)/(k − 1)]·[(SST − SSE)/SSE]
= [(T − k)/(k − 1)]·[(1 − SSE/SST)/(SSE/SST)] = [(T − k)/(k − 1)]·[R²/(1 − R²)].
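For instance, with the numerical exercise above (T = 5, k = 3, R² = 0.923), the F statistic for Ho: β2,o = β3,o = 0 would be F = [(5 − 3)/(3 − 1)]·[0.923/(1 − 0.923)] ≈ 12.0.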
Example 1:
Oil shocks during 70’s may have changed firms’ production functions
permanently.
Example 2:
Effects of schooling on wages may be different over different regions. [Why?
Perhaps because of different industries across different regions.]
• Question:
How can we test Ho: βA1,o = βB1,o, βA2,o = βB2,o, βA3,o = βB3,o and βA4,o = βB4,o?
→ (*) (yA; yB) = [XA, 0TA×k; 0TB×k, XB](βA; βB) + (εA; εB) → y = X*β* + ε*.
• Restricted model:
βA,o = βB,o (let us denote them by β): k restrictions.
→ Merge model (A) and (B) with the restriction (Model C):
(**) (yA; yB) = (XA; XB)β + (εA; εB) → y = Xβ + ε
→ W = kF →d χ²(k) (cf. the Chi-square statistic in the output below).
Wald Test:
Equation: Untitled
Null Hypothesis: C(5)=0, C(6)=0, C(7)=0, C(8)=0
F-statistic 10.03332   Probability 0.000000
Chi-square 40.13328   Probability 0.000000
• What is this?
• yA = XAβ + εA for Group A;
yB = XBβ + ITBγ + εB for Group B,
where γ = (γ1, ..., γTB)′.
• Written out for Group B:
(yB,1; yB,2; :; yB,TB) = [xB,1•′, 1, 0, ..., 0; xB,2•′, 0, 1, ..., 0; :, :, :, :; xB,TB•′, 0, 0, ..., 1](β; γ1; :; γTB) + (εB,1; εB,2; :; εB,TB).
• Stacking the two groups:
(yA; yB) = [XTA×k, 0TA×TB; XTB×k, ITB×TB](β; γ1; :; γTB) + (εA; εB).
• SSEA = SSE from regression of the above model.
• FACHOW = F for Ho: γ 1 = ... = γ TB = 0 .
Theorem:
Under (SIC.1)-(SIC.6) and (SIC.8), ( y0 − yˆ 0 ) ~ N (0, σ o2 [1 + x0′ ( X ′X ) −1 x0 ]) .
Proof:
ŷ0 = x0′ βˆ = x0′ [ β o + ( X ′X ) −1 X ′ε ] = x0′ β o + x0′ ( X ′X ) −1 X ′ε .
y0 = x0′ βo + ε 0 .
→ y0 − yˆ 0 = ε 0 − x0′ ( X ′X ) −1 X ′ε .
→ E(y0 − ŷ0) = 0.
→ var(y0 − ŷ0) = var(ε0) + x0′(X′X)⁻¹X′Cov(ε)X(X′X)⁻¹x0 = σo² + σo²x0′(X′X)⁻¹x0 = σo²[1 + x0′(X′X)⁻¹x0].
Implication:
Let c be a critical value for two-tail t-test given a significance level (say, 5%):
Pr(−c < (y0 − ŷ0)/se(y0 − ŷ0) < c) = 0.95.
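Equivalently, a 95% forecast interval for y0 is ŷ0 ± c·se(y0 − ŷ0), where se(y0 − ŷ0) = √(s²[1 + x0′(X′X)⁻¹x0]) estimates the standard deviation given in the theorem above.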
Forecasting Procedure:
STEP 1: Let x0′ = ( x01 , x02 ,..., x0k ) .
STEP 2: Compute ŷ0 = x0′β̂ = (1, 1, 1)(1.2, −1, 2)′ = 2.2.
• If you have data points up to t = 100, and if you would like to forecast y at t = 101 and t = 102, you had better use a “dynamic forecast.”
• The formula of forecasting standard errors taught in the class can be used
for static forecasts. But the standard errors for dynamic forecasts are
much more complicated.
[Figure: static forecast LDPIFS of LDPI, 1996-2001.]
[Figure: LDPI and LDPIFS with the forecast bounds UPPERBS and LOWERBS, 1996-2001.]
Forecast evaluation (Forecast: LDPIFD; Actual: LDPI; Forecast sample: 1996:01 2001:12; Included observations: 71):
Root Mean Squared Error 0.057155
Mean Absolute Error 0.049899
Mean Abs. Percent Error 0.565546
Theil Inequality Coefficient 0.003247
  Bias Proportion 0.762216
  Variance Proportion 0.221227
  Covariance Proportion 0.016557
[Figure: dynamic forecast LDPIFD of LDPI with the forecast bounds UPPERBD and LOWERBD, 1996-2001.]
(1) Motivation
• If the regressors xt• are stochastic, all t and F tests are wrong (bad news).
• The t and F tests require the OLS estimator βˆ to be unbiased.
β̂ = βo + (X′X)⁻¹X′ε
→ E(β̂) = βo + E[(X′X)⁻¹X′ε] =? βo + (X′X)⁻¹X′E(ε).
The questioned equality holds if X is nonstochastic, or if (*) E(ε|X) = 0T×1.
Under this assumption,
E(β̂) = EX[E(β̂|X)] = EX[E(βo + (X′X)⁻¹X′ε|X)]
= EX[βo + (X′X)⁻¹X′E(ε|X)] = EX(βo) = βo.
• But, for some cases, condition (*) does not hold. For example, xt• = yt−1. In this case, E(εt−1|yt−1) ≠ 0. For this case, we can no longer say that β̂ is an unbiased estimator.
• An example of a model with a lagged dependent variable as a regressor:
yt = β1xt1 + β2xt2 + β3yt−1 + εt → β2/(1 − β3) = long-run effect of xt2.
• In addition, with stochastic regressors the exact (finite-sample) distribution of β̂ is no longer normal.
• What do we wish?
[We wish the distribution of θˆT would become more condensed around
θo as T increases.]
• The law of large numbers (LLN) says that a sample mean x̄T (the mean x̄ computed from a random sample of size T) converges in probability to the population mean as T → ∞.
• Gauss Exercise:
• A population with N(1,9).
• 1000 different random samples of T = 10 to compute x10 .
seed = 1;
tt1 = 10; @ # of observations @
tt2 = 100; @ # of observations @
tt3 = 1500; @ # of observations @
iter = 1000; @ # of sets of different data @
storx10 = zeros(iter,1) ;
storx100 = zeros(iter,1) ;
storx5000 = zeros(iter,1);
i = 1;
do while i <= iter; @ loop over the 1,000 replications @
x10 = 1 + 3*rndns(tt1,1,seed); @ sample of size tt1 from N(1,9) @
x100 = 1 + 3*rndns(tt2,1,seed);
x5000 = 1 + 3*rndns(tt3,1,seed);
storx10[i,1] = meanc(x10);
storx100[i,1] = meanc(x100);
storx5000[i,1] = meanc(x5000);
i = i + 1; endo;
library pgraph;
graphset;
[Figure: histograms of the stored sample means for the different sample sizes (x100, x5000).]
var( xT ) = σo2/T → 0 as T → ∞.
Thus, xT is a consistent estimator of μo.
• Then, √T(x̄T − μo) →d N(0, σo²) and √T(x̄T − μo)/σo →d N(0,1).
• Implication of CLT:
• √T(x̄T − μo) ≈ N(0, σo²), if T is large.
• E[√T(x̄T − μo)] = √T[E(x̄T) − μo] ≈ 0 → E(x̄T) ≈ μo.
• var[√T(x̄T − μo)] = T·var(x̄T) ≈ σo² → var(x̄T) ≈ σo²/T.
• x̄T ≈ N(μo, σo²/T), if T is large.
End of Digression
(WIC.1) The conditional mean of yt (dependent variable) given x1•, x2•, ... , xt•,
ε1, ... , and εt-1 is linear in xt•:
yt = E ( yt | x1• ,..., xt • , ε1 ,..., ε t −1 ) + ε t = xt′• βo + ε t .
Comment:
• Implies E (ε t | x1• , x2• ,..., xt • , ε1 , ε 2 ,..., ε t −1 ) = 0 .
• No autocorrelation in the εt: cov(ε t , ε s ) = 0 for all t ≠ s.
• Regressors are weakly exogenous and need not be strictly exogenous.
• E ( xs•ε t ) = 0k ×1 for all t ≥ s, but could be that E ( xs iε t ) ≠ 0 for some s > t.
(WIC.2) βo is unique.
Comment:
• (WIC.2)-(WIC.3) implies that
p limT →∞ T −1 X ′X = p limT →∞ T −1Σt xt • xt′• ≡ Qo is finite and pd.
(WIC.6) The error terms εt are normally distributed conditionally on x1•, … , xt•,
ε1, … , εt-1.
Comment:
SIC → WIC.
p limT→∞ s² = σo² (consistent).
√T(β̂ − βo) →d N(0k×1, σo²Qo⁻¹).
Implication:
βˆ ≈ N ( β o , σ o 2 (TQo ) −1 ) → βˆ ≈ N ( β o , s 2 ( X ′X ) −1 ) ,
if T is reasonably large.
Implication:
1) t test for Ho: Rβo - r = 0 (R: 1×k, r: scalar) is valid if T is large.
Use z-table to find critical value.
2) For Ho: Rβo - r = 0 (R: m×k, r: m×1),
use WT = mF which is asymptotically χ2(m) distributed. [Why?]
• WT = (Rβ̂ − r)′[RCov(β̂)R′]⁻¹(Rβ̂ − r)
= (Rβ̂ − r)′[Rs²(X′X)⁻¹R′]⁻¹(Rβ̂ − r) = mF.
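As an illustration, a minimal GAUSS sketch of WT computed on simulated data; the data-generating choices and the hypothesis (R, r) below are made up for the example.
new;
tt = 200; k = 3;
x = ones(tt,1)~rndn(tt,2);             @ regressors: intercept and two N(0,1) variables @
b0 = { 1, 0.5, 0 };                    @ true coefficients (assumed for the simulation) @
y = x*b0 + rndn(tt,1);
bhat = invpd(x'x)*(x'y);
e = y - x*bhat;
s2 = e'e/(tt-k);
covb = s2*invpd(x'x);                  @ estimated Cov(bhat) @
rr = { 0 1 0, 0 0 1 };                 @ Ho: beta2 = 0.5 and beta3 = 0 @
rvec = { 0.5, 0 };
wt = (rr*bhat - rvec)' * invpd(rr*covb*rr') * (rr*bhat - rvec);
print wt;                              @ compare with a chi-square(2) critical value @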
Examples:
1) θ: a scalar
Ho: θo = 2 → Ho: θo - 2 = 0 → Ho: w(θ) = 0, where w(θ) = θ - 2.
2) θ = (θ1, θ2, θ3)′.
Ho: θ1,o2 = θ2,o + 2 and θ3,o = θ1,o + θ2,o.
→ Ho: θ1,o2-θ2,o-2 = 0 and θ3,o-θ1,o-θ2,o = 0.
→ Ho: w(θ) = (w1(θ); w2(θ)) = (θ1² − θ2 − 2; θ3 − θ1 − θ2) = (0; 0).
3) linear restrictions
θ = [θ1, θ2, θ3]′.
Ho: θ1,o = θ2,o + 2 and θ3,o = θ1,o + θ2,o
→ Ho: w(θo) = (w1(θo); w2(θo)) = (θ1,o − θ2,o − 2; θ3,o − θ1,o − θ2,o) = (0; 0).
→ Ho: w(θo) = [1, −1, 0; −1, −1, 1](θ1,o; θ2,o; θ3,o) − (2; 0) = Rθo − r.
Definition:
W(θ) ≡ ∂w(θ)/∂θ′ = [∂w1(θ)/∂θ1, ∂w1(θ)/∂θ2, ..., ∂w1(θ)/∂θp; ∂w2(θ)/∂θ1, ∂w2(θ)/∂θ2, ..., ∂w2(θ)/∂θp; :, :, :; ∂wm(θ)/∂θ1, ∂wm(θ)/∂θ2, ..., ∂wm(θ)/∂θp]m×p.
Theorem:
Under (WIC.1)-(WIC.5),
√T(w(β̂) − w(βo)) →d N(0m×1, W(βo)σ²Qo⁻¹W(βo)′).
Proof:
Taylor's expansion around βo:
w(β̂) = w(βo) + W(β̄)(β̂ − βo), where β̄ lies between β̂ and βo,
→ √T(w(β̂) − w(βo)) ≈ W(βo)√T(β̂ − βo).
Implication:
w(β̂) − w(βo) ≈ N(0m×1, W(β̂)s²(X′X)⁻¹W(β̂)′).
→ Under Ho: w(βo) = 0m×1, w(β̂) ≈ N(0m×1, W(β̂)Cov(β̂)W(β̂)′).
p limT→∞ T⁻¹Σt xt•εt = 0.
→ p limT→∞ s² = σo² − 0′Qo⁻¹0 = σo².
→ √T(β̂ − βo) = [T⁻¹Σt xt•xt•′]⁻¹[(1/√T)Σt xt•εt].
→ By the GCLT with martingale differences,
(1/√T)Σt xt•εt →d N(0, lim T⁻¹Σt Cov(xt•εt)),
where Cov(xt•εt) = E(xt•εtεtxt•′) = E(εt²xt•xt•′).