Advanced Econometrics PDF
Arnan Viriyavejkul
Contents
1 Preliminaries
  1.1 Bivariate Statistics
  1.2 Linear Algebra
4 Instrumental Variables
  4.1 Estimator
    4.1.1 Case 1: L = K
    4.1.2 Case 2: L ≥ K
  4.2 Application: Partitioned Regression
    4.2.1 Consistency
    4.2.2 A Plot Twist
Chapter 1
Preliminaries
2. If $X \in \{0,1\}$, then $\dfrac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)} = E[Y \mid X=1] - E[Y \mid X=0]$.
\begin{align*}
P(X=1) &= p, \qquad P(X=0) = 1-p \\
E[X] &= E[X \mid X=1]\,p + E[X \mid X=0]\,(1-p) = p \\
E[X^2] &= E[X^2 \mid X=1]\,p + E[X^2 \mid X=0]\,(1-p) = p \\
E[Y] &= E[Y \mid X=1]\,p + E[Y \mid X=0]\,(1-p) = E\big[E[Y \mid X]\big] \\
E[XY] &= E\big[E[XY \mid X]\big] = E\big[X\,E[Y \mid X]\big] \\
&= p \cdot 1 \cdot E[Y \mid X=1] + (1-p) \cdot 0 \cdot E[Y \mid X=0] = p\,E[Y \mid X=1]
\end{align*}
Hence $\operatorname{Cov}(X,Y) = E[XY] - E[X]E[Y] = p(1-p)\big(E[Y \mid X=1] - E[Y \mid X=0]\big)$ and $\operatorname{Var}(X) = E[X^2] - E[X]^2 = p(1-p)$, so the ratio equals the difference in conditional means.
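A minimal numerical check of this identity, assuming numpy is available; the intercept 2.0 and the gap 1.5 are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100_000, 0.4
X = rng.binomial(1, p, size=N)            # binary regressor
Y = 2.0 + 1.5 * X + rng.normal(size=N)    # E[Y|X=1] - E[Y|X=0] = 1.5

slope = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)       # sample Cov(X,Y)/Var(X)
diff_in_means = Y[X == 1].mean() - Y[X == 0].mean()
print(slope, diff_in_means)                           # both approximately 1.5
```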
4. Let $\bar Y_N := \sum_{i=1}^N Y_i / N$, with $\mu_Y = E[Y_i]$ and $\sigma_Y^2 = \operatorname{Var}[Y_i] < \infty$. Derive $E[\bar Y_N]$ and $\operatorname{Var}[\bar Y_N]$.
\begin{align*}
E\big[\bar Y_N\big] &= E\left[\frac{\sum_{i=1}^N Y_i}{N}\right] = \frac{1}{N} E\left[\sum_{i=1}^N Y_i\right]
= \frac{1}{N} \sum_{i=1}^N E[Y_i]
= \frac{1}{N} \underbrace{(\mu_Y + \ldots + \mu_Y)}_{N \text{ times}}
= \frac{1}{N}(N \mu_Y) = \mu_Y
\end{align*}
\begin{align*}
\operatorname{Var}\big[\bar Y_N\big] &= \operatorname{Var}\left[\frac{\sum_{i=1}^N Y_i}{N}\right] = \frac{1}{N^2}\operatorname{Var}\left[\sum_{i=1}^N Y_i\right]
= \frac{1}{N^2}\operatorname{Var}[Y_1 + Y_2 + \ldots + Y_N] \\
&= \frac{1}{N^2}\underbrace{\big(\sigma_Y^2 + \ldots + \sigma_Y^2\big)}_{N \text{ times}}
= \frac{1}{N^2}\big(N \sigma_Y^2\big) = \frac{\sigma_Y^2}{N}
\end{align*}
For the standardized mean $Z_N := \dfrac{\bar Y_N - E[\bar Y_N]}{\sqrt{\operatorname{Var}[\bar Y_N]}}$,
\begin{align*}
E[Z_N] &= E\left[\frac{\bar Y_N - \mu_Y}{\sqrt{\sigma_Y^2 / N}}\right]
= \frac{\sqrt{N}}{\sigma_Y} E\big[\bar Y_N - \mu_Y\big]
= \frac{\sqrt{N}}{\sigma_Y}(\mu_Y - \mu_Y) = 0 \\
\operatorname{Var}[Z_N] &= \operatorname{Var}\left[\frac{\bar Y_N - \mu_Y}{\sqrt{\sigma_Y^2 / N}}\right]
= \frac{N}{\sigma_Y^2}\operatorname{Var}\big[\bar Y_N\big]
= \frac{N}{\sigma_Y^2} \cdot \frac{\sigma_Y^2}{N} = 1
\end{align*}
By the Cauchy--Schwarz inequality $|\langle X, Y\rangle|^2 \le \langle X, X\rangle \cdot \langle Y, Y\rangle$,
\begin{align*}
E|XY| &\le \sqrt{E[X^2]\,E[Y^2]} \\
|E[XY]|^2 &\le E[X^2]\,E[Y^2] \\
|E[X]|^2 &\le E[X^2]\,E[1^2]
\end{align*}
\begin{align*}
Y_i &= \beta_0 + \beta_{1i} X_i + u_i \\
X_i &= \pi_0 + \pi_{1i} Z_i + v_i
\end{align*}
The goal here is to express $\sigma_{ZY}$ and $\sigma_{ZX}$ in terms of moments of $\pi_{1i}$ and $\beta_{1i}$. In the derivations, assume random assignment of the binary instrument $Z_i$, so that
\begin{align*}
P(Z_i = 1) &= p, \qquad P(Z_i = 0) = 1-p \\
E[Z_i] &= E[Z_i \mid Z_i = 1]\,p + E[Z_i \mid Z_i = 0]\,(1-p) = p \\
E[Z_i^2] &= E[Z_i^2 \mid Z_i = 1]\,p + E[Z_i^2 \mid Z_i = 0]\,(1-p) = p \\
E[X_i] &= E[X_i \mid Z_i = 1]\,p + E[X_i \mid Z_i = 0]\,(1-p) = E\big[E[X_i \mid Z_i]\big] \\
E[Z_i X_i] &= E\big[E[Z_i X_i \mid Z_i]\big] = E\big[Z_i\,E[X_i \mid Z_i]\big] \\
&= p \cdot 1 \cdot E[X_i \mid Z_i = 1] + (1-p) \cdot 0 \cdot E[X_i \mid Z_i = 0] = p\,E[X_i \mid Z_i = 1]
\end{align*}
Since $E[\pi_{1i}] = E[X_i \mid Z_i = 1] - E[X_i \mid Z_i = 0]$ and, from the two-stage regression equations,
\begin{align*}
E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0] &= E[\beta_0 + \beta_{1i}\pi_0 + \beta_{1i}\pi_{1i} + \beta_{1i} v_i + u_i] - E[\beta_0 + \beta_{1i}\pi_0 + \beta_{1i} v_i + u_i] \\
&= E[\beta_{1i}\pi_{1i}],
\end{align*}
putting terms together, one gets the local average treatment effect (LATE),
\[
\hat\beta^{IV} = \frac{S_{ZY}}{S_{ZX}} = \frac{\sigma_{ZY}}{\sigma_{ZX}} + o_p(1) = \frac{E[\beta_{1i}\pi_{1i}]}{E[\pi_{1i}]} + o_p(1)
\]
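A small simulation of this Wald/LATE ratio, assuming numpy; the heterogeneous coefficients $\pi_{1i}$ and $\beta_{1i}$ and the error correlation below are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 200_000, 0.5
Z = rng.binomial(1, p, size=N)              # randomly assigned binary instrument
pi1 = rng.uniform(0.2, 0.8, size=N)         # heterogeneous first-stage effects
beta1 = rng.normal(1.0, 0.5, size=N)        # heterogeneous treatment effects
v = rng.normal(size=N)
u = 0.8 * v + rng.normal(size=N)            # endogeneity: Cov(u, v) != 0
X = 0.3 + pi1 * Z + v
Y = 1.0 + beta1 * X + u

wald = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]    # sample sigma_ZY / sigma_ZX
late = np.mean(beta1 * pi1) / np.mean(pi1)         # E[beta_1i pi_1i] / E[pi_1i]
print(wald, late)                                  # approximately equal
```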
\begin{align*}
P_X Y = \hat Y &= X\left(X'X\right)^{-1}X'Y \\
&= X\left(X'X\right)^{-1}X'\left(X\beta^\star + u\right) \\
&= X\left(X'X\right)^{-1}X'X\beta^\star + X\left(X'X\right)^{-1}X'u \\
&= X\beta^\star + X\left(X'X\right)^{-1}X'u \\
&= X\left(\beta^\star + \left(X'X\right)^{-1}X'u\right) = X\hat\beta^{OLS} = \hat Y
\end{align*}
\begin{align*}
M_X Y = \hat u &= \left(I_N - P_X\right)Y \\
&= \left(I_N - X\left(X'X\right)^{-1}X'\right)Y \\
&= Y - X\left(X'X\right)^{-1}X'Y \\
&= Y - X\hat\beta^{OLS} = \hat u
\end{align*}
\begin{align*}
M_X u &= \left(I_N - P_X\right)u \\
&= \left(I_N - X\left(X'X\right)^{-1}X'\right)u \\
&= I_N u - X\left(X'X\right)^{-1}X'u \\
&= u - X\left(\hat\beta^{OLS} - \beta^\star\right) \\
&= u + X\beta^\star - X\hat\beta^{OLS} \\
&= Y - X\hat\beta^{OLS} = \hat u
\end{align*}
\begin{align*}
M_X &= I_N - P_X \\
M_X' &= \left(I_N - X\left(X'X\right)^{-1}X'\right)' = I_N - X\left(X'X\right)^{-1}X' = I_N - P_X \\
M_X M_X &= \left(I_N - P_X\right)\left(I_N - P_X\right) = I_N I_N - I_N P_X - P_X I_N + P_X P_X = I_N - P_X - P_X + P_X = I_N - P_X \\
\operatorname{tr}\left(M_X\right) &= \operatorname{tr}\left(I_N - P_X\right) = \operatorname{tr}\left(I_N\right) - \operatorname{tr}\left(P_X\right) = N - K
\end{align*}
Remarks: since $P_X$ and $M_X$ are symmetric idempotent matrices, they have only 0 and 1 as eigenvalues, and the multiplicity of the eigenvalue 1 is precisely the rank. Using the spectral (eigen)decomposition $P_X = C\Lambda C'$ with $C'C = CC' = I$, one can also prove that
\begin{align*}
P_X P_X = P_X
&\;\Rightarrow\; C\Lambda C' \cdot C\Lambda C' = C\Lambda C' \\
&\;\Rightarrow\; C\Lambda\Lambda C' = C\Lambda C' \\
&\;\Rightarrow\; C'C\Lambda\Lambda C'C = C'C\Lambda C'C \\
&\;\Rightarrow\; \Lambda\Lambda = \Lambda,
\end{align*}
so each diagonal entry of $\Lambda$ satisfies $\lambda^2 = \lambda$, i.e. $\lambda \in \{0, 1\}$.
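A quick numerical illustration of these properties, assuming numpy; the dimensions N = 200 and K = 3 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 200, 3
X = rng.normal(size=(N, K))

P = X @ np.linalg.inv(X.T @ X) @ X.T     # projection matrix P_X
M = np.eye(N) - P                         # annihilator (residual maker) M_X

print(np.allclose(P @ P, P), np.allclose(M @ M, M))   # idempotent
print(np.allclose(M, M.T))                             # symmetric
print(np.isclose(np.trace(M), N - K))                  # tr(M_X) = N - K
eig = np.sort(np.linalg.eigvalsh(P))
print(np.allclose(eig[-K:], 1.0), np.allclose(eig[:-K], 0.0))  # eigenvalues are 0 or 1
```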
Chapter 2
Ordinary Least Squares
Estimator
1. $\tilde\beta := \operatorname{argmin}_{b \in \mathbb{R}^K} E\left[(Y - X'b)^2\right]$, where $\dim X = K \times 1$, $\dim a = K \times 1$, and $\dim A = K \times K$.
Useful matrix-derivative rules:
\begin{align*}
\frac{\partial}{\partial X}\left(A'X\right) &= \frac{\partial}{\partial X}\left(X'A\right) = A \\
\frac{\partial}{\partial X'}\left(AX\right) &= A \\
\frac{\partial}{\partial X}\left(X'AX\right) &= \left(A + A'\right)X \\
\frac{\partial^2}{\partial X \partial X'}\left(X'AX\right) &= A + A'
\end{align*}
The first-order condition for $s(b) := E\left[(Y - X'b)^2\right]$ is
\begin{align*}
\frac{\partial s(b)}{\partial b} &= -2E[XY] + 2E[XX']b \\
&= -2E[XY] + 2E[XX']\tilde\beta = 0 \\
\tilde\beta &= E[XX']^{-1}E[XY]
\end{align*}
\begin{align*}
e(b) &= Y - Xb \\
RSS(b) &= e'(b)e(b) = (Y - Xb)'(Y - Xb) \\
&= \left(Y' - b'X'\right)(Y - Xb) \\
&= Y'Y - Y'Xb - b'X'Y + b'X'Xb \\
&= Y'Y - 2b'X'Y + b'X'Xb \\
\frac{\partial RSS(b)}{\partial b} &= 0 - 2X'Y + \left(X'X + (X'X)'\right)b \\
&= -2X'Y + 2X'X\hat\beta^{OLS} = 0 \\
\hat\beta^{OLS} &= \left(X'X\right)^{-1}\left(X'Y\right) = \left(\sum_{i=1}^N X_i X_i'\right)^{-1}\left(\sum_{i=1}^N X_i Y_i\right)
\end{align*}
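A minimal sketch of the closed-form estimator in numpy; the design and the coefficient vector (1, 2) are illustrative. Solving the normal equations with solve() is numerically preferable to forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 5_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # constant + one regressor
beta_true = np.array([1.0, 2.0])
Y = X @ beta_true + rng.normal(size=N)

# beta_hat = (X'X)^{-1} X'Y, computed by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)   # close to [1.0, 2.0]
```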
Asymptotic Distribution
1. Asymptotic distribution of $\sqrt{N}\left(\hat\beta^{OLS} - \beta^\star\right)$ under homoskedasticity, where $\beta^\star = E\left[X_i X_i'\right]^{-1} E\left[X_i Y_i\right]$.
\begin{align*}
\hat\beta^{OLS} &= \left(X'X\right)^{-1}X'Y = \left(X'X\right)^{-1}X'\left(X\beta^\star + u\right) \\
&= \beta^\star + \left(X'X\right)^{-1}X'u \\
&= \beta^\star + \left(N^{-1}\sum_{i=1}^N X_i X_i'\right)^{-1}\left(N^{-1}\sum_{i=1}^N X_i u_i\right) \\
\sqrt{N}\left(\hat\beta^{OLS} - \beta^\star\right) &= \left(N^{-1}\sum_{i=1}^N X_i X_i'\right)^{-1} N^{-\frac{1}{2}}\sum_{i=1}^N X_i u_i
\end{align*}
By the weak law of large numbers and the continuous mapping theorem,
\[
\left(N^{-1}\sum_{i=1}^N X_i X_i'\right)^{-1} \xrightarrow{p} E\left[X_i X_i'\right]^{-1},
\]
and, writing $Z_i := X_i u_i$,
\[
N^{-\frac{1}{2}}\sum_{i=1}^N X_i u_i = \sqrt{N}\,\bar Z_N \xrightarrow{d} Z \sim N\!\left(0, E\left[Z_i Z_i'\right]\right) = N\!\left(0, E\left[u_i^2 X_i X_i'\right]\right).
\]
Therefore,
\[
\sqrt{N}\left(\hat\beta^{OLS} - \beta^\star\right) = E\left[X_i X_i'\right]^{-1} N^{-1/2}\sum_{i=1}^N X_i u_i + o_p(1)
\xrightarrow{d} N\!\left(0,\; E\left[X_i X_i'\right]^{-1} E\left[u_i^2 X_i X_i'\right] E\left[X_i X_i'\right]^{-1}\right).
\]
Under homoskedasticity,
\[
E\left[u_i^2 X_i X_i'\right] = E\left[E\left[u_i^2 X_i X_i' \mid X\right]\right] = E\left[X_i X_i' E\left[u_i^2 \mid X\right]\right] = \sigma_u^2 E\left[X_i X_i'\right].
\]
Finally,
\[
\sqrt{N}\left(\hat\beta^{OLS} - \beta^\star\right) \xrightarrow{d} N\!\left(0, \sigma_u^2\, E\left[X_i X_i'\right]^{-1}\right).
\]
2. A consistent estimator for the asymptotic variance of $\sqrt{N}\left(\hat\beta^{OLS} - \beta^\star\right)$ under homoskedasticity:
\[
S_u^2 = \frac{1}{N-K}\sum_{i=1}^N \hat u_i^2 = \frac{1}{N-K}\sum_{i=1}^N \left(Y_i - X_i'\hat\beta^{OLS}\right)^2,
\]
where
\begin{align*}
\hat u_i &= Y_i - X_i'\hat\beta^{OLS} \\
&= u_i + X_i'\beta^\star - X_i'\hat\beta^{OLS} \\
&= u_i - X_i'\left(\hat\beta^{OLS} - \beta^\star\right).
\end{align*}
\begin{align*}
S_u^2 &= \frac{1}{N-K}\sum_{i=1}^N \left[u_i - X_i'\left(\hat\beta^{OLS} - \beta^\star\right)\right]^2 \\
&= \frac{N}{N-K}\left[N^{-1}\sum_{i=1}^N u_i^2\right]
+ \left(\hat\beta^{OLS} - \beta^\star\right)'\frac{\sum_{i=1}^N X_i X_i'}{N-K}\left(\hat\beta^{OLS} - \beta^\star\right)
- 2\left(\hat\beta^{OLS} - \beta^\star\right)'\frac{\sum_{i=1}^N X_i u_i}{N-K} \\
S_u^2 &\xrightarrow{p} 1 \cdot \sigma_u^2 + 0 \cdot E\left[X_i X_i'\right] \cdot 0 - 2 \cdot 0 \cdot 0 = \sigma_u^2
\end{align*}
The proof of consistency ends here. However, to take it a bit further, since $\hat u = M_X u$,
\begin{align*}
E\left[\hat u'\hat u \mid X\right] &= E\left[u' M_X u \mid X\right] \\
&= E\left[\operatorname{tr}\left(M_X u u'\right) \mid X\right] \\
&= \operatorname{tr}\left(M_X E\left[u u' \mid X\right]\right) \\
&= \operatorname{tr}\left(\sigma_u^2 I_N M_X\right) = \sigma_u^2 \operatorname{tr}\left(M_X\right) = \sigma_u^2 (N - K),
\end{align*}
so that
\[
E\left[s^2 \mid X\right] = \frac{E\left[\hat u'\hat u \mid X\right]}{N-K} = \frac{\sigma_u^2 (N-K)}{N-K} = \sigma_u^2.
\]
Moreover,
\begin{align*}
\hat\sigma_u^2 = \frac{1}{N-K}\hat u'\hat u &= \frac{1}{N-K} u' M_X u \\
&= \frac{1}{N-K}\left[u'u - u'X\left(X'X\right)^{-1}X'u\right] \\
&= \frac{N}{N-K}\left[\frac{u'u}{N} - \frac{u'X}{N}\left(\frac{X'X}{N}\right)^{-1}\frac{X'u}{N}\right],
\end{align*}
where
\[
\frac{X'u}{N} \xrightarrow{p} 0, \qquad \frac{u'u}{N} \xrightarrow{p} \sigma_u^2, \qquad \frac{N}{N-K} \to 1,
\]
and therefore
\[
\frac{\hat u'\hat u}{N-K}\left(\frac{X'X}{N}\right)^{-1} \xrightarrow{p} \sigma_u^2\, E\left[X_i X_i'\right]^{-1}.
\]
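A small numerical check of this variance estimator, assuming numpy; σ_u = 1.5 and the coefficients are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 10_000, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
sigma_u = 1.5
Y = X @ np.array([1.0, 2.0]) + sigma_u * rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
u_hat = Y - X @ beta_hat
s2 = u_hat @ u_hat / (N - K)                    # degrees-of-freedom corrected estimator
avar_hat = s2 * np.linalg.inv(X.T @ X / N)      # estimates sigma_u^2 E[X_i X_i']^{-1}
print(s2, sigma_u**2)                            # close to each other
print(avar_hat)
```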
Chapter 3
Linear Regression Model
\begin{align*}
\hat\beta^{OLS} &= \left(X'X\right)^{-1}\left(X'Y\right) \\
&= \left(X'X\right)^{-1}X'\left(X\beta + e\right) \\
&= \left(X'X\right)^{-1}X'X\beta + \left(X'X\right)^{-1}X'e \\
&= \beta + \left(X'X\right)^{-1}X'e \\
&= \beta + O_p(1) \cdot o_p(1) = \beta + o_p(1)
\end{align*}
\[
\operatorname{plim} \hat\beta^{OLS} = \beta + \operatorname{plim}\left[\left(\frac{X'X}{N}\right)^{-1}\left(\frac{X'e}{N}\right)\right],
\qquad
\operatorname{plim}\left(\frac{X'e}{N}\right) = 0,
\qquad
\operatorname{plim}\left(\frac{X'X}{N}\right)^{-1} = E\left[X_i X_i'\right]^{-1},
\]
so that $\hat\beta^{OLS} \xrightarrow{p} \beta$.
\begin{align*}
\operatorname{Var}\left(\hat\beta^{OLS} \mid X\right) &= E\left[\left(\hat\beta^{OLS} - E\left[\hat\beta^{OLS}\right]\right)\left(\hat\beta^{OLS} - E\left[\hat\beta^{OLS}\right]\right)' \Bigm| X\right] \\
&= E\left[\left(\left(X'X\right)^{-1}X'e\right)\left(\left(X'X\right)^{-1}X'e\right)' \Bigm| X\right] \\
&= E\left[\left(X'X\right)^{-1}X'ee'X\left(X'X\right)^{-1} \Bigm| X\right] \\
&= \left(X'X\right)^{-1}X'E\left[ee' \mid X\right]X\left(X'X\right)^{-1} \\
&= \left(X'X\right)^{-1}X'\Sigma X\left(X'X\right)^{-1}
\end{align*}
Since $\tilde q\, D\Sigma D'\, \tilde q' \ge 0$ for any conformable $\tilde q$, the matrix $D\Sigma D'$ is positive semi-definite.
With $\operatorname{Var}(e \mid X) = \Sigma = \sigma^2\Gamma$ and $\Gamma$ known, applying OLS to the transformed model $\Gamma^{-\frac{1}{2}}Y = \Gamma^{-\frac{1}{2}}X\beta + \Gamma^{-\frac{1}{2}}e$ gives the GLS estimator,
\begin{align*}
\hat\beta^{GLS} &= \left(X'\Gamma^{-\frac{1}{2}}\Gamma^{-\frac{1}{2}}X\right)^{-1}X'\Gamma^{-\frac{1}{2}}\Gamma^{-\frac{1}{2}}Y \\
&= \left(X'\Gamma^{-1}X\right)^{-1}X'\Gamma^{-1}Y \\
&= \left(X'\Gamma^{-1}X\right)^{-1}X'\Gamma^{-1}\left(X\beta + e\right) \\
&= \left(X'\Gamma^{-1}X\right)^{-1}X'\Gamma^{-1}X\beta + \left(X'\Gamma^{-1}X\right)^{-1}X'\Gamma^{-1}e \\
&= \beta + \left(X'\Gamma^{-1}X\right)^{-1}X'\Gamma^{-1}e
\end{align*}
\begin{align*}
E\left[\hat\beta^{GLS} \mid X\right] &= E\left[\left(X'\Gamma^{-1}X\right)^{-1}X'\Gamma^{-1}Y \Bigm| X\right] \\
&= \left(X'\Gamma^{-1}X\right)^{-1}X'\Gamma^{-1}E[Y \mid X] \\
&= \left(X'\Gamma^{-1}X\right)^{-1}X'\Gamma^{-1}X\beta \\
&= \beta
\end{align*}
\begin{align*}
\operatorname{Var}\left[\hat\beta^{GLS} \mid X\right] &= E\left[\left(\hat\beta^{GLS} - \beta\right)\left(\hat\beta^{GLS} - \beta\right)' \Bigm| X\right] \\
&= E\left[\left(X'\Gamma^{-1}X\right)^{-1}X'\Gamma^{-1}ee'\Gamma^{-1}X\left(X'\Gamma^{-1}X\right)^{-1} \Bigm| X\right] \\
&= \left(X'\Gamma^{-1}X\right)^{-1}X'\Gamma^{-1}E\left[ee' \mid X\right]\Gamma^{-1}X\left(X'\Gamma^{-1}X\right)^{-1} \\
&= \left(X'\Gamma^{-1}X\right)^{-1}X'\Gamma^{-1}\Sigma\Gamma^{-1}X\left(X'\Gamma^{-1}X\right)^{-1} \\
&= \sigma^2\left(X'\Gamma^{-1}X\right)^{-1}X'\Gamma^{-1}\Gamma\Gamma^{-1}X\left(X'\Gamma^{-1}X\right)^{-1} \\
&= \sigma^2\left(X'\Gamma^{-1}X\right)^{-1}\left(X'\Gamma^{-1}X\right)\left(X'\Gamma^{-1}X\right)^{-1} \\
&= \sigma^2\left(X'\Gamma^{-1}X\right)^{-1} \\
&= \left(X'(\sigma^2\Gamma)^{-1}X\right)^{-1} = \left(X'\Sigma^{-1}X\right)^{-1}
\end{align*}
It is worth noting that $\operatorname{Var}[\tilde\beta \mid X] \ge \operatorname{Var}\left[\hat\beta^{GLS} \mid X\right]$ for any other linear unbiased estimator $\tilde\beta$.
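A short GLS sketch in numpy under an assumed known variance pattern Var(e_i | x_i) ∝ x_i²; the pattern and coefficients are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 5_000
x = rng.uniform(1, 3, size=N)
X = np.column_stack([np.ones(N), x])
gamma = x**2                                  # diagonal of Gamma: Var(e_i|x_i) proportional to x_i^2
e = np.sqrt(gamma) * rng.normal(size=N)
Y = X @ np.array([1.0, 2.0]) + e

# GLS = OLS on the transformed model Gamma^{-1/2} Y = Gamma^{-1/2} X beta + Gamma^{-1/2} e
w = 1.0 / np.sqrt(gamma)
Xw, Yw = X * w[:, None], Y * w
beta_gls = np.linalg.solve(Xw.T @ Xw, Xw.T @ Yw)
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_gls)   # both consistent; GLS is more precise under this design
print(beta_ols)
```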
Chapter 4
Instrumental Variables
Recall that for a general model $Y = X\beta + e$ (or $Y_i = X_i'\beta + e_i$), one of the most important assumptions of OLS theory is the exogeneity of the independent variables,
\[
E(e \mid X) = 0.
\]
When this assumption fails, the OLS estimator loses all its advantages. In particular, it is easy to verify that when $E[e_i X_i] \ne 0$,
\begin{align*}
E[\hat\beta] &= \beta + E\left[\left(X'X\right)^{-1}X'e\right] \\
&\approx \beta + E\left(X_i X_i'\right)^{-1}E\left(X_i e_i\right) \\
&\ne \beta.
\end{align*}
Consider, for example, a regression with a mismeasured regressor,
\begin{align*}
Y_i &= X_i'\beta + e_i \\
\tilde X_i &= X_i + r_i
\end{align*}
where the measurement error $r_i$ is uncorrelated with $X_i$ and $e_i$, and $E\left[\tilde X_i \tilde X_i'\right]$ is invertible. Regressing $Y_i$ on the mismeasured regressor $\tilde X_i$ yields an inconsistent (attenuated) estimator.
Proof. Recall from the least squares estimation, writing $Y_i = \tilde X_i'\beta + v_i$ with $v_i = e_i - r_i'\beta$,
\begin{align*}
\hat\beta &= \left(\sum_{i=1}^N \tilde X_i \tilde X_i'\right)^{-1}\left(\sum_{i=1}^N \tilde X_i Y_i\right) \\
&= \left(\sum_{i=1}^N \tilde X_i \tilde X_i'\right)^{-1}\left(\sum_{i=1}^N \tilde X_i\left(\tilde X_i'\beta + v_i\right)\right) \\
&= \beta + \left(\sum_{i=1}^N \tilde X_i \tilde X_i'\right)^{-1}\left(\sum_{i=1}^N \tilde X_i v_i\right) \\
&= \beta + \left(\sum_{i=1}^N \tilde X_i \tilde X_i'\right)^{-1}\left(\sum_{i=1}^N \tilde X_i\left(e_i - r_i\beta\right)\right) \\
&= \beta + \left(\sum_{i=1}^N \tilde X_i \tilde X_i'\right)^{-1}\left(\sum_{i=1}^N \left(X_i + r_i\right)\left(e_i - r_i\beta\right)\right) \\
&= \beta + \left(\sum_{i=1}^N \tilde X_i \tilde X_i'\right)^{-1}\left(\sum_{i=1}^N \left(X_i e_i - X_i r_i\beta + r_i e_i - r_i^2\beta\right)\right) \\
&= \beta + \left(N^{-1}\sum_{i=1}^N \tilde X_i \tilde X_i'\right)^{-1}\left(N^{-1}\sum_{i=1}^N X_i e_i - N^{-1}\sum_{i=1}^N X_i r_i\beta + N^{-1}\sum_{i=1}^N r_i e_i - N^{-1}\sum_{i=1}^N r_i^2\beta\right)
\end{align*}
By the weak law of large numbers,
\[
N^{-1}\sum_{i=1}^N X_i e_i \xrightarrow{p} 0, \qquad
N^{-1}\sum_{i=1}^N X_i r_i \beta \xrightarrow{p} 0, \qquad
N^{-1}\sum_{i=1}^N r_i e_i \xrightarrow{p} 0,
\]
\[
N^{-1}\sum_{i=1}^N \tilde X_i \tilde X_i' \xrightarrow{p} E\left[\tilde X_i \tilde X_i'\right], \qquad
N^{-1}\sum_{i=1}^N r_i^2 \beta \xrightarrow{p} E\left[r_i^2\right]\beta.
\]
Notice that,
\begin{align*}
\operatorname{plim}\left(\frac{1}{N}\sum_{i=1}^N \tilde X_i \tilde X_i'\right)^{-1} &= \left(E\left[X_i X_i'\right] + E\left[r_i r_i'\right]\right)^{-1} \\
\operatorname{plim}\left(\frac{1}{N}\sum_{i=1}^N \tilde X_i e_i\right) &= E\left[X_i e_i\right] + E\left[r_i e_i\right] = 0 \\
\operatorname{plim}\left(\frac{1}{N}\sum_{i=1}^N \tilde X_i r_i'\right) &= E\left[X_i r_i'\right] + E\left[r_i r_i'\right] = E\left[r_i r_i'\right]
\end{align*}
so that
\[
\operatorname{plim}\hat\beta = \beta + \left(E\left[X_i X_i'\right] + E\left[r_i r_i'\right]\right)^{-1}\left(-E\left[r_i r_i'\right]\beta\right),
\]
and in the scalar case,
\[
\operatorname{plim}\hat\beta = \beta + \left(E\left[X_i^2\right] + E\left[r_i^2\right]\right)^{-1}\left(-E\left[r_i^2\right]\beta\right)
= \beta\left(1 - \frac{E\left[r_i^2\right]}{E\left[X_i^2\right] + E\left[r_i^2\right]}\right).
\]
Here $E\left[\tilde X_i \tilde X_i'\right]$ is positive definite, with
\begin{align*}
E\left[\tilde X_i \tilde X_i'\right] &= E\left[\left(X_i + r_i\right)\left(X_i + r_i\right)'\right] \\
&= E\left[X_i X_i'\right] + E\left[X_i r_i'\right] + E\left[r_i X_i'\right] + E\left[r_i r_i'\right] \\
&= E\left[X_i X_i'\right] + E\left[r_i r_i'\right].
\end{align*}
Therefore,
\[
\hat\beta \xrightarrow{p} \beta\left(1 - \frac{E\left[r_i^2\right]}{E\left[\tilde X_i \tilde X_i'\right]}\right)
= \beta\left(1 - \frac{\sigma_r^2}{\sigma_x^2 + \sigma_r^2}\right).
\]
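A simulation of this attenuation factor, assuming numpy; β = 2, σ_x = 1 and σ_r = 0.5 are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 200_000
beta, sigma_x, sigma_r = 2.0, 1.0, 0.5
X = sigma_x * rng.normal(size=N)
r = sigma_r * rng.normal(size=N)          # classical measurement error
X_tilde = X + r                            # observed, mismeasured regressor
Y = beta * X + rng.normal(size=N)

beta_hat = np.cov(X_tilde, Y)[0, 1] / np.var(X_tilde, ddof=1)
attenuation = 1 - sigma_r**2 / (sigma_x**2 + sigma_r**2)
print(beta_hat, beta * attenuation)        # both approximately 1.6
```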
4.1 Estimator
General theory: a consistent estimator of $\beta$ for the general model $y = X\beta + e$, when $E[X'e] \ne 0$, can be obtained if we can find a matrix of instruments $Z$ of order $N \times L$, with $L \ge K$ (at least as many instruments as regressors), such that:
1. Variables in $Z$ are correlated with those in $X$ and $Z'X/N \xrightarrow{p} \Sigma_{ZX}$, finite and of full rank (by column or row).
2. $Z'e/N \xrightarrow{p} 0$.
Idea 4.1.1. By projecting (regressing) $X$ on $Z$, hence creating $\hat X$, we take away the share of $X$ related to $e$, making $\hat\beta_{IV}$ consistent. Write the first-stage regression
\[
X_i = \pi' Z_i + v_i,
\]
such that $\pi = E\left(Z_i Z_i'\right)^{-1}E\left(Z_i X_i'\right)$, implying $E\left(Z_i v_i'\right) = 0$. The reduced form for $X_i$ can be plugged into the original regression:
\begin{align*}
Y_i &= X_i'\beta + e_i \\
&= \left(\pi' Z_i + v_i\right)'\beta + e_i \\
&= Z_i'\lambda + w_i, \qquad \text{where } \lambda = \pi\beta \text{ and } w_i = v_i'\beta + e_i.
\end{align*}
4.1.1 Case 1: L = K
Recall a useful theorem from Linear Algebra
Theorem 4.1.2. The linear system Ax = b has a solution if and only if its augmented
matrix and coefficient matrix have the same rank.
That means the augmented matrix $(\pi\;\;\lambda)$ has full rank $K$, so there exists a unique solution for $\beta$:
\begin{align*}
\beta = \pi^{-1}\lambda &= E\left(Z_i X_i'\right)^{-1}E\left(Z_i Z_i'\right)\,E\left(Z_i Z_i'\right)^{-1}E\left(Z_i Y_i\right) \\
&= E\left(Z_i X_i'\right)^{-1}E\left(Z_i Y_i\right)
\end{align*}
Applying the analogy principle delivers the estimator
\begin{align*}
\hat\beta^{IV} &= \left(\sum_{i=1}^N Z_i X_i'\right)^{-1}\left(\sum_{i=1}^N Z_i Y_i\right) \\
&= \left(\sum_{i=1}^N Z_i X_i'\right)^{-1}\left(\sum_{i=1}^N Z_i\left(X_i'\beta + e_i\right)\right) \\
&= \left(\sum_{i=1}^N Z_i X_i'\right)^{-1}\left(\sum_{i=1}^N Z_i X_i'\right)\beta + \left(\sum_{i=1}^N Z_i X_i'\right)^{-1}\left(\sum_{i=1}^N Z_i e_i\right) \\
&= \beta + \left(\sum_{i=1}^N Z_i X_i'\right)^{-1}\left(\sum_{i=1}^N Z_i e_i\right)
\end{align*}
General Form
To derive the IV estimator we start from the classic regression setup
\[
Y = X\beta + e
\]
and consider the projected model
\[
\hat Y = \hat X\beta + \hat e, \qquad \hat X = Z\left(Z'Z\right)^{-1}Z'X, \quad \hat Y = Z\left(Z'Z\right)^{-1}Z'Y.
\]
The instrumental variable estimator can then be obtained by applying OLS to this modified model:
\begin{align*}
\hat\beta_{IV} &= \left(\hat X'\hat X\right)^{-1}\hat X'\hat Y \\
&= \left[\left(Z\left(Z'Z\right)^{-1}Z'X\right)'\left(Z\left(Z'Z\right)^{-1}Z'X\right)\right]^{-1}\left(Z\left(Z'Z\right)^{-1}Z'X\right)'Z\left(Z'Z\right)^{-1}Z'Y \\
&= \left[X'Z\left(Z'Z\right)^{-1}Z'Z\left(Z'Z\right)^{-1}Z'X\right]^{-1}X'Z\left(Z'Z\right)^{-1}Z'Z\left(Z'Z\right)^{-1}Z'Y \\
&= \left[X'Z\left(Z'Z\right)^{-1}Z'X\right]^{-1}X'Z\left(Z'Z\right)^{-1}Z'Y
\end{align*}
This is how the IV estimator looks in general. If the number of instruments $L$ equals the number of regressors $K$ (i.e. $L = K$), then the product $Z'X$ is a square matrix of dimension $K \times K$ (or $L \times L$), which is non-singular (i.e. invertible). Therefore, we can rewrite the term in square brackets as
\begin{align*}
\hat\beta_{IV} &= \left[X'Z\left(Z'Z\right)^{-1}Z'X\right]^{-1}X'Z\left(Z'Z\right)^{-1}Z'Y \\
&= \left(Z'X\right)^{-1}\left(Z'Z\right)\left(X'Z\right)^{-1}X'Z\left(Z'Z\right)^{-1}Z'Y \\
&= \left(Z'X\right)^{-1}\left(Z'Z\right)\left(Z'Z\right)^{-1}Z'Y \\
&= \left(Z'X\right)^{-1}Z'Y
\end{align*}
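A numpy sketch of the just-identified case, using (Z′X)⁻¹Z′Y; the data-generating values (the 0.8 error correlation, the 0.7 first-stage coefficient, and β = (1, 2)) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100_000
Z = np.column_stack([np.ones(N), rng.normal(size=N)])   # L = K = 2 (constant + one instrument)
v = rng.normal(size=N)
e = 0.8 * v + rng.normal(size=N)                        # endogeneity: Cov(e, v) != 0
X = np.column_stack([np.ones(N), 1.0 + 0.7 * Z[:, 1] + v])
Y = X @ np.array([1.0, 2.0]) + e

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ Y)             # (Z'X)^{-1} Z'Y
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_iv)    # close to [1, 2]
print(beta_ols)   # slope biased upward by the endogeneity
```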
Consistency
The estimator is consistent, since
\[
\hat\beta^{IV} = \beta + \left(N^{-1}\sum_{i=1}^N Z_i X_i'\right)^{-1}\left(N^{-1}\sum_{i=1}^N Z_i e_i\right),
\]
where $\left(N^{-1}\sum_i Z_i X_i'\right)^{-1} \xrightarrow{p} E\left(Z_i X_i'\right)^{-1}$ and $N^{-1}\sum_i Z_i e_i \xrightarrow{p} 0$.
Asymptotic Distribution of $\sqrt{N}\left(\hat\beta^{IV} - \beta\right)$
\begin{align*}
\hat\beta^{IV} &= \beta + \left(N^{-1}\sum_{i=1}^N Z_i X_i'\right)^{-1}\left(N^{-1}\sum_{i=1}^N Z_i e_i\right) \\
\sqrt{N}\left(\hat\beta^{IV} - \beta\right) &= \left(N^{-1}\sum_{i=1}^N Z_i X_i'\right)^{-1}\left(N^{-\frac{1}{2}}\sum_{i=1}^N Z_i e_i\right)
\end{align*}
Therefore,
\[
\sqrt{N}\left(\hat\beta^{IV} - \beta\right) \xrightarrow{d} N(0, \Omega),
\qquad \text{where } \Omega = E\left(Z_i X_i'\right)^{-1}E\left(e_i^2 Z_i Z_i'\right)E\left(X_i Z_i'\right)^{-1}.
\]
4.1.2 Case 2: L ≥ K
Recall from the general form that
\begin{align*}
\hat\beta^{2SLS} &= \left(\hat X'\hat X\right)^{-1}\hat X'\hat Y \\
&= \left[\left(Z\left(Z'Z\right)^{-1}Z'X\right)'\left(Z\left(Z'Z\right)^{-1}Z'X\right)\right]^{-1}\left(Z\left(Z'Z\right)^{-1}Z'X\right)'Z\left(Z'Z\right)^{-1}Z'Y \\
&= \left[X'Z\left(Z'Z\right)^{-1}Z'Z\left(Z'Z\right)^{-1}Z'X\right]^{-1}X'Z\left(Z'Z\right)^{-1}Z'Z\left(Z'Z\right)^{-1}Z'Y \\
&= \left[X'Z\left(Z'Z\right)^{-1}Z'X\right]^{-1}X'Z\left(Z'Z\right)^{-1}Z'Y
\end{align*}
Different choices for the matrix $P$ in the class of estimators $b_P := \left(\sum_i P Z_i X_i'\right)^{-1}\left(\sum_i P Z_i Y_i\right)$ result in different estimators. For example, the simple IV estimator for the exactly identified case simply sets $P = I$. It can be shown that another choice, namely $P = P^\ast := E\left(X_i Z_i'\right)E\left(Z_i Z_i'\right)^{-1}$, results in an estimator, $b_{P^\ast}$, with minimal asymptotic variance. Notice, however, that $b_{P^\ast}$ is an infeasible estimator because you do not observe $P^\ast$. Replace $P^\ast$ by $\hat P = P^\ast + o_p(1)$, resulting in the feasible estimator $b_{\hat P}$:
\begin{align*}
b_{\hat P} &= \left(\sum_{i=1}^N \hat P Z_i X_i'\right)^{-1}\left(\sum_{i=1}^N \hat P Z_i Y_i\right) \\
&= \left(\sum_{i=1}^N \hat P Z_i X_i'\right)^{-1}\left(\sum_{i=1}^N \hat P Z_i\left(X_i'\beta + e_i\right)\right) \\
&= \beta + \left(\sum_{i=1}^N \hat P Z_i X_i'\right)^{-1}\left(\sum_{i=1}^N \hat P Z_i e_i\right) \\
&= \beta + \left(\frac{1}{N}\sum_{i=1}^N \hat P Z_i X_i'\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^N \hat P Z_i e_i\right)
\end{align*}
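A 2SLS sketch in numpy for the over-identified case (L = 3 > K = 2); first-stage fitted values are computed with lstsq rather than by forming the N×N projection matrix. All data-generating values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 100_000
Z = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])    # L = 3 instruments
v = rng.normal(size=N)
e = 0.8 * v + rng.normal(size=N)
x = 0.5 * Z[:, 1] + 0.3 * Z[:, 2] + v
X = np.column_stack([np.ones(N), x])
Y = X @ np.array([1.0, 2.0]) + e

pi_hat = np.linalg.lstsq(Z, X, rcond=None)[0]     # first-stage coefficients (L x K)
X_hat = Z @ pi_hat                                 # fitted values P_Z X
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ Y)   # (X'P_Z X)^{-1} X'P_Z Y
print(beta_2sls)                                   # close to [1, 2]
```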
Consistency and Asymptotic Distribution
\[
\sqrt{N}\left(b_{\hat P} - \beta\right) = \left(\frac{1}{N}\sum_{i=1}^N \hat P Z_i X_i'\right)^{-1}\left(\frac{\sqrt{N}}{N}\sum_{i=1}^N \hat P Z_i e_i\right),
\]
where, under homoskedasticity, $N^{-1/2}\sum_i Z_i e_i \xrightarrow{d} N\left(0, \sigma_e^2 C_{ZZ}^{-1}\right)$. To sum up,
\[
\sqrt{N}\left(b_{\hat P} - \beta\right) \xrightarrow{d} \left(C_{XZ}C_{ZZ}C_{ZX}\right)^{-1}C_{XZ}C_{ZZ}\; N\!\left(0, E\left[e_i^2 Z_i Z_i'\right]\right),
\]
where $C_{XZ} := E\left(X_i Z_i'\right)$, $C_{ZX} := E\left(Z_i X_i'\right)$, and $C_{ZZ} := E\left(Z_i Z_i'\right)^{-1}$. To simplify the notation,
\[
\sqrt{N}\left(b_{\hat P} - \beta\right) \xrightarrow{d} N\left(0, AVA'\right),
\]
where $A := \left(C_{XZ}C_{ZZ}C_{ZX}\right)^{-1}C_{XZ}C_{ZZ}$ and $V := E\left[e_i^2 Z_i Z_i'\right]$. The statement $\hat P = P^\ast + o_p(1)$ is equivalent to
\[
N^{-1}\sum_{i=1}^N X_i Z_i' = E\left[X_i Z_i'\right] + o_p(1), \qquad
\left(N^{-1}\sum_{i=1}^N Z_i Z_i'\right)^{-1} = E\left[Z_i Z_i'\right]^{-1} + o_p(1).
\]
4.2 Application: Partitioned Regression
The source of the endogeneity is correlation between the two error terms, so write
\[
e = v\rho + w.
\]
Combining, we obtain
\[
Y = X\beta + v\rho + w.
\]
Normal equations:
\[
\begin{pmatrix} X'X & X'v \\ v'X & v'v \end{pmatrix}
\begin{pmatrix} \hat\beta \\ \hat\rho \end{pmatrix}
=
\begin{pmatrix} X'Y \\ v'Y \end{pmatrix}
\]
We have
\begin{align*}
X'X\hat\beta + X'v\hat\rho &= X'Y \\
v'X\hat\beta + v'v\hat\rho &= v'Y,
\end{align*}
and by rearranging,
\begin{align*}
\hat\beta &= \left(X'X\right)^{-1}X'\left(Y - v\hat\rho\right) \\
\hat\rho &= \left(v'v\right)^{-1}v'\left(Y - X\hat\beta\right).
\end{align*}
4.2.1 Consistency
\begin{align*}
\hat\beta^{OLS} &= \left(X'M_v X\right)^{-1}X'M_v Y \\
&= \left(X'M_v X\right)^{-1}X'M_v\left(X\beta + v\rho + w\right) \\
&= \beta + 0 + \left(X'M_v X\right)^{-1}X'M_v w
\end{align*}
Notice that
\[
M_v v = \left(I - v\left(v'v\right)^{-1}v'\right)v = 0.
\]
Rewriting,
\[
\hat\beta^{OLS} - \beta = \left(X'M_v X\right)^{-1}X'M_v w
= \left(\frac{1}{N}X'M_v X\right)^{-1}\left(\frac{1}{N}X'M_v w\right).
\]
Consider each term:
\begin{align*}
\frac{1}{N}X'M_v X &= \frac{1}{N}X'\left(I - v\left(v'v\right)^{-1}v'\right)X
= \frac{1}{N}\left(X'X - X'v\left(v'v\right)^{-1}v'X\right) \\
&= \frac{X'X}{N} - \frac{X'v}{N}\left(\frac{v'v}{N}\right)^{-1}\frac{v'X}{N} \\
&= \left(\frac{1}{N}\sum_i X_i X_i'\right) - \left(\frac{1}{N}\sum_i X_i v_i\right)\left(\frac{1}{N}\sum_i v_i v_i\right)^{-1}\left(\frac{1}{N}\sum_i v_i X_i\right) \\
&= O_p(1) - O_p(1)\cdot O_p(1)\cdot O_p(1) = O_p(1)
\end{align*}
\begin{align*}
\frac{1}{N}X'M_v w &= \frac{1}{N}X'\left(I - v\left(v'v\right)^{-1}v'\right)w \\
&= \left(\frac{1}{N}X'w\right) - \left(\frac{1}{N}X'v\right)\left(\frac{1}{N}v'v\right)^{-1}\left(\frac{1}{N}v'w\right) \\
&= \left(\frac{1}{N}\sum_i X_i w_i\right) - \left(\frac{1}{N}\sum_i X_i v_i\right)\left(\frac{1}{N}\sum_i v_i v_i\right)^{-1}\left(\frac{1}{N}\sum_i v_i w_i\right) \\
&= o_p(1) - O_p(1)\cdot O_p(1)\cdot o_p(1) = o_p(1)
\end{align*}
Therefore,
\[
\hat\beta^{OLS} - \beta = O_p(1)\cdot o_p(1) = o_p(1), \qquad \hat\beta^{OLS} \xrightarrow{p} \beta.
\]
Recall that
\[
\hat P_v = \hat v\left(\hat v'\hat v\right)^{-1}\hat v'
\]
with the first stage $X = Z\pi + v$ and
\begin{align*}
\hat v &= X - Z\hat\pi \\
&= X - Z\left(Z'Z\right)^{-1}Z'X \\
&= X - P_Z X = \left(I - P_Z\right)X = M_Z X,
\end{align*}
so that
\begin{align*}
\hat P_v &= M_Z X\left(X'M_Z'M_Z X\right)^{-1}X'M_Z \\
&= M_Z X\left(X'M_Z X\right)^{-1}X'M_Z.
\end{align*}
Remark: the estimate derived in the former case is more precise, but since you cannot observe $v_i$ in reality, the estimate from the latter case is more practical.
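A feasible control-function sketch in numpy: regress Y on X and the first-stage residuals v̂ = M_Z X. The numbers (ρ = 0.8, first-stage slope 0.7, β = (1, 2)) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(9)
N = 50_000
Z = np.column_stack([np.ones(N), rng.normal(size=N)])
v = rng.normal(size=N)
w = rng.normal(size=N)
rho = 0.8
e = v * rho + w                                    # e = v*rho + w: source of endogeneity
x = 1.0 + 0.7 * Z[:, 1] + v
X = np.column_stack([np.ones(N), x])
Y = X @ np.array([1.0, 2.0]) + e

pi_hat = np.linalg.lstsq(Z, x, rcond=None)[0]      # first stage
v_hat = x - Z @ pi_hat                              # residuals M_Z x
W = np.column_stack([X, v_hat])                     # regress Y on (X, v_hat)
coef = np.linalg.solve(W.T @ W, W.T @ Y)
print(coef[:2])   # beta_hat, close to [1, 2]
print(coef[2])    # rho_hat, close to 0.8
```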
Chapter 5
Maximum Likelihood Estimation
Let $Y_1, \ldots, Y_N$ be an iid sample from the $N(\mu, 1)$ distribution, so that each density is $f_Y(y \mid \mu) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(y-\mu)^2}$. The log likelihood function is
\begin{align*}
\mathcal{L}(\mu \mid y) &= \ln\left(\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(y_1 - \mu)^2}\right) + \ldots + \ln\left(\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}(y_N - \mu)^2}\right) \\
&= \left(\ln\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{1}{2}(y_1 - \mu)^2\right) + \ldots + \left(\ln\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{1}{2}(y_N - \mu)^2\right) \\
&= \sum_{i=1}^N\left[\ln\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{1}{2}(y_i - \mu)^2\right] \\
&= N\ln\left(\frac{1}{\sqrt{2\pi}}\right) - \frac{1}{2}\sum_{i=1}^N (y_i - \mu)^2
\end{align*}
5.1.1 Expectation
\begin{align*}
E\left[\hat\mu^{ML}\right] &= E\left[\frac{1}{N}\left(y_1 + \ldots + y_N\right)\right] \\
&= \frac{1}{N}\left(E[y_1] + \ldots + E[y_N]\right) \\
&= \frac{1}{N}\cdot N\mu = \mu
\end{align*}
5.1.2 Variance
\begin{align*}
\operatorname{var}\left(\hat\mu^{ML}\right) &= \operatorname{var}\left(\frac{1}{N}\sum_{i=1}^N y_i\right) \\
&= \frac{1}{N^2}\left(\operatorname{var}(y_1) + \ldots + \operatorname{var}(y_N)\right) \\
&= \frac{1}{N^2}\cdot N = \frac{1}{N}
\end{align*}
\begin{align*}
S(y \mid \mu) &= \frac{\partial \ln f_Y(y \mid \mu)}{\partial\mu} = y - \mu \\
\frac{\partial S(y \mid \mu)}{\partial\mu} &= -1 \\
-E\left[\frac{\partial S(y \mid \mu)}{\partial\mu}\right] &= 1 = I(\mu)
\end{align*}
The information equality holds. Since $E\left[\hat\mu^{ML}\right] = \mu$ and $\operatorname{var}\left(\hat\mu^{ML}\right) = \frac{1}{N}$, the ML estimator is unbiased and attains the Cramér--Rao bound.
5.1.5 Decomposition
\begin{align*}
\frac{1}{N}\sum_{i=1}^N S(y_i \mid \mu) &= a(\mu)\cdot\left(T(Y_1, \ldots, Y_N) - \mu\right) \\
&= \frac{1}{N}\left[(y_1 - \mu) + \ldots + (y_N - \mu)\right] \\
&= \frac{1}{N}\sum_{i=1}^N y_i - \frac{1}{N}N\mu \\
&= \frac{1}{N}\sum_{i=1}^N y_i - \mu \\
&= 1\cdot\left(\frac{1}{N}\sum_{i=1}^N y_i - \mu\right)
\end{align*}
\[
\sqrt{N}\left(\hat\mu^{ML} - \mu\right) \xrightarrow{d} N(0, 1),
\qquad
\hat\mu^{ML} \overset{a}{\sim} N\!\left(\mu, \frac{1}{N}\right)
\]
\begin{align*}
Y_i &= X_i'\beta + e_i \\
e_i \mid X_i &\sim N\left(0, \sigma_e^2\right)
\end{align*}
The novelty here is that the errors are assumed to have a normal distribution. The unknown parameters are $\beta \in \mathbb{R}^K$ and $\sigma_e^2$. Notice that the above normal regression model can be regarded, equivalently, as a statement about the density of $Y_i$ given $X_i$. That conditional density is
\[
f_Y\left(y \mid x, \beta, \sigma_e^2\right) = \frac{1}{\sqrt{2\pi\sigma_e^2}}\exp\left(-\frac{1}{2\sigma_e^2}\left(y - x'\beta\right)^2\right).
\]
You have available a random sample $(X_i, Y_i)$, where the $Y_i$ are iid with pdf $f_Y(y \mid x)$. The log likelihood is
\begin{align*}
\mathcal{L}\left(\beta, \sigma_e^2 \mid x, y\right) &= -N\log(\sigma_e) - \frac{N}{2}\log(2\pi) - \frac{1}{2\sigma_e^2}\sum_{i=1}^N\left(y_i - x_i'\beta\right)^2 \\
&= -\frac{N}{2}\log\left(\sigma_e^2\right) - \frac{N}{2}\log(2\pi) - \frac{1}{2\sigma_e^2}\sum_{i=1}^N\left(y_i - x_i'\beta\right)^2
\end{align*}
5.2.1 Estimators
\begin{align*}
\frac{\partial \mathcal{L}}{\partial\sigma_e^2} &= -\frac{N}{2\sigma_e^2} + \frac{1}{2\sigma_e^4}\sum_{i=1}^N\left(y_i - x_i'\beta\right)^2 = 0 \\
\hat\sigma_e^{2\,ML} &= \frac{1}{N}\sum_{i=1}^N\left(y_i - x_i'\hat\beta^{ML}\right)^2 = \frac{1}{N}\sum_{i=1}^N \hat e_i^2
\end{align*}
\begin{align*}
\frac{\partial \mathcal{L}}{\partial\beta} &= -\frac{2}{2\sigma_e^2}\sum_{i=1}^N\left(y_i - x_i'\beta\right)(-x_i) = 0 \\
&\Rightarrow \sum_{i=1}^N x_i y_i - \sum_{i=1}^N x_i x_i'\hat\beta^{ML} = 0 \\
\hat\beta^{ML} &= \left(\sum_{i=1}^N x_i x_i'\right)^{-1}\left(\sum_{i=1}^N x_i y_i\right)
\end{align*}
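A quick check, assuming numpy and scipy, that maximizing this likelihood numerically reproduces the OLS coefficients, while the ML variance divides by N rather than N − K. The data-generating values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(10)
N = 2_000
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Y = X @ np.array([1.0, 2.0]) + 0.5 * rng.normal(size=N)

def neg_loglik(theta):
    beta, log_s2 = theta[:2], theta[2]     # parameterize sigma_e^2 = exp(log_s2) > 0
    s2 = np.exp(log_s2)
    resid = Y - X @ beta
    return 0.5 * N * np.log(s2) + 0.5 * N * np.log(2 * np.pi) + resid @ resid / (2 * s2)

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
beta_ml, s2_ml = res.x[:2], np.exp(res.x[2])

beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_ols
print(beta_ml, beta_ols)            # identical up to optimizer tolerance
print(s2_ml, resid @ resid / N)     # ML variance estimate divides by N
```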
\begin{align*}
\frac{\partial \ln f_Y}{\partial\beta}\left(y \mid x, \beta, \sigma_e^2\right) &= \frac{1}{\sigma_e^2}x_i\left(y_i - x_i'\beta\right) \\
\frac{\partial \ln f_Y}{\partial\sigma_e^2}\left(y \mid x, \beta, \sigma_e^2\right) &= -\frac{1}{2\sigma_e^2} + \frac{1}{2\sigma_e^4}\left(y_i - x_i'\beta\right)^2 \\
S\left(y \mid x, \beta, \sigma_e^2\right) &:= \begin{pmatrix} \frac{1}{\sigma_e^2}x_i\left(y_i - x_i'\beta\right) \\[4pt] -\frac{1}{2\sigma_e^2} + \frac{1}{2\sigma_e^4}\left(y_i - x_i'\beta\right)^2 \end{pmatrix}
\end{align*}
\begin{align*}
\frac{\partial^2 \ln f_Y}{\partial\beta\partial\beta'}\left(y \mid x, \beta, \sigma_e^2\right) &= -\frac{1}{\sigma_e^2}x_i x_i' \\
\frac{\partial^2 \ln f_Y}{\partial\left(\sigma_e^2\right)^2}\left(y \mid x, \beta, \sigma_e^2\right) &= \frac{1}{2\sigma_e^4} - \frac{1}{\sigma_e^6}\left(y_i - x_i'\beta\right)^2 \\
\frac{\partial^2 \ln f_Y}{\partial\sigma_e^2\partial\beta'}\left(y \mid x, \beta, \sigma_e^2\right) &= -\frac{1}{\sigma_e^4}\left(y_i - x_i'\beta\right)x_i' \\
\frac{\partial^2 \ln f_Y}{\partial\beta\partial\sigma_e^2}\left(y \mid x, \beta, \sigma_e^2\right) &= -\frac{1}{\sigma_e^4}x_i\left(y_i - x_i'\beta\right)
\end{align*}
\[
H(x, y) = \begin{pmatrix} -\frac{1}{\sigma_e^2}x_i x_i' & -\frac{1}{\sigma_e^4}x_i e_i \\[4pt] -\frac{1}{\sigma_e^4}e_i x_i' & \frac{1}{2\sigma_e^4} - \frac{1}{\sigma_e^6}e_i^2 \end{pmatrix}
\]
Notice that $\frac{\partial^2 \ln f_Y}{\partial\beta\partial\sigma_e^2} = \left(\frac{\partial^2 \ln f_Y}{\partial\sigma_e^2\partial\beta'}\right)'$.
Appendix A
Probability Theory
A.1 Moments
Definition A.1.1. (Expectation) Let X be a continuous random variable with density f(x). Then, the expected value of X, denoted by E[X], is defined to be
\[
E[X] = \int_{-\infty}^{\infty} x f(x)\,dx
\]
if the integral is absolutely convergent. The expected value does not exist when both of the following hold:
\[
\int_{-\infty}^{0} x f(x)\,dx = -\infty, \qquad \int_{0}^{\infty} x f(x)\,dx = \infty.
\]
Definition A.1.2. (Variance) The variance of X measures the expected square of the deviation of X from its expected value,
\[
\operatorname{Var}(X) = E\left[(X - E[X])^2\right].
\]
Definition A.1.3. (Covariance) The covariance of any two random variables X and Y, denoted by Cov(X, Y), is defined by
\[
\operatorname{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big].
\]
Properties of Covariance
For any random variables X, Y, Z and constant c ∈ R,
1. Cov(X, X) = Var(X)
2. Cov(X, Y ) = Cov(Y, X)
3. Cov(cX, Y ) = c Cov(X, Y )
Definition A.2.2. Let us denote by E[X|Y ] that function of the random variable Y whose
value at Y = y is E[X|Y = y]. Note that E[X|Y ] is itself a random variable. An extremely
important property of conditional expectation is that for all random variables X and Y
E[X] = E[E[X|Y ]]
Proof.
\begin{align*}
\sum_y E[X \mid Y = y]P\{Y = y\} &= \sum_y \sum_x x\,P\{X = x \mid Y = y\}P\{Y = y\} \\
&= \sum_y \sum_x x\,\frac{P\{X = x, Y = y\}}{P\{Y = y\}}P\{Y = y\} \\
&= \sum_y \sum_x x\,P\{X = x, Y = y\} \\
&= \sum_x x \sum_y P\{X = x, Y = y\} \\
&= \sum_x x\,P\{X = x\} \\
&= E[X]
\end{align*}
One way to understand the proof is to interpret it as follows. It states that to calculate
E[X] we may take a weighted average of the conditional expected value of X given that
Y = y, each of the terms E[X|Y = y] being weighted by the probability of the event on
which it is conditioned.
(b) Suppose that X and Xn, n ∈ N, are all defined on the same probability space. We say that the sequence Xn converges to X in probability, and write Xn →p X, if Xn − X converges to zero in probability, i.e., if for every ε > 0,
\[
\lim_{n\to\infty} P\left(|X_n - X| > \varepsilon\right) = 0.
\]
When X in part (b) of the definition is deterministic, say equal to some constant c, then the two parts of the above definition are consistent with each other.
The intuitive content of the statement Xn →p c is that in the limit as n increases, almost all of the probability mass becomes concentrated in a small interval around c, no matter how small this interval is. On the other hand, for any fixed n, there can be a small probability mass outside this interval, with a slowly decaying tail. Such a tail can have a strong impact on expected values. For this reason, convergence in probability does not have any implications for expected values. For example, one can construct sequences with Xn →p X for which E[Xn] does not converge to E[X].
Convergence in distribution
Definition A.3.2. Let X and Xn, n ∈ N, be random variables with CDFs F and Fn, respectively. We say that the sequence Xn converges to X in distribution, and write Xn →d X, if Fn(x) → F(x) at every point x at which F is continuous.
For convenience, we say op for convergence and Op for boundedness in probability. So, let xn be a sequence of non-negative real-valued random variables.
1. xn = op(1) means xn →p 0 as n grows.
Furthermore,
1. xn = Op(1) means {xn} is bounded in probability; i.e. for any ε > 0, there exists bε > 0 such that supn P(xn > bε) ≤ ε.
2. xn = Op(bn) means xn/bn = Op(1).
3. xn = Op(yn) means xn/yn = Op(1).
Slutsky's Theorem
Let {Xn}, {Yn} be sequences of scalar/vector/matrix random elements. If Xn →d X and Yn →p c, then,
1. Xn + Yn →d X + c
2. Xn Yn →d cX
3. Xn / Yn →d X/c, provided c ≠ 0 (c invertible).
Appendix B
Linear Algebra
The dot product between (column) vectors $v = (v_1, v_2, \ldots, v_n)'$ and $w = (w_1, w_2, \ldots, w_n)'$, both lying in the Euclidean space $\mathbb{R}^n$, is
\[
v \cdot w = v_1 w_1 + v_2 w_2 + \cdots + v_n w_n = \sum_{i=1}^n v_i w_i = v^T w,
\]
i.e. the dot product equals the matrix product of the row vector $v^T$ and the column vector $w$.
The dot product is the cornerstone of Euclidean geometry. The key fact is that the dot product of a vector with itself,
\[
v \cdot v = v_1^2 + v_2^2 + \cdots + v_n^2,
\]
is the sum of the squares of its entries, and hence, by the classical Pythagorean Theorem, equals the square of its length. Consequently, the Euclidean norm or length of a vector is found by taking the square root:
\[
\|v\| = \sqrt{v \cdot v} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}.
\]
Note that every nonzero vector $v \ne 0$ has positive Euclidean norm, $\|v\| > 0$, while only the zero vector has zero norm: $\|v\| = 0$ if and only if $v = 0$. The elementary properties of dot product and Euclidean norm serve to inspire the abstract definition of more general inner products.
Definition B.1.1. An inner product on the real vector space V is a pairing that takes two vectors v, w ∈ V and produces a real number ⟨v, w⟩ ∈ R. The inner product is required to satisfy the following three axioms for all u, v, w ∈ V, and scalars c, d ∈ R.
(i) Bilinearity
⟨cu + dv, w⟩ = c⟨u, w⟩ + d⟨v, w⟩ and ⟨u, cv + dw⟩ = c⟨u, v⟩ + d⟨u, w⟩
(ii) Symmetry
⟨v, w⟩ = ⟨w, v⟩
(iii) Positivity
⟨v, v⟩ > 0 whenever v ≠ 0, while ⟨0, 0⟩ = 0
Given an inner product, the associated norm of a vector v ∈ V is defined as the positive square root of the inner product of the vector with itself,
\[
\|v\| = \sqrt{\langle v, v\rangle}.
\]
B.2 Orthogonality
Orthogonal and Orthonormal Bases
Definition B.2.1. A basis u1, ..., un of an n-dimensional inner product space V is called orthogonal if ⟨ui, uj⟩ = 0 for all i ≠ j. The basis is called orthonormal if, in addition, each vector has unit length: ‖ui‖ = 1, for all i = 1, ..., n.
Proposition B.2.2. Let v1 , . . . , vk ∈ V be nonzero, mutually orthogonal elements, so
vi ̸= 0 and ⟨vi , vj ⟩ = 0 for all i ̸= j. Then, v1 , . . . , vk are linearly independent.
Lemma B.2.3. If v1 , . . . , vn is an orthogonal basis of a vector space V , then the normalized
vectors ui = vi / ∥vi ∥ , i = 1, . . . , n, form an orthonormal basis
Theorem B.2.4. Let u1 , . . . , un be an orthonormal basis for an inner product space V .
Then one can write any element v ∈ V as a linear combination
v = c1 u1 + · · · + cn un
where its coordinates
ci = ⟨v, ui ⟩ , i = 1, . . . , n
are explicitly given as inner products. Moreover, its norm is given by the Pythagorean
formula
\[
\|v\| = \sqrt{c_1^2 + \cdots + c_n^2} = \sqrt{\sum_{i=1}^n \langle v, u_i\rangle^2},
\]
namely, the square root of the sum of the squares of its orthonormal basis coordinates.
QT Q = QQT = I
The orthogonality condition implies that one can easily invert an orthogonal matrix
Q−1 = QT
Definition 4.31. The orthogonal projection of v onto the subspace W is the element
w ∈ W that makes the difference z = v − w orthogonal to W .
⟨z, ui ⟩ = ⟨v − w, ui ⟩
= ⟨v − c1 u1 − · · · − cn un , ui ⟩
= ⟨v, ui ⟩ − c1 ⟨u1 , ui ⟩ − · · · − cn ⟨un , ui ⟩
= ⟨v, ui ⟩ − ci
=0
The coefficients ci = ⟨v, ui ⟩ of the orthogonal projection W are thus uniquely prescribed
by the orthogonality requirement, which thereby proves its uniqueness.
A sum of scalar multiples
\[
c_1 v_1 + c_2 v_2 + \cdots + c_k v_k = \sum_{i=1}^k c_i v_i,
\]
where the coefficients $c_1, c_2, \ldots, c_k$ are any scalars, is known as a linear combination of the elements $v_1, \ldots, v_k$. For instance, $3v_1 + v_2 - 2v_3$, $8v_1 - 13v_3 = 8v_1 + 0v_2 - 13v_3$, $v_2 = 0v_1 + 1v_2 + 0v_3$, and $0 = 0v_1 + 0v_2 + 0v_3$ are four different linear combinations of the three vector space elements $v_1, v_2, v_3 \in V$. Their span is the subset $W = \operatorname{span}\{v_1, \ldots, v_k\} \subset V$ consisting of all possible linear combinations with scalars $c_1, \ldots, c_k \in \mathbb{R}$. The key observation is that the span always forms a subspace.
Proposition B.3.2. The span $W = \operatorname{span}\{v_1, \ldots, v_k\}$ of any finite collection of vector space elements $v_1, \ldots, v_k \in V$ is a subspace of the underlying vector space V.
Proof. We need to show that if
\[
v = c_1 v_1 + \cdots + c_k v_k \quad \text{and} \quad \hat v = \hat c_1 v_1 + \cdots + \hat c_k v_k
\]
are any two linear combinations, then their sum is also a linear combination, since
\[
v + \hat v = (c_1 + \hat c_1) v_1 + \cdots + (c_k + \hat c_k) v_k = \tilde c_1 v_1 + \cdots + \tilde c_k v_k,
\]
where $\tilde c_i = c_i + \hat c_i$. Similarly, for any scalar multiple,
\[
a v = (a c_1) v_1 + \cdots + (a c_k) v_k = c_1^\ast v_1 + \cdots + c_k^\ast v_k,
\]
where $c_i^\ast = a c_i$, which completes the proof. Q.E.D.
The elements $v_1, \ldots, v_k$ are called linearly dependent if there exist scalars $c_1, \ldots, c_k$, not all zero, such that
\[
c_1 v_1 + \cdots + c_k v_k = 0.
\]
Elements that are not linearly dependent are called linearly independent. A basis of V is a collection of elements that
1. spans V
2. is linearly independent
Theorem B.3.6. Suppose the vector space V has a basis v1 , . . . , vn for some n ∈ N . Then
every other basis of V has the same number, n, of elements in it. This number is called
the dimension of V , and written dim V = n.
B.3.4 Kernel
Definition B.3.7. The image of an m × n matrix A is the subspace imgA ⊂ Rm spanned
by its columns. The kernel of A is the subspace ker A ⊂ Rn consisting of all vectors that
are annihilated by A,
ker A = {z ∈ Rn |Az = 0} ⊂ Rn
Variables Dimension
Y N ×1
X N ×K
β K ×1
e N ×1
r N ×K
Z N ×L
v N ×K
π L×K
λ L×1
w N ×1
Yi 1×1
Xi K ×1
ei 1×1
ri K ×1
Zi L×1
vi K ×1
wi 1×1
APPENDIX
10. Space L2 is the collection of all rvs X defined on (Ω, F , P ) such that E|X|2 < ∞ (finite variance).
15. Projection (orthogonal): $\mathcal{P}(Y) = \hat Y = \operatorname{argmin}_{Z \in sp(X_1)} E[Y - Z]^2$, i.e. the element $bX$ with $b$ attaining $\inf_{b\in\mathbb{R}} E(Y - bX)^2$; it is the orthogonal projection of Y onto S.
16. Projection (orthonormal): $\mathcal{P}(Y) = \hat Y = \sum_{k=1}^K E\left(\tilde X_k \cdot Y\right)\tilde X_k = X'\beta^\ast$, where $\tilde X_k$ is an orthonormal basis of $sp(X)$, X is a K × 1 vector, and $\beta^\ast := E\left(XX'\right)^{-1}E(XY)$.
Lecture 2: Ordinary Least Squares Estimation
1. Feature: Let Z ∈ L2 and P ∈ P where P is a class of distributions on Z. A feature of P is an
object of the form γ(P ) for some γ : P → S where S is often times R.
3. An estimator γ̂N is a statistic used to infer some feature γ(P ) of an unknown distribution P .
4. The empirical distribution PN of the sample {Z1 , . . . , ZN } is the discrete distribution that puts
equal probability 1/N on each sample point Zi , i = 1, . . . , N.
10. Weak law of large numbers: Let $Z_1, Z_2, \ldots$ be a sequence of iid rvs with $E[Z_i] = \mu_Z$. Define $\bar Z_N := \sum_{i=1}^N Z_i / N$. Then $\bar Z_N - \mu_Z \xrightarrow{p} 0$, or $\bar Z_N \xrightarrow{p} \mu_Z$, or write $\bar Z_N = \mu_Z + o_p(1)$.
11. Consistency of an estimator: $\hat\gamma_N \xrightarrow{p} \gamma$.
12. OLS: $\hat\beta^{OLS} := \operatorname{argmin}_{b\in\mathbb{R}^K} \sum_{i=1}^N (Y_i - X_i'b)^2 = \left(\frac{1}{N}\sum_{i=1}^N X_i X_i'\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^N X_i Y_i\right) = (X'X)^{-1}X'Y$.
13. Method of Moments: $\beta^\ast = E(XX')^{-1}E(XY) \xrightarrow{a.p.} \left(\frac{1}{N}\sum_{i=1}^N X_i X_i'\right)^{-1}\frac{1}{N}\sum_{i=1}^N X_i Y_i = \hat\beta^{MM}$ (by the analogy principle).
14. Representation: $\sum_i X_i X_i' = X'X$ and $\sum_i X_i Y_i = X'Y$.
4. Central limit theorem (CLT): Let $Z_1, Z_2, \ldots$ be a sequence of iid random vectors with $\mu_Z := E[Z_i]$ and $E\|Z_i\|^2 < \infty$. Define $\bar Z_N := \sum_{i=1}^N Z_i / N$. Then $\sqrt{N}\left(\bar Z_N - \mu_Z\right) \xrightarrow{d} N\left(0, E\left[(Z_i - \mu_Z)(Z_i - \mu_Z)'\right]\right)$.
5. Asymptotic distribution: $\sqrt{N}\left(\hat\beta^{OLS} - \beta^\ast\right) = \left(N^{-1}\sum_{i=1}^N X_i X_i'\right)^{-1} N^{-1/2}\sum_{i=1}^N X_i u_i \xrightarrow{d} N(0, \Omega)$.
6. Projection matrix: $P_X := X(X'X)^{-1}X'$.
7. Residual maker: $M_X := I_N - P_X$.
8. Trace: $\operatorname{tr} A := \sum_{k=1}^K a_{kk}$, with $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ and $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$.
9. Since $\hat\sigma_u^2 := \sum_{i=1}^N \hat u_i^2 / N$ but $E\left(\hat\sigma_u^2 \mid X\right) < \sigma_u^2$, use $s_u^2 := \frac{N}{N-K}\hat\sigma_u^2$, so $\lim_{N\to\infty}\hat\sigma_u^2 = \lim_{N\to\infty}s_u^2$.
10. Heteroskedasticity robust variance estimator: $\hat\Omega = \left(\frac{1}{N}\sum_{i=1}^N X_i X_i'\right)^{-1}\left(\frac{1}{N-K}\sum_{i=1}^N \hat u_i^2 X_i X_i'\right)\left(\frac{1}{N}\sum_{i=1}^N X_i X_i'\right)^{-1}$.
√ ( )
11. Confidence interval construction: let N r′ β̂ OLS − r′ β ∗ ∼ N (0, r′ Ωr) where r is non-stochastic
K × 1 vector and set r = ek then,
( √ ′ √ ′ )
e Ωe e Ωe
P e′k β̂ OLS − 1.96 √k < e′k β ∗ < e′k β̂ OLS + 1.96 √k
k k
= 0.95
N N
√ √
( ) e′k Ω̂ek
( ) ( )
13. Standard error: se β̂kOLS = √
N
= e′k · se β̂ OLS where se β̂ OLS = diag Ω̂/N .
( ) β̂kOLS −βknull ( ) d
14. t-statistic: tOLS βknull := if β̂kOLS = βknull + op (1), then tOLS βknull → N(0, 1).
se(β̂kOLS )
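A numpy sketch of items 10–14: the robust variance, standard errors, 95% confidence intervals, and t-statistics. The heteroskedastic design is illustrative.

```python
import numpy as np

rng = np.random.default_rng(11)
N, K = 5_000, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true = np.array([1.0, 2.0])
u = (1 + 0.5 * np.abs(X[:, 1])) * rng.normal(size=N)     # heteroskedastic errors
Y = X @ beta_true + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
u_hat = Y - X @ beta_hat

Sxx = X.T @ X / N
meat = (X * (u_hat**2)[:, None]).T @ X / (N - K)
Omega_hat = np.linalg.inv(Sxx) @ meat @ np.linalg.inv(Sxx)   # robust variance of sqrt(N)(b - beta)

se = np.sqrt(np.diag(Omega_hat) / N)
ci = np.column_stack([beta_hat - 1.96 * se, beta_hat + 1.96 * se])
t_stat = (beta_hat - 0.0) / se          # t-statistic for H0: beta_k = 0
print(ci)
print(t_stat)
```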
8. Linear regression model: Yi = Xi′ β + ei with E (ei |Xi ) = 0, EYi2 < ∞, and E ∥Xi ∥2 < ∞
10. Homoskedasticity: $E(ee' \mid X) = \sigma_e^2 I_N$.
11. Variance decomposition: for $P, Q \in L^2$, $\operatorname{Var} P = E\operatorname{Var}(P \mid Q) + \operatorname{Var} E(P \mid Q)$.
16. Gauss Markov theorem: In the linear regression model with homoskedastic errors, $\hat\beta^{OLS}$ is the linear unbiased estimator with minimum variance (BLUE): $\operatorname{Var}(\tilde\beta \mid X) \ge \operatorname{Var}\left(\hat\beta^{OLS} \mid X\right)$.
17. Generalised least squares estimator $\hat\beta^{GLS}$ is the minimum variance unbiased estimator under heteroskedasticity by the Gauss Markov theorem.
18. GLS is an infeasible estimator because $E(ee' \mid X)$ cannot be observed. To make it feasible, run OLS, obtain $\hat e$ and $\hat\Sigma$, then compute $\hat\beta^{GLS}_{feas} := \left(X'\hat\Sigma^{-1}X\right)^{-1}X'\hat\Sigma^{-1}Y$; the feasible estimator doesn't satisfy the Gauss Markov theorem.
8. First stage regression: Xi = π ′ Zi + vi ⇒ π = E (Zi Zi′ )−1 E (Zi Xi′ ) where E (Zi vi ) = 0.
12. Identification: If a parameter can be written as an explicit function of population moments, then it is identifiable. For the exogenous variables $\dim Z_{i1} = K_1$ and the endogenous variables $\dim Z_{i2} = L_2$,
\[
Z_i := \begin{pmatrix} Z_{i1} \\ Z_{i2} \end{pmatrix} = \begin{pmatrix} X_{i1} \\ Z_{i2} \end{pmatrix}.
\]
Three cases: L = K (exactly identified), L > K (over identified), L < K (under identified). λ and π are explicit functions of population moments and thus identified.
13. Existence and uniqueness: if $\operatorname{Rank}(\pi) = \operatorname{Rank}(\pi\;\;\lambda)$ then the solution exists, and it is unique if $\dim\ker(\pi) = 0 \Rightarrow \operatorname{rank}(\pi) = K$. We need $\operatorname{rank} E\left(Z_i X_{i2}'\right) = K_2$.
14. Case 1: $L = K$, $\beta = \pi^{-1}\lambda = E\left(Z_i X_i'\right)^{-1}E\left(Z_i Y_i\right)$. Thus, $\hat\beta^{IV} = \left(\sum_{i=1}^N Z_i X_i'\right)^{-1}\left(\sum_{i=1}^N Z_i Y_i\right)$.
3. Individual treatment effect (ITE): Yi (1) − Yi (0), cannot be observed. To fix this, find any
identical j ̸= i such that Yi (1) − Yj (0) and not Yi (p) = Yj (p) for p ∈ {0, 1}.
4. The regression model: Yi := Yi (1) · Xi + Yi (0) · (1 − Xi ) = β0 + β1i Xi + ũi , don’t use OLS
unless β1i is constant. To take out i-subscript, manipulate it by ±E (Yi (1) − Yi (0)) · Xi where
E (Xi ui ) = 0.
Lecture 8: Instrumental Variables II
1. Case 2: $L > K$, $\pi'\pi\beta = \pi'\lambda \Rightarrow \beta = (\pi'\pi)^{-1}\pi'\lambda$, which motivates 2SLS.
2. Two stage least squares estimator: $\hat\beta^{2SLS} = \left(X'Z(Z'Z)^{-1}Z'X\right)^{-1}X'Z(Z'Z)^{-1}Z'Y$.
3. Representation: $(X'P_Z X)^{-1}X'P_Z Y = \left(\hat X'X\right)^{-1}\hat X'Y = \left(\hat X'\hat X\right)^{-1}\hat X'Y$.
8. Bias of 2SLS: Hahn and Kuersteiner (2002) uses the concentration parameter µ; when it is large enough, the bias approaches zero.
9. Invalid instrument: think of $\pi = E(Z_iZ_i')^{-1}E(Z_iX_i')$ when $E(Z_iX_i') = 0$. $\hat\beta^{IV}$ is not consistent and converges to a Cauchy distribution; the t-statistic doesn't converge to a normal distribution.
10. Generic t-statistic: $t\left(\beta^{null}\right) := \frac{\hat\beta - \beta^{null}}{se(\hat\beta)}$.
11. Degree of endogeneity (ρ): affects the asymptotic distribution of t. As $\rho \to 1$ (worst case), $\xi_1 \xrightarrow{p} \xi_2$, $\hat\sigma_e^2 \xrightarrow{p} 0$, $se\left(\hat\beta^{IV}\right) \to 0$, $S(\rho) \to \infty$, and $t \to \infty$, rejecting $H_0: \beta = 0$.
2. Asymptotic distribution of t: depends on ρ and τ (the strength of the instrument), both of which cannot be estimated. If $\rho = 1$, $\xi_1 = \xi_2$, and the t-statistic becomes $S(1, \tau) = \xi_1 + \frac{\xi_2}{\tau}$.
7. Dealing with weak instruments: Staiger and Stock (1997) uses first stage F statistic of X on Z.
The instrument is strong if F > 10 (safe to use β̂ IV and β̂ 2SLS ) and weak if F < 10.
9. Stock and Yogo: provides a table of F -statistic based on actual size. The more you can tolerate
with high α, the more likely you will reject the null and conclude that the instrument is strong.
Lecture 10: Maximum Likelihood Estimation
1. Requirements: fY (y|θ) is known given an iid random sample Y1 , . . . , YN .
2. Likelihood function: $L(\theta \mid y) := f_{Y_1,\ldots,Y_N}(y_1, \ldots, y_N \mid \theta) = \prod_{i=1}^N f_Y(y_i \mid \theta)$.
3. Log likelihood function: $\mathcal{L}(\theta \mid y) := \ln L(\theta \mid y) = \sum_{i=1}^N \ln f_Y(y_i \mid \theta)$.
7. Score function: $S(y \mid \theta) := \frac{\partial \ln f_Y}{\partial\theta}(y \mid \theta)$, with a random variable Y.
8. Fisher information: $I(\theta) := E\left(S(Y \mid \theta)^2\right) = \operatorname{Var} S(Y \mid \theta)$.
9. Information Equality: $E\left(\left(\frac{\partial \ln f_Y}{\partial\theta}(Y \mid \theta)\right)^2\right) = -E\left(\frac{\partial^2 \ln f_Y}{\partial\theta^2}(Y \mid \theta)\right)$.
3. Linear probability model (lpm): $E(Y_i \mid X_i) = \Pr(Y_i = 1 \mid X_i) = X_i'\beta$, where β is the effect of $X_i$ on the probability of success $\Pr(Y_i = 1 \mid X_i)$. Use OLS to estimate β.
8. In the binary outcome model, $f_Y(y \mid x, \beta) = \Pr(Y_i = y \mid X_i = x) = G(x'\beta)^y\left(1 - G(x'\beta)\right)^{1-y}$.
9. Let G be the standard normal or the logistic CDF. Then $\mathcal{L}(\beta \mid x, y)$ is globally concave.
10. Score: $S(y \mid x, \beta) = \frac{y - G(x'\beta)}{G(x'\beta)(1 - G(x'\beta))}\cdot g(x'\beta)\cdot x$; let the computer find β such that $\sum_{i=1}^N S(y_i \mid x_i, \beta) = 0$.
11. Asymptotic distribution of MLE: $\sqrt{N}\left(\hat\beta^{ML} - \beta\right) \xrightarrow{d} N\left(0, I(\beta)^{-1}\right)$.
12. Information matrix: $I(\beta) = E\left(S(Y_i \mid X_i, \beta)S(Y_i \mid X_i, \beta)'\right) = E\left(\frac{g(X_i'\beta)^2}{G(X_i'\beta)(1 - G(X_i'\beta))}\cdot X_i X_i'\right)$.
13. Causal effect: $\frac{\partial E(Y_i \mid X_i)}{\partial X_i} = \frac{\partial \Pr(Y_i = 1 \mid X_i)}{\partial X_i} = g(X_i'\beta)\beta \ne \beta$ by the chain rule. Take expectations, $\phi := E\left(\frac{\partial \Pr(Y_i = 1 \mid X_i)}{\partial X_i}\right) = E\left(g(X_i'\beta)\beta\right)$, and use the analogy principle, $\hat\phi = \frac{1}{N}\sum_{i=1}^N g\left(X_i'\hat\beta^{ML}\right)\hat\beta^{ML}$.
14. Delta method: Let $\sqrt{N}(\hat\theta - \theta) \xrightarrow{d} N(0, \Omega)$ with $\dim\theta = K$. Take a continuously differentiable function $C : \Theta \to \mathbb{R}^Q$ where $Q \le K$. Then $\sqrt{N}\left(C(\hat\theta) - C(\theta)\right) \xrightarrow{d} N\left(0, c(\theta)\,\Omega\,c(\theta)'\right)$, where $c(\theta) := \frac{\partial C}{\partial\theta'}(\theta)$.
15. Sample selection model: $Y_i^\ast = X_i'\beta + e_i$ and $D_i = 1\left(Z_i'\gamma + v_i > 0\right)$, where $Y_i = Y_i^\ast$ if $D_i = 1$ and unobserved if $D_i = 0$, with $(D_i, X_i, Y_i, Z_i)$ given.
16. Inverse Mills ratio: $E(v_i \mid v_i > -c) = \frac{\phi(c)}{\Phi(c)} =: \lambda(c)$.
18. Regression model (second stage): $Y_i = X_i'\beta + \rho\lambda(Z_i'\gamma) + r_i$; first derive $E(e_i \mid D_i = 1, X_i = x, Z_i = z) = \rho\lambda(z'\gamma)$, then $E(Y_i^\ast \mid D_i = 1, X_i, Z_i) = X_i'\beta + \rho\lambda(Z_i'\gamma)$. Use OLS.
20. Important note: β is only identified by imposing some functional form on the joint error distribution.
4. Regularity conditions: $Q_0(\theta)$ is continuous (by inspection) and $Q_N(W_i, \theta) \xrightarrow{u.p.} Q_0(\theta)$ (uniform convergence in probability).