Exercises MEF - 9 - 2018 - Solution
Exercise 1
You have two random variables, X and Y . Consider three statements:
A = ‘The expectation of the product of X and Y is equal to zero. That is, E[XY ] = 0.’
B = ‘The correlation between X and Y is zero.’
C = ‘X and Y are independent.’
For the following statements, state whether they are true or false and explain why:
(a) ¬B =⇒ ¬A
(b) A =⇒ B
(c) C =⇒ B
(d) C =⇒ (A ∩ B).
Solution 1
(a) False. We have Cov(X; Y) = E(XY) − E(X)E(Y) and Corr(X; Y) = Cov(X; Y)/(σ_X σ_Y). If Corr(X; Y) ≠ 0, then Cov(X; Y) ≠ 0, but E(XY) may still be equal to 0.
(b) False. If E(XY) = 0, it may be that E(X)E(Y) ≠ 0 and thus both the covariance and the correlation between X and Y are different from 0.
(c) True. If X and Y are independent, we have that E(XY ) = E(X)E(Y ) and Cov(X; Y ) = 0, leading to
Corr(X; Y ) = 0.
(d) False. If X and Y are independent, we have that E(XY ) = E(X)E(Y ), but it does not mean that
E(XY ) = 0.
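For a concrete illustration (not part of the original solution), consider the hypothetical pair X ∼ N(1, 1) and Y = X − 2: here Corr(X; Y) = 1, yet E(XY) = Cov(X; Y) + E(X)E(Y) = 1 − 1 = 0, so a non-zero correlation does not force E(XY) ≠ 0, exactly as claimed in (a). A short Python simulation sketch:

```python
import numpy as np

# Hypothetical example: X ~ N(1, 1), Y = X - 2.
# Cov(X, Y) = 1, so Corr(X, Y) = 1 != 0, while
# E[XY] = Cov(X, Y) + E[X]E[Y] = 1 + 1 * (-1) = 0.
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=1_000_000)
y = x - 2.0

print("sample E[XY]      :", np.mean(x * y))           # close to 0
print("sample Corr(X, Y) :", np.corrcoef(x, y)[0, 1])  # close to 1
```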
Exercise 2
For random variables X, Y ∼ N(0, σ²) determine
(a) E(X | X²),
(b) E(X | XY),
(c) E(X² + Y² | X + Y).
Solution 2
(a) Since the normal distribution is symmetric around zero, knowing X² = x one can immediately tell that X = √x with probability 1/2 or X = −√x with probability 1/2. Thus
E(X | X² = x) = P(X = √x) · √x + P(X = −√x) · (−√x) = (1/2) · √x + (1/2) · (−√x) = 0.
Another way to show this is to notice that −X ∼ N(0, σ²) and (−X)² = X², so E(X | X²) = E(−X | (−X)²) = E(−X | X²) = −E(X | X²), which implies E(X | X²) = 0.
(b) One may note that (−X, −Y) has the same joint distribution as (X, Y) and that (−X)(−Y) = XY. Thus E(X | XY) = E(−X | XY) = −E(X | XY), which is equivalent to E(X | XY) = 0.
(c) The same sign-flip argument does not apply here, because X² + Y² ≥ 0 is unchanged when (X, Y) is replaced by (−X, −Y). Instead, write S = X + Y and D = X − Y, so that X² + Y² = (S² + D²)/2. If X and Y are independent, S and D are jointly normal with Cov(S, D) = Var(X) − Var(Y) = 0 and are therefore independent, with E(D²) = 2σ². Hence
E(X² + Y² | X + Y) = (S² + E(D² | S))/2 = ((X + Y)² + 2σ²)/2 = (X + Y)²/2 + σ².
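The three conditional expectations can also be checked numerically. The sketch below (not part of the original solution) assumes σ = 1 and independent X and Y, and approximates a conditional expectation by averaging within quantile bins of the conditioning variable:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0                      # assumed value of sigma
n = 2_000_000
x = rng.normal(0.0, sigma, n)
y = rng.normal(0.0, sigma, n)    # X and Y assumed independent

def binned_mean(target, cond, n_bins=20):
    """Approximate E[target | cond] by averaging target within quantile bins of cond."""
    edges = np.quantile(cond, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.digitize(cond, edges[1:-1]), 0, n_bins - 1)
    return np.array([target[idx == b].mean() for b in range(n_bins)])

# (a) E[X | X^2] and (b) E[X | XY] should be ~0 in every bin.
print("max |E[X | X^2]| :", np.abs(binned_mean(x, x**2)).max())
print("max |E[X | XY ]| :", np.abs(binned_mean(x, x * y)).max())

# (c) E[X^2 + Y^2 | X + Y] should match (X + Y)^2 / 2 + sigma^2 bin by bin.
s = x + y
lhs = binned_mean(x**2 + y**2, s)
rhs = binned_mean(s**2 / 2 + sigma**2, s)
print("max |difference| for (c):", np.abs(lhs - rhs).max())
```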
Exercise 3
The conditional density of Y given X = x is
f_{Y|X}(y | x) = (2y + 4x)/(1 + 4x),
and the marginal density of X is
f_X(x) = (1 + 4x)/3,
for 0 < x < 1 and 0 < y < 1. Find
(a) the joint density f_{XY}(x, y),
(b) the marginal density of Y, f_Y(y), and
(c) the conditional density of X given Y = y, f_{X|Y}(x | y).
Solution 3
(a) From Definition 4.18 in the lecture notes we know that the conditional density of a random variable Y given
another random variable X = x is
f_{Y|X}(y | x) = f_{X,Y}(x, y)/f_X(x) if f_X(x) > 0, and f_{Y|X}(y | x) = 0 otherwise.
For all x ∈ (0, 1), f_X(x) = (1 + 4x)/3 > 0; therefore the joint density of (X, Y) is
f_{XY}(x, y) = f_{Y|X}(y | x) f_X(x) = [(2y + 4x)/(1 + 4x)] · [(1 + 4x)/3] = (2y + 4x)/3.
(b) The marginal density of Y can be derived by integrating out the variable x:
f_Y(y) = ∫_{−∞}^{∞} f_{XY}(x, y) dx = ∫_0^1 (2y + 4x)/3 dx = (2y + 2)/3.
(c) Finally, as f_Y(y) = (2y + 2)/3 > 0 for all y ∈ (0, 1), the conditional density of X given Y = y is
f_{X|Y}(x | y) = f_{X,Y}(x, y)/f_Y(y) = [(2y + 4x)/3] / [(2y + 2)/3] = (y + 2x)/(y + 1).
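As a cross-check (not part of the original solution), the same computations can be reproduced symbolically with SymPy:

```python
import sympy as sp

x, y = sp.symbols("x y", positive=True)

f_cond = (2 * y + 4 * x) / (1 + 4 * x)   # f_{Y|X}(y|x)
f_x = (1 + 4 * x) / 3                    # f_X(x)

f_joint = sp.simplify(f_cond * f_x)       # (a): (2y + 4x)/3
f_y = sp.integrate(f_joint, (x, 0, 1))    # (b): (2y + 2)/3
f_x_given_y = sp.simplify(f_joint / f_y)  # (c): (y + 2x)/(y + 1)

print("f_XY(x, y)    =", f_joint)
print("f_Y(y)        =", sp.simplify(f_y))
print("f_X|Y(x | y)  =", f_x_given_y)
# sanity check: both densities integrate to 1 over (0, 1)
print(sp.simplify(sp.integrate(f_y, (y, 0, 1))),
      sp.simplify(sp.integrate(f_x_given_y, (x, 0, 1))))
```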
Exercise 4
Show that if (X, Y) ∼ N₂(μ_X, μ_Y, σ_X², σ_Y², ρ), then the following is true:
(a) The marginal distribution of X is N(μ_X, σ_X²) and the marginal distribution of Y is N(μ_Y, σ_Y²).
(b) The conditional distribution of Y given X = x is N(μ_Y + ρ(σ_Y/σ_X)(x − μ_X), σ_Y²(1 − ρ²)).
Solution 4
(a) The bivariate normal density is
f_{XY}(x, y) = 1/(2π σ_X σ_Y √(1 − ρ²)) · exp{ −1/(2(1 − ρ²)) [ ((x − μ_X)/σ_X)² − 2ρ ((x − μ_X)/σ_X)((y − μ_Y)/σ_Y) + ((y − μ_Y)/σ_Y)² ] }.
The marginal density of X is obtained by integrating out y:
f_X(x) = ∫_{−∞}^{∞} f_{XY}(x, y) dy. Set ω = (x − μ_X)/σ_X and z = (y − μ_Y)/σ_Y, so dy = σ_Y dz. Then
f_X(x) = ∫_{−∞}^{∞} 1/(2π σ_X σ_Y √(1 − ρ²)) exp{ −1/(2(1 − ρ²)) [ ((x − μ_X)/σ_X)² − 2ρ ((x − μ_X)/σ_X)((y − μ_Y)/σ_Y) + ((y − μ_Y)/σ_Y)² ] } dy
= 1/(2π σ_X √(1 − ρ²)) ∫_{−∞}^{∞} exp{ −1/(2(1 − ρ²)) [ ω² − 2ρωz + z² ] } dz
= exp(−ω²/(2(1 − ρ²))) / (2π σ_X √(1 − ρ²)) · ∫_{−∞}^{∞} exp{ −1/(2(1 − ρ²)) [ z² − 2ρωz + (ρω)² − (ρω)² ] } dz
= exp(−ω²/(2(1 − ρ²)) + (ρω)²/(2(1 − ρ²))) / (2π σ_X √(1 − ρ²)) · ∫_{−∞}^{∞} exp{ −1/(2(1 − ρ²)) [ z² − 2ρωz + (ρω)² ] } dz
= exp(−ω²/2) / (√(2π) σ_X) · ∫_{−∞}^{∞} 1/(√(2π) √(1 − ρ²)) exp{ −(z − ρω)²/(2(1 − ρ²)) } dz,
where the integrand in the last line is the pdf of a N(ρω, 1 − ρ²) random variable, so the integral equals 1. Hence
f_X(x) = 1/(√(2π) σ_X) exp[ −(1/2) ((x − μ_X)/σ_X)² ],
i.e. X ∼ N(μ_X, σ_X²). The same argument with the roles of x and y interchanged gives Y ∼ N(μ_Y, σ_Y²).
(b) The conditional density is
f_{Y|X}(y | x) = f_{XY}(x, y) / f_X(x)
= [ 1/(2π σ_X σ_Y √(1 − ρ²)) exp{ −1/(2(1 − ρ²)) [ ((x − μ_X)/σ_X)² − 2ρ ((x − μ_X)/σ_X)((y − μ_Y)/σ_Y) + ((y − μ_Y)/σ_Y)² ] } ] / [ 1/(√(2π) σ_X) exp{ −(1/2) ((x − μ_X)/σ_X)² } ]
= 1/(√(2π) σ_Y √(1 − ρ²)) exp{ −1/(2(1 − ρ²)) [ ((x − μ_X)/σ_X)² − (1 − ρ²) ((x − μ_X)/σ_X)² − 2ρ ((x − μ_X)/σ_X)((y − μ_Y)/σ_Y) + ((y − μ_Y)/σ_Y)² ] }
= 1/(√(2π) σ_Y √(1 − ρ²)) exp{ −1/(2(1 − ρ²)) [ ρ² ((x − μ_X)/σ_X)² − 2ρ ((x − μ_X)/σ_X)((y − μ_Y)/σ_Y) + ((y − μ_Y)/σ_Y)² ] }
= 1/(√(2π) σ_Y √(1 − ρ²)) exp{ −1/(2(1 − ρ²)) [ (y − μ_Y)/σ_Y − ρ (x − μ_X)/σ_X ]² }
= 1/(√(2π) σ_Y √(1 − ρ²)) exp{ −1/(2(1 − ρ²) σ_Y²) [ (y − μ_Y) − ρ (σ_Y/σ_X)(x − μ_X) ]² }.
⇒ Y | X = x is N(μ_Y + ρ(σ_Y/σ_X)(x − μ_X), σ_Y²(1 − ρ²)).
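Both claims can be verified by simulation. The sketch below (not part of the original solution) uses arbitrarily chosen parameter values and approximates the conditional distribution by a thin slice around a fixed x₀:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical parameter values for the check.
mu_x, mu_y, sig_x, sig_y, rho = 1.0, -2.0, 2.0, 0.5, 0.7
cov = [[sig_x**2, rho * sig_x * sig_y],
       [rho * sig_x * sig_y, sig_y**2]]
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=2_000_000)
x, y = xy[:, 0], xy[:, 1]

# (a) marginal of X: mean and standard deviation should be ~mu_x and ~sig_x
print("marginal of X:", x.mean(), x.std())

# (b) conditional of Y given X close to x0 (thin slice around x0)
x0 = 2.0
sel = np.abs(x - x0) < 0.05
print("conditional mean:", y[sel].mean(),
      "theory:", mu_y + rho * (sig_y / sig_x) * (x0 - mu_x))
print("conditional std :", y[sel].std(),
      "theory:", sig_y * np.sqrt(1 - rho**2))
```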
Exercise 5
Suppose a sample X₁, X₂, ..., Xₙ is drawn randomly from a normal distribution N(θ, σ²). Denote the sample mean by X̄ = (1/n) Σ_{i=1}^{n} Xᵢ and let S² = (1/n) Σ_{i=1}^{n} (Xᵢ − X̄)².
(a) Show that Cov(X̄, Xᵢ − X̄) = 0.
(b) Show that Cov(X̄, S²) = 0.
(c) Show that θ̂ = X̄² − σ²/n is an unbiased estimator of θ².
Solution 5
(a) Cov(X̄, Xᵢ − X̄) = Cov(X̄, Xᵢ) − Cov(X̄, X̄) = Cov((1/n) Σ_{j=1}^{n} Xⱼ, Xᵢ) − Var(X̄) = (1/n) Σ_{j=1}^{n} Cov(Xⱼ, Xᵢ) − (1/n²) Σ_{j=1}^{n} Var(Xⱼ) = Var(Xᵢ)/n − nσ²/n² = σ²/n − σ²/n = 0.
(b) Cov(X̄, S²) = Cov(X̄, (1/n) Σ_{i=1}^{n} (Xᵢ − X̄)²) = (1/n) Σ_{i=1}^{n} Cov(X̄, (Xᵢ − X̄)²). By (a), X̄ and Xᵢ − X̄ are uncorrelated; since they are jointly normal, they are independent, so X̄ is also independent of (Xᵢ − X̄)² and each covariance in the sum equals 0. Hence Cov(X̄, S²) = 0.
(c) E(θ̂) = E(X̄² − σ²/n) = E(X̄²) − σ²/n = Var(X̄) + (E X̄)² − σ²/n = σ²/n + θ² − σ²/n = θ².
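A small simulation sketch of the three results (not part of the original solution), with assumed values θ = 2, σ = 1.5 and n = 10:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n, n_rep = 2.0, 1.5, 10, 200_000

samples = rng.normal(theta, sigma, size=(n_rep, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1)       # S^2 with divisor n, as in the solution

# (a) and (b): X-bar is uncorrelated with X_i - X-bar and with S^2
print("Cov(xbar, X_1 - xbar):", np.cov(xbar, samples[:, 0] - xbar)[0, 1])
print("Cov(xbar, S^2)       :", np.cov(xbar, s2)[0, 1])

# (c): X-bar^2 - sigma^2/n is unbiased for theta^2
theta_hat = xbar**2 - sigma**2 / n
print("mean of theta_hat:", theta_hat.mean(), " theta^2:", theta**2)
```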
Exercise 6
Consider a model
y = Xβ + ε, (1)
where y = (y₁, ..., y_N)′ and X = (X₁, ..., X_N)′, with Xᵢ the i-th row of X, i = 1, ..., N. The matrix X is known and is of size N × K.
Assume the error term ε = (ε₁, ..., ε_N)′ follows a normal distribution with E(ε) = 0, Cov(X, ε) = 0, and Var(ε) = σ²I.
(a) Compute the OLS estimator of β, β̂. How would you compute it if X were unknown?
(b) Derive the sampling error β̂ − β.
(c) Show that β̂ is an unbiased estimator of β.
(d) Find the variance of β̂ for given X in the case of homoscedastic errors.
(e) Suppose now that Var(ε) = σ²Ω, where Ω is an N × N matrix that may have non-equal diagonal elements (heteroscedasticity), non-zero off-diagonal elements (serial correlation), or both.
1. Which mathematical result allows you to find a non-singular matrix P such that P′P = Ω⁻¹? Now consider the transformed model
y* = X*β + ε*, (2)
where y* = Py, X* = PX, and ε* = Pε. Assuming P exists, what is the best linear unbiased estimator (BLUE) for the transformed model (2)?
2. Determine the best linear unbiased estimator of β in the original model (1) in terms of X, y and Ω. What is the variance of the BLUE? Compare it to Var(β̂_OLS). What do you conclude about β̂_OLS?
(f) Show that β̂ is BLUE (best linear unbiased estimator), i.e. prove the Gauss–Markov theorem.
Solution 6
(a) In terms of the Euclidean distance, the vector β should make ||ε|| = ||y − Xβ|| as small as possible; note that ||ε||² is just the sum of squared errors. The geometric solution is the β that makes the residual ε perpendicular, or orthogonal, to the columns of X. Recall that two non-zero vectors a and b are orthogonal, written a ⊥ b, if and only if a′b = b′a = 0. We therefore seek β such that X′ε = 0, the sample analogue of the population condition E(X′ε) = 0, which holds since E(ε | X) = E(ε) = 0.
Setting this inner product to zero gives the normal equations
X′ε = X′(y − Xβ) = X′y − X′Xβ = 0,
i.e. X′y = X′Xβ.
If X′X is of full rank, then we can premultiply by (X′X)⁻¹ and obtain the standard OLS estimator of β,
β̂ = (X′X)⁻¹X′y.
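As an illustration (a sketch with made-up data, not part of the exercise), β̂ can be computed directly from the normal equations; the homoscedastic variance formula derived in part (d) below is included for reference:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, sigma = 500, 3, 0.5
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # includes an intercept
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(0.0, sigma, N)

# beta_hat = (X'X)^{-1} X'y, computed by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print("beta_hat:", beta_hat)

# estimated Var(beta_hat | X) = sigma^2 (X'X)^{-1} under homoscedasticity
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - K)
print("Var(beta_hat | X):\n", sigma2_hat * np.linalg.inv(X.T @ X))
```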
(b) The sampling error is
β̂ − β = (X′X)⁻¹X′y − β = (X′X)⁻¹X′(Xβ + ε) − β = (X′X)⁻¹X′X β + (X′X)⁻¹X′ε − β = (X′X)⁻¹X′ε,
since (X′X)⁻¹X′X = I_K.
(c) If we take the expectation of the sampling error we can show that the estimator β̂ of β is unbiased. Denote A = (X′X)⁻¹X′. Note that the quantity A is measurable with respect to X. We obtain
A usual assumption in OLS estimation is that the error term ε has expectation equal to zero, hence
E(β̂ − β | X) = AE(ε | X) = 0.
This shows the unbiasedness of the estimator β̂ under the additional assumption. This assumption is generally not crucial.
(d) Finally, derive the formula for the variance of β̂:
Var(β̂ | X) = Var(β̂ − β | X)        [Var(β | X) = 0 and Cov(β, β̂ | X) = 0]
= Var(Aε | X)                        [A is X-measurable]
= A Var(ε | X) A′
= A E(εε′ | X) A′                    [because E(ε | X) = 0]
= A σ²I_N A′                         [by the assumption of homoscedasticity]
= σ²(X′X)⁻¹X′[(X′X)⁻¹X′]′
= σ²(X′X)⁻¹X′X[(X′X)⁻¹]′
= σ²(X′X)⁻¹.
The properties of β in the first line hold because β is the true value and therefore not a random variable. Homoscedasticity means that all error terms have the same variance and are uncorrelated, hence E(εε′ | X) = σ²I_N. This assumption is clearly more crucial and must be checked in every application.
(e) 1. The spectral decomposition: since Ω is symmetric and positive definite, so is Ω⁻¹, and its spectral decomposition yields a non-singular matrix P with P′P = Ω⁻¹. For the transformed model,
Var(ε*) = Var(Pε) = P Var(ε) P′ = σ² P Ω P′ = σ² P (P′P)⁻¹ P′ = σ² P P⁻¹ (P′)⁻¹ P′ = σ² I,
so the transformed model has homoscedastic, uncorrelated errors and, by the Gauss–Markov theorem, its BLUE is the OLS estimator β̂* = (X*′X*)⁻¹X*′y*.
2. Expressed in terms of the original variables, this is the GLS estimator
β̂_GLS = (X*′X*)⁻¹X*′y* = (X′P′PX)⁻¹X′P′Py = (X′Ω⁻¹X)⁻¹X′Ω⁻¹y.
Its variance is
Var(β̂_GLS) = (X′Ω⁻¹X)⁻¹X′Ω⁻¹ Var(y) Ω⁻¹X(X′Ω⁻¹X)⁻¹ = (X′Ω⁻¹X)⁻¹X′Ω⁻¹ σ²Ω Ω⁻¹X(X′Ω⁻¹X)⁻¹ = σ²(X′Ω⁻¹X)⁻¹.
Then, the usual way to compare the variance of the GLS estimator with the variance of the OLS estimator under heteroscedasticity is to use the Gauss–Markov theorem. The Gauss–Markov theorem states that the OLS estimator is BLUE (best linear unbiased estimator) in the homoscedastic case, i.e. it has the smallest variance among linear unbiased estimators. As the GLS estimator is the OLS estimator of the transformed (homoscedastic) model (2), it is BLUE for the original model (1), and one can write Var(β̂_OLS) ≥ Var(β̂_GLS) in the positive semidefinite sense. In particular, the OLS estimator is not BLUE for heteroscedastic models.
(f) From part (b),
β̂ = β + (X′X)⁻¹X′ε,
thus
β̂ = β + Aε = Ay, with A = (X′X)⁻¹X′,
which makes β̂ a linear function of the disturbances and as such a linear estimator. Let us define a general linear estimator as
β̂₀ = (A + C)y,
where A = (X′X)⁻¹X′ and C is any K × N matrix. We require β̂₀ to be unbiased, i.e. E(β̂₀) = β. In terms of the parameters this means
E(β̂₀) = E[(A + C)y] = (A + C)E(Xβ + ε) = (A + C)Xβ = (X′X)⁻¹X′Xβ + CXβ = (I + CX)β,
so unbiasedness requires CX = 0. The conditional variance of such an estimator is then
Var(β̂₀ | X) = (A + C) Var(ε | X) (A + C)′ = σ²(A + C)(A + C)′ = σ²[AA′ + AC′ + CA′ + CC′] = σ²[(X′X)⁻¹ + CC′],
because AC′ = (X′X)⁻¹X′C′ = (X′X)⁻¹(CX)′ = 0 and likewise CA′ = 0. For any real matrix Π the product ΠΠ′ is positive semidefinite. Thus
Var(β̂₀ | X) = Var(β̂ | X) + σ²CC′ ≥ Var(β̂ | X),
and one concludes that Var(β̂) ≤ Var(β̂₀), i.e. β̂ is BLUE.
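A simulation sketch (hypothetical design, not part of the exercise) comparing OLS and GLS under heteroscedasticity, with σ² = 1 and Ω = diag(x_i²):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_rep = 200, 5_000
x = rng.uniform(1.0, 5.0, N)
X = np.column_stack([np.ones(N), x])
beta_true = np.array([1.0, 2.0])
omega_inv = np.diag(1.0 / x**2)          # Omega = diag(x_i^2), sigma^2 = 1

ols, gls = [], []
for _ in range(n_rep):
    eps = rng.normal(0.0, x)             # sd of eps_i proportional to x_i
    y = X @ beta_true + eps
    ols.append(np.linalg.solve(X.T @ X, X.T @ y))
    gls.append(np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y))

ols, gls = np.array(ols), np.array(gls)
print("simulated Var of OLS slope:", ols[:, 1].var())
print("simulated Var of GLS slope:", gls[:, 1].var())
print("theoretical GLS variance  :", np.linalg.inv(X.T @ omega_inv @ X)[1, 1])
```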
Exercise 7
A random variable X ∼ Γ(α, β) has expected value α/β and variance α/β². Find the method of moments estimators of α and β.
Solution 7
The basic algorithm to follow when computing a method of moments estimator:
(a) If the model has d parameters, compute the first d moments μᵢ = E(Xⁱ) as functions of the parameters: μᵢ = kᵢ(θ₁, ..., θ_d), i = 1, ..., d.
(b) Solve for the parameters as functions of the moments: θⱼ = gⱼ(μ₁, ..., μ_d).
(c) Compute the sample moments based on the data: x̄ᵢ = n⁻¹ Σ_{m=1}^{n} x_m^i, i = 1, ..., d.
(d) Replace the population moments by the sample moments, which gives the method of moments estimators θ̂ⱼ = gⱼ(x̄₁, ..., x̄_d).
Because we have two parameters, the method of moments methodology requires us to determine the first two
moments.
E(X) = α/β and E(X²) = Var(X) + [E(X)]² = α/β² + (α/β)² = (α/β)(1/β + α/β).
Thus we have:
μ₁ = k₁(α, β) = α/β, μ₂ = k₂(α, β) = (α/β)(1/β + α/β).
Now we solve for α and β. Note that:
μ₂ − μ₁² = α/β², μ₁/(μ₂ − μ₁²) = (α/β)/(α/β²) = β, and α = μ₁²/(μ₂ − μ₁²).
So now set
x̄ = (1/n) Σ_{i=1}^{n} xᵢ and x̄₂ = (1/n) Σ_{i=1}^{n} xᵢ²
to obtain the estimators
β̂ = x̄/(x̄₂ − (x̄)²) and α̂ = β̂ x̄ = (x̄)²/(x̄₂ − (x̄)²).
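A short numerical sketch of these estimators (not part of the original solution); the true values α = 3 and β = 2 are assumed, and note that NumPy's gamma sampler is parametrised by shape α and scale 1/β:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_true, beta_true = 3.0, 2.0
x = rng.gamma(shape=alpha_true, scale=1.0 / beta_true, size=100_000)

m1 = x.mean()            # sample analogue of mu_1
m2 = (x**2).mean()       # sample analogue of mu_2

beta_hat = m1 / (m2 - m1**2)
alpha_hat = beta_hat * m1      # = m1^2 / (m2 - m1^2)
print(alpha_hat, beta_hat)     # close to 3.0 and 2.0
```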