L6 - Kalman Filter
KALMAN FILTER
Keith Law
FROM CONTEMPORANEOUS TO TEMPORAL
• The Kalman filter is a well-known linear state-space procedure that assumes multivariate normality for the state variables
KALMAN FILTER
• The Kalman filter is a powerful tool for combining information in the presence of uncertainty
• Denote the state variable as 𝑥𝑡 (which can be a vector of variables), which is unobservable, and the observable variable as 𝑦𝑡 (which can be a vector as well)
• The goal of the Kalman filter is to update knowledge of the state variable recursively when a new data point becomes available. That is, knowing the conditional distribution of 𝑥𝑡 given the information at 𝑡 − 1 (ℱ𝑡−1) and the new data 𝑦𝑡, we want to obtain the conditional distribution of 𝑥𝑡 given ℱ𝑡, where ℱ𝑡 = {𝑦1, …, 𝑦𝑡}
• To derive the Kalman filter, we need to consider the joint conditional distribution of (𝑥𝑡, 𝑣𝑡)′ given ℱ𝑡−1, where 𝑣𝑡 = 𝑦𝑡 − 𝑦𝑡|𝑡−1 is the innovation and 𝑦𝑡|𝑡−1 denotes 𝐸𝑡−1(𝑦𝑡) ≡ 𝐸(𝑦𝑡 | ℱ𝑡−1)
• By the multivariate conditional normality theorem (next page), if the state variable and the observable variables are jointly multivariate normally distributed, then the conditional distribution of (𝑥𝑡, 𝑣𝑡) given ℱ𝑡−1 is normal as well
MULTIVARIATE CONDITIONAL NORMALITY THEOREM
• For a 𝑘-dimensional random vector 𝑊 with mean 𝜇 and covariance matrix Σ, the multivariate normal density is given by:
𝑓(𝑊 | 𝜇, Σ) = (2𝜋)^(−𝑘/2) |Σ|^(−1/2) exp(−½ (𝑊 − 𝜇)′ Σ⁻¹ (𝑊 − 𝜇))
• Multivariate conditional normality theorem: Suppose 𝑊 = (𝑥, 𝑦, 𝑧)′, where 𝑥, 𝑦, and 𝑧 are three random vectors whose joint distribution is multivariate normal. Assume Σ𝑊 is non-singular and Σ𝑦𝑧 = Σ′𝑧𝑦 = 𝟎 (block zero matrix). Then the conditional distribution of 𝑥 given 𝑦 is normal, with
Var(𝑥 | 𝑦) = Σ𝑥|𝑦 = Σ𝑥𝑥 − Σ𝑥𝑦 Σ𝑦𝑦⁻¹ Σ𝑦𝑥
𝐸(𝑥 | 𝑦) = 𝜇𝑥|𝑦 = 𝜇𝑥 + Σ𝑥𝑦 Σ𝑦𝑦⁻¹ (𝑦 − 𝜇𝑦)
and, because 𝑦 and 𝑧 are uncorrelated,
𝐸(𝑥 | 𝑦, 𝑧) = 𝜇𝑥|𝑦,𝑧 = 𝜇𝑥|𝑦 + Σ𝑥𝑧 Σ𝑧𝑧⁻¹ (𝑧 − 𝜇𝑧)
A quick numerical check of these formulas is sketched below.
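As a minimal sketch of the conditional-moment formulas, here the joint Gaussian is partitioned as 𝑊 = (𝑥, 𝑦)′ with 𝑥 two-dimensional and 𝑦 scalar; all parameter values are made up for illustration:

```python
import numpy as np

# Assumed partitioned moments of a joint Gaussian (x, y): illustrative values only
mu_x, mu_y = np.array([0.0, 1.0]), np.array([2.0])
S_xx = np.array([[2.0, 0.3],
                 [0.3, 1.0]])
S_xy = np.array([[0.5],
                 [0.2]])
S_yy = np.array([[1.5]])

y_obs = np.array([2.8])  # an observed value of y

# E(x|y) = mu_x + S_xy S_yy^{-1} (y - mu_y)
mu_x_given_y = mu_x + S_xy @ np.linalg.solve(S_yy, y_obs - mu_y)
# Var(x|y) = S_xx - S_xy S_yy^{-1} S_yx
S_x_given_y = S_xx - S_xy @ np.linalg.solve(S_yy, S_xy.T)
print(mu_x_given_y)   # conditional mean of x
print(S_x_given_y)    # conditional covariance of x
```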
LINEAR STATE-SPACE MODEL
For 𝑡 = 1, …, 𝑇:
𝑥𝑡+1 = 𝑏𝑡 + 𝐴𝑡 𝑥𝑡 + 𝜖𝑡+1 (transition equation)
𝑦𝑡 = 𝑑𝑡 + 𝐶𝑡 𝑥𝑡 + 𝜀𝑡 (measurement equation)
where 𝑥1|0 ~ 𝑁(𝜇0, Σ0), 𝜖𝑡 ~ 𝑁(0, 𝑄𝑡), 𝜀𝑡 ~ 𝑁(0, 𝑅𝑡)
• 𝑥𝑡 is the state variable (vector); the transition equation is a first-order Markov process with transition matrix 𝐴𝑡
• 𝑦𝑡 is observable; the measurement (or observation) equation relates the observation to the state vector 𝑥𝑡 through the measurement matrix 𝐶𝑡
• 𝑏𝑡 (transition offsets) and 𝑑𝑡 (observation offsets) are deterministic (can be zero vectors)
• 𝜖𝑡 (state errors) and 𝜀𝑡 (measurement errors) are Gaussian white noise with state (transition) covariance 𝑄𝑡
and observation covariance 𝑅𝑡 , both positive-definite
• Initial state 𝑥1|0 ~ 𝑁(𝜇0, Σ0); 𝜇0 and Σ0 are given, and 𝑥1|0 is independent of 𝜖𝑡 and 𝜀𝑡
• 𝐴𝑡, 𝐶𝑡, 𝑄𝑡, 𝑅𝑡 are known system matrices. They are often sparse and can be functions of some parameters 𝜃, which can be estimated by maximum likelihood. A simulation sketch of this model follows below
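To make the model concrete, here is a minimal simulation sketch of the transition and measurement equations; the dimensions, parameter values, and time-invariant system matrices are illustrative assumptions, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, m = 200, 2, 1                      # time steps, state dim, observation dim

A = np.eye(n)                            # transition matrix A_t (time-invariant here)
C = np.array([[1.0, 0.5]])               # measurement matrix C_t
b, d = np.zeros(n), np.zeros(m)          # transition and observation offsets
Q = 0.01 * np.eye(n)                     # state (transition) covariance
R = 0.10 * np.eye(m)                     # observation covariance
mu0, Sigma0 = np.zeros(n), np.eye(n)     # moments of the initial state

x = np.empty((T, n)); y = np.empty((T, m))
x_t = rng.multivariate_normal(mu0, Sigma0)                         # x_{1|0} ~ N(mu0, Sigma0)
for t in range(T):
    x[t] = x_t
    y[t] = d + C @ x_t + rng.multivariate_normal(np.zeros(m), R)   # measurement eq.
    x_t = b + A @ x_t + rng.multivariate_normal(np.zeros(n), Q)    # transition eq.
```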
A DEEPER LOOK
For 𝑡 = 1, …, 𝑇:
𝑥𝑡+1 = 𝑏𝑡 + 𝐴𝑡 𝑥𝑡 + 𝜖𝑡+1 (transition equation)
𝑦𝑡 = 𝑑𝑡 + 𝐶𝑡 𝑥𝑡 + 𝜀𝑡 (measurement equation)
where 𝑥1|0 ~ 𝑁(𝜇0, Σ0), 𝜖𝑡 ~ 𝑁(0, 𝑄𝑡), 𝜀𝑡 ~ 𝑁(0, 𝑅𝑡)
• If 𝐴𝑡 = 𝐼, then the regression coefficients (state equations) follow independent random walks; that is, each factor's impact on the observations is serially correlated, but its evolution is independent of that of the other factors
• If 𝑄 = 𝟎, the state innovations vanish, so the states are deterministic (constant when 𝐴𝑡 = 𝐼 and 𝑏𝑡 = 0) and the state-space model reduces to a simple OLS regression
• In this dynamic setting, an observable process 𝑦𝑡 is driven by a latent (unobservable) process 𝑥𝑡 up to (independent) Gaussian errors. The latent process is autoregressive of order 1; that is, it depends only on its own previous realization
• This is a Bayesian concept: we estimate (or update) the (multi-dimensional) state 𝑥𝑡 as time-varying as we observe 𝑦𝑡 over time
LINEAR STATE-SPACE MODELS - EXAMPLES
Consider, for example, a market-model regression with time-varying coefficients, 𝑦𝑡 = 𝛼𝑡 + 𝛽𝑡 𝑟𝑀,𝑡 + 𝜀𝑡, where the intercept and the slope each follow a random walk. We can cast the problem into linear state-space format as:
𝑥𝑡 = (𝛼𝑡, 𝛽𝑡)′, 𝐴𝑡 = [1 0; 0 1], 𝑏𝑡 = 𝑑𝑡 = 0, 𝐶𝑡 = (1, 𝑟𝑀,𝑡), 𝑅𝑡 = 𝜎²𝜀, 𝑸𝒕 = [𝜎²𝜂 0; 0 𝜎²𝜖]
A helper that builds these matrices is sketched below.
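A sketch of such a helper; the function name and argument names are mine, and the system is assumed time-invariant apart from 𝐶𝑡:

```python
import numpy as np

def tv_beta_system(r_mkt, sig2_meas, sig2_alpha, sig2_beta):
    """Build system matrices for y_t = alpha_t + beta_t * r_{M,t} + eps_t,
    where (alpha_t, beta_t)' follow independent random walks.
    sig2_meas ~ sigma^2_eps; sig2_alpha, sig2_beta ~ the two diagonal entries of Q."""
    A = np.eye(2)                                # A_t = I_2 (random-walk states)
    Q = np.diag([sig2_alpha, sig2_beta])         # independent state innovations
    R = np.array([[sig2_meas]])                  # measurement noise variance
    # One 1x2 measurement matrix C_t = (1, r_{M,t}) for each observation
    C_seq = np.stack([np.array([[1.0, r]]) for r in r_mkt])
    return A, C_seq, Q, R
```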
KALMAN FILTER RECURSION
For 𝑡 = 1, …, 𝑇,
𝑥𝑡+1|𝑡 = 𝐸𝑡(𝑏𝑡 + 𝐴𝑡 𝑥𝑡 + 𝜖𝑡+1) = 𝑏𝑡 + 𝐴𝑡 𝑥𝑡|𝑡 (1)
Σ𝑡+1|𝑡 = Var𝑡(𝐴𝑡 𝑥𝑡 + 𝜖𝑡+1) = 𝐴𝑡 Σ𝑡|𝑡 𝐴′𝑡 + 𝑄𝑡 (2)
𝑦𝑡 = 𝑑𝑡 + 𝐶𝑡 𝑥𝑡 + 𝜀𝑡, 𝑦𝑡|𝑡−1 = 𝑑𝑡 + 𝐶𝑡 𝑥𝑡|𝑡−1, so 𝑣𝑡 = 𝑦𝑡 − 𝑦𝑡|𝑡−1 = 𝐶𝑡(𝑥𝑡 − 𝑥𝑡|𝑡−1) + 𝜀𝑡 and 𝑉𝑡 ≡ Var(𝑣𝑡) = 𝐶𝑡 Σ𝑡|𝑡−1 𝐶′𝑡 + 𝑅𝑡
By the conditional normality theorem, the update step is
𝑥𝑡|𝑡 = 𝑥𝑡|𝑡−1 + cov(𝑥𝑡, 𝑣𝑡) 𝑉𝑡⁻¹ 𝑣𝑡 = 𝑥𝑡|𝑡−1 + Σ𝑡|𝑡−1 𝐶′𝑡 𝑉𝑡⁻¹ 𝑣𝑡
Σ𝑡|𝑡 = Var(𝑥𝑡 | ℱ𝑡) = Var(𝑥𝑡 | ℱ𝑡−1, 𝑣𝑡) = Σ𝑡|𝑡−1 − cov(𝑥𝑡, 𝑣𝑡) 𝑉𝑡⁻¹ cov(𝑥𝑡, 𝑣𝑡)′ = Σ𝑡|𝑡−1 − Σ𝑡|𝑡−1 𝐶′𝑡 𝑉𝑡⁻¹ 𝐶𝑡 Σ𝑡|𝑡−1
(2) can then be written as:
Σ𝑡+1|𝑡 = 𝐴𝑡 Σ𝑡|𝑡−1 𝐴′𝑡 − 𝐴𝑡 Σ𝑡|𝑡−1 𝐶′𝑡 𝑉𝑡⁻¹ 𝐶𝑡 Σ𝑡|𝑡−1 𝐴′𝑡 + 𝑄𝑡 = 𝐴𝑡 Σ𝑡|𝑡−1 𝐿′𝑡 + 𝑄𝑡, where 𝐾𝑡 = 𝐴𝑡 Σ𝑡|𝑡−1 𝐶′𝑡 𝑉𝑡⁻¹ is the Kalman gain and 𝐿𝑡 = 𝐴𝑡 − 𝐾𝑡 𝐶𝑡. A code sketch of this recursion follows below.
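A minimal sketch of the forward recursion, assuming time-invariant system matrices and zero offsets (𝑏𝑡 = 𝑑𝑡 = 0) for brevity:

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, mu0, Sigma0):
    """Forward pass: returns predictions x_{t|t-1}, Sigma_{t|t-1},
    filtered estimates x_{t|t}, Sigma_{t|t}, innovations v_t and V_t."""
    T, m = y.shape
    n = len(mu0)
    x_pred = np.empty((T, n)); P_pred = np.empty((T, n, n))
    x_filt = np.empty((T, n)); P_filt = np.empty((T, n, n))
    v = np.empty((T, m)); V = np.empty((T, m, m))
    x, P = mu0, Sigma0                               # x_{1|0}, Sigma_{1|0}
    for t in range(T):
        x_pred[t], P_pred[t] = x, P
        v[t] = y[t] - C @ x                          # innovation v_t = y_t - y_{t|t-1}
        V[t] = C @ P @ C.T + R                       # V_t = C Sigma_{t|t-1} C' + R
        G = P @ C.T @ np.linalg.inv(V[t])            # Sigma_{t|t-1} C' V_t^{-1}
        x_filt[t] = x + G @ v[t]                     # update: x_{t|t}
        P_filt[t] = P - G @ C @ P                    # update: Sigma_{t|t}
        x = A @ x_filt[t]                            # eq. (1): x_{t+1|t}, b_t = 0
        P = A @ P_filt[t] @ A.T + Q                  # eq. (2): Sigma_{t+1|t}
    return x_pred, P_pred, x_filt, P_filt, v, V
```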
STATE ESTIMATION ERROR
Define the state estimation error 𝑒𝑡 = 𝑥𝑡 − 𝑥𝑡|𝑡−1, so that 𝑣𝑡 = 𝐶𝑡 𝑒𝑡 + 𝜀𝑡 and 𝑒𝑡+1 = 𝐿𝑡 𝑒𝑡 + 𝜖𝑡+1 − 𝐾𝑡 𝜀𝑡. For the smoothed state,
𝑥𝑡|𝑇 = 𝐸(𝑥𝑡 | ℱ𝑇) = 𝐸(𝑥𝑡 | ℱ𝑡−1, 𝑣𝑡, …, 𝑣𝑇) = 𝐸(𝑥𝑡 | ℱ𝑡−1) + cov(𝑥𝑡, (𝑣′𝑡, …, 𝑣′𝑇)′) [Var((𝑣′𝑡, …, 𝑣′𝑇)′)]⁻¹ (𝑣′𝑡, …, 𝑣′𝑇)′
Because the innovations are mutually uncorrelated, Var((𝑣′𝑡, …, 𝑣′𝑇)′) is the block-diagonal matrix diag(𝑉𝑡, 𝑉𝑡+1, …, 𝑉𝑇), so
𝑥𝑡|𝑇 = 𝑥𝑡|𝑡−1 + ∑ᵀᵢ₌ₜ cov(𝑥𝑡, 𝑣𝑖) 𝑉𝑖⁻¹ 𝑣𝑖
The required cross-moments follow from the error recursion:
𝐸(𝑥𝑡 𝑒′𝑡) = 𝐸(𝑥𝑡 (𝑥𝑡 − 𝑥𝑡|𝑡−1)′) = Σ𝑡|𝑡−1
𝐸(𝑥𝑡 𝑒′𝑡+1) = 𝐸(𝑥𝑡 (𝐿𝑡 𝑒𝑡 + 𝜖𝑡+1 − 𝐾𝑡 𝜀𝑡)′) = Σ𝑡|𝑡−1 𝐿′𝑡
𝐸(𝑥𝑡 𝑒′𝑡+2) = Σ𝑡|𝑡−1 𝐿′𝑡 𝐿′𝑡+1
⋮
𝐸(𝑥𝑡 𝑒′𝑇) = Σ𝑡|𝑡−1 𝐿′𝑡 ⋯ 𝐿′𝑇−1
Since cov(𝑥𝑡, 𝑣𝑖) = 𝐸(𝑥𝑡 𝑒′𝑖) 𝐶′𝑖, plugging in gives
𝑥𝑡|𝑇 = 𝑥𝑡|𝑡−1 + ∑ᵀᵢ₌ₜ cov(𝑥𝑡, 𝑣𝑖) 𝑉𝑖⁻¹ 𝑣𝑖 = 𝑥𝑡|𝑡−1 + ∑ᵀᵢ₌ₜ 𝐸(𝑥𝑡 𝑒′𝑖) 𝐶′𝑖 𝑉𝑖⁻¹ 𝑣𝑖
Let us define 𝑞𝑇 = 0 and 𝑞𝑡−1 = 𝐶′𝑡 𝑉𝑡⁻¹ 𝑣𝑡 + 𝐿′𝑡 𝑞𝑡; then we can simplify the recursive formula, for 𝑡 = 𝑇, …, 1, as
𝑥𝑡|𝑇 = 𝑥𝑡|𝑡−1 + Σ𝑡|𝑡−1 𝑞𝑡−1
Similarly, the smoothed covariance is
Σ𝑡|𝑇 = Σ𝑡|𝑡−1 − Σ𝑡|𝑡−1 𝐶′𝑡 𝑉𝑡⁻¹ 𝐶𝑡 Σ𝑡|𝑡−1 − Σ𝑡|𝑡−1 𝐿′𝑡 𝐶′𝑡+1 𝑉𝑡+1⁻¹ 𝐶𝑡+1 𝐿𝑡 Σ𝑡|𝑡−1 − ⋯ − Σ𝑡|𝑡−1 𝐿′𝑡 ⋯ 𝐿′𝑇−1 𝐶′𝑇 𝑉𝑇⁻¹ 𝐶𝑇 𝐿𝑇−1 ⋯ 𝐿𝑡 Σ𝑡|𝑡−1
The backward recursion for 𝑥𝑡|𝑇 is sketched in code below.
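The 𝑞𝑡 recursion translates directly into code; this sketch reuses the outputs of the kalman_filter sketch above and the same time-invariant, zero-offset assumptions:

```python
import numpy as np

def kalman_smoother(A, C, V, v, x_pred, P_pred):
    """Backward pass: x_{t|T} = x_{t|t-1} + Sigma_{t|t-1} q_{t-1},
    where q_T = 0 and q_{t-1} = C' V_t^{-1} v_t + L_t' q_t."""
    T, n = x_pred.shape
    x_smooth = np.empty((T, n))
    q = np.zeros(n)                                  # q_T = 0
    for t in range(T - 1, -1, -1):
        V_inv = np.linalg.inv(V[t])
        K = A @ P_pred[t] @ C.T @ V_inv              # Kalman gain K_t
        L = A - K @ C                                # L_t = A_t - K_t C_t
        q = C.T @ V_inv @ v[t] + L.T @ q             # q_{t-1}
        x_smooth[t] = x_pred[t] + P_pred[t] @ q      # x_{t|T}
    return x_smooth
```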
So far, we have been dealing with the state variable and have defined filtering as 𝐸(𝑥𝑡 | ℱ𝑡) ≡ 𝑥𝑡|𝑡 (with one-step prediction 𝐸(𝑥𝑡 | ℱ𝑡−1) ≡ 𝑥𝑡|𝑡−1)
And smoothing as 𝐸(𝑥𝑡 | ℱ𝑇) ≡ 𝑥𝑡|𝑇
But the ultimate goal is to forecast the observable variable via the state variable, i.e. 𝐸(𝑦𝑡+1 | ℱ𝑡) ≡ 𝑦𝑡+1|𝑡
Suppose the forecast origin is 𝑡 and we are interested in predicting 𝑦𝑡+𝑗 for 𝑗 = 1, …, ℎ, where ℎ > 0
𝑦𝑡+1|𝑡 = 𝑑𝑡+1 + 𝐶𝑡+1 𝑥𝑡+1|𝑡
The forecast error is 𝜉𝑡+1 = 𝑦𝑡+1 − 𝑦𝑡+1|𝑡 = 𝐶𝑡+1 𝑒𝑡+1 + 𝜀𝑡+1, and Var(𝜉𝑡+1) = 𝐶𝑡+1 Σ𝑡+1|𝑡 𝐶′𝑡+1 + 𝑅𝑡+1
This is precisely the innovation covariance matrix 𝑉𝑡+1 of the Kalman filter. A one-step forecast sketch follows below.
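A sketch of the one-step forecast from the last filtered state, continuing the zero-offset, time-invariant assumptions of the earlier sketches:

```python
import numpy as np

def forecast_one_step(A, C, Q, R, x_filt_last, P_filt_last):
    """Compute y_{T+1|T} and Var(xi_{T+1}) from x_{T|T}, Sigma_{T|T}."""
    x_next = A @ x_filt_last                 # x_{T+1|T} (eq. 1, b_t = 0)
    P_next = A @ P_filt_last @ A.T + Q       # Sigma_{T+1|T} (eq. 2)
    y_next = C @ x_next                      # y_{T+1|T} (d_t = 0)
    xi_var = C @ P_next @ C.T + R            # Var(xi_{T+1}) = C Sigma C' + R = V_{T+1}
    return y_next, xi_var
```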
INITIALIZATION
Suppose we use 5 PCs from the factors of HW3 and run the NKY returns on the factors with dynamic factor loadings. The left chart shows the filtered states; the right chart shows the smoothed states.