Stationary Random Series
Question

(i) Explain what is meant by a strictly stationary process.

(ii) Explain why a weakly stationary process has constant variance.

Solution
(i) A process $X$ is strictly stationary if the joint distributions of $X_{t_1}, X_{t_2}, \ldots, X_{t_n}$ and $X_{k+t_1}, X_{k+t_2}, \ldots, X_{k+t_n}$ are identical for all $t_1, t_2, \ldots, t_n$ and $k+t_1, k+t_2, \ldots, k+t_n$ in $J$ and all integers $n$. This means that the statistical properties of the process remain unchanged as time elapses.
(ii) A weakly stationary process has constant variance since, for such a process, $\text{var}(X_t) = \text{cov}(X_t, X_t)$ is independent of $t$.
In the study of time series it is a convention that the word ‘stationary’ on its own is a
shorthand notation for ‘weakly stationary’, though in the case of a multivariate normal
process, strict and weak stationarity are equivalent.
But we do need to be careful in our definition, as there are some processes which we wish
to exclude from consideration but which satisfy the definition of weak stationarity.
Question

Let $\{Y_t\}$ be a sequence of independent standard normal random variables. Determine which of the following processes are stationary time series (given the definition above).

(i) $X_t = \sin(\omega t + U)$, where $U \sim U[0, 2\pi]$

(iii) $X_t = X_{t-1} + Y_t$

(iv) $X_t = Y_{t-1} + Y_t$

(v) $X_t = 3t + Y_t$
Solution
(i) For this process, $X_0 = \sin U$, $X_1 = \sin(\omega + U)$, $X_2 = \sin(2\omega + U)$, .... Given the value of $X_0$, future values of the process are fully determined. So this is not purely indeterministic, and is not therefore a stationary time series in the sense defined in the Core Reading.
(iii) $E(X_t) = E(X_{t-1} + Y_t) = E(X_{t-1}) + E(Y_t) = E(X_{t-1})$

Here we are using the fact that $Y_t$ is a sequence of independent standard normal random variables, so $E(Y_t) = 0$. However, $\text{var}(X_t) = \text{var}(X_{t-1}) + \text{var}(Y_t) = \text{var}(X_{t-1}) + 1$, which increases with $t$. Since the variance is not constant, the process is not stationary.
(iv) $E(X_t) = E(Y_{t-1}) + E(Y_t) = 0$

and:

$$\gamma_k = \text{cov}(X_t, X_{t+k}) = \begin{cases} 2 & k = 0 \\ 1 & |k| = 1 \\ 0 & |k| \ge 2 \end{cases}$$

For example:

$$\gamma_0 = \text{var}(X_t) = \text{var}(Y_{t-1}) + \text{var}(Y_t) = 2$$

and $\gamma_1 = \text{cov}(Y_{t-1} + Y_t, Y_t + Y_{t+1}) = \text{var}(Y_t) = 1$, since all the other cross-covariances vanish by independence. So the mean is constant and the autocovariance depends only on the lag $k$, and this process is weakly stationary.
The process is also purely indeterministic. For example:

$$X_1 = Y_0 + Y_1$$

So $Y_1 = X_1 - Y_0$ and:

$$X_2 = Y_1 + Y_2 = (X_1 - Y_0) + Y_2$$

Rearranging gives:

$$Y_2 = X_2 - X_1 + Y_0$$

and hence:

$$X_3 = Y_2 + Y_3 = (X_2 - X_1 + Y_0) + Y_3$$
From this formula, we see that knowledge of the values of $X_1$, $X_2$ and $X_3$, say, becomes progressively less useful in predicting the value of $X_n$ as $n \to \infty$.
(v) This process has a deterministic trend via the ‘3t’ term, ie its mean varies over time. So it
is not stationary.
A particular form of notation is used for time series: $X$ is said to be $I(0)$ (read 'integrated of order 0') if it is a stationary time series process; $X$ is $I(1)$ if $X$ itself is not stationary but the increments $Y_t = X_t - X_{t-1}$ form a stationary process; $X$ is $I(2)$ if it is non-stationary but the process $Y$ is $I(1)$, and so on.
We will see plenty of examples of integrated processes when we study the ARIMA class of
processes in Section 3.8.
The theory of stationary random processes plays an important role in the theory of time
series because the calibration of time series models (that is, estimation of the values of the
model’s parameters using historical data) can be performed efficiently only in the case of
stationary random processes. A non-stationary random process has to be transformed into
a stationary one before the calibration can be performed. (See Chapter 14.)
Question
Suppose that we have a sample set of data that looks to be a realisation of an integrated process
of order 2. Explain what we can do to the data set in order to model it.
Solution
We can difference the data twice, ie look at the increments of the increments.
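As a minimal sketch of this in code (the simulated data here is illustrative, not from the Core Reading), we can build an $I(2)$ series by summing white noise twice and recover a stationary series by differencing twice:

```python
import numpy as np

rng = np.random.default_rng(42)
e = rng.standard_normal(1000)      # white noise, variance 1

# An I(2) sample path: white noise integrated (cumulatively summed) twice
x = np.cumsum(np.cumsum(e))

# Differencing twice recovers a stationary series -- in this construction,
# exactly the original white noise (apart from the first two terms)
y = np.diff(x, n=2)

print(np.allclose(y, e[2:]))       # True
print(round(np.var(y), 2))         # close to 1
```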
Autocovariance function
For a stationary process, the autocovariance function is defined by:

$$\gamma_k = \text{cov}(X_t, X_{t+k}) = E(X_t X_{t+k}) - E(X_t)E(X_{t+k})$$

In particular, $\gamma_0 = \text{var}(X_t)$.
If a process is not stationary, then the autocovariance function depends on two variables, namely the time $t$ and the lag $k$. This could be denoted, for example, $\gamma_{t,k} = \text{cov}(X_t, X_{t+k})$. However,
one of the main uses of the autocovariance function is to determine the type of process that will
be used to model a given set of data. Since this will be done only for stationary series, as
mentioned above, it is the autocovariance function for stationary series that is most important.
Because of the importance of the autocovariance function, we will have to calculate it for various
processes. This naturally involves calculating covariances and so we need to be familiar with all of
the properties of the covariance of two random variables. The following question is included as a
revision exercise.
Question

(i) Write down a formula for $\text{cov}(X, Y)$ in terms of expected values.

(ii) Simplify each of the following, where $X$, $Y$ and $W$ are random variables and $c$ is a constant:

(a) $\text{cov}(Y, X)$

(b) $\text{cov}(X, c)$

(c) $\text{cov}(X + Y, W)$

(iii) Simplify each of the following expressions assuming that $\{X_t\}$ denotes a stationary time series defined at integer times and the $Z_t$ are independent $N(0, \sigma^2)$ random variables:

(a) $\text{cov}(Z_2, Z_3)$

(b) $\text{cov}(Z_3, Z_3)$

(c) $\text{cov}(X_2, Z_3)$

(d) $\text{cov}(X_2, X_3)$

(e) $\text{cov}(X_2, X_2)$
Solution

(i) $\text{cov}(X, Y) = E(XY) - E(X)E(Y)$

(ii) (a) $\text{cov}(Y, X) = \text{cov}(X, Y)$

(b) $\text{cov}(X, c) = 0$

(c) $\text{cov}(X + Y, W) = \text{cov}(X, W) + \text{cov}(Y, W)$

(iii) (a) $\text{cov}(Z_2, Z_3) = 0$, since $Z_2$ and $Z_3$ are independent.

(b) $\text{cov}(Z_3, Z_3) = \text{var}(Z_3) = \sigma^2$

(c) $\text{cov}(X_2, Z_3) = 0$

(d) and (e) will depend on the actual process. Since it is stationary, $\text{cov}(X_2, X_3) = \gamma_1$ and $\text{cov}(X_2, X_2) = \gamma_0$.
Autocorrelation function
The autocorrelation function (ACF) of a stationary process is defined by:

$$\rho_k = \text{corr}(X_t, X_{t+k}) = \frac{\gamma_k}{\gamma_0}$$

For a purely indeterministic process, $\rho_k \to 0$ as $k \to \infty$. This statement is intuitive: we do not expect two values of a (purely indeterministic) time series to be correlated if they are a long way apart.
Question
Write down the formula for the correlation coefficient between the random variables X and Y .
Hence deduce the formula for the autocorrelation function given above.
Solution

$$\text{corr}(X, Y) = \frac{\text{cov}(X, Y)}{\sqrt{\text{var}(X)\,\text{var}(Y)}}$$

So:

$$\rho_k = \frac{\text{cov}(X_t, X_{t+k})}{\sqrt{\text{var}(X_t)\,\text{var}(X_{t+k})}} = \frac{\gamma_k}{\sqrt{\gamma_0 \gamma_0}} = \frac{\gamma_k}{\gamma_0}$$

For a process that is not stationary, the autocorrelation function depends on both $t$ and $k$:

$$\rho_{t,k} = \frac{\text{cov}(X_t, X_{t+k})}{\sqrt{\text{var}(X_t)\,\text{var}(X_{t+k})}} = \frac{\gamma_{t,k}}{\sqrt{\gamma_{t,0}\,\gamma_{t+k,0}}}$$
However, as with the autocovariance function, it is the stationary case that is of most use in
practice.
A simple class of weakly stationary random processes is the class of white noise processes. A random process $\{e_t : t \in \mathbb{Z}\}$ is a white noise if $E(e_t) = 0$ for any $t$, and:

$$\gamma_k = \text{cov}(e_t, e_{t+k}) = \begin{cases} \sigma^2 & \text{if } k = 0 \\ 0 & \text{otherwise} \end{cases}$$
Strictly speaking a white noise process only has to be a sequence of uncorrelated random
variables, ie not necessarily a sequence of independent random variables. We can also have white
noise processes without zero mean.
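As a quick illustrative check (a sketch, not part of the Core Reading), we can simulate a white noise sequence and verify that its sample autocovariance is close to $\sigma^2$ at lag 0 and close to zero elsewhere:

```python
import numpy as np

rng = np.random.default_rng(1)
e = rng.normal(0.0, 2.0, size=10_000)   # white noise with sigma^2 = 4

def sample_autocov(x, k):
    """Sample autocovariance at lag k >= 0."""
    xc = x - x.mean()
    return np.mean(xc[: len(xc) - k] * xc[k:])

print(round(sample_autocov(e, 0), 2))   # close to 4
print(round(sample_autocov(e, 1), 2))   # close to 0
print(round(sample_autocov(e, 5), 2))   # close to 0
```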
Result 13.1
The autocovariance function and autocorrelation function of a stationary random process are even functions of $k$, that is, $\gamma_{-k} = \gamma_k$ and $\rho_{-k} = \rho_k$.
Proof
Since the autocovariance function $\gamma_k = \text{cov}(X_t, X_{t+k})$ does not depend on $t$, we have:

$$\gamma_{-k} = \text{cov}(X_t, X_{t-k}) = \text{cov}(X_{t-k}, X_t) = \gamma_k$$

where the final step follows by replacing $t$ with $t + k$. Dividing by $\gamma_0$ then gives $\rho_{-k} = \rho_k$.
This result allows us to concentrate on positive lags when finding the autocorrelation functions of
stationary processes.
2.4 Correlograms
The autocorrelation function is the most commonly used statistic in time series analysis. A lot of
information about a time series can be deduced from a plot of the sample autocorrelation
function (as a function of the lag). Such a plot is called a correlogram.
[Correlogram: sample autocorrelation function plotted against lags 1 to 10, with vertical axis from -1 to 1]
At lag 0 the autocorrelation function takes the value 1, since $\rho_0 = \gamma_0 / \gamma_0 = 1$. Often the function starts out at 1 but decays fairly quickly, which is indicative of the time series being stationary. The
above correlation function tells us that at lags 0, 1 and 2 there is some positive correlation so that
a value on one side of the mean will tend to have a couple of values following that are on the
same side of the mean. However, beyond lag 2 there is little correlation.
In fact, the above function comes from a sample path of a stationary AR(1) process, namely $X_n = 0.5X_{n-1} + e_n$. (We look in more detail at such processes in the next section.)
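As an illustrative sketch (assuming numpy; the seed and sample size are arbitrary), we can simulate this AR(1) process and compute its sample ACF, which should decay roughly like $0.5^k$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
e = rng.standard_normal(n)
x = np.zeros(n)

# Simulate X_n = 0.5 X_{n-1} + e_n
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + e[t]

# Sample autocorrelations at lags 0 to 5
xc = x - x.mean()
acf = [np.sum(xc[: n - k] * xc[k:]) / np.sum(xc * xc) for k in range(6)]
print([round(r, 2) for r in acf])   # roughly 1, 0.5, 0.25, 0.13, ...
```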
The data used for the first 50 values is plotted below. (The actual data used to produce the
autocorrelation function used the first 1,000 values.)
[Plot: the first 50 values of the series, shown as unconnected points against time]
The ‘gap’ in the axes here is deliberate; the vertical axis does not start at zero. The horizontal axis
on this and the next graph shows time, and the vertical axis shows the value of the time series X .
This form of presentation is difficult to interpret. It’s easier to see if we ‘join the dots’.
[Plot: the same 50 values with consecutive points joined by lines]
By inspection of this graph we can indeed see that one value tends to be followed by another
similar value. This is also true at lag 2, though slightly less clear. Once the lag is 3 or more, there
is little correlation.
Alternating series
[Plot: the first 50 values of an alternating series]
The average of this data is obviously roughly in the middle of the extreme values. Given a
particular value, the following one tends to be on the other side of the mean. The series is
alternating. This is reflected in the autocorrelation function shown below. At lag 1 there is a
negative correlation. Conversely, at lag 2, the two points will generally be on the same side of the
mean and therefore will have positive correlation, and so on. The autocorrelation therefore also
alternates as shown.
[Correlogram: sample ACF of the alternating series over lags 1 to 10; the values alternate in sign and decay in magnitude]
The data in this case actually came from a stationary autoregressive process, this time $X_n = -0.85X_{n-1} + e_n$. This is stationary, but because the coefficient of $X_{n-1}$ is larger in magnitude, ie 0.85 vs 0.5, the decay of the autocorrelation function is slower. This is because the $X_{n-1}$ term is not swamped by the random factors $e_n$ as quickly. It is the fact that the coefficient is negative that makes the series alternate.
[Plot: the first 50 values of a series with a pronounced upward trend]
In this time series, a strong trend is clearly visible. The effect of this is that any given value is
followed, in general, by terms that are greater. This gives positive correlation at all lags. The
decay of the autocorrelation function will be very slow, if it occurs at all.
[Correlogram: sample ACF of the trending series over lags 1 to 10; the values remain close to 1 at every lag]
If the trend is weaker, for example $X_n = 0.001n + 0.5X_{n-1} + e_n$, then there may be some decay at
first as the trend is swamped by the other factors, but there will still be some residual correlation
at larger lags.
[Plot: the first 50 values of the weakly trending series]
The trend is difficult to see from this small sample of the data but shows up in the autocorrelation
function as the residual correlation at higher lags.
[Correlogram: sample ACF of the weakly trending series over lags 1 to 10; the values decay at first but level off at a small positive value at higher lags]
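A sketch of this effect (same illustrative setup as the earlier AR(1) simulation): adding the weak trend leaves the sample ACF with a persistent positive component at high lags:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
e = rng.standard_normal(n)
x = np.zeros(n)

# Simulate X_n = 0.001n + 0.5 X_{n-1} + e_n
for t in range(1, n):
    x[t] = 0.001 * t + 0.5 * x[t - 1] + e[t]

xc = x - x.mean()
acf = [np.sum(xc[: n - k] * xc[k:]) / np.sum(xc * xc) for k in (1, 5, 20, 50)]
print([round(r, 2) for r in acf])   # decays at first, then levels off above 0
```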
Question
Describe the associations we would expect to find in a time series representing the average
daytime temperature in successive months in a particular town, and hence sketch a diagram of
the autocorrelation function of this series.
Solution
We expect the temperature in different years to be roughly the same at the same time of year,
and hence there should be very strong positive correlation at lags of 12 months, 24 months and
so on.
Within each year we would also expect a positive correlation between nearby times, for example
with lags of 1 or 2 months, with decreasing correlation as the lag increases. On the other hand,
once we reach a lag of 6 months there should be strong negative correlation, since one temperature will be above the mean and the other below it, for example when comparing June with December.
[Sketch: ACF oscillating with a 12-month period over lags up to 25, with peaks near +1 at lags 12 and 24 and troughs near -1 at lags 6 and 18]
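As an illustrative sketch (a toy seasonal model with made-up parameters, not real temperature data), a sinusoidal mean plus noise reproduces this pattern in the sample ACF:

```python
import numpy as np

rng = np.random.default_rng(0)
months = np.arange(240)   # 20 years of monthly observations

# Toy model: seasonal mean cycle of amplitude 10 around 15, plus noise
temp = 15 + 10 * np.cos(2 * np.pi * months / 12) + rng.normal(0, 2, size=240)

xc = temp - temp.mean()
acf = [np.sum(xc[: len(xc) - k] * xc[k:]) / np.sum(xc * xc) for k in (1, 6, 12, 24)]
print([round(r, 2) for r in acf])   # positive at lag 1, strongly negative at
                                    # lag 6, strongly positive at lags 12 and 24
```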
Partial autocorrelation function

Unlike the autocovariance and autocorrelation functions, the partial autocorrelation function (PACF) is defined for positive lags only. The partial autocorrelation $\psi_k$ at lag $k$ is the coefficient $\beta_{k,k}$ in the linear combination of $X_{t-1}, \ldots, X_{t-k}$ that minimises the mean square error:

$$E\left[\left(X_t - \beta_{k,1}X_{t-1} - \beta_{k,2}X_{t-2} - \cdots - \beta_{k,k}X_{t-k}\right)^2\right]$$
We can explain the last expression as follows. Suppose that at time $t - 1$ we are trying to estimate $X_t$, but we are going to limit our choice of estimator to linear functions of the $k$ previous values $X_{t-k}, \ldots, X_{t-1}$. The most general linear estimator will be of the form:

$$\beta_{k,1}X_{t-1} + \beta_{k,2}X_{t-2} + \cdots + \beta_{k,k}X_{t-k}$$

where the $\beta_{k,i}$ are constants. We can choose the coefficients to minimise the mean square error, which is the expression given above in the Core Reading. The partial autocorrelation for lag $k$ is then the weight $\beta_{k,k}$ that we assign to the $X_{t-k}$ term.
Question

Consider a stationary process of the form $X_t = \alpha X_{t-2} + e_t$, where $\{e_t\}$ is a white noise process. Comment on the values of the partial autocorrelation function $\psi_k$ for $k = 1$, $k = 2$ and $k \ge 3$.

Solution
For $k = 1$ we just have the correlation itself. However, in this case it is clear that the $X_t$ for even values of $t$ are independent of those for odd values of $t$. It follows that the correlation at lag 1 is 0, so $\psi_1 = 0$.

For $k = 2$ the partial autocorrelation is the coefficient of $X_{t-2}$ in the best linear estimator:

$$\beta_{2,1}X_{t-1} + \beta_{2,2}X_{t-2}$$

Since $X_t = \alpha X_{t-2} + e_t$, the best such estimator is $\alpha X_{t-2}$, so $\psi_2 = \beta_{2,2} = \alpha$.

Similarly, the defining equation suggests that the best linear estimator will not involve $X_{t-3}, X_{t-4}, \ldots$. It follows that for $k \ge 3$, we have $\psi_k = 0$.
For the time series in the previous question, we have $\psi_4 = 0$. This is in contrast to the actual correlation at lag 4, since $X_t$ depends on $X_{t-2}$, which in turn depends on $X_{t-4}$. $X_t$ and $X_{t-4}$ will therefore be correlated. The partial autocorrelation is zero, however, because it effectively removes the impact of the correlation at smaller lags.
The formula for calculating $\psi_k$ involves a ratio of determinants of large matrices whose entries are determined by $\rho_1, \ldots, \rho_k$; it may be found in standard works on time series analysis, and is readily available in common computer packages like R.
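For example, here is a minimal sketch using Python's statsmodels package (one such package; the AR(1) example is illustrative): the PACF of an AR(1) process should cut off after lag 1, even though its ACF decays gradually:

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(0)
n = 5000
e = rng.standard_normal(n)
x = np.zeros(n)

# AR(1): X_n = 0.5 X_{n-1} + e_n
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + e[t]

print(np.round(acf(x, nlags=3), 2))    # roughly [1, 0.5, 0.25, 0.13]
print(np.round(pacf(x, nlags=3), 2))   # roughly [1, 0.5, 0, 0]
```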
The diagrams below show the autocorrelation function and partial autocorrelation of an
ARMA(1,1) series. ARMA processes are discussed in detail in Section 3.7.
Figure 13.1: ACF and PACF values of a stationary time series model.
For the first two lags these give:

$$\psi_1 = \rho_1, \qquad \psi_2 = \frac{\det\begin{pmatrix}1 & \rho_1 \\ \rho_1 & \rho_2\end{pmatrix}}{\det\begin{pmatrix}1 & \rho_1 \\ \rho_1 & 1\end{pmatrix}} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$$
These formulae can be found on page 40 of the Tables. Their derivations are not required.
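As a quick worked example, consider the AR(1) process from earlier in the section, for which $\rho_k = 0.5^k$:

$$\psi_1 = \rho_1 = 0.5, \qquad \psi_2 = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} = \frac{0.25 - 0.25}{1 - 0.25} = 0$$

So the PACF of this process cuts off after lag 1, consistent with the numerical check above.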
It is important to realise that the PACF is determined by the ACF, as the above expressions
suggest. The PACF does not therefore contain any extra information; it just gives an alternative
presentation of the same information. However, as we will see, this can be used to identify
certain types of process.