
IEOR E4703: Monte Carlo Simulation

© 2017 by Martin Haugh


Columbia University

Generating Random Variables and Stochastic Processes
In these lecture notes we describe the principal methods that are used to generate random variables, taking as
given a good U (0, 1) random variable generator. We begin with Monte-Carlo integration and then describe the
main methods for random variable generation including inverse-transform, composition and acceptance-rejection.
We also describe the generation of normal random variables and multivariate normal random vectors via the
Cholesky decomposition. We end with a discussion of how to generate (non-homogeneous) Poisson processes as
well as (geometric) Brownian motions.

1 Monte Carlo Integration


Monte-Carlo simulation can also be used for estimating integrals and we begin with one-dimensional integrals.
Suppose then that we want to compute

θ := \int_0^1 g(x) \, dx.
If we cannot compute θ analytically, then we could use numerical methods. However, we can also use simulation
and this can be especially useful for high-dimensional integrals. The key observation is to note that θ = E[g(U )]
where U ∼ U (0, 1). We can use this observation as follows:

1. Generate U_1, U_2, . . . , U_n ∼ IID U(0, 1)

2. Estimate θ with

   \hat{θ}_n := \frac{g(U_1) + . . . + g(U_n)}{n}
There are two reasons that explain why \hat{θ}_n is a good estimator:

1. \hat{θ}_n is unbiased, i.e., E[\hat{θ}_n] = θ, and

2. \hat{θ}_n is consistent, i.e., \hat{θ}_n → θ as n → ∞ with probability 1. This follows immediately from the Strong
Law of Large Numbers (SLLN) since g(U_1), g(U_2), . . . , g(U_n) are IID with mean θ.

Example 1 Suppose we wish to estimate \int_0^1 x^3 \, dx using simulation. We know the exact answer is 1/4 but we
can also estimate this using simulation. In particular, if we generate n independent U(0, 1) variables, cube them
and then take their average then we will have an unbiased estimate.
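As a minimal R sketch of this estimate (the sample size n is an arbitrary choice), compare the output with the exact value 1/4:

# Monte Carlo estimate of the integral of x^3 over [0,1]
n <- 100000                # number of samples (arbitrary choice)
u <- runif(n)              # U_1, ..., U_n ~ IID U(0,1)
theta_hat <- mean(u^3)     # unbiased estimate of 1/4
theta_hat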

Example 2 We wish to estimate θ = \int_1^3 (x^2 + x) \, dx again using simulation. Once again we know the exact
answer (it's 38/3 ≈ 12.67) but we can also estimate it by noting that

θ = \int_1^3 2 \, \frac{x^2 + x}{2} \, dx = 2 E[X^2 + X]

where X ∼ U(1, 3). So we can estimate θ by generating n IID U(0, 1) random variables, converting them
(how?) to U(1, 3) variables, X_1, . . . , X_n, and then taking \hat{θ}_n := 2 \sum_i (X_i^2 + X_i)/n.
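As a minimal R sketch (sample size arbitrary), the conversion to U(1, 3) can be done with the linear transformation X = 1 + 2U:

n <- 100000
x <- 1 + 2 * runif(n)            # convert U(0,1) samples to U(1,3)
theta_hat <- 2 * mean(x^2 + x)   # estimate of 38/3, approx 12.67
theta_hat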

1.1 Multi-Dimensional Monte Carlo Integration


Suppose now that we wish to approximate

θ = \int_0^1 \int_0^1 g(x_1, x_2) \, dx_1 \, dx_2.

Then we can write θ = E[g(U_1, U_2)] where U_1, U_2 are IID U(0, 1) random variables. Note that the joint PDF
satisfies f_{U_1,U_2}(u_1, u_2) = f_{U_1}(u_1) f_{U_2}(u_2) = 1 on [0, 1]^2. As before we can estimate θ using simulation by
performing the following steps:
1. Generate n independent bivariate vectors (U_1^i, U_2^i) for i = 1, . . . , n, with all U_j^i's IID U(0, 1).

2. Compute g(U_1^i, U_2^i) for i = 1, . . . , n.

3. Estimate θ with

   \hat{θ}_n = \frac{g(U_1^1, U_2^1) + . . . + g(U_1^n, U_2^n)}{n}

As before, the SLLN justifies this approach and guarantees that \hat{θ}_n → θ w.p. 1 as n → ∞.

Example 3 (Computing a Multi-Dimensional Integral)


We can use Monte Carlo to estimate
θ := \int_0^1 \int_0^1 (4x^2 y + y^2) \, dx \, dy = E[4X^2 Y + Y^2]

where X, Y are IID U(0, 1). (The true value of θ is easily calculated to be 1.)
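A minimal R sketch of this two-dimensional estimate (the sample size is again an arbitrary choice):

n <- 100000
x <- runif(n); y <- runif(n)
theta_hat <- mean(4 * x^2 * y + y^2)   # should be close to the true value 1
theta_hat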

We can also apply Monte Carlo integration to more general problems. For example, if we want to estimate

θ = \int\int_A g(x, y) f(x, y) \, dx \, dy

where f(x, y) is a density function on A, then we observe that θ = E[g(X, Y)] where X, Y have joint density
f(x, y). To estimate θ using simulation we simply generate n random vectors (X, Y) with joint density f(x, y)
and then estimate θ with

\hat{θ}_n := \frac{g(X_1, Y_1) + . . . + g(X_n, Y_n)}{n}.

2 Generating Univariate Random Variables


We will study a number of methods for generating univariate random variables. The three principal methods are
the inverse transform method, the composition method and the acceptance-rejection method. All of these
methods rely on having a (good) U (0, 1) random number generator available which we assume to be the case.

2.1 The Inverse Transform Method


The Inverse Transform Method for Discrete Random Variables
Suppose X is a discrete random variable with probability mass function (PMF)

    X = x_1  w.p. p_1
        x_2  w.p. p_2
        x_3  w.p. p_3


where p1 + p2 + p3 = 1. We would like to generate a value of X and we can do this by using our U (0, 1)
generator as follows. First generate U and then set

    X = x_1  if 0 ≤ U ≤ p_1
        x_2  if p_1 < U ≤ p_1 + p_2
        x_3  if p_1 + p_2 < U ≤ 1.

We can easily check that this is correct: note that P(X = x1 ) = P(0 ≤ U ≤ p1 ) = p1 since U is U (0, 1). The
same is true for P(X = x2 ) and P(X = x3 ).

More generally, suppose X can take on n distinct values, x1 < x2 < . . . < xn , with

P(X = xi ) = pi for i = 1, . . . , n.

Then to generate a sample value of X we:


1. Generate U
2. Set X = x_j if \sum_{i=1}^{j-1} p_i < U ≤ \sum_{i=1}^{j} p_i. That is, we set X = x_j if F(x_{j-1}) < U ≤ F(x_j).

If n is large, then we might want to search for xj more efficiently, however!
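As a minimal R sketch of this discrete inverse transform (the values and probabilities below are arbitrary illustrations), the search for x_j can be done with a cumulative-sum lookup:

x <- c(1, 2, 5)                      # possible values x_1 < x_2 < x_3
p <- c(0.3, 0.5, 0.2)                # their probabilities
n <- 10000
u <- runif(n)
# findInterval returns j - 1 such that F(x_{j-1}) < U <= F(x_j)
samples <- x[findInterval(u, cumsum(p), left.open = TRUE) + 1]
table(samples) / n                   # empirical frequencies, close to p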

Example 4 (Generating a Geometric Random Variable)


Suppose X is geometric with parameter p so that P(X = n) = (1 − p)^{n-1} p. Then we can generate X as
follows:
1. Generate U
2. Set X = j if \sum_{i=1}^{j-1} (1 − p)^{i-1} p < U ≤ \sum_{i=1}^{j} (1 − p)^{i-1} p. That is, we set (why?) X = j if
   1 − (1 − p)^{j-1} < U ≤ 1 − (1 − p)^j.

In particular, we set X = Int\left( \frac{\log(U)}{\log(1 − p)} \right) + 1 where Int(y) denotes the integer part of y.

You should convince yourself that this is correct! How does this compare to the coin-tossing method for
generating X?
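A minimal R sketch of this inverse-transform recipe for the geometric distribution (the value of p is an arbitrary choice):

p <- 0.3
n <- 10000
x <- floor(log(runif(n)) / log(1 - p)) + 1   # Int(log(U)/log(1-p)) + 1
mean(x)                                       # should be close to 1/p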

Example 5 (Generating a Poisson Random Variable)


Suppose that X is Poisson(λ) so that P(X = n) = exp(−λ) λ^n / n!. We can generate X as follows:
1. Generate U
2. Set X = j if F (j − 1) < U ≤ F (j).

How do we find j? We could use the following algorithm.

set j = 0, p = e^{−λ}, F = p
while U > F
set p = λp/(j + 1), F = F + p, j = j + 1
set X = j

Questions: How much work does this take? What if λ is large? Can we find j more efficiently?
Answer (to last question): Yes by checking if j is close to λ first.

Further questions: Why might this be useful? How much work does this take?
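A direct R translation of this search (a sketch only; λ is an arbitrary choice, and in practice base R's rpois would be used) is:

lambda <- 4
U <- runif(1)
j <- 0; p <- exp(-lambda); cdf <- p
while (U > cdf) {
  p   <- lambda * p / (j + 1)   # P(X = j+1) from P(X = j)
  cdf <- cdf + p                # F(j+1)
  j   <- j + 1
}
X <- j                          # a Poisson(lambda) sample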

The Inverse Transform Method for Continuous Random Variables


Suppose now that X is a continuous random variable and we want to generate a value of X. Recall that when
X was discrete, we could generate a variate by first generating U and then setting X = xj if
F (xj−1 ) < U ≤ F (xj ). This suggests that when X is continuous, we might generate X as follows:

1. Generate U
2. Set X = x if F_x(x) = U, i.e., set X = F_x^{-1}(U)

We need to prove that this algorithm actually works! But this follows immediately since

P(X ≤ x) = P(F_x^{-1}(U) ≤ x) = P(U ≤ F_x(x)) = F_x(x)

as desired. This argument assumes F_x^{-1} exists but there is no problem even when F_x^{-1} does not exist. All we
have to do is
1. Generate U
2. Set X = min{x : Fx (x) ≥ U }.

This works for discrete and continuous random variables or mixtures of the two.

Example 6 (Generating an Exponential Random Variable)


We wish to generate X ∼ Exp(λ). In this case F_x(x) = 1 − e^{−λx} so that F_x^{-1}(u) = − log(1 − u)/λ. We can
generate X then by generating U and setting (why?) X = − log(U)/λ.

Example 7 (Generating a Gamma(n,λ) Random Variable)


We wish to generate X ∼ Gamma(n, λ) where n is a positive integer. Let Xi be IID ∼ exp(λ) for
i = 1, . . . , n. Note that if Y := X1 + . . . + Xn then Y ∼ Gamma(n, λ). How can we use this observation to
generate a sample value of Y ? If n is not an integer, then we need another method to generate Y .
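One way to use this observation, sketched in R below (parameter values are arbitrary; base R's rgamma also handles the non-integer case), is to sum n independent Exp(λ) variables generated by the inverse transform:

n_shape <- 5; lambda <- 2
m <- 10000
# each column-sum is a Gamma(n_shape, lambda) sample
y <- colSums(matrix(-log(runif(n_shape * m)) / lambda, nrow = n_shape))
mean(y)    # should be close to n_shape / lambda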

Example 8 (Generating Order Statistics)


Order statistics are very important and have many applications in statistics, engineering and even finance. So
suppose X has CDF Fx and let X1 , . . . , Xn be IID ∼ X. Let X(1) , . . . , X(n) be the ordered sample so that

X(1) ≤ X(2) ≤ . . . ≤ X(n) .

We say X_(i) is the ith order statistic. Several questions arise:

Question: How do we generate a sample of X(i) ?


Method 1: Generate U_1, . . . , U_n and for each U_i compute X_i = F_x^{-1}(U_i). We then order the X_i's and take the
ith smallest as our sample. How much work does this take?

Question: Can we do better?


Method 2: Sure, use the monotonicity of F !

Question: Can we do even better?


Method 3: Suppose Z ∼ Beta(a, b) on (0, 1) so that

f(z) = c z^{a-1} (1 − z)^{b-1} for 0 ≤ z ≤ 1

where c is a constant so that the density integrates to 1. How can we use this distribution?

Question: Can we do even better?



Advantages of the Inverse Transform Method


There are two principal advantages to the inverse transform method:
1. Monotonicity: we have already seen how this can be useful.

2. The method is 1-to-1, i.e. one U (0, 1) variable produces one X variable. This property can be useful for
some variance reduction techniques.

Disadvantages of the Inverse Transform Method


The principal disadvantage of the inverse transform method is that F_x^{-1} may not always be computable. For
example, suppose X ∼ N(0, 1). Then

F_x(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2π}} \exp\left( \frac{-z^2}{2} \right) dz

so that we cannot even express F_x in closed form. Even if F_x is available in closed form, it may not be possible
to find F_x^{-1} in closed form. For example, suppose F_x(x) = x^5 (1 + x)^3 / 8 for 0 ≤ x ≤ 1. Then we cannot
compute F_x^{-1}. One possible solution to these problems is to find F_x^{-1} numerically.

2.2 The Composition Approach


Another method for generating random variables is the composition approach. Suppose again that X has CDF
F_x and that we wish to simulate a value of X. We can often write

F_x(x) = \sum_{j} p_j F_j(x)

where the F_j's are also CDFs, p_j ≥ 0 for all j, and \sum_j p_j = 1. Equivalently, if the densities exist then we can
write

f_x(x) = \sum_{j} p_j f_j(x).

Such a representation often occurs very naturally. For example, suppose
X ∼ Hyperexponential(λ_1, α_1, . . . , λ_n, α_n) so that

f_x(x) = \sum_{i=1}^{n} α_i λ_i e^{−λ_i x}

where λ_i, α_i ≥ 0 and \sum_{i=1}^{n} α_i = 1 (here α_i = 0 for i > n). If it's difficult to simulate X directly using the
inverse transform method then we could use the composition algorithm (see below) instead.

Composition Algorithm
1. Generate I that is distributed on the non-negative integers so that P(I = j) = pj . (How do we do this?)

2. If I = j, then simulate Yj from Fj


3. Set X = Yj

We claim that X has the desired distribution!

Proof: We have

P(X ≤ x) = \sum_{j=1}^{\infty} P(X ≤ x | I = j) P(I = j)
         = \sum_{j=1}^{\infty} P(Y_j ≤ x) P(I = j)
         = \sum_{j=1}^{\infty} F_j(x) p_j
         = F_x(x).

The proof actually suggests that the composition approach might arise naturally from ‘sequential’ type
experiments. Consider the following example.
Example 9 (A Sequential Experiment)
Suppose we roll a die and let Y ∈ {1, 2, 3, 4, 5, 6} be the outcome. If Y = i then we generate Z_i from the
distribution F_i and set X = Z_i.

What is the distribution of X? How do we simulate a value of X?

Example 10 (The Hyperexponential Distribution)


Let X ∼ Hyperexponential(λ_1, α_1, λ_2, α_2) so that f_x(x) = α_1 λ_1 e^{−λ_1 x} + α_2 λ_2 e^{−λ_2 x}. In our earlier notation we
have

α_1 = p_1,   α_2 = p_2,   f_1(x) = λ_1 e^{−λ_1 x},   f_2(x) = λ_2 e^{−λ_2 x},
and the following algorithm will then generate a sample of X.

generate U_1
if U_1 ≤ p_1 then
    set i = 1
else set i = 2
generate U_2
/∗ Now generate X from Exp(λ_i) ∗/
set X = − log(U_2)/λ_i
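A minimal R sketch of this composition step for the two-component hyperexponential (the parameter values are arbitrary):

alpha <- c(0.3, 0.7); lambda <- c(1, 5)
n <- 10000
i <- ifelse(runif(n) <= alpha[1], 1, 2)      # step 1: pick a component
x <- -log(runif(n)) / lambda[i]              # step 2: draw from Exp(lambda_i)
mean(x)                                       # compare with sum(alpha / lambda)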

Question: How would you simulate a value of X if F_x(x) = (x + x^3 + x^5)/3?

When the decomposition

F_x(x) = \sum_{j} p_j F_j(x)

is not obvious, we can create an artificial decomposition by splitting.



Example 11 (Splitting)
Suppose

f_x(x) = \frac{1}{5} 1_{[−1,0]}(x) + \frac{6}{15} 1_{[0,2]}(x).

How do we simulate a value of X using vertical splitting? How would horizontal splitting work?

2.3 The Acceptance-Rejection Algorithm


Let X be a random variable with density, f(·), and CDF, F_x(·). Suppose it's hard to simulate a value of X
directly using either the inverse transform or composition algorithm. We might then wish to use the
acceptance-rejection algorithm. Towards this end let Y be another random variable with density g(·) and
suppose that it is easy to simulate a value of Y. If there exists a constant a such that

f(x)/g(x) ≤ a for all x

then we can simulate a value of X as follows.

The Acceptance-Rejection Algorithm

generate Y with PDF g(·)
generate U
while U > f(Y)/(a g(Y))
    generate Y
    generate U
set X = Y

Question: Why must we have a ≥ 1?


We must now prove that this algorithm does indeed work. We define B to be the event that Y has been
accepted in the while loop, i.e., U ≤ f(Y)/(a g(Y)). We need to show that P(X ≤ x) = F_x(x).

Proof: First observe

P(X ≤ x) = P(Y ≤ x | B) = \frac{P((Y ≤ x) ∩ B)}{P(B)}.    (1)

We can compute P(B) as

P(B) = P\left( U ≤ \frac{f(Y)}{a g(Y)} \right) = \frac{1}{a}

while the numerator in (1) satisfies

P((Y ≤ x) ∩ B) = \int_{-\infty}^{\infty} P((Y ≤ x) ∩ B \mid Y = y) \, g(y) \, dy
               = \int_{-\infty}^{\infty} P\left( (Y ≤ x) ∩ \left\{ U ≤ \frac{f(Y)}{a g(Y)} \right\} \,\Big|\, Y = y \right) g(y) \, dy
               = \int_{-\infty}^{x} P\left( U ≤ \frac{f(y)}{a g(y)} \right) g(y) \, dy    (why?)
               = \int_{-\infty}^{x} \frac{f(y)}{a} \, dy
               = \frac{F_x(x)}{a}

Therefore P(X ≤ x) = Fx (x), as required.

Example 12 (Generating a Beta(a, b) Random Variable)


Recall that X has a Beta(a, b) distribution if f(x) = c x^{a-1} (1 − x)^{b-1} for 0 ≤ x ≤ 1. Suppose now that we
wish to simulate from the Beta(4, 3) so that

f(x) = 60 x^3 (1 − x)^2 for 0 ≤ x ≤ 1.

We could, for example, integrate f(·) to find F(·), and then try to use the inverse transform approach.
However, it is hard to find F^{-1}(·). Instead, let's use the acceptance-rejection algorithm:

1. First choose g(y): let's take g(y) = 1 for y ∈ [0, 1], i.e., Y ∼ U(0, 1).

2. Then find a. Recall that we must have f(x)/g(x) ≤ a for all x, which implies

   60 x^3 (1 − x)^2 ≤ a for all x ∈ [0, 1].

So take a = 3. It is easy to check that this value works. We then have the following algorithm.
Algorithm

generate Y ∼ U(0, 1)
generate U ∼ U(0, 1)
while U > 20 Y^3 (1 − Y)^2
    generate Y
    generate U
set X = Y
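A minimal R sketch of this acceptance-rejection scheme (the number of samples is an arbitrary choice; in practice base R's rbeta does this for us):

rbeta43 <- function() {
  Y <- runif(1); U <- runif(1)
  while (U > 20 * Y^3 * (1 - Y)^2) {   # accept Y with probability f(Y)/(a g(Y))
    Y <- runif(1); U <- runif(1)
  }
  Y
}
x <- replicate(10000, rbeta43())
mean(x)    # should be close to 4/7, the mean of a Beta(4,3)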

Efficiency of the Acceptance-Rejection Algorithm


Let N be the number of loops in the A-R algorithm until acceptance, and as before, let B be the event that Y
has been accepted in a loop, i.e. U ≤ f(Y)/(a g(Y)). We saw earlier that P(B) = 1/a.
Questions:
1: What is the distribution of N ?
2: What is E[N ]?

How Do We Choose a?
E[N ] = a, so clearly we would like a to be as small as possible. Usually, this is just a matter of calculus.

Example 13 (Generating a Beta(a, b) Random Variable continued)


Recall the Beta(4, 3) example with PDF f(x) = 60 x^3 (1 − x)^2 for x ∈ [0, 1]. We chose g(y) = 1 for y ∈ [0, 1]
so that Y ∼ U(0, 1). The constant a had to satisfy f(x)/g(x) ≤ a for all x ∈ [0, 1] and we chose a = 3. We can
do better by choosing

a = \max_{x ∈ [0,1]} \frac{f(x)}{g(x)} ≈ 2.073.
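This maximization is a one-line calculus exercise, or numerically in R (a sketch):

opt <- optimize(function(x) 60 * x^3 * (1 - x)^2, interval = c(0, 1), maximum = TRUE)
opt$objective    # approximately 2.0736, attained at x = 0.6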

How Do We Choose g(·)?


We would like to choose g(·) to minimize the computational load. This can be achieved by taking g(·) ‘close’ to
f (·). Then a will be close to 1 and so fewer iterations will be required in the A-R algorithm. There is a tradeoff,
however: if g(·) is ‘close’ to f (·) then it will probably also be hard to simulate from g(·). So we often need to
find a balance between having a ‘nice’ g(·) and a small value of a.

Acceptance-Rejection Algorithm for Discrete Random Variables


So far, we have expressed the A-R algorithm in terms of PDF’s, thereby implicitly assuming that we are
generating continuous random variables. However, the A-R algorithm also works for discrete random variables
where we simply replace PDF’s with PMF’s. So suppose we wish to simulate a discrete random variable, X,
with PMF, pi = P(X = xi ). If we do not wish to use the discrete inverse transform method for example, then
we can use the following version of the A-R algorithm. We assume that we can generate Y with PMF,
qi = P(Y = yi ), and that a satisfies pi /qi ≤ a for all i.

The Acceptance-Rejection Algorithm for Discrete Random Variables

generate Y with PMF q_i
generate U
while U > p_Y /(a q_Y)
    generate Y
    generate U
set X = Y

Generally, we would use this A-R algorithm when we can simulate Y efficiently.

Exercise 1 (From Simulation by Sheldon M. Ross)


Suppose Y ∼ Bin(n, p) and that we want to generate X where

P(X = r) = P(Y = r|Y ≥ k)

for some fixed k ≤ n. Assume α = P(Y ≥ k) has been computed.

1. Give the inverse transform method for generating X.


2. Give another method for generating X.
3. For what values of α, small or large, would the algorithm in (2) be inefficient?

Example 14 (Generating from a Uniform Distribution over a 2-D Region)


Suppose (X, Y ) is uniformly distributed over a 2-dimensional area, A. How would you simulate a sample of
(X, Y )? Note first that if X ∼ U (−1, 1), Y ∼ U (−1, 1) and X and Y are independent then (X, Y ) is uniformly
distributed over the region
A := {(x, y) : −1 ≤ x ≤ 1, −1 ≤ y ≤ 1}.
We can therefore (how?) simulate a sample of (X, Y ) when A is a square. Suppose now that A is a circle of
radius 1 centered at the origin. How do we simulate a sample of (X, Y ) in that case?
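One standard answer (not spelled out in the notes, and closely related to acceptance-rejection) is to sample uniformly from the bounding square and reject points that fall outside the disc. A minimal R sketch:

runif_disc <- function() {
  repeat {
    x <- runif(1, -1, 1); y <- runif(1, -1, 1)   # uniform on the square
    if (x^2 + y^2 <= 1) return(c(x, y))          # keep only points inside the unit disc
  }
}
pts <- t(replicate(5000, runif_disc()))
# the acceptance probability is pi/4, the ratio of the two areas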

Remark 1 The A-R algorithm is an important algorithm for generating random variables. Moreover it can be
used to generate samples from distributions that are only known up to a constant. It is very inefficient in
high-dimensions, however, which is why Markov Chain Monte Carlo (MCMC) algorithms are required.

3 Other Methods for Generating Univariate Random Variables


Besides the inverse transform, composition and acceptance-rejection algorithms, there are a number of other
important methods for generating random variables. We begin with the convolution method.

3.1 The Convolution Method


Suppose X ∼ Y1 + Y2 + . . . + Yn where the Yi ’s are IID with CDF Fy (·). Suppose also that it’s easy to
generate the Yi ’s. Then it is straightforward to generate a value of X:
1. Generate Y1 , . . . , Yn that have CDF Fy

2. Set X = Y1 + . . . + Yn

We briefly mentioned this earlier in Example 7 when we described how to generate a Gamma(n, λ) random
variable. The convolution method is not always the most efficient method. Why?
More generally, suppose we want to simulate a value of a random variable, X, and we know that

X ∼ g(Y1 , . . . , Yn )

for some random variables Yi and some function g(·). Note that the Yi ’s need not necessarily be IID. If we know
how to generate (Y1 , . . . , Yn ) then we can generate X by generating (Y1 , . . . , Yn ) and setting
X = g(Y1 , . . . , Yn ). We saw such an application in Example 7.

Example 15 (Generating Lognormal Random Variables)


Suppose X ∼ N(µ, σ^2). Then Y := exp(X) has a lognormal distribution, i.e., Y ∼ LN(µ, σ^2). (Note E[Y] ≠ µ
and Var(Y) ≠ σ^2.) How do we generate a lognormal random variable?

Example 16 (Generating χ2 Random Variables)


Suppose X ∼ N(0, 1). Then Y := X^2 has a chi-square distribution with 1 degree of freedom, i.e., Y ∼ χ^2_1.
Question: How would you generate a χ^2_1 random variable?
Suppose now that X_i ∼ χ^2_1 for i = 1, . . . , n. Then Y := X_1 + . . . + X_n has a chi-square distribution with n
degrees of freedom, i.e., Y ∼ χ^2_n.
Question: How would you generate a χ^2_n random variable?

Example 17 (Generating tn Random Variables)


Suppose X ∼ N(0, 1) and Y ∼ χ^2_n with X and Y independent. Then

Z := \frac{X}{\sqrt{Y/n}}

has a t distribution with n degrees of freedom, i.e., Z ∼ t_n.


Question: How would you generate a tn random variable?

Example 18 (Generating Fm,n Random Variables)


Suppose X ∼ χ^2_m and Y ∼ χ^2_n with X and Y independent. Then

Z := \frac{X/m}{Y/n}

has an F distribution with m and n degrees of freedom, i.e., Z ∼ F_{m,n}.


Question: How would you generate an F_{m,n} random variable?
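A minimal R sketch tying Examples 16-18 together (degrees of freedom chosen arbitrarily; base R also provides rchisq, rt and rf) builds each variable from standard normals:

m <- 3; n <- 5; N <- 10000
chisq_n <- colSums(matrix(rnorm(n * N)^2, nrow = n))   # chi-square with n df
t_n     <- rnorm(N) / sqrt(chisq_n / n)                 # t with n df
chisq_m <- colSums(matrix(rnorm(m * N)^2, nrow = m))
f_mn    <- (chisq_m / m) / (chisq_n / n)                # F with m and n df
c(var(t_n), n / (n - 2))                                # sample and exact t_n variance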

4 Generating Normal Random Variables


While we typically rely on software packages to generate normal random variables for us, it is nonetheless
worthwhile having an understanding of how to do this. We first note that if Z ∼ N(0, 1) then

X := µ + σZ ∼ N(µ, σ 2 )

so that we need only concern ourselves with generating N(0, 1) random variables. One possibility for doing this
is to use the inverse transform method. But we would then have to use numerical methods since we cannot find
F_z^{-1}(·) := Φ^{-1}(·) in closed form. Other approaches for generating N(0, 1) random variables include:
1. The Box-Muller method
2. The Polar method
3. Rational approximations.
There are many other methods such as the A-R algorithm that could also be used to generate N (0, 1) random
variables.

4.1 The Box Muller Algorithm


The Box-Muller algorithm uses two IID U (0, 1) random variables to produce two IID N(0, 1) random variables.
It works as follows:

The Box-Muller Algorithm for Generating Two IID N(0, 1) Random Variables

generate U_1 and U_2 IID U(0, 1)

set
    X = \sqrt{−2 \log(U_1)} \cos(2πU_2) and Y = \sqrt{−2 \log(U_1)} \sin(2πU_2)

We now show that this algorithm does indeed produce two IID N(0, 1) random variables, X and Y .
Proof: We need to show that

f(x, y) = \frac{1}{\sqrt{2π}} \exp\left( \frac{-x^2}{2} \right) \frac{1}{\sqrt{2π}} \exp\left( \frac{-y^2}{2} \right).

First, make a change of variables:

R := \sqrt{X^2 + Y^2},
θ := \tan^{-1}(Y/X)

so R and θ are polar coordinates of (X, Y). To transform back, note X = R cos(θ) and Y = R sin(θ). Note
also that R = \sqrt{−2 \log(U_1)} and θ = 2πU_2. Since U_1 and U_2 are IID, R and θ are independent. Clearly
θ ∼ U(0, 2π) so f_θ(θ) = 1/2π for 0 ≤ θ ≤ 2π. It is also easy to see that f_R(r) = r e^{−r^2/2} for r ≥ 0, so that

f_{R,θ}(r, θ) = \frac{1}{2π} r e^{−r^2/2},   0 ≤ θ ≤ 2π, r ≥ 0.

This implies

P(X ≤ x_1, Y ≤ y_1) = P(R \cos(θ) ≤ x_1, R \sin(θ) ≤ y_1)
                    = \int\int_A \frac{1}{2π} r e^{−r^2/2} \, dr \, dθ    (2)

where A = {(r, θ) : r cos(θ) ≤ x_1, r sin(θ) ≤ y_1}. We now transform back to (x, y) coordinates with

x = r cos(θ) and y = r sin(θ)

and note that dx dy = r dr dθ, i.e., the Jacobian of the transformation is r. We then use (2) to obtain

P(X ≤ x_1, Y ≤ y_1) = \frac{1}{2π} \int_{−∞}^{x_1} \int_{−∞}^{y_1} \exp\left( −\frac{x^2 + y^2}{2} \right) dx \, dy
                    = \frac{1}{\sqrt{2π}} \int_{−∞}^{x_1} \exp(−x^2/2) \, dx \; \frac{1}{\sqrt{2π}} \int_{−∞}^{y_1} \exp(−y^2/2) \, dy
as required.
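A minimal R sketch of the Box-Muller transform (sample size is an arbitrary choice; base R's rnorm uses other methods):

n  <- 5000
u1 <- runif(n); u2 <- runif(n)
x  <- sqrt(-2 * log(u1)) * cos(2 * pi * u2)
y  <- sqrt(-2 * log(u1)) * sin(2 * pi * u2)
c(mean(x), sd(x), cor(x, y))   # approximately 0, 1 and 0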

4.2 The Polar Method


One disadvantage of the Box-Muller method is that computing sines and cosines is inefficient. We can get
around this problem using the polar method which is described in the algorithm below.

The Polar Algorithm for Generating Two IID N(0, 1) Random Variables

set S = 2
while S > 1
    generate U_1 and U_2 IID U(0, 1)
    set V_1 = 2U_1 − 1, V_2 = 2U_2 − 1 and S = V_1^2 + V_2^2
set
    X = \sqrt{−2 \log(S)/S} \, V_1 and Y = \sqrt{−2 \log(S)/S} \, V_2

Can you see why this algorithm works? (See Simulation by Sheldon M. Ross for further details.)
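A minimal R sketch of the polar method (again, rnorm would normally be used) generates one pair at a time:

polar_pair <- function() {
  S <- 2
  while (S > 1) {
    V1 <- 2 * runif(1) - 1
    V2 <- 2 * runif(1) - 1
    S  <- V1^2 + V2^2            # accept only points inside the unit circle
  }
  fac <- sqrt(-2 * log(S) / S)
  c(fac * V1, fac * V2)          # two IID N(0,1) variables
}
z <- replicate(2000, polar_pair())
c(mean(z), sd(z))                # approximately 0 and 1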

4.3 Rational Approximations


Let X ∼ N(0, 1) and recall that Φ(x) = P(X ≤ x) is the CDF of X. If U ∼ U(0, 1), then the inverse transform
method seeks x_u = Φ^{-1}(U). Finding Φ^{-1} in closed form is not possible but instead, we can use rational
approximations. These are very accurate and efficient methods for estimating x_u.

Example 19 (Rational Approximations)


For 0.5 ≤ u ≤ 1,

x_u ≈ t − \frac{a_0 + a_1 t}{1 + b_1 t + b_2 t^2}

where a_0, a_1, b_1 and b_2 are constants, and t = \sqrt{−2 \log(1 − u)}. The error is bounded in this case by 0.003. Even
more accurate approximations are available, and since they are very fast, many packages (including Matlab) use
them for generating normal random variables.

5 The Multivariate Normal Distribution


If the n-dimensional vector X is multivariate normal with mean vector µ and covariance matrix Σ then we write

X ∼ MNn (µ, Σ).

The standard multivariate normal has µ = 0 and Σ = I_n, the n × n identity matrix. The PDF of X is given by

f(x) = \frac{1}{(2π)^{n/2} |Σ|^{1/2}} e^{−\frac{1}{2} (x − µ)^⊤ Σ^{-1} (x − µ)}    (3)

where | · | denotes the determinant, and its characteristic function satisfies

φ_X(s) = E\left[ e^{i s^⊤ X} \right] = e^{i s^⊤ µ − \frac{1}{2} s^⊤ Σ s}.    (4)

We partition X into X_1 = (X_1, . . . , X_k)^⊤ and X_2 = (X_{k+1}, . . . , X_n)^⊤. If we extend this
notation naturally so that

µ = \begin{pmatrix} µ_1 \\ µ_2 \end{pmatrix} and Σ = \begin{pmatrix} Σ_{11} & Σ_{12} \\ Σ_{21} & Σ_{22} \end{pmatrix}

then we obtain the following results regarding the marginal and conditional distributions of X.

Marginal Distribution
The marginal distribution of a multivariate normal random vector is itself multivariate normal. In particular,
Xi ∼ MN(µi , Σii ), for i = 1, 2.

Conditional Distribution
Assuming Σ is positive definite, the conditional distribution of a multivariate normal distribution is also a
multivariate normal distribution. In particular,

X_2 | X_1 = x_1 ∼ MN(µ_{2.1}, Σ_{2.1})

where µ_{2.1} = µ_2 + Σ_{21} Σ_{11}^{-1} (x_1 − µ_1) and Σ_{2.1} = Σ_{22} − Σ_{21} Σ_{11}^{-1} Σ_{12}.

Linear Combinations
Linear combinations of multivariate normal random vectors remain normally distributed with mean vector and
covariance matrix given by

E [AX + a] = AE [X] + a
Cov(AX + a) = A Cov(X) A^⊤.

Estimation of Multivariate Normal Distributions


The simplest and most common method of estimating a multivariate normal distribution is to take the sample
mean vector and sample covariance matrix as our estimators of µ and Σ, respectively. It is easy to justify this
choice since they are the maximum likelihood estimators. It is also common to take n/(n − 1) times the sample
covariance matrix as an estimator of Σ as this estimator is known to be unbiased.

Testing Normality and Multivariate Normality


There are many tests that can be employed for testing normality of random variables and vectors. These include
standard univariate tests and tests based on QQ plots, as well as omnibus moment tests based on whether the
skewness and kurtosis of the data are consistent with a multivariate normal distribution. Section 3.1.4 of MFE
should be consulted for details on these tests.

5.1 Generating Multivariate Normally Distributed Random Vectors


Suppose that we wish to generate X = (X_1, . . . , X_n) where X ∼ MN_n(0, Σ). Note that it is then easy to
handle the case where E[X] ≠ 0. Let Z = (Z_1, . . . , Z_n)^⊤ where the Z_i's are IID N(0, 1) for i = 1, . . . , n. If C is
an (n × m) matrix then it follows that

C^⊤ Z ∼ MN(0, C^⊤ C).

Our problem therefore reduces to finding C such that C^⊤ C = Σ. We can use the Cholesky decomposition of Σ
to find such a matrix, C.

The Cholesky Decomposition of a Symmetric Positive-Definite Matrix


A well known fact from linear algebra is that any symmetric positive-definite matrix, M, may be written as

M = U^⊤ D U

where U is an upper triangular matrix and D is a diagonal matrix with positive diagonal elements. Since Σ is
symmetric positive-definite, we can therefore write

Σ = U^⊤ D U = (U^⊤ \sqrt{D}) (\sqrt{D} U) = (\sqrt{D} U)^⊤ (\sqrt{D} U).

The matrix C = \sqrt{D} U therefore satisfies C^⊤ C = Σ. It is called the Cholesky decomposition of Σ.

The Cholesky Decomposition in Matlab and R


It is easy to compute the Cholesky decomposition of a symmetric positive-definite matrix in Matlab and R using
the chol command and so it is also easy to simulate multivariate normal random vectors. As before, let Σ be an
(n × n) variance-covariance matrix and let C be its Cholesky decomposition. If X ∼ MN(0, Σ) then we can
generate random samples of X in Matlab as follows:

Sample Matlab Code


>> Sigma = [1.0 0.5 0.5;
0.5 2.0 0.3;
0.5 0.3 1.5];
>> C = chol(Sigma);
>> Z = randn(3,1000000);
>> X = C'*Z;
>> cov(X')

ans =
0.9972 0.4969 0.4988
0.4969 1.9999 0.2998
0.4988 0.2998 1.4971
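Since the notes point out that the same can be done in R with the chol command, here is a sketch of the equivalent R code (using the same covariance matrix as the Matlab example):

Sigma <- matrix(c(1.0, 0.5, 0.5,
                  0.5, 2.0, 0.3,
                  0.5, 0.3, 1.5), nrow = 3, byrow = TRUE)
C <- chol(Sigma)                 # upper triangular, so t(C) %*% C = Sigma
Z <- matrix(rnorm(3 * 1e6), nrow = 3)
X <- t(C) %*% Z                  # each column is a sample from MN(0, Sigma)
cov(t(X))                        # should be close to Sigma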

We must be very careful in Matlab and R to pre-multiply Z by C^⊤ and not C. (Unfortunately, some languages
take C^⊤ to be the Cholesky decomposition rather than C, so you must always be aware of exactly what
convention your programming language / package is using.) We have the following algorithm for generating
multivariate random vectors, X.

Generating Correlated Normal Random Variables

generate Z ∼ MN(0, I)
/∗ Now compute the Cholesky Decomposition ∗/
compute C such that C^⊤ C = Σ
set X = C^⊤ Z

6 Simulating Poisson Processes


Recall that a Poisson process, N(t), with intensity λ is a process such that

P(N(t) = r) = \frac{(λt)^r e^{−λt}}{r!}.
For a Poisson process the numbers of arrivals in non-overlapping intervals are independent and the distribution
of the number of arrivals in an interval only depends on the length of the interval.
The Poisson process is good for modeling many phenomena including the emission of particles from a
radioactive source and the arrivals of customers to a queue. The ith inter-arrival time, Xi , is defined to be the
interval between the (i − 1)th and ith arrivals of the Poisson process, and it is easy to see that the Xi ’s are IID
∼ Exp(λ). In particular, this means we can simulate a Poisson process with intensity λ by simply generating the
inter-arrival times, Xi , where Xi ∼ Exp(λ). We have the following algorithm for simulating the first T time
units of a Poisson process:

Simulating T Time Units of a Poisson Process

set t = 0, I = 0
generate U
set t = t − log(U )/λ
while t < T
set I = I + 1, S(I) = t
generate U
set t = t − log(U )/λ
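A minimal R sketch of this algorithm (λ and T are arbitrary choices) collects the arrival times S(1), S(2), . . . up to time T:

lambda <- 2; T_end <- 10
S <- numeric(0)
t <- -log(runif(1)) / lambda
while (t < T_end) {
  S <- c(S, t)                       # record the arrival time
  t <- t - log(runif(1)) / lambda    # add the next Exp(lambda) inter-arrival time
}
length(S)                            # number of arrivals; its mean is lambda * T_end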

6.1 The Non-Homogeneous Poisson Process


A non-homogeneous Poisson process, N (t), is obtained by relaxing the assumption that the intensity, λ, is
constant. Instead we take it to be a deterministic function of time, λ(t). More formally, if λ(t) ≥ 0 is the
intensity of the process at time t, then we say that N (t) is a non-homogeneous Poisson process with intensity
λ(t). Define the function m(t) by

m(t) := \int_0^t λ(s) \, ds.

Then it can be shown that N(t + s) − N(t) is a Poisson random variable with parameter m(t + s) − m(t), i.e.,

P(N(t + s) − N(t) = r) = \frac{\exp(−m_{t,s}) (m_{t,s})^r}{r!}

where m_{t,s} := m(t + s) − m(t).

Simulating a Non-Homogeneous Poisson Process


Before we describe the thinning algorithm for simulating a non-homogeneous Poisson process, we first need
the following proposition. (A proof may be found in Simulation by Sheldon M. Ross.)

Proposition 1 Let N (t) be a Poisson process with constant intensity λ. Suppose that an arrival that occurs
at time t is counted with probability p(t), independently of what has happened beforehand. Then the process of
counted arrivals is a non-homogeneous Poisson process with intensity λ(t) = λp(t).

Suppose now N (t) is a non-homogeneous Poisson process with intensity λ(t) and that there exists a λ such that
λ(t) ≤ λ for all t ≤ T . Then we can use the following algorithm, based on Proposition 1, to simulate N (t).

The Thinning Algorithm for Simulating T Time Units of a NHPP

set t = 0, I = 0
generate U1
set t = t − log(U1 )/λ
while t < T
generate U2
if U2 ≤ λ(t)/λ then
set I = I + 1, S(I) = t
generate U1
set t = t − log(U1 )/λ
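A minimal R sketch of the thinning algorithm (the intensity function and the bound λ̄ below are arbitrary illustrations):

lambda_fn  <- function(t) 1 + sin(t)      # example intensity, bounded above by lambda_bar
lambda_bar <- 2
T_end <- 10
S <- numeric(0)
t <- -log(runif(1)) / lambda_bar
while (t < T_end) {
  if (runif(1) <= lambda_fn(t) / lambda_bar) S <- c(S, t)   # keep ("thin") this arrival
  t <- t - log(runif(1)) / lambda_bar
}
S                                          # arrival times of the NHPP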

Questions
1. Can you give a more efficient version of the algorithm when there exists a constant λ_min > 0 such that min_{0≤t≤T} λ(t) ≥ λ_min?

2. Can you think of another algorithm for simulating a non-homogeneous Poisson process that is not based on
thinning?

6.2 Credit Derivatives Models


Many credit derivatives models use Cox processes to model company defaults. A Cox process, C(t), is similar to
a non-homogeneous Poisson process except that the intensity function, λ(t), is itself a stochastic process.
However, conditional upon knowing λ(t) for all t ∈ [0, T ], C(t) is a non-homogeneous Poisson process. In credit
derivatives models, bankruptcy of a company is often modelled as occurring on the first arrival in the Cox
process where the intensity at time t, λ(t), generally depends on the level of other variables in the economy.
Such variables might include, for example, interest rates, credit ratings and stock prices, all of which are
themselves random. An understanding of and ability to simulate non-homogeneous Poisson processes is clearly
necessary for analyzing such credit derivatives models.


7 Simulating (Geometric) Brownian Motion


Definition 1 A stochastic process, {Xt : t ≥ 0}, is a Brownian motion with parameters (µ, σ) if
1. For 0 < t1 < t2 < . . . < tn−1 < tn
(Xt2 − Xt1 ), (Xt3 − Xt2 ), . . . , (Xtn − Xtn−1 )
are mutually independent.
2. For s > 0, Xt+s − Xt ∼ N(µs, σ 2 s) and
3. Xt is a continuous function of t w.p. 1.
We say that X is a B(µ, σ) Brownian motion with drift, µ, and volatility, σ. When µ = 0 and σ = 1 we have a
standard Brownian motion (SBM). We will use Bt to denote a SBM and we will always assume (unless
otherwise stated) that B0 = 0. Note that if X ∼ B(µ, σ) and X0 = x then we can write
Xt = x + µt + σBt
where B is a SBM. We will usually write a B(µ, σ) Brownian motion in this way.
Remark 2 Bachelier (1900) and Einstein (1905) were the first to explore Brownian motion from a
mathematical viewpoint whereas Wiener (1920’s) was the first to show that it actually exists as a well-defined
mathematical entity.
Questions
1. What is E[Bt+s Bs ]?
2. What is E[Xt+s Xs ] where X ∼ B(µ, σ)?
3. Let B be a SBM and let Zt := |Bt |. What is the CDF of Zt for t fixed?

7.1 Simulating a Standard Brownian Motion


It is not possible to simulate an entire sample path of Brownian motion between 0 and T as this would require
an infinite number of random variables. This is not always a problem, however, since we often only wish to
simulate the value of Brownian motion at certain fixed points in time. For example, we may wish to simulate
Bti for t1 < t2 < . . . < tn , as opposed to simulating Bt for every t ∈ [0, T ].
Sometimes, however, the quantity of interest, θ, that we are trying to estimate does indeed depend on the entire
sample path of Bt in [0, T ]. In this case, we can still estimate θ by again simulating Bti for t1 < t2 < . . . < tn
but where we now choose n to be very large. We might, for example, choose n so that |ti+1 − ti | <  for all i,
where  > 0 is very small. By choosing  to be sufficiently small, we hope to minimize the numerical error (as
opposed to the statistical error), in estimating θ. We will return to this issue later in the course when we learn
how to simulate stochastic differential equations (SDE’s).
In either case, we need to be able to simulate Bti for t1 < t2 < . . . < tn and for a fixed n. We will now see how
to do this. The first observation we make is that
(Bt2 − Bt1 ), (Bt3 − Bt2 ), . . . , (Btn − Btn−1 )
are mutually independent, and for s > 0, Bt+s − Bt ∼ N(0, s). The idea then is as follows: we begin with
t0 = 0 and Bt0 = 0. We then generate Bt1 which we can do since Bt1 ∼ N(0, t1 ). We now generate Bt2 by first
observing that Bt2 = Bt1 + (Bt2 − Bt1 ). Then since (Bt2 − Bt1 ) is independent of Bt1 , we can generate Bt2 by
generating an N(0, t2 − t1 ) random variable and simply adding it to Bt1 . More generally, if we have already
generated Bti then we can generate Bti+1 by generating an N(0, ti+1 − ti ) random variable and adding it to
Bti . We have the following algorithm.

Simulating a Standard Brownian Motion

set t_0 = 0, B_{t_0} = 0
for i = 1 to n
    generate X ∼ N(0, t_i − t_{i−1})
    set B_{t_i} = B_{t_{i−1}} + X
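A minimal R sketch of this (on an arbitrary equally spaced grid over [0, 1]):

n  <- 500
t  <- seq(0, 1, length.out = n + 1)
dB <- rnorm(n, mean = 0, sd = sqrt(diff(t)))   # independent N(0, t_i - t_{i-1}) increments
B  <- c(0, cumsum(dB))                         # B_{t_0} = 0, then add the increments
plot(t, B, type = "l")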

Remark 3 It is very important that when you generate Bti+1 , you do so conditional on the value of Bti . If you
generate Bti and Bti+1 independently of one another then you are effectively simulating from different sample
paths of the Brownian motion. This is not correct! In fact when we generate (Bt1 , Bt2 , . . . , Btn ) we are actually
generating a random vector that does not consist of IID random variables.

Simulating a B(µ, σ) Brownian Motion


Suppose now that we want to simulate a B(µ, σ) BM, X, at the times t1 , t2 , . . . , tn−1 , tn . Then all we have to
do is simulate an SBM, (Bt1 , Bt2 , . . . , Btn ), and use our earlier observation that Xt = x + µt + σBt .

Brownian Motion as a Model for Stock Prices?


There are a number of reasons why Brownian motion is not a good model for stock prices. They include
1. The limited liability of shareholders
2. The fact that people care about returns, not absolute prices so the IID increments property of BM should
not hold for stock prices.
As a result, geometric Brownian Motion (GBM) is a much better model for stock prices.

7.2 Geometric Brownian Motion


Definition 2 A stochastic process, {X_t : t ≥ 0}, is a (µ, σ) geometric Brownian motion (GBM) if
log(X) ∼ B(µ − σ^2/2, σ). We write X ∼ GBM(µ, σ).

The following properties of GBM follow immediately from the definition of BM:
The following properties of GBM follow immediately from the definition of BM:

1. Fix t_1, t_2, . . . , t_n. Then \frac{X_{t_2}}{X_{t_1}}, \frac{X_{t_3}}{X_{t_2}}, . . . , \frac{X_{t_n}}{X_{t_{n−1}}} are mutually independent.

2. For s > 0, \log\left( \frac{X_{t+s}}{X_t} \right) ∼ N\left( (µ − σ^2/2)s, σ^2 s \right).

3. X_t is continuous w.p. 1.

Again, we call µ the drift and σ the volatility. If X ∼ GBM(µ, σ), then note that X_t has a lognormal
distribution. In particular, if X ∼ GBM(µ, σ), then X_t ∼ LN\left( (µ − σ^2/2)t, σ^2 t \right). In Figure 1 we have plotted
some sample paths of Brownian and geometric Brownian motions.

Question: How would you simulate a sample path of GBM(µ, σ) at the fixed times 0 < t_1 < t_2 < . . . < t_n?
Answer: Simulate log(X_{t_i}) first and then take exponentials! (See below for more details.)

Figure 1: Sample paths of Brownian motion, B_t (panel a), and geometric Brownian motion (GBM), X_t =
X_0 e^{(µ−σ^2/2)t + σB_t} (panel b). Parameters for the GBM were X_0 = 100, µ = 10% and σ = 30%.

Modelling Stock Prices as Geometric Brownian Motion


Suppose X ∼ GBM(µ, σ). Note the following:

1. If X_t > 0, then X_{t+s} is always positive for any s > 0 so limited liability is not violated.

2. The distribution of X_{t+s}/X_t only depends on s so the distribution of returns from one period to the next only
   depends on the length of the period.
This suggests that GBM might be a reasonable model for stock prices. In fact, we will often model stock prices
as GBMs in this course, and we will generally use the following notation:

• S_0 is the known stock price at t = 0

• S_t is the random stock price at time t and

  S_t = S_0 e^{(µ−σ^2/2)t + σB_t}

  where B is a standard BM. The drift is µ, σ is the volatility and S is therefore a GBM(µ, σ) process
  that begins at S_0.

Questions
1. What is E[S_t]?
2. What is E[S_t^2]?
3. Show S_{t+∆t} = S_t e^{(µ−σ^2/2)∆t + σ(B_{t+∆t} − B_t)}.

Suppose now that we wish to simulate S ∼ GBM(µ, σ). Then we know

S_{t+∆t} = S_t e^{(µ−σ^2/2)∆t + σ(B_{t+∆t} − B_t)}

so that we can simulate S_{t+∆t} conditional on S_t for any ∆t > 0 by simply simulating an N(0, ∆t) random
variable.
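A minimal R sketch simulating a GBM path on an equally spaced grid (all parameter values are arbitrary choices):

S0 <- 100; mu <- 0.10; sigma <- 0.30
T_end <- 1; n <- 252; dt <- T_end / n
Z <- rnorm(n)
log_increments <- (mu - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * Z
S <- S0 * exp(cumsum(log_increments))          # S_{t_1}, ..., S_{t_n}
plot(seq(dt, T_end, by = dt), S, type = "l", xlab = "Years", ylab = "S_t")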

Example 20 (Simulating Delta-Hedging in a Black-Scholes Economy)


In this extended example we consider the use of the Black-Scholes model to hedge a vanilla European call
option. Moreover, we will assume that the assumptions of Black-Scholes are correct so that the security price
has GBM dynamics, it is possible to trade continuously at no cost and borrowing and lending at the risk-free rate
are also possible. It is then possible to dynamically replicate the payoff of the call option using a self-financing
(s.f.) trading strategy. The initial value of this s.f. strategy is the famous Black-Scholes arbitrage-free price of
the option. The s.f. replication strategy requires the continuous delta-hedging of the option but of course it is
not practical to do this and so instead we hedge periodically. (Periodic or discrete hedging then results in some
replication error but this error goes to 0 as the time interval between re-balancing goes to 0.)
Towards this end, let Pt denote the time t value of the discrete-time s.f. strategy that attempts to replicate the
option payoff and let C0 denote the initial value of the option. The replicating strategy is then given by

P_0 := C_0    (5)

P_{t_{i+1}} = P_{t_i} + (P_{t_i} − δ_{t_i} S_{t_i}) r∆t + δ_{t_i} (S_{t_{i+1}} − S_{t_i} + q S_{t_i} ∆t)    (6)

where ∆t := ti+1 − ti is the length of time between re-balancing (assumed constant for all i), r is the annual
risk-free interest rate (assuming per-period compounding), q is the dividend yield and δti is the Black-Scholes
delta at time ti . This delta is a function of Sti and some assumed implied volatility, σimp say. Note that (5) and
(6) respect the self-financing condition. Stock prices are simulated assuming St ∼ GBM(µ, σ) so that
S_{t+∆t} = S_t e^{(µ−σ^2/2)∆t + σ\sqrt{∆t} Z}

where Z ∼ N(0, 1). In the case of a short position in a call option with strike K and maturity T , the final
trading P&L is then defined as
P&L := P_T − (S_T − K)^+    (7)
where PT is the terminal value of the replicating strategy in (6). In the Black-Scholes world we have σ = σimp
and the P&L will be 0 along every price path in the limit as ∆t → 0.
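The following R code is a minimal sketch of this hedging simulation, not the authors' own implementation: it uses the figure's parameters (S_0 = K = 100, T = 6 months, σ = 30%, σ_imp = 20%, r = q = 1%) but a coarser time step, fewer paths and an arbitrary real-world drift µ to keep it fast.

bs_call_delta <- function(S, K, r, q, sigma, tau) {
  d1 <- (log(S / K) + (r - q + 0.5 * sigma^2) * tau) / (sigma * sqrt(tau))
  exp(-q * tau) * pnorm(d1)
}
bs_call_price <- function(S, K, r, q, sigma, tau) {
  d1 <- (log(S / K) + (r - q + 0.5 * sigma^2) * tau) / (sigma * sqrt(tau))
  d2 <- d1 - sigma * sqrt(tau)
  S * exp(-q * tau) * pnorm(d1) - K * exp(-r * tau) * pnorm(d2)
}
hedge_pnl <- function(S0 = 100, K = 100, T = 0.5, r = 0.01, q = 0.01,
                      sigma = 0.30, sigma_imp = 0.20, n = 500, mu = 0.10) {
  dt <- T / n
  S <- S0
  P <- bs_call_price(S0, K, r, q, sigma_imp, T)   # P_0 = C_0, eq. (5)
  for (i in 0:(n - 1)) {
    tau   <- T - i * dt
    delta <- bs_call_delta(S, K, r, q, sigma_imp, tau)   # delta uses implied vol
    S_new <- S * exp((mu - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * rnorm(1))
    # self-financing update, eq. (6)
    P <- P + (P - delta * S) * r * dt + delta * (S_new - S + q * S * dt)
    S <- S_new
  }
  P - max(S - K, 0)                               # P&L of the short call hedge, eq. (7)
}
pnl <- replicate(1000, hedge_pnl())
hist(pnl, main = "Delta-hedging P&L, sigma_imp = 20%")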
In practice, however, we do not know σ and so the market (and hence the option hedger) has no way to ensure
a value of σimp such that σ = σimp . This has interesting implications for the trading P&L and it means in
particular that we cannot exactly replicate the option even if all of the assumptions of Black-Scholes are correct.
In Figure 2 we display histograms of the P&L in (7) that results from simulating 100,000 sample paths of the
underlying price process with S_0 = K = $100. (Other parameters and details are given in the figure caption.) In the
case of the first histogram the true volatility was σ = 30% with σ_imp = 20% and the option hedger makes
(why?) substantial losses. In the case of the second histogram the true volatility was σ = 30% with σ_imp = 40%
and the option hedger makes (why?) substantial gains.
Clearly then this is a situation where substantial errors in the form of non-zero hedging P&L's are made and this
can only be due to the use of incorrect model parameters. This example is intended to highlight the
importance of not just having a good model but also having the correct model parameters. (We do acknowledge
that this example is somewhat contrived in that if the true price dynamics really were GBM dynamics then we
could estimate σ_imp perfectly and therefore exactly replicate the option payoff in the limit of continuous
trading. That said, it can also be argued that this example is not at all contrived: options traders in practice
know the Black-Scholes model is incorrect but still use the model to hedge and face the question of what is the
appropriate value of σ_imp to use. If they use a value that doesn't match, in a general sense, some true "average"
level of volatility then they will experience P&L profiles of the form displayed in Figure 2.)
Note that the payoff from delta-hedging an option is in general path-dependent, i.e. it depends on the price
path taken by the stock over the entire time interval. In fact, it can be shown that the payoff from continuously
delta-hedging an option satisfies

P&L = \int_0^T \frac{S_t^2}{2} \frac{∂^2 V_t}{∂S^2} \left( σ_{imp}^2 − σ_t^2 \right) dt    (8)

where V_t is the time t value of the option and σ_t is the realized instantaneous volatility at time t. We recognize
the term \frac{S_t^2}{2} \frac{∂^2 V_t}{∂S^2} as the dollar gamma. It is always positive for a call or put option, but it goes to zero as the
option moves significantly into or out of the money.

Figure 2: Histograms of the P&L from simulating 100k paths where we hedge a short call position with S_0 =
K = $100, T = 6 months, true volatility σ = 30%, and r = q = 1%. Panel (a): delta-hedging P&L with true vol.
= 30% and implied vol. = 20%. Panel (b): delta-hedging P&L with true vol. = 30% and implied vol. = 40%.
A time step of dt = 1/2,000 was used so hedging P&L due to discretization error is negligible. The hedge ratio,
i.e. delta, was calculated using the implied volatility that was used to calculate the initial option price.


Returning to the self-financing trading strategy of (5) and (6), note that we can choose any model we like for
the security price dynamics. In particular, we are not restricted to choosing GBM and other diffusion or
jump-diffusion models could be used instead. It is interesting to simulate these alternative models and to then
observe what happens to the replication error in (8) where the δti ’s are computed assuming (incorrectly) GBM
price dynamics. Note that it is common to perform simulation experiments like this when using a model to price
and hedge a particular security. The goal then is to understand how robust the hedging strategy (based on the
given model) is to alternative price dynamics that might prevail in practice. Given the appropriate data, one can
also back-test the performance of a model on realized historical price data to assess its hedging performance.
This back-testing is sometimes called a historical simulation.

Simulating Multidimensional (Geometric) Brownian Motions


It is often the case that we wish to simulate a vector of geometric Brownian motion paths. In this case we again
have

S^{(i)}_{t+∆t} = S^{(i)}_t e^{(µ_i − σ_i^2/2)∆t + σ_i \left( B^{(i)}_{t+∆t} − B^{(i)}_t \right)}    (9)

for i = 1, . . . , n, and where the Brownian increments B^{(i)}_{t+∆t} − B^{(i)}_t and B^{(j)}_{t+∆t} − B^{(j)}_t have correlation ρ_{i,j}.
Since we know how to simulate a multivariate normal distribution using the Cholesky decomposition method of
Section 5.1, it should be clear how to simulate (9) for i = 1, . . . , n.
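As a minimal R sketch of this final step (two assets with an arbitrary correlation; the grid and parameters are illustrative choices), we combine the Cholesky approach of Section 5.1 with the GBM recursion (9):

mu <- c(0.05, 0.10); sigma <- c(0.20, 0.30); rho <- 0.6
Sigma_B <- matrix(c(1, rho, rho, 1), 2, 2)     # correlation of the Brownian increments
C <- chol(Sigma_B)
n <- 252; dt <- 1 / n
S <- matrix(NA, nrow = n + 1, ncol = 2); S[1, ] <- c(100, 100)
for (k in 1:n) {
  Z <- as.vector(t(C) %*% rnorm(2))            # correlated N(0,1) pair
  S[k + 1, ] <- S[k, ] * exp((mu - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * Z)
}
cor(diff(log(S[, 1])), diff(log(S[, 2])))      # should be close to rho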
