2. θ̂_n is consistent, i.e., θ̂_n → θ as n → ∞ with probability 1. This follows immediately from the Strong
Law of Large Numbers (SLLN) since g(U_1), g(U_2), ..., g(U_n) are IID with mean θ.
Example 1 Suppose we wish to estimate ∫_0^1 x³ dx using simulation. We know the exact answer is 1/4 but we
can also estimate this using simulation. In particular, if we generate n independent U(0,1) random variables, cube them
and then take their average, then we will have an unbiased estimate.
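For concreteness, here is a minimal Python/NumPy sketch of this estimator; the seed and sample size are arbitrary choices and not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # arbitrary seed for reproducibility

n = 100_000
u = rng.uniform(0.0, 1.0, size=n)     # U_1, ..., U_n IID U(0,1)
theta_hat = np.mean(u ** 3)           # Monte Carlo estimate of E[U^3] = 1/4

print(theta_hat)                      # should be close to 0.25
```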
Example 2 We wish to estimate θ = ∫_1^3 (x² + x) dx, again using simulation. Once again we know the exact
answer (it's 12.67) but we can also estimate it by noting that

θ = ∫_1^3 2 · (x² + x)/2 dx = 2 E[X² + X]

where X ∼ U(1, 3). So we can estimate θ by generating n IID U(0,1) random variables, converting them
(how?) to U(1,3) variables, X_1, ..., X_n, and then taking θ̂_n := (2/n) Σ_i (X_i² + X_i).
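A minimal Python/NumPy sketch of this estimator follows; note how the U(0,1) draws are converted to U(1,3) via X = 1 + 2U, which answers the "(how?)" above. The seed and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

n = 100_000
u = rng.uniform(0.0, 1.0, size=n)   # IID U(0,1)
x = 1.0 + 2.0 * u                   # X = 1 + 2U ~ U(1,3)
theta_hat = 2.0 * np.mean(x ** 2 + x)

print(theta_hat)                    # should be close to 38/3 ≈ 12.67
```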
Then we can write θ = E[g(U_1, U_2)] where U_1, U_2 are IID U(0,1) random variables. Note that the joint PDF
satisfies f_{U_1,U_2}(u_1, u_2) = f_{U_1}(u_1) f_{U_2}(u_2) = 1 on [0,1]². As before we can estimate θ using simulation by
performing the following steps:
1. Generate n independent bivariate vectors (U_1^i, U_2^i) for i = 1, ..., n, with all the U_j^i's IID U(0,1).
2. Compute g(U_1^i, U_2^i) for each i and set θ̂_n := Σ_i g(U_1^i, U_2^i)/n.
As before, the SLLN justifies this approach and guarantees that θ̂_n → θ w.p. 1 as n → ∞.
where X, Y are IID U (0, 1). (The true value of θ is easily calculated to be 1.)
We can also apply Monte Carlo integration to more general problems. For example, if we want to estimate
θ = ∫∫_A g(x, y) f(x, y) dx dy
where f (x, y) is a density function on A, then we observe that θ = E[g(X, Y )] where X, Y have joint density
f (x, y). To estimate θ using simulation we simply generate n random vectors (X, Y ) with joint density f (x, y)
and then estimate θ with
θ̂_n := [ g(X_1, Y_1) + ... + g(X_n, Y_n) ] / n.
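A minimal Python/NumPy sketch of this more general estimator, using the illustrative (hypothetical) choices A = [0,1]², f ≡ 1 on A and g(x, y) = xy, for which θ = E[XY] = 1/4:

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def g(x, y):
    return x * y          # hypothetical integrand, chosen only for illustration

n = 100_000
x = rng.uniform(0.0, 1.0, size=n)   # draws from the joint density f(x, y) = 1 on [0,1]^2
y = rng.uniform(0.0, 1.0, size=n)
theta_hat = np.mean(g(x, y))

print(theta_hat)                    # should be close to 0.25
```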
where p1 + p2 + p3 = 1. We would like to generate a value of X and we can do this by using our U (0, 1)
generator as follows. First generate U and then set
X = x_1,  if 0 ≤ U ≤ p_1
    x_2,  if p_1 < U ≤ p_1 + p_2
    x_3,  if p_1 + p_2 < U ≤ 1.
We can easily check that this is correct: note that P(X = x1 ) = P(0 ≤ U ≤ p1 ) = p1 since U is U (0, 1). The
same is true for P(X = x2 ) and P(X = x3 ).
More generally, suppose X can take on n distinct values, x_1 < x_2 < ... < x_n, with
P(X = x_i) = p_i for i = 1, ..., n.
Then we generate U and set X = x_j if p_1 + ... + p_{j−1} < U ≤ p_1 + ... + p_j.
You should convince yourself that this is correct! How does this compare to the coin-tossing method for
generating X?
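A minimal Python/NumPy sketch of this discrete inverse transform method, using a hypothetical three-point distribution; the values, probabilities and sample size are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(seed=4)

values = np.array([1.0, 2.0, 5.0])        # hypothetical x_1 < x_2 < x_3
probs = np.array([0.3, 0.5, 0.2])         # hypothetical p_1, p_2, p_3
cum_probs = np.cumsum(probs)              # (p_1, p_1 + p_2, 1)

n = 100_000
u = rng.uniform(0.0, 1.0, size=n)
# X = x_j where j is the smallest index with U <= p_1 + ... + p_j
idx = np.searchsorted(cum_probs, u)
x = values[idx]

print(np.bincount(idx, minlength=3) / n)  # empirical frequencies, close to (0.3, 0.5, 0.2)
```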
generate U
set j = 0, p = e^{−λ}, F = p
while U > F
    set p = λp/(j + 1), F = F + p, j = j + 1
set X = j
Questions: How much work does this take? What if λ is large? Can we find j more efficiently?
Answer (to last question): Yes by checking if j is close to λ first.
Further questions: Why might this be useful? How much work does this take?
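A direct Python translation of the pseudocode above; the choice λ = 2 and the sample size are arbitrary.

```python
import numpy as np

def poisson_inverse_transform(lam, rng):
    """One Poisson(lam) draw via the sequential search in the pseudocode above."""
    u = rng.uniform()
    j, p = 0, np.exp(-lam)
    F = p
    while u > F:
        p = lam * p / (j + 1)
        F += p
        j += 1
    return j

rng = np.random.default_rng(seed=5)
samples = [poisson_inverse_transform(2.0, rng) for _ in range(10_000)]
print(np.mean(samples))   # should be close to lambda = 2.0
```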
1. Generate U
2. Set X = x if F_X(x) = U, i.e., set X = F_X^{−1}(U)
We need to prove that this algorithm actually works! But this follows immediately since
P(X ≤ x) = P(F_X^{−1}(U) ≤ x) = P(U ≤ F_X(x)) = F_X(x)
as desired. This argument assumes F_X^{−1} exists but there is no problem even when F_X^{−1} does not exist. All we
have to do is
1. Generate U
2. Set X = min{x : F_X(x) ≥ U}.
This works for discrete and continuous random variables or mixtures of the two.
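As a simple illustration, here is a Python/NumPy sketch of the inverse transform method for the Exp(λ) distribution, where F_X^{−1} is available in closed form; the parameter value and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=6)

# X ~ Exp(lam) has F_X(x) = 1 - exp(-lam * x), so F_X^{-1}(u) = -log(1 - u) / lam.
lam = 2.0
n = 100_000
u = rng.uniform(0.0, 1.0, size=n)
x = -np.log(1.0 - u) / lam          # inverse transform; 1 - U may be replaced by U

print(np.mean(x))                   # should be close to 1 / lam = 0.5
```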
where c is a constant so that the density integrates to 1. How can we use this distribution?
2. The method is 1-to-1, i.e. one U (0, 1) variable produces one X variable. This property can be useful for
some variance reduction techniques.
so that we cannot even express F_X in closed form. Even if F_X is available in closed form, it may not be possible
to find F_X^{−1} in closed form. For example, suppose F_X(x) = x^5(1 + x)^3/8 for 0 ≤ x ≤ 1. Then we cannot
compute F_X^{−1}. One possible solution to these problems is to find F_X^{−1} numerically.
Composition Algorithm
1. Generate I that is distributed on the non-negative integers so that P(I = j) = pj . (How do we do this?)
The proof actually suggests that the composition approach might arise naturally from ‘sequential’ type
experiments. Consider the following example.
Example 9 (A Sequential Experiment)
Suppose we roll a die and let Y ∈ {1, 2, 3, 4, 5, 6} be the outcome. If Y = i, then we generate Z_i from the
distribution F_i and set X = Z_i.
generate U1
if U1 ≤ p1 then
set i = 1
else set i = 2
generate U2
/∗ Now generate X from Exp(λi ) ∗/
set X = −log(U_2)/λ_i
Example 11 (Splitting)
Suppose
f_X(x) = (1/5) · 1_{[−1,0]}(x) + (6/15) · 1_{[0,2]}(x).
How do we simulate a value of X using vertical splitting? How would horizontal splitting work?
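One possible answer for the vertical-splitting question, sketched in Python: write f_X as the mixture (1/5) × (U(−1,0) density) + (4/5) × (U(0,2) density) and use the composition approach. The seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def sample_x(rng):
    """Vertical splitting of f_X = (1/5) 1_[-1,0] + (6/15) 1_[0,2]:
    with prob 1/5 draw from U(-1, 0), with prob 4/5 = (6/15) * 2 draw from U(0, 2)."""
    if rng.uniform() <= 0.2:
        return rng.uniform(-1.0, 0.0)
    return rng.uniform(0.0, 2.0)

samples = np.array([sample_x(rng) for _ in range(100_000)])
print(np.mean(samples < 0))          # should be close to 1/5
```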
= P((Y ≤ x) ∩ B) / P(B).    (1)
We can compute P(B) as
P(B) = P( U ≤ f(Y)/(a g(Y)) ) = 1/a
while the numerator in (1) satisfies
P((Y ≤ x) ∩ B) = ∫_{−∞}^{∞} P( (Y ≤ x) ∩ B | Y = y ) g(y) dy
               = ∫_{−∞}^{∞} P( (Y ≤ x) ∩ {U ≤ f(Y)/(a g(Y))} | Y = y ) g(y) dy
               = ∫_{−∞}^{x} P( U ≤ f(y)/(a g(y)) ) g(y) dy    (why?)
               = F_X(x)/a
1. First choose g(y): let’s take g(y) = 1 for y ∈ [0, 1], i.e., Y ∼ U (0, 1)
2. Then find a. Recall that we must have
f(x)/g(x) ≤ a for all x,
which implies
60x³(1 − x)² ≤ a for all x ∈ [0, 1].
So take a = 3. It is easy to check that this value works. We then have the following algorithm.
Algorithm
generate Y ∼ U (0, 1)
generate U ∼ U (0, 1)
while U > 20Y³(1 − Y)²
    generate Y
    generate U
set X = Y
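A Python sketch of this acceptance-rejection algorithm; the sample size and seed are arbitrary choices. (Here f is the Beta(4,3) density, so the sample mean should be near 4/7 ≈ 0.571.)

```python
import numpy as np

rng = np.random.default_rng(seed=8)

def sample_beta43(rng):
    """Acceptance-rejection for f(x) = 60 x^3 (1-x)^2 on [0,1], with g = U(0,1) and a = 3."""
    while True:
        y = rng.uniform()            # candidate Y ~ g
        u = rng.uniform()            # independent U ~ U(0,1)
        if u <= 20.0 * y ** 3 * (1.0 - y) ** 2:   # accept if U <= f(Y) / (a g(Y))
            return y

samples = np.array([sample_beta43(rng) for _ in range(50_000)])
print(np.mean(samples))              # should be close to 4/7
```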
How Do We Choose a?
E[N ] = a, so clearly we would like a to be as small as possible. Usually, this is just a matter of calculus.
generate Y with density g(·)
generate U ∼ U(0, 1)
while U > f(Y)/(a g(Y))
    generate Y
    generate U
set X = Y
Generally, we would use this A-R algorithm when we can simulate Y efficiently.
Remark 1 The A-R algorithm is an important algorithm for generating random variables. Moreover it can be
used to generate samples from distributions that are only known up to a constant. It is very inefficient in
high-dimensions, however, which is why Markov Chain Monte Carlo (MCMC) algorithms are required.
2. Set X = Y1 + . . . + Yn
We briefly mentioned this earlier in Example 7 when we described how to generate a Gamma(λ, n) random
variable. The convolution method is not always the most efficient method. Why?
More generally, suppose we want to simulate a value of a random variable, X, and we know that
X ∼ g(Y1 , . . . , Yn )
for some random variables Yi and some function g(·). Note that the Yi ’s need not necessarily be IID. If we know
how to generate (Y1 , . . . , Yn ) then we can generate X by generating (Y1 , . . . , Yn ) and setting
X = g(Y1 , . . . , Yn ). We saw such an application in Example 7.
For example, if X ∼ N(0, 1) and Y ∼ χ²_n are independent, then

Z := X / √(Y/n)

has a t distribution with n degrees of freedom. Similarly, if Z ∼ N(0, 1), then

X := µ + σZ ∼ N(µ, σ²)
so that we need only concern ourselves with generating N(0, 1) random variables. One possibility for doing this
is to use the inverse transform method. But we would then have to use numerical methods since we cannot find
F_Z^{−1}(·) := Φ^{−1}(·) in closed form. Other approaches for generating N(0, 1) random variables include:
1. The Box-Muller method
2. The Polar method
3. Rational approximations.
There are many other methods such as the A-R algorithm that could also be used to generate N (0, 1) random
variables.
The Box-Muller Algorithm for Generating Two IID N(0, 1) Random Variables

generate U_1 and U_2 IID U(0, 1)
set X = √(−2 log(U_1)) cos(2πU_2) and Y = √(−2 log(U_1)) sin(2πU_2)

We now show that this algorithm does indeed produce two IID N(0, 1) random variables, X and Y.
Proof: We need to show that
f(x, y) = (1/√(2π)) exp(−x²/2) · (1/√(2π)) exp(−y²/2).
First, make a change of variables:
R := √(X² + Y²)
θ := tan^{−1}(Y/X)
so R and θ are the polar coordinates of (X, Y). To transform back, note X = R cos(θ) and Y = R sin(θ). Note
also that R = √(−2 log(U_1)) and θ = 2πU_2. Since U_1 and U_2 are IID, R and θ are independent. Clearly
θ ∼ U(0, 2π) so f_θ(θ) = 1/2π for 0 ≤ θ ≤ 2π. It is also easy to see that f_R(r) = r e^{−r²/2} for r ≥ 0, so that
f_{R,θ}(r, θ) = (1/2π) r e^{−r²/2},   0 ≤ θ ≤ 2π, r ≥ 0.
This implies
P(X ≤ x_1, Y ≤ y_1) = P(R cos(θ) ≤ x_1, R sin(θ) ≤ y_1) = ∫∫_A (1/2π) r e^{−r²/2} dr dθ    (2)
where A = {(r, θ) : r cos(θ) ≤ x_1, r sin(θ) ≤ y_1}. We now transform back to (x, y) coordinates with
x = r cos(θ) and y = r sin(θ), and note that dx dy = r dr dθ, i.e., the Jacobian of the transformation is r. We then use (2) to obtain
P(X ≤ x_1, Y ≤ y_1) = (1/2π) ∫_{−∞}^{x_1} ∫_{−∞}^{y_1} exp( −(x² + y²)/2 ) dy dx
                    = ( ∫_{−∞}^{x_1} (1/√(2π)) exp(−x²/2) dx ) ( ∫_{−∞}^{y_1} (1/√(2π)) exp(−y²/2) dy )
as required.
The Polar Algorithm for Generating Two IID N(0, 1) Random Variables
set S = 2
while S > 1
    generate U_1 and U_2 IID U(0, 1)
    set V_1 = 2U_1 − 1, V_2 = 2U_2 − 1 and S = V_1² + V_2²
set X = √(−2 log(S)/S) V_1 and Y = √(−2 log(S)/S) V_2
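A Python sketch of the polar algorithm above; the s == 0 guard is an extra safety check against log(0), not part of the pseudocode, and the seed and sample size are arbitrary.

```python
import numpy as np

def polar_normals(rng):
    """One pair of IID N(0,1) draws via the polar method above."""
    s = 2.0
    while s > 1.0 or s == 0.0:
        u1, u2 = rng.uniform(), rng.uniform()
        v1, v2 = 2.0 * u1 - 1.0, 2.0 * u2 - 1.0
        s = v1 ** 2 + v2 ** 2
    factor = np.sqrt(-2.0 * np.log(s) / s)
    return factor * v1, factor * v2

rng = np.random.default_rng(seed=9)
pairs = np.array([polar_normals(rng) for _ in range(50_000)])
print(pairs.mean(), pairs.std())     # should be close to 0 and 1
```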
The standard multivariate normal has µ = 0 and Σ = I_n, the n × n identity matrix. The PDF of X is given by
f(x) = (1/((2π)^{n/2} |Σ|^{1/2})) exp( −(1/2)(x − µ)ᵀ Σ^{−1}(x − µ) )    (3)
Recall again our partition of X into X_1 = (X_1, ..., X_k)ᵀ and X_2 = (X_{k+1}, ..., X_n)ᵀ. If we extend this
notation naturally so that
µ = [ µ_1 ]    and    Σ = [ Σ_11  Σ_12 ]
    [ µ_2 ]               [ Σ_21  Σ_22 ]
then we obtain the following results regarding the marginal and conditional distributions of X.
Marginal Distribution
The marginal distribution of a multivariate normal random vector is itself multivariate normal. In particular,
Xi ∼ MN(µi , Σii ), for i = 1, 2.
Conditional Distribution
Assuming Σ is positive definite, the conditional distribution of a multivariate normal distribution is also a
multivariate normal distribution. In particular,
X2 | X1 = x1 ∼ MN(µ2.1 , Σ2.1 )
Linear Combinations
Linear combinations of multivariate normal random vectors remain normally distributed with mean vector and
covariance matrix given by
E [AX + a] = AE [X] + a
Cov(AX + a) = A Cov(X) Aᵀ.
M = Uᵀ D U
where U is an upper triangular matrix and D is a diagonal matrix with positive diagonal elements. Since Σ is
symmetric positive-definite, we can therefore write
Σ = Uᵀ D U = (Uᵀ √D)(√D U) = (√D U)ᵀ (√D U).
The matrix C = √D U therefore satisfies Cᵀ C = Σ. It is called the Cholesky Decomposition of Σ.
We must be very careful in Matlab and R to pre-multiply Z by Cᵀ and not C. We have the following algorithm
for generating multivariate random vectors, X.
generate Z ∼ MN(0, I)
/∗ Now compute the Cholesky Decomposition ∗/
compute C such that Cᵀ C = Σ
set X = Cᵀ Z
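A Python/NumPy sketch of this algorithm with a hypothetical µ and Σ. Note that numpy.linalg.cholesky returns a lower-triangular matrix L with L Lᵀ = Σ, so with that convention one pre-multiplies Z by L (equivalently, post-multiplies row vectors by Lᵀ).

```python
import numpy as np

rng = np.random.default_rng(seed=10)

# Hypothetical mean vector and covariance matrix (Sigma must be symmetric positive definite).
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[1.0, 0.5, 0.5],
                  [0.5, 2.0, 0.3],
                  [0.5, 0.3, 1.5]])

L = np.linalg.cholesky(Sigma)         # lower-triangular L with L @ L.T = Sigma
n = 100_000
Z = rng.standard_normal(size=(n, 3))  # each row is an MN(0, I) vector
X = mu + Z @ L.T                      # each row is MN(mu, Sigma)

print(np.cov(X, rowvar=False))        # should be close to Sigma
```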
P(N(t) = r) = (λt)^r e^{−λt} / r!
For a Poisson process the numbers of arrivals in non-overlapping intervals are independent and the distribution
of the number of arrivals in an interval only depends on the length of the interval.
The Poisson process is good for modeling many phenomena including the emission of particles from a
radioactive source and the arrivals of customers to a queue. The ith inter-arrival time, Xi , is defined to be the
interval between the (i − 1)th and ith arrivals of the Poisson process, and it is easy to see that the Xi ’s are IID
∼ Exp(λ). In particular, this means we can simulate a Poisson process with intensity λ by simply generating the
inter-arrival times, Xi , where Xi ∼ Exp(λ). We have the following algorithm for simulating the first T time
units of a Poisson process:
set t = 0, I = 0
generate U
set t = t − log(U)/λ
while t < T
    set I = I + 1, S(I) = t
    generate U
    set t = t − log(U)/λ
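A Python sketch of this algorithm; λ = 2 and T = 10 are arbitrary choices.

```python
import numpy as np

def poisson_process_arrivals(lam, T, rng):
    """Arrival times of a rate-lam Poisson process on [0, T], via IID Exp(lam) inter-arrivals."""
    arrivals = []
    t = -np.log(rng.uniform()) / lam
    while t < T:
        arrivals.append(t)
        t -= np.log(rng.uniform()) / lam
    return np.array(arrivals)

rng = np.random.default_rng(seed=11)
S = poisson_process_arrivals(lam=2.0, T=10.0, rng=rng)
print(len(S))    # N(T) is Poisson(lam * T), so typically near 20
```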
Then it can be shown that N(t + s) − N(t) is a Poisson random variable with parameter m(t + s) − m(t), i.e.,
P( N(t + s) − N(t) = r ) = e^{−(m(t+s)−m(t))} ( m(t + s) − m(t) )^r / r!
where m(t) := ∫_0^t λ(u) du is the mean-value function of the non-homogeneous Poisson process.
Proposition 1 Let N (t) be a Poisson process with constant intensity λ. Suppose that an arrival that occurs
at time t is counted with probability p(t), independently of what has happened beforehand. Then the process of
counted arrivals is a non-homogeneous Poisson process with intensity λ(t) = λp(t).
Suppose now N (t) is a non-homogeneous Poisson process with intensity λ(t) and that there exists a λ such that
λ(t) ≤ λ for all t ≤ T . Then we can use the following algorithm, based on Proposition 1, to simulate N (t).
set t = 0, I = 0
generate U_1
set t = t − log(U_1)/λ
while t < T
    generate U_2
    if U_2 ≤ λ(t)/λ then
        set I = I + 1, S(I) = t
    generate U_1
    set t = t − log(U_1)/λ
Questions
1. Can you give a more efficient version of the algorithm when there exists a constant λ_min > 0 such that min_{0≤t≤T} λ(t) ≥ λ_min?
2. Can you think of another algorithm for simulating a non-homogeneous Poisson process that is not based on
thinning?
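A Python sketch of the thinning algorithm, using the hypothetical intensity λ(t) = 1 + sin(t), which is bounded above by λ = 2; T and the seed are arbitrary choices.

```python
import numpy as np

def thinned_poisson_arrivals(lam_t, lam_bar, T, rng):
    """Arrival times of a non-homogeneous Poisson process with intensity lam_t(t) <= lam_bar
    on [0, T], simulated by thinning a rate-lam_bar homogeneous process."""
    arrivals = []
    t = -np.log(rng.uniform()) / lam_bar
    while t < T:
        if rng.uniform() <= lam_t(t) / lam_bar:   # keep the arrival with prob lam(t)/lam_bar
            arrivals.append(t)
        t -= np.log(rng.uniform()) / lam_bar
    return np.array(arrivals)

rng = np.random.default_rng(seed=12)
S = thinned_poisson_arrivals(lambda t: 1.0 + np.sin(t), 2.0, 10.0, rng)
print(len(S))
```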
set t_0 = 0, B_{t_0} = 0
for i = 1 to n
    generate X ∼ N(0, t_i − t_{i−1})
    set B_{t_i} = B_{t_{i−1}} + X
Remark 3 It is very important that when you generate Bti+1 , you do so conditional on the value of Bti . If you
generate Bti and Bti+1 independently of one another then you are effectively simulating from different sample
paths of the Brownian motion. This is not correct! In fact when we generate (Bt1 , Bt2 , . . . , Btn ) we are actually
generating a random vector that does not consist of IID random variables.
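A Python sketch that simulates one Brownian path on an (arbitrary) equally spaced grid, generating each B_{t_i} conditional on B_{t_{i−1}} as Remark 3 requires.

```python
import numpy as np

rng = np.random.default_rng(seed=13)

n = 252                               # arbitrary number of grid points on [0, 1]
times = np.linspace(0.0, 1.0, n + 1)
B = np.zeros(n + 1)
for i in range(1, n + 1):
    dt = times[i] - times[i - 1]
    # B_{t_i} = B_{t_{i-1}} + N(0, t_i - t_{i-1}) increment
    B[i] = B[i - 1] + np.sqrt(dt) * rng.standard_normal()

print(B[-1])                          # B_1 ~ N(0, 1)
```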
The following properties of GBM follow immediately from the definition of BM:
1. Fix t_1, t_2, ..., t_n. Then X_{t_2}/X_{t_1}, X_{t_3}/X_{t_2}, ..., X_{t_n}/X_{t_{n−1}} are mutually independent.
2. For s > 0, log(X_{t+s}/X_t) ∼ N( (µ − σ²/2)s, σ²s ).
3. Xt is continuous w.p. 1.
Again, we call µ the drift and σ the volatility. If X ∼ GBM(µ, σ), then note that X_t has a lognormal
distribution. In particular, X_t ∼ LN( (µ − σ²/2)t, σ²t ). In Figure 1 we have plotted
some sample paths of Brownian and geometric Brownian motions.
Question: How would you simulate a sample path of GBM(µ, σ) at the fixed times 0 < t_1 < t_2 < ... < t_n?
Answer: Simulate log(Xti ) first and then take exponentials! (See below for more details.)
Figure 1: Sample paths of Brownian motion, B_t, and geometric Brownian motion (GBM), X_t = X_0 e^{(µ−σ²/2)t+σB_t}, over one year. Parameters for the GBM were X_0 = 100, µ = 10% and σ = 30%.
1. If Xt > 0, then Xt+s is always positive for any s > 0 so limited liability is not violated.
where B is a standard BM. The drift is µ, σ is the volatility, and S is therefore a GBM(µ, σ) process
that begins at S_0.
Questions
1. What is E[St ]?
2. What is E[St2 ]?
3. Show that S_{t+∆t} = S_t e^{(µ−σ²/2)∆t + σ(B_{t+∆t}−B_t)}.
Suppose now that we wish to simulate S ∼ GBM (µ, σ). Then we know
S_{t+∆t} = S_t e^{(µ−σ²/2)∆t + σ(B_{t+∆t}−B_t)}
so that we can simulate St+∆t conditional on St for any ∆t > 0 by simply simulating an N(0, ∆t) random
variable.
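A Python sketch that simulates a GBM path in exactly this way, simulating the log-increments and exponentiating; the parameters match those of Figure 1, and the grid size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=14)

X0, mu, sigma, T, n = 100.0, 0.10, 0.30, 1.0, 252
dt = T / n

Z = rng.standard_normal(n)
log_increments = (mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * Z
X = X0 * np.exp(np.cumsum(log_increments))    # X_{t_1}, ..., X_{t_n}

print(X[-1])                                  # X_T, lognormally distributed
```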
P_0 := C_0    (5)
P_{t_{i+1}} = P_{t_i} + (P_{t_i} − δ_{t_i} S_{t_i}) r∆t + δ_{t_i} ( S_{t_{i+1}} − S_{t_i} + q S_{t_i} ∆t )    (6)
where ∆t := ti+1 − ti is the length of time between re-balancing (assumed constant for all i), r is the annual
risk-free interest rate (assuming per-period compounding), q is the dividend yield and δti is the Black-Scholes
delta at time ti . This delta is a function of Sti and some assumed implied volatility, σimp say. Note that (5) and
(6) respect the self-financing condition. Stock prices are simulated assuming St ∼ GBM(µ, σ) so that
S_{t+∆t} = S_t e^{(µ−σ²/2)∆t + σ√∆t Z}
where Z ∼ N(0, 1). In the case of a short position in a call option with strike K and maturity T , the final
trading P&L is then defined as
P&L := PT − (ST − K)+ (7)
where PT is the terminal value of the replicating strategy in (6). In the Black-Scholes world we have σ = σimp
and the P&L will be 0 along every price path in the limit as ∆t → 0.
In practice, however, we do not know σ and so the market (and hence the option hedger) has no way to ensure
a value of σimp such that σ = σimp . This has interesting implications for the trading P&L and it means in
particular that we cannot exactly replicate the option even if all of the assumptions of Black-Scholes are correct.
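The following Python sketch implements the discrete hedging recursion (5)-(6) along simulated GBM paths and computes the P&L in (7). The drift µ, the number of re-balancing dates, the number of paths and the seed are illustrative choices not taken from the notes; the Black-Scholes price and delta formulas with dividend yield q are standard, and the sketch uses scipy.stats.norm.

```python
import numpy as np
from scipy.stats import norm

def bs_call_price_delta(S, K, r, q, sigma, tau):
    """Black-Scholes price and delta of a call with dividend yield q and time-to-maturity tau."""
    d1 = (np.log(S / K) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    price = S * np.exp(-q * tau) * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)
    delta = np.exp(-q * tau) * norm.cdf(d1)
    return price, delta

def hedging_pnl_one_path(rng, S0=100.0, K=100.0, T=0.5, r=0.01, q=0.01,
                         mu=0.05, sigma=0.30, sigma_imp=0.20, n_steps=250):
    """P&L of delta-hedging a short call along one simulated GBM(mu, sigma) path,
    following the recursion (5)-(6); mu and n_steps are illustrative choices."""
    dt = T / n_steps
    S = S0
    C0, delta = bs_call_price_delta(S, K, r, q, sigma_imp, T)
    P = C0                                                     # P_0 := C_0, eq. (5)
    for i in range(n_steps):
        Z = rng.standard_normal()
        S_next = S * np.exp((mu - 0.5 * sigma ** 2) * dt + sigma * np.sqrt(dt) * Z)
        # Self-financing re-balancing, eq. (6).
        P = P + (P - delta * S) * r * dt + delta * (S_next - S + q * S * dt)
        S = S_next
        if i < n_steps - 1:                                    # delta for the next period
            _, delta = bs_call_price_delta(S, K, r, q, sigma_imp, T - (i + 1) * dt)
    return P - max(S - K, 0.0)                                 # eq. (7): P_T - (S_T - K)^+

rng = np.random.default_rng(seed=15)
pnl = np.array([hedging_pnl_one_path(rng) for _ in range(1_000)])
print(pnl.mean())   # with sigma_imp < sigma the short-call hedger tends to lose money, cf. Figure 2(a)
```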
In Figure 2 we display histograms of the P&L in (7) that results from simulating 100,000 sample paths of the
underlying price process with S0 = K = $100. (Other parameters and details are given below the figure.) In the
case of the first histogram the true volatility was σ = 30% with σ_imp = 20% and the option hedger makes
(why?) substantial losses. In the case of the second histogram the true volatility was σ = 30% with σ_imp = 40%
and the option hedger makes (why?) substantial gains.
Clearly then this is a situation where substantial errors in the form of non-zero hedging P&L’s are made and this
can only be due to the use of incorrect model parameters. This example is intended⁴ to highlight the
importance of not just having a good model but also having the correct model parameters.
Note that the payoff from delta-hedging an option is in general path-dependent, i.e. it depends on the price
path taken by the stock over the entire time interval. In fact, it can be shown that the payoff from continuously
delta-hedging an option satisfies
P&L = ∫_0^T (S_t²/2) (∂²V_t/∂S²) ( σ_imp² − σ_t² ) dt    (8)
where V_t is the time t value of the option and σ_t is the realized instantaneous volatility at time t. We recognize
the term (S_t²/2) ∂²V_t/∂S² as the dollar gamma. It is always positive for a call or put option, but it goes to zero as the
4 We do acknowledge that this example is somewhat contrived in that if the true price dynamics really were GBM dynamics
then we could estimate σimp perfectly and therefore exactly replicate the option payoff (in the limit of continuous trading).
That said, it can also be argued that this example is not at all contrived: options traders in practice know the Black-Scholes
model is incorrect but still use the model to hedge and face the question of what is the appropriate value of σimp to use. If
they use a value that doesn’t match (in a general sense) some true “average” level of volatility then they will experience P&L
profiles of the form displayed in Figure 2.
(a) Delta-hedging P&L: true vol. = 30%, imp. vol. = 20%.  (b) Delta-hedging P&L: true vol. = 30%, imp. vol. = 40%.
Figure 2: Histogram of P&L from simulating 100k paths where we hedge a short call position with S_0 = K = $100, T = 6 months, true volatility σ = 30%, and r = q = 1%. A time step of dt = 1/2,000 was used so hedging P&L due to discretization error is negligible. The hedge ratio, i.e. delta, was calculated using the implied volatility that was used to calculate the initial option price.