Often it is convenient to work with numbers rather than events. To do this we use random variables:
Definition 1 A random variable (RV) is a function from the sample space to the real numbers.
Example 1: Toss a coin three times. The sample space is

Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

We can define a random variable X as the number of heads in the three tosses. Thus the random variable can take on the values 0, 1, 2, 3. So, for example, X(HHT) = 2.
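The following minimal sketch makes Example 1 concrete: the random variable X is literally a function on the sample space. The names omega and X are illustrative choices, not notation from the notes.

```python
from itertools import product

# Sketch of Example 1: a random variable is a function from the sample space
# to the real numbers; here X maps each outcome to its number of heads.
omega = ["".join(t) for t in product("HT", repeat=3)]   # sample space Ω

def X(outcome: str) -> int:
    """Number of heads in the three tosses."""
    return outcome.count("H")

print(omega)                           # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
print(X("HHT"))                        # 2
print(sorted({X(w) for w in omega}))   # values X can take on: [0, 1, 2, 3]
```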
Definition 2 The cumulative distribution function (CDF) of a random variable X is F_X(x) = P(X ≤ x). A CDF has the following properties:
1. lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1.
2. F(x) is nondecreasing.
3. F(x) is right-continuous.
Discrete random variables take on a countable (finite or infinite) number of values. Our example
above is clearly a discrete random variable. Continuous random variables take on values in some
continuous range.
Definition 4 The probability mass function (PMF) of a discrete random variable X is given by

f_X(x) = P(X = x).
In particular, P(X ≤ b) = ∑_{x ≤ b} f_X(x). Also, note that with A = (−∞, ∞), we have ∑_{x ∈ A} f_X(x) = P(ω ∈ Ω) = 1, so the probability mass function always adds up to unity.
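As a quick illustration, the sketch below computes the PMF of the random variable from Example 1 (number of heads in three tosses, assuming a fair coin so that each outcome has probability 1/8, an assumption not stated in Example 1) and checks that it sums to one.

```python
from fractions import Fraction
from itertools import product

# Sketch: build the PMF of X = number of heads in three fair tosses by adding
# the probability 1/8 of each outcome, then check it sums to one.
omega = ["".join(t) for t in product("HT", repeat=3)]
pmf = {}
for w in omega:
    x = w.count("H")
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 8)

for x in sorted(pmf):
    print(x, pmf[x])                              # 0 1/8, 1 3/8, 2 3/8, 3 1/8

print(sum(pmf.values()) == 1)                     # True: the PMF sums to unity
print(sum(p for x, p in pmf.items() if x <= 1))   # P(X ≤ 1) = 1/2
```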
For continuous random variables, which take on an uncountable number of values, there is an analogous idea:

Definition 5 The probability density function (PDF) of a continuous random variable X is a function f_X(x) such that P(X ∈ A) = ∫_A f_X(x) dx for all (measurable) sets A.
Note that the probability density function is not unique. Because we only care about integrals
over the probability density function, we can change its value at a countable number of points
without changing any of the associated probabilities, and thus without changing the distribution
of the random variable.
Also note that the probability density function integrates to one, in contrast to the probability mass function, which sums to one.¹
¹Since the PDF and PMF work similarly, many people just use "probability density function" to refer to both continuous and discrete cases. In measure-theoretic probability, there is no need to make a distinction between the two, so this saves a bit of excess terminology.
By the definition of the PDF, we have that

P(X ≤ x) = F_X(x) = ∫_{−∞}^{x} f_X(t) dt.
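A minimal numerical check of this relationship, using an assumed example density f_X(t) = 2t on [0, 1] (not one from the notes), whose CDF is F_X(x) = x²:

```python
import numpy as np

# Sketch: check F_X(x) = ∫_{-∞}^{x} f_X(t) dt numerically for the assumed
# density f_X(t) = 2t on [0, 1], which has CDF F_X(x) = x^2.
def pdf(t):
    return np.where((t >= 0.0) & (t <= 1.0), 2.0 * t, 0.0)

def cdf_by_integration(x, n=200_000):
    # Riemann-sum approximation of the integral of the PDF up to x.
    grid = np.linspace(-1.0, x, n)
    dt = grid[1] - grid[0]
    return float(np.sum(pdf(grid)) * dt)

for x in (0.25, 0.5, 0.9):
    print(x, round(cdf_by_integration(x), 4), x**2)   # the two values agree (approximately)
```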
Not all random variables taking on an uncountable number of values are continuous. There
are mixed random variables, which are partly continuous and partly discrete. For example, a
variable such as hours worked per year, or expenditures on cars, might have some positive mass
at 0, and be continuously distributed for values greater than zero. This creates some conceptual
difficulties in defining what the probability (mass or density) function is, but one can always
work with the CDF.
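As an illustration of working with the CDF of a mixed random variable, here is a sketch under assumed parameters (a point mass of 0.3 at zero and an Exp(1) continuous part; neither number comes from the notes):

```python
import numpy as np

# Sketch of a mixed random variable: with probability 0.3 the value is exactly 0,
# otherwise it is drawn from an Exp(1) distribution. The CDF captures both the
# point mass at 0 and the continuously distributed part in a single object.
def cdf(x, p_zero=0.3, lam=1.0):
    x = np.asarray(x, dtype=float)
    return np.where(x < 0.0, 0.0, p_zero + (1.0 - p_zero) * (1.0 - np.exp(-lam * x)))

rng = np.random.default_rng(1)
n = 1_000_000
is_zero = rng.random(n) < 0.3
draws = np.where(is_zero, 0.0, rng.exponential(1.0, size=n))

for x in (-0.5, 0.0, 0.5, 2.0):
    print(x, float(cdf(x)), float(np.mean(draws <= x)))   # theoretical vs simulated P(X ≤ x)
```

Note the jump in the CDF at zero: cdf(0.0) = 0.3, while cdf(x) = 0 for x < 0.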
Suppose we have a random variable X with distribution function F_X(x) and PMF or PDF f_X(x). Sometimes we are interested in the distribution of a function of this random variable, say Y = g(X). The distribution of this new random variable is defined by the equation

P(Y ∈ A) = P(g(X) ∈ A),

for sets A. We are interested in calculating the CDF and the PMF or PDF of this transformation. The following result gives the general case:
Result 1 Define the mapping g^{−1}(A) by

g^{−1}(A) = {x : g(x) ∈ A}.

Then

P(Y ∈ A) = P(X ∈ g^{−1}(A)).
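A Monte Carlo sketch of Result 1, with assumed ingredients not taken from the notes (X standard normal, g(x) = x², A = [1, 4]):

```python
import numpy as np

# Sketch: check P(Y ∈ A) = P(X ∈ g^{-1}(A)) for X standard normal, g(x) = x^2,
# and A = [1, 4], so that g^{-1}(A) = [-2, -1] ∪ [1, 2].
rng = np.random.default_rng(2)
x = rng.standard_normal(2_000_000)
y = x**2

p_y_in_A = np.mean((y >= 1.0) & (y <= 4.0))
p_x_in_ginv_A = np.mean(((x >= 1.0) & (x <= 2.0)) | ((x >= -2.0) & (x <= -1.0)))
print(p_y_in_A, p_x_in_ginv_A)   # identical: {Y ∈ A} and {X ∈ g^{-1}(A)} are the same event
```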
This result is not always that easy to apply. More useful results are available when the transformation satisfies particular conditions. Suppose that X is a random variable with PDF or PMF f_X(x). Let

𝒳 = {x : f_X(x) > 0}.
This is called (by CB) the support set or support of the distribution of X , and intuitively is the set
of values that the RV X can take on. (See below for some more discussion of supports.)
Result 2 Suppose g(x) is a strictly monotone function with inverse g^{−1}(·). Let 𝒴 = {y : g(x) = y for some x ∈ 𝒳}. Then

(i) If X is a discrete random variable with PMF f_X(x) on 𝒳, then Y is a discrete random variable with PMF

f_Y(y) = f_X(g^{−1}(y))

on 𝒴;

(ii) If X is a continuous random variable with PDF f_X(x) on 𝒳, then Y is a continuous random variable with PDF

f_Y(y) = f_X(g^{−1}(y)) · |∂g^{−1}(y)/∂y|

on 𝒴.
The argument for the continuous case goes as follows, assuming g(·) is increasing:

F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) = P(X ≤ g^{−1}(y)) = F_X(g^{−1}(y)).

Now use the chain rule to take the derivative with respect to y to get the desired result.
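The sketch below checks Result 2(ii) by simulation for an assumed example (X ~ Exp(1) and the strictly increasing map g(x) = √x, neither of which is taken from the notes): the formula gives f_Y(y) = 2y e^{−y²} for y > 0.

```python
import numpy as np

# Sketch: verify the change-of-variables formula for X ~ Exp(1), Y = g(X) = sqrt(X).
# Here g^{-1}(y) = y^2 and |d g^{-1}(y)/dy| = 2y, so f_Y(y) = exp(-y^2) * 2y, y > 0.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
y = np.sqrt(x)

grid = np.linspace(0.05, 3.0, 10)
f_y_formula = np.exp(-grid**2) * 2 * grid

# Histogram-based density estimate from the simulated sample, evaluated on the grid.
hist, edges = np.histogram(y, bins=200, range=(0.0, 3.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
f_y_empirical = np.interp(grid, centers, hist)

print(np.round(f_y_formula, 3))
print(np.round(f_y_empirical, 3))   # close to the values implied by the formula
```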
Example 3: Suppose X has a Poisson distribution with parameter λ. (This is an example of a parametric family of distributions.) That is, the PMF of X is

f_X(x) = { e^{−λ} λ^x / x!    if x = 0, 1, 2, . . .
         { 0                  otherwise,
for some positive λ. This is a distribution that is often used for counts of events, for example counts of emissions of radioactive particles, counts of job offers, number of patents, etc. Different choices for λ lead to different probability distributions, but they all have a similar form.
Now consider the transformation Y = g(X) = 2X. Then g^{−1}(y) = y/2 and, by Result 2(i), the PMF of Y is

f_Y(y) = { e^{−λ} λ^{y/2} / (y/2)!    if y = 0, 2, 4, . . .
         { 0                          otherwise.

(Note how the support 𝒴 reflects the transformation.)
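A small sketch of this calculation with an assumed value λ = 2.5 (the notes leave λ generic):

```python
import math

# Sketch: PMF of X ~ Poisson(lam) and of Y = 2X via Result 2(i): f_Y(y) = f_X(y / 2)
# for even, nonnegative y, and 0 otherwise.
lam = 2.5   # assumed value for illustration

def pmf_x(x: int) -> float:
    return math.exp(-lam) * lam**x / math.factorial(x) if x >= 0 else 0.0

def pmf_y(y: int) -> float:
    return pmf_x(y // 2) if y >= 0 and y % 2 == 0 else 0.0

print([round(pmf_x(x), 4) for x in range(5)])             # f_X on 0, 1, 2, 3, 4
print([round(pmf_y(y), 4) for y in range(0, 9, 2)])       # f_Y on 0, 2, 4, 6, 8 — same values
print(round(sum(pmf_y(y) for y in range(0, 200, 2)), 6))  # ≈ 1: f_Y is a proper PMF
```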
Example 4: Suppose X has an exponential distribution with PDF

f_X(x) = { e^{−x}    if x > 0
         { 0         otherwise.
The CDF for this distribution is F_X(x) = 1 − e^{−x} for x > 0. This distribution, or rather a generalization with f_X(x) = λ exp(−λx), for positive λ, is widely used for modelling durations such as survival times after heart transplants, or durations of unemployment spells.
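A quick simulation sketch checking the stated CDF (the unit rate λ = 1 is the one used in the example):

```python
import numpy as np

# Sketch: draw from the Exp(1) distribution of Example 4 and compare the
# empirical distribution function with the stated CDF F_X(x) = 1 - e^{-x}.
rng = np.random.default_rng(3)
draws = rng.exponential(scale=1.0, size=1_000_000)

for x in (0.5, 1.0, 2.0):
    print(x, float(np.mean(draws <= x)), 1.0 - np.exp(-x))   # empirical vs theoretical
```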
Now consider the transformation Y = g(X) = F_X(X) = 1 − e^{−X}, which maps (0, ∞) into (0, 1). The inverse is g^{−1}(y) = −ln(1 − y), so by Result 2(ii)

f_Y(y) = f_X(−ln(1 − y)) · |∂g^{−1}(y)/∂y| = (1 − y) · 1/(1 − y) = 1,    0 < y < 1.

This is known as a uniform distribution (the PDF is constant over the unit interval).
Example 5: Suppose X has a uniform distribution on the interval [−1, 1]. Suppose we are interested in the distribution of Y = g(X) = X². Because of the non-monotonicity of g we have to do this on a more ad hoc basis. In
this example we work directly through the cumulative distribution functions. First we calculate
the CDF for X :
F_X(x) = { 0             if x ≤ −1
         { (x + 1)/2     if −1 < x ≤ 1
         { 1             if 1 < x.
For y ≥ 0,

P(Y ≤ y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = F_X(√y) − F_X(−√y).
Hence
F_Y(y) = { 0                                   if y ≤ 0
         { (√y + 1)/2 − (−√y + 1)/2 = √y       if 0 < y ≤ 1
         { 1                                   if 1 < y.
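A simulation sketch of this non-monotone case, checking F_Y(y) = √y on (0, 1]:

```python
import numpy as np

# Sketch: X uniform on [-1, 1], Y = X^2; the derived CDF is F_Y(y) = sqrt(y) on (0, 1].
rng = np.random.default_rng(4)
x = rng.uniform(-1.0, 1.0, size=1_000_000)
y = x**2

for t in (0.04, 0.25, 0.81):
    print(t, float(np.mean(y <= t)), np.sqrt(t))   # empirical P(Y ≤ t) vs sqrt(t)
```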
The definition of the support 𝒳 of a random variable X given in CB (and above) is a bit loose, because when X is a continuous RV, its PDF f_X(x) is not unique. A more precise definition is that 𝒳 should be the set of all points x such that every open neighborhood of x has positive
probability. A consequence of this definition is that supports are closed sets. For example, the
uniform distribution in Example 4 has support [0, 1]. In practice the informal definition usually
suffices.