6  Two- and Higher-Dimensional Random Variables
In our study of random variables we have, so far, considered only the one-dimensional case. That is, the outcome of the experiment could be recorded as a single number x.
FIGURE 6.1
Note: As in the one-dimensional case, our concern will be not with the functional nature of X(s) and Y(s), but rather with the values which X and Y assume. We shall again speak of the range space of (X, Y), say R_{X×Y}, as the set of all possible values of (X, Y). In the two-dimensional case, for instance, the range space of (X, Y) will be a subset of the Euclidean plane. Each outcome (X(s), Y(s)) may be represented as a point (x, y) in the plane. We will again suppress the functional nature of X and Y by writing, for example, P(X ≤ a, Y ≤ b) instead of P[X(s) ≤ a, Y(s) ≤ b].
As in the one-dimensional case, we shall distinguish between two basic types of random variables: the discrete and the continuous random variables.
(1) p(x_i, y_j) ≥ 0 for all (x_i, y_j),
(2) ∑_{j=1}^∞ ∑_{i=1}^∞ p(x_i, y_j) = 1.    (6.1)
The function p, defined for all (x_i, y_j) in the range space of (X, Y), is called the probability function of (X, Y). The set of triples (x_i, y_j, p(x_i, y_j)), i, j = 1, 2, ..., is sometimes called the probability distribution of (X, Y).
Notes: (a) The analogy to a mass distribution is again clear. We have a unit mass distributed over a region in the plane. In the discrete case, all the mass is concentrated at a finite or countably infinite number of places, with mass p(x_i, y_j) located at (x_i, y_j). In the continuous case, mass is found at all points of some noncountable set in the plane.
(b) Condition 4 states that the total volume under the surface given by the equation
z = f(x, y) equals 1.
(c) As in the one-dimensional case, f(x, y) does not represent the probability of anything. However, for positive Δx and Δy sufficiently small, f(x, y) Δx Δy is approximately equal to P(x ≤ X ≤ x + Δx, y ≤ Y ≤ y + Δy).
(d) As in the one-dimensional case we shall adopt the convention that f(x, y) = 0 if (x, y) ∉ R. Hence we may consider f defined for all (x, y) in the plane, and requirement 4 above becomes ∫_{-∞}^{+∞} ∫_{-∞}^{+∞} f(x, y) dx dy = 1.
(e) We shall again suppress the functional nature of the two-dimensional random variable (X, Y). We should always be writing statements of the form P[X(s) = x_i, Y(s) = y_j], etc. However, if our shortcut notation is understood, no difficulty should arise.
(f) Again, as in the one-dimensional case, the probability distribution of (X, Y) is
actually induced by the probability of events associated with the original sample space S.
However, we shall be concerned mainly with the values of (X, Y) and hence deal directly
with the range space of (X, Y). Nevertheless, the reader should not lose sight of the fact
that if P(A) is specified for all events A ⊂ S, then the probability associated with events in the range space of (X, Y) is determined. That is, if B is in the range space of (X, Y), we have

P(B) = P[(X(s), Y(s)) ∈ B] = P[{s | (X(s), Y(s)) ∈ B}].
This latter probability refers to an event in S and hence determines the probability of B.
In terms of our previous terminology, B and {s I (X(s), Y(s)) E B} are equivalent events
(Fig. 6.2).
FIGURE 6.2
P(B) = ∑∑ p(x_i, y_j),

if (X, Y) is discrete, where the sum is taken over all indices (i, j) for which (x_i, y_j) ∈ B. And

P(B) = ∬_B f(x, y) dx dy,    (6.4)

if (X, Y) is continuous.
EXAMPLE 6.1. Two production lines manufacture a certain type of item. Suppose that the capacity (on any given day) is 5 items for line I and 3 items for line II. Assume that the number of items actually produced by either production line is a random variable. Let (X, Y) represent the two-dimensional random variable yielding the number of items produced by line I and line II, respectively. Table 6.1 gives the joint probability distribution of (X, Y). Each entry represents p(x_i, y_j) = P(X = x_i, Y = y_j). Thus, if B is the event that more items are produced by line I than by line II, then summing the appropriate entries of Table 6.1 we find that P(B) = 0.75.
TABLE 6.1

        X=0    X=1    X=2    X=3    X=4    X=5
Y=0     0      0.01   0.03   0.05   0.07   0.09
Y=1     0.01   0.02   0.04   0.05   0.06   0.08
Y=2     0.01   0.03   0.05   0.05   0.05   0.06
Y=3     0.01   0.02   0.04   0.06   0.06   0.05
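The computations above (and the marginal totals obtained in Example 6.4 below) are easy to verify numerically. The following sketch, assuming Python with NumPy is available, stores Table 6.1 as an array and computes the marginal distributions and P(B) for B = {X > Y}:

```python
import numpy as np

# Joint probability table p(x, y): rows indexed by y = 0..3, columns by x = 0..5 (Table 6.1).
p = np.array([
    [0.00, 0.01, 0.03, 0.05, 0.07, 0.09],   # y = 0
    [0.01, 0.02, 0.04, 0.05, 0.06, 0.08],   # y = 1
    [0.01, 0.03, 0.05, 0.05, 0.05, 0.06],   # y = 2
    [0.01, 0.02, 0.04, 0.06, 0.06, 0.05],   # y = 3
])

assert abs(p.sum() - 1.0) < 1e-12          # condition (6.1): the entries sum to 1

p_x = p.sum(axis=0)                        # marginal distribution of X (column sums)
q_y = p.sum(axis=1)                        # marginal distribution of Y (row sums)

# P(B) where B = {X > Y}: sum the entries with x strictly greater than y.
prob_B = sum(p[y, x] for y in range(4) for x in range(6) if x > y)

print(np.round(p_x, 2))    # 0.03, 0.08, 0.16, 0.21, 0.24, 0.28
print(np.round(q_y, 2))    # 0.25, 0.26, 0.25, 0.24
print(round(prob_B, 2))    # 0.75
```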
FIGURE 6.3 (the lines x = 5000 and x = 10,000)

∫_{-∞}^{+∞} ∫_{-∞}^{+∞} f(x, y) dx dy = (1/(5000)²) ∫_{5000}^{10,000} ∫_{5000}^{10,000} dx dy = 1,

and

P(B) = (1/(5000)²) ∫_{5000}^{10,000} [ ⋯ ] dy = 17/25.
f(x, y) = x² + xy/3,   0 ≤ x ≤ 1, 0 ≤ y ≤ 2,
        = 0,   elsewhere.
∫_{-∞}^{+∞} ∫_{-∞}^{+∞} f(x, y) dx dy = ∫_0^2 ∫_0^1 (x² + xy/3) dx dy
    = ∫_0^2 (1/3 + y/6) dy = [y/3 + y²/12]_0^2 = 2/3 + 1/3 = 1.
Let B = {X + Y ≥ 1}. (See Fig. 6.4.) We shall compute P(B) by evaluating 1 − P(B̄), where B̄ = {X + Y < 1}. Hence

P(B) = 1 − ∫_0^1 ∫_0^{1−x} (x² + xy/3) dy dx
     = 1 − ∫_0^1 [x²(1 − x) + x(1 − x)²/6] dx
     = 1 − 7/72 = 65/72.
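This probability is also easy to confirm numerically; the sketch below (assuming Python with SciPy is available) integrates the joint pdf directly over the region x + y ≥ 1:

```python
from scipy import integrate

def f(y, x):
    # Joint pdf f(x, y) = x^2 + x*y/3 on 0 <= x <= 1, 0 <= y <= 2.
    return x**2 + x * y / 3

# P(X + Y >= 1): for each x in [0, 1], integrate y from max(0, 1 - x) to 2.
prob, _ = integrate.dblquad(f, 0, 1, lambda x: max(0.0, 1 - x), lambda x: 2.0)
print(prob, 65 / 72)   # both approximately 0.9028
```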
In studying one-dimensional random variables, we found that F, the cumulative distribution function, played an important role. In the two-dimensional case we can again define a cumulative distribution function, as follows.

FIGURE 6.4
EXAMPLE 6.4. Let us again consider Example 6.1. In addition to the entries of Table 6.1, let us also compute the "marginal" totals, that is, the sums of the 6 columns and of the 4 rows of the table. (See Table 6.2.)
The probabilities appearing in the row and column margins represent the probability distribution of Y and of X, respectively. For instance, P(Y = 1) = 0.26, P(X = 3) = 0.21, etc. Because of the appearance of Table 6.2 we refer, quite naturally, to these as the marginal distributions of X and of Y.
TABLE 6.2

        X=0    X=1    X=2    X=3    X=4    X=5    Sum
Y=0     0      0.01   0.03   0.05   0.07   0.09   0.25
Y=1     0.01   0.02   0.04   0.05   0.06   0.08   0.26
Y=2     0.01   0.03   0.05   0.05   0.05   0.06   0.25
Y=3     0.01   0.02   0.04   0.06   0.06   0.05   0.24
Sum     0.03   0.08   0.16   0.21   0.24   0.28   1.00
Since the event X = x_i must occur with Y = y_j for some j, and can occur with Y = y_j for only one j, we have

p(x_i) = P(X = x_i) = ∑_{j=1}^∞ p(x_i, y_j).

The function p defined for x_1, x_2, ... represents the marginal probability distribution of X.
These pdf's correspond to the basic pdf's of the one-dimensional random variables X and Y, respectively. For example,

P(c ≤ X ≤ d) = ∫_c^d g(x) dx.
(The units have been adjusted in order to use values between 0 and 1.) The marginal pdf of X is given by
so that the constant equals 1/area(R). (We are assuming that R is a region with finite, nonzero area.) That is,

f(x, y) = 1/area(R),   (x, y) ∈ R.

We find that

f(x, y) = 6,   (x, y) ∈ R,
        = 0,   (x, y) ∉ R.

FIGURE 6.5

The marginal pdf's are then

g(x) = 6(x − x²),   0 ≤ x ≤ 1;
h(y) = 6(√y − y),   0 ≤ y ≤ 1.
EXAMPLE 6.7. Consider again Examples 6.1 and 6.4. Suppose that we want to evaluate the conditional probability P(X = 2 | Y = 2). According to the definition of conditional probability we have

P(X = 2 | Y = 2) = P(X = 2, Y = 2)/P(Y = 2) = 0.05/0.25 = 0.20.
We can carry out such a computation quite generally for the discrete case. We have

p(x_i | y_j) = P(X = x_i | Y = y_j) = p(x_i, y_j)/q(y_j)   if q(y_j) > 0,    (6.5)

q(y_j | x_i) = P(Y = y_j | X = x_i) = p(x_i, y_j)/p(x_i)   if p(x_i) > 0.    (6.6)
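Continuing the sketch begun after Table 6.1 (again assuming Python with NumPy), the conditional distribution of X given Y = 2 is obtained by dividing the Y = 2 row of the table by its row sum, exactly as in Eq. (6.5):

```python
import numpy as np

p = np.array([
    [0.00, 0.01, 0.03, 0.05, 0.07, 0.09],   # y = 0
    [0.01, 0.02, 0.04, 0.05, 0.06, 0.08],   # y = 1
    [0.01, 0.03, 0.05, 0.05, 0.05, 0.06],   # y = 2
    [0.01, 0.02, 0.04, 0.06, 0.06, 0.05],   # y = 3
])

q_y2 = p[2].sum()                 # q(2) = P(Y = 2) = 0.25
p_x_given_y2 = p[2] / q_y2        # p(x | y = 2), Eq. (6.5)

print(p_x_given_y2[2])            # P(X = 2 | Y = 2) = 0.05/0.25 = 0.20
print(p_x_given_y2.sum())         # a conditional distribution sums to 1
```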
FIGURE 6.6 (a) the graph of g(x); (b) the graph of h(y)
Note: For a given j, p(x_i | y_j) satisfies all the conditions for a probability distribution. We have p(x_i | y_j) ≥ 0 and also

∑_{i=1}^∞ p(x_i | y_j) = ∑_{i=1}^∞ p(x_i, y_j)/q(y_j) = q(y_j)/q(y_j) = 1.
g(x | y) = f(x, y)/h(y),   h(y) > 0,    (6.7)

h(y | x) = f(x, y)/g(x),   g(x) > 0.    (6.8)
Notes: (a) The above conditional pdf's satisfy all the requirements for a one-dimensional pdf. Thus, for fixed y, we have g(x | y) ≥ 0 and

∫_{-∞}^{+∞} g(x | y) dx = ∫_{-∞}^{+∞} f(x, y)/h(y) dx = h(y)/h(y) = 1.
An analogous computation may be carried out for h(y I x) . Hence Eqs. (6.7) and (6.8)
define pdf's on Rx and Ry, respectively.
(b) An intuitive interpretation of g(x | y) is obtained if we consider slicing the surface represented by the joint pdf f with the plane y = c, say. The intersection of the plane with the surface z = f(x, y) will result in a one-dimensional pdf, namely the pdf of X for Y = c. This will be precisely g(x | c).
(c) Suppose that (X, Y) represents the height and weight of a person, respectively. Let f be the joint pdf of (X, Y) and let g be the marginal pdf of X (irrespective of Y). Hence ∫_{5.8}^{6} g(x) dx would represent the probability of the event {5.8 ≤ X ≤ 6} irrespective of the weight Y. And ∫_{5.8}^{6} g(x | 150) dx would be interpreted as P(5.8 ≤ X ≤ 6 | Y = 150). Strictly speaking, this conditional probability is not defined in view of our previous convention with conditional probability, since P(Y = 150) = 0. However, we simply use the above integral to define this probability. Certainly on intuitive grounds this ought to be the meaning of this number.
g(x) = ∫_0^2 (x² + xy/3) dy = 2x² + 2x/3,   0 ≤ x ≤ 1,

h(y) = ∫_0^1 (x² + xy/3) dx = 1/3 + y/6,   0 ≤ y ≤ 2.

Hence

h(y | x) = f(x, y)/g(x) = (x² + xy/3)/(2x² + 2x/3) = (3x + y)/(6x + 2),   0 ≤ y ≤ 2, 0 ≤ x ≤ 1,

g(x | y) = f(x, y)/h(y) = (x² + xy/3)/(1/3 + y/6) = (6x² + 2xy)/(2 + y),   0 ≤ x ≤ 1, 0 ≤ y ≤ 2.

As a check,

∫_0^1 g(x | y) dx = ∫_0^1 (6x² + 2xy)/(2 + y) dx = (2 + y)/(2 + y) = 1   for all y.
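These manipulations are easy to check with a computer algebra system; the following sketch (assuming Python with SymPy is available) recomputes the marginals and verifies that each conditional pdf integrates to 1:

```python
import sympy as sp

x, y = sp.symbols("x y", nonnegative=True)
f = x**2 + x * y / 3                     # joint pdf on 0 <= x <= 1, 0 <= y <= 2

g = sp.integrate(f, (y, 0, 2))           # marginal pdf of X: 2*x**2 + 2*x/3
h = sp.integrate(f, (x, 0, 1))           # marginal pdf of Y: y/6 + 1/3

h_given_x = sp.simplify(f / g)           # h(y | x) = (3*x + y)/(6*x + 2)
g_given_y = sp.simplify(f / h)           # g(x | y) = (6*x**2 + 2*x*y)/(y + 2)

print(sp.simplify(sp.integrate(h_given_x, (y, 0, 2))))   # 1
print(sp.simplify(sp.integrate(g_given_y, (x, 0, 1))))   # 1
```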
all pairs of independent random variables. For the X's depend only on the characteristics of source 1 while the Y's depend on the characteristics of source 2, and
there is presumably no reason to assume that the two sources influence each other's
behavior in any way. When we consider the possible independence of X1 and X2,
however, the matter is not so clearcut. Is the number of particles emitted during
the second hour influenced by the number that was emitted during the first hour?
To answer this question we would have to obtain additional information about
the mechanism of emission. We could certainly not assume, a priori, that X1
and X2 are independent.
Let us now make the above intuitive notion of independence more precise.
Note: If we compare the above definition with that given for independent events,
the similarity is apparent: we are essentially requiring that the joint probability (or joint
pdf) can be factored. The following theorem indicates that the above definition is equiva
lent to another approach we might have taken.
EXAMPLE 6.10. Suppose that a machine is used for a particular task in the morning and for a different task in the afternoon. Let X and Y represent the number of times the machine breaks down in the morning and in the afternoon, respectively.
Table 6.3 gives the joint probability distribution of (X, Y).
An easy computation reveals that for all the entries in Table 6.3 we have

p(x_i, y_j) = p(x_i) q(y_j).

Thus X and Y are independent random variables. (See also Example 3.7, for comparison.)
TABLE 6.3
        X=0    X=1    X=2    q(y_j)
EXAMPLE 6.11. Let X and Y be the life lengths of two electronic devices. Suppose that their joint pdf is given by

f(x, y) = e^{−(x+y)},   x ≥ 0, y ≥ 0.

Since we can factor f(x, y) = e^{−x} e^{−y}, the independence of X and Y is established.
{(x, y) | 0 ≤ x ≤ y ≤ 1}
Note: From the definition of the marginal probability distribution (in either the discrete or the continuous case) it is clear that the joint probability distribution determines, uniquely, the marginal probability distribution. That is, from a knowledge of the joint pdf f, we can obtain the marginal pdf's g and h. However, the converse is not true! That is, in general, a knowledge of the marginal pdf's g and h does not determine the joint pdf f. Only when X and Y are independent is this true, for in this case we have f(x, y) = g(x)h(y).
The following result indicates that our definition of independent random vari
ables is consistent with our previous definition of independent events.
a subset of R_Y, the range space of Y.) Then, if X and Y are independent
random variables, we have P(A n B) = P(A)P(B).
experiment and each of which assigns a real number to every s ∈ S, thus yielding the two-dimensional vector (X(s), Y(s)).
Let us now consider Z = H₁(X, Y), a function of the two random variables
X = 0, Y = 2 or X = 0, Y = 3 or X = 1, Y = 0 or X = 2, Y = 0 or X = 3, Y = 0 or X = 4, Y = 0 or X = 5, Y = 0. Hence P(U = 0) = 0.28. The rest of the probabilities associated with U may be obtained in a similar way. Hence the probability distribution of U may be summarized as follows: u: 0, 1, 2, 3; P(U = u): 0.28, 0.30, 0.25, 0.17. The probability distribution of the random variables V and W as defined above may be obtained in a similar way. (See Problem 6.9.)
If (X, Y) is a continuous two-dimensional random variable and if Z = H 1 ( X, Y)
is a continuous function of (X, Y), then Z will be a continuous (one-dimensional)
random variable and the problem of finding its pdf is somewhat more involved.
In order to solve this problem we shall need a theorem which we state and discuss
below. Before doing this, let us briefly outline the basic idea.
In finding the pdf of Z = H₁(X, Y) it is often simplest to introduce a second random variable, say W = H₂(X, Y), and first obtain the joint pdf of Z and W, say k(z, w). From a knowledge of k(z, w) we can then obtain the desired pdf of Z, say g(z), by simply integrating k(z, w) with respect to w. That is,

g(z) = ∫_{-∞}^{+∞} k(z, w) dw.

The remaining problems are (1) how to find the joint pdf of Z and W, and (2)
how to choose the appropriate random variable W = H2(X, Y). To resolve the
latter problem, let us simply state that we usually make the simplest possible choice
for W. In the present context, W plays only an intermediate role, and we are not
really interested in it for its own sake. In order to find the joint pdf of Z and W
we need Theorem 6.3.
Then the joint pdf of (Z, W), say k(z, w), is given by the following expression: k(z, w) = f[G₁(z, w), G₂(z, w)] |J(z, w)|, where J(z, w) is the following 2 × 2 determinant:

J(z, w) = | ∂x/∂z   ∂x/∂w |
          | ∂y/∂z   ∂y/∂w |

This determinant is called the Jacobian of the transformation (x, y) → (z, w) and is sometimes denoted by ∂(x, y)/∂(z, w). We note that k(z, w) will be nonzero for those values of (z, w) corresponding to values of (x, y) for which f(x, y) is nonzero.
FIGURE 6.8
Notes: (a) Although we shall not prove this theorem, we will at least indicate what needs to be shown and where the difficulties lie. Consider the joint cdf of the two-dimensional random variable (Z, W), say
Since f is assumed to be known, the integral on the right-hand side can be evaluated.
Differentiating it with respect to z and w will yield the required pdf. In most texts on
advanced calculus it is shown that these techniques lead to the result as stated in the
above theorem.
(b) Note the striking similarity between the above result and the result obtained in
the one-dimensional case treated in the previous chapter. (See Theorem 5.1.) The
monotonicity requirement for the function y = H(x) is replaced by the assumption that
the correspondence between (x, y) and (z, w) is one to one. The differentiability condition is replaced by certain assumptions about the partial derivatives involved. The
final solution obtained is also very similar to the one obtained in the one-dimensional
case: the variables x and y are simply replaced by their equivalent expressions in terms
of z and w, and the absolute value of dx/dy is replaced by the absolute value of the
Jacobian.
EXAMPLE 6.13. Suppose that we are aiming at a circular target of radius one which has been placed so that its center is at the origin of a rectangular coordinate system (Fig. 6.9). Suppose that the coordinates (X, Y) of the point of impact are uniformly distributed over the circle. That is,

f(x, y) = 1/π,   if (x, y) lies inside (or on) the circle,
        = 0,   elsewhere.

FIGURE 6.9   FIGURE 6.10

Suppose that we are interested in the random variable R representing the distance from the origin. (See Fig. 6.10.) That is, R = √(X² + Y²). We shall find the pdf of R, say g, as follows: Let Φ = tan⁻¹(Y/X). Hence X = H₁(R, Φ) and Y = H₂(R, Φ), where x = H₁(r, φ) = r cos φ and y = H₂(r, φ) = r sin φ. (We are simply introducing polar coordinates.)
The Jacobian is

J = | ∂x/∂r   ∂x/∂φ |   =   | cos φ   −r sin φ |   =   r cos²φ + r sin²φ = r.
    | ∂y/∂r   ∂y/∂φ |       | sin φ    r cos φ |

Under the above transformation the unit circle in the xy-plane is mapped into the rectangle in the φr-plane in Fig. 6.11. Hence the joint pdf of (Φ, R) is given by

g(φ, r) = r/π,   0 ≤ r ≤ 1,  0 ≤ φ ≤ 2π.
Note: This example points out the importance of obtaining a precise representation
of the region of possible values for the new random variables introduced.
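Integrating the joint pdf g(φ, r) = r/π over 0 ≤ φ ≤ 2π gives the marginal pdf of R as 2r, 0 ≤ r ≤ 1, which is easy to check by simulation. The sketch below (assuming Python with NumPy) samples points uniformly in the unit circle by rejection and compares the empirical distribution of R with the implied cdf r²:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample points uniformly over the unit circle by rejection from the square [-1, 1]^2.
pts = rng.uniform(-1, 1, size=(200_000, 2))
pts = pts[(pts ** 2).sum(axis=1) <= 1.0]

r = np.sqrt((pts ** 2).sum(axis=1))        # R = sqrt(X^2 + Y^2)

# If the pdf of R is 2r on [0, 1], then P(R <= r0) = r0^2.
for r0 in (0.25, 0.5, 0.75):
    print(r0, (r <= r0).mean(), r0 ** 2)   # empirical vs. theoretical cdf
```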
q(z) = ∫_{-∞}^{+∞} f(u, z/u) (1/|u|) du.    (6.9)
Note: In evaluating the above integral we may use the fact that
EXAMPLE 6.14. Suppose that we have a circuit in which both the current I and the resistance R vary in some random way. Specifically, assume that I and R are independent continuous random variables with the following pdf's:

g(i) = 2i,   0 ≤ i ≤ 1;     h(r) = r²/9,   0 ≤ r ≤ 3.

Of interest is the voltage E = IR; its pdf is found to be p(e) = (2e/9)(3 − e), 0 ≤ e ≤ 3.
q(z) = ∫_{-∞}^{+∞} g(vz) h(v) |v| dv.    (6.10)

t(z, v) = g(vz) h(v) |v|.

Integrating this joint pdf with respect to v yields the required marginal pdf of Z.
EXAMPLE 6.15. Let X and Y represent the life lengths of two light bulbs manufactured by different processes. Assume that X and Y are independent random variables with the pdf's f and g, respectively, where
Of interest might be the random variable X/Y, representing the ratio of the two life lengths. Let q be the pdf of Z.
By Theorem 6.5 we have q(z) = ∫_{-∞}^{+∞} g(vz) h(v) |v| dv. Since X and Y can assume only nonnegative quantities, the above integration need only be carried out over the positive values of the variable of integration. In addition, the integrand will be positive only when both the pdf's appearing are positive. This implies that we must have v ≥ 0 and vz ≥ 0. Since z > 0, these inequalities imply that v ≥ 0.
q(z) = 2/(z + 2)²,   z ≥ 0.
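The stated result q(z) = 2/(z + 2)² is exactly what Eq. (6.10) yields if one takes, for example, f(x) = e^{−x}, x ≥ 0, for X and h(y) = 2e^{−2y}, y ≥ 0, for Y; assuming those pdf's (an assumption made only for this sketch), a quick Monte Carlo check in Python with NumPy is:

```python
import numpy as np

rng = np.random.default_rng(1)

x = rng.exponential(scale=1.0, size=500_000)   # assumed pdf f(x) = exp(-x)
y = rng.exponential(scale=0.5, size=500_000)   # assumed pdf h(y) = 2*exp(-2y)
z = x / y

# If q(z) = 2/(z + 2)^2 for z >= 0, the cdf is Q(z0) = z0/(z0 + 2).
for z0 in (1.0, 2.0, 5.0):
    print(z0, (z <= z0).mean(), z0 / (z0 + 2))   # empirical vs. theoretical cdf
```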
(a) f(x₁, ..., x_n) ≥ 0 for all (x₁, ..., x_n).
(b) ∫_{-∞}^{+∞} ⋯ ∫_{-∞}^{+∞} f(x₁, ..., x_n) dx₁ ⋯ dx_n = 1.
∫_{-∞}^{+∞} ∫_{-∞}^{+∞} f(x₁, x₂, x₃) dx₁ dx₂ = g(x₃),

where g is the marginal pdf of the one-dimensional random variable X₃, while
where h represents the joint pdf of the two-dimensional random variable (X 1, X 2),
etc. The concept of independent random variables is also extended in a natural
way. We say that (X₁, ..., X_n) are independent random variables if and only if their joint pdf f(x₁, ..., x_n) can be factored into g₁(x₁) ⋯ g_n(x_n).
There are many situations in which we wish to consider n-dimensional random
variables. We shall give a few examples.
(a) Suppose that we study the pattern of precipitation due to a particular storm
system. If we have a network of, say, 5 observing stations and if we let Xi be the
rainfall at station i due to a particular frontal system, we might wish to consider
the five-dimensional random variable (X₁, X₂, X₃, X₄, X₅).
(b) One of the most important applications of n-dimensional random variables
occurs when we deal with repeated measurements on some random variable X.
Suppose that information about the life length, say X, of an electron tube is required. A large number of these tubes are produced by a certain manufacturer, and we test n of these. Let X_i be the life length of the ith tube, i = 1, ..., n. Hence
Problems of this type are studied at a more advanced level. (An excellent reference
for this subject area is "Stochastic Processes" by Emanuel Parzen, Holden-Day,
San Francisco, 1962.)
Note: In some of our discussion we have referred to the concept of "n-space." Let us
summarize a few of the basic ideas required.
With each real number x we may associate a point on the real number line, and con
versely. Similarly, with each pair of real numbers (x₁, x₂) we may associate a point in
the rectangular coordinate plane, and conversely. Finally, with each set of three real
numbers (x1, x2, x3), we may associate a point in the three-dimensional, rectangular
coordinate space, and conversely.
In many of the problems with which we are concerned we deal with a set of n real
numbers, (x1, x2, ... , xn), also called an n-tuple. Although we cannot draw any sketches
∬_A f(x, y) dx dy,

where A is a region in the (x, y)-plane, then the extension of this concept to

∫⋯∫_R f(x₁, ..., x_n) dx₁ ⋯ dx_n,

where R is a region in n-space, should be clear. If f represents the joint pdf of the two-dimensional random variable (X, Y), then

∬_A f(x, y) dx dy

represents P[(X, Y) ∈ A]. Similarly, if f represents the joint pdf of (X₁, ..., X_n), then

∫⋯∫_R f(x₁, ..., x_n) dx₁ ⋯ dx_n

represents P[(X₁, ..., X_n) ∈ R].
PROBLEMS
6.1. Suppose that the following table represents the joint probability distribution of
the discrete random variable (X, Y). Evaluate all the marginal and conditional dis
tributions.
        X=1    X=2    X=3
Y=1     1/12   1/6    0
Y=2     0      1/9    1/5
Y=3     1/18   1/4    2/15
6.2. Suppose that the two-dimensional random variable (X, Y) has joint pdf

f(x, y) = kx(x − y),   0 < x < 2,  −x < y < x,
        = 0,   elsewhere.
6.3. Suppose that the joint pdf of the two-dimensional random variable (X, Y) is
given by
= 0, elsewhere.
(a) P(X > 1/2); (b) P(Y < X); (c) P(Y < 1/2 | X < 1/2).
6.4. Suppose that two cards are drawn at random from a deck of cards. Let X be
the number of aces obtained and let Y be the number of queens obtained.
(a) Obtain the joint probability distribution of (X, Y).
(b) Obtain the marginal distribution of X and of Y.
(c) Obtain the conditional distribution of X (given Y) and of Y (given X).
6.5. For what value of k is f(x, y) = ke^{−(x+y)} a joint pdf of (X, Y) over the region 0 < x < 1, 0 < y < 1?
6.6. Suppose that the continuous two-dimensional random variable (X, Y) is uniformly distributed over the square whose vertices are (1, 0), (0, 1), (−1, 0), and (0, −1). Find the marginal pdf's of X and of Y.
6.7. Suppose that the dimensions, X and Y, of a rectangular metal plate may be considered to be independent continuous random variables with the following pdf's:

X:  g(x) = x − 1,   1 < x < 2,
         = −x + 3,   2 < x < 3,
         = 0,   elsewhere.

Y:  h(y) = 1/2,   2 < y < 4,
         = 0,   elsewhere.
f(x) = 1000/x²,   x > 1000,
     = 0,   elsewhere.
6.12. The intensity of light at a given point is given by the relationship I = C/D², where C is the candlepower of the source and D is the distance that the source is from the given point. Suppose that C is uniformly distributed over (1, 2), while D is a continuous random variable with pdf f(d) = e^{−d}, d > 0. Find the pdf of I, if C and D are independent. [Hint: First find the pdf of D² and then apply the results of this chapter.]
6.13. When a current I (amperes) flows through a resistance R (ohms), the power generated is given by W = I²R (watts). Suppose that I and R are independent random variables with the following pdf's.
0, elsewhere.
Determine the pdf of the random variable W and sketch its graph.
bx + c, then −b/2a represents the value at which a relative maximum or relative minimum occurs.
In the nondeterministic or random mathematical models which we have been
considering, parameters may also be used to characterize the probability distribution. With each probability distribution we may associate certain parameters
which yield valuable information about the distribution (just as the slope of a
line yields valuable information about the linear relationship it represents).
EXAMPLE 7.1. Suppose that X is a continuous random variable with pdf f(x) = ke^{−kx}, x ≥ 0. To check that this is a pdf, note that ∫_0^∞ ke^{−kx} dx = 1 for all k > 0, and that ke^{−kx} > 0 for k > 0. This distribution is called an exponential
distribution, which we shall study in greater detail later. It is a particularly useful
distribution for representing the life length, say X, of certain types of equipment
or components. The interpretation of k, in this context, will also be discussed
subsequently.
EXAMPLE 7.2. Assume that items are produced indefinitely on an assembly line.
The probability of an item being defective is p, and this value is the same for all
items. Suppose also that the successive items are defective (D) or nondefective
(N) independently of each other. Let the random variable X be the number of
items inspected until the first defective item is found. Thus a typical outcome of the
∑_{k=1}^∞ p(1 − p)^{k−1} = p · [1/(1 − (1 − p))] = 1,   provided 0 < |1 − p| < 1.

Thus the parameter p may be any number satisfying 0 < p < 1.
EXAMPLE 7.3. A wire cutting machine cuts wire to a specified length. Due to certain inaccuracies of the cutting mechanism, the length of the cut wire (in inches), say X, may be considered as a uniformly distributed random variable over [11.5, 12.5]. The specified length is 12 inches. If 11.7 ≤ X < 12.2, the wire can be sold for a profit of $0.25. If X ≥ 12.2, the wire can be recut, and an eventual profit of $0.10 is realized. And if X < 11.7, the wire is discarded with a loss of $0.02. An easy computation shows that P(X ≥ 12.2) = 0.3, P(11.7 ≤ X < 12.2) = 0.5, and P(X < 11.7) = 0.2.
Suppose that a large number of wire specimens are cut, say N. Let N_S be the number of specimens for which X < 11.7, N_R the number of specimens for which 11.7 ≤ X < 12.2, and N_L the number of specimens for which X ≥ 12.2. Hence the total profit realized from the production of the N specimens equals T = N_S(−0.02) + N_R(0.25) + N_L(0.10). The total profit per wire cut, say W, equals W = (N_S/N)(−0.02) + (N_R/N)(0.25) + (N_L/N)(0.10). (Note that W is a random variable, since N_S, N_R, and N_L are random variables.)
random variable, since Ns, NR, and N1, are random variables.)
We have already mentioned that the relative frequency of an event is close to
the probability of that event if the number of repetitions on which the relative
frequency is based is large. (We shall discuss this more precisely in Chapter 12.)
Hence, if N is large, we would expect Ns/N to be close to 0.2, NR/N to be close to
0.5, and NL/N to be close to 0.3. Therefore, for large N, W could be approximated
as follows:
W ≈ (0.2)(−0.02) + (0.5)(0.25) + (0.3)(0.10) = $0.151.
Thus, if a large number of wires were produced, we would expect to make a profit
of $0.151 per wire. The number 0.151 is called the expected value of the random
variable W.
if the series ∑_{i=1}^∞ x_i p(x_i) converges absolutely, i.e., if ∑_{i=1}^∞ |x_i| p(x_i) < ∞.
illustrates, strikingly, that E(X) is not the outcome we would expect when X is observed a single time. In fact, in the above situation, E(X) = 3.5 is not even a possible value for X.
Under fairly general conditions the arithmetic mean will be close to E(X) in a probabilistic sense. For example, in the above situation, if we were to throw the die a large number of times and then compute the arithmetic mean of the various outcomes, we would expect this average to become closer to 3.5 the more often the die were tossed.
(c) We should note the similarity between the notion of expected value as defined above (particularly if X may assume only a finite number of values) and the notion of the average of a set of numbers, say z_1, ..., z_n. We usually define z̄ = (1/n) ∑_{i=1}^n z_i as the arithmetic mean of the numbers z_1, ..., z_n. Suppose, furthermore, that we have numbers z′_1, ..., z′_k, where z′_i occurs n_i times, with ∑_{i=1}^k n_i = n. Letting f_i = n_i/n, so that ∑_{i=1}^k f_i = 1, we may write

z̄ = (1/n) ∑_{i=1}^k n_i z′_i = ∑_{i=1}^k f_i z′_i.
EXAMPLE 7.4. A manufacturer produces items such that 10 percent are defective and 90 percent are nondefective. If a defective item is produced, the manufacturer loses $1 while a nondefective item brings a profit of $5. If X is the net profit per item, then X is a random variable whose expected value is computed as E(X) = −1(0.1) + 5(0.9) = $4.40. Suppose that a large number of such items are produced. Then, since the manufacturer will lose $1 about 10 percent of the time and earn $5 about 90 percent of the time, he will expect to gain about $4.40 per item in the long run.
E(X) = np.

Proof: Since P(X = k) = [n!/(k!(n − k)!)] p^k (1 − p)^{n−k}, we have

E(X) = ∑_{k=0}^n k [n!/(k!(n − k)!)] p^k (1 − p)^{n−k} = ∑_{k=1}^n [n!/((k − 1)!(n − k)!)] p^k (1 − p)^{n−k}

(since the term with k = 0 equals zero). Let s = k − 1 in the above sum. As k assumes values from one through n, s assumes values from zero through (n − 1). Replacing k everywhere by (s + 1) we obtain

E(X) = ∑_{s=0}^{n−1} [n!/(s!(n − 1 − s)!)] p^{s+1} (1 − p)^{n−1−s} = np ∑_{s=0}^{n−1} [(n − 1)!/(s!(n − 1 − s)!)] p^s (1 − p)^{n−1−s}.

The sum in the last expression is simply the sum of the binomial probabilities with n replaced by (n − 1) [that is, (p + (1 − p))^{n−1}] and hence equals one. This establishes the result.
Note: The above result certainly corresponds to our intuitive notion. For suppose
that the probability of some event A is, say 0.3, when an experiment is performed. If we
repeat this experiment, say 100 times, we would expect A to occur about 100(0.3) = 30
times. The concept of expected value, introduced above for the discrete random variable,
will shortly be extended to the continuous case.
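A direct numerical check of Theorem 7.1 (a sketch assuming Python is available) simply evaluates the defining sum for a few choices of n and p:

```python
from math import comb

def binomial_mean(n, p):
    # Evaluate E(X) = sum_k k * C(n, k) * p^k * (1-p)^(n-k) directly.
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

for n, p in [(10, 0.3), (25, 0.5), (100, 0.07)]:
    print(n, p, binomial_mean(n, p), n * p)   # agrees with np (up to rounding)
```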
Again it may happen that this (improper) integral does not converge. Hence we say that E(X) exists if and only if

∫_{-∞}^{+∞} |x| f(x) dx  is finite.
Note: We should observe the analogy between the expected value of a random variable and the concept of "center of mass" in mechanics. If a unit mass is distributed along the line at the discrete points x_1, ..., x_n, ... and if p(x_i) is the mass at x_i, then we see that ∑_{i=1}^∞ x_i p(x_i) represents the center of mass (about the origin). Similarly, if a unit mass is distributed continuously over a line, and if f(x) represents the mass density at x, then ∫_{-∞}^{+∞} x f(x) dx may again be interpreted as the center of mass. In the above sense, E(X) can represent "a center" of the probability distribution. Also, E(X) is sometimes called a measure of central tendency and is in the same units as X.
FIGURE 7.1 (the pdf f(x), with x = 1500 and x = 3000 marked)
EXAMPLE 7.6. Let the random variable X be defined as follows. Suppose that
X is the time (in minutes) during which electrical equipment is used at maximum
f(x) = x/(1500)²,   0 ≤ x ≤ 1500,
     = −(x − 3000)/(1500)²,   1500 ≤ x ≤ 3000,
     = 0,   elsewhere.

Thus

E(X) = ∫_{-∞}^{+∞} x f(x) dx = (1/(1500)²) [∫_0^{1500} x² dx − ∫_{1500}^{3000} x(x − 3000) dx] = 1500 minutes.
EXAMPLE 7.7. The ash content in coal (percentage), say X, may be considered as a continuous random variable with the following pdf: f(x) = x²/4875, 10 ≤ x ≤ 25. Hence E(X) = (1/4875) ∫_{10}^{25} x³ dx ≈ 19.5 percent. Thus the expected ash content in the particular coal specimen being considered is 19.5 percent.
Theorem 7.2. Let X be uniformly distributed over the interval [a, b]. Then

E(X) = (a + b)/2.

Proof:

E(X) = ∫_a^b x · [1/(b − a)] dx = [1/(b − a)] (x²/2) |_a^b = (a + b)/2.

(Observe that this represents the midpoint of the interval [a, b], as we would expect intuitively.)
of X. For example, how do we express Eq. (7.1) in terms of the outcomes s ∈ S, assuming S to be finite? Since x_i = X(s) for some s ∈ S, we may write

E(X) = ∑_{s∈S} X(s) P(s),

where P(s) is the probability of the event {s} ⊂ S.
sists of classifying three items as defective (D) or nondefective (N), a sample space for
this experiment would be
E(X) = ∑_{s∈S} X(s) P(s)
     = 0(1/8) + 1(1/8) + 1(1/8) + 1(1/8) + 2(1/8) + 2(1/8) + 2(1/8) + 3(1/8)
     = 3/2.
Of course, this result could have been obtained more easily by applying Eq. (7.1) directly.
However, it is well to remember that in order to use Eq. (7.1) we needed to know the numbers p(x_i), which in turn meant that a computation such as the one used above had to be
carried out. The point is that once the probability distribution over Rx is known [in this
case the values of the numbers p(x;)], we can suppress the functional relationship between
Rx and S.
E(Y) = ∑_{i=1}^∞ y_i q(y_i).    (7.4)
Note: Of course, these definitions are completely consistent with the previous definition given for the expected value of a random variable. In fact, the above simply represents a restatement in terms of Y. One "disadvantage" of applying the above definition
in order to obtain E(Y) is that the probability distribution of Y (that is, the probability
distribution over the range space Ry) is required. We discussed, in the previous chapter,
methods by which we may obtain either the point probabilities q(y;) or g, the pdf of Y.
However, the question arises as to whether we can obtain E(Y) without first finding the
probability distribution of Y, simply from the knowledge of the probability distribution
of X. The answer is in the affirmative as the following theorem indicates.
E(Y) = E(H(X)) = ∑_{i=1}^∞ H(x_i) p(x_i).    (7.6)

(b) If X is a continuous random variable with pdf f, we have

E(Y) = E(H(X)) = ∫_{-∞}^{+∞} H(x) f(x) dx.    (7.7)
Note: This theorem makes the evaluation of E(Y) much simpler, for it means that we
need not find the probability distribution of Y in order to evaluate E(Y). The knowledge
of the probability distribution of X suffices.
Proof: [We shall only prove Eq. (7.6). The proof of Eq. (7.7) is somewhat more intricate.] Consider the sum ∑_{i=1}^∞ H(x_i) p(x_i) = ∑_{j=1}^∞ (∑_i H(x_i) p(x_i)), where the inner sum is taken over all indices i for which H(x_i) = y_j, for some fixed y_j. Hence all the terms H(x_i) are constant in the inner sum. Hence

∑_{i=1}^∞ H(x_i) p(x_i) = ∑_{j=1}^∞ y_j ∑_i p(x_i).

However,

∑_i p(x_i) = ∑ {p(x_i) : H(x_i) = y_j} = q(y_j),

which establishes Eq. (7.6).
EXAMPLE 7.8. Let V be the wind velocity (mph) and suppose that V is uniformly distributed over the interval [0, 10]. The pressure, say W (in lb/ft²), on the surface is given by the relationship W = 0.003V². To evaluate E(W) we can proceed in two ways.
(a) Using Theorem 7.3, we have

E(W) = ∫_0^{10} 0.003v² (1/10) dv = 0.1 lb/ft².
(b) Using the definition of E(W), we first need to find the pdf of W, say g, and then evaluate ∫_{-∞}^{+∞} w g(w) dw. To find g(w), we note that w = 0.003v² is a monotone function of v for v ≥ 0. We obtain (using Theorem 5.1)

g(w) = (1/10) |dv/dw| = (1/2)√(10/3) w^{−1/2},   0 ≤ w ≤ 0.3,
     = 0,   elsewhere.

Hence

E(W) = ∫_0^{0.3} w g(w) dw = 0.1,

after a simple computation. Thus, as the theorem stated, the two evaluations of E(W) yield the same result.
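Both evaluations are easy to reproduce numerically; the sketch below (assuming Python with SciPy) computes E(W) via Theorem 7.3 and via the pdf of W obtained above:

```python
from math import sqrt
from scipy import integrate

# (a) E(W) = integral of 0.003*v^2 * f(v) dv, with V uniform on [0, 10].
e_w_a, _ = integrate.quad(lambda v: 0.003 * v**2 * (1 / 10), 0, 10)

# (b) E(W) = integral of w * g(w) dw, with g(w) = (1/2)*sqrt(10/3)*w**(-1/2) on [0, 0.3].
e_w_b, _ = integrate.quad(lambda w: w * 0.5 * sqrt(10 / 3) * w**-0.5, 0, 0.3)

print(e_w_a, e_w_b)   # both approximately 0.1 lb/ft^2
```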
(b) To evaluate E(Y) using the definition, we need to obtain the pdf of Y = |X|, say g. Let G be the cdf of Y. Hence

G(y) = P(Y ≤ y) = P[|X| ≤ y] = P[−y ≤ X ≤ y] = 2P(0 ≤ X ≤ y),

and therefore

G(y) = 2 ∫_0^y f(x) dx = 2 ∫_0^y (1/2) e^{−x} dx = −e^{−y} + 1.

Thus we have for g, the pdf of Y, g(y) = G′(y) = e^{−y}, y ≥ 0. Hence E(Y) = ∫_0^∞ y e^{−y} dy = 1.
EXAMPLE 7.10. In many problems we can use the expected value of a random
variable in order to make a certain decision in an optimum way.
Suppose that a manufacturer produces a certain type of lubricating oil which
loses some of its special attributes if it is not used within a certain period of time.
Let X be the number of units of oil ordered from the manufacturer during each
year. (One unit equals 1000 gallons.) Suppose that X is a continuous random
variable, uniformly distributed over [2, 4]. Hence the pdf f has the form

f(x) = 1/2,   2 ≤ x ≤ 4,
     = 0,   elsewhere.
Suppose that for each unit sold a profit of $300 is earned, while for each unit not sold (during any specified year) a loss of $100 is taken, since a unit not used will
have to be discarded. Assume that the manufacturer must decide a few months
prior to the beginning of each year how much he will produce, and that he decides
to manufacture Y units. (Y is not a random variable; it is specified by the manufacturer.) Let Z be the profit per year (in dollars). Here Z is clearly a random variable since it is a function of the random variable X. Specifically, Z = H(X), where

H(X) = 300Y   if X ≥ Y,
     = 300X + (−100)(Y − X)   if X < Y.

(The last expression may be written as 400X − 100Y.)
In order for us to obtain E(Z) we apply Theorem 7.3 and write

E(Z) = ∫_{-∞}^{+∞} H(x) f(x) dx = (1/2) ∫_2^4 H(x) dx.

FIGURE 7.2
E(Z) = 300Y   if Y ≤ 2,
     = −100Y² + 700Y − 400   if 2 < Y < 4,
     = 1200 − 100Y   if Y ≥ 4.
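The manufacturer would presumably choose Y so as to maximize E(Z); since −100Y² + 700Y − 400 has its maximum at Y = 3.5, that is the most profitable production level. A small numerical sketch (Python with SciPy) confirming this by integrating H directly:

```python
from scipy import integrate

def expected_profit(Y):
    # E(Z) for a given production level Y, with X uniform on [2, 4].
    H = lambda x: 300 * Y if x >= Y else 400 * x - 100 * Y
    value, _ = integrate.quad(H, 2, 4, points=[Y] if 2 < Y < 4 else None)
    return value / 2          # multiply by the pdf f(x) = 1/2

for Y in (2.0, 3.0, 3.5, 4.0):
    print(Y, expected_profit(Y))   # the maximum, 825, occurs at Y = 3.5
```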
The concepts discussed above for the one-dimensional case also hold for higher
dimensional random variables. In particular, for the two-dimensional case, we
make the following definition.
E(Z) = ∫_{-∞}^{+∞} z q(z) dz,   where q is the pdf of Z = H(X, Y);

and Theorem 7.4 asserts that, if (X, Y) is a continuous random variable with joint pdf f, we have

E(Z) = ∫_{-∞}^{+∞} ∫_{-∞}^{+∞} H(x, y) f(x, y) dx dy.
Note: We shall not prove Theorem 7.4. Again, as in the one-dimensional case, this is
an extremely useful result since it states that we need not find the probability distribution
of the random variable Zin order to evaluate its expectation. We can find E(Z) directly
from the knowledge of the joint distribution of (X, Y).
EXAMPLE 7.11. Let us reconsider Example 6.14 and find E(E), where E = IR. We found that I and R were independent random variables with the following pdf's g and h, respectively:

g(i) = 2i,   0 ≤ i ≤ 1;     h(r) = r²/9,   0 ≤ r ≤ 3.

We also found that the pdf of E is p(e) = (2e/9)(3 − e), 0 ≤ e ≤ 3. Since I and R are independent random variables, the joint pdf of (I, R) is simply the product of the pdf's of I and R: f(i, r) = (2/9) i r², 0 ≤ i ≤ 1, 0 ≤ r ≤ 3. To evaluate E(E) using Theorem 7.4 we have

E(E) = ∫_0^3 ∫_0^1 ir f(i, r) di dr = (2/9) ∫_0^1 i² di ∫_0^3 r³ dr = (2/9)(1/3)(81/4) = 3/2.
given only for the continuous case. The reader should be able to supply the argument for the discrete case by simply replacing integrals by summations.
Proof:

E(X) = ∫_{-∞}^{+∞} C f(x) dx = C ∫_{-∞}^{+∞} f(x) dx = C.

FIGURE 7.4
Note: The meaning of "X equals C" is the following. Since X is a function from the sample space to R_X, the above means that R_X consists of the single value C. Hence X equals C if and only if P[X(s) = C] = 1. This notion is best explained in terms of the cdf of X. Namely, F(x) = 0 if x < C; F(x) = 1 if x ≥ C (Fig. 7.4). Such a random variable is sometimes called degenerate.
Proof: E(CX) = ∫_{-∞}^{+∞} C x f(x) dx = C ∫_{-∞}^{+∞} x f(x) dx = CE(X).
E(Z + W) = E(Z) + E( W ).
Proof
Property 7.4. Let X and Y be any two random variables. Then E(X + Y) =
E(X) + E(Y).
The expectation of a linear function is that same linear function of the expectation. This is not true unless a linear function is involved, and it is a common error to believe otherwise. For instance, E(X²) ≠ (E(X))², E(ln X) ≠ ln E(X), etc. Thus if X assumes the values −1 and +1, each with probability 1/2, then E(X) = 0. However, E(X²) = 1 ≠ (E(X))² = 0.
(b) In general, it is difficult to obtain expressions for E(1/X) or E(X^{1/2}), say, in terms of 1/E(X) or (E(X))^{1/2}. However, some inequalities are available, which are very easy to derive. (See articles by Pleiss, Murthy and Pillai, and Gurland in the February 1966, December 1966, and April 1967 issues, respectively, of The American Statistician.) For instance, we have:
(1) If X assumes only positive values and has finite expectation, then E(1/X) ≥ 1/E(X).
(2) Under the same hypotheses as in (1), E(X^{1/2}) ≤ (E(X))^{1/2}.
Proof (continuous case):

E(XY) = ∫_{-∞}^{+∞} ∫_{-∞}^{+∞} xy f(x, y) dx dy = ∫_{-∞}^{+∞} ∫_{-∞}^{+∞} xy g(x) h(y) dx dy
      = [∫_{-∞}^{+∞} x g(x) dx][∫_{-∞}^{+∞} y h(y) dy] = E(X)E(Y).
P(X₁ = 1) = (1 − p)^k.

Therefore

P(X₁ = k + 1) = 1 − (1 − p)^k,

and hence

E(X₁) = 1 · (1 − p)^k + (k + 1)[1 − (1 − p)^k] = k[1 − (1 − p)^k + k^{−1}].
Thus grouping is preferable to individual testing if E(X₁) < k, that is, if 1 − (1 − p)^k + k^{−1} < 1, which is equivalent to k^{−1} < (1 − p)^k. (The above formula is valid only for k > 1, since for k = 1 it yields E(X₁) = 1 + p.) This cannot occur if (1 − p) < 1/2. For, in that case, (1 − p)^k < (1/2)^k < 1/k, the last inequality following from the fact that 2^k > k. Thus we obtain the following interesting conclusion: If p, the probability of a positive test on any given individual, is greater than 1/2, then it is never preferable to group specimens before testing. (See Problem 7.11b.)
EXAMPLE 7.13. Let us apply some of the above properties to derive (again) the
expectation of a binomially distributed random variable. The method used may
be applied to advantage in many similar situations.
Consider n independent repetitions of an experiment and let X be the number
of times some event, say A, occurs. Let p equal P(A) and assume that this number
is constant for all repetitions considered.
Define the auxiliary random variables Y₁, ..., Y_n as follows: Y_i = 1 if the event A occurs on the ith repetition, and Y_i = 0 otherwise. Hence

X = Y₁ + Y₂ + ⋯ + Y_n,

and, since E(Y_i) = 1(p) + 0(1 − p) = p, Property 7.4 (extended to n terms) gives E(X) = E(Y₁) + ⋯ + E(Y_n) = np.
Note: Let us reinterpret this important result. Consider the random variable X/n. This represents the relative frequency of the event A among the n repetitions of the experiment. Using Property 7.2, we have E(X/n) = (np)/n = p. This is, intuitively, as it should be, for it says that the expected relative frequency of the event A is p, where p = P(A). It represents the first theoretical verification of the fact that there is a connection between the relative frequency of an event and the probability of that event. In a later chapter we shall obtain further results yielding a much more precise relationship between relative frequency and probability.
EXAMPLE 7.14. Suppose that the demand D, per week, of a certain product is a random variable with a certain probability distribution, say P(D = n) = p(n), n = 0, 1, 2, .... Suppose that the cost to the supplier is C₁ dollars per item, while he sells the item for C₂ dollars. Any item which is not sold at the end of the week must be stored at a cost of C₃ dollars per item. If the supplier decides to produce N items at the beginning of the week, what is his expected profit per week? For what value of N is the expected profit maximized? If T is the profit per week, we have

T = NC₂ − NC₁   if D > N,
  = DC₂ − C₁N − C₃(N − D)   if D ≤ N.
For the particular distribution of D being considered we find that

E(T) = 6N + 2[N(N + 1)/2 − N²]   if N ≤ 5,
     = 6N + 2(15 − 5N)   if N > 5,

that is,

E(T) = 7N − N²   if N ≤ 5,
     = 30 − 4N   if N > 5.
FIGURE 7.5 (E(T) as a function of N; the maximum occurs at N = 3.5)
Suppose that for a random variable X we find that E(X) equals 2. What is the
significance of this? It is important that we do not attribute more meaning to this
information than is warranted. It simply means that if we consider a large number
of determinations of X, say x₁, ..., x_n, and average these values of X, this average
would be close to 2 if n is large. However, it is very crucial that we should not put
too much meaning into an expected value. For example, suppose that X represents
the life length of light bulbs being received from a manufacturer, and that E(X) =
1000 hours. This could mean one of several things. It could mean that most of the
bulbs would be expected to last somewhere between 900 hours and 1100 hours. It could also mean that the bulbs being supplied are made up of two entirely different types of bulbs: about half are of very high quality and will last about 1300
hours, while the other half are of very poor quality and will last about 700 hours.
There is an obvious need to introduce a quantitative measure which will distinguish between such situations. Various measures suggest themselves, but the
following is the most commonly used quantity.
Notes: (a) The number V(X) is expressed in square units of X. That is, if X is measured in hours, then V(X) is expressed in (hours)². This is one reason for considering the standard deviation: it is expressed in the same units as X.
(b) Another possible measure might have been E|X − E(X)|. For a number of reasons, one of which is that X² is a "better-behaved" function than |X|, the variance is preferred.
(c) If we interpret E(X) as the center of a unit mass distributed over a line, we may interpret V(X) as the moment of inertia of this mass about a perpendicular axis through the center of mass.
(d) V(X) as defined in Eq. (7.12) is a special case of the following more general notion. The kth moment of the random variable X about its expectation is defined as μ_k = E[X − E(X)]^k.
The evaluation of V(X) may be simplified with the aid of the following result.
Theorem 7.5.

V(X) = E(X²) − [E(X)]².

Proof: Expanding E[X − E(X)]² and using the previously established properties for expectation, we obtain

V(X) = E[X² − 2X E(X) + (E(X))²] = E(X²) − 2E(X)E(X) + (E(X))² = E(X²) − [E(X)]².
EXAMPLE 7.15. The weather bureau classifies the type of sky that is visible in terms of "degrees of cloudiness." A scale of 11 categories is used: 0, 1, 2, ..., 10, where 0 represents a perfectly clear sky, 10 represents a completely overcast sky, while the other values represent various intermediate conditions. Suppose that such a classification is made at a particular weather station on a particular day and time. Let X be the random variable assuming one of the above 11 values. Suppose that the probability distribution of X is

p₀ = p₁₀ = 0.05;
p₁ = p₂ = p₈ = p₉ = 0.15;
p₃ = p₄ = p₅ = p₆ = p₇ = 0.06.

Hence

E(X) = 1(0.15) + 2(0.15) + 3(0.06) + 4(0.06) + 5(0.06) + 6(0.06) + 7(0.06) + 8(0.15) + 9(0.15) + 10(0.05) = 5.0.
FIGURE 7.6
Hence

V(X) = E(X²) − (E(X))² = 35.6 − 25 = 10.6.
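The arithmetic in this example is easy to verify directly (a short sketch in Python):

```python
# Degrees-of-cloudiness distribution from Example 7.15.
p = {0: 0.05, 10: 0.05, 1: 0.15, 2: 0.15, 8: 0.15, 9: 0.15,
     3: 0.06, 4: 0.06, 5: 0.06, 6: 0.06, 7: 0.06}

assert abs(sum(p.values()) - 1.0) < 1e-12

mean = sum(x * px for x, px in p.items())              # E(X) = 5
second_moment = sum(x**2 * px for x, px in p.items())  # E(X^2) = 35.6
variance = second_moment - mean**2                     # V(X) = 10.6

print(mean, second_moment, variance)   # 5, 35.6, 10.6 (up to floating-point rounding)
```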
(See Fig. 7.6.) Because of the symmetry of the pdf, E(X) = 0. (See the Note below.) Furthermore, since E(X) = 0, V(X) = E(X²), which is obtained by a direct integration.
Note: Suppose that a continuous random variable has a pdf which is symmetric about x = 0. That is, f(−x) = f(x) for all x. Then, provided E(X) exists, E(X) = 0, which
Note: This property is intuitively clear, for adding a constant to an outcome X does
not change its variability, which is what the variance measures. It simply "shifts" the
values of X to the right or to the left, depending on the sign of C.
Proof:

V(X + Y) = E(X + Y)² − (E(X + Y))²
         = E(X²) + 2E(XY) + E(Y²) − (E(X))² − 2E(X)E(Y) − (E(Y))²
         = E(X²) − (E(X))² + E(Y²) − (E(Y))²    [since E(XY) = E(X)E(Y)]
         = V(X) + V(Y).
Note: It is important to realize that the variance is not additive, in general, as is the expected value. With the additional assumption of independence, Property 7.9 is valid. Nor does the variance possess the linearity property which we discussed for the expectation; that is, V(aX + b) ≠ aV(X) + b. Instead we have V(aX + b) = a²V(X).
Notes: (a) This is an obvious extension of Theorem 7.5, for by letting a = 0 we obtain Theorem 7.5.
(b) If we interpret V(X) as the moment of inertia and E(X) as the center of a unit
mass, then the above property is a statement of the well-known parallel-axis theorem in
mechanics: The moment of inertia about an arbitrary point equals the moment of inertia
about the center of mass plus the square of the distance of this arbitrary point from the
center of mass.
(c) E[X - a]2 is minimized if a = E(X). This follows immediately from the above
property. Thus the moment of inertia (of a unit mass distributed over a line) about an
axis through an arbitrary point is minimized if this point is chosen as the center of mass.
(E(X))². To compute E(X²) we use the fact that P(X = k) = [n!/(k!(n − k)!)] p^k (1 − p)^{n−k}, k = 0, 1, ..., n. Hence E(X²) = ∑_{k=0}^n k² [n!/(k!(n − k)!)] p^k (1 − p)^{n−k}. This sum may be evaluated fairly easily, but rather than do this, we shall employ a simpler method.
We shall again use the representation of X introduced in Example 7.13, namely X = Y₁ + Y₂ + ⋯ + Y_n. We now note that the Y_i's are independent random variables, since the value of Y_i depends only on the outcome of the ith repetition, and the successive repetitions are assumed to be independent. Hence we may apply Property 7.10 and obtain
E(X²) = ∫_a^b x² · [1/(b − a)] dx = (b³ − a³)/(3(b − a)).

Hence

V(X) = E(X²) − [E(X)]² = (b − a)²/12,

after a simple computation.
Notes: (a) This result is intuitively meaningful. It states that the variance of X does not depend on a and b individually but only on (b − a)², that is, on the square of their difference. Hence two random variables each of which is uniformly distributed over
difference. Hence two random variables each of which is uniformly distributed over
some interval (not necessarily the same) will have equal variances so long as the lengths
of the intervals are the same.
(b) It is a well-known fact that the moment of inertia of a slim rod of mass M and length L about a transverse axis through the center is given by ML²/12.
we need not find the probability distribution of Y, but may work directly with the
probability distribution of X. Similarly, if Z = H(X, Y), we can evaluate E(Z)
and V(Z) without first obtaining the distribution of Z.
If the function His quite involved, the evaluation of the above expectations and
variances may lead to integrations (or summations) which are quite difficult.
Hence the following approximations are very useful.
Theorem 7.6. Let X be a random variable with E(X) = μ and V(X) = σ². Suppose that Y = H(X). Then

E(Y) ≈ H(μ) + (σ²/2) H″(μ),    (7.18)

V(Y) ≈ [H′(μ)]² σ².    (7.19)

(In order to make the above approximations meaningful, we obviously require that H be at least twice differentiable at x = μ.)

Proof (outline only): In order to establish Eq. (7.18), we expand the function H in a Taylor series about x = μ to two terms. Thus

Y = H(μ) + (X − μ)H′(μ) + ((X − μ)²/2) H″(μ) + R₂.

Taking the expectation of both sides and discarding the remainder R₂ yields Eq. (7.18). To obtain Eq. (7.19), we expand H to one term, Y = H(μ) + (X − μ)H′(μ) + R₁; if we discard the remainder and take the variance of both sides, we have Eq. (7.19).
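As a quick illustration of how good these approximations can be, the sketch below (Python with NumPy; the particular choice H(x) = e^x and X uniform on [0, 1] is only an assumption made for the demonstration) compares (7.18) and (7.19) with Monte Carlo estimates:

```python
import numpy as np

rng = np.random.default_rng(2)

mu, sigma2 = 0.5, 1 / 12          # mean and variance of X ~ uniform(0, 1)
H = np.exp                        # H(x) = e^x, so H'(x) = H''(x) = e^x

x = rng.uniform(0, 1, size=1_000_000)
y = H(x)

approx_mean = H(mu) + 0.5 * H(mu) * sigma2        # Eq. (7.18)
approx_var = (H(mu)) ** 2 * sigma2                # Eq. (7.19)

print(approx_mean, y.mean())      # about 1.717 vs 1.718 (the exact value is e - 1)
print(approx_var, y.var())        # about 0.227 vs 0.242
```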
EXAMPLE 7.19. Under certain conditions, the surface tension of a liquid (dyn/cm) is given by the formula S = 2(1 − 0.005T)^{1.2}, where T is the temperature of the liquid (degrees centigrade). Suppose that T is a continuous random variable with the following pdf:

f(t) = ⋯,   t ≥ 10,
     = 0,   elsewhere.

Hence E(T) = 15 and

V(T) = E(T²) − (15)².

Rather than evaluate these expressions, we shall obtain approximations for E(S) and V(S) by using Eqs. (7.18) and (7.19). In order to use these formulas we have to compute H′(15) and H″(15), where H(t) = 2(1 − 0.005t)^{1.2}. We have

H′(t) = 2(1.2)(−0.005)(1 − 0.005t)^{0.2} = −0.012(1 − 0.005t)^{0.2}.

Hence

H(15) = 1.82,   H′(15) ≈ −0.01.

Similarly,

H″(t) = −0.012(0.2)(−0.005)(1 − 0.005t)^{−0.8} = 0.000012(1 − 0.005t)^{−0.8}.

Therefore

H″(15) = 0.000012/(0.925)^{0.8} ≈ 0⁺.

Thus we have, from Eqs. (7.18) and (7.19), E(S) ≈ H(15) = 1.82 (the second-order term being negligible) and V(S) ≈ [H′(15)]² V(T).
[We shall assume that the various derivatives of H exist at (μ_x, μ_y).] Then, if X and Y are independent, we have

E(Z) ≈ H(μ_x, μ_y) + (1/2)[(∂²H/∂x²) σ_x² + (∂²H/∂y²) σ_y²],

V(Z) ≈ (∂H/∂x)² σ_x² + (∂H/∂y)² σ_y²,

where all the partial derivatives are evaluated at (μ_x, μ_y).

Proof: The proof involves the expansion of H in a Taylor series about the point (μ_x, μ_y) to one and two terms, discarding the remainder, and then taking the expectation and variance of both sides as was done in the proof of Theorem 7.6. We shall leave the details to the reader. (If X and Y are not independent, a slightly more complicated formula may be derived.)
Note: The above result may be extended to a function of n independent random variables, say Z = H(X₁, ..., X_n). If E(X_i) = μ_i and V(X_i) = σ_i², we have the following approximations:

E(Z) ≈ H(μ₁, ..., μ_n) + (1/2) ∑_{i=1}^n (∂²H/∂x_i²) σ_i²,

V(Z) ≈ ∑_{i=1}^n (∂H/∂x_i)² σ_i²,

where all the partial derivatives are evaluated at the point (μ₁, ..., μ_n).
EXAMPLE 7.20. Suppose that we have a simple circuit for which the voltage, say M, is expressed by Ohm's law as M = IR, where I and R are the current and resistance, respectively.
c be any real number. Then, if E(X − c)² is finite and ε is any positive number, we have

P(|X − c| ≥ ε) ≤ (1/ε²) E(X − c)².    (7.20)

Notes: (a) An equivalent form is

P(|X − c| < ε) ≥ 1 − (1/ε²) E(X − c)².    (7.20a)

(b) Choosing c = μ we obtain

P(|X − μ| ≥ ε) ≤ Var(X)/ε².    (7.20b)

(c) Choosing c = μ and ε = kσ, where σ² = Var(X) > 0, we obtain

P(|X − μ| ≥ kσ) ≤ 1/k².    (7.21)

This last form (7.21) is particularly indicative of how the variance measures the "degree of concentration" of probability near E(X) = μ.
Proof: (We shall prove only (7.20), since the others follow as indicated. We shall deal only with the continuous case. In the discrete case the argument is very similar, with integrals replaced by sums. However, some care must be taken with endpoints of intervals.)
Consider

P(|X − c| ≥ ε) = ∫_{|x−c| ≥ ε} f(x) dx.

(The limit on the integral says that we are integrating between −∞ and c − ε and between c + ε and +∞.) Now |x − c| ≥ ε is equivalent to (x − c)²/ε² ≥ 1. Hence the above integral is

≤ ∫_R [(x − c)²/ε²] f(x) dx,

where R = {x : |x − c| ≥ ε}. This, in turn, is

≤ ∫_{-∞}^{+∞} [(x − c)²/ε²] f(x) dx,

which equals (1/ε²) E[X − c]², as was to be shown.
Notes: (a) It is important to realize that the above result is remarkable precisely be
cause so little is assumed about the probabilistic behavior of the random variable X.
(b) As we might suspect, additional information about the distribution of the random
variable X will enable us to improve on the inequality derived. For example, if c = 1/2 we have, from Chebyshev's inequality,
Observe that although the statement obtained from Chebyshev's inequality is consistent
with this result, the latter is a more precise statement. However, in many problems no
assumption concerning the specific distribution of the random variable is justified, and
in such cases Chebyshev's inequality can give us important information about the be
havior of the random variable.
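The gap between the Chebyshev bound and the exact probability is easy to see numerically. The sketch below (Python with NumPy; the exponential distribution is merely an assumed illustration) compares P(|X − μ| ≥ kσ) with the bound 1/k²:

```python
import numpy as np

rng = np.random.default_rng(3)

x = rng.exponential(scale=1.0, size=1_000_000)   # E(X) = 1, Var(X) = 1
mu, sigma = 1.0, 1.0

for k in (1.5, 2.0, 3.0):
    actual = (np.abs(x - mu) >= k * sigma).mean()
    print(k, actual, 1 / k**2)   # the actual probability is well below the bound 1/k^2
```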
As we note from Eq. (7.21), if V(X) is small, most of the probability distribution
of X is "concentrated" near E(X). This may be expressed more precisely in the
following theorem.
Notes: (a) This theorem shows that zero variance does imply that all the probability
is concentrated at a single point, namely at E(X).
(b) If E(X) = 0, then V(X) = E(X2), and hence in this case, E(X2) = 0 implies the
same conclusion.
(c) It is in the above sense that we say that a random variable X is degenerate: It
assumes only one value with probability 1.
So far we have been concerned with associating parameters such as E(X) and
V(X) with the distribution of one-dimensional random variables. These param
eters measure, in a sense described previously, certain characteristics of the dis
tribution. If we have a two-dimensional random variable (X, Y), an analogous
problem is encountered. Of course, we may again discuss the one-dimensional
random variables X and Y associated with (X, Y). However, the question arises
whether there is a meaningful parameter which measures in some sense the "degree
of association" between X and Y. This rather vague notion will be made precise
shortly. We state the following formal definition.
Notes: (a) We assume that all the expectations exist and that both V(X) and V(Y) are nonzero. When there is no question as to which random variables are involved we shall simply write ρ instead of ρ_{xy}.
(b) The numerator of ρ, E{[X − E(X)][Y − E(Y)]}, is called the covariance of X and Y, and is sometimes denoted by σ_{xy}.
(c) The correlation coefficient is a dimensionless quantity.
(d) Before the above definition can be very meaningful we must discover exactly what ρ measures. This we shall do by considering a number of properties of ρ.
Theorem 7.9
ρ = [E(XY) − E(X)E(Y)] / √(V(X) V(Y)).
Proof: Consider
E(XY) = E(X)E(Y)
if X and Y are independent.
Note: The converse of Theorem 7.10 is in general not true. (See Problem 7.39.) That is, we may have ρ = 0, and yet X and Y need not be independent. If ρ = 0, we say that X and Y are uncorrelated. Thus, being uncorrelated and being independent are, in general, not equivalent. The following example illustrates this point.*
Let X and Y be any random variables having the same distribution. Let U = X − Y and V = X + Y. Hence E(U) = 0 and cov(U, V) = E[(X − Y)(X + Y)] = E(X² − Y²) = 0. Thus U and V are uncorrelated. Even if X and Y are independent, U and V may be dependent, as the following choice of X and Y indicates. Let X and Y be the numbers appearing on the first and second fair dice, respectively, which have been tossed. We now find, for example, that P[V = 4 | U = 3] = 0 (since if X − Y = 3, X + Y cannot equal 4), while P(V = 4) = 3/36. Thus U and V are dependent.
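A short simulation (Python with NumPy) makes the distinction concrete: U and V have essentially zero correlation, yet knowing U changes the conditional distribution of V:

```python
import numpy as np

rng = np.random.default_rng(4)

x = rng.integers(1, 7, size=1_000_000)   # first die
y = rng.integers(1, 7, size=1_000_000)   # second die
u, v = x - y, x + y

print(np.corrcoef(u, v)[0, 1])           # essentially 0: U and V are uncorrelated
print((v[u == 3] == 4).mean())           # P(V = 4 | U = 3) = 0
print((v == 4).mean())                   # P(V = 4) = 3/36, about 0.083
```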
FIGURE 7.8 (a), (b): the two possible graphs of q(t)
at² + bt + c has the property that q(t) ≥ 0 for all t, it means that its graph touches the t-axis at just one place or not at all, as indicated in Fig. 7.8. This, in turn, means that its discriminant b² − 4ac must be ≤ 0, since b² − 4ac > 0 would mean that q(t) has two distinct real roots. Applying this conclusion to the function q(t) under consideration above, we obtain

4[E(VW)]² − 4E(V²)E(W²) ≤ 0.
* The example in this note is taken from a discussion appearing in an article entitled "Mutually Exclusive Events, Independence and Zero Correlation," by J. D. Gibbons, appearing in The American Statistician, 22, No. 5, December 1968, pp. 31-32.
This implies
Proof: Consider again the function q(t) described in the proof of Theorem 7.11. It is a simple matter to observe in the proof of that theorem that if q(t) > 0 for all t, then ρ² < 1. Hence the hypothesis of the present theorem, namely ρ² = 1, implies that there must exist at least one value of t, say t₀, such that q(t₀) = E(V + t₀W)² = 0. Since V + t₀W = [X − E(X)] + t₀[Y − E(Y)], we have that E(V + t₀W) = 0 and hence variance(V + t₀W) = E(V + t₀W)². Thus we find that the hypothesis of Theorem 7.12 leads to the conclusion that the variance of (V + t₀W) = 0. Hence, from Theorem 7.8 we may conclude that the random variable (V + t₀W) = 0 (with probability 1). Therefore [X − E(X)] + t₀[Y − E(Y)] = 0. Rewriting this, we find that Y = AX + B (with probability 1), as was to be proved.
Note: The converse of Theorem 7.12 also holds as is shown in Theorem 7.13.
Theorem 7.13. Suppose that X and Y are two random variables for which Y = AX + B, where A and B are constants. Then ρ² = 1. If A > 0, ρ = +1; if A < 0, ρ = −1.
Note: Theorems 7.12 and 7.13 establish the following important characteristic of the correlation coefficient: The correlation coefficient is a measure of the degree of linearity between X and Y. Values of ρ near +1 or −1 indicate a high degree of linearity, while values of ρ near 0 indicate a lack of such linearity. Positive values of ρ show that Y tends to increase with increasing X, while negative values of ρ show that Y tends to decrease with increasing values of X. There is considerable misunderstanding about the interpretation of the correlation coefficient. A value of ρ close to zero only indicates the absence of a linear relationship between X and Y. It does not preclude the possibility of some nonlinear relationship.
Hence

ρ = [E(XY) − E(X)E(Y)] / √(V(X) V(Y)).
As we have noted, the correlation coefficient is a dimensionless quantity. Its
value is not affected by a change of scale. The following theorem may easily be
proved. (See Problem 7.41.)
Just as we defined the expected value of a random variable X (in terms of its probability distribution) as ∫_{-∞}^{+∞} x f(x) dx or ∑_{i=1}^∞ x_i p(x_i), so we can define the conditional expectation of a random variable (in terms of its conditional probability distribution) as follows.

E(X | y) = ∫_{-∞}^{+∞} x g(x | y) dx.    (7.23)
Theorem 7.15.

E[E(X | Y)] = E(X),    (7.25)
E[E(Y | X)] = E(Y).    (7.26)

Proof (continuous case): By definition,

E(X | y) = ∫_{-∞}^{+∞} x g(x | y) dx = ∫_{-∞}^{+∞} x [f(x, y)/h(y)] dx.

Hence

E[E(X | Y)] = ∫_{-∞}^{+∞} E(X | y) h(y) dy = ∫_{-∞}^{+∞} [∫_{-∞}^{+∞} x f(x, y)/h(y) dx] h(y) dy.

If all the expectations exist, it is permissible to write the above iterated integral with the order of integration reversed. Thus

E[E(X | Y)] = ∫_{-∞}^{+∞} x [∫_{-∞}^{+∞} f(x, y) dy] dx = ∫_{-∞}^{+∞} x g(x) dx = E(X).

[A similar argument may be used to establish Eq. (7.26).] This theorem is very useful, as the following example illustrates.
EXAMPLE 7.22. Suppose that N, the number of parts arriving each day, is a random variable with the following probability distribution:

n:         10    11    12    13    14    15
P(N = n): 0.05  0.10  0.10  0.20  0.35  0.20

The probability that any particular part is defective is the same for all parts and
equals 0.10. If X is the number of defective parts arriving each day, what is the
expected value of X? For given N = n, X has a binomial distribution. Since
N is itself a random variable, we proceed as follows.
We have E(X) = E[E(X | N)]. However, E(X | N) = 0.10N, since for given
N, X has a binomial distribution. Hence

E(X) = 0.10E(N) = 0.10[10(0.05) + 11(0.10) + 12(0.10) + 13(0.20) + 14(0.35) + 15(0.20)] = 0.10(13.3) = 1.33.
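As an informal check of this use of Theorem 7.15, the following simulation sketch (added here for illustration; it is not part of the original example and assumes NumPy is available) draws N from the distribution above, draws X given N from a binomial distribution with parameter 0.10, and compares the sample mean of X with 0.10E(N) = 1.33:

```python
# Monte Carlo check of E(X) = E[E(X | N)] = 0.10 E(N) for the example above.
import numpy as np

rng = np.random.default_rng(1)
values = np.array([10, 11, 12, 13, 14, 15])
probs = np.array([0.05, 0.10, 0.10, 0.20, 0.35, 0.20])

n = rng.choice(values, size=200_000, p=probs)  # number of parts arriving, N
x = rng.binomial(n, 0.10)                      # defectives: X | N = n ~ binomial(n, 0.10)

print(x.mean())                       # approximately 1.33
print(0.10 * (values * probs).sum())  # exact value 0.10 E(N) = 1.33
```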
Theorem 7.16. Suppose that X and Y are independent random variables. Then E(X | y) = E(X) and E(Y | x) = E(Y).
plied, the company makes a profit of $0.03. If the demand exceeds the supply, the
company gets additional power from another source making a profit on this power
of $0.01 per kilowatt supplied. What is the expected profit during the specified
time considered?
Let T be this profit. We have

T = 0.03Y                    if Y ≤ X,
T = 0.03X + 0.01(Y − X)      if Y > X.

Evaluating the conditional expectation E(T | x) requires splitting the range of integration over the demand Y according to whether or not the demand exceeds x. For 10 < x < 20, for example, the resulting terms 0.015x² − 1.5 + 2 + 0.4x − 0.005x² − 0.02x² combine to −0.01x² + 0.4x + 0.5; the remaining ranges of x are handled in the same way. Therefore the expected profit is E(T) = E[E(T | X)], obtained by averaging E(T | x) over the distribution of X.
FIGURE 7.10  (a) E(Y | x)   (b) E(X | y)
FIGURE 7.11  (the semicircle, extending from x = −1 to x = 1)
EXAMPLE 7.24. Suppose that (X, Y) is uniformly distributed over the semicircle
indicated in Fig. 7.11. Then f(x, y) = 2/π, (x, y) ∈ semicircle. Thus

g(x) = ∫_0^{√(1−x²)} (2/π) dy = (2/π)√(1 − x²),   −1 ≤ x ≤ 1,

h(y) = ∫_{−√(1−y²)}^{+√(1−y²)} (2/π) dx = (4/π)√(1 − y²),   0 ≤ y ≤ 1.

Therefore

g(x | y) = f(x, y)/h(y) = 1/[2√(1 − y²)],   −√(1 − y²) ≤ x ≤ √(1 − y²),

h(y | x) = f(x, y)/g(x) = 1/√(1 − x²),   0 ≤ y ≤ √(1 − x²).

Hence

E(Y | x) = ∫_0^{√(1−x²)} y h(y | x) dy = ∫_0^{√(1−x²)} [y/√(1 − x²)] dy
         = y²/[2√(1 − x²)] evaluated between 0 and √(1 − x²) = √(1 − x²)/2.

Similarly

E(X | y) = ∫_{−√(1−y²)}^{+√(1−y²)} x g(x | y) dx = ∫_{−√(1−y²)}^{+√(1−y²)} [x/(2√(1 − y²))] dx
         = x²/[4√(1 − y²)] evaluated between −√(1 − y²) and +√(1 − y²) = 0.
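The regression function just obtained can be checked by simulation. The sketch below (an added illustration, not part of the text; it assumes NumPy) samples points uniformly over the semicircle by rejection and estimates E(Y | X ≈ x₀) for x₀ = 0.5, which should be close to √(1 − 0.25)/2 ≈ 0.433:

```python
# Monte Carlo estimate of the regression function E(Y | x) of Example 7.24.
import numpy as np

rng = np.random.default_rng(2)

# Sample uniformly over the semicircle x^2 + y^2 <= 1, y >= 0, by rejection.
x = rng.uniform(-1.0, 1.0, size=2_000_000)
y = rng.uniform(0.0, 1.0, size=2_000_000)
keep = x**2 + y**2 <= 1.0
x, y = x[keep], y[keep]

x0, eps = 0.5, 0.01
band = np.abs(x - x0) < eps          # points with X near x0
print(y[band].mean())                # estimate of E(Y | X close to x0)
print(np.sqrt(1.0 - x0**2) / 2.0)    # exact value sqrt(1 - x0^2)/2 = 0.433...
```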
It may happen that either or both of the regression curves are in fact straight
lines (Fig. 7.12). That is, E(Y | x) may be a linear function of x and/or E(X | y)
may be a linear function of y. In this case we say that the regression of the mean
of Y on X (say) is linear.
FIGURE 7.12
EXAMPLE 7.25. Suppose that (X, Y) is uniformly distributed over the triangle
indicated in Fig. 7.13. Then f(x, y) = 1, (x, y) ∈ T. The following expressions
for the marginal and conditional pdf's are easily verified:
g(x) = 2x,  0 ≤ x ≤ 1;    h(y) = (2 − y)/2,  0 ≤ y ≤ 2;

g(x | y) = 2/(2 − y),  y/2 ≤ x ≤ 1;    h(y | x) = 1/(2x),  0 ≤ y ≤ 2x.

Hence

E(X | y) = ∫_{y/2}^{1} x g(x | y) dx = ∫_{y/2}^{1} [2x/(2 − y)] dx = y/4 + 1/2,

and, similarly, E(Y | x) = ∫_0^{2x} y h(y | x) dy = x. Thus both regressions of the
mean are in fact linear. If the regression of the mean of Y on X is linear, say E(Y | x) =
αx + β, then we can easily express the coefficients α and β in terms of certain
parameters of the joint distribution of (X, Y). We have the following theorem.
FIGURE 7.13   FIGURE 7.14  (the triangle T, with vertex at (1, 2), and the regression line E(X | y))

Theorem 7.17. If the regression of the mean of Y on X is linear, then

E(Y | x) = μy + ρ(σy/σx)(x − μx).    (7.27)

If the regression of the mean of X on Y is linear, then

E(X | y) = μx + ρ(σx/σy)(y − μy).    (7.28)

Here μx = E(X), μy = E(Y), σx² = V(X), σy² = V(Y), and ρ is the correlation coefficient of X and Y.
Notes: (a) As is suggested by the above wording, it is possible that one of the regressions of the mean is linear while the other one is not.
(b) Note the crucial role played by the correlation coefficient in the above expressions.
If the regression of X on Y, say, is linear, and if ρ = 0, then we find (again) that E(X | y)
does not depend on y. Also observe that the algebraic sign of p determines the sign of
the slope of the regression line.
(c) If both regression functions are linear, we find, upon solving Eqs. (7.27) and (7.28)
simultaneously, that the regression lines intersect at the "center" of the distribution,
(μx, μy).
As we have noted (Example 7.23, for instance), the regression functions need
not be linear. However we might still be interested in trying to approximate the
regression curve with a linear function. This is usually done by appealing to the
principle of least squares, which in the present context is as follows: Choose the
constants a and b so that E[E(Y IX) - (aX + b)]2 is minimized. Similarly,
choose the constants c and d so that E[E(X I Y) - (c Y + d)]2 is minimized.
The lines y = ax + b and x = cy + d are called the least-squares approximations
to the corresponding regression curves E(Y | x) and E(X | y), respectively.
The following theorem relates these regression lines to those discussed earlier.
Theorem 7.18. Suppose that the regression of the mean of Y on X is linear, say E(Y | x) = a'x + b'. If y = ax + b is the least-squares approximation to E(Y | x) described above, then a = a' and b = b'. An analogous statement holds for the regression of X on Y.
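As an informal check, the sketch below (added for illustration, not part of the text; it assumes NumPy) samples (X, Y) from the triangle of Example 7.25, where E(Y | x) = x, and fits a least-squares line to the sample. Fitting to the raw (X, Y) pairs gives the same line as fitting to E(Y | X), since both criteria lead to the slope cov(X, Y)/V(X); the fitted slope and intercept should therefore be close to 1 and 0:

```python
# Least-squares line for Example 7.25, where the regression E(Y | x) = x is linear.
import numpy as np

rng = np.random.default_rng(3)

u = rng.uniform(size=500_000)
x = np.sqrt(u)                     # X has marginal pdf g(x) = 2x on [0, 1]
y = rng.uniform(0.0, 2.0 * x)      # given X = x, Y is uniform on (0, 2x)

a, b = np.polyfit(x, y, 1)         # least-squares slope and intercept
print(a, b)                        # approximately 1.0 and 0.0, i.e. the line y = x
```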
PROBLEMS
7.2. Show that E(X) does not exist for the random variable X defined in Problem 4.25.
7.3. The following represents the probability distribution of D, the daily demand of
a certain product. Evaluate E(D).
d:          1,   2,   3,   4,   5
P(D = d): 0.1, 0.1, 0.3, 0.3, 0.2
7.5. A certain alloy is formed by combining the melted mixture of two metals. The
resulting alloy contains a certain percent of lead, say X, which may be considered as a
random variable. Suppose that X has the following pdf:
Suppose that P, the net profit realized in selling this alloy (per pound), is the following
function of the percent content of lead: P = C1 + C2X. Compute the expected profit
(per pound).
7.6. Suppose that an electronic device has a life length X (in units of 1000 hours)
which is considered as a continuous random variable with the following pdf:
Suppose that the cost of manufacturing one such item is $2.00. The manufacturer sells
the item for $5.00, but guarantees a total refund if X ≤ 0.9. What is the manufacturer's
expected profit per item?
7.7. The first 5 repetitions of an experiment cost $10 each. All subsequent repetitions
cost $5 each. Suppose that the experiment is repeated until the first successful outcome
occurs. If the probability of a successful outcome always equals 0.9, and if the repetitions
are independent, what is the expected cost of the entire operation?
7.8. A lot is known to contain 2 defective and 8 nondefective items. If these items are
inspected at random, one after another, what is the expected number of items that must
be chosen for inspection in order to remove all the defective ones?
7.9. A lot of 10 electric motors must either be totally rejected or sold, depending on
the outcome of the following procedure: Two motors are chosen at random and inspected.
If one or more are defective, the lot is rejected. Otherwise it is accepted. Suppose
that each motor costs $75 and is sold for $100. If the lot contains 1 defective motor,
what is the manufacturer's expected profit?
7.10. Suppose that D, the daily demand for an item, is a random variable with the
following probability distribution:
P(D = d) = C · 2^d/d!,  d = 1, 2, 3, 4.
7.12. Suppose that X and Y are independent random variables with the following
pdf's:
7.14. A fair die is tossed 72 times. Given that X is the number of times six appears,
evaluate E(X2).
7.15. Find the expected value and variance of the random variables Y and Z of
Problem 5.2.
7.16. Find the expected value and variance of the random variable Y of Problem 5.3.
7.17. Find the expected value and variance of the random variables Y and Z of
Problem 5.5.
7.18. Find the expected value and variance of the random variables Y, Z, and W of
Problem 5.6.
7.19. Find the expected value and variance of the random variables V and S of
Problem 5.7.
7.20. Find the expected value and variance of the random variable Y of Problem 5.10
for each of the three cases.
7.21. Find the expected value and variance of the random variable A of Problem 6.7.
7.22. Find the expected value and variance of the random variable H of Problem 6.11.
7.23. Find the expected value and variance of the random variable W of Problem 6.13.
7.24. Suppose that X is a random variable for which E(X) = 10 and V(X) = 25.
For what positive values of a and b does Y = aX - b have expectation 0 and variance 1?
7.25. Suppose that S, a random voltage, varies between 0 and 1 volt and is uniformly
distributed over that interval. Suppose that the signal S is perturbed by an additive,
independent random noise N which is uniformly distributed between 0 and 2 volts.
(a) Find the expected voltage of the signal, taking noise into account.
(b) Find the expected power when the perturbed signal is applied to a resistor of 2 ohms.
7.26. Suppose that X is uniformly distributed over [-a, 3a]. Find the variance of X.
7.27. A target is made of three concentric circles of radii 1/√3, 1, and √3 feet. Shots
within the inner circle count 4 points, within the next ring 3 points, and within the third
ring 2 points. Shots outside the target count zero. Let R be the random variable
representing the distance of the hit from the center. Suppose that the pdf of R is
f(r) = 2/[π(1 + r²)], r > 0. Compute the expected value of the score after 5 shots.
x ≥ 0.
Let Y = X². Evaluate E(Y):
(a) directly without first obtaining the pdf of Y,
(b) by first obtaining the pdf of Y.
7.29. Suppose that the two-dimensional random variable (X, Y) is uniformly
distributed over the triangle in Fig. 7.15. Evaluate V(X) and V(Y).
7.30. Suppose that (X, Y) is uniformly distributed over the triangle in Fig. 7.16.
(a) Obtain the marginal pdf of X and of Y.
(b) Evaluate V(X) and V(Y).
FIGURE 7.15   FIGURE 7.16  (triangles; the labeled vertices include (−1, 3), (1, 3), (2, 4), and (2, 0))
7.31. Suppose that X and Y are random variables for which E(X) = μx, E(Y) = μy,
V(X) = σx², and V(Y) = σy². Using Theorem 7.7, obtain an approximation for E(Z)
and V(Z), where Z = X/Y.
7.32. Suppose that X and Y are independent random variables, each uniformly
distributed over (1, 2). Let Z = X/Y.
(a) Using Theorem 7.7, obtain approximate expressions for E(Z) and V(Z).
(b) Using Theorem 6.5, obtain the pdf of Z and then find the exact value of E(Z) and
V(Z). Compare with (a).
7.33. Show that if X is a continuous random variable with pdf f having the property
that the graph of f is symmetric about x = a, then E(X) = a, provided that E(X) exists.
(See Example 7.16.)
7.34. (a) Suppose that the random variable X assumes the values −1 and 1 each with
probability 1/2. Consider P[|X − E(X)| ≥ k√V(X)] as a function of k, k > 0. Plot
this function of k and, on the same coordinate system, plot the upper bound of the
above probability as given by Chebyshev's inequality.
(b) Same as (a) except that P(X = -1) = !, P(X = 1) = �·
7.35. Compare the upper bound on the probability P[|X − E(X)| ≥ 2√V(X)]
obtained from Chebyshev's inequality with the exact probability if X is uniformly
distributed over (−1, 3).
7.36. Verify Eq. (7.17).
7.37. Suppose that the two-dimensional random variable (X, Y) is uniformly
distributed over R, where R is defined by {(x, y) | x² + y² ≤ 1, y ≥ 0}. (See Fig. 7.17.)
Evaluate ρxy, the correlation coefficient.
FIGURE 7.17  (the upper half of the unit disk)
7.38. Suppose that the two-dimensional random variable (X, Y) has pdf given by
= 0, elsewhere.
7.39. The following example illustrates that p = 0 does not imply independence.
Suppose that (X, Y) has a joint probability distribution given by Table 7.1.
(a) Show that E(XY) = E(X)E(Y) and hence p = 0.
(b) Indicate why X and Y are not independent.
(c) Show that this example may be generalized as follows. The choice of the number
1 is not crucial. What is important is that all the circled values are the same, all the boxed
values are the same, and the center value equals zero.
TABLE 7.1

 y\x      −1      0       1
 −1     (1/8)   [1/8]   (1/8)
  0     [1/8]     0     [1/8]
  1     (1/8)   [1/8]   (1/8)

(The corner entries are the "circled" values and the edge entries the "boxed" values referred to in part (c).)
7.40. Suppose that A and B are two events associated with an experiment ε. Suppose
that P(A) > 0 and P(B) > 0. Let the random variables X and Y be defined as follows.
thing, using xg(x) and then solve the resulting two equations for A and for B.]
7.47. Suppose that both of the regression curves of the mean are in fact linear.
Specifically, assume that E(Y | x) = −(1/3)x − 2 and E(X | y) = −(1/2)y − 3.
(a) Determine the correlation coefficient p.
(b) Determine E(X) and E(Y).
7.48. Consider weather forecasting with two alternatives: "rain" or "no rain" in the
next 24 hours. Suppose that p = Prob(rain in next 24 hours) > 1/2. The forecaster
scores 1 point if he is correct and 0 points if not. In making n forecasts, a forecaster with
no ability whatsoever chooses at random r days (0 ≤ r ≤ n) to say "rain" and the
remaining n − r days to say "no rain." His total point score is Sₙ. Compute E(Sₙ)
and Var(Sₙ) and find that value of r for which E(Sₙ) is largest. [Hint: Let Xᵢ = 1 or 0
depending on whether the ith forecast is correct or not. Then Sₙ = Σ_{i=1}^{n} Xᵢ. Note
that the Xᵢ's are not independent.]