Lecture 4
Josef Ruzicka
Department of Economics
School of Sciences and Humanities
Nazarbayev University
1 / 49
Overview
1 Random variables – basic notions
4 Conditional distributions
7 Problems
2 / 49
Random variables
Definition
A random variable is a quantity determined by the result of an
experiment.
Example
We throw three dice and denote X the sum of the numbers that show
up. X is a random variable. X can take values in {3, 4, . . . , 18}. For
instance, P(X = 4) = P(two dice give 1, one die gives 2) = 3/6³ = 1/72.
In principle, we could calculate P(X = k) for any k.
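Probabilities like this are easy to sanity-check by simulation. Below is a minimal Python sketch (the function name and the trial count are our own choices for illustration):

import random

def estimate_p_sum(target, trials=200_000):
    # Monte Carlo estimate of P(sum of three dice equals target).
    hits = 0
    for _ in range(trials):
        if random.randint(1, 6) + random.randint(1, 6) + random.randint(1, 6) == target:
            hits += 1
    return hits / trials

print(estimate_p_sum(4))  # should be close to 3/6³ = 1/72 ≈ 0.0139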
3 / 49
Random variables
Example
Let X be the number of visits a website of a company registers in a day.
X can take values 0, 1, 2, 3, . . .
Example
Let X be the time (in days) that a newly started device keeps working
before failure. Suppose it satisfies
P(X ≥ x) = e^(−0.2x) for x ≥ 0.
So the probability that the device will work for at least two days is
P(X ≥ 2) = e^(−0.2·2) ≈ 67%.
• When a random variable takes values in a countable set, we call it a
discrete random variable. (The first and second examples.)
• When a random variable takes values in a continuum of possible values,
we call it a continuous random variable. (The third example.)
4 / 49
Cumulative distribution function
Definition
For any random variable X , we define its cumulative distribution
function or just distribution function F as
F (x) = P(X ≤ x)
Example
A random variable X has the distribution function
F(x) = 0              if x ≤ 0
F(x) = 1 − e^(−√x)    if x > 0
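Since P(a < X ≤ b) = F(b) − F(a), interval probabilities follow directly from F. A small Python sketch using this particular F (our own illustration):

import math

def F(x):
    # Distribution function from the example: 1 - exp(-sqrt(x)) for x > 0.
    return 0.0 if x <= 0 else 1.0 - math.exp(-math.sqrt(x))

print(F(4) - F(1))  # P(1 < X ≤ 4) = e^(−1) − e^(−2) ≈ 0.2325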
5 / 49
Probability mass function
Definition
For any discrete random variable X , we define its probability mass
function p as
p(a) = P(X = a)    when a is in the set of possible values of X
p(a) = 0           otherwise
6 / 49
Probability mass function
Example
Consider a random variable X that can only take values 10, 20, 30, 40 and
p(10) = 1/8, p(20) = 1/2, p(30) = 1/8, p(40) = 1/4.
Example
Consider a random variable X that can only take values 0, 1, 2, 3, . . . and
p(k) = 1/2^(k+1) for k = 0, 1, 2, 3, . . .
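As a quick check that these probabilities sum to one, and to illustrate inverse-transform sampling from a discrete pmf, a short Python sketch (our own illustration):

import random

print(sum(1 / 2**(k + 1) for k in range(60)))  # partial sum, ≈ 1.0

def sample():
    # Inverse-transform sampling: return the smallest k whose cumulative
    # probability exceeds a uniform draw u.
    u, k, cum = random.random(), 0, 0.0
    while True:
        cum += 1 / 2**(k + 1)
        if u < cum:
            return k
        k += 1

print(sample())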
7 / 49
Probability density function
Definition
For any continuous random variable X , we define its probability density
function f as
P(X ∈ B) = ∫_B f(x) dx    where B is any set of real numbers
8 / 49
Probability density function
• The density is closely related to the distribution function.
• If we take B = (−∞, a], then
F(a) = P(X ∈ (−∞, a]) = P(X ≤ a) = ∫_(−∞)^a f(x) dx
Example
Consider a random variable X with density
f(x) = C(x + (x − 1)²)    when x ∈ [0, 2]
f(x) = 0                  otherwise
(a) What is C?
The density must integrate to one. Thus,
∫_0^2 C(x + (x − 1)²) dx = (8/3)C = 1, so C = 3/8.
(b) What is P(X > 1)?
P(X > 1) = ∫_1^2 (3/8)(x + (x − 1)²) dx = (3/8)(2²/2 + (2 − 1)³/3 − 1²/2 − (1 − 1)³/3) = (3/8) · (11/6) = 11/16
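Both integrals are easy to verify numerically, for instance with scipy (assuming scipy is available):

from scipy.integrate import quad

f = lambda x: (3 / 8) * (x + (x - 1) ** 2)

norm, _ = quad(f, 0, 2)  # total probability, should be 1.0
tail, _ = quad(f, 1, 2)  # P(X > 1), should be 11/16 = 0.6875
print(norm, tail)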
10 / 49
Jointly distributed random variables
• Following an experiment, we often recover several quantities and are
interested in their relationship.
• For example, when measuring the lifetime of light bulbs, we can study
if and how it depends on temperature, voltage etc.
Definition
For random variables X and Y , we define the joint cumulative
distribution function F as F (x, y ) = P(X ≤ x, Y ≤ y )
• F is also called just joint distribution function.
• If we know the joint distribution function F of X and Y , we can find
the distribution function FX of X as follows:
FX (x) = P(X ≤ x) = P(X ≤ x, Y ≤ ∞) = F (x, ∞)
• Similarly, the distribution function FY of Y is
FY (y ) = P(Y ≤ y ) = P(X ≤ ∞, Y ≤ y ) = F (∞, y )
• We call FX the marginal distribution function of X (and FY the
marginal distribution function of Y ). 11 / 49
Jointly distributed random variables
• Although we can always recover the marginal distribution functions from
the joint distribution function, in general we cannot recover the joint
distribution function from the marginal distribution functions.
• F is more informative than FX and FY because F shows how X and Y
interact, while FX and FY don’t.
Definition
When X and Y are both discrete random variables with possible values
x1 , x2 , . . . and y1 , y2 , . . . , respectively, the joint probability mass
function p is p(xi , yj ) = P(X = xi , Y = yj )
• We can recover the probability mass function pX of X from the joint
probability mass function as follows:
pX(xi) = P(X = xi) = Σ_j P(X = xi, Y = yj) = Σ_j p(xi, yj)
Example
Let the joint probability mass function of X and Y be given by the table:

         Y = 1   Y = 2   Y = 3
X = 1     0.1     0.2     0.1
X = 2     0.1     0.3     0.2

The table means the joint probability mass function p is p(1, 1) = 0.1,
p(1, 2) = 0.2, p(1, 3) = 0.1, p(2, 1) = 0.1, p(2, 2) = 0.3, p(2, 3) = 0.2. We get
pX(x) = 0.1 + 0.2 + 0.1 = 0.4    when x = 1
pX(x) = 0.1 + 0.3 + 0.2 = 0.6    when x = 2
pY(y) = 0.1 + 0.1 = 0.2    when y = 1
pY(y) = 0.2 + 0.3 = 0.5    when y = 2
pY(y) = 0.1 + 0.2 = 0.3    when y = 3
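The marginal pmfs are just the row and column sums of the table; a numpy sketch (our own illustration):

import numpy as np

# Joint pmf: rows correspond to X = 1, 2; columns to Y = 1, 2, 3.
p = np.array([[0.1, 0.2, 0.1],
              [0.1, 0.3, 0.2]])

print(p.sum(axis=1))  # marginal of X: [0.4, 0.6]
print(p.sum(axis=0))  # marginal of Y: [0.2, 0.5, 0.3]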
13 / 49
Jointly distributed random variables
Definition
We say that random variables X and Y are jointly continuous if there
exists a function f(x, y): R² → R such that for any set C ⊂ R²
P((X, Y) ∈ C) = ∫∫_C f(x, y) dx dy
14 / 49
Jointly distributed random variables
• Also, when A = (−∞, a], B = (−∞, b],
F(a, b) = P(X ≤ a, Y ≤ b) = P(X ∈ (−∞, a], Y ∈ (−∞, b]) = ∫_(−∞)^b ∫_(−∞)^a f(x, y) dx dy
=⇒ f(a, b) = ∂²F(a, b)/∂a∂b
• When X and Y are jointly continuous, they are individually continuous,
too.
• We can get the density fX of X by integrating out Y over R:
P(X ∈ A) = P(X ∈ A, Y ∈ R) = ∫_(−∞)^∞ ∫_A f(x, y) dx dy
         = ∫_A ( ∫_(−∞)^∞ f(x, y) dy ) dx = ∫_A fX(x) dx
so fX(x) = ∫_(−∞)^∞ f(x, y) dy.
Example
Consider random variables X and Y with joint density
f(x, y) = e^(−x)/(2y)    when 0 ≤ x < ∞, 1 ≤ y ≤ e^2
f(x, y) = 0              otherwise
16 / 49
Jointly distributed random variables
Example (Continued)
b) Find P(X < Y ).
P(X < Y) = ∫_1^(e^2) ∫_0^y e^(−x)/(2y) dx dy = ∫_1^(e^2) [−e^(−x)/(2y)]_(x=0)^(x=y) dy
         = ∫_1^(e^2) ( −e^(−y)/(2y) + e^0/(2y) ) dy = ∫_1^(e^2) (1 − e^(−y))/(2y) dy ≈ 0.89
where the last integral does not have a closed-form solution, but it can be
evaluated by mathematical software.
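For instance, the remaining one-dimensional integral can be evaluated numerically (a scipy sketch):

import math
from scipy.integrate import quad

integrand = lambda y: (1 - math.exp(-y)) / (2 * y)
value, _ = quad(integrand, 1, math.e ** 2)
print(value)  # ≈ 0.89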
17 / 49
Jointly distributed random variables
Example (Continued)
c) Find P(X < a) where a ∈ R.
We know that X cannot take negative values, so P(X < a) = 0 for any
a < 0. Now let’s consider a ≥ 0.
P(X < a) = P(0 ≤ X < a) = ∫_1^(e^2) ∫_0^a e^(−x)/(2y) dx dy = ∫_1^(e^2) [−e^(−x)/(2y)]_(x=0)^(x=a) dy
         = ∫_1^(e^2) (e^0 − e^(−a))/(2y) dy = ((1 − e^(−a))/2) ∫_1^(e^2) (1/y) dy
         = ((1 − e^(−a))/2) [log y]_1^(e^2) = ((1 − e^(−a))/2) (log e^2 − log 1) = 1 − e^(−a)
18 / 49
Independent random variables
Definition
We say that random variables X and Y are independent if for any two
sets A, B ⊂ R
P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B)
19 / 49
Independent random variables
• For X and Y discrete, denote p the joint pmf of X and Y, pX the
pmf of X, and pY the pmf of Y. Then X and Y are independent if and
only if
p(x, y) = pX(x) pY(y) for all x, y
• For X and Y jointly continuous, denote f the joint density of X and
Y, fX the density of X, and fY the density of Y. Then X and Y
are independent if and only if
f(x, y) = fX(x) fY(y) for all x, y
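As an illustration (our own, reusing the joint pmf table from the earlier example), one can check independence by comparing the joint pmf with the product of its marginals:

import numpy as np

p = np.array([[0.1, 0.2, 0.1],
              [0.1, 0.3, 0.2]])
px = p.sum(axis=1)          # [0.4, 0.6]
py = p.sum(axis=0)          # [0.2, 0.5, 0.3]
product = np.outer(px, py)  # joint pmf that independence would require

print(np.allclose(p, product))  # False: e.g. p(1,1) = 0.1 but pX(1)·pY(1) = 0.08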
20 / 49
Independent random variables
• The independence notion can be extended to more than two variables.
Definition
We say that random variables X1, X2, . . . , Xn are independent if for any sets
A1, A2, . . . , An ⊂ R
P(X1 ∈ A1, X2 ∈ A2, . . . , Xn ∈ An) = P(X1 ∈ A1) P(X2 ∈ A2) · · · P(Xn ∈ An)
21 / 49
Conditional distributions
Definition
When X and Y are discrete random variables, the conditional
probability mass function of X given Y is
pX|Y(x|y) = P(X = x | Y = y) = p(x, y)/pY(y)
22 / 49
Conditional distributions
Example
A marketing department classifies customers into categories: man (represented by
X = 1), woman (represented by X = 2), teenager (represented by Y = 1), adult
(represented by Y = 2), and senior (represented by Y = 3). Their proportions are
given by the table:

         Y = 1   Y = 2   Y = 3
X = 1     0.1     0.2     0.1
X = 2     0.1     0.3     0.2

For example,
pX|Y(1|1) = P(X = 1 | Y = 1) = 0.1/(0.1 + 0.1) = 0.5
pY|X(1|1) = P(Y = 1 | X = 1) = 0.1/(0.1 + 0.2 + 0.1) = 0.25
pX|Y(2|3) = P(X = 2 | Y = 3) = 0.2/(0.1 + 0.2) = 2/3
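Conditioning on Y = y amounts to renormalizing the corresponding column of the table (and likewise a row for conditioning on X = x); a numpy sketch (our own illustration):

import numpy as np

p = np.array([[0.1, 0.2, 0.1],
              [0.1, 0.3, 0.2]])

col = p[:, 0]           # column Y = 1
print(col / col.sum())  # pX|Y(·|1): [0.5, 0.5]

row = p[0, :]           # row X = 1
print(row / row.sum())  # pY|X(·|1): [0.25, 0.5, 0.25]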
23 / 49
Conditional distributions
Definition
If X and Y have joint density f (x, y ) and the density of Y is fY (y ), then the
conditional density of X given Y is
fX|Y(x|y) = f(x, y)/fY(y)
Example
Consider again random variables X and Y with joint density
f(x, y) = e^(−x)/(2y)    when 0 ≤ x < ∞, 1 ≤ y ≤ e^2
f(x, y) = 0              otherwise
Find the conditional density of X given Y = y. Since
fY(y) = ∫_0^∞ e^(−x)/(2y) dx = 1/(2y) for 1 ≤ y ≤ e^2, we get
fX|Y(x|y) = f(x, y)/fY(y) = e^(−x) for x ≥ 0.
Expectation
Definition
The expectation (or expected value) of a discrete random variable X with
possible values x1, x2, . . . is
E[X] = Σ_i xi P(X = xi)
Example
We roll a die. What is the expectation of the number that we get?
E[X] = Σ_i xi P(X = xi) = 1 · 1/6 + 2 · 1/6 + 3 · 1/6 + 4 · 1/6 + 5 · 1/6 + 6 · 1/6 = 7/2
25 / 49
Expectation
Definition
The expectation of a continuous random variable X with density f is
E[X] = ∫_(−∞)^∞ x f(x) dx
Example
Consider a random variable with density
f(x) = (3/8)(x + (x − 1)²)    when x ∈ [0, 2]
f(x) = 0                      otherwise
What is E[X]?
E[X] = ∫_0^2 x · (3/8)(x + (x − 1)²) dx = (3/8)(8/3 + 2/3) = 5/4
26 / 49
Properties of expectation
Theorem
If X is a discrete random variable with possible values x1, x2, . . . , then for
any function g: R → R
E[g(X)] = Σ_i g(xi) P(X = xi)
For any a, b ∈ R
E[aX + b] = aE[X ] + b
Definition
For a random variable X , its nth moment is E[X n ].
28 / 49
Expectation of sums of random variables
Theorem
If X1, X2, . . . , Xn are random variables, then
E[X1 + X2 + · · · + Xn] = E[X1] + E[X2] + · · · + E[Xn]
Example
We roll 10 dice. What is the expectation of the sum of the numbers that
we get?
There are many possible outcomes: the sum could take any value in
{10, 11, . . . , 60}. Calculating all the probabilities would be tedious.
However, using the result above, it follows that the expected sum is just
10 times the expectation of a single die, which we saw is 7/2. Thus, the
expected sum is 10 · (7/2) = 35.
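A quick simulation sketch confirming this (the trial count is arbitrary):

import random

trials = 100_000
total = 0
for _ in range(trials):
    total += sum(random.randint(1, 6) for _ in range(10))
print(total / trials)  # ≈ 35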
29 / 49
Variance
Definition
The variance of a random variable X is Var[X ] = E[(X − E[X ])2 ]
Example
Let X be the result of rolling a die. What is its variance?
E[X²] = 1² · 1/6 + 2² · 1/6 + 3² · 1/6 + 4² · 1/6 + 5² · 1/6 + 6² · 1/6 = 91/6
Earlier we saw E[X] = 7/2, so Var[X] = E[X²] − E[X]² = 91/6 − (7/2)² = 35/12
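The same computation with exact arithmetic, using Python's fractions module (our own illustration):

from fractions import Fraction

values = range(1, 7)
EX = sum(Fraction(x, 6) for x in values)       # 7/2
EX2 = sum(Fraction(x * x, 6) for x in values)  # 91/6
print(EX2 - EX**2)                             # 35/12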
30 / 49
Variance
Theorem
For a random variable X and any constants a, b ∈ R
Var[aX + b] = a2 Var[X ]
Definition
The standard deviation of a random variable X is √Var[X].
31 / 49
Covariance and variance of sums of random variables
Definition
For random variables X and Y, their covariance is defined by
Cov[X, Y] = E[(X − E[X])(Y − E[Y])]
Theorem
For random variables X1, X2, . . . , Xn, Y1, Y2, . . . , Ym
Cov[ Σ_(i=1)^n Xi, Σ_(j=1)^m Yj ] = Σ_(i=1)^n Σ_(j=1)^m Cov[Xi, Yj]
32 / 49
Covariance and variance of sums of random variables
Theorem
For random variables X1 , X2 , . . . , Xn
" n # n n X
n
X X X
Var Xi = Var[Xi ] + Cov[Xi , Xj ]
i=1 i=1 i=1 j=1
j̸=i
Theorem
For random variables X1 , X2 , . . . , Xn that are independent
" n # n
X X
Var Xi = Var[Xi ]
i=1 i=1
33 / 49
Correlation
Definition
The correlation of random variables X and Y is
Corr[X, Y] = Cov[X, Y] / ( √Var[X] · √Var[Y] )
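As an illustration (our own, using the joint pmf table from the earlier examples and the standard identity Cov[X, Y] = E[XY] − E[X]E[Y]):

import numpy as np

p = np.array([[0.1, 0.2, 0.1],
              [0.1, 0.3, 0.2]])
xs, ys = np.array([1, 2]), np.array([1, 2, 3])

EX = xs @ p.sum(axis=1)   # 1.6
EY = ys @ p.sum(axis=0)   # 2.1
EXY = xs @ p @ ys         # 3.4
cov = EXY - EX * EY       # 0.04

var_x = xs**2 @ p.sum(axis=1) - EX**2  # 0.24
var_y = ys**2 @ p.sum(axis=0) - EY**2  # 0.49
print(cov / np.sqrt(var_x * var_y))    # correlation ≈ 0.12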
34 / 49
Moment generating function
• Calculating moments of random variables can be tedious.
• If we need to calculate various moments of a random variable, it can usually
be done faster with the help of the moment generating function.
Definition
The moment generating function ϕ of a random variable X is
ϕ(t) = E[e^(tX)] = Σ_i e^(t·xi) P(X = xi)       when X is discrete
ϕ(t) = E[e^(tX)] = ∫_(−∞)^∞ e^(tx) f(x) dx      when X is continuous with density f
• Differentiating and evaluating at t = 0 yields the moments: ϕ⁽ⁿ⁾(0) = E[X^n].
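A sympy sketch for a fair die, differentiating the mgf to recover the moments computed earlier (assuming sympy is available):

import sympy as sp

t = sp.symbols('t')
phi = sum(sp.exp(k * t) for k in range(1, 7)) / 6  # mgf of a fair die

print(sp.diff(phi, t).subs(t, 0))     # E[X]  = 7/2
print(sp.diff(phi, t, 2).subs(t, 0))  # E[X²] = 91/6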
36 / 49
Inequalities
Example
The number of calls that a call center receives per hour has mean 10.
a) What can we say about the probability that at least 40 calls are received in
the next hour?
From Markov’s inequality (for a nonnegative random variable, P(X ≥ a) ≤ E[X]/a),
P(X ≥ 40) ≤ 10/40 = 0.25
Thus, the probability that at least 40 calls are received is at most 25%.
b) Suppose we also know that the variance of the calls per hour is 8. What can
we say about the probability that the number of calls next hour will differ from
the mean by at least 4?
From Chebyshev’s inequality (P(|X − µ| ≥ k) ≤ Var[X]/k²),
P(|X − 10| ≥ 4) ≤ 8/4² = 0.5
We conclude: The probability that the number of calls will differ from the mean
by at least 4 is at most 50%.
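These bounds are often very conservative. Purely for illustration (the problem specifies only the mean, so the Poisson model below is our own assumption), compare the Markov bound with an exact tail probability:

from scipy.stats import poisson

# Hypothetical model: calls per hour ~ Poisson(10). This is an assumption
# for illustration only; the problem states nothing beyond the mean.
print(poisson.sf(39, mu=10))  # P(X ≥ 40), vanishingly small
print(10 / 40)                # Markov bound: 0.25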
37 / 49
The weak law of large numbers
Theorem
Let X1 , X2 , . . . be a sequence of independent and identically distributed
random variables with mean µ. Then for any ε > 0
P( |(1/n) Σ_(i=1)^n Xi − µ| > ε ) → 0 as n → ∞
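A short simulation of the running sample mean of die rolls, illustrating the convergence (a sketch; the sample sizes are arbitrary):

import random

running_sum = 0
for n in range(1, 100_001):
    running_sum += random.randint(1, 6)
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(n, running_sum / n)  # drifts toward the mean 3.5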
38 / 49
Problem 1
Let a > 0, µ > 0, c > 0 be constants. Consider the following function
f(x) = c x^a    for x ∈ (0, µ)
f(x) = 0        otherwise
For which values of a and µ is the function f a probability density function? What is the corresponding value of c?
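A sympy sketch of the normalization computation: for any a > 0 and µ > 0 the integral is finite, and c = (a + 1)/µ^(a+1) makes f integrate to one.

import sympy as sp

a, mu, x, c = sp.symbols('a mu x c', positive=True)
total = sp.integrate(c * x**a, (x, 0, mu))  # c*mu**(a + 1)/(a + 1)
print(sp.solve(sp.Eq(total, 1), c))         # c = (a + 1) / mu**(a + 1)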
39 / 49
Problem 2
A child got lost in a forest whose shape is an equilateral triangle. We assume the probability that the child is in any region of
the forest is proportional to the area of that region. Denote the forest sides a, b and c. What is the distribution of the
distance of the child from forest side a?
• Let D be the distance of the child from forest side a. D is a random variable.
• Let h be the height of the triangle from side a.
• Distance is nonnegative, so P(D < 0) = 0.
• The distance cannot be greater than h, so P(D ≤ h) = 1.
• Take any d ∈ [0, h]. Let’s find the probability that D ≤ d.
• The area of the whole equilateral triangle equals h²/√3.
• The points of the forest farther than d from side a form a smaller equilateral triangle of height h − d, whose area equals (h − d)²/√3.
P(D ≤ d) = 1 − P(D > d) = 1 − [(h − d)²/√3] / [h²/√3] = 1 − (1 − d/h)²
• The distribution of the distance of the child from forest side a is given by
P(D ≤ d) = 0                   if d < 0
P(D ≤ d) = 1 − (1 − d/h)²      if d ∈ [0, h]
P(D ≤ d) = 1                   if d > h
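A simulation sketch (our own): sample points uniformly in an equilateral triangle whose side a lies on the x-axis, so the distance to side a is just the y-coordinate, and compare with the formula.

import math
import random

h = math.sqrt(3) / 2  # height of a unit-side equilateral triangle
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.5, h)  # side a is the segment AB

def uniform_point():
    # Standard construction: this map of two uniforms is uniform on ABC.
    r1, r2 = math.sqrt(random.random()), random.random()
    x = (1 - r1) * A[0] + r1 * (1 - r2) * B[0] + r1 * r2 * C[0]
    y = (1 - r1) * A[1] + r1 * (1 - r2) * B[1] + r1 * r2 * C[1]
    return x, y

d, trials = 0.3, 200_000
hits = sum(1 for _ in range(trials) if uniform_point()[1] <= d)
print(hits / trials, 1 - (1 - d / h) ** 2)  # empirical vs. theoretical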
40 / 49
Problem 3
A new lottery is being designed. The number of tickets to be issued is N. Each ticket may be associated either with one prize
or with no prize. There are various prizes, each of them possibly included multiple times. Specifically, there are mi prizes of
value qi , where i ∈ {1, 2, . . . , k}. Determine the price of the lottery ticket so that its expected value equals half of the
ticket price.
• What is the value of all the tickets in total? Σ_(i=1)^k mi qi.
• By symmetry, the expected value of every ticket is the same.
• Thus, the expected value of each ticket is (1/N) Σ_(i=1)^k mi qi.
• The ticket price P must therefore satisfy P/2 = (1/N) Σ_(i=1)^k mi qi, so P = (2/N) Σ_(i=1)^k mi qi.
• Additional question: If all the tickets are sold, what is going to be the profit from
the lottery?
N · (2/N) Σ_(i=1)^k mi qi − Σ_(i=1)^k mi qi = Σ_(i=1)^k mi qi
• The profit is equal to the total value of the prizes; that is, the lottery keeps 50% of the ticket revenue.
41 / 49
Problem 4
Show that if a random variable X is independent of itself, then X is constant with probability one. (X being independent of
itself means X is independent of X .)
• Independence of X from itself means that for any x ∈ R,
P(X ≤ x) = P(X ≤ x, X ≤ x) = P(X ≤ x) · P(X ≤ x) = P(X ≤ x)²
• The last equation means that either P(X ≤ x) = 0 or P(X ≤ x) = 1. Thus, the cdf
of X can take only two values: zero or one. This is the cdf of a random variable that
is constant with probability one.
42 / 49
Problem 5
Let a > 0, µ > 0, c > 0 be constants. Consider the following function
f(x) = c x^(−a)    for x > µ
f(x) = 0           otherwise
For which values of a and µ is the function f a probability density function? What is the corresponding value of c?
If a > 1, then
∫_µ^∞ c x^(−a) dx = [c x^(−a+1)/(−a+1)]_µ^∞ = 0 − c µ^(−a+1)/(−a+1) = c µ^(1−a)/(a − 1)
43 / 49
• The integral must be 1, and for a ≤ 1 the integral diverges, so the only possibility is a > 1.
• For a > 1, we must have
∫_µ^∞ c x^(−a) dx = c µ^(1−a)/(a − 1) = 1, so c = (a − 1) µ^(a−1).
• This works for every a > 1 and every µ > 0.
44 / 49
Problem 6
For a random variable with pdf f , the median is defined as the number m ∈ R which satisfies
∫_(−∞)^m f(x) dx = ∫_m^∞ f(x) dx = 1/2
Find the median of the random variable with density
f(x) = 3x²    if x ∈ [0, 1]
f(x) = 0      otherwise
• We have
∫_0^m 3x² dx = [x³]_0^m = m³ = 1/2
• As a result, m = 1/∛2 = 2^(−1/3).
45 / 49
Problem 7
A child got lost in a forest whose shape is an equilateral triangle. We assume the probability that the child is in any region of
the forest is proportional to the area of that region. What is the distribution of the distance of the child from the nearest
forest side?
• Let D be the distance of the child from the nearest forest side. D is a random variable.
• Distance is nonnegative, so P(D < 0) = 0.
• Let h be the height of the triangle.
• The maximal distance of the child from each side of the triangle occurs when the child is at the center. In this case the
distance from each side is h/3.
• The distance cannot be greater than h/3, so P(D ≤ h/3) = 1.
• Take any d ∈ [0, h/3]. Let’s find the probability that D ≤ d.
• The area of the whole equilateral triangle equals h²/√3.
• The points of the forest at distance more than d from every side form a smaller equilateral triangle of height h − 3d, whose area equals (h − 3d)²/√3.
P(D ≤ d) = 1 − P(D > d) = 1 − [(h − 3d)²/√3] / [h²/√3] = 1 − (1 − 3d/h)²
• The distribution of the distance of the child from the nearest forest side is given by
P(D ≤ d) = 0                    if d < 0
P(D ≤ d) = 1 − (1 − 3d/h)²      if d ∈ [0, h/3]
P(D ≤ d) = 1                    if d > h/3
46 / 49
Problem 8
Let X and Y be discrete random variables with joint distribution given by the table below. (X can take values 1 or 2, Y can
take values 1, 2 or 3. The probabilities of the corresponding events are given inside the table.) For the two variables, find their
(i) marginal distributions,
(ii) expectations,
(iii) variances.
Y =1 Y =2 Y =3
• (i) pX(1) = 0.6, pX(2) = 0.4; pY(1) = 0.3, pY(2) = 0.3, pY(3) = 0.4
• (ii) E[X ] = 1 · 0.6 + 2 · 0.4 = 1.4, E[Y ] = 1 · 0.3 + 2 · 0.3 + 3 · 0.4 = 2.1
• (iii) Var[X ] = E[(X − E[X ])2 ] = (1 − 1.4)2 · 0.6 + (2 − 1.4)2 · 0.4 = 0.24,
Var[Y ] = E[(Y − E[Y ])2 ] = (1 − 2.1)2 · 0.3 + (2 − 2.1)2 · 0.3 + (3 − 2.1)2 · 0.4 = 0.69
47 / 49
Remark on previous home assignment
• Recall the Gini index G satisfies G = 1 − 2B,
B = 1/(2n) + (s1 + s2 + · · · + s(n−1))/(n sn)
where sj denotes the cumulative wealth of the j poorest individuals.
• If the wealth of every individual is raised by A, then sj becomes sj + jA, so B becomes
B̃ = 1/(2n) + Σ_(j=1)^(n−1) (sj + jA) / (n(sn + nA))
• Differentiating with respect to A,
∂B̃/∂A = [ (n(n−1)/2) · n sn − n²(s1 + s2 + · · · + s(n−1)) ] / [ n²(sn + nA)² ]
       = [ (n−1)/2 − (s1/sn + s2/sn + · · · + s(n−1)/sn) ] / [ (sn + nA)²/sn ]
• Notice that sj/sn ≤ j/n for j ∈ {1, 2, . . . , n}. Thus,
∂B̃/∂A ≥ [ (n−1)/2 − (1/n + 2/n + · · · + (n−1)/n) ] / [ (sn + nA)²/sn ]
       = [ (n−1)/2 − n(n−1)/(2n) ] / [ (sn + nA)²/sn ] = 0
Since ∂B̃/∂A ≥ 0, it follows that B̃ is nondecreasing in A, so the Gini index
G̃ = 1 − 2B̃ is nonincreasing in A. Hence, if the wealth of all individuals is raised by
A, the Gini index either stays the same or decreases.
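A numerical check of this conclusion (a sketch; the wealth vector is an arbitrary example):

import numpy as np

def gini(wealth):
    # G = 1 - 2B with B = 1/(2n) + (s1 + ... + s_{n-1})/(n * s_n),
    # where s_j is the cumulative wealth of the j poorest individuals.
    s = np.cumsum(np.sort(wealth))
    n = len(s)
    B = 1 / (2 * n) + s[:-1].sum() / (n * s[-1])
    return 1 - 2 * B

w = np.array([1.0, 2.0, 5.0, 10.0])
for A in (0.0, 1.0, 5.0):
    print(A, gini(w + A))  # the Gini index decreases as A grows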
49 / 49