
PROBABILITY THEORY & STOCHASTIC PROCESS

Probability introduced through sets and relative frequency
• Experiment: a random experiment is an action or process that leads to one of several possible outcomes.

  Experiment       Outcomes
  Flip a coin      Heads, Tails
  Exam marks       Numbers: 0, 1, 2, ..., 100
  Assembly time    t > 0 seconds
  Course grades    F, D, C, B, A, A+
Sample Space
• The list of all possible outcomes is called the sample space.
• The individual outcomes are called the simple events.
• The list must be exhaustive, i.e. ALL possible outcomes are included:
  Die roll {1,2,3,4,5} is not exhaustive; die roll {1,2,3,4,5,6} is.
• The list must be mutually exclusive, i.e. no two outcomes can occur at the same time:
  Die roll {odd number, even number} is mutually exclusive; die roll {number less than 4, even number} is not (the outcome 2 belongs to both).
Sample Space
• A list of exhaustive [don't leave anything out] and mutually exclusive outcomes [impossible for 2 different events to occur in the same experiment] is called a sample space and is denoted by S.
• The outcomes are denoted by O1, O2, ..., Ok.
• Using notation from set theory, we can represent the sample space and its outcomes as:
  S = {O1, O2, ..., Ok}
• Given a sample space S = {O1, O2, ..., Ok}, the probabilities assigned to the outcomes must satisfy these requirements:
  (1) The probability of any outcome is between 0 and 1, i.e. 0 ≤ P(Oi) ≤ 1 for each i, and
  (2) The sum of the probabilities of all the outcomes equals 1, i.e. P(O1) + P(O2) + ... + P(Ok) = 1.
Relative Frequency
Consider a random experiment with sample space S. We assign a non-negative number, called a probability, to each event in the sample space.
Let A be a particular event in S. Then "the probability of event A" is denoted by P(A).
Suppose that the random experiment is repeated n times. If the event A occurs n_A times, then the probability of event A is defined as a relative frequency.
• Relative frequency definition: the probability of an event A is defined as

  P(A) = lim_{n→∞} n_A / n
Axioms of Probability
For any event A, we assign a number P(A), called the probability of the event A. This number satisfies the following three conditions that act as the axioms of probability:

  (i)   P(A) ≥ 0            (probability is a nonnegative number)
  (ii)  P(S) = 1            (probability of the whole sample space is unity)
  (iii) If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B).

(Note that (iii) states that if A and B are mutually exclusive (M.E.) events, the probability of their union is the sum of their probabilities.)
Events
• The probability of an event is the sum of the
probabilities of the simple events that
constitute the event.
• E.g. (assuming a fair die) S = {1, 2, 3, 4, 5, 6}
and
• P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
• Then:
• P(EVEN) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 =
3/6 = 1/2
Conditional Probability
• Conditional probability is used to determine how two events are related; that is, we can determine the probability of one event given the occurrence of another related event.
• Experiment: randomly select one student in class.
• P(randomly selected student is male) = ?
• P(randomly selected student is male | student is on 3rd row) = ?
• Conditional probabilities are written as P(A | B) and read as "the probability of A given B".
• P(A and B) = P(A)·P(B | A) = P(B)·P(A | B); both factorizations are true.
• Keep this in mind!
Bayes’ Law
• Bayes’ Law is named for Thomas Bayes, an eighteenth-century mathematician.
• In its most basic form, if we know P(B | A), we can apply Bayes’ Law to determine P(A | B):

  P(A | B) = P(B | A) P(A) / P(B)

• The probabilities P(A) and P(A^c) are called prior probabilities because they are determined prior to the decision about taking the preparatory course.
• The conditional probability P(A | B) is called a posterior probability (or revised probability), because the prior probability is revised after the decision about taking the preparatory course.
Total probability theorem
• Take events Ai for i = 1 to k to be:
  – Mutually exclusive: Ai ∩ Aj = ∅ for all i ≠ j
  – Exhaustive: A1 ∪ ... ∪ Ak = S
• For any event B on S,

  P(B) = P(B | A1) P(A1) + ... + P(B | Ak) P(Ak) = Σ_{i=1}^{k} P(B | Ai) P(Ai)

• Bayes’ theorem follows:

  P(Aj | B) = P(B | Aj) P(Aj) / P(B) = P(B | Aj) P(Aj) / Σ_{i=1}^{k} P(B | Ai) P(Ai)
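A small numeric sketch of the total probability theorem and Bayes' rule; the priors and likelihoods below are made-up illustration values, not taken from the text.

```python
# Hypothetical partition A1, A2, A3 of S with priors P(Ai)
prior = [0.5, 0.3, 0.2]
# Hypothetical likelihoods P(B | Ai)
likelihood = [0.1, 0.4, 0.8]

# Total probability theorem: P(B) = sum_i P(B|Ai) P(Ai)
p_B = sum(l * p for l, p in zip(likelihood, prior))

# Bayes' theorem: P(Aj | B) = P(B|Aj) P(Aj) / P(B)
posterior = [l * p / p_B for l, p in zip(likelihood, prior)]

print(p_B)        # 0.33
print(posterior)  # posterior probabilities, which sum to 1
```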
Independence
• Do A and B depend on one another?
  – Yes! B is more likely to be true if A occurred, and A is more likely if B occurred.
• If independent:
  P(A ∩ B) = P(A) P(B),   P(A | B) = P(A),   P(B | A) = P(B)
• If dependent:
  P(A ∩ B) ≠ P(A) P(B),   P(A ∩ B) = P(A | B) P(B) = P(B | A) P(A)
Random variable
• Random variable
  – A numerical value assigned to each outcome of a particular experiment.
  [Figure: mapping from the sample space S onto the real line ... -3 -2 -1 0 1 2 3]
• Example 1: Machine breakdowns
  – Sample space: S = {electrical, mechanical, misuse}
  – Each of these failures may be associated with a repair cost
  – State space: {50, 200, 350}
  – Cost is a random variable: 50, 200, and 350
• Probability Mass Function (p.m.f.)
  – A set of probability values p_i assigned to each of the values x_i taken by the discrete random variable X
  – 0 ≤ p_i ≤ 1 and Σ_i p_i = 1
  – Probability: P(X = x_i) = p_i
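A short sketch of a p.m.f. for the machine-breakdown repair-cost example; the probability values 0.3, 0.2, 0.5 are illustrative assumptions, not given in the text.

```python
# Repair-cost random variable from the machine-breakdown example.
pmf = {50: 0.3, 200: 0.2, 350: 0.5}   # assumed probabilities p_i

assert abs(sum(pmf.values()) - 1.0) < 1e-12   # a p.m.f. must sum to 1

expected_cost = sum(x * p for x, p in pmf.items())
print(expected_cost)   # 50*0.3 + 200*0.2 + 350*0.5 = 230.0
```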
Continuous and Discrete random
variables
• Discrete random variables have a countable number
of outcomes
– Examples: Dead/alive, treatment/placebo, dice, counts,
etc.
• Continuous random variables have an infinite
continuum of possible values.
– Examples: blood pressure, weight, the speed of a car, the
real numbers from 1 to 6.
• Distribution function: F_X(x) = P(X ≤ x)
• If F_X(x) is a continuous function of x, then X is a continuous random variable.
  – F_X(x) discrete in x → discrete r.v.
  – F_X(x) piecewise continuous → mixed r.v.
• Properties: 0 ≤ F_X(x) ≤ 1, F_X(-∞) = 0, F_X(∞) = 1, and F_X(x) is nondecreasing in x.
Probability Density Function (pdf)
• X : continuous r.v.; the pdf is f(x) = dF(x)/dx.
• pdf properties:
  1. f(x) ≥ 0
  2. ∫_{-∞}^{∞} f(x) dx = 1
• Distribution function from the pdf:

  F(t) = ∫_{-∞}^{t} f(x) dx
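A brief numerical sketch (NumPy assumed) of the pdf/CDF relationship for a unit exponential density f(x) = e^{-x}, x ≥ 0: the pdf integrates to 1, and a running sum of f(x)·dx approximates F(t).

```python
import numpy as np

x = np.linspace(0.0, 20.0, 200_001)
dx = x[1] - x[0]
f = np.exp(-x)                 # pdf of a unit exponential r.v.

total = np.sum(f) * dx         # crude Riemann sum of the pdf
F = np.cumsum(f) * dx          # F(t) ~ integral of f from 0 to t

print(round(total, 4))                        # ~1.0
print(round(F[np.searchsorted(x, 1.0)], 4))   # ~1 - e^-1 ~ 0.6321
```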

Binomial
• Suppose that the probability of success is p

• What is the probability of failure?


q=1–p

• Examples
– Toss of a coin (S = head): p = 0.5  q = 0.5
– Roll of a die (S = 1): p = 0.1667  q = 0.8333
– Fertility of a chicken egg (S = fertile): p = 0.8  q = 0.2
Binomial
• Imagine that a trial is repeated n times

• Examples
– A coin is tossed 5 times
– A die is rolled 25 times
– 50 chicken eggs are examined

• Assume p remains constant from trial to trial and that the trials are
statistically independent of each other
• Example
  – What is the probability of obtaining exactly 2 heads from a coin that was tossed 5 times?
  – One particular sequence, e.g. HHTTT, has probability (1/2)^5 = 1/32.
  – There are C(5,2) = 10 such sequences, so P(2 heads) = 10/32 = 0.3125.
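A minimal check (assuming SciPy is available) of the binomial probability just computed, P(2 heads in 5 tosses):

```python
from math import comb
from scipy.stats import binom

n, k, p = 5, 2, 0.5

# Direct counting: C(5,2) sequences, each with probability p^k (1-p)^(n-k).
print(comb(n, k) * p**k * (1 - p)**(n - k))   # 0.3125

# Same value from the binomial p.m.f.
print(binom.pmf(k, n, p))                     # 0.3125
```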


Poisson
• When there is a large number of trials but a small probability of success, the binomial calculation becomes impractical
  – Example: number of deaths from horse kicks in the Army in different years
• The mean number of successes from n trials is µ = np
  – Example: 64 deaths in 20 years from thousands of soldiers
• If we substitute µ/n for p, and let n tend to infinity, the binomial distribution becomes the Poisson distribution:

  P(x) = e^{-µ} µ^x / x!
Poisson
• The Poisson distribution is applied where random events in space or time are expected to occur.
• Deviation from the Poisson distribution may indicate some degree of non-randomness in the events under study.
• Investigation of the cause may then be of interest.
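A short sketch (SciPy assumed) comparing the binomial p.m.f. with its Poisson limit for large n and small p, with µ = np held fixed; the particular n and p values are illustrative.

```python
from scipy.stats import binom, poisson

n, p = 2000, 0.0016        # many trials, small success probability
mu = n * p                 # 3.2 expected successes

for k in range(6):
    print(k, round(binom.pmf(k, n, p), 5), round(poisson.pmf(k, mu), 5))
# the two columns agree to several decimal places
```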


Exponential Distribution
Uniform
All (pseudo) random generators generate random deviates of the U(0,1) distribution; that is, if you generate a large number of random variables and plot their empirical distribution function, it will approach this distribution in the limit.
U(a,b): the pdf is constant over the (a,b) interval and the CDF is the ramp function.
[Figure: pdf and cdf of U(0,1) plotted versus time]
Uniform distribution

         { 0,            x < a
  F(x) = { (x-a)/(b-a),  a < x < b
         { 1,            x > b
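A small sketch (NumPy assumed) comparing the empirical CDF of U(0,1) samples with the ramp function F(x) = x on (0,1):

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.uniform(0.0, 1.0, size=100_000)

for x in (0.25, 0.5, 0.9):
    ecdf = np.mean(u <= x)      # empirical CDF at x
    print(x, round(ecdf, 3))    # close to F(x) = x for U(0,1)
```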
Gaussian (Normal) Distribution
• Bell-shaped pdf – intuitively pleasing!
• Central Limit Theorem: the mean of a large number of mutually independent r.v.'s (having arbitrary distributions) starts following a Normal distribution as n → ∞.
• μ: mean, σ: std. deviation, σ²: variance (N(μ, σ²))
• μ and σ completely describe the statistics. This is significant in statistical estimation/signal processing/communication theory etc.
• N(0,1) is called the normalized Gaussian.
• N(0,1) is symmetric, i.e.
  – f(x) = f(-x)
  – F(-z) = 1 - F(z).
• The failure rate h(t) follows IFR (increasing failure rate) behavior.
  – Hence, N( ) is suitable for modeling long-term wear or aging related failure phenomena.
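A minimal Central Limit Theorem sketch (NumPy assumed): means of n independent uniform samples, standardized, look close to N(0,1) even for moderate n.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 30, 200_000

# Sample means of n i.i.d. U(0,1) variables (mean 1/2, variance 1/12).
means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)
z = (means - 0.5) / np.sqrt(1 / 12 / n)     # standardize

# Compare a few quantiles against N(0,1) values (~ -1.645, 0, 1.645).
print(np.round(np.quantile(z, [0.05, 0.5, 0.95]), 2))
```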
Exponential Distribution
Conditional Distributions
• The conditional distribution of Y given X = 1 is obtained from the joint distribution.
• While marginal distributions are obtained from the bivariate distribution by summing, conditional distributions are obtained by "making a cut" through the bivariate distribution.
The Expectation of a Random Variable
Expectation of a discrete random variable with p.m.f. P(X = x_i) = p_i:

  E(X) = Σ_i p_i x_i

Expectation of a continuous random variable with p.d.f. f(x):

  E(X) = ∫ x f(x) dx

expectation of X = mean of X = average of X

  continuous r.v.:  E[X] = X̄ = ∫_{-∞}^{∞} x f_X(x) dx
  discrete r.v.:    E[X] = X̄ = Σ_{i=1}^{N} x_i P(x_i)

If f_X is symmetric about x = a, i.e. f_X(x + a) = f_X(-x + a) for all x, then E[X] = a.

A function of a r.v. is again a r.v.: X r.v. ⇒ Y = g(X) r.v.
Ex: Y = g(X) = X², with P(X = 0) = P(X = 1) = P(X = -1) = 1/3
    ⇒ P(Y = 0) = 1/3, P(Y = 1) = 2/3.
Expectation
expectation of a function of a r.v. X:

  continuous r.v.:  E[g(X)] = ∫_{-∞}^{∞} g(x) f_X(x) dx
  discrete r.v.:    E[g(X)] = Σ_{i=1}^{N} g(x_i) P(x_i)

conditional expectation of a r.v. X:

  continuous r.v.:  E[X | B] = ∫_{-∞}^{∞} x f_X(x | B) dx
  discrete r.v.:    E[X | B] = Σ_{i=1}^{N} x_i P(x_i | B)

Ex: B = {X ≤ b}

  f_X(x | X ≤ b) = f_X(x) / ∫_{-∞}^{b} f_X(x) dx   for x ≤ b
                 = 0                               for x > b

  E[X | X ≤ b] = ∫_{-∞}^{b} x f_X(x) dx / ∫_{-∞}^{b} f_X(x) dx
Moments
n-th moment of a r.v. X:

  continuous r.v.:  m_n = E[X^n] = ∫_{-∞}^{∞} x^n f_X(x) dx
  discrete r.v.:    m_n = E[X^n] = Σ_{i=1}^{N} x_i^n P(x_i)

  m_0 = 1,   m_1 = X̄

properties of expectation:
(1) E[c] = c, c a constant
(2) E[a g(X) + b h(X)] = a E[g(X)] + b E[h(X)]

PF: E[c] = ∫ c f_X(x) dx = c ∫ f_X(x) dx = c

    E[a g(X) + b h(X)] = ∫ {a g(x) + b h(x)} f_X(x) dx
                       = a ∫ g(x) f_X(x) dx + b ∫ h(x) f_X(x) dx
                       = a E[g(X)] + b E[h(X)]

variance of a r.v. X:

  σ_X² = μ_2 = E[(X - X̄)²] = E[X² - 2XX̄ + X̄²]
       = E[X²] - 2X̄ E[X] + X̄² = E[X²] - X̄² = m_2 - m_1²

standard deviation of a r.v. X:  σ_X (≥ 0)

skewness of a r.v. X:  μ_3 / σ_X³
If f_X(x) is symmetric about x = X̄, then μ_3 = 0.
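A quick numerical sketch (NumPy assumed): estimating m₁, the variance m₂ − m₁², and the skewness from samples of a standard exponential r.v., whose theoretical skewness works out to 2 in the example that follows.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=1_000_000)   # exponential r.v. with a = 0, b = 1

m1 = np.mean(x)                  # first moment
m2 = np.mean(x**2)               # second moment
var = m2 - m1**2                 # sigma^2 = m2 - m1^2
mu3 = np.mean((x - m1)**3)       # third central moment
skew = mu3 / var**1.5            # skewness

print(round(m1, 2), round(var, 2), round(skew, 2))   # ~1, ~1, ~2 (cf. Ex 3.2-1 & 3.2-2)
```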
Ex 3.2-1 & Ex 3.2-2: exponential r.v.

  f_X(x) = (1/b) e^{-(x-a)/b},  x > a
         = 0,                   x < a

  m_1 = E[X]  = ∫_a^∞ x (1/b) e^{-(x-a)/b} dx = a + b
  m_2 = E[X²] = ∫_a^∞ x² (1/b) e^{-(x-a)/b} dx = (a + b)² + b²
  σ_X² = μ_2 = m_2 - m_1² = b²
  m_3 = E[X³] = ∫_a^∞ x³ (1/b) e^{-(x-a)/b} dx = a³ + 3a²b + 6ab² + 6b³

  μ_3 = E[(X - X̄)³] = E[X³ - 3X²X̄ + 3XX̄² - X̄³] = m_3 - 3 m_1 m_2 + 2 m_1³
      = a³ + 3a²b + 6ab² + 6b³ - 3(a + b){(a + b)² + b²} + 2(a + b)³
      = 2b³

  skewness of the r.v. X:  μ_3 / σ_X³ = 2b³ / b³ = 2
Chebychev's inequality:  P[|X - X̄| ≥ ε] ≤ σ_X² / ε²

PF: σ_X² = ∫_{-∞}^{∞} (x - X̄)² f_X(x) dx ≥ ∫_{|x-X̄|≥ε} (x - X̄)² f_X(x) dx
         ≥ ε² ∫_{|x-X̄|≥ε} f_X(x) dx = ε² P[|X - X̄| ≥ ε]

Markov's inequality: if P[X < 0] = 0, then  P[X ≥ a] ≤ E[X] / a

Ex 3.2-3:  P[|X - X̄| ≥ 3σ_X] ≤ σ_X² / (9σ_X²) = 1/9
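A quick sanity check (NumPy assumed) of Chebychev's inequality for an exponential r.v.: the observed tail probability P[|X − X̄| ≥ 3σ_X] stays below the bound 1/9.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=1.0, size=1_000_000)   # mean 1, std 1

mean, std = x.mean(), x.std()
tail = np.mean(np.abs(x - mean) >= 3 * std)      # empirical P[|X - mean| >= 3 sigma]

print(round(tail, 4), "<=", round(1 / 9, 4))     # e.g. ~0.018 <= 0.1111
```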

Characteristic function of a r.v. X:

  Φ_X(ω) = E[e^{jωX}] = ∫_{-∞}^{∞} f_X(x) e^{jωx} dx      (a Fourier transform of the pdf)

inverse transform:

  f_X(x) = (1/2π) ∫_{-∞}^{∞} Φ_X(ω) e^{-jωx} dω

  |Φ_X(ω)| ≤ ∫_{-∞}^{∞} f_X(x) |e^{jωx}| dx = ∫_{-∞}^{∞} f_X(x) dx = 1 = Φ_X(0)

moments from the characteristic function:

  d^n Φ_X(ω)/dω^n |_{ω=0} = ∫_{-∞}^{∞} f_X(x) (jx)^n dx = j^n E[X^n]

  ⇒  m_n = (-j)^n d^n Φ_X(ω)/dω^n |_{ω=0}
Functions That Give Moments
Moment generating function of a r.v. X:

  M_X(v) = E[e^{vX}] = ∫_{-∞}^{∞} f_X(x) e^{vx} dx

  d^n M_X(v)/dv^n |_{v=0} = ∫_{-∞}^{∞} f_X(x) x^n e^{vx} dx |_{v=0} = ∫_{-∞}^{∞} f_X(x) x^n dx = m_n

Ex 3.3-1 & Ex 3.3-2: exponential r.v.

  f_X(x) = (1/b) e^{-(x-a)/b},  x > a
         = 0,                   x < a

  Φ_X(ω) = E[e^{jωX}] = ∫_a^∞ (1/b) e^{-(x-a)/b} e^{jωx} dx = e^{jωa} / (1 - jωb)

  dΦ_X(ω)/dω = [ ja e^{jωa}(1 - jωb) + e^{jωa} jb ] / (1 - jωb)²

  M_X(v) = E[e^{vX}] = e^{va} / (1 - vb)

  dM_X(v)/dv = [ a e^{va}(1 - vb) + e^{va} b ] / (1 - vb)²

  m_1 = (-j) dΦ_X(ω)/dω |_{ω=0} = a + b,      m_1 = dM_X(v)/dv |_{v=0} = a + b
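A small symbolic sketch (assuming SymPy is available) verifying the Ex 3.3-2 result m₁ = a + b from the moment generating function M_X(v) = e^{va}/(1 − vb):

```python
import sympy as sp

a, b, v = sp.symbols('a b v', positive=True)

M = sp.exp(v * a) / (1 - v * b)          # MGF of the shifted exponential r.v.
m1 = sp.diff(M, v).subs(v, 0)            # first moment: dM/dv at v = 0
m2 = sp.diff(M, v, 2).subs(v, 0)         # second moment

print(sp.simplify(m1))                   # a + b
print(sp.simplify(m2 - m1**2))           # variance: b**2
```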
Chernoff's inequality (Ex 3.3-3): for v ≥ 0,

  P[X ≥ a] = ∫_a^∞ f_X(x) dx = ∫_{-∞}^{∞} f_X(x) u(x - a) dx
           ≤ ∫_{-∞}^{∞} f_X(x) e^{v(x-a)} dx = e^{-va} M_X(v)
Transformations of a Random Variable

  Y = T(X);   given f_X(x), find f_Y(y).

monotone increasing: T(x_1) < T(x_2) for any x_1 < x_2
monotone decreasing: T(x_1) > T(x_2) for any x_1 < x_2

Assume a monotone increasing T. With y_0 = T(x_0), i.e. x_0 = T^{-1}(y_0):

  F_Y(y_0) = P[Y ≤ y_0] = P[X ≤ x_0] = F_X(x_0)

  ∫_{-∞}^{y_0} f_Y(y) dy = ∫_{-∞}^{T^{-1}(y_0)} f_X(x) dx

  f_Y(y_0) = f_X[T^{-1}(y_0)] dT^{-1}(y_0)/dy_0

  f_Y(y) = f_X[T^{-1}(y)] dT^{-1}(y)/dy = f_X(x) dx/dy

Assume a monotone decreasing T:

  F_Y(y_0) = P[Y ≤ y_0] = P[X ≥ x_0] = 1 - F_X(x_0)

  f_Y(y) = -f_X(x) dx/dy

monotone T (either case):

  f_Y(y) = f_X(x) |dx/dy|

nonmonotone T: with x_n the real roots of y = T(x),

  f_Y(y) = Σ_n f_X(x_n) / |dT(x)/dx|_{x = x_n}

Ex 3.4-2:  Y = T(X) = cX²  (nonmonotone):

  f_Y(y) = [ f_X(√(y/c)) + f_X(-√(y/c)) ] / (2√(cy)),   y > 0
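A short Monte Carlo sketch (NumPy assumed) of Ex 3.4-2 with X standard Gaussian and c = 1, both of which are illustration choices: the histogram of Y = X² is compared with the derived density f_Y(y) = [f_X(√y) + f_X(−√y)]/(2√y).

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(2_000_000)
y = x**2                                   # Y = c X^2 with c = 1

f_X = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)   # N(0,1) pdf

for y0 in (0.1, 0.5, 1.0, 2.0, 3.0):
    h = 0.01
    emp = np.mean((y > y0 - h) & (y < y0 + h)) / (2 * h)   # narrow-bin density estimate
    theory = (f_X(np.sqrt(y0)) + f_X(-np.sqrt(y0))) / (2 * np.sqrt(y0))
    print(y0, round(emp, 3), round(theory, 3))
```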
MULTIPLE RANDOM VARIABLES AND OPERATIONS:
MULTIPLE RANDOM VARIABLES:
Vector Random Variables
A vector random variable X is a function that assigns a vector of real numbers to each outcome ζ in S, the sample space of the random experiment.

Events and Probabilities

EXAMPLE 4.4
Consider the two-dimensional random variable X = (X, Y). Find the region of the plane corresponding to the events

  A = {X + Y ≤ 10},
  B = {min(X, Y) ≤ 5}, and
  C = {X² + Y² ≤ 100}.

The regions corresponding to events A and C are straightforward to find and are shown in Fig. 4.1.

Independence
The one-dimensional random variables X and Y are "independent" if, for A1 any event that involves X only and A2 any event that involves Y only,

  P[X in A1, Y in A2] = P[X in A1] P[Y in A2].

In the general case of n random variables, we say that the random variables X1, X2,..., Xn are independent if

  P[X1 in A1, ..., Xn in An] = P[X1 in A1] ··· P[Xn in An],                   (4.3)

where Ak is an event that involves Xk only.
Pairs of Discrete Random Variables
Let the vector random variable X = (X, Y) assume values from some countable set S = {(x_j, y_k), j = 1, 2, ..., k = 1, 2, ...}. The joint probability mass function of X specifies the probabilities of the product-form event {X = x_j} ∩ {Y = y_k}:

  p_{X,Y}(x_j, y_k) = P[{X = x_j} ∩ {Y = y_k}]
                    = P[X = x_j, Y = y_k],   j = 1, 2, ...,  k = 1, 2, ...     (4.4)

The probability of any event A is the sum of the pmf over the outcomes in A:

  P[X in A] = Σ_{(x_j, y_k) in A} p_{X,Y}(x_j, y_k).                           (4.5)

  Σ_{j=1}^{∞} Σ_{k=1}^{∞} p_{X,Y}(x_j, y_k) = 1.                               (4.6)

The marginal probability mass functions:

  p_X(x_j) = P[X = x_j] = P[X = x_j, Y = anything]
           = Σ_{k=1}^{∞} p_{X,Y}(x_j, y_k),                                    (4.7a)

  p_Y(y_k) = P[Y = y_k]
           = Σ_{j=1}^{∞} p_{X,Y}(x_j, y_k).                                    (4.7b)
The Joint cdf of X and Y
The joint cumulative distribution function of X and Y is defined as the probability of the product-form event {X ≤ x1} ∩ {Y ≤ y1}:

  F_{X,Y}(x1, y1) = P[X ≤ x1, Y ≤ y1].                                         (4.8)

The joint cdf is nondecreasing in the "northeast" direction:
(i)   F_{X,Y}(x1, y1) ≤ F_{X,Y}(x2, y2)  if x1 ≤ x2 and y1 ≤ y2.

It is impossible for either X or Y to assume a value less than -∞, therefore
(ii)  F_{X,Y}(-∞, y1) = F_{X,Y}(x1, -∞) = 0.

It is certain that X and Y will assume values less than infinity, therefore
(iii) F_{X,Y}(∞, ∞) = 1.

If we let one of the variables approach infinity while keeping the other fixed, we obtain the marginal cumulative distribution functions
(iv)  F_X(x) = F_{X,Y}(x, ∞) = P[X ≤ x, Y < ∞] = P[X ≤ x]
      and F_Y(y) = F_{X,Y}(∞, y) = P[Y ≤ y].

Recall that the cdf for a single random variable is continuous from the right. It can be shown that the joint cdf is continuous from the "north" and from the "east":
(v)   lim_{x→a+} F_{X,Y}(x, y) = F_{X,Y}(a, y)   and   lim_{y→b+} F_{X,Y}(x, y) = F_{X,Y}(x, b).
The Joint pdf of Two Jointly Continuous Random Variables
We say that the random variables X and Y are jointly continuous if the probabilities of events involving (X, Y) can be expressed as an integral of a pdf. There is a nonnegative function f_{X,Y}(x, y), called the joint probability density function, that is defined on the real plane such that for every event A, a subset of the plane,

  P[X in A] = ∫∫_A f_{X,Y}(x', y') dx' dy',                                    (4.9)

as shown in Fig. 4.7. When A is the entire plane, the integral must equal one:

  1 = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f_{X,Y}(x', y') dx' dy'.                           (4.10)

The joint cdf can be obtained in terms of the joint pdf of jointly continuous random variables by integrating over the semi-infinite rectangle defined by (x, y).
The marginal pdf's f_X(x) and f_Y(y) are obtained by taking the derivative of the corresponding marginal cdf's, F_X(x) = F_{X,Y}(x, ∞) and F_Y(y) = F_{X,Y}(∞, y):

  f_X(x) = d/dx ∫_{-∞}^{x} [ ∫_{-∞}^{∞} f_{X,Y}(x', y') dy' ] dx'
         = ∫_{-∞}^{∞} f_{X,Y}(x, y') dy',                                      (4.15a)

  f_Y(y) = ∫_{-∞}^{∞} f_{X,Y}(x', y) dx'.                                      (4.15b)
INDEPENDENCE OF TWO RANDOM VARIABLES

X and Y are independent random variables if any event A1 defined in terms of X is independent of any event A2 defined in terms of Y:

  P[X in A1, Y in A2] = P[X in A1] P[Y in A2].                                 (4.17)

Suppose that X and Y are a pair of discrete random variables. If we let A1 = {X = x_j} and A2 = {Y = y_k}, then the independence of X and Y implies that

  p_{X,Y}(x_j, y_k) = P[X = x_j, Y = y_k]
                    = P[X = x_j] P[Y = y_k]
                    = p_X(x_j) p_Y(y_k)   for all x_j and y_k.                 (4.18)
4.4 CONDITIONAL PROBABILITY AND CONDITIONAL EXPECTATION

Conditional Probability
In Section 2.4, we know

  P[Y in A | X = x] = P[Y in A, X = x] / P[X = x].                             (4.22)

If X is discrete, then Eq. (4.22) can be used to obtain the conditional cdf of Y given X = x_k:

  F_Y(y | x_k) = P[Y ≤ y, X = x_k] / P[X = x_k],   for P[X = x_k] > 0.         (4.23)

The conditional pdf of Y given X = x_k, if the derivative exists, is given by

  f_Y(y | x_k) = d/dy F_Y(y | x_k).                                            (4.24)
MULTIPLE RANDOM VARIABLES

Joint Distributions
The joint cumulative distribution function of X1, X2,..., Xn is defined as the probability of an n-dimensional semi-infinite rectangle associated with the point (x1,..., xn):

  F_{X1,X2,...,Xn}(x1, x2, ..., xn) = P[X1 ≤ x1, X2 ≤ x2, ..., Xn ≤ xn].       (4.38)

The joint cdf is defined for discrete, continuous, and random variables of mixed type.
FUNCTIONS OF SEVERAL RANDOM VARIABLES
One Function of Several Random Variables

Let the random variable Z be defined as a function of several random variables:

  Z = g(X1, X2, ..., Xn).                                                      (4.51)

The cdf of Z is found by first finding the equivalent event of {Z ≤ z}, that is, the set R_z = {x = (x1,..., xn) such that g(x) ≤ z}; then

  F_Z(z) = P[X in R_z] = ∫...∫_{x' in R_z} f_{X1,...,Xn}(x'1, ..., x'n) dx'1 ... dx'n.    (4.52)

EXAMPLE 4.31  Sum of Two Random Variables
Let Z = X + Y. Find F_Z(z) and f_Z(z) in terms of the joint pdf of X and Y.

The cdf of Z is

  F_Z(z) = ∫_{-∞}^{∞} ∫_{-∞}^{z-x'} f_{X,Y}(x', y') dy' dx'.

The pdf of Z is

  f_Z(z) = d/dz F_Z(z) = ∫_{-∞}^{∞} f_{X,Y}(x', z - x') dx'.                   (4.53)

Thus the pdf for the sum of two random variables is given by a superposition integral. If X and Y are independent random variables, then by Eq. (4.21) the pdf is given by the convolution integral of the marginal pdf's of X and Y:

  f_Z(z) = ∫_{-∞}^{∞} f_X(x') f_Y(z - x') dx'.                                 (4.54)
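A numerical sketch (NumPy assumed) of Eq. (4.54): the pdf of Z = X + Y for independent X, Y ~ U(0,1), obtained by discrete convolution, is the triangular density on (0, 2).

```python
import numpy as np

dx = 0.001
x = np.arange(0.0, 1.0, dx)
f_X = np.ones_like(x)          # U(0,1) pdf on [0, 1)
f_Y = np.ones_like(x)

# Discrete approximation of the convolution integral (4.54).
f_Z = np.convolve(f_X, f_Y) * dx

for z0 in (0.5, 1.0, 1.5):
    i = int(round(z0 / dx))
    print(z0, round(f_Z[i], 3))   # triangular pdf: ~0.5, ~1.0, ~0.5
```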
pdf of Linear Transformations
We consider first the linear transformation of two random variables

  V = aX + bY              [V]   [a  b] [X]
  W = cX + eY      i.e.    [W] = [c  e] [Y].

Denote the above matrix by A. We will assume A has an inverse, so each point (v, w) has a unique corresponding point (x, y) obtained from

  [x]          [v]
  [y] = A^{-1} [w].                                                            (4.56)

In Fig. 4.15, the infinitesimal rectangle and the parallelogram are equivalent events, so their probabilities must be equal. Thus

  f_{X,Y}(x, y) dx dy = f_{V,W}(v, w) dP,

where dP is the area of the parallelogram. The joint pdf of V and W is thus given by

  f_{V,W}(v, w) = f_{X,Y}(x, y) / (dP / dx dy),                                (4.57)

where x and y are related to (v, w) by Eq. (4.56). It can be shown that dP = |ae - bc| dx dy, so the "stretch factor" is

  dP / (dx dy) = |ae - bc| dx dy / (dx dy) = |ae - bc| = |A|,

where |A| is the determinant of A.

Let the n-dimensional vector Z be

  Z = AX,

where A is an invertible n×n matrix. The joint pdf of Z is then

  f_Z(z) = f_X(A^{-1} z) / |A|.
EXPECTED VALUE OF FUNCTIONS OF RANDOM VARIABLES

The expected value of Z = g(X, Y) can be found using the following expressions:

  E[Z] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y) f_{X,Y}(x, y) dx dy     X, Y jointly continuous
                                                                               (4.64)
  E[Z] = Σ_i Σ_n g(x_i, y_n) p_{X,Y}(x_i, y_n)                 X, Y discrete.
*Joint Characteristic Function
The joint characteristic function of n random variables is defined as

  Φ_{X1,X2,...,Xn}(w1, w2, ..., wn) = E[e^{j(w1 X1 + w2 X2 + ... + wn Xn)}].   (4.73a)

For two random variables:

  Φ_{X,Y}(w1, w2) = E[e^{j(w1 X + w2 Y)}].                                     (4.73b)

If X and Y are jointly continuous random variables, then

  Φ_{X,Y}(w1, w2) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f_{X,Y}(x, y) e^{j(w1 x + w2 y)} dx dy.   (4.73c)

The inversion formula for the Fourier transform implies that the joint pdf is given by

  f_{X,Y}(x, y) = (1/4π²) ∫_{-∞}^{∞} ∫_{-∞}^{∞} Φ_{X,Y}(w1, w2) e^{-j(w1 x + w2 y)} dw1 dw2.   (4.74)
JOINTLY GAUSSIAN RANDOM VARIABLES
The random variables X and Y are said to be jointly Gaussian if their joint pdf has the form

  f_{X,Y}(x, y) = 1 / (2π σ1 σ2 √(1 - ρ²_{X,Y}))
                  · exp{ -1/(2(1 - ρ²_{X,Y}))
                         [ ((x - m1)/σ1)² - 2ρ_{X,Y}((x - m1)/σ1)((y - m2)/σ2) + ((y - m2)/σ2)² ] }
                                                                               (4.79)
  for -∞ < x < ∞ and -∞ < y < ∞.

The pdf is constant for values x and y for which the argument of the exponent is constant:

  ((x - m1)/σ1)² - 2ρ_{X,Y}((x - m1)/σ1)((y - m2)/σ2) + ((y - m2)/σ2)² = constant.

When ρ_{X,Y} = 0, X and Y are independent; when ρ_{X,Y} ≠ 0, the major axis of the ellipse is oriented along the angle

  θ = (1/2) arctan( 2 ρ_{X,Y} σ1 σ2 / (σ1² - σ2²) ).                           (4.80)

Note that the angle is 45° when the variances are equal.

The marginal pdf of X is found by integrating f_{X,Y}(x, y) over all y:

  f_X(x) = e^{-(x - m1)² / 2σ1²} / (√(2π) σ1),                                 (4.81)

that is, X is a Gaussian random variable with mean m1 and variance σ1².
n Jointly Gaussian Random Variables
The random variables X1, X2,..., Xn are said to be jointly Gaussian if their joint pdf is given by

  f_X(x) = f_{X1,X2,...,Xn}(x1, x2, ..., xn)
         = exp{ -(1/2) (x - m)^T K^{-1} (x - m) } / ( (2π)^{n/2} |K|^{1/2} ),  (4.83)

where x and m are column vectors defined by

  x = [x1, x2, ..., xn]^T,    m = [E[X1], E[X2], ..., E[Xn]]^T,

and K is the covariance matrix defined by

      [ VAR(X1)       COV(X1, X2)   ...   COV(X1, Xn) ]
  K = [ COV(X2, X1)   VAR(X2)       ...   COV(X2, Xn) ]                        (4.84)
      [     ⁝              ⁝                    ⁝      ]
      [ COV(Xn, X1)      ...              VAR(Xn)     ]
 
Transformations of Random Vectors
Let X1,..., Xn be random variables associated with some experiment, and let the random variables Z1,..., Zn be defined by n functions of X = (X1,..., Xn):

  Z1 = g1(X),   Z2 = g2(X),   ...,   Zn = gn(X).

The joint cdf of Z1,..., Zn at the point z = (z1,..., zn) is equal to the probability of the region of x where gk(x) ≤ zk for k = 1,..., n:

  F_{Z1,...,Zn}(z1, ..., zn) = P[g1(X) ≤ z1, ..., gn(X) ≤ zn]                  (4.55a)

  F_{Z1,...,Zn}(z1, ..., zn) = ∫...∫_{x': gk(x') ≤ zk} f_{X1,...,Xn}(x'1, ..., x'n) dx'1 ... dx'n.   (4.55b)
Stochastic Processes
Let ξ denote the random outcome of an experiment. To every such outcome suppose a waveform X(t, ξ) is assigned. The collection of such waveforms forms a stochastic process. The set of {ξ_k} and the time index t can be continuous or discrete (countably infinite or finite) as well.

[Fig. 14.1: an ensemble of realizations X(t, ξ_1), X(t, ξ_2), ..., X(t, ξ_n) plotted versus t]

For fixed ξ_i ∈ S (the set of all experimental outcomes), X(t, ξ_i) is a specific time function. For fixed t = t1, X1 = X(t1, ξ) is a random variable. The ensemble of all such realizations X(t, ξ) over time represents the stochastic process X(t) (see Fig. 14.1). For example

  X(t) = a cos(ω0 t + φ).
If X(t) is a stochastic process, then for fixed t, X(t) represents a random variable. Its distribution function is given by

  F_X(x, t) = P{X(t) ≤ x}.

Notice that F_X(x, t) depends on t, since for a different t we obtain a different random variable. Further,

  f_X(x, t) = dF_X(x, t)/dx

represents the first-order probability density function of the process X(t).

For t = t1 and t = t2, X(t) represents two different random variables X1 = X(t1) and X2 = X(t2) respectively. Their joint distribution is given by

  F_X(x1, x2, t1, t2) = P{X(t1) ≤ x1, X(t2) ≤ x2}

and

  f_X(x1, x2, t1, t2) = ∂² F_X(x1, x2, t1, t2) / (∂x1 ∂x2)

represents the second-order density function of the process X(t). Similarly f_X(x1, x2, ..., xn, t1, t2, ..., tn) represents the nth-order density function of the process X(t). Complete specification of the stochastic process X(t) requires the knowledge of f_X(x1, x2, ..., xn, t1, t2, ..., tn) for all t_i, i = 1, 2, ..., n and for all n (an almost impossible task in reality).
Mean of a Stochastic Process:

  μ(t) = E{X(t)} = ∫ x f_X(x, t) dx

represents the mean value of the process X(t). In general, the mean of a process can depend on the time index t.

The autocorrelation function of a process X(t) is defined as

  R_XX(t1, t2) = E{X(t1) X*(t2)} = ∫∫ x1 x2* f_X(x1, x2, t1, t2) dx1 dx2

and it represents the interrelationship between the random variables X1 = X(t1) and X2 = X(t2) generated from the process X(t).

Properties:
1. R_XX(t1, t2) = R*_XX(t2, t1) = [E{X(t2) X*(t1)}]*
2. R_XX(t, t) = E{|X(t)|²} ≥ 0.
3. R_XX(t1, t2) represents a nonnegative definite function, i.e., for any set of constants {a_i}_{i=1}^{n}

     Σ_{i=1}^{n} Σ_{j=1}^{n} a_i a_j* R_XX(t_i, t_j) ≥ 0.                      (14-8)

   Eq. (14-8) follows by noticing that E{|Y|²} ≥ 0 for Y = Σ_{i=1}^{n} a_i X(t_i).

The function

  C_XX(t1, t2) = R_XX(t1, t2) - μ_X(t1) μ_X*(t2)                               (14-9)

represents the autocovariance function of the process X(t).

Example 14.1
Let

  z = ∫_{-T}^{T} X(t) dt.

Then

  E[|z|²] = ∫_{-T}^{T} ∫_{-T}^{T} E{X(t1) X*(t2)} dt1 dt2
          = ∫_{-T}^{T} ∫_{-T}^{T} R_XX(t1, t2) dt1 dt2.                        (14-10)
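A small ensemble-average sketch (NumPy assumed) for the sinusoid X(t) = a cos(ω₀t + φ) introduced above, taking φ uniform on (0, 2π) as an illustration assumption: the ensemble mean is ≈ 0 and R_XX(t1, t2) ≈ (a²/2) cos(ω₀(t1 − t2)).

```python
import numpy as np

rng = np.random.default_rng(7)
a, w0 = 2.0, 2 * np.pi          # amplitude and angular frequency (assumed values)
phi = rng.uniform(0, 2 * np.pi, size=200_000)   # one random phase per realization

t1, t2 = 0.3, 0.1
x1 = a * np.cos(w0 * t1 + phi)  # ensemble of X(t1) values
x2 = a * np.cos(w0 * t2 + phi)  # ensemble of X(t2) values

print(round(x1.mean(), 3))                           # ensemble mean ~ 0
print(round(np.mean(x1 * x2), 3),                    # estimated R_XX(t1, t2)
      round(a**2 / 2 * np.cos(w0 * (t1 - t2)), 3))   # (a^2/2) cos(w0 (t1 - t2))
```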
Stationary Stochastic Processes
Stationary processes exhibit statistical properties that are invariant to a shift in the time index. Thus, for example, second-order stationarity implies that the statistical properties of the pairs {X(t1), X(t2)} and {X(t1+c), X(t2+c)} are the same for any c. Similarly, first-order stationarity implies that the statistical properties of X(ti) and X(ti+c) are the same for any c.
In strict terms, the statistical properties are governed by the joint probability density function. Hence a process is nth-order Strict-Sense Stationary (S.S.S.) if

  f_X(x1, x2, ..., xn, t1, t2, ..., tn) = f_X(x1, x2, ..., xn, t1+c, t2+c, ..., tn+c)     (14-14)

for any c, where the left side represents the joint density function of the random variables X1 = X(t1), X2 = X(t2), ..., Xn = X(tn) and the right side corresponds to the joint density function of the random variables X'1 = X(t1+c), X'2 = X(t2+c), ..., X'n = X(tn+c).
A process X(t) is said to be strict-sense stationary if (14-14) is true for all ti, i = 1, 2, ..., n, n = 1, 2, ... and any c.
For a first-order strict-sense stationary process, from (14-14) we have

  f_X(x, t) = f_X(x, t + c)                                                    (14-15)

for any c. In particular c = -t gives

  f_X(x, t) = f_X(x),                                                          (14-16)

i.e., the first-order density of X(t) is independent of t. In that case

  E[X(t)] = ∫ x f(x) dx = μ, a constant.                                       (14-17)

Similarly, for a second-order strict-sense stationary process we have from (14-14)

  f_X(x1, x2, t1, t2) = f_X(x1, x2, t1 + c, t2 + c)

for any c. For c = -t2 we get

  f_X(x1, x2, t1, t2) = f_X(x1, x2, t1 - t2),                                  (14-18)

i.e., the second-order density function of a strict-sense stationary process depends only on the difference of the time indices t1 - t2 = τ.
In that case the autocorrelation function is given by

  R_XX(t1, t2) = E{X(t1) X*(t2)}
               = ∫∫ x1 x2* f_X(x1, x2, τ = t1 - t2) dx1 dx2
               = R_XX(t1 - t2) = R_XX(τ) = R*_XX(-τ),                          (14-19)

i.e., the autocorrelation function of a second-order strict-sense stationary process depends only on the difference of the time indices τ = t1 - t2.

Notice that (14-17) and (14-19) are consequences of the stochastic process being first- and second-order strict-sense stationary. On the other hand, the basic conditions for first- and second-order stationarity, Eqs. (14-16) and (14-18), are usually difficult to verify. In that case, we often resort to a looser definition of stationarity, known as Wide-Sense Stationarity (W.S.S.), by making use of (14-17) and (14-19) as the necessary conditions. Thus, a process X(t) is said to be Wide-Sense Stationary if

  (i)  E{X(t)} = μ                                                             (14-20)
and
  (ii) E{X(t1) X*(t2)} = R_XX(t1 - t2),                                        (14-21)

i.e., for wide-sense stationary processes, the mean is a constant and the autocorrelation function depends only on the difference between the time indices. Notice that (14-20)-(14-21) do not say anything about the nature of the probability density functions, and instead deal with the average behavior of the process. Since (14-20)-(14-21) follow from (14-16) and (14-18), strict-sense stationarity always implies wide-sense stationarity. However, the converse is not true in general, the only exception being the Gaussian process.
This follows, since if X(t) is a Gaussian process, then by definition X1 = X(t1), X2 = X(t2), ..., Xn = X(tn) are jointly Gaussian random variables for any t1, t2, ..., tn, whose joint characteristic function is given by

  Φ_X(ω1, ω2, ..., ωn) = exp{ j Σ_k μ_X(t_k) ω_k - (1/2) Σ_{l,k} C_XX(t_l, t_k) ω_l ω_k }     (14-22)

where C_XX(t_l, t_k) is as defined in (14-9). If X(t) is wide-sense stationary, then using (14-20)-(14-21) in (14-22) we get

  Φ_X(ω1, ω2, ..., ωn) = exp{ j μ Σ_{k=1}^{n} ω_k - (1/2) Σ_{l=1}^{n} Σ_{k=1}^{n} C_XX(t_l - t_k) ω_l ω_k }     (14-23)

and hence if the set of time indices are shifted by a constant c to generate a new set of jointly Gaussian random variables X'1 = X(t1+c), X'2 = X(t2+c), ..., X'n = X(tn+c), then their joint characteristic function is identical to (14-23). Thus the sets of random variables {X_i}_{i=1}^{n} and {X'_i}_{i=1}^{n} have the same joint probability distribution for all n and all c, establishing the strict-sense stationarity of Gaussian processes from their wide-sense stationarity.
To summarize, if X(t) is a Gaussian process, then

  wide-sense stationarity (w.s.s.)  ⇒  strict-sense stationarity (s.s.s.).

Notice that since the joint p.d.f. of Gaussian random variables depends only on their second-order statistics, which is also the basis for wide-sense stationarity, this result is not surprising.
Systems with Stochastic Inputs
A deterministic system¹ transforms each input waveform X(t, ξ_i) into an output waveform Y(t, ξ_i) = T[X(t, ξ_i)] by operating only on the time variable t. Thus a set of realizations at the input corresponding to the process X(t) generates a new set of realizations {Y(t, ξ)} at the output associated with a new process Y(t).

[Fig. 14.3: input realization X(t, ξ_i) → system T[·] → output realization Y(t, ξ_i)]

Our goal is to study the output process statistics in terms of the input process statistics and the system function.

¹A stochastic system, on the other hand, operates on both the variables t and ξ.
Linear Systems: L[·] represents a linear system if

  L{a1 X(t1) + a2 X(t2)} = a1 L{X(t1)} + a2 L{X(t2)}.                          (14-28)

Let

  Y(t) = L{X(t)}                                                               (14-29)

represent the output of a linear system.

Time-Invariant System: L[·] represents a time-invariant system if

  Y(t) = L{X(t)}  ⇒  L{X(t - t0)} = Y(t - t0),                                 (14-30)

i.e., a shift in the input results in the same shift in the output also. If L[·] satisfies both (14-28) and (14-30), then it corresponds to a linear time-invariant (LTI) system.
LTI systems can be uniquely represented in terms of their output to a delta function:

[Fig. 14.5: impulse δ(t) → LTI system → impulse response h(t)]
[Fig. 14.6: arbitrary input X(t) → LTI system → output Y(t)]

  Y(t) = ∫_{-∞}^{∞} h(t - τ) X(τ) dτ = ∫_{-∞}^{∞} h(τ) X(t - τ) dτ.            (14-31)

Eq. (14-31) follows by expressing X(t) as

  X(t) = ∫_{-∞}^{∞} X(τ) δ(t - τ) dτ                                           (14-32)

and applying (14-28) and (14-30) to Y(t) = L{X(t)}. Thus

  Y(t) = L{X(t)} = L{ ∫_{-∞}^{∞} X(τ) δ(t - τ) dτ }
       = ∫_{-∞}^{∞} X(τ) L{δ(t - τ)} dτ          (by linearity)
       = ∫_{-∞}^{∞} X(τ) h(t - τ) dτ             (by time-invariance)
       = ∫_{-∞}^{∞} h(τ) X(t - τ) dτ.            (14-33)
Output Statistics: Using (14-33), the mean of the output process is given by

  μ_Y(t) = E{Y(t)} = E{ ∫_{-∞}^{∞} X(τ) h(t - τ) dτ }
         = ∫_{-∞}^{∞} μ_X(τ) h(t - τ) dτ = μ_X(t) * h(t).                      (14-34)

Similarly the cross-correlation function between the input and output processes is given by

  R_XY(t1, t2) = E{X(t1) Y*(t2)}
               = E{ X(t1) ∫_{-∞}^{∞} X*(t2 - α) h*(α) dα }
               = ∫_{-∞}^{∞} E{X(t1) X*(t2 - α)} h*(α) dα
               = ∫_{-∞}^{∞} R_XX(t1, t2 - α) h*(α) dα
               = R_XX(t1, t2) * h*(t2).                                        (14-35)

Finally the output autocorrelation function is given by

  R_YY(t1, t2) = E{Y(t1) Y*(t2)}
               = E{ ∫_{-∞}^{∞} X(t1 - β) h(β) dβ · Y*(t2) }
               = ∫_{-∞}^{∞} E{X(t1 - β) Y*(t2)} h(β) dβ
               = ∫_{-∞}^{∞} R_XY(t1 - β, t2) h(β) dβ
               = R_XY(t1, t2) * h(t1),                                         (14-36)

or

  R_YY(t1, t2) = R_XX(t1, t2) * h*(t2) * h(t1).                                (14-37)

[Block diagram: R_XX(t1,t2) → h*(t2) → R_XY(t1,t2) → h(t1) → R_YY(t1,t2)]
In particular, if X(t) is wide-sense stationary, then we have μ_X(t) = μ_X, so that from (14-34)

  μ_Y(t) = μ_X ∫_{-∞}^{∞} h(τ) dτ = μ_X c, a constant.                         (14-38)

Also R_XX(t1, t2) = R_XX(t1 - t2), so that (14-35) reduces to

  R_XY(t1, t2) = ∫_{-∞}^{∞} R_XX(t1 - t2 + α) h*(α) dα
               = R_XX(τ) * h*(-τ) = R_XY(τ),      τ = t1 - t2.                 (14-39)

Thus X(t) and Y(t) are jointly w.s.s. Further, from (14-36), the output autocorrelation simplifies to

  R_YY(t1, t2) = ∫_{-∞}^{∞} R_XY(t1 - β - t2) h(β) dβ,   τ = t1 - t2,
               = R_XY(τ) * h(τ) = R_YY(τ).                                     (14-40)

From (14-37), we obtain

  R_YY(τ) = R_XX(τ) * h*(-τ) * h(τ).                                           (14-41)

From (14-38)-(14-40), the output process is also wide-sense stationary. This gives rise to the following representation:

  (a) wide-sense stationary process X(t) → LTI system h(t) → wide-sense stationary process Y(t)
  (b) strict-sense stationary process X(t) → LTI system h(t) → strict-sense stationary process Y(t) (see text for proof)
  (c) Gaussian process (also stationary) X(t) → linear system → Gaussian process (also stationary) Y(t)
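A discrete-time sketch (NumPy assumed) of Eq. (14-38) and of a consequence of (14-41): w.s.s. white noise with mean μ_X is passed through an assumed short FIR filter h[n]; the output mean is μ_X Σ h[n], and for white input the output variance (the autocovariance at lag 0) is σ_X² Σ h[n]².

```python
import numpy as np

rng = np.random.default_rng(8)

mu_X, sigma_X = 2.0, 1.0
x = mu_X + sigma_X * rng.standard_normal(1_000_000)   # w.s.s. white input
h = np.array([0.5, 0.3, 0.2])                          # assumed FIR impulse response

y = np.convolve(x, h, mode='valid')                    # LTI output (discrete form of 14-31)

print(round(y.mean(), 3), round(mu_X * h.sum(), 3))            # (14-38): mu_Y = mu_X * sum(h)
print(round(y.var(), 3), round(sigma_X**2 * (h**2).sum(), 3))  # C_YY(0) for white input
```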
Discrete Time Stochastic Processes:

A discrete-time stochastic process Xn = X(nT) is a sequence of random variables. The mean, autocorrelation and autocovariance functions of a discrete-time process are given by

  μ_n = E{X(nT)},                                                              (14-57)

  R(n1, n2) = E{X(n1 T) X*(n2 T)}                                              (14-58)

and

  C(n1, n2) = R(n1, n2) - μ_{n1} μ*_{n2}                                       (14-59)

respectively. As before, the strict-sense stationarity and wide-sense stationarity definitions apply here also. For example, X(nT) is wide-sense stationary if

  E{X(nT)} = μ, a constant,                                                    (14-60)

and

  E[X{(k + n)T} X*{(k)T}] = R(n) = r_n = r*_{-n}.                              (14-61)
Power Spectrum
For a deterministic signal x(t), the spectrum is well defined: if X(ω) represents its Fourier transform, i.e., if

  X(ω) = ∫_{-∞}^{∞} x(t) e^{-jωt} dt,                                          (18-1)

then |X(ω)|² represents its energy spectrum. This follows from Parseval's theorem, since the signal energy is given by

  ∫_{-∞}^{∞} x²(t) dt = (1/2π) ∫_{-∞}^{∞} |X(ω)|² dω = E.                      (18-2)

Thus |X(ω)|² Δω represents the signal energy in the band (ω, ω + Δω) (see Fig. 18.1).

[Fig. 18.1: x(t) and its energy spectrum |X(ω)|², with the energy in (ω, ω + Δω) shaded]
However, for stochastic processes a direct application of (18-1) generates a sequence of random variables for every ω. Moreover, for a stochastic process, E{|X(t)|²} represents the ensemble average power (instantaneous energy) at the instant t.

To obtain the spectral distribution of power versus frequency for stochastic processes, it is best to avoid infinite intervals to begin with, and start with a finite interval (-T, T) in (18-1). Formally, the partial Fourier transform of a process X(t) based on (-T, T) is given by

  X_T(ω) = ∫_{-T}^{T} X(t) e^{-jωt} dt,                                        (18-3)

so that

  |X_T(ω)|² / 2T = (1/2T) | ∫_{-T}^{T} X(t) e^{-jωt} dt |²                     (18-4)

represents the power distribution associated with that realization based on (-T, T). Notice that (18-4) represents a random variable for every ω, and its ensemble average gives the average power distribution based on (-T, T). Thus

  P_T(ω) = E{ |X_T(ω)|² / 2T }
         = (1/2T) ∫_{-T}^{T} ∫_{-T}^{T} E{X(t1) X*(t2)} e^{-jω(t1 - t2)} dt1 dt2
         = (1/2T) ∫_{-T}^{T} ∫_{-T}^{T} R_XX(t1, t2) e^{-jω(t1 - t2)} dt1 dt2  (18-5)

represents the power distribution of X(t) based on (-T, T).
For wide-sense stationary (w.s.s.) processes, it is possible to further simplify (18-5). Thus if X(t) is assumed to be w.s.s., then R_XX(t1, t2) = R_XX(t1 - t2), and (18-5) simplifies to

  P_T(ω) = (1/2T) ∫_{-T}^{T} ∫_{-T}^{T} R_XX(t1 - t2) e^{-jω(t1 - t2)} dt1 dt2.

Let τ = t1 - t2 and proceed as in (14-24); we get

  P_T(ω) = (1/2T) ∫_{-2T}^{2T} R_XX(τ) e^{-jωτ} (2T - |τ|) dτ
         = ∫_{-2T}^{2T} R_XX(τ) e^{-jωτ} (1 - |τ|/2T) dτ ≥ 0                   (18-6)

to be the power distribution of the w.s.s. process X(t) based on (-T, T). Finally, letting T → ∞ in (18-6), we obtain

  S_XX(ω) = lim_{T→∞} P_T(ω) = ∫_{-∞}^{∞} R_XX(τ) e^{-jωτ} dτ ≥ 0              (18-7)

to be the power spectral density of the w.s.s. process X(t). Notice that

  R_XX(τ)  ←FT→  S_XX(ω) ≥ 0,                                                  (18-8)

i.e., the autocorrelation function and the power spectrum of a w.s.s. process form a Fourier transform pair, a relation known as the Wiener-Khinchin Theorem. From (18-8), the inverse formula gives

  R_XX(τ) = (1/2π) ∫_{-∞}^{∞} S_XX(ω) e^{jωτ} dω                               (18-9)

and in particular for τ = 0 we get

  (1/2π) ∫_{-∞}^{∞} S_XX(ω) dω = R_XX(0) = E{|X(t)|²} = P, the total power.    (18-10)

From (18-10), the area under S_XX(ω) represents the total power of the process X(t), and hence S_XX(ω) truly represents the power spectrum (Fig. 18.2).

If X(t) is a real w.s.s. process, then R_XX(τ) = R_XX(-τ), so that

  S_XX(ω) = ∫_{-∞}^{∞} R_XX(τ) e^{-jωτ} dτ
          = ∫_{-∞}^{∞} R_XX(τ) cos ωτ dτ
          = 2 ∫_{0}^{∞} R_XX(τ) cos ωτ dτ = S_XX(-ω) ≥ 0,                      (18-13)

so that the power spectrum is an even function (in addition to being real and nonnegative).
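A discrete-time sketch (NumPy assumed) in the spirit of (18-4)-(18-8): for white noise shaped by an assumed short FIR filter, the averaged periodogram E{|X_T(ω)|²}/2T is compared with the known power spectrum |H(ω)|² of the filtered noise.

```python
import numpy as np

rng = np.random.default_rng(9)
N, trials = 256, 2000
h = np.array([1.0, 0.6, 0.3])          # assumed FIR filter shaping unit-variance white noise

# Average of |X_T(w)|^2 / (2T) over many realizations (cf. (18-4)-(18-5)).
periodograms = []
for _ in range(trials):
    x = np.convolve(rng.standard_normal(N + len(h) - 1), h, mode='valid')
    periodograms.append(np.abs(np.fft.rfft(x))**2 / N)
S_est = np.mean(periodograms, axis=0)

# Power spectrum of the filtered white noise: |H(w)|^2 * sigma^2 with sigma = 1.
w = np.linspace(0, np.pi, len(S_est))
H = h[0] + h[1] * np.exp(-1j * w) + h[2] * np.exp(-2j * w)
S_theory = np.abs(H)**2

print(np.round(S_est[::64], 2))        # estimated PSD at w = 0, pi/2, pi
print(np.round(S_theory[::64], 2))     # the two rows should roughly agree
```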
