Workshop 5: PDF Sampling and Statistics: Preview: Generating Random Numbers
Please complete workshop activities in code cells in this IPython notebook. The activities titled Practice are
purely for you to explore Python, and no particular output is expected. Some of them have some code written,
and you should try to modify it in different ways to understand how it works. Although no particular output is
expected at submission time, it is highly recommended that you read and work through the practice activities
before or alongside the exercises. However, the activities titled Exercise have specific tasks and specific
outputs expected. Include comments in your code when necessary.
The workshop should be submitted on bCourses under the Assignments tab (both the .ipynb and .pdf
files).
The random variables you generate will be distributed according to some Probability Density Function (PDF).
The most common PDF is flat: f(x) = 1/(b − a) for x ∈ [a, b). Here is how to get a random number uniformly distributed in [0, 1):
x= 0.3301381757449716
[0.42676275 0.21657034 0.31146302 0.34241291 0.76983659 0.822103
0.42949574 0.04390257 0.69953249 0.21365358]
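The code cell that produced the outputs above is not shown in this excerpt; a minimal sketch consistent with them, using NumPy's uniform generators, would be:

```python
import numpy as np

# A single uniform random number in [0, 1)
x = np.random.rand()
print(x)

# Ten uniform random numbers in [0, 1)
x = np.random.rand(10)
print(x[0:10])

# np.random.uniform(a, b, size) draws from [a, b) instead of [0, 1)
y = np.random.uniform(2.0, 5.0, size=10)
print(y)
```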
You can generate a set of randomly-distributed integer values instead:
In [3]: a = np.random.randint(0,1000,10)
print(a)
print(x[0:10])
# make a histogram
n, bins, patches = plt.hist(x, 20)
Exercise 1
We just introduced some new functions: np.random.rand() , np.random.uniform() , plt.hist() ,
np.mean() , and np.median() . So let's put them to work. You may also find np.cos() , np.sin() ,
and np.std() useful.
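One way to start putting these functions to work (the variable names and the choice of angles in [0, 2π) are illustrative, not the exercise's required setup):

```python
import numpy as np

# Sines of uniformly distributed angles in [0, 2*pi)
theta = np.random.uniform(0.0, 2.0 * np.pi, size=10000)
s = np.sin(theta)
print('Mean: {:.5f}'.format(np.mean(s)))
print('Standard Deviation: {:.5f}'.format(np.std(s)))
# A histogram, plt.hist(s, 20), would show the pile-up near -1 and +1
```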
print('Distribution of sin and cos make sense as the range can only be from -1 to 1, and we would')
print('expect most results at the end points since the functions oscillate between these extremes')
Mean: -0.00315
Standard Deviation: 2.06389
Distribution of sin and cos make sense as the range can only be from -1 to 1, and we would
expect most results at the end points since the functions oscillate between these extremes
Gaussian/Normal distribution
You can also generate Gaussian-distributed numbers. Remember that a Gaussian (or Normal) distribution is a
probability distribution given by
$$P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where μ is the average of the distribution and σ is the standard deviation. The standard normal distribution is a
special case with μ = 0 and σ = 1.
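For a general μ and σ you can use np.random.normal; a quick sketch (the particular values of mu and sigma here are illustrative):

```python
import numpy as np

mu, sigma = 5.0, 2.0   # illustrative parameters
x = np.random.normal(loc=mu, scale=sigma, size=100000)
print(np.mean(x))      # close to mu
print(np.std(x))       # close to sigma
```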
In [29]: # generate a single random number, gaussian-distributed with mean=0 and sigma=1.
# This is a standard normal distribution
x = np.random.standard_normal()
print (x)
0.38008365664084504
[ 0.11161829 0.34990148 -0.32832034 -0.0341174 2.62994729 -0.89025002
0.14426907 0.09904844 -0.04559592 1.22346903]
Exercise 2
We now introduced np.random.standard_normal() .
x = np.random.standard_normal(size=100)
print('Mean: {:.5f}'.format(np.mean(x)))
print('Standard Deviation: {:.5f}'.format(np.std(x)))
Mean: -0.02362
Standard Deviation: 1.05725
Standard Error: 0.10573
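The Standard Error line above comes from a helper function defined in a cell not shown in this excerpt; a sketch consistent with the printed values (the error on the mean, σ/√N) would be:

```python
import numpy as np

def standard_error(x, N):
    # Error on the mean: the sample RMS divided by sqrt(N)
    return np.std(x) / np.sqrt(N)

x = np.random.standard_normal(size=100)
print('Standard Error: {:.5f}'.format(standard_error(x, 100)))
```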
1. Now find the means of M = 1000 experiments of N = 100 measurements each (you'll end up generating 100,000 random numbers total). Plot a histogram of the means. Is it consistent with your calculation of the error on the mean for N = 100? About how many experiments yield a result within 1σ_μ of the true mean of 0? About how many are within 2σ_μ?
2. Now repeat question 4 for N = 10, 50, 1000, 10000. Plot a graph of the RMS of the distribution of the means vs N. Is it consistent with your expectations?
In [3]: M = 1000
N = 100
myarray = []
for i in range(M):
    x = np.random.standard_normal(size=N)
    myarray.append(np.mean(x))

array1 = []
for i in range(M):
    x1 = np.random.standard_normal(size=10)
    array1.append(np.mean(x1))
array2 = []
for i in range(M):
    x2 = np.random.standard_normal(size=50)
    array2.append(np.mean(x2))
array3 = []
for i in range(M):
    x3 = np.random.standard_normal(size=1000)
    array3.append(np.mean(x3))
array4 = []
for i in range(M):
    x4 = np.random.standard_normal(size=10000)
    array4.append(np.mean(x4))

# Plot the RMS of the distribution of means against the sample size N
fig, ax2 = plt.subplots()
ax2.plot([10, 50, 1000, 10000],
         [np.std(array1), np.std(array2), np.std(array3), np.std(array4)])
This is consistent, as it shows a clumping of the means towards 0, which is what we expect.
Within 1 standard deviation we expect to see around 68% of the data, which is about 680 experiments.
Within 2 standard deviations we expect to see about 95% of the data, which is about 950 experiments.
Out[3]: [<matplotlib.lines.Line2D at 0x7fc1423fc1f0>]
Exponential distribution
In this part we will repeat the above process, but now using lists of exponentially distributed random numbers.
The probability of selecting a random number between x and x + dx is ∝ e^(−x) dx. Exponential distributions
often appear in lossy systems, e.g. if you plot an amplitude of a damped oscillator as a function of time. Or you
may see it when you plot the number of decays of a radioactive isotope as a function of time.
1.794768330797611
[0.80516534 1.2051888 1.71763237 1.11656412 1.39751146 1.76066744
1.32033846 2.48826302 0.99862236 0.55608702]
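The cell that produced the values above is not shown in this excerpt; a sketch consistent with them, using np.random.exponential, would be:

```python
import numpy as np

# A single exponentially distributed random number (scale defaults to 1)
x = np.random.exponential()
print(x)

# Ten of them at once
x = np.random.exponential(size=10)
print(x)
```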
Exercise 3
We now introduced np.random.exponential() . This function can take up to two keywords, one of which
is size as shown above. The other is scale . Use the documentation and experiment with this exercise to
see what it does.
1. What do you expect to be the mean of the distribution? What do you expect to be the standard deviation?
2. Generate N = 100 random numbers, exponentially-distributed with the keyword scale set to 1.
3. Plot them in a histogram.
4. Compute mean, standard deviation (RMS), and the error on the mean. Is this what you expected?
5. Now find the means, standard deviations, and errors on the means for each of the M = 1000 experiments
of N = 100 measurements each. Plot a histogram of each quantity. Is the RMS of the distribution of the
means consistent with your calculation of the error on the mean for N = 100 ?
6. Now repeat question 5 for N = 10, 100, 1000, 10000. Plot a graph of the RMS of the distribution of the means vs N. Is it consistent with your expectations? This is a demonstration of the Central Limit Theorem.
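As a check on question 1: for an exponential distribution with scale parameter θ, both the mean and the standard deviation equal θ. A quick numerical sketch (θ = 2 here is an arbitrary illustrative choice):

```python
import numpy as np

theta = 2.0   # illustrative scale parameter
x = np.random.exponential(scale=theta, size=100000)
print(np.mean(x))   # close to theta
print(np.std(x))    # also close to theta
```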
x1 = np.random.exponential(size=100)   # N = 100 exponentially distributed numbers, scale=1
print('Mean: {:.5f}'.format(np.mean(x1)))
print('Standard Deviation: {:.5f}'.format(np.std(x1)))
print('Standard Error: {:.5f}'.format(standard_error(x1,100)))#calling on earlier function

M = 1000
N = 100
myarray = []
for i in range(M):
    x2 = np.random.exponential(size=N)
    myarray.append(np.mean(x2))
Mean: 0.94314
Standard Deviation: 0.81325
Standard Error: 0.08133
Binomial distribution
The binomial distribution with parameters n and p is the discrete probability distribution of the number of
successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
A typical example is a distribution of the number of heads for n coin flips (p = 0.5).
In [ ]: # Simulates flipping 1 fair coin one time. Returns 0 for heads and 1 for tails
p = 0.5
print (np.random.binomial(1,p))
Exercise 4
We now introduced the function np.random.binomial(n,p) which requires two arguments, n the number
of coins being flipped in a single trial and p the probability that a particular coin lands tails. As usual, size is
another optional keyword argument.
x = np.random.binomial(1,p, size=10)
print('Mean: {:.5f}'.format(np.mean(x)))
print('Standard Deviation: {:.5f}'.format(np.std(x)))
print('Standard Error: {:.5f}'.format(standard_error(x,10)))
print('These values for mean and standard deviation make perfect sense as there ' +
      'can only be an outcome of 0 or 1 (heads or tails), and the mean ' +
      'will be within this range and the standard deviation is just half of the range')
Mean: 0.50000
Standard Deviation: 0.50000
Standard Error: 0.15811
These values for mean and standard deviation make perfect sense as there can only be an outcome of 0 or 1 (heads or tails), and the mean will be within this range and the standard deviation is just half of the range
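More generally, for n flips the mean number of successes is np and the standard deviation is √(np(1 − p)); for a single flip with p = 0.5 that gives 0.5 and 0.5, matching the output above. A quick sketch with n = 10 (an illustrative choice):

```python
import numpy as np

n, p = 10, 0.5
x = np.random.binomial(n, p, size=100000)
print(np.mean(x))   # close to n*p = 5
print(np.std(x))    # close to sqrt(n*p*(1-p)) ~ 1.581
```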
Poisson distribution
The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of
events n occurring in a fixed interval of time T if these events occur with a known average rate ν/T and
independently of the time since the last event. The expectation value of n is ν. The variance of n is also ν, so
the standard deviation of n is σ(n) = √ν
Exercise 5
We introduced np.random.poisson() . As usual, you can use the keyword argument size to draw
multiple samples.
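The cell below prints n, nu, and s_e, which were defined in a cell not shown in this excerpt; a sketch consistent with the printed output (ν = 10 and N = 100 are inferred from the numbers, not stated in the source):

```python
import numpy as np

nu = 10                          # inferred average rate
N = 100                          # inferred number of samples
n = np.random.poisson(nu, size=N)
s_e = np.sqrt(nu) / np.sqrt(N)   # error on the mean, sigma / sqrt(N)
```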
print('Mean: {:.5f}'.format(np.mean(n)))
print('Standard Deviation: {:.5f}'.format(np.sqrt(nu)))#did not use regular std function
print('Standard Error: {:.5f}'.format(s_e))
Mean: 9.83000
Standard Deviation: 3.16228
Standard Error: 0.31623
Doing something "useful" with a distribution
Random walks show up when studying statistical mechanics (and many other fields). The simplest random walk
is this:
Imagine a person stuck walking along a straight line. Each second, they randomly step either 1 meter forward or
1 meter backward.
With this in mind, you can start to ask many different questions. After one minute, how far do they end up from
their starting point? How many times do they cross the starting point? (The exact answers require repeating this
"experiment" many times and taking an average across all the trials.) How much do you have to pay someone to
walk along this line for several hours?
There are lots of interesting ways to generalize this problem. You can extend the random walk to 2+ dimensions,
make stepping in some directions more likely than others, draw the step sizes from some probability distribution,
etc. If you're curious, it's fun to plot the paths of 2D random walks to visualize Brownian motion.
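A sketch of such a 2D walk (here each step moves one unit in one of the four axis directions, chosen uniformly; this is one of several reasonable conventions):

```python
import numpy as np

N = 1000
# At each step move one unit in a random direction: +x, -x, +y, or -y
steps = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])
choices = np.random.randint(0, 4, size=N)
path = np.cumsum(steps[choices], axis=0)
xs, ys = path[:, 0], path[:, 1]
# plt.plot(xs, ys) traces out the Brownian-motion-like path
```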
Exercise 6
Use np.random.binomial(1, 0.5) (or some other random number generator) to simulate a random walk
along one dimension (the numbers from the binomial distribution signify either stepping forward or backward). It
would be helpful to write a function that takes N steps in the random walk, and then returns the distance from
the starting point.
def random_walk(N):
    '''This function will return the distance from the starting point
    after a 1-dimensional random walk of N steps'''
    x = np.random.binomial(1, 0.5, size=N)   # 1 = step forward, 0 = step backward
    steps = 2 * x - 1                        # map to +1 / -1 meter steps
    return abs(np.sum(steps))                # distance from the starting point

random_walk(100)
Now that you have a function that simulates a single random walk for a given N , write a function (or just some
lines of code) that simulates M = 1000 of these random walks and returns the mean (average) distance
traveled for a given N .
def average_distance(N):
    # Use the random_walk(N) function 1000 times and return the average of the results
    M = 1000
    walkarray = []
    for i in range(M):
        walks = random_walk(N)
        walkarray.append(walks)
    return sum(walkarray) / M

average_distance(10)
It turns out that you can now use these random walk simulations to estimate the value of π (although in an extremely inefficient way). For values of N from 1 to 50, use your functions/code to find the mean distance D after N steps. Then make a plot of D² vs N. If you've done it correctly, the plot should be a straight line with a slope of 2/π.
Once we get to fitting in Python, you could find the slope and solve for π. For now, just draw the line 2N/π over your simulated data.
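The whole estimate can be sketched as follows, using the asymptotic relation D² ≈ 2N/π (the function names follow the exercise; this is one possible solution, not the required one):

```python
import numpy as np

def random_walk(N):
    # Distance from the start after N random +/-1 steps
    steps = 2 * np.random.binomial(1, 0.5, size=N) - 1
    return abs(np.sum(steps))

def average_distance(N, M=1000):
    # Mean distance over M independent walks of N steps
    return np.mean([random_walk(N) for _ in range(M)])

Ns = np.arange(1, 51)
D2 = np.array([average_distance(N) ** 2 for N in Ns])

# Least-squares slope of D^2 vs N through the origin; then pi ~ 2 / slope
slope = np.sum(Ns * D2) / np.sum(Ns * Ns)
print(2.0 / slope)   # a rough estimate of pi
```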