
Workshop 5: PDF Sampling and Statistics

Levi Grantz 101

Submit this notebook to bCourses to receive a grade for this Workshop.

Please complete workshop activities in code cells in this IPython notebook. The activities titled Practice are
purely for you to explore Python, and no particular output is expected. Some of them have some code written,
and you should try to modify it in different ways to understand how it works. Although no particular output is
expected at submission time, it is highly recommended that you read and work through the practice activities
before or alongside the exercises. However, the activities titled Exercise have specific tasks and specific
outputs expected. Include comments in your code when necessary.

The workshop should be submitted on bCourses under the Assignments tab (both the .ipynb and .pdf
files).

Preview: generating random numbers


We will discuss simulations in greater detail later in the semester. The first step in simulating nature -- which,
despite Einstein's objections, is playing dice after all -- is to learn how to generate some numbers that appear
random. Of course, computers cannot generate true random numbers -- they have to follow an algorithm. But
the algorithm may be based on something that is difficult to predict (e.g. the time of day you are executing this
code) and therefore look random to a human. Sequences of such numbers are called pseudo-random.
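Because the sequence is deterministic once the generator's internal state is fixed, you can make a pseudo-random sequence reproducible by seeding it. A quick sketch using `np.random.seed` (the seed value 42 is an arbitrary choice):

```python
import numpy as np

# The same seed reproduces the same pseudo-random sequence
np.random.seed(42)
a = np.random.rand(3)

np.random.seed(42)
b = np.random.rand(3)

print(np.array_equal(a, b))  # → True
```

This is handy when debugging a simulation: with a fixed seed, every run draws exactly the same "random" numbers.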

The random variables you generate will be distributed according to some Probability Density Function (PDF).
The most common PDF is flat: f(x) = 1/(b − a) for x ∈ [a, b). Here is how to get a random number uniformly
distributed between a = 0 and b = 1 in Python:

In [2]: # standard preamble


import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [2]: # generate one random number between [0,1)


x = np.random.rand()
print ('x=', x)

# generate an array of 10 random numbers between [0,1)


array = np.random.rand(10)
print (array)

x= 0.3301381757449716
[0.42676275 0.21657034 0.31146302 0.34241291 0.76983659 0.822103
0.42949574 0.04390257 0.69953249 0.21365358]
You can generate a set of randomly-distributed integer values instead:

In [3]: a = np.random.randint(0,1000,10)
print(a)

[951 145 133 364 772 291 932 276 47 340]


1d distributions
Moments of the distribution
Python's SciPy library contains a set of standard statistical functions. See a few examples below:

In [10]: # create a set of data and compute mean and variance


# This creates an array of 100 elements, uniformly-distributed between 100 and 200

# Try changing the size parameter!


x = np.random.uniform(low=100,high=200,size=100)

print(x[0:10])
# make a histogram
n, bins, patches = plt.hist(x, 20)

# various measures of "average value":


print('Mean = {0:5.0f}'.format(np.mean(x)))
print( 'Median = {0:5.0f}'.format(np.median(x)))

# measure of the spread


print('Standard deviation = {0:5.1f}'.format(np.std(x)))

[153.60820733 174.59916496 160.62079547 194.95064094 130.41011027


136.09601477 169.31974157 106.60241974 183.53553976 195.36263042]
Mean = 153
Median = 153
Standard deviation = 25.1
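As a sanity check, the quantities printed above can be recomputed directly from their definitions. For a uniform distribution on [100, 200) the theoretical standard deviation is (200 − 100)/√12 ≈ 28.9, so a sample value around 25-30 is expected:

```python
import numpy as np

x = np.random.uniform(low=100, high=200, size=100)

# recompute the moments from their definitions
mu = x.sum() / x.size                          # mean
sigma = np.sqrt(((x - mu)**2).sum() / x.size)  # RMS spread about the mean

print(mu, np.mean(x))      # identical values
print(sigma, np.std(x))    # identical values
```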

Exercise 1
We just introduced some new functions: np.random.rand() , np.random.uniform() , plt.hist() ,
np.mean() , and np.median() . So let's put them to work. You may also find np.cos() , np.sin() ,
and np.std() useful.

1. Generate 100 random numbers, uniformly distributed between [-π, π )


2. Plot them in a histogram.
3. Compute mean and standard deviation (RMS)
4. Plot a histogram of sin(x) and cos(x), where x is a uniformly distributed random number between [-π ,π ). Do
you understand this distribution ?

In [87]: # Your code for Exercise 1


x = np.random.uniform(low=(-np.pi),high=np.pi,size=100)
print('Mean: {:.5f}'.format(np.mean(x)))
print('Standard Deviation: {:.5f}'.format(np.std(x)))

fig, (ax1, ax2, ax3) = plt.subplots(1,3)


fig.set_figwidth(20)#just so it is easier to see
n, bins, patches = ax1.hist(x, 10)
n ,bins, patches = ax2.hist(np.cos(x),10)
n, bins, patches = ax3.hist(np.sin(x),10)

print('Distribution of sin and cos make sense as the range can only be from -1 to 1, and we would expect most')
print('results at the end points since the functions oscillate between these extremes')

Mean: -0.00315
Standard Deviation: 2.06389
Distribution of sin and cos make sense as the range can only be from -1 to 1, and we would expect most
results at the end points since the functions oscillate between these extremes

Gaussian/Normal distribution
You can also generate Gaussian-distributed numbers. Remember that a Gaussian (or Normal) distribution is a
probability distribution given by

P(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))

where μ is the average of the distribution and σ is the standard deviation. The standard normal distribution is a
special case with μ = 0 and σ = 1.

In [29]: # generate a single random number, gaussian-distributed with mean=0 and sigma=1. This is a
# a standard normal distribution
x = np.random.standard_normal()
print (x)

# generate an array of 10 such numbers


a = np.random.standard_normal(size=10)
print (a)

0.38008365664084504
[ 0.11161829 0.34990148 -0.32832034 -0.0341174 2.62994729 -0.89025002
0.14426907 0.09904844 -0.04559592 1.22346903]
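For a Gaussian with a mean and width other than 0 and 1, `np.random.normal` takes `loc` and `scale` keyword arguments (the values 5 and 2 below are arbitrary examples):

```python
import numpy as np

# Gaussian-distributed numbers with mean 5 and standard deviation 2
x = np.random.normal(loc=5.0, scale=2.0, size=100000)

print(np.mean(x))  # close to 5
print(np.std(x))   # close to 2
```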

Exercise 2
We now introduced np.random.standard_normal() .

1. Generate N = 100 random numbers, Gaussian-distributed with μ = 0 and σ = 1.
2. Plot them in a histogram.
3. Compute the mean, standard deviation (RMS), and standard error on the mean.

The standard error on the mean is defined as σ_μ = σ/√N, where σ is the standard deviation.
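This definition can be checked numerically: the spread of the means of many N-sample experiments should match σ/√N. A sketch, with M = 5000 repetitions chosen arbitrarily:

```python
import numpy as np

N = 100
M = 5000  # number of repeated experiments (arbitrary choice)

# the mean of each experiment fluctuates around 0
means = [np.mean(np.random.standard_normal(size=N)) for _ in range(M)]

# for sigma = 1, the standard error is 1/sqrt(100) = 0.1
print(np.std(means))  # close to 0.1
```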

In [46]: # Your code for Exercise 2

x = np.random.standard_normal(size=100)

print('Mean: {:.5f}'.format(np.mean(x)))
print('Standard Deviation: {:.5f}'.format(np.std(x)))

#create standard error function


def standard_error(x,N):
s_e = np.std(x)/np.sqrt(N)
return s_e

print('Standard Error: {:.5f}'.format(standard_error(x,100)))

n, bins, patches = plt.hist(x)

Mean: -0.02362
Standard Deviation: 1.05725
Standard Error: 0.10573

4. Now find the means of M = 1000 experiments of N = 100 measurements each (you'll end up generating
100,000 random numbers total). Plot a histogram of the means. Is it consistent with your calculation of the
error on the mean for N = 100? About how many experiments yield a result within 1σ_μ of the true mean
of 0? About how many are within 2σ_μ?
5. Now repeat question 4 for N = 10, 50, 1000, 10000. Plot a graph of the RMS of the distribution of the
means vs N. Is it consistent with your expectations?

In [3]: M = 1000
N = 100

myarray = []

for i in range(M):
x = np.random.standard_normal(size=N)
myarray.append(np.mean(x))

fig, (ax1,ax2) = plt.subplots(1,2)


fig.set_figwidth(15)
n, bins, patches = ax1.hist(myarray)
print('This is consistent as it shows a clumping of the means towards 0, which is what we expect')
print('Within 1 standard error, we expect about 68% of the data, i.e. roughly 680 experiments')
print('Within 2 standard errors, we expect about 95% of the data, i.e. roughly 950 experiments')
# these fractions come from the standard normal distribution (the 68-95-99.7 rule)

array1=[]
for i in range(M):
x1 = np.random.standard_normal(size=10)
array1.append(np.mean(x1))

array2=[]
for i in range(M):
x2 = np.random.standard_normal(size=50)
array2.append(np.mean(x2))

array3=[]
for i in range(M):
x3 = np.random.standard_normal(size=1000)
array3.append(np.mean(x3))

array4=[]
for i in range(M):
x4 = np.random.standard_normal(size=10000)
array4.append(np.mean(x4))

# plot the RMS of the distribution of the means against the sample size N
Ns = [10, 50, 1000, 10000]
rms = [np.std(array1), np.std(array2), np.std(array3), np.std(array4)]
ax2.plot(Ns, rms, 'o-')
ax2.set_xscale('log')

This is consistent as it shows a clumping of the means towards 0, which is what we expect
Within 1 standard error, we expect about 68% of the data, i.e. roughly 680 experiments
Within 2 standard errors, we expect about 95% of the data, i.e. roughly 950 experiments
Out[3]: [<matplotlib.lines.Line2D at 0x7fc1423fc1f0>]

Exponential distribution
In this part we will repeat the above process, but now using lists of exponentially distributed random numbers.
The probability of selecting a random number between x and x + dx is ∝ e^(−x) dx. Exponential distributions
often appear in lossy systems, e.g. if you plot an amplitude of a damped oscillator as a function of time. Or you
may see it when you plot the number of decays of a radioactive isotope as a function of time.
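One property worth keeping in mind while experimenting with the `scale` keyword below: for an exponential distribution with scale β, the mean and the standard deviation are both equal to β. A quick numerical check (β = 2 is an arbitrary choice):

```python
import numpy as np

beta = 2.0
x = np.random.exponential(scale=beta, size=100000)

print(np.mean(x))  # close to beta
print(np.std(x))   # also close to beta
```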

In [39]: # generate a single random number, exponentially-distributed with scale=1.


x = np.random.exponential()
print (x)

# generate an array of 10 such numbers


a = np.random.exponential(size=10)
print (a)

1.794768330797611
[0.80516534 1.2051888 1.71763237 1.11656412 1.39751146 1.76066744
1.32033846 2.48826302 0.99862236 0.55608702]

Exercise 3
We now introduced np.random.exponential() . This function can take up to two keywords, one of which
is size as shown above. The other is scale . Use the documentation and experiment with this exercise to
see what it does.

1. What do you expect to be the mean of the distribution? What do you expect to be the standard deviation?
2. Generate N = 100 random numbers, exponentially-distributed with the keyword scale set to 1.
3. Plot them in a histogram.
4. Compute mean, standard deviation (RMS), and the error on the mean. Is this what you expected?
5. Now find the means, standard deviations, and errors on the means for each of the M = 1000 experiments
of N = 100 measurements each. Plot a histogram of each quantity. Is the RMS of the distribution of the
means consistent with your calculation of the error on the mean for N = 100 ?
6. Now repeat question 5 for N = 10, 100, 1000, 10000. Plot a graph of the RMS of the distribution of the
means vs N. Is it consistent with your expectations? This is a demonstration of the Central Limit Theorem.

In [80]: # Your code for Exercise 3


x1 = np.random.exponential(size=100,scale=1)

print('Mean: {:.5f}'.format(np.mean(x1)))
print('Standard Deviation: {:.5f}'.format(np.std(x1)))
print('Standard Error: {:.5f}'.format(standard_error(x1,100)))#calling on earlier function

fig, (ax1,ax2,ax3) = plt.subplots(1,3)


fig.set_figwidth(15)

n, bins, patches = ax1.hist(x1)

M = 1000
N = 100

myarray = []

for i in range(M):
    x2 = np.random.exponential(size=N, scale=1)  # exponential samples (not standard_normal)
    myarray.append(np.mean(x2))

n, bins, patches = ax2.hist(myarray)

Mean: 0.94314
Standard Deviation: 0.81325
Standard Error: 0.08133
Binomial distribution
The binomial distribution with parameters n and p is the discrete probability distribution of the number of
successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
A typical example is a distribution of the number of heads for n coin flips (p = 0.5).
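A useful check while experimenting: a binomial distribution with parameters n and p has mean np and variance np(1 − p). With a large sample this is easy to verify (n = 10 and p = 0.5 are arbitrary choices):

```python
import numpy as np

n, p = 10, 0.5
x = np.random.binomial(n, p, size=100000)

print(np.mean(x))  # close to n*p = 5
print(np.std(x))   # close to sqrt(n*p*(1-p)) ~ 1.58
```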

In [ ]: # Simulates flipping 1 fair coin one time. Returns 0 for heads and 1 for tails
p = 0.5
print (np.random.binomial(1,p))

# Simulates flipping 5 biased coins three times


p = 0.7
print (np.random.binomial(5,p, size=3))

Exercise 4
We now introduced the function np.random.binomial(n,p) which requires two arguments, n the number
of coins being flipped in a single trial and p the probability that a particular coin lands tails. As usual, size is
another optional keyword argument.

1. Generate an array of outcomes for flipping 1 unbiased coin 10 times.


2. Plot the outcomes in a histogram (0=heads, 1=tails).
3. Compute mean, standard deviation (RMS), and the error on the mean. Is this what you expected?

In [53]: # Your code for Exercise 4


p = 0.5 #50% chance of landing heads or tails on a two sided coin

x = np.random.binomial(1,p, size=10)

print('Mean: {:.5f}'.format(np.mean(x)))
print('Standard Deviation: {:.5f}'.format(np.std(x)))
print('Standard Error: {:.5f}'.format(standard_error(x,10)))

n, bins, patches = plt.hist(x)

print('These values for mean and standard deviation make perfect sense as there ' +
'can only be an outcome of 0 or 1 (heads or tails) and the mean' +
' will be within this range and the standard deviation is just half of the range')

Mean: 0.50000
Standard Deviation: 0.50000
Standard Error: 0.15811
These values for mean and standard deviation make perfect sense as therecan only be an out
come of 0 or 1 (heads or tails) and there for the mean will be within this range and the s
tandard deviation is just half of the range
Poisson distribution
The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of
events n occurring in a fixed interval of time T if these events occur with a known average rate ν/T and
independently of the time since the last event. The expectation value of n is ν. The variance of n is also ν, so
the standard deviation of n is σ(n) = √ν.
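The property that the mean and the variance are both equal to ν is easy to verify by drawing a large sample:

```python
import numpy as np

nu = 10
n = np.random.poisson(nu, size=100000)

print(np.mean(n))  # close to nu
print(np.var(n))   # also close to nu
print(np.std(n))   # close to sqrt(nu) ~ 3.16
```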

In [62]: nu = 10 # expected number of events


n = np.random.poisson(nu) # generate a Poisson-distributed number.
print (n)

Exercise 5
We introduced np.random.poisson() . As usual, you can use the keyword argument size to draw
multiple samples.

1. Generate N = 100 random numbers, Poisson-distributed with ν = 10 .


2. Plot them in a histogram.
3. Compute mean, standard deviation (RMS), and the error on the mean. Is this what you expected?
4. Now repeat question 3 for ν = 1, 5, 100, 10000 . Plot a graph of the RMS vs ν. Is it consistent with your
expectations ?

In [85]: # Your code for Exercise 5


nu = 10
n = np.random.poisson(nu,size=100)

s_e = np.sqrt(nu)/np.sqrt(100) #specific to this kind of distribution

print('Mean: {:.5f}'.format(np.mean(n)))
print('Standard Deviation: {:.5f}'.format(np.sqrt(nu)))#did not use regular std function
print('Standard Error: {:.5f}'.format(s_e))

fig, (ax1,ax2) = plt.subplots(1,2)


fig.set_figwidth(15)

n, bins, patches = ax1.hist(n)

Mean: 9.83000
Standard Deviation: 3.16228
Standard Error: 0.31623
Doing something "useful" with a distribution
Random walks show up when studying statistical mechanics (and many other fields). The simplest random walk
is this:

Imagine a person stuck walking along a straight line. Each second, they randomly step either 1 meter forward or
1 meter backward.

With this in mind, you can start to ask many different questions. After one minute, how far do they end up from
their starting point? How many times do they cross the starting point? (The exact answers require repeating this
"experiment" many times and taking an average across all the trials.) How much do you have to pay someone to
walk along this line for several hours?

There are lots of interesting ways to generalize this problem. You can extend the random walk to 2+ dimensions,
make stepping in some directions more likely than others, draw the step sizes from some probability distribution,
etc. If you're curious, it's fun to plot the paths of 2D random walks to visualize Brownian motion.
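As a teaser, a 2D path can be generated and plotted in just a few lines. This is only a sketch; the convention of stepping ±1 independently in each coordinate is one arbitrary choice among many:

```python
import numpy as np
import matplotlib.pyplot as plt

# each step moves +/-1 in x and +/-1 in y, chosen independently
steps = np.random.choice([-1, 1], size=(1000, 2))
path = np.cumsum(steps, axis=0)  # running position after each step

plt.plot(path[:, 0], path[:, 1])
plt.title('2D random walk, 1000 steps')
```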

Exercise 6
Use np.random.binomial(1, 0.5) (or some other random number generator) to simulate a random walk
along one dimension (the numbers from the binomial distribution signify either stepping forward or backward). It
would be helpful to write a function that takes N steps in the random walk, and then returns the distance from
the starting point.

In [83]: def random_walk(N):
    '''This function will return the distance from the starting point
    after a 1-dimensional random walk of N steps'''

    # Map the binomial outcomes {0, 1} to steps of -1 or +1, sum them to get
    # the net displacement, and return its absolute value (the distance)
    steps = 2*np.random.binomial(1, 0.5, size=N) - 1
    return np.abs(np.sum(steps))

random_walk(100)
Now that you have a function that simulates a single random walk for a given N , write a function (or just some
lines of code) that simulates M = 1000 of these random walks and returns the mean (average) distance
traveled for a given N .

In [91]: def average_distance(N):
    '''This function simulates 1000 random walks of N steps
    and then returns the average distance from the start.'''

    # Use the random_walk(N) function 1000 times and return the average of the results
    M = 1000
    walkarray = []
    for i in range(M):
        walkarray.append(random_walk(N))

    return sum(walkarray)/M

average_distance(10)

It turns out that you can now use these random walk simulations to estimate the value of π (although in an
extremely inefficient way). For values of N from 1 to 50, use your functions/code to find the mean distance D
after N steps. Then make a plot of D² vs N. If you've done it correctly, the plot should be a straight line with a
slope of 2/π.

Once we get to fitting in Python, you could find the slope and solve for π. For now, just draw the line D² = 2N/π over
your simulated data.

In [ ]:
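A sketch of one possible implementation of this estimate (the helper name `walk_distance` is illustrative, and the plotting step is left as a comment):

```python
import numpy as np

def walk_distance(N):
    # distance from the origin after N steps of +/-1
    steps = 2*np.random.binomial(1, 0.5, size=N) - 1
    return abs(steps.sum())

Ns = np.arange(1, 51)
# mean distance D for each N, averaged over 1000 walks
D = np.array([np.mean([walk_distance(N) for _ in range(1000)]) for N in Ns])

# D^2 should grow linearly with slope 2/pi, so pi ~ 2*N / D^2
pi_estimate = 2*Ns[-1] / D[-1]**2
print(pi_estimate)  # a noisy value in the vicinity of pi

# plt.plot(Ns, D**2, 'o'); plt.plot(Ns, 2*Ns/np.pi)  # data vs the predicted line
```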
