13. Random Processes
[status: partially-written]
13.1. Motivation, prerequisites, plan
Motivation
There is a surprising depth of insight you can gain by understanding various aspects of random processes, also called stochastic processes. The wikipedia article mentions that:
“They have applications in many disciplines including sciences such as biology, chemistry, ecology, neuroscience, and physics as well as technology and engineering fields such as image processing, signal processing, information theory, computer science, cryptography and telecommunications. Furthermore, seemingly random changes in financial markets have motivated the extensive use of stochastic processes in finance.”
This is a staggering list of fields: random processes seem to be as pervasive in all fields of physical, biological and social science as basic math, statistics and calculus.
Prerequisites
The 10-hour “serious programming” course.
The “Data files and first plots” mini-course in Section 2.
Random number basics from Section 11.
The sections on random walks from Section 12.
Plan
So how do we write programs that study and visualize these ideas? We will:
Review random number generation.
Introduce simple poisson processes: random amplitude and random time between samples.
Examine the distribution of \(\Delta t\) in a poisson process.
Look at random placements on a 2D lattice.
Various types of time series.
Revisit random walks.
Diffusion: distribution of distance covered in a random walk after time \(t\).
Brownian motion and kinetic theory.
Progression of record peaks, and progression of sports world records.
13.2. Reviewing random number generation
You should quickly review the brief section on what web pages look like in Section 11 before continuing in this section.
Now let us write a few lines of code which generate random numbers and
then puts them into a file which we can analyze and plot. You can
paste the program make_random_dt_series.py
directly into
the python3 interpreter, or save it to a file and run it:
#! /usr/bin/env python3
import random
## now put the numbers in a file
f = open('time_series.out', 'w')
f.write('## time time_interval what_happened\n')
n_samples = 3000
t = 0
for i in range(n_samples):
dt = random.random()
t = t + dt
f.write(f'{t} {dt} EVENT!!\n')
f.close()
print(f'wrote {n_samples} samples to file time_series.out')
You can run this with:
$ chmod +x make_random_dt_series.py
$ ./make_random_dt_series.py time_series.out
This will leave us with a file called time_series.out which we can examine from the command line or in an editor. The first-glance examination does not give much insight: this kind of data is best viewed using histograms.
In an earlier section we discussed making histograms from data sets,
and here we will use the program
float_histogram_maker.py
to analyze our file:
#! /usr/bin/env python3
"""Takes a file with a single column of floats and makes a histogram
of how frequently those floats occur in the file. Since float
histograms are not as trivial as the int histograms we did earlier,
this program will use the Numerical Python (numpy) histogram
facilities, which makes it really simple.
"""
import sys
import numpy as np
def main():
if len(sys.argv) != 4:
print(f'*error* - usage: {sys.argv[0]} filename which_col n_bins')
print(f' example: {sys.argv[0]} time_diffs.dat 1 80')
sys.exit(1)
fname = sys.argv[1]
which_col = int(sys.argv[2])
n_bins = int(sys.argv[3])
dataset = np.loadtxt(fname, usecols=(which_col,)) # add delimiter=',' for .csv files
print(dataset)
print(f'read {len(dataset)} samples, max {dataset.max()},'
f' min {dataset.min()}')
(hist, edges) = np.histogram(dataset, bins=n_bins)
print(len(hist), hist)
print(len(edges), edges)
z = zip(edges[:-1], hist)
fout = fname + '.hist'
np.savetxt(fout, np.array(list(z)), delimiter=' ')
print('saved edges and histogram to file %s' % fout)
if __name__ == '__main__':
main()
You can download the file, and run it like this:
$ chmod +x float_histogram_maker.py
$ ./float_histogram_maker.py time_series.out
You now have the file file_series.out.hist which has the histogram ready for plotting.
13.3. Poisson processes
A poisson process is a random process in which each event’s probability is unrelated to any previous events.
Examples include throwing dice, flipping coins, radioactive decay.
We will write programs that simulate various types of poisson process.
13.3.1. A pure poisson process
Our first program will examine what’s called a “poisson process”. [missing]
13.3.2. An angry lightning goddess
Scenario: you live in a very unfortunately placed house. Every day Astrape, the goddess of lightning, rolls her special dice and has a probability of 0.03 (3%) to hitting that house. This should correspond to an average of one hit per month.
Let us first write a program which generates this series of events
using a random number generator. Our first stab at this program,
lightning_first_stab.py
, will simply see that the
probability was indeed close to 0.03.
#! /usr/bin/env python3
"""Simpole example of simulating lightning strikes with a fixed
probability every day. This first stab at a simulation simply
calculates the probability of the hit to make sure that the random
number generator performs as we expect.
"""
import random
def main():
n_days = 1000
n_hits = 0
n_misses = 0
for day in range(n_days):
r = random.random()
if r <= 0.03: ## 3% chance
n_hits += 1
else:
n_misses += 1
hit_fraction = n_hits / n_days
print(f'average daily hits: {hit_fraction} ({n_days} days)')
if __name__ == '__main__':
main()
Run the program (a few times) with:
$ chmod +x lightning_first_stab.py
$ ./lightning_first_stab.py
The key mechanism used in this program was in these lines of code:
# ...
r = random.random()
if r <= 0.03: ## 3% chance
n_hits += 1
else:
n_misses += 1
# ...
hit_fraction = n_hits / n_days
Make sure you understand that the random.random()
call returning
less than 0.03 happens 3% of the time.
Now, following an example from Steven Pinker [FIXME: put citation to The Better Angles of Our Nature], we ask the question:
If Astrape struck your house today, which day is the most likely day for the next bolt of lightning to strike your house?
To estimate this we write the program
lightning_time_distribution.py
which simulates the
situation of a 3% chance of strike at every interval:
#! /usr/bin/env python3
import random
import math
def main():
## run this program with n_days = 50 when you want
## to eyeball the output; run it with n_days = 1000,
## then 10*1000, then 100*1000 when you want to make
## plots
n_days = 100000
delta_t_list = simulate_strikes(n_days)
## now that we have the list we print it to a file
with open('time_diffs.dat', 'w') as f:
for i, delta_t in enumerate(delta_t_list):
f.write(f'{i} {delta_t}\n')
if i > len(delta_t_list) - 10:
print(f'{i} {delta_t}')
print(f'wrote time_diffs.dat with {len(delta_t_list)} delta_t values\n'
'and showed you the last 10 or so lines')
def simulate_strikes(n_days):
"""simulates lightning strikes for a given number of
days, collecting information on the times between
strikes. returns the list of delta_t values.
"""
last_delta_t = -1
delta_t_list = []
prev_day_with_strike = -1
for day in range(n_days):
r = random.random() # a random float between 0 and 1
if r <= 0.03: # 3% chance
#print('%d: hit' % day)
if prev_day_with_strike >= 0:
## we record the delta_t of this event
last_delta_t = day - prev_day_with_strike
delta_t_list.append(last_delta_t)
prev_day_with_strike = day
return delta_t_list
if __name__ == '__main__':
main()
When you run the program with:
$ chmod +x lightning_time_distribution.py
$ ./lightning_time_distribution.py
You can then plot the results of this program in a couple of different ways. We will first make a scatter plot showing the lightning events as points in a plot with lightning number and \(\Delta t\). You can see this in Figure 13.3.1.
##REQUIRED_FILE: time_diffs.dat
##PRE_EXEC: ./lightning_time_distribution.py
set xlabel 'lightning number'
set ylabel '{/Symbol D} t (days)'
plot 'time_diffs.dat' using 2 with points pt 4 \
ps 0.3 title 'time between lightning strikes'
A bit more rigor and insight can be gotten from a histogram plotting the number of lighting pairs versus the time between the two strikes. This histogram is shown in Figure 13.3.2.
##REQUIRED_FILE: time_diffs.dat
##REQUIRED_FILE: time_diffs.dat.hist
##PRE_EXEC: ./lightning_time_distribution.py
##PRE_EXEC: ../plotting_intermediate/int_histogram_maker.py time_diffs.dat
set grid
set xlabel '{/Symbol Delta} t (days)'
set ylabel 'frequency of that interval'
set style data histogram
set style fill solid 0.8 border -1
plot [0:] [0:] 'time_diffs.dat.hist' using 1:2 with boxes
First conclusion: the answer to Steven Pinker’s question is that the most likely day in which the next lightning will strike your house is tomorrow. The goddess Astrape likes to play with humans.
Another conclusion: the distribution of time between consecutive lightning strikes looks like a decaying exponential.
[…many other conclusions…]
13.3.2.1. Questions
Now muse about the time distance between droughts and other natural calamities.
Examples: “Climate Change and Cultural Response” by Benson and Berry, page 105, shows distribution of mega-droughts in Figure 7.
Look in to the “palmer drought severity index” (PDSI) to find datasets. https://www.ncdc.noaa.gov/temp-and-precip/drought/historical-palmers/
PDSI is often shown from 1850 or 1870 until today. Some data sets go older, using tree ring data, and show the interesting periods from 800 to 1400, which would allow us to study the collapse of the Chaco Canyon culture.
For information about file formats and to see how pervasive obsolete data formats are, look at ftp://ftp.cdc.noaa.gov/Datasets/dai_pdsi/README
Download the .nc files at ftp://ftp.cdc.noaa.gov/Datasets/dai_pdsi/
and examine them with ncview
.
Or maybe better: get the ascii files from here: http://www.cgd.ucar.edu/cas/catalog/climind/pdsi.html for example http://www.cgd.ucar.edu/cas/catalog/climind/pdsisc.monthly.maps.1850-2010.fawc=1.r2.5x2.5.ipe=2.txt.gz
For Kansas look at http://www.kgs.ku.edu/Hydro/Publications/2012/OFR12_18/KGS_OF_2012-18.pdf page 10, Figure 5a: PDSI for 800 to 2000. And Figure 5b for megadroughts. Figure 5c shows the dustbowl drought.
Aha: the paleoclimatology (2000 years) might be here: https://www.ncdc.noaa.gov/paleo-search/study/19119 and a text file here: https://www1.ncdc.noaa.gov/pub/data/paleo/drought/LBDA2010/LBDA_PMDI_JJA_Recons.txt and the grid metadata is explained here: https://www1.ncdc.noaa.gov/pub/data/paleo/drought/LBDA2010/LBDA_PMDI_Instrumental_metadata.txt Note that Chaco Canyon Lat/Lon are: 36.0530 deg N, 107.9559 deg W. From the meatadat it looks like it might be grid points 1883 or 1884 in this data set.
Q: Are drought cycles random? Are they the combination of so many types of physics that they appear to be random enough? How about anthropogenic climate change? Does it add a baseline?
Topic: Atlantic multidecadal oscillation.
13.3.3. Vicious glow-worms
We have talked about processes which give events distributed randomly in time: events happen at random times. Let us now look at processes that generate points distributed randomly in space: \((x, y)\) coordinates are spewed out by our process. An example might be where the grains of sand land when you drop a handful onto the ground.
We can write a program to generate random \((x, y)\) points
between 0 and 100. The program random_spatial.py
generates
a series of such points, each completely independent of the previous
one.
#! /usr/bin/env python3
"""
Print a bunch of (x, y) points.
"""
import random
import sys
import math
def main():
n_pts = 3000
do_circle = False
if len(sys.argv) == 2:
n_pts = int(sys.argv[1])
n_hits = 0
while n_hits < n_pts:
x = random.random() * 100.0
y = random.random() * 100.0
if not do_circle:
print(f'{x} {y}')
n_hits += 1
if do_circle and math.hypot(x-50, y-50) < 50:
print(f'{x} {y}')
n_hits += 1
if __name__ == '__main__':
main()
You can plot this with the following gnuplot instructions:
##REQUIRED_FILE: random_spatial.dat
##PRE_EXEC: ./random_spatial.py > random_spatial.dat
##CAPTION: Random (x, y) points. You should be able to see
##CAPTION: some structure: occasional filaments, clustering, and empty
##CAPTION: spaces.
set size ratio -1
set xlabel 'x'
set ylabel 'y'
plot 'random_spatial.dat' using 1:2 with points pt 7 \
ps 0.3 title 'random (x, y) points'
The results are shown in Figure 13.3.3. You can see features in the data, even though it was randomly generated: filaments, clustering, voids… 1
A possible comment: people who spend a lot of time looking at randomly generated data probably don’t easily believe in conspiracy theories.
We can then do something analogous to what we did for events with
random time intervals: plot the distribution of distances between
\((x, y)\) points. The programs xy_to_distances.py
and
float_histogram_maker.py
allow us to do so, and the results are
in . Note that you will not get as much insight out of these spatial
histograms as you did in , since a big factor in the distribution of
spacial distances is the small size of the x-y plane we used.
Before running these programs you will need to install the
python3-scipy
package:
$ sudo apt install python3-scipy
#! /usr/bin/env python3
"""Load a collection of (x, y) points and print out the distance
between each pair of points"""
import sys
import scipy
import scipy.spatial
import math
def main():
fname = sys.argv[1]
coords = []
with open(fname, 'r') as f:
lines = f.readlines()
for line in lines:
xs, ys = line.split()
(x, y) = (float(xs), float(ys))
coords.append((x, y))
print('read in %d coordinates' % len(coords))
dist_out_fname = sys.argv[1] + '.distances'
nearest_fname = sys.argv[1] + '.nearest'
with open(dist_out_fname, 'w') as fpairs:
with open(nearest_fname, 'w') as fnearest:
for i in range(len(coords)-1):
nearest_distance = sys.float_info.max
for j in range(0, len(coords)):
r = math.hypot(coords[j][0] - coords[i][0],
coords[j][1] - coords[i][1])
## write all pairwise distances to the fpairs file
if j > i: # don't duplicate
fpairs.write(f'{i}_{j} {r}\n')
## now see if we have a new nearest distance
if i != j and r < nearest_distance:
nearest_distance = r
## write nearest distances to a separate file
fnearest.write(f'{i} {nearest_distance}\n')
print('wrote distances to %s' % dist_out_fname)
print('wrote nearest distances to %s' % nearest_fname)
if __name__ == '__main__':
main()
You can run this with:
$ chmod +x random_spatial.py xy_to_distances.py
$ ./random_spatial.py
$ # then save it to a file
$ ./random_spatial.py > random_spatial.dat
$ ./xy_to_distances.py random_spatial.dat
$ # this will show that output goes to the two files:
$ # random_spatial.dat.distances and random_spatial.dat.nearest
and plot the result with:
##REQUIRED_FILE: random_spatial.dat
##REQUIRED_FILE: random_spatial.dat.distances
##REQUIRED_FILE: random_spatial.dat.nearest
##REQUIRED_FILE: random_spatial.dat.distances.hist
##REQUIRED_FILE: random_spatial.dat.nearest.hist
##PRE_EXEC: ./random_spatial.py > random_spatial.dat
##PRE_EXEC: ./xy_to_distances.py random_spatial.dat
##PRE_EXEC: ./float_histogram_maker.py random_spatial.dat.distances
##PRE_EXEC: ./float_histogram_maker.py random_spatial.dat.nearest
set grid
set multi layout 2,1
set style data histogram
set style fill solid 0.8 border -1
set xlabel 'distance between random (x, y) points'
set ylabel 'frequency of that distance'
plot [0:] 'random_spatial.dat.distances.hist' using 1:2 with boxes
set xlabel 'nearest distance to a point'
set ylabel 'frequency of that nearest distance'
plot [0:] 'random_spatial.dat.nearest.hist' using 1:2 with boxes
The results are shown in Figure 13.3.4.
So we have seen that in two spatial dimensions we have a situation analogous to that for time differences: nearest points are distributed with a decaying exponential distribution.
13.3.3.1. Questions
So what would the two dimensional distribution look like if the \((x, y)\) values were not purely random?
Explore this in two ways:
Change the program
random_spatial.py
to put all the \((x, y)\) points at integer positions.Change the program
random_spatial.py
to only put new points outside a “radius of avoidance” from existing points.
The “radius of avoidance” idea is what happens with the famous Waitomo Glowworm Caves in New Zeland: the worm larvae take positions on the ceiling of the cave and glow to attract prey. They avoid settling near other larvae to avoid cannibalism. The result is a distribution that does not have the same filaments and voids as a purely random distribution.
The lack of structure (filaments, voids…) in the cave was observed by biologist Stephen Jay Gould. He conjectured that the reason was this radius of avoidance. Physicist Ed Purcell carried out a calculation and visualization similar to ours to confirm Gould’s conjecture.
[cite Steven Pinker for the glowworm example]
13.4. Brownian motion
In 1827 Scottish botanist Robert Brown looked at pollen grains in water and noticed that the pollen was moved around in the water, jiggling randomly. This was one of the significant discoveries that led to the acceptance of the molecular theory.
In 1905 Einstein explained this motion by showing that the pollen was being buffeted around by the motion of individual water molecules.
Today we know that fluids (liquids and gasses) are made of molecules which are in constant motion. They bump in to each other, into the walls of their container, and they also bump in to larger particles floating around (such as the grain of pollen that Brown observed).
Simulating this behavior is an interesting challenge: it would involve keeping track of a large number of gas molecules and seeing when they bump in to a large sphere, which would represent the grain of pollen.
Right now I leave that more detailed programming task as an exercise, and will point that a huge number of bumps (\(10^{14}\) collisions/second) by molecules which are going in all directions means that our grain of pollen will be moving in a sort of random walk, the same kind we discussed and visualized in Section 12.
What we will do here is write some programs that simulate such random walks and then look at answering the question “where will I end up?” More specifically we ask “how far did I go?”
Let us start with one dimension. The program multiple-walks.py
will take several walks in one dimension and write out the final
arrival point and how far we went.
#! /usr/bin/env python3
"""Take many one-simensional random walks so that we can study the
distribution of how far you get in a random walk
"""
import random
import math
import sys
def main():
x = 0
n_steps = 10000
n_walks = 10
if len(sys.argv) == 3:
n_steps = int(sys.argv[1])
n_walks = int(sys.argv[2])
elif len(sys.argv) != 1:
sys.stderr.write('error - usage: %s [n_steps n_walks]\n' % sys.argv[0])
exit(1)
distances = [] # we'll store all the distances we found
## now run several random walks
for i in range(n_walks):
final_x = take_walk(x, n_steps)
distance = math.fabs(final_x) # how far did we get?
distances.append(distance)
print('%d: %g %g' % (i, final_x, distance))
def take_walk(x, n_steps):
"""take a simple 1D random walk"""
for i in range(n_steps):
## generate a step of -1 or +1 in the x direction
step_x = random.randint(0, 1) * 2 - 1
x = x + step_x
# print(i, x, math.sqrt(i), math.fabs(math.fabs(x) - math.sqrt(i)))
## return the final location
return x
main()
Nice, but can we have a plot?
#! /usr/bin/env python3
"""Take many one-simensional random walks so that we can study the
distribution of how far you get in a random walk. Keep track of the
path each walk took and plot it.
"""
import random
import math
import sys
import matplotlib.pyplot as plt
def main():
x = 0
n_steps = 10000
n_walks = 10
if len(sys.argv) == 3:
n_steps = int(sys.argv[1])
n_walks = int(sys.argv[2])
elif len(sys.argv) != 1:
sys.stderr.write('error - usage: %s [n_steps n_walks]\n' % sys.argv[0])
exit(1)
distances = [] # we'll store all the distances we found
## prepare some plot parameters
plt.xlabel('step')
plt.ylabel('distance from origin')
plt.grid(True)
## now run several random walks
for i in range(n_walks):
path = take_walk(x, n_steps)
final_x = path[-1]
distance = math.fabs(final_x) # how far did we get?
distances.append(distance)
print('%d: %g %g' % (i, final_x, distance))
plt.plot(path) # add to the plot
## now show the plot we have accumulated
plt.show()
def take_walk(x, n_steps):
"""take a simple 1D random walk"""
path = []
for i in range(n_steps):
## generate a step of -1 or +1 in the x direction
step_x = random.randint(0, 1) * 2 - 1
x = x + step_x
# print(i, x, math.sqrt(i), math.fabs(math.fabs(x) - math.sqrt(i)))
path.append(x)
## return the final location
return path
main()
Sure, but what if we are impatient and we want to see the plots before they are all generated. We can animate the plot using the techniques from Section 6.8.2.
#! /usr/bin/env python3
"""Take many one-simensional random walks so that we can study the
distribution of how far you get in a random walk. Keep track of the
path each walk took and plot it.
"""
import random
import math
import sys
import matplotlib.pyplot as plt
def main():
x = 0
n_steps = 10000
n_walks = 10
if len(sys.argv) == 3:
n_steps = int(sys.argv[1])
n_walks = int(sys.argv[2])
elif len(sys.argv) != 1:
sys.stderr.write('error - usage: %s [n_steps n_walks]\n' % sys.argv[0])
exit(1)
distances = [] # we'll store all the distances we found
## prepare to have animation
fig, ax = plt.subplots()
plt.xlabel('step')
plt.ylabel('distance from origin')
plt.grid(True)
plt.ion()
## now run several random walks
for i in range(n_walks):
path = take_walk(x, n_steps)
final_x = path[-1]
distance = math.fabs(final_x) # how far did we get?
distances.append(distance)
print('%d: %g %g' % (i, final_x, distance))
plt.plot(path) # add to the plot
fig.canvas.draw_idle()
plt.pause(0.4)
fig.canvas.draw_idle()
## now show the plot we have accumulated
plt.show()
plt.waitforbuttonpress()
def take_walk(x, n_steps):
"""take a simple 1D random walk"""
path = []
for i in range(n_steps):
## generate a step of -1 or +1 in the x direction
step_x = random.randint(0, 1) * 2 - 1
x = x + step_x
# print(i, x, math.sqrt(i), math.fabs(math.fabs(x) - math.sqrt(i)))
path.append(x)
## return the final location
return path
main()
Now let’s look at a crucial property of random walks: the overall behavior of the final position and the distance from the origin. At firs tit seems that these are all very different, with no pattern. But if we plot a histogram of the final positions in the one dimensional random walk we see something interesting:
#! /usr/bin/env python3
"""Take many one-simensional random walks so that we can study the
distribution of how far you get in a random walk. This version starts
from multiple-walks.py and adds the ability to plot a histogram of how
far we got.
"""
import random
import math
import sys
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
def main():
x = 0
n_steps = 1001
n_walks = 1001
if len(sys.argv) == 3:
n_steps = int(sys.argv[1])
n_walks = int(sys.argv[2])
elif len(sys.argv) != 1:
sys.stderr.write('error - usage: %s [n_steps n_walks]\n' % sys.argv[0])
exit(1)
positions = [] # we'll store all the positions we found
## prepare to plot
plt.grid(True)
plt.xlabel('final value')
plt.ylabel('number of walks')
plt.title('1D random walk arrival values (%d, %d)' % (n_steps, n_walks))
plt.ion() # enable interactive
## now run several random walks
for i in range(n_walks):
final_x = take_walk(x, n_steps)
positions.append(final_x)
# print('%d: %g %g' % (i, final_x, position))
plot_position_histogram(positions, n_steps, n_walks)
plt.show()
plt.waitforbuttonpress()
def take_walk(x, n_steps):
"""take a simple 1D random walk"""
for i in range(n_steps):
## generate a step of -1 or +1 in the x direction
step_x = random.randint(0, 1) * 2 - 1
x = x + step_x
# print(i, x, math.sqrt(i), math.fabs(math.fabs(x) - math.sqrt(i)))
## return the final location
return x
def plot_position_histogram(positions, n_steps, n_walks):
if len(positions) < 10:
return
n_steps_so_far = len(positions)
n_bins = min(51, 5+int(n_steps_so_far/3))
# print(np.array(positions))
plt.cla()
plt.grid(True)
plt.xlabel('final value')
plt.ylabel('number of walks')
plt.title('1D random walk arrival values (%d, %d)'
% (n_steps_so_far, n_walks))
n, bins, patches = plt.hist(positions, n_bins)
fig.canvas.draw_idle()
plt.pause(0.01)
main()
NOTE: try to superimpose a gaussian with sigma proportional to sqrt(n_steps).
Let’s visualize things together with an animated multiplot:
#! /usr/bin/env python3
"""Take many one-simensional random walks so that we can study the
distribution of how far you get in a random walk. This version starts
from multiple-walks.py and adds the ability to plot a histogram of how
far we got.
"""
import random
import math
import sys
import numpy as np
import matplotlib.pyplot as plt
skip_interval = 1
# skip_interval = 5
def main():
x = 0
n_steps = 1001
n_walks = 1001
if len(sys.argv) == 3:
n_steps = int(sys.argv[1])
n_walks = int(sys.argv[2])
elif len(sys.argv) != 1:
sys.stderr.write('error - usage: %s [n_steps n_walks]\n' % sys.argv[0])
exit(1)
positions = [] # we'll store all the positions we found
# (ax1, ax2, ax3, ax4) = prepare_plotting(n_steps, n_walks)
prepare_plotting(n_steps, n_walks)
## now run several random walks
last_10_walks = []
for i in range(n_walks):
take_walk(x, n_steps, last_10_walks)
if i % skip_interval == 0 or i < 20 or i > (n_walks-20):
plot_walks(last_10_walks)
final_x = last_10_walks[-1][-1] # last position of last walk
position = math.fabs(final_x) # how far did we get?
# positions.append(position)
positions.append(final_x)
# print('%d: %g %g' % (i, final_x, position))
if i % skip_interval == 0 or i < 20 or i > (n_walks-20):
plot_position_histogram(positions, n_steps, n_walks)
plt.subplot(224).figure.suptitle('1D random walks arrival values (%d, %d)'
% (i, n_walks))
plt.pause(0.000001)
plt.show()
plt.waitforbuttonpress()
def prepare_plotting(n_steps, n_walks):
fig = plt.figure(1)
fig.suptitle('1D random walks arrival values (%d, %d)'
% (n_steps, n_walks))
plt.subplots_adjust(top=0.92, bottom=0.10, left=0.10, right=0.95,
hspace=0.25, wspace=0.35)
plt.ion()
# return (ax1, ax2, ax3, ax4)
def take_walk(x, n_steps, last_10_walks):
"""take a simple 1D random walk"""
last_10_walks.append([])
for i in range(n_steps):
## generate a step of -1 or +1 in the x direction
step_x = random.randint(0, 1) * 2 - 1
x = x + step_x
last_10_walks[-1].append(x)
# print(i, x, math.sqrt(i), math.fabs(math.fabs(x) - math.sqrt(i)))
## now make sure we only keep the last 10 entries
while len(last_10_walks) > 10:
del last_10_walks[0]
return last_10_walks
def plot_walks(last_10_walks):
## plot it in the cumulative plot
ax = plt.subplot(2, 2, 1)
plt.xlabel('step')
plt.ylabel('position')
ax.plot(last_10_walks[-1])
## plot it in the "last 10" plots
if len(last_10_walks) >= 10:
ax = plt.subplot(2, 2, 2)
ax.cla()
plt.xlabel('step')
plt.ylabel('position')
for i in range(10):
ax.plot(last_10_walks[i])
ax.figure.canvas.draw_idle()
def plot_position_histogram(positions, n_steps, n_walks):
if len(positions) < 10:
return
n_steps_so_far = len(positions)
n_bins = min(51, 5+int(n_steps_so_far/3))
## plot the positions
ax = plt.subplot(2, 2, 3)
ax.cla()
ax.grid(True)
plt.xlabel('final position')
plt.ylabel('number of walks')
# plt.title('1D random walk (%d, %d)'
# % (n_steps_so_far, n_walks))
n, bins, patches = ax.hist(positions, n_bins)
ax.figure.canvas.draw_idle()
## plot the positions
distances = [math.fabs(pos) for pos in positions]
ax = plt.subplot(2, 2, 4)
ax.cla()
ax.grid(True)
plt.xlabel('distance from start')
plt.ylabel('number of walks')
# plt.title('1D random walk (%d, %d)'
# % (n_steps_so_far, n_walks))
n, bins, patches = ax.hist(distances, n_bins)
ax.figure.suptitle('1D random walks arrival values (%d, %d)'
% (n_steps_so_far, n_walks))
ax.figure.canvas.draw_idle()
main()
Let’s look at the 2D situation:
#! /usr/bin/env python3
"""Take many one-simensional random walks so that we can study the
distribution of how far you get in a random walk. This version starts
from multiple-walks-histogram.py, changes it to take two dimensional
walks, and then plots a histogram of how far we got.
"""
import random
import math
import sys
import numpy as np
import matplotlib.pyplot as plt
def main():
pt = (0, 0)
n_steps = 1000
n_walks = 1000
if len(sys.argv) == 3:
n_steps = int(sys.argv[1])
n_walks = int(sys.argv[2])
elif len(sys.argv) == 4:
n_steps = int(sys.argv[1])
n_walks = int(sys.argv[2])
output_file = sys.argv[3]
elif len(sys.argv) == 2:
output_file = sys.argv[1]
elif len(sys.argv) != 1:
sys.stderr.write('error - usage: %s [n_steps n_walks]\n' % sys.argv[0])
exit(1)
distances = [] # we'll store all the distances we found
## now run several random walks
for i in range(n_walks):
final_pt = take_walk(pt, n_steps)
distance = math.hypot(*final_pt) # how far did we get?
distances.append(distance)
# print('%d: %g %g %g' % (i, *final_pt, distance))
plot_distance_histogram(distances, n_steps, n_walks, output_file)
def take_walk(pt, n_steps):
"""take a simple 1D random walk"""
for i in range(n_steps):
## generate a step of -1 or +1 in the x direction
step_x = random.randint(0, 1) * 2 - 1
step_y = random.randint(0, 1) * 2 - 1
pt = (pt[0] + step_x, pt[1] + step_y)
# print(i, x, math.sqrt(i), math.fabs(math.fabs(x) - math.sqrt(i)))
## return the final location
return pt
def plot_distance_histogram(distances, n_steps, n_walks, output_file):
n_bins = min(50, n_walks/3.0)
# print(np.array(distances))
n, bins, patches = plt.hist(distances, n_bins)
print(bins)
plt.xlabel('distance traveled')
plt.ylabel('number of walks')
plt.title('2D random walk distance distribution (%d, %d)'
% (n_steps, n_walks))
plt.grid(True)
plt.show()
main()
13.5. Further reading
The wikipedia article https://en.wikipedia.org/wiki/Brownian_motion has nice diagrams.
Start by watching this video: https://www.youtube.com/watch?v=hy-clLi8gHg
Possibly do this experiment: https://www.youtube.com/watch?v=wf2tBAvMNbg
Other video on brownian motion: https://www.youtube.com/watch?v=NHo6LTXdFns
This one has a cute demonstration of how you get a random walk when you focus on a single particle, and is https://www.youtube.com/watch?v=pdz7wFHSLD0
Scipy has some routines that help simulate Brownian motion. The cookbook has this example: http://scipy-cookbook.readthedocs.io/items/BrownianMotion.html
13.6. Progression of record peaks
https://en.wikipedia.org/wiki/Athletics_record_progressions
[footnotes]
- 1
Note that the clustering is an artifact of the random generation of points; it is not due to a physical effect that clusters the points together.
13.7. The gambler’s fallacy
13.8. The gambler’s ruin
13.9. Link to other chapters
Simulated annealing covered in Section 30
Genetic algorithms (planned).