13. Random Processes

[status: partially-written]

13.1. Motivation, prerequisites, plan

Motivation

There is a surprising depth of insight you can gain by understanding various aspects of random processes, also called stochastic processes. The wikipedia article mentions that:

“They have applications in many disciplines including sciences such as biology, chemistry, ecology, neuroscience, and physics as well as technology and engineering fields such as image processing, signal processing, information theory, computer science, cryptography and telecommunications. Furthermore, seemingly random changes in financial markets have motivated the extensive use of stochastic processes in finance.”

This is a staggering list of fields: random processes seem to be as pervasive in all fields of physical, biological and social science as basic math, statistics and calculus.

Prerequisites

  • The 10-hour “serious programming” course.

  • The “Data files and first plots” mini-course in Section 2.

  • Random number basics from Section 11.

  • The sections on random walks from Section 12.

Plan

So how do we write programs that study and visualize these ideas? We will:

  1. Review random number generation.

  2. Introduce simple poisson processes: random amplitude and random time between samples.

  3. Examine the distribution of \(\Delta t\) in a poisson process.

  4. Look at random placements on a 2D lattice.

  5. Various types of time series.

  6. Revisit random walks.

  7. Diffusion: distribution of distance covered in a random walk after time \(t\).

  8. Brownian motion and kinetic theory.

  9. Progression of record peaks, and progression of sports world records.

13.2. Reviewing random number generation

You should quickly review the brief section on what web pages look like in Section 11 before continuing in this section.

Now let us write a few lines of code which generate random numbers and then puts them into a file which we can analyze and plot. You can paste the program make_random_dt_series.py directly into the python3 interpreter, or save it to a file and run it:

Listing 13.2.1 make_random_dt_series.py – Make a time history with completely random dt values
#! /usr/bin/env python3

import random

## now put the numbers in a file
f = open('time_series.out', 'w')
f.write('## time  time_interval  what_happened\n')
n_samples = 3000
t = 0
for i in range(n_samples):
    dt = random.random()
    t = t + dt
    f.write(f'{t}  {dt}  EVENT!!\n')

f.close()
print(f'wrote {n_samples} samples to file time_series.out')

You can run this with:

$ chmod +x make_random_dt_series.py
$ ./make_random_dt_series.py time_series.out

This will leave us with a file called time_series.out which we can examine from the command line or in an editor. The first-glance examination does not give much insight: this kind of data is best viewed using histograms.

In an earlier section we discussed making histograms from data sets, and here we will use the program float_histogram_maker.py to analyze our file:

Listing 13.2.2 float_histogram_maker.py – Make a histogram from a file with a single column of floats..
#! /usr/bin/env python3

"""Takes a file with a single column of floats and makes a histogram
of how frequently those floats occur in the file.  Since float
histograms are not as trivial as the int histograms we did earlier,
this program will use the Numerical Python (numpy) histogram
facilities, which makes it really simple.
"""

import sys
import numpy as np

def main():
    if len(sys.argv) != 4:
        print(f'*error* - usage: {sys.argv[0]} filename which_col n_bins')
        print(f'          example: {sys.argv[0]} time_diffs.dat 1 80')
        sys.exit(1)
    fname = sys.argv[1]
    which_col = int(sys.argv[2])
    n_bins = int(sys.argv[3])
    dataset = np.loadtxt(fname, usecols=(which_col,)) # add delimiter=',' for .csv files
    print(dataset)
    print(f'read {len(dataset)} samples, max {dataset.max()},'
          f' min {dataset.min()}')
    (hist, edges) = np.histogram(dataset, bins=n_bins)
    print(len(hist), hist)
    print(len(edges), edges)
    z = zip(edges[:-1], hist)
    fout = fname + '.hist'
    np.savetxt(fout, np.array(list(z)), delimiter=' ')
    print('saved edges and histogram to file %s' % fout)

if __name__ == '__main__':
    main()

You can download the file, and run it like this:

$ chmod +x float_histogram_maker.py
$ ./float_histogram_maker.py time_series.out

You now have the file file_series.out.hist which has the histogram ready for plotting.

13.3. Poisson processes

A poisson process is a random process in which each event’s probability is unrelated to any previous events.

Examples include throwing dice, flipping coins, radioactive decay.

We will write programs that simulate various types of poisson process.

13.3.1. A pure poisson process

Our first program will examine what’s called a “poisson process”. [missing]

13.3.2. An angry lightning goddess

Scenario: you live in a very unfortunately placed house. Every day Astrape, the goddess of lightning, rolls her special dice and has a probability of 0.03 (3%) to hitting that house. This should correspond to an average of one hit per month.

Let us first write a program which generates this series of events using a random number generator. Our first stab at this program, lightning_first_stab.py, will simply see that the probability was indeed close to 0.03.

Listing 13.3.1 lightning_first_stab.py – Hit a house with lightning with a 3% chance every day. First stab at simulating this.
#! /usr/bin/env python3

"""Simpole example of simulating lightning strikes with a fixed
probability every day.  This first stab at a simulation simply
calculates the probability of the hit to make sure that the random
number generator performs as we expect.

"""

import random

def main():
    n_days = 1000
    n_hits = 0
    n_misses = 0
    for day in range(n_days):
        r = random.random()
        if r <= 0.03:             ## 3% chance
            n_hits += 1
        else:
            n_misses += 1
    hit_fraction = n_hits / n_days
    print(f'average daily hits: {hit_fraction} ({n_days} days)')

if __name__ == '__main__':
    main()

Run the program (a few times) with:

$ chmod +x lightning_first_stab.py
$ ./lightning_first_stab.py

The key mechanism used in this program was in these lines of code:

    # ...
    r = random.random()
    if r <= 0.03:            ## 3% chance
        n_hits += 1
    else:
        n_misses += 1
    # ...
hit_fraction = n_hits / n_days

Make sure you understand that the random.random() call returning less than 0.03 happens 3% of the time.

Now, following an example from Steven Pinker [FIXME: put citation to The Better Angles of Our Nature], we ask the question:

If Astrape struck your house today, which day is the most likely day for the next bolt of lightning to strike your house?

To estimate this we write the program lightning_time_distribution.py which simulates the situation of a 3% chance of strike at every interval:

Listing 13.3.2 lightning_time_distribution.py – Hit a house with lightning with a 30% chance every day. Estimate the distribution of time intervals between strikes.
#! /usr/bin/env python3
import random
import math

def main():
    ## run this program with n_days = 50 when you want
    ## to eyeball the output; run it with n_days = 1000,
    ## then 10*1000, then 100*1000 when you want to make
    ## plots
    n_days = 100000
    delta_t_list = simulate_strikes(n_days)
    ## now that we have the list we print it to a file
    with open('time_diffs.dat', 'w') as f:
        for i, delta_t in enumerate(delta_t_list):
            f.write(f'{i}   {delta_t}\n')
            if i > len(delta_t_list) - 10:
                print(f'{i}   {delta_t}')
        print(f'wrote time_diffs.dat with {len(delta_t_list)} delta_t values\n'
              'and showed you the last 10 or so lines')


def simulate_strikes(n_days):
    """simulates lightning strikes for a given number of
    days, collecting information on the times between 
    strikes. returns the list of delta_t values.
    """
    last_delta_t = -1
    delta_t_list = []
    prev_day_with_strike = -1
    for day in range(n_days):
        r = random.random()     # a random float between 0 and 1
        if r <= 0.03:           # 3% chance
            #print('%d: hit' % day)
            if prev_day_with_strike >= 0:
                ## we record the delta_t of this event
                last_delta_t = day - prev_day_with_strike
                delta_t_list.append(last_delta_t)
            prev_day_with_strike = day
    return delta_t_list

if __name__ == '__main__':
    main()

When you run the program with:

$ chmod +x lightning_time_distribution.py
$ ./lightning_time_distribution.py

You can then plot the results of this program in a couple of different ways. We will first make a scatter plot showing the lightning events as points in a plot with lightning number and \(\Delta t\). You can see this in Figure 13.3.1.

Listing 13.3.3 Instructions to make a scatter plot of \(\Delta t\) for each pair of lightning strikes.
##REQUIRED_FILE: time_diffs.dat
##PRE_EXEC: ./lightning_time_distribution.py
set xlabel 'lightning number'
set ylabel '{/Symbol D} t (days)'
plot 'time_diffs.dat' using 2 with points pt 4 \
    ps 0.3 title 'time between lightning strikes'
../_images/lightning-scatter.svg

Figure 13.3.1 Scatter plot of \(\Delta t\) for each pair of lightning strikes. Notice that there are more strikes in the lower portions of the plot, which means that many strike pairs are close by!

A bit more rigor and insight can be gotten from a histogram plotting the number of lighting pairs versus the time between the two strikes. This histogram is shown in Figure 13.3.2.

Listing 13.3.4 Instructions to make a histogram plot of lightning pairs versus \(\Delta t\).
##REQUIRED_FILE: time_diffs.dat
##REQUIRED_FILE: time_diffs.dat.hist
##PRE_EXEC: ./lightning_time_distribution.py
##PRE_EXEC: ../plotting_intermediate/int_histogram_maker.py time_diffs.dat
set grid
set xlabel '{/Symbol Delta} t (days)'
set ylabel 'frequency of that interval'
set style data histogram
set style fill solid 0.8 border -1
plot [0:] [0:] 'time_diffs.dat.hist' using 1:2 with boxes
../_images/lightning-hist.svg

Figure 13.3.2 Histogram of number of lightning pairs versus \(\Delta t\). This shows even more clearly that there are many more consecutive lightning pairs that are closely spaced.

First conclusion: the answer to Steven Pinker’s question is that the most likely day in which the next lightning will strike your house is tomorrow. The goddess Astrape likes to play with humans.

Another conclusion: the distribution of time between consecutive lightning strikes looks like a decaying exponential.

[…many other conclusions…]

13.3.2.1. Questions

Now muse about the time distance between droughts and other natural calamities.

Examples: “Climate Change and Cultural Response” by Benson and Berry, page 105, shows distribution of mega-droughts in Figure 7.

Look in to the “palmer drought severity index” (PDSI) to find datasets. https://www.ncdc.noaa.gov/temp-and-precip/drought/historical-palmers/

PDSI is often shown from 1850 or 1870 until today. Some data sets go older, using tree ring data, and show the interesting periods from 800 to 1400, which would allow us to study the collapse of the Chaco Canyon culture.

For information about file formats and to see how pervasive obsolete data formats are, look at ftp://ftp.cdc.noaa.gov/Datasets/dai_pdsi/README

Download the .nc files at ftp://ftp.cdc.noaa.gov/Datasets/dai_pdsi/ and examine them with ncview.

Or maybe better: get the ascii files from here: http://www.cgd.ucar.edu/cas/catalog/climind/pdsi.html for example http://www.cgd.ucar.edu/cas/catalog/climind/pdsisc.monthly.maps.1850-2010.fawc=1.r2.5x2.5.ipe=2.txt.gz

For Kansas look at http://www.kgs.ku.edu/Hydro/Publications/2012/OFR12_18/KGS_OF_2012-18.pdf page 10, Figure 5a: PDSI for 800 to 2000. And Figure 5b for megadroughts. Figure 5c shows the dustbowl drought.

Aha: the paleoclimatology (2000 years) might be here: https://www.ncdc.noaa.gov/paleo-search/study/19119 and a text file here: https://www1.ncdc.noaa.gov/pub/data/paleo/drought/LBDA2010/LBDA_PMDI_JJA_Recons.txt and the grid metadata is explained here: https://www1.ncdc.noaa.gov/pub/data/paleo/drought/LBDA2010/LBDA_PMDI_Instrumental_metadata.txt Note that Chaco Canyon Lat/Lon are: 36.0530 deg N, 107.9559 deg W. From the meatadat it looks like it might be grid points 1883 or 1884 in this data set.

Q: Are drought cycles random? Are they the combination of so many types of physics that they appear to be random enough? How about anthropogenic climate change? Does it add a baseline?

Topic: Atlantic multidecadal oscillation.

13.3.3. Vicious glow-worms

We have talked about processes which give events distributed randomly in time: events happen at random times. Let us now look at processes that generate points distributed randomly in space: \((x, y)\) coordinates are spewed out by our process. An example might be where the grains of sand land when you drop a handful onto the ground.

We can write a program to generate random \((x, y)\) points between 0 and 100. The program random_spatial.py generates a series of such points, each completely independent of the previous one.

Listing 13.3.5 random_spatial.py – Generate a sequence of random \((x, y)\) points.
#! /usr/bin/env python3

"""
Print a bunch of (x, y) points.
"""

import random
import sys
import math

def main():
    n_pts = 3000
    do_circle = False
    if len(sys.argv) == 2:
        n_pts = int(sys.argv[1])
    n_hits = 0
    while n_hits < n_pts:
        x = random.random() * 100.0
        y = random.random() * 100.0
        if not do_circle:
            print(f'{x}   {y}')
            n_hits += 1
        if do_circle and math.hypot(x-50, y-50) < 50:
            print(f'{x}   {y}')
            n_hits += 1

if __name__ == '__main__':
    main()

You can plot this with the following gnuplot instructions:

Listing 13.3.6 Instructions to plot points randomly distributed in a two-dimensional space.
##REQUIRED_FILE: random_spatial.dat
##PRE_EXEC: ./random_spatial.py > random_spatial.dat
##CAPTION: Random (x, y) points.  You should be able to see
##CAPTION: some structure: occasional filaments, clustering, and empty 
##CAPTION: spaces.
set size ratio -1
set xlabel 'x'
set ylabel 'y'
plot 'random_spatial.dat' using 1:2 with points pt 7 \
    ps 0.3 title 'random (x, y) points'

The results are shown in Figure 13.3.3. You can see features in the data, even though it was randomly generated: filaments, clustering, voids… 1

../_images/random_spatial.svg

Figure 13.3.3 Points randomly distributed in a two-dimensional space.

A possible comment: people who spend a lot of time looking at randomly generated data probably don’t easily believe in conspiracy theories.

We can then do something analogous to what we did for events with random time intervals: plot the distribution of distances between \((x, y)\) points. The programs xy_to_distances.py and float_histogram_maker.py allow us to do so, and the results are in . Note that you will not get as much insight out of these spatial histograms as you did in , since a big factor in the distribution of spacial distances is the small size of the x-y plane we used.

Before running these programs you will need to install the python3-scipy package:

$ sudo apt install python3-scipy
Listing 13.3.7 xy_to_distances.py – Convert a collection of \((x, y)\) points to a list of distances between points.
#! /usr/bin/env python3

"""Load a collection of (x, y) points and print out the distance
between each pair of points"""

import sys
import scipy
import scipy.spatial
import math

def main():
    fname = sys.argv[1]
    coords = []
    with open(fname, 'r') as f:
        lines = f.readlines()
        for line in lines:
            xs, ys = line.split()
            (x, y) = (float(xs), float(ys))
            coords.append((x, y))
    print('read in %d coordinates' % len(coords))

    dist_out_fname = sys.argv[1] + '.distances'
    nearest_fname = sys.argv[1] + '.nearest'
    with open(dist_out_fname, 'w') as fpairs:
        with open(nearest_fname, 'w') as fnearest:
            for i in range(len(coords)-1):
                nearest_distance = sys.float_info.max
                for j in range(0, len(coords)):
                    r = math.hypot(coords[j][0] - coords[i][0],
                                   coords[j][1] - coords[i][1])
                    ## write all pairwise distances to the fpairs file
                    if j > i:   # don't duplicate
                        fpairs.write(f'{i}_{j}   {r}\n')
                    ## now see if we have a new nearest distance
                    if i != j and r < nearest_distance:
                        nearest_distance = r
                ## write nearest distances to a separate file
                fnearest.write(f'{i}   {nearest_distance}\n')
    print('wrote distances to %s' % dist_out_fname)
    print('wrote nearest distances to %s' % nearest_fname)

if __name__ == '__main__':
    main()

You can run this with:

$ chmod +x random_spatial.py xy_to_distances.py
$ ./random_spatial.py
$ # then save it to a file
$ ./random_spatial.py > random_spatial.dat
$ ./xy_to_distances.py random_spatial.dat
$ # this will show that output goes to the two files:
$ # random_spatial.dat.distances and random_spatial.dat.nearest

and plot the result with:

Listing 13.3.8 Instructions to plot a histogram of distances between points randomly distributed in a two-dimensional space.
##REQUIRED_FILE: random_spatial.dat
##REQUIRED_FILE: random_spatial.dat.distances
##REQUIRED_FILE: random_spatial.dat.nearest
##REQUIRED_FILE: random_spatial.dat.distances.hist
##REQUIRED_FILE: random_spatial.dat.nearest.hist
##PRE_EXEC: ./random_spatial.py > random_spatial.dat
##PRE_EXEC: ./xy_to_distances.py random_spatial.dat
##PRE_EXEC: ./float_histogram_maker.py random_spatial.dat.distances
##PRE_EXEC: ./float_histogram_maker.py random_spatial.dat.nearest
set grid
set multi layout 2,1
set style data histogram
set style fill solid 0.8 border -1
set xlabel 'distance between random (x, y) points'
set ylabel 'frequency of that distance'
plot [0:] 'random_spatial.dat.distances.hist' using 1:2 with boxes
set xlabel 'nearest distance to a point'
set ylabel 'frequency of that nearest distance'
plot [0:] 'random_spatial.dat.nearest.hist' using 1:2 with boxes

The results are shown in Figure 13.3.4.

../_images/spatial_hist.svg

Figure 13.3.4 Histogram of distances between 2D random points.

So we have seen that in two spatial dimensions we have a situation analogous to that for time differences: nearest points are distributed with a decaying exponential distribution.

13.3.3.1. Questions

So what would the two dimensional distribution look like if the \((x, y)\) values were not purely random?

Explore this in two ways:

  1. Change the program random_spatial.py to put all the \((x, y)\) points at integer positions.

  2. Change the program random_spatial.py to only put new points outside a “radius of avoidance” from existing points.

The “radius of avoidance” idea is what happens with the famous Waitomo Glowworm Caves in New Zeland: the worm larvae take positions on the ceiling of the cave and glow to attract prey. They avoid settling near other larvae to avoid cannibalism. The result is a distribution that does not have the same filaments and voids as a purely random distribution.

The lack of structure (filaments, voids…) in the cave was observed by biologist Stephen Jay Gould. He conjectured that the reason was this radius of avoidance. Physicist Ed Purcell carried out a calculation and visualization similar to ours to confirm Gould’s conjecture.

[cite Steven Pinker for the glowworm example]

13.4. Brownian motion

In 1827 Scottish botanist Robert Brown looked at pollen grains in water and noticed that the pollen was moved around in the water, jiggling randomly. This was one of the significant discoveries that led to the acceptance of the molecular theory.

In 1905 Einstein explained this motion by showing that the pollen was being buffeted around by the motion of individual water molecules.

Today we know that fluids (liquids and gasses) are made of molecules which are in constant motion. They bump in to each other, into the walls of their container, and they also bump in to larger particles floating around (such as the grain of pollen that Brown observed).

Simulating this behavior is an interesting challenge: it would involve keeping track of a large number of gas molecules and seeing when they bump in to a large sphere, which would represent the grain of pollen.

Right now I leave that more detailed programming task as an exercise, and will point that a huge number of bumps (\(10^{14}\) collisions/second) by molecules which are going in all directions means that our grain of pollen will be moving in a sort of random walk, the same kind we discussed and visualized in Section 12.

What we will do here is write some programs that simulate such random walks and then look at answering the question “where will I end up?” More specifically we ask “how far did I go?”

Let us start with one dimension. The program multiple-walks.py will take several walks in one dimension and write out the final arrival point and how far we went.

Listing 13.4.1 multiple-walks.py – Take several random walks in one dimension; print out the arrival points.
#! /usr/bin/env python3

"""Take many one-simensional random walks so that we can study the
distribution of how far you get in a random walk

"""

import random
import math
import sys

def main():
    x = 0
    n_steps = 10000
    n_walks = 10
    if len(sys.argv) == 3:
        n_steps = int(sys.argv[1])
        n_walks = int(sys.argv[2])
    elif len(sys.argv) != 1:
        sys.stderr.write('error - usage: %s [n_steps n_walks]\n' % sys.argv[0])
        exit(1)

    distances = []            # we'll store all the distances we found
    ## now run several random walks
    for i in range(n_walks):
        final_x = take_walk(x, n_steps)
        distance = math.fabs(final_x) # how far did we get?
        distances.append(distance)
        print('%d: %g  %g' % (i, final_x, distance))

def take_walk(x, n_steps):
    """take a simple 1D random walk"""
    for i in range(n_steps):
        ## generate a step of -1 or +1 in the x direction
        step_x = random.randint(0, 1) * 2 - 1
        x = x + step_x
        # print(i, x, math.sqrt(i), math.fabs(math.fabs(x) - math.sqrt(i)))
    ## return the final location
    return x

main()

Nice, but can we have a plot?

Listing 13.4.2 multiple-walks-plot.py – Take several random walks in one dimension; print out the arrival points and make a plot of the trajectory as a function of time.
#! /usr/bin/env python3

"""Take many one-simensional random walks so that we can study the
distribution of how far you get in a random walk.  Keep track of the
path each walk took and plot it.
"""

import random
import math
import sys
import matplotlib.pyplot as plt

def main():
    x = 0
    n_steps = 10000
    n_walks = 10
    if len(sys.argv) == 3:
        n_steps = int(sys.argv[1])
        n_walks = int(sys.argv[2])
    elif len(sys.argv) != 1:
        sys.stderr.write('error - usage: %s [n_steps n_walks]\n' % sys.argv[0])
        exit(1)

    distances = []            # we'll store all the distances we found
    ## prepare some plot parameters
    plt.xlabel('step')
    plt.ylabel('distance from origin')
    plt.grid(True)
    ## now run several random walks
    for i in range(n_walks):
        path = take_walk(x, n_steps)
        final_x = path[-1]
        distance = math.fabs(final_x) # how far did we get?
        distances.append(distance)
        print('%d: %g  %g' % (i, final_x, distance))
        plt.plot(path)        # add to the plot
    ## now show the plot we have accumulated
    plt.show()

def take_walk(x, n_steps):
    """take a simple 1D random walk"""
    path = []
    for i in range(n_steps):
        ## generate a step of -1 or +1 in the x direction
        step_x = random.randint(0, 1) * 2 - 1
        x = x + step_x
        # print(i, x, math.sqrt(i), math.fabs(math.fabs(x) - math.sqrt(i)))
        path.append(x)
    ## return the final location
    return path

main()

Sure, but what if we are impatient and we want to see the plots before they are all generated. We can animate the plot using the techniques from Section 6.8.2.

Listing 13.4.3 multiple-walks-plot.py – Take several random walks in one dimension; print out the arrival points and make a plot of the trajectory as a function of time. In this version we draw the plots as the walks become ready, rather than waiting for the end.
#! /usr/bin/env python3

"""Take many one-simensional random walks so that we can study the
distribution of how far you get in a random walk.  Keep track of the
path each walk took and plot it.
"""

import random
import math
import sys
import matplotlib.pyplot as plt

def main():
    x = 0
    n_steps = 10000
    n_walks = 10
    if len(sys.argv) == 3:
        n_steps = int(sys.argv[1])
        n_walks = int(sys.argv[2])
    elif len(sys.argv) != 1:
        sys.stderr.write('error - usage: %s [n_steps n_walks]\n' % sys.argv[0])
        exit(1)

    distances = []            # we'll store all the distances we found
    ## prepare to have animation
    fig, ax = plt.subplots()
    plt.xlabel('step')
    plt.ylabel('distance from origin')
    plt.grid(True)
    plt.ion()
    ## now run several random walks
    for i in range(n_walks):
        path = take_walk(x, n_steps)
        final_x = path[-1]
        distance = math.fabs(final_x) # how far did we get?
        distances.append(distance)
        print('%d: %g  %g' % (i, final_x, distance))
        plt.plot(path)        # add to the plot
        fig.canvas.draw_idle()
        plt.pause(0.4)
        fig.canvas.draw_idle()
    ## now show the plot we have accumulated

    plt.show()
    plt.waitforbuttonpress()

def take_walk(x, n_steps):
    """take a simple 1D random walk"""
    path = []
    for i in range(n_steps):
        ## generate a step of -1 or +1 in the x direction
        step_x = random.randint(0, 1) * 2 - 1
        x = x + step_x
        # print(i, x, math.sqrt(i), math.fabs(math.fabs(x) - math.sqrt(i)))
        path.append(x)
    ## return the final location
    return path

main()

Now let’s look at a crucial property of random walks: the overall behavior of the final position and the distance from the origin. At firs tit seems that these are all very different, with no pattern. But if we plot a histogram of the final positions in the one dimensional random walk we see something interesting:

Listing 13.4.4 multiple-walks-histogram.py – Take several random walks in one dimension; plot a histogram of the arrival points. If we take enough walks and they are long enough, we end up with a gaussian distribution of positions, centered around the origin.
#! /usr/bin/env python3

"""Take many one-simensional random walks so that we can study the
distribution of how far you get in a random walk.  This version starts
from multiple-walks.py and adds the ability to plot a histogram of how
far we got.

"""

import random
import math
import sys

import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

def main():
    x = 0
    n_steps = 1001
    n_walks = 1001
    if len(sys.argv) == 3:
        n_steps = int(sys.argv[1])
        n_walks = int(sys.argv[2])
    elif len(sys.argv) != 1:
        sys.stderr.write('error - usage: %s [n_steps n_walks]\n' % sys.argv[0])
        exit(1)

    positions = []            # we'll store all the positions we found
    ## prepare to plot
    plt.grid(True)
    plt.xlabel('final value')
    plt.ylabel('number of walks')
    plt.title('1D random walk arrival values (%d, %d)' % (n_steps, n_walks))
    plt.ion()                   # enable interactive
    ## now run several random walks
    for i in range(n_walks):
        final_x = take_walk(x, n_steps)
        positions.append(final_x)
        # print('%d: %g  %g' % (i, final_x, position))
        plot_position_histogram(positions, n_steps, n_walks)
    plt.show()
    plt.waitforbuttonpress()


def take_walk(x, n_steps):
    """take a simple 1D random walk"""
    for i in range(n_steps):
        ## generate a step of -1 or +1 in the x direction
        step_x = random.randint(0, 1) * 2 - 1
        x = x + step_x
        # print(i, x, math.sqrt(i), math.fabs(math.fabs(x) - math.sqrt(i)))
    ## return the final location
    return x

def plot_position_histogram(positions, n_steps, n_walks):
    if len(positions) < 10:
        return
    n_steps_so_far = len(positions)
    n_bins = min(51, 5+int(n_steps_so_far/3))
    # print(np.array(positions))
    plt.cla()
    plt.grid(True)
    plt.xlabel('final value')
    plt.ylabel('number of walks')
    plt.title('1D random walk arrival values (%d, %d)' 
              % (n_steps_so_far, n_walks))
    n, bins, patches = plt.hist(positions, n_bins)
    fig.canvas.draw_idle()
    plt.pause(0.01)

main()

NOTE: try to superimpose a gaussian with sigma proportional to sqrt(n_steps).

Let’s visualize things together with an animated multiplot:

Listing 13.4.5 multiple-walks-histogram-multiplot.py – This version combines the plotting of the 1D random walks and the histograms, and shows them as they accumulate.
#! /usr/bin/env python3

"""Take many one-simensional random walks so that we can study the
distribution of how far you get in a random walk.  This version starts
from multiple-walks.py and adds the ability to plot a histogram of how
far we got.

"""

import random
import math
import sys

import numpy as np
import matplotlib.pyplot as plt

skip_interval = 1
# skip_interval = 5

def main():
    x = 0
    n_steps = 1001
    n_walks = 1001
    if len(sys.argv) == 3:
        n_steps = int(sys.argv[1])
        n_walks = int(sys.argv[2])
    elif len(sys.argv) != 1:
        sys.stderr.write('error - usage: %s [n_steps n_walks]\n' % sys.argv[0])
        exit(1)

    positions = []            # we'll store all the positions we found
    # (ax1, ax2, ax3, ax4) = prepare_plotting(n_steps, n_walks)
    prepare_plotting(n_steps, n_walks)
    ## now run several random walks
    last_10_walks = []
    for i in range(n_walks):
        take_walk(x, n_steps, last_10_walks)
        if i % skip_interval == 0 or i < 20 or i > (n_walks-20):
            plot_walks(last_10_walks)
        final_x = last_10_walks[-1][-1] # last position of last walk
        position = math.fabs(final_x) # how far did we get?
        # positions.append(position)
        positions.append(final_x)
        # print('%d: %g  %g' % (i, final_x, position))
        if i % skip_interval == 0 or i < 20 or i > (n_walks-20):
            plot_position_histogram(positions, n_steps, n_walks)
        plt.subplot(224).figure.suptitle('1D random walks arrival values (%d, %d)' 
                                         % (i, n_walks))
        plt.pause(0.000001)
    plt.show()
    plt.waitforbuttonpress()

def prepare_plotting(n_steps, n_walks):
    fig = plt.figure(1)
    fig.suptitle('1D random walks arrival values (%d, %d)' 
                 % (n_steps, n_walks))
    plt.subplots_adjust(top=0.92, bottom=0.10, left=0.10, right=0.95, 
                        hspace=0.25, wspace=0.35)
    plt.ion()
    # return (ax1, ax2, ax3, ax4)


def take_walk(x, n_steps, last_10_walks):
    """take a simple 1D random walk"""
    last_10_walks.append([])
    for i in range(n_steps):
        ## generate a step of -1 or +1 in the x direction
        step_x = random.randint(0, 1) * 2 - 1
        x = x + step_x
        last_10_walks[-1].append(x)
        # print(i, x, math.sqrt(i), math.fabs(math.fabs(x) - math.sqrt(i)))
    ## now make sure we only keep the last 10 entries
    while len(last_10_walks) > 10:
        del last_10_walks[0]
    return last_10_walks

def plot_walks(last_10_walks):
    ## plot it in the cumulative plot
    ax = plt.subplot(2, 2, 1)
    plt.xlabel('step')
    plt.ylabel('position')
    ax.plot(last_10_walks[-1])
    ## plot it in the "last 10" plots
    if len(last_10_walks) >= 10:
        ax = plt.subplot(2, 2, 2)
        ax.cla()
        plt.xlabel('step')
        plt.ylabel('position')
        for i in range(10):
            ax.plot(last_10_walks[i])
    ax.figure.canvas.draw_idle()


def plot_position_histogram(positions, n_steps, n_walks):
    if len(positions) < 10:
        return
    n_steps_so_far = len(positions)
    n_bins = min(51, 5+int(n_steps_so_far/3))
    ## plot the positions
    ax = plt.subplot(2, 2, 3)
    ax.cla()
    ax.grid(True)
    plt.xlabel('final position')
    plt.ylabel('number of walks')
    # plt.title('1D random walk (%d, %d)'
    #           % (n_steps_so_far, n_walks))
    n, bins, patches = ax.hist(positions, n_bins)
    ax.figure.canvas.draw_idle()
    ## plot the positions
    distances = [math.fabs(pos) for pos in positions]
    ax = plt.subplot(2, 2, 4)
    ax.cla()
    ax.grid(True)
    plt.xlabel('distance from start')
    plt.ylabel('number of walks')
    # plt.title('1D random walk (%d, %d)'
    #           % (n_steps_so_far, n_walks))
    n, bins, patches = ax.hist(distances, n_bins)
    ax.figure.suptitle('1D random walks arrival values (%d, %d)' 
                       % (n_steps_so_far, n_walks))
    ax.figure.canvas.draw_idle()

main()

Let’s look at the 2D situation:

Listing 13.4.6 multiple-walks-histogram-2d.py – Take several random walks in two dimensions; plot a histogram of how far they arrived.
#! /usr/bin/env python3

"""Take many one-simensional random walks so that we can study the
distribution of how far you get in a random walk.  This version starts
from multiple-walks-histogram.py, changes it to take two dimensional
walks, and then plots a histogram of how far we got.

"""

import random
import math
import sys

import numpy as np
import matplotlib.pyplot as plt

def main():
    pt = (0, 0)
    n_steps = 1000
    n_walks = 1000
    if len(sys.argv) == 3:
        n_steps = int(sys.argv[1])
        n_walks = int(sys.argv[2])
    elif len(sys.argv) == 4:
        n_steps = int(sys.argv[1])
        n_walks = int(sys.argv[2])
        output_file = sys.argv[3]
    elif len(sys.argv) == 2:
        output_file = sys.argv[1]
    elif len(sys.argv) != 1:
        sys.stderr.write('error - usage: %s [n_steps n_walks]\n' % sys.argv[0])
        exit(1)

    distances = []            # we'll store all the distances we found
    ## now run several random walks
    for i in range(n_walks):
        final_pt = take_walk(pt, n_steps)
        distance = math.hypot(*final_pt) # how far did we get?
        distances.append(distance)
        # print('%d: %g  %g  %g' % (i, *final_pt, distance))
    plot_distance_histogram(distances, n_steps, n_walks, output_file)

def take_walk(pt, n_steps):
    """take a simple 1D random walk"""
    for i in range(n_steps):
        ## generate a step of -1 or +1 in the x direction
        step_x = random.randint(0, 1) * 2 - 1
        step_y = random.randint(0, 1) * 2 - 1
        pt = (pt[0] + step_x, pt[1] + step_y)
        # print(i, x, math.sqrt(i), math.fabs(math.fabs(x) - math.sqrt(i)))
    ## return the final location
    return pt

def plot_distance_histogram(distances, n_steps, n_walks, output_file):
    n_bins = min(50, n_walks/3.0)
    # print(np.array(distances))
    n, bins, patches = plt.hist(distances, n_bins)
    print(bins)
    plt.xlabel('distance traveled')
    plt.ylabel('number of walks')
    plt.title('2D random walk distance distribution (%d, %d)'
              % (n_steps, n_walks))
    plt.grid(True)
    plt.show()

main()

13.5. Further reading

The wikipedia article https://en.wikipedia.org/wiki/Brownian_motion has nice diagrams.

Start by watching this video: https://www.youtube.com/watch?v=hy-clLi8gHg

Possibly do this experiment: https://www.youtube.com/watch?v=wf2tBAvMNbg

Other video on brownian motion: https://www.youtube.com/watch?v=NHo6LTXdFns

This one has a cute demonstration of how you get a random walk when you focus on a single particle, and is https://www.youtube.com/watch?v=pdz7wFHSLD0

Scipy has some routines that help simulate Brownian motion. The cookbook has this example: http://scipy-cookbook.readthedocs.io/items/BrownianMotion.html

13.6. Progression of record peaks

http://iaaf-ebooks.s3.amazonaws.com/2015/Progression-of-IAAF-World-Records-2015/projet/IAAF-WRPB-2015.pdf

https://en.wikipedia.org/wiki/Athletics_record_progressions

[footnotes]

1

Note that the clustering is an artifact of the random generation of points; it is not due to a physical effect that clusters the points together.

13.7. The gambler’s fallacy

13.8. The gambler’s ruin