.. _chap-random-processes:

==================
 Random Processes
==================

[status: partially-written]


Motivation, prerequisites, plan
===============================

.. rubric:: Motivation

There is a surprising depth of insight you can gain by understanding
various aspects of random processes, also called *stochastic
processes*.  The wikipedia article mentions that:

.. pull-quote::

   `"They have applications in many disciplines including sciences
   such as biology, chemistry, ecology, neuroscience, and physics as
   well as technology and engineering fields such as image processing,
   signal processing, information theory, computer science,
   cryptography and telecommunications. Furthermore, seemingly random
   changes in financial markets have motivated the extensive use of
   stochastic processes in finance."`

This is a staggering list of fields: random processes seem to be as
pervasive in all fields of physical, biological and social science as
basic math, statistics and calculus.

.. rubric:: Prerequisites

* The 10-hour "serious programming" course.
* The "Data files and first plots" mini-course in
  :numref:`chap-data-files-and-first-plots`.
* Random number basics from :numref:`chap-random-number-basics`.
* The sections on random walks from
  :numref:`chap-randomness-disorder`.

.. rubric:: Plan

So how do we write programs that study and visualize these ideas?  We
will:

#. Review random number generation.
#. Introduce simple poisson processes: random amplitude and random
   time between samples.
#. Examine the distribution of :math:`\Delta t` in a poisson process.
#. Look at random placements on a 2D lattice.
#. Various types of time series.
#. Revisit random walks.
#. Diffusion: distribution of distance covered in a random walk after
   time :math:`t`.
#. Brownian motion and kinetic theory.
#. Progression of record peaks, and progression of sports world
   records.


Reviewing random number generation
==================================

You should quickly review the brief section on what web pages look
like in :numref:`chap-random-number-basics` before continuing in this
section.

Now let us write a few lines of code which generate random numbers and
then puts them into a file which we can analyze and plot.  You can
paste the program :download:`make_random_dt_series.py` directly into
the python3 interpreter, or save it to a file and run it:

.. literalinclude:: make_random_dt_series.py
   :language: python
   :caption: make_random_dt_series.py -- Make a time history with
             completely random dt values

You can run this with:

.. code-block:: console

   $ chmod +x make_random_dt_series.py
   $ ./make_random_dt_series.py time_series.out


This will leave us with a file called `time_series.out` which we can
examine from the command line or in an editor.  The first-glance
examination does not give much insight: this kind of data is best
viewed using *histograms*.

In an earlier section we discussed making *histograms* from data sets,
and here we will use the program
:download:`float_histogram_maker.py<../plotting-intermediate/float_histogram_maker.py>`
to analyze our file:

.. literalinclude:: ../plotting-intermediate/float_histogram_maker.py
   :language: python
   :caption: float_histogram_maker.py -- Make a histogram from a file
             with a single column of floats..

You can download the file, and run it like this:

.. code-block:: console

   $ chmod +x float_histogram_maker.py
   $ ./float_histogram_maker.py time_series.out

You now have the file file_series.out.hist which has the histogram
ready for plotting.


Poisson processes
=================

A poisson process is a random process in which each event's
probability is `unrelated` to any previous events.

Examples include throwing dice, flipping coins, radioactive decay.

We will write programs that simulate various types of poisson process.


A pure poisson process
----------------------

Our first program will examine what's called a "poisson process". [missing]


An angry lightning goddess
--------------------------

Scenario: you live in a very unfortunately placed house.  `Every day`
Astrape, the goddess of lightning, rolls her special dice and has a
probability of 0.03 (3%) to hitting that house.  This should
correspond to an average of one hit per month.

Let us first write a program which generates this series of events
using a random number generator.  Our first stab at this program,
:download:`lightning_first_stab.py`, will simply see that the
probability was indeed close to 0.03.

.. literalinclude:: lightning_first_stab.py
   :language: python
   :caption: lightning_first_stab.py -- Hit a house with lightning
             with a 3% chance every day.  First stab at simulating
             this.

Run the program (a few times) with:

.. code-block:: console

   $ chmod +x lightning_first_stab.py
   $ ./lightning_first_stab.py

The key mechanism used in this program was in these lines of code:

.. code-block:: python

            # ...
            r = random.random()
            if r <= 0.03:            ## 3% chance
                n_hits += 1
            else:
                n_misses += 1
            # ...
        hit_fraction = n_hits / n_days

Make sure you understand that the ``random.random()`` call returning
less than 0.03 happens 3% of the time.

Now, following an example from Steven Pinker [FIXME: put citation to
The Better Angles of Our Nature], we ask the question:

.. pull-quote::

   If Astrape struck your house today, which day is the most likely
   day for the *next* bolt of lightning to strike your house?

To estimate this we write the program
:download:`lightning_time_distribution.py` which simulates the
situation of a 3% chance of strike at every interval:

.. literalinclude:: lightning_time_distribution.py
   :language: python
   :caption: lightning_time_distribution.py -- Hit a house with
             lightning with a 30% chance every day.  Estimate the
             distribution of time intervals between strikes.

When you run the program with:

.. code-block:: console

   $ chmod +x lightning_time_distribution.py
   $ ./lightning_time_distribution.py

You can then plot the results of this program in a couple of different
ways.  We will first make a `scatter plot` showing the lightning
events as points in a plot with lightning number and :math:`\Delta t`.
You can see this in :numref:`fig-lightning-scatter`.

.. literalinclude:: lightning-scatter.gp
   :language: gnuplot
   :caption: Instructions to make a scatter plot of :math:`\Delta t`
             for each pair of lightning strikes.

.. _fig-lightning-scatter:

.. figure:: lightning-scatter.*
   :width: 60%

   Scatter plot of :math:`\Delta t` for each pair of lightning
   strikes.  Notice that there are more strikes in the lower portions
   of the plot, which means that many strike pairs are close by!

A bit more rigor and insight can be gotten from a `histogram` plotting
the number of lighting pairs versus the time between the two strikes.
This histogram is shown in :numref:`fig-lightning-hist`.

.. literalinclude:: lightning-hist.gp
   :language: gnuplot
   :caption: Instructions to make a histogram plot of lightning pairs
             versus :math:`\Delta t`.

.. _fig-lightning-hist:

.. figure:: lightning-hist.*
   :width: 60%

   Histogram of number of lightning pairs versus :math:`\Delta t`.
   This shows even more clearly that there are many more consecutive
   lightning pairs that are closely spaced.


First conclusion: the answer to Steven Pinker's question is that the
most likely day in which the `next` lightning will strike your house
is `tomorrow`.  The goddess Astrape likes to play with humans.

Another conclusion: the distribution of time between consecutive
lightning strikes looks like a `decaying exponential`.

[...many other conclusions...]

Questions
~~~~~~~~~

Now muse about the time distance between droughts and other natural
calamities.

Examples: "Climate Change and Cultural Response" by Benson and Berry,
page 105, shows distribution of mega-droughts in Figure 7.

Look in to the "palmer drought severity index" (PDSI) to find
datasets.
https://www.ncdc.noaa.gov/temp-and-precip/drought/historical-palmers/

PDSI is often shown from 1850 or 1870 until today.  Some data sets go
older, using tree ring data, and show the interesting periods from 800
to 1400, which would allow us to study the collapse of the Chaco
Canyon culture.

For information about file formats and to see how pervasive obsolete
data formats are, look at
ftp://ftp.cdc.noaa.gov/Datasets/dai_pdsi/README

Download the .nc files at ftp://ftp.cdc.noaa.gov/Datasets/dai_pdsi/
and examine them with ``ncview``.

Or maybe better: get the ascii files from here:
http://www.cgd.ucar.edu/cas/catalog/climind/pdsi.html for example
http://www.cgd.ucar.edu/cas/catalog/climind/pdsisc.monthly.maps.1850-2010.fawc=1.r2.5x2.5.ipe=2.txt.gz


For Kansas look at
http://www.kgs.ku.edu/Hydro/Publications/2012/OFR12_18/KGS_OF_2012-18.pdf
page 10, Figure 5a: PDSI for 800 to 2000.  And Figure 5b for
megadroughts.  Figure 5c shows the dustbowl drought.

Aha: the paleoclimatology (2000 years) might be here:
https://www.ncdc.noaa.gov/paleo-search/study/19119 and a text file
here:
https://www1.ncdc.noaa.gov/pub/data/paleo/drought/LBDA2010/LBDA_PMDI_JJA_Recons.txt
and the grid metadata is explained here:
https://www1.ncdc.noaa.gov/pub/data/paleo/drought/LBDA2010/LBDA_PMDI_Instrumental_metadata.txt
Note that Chaco Canyon Lat/Lon are: 36.0530 deg N, 107.9559 deg W.  From the
meatadat it looks like it might be grid points 1883 or 1884 in this
data set.

Q: Are drought cycles random?  Are they the combination of so many
types of physics that they appear to be random enough?  How about
anthropogenic climate change?  Does it add a baseline?

Topic: Atlantic multidecadal oscillation.


Vicious glow-worms
------------------

We have talked about processes which give events distributed *randomly
in time:* events happen at random times. Let us now look at processes
that generate points distributed *randomly in space:* :math:`(x, y)`
coordinates are spewed out by our process. An example might be where
the grains of sand land when you drop a handful onto the ground.

We can write a program to generate random :math:`(x, y)` points
between 0 and 100. The program :download:`random_spatial.py` generates
a series of such points, each completely independent of the previous
one.

.. literalinclude:: random_spatial.py
   :language: python
   :caption: random_spatial.py -- Generate a sequence of random
             :math:`(x, y)` points.

You can plot this with the following gnuplot instructions:

.. literalinclude:: random_spatial.gp
   :language: gnuplot
   :caption: Instructions to plot points randomly distributed in a
             two-dimensional space.

The results are shown in :numref:`fig-random_spatial`.  You can see
*features* in the data, even though it was randomly generated:
filaments, clustering, voids... [#]_

.. _fig-random_spatial:

.. figure:: random_spatial.*
   :width: 60%

   Points randomly distributed in a two-dimensional space.


A possible comment: people who spend a lot of time looking at randomly
generated data probably don’t easily believe in conspiracy theories.

We can then do something analogous to what we did for events with
random time intervals: plot the distribution of distances between
:math:`(x, y)` points.  The programs :download:`xy_to_distances.py`
and
:download:`float_histogram_maker.py<../plotting-intermediate/float_histogram_maker.py>`
allow us to do so, and the results are
in . Note that you will not get as much insight out of these spatial
histograms as you did in , since a big factor in the distribution of
spacial distances is the small size of the *x-y* plane we used.

Before running these programs you will need to install the
``python3-scipy`` package:

.. code-block:: console

   $ sudo apt install python3-scipy

.. literalinclude:: xy_to_distances.py
   :language: python
   :caption: xy_to_distances.py -- Convert a collection of :math:`(x,
             y)` points to a list of distances between points.

You can run this with:

.. code-block:: console

   $ chmod +x random_spatial.py xy_to_distances.py
   $ ./random_spatial.py
   $ # then save it to a file
   $ ./random_spatial.py > random_spatial.dat
   $ ./xy_to_distances.py random_spatial.dat
   $ # this will show that output goes to the two files:
   $ # random_spatial.dat.distances and random_spatial.dat.nearest

and plot the result with:

.. literalinclude:: spatial_hist.gp
   :language: gnuplot
   :caption: Instructions to plot a histogram of distances between
             points randomly distributed in a two-dimensional space.

The results are shown in :numref:`fig-spatial_hist`.

.. _fig-spatial_hist:

.. figure:: spatial_hist.*
   :width: 60%

   Histogram of distances between 2D random points.

So we have seen that in two spatial dimensions we have a situation
analogous to that for time differences: nearest points are distributed
with a `decaying exponential` distribution.


Questions
~~~~~~~~~

So what would the two dimensional distribution look like if the
:math:`(x, y)` values were not purely random?

Explore this in two ways:

#. Change the program ``random_spatial.py`` to put all the :math:`(x,
   y)` points at integer positions.

#. Change the program ``random_spatial.py`` to only put new points
   outside a "radius of avoidance" from existing points.

The "radius of avoidance" idea is what happens with the famous Waitomo
Glowworm Caves in New Zeland: the worm larvae take positions on the
ceiling of the cave and glow to attract prey.  They avoid settling
near other larvae to avoid cannibalism.  The result is a distribution
that does not have the same filaments and voids as a purely random
distribution.

The lack of structure (filaments, voids...) in the cave was observed
by biologist Stephen Jay Gould.  He conjectured that the reason was
this radius of avoidance.  Physicist Ed Purcell carried out a
calculation and visualization similar to ours to confirm Gould's
conjecture.

[cite Steven Pinker for the glowworm example]


Brownian motion
===============

In 1827 Scottish botanist Robert Brown looked at pollen grains in
water and noticed that the pollen was moved around in the water,
jiggling randomly. This was one of the significant discoveries that
led to the acceptance of the molecular theory.

In 1905 Einstein explained this motion by showing that the pollen was
being buffeted around by the motion of individual water molecules.

Today we know that fluids (liquids and gasses) are made of molecules
which are in constant motion.  They bump in to each other, into the
walls of their container, and they also bump in to larger particles
floating around (such as the grain of pollen that Brown observed).

Simulating this behavior is an interesting challenge: it would involve
keeping track of a large number of gas molecules and seeing when they
bump in to a large sphere, which would represent the grain of pollen.

Right now I leave that more detailed programming task as an exercise,
and will point that a huge number of bumps (:math:`10^{14}`
collisions/second) by molecules which are going in all directions
means that our grain of pollen will be moving in a sort of `random
walk`, the same kind we discussed and visualized in
:numref:`chap-randomness-disorder`.

What we will do here is write some programs that simulate such random
walks and then look at answering the question "where will I end up?"
More specifically we ask "how far did I go?"

Let us start with one dimension.  The program ``multiple-walks.py``
will take several walks in one dimension and write out the final
arrival point and how far we went.

.. literalinclude:: multiple-walks.py
   :language: python
   :caption: multiple-walks.py -- Take several random walks in one
             dimension; print out the arrival points.

Nice, but can we have a plot?

.. literalinclude:: multiple-walks-plot.py
   :language: python
   :caption: multiple-walks-plot.py -- Take several random walks in
             one dimension; print out the arrival points and make a
             plot of the trajectory as a function of time.

Sure, but what if we are impatient and we want to see the plots before
they are all generated.  We can animate the plot using the techniques
from :numref:`sec-animation-in-matplotlib`.

.. literalinclude:: multiple-walks-plot-animated.py
   :language: python
   :caption: multiple-walks-plot.py -- Take several random walks in
             one dimension; print out the arrival points and make a
             plot of the trajectory as a function of time.  In this
             version we draw the plots as the walks become ready,
             rather than waiting for the end.

Now let's look at a crucial property of random walks: the overall
behavior of the final position and the distance from the origin.  At
firs tit seems that these are all very different, with no pattern.
But if we plot a *histogram* of the final positions in the one
dimensional random walk we see something interesting:

.. literalinclude:: multiple-walks-histogram.py
   :language: python
   :caption: multiple-walks-histogram.py -- Take several random walks
             in one dimension; plot a histogram of the arrival points.
             If we take enough walks and they are long enough, we end
             up with a gaussian distribution of positions, centered
             around the origin.

NOTE: try to superimpose a gaussian with sigma proportional to
sqrt(n_steps).

Let's visualize things together with an animated multiplot:

.. literalinclude:: multiple-walks-histogram-multiplot.py
   :language: python
   :caption: multiple-walks-histogram-multiplot.py -- This version
             combines the plotting of the 1D random walks and the
             histograms, and shows them as they accumulate.

Let's look at the 2D situation:

.. literalinclude:: multiple-walks-histogram-2d.py
   :language: python
   :caption: multiple-walks-histogram-2d.py -- Take several random
             walks in two dimensions; plot a histogram of how far they
             arrived.


Further reading
===============

The wikipedia article https://en.wikipedia.org/wiki/Brownian_motion
has nice diagrams.

Start by watching this video:
https://www.youtube.com/watch?v=hy-clLi8gHg

Possibly do this experiment:
https://www.youtube.com/watch?v=wf2tBAvMNbg

Other video on brownian motion:
https://www.youtube.com/watch?v=NHo6LTXdFns

This one has a cute demonstration of how you get a random walk when
you focus on a single particle, and is 
https://www.youtube.com/watch?v=pdz7wFHSLD0

Scipy has some routines that help simulate Brownian motion.  The
cookbook has this example:
http://scipy-cookbook.readthedocs.io/items/BrownianMotion.html


Progression of record peaks
===========================

http://iaaf-ebooks.s3.amazonaws.com/2015/Progression-of-IAAF-World-Records-2015/projet/IAAF-WRPB-2015.pdf

https://en.wikipedia.org/wiki/Athletics_record_progressions

[footnotes]

.. [#]
   Note that the clustering is an artifact of the random generation of
   points; it is not due to a physical effect that clusters the points
   together.


The gambler's fallacy
=====================


The gambler's ruin
==================

Link to other chapters
======================

* Simulated annealing covered in :numref:`chap-the-traveling-salesman`

* Genetic algorithms (planned).