.. _chap-random-processes: ================== Random Processes ================== [status: partially-written] Motivation, prerequisites, plan =============================== .. rubric:: Motivation There is a surprising depth of insight you can gain by understanding various aspects of random processes, also called *stochastic processes*. The wikipedia article mentions that: .. pull-quote:: `"They have applications in many disciplines including sciences such as biology, chemistry, ecology, neuroscience, and physics as well as technology and engineering fields such as image processing, signal processing, information theory, computer science, cryptography and telecommunications. Furthermore, seemingly random changes in financial markets have motivated the extensive use of stochastic processes in finance."` This is a staggering list of fields: random processes seem to be as pervasive in all fields of physical, biological and social science as basic math, statistics and calculus. .. rubric:: Prerequisites * The 10-hour "serious programming" course. * The "Data files and first plots" mini-course in :numref:`chap-data-files-and-first-plots`. * Random number basics from :numref:`chap-random-number-basics`. * The sections on random walks from :numref:`chap-randomness-disorder`. .. rubric:: Plan So how do we write programs that study and visualize these ideas? We will: #. Review random number generation. #. Introduce simple poisson processes: random amplitude and random time between samples. #. Examine the distribution of :math:`\Delta t` in a poisson process. #. Look at random placements on a 2D lattice. #. Various types of time series. #. Revisit random walks. #. Diffusion: distribution of distance covered in a random walk after time :math:`t`. #. Brownian motion and kinetic theory. #. Progression of record peaks, and progression of sports world records. Reviewing random number generation ================================== You should quickly review the brief section on what web pages look like in :numref:`chap-random-number-basics` before continuing in this section. Now let us write a few lines of code which generate random numbers and then puts them into a file which we can analyze and plot. You can paste the program :download:`make_random_dt_series.py` directly into the python3 interpreter, or save it to a file and run it: .. literalinclude:: make_random_dt_series.py :language: python :caption: make_random_dt_series.py -- Make a time history with completely random dt values You can run this with: .. code-block:: console $ chmod +x make_random_dt_series.py $ ./make_random_dt_series.py time_series.out This will leave us with a file called `time_series.out` which we can examine from the command line or in an editor. The first-glance examination does not give much insight: this kind of data is best viewed using *histograms*. In an earlier section we discussed making *histograms* from data sets, and here we will use the program :download:`float_histogram_maker.py<../plotting-intermediate/float_histogram_maker.py>` to analyze our file: .. literalinclude:: ../plotting-intermediate/float_histogram_maker.py :language: python :caption: float_histogram_maker.py -- Make a histogram from a file with a single column of floats.. You can download the file, and run it like this: .. code-block:: console $ chmod +x float_histogram_maker.py $ ./float_histogram_maker.py time_series.out You now have the file file_series.out.hist which has the histogram ready for plotting. Poisson processes ================= A poisson process is a random process in which each event's probability is `unrelated` to any previous events. Examples include throwing dice, flipping coins, radioactive decay. We will write programs that simulate various types of poisson process. A pure poisson process ---------------------- Our first program will examine what's called a "poisson process". [missing] An angry lightning goddess -------------------------- Scenario: you live in a very unfortunately placed house. `Every day` Astrape, the goddess of lightning, rolls her special dice and has a probability of 0.03 (3%) to hitting that house. This should correspond to an average of one hit per month. Let us first write a program which generates this series of events using a random number generator. Our first stab at this program, :download:`lightning_first_stab.py`, will simply see that the probability was indeed close to 0.03. .. literalinclude:: lightning_first_stab.py :language: python :caption: lightning_first_stab.py -- Hit a house with lightning with a 3% chance every day. First stab at simulating this. Run the program (a few times) with: .. code-block:: console $ chmod +x lightning_first_stab.py $ ./lightning_first_stab.py The key mechanism used in this program was in these lines of code: .. code-block:: python # ... r = random.random() if r <= 0.03: ## 3% chance n_hits += 1 else: n_misses += 1 # ... hit_fraction = n_hits / n_days Make sure you understand that the ``random.random()`` call returning less than 0.03 happens 3% of the time. Now, following an example from Steven Pinker [FIXME: put citation to The Better Angles of Our Nature], we ask the question: .. pull-quote:: If Astrape struck your house today, which day is the most likely day for the *next* bolt of lightning to strike your house? To estimate this we write the program :download:`lightning_time_distribution.py` which simulates the situation of a 3% chance of strike at every interval: .. literalinclude:: lightning_time_distribution.py :language: python :caption: lightning_time_distribution.py -- Hit a house with lightning with a 30% chance every day. Estimate the distribution of time intervals between strikes. When you run the program with: .. code-block:: console $ chmod +x lightning_time_distribution.py $ ./lightning_time_distribution.py You can then plot the results of this program in a couple of different ways. We will first make a `scatter plot` showing the lightning events as points in a plot with lightning number and :math:`\Delta t`. You can see this in :numref:`fig-lightning-scatter`. .. literalinclude:: lightning-scatter.gp :language: gnuplot :caption: Instructions to make a scatter plot of :math:`\Delta t` for each pair of lightning strikes. .. _fig-lightning-scatter: .. figure:: lightning-scatter.* :width: 60% Scatter plot of :math:`\Delta t` for each pair of lightning strikes. Notice that there are more strikes in the lower portions of the plot, which means that many strike pairs are close by! A bit more rigor and insight can be gotten from a `histogram` plotting the number of lighting pairs versus the time between the two strikes. This histogram is shown in :numref:`fig-lightning-hist`. .. literalinclude:: lightning-hist.gp :language: gnuplot :caption: Instructions to make a histogram plot of lightning pairs versus :math:`\Delta t`. .. _fig-lightning-hist: .. figure:: lightning-hist.* :width: 60% Histogram of number of lightning pairs versus :math:`\Delta t`. This shows even more clearly that there are many more consecutive lightning pairs that are closely spaced. First conclusion: the answer to Steven Pinker's question is that the most likely day in which the `next` lightning will strike your house is `tomorrow`. The goddess Astrape likes to play with humans. Another conclusion: the distribution of time between consecutive lightning strikes looks like a `decaying exponential`. [...many other conclusions...] Questions ~~~~~~~~~ Now muse about the time distance between droughts and other natural calamities. Examples: "Climate Change and Cultural Response" by Benson and Berry, page 105, shows distribution of mega-droughts in Figure 7. Look in to the "palmer drought severity index" (PDSI) to find datasets. https://www.ncdc.noaa.gov/temp-and-precip/drought/historical-palmers/ PDSI is often shown from 1850 or 1870 until today. Some data sets go older, using tree ring data, and show the interesting periods from 800 to 1400, which would allow us to study the collapse of the Chaco Canyon culture. For information about file formats and to see how pervasive obsolete data formats are, look at ftp://ftp.cdc.noaa.gov/Datasets/dai_pdsi/README Download the .nc files at ftp://ftp.cdc.noaa.gov/Datasets/dai_pdsi/ and examine them with ``ncview``. Or maybe better: get the ascii files from here: http://www.cgd.ucar.edu/cas/catalog/climind/pdsi.html for example http://www.cgd.ucar.edu/cas/catalog/climind/pdsisc.monthly.maps.1850-2010.fawc=1.r2.5x2.5.ipe=2.txt.gz For Kansas look at http://www.kgs.ku.edu/Hydro/Publications/2012/OFR12_18/KGS_OF_2012-18.pdf page 10, Figure 5a: PDSI for 800 to 2000. And Figure 5b for megadroughts. Figure 5c shows the dustbowl drought. Aha: the paleoclimatology (2000 years) might be here: https://www.ncdc.noaa.gov/paleo-search/study/19119 and a text file here: https://www1.ncdc.noaa.gov/pub/data/paleo/drought/LBDA2010/LBDA_PMDI_JJA_Recons.txt and the grid metadata is explained here: https://www1.ncdc.noaa.gov/pub/data/paleo/drought/LBDA2010/LBDA_PMDI_Instrumental_metadata.txt Note that Chaco Canyon Lat/Lon are: 36.0530 deg N, 107.9559 deg W. From the meatadat it looks like it might be grid points 1883 or 1884 in this data set. Q: Are drought cycles random? Are they the combination of so many types of physics that they appear to be random enough? How about anthropogenic climate change? Does it add a baseline? Topic: Atlantic multidecadal oscillation. Vicious glow-worms ------------------ We have talked about processes which give events distributed *randomly in time:* events happen at random times. Let us now look at processes that generate points distributed *randomly in space:* :math:`(x, y)` coordinates are spewed out by our process. An example might be where the grains of sand land when you drop a handful onto the ground. We can write a program to generate random :math:`(x, y)` points between 0 and 100. The program :download:`random_spatial.py` generates a series of such points, each completely independent of the previous one. .. literalinclude:: random_spatial.py :language: python :caption: random_spatial.py -- Generate a sequence of random :math:`(x, y)` points. You can plot this with the following gnuplot instructions: .. literalinclude:: random_spatial.gp :language: gnuplot :caption: Instructions to plot points randomly distributed in a two-dimensional space. The results are shown in :numref:`fig-random_spatial`. You can see *features* in the data, even though it was randomly generated: filaments, clustering, voids... [#]_ .. _fig-random_spatial: .. figure:: random_spatial.* :width: 60% Points randomly distributed in a two-dimensional space. A possible comment: people who spend a lot of time looking at randomly generated data probably don’t easily believe in conspiracy theories. We can then do something analogous to what we did for events with random time intervals: plot the distribution of distances between :math:`(x, y)` points. The programs :download:`xy_to_distances.py` and :download:`float_histogram_maker.py<../plotting-intermediate/float_histogram_maker.py>` allow us to do so, and the results are in . Note that you will not get as much insight out of these spatial histograms as you did in , since a big factor in the distribution of spacial distances is the small size of the *x-y* plane we used. Before running these programs you will need to install the ``python3-scipy`` package: .. code-block:: console $ sudo apt install python3-scipy .. literalinclude:: xy_to_distances.py :language: python :caption: xy_to_distances.py -- Convert a collection of :math:`(x, y)` points to a list of distances between points. You can run this with: .. code-block:: console $ chmod +x random_spatial.py xy_to_distances.py $ ./random_spatial.py $ # then save it to a file $ ./random_spatial.py > random_spatial.dat $ ./xy_to_distances.py random_spatial.dat $ # this will show that output goes to the two files: $ # random_spatial.dat.distances and random_spatial.dat.nearest and plot the result with: .. literalinclude:: spatial_hist.gp :language: gnuplot :caption: Instructions to plot a histogram of distances between points randomly distributed in a two-dimensional space. The results are shown in :numref:`fig-spatial_hist`. .. _fig-spatial_hist: .. figure:: spatial_hist.* :width: 60% Histogram of distances between 2D random points. So we have seen that in two spatial dimensions we have a situation analogous to that for time differences: nearest points are distributed with a `decaying exponential` distribution. Questions ~~~~~~~~~ So what would the two dimensional distribution look like if the :math:`(x, y)` values were not purely random? Explore this in two ways: #. Change the program ``random_spatial.py`` to put all the :math:`(x, y)` points at integer positions. #. Change the program ``random_spatial.py`` to only put new points outside a "radius of avoidance" from existing points. The "radius of avoidance" idea is what happens with the famous Waitomo Glowworm Caves in New Zeland: the worm larvae take positions on the ceiling of the cave and glow to attract prey. They avoid settling near other larvae to avoid cannibalism. The result is a distribution that does not have the same filaments and voids as a purely random distribution. The lack of structure (filaments, voids...) in the cave was observed by biologist Stephen Jay Gould. He conjectured that the reason was this radius of avoidance. Physicist Ed Purcell carried out a calculation and visualization similar to ours to confirm Gould's conjecture. [cite Steven Pinker for the glowworm example] Brownian motion =============== In 1827 Scottish botanist Robert Brown looked at pollen grains in water and noticed that the pollen was moved around in the water, jiggling randomly. This was one of the significant discoveries that led to the acceptance of the molecular theory. In 1905 Einstein explained this motion by showing that the pollen was being buffeted around by the motion of individual water molecules. Today we know that fluids (liquids and gasses) are made of molecules which are in constant motion. They bump in to each other, into the walls of their container, and they also bump in to larger particles floating around (such as the grain of pollen that Brown observed). Simulating this behavior is an interesting challenge: it would involve keeping track of a large number of gas molecules and seeing when they bump in to a large sphere, which would represent the grain of pollen. Right now I leave that more detailed programming task as an exercise, and will point that a huge number of bumps (:math:`10^{14}` collisions/second) by molecules which are going in all directions means that our grain of pollen will be moving in a sort of `random walk`, the same kind we discussed and visualized in :numref:`chap-randomness-disorder`. What we will do here is write some programs that simulate such random walks and then look at answering the question "where will I end up?" More specifically we ask "how far did I go?" Let us start with one dimension. The program ``multiple-walks.py`` will take several walks in one dimension and write out the final arrival point and how far we went. .. literalinclude:: multiple-walks.py :language: python :caption: multiple-walks.py -- Take several random walks in one dimension; print out the arrival points. Nice, but can we have a plot? .. literalinclude:: multiple-walks-plot.py :language: python :caption: multiple-walks-plot.py -- Take several random walks in one dimension; print out the arrival points and make a plot of the trajectory as a function of time. Sure, but what if we are impatient and we want to see the plots before they are all generated. We can animate the plot using the techniques from :numref:`sec-animation-in-matplotlib`. .. literalinclude:: multiple-walks-plot-animated.py :language: python :caption: multiple-walks-plot.py -- Take several random walks in one dimension; print out the arrival points and make a plot of the trajectory as a function of time. In this version we draw the plots as the walks become ready, rather than waiting for the end. Now let's look at a crucial property of random walks: the overall behavior of the final position and the distance from the origin. At firs tit seems that these are all very different, with no pattern. But if we plot a *histogram* of the final positions in the one dimensional random walk we see something interesting: .. literalinclude:: multiple-walks-histogram.py :language: python :caption: multiple-walks-histogram.py -- Take several random walks in one dimension; plot a histogram of the arrival points. If we take enough walks and they are long enough, we end up with a gaussian distribution of positions, centered around the origin. NOTE: try to superimpose a gaussian with sigma proportional to sqrt(n_steps). Let's visualize things together with an animated multiplot: .. literalinclude:: multiple-walks-histogram-multiplot.py :language: python :caption: multiple-walks-histogram-multiplot.py -- This version combines the plotting of the 1D random walks and the histograms, and shows them as they accumulate. Let's look at the 2D situation: .. literalinclude:: multiple-walks-histogram-2d.py :language: python :caption: multiple-walks-histogram-2d.py -- Take several random walks in two dimensions; plot a histogram of how far they arrived. Further reading =============== The wikipedia article https://en.wikipedia.org/wiki/Brownian_motion has nice diagrams. Start by watching this video: https://www.youtube.com/watch?v=hy-clLi8gHg Possibly do this experiment: https://www.youtube.com/watch?v=wf2tBAvMNbg Other video on brownian motion: https://www.youtube.com/watch?v=NHo6LTXdFns This one has a cute demonstration of how you get a random walk when you focus on a single particle, and is https://www.youtube.com/watch?v=pdz7wFHSLD0 Scipy has some routines that help simulate Brownian motion. The cookbook has this example: http://scipy-cookbook.readthedocs.io/items/BrownianMotion.html Progression of record peaks =========================== http://iaaf-ebooks.s3.amazonaws.com/2015/Progression-of-IAAF-World-Records-2015/projet/IAAF-WRPB-2015.pdf https://en.wikipedia.org/wiki/Athletics_record_progressions [footnotes] .. [#] Note that the clustering is an artifact of the random generation of points; it is not due to a physical effect that clusters the points together. The gambler's fallacy ===================== The gambler's ruin ================== Link to other chapters ====================== * Simulated annealing covered in :numref:`chap-the-traveling-salesman` * Genetic algorithms (planned).