8. Case studies in data
8.1. Population data from the web
Our goals here are to:
Automate fetching of data sets from the web.
Look at a plot in a few different ways to get a narrative out of it.
We will start by looking at the population history of the whole world. When I discuss this with students I often ask “what do you think the population of the world is today?” (then you can have them search the web for “world population clock”, which will take them to http://www.worldometers.info/world-population/).
Then ask “what do you think the world population was in 1914? And 1923? And 1776? And 1066? And in the early and late Roman empire? And in the Age of Pericles?”
Let us search for “world population growth” and we will come to this web site: https://ourworldindata.org/world-population-growth/. If we go down a bit further we will see a link to download the annual world population data. The text on the link is FIXME: this section is incomplete.
We will not click on the link. Instead we will use the program wget to download it automatically (see footnote 5):
$ wget http://ourworldindata.org/roser/graphs/[...]/....csv -O world-pop.csv
Note that this is a very long URL, but students can get it as a result of their search, so nobody has to type the full thing in.
Once they have the file downloaded they can look at the data with:
$ less world-pop.csv
and will quickly see that it is slightly different from the data we have
seen so far. The columns of data are separated by commas instead of
spaces. This file format is called comma-separated values (CSV) and is
quite common. Our plotting program, gnuplot, works with space-separated
columns by default, so there are two tricks to plot the file. Either use
the cool program sed to change the commas into spaces:
$ sed 's/,/ /g' world-pop.csv > world-pop.dat
$ gnuplot
gnuplot> plot 'world-pop.dat' using 1:2 with linespoints
or tell gnuplot to use a comma as a column separator:
##CAPTION: World population.
set grid
set datafile separator comma
plot 'world-pop.csv' using 1:2 with linespoints
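Before running the sed substitution on the full file, students can try it on a tiny file to see exactly what it does. This is only a minimal sketch: the file names sample-pop.csv and sample-pop.dat and the values in them are made up for illustration and are not taken from the real data set.

```shell
# Build a tiny made-up CSV (illustrative values only, NOT real population data).
printf 'year,population\n-100,10\n0,20\n100,30\n' > sample-pop.csv

# Replace every comma with a space; the trailing g applies the
# substitution to all commas on a line, not just the first one.
sed 's/,/ /g' sample-pop.csv > sample-pop.dat

# Show the converted file.
cat sample-pop.dat
```

The output has the same rows with spaces where the commas were. Note that this trick assumes no field contains a comma or a space of its own, since sed knows nothing about columns.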
And what a story we could tell from this plot if it weren’t so hard to read! The main problem with this plot is that the world population in ancient times was quite small, and then it grew dramatically with various milestones in history which allowed for longer life expectancy and for the occupation of more of the world.
There are a couple of ways of trying to get more out of this plot. One is to zoom in to certain parts of it. For example, in Figure 8.1.2 we zoom in to the roughly 1200 years from the founding of Rome (753 BCE) to the fall of the western Roman empire (476 CE).
##CAPTION: World population during the period of the Roman empire.
set grid
set datafile separator comma
plot [-753:476] 'world-pop.csv' using 1:2 with linespoints
This is a good time to stop and discuss the graph. In discussing Figure 8.1.2 students might make interesting connections by referring to the Wikipedia article on Roman demography. It is sometimes estimated that the Roman empire had about 70 million inhabitants at its height, in the 2nd century CE. The world population at that time was approximately 200 million people, so the Roman empire would have accounted for some 35% of the world’s population. This means that large-scale population events in the Roman empire, like the Antonine Plague in 165-180 CE, or the decline and fall of the empire in the 4th and 5th centuries CE, might account for dips in Figure 8.1.2.
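The 35% figure is simple arithmetic that students can check at the command line. Here is a minimal sketch using awk (a standard tool, though not one used elsewhere in this section), with the two estimates quoted above as inputs; the variable names are just illustrative.

```shell
# Estimates quoted in the text, in millions of people.
roman_millions=70
world_millions=200

# awk does the floating-point division; %.0f rounds to a whole percent
# and %% prints a literal percent sign.
share=$(awk -v r="$roman_millions" -v w="$world_millions" \
    'BEGIN { printf "%.0f%%", 100 * r / w }')

echo "Roman empire share of world population: $share"
```

Changing either estimate and rerunning shows how sensitive the percentage is to the uncertain ancient population figures.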
We can also zoom in to the 20th century, as shown in Figure 8.1.3.
##CAPTION: World population in the 20th century.
set grid
set datafile separator comma
plot [1900:1999] 'world-pop.csv' using 1:2 with linespoints
Discussion of Figure 8.1.3 can point out that there is exponential growth from 1900 to 1962 (the year in which the world’s rate of population growth peaked), but that the exponential growth has interruptions due to World War I, the Spanish flu, and World War II.
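Exponential growth means that the population is multiplied by the same factor every year, which is the same as a constant percentage growth rate. As a sketch with made-up round numbers (a hypothetical population that doubles in 40 years, not values read from the data file), the implied annual growth rate can be computed like this:

```shell
# Hypothetical inputs: overall growth factor and the number of years it took.
factor=2
years=40

# If the population is multiplied by g every year, then g^years = factor,
# so g = exp(log(factor) / years). The annual rate in percent is 100 * (g - 1).
rate=$(awk -v f="$factor" -v n="$years" \
    'BEGIN { printf "%.2f", 100 * (exp(log(f) / n) - 1) }')

echo "annual growth rate: ${rate}%"
```

With these inputs the result is about 1.75% per year. Students can replace the two inputs with numbers they read off the plot for any stretch of years.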
##CAPTION: World population from year 0 to 1800.
set grid
set datafile separator comma
plot [0:1800] 'world-pop.csv' using 1:2 with linespoints
And in Figure 8.1.4 we zoom in to the period from year 0 to 1800 CE. It can be interesting to look at pandemics and wars in this period and see if you can find features in the plot that correspond to those periods in history.
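To go with a zoomed-in plot it can help to look at the actual numbers for the same range of years. The following is a minimal sketch using awk on a tiny made-up file. The file names and values are illustrative and are not taken from world-pop.csv, and the sketch assumes a header line followed by rows of the form year,population.

```shell
# Tiny made-up CSV: a header line, then year,population rows (NOT real data).
printf 'year,population\n-500,5\n0,10\n900,20\n1800,30\n1950,40\n' > sample-years.csv

# -F, splits each line into fields at the commas; NR > 1 skips the header;
# the other two tests keep only rows whose year is between 0 and 1800.
awk -F, 'NR > 1 && $1 >= 0 && $1 <= 1800' sample-years.csv > sample-0-1800.csv

# Show the rows that survived the filter.
cat sample-0-1800.csv
```

Only the rows for the years 0, 900 and 1800 are kept. The same filter with different limits matches any of the plot ranges used in this section.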
These attempts at zooming in tell us some interesting things:
It is frustrating that there is so little data before 1950.
The 0 to 1800 plot allows us to see things clearly before the population jumps up so much.
In the 0-1800 plot we see that the world population starts growing as we approach the year 1000, then flattens off around the year 1300 (the period of the great plague), after which it starts to pick up again and never stops growing.
The other way to look at data when the \(y\) axis has too much
range is to use what is called a log scale, in which equal distances
along the axis correspond to equal ratios rather than equal differences.
Figure 8.1.5 shows how this can be done in gnuplot, and you can see
that the \(y\) axis has been adjusted so that we can see some of the
features in the data. This plot is more useful than that in Figure 8.1.1.
##CAPTION: World population on a log scale.
set grid
set datafile separator comma
set logscale y
plot 'world-pop.csv' using 1:2 with linespoints
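To see why a log scale helps, note that it plots the logarithm of each value, so multiplying a value by 10 always moves it up by the same amount. Here is a minimal sketch with made-up round values (not taken from the data file), using awk to compute base-10 logarithms:

```shell
# Made-up values, each ten times the previous one (one million to one billion).
# awk only has the natural logarithm log(), so log10(v) = log(v) / log(10).
log_table=$(awk 'BEGIN {
    for (v = 1000000; v <= 1000000000; v *= 10)
        printf "%d -> %.1f\n", v, log(v) / log(10)
}')

echo "$log_table"
```

The values span a factor of a thousand, but their logarithms go up in equal steps of 1 (from 6.0 to 9.0). That is why both the small ancient populations and the large modern ones fit on the same plot.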
8.1.1. Exercises
Find effective ways of downloading, processing and plotting data on the duration of ancient empires. You can find some here: http://www.bbc.com/future/story/20190218-the-lifespans-of-ancient-civilisations-compared
- 5
The full URL is http://ourworldindata.org/roser/graphs/WorldPopulationAnnual12000years_interpolated_HYDEandUN/WorldPopulationAnnual12000years_interpolated_HYDEandUN.csv but we don’t need to type it all, so in the text I show an abbreviation of it.