8. Case studies in data

8.1. Population data from the web

Our goals here are to:

  • Automate fetching of data sets from the web.

  • Look at a plot in a few different ways to get a narrative out of it.

We will start by looking at the population history of the whole world. When I discuss this with students I often ask “what do you think the population of the world is today?” Then you can have them search the web for “world population clock”, which will take them to http://www.worldometers.info/world-population/

Then ask “what do you think the world population was in 1914? And 1923? And 1776? And 1066? And in the early and late Roman empire? And in the Age of Pericles?”

Let us search for

world population growth

and we will come to this web site: https://ourworldindata.org/world-population-growth/ where, a bit further down the page, we will see a link to download the annual world population data. The text on the link is FIXME: this section is incomplete.

We will not click on the link. Instead we will use the program wget to download it automatically [5]:

$ wget http://ourworldindata.org/roser/graphs/[...]/....csv -O world-pop.csv

Note that this is a very long URL, but students can get it as a result of their search, so nobody has to type the full thing in.
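Since one of our goals is to automate the fetching, it can help to avoid downloading the file again every time a script is rerun. Here is a minimal sketch, assuming a POSIX shell. The function name fetch_if_missing is made up for this example and is not part of wget; the only wget option used is the -O option already shown above.

```shell
# fetch_if_missing is a hypothetical helper name, not a wget feature.
# It downloads URL to FILE only when FILE is missing or empty.
fetch_if_missing() {
    url=$1
    out=$2
    if [ -s "$out" ]; then
        echo "already have $out"
    else
        wget "$url" -O "$out"
    fi
}

# Example call (needs network access), with URL set to the long
# address that students found through their search:
#   fetch_if_missing "$URL" world-pop.csv
```

The test for a non-empty file (-s) rather than an existing one (-e) is a deliberate choice: a failed wget with -O can leave an empty file behind, and we would want to try again in that case.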

Once they have the file downloaded they can look at the data with:

$ less world-pop.csv

and will quickly see that it is slightly different from the data we have seen so far: the columns of data are separated by commas instead of spaces. This type of file format is called comma-separated values (CSV) and is quite common. Our plotting program, gnuplot, works with space-separated columns by default, so there are two tricks to plot the file. Either use the cool program sed to change the commas into spaces:

$ sed 's/,/   /g' world-pop.csv > world-pop.dat
$ gnuplot
gnuplot> plot 'world-pop.dat' using 1:2 with linespoints

or tell gnuplot to use a comma as a column separator:

Listing 8.1.1 Instructions to plot the world population.
##CAPTION: World population.
set grid
set datafile separator comma
plot 'world-pop.csv' using 1:2 with linespoints

Figure 8.1.1 The world population from 10000 BCE until the present time.
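Before plotting the converted file, students can check that the sed trick really produces the columns gnuplot expects. The following is a small sketch that uses a made-up sample file (sample.csv, with placeholder values) rather than the real data, so it can be tried without downloading anything.

```shell
# A tiny made-up sample in the same comma-separated layout
# (the values are placeholders, not real population figures).
printf 'Year,Population\n1,100\n2,200\n' > sample.csv

# Same substitution as in the text: every comma becomes spaces.
sed 's/,/   /g' sample.csv > sample.dat

# awk splits on runs of whitespace, so NF is the number of
# columns that a space-separated reader will see on each line.
awk '{ print NF ": " $0 }' sample.dat
```

Every line should be reported as having 2 columns. If a line in a real file shows a different count, for example because a header contains spaces, that is a hint to use the comma separator setting in gnuplot instead.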

And what a story we could tell from this plot if it weren’t so hard to read! The main problem with this plot is that the world population in ancient times was quite small, and then it grew dramatically with various milestones in history which allowed for longer life expectancy and for the occupation of more of the world. The recent growth sets the range of the \(y\) axis, so everything in the earlier millennia is flattened against the bottom of the plot.

There are a couple of ways of trying to get more out of this plot. One is to zoom in to certain parts of it. For example, in Listing 8.1.2 we zoom in to the period from the founding of Rome to the fall of the western Roman empire; the result is shown in Figure 8.1.2.

Listing 8.1.2 Plot the world population from the founding of Rome until the fall of the western Roman empire.
##CAPTION: World population during the period of the Roman empire.
set grid
set datafile separator comma
plot [-753:476] 'world-pop.csv' using 1:2 with linespoints

Figure 8.1.2 The world population from the founding of Rome (753 BCE) until the fall of the western Roman empire (476 CE).
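The plotting instructions can also be run without typing them at the gnuplot prompt, which fits our goal of automating things. Here is a minimal sketch: the file names plot-roman.gp and world-pop-roman.svg are made up for this example, and the two extra lines (set terminal svg and set output) are standard gnuplot commands that send the plot to a file instead of a window.

```shell
# Write the plotting instructions to a script file.
# The names plot-roman.gp and world-pop-roman.svg are
# hypothetical, chosen only for this sketch.
cat > plot-roman.gp <<'EOF'
set terminal svg
set output 'world-pop-roman.svg'
set grid
set datafile separator comma
plot [-753:476] 'world-pop.csv' using 1:2 with linespoints
EOF

# Run it only when gnuplot and the data file are both present.
if command -v gnuplot >/dev/null 2>&1 && [ -f world-pop.csv ]; then
    gnuplot plot-roman.gp
fi
```

Changing the range in the plot line is then all that is needed to produce the other zoomed plots in this section.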

This is a good time to stop and discuss the graph. In discussing Figure 8.1.2 students might make interesting connections by referring to the Wikipedia article on Roman demography. It is sometimes estimated that the Roman empire might have had about 70 million people at the height of the empire, in the 2nd century CE. The world population at that time was approximately 200 million people, so the Roman empire would have accounted for some 35% of the world’s population. This means that large scale population events in the Roman empire, like the Antonine Plague in 165-180 CE, or the decline and fall of the empire in the 4th and 5th centuries, might account for dips in Figure 8.1.2.

We can also zoom in to the 20th century, as shown in Figure 8.1.3.

Listing 8.1.3 Plot the world population in the 20th century.
##CAPTION: World population in the 20th century.
set grid
set datafile separator comma
plot [1900:1999] 'world-pop.csv' using 1:2 with linespoints

Figure 8.1.3 The world population in the 20th century.

Discussion of Figure 8.1.3 can point out that there is exponential growth from 1900 to 1962 (the year in which the world’s rate of population growth peaked), but that the exponential growth has interruptions due to World War I, the Spanish flu, and World War II.
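Exponential growth means that the population grows by roughly the same percentage every year, so one way to check the claim is to compute the yearly percentage change between consecutive rows of the file. The following is a sketch using awk on a made-up sample in which each value is exactly 10% larger than the previous one. The function name yearly_growth is hypothetical, and the sketch assumes the year is in column 1 and the population in column 2, as in the listings.

```shell
# Made-up sample: each value is 10% larger than the one before,
# so the growth is exactly exponential. Not real population data.
printf 'Year,Population\n1900,1000\n1901,1100\n1902,1210\n' > growth-sample.csv

# yearly_growth is a hypothetical helper name. For every data row
# after the first it prints the year and the average yearly growth,
# in percent, since the previous data row. Rows whose first column
# is not an integer (such as the header) are skipped.
yearly_growth() {
    awk -F, '$1 ~ /^-?[0-9]+$/ {
        if (seen) {
            years = $1 - prev_year
            rate = exp(log($2 / prev_pop) / years) - 1
            printf "%d %.2f\n", $1, 100 * rate
        }
        prev_year = $1; prev_pop = $2; seen = 1
    }' "$1"
}

yearly_growth growth-sample.csv
```

On the sample this prints a rate of 10.00 for both 1901 and 1902. Running yearly_growth world-pop.csv on the downloaded file would let students see where the rate stays steady and where it drops.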

Listing 8.1.4 Plot the world population from 0 to 1800 CE.
##CAPTION: World population from year 0 to 1800.
set grid
set datafile separator comma
plot [0:1800] 'world-pop.csv' using 1:2 with linespoints

Figure 8.1.4 The world population from 0 to 1800 CE.

And in Figure 8.1.4 we zoom in to the period from year 0 to 1800 CE. It can be interesting to look at pandemics and wars in this period and see if you can find features in the plot that correspond to those periods in history.

These attempts at zooming in tell us some interesting things:

  • It is frustrating that there is so little data before 1950.

  • The 0 to 1800 plot allows us to see things clearly before the population jumps up so much.

  • In the 0-1800 plot we see that the world population starts growing as we approach the year 1000, flattens off around the year 1300 (the period of the great plague), and then starts to pick up again and never stops growing.
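The zoom ranges in gnuplot only change what is drawn. To look at the actual rows in a range, or to count how many data points fall before and after a given year, awk can filter the file directly. This is a sketch on a made-up sample file (zoom-sample.csv, with placeholder values), again assuming the year is in the first column.

```shell
# Made-up sample with a header row; the values are placeholders.
printf 'Year,Population\n-500,10\n0,20\n1000,30\n1800,40\n1950,50\n2000,60\n' > zoom-sample.csv

# Keep only the rows whose year lies in the range 0 to 1800,
# the same range as the gnuplot option [0:1800].
awk -F, '$1 ~ /^-?[0-9]+$/ && $1 >= 0 && $1 <= 1800' zoom-sample.csv

# Count the data rows before 1950 and from 1950 onward.
awk -F, '$1 ~ /^-?[0-9]+$/ { if ($1 < 1950) before++; else after++ }
         END { print before + 0, after + 0 }' zoom-sample.csv
```

On the sample the first command prints the rows for the years 0, 1000 and 1800, and the second prints 4 2. The same commands can be pointed at world-pop.csv.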

The other way to look at data when the \(y\) axis has too much range is to use what is called a log scale. Listing 8.1.5 shows how this can be done in gnuplot, and in Figure 8.1.5 you can see that the \(y\) axis has been adjusted so that we can see some of the features in the data. This plot is more useful than that in Figure 8.1.1.

Listing 8.1.5 Instructions to plot the world population with log scale.
##CAPTION: World population.
set grid
set datafile separator comma
set logscale y
plot 'world-pop.csv' using 1:2 with linespoints

Figure 8.1.5 The world population from 10000 BCE until the present time, with a log scale for population. You can see some features because the log scale compresses the 20th century population explosion.
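To get a feeling for what the log scale does, it can help to compute the logarithm of a few values by hand. A log scale places each value according to its logarithm, so every factor of 10 takes up the same distance on the axis. The sketch below uses made-up round numbers, not values from the file, and the function name log10_column is hypothetical.

```shell
# log10_column is a hypothetical helper name. It reads one number
# per line and prints the number followed by its base-10 logarithm.
log10_column() {
    awk '{ printf "%d %.1f\n", $1, log($1) / log(10) }'
}

# Made-up round numbers: one million, ten million, one billion.
printf '1000000\n10000000\n1000000000\n' | log10_column
```

This prints 6.0, 7.0 and 9.0 next to the three numbers: a thousandfold increase in population only moves the point three steps up the axis, which is why the early history is no longer squashed flat.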

8.1.1. Exercises

Exercise 8.1

Find effective ways of downloading, processing and plotting data on the duration of ancient empires. You can find some here: http://www.bbc.com/future/story/20190218-the-lifespans-of-ancient-civilisations-compared

[5] The full URL is http://ourworldindata.org/roser/graphs/WorldPopulationAnnual12000years_interpolated_HYDEandUN/WorldPopulationAnnual12000years_interpolated_HYDEandUN.csv but we don’t need to type it all, so in the text I show an abbreviation of it.