.. _chap-the-traveling-salesman:

========================
 The Traveling Salesman
========================

[status: content-mostly-written]

.. sidebar:: Applications of `optimization`
   :subtitle: Optimization in practice

   The traveling salesman problem (TSP) is one of the best known
   examples of `optimization`, which is a very important field.  Its
   solution is relevant to many large scale logistical efforts, such
   as the most efficient path for school buses in a city, or airline
   routing, or freight transportation.  To show how important this is
   considered, in 1962, the giant corporation Proctor and Gample ran a
   contest to solve an instance of TSP with 33 U.S. cities.  The prize
   was $75000 in 2013 dollars.  In the world of theoretical computer
   science, the TSP spawned the development of many of the most
   important advances in computer science.  Although one might not go
   as far as to quote Mandos in The Silmarillion and say that the fate
   of Earth is linked to solutions to the TSP, it is hard to overstate
   the importance of being able to find good solutions to these types
   of problems.


.. rubric:: Motivation

One of the most important things done by computer software is to find
*approximations* to mathematical problems that are too complex to
carry out exactly.  We will examine one of these problems, the
Traveling Salesman Problem (TSP), which is easy to formulate and has
many practical applications.

.. rubric:: Prerequisites

- The 10 hour "serious programming" course.

- A GNU/Linux system with python3 and python3-tk installed (on a
  debian-based linux system this can be done with ``sudo apt install
  python3-tk``)

- The mini-course on drawing on canvases, in
  :numref:`sec-drawing-on-a-canvas`

.. rubric:: Plan

We will start by formulating what the problem looks like, then we will
discuss the general idea of optimization, after which we will write
programs in Python that find an easy (but not very good) solution with
the "greedy" algorithm.  Finally we will write a program which uses
the "hill climbing" algorithm to find an approximate solution.

Throughout these examples we will use the simple drawing techniques we
learned in :numref:`sec-drawing-on-a-canvas` to show rather cute
animations of our paths through the cities.

And this video snippet, which discusses Buddy Holly's "Winter Dance
Tour", mentions the poorly chosen route through the midwest:
https://youtu.be/NFdWrwbxNms?t=214 -- the cities were, in this order:
Milwaukee WI, Kenosha Wi, Mankato MN, Eau Claire Wi, Montevideo MN,
St. Paul MN, Davenport IA, Fort Dodge IA, Duluth MN, Green Bay WI,
Clear Lake IA, Moorhead MN, Sioux City IA, Des Moines IA, Cedar Rapids
IA, Spring Valley IL, Chicago IL, Waterloo IA, Dubuque IA, Louisville
KY, Canton OH, Youngstown OH, Peoria IL, Springfield IL:

https://commons.wikimedia.org/wiki/File:Winter_Dance_Party_Tour_Schedule,_1959.svg

The video commentator points out: "[...] the second was choosing the
most idiotic route between the venues [...]".


Cities and path lengths
=======================

Our cities are a list of :math:`(x, y)` coordinates within the canvas
size.  A future improvement would be to use realistic latitude and
longitude values and fit those into a canvas.

The length of the path is the sum of the space between each successive
pair of cities, and the distance between two cities can be calculated
with the Pythagoras theorem (IMPROVEME: a simple picture would be nice
here).  If the cities have coordinates :math:`(x_1, y_1)` and
:math:`(x_2, y_2)`, then the x and y distances between them are
:math:`(x_2-x_1)` and :math:`(y_2-y_1)`, and the distance "as the crow
flies" is given by the Pythagorean formula:

.. math::

   d = \sqrt{(x_2-x_1)^2 + (y_2-y_1)^2}

If we were to represent the coordinates of a city in Python with a
simple pair then we might represent two cities like this:

.. code-block:: python

   c1 = (14, 182)
   c2 = (71, 50)

We might then write a function to calculate the distance between them
like this:

.. code-block:: python

   import math
   ## ...
   def distance(city1, city2):
       x1 = city1[0]   ## coordinates of the first city
       y1 = city1[1]
       x2 = city2[0]   ## coordinates of the second city
       y2 = city2[1]
       r = math.sqrt((x2-x1)**2 + (y2-y1)**2)
       return r

Once we have this function we could calculate the distance like this:

.. code-block:: python

   c1 = (14, 182)
   c2 = (71, 50)
   d = distance(c1, c2)
   print(f'The distance between the cities ({c1[0]}, {c1[1]}) and ({c2[0]}, {c2[1]}) is {d}')

The output would look like this:

.. parsed-literal::

   The distance between the cities (14, 182) and (71, 50) is 143.781

To calculate the distance of the *entire path* we would add together
all the distances from the first city to the second, from the second
to the third, and so on until the last one.  Finally we would add the
"return home" path length from the last city back to the first.

Solving the Traveling Salesman Problem
======================================

What does it mean to "solve" the TSP?  It means to find the *shortest*
path between the cities.  We will see that for larger sizes this is
not practically possible, but we will try to do better than a random
path, and to take some steps to further improve our solutions.

A digression on optimization
----------------------------

Show pictures of a search in 1d and 2d, possibly with simple 1-d and
2-d gaussian pictures.

The optimization task we are looking at (the traveling salesman
problem) is an interesting one, but let us digress to look at some
simpler optimization tasks for which we can make interesting plots.

Let us examine the following *gaussian* function :math:`e^{(-(x-3)^2)}`
which you might remember from :numref:`sec-gaussian-distribution`:

.. code:: console

   $ gnuplot
   gnuplot> set samples 400
   gnuplot> plot exp(-(x-3)**2)
   gnuplot> set terminal svg
   gnuplot> set output 'single-hill-1d.svg'
   gnuplot> replot
   gnuplot> quit


Note that you could have similar function that points downward:

.. code:: console

   $ gnuplot
   gnuplot> plot -exp(-(x-3)**2)

In the first case our function has *maximum*, and in the second case
it has a *minimum*.  The word *optimum* can refer to one or the
other.


.. _fig-single-hill-1d:

.. figure:: single-hill-1d.*
   :width: 46%

   An example of a one dimensional hill-shaped function.  Note the
   single optimum: the top of the hill for :math:`x = 3`

Now try a two dimensional function with a clear single hill and single
optimum:

.. code:: bash

   $ gnuplot
   gnuplot> set pm3d
   gnuplot> set samples 150
   gnuplot> set isosamples 60
   gnuplot> set hidden3d
   gnuplot> splot exp(-(x-1)**2 - (y-0.5)**2)
   gnuplot> set terminal svg
   gnuplot> set output 'single-hill-2d.svg'
   gnuplot> replot
   gnuplot> quit


.. _fig-single-hill-2d:

.. figure:: single-hill-2d.*
   :width: 45%

   An example of a two dimensional hill-shaped function.  Note the
   single optimum: the top of the hill for :math:`(x, y) = (1, 0.5)`


..
   .. figure:: single-hill-1d.*
      :scale: 10 %
      :alt: map to buried treasure

      This is the caption of the figure (a simple paragraph).

      The legend consists of all elements after the caption.  In this
      case, the legend consists of this paragraph and the following
      table:

      +-----------------------------+-----------------------------+
      | 1D                          | 2D                          |
      +=============================+=============================+
      | .. image:: single-hill-1d.* | .. image:: single-hill-2d.* |
      +-----------------------------+-----------------------------+

In :numref:`fig-single-hill-1d` and :numref:`fig-single-hill-2d` we
see that some optimization problems are as simple as climbing to the
top of a single nearby hill.  There are no other hills to confuse us
and it should be simple to find the optimal value of x (in one
dimension) or of x and y (in two dimensions).

These figures give a clear idea of what we are looking for in
optimization when we optimize a function of x or of x and y.  The
picture is not so pretty when we try to optimize something more
complicated: there is no such visualization of the distance in the
traveling salesman problem, since city routes cannot be plotted as an
x axis or as x and y axes.  When we get to it, we will visualize the
improvement in the TSP by showing an animation of the path through the
cities.


Generating and visualizing lists of cities
------------------------------------------

Examine and type in the program in :numref:`listing-cities-simple-py`

.. _listing-cities-simple-py:

.. literalinclude:: cities_simple.py
   :language: python
   :caption: Program which draws a simple set of random cities on a
             canvas, in file ``cities_simple.py``.

.. exercise::

   Modify ``cities_simple.py`` to print some information about the
   path of these cities at the bottom of the canvas.  (An example can
   be downloaded in :download:`cities_simple_with_info.py`.)


Animating the drawing of cities
-------------------------------

We can modify the program in :numref:`listing-cities-simple-py` quite
straightforwardly to modify some information about the list of cities
and animate the city drawing to visualize those changes.

You can download this program :download:`cities_animated.py` and try
running it to see how it works.


.. exercise::

   Modify ``cities_animated.py`` to keep the `starting` point of the
   search.

.. exercise::

   Write a function which calculates the distance between two cities,
   then write a function which calculates the total length of a list
   of cities.

.. exercise::

   Modify ``cities_simple.py`` and ``cities_animated.py`` to print the
   total length of the path and other interesting information at the
   bottom of the screen. (An example can be downloaded in this file:
   :download:`cities_animated_with_info.py`.)


Improvements to the route
=========================

Before you start
----------------

* https://www.youtube.com/watch?v=xi5dWND499g
  (british fellow gives a hands-on demonstration of TSP with a map,
  pushpins, and a length of string)

* https://www.youtube.com/watch?v=SC5CX8drAtU (attractive and
  partially annotated visualization of greedy,
  local-search/hill-climbing and simulated annealing algorithms)

Impossibile to compute the optimal solution
-------------------------------------------

People will tell you that the traveling salesman problem cannot be
solved by "direct attack" because there are too many possible paths
for a computer to explore them all.

Let us look in to this with pencil and paper in hand and see if we
agree.

Draw three cities on a sheet of paper, pick a first city.  Draw all
paths that start with that first city and eventually get you back home
to it.  There should be 2 different paths.

Now do the same with 4 cities.  There should be 6 paths.

Try to do the same with 5 cities.  There should be 24 possible paths.

In general the equations is:

.. math::

   n_{paths}(n_{cities}) = (n_{cities}-1)! = (n_{cities}-1)\times(n_{cities}-2)\times ... \times 3 \times 2 \times 1

where the -1 in :math:`n_{cities}-1` comes in because at the end of
the run you always return to the given home city.

Now try to argue with yourself and with your partners to convince
everyone that this factorial formula works for larger
:math:`n_{cities}`.  And how rapidly does :math:`(n_{cities}-1)!`
grow?  Remember from :numref:`sec-simple-math` that it grows at an
amazingly fast rate:

.. math::

   2! &= 2 \times 1 = 2 \\
   3! &= 3 \times 2 \times 1 = 6 \\
   4! &= 4 \times 3 \times 2 \times 1 = 24 \\
   5! &= 5 \times ... \times 2 \times 1 = 120 \\
   6! &= 6 \times ... \times 2 \times 1 = 720 \\
   7! &= 7 \times ... \times 2 \times 1 = 5040 \\
   8! &= 8 \times ... \times 2 \times 1 = 40320 \\
   9! &= 9 \times ... \times 2 \times 1 = 362880 \\
   10! &= 10 \times ... \times 2 \times 1 = 3628800 \\
       &... \\
   20! &= 10 \times ... \times 2 \times 1 = 2432902008176640000 \\
       &... \\
   32! &= 32 \times 31 \times ... \times 2 \times 1 = 263130836933693530167218012160000000

So :math:`32!` is a number with 35 digits, approximately
:math:`2.63131\times 10^{35}`, so clearly we cannot hope for a
computer to look at all possible paths.  Thus we look for approximate
ways of doing it.


Greedy algorithm
----------------

One algorithm that comes to mind is to always travel to the city that
is closest to you and that you have *not* yet visited.  This is not
the best way, but it is usually not the worst way either.

.. exercise::

   With pen and paper draw short routes (4 or 5 cities) and solve them
   with the greedy algorithm.  Discuss with your partners if this is
   the optimal solution or not.

.. exercise::

   You can download this program :download:`tsp_solution_greedy.py`
   and try running it to see how it works.  Change the number of
   cities to be quite small and quite big.

.. exercise::

   Discuss with your partners what the term *deterministic algorithm*
   might mean, and whether the greedy algorithm is deterministic.

.. exercise::

   With your partners try to figure out (with pen and paper) how many
   times the greedy program calculates the distance between two
   cities.  This is called the *computational complexity* of the
   algorithm.  This number should be a function of ``n_cities``.

.. exercise::

   Modify the program to *count* how many times it calculates the
   distance between two cities.  Then run it with a variety of
   different values for ``n_cities``.

.. exercise::

   Try to come up with a layout of cities in which the greedy
   algorithm performs very poorly.  After you have tried conjuring a
   set of your own, you may look at :cite:p:`bird2015apiological` for
   some examples.

.. exercise::

   Try swapping triplets of cities instead of pairs and see if it does
   better.  A couple of things to consider as you try this:

   - Triplets might make it harder to reach an optimum because
     sometimes the best improvement in hill-climbing might come from a
     simple 2-point swap.  So you might need to explore sometimes
     swapping 2, sometimes 3, maybe even more.

   - To compare 2-swaps or 3-swaps or even more, you might want to set
     the *random number seed* to a fixed value at the start so that you
     can reproduce the same layout of cities.  In python you can do
     this with ``random.srandom(1234)`` near the start of the program.

.. exercise::

   Learn about the "basin hopping" algorithm provided by the scipy
   python library, and see if it can be used to solve the traveling
   salesman problem.  Write a program which uses this method and run
   it alongside the program we wrote which uses the hill climbing
   algorithm.


A digression on hill climbing
-----------------------------

..
   When we *search* ... FIXME
   ... "like climbing Everest in thick fog with amnesia" ...


Before you start
~~~~~~~~~~~~~~~~

* Watch this video: https://www.youtube.com/watch?v=kOFBnKDGtJM
  (Georgia Tech course on machine learning -- hill-climbing section.
  Pedagogically quite understandable.)

* Re-examine some of the plots in our tour of functions in
  :numref:`chap-a-tour-of-functions`, specifically in
  :numref:`sec-gaussian-distribution`.


Hill climbing for the traveling salesman problem
------------------------------------------------


So what is hill-climbing?  It is a simple algorithm (expressed for
when we seek a *maximum*):

#. Start at a certain location (random, or you can pick it if you know
   something about the landscape).

#. Try taking a step in a random direction.

#. Calculate your function.  If it is bigger, take that step; if it is
   smaller, go back to where you were.

If we try to climb up the hills shown in above in
:numref:`fig-single-hill-1d` and :numref:`fig-single-hill-2d` we find
it quite straightforward: there is a single peak and we will find it
easily.

Now look at different type of hilly functions that we might need to
optimize.  For example in one dimension we might look at the function
:math:`\sin(6*x)*e^{(x-3)^2}`

We can generate a plot with:

.. code:: bash

   gnuplot> set samples 400
   gnuplot> plot[-4:] sin(6*x) * exp(-(x-3)**2)
   gnuplot> set terminal svg
   gnuplot> set output 'multiple-hills-1d.svg'
   gnuplot> replot
   gnuplot> quit

.. _fig-multiple-hills-1d:

.. figure:: multiple-hills-1d.*

   A one dimensional example of multiple hills with different heights.
   The *global* maximum is for a value of x near 3.3.

Now look at a 2-dimensional situation where the function could be
:math:`e^{-(x^2+y^2)/7} \times \cos(2.5\times \sqrt{x^2 + y^2})`

.. code:: bash

   set pm3d
   set samples 250
   set isosamples 80
   set hidden3d
   splot exp(-(x**2 + y**2)/7.0) * cos(2.5*sqrt(x*x + y*y))
   set terminal svg
   set output 'multiple-hills-2d.svg'
   replot

.. _fig-multiple-hills-2d:

.. figure:: multiple-hills-2d.*

   A two dimensional example of multiple maxima.  The *global* maximum
   is near :math:`(x, y) = (0, 0)`.

In the functions show in :numref:`fig-multiple-hills-1d` and
:numref:`fig-multiple-hills-2d` we will run in to the problem of
climbing to a *local* maximum.  This is similar to when you go hiking
and it's foggy and you think you reached the peak of the main
mountain, but then the sky clears and you realize that you got stuck
on one of the lesser peaks.

Another thing to say about the hill-climbing algorithm is that it is a
*stochastic* approach.  This means that some of the decisions made by
the algorithm involve generating random numbers.

You can download and run the program
:download:`tsp_solution_hillclimbing.py`. Examine the source code,
then run it a few times.

You will see that it starts out with a random and rather chaotic path
through the cities.  Then it will attempt to swap two cities picked
randomly in the path and that swap will stick when the new path is
shorter than the previous one.

Exercises
~~~~~~~~~

* Use the terminal output from ``tsp_solution_hillclimbing.py`` to
  make a plot of the path distance as a function of how many steps
  have been taken.  You could do so by redirecting the output into a
  file called ``hill.out`` and using gnuplot to plot columns 2 and 3.

* Discuss with your class how far apart are the cities get swapped
  successfully.  Are they near each other on the list?  Are they far
  apart?  Are they near each other in distance?  Does that change as
  we get further in the run?

* Add more information to the ``print()`` statement to show how far
  apart the swapped cities are in the list and in distance.  Plot how
  those distances change in the course of a search.


Further study
-------------

These lectures on youtube are at a more advanced level than what we do
here.

* https://www.youtube.com/watch?v=boTeFM-CVFw&t=45s
  (Delightful discussion of hill climbing.)

* https://www.youtube.com/watch?v=j1H3jAAGlEA&t=998s
  (Extensive discussion of search.)

* https://www.youtube.com/watch?v=eczhFRfo3mI
  (Extensive theoretical discussion.)

* https://www.youtube.com/watch?v=K7vc60jn1KU (Georgia Tech course on
  machine learning -- simulated annealing section.  Nice pedagogical
  introduction.)

* http://aperiodical.com/2015/03/apiological-part-3/ (A discussion
  with examples of how the greedy algorithm can fail.)

* https://ocw.mit.edu/courses/sloan-school-of-management/15-053-optimization-methods-in-management-science-spring-2013/lecture-notes/MIT15_053S13_lec17.pdf
  a slide show from MIT courseware which seems very clear and
  systematic.  It also discusses facility location problems.

* https://diego.codes/post/som-tsp/
  Diego Vicente's "Using Self-Organizing Maps to solve the Traveling
  Salesman Problem" article.

Where do you go from search
===========================

* Machine learning: https://www.youtube.com/watch?v=nKW8Ndu7Mjw

* minimizing "wrongness"

* look at the cookie goodness
  https://www.toptal.com/machine-learning/machine-learning-theory-an-introductory-primer