Saturday, March 29, 2014

The Traveling Salesman




One of the most famous problems of math and computer science is the Traveling Salesman Problem. Given a list of \(n\) cities, what is the shortest tour that visits each city exactly once and returns back to the starting city? One of the reasons why this problem is so famous is that it is easy to understand the problem and incredibly hard to solve the problem. The animation above illustrates the steps a computer algorithm used to arrive at what we think is the shortest distance - 12,930 miles - needed to visit all 48 continental US capitals. (We used Google maps to use actual roads and highways when computing the distances between cities.) The best way to view the animation is to watch only one part of the US at a time, like the northeast. Then you can see how the web of paths works itself out into an efficient route.

When one first hears of the problem, one is likely to think that solving the problem really isn't that hard. A common first try at a solution goes something like this: Pick a starting city, then go to the city closest to it, then go the city closest to that, etc. This is called the Nearest Neighbor Algorithm. Mathematicians have another name for algorithms like this: Greedy Algorithms. That is, at every step, you do whatever is best at that time, or, whatever gives you the greatest gain right away. Since Greedy Algorithms do not look at the "big picture," the solutions they provide are generally not the best.

Starting at different cities with the Nearest Neighbor Algorithm generally produces different results. Starting in Wyoming gives the best Nearest Neighbor tour, clocking in at 14,748 miles. The tour is shown below; the roads are shown at 50% opacity so the dark lines represent highways that were traveled twice. Not so efficient. This tour ends with Texas - Tennessee - Michigan - Wisconsin - Wyoming: a "clean up" of all the cities previously missed. This tour is about 14% longer than our best tour shown above.



Like many bad things, the worst Nearest Neighbor tour starts in California, shown below. Its final states include the northwest: Montana, Washington, then Oregon. At that point, 47 of the 48 states are covered. What did we miss? Michigan. Oops. This is about 45% longer than our best tour.



What makes this problem so hard to solve? If the Nearest Neighbor solution isn't so great, why not just check every tour and then pick the shortest?

Before we try, let's do just a little work to make this simpler for us. First, recognize that traversing a tour in the opposite direction gives the same distance - it is essentially the same tour. That cuts the tours we have to check in half. Next, let's start every tour in Alabama, alphabetically the first state in the nation. Starting in Alabama, we have 47 states to choose from to be State #2. Then we have 46 choices for State #3, 45 for State #4, and so on. This gives us \((47\cdot46\cdot45\cdot 44\cdots 2\cdot 1)/2 = 47!/2\) total tours to check. This is a huge number. In fact, it is:

129,311,620,755,584,090,321,482,177,576,805,989,984,598,816,194,560,000,000,000.

Wolfram|Alpha helpfully tells us that this 60-digit number's name is:
129 octodecillion 311 septendecillion 620 sexdecillion 755 quindecillion 584 quattuordecillion 90 tredecillion 321 duodecillion 482 undecillion 177 decillion 576 nonillion 805 octillion 989 septillion 984 sextillion 598 quintillion 816 quadrillion 194 trillion 560 billion.
To show the absurd nature of this number, suppose you could check 1 trillion paths a second. It would take you over 4,100,444,595,243,026,709,838,983,307,230,022,513,463 years to check all paths.

To paraphrase a movie quote: "It's time to reevaluate our problem-solving paradigm."

Mathematical literature is full of ideas for finding good solutions to the Traveling Salesman Problem, but each method comes with the caveat: the solution given is never guaranteed to be the best. It is just probably close.

The animation above shows a "genetic algorithmic" approach to solving the problem. We generated 100 random tours of the states and graphed the shortest tour. We had no expectations that this tour would be any good at all. (It wasn't, coming in at about 49,000 miles.)

For each of these 100 tours, two states are choses at random, say, Georgia and Utah, and suppose the tour includes the path Alabama - Georgia - Virginia - Oregon - Utah - Maine. Then three new tours are created from the original tour:
  1. Swap the two states to make a new tour: Alabama - Utah - Virginia - Oregon - Georgia - Maine.
  2. Reverse the path between the two states: Alabama - Utah - Oregon - Virginia - Georgia - Maine.
  3. "Rotate" the path between the two states by one: Alabama - Utah - Georgia - Virginia - Oregon  - Maine.
Each of the 100 tours creates 3 new tours; of these 400, the 100 with the shortest path are kept and the rest discarded. The best of these new 100 is plotted for the animation. And the process repeats.

The beauty of this approach is that you don't need much of a strategy. You don't have to think "What properties of a tour give a short distance?" Rather, when tweaking a tour makes it shorter, you tend to keep the shorter version (unless there are lots of other even-shorter tours created). 

Below are the results of running this algorithm millions of times. The best result comes first (which is the final frame of the animation at top). We are about 90-95% sure this is the best route. We could be wrong - we didn't check every route - but this came up as the best several times.


These next two results are "relative minimums." That is, when the result immediately following was returned as the "best," the algorithm ran for about 5,000 more iterations without improvement. The variations our genetic algorithm introduces into the tours were not sufficient to make a better path than this. (Perhaps if we ran the algorithm longer it would have; perhaps my monkey would have also started to type Shakespearean sonnets.)


It is interesting to us that the tour below exists; it is 13 miles longer than our best. The difference from the optimal comes from visiting the midwest states in a different order.



After settling our code into something decent, we ran a simulation for about an hour that produced the results above, probably sampling around 20 million tours. We also used Mathematica's FindShortestTour command. In less than 5 seconds it produced a tour that was about 240 miles longer than our best. With an average speed of 60mph, this means Mathematica's tour would take 4 hours longer than ours. So you could use a genetic algorithm and search for an hour and still come out an hour ahead.

Please consider following us on Twitter. We tweet only when a new post is up, and never share pictures of our dessert.