Simulated annealing is an optimization technique that finds an approximation of the global minimum of a function. When working on an optimization problem, a model and a cost function are designed specifically for this problem. By applying the simulated annealing technique to this cost function, an optimal solution can be found. In this article, I present the simulated annealing technique, I explain how it applies to the traveling salesman problem, and I perform experiments to understand how the different parameters control the details of the search for an optimal solution. I also provide an implementation in Python, along with graphic visualization of the solutions.
On Wikipedia, we can read:
The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects.
The computer version of simulated annealing mimics the metallurgical process, and finds lower levels of energy for the cost function. The algorithm begins with a high temperature, and slowly cools down to a low temperature. Cooling down is done simply by looping on a temperature variable, and multiplying this variable by a number between 0 and 1 at every iteration:
    temperature = temperature_start
    while temperature > temperature_end:
        compute cost values and update optimal solution
        temperature = temperature * cooling_factor
We want to minimize the value of the cost function. This function has parameters, so minimizing its value is done by finding a good set of parameters. An initial cost value is computed for a set of random parameters, and by modifying this set, we are going to find an optimal solution. In order to find new parameter sets, we change one of the parameters of the cost function at random at every iteration. Then we need to check whether the change has been fruitful, and for that we compare the cost values before and after the change. If the new cost value is smaller than the previous one (the previous being the best solution known so far), then the change in parameters is a gain, and therefore we keep it. But if the new cost value is bigger than the previous one, it is not directly rejected. This is what sets simulated annealing apart from hill climbing, as it prevents the search from getting stuck in local minima. Even if the new cost value is bigger than the previous one, the change can still be kept with a certain probability. This probability is computed based on the difference between the new and the previous cost values, and on the temperature.
    cost_new = cost_function(...)
    difference = cost_new - cost_previous
    if difference < 0 or exp(-difference / temperature) > random(0, 1):
        cost_previous = cost_new
We use the negative exponential of the cost difference divided by the temperature as the acceptance probability (Kirkpatrick, 1984), and we compare this value to a random number drawn from the uniform distribution over the interval [0, 1). Substituting these computations into the simulated annealing loop presented above, we obtain:
    cost_previous = infinity
    temperature = temperature_start
    while temperature > temperature_end:
        cost_new = cost_function(...)
        difference = cost_new - cost_previous
        if difference < 0 or exp(-difference / temperature) > random(0, 1):
            cost_previous = cost_new
        temperature = temperature * cooling_factor
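The pseudocode above can be turned into a small runnable sketch. Here I minimize a simple one-dimensional function, x², as a stand-in for a real cost function; the perturbation step and the temperature values are illustrative choices, not part of the original algorithm description.

```python
import math
import random

def simulated_annealing(cost_function, x_start,
                        temperature_start=1e4,
                        temperature_end=1e-3,
                        cooling_factor=0.95):
    """Minimal simulated annealing loop mirroring the pseudocode above."""
    x = x_start
    cost_previous = cost_function(x)
    best_x, best_cost = x, cost_previous
    temperature = temperature_start
    while temperature > temperature_end:
        # Random change in the parameter: a small perturbation of x.
        x_new = x + random.uniform(-1.0, 1.0)
        cost_new = cost_function(x_new)
        difference = cost_new - cost_previous
        # Keep the change if it improves the cost, or otherwise with
        # probability exp(-difference / temperature).
        if difference < 0 or math.exp(-difference / temperature) > random.random():
            x, cost_previous = x_new, cost_new
            if cost_new < best_cost:
                best_x, best_cost = x_new, cost_new
        temperature *= cooling_factor
    return best_x, best_cost

random.seed(42)
x, cost = simulated_annealing(lambda v: v * v, x_start=10.0)
```

At a high temperature, exp(-difference / temperature) is close to 1, so almost any degrading change is accepted; as the temperature drops, the loop behaves more and more like plain hill climbing.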
The traveling salesman problem
The traveling salesman problem is a classic of computer science. In this problem, a traveling salesman has to visit all the cities in a given list. The difficulty is that he has to visit each city exactly once while minimizing the total traveled distance. This is equivalent to finding a minimum-weight Hamiltonian cycle, a problem that is NP-hard.
The traveling salesman problem is a minimization problem, since it consists of minimizing the distance traveled by the salesman during his tour. As the distance is what we want to minimize, it has to be our cost function. The parameters of this function are the cities in the list. Modifying a parameter of the function amounts to changing the order in which the cities are visited. Applying the simulated annealing technique to the traveling salesman problem can be summed up as follows:
- Create the initial list of cities by shuffling the input list (i.e., make the visit order random).
- At every iteration, two cities are swapped in the list. The cost value is the distance traveled by the salesman for the whole tour.
- If the new distance, computed after the change, is shorter than the current distance, it is kept.
- If the new distance is longer than the current one, it is kept with a certain probability.
- We update the temperature at every iteration by slowly cooling down.
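The steps above can be sketched as a self-contained Python function. This is not the implementation attached to this article, just a minimal version under simplifying assumptions: cities are points in a plane, distances are Euclidean, and the whole tour distance is recomputed at each step (the optimizations come next).

```python
import math
import random

def tour_distance(tour, coords):
    """Total distance of the closed tour (returning to the first city)."""
    total = 0.0
    for i in range(len(tour)):
        a = coords[tour[i]]
        b = coords[tour[(i + 1) % len(tour)]]
        total += math.hypot(a[0] - b[0], a[1] - b[1])
    return total

def anneal_tsp(coords, temperature_start=1e10, temperature_end=0.1,
               cooling_factor=0.95):
    tour = list(range(len(coords)))
    random.shuffle(tour)                          # step 1: random visit order
    cost_previous = tour_distance(tour, coords)
    best, best_cost = list(tour), cost_previous
    temperature = temperature_start
    while temperature > temperature_end:
        i, j = random.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]       # step 2: swap two cities
        cost_new = tour_distance(tour, coords)
        difference = cost_new - cost_previous
        if difference < 0 or math.exp(-difference / temperature) > random.random():
            cost_previous = cost_new              # steps 3-4: keep the change
            if cost_new < best_cost:
                best, best_cost = list(tour), cost_new
        else:
            tour[i], tour[j] = tour[j], tour[i]   # reject: undo the swap
        temperature *= cooling_factor             # step 5: cool down
    return best
```

A rejected swap is simply undone, so the current tour always corresponds to the last accepted cost value.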
Moreover, two major optimizations can be used to speed up the computation of the distances:
- Instead of recomputing the distance between two cities every time it is required, the distances between all pairs of cities can be precomputed in a table, and used thereafter. Actually, a triangular matrix is enough, as the distance between cities A and B is the same as the distance between B and A.
- As a change in parameters consists of swapping two cities, there is no need to recompute the total distance for the whole tour. Indeed, only the distances of the edges changed by the swap should be recomputed.
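These two optimizations can be sketched as follows. The helper names (`build_distance_table`, `swap_delta`) are mine, not those of the implementation attached to this article, and cities are again assumed to be points in a plane.

```python
import math

def build_distance_table(coords):
    """Precompute the distances between all pairs of cities (first
    optimization). Only the lower triangle is stored, since the
    distance from A to B equals the distance from B to A."""
    n = len(coords)
    table = [[0.0] * (i + 1) for i in range(n)]
    for i in range(n):
        for j in range(i + 1):
            (x1, y1), (x2, y2) = coords[i], coords[j]
            table[i][j] = math.hypot(x1 - x2, y1 - y2)
    return table

def distance(table, a, b):
    """Symmetric lookup into the triangular table."""
    return table[a][b] if a >= b else table[b][a]

def swap_delta(tour, table, i, j):
    """Change in total tour length caused by swapping the cities at
    positions i and j (second optimization): only the edges touching
    the two positions are recomputed, not the whole tour."""
    n = len(tour)
    def incident_length():
        # Sum of the lengths of the edges adjacent to positions i and j.
        total = 0.0
        for p in (i, j):
            for q in ((p - 1) % n, (p + 1) % n):
                total += distance(table, tour[p], tour[q])
        return total
    before = incident_length()
    tour[i], tour[j] = tour[j], tour[i]
    after = incident_length()
    tour[i], tour[j] = tour[j], tour[i]  # restore the original order
    return after - before
```

When the two positions are adjacent, the edge between them is counted twice in both sums, but its length is unchanged by the swap, so the difference is still correct.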
I have implemented simulated annealing using Python and the design described in the previous section. This implementation is available for download at the end of this article. The data I am using are GPS coordinates of 50 European cities. I am using an Intel Atom 1.6 GHz processor on Linux Ubuntu to run my experiments. Also, for all the experiments, I am using a constant seed for the random number generator, which is the value 42.
The first experiment that I perform uses only ten cities. Figure 1 shows the initial tour, formed by a random visit order. Figure 2 presents the optimal tour obtained using simulated annealing. A 32% improvement is observed from the initial tour to the optimal tour, as distance goes from 12699 km down to 8588 km. This solution was found in 2 seconds. Figure 3 shows how the optimal solution improves over the course of the simulated annealing.
One of the drawbacks of simulated annealing is that by testing random changes, the solution can diverge very quickly. To counter this, one can restart the annealing procedure from a previous solution known to be good. This technique is known as restart. The decision to restart can be triggered by various conditions: for example, annealing can restart after the temperature has reached a certain state, or after a certain number of steps. Here I implement a restart that takes place when the temperature reaches its final state. The cost function parameters are then set to the best tour known at that moment, instead of the random list used to initialize the first step. Figures 4 and 5 below show the same simulated annealing algorithm used in the previous section, except that ten restarts are used in the hope of finding a better solution. On Figure 5, the vertical green bars indicate when a restart takes place. The computation takes 14 seconds and a better solution is indeed found, but the gain is small: with a 32% improvement on one iteration, we only obtain a 34% improvement on ten iterations. Only cities #9 and #10 are swapped compared to the solution found in one iteration. Moreover, we can observe on Figure 5 that the optimal distance does not change after the 3rd restart, which means that 70% of the computation is useless. Click on the figures to enlarge.
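The restart logic described above can be sketched as a wrapper around the annealing procedure. Here `anneal` is a stand-in for an annealing function assumed to take a starting tour and return the best tour found and its distance; the real implementation attached to this article may organize this differently.

```python
def anneal_with_restarts(anneal, initial_tour, restarts=10):
    """Run the annealing procedure several times, each run restarting
    from the best tour known so far once the temperature has reached
    its final state."""
    best_tour, best_cost = anneal(initial_tour)
    for _ in range(restarts - 1):
        tour, cost = anneal(best_tour)  # restart from the best known tour
        if cost < best_cost:
            best_tour, best_cost = tour, cost
    return best_tour, best_cost
```

Only the first run starts from the shuffled input list; every subsequent run seeds the annealing with the current optimum, which is what distinguishes restart from simply running the algorithm several times independently.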
Playing with the parameters
In the previous experiment, the shortest distance metric in Figures 3 and 5 shows a nice convergence pattern. However, due to the random swapping of cities in the annealing loop, the tested distance metric does not present any real convergence pattern. This metric is at the center of the algorithm, as it is from these tested configurations that new optimal solutions emerge. The first thing I look for when testing optimization and artificial intelligence algorithms is some kind of convergence in the observed metric. I agree that here, a simple city swap can lead to dramatic changes in the total traveled distance, but not seeing any convergence at all is very confusing. There seems to be some convergence for the tested distance in Figure 3 between steps 11000 and 12000, but what happens before is very chaotic, which gives me the feeling that simulated annealing does not really know what it is doing here. My guess is that the parameters of the algorithm need to be adapted to the problem being solved. Therefore, in the next section I perform more experiments, this time with 50 cities, in order to study the effect of parameter changes on the algorithm's behavior.
Cooling factor and starting temperature
In order to test the effects of the cooling factor and of the starting temperature, I run the simulated annealing for 50 cities on 10 iterations, changing only these two parameters. All other parameters stay constant. Figure 6 shows the same experiment for different cooling factors and starting temperatures, as follows:
(a). Cooling factor = 0.95, Starting temperature = 1e+10, Improvement: 43%, Time: 1 s
(b). Cooling factor = 0.95, Starting temperature = 1e+50, Improvement: 43%, Time: 3 s
(c). Cooling factor = 0.99, Starting temperature = 1e+10, Improvement: 57%, Time: 4 s
(d). Cooling factor = 0.99, Starting temperature = 1e+50, Improvement: 55%, Time: 16 s
I am very pleased with this series of experiments, because I am able to show a clear convergence of the tested distance over the course of an iteration. The number of steps differs between the experiments. Therefore, the nice convergence pattern observed in Figure 6(a) is also present in Figures 6(b), 6(c) and 6(d); it is just less obvious, as the curves are compressed over the y-axis. So what do we learn from this experiment? First, the pre-convergence phase, formed by the steps occurring before the tested distance converges, and which was our problem in Figures 3 and 5, is longer when the starting temperature is 1e+50 than when it is 1e+10. This is clearly visible in Figures 6(b) and 6(d) against Figures 6(a) and 6(c), respectively. These steps correspond to many random changes in the tour configuration, in the hope of seeing a new optimal solution emerge. But the curve of the shortest distance shows that these pre-convergence phases are very unproductive: except in the first iterations, no new optimal solutions are found while they execute. Moreover, the execution time with a starting temperature of 1e+50 is larger than with a starting temperature of 1e+10. My conclusion is that we are simply losing time computing random configurations, which wastes processor cycles and delays convergence towards the optimal solution. Therefore, in the case of our problem, a lower value, like 1e+10, is to be preferred.
Another interesting feature emerges from Figures 6(a) and 6(c). As we can see, convergence of the shortest distance operates faster when the cooling factor is 0.99 than when it is 0.95. There are around 500 steps in an iteration with a cooling factor of 0.95, against 2500 steps in an iteration with a cooling factor of 0.99. Indeed, with a value of 0.99, cooling takes more time, and therefore the algorithm has more opportunities to test random solutions during the pre-convergence phase, but also to find more precise solutions during the convergence phase. Of course this has a cost: computation time is 4 seconds for a cooling factor of 0.99, whereas it is only 1 second for a cooling factor of 0.95. But here the additional computations are worth it, as they allow us to reach a 57% improvement compared to the initial tour.
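These step counts follow directly from the cooling schedule: going from a starting temperature down to an ending temperature by repeated multiplication takes about log(T_end / T_start) / log(cooling_factor) steps. As a quick sanity check, assuming the ending temperature of these runs is 0.1:

```python
import math

def annealing_steps(t_start, t_end, cooling_factor):
    """Number of multiplications by cooling_factor needed to bring
    t_start down to t_end."""
    return math.ceil(math.log(t_end / t_start) / math.log(cooling_factor))

# Assumed ending temperature of 0.1 for the runs in Figure 6.
annealing_steps(1e10, 0.1, 0.95)  # roughly 500 steps
annealing_steps(1e10, 0.1, 0.99)  # roughly 2500 steps
```

The formula also explains why a starting temperature of 1e+50 lengthens the pre-convergence phase so much: every extra order of magnitude on the starting temperature adds a fixed number of steps, all of them spent at temperatures where almost any swap is accepted.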
Now I would like to look into the importance of the ending temperature. As above, I run the simulated annealing for 50 cities on 10 iterations, but this time only the ending temperature changes and all other parameters remain constant. During these experiments, the cooling factor is 0.95 and the starting temperature is 1e+50. Figure 7 shows the consequences for the simulated annealing:
(a). Ending temperature = 0.001, Improvement: 60%, Time: 4 s
(b). Ending temperature = 0.01, Improvement: 58%, Time: 4 s
(c). Ending temperature = 0.1, Improvement: 57%, Time: 4 s
(d). Ending temperature = 1, Improvement: 53%, Time: 4 s
The pre-convergence and convergence phases are mostly the same for the tested distance across all experiments, but one detail is worth noticing. It seems that as the ending temperature decreases, convergence gets more precise. Precision does not improve dramatically, but enough to refine the optimal solution found so far. The number of steps and the computation times are the same for all four experiments, yet the improvement is 53% for an ending temperature of 1, 57% for 0.1, 58% for 0.01, and 60% for 0.001. Even more interesting, the 60% improvement in the case of 0.001 is reached during the first iteration alone.
It seems that the starting and ending temperatures control the convergence pattern. In the previous section we saw that the starting temperature defines when convergence starts, and here we see that the ending temperature controls the granularity of convergence, allowing ever more precise solutions to be found. But be careful: the ending temperature should not be too low. When testing an ending temperature of 0.0001, results are even better than for 0.001, but with 0.00001 results get worse. This is because if the ending temperature gets too low, changes in configuration can no longer operate correctly near the end of the annealing process, and the algorithm remains stuck in a local minimum.
Final experiment: the quest for the optimal tour
Now that I have carefully studied the importance of every parameter on the behavior of the algorithm, I want to apply this knowledge to find an optimal solution for our tour of Europe in 50 cities. I keep a starting temperature of 1e+10 to avoid wasting computations, and a cooling factor of 0.99 to increase the number of steps and hopefully find better configurations. I also set the ending temperature to 0.0001 in order to get a good granularity at the end of the convergence phase. Finally, I run the simulated annealing on up to 100 restart iterations!
Figures 8 through 12 represent the initial tour and the optimal tour after 1, 10, 50 and 100 iterations, respectively. The optimal tour slowly appears. Figure 13 shows the distance metrics for the 100 iterations. The restart locations have not been plotted. The optimal solution after 100 iterations represents a 63% improvement compared to the initial configuration, obtained in 46 seconds. Not too bad, and we can see in Figure 12 that the tour found after 100 iterations is indeed simpler than the initial one! Click on the figures to enlarge.
In this article, I presented simulated annealing, an optimization technique aimed at finding an approximation of the global minimum of a function. I explained how this technique applies to the traveling salesman problem, and I used "restart", a method intended to help find more precise solutions. After some tests, I managed to find a set of parameters for the algorithm well adapted to the problem at hand. Finally, I applied the algorithm and found an optimal path for a 50-city tour.
Now that I have played with simulated annealing quite a bit, I am convinced that this technique can find optimal solutions for complex problems. But I have also seen that the parameters of the algorithm are very sensitive. A good set of parameters is likely to differ from one problem to another. Therefore, before applying simulated annealing to a new problem, my plan would be to reproduce the same experiments that I have done here for the starting and ending temperatures, and for the cooling factor. This will help me find the sweet spot where the parameter values allow an optimal solution to be found quickly.
Regarding the restart method, my feeling is that it still has to be refined. Indeed, Figures 5, 6, 7 and 13 show that the tested distance goes back to high values every time a restart occurs. This means that the algorithm goes back to trying random solutions at every restart, which is almost the same as restarting the algorithm from a random state. Therefore I have serious doubts about whether I am doing it right. I think that playing with the starting and ending temperature values could fix that. For instance, by implementing a decay of the starting temperature at every new restart, one could limit the randomness observed in the pre-convergence phase of an iteration. Thus, in addition to the convergence observed within the boundaries of a single iteration, a global convergence should also appear as new iterations are performed.
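This proposed refinement could be sketched as follows. This is only an idea, not something I have tested in the experiments above; `anneal` is a stand-in for an annealing function that accepts a starting tour and a starting temperature, and the decay factor of 0.5 is an arbitrary choice for illustration.

```python
def anneal_with_decaying_restarts(anneal, initial_tour, restarts=10,
                                  temperature_start=1e10, decay=0.5):
    """Restart variant where each new iteration begins from the best
    tour so far, with a lower starting temperature, so that later
    iterations explore less randomly."""
    best_tour, best_cost = anneal(initial_tour, temperature_start)
    for _ in range(restarts - 1):
        temperature_start *= decay  # reduce randomness at each restart
        tour, cost = anneal(best_tour, temperature_start)
        if cost < best_cost:
            best_tour, best_cost = tour, cost
    return best_tour, best_cost
```

With such a decay, the pre-convergence phase should shrink from one restart to the next, and the tested distance curve should show a global convergence across iterations rather than jumping back to high values every time.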
You can download the source code for the Python implementation that I have used to perform the experiments in this article, annealing.py, and the input data, cities.csv. Also, plotting the figures will require that you install the matplotlib package. You can find more information about this package on the official matplotlib website.