tsp

Finding optimal tours for the Travelling Salesman Problem#

In the 19th century, William Rowan Hamilton formulated the TSP as a puzzle where one tries to generate a Hamiltonian cycle. A Hamiltonian cycle is a tour that visits every node in a graph. Overtime, this evolved to trying to find the most optimal(or shortest) possible route given a graph. Apart from the obvious uses of TSP in vehicle routing and supply chain operations, it can be used in a plethora of other applications such as circuit board design.

In the operation research field, this problem is modeled mathematically as an integer program. While this methodology is highly effective in defining the problem and finding the proven optimal solution, this problem is an NP-hard problem. As the number of nodes increase, the solve times exponentially increase. Thus, work has been done to utilize machine learning techniques to predict optimal (or near optimal) tours for a given network.

Dataset/file format details#

The dataset can be ‘separated’ into 2 parts:

  1. Graph information

    • Customer Coordinate (x,y)

  2. Tour Information

    • Optimal sequence of visiting customers

The data can be found at here. The file contains datasets for TSP10, TSP20, TSP30, TSP50 and TSP100 where TSPn refers to a network of n Customers or nodes. While the data is pre-split into train, validation and test sets, re-separate them based on would be most suitable for your approach.

Additional data can also be easily generated by randomly generating customers in a unit square space. Optimal solutions can be obtained by using current state of the art solvers such as Concorde.

Sample Data#

Each row below corresponds to a different 10 Customer TSP dataset.

0.5793660595972899 0.7777057349094604 0.14206012883152752 0.5196099231652277 0.08009219519202548 0.5250854558725679 0.34332858289616386 0.46651958933311977 0.3799259605996673 0.5470438359404395 0.3381808121172898 0.18992688075754138 0.02571284347570113 0.11841499868977667 0.17112675068611394 0.46252204236151284 0.10827864026615552 0.9470435882364374 0.6663104428835221 0.5635614831230007 output 1 9 3 2 8 7 6 4 5 10 1
0.6520041778999477 0.11662154136245562 0.7566779021000846 0.5200996613002011 0.9813799088415714 0.863119616301226 0.7570818101353403 0.866623336974907 0.01192980586814163 0.5288073981266626 0.9321365982140203 0.014700602164769871 0.9638074448993977 0.16976025876502476 0.9946398204773619 0.27434473929539904 0.57936977010031 0.49700217360612853 0.8439194793049354 0.13964225615249104 output 1 5 9 2 4 3 8 7 6 10 1 

For an n Customer TSP, you will expect 3n+2 columns. The first 2n columns correspond to the coordiates of the customers (x1 y1 x2 y2 … xn yn) followed by output. The remaining n+1 columns indicate the optimal tour starting and ending at customer 1. The starting point is can by any customer, but Customer 1 is used here without the loss of generality.

The cost of each route can be computed by taking the sum of Euclidean distances between sucessive customers in the tour.

Hints and possible approaches#

  1. GCNN

    • The network can be defined as a graph with the links used to present potential arcs between customers

  2. RL

    • We can build a model that learns from itself to generate optimal routes for a given network

    • Advantage: RL would not need the optimal solution for training the model

Note

You should think critically about the question how you want to set up your train/validation/test splits to simulate the challenge.

  • How do you represent your problem in the model?

  • How would you know if your model is helpful?

  • How do you extract the optimal tour from the model output?

Suggested challenges#

  • Consider TSP variants such as mTSP(multiple salesman) or CVRP(multiple vehicles with a capacity limit)

  • What if the optimal route is based on a different metric other than Euclidean distance?