Christmas routes
Last Christmas I was told about this quite interesting problem. It prays like this: Santa needs to deliver presents to all kids in a bunch of cities. To do so, he needs to design routes for his 8 reindeers to visit each city almost once.
To make it easy, it is assumed that the reindeers just need to get to the city and presents are distributed instantly (yes, not realistic, but we are only interested in maths and coding). The objective is to find the minimum speed at which they have to travel to visit all the cities, assuming a constant speed, and a maximum time of 10 hours. Hence, assuming that 30.000 km are travelled by one reindeer, its speed would be 3000 kmph.
We are provided with a matrix of distances, i.e. each position $(i,j)$ in the matrix is the distance from city $i$ to city $j$, and of course there is some redundancy in the matrix since almost half of the numbers are repeated.
I have solved this in a Python notebook to explain while coding. I have no clue whether this is the best solution, but at least is quite fast to compute and as you'll see I have improved after trying different ideas 🙂
I hope you enjoy this problem (please like in such a case!) and I wish you a merry Christmas!
PS: if anyone finds a better solution, please let me know!
import pandas as pd, numpy as np
import matplotlib.pyplot as plt
import networkx as nx
%matplotlib inline
plt.rcParams['figure.figsize'] = (20, 20)
Load data and visualize some rows
graph = pd.read_csv('q1_data.csv', index_col=0)
graph.head(3)
Mexico City | Sofia | Caracas | Paris | Oslo | Panama | Riga | Hanoi | Athens | Luxembourg | ... | Santo Domingo | London | Andorra la Vella | Yerevan | Madrid | Amsterdam | Copenhagen | Lima | Vilnius | Warsaw | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Mexico City | 0.000000 | 10943.249595 | 3597.018894 | 9196.121646 | 9195.632410 | 2407.509542 | 10027.336535 | 14759.958516 | 11280.399674 | 9417.091543 | ... | 3062.375170 | 8928.820398 | 9393.135021 | 12392.218540 | 9062.613574 | 9215.541204 | 9509.162920 | 4255.032976 | 10240.684649 | 10180.162850 |
Sofia | 10943.249595 | 0.000000 | 9235.562797 | 1757.561532 | 2096.229362 | 10360.411414 | 1585.325302 | 7847.187565 | 525.673837 | 1526.175436 | ... | 8887.495567 | 2014.605290 | 1779.664110 | 1783.484275 | 2253.345327 | 1744.422812 | 1637.012112 | 11753.661475 | 1340.976711 | 1073.949136 |
Caracas | 3597.018894 | 9235.562797 | 0.000000 | 7616.652180 | 8311.079762 | 1395.108255 | 9091.109074 | 16421.894156 | 9343.995782 | 7899.810067 | ... | 951.601568 | 7496.409509 | 7456.263338 | 11009.217933 | 6992.114712 | 7851.614143 | 8385.224076 | 2744.751040 | 9195.400241 | 8944.692095 |
3 rows × 73 columns
Here we see, for example, that from Sofia to Paris there are 1757 km.
Plot graph
G = nx.from_numpy_matrix(graph.values)
nx.draw(G, with_labels=True, font_size=20, edge_color='g')
labels = nx.get_edge_attributes(G,'weight')
plt.draw()
In this case, the graph is not really useful since points are in a 3D coodinate system but we are projecting them onto a 2D space. But it’s a nice a picture, isn’t it? You can use this code to represent a graph in the future. Notice that the nodes (cities) are identified by numbers, but I’m not going to conjecture anything about the graph, because of the projection issue.
Function to find shortest node from current
A really simple policy to decide next destination for a reindeer is as follows: given its position, and the visited cities, we will decide the next move to be the non-visited closest city based on graph distances.
def find_shortest_node(cur, visited, graph):
distances = graph.sort_values()
i = 0
while distances.index[i] in visited:
i += 1
return distances.index[i], graph.loc[distances.index[i]]
Distance-based method
Based on previous function, let’s iterate through reindeers and build the first end-to-end algorithm.
city2node = {i: ind for ind,i in enumerate(graph.columns)}
node2city = {ind:i for ind,i in enumerate(graph.columns)}
current = {i: city2node['North Pole'] for i in range(8)}
visited = {city2node['North Pole']}
paths = {i : list() for i in range(8)}
distances = [0] * 8
graph.columns = range(len(graph.columns))
graph.index = range(len(graph.columns))
while len(visited) < len(city2node):
min_travelled = 1000000000
for i in range(8):
next_node_i, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], :])
if distances[i] + dist < min_travelled:
min_travelled = distances[i] + dist
dist_agent = dist
agent = i
next_node = next_node_i
paths[agent].append(node2city[next_node])
distances[agent] += dist_agent
visited.update({next_node})
current[agent] = next_node
print('Speed: {} kmph'.format(max(distances)/10))
for i in paths:
print(' -> '.join(paths[i]))
Speed: 2350.3072558947 kmph
Oslo -> Amsterdam -> Brussels -> Luxembourg -> Bern -> Madrid -> Ottawa -> Mexico City -> Santiago
Tallinn -> Kiev -> Yerevan -> Baku -> Tehran -> Ashgabat -> Tashkent -> Kabul -> Hanoi -> Taipei
Stockholm -> Warsaw -> Budapest -> Sarajevo -> Valletta -> Pretoria -> Montevideo
Riga -> Belgrade -> Skopje -> Athens -> Colombo -> Jakarta -> Canberra -> Wellington
Moscow -> Ankara -> Nicosia -> Damascus -> New Delhi -> Bangkok -> Manila
Copenhagen -> Berlin -> Prague -> Ljubljana -> Zagreb -> Rome -> Santo Domingo -> Caracas -> Bogota -> Quito -> Lima -> Sucre -> Asuncion -> Buenos Aires
Vilnius -> Bucharest -> Sofia -> Cairo -> Beijing -> Seoul -> Tokyo
Dublin -> London -> Paris -> Andorra la Vella -> Lisbon -> Washington -> Tegucigalpa -> San Jose -> Panama -> Brasilia
Notice that we iterate along all reindeers, select the one that travels the less, and start again until all cities are visited.
Zone-based method
Analyzing the previous results, we see that there is a reindeer starting in Stockholm and finishing in Montevideo. This does not make sense: it is better that reindeers specialize somehow in a zone rather than travelling along the whole globe. An interesting idea could be the following: divide the map in 8 zones and make each reindeer responsible of one zone.
Divide map in zones
This is an easy exercise of clustering. When I started experimenting my idea was to send each reindeer to the centroid of each cluster. Therefore, we cannot use the well-known K-means because the centroid needs to be a city. A good alternative for this is K-medoids. Moreover, the implementation in Python of K-medoids allows feeding the algorithm with a distance matrix, so our data is ready to use. Using SKlearn K-means could be an interesting line of future work.
from kmedoids import kMedoids
M, C = kMedoids(graph.values, k=8)
for i in C:
print("{}: {}".format(i,list(map(lambda x: node2city[x],C[i]))))
0: ['Oslo', 'Tallinn', 'North Pole', 'Stockholm', 'Copenhagen']
1: ['Hanoi', 'Bangkok', 'Manila', 'Seoul', 'Beijing', 'Tokyo', 'Jakarta', 'Wellington', 'Canberra', 'Taipei']
2: ['Ljubljana', 'Zagreb', 'Valletta', 'Rome', 'Andorra la Vella']
3: ['Paris', 'Luxembourg', 'Lisbon', 'Berlin', 'Brussels', 'Dublin', 'Bern', 'Prague', 'London', 'Madrid', 'Amsterdam']
4: ['Kabul', 'Tashkent', 'Tehran', 'Colombo', 'New Delhi', 'Ashgabat', 'Baku', 'Yerevan']
5: ['Mexico City', 'Caracas', 'Panama', 'Montevideo', 'Tegucigalpa', 'Santiago', 'Buenos Aires', 'Brasilia', 'Asuncion', 'Sucre', 'San Jose', 'Ottawa', 'Quito', 'Bogota', 'Washington', 'Santo Domingo', 'Lima']
6: ['Sofia', 'Athens', 'Damascus', 'Nicosia', 'Ankara', 'Skopje', 'Budapest', 'Sarajevo', 'Pretoria', 'Bucharest', 'Belgrade', 'Cairo']
7: ['Riga', 'Kiev', 'Moscow', 'Vilnius', 'Warsaw']
####Â Distribute reindeers per zones
In the first iteration we send each reindeer to one different zone. Note that we need to reload variables
M, C = kMedoids(graph.values, k=8)
current = {i: city2node['North Pole'] for i in range(8)}
visited = {city2node['North Pole']}
paths = {i : list() for i in range(8)}
distances = [0] * 8
for i in range(8):
next_node, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], C[i]])
paths[i].append(node2city[next_node])
distances[i] += dist
visited.update({next_node})
current[i] = next_node
while len(visited) < len(city2node):
min_travelled = 1000000000
for i in range(8):
next_node_i, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], :])
if distances[i] + dist < min_travelled:
min_travelled = distances[i] + dist
dist_agent = dist
agent = i
next_node = next_node_i
paths[agent].append(node2city[next_node])
distances[agent] += dist_agent
visited.update({next_node})
current[agent] = next_node
print('Speed: {} kmph'.format(max(distances)/10))
for i in paths:
print(' -> '.join(paths[i]))
Speed: 2254.7100953912 kmph
Oslo -> Stockholm -> Tallinn -> Riga -> Vilnius -> Warsaw -> Berlin -> Prague -> Budapest -> Zagreb -> Ljubljana -> Sarajevo -> Belgrade -> Skopje -> Sofia -> Bucharest -> Athens -> Ankara -> Kiev -> Moscow -> Copenhagen -> Amsterdam -> London -> Dublin
Baku -> Yerevan -> Tehran -> Ashgabat -> Damascus -> Nicosia -> Cairo -> Valletta -> Rome -> Bern -> Luxembourg -> Brussels -> Paris -> Andorra la Vella -> Madrid -> Lisbon
Sucre -> Asuncion -> Buenos Aires -> Santiago
Beijing -> Seoul -> Tokyo -> Taipei -> Manila -> Hanoi -> Pretoria
Brasilia -> Montevideo -> Lima
Canberra -> Wellington
Ottawa -> Washington -> Santo Domingo -> Caracas -> Bogota -> Quito -> Panama -> San Jose -> Tegucigalpa -> Mexico City
Tashkent -> Kabul -> New Delhi -> Colombo -> Bangkok -> Jakarta
We have sent the reindeers one to each zone, but after that is possible that they move back to the closest zone to the North Pole, so eventually having the same problem again. One interesting idea is to force the reindeers to move along its assigned zone until completing all nodes in it.
M, C = kMedoids(graph.values, k=8)
current = {i: city2node['North Pole'] for i in range(8)}
visited = {city2node['North Pole']}
paths = {i : list() for i in range(8)}
distances = [0] * 8
for i in range(8):
next_node, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], C[i]])
paths[i].append(node2city[next_node])
distances[i] += dist
visited.update({next_node})
current[i] = next_node
while len(visited) < len(city2node):
min_travelled = 1000000000
for i in range(8):
if all([v in visited for v in C[i]]):
next_node_i, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], :])
else: # finish your zone before helping others
next_node_i, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], C[i]])
if distances[i] + dist < min_travelled:
min_travelled = distances[i] + dist
dist_agent = dist
agent = i
next_node = next_node_i
paths[agent].append(node2city[next_node])
distances[agent] += dist_agent
visited.update({next_node})
current[agent] = next_node
print('Speed: {} kmph'.format(max(distances)/10))
for i in paths:
print(' -> '.join(paths[i]))
Speed: 1934.0230497204 kmph
Ottawa -> Washington -> Santo Domingo -> Caracas -> Bogota -> Quito -> Brasilia
Beijing -> Seoul -> Tokyo -> Taipei -> Manila -> Hanoi -> Bangkok -> Jakarta -> Colombo
Lima -> Sucre -> Asuncion -> Buenos Aires -> Montevideo
Kiev -> Bucharest -> Sofia -> Skopje -> Belgrade -> Sarajevo -> Zagreb -> Ljubljana -> Budapest -> Prague -> Bern -> Andorra la Vella -> Madrid -> Lisbon -> Rome -> Valletta
Canberra -> Wellington
Mexico City -> Tegucigalpa -> San Jose -> Panama -> Santiago
Tashkent -> Kabul -> Ashgabat -> Tehran -> Baku -> Yerevan -> Ankara -> Nicosia -> Damascus -> Cairo -> Pretoria
Oslo -> Stockholm -> Tallinn -> Riga -> Vilnius -> Warsaw -> Berlin -> Copenhagen -> Amsterdam -> Brussels -> Luxembourg -> Paris -> London -> Dublin -> Moscow -> Athens -> New Delhi
Initialization matters
It is worth pointing out that the clustering algorithm is not deterministic. It depends on its initialization, and in turn, it has an impact on the performance of next steps:
- This can be observed if we execute the last block several times: different results are obtained.
- Also, we can see that in some cases the algorithm crashes. This occurs when the North Pole is assigned a cluster itself. Since we are sending a reindeer to this cluster, but the only single node in it has been visited, the algorithm as it is does not work.
So hereafter there is the last version of the algorithm taking into account this, such that:
- North Pole is not used for clustering.
- 200 iterations of the algorithm are run, and the output is the best of those operations, where the difference among all those iterations is the cluster algorithm initialization.
minval = 4000
graph = pd.read_csv('q1_data.csv', index_col=0)
indNorthPole = [ind for ind,i in enumerate(graph.columns) if i == 'North Pole'][0]
cols = list(range(indNorthPole)) + list(range(indNorthPole+1, len(graph))) + [indNorthPole]
graph = graph.iloc[cols,cols]
city2node = {i: ind for ind,i in enumerate(graph.columns)}
node2city = {ind:i for ind,i in enumerate(graph.columns)}
graph.columns = range(len(graph))
graph.index = range(len(graph))
for it in range(1,200):
M, C = kMedoids(graph.iloc[:-1,:-1].values, k=8)
current = {i: city2node['North Pole'] for i in range(8)}
visited = {city2node['North Pole']}
paths = {i : list() for i in range(8)}
distances = [0] * 8
for i in range(8):
next_node, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], C[i]])
paths[i].append(node2city[next_node])
distances[i] += dist
visited.update({next_node})
current[i] = next_node
while len(visited) < len(city2node):
min_travelled = 1000000000
for i in range(8):
if all([v in visited for v in C[i]]):
next_node_i, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], :])
else: # finish your zone before helping others
next_node_i, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], C[i]])
if distances[i] + dist < min_travelled:
min_travelled = distances[i] + dist
dist_agent = dist
agent = i
next_node = next_node_i
paths[agent].append(node2city[next_node])
distances[agent] += dist_agent
visited.update({next_node})
current[agent] = next_node
tmp = max(distances)/10
if tmp < minval:
minval = tmp
minpaths = paths
print(tmp)
print('Speed: {} kmph'.format(minval))
for i in minpaths:
print(' -> '.join(minpaths[i]))
2372.39329872
2075.33687
2061.09365841
1854.00817298
1776.55611721
1720.09690112
1665.42562055
Speed: 1665.4256205544 kmph
Dublin -> London -> Brussels -> Amsterdam -> Luxembourg -> Paris -> Bern -> Andorra la Vella -> Madrid -> Lisbon -> Brasilia
Oslo -> Stockholm -> Tallinn -> Riga -> Vilnius -> Warsaw -> Berlin -> Copenhagen -> Kiev -> Moscow -> Ankara -> New Delhi -> Colombo
Prague -> Budapest -> Zagreb -> Ljubljana -> Sarajevo -> Belgrade -> Skopje -> Sofia -> Bucharest -> Athens -> Valletta -> Rome -> Pretoria
Tashkent -> Kabul -> Ashgabat -> Tehran -> Baku -> Yerevan -> Damascus -> Nicosia -> Cairo
Canberra -> Wellington
Santo Domingo -> Caracas -> Bogota -> Quito -> Lima -> Sucre -> Asuncion -> Buenos Aires -> Montevideo
Beijing -> Seoul -> Tokyo -> Taipei -> Manila -> Hanoi -> Bangkok -> Jakarta
Ottawa -> Washington -> Mexico City -> Tegucigalpa -> San Jose -> Panama -> Santiago
And we have got the best solution overall, forcing each reindeer to travel at most at 1625 kmph.
Check the solution
Finally, just check the speed required by each reindeer by summing up the distance moved and dividing by the 10 hours they have to distribute the presents.
for i in minpaths:
distance = 0
current = city2node['North Pole']
for j in minpaths[i]:
distance += graph.loc[current, city2node[j]]
current = city2node[j]
print(distance/10)
1153.35693483
1720.09690112
1625.70517399
1480.46545108
1600.72616671
1205.2264646
1582.87191227
1476.92909603