Christmas routes

11 minute read

Last Christmas I was told about this quite interesting problem. It prays like this: Santa needs to deliver presents to all kids in a bunch of cities. To do so, he needs to design routes for his 8 reindeers to visit each city almost once.

To make it easy, it is assumed that the reindeers just need to get to the city and presents are distributed instantly (yes, not realistic, but we are only interested in maths and coding). The objective is to find the minimum speed at which they have to travel to visit all the cities, assuming a constant speed, and a maximum time of 10 hours. Hence, assuming that 30.000 km are travelled by one reindeer, its speed would be 3000 kmph.

We are provided with a matrix of distances, i.e. each position $(i,j)$ in the matrix is the distance from city $i$ to city $j$, and of course there is some redundancy in the matrix since almost half of the numbers are repeated.

I have solved this in a Python notebook to explain while coding. I have no clue whether this is the best solution, but at least is quite fast to compute and as you'll see I have improved after trying different ideas 🙂

I hope you enjoy this problem (please like in such a case!) and I wish you a merry Christmas!

PS: if anyone finds a better solution, please let me know!

import pandas as pd, numpy as np
import matplotlib.pyplot as plt
import networkx as nx
%matplotlib inline
plt.rcParams['figure.figsize'] = (20, 20)

Load data and visualize some rows

graph = pd.read_csv('q1_data.csv', index_col=0)
graph.head(3)

	Mexico City	Sofia	Caracas	Paris	Oslo	Panama	Riga	Hanoi	Athens	Luxembourg	...	Santo Domingo	London	Andorra la Vella	Yerevan	Madrid	Amsterdam	Copenhagen	Lima	Vilnius	Warsaw
Mexico City	0.000000	10943.249595	3597.018894	9196.121646	9195.632410	2407.509542	10027.336535	14759.958516	11280.399674	9417.091543	...	3062.375170	8928.820398	9393.135021	12392.218540	9062.613574	9215.541204	9509.162920	4255.032976	10240.684649	10180.162850
Sofia	10943.249595	0.000000	9235.562797	1757.561532	2096.229362	10360.411414	1585.325302	7847.187565	525.673837	1526.175436	...	8887.495567	2014.605290	1779.664110	1783.484275	2253.345327	1744.422812	1637.012112	11753.661475	1340.976711	1073.949136
Caracas	3597.018894	9235.562797	0.000000	7616.652180	8311.079762	1395.108255	9091.109074	16421.894156	9343.995782	7899.810067	...	951.601568	7496.409509	7456.263338	11009.217933	6992.114712	7851.614143	8385.224076	2744.751040	9195.400241	8944.692095

3 rows × 73 columns

Here we see, for example, that from Sofia to Paris there are 1757 km.

Plot graph

G = nx.from_numpy_matrix(graph.values)
nx.draw(G, with_labels=True, font_size=20, edge_color='g')
labels = nx.get_edge_attributes(G,'weight')
plt.draw()

png

In this case, the graph is not really useful since points are in a 3D coodinate system but we are projecting them onto a 2D space. But it’s a nice a picture, isn’t it? You can use this code to represent a graph in the future. Notice that the nodes (cities) are identified by numbers, but I’m not going to conjecture anything about the graph, because of the projection issue.

Function to find shortest node from current

A really simple policy to decide next destination for a reindeer is as follows: given its position, and the visited cities, we will decide the next move to be the non-visited closest city based on graph distances.

def find_shortest_node(cur, visited, graph):
    distances = graph.sort_values()
    i = 0
    while distances.index[i] in visited:
        i += 1
    return distances.index[i], graph.loc[distances.index[i]]

Distance-based method

Based on previous function, let’s iterate through reindeers and build the first end-to-end algorithm.

city2node = {i: ind for ind,i in enumerate(graph.columns)}
node2city = {ind:i for ind,i in enumerate(graph.columns)}
current = {i: city2node['North Pole'] for i in range(8)}
visited = {city2node['North Pole']}
paths = {i : list() for i in range(8)}
distances = [0] * 8
graph.columns = range(len(graph.columns))
graph.index = range(len(graph.columns))

while len(visited) < len(city2node):
    min_travelled = 1000000000
    for i in range(8):
        next_node_i, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], :])
        if distances[i] + dist < min_travelled:
            min_travelled = distances[i] + dist
            dist_agent = dist
            agent = i
            next_node = next_node_i
    paths[agent].append(node2city[next_node])
    distances[agent] += dist_agent
    visited.update({next_node})
    current[agent] = next_node

print('Speed: {} kmph'.format(max(distances)/10))
for i in paths:
    print(' -> '.join(paths[i]))

Speed: 2350.3072558947 kmph
Oslo -> Amsterdam -> Brussels -> Luxembourg -> Bern -> Madrid -> Ottawa -> Mexico City -> Santiago
Tallinn -> Kiev -> Yerevan -> Baku -> Tehran -> Ashgabat -> Tashkent -> Kabul -> Hanoi -> Taipei
Stockholm -> Warsaw -> Budapest -> Sarajevo -> Valletta -> Pretoria -> Montevideo
Riga -> Belgrade -> Skopje -> Athens -> Colombo -> Jakarta -> Canberra -> Wellington
Moscow -> Ankara -> Nicosia -> Damascus -> New Delhi -> Bangkok -> Manila
Copenhagen -> Berlin -> Prague -> Ljubljana -> Zagreb -> Rome -> Santo Domingo -> Caracas -> Bogota -> Quito -> Lima -> Sucre -> Asuncion -> Buenos Aires
Vilnius -> Bucharest -> Sofia -> Cairo -> Beijing -> Seoul -> Tokyo
Dublin -> London -> Paris -> Andorra la Vella -> Lisbon -> Washington -> Tegucigalpa -> San Jose -> Panama -> Brasilia

Notice that we iterate along all reindeers, select the one that travels the less, and start again until all cities are visited.

Zone-based method

Analyzing the previous results, we see that there is a reindeer starting in Stockholm and finishing in Montevideo. This does not make sense: it is better that reindeers specialize somehow in a zone rather than travelling along the whole globe. An interesting idea could be the following: divide the map in 8 zones and make each reindeer responsible of one zone.

Divide map in zones

This is an easy exercise of clustering. When I started experimenting my idea was to send each reindeer to the centroid of each cluster. Therefore, we cannot use the well-known K-means because the centroid needs to be a city. A good alternative for this is K-medoids. Moreover, the implementation in Python of K-medoids allows feeding the algorithm with a distance matrix, so our data is ready to use. Using SKlearn K-means could be an interesting line of future work.

from kmedoids import kMedoids
M, C = kMedoids(graph.values, k=8)
for i in C:
    print("{}: {}".format(i,list(map(lambda x: node2city[x],C[i]))))

['Oslo', 'Tallinn', 'North Pole', 'Stockholm', 'Copenhagen']
['Hanoi', 'Bangkok', 'Manila', 'Seoul', 'Beijing', 'Tokyo', 'Jakarta', 'Wellington', 'Canberra', 'Taipei']
['Ljubljana', 'Zagreb', 'Valletta', 'Rome', 'Andorra la Vella']
['Paris', 'Luxembourg', 'Lisbon', 'Berlin', 'Brussels', 'Dublin', 'Bern', 'Prague', 'London', 'Madrid', 'Amsterdam']
['Kabul', 'Tashkent', 'Tehran', 'Colombo', 'New Delhi', 'Ashgabat', 'Baku', 'Yerevan']
['Mexico City', 'Caracas', 'Panama', 'Montevideo', 'Tegucigalpa', 'Santiago', 'Buenos Aires', 'Brasilia', 'Asuncion', 'Sucre', 'San Jose', 'Ottawa', 'Quito', 'Bogota', 'Washington', 'Santo Domingo', 'Lima']
['Sofia', 'Athens', 'Damascus', 'Nicosia', 'Ankara', 'Skopje', 'Budapest', 'Sarajevo', 'Pretoria', 'Bucharest', 'Belgrade', 'Cairo']
['Riga', 'Kiev', 'Moscow', 'Vilnius', 'Warsaw']

#### Distribute reindeers per zones

In the first iteration we send each reindeer to one different zone. Note that we need to reload variables

M, C = kMedoids(graph.values, k=8)
current = {i: city2node['North Pole'] for i in range(8)}
visited = {city2node['North Pole']}
paths = {i : list() for i in range(8)}
distances = [0] * 8

for i in range(8):
    next_node, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], C[i]])
    paths[i].append(node2city[next_node])
    distances[i] += dist
    visited.update({next_node})
    current[i] = next_node

while len(visited) < len(city2node):
    min_travelled = 1000000000
    for i in range(8):
        next_node_i, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], :])
        if distances[i] + dist < min_travelled:
            min_travelled = distances[i] + dist
            dist_agent = dist
            agent = i
            next_node = next_node_i
    paths[agent].append(node2city[next_node])
    distances[agent] += dist_agent
    visited.update({next_node})
    current[agent] = next_node

print('Speed: {} kmph'.format(max(distances)/10))
for i in paths:
    print(' -> '.join(paths[i]))

Speed: 2254.7100953912 kmph
Oslo -> Stockholm -> Tallinn -> Riga -> Vilnius -> Warsaw -> Berlin -> Prague -> Budapest -> Zagreb -> Ljubljana -> Sarajevo -> Belgrade -> Skopje -> Sofia -> Bucharest -> Athens -> Ankara -> Kiev -> Moscow -> Copenhagen -> Amsterdam -> London -> Dublin
Baku -> Yerevan -> Tehran -> Ashgabat -> Damascus -> Nicosia -> Cairo -> Valletta -> Rome -> Bern -> Luxembourg -> Brussels -> Paris -> Andorra la Vella -> Madrid -> Lisbon
Sucre -> Asuncion -> Buenos Aires -> Santiago
Beijing -> Seoul -> Tokyo -> Taipei -> Manila -> Hanoi -> Pretoria
Brasilia -> Montevideo -> Lima
Canberra -> Wellington
Ottawa -> Washington -> Santo Domingo -> Caracas -> Bogota -> Quito -> Panama -> San Jose -> Tegucigalpa -> Mexico City
Tashkent -> Kabul -> New Delhi -> Colombo -> Bangkok -> Jakarta

We have sent the reindeers one to each zone, but after that is possible that they move back to the closest zone to the North Pole, so eventually having the same problem again. One interesting idea is to force the reindeers to move along its assigned zone until completing all nodes in it.

M, C = kMedoids(graph.values, k=8)
current = {i: city2node['North Pole'] for i in range(8)}
visited = {city2node['North Pole']}
paths = {i : list() for i in range(8)}
distances = [0] * 8

for i in range(8):
    next_node, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], C[i]])
    paths[i].append(node2city[next_node])
    distances[i] += dist
    visited.update({next_node})
    current[i] = next_node

while len(visited) < len(city2node):
    min_travelled = 1000000000
    for i in range(8):
        if all([v in visited for v in C[i]]):
            next_node_i, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], :])
        else: # finish your zone before helping others
            next_node_i, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], C[i]])
        if distances[i] + dist < min_travelled:
            min_travelled = distances[i] + dist
            dist_agent = dist
            agent = i
            next_node = next_node_i
    paths[agent].append(node2city[next_node])
    distances[agent] += dist_agent
    visited.update({next_node})
    current[agent] = next_node

print('Speed: {} kmph'.format(max(distances)/10))
for i in paths:
    print(' -> '.join(paths[i]))

Speed: 1934.0230497204 kmph
Ottawa -> Washington -> Santo Domingo -> Caracas -> Bogota -> Quito -> Brasilia
Beijing -> Seoul -> Tokyo -> Taipei -> Manila -> Hanoi -> Bangkok -> Jakarta -> Colombo
Lima -> Sucre -> Asuncion -> Buenos Aires -> Montevideo
Kiev -> Bucharest -> Sofia -> Skopje -> Belgrade -> Sarajevo -> Zagreb -> Ljubljana -> Budapest -> Prague -> Bern -> Andorra la Vella -> Madrid -> Lisbon -> Rome -> Valletta
Canberra -> Wellington
Mexico City -> Tegucigalpa -> San Jose -> Panama -> Santiago
Tashkent -> Kabul -> Ashgabat -> Tehran -> Baku -> Yerevan -> Ankara -> Nicosia -> Damascus -> Cairo -> Pretoria
Oslo -> Stockholm -> Tallinn -> Riga -> Vilnius -> Warsaw -> Berlin -> Copenhagen -> Amsterdam -> Brussels -> Luxembourg -> Paris -> London -> Dublin -> Moscow -> Athens -> New Delhi

Initialization matters

It is worth pointing out that the clustering algorithm is not deterministic. It depends on its initialization, and in turn, it has an impact on the performance of next steps:

This can be observed if we execute the last block several times: different results are obtained.
Also, we can see that in some cases the algorithm crashes. This occurs when the North Pole is assigned a cluster itself. Since we are sending a reindeer to this cluster, but the only single node in it has been visited, the algorithm as it is does not work.

So hereafter there is the last version of the algorithm taking into account this, such that:

North Pole is not used for clustering.
200 iterations of the algorithm are run, and the output is the best of those operations, where the difference among all those iterations is the cluster algorithm initialization.

minval = 4000
graph = pd.read_csv('q1_data.csv', index_col=0)
indNorthPole = [ind for ind,i in enumerate(graph.columns) if i == 'North Pole'][0]
cols = list(range(indNorthPole)) + list(range(indNorthPole+1, len(graph))) + [indNorthPole]
graph = graph.iloc[cols,cols]
city2node = {i: ind for ind,i in enumerate(graph.columns)}
node2city = {ind:i for ind,i in enumerate(graph.columns)}
graph.columns = range(len(graph))
graph.index = range(len(graph))

for it in range(1,200):
    M, C = kMedoids(graph.iloc[:-1,:-1].values, k=8)
    current = {i: city2node['North Pole'] for i in range(8)}
    visited = {city2node['North Pole']}
    paths = {i : list() for i in range(8)}
    distances = [0] * 8

    for i in range(8):
        next_node, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], C[i]])
        paths[i].append(node2city[next_node])
        distances[i] += dist
        visited.update({next_node})
        current[i] = next_node

    while len(visited) < len(city2node):
        min_travelled = 1000000000
        for i in range(8):
            if all([v in visited for v in C[i]]):
                next_node_i, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], :])
            else: # finish your zone before helping others
                next_node_i, dist = find_shortest_node(current[i], visited, graph.iloc[current[i], C[i]])
            if distances[i] + dist < min_travelled:
                min_travelled = distances[i] + dist
                dist_agent = dist
                agent = i
                next_node = next_node_i
        paths[agent].append(node2city[next_node])
        distances[agent] += dist_agent
        visited.update({next_node})
        current[agent] = next_node
    tmp = max(distances)/10
    if tmp < minval:
        minval = tmp
        minpaths = paths
        print(tmp)

print('Speed: {} kmph'.format(minval))
for i in minpaths:
    print(' -> '.join(minpaths[i]))

2372.39329872
2075.33687
2061.09365841
1854.00817298
1776.55611721
1720.09690112
1665.42562055
Speed: 1665.4256205544 kmph
Dublin -> London -> Brussels -> Amsterdam -> Luxembourg -> Paris -> Bern -> Andorra la Vella -> Madrid -> Lisbon -> Brasilia
Oslo -> Stockholm -> Tallinn -> Riga -> Vilnius -> Warsaw -> Berlin -> Copenhagen -> Kiev -> Moscow -> Ankara -> New Delhi -> Colombo
Prague -> Budapest -> Zagreb -> Ljubljana -> Sarajevo -> Belgrade -> Skopje -> Sofia -> Bucharest -> Athens -> Valletta -> Rome -> Pretoria
Tashkent -> Kabul -> Ashgabat -> Tehran -> Baku -> Yerevan -> Damascus -> Nicosia -> Cairo
Canberra -> Wellington
Santo Domingo -> Caracas -> Bogota -> Quito -> Lima -> Sucre -> Asuncion -> Buenos Aires -> Montevideo
Beijing -> Seoul -> Tokyo -> Taipei -> Manila -> Hanoi -> Bangkok -> Jakarta
Ottawa -> Washington -> Mexico City -> Tegucigalpa -> San Jose -> Panama -> Santiago

And we have got the best solution overall, forcing each reindeer to travel at most at 1625 kmph.

Check the solution

Finally, just check the speed required by each reindeer by summing up the distance moved and dividing by the 10 hours they have to distribute the presents.

for i in minpaths:
    distance = 0
    current = city2node['North Pole']
    for j in minpaths[i]:
        distance += graph.loc[current, city2node[j]]
        current = city2node[j]
    print(distance/10)

Twitter Facebook Google+ LinkedIn

Christmas routes

Load data and visualize some rows

Plot graph

Function to find shortest node from current

Distance-based method

Zone-based method

Divide map in zones

Initialization matters

Check the solution

You May Also Enjoy

Auto wifi

Hack days and GraphQL

Journal Club

Use Python without installing on Windows