Learning implicit multiple time windows in the traveling salesman problem

Abstract

Classically, researchers working in vehicle routing problems (VRPs) assume that the structure of the problem is known (i.e., objective function, constraints, parameters). However, recent studies have highlighted the gap between the routes offered by classical optimization algorithms and the routes followed by experienced drivers. As a result, researchers have turned their attention towards the acquisition and inclusion of drivers’ knowledge to learn the order in which each customer is going to be served by the driver. In this study, we describe and solve a new problem called the multiple time window learning problem. In contrast to other VRP variants, the goal is to learn the time windows associated with each customer. Our approaches are based on the observation and exploitation of historical data with a new algorithm called the recall heuristic, and the exploration of new information based on the multi-armed bandit problem. Computational results based on real data extracted from a traffic sign dataset from the city of Montreal showed that our approaches can learn time windows and, as a result, propose routes similar to those created by experienced drivers, while still minimizing the routing costs.

Publication
To appear in Transportation Research Procedia

Related