Solving multi-echelon inventory problems with heuristic-guided deep reinforcement learning and centralized control (JOPT)


Multi-echelon inventory models aim to minimize the system-wide total cost of a multi-stage supply chain by applying a proper ordering policy at each stage. In a practical inventory system where backlog costs can be incurred at multiple stages, this problem cannot be solved analytically and is intractable for traditional optimization methods. To alleviate the curse of dimensionality, we apply and compare three deep reinforcement learning (DRL) algorithms, namely Deep Q-Network, Advantage Actor-Critic, and Twin Delayed Deep Deterministic Policy Gradient, to determine the inventory policy efficiently. We consider a serial supply chain as in the beer game, a classic multi-echelon inventory problem, and extend the application of DRL to the centralized decision-making setting, which is more complex because of its significantly larger state and action spaces. We also propose a heuristic-guided exploration mechanism that improves training efficiency by incorporating known heuristics into the exploration process of the DRL algorithms. The experiments show that, in both decentralized and centralized settings, the DRL agents learned policies with significant cost savings compared to benchmark heuristics.
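
The abstract does not spell out how the heuristic-guided exploration works, so the following is only a minimal sketch of one possible reading, for a value-based agent with discrete order quantities. It assumes an epsilon-greedy rule in which an exploratory step is sometimes replaced by the order suggested by a base-stock (order-up-to) heuristic. The function names, the `heuristic_prob` parameter, and the choice of a base-stock heuristic are all illustrative assumptions, not details taken from the talk.

```python
import random


def base_stock_order(inventory_position, base_stock_level):
    """Order-up-to heuristic (assumed here): raise the inventory position
    to the target level, never ordering a negative quantity."""
    return max(0, base_stock_level - inventory_position)


def select_action(q_values, inventory_position, base_stock_level,
                  epsilon, heuristic_prob, rng=random):
    """Epsilon-greedy action selection with heuristic-guided exploration.

    q_values[a] is the estimated value of ordering `a` units (hypothetical
    discrete action set 0..len(q_values)-1).
    """
    n_actions = len(q_values)
    if rng.random() < epsilon:
        # Exploratory step.
        if rng.random() < heuristic_prob:
            # Guided exploration: follow the heuristic, clipped to the action set.
            order = base_stock_order(inventory_position, base_stock_level)
            return min(order, n_actions - 1)
        # Plain exploration: uniform random order quantity.
        return rng.randrange(n_actions)
    # Exploitation: greedy action under the current value estimates.
    return max(range(n_actions), key=lambda a: q_values[a])


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3, 0.2]
    # With epsilon = 0 the agent always exploits its value estimates.
    print(select_action(q, inventory_position=3, base_stock_level=10,
                        epsilon=0.0, heuristic_prob=1.0))
```

In a sketch like this, `epsilon` and `heuristic_prob` would typically be decayed during training so that the agent relies on the heuristic early on and on its own estimates later; whether the talk's mechanism does this is not stated in the abstract.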

May 16, 2022 4:20 PM — May 16, 2025 4:45 PM
HEC Montréal