Multi-echelon inventory models aim to minimize the system-wide cost of a supply chain by applying an appropriate ordering policy at each stage. When backlog costs can be incurred at multiple stages, the problem becomes intractable for traditional optimization methods. We therefore apply and compare three efficient deep reinforcement learning (DRL) algorithms to determine the inventory policy. We further extend the problem to a centralized decision-making setting with significantly larger state and action spaces, and we propose a heuristic-guided exploration mechanism to improve training efficiency. Experiments show that the policies learned by the DRL agents achieve significant cost savings over benchmark heuristics.
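To make the heuristic-guided exploration idea concrete, the following is a minimal sketch, not the paper's actual mechanism: with some probability the agent takes the action suggested by a simple heuristic instead of its own, steering early training toward sensible regions of the large action space. The base-stock policy and all function names here are hypothetical stand-ins, since the abstract does not specify the benchmark heuristics or the mixing scheme.

```python
import random


def base_stock_action(inventory_position, base_stock_level):
    """Classic base-stock heuristic: order up to a fixed target level.

    Hypothetical example heuristic; the paper's actual benchmark
    heuristics are not specified in the abstract.
    """
    return max(0, base_stock_level - inventory_position)


def guided_action(policy_action, inventory_position, base_stock_level,
                  guide_prob=0.3, rng=random):
    """Heuristic-guided exploration (illustrative sketch only).

    With probability `guide_prob`, replace the DRL agent's proposed
    order quantity with the heuristic's, so exploration is biased
    toward reasonable ordering decisions early in training.
    """
    if rng.random() < guide_prob:
        return base_stock_action(inventory_position, base_stock_level)
    return policy_action
```

In practice `guide_prob` would typically be annealed toward zero over training, so the agent relies on the heuristic less as its own policy improves.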