We consider a multi-echelon inventory management problem whose objective is to minimize the system-wide total of backorder costs and inventory holding costs. When backorder costs are incurred at more than one stage, the optimal policy is unknown even for a simple serial system. We apply and compare three state-of-the-art deep reinforcement learning (DRL) algorithms: Dueling Double Deep Q-Network, Advantage Actor-Critic, and Twin Delayed Deep Deterministic Policy Gradient. We also propose a mechanism, Heuristic-Guided Exploration, to improve training efficiency by incorporating known heuristics into the exploration process of the DRL algorithms.
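
To make the two ingredients above concrete, the following is a minimal sketch, not the mechanism as specified in this work. It assumes a per-period cost that sums holding and backorder costs over stages, a base-stock rule as the "known heuristic", and an epsilon-greedy scheme in which an exploratory step follows the heuristic with some probability. All names and parameters (`period_cost`, `base_stock_order`, `select_action`, `guide_prob`) are hypothetical and introduced only for illustration.

```python
import random


def period_cost(net_inventories, holding_costs, backorder_costs):
    """One-period system-wide cost, summed over stages.

    Assumption: positive net inventory is on-hand stock (charged the
    holding cost); negative net inventory is backorders (charged the
    backorder cost). Backorder costs may be nonzero at several stages.
    """
    total = 0.0
    for net, h, b in zip(net_inventories, holding_costs, backorder_costs):
        total += h * max(net, 0) + b * max(-net, 0)
    return total


def base_stock_order(inventory_position, base_stock_level, max_order):
    """Hypothetical heuristic: order up to a base-stock level,
    clipped to the allowed order range [0, max_order]."""
    return max(0, min(max_order, base_stock_level - inventory_position))


def select_action(greedy_action, heuristic_action, n_actions,
                  epsilon, guide_prob, rng):
    """Epsilon-greedy with a heuristic-guided exploratory step.

    With probability 1 - epsilon, exploit the learned policy.
    Otherwise explore: follow the heuristic with probability
    guide_prob, else draw a uniformly random discrete action.
    """
    if rng.random() >= epsilon:
        return greedy_action
    if rng.random() < guide_prob:
        return heuristic_action
    return rng.randrange(n_actions)


# Usage: two stages, stage 1 holds 5 units, stage 2 is 3 units backordered.
rng = random.Random(0)
cost = period_cost([5, -3], holding_costs=[1.0, 2.0], backorder_costs=[10.0, 4.0])
hint = base_stock_order(inventory_position=4, base_stock_level=10, max_order=20)
action = select_action(greedy_action=2, heuristic_action=hint, n_actions=21,
                       epsilon=0.1, guide_prob=0.5, rng=rng)
```

This discrete-action form would fit a value-based learner such as a Q-network; for a continuous-action method, the analogous step would presumably mix the heuristic's order quantity into the exploration noise, which is likewise an assumption rather than something stated above.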