Multi-echelon inventory models aim to minimize the system-wide total cost of a multi-stage supply chain by applying an appropriate ordering policy at each stage. The optimal policy is known only under several strict assumptions about the cost structure. To handle scenarios where those assumptions are relaxed, we apply and compare three deep reinforcement learning (DRL) algorithms, namely Deep Q-Network, Advantage Actor-Critic, and Twin Delayed Deep Deterministic Policy Gradient, to determine the inventory policy efficiently. We consider a serial supply chain as in the beer game, a classic multi-echelon inventory problem, and extend the application of DRL to the centralized decision-making setting, which is more complex because of its significantly larger state and action spaces. The experiments show that, in both the decentralized and centralized settings, the DRL agents learn policies that achieve significant cost savings compared with benchmark heuristics.