International Journal of Advanced Multidisciplinary Research and Studies
Volume 6, Issue 2, 2026
Toward Autonomous Supply Chains: A Deep Reinforcement Learning Framework
Author(s): Bui Thi Kim Uyen, Bui Trong Hieu
DOI: https://doi.org/10.62225/2583049X.2026.6.2.5959
Abstract:
This paper proposes a comprehensive deep reinforcement learning (DRL) framework for autonomous decision-making in supply chain systems operating under uncertainty. Traditional supply chain optimization approaches typically decompose forecasting and operational decisions into separate modules, often relying on static assumptions and deterministic models. In contrast, we model the supply chain as a sequential decision-making problem formulated as a Markov Decision Process (MDP), in which inventory replenishment and transportation allocation are jointly optimized. An actor–critic architecture learns adaptive policies directly from interaction with a stochastic simulation environment characterized by random demand, variable lead times, and transportation constraints. Extensive computational experiments show that the proposed framework achieves significant reductions in total cost and stock-out frequency compared with classical base-stock and heuristic routing strategies, and that the learned policies adapt robustly to demand fluctuations. We formalize the theoretical foundations, propose scalable architectures, and identify open research challenges, including how to improve the trust and interpretability of learned policies. Together, the results demonstrate the feasibility of moving from rule-based planning systems toward self-learning, self-optimizing supply chain ecosystems.
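To make the MDP formulation concrete, the sketch below shows a toy single-product inventory environment with stochastic (Poisson) demand and a minimal tabular actor–critic trained by temporal-difference updates. This is an illustrative simplification, not the paper's implementation: the environment, cost parameters, and action set are hypothetical stand-ins, and the paper's simulator additionally models variable lead times and transportation allocation.

```python
import numpy as np

# Hypothetical toy instance of the MDP described in the abstract:
# state = on-hand inventory, action = order quantity, reward = -cost.
rng = np.random.default_rng(0)

MAX_INV = 20          # inventory capacity (states 0..MAX_INV)
ACTIONS = [0, 5, 10]  # candidate order quantities (illustrative)
HOLD_COST, STOCKOUT_COST = 1.0, 10.0

def step(inv, order):
    """One period: receive the order, observe demand, pay holding/stock-out costs."""
    inv = min(inv + order, MAX_INV)
    demand = rng.poisson(4)
    unmet = max(demand - inv, 0)
    inv = max(inv - demand, 0)
    cost = HOLD_COST * inv + STOCKOUT_COST * unmet
    return inv, -cost  # reward is negative cost

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Tabular actor (policy logits per state) and critic (state values).
theta = np.zeros((MAX_INV + 1, len(ACTIONS)))
V = np.zeros(MAX_INV + 1)
gamma, alpha_pi, alpha_v = 0.95, 0.05, 0.1

def train(episodes=200, horizon=50):
    avg_costs = []
    for _ in range(episodes):
        inv, total = 10, 0.0
        for _ in range(horizon):
            probs = softmax(theta[inv])
            a = rng.choice(len(ACTIONS), p=probs)
            nxt, r = step(inv, ACTIONS[a])
            total += -r
            # TD error drives both the critic and the actor update.
            delta = r + gamma * V[nxt] - V[inv]
            V[inv] += alpha_v * delta
            grad = -probs
            grad[a] += 1.0  # gradient of log-softmax w.r.t. the logits
            theta[inv] += alpha_pi * delta * grad
            inv = nxt
        avg_costs.append(total / horizon)
    return avg_costs

costs = train()
```

In the full framework the tabular actor and critic would be replaced by neural networks over a richer state (pipeline inventory, lead-time status, transport capacities), but the interaction loop and TD-driven policy update follow the same pattern.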
Keywords: Actor-Critic Architecture, Adaptive Decision-Making, Autonomous Supply Chains, Deep Reinforcement Learning, Markov Decision Process (MDP), Logistics Systems Optimization, Stochastic Demand, Transportation Integration
Pages: 301-312

