A cooperative digital twin–multi-agent reinforcement learning for circular supply chains: balanced control across production, logistics, and sustainability
Publication date:
2026-04-24
Abstract:
Coordination of production, inventory, logistics, and recovery decisions in circular supply chains (CSCs) remains challenging due to demand uncertainty, transport variability, and competing objectives across service, cost, and environmental dimensions. Controlled quantification of trade-offs and information value under matched experimental conditions has rarely been reported in prior digital twin (DT)-enabled control studies. To address this gap, an integrated methodological framework is proposed by coupling a simulation-based DT with cooperative multi-agent reinforcement learning (MARL). Specifically, a five-agent controller was trained to coordinate planning, inventory, logistics, expediting, and recycling decisions under a shared multi-objective reward within a DT–MARL formulation. The primary contribution is methodological: a controlled evaluation protocol with matched seeds, fixed horizons, and 95% confidence intervals is introduced to enable reproducible comparison across baselines, disruption scenarios, and sector archetypes. Moreover, a complementary Value-of-Data (VoD) protocol was used to isolate the marginal impact of cross-functional information integration on controller performance. In benchmark experiments, balanced improvements in lead time and on-time-in-full (OTIF) delivery were observed relative to a No-Op (no-action) baseline, while policy stability was maintained under transport, demand, and energy shocks. Furthermore, transferability was demonstrated across four archetypal operating regimes without retuning. Finally, VoD analysis indicated that integrated observation regimes shifted operating points toward improved resource efficiency.
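The matched-seed evaluation protocol mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the toy `run_episode` rollout stands in for the digital twin, `toy_controller` stands in for the trained five-agent policy, and the horizon, seed count, and cost model are hypothetical values, not those of the paper. The interval uses a normal approximation, whereas the paper does not state how its 95% confidence intervals were computed.

```python
import random
import statistics
from typing import Callable, List, Tuple

HORIZON = 50  # fixed episode length shared by all controllers (assumed value)


def run_episode(policy: Callable[[float], float], seed: int,
                horizon: int = HORIZON) -> float:
    """Toy stand-in for a digital-twin rollout; returns mean lead time."""
    rng = random.Random(seed)  # same seed gives the same disturbance sequence
    total = 0.0
    for _ in range(horizon):
        disturbance = rng.gauss(0.0, 1.0)
        # Lead time = nominal value + disturbance - reduction achieved by the policy.
        total += 10.0 + disturbance - policy(disturbance)
    return total / horizon


def no_op(disturbance: float) -> float:
    """No-Op baseline: takes no action, so it removes no lead time."""
    return 0.0


def toy_controller(disturbance: float) -> float:
    """Hypothetical controller: fixed gain plus partial offset of positive shocks."""
    return 0.5 + 0.5 * max(disturbance, 0.0)


def paired_ci(diffs: List[float], level: float = 0.95) -> Tuple[float, float, float]:
    """Mean of paired differences with a normal-approximation confidence interval."""
    n = len(diffs)
    mean = statistics.fmean(diffs)
    sem = statistics.stdev(diffs) / n ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean, mean - z * sem, mean + z * sem


# Matched seeds: both controllers face identical disturbances in each pair.
seeds = list(range(30))
diffs = [run_episode(toy_controller, s) - run_episode(no_op, s) for s in seeds]
mean, lo, hi = paired_ci(diffs)
print(f"lead-time change vs No-Op: {mean:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Because each pair shares a seed, the disturbance noise cancels in the difference, which is why a matched design tends to give tighter intervals than comparing independently seeded runs.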
Keywords:
Digital twin
Multi-agent reinforcement learning
Circular supply chain
Data-driven decision making
Simulation-based optimization


