A cooperative digital twin–multi-agent reinforcement learning for circular supply chains: balanced control across production, logistics, and sustainability

Guzmán, Eduardo; Andrés, Beatriz; Torres-Polo, Marta

Identifiers:

URI: http://hdl.handle.net/20.500.12226/3361

ISSN: 0360-8352

Impact index:

JCR: Q1

Publication date:

2026-04-24

Abstract:

Coordination of production, inventory, logistics, and recovery decisions in circular supply chains (CSCs) remains challenging due to demand uncertainty, transport variability, and competing objectives across service, cost, and environmental dimensions. Controlled quantification of trade-offs and information value under matched experimental conditions has rarely been reported in prior digital twin (DT)-enabled control studies. To address this gap, an integrated methodological framework is proposed by coupling a simulation-based DT with cooperative multi-agent reinforcement learning (MARL). Specifically, a five-agent controller was trained to coordinate planning, inventory, logistics, expediting, and recycling decisions under a shared multi-objective reward within a DT–MARL formulation. The primary contribution is methodological: a controlled evaluation protocol with matched seeds, fixed horizons, and 95% confidence intervals is introduced to enable reproducible comparison across baselines, disruption scenarios, and sector archetypes. Moreover, a complementary Value-of-Data (VoD) protocol was used to isolate the marginal impact of cross-functional information integration on controller performance. In benchmark experiments, balanced improvements in lead time and on-time-in-full (OTIF) delivery were observed relative to a No-Op (no-action) baseline, while policy stability was maintained under transport, demand, and energy shocks. Furthermore, transferability was demonstrated across four archetypal operating regimes without retuning. Finally, VoD analysis indicated that integrated observation regimes shifted operating points toward improved resource efficiency.
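As an illustration of the kind of evaluation protocol the abstract describes (matched seeds, fixed horizons, 95% confidence intervals), the following is a minimal sketch and not the authors' implementation. The toy simulator `run_episode`, the policies `no_op` and `controller`, the numeric constants, and the normal-approximation interval are all assumptions made for the example; the record does not specify any of them.

```python
import random
import statistics


def run_episode(policy, seed, horizon=50):
    """Toy stand-in for a digital-twin rollout (hypothetical).

    Returns the mean lead time over a fixed horizon. Seeding a private
    generator means every policy sees the same noise for a given seed.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(horizon):
        noise = rng.gauss(0.0, 1.0)  # shared demand/transport disturbance
        total += policy(noise)
    return total / horizon


def no_op(noise):
    # Hypothetical no-action baseline: disturbance passes through unchanged.
    return 10.0 + noise


def controller(noise):
    # Hypothetical controller: lower base lead time, damped disturbance.
    return 9.0 + 0.5 * noise


def paired_ci95(policy_a, policy_b, seeds, horizon=50):
    """Mean paired difference (a minus b) with an approximate 95% CI.

    Uses the normal critical value 1.96, which is an approximation;
    a Student-t value would be more appropriate for few seeds.
    """
    diffs = [
        run_episode(policy_a, s, horizon) - run_episode(policy_b, s, horizon)
        for s in seeds
    ]
    mean = statistics.mean(diffs)
    sem = statistics.stdev(diffs) / len(diffs) ** 0.5
    half = 1.96 * sem
    return mean, (mean - half, mean + half)


if __name__ == "__main__":
    mean, (low, high) = paired_ci95(controller, no_op, range(30))
    print(f"lead-time difference: {mean:.3f} (95% CI {low:.3f} to {high:.3f})")
```

Pairing on the seed removes the variance that both policies share, so the interval reflects the difference between the policies rather than the difference between random draws.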

Keywords:

Digital twin

Multi-agent reinforcement learning

Circular supply chain

Data-driven decision making

Simulation-based optimization

Collections:

Journal articles [1314]

Except where otherwise noted, this item's license is described as Attribution 4.0 International