UCL Discovery

Scalable Model-based Policy Optimization for Decentralized Networked Systems

Du, Y; Ma, C; Liu, Y; Lin, R; Dong, H; Wang, J; Yang, Y; (2022) Scalable Model-based Policy Optimization for Decentralized Networked Systems. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). (pp. 9019-9026). IEEE: Kyoto, Japan. Green open access

Text: 2207.06559.pdf - Accepted Version (8MB)

Abstract

Reinforcement learning algorithms require a large number of samples; this often limits their real-world application even on simple tasks. The challenge is more pronounced in multi-agent tasks, as each step of operation is more costly, requiring communication or the shifting of resources. This work aims to improve the data efficiency of multi-agent control through model-based learning. We consider networked systems where agents are cooperative and communicate only locally with their neighbors, and propose the decentralized model-based policy optimization framework (DMPO). In our method, each agent learns a dynamics model to predict future states and broadcasts its predictions to neighbors by communication, and the policies are then trained on model rollouts. To alleviate the bias of model-generated data, we restrict model usage to generating myopic (short-horizon) rollouts, thus reducing the compounding error of model generation. To preserve the independence of policy updates, we introduce an extended value function and theoretically prove that the resulting policy gradient is a close approximation to the true policy gradient. We evaluate our algorithm on several benchmarks for intelligent transportation systems: connected autonomous vehicle control tasks (Flow and CACC) and adaptive traffic signal control (ATSC). Empirical results show that our method achieves superior data efficiency and matches the performance of model-free methods that use true models. The source code of our algorithm and baselines can be found at https://github.com/PKU-MARL/Model-Based-MARL.
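The core loop described in the abstract (each agent learns a local dynamics model, broadcasts predicted states to its neighbors, and policies are trained on short model rollouts to limit compounding error) can be illustrated with a minimal sketch. The sketch below is an assumption-laden illustration only: all class and function names, tensor shapes, and the linear stand-in dynamics model are hypothetical and do not come from the authors' codebase; refer to the linked GitHub repository for the actual implementation.

```python
# Illustrative sketch of a decentralized model-based rollout, assuming a
# simple linear stand-in dynamics model and random placeholder policies.
# None of these names appear in the PKU-MARL/Model-Based-MARL repository.
import numpy as np


class AgentModel:
    """Per-agent learned dynamics model: predicts the agent's next local state
    from its current state, its action, and neighbors' broadcast states."""

    def __init__(self, state_dim, action_dim, rng):
        self.W_s = rng.normal(scale=0.1, size=(state_dim, state_dim))
        self.W_a = rng.normal(scale=0.1, size=(state_dim, action_dim))
        self.W_n = rng.normal(scale=0.1, size=(state_dim, state_dim))

    def predict(self, state, action, neighbor_states):
        # Aggregate neighbors' broadcast predictions (mean as a placeholder).
        nbr = np.mean(neighbor_states, axis=0) if neighbor_states else np.zeros_like(state)
        return self.W_s @ state + self.W_a @ action + self.W_n @ nbr


def short_model_rollout(models, policies, init_states, adjacency, horizon=3):
    """Generate a short (myopic) model rollout: a small horizon keeps the
    compounding error of the learned models bounded. Each agent predicts its
    own next state and 'broadcasts' it to its graph neighbors."""
    n_agents = len(models)
    states = [s.copy() for s in init_states]
    trajectory = []
    for _ in range(horizon):
        actions = [policies[i](states[i]) for i in range(n_agents)]
        next_states = []
        for i in range(n_agents):
            nbr_states = [states[j] for j in range(n_agents) if adjacency[i][j]]
            next_states.append(models[i].predict(states[i], actions[i], nbr_states))
        trajectory.append((states, actions, next_states))
        states = next_states
    return trajectory  # model-generated data, later used to update the policies


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state_dim, action_dim, n_agents = 4, 2, 3
    adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # line-graph communication
    models = [AgentModel(state_dim, action_dim, rng) for _ in range(n_agents)]
    policies = [lambda s, rng=rng: rng.normal(size=action_dim) for _ in range(n_agents)]
    init = [rng.normal(size=state_dim) for _ in range(n_agents)]
    traj = short_model_rollout(models, policies, init, adjacency, horizon=3)
    print(f"collected {len(traj)} model-rollout steps for {n_agents} agents")
```

In the paper's framework each agent's policy is then updated from this model-generated data using an extended value function over its neighborhood; the random policies and linear models above are stand-ins purely to show the communication and rollout structure.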

Type: Proceedings paper
Title: Scalable Model-based Policy Optimization for Decentralized Networked Systems
Event: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Dates: 23 Oct 2022 - 27 Oct 2022
ISBN-13: 9781665479271
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/IROS47612.2022.9982253
Publisher version: https://doi.org/10.1109/IROS47612.2022.9982253
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Adaptation models, Source coding, Reinforcement learning, Predictive models, Approximation algorithms, Data models, Task analysis
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10164561
