
Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction

Jafferjee, T; Ziomek, J; Yang, T; Dai, Z; Wang, J; Taylor, ME; Shao, K; ... Mguni, D (2025) Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction. In: AAMAS '25: Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems. (pp. 1042-1050). ACM (Association for Computing Machinery): Detroit, MI, USA.

Text: 2209.01054v2.pdf - Accepted Version. Download (5MB)

Abstract

Multi-agent reinforcement learning (MARL) enables systems of autonomous agents to solve complex tasks from jointly gathered experiences of the environment. Many MARL algorithms perform centralized training (CT), often in a simulated environment, where at each time-step the critic makes use of a single sample of the agents' joint-action for training. Yet, as agents update their policies during training, these single samples may poorly represent the agents' joint-policy, leading to high-variance gradient estimates that hinder learning. In this paper, we examine the effect on MARL estimators of allowing the number of joint-action samples taken at each time-step during training to be greater than 1. Our theoretical analysis shows that even modestly increasing the number of joint-action samples shown to the critic leads to TD updates that closely approximate the true expected value under the current joint-policy. In particular, we prove that this reduces the variance of value estimates to a level similar to that of decentralized training, while maintaining the learning benefits of CT. We describe how such a protocol can be seamlessly realized by sharing policy parameters between the agents during training, and we apply the technique to reduce estimator variance in MARL methods within a general apparatus which we call the Performance Enhancing Reinforcement Learning Apparatus (PERLA). Lastly, we demonstrate PERLA's performance improvements and estimator variance reduction capabilities in a range of environments including Multi-Agent MuJoCo and StarCraft II.
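To make the sampling idea concrete, the following is a minimal sketch, not the authors' implementation: a toy tabular setting in which the critic's TD target is built from k joint-action samples drawn from the agents' current (shared) policies rather than from a single sample. All names here (n_agents, q_next, td_target, and so on) are illustrative assumptions introduced for this example.

import numpy as np

rng = np.random.default_rng(0)

n_agents = 3
n_actions = 4          # actions per agent
gamma = 0.99

# Toy per-agent action distributions (one row per agent). Sharing these
# parameters during centralized training is what lets every learner sample
# additional joint actions at no extra environment cost.
policies = rng.dirichlet(np.ones(n_actions), size=n_agents)

# Toy critic values Q(s', a_joint) for one fixed next state s', indexed by
# the joint-action tuple. Random values stand in for learned estimates.
q_next = rng.normal(size=(n_actions,) * n_agents)

def sample_joint_action():
    """Sample one joint action from the agents' current policies."""
    return tuple(rng.choice(n_actions, p=policies[i]) for i in range(n_agents))

def td_target(reward, k):
    """TD target averaged over k joint-action samples.

    k = 1 is the usual single-sample CT update; larger k approximates the
    expectation of Q under the current joint policy."""
    q_samples = [q_next[sample_joint_action()] for _ in range(k)]
    return reward + gamma * np.mean(q_samples)

# Exact expected target under the joint policy, for reference.
joint_probs = policies[0]
for i in range(1, n_agents):
    joint_probs = np.multiply.outer(joint_probs, policies[i])
expected = 1.0 + gamma * np.sum(joint_probs * q_next)

for k in (1, 5, 25):
    targets = [td_target(reward=1.0, k=k) for _ in range(2000)]
    print(f"k={k:2d}  var={np.var(targets):.4f}  "
          f"bias={np.mean(targets) - expected:+.4f}")

Running this shows the empirical variance of the target shrinking roughly as 1/k while the bias stays near zero, consistent with the abstract's claim that even modestly increasing the number of joint-action samples yields TD updates close to the true expectation under the joint policy.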

Type: Proceedings paper
Title: Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction
Event: AAMAS '25: 24th International Conference on Autonomous Agents and Multiagent Systems
Location: Detroit, MI, USA
Dates: 19 May 2025 - 23 May 2025
Open access status: An open access version is available from UCL Discovery
DOI: 10.5555/3709347.3743624
Publisher version: https://dl.acm.org/doi/10.5555/3709347.3743624
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Multi-agent Reinforcement Learning, Centralised Training-Decentralised Execution, Variance Reduction
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10217055
