Jafferjee, T; Ziomek, J; Yang, T; Dai, Z; Wang, J; Taylor, ME; Shao, K; ... Mguni, D; (2025) Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction. In: AAMAS '25: Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems. (pp. 1042-1050). ACM (Association for Computing Machinery): Detroit, MI, USA.
Abstract
Multi-agent reinforcement learning (MARL) enables systems of autonomous agents to solve complex tasks from jointly gathered experiences of the environment. Many MARL algorithms perform centralized training (CT), often in a simulated environment, where at each time-step the critic makes use of a single sample of the agents' joint-action for training. Yet, as agents update their policies during training, these single samples may poorly represent the agents' joint-policy, leading to high-variance gradient estimates that hinder learning. In this paper, we examine the effect on MARL estimators of allowing the number of joint-action samples taken at each time-step to be greater than 1 in training. Our theoretical analysis shows that even modestly increasing the number of joint-action samples shown to the critic leads to TD updates that closely approximate the true expected value under the current joint-policy. In particular, we prove this reduces variance in value estimates similar to that of decentralized training while maintaining the learning benefits of CT. We describe how such a protocol can be seamlessly realized by sharing policy parameters between the agents during training, and apply the technique to induce lower-variance estimates in MARL methods within a general apparatus which we call Performance Enhancing Reinforcement Learning Apparatus (PERLA). Lastly, we demonstrate PERLA's performance improvements and estimator variance reduction capabilities in a range of environments including Multi-agent Mujoco and StarCraft II.
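The core idea in the abstract — averaging the critic's bootstrap target over several joint-action samples drawn from the current (shared) policy, rather than using a single sample — can be illustrated with a small sketch. This is not the authors' implementation; the payoff function, policy, and sample counts below are hypothetical, chosen only to show how the estimator's variance falls as the number of joint-action samples grows:

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, N_ACTIONS = 3, 4
# A shared stochastic policy over discrete actions (illustrative values).
policy = np.array([0.1, 0.2, 0.3, 0.4])

def joint_action_value(joint_action):
    """Toy stand-in for Q(s, a): the value depends on the joint action."""
    return float(np.sum(joint_action ** 2))

def critic_target(k):
    """Estimate E_{a ~ pi_joint}[Q(s, a)] by averaging over k joint-action
    samples, as in the multi-sample protocol described above (k=1 recovers
    the usual single-sample CT update)."""
    samples = rng.choice(N_ACTIONS, size=(k, N_AGENTS), p=policy)
    return float(np.mean([joint_action_value(a) for a in samples]))

# Compare estimator variance for single-sample (k=1) vs multi-sample (k=10)
# critic targets over many independent trials.
single = np.array([critic_target(1) for _ in range(2000)])
multi = np.array([critic_target(10) for _ in range(2000)])

print(f"var(k=1)  = {np.var(single):.3f}")
print(f"var(k=10) = {np.var(multi):.3f}")
```

Because each target is an i.i.d. mean over k samples, its variance shrinks by roughly a factor of k, which is the mechanism the paper exploits to stabilize TD updates during centralized training.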
| Type: | Proceedings paper |
|---|---|
| Title: | Taming Multi-Agent Reinforcement Learning with Estimator Variance Reduction |
| Event: | AAMAS '25: 24th International Conference on Autonomous Agents and Multiagent Systems |
| Location: | Detroit, MI, USA |
| Dates: | 19 May 2025 - 23 May 2025 |
| Open access status: | An open access version is available from UCL Discovery |
| DOI: | 10.5555/3709347.3743624 |
| Publisher version: | https://dl.acm.org/doi/10.5555/3709347.3743624 |
| Language: | English |
| Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
| Keywords: | Multi-agent Reinforcement Learning, Centralised Training-Decentralised Execution, Variance Reduction |
| UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10217055 |