Tian, Z;
Wen, Y;
Gong, Z;
Punakkath, F;
Zou, S;
Wang, J;
(2019)
A regularized opponent model with maximum entropy objective.
In: Kraus, S, (ed.)
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19).
(pp. 602-608).
International Joint Conferences on Artificial Intelligence (IJCAI): Macao, China.
Abstract
In a single-agent setting, reinforcement learning (RL) tasks can be cast as an inference problem by introducing a binary random variable o, which stands for “optimality”. In this paper, we redefine the binary random variable o in the multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound on the likelihood of achieving optimality and name it the Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show, theoretically and empirically, how it can improve the performance of training agents in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method, ROMMEO-Q, with a proof of convergence. We then extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate the two algorithms on a challenging iterated matrix game and a differential game, respectively, and show that they can outperform strong MARL baselines.
Type: | Proceedings paper |
---|---|
Title: | A regularized opponent model with maximum entropy objective |
Event: | Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19) |
ISBN-13: | 978-0-9992411-4-1 |
Open access status: | An open access version is available from UCL Discovery |
Publisher version: | https://www.ijcai.org/Proceedings/2019/ |
Language: | English |
Additional information: | This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions. |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > UCL School of Management |
URI: | https://discovery.ucl.ac.uk/id/eprint/10091833 |