A regularized opponent model with maximum entropy objective

Tian, Z; Wen, Y; Gong, Z; Punakkath, F; Zou, S; Wang, J; (2019) A regularized opponent model with maximum entropy objective. In: Kraus, S, (ed.) Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). (pp. 602-608). International Joint Conferences on Artificial Intelligence (IJCAI): Macao, China. Green open access

Abstract

In a single-agent setting, reinforcement learning (RL) tasks can be cast as an inference problem by introducing a binary random variable o, which stands for “optimality”. In this paper, we redefine the binary random variable o in the multi-agent setting and formalize multi-agent reinforcement learning (MARL) as probabilistic inference. We derive a variational lower bound on the likelihood of achieving optimality and name it the Regularized Opponent Model with Maximum Entropy Objective (ROMMEO). From ROMMEO, we present a novel perspective on opponent modeling and show, theoretically and empirically, how it can improve the performance of training agents in cooperative games. To optimize ROMMEO, we first introduce a tabular Q-iteration method, ROMMEO-Q, with a proof of convergence. We then extend the exact algorithm to complex environments by proposing an approximate version, ROMMEO-AC. We evaluate these two algorithms on the challenging iterated matrix game and differential game, respectively, and show that they can outperform strong MARL baselines.
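
For orientation, the single-agent construction mentioned in the first sentence of the abstract is commonly written as below. This is a sketch of the standard control-as-inference setup, not the paper's own notation or its multi-agent objective. The symbols r (reward), \tau (trajectory), q (variational trajectory distribution), \pi (policy) and \mathcal{H} (entropy) are assumptions introduced here, as are the uniform action prior and the choice that q shares the environment's dynamics.

```latex
% Optimality variable: an action is "optimal" with probability
% proportional to its exponentiated reward (assumed convention).
p(o_t = 1 \mid s_t, a_t) \propto \exp\bigl(r(s_t, a_t)\bigr)

% Variational lower bound on the log-likelihood of optimality,
% up to an additive constant from the uniform action prior:
\log p(o_{1:T} = 1)
  \;\ge\;
  \mathbb{E}_{\tau \sim q}\!\left[
    \sum_{t=1}^{T} r(s_t, a_t)
    + \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr)
  \right] + \text{const.}
```

Maximizing the right-hand side over \pi is the maximum entropy RL objective. The abstract describes ROMMEO as the analogous bound obtained after redefining o for the multi-agent case.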

Type: Proceedings paper
Title: A regularized opponent model with maximum entropy objective
Event: Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19)
ISBN-13: 978-0-9992411-4-1
Open access status: An open access version is available from UCL Discovery
Publisher version: https://www.ijcai.org/Proceedings/2019/
Language: English
Additional information: This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > UCL School of Management
URI: https://discovery.ucl.ac.uk/id/eprint/10091833