Anthony, TW;
Eccles, T;
Tacchetti, A;
Kramár, J;
Gemp, IM;
Hudson, TC;
Porcel, N;
... Bachrach, Y
(2020)
Learning to Play No-Press Diplomacy with Best Response Policy Iteration.
In: Advances in Neural Information Processing Systems 33 pre-proceedings (NeurIPS 2020). NeurIPS (In press).
Abstract
Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and StarCraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state of the art, and game-theoretic equilibrium analysis shows that the new process yields consistent improvements.
Type: | Proceedings paper |
---|---|
Title: | Learning to Play No-Press Diplomacy with Best Response Policy Iteration |
Event: | 34th Conference on Neural Information Processing Systems |
Open access status: | An open access version is available from UCL Discovery |
Publisher version: | https://proceedings.neurips.cc/paper/2020/hash/d14... |
Language: | English |
Additional information: | This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions. |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/10109592 |



