UCL Discovery

Learning to Play No-Press Diplomacy with Best Response Policy Iteration

Anthony, TW; Eccles, T; Tacchetti, A; Kramár, J; Gemp, IM; Hudson, TC; Porcel, N; ... Bachrach, Y (2020) Learning to Play No-Press Diplomacy with Best Response Policy Iteration. In: Advances in Neural Information Processing Systems 33 pre-proceedings (NeurIPS 2020). NeurIPS (In press). Green open access

Learning to Play No-Press Diplomacy with Best Response Policy Iteration.pdf - Published version


Abstract

Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and StarCraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game-theoretic equilibrium analysis shows that the new process yields consistent improvements.
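
For readers unfamiliar with fictitious play, the following is a minimal sketch of the classical procedure on rock-paper-scissors: each player repeatedly plays an exact best response to the opponent's empirical action frequencies. It is not the method of the paper, which approximates best responses in a 7-player game with a combinatorial action space. The names `PAYOFF`, `best_response` and `fictitious_play` are hypothetical and chosen only for this illustration.

```python
# Classical fictitious play on rock-paper-scissors.
# Illustration only: this is NOT the paper's Diplomacy algorithm, which
# replaces the exact best response below with an approximate one.

# Row player's payoff; the column player receives the negative (zero-sum).
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def best_response(opponent_counts, payoff):
    """Return the action with the highest total payoff against the
    opponent's empirical action counts (ties go to the lowest index)."""
    values = [
        sum(payoff[a][b] * opponent_counts[b] for b in range(len(opponent_counts)))
        for a in range(len(payoff))
    ]
    return max(range(len(values)), key=values.__getitem__)


def fictitious_play(payoff, iterations):
    """Run simultaneous-move fictitious play and return both players'
    empirical action frequencies."""
    n = len(payoff)
    # Column player's payoffs from their own perspective:
    # col_payoff[b][a] = -payoff[a][b].
    col_payoff = [[-payoff[a][b] for a in range(n)] for b in range(n)]
    row_counts = [0] * n
    col_counts = [0] * n
    row_counts[0] += 1  # arbitrary initial actions (assumption)
    col_counts[0] += 1
    for _ in range(iterations):
        # Both players respond to the history so far, then move together.
        a = best_response(col_counts, payoff)
        b = best_response(row_counts, col_payoff)
        row_counts[a] += 1
        col_counts[b] += 1
    row_total = sum(row_counts)
    col_total = sum(col_counts)
    return ([c / row_total for c in row_counts],
            [c / col_total for c in col_counts])
```

Calling `fictitious_play(PAYOFF, 10000)` should return empirical frequencies close to the uniform mixed strategy for both players, which is the equilibrium of this game. The exact best response used here is feasible only because the game has three actions per player.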

Type: Proceedings paper
Title: Learning to Play No-Press Diplomacy with Best Response Policy Iteration
Event: 34th Conference on Neural Information Processing Systems
Open access status: An open access version is available from UCL Discovery
Publisher version: https://proceedings.neurips.cc/paper/2020/hash/d14...
Language: English
Additional information: This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification: UCL
UCL > Provost and Vice Provost Offices
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10109592
Downloads since deposit: 8
