Learning to Play No-Press Diplomacy with Best Response Policy Iteration

Anthony, TW; Eccles, T; Tacchetti, A; Kramár, J; Gemp, IM; Hudson, TC; Porcel, N; ... Bachrach, Y (2020) Learning to Play No-Press Diplomacy with Best Response Policy Iteration. In: Advances in Neural Information Processing Systems 33 pre-proceedings (NeurIPS 2020). NeurIPS (In press). Green open access.

Abstract

Recent advances in deep reinforcement learning (RL) have led to considerable progress in many 2-player zero-sum games, such as Go, Poker and StarCraft. The purely adversarial nature of such games allows for conceptually simple and principled application of RL methods. However, real-world settings are many-agent, and agent interactions are complex mixtures of common-interest and competitive aspects. We consider Diplomacy, a 7-player board game designed to accentuate dilemmas resulting from many-agent interactions. It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms. We propose a simple yet effective approximate best response operator, designed to handle large combinatorial action spaces and simultaneous moves. We also introduce a family of policy iteration methods that approximate fictitious play. With these methods, we successfully apply RL to Diplomacy: we show that our agents convincingly outperform the previous state-of-the-art, and game-theoretic equilibrium analysis shows that the new process yields consistent improvements.
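
For readers unfamiliar with fictitious play, the idea the abstract's policy iteration methods approximate, the following is a minimal textbook sketch and not the authors' method. It assumes a tiny two-player zero-sum matrix game (rock-paper-scissors) in place of Diplomacy, and it uses an exact best response where the paper proposes an approximate operator for large combinatorial action spaces. The names `fictitious_play` and `ROCK_PAPER_SCISSORS` are hypothetical and chosen only for this illustration.

```python
# Illustrative only: classical fictitious play in a small zero-sum matrix game.
# Assumption: payoff[i][j] is the row player's payoff; the column player gets the negative.
ROCK_PAPER_SCISSORS = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def fictitious_play(payoff, iterations=20000):
    """Return the empirical action frequencies of both players.

    At every step each player plays an exact best response to the
    opponent's empirical action counts so far (simultaneous updates).
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Arbitrary initial play: both players start with action 0.
    row_counts[0] = 1
    col_counts[0] = 1

    for _ in range(iterations):
        # Row player's cumulative payoff of each action against the column history.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Column player's cumulative payoff (negated, since the game is zero-sum).
        col_values = [
            -sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Exact best responses; ties go to the lowest action index.
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = max(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1

    row_total = sum(row_counts)
    col_total = sum(col_counts)
    return (
        [c / row_total for c in row_counts],
        [c / col_total for c in col_counts],
    )


if __name__ == "__main__":
    row_freq, col_freq = fictitious_play(ROCK_PAPER_SCISSORS)
    print("row player frequencies:", row_freq)
    print("column player frequencies:", col_freq)
```

In two-player zero-sum games the empirical frequencies produced this way converge to an equilibrium, which for rock-paper-scissors is the uniform mixture. The abstract's setting is harder on every axis this sketch simplifies: seven players, simultaneous moves, and an action space too large to enumerate, which is why an approximate best response is needed there.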

Type:	Proceedings paper
Title:	Learning to Play No-Press Diplomacy with Best Response Policy Iteration
Event:	34th Conference on Neural Information Processing Systems
Open access status:	An open access version is available from UCL Discovery
Publisher version:	https://proceedings.neurips.cc/paper/2020/hash/d14...
Language:	English
Additional information:	This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification:	UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI:	https://discovery.ucl.ac.uk/id/eprint/10109592
