UCL Discovery

Replay-Guided Adversarial Environment Design

Jiang, M; Dennis, M; Parker-Holder, J; Foerster, J; Grefenstette, E; Rocktäschel, T; (2021) Replay-Guided Adversarial Environment Design. In: Advances in Neural Information Processing Systems 34 pre-proceedings (NeurIPS 2021). Neural Information Processing Systems: Sydney, Australia. (In press). Green open access

Text: replay_guided_adversarial_envi.pdf - Accepted Version (835kB)

Abstract

Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emergence of diverse training environments. Here, we cast Prioritized Level Replay (PLR), an empirically successful but theoretically unmotivated method that selectively samples randomly-generated training levels, as UED. We argue that by curating completely random levels, PLR, too, can generate novel and complex levels for effective training. This insight reveals a natural class of UED methods we call Dual Curriculum Design (DCD). Crucially, DCD includes both PLR and a popular UED algorithm, PAIRED, as special cases and inherits similar theoretical guarantees. This connection allows us to develop novel theory for PLR, providing a version with a robustness guarantee at Nash equilibria. Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria. Indeed, our experiments confirm that our new method, PLR⊥, obtains better results on a suite of out-of-distribution, zero-shot transfer tasks, in addition to demonstrating that PLR⊥ improves the performance of PAIRED, from which it inherited its theoretical framework.
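
The counterintuitive PLR⊥ modification can be made concrete with a short sketch. The following Python pseudocode is assembled from the abstract alone, not the authors' released implementation; the helper names (sample_random_level, collect_rollout, regret_score, sample_by_score, maybe_add) and the replay probability p_replay are hypothetical. The point it illustrates is that policy-gradient updates happen only on replayed, curated levels, while freshly generated random levels are merely scored for the level buffer.

import random

def robust_plr_step(agent, buffer, sample_random_level,
                    collect_rollout, regret_score, p_replay=0.5):
    # One iteration of a PLR⊥-style loop, sketched from the abstract.
    # All helpers are passed in as callables because they are
    # assumptions of this illustration, not the paper's API.
    if buffer.levels and random.random() < p_replay:
        # Replay branch: sample a curated, high-regret level and train
        # on it. This is the ONLY branch with a policy update.
        level = buffer.sample_by_score()
        rollout = collect_rollout(agent, level)
        agent.update(rollout)
        buffer.update_score(level, regret_score(rollout))
    else:
        # Exploration branch: evaluate a fresh random level and score
        # it for possible curation, but stop the policy update here.
        # Training on less data in this way is what the paper's theory
        # ties to improved convergence to Nash equilibria.
        level = sample_random_level()
        rollout = collect_rollout(agent, level)
        buffer.maybe_add(level, regret_score(rollout))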

Type: Proceedings paper
Title: Replay-Guided Adversarial Environment Design
Event: Thirty-fifth Conference on Neural Information Processing Systems
Open access status: An open access version is available from UCL Discovery
Publisher version: https://papers.nips.cc/paper/2021/hash/0e915db6326...
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10139110
Downloads since deposit: 56
