UCL Discovery

Stabilizing Unsupervised Environment Design with a Learned Adversary

Mediratta, I; Jiang, M; Parker-Holder, J; Dennis, M; Vinitsky, E; Rocktäschel, T; (2023) Stabilizing Unsupervised Environment Design with a Learned Adversary. In: Proceedings of The 2nd Conference on Lifelong Learning Agents, PMLR 232. (pp. 270-291). PMLR: Proceedings of Machine Learning Research: McGill University, Montréal, Québec, Canada. Green open access

Full text: mediratta23a.pdf (Published Version, PDF, 3MB)

Abstract

A key challenge in training generally-capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations. This challenge motivates the problem setting of Unsupervised Environment Design (UED), whereby a student agent trains on an adaptive distribution of tasks proposed by a teacher agent. A pioneering approach for UED is PAIRED, which uses reinforcement learning (RL) to train a teacher policy to design tasks from scratch, making it possible to directly generate tasks that are adapted to the agent’s current capabilities. Despite its strong theoretical backing, PAIRED suffers from a variety of challenges that hinder its practical performance. Thus, state-of-the-art methods currently rely on curation and mutation rather than generation of new tasks. In this work, we investigate several key shortcomings of PAIRED and propose solutions for each shortcoming. As a result, we make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several established challenging procedurally-generated environments, including a partially-observed maze navigation task and a continuous-control car racing environment. We believe this work motivates a renewed emphasis on UED methods based on learned models that directly generate challenging environments, potentially unlocking more open-ended RL training and, as a result, more general agents.
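The abstract does not restate the objective PAIRED uses to train its teacher. As background only: in the original PAIRED formulation, the teacher is rewarded with an estimate of the student's regret on the task it proposes, computed as the best episode return of a second "antagonist" agent minus the average episode return of the "protagonist" (student) agent. The sketch below illustrates that estimate under this assumption. The function name and the episode returns are hypothetical, and the code is not the specific variant or set of fixes this paper proposes.

```python
from statistics import mean
from typing import Sequence


def paired_regret_estimate(antagonist_returns: Sequence[float],
                           protagonist_returns: Sequence[float]) -> float:
    """Approximate the protagonist's regret on one teacher-designed task.

    Assumed PAIRED-style estimate (illustrative, not this paper's code):
    best antagonist episode return minus mean protagonist episode return.
    The teacher is trained to maximize this value, so it is pushed toward
    tasks that are solvable (the antagonist succeeds) but not yet mastered
    by the student (the protagonist does poorly).
    """
    if not antagonist_returns or not protagonist_returns:
        raise ValueError("need at least one episode from each agent")
    return max(antagonist_returns) - mean(protagonist_returns)


# Hypothetical episode returns collected on a single generated maze.
regret = paired_regret_estimate([0.2, 0.9, 0.7], [0.1, 0.3])
print(regret)
```

Taking the maximum over antagonist episodes, rather than the mean, is meant to treat the antagonist's best run as evidence of what is achievable on the task. A task that neither agent can solve therefore yields low regret and earns the teacher little reward.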

Type: Proceedings paper
Title: Stabilizing Unsupervised Environment Design with a Learned Adversary
Event: The 2nd Conference on Lifelong Learning Agents
Open access status: An open access version is available from UCL Discovery
Publisher version: https://proceedings.mlr.press/v232/mediratta23a.ht...
Language: English
Additional information: This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10187633