Learning General World Models in a Handful of Reward-Free Deployments


Xu, Y; Rybkin, O; Parker-Holder, J; Roberts, SJ; Pacchiano, A; Rocktäschel, T; Ball, PJ; (2022) Learning General World Models in a Handful of Reward-Free Deployments. In: Advances in Neural Information Processing Systems. NeurIPS.

Full text: 2480_learning_general_world_models_.pdf (Published Version, 3MB)

Abstract

Building generally capable agents is a grand challenge for deep reinforcement learning (RL). To approach this challenge practically, we outline two key desiderata: 1) to facilitate generalization, exploration should be task-agnostic; 2) to facilitate scalability, exploration policies should collect large quantities of data without costly centralized retraining. Combining these two properties, we introduce the reward-free deployment efficiency setting, a new paradigm for RL research. We then present CASCADE, a novel approach for self-supervised exploration in this new setting. CASCADE seeks to learn a world model by collecting data with a population of agents, using an information-theoretic objective inspired by Bayesian Active Learning. CASCADE achieves this by specifically maximizing the diversity of trajectories sampled by the population through a novel cascading objective. We provide theoretical intuition for CASCADE, which we show, in a tabular setting, improves upon naïve approaches that do not account for population diversity. We then demonstrate that CASCADE collects diverse task-agnostic datasets and learns agents that generalize zero-shot to novel, unseen downstream tasks on Atari, MiniGrid, Crafter and the DM Control Suite. Code and videos are available at https://ycxuyingchen.github.io/cascade/.
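The abstract does not spell out the cascading objective, so the following is only a toy tabular analogy of why accounting for population diversity matters. It assumes each candidate exploration policy is summarized by the set of states it would visit, and it uses new-state coverage as a hypothetical stand-in for the information-theoretic gain; the names `naive_select`, `cascading_select` and `coverage` are illustrative and do not come from the CASCADE code. The naive rule scores every policy on its own, while the cascading rule scores each new pick conditioned on what earlier picks already cover.

```python
from typing import Dict, List, Set

# Hypothetical stand-in: each policy is summarized by the set of states it visits.
Visits = Dict[str, Set[int]]


def naive_select(visits: Visits, k: int) -> List[str]:
    """Pick the k policies that individually visit the most states, ignoring overlap."""
    return sorted(visits, key=lambda p: len(visits[p]), reverse=True)[:k]


def cascading_select(visits: Visits, k: int) -> List[str]:
    """Greedily pick policies; each pick maximizes states not covered by earlier picks."""
    chosen: List[str] = []
    covered: Set[int] = set()
    for _ in range(min(k, len(visits))):
        remaining = [p for p in visits if p not in chosen]
        # Marginal gain is measured relative to what the population already covers.
        best = max(remaining, key=lambda p: len(visits[p] - covered))
        chosen.append(best)
        covered |= visits[best]
    return chosen


def coverage(visits: Visits, chosen: List[str]) -> int:
    """Number of distinct states visited by the selected population."""
    covered: Set[int] = set()
    for p in chosen:
        covered |= visits[p]
    return len(covered)


if __name__ == "__main__":
    # Policies "a" and "b" are redundant; "c" explores a different region.
    visits = {"a": {0, 1, 2, 3}, "b": {0, 1, 2, 3}, "c": {4, 5, 6}, "d": {7}}
    for rule in (naive_select, cascading_select):
        picks = rule(visits, 2)
        print(rule.__name__, picks, coverage(visits, picks))
```

With a population of two, the naive rule picks the two redundant policies and covers 4 states, whereas the cascading rule picks "a" then "c" and covers 7. Greedy conditioning of this kind is one common way to encourage batch diversity; the paper's actual objective operates on world-model uncertainty rather than state sets.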

Type:	Proceedings paper
Title:	Learning General World Models in a Handful of Reward-Free Deployments
Event:	36th Conference on Neural Information Processing Systems (NeurIPS 2022)
ISBN-13:	9781713871088
Open access status:	An open access version is available from UCL Discovery
Publisher version:	https://proceedings.neurips.cc/paper_files/paper/2...
Language:	English
Additional information:	This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification:	UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI:	https://discovery.ucl.ac.uk/id/eprint/10173693
