eprintid: 10132236
rev_number: 29
eprint_status: archive
userid: 608
dir: disk0/10/13/22/36
datestamp: 2021-08-03 15:13:06
lastmod: 2021-10-15 22:55:28
status_changed: 2021-08-03 15:18:59
type: proceedings_section
metadata_visibility: show
creators_name: Jiang, M
creators_name: Grefenstette, E
creators_name: Rocktäschel, T
title: Prioritized Level Replay
ispublished: pub
divisions: UCL
divisions: B04
divisions: C05
divisions: F48
note: This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
abstract: Environments with procedurally generated content serve as important benchmarks for testing systematic generalization in deep reinforcement learning. In this setting, each level is an algorithmically created environment instance with a unique configuration of its factors of variation. Training on a prespecified subset of levels allows for testing generalization to unseen levels. What can be learned from a level depends on the current policy, yet prior work defaults to uniform sampling of training levels independently of the policy. We introduce Prioritized Level Replay (PLR), a general framework for selectively sampling the next training level by prioritizing those with higher estimated learning potential when revisited in the future. We show TD-errors effectively estimate a level’s future learning potential and, when used to guide the sampling procedure, induce an emergent curriculum of increasingly difficult levels. By adapting the sampling of training levels, PLR significantly improves sample-efficiency and generalization on Procgen Benchmark—matching the previous state-of-the-art in test return—and readily combines with other methods. Combined with the previous leading method, PLR raises the state-of-the-art to over 76% improvement in test return relative to standard RL baselines.
date: 2021
date_type: published
publisher: PMLR: Proceedings of Machine Learning Research
official_url: http://proceedings.mlr.press/v139/jiang21b.html
oa_status: green
full_text_type: pub
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 1879518
lyricists_name: Rocktäschel, Tim
lyricists_id: TROCK95
actors_name: Flynn, Bernadette
actors_id: BFFLY94
actors_role: owner
full_text_status: public
publication: ICML
volume: 139
place_of_pub: Online Only
pagerange: 4940-4950
book_title: Proceedings of the 38th International Conference on Machine Learning
editors_name: Meila, M
editors_name: Zhang, T
citation: Jiang, M; Grefenstette, E; Rocktäschel, T; (2021) Prioritized Level Replay. In: Meila, M and Zhang, T, (eds.) Proceedings of the 38th International Conference on Machine Learning. (pp. 4940-4950). PMLR: Proceedings of Machine Learning Research: Online Only. Green open access
 
document_url: https://discovery.ucl.ac.uk/id/eprint/10132236/1/jiang21b.pdf