eprintid: 10132236
rev_number: 29
eprint_status: archive
userid: 608
dir: disk0/10/13/22/36
datestamp: 2021-08-03 15:13:06
lastmod: 2021-10-15 22:55:28
status_changed: 2021-08-03 15:18:59
type: proceedings_section
metadata_visibility: show
creators_name: Jiang, M
creators_name: Grefenstette, E
creators_name: Rocktäschel, T
title: Prioritized Level Replay
ispublished: pub
divisions: UCL
divisions: B04
divisions: C05
divisions: F48
note: This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
abstract: Environments with procedurally generated content serve as important benchmarks for testing systematic generalization in deep reinforcement learning. In this setting, each level is an algorithmically created environment instance with a unique configuration of its factors of variation. Training on a prespecified subset of levels allows for testing generalization to unseen levels. What can be learned from a level depends on the current policy, yet prior work defaults to uniform sampling of training levels independently of the policy. We introduce Prioritized Level Replay (PLR), a general framework for selectively sampling the next training level by prioritizing those with higher estimated learning potential when revisited in the future. We show TD-errors effectively estimate a level’s future learning potential and, when used to guide the sampling procedure, induce an emergent curriculum of increasingly difficult levels. By adapting the sampling of training levels, PLR significantly improves sample-efficiency and generalization on Procgen Benchmark—matching the previous state-of-the-art in test return—and readily combines with other methods. Combined with the previous leading method, PLR raises the state-of-the-art to over 76% improvement in test return relative to standard RL baselines.
date: 2021
date_type: published
publisher: PMLR: Proceedings of Machine Learning Research
official_url: http://proceedings.mlr.press/v139/jiang21b.html
oa_status: green
full_text_type: pub
language: eng
primo: open
primo_central: open_green
verified: verified_manual
elements_id: 1879518
lyricists_name: Rocktaschel, Tim
lyricists_id: TROCK95
actors_name: Flynn, Bernadette
actors_id: BFFLY94
actors_role: owner
full_text_status: public
publication: ICML
volume: 139
place_of_pub: Online Only
pagerange: 4940-4950
book_title: Proceedings of the 38th International Conference on Machine Learning
editors_name: Meila, M
editors_name: Zhang, T
citation: Jiang, M; Grefenstette, E; Rocktäschel, T; (2021) Prioritized Level Replay. In: Meila, M and Zhang, T, (eds.) Proceedings of the 38th International Conference on Machine Learning. (pp. 4940-4950). PMLR: Proceedings of Machine Learning Research: Online Only. Green open access
document_url: https://discovery.ucl.ac.uk/id/eprint/10132236/1/jiang21b.pdf
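The abstract describes PLR's core mechanism: score each training level by its estimated learning potential (e.g. the magnitude of recent TD-errors observed there) and sample the next level in proportion to those scores, so high-potential levels are revisited more often. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation; the class name, rank-based weighting, and `temperature` parameter are illustrative assumptions.

```python
import random
from typing import Dict

class PrioritizedLevelSampler:
    """Hypothetical sketch of PLR-style level sampling.

    A level's score is, e.g., the mean absolute TD-error from the last
    episode played on it; a higher score means higher estimated
    learning potential, so the level is replayed more often.
    """

    def __init__(self, num_levels: int, temperature: float = 0.1):
        self.num_levels = num_levels
        self.temperature = temperature  # lower => greedier prioritization
        self.scores: Dict[int, float] = {}  # seen level -> last score

    def update_score(self, level: int, mean_abs_td_error: float) -> None:
        # Record the level's learning-potential estimate after playing it.
        self.scores[level] = mean_abs_td_error

    def sample(self) -> int:
        # Visit every level at least once before prioritizing replays.
        unseen = [l for l in range(self.num_levels) if l not in self.scores]
        if unseen:
            return random.choice(unseen)
        # Rank-based prioritization: weight ~ (1 / rank)^(1 / temperature),
        # so the highest-scoring level gets rank 1 and the largest weight.
        ranked = sorted(self.scores, key=self.scores.get, reverse=True)
        weights = [(1.0 / (r + 1)) ** (1.0 / self.temperature)
                   for r in range(len(ranked))]
        return random.choices(ranked, weights=weights, k=1)[0]
```

Rank-based (rather than raw-score) weighting is one plausible way to keep the sampling distribution robust to the scale of TD-errors; the temperature controls how sharply the curriculum concentrates on the highest-potential levels.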