UCL Discovery

Near-optimal Policy Identification in Active Reinforcement Learning

Bogunovic, Ilija; (2023) Near-optimal Policy Identification in Active Reinforcement Learning. In: Proceedings of the 11th International Conference on Learning Representations (ICLR 2023). Green open access.

PDF: 2043_near_optimal_policy_identifica.pdf - Published Version (464kB)

Abstract

Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the expensive transition dynamics can be readily evaluated at specified states (e.g., via a simulator), agents can operate in what is often referred to as planning with a generative model. We propose the AE-LSVI algorithm for best policy identification, a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration (AE). AE-LSVI provably identifies a near-optimal policy uniformly over an entire state space and achieves polynomial sample complexity guarantees that are independent of the number of states. When specialized to the recently introduced offline contextual Bayesian optimization setting, our algorithm achieves improved sample complexity bounds. Experimentally, we demonstrate that AE-LSVI outperforms other RL algorithms in a variety of environments when robustness to the initial state is required.
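The core idea the abstract describes — combining optimistic and pessimistic value estimates and actively querying the generative model where they disagree most — can be illustrated with a toy sketch. This is NOT the paper's kernelized LSVI procedure; it is a simplified, hypothetical illustration (bandit-style confidence intervals over a finite state set, made-up names and constants) of why driving the optimism-pessimism gap down uniformly certifies near-optimality from any initial state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (hypothetical): a small finite state set whose true values
# are unknown but can be sampled through an expensive generative model.
n_states = 5
true_values = rng.uniform(0.0, 1.0, n_states)
counts = np.zeros(n_states)
means = np.zeros(n_states)

def bounds(beta=1.0):
    # Optimistic / pessimistic value estimates (UCB / LCB style);
    # the width shrinks as a state accumulates samples.
    width = beta / np.sqrt(np.maximum(counts, 1.0))
    return means + width, means - width

for t in range(200):
    upper, lower = bounds()
    # Active exploration: query the state with the largest
    # optimism-pessimism gap, i.e. the least certain value estimate.
    s = int(np.argmax(upper - lower))
    sample = true_values[s] + rng.normal(0.0, 0.1)
    counts[s] += 1
    means[s] += (sample - means[s]) / counts[s]

upper, lower = bounds()
# Once the gap is small at EVERY state (not just along one trajectory),
# the greedy policy is certified near-optimal uniformly over the state
# space, which is the guarantee the abstract highlights.
max_gap = float(np.max(upper - lower))
print(max_gap)
```

The gap-maximizing query rule naturally balances samples across all states, which is why the resulting guarantee holds uniformly rather than only for a fixed initial-state distribution.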

Type: Proceedings paper
Title: Near-optimal Policy Identification in Active Reinforcement Learning
Event: International Conference on Learning Representations (ICLR)
Open access status: An open access version is available from UCL Discovery
Publisher version: https://openreview.net/forum?id=3OR2tbtnYC-
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: reinforcement learning, contextual Bayesian optimization, kernelized least-squares value iteration
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng
URI: https://discovery.ucl.ac.uk/id/eprint/10198819
