Schmitt, S;
Shawe-Taylor, J;
van Hasselt, H;
(2023)
Exploration via Epistemic Value Estimation.
In:
Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023.
(pp. 9742-9751).
The Association for the Advancement of Artificial Intelligence (AAAI)
Abstract
How to efficiently explore in reinforcement learning is an open problem. Many exploration algorithms employ the epistemic uncertainty of their own value predictions, for instance to compute an exploration bonus or upper confidence bound. Unfortunately, the required uncertainty is difficult to estimate in general with function approximation. We propose epistemic value estimation (EVE): a recipe that is compatible with sequential decision making and with neural network function approximators. It equips agents with a tractable posterior over all their parameters from which epistemic value uncertainty can be computed efficiently. We use the recipe to derive an epistemic Q-Learning agent and observe competitive performance on a series of benchmarks. Experiments confirm that the EVE recipe facilitates efficient exploration in hard exploration tasks.
Type: | Proceedings paper |
---|---|
Title: | Exploration via Epistemic Value Estimation |
Event: | 37th AAAI Conference on Artificial Intelligence, AAAI 2023 |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1609/aaai.v37i8.26164 |
Publisher version: | https://doi.org/10.1609/aaai.v37i8.26164 |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions. |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/10176628 |