Exploration bonuses and dual control.
5 - 22.
Finding the Bayesian balance between exploration and exploitation in adaptive optimal control is in general intractable. This paper shows how to compute suboptimal estimates based on a certainty equivalence approximation (Cozzolino, Gonzalez-Zubieta & Miller, 1965) arising from a form of dual control. This systematizes and extends existing uses of exploration bonuses in reinforcement learning (Sutton, 1990). The approach has two components: a statistical model of uncertainty in the world and a way of turning this into exploratory behavior. This general approach is applied to two-dimensional mazes with moveable barriers and its performance is compared with Sutton's DYNA system.
|Title:||Exploration bonuses and dual control|
|Keywords:||reinforcement learning, dynamic programming, exploration bonuses, certainty equivalence, nonstationary environment, TIME|
|UCL classification:||UCL > School of Life and Medical Sciences > Faculty of Life Sciences > Gatsby Computational Neuroscience Unit|
Archive Staff Only