Exploration bonuses and dual control

Dayan, P; Sejnowski, TJ (1996) Exploration bonuses and dual control. Machine Learning, 25 (1), 5-22.

Full text not available from this repository.


Finding the Bayesian balance between exploration and exploitation in adaptive optimal control is in general intractable. This paper shows how to compute suboptimal estimates based on a certainty equivalence approximation (Cozzolino, Gonzalez-Zubieta & Miller, 1965) arising from a form of dual control. This systematizes and extends existing uses of exploration bonuses in reinforcement learning (Sutton, 1990). The approach has two components: a statistical model of uncertainty in the world and a way of turning this into exploratory behavior. This general approach is applied to two-dimensional mazes with moveable barriers and its performance is compared with Sutton's DYNA system.
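To make the idea of an exploration bonus concrete, here is a minimal Python sketch. It is not the certainty equivalence method of the paper. It assumes a simple bonus that grows with the square root of the time since an action was last tried, a form commonly associated with Dyna-style agents. The names `exploration_bonus`, `choose_action`, and `kappa` are hypothetical and chosen for illustration only.

```python
import math


def exploration_bonus(elapsed_steps, kappa=0.1):
    """Assumed bonus form: grows with the time since an action was last tried."""
    return kappa * math.sqrt(elapsed_steps)


def choose_action(q_values, last_tried, now, kappa=0.1):
    """Pick the action that maximizes estimated value plus exploration bonus.

    q_values:   dict mapping action -> current value estimate
    last_tried: dict mapping action -> time step at which it was last tried
    now:        current time step
    """
    scores = {
        action: value + exploration_bonus(now - last_tried[action], kappa)
        for action, value in q_values.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    q_values = {"left": 1.0, "right": 0.9}
    last_tried = {"left": 99, "right": 0}  # "right" has not been tried for 100 steps
    print(choose_action(q_values, last_tried, now=100))
```

In this toy setting the long-neglected action wins despite its lower value estimate, because uncertainty about it is assumed to have grown over time. Setting `kappa` to zero recovers purely greedy (exploitative) behavior.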

Type: Article
Title: Exploration bonuses and dual control
Keywords: reinforcement learning, dynamic programming, exploration bonuses, certainty equivalence, nonstationary environment, TIME
URI: http://discovery.ucl.ac.uk/id/eprint/85069