
UCL Discovery


Regret bounds for Gaussian process bandit problems

Grünewälder, S; Audibert, JY; Opper, M; Shawe-Taylor, J; (2010) Regret bounds for Gaussian process bandit problems. In: (pp. 273-280). Gold open access


Bandit algorithms are concerned with trading off exploration against exploitation: a number of options are available, but we can only learn their quality by experimenting with them. We consider the scenario in which the reward distribution for the arms is modelled by a Gaussian process and there is no noise in the observed reward. Our main result bounds the regret experienced by algorithms relative to the a posteriori optimal strategy of playing the best arm throughout, based on benign assumptions about the covariance function defining the Gaussian process. We complement these upper bounds with corresponding lower bounds for particular covariance functions, demonstrating that in general there is at most a logarithmic looseness in our upper bounds. Copyright 2010 by the authors.
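To make the setting concrete, the following is a minimal sketch (not the paper's algorithm or its analysis): a finite set of arms indexed by points on [0,1], a single noiseless reward function drawn from a GP prior, and an upper-confidence-bound selection rule. The squared-exponential kernel, the lengthscale, the confidence width `beta`, and the arm grid are all illustrative assumptions.

```python
import numpy as np

def se_kernel(x, y, ell=0.1):
    # Squared-exponential covariance; an illustrative choice, the paper's
    # assumptions concern a class of covariance functions.
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell ** 2)

def gp_posterior(K, obs_idx, obs_y, jitter=1e-8):
    # Noiseless GP posterior mean/std over all arms, given exact
    # observations at the arms indexed by obs_idx (jitter for stability).
    K_oo = K[np.ix_(obs_idx, obs_idx)] + jitter * np.eye(len(obs_idx))
    K_ao = K[:, obs_idx]
    mu = K_ao @ np.linalg.solve(K_oo, obs_y)
    v = np.linalg.solve(K_oo, K_ao.T)
    var = np.clip(np.diag(K) - np.sum(K_ao.T * v, axis=0), 0.0, None)
    return mu, np.sqrt(var)

rng = np.random.default_rng(0)
arms = np.linspace(0.0, 1.0, 25)          # finite set of arms
K = se_kernel(arms, arms)
# one noiseless reward function drawn from the GP prior
f = rng.multivariate_normal(np.zeros(len(arms)), K + 1e-8 * np.eye(len(arms)))

observed = {0: f[0]}                      # arm index -> exact observed reward
total_reward, T, beta = f[0], 30, 2.0
for _ in range(T - 1):
    idx = np.array(sorted(observed))
    mu, sd = gp_posterior(K, idx, np.array([observed[i] for i in idx]))
    a = int(np.argmax(mu + beta * sd))    # upper-confidence-bound choice
    observed[a] = f[a]
    total_reward += f[a]

# regret relative to the a posteriori optimal strategy of always
# playing the best arm
regret = T * f.max() - total_reward
```

Because observations are noiseless, the posterior interpolates exactly at played arms (their posterior standard deviation collapses to zero), so exploration is driven entirely by the uncertainty at unplayed arms.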

Type: Proceedings paper
Title: Regret bounds for Gaussian process bandit problems
Open access status: An open access publication
UCL classification: UCL > School of BEAMS
UCL > School of BEAMS > Faculty of Engineering Science
URI: http://discovery.ucl.ac.uk/id/eprint/1323909
