UCL Discovery
UCL home » Library Services » Electronic resources » UCL Discovery

On the Sample Complexity of Reinforcement Learning

Kakade, Sham Machandranath; (2003) On the Sample Complexity of Reinforcement Learning. Doctoral thesis (Ph.D), UCL (University College London). Green open access

[thumbnail of On the sample complexity of reinforcement learning.pdf]
Preview
Text
On the sample complexity of reinforcement learning.pdf

Download (4MB) | Preview

Abstract

This thesis is a detailed investigation into the following question: how much data must an agent collect in order to perform "reinforcement learning" successfully? This question is analogous to the classical issue of the sample complexity in supervised learning, but is harder because of the increased realism of the reinforcement learning setting. This thesis summarizes recent sample complexity results in the reinforcement learning literature and builds on these results to provide novel algorithms with strong performance guarantees. We focus on a variety of reasonable performance criteria and sampling models by which agents may access the environment. For instance, in a policy search setting, we consider the problem of how much simulated experience is required to reliably choose a "good" policy among a restricted class of policies II (as in Kearns, Mansour, and Ng [2000]). In a more online setting, we consider the case in which an agent is placed in an environment and must follow one unbroken chain of experience with no access to "offline" simulation (as in Kearns and Singh [1998]). We build on the sample based algorithms suggested by Kearns, Mansour, and Ng [2000]. Their sample complexity bounds have no dependence on the size of the state space, an exponential dependence on the planning horizon time, and linear dependence on the complexity of II. We suggest novel algorithms with more restricted guarantees whose sample complexities are again independent of the size of the state space and depend linearly on the complexity of the policy class II, but have only a polynomial dependence on the horizon time. We pay particular attention to the tradeoffs made by such algorithms.

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: On the Sample Complexity of Reinforcement Learning
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Thesis digitised by ProQuest
Keywords: Applied sciences; Reinforcement learning
URI: https://discovery.ucl.ac.uk/id/eprint/10100726
Downloads since deposit
412Downloads
Download activity - last month
Download activity - last 12 months
Downloads by country - last 12 months

Archive Staff Only

View Item View Item