Luo, Yicheng; (2025) Reinforcement learning from imperfect data. Doctoral thesis (Ph.D), UCL (University College London).
Text: Luo_10208642_thesis.pdf (3MB)
Abstract
Reinforcement learning (RL) is a powerful branch of machine learning in which agents learn to make decisions by interacting with an environment to maximize cumulative reward. However, traditional RL methods often suffer from high sample complexity, requiring vast amounts of online interaction to achieve proficiency. This thesis explores the largely untapped potential of leveraging pre-existing, imperfect data (such as sub-optimal experiences, incomplete datasets, and unstructured data) to reduce reliance on costly, high-quality datasets, making RL more practical for real-world applications. This thesis makes three key contributions:

1. We investigate the benefits and trade-offs of RL algorithms that build on sub-optimal experiences. By studying how these algorithms can capitalize on imperfect data, we enable more sample-efficient learning and achieve performance unattainable through offline learning alone.
2. We introduce a new offline imitation learning algorithm designed to handle diverse, reward-free datasets. This approach allows learning from mixed-quality demonstrations, reducing the need for meticulously annotated behavior data, which is often difficult to obtain.
3. We present a new dataset and benchmark that uses unstructured data from chess to explore RL along a new dimension. This dataset bridges behavior and language, opening the door to generalist agents capable of learning from unstructured, real-world information.

Our work demonstrates that incorporating imperfect data into RL frameworks can significantly reduce sample complexity, extending RL to more complex, data-scarce environments. These advances offer promising directions for future research and for the practical deployment of RL across diverse domains.
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Reinforcement learning from imperfect data |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/10208642 |