Kayal, Aya; (2025) Towards Data-efficient AI: Theoretical analysis and experimental validation of new exploration algorithms for Reinforcement Learning. Doctoral thesis (Ph.D), UCL (University College London).
Text: Kayal_10217098_(2)Thesis.pdf (9MB)
Abstract
Sequential decision-making is at the core of everyday human activities and complex real-world systems. Equipping machines with this capability has enormous implications for advancing Artificial Intelligence (AI) and developing autonomous systems that can reliably address real-world tasks, from robotics and healthcare to traffic management and large-scale information systems. Unlike machines, the human brain makes intelligent decisions in a remarkably data- and energy-efficient manner. A long-term objective of AI research, often associated with the vision of Artificial General Intelligence (AGI), is to approximate the human brain’s capabilities, paving the way for more practical and sustainable AI systems. Reinforcement Learning (RL) is the mathematical framework that enables machines to learn sequential decision-making through trial and error, mirroring aspects of human learning. While RL has driven many recent advances, training RL agents remains highly data- and compute-intensive—far from the efficiency of the human brain. A central bottleneck to data efficiency is exploration: how machines gather informative experiences to learn effective strategies. This thesis addresses the exploration challenge in RL, approaching it from both empirical and theoretical perspectives, bridging the gap between the two. First, a proof-of-concept study is developed that provides a deeper understanding of how exploration bonuses shape the behavior of deep RL agents, yielding new empirical insights into the mechanisms driving exploration in practice. Second, a theoretical framework is introduced for the analytical study of RL in the kernel setting, leading to the development of provably efficient exploration algorithms with regret bounds that are tighter than existing approaches. Third, the study of exploration in Bayesian Optimization with preference-based feedback introduces a novel algorithm that, for the first time, achieves order-optimal sample complexity in this setting. Together, these contributions advance the development of sample-efficient decision-making algorithms, bringing AI systems closer to the remarkable efficiency of human learning.
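The first contribution concerns how exploration bonuses shape the behaviour of RL agents. As a concrete illustration of that general mechanism (not of the thesis's specific algorithms), the sketch below adds a count-based bonus to action selection in tabular Q-learning on a toy chain MDP; the environment, the 1/sqrt(count) bonus form, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of a count-based exploration bonus in tabular Q-learning on a
# toy chain MDP. The environment, hyperparameters, and the 1/sqrt(count) bonus
# are illustrative assumptions, not the algorithms studied in the thesis.

n_states, n_actions = 10, 2
gamma, alpha, beta = 0.95, 0.1, 0.5

Q = np.zeros((n_states, n_actions))
counts = np.zeros((n_states, n_actions))

def step(s, a):
    """Chain dynamics: action 1 moves right, action 0 resets; reward only at the end."""
    s_next = min(s + 1, n_states - 1) if a == 1 else 0
    return s_next, float(s_next == n_states - 1)

for episode in range(200):
    s = 0
    for t in range(50):
        # Optimistic action selection: less-visited actions receive a larger bonus,
        # so the agent keeps probing the chain until it discovers the distant reward.
        a = int(np.argmax(Q[s] + beta / np.sqrt(counts[s] + 1.0)))
        s_next, r = step(s, a)
        counts[s, a] += 1
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print("greedy actions per state:", Q.argmax(axis=1))
```

Without the bonus term, a greedy agent in this environment would never reach the rewarding end of the chain; the decaying bonus vanishes as state-action pairs become well visited, leaving the learned extrinsic values in charge.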
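The third contribution concerns Bayesian Optimization where the learner observes only pairwise preferences rather than function values. The sketch below illustrates that feedback model in its simplest form: uniformly random duels scored by empirical win rate (a Borda-style estimate). The hidden utility, the Bradley-Terry-style noise model, and the scoring rule are assumptions for illustration; the thesis's order-optimal algorithm is not reproduced here.

```python
import numpy as np

# Minimal sketch of preference-based (dueling) feedback on a finite candidate set.
# The hidden utility, noise model, and Borda-style scoring are illustrative
# assumptions; the thesis's algorithm and its guarantees are not reproduced here.

rng = np.random.default_rng(1)
candidates = np.linspace(0.0, 1.0, 20)     # finite design space
utility = lambda x: -(x - 0.3) ** 2        # hidden objective (assumed)

wins = np.zeros(len(candidates))
plays = np.zeros(len(candidates))

for t in range(500):
    i, j = rng.choice(len(candidates), size=2, replace=False)
    # Binary preference drawn from a Bradley-Terry-style model on the utility gap.
    p_i_wins = 1.0 / (1.0 + np.exp(-10.0 * (utility(candidates[i]) - utility(candidates[j]))))
    winner = i if rng.random() < p_i_wins else j
    wins[winner] += 1
    plays[i] += 1
    plays[j] += 1

borda = wins / np.maximum(plays, 1)        # empirical win rate per candidate
print("recommended x:", candidates[int(np.argmax(borda))])
```

With uniform random pairing, the empirical win rate is an unbiased estimate of each candidate's Borda score, so the recommendation concentrates near the maximizer of the hidden utility as duels accumulate.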
| Field | Value |
|---|---|
| Type: | Thesis (Doctoral) |
| Qualification: | Ph.D |
| Title: | Towards Data-efficient AI: Theoretical analysis and experimental validation of new exploration algorithms for Reinforcement Learning |
| Open access status: | An open access version is available from UCL Discovery |
| Language: | English |
| Additional information: | Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
| Keywords: | Reinforcement Learning, Bayesian Optimization, Online Learning |
| UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10217098 |