Madjiheurem, Sephora;
(2022)
Advancing Data-Efficiency in Reinforcement Learning.
Doctoral thesis (Ph.D), UCL (University College London).
Preview |
Text
Doctoral_Thesis_Sephora_Madjiheurem.pdf - Accepted Version Download (18MB) | Preview |
Abstract
In many real-world applications, including traffic control, robotics and web system configurations, we are confronted with real-time decision-making problems where data is limited. Reinforcement Learning (RL) allows us to construct a mathematical framework to solve sequential decision-making problems under uncertainty. Under low-data constraints, RL agents must be able to quickly identify relevant information in the observations, and to quickly learn how to act in order attain their long-term objective. While recent advancements in RL have demonstrated impressive achievements, the end-to-end approach they take favours autonomy and flexibility at the expense of fast learning. To be of practical use, there is an undeniable need to improve the data-efficiency of existing systems. Ideal RL agents would possess an optimal way of representing their environment, combined with an efficient mechanism for propagating reward signals across the state space. This thesis investigates the problem of data-efficiency in RL from these two aforementioned perspectives. A deep overview of the different representation learning methods in use in RL is provided. The aim of this overview is to categorise the different representation learning approaches and highlight the impact of the representation on data-efficiency. Then, this framing is used to develop two main research directions. The first problem focuses on learning a representation that captures the geometry of the problem. An RL mechanism that uses a scalable feature learning on graphs method to learn such rich representations is introduced, ultimately leading to more efficient value function approximation. Secondly, ET (λ ), an algorithm that improves credit assignment in stochastic environments by propagating reward information counterfactually is presented. ET (λ ) results in faster earning compared to traditional methods that rely solely on temporal credit assignment. Overall, this thesis shows how a structural representation encoding the geometry of the state space, and counterfactual credit assignment are key characteristics for data-efficient RL.
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Advancing Data-Efficiency in Reinforcement Learning |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2022. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
UCL classification: | UCL UCL > Provost and Vice Provost Offices > UCL BEAMS UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Electronic and Electrical Eng |
URI: | https://discovery.ucl.ac.uk/id/eprint/10158961 |
Archive Staff Only
View Item |