
Interaction-Efficient Reinforcement Learning: Matching the Real World Data Availability

Jiang, Zhengyao; (2025) Interaction-Efficient Reinforcement Learning: Matching the Real World Data Availability. Doctoral thesis (Ph.D), UCL (University College London). Green open access

Text: Jiang_10211756_thesis.pdf, Download (16MB)

Abstract

Reinforcement Learning (RL) provides a promising approach to developing AI systems capable of surpassing human-level performance by learning directly from environment feedback, bypassing the limitations of supervised learning, which relies on labeled data. While RL has demonstrated remarkable achievements in simulated environments, such as surpassing human experts in strategic games, its practical application remains constrained by an excessive need for online interactions, which are costly and difficult to obtain in real-world scenarios. This thesis introduces a series of methods that address the interaction-efficiency challenges of RL, with the aim of eventually building superhuman-level AI agents that can operate under real-world data availability. The first method, Grid-to-Graph (GTG), uses programmable inductive biases to embed domain knowledge directly into neural networks, significantly reducing the interactions required for effective policy learning and enabling generalization to new, unseen environments. The Graph Backup method then offers a novel approach to value estimation: by treating collected state transitions as a graph rather than as isolated trajectories, it captures the interdependencies between transitions and improves data efficiency, with promising results in environments such as MinAtar, MiniGrid, and Atari100K. Moving to offline RL, the Trajectory Autoencoding Planner (TAP) learns a compact latent space for encoding multi-step trajectories, enabling efficient planning-based control in complex tasks such as MuJoCo locomotion and Adroit robotic manipulation. Building on this, the Humanoid Generalist Autoencoding Planner (H-GAP) scales TAP’s approach to a single, versatile foundation model capable of handling humanoid tasks with 56 degrees of freedom. H-GAP achieves high adaptability across diverse downstream tasks, outperforming traditional task-specific offline RL methods. Our scaling analysis of H-GAP highlights dataset diversity as a critical bottleneck, pointing to data collection and engineering as essential future directions. This thesis suggests that developing generative foundation models, refined through both offline and online RL, may be key to achieving superhuman performance across a broad spectrum of complex, real-world tasks.
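
The Graph Backup idea, estimating values over a graph of pooled transitions rather than along individual trajectories, can be illustrated with a minimal tabular sketch. Everything below is assumed for illustration rather than taken from the thesis: the dynamics are deterministic, the backup is a greedy max over outgoing edges, and the names graph_backup and data are hypothetical; the method described in the abstract is more general than this.

    from collections import defaultdict

    def graph_backup(transitions, gamma=0.99, sweeps=50):
        """Sweep Bellman backups over a graph built from (s, a, r, s') tuples.

        Illustrative sketch only: deterministic, tabular, greedy backup.
        """
        # Pool observed outcomes per state: graph[s] -> list of (reward, next_state).
        graph = defaultdict(list)
        for s, _a, r, s_next in transitions:
            graph[s].append((r, s_next))
        # Repeatedly back up values over the whole graph, so each state's
        # estimate draws on every transition passing through it, not just
        # the single trajectory it was recorded on.
        values = defaultdict(float)
        for _ in range(sweeps):
            for s, outcomes in graph.items():
                values[s] = max(r + gamma * values[s_next] for r, s_next in outcomes)
        return values

    # Toy data: two trajectories that intersect at state "B".
    data = [("A", 0, 0.0, "B"), ("B", 0, 1.0, "C"), ("D", 0, 0.0, "B")]
    print(graph_backup(data)["D"])  # ~0.99, backed up through the shared node "B"

In the toy data, the reward observed after "B" on the first trajectory propagates to "D" on the second because both pass through the same graph node, which a purely trajectory-wise backup would miss.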

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Interaction-Efficient Reinforcement Learning: Matching the Real World Data Availability
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10211756
Downloads since deposit: 13
