UCL Discovery

Safe Automated Research

Khan, Akbir (2025) Safe Automated Research. Doctoral thesis (Ph.D), UCL (University College London). Green open access.

Text: thesis.pdf - Accepted Version (15MB)

Abstract

As AI systems approach human-level capabilities in reasoning and problem-solving, we face an unprecedented opportunity: the automation of alignment research itself. This thesis reframes the superalignment problem from controlling superintelligence to creating trustworthy automated research assistants who can work thousands of hours on alignment problems. The key insight is that solving this more immediate challenge enables us to multiply our research capacity by orders of magnitude, transforming our ability to address harder alignment problems. Through three complementary multi-agent approaches, we develop practical frameworks for ensuring that automated researchers remain aligned with their intended purpose. First, we introduce Shaper, which demonstrates how less capable systems can influence more sophisticated ones through strategic interaction, achieving 45% higher collective rewards in complex environments. Second, we develop debate protocols that enable verification of AI-generated research outputs, allowing non-expert judges to achieve 88% accuracy in evaluating expert conclusions despite lacking domain knowledge. Finally, we create adaptive monitoring systems for extended deployment, reducing harmful outputs by 80% while maintaining 98% capability through dynamic behavior assessment. These methods are particularly suited to AI research automation, where verification becomes more crucial than specification, temporal robustness prevents gradual drift, and multiplicative safety scales with capability. By focusing on human-level automated researchers rather than abstract superintelligence, this work provides immediate practical value for the critical transition period we now approach. Every improvement in our ability to safely oversee automated researchers directly multiplies our capacity to solve alignment challenges, creating a foundation for bootstrapping our way to more capable systems.

Type: Thesis (Doctoral)
Qualification: Ph.D
Title: Safe Automated Research
Open access status: An open access version is available from UCL Discovery
Language: English
Additional information: Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request.
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10210531