Khan, Akbir (2025) Safe Automated Research. Doctoral thesis (Ph.D), UCL (University College London).
Text: thesis.pdf - Accepted Version (15MB)
Abstract
As AI systems approach human-level capabilities in reasoning and problem-solving, we face an unprecedented opportunity: the automation of alignment research itself. This thesis reframes the superalignment problem from controlling superintelligence to creating trustworthy automated research assistants who can work thousands of hours on alignment problems. The key insight is that solving this more immediate challenge enables us to multiply our research capacity by orders of magnitude, transforming our ability to address harder alignment problems. Through three complementary multi-agent approaches, we develop practical frameworks for ensuring that automated researchers remain aligned with their intended purpose. First, we introduce Shaper, which demonstrates how less capable systems can influence more sophisticated ones through strategic interaction, achieving 45% higher collective rewards in complex environments. Second, we develop debate protocols that enable verification of AI-generated research outputs, allowing non-expert judges to achieve 88% accuracy in evaluating expert conclusions despite lacking domain knowledge. Finally, we create adaptive monitoring systems for extended deployment, reducing harmful outputs by 80% while maintaining 98% capability through dynamic behavior assessment. These methods are particularly suited to AI research automation, where verification becomes more crucial than specification, temporal robustness prevents gradual drift, and multiplicative safety scales with capability. By focusing on human-level automated researchers rather than abstract superintelligence, this work provides immediate practical value for the critical transition period we now approach. Every improvement in our ability to safely oversee automated researchers directly multiplies our capacity to solve alignment challenges, creating a foundation for bootstrapping our way to more capable systems.
| Type: | Thesis (Doctoral) |
|---|---|
| Qualification: | Ph.D |
| Title: | Safe Automated Research |
| Open access status: | An open access version is available from UCL Discovery |
| Language: | English |
| Additional information: | Copyright © The Author 2025. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
| UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science; UCL |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10210531 |