
The Fact Selection Problem in LLM-Based Program Repair

Parasaram, Nikhil; Yan, Huijie; Yang, Boyu; Flahy, Zineb; Qudsi, Abriele; Ziaber, Damian; Barr, Earl T; (2025) The Fact Selection Problem in LLM-Based Program Repair. In: 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE). (pp. 2574-2586). IEEE: Ottawa, ON, Canada. Green open access

Text: FactSelectionProblem.pdf - Accepted Version (637kB)

Abstract

Recent research has shown that incorporating bug-related facts, such as stack traces and GitHub issues, into prompts enhances the bug-fixing capabilities of large language models (LLMs). Considering the ever-increasing context window of these models, a critical question arises: what and how many facts should be included in prompts to maximise the chance of correctly fixing bugs? To answer this question, we conducted a large-scale study, employing over 19K prompts featuring various combinations of seven diverse facts to rectify 314 bugs from open-source Python projects within the BugsInPy benchmark. Our findings revealed that each fact, ranging from simple syntactic details, like code context, to semantic information previously unexplored in the context of LLMs, such as angelic values, is beneficial. Specifically, each fact aids in fixing some bugs that would otherwise remain unresolved or only be fixed with a low success rate. Importantly, we discovered that the effectiveness of program repair prompts is non-monotonic in the number of facts used; using too many facts leads to subpar outcomes. These insights led us to define the fact selection problem: determining the optimal set of facts to include in a prompt to maximise an LLM's performance on a given task instance. We found that there is no one-size-fits-all set of facts for bug repair. Therefore, we developed a basic statistical model, named Maniple, which selects facts specific to a given bug to include in the prompt. This model significantly surpasses the performance of the best generic fact set. To underscore the significance of the fact selection problem, we benchmarked Maniple against the state-of-the-art zero-shot, non-conversational LLM-based bug repair methods. On our testing dataset of 157 bugs, Maniple repairs 88 bugs, 17% above the best configuration.
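To make the fact selection problem concrete, the Python sketch below assembles a repair prompt from a chosen subset of facts and exhaustively scores fact subsets, in the spirit of Maniple's per-bug selection. It is a minimal illustration, not the paper's implementation: the fact labels, the build_prompt layout, and the score callback are hypothetical stand-ins for the seven fact types and the statistical model described in the abstract.

    from itertools import combinations

    # Hypothetical labels standing in for the paper's seven fact types;
    # the real fact extraction logic is not shown here.
    FACTS = ["code_context", "github_issue", "stack_trace", "failing_test",
             "error_message", "angelic_values", "docstring"]

    def build_prompt(bug_description, selected_facts):
        """Assemble a repair prompt from a bug description and chosen facts."""
        sections = [f"### {name}\n{text}" for name, text in selected_facts.items()]
        return "\n\n".join(["Fix the following bug.", bug_description] + sections)

    def select_facts(available, score):
        """Return the highest-scoring fact subset. `score` is a hypothetical
        estimator of repair success probability for a fact combination,
        standing in for Maniple's per-bug statistical model."""
        best_subset, best_score = {}, float("-inf")
        names = list(available)
        for r in range(1, len(names) + 1):
            for combo in combinations(names, r):
                s = score(combo)
                if s > best_score:
                    best_subset = {n: available[n] for n in combo}
                    best_score = s
        return best_subset

Exhaustive enumeration is feasible here only because seven facts yield at most 127 non-empty subsets; the non-monotonicity reported in the abstract is why simply including every available fact is not the best strategy.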

Type: Proceedings paper
Title: The Fact Selection Problem in LLM-Based Program Repair
Event: 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE)
Dates: 26 Apr 2025 - 6 May 2025
ISBN-13: 979-8-3315-0569-1
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/ICSE55347.2025.00162
Publisher version: https://doi.org/10.1109/icse55347.2025.00162
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Automated program repair, large language models, prompt engineering
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI: https://discovery.ucl.ac.uk/id/eprint/10214057
