Parasaram, Nikhil;
(2024)
Synergising Program Analysis and Machine Learning for Program Repair.
Doctoral thesis (Ph.D), UCL (University College London).
Text: Nikhil_s_Thesis__Corrections_-5.pdf - Submitted Version (1MB)
Abstract
Automated program repair (APR) enhances software quality by fixing bugs automatically, but it faces challenges due to software complexity. The vast number of possible patches makes exhaustive search impractical, and identifying correct patches is difficult because tools may generate incorrect fixes that overfit the test cases. Mitigating these challenges involves leveraging the structure of code, which consists of a formal channel (execution semantics) and a natural language channel (comments, variable names). Machine learning excels at interpreting the natural language channel using large datasets but struggles to generate semantically correct patches. Conversely, program analysis provides detailed insights into program semantics. Combining the two can address these challenges, using program analysis for execution specifics and machine learning for the natural aspects of code, such as identifiers and comments.

This thesis consists of four works:
• Chapter 3 advances semantic repair by synthesizing patches with side effects, employing symbolic execution with state merging and effective patch prioritization to repair bugs in open-source projects.
• Chapter 4 reduces the search space by using neural networks to learn variable information, ranking variables and patch templates to improve accuracy and reduce test overfitting. This enhances existing approaches, allowing them to repair previously unfixable bugs by leveraging program namespace information.
• Chapter 5 leverages abstract interpretation and fuzzing to probabilistically approximate reachable program states. By focusing on high-probability states, PSP boosts performance in abstract interpretation, symbolic execution, and patch prioritization, benefiting the strategies discussed in Chapters 3 and 4.
• Chapter 6 addresses the challenge of identifying the most relevant facts, such as test errors and angelic values, for constructing effective prompts for LLM-based APR. Extracted through program analysis, these facts build prompts whose effectiveness varies across different bugs. We develop a strategy to select facts tailored to each specific bug, significantly enhancing the effectiveness of LLMs in APR.
Type: | Thesis (Doctoral) |
---|---|
Qualification: | Ph.D |
Title: | Synergising Program Analysis and Machine Learning for Program Repair |
Open access status: | An open access version is available from UCL Discovery |
Language: | English |
Additional information: | Copyright © The Author 2024. Original content in this thesis is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) Licence (https://creativecommons.org/licenses/by-nc/4.0/). Any third-party copyright material present remains the property of its respective owner(s) and is licensed under its existing terms. Access may initially be restricted at the author’s request. |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science |
URI: | https://discovery.ucl.ac.uk/id/eprint/10198758 |