An Empirical Study of the Non-determinism of ChatGPT in Code Generation


Ouyang, Shuyin; Zhang, Jie M; Harman, Mark; Wang, Meng; (2024) An Empirical Study of the Non-determinism of ChatGPT in Code Generation. ACM Transactions on Software Engineering and Methodology 10.1145/3697010. (In press). Green open access


Abstract

There has been a recent explosion of research on Large Language Models (LLMs) for software engineering tasks, in particular code generation. However, results from LLMs can be highly unstable, nondeterministically returning very different code for the same prompt. Such non-determinism affects the correctness and consistency of the generated code, undermines developers’ trust in LLMs, and yields low reproducibility in LLM-based papers. Nevertheless, there is no work investigating how serious this non-determinism threat is. To fill this gap, this paper conducts an empirical study on the non-determinism of ChatGPT in code generation. We chose to study ChatGPT because it is already highly prevalent in the code generation research literature. We report results from a study of 829 code generation problems across three code generation benchmarks (i.e., CodeContests, APPS, and HumanEval) with three aspects of code similarity: semantic similarity, syntactic similarity, and structural similarity. Our results reveal that ChatGPT exhibits a high degree of non-determinism under the default setting: the ratio of coding tasks with zero equal test output across different requests is 75.76%, 51.00%, and 47.56% for the three code generation datasets (i.e., CodeContests, APPS, and HumanEval), respectively. In addition, we find that setting the temperature to 0 does not guarantee determinism in code generation, although it does bring less non-determinism than the default configuration (temperature = 1). In order to put LLM-based research on firmer scientific foundations, researchers need to take non-determinism into account when drawing their conclusions.
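As a rough illustration of the headline metric in the abstract, the sketch below computes the share of tasks for which no two requests produced the same test output. It is not taken from the paper: the data layout, the function names `has_no_equal_outputs` and `zero_equal_output_ratio`, and the reading of "zero equal test output" as "no pair of requests agrees" are all assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical data shape (an assumption, not the paper's format):
# task id -> one tuple of test outputs per request to the model.
TaskOutputs = Dict[str, List[Tuple[str, ...]]]


def has_no_equal_outputs(outputs_per_request: Sequence[Tuple[str, ...]]) -> bool:
    """Return True when no two requests produced identical test outputs."""
    return all(a != b for a, b in combinations(outputs_per_request, 2))


def zero_equal_output_ratio(tasks: TaskOutputs) -> float:
    """Fraction of tasks whose requests never agree on their test outputs."""
    if not tasks:
        raise ValueError("no tasks given")
    disagreeing = sum(has_no_equal_outputs(outs) for outs in tasks.values())
    return disagreeing / len(tasks)


# Toy data, invented purely for illustration.
example: TaskOutputs = {
    "task_a": [("1", "2"), ("1", "3"), ("error",)],  # every request differs
    "task_b": [("ok",), ("ok",), ("fail",)],         # two requests agree
}
ratio = zero_equal_output_ratio(example)
print(f"{ratio:.2%} of tasks have zero equal test output")
```

On real data, each tuple would hold the outputs obtained by running one generated solution against the benchmark's test cases, with one tuple per repeated request for the same prompt.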

Type:	Article
Title:	An Empirical Study of the Non-determinism of ChatGPT in Code Generation
Open access status:	An open access version is available from UCL Discovery
DOI:	10.1145/3697010
Publisher version:	http://dx.doi.org/10.1145/3697010
Language:	English
Additional information:	This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification:	UCL; UCL > Provost and Vice Provost Offices > UCL BEAMS; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI:	https://discovery.ucl.ac.uk/id/eprint/10198256
