Antaki, Fares; Milad, Daniel; Chia, Mark A; Giguère, Charles-Édouard; Touma, Samir; El-Khoury, Jonathan; Keane, Pearse A (2023) Capabilities of GPT-4 in ophthalmology: an analysis of model entropy and progress towards human-level medical question answering. British Journal of Ophthalmology. 10.1136/bjo-2023-324438. (In press).
Abstract
Background: Evidence on the performance of Generative Pre-trained Transformer 4 (GPT-4), a large language model (LLM), in the ophthalmology question-answering domain is needed.

Methods: We tested GPT-4 on two 260-question multiple-choice question sets from the Basic and Clinical Science Course (BCSC) Self-Assessment Program and the OphthoQuestions question banks. We compared the accuracy of GPT-4 models with varying temperatures (creativity setting) and evaluated their responses in a subset of questions. We also compared the best-performing GPT-4 model to GPT-3.5 and to historical human performance.

Results: GPT-4-0.3 (GPT-4 with a temperature of 0.3) achieved the highest accuracy among GPT-4 models, with 75.8% on the BCSC set and 70.0% on the OphthoQuestions set. The combined accuracy was 72.9%, which represents an 18.3% raw improvement in accuracy compared with GPT-3.5 (p<0.001). Human graders preferred responses from models with a temperature higher than 0 (more creative). Exam section, question difficulty and cognitive level were all predictive of GPT-4-0.3 answer accuracy. GPT-4-0.3's performance was numerically superior to human performance on the BCSC (75.8% vs 73.3%) and OphthoQuestions (70.0% vs 63.0%), but the difference was not statistically significant (p=0.55 and p=0.09).

Conclusion: GPT-4, an LLM trained on non-ophthalmology-specific data, performs significantly better than its predecessor on simulated ophthalmology board-style exams. Remarkably, its performance tended to be superior to historical human performance, but that difference was not statistically significant in our study.
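The Methods amount to an evaluation harness: query the model at each temperature setting on every multiple-choice question, score the answers, and compare accuracies between models. A minimal Python sketch of that kind of harness is given below, under stated assumptions: `ask_model` is a hypothetical placeholder for whatever chat-completion client the authors actually used, the question records and example counts are illustrative, and the chi-square test is one common choice for comparing accuracies rather than necessarily the paper's own statistical procedure.

```python
# Illustrative sketch only: the paper does not publish its evaluation harness.
# `ask_model` is a hypothetical placeholder for a chat-completion client.
from scipy.stats import chi2_contingency


def ask_model(stem: str, options: dict[str, str], temperature: float) -> str:
    """Placeholder for a chat-completion call (e.g. GPT-4) at a given temperature.

    A real implementation would prompt the model with the question stem and the
    lettered options, then parse the single letter it returns.
    """
    raise NotImplementedError("swap in your own API client here")


def accuracy(questions: list[dict], temperature: float) -> float:
    """Fraction of multiple-choice questions answered correctly at one temperature."""
    correct = sum(
        ask_model(q["stem"], q["options"], temperature) == q["answer"]
        for q in questions
    )
    return correct / len(questions)


def compare_models(correct_a: int, correct_b: int, n: int) -> float:
    """Chi-square test on a 2x2 correct/incorrect table for two models.

    (The paper's own statistical procedure may differ; this is one common choice.)
    """
    table = [[correct_a, n - correct_a],
             [correct_b, n - correct_b]]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value


# Example with illustrative counts over the 520 combined questions,
# roughly matching the reported 72.9% vs. 54.6% accuracies:
# p = compare_models(correct_a=379, correct_b=284, n=520)
```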
| Type | Article |
|---|---|
| Title | Capabilities of GPT-4 in ophthalmology: an analysis of model entropy and progress towards human-level medical question answering |
| Location | England |
| Open access status | An open access version is available from UCL Discovery |
| DOI | 10.1136/bjo-2023-324438 |
| Publisher version | https://doi.org/10.1136/bjo-2023-324438 |
| Language | English |
| Additional information | This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions. |
| UCL classification | UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Institute of Ophthalmology |
| URI | https://discovery.ucl.ac.uk/id/eprint/10182077 |