Mikhail, David;
Farah, Andrew;
Milad, Jason;
Nassrallah, Wissam;
Mihalache, Andrew;
Milad, Daniel;
Antaki, Fares;
... Duval, Renaud;
(2025)
Performance of DeepSeek-R1 in ophthalmology: an evaluation of clinical decision-making and cost-effectiveness.
British Journal of Ophthalmology
10.1136/bjo-2025-327360.
(In press).
Abstract
Background/aims: To compare the performance and cost-effectiveness of DeepSeek-R1 with OpenAI o1 in diagnosing and managing ophthalmology clinical cases.

Methods: In this cross-sectional study, a total of 300 clinical cases spanning 10 ophthalmology subspecialties were collected from StatPearls, each with a multiple-choice question on diagnosis or management. DeepSeek-R1 was accessed through its public chat interface, while OpenAI o1 was queried via its Application Programming Interface with a standardised temperature of 0.3. Both models were prompted using plan-and-solve+. Performance was calculated as the proportion of correct answers. McNemar’s test was employed to compare the two models’ performance on paired data. Intermodel agreement for correct diagnoses was evaluated via Cohen’s kappa. Token-based cost analyses were performed to estimate the comparative expenditures of running each model at scale, including input prompts and model-generated output.

Results: DeepSeek-R1 and OpenAI o1 achieved an identical overall performance of 82.0% (n=246/300; 95% CI: 77.3 to 85.9). Subspecialty-specific analysis revealed numerical variation in performance, though none of these comparisons reached statistical significance (p>0.05). Agreement in performance between the models was moderate overall (κ=0.503, p<0.001), with substantial agreement in refractive management/intervention (κ=0.698, p<0.001) and moderate agreement in retina/vitreous (κ=0.561, p<0.001) and ocular pathology/oncology (κ=0.495, p<0.01) cases. Cost analysis indicated an approximately 15-fold reduction in per-query, token-related expenses when using DeepSeek-R1 vs OpenAI o1 for the same workload.

Conclusions: DeepSeek-R1 shows strong diagnostic and management performance comparable to OpenAI o1 across ophthalmic subspecialties, while significantly reducing costs. These results support its use as a cost-effective, open-weight alternative to proprietary models.
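The statistics named in the abstract can be sketched in a few lines of standard-library Python. This is an illustration, not the authors' analysis code. The abstract does not say which confidence interval method was used; the Wilson score interval is an assumption that happens to reproduce the reported bounds for 246/300. The paired 2x2 counts below are hypothetical: they were chosen only to be consistent with the reported marginals (246 correct per model) and overall κ, and do not come from the paper. The function names are likewise invented for this sketch.

```python
from math import comb, sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def cohen_kappa(both_right, only_a, only_b, both_wrong):
    """Cohen's kappa for two raters scored correct/incorrect on the same cases."""
    n = both_right + only_a + only_b + both_wrong
    observed = (both_right + both_wrong) / n
    a_right = (both_right + only_a) / n
    b_right = (both_right + only_b) / n
    expected = a_right * b_right + (1 - a_right) * (1 - b_right)
    return (observed - expected) / (1 - expected)


def mcnemar_exact(only_a, only_b):
    """Two-sided exact McNemar test on the discordant pairs."""
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Reported in the abstract: 246 of 300 cases correct for each model.
lo, hi = wilson_interval(246, 300)
print(f"accuracy 95% CI: {100 * lo:.1f} to {100 * hi:.1f}")  # 77.3 to 85.9

# HYPOTHETICAL paired table (not from the paper): both correct, only model A
# correct, only model B correct, both incorrect. Rows and columns sum to 246/54.
table = (224, 22, 22, 32)
kappa = cohen_kappa(*table)
p_value = mcnemar_exact(table[1], table[2])
print(f"kappa = {kappa:.3f}, McNemar p = {p_value:.3f}")  # kappa = 0.503, p = 1.000
```

With equal accuracy for both models the discordant counts are necessarily balanced, so McNemar's test returns p = 1 whatever the level of agreement. Cohen's kappa answers a different question: whether the two models tend to get the same cases right.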
Type: | Article |
---|---|
Title: | Performance of DeepSeek-R1 in ophthalmology: an evaluation of clinical decision-making and cost-effectiveness |
Location: | England |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1136/bjo-2025-327360 |
Publisher version: | https://doi.org/10.1136/bjo-2025-327360 |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
UCL classification: | UCL; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences; UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Institute of Ophthalmology |
URI: | https://discovery.ucl.ac.uk/id/eprint/10212067 |