Some Hidden Traps of Confidence Intervals in Medical Image Segmentation: Coverage Issues



André, P; Heitz, C; Christodoulou, E; Reinke, A; Sudre, CH; Antonelli, M; Cardoso, MJ; ... Colliot, O (2026) Some Hidden Traps of Confidence Intervals in Medical Image Segmentation: Coverage Issues. In: Bridging Regulatory Science and Medical Imaging Evaluation; and Distributed, Collaborative, and Federated Learning (MICCAI 2025). (pp. 15-24). Springer Nature Switzerland

Full text: Sudre_MICCAI_2025_CIs.pdf (380kB). Access restricted to UCL open access staff until 26 September 2026.

Abstract

Medical imaging AI models are usually assessed by reporting an empirical summary statistic of the performance metric, most commonly the mean or median. Recent work has shown that most studies overlook the uncertainty of these estimates, potentially leading to misleading conclusions and hampering clinical translation of medical imaging AI models. To address this issue, systematic reporting of confidence intervals (CIs) has been recommended, but numerous CI methods exist, and there is very little literature on their behavior in medical imaging. A fundamental property of a CI method is its coverage, i.e., how often the intervals it produces contain the true population value of the statistic. This paper contributes towards filling this literature gap in the context of medical image segmentation, studying the coverage of five CI methods for arguably the two most common summary statistics, the mean and the median. To that end, we perform a large-scale analysis of CI coverage using non-parametric simulations based on benchmark instances representing diverse real-world distributions of two common segmentation metrics (Dice similarity coefficient and normalized surface distance). For the mean, all CI methods have decent coverage for most instances when sample sizes exceed 50, although there are exceptions. For CIs of the median, we unveil major pitfalls: two common bootstrap CI methods behave catastrophically on average, whereas another fails only on very degenerate distributions. We believe these pitfalls are important to communicate to the community and that these findings will contribute to future efforts to provide standardized guidelines on confidence interval reporting in medical imaging AI.
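To make the notion of coverage concrete, here is a minimal Python sketch of a non-parametric coverage simulation. It is not the authors' code and rests on stated assumptions. A finite set of per-case metric values is treated as the population (here a hypothetical synthetic set of Dice scores drawn from a Beta(8, 2) distribution, not data from the paper). Samples of size n are drawn from it with replacement, a CI is computed for each sample, and coverage is the fraction of CIs that contain the population value of the statistic. The percentile bootstrap is used only as one common example of a bootstrap CI method; the abstract does not say which five methods were studied. The helper names `make_hypothetical_dice_population`, `percentile_bootstrap_ci` and `estimate_coverage` are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Optional, Sequence, Tuple

Statistic = Callable[[Sequence[float]], float]


def make_hypothetical_dice_population(size: int, seed: int = 0) -> List[float]:
    """Synthetic, left-skewed per-case Dice scores in [0, 1] drawn from Beta(8, 2).

    Hypothetical stand-in for a benchmark instance; not data from the paper.
    """
    rng = random.Random(seed)
    return [rng.betavariate(8.0, 2.0) for _ in range(size)]


def percentile_bootstrap_ci(
    sample: Sequence[float],
    stat: Statistic,
    n_boot: int = 1000,
    alpha: float = 0.05,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Percentile bootstrap CI: the alpha/2 and 1 - alpha/2 quantiles of the
    statistic recomputed on n_boot resamples (with replacement) of the sample."""
    rng = rng if rng is not None else random.Random()
    n = len(sample)
    boot = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    # Simple order-statistic quantiles (no interpolation).
    lo = boot[int((alpha / 2) * (n_boot - 1))]
    hi = boot[int((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


def estimate_coverage(
    population: Sequence[float],
    stat: Statistic,
    n: int,
    n_sim: int = 500,
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of coverage: the fraction of simulated samples whose
    CI contains the population value of the statistic."""
    rng = random.Random(seed)
    true_value = stat(population)  # the target the CIs should capture
    hits = 0
    for _ in range(n_sim):
        # Non-parametric simulation: draw n cases with replacement from the
        # finite population of observed metric values.
        sample = rng.choices(population, k=n)
        lo, hi = percentile_bootstrap_ci(sample, stat, n_boot, alpha, rng)
        if lo <= true_value <= hi:
            hits += 1
    return hits / n_sim
```

For a quick run, call for example `estimate_coverage(pop, statistics.median, n=50, n_sim=100, n_boot=300)` and compare the result with the nominal level (0.95 for `alpha=0.05`); coverage well below the nominal level means the intervals are too narrow. Passing `statistics.fmean` instead studies the mean with the same loop, and increasing `n_sim` reduces the Monte Carlo error of the coverage estimate.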

Type:	Proceedings paper
Title:	Some Hidden Traps of Confidence Intervals in Medical Image Segmentation: Coverage Issues
Event:	Bridging Regulatory Science and Medical Imaging Evaluation; and Distributed, Collaborative, and Federated Learning First International Workshop, BRIDGE 2025, and 6th International Workshop, DeCaF 2025, Held in Conjunction with MICCAI 2025
ISBN-13:	9783032056658
DOI:	10.1007/978-3-032-05663-4_2
Publisher version:	https://doi.org/10.1007/978-3-032-05663-4_2
Language:	English
Additional information:	This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification:	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Cardiovascular Science > Population Science and Experimental Medicine > MRC Unit for Lifelong Hlth and Ageing
URI:	https://discovery.ucl.ac.uk/id/eprint/10216230
