André, P;
Heitz, C;
Christodoulou, E;
Reinke, A;
Sudre, CH;
Antonelli, M;
Cardoso, MJ;
... Colliot, O;
(2026)
Some Hidden Traps of Confidence Intervals in Medical Image Segmentation: Coverage Issues.
In:
Bridging Regulatory Science and Medical Imaging Evaluation; and Distributed, Collaborative, and Federated Learning (MICCAI 2025).
(pp. 15-24).
Springer Nature Switzerland
Text: Sudre_MICCAI_2025_CIs.pdf. Access restricted to UCL open access staff until 26 September 2026.
Abstract
Medical imaging AI models are usually assessed by reporting an empirical summary statistic of the performance metric, most commonly the mean or median. Recent work has shown that most studies overlook the uncertainty of these estimates, potentially leading to misleading conclusions and hampering clinical translation of medical imaging AI models. To address this issue, systematic reporting of confidence intervals (CIs) has been recommended, but numerous different CI methods exist, and there is very little literature on their behavior in medical imaging. A fundamental property of a CI method is its coverage. This paper contributes towards filling this literature gap in the context of medical image segmentation, studying the coverage of five CI methods for the two arguably most common summary statistics, the mean and the median. To that purpose, we perform a large-scale analysis of CI coverage using non-parametric simulations based on benchmark instances representing diverse real-world distributions of two common segmentation metrics (Dice similarity coefficient and normalized surface distance). For the mean, all CI methods have decent coverage for most instances when sample sizes exceed 50, even though there are exceptions. For CIs of the median, we unveil major pitfalls: two common bootstrap CI methods have a catastrophic behavior on average whereas another only fails on very degenerate distributions. We believe these pitfalls are important to communicate to the community and that these findings will contribute to future efforts to provide standardized guidelines on confidence interval reporting in medical imaging AI.
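For readers unfamiliar with the notion of coverage, the following is a minimal sketch of how the coverage of a CI method could be estimated by non-parametric simulation. It is not the authors' code or protocol: the percentile bootstrap is used only as one familiar CI method, and the synthetic Beta-distributed "population" of Dice-like scores, the function names, and all parameter values (`n`, `n_boot`, `n_sim`) are assumptions made for illustration.

```python
import numpy as np


def percentile_bootstrap_ci(sample, stat, n_boot=500, alpha=0.05, rng=None):
    """Percentile bootstrap CI for `stat` (e.g. np.mean or np.median)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(sample)
    # Draw n_boot resamples (with replacement) as rows of an index matrix.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_stats = stat(sample[idx], axis=1)
    lo, hi = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi


def estimate_coverage(population, stat, n=50, n_sim=300, seed=0):
    """Fraction of simulated test sets whose CI contains the true value.

    The "true" value is the statistic computed on the full population,
    which here plays the role of a benchmark instance (an assumption).
    """
    rng = np.random.default_rng(seed)
    true_value = stat(population, axis=0)
    hits = 0
    for _ in range(n_sim):
        # Non-parametric simulation: resample a test set of size n.
        sample = rng.choice(population, size=n, replace=True)
        lo, hi = percentile_bootstrap_ci(sample, stat, rng=rng)
        hits += int(lo <= true_value <= hi)
    return hits / n_sim


# Hypothetical population of Dice-like scores in [0, 1] (synthetic data).
population = np.random.default_rng(42).beta(8, 2, size=2000)

cov_mean = estimate_coverage(population, np.mean)
cov_median = estimate_coverage(population, np.median)
print(f"Estimated coverage of nominal 95% CI, mean:   {cov_mean:.3f}")
print(f"Estimated coverage of nominal 95% CI, median: {cov_median:.3f}")
```

A method with good coverage yields an estimate close to the nominal level (0.95 here). The smooth synthetic distribution used above will not necessarily reproduce the failures reported in the abstract, which concern real benchmark distributions.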