Designing image segmentation studies: Statistical power, sample size and reference standard quality.

Gibson, E; Hu, Y; Huisman, HJ; Barratt, DC (2017) Designing image segmentation studies: Statistical power, sample size and reference standard quality. Med Image Anal, 42, pp. 44-59. 10.1016/j.media.2017.07.004. Green open access

Gibson_1-s2.0-S1361841517301123-main.pdf - Published Version

Abstract

Segmentation algorithms are typically evaluated by comparison to an accepted reference standard. The cost of generating accurate reference standards for medical image segmentation can be substantial. Since the study cost and the likelihood of detecting a clinically meaningful difference in accuracy both depend on the size and quality of the study reference standard, balancing these trade-offs supports the efficient use of research resources. In this work, we derive a statistical power calculation that enables researchers to estimate the appropriate sample size to detect clinically meaningful differences in segmentation accuracy (i.e. the proportion of voxels matching the reference standard) between two algorithms. Furthermore, we derive a formula to relate reference standard errors to their effect on the sample sizes of studies using lower-quality (but potentially more affordable and practically available) reference standards. The accuracy of the derived sample size formula was estimated through Monte Carlo simulation, demonstrating, with 95% confidence, a predicted statistical power within 4% of simulated values across a range of model parameters. This corresponds to sample size errors of less than 4 subjects and errors in the detectable accuracy difference of less than 0.6%. The applicability of the formula to real-world data was assessed using bootstrap resampling simulations for pairs of algorithms from the PROMISE12 prostate MR segmentation challenge data set. The model predicted the simulated power for the majority of algorithm pairs within 4% for simulated experiments using a high-quality reference standard and within 6% for simulated experiments using a low-quality reference standard. A case study, also based on the PROMISE12 data, illustrates how the formulae can be used to evaluate whether to use a lower-quality reference standard in a prostate segmentation study.
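
As a rough illustration of checking an analytic power prediction against Monte Carlo simulation, the sketch below compares the two for a generic paired design. It is not the formula derived in the paper: it assumes, purely for illustration, that the per-subject difference in segmentation accuracy between two algorithms is Gaussian with a known mean and standard deviation, and it uses a textbook normal-approximation power calculation. The function names (`analytic_power`, `simulated_power`, `smallest_sample_size`) and all parameter values are hypothetical.

```python
import math
import random
import statistics

_NORMAL = statistics.NormalDist()


def analytic_power(n_subjects, mean_diff, sd_diff, alpha=0.05):
    """Normal-approximation power of a two-sided paired test (textbook formula)."""
    z_crit = _NORMAL.inv_cdf(1 - alpha / 2)
    # Standardised effect: expected test statistic under the alternative.
    shift = abs(mean_diff) * math.sqrt(n_subjects) / sd_diff
    return _NORMAL.cdf(shift - z_crit) + _NORMAL.cdf(-shift - z_crit)


def simulated_power(n_subjects, mean_diff, sd_diff, alpha=0.05,
                    n_trials=2000, seed=0):
    """Fraction of simulated studies that reject the null hypothesis."""
    rng = random.Random(seed)
    z_crit = _NORMAL.inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_trials):
        # Assumption: per-subject accuracy differences are i.i.d. Gaussian.
        diffs = [rng.gauss(mean_diff, sd_diff) for _ in range(n_subjects)]
        std_err = statistics.stdev(diffs) / math.sqrt(n_subjects)
        if abs(statistics.fmean(diffs) / std_err) > z_crit:
            rejections += 1
    return rejections / n_trials


def smallest_sample_size(mean_diff, sd_diff, target_power=0.8,
                         alpha=0.05, max_n=100_000):
    """Smallest n whose analytic power reaches the target, by linear search."""
    for n in range(2, max_n + 1):
        if analytic_power(n, mean_diff, sd_diff, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within max_n subjects")


if __name__ == "__main__":
    # Hypothetical inputs: 2% mean accuracy difference, 5% per-subject SD.
    n = smallest_sample_size(0.02, 0.05)
    print("subjects needed:", n)
    print("analytic power: ", round(analytic_power(n, 0.02, 0.05), 3))
    print("simulated power:", round(simulated_power(n, 0.02, 0.05), 3))
```

With a modest number of subjects, the simulated and analytic values should agree to within a few percentage points. The same comparison could be repeated over a grid of parameters to see where the approximation starts to break down.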

Type: Article
Title: Designing image segmentation studies: Statistical power, sample size and reference standard quality.
Location: Netherlands
Open access status: An open access version is available from UCL Discovery
DOI: 10.1016/j.media.2017.07.004
Publisher version: https://doi.org/10.1016/j.media.2017.07.004
Language: English
Additional information: © 2017 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
Keywords: Image segmentation, Reference standard, Segmentation accuracy, Statistical power
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Med Phys and Biomedical Eng
URI: https://discovery.ucl.ac.uk/id/eprint/1570086