Bhuva, AN; Bai, W; Lau, C; Davies, RH; Ye, Y; Bulluck, H; McAlindon, E; ... Manisty, CH; (2019) A Multicenter, Scan-Rescan, Human and Machine Learning CMR Study to Test Generalizability and Precision in Imaging Biomarker Analysis. Circulation: Cardiovascular Imaging, 12 (10), Article e009214. 10.1161/circimaging.119.009214.
Text: CIRCIMAGING.119.009214.pdf - Published Version (560kB)
Abstract
BACKGROUND: Automated analysis of cardiac structure and function using machine learning (ML) has great potential, but is currently hindered by poor generalizability. Comparison is traditionally against clinicians as a reference, ignoring inherent human inter- and intraobserver error, and ensuring that ML cannot demonstrate superiority. Measuring precision (scan:rescan reproducibility) addresses this. We compared precision of ML and humans using a multicenter, multi-disease, scan:rescan cardiovascular magnetic resonance data set.

METHODS: One hundred ten patients (5 disease categories, 5 institutions, 2 scanner manufacturers, and 2 field strengths) underwent scan:rescan cardiovascular magnetic resonance (96% within one week). After identification of the most precise human technique, left ventricular chamber volumes, mass, and ejection fraction were measured by an expert, a trained junior clinician, and a fully automated convolutional neural network trained on 599 independent multicenter disease cases. Scan:rescan coefficient of variation and 1000 bootstrapped 95% CIs were calculated and compared using mixed linear effects models.

RESULTS: Clinicians can be confident in detecting a 9% change in left ventricular ejection fraction, with greater than half of coefficient of variation attributable to intraobserver variation. Expert, trained junior, and automated scan:rescan precision were similar (for left ventricular ejection fraction, coefficient of variation 6.1 [5.2%–7.1%], P=0.2581; 8.3 [5.6%–10.3%], P=0.3653; 8.8 [6.1%–11.1%], P=0.8620). Automated analysis was 186× faster than humans (0.07 versus 13 minutes).

CONCLUSIONS: Automated ML analysis is faster with similar precision to the most precise human techniques, even when challenged with real-world scan:rescan data. Assessment of multicenter, multi-vendor, multi-field-strength scan:rescan data (available at www.thevolumesresource.com) permits a generalizable assessment of ML precision and may facilitate direct translation of ML to clinical practice.
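The scan:rescan coefficient of variation with a bootstrapped 95% CI described in the methods can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so this sketch assumes the common within-subject definition for two repeats (within-subject SD divided by the overall mean) and a simple percentile bootstrap over patients. The function names and the sample ejection fraction values are hypothetical and are not taken from the study.

```python
import math
import random


def within_subject_cov(scan, rescan):
    """Within-subject coefficient of variation (%) for paired scan:rescan values.

    Assumed definition (not confirmed by the abstract):
    within-subject SD = sqrt(sum(d^2) / (2n)), divided by the overall mean.
    """
    if len(scan) != len(rescan) or not scan:
        raise ValueError("scan and rescan must be non-empty and of equal length")
    n = len(scan)
    wsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(scan, rescan)) / (2 * n))
    grand_mean = (sum(scan) + sum(rescan)) / (2 * n)
    return 100.0 * wsd / grand_mean


def bootstrap_ci(scan, rescan, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the coefficient of variation.

    Patients (pairs) are resampled with replacement, keeping each
    scan paired with its own rescan.
    """
    rng = random.Random(seed)
    n = len(scan)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(
            within_subject_cov([scan[i] for i in idx], [rescan[i] for i in idx])
        )
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical ejection fraction values (%) for six patients, illustration only.
scan_ef = [62.0, 55.0, 41.0, 67.0, 35.0, 58.0]
rescan_ef = [60.0, 57.0, 44.0, 65.0, 33.0, 61.0]

cov = within_subject_cov(scan_ef, rescan_ef)
ci_low, ci_high = bootstrap_ci(scan_ef, rescan_ef)
print(f"CoV {cov:.1f}% [{ci_low:.1f}%-{ci_high:.1f}%]")
```

Resampling whole patients rather than individual scans keeps each scan:rescan pair intact, which is what makes the interval reflect between-patient sampling variability. The mixed linear effects comparison mentioned in the methods is not reproduced here.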