UCL Discovery

Machine learning reduced workload with minimal risk of missing studies: development and evaluation of an RCT classifier for Cochrane Reviews

Thomas, J; McDonald, S; Noel-Storr, A; Shemilt, I; Elliott, J; Mavergames, C; Marshall, IJ; (2020) Machine learning reduced workload with minimal risk of missing studies: development and evaluation of an RCT classifier for Cochrane Reviews. Journal of Clinical Epidemiology 10.1016/j.jclinepi.2020.11.003. (In press). Green open access


Abstract

BACKGROUND: To describe the development, calibration and evaluation of a machine learning classifier designed to reduce study identification workload in Cochrane for producing systematic reviews.

METHODS: A machine learning classifier for retrieving RCTs was developed (the ‘Cochrane RCT Classifier’), with the algorithm trained using a dataset of title-abstract records from Embase, manually labelled by the Cochrane Crowd. The classifier was then calibrated using a further dataset of similar records manually labelled by the Clinical Hedges team, aiming for 99% recall. Finally, the recall of the calibrated classifier was evaluated using records of RCTs included in Cochrane Reviews that had abstracts of sufficient length to allow machine classification.

RESULTS: The Cochrane RCT Classifier was trained using 280,620 records (20,454 of which reported RCTs). A classification threshold was set using 49,025 calibration records (1,587 of which reported RCTs) and our bootstrap validation found the classifier had recall of 0.99 (95% CI 0.98 to 0.99) and precision of 0.08 (95% CI 0.06 to 0.12) in this dataset. The final, calibrated RCT classifier correctly retrieved 43,783 (99.5%) of 44,007 RCTs included in Cochrane Reviews but missed 224 (0.5%). Older records were more likely to be missed than those more recently published.

CONCLUSIONS: The Cochrane RCT Classifier can reduce manual study identification workload for Cochrane reviews, with a very low and acceptable risk of missing eligible RCTs. This classifier now forms part of the Evidence Pipeline, an integrated workflow deployed within Cochrane to help improve the efficiency of the study identification processes that support systematic review production.
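
The calibration and evaluation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy scores, and the percentile bootstrap below are assumptions made for illustration only. The sketch picks a score threshold that reaches a target recall on labelled calibration records, reports precision and recall at that threshold, and resamples the records to obtain an approximate 95% interval for recall.

```python
import math
import random


def recall(true_positives, false_negatives):
    """Share of true RCTs that were retrieved."""
    return true_positives / (true_positives + false_negatives)


def calibrate_threshold(scores, labels, target_recall=0.99):
    """Highest score cut-off that still retrieves at least
    target_recall of the positive (RCT) calibration records."""
    positive_scores = sorted(
        (s for s, y in zip(scores, labels) if y), reverse=True
    )
    if not positive_scores:
        raise ValueError("need at least one positive record")
    needed = math.ceil(target_recall * len(positive_scores))
    return positive_scores[needed - 1]


def precision_recall_at(scores, labels, threshold):
    """Precision and recall when records scoring >= threshold are retrieved."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y and s >= threshold)
    fp = sum(1 for s, y in pairs if not y and s >= threshold)
    fn = sum(1 for s, y in pairs if y and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return precision, recall(tp, fn)


def bootstrap_recall_ci(scores, labels, threshold, n_boot=1000, seed=0):
    """Generic percentile bootstrap for recall (an assumed method,
    not necessarily the one used in the paper)."""
    rng = random.Random(seed)
    records = list(zip(scores, labels))
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(records) for _ in records]
        positives = [s for s, y in sample if y]
        if positives:  # skip resamples that contain no RCTs
            hits = sum(1 for s in positives if s >= threshold)
            estimates.append(hits / len(positives))
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[min(len(estimates) - 1, int(0.975 * len(estimates)))]
    return lower, upper


# Evaluation figures reported in the abstract: 43,783 retrieved, 224 missed.
reported_recall = recall(43_783, 224)  # about 0.995

# Hypothetical toy calibration set (scores and labels are made up).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1]
toy_threshold = calibrate_threshold(toy_scores, toy_labels, target_recall=0.75)
toy_precision, toy_recall = precision_recall_at(toy_scores, toy_labels, toy_threshold)
```

Setting the threshold on a separate calibration set, as the abstract describes, keeps the recall target independent of the training data; the trade-off is visible in the reported precision of 0.08, since a cut-off low enough to retrieve 99% of RCTs also admits many non-RCT records.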

Type: Article
Title: Machine learning reduced workload with minimal risk of missing studies: development and evaluation of an RCT classifier for Cochrane Reviews
Location: United States
Open access status: An open access version is available from UCL Discovery
DOI: 10.1016/j.jclinepi.2020.11.003
Publisher version: https://doi.org/10.1016/j.jclinepi.2020.11.003
Language: English
Additional information: © 2020 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Keywords: Machine learning; study classifiers; searching; information retrieval; methods/methodology; randomised controlled trials; systematic reviews; automation; crowdsourcing; Cochrane Library
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Education
UCL > Provost and Vice Provost Offices > School of Education > UCL Institute of Education
UCL > Provost and Vice Provost Offices > School of Education > UCL Institute of Education > IOE - Social Research Institute
URI: https://discovery.ucl.ac.uk/id/eprint/10115089