%0 Journal Article
%@ 1527-974X
%A Wallace, BC
%A Noel-Storr, A
%A Marshall, IJ
%A Cohen, AM
%A Smalheiser, NR
%A Thomas, J
%D 2017
%F discovery:10039588
%J Journal of the American Medical Informatics Association
%K Machine learning, evidence-based medicine, crowdsourcing, human computation, natural language processing
%N 6
%P 1165-1168
%T Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach
%U https://discovery.ucl.ac.uk/id/eprint/10039588/
%V 24
%X OBJECTIVES: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. METHODS: We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise. RESULTS: Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%-99% recall) with substantially less effort (we observed a reduction of around 60%-80%) than relying on manual screening alone. CONCLUSIONS: Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks.
%Z © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com