eprintid: 10039588
rev_number: 26
eprint_status: archive
userid: 608
dir: disk0/10/03/95/88
datestamp: 2017-12-08 12:43:41
lastmod: 2021-09-20 22:18:07
status_changed: 2017-12-08 12:43:41
type: article
metadata_visibility: show
creators_name: Wallace, BC
creators_name: Noel-Storr, A
creators_name: Marshall, IJ
creators_name: Cohen, AM
creators_name: Smalheiser, NR
creators_name: Thomas, J
title: Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach
ispublished: pub
divisions: UCL
divisions: B16
divisions: B14
divisions: J81
keywords: Machine learning, evidence-based medicine, crowdsourcing, human computation, natural language processing
note: © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/),
which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact
journals.permissions@oup.com
abstract: OBJECTIVES: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. METHODS: We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise. RESULTS: Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%-99% recall) with substantially less effort (we observed a reduction of around 60%-80%) than relying on manual screening alone. CONCLUSIONS: Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks.
date: 2017-11
date_type: published
official_url: http://doi.org/10.1093/jamia/ocx053
oa_status: green
full_text_type: pub
language: eng
primo: open
primo_central: open_green
article_type_text: Journal Article
verified: verified_manual
elements_id: 1514319
doi: 10.1093/jamia/ocx053
lyricists_name: Thomas, James
lyricists_id: JTHOA32
actors_name: Flynn, Bernadette
actors_id: BFFLY94
actors_role: owner
full_text_status: public
publication: Journal of the American Medical Informatics Association
volume: 24
number: 6
pagerange: 1165-1168
issn: 1527-974X
citation: Wallace, BC; Noel-Storr, A; Marshall, IJ; Cohen, AM; Smalheiser, NR; Thomas, J; (2017) Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach. Journal of the American Medical Informatics Association, 24 (6) pp. 1165-1168. 10.1093/jamia/ocx053 <https://doi.org/10.1093/jamia%2Focx053>. Green open access
document_url: https://discovery.ucl.ac.uk/id/eprint/10039588/1/ocx053.pdf