eprintid: 10039588
rev_number: 26
eprint_status: archive
userid: 608
dir: disk0/10/03/95/88
datestamp: 2017-12-08 12:43:41
lastmod: 2021-09-20 22:18:07
status_changed: 2017-12-08 12:43:41
type: article
metadata_visibility: show
creators_name: Wallace, BC
creators_name: Noel-Storr, A
creators_name: Marshall, IJ
creators_name: Cohen, AM
creators_name: Smalheiser, NR
creators_name: Thomas, J
title: Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach
ispublished: pub
divisions: UCL
divisions: B16
divisions: B14
divisions: J81
keywords: Machine learning, evidence-based medicine, crowdsourcing, human computation, natural language processing
note: © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
abstract: OBJECTIVES: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. METHODS: We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise. RESULTS: Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%-99% recall) with substantially less effort (we observed a reduction of around 60%-80%) than relying on manual screening alone. CONCLUSIONS: Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks.
date: 2017-11
date_type: published
official_url: http://doi.org/10.1093/jamia/ocx053
oa_status: green
full_text_type: pub
language: eng
primo: open
primo_central: open_green
article_type_text: Journal Article
verified: verified_manual
elements_id: 1514319
doi: 10.1093/jamia/ocx053
lyricists_name: Thomas, James
lyricists_id: JTHOA32
actors_name: Flynn, Bernadette
actors_id: BFFLY94
actors_role: owner
full_text_status: public
publication: Journal of the American Medical Informatics Association
volume: 24
number: 6
pagerange: 1165-1168
issn: 1527-974X
citation: Wallace, BC; Noel-Storr, A; Marshall, IJ; Cohen, AM; Smalheiser, NR; Thomas, J; (2017) Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach. Journal of the American Medical Informatics Association, 24 (6) pp. 1165-1168. 10.1093/jamia/ocx053 <https://doi.org/10.1093/jamia%2Focx053>. Green open access
document_url: https://discovery.ucl.ac.uk/id/eprint/10039588/1/ocx053.pdf