Accessible data curation and analytics for international-scale citizen science datasets

Murray, B; Kerfoot, E; Chen, L; Deng, J; Graham, MS; Sudre, CH; Molteni, E; ... Ourselin, S; (2021) Accessible data curation and analytics for international-scale citizen science datasets. Scientific Data, 8 (1), Article 297. 10.1038/s41597-021-01071-x. Green open access

s41597-021-01071-x.pdf - Published Version


Abstract

The Covid Symptom Study, a smartphone-based surveillance study on COVID-19 symptoms in the population, is an exemplar of big data citizen science. As of May 23rd, 2021, over 5 million participants have collectively logged over 360 million self-assessment reports since its introduction in March 2020. The success of the Covid Symptom Study creates significant technical challenges around effective data curation. The primary issue is scale. The size of the dataset means that it can no longer be readily processed using standard Python-based data analytics software such as Pandas on commodity hardware. Alternative technologies exist but carry a higher technical complexity and are less accessible to many researchers. We present ExeTera, a Python-based open source software package designed to provide Pandas-like data analytics on datasets that approach terabyte scales. We present its design and capabilities, and show how it is a critical component of a data curation pipeline that enables reproducible research across an international research group for the Covid Symptom Study.
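The scale problem described in the abstract can be illustrated with a minimal sketch that streams a report file row by row instead of loading it as one in-memory table. This is not ExeTera's API or design, which the abstract does not detail; it uses only the Python standard library, and the function name, the `patient_id` column and the sample values are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_reports_per_participant(handle: TextIO, id_column: str = "patient_id") -> Counter:
    """Stream a CSV of self-assessment reports and count rows per participant.

    Rows are read one at a time rather than loaded as a whole table, so memory
    use grows with the number of distinct participants, not the number of reports.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(handle):
        counts[row[id_column]] += 1
    return counts


# Tiny in-memory stand-in for a much larger export (values are made up).
sample = io.StringIO(
    "patient_id,fever\n"
    "a,0\n"
    "b,1\n"
    "a,1\n"
)
counts = count_reports_per_participant(sample)
print(counts.most_common(1))
```

Streaming of this kind keeps memory bounded for simple aggregations, but joins, sorts and filters across hundreds of millions of rows need more machinery, which is the gap the abstract says ExeTera is designed to fill.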

Type: Article
Title: Accessible data curation and analytics for international-scale citizen science datasets
Open access status: An open access version is available from UCL Discovery
DOI: 10.1038/s41597-021-01071-x
Publisher version: https://doi.org/10.1038/s41597-021-01071-x
Language: English
Additional information: © 2021 Springer Nature Limited. This article is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
Keywords: Epidemiology, Research data
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Cardiovascular Science
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Cardiovascular Science > Population Science and Experimental Medicine
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Cardiovascular Science > Population Science and Experimental Medicine > MRC Unit for Lifelong Health and Ageing
URI: https://discovery.ucl.ac.uk/id/eprint/10139549