UCL Discovery

Evaluating the harmonisation potential of diverse cohort datasets

Bauermeister, Sarah; Phatak, Mukta; Sparks, Kelly; Sargent, Lana; Griswold, Michael; McHugh, Caitlin; Nalls, Mike; ... Gallacher, John (2023) Evaluating the harmonisation potential of diverse cohort datasets. European Journal of Epidemiology 10.1007/s10654-023-00997-3. Green open access

s10654-023-00997-3.pdf - Published Version

Abstract

Data discovery, the ability to find datasets relevant to an analysis, increases scientific opportunity, improves rigour and accelerates activity. Rapid growth in the depth, breadth, quantity and availability of data provides unprecedented opportunities and challenges for data discovery. A potential tool for increasing the efficiency of data discovery, particularly across multiple datasets, is data harmonisation. A set of 124 variables, identified as being of broad interest to neurodegeneration, were harmonised using the C-Surv data model. Harmonisation strategies used were simple calibration, algorithmic transformation and standardisation to the Z-distribution. Widely used data conventions, optimised for inclusiveness rather than aetiological precision, were used as harmonisation rules. The harmonisation scheme was applied to data from four diverse population cohorts. Of the 120 variables that were found in the datasets, correspondence between the harmonised data schema and cohort-specific data models was complete or close for 111 (93%). For the remainder, harmonisation was possible with a marginal loss of granularity. Although harmonisation is not an exact science, sufficient comparability across datasets was achieved to enable data discovery with relatively little loss of informativeness. This provides a basis for further work extending harmonisation to a larger variable list, applying the harmonisation to further datasets, and incentivising the development of data discovery tools.
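
As an illustration only, the three harmonisation strategies named in the abstract (simple calibration, algorithmic transformation, standardisation to the Z-distribution) might look roughly like the Python sketch below. The function names, the example conversion factor and the example code mapping are hypothetical and are not taken from the article or from the C-Surv data model; the use of the population standard deviation within a single cohort is likewise an assumption.

```python
from statistics import mean, pstdev


def calibrate(values, factor, offset=0.0):
    """Simple calibration: linear rescaling, e.g. a unit conversion."""
    return [v * factor + offset for v in values]


def recode(values, mapping):
    """Algorithmic transformation: map cohort-specific codes to shared codes.

    Raises KeyError if a value has no entry in the mapping, so unmapped
    codes are not silently passed through.
    """
    return [mapping[v] for v in values]


def z_standardise(values):
    """Standardise to z-scores using the cohort's own mean and SD.

    Assumption: the population SD (pstdev) is used; a sample SD would
    give slightly different scores.
    """
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(v - m) / sd for v in values]


if __name__ == "__main__":
    # Hypothetical cohort values, for illustration only.
    weights_lb = [150.0, 180.0, 200.0]
    print(calibrate(weights_lb, 0.45359237))            # pounds to kilograms
    print(recode(["M", "F", "F"], {"M": 1, "F": 2}))    # shared sex coding
    print(z_standardise([24.0, 27.0, 30.0]))            # e.g. a test score
```

Calibration and recoding preserve the original scale or categories, whereas z-standardisation only preserves each participant's position relative to their own cohort, which is one way a harmonised variable can lose granularity.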

Type: Article
Title: Evaluating the harmonisation potential of diverse cohort datasets
Location: Netherlands
Open access status: An open access version is available from UCL Discovery
DOI: 10.1007/s10654-023-00997-3
Publisher version: https://doi.org/10.1007/s10654-023-00997-3
Language: English
Additional information: Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Keywords: C-surv data model, Cohort, Data discovery, Data harmonisation, Data visualisation, Datasets
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Epidemiology and Health
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Epidemiology and Health > Behavioural Science and Health
URI: https://discovery.ucl.ac.uk/id/eprint/10169232