
Workflow for Integrated Processing of Multicohort Untargeted 1H NMR Metabolomics Data in Large-Scale Metabolic Epidemiology

Karaman, I; Ferreira, DL; Boulangé, CL; Kaluarachchi, MR; Herrington, D; Dona, AC; Castagné, R; ... Ebbels, TM (2016) Workflow for Integrated Processing of Multicohort Untargeted 1H NMR Metabolomics Data in Large-Scale Metabolic Epidemiology. Journal of Proteome Research, 15 (12) pp. 4188-4194. 10.1021/acs.jproteome.6b00125. Green open access

Moayyeri_NMR Preprocessing Paper - 20150609 (small figures).pdf - Accepted Version

Abstract

Large-scale metabolomics studies involving thousands of samples present multiple challenges in data analysis, particularly when an untargeted platform is used. Studies with multiple cohorts and analysis platforms exacerbate existing problems such as peak alignment and normalization. Therefore, there is a need for robust processing pipelines that can ensure reliable data for statistical analysis. The COMBI-BIO project incorporates serum from ∼8000 individuals, in three cohorts, profiled by six assays in two phases using both 1H NMR and UPLC-MS. Here we present the COMBI-BIO NMR analysis pipeline and demonstrate its fitness for purpose using representative quality control (QC) samples. NMR spectra were first aligned and normalized. After eliminating interfering signals, outliers identified using Hotelling's T² were removed and a cohort/phase adjustment was applied, resulting in two NMR data sets (CPMG and NOESY). Alignment of the NMR data was shown to increase the correlation-based alignment quality measure from 0.319 to 0.391 for CPMG and from 0.536 to 0.586 for NOESY, showing that the improvement was present across both large and small peaks. End-to-end quality assessment of the pipeline was achieved using Hotelling's T² distributions. For CPMG spectra, the interquartile range decreased from 1.425 in raw QC data to 0.679 in processed spectra, while the corresponding change for NOESY spectra was from 0.795 to 0.636, indicating an improvement in precision following processing. PCA indicated that gross phase and cohort differences were no longer present. These results illustrate that the pipeline produces robust and reproducible data, successfully addressing the methodological challenges of this large multifaceted study.
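
The abstract uses Hotelling's T² both to flag outliers and, through the interquartile range (IQR) of its distribution across QC samples, to summarize precision. The following is a minimal illustrative sketch of one common way to compute such a statistic from PCA scores; it is not the COMBI-BIO implementation. The function names, the choice of two principal components, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def hotelling_t2(X, n_components=2):
    """Hotelling's T^2 per sample (row of X), computed from PCA scores.

    Assumption: two components by default; the record does not state
    how many components the authors used.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # mean-center each variable
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]        # PCA scores
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)   # variance of each score
    return np.sum(scores ** 2 / score_var, axis=1)

def iqr(values):
    """Interquartile range (75th minus 25th percentile)."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1

# Synthetic stand-in for a spectra matrix (samples x spectral variables).
rng = np.random.default_rng(0)
spectra = rng.normal(size=(50, 200))
spectra[0] += 5.0                                 # one artificially shifted sample

t2 = hotelling_t2(spectra, n_components=2)
print("sample with largest T^2:", int(np.argmax(t2)))
print("IQR of T^2:", iqr(t2))
```

In this sketch, a narrower IQR of T² across repeated QC samples would correspond to tighter clustering in the score space, which is the sense in which the abstract reads the decrease in IQR as improved precision.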

Type: Article
Title: Workflow for Integrated Processing of Multicohort Untargeted 1H NMR Metabolomics Data in Large-Scale Metabolic Epidemiology
Location: United States
Open access status: An open access version is available from UCL Discovery
DOI: 10.1021/acs.jproteome.6b00125
Publisher version: http://dx.doi.org/10.1021/acs.jproteome.6b00125
Language: English
Additional information: This document is the Accepted Manuscript version of a Published Work that appeared in final form in the Journal of Proteome Research, copyright © American Chemical Society after peer review and technical editing by the publisher. To access the final edited and published work see: http://dx.doi.org/10.1021/acs.jproteome.6b00125.
Keywords: NMR, alignment, epidemiology, large scale, metabolomics, multicohort, normalization, preprocessing, quality control
UCL classification: UCL
URI: https://discovery.ucl.ac.uk/id/eprint/1533256