Model selection for metabolomics: predicting diagnosis of coronary artery disease using automated machine learning

Orlenko, A; Kofink, D; Lyytikäinen, L-P; Nikus, K; Mishra, P; Kuukasjärvi, P; Karhunen, PJ; ... Moore, JH (2020) Model selection for metabolomics: predicting diagnosis of coronary artery disease using automated machine learning. Bioinformatics, 36 (6) pp. 1772-1778. 10.1093/bioinformatics/btz796. Green open access

Full text: btz796.pdf - Published Version (340kB)

Abstract

MOTIVATION: Selecting the optimal machine learning (ML) model for a given dataset is often challenging. Automated ML (AutoML) has emerged as a powerful tool for enabling the automatic selection of ML methods and parameter settings for the prediction of biomedical endpoints. Here, we apply the tree-based pipeline optimization tool (TPOT) to predict angiographic diagnoses of coronary artery disease (CAD). With TPOT, ML models are represented as expression trees, and optimal pipelines are discovered using a stochastic search method called genetic programming. We provide some guidelines for TPOT-based ML pipeline selection and optimization based on various clinical phenotypes and high-throughput metabolic profiles in the Angiography and Genes Study (ANGES).

RESULTS: We analyzed nuclear magnetic resonance-derived lipoprotein and metabolite profiles in the ANGES cohort with the goal of identifying the role of non-obstructive CAD patients in CAD diagnostics. We performed a comparative analysis of TPOT-generated ML pipelines with selected ML classifiers, optimized with a grid search approach, applied to two phenotypic CAD profiles. As a result, TPOT generated ML pipelines that outperformed grid search-optimized models across multiple performance metrics, including balanced accuracy and area under the precision-recall curve. With the selected models, we demonstrated that the phenotypic profile that distinguishes non-obstructive CAD patients from no-CAD patients is associated with higher precision, suggesting a discrepancy in the underlying processes between these phenotypes.

AVAILABILITY AND IMPLEMENTATION: TPOT is freely available via http://epistasislab.github.io/tpot/.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
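The abstract compares models on balanced accuracy and area under the precision-recall curve but does not spell out either metric. The block below is a minimal sketch of their standard textbook definitions for binary labels, written in plain Python. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all illustrative assumptions, and nothing here is ANGES data. The precision-recall area is computed in its step-wise "average precision" form, which assumes no tied scores.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = case, 0 = control)."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Samples are ranked by descending score; precision is averaged over the
    ranks at which a true case is retrieved. Assumes scores are not tied.
    """
    pos = sum(1 for t in y_true if t == 1)
    if pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / pos


# Hypothetical toy data: 3 cases and 5 controls with made-up classifier scores.
toy_labels = [1, 0, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

print("balanced accuracy:", balanced_accuracy(toy_labels, toy_preds))
print("average precision:", average_precision(toy_labels, toy_scores))
```

Both metrics are less sensitive to class imbalance than plain accuracy, which is one plausible reason to prefer them when cases and controls are unevenly represented in a cohort.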

Type:	Article
Title:	Model selection for metabolomics: predicting diagnosis of coronary artery disease using automated machine learning
Location:	England
Open access status:	An open access version is available from UCL Discovery
DOI:	10.1093/bioinformatics/btz796
Publisher version:	https://doi.org/10.1093/bioinformatics/btz796
Language:	English
Additional information:	Copyright © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
UCL classification:	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
URI:	https://discovery.ucl.ac.uk/id/eprint/10094940
