UCL Discovery

Diagnosing migraine from genome-wide genotype data: a machine learning analysis

Danelakis, Antonios; Kumelj, Tjaša; Winsvold, Bendik S; Helene Bjørk, Marte; Nachev, Parashkev; Matharu, Manjit; Giles, Dominic; ... Stubberud, Anker (2025) Diagnosing migraine from genome-wide genotype data: a machine learning analysis. Brain, Article awaf172. 10.1093/brain/awaf172. (In press). Green open access

awaf172.pdf - Accepted Version

Abstract

Migraine has an assumed polygenic basis, but the genetic risk variants identified in genome-wide association studies explain only a proportion of the heritability. We aimed to develop machine learning models that capture non-additive and interactive effects in order to address the missing heritability. This was a cross-sectional, population-based study of participants in the second and third surveys of the Trøndelag Health Study (HUNT). Individuals underwent genome-wide genotyping and were phenotyped based on validated modified criteria of the International Classification of Headache Disorders. Four datasets with increasing numbers of genetic variants were created using different thresholds of linkage disequilibrium and univariate genome-wide association p-values. A series of machine learning and deep learning methods were optimized and evaluated. The genotype tools PLINK and LDPred2 were used for polygenic risk scoring. Models were trained on a partition of the dataset and tested in a hold-out set. The area under the receiver operating characteristic curve was used as the primary scoring metric. Classification by machine learning was statistically compared to that of polygenic risk scoring. Finally, we explored the biological functions of the variants unique to the machine learning approach. A total of 43,197 individuals (51% women), with a mean age of 54.6 years, were included in the modelling. A light gradient boosting machine performed best for the three smallest datasets (108, 7,771 and 7,840 variants), all with a hold-out test set area under the curve of 0.63. A multinomial naïve Bayes model performed best in the largest dataset (140,467 variants), with a hold-out test set area under the curve of 0.62. The models were statistically significantly superior to polygenic risk scoring (area under the curve 0.52 to 0.59) for all the datasets (p<0.001 to p=0.02).
Machine learning identified many of the same genes and pathways identified in genome-wide association studies, but also several unique pathways, mainly related to signal transduction and neurological function. Interestingly, pathways related to botulinum toxins and to the calcitonin gene-related peptide receptor also emerged. This study suggests that migraine may follow a non-additive and interactive genetic causal structure, potentially best captured by complex machine learning models. Such structure may be concealed where the data dimensionality (a high number of genetic variants) is insufficiently supported by the scale of available data, leaving a misleading impression of purely additive effects. Future machine learning models using substantially larger sample sizes could harness both the additive and the interactive effects, enhancing precision and offering deeper understanding of the genetic interactions underlying migraine.
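
The contrast the abstract draws between additive scoring and interaction-aware models, both evaluated by area under the curve on a hold-out set, can be illustrated with a toy sketch. This is not the authors' pipeline: it uses no real genotypes, and does not use LightGBM, PLINK or LDPred2. Everything in it is an assumption made for illustration, including the two simulated binary variants, the risk values 0.7 and 0.3, and the helper names `simulate`, `fit_additive`, `fit_interaction` and `compare`. Risk is raised only when exactly one variant is carried, so neither variant has a marginal effect and an additive score should do little better than chance.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # tied scores share the average 1-based rank
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = len(pairs) - n_pos
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate(n, rng):
    """Hypothetical data: two binary variants with a purely interactive effect."""
    rows = []
    for _ in range(n):
        a, b = rng.randint(0, 1), rng.randint(0, 1)
        p_case = 0.7 if a != b else 0.3  # assumed risks, illustration only
        rows.append((a, b, 1 if rng.random() < p_case else 0))
    return rows


def case_rate(rows):
    return sum(y for _, _, y in rows) / len(rows) if rows else 0.5


def fit_additive(train):
    """Per-variant weights from univariate case-rate differences, then summed."""
    w_a = case_rate([r for r in train if r[0] == 1]) - case_rate([r for r in train if r[0] == 0])
    w_b = case_rate([r for r in train if r[1] == 1]) - case_rate([r for r in train if r[1] == 0])
    return lambda a, b: w_a * a + w_b * b


def fit_interaction(train):
    """Case rate per joint genotype, so the score can depend on the combination."""
    table = {(a, b): case_rate([r for r in train if r[0] == a and r[1] == b])
             for a in (0, 1) for b in (0, 1)}
    return lambda a, b: table[(a, b)]


def compare(seed=0, n=4000):
    """Fit both scores on one half, report hold-out AUC on the other half."""
    rng = random.Random(seed)
    rows = simulate(n, rng)
    train, test = rows[: n // 2], rows[n // 2:]
    labels = [y for _, _, y in test]
    result = {}
    for name, fit in (("additive", fit_additive), ("interaction", fit_interaction)):
        score = fit(train)
        result[name] = auc([score(a, b) for a, b, _ in test], labels)
    return result


if __name__ == "__main__":
    print(compare())
```

In this simulation the additive score is expected to sit near 0.5 and the interaction-aware score near 0.7. Those figures describe the toy setup only and say nothing about the effect sizes reported in the study.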

Type: Article
Title: Diagnosing migraine from genome-wide genotype data: a machine learning analysis
Location: England
Open access status: An open access version is available from UCL Discovery
DOI: 10.1093/brain/awaf172
Publisher version: https://doi.org/10.1093/brain/awaf172
Language: English
Additional information: © The Author(s) 2025. Published by Oxford University Press on behalf of The Guarantors of Brain. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords: HUNT, artificial intelligence, epistasis, genetics, gradient boosting, headache
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > UCL Queen Square Institute of Neurology
URI: https://discovery.ucl.ac.uk/id/eprint/10214490