UCL Discovery

Identifying subtypes of chronic kidney disease with machine learning: development, internal validation and prognostic validation using linked electronic health records in 350,067 individuals

Dashtban, Ashkan; Mizani, Mehrdad A; Pasea, Laura; Denaxas, Spiros; Corbett, Richard; Mamza, Jil B; Gao, He; ... Banerjee, Amitava (2023) Identifying subtypes of chronic kidney disease with machine learning: development, internal validation and prognostic validation using linked electronic health records in 350,067 individuals. EBioMedicine, 89, Article 104489. 10.1016/j.ebiom.2023.104489. Green open access

1-s2.0-S2352396423000543-main.pdf - Published Version

Abstract

BACKGROUND: Although chronic kidney disease (CKD) is associated with high multimorbidity, polypharmacy, morbidity and mortality, existing classification systems (mild to severe, usually based on estimated glomerular filtration rate, proteinuria or urine albumin-creatinine ratio) and risk prediction models largely ignore the complexity of CKD, its risk factors and its outcomes. Improved subtype definition could improve prediction of outcomes and inform effective interventions.

METHODS: We analysed individuals ≥18 years with incident and prevalent CKD (n = 350,067 and 195,422, respectively) from a population-based electronic health record resource (2006-2020; Clinical Practice Research Datalink, CPRD). We included factors (n = 264, with 2670 derived variables), e.g. demography, history, examination, blood laboratory values and medications. Using a published framework, we identified subtypes through seven unsupervised machine learning (ML) methods (K-means, Diana, HC, Fanny, PAM, Clara, Model-based) with 66 (of 2670) variables in each dataset. We evaluated subtypes for: (i) internal validity (within dataset, across methods); (ii) prognostic validity (predictive accuracy for 5-year all-cause mortality and admissions); and (iii) medications (new and existing by British National Formulary chapter).

FINDINGS: After identifying five clusters across seven approaches, we labelled CKD subtypes: 1. Early-onset, 2. Late-onset, 3. Cancer, 4. Metabolic, and 5. Cardiometabolic. Internal validity: We trained a high-performing model (using XGBoost) that could predict disease subtypes with 95% accuracy for incident and prevalent CKD (sensitivity: 0.81-0.98; F1 score: 0.84-0.97). Prognostic validity: 5-year all-cause mortality, hospital admissions, and incidence of new chronic diseases differed across CKD subtypes. The 5-year risks of mortality and admissions in the overall incident CKD population were highest in the cardiometabolic subtype: 43.3% (42.3-42.8%) and 29.5% (29.1-30.0%), respectively, and lowest in the early-onset subtype: 5.7% (5.5-5.9%) and 18.7% (18.4-19.1%). Medications: Across CKD subtypes, the distribution of prescription medication classes at baseline varied, with the highest medication burden in the cardiometabolic and metabolic subtypes, and a higher burden in prevalent than in incident CKD.

INTERPRETATION: In the largest CKD study using ML to date, we identified five distinct subtypes in individuals with incident and prevalent CKD. These subtypes have relevance to the study of aetiology, therapeutics and risk prediction.

FUNDING: AstraZeneca UK Ltd, Health Data Research UK.
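
The internal-validity figures in the abstract (overall accuracy, plus a range of per-subtype sensitivity and F1 values) are standard multi-class classification metrics. As a rough illustration of how such per-subtype values are obtained, the sketch below computes them from true and predicted labels in plain Python. The function names and the toy labels are hypothetical and invented for this example; they are not the authors' code or data, and the study's own pipeline (XGBoost trained on cluster labels) is not reproduced here.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (sensitivity, f1)} for every label present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correctly assigned to its subtype
        else:
            fn[t] += 1          # missed for the true subtype
            fp[p] += 1          # wrongly assigned to the predicted subtype
    metrics = {}
    for label in sorted(set(y_true)):
        recall_den = tp[label] + fn[label]
        prec_den = tp[label] + fp[label]
        sensitivity = tp[label] / recall_den if recall_den else 0.0
        precision = tp[label] / prec_den if prec_den else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        metrics[label] = (sensitivity, f1)
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of individuals whose predicted subtype matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels, made up for illustration only (not data from the study).
y_true = ["Early-onset", "Early-onset", "Cancer", "Cancer", "Metabolic", "Metabolic"]
y_pred = ["Early-onset", "Cancer", "Cancer", "Cancer", "Metabolic", "Metabolic"]

results = per_class_metrics(y_true, y_pred)
overall = accuracy(y_true, y_pred)
```

Reporting the minimum and maximum of the per-subtype values gives a range of the same form as the "0.81-0.98" quoted above, which shows how well the worst-classified subtype is recovered rather than only the average.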

Type: Article
Title: Identifying subtypes of chronic kidney disease with machine learning: development, internal validation and prognostic validation using linked electronic health records in 350,067 individuals
Location: Netherlands
Open access status: An open access version is available from UCL Discovery
DOI: 10.1016/j.ebiom.2023.104489
Publisher version: https://doi.org/10.1016/j.ebiom.2023.104489
Language: English
Additional information: © 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords: CKD subtype, Cluster analysis, Machine learning, Survival analysis, Unsupervised clustering
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics > Clinical Epidemiology
URI: https://discovery.ucl.ac.uk/id/eprint/10166094