UCL Discovery
UCL home » Library Services » Electronic resources » UCL Discovery

Foresight—a generative pretrained transformer for modelling of patient timelines using electronic health records: a retrospective modelling study

Kraljevic, Z; Bean, D; Shek, A; Bendayan, R; Hemingway, H; Yeung, JA; Deng, A; ... Dobson, RJB; + view all (2024) Foresight—a generative pretrained transformer for modelling of patient timelines using electronic health records: a retrospective modelling study. The Lancet Digital Health , 6 (4) e281-e290. 10.1016/S2589-7500(24)00025-6. Green open access

[thumbnail of 1-s2.0-S2589750024000256-main.pdf]
Preview
Text
1-s2.0-S2589750024000256-main.pdf - Published Version

Download (1MB) | Preview

Abstract

Background: An electronic health record (EHR) holds detailed longitudinal information about a patient's health status and general clinical history, a large portion of which is stored as unstructured, free text. Existing approaches to model a patient's trajectory focus mostly on structured data and a subset of single-domain outcomes. This study aims to evaluate the effectiveness of Foresight, a generative transformer in temporal modelling of patient data, integrating both free text and structured formats, to predict a diverse array of future medical outcomes, such as disorders, substances (eg, to do with medicines, allergies, or poisonings), procedures, and findings (eg, relating to observations, judgements, or assessments). / Methods: Foresight is a novel transformer-based pipeline that uses named entity recognition and linking tools to convert EHR document text into structured, coded concepts, followed by providing probabilistic forecasts for future medical events, such as disorders, substances, procedures, and findings. The Foresight pipeline has four main components: (1) CogStack (data retrieval and preprocessing); (2) the Medical Concept Annotation Toolkit (structuring of the free-text information from EHRs); (3) Foresight Core (deep-learning model for biomedical concept modelling); and (4) the Foresight web application. We processed the entire free-text portion from three different hospital datasets (King's College Hospital [KCH], South London and Maudsley [SLaM], and the US Medical Information Mart for Intensive Care III [MIMIC-III]), resulting in information from 811 336 patients and covering both physical and mental health institutions. We measured the performance of models using custom metrics derived from precision and recall. / Findings: Foresight achieved a precision@10 (ie, of 10 forecasted candidates, at least one is correct) of 0·68 (SD 0·0027) for the KCH dataset, 0·76 (0·0032) for the SLaM dataset, and 0·88 (0·0018) for the MIMIC-III dataset, for forecasting the next new disorder in a patient timeline. Foresight also achieved a precision@10 value of 0·80 (0·0013) for the KCH dataset, 0·81 (0·0026) for the SLaM dataset, and 0·91 (0·0011) for the MIMIC-III dataset, for forecasting the next new biomedical concept. In addition, Foresight was validated on 34 synthetic patient timelines by five clinicians and achieved a relevancy of 33 (97% [95% CI 91–100]) of 34 for the top forecasted candidate disorder. As a generative model, Foresight can forecast follow-on biomedical concepts for as many steps as required. / Interpretation: Foresight is a general-purpose model for biomedical concept modelling that can be used for real-world risk forecasting, virtual trials, and clinical research to study the progression of disorders, to simulate interventions and counterfactuals, and for educational purposes. / Funding: National Health Service Artificial Intelligence Laboratory, National Institute for Health and Care Research Biomedical Research Centre, and Health Data Research UK.

Type: Article
Title: Foresight—a generative pretrained transformer for modelling of patient timelines using electronic health records: a retrospective modelling study
Location: England
Open access status: An open access version is available from UCL Discovery
DOI: 10.1016/S2589-7500(24)00025-6
Publisher version: https://doi.org/10.1016/S2589-7500(24)00025-6
Language: English
Additional information: Copyright © 2024 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics > Clinical Epidemiology
URI: https://discovery.ucl.ac.uk/id/eprint/10190262
Downloads since deposit
5Downloads
Download activity - last month
Download activity - last 12 months
Downloads by country - last 12 months

Archive Staff Only

View Item View Item