UCL Discovery

Hospital-wide natural language processing summarising the health data of 1 million patients

Bean, Daniel M; Kraljevic, Zeljko; Shek, Anthony; Teo, James; Dobson, Richard JB; (2023) Hospital-wide natural language processing summarising the health data of 1 million patients. PLOS Digit Health, 2 (5), Article e0000218. 10.1371/journal.pdig.0000218. Green open access

Dobson_Hospital-wide natural language processing summarising the health data of 1 million patients_VoR.pdf - Published Version

Abstract

Electronic health records (EHRs) represent a major repository of real-world clinical trajectories, interventions and outcomes. While modern enterprise EHRs try to capture data in structured, standardised formats, a significant bulk of the available information captured in the EHR is still recorded only in unstructured text format and can only be transformed into structured codes by manual processes. Recently, Natural Language Processing (NLP) algorithms have reached a level of performance suitable for large-scale and accurate information extraction from clinical text. Here we describe the application of open-source named-entity-recognition and linkage (NER+L) methods (CogStack, MedCAT) to the entire text content of a large UK hospital trust (King's College Hospital, London). The resulting dataset contains 157M SNOMED concepts generated from 9.5M documents for 1.07M patients over a period of 9 years. We present a summary of prevalence and disease onset as well as a patient embedding that captures major comorbidity patterns at scale. NLP has the potential to transform the health data lifecycle, through large-scale automation of a traditionally manual task.

Type: Article
Title: Hospital-wide natural language processing summarising the health data of 1 million patients
Location: United States
Open access status: An open access version is available from UCL Discovery
DOI: 10.1371/journal.pdig.0000218
Publisher version: https://doi.org/10.1371/journal.pdig.0000218
Language: English
Additional information: Copyright: © 2023 Bean et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics > Clinical Epidemiology
URI: https://discovery.ucl.ac.uk/id/eprint/10170540