
Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks

Sammani, A; Bagheri, A; Van der Heijden, PGM; Te Riele, ASJM; Baas, AF; Oosters, CAJ; Oberski, D (2021) Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks. npj Digital Medicine, 4, Article 37. DOI: 10.1038/s41746-021-00404-9. Green open access.

Asselbergs_Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks_VoR.pdf - Published Version

Abstract

Standard reference terminology of diagnoses and risk factors is crucial for billing, epidemiological studies, and inter/intranational comparisons of diseases. The International Classification of Diseases (ICD) is a standardized and widely used method, but manual classification is enormously time-consuming. Natural language processing together with machine learning allows automated structuring of diagnoses using ICD-10 codes, but the limited performance of machine learning models, the need for very large datasets, and the poor reliability of the terminal parts of these codes have restricted clinical usability. We aimed to create a high-performing pipeline for automated classification of reliable ICD-10 codes in free medical text in cardiology. We focussed on frequently used and well-defined three- and four-digit ICD-10 codes that still have enough granularity to be clinically relevant, such as atrial fibrillation (I48), acute myocardial infarction (I21), or dilated cardiomyopathy (I42.0). Our pipeline uses a deep neural network known as a Bidirectional Gated Recurrent Unit Neural Network and was trained and tested on 5548 discharge letters and validated on 5089 discharge and procedural letters. Because discharge letters in clinical practice may be labeled with more than one code, we assessed the single- and multilabel performance of main diagnoses and cardiovascular risk factors. We investigated using both the entire body of text and only the summary paragraph, supplemented by age and sex. Given the privacy-sensitive information included in discharge letters, we added a de-identification step. The performance was high, with F1 scores of 0.76–0.99 for three-character and 0.87–0.98 for four-character ICD-10 codes, and was best when using complete discharge letters. Adding the variables age and sex did not affect the results. For model interpretability, word coefficients were provided and a qualitative assessment of the classification was performed manually.
Because of its high performance, this pipeline can be useful to decrease the administrative burden of classifying discharge diagnoses and may serve as a scaffold for reimbursement and research applications.
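
As an illustration of how per-code F1 scores like those reported in the abstract can be computed in a multilabel setting, the following is a minimal Python sketch. It is not the authors' evaluation code: the function name `per_label_f1` and the four toy letters are hypothetical, and the abstract does not state how the scores were computed or averaged. Each letter is represented as a set of ICD-10 codes, so a letter may carry zero, one, or several codes.

```python
from typing import Dict, List, Set


def per_label_f1(true_labels: List[Set[str]],
                 pred_labels: List[Set[str]]) -> Dict[str, float]:
    """Compute one F1 score per code over a collection of letters.

    Hypothetical helper for illustration only; each list element is the
    set of codes attached to one letter.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("true_labels and pred_labels must have equal length")

    # Every code that appears in either the reference or the predictions.
    all_codes = set().union(*true_labels, *pred_labels)

    scores: Dict[str, float] = {}
    for code in sorted(all_codes):
        tp = fp = fn = 0
        for truth, pred in zip(true_labels, pred_labels):
            if code in truth and code in pred:
                tp += 1  # code correctly assigned
            elif code in pred:
                fp += 1  # code assigned but not in the reference
            elif code in truth:
                fn += 1  # code in the reference but missed
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the code never occurs.
        scores[code] = 2 * tp / denom if denom else 0.0
    return scores


# Toy data (made up): four letters with reference and predicted code sets.
true = [{"I48"}, {"I21", "I48"}, {"I42.0"}, set()]
pred = [{"I48"}, {"I21"}, {"I42.0", "I48"}, set()]

scores = per_label_f1(true, pred)
for code, f1 in scores.items():
    print(f"{code}: F1 = {f1:.2f}")
```

In this toy data, I48 is found once, missed once, and wrongly assigned once, so its F1 is 0.50, while I21 and I42.0 are each predicted without error and score 1.00. Reporting one score per code, as in this sketch, keeps rare and frequent codes separately visible rather than folding them into a single average.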

Type: Article
Title: Automatic multilabel detection of ICD10 codes in Dutch cardiology discharge letters using neural networks
Location: England
Open access status: An open access version is available from UCL Discovery
DOI: 10.1038/s41746-021-00404-9
Publisher version: https://doi.org/10.1038/s41746-021-00404-9
Language: English
Additional information: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
Keywords: Diseases, Health care, Health services
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
URI: https://discovery.ucl.ac.uk/id/eprint/10125440