
Estimating redundancy in clinical text

Searle, T; Ibrahim, Z; Teo, J; Dobson, R (2021) Estimating redundancy in clinical text. Journal of Biomedical Informatics, 124, Article 103938. 10.1016/j.jbi.2021.103938. Green open access

Estimating_Redundancy_in_Clinical_Text__JBI_Submission___Unmarked.pdf - Accepted Version


Abstract

The current mode of use of Electronic Health Records (EHR) elicits text redundancy. Clinicians often populate new documents by duplicating existing notes, then updating them accordingly. Data duplication can lead to propagation of errors, inconsistencies and misreporting of care. Therefore, measures to quantify information redundancy play an essential role in evaluating innovations that operate on clinical narratives. This work is a quantitative examination of information redundancy in EHR notes. We present and evaluate two methods to measure redundancy: an information-theoretic approach and a lexicosyntactic and semantic model. Our first measure trains large Transformer-based language models using clinical text from a large openly available US-based ICU dataset and a large multi-site UK-based hospital. By comparing the information-theoretic efficient encoding of clinical text against open-domain corpora, we find that clinical text is × to × less efficient than open-domain corpora at conveying information. Our second measure applies the automated summarisation metrics ROUGE and BERTScore to successive note pairs, demonstrating lexicosyntactic and semantic redundancy with averages from 43 to 65%.
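
As a rough illustration of the second measure, the sketch below computes a ROUGE-1 style unigram overlap between two successive notes. It is a minimal sketch under stated assumptions, not the authors' implementation: the tokeniser, the function names `tokenize` and `rouge1`, and the choice to treat the earlier note as the reference and the later note as the candidate are all hypothetical, and the example notes are invented for demonstration only.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; the paper's tokenisation may differ.
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(previous_note, current_note):
    """Return (precision, recall, f1) of unigram overlap between two notes.

    The earlier note is treated as the reference and the later note as the
    candidate (an assumption made for this sketch). Precision is then the
    share of the later note's tokens that already appear in the earlier note.
    """
    prev_counts = Counter(tokenize(previous_note))
    curr_counts = Counter(tokenize(current_note))

    # Multiset intersection: each token counts at most as often as it occurs in both notes.
    overlap = sum((prev_counts & curr_counts).values())

    curr_total = sum(curr_counts.values())
    prev_total = sum(prev_counts.values())

    precision = overlap / curr_total if curr_total else 0.0
    recall = overlap / prev_total if prev_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented example notes, not drawn from any dataset.
    earlier = "Patient stable on ward"
    later = "Patient stable on ward today"
    print(rouge1(earlier, later))
```

Under these assumptions, a precision close to 1 suggests that the later note adds little new wording relative to the earlier one. Pure token overlap ignores paraphrase, which is one reason an embedding-based metric such as BERTScore is a useful complement.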

Type: Article
Title: Estimating redundancy in clinical text
Open access status: An open access version is available from UCL Discovery
DOI: 10.1016/j.jbi.2021.103938
Publisher version: https://doi.org/10.1016/j.jbi.2021.103938
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Science & Technology, Technology, Life Sciences & Biomedicine, Computer Science, Interdisciplinary Applications, Medical Informatics, Computer Science, Natural language processing methods to estimate redundancy of clinical text, Deep transfer learning for language modelling of clinical text, HEALTH RECORD DATA, BIG DATA, PHYSICIANS, KNOWLEDGE, MEDICINE, UMLS, COPY
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics > Clinical Epidemiology
URI: https://discovery.ucl.ac.uk/id/eprint/10140818