Validating Transformers for Redaction of Text from Electronic Health Records in Real-World Healthcare

Kraljevic, Z; Shek, A; Yeung, JA; Sheldon, EJ; Shuaib, H; Al-Agil, M; Bai, X; ... Teo, J (2023) Validating Transformers for Redaction of Text from Electronic Health Records in Real-World Healthcare. In: Proceedings - 2023 IEEE 11th International Conference on Healthcare Informatics, ICHI 2023. (pp. 544-549). IEEE: Houston, TX, USA. Green open access

Validating_transformers.pdf - Accepted Version

Abstract

Protecting patient privacy in healthcare records is a top priority, and redaction is a commonly used method for obscuring directly identifiable information in text. Rule-based methods have been widely used, but their precision is often low, causing over-redaction of text, and they are frequently not adaptable enough for non-standardised or unconventional structures of personal health information. Deep learning techniques have emerged as a promising solution, but implementing them in real-world environments poses challenges due to differences in patient record structure and language across departments, hospitals, and countries. In this study, we present AnonCAT, a transformer-based model, and a blueprint for how deidentification models can be deployed in real-world healthcare. AnonCAT was trained through a process involving manually annotated redactions of 3116 real-world documents from three UK hospitals with different electronic health record systems. The model achieved high performance in all three hospitals, with a Recall of 0.99, 0.99 and 0.96. Our findings demonstrate the potential of deep learning techniques for improving the efficiency and accuracy of redaction in global healthcare data and highlight the importance of building workflows which not only use these models but are also able to continually fine-tune and audit the performance of these algorithms to ensure continuing effectiveness in real-world settings. This approach provides a blueprint for the real-world use of de-identifying algorithms through fine-tuning and localisation. The code, together with tutorials, is available on GitHub (https://github.com/CogStack/MedCAT).
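
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how recall and precision could be computed for a redaction task. It assumes exact-match scoring over annotated character spans; the abstract does not state whether the paper scores at span or token level, so this is an illustration and not the authors' evaluation code. The `Span` type, the function names, the labels, and the toy annotations are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start offset, end offset, label).
Span = Tuple[int, int, str]


def recall(gold: Set[Span], predicted: Set[Span]) -> float:
    """Fraction of manually annotated (gold) spans that were also redacted."""
    if not gold:
        return 1.0  # nothing to find, so nothing was missed
    return len(gold & predicted) / len(gold)


def precision(gold: Set[Span], predicted: Set[Span]) -> float:
    """Fraction of redacted spans that match a gold annotation."""
    if not predicted:
        return 1.0  # nothing redacted, so nothing was over-redacted
    return len(gold & predicted) / len(predicted)


# Toy, made-up annotations for a single document (not data from the paper).
gold = {(0, 10, "NAME"), (25, 35, "DATE"), (50, 61, "PHONE"), (70, 80, "ID")}
predicted = {(0, 10, "NAME"), (25, 35, "DATE"), (50, 61, "PHONE"), (90, 95, "NAME")}

print(f"recall={recall(gold, predicted):.2f}")
print(f"precision={precision(gold, predicted):.2f}")
```

In a redaction setting, low recall means identifiable information was left in the text, while low precision corresponds to the over-redaction that the abstract attributes to rule-based methods.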

Type: Proceedings paper
Title: Validating Transformers for Redaction of Text from Electronic Health Records in Real-World Healthcare
Event: 2023 IEEE 11th International Conference on Healthcare Informatics (ICHI)
Dates: 26 Jun 2023 - 29 Jun 2023
ISBN-13: 9798350302639
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/ICHI57859.2023.00098
Publisher version: http://dx.doi.org/10.1109/ichi57859.2023.00098
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: electronic health records, text deidentification, transformers
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics > Clinical Epidemiology
URI: https://discovery.ucl.ac.uk/id/eprint/10187746