UCL Discovery

Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit

Kraljevic, Z; Searle, T; Shek, A; Roguski, L; Noor, K; Bean, D; Mascio, A; ... Dobson, RJB; (2021) Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit. Artificial Intelligence in Medicine, 117, Article 102083. 10.1016/j.artmed.2021.102083. Green open access

zeljko paper.pdf - Accepted Version

Abstract

Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of information extraction (IE) technologies to enable clinical analysis. We present the open source Medical Concept Annotation Toolkit (MedCAT) that provides: (a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; (b) a feature-rich annotation interface for customizing and training IE models; and (c) integrations with the broader CogStack ecosystem for vendor-agnostic health system deployment. We show improved performance in extracting UMLS concepts from open datasets (F1: 0.448-0.738 vs 0.429-0.650). Further real-world validation demonstrates SNOMED-CT extraction at 3 large London hospitals with self-supervised training over ∼8.8B words from ∼17M clinical records and further fine-tuning with ∼6K clinician-annotated examples. We show strong transferability (F1 > 0.94) between hospitals, datasets and concept types, indicating cross-domain EHR-agnostic utility for accelerated clinical and research use cases.

Type: Article
Title: Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit
Location: Netherlands
Open access status: An open access version is available from UCL Discovery
DOI: 10.1016/j.artmed.2021.102083
Publisher version: http://dx.doi.org/10.1016/j.artmed.2021.102083
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Clinical concept embeddings, Clinical natural language processing, Clinical ontology embeddings, Electronic health record information extraction
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics > Clinical Epidemiology
URI: https://discovery.ucl.ac.uk/id/eprint/10129853