Term-BLAST-Like Alignment Tool for Concept Recognition in Noisy Clinical Texts

Groza, Tudor; Wu, Honghan; Dinger, Marcel E; Danis, Daniel; Hilton, Coleman; Bagley, Anita; Davids, Jon R; ... Robinson, Peter N (2023) Term-BLAST-Like Alignment Tool for Concept Recognition in Noisy Clinical Texts. Bioinformatics, 39 (12), Article btad716. 10.1093/bioinformatics/btad716. Green open access

Abstract

Motivation: Methods for concept recognition (CR) in clinical texts have largely been tested on abstracts or articles from the medical literature. However, texts from electronic health records (EHRs) frequently contain spelling errors, abbreviations, and other non-standard ways of representing clinical concepts.

Results: Here, we present a method inspired by the BLAST algorithm for biosequence alignment that screens texts for potential matches on the basis of matching k-mer counts and scores candidates based on conformance to typical patterns of spelling errors derived from 2.9 million clinical notes. Our method, the Term-BLAST-like alignment tool (TBLAT), leverages a gold standard corpus for typographical errors to implement a sequence alignment-inspired method for efficient entity linkage. We present a comprehensive experimental comparison of TBLAT with five widely-used tools. Experimental results show a 10% increase in recall on scientific publications and a 20% increase in recall on EHR records (when compared against the next best method), hence supporting a significant enhancement of the entity linking task. The method can be used stand-alone or as a complement to existing approaches.

Availability: Fenominal is a Java library that implements TBLAT for named concept recognition of Human Phenotype Ontology terms and is available at https://github.com/monarch-initiative/fenominal under the GNU General Public License v3.0.

Supplementary information: Supplementary data are available at Bioinformatics online.
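
Illustrative sketch (not part of the published abstract): the k-mer screening idea mentioned in the Results can be approximated in a few lines of Python. This is not the Fenominal implementation, which is written in Java. The function names `kmers`, `kmer_overlap`, and `screen`, the choice of k = 3, and the 0.5 threshold are all assumptions made for this example, and the scoring of candidates against learned spelling-error patterns is omitted entirely.

```python
from collections import Counter


def kmers(text: str, k: int = 3) -> Counter:
    """Count overlapping character k-mers in a lowercased string."""
    s = text.lower()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def kmer_overlap(term: str, window: str, k: int = 3) -> float:
    """Fraction of the term's k-mers (with multiplicity) also present in the window."""
    term_counts = kmers(term, k)
    total = sum(term_counts.values())
    if total == 0:  # term shorter than k: nothing to compare
        return 0.0
    shared = sum((term_counts & kmers(window, k)).values())
    return shared / total


def screen(text: str, terms: list[str], k: int = 3, threshold: float = 0.5):
    """Return (term, window, score) for token windows that share enough k-mers.

    Hypothetical screening step: each ontology term is compared with every
    window of the same token length; only windows above the threshold are
    kept as candidates for a later, more careful scoring stage.
    """
    tokens = text.split()
    hits = []
    for term in terms:
        n = len(term.split())
        for i in range(len(tokens) - n + 1):
            window = " ".join(tokens[i:i + n])
            score = kmer_overlap(term, window, k)
            if score >= threshold:
                hits.append((term, window, score))
    return hits


if __name__ == "__main__":
    note = "patient has scolosis and muscle weekness"
    labels = ["scoliosis", "muscle weakness", "seizure"]
    for term, window, score in screen(note, labels):
        print(f"{term} <- {window} ({score:.2f})")
```

With these assumed settings, the misspelled "scolosis" and "muscle weekness" are retained as candidates for "scoliosis" and "muscle weakness", while "seizure" produces no candidate. A real system would follow this cheap filter with a scoring step such as the error-pattern model the abstract describes.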

Type: Article
Title: Term-BLAST-Like Alignment Tool for Concept Recognition in Noisy Clinical Texts
Location: England
Open access status: An open access version is available from UCL Discovery
DOI: 10.1093/bioinformatics/btad716
Publisher version: https://doi.org/10.1093/bioinformatics/btad716
Language: English
Additional information: Copyright © The Author(s) 2023. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
URI: https://discovery.ucl.ac.uk/id/eprint/10182387