Automating the extraction of otology symptoms from clinic letters: a methodological study using natural language processing


Joshi, N; Noor, K; Bai, X; Forbes, M; Ross, T; Barrett, L; Dobson, RJB; ... Lilaonitkul, W (2025) Automating the extraction of otology symptoms from clinic letters: a methodological study using natural language processing. BMC Medical Informatics and Decision Making, 25 (1), Article 353. 10.1186/s12911-025-03180-8. Green open access

PDF: Automating the extraction of otology symptoms from Clinical letters.pdf - Published Version (1MB)

Abstract

BACKGROUND: Most healthcare data is in an unstructured format that requires processing to make it usable for research. Generally, this is done manually, which is both time-consuming and poorly scalable. Natural language processing (NLP) using machine learning offers a method to automate data extraction. In this paper we describe the development of a set of NLP models to extract and contextualise otology symptoms from free-text documents.

METHODS: A dataset of 1,148 otology clinic letters, written between 2009 and 2011 at a London NHS hospital, was manually annotated and used to train a hybrid dictionary and machine learning NLP model to identify six key otological symptoms: hearing loss, impairment of balance, otalgia, otorrhoea, tinnitus and vertigo. Subsequently, a set of Bidirectional Long Short-Term Memory (Bi-LSTM) models was trained to extract contextual information for each symptom, for example, the laterality of the ear affected.

RESULTS: There were 1,197 symptom annotations and 2,861 contextual annotations, with 24% of patients presenting with hearing loss. The symptom extraction model achieved a macro F1 score of 0.73. The Bi-LSTM models achieved a mean macro F1 score of 0.69 for the contextualisation tasks.

CONCLUSION: NLP models for symptom extraction and contextualisation were successfully created and shown to perform well on real-life data. Refinement is needed to produce models that can run without manual review. Downstream applications for these models include deep semantic searching in electronic health records, cohort identification for clinical trials and facilitating research into hearing loss phenotypes. Further testing of the external validity of the developed models is required.
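
For readers unfamiliar with the macro F1 metric reported in the abstract, the following is a minimal sketch of how it is commonly computed: F1 is calculated separately for each class and the per-class scores are averaged with equal weight, so rare symptoms count as much as common ones. The function name `macro_f1` and the toy labels are hypothetical, and the sketch assumes one label per example; the paper's actual evaluation procedure (for example, span-level matching) is not described on this page and may differ.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores (single-label assumption)."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for class g
        else:
            fp[p] += 1      # predicted class p, but it was wrong
            fn[g] += 1      # true class g was missed
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (invented labels, not data from the study)
gold = ["tinnitus", "vertigo", "tinnitus", "otalgia"]
pred = ["tinnitus", "tinnitus", "tinnitus", "otalgia"]
score = macro_f1(gold, pred)
print(round(score, 2))
```

In this toy example the per-class F1 scores are 0.8 (tinnitus), 0.0 (vertigo) and 1.0 (otalgia), so a single missed class pulls the macro average down noticeably even though three of four predictions are correct.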

Type:	Article
Title:	Automating the extraction of otology symptoms from clinic letters: a methodological study using natural language processing
Location:	England
Open access status:	An open access version is available from UCL Discovery
DOI:	10.1186/s12911-025-03180-8
Publisher version:	https://doi.org/10.1186/s12911-025-03180-8
Language:	English
Additional information:	© 2025 BioMed Central Ltd. This article is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
Keywords:	Natural language processing, Machine learning, Otology, Symptoms
UCL classification:	UCL
	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences
	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > The Ear Institute
	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics > Clinical Epidemiology
URI:	https://discovery.ucl.ac.uk/id/eprint/10215618
