UCL Discovery

Improving Health Mention Classification Through Emphasising Literal Meanings: A Study Towards Diversity and Generalisation for Public Health Surveillance

Aduragba, OT; Yu, J; Cristea, A; Long, Y; (2023) Improving Health Mention Classification Through Emphasising Literal Meanings: A Study Towards Diversity and Generalisation for Public Health Surveillance. In: WWW '23: Proceedings of the ACM Web Conference 2023. (pp. 3928-3936). ACM. Green open access

3543507.3583877.pdf - Published Version

Abstract

People often use disease or symptom terms on social media and online forums in ways other than to describe their health. The NLP health mention classification (HMC) task therefore aims to identify posts in which users discuss health conditions literally rather than figuratively. Existing computational research typically studies health mentions only within well-represented groups in developed nations, so developing countries with limited health surveillance capabilities fail to benefit from such data when managing public health crises. To advance HMC research and benefit more diverse populations, we present the Nairaland health mention dataset (NHMD), a new dataset collected from a dedicated web forum for Nigerians. NHMD consists of 7,763 manually labelled posts extracted based on four diseases prevalent in Nigeria (HIV/AIDS, Malaria, Stroke and Tuberculosis). With NHMD, we conduct extensive experiments using current state-of-the-art models for HMC and find that, compared to existing public datasets, NHMD contains out-of-distribution examples; hence, it is well suited for domain adaptation studies. The introduction of NHMD provides better coverage of diverse and vulnerable populations and supports generalisation for HMC tasks in a global public health surveillance setting. Additionally, we present a novel multi-task learning approach for HMC that adds literal word meaning prediction as an auxiliary task. Experimental results demonstrate that the proposed approach outperforms state-of-the-art methods in terms of F1 score by a statistically significant margin (p < 0.01, Wilcoxon test), and show that our new dataset poses a strong challenge to existing HMC methods.

Type: Proceedings paper
Title: Improving Health Mention Classification Through Emphasising Literal Meanings: A Study Towards Diversity and Generalisation for Public Health Surveillance
Event: WWW '23
ISBN-13: 9781450394161
Open access status: An open access version is available from UCL Discovery
DOI: 10.1145/3543507.3583877
Publisher version: https://doi.org/10.1145/3543507.3583877
Language: English
Additional information: This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Datasets, Health Mention Classification, Public Health Surveillance, Multi-task Learning
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery.ucl.ac.uk/id/eprint/10171115
Downloads since deposit: 47