UCL Discovery

MedDistant19: A Challenging Benchmark for Distantly Supervised Biomedical Relation Extraction

Amin, Saadullah; Minervini, Pasquale; Chang, David; Neumann, Günter; Stenetorp, Pontus (2022) MedDistant19: A Challenging Benchmark for Distantly Supervised Biomedical Relation Extraction. In: Proceedings of the 21st BioNLP workshop associated with the ACL SIGBIOMED 2022. (pp. 1-17). ACL. Green open access

2204.04779v1.pdf - Accepted Version

Abstract

Relation extraction in the biomedical domain is challenging because labeled data is scarce and annotation is costly, requiring domain experts. Distant supervision is commonly used to tackle this scarcity by automatically pairing knowledge graph relationships with raw text. Distantly Supervised Biomedical Relation Extraction (Bio-DSRE) models can seemingly produce very accurate results on several benchmarks. However, given the challenging nature of the task, we set out to investigate the validity of such impressive results. We probed the datasets used by Amin et al. (2020) and Hogan et al. (2021) and found a significant overlap between training and evaluation relationships that, once resolved, reduced the accuracy of the models by up to 71%. Furthermore, we noticed several inconsistencies in the data construction process, such as in how negative samples were created and in the improper handling of redundant relationships. We mitigate these issues and present MedDistant19, a new benchmark dataset obtained by aligning MEDLINE abstracts with the widely used SNOMED Clinical Terms (SNOMED CT) knowledge base. We experimented with several state-of-the-art models, achieving an AUC of 55.4% and 49.8% at the sentence and bag level, respectively, showing that there is still plenty of room for improvement.
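
The train/evaluation overlap described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes each relationship is represented as a (head, relation, tail) triple, and the function names and toy triples below are hypothetical, not drawn from SNOMED CT or MedDistant19.

```python
from typing import Iterable, List, Set, Tuple

# Assumed representation: (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def triple_overlap(train: Iterable[Triple], test: Iterable[Triple]) -> Set[Triple]:
    """Return the triples that occur in both the training and evaluation splits."""
    return set(train) & set(test)


def remove_overlap(train: Iterable[Triple], test: Iterable[Triple]) -> List[Triple]:
    """Drop from the evaluation split every triple already seen in training."""
    seen = set(train)
    return [t for t in test if t not in seen]


# Toy, made-up triples used only for illustration.
train_triples = [
    ("aspirin", "may_treat", "headache"),
    ("insulin", "may_treat", "diabetes"),
]
test_triples = [
    ("aspirin", "may_treat", "headache"),   # also present in training: leaked
    ("metformin", "may_treat", "diabetes"),
]

leaked = triple_overlap(train_triples, test_triples)
clean_test = remove_overlap(train_triples, test_triples)
```

A model evaluated on `test_triples` could score well on the leaked triple simply by memorizing it; evaluating on `clean_test` removes that shortcut, which is the kind of effect the abstract reports when the overlap is resolved.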

Type: Proceedings paper
Title: MedDistant19: A Challenging Benchmark for Distantly Supervised Biomedical Relation Extraction
Event: 21st BioNLP workshop associated with the ACL SIGBIOMED 2022
Open access status: An open access version is available from UCL Discovery
Publisher version: https://aclweb.org/aclwiki/BioNLP_Workshop
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions.
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL
URI: https://discovery.ucl.ac.uk/id/eprint/10153270