Constructing Artificial Data for Fine-Tuning for Low-Resource Biomedical Text Tagging with Applications in PICO Annotation

Singh, G; Sabet, Z; Shawe-Taylor, J; Thomas, J; (2020) Constructing Artificial Data for Fine-Tuning for Low-Resource Biomedical Text Tagging with Applications in PICO Annotation. In: Explainable AI in Healthcare and Medicine. (pp. 131-145). Springer Nature: Cham, Switzerland. Green open access

Text: 1910.09255.pdf (Accepted Version, 459 kB)

Abstract

Biomedical text tagging systems are plagued by a dearth of labeled training data. Recent attempts to deal with this issue use pre-trained encoders: a pre-trained encoder provides a representation of the input text, which is then fed to task-specific layers for classification, and the entire network is fine-tuned on labeled data from the target task. Unfortunately, a low-resource biomedical task often has too few labeled instances for satisfactory fine-tuning. Moreover, if the label space is large, the majority of labels have few or no labeled instances. Most biomedical tagging systems treat labels as indexes, ignoring the fact that these labels are often concepts expressed in natural language, e.g. ‘Appearance of lesion on brain imaging’. To address these issues, we propose constructing extra labeled instances that use the label text (i.e. the label's name) as input for the corresponding label index. In fact, we propose a number of strategies for manufacturing multiple artificial labeled instances from a single label. The network is then fine-tuned on a combination of real labeled instances and these newly constructed artificial ones. We evaluate the proposed approach on an important low-resource biomedical task called PICO annotation, which requires tagging raw text describing clinical trials with labels corresponding to different aspects of the trial, i.e. its PICO (Population, Intervention/Control, Outcome) characteristics. Our empirical results show that the proposed method achieves new state-of-the-art performance for PICO annotation, with very significant improvements over competitive baselines.
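The data-construction idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: this page does not describe the paper's actual strategies, so the two used here (the full label name, and every contiguous multi-word sub-phrase of it), the function names `artificial_instances` and `build_training_set`, the `min_words` parameter, and the toy data are all hypothetical. The sketch only builds the combined training set; fine-tuning the pre-trained encoder on it is out of scope.

```python
from typing import Dict, List, Tuple

# An instance pairs an input text with the label indexes it is tagged with.
Instance = Tuple[str, List[int]]


def artificial_instances(label_names: Dict[int, str], min_words: int = 2) -> List[Instance]:
    """Turn each label's name into several artificial labeled instances.

    Hypothetical strategies, for illustration only:
      1. the full label text becomes an input tagged with its own label index;
      2. every contiguous sub-phrase of at least `min_words` words becomes an
         additional input tagged with the same index.
    """
    out: List[Instance] = []
    for idx, name in sorted(label_names.items()):
        words = name.split()
        phrases = {name}  # strategy 1: the full label text
        # strategy 2: contiguous multi-word sub-phrases (includes the full span)
        for start in range(len(words)):
            for end in range(start + min_words, len(words) + 1):
                phrases.add(" ".join(words[start:end]))
        # sort for a deterministic order; every phrase maps to this label only
        out.extend((phrase, [idx]) for phrase in sorted(phrases))
    return out


def build_training_set(real: List[Instance], label_names: Dict[int, str],
                       min_words: int = 2) -> List[Instance]:
    """Combine real labeled instances with the artificial ones (real first)."""
    return list(real) + artificial_instances(label_names, min_words)


if __name__ == "__main__":
    # Toy labels: the first is the abstract's example, the second is hypothetical.
    labels = {0: "Appearance of lesion on brain imaging", 1: "Mortality"}
    # Made-up "real" instance, for illustration only.
    real = [("Patients were followed for death at 12 months", [1])]
    train = build_training_set(real, labels)
    print(len(train))  # 17: 1 real + 15 artificial for label 0 + 1 for label 1
    for text, tags in train[:4]:
        print(tags, text)
```

With a six-word label name, the sub-phrase strategy yields 15 distinct inputs from a single label, which illustrates how one label can supply many artificial instances even when it has no real annotations. In practice one would likely control how many artificial instances enter the mix relative to real ones; that ratio is not specified here.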

Type:	Proceedings paper
Title:	Constructing Artificial Data for Fine-Tuning for Low-Resource Biomedical Text Tagging with Applications in PICO Annotation
ISBN-13:	9783030533519
Open access status:	An open access version is available from UCL Discovery
DOI:	10.1007/978-3-030-53352-6_12
Publisher version:	https://doi.org/10.1007/978-3-030-53352-6_12
Language:	English
Additional information:	This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions.
Keywords:	Biomedical text tagging, PICO annotation, Artificial data, Transfer learning
UCL classification:	UCL > Provost and Vice Provost Offices > School of Education > UCL Institute of Education > IOE - Social Research Institute; UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Engineering Science > Dept of Computer Science
URI:	https://discovery.ucl.ac.uk/id/eprint/10118994
