BERT-Flow-VAE: A Weakly-supervised Model for Multi-Label Text Classification

Liu, Ziwen; Grau-Bove, Josep; Orr, Scott (2022) BERT-Flow-VAE: A Weakly-supervised Model for Multi-Label Text Classification. In: (Proceedings) COLING2022: The 29th International Conference on Computational Linguistics. Gyeongju, Republic of Korea. Green open access

Abstract

Multi-label Text Classification (MLTC) is the task of categorizing documents into one or more topics. Given the large volumes of data and the varying domains of such tasks, fully-supervised learning requires fully manually annotated datasets, which are costly and time-consuming to produce. In this paper, we propose BERT-Flow-VAE (BFV), a Weakly-Supervised Multi-Label Text Classification (WSMLTC) model that reduces the need for full supervision. This new model: (1) produces BERT sentence embeddings and calibrates them using a flow model, (2) generates an initial topic-document matrix by averaging the results of a seeded sparse topic model and a textual entailment model, which require only the surface names of the topics and 4-6 seed words per topic, and (3) adopts a VAE framework to reconstruct the embeddings under the guidance of the topic-document matrix. Finally, (4) it uses the means produced by the encoder model in the VAE architecture as predictions for MLTC. Experimental results on 6 multi-label datasets show that BFV substantially outperforms other baseline WSMLTC models on key metrics and achieves approximately 84% of the performance of a fully-supervised model.
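
Step (2) of the abstract can be illustrated with a minimal sketch, assuming each of the two models yields a documents-by-topics score matrix in [0, 1]. The function names, the example scores, and the 0.5 threshold are hypothetical and are not taken from the paper; the thresholding is shown only to make the matrix concrete and is not the paper's prediction step, which uses the VAE encoder means.

```python
import numpy as np

def initial_topic_document_matrix(topic_model_scores, entailment_scores):
    """Element-wise average of two (documents x topics) score matrices."""
    a = np.asarray(topic_model_scores, dtype=float)
    b = np.asarray(entailment_scores, dtype=float)
    if a.shape != b.shape:
        raise ValueError("score matrices must have the same shape")
    return (a + b) / 2.0

def to_multilabel(scores, threshold=0.5):
    """Hypothetical rule: assign every topic whose score reaches the threshold."""
    return (np.asarray(scores) >= threshold).astype(int)

# Made-up scores for 2 documents and 3 topics (assumption, not paper data).
topic_model_scores = np.array([[0.9, 0.1, 0.0],
                               [0.2, 0.7, 0.6]])
entailment_scores = np.array([[0.7, 0.3, 0.2],
                              [0.0, 0.9, 0.5]])

initial = initial_topic_document_matrix(topic_model_scores, entailment_scores)
labels = to_multilabel(initial)
print(initial)
print(labels)
```

Because a document may reach the threshold for several topics at once, each row of `labels` can contain more than one 1, which is what distinguishes the multi-label setting from single-label classification.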

Type: Proceedings paper
Title: BERT-Flow-VAE: A Weakly-supervised Model for Multi-Label Text Classification
Event: COLING2022: The 29th International Conference on Computational Linguistics
Open access status: An open access version is available from UCL Discovery
Publisher version: https://coling2022.org/
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of the Built Environment
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of the Built Environment > Bartlett School Env, Energy and Resources
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL
URI: https://discovery.ucl.ac.uk/id/eprint/10156145
Downloads since deposit: 106