UCL Discovery

Visual-Textual Attribute Learning for Class-Incremental Facial Expression Recognition

Lv, Yuanling; Huang, Guangyu; Yan, Yan; Xue, Jing-Hao; Chen, Si; Wang, Hanzi (2024) Visual-Textual Attribute Learning for Class-Incremental Facial Expression Recognition. IEEE Transactions on Multimedia. 10.1109/TMM.2024.3374573. (In press). Green open access.

Text: Xue_Visual-Textual_Attribute_Learning_for_Class-Incremental_Facial_Expression_Recognition.pdf (Download, 4MB)

Abstract

In this paper, we study facial expression recognition (FER) in the class-incremental learning (CIL) setting, which treats the classification of well-studied and easily accessible basic expressions as the initial task and gradually learns new compound expressions. Motivated by the fact that compound expressions are meaningful combinations of basic expressions, we treat basic expressions as attributes (i.e., semantic descriptors), so that compound expressions can be represented in terms of these attributes. To this end, we propose a novel visual-textual attribute learning network (VTA-Net), consisting mainly of a textual-guided visual module (TVM) and a textual compositional module (TCM), for class-incremental FER. Specifically, TVM extracts textual-aware visual features and classifies expressions by incorporating textual information into visual attribute learning. Meanwhile, TCM generates visual-aware textual features and predicts expressions by exploiting the dependency between the textual attributes and category names of old and new expressions, based on a textual compositional graph. In particular, a visual-textual distillation loss is introduced to calibrate TVM and TCM during incremental learning. Finally, the outputs of TVM and TCM are fused to make the final prediction. On the one hand, at each incremental task, the representations of visual attributes are strengthened because the attributes are shared across old and new expressions; this increases the stability of our method. On the other hand, the textual modality, which carries rich prior knowledge of the relevance between expressions, helps our model identify subtle visual distinctions between compound expressions, improving its plasticity. Experimental results on both in-the-lab and in-the-wild facial expression databases show the superiority of our method over several state-of-the-art methods for class-incremental FER.
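The paper itself is the authoritative description of VTA-Net; the short PyTorch-style sketch below is only a rough illustration, under stated assumptions, of two ideas mentioned in the abstract: fusing the TVM and TCM branch outputs into a final prediction, and a generic logit-distillation term standing in for the visual-textual distillation loss used to calibrate the two branches across incremental tasks. The function names, the fusion weight alpha, and the temperature are hypothetical and are not taken from the paper.

    import torch
    import torch.nn.functional as F

    def fused_prediction(logits_tvm, logits_tcm, alpha=0.5):
        # Fuse the textual-guided visual branch (TVM) and the textual
        # compositional branch (TCM) into a single class distribution.
        # alpha is a hypothetical fusion weight, not specified here.
        probs_tvm = F.softmax(logits_tvm, dim=-1)
        probs_tcm = F.softmax(logits_tcm, dim=-1)
        return alpha * probs_tvm + (1.0 - alpha) * probs_tcm

    def distillation_term(new_logits, old_logits, temperature=2.0):
        # Generic knowledge distillation between the current model's logits
        # and those of the frozen model from the previous task, standing in
        # for the paper's visual-textual distillation loss (whose exact form
        # is defined in the paper).
        old_probs = F.softmax(old_logits / temperature, dim=-1)
        new_log_probs = F.log_softmax(new_logits / temperature, dim=-1)
        return F.kl_div(new_log_probs, old_probs, reduction="batchmean") * temperature ** 2

    # Example: fuse dummy branch outputs for a batch of 4 samples and 10 classes.
    logits_tvm = torch.randn(4, 10)
    logits_tcm = torch.randn(4, 10)
    prediction = fused_prediction(logits_tvm, logits_tcm).argmax(dim=-1)

A simple weighted average of branch probabilities is only one plausible fusion choice; the paper's actual fusion and distillation formulations should be consulted for the method as proposed.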

Type: Article
Title: Visual-Textual Attribute Learning for Class-Incremental Facial Expression Recognition
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/TMM.2024.3374573
Publisher version: https://doi.org/10.1109/TMM.2024.3374573
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Facial expression recognition, Class-incremental learning, Multi-modality learning, Attribute learning
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery.ucl.ac.uk/id/eprint/10189044
Downloads since deposit: 63
