Lv, Yuanling; Huang, Guangyu; Yan, Yan; Xue, Jing-Hao; Chen, Si; Wang, Hanzi (2024) Visual-Textual Attribute Learning for Class-Incremental Facial Expression Recognition. IEEE Transactions on Multimedia. DOI: 10.1109/TMM.2024.3374573. (In press).
Abstract
In this paper, we study facial expression recognition (FER) in the class-incremental learning (CIL) setting, which defines the classification of well-studied and easily accessible basic expressions as an initial task while learning new compound expressions gradually. Motivated by the fact that compound expressions are meaningful combinations of basic expressions, we treat basic expressions as attributes (i.e., semantic descriptors), so that compound expressions are represented in terms of attributes. To this end, we propose a novel visual-textual attribute learning network (VTA-Net), mainly consisting of a textual-guided visual module (TVM) and a textual compositional module (TCM), for class-incremental FER. Specifically, TVM extracts textual-aware visual features and classifies expressions by incorporating textual information into visual attribute learning. Meanwhile, TCM generates visual-aware textual features and predicts expressions by exploiting the dependency between textual attributes and category names of old and new expressions based on a textual compositional graph. In particular, a visual-textual distillation loss is introduced to calibrate TVM and TCM during incremental learning. Finally, the outputs from TVM and TCM are fused to make a final prediction. On the one hand, at each incremental task, the representations of visual attributes are enhanced since visual attributes are shared across old and new expressions; this increases the stability of our method. On the other hand, the textual modality, which carries rich prior knowledge of the relevance between expressions, enables our model to identify subtle visual distinctions between compound expressions, improving its plasticity. Experimental results on both in-the-lab and in-the-wild facial expression databases show the superiority of our method over several state-of-the-art methods for class-incremental FER.
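The core idea of the abstract — treating basic expressions as attributes so that each compound expression becomes a combination of them — can be sketched as follows. This is a minimal illustration only; the attribute vocabulary is the six standard basic expressions, and the example compound labels are common ones from the compound-FER literature, not necessarily the paper's exact configuration:

```python
import numpy as np

# Basic expressions serve as the shared attribute vocabulary.
BASIC = ["happy", "sad", "fearful", "angry", "surprised", "disgusted"]

def attribute_vector(components):
    """Encode a compound expression as a multi-hot vector over basic expressions."""
    vec = np.zeros(len(BASIC))
    for name in components:
        vec[BASIC.index(name)] = 1.0
    return vec

# e.g. "happily surprised" combines the "happy" and "surprised" attributes,
# so old (basic) and new (compound) classes share attribute representations.
happily_surprised = attribute_vector(["happy", "surprised"])
sadly_fearful = attribute_vector(["sad", "fearful"])
```

Because every compound class reuses attributes already learned for the basic classes, each incremental task revisits (and so stabilizes) the shared attribute representations — the stability property the abstract describes.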
| Type: | Article |
|---|---|
| Title: | Visual-Textual Attribute Learning for Class-Incremental Facial Expression Recognition |
| Open access status: | An open access version is available from UCL Discovery |
| DOI: | 10.1109/TMM.2024.3374573 |
| Publisher version: | https://doi.org/10.1109/TMM.2024.3374573 |
| Language: | English |
| Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher's terms and conditions. |
| Keywords: | Facial expression recognition, Class-incremental learning, Multi-modality learning, Attribute learning |
| UCL classification: | UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science |
| URI: | https://discovery.ucl.ac.uk/id/eprint/10189044 |