TY - JOUR
T1 - Knowledge-Enhanced Facial Expression Recognition With Emotional-to-Neutral Transformation
AU - Li, Hangyu
AU - Xu, Yihan
AU - Yao, Jiangchao
AU - Wang, Nannan
AU - Gao, Xinbo
AU - Han, Bo
N1 - This work was supported in part by the National Natural Science Foundation of China under Grant U22A2096, Grant 62036007, Grant 62376235, and Grant 62306178, in part by Scientific and Technological Innovation Teams in Shaanxi Province under Grant 2025RS-CXTD-011, in part by Shaanxi Province Core Technology Research and Development Project under Grant 2024QY2-GJHX-11, in part by the Fundamental Research Funds for the Central Universities under Grant QTZX23042, in part by GDST Basic Research Fund under Grant 2022A1515011652 and Grant 2024A1515012399, in part by HKBU Faculty Niche Research under Grant RC-FNRA-IG/22-23/SCI/04, and in part by HKBU CSD Departmental Incentive Scheme. The associate editor coordinating the review of this article and approving it for publication was Prof. Qiang Wu. (Corresponding author: Nannan Wang.)
Publisher Copyright:
© 2025 IEEE.
PY - 2025/10/21
Y1 - 2025/10/21
N2 - Existing facial expression recognition (FER) methods typically fine-tune a pre-trained visual encoder using discrete labels. However, this form of supervision limits the ability to specify the emotional concepts of different facial expressions. In this paper, we observe that the rich knowledge in text embeddings, generated by vision-language models, is a promising alternative for learning discriminative facial expression representations. Inspired by this, we propose a novel knowledge-enhanced FER method with an emotional-to-neutral transformation. Specifically, we formulate the FER problem as a process of matching the similarity between a facial expression representation and text embeddings. Then, we transform the facial expression representation into a neutral representation by simulating the difference in text embeddings from a textual facial expression to a textual neutral expression. Finally, a self-contrast objective is introduced to pull the facial expression representation closer to the textual facial expression while pushing it farther from the neutral representation. We conduct evaluations with diverse pre-trained visual encoders, including ResNet-18 and Swin-T, on four challenging facial expression datasets. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art FER methods.
AB - Existing facial expression recognition (FER) methods typically fine-tune a pre-trained visual encoder using discrete labels. However, this form of supervision limits the ability to specify the emotional concepts of different facial expressions. In this paper, we observe that the rich knowledge in text embeddings, generated by vision-language models, is a promising alternative for learning discriminative facial expression representations. Inspired by this, we propose a novel knowledge-enhanced FER method with an emotional-to-neutral transformation. Specifically, we formulate the FER problem as a process of matching the similarity between a facial expression representation and text embeddings. Then, we transform the facial expression representation into a neutral representation by simulating the difference in text embeddings from a textual facial expression to a textual neutral expression. Finally, a self-contrast objective is introduced to pull the facial expression representation closer to the textual facial expression while pushing it farther from the neutral representation. We conduct evaluations with diverse pre-trained visual encoders, including ResNet-18 and Swin-T, on four challenging facial expression datasets. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art FER methods.
KW - Facial expression recognition (FER)
KW - representation transformation
KW - self-contrast
KW - text embedding
UR - http://www.scopus.com/inward/record.url?scp=105014952931&partnerID=8YFLogxK
U2 - 10.1109/TMM.2025.3604916
DO - 10.1109/TMM.2025.3604916
M3 - Journal article
AN - SCOPUS:105014952931
SN - 1520-9210
VL - 27
SP - 7864
EP - 7875
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -