TY - GEN
T1 - Enhanced Human-Machine Interactive Learning for Multimodal Emotion Recognition in Dialogue System
AU - Leung, Clement H.C.
AU - Deng, James J.
AU - Li, Yuanxi
N1 - Publisher Copyright:
© 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2022/12/23
Y1 - 2022/12/23
N2 - Emotion recognition has been well researched in single modalities over the past decade. However, people naturally express their emotions and feelings through more than one modality, such as voice, facial expressions, text, and behavior. In this paper, we propose a new method that models deep interactive learning over dual modalities (e.g., speech and text) to conduct multimodal emotion recognition. An unsupervised triplet-loss objective function is constructed to learn representations of emotional information from speech audio. We extract text emotional feature representations by transfer learning of text-to-text embeddings from the T5 pre-trained model. Human-machine interaction, such as user feedback, plays a vital role in improving multimodal emotion recognition in dialogue systems. A deep interactive learning model is constructed from explicit and implicit feedback. The human-machine interactive learning enhanced transformer model achieves higher accuracy and precision than its non-interactive counterpart.
AB - Emotion recognition has been well researched in single modalities over the past decade. However, people naturally express their emotions and feelings through more than one modality, such as voice, facial expressions, text, and behavior. In this paper, we propose a new method that models deep interactive learning over dual modalities (e.g., speech and text) to conduct multimodal emotion recognition. An unsupervised triplet-loss objective function is constructed to learn representations of emotional information from speech audio. We extract text emotional feature representations by transfer learning of text-to-text embeddings from the T5 pre-trained model. Human-machine interaction, such as user feedback, plays a vital role in improving multimodal emotion recognition in dialogue systems. A deep interactive learning model is constructed from explicit and implicit feedback. The human-machine interactive learning enhanced transformer model achieves higher accuracy and precision than its non-interactive counterpart.
KW - human-machine interaction
KW - interactive learning
KW - multimodal emotion recognition
KW - transformer model
UR - http://www.scopus.com/inward/record.url?scp=85150369792&partnerID=8YFLogxK
U2 - 10.1145/3579654.3579764
DO - 10.1145/3579654.3579764
M3 - Conference proceeding
AN - SCOPUS:85150369792
SN - 9781450398336
T3 - ACM International Conference Proceeding Series
BT - ACAI 2022 - Conference Proceedings
PB - Association for Computing Machinery (ACM)
CY - New York
T2 - 5th International Conference on Algorithms, Computing and Artificial Intelligence, ACAI 2022
Y2 - 23 December 2022 through 25 December 2022
ER -