Abstract
Current affective computing paradigms often treat emotional understanding and generation as separate tasks, yet the two inherently possess symbiotic potential for mutual enhancement. In this paper, we aim to bridge this gap with a unified framework. The primary challenge lies in extracting precise and semantically rich representations of abstract emotions, which are crucial for both tasks. To address this, we harness Chain-of-Thought reasoning in the latent space of multimodal large language models and propose EmoSym, a unified framework built on this foundation. Our framework proceeds in three key steps: 1) Emotional reasoning knowledge compression. To enable efficient transfer of emotional reasoning priors, we design specialized reasoning tokens that compress emotion-aware context from external reasoning knowledge bases into latent representations. 2) Verifiable reinforcement reasoning optimization. To ensure more reliable and consistent emotional reasoning, we develop a verifiable reinforcement learning paradigm that further refines the reasoning token with emotion-specific verifiable reward signals. After these two steps, the reasoning token simultaneously enhances emotional understanding and enriches semantic representations, benefiting subsequent emotional generation. 3) Reasoning-augmented generation and online feedback. We fuse the reasoning token with emotional representations and feed them into a diffusion model to generate emotion-evoking images. Additionally, to create a generation-to-understanding feedback loop, we propose an Online Emotional Memory Bank (OEMB), which leverages newly generated images to progressively update the training dataset during training and thereby reinforce understanding. Extensive experiments demonstrate the superior capabilities of our framework on both emotional understanding and generation tasks.
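The "verifiable" reward in step 2 can be illustrated with a minimal sketch: unlike a learned reward model, a verifiable reward is computed by rule from checkable quantities. The function below is an illustrative assumption only (the paper's actual reward design is not given in the abstract); it combines a label-correctness term with a small length penalty on the reasoning chain.

```python
def emotion_reward(predicted: str, target: str, reasoning_len: int, max_len: int = 64) -> float:
    """Hypothetical verifiable reward for emotional reasoning.

    +1.0 when the predicted emotion label matches the target, minus a
    small penalty proportional to reasoning length, capped at 0.1, to
    discourage bloated reasoning chains. Both terms are computed by
    rule, so the reward is verifiable rather than judged by a model.
    """
    correctness = 1.0 if predicted == target else 0.0
    length_penalty = min(reasoning_len / max_len, 1.0) * 0.1
    return correctness - length_penalty
```

A correct prediction with a half-length chain would score 0.95; an incorrect prediction with a maximal chain would score -0.1.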
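The OEMB feedback loop in step 3 can likewise be sketched as a confidence-ranked buffer: generated images enter the bank with a score, the weakest entries are evicted, and confident entries are periodically promoted into the training set. All names, the capacity policy, and the confidence threshold below are illustrative assumptions, not the authors' implementation.

```python
import heapq

class OnlineEmotionalMemoryBank:
    """Illustrative sketch of the OEMB idea: newly generated samples are
    stored with a confidence score, and high-confidence samples are
    progressively merged into the training dataset during training."""

    def __init__(self, capacity: int = 100, merge_threshold: float = 0.8):
        self.capacity = capacity                # max entries kept in the bank
        self.merge_threshold = merge_threshold  # min confidence to promote
        self._heap = []                         # min-heap of (confidence, id, sample)
        self._counter = 0                       # tie-breaker for heap ordering

    def add(self, sample, confidence: float) -> None:
        """Insert a generated sample; evict the lowest-confidence entry
        once the bank exceeds its capacity."""
        heapq.heappush(self._heap, (confidence, self._counter, sample))
        self._counter += 1
        if len(self._heap) > self.capacity:
            heapq.heappop(self._heap)           # drop the weakest sample

    def merge_into(self, train_set: list) -> list:
        """Promote samples above the confidence threshold into the
        training set and remove them from the bank."""
        keep = []
        for conf, cid, sample in self._heap:
            if conf >= self.merge_threshold:
                train_set.append(sample)
            else:
                keep.append((conf, cid, sample))
        heapq.heapify(keep)
        self._heap = keep
        return train_set
```

In use, the understanding model would be fine-tuned on the merged training set after each promotion round, closing the generation-to-understanding loop.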
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 33rd ACM International Conference on Multimedia |
| Place of Publication | New York |
| Publisher | Association for Computing Machinery (ACM) |
| Pages | 5451–5460 |
| Number of pages | 10 |
| ISBN (Print) | 9798400720352 |
| DOIs | |
| Publication status | Published - 27 Oct 2025 |
| Event | 33rd ACM International Conference on Multimedia, ACMMM25 - Dublin Royal Convention Centre, Dublin, Ireland<br>Duration: 27 Oct 2025 → 31 Oct 2025<br>Conference program: https://whova.com/embedded/event/sa54pNCpHUFy1OTIEiEzceQu5kPuSm3dYlEnqAJdV4o%3D/?utc_source=ems<br>Conference website: https://acmmm2025.org/<br>Conference proceedings: https://dl.acm.org/doi/proceedings/10.1145/3746027 |
Publication series
| Name | Proceedings of the ACM International Conference on Multimedia |
|---|---|
| Publisher | Association for Computing Machinery |
Conference
| Conference | 33rd ACM International Conference on Multimedia, ACMMM25 |
|---|---|
| Country/Territory | Ireland |
| City | Dublin |
| Period | 27/10/25 → 31/10/25 |
| Internet address | |
User-Defined Keywords
- Visual emotion understanding
- Emotional image content generation
- Unified emotional understanding and generation framework