Full-spectrum prompt tuning with sparse MoE for open-set recognition

  • Yifei Xie (Co-first author)
  • Chuanxing Geng (Co-first author)
  • Yahao Hu
  • Man Chen
  • Jun Chen
  • Zhisong Pan (Corresponding author for this work)

Research output: Contribution to journal › Journal article › peer-review

Abstract

Recent advances in open-set recognition leveraging vision-language models (VLMs) predominantly focus on improving textual prompts by exploiting (high-level) visual features from the final layer of the VLM image encoder. While these approaches demonstrate promising performance, they generally neglect the discriminative yet underutilized (low-level) visual details embedded in shallow layers of the image encoder, which also play a critical role in identifying unknown classes. More critically, despite their significance, integrating such low-level part-based features into textual prompts, which typically reflect high-level conceptual information, remains nontrivial due to inherent disparities in feature representation. To address these issues, we propose Full-Spectrum Prompt Tuning with Sparse Mixture-of-Experts (FSMoE), which leverages full-spectrum visual features across different layers to enhance textual prompts. Specifically, two complementary groups of textual tokens are strategically designed, i.e., high-level textual tokens and low-level textual tokens, with the former interacting with high-level visual features and the latter with their low-level counterparts, thus comprehensively enhancing textual prompts through full-spectrum visual features. Furthermore, to mitigate redundancy in low-level visual details, a sparse Mixture-of-Experts mechanism is introduced to adaptively select and weight the appropriate features from all low-level visual features through collaborative efforts among multiple experts. In addition, a routing-consistency contrastive loss is employed to further enforce intra-class consistency among experts. Extensive experiments demonstrate the effectiveness of our FSMoE.
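The sparse selection step described in the abstract can be illustrated with a minimal sketch: a router scores the low-level features from each shallow layer, keeps only the top-k "experts", and fuses them with renormalized softmax weights. All names, shapes, and the mean-pooled routing input below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sparse_moe_combine(low_level_feats, router_w, k=2):
    """Illustrative sparse MoE gating over low-level features (hypothetical).

    low_level_feats: (n_layers, d) array, one feature vector per shallow layer
    router_w: (d, n_layers) hypothetical router weights, one logit per expert
    Returns a (d,) vector: softmax-weighted sum of the top-k selected features.
    """
    pooled = low_level_feats.mean(axis=0)      # crude summary used for routing
    logits = pooled @ router_w                 # (n_layers,) one score per expert
    topk = np.argsort(logits)[-k:]             # indices of the k best experts
    w = np.exp(logits[topk] - logits[topk].max())
    w = w / w.sum()                            # renormalize gates over top-k only
    return (w[:, None] * low_level_feats[topk]).sum(axis=0)

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))   # e.g. 4 shallow layers, 8-dim features
Wr = rng.normal(size=(8, 4))
fused = sparse_moe_combine(feats, Wr, k=2)
print(fused.shape)
```

Because only k of the experts receive nonzero gate weight, the remaining low-level features contribute nothing to the fused vector, which is the sparsity the abstract attributes to the MoE mechanism.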

Original language: English
Article number: 108369
Number of pages: 13
Journal: Neural Networks
Volume: 196
Early online date: 2 Dec 2025
DOIs
Publication status: E-pub ahead of print - 2 Dec 2025

User-Defined Keywords

  • Adaptive textual prompts
  • Mixture-of-experts
  • Open-set recognition
  • Visual language models
