Abstract
Recent advances in open-set recognition leveraging vision-language models (VLMs) predominantly focus on improving textual prompts by exploiting (high-level) visual features from the final layer of the VLM image encoder. While these approaches demonstrate promising performance, they generally neglect the discriminative yet underutilized (low-level) visual details embedded in the shallow layers of the image encoder, which also play a critical role in identifying unknown classes. More critically, integrating such low-level, part-based features into textual prompts, which typically reflect high-level conceptual information, remains nontrivial due to inherent disparities in feature representation. To address these issues, we propose Full-Spectrum Prompt Tuning with Sparse Mixture-of-Experts (FSMoE), which leverages full-spectrum visual features across different layers to enhance textual prompts. Specifically, two complementary groups of textual tokens are designed: high-level textual tokens, which interact with high-level visual features, and low-level textual tokens, which attend to their low-level visual counterparts, thereby enhancing textual prompts with full-spectrum visual features. Furthermore, to mitigate redundancy in low-level visual details, a sparse Mixture-of-Experts mechanism adaptively selects and weights appropriate features from all low-level visual features through the collaborative efforts of multiple experts. In addition, a routing-consistency contrastive loss is employed to further enforce intra-class consistency among experts. Extensive experiments demonstrate the effectiveness of FSMoE.
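The sparse selection described in the abstract can be sketched roughly as follows. This is a minimal illustration of top-k sparse gating over per-layer features, not the paper's actual implementation: the routing function, feature shapes, and the `sparse_moe_select` name are all illustrative assumptions.

```python
import numpy as np

def sparse_moe_select(layer_feats, router_w, k=2):
    """Illustrative top-k sparse gating over per-layer visual features.

    layer_feats: (L, d) array, one feature vector per shallow layer.
    router_w:    (d,) hypothetical routing weights.
    k:           number of layers kept active.
    """
    logits = layer_feats @ router_w              # (L,) routing scores
    top = np.argsort(logits)[-k:]                # indices of the top-k layers
    gate = np.exp(logits[top] - logits[top].max())
    gate = gate / gate.sum()                     # softmax over the selected layers only
    fused = (gate[:, None] * layer_feats[top]).sum(axis=0)  # weighted fusion
    return fused, top, gate

rng = np.random.default_rng(0)
feats = rng.standard_normal((6, 8))   # e.g. 6 shallow layers, 8-dim features
w = rng.standard_normal(8)
fused, picked, gate = sparse_moe_select(feats, w, k=2)
```

The key property of sparse (as opposed to dense) gating is that only the `k` selected layers contribute to the fused feature; all other low-level features receive exactly zero weight, which is how redundancy is suppressed.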
| Original language | English |
|---|---|
| Article number | 108369 |
| Number of pages | 13 |
| Journal | Neural Networks |
| Volume | 196 |
| Early online date | 2 Dec 2025 |
| DOIs | |
| Publication status | E-pub ahead of print - 2 Dec 2025 |
User-Defined Keywords
- Adaptive textual prompts
- Mixture-of-experts
- Open-set recognition
- Visual language models
Fingerprint
Dive into the research topics of 'Full-spectrum prompt tuning with sparse MoE for open-set recognition'. Together they form a unique fingerprint.