Abstract
Vision-language foundation models have exhibited remarkable success across a multitude of downstream tasks due to their scalability on extensive image-text paired data. However, these models also display significant limitations when applied to downstream tasks, such as fine-grained image classification, as a result of “decision shortcuts” that hinder their generalization capabilities. In this work, we find that the CLIP model possesses a rich set of features, encompassing both desired invariant causal features and undesired decision shortcuts. Moreover, the underperformance of CLIP on downstream tasks originates from its inability to effectively utilize pretrained features in accordance with specific task requirements. To address this challenge, we propose a simple yet effective method, Spurious Feature Eraser (SEraser), which alleviates decision shortcuts by erasing the spurious features. Specifically, we introduce a test-time prompt tuning paradigm that optimizes a learnable prompt, thereby compelling the model to exploit invariant features while disregarding decision shortcuts during the inference phase. The proposed method effectively alleviates excessive dependence on potentially misleading spurious information. A comparative analysis against various approaches validates the significant superiority of the proposed method.
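As a rough illustration of what test-time prompt tuning means in general, the sketch below tunes a learnable prompt for a single test sample by minimizing the entropy of the prediction averaged over augmented views. This is a generic entropy-minimization objective chosen for illustration and is not claimed to be the SEraser objective, which the abstract does not specify. Everything here is an assumption: random NumPy vectors stand in for CLIP image and text features, the prompt is modeled as a shared offset added to the class embeddings, and the function names (`marginal_entropy`, `tune_prompt`) are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def marginal_entropy(prompt, view_feats, class_emb, scale=10.0):
    """Entropy of the class prediction averaged over augmented views."""
    # Toy prompt model (assumption): a shared offset added to every class embedding.
    text = class_emb + prompt
    text = text / np.linalg.norm(text, axis=-1, keepdims=True)
    probs = softmax(scale * view_feats @ text.T)  # shape: (n_views, n_classes)
    p = probs.mean(axis=0)
    return float(-(p * np.log(p + 1e-12)).sum())

def numerical_grad(f, x, eps=1e-5):
    """Central finite-difference gradient; adequate for a tiny toy problem."""
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

def tune_prompt(view_feats, class_emb, steps=20, lr=0.1):
    """Gradient descent on the prompt only; the 'encoders' stay frozen."""
    prompt = np.zeros(class_emb.shape[1])
    loss_fn = lambda p: marginal_entropy(p, view_feats, class_emb)
    loss = loss_fn(prompt)
    for _ in range(steps):
        candidate = prompt - lr * numerical_grad(loss_fn, prompt)
        cand_loss = loss_fn(candidate)
        if cand_loss < loss:
            prompt, loss = candidate, cand_loss  # accept only improving steps
        else:
            lr *= 0.5                            # otherwise shrink the step size
    return prompt, loss

# Random stand-ins for frozen features: 3 classes, 4 augmented views, dimension 8.
rng = np.random.default_rng(0)
class_emb = rng.normal(size=(3, 8))
class_emb /= np.linalg.norm(class_emb, axis=-1, keepdims=True)
views = rng.normal(size=(4, 8))
views /= np.linalg.norm(views, axis=-1, keepdims=True)

before = marginal_entropy(np.zeros(8), views, class_emb)
prompt, after = tune_prompt(views, class_emb)
print(f"entropy before: {before:.4f}, after: {after:.4f}")
```

Only the prompt vector is updated, which mirrors the general idea of adapting at inference time without touching the pretrained weights. A real implementation would use an actual vision-language model and automatic differentiation rather than finite differences.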
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 39th AAAI Conference on Artificial Intelligence, AAAI 2025 |
| Editors | Toby Walsh, Julie Shah, Zico Kolter |
| Place of Publication | Washington |
| Publisher | AAAI Press |
| Pages | 19323-19331 |
| Number of pages | 9 |
| Volume | 39 |
| Edition | 18 |
| ISBN (Electronic) | 157735897X, 9781577358978 |
| Publication status | Published - 11 Apr 2025 |
| Event | 39th AAAI Conference on Artificial Intelligence, Pennsylvania Convention Center, Philadelphia, United States. Duration: 25 Feb 2025 → 4 Mar 2025. Conference proceedings: https://ojs.aaai.org/index.php/AAAI/issue/archive; conference website: https://aaai.org/conference/aaai/aaai-25/; conference program: https://aaai.org/conference/aaai/aaai-25/program-overview/ |
Publication series
| Name | Proceedings of the AAAI Conference on Artificial Intelligence |
|---|---|
| Publisher | AAAI Press |
| ISSN (Print) | 2159-5399 |
| ISSN (Electronic) | 2374-3468 |
Conference
| Conference | 39th AAAI Conference on Artificial Intelligence |
|---|---|
| Abbreviated title | AAAI-25 |
| Country/Territory | United States |
| City | Philadelphia |
| Period | 25/02/25 → 4/03/25 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 4: Quality Education
Fingerprint
Dive into the research topics of 'Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model'. Together they form a unique fingerprint.