
Spurious Feature Eraser: Stabilizing Test-Time Adaptation for Vision-Language Foundation Model

  • Huan Ma
  • Yan Zhu
  • Changqing Zhang*
  • Peilin Zhao
  • Baoyuan Wu
  • Long Kai Huang
  • Qinghua Hu
  • Bingzhe Wu

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

Abstract

Vision-language foundation models have exhibited remarkable success across a multitude of downstream tasks due to their scalability on extensive image-text paired data. However, these models also display significant limitations when applied to downstream tasks, such as fine-grained image classification, as a result of “decision shortcuts” that hinder their generalization capabilities. In this work, we find that the CLIP model possesses a rich set of features, encompassing both desired invariant causal features and undesired decision shortcuts. Moreover, the underperformance of CLIP on downstream tasks originates from its inability to effectively utilize pretrained features in accordance with specific task requirements. To address this challenge, we propose a simple yet effective method, Spurious Feature Eraser (SEraser), to alleviate the decision shortcuts by erasing the spurious features. Specifically, we introduce a test-time prompt tuning paradigm that optimizes a learnable prompt, thereby compelling the model to exploit invariant features while disregarding decision shortcuts during the inference phase. The proposed method effectively alleviates excessive dependence on potentially misleading spurious information. We conduct comparative analysis of the proposed method against various approaches which validates the significant superiority.

Original language: English
Title of host publication: Proceedings of the 39th AAAI Conference on Artificial Intelligence, AAAI 2025
Editors: Toby Walsh, Julie Shah, Zico Kolter
Place of Publication: Washington
Publisher: AAAI Press
Pages: 19323-19331
Number of pages: 9
Volume: 39
Edition: 18
ISBN (Electronic): 157735897X, 9781577358978
Publication status: Published - 11 Apr 2025
Event: 39th AAAI Conference on Artificial Intelligence - Pennsylvania Convention Center, Philadelphia, United States
Duration: 25 Feb 2025 to 4 Mar 2025
https://ojs.aaai.org/index.php/AAAI/issue/archive (Conference Proceedings)
https://aaai.org/conference/aaai/aaai-25/ (Conference website)
https://aaai.org/conference/aaai/aaai-25/program-overview/ (Conference program)

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Publisher: AAAI Press
ISSN (Print): 2159-5399
ISSN (Electronic): 2374-3468

Conference

Conference: 39th AAAI Conference on Artificial Intelligence
Abbreviated title: AAAI-25
Country/Territory: United States
City: Philadelphia
Period: 25/02/25 to 4/03/25

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 4 - Quality Education
