Abstract
Pre-trained vision-language models such as CLIP have shown remarkable capabilities across various downstream tasks when given prompts that consist of a context concatenated with a class name, for example, ‘a photo of a [dog]’, where [dog] serves as the class prior. Advanced prompt-learning methods adapt these models to downstream tasks by initializing the context (for example, ‘a photo of a’) and then optimizing it. However, context optimization often generalizes poorly to novel classes or to datasets sampled from different distributions. This may be attributed to prompt inconsistency: prompts optimized on one image distribution may differ from those optimized on another. To improve the generalization of optimized prompts, we propose consistent prompt learning (CPL), a novel approach that performs distributional exploration to identify and address the image distribution that causes prompt inconsistency. CPL identifies and mitigates prompt inconsistency within an adversarial training scheme, in which inconsistency is measured as a similarity discrepancy: CPL computes the similarities between a query image and two different prompts, and takes the discrepancy between these two similarities as the degree of prompt inconsistency. Distributional exploration then enlarges this discrepancy, and adversarial training mitigates it. Consequently, the model's predictions become insensitive to prompt changes, and the optimized prompt performs well under various image distributions. Comprehensive experiments on four types of representative tasks across 11 datasets show that CPL improves on existing prompt-learning methods and achieves state-of-the-art performance.
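The min-max idea described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and every name in it is hypothetical. It assumes that plain vectors stand in for CLIP image and text features, that a prompt embedding is simply the context vector plus the class-name vector, that the two prompts share class names but use a learnable context `ctx_a` and a fixed context `ctx_b`, and that the discrepancy is the squared gap between the two cosine similarities. "Distributional exploration" is approximated here by a PGD-style signed-gradient perturbation of the image features within a per-coordinate bound `eps`, and gradients are taken by central finite differences so that only the standard library is needed.

```python
# Hypothetical toy sketch of a CPL-style min-max objective; not the paper's code.
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-12)


def prompt_embedding(context, class_vec):
    # Assumption: a stand-in for a CLIP text encoder, where the prompt
    # embedding is just the context vector plus the class-name vector.
    return [c + k for c, k in zip(context, class_vec)]


def discrepancy(image, ctx_a, ctx_b, class_vecs):
    """Sum over classes of the squared gap between the image's similarity to
    the prompt built from ctx_a and to the prompt built from ctx_b (assumed form)."""
    total = 0.0
    for k in class_vecs:
        s_a = cosine(image, prompt_embedding(ctx_a, k))
        s_b = cosine(image, prompt_embedding(ctx_b, k))
        total += (s_a - s_b) ** 2
    return total


def numerical_grad(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def sign(v):
    """Return 1, 0, or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def explore(image, ctx_a, ctx_b, class_vecs, eps=0.5, step=0.1, iters=5):
    """Exploration (maximization) step: signed-gradient ascent on the discrepancy
    with respect to an additive feature perturbation, clipped to [-eps, eps]."""
    def f(d):
        return discrepancy([x + di for x, di in zip(image, d)], ctx_a, ctx_b, class_vecs)

    delta = [0.0] * len(image)
    for _ in range(iters):
        g = numerical_grad(f, delta)
        delta = [max(-eps, min(eps, d + step * sign(gi))) for d, gi in zip(delta, g)]
    return [x + d for x, d in zip(image, delta)]


def train_step(image, ctx_a, ctx_b, class_vecs, lr=0.1):
    """Min-max step: find a discrepancy-maximizing image, then take one
    gradient-descent step on the learnable context ctx_a (ctx_b stays fixed)."""
    adv = explore(image, ctx_a, ctx_b, class_vecs)
    g = numerical_grad(lambda c: discrepancy(adv, c, ctx_b, class_vecs), ctx_a)
    new_ctx_a = [c - lr * gi for c, gi in zip(ctx_a, g)]
    return new_ctx_a, adv


if __name__ == "__main__":
    image, ctx_a, ctx_b = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
    classes = [[0.0, 0.0]]  # one class with a zero vector, so prompts equal their contexts
    new_ctx_a, adv = train_step(image, ctx_a, ctx_b, classes)
    # Discrepancy on the explored image before and after the context update.
    print(discrepancy(adv, ctx_a, ctx_b, classes), discrepancy(adv, new_ctx_a, ctx_b, classes))
```

The default values of `eps`, `step`, `iters`, and `lr` are illustrative only. A realistic version would more likely perturb images (or features from CLIP's image encoder), compare class-probability distributions rather than single similarities, and use automatic differentiation instead of finite differences.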
Field | Value |
---|---|
Original language | English |
Article number | 112974 |
Number of pages | 9 |
Journal | Knowledge-Based Systems |
Volume | 310 |
DOIs | |
Publication status | Published - 15 Feb 2025 |
Scopus Subject Areas
- Software
- Management Information Systems
- Information Systems and Management
- Artificial Intelligence
User-Defined Keywords
- Domain adaptation
- Domain generalization
- Prompt learning
- Vision-language models