
Confidence Calibration in Contrastive Vision-Language Models

  • Shuoyuan Wang
  • Kaiyang Zhou
  • Hongxin Wei*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Chapter › peer-review

Abstract

Contrastive Vision-Language Models (VLMs) have emerged as powerful tools, excelling in various open-vocabulary tasks such as image recognition, retrieval-augmented task adaptation, and visual chatbots. To better adapt to downstream tasks, various parameter-efficient fine-tuning approaches have been developed by the community, e.g., prompt learning. However, an important issue has received little attention: the confidence calibration problem in zero-shot or fine-tuned VLMs, which can significantly undermine the reliability of these models in downstream applications. This chapter addresses this issue by systematically studying the confidence calibration problem in the context of prompt learning for CLIP. The analysis reveals that existing calibration techniques are inadequate, particularly in open-vocabulary scenarios. This chapter then discusses a simple yet effective approach called Distance-Aware Calibration (DAC), which automatically adjusts the temperature scaling parameter based on the distance between predicted text labels and base classes. The effectiveness of the approach is validated on 7 prompt learning methods across 11 downstream tasks.
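
As a rough illustration of the mechanism the abstract describes, the sketch below applies temperature scaling to a set of logits and lets the temperature depend on a distance value. It is not the chapter's formulation of DAC: the function `distance_aware_temperature`, its `strength` parameter, the linear rule, and the choice to raise the temperature as distance grows are all assumptions made for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a positive temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits, temperature=1.0):
    """Confidence is the largest predicted probability."""
    return max(softmax(logits, temperature))


def distance_aware_temperature(base_temperature, distance, strength=1.0):
    """Hypothetical rule (an assumption, not taken from the chapter):
    increase the temperature linearly with the distance between the
    predicted text label and the base classes."""
    return base_temperature * (1.0 + strength * distance)


# Same logits, scored once for a label close to the base classes
# and once for a label far from them.
logits = [4.0, 1.0, 0.5]
near = confidence(logits, distance_aware_temperature(1.0, distance=0.0))
far = confidence(logits, distance_aware_temperature(1.0, distance=1.0))
print(f"confidence near base classes: {near:.3f}")
print(f"confidence far from base classes: {far:.3f}")
```

Under these assumptions the predicted class is unchanged, since dividing all logits by a positive temperature preserves their order; only the reported confidence is lowered for the label that lies farther from the base classes.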

Original language: English
Title of host publication: Large Vision-Language Models
Subtitle of host publication: Pre-training, Prompting, and Applications
Editors: Kaiyang Zhou, Ziwei Liu, Peng Gao
Place of Publication: Cham
Publisher: Springer Cham
Chapter: 9
Pages: 207-226
Number of pages: 20
ISBN (Electronic): 9783031949692
ISBN (Print): 9783031949685, 9783031949715
Publication status: Published - 30 Aug 2025

Publication series

Name: Advances in Computer Vision and Pattern Recognition
Volume: Part F886
ISSN (Print): 2191-6586
ISSN (Electronic): 2191-6594

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 9 - Industry, Innovation, and Infrastructure

User-Defined Keywords

  • Confidence calibration
  • Prompt learning
  • Vision-language model
