What Makes Good Examples for Visual In-Context Learning?

Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

4 Citations (Scopus)

Abstract

Large vision models with billions of parameters, trained on broad data, have great potential in numerous downstream applications. However, these models are typically difficult to adapt due to their large parameter size and, in some cases, lack of access to their weights: entities able to develop large vision models often provide APIs only. In this paper, we study how to better utilize large vision models through the lens of in-context learning, a concept that is well known in natural language processing but has only recently been studied in computer vision. In-context learning refers to the ability to perform inference on tasks never seen during training by simply conditioning on in-context examples (i.e., input-output pairs), without updating any internal model parameters. To demystify in-context learning in computer vision, we conduct extensive research and identify a critical problem: downstream performance is highly sensitive to the choice of visual in-context examples. To address this problem, we propose a prompt retrieval framework specifically for large vision models, allowing the selection of in-context examples to be fully automated. Concretely, we provide two implementations: (i) an unsupervised prompt retrieval method based on nearest-example search using an off-the-shelf model, and (ii) a supervised prompt retrieval method, which trains a neural network to choose examples that directly maximize in-context learning performance. Neither method requires access to the internal weights of large vision models. Our results demonstrate that our methods bring non-trivial improvements to visual in-context learning in comparison to the commonly used random selection.
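To make the unsupervised variant concrete, the following is a minimal sketch of nearest-example prompt retrieval: candidate in-context examples are ranked by feature similarity to the query image using an off-the-shelf encoder, and the top match is used as the prompt. The choice of CLIP ViT-B/32 as the encoder, the file-path interface, and the helper names are illustrative assumptions, not the paper's exact implementation.

    # Sketch only: encoder choice and interfaces are assumptions.
    import torch
    import clip  # OpenAI CLIP package; any off-the-shelf encoder would do
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    def embed(paths):
        """Encode a list of image paths into L2-normalized CLIP features."""
        batch = torch.stack(
            [preprocess(Image.open(p).convert("RGB")) for p in paths]
        ).to(device)
        with torch.no_grad():
            feats = model.encode_image(batch).float()
        return feats / feats.norm(dim=-1, keepdim=True)

    def retrieve_prompt(query_path, candidate_paths, k=1):
        """Return the k candidate examples most similar to the query image."""
        query = embed([query_path])             # shape (1, d)
        candidates = embed(candidate_paths)     # shape (n, d)
        scores = (query @ candidates.T).squeeze(0)  # cosine similarities
        topk = scores.topk(k).indices.tolist()
        return [candidate_paths[i] for i in topk]

    # The retrieved (image, annotation) pairs are then passed to the frozen
    # large vision model as the in-context prompt alongside the query image.

The supervised variant replaces this fixed feature space with one trained so that similarity scores track downstream in-context learning performance; neither variant touches the large vision model's weights.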

Original language: English
Title of host publication: 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Editors: A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, S. Levine
Publisher: Neural Information Processing Systems Foundation
Pages: 1-22
Number of pages: 22
ISBN (Print): 9781713899921
Publication status: Published - Dec 2023
Event: 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - Ernest N. Morial Convention Center, New Orleans, United States
Duration: 10 Dec 2023 - 16 Dec 2023
https://proceedings.neurips.cc/paper_files/paper/2023 (conference paper search)
https://openreview.net/group?id=NeurIPS.cc/2023/Conference#tab-accept-oral (conference paper search)
https://neurips.cc/Conferences/2023 (conference website)

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 36
ISSN (Print): 1049-5258
Name: NeurIPS Proceedings

Conference

Conference: 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Country/Territory: United States
City: New Orleans
Period: 10/12/23 - 16/12/23

Scopus Subject Areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
