Towards Out-of-Modal Generalization without Instance-level Modal Correspondence

Zhuo Huang, Gang Niu, Bo Han, Masashi Sugiyama, Tongliang Liu*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceedingConference proceedingpeer-review

Abstract

The world is understood from various modalities, such as appearance, sound, and language. Since each modality only partially represents objects in a certain meaning, leveraging additional ones is beneficial in both theory and practice. However, exploiting novel modalities normally requires cross-modal pairs corresponding to the same instance, which is extremely resource-consuming and sometimes even impossible, making knowledge exploration of novel modalities largely restricted. To seek practical multi-modal learning, here we study Out-of-Modal (OOM) Generalization as an initial attempt to generalize to an unknown modality without given instance-level modal correspondence. Specifically, we consider Semi-Supervised and Unsupervised scenarios of OOM Generalization, where the first has scarce correspondences and the second has none, and propose Connect&Explore (COX) to solve these problems. COX first connects OOM data and known In-Modal (IM) data through a variational information bottleneck framework to extract shared information. Then, COX leverages the shared knowledge to create emergent correspondences, which is theoretically justified from an information-theoretic perspective. As a result, the label information on OOM data emerges along with the correspondences, which helps explore the OOM data with unknown knowledge, thus benefiting generalization results. We carefully evaluate the proposed COX method under various OOM generalization scenarios, verifying its effectiveness and extensibility. The code is available at https://github.com/tmllab/2025 ICLR COX.

Original languageEnglish
Title of host publicationProceedings of the Thirteenth International Conference on Learning Representations, ICLR 2025
PublisherInternational Conference on Learning Representations, ICLR
Pages56245-56264
Number of pages20
ISBN (Electronic)9798331320850
Publication statusPublished - 24 Apr 2025
Event13th International Conference on Learning Representations, ICLR 2025 - , Singapore
Duration: 24 Apr 202528 Apr 2025
https://iclr.cc/Conferences/2025 (Conference website)
https://openreview.net/group?id=ICLR.cc/2025/Conference#tab-accept-oral (Conference proceedings)

Publication series

NameInternational Conference on Learning Representations, ICLR

Conference

Conference13th International Conference on Learning Representations, ICLR 2025
Country/TerritorySingapore
Period24/04/2528/04/25
Internet address

Fingerprint

Dive into the research topics of 'Towards Out-of-Modal Generalization without Instance-level Modal Correspondence'. Together they form a unique fingerprint.

Cite this