Privileged information assisted learning from noisy correspondence

  • Zihua Zhao
  • Tianjie Dai
  • Mengxi Chen
  • Jiangchao Yao*
  • Bo Han
  • Ya Zhang*
  • Yanfeng Wang
  • *Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

Abstract

Noisy correspondence, which refers to mismatches in collected paired data, inevitably degrades the performance and generalization of cross-modal models. To address this, previous works typically rely on internal signals, such as model certainty or softened hard labels, to mitigate the influence of noise. In contrast, we explore a novel external structure by leveraging privileged information, with a core intuition: both modalities of a matched pair should be closely correlated with their shared privileged information, while for a mismatched pair, at least one modality will likely fail to align. Specifically, we propose a novel Privileged Information Assisted Learning (PIAL) method, which uses privileged information to explain away noisy correspondence by deriving an adaptive weighting mechanism. PIAL first disentangles the problem by estimating preliminary indicators for both the cross-modal correspondence and the privileged-information correspondence, then introduces a confidence-oriented fusion function to arrive at the final weighting term. Extensive experiments demonstrate the rationality, compatibility, and consistent effectiveness of PIAL relative to current state-of-the-art methods.
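
The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the indicators or the fusion function are computed, so everything below is an assumption made for illustration only: the function names (`indicator`, `pair_weights`), the use of cosine similarity, the minimum over the two modality-to-privileged scores (reflecting the intuition that at least one modality fails to align in a mismatched pair), and the confidence-weighted average are all hypothetical and should not be read as the PIAL formulation.

```python
import numpy as np

EPS = 1e-8  # avoids division by zero when both confidences vanish


def indicator(a, b):
    """Hypothetical preliminary indicator: row-wise cosine similarity
    rescaled from [-1, 1] to [0, 1]. Assumes non-zero embedding rows."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = np.sum(a * b, axis=1)
    return np.clip((sim + 1.0) / 2.0, 0.0, 1.0)


def pair_weights(img, txt, priv):
    """Illustrative per-pair weights in [0, 1]; NOT the paper's method.

    img, txt, priv: arrays of shape (n_pairs, dim) holding embeddings of
    the two modalities and of the shared privileged information.
    """
    # Indicator for the cross-modal correspondence.
    cross = indicator(img, txt)
    # Indicator for the privileged-information correspondence: a matched
    # pair needs BOTH modalities to agree with the privileged signal,
    # so take the weaker of the two agreements (assumed rule).
    privileged = np.minimum(indicator(img, priv), indicator(txt, priv))
    # Assumed confidence: distance of each indicator from the
    # uninformative value 0.5, rescaled to [0, 1].
    conf_cross = 2.0 * np.abs(cross - 0.5) + EPS
    conf_priv = 2.0 * np.abs(privileged - 0.5) + EPS
    # Confidence-weighted average of the two indicators (assumed fusion).
    return (conf_cross * cross + conf_priv * privileged) / (conf_cross + conf_priv)
```

Under these assumptions, the returned weights could be used to scale each pair's contribution to a retrieval loss, so that pairs judged mismatched by both indicators contribute little to training.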

Original language: English
Article number: 132733
Number of pages: 13
Journal: Neurocomputing
Volume: 672
Publication status: Published - 1 Apr 2026

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 9 - Industry, Innovation, and Infrastructure

User-Defined Keywords

  • Noisy correspondence
  • Cross-modal retrieval
  • Multimodal learning
  • Robust machine learning
