TY - JOUR
T1 - UCPM: Uncertainty-Guided Cross-Modal Retrieval With Partially Mismatched Pairs
AU - Zha, Quanxing
AU - Liu, Xin
AU - Cheung, Yiu-ming
AU - Peng, Shu-Juan
AU - Xu, Xing
AU - Wang, Nannan
N1 - This work was supported in part by the National Science Foundation of China under Grant 62476103 and Grant 62222203, in part by the National Science Foundation of Xiamen City under Grant 3502Z202473043, in part by the National Science Foundation of Fujian Province under Grant 2024J01096 and Grant 2022J01316, in part by the RGC General Research Fund under Grant 12202924 and Grant 12202622, in part by Hong Kong Baptist University under Grant RC-SFCRG/23-24/R2/SCI/10, and in part by the NSFC/RGC Joint Research Scheme under Grant N HKBU214/21.
PY - 2025/6/6
Y1 - 2025/6/6
N2 - Manually annotating perfectly aligned labels for cross-modal retrieval (CMR) is highly labor-intensive. Collecting co-occurring data pairs from the Internet is a cost-effective alternative, but it inevitably introduces Partially Mismatched Pairs (PMPs), which significantly degrade retrieval performance unless they are treated explicitly. Previous efforts often use pair-wise similarity to filter out mismatched pairs, but this operation is highly sensitive to mismatched or ambiguous data and thus leads to sub-optimal performance. To alleviate these concerns, we propose an efficient approach, termed UCPM (Uncertainty-guided Cross-modal retrieval with Partially Mismatched pairs), which can significantly reduce the adverse impact of mismatched data pairs. Specifically, a novel Uncertainty Guided Division (UGD) strategy divides the corrupted training data into confidently matched (clean), easily identifiable mismatched (noisy), and hard-to-determine (hard) subsets; the derived uncertainty simultaneously guides the learning of informative pairs and reduces the negative impact of potentially mismatched pairs. Meanwhile, an Uncertainty Self-Correction (USC) mechanism identifies and rectifies fluctuating uncertainty estimates during training, which further improves the stability and reliability of the estimated uncertainty. In addition, a Trusted Margin Loss (TML) is designed to enhance the discriminability between hard pairs by dynamically adjusting their soft margins, amplifying the positive contributions of matched pairs while suppressing the negative impact of mismatched pairs. Extensive experiments on three widely used benchmark datasets verify the effectiveness and reliability of UCPM compared with existing state-of-the-art approaches and show significantly improved robustness under both synthetic and real-world PMPs.
The code is available at: https://github.com/qxzha/UCPM
KW - Cross-modal retrieval
KW - partially mismatched pairs
KW - trusted margin loss
KW - uncertainty guided division
KW - uncertainty self-correction
UR - http://www.scopus.com/inward/record.url?scp=105007612158&partnerID=8YFLogxK
U2 - 10.1109/TIP.2025.3574918
DO - 10.1109/TIP.2025.3574918
M3 - Journal article
SN - 1057-7149
VL - 34
SP - 3622
EP - 3634
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -