TY - JOUR
T1 - Towards Efficient Cross-Modal Anomaly Detection Using Triple-Adaptive Network and Bi-Quintuple Contrastive Learning
AU - Peng, Shu-Juan
AU - Fan, Ye
AU - Cheung, Yiu-ming
AU - Liu, Xin
AU - Cui, Zhen
AU - Li, Taihao
N1 - Publisher Copyright:
IEEE
Funding Information:
This work was supported in part by the Open Project of Zhejiang Lab under Grant 2021KH0AB01, in part by the National Science Foundation of China under Grants 61673185 and 61672444, in part by RGC General Research Fund under Grants 12201321 and 12202622, in part by the NSFC/RGC Joint Research Scheme under Grant N_HKBU214/21, in part by the Innovation and Technology Fund of Innovation and Technology Commission of the Government of the Hong Kong SAR under Grant ITS/339/18, and in part by the National Science Foundation of Fujian Province under Grants 2020J01083 and 2020J01084.
PY - 2024/2
Y1 - 2024/2
N2 - Cross-modal anomaly detection is a relatively new and challenging research topic in the machine learning field, which aims to identify anomalies whose patterns are disparate across different modalities. As far as we know, this topic has yet to be well studied, and existing works often suffer from incomplete detection of anomalous data and low data utilization. To alleviate these limitations, this paper proposes an efficient deep cross-modal anomaly detection approach via Triple-adaptive Network and Bi-quintuple Contrastive Learning (TN-BCL), which is among the earliest attempts to detect various cross-modal anomalies within heterogeneous multi-modal data. Specifically, a triple-adaptive network is explicitly designed to identify various anomalies whose patterns are disparate in both single-modal and cross-modal scenarios. On the one hand, the top branch network is utilized to adaptively detect the attribute anomalies and part of the mixed anomalies in multi-modal data samples. On the other hand, the bottom two-branch network, with shared residual blocks, is leveraged to learn discriminative cross-modal embeddings. At the same time, an efficient bi-quintuple contrastive learning method is designed to enhance the feature correlation between data of the same attribute, while maximally enlarging the feature difference between data of different attributes. In addition, a bidirectional learning scheme is employed to significantly improve data utilization. Through the joint exploitation of the above, different kinds of anomalous samples can be well detected across different modalities. Extensive experiments show that the proposed framework outperforms state-of-the-art competing methods by a large margin.
AB - Cross-modal anomaly detection is a relatively new and challenging research topic in the machine learning field, which aims to identify anomalies whose patterns are disparate across different modalities. As far as we know, this topic has yet to be well studied, and existing works often suffer from incomplete detection of anomalous data and low data utilization. To alleviate these limitations, this paper proposes an efficient deep cross-modal anomaly detection approach via Triple-adaptive Network and Bi-quintuple Contrastive Learning (TN-BCL), which is among the earliest attempts to detect various cross-modal anomalies within heterogeneous multi-modal data. Specifically, a triple-adaptive network is explicitly designed to identify various anomalies whose patterns are disparate in both single-modal and cross-modal scenarios. On the one hand, the top branch network is utilized to adaptively detect the attribute anomalies and part of the mixed anomalies in multi-modal data samples. On the other hand, the bottom two-branch network, with shared residual blocks, is leveraged to learn discriminative cross-modal embeddings. At the same time, an efficient bi-quintuple contrastive learning method is designed to enhance the feature correlation between data of the same attribute, while maximally enlarging the feature difference between data of different attributes. In addition, a bidirectional learning scheme is employed to significantly improve data utilization. Through the joint exploitation of the above, different kinds of anomalous samples can be well detected across different modalities. Extensive experiments show that the proposed framework outperforms state-of-the-art competing methods by a large margin.
KW - Cross-modal anomaly detection
KW - triple-adaptive network
KW - bi-quintuple contrastive learning
KW - shared residual block
UR - http://www.scopus.com/inward/record.url?scp=85151567366&partnerID=8YFLogxK
U2 - 10.1109/TETCI.2023.3256466
DO - 10.1109/TETCI.2023.3256466
M3 - Journal article
SN - 2471-285X
VL - 8
SP - 697
EP - 709
JO - IEEE Transactions on Emerging Topics in Computational Intelligence
JF - IEEE Transactions on Emerging Topics in Computational Intelligence
IS - 1
ER -