TY - JOUR
T1 - Cross-Modality Person Re-Identification via Modality-Aware Collaborative Ensemble Learning
AU - Ye, Mang
AU - Lan, Xiangyuan
AU - Leng, Qingming
AU - Shen, Jianbing
N1 - Publisher Copyright:
© 1992-2012 IEEE.
PY - 2020/6/3
Y1 - 2020/6/3
AB - Visible-thermal person re-identification (VT-ReID) is a challenging cross-modality pedestrian retrieval problem due to the large intra-class variations and the modality discrepancy across different cameras. Existing VT-ReID methods mainly focus on learning cross-modality sharable feature representations by handling the modality discrepancy at the feature level. However, the modality difference at the classifier level has received much less attention, resulting in limited discriminability. In this paper, we propose a novel modality-aware collaborative ensemble (MACE) learning method with a middle-level sharable two-stream network (MSTN) for VT-ReID, which handles the modality discrepancy at both the feature level and the classifier level. At the feature level, MSTN achieves much better performance than existing methods by capturing sharable discriminative middle-level features in the convolutional layers. At the classifier level, we introduce both modality-specific and modality-sharable identity classifiers for the two modalities to handle the modality discrepancy. To utilize the complementary information among the different classifiers, we propose an ensemble learning scheme that incorporates the modality-sharable classifier and the modality-specific classifiers. In addition, we introduce a collaborative learning strategy that regularizes the modality-specific identity predictions and the ensemble outputs. Extensive experiments on two cross-modality datasets demonstrate that the proposed method outperforms the current state of the art by a large margin, achieving rank-1/mAP accuracy of 51.64%/50.11% on the SYSU-MM01 dataset and 72.37%/69.09% on the RegDB dataset.
KW - collaborative ensemble learning
KW - cross-modality
KW - person re-identification
UR - http://www.scopus.com/inward/record.url?scp=85093929748&partnerID=8YFLogxK
U2 - 10.1109/TIP.2020.2998275
DO - 10.1109/TIP.2020.2998275
M3 - Journal article
AN - SCOPUS:85093929748
SN - 1057-7149
VL - 29
SP - 9387
EP - 9399
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
M1 - 9107428
ER -