TY - JOUR
T1 - Improving night-time pedestrian retrieval with distribution alignment and contextual distance
AU - Ye, Mang
AU - Cheng, Yi
AU - Lan, Xiangyuan
AU - Zhu, Hongyuan
N1 - Funding Information:
This work was supported by Hong Kong Baptist University Tier-1 Start-up Grant. Paper no. TII-19-3081.
PY - 2020/1
Y1 - 2020/1
AB - Night-time pedestrian retrieval is a cross-modality retrieval task that matches person images between day-time visible images and night-time thermal images. It is a challenging problem due to the modality difference, camera variations, and person variations, but it plays an important role in night-time video surveillance. Existing cross-modality retrieval methods usually focus on learning modality-sharable feature representations to bridge the modality gap. In this article, we propose to utilize auxiliary information to improve retrieval performance, yielding consistent improvements across different baseline loss functions. Our auxiliary information contains two major parts: the cross-modality feature distribution and contextual information. The former aligns the feature distributions of the two modalities, and the latter optimizes the cross-modality distance measurement with contextual information. We also demonstrate that abundant annotated visible pedestrian images, which are easily accessible, help to improve cross-modality pedestrian retrieval as well. The proposed method is featured in two aspects: the auxiliary information requires no additional human intervention or annotation, and the model learns discriminative feature representations in an end-to-end deep learning manner. Extensive experiments on two cross-modality pedestrian retrieval datasets demonstrate the superiority of the proposed method, which achieves much better performance than the state of the art.
KW - contextual distance
KW - cross-modality
KW - distribution alignment
KW - pedestrian retrieval
UR - http://www.scopus.com/inward/record.url?scp=85078291617&partnerID=8YFLogxK
DO - 10.1109/TII.2019.2946030
M3 - Journal article
AN - SCOPUS:85078291617
SN - 1551-3203
VL - 16
SP - 615
EP - 624
JO - IEEE Transactions on Industrial Informatics
JF - IEEE Transactions on Industrial Informatics
IS - 1
M1 - 8861378
ER -