TY - JOUR
T1 - Fine-Grained Spatial Alignment Model for Person Re-Identification with Focal Triplet Loss
AU - Zhou, Qinqin
AU - Zhong, Bineng
AU - Lan, Xiangyuan
AU - Sun, Gan
AU - Zhang, Yulun
AU - Zhang, Baochang
AU - Ji, Rongrong
N1 - Funding Information:
Manuscript received October 8, 2019; revised May 8, 2020; accepted June 16, 2020. Date of publication June 29, 2020; date of current version July 13, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant U1705262, Grant 61972167, and Grant 61802135, in part by the National Key Research and Development Program under Grant 2017YFC0113000 and Grant 2016YFB1001503, in part by the Fundamental Research Funds for the Central Universities under Grant 30918014108, and in part by the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) under Grant 202000012. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Jianbing Shen. (Corresponding authors: Bineng Zhong; Rongrong Ji.) Qinqin Zhou and Bineng Zhong are with the Department of Computer Science and Technology, Huaqiao University, Xiamen 361021, China (e-mail: [email protected]).
PY - 2020/6/29
Y1 - 2020/6/29
N2 - Recent advances in person re-identification have advocated the use of human body cues to boost performance. However, most existing methods still rely on relatively coarse-grained local information, which may include redundant background regions that make models sensitive to visually similar persons in challenging scenarios such as complex poses, inaccurate detection, occlusion, and misalignment. In this paper, we propose a novel Fine-Grained Spatial Alignment Model (FGSAM) that mines fine-grained local information to handle these challenges effectively. In particular, we first design a pose resolve net with channel parse blocks (CPB) to extract pose information at the pixel level. This network makes the proposed model robust to complex pose variations while suppressing the redundant backgrounds caused by inaccurate detection and occlusion. Given the extracted pose information, a locally reinforced alignment mode is further proposed to address the misalignment problem by considering different local parts along with attribute information in a fine-grained way. Finally, a focal triplet loss is designed to train the entire model effectively; it imposes a constraint on the intra-class distance and an adaptive weight adjustment mechanism to handle the hard sample problem. Extensive evaluations and analysis on the Market1501, DukeMTMC-reID, and PETA datasets demonstrate the effectiveness of FGSAM in coping with misalignment, occlusion, and complex poses.
AB - Recent advances in person re-identification have advocated the use of human body cues to boost performance. However, most existing methods still rely on relatively coarse-grained local information, which may include redundant background regions that make models sensitive to visually similar persons in challenging scenarios such as complex poses, inaccurate detection, occlusion, and misalignment. In this paper, we propose a novel Fine-Grained Spatial Alignment Model (FGSAM) that mines fine-grained local information to handle these challenges effectively. In particular, we first design a pose resolve net with channel parse blocks (CPB) to extract pose information at the pixel level. This network makes the proposed model robust to complex pose variations while suppressing the redundant backgrounds caused by inaccurate detection and occlusion. Given the extracted pose information, a locally reinforced alignment mode is further proposed to address the misalignment problem by considering different local parts along with attribute information in a fine-grained way. Finally, a focal triplet loss is designed to train the entire model effectively; it imposes a constraint on the intra-class distance and an adaptive weight adjustment mechanism to handle the hard sample problem. Extensive evaluations and analysis on the Market1501, DukeMTMC-reID, and PETA datasets demonstrate the effectiveness of FGSAM in coping with misalignment, occlusion, and complex poses.
KW - focal triplet loss
KW - Person re-identification
KW - spatial alignment
UR - http://www.scopus.com/inward/record.url?scp=85088307926&partnerID=8YFLogxK
U2 - 10.1109/TIP.2020.3004267
DO - 10.1109/TIP.2020.3004267
M3 - Journal article
AN - SCOPUS:85088307926
SN - 1057-7149
VL - 29
SP - 7578
EP - 7589
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
M1 - 9127793
ER -