Recent advances in person re-identification have demonstrated the value of human body cues in boosting performance. However, most existing methods still rely on relatively coarse-grained local information. Such information may include redundant background regions, making these methods sensitive to visually similar persons in challenging scenarios such as complex poses, inaccurate detection, occlusion and misalignment. In this paper we propose a novel Fine-Grained Spatial Alignment Model (FGSAM) that mines fine-grained local information to handle the aforementioned challenges effectively. In particular, we first design a pose resolve net with channel parse blocks (CPB) to extract pose information at the pixel level. This network makes the proposed model robust to complex pose variations while suppressing the redundant backgrounds caused by inaccurate detection and occlusion. Given the extracted pose information, a locally reinforced alignment mode is further proposed to address the misalignment between different local parts by considering them, along with attribute information, in a fine-grained way. Finally, a focal triplet loss is designed to train the entire model effectively; it imposes an intra-class constraint and an adaptive weight adjustment mechanism to handle the hard sample problem. Extensive evaluations and analysis on the Market1501, DukeMTMC-reid and PETA datasets demonstrate the effectiveness of FGSAM in coping with misalignment, occlusion and complex poses.
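The focal triplet loss described above can be illustrated with a minimal sketch. The exact formulation belongs to the paper; the version below is an assumed reconstruction combining the two stated ingredients: a hinge on the positive-pair distance (the intra-class constraint) and a focal-style weight that grows with triplet difficulty (the adaptive weight adjustment for hard samples). The margin values, the sigmoid-based weighting, and the function name are illustrative choices, not the authors' specification.

```python
import numpy as np

def focal_triplet_loss(anchor, positive, negative,
                       margin=0.3, intra_margin=0.1, gamma=2.0):
    """Sketch of a focal-style triplet loss (assumed form).

    anchor/positive/negative: (batch, dim) embedding arrays.
    - Triplet term: hinge on d(a,p) - d(a,n) + margin (inter-class).
    - Intra-class constraint: penalize d(a,p) above intra_margin,
      pulling same-identity embeddings closer together.
    - Focal-style weight: harder triplets (larger hinge values) get
      larger weights, analogous to focal loss's modulating factor.
    """
    d_ap = np.linalg.norm(anchor - positive, axis=1)
    d_an = np.linalg.norm(anchor - negative, axis=1)

    triplet = np.maximum(d_ap - d_an + margin, 0.0)   # inter-class hinge
    intra = np.maximum(d_ap - intra_margin, 0.0)      # intra-class constraint

    # Adaptive weight: squash the hinge value into (0, 1) with a sigmoid
    # and raise it to gamma, so hard samples dominate the batch average.
    w = (1.0 / (1.0 + np.exp(-triplet))) ** gamma

    return float(np.mean(w * triplet + intra))
```

A quick sanity check of the intended behavior: an "easy" batch (positives near the anchor, negatives far) should yield a smaller loss than the same batch with positives and negatives swapped.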
Scopus Subject Areas
- Computer Graphics and Computer-Aided Design

Keywords
- Person re-identification
- focal triplet loss
- spatial alignment