TY - JOUR
T1 - DisP+V
T2 - A Unified Framework for Disentangling Prototype and Variation from Single Sample per Person
AU - Pang, Meng
AU - Wang, Binghui
AU - Ye, Mang
AU - Cheung, Yiu-ming
AU - Chen, Yiran
AU - Wen, Bihan
N1 - Funding Information:
The work of Meng Pang and Bihan Wen was supported in part by the Ministry of Education, Singapore, under a startup grant, in part by the National Research Foundation Singapore, and in part by the Singapore Cybersecurity Consortium (SGCSC) Grant Office under Grant SGCSC-Grant-2019-S01. The work of Yiu-ming Cheung was supported in part by NSFC under Grant 61672444; in part by Hong Kong Baptist University under Grant RC-FNRA-IG/18-19/SCI/03, Grant RC-IRCMs/18-19/SCI/01, and Grant AIS 21-22/03; and in part by the Innovation and Technology Fund of the Innovation and Technology Commission, Government of the Hong Kong SAR, under Project ITS/339/18.
Publisher Copyright:
© 2020 IEEE.
PY - 2023/2
Y1 - 2023/2
N2 - Single sample per person face recognition (SSPP FR) is one of the most challenging problems in FR due to the extreme lack of enrolment data. To date, the most popular SSPP FR methods are the generic learning methods, which recognize query face images based on the so-called prototype plus variation (i.e., P+V) model. However, the classic P+V model suffers from two major limitations: 1) it linearly combines the prototype and variation images in the observational pixel-spatial space and cannot generalize to multiple nonlinear variations, e.g., poses, which are common in face images; and 2) it would be severely impaired once the enrolment face images are contaminated by nuisance variations. To address these two limitations, it is desirable to disentangle the prototype and variation in a latent feature space and to manipulate the images in a semantic manner. To this end, we propose a novel disentangled prototype plus variation model, dubbed DisP+V, which consists of an encoder-decoder generator and two discriminators. The generator and discriminators play two adversarial games such that the generator nonlinearly encodes the images into a latent semantic space, where the more discriminative prototype feature and the less discriminative variation feature are disentangled. Meanwhile, the prototype and variation features can guide the generator to generate an identity-preserved prototype and the corresponding variation, respectively. Experiments on various real-world face datasets demonstrate the superiority of our DisP+V model over the classic P+V model for SSPP FR. Furthermore, DisP+V demonstrates its unique characteristics in both prototype recovery and face editing/interpolation.
KW - Adversarial learning
KW - disentangled representation
KW - face editing
KW - prototype recovery
KW - single sample per person
UR - http://www.scopus.com/inward/record.url?scp=85113253202&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2021.3103194
DO - 10.1109/TNNLS.2021.3103194
M3 - Journal article
SN - 2162-237X
VL - 34
SP - 867
EP - 881
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 2
ER -