TY - JOUR
T1 - Does Confusion Really Hurt Novel Class Discovery?
AU - Chi, Haoang
AU - Yang, Wenjing
AU - Liu, Feng
AU - Lan, Long
AU - Qin, Tao
AU - Han, Bo
N1 - This work was supported by the National Natural Science Foundation of China (No. 91948303-1, No. 62372459, No. 62376282). We would like to thank the editor and reviewers for their valuable comments that were very useful for improving the quality of this work. Publisher Copyright: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024
PY - 2024/8
Y1 - 2024/8
N2 - When sampling data of specific classes (i.e., known classes) for a scientific task, collectors may encounter unknown classes (i.e., novel classes). Since these novel classes might be valuable for future research, collectors will also sample them and assign them to several clusters with the help of known-class data. This assigning process is known as novel class discovery (NCD). However, category confusion is common in the sampling process and may make NCD unreliable. To tackle this problem, this paper introduces a new and more realistic setting in which collectors may misidentify known classes and even confuse known classes with novel classes; we name it NCD under unreliable sampling (NUSA). We find empirically that NUSA degrades existing NCD methods that do not account for sampling errors. To handle NUSA, we propose an effective solution, named the hidden-prototype-based discovery network (HPDN): (1) we try to obtain relatively clean data representations even from confusedly sampled data; (2) we propose a mini-batch K-means variant for robust clustering, which alleviates the negative impact of residual errors embedded in the representations by detaching the noisy supervision in a timely manner. Experiments demonstrate that, under NUSA, HPDN significantly outperforms competitive baselines (e.g., 6% higher than the best baseline on CIFAR-10) and remains robust under serious sampling errors.
AB - When sampling data of specific classes (i.e., known classes) for a scientific task, collectors may encounter unknown classes (i.e., novel classes). Since these novel classes might be valuable for future research, collectors will also sample them and assign them to several clusters with the help of known-class data. This assigning process is known as novel class discovery (NCD). However, category confusion is common in the sampling process and may make NCD unreliable. To tackle this problem, this paper introduces a new and more realistic setting in which collectors may misidentify known classes and even confuse known classes with novel classes; we name it NCD under unreliable sampling (NUSA). We find empirically that NUSA degrades existing NCD methods that do not account for sampling errors. To handle NUSA, we propose an effective solution, named the hidden-prototype-based discovery network (HPDN): (1) we try to obtain relatively clean data representations even from confusedly sampled data; (2) we propose a mini-batch K-means variant for robust clustering, which alleviates the negative impact of residual errors embedded in the representations by detaching the noisy supervision in a timely manner. Experiments demonstrate that, under NUSA, HPDN significantly outperforms competitive baselines (e.g., 6% higher than the best baseline on CIFAR-10) and remains robust under serious sampling errors.
KW - Novel class discovery
KW - Open-world recognition
KW - Semi-supervised learning
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85186860005&partnerID=8YFLogxK
U2 - 10.1007/s11263-024-02012-y
DO - 10.1007/s11263-024-02012-y
M3 - Journal article
SN - 0920-5691
VL - 132
SP - 3191
EP - 3207
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 8
ER -