TY - JOUR
T1 - Semi-Supervised and Long-Tailed Object Detection with CascadeMatch
AU - Zang, Yuhang
AU - Zhou, Kaiyang
AU - Huang, Chen
AU - Loy, Chen Change
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2023/4
Y1 - 2023/4
N2 - This paper focuses on long-tailed object detection in the semi-supervised learning setting, which poses realistic challenges but has rarely been studied in the literature. We propose a novel pseudo-labeling-based detector called CascadeMatch. Our detector features a cascade network architecture, which has multi-stage detection heads with progressive confidence thresholds. To avoid manually tuning the thresholds, we design a new adaptive pseudo-label mining mechanism to automatically identify suitable values from data. To mitigate confirmation bias, where a model is negatively reinforced by incorrect pseudo-labels produced by itself, each detection head is trained on the ensemble pseudo-labels of all detection heads. Experiments on two long-tailed datasets, i.e., LVIS and COCO-LT, demonstrate that CascadeMatch surpasses existing state-of-the-art semi-supervised approaches, across a wide range of detection architectures, in handling long-tailed object detection. For instance, CascadeMatch outperforms Unbiased Teacher by 1.9 AP^Fix on LVIS when using a ResNet50-based Cascade R-CNN structure, and by 1.7 AP^Fix when using Sparse R-CNN with a Transformer encoder. We also show that CascadeMatch can even handle the challenging sparsely annotated object detection problem. Code: https://github.com/yuhangzang/CascadeMatch.
KW - Object detection
KW - Long-tailed learning
KW - Semi-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85145831630&partnerID=8YFLogxK
U2 - 10.1007/s11263-022-01738-x
DO - 10.1007/s11263-022-01738-x
M3 - Journal article
AN - SCOPUS:85145831630
SN - 0920-5691
VL - 131
SP - 987
EP - 1001
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 4
ER -