TY - JOUR
T1 - Online binary classification from similar and dissimilar data
AU - Shu, Senlin
AU - Wang, Haobo
AU - Wang, Zhuowei
AU - Han, Bo
AU - Xiang, Tao
AU - An, Bo
AU - Feng, Lei
N1 - This research is supported by the Natural Science Foundation of China (No. 62106028), the Chongqing Overseas Chinese Entrepreneurship and Innovation Support Program, and the CAAI-Huawei MindSpore Open Fund.
Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature.
PY - 2024/6
Y1 - 2024/6
N2 - Similar-dissimilar (SD) classification aims to train a binary classifier from only similar and dissimilar data pairs, which indicate whether two instances belong to the same class (similar) or not (dissimilar). Although effective learning methods have been proposed for SD classification, they cannot handle online learning scenarios with sequential data, which are frequently encountered in real-world applications. In this paper, we provide the first attempt to investigate the online SD classification problem. Specifically, we first adapt the unbiased risk estimator of SD classification to online learning scenarios with a conservative regularization term, which serves as a naive method for solving the online SD classification problem. Then, by further introducing a margin-based criterion that decides whether to update the classifier with the received cost, we propose two improvements (one with linearly scaled cost and the other with quadratically scaled cost) that result in two online SD classification methods. Theoretically, we derive the regret, mistake, and relative loss bounds for our proposed methods, which guarantee their performance on sequential data. Extensive experiments on various datasets validate the effectiveness of our proposed methods.
AB - Similar-dissimilar (SD) classification aims to train a binary classifier from only similar and dissimilar data pairs, which indicate whether two instances belong to the same class (similar) or not (dissimilar). Although effective learning methods have been proposed for SD classification, they cannot handle online learning scenarios with sequential data, which are frequently encountered in real-world applications. In this paper, we provide the first attempt to investigate the online SD classification problem. Specifically, we first adapt the unbiased risk estimator of SD classification to online learning scenarios with a conservative regularization term, which serves as a naive method for solving the online SD classification problem. Then, by further introducing a margin-based criterion that decides whether to update the classifier with the received cost, we propose two improvements (one with linearly scaled cost and the other with quadratically scaled cost) that result in two online SD classification methods. Theoretically, we derive the regret, mistake, and relative loss bounds for our proposed methods, which guarantee their performance on sequential data. Extensive experiments on various datasets validate the effectiveness of our proposed methods.
KW - Online learning
KW - Passive-aggressive method
KW - Similar-dissimilar classification
KW - Unbiased risk estimator
UR - http://www.scopus.com/inward/record.url?scp=85180190126&partnerID=8YFLogxK
U2 - 10.1007/s10994-023-06434-6
DO - 10.1007/s10994-023-06434-6
M3 - Journal article
AN - SCOPUS:85180190126
SN - 0885-6125
VL - 113
SP - 3463
EP - 3484
JO - Machine Learning
JF - Machine Learning
IS - 6
ER -