TY - JOUR
T1 - Cold-Start Active Sampling Via γ-Tube
AU - Cao, Xiaofeng
AU - Tsang, Ivor W.
AU - Xu, Jianliang
N1 - Funding information:
The work of Ivor W. Tsang was supported by Australian Research Council under Grant DP180100106 and Grant DP200101328. The work of Jianliang Xu was supported by Hong Kong Research Grants Council (HK-RGC) under Grant C6030-18GF. This article was recommended by Associate Editor S. Ventura. (Corresponding author: Ivor W. Tsang.)
Publisher Copyright:
© 2021 IEEE.
PY - 2022/7
Y1 - 2022/7
N2 - Active learning (AL) improves the generalization performance of the current classification hypothesis by querying labels from a pool of unlabeled data. The sampling process is typically assessed by an informative, representative, or diverse evaluation policy. However, the policy, which needs an initial labeled set to start, may degrade in performance under a cold-start hypothesis. In this article, we first show that typical AL sampling can be equivalently formulated as geometric sampling over minimum enclosing balls (MEBs) of clusters. Following the γ-tube structure in geometric clustering, we then divide one MEB covering a cluster into two parts: 1) a γ-tube and 2) a γ-ball. By estimating the error disagreement between sampling in the MEB and in the γ-ball, our theoretical insight reveals that the γ-tube can effectively measure the disagreement between the hypothesis in the original space over the MEB and the hypothesis in the sampling space over the γ-ball. To tighten our insight, we present a generalization analysis, and the results show that sampling in the γ-tube can derive a higher probability bound on achieving a nearly zero generalization error. With these analyses, we finally apply the informative sampling policy of AL over the γ-tube to present a tube AL (TAL) algorithm against the cold-start sampling issue. As a result, the dependency between the querying process and the evaluation policy of active sampling can be alleviated. Experimental results show that, by using the γ-tube structure to deal with cold-start sampling, TAL achieves superior performance over standard AL evaluation baselines, yielding substantial accuracy improvements. An image edge recognition task further extends our theoretical results.
AB - Active learning (AL) improves the generalization performance of the current classification hypothesis by querying labels from a pool of unlabeled data. The sampling process is typically assessed by an informative, representative, or diverse evaluation policy. However, the policy, which needs an initial labeled set to start, may degrade in performance under a cold-start hypothesis. In this article, we first show that typical AL sampling can be equivalently formulated as geometric sampling over minimum enclosing balls (MEBs) of clusters. Following the γ-tube structure in geometric clustering, we then divide one MEB covering a cluster into two parts: 1) a γ-tube and 2) a γ-ball. By estimating the error disagreement between sampling in the MEB and in the γ-ball, our theoretical insight reveals that the γ-tube can effectively measure the disagreement between the hypothesis in the original space over the MEB and the hypothesis in the sampling space over the γ-ball. To tighten our insight, we present a generalization analysis, and the results show that sampling in the γ-tube can derive a higher probability bound on achieving a nearly zero generalization error. With these analyses, we finally apply the informative sampling policy of AL over the γ-tube to present a tube AL (TAL) algorithm against the cold-start sampling issue. As a result, the dependency between the querying process and the evaluation policy of active sampling can be alleviated. Experimental results show that, by using the γ-tube structure to deal with cold-start sampling, TAL achieves superior performance over standard AL evaluation baselines, yielding substantial accuracy improvements. An image edge recognition task further extends our theoretical results.
KW - Active learning (AL)
KW - generalization errors
KW - hypothesis
KW - minimum enclosing balls (MEBs)
KW - γ-tube
UR - http://www.scopus.com/inward/record.url?scp=85104614337&partnerID=8YFLogxK
U2 - 10.1109/TCYB.2021.3069956
DO - 10.1109/TCYB.2021.3069956
M3 - Journal article
AN - SCOPUS:85104614337
SN - 2168-2267
VL - 52
SP - 6034
EP - 6045
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
IS - 7
ER -