TY - JOUR
T1 - Point-to-Set Distance Metric Learning on Deep Representations for Visual Tracking
AU - Zhang, Shengping
AU - Qi, Yuankai
AU - Jiang, Feng
AU - Lan, Xiangyuan
AU - Yuen, Pong Chi
AU - Zhou, Huiyu
N1 - Funding Information:
Manuscript received February 20, 2017; revised September 12, 2017; accepted October 19, 2017. Date of publication November 20, 2017; date of current version December 26, 2017. This work was supported in part by the National Natural Science Foundation of China under Grant 61672188 and Grant 61572155 and in part by an RGC research grant under Grant RGC/HKBU12254316. The work of H. Zhou was supported in part by U.K. EPSRC under Grant EP/N508664/1, Grant EP/R007187/1, and Grant EP/N011074/1, and in part by the Royal Society-Newton Advanced Fellowship under Grant NA160342. The Associate Editor for this paper was Q. Wang. (Corresponding author: Yuankai Qi.) S. Zhang is with the School of Computer Science and Technology, Harbin Institute of Technology, Weihai 264209, China (e-mail: [email protected]).
PY - 2018/1
Y1 - 2018/1
N2 - In autonomous driving applications, a car must be able to track objects in the scene in order to estimate where and how they will move, so that the tracker embedded in the car can alert the vehicle in time for effective collision avoidance. Traditional discriminative object tracking methods usually train a binary classifier via a support vector machine (SVM) scheme to distinguish the target from its background. Despite demonstrated success, the performance of SVM-based trackers is limited because the classification depends only on the support vectors (SVs), whereas the target's dynamic appearance may resemble training samples that have not been selected as SVs, especially when the training samples are not linearly separable. In such cases, the tracker may drift to the background and eventually fail to track the target. To address this problem, in this paper we propose to integrate point-to-set (image-to-image-set) distance metric learning (DML) into visual tracking and to take full advantage of all the training samples when determining the best target candidate. The point-to-set DML is conducted on convolutional neural network features of the training data extracted from the starting frames. When a new frame arrives, target candidates are first projected into the common subspace using the learned mapping functions, and the candidate with the minimal distance to the target template sets is then selected as the tracking result. Extensive experimental results show that, even without model updates, the proposed method achieves favorable performance on challenging image sequences compared with several state-of-the-art trackers.
AB - In autonomous driving applications, a car must be able to track objects in the scene in order to estimate where and how they will move, so that the tracker embedded in the car can alert the vehicle in time for effective collision avoidance. Traditional discriminative object tracking methods usually train a binary classifier via a support vector machine (SVM) scheme to distinguish the target from its background. Despite demonstrated success, the performance of SVM-based trackers is limited because the classification depends only on the support vectors (SVs), whereas the target's dynamic appearance may resemble training samples that have not been selected as SVs, especially when the training samples are not linearly separable. In such cases, the tracker may drift to the background and eventually fail to track the target. To address this problem, in this paper we propose to integrate point-to-set (image-to-image-set) distance metric learning (DML) into visual tracking and to take full advantage of all the training samples when determining the best target candidate. The point-to-set DML is conducted on convolutional neural network features of the training data extracted from the starting frames. When a new frame arrives, target candidates are first projected into the common subspace using the learned mapping functions, and the candidate with the minimal distance to the target template sets is then selected as the tracking result. Extensive experimental results show that, even without model updates, the proposed method achieves favorable performance on challenging image sequences compared with several state-of-the-art trackers.
KW - Metric learning
KW - Point-to-set
KW - Visual tracking
UR - http://www.scopus.com/inward/record.url?scp=85035809030&partnerID=8YFLogxK
U2 - 10.1109/TITS.2017.2766093
DO - 10.1109/TITS.2017.2766093
M3 - Journal article
AN - SCOPUS:85035809030
SN - 1524-9050
VL - 19
SP - 187
EP - 198
JO - IEEE Transactions on Intelligent Transportation Systems
JF - IEEE Transactions on Intelligent Transportation Systems
IS - 1
ER -