TY - JOUR
T1 - Spatial-temporal Regularized Multi-modality Correlation Filters for Tracking with Re-detection
AU - Lan, Xiangyuan
AU - Yang, Zifei
AU - Zhang, Wei
AU - Yuen, Pong Chi
N1 - Funding Information:
This project is supported by the National Natural Science Foundation of China under Grant 61991411, the Hong Kong Research Grant Council GRF project RGC/HKBU12254316, and Hong Kong Baptist University Tier 1 Grant. Authors’ addresses: X. Lan and P. C. Yuen (corresponding author), Department of Computer Science, Hong Kong Baptist University, 34 Renfrew Road, Kowloon Tong, Hong Kong, China; Z. Yang and W. Zhang, School of Control Science and Engineering, Shandong University, 73 Jingshi Road Jinan, China 250061. © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM. 1551-6857/2021/05-ART57 $15.00 https://doi.org/10.1145/3430257
Publisher Copyright:
© 2021 ACM.
PY - 2021/5
Y1 - 2021/5
N2 - The development of multi-spectrum image sensing technology has generated great interest in exploiting information from multiple modalities (e.g., RGB and infrared) to solve computer vision problems. In this article, we investigate how to exploit information from the RGB and infrared modalities to address two important issues in visual tracking: robustness and object re-detection. Although various algorithms that exploit multi-modality information in appearance modeling have been developed, they still face challenges in three main respects: (1) a lack of robustness to large appearance changes and dynamic backgrounds, (2) failure to re-capture the object after tracking loss, and (3) difficulty in determining the reliability of the different modalities. To address these issues and integrate multiple modalities effectively, we propose a new tracking-by-detection algorithm called Adaptive Spatial-temporal Regulated Multi-Modality Correlation Filter. Specifically, an adaptive spatial-temporal regularization is imposed on the correlation filter framework: the spatial regularization helps suppress the effect of cluttered background, while the temporal regularization enables the adaptive incorporation of historical appearance cues to handle appearance changes. In addition, a dynamic modality weight learning algorithm is integrated into the correlation filter training, which ensures that more reliable modalities gain more importance in target tracking. Experimental results demonstrate the effectiveness of the proposed method.
KW - behavior understanding
KW - Multi-modality fusion
KW - tracking
UR - http://www.scopus.com/inward/record.url?scp=85107938902&partnerID=8YFLogxK
U2 - 10.1145/3430257
DO - 10.1145/3430257
M3 - Journal article
AN - SCOPUS:85107938902
SN - 1551-6857
VL - 17
SP - 1
EP - 16
JO - ACM Transactions on Multimedia Computing, Communications and Applications
JF - ACM Transactions on Multimedia Computing, Communications and Applications
IS - 2
M1 - 57
ER -