TY - JOUR
T1 - SANet
T2 - A novel segmented attention mechanism and multi-level information fusion network for 6D object pose estimation
AU - Geng, Xinbo
AU - Shi, Fan
AU - Cheng, Xu
AU - Jia, Chen
AU - Wang, Mianzhao
AU - Chen, Shengyong
AU - Dai, Hongning
N1 - Funding Information (Section snippets):
The authors gratefully acknowledge the support of the National Natural Science Foundation of China under grant numbers 62272342, 62020106004, and 92048301.
Publisher Copyright:
© 2023 Elsevier B.V.
PY - 2023/7/1
Y1 - 2023/7/1
N2 - Reliably and rapidly estimating the 6D pose of an object is a critical challenge when using Internet of Things (IoT) technologies for monitoring. The prevalent 6D pose estimation architecture is based on a two-stage technique, which requires a significant amount of time both to train the algorithm and to deploy it in real applications. Additionally, most approaches incorporate intricate high- and low-level features into the network that strongly influence training but contribute little at test time. To enable more accurate 6D object pose estimation while shortening deployment time, we design the network as a single-stage end-to-end algorithm. In this paper, we propose SANet, which is composed of a segmented attention module and a multi-level information fusion module. Specifically, high-level semantic information is extracted from images and fused into the decoder, while the multi-level information fusion module removes redundant information, reducing the model's feature-fusion complexity. In addition, the segmented attention module suppresses unreliable information to enhance the network's learning of channel and spatial information, enabling it to more accurately capture the geometric properties of the object. Extensive experiments on the LM and LMO datasets demonstrate that our method outperforms state-of-the-art baselines, ranking first in both speed and accuracy.
AB - Reliably and rapidly estimating the 6D pose of an object is a critical challenge when using Internet of Things (IoT) technologies for monitoring. The prevalent 6D pose estimation architecture is based on a two-stage technique, which requires a significant amount of time both to train the algorithm and to deploy it in real applications. Additionally, most approaches incorporate intricate high- and low-level features into the network that strongly influence training but contribute little at test time. To enable more accurate 6D object pose estimation while shortening deployment time, we design the network as a single-stage end-to-end algorithm. In this paper, we propose SANet, which is composed of a segmented attention module and a multi-level information fusion module. Specifically, high-level semantic information is extracted from images and fused into the decoder, while the multi-level information fusion module removes redundant information, reducing the model's feature-fusion complexity. In addition, the segmented attention module suppresses unreliable information to enhance the network's learning of channel and spatial information, enabling it to more accurately capture the geometric properties of the object. Extensive experiments on the LM and LMO datasets demonstrate that our method outperforms state-of-the-art baselines, ranking first in both speed and accuracy.
KW - 6D pose estimation
KW - Deep learning
KW - Internet of Things
KW - Multi-level feature fusion
UR - http://www.scopus.com/inward/record.url?scp=85159637392&partnerID=8YFLogxK
U2 - 10.1016/j.comcom.2023.05.003
DO - 10.1016/j.comcom.2023.05.003
M3 - Journal article
AN - SCOPUS:85159637392
SN - 0140-3664
VL - 207
SP - 19
EP - 26
JO - Computer Communications
JF - Computer Communications
ER -