TY - GEN
T1 - IRS: A Large Naturalistic Indoor Robotics Stereo Dataset to Train Deep Models for Disparity and Surface Normal Estimation
AU - Wang, Qiang
AU - Zheng, Shizhen
AU - Yan, Qingsong
AU - Deng, Fei
AU - Zhao, Kaiyong
AU - Chu, Xiaowen
PY - 2021/7/9
Y1 - 2021/7/9
AB - Indoor robotics applications heavily rely on scene understanding and reconstruction. Compared to monocular vision, stereo vision methods are more promising to produce accurate geometrical information, such as surface normal and depth/disparity. Besides, deep learning models have shown their superior performance in stereo vision tasks. However, existing stereo datasets rarely contain high-quality surface normal and disparity ground truth, hardly satisfying the demand of training a prospective deep model. To this end, we introduce a large-scale indoor robotics stereo (IRS) dataset with over 100K stereo images and high-quality surface normal and disparity maps. Leveraging the advanced techniques of our customized rendering engine, the dataset is considerably close to the real-world scenes. Besides, we present DTN-Net, a two-stage deep model for surface normal estimation. Extensive experiments show the advantages and effectiveness of IRS in training deep models for disparity estimation, and DTN-Net provides state-of-the-art results for normal estimation compared to existing methods.
KW - Training
KW - Surface reconstruction
KW - Multimedia systems
KW - Refining
KW - Estimation
KW - Rendering (computer graphics)
KW - Stereo vision
UR - https://ieeexplore.ieee.org/document/9428423/
U2 - 10.1109/ICME51207.2021.9428423
DO - 10.1109/ICME51207.2021.9428423
M3 - Conference proceeding
SN - 9781665411523
T3 - Proceedings of IEEE International Conference on Multimedia and Expo (ICME)
SP - 1
EP - 6
BT - 2021 IEEE International Conference on Multimedia and Expo (ICME)
PB - IEEE
T2 - 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
Y2 - 5 July 2021 through 9 July 2021
ER -