Indoor robotics applications rely heavily on scene understanding and reconstruction. Compared with monocular vision, stereo vision methods are more promising for producing accurate geometric information, such as surface normals and depth/disparity. Deep learning models have also shown superior performance in stereo vision tasks. However, existing stereo datasets rarely contain high-quality surface normal and disparity ground truth, so they hardly meet the demands of training such deep models. To this end, we introduce a large-scale indoor robotics stereo (IRS) dataset with over 100K stereo images and high-quality surface normal and disparity maps. Leveraging the advanced techniques of our customized rendering engine, the dataset closely resembles real-world scenes. In addition, we present DTN-Net, a two-stage deep model for surface normal estimation. Extensive experiments show the advantages and effectiveness of IRS in training deep models for disparity estimation, and DTN-Net provides state-of-the-art results for normal estimation compared to existing methods.
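To make the link between disparity and surface normals concrete, here is a minimal geometric sketch. It is not DTN-Net and not the IRS rendering pipeline; it only illustrates the standard pinhole relation Z = f·B/d for a rectified stereo pair, followed by a finite-difference normal estimate. The function names, the focal length, the baseline, and the principal point are all hypothetical values chosen for illustration, not parameters taken from the dataset.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (pixels) to depth (metres) via Z = f * B / d.

    Assumes a rectified stereo pair. Non-positive disparities are
    treated as invalid and mapped to NaN.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    valid = disparity > eps
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


def depth_to_normals(depth, focal_px, cx, cy):
    """Estimate unit surface normals from a depth map.

    Back-projects every pixel to a 3D point with a pinhole model, takes
    finite differences along the image axes as tangent vectors, and
    uses their cross product as the normal. Normals are oriented to
    face the camera (negative z in camera coordinates).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / focal_px
    y = (v - cy) * depth / focal_px
    points = np.stack([x, y, depth], axis=-1)  # shape (H, W, 3)

    tangent_u = np.gradient(points, axis=1)  # change along image columns
    tangent_v = np.gradient(points, axis=0)  # change along image rows
    normals = np.cross(tangent_u, tangent_v)

    length = np.linalg.norm(normals, axis=-1, keepdims=True)
    normals = normals / np.maximum(length, 1e-12)

    # Flip any normal that points away from the camera.
    away = normals[..., 2] > 0
    normals[away] *= -1
    return normals


# Hypothetical camera: 480 px focal length, 0.1 m baseline.
FOCAL_PX, BASELINE_M = 480.0, 0.1
flat_disparity = np.full((4, 5), 32.0)  # a fronto-parallel wall
flat_depth = disparity_to_depth(flat_disparity, FOCAL_PX, BASELINE_M)
flat_normals = depth_to_normals(flat_depth, FOCAL_PX, cx=2.0, cy=1.5)
```

For the constant-disparity wall above, every pixel lies at the same depth and every normal points straight back at the camera. A learned model is useful precisely because real disparity maps are noisy, and finite differences amplify that noise.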
|Name|Proceedings of IEEE International Conference on Multimedia and Expo (ICME)|
|Conference|2021 IEEE International Conference on Multimedia and Expo, ICME 2021|
|Period|5 July 2021 → 9 July 2021|
- Surface reconstruction
- Multimedia systems
- Rendering (computer graphics)
- Stereo vision