Stereo-GS: Multi-View Stereo Vision Model for Generalizable 3D Gaussian Splatting Reconstruction

  • Xiufeng Huang
  • , Ka Chun Cheung
  • , Runmin Cong
  • , Simon See
  • , Renjie Wan*
  • *Corresponding author for this work

Research output: Chapter in book/report/conference proceedingConference proceedingpeer-review

Abstract

Generalizable 3D Gaussian Splatting reconstruction showcases advanced Image-to-3D content creation but requires substantial computational resources and large datasets, posing challenges to training models from scratch. Current methods usually entangle the prediction of 3D Gaussian geometry and appearance, which rely heavily on data-driven priors and result in slow regression speeds. To address this, we propose Stereo-GS, a disentangled framework for efficient 3D Gaussian prediction. Our method extracts features from local image pairs using a stereo vision backbone and fuses them via global attention blocks. Dedicated point and Gaussian prediction heads generate multi-view point-maps for geometry and Gaussian features for appearance, combined as GS-maps to represent the 3DGS object. A refinement network enhances these GSmaps for high-quality reconstruction. Unlike existing methods that depend on camera parameters, our approach achieves pose-free 3D reconstruction, improving robustness and practicality. By reducing resource demands while maintaining high-quality outputs, Stereo- GS provides an efficient, scalable solution for real-world 3D content generation. Project page: https://kevinhuangxf.github.io/stereo-gs.
Original languageEnglish
Title of host publicationMM 2025: Proceedings of the 33rd ACM International Conference on Multimedia
Place of PublicationNew York
PublisherAssociation for Computing Machinery (ACM)
Pages9822-9831
Number of pages10
ISBN (Electronic)9798400720352
ISBN (Print)9798400720352
DOIs
Publication statusPublished - 27 Oct 2025
Event33rd ACM International Conference on Multimedia, ACMMM25 - Dublin Royal Convention Centre, Dublin, Ireland
Duration: 27 Oct 202531 Oct 2025
https://whova.com/embedded/event/sa54pNCpHUFy1OTIEiEzceQu5kPuSm3dYlEnqAJdV4o%3D/?utc_source=ems (Conference program)
https://acmmm2025.org/ (Conference website)
https://dl.acm.org/doi/proceedings/10.1145/3746027 (Conference proceedings)

Publication series

NameMM: International Multimedia Conference
PublisherAssociation for Computing Machinery

Conference

Conference33rd ACM International Conference on Multimedia, ACMMM25
Country/TerritoryIreland
CityDublin
Period27/10/2531/10/25
Internet address

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 9 - Industry, Innovation, and Infrastructure
    SDG 9 Industry, Innovation, and Infrastructure

User-Defined Keywords

  • 3D Gaussian Splatting
  • Image-to-3D generation
  • Multi-view stereo

Fingerprint

Dive into the research topics of 'Stereo-GS: Multi-View Stereo Vision Model for Generalizable 3D Gaussian Splatting Reconstruction'. Together they form a unique fingerprint.

Cite this