4D Panoptic Scene Graph Generation

Jingkang Yang*, Jun Cen, Wenxuan Peng, Shuai Liu, Fangzhou Hong, Xiangtai Li, Kaiyang Zhou, Qifeng Chen, Ziwei Liu*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding; Conference proceeding; peer-review

4 Citations (Scopus)

Abstract

We live in a three-dimensional space while moving forward through a fourth dimension: time. To allow artificial intelligence to develop a comprehensive understanding of such a 4D environment, we introduce 4D Panoptic Scene Graph (PSG-4D), a new representation that bridges the raw visual data perceived in a dynamic 4D world and high-level visual understanding. Specifically, PSG-4D abstracts rich 4D sensory data into nodes, which represent entities with precise location and status information, and edges, which capture temporal relations. To facilitate research in this new area, we build a richly annotated PSG-4D dataset consisting of 3K RGB-D videos with a total of 1M frames, each of which is labeled with 4D panoptic segmentation masks as well as fine-grained, dynamic scene graphs. To solve PSG-4D, we propose PSG4DFormer, a Transformer-based model that can predict panoptic segmentation masks, track masks along the time axis, and generate the corresponding scene graphs via a relation component. Extensive experiments on the new dataset show that our method can serve as a strong baseline for future research on PSG-4D. Finally, we provide a real-world application example to demonstrate how dynamic scene understanding can be achieved by integrating a large language model into our PSG-4D system.
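To make the node-and-edge description in the abstract concrete, the following is a minimal sketch of how such a representation might be held in memory. It is not the authors' data format or the PSG4DFormer implementation: the class names (`Node`, `Edge`, `SceneGraph4D`), the field names, the use of a string as a stand-in for a per-frame mask, and the inclusive frame interval on each edge are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A hypothetical entity node: one tracked object across frames."""
    node_id: int
    category: str
    # frame index -> placeholder mask identifier (a real system would
    # store a segmentation mask here; a string is an assumption)
    masks: Dict[int, str] = field(default_factory=dict)


@dataclass
class Edge:
    """A hypothetical temporal relation between two nodes."""
    subject_id: int
    predicate: str
    object_id: int
    start_frame: int  # inclusive (assumed convention)
    end_frame: int    # inclusive (assumed convention)


@dataclass
class SceneGraph4D:
    nodes: Dict[int, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        # Reject edges that point at unknown nodes or have an empty interval.
        if edge.subject_id not in self.nodes or edge.object_id not in self.nodes:
            raise KeyError("edge refers to a node that is not in the graph")
        if edge.start_frame > edge.end_frame:
            raise ValueError("start_frame must not exceed end_frame")
        self.edges.append(edge)

    def relations_at(self, frame: int) -> List[Tuple[str, str, str]]:
        """Return (subject, predicate, object) triples active at a frame."""
        return [
            (self.nodes[e.subject_id].category,
             e.predicate,
             self.nodes[e.object_id].category)
            for e in self.edges
            if e.start_frame <= frame <= e.end_frame
        ]


# Illustrative usage with made-up entities and frame ranges.
graph = SceneGraph4D()
graph.add_node(Node(0, "person", {10: "mask_0_10", 11: "mask_0_11"}))
graph.add_node(Node(1, "cup", {10: "mask_1_10", 11: "mask_1_11"}))
graph.add_edge(Edge(0, "picking up", 1, start_frame=10, end_frame=40))
print(graph.relations_at(20))
```

Attaching a frame interval to each edge is one simple way to express that a relation holds only for part of a video; other encodings (for example, per-frame edge lists) would serve the same purpose.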

Original language: English
Title of host publication: 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Editors: A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, S. Levine
Publisher: Neural Information Processing Systems Foundation
Pages: 1-14
Number of pages: 14
ISBN (Print): 9781713899921
Publication status: Published - Dec 2023
Event: 37th Conference on Neural Information Processing Systems, NeurIPS 2023 - Ernest N. Morial Convention Center, New Orleans, United States
Duration: 10 Dec 2023 to 16 Dec 2023
https://proceedings.neurips.cc/paper_files/paper/2023 (Conference Paper Search)
https://openreview.net/group?id=NeurIPS.cc/2023/Conference#tab-accept-oral (Conference Paper Search)
https://neurips.cc/Conferences/2023 (Conference Website)

Publication series

Name: Advances in Neural Information Processing Systems
Volume: 36
ISSN (Print): 1049-5258
Name: NeurIPS Proceedings

Conference

Conference: 37th Conference on Neural Information Processing Systems, NeurIPS 2023
Country/Territory: United States
City: New Orleans
Period: 10/12/23 to 16/12/23