Panoptic Scene Graph Generation

Jingkang Yang, Yi Zhe Ang, Zujin Guo, Kaiyang Zhou, Wayne Zhang, Ziwei Liu*

*Corresponding author for this work

Research output: Chapter in book/report/conference proceedingConference proceedingpeer-review

16 Citations (Scopus)

Abstract

Existing research addresses scene graph generation (SGG)—a critical technology for scene understanding in images—from a detection perspective, i.e., objects are detected using bounding boxes followed by prediction of their pairwise relationships. We argue that such a paradigm causes several problems that impede the progress of the field. For instance, bounding box-based labels in current datasets usually contain redundant classes like hairs, and leave out background information that is crucial to the understanding of context. In this work, we introduce panoptic scene graph generation (PSG), a new problem task that requires the model to generate a more comprehensive scene graph representation based on panoptic segmentations rather than rigid bounding boxes. A high-quality PSG dataset, which contains 49k well-annotated overlapping images from COCO and Visual Genome, is created for the community to keep track of its progress. For benchmarking, we build four two-stage baselines, which are modified from classic methods in SGG, and two one-stage baselines called PSGTR and PSGFormer, which are based on the efficient Transformer-based detector, i.e., DETR. While PSGTR uses a set of queries to directly learn triplets, PSGFormer separately models the objects and relations in the form of queries from two Transformer decoders, followed by a prompting-like relation-object matching mechanism. In the end, we share insights on open challenges and future directions. We invite users to explore the PSG dataset on our project page https://psgdataset.org/, and try our codebase https://github.com/Jingkang50/OpenPSG.

Original languageEnglish
Title of host publicationComputer Vision – ECCV 2022
Subtitle of host publication17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXVII
EditorsShai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
PublisherSpringer
Pages178-196
Number of pages19
Edition1st
ISBN (Electronic)9783031198120
ISBN (Print)9783031198113
DOIs
Publication statusPublished - 29 Oct 2022
Event17th European Conference on Computer Vision, ECCV 2022 - Tel Aviv, Israel
Duration: 23 Oct 202227 Oct 2022
https://eccv2022.ecva.net/
https://link.springer.com/conference/eccv
https://link.springer.com/book/10.1007/978-3-031-19769-7

Publication series

NameLecture Notes in Computer Science
Volume13687
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference17th European Conference on Computer Vision, ECCV 2022
Country/TerritoryIsrael
CityTel Aviv
Period23/10/2227/10/22
Internet address

Scopus Subject Areas

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint

Dive into the research topics of 'Panoptic Scene Graph Generation'. Together they form a unique fingerprint.

Cite this