TMVOS: Triplet Matching for Efficient Video Object Segmentation

Jiajia Liu, Hong Ning Dai, Guoying Zhao, Bo Li*, Tianqi Zhang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Video object segmentation (VOS) is a critical yet challenging task in video analysis. Recently, many pixel-level matching VOS methods have achieved outstanding performance without spending significant time on fine-tuning. However, most of these methods pay little attention to (i) matching background pixels and (ii) learning discriminative embeddings that separate classes. To address these issues, we propose a new end-to-end trainable method, Triplet Matching for efficient semi-supervised Video Object Segmentation (TMVOS). In particular, we devise a new triplet matching strategy that considers both foreground and background matching and, for every anchor, pushes the nearest negative embedding farther away than the nearest positive one. As a result, the method implicitly enlarges the distances between embeddings of different classes and thereby generates accurate matching maps. Additionally, a dual decoder is applied to optimize the final segmentation so that the model better fits both the complex background and the relatively simple targets. Extensive experiments demonstrate that the proposed method achieves superior accuracy and running time compared with state-of-the-art methods. The source code is available at: https://github.com/CVisionProcessing/TMVOS.
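The abstract does not state the loss formulation, so the following is only a minimal sketch of the idea it describes, assuming a standard hinge-style triplet objective with Euclidean distances. The function names, the `margin` parameter, and the foreground/background reference sets are hypothetical and are not taken from the TMVOS source code. For each anchor, the sketch finds the nearest embedding of the same class and the nearest embedding of the other class, then penalizes the anchor unless the negative is farther than the positive by at least the margin.

```python
import math


def nearest_distance(anchor, references):
    """Euclidean distance from `anchor` to its closest reference embedding."""
    return min(math.dist(anchor, ref) for ref in references)


def triplet_matching_loss(anchors, is_foreground, fg_refs, bg_refs, margin=1.0):
    """Mean hinge loss over all anchors (illustrative, not the paper's exact loss).

    anchors:       list of embedding vectors (tuples of floats)
    is_foreground: list of bools, True if the matching anchor is a foreground pixel
    fg_refs:       reference foreground embeddings (assumed non-empty)
    bg_refs:       reference background embeddings (assumed non-empty)
    margin:        assumed hyperparameter; the abstract does not specify one
    """
    total = 0.0
    for anchor, fg in zip(anchors, is_foreground):
        # Positives share the anchor's class; negatives come from the other class,
        # so background anchors are matched as well as foreground ones.
        positives, negatives = (fg_refs, bg_refs) if fg else (bg_refs, fg_refs)
        d_pos = nearest_distance(anchor, positives)
        d_neg = nearest_distance(anchor, negatives)
        # Zero loss once the nearest negative is at least `margin` farther away.
        total += max(0.0, d_pos - d_neg + margin)
    return total / len(anchors)
```

With well-separated classes the loss is zero, while an anchor that sits on top of the wrong class incurs a penalty of its distance to the nearest positive plus the margin. Taking only the nearest positive and nearest negative per anchor keeps the cost at one nearest-neighbor search per class.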

Original language: English
Article number: 116779
Journal: Signal Processing: Image Communication
Volume: 107
Publication status: Published - Sep 2022
Externally published: Yes

Scopus Subject Areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering

User-Defined Keywords

  • Embedding learning
  • Triplet matching
  • Video object segmentation
