Perception-oriented video saliency detection via spatio-temporal attention analysis

Sheng-hua Zhong, Yan Liu*, To Yee Ng, Yang Liu

*Corresponding author for this work

Research output: Contribution to journal › Journal article › peer-review

14 Citations (Scopus)


The human visual system actively seeks salient regions and movements in video sequences to reduce search effort. Computational visual saliency detection models provide important information for semantic understanding in many real-world applications. In this paper, we propose a novel perception-oriented video saliency detection model that detects attended regions for both interesting objects and dominant motions in video sequences. Based on the visual orientation inhomogeneity of human perception, a novel spatial saliency detection technique, called the visual orientation inhomogeneous saliency model, is proposed. For temporal saliency detection, a novel optical flow model is created based on the dynamic consistency of motion. We fuse the spatial and temporal saliency maps to build a spatio-temporal attention analysis model within a uniform framework. The proposed model is evaluated on three typical video datasets against six visual saliency detection algorithms and achieves remarkable performance. Empirical validations demonstrate that the salient regions detected by the proposed model highlight the dominant and interesting objects effectively and efficiently. More importantly, the saliency regions detected by the proposed model are consistent with human subjective eye-tracking data.
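The abstract describes a pipeline of three stages: a spatial saliency map, a temporal (motion) saliency map, and a fusion step. The paper's actual orientation-inhomogeneity model, optical-flow formulation, and fusion rule are not given in this record, so the sketch below is only illustrative: it substitutes a gradient-magnitude map for spatial saliency, frame differencing for the motion cue, and a convex combination for fusion. All function names and the weight `alpha` are assumptions, not the authors' method.

```python
import numpy as np

def spatial_saliency(frame):
    # Hypothetical stand-in for the paper's orientation-inhomogeneity map:
    # gradient magnitude of the frame, normalized to [0, 1].
    gy, gx = np.gradient(frame.astype(float))
    mag = np.hypot(gx, gy)
    return mag / (mag.max() + 1e-8)

def temporal_saliency(prev_frame, frame):
    # Crude motion proxy (frame differencing) in place of the paper's
    # dynamically consistent optical-flow model.
    diff = np.abs(frame.astype(float) - prev_frame.astype(float))
    return diff / (diff.max() + 1e-8)

def fuse(spatial, temporal, alpha=0.5):
    # Simple convex combination of the two maps; the paper's actual
    # fusion rule is not specified in the abstract.
    return alpha * spatial + (1.0 - alpha) * temporal

# Toy example: a bright square that shifts by one pixel between frames,
# so both the spatial and the temporal cues respond near the square.
f0 = np.zeros((32, 32)); f0[8:16, 8:16] = 255.0
f1 = np.zeros((32, 32)); f1[9:17, 9:17] = 255.0

saliency = fuse(spatial_saliency(f1), temporal_saliency(f0, f1))
```

With this toy input, the fused map stays in [0, 1] and peaks around the moving square while the static background remains near zero, which is the qualitative behavior a spatio-temporal saliency map should exhibit.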

Original language: English
Pages (from-to): 178-188
Number of pages: 11
Publication status: Published - 26 Sept 2016

Scopus Subject Areas

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence

User-Defined Keywords

  • Dynamic consistency
  • Orientation inhomogeneous feature map
  • Perception-oriented video saliency
  • Spatio-temporal modeling
  • Visual attention
