Supervised spatio-temporal neighborhood topology learning for action recognition

Andy J. Ma, Pong Chi Yuen, Wilman W.W. Zou, Jian Huang Lai

Research output: Contribution to journal › Journal article › peer-review

22 Citations (Scopus)


Supervised manifold learning has been successfully applied to action recognition, where class label information can improve recognition performance. However, the learned manifold may fail to preserve both the local structure and the global constraint of temporal labels in action sequences. To overcome this problem, this paper proposes a new supervised manifold learning algorithm, supervised spatio-temporal neighborhood topology learning (SSTNTL), for action recognition. By analyzing the topological characteristics in the context of action recognition, we propose to construct the neighborhood topology using both supervised spatial and temporal pose correspondence information. Exploiting the properties of locality preserving projection (LPP), SSTNTL solves a generalized eigenvalue problem to obtain the best projections, which not only separate data points from different classes but also preserve the local structures and temporal pose correspondence of sequences from the same class. Experimental results demonstrate that SSTNTL outperforms manifold embedding methods that use other topologies or local discriminant information. Moreover, compared with state-of-the-art action recognition algorithms, SSTNTL gives convincing performance for both human action and gesture recognition.
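To make the "generalized eigenvalue problem" step concrete, here is a minimal sketch of a plain supervised LPP. It is not the SSTNTL algorithm itself: the neighborhood graph below links each sample only to its nearest same-class samples and ignores the temporal pose correspondence that the paper adds. The function name `supervised_lpp`, the heat-kernel width `t`, the neighbor count `k`, and the regularizer `reg` are all illustrative assumptions. With samples stored as rows of X, the code solves (X^T L X) a = λ (X^T D X) a and keeps the eigenvectors with the smallest eigenvalues.

```python
import numpy as np

def supervised_lpp(X, labels, n_components=2, k=3, t=1.0, reg=1e-6):
    """Illustrative supervised LPP (assumed variant, not the paper's SSTNTL).

    X: (n_samples, n_features) array; labels: (n_samples,) class labels.
    Returns (P, eigenvalues) where P has shape (n_features, n_components).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape

    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Supervised neighborhood graph: connect each sample to its k nearest
    # samples of the SAME class, weighted by a heat kernel.
    W = np.zeros((n, n))
    idx = np.arange(n)
    for i in range(n):
        same = np.flatnonzero((labels == labels[i]) & (idx != i))
        if same.size == 0:
            continue
        nearest = same[np.argsort(d2[i, same])[:k]]
        W[i, nearest] = np.exp(-d2[i, nearest] / t)
    W = np.maximum(W, W.T)  # make the graph symmetric

    D = np.diag(W.sum(axis=1))  # degree matrix
    L = D - W                   # graph Laplacian

    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(d)  # small ridge keeps B positive definite

    # Reduce A a = lambda B a to a standard symmetric problem via the
    # Cholesky factor B = C C^T:  (C^-1 A C^-T) y = lambda y,  a = C^-T y.
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2.0  # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order

    P = C_inv.T @ vecs[:, :n_components]
    return P, vals[:n_components]
```

Given a feature matrix `X` and labels `y`, `P, _ = supervised_lpp(X, y)` followed by `X @ P` would give the low-dimensional embedding under these assumptions. A method in the spirit of the abstract would change how `W` is built, adding links between temporally corresponding poses, while the eigen-solve stays the same.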

Original language: English
Article number: 6469204
Pages (from-to): 1447-1460
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Issue number: 8
Publication status: Published - Aug 2013

Scopus Subject Areas

  • Media Technology
  • Electrical and Electronic Engineering

User-Defined Keywords

  • Action recognition
  • manifold learning
  • neighborhood topology learning
  • supervised spatial
  • temporal pose correspondence

