Action-HSMR: Sequence-based 3D Human Pose and Mesh Estimation with Temporal Consistency

  • Xinyu Huang
  • Ruiguo Yang
  • Chen Chen
  • Xinze Li
  • Wentao Fan
  • Weifeng Su*
  • Yiu-ming Cheung
  • Xiangchen Li

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding › Conference proceeding › peer-review

Abstract

Accurate multi-dimensional human pose estimation relies heavily on information about posture and movement. In particular, estimating 3D human pose and mesh requires not only anatomically plausible reconstructions but also robustness to challenging scenarios. Existing anatomically constrained methods often process frames independently, leading to implausible predictions and temporally inconsistent results. To address this, we propose Action-HSMR, a sequence-based framework that leverages short-term temporal context from consecutive frames to correct potentially implausible SKEL parameter predictions. Sliding windows of three frames are serialized into structured spatial-temporal tokens, encoded via a Vision Transformer (ViT), and fused using a self-attention temporal feature fusion module. Extensive experiments on 3D benchmark datasets (HMR2.0) demonstrate state-of-the-art performance, particularly under extreme poses.
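
The windowing and fusion steps named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the tokenization, the ViT encoder, or the fusion module, so the example assumes each frame is already a small feature vector (a stand-in for an encoded token), uses plain scaled dot-product self-attention with no learned projections, and averages the attended tokens. The names `sliding_windows` and `fuse_window` are hypothetical.

```python
import math


def sliding_windows(frames, size=3):
    """Return overlapping windows of `size` consecutive frames (stride 1)."""
    if len(frames) < size:
        raise ValueError("need at least `size` frames")
    return [frames[i:i + size] for i in range(len(frames) - size + 1)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fuse_window(tokens):
    """Fuse per-frame feature vectors with single-head self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (no learned projections), and the attended tokens are averaged
    into one fused vector per window.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    attended = []
    for q in tokens:
        # Attention weights of this frame over every frame in the window.
        weights = softmax([dot(q, k) / scale for k in tokens])
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        )
    return [sum(row[j] for row in attended) / len(attended) for j in range(d)]


# Five toy frames with 2-dimensional features (placeholders for encoded tokens).
frames = [[float(t), 1.0] for t in range(5)]
windows = sliding_windows(frames)          # three windows of three frames each
fused = [fuse_window(w) for w in windows]  # one fused vector per window
```

Each fused vector is a convex combination of the three frame vectors in its window, which is why a feature that is constant across the window (the second component here) passes through unchanged.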
Original language: English
Title of host publication: ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publisher: IEEE
Pages: 10457-10461
Number of pages: 5
ISBN (Electronic): 9798331567019
ISBN (Print): 9798331567026
Publication status: Published - 3 May 2026
Event: 2026 IEEE International Conference on Acoustics, Speech and Signal Processing - Centre de Convencions Internacional de Barcelona, Barcelona, Spain
Duration: 3 May 2026 to 8 May 2026
https://2026.ieeeicassp.org/ (Conference website)
https://2026.ieeeicassp.org/technical-program/ (Conference program schedule)
https://ieeexplore.ieee.org/xpl/conhome/11460365/proceeding (Conference proceeding)

Conference

Conference: 2026 IEEE International Conference on Acoustics, Speech and Signal Processing
Abbreviated title: ICASSP 2026
Country/Territory: Spain
City: Barcelona
Period: 3/05/26 to 8/05/26

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 9 - Industry, Innovation, and Infrastructure
