Abstract
Accurate multi-dimensional human pose estimation relies heavily on information about posture and movement. In particular, estimating 3D human pose and mesh requires not only anatomically plausible reconstructions but also robustness to challenging scenarios. Existing anatomically constrained methods often process frames independently, leading to implausible predictions and temporally inconsistent results. To address this, we propose Action-HSMR, a sequence-based framework that leverages short-term temporal context from consecutive frames to correct potentially implausible SKEL parameter predictions. Sliding windows of three frames are serialized into structured spatial-temporal tokens, encoded via a Vision Transformer (ViT), and fused using a self-attention temporal feature fusion module. Extensive experiments on the HMR2.0 3D benchmark datasets demonstrate state-of-the-art performance, particularly under extreme poses.
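The windowing and fusion steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names `sliding_windows` and `self_attention_fuse` are hypothetical, the per-frame feature vectors stand in for ViT token embeddings, and the attention uses no learned projections, which a real fusion module would almost certainly have.

```python
import math
from typing import List, Sequence


def sliding_windows(frames: Sequence, size: int = 3) -> List[list]:
    """Return every run of `size` consecutive frames (stride 1)."""
    return [list(frames[i:i + size]) for i in range(len(frames) - size + 1)]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fuse(tokens: List[List[float]]) -> List[float]:
    """Fuse per-frame feature vectors into one vector.

    Assumption: parameter-free scaled dot-product self-attention
    (queries = keys = values = the tokens themselves), followed by
    a mean over the time axis.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    attended = []
    for q in tokens:
        # Similarity of this frame to every frame in the window.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        # Attention-weighted sum of the frame features.
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        )
    # Average the attended frames into a single fused feature.
    return [sum(row[j] for row in attended) / len(attended) for j in range(d)]


# Toy usage: five frames, each a 2-D stand-in feature vector.
features = [[0.0, 1.0], [0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
fused = [self_attention_fuse(w) for w in sliding_windows(features, size=3)]
```

With five input frames and a window of three, `sliding_windows` yields three overlapping windows, so `fused` holds three vectors, one per window.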
| Original language | English |
|---|---|
| Title of host publication | ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
| Publisher | IEEE |
| Pages | 10457-10461 |
| Number of pages | 5 |
| ISBN (Electronic) | 9798331567019 |
| ISBN (Print) | 9798331567026 |
| Publication status | Published - 3 May 2026 |
| Event | 2026 IEEE International Conference on Acoustics, Speech and Signal Processing, Centre de Convencions Internacional de Barcelona, Barcelona, Spain. Duration: 3 May 2026 → 8 May 2026. Conference website: https://2026.ieeeicassp.org/. Conference program schedule: https://2026.ieeeicassp.org/technical-program/. Conference proceeding: https://ieeexplore.ieee.org/xpl/conhome/11460365/proceeding |
Conference
| Conference | 2026 IEEE International Conference on Acoustics, Speech and Signal Processing |
|---|---|
| Abbreviated title | ICASSP 2026 |
| Country/Territory | Spain |
| City | Barcelona |
| Period | 3/05/26 → 8/05/26 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 9 Industry, Innovation, and Infrastructure
Fingerprint
Dive into the research topics of 'Action-HSMR: Sequence-based 3D Human Pose and Mesh Estimation with Temporal Consistency'. Together they form a unique fingerprint.