Self-attention-based fully-inception networks for continuous sign language recognition

Mingjie Zhou, Michael Ng, Zixin Cai, Ka Chun Cheung

Research output: Chapter in book/report/conference proceeding, Conference proceeding, peer-review

14 Citations (Scopus)

Abstract

In the hearing-loss community, sign language is a primary means of communication, yet a communication gap remains between people with hearing loss and hearing people. Continuous sign language recognition, which can bridge this gap, is a challenging task because its annotations are weakly supervised: they are ordered, but no frame-level labels are provided. Connectionist temporal classification (CTC) is the most widely used method for overcoming this problem. However, CTC learning can perform poorly if the extracted features are not good. For better feature extraction, this work presents novel self-attention-based fully-inception (SAFI) networks for vision-based end-to-end continuous sign language recognition. Because sign words differ in length, we introduce a fully inception network with different receptive fields to extract dynamic clip-level features. To further boost performance, the fully inception network is trained with an auxiliary classifier using aggregation cross entropy (ACE) loss. Self-attention networks, serving as a global sequential feature extractor, are then used to model the clip-level features with CTC. The proposed model is optimized by jointly training with ACE on clip-level feature learning and CTC on global sequential feature learning in an end-to-end fashion. The best baseline method achieves a word error rate (WER) of 35.6% on the validation set and 34.5% on the test set. It employs a better decoding algorithm to generate pseudo labels for an EM-like optimization that fine-tunes the CNN module. In contrast, our approach focuses on better feature extraction for end-to-end learning. To alleviate overfitting on the limited dataset, we employ temporal elastic deformation to triple the size of the real-world dataset RWTH-PHOENIX-Weather 2014. Experimental results on this dataset demonstrate the effectiveness of our approach, which achieves 31.7% WER on the validation set and 31.3% WER on the test set.
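The WER figures quoted in the abstract refer to word error rate, which is conventionally the word-level edit distance between the recognized sequence and the reference, divided by the reference length. The following is a minimal sketch of that standard definition for a single sentence pair, not the authors' evaluation code. The function name `word_error_rate` and the example glosses are hypothetical, and benchmark results are usually aggregated over a whole corpus rather than averaged per sentence.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical gloss sequences: one substitution and one deletion
# against a four-word reference.
print(word_error_rate("MORGEN REGEN NORD WIND", "MORGEN SONNE NORD"))
```

Under this definition a lower value is better, and a hypothesis with many insertions can exceed 100%.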

Original language: English
Title of host publication: ECAI 2020 - 24th European Conference on Artificial Intelligence, including 10th Conference on Prestigious Applications of Artificial Intelligence, PAIS 2020 - Proceedings
Editors: Giuseppe De Giacomo, Alejandro Catala, Bistra Dilkina, Michela Milano, Senen Barro, Alberto Bugarin, Jerome Lang
Publisher: IOS Press BV
Pages: 2832-2839
Number of pages: 8
Edition: 1st
ISBN (Electronic): 9781643681016
ISBN (Print): 9781643681009
DOIs
Publication status: Published - 24 Aug 2020
Event: 24th European Conference on Artificial Intelligence, ECAI 2020, including 10th Conference on Prestigious Applications of Artificial Intelligence, PAIS 2020 - Santiago de Compostela, Online, Spain
Duration: 29 Aug 2020 to 8 Sept 2020

Publication series

Name: Frontiers in Artificial Intelligence and Applications
Volume: 325
ISSN (Print): 0922-6389
ISSN (Electronic): 1879-8314

Conference

Conference: 24th European Conference on Artificial Intelligence, ECAI 2020, including 10th Conference on Prestigious Applications of Artificial Intelligence, PAIS 2020
Country/Territory: Spain
City: Santiago de Compostela, Online
Period: 29/08/20 to 8/09/20

Scopus Subject Areas

  • Artificial Intelligence
