
Developing phoneme-based lip-reading sentences system for silent speech recognition

  • Randa El-Bialy
  • Daqing Chen
  • Souheil Fenghour
  • Walid Hussein
  • Perry Xiao
  • Omar H. Karam
  • Bo Li

  • London South Bank University
  • The British University in Egypt

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

Lip-reading is the process of interpreting speech by visually analysing lip movements. Recent research in this area has shifted from simple word recognition to lip-reading sentences in the wild. This paper uses phonemes as the classification schema for lip-reading sentences, both to explore an alternative schema and to improve system performance. Different classification schemas have been investigated, including character-based and viseme-based schemas. The visual front-end of the system consists of a spatial-temporal (3D) convolution followed by a 2D ResNet. The phoneme recognition model is a Transformer that uses multi-headed attention, and the language model is a Recurrent Neural Network. The proposed system has been evaluated on the BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Compared with state-of-the-art approaches to lip-reading sentences, it achieves a word error rate that is on average 10% lower under varying illumination ratios.
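The reported metric, word error rate (WER), is conventionally the word-level edit distance between the reference transcript and the predicted transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the paper's own scoring code; the function name `word_error_rate` and the whitespace tokenisation are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sat on a mat" against the reference "the cat sat on the mat" counts one substitution over six reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.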

Original language: English
Pages (from-to): 129-138
Number of pages: 10
Journal: CAAI Transactions on Intelligence Technology
Volume: 8
Issue number: 1
State: Published - Mar 2023

Keywords

  • deep learning
  • deep neural networks
  • lip-reading
  • phoneme-based lip-reading
  • spatial-temporal convolution
  • transformers
