Espresso: A Fast End-To-End Neural Speech Recognition Toolkit

Yiming Wang, Sanjeev Khudanpur, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

49 Scopus citations

Abstract

We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit FAIRSEQ. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented. Espresso achieves state-of-the-art ASR performance on the WSJ, LibriSpeech, and Switchboard data sets among other end-to-end systems without data augmentation, and is 4-11x faster for decoding than similar systems (e.g. ESPNET).

Original language: English
Title of host publication: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 136-143
Number of pages: 8
ISBN (Electronic): 9781728103068
State: Published - Dec 2019
Event: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Singapore, Singapore
Duration: 15 Dec 2019 to 18 Dec 2019

Publication series

Name: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings

Conference

Conference: 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019
Country/Territory: Singapore
City: Singapore
Period: 15/12/19 to 18/12/19

Keywords

  • automatic speech recognition
  • end-to-end
  • language model fusion
  • parallel decoding
