Espresso: A Fast End-To-End Neural Speech Recognition Toolkit

Yiming Wang; Sanjeev Khudanpur; Tongfei Chen; Hainan Xu; Shuoyang Ding; Hang Lv; Yiwen Shao; Nanyun Peng; Lei Xie; Shinji Watanabe

doi:10.1109/ASRU46091.2019.9003968

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

49 Scopus citations

Abstract

We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit FAIRSEQ. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented. Espresso achieves state-of-the-art ASR performance on the WSJ, LibriSpeech, and Switchboard data sets among other end-to-end systems without data augmentation, and is 4-11x faster for decoding than similar systems (e.g. ESPNET).

Original language	English
Title of host publication	2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	136-143
Number of pages	8
ISBN (Electronic)	9781728103068
DOIs	https://doi.org/10.1109/ASRU46091.2019.9003968
State	Published - Dec 2019
Event	2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Singapore, Singapore Duration: 15 Dec 2019 → 18 Dec 2019

Publication series

Name	2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings

Conference

Conference	2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019
Country/Territory	Singapore
City	Singapore
Period	15/12/19 → 18/12/19

Keywords

automatic speech recognition
end-to-end
language model fusion
parallel decoding

Access to Document

10.1109/ASRU46091.2019.9003968

Cite this

Wang, Y., Khudanpur, S., Chen, T., Xu, H., Ding, S., Lv, H., Shao, Y., Peng, N., Xie, L., & Watanabe, S. (2019). Espresso: A Fast End-To-End Neural Speech Recognition Toolkit. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings (pp. 136-143). Article 9003968 (2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU46091.2019.9003968

Wang, Yiming; Khudanpur, Sanjeev; Chen, Tongfei et al. / Espresso: A Fast End-To-End Neural Speech Recognition Toolkit. 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 136-143 (2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings).

@inproceedings{75bea279a3f3496cbaea9cc02bb29776,
title = "Espresso: A Fast End-To-End Neural Speech Recognition Toolkit",
abstract = "We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit FAIRSEQ. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented. Espresso achieves state-of-the-art ASR performance on the WSJ, LibriSpeech, and Switchboard data sets among other end-to-end systems without data augmentation, and is 4-11x faster for decoding than similar systems (e.g. ESPNET).",
keywords = "automatic speech recognition, end-to-end, language model fusion, parallel decoding",
author = "Yiming Wang and Sanjeev Khudanpur and Tongfei Chen and Hainan Xu and Shuoyang Ding and Hang Lv and Yiwen Shao and Nanyun Peng and Lei Xie and Shinji Watanabe",
note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 ; Conference date: 15-12-2019 Through 18-12-2019",
year = "2019",
month = dec,
doi = "10.1109/ASRU46091.2019.9003968",
language = "English",
series = "2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "136--143",
booktitle = "2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings",
}

Wang, Y, Khudanpur, S, Chen, T, Xu, H, Ding, S, Lv, H, Shao, Y, Peng, N, Xie, L & Watanabe, S 2019, Espresso: A Fast End-To-End Neural Speech Recognition Toolkit. in 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings., 9003968, 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings, Institute of Electrical and Electronics Engineers Inc., pp. 136-143, 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, Singapore, 15/12/19. https://doi.org/10.1109/ASRU46091.2019.9003968

Espresso: A Fast End-To-End Neural Speech Recognition Toolkit. / Wang, Yiming; Khudanpur, Sanjeev; Chen, Tongfei et al.
2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2019. p. 136-143 9003968 (2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings).

TY - GEN
T1 - Espresso
T2 - 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019
AU - Wang, Yiming
AU - Khudanpur, Sanjeev
AU - Chen, Tongfei
AU - Xu, Hainan
AU - Ding, Shuoyang
AU - Lv, Hang
AU - Shao, Yiwen
AU - Peng, Nanyun
AU - Xie, Lei
AU - Watanabe, Shinji
PY - 2019/12
Y1 - 2019/12
N2 - We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit FAIRSEQ. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented. Espresso achieves state-of-the-art ASR performance on the WSJ, LibriSpeech, and Switchboard data sets among other end-to-end systems without data augmentation, and is 4-11x faster for decoding than similar systems (e.g. ESPNET).
AB - We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit FAIRSEQ. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented. Espresso achieves state-of-the-art ASR performance on the WSJ, LibriSpeech, and Switchboard data sets among other end-to-end systems without data augmentation, and is 4-11x faster for decoding than similar systems (e.g. ESPNET).
KW - automatic speech recognition
KW - end-to-end
KW - language model fusion
KW - parallel decoding
UR - http://www.scopus.com/inward/record.url?scp=85081601429&partnerID=8YFLogxK
U2 - 10.1109/ASRU46091.2019.9003968
DO - 10.1109/ASRU46091.2019.9003968
M3 - Conference contribution
AN - SCOPUS:85081601429
T3 - 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
SP - 136
EP - 143
BT - 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 15 December 2019 through 18 December 2019
ER -

Wang Y, Khudanpur S, Chen T, Xu H, Ding S, Lv H et al. Espresso: A Fast End-To-End Neural Speech Recognition Toolkit. In 2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2019. p. 136-143. 9003968. (2019 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019 - Proceedings). doi: 10.1109/ASRU46091.2019.9003968
