SEQ-former: A context-enhanced and efficient automatic speech recognition framework

  • Qinglin Meng
  • Min Liu
  • Kaixun Huang
  • Kun Wei
  • Lei Xie
  • Zongfeng Quan
  • Weihong Deng
  • Quan Lu
  • Ning Jiang
  • Guoqing Zhao
  • Mashang Consumer Finance Co., Ltd.
  • Northwestern Polytechnical University, Xi'an

Research output: Contribution to journal, Conference article, Peer-reviewed

1 citation (Scopus)

Abstract

Contextual information is crucial for automatic speech recognition (ASR), and using it effectively can improve the accuracy of ASR systems. To strengthen a model's ability to capture this information, we propose SEQ-former, a novel ASR framework that emphasizes simplicity, efficiency, and speed. We incorporate a Prediction Decoder Network and a Shared Prediction Decoder Network to enhance contextual modeling. To further increase efficiency, we use intermediate CTC and CTC Spike Reduce Methods to guide attention masks and reduce redundant peaks. Our approach achieves state-of-the-art performance on the AISHELL-1 dataset, improves decoding efficiency, and delivers competitive results on LibriSpeech. Additionally, it yields a 6.3% improvement over Efficient Conformer on 11,000 hours of private data.
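The abstract does not spell out how the CTC Spike Reduce Methods work. As a rough illustration only, the sketch below shows one generic way an intermediate CTC output can flag redundant frames: take the greedy label at each frame, keep only the first frame of each non-blank spike, and turn that into a boolean attention mask. The function names, the blank index 0, and the keep-first-frame rule are assumptions made for illustration; this is not the SEQ-former implementation.

```python
from typing import List

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def greedy_ctc_labels(posteriors: List[List[float]]) -> List[int]:
    """Pick the most probable label at each frame (the greedy CTC path)."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in posteriors]


def spike_keep_mask(labels: List[int], blank: int = BLANK) -> List[bool]:
    """Hypothetical spike reduction: keep only the first frame of each non-blank spike.

    Blank frames and consecutive repeats of the same label are treated as
    redundant. A label repeated after a blank starts a new spike, matching
    the usual CTC collapse rule.
    """
    mask = []
    prev = None
    for lab in labels:
        mask.append(lab != blank and lab != prev)
        prev = lab
    return mask


def attention_mask(keep: List[bool]) -> List[List[bool]]:
    """Build a T x T mask in which every query may attend only to kept frames.

    True means "may attend" here; note that some frameworks use the opposite
    convention for boolean masks.
    """
    return [list(keep) for _ in keep]


if __name__ == "__main__":
    # Six frames over a toy vocabulary {0: blank, 1: 'a', 2: 'b'}.
    post = [
        [0.9, 0.05, 0.05],
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.6, 0.2, 0.2],
        [0.1, 0.1, 0.8],
        [0.7, 0.2, 0.1],
    ]
    labels = greedy_ctc_labels(post)
    print(labels)                   # [0, 1, 1, 0, 2, 0]
    print(spike_keep_mask(labels))  # [False, True, False, False, True, False]
```

In this toy example only two of the six frames survive, one per emitted token, which is the kind of reduction in redundant peaks the abstract refers to. A real system would operate on batched tensors and would likely keep some context around each spike rather than a single frame.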

Original language: English
Pages (from-to): 212-216
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publication status: Published - 2024
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 1 Sep 2024 to 5 Sep 2024
