TY - GEN
T1 - BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition
T2 - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
AU - Chen, Peikun
AU - Yu, Fan
AU - Liang, Yuhao
AU - Xue, Hongfei
AU - Wan, Xucheng
AU - Zheng, Naijun
AU - Zhou, Huan
AU - Xie, Lei
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been widely applied to code-switching automatic speech recognition. However, there is still substantial room for improvement, as similar pronunciations across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To address these drawbacks, we propose a cross-layer language adapter and a boundary-aware training method, namely Boundary-Aware Mixture-of-Experts (BA-MoE). First, we introduce language-specific adapters to separate language-specific representations and a unified gating layer to fuse representations within each encoder layer. Second, we compute a language adaptation loss on the mean output of each language-specific adapter to improve the adapter module's language-specific representation learning. Third, we utilize a boundary-aware predictor to learn boundary representations that mitigate language boundary confusion. Our approach achieves significant performance improvement, reducing the mixture error rate by 16.55% compared to the baseline on the ASRU 2019 Mandarin-English code-switching challenge dataset.
AB - Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been widely applied to code-switching automatic speech recognition. However, there is still substantial room for improvement, as similar pronunciations across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To address these drawbacks, we propose a cross-layer language adapter and a boundary-aware training method, namely Boundary-Aware Mixture-of-Experts (BA-MoE). First, we introduce language-specific adapters to separate language-specific representations and a unified gating layer to fuse representations within each encoder layer. Second, we compute a language adaptation loss on the mean output of each language-specific adapter to improve the adapter module's language-specific representation learning. Third, we utilize a boundary-aware predictor to learn boundary representations that mitigate language boundary confusion. Our approach achieves significant performance improvement, reducing the mixture error rate by 16.55% compared to the baseline on the ASRU 2019 Mandarin-English code-switching challenge dataset.
KW - automatic speech recognition
KW - boundary-aware learning
KW - code-switch
KW - mixture-of-experts
UR - http://www.scopus.com/inward/record.url?scp=85184660492&partnerID=8YFLogxK
U2 - 10.1109/ASRU57964.2023.10389798
DO - 10.1109/ASRU57964.2023.10389798
M3 - Conference contribution
AN - SCOPUS:85184660492
T3 - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
BT - 2023 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 16 December 2023 through 20 December 2023
ER -