TY - JOUR
T1 - BS-PLCNet 2
T2 - 25th Interspeech Conference 2024
AU - Zhang, Zihan
AU - Xia, Xianjun
AU - Huang, Chuanzeng
AU - Xiao, Yijian
AU - Xie, Lei
N1 - Publisher Copyright: © 2024 International Speech Communication Association. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Audio packet loss is an inevitable problem in real-time speech communication. A band-split packet loss concealment network (BS-PLCNet) targeting full-band signals was recently proposed. Although it achieves superior performance in the ICASSP 2024 PLC Challenge, BS-PLCNet is a large model with a high computational complexity of 8.95G FLOPS. This paper presents its updated version, BS-PLCNet 2, to reduce computational complexity and further improve performance. Specifically, to compensate for the missing future information, in the wide-band module, we design a dual-path encoder structure (with non-causal and causal paths) and leverage an intra-model knowledge distillation strategy to distill the future information from the non-causal teacher to the causal student. Moreover, we introduce a lightweight post-processing module after packet loss restoration to repair speech distortions and remove residual noise in the audio signal. With only 40% of the original parameters of BS-PLCNet, BS-PLCNet 2 brings a 0.18 PLCMOS improvement on the ICASSP 2024 PLC Challenge blind set, achieving state-of-the-art performance on this dataset.
AB - Audio packet loss is an inevitable problem in real-time speech communication. A band-split packet loss concealment network (BS-PLCNet) targeting full-band signals was recently proposed. Although it achieves superior performance in the ICASSP 2024 PLC Challenge, BS-PLCNet is a large model with a high computational complexity of 8.95G FLOPS. This paper presents its updated version, BS-PLCNet 2, to reduce computational complexity and further improve performance. Specifically, to compensate for the missing future information, in the wide-band module, we design a dual-path encoder structure (with non-causal and causal paths) and leverage an intra-model knowledge distillation strategy to distill the future information from the non-causal teacher to the causal student. Moreover, we introduce a lightweight post-processing module after packet loss restoration to repair speech distortions and remove residual noise in the audio signal. With only 40% of the original parameters of BS-PLCNet, BS-PLCNet 2 brings a 0.18 PLCMOS improvement on the ICASSP 2024 PLC Challenge blind set, achieving state-of-the-art performance on this dataset.
KW - band-split
KW - intra-model knowledge distillation
KW - packet loss concealment
KW - two-stage
UR - http://www.scopus.com/inward/record.url?scp=85214826458&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2024-1764
DO - 10.21437/Interspeech.2024-1764
M3 - Conference article
AN - SCOPUS:85214826458
SN - 2308-457X
SP - 1750
EP - 1754
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Y2 - 1 September 2024 through 5 September 2024
ER -