BS-PLCNet 2: Two-stage Band-split Packet Loss Concealment Network with Intra-model Knowledge Distillation

Zihan Zhang; Xianjun Xia; Chuanzeng Huang; Yijian Xiao; Lei Xie

doi:10.21437/Interspeech.2024-1764

School of Computer Science

Research output: Contribution to journal › Conference article › peer-review

Abstract

Audio packet loss is an inevitable problem in real-time speech communication. A band-split packet loss concealment network (BS-PLCNet) targeting full-band signals was recently proposed. Although it performs strongly in the ICASSP 2024 PLC Challenge, BS-PLCNet is a large model with a high computational complexity of 8.95G FLOPS. This paper presents its updated version, BS-PLCNet 2, which reduces computational complexity and further improves performance. Specifically, to compensate for the missing future information, we design a dual-path encoder structure (with non-causal and causal paths) in the wide-band module and leverage an intra-model knowledge distillation strategy to distill future information from the non-causal teacher to the causal student. Moreover, we introduce a lightweight post-processing module after packet loss restoration to repair speech distortions and remove residual noise in the audio signal. With only 40% of the parameters of the original BS-PLCNet, BS-PLCNet 2 brings a 0.18 PLCMOS improvement on the ICASSP 2024 PLC Challenge blind set, achieving state-of-the-art performance on this dataset.

Original language	English
Pages (from-to)	1750-1754
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOIs	https://doi.org/10.21437/Interspeech.2024-1764
State	Published - 2024
Event	25th Interspeech Conference 2024 - Kos Island, Greece. Duration: 1 Sep 2024 → 5 Sep 2024

Keywords

band-split
intra-model knowledge distillation
packet loss concealment
two-stage

Access to Document

10.21437/Interspeech.2024-1764

Cite this

@article{45c4353d57e049a3b775661c6a1d16cd,

title = "BS-PLCNet 2: Two-stage Band-split Packet Loss Concealment Network with Intra-model Knowledge Distillation",

abstract = "Audio packet loss is an inevitable problem in real-time speech communication. A band-split packet loss concealment network (BS-PLCNet) targeting full-band signals was recently proposed. Although it performs strongly in the ICASSP 2024 PLC Challenge, BS-PLCNet is a large model with a high computational complexity of 8.95G FLOPS. This paper presents its updated version, BS-PLCNet 2, which reduces computational complexity and further improves performance. Specifically, to compensate for the missing future information, we design a dual-path encoder structure (with non-causal and causal paths) in the wide-band module and leverage an intra-model knowledge distillation strategy to distill future information from the non-causal teacher to the causal student. Moreover, we introduce a lightweight post-processing module after packet loss restoration to repair speech distortions and remove residual noise in the audio signal. With only 40% of the parameters of the original BS-PLCNet, BS-PLCNet 2 brings a 0.18 PLCMOS improvement on the ICASSP 2024 PLC Challenge blind set, achieving state-of-the-art performance on this dataset.",

keywords = "band-split, intra-model knowledge distillation, packet loss concealment, two-stage",

author = "Zihan Zhang and Xianjun Xia and Chuanzeng Huang and Yijian Xiao and Lei Xie",

note = "Publisher Copyright: {\textcopyright} 2024 International Speech Communication Association. All rights reserved.; 25th Interspeech Conference 2024 ; Conference date: 01-09-2024 Through 05-09-2024",

year = "2024",

doi = "10.21437/Interspeech.2024-1764",

language = "English",

pages = "1750--1754",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - BS-PLCNet 2: Two-stage Band-split Packet Loss Concealment Network with Intra-model Knowledge Distillation

T2 - 25th Interspeech Conference 2024

AU - Zhang, Zihan

AU - Xia, Xianjun

AU - Huang, Chuanzeng

AU - Xiao, Yijian

AU - Xie, Lei

PY - 2024

Y1 - 2024

N2 - Audio packet loss is an inevitable problem in real-time speech communication. A band-split packet loss concealment network (BS-PLCNet) targeting full-band signals was recently proposed. Although it performs strongly in the ICASSP 2024 PLC Challenge, BS-PLCNet is a large model with a high computational complexity of 8.95G FLOPS. This paper presents its updated version, BS-PLCNet 2, which reduces computational complexity and further improves performance. Specifically, to compensate for the missing future information, we design a dual-path encoder structure (with non-causal and causal paths) in the wide-band module and leverage an intra-model knowledge distillation strategy to distill future information from the non-causal teacher to the causal student. Moreover, we introduce a lightweight post-processing module after packet loss restoration to repair speech distortions and remove residual noise in the audio signal. With only 40% of the parameters of the original BS-PLCNet, BS-PLCNet 2 brings a 0.18 PLCMOS improvement on the ICASSP 2024 PLC Challenge blind set, achieving state-of-the-art performance on this dataset.

AB - Audio packet loss is an inevitable problem in real-time speech communication. A band-split packet loss concealment network (BS-PLCNet) targeting full-band signals was recently proposed. Although it performs strongly in the ICASSP 2024 PLC Challenge, BS-PLCNet is a large model with a high computational complexity of 8.95G FLOPS. This paper presents its updated version, BS-PLCNet 2, which reduces computational complexity and further improves performance. Specifically, to compensate for the missing future information, we design a dual-path encoder structure (with non-causal and causal paths) in the wide-band module and leverage an intra-model knowledge distillation strategy to distill future information from the non-causal teacher to the causal student. Moreover, we introduce a lightweight post-processing module after packet loss restoration to repair speech distortions and remove residual noise in the audio signal. With only 40% of the parameters of the original BS-PLCNet, BS-PLCNet 2 brings a 0.18 PLCMOS improvement on the ICASSP 2024 PLC Challenge blind set, achieving state-of-the-art performance on this dataset.

KW - band-split

KW - intra-model knowledge distillation

KW - packet loss concealment

KW - two-stage

UR - http://www.scopus.com/inward/record.url?scp=85214826458&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2024-1764

DO - 10.21437/Interspeech.2024-1764

M3 - Conference article

AN - SCOPUS:85214826458

SN - 2308-457X

SP - 1750

EP - 1754

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Y2 - 1 September 2024 through 5 September 2024

ER -
