TY - GEN
T1 - A Kullback-Leibler divergence based recurrent mixture density network for acoustic modeling in emotional statistical parametric speech synthesis
AU - An, Xiaochun
AU - Zhang, Yuchao
AU - Liu, Bing
AU - Xue, Liumeng
AU - Xie, Lei
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/10/19
Y1 - 2018/10/19
N2 - This paper proposes a Kullback-Leibler divergence (KLD) based recurrent mixture density network (RMDN) approach for acoustic modeling in emotional statistical parametric speech synthesis (SPSS), which aims at improving model accuracy and emotion naturalness. First, to improve model accuracy, we propose to use an RMDN as the acoustic model, which combines an LSTM with a mixture density network (MDN). Adding a mixture density layer allows us to perform multimodal regression as well as predict variances, thus modeling more accurate probability density functions of acoustic features. Second, we further introduce Kullback-Leibler divergence regularization in model training. Inspired by KLD's success in acoustic model adaptation, we aim to improve emotion naturalness by maximizing the distances between the distributions of emotional speech and neutral speech. Objective and subjective evaluations show that the proposed approach improves the prediction accuracy of acoustic features and the naturalness of the synthesized emotional speech.
KW - Emotional statistical parametric speech synthesis
KW - KLD-RMDN
KW - LSTM
KW - Recurrent mixture density network
UR - http://www.scopus.com/inward/record.url?scp=85061697082&partnerID=8YFLogxK
U2 - 10.1145/3267935.3267949
DO - 10.1145/3267935.3267949
M3 - Conference contribution
AN - SCOPUS:85061697082
T3 - ASMMC-MMAC 2018 - Proceedings of the Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and 1st Multi-Modal Affective Computing of Large-Scale Multimedia Data, Co-located with MM 2018
SP - 1
EP - 6
BT - ASMMC-MMAC 2018 - Proceedings of the Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and 1st Multi-Modal Affective Computing of Large-Scale Multimedia Data, Co-located with MM 2018
PB - Association for Computing Machinery, Inc
T2 - Joint Workshop of the 4th Workshop on Affective Social Multimedia Computing and 1st Multi-Modal Affective Computing of Large-Scale Multimedia Data Workshop, ASMMC-MMAC 2018
Y2 - 26 October 2018
ER -