Photo-real talking head with deep bidirectional LSTM

Bo Fan, Lijuan Wang, Frank K. Soong, Lei Xie

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

102 Citations (Scopus)

Abstract

Long short-term memory (LSTM) is a recurrent neural network (RNN) architecture designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. In this paper, we propose using a deep bidirectional LSTM (BLSTM) for audio/visual modeling in our photo-real talking head system. An audio/visual database of a subject talking is first recorded as training data. The audio/visual stereo data are converted into two parallel temporal sequences: contextual label sequences, obtained by force-aligning the audio against the text, and visual feature sequences, obtained by applying an active appearance model (AAM) to the lower face region across all training image samples. The deep BLSTM is then trained as a regression model by minimizing the sum of squared errors (SSE) in predicting the visual sequence from the label sequence. After testing different network topologies, we found that the best network on our datasets has two BLSTM layers sitting on top of one feed-forward layer. Compared with our previous HMM-based system, the newly proposed deep BLSTM-based system performs better in both objective measurements and a subjective A/B test.
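
The training criterion named in the abstract, the sum of squared errors between a predicted and a target visual feature sequence, can be illustrated with a minimal sketch. The function name `sum_squared_error`, the `Frame` alias, and the toy two-dimensional frames below are assumptions made for illustration only; they are not taken from the paper, which does not give its feature dimensions or implementation here.

```python
from typing import Sequence

# Hypothetical alias: one frame of visual features (e.g. AAM parameters).
Frame = Sequence[float]


def sum_squared_error(predicted: Sequence[Frame], target: Sequence[Frame]) -> float:
    """Return the SSE between two sequences of frames.

    Both sequences must have the same number of frames, and each pair of
    frames must have the same dimension.
    """
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same number of frames")
    total = 0.0
    for p_frame, t_frame in zip(predicted, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("frames must have the same dimension")
        # Accumulate squared differences over every feature in the frame.
        total += sum((p - t) ** 2 for p, t in zip(p_frame, t_frame))
    return total


# Toy data (assumed): two frames with two features each.
predicted = [[0.5, 1.0], [2.0, -1.0]]
target = [[0.0, 1.0], [1.0, 1.0]]
sse = sum_squared_error(predicted, target)
```

In a trained system this quantity would be computed on the network's output for each label sequence and minimized over the network weights; the sketch only shows the error measure itself.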

Original language: English
Title of host publication: 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4884-4888
Number of pages: 5
ISBN (Electronic): 9781467369978
DOI
Publication status: Published - 4 Aug 2015
Event: 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015 - Brisbane, Australia
Duration: 19 Apr 2014 to 24 Apr 2014

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2015-August
ISSN (Print): 1520-6149

Conference

Conference: 40th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2015
Country/Territory: Australia
City: Brisbane
Period: 19/04/14 to 24/04/14
