TY - GEN
T1 - Accurate visual speech synthesis based on diviseme unit selection and concatenation
AU - Jiang, Dongmei
AU - Ravyse, Ilse
AU - Sahli, Hichem
AU - Zhang, Yanning
PY - 2008
Y1 - 2008
N2 - This paper presents a novel speech-driven approach to accurate, realistic visual speech synthesis. First, an audio-visual instance database is built for different viseme context combinations, i.e. diviseme units, using 100 audio-visual speech sentences of a female speaker. Then a diviseme instance selection algorithm is introduced to choose the optimal diviseme instances for the viseme contexts in the input speech, considering the concatenation smoothness of the image sequences, the matching of the mouth movements to the acoustic pronunciation process, and the intensity of the input speech. Finally, the mouth image sequences of corresponding viseme segments in the selected diviseme instances are time-warped and blended to construct the mouth images of the final animation. Visual speech synthesis experiments and subjective evaluation results show that the resulting mouth animations are not only realistic, with clear and smooth mouth images, but also in good accordance with the acoustic pronunciation and intensity of the input speech.
AB - This paper presents a novel speech-driven approach to accurate, realistic visual speech synthesis. First, an audio-visual instance database is built for different viseme context combinations, i.e. diviseme units, using 100 audio-visual speech sentences of a female speaker. Then a diviseme instance selection algorithm is introduced to choose the optimal diviseme instances for the viseme contexts in the input speech, considering the concatenation smoothness of the image sequences, the matching of the mouth movements to the acoustic pronunciation process, and the intensity of the input speech. Finally, the mouth image sequences of corresponding viseme segments in the selected diviseme instances are time-warped and blended to construct the mouth images of the final animation. Visual speech synthesis experiments and subjective evaluation results show that the resulting mouth animations are not only realistic, with clear and smooth mouth images, but also in good accordance with the acoustic pronunciation and intensity of the input speech.
UR - http://www.scopus.com/inward/record.url?scp=58049110676&partnerID=8YFLogxK
U2 - 10.1109/MMSP.2008.4665203
DO - 10.1109/MMSP.2008.4665203
M3 - Conference contribution
AN - SCOPUS:58049110676
SN - 9781424422951
T3 - Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008
SP - 906
EP - 909
BT - Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008
T2 - 2008 IEEE 10th Workshop on Multimedia Signal Processing, MMSP 2008
Y2 - 8 October 2008 through 10 October 2008
ER -