Abstract
This paper presents a robust visual feature for audio-visual speech recognition based on visemic linear discriminant analysis (LDA). The feature captures dynamic lip-contour information and reflects the viseme classes of visual speech. The paper also introduces a method that automatically labels the LDA training data using speech recognition results, which avoids tedious manual labeling and the labeling errors it introduces. Experimental results show that an audio-visual speech recognition system using the proposed visual feature substantially improves the recognition rate under noisy conditions. Combined with a multi-stream HMM, the visual feature achieves a recognition rate of over 80% at an SNR of 10 dB.
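The abstract does not give implementation details, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes that per-frame lip-contour parameters (for example, ASM shape parameters) are spliced with neighboring frames to expose dynamics, that per-frame viseme labels are derived automatically from a phoneme alignment produced by the audio recognizer, and that standard Fisher LDA is trained on those labels. The phoneme-to-viseme table, the context width, and the function names `frame_labels_from_alignment`, `splice_frames` and `fit_lda` are hypothetical; this is not the authors' implementation.

```python
import numpy as np

# HYPOTHETICAL phoneme-to-viseme table for illustration only; the abstract
# does not give the paper's viseme inventory.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "a": "open", "o": "rounded", "u": "rounded",
    "sil": "silence",
}


def frame_labels_from_alignment(alignment):
    """Turn (phoneme, n_frames) segments from the audio recognizer's
    alignment into one viseme label per video frame (automatic labeling)."""
    labels = []
    for phoneme, n_frames in alignment:
        labels.extend([PHONEME_TO_VISEME[phoneme]] * n_frames)
    return labels


def splice_frames(frames, context):
    """Concatenate each frame with its +/-context neighbors so that LDA can
    see lip-contour dynamics (edge frames are repeated as padding)."""
    X = np.asarray(frames, dtype=float)
    n = X.shape[0]
    padded = np.pad(X, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])


def fit_lda(features, labels, n_components, ridge=1e-6):
    """Fisher LDA: directions maximizing between-class over within-class
    scatter, i.e. the leading solutions of S_b w = lambda S_w w."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    classes = np.unique(y)
    if n_components > len(classes) - 1:
        raise ValueError("n_components must be at most (number of classes - 1)")
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    s_w = np.zeros((d, d))
    s_b = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        s_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        s_b += Xc.shape[0] * (diff @ diff.T)
    # A small ridge keeps S_w positive definite when frames are scarce.
    s_w += ridge * np.eye(d)
    # Whiten with the Cholesky factor of S_w to get a symmetric eigenproblem.
    l_inv = np.linalg.inv(np.linalg.cholesky(s_w))
    eigvals, eigvecs = np.linalg.eigh(l_inv @ s_b @ l_inv.T)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return l_inv.T @ top  # map back to the original feature space


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy alignment: 20 frames of /p/ followed by 20 frames of /a/.
    labels = frame_labels_from_alignment([("p", 20), ("a", 20)])
    # Synthetic 2-D "lip features"; the /a/ frames have a wider mouth opening.
    feats = rng.normal(size=(40, 2))
    feats[20:, 0] += 5.0
    spliced = splice_frames(feats, context=1)
    W = fit_lda(spliced, labels, n_components=1)
    print(spliced.shape, W.shape)  # (40, 6) (6, 1)
```

Whitening with the Cholesky factor of the within-class scatter turns the generalized eigenproblem into a symmetric one, so `np.linalg.eigh` returns real, ordered eigenvalues. With only two toy classes a single discriminant direction exists; in a real system the projected frames would presumably form the visual stream of the multi-stream HMM.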
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 64-68 |
| Number of pages | 5 |
| Journal | Dianzi Yu Xinxi Xuebao/Journal of Electronics and Information Technology |
| Volume | 27 |
| Issue number | 1 |
| State | Published, Jan 2005 |
Keywords
- ASM
- Audio visual speech recognition
- Linear Discriminant Analysis (LDA)
- Speech recognition
- Viseme