A Review of Audio-Visual Fusion with Machine Learning

Xiaoyu Song, Hong Chen, Qing Wang, Yunqiang Chen, Mengxiao Tian, Hui Tang

Research output: Contribution to journal · Conference article · Peer-review

9 Scopus citations

Abstract

Research on single-modal recognition, for example recognition based on speech signals, ECG signals, facial expressions, body postures and other physiological signals, has made some progress. However, the diversity of the information sources the human brain draws on and the uncertainty of any single modality limit the accuracy of single-modal recognition. Building a multimodal recognition framework that combines several modalities has therefore become an effective way to improve performance. With the rise of multimodal machine learning, multimodal information fusion has become a research hotspot, and audio-visual fusion is its most widely applied direction. Audio-visual fusion methods have been applied successfully to a variety of problems, such as emotion recognition, multimedia event detection, biometrics and speech recognition. This paper first gives a brief introduction to multimodal machine learning, then summarizes the development and current state of audio-visual fusion technology in several major application areas, and finally offers an outlook on future work.
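
The abstract mentions audio-visual fusion for tasks such as emotion recognition but does not commit to a particular fusion scheme. As a minimal illustrative sketch only, the Python snippet below shows one common option, decision-level (late) fusion, which averages the class probabilities produced by separate audio and visual classifiers; the class names, scores and weights are assumptions made up for the example, not details from the paper.

import numpy as np

# Illustrative sketch of decision-level (late) audio-visual fusion.
# All class names, probabilities and weights below are hypothetical.

EMOTIONS = ["angry", "happy", "neutral", "sad"]

def late_fusion(audio_probs, visual_probs, audio_weight=0.5):
    """Weighted average of per-modality class probabilities."""
    audio_probs = np.asarray(audio_probs, dtype=float)
    visual_probs = np.asarray(visual_probs, dtype=float)
    fused = audio_weight * audio_probs + (1.0 - audio_weight) * visual_probs
    return fused / fused.sum()  # renormalize to a proper distribution

if __name__ == "__main__":
    # Hypothetical posteriors from separate audio and visual emotion models.
    audio_probs = [0.10, 0.55, 0.25, 0.10]
    visual_probs = [0.05, 0.35, 0.50, 0.10]
    fused = late_fusion(audio_probs, visual_probs, audio_weight=0.6)
    print("fused probabilities:", dict(zip(EMOTIONS, fused.round(3))))
    print("predicted emotion:", EMOTIONS[int(np.argmax(fused))])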

Original language: English
Article number: 022144
Journal: Journal of Physics: Conference Series
Volume: 1237
Issue number: 2
DOIs
State: Published - 12 Jul 2019
Externally published: Yes
Event: 2019 4th International Conference on Intelligent Computing and Signal Processing, ICSP 2019 - Xi'an, China
Duration: 29 Mar 2019 – 31 Mar 2019
