TY - JOUR
T1 - A Review of Audio-Visual Fusion with Machine Learning
AU - Song, Xiaoyu
AU - Chen, Hong
AU - Wang, Qing
AU - Chen, Yunqiang
AU - Tian, Mengxiao
AU - Tang, Hui
N1 - Publisher Copyright:
© 2019 IOP Publishing Ltd. All rights reserved.
PY - 2019/7/12
Y1 - 2019/7/12
AB - Research on single-modal recognition, for example of speech signals, ECG signals, facial expressions, body postures and other physiological signals, has made some progress. However, the diversity of the information sources available to the human brain and the uncertainty of single-modal recognition limit the accuracy that any single modality can achieve. Therefore, building a multimodal recognition framework that combines multiple modalities has become an effective means of improving performance. With the rise of multimodal machine learning, multimodal information fusion has become a research hotspot, and audio-visual fusion is its most widely applied direction. Audio-visual fusion methods have been successfully applied to various problems, such as emotion recognition, multimedia event detection, biometrics and speech recognition. This paper first briefly introduces multimodal machine learning, then summarizes the development and current state of audio-visual fusion technology in several major areas, and finally outlines prospects for future research.
UR - http://www.scopus.com/inward/record.url?scp=85070276220&partnerID=8YFLogxK
U2 - 10.1088/1742-6596/1237/2/022144
DO - 10.1088/1742-6596/1237/2/022144
M3 - Conference article
AN - SCOPUS:85070276220
SN - 1742-6588
VL - 1237
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
IS - 2
M1 - 022144
T2 - 2019 4th International Conference on Intelligent Computing and Signal Processing, ICSP 2019
Y2 - 29 March 2019 through 31 March 2019
ER -