A Review of Audio-Visual Fusion with Machine Learning

Xiaoyu Song, Hong Chen, Qing Wang, Yunqiang Chen, Mengxiao Tian, Hui Tang

Research output: Contribution to journal · Conference article · Peer-review

9 Scopus citations

Abstract

Research on single-modal recognition, for example recognition based on speech signals, ECG signals, facial expressions, body postures and other physiological signals, has made some progress. However, the diversity of the information sources the human brain draws on and the uncertainty of any single modality limit the accuracy of single-modal recognition. Building a multimodal recognition framework that combines several modalities has therefore become an effective way to improve performance. With the rise of multimodal machine learning, multimodal information fusion has become a research hotspot, and audio-visual fusion is its most widely applied direction. Audio-visual fusion methods have been applied successfully to a variety of problems, such as emotion recognition, multimedia event detection, biometrics and speech recognition. This paper first gives a brief introduction to multimodal machine learning, then summarizes the development and current state of audio-visual fusion technology in several major application areas, and finally offers an outlook on future work.
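
The abstract mentions audio-visual fusion for tasks such as emotion recognition but does not commit to a particular fusion scheme. As a minimal illustrative sketch only, the Python snippet below shows one common option, decision-level (late) fusion, which averages the class probabilities produced by separate audio and visual classifiers; the class names, scores and weights are assumptions made up for the example, not details from the paper.

import numpy as np

# Illustrative sketch of decision-level (late) audio-visual fusion.
# All class names, probabilities and weights below are hypothetical.

EMOTIONS = ["angry", "happy", "neutral", "sad"]

def late_fusion(audio_probs, visual_probs, audio_weight=0.5):
    """Weighted average of per-modality class probabilities."""
    audio_probs = np.asarray(audio_probs, dtype=float)
    visual_probs = np.asarray(visual_probs, dtype=float)
    fused = audio_weight * audio_probs + (1.0 - audio_weight) * visual_probs
    return fused / fused.sum()  # renormalize to a proper distribution

if __name__ == "__main__":
    # Hypothetical posteriors from separate audio and visual emotion models.
    audio_probs = [0.10, 0.55, 0.25, 0.10]
    visual_probs = [0.05, 0.35, 0.50, 0.10]
    fused = late_fusion(audio_probs, visual_probs, audio_weight=0.6)
    print("fused probabilities:", dict(zip(EMOTIONS, fused.round(3))))
    print("predicted emotion:", EMOTIONS[int(np.argmax(fused))])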

Original language: English
Article number: 022144
Journal: Journal of Physics: Conference Series
Volume: 1237
Issue number: 2
DOIs
State: Published - 12 Jul 2019
Externally published: Yes
Event: 2019 4th International Conference on Intelligent Computing and Signal Processing, ICSP 2019 - Xi'an, China
Duration: 29 Mar 2019 – 31 Mar 2019
