Audio-Visual Kinship Verification in the Wild

Xiaoting Wu; Eric Granger; Tomi H. Kinnunen; Xiaoyi Feng; Abdenour Hadid

doi:10.1109/ICB45273.2019.8987241

School of Electronics and Information

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

14 Scopus citations

Abstract

Kinship verification is a challenging problem, where recognition systems are trained to establish a kin relation between two individuals based on facial images or videos. However, due to variations in capture conditions (background, pose, expression, illumination and occlusion), state-of-the-art systems currently provide a low level of accuracy. As in many visual recognition and affective computing applications, kinship verification may benefit from a combination of discriminant information extracted from both video and audio signals. In this paper, we investigate for the first time the fusion of audio-visual information from both face and voice modalities to improve kinship verification accuracy. First, we propose a new multi-modal kinship dataset called TALking KINship (TALKIN), which comprises several pairs of video sequences with subjects talking. State-of-the-art conventional and deep learning models are assessed and compared for kinship verification using this dataset. Finally, we propose a deep Siamese network for multi-modal fusion of kinship relations. Experiments with the TALKIN dataset indicate that the proposed Siamese network provides a significantly higher level of accuracy than baseline uni-modal and multi-modal fusion techniques for kinship verification. Results also indicate that audio (vocal) information is complementary and useful for the kinship verification problem.

Original language	English
Title of host publication	2019 International Conference on Biometrics, ICB 2019
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781728136400
DOIs	https://doi.org/10.1109/ICB45273.2019.8987241
State	Published - Jun 2019
Event	2019 International Conference on Biometrics, ICB 2019 - Crete, Greece; Duration: 4 Jun 2019 → 7 Jun 2019

Publication series

Name	2019 International Conference on Biometrics, ICB 2019

Conference

Conference	2019 International Conference on Biometrics, ICB 2019
Country/Territory	Greece
City	Crete
Period	4/06/19 → 7/06/19

Access to Document

10.1109/ICB45273.2019.8987241

Cite this

@inproceedings{52c51daa803248698659e1bf3ce03947,

title = "Audio-Visual Kinship Verification in the Wild",

abstract = "Kinship verification is a challenging problem, where recognition systems are trained to establish a kin relation between two individuals based on facial images or videos. However, due to variations in capture conditions (background, pose, expression, illumination and occlusion), state-of-the-art systems currently provide a low level of accuracy. As in many visual recognition and affective computing applications, kinship verification may benefit from a combination of discriminant information extracted from both video and audio signals. In this paper, we investigate for the first time the fusion of audio-visual information from both face and voice modalities to improve kinship verification accuracy. First, we propose a new multi-modal kinship dataset called TALking KINship (TALKIN), which comprises several pairs of video sequences with subjects talking. State-of-the-art conventional and deep learning models are assessed and compared for kinship verification using this dataset. Finally, we propose a deep Siamese network for multi-modal fusion of kinship relations. Experiments with the TALKIN dataset indicate that the proposed Siamese network provides a significantly higher level of accuracy than baseline uni-modal and multi-modal fusion techniques for kinship verification. Results also indicate that audio (vocal) information is complementary and useful for the kinship verification problem.",

author = "Xiaoting Wu and Eric Granger and Kinnunen, {Tomi H.} and Xiaoyi Feng and Abdenour Hadid",

note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 2019 International Conference on Biometrics, ICB 2019 ; Conference date: 04-06-2019 Through 07-06-2019",

year = "2019",

month = jun,

doi = "10.1109/ICB45273.2019.8987241",

language = "English",

series = "2019 International Conference on Biometrics, ICB 2019",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "2019 International Conference on Biometrics, ICB 2019",

}

Wu, X, Granger, E, Kinnunen, TH, Feng, X & Hadid, A 2019, Audio-Visual Kinship Verification in the Wild. in 2019 International Conference on Biometrics, ICB 2019., 8987241, 2019 International Conference on Biometrics, ICB 2019, Institute of Electrical and Electronics Engineers Inc., 2019 International Conference on Biometrics, ICB 2019, Crete, Greece, 4/06/19. https://doi.org/10.1109/ICB45273.2019.8987241

Audio-Visual Kinship Verification in the Wild. / Wu, Xiaoting; Granger, Eric; Kinnunen, Tomi H. et al.
2019 International Conference on Biometrics, ICB 2019. Institute of Electrical and Electronics Engineers Inc., 2019. 8987241 (2019 International Conference on Biometrics, ICB 2019).

TY - GEN

T1 - Audio-Visual Kinship Verification in the Wild

AU - Wu, Xiaoting

AU - Granger, Eric

AU - Kinnunen, Tomi H.

AU - Feng, Xiaoyi

AU - Hadid, Abdenour

PY - 2019/6

Y1 - 2019/6

AB - Kinship verification is a challenging problem, where recognition systems are trained to establish a kin relation between two individuals based on facial images or videos. However, due to variations in capture conditions (background, pose, expression, illumination and occlusion), state-of-the-art systems currently provide a low level of accuracy. As in many visual recognition and affective computing applications, kinship verification may benefit from a combination of discriminant information extracted from both video and audio signals. In this paper, we investigate for the first time the fusion of audio-visual information from both face and voice modalities to improve kinship verification accuracy. First, we propose a new multi-modal kinship dataset called TALking KINship (TALKIN), which comprises several pairs of video sequences with subjects talking. State-of-the-art conventional and deep learning models are assessed and compared for kinship verification using this dataset. Finally, we propose a deep Siamese network for multi-modal fusion of kinship relations. Experiments with the TALKIN dataset indicate that the proposed Siamese network provides a significantly higher level of accuracy than baseline uni-modal and multi-modal fusion techniques for kinship verification. Results also indicate that audio (vocal) information is complementary and useful for the kinship verification problem.

UR - http://www.scopus.com/inward/record.url?scp=85081059758&partnerID=8YFLogxK

U2 - 10.1109/ICB45273.2019.8987241

DO - 10.1109/ICB45273.2019.8987241

M3 - Conference contribution

AN - SCOPUS:85081059758

T3 - 2019 International Conference on Biometrics, ICB 2019

BT - 2019 International Conference on Biometrics, ICB 2019

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2019 International Conference on Biometrics, ICB 2019

Y2 - 4 June 2019 through 7 June 2019

ER -
