Micro-expression spotting with multi-scale local transformer in long videos

Xupeng Guo; Xiaobiao Zhang; Lei Li; Zhaoqiang Xia

doi:10.1016/j.patrec.2023.03.012

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

Micro-expression analysis by computer vision techniques has attracted much attention because it can reveal human emotions automatically. Among the analysis tasks, temporal spotting, which aims to locate the frames in which expressions occur in long video sequences, is the most challenging. Compared with the well-studied recognition task, spotting needs more research to further improve its performance and to benefit subsequent tasks. In this paper, we therefore propose a convolutional-transformer-based deep model for micro-expression spotting in long video sequences. A 3D convolutional subnetwork is first employed to extract visual features from the frames in a fixed-size sliding window over the original video sequence. A multi-scale local transformer module is then built on these visual features to model the correlation between frames within a local window. By leveraging this correlation information, the description of facial movement becomes more representative of micro-expressions of varying duration. Finally, a multi-head classifier and a corresponding estimator are used jointly to predict the temporal positions of micro-expressions. The proposed method is evaluated on two publicly available datasets, CAS(ME)² and SAMM-LV, and achieves promising F1-scores of 0.2770 on SAMM-LV and 0.1373 on CAS(ME)². The code is publicly available on GitHub (https://github.com/xiazhaoqiang/MULT-MicroExpressionSpot).
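The two front-end steps described in the abstract (cutting the video into fixed-size sliding windows, then letting each frame attend only to nearby frames at several window scales) can be illustrated with a small pure-Python sketch. This is not the authors' implementation; their code is in the GitHub repository linked above. The helper names (`sliding_windows`, `local_attention`, `multi_scale_local_attention`), the window, stride and radius values, the use of raw features in place of 3D convolutional features, the absence of learned query/key/value projections, and the averaging of the scales are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sliding_windows(num_frames: int, window: int, stride: int) -> List[range]:
    """Split frame indices 0..num_frames-1 into fixed-size windows (hypothetical helper).

    If the regular stride leaves trailing frames uncovered, one extra window is
    aligned to the end of the video so every frame falls in at least one window.
    """
    if window >= num_frames:
        return [range(0, num_frames)]
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    return [range(s, s + window) for s in starts]


def local_attention(features: List[Vector], radius: int) -> List[Vector]:
    """Single-head scaled dot-product self-attention restricted to |i - j| <= radius.

    Assumption: queries, keys and values are the input features themselves
    (no learned projections), to keep the sketch self-contained.
    """
    t, dim = len(features), len(features[0])
    scale = 1.0 / math.sqrt(dim)
    out: List[Vector] = []
    for i in range(t):
        lo, hi = max(0, i - radius), min(t, i + radius + 1)
        neighbours = range(lo, hi)
        # Dot-product scores between frame i and each frame in its local window.
        scores = [scale * sum(a * b for a, b in zip(features[i], features[j])) for j in neighbours]
        # Numerically stable softmax over the local window only.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the neighbouring frames' features.
        out.append([sum(w * features[j][k] for w, j in zip(weights, neighbours)) for k in range(dim)])
    return out


def multi_scale_local_attention(features: List[Vector], radii: Sequence[int]) -> List[Vector]:
    """Run local attention at several radii and average the results (assumed fusion rule)."""
    per_scale = [local_attention(features, r) for r in radii]
    t, dim = len(features), len(features[0])
    return [[sum(s[i][k] for s in per_scale) / len(radii) for k in range(dim)] for i in range(t)]


if __name__ == "__main__":
    print([list(w) for w in sliding_windows(num_frames=10, window=4, stride=3)])
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    print(multi_scale_local_attention(frames, radii=[1, 2]))
```

Restricting each frame to a neighbourhood of radius r makes the attention cost per scale proportional to T·(2r+1) rather than T² for a window of T frames, and combining several radii is one simple way to cover micro-expressions of different durations; the paper's module may fuse its scales differently.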

Original language	English
Pages (from-to)	146-152
Number of pages	7
Journal	Pattern Recognition Letters
Volume	168
DOIs	https://doi.org/10.1016/j.patrec.2023.03.012
State	Published - Apr 2023

Keywords

Convolutional network
Local transformer
Micro-expression spotting

Access to Document

10.1016/j.patrec.2023.03.012

Cite this

@article{e697f73f30e0454f8595dd9053d06083,

title = "Micro-expression spotting with multi-scale local transformer in long videos",

abstract = "Micro-expression analysis by computer vision techniques has attracted much attention because it can reveal human emotions automatically. Among the analysis tasks, temporal spotting, which aims to locate the frames in which expressions occur in long video sequences, is the most challenging. Compared with the well-studied recognition task, spotting needs more research to further improve its performance and to benefit subsequent tasks. In this paper, we therefore propose a convolutional-transformer-based deep model for micro-expression spotting in long video sequences. A 3D convolutional subnetwork is first employed to extract visual features from the frames in a fixed-size sliding window over the original video sequence. A multi-scale local transformer module is then built on these visual features to model the correlation between frames within a local window. By leveraging this correlation information, the description of facial movement becomes more representative of micro-expressions of varying duration. Finally, a multi-head classifier and a corresponding estimator are used jointly to predict the temporal positions of micro-expressions. The proposed method is evaluated on two publicly available datasets, CAS(ME)$^2$ and SAMM-LV, and achieves promising F1-scores of 0.2770 on SAMM-LV and 0.1373 on CAS(ME)$^2$. The code is publicly available on GitHub (https://github.com/xiazhaoqiang/MULT-MicroExpressionSpot).",

keywords = "Convolutional network, Local transformer, Micro-expression spotting",

author = "Xupeng Guo and Xiaobiao Zhang and Lei Li and Zhaoqiang Xia",

note = "Publisher Copyright: {\textcopyright} 2023 Elsevier B.V.",

year = "2023",

month = apr,

doi = "10.1016/j.patrec.2023.03.012",

language = "English",

volume = "168",

pages = "146--152",

journal = "Pattern Recognition Letters",

issn = "0167-8655",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Micro-expression spotting with multi-scale local transformer in long videos

AU - Guo, Xupeng

AU - Zhang, Xiaobiao

AU - Li, Lei

AU - Xia, Zhaoqiang

PY - 2023/4

Y1 - 2023/4

AB - Micro-expression analysis by computer vision techniques has attracted much attention because it can reveal human emotions automatically. Among the analysis tasks, temporal spotting, which aims to locate the frames in which expressions occur in long video sequences, is the most challenging. Compared with the well-studied recognition task, spotting needs more research to further improve its performance and to benefit subsequent tasks. In this paper, we therefore propose a convolutional-transformer-based deep model for micro-expression spotting in long video sequences. A 3D convolutional subnetwork is first employed to extract visual features from the frames in a fixed-size sliding window over the original video sequence. A multi-scale local transformer module is then built on these visual features to model the correlation between frames within a local window. By leveraging this correlation information, the description of facial movement becomes more representative of micro-expressions of varying duration. Finally, a multi-head classifier and a corresponding estimator are used jointly to predict the temporal positions of micro-expressions. The proposed method is evaluated on two publicly available datasets, CAS(ME)² and SAMM-LV, and achieves promising F1-scores of 0.2770 on SAMM-LV and 0.1373 on CAS(ME)². The code is publicly available on GitHub (https://github.com/xiazhaoqiang/MULT-MicroExpressionSpot).

KW - Convolutional network

KW - Local transformer

KW - Micro-expression spotting

UR - http://www.scopus.com/inward/record.url?scp=85150921475&partnerID=8YFLogxK

U2 - 10.1016/j.patrec.2023.03.012

DO - 10.1016/j.patrec.2023.03.012

M3 - Article

AN - SCOPUS:85150921475

SN - 0167-8655

VL - 168

SP - 146

EP - 152

JO - Pattern Recognition Letters

JF - Pattern Recognition Letters

ER -
