Micro-expression spotting with multi-scale local transformer in long videos

Xupeng Guo; Xiaobiao Zhang; Lei Li; Zhaoqiang Xia

doi:10.1016/j.patrec.2023.03.012

School of Electronics and Information

Research output: Contribution to journal › Article › peer-review

19 citations (Scopus)

Abstract

Micro-expression analysis by computer vision techniques has attracted much attention as it can reveal the human emotions automatically. Among the analysis tasks, the temporal spotting is the most challenging task for achieving expression-aware frames from long video sequences. Compared to the well studied recognition task, more researches need to be devoted to the spotting task for further improving the performance and benefiting the subsequent tasks. So, in this paper, we propose a convolutional transformer based deep model for micro-expression spotting in long video sequences. A 3D convolutional subnetwork is firstly employed to extract the visual features from the temporal frames in a fixed-size sliding window of original video sequence. Then a multi-scale local transformer module is designed based on the visual features to model the correlation between frames in a local window. By leveraging the correlation information, the description of face movement becomes more representative for various-duration micro-expressions. Finally, the multi-head classifier and the corresponding estimator are jointly combined to predict the temporal position for spotting micro-expressions. The proposed method is evaluated on two publicly-available datasets, namely CAS(ME)² and SAMM-LV, and achieves the promising performance of 0.2770 F1-score on SAMM-LV and 0.1373 F1-score on CAS(ME)². The code is publicly available on GitHub (https://github.com/xiazhaoqiang/MULT-MicroExpressionSpot).

Original language	English
Pages (from-to)	146-152
Number of pages	7
Journal	Pattern Recognition Letters
Volume	168
DOI	https://doi.org/10.1016/j.patrec.2023.03.012
Publication status	Published - Apr 2023

Cite this

@article{e697f73f30e0454f8595dd9053d06083,

title = "Micro-expression spotting with multi-scale local transformer in long videos",

abstract = "Micro-expression analysis by computer vision techniques has attracted much attention as it can reveal the human emotions automatically. Among the analysis tasks, the temporal spotting is the most challenging task for achieving expression-aware frames from long video sequences. Compared to the well studied recognition task, more researches need to be devoted to the spotting task for further improving the performance and benefiting the subsequent tasks. So, in this paper, we propose a convolutional transformer based deep model for micro-expression spotting in long video sequences. A 3D convolutional subnetwork is firstly employed to extract the visual features from the temporal frames in a fixed-size sliding window of original video sequence. Then a multi-scale local transformer module is designed based on the visual features to model the correlation between frames in a local window. By leveraging the correlation information, the description of face movement becomes more representative for various-duration micro-expressions. Finally, the multi-head classifier and the corresponding estimator are jointly combined to predict the temporal position for spotting micro-expressions. The proposed method is evaluated on two publicly-available datasets, namely CAS(ME)$^2$ and SAMM-LV, and achieves the promising performance of 0.2770 F1-score on SAMM-LV and 0.1373 F1-score on CAS(ME)$^2$. The code is publicly available on GitHub (https://github.com/xiazhaoqiang/MULT-MicroExpressionSpot).",

keywords = "Convolutional network, Local transformer, Micro-expression spotting",

author = "Xupeng Guo and Xiaobiao Zhang and Lei Li and Zhaoqiang Xia",

note = "Publisher Copyright: {\textcopyright} 2023 Elsevier B.V.",

year = "2023",

month = apr,

doi = "10.1016/j.patrec.2023.03.012",

language = "English",

volume = "168",

pages = "146--152",

journal = "Pattern Recognition Letters",

issn = "0167-8655",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Micro-expression spotting with multi-scale local transformer in long videos

AU - Guo, Xupeng

AU - Zhang, Xiaobiao

AU - Li, Lei

AU - Xia, Zhaoqiang

PY - 2023/4

Y1 - 2023/4

AB - Micro-expression analysis by computer vision techniques has attracted much attention as it can reveal the human emotions automatically. Among the analysis tasks, the temporal spotting is the most challenging task for achieving expression-aware frames from long video sequences. Compared to the well studied recognition task, more researches need to be devoted to the spotting task for further improving the performance and benefiting the subsequent tasks. So, in this paper, we propose a convolutional transformer based deep model for micro-expression spotting in long video sequences. A 3D convolutional subnetwork is firstly employed to extract the visual features from the temporal frames in a fixed-size sliding window of original video sequence. Then a multi-scale local transformer module is designed based on the visual features to model the correlation between frames in a local window. By leveraging the correlation information, the description of face movement becomes more representative for various-duration micro-expressions. Finally, the multi-head classifier and the corresponding estimator are jointly combined to predict the temporal position for spotting micro-expressions. The proposed method is evaluated on two publicly-available datasets, namely CAS(ME)² and SAMM-LV, and achieves the promising performance of 0.2770 F1-score on SAMM-LV and 0.1373 F1-score on CAS(ME)². The code is publicly available on GitHub (https://github.com/xiazhaoqiang/MULT-MicroExpressionSpot).

KW - Convolutional network

KW - Local transformer

KW - Micro-expression spotting

UR - http://www.scopus.com/inward/record.url?scp=85150921475&partnerID=8YFLogxK

U2 - 10.1016/j.patrec.2023.03.012

DO - 10.1016/j.patrec.2023.03.012

M3 - Article

AN - SCOPUS:85150921475

SN - 0167-8655

VL - 168

SP - 146

EP - 152

JO - Pattern Recognition Letters

JF - Pattern Recognition Letters

ER -
