TY - GEN
T1 - SpFormer
T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024
AU - Zhong, Wenqi
AU - Yu, Linzhi
AU - Xia, Chen
AU - Han, Junwei
AU - Zhang, Dingwen
N1 - Publisher Copyright:
Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2024/3/25
Y1 - 2024/3/25
AB - Saccadic scanpath, a data representation of human visual behavior, has received broad interest in multiple domains. Scanpath is a complex eye-tracking data modality that includes the sequences of fixation positions and fixation duration, coupled with image information. However, previous methods usually face the spatial misalignment problem of fixation features and loss of critical temporal data (including temporal correlation and fixation duration). In this study, we propose a Transformer-based scanpath model, SpFormer, to alleviate these problems. First, we propose a fixation-centric paradigm to extract the aligned spatial fixation features and tokenize the scanpaths. Then, according to the visual working memory mechanism, we design a local meta attention to reduce the semantic redundancy of fixations and guide the model to focus on the meta scanpath. Finally, we progressively integrate the duration information and fuse it with the fixation features to solve the problem of ambiguous location with the Transformer block increasing. We conduct extensive experiments on four databases under three tasks. The SpFormer establishes new state-of-the-art results in distinct settings, verifying its flexibility and versatility in practical applications. The code can be obtained from https://github.com/wenqizhong/SpFormer.
UR - http://www.scopus.com/inward/record.url?scp=85189525267&partnerID=8YFLogxK
U2 - 10.1609/aaai.v38i7.28593
DO - 10.1609/aaai.v38i7.28593
M3 - Conference contribution
AN - SCOPUS:85189525267
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 7605
EP - 7613
BT - Technical Tracks 14
A2 - Wooldridge, Michael
A2 - Dy, Jennifer
A2 - Natarajan, Sriraam
PB - Association for the Advancement of Artificial Intelligence
Y2 - 20 February 2024 through 27 February 2024
ER -