SpFormer: Spatio-Temporal Modeling for Scanpaths with Transformer

Wenqi Zhong; Linzhi Yu; Chen Xia; Junwei Han; Dingwen Zhang

doi:10.1609/aaai.v38i7.28593

School of Automation

Northwestern Polytechnical University Xian

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

The saccadic scanpath, a data representation of human visual behavior, has attracted broad interest in multiple domains. A scanpath is a complex eye-tracking data modality that comprises sequences of fixation positions and fixation durations, coupled with image information. However, previous methods usually suffer from spatial misalignment of fixation features and the loss of critical temporal information (including temporal correlation and fixation duration). In this study, we propose a Transformer-based scanpath model, SpFormer, to alleviate these problems. First, we propose a fixation-centric paradigm to extract spatially aligned fixation features and tokenize the scanpaths. Then, drawing on the visual working memory mechanism, we design a local meta attention that reduces the semantic redundancy of fixations and guides the model to focus on the meta scanpath. Finally, we progressively integrate the duration information and fuse it with the fixation features to resolve the location ambiguity that arises as more Transformer blocks are stacked. We conduct extensive experiments on four databases across three tasks. SpFormer establishes new state-of-the-art results in these different settings, demonstrating its flexibility and versatility in practical applications. The code is available at https://github.com/wenqizhong/SpFormer.
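The pipeline summarised in the abstract can be illustrated with a minimal, hypothetical sketch in plain Python. It is not the authors' implementation (their code is in the repository linked above), and every name in it is an assumption: `sample_feature` bilinearly reads an image feature map at each normalised fixation position so that each token is spatially aligned with its fixation; `duration_encoding` maps a fixation duration to a sinusoidal vector that is added to the token once at the input, a simplification of the progressive duration fusion the abstract describes; and `local_self_attention` is a plain sliding-window attention used as a stand-in for local meta attention, whose exact form the abstract does not specify.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical illustration only: not the SpFormer code from the authors' repository.
# A fixation is (x, y, duration_ms) with x and y normalised to [0, 1].
Fixation = Tuple[float, float, float]
# Feature map stored as nested lists indexed [row][col][channel] (H x W x C).
FeatureMap = List[List[List[float]]]


def sample_feature(fmap: FeatureMap, x: float, y: float) -> List[float]:
    """Bilinearly sample fmap at normalised (x, y), so the feature is centred on the fixation."""
    h, w = len(fmap), len(fmap[0])
    fx, fy = x * (w - 1), y * (h - 1)
    x0, y0 = int(math.floor(fx)), int(math.floor(fy))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = fx - x0, fy - y0
    out = []
    for c in range(len(fmap[0][0])):
        top = (1 - ax) * fmap[y0][x0][c] + ax * fmap[y0][x1][c]
        bottom = (1 - ax) * fmap[y1][x0][c] + ax * fmap[y1][x1][c]
        out.append((1 - ay) * top + ay * bottom)
    return out


def duration_encoding(duration_ms: float, dim: int) -> List[float]:
    """Sinusoidal code for a duration (assumed form, borrowed from Transformer positional encodings)."""
    t = duration_ms / 1000.0  # convert to seconds
    enc = []
    for i in range(dim):
        freq = 1.0 / (10000.0 ** (2 * (i // 2) / dim))
        enc.append(math.sin(t * freq) if i % 2 == 0 else math.cos(t * freq))
    return enc


def tokenize_scanpath(fmap: FeatureMap, scanpath: Sequence[Fixation]) -> List[List[float]]:
    """One token per fixation: the aligned image feature plus its duration code."""
    tokens = []
    for x, y, dur in scanpath:
        feat = sample_feature(fmap, x, y)
        code = duration_encoding(dur, len(feat))
        tokens.append([f + e for f, e in zip(feat, code)])
    return tokens


def local_self_attention(tokens: List[List[float]], window: int) -> List[List[float]]:
    """Single-head scaled dot-product attention in which token i only attends to
    tokens j with |i - j| <= window. Queries, keys and values are the raw tokens
    (no learned projections) to keep the sketch short."""
    n, d = len(tokens), len(tokens[0])
    out = []
    for i in range(n):
        idx = [j for j in range(n) if abs(i - j) <= window]
        scores = [sum(a * b for a, b in zip(tokens[i], tokens[j])) / math.sqrt(d) for j in idx]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(wt * tokens[j][c] for wt, j in zip(weights, idx)) / total for c in range(d)])
    return out
```

Sampling at the exact fixation coordinate, rather than taking whichever fixed grid patch contains it, is one simple way to keep token features centred on where the viewer looked. A trained model would add learned projections, multiple heads and stacked blocks on top of this.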

Original language	English
Title of host publication	Technical Tracks 14
Editors	Michael Wooldridge, Jennifer Dy, Sriraam Natarajan
Publisher	Association for the Advancement of Artificial Intelligence
Pages	7605-7613
Number of pages	9
Edition	7
ISBN (Electronic)	1577358872, 9781577358879
DOIs	https://doi.org/10.1609/aaai.v38i7.28593
State	Published - 25 Mar 2024
Event	38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada Duration: 20 Feb 2024 → 27 Feb 2024

Publication series

Name	Proceedings of the AAAI Conference on Artificial Intelligence
Number	7
Volume	38
ISSN (Print)	2159-5399
ISSN (Electronic)	2374-3468

Conference

Conference	38th AAAI Conference on Artificial Intelligence, AAAI 2024
Country/Territory	Canada
City	Vancouver
Period	20/02/24 → 27/02/24

Access to Document

10.1609/aaai.v38i7.28593

Cite this

Zhong, W., Yu, L., Xia, C., Han, J., & Zhang, D. (2024). SpFormer: Spatio-Temporal Modeling for Scanpaths with Transformer. In M. Wooldridge, J. Dy, & S. Natarajan (Eds.), Technical Tracks 14 (7 ed., pp. 7605-7613). (Proceedings of the AAAI Conference on Artificial Intelligence; Vol. 38, No. 7). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v38i7.28593

@inproceedings{86dac5eefb4e44cfa58135f9fa98e99f,
  title = "{SpFormer}: Spatio-Temporal Modeling for Scanpaths with {Transformer}",
  abstract = "The saccadic scanpath, a data representation of human visual behavior, has attracted broad interest in multiple domains. A scanpath is a complex eye-tracking data modality that comprises sequences of fixation positions and fixation durations, coupled with image information. However, previous methods usually suffer from spatial misalignment of fixation features and the loss of critical temporal information (including temporal correlation and fixation duration). In this study, we propose a Transformer-based scanpath model, SpFormer, to alleviate these problems. First, we propose a fixation-centric paradigm to extract spatially aligned fixation features and tokenize the scanpaths. Then, drawing on the visual working memory mechanism, we design a local meta attention that reduces the semantic redundancy of fixations and guides the model to focus on the meta scanpath. Finally, we progressively integrate the duration information and fuse it with the fixation features to resolve the location ambiguity that arises as more Transformer blocks are stacked. We conduct extensive experiments on four databases across three tasks. SpFormer establishes new state-of-the-art results in these different settings, demonstrating its flexibility and versatility in practical applications. The code is available at https://github.com/wenqizhong/SpFormer.",
  author = "Wenqi Zhong and Linzhi Yu and Chen Xia and Junwei Han and Dingwen Zhang",
  note = "Publisher Copyright: Copyright {\textcopyright} 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 38th AAAI Conference on Artificial Intelligence, AAAI 2024; Conference date: 20-02-2024 through 27-02-2024",
  year = "2024",
  month = mar,
  day = "25",
  doi = "10.1609/aaai.v38i7.28593",
  language = "English",
  series = "Proceedings of the AAAI Conference on Artificial Intelligence",
  publisher = "Association for the Advancement of Artificial Intelligence",
  volume = "38",
  number = "7",
  pages = "7605--7613",
  editor = "Michael Wooldridge and Jennifer Dy and Sriraam Natarajan",
  booktitle = "Technical Tracks 14",
  edition = "7",
}

Zhong, W, Yu, L, Xia, C, Han, J & Zhang, D 2024, SpFormer: Spatio-Temporal Modeling for Scanpaths with Transformer. in M Wooldridge, J Dy & S Natarajan (eds), Technical Tracks 14. 7 edn, Proceedings of the AAAI Conference on Artificial Intelligence, no. 7, vol. 38, Association for the Advancement of Artificial Intelligence, pp. 7605-7613, 38th AAAI Conference on Artificial Intelligence, AAAI 2024, Vancouver, Canada, 20/02/24. https://doi.org/10.1609/aaai.v38i7.28593

SpFormer: Spatio-Temporal Modeling for Scanpaths with Transformer. / Zhong, Wenqi; Yu, Linzhi; Xia, Chen et al.
Technical Tracks 14. ed. / Michael Wooldridge; Jennifer Dy; Sriraam Natarajan. 7. ed. Association for the Advancement of Artificial Intelligence, 2024. p. 7605-7613 (Proceedings of the AAAI Conference on Artificial Intelligence; Vol. 38, No. 7).

TY  - GEN
T1  - SpFormer: Spatio-Temporal Modeling for Scanpaths with Transformer
T2  - 38th AAAI Conference on Artificial Intelligence, AAAI 2024
AU  - Zhong, Wenqi
AU  - Yu, Linzhi
AU  - Xia, Chen
AU  - Han, Junwei
AU  - Zhang, Dingwen
PY  - 2024/3/25
Y1  - 2024/3/25
AB  - The saccadic scanpath, a data representation of human visual behavior, has attracted broad interest in multiple domains. A scanpath is a complex eye-tracking data modality that comprises sequences of fixation positions and fixation durations, coupled with image information. However, previous methods usually suffer from spatial misalignment of fixation features and the loss of critical temporal information (including temporal correlation and fixation duration). In this study, we propose a Transformer-based scanpath model, SpFormer, to alleviate these problems. First, we propose a fixation-centric paradigm to extract spatially aligned fixation features and tokenize the scanpaths. Then, drawing on the visual working memory mechanism, we design a local meta attention that reduces the semantic redundancy of fixations and guides the model to focus on the meta scanpath. Finally, we progressively integrate the duration information and fuse it with the fixation features to resolve the location ambiguity that arises as more Transformer blocks are stacked. We conduct extensive experiments on four databases across three tasks. SpFormer establishes new state-of-the-art results in these different settings, demonstrating its flexibility and versatility in practical applications. The code is available at https://github.com/wenqizhong/SpFormer.
UR  - http://www.scopus.com/inward/record.url?scp=85189525267&partnerID=8YFLogxK
U2  - 10.1609/aaai.v38i7.28593
DO  - 10.1609/aaai.v38i7.28593
M3  - Conference contribution
AN  - SCOPUS:85189525267
T3  - Proceedings of the AAAI Conference on Artificial Intelligence
SP  - 7605
EP  - 7613
BT  - Technical Tracks 14
A2  - Wooldridge, Michael
A2  - Dy, Jennifer
A2  - Natarajan, Sriraam
PB  - Association for the Advancement of Artificial Intelligence
Y2  - 20 February 2024 through 27 February 2024
ER  -
