TY - JOUR
T1 - Identifying Children with Autism Spectrum Disorder via Transformer-Based Representation Learning from Dynamic Facial Cues
AU - Xia, Chen
AU - Chen, Hexu
AU - Han, Junwei
AU - Zhang, Dingwen
AU - Li, Kuan
N1 - Publisher Copyright:
IEEE
PY - 2024
Y1 - 2024
AB - Recognizing autism spectrum disorder (ASD) remains challenging because of a shortage of professional clinicians and the complexity of diagnostic procedures. Automated, data-driven ASD recognition models can reduce the subjectivity and physician dependency of traditional evaluation methods. Facial data, which encode important perceptual and social behaviors, have emerged in ASD research as a source of novel biomarkers for screening, diagnosing, and treating ASD. However, existing research mainly extracts low-level hand-crafted facial features for analysis and classification; how to learn discriminative deep representations from dynamic facial data for computational model construction remains an open challenge. In this study, we propose an ASD recognition model based on facial videos that addresses the missing temporal correlation learning of facial features. First, we use a vision transformer to extract frame-level global facial features; then, we use a Longformer to model the correlation of these features over time. In our experiments, we recruited 146 subjects between 2 and 8 years of age and recorded their facial videos during a computer-based eye-tracking experiment, and a further 76 subjects for a smartphone-based experiment. Quantitative comparisons demonstrate the effectiveness and reliability of the proposed model. Furthermore, we confirm the correlation between the facial and eye-tracking modalities in visual attention.
KW - Autism spectrum disorder (ASD)
KW - Biological system modeling
KW - Brain modeling
KW - Eye-tracking
KW - Face recognition
KW - Facial features
KW - Gaze tracking
KW - Longformer
KW - Pediatrics
KW - Representation learning
KW - Spatiotemporal facial cues
KW - Videos
KW - Vision transformer (ViT)
KW - Visual attention
UR - http://www.scopus.com/inward/record.url?scp=85196097076&partnerID=8YFLogxK
DO - 10.1109/TAFFC.2024.3412032
M3 - Article
AN - SCOPUS:85196097076
SN - 1949-3045
SP - 1
EP - 16
JO - IEEE Transactions on Affective Computing
JF - IEEE Transactions on Affective Computing
ER -