TY - JOUR
T1 - Identifying Children with Autism Spectrum Disorder via Transformer-Based Representation Learning from Dynamic Facial Cues
AU - Xia, Chen
AU - Chen, Hexu
AU - Han, Junwei
AU - Zhang, Dingwen
AU - Li, Kuan
N1 - Publisher Copyright:
IEEE
PY - 2024
Y1 - 2024
AB - Recognizing autism spectrum disorder (ASD) remains challenging because of a shortage of professional clinicians and the complexity of diagnostic procedures. Automated, data-driven ASD recognition models can reduce the subjectivity and physician dependency of traditional evaluation methods. Facial data, which encode important perceptual and social behaviors, have emerged in ASD research as a source of novel biomarkers for screening, diagnosing, and treating ASD. However, existing research mainly extracts low-level hand-crafted facial features for analysis and classification; how to learn discriminative deep representations from dynamic facial data for computational model construction remains an open challenge. In this study, we propose an ASD recognition model based on facial videos that addresses the missing temporal correlation learning of facial features. First, we use a vision transformer to extract frame-level global facial features; then, we use a Longformer to model the correlation of these features over time. In our experiments, we recruited 146 subjects between 2 and 8 years of age and recorded their facial videos during a computer-based eye-tracking experiment, and a further 76 subjects for a smartphone-based experiment. Quantitative comparisons demonstrate the effectiveness and reliability of the proposed model. Furthermore, we confirm the correlation between the facial and eye-tracking modalities in visual attention.
KW - Autism spectrum disorder (ASD)
KW - Biological system modeling
KW - Brain modeling
KW - Eye-tracking
KW - Face recognition
KW - Facial features
KW - Gaze tracking
KW - Longformer
KW - Pediatrics
KW - Representation learning
KW - Spatiotemporal facial cues
KW - Videos
KW - Vision transformer (ViT)
KW - Visual attention
UR - http://www.scopus.com/inward/record.url?scp=85196097076&partnerID=8YFLogxK
DO - 10.1109/TAFFC.2024.3412032
M3 - Article
AN - SCOPUS:85196097076
SN - 1949-3045
SP - 1
EP - 16
JO - IEEE Transactions on Affective Computing
JF - IEEE Transactions on Affective Computing
ER -