Visual-Audio-based Fusion Network via Enhanced Transformer for Depression Detection

  • Shaanxi University of Chinese Medicine
  • Chongqing Normal University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Depression, a common psychological disorder, poses a potential threat to public safety. Although many approaches have been proposed to address the long-tail distribution issue, they still face challenges in modeling long-term dependencies and in feature selection. To address these issues, this paper proposes a visual-audio fusion network framework built on an enhanced Transformer. Specifically, a learnable Multimodal Alignment Module (MAM) maps video and audio features to a consistent spatiotemporal resolution. A bidirectional Crossmodal Interaction Module (CIM) then lets video and audio each serve as query and as context for the other, achieving fine-grained, symmetrical modeling of semantic-acoustic coupling. Finally, an Enhanced Transformer Module (ETM) combines a Transformer backbone regularized by random depth with dynamic absolute position encoding, improving generalization and adaptability to variable-length inputs in small-sample scenarios while strengthening the modeling of long-term dependencies. Extensive quantitative experiments on public datasets show that the method achieves higher accuracy and precision in depression classification than existing methods.
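
As a rough illustration of the alignment and bidirectional cross-attention ideas described in the abstract, the following Python sketch resamples two feature sequences to a common length and then lets each modality attend over the other. It is not the authors' implementation. The linear-interpolation resampling is only a fixed stand-in for the learnable MAM, the attention omits the learned projections a real CIM would use, both modalities are assumed to share one feature dimension, and every function name here is hypothetical.

```python
import math

def resample(seq, target_len):
    """Linearly interpolate a list of feature vectors to target_len time steps.
    Assumption: a fixed stand-in for a learnable alignment module."""
    if target_len == 1 or len(seq) == 1:
        return [list(seq[0]) for _ in range(target_len)]
    out = []
    for i in range(target_len):
        pos = i * (len(seq) - 1) / (target_len - 1)  # fractional source index
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(seq) - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(seq[lo], seq[hi])])
    return out

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """Scaled dot-product attention: each query step attends over all context steps.
    Assumption: no learned query/key/value projections."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in context]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, context)) for j in range(d)])
    return out

def bidirectional_fusion(video, audio, target_len):
    """Align both modalities, attend in both directions, concatenate per time step."""
    v = resample(video, target_len)
    a = resample(audio, target_len)
    v2a = cross_attend(v, a)  # video as query, audio as context
    a2v = cross_attend(a, v)  # audio as query, video as context
    return [x + y for x, y in zip(v2a, a2v)]

# Toy inputs: 3 video steps and 5 audio steps, each with 2 features.
video = [[0.1, 0.2], [0.3, 0.1], [0.5, 0.4]]
audio = [[0.2, 0.0], [0.1, 0.3], [0.4, 0.4], [0.0, 0.1], [0.3, 0.2]]
fused = bidirectional_fusion(video, audio, target_len=4)
```

Running both attention directions keeps the interaction symmetric, so neither modality is treated as the sole source of queries. A trained model would normally add learned projections, multiple heads, and residual connections on top of this skeleton.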

Original language: English
Title of host publication: International Conference on Machine Learning and Artificial Intelligence Applications, MLAIA 2025
Editors: Jianhua Zhou
Publisher: SPIE
ISBN (Electronic): 9798902322276
DOI
Publication status: Published - 9 Mar 2026
Event: International Conference on Machine Learning and Artificial Intelligence Applications, MLAIA 2025 - Shaoyang, China
Duration: 12 Dec 2025 to 14 Dec 2025

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 14134
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: International Conference on Machine Learning and Artificial Intelligence Applications, MLAIA 2025
Country/Territory: China
City: Shaoyang
Period: 12/12/25 to 14/12/25
