Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning

Chong Ma, Hanqi Jiang, Wenting Chen, Yiwei Li, Zihao Wu, Xiaowei Yu, Zhengliang Liu, Lei Guo, Dajiang Zhu, Tuo Zhang, Dinggang Shen, Tianming Liu, Xiang Li

Research output: Contribution to journal › Conference article › Peer-reviewed

Abstract

In medical multi-modal frameworks, aligning cross-modal features presents a significant challenge. Existing works learn alignments implicitly from the data, without considering the explicit relationships available in the medical context; this reliance on data alone may limit how well the learned alignments generalize. In this work, we propose the Eye-gaze Guided Multi-modal Alignment (EGMA) framework, which harnesses eye-gaze data to better align medical visual and textual features. We explore the natural auxiliary role of radiologists' eye-gaze in aligning medical images and text, and introduce a novel approach that uses gaze data collected synchronously while radiologists perform diagnostic evaluations. On downstream image classification and image-text retrieval tasks across four medical datasets, EGMA achieves state-of-the-art performance and stronger generalization across datasets. We further examine how varying amounts of eye-gaze data affect model performance, highlighting the feasibility and utility of integrating this auxiliary signal into a multi-modal alignment framework.
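The abstract does not specify EGMA's training objective, so the following is only a plausible minimal sketch of how radiologists' gaze could guide fine-grained image-text alignment: gaze heatmaps, synchronized with report tokens, serve as a supervision target for the model's token-to-patch attention. All names here (gaze_guided_alignment_loss, gaze_heatmap) and the KL objective itself are illustrative assumptions, not the paper's actual method.

```python
import torch
import torch.nn.functional as F

def gaze_guided_alignment_loss(img_patches, txt_tokens, gaze_heatmap, temperature=0.07):
    """Hypothetical gaze-guided alignment loss (assumed formulation).

    img_patches:  (B, P, D) patch embeddings from an image encoder
    txt_tokens:   (B, T, D) token embeddings from a text encoder
    gaze_heatmap: (B, T, P) radiologist gaze intensity over patches,
                  assumed pre-synchronized with each report token
    """
    img_patches = F.normalize(img_patches, dim=-1)
    txt_tokens = F.normalize(txt_tokens, dim=-1)

    # Token-to-patch similarity, softmaxed over patches: the model's alignment
    sim = torch.einsum("btd,bpd->btp", txt_tokens, img_patches) / temperature
    attn = sim.softmax(dim=-1)  # (B, T, P)

    # Normalize the gaze heatmap into a target distribution over patches
    target = gaze_heatmap / gaze_heatmap.sum(dim=-1, keepdim=True).clamp_min(1e-8)

    # Encourage the model's alignment to match where the radiologist looked
    return F.kl_div(attn.clamp_min(1e-8).log(), target, reduction="batchmean")
```

In practice such a term would presumably be combined with a standard CLIP-style image-text contrastive loss, with the gaze term acting as the explicit, expert-derived alignment signal the abstract contrasts with purely data-driven alignment.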

Original language: English
Journal: Advances in Neural Information Processing Systems, Vol. 37
Publication status: Published - 2024
Event: 38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: 9 Dec 2024 to 15 Dec 2024
