Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning

Chong Ma, Hanqi Jiang, Wenting Chen, Yiwei Li, Zihao Wu, Xiaowei Yu, Zhengliang Liu, Lei Guo, Dajiang Zhu, Tuo Zhang, Dinggang Shen, Tianming Liu, Xiang Li

Research output: Contribution to journal › Conference article › peer-review

7 Scopus citations

Abstract

In medical multi-modal frameworks, the alignment of cross-modality features presents a significant challenge. Existing works learn alignments implicitly from the data, without considering the explicit relationships in the medical context, and this reliance on data alone may yield alignments that generalize poorly. In this work, we propose the Eye-gaze Guided Multi-modal Alignment (EGMA) framework, which harnesses eye-gaze data for better alignment of medical visual and textual features. We exploit the natural auxiliary role of radiologists' eye-gaze in aligning medical images and text, introducing a novel approach that uses gaze data collected synchronously while radiologists perform diagnostic evaluations. On downstream image classification and image-text retrieval tasks across four medical datasets, EGMA achieves state-of-the-art performance and stronger generalization across datasets. We further study how varying amounts of eye-gaze data affect model performance, highlighting the feasibility and utility of integrating this auxiliary signal into multi-modal alignment frameworks.
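The paper's implementation is not reproduced here, but the core idea in the abstract — using radiologists' eye-gaze as an auxiliary signal for image-text alignment — can be illustrated with a minimal sketch. The sketch below assumes a CLIP-style contrastive objective in which a per-patch gaze density map reweights the pooling of image patch embeddings, so fixated regions dominate the image representation before the standard symmetric InfoNCE loss is applied. The function name, tensor shapes, and the gaze-weighted pooling scheme are all assumptions for illustration, not the authors' actual EGMA method.

```python
# Illustrative sketch of gaze-weighted image-text alignment (not the authors' code).
# Assumes: patch embeddings from a vision encoder, token embeddings from a text
# encoder, and a per-patch gaze density map derived from eye-tracking fixations.
import torch
import torch.nn.functional as F

def gaze_guided_alignment_loss(patch_emb, token_emb, gaze_density, temperature=0.07):
    """
    patch_emb:    (B, P, D) image patch embeddings
    token_emb:    (B, T, D) report token embeddings
    gaze_density: (B, P)    fixation mass per patch, summing to 1 per image
    """
    patch_emb = F.normalize(patch_emb, dim=-1)
    token_emb = F.normalize(token_emb, dim=-1)

    # Pool patches with gaze weights so fixated regions dominate the image embedding.
    img_global = torch.einsum("bp,bpd->bd", gaze_density, patch_emb)
    txt_global = token_emb.mean(dim=1)

    img_global = F.normalize(img_global, dim=-1)
    txt_global = F.normalize(txt_global, dim=-1)

    # Standard symmetric InfoNCE over the batch (CLIP-style).
    logits = img_global @ txt_global.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)

# Toy usage with random tensors as stand-ins for encoder outputs and a gaze map:
B, P, T, D = 4, 49, 32, 256
patches = torch.randn(B, P, D)
tokens = torch.randn(B, T, D)
gaze = torch.softmax(torch.randn(B, P), dim=-1)  # stand-in fixation density
print(gaze_guided_alignment_loss(patches, tokens, gaze).item())
```

In this sketch, dropping the gaze weighting (uniform pooling over patches) recovers a plain image-text contrastive setup, so the gaze map acts purely as the kind of auxiliary alignment signal the abstract describes.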

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 37
State: Published - 2024
Event: 38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: 9 Dec 2024 - 15 Dec 2024
