Eye-Gaze-Guided Vision Transformer for Rectifying Shortcut Learning

Chong Ma, Lin Zhao, Yuzhong Chen, Sheng Wang, Lei Guo, Tuo Zhang, Dinggang Shen, Xi Jiang, Tianming Liu

Research output: Contribution to journal › Article › peer-review

22 Scopus citations

Abstract

Learning harmful shortcuts such as spurious correlations and biases prevents deep neural networks from learning meaningful and useful representations, jeopardizing the generalizability and interpretability of the learned representation. The situation is even more serious in medical image analysis, where clinical data are limited and scarce while the reliability, generalizability, and transparency of the learned model are highly required. To rectify harmful shortcuts in medical imaging applications, this paper proposes a novel eye-gaze-guided vision transformer (EG-ViT) model that infuses radiologists' visual attention to proactively guide the vision transformer (ViT) to focus on regions with potential pathology rather than on spurious correlations. To do so, the EG-ViT model takes as input only the masked image patches that fall within the radiologists' regions of interest, while an additional residual connection to the last encoder layer maintains the interactions among all patches. Experiments on two medical imaging datasets demonstrate that the proposed EG-ViT model effectively rectifies harmful shortcut learning and improves model interpretability. Moreover, infusing experts' domain knowledge improves the large-scale ViT model's performance over all compared baseline methods when only limited samples are available. In general, EG-ViT exploits the power of deep neural networks while rectifying harmful shortcut learning with human experts' prior knowledge. This work also opens new avenues for advancing current artificial intelligence paradigms by infusing human intelligence.
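The patch-masking step described above can be illustrated with a minimal sketch: given a 2D gaze heatmap, keep only the ViT patch indices whose mean gaze intensity exceeds a threshold. This is an illustrative reconstruction, not the authors' code; the function name, the mean-pooling rule, and the `threshold` value are assumptions.

```python
# Hedged sketch of gaze-guided patch selection (assumed details:
# function name, mean-pooling over each patch, threshold of 0.5).
def gaze_mask_patches(heatmap, patch_size, threshold=0.5):
    """Return indices of patches whose mean gaze intensity exceeds
    `threshold`; the remaining patches would be masked out before
    being fed to the ViT encoder.

    heatmap: 2D list (H x W) of gaze intensities in [0, 1],
             with H and W divisible by patch_size.
    """
    h, w = len(heatmap), len(heatmap[0])
    keep, idx = [], 0
    for r in range(0, h, patch_size):
        for c in range(0, w, patch_size):
            # Mean gaze intensity over this patch_size x patch_size patch.
            vals = [heatmap[r + i][c + j]
                    for i in range(patch_size)
                    for j in range(patch_size)]
            if sum(vals) / len(vals) > threshold:
                keep.append(idx)
            idx += 1
    return keep

# Example: a 4x4 heatmap where only the top-left 2x2 region was fixated.
heatmap = [[1, 1, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
print(gaze_mask_patches(heatmap, patch_size=2))  # -> [0]
```

In the full model, the unselected patches are masked at the input, while the residual connection to the last encoder layer re-introduces information from all patches so that global interactions are preserved.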

Original language: English
Pages (from-to): 3384-3394
Number of pages: 11
Journal: IEEE Transactions on Medical Imaging
Volume: 42
Issue number: 11
DOIs
State: Published - 1 Nov 2023

Keywords

  • Eye tracking
  • generalizability
  • interpretability
  • shortcut learning
  • vision transformer
