Improving multiple instance contrastive learning via sparse transformer for whole slide image classification

  • Zhaoyang Liu
  • Mengkang Lu
  • Yong Xia
  • Wei Liu
  • Minglei Shu

Research output: Contribution to journal › Article › peer-review

Abstract

Slide-level pathology diagnosis using whole slide images (WSIs) is typically formulated as a weakly supervised classification task, which can be effectively addressed through multiple instance learning (MIL). Motivated by the limitations of conventional MIL frameworks, we seek to fully exploit self-supervised learning to enhance instance-level feature extraction, while enabling efficient multi-instance aggregation that explicitly accounts for inter-instance correlations. In this paper, we present MICL++, an enhanced multiple instance contrastive learning framework tailored for WSI classification. MICL++ builds upon a sparse transformer backbone and comprises two key components. First, the Pathology-Specific Contrastive Learning Extraction (PSCLE) module generates discriminative instance-level features optimized for pathological image understanding. Second, the Efficient Sparse Transformer Aggregation (ESTA) module models long-range dependencies among instances with improved computational efficiency. Our method achieves state-of-the-art performance on the CAMELYON16 dataset and the TCGA lung cancer dataset, significantly surpassing prior MIL approaches. Additionally, on five widely used MIL benchmark datasets (MUSK1, MUSK2, ELEPHANT, FOX, and TIGER), our framework consistently outperforms existing methods, demonstrating strong generalization across both clinical and standard MIL scenarios.
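The abstract describes sparse attention-based aggregation of instance features into a bag-level embedding. The sketch below is a minimal illustrative example of that general idea (top-k sparse attention pooling in NumPy); it is not the paper's ESTA module, and the function name and parameters are hypothetical.

```python
import numpy as np

def sparse_attention_pool(instances, w, k=4):
    """Aggregate instance features into one bag embedding with
    top-k sparse attention (illustrative only, not the paper's ESTA).

    instances: (n, d) array of instance-level features
    w: (d,) attention scoring vector (hypothetical learned parameter)
    k: number of instances kept after sparsification
    """
    scores = instances @ w                 # (n,) raw attention logits
    topk = np.argsort(scores)[-k:]         # indices of the k highest logits
    masked = np.full_like(scores, -np.inf)
    masked[topk] = scores[topk]            # drop all but the top-k logits
    exp = np.exp(masked - scores[topk].max())  # numerically stable softmax
    attn = exp / exp.sum()                 # weights are zero outside top-k
    return attn @ instances                # (d,) weighted bag embedding

# toy bag: 8 instances with 5-dimensional features
rng = np.random.default_rng(0)
bag = rng.normal(size=(8, 5))
w = rng.normal(size=5)
embedding = sparse_attention_pool(bag, w, k=3)
print(embedding.shape)  # (5,)
```

In a MIL classifier this bag embedding would then feed a slide-level prediction head; the sparsification step keeps the attention cost focused on a few informative patches, which is the efficiency motivation the abstract attributes to sparse transformers.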

Original language: English
Article number: 108714
Journal: Biomedical Signal Processing and Control
Volume: 112
DOIs
State: Published - Feb 2026

Keywords

  • Multiple instance learning
  • Whole slide image
