CADS: A Self-Supervised Learner via Cross-Modal Alignment and Deep Self-Distillation for CT Volume Segmentation

Yiwen Ye, Jianpeng Zhang, Ziyang Chen, Yong Xia

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

Self-supervised learning (SSL) has achieved great success in advancing annotation-efficient learning. However, when applied to CT volume segmentation, most SSL methods suffer from two limitations: they rarely use the information acquired by different imaging modalities, and they provide supervision only to the bottleneck encoder layer. To address both limitations, we design a pretext task that aligns the information in each 3D CT volume with its corresponding generated 2D X-ray image, and we extend self-distillation to deep self-distillation. On this basis, we propose a self-supervised learner based on Cross-modal Alignment and Deep Self-distillation (CADS) to improve the encoder's ability to characterize CT volumes. Cross-modal alignment is a more challenging pretext task that forces the encoder to learn stronger image representations. Deep self-distillation provides supervision not only to the bottleneck layer but also to the shallow layers, thus boosting the representation abilities of both. Comparative experiments show that, during pre-training, our CADS has lower computational complexity and GPU memory cost than competing SSL methods. Based on the pre-trained encoder, we construct PVT-UNet for 3D CT volume segmentation. Our results on seven downstream tasks indicate that PVT-UNet outperforms state-of-the-art SSL methods such as MOCOv3 and DiRA, as well as prevalent medical image segmentation methods such as nnUNet and CoTr. Code and pre-trained weights will be available at https://github.com/yeerwen/CADS.
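To make the two pre-training signals in the abstract concrete, the following minimal PyTorch sketch (not the authors' released code) shows, under assumed interfaces, how a cross-modal alignment loss between a 3D CT embedding and the embedding of its generated 2D X-ray projection could be combined with a deep self-distillation loss applied to several encoder stages rather than only the bottleneck. The function names, tensor shapes, InfoNCE and cosine formulations, and loss weighting are hypothetical illustrations, not details taken from the paper.

# Minimal sketch of the two pre-training signals described in the abstract.
# All names, shapes, and loss formulations below are assumptions for illustration.
import torch
import torch.nn.functional as F

def cross_modal_alignment_loss(ct_emb, xray_emb, temperature=0.1):
    """InfoNCE-style loss pulling paired CT / projected-X-ray embeddings together."""
    ct = F.normalize(ct_emb, dim=-1)          # (B, D)
    xr = F.normalize(xray_emb, dim=-1)        # (B, D)
    logits = ct @ xr.t() / temperature        # (B, B) pairwise similarities
    targets = torch.arange(ct.size(0), device=ct.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def deep_self_distillation_loss(student_feats, teacher_feats):
    """Match student features to (detached) teacher features at every encoder
    stage, so shallow layers receive supervision as well as the bottleneck."""
    loss = 0.0
    for s, t in zip(student_feats, teacher_feats):
        loss = loss + (1.0 - F.cosine_similarity(
            s.flatten(1), t.detach().flatten(1), dim=-1).mean())
    return loss / len(student_feats)

# Example usage with random stand-ins for encoder outputs.
B, D = 4, 256
ct_emb, xray_emb = torch.randn(B, D), torch.randn(B, D)
student = [torch.randn(B, 64, 8, 8, 8), torch.randn(B, 256, 4, 4, 4)]
teacher = [torch.randn(B, 64, 8, 8, 8), torch.randn(B, 256, 4, 4, 4)]
total_loss = (cross_modal_alignment_loss(ct_emb, xray_emb)
              + deep_self_distillation_loss(student, teacher))

In a self-distillation setup of this kind, the teacher is typically an exponential moving average of the student and receives no gradients, which is why the teacher features are detached in the sketch above.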

Original language: English
Pages (from-to): 118-129
Number of pages: 12
Journal: IEEE Transactions on Medical Imaging
Volume: 44
Issue number: 1
DOIs
State: Published - 2025

Keywords

  • CT volume segmentation
  • Self-supervised learning
  • cross-modal alignment
  • deep self-distillation
