TY - JOUR
T1 - CADS: A Self-Supervised Learner via Cross-Modal Alignment and Deep Self-Distillation for CT Volume Segmentation
T2 - IEEE Transactions on Medical Imaging
AU - Ye, Yiwen
AU - Zhang, Jianpeng
AU - Chen, Ziyang
AU - Xia, Yong
N1 - Publisher Copyright:
© 1982-2012 IEEE.
PY - 2025
Y1 - 2025
AB - Self-supervised learning (SSL) has achieved great success in advancing annotation-efficient learning. However, when applied to CT volume segmentation, most SSL methods suffer from two limitations: they rarely use the information acquired by different imaging modalities, and they provide supervision only to the bottleneck layer of the encoder. To address both limitations, we design a pretext task that aligns the information in each 3D CT volume with that of the corresponding generated 2D X-ray image, and we extend self-distillation to deep self-distillation. On this basis, we propose a self-supervised learner based on Cross-modal Alignment and Deep Self-distillation (CADS) to improve the encoder's ability to characterize CT volumes. Cross-modal alignment is a more challenging pretext task that forces the encoder to learn stronger image representations. Deep self-distillation provides supervision not only to the bottleneck layer but also to shallower layers, boosting the representation ability of both. Comparative experiments show that, during pre-training, CADS has lower computational complexity and GPU memory cost than competing SSL methods. Based on the pre-trained encoder, we construct PVT-UNet for 3D CT volume segmentation. Our results on seven downstream tasks indicate that PVT-UNet outperforms state-of-the-art SSL methods such as MOCOv3 and DiRA, as well as prevalent medical image segmentation methods such as nnUNet and CoTr. Code and pre-trained weights will be available at https://github.com/yeerwen/CADS.
KW - CT volume segmentation
KW - Self-supervised learning
KW - cross-modal alignment
KW - deep self-distillation
UR - http://www.scopus.com/inward/record.url?scp=85199353989&partnerID=8YFLogxK
U2 - 10.1109/TMI.2024.3431916
DO - 10.1109/TMI.2024.3431916
M3 - Article
AN - SCOPUS:85199353989
SN - 0278-0062
VL - 44
SP - 118
EP - 129
JO - IEEE Transactions on Medical Imaging
JF - IEEE Transactions on Medical Imaging
IS - 1
ER -