UniMiSS+: Universal Medical Self-Supervised Learning From Cross-Dimensional Unpaired Data

Yutong Xie; Jianpeng Zhang; Yong Xia; Qi Wu

doi:10.1109/TPAMI.2024.3436105

UniMiSS+: Universal Medical Self-Supervised Learning From Cross-Dimensional Unpaired Data

Yutong Xie, Jianpeng Zhang, Yong Xia, Qi Wu

School of Computer Science

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Self-supervised learning (SSL) opens up huge opportunities for medical image analysis that is well known for its lack of annotations. However, aggregating massive (unlabeled) 3D medical images like computerized tomography (CT) remains challenging due to its high imaging cost and privacy restrictions. In our pilot study, we advocated bringing a wealth of 2D images like X-rays as compensation for the lack of 3D data, aiming to build a universal medical self-supervised representation learning framework, called UniMiSS. Especially, we designed a pyramid U-like medical Transformer (MiT) as the backbone to make UniMiSS possible to perform SSL with both 2D and 3D images. UniMiSS surpasses current 3D-specific SSL in effectiveness and versatility, excelling in various downstream tasks and overcoming the limitations of dimensionality. However, the initial version did not fully explore the anatomical correlations between 2D and 3D images due to the absence of paired multi-modal patient data. In this extension, we introduce UniMiSS+, which leverages digitally reconstructed radiographs (DRR) technology to simulate X-rays from CT volumes, providing access to paired data. Benefiting from the paired group, we introduce an extra pair-wise constraint to boost the cross modality correlation learning, which also can be adopted as a cross dimension regularization to further improve the representations. We conduct expensive experiments on multiple 3D/2D medical image analysis tasks, including segmentation and classification. The results show that our UniMiSS+ achieves promising performance on various downstream tasks, not only outperforming ImageNet pre-training and other advanced SSL counterparts but also improving the predecessor UniMiSS pre-training.

Original language	English
Pages (from-to)	10021-10035
Number of pages	15
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	46
Issue number	12
DOIs	https://doi.org/10.1109/TPAMI.2024.3436105
State	Published - 2024

Keywords

Medical image analysis
self-supervised learning
transformer
universal learning

Access to Document

10.1109/TPAMI.2024.3436105

Cite this

@article{4166b2328f6e467f8001b498b72fa102,

title = "UniMiSS+: Universal Medical Self-Supervised Learning From Cross-Dimensional Unpaired Data",

abstract = "Self-supervised learning (SSL) opens up huge opportunities for medical image analysis that is well known for its lack of annotations. However, aggregating massive (unlabeled) 3D medical images like computerized tomography (CT) remains challenging due to its high imaging cost and privacy restrictions. In our pilot study, we advocated bringing a wealth of 2D images like X-rays as compensation for the lack of 3D data, aiming to build a universal medical self-supervised representation learning framework, called UniMiSS. Especially, we designed a pyramid U-like medical Transformer (MiT) as the backbone to make UniMiSS possible to perform SSL with both 2D and 3D images. UniMiSS surpasses current 3D-specific SSL in effectiveness and versatility, excelling in various downstream tasks and overcoming the limitations of dimensionality. However, the initial version did not fully explore the anatomical correlations between 2D and 3D images due to the absence of paired multi-modal patient data. In this extension, we introduce UniMiSS+, which leverages digitally reconstructed radiographs (DRR) technology to simulate X-rays from CT volumes, providing access to paired data. Benefiting from the paired group, we introduce an extra pair-wise constraint to boost the cross modality correlation learning, which also can be adopted as a cross dimension regularization to further improve the representations. We conduct expensive experiments on multiple 3D/2D medical image analysis tasks, including segmentation and classification. The results show that our UniMiSS+ achieves promising performance on various downstream tasks, not only outperforming ImageNet pre-training and other advanced SSL counterparts but also improving the predecessor UniMiSS pre-training.",

keywords = "Medical image analysis, self-supervised learning, transformer, universal learning",

author = "Yutong Xie and Jianpeng Zhang and Yong Xia and Qi Wu",

note = "Publisher Copyright: {\textcopyright} 1979-2012 IEEE.",

year = "2024",

doi = "10.1109/TPAMI.2024.3436105",

language = "英语",

volume = "46",

pages = "10021--10035",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

number = "12",

}

TY - JOUR

T1 - UniMiSS+

T2 - Universal Medical Self-Supervised Learning From Cross-Dimensional Unpaired Data

AU - Xie, Yutong

AU - Zhang, Jianpeng

AU - Xia, Yong

AU - Wu, Qi

PY - 2024

Y1 - 2024

N2 - Self-supervised learning (SSL) opens up huge opportunities for medical image analysis that is well known for its lack of annotations. However, aggregating massive (unlabeled) 3D medical images like computerized tomography (CT) remains challenging due to its high imaging cost and privacy restrictions. In our pilot study, we advocated bringing a wealth of 2D images like X-rays as compensation for the lack of 3D data, aiming to build a universal medical self-supervised representation learning framework, called UniMiSS. Especially, we designed a pyramid U-like medical Transformer (MiT) as the backbone to make UniMiSS possible to perform SSL with both 2D and 3D images. UniMiSS surpasses current 3D-specific SSL in effectiveness and versatility, excelling in various downstream tasks and overcoming the limitations of dimensionality. However, the initial version did not fully explore the anatomical correlations between 2D and 3D images due to the absence of paired multi-modal patient data. In this extension, we introduce UniMiSS+, which leverages digitally reconstructed radiographs (DRR) technology to simulate X-rays from CT volumes, providing access to paired data. Benefiting from the paired group, we introduce an extra pair-wise constraint to boost the cross modality correlation learning, which also can be adopted as a cross dimension regularization to further improve the representations. We conduct expensive experiments on multiple 3D/2D medical image analysis tasks, including segmentation and classification. The results show that our UniMiSS+ achieves promising performance on various downstream tasks, not only outperforming ImageNet pre-training and other advanced SSL counterparts but also improving the predecessor UniMiSS pre-training.

AB - Self-supervised learning (SSL) opens up huge opportunities for medical image analysis that is well known for its lack of annotations. However, aggregating massive (unlabeled) 3D medical images like computerized tomography (CT) remains challenging due to its high imaging cost and privacy restrictions. In our pilot study, we advocated bringing a wealth of 2D images like X-rays as compensation for the lack of 3D data, aiming to build a universal medical self-supervised representation learning framework, called UniMiSS. Especially, we designed a pyramid U-like medical Transformer (MiT) as the backbone to make UniMiSS possible to perform SSL with both 2D and 3D images. UniMiSS surpasses current 3D-specific SSL in effectiveness and versatility, excelling in various downstream tasks and overcoming the limitations of dimensionality. However, the initial version did not fully explore the anatomical correlations between 2D and 3D images due to the absence of paired multi-modal patient data. In this extension, we introduce UniMiSS+, which leverages digitally reconstructed radiographs (DRR) technology to simulate X-rays from CT volumes, providing access to paired data. Benefiting from the paired group, we introduce an extra pair-wise constraint to boost the cross modality correlation learning, which also can be adopted as a cross dimension regularization to further improve the representations. We conduct expensive experiments on multiple 3D/2D medical image analysis tasks, including segmentation and classification. The results show that our UniMiSS+ achieves promising performance on various downstream tasks, not only outperforming ImageNet pre-training and other advanced SSL counterparts but also improving the predecessor UniMiSS pre-training.

KW - Medical image analysis

KW - self-supervised learning

KW - transformer

KW - universal learning

UR - http://www.scopus.com/inward/record.url?scp=85200246009&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2024.3436105

DO - 10.1109/TPAMI.2024.3436105

M3 - 文章

C2 - 39083391

AN - SCOPUS:85200246009

SN - 0162-8828

VL - 46

SP - 10021

EP - 10035

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 12

ER -

UniMiSS+: Universal Medical Self-Supervised Learning From Cross-Dimensional Unpaired Data

Abstract

Keywords

Access to Document

Other files and links

Fingerprint

Cite this