TY - JOUR
T1 - Enhancing cell subpopulation discovery in cancer by integrating single-cell transcriptome and expressed variants
AU - Wang, Tao
AU - Mai, Duoduo
AU - Shu, Han
AU - Hu, Jialu
AU - Wang, Yongtian
AU - Peng, Jiajie
AU - Chen, Jing
AU - Shang, Xuequn
N1 - Publisher Copyright: © 2025
PY - 2025
Y1 - 2025
N2 - The emergence of single-cell RNA sequencing (scRNA-seq) technology has revolutionized the study of cellular heterogeneity at the single-cell level. However, existing methods for identifying subpopulations of cells in scRNA-seq data mainly rely on gene expression features, neglecting the valuable genomic information present in the raw sequencing data. To address this limitation, we propose an end-to-end deep clustering model called scCluster, which integrates single-cell gene expression profiles and expressed variant features derived from the raw scRNA-seq data to stratify cell subpopulations in cancer tissues. scCluster employs a joint optimization strategy that combines a zero-inflated negative binomial model-based dual-modal autoencoder with deep embedding clustering in the pre-training phase. This allows both gene expression profiles and variant features to be encoded into the same latent embedding space. In the fine-tuning stage, scCluster further enhances the discriminability of the latent representations by integrating deep soft K-means clustering and cross-instance guided contrastive clustering techniques. Our extensive evaluations reveal that scCluster surpasses state-of-the-art methods on multiple real-world cancer scRNA-seq datasets. The results also indicate that incorporating the expressed variant features alongside gene expression substantially enhances the stratification of cell subpopulations in cancer single-cell research.
KW - Cancer
KW - Deep learning
KW - Expressed variants
KW - Multi-omics data integration
KW - Single-cell subpopulation
UR - http://www.scopus.com/inward/record.url?scp=85219693385&partnerID=8YFLogxK
U2 - 10.1016/j.fmre.2025.01.001
DO - 10.1016/j.fmre.2025.01.001
M3 - Article
AN - SCOPUS:85219693385
SN - 2667-3258
JO - Fundamental Research
JF - Fundamental Research
ER -