TY - JOUR
T1 - Graph-Regularized Non-Negative Matrix Factorization for Single-Cell Clustering in scRNA-Seq Data
AU - Jiang, Hanjing
AU - Wang, Mei Neng
AU - Huang, Yu An
AU - Huang, Yabing
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2024
Y1 - 2024
N2 - The advent of single-cell RNA sequencing (scRNA-seq) has brought forth fresh perspectives on intricate biological processes, revealing the nuances and divergences present among distinct cells. Accurate single-cell analysis is a crucial prerequisite for in-depth investigation into the underlying mechanisms of heterogeneity. Due to various technical noises, like the impact of dropout values, scRNA-seq data remains challenging to interpret. In this work, we propose an unsupervised learning framework for scRNA-seq data analysis (aka Sc-GNNMF). Based on the non-negativity and sparsity of scRNA-seq data, we propose employing graph-regularized non-negative matrix factorization (GNNMF) algorithm for the analysis of scRNA-seq data, which involves estimating cell-cell sparse similarity and gene-gene sparse similarity through Laplacian kernels and p-nearest neighbor graphs (p-NNG). By assuming intrinsic geometric local invariance, we use a weighted p-nearest known neighbors (p-NKN) to optimize the scRNA-seq data. The optimized scRNA-seq data then participates in the matrix decomposition process, promoting the closeness of cells with similar types in cell-gene data space and determining a more suitable embedding space for clustering. Sc-GNNMF demonstrates superior performance compared to other methods and maintains satisfactory compatibility and robustness, as evidenced by experiments on 11 real scRNA-seq datasets. Furthermore, Sc-GNNMF yields excellent results in clustering tasks, extracting useful gene markers, and pseudo-temporal analysis.
AB - The advent of single-cell RNA sequencing (scRNA-seq) has brought forth fresh perspectives on intricate biological processes, revealing the nuances and divergences present among distinct cells. Accurate single-cell analysis is a crucial prerequisite for in-depth investigation into the underlying mechanisms of heterogeneity. Due to various technical noises, like the impact of dropout values, scRNA-seq data remains challenging to interpret. In this work, we propose an unsupervised learning framework for scRNA-seq data analysis (aka Sc-GNNMF). Based on the non-negativity and sparsity of scRNA-seq data, we propose employing graph-regularized non-negative matrix factorization (GNNMF) algorithm for the analysis of scRNA-seq data, which involves estimating cell-cell sparse similarity and gene-gene sparse similarity through Laplacian kernels and p-nearest neighbor graphs (p-NNG). By assuming intrinsic geometric local invariance, we use a weighted p-nearest known neighbors (p-NKN) to optimize the scRNA-seq data. The optimized scRNA-seq data then participates in the matrix decomposition process, promoting the closeness of cells with similar types in cell-gene data space and determining a more suitable embedding space for clustering. Sc-GNNMF demonstrates superior performance compared to other methods and maintains satisfactory compatibility and robustness, as evidenced by experiments on 11 real scRNA-seq datasets. Furthermore, Sc-GNNMF yields excellent results in clustering tasks, extracting useful gene markers, and pseudo-temporal analysis.
KW - clustering
KW - gene marker
KW - non-negative matrix factorization
KW - scRNA-seq
UR - http://www.scopus.com/inward/record.url?scp=85194028461&partnerID=8YFLogxK
U2 - 10.1109/JBHI.2024.3400050
DO - 10.1109/JBHI.2024.3400050
M3 - 文章
C2 - 38787664
AN - SCOPUS:85194028461
SN - 2168-2194
VL - 28
SP - 4986
EP - 4994
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
IS - 8
ER -