A versatile and scalable single-cell data integration algorithm based on domain-adversarial and variational approximation

Jialu Hu, Yuanke Zhong, Xuequn Shang

Research output: Contribution to journal › Article › peer-review

24 Scopus citations

Abstract

Single-cell technologies provide new ways to profile the transcriptomic landscape, chromatin accessibility, and spatial expression patterns of heterogeneous tissues at single-cell resolution. With the enormous number of single-cell datasets now being generated, a key analytic challenge is to integrate these datasets to gain biological insight into cellular composition. Here, we developed a domain-adversarial and variational approximation method, DAVAE, which can integrate multiple single-cell datasets across samples, technologies and modalities with a single strategy. In addition, DAVAE can integrate paired ATAC and transcriptome profiles that are measured simultaneously from the same cell. Because it is trained with mini-batch stochastic gradient descent, it scales to large datasets and can be accelerated by GPUs. Results on seven real data integration applications demonstrated the effectiveness and scalability of DAVAE in batch-effect removal, transfer learning and cell-type prediction for multiple single-cell datasets across samples, technologies and modalities. Availability: DAVAE has been implemented in the toolkit package "scbean" in the PyPI repository, and the source code is also freely available at https://github.com/jhu99/scbean. All data and source code for reproducing the results of this paper are available at https://github.com/jhu99/davae_paper.

Original language: English
Article number: bbab400
Journal: Briefings in Bioinformatics
Volume: 23
Issue number: 1
State: Published - 1 Jan 2022

Keywords

  • data integration
  • domain-adversarial learning
  • multimodal data
  • regularized regression
  • single cell analysis
  • variational approximation
