Deep Learning-Based Classification of CRISPR Loci Using Repeat Sequences

Xingyu Liao, Yanyan Li, Yingfu Wu, Xingyi Li, Xuequn Shang

Research output: Contribution to journal, Article, Peer-reviewed

Abstract

With the widespread application of the CRISPR-Cas system in gene editing and related fields, along with the increasing availability of metagenomic data, the demand for detecting and classifying CRISPR-Cas systems in metagenomic data sets has grown significantly. Traditional classification methods for CRISPR-Cas systems primarily rely on identifying cas genes near CRISPR arrays. However, in cases where cas gene information is absent, such as in metagenomes or fragmented genome assemblies, traditional methods may fail. Here, we present a deep learning-based method, CRISPRclassify-CNN-Att, which classifies CRISPR loci solely based on repeat sequences. CRISPRclassify-CNN-Att utilizes convolutional neural networks (CNNs) and self-attention mechanisms to extract features from repeat sequences. It employs a stacking strategy to address the imbalance of samples across different subtypes and uses transfer learning to improve classification accuracy for subtypes with fewer samples. CRISPRclassify-CNN-Att demonstrates outstanding performance in classifying multiple subtypes, particularly those with larger sample sizes. Although CRISPR loci classification traditionally depends on cas genes, CRISPRclassify-CNN-Att offers a novel approach that serves as a significant complement to cas-based methods, enabling the classification of orphan or distant CRISPR loci. The proposed tool is freely accessible via https://github.com/Xingyu-Liao/CRISPRclassify-CNN-Att.
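As a rough illustration of the feature-extraction idea described in the abstract (convolution followed by self-attention over a repeat sequence), the following minimal sketch encodes a repeat as a one-hot matrix, applies a small 1D convolution, and then applies scaled dot-product self-attention before mean pooling. It is not the CRISPRclassify-CNN-Att implementation: the function names, the hand-written motif filters, the padding length, the identity attention projections, and the example sequence are all assumptions made for illustration, and nothing here is trained. The stacking and transfer-learning steps are not shown.

```python
import math

ALPHABET = "ACGT"  # bases outside this alphabet (e.g. N) become all-zero rows


def one_hot(seq, length):
    """Encode a DNA repeat as a length x 4 matrix, truncated or zero-padded."""
    rows = [[1.0 if base == b else 0.0 for b in ALPHABET]
            for base in seq.upper()[:length]]
    rows += [[0.0] * 4 for _ in range(length - len(rows))]
    return rows


def motif_kernel(motif):
    """Hypothetical untrained filter that responds most strongly to `motif`."""
    return one_hot(motif, len(motif))


def conv1d(x, kernels):
    """Valid 1D convolution with ReLU; each kernel is a width x 4 matrix."""
    width = len(kernels[0])
    out = []
    for i in range(len(x) - width + 1):
        feats = []
        for kern in kernels:
            s = sum(kern[j][c] * x[i + j][c]
                    for j in range(width) for c in range(4))
            feats.append(max(0.0, s))
        out.append(feats)
    return out


def self_attention(h):
    """Scaled dot-product self-attention with identity projections (Q = K = V = h)."""
    d = len(h[0])
    out = []
    for q in h:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in h]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[c] for w, v in zip(weights, h))
                    for c in range(d)])
    return out


def extract_features(seq, length=16, kernels=None):
    """Return one pooled feature per kernel for a single repeat sequence."""
    if kernels is None:
        # Toy filters chosen for illustration only, not learned weights.
        kernels = [motif_kernel("GTT"), motif_kernel("AAC")]
    x = one_hot(seq, length)
    h = conv1d(x, kernels)
    a = self_attention(h)
    # Mean-pool over positions to get a fixed-size vector.
    return [sum(row[c] for row in a) / len(a) for c in range(len(a[0]))]


if __name__ == "__main__":
    # Made-up sequence, not a repeat taken from any dataset.
    print(extract_features("GTTGAACGTTAAC"))
```

In a real classifier the filters and attention projections would be learned from labeled repeats, and the pooled vector would feed a classification layer that outputs subtype probabilities.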

Original language: English
Journal: ACS Synthetic Biology
Publication status: Accepted/In press - 2025
