Abstract
Accurate classification of Cas proteins is crucial for understanding CRISPR-Cas systems and developing genome-editing tools. Here, we present TEMC-Cas, a deep learning framework for Cas protein classification that combines a fine-tuned ESM protein language model with contrastive learning. Unlike traditional methods that rely on sequence similarity (e.g., BLAST, HMMs) or structural prediction, TEMC-Cas leverages evolutionary-scale modeling to capture distant homology while employing contrastive learning to distinguish closely related subtypes. The framework incorporates low-rank adaptation (LoRA) for parameter-efficient fine-tuning and addresses class imbalance through weighted loss functions. TEMC-Cas achieves superior performance in classifying the Cas1-Cas13 families and 17 Cas12 subtypes, demonstrating particular strength in identifying remote homology. This approach provides a robust tool for the discovery of CRISPR-Cas systems and expands the toolbox for genome engineering applications. TEMC-Cas is freely accessible at https://github.com/Xingyu-Liao/TEMC-Cas.
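The abstract names two training ingredients, a class-weighted loss and contrastive learning, without giving their formulas. The following is a minimal sketch of what such objectives can look like, assuming inverse-frequency class weights and a supervised-contrastive-style loss over embedding vectors. The function names, the weighting scheme, and the temperature value are illustrative assumptions, not the TEMC-Cas implementation; see the repository for the authors' code.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_cross_entropy(logits, labels, weights):
    """Weighted mean of per-sample cross-entropy.

    logits[i] is a dict mapping class name -> raw score for sample i.
    """
    total, norm = 0.0, 0.0
    for scores, y in zip(logits, labels):
        m = max(scores.values())  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
        total += weights[y] * (log_z - scores[y])
        norm += weights[y]
    return total / norm


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Pull same-class embeddings together and push other classes apart."""

    def normalise(v):
        length = math.sqrt(sum(x * x for x in v))
        return [x / length for x in v]

    z = [normalise(v) for v in embeddings]
    losses = []
    for i, zi in enumerate(z):
        others = [j for j in range(len(z)) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a same-class partner are skipped
        sims = {j: sum(a * b for a, b in zip(zi, z[j])) / temperature
                for j in others}
        m = max(sims.values())
        log_z = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        losses.append(-sum(sims[j] - log_z for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Toy data: three samples of a common class and one of a rare class.
toy_labels = ["Cas9", "Cas9", "Cas9", "Cas12"]
toy_weights = inverse_frequency_weights(toy_labels)  # rare class weighs more
```

In this sketch the rare class receives the larger weight, so its errors contribute more to the classification loss, while the contrastive term is small when embeddings of the same class already cluster together.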
| Original language | English |
|---|---|
| Pages (from-to) | 4586-4596 |
| Number of pages | 11 |
| Journal | ACS Synthetic Biology |
| Volume | 14 |
| Issue number | 11 |
| DOIs | |
| State | Published - 21 Nov 2025 |
Keywords
- CRISPR-Cas system
- Cas protein
- classification
- contrastive learning
- distant homology
- protein language model
Article title: TEMC-Cas: Accurate Cas Protein Classification via Combined Contrastive Learning and Protein Language Models