Abstract
Circular RNA (circRNA) is a class of non-coding RNA that is widely present in cells and plays a critical role in the occurrence and treatment of diseases. Unraveling the relationships between circRNAs and diseases has therefore become a focus of diagnostic research. Computational methods for predicting circRNA-disease associations (CDAs) exist, but they often oversimplify the representation of circRNA structure. To address this gap, we propose LMSCDA, a novel method that uses language models to enhance circRNA and disease representations for CDA prediction. Specifically, we first compute the circRNA secondary structure from chemical principles. We then employ a hierarchical feature extraction model to extract structural and semantic features of circRNAs and amplify them with an attention mechanism. Concurrently, disease semantic features are encoded with a biomedical language model, while behavioral features of circRNAs and diseases are captured from the circRNA-miRNA and circRNA-disease networks. We integrate these features into a comprehensive representation to predict CDAs. LMSCDA achieves an AUC of 0.9877 and an AUPR of 0.9881 in 5-fold cross-validation on the CircR2Disease dataset, results that are competitive with prominent existing models. A case study on breast cancer validated the predictive accuracy of LMSCDA, with 19 of the top 20 predicted circRNA-breast cancer associations confirmed by literature evidence. An analysis of an independent clinical transcriptomic dataset further identified highly differentially expressed circRNAs among those predicted by LMSCDA, pinpointing candidates for future investigation.
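The two metrics quoted in the abstract, AUC and AUPR under 5-fold cross-validation, can be illustrated with a minimal standard-library sketch. This is not the LMSCDA evaluation code: the function names (`roc_auc`, `average_precision`, `k_fold_indices`) are hypothetical, the labels and scores are toy values, AUPR is approximated here by average precision, and the abstract does not state how folds or negative samples were chosen, so the round-robin fold assignment is an assumption.

```python
from typing import Iterator, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Approximate AUPR as the mean precision at the rank of each positive.

    Ties keep their input order because Python's sort is stable.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive sample")
    return total / hits


def k_fold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; sample i goes to fold i % k (assumed scheme)."""
    folds = [[i for i in range(n) if i % k == f] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


# Toy association labels (1 = known association) and predicted scores.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print("AUC :", round(roc_auc(toy_labels, toy_scores), 4))
print("AUPR:", round(average_precision(toy_labels, toy_scores), 4))
```

In a cross-validated setting, a model would be fitted on each `train` split from `k_fold_indices`, scored on the matching `test` split, and the per-fold AUC and AUPR averaged. A library such as scikit-learn provides equivalent, more thoroughly tested routines for real experiments.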
| Original language | English |
|---|---|
| Journal | IEEE Journal of Biomedical and Health Informatics |
| DOI | |
| Publication status | Accepted/In press - 2026 |
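UN Sustainable Development Goals
This output contributes to the following Sustainable Development Goals:
- Sustainable Development Goal 3: Good Health and Well-being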
Fingerprint
Dive into the research topics of 'LMSCDA: A Secondary Structure Enhanced Language Model for Predicting CircRNA and Disease Associations'. Together they form a unique fingerprint.