
LMSCDA: A Secondary Structure Enhanced Language Model for Predicting CircRNA and Disease Associations

  • Mian Shuo Lu
  • Lei Wang
  • Meng Meng Wei
  • Xiao Rui Su
  • Bo Wei Zhao
  • Zhu Hong You
  • De Shuang Huang
  • China University of Mining and Technology
  • Xinjiang Technical Institute of Physics and Chemistry
  • Zhejiang University
  • Eastern Institute of Technology, Ningbo

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Circular RNA (circRNA) is a type of non-coding RNA that is widely present in cells and plays a critical role in the occurrence and treatment of diseases. Unraveling the relationships between circRNAs and diseases has therefore become a focus for diagnosis. Computational methods for predicting circRNA-disease associations (CDAs) exist, but they often oversimplify the representation of circRNA structures. To address this gap, we propose a novel method, LMSCDA, which uses language models to enhance circRNA and disease representations for CDA prediction. Specifically, we first calculate the circRNA secondary structure based on chemical principles. We then employ a hierarchical feature extraction model to extract circRNA structural and semantic features, and amplify these features with an attention mechanism. Concurrently, disease semantic features are encoded with a biomedical language model, while behavioral features of circRNAs and diseases are captured from circRNA-miRNA and circRNA-disease networks. We integrate these features into a comprehensive representation to predict CDAs. LMSCDA achieves an AUC of 0.9877 and an AUPR of 0.9881 in 5-fold cross-validation on the CircR2Disease dataset, and yields competitive results when evaluated against prominent existing models. A case study on breast cancer first validated the predictive accuracy of LMSCDA, with 19 of the top 20 predicted circRNA-breast cancer associations confirmed by literature evidence. An analysis of an independent clinical transcriptomic dataset then identified highly differentially expressed circRNAs among the LMSCDA predictions, pinpointing candidates for future investigation.
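
The abstract reports AUC and AUPR under 5-fold cross-validation. As a rough illustration of what such an evaluation involves, the sketch below splits labeled circRNA-disease pairs into stratified folds and averages both metrics over the folds. It is not the authors' evaluation code: the function names, the `fit_and_score` callback (a stand-in for any model that trains on four folds and scores the fifth), and the use of average precision as an approximation of AUPR are all assumptions made for this example.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Deal positives and negatives round-robin into k folds (hypothetical helper)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision (used here as an AUPR estimate): mean precision at each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def cross_validate(features, labels, fit_and_score, k=5, seed=0):
    """Return (mean AUC, mean AUPR) over k folds.

    fit_and_score(train_x, train_y, test_x) is an assumed callback that trains
    a model and returns one score per test sample. Every fold must contain
    both classes, otherwise the metrics are undefined.
    """
    aucs, auprs = [], []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        scores = fit_and_score(
            [features[i] for i in train],
            [labels[i] for i in train],
            [features[i] for i in held_out],
        )
        y_test = [labels[i] for i in held_out]
        aucs.append(roc_auc(y_test, scores))
        auprs.append(average_precision(y_test, scores))
    return sum(aucs) / k, sum(auprs) / k


if __name__ == "__main__":
    # Toy data only: the feature is a noiseless copy of the label.
    toy_labels = [0, 1] * 10
    toy_features = [float(y) for y in toy_labels]
    print(cross_validate(toy_features, toy_labels, lambda tx, ty, x: x))
```

In practice the negative class in association prediction is usually sampled from unlabeled pairs, so reported AUC and AUPR values depend on how those negatives are chosen; the sketch leaves that choice to the caller.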

Original language: English
Journal: IEEE Journal of Biomedical and Health Informatics
DOI
Publication status: Accepted/In press - 2026

UN Sustainable Development Goals

This output contributes to the following Sustainable Development Goals:

  1. Sustainable Development Goal 3 - Good Health and Well-being
