TY - JOUR
T1 - NSECDA
T2 - Natural Semantic Enhancement for CircRNA-Disease Association Prediction
AU - Wang, Lei
AU - Wong, Leon
AU - You, Zhu Hong
AU - Huang, De Shuang
AU - Su, Xiao Rui
AU - Zhao, Bo Wei
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2022/10/1
Y1 - 2022/10/1
AB - Increasing evidence suggests that circRNA, one of the most promising emerging biomarkers, is closely related to disease. Exploring the relationship between circRNAs and diseases can provide a novel perspective on disease diagnosis and pathogenesis. Existing circRNA-disease association (CDA) prediction models, however, generally treat all data attributes equally, pay no special attention to the attributes with more significant influence, and do not make full use of the correlation and symbiosis between attributes to uncover the latent semantic information in the data. In response to these problems, this paper proposes NSECDA, a natural semantic enhancement method for predicting CDA. Specifically, we first treat the circRNA sequence as a biological language and analyze its natural semantic properties using natural language understanding theory; we then integrate these properties with disease attributes and with circRNA and disease Gaussian Interaction Profile (GIP) kernel attributes, and use a Graph Attention Network (GAT) to focus on the influential attributes and mine deeply hidden features; finally, a Rotation Forest (RoF) classifier is used to accurately determine CDA. On the gold standard data set CircR2Disease, NSECDA achieved 92.49% accuracy with an AUC score of 0.9225. In comparison with the non-natural semantic enhancement model and other classifier models, NSECDA also shows competitive performance. Additionally, 25 of the CDA pairs with unknown associations among the top 30 prediction scores of NSECDA have been confirmed by newly reported studies. These results suggest that NSECDA is an effective model for predicting CDA that can provide credible candidates for subsequent wet-lab experiments, thus significantly reducing the scope of investigation.
KW - CircRNA-disease association
KW - Circular RNA
KW - Graph attention network
KW - Natural semantic
KW - Rotation forest
UR - http://www.scopus.com/inward/record.url?scp=85136895979&partnerID=8YFLogxK
U2 - 10.1109/JBHI.2022.3199462
DO - 10.1109/JBHI.2022.3199462
M3 - Article
C2 - 35976848
AN - SCOPUS:85136895979
SN - 2168-2194
VL - 26
SP - 5075
EP - 5084
JO - IEEE Journal of Biomedical and Health Informatics
JF - IEEE Journal of Biomedical and Health Informatics
IS - 10
ER -