TY - JOUR
T1 - LncDisAP
T2 - A computation model for LncRNA-disease association prediction based on multiple biological datasets
AU - Wang, Yongtian
AU - Juan, Liran
AU - Peng, Jiajie
AU - Zang, Tianyi
AU - Wang, Yadong
N1 - Publisher Copyright: © 2019 The Author(s).
PY - 2019/12/2
Y1 - 2019/12/2
N2 - Background: Over the past decades, a large number of long non-coding RNAs (lncRNAs) have been identified. Growing evidence indicates that the mutation and dysregulation of lncRNAs play a critical role in the development of many complex human diseases. Consequently, identifying potential disease-related lncRNAs is an effective means of improving the quality of disease diagnostics and treatment, which is the motivation of this work. Here, we propose a computational model (LncDisAP) for identifying potential disease-related lncRNAs based on multiple biological datasets. First, the associations between lncRNAs and different data sources are collected from different databases. With these data sources as dimensions, we calculate the functional associations between lncRNAs using the recommendation strategy of collaborative filtering. Subsequently, a disease-associated lncRNA functional network is built with the functional similarities between lncRNAs as edge weights. Potential disease-related lncRNAs are then identified from ranked scores derived by random walk with restart (RWR). Finally, training sets and testing sets are extracted from two different versions of a disease-lncRNA dataset to assess the performance of LncDisAP on 54 diseases. Results: A lncRNA functional network is built based on the proposed computational model; it contains 66,060 associations among 364 lncRNAs associated with 182 diseases in total. We extract 218 known disease-lncRNA pairs associated with 54 diseases to assess the network. The average AUC (area under the receiver operating characteristic curve) of LncDisAP is 78.08%. Conclusion: In this article, a computational model integrating multiple lncRNA-related biological datasets is proposed for identifying potential disease-related lncRNAs. The results show that LncDisAP is successful in predicting novel disease-related lncRNA signatures. In addition, with several common cancers taken as case studies, we found some lncRNAs, not previously linked to these diseases, that could be associated with them through our network. These results suggest that this method can help improve the quality of disease diagnostics and treatment.
AB - Background: Over the past decades, a large number of long non-coding RNAs (lncRNAs) have been identified. Growing evidence indicates that the mutation and dysregulation of lncRNAs play a critical role in the development of many complex human diseases. Consequently, identifying potential disease-related lncRNAs is an effective means of improving the quality of disease diagnostics and treatment, which is the motivation of this work. Here, we propose a computational model (LncDisAP) for identifying potential disease-related lncRNAs based on multiple biological datasets. First, the associations between lncRNAs and different data sources are collected from different databases. With these data sources as dimensions, we calculate the functional associations between lncRNAs using the recommendation strategy of collaborative filtering. Subsequently, a disease-associated lncRNA functional network is built with the functional similarities between lncRNAs as edge weights. Potential disease-related lncRNAs are then identified from ranked scores derived by random walk with restart (RWR). Finally, training sets and testing sets are extracted from two different versions of a disease-lncRNA dataset to assess the performance of LncDisAP on 54 diseases. Results: A lncRNA functional network is built based on the proposed computational model; it contains 66,060 associations among 364 lncRNAs associated with 182 diseases in total. We extract 218 known disease-lncRNA pairs associated with 54 diseases to assess the network. The average AUC (area under the receiver operating characteristic curve) of LncDisAP is 78.08%. Conclusion: In this article, a computational model integrating multiple lncRNA-related biological datasets is proposed for identifying potential disease-related lncRNAs. The results show that LncDisAP is successful in predicting novel disease-related lncRNA signatures. In addition, with several common cancers taken as case studies, we found some lncRNAs, not previously linked to these diseases, that could be associated with them through our network. These results suggest that this method can help improve the quality of disease diagnostics and treatment.
KW - Disease
KW - lncRNA network
KW - Long non-coding RNAs
KW - Random walking with restart
UR - http://www.scopus.com/inward/record.url?scp=85075875819&partnerID=8YFLogxK
U2 - 10.1186/s12859-019-3081-1
DO - 10.1186/s12859-019-3081-1
M3 - Article
C2 - 31787106
AN - SCOPUS:85075875819
SN - 1471-2105
VL - 20
JO - BMC Bioinformatics
JF - BMC Bioinformatics
M1 - 582
ER -