TY - GEN
T1 - Enhanced RNA Sequence Representation through Sequence Masking and Subsequence Consistency Optimization
AU - Shen, Yewei
AU - Wang, Zhiyuan
AU - Li, Zongyu
AU - Liu, Xinmeng
AU - Shang, Xuequn
AU - Wang, Yongtian
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - In the burgeoning field of RNA research, accurate and efficient RNA sequence representation remains a pivotal challenge, exacerbated by the complexity and diversity of RNA sequences. Addressing the critical need for enhanced sequence representation and the issues of sequence context and structural alignment, this study introduces a novel, comprehensive approach. The proposed model seamlessly integrates sequence masking and subsequence consistency optimization, offering a robust solution to the intricate problem of RNA sequence representation. Utilizing the filtered RNAStralign dataset, encompassing 20,923 sequences, the model's performance is rigorously evaluated employing a Support Vector Machine (SVM) for subsequent RNA family classification tasks. Despite the inherent imbalance in RNA family sequence distribution, the model demonstrates exemplary performance, achieving high classification accuracy and AUPRC values across diverse RNA sequence groups. This balanced and unbiased assessment, ensured by the use of AUPRC as an evaluation metric, highlights the model's practical utility for comprehensive RNA sequence analysis and classification. In essence, this research presents a method for enhanced RNA sequence representation and lays a robust foundation for future advancements in the nuanced field of RNA sequence analysis.
KW - RNA family classification
KW - sequence masking
KW - sequence representation
KW - subsequence consistency optimization
UR - http://www.scopus.com/inward/record.url?scp=85184886399&partnerID=8YFLogxK
U2 - 10.1109/BIBM58861.2023.10385730
DO - 10.1109/BIBM58861.2023.10385730
M3 - Conference contribution
AN - SCOPUS:85184886399
T3 - Proceedings - 2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023
SP - 2938
EP - 2944
BT - Proceedings - 2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023
A2 - Jiang, Xingpeng
A2 - Wang, Haiying
A2 - Alhajj, Reda
A2 - Hu, Xiaohua
A2 - Engel, Felix
A2 - Mahmud, Mufti
A2 - Pisanti, Nadia
A2 - Cui, Xuefeng
A2 - Song, Hong
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2023
Y2 - 5 December 2023 through 8 December 2023
ER -