TY - JOUR
T1 - DPCIPI
T2 - A pre-trained deep learning model for predicting cross-immunity between drifted strains of Influenza A/H3N2
AU - Du, Yiming
AU - Li, Zhuotian
AU - He, Qian
AU - Tulu, Thomas Wetere
AU - Chan, Kei Hang Katie
AU - Wang, Lin
AU - Pei, Sen
AU - Du, Zhanwei
AU - Wang, Zhen
AU - Xu, Xiao Ke
AU - Liu, Xiao Fan
N1 - Publisher Copyright: © 2025 The Authors
PY - 2025
Y1 - 2025
AB - Predicting cross-immunity between viral strains is vital for public health surveillance and vaccine development. Traditional neural network methods, such as BiLSTM, can be ineffective because lab data for model training are scarce and because crucial features are overshadowed when sequences are concatenated. The current work proposes a less data-consuming model that incorporates a pre-trained gene sequence model and a mutual information inference operator. Our methodology uses gene alignment and deduplication algorithms to preprocess gene sequences, enhancing the model's capacity to discern and focus on the differences between input gene pairs. The model, the DNA Pretrained Cross-Immunity Protection Inference model (DPCIPI), outperforms state-of-the-art (SOTA) models in predicting hemagglutination inhibition titer from influenza viral gene sequences alone. For binary cross-immunity prediction, the improvement is 1.58% in F1, 2.34% in precision, 1.57% in recall, and 1.57% in accuracy. For multilevel cross-immunity prediction, the improvement is 2.12% in F1, 3.50% in precision, 2.19% in recall, and 2.19% in accuracy. Our study showcases the potential of pre-trained gene models to improve predictions of antigenic variation and cross-immunity. With expanding gene data and advancements in pre-trained models, this approach promises significant impacts on vaccine development and public health.
KW - Cross-immunity prediction
KW - Deep learning
KW - Hemagglutination inhibition
KW - Influenza strains
KW - Pre-trained model
UR - http://www.scopus.com/inward/record.url?scp=105002757287&partnerID=8YFLogxK
U2 - 10.1016/j.jai.2025.03.004
DO - 10.1016/j.jai.2025.03.004
M3 - Article
AN - SCOPUS:105002757287
SN - 2949-8554
JO - Journal of Automation and Intelligence
JF - Journal of Automation and Intelligence
ER -