TY - JOUR
T1 - TransLSTD
T2 - Augmenting hierarchical disease risk prediction model with time and context awareness via disease clustering
AU - You, Tao
AU - Dang, Qiaodong
AU - Li, Qing
AU - Zhang, Peng
AU - Wu, Guanzhong
AU - Huang, Wei
N1 - Publisher Copyright: © 2024 Elsevier Ltd
PY - 2024/9
Y1 - 2024/9
AB - Electronic health records (EHRs) are now in widespread use and provide a valuable source of information for predicting disease risk. Deep neural network models, supplemented with medical domain knowledge for interpretability, have been shown to be effective in this task, but several limitations remain. First, chronic and acute diseases are often not differentiated, which leads to biased modeling of diseases. Second, extracting only single-layer temporal patterns from patient records limits comprehensive representation and predictive power. Third, the weak interpretability of deep neural networks prevents the extraction of valuable medical knowledge and limits practical applications. To overcome these challenges, we propose TransLSTD, a hierarchical model that incorporates time awareness and context awareness while distinguishing between long-term and short-term diseases. TransLSTD uses clustering algorithms to classify disease types based on the occurrence feature matrix of diseases from the EHR dataset, and updates disease representations at the code level while creating patient visit embeddings. The model uses query vectors to incorporate visit context information and combines it with time data to capture the patient's overall health status. Finally, the prediction module generates outcomes and provides effective interpretations. Experiments on two real-world datasets show that TransLSTD outperforms state-of-the-art models in terms of both AUC and F1 values. The data and code are released at https://github.com/DangQD/TransLSTD-master.
KW - Data mining
KW - Disease classification
KW - Disease risk prediction
KW - Electronic health records
KW - Interpretability
UR - http://www.scopus.com/inward/record.url?scp=85191309013&partnerID=8YFLogxK
U2 - 10.1016/j.is.2024.102390
DO - 10.1016/j.is.2024.102390
M3 - Article
AN - SCOPUS:85191309013
SN - 0306-4379
VL - 124
JO - Information Systems
JF - Information Systems
M1 - 102390
ER -