Abstract
Accurate prediction of drug-drug interactions (DDIs) is critical for improving therapeutic safety and efficacy. However, most current computational approaches rely on single-modal representations of drug structure and neglect the complementary information carried by different structural hierarchies, which limits their generalizability and interpretability. To address this limitation, we propose SGT-DDI, a multimodal information fusion framework that integrates three-dimensional (3D) geometric and two-dimensional (2D) molecular substructure representations through a hierarchical Transformer architecture. SGT-DDI employs a spatial geometry encoder to capture atomic-level 3D conformational properties and a graph transformer network to extract 2D topological patterns. These cross-modal representations are unified via multi-head attention mechanisms to generate context-aware drug embeddings, enabling simultaneous prediction of whether an interaction occurs and the specific pharmacological effect it produces. SGT-DDI achieves accuracies of 97.23% and 95.32% on the DrugBank and DDInter datasets, respectively. Its performance in the more rigorous unseen-seen and unseen-unseen drug scenarios, in which one or both drugs of a pair are absent from the training set, demonstrates strong generalization. Ablation studies confirm the necessity of both the 2D and 3D structural encoders and of the cross-modal fusion mechanisms. Case studies further show that the interpretable attention patterns capture the key functional groups that determine drug interactions, highlighting the reliability of the proposed model for discovering previously unknown drug interactions.
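The abstract does not specify layer sizes, which modality supplies the attention queries, how atoms are pooled into a drug vector, or the form of the prediction heads. The following is a minimal pure-Python sketch of one plausible reading of the fusion and prediction steps, not the authors' implementation. It assumes that per-atom 2D embeddings act as queries over per-atom 3D embeddings, that the fused atoms are mean-pooled into a drug embedding, and that two linear heads operate on the concatenated drug pair. All function names (`cross_modal_attention`, `predict_pair`, and so on) are hypothetical. Random matrices stand in for the trained encoders and weights.

```python
# Hypothetical sketch of cross-modal multi-head attention fusion; NOT the SGT-DDI code.
import math
import random

Matrix = list[list[float]]

def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.3) -> Matrix:
    """Random Gaussian matrix; stands in for learned weights or encoder outputs."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """(n x k) @ (k x m) using plain lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x: float) -> float:
    # branch on sign so math.exp never overflows
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def init_fusion_params(d: int, rng: random.Random) -> tuple[Matrix, Matrix, Matrix]:
    """Query/key/value projections shared by every drug (hypothetical)."""
    return rand_matrix(d, d, rng), rand_matrix(d, d, rng), rand_matrix(d, d, rng)

def cross_modal_attention(h2d: Matrix, h3d: Matrix, params, num_heads: int):
    """2D atom features (queries) attend over 3D atom features (keys/values).

    Returns fused per-atom embeddings and one attention map per head.
    """
    wq, wk, wv = params
    d = len(h2d[0])
    assert d % num_heads == 0, "embedding size must split evenly across heads"
    dh = d // num_heads
    q, k, v = matmul(h2d, wq), matmul(h3d, wk), matmul(h3d, wv)
    fused = [row[:] for row in h2d]  # start from h2d: residual connection
    maps = []
    for h in range(num_heads):
        cols = range(h * dh, (h + 1) * dh)  # this head's slice of the embedding
        attn = [softmax([sum(q[i][c] * k[j][c] for c in cols) / math.sqrt(dh)
                         for j in range(len(h3d))])
                for i in range(len(h2d))]
        maps.append(attn)
        for i in range(len(h2d)):
            for c in cols:
                fused[i][c] += sum(attn[i][j] * v[j][c] for j in range(len(h3d)))
    return fused, maps

def mean_pool(h: Matrix) -> list[float]:
    """Average atom embeddings into one drug-level vector."""
    return [sum(row[c] for row in h) / len(h) for c in range(len(h[0]))]

def predict_pair(emb_a, emb_b, w_occ, w_type):
    """Two heads on the concatenated pair: occurrence probability and effect-type distribution."""
    pair = emb_a + emb_b  # list concatenation, length 2d
    p_occ = sigmoid(sum(p * w for p, w in zip(pair, w_occ)))
    type_probs = softmax([sum(p * w for p, w in zip(pair, row)) for row in w_type])
    return p_occ, type_probs

# Toy usage with random stand-ins for the two encoders' outputs.
rng = random.Random(0)
d, heads, n_types = 8, 2, 4
params = init_fusion_params(d, rng)

def encode_drug(n_atoms: int):
    h2d = rand_matrix(n_atoms, d, rng)  # placeholder for graph-transformer (2D) output
    h3d = rand_matrix(n_atoms, d, rng)  # placeholder for spatial-geometry (3D) output
    fused, maps = cross_modal_attention(h2d, h3d, params, heads)
    return mean_pool(fused), maps

emb_a, maps_a = encode_drug(5)
emb_b, maps_b = encode_drug(7)
w_occ = rand_matrix(1, 2 * d, rng)[0]
w_type = rand_matrix(n_types, 2 * d, rng)
p_occ, type_probs = predict_pair(emb_a, emb_b, w_occ, w_type)
print(f"interaction probability: {p_occ:.3f}")
print("effect-type distribution:", [round(p, 3) for p in type_probs])
```

Both drugs share the same projection parameters, so the drug encoder is applied identically to each member of a pair. Returning the per-head attention maps loosely mirrors the interpretability analysis described above: each row shows how strongly one atom's 2D representation draws on each atom's 3D representation.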
| Original language | English |
|---|---|
| Article number | 103981 |
| Journal | Information Fusion |
| Volume | 128 |
| State | Published - Apr 2026 |
Keywords
- Drug combinations
- Drug-drug interaction
- Graph representation learning
- Multi-head attention
- Multimodal fusion