TY - JOUR
T1 - Aggregate twice more efficiently
T2 - Dual feature aggregation transformer for medical image segmentation
AU - Li, Jiaxin
AU - Cui, Hengfei
AU - Zhang, Yanning
AU - Xia, Yong
N1 - Publisher Copyright:
© 2025 Elsevier B.V.
PY - 2026/5
Y1 - 2026/5
N2 - Accurate medical image segmentation provides precise descriptions of anatomical structures and pathological regions, which plays a crucial role in formulating effective treatment plans, guiding surgeries and monitoring disease progression. Recently, hybrid models combining Convolutional Neural Networks (CNNs) and Transformers have been able to compensate for the limitations of traditional CNNs in capturing long-range dependencies. However, these models often exhibit insufficient generalization ability when confronted with unseen medical data. On the other hand, purely Transformer-based models, while possessing strong global modeling capabilities, face challenges of high computational complexity. To address these problems, this paper proposes a novel U-shaped pure Transformer architecture, called Dual Feature Aggregation Transformer (DFAFormer). A Dual Feature Aggregation Transformer Block (DFATB) is designed based on the Feature Aggregation Feed-Forward Network (FAFN), which enhances the model's ability to capture richer contextual information and complex features by integrating spatial aggregation attention and channel aggregation attention mechanisms. The FAFN module introduces a gating mechanism to capture nonlinear spatial information and reduce channel redundancy, achieving efficient feature extraction while reducing the computational complexity of the model. Additionally, the Differential Transformer is innovatively incorporated, which focuses on key information and suppresses unnecessary noise through differential operations, improving the model's robustness and generalization capabilities. Extensive comparison and ablation experiments are conducted on the Synapse, ISIC 2018 and WORD datasets, achieving average Dice scores of 83.60%, 92.27% and 87.78%, respectively. Experiments show that the proposed method outperforms state-of-the-art methods, reducing computational complexity while exhibiting strong generalization ability and promising application prospects. The code will be released via https://github.com/Sunflower-li369/DFAFormer.
AB - Accurate medical image segmentation provides precise descriptions of anatomical structures and pathological regions, which plays a crucial role in formulating effective treatment plans, guiding surgeries and monitoring disease progression. Recently, hybrid models combining Convolutional Neural Networks (CNNs) and Transformers have been able to compensate for the limitations of traditional CNNs in capturing long-range dependencies. However, these models often exhibit insufficient generalization ability when confronted with unseen medical data. On the other hand, purely Transformer-based models, while possessing strong global modeling capabilities, face challenges of high computational complexity. To address these problems, this paper proposes a novel U-shaped pure Transformer architecture, called Dual Feature Aggregation Transformer (DFAFormer). A Dual Feature Aggregation Transformer Block (DFATB) is designed based on the Feature Aggregation Feed-Forward Network (FAFN), which enhances the model's ability to capture richer contextual information and complex features by integrating spatial aggregation attention and channel aggregation attention mechanisms. The FAFN module introduces a gating mechanism to capture nonlinear spatial information and reduce channel redundancy, achieving efficient feature extraction while reducing the computational complexity of the model. Additionally, the Differential Transformer is innovatively incorporated, which focuses on key information and suppresses unnecessary noise through differential operations, improving the model's robustness and generalization capabilities. Extensive comparison and ablation experiments are conducted on the Synapse, ISIC 2018 and WORD datasets, achieving average Dice scores of 83.60%, 92.27% and 87.78%, respectively. Experiments show that the proposed method outperforms state-of-the-art methods, reducing computational complexity while exhibiting strong generalization ability and promising application prospects. The code will be released via https://github.com/Sunflower-li369/DFAFormer.
KW - Differential transformer
KW - Dual attention
KW - Feature aggregation
KW - Medical image segmentation
UR - https://www.scopus.com/pages/publications/105024332421
U2 - 10.1016/j.inffus.2025.103996
DO - 10.1016/j.inffus.2025.103996
M3 - Article
AN - SCOPUS:105024332421
SN - 1566-2535
VL - 129
JO - Information Fusion
JF - Information Fusion
M1 - 103996
ER -