TY - JOUR
T1 - H3T
T2 - Hierarchical Transferable Transformer with TokenMix for Unsupervised Domain Adaptation
AU - Ren, Yihua
AU - Gao, Junyu
AU - Yuan, Yuan
N1 - Publisher Copyright:
© 2024
PY - 2025/3/1
Y1 - 2025/3/1
N2 - Recent research has focused on exploring the capabilities of Vision Transformers (ViTs) in Unsupervised Domain Adaptation (UDA). This line of work typically pays greater attention to fine-grained common information through patch-level transferable discrimination. However, prematurely assigning narrow-range transferability information at the encoding stage can sparsify image information, thereby increasing the difficulty of downstream tasks. Therefore, we propose a Hierarchical Transferable Transformer with TokenMix (H3T), which maintains the allocation of fine-grained transferability at the encoding stage while strengthening the learning of image information through feature mixup. To address the challenge of missing sample labels in the target domain of the domain adaptation task, we specifically design the TokenMix Module (TMM) for ViTs. This module learns style information from both domains while alleviating the impact of image sparsity on downstream tasks. Furthermore, to enhance the semantic connections among narrow-range image transfer messages, we propose the Hierarchical Discriminative Module (HDM), which also plays a critical role in encoding discriminative information. Our approach is evaluated through comprehensive experiments on five datasets of varying sizes, demonstrating its effectiveness. Our code is available at https://github.com/reyihua/H3T.
AB - Recent research has focused on exploring the capabilities of Vision Transformers (ViTs) in Unsupervised Domain Adaptation (UDA). This line of work typically pays greater attention to fine-grained common information through patch-level transferable discrimination. However, prematurely assigning narrow-range transferability information at the encoding stage can sparsify image information, thereby increasing the difficulty of downstream tasks. Therefore, we propose a Hierarchical Transferable Transformer with TokenMix (H3T), which maintains the allocation of fine-grained transferability at the encoding stage while strengthening the learning of image information through feature mixup. To address the challenge of missing sample labels in the target domain of the domain adaptation task, we specifically design the TokenMix Module (TMM) for ViTs. This module learns style information from both domains while alleviating the impact of image sparsity on downstream tasks. Furthermore, to enhance the semantic connections among narrow-range image transfer messages, we propose the Hierarchical Discriminative Module (HDM), which also plays a critical role in encoding discriminative information. Our approach is evaluated through comprehensive experiments on five datasets of varying sizes, demonstrating its effectiveness. Our code is available at https://github.com/reyihua/H3T.
KW - Adversarial learning
KW - Domain adaptation
KW - Mix-up
KW - Vision transformers
UR - http://www.scopus.com/inward/record.url?scp=85207934600&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2024.125543
DO - 10.1016/j.eswa.2024.125543
M3 - Article
AN - SCOPUS:85207934600
SN - 0957-4174
VL - 262
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 125543
ER -