H3T: Hierarchical Transferable Transformer with TokenMix for Unsupervised Domain Adaptation

Yihua Ren, Junyu Gao, Yuan Yuan

Research output: Contribution to journal › Article › peer-review

Abstract

Recent research has explored the capabilities of Vision Transformers (ViTs) in Unsupervised Domain Adaptation (UDA), typically by directing greater attention to fine-grained, domain-shared information through patch-level transferability discrimination. However, prematurely assigning such narrow-range transferability information at the encoding stage can sparsify the image representation, making downstream tasks harder. We therefore propose the Hierarchical Transferable Transformer with TokenMix (H3T), which retains the allocation of fine-grained transferability at the encoding stage while strengthening the learning of image information through feature mixup. To address the absence of sample labels in the target domain, we design a TokenMix Module (TMM) specifically for ViTs; it learns style information from both domains while mitigating the impact of representation sparsity on downstream tasks. Furthermore, to strengthen the semantic connections among narrow-range transferability cues, we propose a Hierarchical Discriminative Module (HDM), which also plays a critical role in encoding discriminative information. Comprehensive experiments on five datasets of varying sizes demonstrate the effectiveness of our approach. Our code is available at https://github.com/reyihua/H3T.
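For intuition, the following is a minimal, hypothetical PyTorch sketch of token-level mixup between source- and target-domain ViT patch tokens, in the spirit of the TokenMix idea described above. It is not the authors' released implementation (see the repository linked above); the function name token_mix, the mixing ratio lam, and the random token-swap strategy are illustrative assumptions.

    # Hypothetical sketch of token-level mixup for ViT patch tokens.
    # Not the authors' implementation; names and the mixing strategy
    # (random per-sample token swap controlled by `lam`) are assumptions.
    import torch

    def token_mix(src_tokens: torch.Tensor,
                  tgt_tokens: torch.Tensor,
                  lam: float = 0.5) -> torch.Tensor:
        """Replace a fraction (1 - lam) of source patch tokens with the
        target-domain tokens at the same positions.

        Both inputs have shape (batch, num_tokens, dim) and are assumed
        to be patch embeddings of paired source/target batches.
        """
        batch, num_tokens, dim = src_tokens.shape
        # Number of tokens to take from the target domain.
        num_tgt = int((1.0 - lam) * num_tokens)
        # Per-sample random token positions to replace (unique via argsort).
        idx = torch.rand(batch, num_tokens,
                         device=src_tokens.device).argsort(dim=1)[:, :num_tgt]
        idx = idx.unsqueeze(-1).expand(-1, -1, dim)
        mixed = src_tokens.clone()
        # Copy the selected target-domain tokens into the source sequence.
        mixed.scatter_(1, idx, torch.gather(tgt_tokens, 1, idx))
        return mixed

    # Example: 14x14 patches at ViT-Base width; ~30% of tokens come
    # from the target domain when lam = 0.7.
    src = torch.randn(4, 196, 768)
    tgt = torch.randn(4, 196, 768)
    mixed = token_mix(src, tgt, lam=0.7)

Under these assumptions, the mixed token sequence exposes the encoder to both domains' styles within a single sample, which is one plausible way such a module could counteract the sparsity introduced by patch-level transferability weighting.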

Original language: English
Article number: 125543
Journal: Expert Systems with Applications
Volume: 262
State: Published - 1 Mar 2025

Keywords

  • Adversarial learning
  • Domain adaptation
  • Mix-up
  • Vision transformers
