H3T: Hierarchical Transferable Transformer with TokenMix for Unsupervised Domain Adaptation

Yihua Ren, Junyu Gao, Yuan Yuan

Research output: Contribution to journal › Article › peer-review

Abstract

Recent research has explored the capabilities of Vision Transformers (ViTs) in Unsupervised Domain Adaptation (UDA), typically by directing greater attention to fine-grained, domain-shared information through patch-level transferability discrimination. However, prematurely assigning such narrow-range transferability information at the encoding stage can sparsify the image representation, making downstream tasks harder. We therefore propose the Hierarchical Transferable Transformer with TokenMix (H3T), which retains the allocation of fine-grained transferability at the encoding stage while strengthening the learning of image information through feature mixup. To address the absence of sample labels in the target domain, we design a TokenMix Module (TMM) specifically for ViTs; it learns style information from both domains while mitigating the impact of representation sparsity on downstream tasks. Furthermore, to strengthen the semantic connections among narrow-range transferability cues, we propose a Hierarchical Discriminative Module (HDM), which also plays a critical role in encoding discriminative information. Comprehensive experiments on five datasets of varying sizes demonstrate the effectiveness of our approach. Our code is available at https://github.com/reyihua/H3T.
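For intuition, the following is a minimal, hypothetical PyTorch sketch of token-level mixup between source- and target-domain ViT patch tokens, in the spirit of the TokenMix idea described above. It is not the authors' released implementation (see the repository linked above); the function name token_mix, the mixing ratio lam, and the random token-swap strategy are illustrative assumptions.

    # Hypothetical sketch of token-level mixup for ViT patch tokens.
    # Not the authors' implementation; names and the mixing strategy
    # (random per-sample token swap controlled by `lam`) are assumptions.
    import torch

    def token_mix(src_tokens: torch.Tensor,
                  tgt_tokens: torch.Tensor,
                  lam: float = 0.5) -> torch.Tensor:
        """Replace a fraction (1 - lam) of source patch tokens with the
        target-domain tokens at the same positions.

        Both inputs have shape (batch, num_tokens, dim) and are assumed
        to be patch embeddings of paired source/target batches.
        """
        batch, num_tokens, dim = src_tokens.shape
        # Number of tokens to take from the target domain.
        num_tgt = int((1.0 - lam) * num_tokens)
        # Per-sample random token positions to replace (unique via argsort).
        idx = torch.rand(batch, num_tokens,
                         device=src_tokens.device).argsort(dim=1)[:, :num_tgt]
        idx = idx.unsqueeze(-1).expand(-1, -1, dim)
        mixed = src_tokens.clone()
        # Copy the selected target-domain tokens into the source sequence.
        mixed.scatter_(1, idx, torch.gather(tgt_tokens, 1, idx))
        return mixed

    # Example: 14x14 patches at ViT-Base width; ~30% of tokens come
    # from the target domain when lam = 0.7.
    src = torch.randn(4, 196, 768)
    tgt = torch.randn(4, 196, 768)
    mixed = token_mix(src, tgt, lam=0.7)

Under these assumptions, the mixed token sequence exposes the encoder to both domains' styles within a single sample, which is one plausible way such a module could counteract the sparsity introduced by patch-level transferability weighting.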

Original language: English
Article number: 125543
Journal: Expert Systems with Applications
Volume: 262
State: Published - 1 Mar 2025

Keywords

  • Adversarial learning
  • Domain adaptation
  • Mix-up
  • Vision transformers
